UI for Content Tagging

1/2/07

I’ve come across yet another application that does content tagging completely wrong. And this stuff has been on my mind a lot lately anyway. Nerdiness ahead.

So a little backstory: tagging. New-school method of content organization. Old-school way is to define a taxonomy ahead of time, name your categories and start putting bits of content into them one by one. This is time-consuming, prone to errors, and the taxonomy will be out of date by the time you finish implementing it anyway.

Tagging solves this by letting anybody define whatever ‘tags’ they want for a given bit of content as needed, on the theory that the categories you would’ve defined in a perfect taxonomy will naturally emerge over time because a lot of people will use them. Nifty.

But the reasons tagging works seem to get left out of some applications — or tagging tools appropriate to one situation get applied to a completely different situation where they don’t work at all.

Best-case scenario

The best-case scenario for tagging is when you have lots and lots of people defining the tags. If any user at all can come along, read a bit of content, and decide, ‘hmm, this is about antelopes, it really ought to have an “ungulate” tag,’ and can add that ungulate tag, then the next person who still remembers that Gary Larson cartoon will be able to find all the hoofed-mammal content on your site. If somebody decides to tag half the content database as ‘omgwtf’ — and somebody will — no big deal; half the point of tagging is that incorrect tags will be used less often than correct ones, and the rarely-used tags will automatically drop to the bottom of the list. (The other half of the point is that maybe you really do have a lot of ‘omgwtf’ content, and just hadn’t noticed it until that guy and all his friends came along.)
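To make the "rarely-used tags drop to the bottom" idea concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: the user names, the content IDs, the `taggings` list and the `ranked_tags` function are all hypothetical, and counting each user's tag only once per item is my own assumption, not something any particular tagging system is known to do.

```python
from collections import Counter

# Hypothetical data: (user, content_id, tag) triples, invented for illustration.
taggings = [
    ("alice", "antelope-post", "ungulate"),
    ("bob", "antelope-post", "ungulate"),
    ("carol", "antelope-post", "ungulate"),
    ("dave", "antelope-post", "omgwtf"),
    ("dave", "antelope-post", "omgwtf"),  # the same user repeating himself
    ("alice", "gnu-post", "ungulate"),
    ("erin", "gnu-post", "ungulate"),
    ("dave", "gnu-post", "omgwtf"),
]


def ranked_tags(taggings, content_id):
    """Return (tag, count) pairs for one item, most widely applied first."""
    # Assumption: each user counts once per tag, so one enthusiastic
    # tagger cannot outvote everybody else just by repeating a tag.
    votes = {(user, tag) for user, cid, tag in taggings if cid == content_id}
    counts = Counter(tag for _user, tag in votes)
    return counts.most_common()


print(ranked_tags(taggings, "antelope-post"))
# [('ungulate', 3), ('omgwtf', 1)]
```

The more people tagging, the more lopsided those counts get, which is the whole mechanism: nobody has to delete 'omgwtf', it just sinks.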

Second-best case scenario

Almost nobody implements the best-case scenario: wikis are pretty much the only situation I can think of. Usually the only people allowed to add tags are the content authors themselves, not the readers — this is how most blog or community-blog systems work.

The reason this is inferior to the first way is that good tagging depends on the law of averages to weed out errors: by restricting the number of people able to add or modify tags to a handful of content authors, or, in the case of a single-user blog, to one person, you’re slowing the whole process of tag evolution down tremendously. Also, unless your content authors are really obsessed, you probably won’t have them going back over old content and re-tagging it to keep up with current trends, the way you would with reader-enabled tagging.

Tagging can still work here, though (obviously). The tags will be more general and there will probably be fewer of them, but given a sufficiently large number of content objects, categories can still emerge. Often, readers still do some of the tagging here, just informally: check just about any recent MetaFilter thread to see somebody complaining, “Hey, this really needs the batshitinsane tag!” etc.

Worst-case scenario

The most marginal case for tagging is a personal database — one content author, one reader — and they’re the same guy. Yojimbo, for example.

Here you’ve got no law of averages helping you: if you mistag a bit of content, it’ll stay mistagged; nobody else is going to come along and correct it for you.

The only real benefit of using tags here is that you can still define them on the fly: if you create a new bit of content that doesn’t fit into an existing category, adding a new tag effectively creates that category without the user having to go off into some other taxonomy-management interface.
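As a rough sketch of what "define them on the fly" means in practice, here's a tiny single-user tag store in Python. The `TagStore` class, its method names and the sample file names are all hypothetical. This is not how Yojimbo or any other real application works internally, just one way the idea could look. The point is that there is no separate "create category" step: using a tag for the first time is what creates it.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical personal tag store: tags exist as soon as they are used."""

    def __init__(self):
        # Maps a normalized tag name to the set of items carrying it.
        self._items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        """Attach tags to an item, creating any tag that is new."""
        for t in tags:
            # Assumption: normalize case and whitespace so 'Network'
            # and 'network ' end up as the same tag.
            self._items_by_tag[t.strip().lower()].add(item)

    def items(self, tag):
        """Return the items carrying a tag, sorted for stable output."""
        return sorted(self._items_by_tag.get(tag.strip().lower(), set()))

    def tags(self):
        """Return every tag defined so far, alphabetically."""
        return sorted(self._items_by_tag)


store = TagStore()
store.tag("wifi-password.txt", "home", "Network")
store.tag("router-manual.pdf", "network")

print(store.tags())            # ['home', 'network']
print(store.items("network"))  # ['router-manual.pdf', 'wifi-password.txt']
```

Note what's missing: nothing here ever corrects a bad tag. With one author and one reader, a mistagged item stays mistagged until that same person notices.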