
An interesting discussion is going on at this blog over what constitutes, and what should constitute, the standard way of ‘tagging’. It criticises the usual suspect in this area, del.icio.us, which forces users into single-word tags by using the space character as its tag delimiter. Most other sites employing tagging, such as Flickr, Technorati and also the WordPress tagging system Ultimate Tag Warrior, allow the use of some other delimiter (mostly commas or speech marks).
That there isn’t a common standard for tags is not surprising, perhaps. The underlying technology of XML should make the delimiter a mere presentation issue when the information is used in other contexts: there are many XML namespaces which solve this issue by structuring the document with repeatable ‘tag’ XML fields.
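As a rough illustration of the repeatable-field idea, here is a minimal Python sketch that reads tags from repeated Dublin Core dc:subject elements. The dc namespace is real, but the <item> wrapper, the sample record and the read_tags helper are our own assumptions, made up for this example. Once each tag is its own element, the delimiter only matters when the tags are printed.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace; dc:subject may be repeated, one per tag.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: the <item> wrapper and its contents are invented.
SAMPLE = """\
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Street market</dc:title>
  <dc:subject>São Paulo</dc:subject>
  <dc:subject>street food</dc:subject>
  <dc:subject>travel</dc:subject>
</item>
"""


def read_tags(xml_text):
    """Return the text of every dc:subject element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{{{DC_NS}}}subject") if el.text]


tags = read_tags(SAMPLE)
print(", ".join(tags))   # one possible presentation: comma-delimited
print(" | ".join(tags))  # same data, different delimiter
```

Multi-word tags such as “São Paulo” survive intact here because the structure, not a delimiter character, marks where one tag ends and the next begins.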
But while XHTML is used without additional namespaces, interoperability remains a problem. Without namespaces explicitly defining tags as part of the document structure, they are simply arbitrary words on a web page. del.icio.us seems alone in forcing its users to use single-word tags, while the others have not come to a clear consensus on delimiters or restrictions.
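To make the delimiter problem concrete, here is a small Python sketch of three ways a site might split what a user types into tags. These functions are our own illustrations of the space, comma and speech-mark conventions mentioned above, not the actual parsing code of any of the sites named.

```python
import shlex


def split_on_spaces(raw):
    """Space-delimited: every word becomes its own tag."""
    return raw.split()


def split_on_commas(raw):
    """Comma-delimited: spaces inside a tag are preserved."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def split_with_quotes(raw):
    """Space-delimited, but speech marks group words into one tag."""
    return shlex.split(raw)


print(split_on_spaces("São Paulo travel"))      # the city name is split in two
print(split_on_commas("São Paulo, travel"))     # the city name stays whole
print(split_with_quotes('"São Paulo" travel'))  # the city name stays whole
```

The same intent produces three tags under the first convention and two under the others, which is the kind of mismatch an aggregator has to paper over.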
The issue is complicated further when tags are represented in URLs. Tags must be made URL-safe by omitting or substituting characters that are not URL-safe, such as spaces and letters with diacritics (ü, ã). Different sites use different algorithms to deal with this problem, so the same tag can be represented quite differently in URLs from site to site. An application looking for items tagged “São Paulo” through various sites’ web services would therefore have to allow for each system’s idiosyncrasies: requesting ‘SãoPaulo’ from del.icio.us, ‘sao-paulo’ from Flickr and countless other formats for other sites. We have come across this difficulty ourselves in exploring features for Padova, our meta-linking media repository front-end, which integrates tagged content from external sources.
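The sketch below shows, in Python, how two plausible normalisation schemes turn the same tag into different URL segments. Both functions are hypothetical: they only reproduce the two output shapes mentioned above and are not the algorithms del.icio.us or Flickr actually use.

```python
import re
import unicodedata
from urllib.parse import quote


def concatenate_scheme(tag):
    """Hypothetical scheme A: drop whitespace, keep accents, percent-encode."""
    joined = "".join(tag.split())
    return quote(joined)  # non-ASCII characters are encoded as UTF-8 bytes


def hyphenate_scheme(tag):
    """Hypothetical scheme B: strip accents, lowercase, hyphenate."""
    decomposed = unicodedata.normalize("NFKD", tag)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


tag = "São Paulo"
print(concatenate_scheme(tag))  # 'SãoPaulo' in percent-encoded form
print(hyphenate_scheme(tag))
```

A client that wants to query several services for one tag would need a table of such per-site functions, and scheme B is lossy: the accent cannot be recovered from the URL, so distinct tags can collapse into the same form.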
This is a general issue with URLs, but given that tagging’s power is in its simplicity, this lack of a standard restricts the vocabulary of the technology substantially, particularly in languages other than English.
It seems that the definition of a tag has undergone some changes in recent years: from being simply a one-word keyword to an arbitrary set of grouped words. Adopting an XML namespace which supports tagging in XHTML web pages would be a step forward, but the utility of doing this would depend on the page context. The discussion on delimiters and URL representation is worthwhile regardless.
Here’s the Slashdot story.
The Need For A Tagging Standard:
John Carmichael writes “Tags are everywhere now. Not just blogs, but famous news sites, corporate press bulletins, forums, and even Slashdot. That’s why it’s such a shame that they’re rendered almost entirely useless by the lack of a tagging standard with which tags from various sites and tag aggregators like Technorati and Del.icio.us can compare and relate tags to one another. Depending on where you go and who you ask, tags are implemented differently, and even defined in their own unique way. Even more importantly, tags were meant to be universal and compatible: a medium of sharing and conveying info across the blogosphere — the very embodiment of a semantic web. Unfortunately, they’re not. Far from it, tags create more discord and confusion than they do minimize it. I have to say, it would be nice to just learn one way of tagging content and using it everywhere.”
(Via Slashdot.)
Picture: Tags: Keywords to describe digital objects by cambodia4kidsorg on Flickr. Creative Commons licensed.