Another failed “Semantic Web” experiment, or a potential gold mine?
Twitter recently announced a new feature called “Annotations” at Chirp, the Twitter developers’ conference. Annotations is a way of attaching extra metadata to your tweets and is, arguably, an inevitable expansion of the original self-imposed 140-character limit, which has since become one of Twitter’s strongest trademarks.
Annotations can be seen as the counterpart of the “tags” often used on blog posts. They provide a context for the tweet – a semantics of sorts. Most Twitter users are familiar with “hash tags”, which can be seen as the informal precursor to annotations.
Necessity is the mother of invention, and because natural language can be very ambiguous (ask any computational linguist!), the community of short-message users quickly started using hash tags to overcome this problem. The main motivation was to get the intended message across with the least possible ambiguity within the 140-character restriction.
The second motivation behind the widespread use of tags in general, and hash tags in particular, is the ability they give authors to provide a concise, high-level summary of their intended message – a delineation of the conceptual ground it attempts to cover. Messages with matching tags can therefore indicate a closer conceptual match than messages that merely contain the same keywords.
For example, say you were interested in users’ opinions of Apple products in general. If your search for “apple products” returned only the titles of ten pages, it might be fair to assume that an article titled “Apple Products” relates to the high-tech company, Apple Inc., and some of its products. However, can you completely rule out the possibility that it relates to McCutcheon’s Apple Products Inc.? Yes, if the author included tags such as “#high-tech”, “#gadgets”, “#apple-company” or “#review”!
But it goes deeper than this: once objects are tagged, one can explore “themes” in the data. Nor is tagging limited to textual messages; it can be applied to all sorts of multimedia and other hard-to-index data. Tags also offer one of the easiest ways of recommending “related” objects to users: let users collaboratively tag objects with descriptive tags, then cluster the objects by those tags, so that the more their tags agree, the more “related” the objects are. We are still a long way from being able to automatically analyse, say, video content to detect the main themes, moods or other hard-to-define characteristics it contains. Humans, however, have been doing this sort of thing for ages.
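The tag-agreement idea can be made concrete with a small sketch. It assumes each object is simply a set of user-supplied tags and uses Jaccard similarity (shared tags divided by all distinct tags) as the measure of agreement. That measure, the function names and the sample video data are illustrative choices, not something prescribed by Twitter or by tagging systems in general. Rather than building full clusters, it ranks the objects most related to a given one, which is the basic comparison a clustering step would build on.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Tag agreement as |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related(target: str, tags: Dict[str, Set[str]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank other objects by how strongly their tags agree with the target's tags.

    Objects sharing no tags at all are left out of the result.
    """
    scores = [
        (obj, jaccard(tags[target], obj_tags))
        for obj, obj_tags in tags.items()
        if obj != target
    ]
    # Highest agreement first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return [pair for pair in scores[:top_n] if pair[1] > 0]


# Hypothetical, crowd-sourced tags for a few hard-to-index videos.
video_tags = {
    "video_a": {"sunset", "beach", "calm"},
    "video_b": {"beach", "surf", "calm"},
    "video_c": {"city", "night", "traffic"},
    "video_d": {"sunset", "calm"},
}

print(related("video_a", video_tags))
```

Here `video_d` ranks above `video_b` because it shares two of only three distinct tags with `video_a`, while `video_c` shares none and is dropped.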
So how does this relate to Twitter’s chirpy new feature, “Annotations”? Twitter deliberately avoided defining exactly what annotations are for or how they should be used, preferring to “encourage innovation in their use” from the community. Initially you get an additional 512 bytes of data (which “might be increased to up to 2 KB”) in which to embed your own annotations as JSON structures of namespace/key/value triples. Note in particular the use of namespaces, which lets users specify the semantics underlying their annotations.
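As a rough sketch of the namespace/key/value idea, the Python below builds annotations as JSON and checks them against the initial 512-byte allowance. The layout (a list of single-namespace objects), the namespaces `review` and `link`, and the assumption that size is measured on compact UTF-8 JSON are all hypothetical; the exact format Twitter uses may differ.

```python
import json
from typing import Dict, List

# Assumed layout: a list of {namespace: {key: value}} objects.
Annotations = List[Dict[str, Dict[str, str]]]

# The initial allowance mentioned by Twitter (possibly raised to 2 KB later).
ANNOTATION_BUDGET_BYTES = 512


def build_annotations(entries: Dict[str, Dict[str, str]]) -> Annotations:
    """Turn {namespace: {key: value}} into a list of one-namespace objects."""
    return [{namespace: dict(pairs)} for namespace, pairs in entries.items()]


def encoded_size(annotations: Annotations) -> int:
    """Byte size of the compact UTF-8 JSON encoding (assumed measurement)."""
    text = json.dumps(annotations, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def fits_budget(annotations: Annotations, budget: int = ANNOTATION_BUDGET_BYTES) -> bool:
    """True if the encoded annotations fit within the byte budget."""
    return encoded_size(annotations) <= budget


# Hypothetical namespaces: the namespace tells a reader how to interpret the keys.
annotations = build_annotations({
    "review": {"subject": "Apple Inc.", "topic": "gadgets"},
    "link": {"url": "http://example.com/article"},
})
print(json.dumps(annotations), encoded_size(annotations), fits_budget(annotations))
```

Because the budget is counted in bytes rather than characters in this sketch, non-ASCII values use it up faster than their visible length suggests.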
What are the possible implications of this? First off, providing dedicated extra space for annotations could see hash tags move from the body of the tweet into the annotations, freeing up precious tweet-estate. Just as importantly, Twitter’s compressed format paved the way for, and created a market for, services such as URL shorteners. If URLs could be placed in a dedicated, separate space, using those services would become a choice rather than a necessity.
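Moving hash tags out of the tweet body could be as simple as peeling off trailing tags, as in the hypothetical helper below. It only moves hash tags at the end of a tweet, on the assumption that tags in mid-sentence are part of the message itself; the collected tags could then be carried in an annotation instead. The function name and the trailing-only rule are illustrative assumptions, not part of any Twitter API.

```python
from typing import List, Tuple


def move_trailing_hashtags(tweet: str) -> Tuple[str, List[str]]:
    """Split a tweet into (body, tags), removing only the hash tags at its end.

    Note: splitting on whitespace also collapses runs of spaces in the body.
    """
    words = tweet.split()
    tags: List[str] = []
    # Walk backwards, collecting hash tags until the first ordinary word.
    while words and words[-1].startswith("#") and len(words[-1]) > 1:
        tags.insert(0, words.pop()[1:])
    return " ".join(words), tags


tweet = "Loving the new iPad #apple-company #gadgets #review"
body, tags = move_trailing_hashtags(tweet)
print(body, tags, len(tweet) - len(body))
```

With the three sample tags moved out, the body shrinks from 51 to 19 characters, freeing 32 characters for the message itself, while a tag inside the sentence (as in “I love #python today”) is left untouched.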
But that’s just the surface stuff. It is important to realise that what Twitter has created is not merely an added “feature”, but potentially a new platform. The open-ended way in which it is being rolled out offers, at once, amazing possibilities and a potential developer’s nightmare.
The Semantic Web and Linked Data concepts have slowly progressed over the last decade from a mere pipe dream to serious initiatives gaining considerable traction worldwide. With a recently disclosed user base of 105 million and phenomenal growth, Twitter might just be in a position to lend these initiatives some serious momentum.
At its heart, the Semantic Web aims to make the meaning, or semantics, of information available to machines, so that machines can more readily meet users’ growing information needs. The incredible volume of activity – 55 million tweets and 600 million searches a day, as revealed by Twitter – coupled with the potential to make tweets understandable by machines via agreed-upon semantics, whether formally defined or decided via crowd-sourcing, sets the table for some potentially very interesting applications in the near future.
One could argue that this provides a great platform for incrementally refining machines’ understanding of linguistic “meaning” from the way people tweet and retweet their content, and from the way they apply semantic annotations to it. Also, and admittedly much less ambitiously, annotations carrying location, platform, temporal and other information greatly increase the scope for personalised services catering to your likes and dislikes. This could happen for the simple reason that, wait for it, the service now knows more about you.
All in all, this new addition to the Twitter camp seems to have been a necessary step for Twitter to move forward as a service. It brings many hitherto impossible opportunities to the table, but also calls for a serious, collective effort in defining and engineering the way forward. If it pans out well, though, it might just be remembered in years to come as the #twitularity.