Recently Bing, Google and Yahoo! agreed to rely on a standard markup to improve the display of their search results, thereby making it easier for people to find the right web pages. They chose Microdata as the annotation format for embedding the types and properties of content within web pages. Microdata is a new feature of HTML5 that provides a mechanism for embedding machine-readable data in HTML documents in an easy-to-write manner, with an unambiguous parsing model. Schema.org aims to provide a shared collection of schemas that webmasters can use for their Microdata markup.
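To make this concrete, here is a minimal sketch of what Microdata looks like and how a machine might read it. The page fragment is a made-up example using the schema.org Person type, and the MicrodataSketch class and extract_microdata function are hypothetical names of my own. The parser is deliberately simplified (flat properties only, no nested items) and is not how any search engine actually processes pages.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with Microdata attributes
# (itemscope, itemtype, itemprop) and the schema.org Person type.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Professor</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects one item type and its flat itemprop/text pairs.

    Simplifying assumption: a single item, no nested itemscope elements,
    and every property value is the text inside the annotated element.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # itemscope is a boolean attribute, so only its presence matters.
        if "itemscope" in attr_map:
            self.item_type = attr_map.get("itemtype")
        if "itemprop" in attr_map:
            self._current_prop = attr_map["itemprop"]

    def handle_data(self, data):
        # Attach the first non-blank text after an itemprop tag to that property.
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None


def extract_microdata(html):
    """Return (item type URL, dict of property name to text value)."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, props = extract_microdata(SAMPLE_HTML)
    print(item_type)
    print(props)
```

The point of the sketch is that the type and property names come from the shared Schema.org vocabulary, so any consumer that knows the vocabulary can interpret the page without site-specific rules.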
That sounds like great news. It can be considered a step towards realizing the semantic web. So what is the problem?
There has been a long discussion on the semantic web mailing list about this announcement. Some people welcome it as progress for semantic web efforts, while others criticize it as a new revenue model for the web monsters. In this post I want to share my thoughts on the Schema.org approach.
The first question that comes to my mind in this context is: Why Microdata? Why not RDFa or microformats?
It is great that three leading companies on the web have come to a consensus. I really like and appreciate it. Such a moment is rarely seen amid tough competition unless every party stands to benefit. But I believe that when there is an opportunity, doing the job well is better than just doing a good-enough job. On choosing Microdata over RDFa or microformats, they say:
“Focusing on microdata was a pragmatic decision. Supporting multiple syntaxes makes documentation for webmasters more complex and introduces more overhead in terms of defining new formats. Microformats are concise and easy to understand, but they don’t offer an open extensibility mechanism and the reuse of the class tag can cause conflicts with website CSS. RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption. Microdata is the most recent well-known standard, created along with HTML5. It strikes a balance between extensibility and simplicity, and is most suitable for building the schema.org. Google and Yahoo! have in the past supported both microformats and RDFa for certain schemas and will continue to support these syntaxes for those schemas. We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes.”
I don’t think this is good reasoning for a pragmatic decision. Simpler is not always better! As Samuel says, while there are a number of technical merits that speak in favor of RDFa over Microformats and Microdata (fully qualified vocabulary terms, prefix short-hand via CURIEs, accessibility-friendly, unified processing rules, etc. please take a look at this to see that RDFa is not really so complicated!), the main point is the choice between centralized innovation and distributed innovation. The web has always relied on distributed innovation, and RDFa allows that sort of innovation to continue by solving the tenable problem of a semantics expression mechanism. Microdata has no such general-purpose solution. Although it can help with one specific problem, such as searching data, it does not scale well to the vision of the semantic web. As a centralized solution for the web of data, Schema.org really conflicts with the vision of benefiting from distributed islands of information. I wish they had made a better decision, one that would speed up the realization of a web of knowledge!