Posted by mchampion
on December 11, 2003 at 11:50 AM PST
This is the first of several reflections on what I think I learned here at the XML 2003 conference in Philadelphia. Sorry if it's too XML-geeky and not of sufficient interest to Java people, but I think a lot of what I heard people talking about has considerable relevance beyond the XML community it was aimed at.
Jon Udell gave a keynote speech on Tuesday that pierced the jaded, slightly cynical shell I've acquired after about eight years in the XML world. He didn't talk about "maybe someday..." or "if only..."; he showed what a little imagination can do today with widely deployed technologies: XHTML, CSS, and XPath.
He began by explaining why the idea of an XML document that can be shared by different applications is "magical," using two speeches by Bill Gates to illustrate the point. Ten years or so ago, Gates' slogan was "information at your fingertips": information workers would bring together text and data from a variety of applications and easily find, reuse, and revise it. The vision was a powerful one, but we are only beginning to realize it today. Part of the reason for the delay, Udell argued, was that in the OLE architecture Microsoft promoted (and in the OpenDoc architecture of its rivals), each application owned its own proprietary data, and OLE/OpenDoc provided a means by which the applications could talk to each other. This worked to some extent, but it created tight couplings (I'm not sure Udell used that term) among the applications, their object models, and the operating systems and programming languages that support them.
In a more recent version of Gates' stump speech, the key idea is a "universal canvas" enabled by XML. Rather than applications communicating to access each other's proprietary data, they share a common XML representation of the data: some of it may be numbers manipulated with a spreadsheet, some text manipulated by a word processor, and some structured data manipulated via a forms interface. This vision is already very close to reality in Microsoft Office 2003 (and Udell's keynote was followed by a presentation from Adobe that demonstrated its very similar vision in action).
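To make the "universal canvas" idea a little more concrete, here is a minimal Python sketch, assuming a hypothetical report format (the element names are invented for illustration and are not taken from Office 2003's actual schemas). Two independent consumers read the same XML document: one treats the expense items as numbers, the other treats the summary as text. Neither calls into the other; they only agree on the shared representation, which is the loose coupling the OLE/OpenDoc approach lacked.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# A hypothetical XML document shared by several "applications".
# Element and attribute names are invented for this sketch.
REPORT = """<report>
  <summary>Q3 spending came in under budget.</summary>
  <expenses>
    <item category="travel">1200.50</item>
    <item category="hardware">3400.00</item>
  </expenses>
</report>"""

def spreadsheet_view(root):
    """A numbers-oriented consumer: total every expense item."""
    return sum(Decimal(item.text) for item in root.iter("item"))

def word_processor_view(root):
    """A text-oriented consumer: pull out the prose summary."""
    return root.findtext("summary")

root = ET.fromstring(REPORT)
print(spreadsheet_view(root))     # 4600.50
print(word_processor_view(root))  # Q3 spending came in under budget.
```

Either consumer could be replaced, or a third added, without touching the others, as long as each keeps to the shared format.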
What makes XML a good "universal canvas"? In Udell's words, "contextual metadata is XML's gift to mankind." To demonstrate this, he showed how he had been eating his own dogfood: his presentation slides were ordinary XHTML with CSS styling and some script, displayed in Mozilla on a Mac. He ran queries using Mozilla's XPath support against data that wasn't really designed to be queried, exploiting the metadata that XHTML authors create without thinking about it: links, attribute values added to support CSS styles, and so on. When the idea of XML-defined context is supported more explicitly, as in Kimbro Staken's Syncato weblog tool (or is it an "XML fragment management system"?), the power of XML to model context and of XPath to query it becomes even more apparent. In any event, knowing that one will use XPath to retrieve information later motivates one to think about the structure of content while writing, creating the contextual metadata magic without much extra effort.
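A rough approximation of the kind of query Udell ran in the browser can be written with Python's standard-library ElementTree, which supports only a subset of XPath (Mozilla's document.evaluate exposes full XPath 1.0). The slide markup, the class names "title" and "quote", and the example.org links are all hypothetical; the point is that attributes written purely for CSS styling double as a query handle.

```python
import xml.etree.ElementTree as ET

# XHTML namespace URI; the "h" prefix is just our choice for queries.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# A hypothetical XHTML slide. The class attributes exist to drive CSS
# styling, but they are also metadata an XPath query can exploit.
SLIDE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1 class="title">Contextual metadata</h1>
    <p class="quote">Contextual metadata is XML's gift to mankind.</p>
    <p>See the <a href="http://example.org/notes">speaker notes</a> and
       the <a href="http://example.org/slides">slides</a>.</p>
    <p class="quote">Smuggle metadata into content.</p>
  </body>
</html>"""

def quotes(root):
    """Return the text of every paragraph styled as a quote."""
    return [p.text for p in root.findall(".//h:p[@class='quote']", NS)]

def links(root):
    """Return (anchor text, href) pairs for every hyperlink."""
    return [(a.text, a.get("href")) for a in root.findall(".//h:a[@href]", NS)]

root = ET.fromstring(SLIDE)
print(quotes(root))
print(links(root))
```

Note that [@class='quote'] matches the attribute value exactly, so an element with class="quote big" would be missed. In full XPath 1.0 the usual idiom for matching one class among several is contains(concat(' ', @class, ' '), ' quote ').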
Udell went on to talk about how these advantages can be extended to email and instant messages, where most real business communication takes place (and thus where the real content is created). We need to build tools and products that tap these small packets of structure and context and pull them into useful business information. Although Microsoft Office's XML support does not extend to Outlook, there are ways to archive email and IM in XML, and Udell challenged the audience to help devise real products and integrations that exploit the potential this would create. In any event, we don't need new standards, or a solution to the thorny issue of the diversity among flavors of RSS/Atom, to make progress: XHTML, CSS, and XPath provide the basics of what we need. Udell argued that we need to figure out how to use what we have in a smarter way, and to "smuggle" metadata into content by leveraging the universal urge to make things look cool.
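As a sketch of pulling business information out of archived IM, assume a hypothetical transcript format in which each message is an XHTML div and participants marked action items with class="todo" simply because their client renders that class highlighted. The format, names, and class values are invented for illustration and do not correspond to any real client's archive. A couple of ElementTree queries then turn styling into a list of who committed to what.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# A hypothetical IM transcript archived as XHTML. The "todo" class was
# added for its visual effect, but it is also usable metadata.
LOG = """<div xmlns="http://www.w3.org/1999/xhtml" class="transcript">
  <div class="msg"><span class="from">alice</span>
    <span class="body">Draft is up.</span></div>
  <div class="msg"><span class="from">bob</span>
    <span class="todo">Review the schema by Friday</span></div>
  <div class="msg"><span class="from">alice</span>
    <span class="todo">Update the RSS feed</span></div>
</div>"""

def action_items(root):
    """Return (sender, item) pairs for every span styled as a todo."""
    items = []
    for msg in root.findall(".//h:div[@class='msg']", NS):
        sender = msg.find("h:span[@class='from']", NS).text
        for todo in msg.findall("h:span[@class='todo']", NS):
            items.append((sender, todo.text))
    return items

root = ET.fromstring(LOG)
print(action_items(root))
```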
So why did this pierce my cynical shell? Most would agree that we need more metadata on the Web for it to live up to its full potential; that's the very premise of the Semantic Web effort, in which Tim Berners-Lee has invested much of the W3C's resources (and credibility). On the other hand, many believe that the historical difficulty of getting real people to put metadata in their content dooms such efforts to failure. (Cory Doctorow's essay "Metacrap" is the most colorful and cogent, if widely reviled, statement of this position.) Udell's insight is that we can leverage the technology we have, salted by human vanity, to get usable metadata without technological breakthroughs or unrealistic demands on humans.