Posted by mchampion
on July 7, 2003 at 7:05 AM PDT
XML makes it easier for those who want to agree on a data "standard" to nail down the technical details. On the other hand, when data is sent around or stored in XML, lots of work can be done without agreement or authority.
The contentious world of RSS and the "(not) Echo" project have been featured in a number of java.net weblogs recently by Simon Phipps and Daniel Steinberg. I've been intrigued by RSS for a while because it illustrates both the challenges one faces in the real world in getting agreement on what seems like a simple problem and the ability of XML to provide robust solutions even in the absence of agreement.
One of the biggest issues in the (N)Echo debate is whether a new, presumably improved format is worth the disruption it will cause to established users and software developers. Some are saying "If it ain't broke, don't fix it": tinkering is more likely to break existing applications such as RSS aggregators than to provide a solid foundation for further development. Stability in formats and protocols is, in this view, what the weblogging world needs to continue to expand and prosper.
This argument would be quite compelling in a world without XML, but is somewhat moot now that XML is pervasive: the whole point of XML's "self-describing" tags is to allow loose coupling between producers and consumers of data. In one widely held view, this means "all that the producers and consumers of information have to agree on is the XML format, and software that supports it can evolve freely." I'd contend, however, that if a community could agree on the data format there might be little need for XML -- a CSV or ASN.1 or Java serialized object format without tags would work more efficiently, be easier to integrate with procedural code, require less network bandwidth, etc. The problem with agreed-upon formats -- besides the difficulty of achieving agreement, of course! -- is their fragility in the face of inevitable change.
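To make the fragility point concrete, here is a minimal sketch in Python. The record layouts, the field names, and the two "versions" are all made up for illustration; the only claim is that a consumer which reads by position breaks when a field is inserted, while one that reads by tag name does not.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical positional format, version 1: title, link
CSV_V1 = "Hello,http://example.org/1\n"
# Hypothetical version 2: an author field is inserted in the middle
CSV_V2 = "Hello,A. Author,http://example.org/1\n"

def link_from_csv(text):
    # The consumer hard-codes "the second field is the link".
    row = next(csv.reader(io.StringIO(text)))
    return row[1]

# The same two versions with every data item tagged
XML_V1 = "<item><title>Hello</title><link>http://example.org/1</link></item>"
XML_V2 = ("<item><title>Hello</title><author>A. Author</author>"
          "<link>http://example.org/1</link></item>")

def link_from_xml(text):
    # The consumer asks for the element by name and ignores the rest.
    return ET.fromstring(text).findtext("link")
```

With the version 2 data, `link_from_csv` silently returns the author instead of the link, while `link_from_xml` keeps returning the link for both versions.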
The power of XML's tags (namespaced or otherwise) is to allow variation and evolution. The history of RSS bears witness to this very clearly. Even in a world of chronic disagreement, rapid innovation, and several contending "standards," the actual software that syndicates weblogs and aggregates diverse feeds has been remarkably robust. In fact, software to produce and consume (N)Echo appeared almost immediately after . This rapid response wasn't due, AFAIK, to late-night hacking but to simple tweaks to the scripts, stylesheets, etc. that had evolved to support the diverse flavors of RSS previously seen.
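As a rough illustration of the kind of "simple tweak" involved, the sketch below shows a tolerant feed reader in Python. It is not taken from any real aggregator: the set of accepted entry names, the `href` handling, the namespace URI, and both sample feeds are assumptions chosen for the example. The idea is only that a consumer which matches on local element names and skips what it does not recognize needs very little change to accept a new flavor.

```python
import xml.etree.ElementTree as ET

# Assumed for illustration: element names this consumer treats as one entry.
ENTRY_NAMES = {"item", "entry"}

def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]

def extract_entries(xml_text):
    """Collect title/link pairs, ignoring namespaces and unknown elements."""
    entries = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) not in ENTRY_NAMES:
            continue
        record = {}
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                record["title"] = (child.text or "").strip()
            elif name == "link":
                # Accept the URL either as an href attribute or as element text.
                record["link"] = child.get("href") or (child.text or "").strip()
            # Any other child element is simply skipped.
        entries.append(record)
    return entries

# An RSS-style sample with an extra element the consumer has never seen
RSS_STYLE = """<rss version="2.0"><channel><title>Feed</title>
<item><title>Hello</title><link>http://example.org/1</link><mood>fine</mood></item>
</channel></rss>"""

# A hypothetical newer flavor: namespaced, different element names
NEWER_STYLE = """<feed xmlns="http://example.org/hypothetical-ns">
<entry><title>Hello</title><link href="http://example.org/1"/></entry>
</feed>"""
```

Supporting the second flavor here amounts to adding one name to `ENTRY_NAMES` and one attribute lookup; both samples yield the same list of records.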
I don't want to understate the overall business importance of having authoritative "standards" (de jure, de facto, ad hoc, or whatever) in this area. Tim Bray has made a compelling case for this and that seems to have been one of the galvanizing factors in the formation of the (N)Echo community. But whether or not Bray's prototypical Mr. Safe can cope with controversy and diversity, XML's basic technology is definitely up to the job. An eventual "standard" for an RSS-like format will help corporate developers using drag-n-drop IDEs to more easily develop software to produce and process syndication streams, but stasis is not necessary for progress in the XML world. In fact, the evolutionary potential of XML's ability to support loosely coupled applications is its greatest strength.
Sigh, I am not under the illusion that XML instances are "self-describing" in any philosophical sense. XML is "self-describing" only by comparison to alternative formats such as CSV or ASN.1 that require a more rigid data format and do not "tag" individual data items. The XML markup may refer to "namespaces" in which the semantics of specific element names are rigorously defined using an ontology language, or the element names may simply be hints exploited by a heuristic algorithm; either way, they supply additional information that a processor can exploit.