Binary XML vs Binary Data in XML...

11 replies
pelegri
Joined: 2003-06-06

Every now and then somebody, intentionally or not, confuses efficient encoding of binary data in XML with binary XML; here is a recent posting: http://www.theserverside.com/news/thread.tss?thread_id=31940.

To clarify the situation, JAX-RPC 2.0 will be supporting the new W3C specs for efficient encoding of binary data, and the reference implementation (http://jax-rpc.dev.java.net) will *also* support the Fast Infoset standard, using the implementation in the FI (http://fi.dev.java.net) project.

andreasst
Joined: 2004-04-26

> > (BTW There is the danger that "enhanced features" for
> > Webservices in some Application servers could turn
> > the theoretical advantage of decoupling to a farce).
>
> Can you give some examples of these advanced features?

Well, maybe we can argue whether those are "enhanced features". Some are maybe more a "lack of features" or "doesn't conform to the spec" ;-).

Example: We can generate web services from EJBs. At first this sounds good and handy, but it can have some consequences:
- At the moment there is a chance that a web service generated this way only works with the application server that generated it. So in the worst case (and we have examples of that), we need libraries from this application server on the client side to have a working solution. So where is the decoupling then?
- If we want an interoperable, handy web service, we start by defining the WSDL first and then generate the web service from it. If we don't do that, the chance is there that the WSDL gets rather complex and doesn't work across multiple technologies (Java, .NET, Perl...). Then we have neither interoperability nor decoupling. Too bad that the application server we use can't generate an EJB-driven service from a WSDL.
- Some application servers are web-service version-sticky. So if we update the application server, we must move all clients to a new web-service version, because all web services will be regenerated in a new version. It is not possible to keep the old version and a new version in parallel, which would make migration scenarios more flexible. For me that is not the meaning of decoupling things.

After all, the ws-i.org initiative is a result of those problems. So I hope things are getting better... ;-)

Cheers

Andrew

sandoz
Joined: 2003-06-20

> > > (BTW There is the danger that "enhanced features" for
> > > Webservices in some Application servers could turn
> > > the theoretical advantage of decoupling to a farce).
> >
> > Can you give some examples of these advanced features?
>
> Well, maybe we can argue if that are "enhanced features". Some are
> maybe more "lack of features" and "doesn't conform to the spec" ;-).
>

I read "enhanced" but wrote "advanced" for some reason! the mind warps in mysterious ways....

> Example: We can generate webservices from EJB's. At
> first, this sounds good and handy, but can have some
> consequences:
> - At the moment there is the chance, that a so
> generated webservice only works with the application
> server which generated it. So in the worst case (and
> we have examples of that), we need libraries from
> this application server on the client side to have a
> working solution. So where is decoupling then?

Bad!

> - If we will have a interoperable, handy webservice,
> we start first defining the WSDL and then generate
> the webservice out of it. If we don't do that, the
> chance is there that the WSDL gets rather complex and
> doesn't work over multiple technologies (Java, .net,
> Perl...). So then we don't have interoperability and
> we don't have decoupling. Too sad that the
> Application server we use can't generate an ejb
> driven service out of a WSDL.

Not good.

You might want to check out an early draft of JAX-RPC 2.0 Java to WSDL 1.1 mapping (Chapter 3):

http://weblogs.java.net/blog/mhadley/archive/2005/02/jaxrpc_20_early_1.html

to see if this is an improvement. The JSR expert group would like to hear feedback!

> - Some application server are webservice
> version-sticky. So if we update the application
> server, we must change all clients to a new
> webservice version because all webservices will be
> regenerated in a new version. It is not possible to
> keep the old version and a new version in parallel.
> That would make migration scenarios more flexible.
> For me that is not the meaning of decoupling things.
>

Even worse!!

> After all, the ws-i.org initiative is a result out of
> that problems. So I hope things coming better... ;-)
>

Yes. I think the Basic Profile group has done very good work at locking down the SOAP 1.1 and WSDL 1.1 specifications (or W3C Notes if one is pedantic).

Paul.

andreasst
Joined: 2004-04-26

> Thanks for your replies. Now I think I understand the
> whole picture better. Instead of having solely a text
> representation of an XML document, we can now choose
> to have an FI representation, which is more efficient
> in space and can still be accessed/traversed using
> well known APIs like DOM and SAX or XSLT.

The biggest parsing performance gain would come from not converting native binary representations, such as numbers and other binary data, to strings at all. SAX, DOM and almost all other XML APIs are string based. I don't see a big performance benefit in converting an integer to a string, to a binary representation, back to a string and back to an integer.

So I think the chance for a binary representation of an XSD-based XML document with performance and resource benefits is there when you can access the native types directly. For that, you need a different interface that doesn't belong to the standard DOM and SAX.
But then you lose the advantage of the already proven standard interfaces for XML. In my opinion, XSD-based XML could have a different, more datatype-specific interface for access.
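
To make that round trip concrete, here is a rough SAX sketch (the "price" element is just a made-up example): even if the wire format stored the value in binary, a purely string-based API hands the application characters that must be parsed back into an int.

// Rough sketch only: a string-based API such as SAX always delivers characters,
// so the receiver has to parse the text back into an int, mirroring the
// int -> String conversion the sender already performed.
import org.xml.sax.helpers.DefaultHandler;

public class PriceHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);            // the API only ever sees characters
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("price".equals(localName)) {           // "price" is a made-up element name
            int price = Integer.parseInt(text.toString().trim());
            System.out.println("price = " + price);
        }
        text.setLength(0);
    }
}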

> In my applications, where I use XML to store data in
> a way I can easily modify by hand, I can/should not
> use FI. But WS can benefit a lot from it due to the
> data transmission performance increase.
>
> So far, so good.

It is possible that data transmission is an issue. However, in today's intranets, where we talk about gigabit Ethernet almost everywhere, I don't see network transport as the most important performance headache.

> I can't still see why "WS represents an evolutionary
> step further than CORBA". I see it as an alternate
> technology, more appropiate in some scenarios, and
> less appropiate in others. WS just had the right
> sponsors, and along with XML they have been
> overstimated and overused.

Well, there are some true points. CORBA has standardized solutions in areas where WS is still not at the point of being called "stable", or is incomplete in a direct comparison: e.g. XA transactions, CSIv2 security, message lifetime, fault tolerance... not to mention all the already standardized CORBA services.

The advantage of web services is decoupling. Web services couple only in the area of agreements and messages (to define a service).
CORBA also couples in the areas of programming languages, object model and application servers.
(BTW, there is the danger that "enhanced features" for web services in some application servers could turn the theoretical advantage of decoupling into a farce.)

Having said that, there are other, binary-based standards besides XML that could also decouple in that way (e.g. ASN.1). But XML is easy to read and understand, which also makes it a lot easier to debug.
In my opinion that's the main reason why the base of web services is string-based XML and not some binary standard.

> Sorry again as this may not be the right topic/forum
> where to post this question.

I think it is the right forum.

Cheers

Andrew

sandoz
Joined: 2003-06-20

> The most performance in parsing would be if the
> native binary presentation like numbers and other
> binary data wouldn't be converted to strings. SAX,
> DOM and almost all other XML-API's, they're all
> string based. I don't see that big performance
> benefit to convert a integer to a string to a binary
> presentation back to a string to a integer.
>

Yes, especially for arrays of floating point numbers.

However, there is still benefit in parsing a fast infoset document compared to parsing an XML document.

Fast Infoset specifies encoding algorithms that, for example, define how strings can be mapped to and from binary data for arrays of integers and IEEE floating point numbers. Conceptually a string is still used, but an implementation can go from an in-memory representation of integers to the binary bits without producing intermediate characters.
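
Conceptually the gain looks like this (a sketch only; the actual Fast Infoset algorithms define their own octet layouts):

// Conceptual sketch: an array of IEEE floats goes straight to bytes and back,
// skipping the textual form entirely. Not the spec's real encoding.
import java.nio.ByteBuffer;

public class FloatArraySketch {
    static byte[] toBinary(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4);
        for (float v : values) {
            buf.putFloat(v);              // 4 bytes per value, no characters produced
        }
        return buf.array();
    }

    static float[] fromBinary(byte[] octets) {
        ByteBuffer buf = ByteBuffer.wrap(octets);
        float[] values = new float[octets.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getFloat();   // straight back to an in-memory float
        }
        return values;
    }

    public static void main(String[] args) {
        float[] samples = {1.5f, -2.25f, 3.0f};
        System.out.println(toBinary(samples).length + " octets instead of the text \"1.5 -2.25 3.0\"");
    }
}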

> So I think the chance for a binary representation of
> a XSD-based XML document with performance and
> resource benefits is there when you can access the
> native types directly. For that, you need a different
> interface which doesn't belong to the standards DOM
> and SAX.

The Fast Infoset project is in the process of developing an API for SAX. The Fast Infoset SAX serializer and SAX parser currently support this API for the primitive types byte[], int[] and float[].

If the extended APIs are not used by a client then characters are always returned even if the data is in binary form.
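
For illustration, such an extended handler could look roughly like this (a hypothetical sketch; the actual interface and method names in the FI project may differ):

// Hypothetical sketch of an extended SAX content handler for primitive types;
// the real interface and method names in the FI project may differ.
public interface PrimitiveContentHandlerSketch extends org.xml.sax.ContentHandler {
    // Called instead of characters() when the content was encoded in binary form.
    void bytes(byte[] b, int start, int length);
    void ints(int[] i, int start, int length);
    void floats(float[] f, int start, int length);
}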

> But then you'll leave the advantage of the already
> proofen Standard interfaces for XML. In my opinion
> XSD based XML could have a different, more
> datatype-specific interface for access.
>

Some of this can be hidden if using a data-binding API like JAXB. I will be investigating how Fast Infoset can be integrated into JAXB 2.0 soon.
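
As a hedged sketch of what that could look like (PurchaseOrder is a made-up class, and the Fast Infoset serializer class and method names are assumptions based on the FI project):

import java.io.FileOutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class BindToFastInfoset {
    @XmlRootElement
    public static class PurchaseOrder {            // made-up bound class for illustration
        public String item = "widget";
        public int quantity = 3;
    }

    public static void main(String[] args) throws Exception {
        Marshaller m = JAXBContext.newInstance(PurchaseOrder.class).createMarshaller();

        // Assumed class from the FI project, used here as an org.xml.sax.ContentHandler.
        com.sun.xml.fastinfoset.sax.SAXDocumentSerializer fi =
                new com.sun.xml.fastinfoset.sax.SAXDocumentSerializer();
        fi.setOutputStream(new FileOutputStream("order.finf"));

        // The application code is unchanged; only the ContentHandler differs, so the
        // same object could just as well be marshalled to plain textual XML.
        m.marshal(new PurchaseOrder(), fi);
    }
}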

> Well, there are some true points. CORBA has some
> standardized solutions in areas which WS still is not
> on the point to be called "stable" or incomplete (in
> a direct comparison). E.g. XA transactions, CSIv2
> Security, Message-lifetime, fault tolerance... not to
> mention all that already standardized CORBA
> services.
>
> The advantage of webservice is decoupling.
> Webservices couple on the area of Agreements and
> Messages (to define a service).

Agreed, and also the concept of different message exchange patterns defined in WSDL 2.0.

> CORBA has also coupling in the areas of programming
> languages, object model, Application servers.
> (BTW There is the danger that "enhanced features" for
> Webservices in some Application servers could turn
> the theoretical advantage of decoupling to a farce).
>

Can you give some examples of these advanced features?

> Having said that, there are other, binary based
> standards beside of XML which could also decoupling
> that way (e.g. ASN).

The Fast Web Services standard is heavily based on ASN.1, the mapping of XSD to ASN.1 and the PER encoding.

> But XML is easy to read and
> understand. For that it's also alot easier to debug
> it.
> In my opinion that's the main reason why the base of
> Webservices is string based XML and not some binary
> standard.
>

Respectfully, I disagree. Debugging is advantageous before a system is deployed, but not when there are hundreds of transactions a second. Debugging is what the tool vendors should do, not the deployers.

I think it is for two reasons:

1) Self-description; and
2) Decoupling from schema.

Although the intermediary model is not yet fully proven, partial understanding and updating of a message by an intermediary is a very powerful concept.

This allows one to extend the SOAP protocol with SOAP header blocks. It is what allows security solutions to work the way they do.
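
For example, here is a minimal SAAJ sketch of adding a custom header block (the header's name, namespace and contents are made up for illustration):

import javax.xml.soap.MessageFactory;
import javax.xml.soap.SOAPEnvelope;
import javax.xml.soap.SOAPHeaderElement;
import javax.xml.soap.SOAPMessage;

public class HeaderBlockSketch {
    public static void main(String[] args) throws Exception {
        SOAPMessage message = MessageFactory.newInstance().createMessage();
        SOAPEnvelope env = message.getSOAPPart().getEnvelope();

        // Add an extension header block; "Route" and its namespace are made up.
        SOAPHeaderElement route = env.getHeader().addHeaderElement(
                env.createName("Route", "r", "http://example.com/routing"));
        route.setMustUnderstand(true);             // an intermediary must process it or fault
        route.addTextNode("urn:example:next-hop");

        message.writeTo(System.out);               // envelope now carries the extension header
    }
}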

> > Sorry again as this may not be the right topic/forum
> > where to post this question.
>
> I think it is the right forum.
>

Agreed.

Paul.

pelegri
Joined: 2003-06-06

re: why use WS instead of CORBA...

In addition to Paul's arguments, there is also the "I have a hammer, thus all problems are nails" effect. I sometimes also describe it as: "Programmers are lazy; they prefer to reuse concepts and machinery in as many situations as possible rather than use different tools for each case".

- eduard/o

sandoz
Joined: 2003-06-20

>As far as I know, before Web Services and other xml-based
>communication tecnologies, there were (and still are)
>pretty good binary-based transmission technologies and
>protocols, more-or-less standard, like DCOM, RMI and
>CORBA.

Agreed, they are capable technologies, but they follow a different architecture to that of Web services.

The Web services model tends to be more flexible (or loosely-coupled) for transport, asynchrony, intermediaries and protocol (extensions to SOAP e.g. reliable messaging and security).

In addition, I would add that the encoding of the information that is transmitted can also be a choice, depending on how loosely coupled you want to be. FI represents an alternative encoding but still allows for loose coupling and reuse of all the other Web services concepts (WSDL, XSD, BPEL, XSLT, security and encryption).

>I thought Web-Services appeared to make it easier for
>applications to inter-operate across internet:

Yes.

> - it is text (xml) based so it can work using port 80
>and bypass firewalls;

Port 80 does not care whether HTML documents encoded in UTF-8 or UTF-16, GIF or PNG images, UTF-8-encoded XML documents, or fast infoset documents are transmitted over it.

>- it is programming language independent, like CORBA;
>- it is easier for a human to read what is being transferred;

Agreed. Although when there are, say, 60 transactions a minute being processed, the last thing you want is a human in the loop reading the messages, slowing the transaction rate to 1 a minute and adding the potential for human error!

I think the advantage of human readability is for debugging the system, and for logging when some messages need to be looked at some time later (e.g. for legal reasons).

When the system is deployed a more optimal encoding can be used. Fast infoset documents can be easily converted to XML documents. A network interposing tool could easily be modified to view fast infoset documents as XML based on the MIME type.
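
For example, such a tool could do the conversion with the JAXP identity transform and the FI project's SAX parser (the SAXDocumentParser class name, and its use as an XMLReader, are assumptions here):

import java.io.FileInputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class FastInfosetToXml {
    public static void main(String[] args) throws Exception {
        // Assumed class from the FI project, used here as an org.xml.sax.XMLReader.
        XMLReader fiReader = new com.sun.xml.fastinfoset.sax.SAXDocumentParser();
        SAXSource source = new SAXSource(fiReader, new InputSource(new FileInputStream(args[0])));

        // The identity transform re-serializes the parsed infoset as ordinary XML text.
        TransformerFactory.newInstance().newTransformer().transform(source, new StreamResult(System.out));
    }
}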

>We all knew it was going to be slower.
>
>Now, it seems FI tries to translate XML to binary for
>performance reasons. So now, lets say I'm working with
>Java and need to send data. I need first to "serialize"
>my data into XML and then this data is "serialized" again
>into binary with FI. It looks like a double
>serialization.

Fast Infoset specifies a binary encoding of the XML Information Set, not of XML documents. An XML infoset is what you get after parsing an XML document; an XML infoset is also what you get after parsing a fast infoset document.

There are many programmatic representations of an XML infoset using Java APIs: SAX, StAX, DOM, JAX-RPC, JAXB, XOM.

Check out https://fi.dev.java.net/ for serializers and parsers to produce and consume fast infoset documents using standard APIs like SAX, StAX and DOM. There are also scripts to convert between fast infoset documents and XML documents.

We are using these parsers/serializers with JAX-RPC and SAAJ and at no point is XML converted to FI and back again.

>Wouldn't it be easier, if we really need the performance
>increase, to go back to CORBA, which is in turn much more
>mature and feature-rich than WS?

My understanding is that WS represents an evolutionary step further than CORBA based on the issues that CORBA did not solve very well. The encoding of information is just one important aspect of the WS architecture.

One thing we have been trying to achieve from the outset of the 'Fast' work is to allow for more performance without completely changing the Web services architecture.

People want to reuse all the same tools for WSDL, XSD, etc., but in some circumstances do not want the increase in bandwidth and processing. There is a solid business case for consolidating tools so that there is one way to develop and deploy services. Using the same tools while allowing a choice of encoding IMHO offers the best of both worlds.

Paul.

pelegri
Joined: 2003-06-06

Hey, Paul and I gave the same answers... and without coordinating! It must be true! :-) :-)

- eduard/o

eduardj
Joined: 2004-06-21

Thanks for your replies. Now I think I understand the whole picture better. Instead of having solely a text representation of an XML document, we can now choose to have an FI representation, which is more efficient in space and can still be accessed/traversed using well known APIs like DOM and SAX or XSLT.

In my applications, where I use XML to store data in a way I can easily modify by hand, I cannot (or should not) use FI. But WS can benefit a lot from it due to the increase in data transmission performance.

So far, so good.

I still can't see why "WS represents an evolutionary step further than CORBA". I see it as an alternative technology, more appropriate in some scenarios and less appropriate in others. WS just had the right sponsors, and along with XML it has been overestimated and overused.
So my question is: will the use of FI, with its performance increase, be enough to make WS more appropriate than CORBA (or other technologies) in most scenarios? Will WS be more efficient than the others? Sorry for this unfair question: "most scenarios" is not really specific, but I'd like to know your opinion.

For instance, I have only used WS once. It was three years ago. We had a relational DB, a C# server and an ASP.NET client, with client and server communicating through Web Services. The data was not XML and, in my opinion, should not have been. WS was using XML for communication purposes, but for us, the developers, that didn't really matter (except for performance headaches). I can only think of one reason to use WS here: the customer had read somewhere about the coolness of WS and was willing to pay more to have it.

Sorry again as this may not be the right topic/forum where to post this question.

Thanks,

Edu

sandoz
Joined: 2003-06-20

>In my applications, where I use XML to store data in a
>way I can easily modify by hand, I can/should not use FI.

If your documents need to be edited by hand but are parsed multiple times, you could get benefits by maintaining cached copies of FI documents converted from the XML documents.
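
A rough sketch of that caching step, assuming the FI project's SAX serializer (the class and method names are assumptions, and the file names are just examples):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;

public class CacheAsFastInfoset {
    public static void main(String[] args) throws Exception {
        // Assumed class from the FI project, used here as an org.xml.sax.ContentHandler.
        com.sun.xml.fastinfoset.sax.SAXDocumentSerializer fi =
                new com.sun.xml.fastinfoset.sax.SAXDocumentSerializer();
        fi.setOutputStream(new FileOutputStream("config.finf"));

        // Identity transform: parse the hand-edited XML once and emit a fast infoset
        // copy that later runs can parse more cheaply.
        TransformerFactory.newInstance().newTransformer()
                .transform(new StreamSource(new FileInputStream("config.xml")), new SAXResult(fi));
    }
}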

It is possible to easily read a fast infoset document by running the script:

fitosaxtoxml | vi -

There is probably a way in vi to save the contents of the file (though I don't know how to do it) by running the script:

xmltosaxtofi

but it is not very elegant!

>I can't still see why "WS represents an evolutionary step
>further than CORBA".

Because it solves problems relevant today that CORBA does not solve very well, just like CORBA solved yesterday the problems that XDR/RPC did not solve very well the day before yesterday.

>I see it as an alternate technology, more appropiate in
>some scenarios, and less appropiate in others.

I tend to view CORBA as being useful in tightly coupled scenarios. If it is sufficient for what you want to do then use it.

However, it is possible to use WS in tightly coupled scenarios if you change the encoding of information. We have an optimal encoding we refer to as Fast Schema that does just that. The benefit is reuse of the same concepts and tools. Then it is about making trade-offs between the properties of the encoding and performance. Once those trade-offs are accepted, it becomes a deployment issue rather than a development-and-deployment issue of having to use two different architectures and platforms.

> WS just had the right sponsors, and along with XML they
> have been overstimated and overused.

There may be some truth in that, but I also think there is a genuine desire to increase interoperability and improve on the existing generation of distributed communication systems.

XML documents are just one aspect of a whole bunch of WS specifications that define how to communicate, just like CDR is just one aspect of a whole bunch of CORBA specifications that define how to communicate.

>So my question is, will the use of FI, with its
>performance increase, be enough to make WS more
>appropiate than CORBA (or other technologies) in most
>escenarios? Will WS be more efficient than the others?
>Sorry for this unfair question: "most escenarios" is not
>really specific, but I'd like to know your opinion.

It is a fair question, and to be honest it remains to be seen. Tests we have performed show that it is possible to get close to RMI/IIOP performance using Fast Infoset, but to get to RMI performance a more optimal encoding (Fast Schema) is required that drops the self-describing property.

Thus it may be sufficient in some scenarios but not in others.

Paul.

eduardj
Joined: 2004-06-21

Sorry, I don't see the motivation of this XML to binary stuff.

As far as I know, before Web Services and other XML-based communication technologies, there were (and still are) pretty good binary-based transmission technologies and protocols, more or less standard, like DCOM, RMI and CORBA.
I thought Web Services appeared to make it easier for applications to interoperate across the Internet:
- it is text (XML) based, so it can work over port 80 and bypass firewalls;
- it is programming-language independent, like CORBA;
- it is easier for a human to read what is being transferred.

We all knew it was going to be slower.

Now, it seems FI tries to translate XML to binary for performance reasons. So now, let's say I'm working with Java and need to send data. I first need to "serialize" my data into XML, and then this data is "serialized" again into binary with FI. It looks like a double serialization.

Wouldn't it be easier, if we really need the performance increase, to go back to CORBA, which is in turn much more mature and feature-rich than WS?

Thanks,

Edu

pelegri
Joined: 2003-06-06

> Sorry, I don't see the motivation of this XML to binary stuff.

I'll get back to that at the end...

> Now, it seems FI tries to translate XML to binary for performance reasons. So now, lets say I'm working with Java and need to send data. I need first to "serialize" my data into XML and then this data is "serialized" again into binary with FI. It looks like a double serialization.

There are two types of scenarios. In one, the peer has been manipulating XML content all along, so you really are not moving from a Java type to XML. That scenario is going to become more and more prevalent.

In the other scenario you do start with a Java type. But there is not really a double serialization: the FI encoding happens as you create the XML representation [the encoding for Fast Schema is actually more complex, as a different, less redundant encoding is created - but that is a longer discussion].

> Wouldn't it be easier, if we really need the performance increase, to go back to CORBA, which is in turn much more mature and feature-rich than WS?

That's a fair question. There are two answers: technical reasons and business reasons.

Technically, WS has been focused on structured documents and on interoperability from the beginning. That means, for example, that there are plenty of agents that rely on partial information in the requests to deal with partial bindings of data & updates, version mismatches, management, routing, monitoring, etc., etc.

Business-wise, there are two big reasons. One is interop across the Java and MS platforms. The other is the price point of a widely available technology.

We have talked with many customers that want to use XML and WS tools to solve their problems, but for whom textual encoding is just not practical.

> Thanks,
>
> Edu

Hope that helps some,

- eduard/o