
Encoding for FI

jdmcduf
Offline
Joined: 2004-12-23

Where can I find the schema for FI? I would like to see it to better understand the encoding. I would also like to know what gets padded, etc. Thanks.

sandoz
Offline
Joined: 2003-06-20

Fast Infoset specifies a number of built-in encoding algorithms. The following can be mapped to primitive types:

boolean
short
int
long
float
double
uuid

The last, uuid, strictly speaking may not be considered a primitive type, but a UUID can be mapped to a 'long long' or two 'long's if a 128-bit integer type is not supported programmatically.
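To make the "two longs" mapping concrete, here is a minimal Java sketch. It only shows the in-memory mapping between a UUID and two 64-bit values using the standard java.util.UUID class; the class and method names (UuidAsTwoLongs, toLongs, fromLongs) are made up for illustration and are not part of the FI implementation.

```java
import java.util.UUID;

// Illustrative only: the class and method names are hypothetical.
final class UuidAsTwoLongs {

    /** Splits a UUID into its most and least significant 64 bits. */
    static long[] toLongs(UUID uuid) {
        return new long[] { uuid.getMostSignificantBits(), uuid.getLeastSignificantBits() };
    }

    /** Rebuilds the UUID from the two halves produced by toLongs. */
    static UUID fromLongs(long[] halves) {
        return new UUID(halves[0], halves[1]);
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        long[] halves = toLongs(original);
        System.out.println(Long.toHexString(halves[0]) + " " + Long.toHexString(halves[1]));
        // The two longs carry all 128 bits, so the UUID is recovered exactly.
        System.out.println(fromLongs(halves).equals(original));
    }
}
```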

Two further built-in encoding algorithms are:

hexadecimal
base64

that can be mapped to a byte primitive type. These names reflect the pattern of characters specified by the algorithms. For example, the hexadecimal encoding algorithm may only be applied to a sequence of characters that contains only characters '0' to '9' and 'A' to 'F'.
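As a rough illustration of why the pattern matters, the following Java sketch converts between the character form and bytes for the hexadecimal case. It assumes two characters per byte and accepts only '0' to '9' and 'A' to 'F' as described above; the class and method names are hypothetical, and this is not the code from the FI project.

```java
// Illustrative only: HexAlgorithmSketch and its methods are hypothetical names.
final class HexAlgorithmSketch {
    private static final String DIGITS = "0123456789ABCDEF";

    /** Characters to bytes; rejects anything outside '0'-'9' and 'A'-'F'. */
    static byte[] charsToBytes(String chars) {
        if (chars.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[chars.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = DIGITS.indexOf(chars.charAt(2 * i));
            int lo = DIGITS.indexOf(chars.charAt(2 * i + 1));
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("character outside 0-9, A-F");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Bytes back to characters, always using upper-case digits. */
    static String bytesToChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(DIGITS.charAt((b >> 4) & 0xF)).append(DIGITS.charAt(b & 0xF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = charsToBytes("CAFE01");
        System.out.println(bytes.length + " bytes, back to chars: " + bytesToChars(bytes));
    }
}
```

Rejecting lower-case input is deliberate in this sketch: "cafe" and "CAFE" would decode to the same bytes, so only one of them could be reproduced exactly from those bytes.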

fastws
Offline
Joined: 2004-01-17

> Where can I find the schema for FI. I would like to
> see it to better understand the encoding. I would
> also like to know what gets padded etc. Thanks.

Paul, regarding the encoding algorithms (which you are currently implementing) that support the various types, e.g. int, long, ...: does this mean that when the parser encounters an XML element containing a string like "12345", the string gets encoded as 3039 hex?

If not, can you explain in a bit more detail how the encoding algorithms supporting primitive types are used? Thanks.

sandoz
Offline
Joined: 2003-06-20

Encoding algorithms are a mechanism to optimize size and/or processing of content. The use of encoding algorithms by a serializer is optional.

An encoding algorithm is specified according to a pattern of characters. A sequence of characters conforming to the pattern may be encoded into a sequence of bits. The sequence of bits may be decoded to produce the exact same sequence of characters.

Encoding algorithms may be used in two scenarios:

1) Converting an XML document to a fast infoset document.
2) Directly creating a fast infoset document.

In scenario 1, pattern matching may be performed to ascertain which encoding algorithm applies to which sequence of characters. Since encoding algorithms round-trip, it is guaranteed that the content of the XML document produced from the fast infoset document is the same as the original.
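A possible shape for the scenario 1 check, as a sketch only: the regular expression below is an assumed pattern (decimal integers separated by single spaces), not the pattern from the specification, and the class and method names are hypothetical. The idea is to accept a character sequence for the int algorithm only if decoding and re-encoding would reproduce exactly the same characters.

```java
import java.util.regex.Pattern;

// Illustrative only: the pattern and all names here are assumptions.
final class IntPatternSketch {
    // Assumed lexical form: decimal ints separated by exactly one space.
    private static final Pattern INT_LIST = Pattern.compile("-?\\d+( -?\\d+)*");

    /** True if the characters can be turned into ints and back without any change. */
    static boolean roundTripsAsInts(String chars) {
        if (!INT_LIST.matcher(chars).matches()) {
            return false;
        }
        StringBuilder rebuilt = new StringBuilder();
        for (String token : chars.split(" ")) {
            int value;
            try {
                value = Integer.parseInt(token);
            } catch (NumberFormatException e) {
                return false; // does not fit in 32 bits
            }
            if (rebuilt.length() > 0) {
                rebuilt.append(' ');
            }
            rebuilt.append(value);
        }
        // "007" would come back as "7", so it must stay as plain characters.
        return rebuilt.toString().equals(chars);
    }

    public static void main(String[] args) {
        System.out.println(roundTripsAsInts("1234 865456 34234 456456"));
        System.out.println(roundTripsAsInts("007"));
    }
}
```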

In scenario 2, a program has a native representation of some content in memory and goes straight to the bits as specified by the encoding algorithm. The conceptual process of creating a sequence of characters and then producing a sequence of bits is not required.

Example: A program has an array of integers in memory that are to be encoded as the content of an attribute value.

If an XML document is to be produced then the array of integers is encoded to a sequence of characters e.g. "1234 865456 34234 456456".

If a fast infoset document is to be produced then the array of integers is encoded as specified by the int encoding algorithm. Each integer is encoded into four bytes with the most significant byte first.
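The two paths in this example can be sketched in Java as follows. This is not the code from the FI repository; the class and method names are hypothetical, and only the int layout described above (four bytes per integer, most significant byte first) is shown, without any of the surrounding fast infoset document structure.

```java
import java.nio.ByteBuffer;

// Illustrative only: IntAlgorithmSketch and its methods are hypothetical names.
final class IntAlgorithmSketch {

    /** XML path: the array as a space-separated sequence of characters. */
    static String toXmlChars(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    /** Binary path: four bytes per int, most significant byte first. */
    static byte[] encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length); // big-endian by default
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    /** Reverses encode. */
    static int[] decode(byte[] octets) {
        if (octets.length % 4 != 0) {
            throw new IllegalArgumentException("length is not a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(octets);
        int[] values = new int[octets.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {1234, 865456, 34234, 456456};
        System.out.println(toXmlChars(values));
        // 12345 is 0x3039, so its four bytes are 00 00 30 39.
        for (byte b : encode(new int[] {12345})) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

So the string "12345" from the earlier question would end up as the bytes 00 00 30 39 when the int algorithm is applied to it.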

See:

https://fi.dev.java.net/source/browse/fi/FastInfoset/src/com/sun/xml/fas...

fastws
Offline
Joined: 2004-01-17

Thanks for the previous response. I am still unsure where Scenario 2 is used, especially when I am creating an FI document. Will this happen in the context of using a DOM API?

If so, how would that work?
Element customer = doc.createElement("customer");

Attr custIDAttribute = doc.createAttribute("custID");
String s1 = String.valueOf(tr2.getCustID());
// Does the DOM API change here? Can I add an array of
// ints, for example, instead of a String, or does that
// happen further down?
custIDAttribute.setValue(s1);

sandoz
Offline
Joined: 2003-06-20

The DOM API needs to be extended.

Working out what algorithms apply to what sequence of characters without some hints is a tricky problem to solve in a performant manner and also may have some unwanted side-effects.

Hints also mean extensions. So it is probably better to extend the API to manage the primitive arrays directly.

Potentially the DOM L3 API could be used, as it has a way to add generic object data to a node. But I would prefer explicit extensions that still result in compatible use with XML (i.e. characters are returned if requested), for example extending the Attr and Text interfaces accordingly. Any help on design and implementation would be much appreciated here.
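For reference, the DOM L3 route could look roughly like the sketch below. Node.setUserData and Node.getUserData are standard DOM Level 3 methods; everything else is an assumption for illustration: the key name, the helper class, and the idea that an FI serializer would look for the attached array. No serializer in the FI project is claimed to do this.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative only: the key and helper names are hypothetical.
final class DomUserDataSketch {
    static final String INT_ARRAY_KEY = "example.fi.intArray"; // made-up key

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an attribute that carries both the characters and the native int array. */
    static Attr createIntArrayAttribute(Document doc, String name, int[] values) {
        Attr attr = doc.createAttribute(name);
        StringBuilder chars = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                chars.append(' ');
            }
            chars.append(values[i]);
        }
        attr.setValue(chars.toString());                       // XML-compatible view
        attr.setUserData(INT_ARRAY_KEY, values.clone(), null); // native view for an FI-aware writer
        return attr;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element customer = doc.createElement("customer");
        doc.appendChild(customer);
        Attr custId = createIntArrayAttribute(doc, "custID", new int[] {1234, 865456});
        customer.setAttributeNode(custId);
        System.out.println(custId.getValue());
        int[] ints = (int[]) custId.getUserData(INT_ARRAY_KEY);
        System.out.println(ints.length);
    }
}
```

The drawback of this route is that nothing in the API ties the user data to the character value, so the two can drift apart, which is one argument for explicit extensions instead.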

I am currently working on extensions to the SAX API to support encoding algorithms. The generic framework is in place and built-in algorithms for int and float are working. Please look at the unit test class:

https://fi.dev.java.net/source/browse/fi/FastInfoset/test/algorithm/Algo...

and the method:

testBuiltInAlgorithms

to see how a fast infoset document is constructed and parsed when using the extensions.

We are slowly working out the best SAX API to use here.

Also

- the unit test code still relies on an internal implementation of the attributes holder that I need to replace with a non-implementation-dependent one for client use.

- the use of the SAX API for the creation of fast infoset documents is not recommended for the novice user. Currently it is very easy to create invalid fast infoset documents if methods are not called in the correct order or are not supplied with consistent information.

Paul.

sandoz
Offline
Joined: 2003-06-20

As Eduardo says in a previous post, the examples document should give you a good idea of the padding.

The tables that present the encoding details highlight the padding and underline the associated padding bits.

Paul.

pelegri
Offline
Joined: 2003-06-06

Paul will know for sure, but I think the actual spec is private until it goes final, which should be very soon. I don't know whether the encoding itself is public.

There is quite a bit of information at http://fi.dev.java.net. Besides the actual code, there is a document showing examples of encodings at:
http://asn1.elibel.tm.fr/xml/example-of-Fast-Infoset-for-UBL.htm

- eduard/o

sandoz
Offline
Joined: 2003-06-20

To get the spec you or your organisation needs to be a member of the ITU-T or ISO.

The ISO ballot is now over and it is at the Enquiry stage, 40.60 'Voting summary dispatched' [1].

The next stage is to go to Last Call in the ITU-T, after which, when comments have been resolved (if my understanding is correct), we can pre-publish on the ITU-T site. The pre-published spec will be available for free download. We are also investigating the possibility of publishing as HTML (usually it is a PDF document).

Then we move to FDIS in ISO.

I do not have the exact dates on hand, but IIRC we should be finished by mid to late summer.

Paul.

[1] http://www.iso.org/iso/en/widepages/stagetable.html#40