How can I generate and use an FI external vocabulary?

toreba
Joined: 2006-07-03

Hi,

I have been impressed by the performance and compression of FI. I would appreciate an example of generating an external vocabulary, and of serializing/parsing the resulting binary XML. I want to use FI to compress XML before saving it to a database.
The data I am working with is very much like FIX XML.

Thanks

sandoz
Joined: 2003-06-20

Please see this blog:

http://blogs.sun.com/roller/page/sandoz?entry=fast_infoset_1_1_improvements

which explains how to generate an external vocabulary from a schema and an optional set of sample documents.
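
To make "generating a vocabulary from sample documents" a bit more concrete, here is a toy sketch that uses only the JDK's DOM parser. It walks each sample and records the distinct element and attribute local names in first-seen order, which is roughly the raw material a vocabulary's name tables are built from. `SampleNameCollector` is a hypothetical helper, not part of FastInfoset or FastInfosetUtilities; the generator described in the blog also works from the schema, and its API is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper (NOT a FastInfoset API): collects distinct element and
// attribute local names from sample documents, in first-seen order.
class SampleNameCollector {
    final Set<String> elementNames = new LinkedHashSet<>();
    final Set<String> attributeNames = new LinkedHashSet<>();

    void addSample(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() is populated
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            walk(root);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse sample", e);
        }
    }

    private void walk(Element element) {
        elementNames.add(element.getLocalName());
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            // Namespace declarations (xmlns, xmlns:p) are not ordinary attributes.
            if (!XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                attributeNames.add(attr.getLocalName());
            }
        }
        // Recurse into child elements only; text and comments carry no names.
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child);
            }
        }
    }
}
```

A schema-driven generator can also pick up names that never occur in the samples, which is one reason generating from the schema is the preferred route.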

The schema can be W3C XML Schema or Relax NG (although I have not tested the latter).

You will need to use FastInfosetUtilities.jar (which depends on JDK 5.0, whereas FastInfoset.jar depends on JDK 1.4).

It should also be possible to create a Vocabulary instance by hand, but IMHO the best way is to generate it from a schema.

I have yet to add external vocabulary examples to the samples. Please let me know if this is sufficient or if you require a working sample.

Hope this helps,
Paul.

toreba
Joined: 2006-07-03

Thanks, Paul!

I put together the code snippets and was able to print out an external vocabulary. Is there going to be a way to persist an external vocabulary to a file and use it to initialise parsing (deserializing) or serializing?

Because I want to save the FI XML to a database, I would need to persist the external vocabulary as well. Or would that be too risky? Perhaps it doesn't make sense to use an external vocabulary in this case? Improved compression is what I am after.

Thanks for your advice, Paul.

sandoz
Joined: 2003-06-20

There is no code to persist a Vocabulary instance (although it should be easy to write some), and there is no standard explicit representation for an external vocabulary.
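
As a rough idea of what such persistence code could look like, assuming the vocabulary is reduced to ordered tables of element and attribute names, the hypothetical `SimpleVocabulary` below writes its tables to bytes and reads them back. It is not the FastInfoset Vocabulary class, and copying the tables into a real vocabulary instance is not shown. The ordering is the important part: encoded documents refer to names by their position in the tables, so the tables must be reloaded in exactly the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a vocabulary: ordered name tables.
// Position in each list is the index an encoded document would refer to.
class SimpleVocabulary {
    final List<String> elementNames = new ArrayList<>();
    final List<String> attributeNames = new ArrayList<>();

    // Layout: [count, names...] for elements, then the same for attributes.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeTable(out, elementNames);
            writeTable(out, attributeNames);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static SimpleVocabulary fromBytes(byte[] data) {
        SimpleVocabulary vocab = new SimpleVocabulary();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            readTable(in, vocab.elementNames);
            readTable(in, vocab.attributeNames);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return vocab;
    }

    private static void writeTable(DataOutputStream out, List<String> names) throws IOException {
        out.writeInt(names.size());
        for (String name : names) {
            out.writeUTF(name); // modified UTF-8; fine for names under 64 KB
        }
    }

    private static void readTable(DataInputStream in, List<String> names) throws IOException {
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            names.add(in.readUTF()); // appended in stored order, preserving indexes
        }
    }
}
```

To keep it in a file, `Files.write(Paths.get("vocab.bin"), vocab.toBytes())` and later `SimpleVocabulary.fromBytes(Files.readAllBytes(Paths.get("vocab.bin")))` would do (`vocab.bin` is an arbitrary example name).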

If you can always start from the source schema/samples, then generating temporary files or a cache of external vocabularies seems a reasonable solution, as long as the external vocabulary is not stored per document :-)
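
A minimal sketch of the "cache it, but never store it per document" arrangement, assuming each vocabulary can be rebuilt (or read from a cached file) given an identifier such as a URI. `VocabularyCache` and `StoredDocument` are hypothetical names; the loader function stands in for whatever regenerates the vocabulary from the schema/samples.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: one vocabulary per identifier, built on first use and
// then shared by every document that was encoded with it.
class VocabularyCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;

    VocabularyCache(Function<String, V> loader) {
        this.loader = loader;
    }

    V get(String vocabularyUri) {
        // The loader runs only when the identifier is not cached yet.
        return cache.computeIfAbsent(vocabularyUri, loader);
    }
}

// Hypothetical database row: the encoded bytes plus a reference to the
// vocabulary, so the vocabulary itself is stored once, not per document.
final class StoredDocument {
    final String vocabularyUri;
    final byte[] encoded;

    StoredDocument(String vocabularyUri, byte[] encoded) {
        this.vocabularyUri = vocabularyUri;
        this.encoded = encoded;
    }
}
```

The identifier should change whenever the schema or samples change, so that old rows keep pointing at the vocabulary they were encoded with.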

In my experience, external vocabularies only improve things when documents are small and the unique markup makes up a reasonable portion of the document. For large documents with repeating markup, an external vocabulary makes little difference.
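
A back-of-the-envelope model illustrates why. It is deliberately simplified and is NOT the actual Fast Infoset encoding: it assumes a name costs its UTF-8 length the first time it is written literally and one byte (a table index) thereafter, and that an external vocabulary turns every occurrence into an index. The saving is then a fixed amount per document, so its share shrinks as repeated records and content grow. `MarkupCostModel` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy cost model, NOT the real Fast Infoset encoding.
final class MarkupCostModel {
    private MarkupCostModel() {}

    // Bytes spent on markup names for a document that repeats the same record
    // (the given element/attribute names) `records` times.
    static long markupBytes(List<String> recordNames, int records, boolean externalVocabulary) {
        Set<String> seen = new HashSet<>();
        long bytes = 0;
        for (int r = 0; r < records; r++) {
            for (String name : recordNames) {
                if (!externalVocabulary && seen.add(name)) {
                    // First occurrence with no external vocabulary: literal name.
                    bytes += name.getBytes(StandardCharsets.UTF_8).length;
                } else {
                    // Name already in a table (added earlier or external): 1-byte index.
                    bytes += 1;
                }
            }
        }
        return bytes;
    }

    // Fraction of the whole document saved by using an external vocabulary.
    static double savedFraction(List<String> recordNames, int records, int contentBytesPerRecord) {
        long content = (long) contentBytesPerRecord * records;
        long without = markupBytes(recordNames, records, false) + content;
        long withVocab = markupBytes(recordNames, records, true) + content;
        return (without - withVocab) / (double) without;
    }
}
```

With the record names Order, Item, sym and qty and 10 content bytes per record, this model saves 11 of 25 bytes (44%) for a one-record document, but only 11 of 14,011 bytes for 1,000 records.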

If you need maximum compression and speed matters less, then compressing the XML might work for you. Note that you can also compress the FI documents. They should reduce to about the same size as the compressed XML documents, but there is less data to compress and uncompress.
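
For the database case, compressing the encoded bytes before storing them is straightforward with the JDK's `java.util.zip` GZIP streams. A minimal sketch; `GzipCodec` is a hypothetical helper, and its input would be whatever bytes the FI serializer (or plain XML output) produced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip a byte[] before storing it, gunzip after loading.
final class GzipCodec {
    private GzipCodec() {}

    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream finishes the stream and writes the trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The compressed bytes can then go into a BLOB/binary column; the reverse path is `decompress` followed by the FI parser (or an XML parser).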