FastInfoset failure for writing large documents

rragno

The FastInfoset implementation appears to fail when writing many nodes. The error is:

Exception in thread "main" javax.xml.stream.XMLStreamException: java.io.IOException: Integer > 1,048,576
at com.sun.xml.fastinfoset.stax.StAXDocumentSerializer.encodeTerminationAndCurrentElement(StAXDocumentSerializer.java:630)
at com.sun.xml.fastinfoset.stax.StAXDocumentSerializer.writeEndElement(StAXDocumentSerializer.java:271)

The same occurs with SAX.

Other FastInfoset implementations handle the same documents fine, so this does not appear to be an essential limitation of the format.

Is this normal? Is there any way around this?
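
Roughly, a minimal sketch of the kind of program that triggers it (illustrative only; the class and file names are made up) is one that writes more than 2^20 distinct indexed strings, for example distinct element names:

import java.io.FileOutputStream;
import com.sun.xml.fastinfoset.stax.StAXDocumentSerializer;

public class FILargeDocumentRepro {
    public static void main(String[] args) throws Exception {
        // Write a Fast Infoset document containing more distinct element
        // names than the 1,048,576 entries an indexing table may hold.
        StAXDocumentSerializer writer =
                new StAXDocumentSerializer(new FileOutputStream("large.finf"));
        writer.writeStartDocument();
        writer.writeStartElement("root");
        for (int i = 0; i < 1100000; i++) {
            writer.writeStartElement("item" + i); // each distinct name gets indexed
            writer.writeCharacters("x");
            writer.writeEndElement();
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
    }
}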

oleksiys

Hi,

it looks like you really have reached the maximum index value.
Please take a look at the FI definitions table [1]. All of the indexing tables have a limit of 2^20 (1,048,576) entries.
Other implementations may not be as strict, but having such "features" could be dangerous for compatibility.
We have a mechanism that lets you restrict, by size, which elements and strings get indexed [2]. Those methods are accessible from StAXDocumentSerializer (see the example after the listing).

[1] http://www.itu.int/ITU-T/asn1/database/itu-t/x/x891/2005/FastInfoset.html
[2] /**
     * Sets the limit on the size of character content chunks
     * that will be indexed.
     *
     * @param size The character content chunk size limit. Any chunk with
     *             a length less than the size limit will be indexed.
     */
    public void setCharacterContentChunkSizeLimit(int size);

    /**
     * Gets the limit on the size of character content chunks
     * that will be indexed.
     *
     * @return The character content chunk size limit.
     */
    public int getCharacterContentChunkSizeLimit();

    /**
     * Sets the limit on the memory size of the map of character content
     * chunks that will be indexed.
     *
     * @param size The character content chunk map memory limit.
     */
    public void setCharacterContentChunkMapMemoryLimit(int size);

    /**
     * Gets the limit on the memory size of the map of character content
     * chunks that will be indexed.
     *
     * @return The character content chunk map memory limit.
     */
    public int getCharacterContentChunkMapMemoryLimit();

    /**
     * Sets the limit on the size of attribute values
     * that will be indexed.
     *
     * @param size The attribute value size limit. Any value with
     *             a length less than the size limit will be indexed.
     */
    public void setAttributeValueSizeLimit(int size);

    /**
     * Gets the limit on the size of attribute values
     * that will be indexed.
     *
     * @return The attribute value size limit.
     */
    public int getAttributeValueSizeLimit();

    /**
     * Sets the limit on the memory size of the map of attribute values
     * that will be indexed.
     *
     * @param size The attribute value map memory limit.
     */
    public void setAttributeValueMapMemoryLimit(int size);

    /**
     * Gets the limit on the memory size of the map of attribute values
     * that will be indexed.
     *
     * @return The attribute value map memory limit.
     */
    public int getAttributeValueMapMemoryLimit();
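
For example, a small sketch (the class name is made up and the values are illustrative, not recommended defaults) of configuring those limits on the serializer before writing:

import java.io.FileOutputStream;
import com.sun.xml.fastinfoset.stax.StAXDocumentSerializer;

public class TunedSerializerExample {
    public static StAXDocumentSerializer create(String file) throws Exception {
        StAXDocumentSerializer serializer =
                new StAXDocumentSerializer(new FileOutputStream(file));
        // Index only character chunks and attribute values shorter than
        // 32 characters, and cap the memory each index map may use
        // (all values here are illustrative).
        serializer.setCharacterContentChunkSizeLimit(32);
        serializer.setCharacterContentChunkMapMemoryLimit(1024 * 1024);
        serializer.setAttributeValueSizeLimit(32);
        serializer.setAttributeValueMapMemoryLimit(1024 * 1024);
        return serializer; // use it as a regular javax.xml.stream.XMLStreamWriter
    }
}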

rragno

This seems extremely undesirable. A FastInfoset encoder should *never* break when given correct input. There is no danger for compatibility, except with itself; this is the only FastInfoset implementation I have seen with this bug! It should be treated like any other bug, and an alarmingly easy one to encounter at that. Unless the implementation is documented as working with only some subset of valid inputs, this needs to be fixed.

The specification allows these documents to be encoded correctly. You do not need to fill the tables past 1M entries; ensuring that does not happen is the job of the encoder. (A quick, suboptimal fix would simply stop adding new entries once the limit has been reached.)
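
As a rough, hypothetical sketch of that idea (the class and method names are made up, not the actual FastInfoset internals):

// Hypothetical vocabulary table; it illustrates "stop adding entries
// once the table is full" rather than the real encoder internals.
final class StringIndexTable {
    private static final int MAX_ENTRIES = 1 << 20; // 1,048,576, the FI table limit
    private final java.util.Map<String, Integer> index =
            new java.util.HashMap<String, Integer>();

    // Returns the existing index for s, adds s if there is still room,
    // or returns -1 to tell the encoder to emit the string as a literal.
    int obtainIndex(String s) {
        Integer existing = index.get(s);
        if (existing != null) {
            return existing.intValue();
        }
        if (index.size() >= MAX_ENTRIES) {
            return -1; // table full: encode as a literal, do not index
        }
        int next = index.size();
        index.put(s, Integer.valueOf(next));
        return next;
    }
}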

The mechanism you refer to does not seem good enough. Even if code using FastInfoset goes through these contortions, it cannot reliably restrict the indexing tables: the limits are based on the input data, not on the size of the indexing tables. It would be pure guesswork; if you are not streaming, you could perhaps retry with smaller and smaller limits and end up with a suboptimal encoding, but that is not reasonable.

Well, thanks for confirming it...

oleksiys

You're right.
I wasn't careful when reading your previous post and thought the exception occurred during deserialization. If it's a serialization exception, it's a bug!
If it's not too much trouble, could you please file the issue at [1]? I'll try to fix it ASAP.

Thanks.
Alexey.

[1] fi.dev.java.net

rragno

I don't think I am allowed to, am I? Isn't that limited to 'project members'?

The repro is easy to put together, I suppose... And the suboptimal hack of a patch is to just skip adding entries once a table has already reached 1M.

oleksiys

It's not just for members :)
As for the fix, I agree, it should not be complex. However, if you file the issue it will be easier (for you as well) to track its status.

Thank you.