(Bug?) XMLStreamException: An invalid XML character

randomalious
Joined: 2004-10-10

Hi,

I'm trying to parse a Wikipedia dump file, so characters from unusual scripts are to be expected in it.
While parsing, I get the following exception:

[javax.xml.stream.XMLStreamException: ParseError at [row,col]:[21828,9]
Message: An invalid XML character (Unicode: 0x10339) was found in the element content of the document.]

However, 0x10339 is a valid Unicode character (see http://www.unicode.org/charts/PDF/U10330.pdf),
and it is also valid according to the Java API docs (see http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html).
Further, java.lang.Character.isLetter( 0x10339 ) returns 'true'.
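
For reference, here is a minimal sketch of the check I have in mind. It assumes the relevant rule is the Char production of XML 1.0, which allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The class and method names (XmlCharCheck, isXml10Char) are made up for illustration and are not part of StAX or SJSXP. It also shows that Java stores 0x10339 as a surrogate pair; a parser that tested each UTF-16 code unit on its own would reject both halves, which might be related to the error, though that is only a guess.

```java
class XmlCharCheck {

    // Hypothetical helper: true if the code point matches the XML 1.0 Char production.
    static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        int codePoint = 0x10339;

        // The full code point is inside the supplementary range allowed by XML 1.0.
        System.out.println("valid as code point: " + isXml10Char(codePoint));

        // In a Java String or char[] it occupies two UTF-16 code units.
        char[] units = Character.toChars(codePoint);
        System.out.println("code units: " + units.length);

        for (char unit : units) {
            // Each surrogate, looked at in isolation, lies in the excluded
            // range 0xD800-0xDFFF, so a per-char test reports it as invalid.
            System.out.println(Integer.toHexString(unit)
                    + " high=" + Character.isHighSurrogate(unit)
                    + " low=" + Character.isLowSurrogate(unit)
                    + " valid alone: " + isXml10Char(unit));
        }
    }
}
```

If the check is done on the value returned by String.codePointAt(int) rather than on individual char values, the character passes.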

Any ideas?

-S.

kohsuke
Joined: 2003-06-09

I think you might want to file a bug at http://sjsxp.dev.java.net/issues/