
Entity Reference Conversion to Special Character

2 replies
Joined: 2008-10-23

Does anyone know how to disable the automatic conversion of entity references by SAXParser?

For example, when I feed a source XML file containing an entity reference like "&lt;" to SAXParser.parse(...), it is converted to the < character in the target XML.

I don't want this to happen. How can I prevent it?



Message was edited by: ipodee

Joined: 2004-05-06

Hi Kevin,

If you are using Xerces, you can turn on the feature that makes the parser notify you of these entities. You will still receive the parsed characters in the characters() method, but because they are surrounded by startEntity() and endEntity() events, you can write some additional logic to handle them.

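import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

SAXParserFactory spf = SAXParserFactory.newInstance();
// Xerces feature: report built-in entity references through startEntity()/endEntity()
spf.setFeature("", true);

SAXParser parser = spf.newSAXParser();
XmlHandler handler = new XmlHandler();

// Need this, otherwise XmlHandler is treated as a standard DefaultHandler
// and never receives startEntity()/endEntity() (registers it as the LexicalHandler)
parser.setProperty("", handler);

String xml = "This is a &lt;test&gt;";
StringReader reader = new StringReader(xml);

parser.parse(new InputSource(reader), handler);

// DefaultHandler2 also implements LexicalHandler, so it can receive entity events
private static class XmlHandler extends DefaultHandler2 {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.println("Characters: " + new String(ch, start, length));
    }

    @Override
    public void endEntity(String name) throws SAXException {
        System.out.println("end entity: " + name);
    }

    @Override
    public void startEntity(String name) throws SAXException {
        System.out.println("start entity: " + name);
    }
}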

And the output is:

Characters: This is a
start entity: lt
Characters: <
end entity: lt
Characters: test
start entity: gt
Characters: >
end entity: gt
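A minimal sketch of that "additional logic", assuming the JDK's bundled Xerces (or Apache Xerces) and assuming the feature in question is the one Xerces documents as http://apache.org/xml/features/scanner/notify-builtin-refs (an assumption, since the URI is not shown above). The handler is registered under the standard SAX2 property http://xml.org/sax/properties/lexical-handler. While a predefined entity is being reported it drops the replacement character (e.g. '<') and, on endEntity(), writes the reference back out as &name;. The class and method names (ReescapeBuiltinRefs, textWithReferences) are hypothetical.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class ReescapeBuiltinRefs {

    // Assumed Xerces feature URI that reports built-in entity references.
    static final String NOTIFY_BUILTIN_REFS =
            "http://apache.org/xml/features/scanner/notify-builtin-refs";
    // Standard SAX2 property for registering a LexicalHandler.
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    // The five entities predefined by the XML specification.
    static final Set<String> PREDEFINED = Set.of("lt", "gt", "amp", "quot", "apos");

    /** Collects character data, writing predefined entities back out as references. */
    static class ReescapingHandler extends DefaultHandler2 {
        final StringBuilder out = new StringBuilder();
        private String currentRef; // predefined entity currently being reported, or null

        @Override
        public void startEntity(String name) {
            if (PREDEFINED.contains(name)) {
                currentRef = name;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Inside a predefined entity, skip the replacement character;
            // endEntity() writes the original reference instead.
            if (currentRef == null) {
                out.append(ch, start, length);
            }
        }

        @Override
        public void endEntity(String name) {
            if (name.equals(currentRef)) {
                out.append('&').append(name).append(';');
                currentRef = null;
            }
        }
    }

    /** Parses the document and returns its text with predefined entities kept as references. */
    static String textWithReferences(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setFeature(NOTIFY_BUILTIN_REFS, true);
        SAXParser parser = spf.newSAXParser();
        ReescapingHandler handler = new ReescapingHandler();
        parser.setProperty(LEXICAL_HANDLER, handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textWithReferences("<doc>This is a &lt;test&gt;</doc>"));
    }
}
```

Only the five predefined entities are re-escaped; character references such as &#60; are not re-escaped by this sketch. On a parser that does not recognize the feature, setFeature() throws SAXNotRecognizedException.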

Hope this helps,


(fixed message to have properly escaped entities)

Message was edited by: prunge

Joined: 2004-12-15

While it's impossible to turn that off, since entities like &lt; are predefined in XML, you can escape them in a CDATA section.
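A short sketch of the CDATA approach, assuming the text can be wrapped in a CDATA section in the source document. In ordinary content the parser replaces &lt; with '<', but inside <![CDATA[ ... ]]> the same characters are plain text and reach characters() unchanged. The class and method names (CdataDemo, parsedText) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CdataDemo {

    /** Returns all character data reported for the document, concatenated. */
    static String parsedText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Outside CDATA the predefined entities are replaced: prints <tag>
        System.out.println(parsedText("<doc>&lt;tag&gt;</doc>"));
        // Inside CDATA the text is passed through as-is: prints &lt;tag&gt;
        System.out.println(parsedText("<doc><![CDATA[&lt;tag&gt;]]></doc>"));
    }
}
```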