
XML Processing: Java vs C#

8 replies [Last post]
justone
Offline
Joined: 2007-10-21
Points: 0

What's on the agenda for improving the performance of XML parsing and XPath expression evaluation in Java?

In my experience, Java is shockingly slow at evaluating XPath expressions compared to C#, especially when reading relatively large XML documents (about 1MB). I think this is very important considering the popularity of XML and XPath in web services and similar applications. The difference in performance is significant enough that I would develop such things in C# (.NET) rather than Java (although I'd much prefer not to).

madhusudhan.vas...
Offline
Joined: 2012-03-12
Points: 0

I had a similar issue with XPath evaluation. I tried CachedXPathAPI, which was about 100x faster than the XPathAPI I had been using.
More information about this API is available here:
http://xml.apache.org/xalan-j/apidocs/org/apache/xpath/CachedXPathAPI.html
The key is to create the CachedXPathAPI instance only once and reuse it to evaluate the XPath expressions.
Hope it helps.
Cheers,
Madhusudhan

alexlamsl
Offline
Joined: 2004-09-02
Points: 0

I'd suggest wrapping the primary test loop in another method and comparing the timings of consecutive runs. Also try running with the server VM to see whether the timings improve further.
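A minimal sketch of that suggestion, assuming a small in-memory document in place of the real file. The class name `WarmupTiming`, the `<Root>`/`<Item>` element names and the run count are made up for illustration. The point is only that the timed work lives in its own method and is called several times, so later runs show the timings after the JIT has warmed up.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WarmupTiming {

    // Hypothetical stand-in for the real test loop: one XPath evaluation per matched node.
    public static int runOnce(Document doc, XPath xpath) throws Exception {
        NodeList items = (NodeList) xpath.evaluate("/Root/Item", doc, XPathConstants.NODESET);
        int read = 0;
        for (int i = 0; i < items.getLength(); i++) {
            String id = (String) xpath.evaluate("@Id", items.item(i), XPathConstants.STRING);
            if (!id.isEmpty()) {
                read++;
            }
        }
        return read;
    }

    // Builds a synthetic document (an assumption; the original test reads a file from disk).
    public static Document sampleDocument(int items) throws Exception {
        StringBuilder xml = new StringBuilder("<Root>");
        for (int i = 0; i < items; i++) {
            xml.append("<Item Id=\"").append(i).append("\"/>");
        }
        xml.append("</Root>");
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml.toString())));
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument(200);
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Time several consecutive runs; the first usually includes class loading and JIT warm-up.
        for (int run = 1; run <= 5; run++) {
            long start = System.nanoTime();
            int read = runOnce(doc, xpath);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("Run %d: %d nodes, %d ms%n", run, read, elapsedMs);
        }
    }
}
```

On JVMs that ship both compilers, the server VM can be selected with `java -server WarmupTiming`.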

justone
Offline
Joined: 2007-10-21
Points: 0

I could do all that you suggest, but isn't there a design issue here first and foremost? It looks like the whole document is processed each time, regardless of the node passed in as a parameter. Should all that processing be done in the first place to evaluate an XPath expression? I think not (but I'm not an expert).

alexlamsl
Offline
Joined: 2004-09-02
Points: 0

It would certainly seem so, especially if the instance is immutable.

Filing a performance issue against it might be a good course of action.

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

As far as I know, XML parsing in Java is on par with other implementations on comparable platforms.

However, I can't speak for XPath. I have never used the API integrated into Java as of Java 5, only an external API, which was fast enough for my purposes.

If you could provide an isolated test case with, for example, a large, highly compressible XML document, I could take a quick look. Maybe it really is a performance bug and should be filed?

lg Clemens

justone
Offline
Joined: 2007-10-21
Points: 0

Thanks for the reply,

Here's my example:

I'm running an Intel Pentium D 3GHz dual-core processor, and the code below produces the following output when run on Java SDK 1.5.0_09:

Number of nodes read: 252
Time taken: 48750 ms

That's the better part of a minute! The equivalent code in C# takes 46.875 ms... which is over a thousand times faster.

The XML file read is 506,406 bytes long but I can't paste it here (too long). See the code below:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XMLPerfDemo {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory buildFact = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = buildFact.newDocumentBuilder();

        Document doc = builder.parse(new File("c:\\temp\\DateConfig.xml"));

        XPathFactory xpathFact = XPathFactory.newInstance();
        XPath xpath = xpathFact.newXPath();

        NodeList dateSets = (NodeList) xpath.evaluate("/DateConfig/DateSetList[@Name='WTN']/DateSet",
                doc, XPathConstants.NODESET);

        long time1 = System.currentTimeMillis();
        for (int j = 0; j < dateSets.getLength(); j++) {
            Node item = dateSets.item(j);

            String id = (String) xpath.evaluate("@SetId", item, XPathConstants.STRING);
            String firstDate = (String) xpath.evaluate("FirstDate", item, XPathConstants.STRING);
            String lastDate = (String) xpath.evaluate("LastDate", item, XPathConstants.STRING);
            String otherDate = (String) xpath.evaluate("OtherDate", item, XPathConstants.STRING);

            DateSet ds = new DateSet(id, firstDate, lastDate, otherDate);
        }
        long time2 = System.currentTimeMillis();
        System.out.printf("Number of nodes read: %s\n", dateSets.getLength());
        System.out.printf("Time taken: %s ms\n", time2 - time1);
    }

    public static class DateSet {
        private String _id;
        private String _firstDate;
        private String _lastDate;
        private String _otherDate;

        public DateSet(String id, String firstDate, String lastDate, String otherDate) {
            _id = id;
            _firstDate = firstDate;
            _lastDate = lastDate;
            _otherDate = otherDate;
        }
    }
}

Message was edited by: justone

Added Java version.

linuxhippy
Offline
Joined: 2004-01-07
Points: 0

Well, of course it should not take that long, keeping in mind that 500KB is not large by today's standards. I have used XPath to go through 10MB files in reasonable time.

Have you tried the NetBeans profiler? It's a really great tool for seeing where time is spent, and it could help if you're misusing the API in some way. Also, have you been able to try the example on JDK 6?
If it was a performance bug that has already been fixed, this would be an easy way to find out.

lg Clemens

justone
Offline
Joined: 2007-10-21
Points: 0

I finally got round to profiling the application using the NetBeans profiler. It spends most of its time in the [b]com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.nextNode()[/b] and [b]com.sun.org.apache.xml.internal.dtm.ref.dom2dtm.DOM2DTM.addNode()[/b] methods.

I think the "problem" is that every time you use XPath.evaluate(), the method scans the entire document. I'm not sure this is really necessary. I was expecting the API to narrow the scope of XPath expression evaluation to the given context node (but perhaps XPath expressions may allow you to search beyond the context node). As the given context node gets smaller, the expression evaluation should get quicker.
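On the question in parentheses: XPath does let an expression reach beyond the context node, for example through the parent step `..` or an absolute path starting with `/`. The sketch below shows this with the standard javax.xml.xpath API on a tiny in-memory document. The class name `ContextScope`, its helper methods and the sample values are made up for illustration; only the element and attribute names follow the test code above.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ContextScope {

    // Illustrative sample data; not the real DateConfig.xml.
    public static final String SAMPLE =
            "<DateConfig><DateSetList Name='WTN'>"
          + "<DateSet SetId='1'><FirstDate>2007-01-01</FirstDate></DateSet>"
          + "</DateSetList></DateConfig>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an expression against a context node and returns its string value.
    public static String eval(String expression, Node context) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate(expression, context, XPathConstants.STRING);
    }

    // Returns the first DateSet element, used as the context node below.
    public static Node firstDateSet(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate("/DateConfig/DateSetList/DateSet", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Node dateSet = firstDateSet(parse(SAMPLE));

        // Stays inside the context node's subtree.
        System.out.println(eval("FirstDate", dateSet));
        // Steps up to the parent element, outside the subtree.
        System.out.println(eval("../@Name", dateSet));
        // Absolute path: starts from the document root, not from the context node.
        System.out.println(eval("/DateConfig/DateSetList/@Name", dateSet));
    }
}
```

Because expressions like the last two are legal, an implementation cannot assume in general that only the subtree under the context node matters.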

Anyway, now that I know this I changed one line in the code to duplicate the node, creating a new isolated node separate from the original document: [b]Node item = dateSets.item(j);[/b] becomes [b]Node item = dateSets.item(j).cloneNode(true);[/b]

This reduces the time taken from about 48000ms to about 1000ms, which is much more reasonable. (It is still slower than C# though, and having to duplicate nodes like that doesn't make efficient use of memory.)
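For reference, here is a self-contained sketch of the cloneNode(true) workaround on a small in-memory document, assuming made-up sample values. The class name `DetachedNodeXPath` and the `detach` flag are hypothetical. It only checks that relative expressions return the same values on the attached and the cloned node; it does not measure anything. Note that the clone has no parent, so expressions that look outside the subtree (such as `..`) can no longer see the original document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DetachedNodeXPath {

    // Illustrative sample data; not the real DateConfig.xml.
    public static final String SAMPLE =
            "<DateConfig><DateSetList Name='WTN'>"
          + "<DateSet SetId='1'><FirstDate>2007-01-01</FirstDate></DateSet>"
          + "<DateSet SetId='2'><FirstDate>2007-02-01</FirstDate></DateSet>"
          + "</DateSetList></DateConfig>";

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads FirstDate from every DateSet, optionally through a detached deep copy of each node.
    public static List<String> readFirstDates(Document doc, boolean detach) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList sets = (NodeList) xpath.evaluate(
                "/DateConfig/DateSetList[@Name='WTN']/DateSet", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < sets.getLength(); i++) {
            // cloneNode(true) copies the subtree; the copy has no parent in the original document.
            Node item = detach ? sets.item(i).cloneNode(true) : sets.item(i);
            result.add((String) xpath.evaluate("FirstDate", item, XPathConstants.STRING));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("attached: " + readFirstDates(doc, false));
        System.out.println("detached: " + readFirstDates(doc, true));
    }
}
```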