Quick stupid SAX question from someone returning to Sweet Mother Java

Joined: 2009-02-04

Ladies and Gents,

A career move about 4 years ago took me out of the realm of Java programming. A more recent move brought me back, and I find myself trying to get reacquainted with stuff I vaguely remember. I have a quick question regarding common techniques for parsing streaming XML. (And yes, I know I could probably find the answer with a web search, but I tend to find things after I've posted stupid questions to forums. Such is the way of the Universe.)

Anyway, my issue: I'm trying to figure out the best way to do dynamic parsing of streaming XML input. Basically, there's an application logging in XML to either a socket or a file; my Java application has to be able to parse the streaming output and display it in some meaningful manner in real time.

My question is: what are the common ways of dealing with streaming XML, where the entire document isn't fully formed while it's being parsed (and may never be, if the stream never ends)? The approach I'm taking right now is a SAX parser with a handler that takes the parsed elements and builds a data structure out of them. Basic stuff. The problem I run into is parsing a file that is still being written: reaching the file's EOF keeps causing the parser to throw a malformed-XML exception, because the XML stream isn't finished yet. I haven't begun work on the socket-based input, so I'm not sure whether there are any hurdles there.
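For reference, the setup described above might look roughly like the sketch below. The `<log>`/`<entry>` element names and the `LogEntryHandler` class are hypothetical stand-ins for whatever the logging application actually emits. The handler receives each element as soon as the parser reaches it, but `parse()` only returns normally once the root element is closed; input that simply ends mid-document produces a `SAXParseException`, which is the malformed-XML error described here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <entry> element.
class LogEntryHandler extends DefaultHandler {
    final List<String> entries = new ArrayList<>();
    private StringBuilder text; // non-null only while inside an <entry>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("entry")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("entry") && text != null) {
            entries.add(text.toString()); // a live viewer would update its display here
            text = null;
        }
    }

    // Parses the whole stream; throws SAXParseException if it ends before </log>.
    static List<String> parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LogEntryHandler handler = new LogEntryHandler();
        parser.parse(in, handler);
        return handler.entries;
    }

    static InputStream bytes(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }
}
```

Even for truncated input such as `<log><entry>a</entry>`, `endElement` runs for the first `<entry>` before the exception is thrown, so the events do arrive incrementally; it is only the premature EOF that is fatal.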

I have to believe this is a problem that's been solved. Can anyone point me to examples of prior art for something like this?

Thanks in advance.

Joined: 2004-03-04

There is an API dedicated to parsing streaming XML.
Here is an article describing how to use it:
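Assuming the API meant here is StAX (the Streaming API for XML, package `javax.xml.stream`, bundled with the JDK since Java 6), a pull-style loop might look like the sketch below. The application asks the reader for the next event, so each `<entry>` (a hypothetical element name) can be handled as soon as it has been read, without waiting for the document to end. Like SAX, StAX still raises an `XMLStreamException` if the input ends before the root element is closed, so for a file that is still growing it needs an input stream that blocks instead of returning EOF early.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reader: pulls events one at a time and returns the text of each <entry>.
class StaxLogReader {
    static List<String> readEntries(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> entries = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("entry")) {
                    // Reads the text content and leaves the reader on </entry>.
                    String text = reader.getElementText();
                    entries.add(text); // a live viewer would display `text` right here
                }
            }
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
        return entries;
    }
}
```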

Joined: 2006-11-05

Use a pipe.

One thread reads from the file and writes to a PipedOutputStream. When it hits EOF, it sleeps for a bit, resets its position to the previous end of file, and tries again.
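A minimal sketch of that reading thread, assuming a plain file that only ever grows (no log rotation or truncation). `FileTailer`, `stop()`, and the polling interval are hypothetical names chosen for illustration; `RandomAccessFile.seek` is used to go back to the previous end of file before each retry.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical helper: copies everything appended to a growing file into `out`,
// treating EOF as "nothing new yet" rather than "end of data".
class FileTailer implements Runnable {
    private final String path;
    private final OutputStream out; // typically a PipedOutputStream
    private final long pollMillis;
    private volatile boolean running = true;

    FileTailer(String path, OutputStream out, long pollMillis) {
        this.path = path;
        this.out = out;
        this.pollMillis = pollMillis;
    }

    // Ask the tailer to finish: it copies what is left in the file, then closes `out`.
    void stop() {
        running = false;
    }

    @Override
    public void run() {
        byte[] buf = new byte[8192];
        long position = 0; // offset of the previous end of file
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            while (true) {
                // Sampled before reading, so data appended before stop() is still copied.
                boolean stopRequested = !running;
                file.seek(position);
                int n = file.read(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                    out.flush(); // notifies a reader waiting on the other end of a pipe
                    position += n;
                } else if (stopRequested) {
                    break;
                } else {
                    Thread.sleep(pollMillis); // at EOF for now: wait, then retry from `position`
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            // A real application would report this; the sketch just stops copying.
        } finally {
            try {
                out.close(); // the reading side sees EOF once it has drained the data
            } catch (IOException ignored) {
                // nothing more to do
            }
        }
    }
}
```

In practice `out` would be the `PipedOutputStream`, and the tailer would run on its own thread via `new Thread(tailer).start()`.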

Another thread calls the SAX parser and passes it the corresponding PipedInputStream. Because the InputStream is connected to an open "source", it won't return EOF; it just blocks until input is available (provided the writing thread is still alive).
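The two-thread arrangement can be sketched as follows. To keep the example self-contained, the writing side pushes a few hard-coded chunks with pauses in between instead of tailing a real file; `PipedSaxDemo` and the element names are hypothetical. The parser thread waits inside `parse()` while the pipe is empty, and the whole document is processed without a malformed-XML error because EOF only arrives after `out.close()`.

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: returns the element names the parser saw, in order.
class PipedSaxDemo {
    static List<String> run(String[] chunks, long pauseMillis) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out); // connected pair
        List<String> seen = new CopyOnWriteArrayList<>();

        // Parser thread: blocks inside parse() whenever the pipe is empty.
        Thread parser = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local, String qName, Attributes a) {
                        seen.add(qName); // handled as soon as the tag has arrived
                    }
                });
            } catch (Exception e) {
                seen.add("error: " + e);
            }
        });
        parser.start();

        // Writer side: stands in for the file-reading thread.
        for (String chunk : chunks) {
            out.write(chunk.getBytes(StandardCharsets.UTF_8));
            out.flush();
            Thread.sleep(pauseMillis);
        }
        out.close(); // no more input: the parser sees EOF after the last chunk
        parser.join();
        return seen;
    }
}
```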

When you finally decide that there will be no more input, you should be able to close the PipedOutputStream; once the PipedInputStream has handed over any data still in its buffer, it returns EOF, which signals the end of input to the SAX parser.

There are probably also equivalents using NIO (java.nio.channels.Pipe springs to mind), if that is required.
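With `java.nio.channels.Pipe` the same idea might look like this: `Channels.newInputStream(pipe.source())` turns the blocking source channel into an `InputStream` the SAX parser can read, and closing `pipe.sink()` is what eventually delivers end-of-stream. `NioPipeSaxDemo` and the `entry` element name are hypothetical.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: returns how many <entry> elements were parsed, or -1 on error.
class NioPipeSaxDemo {
    static int countEntries(String[] chunks) throws Exception {
        Pipe pipe = Pipe.open();
        InputStream in = Channels.newInputStream(pipe.source()); // blocking reads
        AtomicInteger count = new AtomicInteger();

        Thread parser = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local, String qName, Attributes a) {
                        if (qName.equals("entry")) {
                            count.incrementAndGet();
                        }
                    }
                });
            } catch (Exception e) {
                count.set(-1); // report failure to the caller
            }
        });
        parser.start();

        for (String chunk : chunks) {
            ByteBuffer buf = ByteBuffer.wrap(chunk.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                pipe.sink().write(buf); // may write only part of the buffer per call
            }
        }
        pipe.sink().close(); // the source then reports end-of-stream
        parser.join();
        return count.get();
    }
}
```

Unlike `PipedInputStream`, this pipe does not track which thread writes to it, so there is no "Write end dead" check; a writer that dies without closing the sink simply leaves the parser blocked until the sink is closed.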