Convert HTML file to XML file using Java

Hi All,

I would like to convert an HTML file to an XML file using Java.
If any of you have sample code, please share it with me.
Your suggestions are greatly appreciated.


Correctly formatted HTML (strictly speaking, well-formed XHTML) IS XML. Incorrectly formatted HTML needs a lot of (semi-arbitrary) assumptions to turn it into anything machine-readable. That is exactly what web browsers do, and it is why they all display HTML slightly differently: they don't all make the same assumptions.

Hi Veera,

If your input is HTML and you want XML output, you should probably have a look at JTidy, TagSoup, CyberNeko or similar. These tools can make well-formed XML out of "street-HTML". On the other hand, if you're lucky, and your input is XHTML, you could probably just use that as-is.
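
To find out whether you are in the lucky case, you could simply try to parse the input with the XML parser that ships with the JDK. Here is a minimal sketch of that idea. The class and method names (WellFormedCheck, isWellFormed) are made up for illustration, and the sample input is assumed to carry no DOCTYPE:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed: this input needs JTidy, TagSoup or similar first.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable or input unreadable", e);
        }
    }
}
```

With this, WellFormedCheck.isWellFormed("<p>Hello<br/></p>") returns true, while the same markup with an unclosed <br> returns false. Real XHTML pages usually carry a DOCTYPE, and the parser may then try to load the referenced DTD, so treat this only as a starting point.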

When your input is well-formed, you can manipulate it using an XML package such as W3C DOM (bundled with the JDK), JDOM or XOM. I prefer XOM because of its clean API and unsurprising results. DOM can be quite tricky at times.
Or maybe you could just convert your data to your XML format using an XSLT style sheet.
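
A rough sketch of the XSLT route, using only the javax.xml.transform classes bundled with the JDK. The style sheet, the target format (a links element with link children) and the class name XhtmlToXml are invented for illustration, since your actual XML format is unknown. It also assumes the input elements are in no namespace:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of any library.
class XhtmlToXml {

    // Illustrative style sheet: copies every <a href="..."> into a <link> element.
    // Assumption: the input has no XML namespace. Real XHTML declares one, and
    // the match patterns would then need a namespace prefix.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<links>"
        + "<xsl:for-each select='//a'>"
        + "<link url='{@href}'><xsl:value-of select='.'/></link>"
        + "</xsl:for-each>"
        + "</links>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies STYLESHEET to well-formed input and returns the result as a string. */
    static String transform(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling XhtmlToXml.transform on a small page containing one anchor should give you a links element with a single link child holding the anchor's href and text. For real files you would pass StreamSource and StreamResult objects built from File instances instead of strings.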

There are really a lot of XML technologies out there, for better or worse. Which one to pick depends very much on your preferences, what you are familiar with, and of course what your requirements are.

Good luck!