
Convert an HTML file to XML using Java

veerasek
Joined: 2007-11-13

Hi All,

I would like to convert an HTML file to an XML file using Java.
If any of you have sample code, please share it with me.
Your suggestions are greatly appreciated.

Thanks,
Veera

jwenting
Joined: 2003-12-02

Correctly formatted HTML (that is, XHTML) IS XML. Incorrectly formatted HTML needs a lot of (semi-arbitrary) assumptions to turn it into anything machine-readable. That is exactly what web browsers do, and why they all display HTML slightly differently: they don't all make the same assumptions.

haraldk
Joined: 2005-05-10

Hi Veera,

If your input is HTML and you want XML output, you should probably have a look at JTidy, TagSoup, CyberNeko or similar. These tools can make well-formed XML out of "street-HTML". On the other hand, if you're lucky, and your input is XHTML, you could probably just use that as-is.
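
To find out which case you are in, a quick well-formedness check with the parser bundled in the JDK might look like the sketch below. The class and method names (WellFormedCheck, isWellFormed) are made up for illustration. Note that input carrying an XHTML DOCTYPE may cause the parser to try to fetch the external DTD, which this sketch does not handle.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: it is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }
}
```

If this returns false for your files, they are "street-HTML" and need one of the tidying tools mentioned above before any XML package will accept them.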

When your input is well-formed, you could manipulate it using some XML package like W3C DOM (bundled in the JDK), JDOM or XOM. I prefer the latter, because of its clean API and unsurprising results. DOM can be quite tricky at times.
Or maybe you could just convert your data to your XML format using an XSLT style sheet.
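
As a rough sketch of the XSLT route, using only the javax.xml.transform classes bundled in the JDK: the stylesheet, the output element names (items, item) and the class name XhtmlToXml are all invented for the example, since I don't know your target format. It also assumes the input has no namespace; real XHTML uses the http://www.w3.org/1999/xhtml namespace, so the select expression would need a prefix bound to it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class, not part of any library.
public class XhtmlToXml {

    // Made-up stylesheet: copies the text of every <li> into an <item> under <items>.
    private static final String XSLT =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'>"
          + "<items><xsl:for-each select='//li'>"
          + "<item><xsl:value-of select='.'/></item>"
          + "</xsl:for-each></items>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet to well-formed input and returns the result as a string. */
    public static String transform(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            // Leave out the <?xml ...?> declaration to keep the output short.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Calling XhtmlToXml.transform on a small page with a list of two entries should give you an items element with one item per list entry. For files on disk, StreamSource and StreamResult also accept a java.io.File.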

There are really a lot of technologies for XML out there, for better or worse. Which one to pick depends very much on your preferences, what you are familiar with, and of course what your requirements are.

Good luck!

.k