
Convert HTML to XML

Joined: 2008-05-17

Hi all,
I have a problem: I need to find some information in HTML text, and I need to be able to tell which nodes contain that text. My idea was to first convert the HTML to XML and then search the XML (for example with XPath). But I ran into the following problem: the XML can contain HTML entities, and the parser doesn't recognize their definitions. How can I solve this? Or should I not try to convert HTML to XML at all, and just parse the HTML with one of the available libraries? Thanks for any suggestions.
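One possible workaround, sketched in Java under the assumption that the HTML has already been made well-formed: XML predefines only five entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so before parsing you can rewrite HTML-only named entities as numeric character references, which every XML parser accepts, and then use XPath to list the elements whose text contains a search string. The class `EntitySafeXPath`, its methods, and the three-entry entity table are hypothetical and for illustration only; `&nbsp;` is just a common example of an entity XML does not predefine.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class EntitySafeXPath {
    // Hypothetical, deliberately tiny table; HTML 4 defines 252 named entities,
    // so a real converter would need the full list.
    private static final Map<String, String> HTML_ENTITIES = new LinkedHashMap<>();
    static {
        HTML_ENTITIES.put("&nbsp;", "&#160;");
        HTML_ENTITIES.put("&copy;", "&#169;");
        HTML_ENTITIES.put("&eacute;", "&#233;");
    }

    // Rewrites known HTML-only entities as numeric character references.
    public static String replaceHtmlEntities(String xhtml) {
        String result = xhtml;
        for (Map.Entry<String, String> entry : HTML_ENTITIES.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the path (e.g. /html/body/p) of every element whose own text contains needle.
    public static List<String> elementsContaining(Document doc, String needle) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Pass the search string as an XPath variable so quotes in it cannot break the expression.
        xpath.setXPathVariableResolver(qname -> needle);
        NodeList hits = (NodeList) xpath.evaluate(
                "//*[text()[contains(., $needle)]]", doc, XPathConstants.NODESET);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            paths.add(pathOf(hits.item(i)));
        }
        return paths;
    }

    // Builds a simple /a/b/c path by walking up through element ancestors.
    private static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Price:&nbsp;42</p><p>Other</p></body></html>";
        // Parsing `page` directly fails: the entity "nbsp" is referenced but not declared.
        Document doc = parse(replaceHtmlEntities(page));
        System.out.println(elementsContaining(doc, "Price")); // prints [/html/body/p]
    }
}
```

String replacement is the simplest option here; declaring the entities in a DTD would also work, but requires the document to reference that DTD.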

Joined: 2008-07-04

I have written a Java function that converts HTML to well-formed XML, so that XPath queries can be used to extract information from HTML pages. It is an open-source project:
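For comparison, the JDK itself ships a lenient (HTML 3.2-level) parser, `javax.swing.text.html.parser.ParserDelegator`, whose callbacks can be fed into a DOM tree so XPath can be used directly on HTML. The sketch below is not the poster's project; the class `HtmlToDom` and the synthetic `<document>` root element are assumptions of this example. Because the parser decodes entities itself, the undeclared-entity problem does not arise.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HtmlToDom {
    private static final String XML_NAME = "[A-Za-z][A-Za-z0-9_.-]*";

    // Builds a DOM tree from (possibly sloppy) HTML using the JDK's Swing HTML parser.
    public static Document toDom(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // Synthetic root (an assumption of this sketch) so every parsed node has a parent.
        Element root = doc.createElement("document");
        doc.appendChild(root);
        Deque<Element> open = new ArrayDeque<>();
        open.push(root);

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                Element e = createElement(doc, t, a);
                open.peek().appendChild(e);
                open.push(e);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Empty tags such as <br>, and unknown tags; the end half of an unknown tag is skipped.
                if (a.getAttribute(HTML.Attribute.ENDTAG) == null) {
                    open.peek().appendChild(createElement(doc, t, a));
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                // Close the nearest matching open element; stray end tags are ignored.
                String name = t.toString();
                if (open.stream().anyMatch(e -> e != root && e.getTagName().equals(name))) {
                    Element closed;
                    do {
                        closed = open.pop();
                    } while (!closed.getTagName().equals(name));
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Entities such as &nbsp; arrive here already decoded to characters.
                open.peek().appendChild(doc.createTextNode(new String(data)));
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return doc;
    }

    private static Element createElement(Document doc, HTML.Tag t, MutableAttributeSet a) {
        String tag = t.toString();
        Element e = doc.createElement(tag.matches(XML_NAME) ? tag : "unknown");
        Enumeration<?> names = a.getAttributeNames();
        while (names.hasMoreElements()) {
            Object key = names.nextElement();
            String name = key.toString();
            // Keep plain attribute names only; this drops parser-internal keys like "_implied_".
            if (name.matches(XML_NAME)) {
                e.setAttribute(name, String.valueOf(a.getAttribute(key)));
            }
        }
        return e;
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDom("<html><body><div id=\"x\">hello</div><br></body></html>");
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(xpath.evaluate("//div[@id='x']", doc));
    }
}
```

The Swing parser only knows HTML 3.2 plus some extensions, so for modern pages a dedicated HTML parsing library is likely to produce a more faithful tree.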