
[JavaHelp] Can't store Document --- Exception occurs when building full-text search --- Please explain how

Joined: 2012-09-11

Hello all,

I have not seen much on this exception (except perhaps in an Oracle forum, and even there nothing was explained, only that someone had the same problem), so I would appreciate help finding a fix.

The error is as follows:

Building Full-Text-Search data... Can't store Document
at javax.swing.text.html.parser.DocumentParser.handleText(Unknown Source)
at javax.swing.text.html.parser.Parser.handleText(Unknown Source)
at javax.swing.text.html.parser.Parser.endTag(Unknown Source)
at javax.swing.text.html.parser.Parser.parseTag(Unknown Source)
at javax.swing.text.html.parser.Parser.parseContent(Unknown Source)
at javax.swing.text.html.parser.Parser.parse(Unknown Source)
at javax.swing.text.html.parser.DocumentParser.parse(Unknown Source)
at javax.swing.text.html.parser.ParserDelegator.parse(Unknown Source)

The error is repeated 675 times; however, when I open the .hs file, the full-text search function seems to work.

Any thoughts?

I am using the following tools:



Joined: 2011-06-16

The problem is generally caused by a mismatch between the declared document encoding (e.g. "UTF-8") and the actual characters you have typed in one of your HTML help files. For example, I saw the error recently when I accidentally saved a file containing accented characters in an ANSI encoding. For some reason my text editor had also added a BOM to the start of the file, so the Java document scanner was sure it had a Unicode file but was seeing ANSI characters. (I don't know why such a simple error makes it go quite so crazy with its error messages!)

The solution in my example was to remove the BOM, since HTML files should use an explicit header line like:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
to tell readers the document is Unicode. Of course, once you say a doc is UTF-8, don't load it into a cheap editor, start adding accented characters, and then save it as an ANSI file. Also note that if you don't specify an encoding, you may get away with it (because the reader could try to determine it for you based on the characters it sees), but you are asking for trouble. Nowadays most document formats support Unicode, and you should explicitly specify and use UTF-8 wherever possible to avoid this sort of issue.
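
With 675 errors it can be tedious to find the offending files by hand, so here is a rough sketch of a checker you could run over the help folder before indexing. It is not part of JavaHelp; the class name HelpEncodingCheck and its two helper methods are made up for illustration, and it assumes every topic file is meant to be UTF-8. It flags files that start with a UTF-8 BOM and files whose bytes do not decode as strict UTF-8 (which is what you typically get when accented characters were saved as ANSI).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper, not a JavaHelp API: assumes all help topics should be UTF-8.
final class HelpEncodingCheck {

    /** True if the bytes start with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** True if the bytes decode as strict UTF-8; malformed input is reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** True for file names ending in .html or .htm (case-insensitive). */
    static boolean isHtml(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".htm");
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(HelpEncodingCheck::isHtml)
                 .forEach(p -> {
                     try {
                         byte[] bytes = Files.readAllBytes(p);
                         if (hasUtf8Bom(bytes)) {
                             System.out.println("has BOM:       " + p);
                         }
                         if (!isValidUtf8(bytes)) {
                             System.out.println("not valid UTF-8: " + p);
                         }
                     } catch (IOException e) {
                         System.out.println("unreadable:    " + p);
                     }
                 });
        }
    }
}
```

Run it with the help folder as the only argument. Note that a file made up purely of ASCII characters passes the UTF-8 check no matter how it was saved, so only files that actually contain accented or other non-ASCII characters can be flagged as invalid.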