Leading, but not trailing, whitespace in text ignored by Swing HTML parser?

keithkml
Joined: 2003-06-10

It seems that the Swing HTML parser in JDK 1.4.2 ignores whitespace at the beginning of text, but not at the end. When using a ParserDelegator to read this HTML:

X Y Z

handleText is called in my ParserCallback 3 times:
1. "X"
2. "Y "
3. "Z"

The second call has a space after, but not before.

Does anyone know why this is? Is this part of the HTML standard? (Mozilla renders that string with a space between X and Y and between Y and Z.)

Does anyone know of a workaround I could use to read that space as part of the text?
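
For anyone who wants to reproduce this, here is a minimal sketch of a test harness that records every handleText call. The markup passed in main is only an assumption (the exact tags from the HTML above are not shown), and the class name TextChunks and method collect are made up for illustration. The whitespace in each chunk may differ between JDK builds, so no expected output is given.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original post.
class TextChunks {

    /** Parses the HTML and returns each chunk handed to handleText, in call order. */
    static List<String> collect(String html) {
        final List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Keep the chunk exactly as delivered, whitespace included.
                chunks.add(new String(data));
            }
        };
        try {
            // Third argument: true means any charset declaration in the input is ignored.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Assumed markup: three text runs separated by tags.
        int n = 1;
        for (String chunk : collect("X<b> Y </b>Z")) {
            // Quote each chunk so leading and trailing spaces are visible.
            System.out.println(n++ + ". \"" + chunk + "\"");
        }
    }
}
```

Quoting each chunk on output makes it easy to see which side of the text kept its space on a given JDK.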

zpm
Joined: 2003-06-16

Hi Keith,

Strangely enough, using JDK build 1.4.2-rc-b25 I get:
1. 'X'
2. ' Y '
3. 'Z'
and when I load this in a JEditorPane, it is displayed as
X Y Z

As for HTML, the spec says (paragraph 9.1, White Space, http://www.w3.org/TR/html401/struct/text.html#h-9.1):
"In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag."
So technically it is not incorrect to ignore just one of the spaces, though for consistency it would be better to either ignore both or keep both.