
issue with html page encoding

4 replies [Last post]
ibender
Offline
Joined: 2009-08-28
Points: 0

Hello,

I found an issue with HTML page encoding.
You can reproduce the error with the following steps:

- Create an HTML form and initialize the HTMLComponent:

c.setPage("http://google.ru/m");
c.getDocumentInfo().setEncoding("UTF-8");

- Start the MIDlet.
- Press the "Поиск" (Search) button on the Google page.

You should see unreadable symbols.

Cause: the page's encoding setting is lost after the second request.
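
To illustrate the symptom, here is a minimal Java sketch. It assumes the page bytes are UTF-8 but get decoded as ISO-8859-1, the charset the encoding falls back to as described later in this thread. The class MojibakeDemo and the method decodeWith are hypothetical names for illustration and are not part of LWUIT.

```java
import java.io.UnsupportedEncodingException;

final class MojibakeDemo {
    /** Encodes text as UTF-8, then decodes those bytes with the given charset. */
    static String decodeWith(String text, String charsetName) {
        try {
            byte[] utf8Bytes = text.getBytes("UTF-8");
            return new String(utf8Bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Unsupported charset: " + charsetName);
        }
    }

    public static void main(String[] args) {
        String label = "\u041f\u043e\u0438\u0441\u043a"; // "Поиск"

        // Each Cyrillic letter is two bytes in UTF-8, so the wrong
        // single-byte decoding turns 5 letters into 10 unrelated characters.
        String garbled = decodeWith(label, "ISO-8859-1");
        System.out.println(label.length() + " chars become " + garbled.length());

        // Decoding with the right charset restores the original text.
        System.out.println(decodeWith(label, "UTF-8").equals(label));
    }
}
```

Running it should print "5 chars become 10" and then "true", which matches the kind of unreadable output seen after pressing the button.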

I found a workaround for this issue:

1. Make the following changes in the HTMLComponent class:

public void setPage(final String pageURL) {
    // Reuse the previous DocumentInfo object so that its settings are kept
    if (this.docInfo != null) {
        this.docInfo.setUrl(pageURL);
    } else {
        this.docInfo = new DocumentInfo(pageURL);
    }
    setPage(this.docInfo);
}

2. In the HTMLForm class, method submit():

void submit(String submitKey, String submitVal) {
    ......
    if (!error) {
        DocumentInfo docInfo;

        // Reuse the previous DocumentInfo object
        if (htmlC.getDocumentInfo() != null) {
            htmlC.getDocumentInfo().setUrl(url);
            htmlC.getDocumentInfo().setParams(params);
            htmlC.getDocumentInfo().setPostRequest(isPostMethod);
            docInfo = htmlC.getDocumentInfo();
        } else {
            docInfo = new DocumentInfo(url, params, isPostMethod);
        }

        // !!!! Why? Maybe encType was confused with the charset attribute of a link: <a charset="UTF-8" href="http://www.site.com">site</a>
        docInfo.setEncoding(charset);

        /*if ((encType!=null) && (!encType.equals(""))) {
            docInfo.setEncoding(encType);
        }*/
        htmlC.setPage(docInfo);
    }
}

ibender
Offline
Joined: 2009-08-28
Points: 0

Hello,

Is this not interesting to anybody? Or has the question already been discussed? Or is my English not good enough, so nobody understood me? :)

Thank you!

vprise
Offline
Joined: 2003-11-07
Points: 0

I forwarded the question to Ofir a while back, but his Oracle email account is having issues. I'm not sure what the right answer is, and I would rather Ofir (who wrote the component) address this himself.

ofirl
Offline
Joined: 2008-06-24
Points: 0

I tried google.ru with the LWUITBrowser project (which shows how to use HTMLComponent), and it works, including the encoding after searching.

Which request handler are you using?

ibender
Offline
Joined: 2009-08-28
Points: 0

Hi,

You're right, it works if I use the request handler from the LWUITBrowser project.
For my tests I used the request handler from the Browser project.
Encoding works with the LWUITBrowser request handler because its resourceRequested method processes the Content-Type header field; the other request handler has no such code.

Here is the relevant place, in class LWUITBrowser, method resourceRequested():

if ((docInfo.getExpectedContentType()==DocumentInfo.TYPE_HTML) ||
    (docInfo.getExpectedContentType()==DocumentInfo.TYPE_CSS)) { // Charset is relevant for HTML and CSS only
    int charsetIndex = contentTypeStr.indexOf("charset=");
    if (charsetIndex!=-1) {
        String charset=contentTypeStr.substring(charsetIndex+8);
        docInfo.setEncoding(charset.trim());
        // if ((charset.startsWith("utf-8")) || (charset.startsWith("utf8"))) { // startsWith to allow trailing white spaces
        //     docInfo.setEncoding(DocumentInfo.ENCODING_UTF8);
        // }
    }
}
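
For a request handler that lacks this logic, the charset lookup could be factored into a small helper. The following is only a sketch under assumptions: ContentTypeCharset and extractCharset are hypothetical names, not LWUIT API. Unlike the snippet above, it also tolerates an upper-case "Charset=", parameters following the charset, and a quoted value.

```java
final class ContentTypeCharset {
    private static final String KEY = "charset=";

    /** Returns the charset parameter of a Content-Type value, or null if there is none. */
    static String extractCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameter names are case-insensitive, so search a lower-cased copy.
        int start = contentType.toLowerCase().indexOf(KEY);
        if (start == -1) {
            return null;
        }
        String value = contentType.substring(start + KEY.length());
        // Cut off any further parameters such as "; boundary=...".
        int end = value.indexOf(';');
        if (end != -1) {
            value = value.substring(0, end);
        }
        value = value.trim();
        // Strip optional surrounding double quotes.
        if (value.length() >= 2 && value.charAt(0) == '"'
                && value.charAt(value.length() - 1) == '"') {
            value = value.substring(1, value.length() - 1);
        }
        return value.length() == 0 ? null : value;
    }

    public static void main(String[] args) {
        System.out.println(extractCharset("text/html; charset=UTF-8"));
        System.out.println(extractCharset("text/html"));
    }
}
```

A handler would pass the returned value to docInfo.setEncoding(...) only when it is not null, so that a response without a charset leaves the previously set encoding untouched.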

The AsyncDocumentRequestHandlerImpl request handler, which is used in the class Oauth2, doesn't read the Content-Type header field.
I agree with you that Content-Type should take priority when choosing the charset, but if the request handler has no code that processes Content-Type,
then the encoding has to be set with htmlC.getDocumentInfo().setEncoding("UTF-8");
That works only for the first request. On the next request, as I wrote, the DocumentInfo object isn't preserved and the charset is reset to ISO-8859-1.
In the code above I gave an example of how to fix this, and I also marked the part of the code that I don't understand,
in class HTMLForm, method submit():

/*if ((encType!=null) && (!encType.equals(""))) {
    docInfo.setEncoding(encType);
}*/

encType doesn't hold the charset of the page; a similar-looking attribute, charset, exists on links instead.
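
The difference between creating a new object per request and reusing the existing one can be sketched in isolation. This is a simplified model, not LWUIT code: PageInfo is a hypothetical stand-in for DocumentInfo, its ISO-8859-1 default is assumed from the behaviour described above, and the second URL is made up for the example.

```java
final class ReuseSketch {
    /** Hypothetical stand-in for DocumentInfo; not the LWUIT class. */
    static final class PageInfo {
        String url;
        String encoding = "ISO-8859-1"; // default assumed from the behaviour described above

        PageInfo(String url) {
            this.url = url;
        }
    }

    /** Models the reported behaviour: a fresh object per request forgets the encoding. */
    static PageInfo navigateFresh(PageInfo current, String url) {
        return new PageInfo(url);
    }

    /** Models the workaround: update the existing object so the encoding survives. */
    static PageInfo navigateReusing(PageInfo current, String url) {
        if (current == null) {
            return new PageInfo(url);
        }
        current.url = url;
        return current;
    }

    public static void main(String[] args) {
        PageInfo first = new PageInfo("http://google.ru/m");
        first.encoding = "UTF-8";

        // The second URL is a placeholder for the page loaded by the form submit.
        String next = "http://example.com/results";
        System.out.println(navigateFresh(first, next).encoding);   // ISO-8859-1
        System.out.println(navigateReusing(first, next).encoding); // UTF-8
    }
}
```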