
A question about saving a web page

3 replies [Last post]
jtqiu
Offline
Joined: 2007-11-17

I am trying to download a web page and save it in a StringBuffer, where the page will be processed further.
I use the following method.
InputStream in = new BufferedInputStream(url.openStream());
InputStreamReader r = new InputStreamReader(in);
char[] cs = new char[4096];
StringBuffer str = new StringBuffer();
int c;
while ((c = r.read(cs)) != -1) {
    str.append(cs);
}

The web page stored in the StringBuffer comes out wrong. For example, some parts of the page are repeated and other parts are missing. In short, if I save the StringBuffer to an HTML file, the file is not the web page it should be.
Is my method wrong, or is this a bug in Java?

tarbo
Offline
Joined: 2006-12-18

Java's fine.

The problem most likely lies in your use of [i]Reader.read(char[])[/i]. Reader.read(char[] buf) is equivalent to Reader.read(buf, 0, buf.length): it returns the number of characters actually read, which can be fewer than buf.length.

Suppose your first read gives you 1440 characters. This fills up positions [0, 1440[. When you call StringBuffer.append(buf), you are appending [i]the entire buffer[/i], including [1440, 4096[, which will be zeroed. Now, if you read only 880 characters next time, you will overwrite [0, 880[, but you will leave [880, 1440[ untouched (hence the haphazard repetitions you see).
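
To make the repetition concrete, here is a small sketch that reproduces it. The class and method names are made up for illustration, a StringReader stands in for the network stream, and the buffer is shrunk to 6 characters so that the short read happens predictably.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class StaleBufferDemo {
    // Hypothetical helper: appends the WHOLE buffer after every read,
    // the same way the loop in the question does.
    static String appendWholeBuffer(Reader reader, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        StringBuffer out = new StringBuffer();
        while (reader.read(buf) != -1) {
            out.append(buf); // ignores how many characters were actually read
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // 8 characters through a 6-character buffer: one read of 6, then one of 2.
        // The second read overwrites only positions 0 and 1, so "CDEF" is appended again.
        System.out.println(appendWholeBuffer(new StringReader("ABCDEFGH"), 6));
        // prints ABCDEFGHCDEF
    }
}
```

With a real URL stream the short reads happen at unpredictable points, which is why the damage looks random.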

So use StringBuffer.append(char[] src, int offset, int length), with 0 for offset and the number of characters read for length.
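
A minimal sketch of the corrected loop, assuming the rest of the code stays as in the question. The class name ReadAll and the method readAll are only illustrative, and the StringReader in main is a stand-in for the InputStreamReader wrapped around url.openStream().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAll {
    // Hypothetical helper: reads everything from the reader into a String.
    static String readAll(Reader reader) throws IOException {
        char[] buf = new char[4096];
        StringBuffer out = new StringBuffer();
        int n; // number of characters the last read() actually delivered
        while ((n = reader.read(buf)) != -1) {
            out.append(buf, 0, n); // append only the freshly read part
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input; in the original question this would be the URL's reader.
        Reader r = new StringReader("<html><body>hello</body></html>");
        System.out.println(readAll(r));
    }
}
```

The only change from the original loop is that the return value of read() is passed on to append(), so stale characters left in the buffer are never copied.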

jtqiu
Offline
Joined: 2007-11-17

Thanks.
However, after trying your method, the problem was still not solved.
I found that the following code
while ((c = r.read()) != -1) {
    str.append((char) c);
}
seems to solve the problem. Why?

jtqiu
Offline
Joined: 2007-11-17

Sorry, that was my mistake.
I tried again and your answer is right.
Thank you very much.