
CopyBytes & CopyCharacters

rhayward
Joined: 2009-02-02

I've just started learning Java and have been looking at these two example programs.
It seems the difference between them is that CopyBytes reads a byte at a time, whereas CopyCharacters reads 16 bits, i.e. 2 bytes, at a time.

To explore this, I downloaded the example xanadu.txt to my Windows laptop, opened it in Notepad and saved it as a Unicode document. Having done that, I had a file 300 bytes long.

I counted how many times the while loop executed:

    while ((c = in.read()) != -1) {
        out.write(c);
        counter++;
    }

I was expecting CopyBytes to loop 300 times but CopyCharacters only 150, as it would be reading 2 bytes at a time.

However, they both loop 300 times.

Could anyone tell me why?

Regards
Richard

prunge
Joined: 2004-05-06

Hi Richard,

I'm guessing 'in' is an instance of Reader in the CopyCharacters code and an InputStream in the CopyBytes code.

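It is true that Readers read characters one at a time. However, the number of bytes per character is determined by the character encoding used. A FileReader uses the platform default charset, which on Windows is typically 'Cp1252' and on many other platforms is 'UTF-8'. Both encode the first 128 (ASCII) characters as one byte each. Cp1252 is a single-byte encoding, so every byte decodes to exactly one character, whereas UTF-8 uses two to four bytes for characters outside ASCII. Read with a single-byte default such as Cp1252, your 300-byte file yields 300 characters, which is why both loops run 300 times.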
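To make the bytes-per-character point concrete, here is a minimal sketch that encodes the same characters with different charsets and prints the resulting byte counts. The class and method names are made up for illustration, and ISO-8859-1 is used as a stand-in for Cp1252 because it is a single-byte charset that every Java platform is guaranteed to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BytesPerChar {
    // Number of bytes the given charset needs to encode the text.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String plain = "A";         // U+0041, inside the ASCII range
        String accented = "\u00e9"; // U+00E9 (e with acute accent), outside ASCII

        System.out.println(encodedLength(plain, StandardCharsets.UTF_8));         // 1
        System.out.println(encodedLength(accented, StandardCharsets.UTF_8));      // 2
        // ISO-8859-1 is an assumption standing in for Cp1252: always one byte per character.
        System.out.println(encodedLength(accented, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(plain, StandardCharsets.UTF_16LE));      // 2
    }
}
```

The same character takes a different number of bytes in each encoding, so a Reader can only group bytes into characters correctly if it is told which encoding the file actually uses.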

You can explicitly set the charset used to decode the bytes of a file by wrapping a FileInputStream in an InputStreamReader that takes a charset argument.

e.g. Reader reader = new InputStreamReader(new FileReader("file.txt", "UTF-16"));

Constructing the above reader will get you a Reader that decodes the file as UTF-16 rather than with the platform default charset.

Points to remember:
- the number of bytes per character depends on the character encoding, and in some encodings it varies from one character to the next.
- FileReader uses the platform default encoding, which might not match the file's actual encoding and can vary from platform to platform.

Peter

rhayward
Joined: 2009-02-02

Thanks for your help Peter, I've got a better idea now of the possibilities available.

> e.g. Reader reader = new InputStreamReader(new FileReader("file.txt", "UTF-16"));

I had to use this to make it work:

InputStreamReader inputStream = new InputStreamReader(new FileInputStream("file.txt"), "UTF-16");
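For anyone who wants to reproduce the loop counts without a file on disk, here is a rough sketch that runs the same read loop over an in-memory byte array. The class name, method names and sample text are made up for illustration. ISO-8859-1 stands in for the Cp1252 default (both are single-byte encodings), and the sample is encoded as UTF-16LE without a byte-order mark, whereas a file saved by Notepad as "Unicode" would normally start with one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadCounts {
    // Loop iterations when reading raw bytes, as in CopyBytes.
    static int countByteReads(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loop iterations when reading characters decoded with the given charset,
    // as in CopyCharacters with an explicit encoding.
    static int countCharReads(byte[] data, Charset charset) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 24 characters of sample text, encoded as 48 bytes of UTF-16LE (no byte-order mark).
        byte[] data = "In Xanadu did Kubla Khan".getBytes(StandardCharsets.UTF_16LE);

        System.out.println(countByteReads(data));                              // 48
        // Single-byte charset (stand-in for Cp1252): one character per byte.
        System.out.println(countCharReads(data, StandardCharsets.ISO_8859_1)); // 48
        // Matching charset: two bytes per character for this text.
        System.out.println(countCharReads(data, StandardCharsets.UTF_16LE));   // 24
    }
}
```

With a single-byte charset the character loop runs once per byte, which matches the 300 iterations seen in the original question. Only when the decoder matches the file's encoding does the count drop to roughly half the byte length.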

Regards,
Richard