Posted by joconner
on August 4, 2008 at 10:11 PM PDT
When encoding your Java source files, you have more than two options. NetBeans does it one way, Eclipse another, and now I have a third.
I reported that NetBeans 6.1's project charset encoding feature would allow an unsuspecting user to destroy file data. That's still true...through no fault of NetBeans, really. It's just a matter of fact -- if you start out with UTF-8 and convert your project files to ASCII, ISO-8859-1, or any other charset that covers only a subset of Unicode, you will lose any characters that are not also in the target charset.
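To make that loss concrete, here is a minimal sketch in plain Java. It is not what NetBeans does internally; the class name `LossyConversionDemo` and the helper `roundTrip` are my own, made up for illustration. It simply encodes a string in a target charset and decodes it again, which is roughly what happens when a file is rewritten in a new encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any IDE.
class LossyConversionDemo {

    // Encode the text in the target charset, then decode it again.
    // String.getBytes(Charset) substitutes the charset's replacement
    // byte for any character the charset cannot represent.
    static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    public static void main(String[] args) {
        // The accented e is written as an escape so this file is ASCII-safe.
        String original = "Jos\u00E9";

        // UTF-8 and ISO-8859-1 both contain the accented e, so nothing is lost.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1).equals(original));

        // ASCII does not contain it, so the character is replaced.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII));
    }
}
```

Run as written, this prints `true`, `true`, and then `Jos?`. Once the file is saved that way, the original character is gone for good.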
But NetBeans isn't going to let you hang yourself, at least not without warning you first. NetBeans 6.5M1 has added a warning dialog that alerts you when you change from the default UTF-8 encoding. Now instead of blindly following your request to change charsets, NetBeans will tell you the following:
And, of course, you then have the option to cancel your setting before saving it. Good, very good.
Just out of curiosity, I tried the same thing in Eclipse. When I tried to save, Eclipse said this:
Eclipse would not allow me to save the file until I changed the encoding back to Cp1252...or, as it suggests, until I removed the offending character. That's certainly one reasonable way to approach the problem.
There is a 3rd way to do this, one that I like slightly better. One could simply \uXXXX encode the characters that are not in the target charset. So, for example, if you start in Cp1252, type the word "José" as a String or variable, then change the project to ASCII encoding, the IDE could simply \u-encode the string as "Jos\u00E9". Better? Maybe. After all, the IDE doesn't have to display "Jos\u00E9" to you. It could continue to display "José" in the editor regardless of the underlying "encoding" of the character. After all, when you edit your file, you don't typically care if the file is UTF-8, or UTF-16, ASCII, or even \uXXXX -- just as long as the characters display correctly and are not lost.
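Here is a rough sketch of how an IDE might do that third option, assuming it uses the standard `java.nio.charset.CharsetEncoder` to ask whether the target charset can represent each character. The class `UnicodeEscaper` and the method `escapeUnmappable` are hypothetical names of mine, not an API from NetBeans or Eclipse.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from any IDE.
class UnicodeEscaper {

    // Returns the text with every character the target charset cannot
    // represent rewritten as a Java Unicode escape. Characters the
    // charset can represent are copied through unchanged.
    static String escapeUnmappable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Work one code point at a time so surrogate pairs stay together.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String unit = text.substring(i, i + length);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                // Emit one escape per UTF-16 code unit, which is the
                // form the Java compiler expects in source files.
                for (int j = 0; j < unit.length(); j++) {
                    out.append(String.format("\\u%04X", (int) unit.charAt(j)));
                }
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "Jos\u00E9";
        System.out.println(escapeUnmappable(original, StandardCharsets.US_ASCII));
        System.out.println(escapeUnmappable(original, StandardCharsets.ISO_8859_1));
    }
}
```

With ASCII as the target, the first line printed is the escaped form from the example above, with the accented character replaced by its six-character escape; with ISO-8859-1 the string comes back unchanged. A real implementation would also need the reverse mapping, so that the editor can keep showing "José" while the file on disk holds the escape.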
What do you think of these options -- 1, 2, or 3? Or do you prefer something else from your IDE of choice?