Posted by joconner
on March 29, 2007 at 2:36 AM PDT
The internet's core infrastructure, including domain name servers and name resolvers, just doesn't handle non-ASCII characters very well. That's why java.net.IDN is so useful.
The Java SE 6 release provides an interesting new class: java.net.IDN. It's small, simple...very focused on a single task. That task has two parts:
- to convert domain names containing practically any Unicode character to an ASCII Compatible Encoding (ACE), or
- to convert ACE names back into their full Unicode (UTF-16) form
To support these two operations, not surprisingly, the class has two static methods:
- The toASCII method converts non-ASCII Unicode characters to an ACE form using an algorithm called punycode. Yeah, I snickered at the name too. The results always look surprising, but don't worry...the algorithm is well defined, so it produces the same results every time. So, for example, if you want to use the domain name 日本語.jp, the toASCII method would produce its ACE equivalent.
- The toUnicode method converts the ACE name back to its original form.
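Here's a minimal sketch of the round trip through the two methods. The class name IdnRoundTrip and the helper names toAce and toDisplay are made up for this illustration; only IDN.toASCII and IDN.toUnicode come from java.net.IDN.

```java
import java.net.IDN;

// Hypothetical demo class; only IDN.toASCII and IDN.toUnicode are real JDK API.
class IdnRoundTrip {

    // Convert a Unicode domain name to its ASCII Compatible Encoding (ACE).
    static String toAce(String unicodeName) {
        return IDN.toASCII(unicodeName);
    }

    // Convert an ACE name back to its Unicode form.
    static String toDisplay(String aceName) {
        return IDN.toUnicode(aceName);
    }

    public static void main(String[] args) {
        // 日本語.jp written with escapes so the source file stays plain ASCII.
        String original = "\u65e5\u672c\u8a9e.jp";

        String ace = toAce(original);
        String back = toDisplay(ace);

        // The encoded label carries the "xn--" ACE prefix; the "jp" label is unchanged.
        System.out.println(ace);
        // The round trip restores the original name.
        System.out.println(back.equals(original)); // prints true
    }
}
```

Labels that are already plain ASCII, like "jp" here, pass through toASCII untouched, so it's safe to call on any host name before you hand it off.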
So why do you need this? It turns out that the internet's core infrastructure, including domain name servers and name resolvers, just doesn't handle non-ASCII characters very well. At the very least, it's safe to say that it doesn't purposefully support non-ASCII characters. However, people want the bigger Unicode character range for their domain names. So, RFC 3490 allows for internationalized, Unicode names...but with a hitch. We have to pass ACE names to DNS and the name resolvers. Your apps can display 日本語.jp, but those same apps have to convert to ACE when they pass the name off to DNS, etc. So that's it. That's why java.net.IDN is useful.
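The display-versus-lookup split might look something like the sketch below. It assumes the app keeps the Unicode name for the user interface and converts only at the moment it talks to the resolver. IdnLookup, hostForDns, and resolve are hypothetical names for this example; IDN.toASCII and InetAddress.getByName are the real JDK calls.

```java
import java.net.IDN;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class showing where the ACE conversion would sit in an app.
class IdnLookup {

    // Produce the ASCII-only form to hand to DNS.
    // IDN.toASCII throws IllegalArgumentException if the name can't be converted.
    static String hostForDns(String displayName) {
        return IDN.toASCII(displayName);
    }

    // Resolve a name the user typed in Unicode by converting it to ACE first.
    static InetAddress resolve(String displayName) throws UnknownHostException {
        return InetAddress.getByName(hostForDns(displayName));
    }

    public static void main(String[] args) {
        String displayName = "\u65e5\u672c\u8a9e.jp"; // 日本語.jp, kept for the UI
        System.out.println("Looking up: " + hostForDns(displayName));
        try {
            System.out.println(resolve(displayName));
        } catch (UnknownHostException e) {
            // No network, or the name doesn't resolve from this machine.
            System.out.println("Could not resolve: " + e.getMessage());
        }
    }
}
```

Keeping the conversion in one small method means the rest of the app can work with the readable Unicode name and never needs to see the ACE form.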
Java SE 6 has several new internationalization features. IDN support is just one. To read more about this and other new i18n features, take a look at the article International Enhancements in Java SE 6.