Converting your UTF-8 to a Unicode code point value, I get U+267CC, which is definitely in the supplementary range (above U+FFFF). It is a completely valid Unicode character, and supplementary characters are supported in Java SE 5 and higher. What version of Java are you using? Maybe you are on an older version of the Java platform, one that doesn't quite grok the character. Can you try a slightly less ambitious character, perhaps one at or below U+FFFF? Let's see how your app works then, and then we'll re-evaluate the problem.
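To check whether a given runtime handles the character, a small test along these lines may help. It is only a sketch: the class and method names (SupplementaryCheck, describe) are made up for illustration, and it relies on the code point methods that were added to Character and String in Java SE 5.

```java
public class SupplementaryCheck {
    // Builds a String from one code point and reports how Java sees it.
    // "chars" counts UTF-16 code units; "codePoints" counts real characters.
    public static String describe(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return String.format("U+%04X supplementary=%b chars=%d codePoints=%d",
                s.codePointAt(0),
                Character.isSupplementaryCodePoint(codePoint),
                s.length(),
                s.codePointCount(0, s.length()));
    }

    public static void main(String[] args) {
        // The character from this thread: stored as a surrogate pair (2 chars).
        System.out.println(describe(0x267CC));
        // A BMP character for comparison: stored as a single char.
        System.out.println(describe(0x6F22));
    }
}
```

If the first line reports two chars but one code point, the platform itself is handling the supplementary character and the problem is more likely somewhere else in the pipeline.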
I'm also having some problems that sound similar to this.
I'm submitting data from a web page, and when I use Japanese characters I'm seeing the following received in the debugger:
\u0006F22\u0005B57 (for: 漢字)
This is UTF-16, but I'm sending this data to a dotnet application that is expecting UTF-8. How can I do this conversion? I've tried:
String utf8Body = new String(request.getBody().getBytes(), "UTF-8");
But this only serves to mangle the String once received. I'm new to internationalization issues, so any help is greatly appreciated.
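For what it's worth, here is a minimal sketch of one way to do that conversion. A Java String is always UTF-16 in memory, so there is no such thing as a "UTF-8 String"; the encoding only matters at the point where the String becomes bytes. Calling getBytes() with no argument uses the platform default charset, and decoding those bytes as UTF-8 corrupts the text whenever the default is something else. The class and method names below are hypothetical, the sketch assumes the request body has already been decoded into a correct String, and StandardCharsets requires Java 7 or later (on older versions, getBytes("UTF-8") does the same job).

```java
import java.nio.charset.StandardCharsets;

public class Utf8BodyExample {
    // Encode the String as UTF-8 bytes; these bytes are what goes on the wire.
    public static byte[] toUtf8(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String (what the receiver would do).
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "\u6F22\u5B57";   // the two characters from the post
        byte[] wire = toUtf8(body);     // send these bytes, not a re-built String
        System.out.println(wire.length + " bytes");
        System.out.println("round trip ok: " + fromUtf8(wire).equals(body));
    }
}
```

The important part is to write the byte array to the outgoing stream (and declare charset=UTF-8 in the Content-Type, if the transport has one) rather than wrapping the bytes back into another String.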
Most Japanese characters encode as 3 bytes in UTF-8. For example, the character 漢 (KAN, U+6F22) encodes as the three UTF-8 code units E6 BC A2. If you have Japanese characters that encode as four UTF-8 code units, you must be using characters above the Basic Multilingual Plane (supplementary characters). Maybe you are encoding the characters incorrectly? Are you really using supplementary characters?
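A quick way to see the byte counts for yourself is to dump the UTF-8 encoding of a code point as hex. This is just an illustrative sketch; Utf8Hex and utf8Hex are invented names, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Hex {
    // Returns the UTF-8 encoding of a single code point as space-separated hex.
    public static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0x6F22));   // BMP character: E6 BC A2
        System.out.println(utf8Hex(0x267CC));  // supplementary: F0 A6 9F 8C
    }
}
```

Anything at or below U+FFFF comes out as at most three bytes, while a supplementary code point such as U+267CC always comes out as four.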