
JavaMail converts UTF-8 chars to question marks

olafos
Joined: 2007-09-18

Hi,

I'm working with glassfish-v2ur1-b09 and use JavaMail to send UTF-8 encoded emails from my application. Until recently everything worked fine, but now every non-ASCII character in my emails is replaced by a single '?'. I know this is how such characters are displayed in consoles without UTF-8 support, but that is not the case here. Here's the method I use to send an email:

--

// Imports used by the method below; the enclosing class, log, props,
// SMTPAuthenticator, EmailAttachment and the exception types are my own.
import java.util.Arrays;
import java.util.Date;
import javax.activation.DataHandler;
import javax.mail.*;
import javax.mail.internet.*;

public static final String DEFAULT_ENCODING = "UTF-8";

public void sendEmail(final Address from, final Address[] to, final Address[] bcc, final Address[] cc,
        final Address[] replyTo, final String subject, final String message,
        final EmailAttachment[] attachments, final boolean asHtml) throws SystemException {
    log.finest("sendEmail - start");
    log.finest(": from = " + from);
    // Arrays.toString logs the elements instead of the array reference
    log.finest(": to = " + Arrays.toString(to));
    log.finest(": bcc = " + Arrays.toString(bcc));
    log.finest(": cc = " + Arrays.toString(cc));
    log.finest(": replyTo = " + Arrays.toString(replyTo));
    log.finest(": subject = " + subject);
    log.finest(": message length = " + message.length());
    log.finest(": attachments = " + Arrays.toString(attachments));
    try {
        // create session
        final Session session = Session.getInstance(this.props, new SMTPAuthenticator());
        // create a message
        final MimeMessage mimeMessage = new MimeMessage(session);
        mimeMessage.setFrom(from);
        mimeMessage.setRecipients(Message.RecipientType.TO, to);
        mimeMessage.setRecipients(Message.RecipientType.BCC, bcc);
        mimeMessage.setRecipients(Message.RecipientType.CC, cc);
        mimeMessage.setReplyTo(replyTo);
        mimeMessage.setSubject(subject, DEFAULT_ENCODING);
        mimeMessage.setSentDate(new Date());
        // create the text part
        final MimeBodyPart messageBodyPart = new MimeBodyPart();
        if (asHtml) {
            messageBodyPart.setContent(message, "text/html; charset=" + DEFAULT_ENCODING);
        } else {
            // debug: log every character together with its code
            final StringBuilder sb = new StringBuilder("{");
            for (char c : message.toCharArray()) {
                sb.append('\'').append(c).append('\'').append('=').append((int) c).append(", ");
            }
            sb.append("}");
            log.info(": message = " + sb.toString());
            messageBodyPart.setText(message, DEFAULT_ENCODING);
        }
        final Multipart multipart = new MimeMultipart();
        multipart.addBodyPart(messageBodyPart);
        // create attachment parts (separate variable, so the text part stays referenced)
        if (attachments != null) {
            for (final EmailAttachment attachment : attachments) {
                final MimeBodyPart attachmentPart = new MimeBodyPart();
                attachmentPart.setDataHandler(new DataHandler(attachment));
                attachmentPart.setFileName(attachment.getName());
                multipart.addBodyPart(attachmentPart);
            }
        }
        // add the Multipart to the message
        mimeMessage.setContent(multipart);
        mimeMessage.saveChanges();
        // header of the text part, not of the last attachment
        log.finest(": part Content-Transfer-Encoding = "
                + messageBodyPart.getHeader("Content-Transfer-Encoding", ";"));
        // send the message
        Transport.send(mimeMessage);
        log.finest("sendSuccessful");
    } catch (final AddressException ex) {
        throw new InvalidEmailException("000603", ex.getRef());
    } catch (final MessagingException ex) {
        throw new SystemException("000324", ex);
    }
    log.finest("sendEmail - end");
}

--

As you can see, just before invoking messageBodyPart.setText() I log the character codes of the message, and they are correct. Unfortunately, the message is sent with question marks instead.

This is an example message I sent:

message = {'K'=75, 'a'=97, 't'=116, 'e'=101, 'g'=103, 'o'=111, 'r'=114, 'i'=105, 'a'=97, ':'=58, ' '=32, 'p'=112, 'y'=121, 't'=116, 'a'=97, 'n'=110, 'i'=105, 'e'=101, ' '=32, 'l'=108, 'u'=117, 'b'=98, ' '=32, 'u'=117, 'w'=119, 'a'=97, 'g'=103, 'a'=97, '\n'=10,
'I'=73, 'm'=109, 'i'=105, '?'=281, ' '=32, 'i'=105, ' '=32, 'n'=110, 'a'=97, 'z'=122, 'w'=119, 'i'=105, 's'=115, 'k'=107, 'o'=111, ' '=32, 'u'=117, '?'=380, 'y'=121, 't'=116, 'k'=107, 'o'=111, 'w'=119, 'n'=110, 'i'=105, 'k'=107, 'a'=97, ':'=58, ' '=32, 'O'=79, 'l'=108, 'a'=97, 'f'=102, ' '=32, 'T'=84, 'o'=111, 'm'=109, 'c'=99, 'z'=122, 'a'=97, 'k'=107, '\n'=10,
'A'=65, 'd'=100, 'r'=114, 'e'=101, 's'=115, ' '=32, 'I'=73, 'P'=80, ':'=58, ' '=32, '1'=49, '5'=53, '3'=51, '.'=46, '1'=49, '9'=57, '.'=46, '1'=49, '2'=50, '8'=56, '.'=46, '2'=50, '3'=51, '4'=52, ' '=32, '('=40, '1'=49, '5'=53, '3'=51, '.'=46, '1'=49, '9'=57, '.'=46, '1'=49, '2'=50, '8'=56, '.'=46, '2'=50, '3'=51, '4'=52, ')'=41, '\n'=10,
'U'=85, 's'=115, 'e'=101, 'r'=114, ' '=32, 'a'=97, 'g'=103, 'e'=101, 'n'=110, 't'=116, ':'=58, ' '=32, 'M'=77, 'o'=111, 'z'=122, 'i'=105, 'l'=108, 'l'=108, 'a'=97, '/'=47, '5'=53, '.'=46, '0'=48, ' '=32, '('=40, 'X'=88, '1'=49, '1'=49, ';'=59, ' '=32, 'U'=85, ';'=59, ' '=32, 'L'=76, 'i'=105, 'n'=110, 'u'=117, 'x'=120, ' '=32, 'i'=105, '6'=54, '8'=56, '6'=54, ';'=59, ' '=32, 'p'=112, 'l'=108, ';'=59, ' '=32, 'r'=114, 'v'=118, ':'=58, '1'=49, '.'=46, '8'=56, '.'=46, '1'=49, '.'=46, '1'=49, '4'=52, ')'=41, ' '=32, 'G'=71, 'e'=101, 'c'=99, 'k'=107, 'o'=111, '/'=47, '2'=50, '0'=48, '0'=48, '8'=56, '0'=48, '4'=52, '2'=50, '8'=56, ' '=32, 'F'=70, 'i'=105, 'r'=114, 'e'=101, 'f'=102, 'o'=111, 'x'=120, '/'=47, '2'=50, '.'=46, '0'=48, '.'=46, '0'=48, '.'=46, '1'=49, '4'=52, '\n'=10,
'\n'=10,
'T'=84, 'r'=114, 'e'=101, '?'=347, '?'=263, ' '=32, 'w'=119, 'i'=105, 'a'=97, 'd'=100, 'o'=111, 'm'=109, 'o'=111, '?'=347, '?'=263, 'i'=105, ':'=58, ' '=32, '\n'=10,
'T'=84, 'e'=101, 's'=115, 't'=116, ' '=32, '['=91, '?'=379, '?'=211, '?'=321, '?'=262, ']'=93, ' '=32, 'T'=84, 'e'=101, 's'=115, 't'=116, '\n'=10, }

Although the characters look like question marks here, their codes are correct. When I receive this email, save it to disk and then read it again to check the character codes, I actually get code 63 (question mark) where the non-ASCII characters are supposed to be.
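For readers puzzled by how correct character codes can end up as byte 63: Java's String-to-bytes conversion never fails on characters the target charset cannot represent; it silently substitutes the charset's replacement byte, which is '?' for both US-ASCII and ISO-8859-1. The sketch below (class and method names are hypothetical) shows this on "Imię". It is only an assumption that something in the sending path did such a conversion here, but it reproduces the symptom exactly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {

    // Encodes the text with the given charset. String.getBytes(Charset) does not
    // throw for unmappable characters; it substitutes the charset's replacement
    // bytes, which is '?' (63) for US-ASCII and ISO-8859-1.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String polish = "Imi\u0119"; // "Imię"; 'ę' is U+0119 (281)
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            StringBuilder sb = new StringBuilder();
            for (byte b : encode(polish, cs)) {
                sb.append(b & 0xFF).append(' '); // print unsigned byte values
            }
            System.out.println(cs.name() + ": " + sb.toString().trim());
        }
    }
}
```

With US-ASCII and ISO-8859-1 the last byte is 63, while UTF-8 produces the two bytes 196 153 (0xC4 0x99) for 'ę'.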

This is the body of the message part:

------=_Part_15_11258709.1212559151359
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Kategoria: pytanie lub uwaga
Imi? i nazwisko u?ytkownika: Olaf Tomczak
Adres IP: 153.19.128.234 (153.19.128.234)
User agent: Mozilla/5.0 (X11; U; Linux i686; pl; rv:1.8.1.14) Gecko/20080428 Firefox/2.0.0.14

Tre?? wiadomo??i:
Test [????] Test

------=_Part_15_11258709.1212559151359--

Even more interesting, the same code used to work on this machine (I didn't change the GlassFish or server configuration). The code also works well on other machines, and when I wrote a standalone application that uses JavaMail with the same method to send email and ran it, it worked fine too.

What might be the cause of this behaviour?!

Thanks a lot,
Olaf Tomczak

shannon
Joined: 2003-06-10

(In the future you'll probably have better luck with JavaMail questions in the JavaMail forum:
http://forum.java.sun.com/forum.jspa?forumID=43)

How do you save the message to disk and how do you read it back?

Is the message correct on disk? The characters should be in UTF-8 format on disk,
and probably encoded by JavaMail using quoted-printable or base64 encoding.
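To recognize such an encoded body on disk, here is a rough sketch of quoted-printable encoding applied to UTF-8 text. It is a simplified, hypothetical helper for illustration, not JavaMail's own encoder (JavaMail applies the transfer encoding itself when the message is written).

```java
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    // Minimal quoted-printable encoder in the spirit of RFC 2045, section 6.7.
    // Simplifications: '=' and every byte outside printable ASCII become =XX,
    // line breaks are passed through unchanged, and neither trailing whitespace
    // nor the 76-character soft line breaks are handled.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v == '\n' || v == '\r' || (v >= 32 && v <= 126 && v != '=')) {
                out.append((char) v);
            } else {
                out.append('=').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Imi\u0119 i nazwisko"));
    }
}
```

With this sketch, "Imię i nazwisko" becomes "Imi=C4=99 i nazwisko": the two UTF-8 bytes of 'ę' show up as =C4=99. A correctly sent message would contain sequences like these (or a base64 block), not a literal '?'.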

What locale are you running in on your machine? Did your locale setting change recently,
and is it different on the other machines where this works correctly?
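One way to answer the locale question from inside the JVM, rather than from a shell whose environment may differ from the server process (for example when the server is started by a service script), is to log the relevant settings. The class name is hypothetical; the same expressions could just as well be logged from the application running in GlassFish.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class EncodingReport {

    // Collects the JVM-level settings that decide which charset is used when
    // code converts between bytes and Strings without naming a charset.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", locale=" + Locale.getDefault()
                + ", user.language=" + System.getProperty("user.language");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```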

olafos
Joined: 2007-09-18

Thanks shannon,

The encoding on the machine is UTF-8 and it did not change. I saved the message with Mozilla Thunderbird right after receiving it from my server. Then I read the message using FileReader (again with UTF-8 encoding), only to make sure that the question marks visible in the message really are question marks and not characters Thunderbird couldn't display. I'm pretty confident that the message is already sent with question marks instead of the non-ASCII characters. What's quite surprising is that on my other machines JavaMail uses quoted-printable transfer encoding for the same message, while here it uses 7bit, which I thought should only be used with the US-ASCII charset.
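The 7bit observation is a strong clue. The simplified sketch below models the kind of check a MIME library makes when choosing a Content-Transfer-Encoding; the thresholds are illustrative assumptions, not JavaMail's actual rules, and the line-length limits that real 7bit data must also respect are ignored. Under this model 7bit is only chosen when every byte is ASCII, which is what you would expect if the Polish characters had already become '?' before the transfer encoding was chosen.

```java
import java.nio.charset.StandardCharsets;

public class TransferEncodingGuess {

    // Hypothetical heuristic: pure ASCII -> 7bit; more ASCII than non-ASCII
    // bytes -> quoted-printable; otherwise base64.
    static String choose(byte[] data) {
        int nonAscii = 0;
        for (byte b : data) {
            if ((b & 0xFF) > 127) {
                nonAscii++;
            }
        }
        if (nonAscii == 0) {
            return "7bit";
        }
        return nonAscii < data.length - nonAscii ? "quoted-printable" : "base64";
    }

    public static void main(String[] args) {
        String original = "Tre\u015B\u0107 wiadomo\u015Bci"; // "Treść wiadomości"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] degraded = original.getBytes(StandardCharsets.US_ASCII); // non-ASCII -> '?'
        System.out.println("UTF-8 bytes:    " + choose(utf8));
        System.out.println("after '?' loss: " + choose(degraded));
    }
}
```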

I'll try to investigate this further.

Thanks for your help
Olaf Tomczak

shannon
Joined: 2003-06-10

The message takes a path through many pieces of software before it comes
back to you and is stored on your disk. Any of those pieces of software could
be causing the problem. The challenge is to isolate the cause of the problem
to one of those pieces of software.

Before you send the message, or instead of sending the message, do a
msg.writeTo(new FileOutputStream("msg.txt"));

That's the message that JavaMail is sending to your server. If that message
is correct, the problem is not in JavaMail.

Generally a message stored on disk is *not* going to use the UTF-8 charset.
Usually it will use the ASCII charset and any non-ASCII body parts will be encoded
(e.g., using quoted-printable or base64) into ASCII characters. Possibly Thunderbird
is saving the message using the OS charset and is saving the body parts in 8bit
format. If you read such a message using the wrong charset, you'll get the wrong
data.
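The difference between a wrongly decoded file and a file that really contains '?' can be shown in a few lines. In this minimal sketch (names are hypothetical), decoding UTF-8 bytes with the wrong charset produces garbled characters, while a genuine '?' byte (63) decodes as '?' in any ASCII-compatible charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetRead {

    // Decodes raw bytes (e.g. the contents of a saved message) with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] saved = "Imi\u0119".getBytes(StandardCharsets.UTF_8);
        // Correct charset: the original text comes back.
        System.out.println(decode(saved, StandardCharsets.UTF_8));
        // Wrong charset: 'ę' turns into two unrelated characters, not into '?'.
        System.out.println(decode(saved, StandardCharsets.ISO_8859_1));
        // A real question mark stays a question mark either way.
        byte[] lost = "Imi?".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(lost, StandardCharsets.UTF_8));
    }
}
```

For a file on disk, the raw bytes can be loaded with Files.readAllBytes(Paths.get("msg.txt")). Note that FileReader always uses the platform default charset (a constructor taking a Charset only appeared in Java 11), so naming the charset explicitly, e.g. new InputStreamReader(new FileInputStream(file), "UTF-8"), is the safer way to read text back.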

If you still can't figure this out, send me a copy of the message you send
(msg.txt above) and a copy of the message you receive. Send them to
javamail@sun.com and I'll help you figure out what's wrong.

olafos
Joined: 2007-09-18

Thanks,

Actually, I managed to solve my problem by explicitly setting the JVM encoding to UTF-8 with the file.encoding system property (-Dfile.encoding=UTF-8). This is strange, since the server encoding didn't change and I didn't use this property before. Also, the standalone application worked on my server without it. Anyway, thanks for the tip on how to save the message sent by JavaMail on the server side; I was looking for a way to do that and couldn't figure it out.

Best regards
Olaf Tomczak
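Since -Dfile.encoding=UTF-8 fixed it, a cheap safeguard is to check the default charset when the application starts, so that a server launched with a different locale is noticed before mail goes out. This is a hypothetical sketch: it only makes the misconfiguration visible and does not identify which component depends on the default charset. On Java 18 and later, UTF-8 is the default charset by default (JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

public class CharsetStartupCheck {

    private static final Logger LOG = Logger.getLogger(CharsetStartupCheck.class.getName());

    // True when the JVM's default charset is UTF-8. On older JVMs the default
    // follows the OS locale unless -Dfile.encoding=UTF-8 is passed at startup.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        if (!defaultIsUtf8()) {
            LOG.warning("Default charset is " + Charset.defaultCharset()
                    + "; code relying on it may turn non-ASCII text into '?'."
                    + " Consider starting the JVM with -Dfile.encoding=UTF-8.");
        } else {
            LOG.info("Default charset is UTF-8");
        }
    }
}
```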