strings, fonts problem

3 replies [Last post]
pl2
Offline
Joined: 2009-08-19
Points: 0

davyp
Offline
Joined: 2007-01-03
Points: 0

> A Java string in a language other than English has the wrong length.
> For example, the Greek string "Είσοδος" (entry)
> has length 14 instead of 7.
> The string "Entry" has the correct length 5.
>
> For fonts the workaround is bitmap fonts, but it
> requires a lot of rewriting. Too bad, because the VM
> is very fast and stable.

I have read your post on two different machines. My laptop shows the
seven Greek letters, but on my desktop I see a garbage string
of 14 symbols, so this looks like a character encoding issue, perhaps
combined with missing font support.
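
The 7 versus 14 count is what you would expect if the UTF-8 bytes of the string were being read as one character per byte. A minimal desktop Java SE sketch of that effect follows. It is not phoneME code: the class name Utf8LengthDemo and the helper misdecode are made up for illustration, and java.nio.charset is not part of CLDC.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {
    // Encode as UTF-8, then decode the bytes as ISO-8859-1 (one char per byte).
    // This mimics a reader that treats UTF-8 source bytes as single-byte text.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Είσοδος" written with Unicode escapes so the source file encoding does not matter.
        String entry = "\u0395\u03AF\u03C3\u03BF\u03B4\u03BF\u03C2";
        System.out.println(entry.length());              // 7
        System.out.println(misdecode(entry).length());   // 14: each Greek letter is 2 UTF-8 bytes
        System.out.println(misdecode("Entry").length()); // 5: ASCII is 1 byte per character
    }
}
```

If the string literal comes from a source file, one thing worth checking is whether the compiler was told the file's encoding (for javac, the -encoding option), or whether the literal can be written with \uXXXX escapes instead.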

If you compile your own VM, then you can change the bitmap fonts
by using the scripts in phoneme/midp/src/tool/fontgen/ and include
whatever Unicode characters you like.

Davy

davyp
Offline
Joined: 2007-01-03
Points: 0

I have noticed something odd when printing a string with Unicode characters using the built-in bitmap
font: \u00A9\u00AE\u20AC (copyright sign ©, registered sign ®, euro €)

If I compile the VM with eVC4 for WM2003 it prints the first two characters correctly but not the
euro sign, because I only included the first 256 characters in the bitmap font. But if I compile
the same VM source code for WM5 with VS2005, I get rubbish. I traced the problem to some
code in midp/src/lowlevelui/putpixel_port/wince/native/font.c by adding a debug statement in
drawCharImpl() that prints the character c and its numeric code.

For the eVC4 build it prints the same numbers as above: 0xA9, 0xAE and 0x20AC, but for the
VS2005 build, drawCharImpl() is called twice for ©, twice for ®, and
three times for €. It turns out that the UTF-8 encoding is causing the problem. According to
http://en.wikipedia.org/wiki/UTF-8, the euro sign is encoded as 0xE2,0x82,0xAC, and the
three drawCharImpl() calls I get for the euro symbol have the numeric char values
0xFFE2,0xFF82,0xFFAC. The lower bytes match, and I guess the 0xFF is used as a marker.

For the copyright sign © I get 0xFFC2, 0xFFA9, and for the registered sign ® I get 0xFFC2,
0xFFAE.
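
One possible explanation for the 0xFF prefix, offered as an assumption and not something traced in the VM source, is sign extension: every UTF-8 byte of a non-ASCII character has its top bit set, so widening it as a signed 8-bit value to 16 bits fills the upper byte with ones. The sketch below reproduces the euro values in desktop Java, where byte is signed; the class name SignExtensionDemo and the method widenSigned are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class SignExtensionDemo {
    // Widen each UTF-8 byte to 16 bits the way a signed 8-bit value is widened.
    static int[] widenSigned(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[utf8.length];
        for (int i = 0; i < utf8.length; i++) {
            // byte -> char sign-extends first, so 0xE2 becomes 0xFFE2.
            out[i] = (char) utf8[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Euro sign: UTF-8 bytes 0xE2, 0x82, 0xAC.
        for (int v : widenSigned("\u20AC")) {
            System.out.printf("0x%04X%n", v); // 0xFFE2, 0xFF82, 0xFFAC
        }
    }
}
```

If that is what happens, the string is already split into UTF-8 bytes before it reaches drawCharImpl(), and masking with 0xFF would only recover the raw bytes, not the original code points.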

I don't know where the string representation within the VM goes funky, but it could explain why
you get the wrong string length.

Davy

sergio_n
Offline
Joined: 2006-10-16
Points: 0

How did you get such a string? It seems to store a keyboard sequence instead of the UTF character code.