Hi, dear monkeys. Long time no see.
Problem
I have a problem exporting a 3D scene to a j3o file.
When I give a Light a name containing Chinese characters, the exported j3o file can't store the name as UTF-8.
Here is the log:
WARNING: Your export has been saved with an incorrect encoding for its String fields which means it might not load correctly due to encoding issues. You should probably re-export your work. See ISSUE 276 in the jME issue tracker.
Nov 27, 2020 11:24:12 AM com.jme3.export.binary.BinaryInputCapsule readString
I read the source code of BinaryInputCapsule#readString and found that it has an error when checking UTF-8 data.
Take the 3-byte UTF-8 sequence [0xE4, 0x8A, 0xBC]: when b = 0xE4 (1110 0100), it is treated as the start of a 2-byte sequence.
See this part:
if (b < 0x80) {
    // good
}
else if ((b & 0xC0) == 0xC0) {// (0xE4 & 0xC0) == 0xC0 =====> true
    utf8State = UTF8_2BYTE;
}
else if ((b & 0xE0) == 0xE0) {// (0xE4 & 0xE0) == 0xE0 =====> true, but never reached
    utf8State = UTF8_3BYTE_1;
}
else {
    utf8State = UTF8_ILLEGAL;
}
So 3-byte UTF-8 data will always be treated as 2-byte UTF-8 data, because every lead byte of the form 1110xxxx already matches the first mask test.
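The misclassification can be reproduced in isolation. The sketch below is not jME code: the class and method names are hypothetical, and it only replays the mask tests from the snippet above, in the same order, against the lead byte 0xE4.

```java
final class OldLeadByteCheck {
    // Hypothetical helper that mirrors the order of the mask tests quoted above.
    static String classify(int b) {
        if (b < 0x80) {
            return "1-byte";
        } else if ((b & 0xC0) == 0xC0) {
            // Matches every byte whose top two bits are 11, including 1110xxxx.
            return "2-byte";
        } else if ((b & 0xE0) == 0xE0) {
            // Unreachable: any byte passing this test already passed the one above.
            return "3-byte";
        } else {
            return "illegal";
        }
    }

    public static void main(String[] args) {
        // 0xE4 is the lead byte of a 3-byte sequence, yet it is reported as 2-byte.
        System.out.println(classify(0xE4)); // prints "2-byte"
    }
}
```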
Bugfix
UTF-8 encodes data in this way:
bytes | encoding |
---|---|
1 | 0xxxxxxx |
2 | 110xxxxx 10xxxxxx |
3 | 1110xxxx 10xxxxxx 10xxxxxx |
4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
5 | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
6 | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
Which means:
- For 2-byte data, the code should check that the top 3 bits of the first byte are 110 (110xxxxx) and that the top 2 bits of the following byte are 10 (10xxxxxx).
- For 3-byte data, the code should check that the top 4 bits of the first byte are 1110 (1110xxxx) and that the top 2 bits of each following byte are 10 (10xxxxxx).
- …
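The rules above translate directly into mask comparisons. Here is a minimal sketch, assuming hypothetical helper names (`sequenceLength`, `isContinuation`) that do not exist in jME. It stops at 4-byte sequences, since RFC 3629 limits UTF-8 to 4 bytes, and it does not reject overlong encodings.

```java
final class Utf8Masks {
    /** Returns the sequence length announced by a lead byte (1 to 4), or -1 if b is not a lead byte. */
    static int sequenceLength(int b) {
        b &= 0xFF;                          // tolerate sign-extended byte values
        if ((b & 0x80) == 0x00) return 1;   // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2;   // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;   // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;   // 11110xxx
        return -1;                          // 10xxxxxx or a longer, no longer valid form
    }

    /** Returns true if b has the continuation-byte pattern 10xxxxxx. */
    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        int[] sample = {0xE4, 0x8A, 0xBC};
        System.out.println(sequenceLength(sample[0]));                               // prints 3
        System.out.println(isContinuation(sample[1]) && isContinuation(sample[2])); // prints true
    }
}
```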
This is the fix:
if (b < 0x80) {
    // good
}
else if ((b & 0xE0) == 0xC0) {// (0xE4 & 0xE0) == 0xC0 =====> false
    utf8State = UTF8_2BYTE;
}
else if ((b & 0xF0) == 0xE0) {// (0xE4 & 0xF0) == 0xE0 =====> true
    utf8State = UTF8_3BYTE_1;
}
else {
    utf8State = UTF8_ILLEGAL;
}
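One way to sanity-check the corrected masks is to run them over bytes that the JDK itself encodes. The sketch below is a standalone approximation, not the actual readString state machine: `looksLikeUtf8` is a hypothetical name, the 4-byte branch is an addition that is not part of the fix above, and overlong encodings are not rejected. The test string is written with Unicode escapes (two CJK characters) so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

final class FixedLeadByteCheck {
    /** Walks the array with the corrected masks and reports whether every sequence is complete. */
    static boolean looksLikeUtf8(byte[] data) {
        int pending = 0; // continuation bytes still expected for the current sequence
        for (byte raw : data) {
            int b = raw & 0xFF;
            if (pending > 0) {
                if ((b & 0xC0) != 0x80) return false; // expected 10xxxxxx
                pending--;
            } else if (b < 0x80) {
                // single-byte character, nothing to track
            } else if ((b & 0xE0) == 0xC0) {
                pending = 1; // 110xxxxx
            } else if ((b & 0xF0) == 0xE0) {
                pending = 2; // 1110xxxx
            } else if ((b & 0xF8) == 0xF0) {
                pending = 3; // 11110xxx (assumed extension, not in the fix above)
            } else {
                return false; // stray continuation byte or unsupported lead byte
            }
        }
        return pending == 0; // a truncated sequence leaves pending > 0
    }

    public static void main(String[] args) {
        byte[] name = "\u706f\u5149".getBytes(StandardCharsets.UTF_8); // two 3-byte characters
        System.out.println(name.length);         // prints 6
        System.out.println(looksLikeUtf8(name)); // prints true
    }
}
```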