Optimize branch structure of UTF-8 decoder routine

I like the asm which gcc -O3 generates on this modified code...
and guess what: my CPU likes it too!

(The asm is noticeably tighter, without any extra operations in the
path which dispatches to the code for decoding a 1-byte, 2-byte,
3-byte, or 4-byte character. It's just CMP, conditional jump, CMP,
conditional jump, CMP, conditional jump.

...Though I was admittedly impressed to see gcc could implement the
boolean expression `c >= 0xC2 && c <= 0xDF` with just 3 instructions:
add, CMP, then conditional jump. Pretty slick stuff there, guys.)
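
The two observations above can be checked exhaustively, since a lead byte has only 256 possible values. The following is a minimal sketch, not code from the patched file: the helper names `utf8_len_two_sided`, `utf8_len_ladder`, `in_range_subtract` and `count_mismatches` are hypothetical. It assumes the "add" gcc emits corresponds to the usual subtract-and-compare-unsigned idiom for a range check; the actual instruction selection depends on compiler version and target.

```c
/* Sequence length implied by lead byte c, or 0 if c cannot start a
 * valid UTF-8 sequence. This version spells out both bounds of every
 * range, like the conditions before this commit. */
static int utf8_len_two_sided(unsigned char c)
{
    if (c < 0x80) return 1;
    else if (c >= 0xC2 && c <= 0xDF) return 2;
    else if (c >= 0xE0 && c <= 0xEF) return 3;
    else if (c >= 0xF0 && c <= 0xF4) return 4;
    else return 0;
}

/* Same classification as a ladder of single comparisons. Each branch
 * relies on the earlier branches having already excluded every smaller
 * value, so only the upper bound needs testing. */
static int utf8_len_ladder(unsigned char c)
{
    if (c < 0x80) return 1;
    else if (c < 0xC2) return 0;   /* stray continuation or overlong lead byte */
    else if (c <= 0xDF) return 2;
    else if (c <= 0xEF) return 3;
    else if (c <= 0xF4) return 4;
    else return 0;                 /* 0xF5..0xFF never appear in UTF-8 */
}

/* Range check done with one subtraction and one unsigned comparison:
 * values below 0xC2 wrap around to large numbers and fail the test. */
static int in_range_subtract(unsigned char c)
{
    return (unsigned char)(c - 0xC2) <= 0xDF - 0xC2;
}

/* Compare the variants over every possible byte value and return the
 * number of disagreements. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int i = 0; i < 256; i++) {
        unsigned char c = (unsigned char)i;
        if (utf8_len_two_sided(c) != utf8_len_ladder(c))
            mismatches++;
        if (in_range_subtract(c) != (c >= 0xC2 && c <= 0xDF))
            mismatches++;
    }
    return mismatches;
}
```

Calling `count_mismatches()` from a small test program returns 0, so the ladder form accepts and rejects exactly the same lead bytes as the two-sided conditions.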

Benchmark results:

UTF-8, short  - to UTF-16LE  faster by 7.36% (0.0001 vs 0.0002)
UTF-8, short  - to UTF-16BE  faster by 6.24% (0.0001 vs 0.0002)
UTF-8, medium - to UTF-16BE  faster by 4.56% (0.0003 vs 0.0003)
UTF-8, medium - to UTF-16LE  faster by 4.00% (0.0003 vs 0.0003)
UTF-8, long   - to UTF-16BE  faster by 1.02% (0.0215 vs 0.0217)
UTF-8, long   - to UTF-16LE  faster by 1.01% (0.0209 vs 0.0211)
Alex Dowad 2023-01-05 22:41:23 +02:00
parent d8b5b9fa55
commit 092ad3e462


@@ -225,7 +225,9 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
 if (c < 0x80) {
     *out++ = c;
-} else if (c >= 0xC2 && c <= 0xDF) { /* 2 byte character */
+} else if (c < 0xC2) {
+    *out++ = MBFL_BAD_INPUT;
+} else if (c <= 0xDF) { /* 2 byte character */
     if (p < e) {
         unsigned char c2 = *p++;
         if ((c2 & 0xC0) != 0x80) {
@@ -237,7 +239,7 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
     } else {
         *out++ = MBFL_BAD_INPUT;
     }
-} else if (c >= 0xE0 && c <= 0xEF) { /* 3 byte character */
+} else if (c <= 0xEF) { /* 3 byte character */
     if ((e - p) >= 2) {
         unsigned char c2 = *p++;
         unsigned char c3 = *p++;
@@ -262,7 +264,7 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
             }
         }
     }
-} else if (c >= 0xF0 && c <= 0xF4) { /* 4 byte character */
+} else if (c <= 0xF4) { /* 4 byte character */
     if ((e - p) >= 3) {
         unsigned char c2 = *p++;
         unsigned char c3 = *p++;