php-src/ext/mbstring/tests/armscii8_encoding.phpt
Alex Dowad 3c73225125 New internal interface for fast text conversion in mbstring
When converting text to/from wchars, mbstring makes one function call
for each and every byte or wchar to be converted. Typically, each of
these conversion functions contains a state machine, and its state has
to be restored and then saved for every single one of these calls.
It doesn't take much to see that this is grossly inefficient.

Instead of converting one byte or wchar on each call, the new
conversion functions will either fill up or drain a whole buffer of
wchars on each call. In benchmarks, this is about 3-10× faster.
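
The difference between the two calling styles can be sketched in C as follows. This is only an illustration under stated assumptions: `demo_decoder`, `demo_decode_one`, and `demo_decode_batch` are hypothetical names, not mbstring's actual interface, and a Latin-1-style decoder (byte value equals code point) stands in for a real legacy encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; the real mbstring structures differ. */
typedef struct {
    const unsigned char *in; /* next unread input byte */
    size_t in_len;           /* bytes still to be read */
} demo_decoder;

/* Per-character style: one function call, and one load/store of the
 * decoder state, for every single byte. */
static uint32_t demo_decode_one(demo_decoder *d)
{
    assert(d->in_len > 0);
    d->in_len--;
    return (uint32_t)*d->in++; /* assumed mapping: byte value == code point */
}

/* Buffer style: each call fills up to buf_len wchars, so the state is
 * loaded once and saved once per buffer instead of once per byte. */
static size_t demo_decode_batch(demo_decoder *d, uint32_t *buf, size_t buf_len)
{
    const unsigned char *p = d->in;
    size_t n = d->in_len < buf_len ? d->in_len : buf_len;

    for (size_t i = 0; i < n; i++) {
        buf[i] = (uint32_t)p[i];
    }

    d->in = p + n;
    d->in_len -= n;
    return n; /* number of wchars written to buf */
}
```

A caller of the buffer-style function would loop until it returns 0, handing each filled buffer to the next stage. The speedup quoted above comes from the author's benchmarks, not from this sketch.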

Adding the new, faster conversion functions for all supported legacy
text encodings still needs some work. Also, all the code which uses
the old-style conversion functions needs to be converted to use the
new ones. After that, the old code can be dropped. (The mailparse
extension will also have to be fixed up so it will still compile.)
2021-12-21 08:33:11 +02:00


--TEST--
Exhaustive test of verification and conversion of ARMSCII-8 text
--EXTENSIONS--
mbstring
--SKIPIF--
<?php
if (getenv("SKIP_SLOW_TESTS")) die("skip slow test");
?>
--FILE--
<?php
include('encoding_tests.inc');
srand(111); // Make results consistent
mb_substitute_character(0x25); // '%'
readConversionTable(__DIR__ . '/data/ARMSCII-8.txt', $toUnicode, $fromUnicode);
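// Bytes treated as irreversible: they are excluded from the round-trip check below
$irreversible = ["\x28", "\x29", "\x2C", "\x2D", "\x2E"];
findInvalidChars($toUnicode, $invalid, $truncated);
// First pass covers every valid byte, with the final argument set to false
testAllValidChars($toUnicode, 'ARMSCII-8', 'UTF-16BE', false);
// Drop the irreversible bytes, then repeat the check with the default final argument
foreach ($irreversible as $char) {
    unset($toUnicode[$char]);
}
testAllValidChars($toUnicode, 'ARMSCII-8', 'UTF-16BE');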
testAllInvalidChars($invalid, $toUnicode, 'ARMSCII-8', 'UTF-16BE', "\x00%");
testTruncatedChars($truncated, 'ARMSCII-8', 'UTF-16BE', "\x00%");
echo "Tested ARMSCII-8 -> UTF-16BE\n";
findInvalidChars($fromUnicode, $invalid, $unused, array_fill_keys(range(0,0xFF), 2));
convertAllInvalidChars($invalid, $fromUnicode, 'UTF-16BE', 'ARMSCII-8', '%');
echo "Tested UTF-16BE -> ARMSCII-8\n";
// Test "long" illegal character markers
mb_substitute_character("long");
convertInvalidString("\xA1", "%", "ARMSCII-8", "UTF-8");
convertInvalidString("\xFF", "%", "ARMSCII-8", "UTF-8");
// Test replacement character which cannot be encoded in ARMSCII-8
mb_substitute_character(0x1234);
convertInvalidString("\x23\x45", '?', 'UTF-16BE', 'ARMSCII-8');
echo "Done!\n";
?>
--EXPECT--
Tested ARMSCII-8 -> UTF-16BE
Tested UTF-16BE -> ARMSCII-8
Done!