php-src/ext/mbstring/tests/ucs4_encoding.phpt
Alex Dowad a789088527 Add more tests for mbstring encoding conversion
When testing the preceding commits, I used a script to generate a large
number of random strings and try to find strings which would yield
different outputs from the new and old encoding conversion code.
Some were found. In most cases, analysis revealed that the new code
was correct and the old code was not.

In all cases where the new code was incorrect, regression tests were
added. However, there may be some value in adding regression tests
for cases where the old code was incorrect as well. That is done here.

This does not cover every case where the new and old code yielded
different results. Some of them were very obscure, and it is proving
difficult even to reproduce them (since I did not keep a record of
all the input strings which triggered the differing output).
2022-05-28 21:53:38 +02:00

40 lines
1.5 KiB
PHP

--TEST--
Test verification and conversion of UCS-4 text
--EXTENSIONS--
mbstring
--FILE--
<?php
include('encoding_tests.inc');
mb_substitute_character(0x25);
testValidString("\xFF\xFE\x00\x00\x00\x30\x00\x00", "\x30\x00", "UCS-4", "UTF-16BE", false);
testValidString("\x00\x00\xFE\xFF\x00\x00\x30\x01", "\x30\x01", "UCS-4", "UTF-16BE", false);
testValidString("\x02\x30\x00\x00", "\x30\x02", "UCS-4LE", "UTF-16BE");
testValidString("\x00\x00\x30\x03", "\x30\x03", "UCS-4BE", "UTF-16BE");
// Truncated input
convertInvalidString("\x01\x02\x03", "%", "UCS-4", "UTF-8");
// Codepoint above U+10FFFF
convertInvalidString("\x00\x11\x00\x00", "%", "UCS-4BE", "UTF-8");
convertInvalidString("\x00\x00\x11\x00", "%", "UCS-4LE", "UTF-8");
// Test "long" illegal character markers
mb_substitute_character("long");
convertInvalidString("\x6F\x00\x00\x00", "U+6F000000", "UCS-4BE", "UTF-8");
convertInvalidString("\x70\x00\x00\x00", "U+70000000", "UCS-4BE", "UTF-8");
convertInvalidString("\x78\x00\x00\x01", "U+78000001", "UCS-4BE", "UTF-8");
convertInvalidString("\x80\x01\x02\x03", "U+80010203", "UCS-4BE", "UTF-8");
convertInvalidString("\x00\x01\x02", "%", "UCS-4BE", "UTF-8");
convertInvalidString("\x00\x00\x00\x6F", "U+6F000000", "UCS-4LE", "UTF-8");
convertInvalidString("\x00\x00\x00\x70", "U+70000000", "UCS-4LE", "UTF-8");
convertInvalidString("\x01\x00\x00\x78", "U+78000001", "UCS-4LE", "UTF-8");
convertInvalidString("\x02\x01\x00", "%", "UCS-4LE", "UTF-8");
echo "Done!";
?>
--EXPECT--
Done!