mirror of
https://github.com/php/php-src.git
synced 2024-09-22 18:37:25 +00:00
0e7160b836
Regarding the optional 3rd `strict` argument to mb_detect_encoding, the
documentation states:

    Controls the behaviour when string is not valid in any of the listed
    encodings. If strict is set to false, the closest matching encoding
    will be returned; if strict is set to true, false will be returned.

(Ref: https://www.php.net/manual/en/function.mb-detect-encoding.php)

Because of bugs in the implementation, mb_detect_encoding did not always
behave according to this description when `strict` was false.

For example:

    <?php
    echo var_export(mb_detect_encoding("\xc0\x00", "UTF-8", false));
    // Before this commit, prints: false
    // After this commit, prints: 'UTF-8'

Because `strict` is false in the above example, mb_detect_encoding should
return the 'closest matching encoding', which is UTF-8, since that is the
only candidate encoding. (Incidentally, this example shows that using
mb_detect_encoding with a single candidate encoding in non-strict mode is
useless.)

The new implementation fixes this bug. It also fixes another problem with
the old implementation as regards non-strict detection mode:

The old implementation would stop processing of the input string using a
particular candidate encoding as soon as it saw an error in that encoding,
even in non-strict mode. This means that it could not really detect the
'closest matching encoding'; rather, what it would return in non-strict
mode was 'the encoding in which the first decoding error is furthest from
the beginning of the input string'.

In non-strict mode, the new implementation continues trying to process the
input string to its end even after seeing an error. This makes it possible
to determine in which candidate encoding the string has the smallest number
of errors, i.e. the 'closest matching encoding'.

Rejecting candidate encodings as soon as it saw an error gave the old
implementation a marked performance advantage in non-strict mode; however,
the new implementation still beats it in most cases.
Here are a few sample microbenchmark results:

    UTF-8, ~100 codepoints, strict mode
      Old: 0.080s (100,000 calls)
      New: 0.026s (100,000 calls)

    UTF-8, ~100 codepoints, non-strict mode
      Old: 0.079s (100,000 calls)
      New: 0.033s (100,000 calls)

    UTF-8, ~10000 codepoints, strict mode
      Old: 6.708s (60,000 calls)
      New: 1.383s (60,000 calls)

    UTF-8, ~10000 codepoints, non-strict mode
      Old: 6.705s (60,000 calls)
      New: 3.044s (60,000 calls)

Notice that the old implementation had almost identical performance between
strict and non-strict mode, while the new suffers a significant performance
penalty for non-strict detection. This is the cost of implementing the
behavior specified in the documentation.

A couple more sample results:

    SJIS, ~10000 codepoints, strict mode
      Old: 4.563s
      New: 1.084s

    SJIS, ~10000 codepoints, non-strict mode
      Old: 4.569s
      New: 2.863s

This is the only case I found where the new implementation loses:

    UTF-16LE, ~10000 codepoints, non-strict mode
      Old: 1.514s
      New: 2.813s

The reason is that the test strings happened to be invalid right from the
first few bytes for all the candidate encodings except for UTF-16LE; so the
old implementation would immediately reject all those encodings and only
process the entire string in UTF-16LE.

I believe mb_detect_encoding could be made much faster if we identified
good criteria for when to reject candidate encodings before reaching the
end of the input string.
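The idea of picking the candidate with the smallest number of errors can be
sketched in userland PHP. This is only an illustration under stated
assumptions, not the C implementation in mbstring: the helper names
count_decoding_errors() and closest_encoding() are hypothetical, errors are
approximated by counting U+FFFD substitution markers after conversion (so
input that legitimately contains U+FFFD is overcounted), and the number of
markers emitted per invalid sequence may vary between PHP versions.

```php
<?php
// Hypothetical helper (not part of mbstring): approximate the number of
// decoding errors by converting to UTF-8 with U+FFFD as the substitute
// character and counting how often that marker appears in the result.
function count_decoding_errors(string $input, string $encoding): int
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character(0xFFFD);         // mark undecodable input with U+FFFD
    $decoded = mb_convert_encoding($input, 'UTF-8', $encoding);
    mb_substitute_character($previous);      // restore the previous setting
    return substr_count($decoded, "\u{FFFD}");
}

// Hypothetical helper: return the candidate with the fewest errors.
// Unlike the old early-rejection strategy, every candidate is processed
// to the end of the input. Earlier candidates win ties.
function closest_encoding(string $input, array $candidates): ?string
{
    $best = null;
    $bestErrors = PHP_INT_MAX;
    foreach ($candidates as $encoding) {
        $errors = count_decoding_errors($input, $encoding);
        if ($errors < $bestErrors) {
            $best = $encoding;
            $bestErrors = $errors;
        }
    }
    return $best; // null only when $candidates is empty
}
```

Because each candidate is scanned in full, the cost grows with the number of
candidates times the input length, which matches the non-strict performance
penalty described above.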
90 lines
3.3 KiB
Plaintext
PHP                                                                        NEWS
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
?? ??? ????, PHP 8.3.0alpha1

- CLI:
  . Added pdeathsig to builtin server to terminate workers when the master
    process is killed. (ilutov)

- Core:
  . Fixed bug GH-9388 (Improve unset property and __get type incompatibility
    error message). (ilutov)
  . SA_ONSTACK is now set for signal handlers to be friendlier to other
    in-process code such as Go's cgo. (Kévin Dunglas)
  . SA_ONSTACK is now set when signals are disabled. (Kévin Dunglas)
  . Fix GH-9649: Signal handlers now do a no-op instead of crashing when
    executed on threads not managed by TSRM. (Kévin Dunglas)
  . Fixed potential NULL pointer dereference in Windows shm*() functions. (cmb)
  . Added shadow stack support for fibers. (Chen Hu)
  . Fix bug GH-9965 (Fix accidental caching of default arguments with side
    effects). (ilutov)

- Fileinfo:
  . Upgrade bundled libmagic to 5.43. (Anatol)

- FPM:
  . The status.listen shared pool now uses the same php_values (including
    expose_php) and php_admin_value as the pool it is shared with. (dwxh)

- GD:
  . Fixed bug #81739: OOB read due to insufficient input validation in
    imageloadfont(). (CVE-2022-31630) (cmb)

- Hash:
  . Fixed bug #81738: buffer overflow in hash_update() on long parameter.
    (CVE-2022-37454) (nicky at mouha dot be)

- Intl:
  . Added pattern format error infos for numfmt_set_pattern. (David Carlier)

- JSON:
  . Added json_validate(). (Juan Morales)

- MBString:
  . mb_detect_encoding is better able to identify the correct encoding for
    Turkish text. (Alex Dowad)
  . mb_detect_encoding's "non-strict" mode now behaves as described in the
    documentation. Previously, it would return false if the very first byte
    of the input string was invalid in all candidate encodings. (Alex Dowad)

- Opcache:
  . Added start, restart and force restart time to opcache's
    phpinfo section. (Mikhail Galanin)
  . Fix GH-9139: Allow FFI in opcache.preload when opcache.preload_user=root.
    (Arnaud, Kapitan Oczywisty)
  . Made opcache.preload_user always optional in the cli and phpdbg SAPIs.
    (Arnaud)
  . Allows W/X bits on page creation on FreeBSD despite system settings.
    (David Carlier)

- PCNTL:
  . SA_ONSTACK is now set for pcntl_signal. (Kévin Dunglas)
  . Added SIGINFO constant. (David Carlier)

- Posix:
  . Added posix_sysconf. (David Carlier)

- Random:
  . Added Randomizer::getBytesFromString(). (Joshua Rüsweg)
  . Added Randomizer::nextFloat(), ::getFloat(), and IntervalBoundary.
    (timwolla)

- Reflection:
  . Fix GH-9470 (ReflectionMethod constructor should not find private parent
    method). (ilutov)

- Sockets:
  . Added SO_ATTACH_REUSEPORT_CBPF socket option, to give tighter control
    over socket binding for a cpu core. (David Carlier)
  . Added SKF_AD_QUEUE for cbpf filters. (David Carlier)
  . Added socket_atmark if send/recv needs using MSG_OOB. (David Carlier)
  . Added TCP_QUICKACK constant, to give tighter control over
    ACK delays. (David Carlier)

- Standard:
  . E_NOTICEs emitted by unserialize() have been promoted to E_WARNING.
    (timwolla)

- Streams:
  . Fixed bug #51056: blocking fread() will block even if data is available.
    (Jakub Zelenka)

<<< NOTE: Insert NEWS from last stable release here prior to actual release! >>>