mb_convert_kana now uses the new text encoding conversion
filters. Microbenchmarking shows speed gains of 50%-150%
across various text encodings and input string lengths.
The behavior is the same as the old mb_convert_kana
except for one fix: if the 'zero codepoint' U+0000 appeared
in the input, the old implementation would sometimes drop
it, not passing it through to the output. This is now
fixed.
There were bugs in the legacy implementation. Lots of them.
It did not properly track whether it has switched to JISX 0213 plane 1
or plane 2. If it processes a character in plane 1 and then immediately
one in plane 2, it failed to emit the escape code to switch to plane 2.
Further, when converting codepoints from 0x80-0xFF to ISO-2022-JP-2004,
the legacy implementation would totally disregard which mode it was
operating in. Such codepoints would pass through directly to the output
without any escape sequences being emitted.
If that was not enough, all the legacy implementations of JISX 0213:2004
encodings had another common bug; their 'flush function' did not call
the next flush function in the chain of conversion filters. So if any
of these encodings were converted to an encoding where the flush
function was needed to finish the output string, then the output
would be truncated.
All the legacy implementations of JISX 0213:2004 encodings had a
common bug; their 'flush function' did not call the next flush function
in the chain of conversion filters. So if any of these encodings were
converted to an encoding where the flush function was needed to finish
the output string, then the output would be truncated.
Sigh. Double sigh. After fruitlessly searching the Internet for information on
this mysterious text encoding called "SJIS-open", I wrote a script to try
converting every Unicode codepoint from 0-0xFFFF and compare the results from
different variants of Shift-JIS, to see which one "SJIS-open" would be most
similar to.
The result? It's just CP932. There is no difference at all. So why do we have
two implementations of CP932 in mbstring?
In case somebody, somewhere is using "SJIS-open" (or its aliases "SJIS-win" or
"SJIS-ms"), add these as aliases to CP932 so existing code will continue to
work.
mbstring had an 'identify filter' for almost every supported text encoding
which was used when auto-detecting the most likely encoding for a string.
It would run over the string and set a 'flag' if it saw anything which
did not appear likely to be the encoding in question.
One problem with this scheme was that encodings which merely appeared
less likely to be the correct one were completely rejected, even if there
was no better candidate. Another problem was that the 'identify filters'
had a huge amount of code duplication with the 'conversion filters'.
Eliminate the identify filters. Instead, when auto-detecting text
encoding, use conversion filters to see whether the input string is valid
in candidate encodings or not. At the same type, watch the type of
codepoints which the string decodes to and mark it as less likely if
non-printable characters (ESC, form feed, bell, etc.) or 'private use
area' codepoints are seen.
Interestingly, one old test case in which JIS text was misidentified
as UTF-8 (and this wrong behavior was enshrined in the test) was 'fixed'
and the JIS string is now auto-detected as JIS.
There is no meaningful difference between these and UCS-{2,4}. They are
just a little bit more lax about passing errors silently. They also have
no known use.
Alias to UCS-{2,4} in case someone, somewhere is using them.
Very interesting... it turns out that when Valgrind support was enabled,
`#include "config.h"` from within mbstring was actually including the file "config.h"
from Valgrind, and not the one from mbstring!!
This is because -I/usr/include/valgrind was added to the compiler invocation _before_
-Iext/mbstring/libmbfl.
Make sure we actually include the file which was intended.
All memory allocation and deallocation for mbstring bounces through a table of
function pointers before going to emalloc/efree/etc. But this is unnecessary.
The allocators are never swapped out. Better to just call them directly.
Oniguruma is exclusively used by ext/mbstring, and only if mbregex is
enabled. Therefore it is unnecessary and confusing to have Oniguruma
related config stuff scattered elsewhere.
While we're at it, we also remove the referral to the bundled libonig
which is removed as of PHP 7.4.0, and the duplicated call to
`PHP_INSTALL_HEADERS()`.
Used in php-src the past and today removed and not used anymore:
- HAVE_CURL_EASY_STRERROR
- HAVE_CURL_MULTI_STRERROR
- HAVE_NEW_MIME2TEXT
- HAVE_MBSTR_CN
- HAVE_MBSTR_JA
- HAVE_MBSTR_KR
- HAVE_MBSTR_RU
- HAVE_MBSTR_TW
Part of oniguruma which doesn't use these anymore
- NOT_RUBY
- HAVE_STDARG_PROTOTYPES
Unused:
- HAVE_MPIR
Closes GH-4427
The $Id$ keywords were used in Subversion where they can be substituted
with filename, last revision number change, last changed date, and last
user who changed it.
In Git this functionality is different and can be done with Git attribute
ident. These need to be defined manually for each file in the
.gitattributes file and are afterwards replaced with 40-character
hexadecimal blob object name which is based only on the particular file
contents.
This patch simplifies handling of $Id$ keywords by removing them since
they are not used anymore.
The bundled libmbfl library is no longer API or ABI compatible with
the (currently unmaintained) upstream library. As such, building
against an external libmbfl is no longer possible.
* origin/master:
updated NEWS
refactored the mbstring config.w32
Update NEWS
Fixed compilation warnings
Fixed bug #68504 --with-libmbfl configure option not present on Windows
Changed "finally" handling. Removed EX(fast_ret) and EX(delayed_exception). Allocate and use additional IS_TMP_VAR slot on VM stack instead.
the darwin specific test fails for me with the same output which is the expected for the original test I couldn't find anybody who managed to see this test passing, but I found a bunch of other reports on qa.php.net/reports and on google which do see this test failing on mac. if this change causes you to have this test failing on Mac, please drop me a mail so we can improve the current test so it passes for everybody.
#68446 is fixed
Reimplemented silence operator (@) handling on exceptions. Now each silence region is stored in op_array->brk_cont_array. On exception ZEND_HANDLE_EXCEPTION handler traverse this array and restore original EG(error_reporting) if exception occured inside a "silence" region.
remove the NEWS entries for the reverted stuff
typo fix
go back with phpdbg to the state of 5.6.3, reverting the controversial commits(remote debugging/xml protocol)
5.5.21 now
New label length test
Fix ext/filter/tests/033.phpt
Fix filter_list test
FILTER_VALIDATE_DOMAIN and RFC conformance for FILTER_VALIDATE_URL
Conflicts:
ext/mbstring/config.w32