129264 Commits

Author SHA1 Message Date
Sergey Panteleev
6a7fd48aae
[ci skip] Update NEWS for PHP 8.2.0 beta3 2022-08-02 17:00:47 +03:00
Máté Kocsis
4679805cd6
Declare ext/sodium constants in stubs (#9225) 2022-08-02 13:57:52 +02:00
Alex Dowad
5370f344d2 mb_strimwidth inserts error markers in invalid input string (for backwards compatibility)
The old implementation did this. It also did the same to the
trim marker, if the trim marker was invalid in the specified
encoding, but I have not imitated that behavior (for performance).
2022-08-02 11:07:06 +02:00
Alex Dowad
78ee18413f Move kana conversion function to mbfilter_cp5022x.c
...to avoid a dependency of libmbfl on mbstring.

Thanks to Nikita Popov for pointing this issue out.
2022-08-02 11:07:06 +02:00
Alex Dowad
e1351eb0a6 Fix legacy text conversion filter for UTF-16
Make necessary changes to filter state before using CK macro.
2022-08-02 11:07:06 +02:00
Alex Dowad
219fff376b Fix legacy text conversion filter for UTF7-IMAP
Make necessary updates to filter state before using CK macro.
2022-08-02 11:07:06 +02:00
Alex Dowad
0a6ea5bd4e Fix legacy text conversion filter for UCS-4
If a downstream filter returns -1 (error), the CK macro
will make the UCS-4 conversion filter also immediately
return. This means that any necessary updates to the filter
state have to be done *before* using CK, or it will be left
in an invalid state and will not behave correctly when
flushed.
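
To illustrate the ordering problem described above, here is a minimal sketch. It assumes a CK-style macro that returns -1 as soon as the wrapped call fails. The `toy_filter` struct, its `pending` field, and the `emit_*` and `failing_output` functions are hypothetical stand-ins, not the actual libmbfl code.

```c
#include <stddef.h>

/* Stand-in modeled on a CK-style macro: return -1 at once if the wrapped
 * call reports an error. */
#define CK(statement) do { if ((statement) < 0) return -1; } while (0)

/* Hypothetical, much-simplified filter. 'pending' counts units that have
 * been buffered but not yet passed downstream. */
struct toy_filter {
    int pending;
    int (*output)(int c, void *data);
    void *data;
};

/* Downstream stage that always fails, to exercise the error path. */
static int failing_output(int c, void *data)
{
    (void)c;
    (void)data;
    return -1;
}

/* Buggy ordering: if output() fails, CK returns before 'pending' is reset,
 * so the filter is left in a stale state. */
static int emit_buggy(struct toy_filter *f, int c)
{
    CK(f->output(c, f->data));
    f->pending = 0; /* never reached on error */
    return 0;
}

/* Fixed ordering: bring the state up to date first, then call downstream. */
static int emit_fixed(struct toy_filter *f, int c)
{
    f->pending = 0;
    CK(f->output(c, f->data));
    return 0;
}
```

With a filter whose `pending` is 3 and whose `output` is `failing_output`, both functions return -1, but only `emit_fixed` leaves `pending` at 0, so a later flush would see consistent state.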
2022-08-02 11:07:06 +02:00
Alex Dowad
44b4fb2c36 Fix legacy text conversion filter for CP50220
In my recent commit which replaced the implementation of
mb_convert_kana, the commit message noted that mb_convert_kana
previously had a bug whereby null bytes would be 'swallowed'
and not passed to the output.

The bug in the CP50220 conversion filter fixed by this commit
was actually the reason.
2022-08-02 11:07:06 +02:00
Alex Dowad
7299096095 New implementation of mb_strimwidth
This new implementation of mb_strimwidth uses the new text
encoding conversion filters. Changes from the previous
implementation:

• mb_strimwidth allows a negative 'from' argument, which
should count backwards from the end of the string. However,
the implementation of this feature was buggy (starting right
from when it was first implemented).

It used the following code:

    if ((from < 0) || (width < 0)) {
        swidth = mbfl_strwidth(&string);
    }
    if (from < 0) {
        from += swidth;
    }

Do you see the bug? 'from' is a count of CODEPOINTS, but
'swidth' is a count of TERMINAL COLUMNS. Adding those two
together does not make sense. If there were no fullwidth
characters in the input string, then the two counts coincide
and the feature would work correctly. However, each
fullwidth character would throw the result off by one,
causing more characters to be skipped than was requested.

• mb_strimwidth also allows a negative 'width' argument,
which again counts backwards from the end of the string;
in this case, it is not determining the START of the portion
which we want to extract, but rather, the END of that portion.
Perhaps unsurprisingly, this feature was also buggy.

Code:

    if (width < 0) {
        width = swidth + width - from;
    }

'swidth + width' is fine here; the problem is '- from'.
Again, that is subtracting a count of CODEPOINTS from a
count of TERMINAL COLUMNS. In this case, we really need
to count the terminal width of the string prefix skipped
over by 'from', and subtract that rather than the number
of codepoints which are being skipped.

As a result, if a 'from' count was passed along with a
negative 'width', for every fullwidth character in the
skipped prefix, the result of mb_strimwidth was one
terminal column wider than requested.

Since these situations were covered by unit tests, you
might wonder why the bugs were not caught. Well, as far as
I can see, it looks like the author of the 'tests' just
captured the actual output of mb_strimwidth and defined it
as 'correct'. The tests were written in such a way that it
was difficult to examine them and see whether they made
sense or not; but a careful examination of the inputs and
outputs clearly shows that the legacy tests did not conform
to the documented contract of mb_strimwidth.

• The old implementation would always pass the input string
through decoding/encoding filters before returning it to
the caller, even if it fit within the specified width. This
means that invalid byte sequences would be converted to
error markers. For performance, the new implementation
returns the very same string which was passed in if it
does not exceed the specified width. This means that
erroneous byte sequences are not converted to error markers
unless it is necessary to trim the string.

• The same applies to the 'trim marker' string.

• The old implementation was buggy in the (unusual)
case that the trim marker is wider than the requested
maximum width of the result. It subtracted the width of the
trim marker from the requested width using unsigned arithmetic.
If the trim marker was wider, that subtraction would wrap
around and yield a huge number. As a result, mb_strimwidth
would then pass the input string through, even if it was
far wider than the requested maximum width.

In that case, since the input string is wider than the
requested width, and NONE of it will fit together with the
trim marker, the new implementation returns just the trim
marker. This is the one case where the output can be wider
than the requested width: when BOTH the input string and
also the trim marker are too wide.

• Since it passed the input string and trim marker through
decoding/encoding filters, when using "Quoted-Printable" as
the encoding, newlines could be inserted into the trim marker
to maintain the maximum line length for QP.

This is an extremely bizarre use case and I don't think there
is any point in worrying about it. QP will be removed from
mbstring in time, anyways.
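
The unit mismatch behind the negative 'from' and negative 'width' bugs described in the list above can be reproduced with a toy model. In this sketch a string is an array holding the terminal width (1 or 2 columns) of each codepoint. All function names are hypothetical, and the "corrected" versions only show consistent units; they are not the new mb_strimwidth implementation.

```c
#include <stddef.h>

/* Sum of terminal columns occupied by the first n codepoints. */
static size_t total_columns(const int *widths, size_t n)
{
    size_t cols = 0;
    for (size_t i = 0; i < n; i++) {
        cols += (size_t)widths[i];
    }
    return cols;
}

/* Legacy arithmetic: adds a COLUMN count to a CODEPOINT offset. */
static long start_legacy(const int *widths, size_t n, long from)
{
    if (from < 0) {
        from += (long)total_columns(widths, n);
    }
    return from;
}

/* Consistent units: a negative 'from' counts codepoints from the end. */
static long start_codepoints(size_t n, long from)
{
    if (from < 0) {
        from += (long)n;
    }
    return from;
}

/* Legacy arithmetic: subtracts a CODEPOINT offset from a COLUMN count. */
static long width_legacy(const int *widths, size_t n, long from, long width)
{
    if (width < 0) {
        width = (long)total_columns(widths, n) + width - from;
    }
    return width;
}

/* Consistent units: subtract the columns occupied by the skipped prefix.
 * Assumes 0 <= from <= n. */
static long width_columns(const int *widths, size_t n, long from, long width)
{
    if (width < 0) {
        long prefix = (long)total_columns(widths, (size_t)from);
        width = (long)total_columns(widths, n) + width - prefix;
    }
    return width;
}
```

For the widths `{2, 2, 1, 1, 1}` (two fullwidth characters, seven columns in total), a 'from' of -2 resolves to 5 with the legacy arithmetic but to 3 when counted in codepoints. With 'from' 2 and 'width' -1, the legacy result is 4 columns where 2 is expected. In both cases the error equals the number of fullwidth characters involved.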

PERFORMANCE:

• From micro-benchmarking with various input string lengths and
text encodings, it appears that the new implementation is 2-3x
faster for UTF-8 and UTF-16. For legacy Japanese text encodings
like ISO-2022-JP or SJIS, the new implementation is perhaps 25%
faster.

• Note that correctly implementing negative 'from' and 'width'
arguments imposes a small performance burden in such cases; one
which the old implementation did not pay. This slightly skews
benchmarking results in favor of the old implementation. However,
even so, the new implementation is faster in all cases which I
tested.
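
The trim-marker problem mentioned in the list of changes above comes down to unsigned wraparound. The sketch below shows it in isolation; `budget_unsigned` and `budget_checked` are hypothetical helpers, and the guard is only one possible way to detect the case, not necessarily what the new implementation does.

```c
#include <stddef.h>

/* Legacy-style arithmetic: with unsigned operands, the subtraction wraps
 * around to a huge value whenever marker_width > width. */
static size_t budget_unsigned(size_t width, size_t marker_width)
{
    return width - marker_width;
}

/* Guarded variant: reports through *marker_fits whether the marker fits,
 * and returns 0 columns for the string itself when it does not. */
static size_t budget_checked(size_t width, size_t marker_width,
                             int *marker_fits)
{
    *marker_fits = marker_width <= width;
    return *marker_fits ? width - marker_width : 0;
}
```

Calling `budget_unsigned(2, 3)` yields `SIZE_MAX`, which is why a width check against that value lets any input string through. `budget_checked(2, 3, &fits)` returns 0 and clears `fits`, leaving the caller free to return just the trim marker.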
2022-08-02 11:07:06 +02:00
Alex Dowad
94fde1566f Move implementation of mb_strlen to mbstring.c
mbfl_strlen (in mbfilter.c) is still being used in a couple
of places but will go away soon.
2022-08-02 11:07:06 +02:00
Tim Düsterhus
c63f18dd9b
Unify ext/random unserialize errors with ext/date (#9185)
* Unify ext/random unserialize errors with ext/date

- Use `Error` instead of `Exception`.
- Adjust wording.

* Make `zend_read_property` silent in `Randomizer::__unserialize()`

Having:

> Error: Typed property Random\Randomizer::$engine must not be accessed before
> initialization

is not a value-add in this case.

* Insert the actual class name in the unserialization error of Engines

* Revert unserialization failure back to Exception from Error

see https://news-web.php.net/php.internals/118311
2022-08-02 09:00:37 +02:00
Arnaud Le Blanc
5d5d9796fc [ci skip] NEWS 2022-08-01 19:34:28 +02:00
Arnaud Le Blanc
874b861a38 Merge branch 'PHP-8.1'
* PHP-8.1:
  [ci skip] NEWS
  Extended map_ptr before copying class table (#9188)
2022-08-01 19:32:26 +02:00
Arnaud Le Blanc
832e0ef31f [ci skip] NEWS 2022-08-01 19:32:02 +02:00
Arnaud Le Blanc
bccda7eb1c Extended map_ptr before copying class table (#9188)
Fixes GH-9164
2022-08-01 19:25:07 +02:00
Arnaud Le Blanc
a69708382a
Extended map_ptr before copying class table (#9188)
Fixes GH-9164
2022-08-01 19:21:34 +02:00
George Peter Banyard
1478278f1d
SPL: Use new improved is_line_empty() function instead of the old one (#9217) 2022-08-01 17:55:30 +01:00
Tim Düsterhus
5e518c0552 [ci skip] Move 'Core' into the correct alphabetical order in NEWS
see f957e3e7f1
2022-08-01 17:39:12 +02:00
Tim Düsterhus
09e261e3b4 [ci skip] Update NEWS for ext/random
This adds 50bd8ba51c and fixes the formatting for
two other entries.
2022-08-01 17:39:05 +02:00
Dmitry Stogov
c207efab63 Merge branch 'PHP-8.1'
* PHP-8.1:
  Tracing: Prevent recording types of variables used to pass zend_class_entry
2022-08-01 17:04:08 +03:00
Dmitry Stogov
7ff71a0a55 Merge branch 'PHP-8.0' into PHP-8.1
* PHP-8.0:
  Tracing: Prevent recording types of variables used to pass zend_class_entry
2022-08-01 17:03:56 +03:00
Dmitry Stogov
2758ff2a77 Tracing: Prevent recording types of variables used to pass zend_class_entry 2022-08-01 17:02:53 +03:00
Anton Smirnov
50bd8ba51c
PcgOneseq128XslRr64::jump(): Throw ValueError for negative $advance (#9213)
* PCG64: $advance must be non-negative

Closes GH-9212
2022-08-01 13:47:14 +01:00
Dmitry Stogov
fac37347ce Merge branch 'PHP-8.1'
* PHP-8.1:
  Fix incorrect guard motion out of the loop
2022-08-01 15:33:50 +03:00
Dmitry Stogov
69c10aed58 Fix incorrect guard motion out of the loop
Fixes oss-fuzz #49579
2022-08-01 15:32:49 +03:00
Dmitry Stogov
21507ef28a Merge branch 'PHP-8.1'
* PHP-8.1:
  Fix SSA reconstruction when body of "foreach" loop is removed
2022-08-01 14:01:34 +03:00
Dmitry Stogov
4b19b85eb6 Merge branch 'PHP-8.0' into PHP-8.1
* PHP-8.0:
  Fix SSA reconstruction when body of "foreach" loop is removed
2022-08-01 14:01:11 +03:00
Dmitry Stogov
af1a7b7b72 Fix SSA reconstruction when body of "foreach" loop is removed
Fixes oss-fuzz #49483
2022-08-01 14:00:19 +03:00
zeriyoshi
4e92c74654
random: split Randomizer::getInt() without argument to Randomizer::nextInt()
Since argument overloading is not safe for reflection, the method needed
to be split appropriately.

Co-authored-by: Tim Düsterhus <timwolla@googlemail.com>

Closes GH-9057.
2022-08-01 12:19:22 +02:00
Máté Kocsis
59d257d1ae
Declare ext/tokenizer constants in stubs (#9148) 2022-08-01 10:50:56 +02:00
Nicolas Grekas
dd9f47758e
Declare Transliterator::$id as readonly to unlock subclassing it
Closes GH-9167.
2022-08-01 10:46:57 +02:00
Máté Kocsis
962baf771d
Declare ext/pcntl constants in stubs (#9075) 2022-08-01 10:26:05 +02:00
Ilija Tovilo
53e7141515
Hide skipped tests in CI (#9163) 2022-07-31 20:47:15 +02:00
David Carlier
449edd815b phpdbg: a few fixes, mostly for printf-like format issues caused by C string -> zend_string mismatches. Annotate the allocator wrapper.
Closes #9210.
2022-07-31 19:07:37 +01:00
Tim Düsterhus
53ca24d46e
Improve phrasing in argument value errors in ext/random (#9206)
This rephrases the error message for argument errors to be a proper English
sentence.

Co-authored-by: Máté Kocsis <kocsismate@woohoolabs.com>
2022-07-31 19:27:28 +02:00
Bob Weinand
b3b21ed558 Fix ZEND_RC_DEBUG build in zend_test observer tests 2022-07-31 14:32:35 +00:00
Bob Weinand
50a3fa49b6 Fix observer test 2022-07-31 14:02:48 +00:00
Ilija Tovilo
7804cffe04
Fix stale message in close-stale-feature-requests.yml 2022-07-31 00:49:41 +02:00
Bob Weinand
625f164963 Include internal functions in the observer API
There are two main motivations to this:
a) The logic for handling internal and userland observation can be unified.
b) Notably, unwinding of observed functions on a bailout does not include observers. Even if users of observers ensured such handling themselves, they could not retain the relative ordering: the user has to unwind all internal observed frames either before the automatic unwinding (zend_observer_fcall_end_all) or after it, and cannot interleave the two properly.

Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
2022-07-30 19:20:55 +02:00
Máté Kocsis
0c225a2f57
Declare ext/intl constants in stubs - part 1 (#9205) 2022-07-30 18:11:20 +02:00
Bob Weinand
ac31e2e611 Fix memory_leak in zend_test
Properly use globals init/shutdown to allocate the observer_observe_function_names hashtable instead of attempting to do everything in the ini changed handler
2022-07-30 15:57:08 +00:00
Bob Weinand
1c9a49e3f1 Add opcache.preload_user=root to run-tests.php if root
This prevents breaking the testsuite when running it as root.
2022-07-30 15:57:08 +00:00
Nikita Popov
828c93bedc Fix unserialize dictionary generation
We now have namespaced classes in here, and need to escape the
backslashes.
2022-07-30 17:14:22 +02:00
Máté Kocsis
98b858e756
Fix GH-9183 Get rid of unnecessary PHPDoc param and return type checks (#9203) 2022-07-30 15:37:53 +02:00
Tim Düsterhus
5aca25a134
[ci skip] Improve error message when an engine fails to seed from the CSPRNG (#9160) 2022-07-30 15:37:31 +02:00
Máté Kocsis
056e968c54
Declare ext/gd constants in stubs (#9180) 2022-07-30 15:18:06 +02:00
Máté Kocsis
668dbaf6ab
Declare the TestInterface::DUMMY constant in stub 2022-07-30 15:03:48 +02:00
Nikita Popov
fc394b476b Update libmysqlclient version used in CI
The old one is no longer available.
2022-07-29 23:17:08 +02:00
Bob Weinand
9e2de4c2d9 Add an API to manipulate observers at runtime
Signed-off-by: Bob Weinand <bobwei9@hotmail.com>
2022-07-29 13:48:05 +02:00
Ilija Tovilo
f957e3e7f1
Fix arrow function with never return type
Fixes GH-7900
Closes GH-9103
2022-07-29 12:25:09 +02:00