This change results in using the same buffer for multiple
stdio events which should fix inconsistencies of handling
messages that are not ended with a new line and possibly
very long messages that are split to multiple events.
Basically, the algorithm to append a converted string to an existing
`smart_str` works by increasing the `smart_str` buffer, to let `iconv`
convert characters until there is no more space, to set the new length
of the `smart_str` and to repeat until there is no more input.
Formerly, the new length calculation has been wrong, though, since we
would have to take the old `out_len` into account (`buf_growth -
old_out_len - out_len`). However, since there is no need to take the
old `out_len` into account when increasing the `smart_str` buffer, we
can simplify the fix, avoiding an additional variable.
We must not ignore erroneous characters in mime headers, but rather let
iconv_mime_decode() fail in this case, issuing the usual notice
regarding illegal characters.
We have to cater to the possibility that `=?` is not the start of an
encoded-word, but rather a literal `=?`. If a line break is found
while we're still looking for the charset, we can safely assume that
it's a literal `=?`, and act accordingly.
If we're expecting the start of an encoded word (`=?`), but instead of
the question mark get a line break (CR or LF), we must not append it to
the `pretval`.
The minimum length of an encoded-word is actually the pure encoding
overhead plus the length of the `output-charset` plus the minimum unit
of encoded text, which is 4 for B-encoding and (for simplicity) 3 for
Q-encoding. We also cater to the possibility that we need further
encoded words, which would be split by the `line-break-chars` followed
by a space character. Obviously, the former `out_charset_len + 12` is
too simplistic and wrong in the given case (where the magic number
would be 13).
These simplifications are somewhat wasteful, but iconv_mime_encode()
with Q-encoding is wasteful anyway (see bug 66828[1]), and the proper
solution to convert the whole input to the desired output charset
upfront, and applying the encoding afterwards appears too much a change
for the stable releases.
[1] <https://bugs.php.net/66828>
We work around this peculiarity of libxml by using xmlNodeSetContent(),
which does not exhibit this behavior. This also saves us from manually
calculating the string length.
There's no need to actually try to trigger an out-of-memory condition
to proof the leak; instead we can simply rely on the Zend MM to report
the memory leaks in debug mode (at least on Linux). Therefore we
simplify the regression test, which also makes it run much faster.