1798 Commits

Author SHA1 Message Date
Gustavo André dos Santos Lopes
91727cb844 - Completed rewrite of html.c. Except for determine_charset, almost nothing
remains.
- Fixed bug in determine_charset that was preventing correct detection in
  combination with internal mbstring encoding "none", "pass" or "auto".
- Added profiles for entity encode/decode for HTML 4.01, XHTML 1.0, XML 1.0
  and HTML 5. Added the constants ENT_HTML401, ENT_XML1, ENT_XHTML and
  ENT_HTML5.
- htmlentities()/htmlspecialchars(), when told not to double encode, verify
  the correctness of the existing entities more thoroughly.
  It is checked whether the numerical entity represents a valid Unicode code
  point (number is between 0 and 0x10FFFF). If using the flag ENT_DISALLOWED,
  it is also checked whether that numerical entity is valid in the selected
  document type. In HTML 4.01, all the numerical entities that represent a Unicode
  code point (<= U+10FFFF) are valid, but that's not the case with other
  document types. If the entity is not valid, & is encoded to &amp;.
  For named entities, the check is also more thorough. While before the only
  check was whether the entity consisted of alphanumeric
  characters, now it is checked whether that entity is actually defined for
  the target document type. Otherwise, & is encoded to &amp;.
- For html_entity_decode(), only valid numerical and named entities (as defined
  above for htmlentities()/htmlspecialchars() + !double_encode) are decoded.
  In this case, however, there is one additional check: entities that represent
  non-SGML or otherwise invalid characters are not decoded. Note that, in
  HTML5, U+000D is a valid literal character, but the entity &#x0D; is not
  valid and is therefore not decoded.
- The hash tables lazily created for decoding in html_entity_decode(), which were
  added recently, were replaced by static hash tables. Instead of one hash
  table per encoding, there's only one hash table per document type, defined in
  terms of Unicode code points. This means that for charsets other than UTF-8
  and ISO-8859-1, a conversion to Unicode code points is necessary before
  decoding.
- On the encoding side, the ad hoc ranges of entities in the translation
  tables, which mapped (in general) non-Unicode code points to HTML entities,
  were replaced by three-stage tables for HTML 4 and HTML 5. These mapping
  tables are defined only in terms of Unicode code points, so a conversion
  is necessary for charsets other than UTF-8 and ISO-8859-1. Even so, the
  multi-stage table is about 5 times faster than the previous method;
  the conversion to Unicode is a small penalty because it's just a
  simple table lookup.
  XML 1.0/htmlspecialchars() uses a simple table instead of a three-stage
  table.
- Added the flag ENT_SUBSTITUTE, which makes htmlentities()/htmlspecialchars()
  replace invalid multibyte sequences with U+FFFD (UTF-8) or &#xFFFD;
  (other encodings).
- Added the flag ENT_DISALLOWED. Implements FR #52860. Characters that cannot
  appear literally are replaced by U+FFFD (UTF-8) or &#xFFFD; (otherwise).
  An alternative implementation would be to encode those characters as
  numerical entities, but that would only work in HTML 4.01 due to limitations
  on the values of numerical entities in other document types. See also the
  effects on htmlentities()/htmlspecialchars() with !double_encode above.
2010-10-24 15:01:02 +00:00
Gustavo André dos Santos Lopes
738be1a003 - Three tests were "broken" by rev #304404, not two. Commit the change
to the remaining one.
2010-10-14 19:33:12 +00:00
Gustavo André dos Santos Lopes
bfcb754eae - Fixed get_next_char(), used by htmlentities/htmlspecialchars, accepting
certain ill-formed UTF-8 sequences.
2010-10-14 19:14:06 +00:00
Gustavo André dos Santos Lopes
3943351e6a - [DOC] Reverted rev #304382 and rev #304380, as I figured out a way to
fix the erratic behavior without breaking backwards compatibility. Namely,
  $offset retains SEEK_SET semantics, but, if possible, SEEK_CUR is actually
  passed to _php_stream_seek, moving offset - stream->position bytes.
- Addresses bug #53006.
2010-10-14 03:15:15 +00:00
Gustavo André dos Santos Lopes
fbd3eb6439 - Ooops. Fixed tests for rev #304380 (stream_get_contents() related) and a small error. 2010-10-14 02:39:21 +00:00
Gustavo André dos Santos Lopes
4de6c3a948 - Added a 3rd parameter to get_html_translation_table. It now takes a charset
hint, like htmlentities et al.
- Fixed bug #49407 (get_html_translation_table doesn't handle UTF-8).
- Fixed bug #25927 (get_html_translation_table calls the ' &#39; instead of
  &#039;).
- Fixed tests for get_html_translation_table and unified the Windows and
  non-Windows versions of the tests.
2010-10-12 02:51:11 +00:00
Gustavo André dos Santos Lopes
7aa43a8d83 - Revamp of the decoding portion of html.c.
- Dramatic improvements in the performance of html_entity_decode and htmlspecialchars_decode, as the
  string is now traversed only once. Speedups of 20 to 25 times with Windows release builds and a
  ~250-character string (for 2nd and subsequent calls).
- Consistent behavior on html_entity_decode. For instance, the entity in "&&lt;" would be decoded,
  but not "&&#233;". Not anymore. The code path for "basic" and non-basic entities is now mostly
  shared.
- Code of html_entity_decode and htmlspecialchars_decode is now shared.
- [DOC] More consistent behavior of htmlspecialchars_decode. Instead of translating only &lt;, &gt;,
  &amp;, &quot;, &#039; and &#39;, now e.g. &#34;, &apos;, &#0039;, &#x27;, etc. are also decoded.
- [DOC] Previous translation of Unicode code points in numerical entities was seriously broken. When
  the code points for some character were not the same in Unicode and the target encoding, the
  behavior could be an erroneous translation (e.g. 0x80-0xA0 in win-1252) or no translation at all.
  Added Unicode translation tables for all single-byte encodings. Entities are not translated for
  multi-byte encodings, except for ASCII characters whose code points are shared. We could add
  the huge translation tables (several thousand elements) for those encodings in the future.
- Fixed numerical entities being accepted whenever the text after # was accepted by strtol.
- Much better commented and better structured code.
- Tests for get_html_translation_table() are broken. I started fixing the tests, but then I realized
  it was hopeless because get_html_translation_table() is itself broken: it does not handle
  multi-byte characters correctly.
2010-10-10 19:04:59 +00:00
Gustavo André dos Santos Lopes
dd5d1b2b66 - Fixed a typo in rev #304208 (24 instead of 34/'"').
- Improved the test bug53021.phpt to reflect other fixes in rev #304208.
- Updated NEWS to reflect other fixes in rev #304208.
2010-10-08 17:27:19 +00:00
Gustavo André dos Santos Lopes
df42830468 - Fixed bug #53021 (In html_entity_decode, failure to convert numeric entities with ENT_NOQUOTES and ISO-8859-1). 2010-10-08 16:19:58 +00:00
Patrick Allaert
0ef60a4544 Fixed typo in tests (thx Eyal) 2010-10-05 10:42:13 +00:00
Dmitry Stogov
d3b6fbe39b Fixed bug #52940 (call_user_func_array still allows call-time pass-by-reference). (cataphract@php.net) 2010-10-01 11:53:04 +00:00
Adam Harvey
f33837ff97 Implemented request #34857 (Change array_combine behaviour when called with
empty arrays). Patch by Joel Perras <joel.perras@gmail.com>.
2010-08-27 03:54:10 +00:00
Kalle Sommer Nielsen
a448b6a72b MFB53: Changed deprecated ini options on startup from E_WARNING to E_DEPRECATED (Fixes #52570)
# Some of the tests were updated to keep them in sync with 5.3, although they no longer run on trunk
2010-08-11 21:41:30 +00:00
Felipe Pena
ef1270e5d0 - Fix test 2010-08-08 16:48:32 +00:00
Ilia Alshanetsky
d9af17b839 Additional fix for bug #52550 & fix test & warning from previous fixes 2010-08-08 15:45:02 +00:00
Felipe Pena
3d2a6927c7 - Fixed bug #52534 (var_export array with negative key) 2010-08-04 23:11:44 +00:00
Dmitry Stogov
fa27ef4620 cleanup 2010-08-03 08:19:51 +00:00
Felipe Pena
a20d96e850 - Removed safe-mode tests 2010-08-01 16:24:42 +00:00
Scott MacVicar
c7b0abe6aa Fix a bug where a fatal error raised inside var_export() could inadvertently display data due to flushing of the output buffer.
Examples include memory limit, execution time and recursion.
2010-07-09 21:11:37 +00:00
Felipe Pena
ce72f33674 - Fixed tests 2010-07-06 00:25:52 +00:00
Felipe Pena
b355aa00b0 - Fixed bug #52138 (Constants are parsed into the ini file for section names) 2010-06-24 22:32:42 +00:00
Pierre Joye
cba1ed2475 - #50563, removing E_WARNING from parse_url() 2010-06-16 18:56:24 +00:00
Dmitry Stogov
d42dbb3bed Fixed bug #51552 (debug_backtrace() causes segmentation fault and/or memory issues) 2010-06-11 08:53:31 +00:00
Christopher Jones
064eda1838 New test for file_exists (bug #39863). It currently XFAILs. 2010-06-05 19:44:48 +00:00
Michael Wallner
11d24c1593 * implement new output API, fixing some bugs and implementing some feature
requests--let's see what I can dig out of the bugtracker for NEWS--
  and while crossing the road:
   * implemented new zlib API
   * fixed up ext/tidy (what was "s&" in zend_parse_parameters() supposed to do?)

Thanks to Jani and Felipe for pioneering.
2010-05-31 10:29:43 +00:00
Michael Wallner
89e93723fb Added support for object references in recursive serialize() calls. FR #36424 2010-05-26 07:24:37 +00:00
Felipe Pena
de531056f7 - Fixed bug #51899 (Parse error in parse_ini_file() function when empty value followed by no newline) 2010-05-26 02:18:17 +00:00
Martin Jansen
a389c77ce2 Changed test case to use our new dummy MX records. 2010-05-25 05:01:03 +00:00
Dmitry Stogov
c5237d82bf Added caches to eliminate repeatable run-time bindings of functions, classes, constants, methods and properties 2010-05-24 14:11:39 +00:00
Christopher Jones
f291a1253d New current()/next() test 2010-05-20 17:55:58 +00:00
Michael Wallner
e012b36ac2 * fixed bug #47842 sscanf() does not support 64-bit values 2010-05-19 11:28:08 +00:00
Pierre Joye
bd2f9d56ec - #51063, news and test 2010-05-05 13:39:35 +00:00
Adam Harvey
574e578629 Alter the getmxrr() test to use lists.php.net (which we presumably control)
instead of ez.no (which we presumably don't) for the single MX record test.
2010-05-04 09:41:47 +00:00
Antony Dovgal
c23d902e0b fix tests 2010-04-29 12:48:06 +00:00
Kalle Sommer Nielsen
dd8e59da8f Removed safe_mode
* Removed ini options, safe_mode*
 * Removed --enable-safe-mode --with-exec-dir configure options on Unix
 * Updated extensions, SAPI's and core
 * php_get_current_user() is now declared in main.c, thus there is no need to include safe_mode.h anymore
2010-04-26 23:53:30 +00:00
Antony Dovgal
19b957b535 fix test 2010-04-26 13:46:40 +00:00
Antony Dovgal
8f2a6d0222 fix test 2010-04-26 13:44:23 +00:00
Antony Dovgal
76b36c0003 fix skipif sections 2010-04-26 13:41:43 +00:00
Pierrick Charron
00209d7966 Update tests related to allow_call_time_pass_reference 2010-04-26 00:27:04 +00:00
Felipe Pena
c4630c0da2 - Remove empty tests 2010-04-26 00:21:18 +00:00
Felipe Pena
0a6bcd44a7 - Removed allow_call_time_pass_reference (Pierrick) 2010-04-26 00:13:34 +00:00
Kalle Sommer Nielsen
1e9c9778f5 For real this time :-/ 2010-04-22 22:57:35 +00:00
Kalle Sommer Nielsen
7bccac6dea Fix the sys_getloadavg() test 2010-04-22 22:41:42 +00:00
Adam Harvey
c6e8a8957b Fix for bug #51604 (newline in end of header is shown in start of message).
Patch by Daniel Egeberg.
2010-04-22 02:22:49 +00:00
Kalle Sommer Nielsen
9d395a4a2b Removed import_request_variables(), this is not needed anymore without register_globals 2010-04-21 22:23:55 +00:00
Kalle Sommer Nielsen
9a38f301d6 Remove highlight.bg: it was removed in the old trunk and it's not referenced in zend_highlight.c, meaning it's not even implemented correctly in 5.3. 2010-04-21 21:56:24 +00:00
Kalle Sommer Nielsen
febee11285 Removed register_globals 2010-04-21 01:27:22 +00:00
Kalle Sommer Nielsen
8087be61d0 * Changed the way removed ini directives are shown so it's easier to add new ones
* Removed define_syslog_variables and its associated functions
2010-04-12 01:52:55 +00:00
Felipe Pena
f3d312ce87 - Fix tests 2010-04-04 17:21:27 +00:00
Felipe Pena
7561171eca - Fix test 2010-04-04 16:59:20 +00:00