60072 Commits

Author SHA1 Message Date
Gustavo André dos Santos Lopes
91727cb844 - Completed rewrite of html.c. Except for determine_charset, almost nothing
remains.
- Fixed bug on determine_charset that was preventing correct detection in
  combination with internal mbstring encoding "none", "pass" or "auto".
- Added profiles for entity encode/decode for HTML 4.01, XHTML 1.0, XML 1.0
  and HTML 5. Added the constants ENT_HTML401, ENT_XML1, ENT_XHTML and
  ENT_HTML5.
- htmlentities()/htmlspecialchars(), when told not to double encode, verify
  the correctness of the existing entities more thoroughly.
  It is checked whether the numerical entity represents a valid unicode code
  point (number is between 0 and 0x10FFFF). If using the flag ENT_DISALLOWED,
  it is also checked whether that numerical entity is valid in the selected
  document type. In HTML 4.01, all the numerical entities that represent a
  Unicode code point (<= U+10FFFF) are valid, but that's not the case with
  other document types. If the entity is not valid, & is encoded to &amp;.
  For named entities, the check is also more thorough. While before the only
  check would be to determine if the entity was constituted by alphanumeric
  characters, now it is checked whether that entity is necessarily defined for
  the target document type. Otherwise, & is encoded to &amp;.
- For html_entity_decode(), only valid numerical and named entities (as defined
  above for htmlentities()/htmlspecialchars() + !double_encode) are decoded.
  But there is in this case one additional check. Entities that represent
  non-SGML or otherwise invalid characters are not decoded. Note that, in
  HTML5, U+000D is a valid literal character, but the entity &#x0D; is not
  valid and is therefore not decoded.
- The hash tables lazily created for decoding in html_entity_decode() that were
  added recently were substituted by static hash tables. Instead of 1 hash
  table per encoding, there's only one hash table per document type defined in
  terms of unicode code points. This means that for charsets other than UTF-8
  and ISO-8859-1, a conversion to unicode code points is necessary before
  decoding.
- On the encoding side, the ad hoc ranges of entities of the translation
  tables, which mapped (in general) non-unicode code points to HTML entities
  were replaced by three-stage tables for HTML 4 and HTML 5. These mapping
  tables are defined only in terms of unicode code points, so a conversion
  is necessary for charsets other than UTF-8 and ISO-8859-1. Even so, the
  multi-stage table is much faster than the previous method, by a factor
  of 5; the conversion to unicode is a small penalty because it's just a
  simple table lookup.
  XML 1.0/htmlspecialchars() uses a simple table instead of a three-stage
  table.
- Added the flag ENT_SUBSTITUTE, which makes htmlentities()/htmlspecialchars()
  replace the invalid multibyte sequences with U+FFFD (UTF-8) or &#xFFFD;
  (other encodings).
- Added the flag ENT_DISALLOWED. Implements FR #52860. Characters that cannot
  appear literally are replaced by U+FFFD (UTF-8) or &#xFFFD; (otherwise).
  An alternative implementation would be to encode those characters into
  numerical entities, but that would only work in HTML 4.01 due to limitations
  on the values of numerical entities in other document types. See also the
  effects on htmlentities()/htmlspecialchars() with !double_encode above.
2010-10-24 15:01:02 +00:00
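The three-stage encoding tables mentioned in the commit above can be pictured with a minimal sketch. This is not the layout used in html.c: the 9/6/6-bit split, the struct names and the four sample entries are assumptions chosen for illustration. It only shows the general technique of resolving a code point through three successive array lookups, with NULL standing for "no entity".

```c
#include <stddef.h>

/* Hypothetical layout: cp >> 12 selects a stage-2 table, (cp >> 6) & 0x3F
 * selects a stage-3 table, and cp & 0x3F selects the entity name. */
struct leaf { const char *name[64]; };
struct mid  { const struct leaf *leaf[64]; };

/* U+0000..U+003F: only &amp;, &lt; and &gt; in this sample. */
static const struct leaf leaf_0000 = {
    .name = { [0x26] = "amp", [0x3C] = "lt", [0x3E] = "gt" }
};
/* U+00C0..U+00FF: U+00E9 has low six bits 0x29. */
static const struct leaf leaf_00C0 = {
    .name = { [0x29] = "eacute" }
};
static const struct mid mid_0000 = {
    .leaf = { [0] = &leaf_0000, [3] = &leaf_00C0 }
};
/* 0x10FFFF >> 12 == 0x10F, so 0x110 stage-1 slots cover all of Unicode. */
static const struct mid *const root[0x110] = { [0] = &mid_0000 };

/* Returns the entity name for a code point, or NULL if there is none. */
static const char *entity_for(unsigned long cp)
{
    if (cp > 0x10FFFF)
        return NULL;
    const struct mid *m = root[cp >> 12];
    if (m == NULL)
        return NULL;
    const struct leaf *l = m->leaf[(cp >> 6) & 0x3F];
    if (l == NULL)
        return NULL;
    return l->name[cp & 0x3F];
}
```

Empty regions cost one NULL pointer each, which is why such tables stay small even though they are indexed directly by code point.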
Felipe Pena
f0d2559d26 - Fixed bug #53144 (SplObjectStorage::removeAll()) 2010-10-24 14:03:07 +00:00
Andrey Hristov
fe719c5e42 profiling in trace mode 2010-10-22 15:46:26 +00:00
Dmitry Stogov
68e154b1bf reduced size of temp_variable 2010-10-22 14:51:07 +00:00
Andrey Hristov
a25ce8c606 last piece to enable trace logging on windows 2010-10-22 14:34:33 +00:00
Andrey Hristov
76783dfeb5 enable debug logging on windows, in debug builds, of course 2010-10-22 14:12:45 +00:00
Dmitry Stogov
968bdc576c Simplified foreach() handling, we don't have to increment/decrement refcount twice 2010-10-22 13:59:23 +00:00
Dmitry Stogov
d12098eeec Fixed crash on attempt to insert reference to string offset into an array 2010-10-22 11:05:22 +00:00
Dmitry Stogov
635f3aff75 Removed redundant check 2010-10-22 09:56:39 +00:00
Ilia Alshanetsky
96c769f602 Upgraded bundled sqlite to version 3.7.3 2010-10-20 19:27:34 +00:00
Derick Rethans
0e24a7c400 - Strip out the typehint *checks* only. They are still parsed, and they are
still accessible through the reflection API.
2010-10-19 10:42:38 +00:00
Pierre Joye
defd00ab01 - Fixed NULL pointer dereference in ZipArchive::getArchiveComment (CVE-2010-3709), report & patch from Maksymilian Arciemowicz 2010-10-19 09:56:11 +00:00
Adam Harvey
baa6f7fc71 Fix bug #53089 (php.ini should use portable example of find) by using POSIX
compliant syntax in the suggested find command for cleaning up session files in
the shipped php.ini files.
2010-10-18 02:10:29 +00:00
Felipe Pena
c88dbc2262 - Fixed bug #53070 (Calling enchant_broker_get_dict_path before set_path crashes php) 2010-10-16 17:52:01 +00:00
Derick Rethans
e9dc8785a9 - Added the writing of .sh files so that we can run the tests (including all
INI settings) in one go.
2010-10-15 12:56:45 +00:00
Dmitry Stogov
3690ce39d9 zend_collect_module_handlers() has to be called after zend_extensions startup, because they can register additional 'hidden' extensions 2010-10-15 07:30:24 +00:00
Hartmut Holzgraefe
aaa2f1c30b marked char pointer arguments as const in lots of
places where strings pointed to are not modified 
to prevent compiler warnings about discarded qualifiers ...
2010-10-14 21:33:10 +00:00
Gustavo André dos Santos Lopes
738be1a003 - Three tests were "broken" by rev #304404, not two. Commit the change
to the remaining one.
2010-10-14 19:33:12 +00:00
Gustavo André dos Santos Lopes
bfcb754eae - Fixed get_next_char(), used by htmlentities/htmlspecialchars, accepting
certain ill-formed UTF-8 sequences.
2010-10-14 19:14:06 +00:00
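For context on what "ill-formed UTF-8 sequences" covers in the commit above, here is a minimal sketch of a strict decoder. It is not the get_next_char() from html.c; the function name and signature are hypothetical. It rejects the usual classes of ill-formed input: stray continuation bytes, truncated sequences, overlong forms, surrogates and values above U+10FFFF.

```c
#include <stddef.h>

/* Decodes one UTF-8 sequence from s[0..len). On success returns the code
 * point and stores the number of bytes consumed in *used. Returns -1 for
 * any ill-formed sequence and leaves *used untouched. */
static long utf8_next(const unsigned char *s, size_t len, size_t *used)
{
    long cp, min;
    size_t need, i;

    if (len == 0)
        return -1;
    if (s[0] < 0x80) {                         /* ASCII */
        *used = 1;
        return s[0];
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) { /* 2 bytes; 0xC0/0xC1 are overlong */
        need = 1; cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {        /* 3 bytes */
        need = 2; cp = s[0] & 0x0F; min = 0x800;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) { /* 4 bytes; 0xF5+ exceeds U+10FFFF */
        need = 3; cp = s[0] & 0x07; min = 0x10000;
    } else {
        return -1;                             /* stray continuation or invalid lead */
    }
    if (len < need + 1)
        return -1;                             /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                         /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min)
        return -1;                             /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;                             /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return -1;                             /* beyond Unicode range */
    *used = need + 1;
    return cp;
}
```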
Gustavo André dos Santos Lopes
3943351e6a - [DOC] Reverted rev #304382 and rev #304380, as I figured out a way to
fix the erratic behavior without breaking backwards compatibility. Namely,
  $offset retains SEEK_SET behavior but actually SEEK_CUR is passed to
  _php_stream_seek, if possible, by moving the offset stream->position bytes.
- Addresses bug #53006.
2010-10-14 03:15:15 +00:00
Gustavo André dos Santos Lopes
fbd3eb6439 - Ooops. Fixed tests for rev #304380 (stream_get_contents() related) and a small error. 2010-10-14 02:39:21 +00:00
Gustavo André dos Santos Lopes
1ee489f00e - [DOC] Changed stream_get_contents() so that the offset is relative to the
current position (seek with SEEK_CUR, not SEEK_SET). Only positive values are
  allowed. This breaking change is necessary to fix the erratic behavior in
  streams without a seek handler. Addresses bug #53006.
#Note that the example on the doc page for stream_get_contents() may fail
#without this change.
#This change is also in the spirit of stream_get_contents(), whose description
#is "Reads all remaining bytes (or up to maxlen bytes) from a stream...".
#Previous behavior allowed setting the file pointer to positions before the
#current one, so they wouldn't be "remaining bytes". The previous behavior was
#also inconsistent in that it allowed moving to offset 1, 2, ..., but not 0.
2010-10-14 02:03:18 +00:00
Adam Harvey
86944b47a6 Fix vim marker folds. 2010-10-13 09:23:39 +00:00
Dmitry Stogov
f4173a8ece Fixed bug #52939 (zend_call_function does not respect ZEND_SEND_PREFER_REF) 2010-10-13 08:51:39 +00:00
Gustavo André dos Santos Lopes
a1888f585c - Fixed forward stream seeking emulation in streams that don't support seeking
in situations where the read operation gives back less data than requested
  and when there was data in the buffer before the emulation started. Also made
  its behavior more consistent -- it should return failure every time less data
  than was requested was skipped.
- Small performance improvement by correcting an off-by-one error that generated
  an invalid call to the seek handler or read handler in _php_stream_seek.
2010-10-13 03:13:29 +00:00
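The forward seek emulation described in the commit above amounts to reading and discarding bytes. A minimal sketch of that idea follows, under stated assumptions: the read_fn callback type, skip_forward() and the mem_src test source are all hypothetical and unrelated to the real _php_stream_seek. The point it illustrates is that a short read is not a failure by itself; the skip fails only if the source runs dry before the requested count is reached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical read callback: returns the number of bytes stored in buf,
 * which may be fewer than n; 0 means end of data or error. */
typedef size_t (*read_fn)(void *ctx, char *buf, size_t n);

/* Skips exactly count bytes by reading and discarding them.
 * Returns 0 on success, -1 if fewer than count bytes could be skipped. */
static int skip_forward(read_fn rd, void *ctx, size_t count)
{
    char tmp[1024];

    while (count > 0) {
        size_t want = count < sizeof tmp ? count : sizeof tmp;
        size_t got = rd(ctx, tmp, want);
        if (got == 0 || got > want)
            return -1;          /* source exhausted or misbehaving */
        count -= got;           /* a short read just means another round */
    }
    return 0;
}

/* In-memory source that deliberately returns at most 3 bytes per call,
 * to exercise the short-read path. */
struct mem_src { const char *data; size_t len, pos; };

static size_t mem_read(void *ctx, char *buf, size_t n)
{
    struct mem_src *m = ctx;
    size_t left = m->len - m->pos;

    if (n > 3)
        n = 3;
    if (n > left)
        n = left;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}
```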
Kalle Sommer Nielsen
890d89fdb8 * Added version info for Windows XP Starter/Tablet PC/Media Center editions
* Fixed typo: "Unknow" -> "Unknown"
* Removed useless Win9x version info

# I will merge this to 5.3 once I have analyzed a possible bug
# (and hopefully fixed) why Server 2008 is reported as unknown
2010-10-12 17:34:25 +00:00
Dmitry Stogov
bfe51842ab Added test 2010-10-12 07:38:36 +00:00
Gustavo André dos Santos Lopes
4de6c3a948 - Added a 3rd parameter to get_html_translation_table. It now takes a charset
hint, like htmlentities et al.
- Fixed bug #49407 (get_html_translation_table doesn't handle UTF-8).
- Fixed bug #25927 (get_html_translation_table calls the ' &#39; instead of
  &#039;).
- Fixed tests for get_html_translation_table and unified the Windows and
  non-Windows versions of the tests.
2010-10-12 02:51:11 +00:00
Gustavo André dos Santos Lopes
f4a896c209 - PHP uses a big endian representation when it converts the
code unit sequences to integers so as to store the entity
  maps. Code in traverse_for_entities assumed little
  endian. Fixed.
  (in practice, due to the absence of unicode and entity
  mappings for multi-byte encodings -- except UTF-8 --, this
  doesn't matter, so the relevant code was commented out for
  performance reasons).
2010-10-11 22:26:10 +00:00
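The big endian representation mentioned in the commit above can be illustrated with a minimal sketch. pack_be() is a hypothetical helper, not a function from html.c: the first code unit of the sequence ends up in the most significant position of the resulting integer.

```c
#include <stddef.h>

/* Packs up to sizeof(unsigned long) code units into one integer,
 * first unit most significant (big endian order). */
static unsigned long pack_be(const unsigned char *units, size_t n)
{
    unsigned long v = 0;
    size_t i;

    for (i = 0; i < n; i++)
        v = (v << 8) | units[i];
    return v;
}
```

Under this scheme the UTF-8 sequence C3 A9 becomes 0xC3A9, whereas code assuming little endian order would expect 0xA9C3 for the same bytes.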
Gustavo André dos Santos Lopes
17dc181117 - Removed redundant local variable in dns_get_record.
- (5.3) Fixed bug in the Windows implementation of
  dns_get_record, where the two last parameters wouldn't be
  filled unless the type were DNS_ANY (Gustavo).
2010-10-11 03:07:03 +00:00
Gustavo André dos Santos Lopes
91f64706c2 - [DOC] Added a 5th parameter to dns_get_record, a boolean that tells whether to activate
"raw mode". In this mode, $type (2nd parameter) is the numeric type of the record, and
  the responses are not parsed -- the "type" element will be numeric and there will be
  a "data" element with the raw data of the response buffer, which the programmer will
  have to parse.
- Fixed bug in the Win32 implementation of dns_get_record, where the 3rd and 4th arguments
  would only be filled if the 2nd ($type) was DNS_ANY.
- [DOC] The 3rd and 4th parameters can now be NULL (changed their arginfo).
2010-10-11 02:48:23 +00:00
Gustavo André dos Santos Lopes
7aa43a8d83 - Revamp of the decoding portion of html.c.
- Dramatic improvements on the performance of html_entity_decode and htmlspecialchars_decode, as the
  string is now traversed only once. Speedups of 20 to 25 times with Windows release builds and a
  ~250 characters string (for 2nd and subsequent calls).
- Consistent behavior on html_entity_decode. For instance, the entity in "&&lt;" would be decoded,
  but not "&&#233;". Not anymore. The code path for "basic" and non-basic entities is now mostly
  shared.
- Code of html_entity_decode and htmlspecialchars_decode is now shared.
- [DOC] More consistent behavior of htmlspecialchars_decode. Instead of translating only &lt;, &gt;,
  &amp;, &quot;, &#039; and &#39;, now e.g. &#34;, &apos;, &#0039;, &#x27;, etc. are also decoded.
- [DOC] Previous translation of unicode code points in numerical entities was seriously broken. When
  the code points for some character were not the same in unicode and the target encoding, the
  behavior could be an erroneous translation (e.g. 0x80-0xA0 in win-1252) or no translation at all.
  Added unicode translation tables for all single-byte encodings. Entities are not translated for
  multi-byte encodings, except for ASCII characters whose code points are shared. We could add
  the huge translation tables (several thousand elements) for those encodings in the future.
- Fixed numerical entities being accepted when, after #, they had any text accepted by strtol.
- Much more commented and well-structured code...
- Tests for get_html_translation_table() are broken. I started fixing the tests, but then I realized
  it was completely hopeless because get_html_translation_table() is broken by not handling
  multi-byte characters correctly.
2010-10-10 19:04:59 +00:00
Gustavo André dos Santos Lopes
b1d5cf7348 - Added numeric record type and raw data for unknown DNS
record types.
2010-10-08 23:02:37 +00:00
Gustavo André dos Santos Lopes
dd5d1b2b66 - Fixed a typo in rev #304208 (24 instead of 34/'"').
- Improved the test bug53021.phpt to reflect other fixes in rev #304208.
- Updated NEWS to reflect other fixes in rev #304208.
2010-10-08 17:27:19 +00:00
Gustavo André dos Santos Lopes
df42830468 - Fixed bug #53021 (In html_entity_decode, failure to convert numeric entities with ENT_NOQUOTES and ISO-8859-1). 2010-10-08 16:19:58 +00:00
Andrey Hristov
74ec58a045 new collations 2010-10-08 09:15:31 +00:00
Felipe Pena
dbb6e0e9b0 - Added bison 2.4.3 version to the bison version list 2010-10-07 21:44:41 +00:00
Andrey Hristov
0e519d247e fix some uninitialized variables. also fix shadowing of global symbols 2010-10-07 13:49:00 +00:00
Ilia Alshanetsky
412d151681 Fixed extraneous warning inside openssl_encrypt() for cases where iv not provided, but algo does not require an iv 2010-10-07 12:32:00 +00:00
Gustavo André dos Santos Lopes
e283f7a7fe - Added support for ICU Transformations (Transliterator).
- Changes request #52986 to "to be documented".
2010-10-06 18:53:27 +00:00
Gustavo André dos Santos Lopes
da6366e74a - Fixed test for bug #50590 on systems with 64-bit longs. 2010-10-06 17:05:05 +00:00
Andrey Hristov
1f9cf93cac Fix for Bug #52686 mysql_stmt_attr_[gs]et arg. points to incorrect type 2010-10-06 11:11:02 +00:00
Andrey Hristov
e38078bae9 fix broken merge, led to double define 2010-10-06 10:35:02 +00:00
Pierre Joye
76d273d455 - don't build if no libvpx available 2010-10-06 09:02:08 +00:00
Andrey Hristov
63d6892b56 more variable hiding fixes 2010-10-06 07:09:37 +00:00
Andrey Hristov
b373ccd6fc fix shadowing of parameters 2010-10-06 06:08:55 +00:00
Kalle Sommer Nielsen
b5831c05b3 Update copyright year for the license 2010-10-05 22:58:19 +00:00
Andrey Hristov
223832c501 rename parameter name - should not shadow the global symbol alloca 2010-10-05 17:20:00 +00:00
Andrey Hristov
c7a09c682d two more compiler warnings fixed - size does matter 2010-10-05 17:10:47 +00:00
Andrey Hristov
396402fc97 Rename a method so it doesn't clash with a global symbol - a function
Fix compiler warning by extending the type of a variable
2010-10-05 17:03:50 +00:00