php-src

mirror of https://github.com/php/php-src.git synced 2024-09-23 10:57:26 +00:00

Author	SHA1	Message	Date
Yasuo Ohgaki	a84e5dc37d	Remove unneeded string copy. Allow to set ''(empty string values) internal/input/output_encoding for better compatibility. i.e. Runtime INI value changes. More compliance to the RFC. Improve/add encoding handling tests. i.e. Rather than set encoding automagic way, detect it.	2014-03-27 17:20:57 +09:00
Yasuo Ohgaki	e1fe76f28a	Add default_charset handling	2014-03-20 10:50:32 +09:00
Yasuo Ohgaki	cbd108abf1	Implement RFC https://wiki.php.net/rfc/default_encoding	2014-02-13 11:54:52 +09:00
Xinchen Hui	c081ce628f	Bump year	2014-01-03 11:08:10 +08:00
Christopher Jones	9ad97cd489	Reduce (some) compile noise of 'unused variable' and 'may be used uninitialized' warnings.	2013-08-14 20:36:50 -07:00
Gustavo Lopes	77ee200097	Fix bug #64011 (get_html_translation_table()) get_html_translation_table() with encoding ISO-8859-1 and HTMLENTITIES was broken. Only entities for characters U+0000 to U+0040 were being included in the result.	2013-01-18 12:10:27 +01:00
Xinchen Hui	0a7395e009	Happy New Year	2013-01-01 16:28:54 +08:00
Gustavo André dos Santos Lopes	cfdd6c5788	MFH: `7dcada1` for 5.4 - Fixed possible unsigned int wrap around in html.c. Note that 5.3 has the same (potential) problem; even though the code is substantially different, the variable name and the fashion it was incremented was kept.	2012-03-19 16:36:21 +00:00
Gustavo André dos Santos Lopes	ed98579924	- Fixed bug #61374 : html_entity_decode tries to decode code points that don't exist in ISO-8859-1.	2012-03-13 18:08:30 +00:00
Gustavo André dos Santos Lopes	d4cf399cc4	- Merge r323056 (see bug #60965 ).	2012-02-05 09:59:33 +00:00
Felipe Pena	4e19825281	- Year++	2012-01-01 13:15:04 +00:00
Gustavo André dos Santos Lopes	79bb42548d	- Less GCC warnings; code less readable, yay! - Fixed html_tables.h generation in 64-bit archs. - Closes bug #55394 - Patch to suppress initialization warnings in html.c #signed/unsigned mismatches for another day #regenerated tables on another commit	2011-08-31 05:45:02 +00:00
Xinchen Hui	5540b64a3d	Eliminated compiler's warnings	2011-08-10 11:59:11 +00:00
Gustavo André dos Santos Lopes	a61534eab8	- Elided unused argument in internal linkage function.	2011-08-09 00:40:45 +00:00
Gustavo André dos Santos Lopes	547a96090f	- Fixed bug #54332 (trunk only, null pointer deref due to information loss on long to int conversion) - Fixed some int* pointers being passed as size_t*.	2011-03-20 15:15:08 +00:00
Gustavo André dos Santos Lopes	4a946a91e5	- Fixed CHARSET_UNICODE_COMPAT (ISO-8859-1 is compatible in the relevant sense). - Fixed usage of zend_multibyte_get_internal_encoding (its return cannot be cast to char*). - Change tests to reflect that charset detection now relies on internal_encoding, not on current_internal_encoding. NOTE: This fixes the changes in rev 306077, but it remains that that change introduced a BC break. I assumed it was intentional	2011-01-25 10:57:07 +00:00
Felipe Pena	0203cc3d44	- Year++	2011-01-01 02:17:06 +00:00
Dmitry Stogov	755c2cd0d8	Removed compile time dependency from ext/mbstring	2010-12-08 11:27:34 +00:00
Pierrick Charron	71dfe80e05	Remove unused variables	2010-11-17 17:55:18 +00:00
Gustavo André dos Santos Lopes	e69b1ff2c4	- Fixed bug #49687 (utf8_decode vulnerabilities and deficiencies in the number of reported malformed sequences). (Gustavo) #Made a public interface for get_next_char/utf-8 in trunk to use in utf8_decode. #In PHP 5.3, trunk's get_next_char was copied to xml.c because 5.3's #get_next_char is different and is not prepared to recover appropriately from #errors.	2010-10-27 18:13:25 +00:00
Ilia Alshanetsky	18fa045e75	Code cleanup & CS	2010-10-25 16:46:55 +00:00
Gustavo André dos Santos Lopes	20e2c5fc33	- Fixed uninitialized and 1 character short local variable.	2010-10-24 21:19:04 +00:00
Gustavo André dos Santos Lopes	91727cb844	- Completed rewrite of html.c. Except for determine_charset, almost nothing remains. - Fixed bug on determine_charset that was preventing correct detection in combination with internal mbstring encoding "none", "pass" or "auto". - Added profiles for entity encode/decode for HTML 4.01, XHTML 1.0, XML 1.0 and HTML 5. Added the constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5. - htmlentities()/htmlspecialchars(), when told not to double encode, verify the correctness of the existing entities more thoroughly. It is checked whether the numerical entity represents a valid unicode code point (number is between 0 and 0x10FFFF). If using the flag ENT_DISALLOWED, it is also checked whether that numerical entity is valid in the selected document type. In HTML 4.01, all the numerical entities that represent a Unicode code point (up to U+10FFFF) are valid, but that's not the case with other document types. If the entity is not valid, & is encoded to &amp;. For named entities, the check is also more thorough. While before the only check would be to determine if the entity was constituted by alphanumeric characters, now it is checked whether that entity is necessarily defined for the target document type. Otherwise, & is encoded to &amp;. - For html_entity_decode(), only valid numerical and named entities (as defined above for htmlentities()/htmlspecialchars() + !double_encode) are decoded. But there is in this case one additional check. Entities that represent non-SGML or otherwise invalid characters are not decoded. Note that, in HTML5, U+000D is a valid literal character, but the entity &#x0D is not valid and is therefore not decoded. - The hash tables lazily created for decoding in html_entity_decode() that were added recently were substituted by static hash tables. Instead of 1 hash table per encoding, there's only one hash table per document type defined in terms of unicode code points. This means that for charsets other than UTF-8 and ISO-8859-1, a conversion to unicode code points is necessary before decoding. - On the encoding side, the ad hoc ranges of entities of the translation tables, which mapped (in general) non-unicode code points to HTML entities were replaced by three-stage tables for HTML 4 and HTML 5. These mapping tables are defined only in terms of unicode code points, so a conversion is necessary for charsets other than UTF-8 and ISO-8859-1. Even so, the multi-stage table is much faster than the previous method, by a factor of 5; the conversion to unicode is a small penalty because it's just a simple table lookup. XML 1.0/htmlspecialchars() uses a simple table instead of a three-stage table. - Added the flag ENT_SUBSTITUTE, which makes htmlentities()/htmlspecialchars() replace the invalid multibyte sequences with U+FFFD (UTF-8) or &#xFFFD; (other encodings). - Added the flag ENT_DISALLOWED. Implements FR #52860. Characters that cannot appear literally are replaced by U+FFFD (UTF-8) or &#xFFFD; (otherwise). An alternative implementation would be to encode those characters into numerical entities, but that would only work in HTML 4.01 due to limitations on the values of numerical entities in other document types. See also the effects on htmlentities()/htmlspecialchars() with !double_encode above.	2010-10-24 15:01:02 +00:00
Gustavo André dos Santos Lopes	bfcb754eae	- Fixed get_next_char(), used by htmlentities/htmlspecialchars, accepting certain ill-formed UTF-8 sequences.	2010-10-14 19:14:06 +00:00
Gustavo André dos Santos Lopes	4de6c3a948	- Added a 3rd parameter to get_html_translation_table. It now takes a charset hint, like htmlentities et al. - Fixed bug #49407 (get_html_translation_table doesn't handle UTF-8). - Fixed bug #25927 (get_html_translation_table calls the ' &#39; instead of &#039;). - Fixed tests for get_html_translation_table and unified the Windows and non-Windows versions of the tests.	2010-10-12 02:51:11 +00:00
Gustavo André dos Santos Lopes	f4a896c209	- PHP uses a big endian representation when it converts the code unit sequences to integers so as to store the entity maps. Code in traverse_for_entities assumed little endian. Fixed. (in practice, due to the absence of unicode and entity mappings for multi-byte encodings -- except UTF-8 --, this doesn't matter, so the relevant code was commented out for performance reasons).	2010-10-11 22:26:10 +00:00
Gustavo André dos Santos Lopes	7aa43a8d83	- Revamp of the decoding portion of html.c. - Dramatic improvements on the performance of html_entity_decode and htmlspecialchars_decode, as the string is now traversed only once. Speedups of 20 to 25 times with Windows release builds and a ~250 characters string (for 2nd and subsequent calls). - Consistent behavior on html_entity_decode. For instance, the entity in "&<" would be decoded, but not "&é". Not anymore. The code path for "basic" and non-basic entities is now mostly shared. - Code of html_entity_decode and htmlspecialchars_decode is now shared. - [DOC] More consistent behavior of htmlspecialchars_decode. Instead of translating only <, >, &, ", ' and ', now e.g. ", ', ', ', etc. are also decoded. - [DOC] Previous translation of unicode code points in numerical entities was seriously broken. When the code points for some character were not the same in unicode and the target encoding, the behavior could be an erroneous translation (e.g. 0x80-0xA0 in win-1252) or no translation at all. Added unicode translation tables for all single-byte encodings. Entities are not translated for multi-byte encodings, except for ASCII characters whose code points are shared. We could add the huge translation tables (several thousand elements) for those encodings in the future. - Fixed numerical entities that after # had text accepted by strtol being accepted. - Much more commented and well-structured code... - Tests for get_html_translation_table() are broken. I started fixing the tests, but then I realized it was completely hopeless because get_html_translation_table() is broken by not handling multi-byte characters correctly.	2010-10-10 19:04:59 +00:00
Gustavo André dos Santos Lopes	dd5d1b2b66	- Fixed a typo in rev #304208 (24 instead of 34/'"'). - Improved the test bug53021.phpt to reflect other fixes in rev #304208. - Updated NEWS to reflect other fixes in rev #304208.	2010-10-08 17:27:19 +00:00
Gustavo André dos Santos Lopes	df42830468	- Fixed bug #53021 (In html_entity_decode, failure to convert numeric entities with ENT_NOQUOTES and ISO-8859-1).	2010-10-08 16:19:58 +00:00
Kalle Sommer Nielsen	cb50011016	Fixed compiler warnings in the standard library	2010-09-23 03:45:36 +00:00
Rasmus Lerdorf	906dd4eac5	Switch default_charset, if not specified, from ISO-8859-1 to UTF-8 I have been wanting to make this change for years, but there is a small chance of BC issues, so it shouldn't go into a minor release.	2010-03-23 18:08:06 +00:00
Moriyoshi Koizumi	73ba495674	- Forgot to commit this patch. Sorry.	2010-03-12 16:19:25 +00:00
Sebastian Bergmann	9ba1e81665	sed -i "s#1997-2009#1997-2010#g" */.c */.h */.php	2010-01-03 09:23:27 +00:00
Moriyoshi Koizumi	7d9a7dbad6	- Fix bug #46478 (htmlentities() uses obsolete mapping table for character entity references)	2009-12-22 05:50:34 +00:00
Moriyoshi Koizumi	413196c574	- Take account of surrogate pairs.	2009-12-07 15:41:43 +00:00
Moriyoshi Koizumi	20737bac6a	- Bug #49785 : take 5. What the hell happened to me...	2009-10-13 05:18:37 +00:00
Moriyoshi Koizumi	884cf3f1c0	- Bug #49785 : take 4 - typo. this flaw is unharmful since the return value of get_next_char() is only used when UTF-8 is specified to the third argument.	2009-10-12 14:29:45 +00:00
Moriyoshi Koizumi	1835a63dfd	- A couple more fix for my previous fix. (one of the fix by Arnaud Le Blanc. Thanks!)	2009-10-11 23:52:33 +00:00
Moriyoshi Koizumi	9d19866476	- Fixed bug #49785 (insufficient input string validation of htmlspecialchars()).	2009-10-09 10:02:38 +00:00
Sebastian Bergmann	08659c2dcd	MFH: Bump copyright year, 3 of 3.	2008-12-31 11:15:49 +00:00
Arnaud Le Blanc	18794addbd	MFH: Added ENT_IGNORE as a compatibility flag for htmlentities() and htmlspecialchars() to skip multibyte sequences instead of returning an empty string (as iconv's //IGNORE). These functions will still never return an invalid or incomplete multibyte sequence. Fixes #43896	2008-11-26 03:00:06 +00:00
Arnaud Le Blanc	a05edaf2bd	MFB 5.2	2008-11-26 02:43:16 +00:00
Arnaud Le Blanc	d69dfa4b9f	MFH: initialize optional vars	2008-10-21 22:08:38 +00:00
Moriyoshi Koizumi	0699894884	- MFH: beware of signedness	2008-08-18 03:26:21 +00:00
Arnaud Le Blanc	71e50de4fc	MFH: Fixed bug #45581 (htmlspecialchars() double encoding &#x hex items)	2008-08-10 13:26:13 +00:00
Felipe Pena	fce4f9600e	MFB: Fixed bug #44703 (htmlspecialchars() does not detect bad character set argument)	2008-04-11 19:06:12 +00:00
Stanislav Malyshev	223a53fdeb	rm cruft	2008-01-29 22:03:01 +00:00
Antony Dovgal	37a607c7f8	fix #43927 (koi8r is missing from html_entity_decode()) patch by andy at demos dot su	2008-01-28 23:07:12 +00:00
Scott MacVicar	23e3baf62d	Fix html_entity_decode when converting numeric html entities, the numeric values for the extended characters don't correspond to that of windows-1251 and cp866.	2008-01-25 18:10:45 +00:00
Sebastian Bergmann	d1dded8751	MFH: Bump copyright year, 2 of 2.	2007-12-31 07:17:19 +00:00

181 Commits