php-src

mirror of https://github.com/php/php-src.git synced 2024-09-21 18:07:23 +00:00


Alex Dowad a1a69c3734 2022-12-09 15:18:37 +02:00

Support Microsoft's "Best Fit" mappings for Windows-1252 text encoding

In `b5ff87ca71`, I made a number of adjustments to our conversion code for CP1252. One of the adjustments was to make the mappings match those published by the Unicode Consortium in the file CP1252.TXT. These do not include mappings for the CP1252 bytes 0x81, 0x8D, 0x8F, 0x90, and 0x9D.

Rostyslav Gulka reported that this caused a problem. His application stores binary JPEG data in an MS-SQL database. When they SELECT the binary data out of the database, it is treated as CP1252 text and automatically converted to UTF-8. To recover the original binary data, they then do a conversion from UTF-8 to CP1252. Obviously, that does not work if certain CP1252 bytes do not map to any Unicode codepoint at all.

While this is a very unusual application of text encoding conversion, and we might choose not to support it if there was no other basis for including those mappings, it seems that Microsoft does actually include them in the Win32 API as "best fit" mappings. These are extra mappings from Unicode to other text encodings, which the Win32 API function WideCharToMultiByte uses by default unless the WC_NO_BEST_FIT_CHARS flag is passed.

A list of these "best fit" mappings for CP1252 can be found here: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
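The round-trip problem described in the commit message can be reproduced with a minimal Python sketch. This is not PHP's libmbfl code, which is written in C. The sketch leans on Python's built-in `cp1252` codec, which, like CP1252.TXT, leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. It assumes that the best-fit table maps each of those bytes to the C1 control codepoint with the same value (0x81 <-> U+0081, and so on). The function names are hypothetical.

```python
# Bytes that CP1252.TXT (and Python's built-in "cp1252" codec) leave undefined.
# Assumption: following bestfit1252.txt, each one maps to the C1 control
# codepoint with the same numeric value, e.g. 0x81 <-> U+0081.
BEST_FIT_BYTES = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_cp1252_best_fit(data: bytes) -> str:
    """Decode CP1252 bytes, using the assumed best-fit mapping for the five gaps."""
    chars = []
    for b in data:
        if b in BEST_FIT_BYTES:
            chars.append(chr(b))  # best fit: byte 0xNN -> U+00NN
        else:
            chars.append(bytes([b]).decode("cp1252"))  # standard CP1252.TXT mapping
    return "".join(chars)


def encode_cp1252_best_fit(text: str) -> bytes:
    """Encode text to CP1252, mapping U+0081 etc. back to their original bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in BEST_FIT_BYTES:
            out.append(cp)  # best fit: U+00NN -> byte 0xNN
        else:
            out += ch.encode("cp1252")  # raises UnicodeEncodeError if unmappable
    return bytes(out)


if __name__ == "__main__":
    blob = bytes(range(256))  # stands in for arbitrary binary data, e.g. a JPEG

    # Strict CP1252 (CP1252.TXT mappings only) cannot decode the five gap bytes.
    try:
        blob.decode("cp1252")
    except UnicodeDecodeError as exc:
        print("strict cp1252 fails:", exc.reason)

    # With the best-fit mappings, CP1252 -> UTF-8 -> CP1252 is lossless.
    utf8 = decode_cp1252_best_fit(blob).encode("utf-8")
    restored = encode_cp1252_best_fit(utf8.decode("utf-8"))
    print("round trip ok:", restored == blob)
```

Treating the five bytes as a small exception set, rather than replacing the whole table, keeps every other byte on the standard CP1252.TXT mapping, so only data that previously failed to convert behaves differently.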
libmbfl	Support Microsoft's "Best Fit" mappings for Windows-1252 text encoding	2022-12-09 15:18:37 +02:00
tests	Support Microsoft's "Best Fit" mappings for Windows-1252 text encoding	2022-12-09 15:18:37 +02:00
ucgendat	Combine control into one character group	2021-08-24 20:39:16 +02:00
common_codepoints.txt	mb_detect_encoding recognizes all letters in Hungarian alphabet	2022-05-25 08:22:07 +02:00
config.m4	Remove duplicate implementation of CP932 from mbstring	2021-06-17 13:12:40 +02:00
config.w32	Remove duplicate implementation of CP932 from mbstring	2021-06-17 13:12:40 +02:00
CREDITS
gen_rare_cp_bitvec.php	Improve detection accuracy of mb_detect_encoding	2021-10-19 18:05:51 +02:00
mb_gpc.c	Update http->https in license (#6945)	2021-05-06 12:16:35 +02:00
mb_gpc.h	Update http->https in license (#6945)	2021-05-06 12:16:35 +02:00
mbstring_arginfo.h	Add support for generating MAY_BE_ARRAY_OF_REF func info flag (#7416)	2021-08-30 13:50:34 +02:00
mbstring.c	Fix GH-9008: mb_detect_encoding(): wrong results with null $encodings	2022-07-20 16:58:55 +02:00
mbstring.h	Update http->https in license (#6945)	2021-05-06 12:16:35 +02:00
mbstring.stub.php	Add support for generating MAY_BE_ARRAY_OF_REF func info flag (#7416)	2021-08-30 13:50:34 +02:00
php_mbregex.c	Update http->https in license (#6945)	2021-05-06 12:16:35 +02:00
php_mbregex.h	Update http->https in license (#6945)	2021-05-06 12:16:35 +02:00
php_onig_compat.h
php_unicode.c	Return bool from php_unicode_is_prop()	2021-08-24 19:21:21 +02:00
php_unicode.h	Add comments to grouped character properties	2021-08-24 22:09:26 +02:00
rare_cp_bitvec.h	mb_detect_encoding recognizes all letters in Hungarian alphabet	2022-05-25 08:22:07 +02:00
unicode_data.h	Update Unicode tables to 14.0.0	2021-09-20 09:58:20 +02:00