mirror of
https://github.com/php/php-src.git
synced 2024-10-19 15:34:25 +00:00
775 lines
22 KiB
Plaintext
775 lines
22 KiB
Plaintext
|
==========================================
|
|||
|
README for I18N Package
|
|||
|
==========================================
|
|||
|
|
|||
|
o Name and location of package
|
|||
|
|
|||
|
Name: php-3.0.18-i18n-ja-2
|
|||
|
Location: http://www.happysize.co.jp/techie/php-ja-jp/
|
|||
|
ftp://ftp.happysize.co.jp/php-ja-jp/
|
|||
|
http://php.vdomains.org/
|
|||
|
ftp://ftp.vdomains.org/pub/php-ja-jp/
|
|||
|
http://php.jpnnet.com/
|
|||
|
|
|||
|
Currently, this I18N version of PHP only adds Japanese support to base
|
|||
|
PHP. It allows you to use Japanese in scripts, as well as conversion
|
|||
|
between various Japanese encodings. It will work perfectly fine with
|
|||
|
ASCII with i18n option enabled. (note: executable is bit larger due
|
|||
|
to UNICODE table). The basic design aproach is to allow for other
|
|||
|
languages to be added in the future. Developers are encourage to join
|
|||
|
us!
|
|||
|
|
|||
|
For more information on Japanese encodings, please refer to the
|
|||
|
section "Additional Notes."
|
|||
|
|
|||
|
|
|||
|
o What is this package?
|
|||
|
|
|||
|
This package allows you to handle multiple Japanese encodings (SJIS, EUC,
|
|||
|
UTF-8, JIS) in PHP. If you find any bugs in this package, please report
|
|||
|
them to the appropriate mailing list. For now, the PHP-jp mailing list
|
|||
|
is the best place for this.
|
|||
|
|
|||
|
PHP-jp ML mailto:PHP-jp@sidecar.ics.es.osaka-u.ac.jp
|
|||
|
http://sidecar.ics.es.osaka-u.ac.jp/php-jp/
|
|||
|
(discussions are in Japanese)
|
|||
|
|
|||
|
|
|||
|
o Who should use this
|
|||
|
|
|||
|
Due to lack of documentation, it's not intended for beginners. If
|
|||
|
something goes wrong, be prepared to fix it on your own.
|
|||
|
|
|||
|
|
|||
|
o Warranty and Copyright
|
|||
|
|
|||
|
There is no warranty with this package. Use it at your own risk.
|
|||
|
|
|||
|
Please refer to the source code for the copyrights. In general, each
|
|||
|
program's copyright is owned by the programmer. Unless you obey the
|
|||
|
copyright holders restrictions, you are not allowed to use it in any
|
|||
|
form.
|
|||
|
|
|||
|
|
|||
|
o Redistribution
|
|||
|
|
|||
|
As described in the source code, this package and the components are
|
|||
|
allowed to be redistributed with certain restrictions.
|
|||
|
|
|||
|
Due to this package being still in beta, please try to redistribute
|
|||
|
it as an entire package. Please try not to distribute it as a form
|
|||
|
of patch. Because we would prefer to have this package distributed
|
|||
|
as one single package (not patch of patch of patch), avoid releasing
|
|||
|
any patch to this package.
|
|||
|
|
|||
|
|
|||
|
o Who made this
|
|||
|
|
|||
|
A team of volunteers, PHP3 Internationalization, has been contributing
|
|||
|
their free time producing it. Although we are not related to the core
|
|||
|
PHP programmers, we are hoping to have our modifications merged into the
|
|||
|
core distribution in the near future. Thus, we did not call this a
|
|||
|
"Japanese Patch" (or distribution). Our final goal is to have true
|
|||
|
i18nized PHP!
|
|||
|
|
|||
|
For anyone interested in this project, please drop us a line.
|
|||
|
|
|||
|
Contact Address:
|
|||
|
phpj-dev@kage.net
|
|||
|
(Discussions are in Japanese, but feel free to write us in English)
|
|||
|
|
|||
|
Webpage (English and Japanese):
|
|||
|
http://php.jpnnet.com/
|
|||
|
|
|||
|
Project Outline (Japanese):
|
|||
|
http://www.happysize.co.jp/techie/php-ja-jp/spec.htm
|
|||
|
|
|||
|
Developers:
|
|||
|
Hironori Sato <satoh@jpnnet.com>
|
|||
|
Shigeru Kanemoto <sgk@happysize.co.jp>
|
|||
|
Tsukada Takuya <tsukada@fminn.nagano.nagano.jp>
|
|||
|
U. Kenkichi <kenkichi@axes.co.jp>
|
|||
|
Tateyama <tateyan@amy.hi-ho.ne.jp>
|
|||
|
Other gracious contributors
|
|||
|
|
|||
|
|
|||
|
o Future plans
|
|||
|
|
|||
|
- fulfilling what's written in outline
|
|||
|
- support for other languages other than Japanese
|
|||
|
- make the character conversion as a library (?)
|
|||
|
- more testing
|
|||
|
|
|||
|
|
|||
|
o Special Thanks to
|
|||
|
|
|||
|
PHP Japanese webpage maintainer, Hirokawa-san
|
|||
|
http://www.cityfujisawa.ne.jp/%7Elouis/apps/phpfi/
|
|||
|
PHP-JP ML's Yamamoto-san
|
|||
|
http://sidecar.ics.es.osaka-u.ac.jp/php-jp/
|
|||
|
Previous jp-patch developers
|
|||
|
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
Advantages of using I18N package
|
|||
|
==========================================
|
|||
|
|
|||
|
- allows you to use various character encodings for script files and
|
|||
|
http output
|
|||
|
- distinguish character encoding in POST/GET/COOKIE
|
|||
|
- proper mail output using JIS as body and MIME/Base64/JIS subject
|
|||
|
- if http output's Content-Type is text/html, it will set proper charset
|
|||
|
- stable character encoding conversion
|
|||
|
- multibyte regex
|
|||
|
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
Installation
|
|||
|
==========================================
|
|||
|
|
|||
|
o Summary
|
|||
|
|
|||
|
Add --enable-i18n option when running configure. For your own setup,
|
|||
|
add any other appropriate options as well.
|
|||
|
|
|||
|
Don't forget to copy php3.ini-dist to desired location.
|
|||
|
(ex. /usr/local/lib/php3.ini)
|
|||
|
|
|||
|
If you have already installed PHP3, copy all the entries in php3.ini-dist
|
|||
|
which start with "i18n.xxxx" to php3.ini.
|
|||
|
|
|||
|
|
|||
|
o configure option
|
|||
|
--enable-i18n
|
|||
|
include i18n features
|
|||
|
|
|||
|
--enable-mbregex
|
|||
|
include multibyte regex library
|
|||
|
(without i18n enabled, mbregex functions will not function)
|
|||
|
|
|||
|
|
|||
|
o creating cgi version
|
|||
|
|
|||
|
% tar xvzf php-3.0.18-i18n-ja-2.tar.gz
|
|||
|
% cd php-3.0.18-i18n-ja-2
|
|||
|
% ./configure --enable-i18n --enable-mbregex
|
|||
|
% make
|
|||
|
|
|||
|
|
|||
|
o creating Apache version (regular module)
|
|||
|
|
|||
|
% tar xvzf php-3.0.18-i18n-ja-2.tar.gz
|
|||
|
% tar xvzf apache_1.3.x.tar.gz
|
|||
|
% cd apache_1.3.x
|
|||
|
% ./configure
|
|||
|
% cd ../php-3.0.18-i18n-ja-2
|
|||
|
% ./configure --with-apache=../apache_1.3.x --enable-i18n --enable-mbregex
|
|||
|
% make
|
|||
|
% make install
|
|||
|
% cd ../apache_1.3.x
|
|||
|
% ./configure --activate-module=src/modules/php3/libphp3.a
|
|||
|
% make
|
|||
|
% make install
|
|||
|
|
|||
|
|
|||
|
o creating Apache DSO version
|
|||
|
|
|||
|
create DSO capable Apache first
|
|||
|
% tar xvzf apache_1.3.x.tar.gz
|
|||
|
% cd apache-1.3.x
|
|||
|
% ./configure --enable-shared=max
|
|||
|
% make
|
|||
|
% make install
|
|||
|
|
|||
|
now create php3
|
|||
|
% cd php-3.0.18-i18n-ja-2
|
|||
|
% ./configure --with-apxs=/usr/local/apache/bin/apxs --enable-i18n \
|
|||
|
--enable-mbregex
|
|||
|
% make
|
|||
|
% make install
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
Additional Notes
|
|||
|
==========================================
|
|||
|
|
|||
|
o Multibyte regex library
|
|||
|
|
|||
|
From beta4, we have included the multibyte (mb) regex library which comes with
|
|||
|
Ruby. With this addition, you can now use regex in EUC, SJIS and UTF-8
|
|||
|
encoding. To avoid any conflicts with HSREGEX included with Apache,
|
|||
|
each function name has been changed. Therefore, mb regex functions are
|
|||
|
named differently from the original ereg functions in PHP. The character
|
|||
|
encoding used in mb regex is configured in i18n.internal_encoding.
|
|||
|
|
|||
|
|
|||
|
o Binary Output
|
|||
|
|
|||
|
If http output encoding is set to other than 'pass', conversion of encoding
|
|||
|
from internal encoding to http output is done automatically. Thus,
|
|||
|
if you prefer to spit out anything in raw binary format, your data
|
|||
|
may be corrupted. In such event, set http_output to 'pass'.
|
|||
|
|
|||
|
ex.
|
|||
|
<?
|
|||
|
i18n_http_output("pass");
|
|||
|
...
|
|||
|
echo $the_binary_data_string;
|
|||
|
?>
|
|||
|
|
|||
|
|
|||
|
o Content-Type
|
|||
|
|
|||
|
Depending on the setting of http_output, PHP will output the proper charset.
|
|||
|
ex. Content-Type: text/html; charset="..."
|
|||
|
|
|||
|
Be aware of following:
|
|||
|
|
|||
|
- If you set Content-Type header using header() function, that will
|
|||
|
override the automatic addition of charset.
|
|||
|
- Be cautious when you set i18n_http_output, since if any output is
|
|||
|
made prior to this, proper header may have been sent out to the
|
|||
|
client already.
|
|||
|
|
|||
|
|
|||
|
o In the event of trouble
|
|||
|
|
|||
|
If you find any bugs or trouble, please contact us at the above address.
|
|||
|
It may help us to track the problem if you send us the script as well.
|
|||
|
|
|||
|
If you encounter any memory related error such as segmentation violation,
|
|||
|
add --enable-debug when you run configure. This will give you more
|
|||
|
detail information on where error has occurred. The error is stored
|
|||
|
in the server log or regular http output in CGI mode.
|
|||
|
|
|||
|
|
|||
|
o About Japanese encodings
|
|||
|
|
|||
|
Due to historical reason, there are multiple character encodings used
|
|||
|
for Japanese. The most common encodings are: SJIS, EUC, JIS, and UTF-8.
|
|||
|
Here are (very) brief description of them:
|
|||
|
|
|||
|
EUC
|
|||
|
commonly used in UNIX environment
|
|||
|
8bit-8bit combo
|
|||
|
always >=0x80
|
|||
|
|
|||
|
SJIS
|
|||
|
commonly used in Mac or PCs
|
|||
|
similar to EUC
|
|||
|
mostly 8bit-8bit (some 8bit-7bit)
|
|||
|
mostly >=0x80
|
|||
|
there are some halfwidth (size of ASCII) multibytes
|
|||
|
|
|||
|
JIS
|
|||
|
commonly used in 7bit environment (nntp and smtp)
|
|||
|
starts with escaping char, \033 and a few more characters
|
|||
|
|
|||
|
UTF-8
|
|||
|
16bit+ encoding
|
|||
|
defines many languages existing in this world
|
|||
|
see http://www.unicode.org/ for more detail
|
|||
|
|
|||
|
Because of having all these character encodings, PHP needs to translate
|
|||
|
between these encodings on the fly. Also, the addition of the mb regex
|
|||
|
library allows you to handle mb strings without fear of getting mb char
|
|||
|
chopped in half.
|
|||
|
|
|||
|
Since Japanese is not the only language with multiple encodings, we
|
|||
|
encourage other developers to modify our code to suit your needs. We
|
|||
|
definitely need people to work with Korean, Chinese (both traditional
|
|||
|
and simplified), and Russian. Let us know if you are interested in
|
|||
|
this project!
|
|||
|
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
php3.ini setting
|
|||
|
==========================================
|
|||
|
|
|||
|
The following init options will allow you to change the default settings.
|
|||
|
Define these settings in the global section of php3.ini.
|
|||
|
|
|||
|
All keywords are case-insensitive.
|
|||
|
|
|||
|
o Encoding naming
|
|||
|
|
|||
|
For each encoding, there are three names: standarized, alias, MIME
|
|||
|
|
|||
|
- UTF-8
|
|||
|
standard: UTF-8
|
|||
|
alias: N/A
|
|||
|
mime: UTF-8
|
|||
|
|
|||
|
- ASCII
|
|||
|
standard: ASCII
|
|||
|
alias: N/A
|
|||
|
mime: US-ASCII
|
|||
|
|
|||
|
- Japanese EUC
|
|||
|
standard: EUC-JP
|
|||
|
alias: EUC, EUC_JP, eucJP, x-euc-jp
|
|||
|
mime: EUC-JP
|
|||
|
|
|||
|
- Shift JIS
|
|||
|
standard: SJIS
|
|||
|
alias: x-sjis, MS_Kanji
|
|||
|
mime: Shift_JIS
|
|||
|
|
|||
|
- JIS
|
|||
|
standard: JIS
|
|||
|
alias: N/A
|
|||
|
mime: ISO-2022-JP
|
|||
|
|
|||
|
- Quoted-Printable
|
|||
|
standard: Quoted-Printable
|
|||
|
alias: qprint
|
|||
|
mime: N/A
|
|||
|
|
|||
|
- BASE64
|
|||
|
standard: BASE64
|
|||
|
alias: N/A
|
|||
|
mime: N/A
|
|||
|
|
|||
|
- no conversion
|
|||
|
standard: pass
|
|||
|
alias: none
|
|||
|
mime: N/A
|
|||
|
|
|||
|
- auto encoding detection
|
|||
|
standard: auto
|
|||
|
alias: unknown
|
|||
|
mime: N/A
|
|||
|
|
|||
|
* N/A - Not Applicapable
|
|||
|
|
|||
|
o i18n.http_output - default http output encoding
|
|||
|
|
|||
|
i18n.http_output = EUC-JP|SJIS|JIS|UTF-8|pass
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
pass: no conversion
|
|||
|
|
|||
|
The default is pass (internal encoding is used)
|
|||
|
It can be re-configured on the fly using i18n_http_output().
|
|||
|
|
|||
|
|
|||
|
o i18n.internal_encoding - internal encoding
|
|||
|
|
|||
|
i18n.internal_encoding = EUC-JP|SJIS|UTF-8
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
The default is EUC-JP.
|
|||
|
|
|||
|
PHP parser is designed based on using ISO-8859-1. For other
|
|||
|
encodings, following conditions have to be satisfied in order
|
|||
|
to use them:
|
|||
|
- per byte encoding
|
|||
|
- single byte charactor in range of 00h-7fh which is compatible
|
|||
|
with ASCII
|
|||
|
- multibyte without 00h-7fh
|
|||
|
In case of Japanese, EUC-JP and UTF-8 are the only encoding that
|
|||
|
meets this criteria.
|
|||
|
|
|||
|
If i18n.internal_encoding and i18n.http_output differs, conversion
|
|||
|
takes place at the time of output. If you convert any data within
|
|||
|
PHP scripts to URL encoding, BASE64 or Quoted-Printable, encoding
|
|||
|
stays as defined in i18n.internal_encoding. Thus, if you would
|
|||
|
prefer to encode in compliance with i18n.http_output, you need
|
|||
|
to manually convert encoding.
|
|||
|
|
|||
|
ex. $str = urlencode( i18n_convert($str, i18n_http_output()) );
|
|||
|
|
|||
|
Encoding such as ISO-2022-** and HZ encoding which uses escape
|
|||
|
sequences can not be used as internal encoding. If used, they
|
|||
|
result in following errors:
|
|||
|
- parser pukes funky error
|
|||
|
- magic_quotes_*** breaks encoding (SJIS may have similar problem)
|
|||
|
- string manipulation and regex will malfunction
|
|||
|
|
|||
|
|
|||
|
o i18n.script_encoding - script encoding
|
|||
|
|
|||
|
i18n.script_encoding = auto|EUC-JP|SJIS|JIS|UTF-8
|
|||
|
auto: automatic
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
The default is auto.
|
|||
|
The script's encoding is converted to i18n.internal_encoding before
|
|||
|
entering the script parser.
|
|||
|
|
|||
|
Be aware that auto detection may fail under some conditions.
|
|||
|
For best auto detection, add multibyte charactor at begining of
|
|||
|
script.
|
|||
|
|
|||
|
|
|||
|
o i18n.http_input - handling of http input (GET/POST/COOKIE)
|
|||
|
|
|||
|
i18n.http_input = pass|auto
|
|||
|
auto: auto conversion
|
|||
|
pass: no conversion
|
|||
|
|
|||
|
The default is auto.
|
|||
|
If set to pass, no conversion will take place.
|
|||
|
If set to auto, it will automatically detect the encoding. If
|
|||
|
detection is successful, it will convert to the proper internal
|
|||
|
encoding. If not, it will assume the input as defined in
|
|||
|
i18n.http_input_default.
|
|||
|
|
|||
|
o i18n.http_input_default - default http input encoding
|
|||
|
|
|||
|
i18n.http_input_default = pass|EUC-JP|SJIS|JIS|UTF-8
|
|||
|
pass: no conversion
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
The default is pass.
|
|||
|
This option is only effective as long as i18n.http_input is set to
|
|||
|
auto. If the auto detection fails, this encoding is used as an
|
|||
|
assumption to convert the http input to the internal encoding.
|
|||
|
If set to pass, no conversion will take place.
|
|||
|
|
|||
|
o sample settings
|
|||
|
|
|||
|
1) For most flexibility, we recommend using following example.
|
|||
|
i18n.http_output = SJIS
|
|||
|
i18n.internal_encoding = EUC-JP
|
|||
|
i18n.script_encoding = auto
|
|||
|
i18n.http_input = auto
|
|||
|
i18n.http_input_default = SJIS
|
|||
|
|
|||
|
2) To avoid unexpected encoding problems, try these:
|
|||
|
|
|||
|
i18n.http_output = pass
|
|||
|
i18n.internal_encoding = EUC-JP
|
|||
|
i18n.script_encoding = pass
|
|||
|
i18n.http_input = pass
|
|||
|
i18n.http_input_default = pass
|
|||
|
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
PHP functions
|
|||
|
==========================================
|
|||
|
|
|||
|
The following describes the additional PHP functions.
|
|||
|
|
|||
|
All keywords are case-insensitive.
|
|||
|
|
|||
|
o i18n_http_output(encoding)
|
|||
|
o encoding = i18n_http_output()
|
|||
|
|
|||
|
This will set the http output encoding. Any output following this
|
|||
|
function will be controlled by this function. If no argument is given,
|
|||
|
the current http output encode setting is returned.
|
|||
|
|
|||
|
encodings
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
pass: no conversion
|
|||
|
|
|||
|
NONE is not allowed
|
|||
|
|
|||
|
|
|||
|
o encoding = i18n_internal_encoding()
|
|||
|
|
|||
|
Returns the current internal encoding as a string.
|
|||
|
|
|||
|
internal encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
|
|||
|
o encoding = i18n_http_input()
|
|||
|
|
|||
|
Returns http input encoding.
|
|||
|
|
|||
|
encodings
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
pass: no conversion (only if i18n.http_input is set to pass)
|
|||
|
|
|||
|
|
|||
|
o string = i18n_convert(string, encoding)
|
|||
|
string = i18n_convert(string, encoding, pre-conversion-encoding)
|
|||
|
|
|||
|
Returns converted string in desired encoding. If
|
|||
|
pre-conversion-encoding is not defined, the given
|
|||
|
string is assumed to be in internal encoding.
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
pass: no conversion
|
|||
|
|
|||
|
pre-conversion-encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
pass: no conversion
|
|||
|
auto: auto detection
|
|||
|
|
|||
|
|
|||
|
o encoding = i18n_discover_encoding(string)
|
|||
|
|
|||
|
Encoding of the given string is returned (as a string).
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
ASCII: ASCII (only 09h, 0Ah, 0Dh, 20h-7Eh)
|
|||
|
pass: unable to determine (text is too short to determine)
|
|||
|
unknown: unknown or possible error
|
|||
|
|
|||
|
|
|||
|
o int = mbstrlen(string)
|
|||
|
o int = mbstrlen(string, encoding)
|
|||
|
|
|||
|
Returns character length of a given string. If no encoding is defined,
|
|||
|
the encoding of string is assumed to be the internal encoding.
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
auto: automatic
|
|||
|
|
|||
|
|
|||
|
o int = mbstrpos(string1, string2)
|
|||
|
o int = mbstrpos(string1, string2, start)
|
|||
|
o int = mbstrpos(string1, string2, start, encoding)
|
|||
|
|
|||
|
Same as strpos. If no encoding is defined, the encoding of string
|
|||
|
is assumed to be the internal encoding.
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
|
|||
|
o int = mbstrrpos(string1, string2)
|
|||
|
o int = mbstrrpos(string1, string2, encoding)
|
|||
|
|
|||
|
Same as strrpos. If no encoding is defined, the encoding of string
|
|||
|
is assumed to be the internal encoding.
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
|
|||
|
o string = mbsubstr(string, position)
|
|||
|
o string = mbsubstr(string, position, length)
|
|||
|
o string = mbsubstr(string, position, length, encoding)
|
|||
|
|
|||
|
Same as substr. If no encoding is defined, the encoding of string
|
|||
|
is assumed to be the internal encoding.
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
|
|||
|
o string = mbstrcut(string, position)
|
|||
|
o string = mbstrcut(string, position, length)
|
|||
|
o string = mbstrcut(string, position, length, encoding)
|
|||
|
|
|||
|
Same as subcut. If position is the 2nd byte of a mb character, it will cut
|
|||
|
from the first byte of that character. It will cut the string without
|
|||
|
chopping a single byte from a mb character. In another words, if you
|
|||
|
set length to 5, you will only get two mb characters. If no encoding
|
|||
|
is defined, the encoding of string is assumed to be the internal encoding.
|
|||
|
|
|||
|
encoding
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
|
|||
|
o string = i18n_mime_header_encode(string)
|
|||
|
MIME encode the string in the format of =?ISO-2022-JP?B?[string]?=.
|
|||
|
|
|||
|
|
|||
|
o string = i18n_mime_header_decode(string)
|
|||
|
MIME decodes the string.
|
|||
|
|
|||
|
|
|||
|
o string = i18n_ja_jp_hantozen(string)
|
|||
|
o string = i18n_ja_jp_hantozen(string, option)
|
|||
|
o string = i18n_ja_jp_hantozen(string, option, encoding)
|
|||
|
|
|||
|
Conversion between full width character and halfwidth character.
|
|||
|
|
|||
|
option
|
|||
|
The following options are allowed. The default is "KV".
|
|||
|
Acronym: FW = fullwidth, HW = halfwidth
|
|||
|
|
|||
|
"r" : FW alphabet -> HW alphabet
|
|||
|
|
|||
|
"R" : HW alphabet -> FW alphabet
|
|||
|
|
|||
|
"n" : FW number -> HW number
|
|||
|
|
|||
|
"N" : HW number -> FW number
|
|||
|
|
|||
|
"a" : FW alpha numeric (21h-7Eh) -> HW alpha numeric
|
|||
|
|
|||
|
"A" : HW alpha numeric (21h-7Eh) -> FW alpha numeric
|
|||
|
|
|||
|
"k" : FW katakana -> HW katakana
|
|||
|
|
|||
|
"K" : HW katakana -> FW katakana
|
|||
|
|
|||
|
"h" : FW hiragana -> HW hiragana
|
|||
|
|
|||
|
"H" : HW hiragana -> FW katakana
|
|||
|
|
|||
|
"c" : FW katakana -> FW hiragana
|
|||
|
|
|||
|
"C" : FW hiragana -> FW katakana
|
|||
|
|
|||
|
"V" : merge dakuon character. only works with "K" and "H" option
|
|||
|
|
|||
|
encoding
|
|||
|
If no encoding is defined, the encoding of string is assumed to be
|
|||
|
the internal encoding.
|
|||
|
EUC-JP : EUC
|
|||
|
SJIS: SJIS
|
|||
|
JIS : JIS
|
|||
|
UTF-8: UTF-8
|
|||
|
|
|||
|
|
|||
|
int = mbereg(regex_pattern, string, string)
|
|||
|
int = mberegi(regex_pattern, string, string)
|
|||
|
mb version of ereg() and eregi()
|
|||
|
|
|||
|
|
|||
|
string = mbereg_replace(regex_pattern, string, string)
|
|||
|
string = mberegi_replace(regex_pattern, string, string)
|
|||
|
mb version of ereg_replace() and eregi_replace()
|
|||
|
|
|||
|
|
|||
|
string_array = mbsplit(regex, string, limit)
|
|||
|
mb version of split()
|
|||
|
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
FAQ
|
|||
|
==========================================
|
|||
|
|
|||
|
Here, we have gathered some commonly asked questions on PHP-jp mailing
|
|||
|
list.
|
|||
|
|
|||
|
o To use Japanese in GET method
|
|||
|
|
|||
|
If you need to assign Japanese text in GET method with argument, such as;
|
|||
|
xxxx.php?data=<Japanese text>, use urlencode function in PHP. If not,
|
|||
|
text may not be passed onto action php properly.
|
|||
|
|
|||
|
ex: <a href="hoge.php?data=<? echo urlencode($data) ?>">Link</a>
|
|||
|
|
|||
|
|
|||
|
o When passing data via GET/POST/COOKIE, \ character sneaks in
|
|||
|
|
|||
|
When using SJIS as internal encoding, or passed-on data includes '"\,
|
|||
|
PHP automatically inserts escaping character, \. Set magic_quotes_gpc
|
|||
|
in php3.ini from On to Off. An alternative work around to this problem
|
|||
|
is to use StripSlashes().
|
|||
|
|
|||
|
If $quote_str is in SJIS and you would like to extract Japanese text,
|
|||
|
use ereg_replace as follows:
|
|||
|
|
|||
|
ereg_replace(sprintf("([%c-%c%c-%c]\\\\)\\\\",0x81,0x9f,0xe0,0xfc),
|
|||
|
"\\1",$quote_str);
|
|||
|
|
|||
|
This will effectively extract Japanese text out of $quote_str.
|
|||
|
|
|||
|
|
|||
|
o Sometimes, encoding detection fails
|
|||
|
|
|||
|
If i18n_http_input() returns 'pass', it's likely that PHP failed to
|
|||
|
detect whether it's SJIS or EUC. In such case, use <input type=hidden
|
|||
|
value="some Japanese text"> to properly detect the incoming text's
|
|||
|
encoding.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
Japanese Manual
|
|||
|
==========================================
|
|||
|
Translated manual done by "PHP Japanese Manual Project" :
|
|||
|
|
|||
|
http://www.php.net/manual/ja/manual.php
|
|||
|
|
|||
|
Starting 3.0.18-i18n-ja, we have removed doc-jp from tarball package.
|
|||
|
|
|||
|
|
|||
|
==========================================
|
|||
|
Change Logs
|
|||
|
==========================================
|
|||
|
|
|||
|
o 2000-10-28, Rui Hirokawa <hirokawa@php.net>
|
|||
|
|
|||
|
This patch is derived from php-3.0.15-i18n-ja as well as php-3.0.16 by
|
|||
|
Kuwamura applied to original php-3.0.18. It also includes following fixes:
|
|||
|
|
|||
|
1) allows you to set charset in mail().
|
|||
|
2) fixed mbregex definitions to avoid conflicts with system regex
|
|||
|
3) php3.ini-dist now uses PASS for http_output instead of SJIS
|
|||
|
|
|||
|
o 2000-11-24, Hironori Sato <satoh@yyplanet.com>
|
|||
|
|
|||
|
Applied above patched and added detection for gdImageStringTTF in configure.
|
|||
|
Following setups are known to work:
|
|||
|
|
|||
|
gd-1.3-6, gd-devel-1.3-6, freetype-1.3.1-5, freetype-devel-1.3.1-5
|
|||
|
ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf",
|
|||
|
i18n_convert("<22><><EFBFBD>ܸ<EFBFBD>", "UTF-8"));
|
|||
|
ImageGif($im);
|
|||
|
|
|||
|
gd-1.7.3-1k1, gd-devel-1.7.3-1k1, freetype-1.3.1-5, freetype-devel-1.3.1-5
|
|||
|
ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf","<22><><EFBFBD>ܸ<EFBFBD>");
|
|||
|
ImagePng($im);
|
|||
|
* i18n_internal_encoding = EUC <20><><EFBFBD><EFBFBD> SJIS
|
|||
|
|
|||
|
For any gd libraries before 1.6.2, you need to use i18n_convert. For
|
|||
|
gd-1.5.2/3, upgrade to anything above 1.7 to use ImageTTFText without
|
|||
|
using i18n_convert. As long as you have internal_encoding set to EUC or
|
|||
|
SJIS, ImageTTFText should work without mojibake. Again, make sure you
|
|||
|
have i18n_http_output("pass") before calling ImageGif, ImagePng, ImageJpeg!
|
|||
|
|
|||
|
o 2000-12-09, Rui Hirokawa <hirokawa@php.net>
|
|||
|
|
|||
|
Fixed mail() which was causing segmentation fault when header was null.
|
|||
|
|