mirror of
https://github.com/php/php-src.git
synced 2024-10-22 16:57:05 +00:00
11b0f469f0
. Supports various encodings such as BIG5, GB2312 and ISO-8859-* . Fixes bug #26677 (mbstring compile errors with IRIX) . Many thanks to K.Kosako. - Remove redundant files that are not relevant to the build.
166 lines
5.0 KiB
Plaintext
166 lines
5.0 KiB
Plaintext
README 2004/02/25
|
|
|
|
Oniguruma ---- (C) K.Kosako <kosako@sofnec.co.jp>
|
|
|
|
http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/oniguruma/
|
|
http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/oniguruma/
|
|
|
|
Oniguruma is a regular expressions library.
|
|
The characteristics of this library is that different character encoding
|
|
for every regular expression object can be specified.
|
|
|
|
Supported character encodings:
|
|
|
|
ASCII, UTF-8,
|
|
EUC-JP, EUC-TW, EUC-KR, EUC-CN,
|
|
Shift_JIS, Big5, KOI8, KOI8-R,
|
|
ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
|
|
ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
|
|
ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16
|
|
|
|
|
|
There are two ways of using of it in this program.
|
|
|
|
* Built-in regular expression engine of Ruby
|
|
* C library (supported APIs: GNU regex, POSIX, Oniguruma native)
|
|
|
|
------------------------------------------------------------
|
|
|
|
Install
|
|
|
|
(A) Install into Ruby
|
|
|
|
See INSTALL-RUBY.
|
|
|
|
(character encodings: ASCII, UTF-8, EUC-JP, Shift_JIS)
|
|
|
|
|
|
(B) Install C library
|
|
|
|
(B-1) Unix and Cygwin platform
|
|
|
|
1. ./configure
|
|
2. make
|
|
3. make install
|
|
|
|
(* uninstall: make uninstall)
|
|
|
|
* test (ASCII/EUC-JP)
|
|
4. make ctest
|
|
|
|
|
|
(B-2) Win32 platform (VC++)
|
|
|
|
1. copy win32\Makefile Makefile
|
|
2. copy win32\config.h config.h
|
|
3. nmake
|
|
|
|
onig_s.lib: static link library
|
|
onig.dll: dynamic link library
|
|
|
|
* test (ASCII/Shift_JIS)
|
|
4. copy win32\testc.c testc.c
|
|
5. nmake ctest
|
|
|
|
|
|
|
|
License
|
|
|
|
When this software is partly used or it is distributed with Ruby,
|
|
this of Ruby follows the license of Ruby.
|
|
It follows the BSD license in the case of the one except for it.
|
|
|
|
|
|
|
|
Regular Expressions
|
|
|
|
See doc/RE (or doc/RE.ja for Japanese).
|
|
|
|
|
|
Sample Programs
|
|
|
|
sample/simple.c example of the minimum (native API)
|
|
sample/names.c example of the named group callback.
|
|
sample/encode.c example of some encodings.
|
|
sample/listcap.c example of the capture history.
|
|
sample/posix.c POSIX API sample.
|
|
sample/sql.c example of the variable meta characters.
|
|
(SQL-like pattern matching)
|
|
|
|
|
|
Source Files
|
|
|
|
oniguruma.h Oniguruma API header file. (public)
|
|
oniggnu.h GNU regex API header file. (public)
|
|
onigcmpt200.h Oniguruma API backward compatibility header file. (public)
|
|
(for 2.0.0 or more older version)
|
|
|
|
regenc.h character encodings framework header file.
|
|
regint.h internal definitions
|
|
regparse.h internal definitions for regparse.c and regcomp.c
|
|
regcomp.c compiling and optimization functions
|
|
regenc.c character encodings framework.
|
|
regerror.c error message function
|
|
regex.c source files wrapper for Ruby
|
|
regexec.c search and match functions
|
|
regparse.c parsing functions.
|
|
reggnu.c GNU regex API functions
|
|
|
|
onigposix.h POSIX API header file. (public)
|
|
regposerr.c POSIX error message function.
|
|
regposix.c POSIX functions.
|
|
|
|
enc/mktable.c character type table generator.
|
|
enc/ascii.c ASCII encoding.
|
|
enc/iso8859_1.c ISO-8859-1 encoding. (Latin-1)
|
|
enc/iso8859_2.c ISO-8859-2 encoding. (Latin-2)
|
|
enc/iso8859_3.c ISO-8859-3 encoding. (Latin-3)
|
|
enc/iso8859_4.c ISO-8859-4 encoding. (Latin-4)
|
|
enc/iso8859_5.c ISO-8859-5 encoding. (Cyrillic)
|
|
enc/iso8859_6.c ISO-8859-6 encoding. (Arabic)
|
|
enc/iso8859_7.c ISO-8859-7 encoding. (Greek)
|
|
enc/iso8859_8.c ISO-8859-8 encoding. (Hebrew)
|
|
enc/iso8859_9.c ISO-8859-9 encoding. (Latin-5 or Turkish)
|
|
enc/iso8859_10.c ISO-8859-10 encoding. (Latin-6 or Nordic)
|
|
enc/iso8859_11.c ISO-8859-11 encoding. (Thai)
|
|
enc/iso8859_13.c ISO-8859-13 encoding. (Latin-7 or Baltic Rim)
|
|
enc/iso8859_14.c ISO-8859-14 encoding. (Latin-8 or Celtic)
|
|
enc/iso8859_15.c ISO-8859-15 encoding. (Latin-9 or West European with Euro)
|
|
enc/iso8859_16.c ISO-8859-16 encoding.
|
|
(Latin-10 or South-Eastern European with Euro)
|
|
enc/utf8.c UTF-8 encoding.
|
|
enc/euc_jp.c EUC-JP encoding.
|
|
enc/euc_tw.c EUC-TW encoding.
|
|
enc/euc_kr.c EUC-KR, EUC-CN encoding.
|
|
enc/sjis.c Shift_JIS encoding.
|
|
enc/koi8.c KOI8 encoding.
|
|
enc/koi8_r.c KOI8-R encoding.
|
|
enc/big5.c Big5 encoding.
|
|
|
|
|
|
|
|
API differences with Japanized GNU regex(version 0.12) of Ruby
|
|
|
|
+ re_compile_fastmap() is removed.
|
|
+ re_recompile_pattern() is added.
|
|
+ re_alloc_pattern() is added.
|
|
|
|
|
|
ToDo
|
|
|
|
1 support 16-bit encodings. (UTF-16)
|
|
2 different encoding pattern with target.
|
|
(ex. ASCII/UTF-16, UTF-16 BE and UTF-16 LE)
|
|
3 add enc/name.c (onigenc_get_enc_by_name(name))
|
|
|
|
? transmission stopper. (return ONIG_STOP from match_at())
|
|
? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS.
|
|
? better acess to hash table (st.c).
|
|
non null-terminated key version st_lookup().
|
|
? grep-like tool 'onigrep'.
|
|
? return parse tree of regexp pattern to application.
|
|
?? /a{n}?/ should be interpreted as /(?:a{n})?/.
|
|
?? \h hexadecimal digit char ([0-9a-fA-F]), \H not \h.
|
|
|
|
and I'm thankful to Akinori MUSHA.
|