README  2003/07/04

Oniguruma  ----   (C) K.Kosako

http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/oniguruma/

Oniguruma is a regular expression library.
Its distinguishing feature is that a different character encoding
can be specified for each regular expression object.
(Supported character encodings: ASCII, UTF-8, EUC-JP, Shift_JIS)

There are two ways to use it:

  * Built-in regular expression engine of Ruby
  * C library (supported APIs: GNU regex, POSIX, Oniguruma native)


Install

 A. Install into Ruby

   See INSTALL-RUBY.

 B. C library

  B1. Unix, Cygwin

   1. ./configure
   2. make
   3. make install

   (* uninstall: make uninstall)

   * test (EUC-JP)
   4. make ctest

  B2. Win32 platform (VC++)

   1. copy win32\config.h config.h
   2. copy win32\Makefile Makefile
   3. nmake

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (Shift_JIS)
   4. copy win32\testc.c testc.c
   5. nmake ctest


License

   When this software is used as part of Ruby or distributed with Ruby,
   it follows the license of Ruby.
   In all other cases, it follows the BSD license.


Source Files

  oniguruma.h        Oniguruma and GNU regex API header file

  regint.h           internal definitions
  regparse.h         internal definitions for regparse.c and regcomp.c
  regparse.c         parsing functions
  regcomp.c          compiling and optimization functions
  regerror.c         error message function
  regex.c            source file wrapper for Ruby
  regexec.c          search and match functions
  reggnu.c           GNU regex API functions

  onigposix.h        POSIX API header file
  regposerr.c        POSIX API error message function (regerror)
  regposix.c         POSIX API functions

  sample/simple.c    minimal example (native API)
  sample/posix.c     POSIX API example
  sample/names.c     example of the named group callback


Regular expression

  See doc/RE.


API differences from the Japanized GNU regex (version 0.12) of Ruby

   + re_compile_fastmap() is removed.
   + re_recompile_pattern() is added.
   + re_alloc_pattern() is added.


ToDo

  1 support 16-bit and 32-bit encodings. (UCS-2, UCS-4, UTF-16)
    (does each encoding have its own meta-character code table?)
  2 if-then-else. (?(condition)then), (?(condition)then|else)

  ? variable meta characters.
  ? implement syntax behavior REG_SYN_CONTEXT_INDEP_ANCHORS.
  ? pattern encoding different from the target encoding.
    (ex. UCS-2 Big Endian and UCS-2 Little Endian)
  ? better access to the hash table.
    a version of st_lookup() that takes a non-null-terminated key.
    (but this requires modifying st.[ch])
  ? character-set-specific POSIX bracket extensions. ([:hiragana:])
  ? grep-like tool 'onigrep'. (variable syntax option etc.)
  ? check for invalid wide char values in WC2MB, WC2MB_FIRST on Ruby M17N.
  ? define THREAD_PASS in regint.h as rb_thread_pass().

I'm also thankful to Akinori MUSHA.
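

Example: POSIX API (sketch)

  The POSIX API listed among the supported C library APIs follows the
  usual regcomp / regexec / regerror / regfree pattern. The sketch below
  is only an illustration and is not taken from sample/posix.c. It
  assumes the system <regex.h> so that it builds without this library;
  with Oniguruma one would presumably include onigposix.h and link
  against the library instead. find_match() is a hypothetical helper
  name introduced here for illustration.

```c
/* Sketch of the POSIX regex call sequence.
 * Assumption: the system <regex.h> is used; with Oniguruma,
 * onigposix.h would be included in its place. */
#include <regex.h>
#include <stdio.h>

/* find_match: hypothetical helper, not part of any library.
 * Compiles `pattern`, searches `text`, and stores the byte offsets
 * of the whole match in *start and *end.
 * Returns 0 on match, 1 on no match, -1 if the pattern is invalid. */
int find_match(const char *pattern, const char *text, int *start, int *end)
{
    regex_t reg;
    regmatch_t pm[1];   /* pm[0] receives the whole-match span */
    char msg[128];
    int r;

    r = regcomp(&reg, pattern, REG_EXTENDED);
    if (r != 0) {
        /* Translate the error code into a readable message. */
        regerror(r, &reg, msg, sizeof(msg));
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }

    r = regexec(&reg, text, 1, pm, 0);
    if (r == 0) {
        *start = (int)pm[0].rm_so;
        *end   = (int)pm[0].rm_eo;
    }

    regfree(&reg);      /* release the compiled pattern */
    return r == 0 ? 0 : 1;
}
```

  A caller would pass a pattern and a subject string, then read the
  match offsets on a zero return. The offsets are byte positions, which
  matters for multibyte encodings such as EUC-JP and Shift_JIS.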