Oniguruma Regular Expressions   2003/07/04

syntax: REG_SYNTAX_RUBY (default)


1. Syntax elements

  \       escape
  |       alternation
  (...)   group
  [...]   character class


2. Characters

  \t            horizontal tab (0x09)
  \v            vertical tab   (0x0B)
  \n            newline        (0x0A)
  \r            return         (0x0D)
  \b            backspace      (0x08) (* in character class only)
  \f            form feed      (0x0C)
  \a            bell           (0x07)
  \e            escape         (0x1B)
  \nnn          octal char
  \xHH          hexadecimal char
  \x{7HHHHHHH}  wide hexadecimal char
  \cx           control char
  \C-x          control char
  \M-x          meta  (x|0x80)
  \M-\C-x       meta control char


3. Character types

  .        any character (except newline)
  \w       word character (alphanumeric, "_" and multibyte char)
  \W       non-word char
  \s       whitespace char (\t, \n, \v, \f, \r, \x20)
  \S       non-whitespace char
  \d       digit char
  \D       non-digit char


4. Quantifier

  greedy

    ?       1 or 0 times
    *       0 or more times
    +       1 or more times
    {n,m}   at least n but not more than m times
    {n,}    at least n times
    {n}     n times

  reluctant

    ??      1 or 0 times
    *?      0 or more times
    +?      1 or more times
    {n,m}?  at least n but not more than m times
    {n,}?   at least n times

  possessive (greedy, and does not backtrack once the repeat has matched)

    ?+      1 or 0 times
    *+      0 or more times
    ++      1 or more times


5. Anchors

  ^       beginning of the line
  $       end of the line
  \b      word boundary
  \B      not word boundary
  \A      beginning of string
  \Z      end of string, or before newline at the end
  \z      end of string
  \G      previous end-of-match position


6. POSIX character class ([:xxxxx:], negate [:^xxxxx:])

  alnum    alphabet or digit char
  alpha    alphabet
  ascii    code value: [0 - 127]
  blank    \t, \x20
  cntrl
  digit    0-9
  graph
  lower
  print
  punct
  space    \t, \n, \v, \f, \r, \x20
  upper
  xdigit   0-9, a-f, A-F


7. Operators in character class

  [...]    group (character class in character class)
  &&       intersection
           (lowest precedence operator in character class)

  ex. [a-w&&[^c-g]z] ==> ([a-w] and ([^c-g] or z)) ==> [abh-w]


8. Extended expressions

  (?#...)            comment
  (?imx-imx)         option on/off
                         i: ignore case
                         m: multi-line (dot(.) matches newline)
                         x: extended form
  (?imx-imx:subexp)  option on/off for subexp
  (?:subexp)         not captured
  (?=subexp)         look-ahead
  (?!subexp)         negative look-ahead
  (?<=subexp)        look-behind
  (?<!subexp)        negative look-behind
  (?>subexp)         don't backtrack
  (?<name>subexp)    define named group
                     (name cannot include '>', ')', '\' or the NUL character)


9. Back reference

  \n          back reference by group number (n >= 1)
  \k<name>    back reference by group name


10. Subexp call ("Tanaka Akira special")

  \g<name>    call by group name
  \g<n>       call by group number (only if 'n' is not defined as a name)


-----------------------------
11. Original extensions

   + named group     (?<name>...)
   + named backref   \k<name>
   + subexp call     \g<name>, \g<n>


12. Missing features compared with perl 5.8.0

   + [:word:]
   + \N{name}
   + \l,\u,\L,\U, \P, \X, \C
   + (?{code})
   + (??{code})
   + (?(condition)yes-pat|no-pat)

   + \Q...\E   (* This is effective on REG_SYNTAX_PERL and REG_SYNTAX_JAVA)


13. Syntax-dependent options

   + REG_SYNTAX_RUBY (default)
     (?m): dot(.) matches newline

   + REG_SYNTAX_PERL, REG_SYNTAX_JAVA
     (?s): dot(.) matches newline
     (?m): ^ matches after newline, $ matches before newline


14. Differences from Japanized GNU regex (version 0.12) of Ruby

   + add look-behind
     (?<=fixed-char-length-pattern), (?<!fixed-char-length-pattern)

     [...] match
     /(?:()|())*\1\2/ =~ ""   #=> fail

     /(?:\1a|())*/ =~ "a"   #=> match with ""

   + The ignore-case option has no effect on a char written as an octal
     or hexadecimal number, but it does take effect if that char appears
     in a char class. This is inconsistent, but it is the same behavior
     as GNU regex of Ruby.

     /\x61/i.match("A")   # => nil
     /[\x61]/i.match("A") # => match

// END
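
A few of the constructs listed above can be tried directly from Ruby. The snippet below is only an illustrative sketch, not part of the original reference: it assumes a Ruby interpreter whose Regexp engine follows the REG_SYNTAX_RUBY rules described here, and every variable name in it is made up for the example. It covers character class intersection (section 7), possessive quantifiers (section 4), named groups with \k and \g (sections 8 to 10), \Z versus \z (section 5), and the (?m) option (section 13).

```ruby
# Illustrative checks only; all variable names are hypothetical.
# Assumption: the Ruby in use implements the syntax described above.

# Section 7: && intersects character classes, so
# [a-w&&[^c-g]z] accepts the same letters as [abh-w].
intersect  = /\A[a-w&&[^c-g]z]\z/
class_hits = ("a".."z").select { |ch| ch =~ intersect }.join

# Section 4: a greedy repeat backtracks, a possessive one does not.
greedy_ok     = !!("aaa" =~ /\Aa*a\z/)   # a* gives back one "a"
possessive_ok = !!("aaa" =~ /\Aa*+a\z/)  # a*+ keeps all three, final "a" fails

# Sections 8 and 9: named group plus back reference by name.
doubled = "hello hello world"[/(?<word>\w+) \k<word>/]

# Section 10: \g<name> re-runs the group's pattern,
# while \k<name> requires the same captured text again.
call_ok    = !!("2003-07" =~ /\A(?<num>\d+)-\g<num>\z/)
backref_ok = !!("2003-07" =~ /\A(?<num>\d+)-\k<num>\z/)

# Section 5: \Z tolerates a newline at the very end, \z does not.
big_z   = !!("abc\n" =~ /abc\Z/)
small_z = !!("abc\n" =~ /abc\z/)

# Section 13: with the Ruby syntax, (?m) lets dot match newline.
dot_plain = !!("a\nb" =~ /a.b/)
dot_multi = !!("a\nb" =~ /(?m)a.b/)
```

Each result is a plain boolean or string, so the values can be printed with p or compared in a test. The last two examples in section 14 are deliberately not exercised here, since they describe engine-specific behavior that a later engine may not share.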