Mirror of https://github.com/php/php-src.git (synced 2024-09-24 19:37:26 +00:00)
upgrade to pcre 7.7
parent 453e502236
commit 8a5db93312
@ -1,6 +1,114 @@
 ChangeLog for PCRE
 ------------------

+Version 7.7 07-May-08
+---------------------
+
+1.  Applied Craig's patch to sort out a long long problem: "If we can't convert
+    a string to a long long, pretend we don't even have a long long." This is
+    done by checking for the strtoq, strtoll, and _strtoi64 functions.
+
+2.  Applied Craig's patch to pcrecpp.cc to restore ABI compatibility with
+    pre-7.6 versions, which defined a global no_arg variable instead of putting
+    it in the RE class. (See also #8 below.)
+
+3.  Remove a line of dead code, identified by coverity and reported by Nuno
+    Lopes.
+
+4.  Fixed two related pcregrep bugs involving -r with --include or --exclude:
+
+    (1) The include/exclude patterns were being applied to the whole pathnames
+        of files, instead of just to the final components.
+
+    (2) If there was more than one level of directory, the subdirectories were
+        skipped unless they satisfied the include/exclude conditions. This is
+        inconsistent with GNU grep (and could even be seen as contrary to the
+        pcregrep specification - which I improved to make it absolutely clear).
+        The action now is always to scan all levels of directory, and just
+        apply the include/exclude patterns to regular files.
+
+5.  Added the --include_dir and --exclude_dir patterns to pcregrep, and used
+    --exclude_dir in the tests to avoid scanning .svn directories.
+
+6.  Applied Craig's patch to the QuoteMeta function so that it escapes the
+    NUL character as backslash + 0 rather than backslash + NUL, because PCRE
+    doesn't support NULs in patterns.
+
+7.  Added some missing "const"s to declarations of static tables in
+    pcre_compile.c and pcre_dfa_exec.c.
+
+8.  Applied Craig's patch to pcrecpp.cc to fix a problem in OS X that was
+    caused by fix #2 above. (Subsequently also a second patch to fix the
+    first patch. And a third patch - this was a messy problem.)
+
+9.  Applied Craig's patch to remove the use of push_back().
+
+10. Applied Alan Lehotsky's patch to add REG_STARTEND support to the POSIX
+    matching function regexec().
+
+11. Added support for the Oniguruma syntax \g<name>, \g<n>, \g'name', \g'n',
+    which, however, unlike Perl's \g{...}, are subroutine calls, not back
+    references. PCRE supports relative numbers with this syntax (I don't think
+    Oniguruma does).
+
+12. Previously, a group with a zero repeat such as (...){0} was completely
+    omitted from the compiled regex. However, this means that if the group
+    was called as a subroutine from elsewhere in the pattern, things went wrong
+    (an internal error was given). Such groups are now left in the compiled
+    pattern, with a new opcode that causes them to be skipped at execution
+    time.
+
+13. Added the PCRE_JAVASCRIPT_COMPAT option. This makes the following changes
+    to the way PCRE behaves:
+
+    (a) A lone ] character is dis-allowed (Perl treats it as data).
+
+    (b) A back reference to an unmatched subpattern matches an empty string
+        (Perl fails the current match path).
+
+    (c) A data ] in a character class must be notated as \] because if the
+        first data character in a class is ], it defines an empty class. (In
+        Perl it is not possible to have an empty class.) The empty class []
+        never matches; it forces failure and is equivalent to (*FAIL) or (?!).
+        The negative empty class [^] matches any one character, independently
+        of the DOTALL setting.
+
+14. A pattern such as /(?2)[]a()b](abc)/ which had a forward reference to a
+    non-existent subpattern following a character class starting with ']' and
+    containing () gave an internal compiling error instead of "reference to
+    non-existent subpattern". Fortunately, when the pattern did exist, the
+    compiled code was correct. (When scanning forwards to check for the
+    existence of the subpattern, it was treating the data ']' as terminating
+    the class, so got the count wrong. When actually compiling, the reference
+    was subsequently set up correctly.)
+
+15. The "always fail" assertion (?!) is optimized to (*FAIL) by pcre_compile;
+    it was being rejected as not supported by pcre_dfa_exec(), even though
+    other assertions are supported. I have made pcre_dfa_exec() support
+    (*FAIL).
+
+16. The implementation of 13c above involved the invention of a new opcode,
+    OP_ALLANY, which is like OP_ANY but doesn't check the /s flag. Since /s
+    cannot be changed at match time, I realized I could make a small
+    improvement to matching performance by compiling OP_ALLANY instead of
+    OP_ANY for "." when DOTALL was set, and then removing the runtime tests
+    on the OP_ANY path.
+
+17. Compiling pcretest on Windows with readline support failed without the
+    following two fixes: (1) Make the unistd.h include conditional on
+    HAVE_UNISTD_H; (2) #define isatty and fileno as _isatty and _fileno.
+
+18. Changed CMakeLists.txt and cmake/FindReadline.cmake to arrange for the
+    ncurses library to be included for pcretest when ReadLine support is
+    requested, but also to allow for it to be overridden. This patch came from
+    Daniel Bergström.
+
+19. There was a typo in the file ucpinternal.h where f0_rangeflag was defined
+    as 0x00f00000 instead of 0x00800000. Luckily, this would not have caused
+    any errors with the current Unicode tables. Thanks to Peter Kankowski for
+    spotting this.
+
+
 Version 7.6 28-Jan-08
 ---------------------

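Item 6 of the ChangeLog above describes the QuoteMeta change: a NUL byte is emitted as backslash followed by the digit 0, not backslash followed by a raw NUL. The C sketch below illustrates that rule only. It is not the pcrecpp implementation (which is C++); the function name `quote_meta_sketch`, the buffer contract, and the choice to pass letters, digits, underscore and high-bit bytes through unescaped are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: writes a quoted copy of src[0..len)
   into dst and returns the number of bytes written. The caller must provide
   room for 2*len bytes in dst. */
static size_t quote_meta_sketch(const char *src, size_t len, char *dst)
{
  size_t out = 0;
  for (size_t i = 0; i < len; i++)
    {
    unsigned char ch = (unsigned char)src[i];
    if (ch == 0)
      {
      /* NUL becomes the two characters '\' and '0', so the quoted pattern
         itself never contains a NUL byte. */
      dst[out++] = '\\';
      dst[out++] = '0';
      }
    else if (ch >= 0x80 || isalnum(ch) || ch == '_')
      {
      /* Assumption: these bytes are never metacharacters, so copy them. */
      dst[out++] = (char)ch;
      }
    else
      {
      /* Everything else is escaped with a backslash. */
      dst[out++] = '\\';
      dst[out++] = (char)ch;
      }
    }
  return out;
}
```

Because the input is passed with an explicit length, embedded NULs in the subject text are handled like any other byte; only the output is guaranteed NUL-free.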
@ -125,7 +125,8 @@ Opcodes with no following data
 These items are all just one byte long

   OP_END                 end of pattern
-  OP_ANY                 match any character
+  OP_ANY                 match any one character other than newline
+  OP_ALLANY              match any one character, including newline
   OP_ANYBYTE             match any single byte, even in UTF-8 mode
   OP_SOD                 match start of data: \A
   OP_SOM,                start of match (subject + offset): \G
@ -318,9 +319,12 @@ maximally respectively. All three are followed by LINK_SIZE bytes giving (as a
 positive number) the offset back to the matching bracket opcode.

 If a subpattern is quantified such that it is permitted to match zero times, it
-is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
-opcodes which tell the matcher that skipping this subpattern entirely is a
-valid branch.
+is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are
+single-byte opcodes that tell the matcher that skipping the following
+subpattern entirely is a valid branch. In the case of the first two, not
+skipping the pattern is also valid (greedy and non-greedy). The third is used
+when a pattern has the quantifier {0,0}. It cannot be entirely discarded,
+because it may be called as a subroutine from elsewhere in the regex.

 A subpattern with an indefinite maximum repetition is replicated in the
 compiled data its minimum number of times (or once with OP_BRAZERO if the
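The OP_SKIPZERO paragraph in the hunk above can be illustrated with a toy bytecode walker. This is a minimal sketch, not PCRE's code: the opcode values, the two-byte big-endian link width, and the names `get_link` and `skip_zero_group` are assumptions chosen for the example. It only shows the idea that the matcher follows the link offsets from the opening bracket through each alternative to the closing ket, and resumes just after it.

```c
#include <stddef.h>

enum { LINK_SIZE = 2 };   /* assumed link width for this sketch */

/* Toy opcode numbering; the real values differ. */
enum { OP_END, OP_CHAR, OP_ALT, OP_KET, OP_BRA, OP_SKIPZERO };

/* Read the big-endian 2-byte offset stored right after the opcode at n. */
static size_t get_link(const unsigned char *code, size_t n)
{
  return ((size_t)code[n + 1] << 8) | code[n + 2];
}

/* Given the index of an OP_SKIPZERO, return the index of the first opcode
   after the bracketed group that follows it. */
static size_t skip_zero_group(const unsigned char *code, size_t pos)
{
  size_t next = pos + 1;                       /* the opening bracket */
  do next += get_link(code, next); while (code[next] == OP_ALT);
  return next + 1 + LINK_SIZE;                 /* step over the ket + link */
}
```

For a toy encoding of `(a|b){0}` such as `{OP_SKIPZERO, OP_BRA,0,5, OP_CHAR,'a', OP_ALT,0,5, OP_CHAR,'b', OP_KET,0,10, OP_END}`, the walker lands on the final OP_END, while the group's bytes stay in place for any subroutine call that targets it.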
@ -411,4 +415,4 @@ at compile time, and so does not cause anything to be put into the compiled
 data.

 Philip Hazel
-August 2007
+April 2008
@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------

+
+Release 7.7 07-May-08
+---------------------
+
+This is once again mainly a bug-fix release, but there are a couple of new
+features.
+
+
 Release 7.6 28-Jan-08
 ---------------------

@ -276,6 +276,15 @@ library. You can read more about them in the pcrebuild man page.
   Note that libreadline is GPL-licenced, so if you distribute a binary of
   pcretest linked in this way, there may be licensing issues.

+  Setting this option causes the -lreadline option to be added to the pcretest
+  build. In many operating environments with a system-installed readline
+  library this is sufficient. However, in some environments (e.g. if an
+  unmodified distribution version of readline is in use), it may be necessary
+  to specify something like LIBS="-lncurses" as well. This is because, to quote
+  the readline INSTALL, "Readline uses the termcap functions, but does not link
+  with the termcap or curses library itself, allowing applications which link
+  with readline the to choose an appropriate library."
+
 The "configure" script builds the following files for the basic C library:

 . Makefile is the makefile that builds the library
@ -740,4 +749,4 @@ The distribution should contain the following files:
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 25 January 2008
+Last updated: 13 April 2008
@ -132,9 +132,7 @@ them both to 0; an emulation function will be used. */
 #endif

 /* Define to 1 if you have the `strtoll' function. */
-#ifndef HAVE_STRTOLL
-#define HAVE_STRTOLL 1
-#endif
+/* #undef HAVE_STRTOLL */

 /* Define to 1 if you have the `strtoq' function. */
 #ifndef HAVE_STRTOQ
@ -251,13 +249,13 @@ them both to 0; an emulation function will be used. */
 #define PACKAGE_NAME "PCRE"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "PCRE 7.6"
+#define PACKAGE_STRING "PCRE 7.7"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "pcre"

 /* Define to the version of this package. */
-#define PACKAGE_VERSION "7.6"
+#define PACKAGE_VERSION "7.7"


 /* If you are compiling for a system other than a Unix-like system or
@ -310,7 +308,7 @@ them both to 0; an emulation function will be used. */

 /* Version number of package */
 #ifndef VERSION
-#define VERSION "7.6"
+#define VERSION "7.7"
 #endif

 /* Define to empty if `const' does not conform to ANSI C. */
File diff suppressed because it is too large
@ -42,9 +42,9 @@ POSSIBILITY OF SUCH DAMAGE.
 /* The current PCRE version information. */

 #define PCRE_MAJOR          7
-#define PCRE_MINOR          6
+#define PCRE_MINOR          7
 #define PCRE_PRERELEASE
-#define PCRE_DATE           2008-01-28
+#define PCRE_DATE           2008-05-07

 /* When an application links to a PCRE DLL in Windows, the symbols that are
 imported have to be identified as such. When building PCRE, the appropriate
@ -124,6 +124,7 @@ extern "C" {
 #define PCRE_NEWLINE_ANYCRLF    0x00500000
 #define PCRE_BSR_ANYCRLF        0x00800000
 #define PCRE_BSR_UNICODE        0x01000000
+#define PCRE_JAVASCRIPT_COMPAT  0x02000000

 /* Exec-time and get/set-time error codes */

@ -156,7 +156,7 @@ static const char verbnames[] =
   "SKIP\0"
   "THEN";

-static verbitem verbs[] = {
+static const verbitem verbs[] = {
   { 6, OP_ACCEPT },
   { 6, OP_COMMIT },
   { 1, OP_FAIL },
@ -166,7 +166,7 @@ static verbitem verbs[] = {
   { 4, OP_THEN }
 };

-static int verbcount = sizeof(verbs)/sizeof(verbitem);
+static const int verbcount = sizeof(verbs)/sizeof(verbitem);


 /* Tables of names of POSIX character classes and their lengths. The names are
@ -293,14 +293,15 @@ static const char error_texts[] =
   /* 55 */
   "repeating a DEFINE group is not allowed\0"
   "inconsistent NEWLINE options\0"
-  "\\g is not followed by a braced name or an optionally braced non-zero number\0"
-  "(?+ or (?- or (?(+ or (?(- must be followed by a non-zero number\0"
+  "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0"
+  "a numbered reference must not be zero\0"
   "(*VERB) with an argument is not supported\0"
   /* 60 */
   "(*VERB) not recognized\0"
   "number is too big\0"
   "subpattern name expected\0"
-  "digit expected after (?+";
+  "digit expected after (?+\0"
+  "] is an invalid data character in JavaScript compatibility mode";


 /* Table to identify digits and hex digits. This is used when compiling
@ -529,14 +530,31 @@ else
     *errorcodeptr = ERR37;
     break;

-    /* \g must be followed by a number, either plain or braced. If positive, it
-    is an absolute backreference. If negative, it is a relative backreference.
-    This is a Perl 5.10 feature. Perl 5.10 also supports \g{name} as a
-    reference to a named group. This is part of Perl's movement towards a
-    unified syntax for back references. As this is synonymous with \k{name}, we
-    fudge it up by pretending it really was \k. */
+    /* \g must be followed by one of a number of specific things:
+
+    (1) A number, either plain or braced. If positive, it is an absolute
+    backreference. If negative, it is a relative backreference. This is a Perl
+    5.10 feature.
+
+    (2) Perl 5.10 also supports \g{name} as a reference to a named group. This
+    is part of Perl's movement towards a unified syntax for back references. As
+    this is synonymous with \k{name}, we fudge it up by pretending it really
+    was \k.
+
+    (3) For Oniguruma compatibility we also support \g followed by a name or a
+    number either in angle brackets or in single quotes. However, these are
+    (possibly recursive) subroutine calls, _not_ backreferences. Just return
+    the -ESC_g code (cf \k). */

     case 'g':
+    if (ptr[1] == '<' || ptr[1] == '\'')
+      {
+      c = -ESC_g;
+      break;
+      }
+
+    /* Handle the Perl-compatible cases */
+
     if (ptr[1] == '{')
       {
       const uschar *p;
@ -563,18 +581,24 @@ else
     while ((digitab[ptr[1]] & ctype_digit) != 0)
       c = c * 10 + *(++ptr) - '0';

-    if (c < 0)
+    if (c < 0)   /* Integer overflow */
       {
       *errorcodeptr = ERR61;
       break;
       }

-    if (c == 0 || (braced && *(++ptr) != '}'))
+    if (braced && *(++ptr) != '}')
       {
       *errorcodeptr = ERR57;
       break;
       }

+    if (c == 0)
+      {
+      *errorcodeptr = ERR58;
+      break;
+      }
+
     if (negated)
       {
       if (c > bracount)
@ -609,7 +633,7 @@ else
       c -= '0';
       while ((digitab[ptr[1]] & ctype_digit) != 0)
         c = c * 10 + *(++ptr) - '0';
-      if (c < 0)
+      if (c < 0)    /* Integer overflow */
         {
         *errorcodeptr = ERR61;
         break;
@ -950,7 +974,7 @@ be terminated by '>' because that is checked in the first pass.

 Arguments:
   ptr          current position in the pattern
-  count        current count of capturing parens so far encountered
+  cd           compile background data
   name         name to seek, or NULL if seeking a numbered subpattern
   lorn         name length, or subpattern number if name is NULL
   xmode        TRUE if we are in /x mode
@ -959,10 +983,11 @@ Returns: the number of the named subpattern, or -1 if not found
 */

 static int
-find_parens(const uschar *ptr, int count, const uschar *name, int lorn,
+find_parens(const uschar *ptr, compile_data *cd, const uschar *name, int lorn,
   BOOL xmode)
 {
 const uschar *thisname;
+int count = cd->bracount;

 for (; *ptr != 0; ptr++)
   {
@ -982,10 +1007,34 @@ for (; *ptr != 0; ptr++)
     continue;
     }

-  /* Skip over character classes */
+  /* Skip over character classes; this logic must be similar to the way they
+  are handled for real. If the first character is '^', skip it. Also, if the
+  first few characters (either before or after ^) are \Q\E or \E we skip them
+  too. This makes for compatibility with Perl. */

   if (*ptr == '[')
     {
+    BOOL negate_class = FALSE;
+    for (;;)
+      {
+      int c = *(++ptr);
+      if (c == '\\')
+        {
+        if (ptr[1] == 'E') ptr++;
+          else if (strncmp((const char *)ptr+1, "Q\\E", 3) == 0) ptr += 3;
+            else break;
+        }
+      else if (!negate_class && c == '^')
+        negate_class = TRUE;
+      else break;
+      }
+
+    /* If the next character is ']', it is a data character that must be
+    skipped, except in JavaScript compatibility mode. */
+
+    if (ptr[1] == ']' && (cd->external_options & PCRE_JAVASCRIPT_COMPAT) == 0)
+      ptr++;
+
     while (*(++ptr) != ']')
       {
       if (*ptr == 0) return -1;
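The rule that the hunk above adds to find_parens() (a leading ']' in a class is data, except in JavaScript compatibility mode) can be tried out in isolation. The sketch below is an independent illustration and deliberately not a copy of the patched code: it peeks at the next character instead of pre-incrementing, and the name `skip_class_sketch`, the `js_compat` flag, and the NULL return for an unterminated class are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ptr points at the '[' that opens a class. Returns a
   pointer to the ']' that closes it, or NULL if the class is unterminated. */
static const char *skip_class_sketch(const char *ptr, int js_compat)
{
  int negated = 0;

  /* Skip an optional '^' and any \E or \Q\E noise before the first item. */
  for (;;)
    {
    if (ptr[1] == '\\')
      {
      if (ptr[2] == 'E') ptr += 2;
      else if (strncmp(ptr + 2, "Q\\E", 3) == 0) ptr += 4;
      else break;
      }
    else if (!negated && ptr[1] == '^') { negated = 1; ptr++; }
    else break;
    }

  /* A ']' in first position is a data character unless JS mode is on. */
  if (ptr[1] == ']' && !js_compat) ptr++;

  while (*(++ptr) != ']')
    {
    if (*ptr == 0) return NULL;
    if (*ptr == '\\' && ptr[1] != 0) ptr++;   /* step over an escaped char */
    }
  return ptr;
}
```

With the pattern from ChangeLog item 14, `[]a()b](abc)`, the default mode finds the closing bracket after `b`, so the parentheses inside the class are not miscounted as a group; with `js_compat` set, the class ends at the very first `]`.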
@ -1250,6 +1299,7 @@ for (;;)
     case OP_NOT_WORDCHAR:
     case OP_WORDCHAR:
     case OP_ANY:
+    case OP_ALLANY:
     branchlength++;
     cc++;
     break;
@ -1542,7 +1592,7 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE

   /* Groups with zero repeats can of course be empty; skip them. */

-  if (c == OP_BRAZERO || c == OP_BRAMINZERO)
+  if (c == OP_BRAZERO || c == OP_BRAMINZERO || c == OP_SKIPZERO)
     {
     code += _pcre_OP_lengths[c];
     do code += GET(code, 1); while (*code == OP_ALT);
@ -1628,6 +1678,7 @@ for (code = first_significant_code(code + _pcre_OP_lengths[*code], NULL, 0, TRUE
     case OP_NOT_WORDCHAR:
     case OP_WORDCHAR:
     case OP_ANY:
+    case OP_ALLANY:
     case OP_ANYBYTE:
     case OP_CHAR:
     case OP_CHARNC:
@ -1822,11 +1873,12 @@ return -1;
 that is referenced. This means that groups can be replicated for fixed
 repetition simply by copying (because the recursion is allowed to refer to
 earlier groups that are outside the current group). However, when a group is
-optional (i.e. the minimum quantifier is zero), OP_BRAZERO is inserted before
-it, after it has been compiled. This means that any OP_RECURSE items within it
-that refer to the group itself or any contained groups have to have their
-offsets adjusted. That is one of the jobs of this function. Before it is called,
-the partially compiled regex must be temporarily terminated with OP_END.
+optional (i.e. the minimum quantifier is zero), OP_BRAZERO or OP_SKIPZERO is
+inserted before it, after it has been compiled. This means that any OP_RECURSE
+items within it that refer to the group itself or any contained groups have to
+have their offsets adjusted. That is one of the jobs of this function. Before it
+is called, the partially compiled regex must be temporarily terminated with
+OP_END.

 This function has been extended with the possibility of forward references for
 recursions and subroutine calls. It must also check the list of such references
@ -2111,7 +2163,6 @@ if (next >= 0) switch(op_code)
   /* For OP_NOT, "item" must be a single-byte character. */

   case OP_NOT:
-  if (next < 0) return FALSE;  /* Not a character */
   if (item == next) return TRUE;
   if ((options & PCRE_CASELESS) == 0) return FALSE;
 #ifdef SUPPORT_UTF8
@ -2614,7 +2665,7 @@ for (;; ptr++)
     zerofirstbyte = firstbyte;
     zeroreqbyte = reqbyte;
     previous = code;
-    *code++ = OP_ANY;
+    *code++ = ((options & PCRE_DOTALL) != 0)? OP_ALLANY: OP_ANY;
     break;


@ -2629,7 +2680,17 @@ for (;; ptr++)
     opcode is compiled. It may optionally have a bit map for characters < 256,
     but those above are explicitly listed afterwards. A flag byte tells
     whether the bitmap is present, and whether this is a negated class or not.
-    */
+
+    In JavaScript compatibility mode, an isolated ']' causes an error. In
+    default (Perl) mode, it is treated as a data character. */
+
+    case ']':
+    if ((cd->external_options & PCRE_JAVASCRIPT_COMPAT) != 0)
+      {
+      *errorcodeptr = ERR64;
+      goto FAILED;
+      }
+    goto NORMAL_CHAR;

     case '[':
     previous = code;
@ -2663,6 +2724,19 @@ for (;; ptr++)
       else break;
       }

+    /* Empty classes are allowed in JavaScript compatibility mode. Otherwise,
+    an initial ']' is taken as a data character -- the code below handles
+    that. In JS mode, [] must always fail, so generate OP_FAIL, whereas
+    [^] must match any character, so generate OP_ALLANY. */
+
+    if (c ==']' && (cd->external_options & PCRE_JAVASCRIPT_COMPAT) != 0)
+      {
+      *code++ = negate_class? OP_ALLANY : OP_FAIL;
+      if (firstbyte == REQ_UNSET) firstbyte = REQ_NONE;
+      zerofirstbyte = firstbyte;
+      break;
+      }
+
     /* If a class contains a negative special such as \S, we need to flip the
     negation flag at the end, so that support for characters > 255 works
     correctly (they are all included in the class). */
@ -3818,28 +3892,38 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

       if (repeat_min == 0)
         {
-        /* If the maximum is also zero, we just omit the group from the output
-        altogether. */
+        /* If the maximum is also zero, we used to just omit the group from the
+        output altogether, like this:

-        if (repeat_max == 0)
-          {
-          code = previous;
-          goto END_REPEAT;
-          }
+        ** if (repeat_max == 0)
+        **   {
+        **   code = previous;
+        **   goto END_REPEAT;
+        **   }

-        /* If the maximum is 1 or unlimited, we just have to stick in the
-        BRAZERO and do no more at this point. However, we do need to adjust
-        any OP_RECURSE calls inside the group that refer to the group itself or
-        any internal or forward referenced group, because the offset is from
-        the start of the whole regex. Temporarily terminate the pattern while
-        doing this. */
+        However, that fails when a group is referenced as a subroutine from
+        elsewhere in the pattern, so now we stick in OP_SKIPZERO in front of it
+        so that it is skipped on execution. As we don't have a list of which
+        groups are referenced, we cannot do this selectively.

-        if (repeat_max <= 1)
+        If the maximum is 1 or unlimited, we just have to stick in the BRAZERO
+        and do no more at this point. However, we do need to adjust any
+        OP_RECURSE calls inside the group that refer to the group itself or any
+        internal or forward referenced group, because the offset is from the
+        start of the whole regex. Temporarily terminate the pattern while doing
+        this. */
+
+        if (repeat_max <= 1)    /* Covers 0, 1, and unlimited */
           {
           *code = OP_END;
           adjust_recurse(previous, 1, utf8, cd, save_hwm);
           memmove(previous+1, previous, len);
           code++;
+          if (repeat_max == 0)
+            {
+            *previous++ = OP_SKIPZERO;
+            goto END_REPEAT;
+            }
           *previous++ = OP_BRAZERO + repeat_type;
           }

@ -4034,6 +4118,13 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
       }
     }

+    /* If previous is OP_FAIL, it was generated by an empty class [] in
+    JavaScript mode. The other ways in which OP_FAIL can be generated, that is
+    by (*FAIL) or (?!) set previous to NULL, which gives a "nothing to repeat"
+    error above. We can just ignore the repeat in the JS case. */
+
+    else if (*previous == OP_FAIL) goto END_REPEAT;
+
     /* Else there's some kind of shambles */

     else
@ -4320,7 +4411,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

         /* Search the pattern for a forward reference */

-        else if ((i = find_parens(ptr, cd->bracount, name, namelen,
+        else if ((i = find_parens(ptr, cd, name, namelen,
                         (options & PCRE_EXTENDED) != 0)) > 0)
           {
           PUT2(code, 2+LINK_SIZE, i);
@ -4566,7 +4657,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
         references (?P=name) and recursion (?P>name), as well as falling
         through from the Perl recursion syntax (?&name). We also come here from
         the Perl \k<name> or \k'name' back reference syntax and the \k{name}
-        .NET syntax. */
+        .NET syntax, and the Oniguruma \g<...> and \g'...' subroutine syntax. */

         NAMED_REF_OR_RECURSE:
         name = ++ptr;
@ -4617,7 +4708,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
             recno = GET2(slot, 0);
             }
           else if ((recno =                /* Forward back reference */
-                    find_parens(ptr, cd->bracount, name, namelen,
+                    find_parens(ptr, cd, name, namelen,
                       (options & PCRE_EXTENDED) != 0)) <= 0)
             {
             *errorcodeptr = ERR15;
@ -4644,6 +4735,15 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
         case '5': case '6': case '7': case '8': case '9':   /* subroutine */
           {
           const uschar *called;
+          terminator = ')';
+
+          /* Come here from the \g<...> and \g'...' code (Oniguruma
+          compatibility). However, the syntax has been checked to ensure that
+          the ... are a (signed) number, so that neither ERR63 nor ERR29 will
+          be called on this path, nor will the jump to OTHER_CHAR_AFTER_QUERY
+          ever be taken. */
+
+          HANDLE_NUMERICAL_RECURSION:

           if ((refsign = *ptr) == '+')
             {
@ -4665,7 +4765,7 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
           while((digitab[*ptr] & ctype_digit) != 0)
             recno = recno * 10 + *ptr++ - '0';

-          if (*ptr != ')')
+          if (*ptr != terminator)
             {
             *errorcodeptr = ERR29;
             goto FAILED;
@ -4718,8 +4818,8 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */

             if (called == NULL)
               {
-              if (find_parens(ptr, cd->bracount, NULL, recno,
-                   (options & PCRE_EXTENDED) != 0) < 0)
+              if (find_parens(ptr, cd, NULL, recno,
+                    (options & PCRE_EXTENDED) != 0) < 0)
                 {
                 *errorcodeptr = ERR15;
                 goto FAILED;
@ -5089,6 +5189,64 @@ we set the flag only if there is a literal "\r" or "\n" in the class. */
       zerofirstbyte = firstbyte;
       zeroreqbyte = reqbyte;

+      /* \g<name> or \g'name' is a subroutine call by name and \g<n> or \g'n'
+      is a subroutine call by number (Oniguruma syntax). In fact, the value
+      -ESC_g is returned only for these cases. So we don't need to check for <
+      or ' if the value is -ESC_g. For the Perl syntax \g{n} the value is
+      -ESC_REF+n, and for the Perl syntax \g{name} the result is -ESC_k (as
+      that is a synonym for a named back reference). */
+
+      if (-c == ESC_g)
+        {
+        const uschar *p;
+        save_hwm = cd->hwm;   /* Normally this is set when '(' is read */
+        terminator = (*(++ptr) == '<')? '>' : '\'';
+
+        /* These two statements stop the compiler from warning about possibly
+        unset variables caused by the jump to HANDLE_NUMERICAL_RECURSION. In
+        fact, because we actually check for a number below, the paths that
+        would actually be in error are never taken. */
+
+        skipbytes = 0;
+        reset_bracount = FALSE;
+
+        /* Test for a name */
+
+        if (ptr[1] != '+' && ptr[1] != '-')
+          {
+          BOOL isnumber = TRUE;
+          for (p = ptr + 1; *p != 0 && *p != terminator; p++)
+            {
+            if ((cd->ctypes[*p] & ctype_digit) == 0) isnumber = FALSE;
+            if ((cd->ctypes[*p] & ctype_word) == 0) break;
+            }
+          if (*p != terminator)
+            {
+            *errorcodeptr = ERR57;
+            break;
+            }
+          if (isnumber)
+            {
+            ptr++;
+            goto HANDLE_NUMERICAL_RECURSION;
+            }
+          is_recurse = TRUE;
+          goto NAMED_REF_OR_RECURSE;
+          }
+
+        /* Test a signed number in angle brackets or quotes. */
+
+        p = ptr + 2;
+        while ((digitab[*p] & ctype_digit) != 0) p++;
+        if (*p != terminator)
+          {
+          *errorcodeptr = ERR57;
+          break;
+          }
+        ptr++;
+        goto HANDLE_NUMERICAL_RECURSION;
+        }
+
       /* \k<name> or \k'name' is a back reference by name (Perl syntax).
       We also support \k{name} (.NET syntax) */

@ -5595,14 +5753,14 @@ do {
     if (!is_anchored(scode, options, bracket_map, backref_map)) return FALSE;
     }

-  /* .* is not anchored unless DOTALL is set and it isn't in brackets that
-  are or may be referenced. */
+  /* .* is not anchored unless DOTALL is set (which generates OP_ALLANY) and
+  it isn't in brackets that are or may be referenced. */

   else if ((op == OP_TYPESTAR || op == OP_TYPEMINSTAR ||
-           op == OP_TYPEPOSSTAR) &&
-           (*options & PCRE_DOTALL) != 0)
+           op == OP_TYPEPOSSTAR))
     {
-    if (scode[1] != OP_ANY || (bracket_map & backref_map) != 0) return FALSE;
+    if (scode[1] != OP_ALLANY || (bracket_map & backref_map) != 0)
+      return FALSE;
     }

   /* Check for explicit anchoring */
@ -1146,11 +1146,11 @@ for (;;)
     do ecode += GET(ecode,1); while (*ecode == OP_ALT);
     break;

-    /* BRAZERO and BRAMINZERO occur just before a bracket group, indicating
-    that it may occur zero times. It may repeat infinitely, or not at all -
-    i.e. it could be ()* or ()? in the pattern. Brackets with fixed upper
-    repeat limits are compiled as a number of copies, with the optional ones
-    preceded by BRAZERO or BRAMINZERO. */
+    /* BRAZERO, BRAMINZERO and SKIPZERO occur just before a bracket group,
+    indicating that it may occur zero times. It may repeat infinitely, or not
+    at all - i.e. it could be ()* or ()? or even (){0} in the pattern. Brackets
+    with fixed upper repeat limits are compiled as a number of copies, with the
+    optional ones preceded by BRAZERO or BRAMINZERO. */

     case OP_BRAZERO:
       {
@ -1172,6 +1172,14 @@ for (;;)
       }
     break;

+    case OP_SKIPZERO:
+      {
+      next = ecode+1;
+      do next += GET(next,1); while (*next == OP_ALT);
+      ecode = next + 1 + LINK_SIZE;
+      }
+    break;
+
     /* End of a group, repeated or non-repeating. */

     case OP_KET:
@ -1419,13 +1427,12 @@ for (;;)
     /* Match a single character type; inline for speed */

     case OP_ANY:
-    if ((ims & PCRE_DOTALL) == 0)
-      {
-      if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH);
-      }
+    if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH);
+    /* Fall through */
+
+    case OP_ALLANY:
     if (eptr++ >= md->end_subject) RRETURN(MATCH_NOMATCH);
-    if (utf8)
-      while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
+    if (utf8) while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
     ecode++;
     break;

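The split between OP_ANY and OP_ALLANY in the hunk above moves the DOTALL decision from match time to compile time. The following sketch shows that idea with a toy compiler and matcher. It is only an illustration under stated assumptions: the names `compile_dot` and `match_dot` and the `_SKETCH` opcodes are made up, newline is taken to be a single `'\n'`, and UTF-8 handling is left out.

```c
#include <stddef.h>

/* Toy opcodes standing in for OP_ANY and OP_ALLANY. */
enum dot_op { OP_ANY_SKETCH, OP_ALLANY_SKETCH };

/* Compile time: pick the opcode once, based on the DOTALL setting. */
static enum dot_op compile_dot(int dotall)
{
  return dotall ? OP_ALLANY_SKETCH : OP_ANY_SKETCH;
}

/* Match time: return the position after the matched character, or -1 if
   there is no match. No DOTALL test is needed here any more. */
static long match_dot(enum dot_op op, const char *subject, size_t len,
  size_t pos)
{
  switch (op)
    {
    case OP_ANY_SKETCH:
    if (pos < len && subject[pos] == '\n') return -1;
    /* Fall through */

    case OP_ALLANY_SKETCH:
    if (pos >= len) return -1;
    return (long)(pos + 1);
    }
  return -1;
}
```

The matcher's newline test now runs only on the OP_ANY path, which is the small performance gain that ChangeLog item 16 describes.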
@@ -1721,16 +1728,25 @@ for (;;)
 case OP_REF:
 {
 offset = GET2(ecode, 1) << 1; /* Doubled ref number */
-ecode += 3; /* Advance past item */
+ecode += 3;

-/* If the reference is unset, set the length to be longer than the amount
-of subject left; this ensures that every attempt at a match fails. We
-can't just fail here, because of the possibility of quantifiers with zero
-minima. */
+/* If the reference is unset, there are two possibilities:

-length = (offset >= offset_top || md->offset_vector[offset] < 0)?
-md->end_subject - eptr + 1 :
-md->offset_vector[offset+1] - md->offset_vector[offset];
+(a) In the default, Perl-compatible state, set the length to be longer
+than the amount of subject left; this ensures that every attempt at a
+match fails. We can't just fail here, because of the possibility of
+quantifiers with zero minima.
+
+(b) If the JavaScript compatibility flag is set, set the length to zero
+so that the back reference matches an empty string.
+
+Otherwise, set the length to the length of what was matched by the
+referenced subpattern. */
+
+if (offset >= offset_top || md->offset_vector[offset] < 0)
+length = (md->jscript_compat)? 0 : md->end_subject - eptr + 1;
+else
+length = md->offset_vector[offset+1] - md->offset_vector[offset];

 /* Set up for repetition, or handle the non-repeated case */

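The three outcomes described in the OP_REF comment can be written as one small function. This is a sketch only: `backref_length` and its flat parameter list are hypothetical stand-ins for the fields of `match_data`, with `subject_left` playing the role of `md->end_subject - eptr`.

```c
#include <assert.h>

/* offsets holds (start, end) pairs indexed by the doubled group number;
   a negative start marks a group that has not been set. */
static int backref_length(const int *offsets, int offset, int offset_top,
                          int subject_left, int jscript_compat)
{
  assert(offset >= 0 && offset_top >= 0);
  if (offset >= offset_top || offsets[offset] < 0)
    {
    /* Unset reference: match empty in JavaScript mode, otherwise pick a
       length that can never fit in what is left of the subject. */
    return jscript_compat ? 0 : subject_left + 1;
    }
  return offsets[offset + 1] - offsets[offset];   /* length of the capture */
}
```

The `/(\3)(\1)(a)/` patterns added to testinput2, with and without `<JS>`, appear to exercise exactly this difference.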
@ -2933,14 +2949,22 @@ for (;;)
|
||||
case OP_ANY:
|
||||
for (i = 1; i <= min; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject ||
|
||||
((ims & PCRE_DOTALL) == 0 && IS_NEWLINE(eptr)))
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr))
|
||||
RRETURN(MATCH_NOMATCH);
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
break;
|
||||
|
||||
case OP_ALLANY:
|
||||
for (i = 1; i <= min; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject) RRETURN(MATCH_NOMATCH);
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
break;
|
||||
|
||||
case OP_ANYBYTE:
|
||||
eptr += min;
|
||||
break;
|
||||
@ -3149,15 +3173,15 @@ for (;;)
|
||||
switch(ctype)
|
||||
{
|
||||
case OP_ANY:
|
||||
if ((ims & PCRE_DOTALL) == 0)
|
||||
for (i = 1; i <= min; i++)
|
||||
{
|
||||
for (i = 1; i <= min; i++)
|
||||
{
|
||||
if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH);
|
||||
eptr++;
|
||||
}
|
||||
if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH);
|
||||
eptr++;
|
||||
}
|
||||
else eptr += min;
|
||||
break;
|
||||
|
||||
case OP_ALLANY:
|
||||
eptr += min;
|
||||
break;
|
||||
|
||||
case OP_ANYBYTE:
|
||||
@ -3414,16 +3438,14 @@ for (;;)
|
||||
RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM42);
|
||||
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
|
||||
if (fi >= max || eptr >= md->end_subject ||
|
||||
(ctype == OP_ANY && (ims & PCRE_DOTALL) == 0 &&
|
||||
IS_NEWLINE(eptr)))
|
||||
(ctype == OP_ANY && IS_NEWLINE(eptr)))
|
||||
RRETURN(MATCH_NOMATCH);
|
||||
|
||||
GETCHARINC(c, eptr);
|
||||
switch(ctype)
|
||||
{
|
||||
case OP_ANY: /* This is the DOTALL case */
|
||||
break;
|
||||
|
||||
case OP_ANY: /* This is the non-NL case */
|
||||
case OP_ALLANY:
|
||||
case OP_ANYBYTE:
|
||||
break;
|
||||
|
||||
@ -3575,15 +3597,14 @@ for (;;)
|
||||
RMATCH(eptr, ecode, offset_top, md, ims, eptrb, 0, RM43);
|
||||
if (rrc != MATCH_NOMATCH) RRETURN(rrc);
|
||||
if (fi >= max || eptr >= md->end_subject ||
|
||||
((ims & PCRE_DOTALL) == 0 && IS_NEWLINE(eptr)))
|
||||
(ctype == OP_ANY && IS_NEWLINE(eptr)))
|
||||
RRETURN(MATCH_NOMATCH);
|
||||
|
||||
c = *eptr++;
|
||||
switch(ctype)
|
||||
{
|
||||
case OP_ANY: /* This is the DOTALL case */
|
||||
break;
|
||||
|
||||
case OP_ANY: /* This is the non-NL case */
|
||||
case OP_ALLANY:
|
||||
case OP_ANYBYTE:
|
||||
break;
|
||||
|
||||
@ -3837,23 +3858,11 @@ for (;;)
|
||||
case OP_ANY:
|
||||
if (max < INT_MAX)
|
||||
{
|
||||
if ((ims & PCRE_DOTALL) == 0)
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject) break;
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
}
|
||||
|
||||
@ -3861,22 +3870,28 @@ for (;;)
|
||||
|
||||
else
|
||||
{
|
||||
if ((ims & PCRE_DOTALL) == 0)
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
eptr = md->end_subject;
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
}
|
||||
break;
|
||||
|
||||
case OP_ALLANY:
|
||||
if (max < INT_MAX)
|
||||
{
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject) break;
|
||||
eptr++;
|
||||
while (eptr < md->end_subject && (*eptr & 0xc0) == 0x80) eptr++;
|
||||
}
|
||||
}
|
||||
else eptr = md->end_subject; /* Unlimited UTF-8 repeat */
|
||||
break;
|
||||
|
||||
/* The byte case is the same as non-UTF8 */
|
||||
|
||||
case OP_ANYBYTE:
|
||||
@ -4062,17 +4077,14 @@ for (;;)
|
||||
switch(ctype)
|
||||
{
|
||||
case OP_ANY:
|
||||
if ((ims & PCRE_DOTALL) == 0)
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
for (i = min; i < max; i++)
|
||||
{
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
|
||||
eptr++;
|
||||
}
|
||||
break;
|
||||
if (eptr >= md->end_subject || IS_NEWLINE(eptr)) break;
|
||||
eptr++;
|
||||
}
|
||||
/* For DOTALL case, fall through and treat as \C */
|
||||
break;
|
||||
|
||||
case OP_ALLANY:
|
||||
case OP_ANYBYTE:
|
||||
c = max - min;
|
||||
if (c > (unsigned int)(md->end_subject - eptr))
|
||||
@ -4448,6 +4460,7 @@ end_subject = md->end_subject;
|
||||
|
||||
md->endonly = (re->options & PCRE_DOLLAR_ENDONLY) != 0;
|
||||
utf8 = md->utf8 = (re->options & PCRE_UTF8) != 0;
|
||||
md->jscript_compat = (re->options & PCRE_JAVASCRIPT_COMPAT) != 0;
|
||||
|
||||
md->notbol = (options & PCRE_NOTBOL) != 0;
|
||||
md->noteol = (options & PCRE_NOTEOL) != 0;
|
||||
|
@@ -514,7 +514,8 @@ time, run time, or study time, respectively. */
 (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
 PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \
 PCRE_NO_AUTO_CAPTURE|PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \
-PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)
+PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \
+PCRE_JAVASCRIPT_COMPAT)

 #define PUBLIC_EXEC_OPTIONS \
 (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NO_UTF8_CHECK| \
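A mask of permitted option bits like the one above is typically used to reject any bit that is not listed, which is presumably why PCRE_JAVASCRIPT_COMPAT has to be added to it. The sketch below shows that kind of check. The `TOY_` names and their bit values are invented for illustration and do not match the real PCRE constants.

```c
#include <assert.h>

/* Invented option bits; the real PCRE values differ. */
#define TOY_CASELESS          0x0001
#define TOY_DOTALL            0x0004
#define TOY_JAVASCRIPT_COMPAT 0x0100

#define TOY_PUBLIC_OPTIONS \
  (TOY_CASELESS|TOY_DOTALL|TOY_JAVASCRIPT_COMPAT)

/* Return non-zero if every bit set in options is a permitted one. */
static int options_are_public(int options)
{
  return (options & ~TOY_PUBLIC_OPTIONS) == 0;
}

/* Small self-check showing the intended behaviour. */
static void demo_public_options(void)
{
  assert(options_are_public(0));
  assert(options_are_public(TOY_DOTALL | TOY_JAVASCRIPT_COMPAT));
  assert(!options_are_public(0x0002));   /* bit not in the mask */
}
```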
@ -604,16 +605,20 @@ contain UTF-8 characters with values greater than 255. */
|
||||
value such as \n. They must have non-zero values, as check_escape() returns
|
||||
their negation. Also, they must appear in the same order as in the opcode
|
||||
definitions below, up to ESC_z. There's a dummy for OP_ANY because it
|
||||
corresponds to "." rather than an escape sequence. The final one must be
|
||||
ESC_REF as subsequent values are used for backreferences (\1, \2, \3, etc).
|
||||
There are two tests in the code for an escape greater than ESC_b and less than
|
||||
ESC_Z to detect the types that may be repeated. These are the types that
|
||||
consume characters. If any new escapes are put in between that don't consume a
|
||||
character, that code will have to change. */
|
||||
corresponds to "." rather than an escape sequence, and another for OP_ALLANY
|
||||
(which is used for [^] in JavaScript compatibility mode).
|
||||
|
||||
The final escape must be ESC_REF as subsequent values are used for
|
||||
backreferences (\1, \2, \3, etc). There are two tests in the code for an escape
|
||||
greater than ESC_b and less than ESC_Z to detect the types that may be
|
||||
repeated. These are the types that consume characters. If any new escapes are
|
||||
put in between that don't consume a character, that code will have to change.
|
||||
*/
|
||||
|
||||
enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s,
|
||||
ESC_W, ESC_w, ESC_dum1, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H, ESC_h,
|
||||
ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z, ESC_E, ESC_Q, ESC_k, ESC_REF };
|
||||
ESC_W, ESC_w, ESC_dum1, ESC_dum2, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H,
|
||||
ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z, ESC_E, ESC_Q, ESC_g, ESC_k,
|
||||
ESC_REF };
|
||||
|
||||
|
||||
/* Opcode table: Starting from 1 (i.e. after OP_END), the values up to
|
||||
@ -639,141 +644,146 @@ enum {
|
||||
OP_WHITESPACE, /* 9 \s */
|
||||
OP_NOT_WORDCHAR, /* 10 \W */
|
||||
OP_WORDCHAR, /* 11 \w */
|
||||
OP_ANY, /* 12 Match any character */
|
||||
OP_ANYBYTE, /* 13 Match any byte (\C); different to OP_ANY for UTF-8 */
|
||||
OP_NOTPROP, /* 14 \P (not Unicode property) */
|
||||
OP_PROP, /* 15 \p (Unicode property) */
|
||||
OP_ANYNL, /* 16 \R (any newline sequence) */
|
||||
OP_NOT_HSPACE, /* 17 \H (not horizontal whitespace) */
|
||||
OP_HSPACE, /* 18 \h (horizontal whitespace) */
|
||||
OP_NOT_VSPACE, /* 19 \V (not vertical whitespace) */
|
||||
OP_VSPACE, /* 20 \v (vertical whitespace) */
|
||||
OP_EXTUNI, /* 21 \X (extended Unicode sequence) */
|
||||
OP_EODN, /* 22 End of data or \n at end of data: \Z. */
|
||||
OP_EOD, /* 23 End of data: \z */
|
||||
OP_ANY, /* 12 Match any character (subject to DOTALL) */
|
||||
OP_ALLANY, /* 13 Match any character (not subject to DOTALL) */
|
||||
OP_ANYBYTE, /* 14 Match any byte (\C); different to OP_ANY for UTF-8 */
|
||||
OP_NOTPROP, /* 15 \P (not Unicode property) */
|
||||
OP_PROP, /* 16 \p (Unicode property) */
|
||||
OP_ANYNL, /* 17 \R (any newline sequence) */
|
||||
OP_NOT_HSPACE, /* 18 \H (not horizontal whitespace) */
|
||||
OP_HSPACE, /* 19 \h (horizontal whitespace) */
|
||||
OP_NOT_VSPACE, /* 20 \V (not vertical whitespace) */
|
||||
OP_VSPACE, /* 21 \v (vertical whitespace) */
|
||||
OP_EXTUNI, /* 22 \X (extended Unicode sequence) */
|
||||
OP_EODN, /* 23 End of data or \n at end of data: \Z. */
|
||||
OP_EOD, /* 24 End of data: \z */
|
||||
|
||||
OP_OPT, /* 24 Set runtime options */
|
||||
OP_CIRC, /* 25 Start of line - varies with multiline switch */
|
||||
OP_DOLL, /* 26 End of line - varies with multiline switch */
|
||||
OP_CHAR, /* 27 Match one character, casefully */
|
||||
OP_CHARNC, /* 28 Match one character, caselessly */
|
||||
OP_NOT, /* 29 Match one character, not the following one */
|
||||
OP_OPT, /* 25 Set runtime options */
|
||||
OP_CIRC, /* 26 Start of line - varies with multiline switch */
|
||||
OP_DOLL, /* 27 End of line - varies with multiline switch */
|
||||
OP_CHAR, /* 28 Match one character, casefully */
|
||||
OP_CHARNC, /* 29 Match one character, caselessly */
|
||||
OP_NOT, /* 30 Match one character, not the following one */
|
||||
|
||||
OP_STAR, /* 30 The maximizing and minimizing versions of */
|
||||
OP_MINSTAR, /* 31 these six opcodes must come in pairs, with */
|
||||
OP_PLUS, /* 32 the minimizing one second. */
|
||||
OP_MINPLUS, /* 33 This first set applies to single characters.*/
|
||||
OP_QUERY, /* 34 */
|
||||
OP_MINQUERY, /* 35 */
|
||||
OP_STAR, /* 31 The maximizing and minimizing versions of */
|
||||
OP_MINSTAR, /* 32 these six opcodes must come in pairs, with */
|
||||
OP_PLUS, /* 33 the minimizing one second. */
|
||||
OP_MINPLUS, /* 34 This first set applies to single characters.*/
|
||||
OP_QUERY, /* 35 */
|
||||
OP_MINQUERY, /* 36 */
|
||||
|
||||
OP_UPTO, /* 36 From 0 to n matches */
|
||||
OP_MINUPTO, /* 37 */
|
||||
OP_EXACT, /* 38 Exactly n matches */
|
||||
OP_UPTO, /* 37 From 0 to n matches */
|
||||
OP_MINUPTO, /* 38 */
|
||||
OP_EXACT, /* 39 Exactly n matches */
|
||||
|
||||
OP_POSSTAR, /* 39 Possessified star */
|
||||
OP_POSPLUS, /* 40 Possessified plus */
|
||||
OP_POSQUERY, /* 41 Possessified query */
|
||||
OP_POSUPTO, /* 42 Possessified upto */
|
||||
OP_POSSTAR, /* 40 Possessified star */
|
||||
OP_POSPLUS, /* 41 Possessified plus */
|
||||
OP_POSQUERY, /* 42 Possessified query */
|
||||
OP_POSUPTO, /* 43 Possessified upto */
|
||||
|
||||
OP_NOTSTAR, /* 43 The maximizing and minimizing versions of */
|
||||
OP_NOTMINSTAR, /* 44 these six opcodes must come in pairs, with */
|
||||
OP_NOTPLUS, /* 45 the minimizing one second. They must be in */
|
||||
OP_NOTMINPLUS, /* 46 exactly the same order as those above. */
|
||||
OP_NOTQUERY, /* 47 This set applies to "not" single characters. */
|
||||
OP_NOTMINQUERY, /* 48 */
|
||||
OP_NOTSTAR, /* 44 The maximizing and minimizing versions of */
|
||||
OP_NOTMINSTAR, /* 45 these six opcodes must come in pairs, with */
|
||||
OP_NOTPLUS, /* 46 the minimizing one second. They must be in */
|
||||
OP_NOTMINPLUS, /* 47 exactly the same order as those above. */
|
||||
OP_NOTQUERY, /* 48 This set applies to "not" single characters. */
|
||||
OP_NOTMINQUERY, /* 49 */
|
||||
|
||||
OP_NOTUPTO, /* 49 From 0 to n matches */
|
||||
OP_NOTMINUPTO, /* 50 */
|
||||
OP_NOTEXACT, /* 51 Exactly n matches */
|
||||
OP_NOTUPTO, /* 50 From 0 to n matches */
|
||||
OP_NOTMINUPTO, /* 51 */
|
||||
OP_NOTEXACT, /* 52 Exactly n matches */
|
||||
|
||||
OP_NOTPOSSTAR, /* 52 Possessified versions */
|
||||
OP_NOTPOSPLUS, /* 53 */
|
||||
OP_NOTPOSQUERY, /* 54 */
|
||||
OP_NOTPOSUPTO, /* 55 */
|
||||
OP_NOTPOSSTAR, /* 53 Possessified versions */
|
||||
OP_NOTPOSPLUS, /* 54 */
|
||||
OP_NOTPOSQUERY, /* 55 */
|
||||
OP_NOTPOSUPTO, /* 56 */
|
||||
|
||||
OP_TYPESTAR, /* 56 The maximizing and minimizing versions of */
|
||||
OP_TYPEMINSTAR, /* 57 these six opcodes must come in pairs, with */
|
||||
OP_TYPEPLUS, /* 58 the minimizing one second. These codes must */
|
||||
OP_TYPEMINPLUS, /* 59 be in exactly the same order as those above. */
|
||||
OP_TYPEQUERY, /* 60 This set applies to character types such as \d */
|
||||
OP_TYPEMINQUERY, /* 61 */
|
||||
OP_TYPESTAR, /* 57 The maximizing and minimizing versions of */
|
||||
OP_TYPEMINSTAR, /* 58 these six opcodes must come in pairs, with */
|
||||
OP_TYPEPLUS, /* 59 the minimizing one second. These codes must */
|
||||
OP_TYPEMINPLUS, /* 60 be in exactly the same order as those above. */
|
||||
OP_TYPEQUERY, /* 61 This set applies to character types such as \d */
|
||||
OP_TYPEMINQUERY, /* 62 */
|
||||
|
||||
OP_TYPEUPTO, /* 62 From 0 to n matches */
|
||||
OP_TYPEMINUPTO, /* 63 */
|
||||
OP_TYPEEXACT, /* 64 Exactly n matches */
|
||||
OP_TYPEUPTO, /* 63 From 0 to n matches */
|
||||
OP_TYPEMINUPTO, /* 64 */
|
||||
OP_TYPEEXACT, /* 65 Exactly n matches */
|
||||
|
||||
OP_TYPEPOSSTAR, /* 65 Possessified versions */
|
||||
OP_TYPEPOSPLUS, /* 66 */
|
||||
OP_TYPEPOSQUERY, /* 67 */
|
||||
OP_TYPEPOSUPTO, /* 68 */
|
||||
OP_TYPEPOSSTAR, /* 66 Possessified versions */
|
||||
OP_TYPEPOSPLUS, /* 67 */
|
||||
OP_TYPEPOSQUERY, /* 68 */
|
||||
OP_TYPEPOSUPTO, /* 69 */
|
||||
|
||||
OP_CRSTAR, /* 69 The maximizing and minimizing versions of */
|
||||
OP_CRMINSTAR, /* 70 all these opcodes must come in pairs, with */
|
||||
OP_CRPLUS, /* 71 the minimizing one second. These codes must */
|
||||
OP_CRMINPLUS, /* 72 be in exactly the same order as those above. */
|
||||
OP_CRQUERY, /* 73 These are for character classes and back refs */
|
||||
OP_CRMINQUERY, /* 74 */
|
||||
OP_CRRANGE, /* 75 These are different to the three sets above. */
|
||||
OP_CRMINRANGE, /* 76 */
|
||||
OP_CRSTAR, /* 70 The maximizing and minimizing versions of */
|
||||
OP_CRMINSTAR, /* 71 all these opcodes must come in pairs, with */
|
||||
OP_CRPLUS, /* 72 the minimizing one second. These codes must */
|
||||
OP_CRMINPLUS, /* 73 be in exactly the same order as those above. */
|
||||
OP_CRQUERY, /* 74 These are for character classes and back refs */
|
||||
OP_CRMINQUERY, /* 75 */
|
||||
OP_CRRANGE, /* 76 These are different to the three sets above. */
|
||||
OP_CRMINRANGE, /* 77 */
|
||||
|
||||
OP_CLASS, /* 77 Match a character class, chars < 256 only */
|
||||
OP_NCLASS, /* 78 Same, but the bitmap was created from a negative
|
||||
OP_CLASS, /* 78 Match a character class, chars < 256 only */
|
||||
OP_NCLASS, /* 79 Same, but the bitmap was created from a negative
|
||||
class - the difference is relevant only when a UTF-8
|
||||
character > 255 is encountered. */
|
||||
|
||||
OP_XCLASS, /* 79 Extended class for handling UTF-8 chars within the
|
||||
OP_XCLASS, /* 80 Extended class for handling UTF-8 chars within the
|
||||
class. This does both positive and negative. */
|
||||
|
||||
OP_REF, /* 80 Match a back reference */
|
||||
OP_RECURSE, /* 81 Match a numbered subpattern (possibly recursive) */
|
||||
OP_CALLOUT, /* 82 Call out to external function if provided */
|
||||
OP_REF, /* 81 Match a back reference */
|
||||
OP_RECURSE, /* 82 Match a numbered subpattern (possibly recursive) */
|
||||
OP_CALLOUT, /* 83 Call out to external function if provided */
|
||||
|
||||
OP_ALT, /* 83 Start of alternation */
|
||||
OP_KET, /* 84 End of group that doesn't have an unbounded repeat */
|
||||
OP_KETRMAX, /* 85 These two must remain together and in this */
|
||||
OP_KETRMIN, /* 86 order. They are for groups that repeat for ever. */
|
||||
OP_ALT, /* 84 Start of alternation */
|
||||
OP_KET, /* 85 End of group that doesn't have an unbounded repeat */
|
||||
OP_KETRMAX, /* 86 These two must remain together and in this */
|
||||
OP_KETRMIN, /* 87 order. They are for groups that repeat for ever. */
|
||||
|
||||
/* The assertions must come before BRA, CBRA, ONCE, and COND.*/
|
||||
|
||||
OP_ASSERT, /* 87 Positive lookahead */
|
||||
OP_ASSERT_NOT, /* 88 Negative lookahead */
|
||||
OP_ASSERTBACK, /* 89 Positive lookbehind */
|
||||
OP_ASSERTBACK_NOT, /* 90 Negative lookbehind */
|
||||
OP_REVERSE, /* 91 Move pointer back - used in lookbehind assertions */
|
||||
OP_ASSERT, /* 88 Positive lookahead */
|
||||
OP_ASSERT_NOT, /* 89 Negative lookahead */
|
||||
OP_ASSERTBACK, /* 90 Positive lookbehind */
|
||||
OP_ASSERTBACK_NOT, /* 91 Negative lookbehind */
|
||||
OP_REVERSE, /* 92 Move pointer back - used in lookbehind assertions */
|
||||
|
||||
/* ONCE, BRA, CBRA, and COND must come after the assertions, with ONCE first,
|
||||
as there's a test for >= ONCE for a subpattern that isn't an assertion. */
|
||||
|
||||
OP_ONCE, /* 92 Atomic group */
|
||||
OP_BRA, /* 93 Start of non-capturing bracket */
|
||||
OP_CBRA, /* 94 Start of capturing bracket */
|
||||
OP_COND, /* 95 Conditional group */
|
||||
OP_ONCE, /* 93 Atomic group */
|
||||
OP_BRA, /* 94 Start of non-capturing bracket */
|
||||
OP_CBRA, /* 95 Start of capturing bracket */
|
||||
OP_COND, /* 96 Conditional group */
|
||||
|
||||
/* These three must follow the previous three, in the same order. There's a
|
||||
check for >= SBRA to distinguish the two sets. */
|
||||
|
||||
OP_SBRA, /* 96 Start of non-capturing bracket, check empty */
|
||||
OP_SCBRA, /* 97 Start of capturing bracket, check empty */
|
||||
OP_SCOND, /* 98 Conditional group, check empty */
|
||||
OP_SBRA, /* 97 Start of non-capturing bracket, check empty */
|
||||
OP_SCBRA, /* 98 Start of capturing bracket, check empty */
|
||||
OP_SCOND, /* 99 Conditional group, check empty */
|
||||
|
||||
OP_CREF, /* 99 Used to hold a capture number as condition */
|
||||
OP_RREF, /* 100 Used to hold a recursion number as condition */
|
||||
OP_DEF, /* 101 The DEFINE condition */
|
||||
OP_CREF, /* 100 Used to hold a capture number as condition */
|
||||
OP_RREF, /* 101 Used to hold a recursion number as condition */
|
||||
OP_DEF, /* 102 The DEFINE condition */
|
||||
|
||||
OP_BRAZERO, /* 102 These two must remain together and in this */
|
||||
OP_BRAMINZERO, /* 103 order. */
|
||||
OP_BRAZERO, /* 103 These two must remain together and in this */
|
||||
OP_BRAMINZERO, /* 104 order. */
|
||||
|
||||
/* These are backtracking control verbs */
|
||||
|
||||
OP_PRUNE, /* 104 */
|
||||
OP_SKIP, /* 105 */
|
||||
OP_THEN, /* 106 */
|
||||
OP_COMMIT, /* 107 */
|
||||
OP_PRUNE, /* 105 */
|
||||
OP_SKIP, /* 106 */
|
||||
OP_THEN, /* 107 */
|
||||
OP_COMMIT, /* 108 */
|
||||
|
||||
/* These are forced failure and success verbs */
|
||||
|
||||
OP_FAIL, /* 108 */
|
||||
OP_ACCEPT /* 109 */
|
||||
OP_FAIL, /* 109 */
|
||||
OP_ACCEPT, /* 110 */
|
||||
|
||||
/* This is used to skip a subpattern with a {0} quantifier */
|
||||
|
||||
OP_SKIPZERO /* 111 */
|
||||
};
|
||||
|
||||
|
||||
@ -782,7 +792,7 @@ for debugging. The macro is referenced only in pcre_printint.c. */
|
||||
|
||||
#define OP_NAME_LIST \
|
||||
"End", "\\A", "\\G", "\\K", "\\B", "\\b", "\\D", "\\d", \
|
||||
"\\S", "\\s", "\\W", "\\w", "Any", "Anybyte", \
|
||||
"\\S", "\\s", "\\W", "\\w", "Any", "AllAny", "Anybyte", \
|
||||
"notprop", "prop", "\\R", "\\H", "\\h", "\\V", "\\v", \
|
||||
"extuni", "\\Z", "\\z", \
|
||||
"Opt", "^", "$", "char", "charnc", "not", \
|
||||
@ -798,7 +808,8 @@ for debugging. The macro is referenced only in pcre_printint.c. */
|
||||
"AssertB", "AssertB not", "Reverse", \
|
||||
"Once", "Bra", "CBra", "Cond", "SBra", "SCBra", "SCond", \
|
||||
"Cond ref", "Cond rec", "Cond def", "Brazero", "Braminzero", \
|
||||
"*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT"
|
||||
"*PRUNE", "*SKIP", "*THEN", "*COMMIT", "*FAIL", "*ACCEPT", \
|
||||
"Skip zero"
|
||||
|
||||
|
||||
/* This macro defines the length of fixed length operations in the compiled
|
||||
@ -814,7 +825,7 @@ in UTF-8 mode. The code that uses this table must know about such things. */
|
||||
1, /* End */ \
|
||||
1, 1, 1, 1, 1, /* \A, \G, \K, \B, \b */ \
|
||||
1, 1, 1, 1, 1, 1, /* \D, \d, \S, \s, \W, \w */ \
|
||||
1, 1, /* Any, Anybyte */ \
|
||||
1, 1, 1, /* Any, AllAny, Anybyte */ \
|
||||
3, 3, 1, /* NOTPROP, PROP, EXTUNI */ \
|
||||
1, 1, 1, 1, 1, /* \R, \H, \h, \V, \v */ \
|
||||
1, 1, 2, 1, 1, /* \Z, \z, Opt, ^, $ */ \
|
||||
@ -863,7 +874,7 @@ in UTF-8 mode. The code that uses this table must know about such things. */
|
||||
1, /* DEF */ \
|
||||
1, 1, /* BRAZERO, BRAMINZERO */ \
|
||||
1, 1, 1, 1, /* PRUNE, SKIP, THEN, COMMIT, */ \
|
||||
1, 1 /* FAIL, ACCEPT */
|
||||
1, 1, 1 /* FAIL, ACCEPT, SKIPZERO */
|
||||
|
||||
|
||||
/* A magic value for OP_RREF to indicate the "any recursion" condition. */
|
||||
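Adding OP_ALLANY and OP_SKIPZERO meant touching the opcode enum, OP_NAME_LIST and the opcode length table together, since all three are indexed by opcode value. One way to catch a table that has drifted out of step is a compile-time size check, sketched here on a four-entry toy table. The `S_` names, `S_TABLE_SIZE` and `COUNT_OF` are hypothetical and are not taken from the PCRE source shown in this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcode enum; the final member counts the entries. */
enum { S_END, S_ANY, S_ALLANY, S_ANYBYTE, S_TABLE_SIZE };

static const char *const toy_names[] = { "End", "Any", "AllAny", "Anybyte" };
static const unsigned char toy_lengths[] = { 1, 1, 1, 1 };

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* These typedefs fail to compile (negative array size) if a table does not
   have exactly one entry per opcode. */
typedef char names_in_sync[(COUNT_OF(toy_names) == S_TABLE_SIZE) ? 1 : -1];
typedef char lengths_in_sync[(COUNT_OF(toy_lengths) == S_TABLE_SIZE) ? 1 : -1];

/* Runtime view of the same invariant, plus a lookup by opcode. */
static void demo_tables(void)
{
  assert(COUNT_OF(toy_names) == S_TABLE_SIZE);
  assert(COUNT_OF(toy_lengths) == S_TABLE_SIZE);
  assert(toy_names[S_ALLANY][0] == 'A' && toy_names[S_ALLANY][1] == 'l');
}
```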
@ -879,7 +890,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
|
||||
ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39,
|
||||
ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49,
|
||||
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
|
||||
ERR60, ERR61, ERR62, ERR63 };
|
||||
ERR60, ERR61, ERR62, ERR63, ERR64 };
|
||||
|
||||
/* The real format of the start of the pcre block; the index of names and the
|
||||
code vector run on as long as necessary after the end. We store an explicit
|
||||
@@ -1004,6 +1015,7 @@ typedef struct match_data {
 BOOL notbol; /* NOTBOL flag */
 BOOL noteol; /* NOTEOL flag */
 BOOL utf8; /* UTF8 flag */
+BOOL jscript_compat; /* JAVASCRIPT_COMPAT flag */
 BOOL endonly; /* Dollar not before final \n */
 BOOL notempty; /* Empty string match not wanted */
 BOOL partial; /* PARTIAL flag */
@@ -215,6 +215,13 @@ do
 tcode += 1 + LINK_SIZE;
 break;

+/* SKIPZERO skips the bracket. */
+
+case OP_SKIPZERO:
+do tcode += GET(tcode,1); while (*tcode == OP_ALT);
+tcode += 1 + LINK_SIZE;
+break;
+
 /* Single-char * or ? sets the bit and tries the next item */

 case OP_STAR:
@@ -339,6 +346,7 @@ do
 switch(tcode[1])
 {
 case OP_ANY:
+case OP_ALLANY:
 return SSB_FAIL;

 case OP_NOT_DIGIT:
@ -124,7 +124,8 @@ static const int eint[] = {
|
||||
REG_BADPAT, /* (?+ or (?- must be followed by a non-zero number */
|
||||
REG_BADPAT, /* number is too big */
|
||||
REG_BADPAT, /* subpattern name expected */
|
||||
REG_BADPAT /* digit expected after (?+ */
|
||||
REG_BADPAT, /* digit expected after (?+ */
|
||||
REG_BADPAT /* ] is an invalid data character in JavaScript compatibility mode */
|
||||
};
|
||||
|
||||
/* Table of texts corresponding to POSIX error codes */
|
||||
@ -261,7 +262,7 @@ PCREPOSIX_EXP_DEFN int
|
||||
regexec(const regex_t *preg, const char *string, size_t nmatch,
|
||||
regmatch_t pmatch[], int eflags)
|
||||
{
|
||||
int rc;
|
||||
int rc, so, eo;
|
||||
int options = 0;
|
||||
int *ovector = NULL;
|
||||
int small_ovector[POSIX_MALLOC_THRESHOLD * 3];
|
||||
@@ -294,7 +295,23 @@ else if (nmatch > 0)
 }
 }

-rc = pcre_exec((const pcre *)preg->re_pcre, NULL, string, (int)strlen(string),
+/* REG_STARTEND is a BSD extension, to allow for non-NUL-terminated strings.
+The man page from OS X says "REG_STARTEND affects only the location of the
+string, not how it is matched". That is why the "so" value is used to bump the
+start location rather than being passed as a PCRE "starting offset". */
+
+if ((eflags & REG_STARTEND) != 0)
+{
+so = pmatch[0].rm_so;
+eo = pmatch[0].rm_eo;
+}
+else
+{
+so = 0;
+eo = strlen(string);
+}
+
+rc = pcre_exec((const pcre *)preg->re_pcre, NULL, string + so, (eo - so),
 0, options, ovector, nmatch * 3);

 if (rc == 0) rc = nmatch; /* All captured slots were filled in */
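The REG_STARTEND handling in regexec() reduces to picking a start pointer and a length before the matcher is called. The sketch below isolates that choice so it can be read without the rest of the function. `subject_window`, `toy_regmatch_t` and `TOY_REG_STARTEND` are hypothetical stand-ins for the real pcreposix types and flag, and no PCRE call is made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_REG_STARTEND 0x0080   /* stand-in for the flag added in this diff */

typedef struct { int rm_so; int rm_eo; } toy_regmatch_t;   /* stand-in type */

/* Return the start of the subject to match and store its length. With the
   flag set, the window comes from pmatch[0]; otherwise it is the whole
   NUL-terminated string. */
static const char *subject_window(const char *string,
                                  const toy_regmatch_t *pmatch,
                                  int eflags, int *length)
{
  int so, eo;
  assert(string != NULL && length != NULL);
  if ((eflags & TOY_REG_STARTEND) != 0)
    {
    so = pmatch[0].rm_so;
    eo = pmatch[0].rm_eo;
    }
  else
    {
    so = 0;
    eo = (int)strlen(string);
    }
  *length = eo - so;
  return string + so;
}
```

Because the start pointer itself is moved, any offsets a matcher reports for this window are relative to the returned pointer; a caller of this sketch that needs positions in the original string would add `rm_so` back.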
@@ -59,6 +59,7 @@ extern "C" {
 #define REG_DOTALL 0x0010 /* NOT defined by POSIX. */
 #define REG_NOSUB 0x0020
 #define REG_UTF8 0x0040 /* NOT defined by POSIX. */
+#define REG_STARTEND 0x0080 /* BSD feature: pass subject string by so,eo */

 /* This is not used by PCRE, but by defining it we make it easier
 to slot PCRE into existing programs that make POSIX calls. */
|
135
ext/pcre/pcrelib/testdata/testinput2
vendored
@ -2589,4 +2589,139 @@ a random value. /Ix
|
||||
|
||||
/[[:a\dz:]]/
|
||||
|
||||
/^(?<name>a|b\g<name>c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(?<name>a|b\g'name'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g<1>c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g'1'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/^(a|b\g'-1'c)/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/(^(a|b\g<-1>c))/
|
||||
aaaa
|
||||
bacxxx
|
||||
bbaccxxx
|
||||
bbbacccxx
|
||||
|
||||
/(^(a|b\g<-1'c))/
|
||||
|
||||
/(^(a|b\g{-1}))/
|
||||
bacxxx
|
||||
|
||||
/(?-i:\g<name>)(?i:(?<name>a))/
|
||||
XaaX
|
||||
XAAX
|
||||
|
||||
/(?i:\g<name>)(?-i:(?<name>a))/
|
||||
XaaX
|
||||
** Failers
|
||||
XAAX
|
||||
|
||||
/(?-i:\g<+1>)(?i:(a))/
|
||||
XaaX
|
||||
XAAX
|
||||
|
||||
/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
|
||||
|
||||
/(?<n>a|b|c)\g<n>*/
|
||||
abc
|
||||
accccbbb
|
||||
|
||||
/^(?+1)(?<a>x|y){0}z/
|
||||
xzxx
|
||||
yzyy
|
||||
** Failers
|
||||
xxz
|
||||
|
||||
/(\3)(\1)(a)/
|
||||
cat
|
||||
|
||||
/(\3)(\1)(a)/<JS>
|
||||
cat
|
||||
|
||||
/TA]/
|
||||
The ACTA] comes
|
||||
|
||||
/TA]/<JS>
|
||||
The ACTA] comes
|
||||
|
||||
/(?2)[]a()b](abc)/
|
||||
abcbabc
|
||||
|
||||
/(?2)[^]a()b](abc)/
|
||||
abcbabc
|
||||
|
||||
/(?1)[]a()b](abc)/
|
||||
abcbabc
|
||||
** Failers
|
||||
abcXabc
|
||||
|
||||
/(?1)[^]a()b](abc)/
|
||||
abcXabc
|
||||
** Failers
|
||||
abcbabc
|
||||
|
||||
/(?2)[]a()b](abc)(xyz)/
|
||||
xyzbabcxyz
|
||||
|
||||
/(?&N)[]a(?<N>)](?<M>abc)/
|
||||
abc<abc
|
||||
|
||||
/(?&N)[]a(?<N>)](abc)/
|
||||
abc<abc
|
||||
|
||||
/a[]b/
|
||||
|
||||
/a[^]b/
|
||||
|
||||
/a[]b/<JS>
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[]+b/<JS>
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[]*+b/<JS>
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[^]b/<JS>
|
||||
aXb
|
||||
a\nb
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[^]+b/<JS>
|
||||
aXb
|
||||
a\nX\nXb
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a(?!)+b/
|
||||
|
||||
/a(*FAIL)+b/
|
||||
|
||||
/ End of testinput2 /
|
||||
|
12
ext/pcre/pcrelib/testdata/testinput5
vendored
@ -461,4 +461,16 @@ can't tell the difference.) --/
|
||||
|
||||
/[[:a\x{100}b:]]/8
|
||||
|
||||
/a[^]b/<JS>8
|
||||
a\x{1234}b
|
||||
a\nb
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[^]+b/<JS>8
|
||||
aXb
|
||||
a\nX\nX\x{1234}b
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/ End of testinput5 /
|
||||
|
27
ext/pcre/pcrelib/testdata/testinput7
vendored
@ -4364,5 +4364,32 @@
|
||||
a\r\r\r\r\rb
|
||||
a\x85\85b\<bsr_anycrlf>
|
||||
a\x0b\0bb\<bsr_anycrlf>
|
||||
|
||||
/a(?!)|\wbc/
|
||||
abc
|
||||
|
||||
/a[]b/<JS>
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[]+b/<JS>
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[]*+b/<JS>
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[^]b/<JS>
|
||||
aXb
|
||||
a\nb
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/a[^]+b/<JS>
|
||||
aXb
|
||||
a\nX\nXb
|
||||
** Failers
|
||||
ab
|
||||
|
||||
/ End of testinput7 /
|
||||
|
4
ext/pcre/pcrelib/testdata/testoutput10
vendored
@ -21,7 +21,7 @@ Memory allocation (code space): 25
|
||||
------------------------------------------------------------------
|
||||
0 21 Bra
|
||||
3 9 CBra 1
|
||||
8 Any*
|
||||
8 AllAny*
|
||||
10 X
|
||||
12 6 Alt
|
||||
15 ^
|
||||
@ -37,7 +37,7 @@ Memory allocation (code space): 29
|
||||
0 25 Bra
|
||||
3 9 Bra
|
||||
6 04 Opt
|
||||
8 Any*
|
||||
8 AllAny*
|
||||
10 X
|
||||
12 8 Alt
|
||||
15 04 Opt
|
||||
|
268
ext/pcre/pcrelib/testdata/testoutput2
vendored
@ -1126,7 +1126,7 @@ Need char = 'X'
|
||||
/.*X/IDZs
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
Any*
|
||||
AllAny*
|
||||
X
|
||||
Ket
|
||||
End
|
||||
@ -1160,7 +1160,7 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
CBra 1
|
||||
Any*
|
||||
AllAny*
|
||||
X
|
||||
Alt
|
||||
^
|
||||
@ -1179,7 +1179,7 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Bra
|
||||
CBra 1
|
||||
Any*
|
||||
AllAny*
|
||||
X
|
||||
Alt
|
||||
^
|
||||
@ -1199,7 +1199,7 @@ No need char
|
||||
Bra
|
||||
Bra
|
||||
04 Opt
|
||||
Any*
|
||||
AllAny*
|
||||
X
|
||||
Alt
|
||||
04 Opt
|
||||
@ -1212,8 +1212,8 @@ No need char
|
||||
------------------------------------------------------------------
|
||||
Capturing subpattern count = 0
|
||||
Partial matching not supported
|
||||
No options
|
||||
First char at start or follows newline
|
||||
Options: anchored
|
||||
No first char
|
||||
No need char
|
||||
|
||||
/\Biss\B/I+
|
||||
@ -8074,13 +8074,13 @@ No match
|
||||
Failed: reference to non-existent subpattern at offset 7
|
||||
|
||||
/^(a)\g/
|
||||
Failed: \g is not followed by a braced name or an optionally braced non-zero number at offset 5
|
||||
Failed: a numbered reference must not be zero at offset 5
|
||||
|
||||
/^(a)\g{0}/
|
||||
Failed: \g is not followed by a braced name or an optionally braced non-zero number at offset 7
|
||||
Failed: a numbered reference must not be zero at offset 8
|
||||
|
||||
/^(a)\g{3/
|
||||
Failed: \g is not followed by a braced name or an optionally braced non-zero number at offset 8
|
||||
Failed: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number at offset 8
|
||||
|
||||
/^(a)\g{4a}/
|
||||
Failed: reference to non-existent subpattern at offset 9
|
||||
@@ -8217,13 +8217,13 @@ No match
|
||||
No match
|
||||
|
||||
/x(?-0)y/
|
||||
Failed: (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number at offset 5
|
||||
Failed: a numbered reference must not be zero at offset 5
|
||||
|
||||
/x(?-1)y/
|
||||
Failed: reference to non-existent subpattern at offset 5
|
||||
|
||||
/x(?+0)y/
|
||||
Failed: (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number at offset 5
|
||||
Failed: a numbered reference must not be zero at offset 5
|
||||
|
||||
/x(?+1)y/
|
||||
Failed: reference to non-existent subpattern at offset 5
|
||||
@@ -9385,4 +9385,250 @@ Failed: unknown POSIX class name at offset 6
|
||||
/[[:a\dz:]]/
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/^(?<name>a|b\g<name>c)/
|
||||
aaaa
|
||||
0: a
|
||||
1: a
|
||||
bacxxx
|
||||
0: bac
|
||||
1: bac
|
||||
bbaccxxx
|
||||
0: bbacc
|
||||
1: bbacc
|
||||
bbbacccxx
|
||||
0: bbbaccc
|
||||
1: bbbaccc
|
||||
|
||||
/^(?<name>a|b\g'name'c)/
|
||||
aaaa
|
||||
0: a
|
||||
1: a
|
||||
bacxxx
|
||||
0: bac
|
||||
1: bac
|
||||
bbaccxxx
|
||||
0: bbacc
|
||||
1: bbacc
|
||||
bbbacccxx
|
||||
0: bbbaccc
|
||||
1: bbbaccc
|
||||
|
||||
/^(a|b\g<1>c)/
|
||||
aaaa
|
||||
0: a
|
||||
1: a
|
||||
bacxxx
|
||||
0: bac
|
||||
1: bac
|
||||
bbaccxxx
|
||||
0: bbacc
|
||||
1: bbacc
|
||||
bbbacccxx
|
||||
0: bbbaccc
|
||||
1: bbbaccc
|
||||
|
||||
/^(a|b\g'1'c)/
|
||||
aaaa
|
||||
0: a
|
||||
1: a
|
||||
bacxxx
|
||||
0: bac
|
||||
1: bac
|
||||
bbaccxxx
|
||||
0: bbacc
|
||||
1: bbacc
|
||||
bbbacccxx
|
||||
0: bbbaccc
|
||||
1: bbbaccc
|
||||
|
||||
/^(a|b\g'-1'c)/
|
||||
aaaa
|
||||
0: a
|
||||
1: a
|
||||
bacxxx
|
||||
0: bac
|
||||
1: bac
|
||||
bbaccxxx
|
||||
0: bbacc
|
||||
1: bbacc
|
||||
bbbacccxx
|
||||
0: bbbaccc
|
||||
1: bbbaccc
|
||||
|
||||
/(^(a|b\g<-1>c))/
|
||||
aaaa
|
||||
0: a
|
||||
1: a
|
||||
2: a
|
||||
bacxxx
|
||||
0: bac
|
||||
1: bac
|
||||
2: bac
|
||||
bbaccxxx
|
||||
0: bbacc
|
||||
1: bbacc
|
||||
2: bbacc
|
||||
bbbacccxx
|
||||
0: bbbaccc
|
||||
1: bbbaccc
|
||||
2: bbbaccc
|
||||
|
||||
/(^(a|b\g<-1'c))/
|
||||
Failed: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number at offset 15
|
||||
|
||||
/(^(a|b\g{-1}))/
|
||||
bacxxx
|
||||
No match
|
||||
|
||||
/(?-i:\g<name>)(?i:(?<name>a))/
|
||||
XaaX
|
||||
0: aa
|
||||
1: a
|
||||
XAAX
|
||||
0: AA
|
||||
1: A
|
||||
|
||||
/(?i:\g<name>)(?-i:(?<name>a))/
|
||||
XaaX
|
||||
0: aa
|
||||
1: a
|
||||
** Failers
|
||||
No match
|
||||
XAAX
|
||||
No match
|
||||
|
||||
/(?-i:\g<+1>)(?i:(a))/
|
||||
XaaX
|
||||
0: aa
|
||||
1: a
|
||||
XAAX
|
||||
0: AA
|
||||
1: A
|
||||
|
||||
/(?=(?<regex>(?#simplesyntax)\$(?<name>[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?<index>[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g<name>)\]|->\g<name>(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g<name>(?<indices>\[(?:\g<index>|'(?:\\.|[^'\\])*'|"(?:\g<regex>|\\.|[^"\\])*")\])?|\g<complex>|\$\{\g<complex>\})\}|(?#complexsyntax)\{(?<complex>\$(?<segment>\g<name>(\g<indices>*|\(.*?\))?)(?:->\g<segment>)*|\$\g<complex>|\$\{\g<complex>\})\}))\{/
|
||||
|
||||
/(?<n>a|b|c)\g<n>*/
|
||||
abc
|
||||
0: abc
|
||||
1: a
|
||||
accccbbb
|
||||
0: accccbbb
|
||||
1: a
|
||||
|
||||
/^(?+1)(?<a>x|y){0}z/
|
||||
xzxx
|
||||
0: xz
|
||||
1: <unset>
|
||||
yzyy
|
||||
0: yz
|
||||
1: <unset>
|
||||
** Failers
|
||||
No match
|
||||
xxz
|
||||
No match
|
||||
|
||||
/(\3)(\1)(a)/
|
||||
cat
|
||||
No match
|
||||
|
||||
/(\3)(\1)(a)/<JS>
|
||||
cat
|
||||
0: a
|
||||
1:
|
||||
2:
|
||||
3: a
|
||||
|
||||
/TA]/
|
||||
The ACTA] comes
|
||||
0: TA]
|
||||
|
||||
/TA]/<JS>
|
||||
Failed: ] is an invalid data character in JavaScript compatibility mode at offset 2
|
||||
|
||||
/(?2)[]a()b](abc)/
|
||||
Failed: reference to non-existent subpattern at offset 3
|
||||
|
||||
/(?2)[^]a()b](abc)/
|
||||
Failed: reference to non-existent subpattern at offset 3
|
||||
|
||||
/(?1)[]a()b](abc)/
|
||||
abcbabc
|
||||
0: abcbabc
|
||||
1: abc
|
||||
** Failers
|
||||
No match
|
||||
abcXabc
|
||||
No match
|
||||
|
||||
/(?1)[^]a()b](abc)/
|
||||
abcXabc
|
||||
0: abcXabc
|
||||
1: abc
|
||||
** Failers
|
||||
No match
|
||||
abcbabc
|
||||
No match
|
||||
|
||||
/(?2)[]a()b](abc)(xyz)/
|
||||
xyzbabcxyz
|
||||
0: xyzbabcxyz
|
||||
1: abc
|
||||
2: xyz
|
||||
|
||||
/(?&N)[]a(?<N>)](?<M>abc)/
|
||||
Failed: reference to non-existent subpattern at offset 4
|
||||
|
||||
/(?&N)[]a(?<N>)](abc)/
|
||||
Failed: reference to non-existent subpattern at offset 4
|
||||
|
||||
/a[]b/
|
||||
Failed: missing terminating ] for character class at offset 4
|
||||
|
||||
/a[^]b/
|
||||
Failed: missing terminating ] for character class at offset 5
|
||||
|
||||
/a[]b/<JS>
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[]+b/<JS>
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[]*+b/<JS>
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[^]b/<JS>
|
||||
aXb
|
||||
0: aXb
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[^]+b/<JS>
|
||||
aXb
|
||||
0: aXb
|
||||
a\nX\nXb
|
||||
0: a\x0aX\x0aXb
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a(?!)+b/
|
||||
Failed: nothing to repeat at offset 5
|
||||
|
||||
/a(*FAIL)+b/
|
||||
Failed: nothing to repeat at offset 8
|
||||
|
||||
/ End of testinput2 /
|
||||
|
20
ext/pcre/pcrelib/testdata/testoutput5
vendored
@@ -1608,4 +1608,24 @@ No match
|
||||
/[[:a\x{100}b:]]/8
|
||||
Failed: unknown POSIX class name at offset 3
|
||||
|
||||
/a[^]b/<JS>8
|
||||
a\x{1234}b
|
||||
0: a\x{1234}b
|
||||
a\nb
|
||||
0: a\x{0a}b
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[^]+b/<JS>8
|
||||
aXb
|
||||
0: aXb
|
||||
a\nX\nX\x{1234}b
|
||||
0: a\x{0a}X\x{0a}X\x{1234}b
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/ End of testinput5 /
|
||||
|
42
ext/pcre/pcrelib/testdata/testoutput7
vendored
@@ -7211,5 +7211,47 @@ No match
|
||||
No match
|
||||
a\x0b\0bb\<bsr_anycrlf>
|
||||
No match
|
||||
|
||||
/a(?!)|\wbc/
|
||||
abc
|
||||
0: abc
|
||||
|
||||
/a[]b/<JS>
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[]+b/<JS>
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[]*+b/<JS>
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[^]b/<JS>
|
||||
aXb
|
||||
0: aXb
|
||||
a\nb
|
||||
0: a\x0ab
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/a[^]+b/<JS>
|
||||
aXb
|
||||
0: aXb
|
||||
a\nX\nXb
|
||||
0: a\x0aX\x0aXb
|
||||
** Failers
|
||||
No match
|
||||
ab
|
||||
No match
|
||||
|
||||
/ End of testinput7 /
|
||||
|
@@ -17,7 +17,7 @@ typedef struct cnode {
|
||||
|
||||
#define f0_scriptmask 0xff000000 /* Mask for script field */
|
||||
#define f0_scriptshift 24 /* Shift for script value */
|
||||
#define f0_rangeflag 0x00f00000 /* Flag for a range item */
|
||||
#define f0_rangeflag 0x00800000 /* Flag for a range item */
|
||||
#define f0_charmask 0x001fffff /* Mask for code point value */
|
||||
|
||||
/* Things for the f1 field */
|
||||
|