Commit Graph

684 Commits

Author SHA1 Message Date
Philip.Hazel 8057c3c8b9 Renamed dftables as pcre2_dftables and enable it to write the tables in binary.
Update documentation about character tables.
2020-03-20 18:09:59 +00:00
Philip.Hazel 3155a6951f Fix bugs in new UCP casing code for back references and characters with more
than 2 cases.
2020-02-26 16:53:39 +00:00
Philip.Hazel 3be538015b Fix bad lookbehind compilation when preceded by a DEFINE group. 2020-02-24 17:29:00 +00:00
Philip.Hazel f50ee03f5d Fix bug in UTF-16 checker returning wrong offset for missing low surrogate. 2020-02-24 15:39:56 +00:00
Philip.Hazel 4a7dfab0ec Unicode upper/lower casing is now used when UCP is set, even if UTF is not set.
This is not yet documented, and is not yet implemented in JIT.
2020-02-23 16:40:05 +00:00
Zoltán Herczeg c21bd97754 Fix a crash which occurs when the character type of an invalid UTF character is decoded in JIT. 2020-02-20 07:42:47 +00:00
Zoltán Herczeg 697cf5f602 Fix control verb chain restoration issue in JIT. 2020-02-10 10:18:01 +00:00
Philip.Hazel b040e2e1cd Limit function recursion in pcre2_study to avoid stack overflow issues. 2020-01-27 10:28:19 +00:00
Philip.Hazel 3a6b4948d1 Fix bug in processing (?(DEFINE)...) within lookbehind assertions. 2020-01-26 15:31:27 +00:00
Philip.Hazel 9e960f5465 Ensure a newline after the final line in a file is output by pcre2grep. 2020-01-25 15:50:44 +00:00
Zoltán Herczeg 09984bb0e4 The JIT stack should be freed when the low-level stack allocation fails. 2020-01-24 08:28:23 +00:00
Philip.Hazel e8d70e2459 Implement PCRE2_SUBSTITUTE_REPLACEMENT_ONLY. 2020-01-22 17:50:12 +00:00
Zoltán Herczeg bf4cd8212f Fix *THEN verbs in lookahead assertions in JIT. 2020-01-11 15:28:15 +00:00
Philip.Hazel 5ba5230b82 Allow real repetition of assertions. 2020-01-01 12:07:02 +00:00
Philip.Hazel ac4ab7186d Add (?* and (?<* synonyms for non-atomic lookarounds. 2019-12-28 13:53:59 +00:00
Philip.Hazel d170829b26 Implement PCRE2_SUBSTITUTE_MATCHED. 2019-12-27 13:35:17 +00:00
Philip.Hazel 777582d4de Avoid some VS compiler warnings. 2019-12-26 15:10:26 +00:00
Philip.Hazel f3fd8b18cb Implement PCRE2_SUBSTITUTE_LITERAL. 2019-12-26 14:53:24 +00:00
Philip.Hazel 0a2033f0f7 Remove atomic restriction on capture groups containing recursive back
references, as since 10.30 it has been unnecessary.
2019-12-18 16:16:12 +00:00
Zoltán Herczeg 880aac5dda Fix the too early access of the fields of a compiled pattern in JIT. 2019-12-07 16:00:53 +00:00
Zoltán Herczeg 2632526c67 Fix ARMv5 JIT improper handling of labels right after a constant pool. 2019-11-29 11:03:10 +00:00
Zoltán Herczeg f5286d8f56 Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT. 2019-11-28 11:35:08 +00:00
Philip.Hazel add4db4c87 Final file tidies for 10.34 2019-11-21 16:31:08 +00:00
Zoltán Herczeg af45f41fbb Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. 2019-11-19 12:25:32 +00:00
Philip.Hazel 3c869816ac Fix sometimes failing caseless non-ASCII matching in assertion. 2019-11-16 17:30:07 +00:00
Philip.Hazel 9323fa32b2 Documentation update 2019-11-08 16:04:43 +00:00
Philip.Hazel 8855b0efe1 File tidies for 10.34-RC2. 2019-11-06 16:51:31 +00:00
Philip.Hazel ae9208ab7b Source tidies (trailing spaces) etc. for 10.34-RC1. 2019-10-17 16:39:38 +00:00
Philip.Hazel 90ae0ae01e Fix minor test issues and tidies/updates for 10.34-RC1 testing release. 2019-10-15 15:23:31 +00:00
Philip.Hazel 2a0faa2114 Ensure regexec is thread safe to avoid sanitizer warnings. 2019-10-15 10:46:36 +00:00
Philip.Hazel e413f3147c Optimize certain starting code unit bit maps into a single starting code unit. 2019-09-13 17:02:06 +00:00
Philip.Hazel d917899be5 Improve starting-byte bit map for UTF-8 patterns with wide characters in
classes.
2019-09-10 15:38:42 +00:00
Philip.Hazel bf15267c30 Optimize classes such as [Aa] to be a single caseless character. 2019-09-09 17:00:19 +00:00
Zoltán Herczeg aae44b83f8 Add underflow check in JIT. 2019-09-09 07:12:00 +00:00
Philip.Hazel b48aa469d6 Install .gitignore file to help those using svn via git. 2019-09-07 15:27:05 +00:00
Philip.Hazel 27d40c8ad8 When computing minimum length, don't scan subsequent branches if any branch in
a group has zero minimum length.
2019-09-07 15:16:10 +00:00
Philip.Hazel 7bbdc58513 Fix pessimizing optimization of start-of-match code units in the interpreters. 2019-09-06 16:08:45 +00:00
Philip.Hazel 963b570fd0 Back off failed attempt to handle nested lookbehinds for estimating how much of
a partial match to retain for multi-segment matching. Document the current 
difficulty if the whole first segment cannot be retained.
2019-09-04 18:14:54 +00:00
Philip.Hazel 0970ae4195 Add the pcre2_maketables_free() function. 2019-09-03 14:16:07 +00:00
Philip.Hazel 71eb916d79 Fix allusedtext bug, rightmost consulted character incorrect in negative
lookaheads.
2019-08-10 11:34:50 +00:00
Philip.Hazel 59c7c5d100 Fix incorrect computation of group length when one branch exceeded 65535. 2019-08-03 08:30:40 +00:00
Philip.Hazel c0ed5a3ab3 Minor upgrade to pcre2test and comment in ucptest. 2019-07-30 17:59:42 +00:00
Philip.Hazel aff5a78056 Upgrade to Unicode 12.1.0 2019-07-29 15:32:36 +00:00
Philip.Hazel 9319b5bb83 Correct tables argument data type for pcre2_set_character_tables() and fix
documentation for pcre2_maketables().
2019-07-28 15:58:24 +00:00
Philip.Hazel 24c62fc0d0 (*ACCEPT) at start of branch was not recording "may match empty string". 2019-07-23 16:58:57 +00:00
Philip.Hazel 3572634086 More partial match tweaks. 2019-07-22 16:30:44 +00:00
Philip.Hazel c84a06c96e Update definition of partial match and fix \z and \Z (as documented). 2019-07-21 16:48:13 +00:00
Philip.Hazel bca9888a2c Implemented pcre2_get_match_data_size(). 2019-07-16 15:50:09 +00:00
Philip.Hazel 046c5cd21c Fix lookbehind within lookahead within lookbehind misbehaviour bug. 2019-07-16 15:06:21 +00:00
Philip.Hazel 620f3a1307 Implement non-atomic positive assertions. 2019-07-13 11:12:03 +00:00
Philip.Hazel 2e06fdcdc1 Check for integer overflow when computing lookbehind lengths. Fixes Clusterfuzz
issue 13656.
2019-07-04 17:01:53 +00:00
Philip.Hazel a5c601091e Give error for zero timing argument to pcre2test. 2019-07-03 17:15:37 +00:00
Philip.Hazel c0d0ee5365 Fix partial matching bug in pcre2_dfa_match(). 2019-06-26 16:13:28 +00:00
Philip.Hazel 434e3f7468 Make pcre2test show actual pre-match consulted characters for a partial match,
not the length of the longest lookbehind. Control this by "allusedtext".
2019-06-26 08:23:47 +00:00
Philip.Hazel d21f7daf9b Improve maximum lookbehind calculation for nested lookbehinds. 2019-06-25 15:40:42 +00:00
Philip.Hazel da5155fed3 Don't ignore {1}+ when it is applied to a parenthesized item. 2019-06-19 16:27:50 +00:00
Philip.Hazel ef79b978a6 Fix minimum length bug for patterns containing (*ACCEPT). 2019-06-18 16:07:43 +00:00
Philip.Hazel 1ebc2c50cc Another extension to minimum length calculation. 2019-06-17 16:26:44 +00:00
Philip.Hazel ead78198d1 Improve minimum length finder in the presence of back references when there are
multiple groups with the same number.
2019-06-16 15:37:45 +00:00
Philip.Hazel 0d1ab8515f Fix pcre2grep -o bug when ovector overflows; add option to adjust the limit;
raise the default limit; give error if -o requests an uncaptured parens.
2019-06-15 15:51:07 +00:00
Philip.Hazel 49f174ef78 Make pcre2_match() return (*MARK) names from successful conditional assertions,
as Perl and the JIT do.
2019-06-13 16:49:40 +00:00
Philip.Hazel 1f6b9097f4 Minor improvement to minimum length calculation. 2019-06-13 16:00:11 +00:00
Philip.Hazel 306f2b9c57 Allow (*ACCEPT) to be quantified. 2019-06-10 16:41:22 +00:00
Philip.Hazel d5dc4e0c33 Tweak limits on "must have" code unit searches (improves some performance). 2019-05-28 16:34:28 +00:00
Philip.Hazel 4f31de2866 Add support for invalid UTF-8 matching to pcre2grep. 2019-05-28 14:14:22 +00:00
Philip.Hazel 16c046ce50 Implement support for invalid UTF in the pcre2_match() interpreter. 2019-05-24 17:15:48 +00:00
Philip.Hazel e118e60a68 Fix crash when \X is used without UTF in JIT. 2019-05-13 16:26:17 +00:00
Zoltán Herczeg 274efb8ded Improved the invalid utf32 support of the JIT compiler. 2019-05-10 13:15:20 +00:00
Philip.Hazel 16de9003e5 Implement a check on the number of capturing parentheses, which for some reason
has never existed. This fixes ClusterFuzz issue 14376.
2019-04-22 12:39:38 +00:00
Philip.Hazel 4e4f273f07 Final file tidies for 10.33. 2019-04-16 15:34:27 +00:00
Philip.Hazel 95c9d011e3 Change a number of expressions like 1<<10 to 1u<<10. 2019-04-12 14:40:27 +00:00
Zoltán Herczeg 590bc16842 Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available. 2019-03-25 14:10:24 +00:00
Philip.Hazel e85de98d0a Fix crash in pcre2_substitute() with NULL match context. 2019-03-11 17:29:08 +00:00
Philip.Hazel 7375089fa5 More file tidies for 10.33-RC1 2019-03-04 18:07:04 +00:00
Philip.Hazel 02ff543f9c Final file tidies for 10.33-RC1 2019-03-04 18:04:44 +00:00
Philip.Hazel 473d8f95d7 Fix --enable-jit=auto for out-of-tree builds. 2019-03-01 16:19:49 +00:00
Philip.Hazel 4fd8932e83 Try to fix CMake old policy warning issue. 2019-02-16 11:58:37 +00:00
Zoltán Herczeg 1b95f98f95 Compile invalid UTF check in JIT test when only pcre32 is enabled. 2019-02-14 07:33:57 +00:00
Philip.Hazel 255f5e741b Compile \p{Any} the same as . in DOTALL mode, to benefit from auto-anchoring. 2019-02-13 17:30:24 +00:00
Philip.Hazel 8c8deae8eb Implement PCRE2_EXTRA_ALT_BSUX to support ECMAscript 6's \u{hhh..} syntax. 2019-02-12 17:50:19 +00:00
Philip.Hazel 45959f1eec Fix small bug in new Unicode group name logic. 2019-02-07 17:29:50 +00:00
Philip.Hazel d7b10a57d1 Allow non-ASCII in group names when UTF is set; revise group naming terminology
in documentation to use "capture group", as Perl does.
2019-02-06 18:11:36 +00:00
Philip.Hazel ae913fbee7 Update POSIX wrapper to use macros in the .h file, but also have the POSIX
function names in the library.
2019-01-30 16:11:16 +00:00
Philip.Hazel 86349f8814 Fix bug in VERSION conditional test in DFA matching. 2019-01-29 14:34:59 +00:00
Zoltán Herczeg d38c7f7e8d Fix word boundary in JIT compiler. Patch by Mike Munday. 2019-01-17 11:47:59 +00:00
Philip.Hazel 7de013bac3 Fix issues with BAD_ESCAPE_IS_LITERAL in character classes. 2019-01-04 16:41:32 +00:00
Philip.Hazel 9938684b7b Cast to get rid of compiler warning. 2018-12-14 16:02:29 +00:00
Philip.Hazel ed63958dad Make RunTest check stack settability using the -bigstack value. 2018-12-07 16:32:05 +00:00
Philip.Hazel 0448b486e9 Redirect stderr in RunGrepTest instead of appending to testtrygrep from two
different file descriptors, because the latter doesn't always work as expected.
2018-12-06 17:13:41 +00:00
Philip.Hazel 8f1727af98 Cut out test of NUL characters in RunGrepTest for all OS except Linux, as it
doesn't work for *BSD as well as for Solaris and MacOS (which were already cut 
out).
2018-12-06 17:05:06 +00:00
Philip.Hazel 0b64d9cfca Fix non-recognition of anchoring when preceded by (*MARK) etc. 2018-11-27 16:00:58 +00:00
Zoltán Herczeg 57f1eca640 Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel. 2018-11-25 17:11:52 +00:00
Philip.Hazel 8187224514 Add VMS support for pcre2grep callout of an external program. 2018-11-24 16:31:10 +00:00
Philip.Hazel cd73c9319e Fix two instances of <= 0 being applied to unsigned integers. 2018-11-17 16:59:39 +00:00
Philip.Hazel 0ad7ff1549 Add --disable-pcre2grep-callout-fork configuration setting. 2018-11-17 16:45:57 +00:00
Philip.Hazel 149af0e21b Implement --disable-percent-zt to avoid %zu and %td even if the environment
claims to be C99 or greater.
2018-11-15 18:09:02 +00:00
Philip.Hazel 19c50b9d41 Unconditionally use inttypes.h instead of trying for stdint.h (simplification)
and remove the now unnecessary inclusion in pcre2_internal.h.
2018-11-14 16:59:19 +00:00
Philip.Hazel 900f457222 Update VMS-specific code in pcre2test, on the advice of a VMS user. 2018-11-09 18:10:25 +00:00
Philip.Hazel 66cd7df514 Add pcre2_jit_free_unused_memory() to pcre2grep, for tidiness. 2018-10-28 17:27:48 +00:00
Philip.Hazel 87a9887e6e Add "kibibytes" to the output of pcre2test -C to show the units of the heap
limit.
2018-10-22 16:56:11 +00:00
Philip.Hazel 951bc4b9ff Fix heap limit checking overflow bug in pcre2_dfa_match(). 2018-10-22 16:47:55 +00:00
Philip.Hazel 996892434f Fix zero-repeated subroutine call at start of pattern bug, which recorded an
incorrect first code unit.
2018-10-20 09:28:02 +00:00
Philip.Hazel 8a0dd8955a Set subject field in match data to NULL after failed match. 2018-10-19 15:31:16 +00:00
Philip.Hazel f90ce1a333 Implement PCRE2_COPY_MATCHED_SUBJECT. 2018-10-17 08:33:38 +00:00
Philip.Hazel 971f885277 Fix typos in code for alphabetic ranges in EBCDIC environments. 2018-10-15 11:01:24 +00:00
Philip.Hazel 0fc5cda13b Documentation and tests update for script runs. 2018-10-12 17:02:34 +00:00
Philip.Hazel 4e7a204d18 Update Script Run code to use the Script Extension property instead of the
Script property.
2018-10-09 16:42:21 +00:00
Philip.Hazel 866750fd53 Basic "script run" implementation. Not yet complete, and not yet documented. 2018-10-02 15:25:58 +00:00
Philip.Hazel f26b0b0bae Implement Perl 5.28's alphabetic lookaround syntax, e.g. (*pla:...) and also
(*atomic:...).
2018-09-24 16:23:53 +00:00
Philip.Hazel 69254c77f1 Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF 2018-09-21 16:59:48 +00:00
Zoltán Herczeg 8800191109 Fix an xclass matching issue in JIT. 2018-09-21 07:24:34 +00:00
Philip.Hazel 992e1fad44 Provide alternative POSIX names. 2018-09-19 16:33:09 +00:00
Philip.Hazel a69267246f Implement callouts from pcre2_substitute(). 2018-09-18 16:31:30 +00:00
Zoltán Herczeg 80adf9d165 Fix subject buffer overread in JIT. Found by Yunho Kim. 2018-09-18 10:19:14 +00:00
Philip.Hazel 3fce7c75e9 Add "allvector" to pcre2test. 2018-09-15 17:10:39 +00:00
Philip.Hazel bf3c7c68ec Final file tidies for 10.32 2018-09-11 14:27:39 +00:00
Philip.Hazel ab30606b01 Fix small bug in pcre2grep (no effect other than a sanitizer warning). 2018-09-10 17:34:19 +00:00
Philip.Hazel 80c57b59f6 Minor code fix to avoid static analyzer complaint. 2018-09-06 15:59:11 +00:00
Philip.Hazel bfad956b34 Treat empty-string-matching repeated conditionals the same as ordinary ones
when checking for an anchored pattern.
2018-09-03 15:20:40 +00:00
Philip.Hazel 59c2175ed9 Fix anchoring bug in conditionals with only one branch. 2018-09-02 16:53:29 +00:00
Philip.Hazel 50f0de6015 Lock out \N{U+hhhh} in non-UTF (non-Unicode) modes. 2018-09-02 16:03:27 +00:00
Philip.Hazel a8f00b314b Fix typo in Makefile.am, which caused testoutput8-16-4 to be omitted from
tarballs.
2018-08-29 08:26:29 +00:00
Philip.Hazel 1c6f2fc972 Tidy unnecessarily complicated macros in escapes table. 2018-08-19 16:54:41 +00:00
Philip.Hazel 91715304cb Remove unused character flag ctype_meta, no longer used. 2018-08-19 15:44:06 +00:00
Philip.Hazel 6e6bb40a3d Fix bad auto-possessification of certain types of class. 2018-08-17 14:45:35 +00:00
Philip.Hazel 91a6a3a521 Zero pointers in serialized patterns, for consistency. 2018-08-15 18:03:29 +00:00
Philip.Hazel 392974a0cb File tidies and documentation update for 10.32-RC1 Release Candidate. 2018-08-13 11:57:09 +00:00
Philip.Hazel 1a8cc3dab6 Make bcopy() emulation of memmove() work properly. 2018-08-10 16:27:44 +00:00
Philip.Hazel 9332d4be69 Fix dynamic options changing bug. 2018-08-04 08:20:18 +00:00
Philip.Hazel b196143523 Make /x more Perl-compatible by recognizing all of Unicode's "Pattern White
Space" characters, not just the ASCII ones.
2018-08-03 09:38:36 +00:00
Philip.Hazel 6e245572b8 Add support for (?^) as now supported by Perl. 2018-07-28 16:23:24 +00:00
Philip.Hazel e9aa3c0a21 Add support for \N{U+dd...}, for ASCII and Unicode modes only. 2018-07-27 16:30:40 +00:00
Philip.Hazel 192b82cf6e Allow :NAME on (*ACCEPT), (*FAIL), and (*COMMIT) and fix bug with (*MARK)
followed by (*ACCEPT) in an assertion. More small updates to perltest.sh.
2018-07-21 14:34:51 +00:00
Philip.Hazel 635d04fbb7 Upgrade perltest.sh to support (some) #pattern modifiers. 2018-07-17 16:00:09 +00:00
Philip.Hazel 666e94cd59 Fixed atomic group backtracking bug. 2018-07-16 15:24:32 +00:00
Philip.Hazel 937617f343 Update to Unicode 11.0.0 2018-07-07 16:10:29 +00:00
Philip.Hazel 50aa69657e Fix bug in VERSION number reading. 2018-07-02 12:26:04 +00:00
Philip.Hazel b2294373d7 Ignore qualifiers on lookaheads within lookbehinds when checking for a fixed
length.
2018-07-02 11:23:45 +00:00
Philip.Hazel 1c79bdf36f Fix global search/replace in pcre2test and pcre2_substitute() when the pattern
matches an empty string, but never at the starting offset.
2018-07-02 10:54:03 +00:00
Philip.Hazel 374770c2e3 Increase stack size when linking pcre2test with MSVC. 2018-06-27 16:34:06 +00:00
Philip.Hazel 9de1a271a0 Remove previous patch, as it did not take account of read-only source
directories.
2018-06-22 15:04:01 +00:00
Philip.Hazel c5c9d9bacd Both make systems now delete src/{pcre2.h,config.h} before starting. The
existence of these files can confuse if building is happening in another 
directory.
2018-06-21 16:13:15 +00:00
Philip.Hazel 9d87fcb727 Patches for portability. 2018-06-20 17:05:31 +00:00
Philip.Hazel 7aaced3475 Make stdint.h an optional inclusion, in case it's not present in some systems.
Use inttypes.h instead if it exists.
2018-06-19 17:41:01 +00:00
Philip.Hazel b4aaf222d7 Undefine WIN32 for pcre2grep under Cygwin. 2018-06-19 16:27:42 +00:00
Philip.Hazel 8af671a36d Documentation update. 2018-06-18 16:49:12 +00:00
Philip.Hazel e75410a5d8 More typos and changes to "Kibibytes" for "Kilobytes". 2018-06-18 14:03:33 +00:00
Philip.Hazel fabea723cf Typos in documentation and comments noted by Jason Hood. 2018-06-17 14:13:28 +00:00
Philip.Hazel 3fb01b0443 Ensure all match limit tests set a limit, don't rely on the default. 2018-04-29 15:07:44 +00:00
Philip.Hazel fb15b37b2c Remove ctrl/Z from the input for test 6. 2018-04-28 16:05:48 +00:00