Commit Graph

1393 Commits

Author SHA1 Message Date
Zoltan Herczeg 6614b281bc
Implement script extension support in JIT. (#66)
Fix incorrect operator in GenerateUcd.py (modulo -> bitwise and)

Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-12-29 15:57:32 +00:00
Zoltan Herczeg afa4756d19
Rework script extension handling (#64)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-12-29 09:35:22 +00:00
Philip Hazel 7713f33e46 Add support for 4-character script abbreviations 2021-12-28 15:10:12 +00:00
Michael Kaufmann af2637ee5e
Fix parameter types in the pcre2serialize man page (#63) 2021-12-27 11:57:28 +00:00
Philip Hazel 98e7d70bc6 Refactor Python scripts for generating Unicode property data 2021-12-26 17:49:58 +00:00
Philip Hazel 321b559ed4 Ignore Python cache 2021-12-24 16:20:26 +00:00
Philip Hazel 16c8a84cce Arrange to distribute pcre2_ucptables.c 2021-12-23 16:13:45 +00:00
Philip Hazel 4514ddd2a2 Split generated tables from fixed tables 2021-12-22 16:55:30 +00:00
Philip Hazel 944f0e10a1 Documentation for script handling update 2021-12-22 15:02:26 +00:00
Philip Hazel b29732063b Revised script handling (see ChangeLog) 2021-12-21 16:11:30 +00:00
Philip Hazel 92d7cf1dd0 Very minor code speed up for maximizing character property matches 2021-12-17 12:30:05 +00:00
Philip Hazel 1d432ee3cf Do bidi synonyms properly 2021-12-15 11:48:23 +00:00
Philip Hazel 194a15315a Correct comment in test 2021-12-14 15:54:48 +00:00
Philip Hazel 1c41a5b815 Fix minor issues raised by Clang sanitize 2021-12-14 15:52:24 +00:00
Zoltan Herczeg 4243515033 JIT support for Bidi_Control and Bidi_Class 2021-12-13 07:04:19 +00:00
Philip Hazel 49b29f837d Add short synonyms for Bidi_Control and Bidi_Class 2021-12-10 16:32:10 +00:00
Philip Hazel 30abd0ac8d Documentation for Bidi_Control and Bidi_Class 2021-12-08 16:37:34 +00:00
Philip Hazel 0246c6bf64 Add support for Bidi_Control and Bidi_Class properties 2021-12-08 15:34:27 +00:00
Philip Hazel 823d4ac956 Add bidi class and control information to Unicode property data 2021-12-05 18:00:10 +00:00
Philip Hazel ba3d0edcbd Documentation update 2021-12-01 16:21:08 +00:00
Philip Hazel 4ef0c51d2b Interpret NULL pointer, zero length as an empty string for subjects and replacements. 2021-11-30 16:34:39 +00:00
Philip Hazel 7ab2769728 Check for NULL replacement in pcre2_substitute() 2021-11-28 17:19:17 +00:00
Philip Hazel 2a294ddadb Add check for NULL subject to POSIX regexec(). 2021-11-28 16:38:36 +00:00
Philip Hazel cb854a912e Add options for NULL pointers to pcre2test. 2021-11-28 16:22:24 +00:00
Philip Hazel 16dccbcb13 Update ChangeLog for latest patches 2021-11-27 16:54:14 +00:00
Carlo Marcelo Arenas Belón ae4e6261e5
match: avoid crash if subject NULL and PCRE2_ZERO_TERMINATED (#53)
* pcre2_match: avoid crash if subject NULL and PCRE2_ZERO_TERMINATED

When length of subject is PCRE2_ZERO_TERMINATED strlen is used
to calculate its size, which will trigger a crash if subject is
also NULL.

Move the NULL check ahead of the point where strlen would be used
on the subject, and make sure dependent variables are set after the
NULL validation as well.

While at it, fix a typo in a debug flag in the same file, which
is otherwise unrelated, and make sure the full section of constraint
checks can be identified clearly using the leading comment alone.

* pcre2_dfa_match: avoid crash if subject NULL and PCRE2_ZERO_TERMINATED

When length of subject is PCRE2_ZERO_TERMINATED strlen is used
to calculate its size, which will trigger a crash if subject is
also NULL.

Move the NULL check before the detection for subject sizes to
avoid this issue.

* pcre2_substitute: avoid crash if subject or replacement are NULL

The underlying pcre2_match() function will validate the subject if
needed, but will crash when length is PCRE2_ZERO_TERMINATED or if
subject == NULL and pcre2_match() is not being called because
match_data was provided.

The replacement parameter is missing NULL checks, and so currently
allows for an equivalent response to "" if rlength == 0.

Restrict all other cases to avoid strlen(NULL) crashes in the same
way that is done for subject, but also make sure to reject invalid
length values as early as possible.
2021-11-27 16:49:31 +00:00
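
The ordering fix described in ae4e6261e5 above can be illustrated with a minimal sketch. It is not the PCRE2 source: the names sketch_resolve_length, SKETCH_ZERO_TERMINATED and SKETCH_ERROR_NULL are hypothetical stand-ins, and the error value is arbitrary. The only point shown is that the pointer is validated before strlen() can touch it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the PCRE2 definitions. */
#define SKETCH_ZERO_TERMINATED ((size_t)-1)
#define SKETCH_ERROR_NULL      (-1)

/* Resolve the effective subject length.
   Returns 0 and stores the length in *out_length on success,
   or SKETCH_ERROR_NULL if subject is NULL. */
static int sketch_resolve_length(const char *subject, size_t length,
                                 size_t *out_length)
{
  assert(out_length != NULL);

  /* Validate the pointer first, so strlen() never sees NULL. */
  if (subject == NULL) return SKETCH_ERROR_NULL;

  /* Only now is it safe to measure a zero-terminated subject. */
  if (length == SKETCH_ZERO_TERMINATED) length = strlen(subject);

  *out_length = length;
  return 0;
}
```

Note that the later commit 4ef0c51d2b, listed earlier in this log, relaxes the rule so that a NULL pointer with zero length is treated as an empty string; the sketch reflects only the stricter behaviour described in this commit message.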
Carlo Marcelo Arenas Belón d24a1c9d31
cmake: avoid man3 glob post processing (#48)
It doesn't seem needed, and is apparently resulting in at least one
duplicated entry in the installation list that causes problems for
uninstalling.

Fixes: #46

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2021-11-27 16:41:06 +00:00
Carlo Marcelo Arenas Belón 055b7ce4a9
pcre2grep: remove JFRIEDL_DEBUG obsoleted code (#49)
Still uses the already obsoleted PCRE1 API

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2021-11-27 16:36:17 +00:00
Philip Hazel 4a8f5d104c Local updates consequent on documentation patches (PR#47). 2021-11-27 16:32:52 +00:00
Carlo Marcelo Arenas Belón 587b94277b
doc: formatting/typo fixes to documentation (#47)
* doc: fix incorrect use of JOIN and typo

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>

* doc: reformat of pcre2_substitute to align options

includes some rewording to fit better in an 80 char wide troff output.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>

* doc: update names to pcre2
2021-11-27 16:27:49 +00:00
Philip Hazel c8d31f1605 Update ChangeLog for GitHub #52 (adf76faa) 2021-11-26 17:37:10 +00:00
Carlo Marcelo Arenas Belón adf76faace
pcre2grep: fix build for Hurd (#52)
Since d5a61ee8 (Patch to detect (and ignore) symlink loops in
pcre2grep., 2021-08-28), there is optional code that depends
on readlink and PATH_MAX but that had only detection added for
the first.

GNU Hurd doesn't have the latter so it fails to build.

Improve the detection to include both dependencies in autotools
and cmake to fix that.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2021-11-26 17:31:35 +00:00
Zoltan Herczeg d144199dfb
Revert an unintended change in JIT repeat detection. (#58)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-11-24 16:58:30 +00:00
Carlo Marcelo Arenas Belón eb42305f07
jit: avoid integer wraparound in stack size definition (#42)
pcre2_jit_stack_create() allows the user to indicate how big of a
stack size JIT should be able to allocate and use, using a size_t
variable which should be able to hold bigger values than reasonable.

Internally, the value is rounded to the next 8K, but if the value
is unreasonably large, the rounding would overflow and could result
in a smaller than expected stack or a maximum size that is smaller
than the minimum.

Avoid the overflow by checking the value and failing early, and
while at it make the check clearer while documenting the failure
mode.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2021-11-19 09:23:46 +01:00
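
The wraparound described in eb42305f07 above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code in pcre2_jit_stack_create(): the function name sketch_round_up_8k and the macro SKETCH_GRANULARITY are hypothetical, and the real function may report failure differently. The sketch checks that adding the rounding slack cannot exceed SIZE_MAX before doing the addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; 8K matches the granularity in the commit message. */
#define SKETCH_GRANULARITY ((size_t)8192)

/* Round size up to the next multiple of 8K.
   Returns 1 and stores the result in *rounded on success,
   or 0 if the rounding would wrap around SIZE_MAX. */
static int sketch_round_up_8k(size_t size, size_t *rounded)
{
  assert(rounded != NULL);

  /* Fail early: size + 8191 must still fit in a size_t. */
  if (size > SIZE_MAX - (SKETCH_GRANULARITY - 1)) return 0;

  *rounded = (size + SKETCH_GRANULARITY - 1) & ~(SKETCH_GRANULARITY - 1);
  return 1;
}
```

Without the early check, a request close to SIZE_MAX would wrap to a small value after the addition, which is how a maximum smaller than the minimum could arise.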
Philip Hazel 46890604a4 Update ChangeLog for GitHub #37 (acc520924). 2021-11-09 17:26:08 +00:00
Carlo Marcelo Arenas Belón acc520924c
test: avoid failing RunTest if pcre2test -S is not supported (#37)
* test: avoid failing RunTest if pcre2test -S is not supported

If `pcre2test -S` is not supported then avoid checking for it
in a test.

There is already a conditional check for it to be used when it is
needed and it is available, so adjust that as well.

* pcre2test: update list of platform support for -S

Minix 3 has a BSD userspace and now works fine, but Haiku still
doesn't support stack limits, so update accordingly.
2021-11-09 17:23:02 +00:00
Philip Hazel bc70a183fc Update ChangeLog for GitHub #36 (dae47509) patch. 2021-11-09 17:19:26 +00:00
Carlo Marcelo Arenas Belón dae475092d
pcre2grep: avoid portability minefield with buffered fseek(stdin) (#36)
To allow pcre2grep to do an early exit in a resumable way, -m uses
fseek on stdin, which is sadly not supported on several platforms.

Most of the conflicting issues come from the fact that managing the
position while buffering is not trivial, and is therefore an optional
feature[1] of POSIX.1-2017.

Work around this by removing the buffer from stdin if the -m option
is being used.  There is likely not a significant performance benefit
even for the platforms that support it, but it could be conditionally
added in that case, later.

Fixes: #10

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
2021-11-09 17:15:38 +00:00
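
One way to remove buffering from a stream in standard C, as the message for dae475092d above describes, is setvbuf() with _IONBF. The sketch below assumes that approach; the helper name sketch_prepare_input and its max_count_given parameter are hypothetical and do not come from pcre2grep.

```c
#include <assert.h>
#include <stdio.h>

/* Make the input stream unbuffered only when a match limit (-m) is in use.
   Must be called before any other operation on the stream.
   Returns 0 on success, nonzero if setvbuf() fails. */
static int sketch_prepare_input(FILE *in, int max_count_given)
{
  assert(in != NULL);

  if (!max_count_given) return 0;      /* keep default buffering */

  /* With no buffer, the stream never reads ahead of what the program
     has consumed, so the underlying file position stays in step. */
  return setvbuf(in, NULL, _IONBF, 0);
}
```

The design trade-off is the one the commit message names: unbuffered reads are slower, so the buffer is dropped only when -m makes the final stream position matter.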
Philip Hazel 1ed34b9cb1 Update version to 10.40-RC1 and fix consequent version test issue. 2021-11-09 17:12:50 +00:00
Philip Hazel f19e84674e Update ChangeLog for GitHub #35 fix. 2021-11-09 17:12:13 +00:00
Carlo Marcelo Arenas Belón 7db8784296
pcre2grep: correctly handle multiple passes (#35)
* tests: use an explicit filehandle to share in testing -m

The way stdin is shared to all participants of a subshell varies
per shell, and at least the standard /bin/sh in Solaris seems to
create a new copy for each command, defeating the purpose of the
test.

Use instead exec to create a filehandle that could then be used
explicitly in the test to confirm that the stream is set.

* pcre2grep: correctly handle multiple passes

When the -m option is used, pcre2grep is meant to exit after enough
matches are found but while leaving the stream pinned to the next position
after the last match.

Unfortunately, it wasn't correctly tracking the beginning of the stream
on subsequent passes, and therefore it failed to use the right seek
value.

Grab the position of the stream at the beginning and while at it, make
sure that the stream passed hasn't been consumed already.
2021-11-09 16:57:48 +00:00
Philip Hazel 072717a61f Fix very minor typos in documentation: redundant spaces. 2021-10-30 11:25:12 +01:00
Philip Hazel 35fee4193b Final file tidies for 10.39. 2021-10-29 17:09:37 +01:00
Philip Hazel 3469b13b8e Update docs and version info for 10.39. 2021-10-29 17:03:31 +01:00
Philip Hazel 29c37f9aa3 Update ChangeLog for GitHub #32 patch. 2021-10-29 16:13:30 +01:00
Carlo Marcelo Arenas Belón 128c50360c
fix building on ancient compilers (#32)
* jit: allow building with ancient MSVC versions

Visual Studio older than 2013 fails to build with JIT enabled,
because it is unable to parse non C89 compatible syntax, with
mixed declarations and code.

While most recent compilers wouldn't even report this as a warning
since it is valid C99, it could be also made visible by adding to
gcc/clang the -Wdeclaration-after-statement flag at build time.

Move the code below the affected definitions.

* pcre2grep: avoid mixing declarations with code

Since d5a61ee8 (Patch to detect (and ignore) symlink loops in
pcre2grep., 2021-08-28), code will fail to build in a strict C89
compiler.

Reformat slightly to make it C89 compatible again.
2021-10-29 16:07:53 +01:00
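
The C89 restriction behind 128c50360c above is that every declaration in a block must precede the first statement. A minimal sketch of the compatible layout follows; sketch_sum_c89 is a hypothetical function written for illustration and is unrelated to the PCRE2 sources.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array in a form that a strict C89 compiler accepts:
   all declarations come before the first statement in the block. */
static int sketch_sum_c89(const int *values, int count)
{
  int i;
  int total = 0;

  /* Statements start here. Declaring 'i' or 'total' below this point
     would be valid C99 but is rejected by strict C89 compilers, and is
     reported by gcc/clang with -Wdeclaration-after-statement. */
  assert(count >= 0);
  if (values == NULL) return 0;

  for (i = 0; i < count; i++) total += values[i];
  return total;
}
```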
Philip Hazel bf2c8cc564 Update ChangeLog for GitHub commits and generate HTML docs. 2021-10-29 15:12:56 +01:00
Philip Hazel 87f32b9b39 Add ChangeLog item for GitHub #29. 2021-10-29 15:07:03 +01:00
Philip Hazel 7ed39af7cc Create ChangeLog item for issue #28 merge. 2021-10-29 15:07:03 +01:00
Carlo Marcelo Arenas Belón 3b973ebf4b
inttypes and stdint cleanup (#30)
* cleanup: remove references to no longer used stdint.h

Since 19c50b9d (Unconditionally use inttypes.h instead of trying for
stdint.h (simplification) and remove the now unnecessary inclusion in
pcre2_internal.h., 2018-11-14), stdint.h is no longer used.

Remove checks for it in autotools and CMake and document better the
expected build failures for systems that might have stdint.h (C99)
and not inttypes.h (from POSIX), like old Windows.

* cleanup: remove detection for inttypes.h which is a hard dependency

CMake checks for standard headers are not meant to be used for hard
dependencies, so will prevent a possible fallback to work.

Alternatively, the header could be checked to make the configuration
fail instead of breaking the build, but that was punted, as it was
missing anyway from autotools.
2021-10-29 15:05:19 +01:00