Commit Graph

1043 Commits

Author SHA1 Message Date
Philip Hazel 13be26a5c2 Adjust code for PR100 to put declarations before code 2022-04-15 12:23:04 +01:00
pagabuc ba6a5f16d2
Enable fuzzing of the JIT engine (#100) 2022-04-15 12:06:05 +01:00
Zoltan Herczeg d07c967b3a Fix recursion issue in JIT 2022-04-14 19:09:40 +00:00
Carlo Marcelo Arenas Belón 4279abbd7d
pcre2test: allow using readline headers for libedit (#99)
At least in OpenBSD, there is a libedit library in base, but without
public headers.  Public headers for readline are available but since
15db5d36 (pcre2test: avoid using readline headers with libedit,
2022-04-07) won't be picked up automatically.

Allow pointing cmake to those headers by doing (for example):

  $ cmake -DEDITLINE_INCLUDE_DIR=/usr/include/readline

Or using custom CPPFLAGS with configure (for example):

  $ CPPFLAGS=-I/usr/include/readline ./configure --enable-pcre2test-libedit

Since the headers from readline.h would be otherwise incomplete, detect
that case and pull the extra headers that are required automagically and
while at it, clean up the NCURSES dependency that was unnecessarily copied
from readline.
2022-04-14 16:51:51 +01:00
Zoltan Herczeg e612e06b5d JIT compiler update 2022-04-11 05:30:45 +00:00
Carlo Marcelo Arenas Belón 9c8abddc52
pcre2test: really allow using libedit when enabled (#96)
* pcre2test: use the right header for libedit in FreeBSD with autoconf

When `./configure --enable-pcre2test-libedit` is used in FreeBSD,
the resulting test will succeed but won't set the necessary flag
to distinguish between libedit and readline header files, therefore
using readline's at build time (if installed).

Consolidate all header tests into one and use instead the corresponding
autogenerated defines to check for each possibility.

* pcre2test: really allow libedit with cmake

Using cmake to configure and enable linking pcre2test with libedit,
could result in a broken build, because the header used was instead
pointing to readline.

In cases where the build will succeed (because both libraries were
available), it would likely show warnings, because several history
functions were being used without declarations, since readline
requires including "history.h" for those.

Additionally, since PCRE2_SUPPORT_READLINE is ON by default (unlike
configure), turning PCRE2_SUPPORT_LIBEDIT=ON, would require setting
that other option to OFF explicitly (even if readline wasn't available)
or the setup would abort.

Lastly, in systems with no default sysroot (ex: macOS), the use of
absolute paths for searching for libedit's readline.h could fail so
use instead relative PATH_SUFFIXES.

* pcre2test: avoid using readline headers with libedit

When asked to enable libedit in a system that ALSO has readline,
the headers of the latter would be found and used by the former.

While that would mostly work, some functions will be missing
declarations (which is forbidden in C99), so instead abort the
configuration and let the user provide for them.
2022-04-08 17:07:30 +01:00
Carlo Marcelo Arenas Belón f11c26842d
jit: update SIMD code to latest syntax (#95) 2022-04-03 10:22:45 +02:00
Zoltan Herczeg 4ca0530b9b Improve the code generator of recurse in JIT 2022-03-27 06:38:01 +00:00
Zoltan Herczeg 03654e751e Fixed an issue affecting recursions in JIT 2022-03-26 08:03:11 +00:00
Zoltan Herczeg d4fa336fbc Fix incorrect value reading in JIT. 2022-03-24 05:34:42 +00:00
Zoltan Herczeg 50a51cb7e6 Fixed a unicode property matching issue in JIT 2022-03-23 07:53:58 +00:00
Philip Hazel eef5740ff9 Remove overlooked excess initializer 2022-03-03 16:44:06 +00:00
Zoltan Herczeg dea56d2df9 JIT compiler update. 2022-02-24 14:15:15 +00:00
Philip Hazel fdd9479108 Fix incorrect compiling when [Aa] etc. are quantified 2022-01-26 08:37:18 +00:00
Philip Hazel 419e3c68a3 Tidy comments 2022-01-14 16:05:30 +00:00
Zoltan Herczeg e21345de97
Extend unicode boolean property bitset index to 12 bit (#81)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-14 15:51:03 +00:00
Philip Hazel 504ff06fff Fix overrun bug in recent property name parsing change 2022-01-14 12:24:23 +00:00
Philip Hazel 360a84e80b Update descriptive comments in UCD generation. 2022-01-12 17:38:48 +00:00
Zoltan Herczeg 061e57695a
Merge scriptx and bidi fields (#78)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-12 17:00:12 +00:00
Philip Hazel bf35c0518c Add -LP and -LS (list properties, list scripts) features to pcre2test. 2022-01-12 15:01:14 +00:00
Zoltan Herczeg 68fbc1982e
Support boolean properties in JIT (#76)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-11 16:03:34 +00:00
Philip Hazel 06d3a66065 Fix bug in modifier listing 2022-01-11 09:21:27 +00:00
Philip Hazel 87571b5af3 Update documentation and comments for UCD generation 2022-01-10 16:26:41 +00:00
Philip Hazel 838cdac4dc Remove vestiges of previous Bidi_Class coding 2022-01-10 14:57:45 +00:00
Philip Hazel ec091e2e44 Restore lost de-duplication 2022-01-10 11:31:27 +00:00
Philip Hazel 636569a957 Initial code for Boolean property support 2022-01-09 14:46:43 +00:00
Zoltan Herczeg f90542a209
Improve unicode property abbreviation support (#74)
* Improve unicode property abbreviation support

* Auto-generate script names

Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-07 10:01:18 +00:00
Carlo Marcelo Arenas Belón 14dbc6e6ec
jit: use correct type when checking for max value (#73)
eb42305f (jit: avoid integer wraparound in stack size definition (#42),
2021-11-19) introduces a check to avoid an integer overflow when
allocating stack size for JIT.

Unfortunately the maximum value was using PCRE2_SIZE_MAX, even though
the variable is of type size_t, so correct it.

In practice, the issue shouldn't affect the most common configurations
where both values are the same, and it will be unlikely that there would
be a configuration where PCRE2_SIZE_MAX > SIZE_MAX, hence the mistake
is unlikely to have reintroduced the original bug and this change should
be therefore mostly equivalent.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-01-06 14:46:43 +01:00
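The kind of check described in the commit above can be sketched as follows. This is an illustration only, not the PCRE2 source: the function name round_up_checked and the GRANULE value are hypothetical. It shows why the limit has to come from the variable's own type (size_t, hence SIZE_MAX) when guarding a round-up against wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical granule for illustration; must be a power of two so that
   the mask below works. */
#define GRANULE ((size_t)8192)

/* Rounds size up to a multiple of GRANULE and stores it in *out.
   Returns 1 on success, 0 if the addition would wrap around.
   The limit is SIZE_MAX because size is a size_t. */
static int round_up_checked(size_t size, size_t *out)
{
  if (size > SIZE_MAX - (GRANULE - 1)) return 0;
  *out = (size + GRANULE - 1) & ~(GRANULE - 1);
  return 1;
}
```

Comparing against a limit taken from a different type would either reject valid sizes or let a wrapping value through, depending on which of the two limits is larger.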
Jessica Clarke 04ecb267c0
match: Properly align heapframes for CHERI/Arm's Morello prototype (#72)
On CHERI, and thus Arm's Morello prototype, pointers are represented as
hardware capabilities, which consist of both an integer address and
additional metadata, meaning they are twice the size of the platform's
size_t type, i.e. 16 bytes on a 64-bit system. The ovector member of
heapframe happens to only be 8 byte aligned, and so computing frame_size
ends up with a multiple of 8 but not 16. Whilst the first frame is
always suitably aligned, this then misaligns the frame that follows it,
resulting in an alignment fault when storing a pointer to Fecode at the
start of match.

Thus, round up frame_size to a multiple of heapframe's alignment to
ensure alignment is preserved. This can be completely optimised away on
traditional architectures and, since CHERI's capabilities are in fact
2 * sizeof(PCRE2_SIZE) bytes in size, the variable part of the
expression is also proven to be a multiple of the alignment and so the
aligning gets folded into the offsetof part by adding an additional 8,
so no dynamic alignment code is needed even on CHERI architectures.
2022-01-04 17:06:14 +00:00
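The rounding described in the commit above can be sketched like this. The struct layout and all names (frame, frame_align, FRAME_ALIGNMENT, frame_size_for) are assumptions made for illustration and do not reproduce the real heapframe definition; the alignment is derived with the portable offsetof trick rather than any particular PCRE2 macro.

```c
#include <stddef.h>

/* Hypothetical stand-in for a match frame: one pointer member followed by
   a variable-length vector of size_t pairs. */
typedef struct frame {
  const void *code;
  size_t ovector[2];
} frame;

/* The offset of f inside this wrapper equals the alignment of frame. */
typedef struct frame_align {
  char pad;
  frame f;
} frame_align;

#define FRAME_ALIGNMENT offsetof(frame_align, f)

/* Size of a frame holding the given number of ovector pairs, rounded up
   to a multiple of the frame's alignment so that the next frame in a
   contiguous block starts suitably aligned. */
static size_t frame_size_for(size_t pairs)
{
  size_t raw = offsetof(frame, ovector) + pairs * 2 * sizeof(size_t);
  return (raw + FRAME_ALIGNMENT - 1) / FRAME_ALIGNMENT * FRAME_ALIGNMENT;
}
```

On platforms where pointers and size_t have the same size and alignment, the raw size is already a multiple of the alignment, so the rounding changes nothing.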
Philip Hazel 31fb2e58a1 Suppress compiler fall-through warnings 2022-01-03 15:57:48 +00:00
Zoltan Herczeg 435140a0ac
Fix script extension support on jit (#69)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2022-01-03 15:49:26 +00:00
Philip Hazel d888d36013 Update script run code to work with new script extensions coding 2021-12-31 16:06:05 +00:00
Zoltan Herczeg 6614b281bc
Implement script extension support in JIT. (#66)
Fix incorrect operator in GenerateUcd.py (modulo -> bitwise and)

Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-12-29 15:57:32 +00:00
Zoltan Herczeg afa4756d19
Rework script extension handling (#64)
Co-authored-by: Zoltan Herczeg <hzmester@freemail.hu>
2021-12-29 09:35:22 +00:00
Philip Hazel 7713f33e46 Add support for 4-character script abbreviations 2021-12-28 15:10:12 +00:00
Philip Hazel 98e7d70bc6 Refactor Python scripts for generating Unicode property data 2021-12-26 17:49:58 +00:00
Philip Hazel 4514ddd2a2 Split generated tables from fixed tables 2021-12-22 16:55:30 +00:00
Philip Hazel b29732063b Revised script handling (see ChangeLog) 2021-12-21 16:11:30 +00:00
Philip Hazel 92d7cf1dd0 Very minor code speed up for maximizing character property matches 2021-12-17 12:30:05 +00:00
Philip Hazel 1d432ee3cf Do bidi synonyms properly 2021-12-15 11:48:23 +00:00
Philip Hazel 1c41a5b815 Fix minor issues raised by Clang sanitize 2021-12-14 15:52:24 +00:00
Zoltan Herczeg 4243515033 JIT support for Bidi_Control and Bidi_Class 2021-12-13 07:04:19 +00:00
Philip Hazel 49b29f837d Add short synonyms for Bidi_Control and Bidi_Class 2021-12-10 16:32:10 +00:00
Philip Hazel 0246c6bf64 Add support for Bidi_Control and Bidi_Class properties 2021-12-08 15:34:27 +00:00
Philip Hazel 823d4ac956 Add bidi class and control information to Unicode property data 2021-12-05 18:00:10 +00:00
Philip Hazel 4ef0c51d2b Interpret NULL pointer, zero length as an empty string for subjects and replacements. 2021-11-30 16:34:39 +00:00
Philip Hazel 7ab2769728 Check for NULL replacement in pcre2_substitute() 2021-11-28 17:19:17 +00:00
Philip Hazel 2a294ddadb Add check for NULL subject to POSIX regexec(). 2021-11-28 16:38:36 +00:00
Philip Hazel cb854a912e Add options for NULL pointers to pcre2test. 2021-11-28 16:22:24 +00:00
Carlo Marcelo Arenas Belón ae4e6261e5
match: avoid crash if subject NULL and PCRE2_ZERO_TERMINATED (#53)
* pcre2_match: avoid crash if subject NULL and PCRE2_ZERO_TERMINATED

When length of subject is PCRE2_ZERO_TERMINATED strlen is used
to calculate its size, which will trigger a crash if subject is
also NULL.

Move the NULL check before the point where strlen would be used on it,
and make sure all dependent variables are set after the NULL validation
as well.

While at it, fix a typo in a debug flag in the same file, which
is otherwise unrelated, and make sure the full section of constraint
checks can be identified clearly using the leading comment alone.

* pcre2_dfa_match: avoid crash if subject NULL and PCRE2_ZERO_TERMINATED

When length of subject is PCRE2_ZERO_TERMINATED strlen is used
to calculate its size, which will trigger a crash if subject is
also NULL.

Move the NULL check before the detection for subject sizes to
avoid this issue.

* pcre2_substitute: avoid crash if subject or replacement are NULL

The underlying pcre2_match() function will validate the subject if
needed, but will crash when length is PCRE2_ZERO_TERMINATED or if
subject == NULL and pcre2_match() is not being called because
match_data was provided.

The replacement parameter is missing NULL checks, and so currently
allows for an equivalent response to "" if rlength == 0.

Restrict all other cases to avoid strlen(NULL) crashes in the same
way that is done for subject, but also make sure to reject invalid
length values as early as possible.
2021-11-27 16:49:31 +00:00
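The ordering described in the commit above (validate the pointer first, call strlen only afterwards) can be sketched as below. This is not the PCRE2 implementation: LEN_ZERO_TERMINATED, ERR_NULL and subject_length are hypothetical names standing in for the library's own sentinel, error code and internal logic. A NULL pointer with length zero is treated as an empty string, as the later commit 4ef0c51d2b in this log describes.

```c
#include <stddef.h>
#include <string.h>

#define LEN_ZERO_TERMINATED (~(size_t)0)  /* hypothetical "use strlen" sentinel */
#define ERR_NULL (-1)                     /* hypothetical error code */

/* Resolves the effective length of subject into *out.
   Returns 0 on success or ERR_NULL for a NULL subject whose length is
   not zero (this includes the zero-terminated sentinel). */
static int subject_length(const char *subject, size_t length, size_t *out)
{
  if (subject == NULL)
    {
    /* Checked before any strlen call, so strlen(NULL) cannot happen. */
    if (length != 0) return ERR_NULL;
    *out = 0;
    return 0;
    }
  if (length == LEN_ZERO_TERMINATED) length = strlen(subject);
  *out = length;
  return 0;
}
```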