diff --git a/ChangeLog b/ChangeLog index ce57091..9d61911 100644 --- a/ChangeLog +++ b/ChangeLog @@ -216,6 +216,9 @@ unit". Previously only non-anchored patterns did this. 49. Update extended grapheme breaking rules to the latest set that are in Unicode Standard Annex #29. +50. Added experimental foreign pattern conversion facilities +(pcre2_pattern_convert() and friends). + Version 10.23 14-February-2017 ------------------------------ diff --git a/Makefile.am b/Makefile.am index 56d3434..bbf23b8 100644 --- a/Makefile.am +++ b/Makefile.am @@ -36,6 +36,10 @@ dist_html_DATA = \ doc/html/pcre2_compile_context_create.html \ doc/html/pcre2_compile_context_free.html \ doc/html/pcre2_config.html \ + doc/html/pcre2_convert_context_copy.html \ + doc/html/pcre2_convert_context_create.html \ + doc/html/pcre2_convert_context_free.html \ + doc/html/pcre2_converted_pattern_free.html \ doc/html/pcre2_dfa_match.html \ doc/html/pcre2_general_context_copy.html \ doc/html/pcre2_general_context_create.html \ @@ -59,6 +63,7 @@ dist_html_DATA = \ doc/html/pcre2_match_data_create.html \ doc/html/pcre2_match_data_create_from_pattern.html \ doc/html/pcre2_match_data_free.html \ + doc/html/pcre2_pattern_convert.html \ doc/html/pcre2_pattern_info.html \ doc/html/pcre2_serialize_decode.html \ doc/html/pcre2_serialize_encode.html \ @@ -70,6 +75,8 @@ dist_html_DATA = \ doc/html/pcre2_set_compile_extra_options.html \ doc/html/pcre2_set_compile_recursion_guard.html \ doc/html/pcre2_set_depth_limit.html \ + doc/html/pcre2_set_glob_escape.html \ + doc/html/pcre2_set_glob_separator.html \ doc/html/pcre2_set_heap_limit.html \ doc/html/pcre2_set_match_limit.html \ doc/html/pcre2_set_max_pattern_length.html \ @@ -94,6 +101,7 @@ dist_html_DATA = \ doc/html/pcre2build.html \ doc/html/pcre2callout.html \ doc/html/pcre2compat.html \ + doc/html/pcre2convert.html \ doc/html/pcre2demo.html \ doc/html/pcre2grep.html \ doc/html/pcre2jit.html \ @@ -121,6 +129,10 @@ dist_man_MANS = \ 
doc/pcre2_compile_context_create.3 \ doc/pcre2_compile_context_free.3 \ doc/pcre2_config.3 \ + doc/pcre2_convert_context_copy.3 \ + doc/pcre2_convert_context_create.3 \ + doc/pcre2_convert_context_free.3 \ + doc/pcre2_converted_pattern_free.3 \ doc/pcre2_dfa_match.3 \ doc/pcre2_general_context_copy.3 \ doc/pcre2_general_context_create.3 \ @@ -144,6 +156,7 @@ dist_man_MANS = \ doc/pcre2_match_data_create.3 \ doc/pcre2_match_data_create_from_pattern.3 \ doc/pcre2_match_data_free.3 \ + doc/pcre2_pattern_convert.3 \ doc/pcre2_pattern_info.3 \ doc/pcre2_serialize_decode.3 \ doc/pcre2_serialize_encode.3 \ @@ -155,6 +168,8 @@ dist_man_MANS = \ doc/pcre2_set_compile_extra_options.3 \ doc/pcre2_set_compile_recursion_guard.3 \ doc/pcre2_set_depth_limit.3 \ + doc/pcre2_set_glob_escape.3 \ + doc/pcre2_set_glob_separator.3 \ doc/pcre2_set_heap_limit.3 \ doc/pcre2_set_match_limit.3 \ doc/pcre2_set_max_pattern_length.3 \ @@ -179,6 +194,7 @@ dist_man_MANS = \ doc/pcre2build.3 \ doc/pcre2callout.3 \ doc/pcre2compat.3 \ + doc/pcre2convert.3 \ doc/pcre2demo.3 \ doc/pcre2grep.1 \ doc/pcre2jit.3 \ diff --git a/doc/html/index.html b/doc/html/index.html index 2a373f5..b9393d9 100644 --- a/doc/html/index.html +++ b/doc/html/index.html @@ -35,6 +35,9 @@ first.
+Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++pcre2_convert_context *pcre2_convert_context_copy( + pcre2_convert_context *cvcontext); +
++This function is part of an experimental set of pattern conversion functions. +It makes a new copy of a convert context, using the memory allocation function +that was used for the original context. The result is NULL if the memory cannot +be obtained. +
++The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2_convert_context_create.html b/doc/html/pcre2_convert_context_create.html new file mode 100644 index 0000000..2564780 --- /dev/null +++ b/doc/html/pcre2_convert_context_create.html @@ -0,0 +1,41 @@ + + ++Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++pcre2_convert_context *pcre2_convert_context_create( + pcre2_general_context *gcontext); +
++This function is part of an experimental set of pattern conversion functions. +It creates and initializes a new convert context. If its argument is +NULL, malloc() is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +
++The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2_convert_context_free.html b/doc/html/pcre2_convert_context_free.html new file mode 100644 index 0000000..ab6db6c --- /dev/null +++ b/doc/html/pcre2_convert_context_free.html @@ -0,0 +1,39 @@ + + ++Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++void pcre2_convert_context_free(pcre2_convert_context *cvcontext); +
++This function is part of an experimental set of pattern conversion functions. +It frees the memory occupied by a convert context, using the memory +freeing function from the general context with which it was created, or +free() if that was not set. +
++The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2_converted_pattern_free.html b/doc/html/pcre2_converted_pattern_free.html new file mode 100644 index 0000000..961f04f --- /dev/null +++ b/doc/html/pcre2_converted_pattern_free.html @@ -0,0 +1,39 @@ + + ++Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern); +
++This function is part of an experimental set of pattern conversion functions. +It frees the memory occupied by a converted pattern that was obtained by +calling pcre2_pattern_convert() with arguments that caused it to place +the converted pattern into newly obtained heap memory. +
++The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2_pattern_convert.html b/doc/html/pcre2_pattern_convert.html new file mode 100644 index 0000000..2fcd7cc --- /dev/null +++ b/doc/html/pcre2_pattern_convert.html @@ -0,0 +1,70 @@ + + ++Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, PCRE2_UCHAR **buffer, + PCRE2_SIZE *blength, pcre2_convert_context *cvcontext); +
++This function is part of an experimental set of pattern conversion functions. +It converts a foreign pattern (for example, a glob) into a PCRE2 regular +expression pattern. Its arguments are: +
+ pattern The foreign pattern + length The length of the input pattern or PCRE2_ZERO_TERMINATED + options Option bits + buffer Pointer to pointer to output buffer, or NULL + blength Pointer to output length field + cvcontext Pointer to a convert context or NULL ++The length of the converted pattern (excluding the terminating zero) is +returned via blength. If buffer is NULL, the function just returns +the output length. If buffer points to a NULL pointer, heap memory is +obtained for the converted pattern, using the allocator in the context if +present (or else malloc()), and the field pointed to by buffer is +updated. If buffer points to a non-NULL field, that must point to a +buffer whose size is in the variable pointed to by blength. This value is +updated. + +
+The option bits are: +
+ PCRE2_CONVERT_UTF Input is UTF + PCRE2_CONVERT_NO_UTF_CHECK Do not check UTF validity + PCRE2_CONVERT_POSIX_BASIC Convert POSIX basic pattern + PCRE2_CONVERT_POSIX_EXTENDED Convert POSIX extended pattern + PCRE2_CONVERT_GLOB ) Convert + PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR ) various types + PCRE2_CONVERT_GLOB_NO_STARSTAR ) of glob ++The return value from pcre2_pattern_convert() is zero on success or a +non-zero PCRE2 error code. + +
+The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2_set_glob_escape.html b/doc/html/pcre2_set_glob_escape.html new file mode 100644 index 0000000..2b55627 --- /dev/null +++ b/doc/html/pcre2_set_glob_escape.html @@ -0,0 +1,43 @@ + + ++Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++int pcre2_set_glob_escape(pcre2_convert_context *cvcontext, + uint32_t escape_char); +
++This function is part of an experimental set of pattern conversion functions. +It sets the escape character that is used when converting globs. The second +argument must either be zero (meaning there is no escape character) or a +punctuation character whose code point is less than 256. The default is grave +accent if running under Windows, otherwise backslash. The result of the +function is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. +
++The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2_set_glob_separator.html b/doc/html/pcre2_set_glob_separator.html new file mode 100644 index 0000000..538748d --- /dev/null +++ b/doc/html/pcre2_set_glob_separator.html @@ -0,0 +1,42 @@ + + ++Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+SYNOPSIS
+
+
+#include <pcre2.h> +
++int pcre2_set_glob_separator(pcre2_convert_context *cvcontext, + uint32_t separator_char); +
++This function is part of an experimental set of pattern conversion functions. +It sets the component separator character that is used when converting globs. +The second argument must be one of the characters forward slash, backslash, or +dot. The default is backslash when running under Windows, otherwise forward +slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if +the second argument is invalid. +
++The pattern conversion functions are described in the +pcre2convert +documentation. +
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2api.html b/doc/html/pcre2api.html index 1fb5738..67c5802 100644 --- a/doc/html/pcre2api.html +++ b/doc/html/pcre2api.html @@ -24,37 +24,38 @@ please consult the man page, in case the conversion went wrong.#include <pcre2.h> @@ -334,7 +335,43 @@ backward compatibility. They should not be used in new code. The first is replaced by pcre2_set_depth_limit(); the second is no longer needed and has no effect (it always returns zero).
-
+pcre2_convert_context *pcre2_convert_context_create(
+ pcre2_general_context *gcontext);
+
+
+pcre2_convert_context *pcre2_convert_context_copy(
+ pcre2_convert_context *cvcontext);
+
+
+void pcre2_convert_context_free(pcre2_convert_context *cvcontext);
+
+
+int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
+ uint32_t escape_char);
+
+
+int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
+ uint32_t separator_char);
+
+
+int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
+ uint32_t options, PCRE2_UCHAR **buffer,
+ PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);
+
+
+void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);
+
+
+These functions provide a way of converting non-PCRE2 patterns into
+patterns that can be processed by pcre2_compile(). This facility is
+experimental and may be changed in future releases. At present, "globs" and
+POSIX basic and extended patterns can be converted. Details are given in the
+pcre2convert
+documentation.
+
There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code units, respectively. However, there is just one header file, pcre2.h. @@ -395,7 +432,7 @@ In the function summaries above, and in the rest of this document and other PCRE2 documents, functions and data types are described using their generic names, without the _8, _16, or _32 suffix.
-PCRE2 has its own native API, which is described in this document. There are also some wrapper functions for the 8-bit library that correspond to the @@ -503,7 +540,7 @@ Functions with names ending with _free() are used for freeing memory blocks of various sorts. In all cases, if one of these functions is called with a NULL argument, it does nothing.
-The PCRE2 API uses string lengths and offsets into strings of code units in several places. These values are always of type PCRE2_SIZE, which is an @@ -513,7 +550,7 @@ as a special indicator for zero-terminated strings and unset offsets. Therefore, the longest string that can be handled is one less than this maximum.
-PCRE2 supports five different conventions for indicating line breaks in strings: a single CR (carriage return) character, a single LF (linefeed) @@ -548,7 +585,7 @@ The choice of newline convention does not affect the interpretation of the \n or \r escape sequences, nor does it affect what \R matches; this has its own separate convention.
-In a multithreaded application it is important to keep thread-specific data separate from data that can be shared between threads. The PCRE2 library code @@ -628,7 +665,7 @@ match. This includes details of what was matched, as well as additional information such as the name of a (*MARK) setting. Each thread must provide its own copy of this memory.
-Some PCRE2 functions have a lot of parameters, many of which are used only by specialist applications, for example, those that use custom memory management @@ -1013,7 +1050,7 @@ where ddd is a decimal number. However, such a setting is ignored unless ddd is less than the limit set by the caller of pcre2_match() or pcre2_dfa_match() or, if no such limit is set, less than the default.
-int pcre2_config(uint32_t what, void *where);
@@ -1150,7 +1187,7 @@ the PCRE2 version string, zero-terminated. The number of code units used is returned. This is the length of the string plus one unit for the terminating zero. -pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length, uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset, @@ -1741,7 +1778,7 @@ dangerous option. Use with care. PCRE2_EXTRA_MATCH_LINE This option is provided for use by the -x option of pcre2grep. It -causes the pattern only to match complete lines. This is achieved by +causes the pattern only to match complete lines. This is achieved by automatically inserting the code for "^(?:" at the start of the compiled pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched line may be in the middle of the subject string. This option can be used with @@ -1756,7 +1793,7 @@ at the start of the compiled pattern and ")\b" at the end. The option may be used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is also set.
-There are nearly 100 positive error codes that pcre2_compile() may return (via errorcode) if it finds an error in the pattern. There are also some @@ -1769,7 +1806,7 @@ error message" below) can be called to obtain a textual error message from any error code.
-
int pcre2_jit_compile(pcre2_code *code, uint32_t options);
@@ -1807,7 +1844,7 @@ patterns to be analyzed, and for one-off matches and simple patterns the
benefit of faster execution might be offset by a much slower compilation time.
Most (but not all) patterns can be optimized by the JIT compiler.
PCRE2 handles caseless matching, and determines whether characters are letters, digits, or whatever, by reference to a set of tables, indexed by character code @@ -1863,7 +1900,7 @@ is saved with the compiled pattern, and the same tables are used by compilation and matching both happen in the same locale, but different patterns can be processed in different locales.
-int pcre2_pattern_info(const pcre2 *code, uint32_t what, void *where);
@@ -2188,7 +2225,7 @@ value returned by this option, because there are cases where the code that calculates the size has to over-estimate. Processing a pattern with the JIT compiler does not alter the value returned by this option. -int pcre2_callout_enumerate(const pcre2_code *code, int (*callback)(pcre2_callout_enumerate_block *, void *), @@ -2207,7 +2244,7 @@ contents of the callout enumeration block are described in the pcre2callout documentation, which also gives further details about callouts.
-It is possible to save compiled patterns on disc or elsewhere, and reload them later, subject to a number of restrictions. The functions whose names begin @@ -2216,7 +2253,7 @@ the pcre2serialize documentation.
-pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize, pcre2_general_context *gcontext); @@ -2287,7 +2324,7 @@ match data block (for that match) have taken place. When a match data block itself is no longer needed, it should be freed by calling pcre2_match_data_free().
-int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, PCRE2_SIZE startoffset, @@ -2525,7 +2562,7 @@ examples, in the pcre2partial documentation.
-When PCRE2 is built, a default newline convention is set; this is usually the standard convention for the operating system. The default can be overridden in @@ -2565,7 +2602,7 @@ does \s, even though it includes CR and LF in the characters that it matches. Notwithstanding the above, anomalous effects may still occur when CRLF is a valid newline sequence and explicit \r or \n escapes appear in the pattern.
-
uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data);
@@ -2664,7 +2701,7 @@ parentheses, no more than ovector[0] to ovector[2n+1] are set by
pcre2_match(). The other elements retain whatever values they previously
had.
PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data);
@@ -2714,7 +2751,7 @@ the code unit offset of the invalid UTF character. Details are given in the
pcre2unicode
page.
If pcre2_match() fails, it returns a negative number. This can be converted to a text string by calling the pcre2_get_error_message() @@ -2820,7 +2857,7 @@ faulted at compile time, but more complicated cases, in particular mutual recursions between two different subpatterns, cannot be detected until matching is attempted.
-int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer, PCRE2_SIZE bufflen); @@ -2841,7 +2878,7 @@ returned. If the buffer is too small, the message is truncated (but still with a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned. None of the messages are very long; a buffer size of 120 code units is ample.
-int pcre2_substring_length_bynumber(pcre2_match_data *match_data, uint32_t number, PCRE2_SIZE *length); @@ -2938,7 +2975,7 @@ The substring did not participate in the match. For example, if the pattern is (abc)|(def) and the subject is "def", and the ovector contains at least two capturing slots, substring number 1 is unset.
-int pcre2_substring_list_get(pcre2_match_data *match_data, " PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr); @@ -2977,7 +3014,7 @@ can be distinguished from a genuine zero-length substring by inspecting the appropriate offset in the ovector, which contain PCRE2_UNSET for unset substrings, or by calling pcre2_substring_length_bynumber().
-int pcre2_substring_number_from_name(const pcre2_code *code, PCRE2_SPTR name); @@ -3037,7 +3074,7 @@ names are not included in the compiled code. The matching process uses only numbers. For this reason, the use of different names for subpatterns of the same number causes an error at compile time.
-int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, PCRE2_SIZE startoffset, @@ -3244,7 +3281,7 @@ obtained by calling the pcre2_get_error_message() function (see "Obtaining a textual error message" above).
-int pcre2_substring_nametable_scan(const pcre2_code *code, PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last); @@ -3289,7 +3326,7 @@ in the section entitled Information about a pattern. Given all the relevant entries for the name, you can extract each of their numbers, and hence the captured data.
-The traditional matching function uses a similar algorithm to Perl, which stops when it finds the first match at a given point in the subject. If you want to @@ -3307,7 +3344,7 @@ substring. Then return 1, which forces pcre2_match() to backtrack and try other alternatives. Ultimately, when it runs out of matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH.
-int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, PCRE2_SIZE startoffset, @@ -3503,13 +3540,13 @@ some plausibility checks are made on the contents of the workspace, which should contain data about the previous partial match. If any of these checks fail, this error is given.
-pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3), pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3).
-
Philip Hazel
@@ -3518,9 +3555,9 @@ University Computing Service
Cambridge, England.
-Last updated: 16 June 2017
+Last updated: 10 July 2017
Copyright © 1997-2017 University of Cambridge.
diff --git a/doc/html/pcre2convert.html b/doc/html/pcre2convert.html
new file mode 100644
index 0000000..8b4d87f
--- /dev/null
+++ b/doc/html/pcre2convert.html
@@ -0,0 +1,190 @@
+
+
+Return to the PCRE2 index page. +
+
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+
+
+This document describes a set of functions that can be used to convert +"foreign" patterns into PCRE2 regular expressions. This facility is currently +experimental, and may be changed in future releases. Two kinds of pattern, +globs and POSIX patterns, are supported. +
+
+pcre2_convert_context *pcre2_convert_context_create(
+ pcre2_general_context *gcontext);
+
+
+pcre2_convert_context *pcre2_convert_context_copy(
+ pcre2_convert_context *cvcontext);
+
+
+void pcre2_convert_context_free(pcre2_convert_context *cvcontext);
+
+
+int pcre2_set_glob_escape(pcre2_convert_context *cvcontext,
+ uint32_t escape_char);
+
+
+int pcre2_set_glob_separator(pcre2_convert_context *cvcontext,
+ uint32_t separator_char);
+
+
+A convert context is used to hold parameters that affect the way that pattern
+conversion works. Like all PCRE2 contexts, you need to use a context only if
+you want to override the defaults. There are the usual create, copy, and free
+functions. If custom memory management functions are set in a general context
+that is passed to pcre2_convert_context_create(), they are used for all
+memory management within the conversion functions.
+
+There are only two parameters in the convert context at present. Both apply +only to glob conversions. The escape character defaults to grave accent under +Windows, otherwise backslash. It can be set to zero, meaning no escape +character, or to any punctuation character with a code point less than 256. +The separator character defaults to backslash under Windows, otherwise forward +slash. It can be set to forward slash, backslash, or dot. +
++The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if +their second argument is invalid. +
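The validity rules for the two setting functions can be pictured with a short C sketch. This is only an illustration of the rules as stated above, not PCRE2's source: check_glob_escape_sketch(), check_glob_separator_sketch(), and SKETCH_ERROR_BADDATA are hypothetical names, and the sketch assumes the default "C" locale, in which ispunct() recognizes only ASCII punctuation.

```c
#include <ctype.h>
#include <stdint.h>

/* Stand-in for PCRE2_ERROR_BADDATA; the real value is defined in pcre2.h. */
#define SKETCH_ERROR_BADDATA (-1)

/* Hypothetical helper: accept zero (no escape character) or a punctuation
   character whose code point is less than 256. */
int check_glob_escape_sketch(uint32_t escape_char)
{
    if (escape_char == 0) return 0;
    /* The range test comes first so that ispunct() only ever sees a value
       that fits in an unsigned char. */
    if (escape_char > 255 || !ispunct((int)escape_char))
        return SKETCH_ERROR_BADDATA;
    return 0;
}

/* Hypothetical helper: accept only forward slash, backslash, or dot. */
int check_glob_separator_sketch(uint32_t separator_char)
{
    switch (separator_char) {
    case '/':
    case '\\':
    case '.':
        return 0;
    default:
        return SKETCH_ERROR_BADDATA;
    }
}
```

In a real program the checks are made by pcre2_set_glob_escape() and pcre2_set_glob_separator() themselves, so an application only needs to test their return values.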
+
+int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length,
+ uint32_t options, PCRE2_UCHAR **buffer,
+ PCRE2_SIZE *blength, pcre2_convert_context *cvcontext);
+
+
+void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern);
+
+
+The first two arguments of pcre2_pattern_convert() define the foreign
+pattern that is to be converted. The length may be given as
+PCRE2_ZERO_TERMINATED. The options argument defines how the pattern is to
+be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
+PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
+One or more of the glob options, or one of the following POSIX options must be
+set to define the type of conversion that is required:
+
+ PCRE2_CONVERT_GLOB + PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR + PCRE2_CONVERT_GLOB_NO_STARSTAR + PCRE2_CONVERT_POSIX_BASIC + PCRE2_CONVERT_POSIX_EXTENDED ++Details of the conversions are given below. The buffer and blength +arguments define how the output is handled: + +
+If buffer is NULL, the function just returns the length of the converted +pattern via blength. This is one less than the length of buffer needed, +because a terminating zero is always added to the output. +
++If buffer points to a NULL pointer, an output buffer is obtained using +the allocator in the context or malloc() if no context is supplied. A +pointer to this buffer is placed in the variable to which buffer points. +When no longer needed the output buffer must be freed by calling +pcre2_converted_pattern_free(). +
++If buffer points to a non-NULL pointer, blength must be set to the +actual length of the buffer provided (in code units). +
++In all cases, after successful conversion, the variable pointed to by +blength is updated to the length actually used (in code units), excluding +the terminating zero that is always added. +
++If an error occurs, the length (via blength) is set to the offset +within the input pattern where the error was detected. Only gross syntax errors +are caught; there are plenty of errors that will get passed on for +pcre2_compile() to discover. +
++The return from pcre2_pattern_convert() is zero on success or a non-zero +PCRE2 error code. Note that PCRE2 error codes may be positive or negative: +pcre2_compile() uses mostly positive codes and pcre2_match() +negative ones; pcre2_pattern_convert() uses existing codes of both kinds. A +textual error message can be obtained by calling +pcre2_get_error_message(). +
++Globs are used to match file names, and consequently have the concept of a +"path separator", which defaults to backslash under Windows and forward slash +otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not +permitted to match separator characters, but the double-star (**) feature +(which does match separators) is supported. +
++PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to +match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with the +double-star feature disabled. These options may be given together. +
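The default glob behaviour described above (wildcards stop at the separator, double-star does not) can be pictured with a small self-contained C sketch. It is not PCRE2's conversion code and does not call the library: glob_to_regex_sketch() and emit_sketch() are hypothetical names invented for illustration. The sketch assumes that only *, ? and ** are wildcards, ignores bracket expressions and escape characters, and writes into a caller-supplied buffer. The regular expressions that pcre2_pattern_convert() really produces are likely to look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append s to out, always leaving room for the
   terminating zero. Returns 0 on success, -1 if the buffer is too small. */
static int emit_sketch(char *out, size_t outsize, size_t *pos, const char *s)
{
    size_t n = strlen(s);
    if (*pos + n + 1 > outsize) return -1;
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = '\0';
    return 0;
}

/* Hypothetical sketch: translate a simple glob into an anchored regex.
   '*' and '?' do not match the separator; "**" matches any characters. */
int glob_to_regex_sketch(const char *glob, char sep, char *out, size_t outsize)
{
    size_t pos = 0;
    char not_sep[6];

    assert(glob != NULL && out != NULL && outsize > 0);
    out[0] = '\0';

    /* Build the class "[^\<sep>]"; the backslash keeps any separator literal. */
    not_sep[0] = '['; not_sep[1] = '^'; not_sep[2] = '\\';
    not_sep[3] = sep; not_sep[4] = ']'; not_sep[5] = '\0';

    if (emit_sketch(out, outsize, &pos, "^") != 0) return -1;

    for (const char *p = glob; *p != '\0'; p++) {
        char lit[3];
        int rc;

        if (p[0] == '*' && p[1] == '*') {
            rc = emit_sketch(out, outsize, &pos, ".*");   /* crosses separators */
            p++;                                          /* skip the second star */
        } else if (*p == '*') {
            rc = emit_sketch(out, outsize, &pos, not_sep);
            if (rc == 0) rc = emit_sketch(out, outsize, &pos, "*");
        } else if (*p == '?') {
            rc = emit_sketch(out, outsize, &pos, not_sep);
        } else if (strchr("\\^$.|+()[]{}", *p) != NULL) {
            lit[0] = '\\'; lit[1] = *p; lit[2] = '\0';    /* escape a metacharacter */
            rc = emit_sketch(out, outsize, &pos, lit);
        } else {
            lit[0] = *p; lit[1] = '\0';
            rc = emit_sketch(out, outsize, &pos, lit);
        }
        if (rc != 0) return -1;
    }
    return emit_sketch(out, outsize, &pos, "$");
}
```

Under these assumptions, the glob *.c with separator / becomes ^[^\/]*\.c$, so it matches main.c but not src/main.c, while src/**/a?.h becomes ^src/.*/a[^\/]\.h$, where the double star is free to span several path components.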
++POSIX defines two kinds of regular expression pattern: basic and extended. +These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or +PCRE2_CONVERT_POSIX_EXTENDED, respectively. +
++In POSIX patterns, backslash is not special in a character class. Unmatched +closing parentheses are treated as literals. +
++In basic patterns, ? + | {} and () must be escaped to be recognized +as metacharacters outside a character class. If the first character in the +pattern is * it is treated as a literal. ^ is a metacharacter only at the start +of a branch. +
++In extended patterns, a backslash not in a character class always +makes the next character literal, whatever it is. There are no backreferences. +
++Note: POSIX mandates that the longest possible match at the first matching +position must be found. This is not what pcre2_match() does; it yields +the first match that is found. An application can use pcre2_dfa_match() +to find the longest match, but that does not support backreferences (but then +neither do POSIX extended patterns). +
+
+Philip Hazel
+
+University Computing Service
+
+Cambridge, England.
+
+
+Last updated: 12 July 2017
+
+Copyright © 1997-2017 University of Cambridge.
+
+
+Return to the PCRE2 index page. +
diff --git a/doc/html/pcre2test.html b/doc/html/pcre2test.html index aaf8336..12ff36b 100644 --- a/doc/html/pcre2test.html +++ b/doc/html/pcre2test.html @@ -630,6 +630,10 @@ heavily used in the test files. bsr=[anycrlf|unicode] specify \R handling /B bincode show binary code without lengths callout_info show callout information + convert=<options> request foreign pattern conversion + convert_glob_escape=c set glob escape character + convert_glob_separator=c set glob separator character + convert_length set convert buffer length debug same as info,fullbincode framesize show matching frame size fullbincode show binary code with lengths @@ -1065,6 +1069,41 @@ are ignored (for the stacked copy), with a warning message, except for replace, which causes an error. Note that jitverify, which is allowed, does not carry through to any subsequent matching that uses a stacked pattern. + ++The experimental foreign pattern conversion functions in PCRE2 can be tested by +setting the convert modifier. Its argument is a colon-separated list of +options, which set the equivalent option for the pcre2_pattern_convert() +function: +
+ glob PCRE2_CONVERT_GLOB + glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR + glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR + posix_basic PCRE2_CONVERT_POSIX_BASIC + posix_extended PCRE2_CONVERT_POSIX_EXTENDED + unset Unset all options ++The "unset" value is useful for turning off a default that has been set by a +#pattern command. When one of these options is set, the input pattern is +passed to pcre2_pattern_convert(). If the conversion is successful, the +result is reflected in the output and then passed to pcre2_compile(). The +normal utf and no_utf_check options, if set, cause the +PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be passed to +pcre2_pattern_convert(). + +
+By default, the conversion function is allowed to allocate a buffer for its +output. However, if the convert_length modifier is set to a value greater +than zero, pcre2test passes a buffer of the given length. This makes it +possible to test the length check. +
++The convert_glob_escape and convert_glob_separator modifiers can be +used to specify the escape and separator characters for glob processing, +overriding the defaults, which are operating-system dependent.
@@ -1866,7 +1905,7 @@ Cambridge, England.
-Last updated: 02 July 2017
+Last updated: 12 July 2017
Copyright © 1997-2017 University of Cambridge.
diff --git a/doc/index.html.src b/doc/index.html.src
index 2a373f5..b9393d9 100644
--- a/doc/index.html.src
+++ b/doc/index.html.src
@@ -35,6 +35,9 @@ first.