Change Log for PCRE2
--------------------


Version 10.10 xx-xxx-2015
-------------------------

1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a capture
having happened, for example in the pattern /^(?:(a)|b)(?(1)A|B)/, is another
kind of back reference, but it was not setting the highest backreference
number. This mattered only if pcre2_match() was called with an ovector that
was too small to hold the capture, and there was no other kind of back
reference (a situation which is probably quite rare). The effect of the bug
was that the condition was always treated as FALSE when the capture could not
be consulted, leading to incorrect behaviour by pcre2_match(). This bug has
been fixed.

2. Functions for serialization and deserialization of sets of compiled
patterns have been added.

3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
excess code units at the end of the data block that may occasionally occur if
the code for calculating the size over-estimates. This change stops the
serialization code copying uninitialized data, to which valgrind objects. The
documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did
not include the general overhead. This has been corrected.

4. All code units in every slot in the table of group names are now set, again
in order to avoid accessing uninitialized data when serializing.

5. The (*NO_JIT) feature is implemented.

6. If a bug that caused pcre2_compile() to use more memory than allocated was
triggered when using valgrind, the code in (3) above passed a stupidly large
value to valgrind. This caused a crash instead of an "internal error" return.

7. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to overwriting.

8. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at compile time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is,
zero).


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Including Unicode support at build time is now enabled by default, but it
can optionally be disabled. It is not enabled by default at run time (no
change).

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that
is matched by that pattern.

4. For the benefit of those who use PCRE2 via some other application, that is,
not writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference
to a group containing one of these characters was greedily repeated, and
during the match a backtrack occurred, the subject might be backtracked by
the wrong number of code units. For example, if /^(\x{23a})\1*(.)/ is matched
caselessly (and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}",
group 2 should capture the final character, which is the three bytes E2, B1,
and A5 in UTF-8. Incorrect backtracking meant that group 2 captured only the
last two bytes. This bug has been fixed; the new code is slower, but it is
used only when the strings matched by the repetition are not all the same
length.

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?