Added a reproducer corpus and fixed the broken libicu code.
The buffer overflow could be triggered by psl_load(), psl_load_fp(),
psl_is_public_suffix(), psl_is_public_suffix2(), psl_unregistrable_domain(),
and psl_registrable_domain().
Using valgrind testing the fuzz corpora revealed a missing
check in _add_punycode_if_needed() which lead to a
"Uninitialised value was created by a stack allocation".
Thanks to OSS-fuzz for the corpora, thanks valgrind to find this
issue (asan and ubsan didn't find it).
Also add a new ./configure function to set a distribution wide
PSL file used by psl_latest(): --with-psl-distfile
If possible that filename should point to a DAFSA PSL file that
becomes updated regularly.
* fixing memory leaks
* proper handling of unterminated results of u8_tolower()
* second call to iconv() ensures flush of internal memory
* check more code paths of psl_str_to_utf8lower() via
tests/test-registrable-domain.c
I've talked to the good people on #debian-bootstrap who would be most
affected by the possible build-dep cycle, and i think the simplest
approach is actually to split out make_dafsa.py into its own
architecture-independent package.
I'm thinking i'll call the package psl-make-dafsa, and in the course of
shipping it, i'll place src/make_dafsa.py as /usr/bin/psl-make-dafsa.
This is because:
* debian discourages scripts on the $PATH from having language-specific
suffixes like .py:
https://lintian.debian.org/tags/script-with-language-extension.html
* "-" appears to be a more common delimiter in command names than "_":
0 dkg@alice:~$ for x in - _; do printf "%s: %d " "$x" $(ls -1 ${PATH//:/ } | grep -c "$x"); done; echo
-: 1235 _: 368
0 dkg@alice:~$
* i'd prefer to prefix the command with "psl-" since it really is
producing and interpreting PSL-specific data structures.
Accepting this patch would mean i'd have fewer changes to make in the
debian packaging, and would allow other distributors to take a similar
approach if they want to.