Commit Graph

246 Commits

Author SHA1 Message Date
Tim Rühsen 6244c731b9 Extend visibility support
Also renamed PSL_PUBLIC -> PSL_API (conforming to other known libraries).
2018-04-19 10:07:10 +02:00
Chun-wei Fan 859b307593 src/psl.c: Make code compilable on native Windows
Include the Windows/Winsock2 counterparts of the networking headers on
Windows and avoid including *NIX-specific headers on Windows.

Also remove the small bits of C99isms from the code.
2018-04-19 09:37:37 +02:00
Emmanuele Bassi c9a550e93c Fix non-srcdir builds
We need to modify the various paths we reference to include the build
directory, in the case when the build directory is not the source
directory.
2018-04-06 18:26:07 +01:00
Daniel Kahn Gillmor 2c66c15a99 fix spelling errors 2018-03-23 11:33:55 +00:00
Tim Rühsen f7e0d9441a Fix --disable-builtin configure option 2018-03-05 11:25:06 +01:00
Tim Rühsen e0c1ed0e7a Improve docs for PSL_TYPE_NO_STAR_RULE 2018-02-26 11:45:57 +01:00
Tim Rühsen 8fd480584e Fix PSL_TYPE_NO_STAR_RULE and improve test suite
Reported-by: Daniel Kahn Gillmor
2018-02-23 12:09:07 +01:00
Tim Rühsen 43ec750b40 Update copyrights 2018-02-22 10:04:00 +01:00
Tim Rühsen aaacdae977 Add TLDs to (DAFSA) data to allow skipping the star rule 2018-02-21 20:49:26 +01:00
Claudio Saavedra 9e9341f5b9 psl_is_public_suffix2(): allow checking for suffixes not in the list
Add a PSL_TYPE_NO_STAR_RULE type to check for suffixes without the '*'
rule. This allows checking for suffixes that are not in the PSL.
2018-02-21 17:11:01 +02:00
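The effect of the '*' rule described in commit 9e9341f5b9 can be illustrated with a minimal, self-contained sketch. It assumes the usual Public Suffix List convention that a name matching no explicit rule falls under an implicit '*' rule, so any single-label name counts as a public suffix. The flag `SKETCH_NO_STAR_RULE`, the toy rule table, and the function name are hypothetical and only stand in for libpsl's real data and its PSL_TYPE_NO_STAR_RULE type; this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative flag only; libpsl's real constant is PSL_TYPE_NO_STAR_RULE. */
#define SKETCH_NO_STAR_RULE 1

/* Toy stand-in for the suffix list (assumption, not real PSL data). */
static const char *const sketch_rules[] = { "com", "co.uk", "uk" };

/* Hypothetical helper: returns 1 if domain is a public suffix, else 0. */
static int is_public_suffix_sketch(const char *domain, int flags)
{
	size_t i;

	assert(domain != NULL);

	for (i = 0; i < sizeof(sketch_rules) / sizeof(sketch_rules[0]); i++) {
		if (strcmp(domain, sketch_rules[i]) == 0)
			return 1; /* explicit rule matched */
	}

	/* No explicit rule matched. The implicit '*' rule treats any
	 * single-label name as a public suffix unless the caller opts out. */
	if (strchr(domain, '.') == NULL && !(flags & SKETCH_NO_STAR_RULE))
		return 1;

	return 0;
}
```

With the flag set, an unlisted name such as "example" is no longer reported as a public suffix, while listed suffixes such as "com" still are.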
Tim Rühsen 819486edd1 Remove C99 loop construct 2018-02-21 15:56:58 +01:00
Tim Rühsen 179ca703b2 Limit CPU wasting on large inputs
Large inputs on psl_registrable_domain() and psl_unregistrable_domain()
suffer from O(N^2) behavior. This change limits N to avoid excessive
CPU usage.

At the same time we limit the fuzz corpora size to 64k which is far more
than we expect any real life domain to be.

Reported-by: OSS-Fuzz
2018-02-13 15:42:17 +01:00
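Commit 179ca703b2 does not spell out how N is limited. One plausible way to bound such a lookup, sketched here under the assumption that only the rightmost few labels of a domain matter for a suffix match, is to skip ahead to a fixed number of trailing labels before running the quadratic loop. The cap of 8 labels and the helper name are assumptions made for illustration, not values taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS_SKETCH 8 /* assumed cap, chosen for illustration */

/* Hypothetical helper: return a pointer to the start of the rightmost
 * MAX_LABELS_SKETCH labels of domain, found in one backward pass. */
static const char *tail_labels_sketch(const char *domain)
{
	const char *p;
	int dots = 0;

	assert(domain != NULL);

	for (p = domain + strlen(domain); p > domain; p--) {
		/* p[-1] is the byte just before p; stop after the cap-th dot. */
		if (p[-1] == '.' && ++dots == MAX_LABELS_SKETCH)
			return p;
	}
	return domain; /* fewer dots than the cap: keep the whole name */
}
```

Any later per-label work then runs over at most 8 labels, so its cost no longer grows with the length of a hostile input.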
Tim Rühsen 1c44781718 Fix unsigned integer overflow in _mem_is_ascii()
Found by OSS-Fuzz. It has no impact.
2017-11-03 12:10:05 +01:00
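The kind of harmless unsigned wraparound that commit 1c44781718 refers to can be sketched as follows. This is a guess at the general pattern, not the code that was changed: a loop written as `while (n--)` decrements `n` one final time after it reaches 0, which wraps to SIZE_MAX. That is well-defined in C, but UBSan's optional unsigned-integer-overflow check reports it. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not libpsl's actual code.
 * Returns 1 if the first n bytes of s are all 7-bit ASCII, else 0. */
static int mem_is_ascii_sketch(const char *s, size_t n)
{
	assert(s != NULL || n == 0);

	/* The for loop tests n before decrementing it, so n never goes
	 * below 0 and therefore never wraps around. */
	for (; n; n--) {
		if ((unsigned char)*s++ >= 128)
			return 0;
	}
	return 1;
}
```

Both loop forms visit the same bytes and return the same result, which is consistent with the commit's note that the overflow had no impact.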
Tim Rühsen 4e51142022 psl_*_count() return -1 if information is not available 2017-09-15 17:14:32 +02:00
Tim Rühsen c7a48a0bf8 Amend start of comments in lookup_string_in_fixed_set.c 2017-09-15 17:14:32 +02:00
Tim Rühsen a12bd1d2a6 Fix input encoding for python3 2017-09-14 20:25:59 +02:00
Darshit Shah 4d5982ed98 Add new function psl_free_string()
When writing a wrapper around libpsl in a different language, it is
important that libpsl provide functions to free any memory that it
allocates. Without this, it is impossible to correctly free the memory
allocated by the psl_str_to_utf8lower() function, since in other languages
one may not have access to the same free() call from libc.
2017-08-30 11:07:04 +02:00
Tim Rühsen 659ee4391e Remove compiler warnings 2017-07-20 11:36:13 +02:00
Tim Rühsen a6e4703318 Fix oss-fuzz issue #2600 (buffer overflow in libicu code)
Added a reproducer corpus and fixed the broken libicu code.
The buffer overflow could be triggered by psl_load(), psl_load_fp(),
psl_is_public_suffix(), psl_is_public_suffix2(), psl_unregistrable_domain(),
and psl_registrable_domain().
2017-07-13 15:40:58 +02:00
Tim Rühsen 926cc34ade Fix uninitialised value created by stack allocation
Testing the fuzz corpora with valgrind revealed a missing
check in _add_punycode_if_needed() which led to an
"Uninitialised value was created by a stack allocation".

Thanks to OSS-Fuzz for the corpora, and thanks to valgrind for finding this
issue (asan and ubsan didn't find it).
2017-07-09 20:21:55 +02:00
Tim Rühsen 492c884d7d Fix memory overflow in LIBICU code of psl_str_to_utf8lower()
Immediately discovered with the new --enable-asan / --enable-ubsan
configure options, thanks to the fuzz corpora.
2017-06-20 16:30:29 +02:00
Tim Rühsen d686c1fff7 Fix memleak in _psl_is_public_suffix() (found by fuzzing) 2017-06-13 22:24:18 +02:00
Tim Rühsen e584007f42 * src/psl.c (psl_str_to_utf8lower): Fix docs 2017-06-12 17:00:53 +02:00
Tim Rühsen 045bf63031 Fix double free in psl_load_fp(), found by fuzzing 2017-06-09 22:53:19 +02:00
Tim Rühsen a33feb8ff4 Fix typos found by ka7/misspell_fixer 2017-04-19 11:46:27 +02:00
Tim Rühsen 448f6e4564 Fix order of files in psl_latest()
If 'dist_filename' and 'filename' are given and both have the same
age, we want 'dist_filename' (expected DAFSA) to be loaded.
2017-02-21 12:18:29 +01:00
Tim Rühsen eda8276b5f Use NON-TRANSITIONAL toASCII() with libicu 2017-01-16 10:47:21 +01:00
Tim Rühsen 26d0856d0a Fix typo 2017-01-16 10:26:12 +01:00
Tim Rühsen 526768cc5d Use TR46 non-transitional with libidn2 >= 0.14
I changed my mind after talking with the cURL
maintainer Daniel Stenberg.
See https://github.com/curl/curl/pull/1207
2017-01-14 15:47:44 +01:00
Tim Rühsen 2c17d56234 Use TR46 transitional with libidn2 >= 0.14 2017-01-03 12:30:43 +01:00
Tim Rühsen ff29f13d8f Add functions psl_latest() and psl_dist_filename()
Also add a new ./configure option to set a distribution-wide
PSL file used by psl_latest(): --with-psl-distfile
If possible, that filename should point to a DAFSA PSL file that
is updated regularly.
2016-12-06 20:16:12 +01:00
Tim Rühsen deabd4a546 Replace psl2c by psl-make-dafsa
Removed --input-format from psl-make-dafsa.
Added --output-format=cxx+ to psl-make-dafsa.
Removed psl2c.
2016-12-06 15:22:18 +01:00
Tim Rühsen 6490b8214b Don't taint out variable on error in psl_str_to_utf8lower()
Fixes #71
2016-12-05 16:28:47 +01:00
Tim Rühsen b9e04d6958 Update copyright year 2016-12-05 15:03:27 +01:00
Tim Rühsen 2a3a743643 Fix typo Publix -> Public 2016-12-05 15:01:27 +01:00
Frederic Cambus 9f0b09e830 Missing includes for in6_addr / AF_INET*, fixes compilation on OpenBSD 2016-12-02 19:11:18 +01:00
Tim Rühsen 65e785e1ca Replace NFCK -> NFKC in the docs 2016-11-29 14:49:35 +01:00
Tim Rühsen 5d32b80077 Make API docs more detailed 2016-11-14 12:08:20 +01:00
Olle Liljenzin 3f276c7d1e Fix psl-make-dafsa to work with python3 2016-11-12 21:21:28 +01:00
Tim Rühsen 761d938d2a Fix name of Olle Liljenzin in src/psl-make-dafsa.1 2016-11-06 22:47:33 +01:00
Olle Liljenzin 3a4dff8805 Fixed documentation and error message to match the actual code. 2016-11-06 16:26:44 +01:00
Tim Rühsen 2c871b1306 Skip conversion in _psl_is_public_suffix() for builtin psl context 2016-11-06 11:59:36 +01:00
Tim Rühsen 44e6bd4eb8 src/psl2c.c: Also include UTF-8 into DAFSA output 2016-11-06 11:30:20 +01:00
Tim Rühsen 3211a66f00 Put punycode + UTF-8 rules into DAFSA in utf-8 mode 2016-11-06 11:30:20 +01:00
Tim Rühsen 3ac807d987 Add --encoding to psl-make-dafsa man page 2016-11-05 10:37:01 +01:00
Tim Rühsen 4b42762cbf Skip punycode conversion for _psl_is_public_suffix() if data contains UTF-8 rules 2016-11-05 10:37:01 +01:00
Olle Liljenzin 86034ac7c9 Added function to the parser for reading DAFSA encoding mode. 2016-11-05 10:37:01 +01:00
Olle Liljenzin 8c2bcd5a24 Added version info into generated DAFSA.
psl-make-dafsa got a mode switch so that the old version can be
generated for testing.
2016-11-05 10:01:54 +01:00
Olle Liljenzin e03953e27a Updated DAFSA generator and parser to support UTF-8 encoding 2016-11-05 10:01:54 +01:00
Tim Rühsen 598a78b2de Add better test code coverage 2016-09-26 15:15:34 +02:00
Tim Rühsen 5ebc24f0e0 Code cleanup in libidn2 branch of _psl_idna_toASCII()
Reported-by: https://github.com/daurnimator
2016-09-26 10:13:43 +02:00
Tim Rühsen 7eb8592035 Let u8_tolower() allocate the result buffer.
Reported-by: https://github.com/daurnimator
2016-09-25 19:44:33 +02:00
Tim Rühsen 32543dd5a5 Avoid unneeded memory allocations in psl_str_to_utf8lower()
Reported-by: https://github.com/daurnimator
2016-09-25 12:49:56 +02:00
Tim Rühsen 1baaacccd5 Fix libidn/libidn2 code path of psl_str_to_utf8lower()
* fixing memory leaks
* proper handling of unterminated results of u8_tolower()
* second call to iconv() ensures flush of internal memory
* check more code paths of psl_str_to_utf8lower() via
  tests/test-registrable-domain.c
2016-09-23 12:35:08 +02:00
Tim Rühsen e2812e8c4c Check return value for strdup and strndup
Fixes #60
Reported-by: https://github.com/daurnimator
2016-09-22 15:53:31 +02:00
Tim Rühsen 351b3fb912 Remove redundant define of countof() 2016-09-22 11:37:23 +02:00
Tim Rühsen 9e1ca81be4 Remove memory allocations from _utf8_to_utf32()
Reported-by: https://github.com/daurnimator
2016-09-22 11:19:52 +02:00
Tim Rühsen 6cfb33e530 Amend API docs to be more precise about invalid input.
Fixes #59
Reported-by: https://github.com/daurnimator
2016-09-21 12:03:00 +02:00
Tim Rühsen 10f7b5fe7c Fallback to malloc from alloca for larger memory chunks
Fixes #58
Reported-by: https://github.com/daurnimator
2016-09-21 11:54:39 +02:00
Tim Rühsen 1ab7be5641 Check malloc/realloc results in src/psl.c
Fixes #57
Reported-by: https://github.com/daurnimator
2016-09-21 11:15:43 +02:00
Dagobert Michelsen 7983f86820 Use proper library path and libs for ICU 2016-09-17 14:46:06 +02:00
Tim Rühsen 126d2dca9c Package and install psl.1 and psl-make-dafsa.1
Fixes #53
Reported-by: https://github.com/yselkowitz
2016-09-17 14:46:00 +02:00
Jeremy Ehrhardt 003dec4203 Change src/psl-make-dafsa shebang so it'll run on OS X 2016-09-16 18:42:54 -07:00
Daniel Kahn Gillmor dc7bf5bbae rename src/make_dafsa.py to src/psl-make-dafsa, add documentation
I've talked to the good people on #debian-bootstrap who would be most
affected by the possible build-dep cycle, and i think the simplest
approach is actually to split out make_dafsa.py into its own
architecture-independent package.

I'm thinking i'll call the package psl-make-dafsa, and in the course of
shipping it, i'll place src/make_dafsa.py as /usr/bin/psl-make-dafsa.

This is because:

 * debian discourages scripts on the $PATH from having language-specific
   suffixes like .py:

    https://lintian.debian.org/tags/script-with-language-extension.html

 * "-" appears to be a more common delimiter in command names than "_":

    0 dkg@alice:~$ for x in - _; do printf "%s: %d " "$x" $(ls -1 ${PATH//:/ } | grep -c "$x"); done; echo
    -: 1235 _: 368
    0 dkg@alice:~$

 * i'd prefer to prefix the command with "psl-" since it really is
   producing and interpreting PSL-specific data structures.

Accepting this patch would mean i'd have fewer changes to make in the
debian packaging, and would allow other distributors to take a similar
approach if they want to.
2016-07-14 11:55:04 +02:00
Tim Rühsen 8dba092c73 Add magic header to DAFSA binary files 2016-07-13 11:14:18 +02:00
Tim Rühsen 852931571f Fixed invocation of make_dafsa.py in psl2c.c 2016-07-13 11:13:04 +02:00
Daniel Kahn Gillmor dc9cc02982 s/publix/public/ 2016-07-06 15:32:51 +02:00
Daniel Kahn Gillmor 248327e4aa use https where possible 2016-07-06 15:32:51 +02:00
Tim Rühsen 2914afa8c7 New linter/ dir with pslint.py selftest 2016-02-18 16:40:06 +01:00
Tim Rühsen 811513f17e Print message and exit when no suffixes are found 2016-02-12 12:27:25 +01:00
Tim Rühsen d19c46c003 Make a few enhancements to pslint 2016-02-08 14:11:52 +01:00
Tim Rühsen 36609787d5 Fix python3 UTF-8 runtime error and section detection 2016-02-08 09:40:43 +01:00
Tim Rühsen 568394438d Add disabled code for 'Group Order' checking
The check has been disabled since it turned out that those
'groupings' of PSL entries are not really ordered in the way
(# of labels, TLD, sublabel#1, sublabel#2, ...)

This commit also fixes section detection / verification
2016-02-05 12:16:50 +01:00
Tim Rühsen aa028e606b Adjust text in doublette comment in src/pslint.py 2016-02-02 22:49:02 +01:00
Tim Rühsen a46af675b4 Fix indentation multi-line comment in src/pslint.py 2016-02-02 22:41:18 +01:00
Tim Rühsen bd70c79c18 Indent src/pslint.py with tabs 2016-02-02 22:20:58 +01:00
Tim Rühsen 98aed19c3a Convert copyright line to UTF-8 in pslint.py 2016-02-02 19:59:45 +01:00
Tim Rühsen 3ba8903915 Add PSL linter written in Python 2016-02-02 16:43:03 +01:00
Tim Rühsen 8c39291f55 Slightly shorter DAFSA array when sorting input 2016-01-05 10:57:07 +01:00
Tim Rühsen 1bd9347af9 Fix for commit fd928da46e 2016-01-04 22:15:43 +01:00
Tim Rühsen fd928da46e Fix python3 incompatibilities in make_dafsa.py 2016-01-04 20:22:13 +01:00
Tim Rühsen 95a5152e56 Update copyright year to 2016 2016-01-02 13:36:49 +01:00
Tim Rühsen 96e0848d81 Release unused memory after loading DAFSA data 2016-01-02 13:31:53 +01:00
Tim Rühsen 748e3ae9cc Load DAFSA precompiled files (auto-detection) 2016-01-01 22:38:21 +01:00
Tim Rühsen 1604cb3dca Fix make_dafsa.py to generate 4 bit return values 2016-01-01 22:32:11 +01:00
Tim Rühsen 23345f5f37 Convert lookup_string_in_fixed_set.c into UTF-8 2016-01-01 22:31:01 +01:00
Tim Rühsen c9d76e4898 Remove unused variable source_date_epoch 2016-01-01 17:20:30 +01:00
Tim Rühsen cde5e53ea6 Remove psl_builtin_compile_time() for reproducible builds 2016-01-01 15:44:24 +01:00
Tim Rühsen c699e3c441 Add --input-format and --output-format to make_dafsa.py 2015-12-30 17:52:48 +01:00
Tim Rühsen 355edc152f Fix for previous commit 2015-12-29 17:20:28 +01:00
Tim Rühsen 82e9445493 Add psl2c --binary to create DAFSA binary file from PSL 2015-12-29 16:53:47 +01:00
Tim Rühsen 5363290cbe Remove debugging printf 2015-12-26 14:29:10 +01:00
Tim Rühsen 093d5eac3d Fix ./configure --disable-runtime
Added runtime punycode generation code from
  http://www.nicemice.net/idn/punycode-spec.gz
2015-12-26 14:15:08 +01:00
Tim Rühsen e252af877f Fix ./configure --disable-builtin 2015-12-15 20:46:25 +01:00
Daniel Kahn Gillmor 01a3751524 re-fix psl_builtin_outdated() 2015-12-11 22:59:15 -05:00
Tim Rühsen 0ca3741df6 Use DAWG/DAFSA format for builtin data
This data representation reduces the size of the PSL data
drastically and still allows fast lookups.
2015-12-09 09:35:04 +01:00
Tim Rühsen 36139b601d Merge branch 'develop' into dafsa 2015-12-07 10:33:44 +01:00
Tim Rühsen 9d2e93f0b8 New function psl_is_public_suffix2()
The current PSL has two sections, ICANN and PRIVATE.
This new function allows limiting the check to one or both
of these sections.
2015-12-06 21:55:56 +01:00
Tim Rühsen 883e67f008 Create src/suffixes_dafsa.c with DAFSA C array 2015-12-04 21:26:30 +01:00
Tim Rühsen aa0593460c Remove .travis.yml from branch 2015-12-04 17:15:03 +01:00