rename src/make_dafsa.py to src/psl-make-dafsa, add documentation

I've talked to the good people on #debian-bootstrap who would be most
affected by the possible build-dep cycle, and i think the simplest
approach is actually to split out make_dafsa.py into its own
architecture-independent package.

I'm thinking i'll call the package psl-make-dafsa, and in the course of
shipping it, i'll place src/make_dafsa.py as /usr/bin/psl-make-dafsa.

This is because:

 * debian discourages scripts on the $PATH from having language-specific
   suffixes like .py:

    https://lintian.debian.org/tags/script-with-language-extension.html

 * "-" appears to be a more common delimiter in command names than "_":

    0 dkg@alice:~$ for x in - _; do printf "%s: %d " "$x" $(ls -1 ${PATH//:/ } | grep -c "$x"); done; echo
    -: 1235 _: 368
    0 dkg@alice:~$

 * i'd prefer to prefix the command with "psl-" since it really is
   producing and interpreting PSL-specific data structures.
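
For anyone who wants to repeat the delimiter count without the shell
one-liner, here is a rough Python sketch. The function name
count_delimiters is made up for illustration. Unlike the ls/grep pipeline
above, it counts each command name only once even if it appears in several
$PATH directories, so the numbers will not match exactly.

```python
import os


def count_delimiters(directories, delimiters=("-", "_")):
    """Count distinct file names that contain each delimiter.

    `directories` is a list of directory paths, for example the
    entries of $PATH. Unreadable or missing directories are skipped.
    """
    names = set()
    for directory in directories:
        try:
            names.update(os.listdir(directory))
        except OSError:
            # Missing or unreadable $PATH entries are common; ignore them.
            continue
    return {d: sum(1 for name in names if d in name) for d in delimiters}


if __name__ == "__main__":
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    print(count_delimiters(path_dirs))
```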

Accepting this patch would mean i'd have fewer changes to make in the
debian packaging, and would allow other distributors to take a similar
approach if they want to.
Daniel Kahn Gillmor 2016-07-14 11:52:56 +02:00
parent 8dba092c73
commit dc7bf5bbae
7 changed files with 46 additions and 10 deletions


@@ -78,7 +78,7 @@ representation of strings. Here we use it to reduce the whole PSL to about 32k i
Generate `psl.dafsa` from `list/public_suffix_list.dat`
-$ src/make_dafsa.py --output-format=binary --input-format=psl list/public_suffix_list.dat psl.dafsa
+$ src/psl-make-dafsa --output-format=binary --input-format=psl list/public_suffix_list.dat psl.dafsa
Test the result (example)
@@ -90,7 +90,7 @@ License
Libpsl is made available under the terms of the MIT license.<br>
See the LICENSE file that accompanies this distribution for the full text of the license.
-src/make_dafsa.py and src/lookup_string_in_fixed_set.c are licensed under the term written in
+src/psl-make-dafsa and src/lookup_string_in_fixed_set.c are licensed under the term written in
src/LICENSE.chromium.
Building from git


@@ -72,7 +72,7 @@ AS_IF([ test "$enable_man" != no ], [
AC_MSG_RESULT([no])
])
-# src/make_dafsa.py needs python 2.7+
+# src/psl-make-dafsa needs python 2.7+
AM_PATH_PYTHON([2.7])
PKG_PROG_PKG_CONFIG


@@ -1,5 +1,5 @@
* The following License is for the source code files
-make_dafsa.py and lookup_string_in_fixed_set.c.
+psl-make-dafsa and lookup_string_in_fixed_set.c.
// Copyright 2015 The Chromium Authors. All rights reserved.
//


@@ -22,7 +22,7 @@ endif
noinst_PROGRAMS = psl2c
psl2c_SOURCES = psl2c.c lookup_string_in_fixed_set.c
-psl2c_CPPFLAGS = -I$(top_srcdir)/include -DMAKE_DAFSA=\"$(top_srcdir)/src/make_dafsa.py\"
+psl2c_CPPFLAGS = -I$(top_srcdir)/include -DMAKE_DAFSA=\"$(top_srcdir)/src/psl-make-dafsa\"
if BUILTIN_GENERATOR_LIBICU
psl2c_LDADD = -licuuc
endif
@@ -38,4 +38,4 @@ endif
suffixes_dafsa.c: $(PSL_FILE) psl2c$(EXEEXT)
./psl2c$(EXEEXT) "$(PSL_FILE)" suffixes_dafsa.c
-EXTRA_DIST = make_dafsa.py LICENSE.chromium
+EXTRA_DIST = psl-make-dafsa LICENSE.chromium


@@ -118,14 +118,14 @@ static int GetReturnValue(const unsigned char* offset,
* Looks up the string |key| with length |key_length| in a fixed set of
* strings. The set of strings must be known at compile time. It is converted to
* a graph structure named a DAFSA (Deterministic Acyclic Finite State
- * Automaton) by the script make_dafsa.py during compilation. This permits
- * efficient (in time and space) lookup. The graph generated by make_dafsa.py
+ * Automaton) by the script psl-make-dafsa during compilation. This permits
+ * efficient (in time and space) lookup. The graph generated by psl-make-dafsa
* takes the form of a constant byte array which should be supplied via the
* |graph| and |length| parameters. The return value is kDafsaNotFound,
* kDafsaFound, or a bitmap consisting of one or more of kDafsaExceptionRule,
* kDafsaWildcardRule and kDafsaPrivateRule ORed together.
*
- * Lookup a domain key in a byte array generated by make_dafsa.py.
+ * Lookup a domain key in a byte array generated by psl-make-dafsa.
*/
/* prototype to skip warning with -Wmissing-prototypes */
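
The comment in this hunk says the lookup returns kDafsaNotFound, kDafsaFound, or a bitmap of rule flags. A minimal sketch of how a caller might decode such a value follows. The numeric values are assumptions modelled on the Chromium original and are not shown anywhere in this diff, and describe_result is a hypothetical helper, not part of libpsl.

```python
# ASSUMED values; this diff does not show the actual definitions.
K_DAFSA_NOT_FOUND = -1
K_DAFSA_FOUND = 0
K_DAFSA_EXCEPTION_RULE = 1
K_DAFSA_WILDCARD_RULE = 2
K_DAFSA_PRIVATE_RULE = 4

_FLAGS = (
    (K_DAFSA_EXCEPTION_RULE, "exception"),
    (K_DAFSA_WILDCARD_RULE, "wildcard"),
    (K_DAFSA_PRIVATE_RULE, "private"),
)


def describe_result(value):
    """Turn a lookup return value into a list of human-readable labels."""
    # Check the sentinel first: -1 has every bit set, so testing the
    # flag bits on it would wrongly report all three rules.
    if value == K_DAFSA_NOT_FOUND:
        return ["not found"]
    labels = [name for bit, name in _FLAGS if value & bit]
    # A found key with no rule bits set is a plain match.
    return labels or ["found"]


if __name__ == "__main__":
    for sample in (-1, 0, 2, 6):
        print(sample, describe_result(sample))
```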


@@ -418,7 +418,7 @@ def encode(dafsa):
def to_cxx(data):
"""Generates C++ code from a list of encoded bytes."""
text = '/* This file is generated. DO NOT EDIT!\n\n'
-text += 'The byte array encodes effective tld names. See make_dafsa.py for'
+text += 'The byte array encodes effective tld names. See psl-make-dafsa source for'
text += ' documentation.'
text += '*/\n\n'
text += 'static const unsigned char kDafsa[%s] = {\n' % len(data)

src/psl-make-dafsa.1 Normal file

@@ -0,0 +1,36 @@
.TH PSL "1" "July 2016" "psl 0.13.0" "User Commands"
.SH NAME
psl-make-dafsa \- generate a compact and optimized DAFSA from a Public Suffix List
.SH SYNOPSIS
.B psl-make-dafsa
[\fI\,options\/\fR] \fIinfile\fR \fIoutfile\fR
.SH DESCRIPTION
\fBpsl-make-dafsa\fR produces C/C++ code or an
architecture-independent binary object that represents a Deterministic
Acyclic Finite State Automaton (DAFSA) from a textual representation
of a Public Suffix List. Input and output files must be specified on
the command line.
This compact representation enables optimized queries of the list,
saving both time and space when compared to searches of human-readable
representations.
.SH OPTIONS
The format of the data read and written by \fBpsl-make-dafsa\fR
depends on options passed to it.
.br
.TP
\fB\-\-input\-format=\fR[\fIpsl2c\fR|\fIpsl\fR]
\fBpsl2c\fR: (default) input is C code generated by libpsl/psl2c
.br
\fBpsl\fR: input is standard textual Public Suffix List file
.TP
\fB\-\-output\-format=\fR[\fIcxx\fR|\fIbinary\fR]
\fBcxx\fR: (default) output is C/C++ code
.br
\fBbinary\fR: output is an architecture-independent binary format
.SH SEE ALSO
.IR https://publicsuffix.org/ ", " https://github.com/rockdaboot/libpsl
.SH COPYRIGHT
\fBpsl-make-dafsa\fR was originally part of the Chromium project, and
has been modified by Tim Ruehsen and Daniel Kahn Gillmor. The code
and its documentation are governed by a BSD-style license.
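
The synopsis and options documented in the man page above could be driven from a build script roughly as sketched below. The command name, the two options, and their values come from the man page; the helper build_command is hypothetical, and the sketch assumes psl-make-dafsa is installed on the $PATH as the commit message proposes.

```python
import shlex

# Values documented in psl-make-dafsa.1; the first of each pair is the default.
INPUT_FORMATS = ("psl2c", "psl")
OUTPUT_FORMATS = ("cxx", "binary")


def build_command(infile, outfile, input_format="psl2c", output_format="cxx"):
    """Return an argv list following: psl-make-dafsa [options] infile outfile."""
    if input_format not in INPUT_FORMATS:
        raise ValueError("unknown input format: %r" % (input_format,))
    if output_format not in OUTPUT_FORMATS:
        raise ValueError("unknown output format: %r" % (output_format,))
    return [
        "psl-make-dafsa",
        "--input-format=" + input_format,
        "--output-format=" + output_format,
        infile,
        outfile,
    ]


if __name__ == "__main__":
    # Same conversion as the README example: textual PSL to binary DAFSA.
    cmd = build_command(
        "list/public_suffix_list.dat", "psl.dafsa",
        input_format="psl", output_format="binary",
    )
    # Only print the command; pass the list to subprocess.run(cmd, check=True)
    # to execute it where the tool is actually installed.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing both options explicitly, even when they match the documented defaults, keeps the generated command readable in build logs.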