diff --git a/README.md b/README.md
index 0a3aea8..d8f0fbf 100644
--- a/README.md
+++ b/README.md
@@ -55,6 +55,32 @@ flawfinder (including its various options) and related information (such as
 how it supports CWE). For example, the `--html` option generates output in HTML
 format. The `--help` option gives a brief list of options.
 
+# Character Encoding Errors
+
+Flawfinder must be able to correctly interpret your source code's
+character encoding.
+In the vast majority of cases this is not a problem, especially
+if the source code is correctly encoded using UTF-8 and your system
+is configured to use UTF-8 (the most common situation by far).
+
+However, it's possible for flawfinder to halt if there is a
+character encoding problem and you're running Python3.
+The usual symptom is error messages like this:
+`Error: encoding error in FILENAME 'ENCODING' codec can't decode byte ... in position ...: invalid start byte`
+
+Unfortunately, Python3 fails to provide useful built-ins to deal with this.
+Thus, it's non-trivial to deal with this problem without depending on external
+libraries (which we're trying to avoid).
+
+If you have this problem, see the flawfinder manual page for a collection
+of various solutions.
+One of the simplest is to convert the source code and system
+configuration to UTF-8.
+You can convert source code to UTF-8 using tools such as the
+system tool `iconv` or the Python program
+[`cvt2utf`](https://pypi.org/project/cvt2utf/);
+you can install `cvt2utf` using `pip install cvt2utf`.
+
 # Under the hood
 
 More technically, flawfinder uses lexical scanning to find tokens
diff --git a/flawfinder.1 b/flawfinder.1
index 992e789..07acc1e 100644
--- a/flawfinder.1
+++ b/flawfinder.1
@@ -664,7 +664,10 @@ You may be able to replace "C" after LC_ALL= with your real language locale
 
 Option #4: Convert the encoding of the files to be analyzed so that it's
 a single encoding - it's highly recommended to convert to UTF-8.
-For example, the program "iconv" can be used to convert encodings.
+For example, the system program "iconv"
+or the Python program cvt2utf
+can be used to convert encodings.
+(You can install cvt2utf with "pip install cvt2utf".)
 This works well if some files have one encoding, and some have another,
 but they are consistent within a single file.
 If the files have encoding errors, you'll have to fix them.
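
The conversion that Option #4 describes can be sketched in a few lines of Python. This is only an illustration placed after the diff, not part of the patch, and it does not claim to show how flawfinder, `iconv`, or `cvt2utf` work internally. The helper name `convert_to_utf8` is hypothetical, and the sketch assumes the caller already knows each file's current encoding.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding):
    """Rewrite the file at `path` as UTF-8.

    Hypothetical helper for illustration: it assumes the file is currently
    encoded in `source_encoding`, which the caller must already know.
    """
    file_path = Path(path)
    raw = file_path.read_bytes()
    # Decoding raises UnicodeDecodeError if the bytes are not valid in
    # source_encoding; in that case the file is left untouched.
    text = raw.decode(source_encoding)
    file_path.write_bytes(text.encode("utf-8"))
```

A call such as `convert_to_utf8("legacy.c", "iso-8859-1")` would rewrite `legacy.c` in place, so it may be worth working on a copy or under version control. Files that mix encodings internally still need to be fixed by hand, as the manual page text notes.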