Add better error message on encoding problems

Python3 doesn't provide easy-to-use built-in libraries to deal
with common encoding issues (e.g., Windows-1252-encoded characters
in a UTF-8 stream), so when we see an encoding error,
print better information on how to deal with it, along with
a pointer to more detailed information in the documentation.
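The failure mode is easier to see with concrete bytes. In the minimal sketch below, the helper name describe_decode_error and the sample C source are hypothetical, not part of flawfinder. It shows that a single Windows-1252 byte makes strict UTF-8 decoding raise UnicodeDecodeError, while the same bytes decode cleanly as Windows-1252 ("cp1252" in Python's codec names).

```python
def describe_decode_error(data: bytes, encoding: str = "utf-8"):
    """Return None if data decodes cleanly, else (offset, reason, message)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte; str(err)
        # is the same text that print(err) shows in the patched code.
        return err.start, err.reason, str(err)
    return None


# Hypothetical C source: "é" saved by a Windows-1252 editor is the single
# byte 0xE9. In UTF-8, 0xE9 must begin a 3-byte sequence, so the space
# that follows it is rejected as an invalid continuation byte.
sample = b"/* caf\xe9 */\nint main(void) { return 0; }\n"

print(describe_decode_error(sample))            # rejected as UTF-8
print(describe_decode_error(sample, "cp1252"))  # None
print(sample.decode("cp1252").splitlines()[0])  # /* café */
```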

Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>
David A. Wheeler 2019-09-22 15:22:17 -04:00
parent f1fdd59da5
commit fe78940e6f
1 changed file with 13 additions and 1 deletion

@@ -1505,7 +1505,19 @@ def process_c_file(f, patch_infos):
             print("Examining", f)
         sys.stdout.flush()
-    text = "".join(my_input.readlines())
+    # Python3 is often configured to use only UTF-8, and presumes
+    # that inputs cannot have encoding errors.
+    # The real world isn't like that, so provide a prettier warning
+    # in such cases - with some hints on how to solve it.
+    try:
+        text = "".join(my_input.readlines())
+    except UnicodeDecodeError as err:
+        print('Error: encoding error in', h(f))
+        print(err)
+        print('Run as PYTHONUTF8=0 LC_ALL=C.ISO-8859-1 python3 flawfinder,')
+        print('convert source code to UTF-8, or run flawfinder using python2.')
+        print('See documentation for more information.')
+        sys.exit(15)
     i = 0
     while i < len(text):
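The new error message suggests workarounds. PYTHONUTF8=0 turns off Python's UTF-8 mode, and an ISO-8859-1 locale is intended to make the default text encoding a single-byte one. Whether that works depends on the locale being available on the system. The sketch below uses explicit encoding= arguments in place of locale settings, which is an assumption made for illustration, to show why a single-byte encoding avoids the exception and what converting the file to UTF-8 amounts to. read_text and convert_to_utf8 are hypothetical helpers, not flawfinder functions.

```python
import io


def read_text(raw: bytes, encoding: str) -> str:
    """Decode file bytes much as open(f, "r") plus readlines() would,
    but with an explicit encoding instead of the locale's default."""
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as stream:
        return "".join(stream.readlines())


def convert_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode a file as UTF-8, assuming ALL of it is in source_encoding."""
    return raw.decode(source_encoding).encode("utf-8")


sample = b"/* caf\xe9 */\nint main(void) { return 0; }\n"

# Strict UTF-8 decoding fails, as readlines() does in the patched code.
try:
    read_text(sample, "utf-8")
except UnicodeDecodeError as err:
    print("UTF-8 failed:", err)

# Workaround 1: a single-byte encoding. ISO-8859-1 maps every byte value
# 0x00-0xFF to a character, so decoding cannot raise.
print(read_text(sample, "iso-8859-1").splitlines()[0])  # /* café */

# Workaround 2: convert the file once, then read it as UTF-8.
fixed = convert_to_utf8(sample)
print(read_text(fixed, "utf-8").splitlines()[0])  # /* café */
```

Each workaround has a trade-off. Latin-1 decoding never fails, but bytes 0x80-0x9F, where Windows-1252 places characters such as curly quotes, come out as control characters. Converting from cp1252 assumes the whole file is Windows-1252. A file that is mostly valid UTF-8 with a few stray bytes would have its multibyte characters mangled (UTF-8 "é" would become "Ã©").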