Add more wide character rules and refine CWE mapping

2014-07-22 23:01:18 -04:00 · 2014-07-22 23:01:18 -04:00 · a33ae6c62e
parent bbe7a28ada
commit a33ae6c62e
4 changed files with 109 additions and 83 deletions
--- a/correct-results.html
+++ b/correct-results.html
@ -11,7 +11,7 @@
 Here are the security scan results from
 <a href="http://www.dwheeler.com/flawfinder">Flawfinder version 1.30</a>,
 (C) 2001-2014 <a href="http://www.dwheeler.com">David A. Wheeler</a>.
-Number of dangerous functions in C/C++ ruleset: 160
+Number of rules (primarily dangerous function names) in C/C++ ruleset: 169
 <p>
 Examining test.c <br>
 Examining test2.c <br>
@ -20,7 +20,8 @@ Examining test2.c <br>
 <ul>
 <li>test.c:32: <b>  [5] </b> (buffer) <i> gets:
  Does not check for buffer overflows (<a
-  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>). Use
+  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>, <a
+  href="http://cwe.mitre.org/data/definitions/20.html">CWE-20</a>). Use
  fgets() instead. </i>
 <pre>
 gets(f);
@ -114,16 +115,18 @@ Examining test2.c <br>
 <li>test.c:25: <b>  [4] </b> (buffer) <i> scanf:
  The scanf() family's %s operation, without a limit specification, permits
  buffer overflows (<a
-  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>). Specify
-  a limit to %s, or use a different input function. </i>
+  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>, <a
+  href="http://cwe.mitre.org/data/definitions/20.html">CWE-20</a>). Specify a
+  limit to %s, or use a different input function. </i>
 <pre>
 scanf("%s", s);
 </pre>
 <li>test.c:27: <b>  [4] </b> (buffer) <i> scanf:
  The scanf() family's %s operation, without a limit specification, permits
  buffer overflows (<a
-  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>). Specify
-  a limit to %s, or use a different input function. </i>
+  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>, <a
+  href="http://cwe.mitre.org/data/definitions/20.html">CWE-20</a>). Specify a
+  limit to %s, or use a different input function. </i>
 <pre>
 scanf("%s", s);
 </pre>
@ -169,9 +172,9 @@ Examining test2.c <br>
 </pre>
 <li>test.c:91: <b>  [3] </b> (buffer) <i> getopt_long:
  Some older implementations do not protect against internal buffer overflows
-  (<a href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>).
-  Check implementation on installation, or limit the size of all string
-  inputs. </i>
+  (<a href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>, <a
+  href="http://cwe.mitre.org/data/definitions/20.html">CWE-20</a>). Check
+  implementation on installation, or limit the size of all string inputs. </i>
 <pre>
    while ((optc = getopt_long (argc, argv, "a",longopts, NULL )) != EOF) {
 </pre>
@ -192,8 +195,8 @@ Examining test2.c <br>
 sprintf(s, "hello");
 </pre>
 <li>test.c:45: <b>  [2] </b> (buffer) <i> char:
-  Statically-sized arrays can be overflowed or have other issues (<a
-  href="http://cwe.mitre.org/data/definitions/119.html">CWE-119</a>,<a
+  Statically-sized arrays can be improperly restricted, leading to potential
+  overflows or other issues (CWE-119:<a
  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>). Perform
  bounds checking, use functions that limit length, or ensure that the size
  is larger than the maximum possible length. </i>
@ -201,8 +204,8 @@ Examining test2.c <br>
  char d[20];
 </pre>
 <li>test.c:46: <b>  [2] </b> (buffer) <i> char:
-  Statically-sized arrays can be overflowed or have other issues (<a
-  href="http://cwe.mitre.org/data/definitions/119.html">CWE-119</a>,<a
+  Statically-sized arrays can be improperly restricted, leading to potential
+  overflows or other issues (CWE-119:<a
  href="http://cwe.mitre.org/data/definitions/120.html">CWE-120</a>). Perform
  bounds checking, use functions that limit length, or ensure that the size
  is larger than the maximum possible length. </i>
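
The expected HTML above leaves `CWE-119` as plain text in `CWE-119:CWE-120` while `CWE-120` stays linked. This appears to follow from the `link_cwe_pattern` regular expression in this commit's script diff, which only matches an identifier followed by `,`, `(` or `)`. A minimal sketch of that behavior follows; the `linkify` helper is hypothetical and stands in for whatever substitution the script actually performs.

```python
import re

# Copied from the script diff in this commit: a CWE id is matched only when
# the next character is ',', '(' or ')'.
link_cwe_pattern = re.compile(r'(CWE-([1-9][0-9]+))([,()])')

def linkify(text):
    """Hypothetical helper (not flawfinder's own code) that wraps each
    matched CWE id in a link to its MITRE definition page."""
    return link_cwe_pattern.sub(
        r'<a href="http://cwe.mitre.org/data/definitions/\2.html">\1</a>\3',
        text)

comma_form = linkify("(CWE-120, CWE-20)")   # both ids are followed by ',' or ')'
colon_form = linkify("(CWE-119:CWE-120)")   # CWE-119 is followed by ':'
slash_form = linkify("(CWE-120/CWE-785)")   # CWE-120 is followed by '/'

print(comma_form)
print(colon_form)
print(slash_form)
```

Under this pattern, the identifier written before a `:` or `/` is left unlinked, and only the last identifier in such a pair becomes a hyperlink.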
--- a/correct-results.txt
+++ b/correct-results.txt
@ -1,12 +1,12 @@
 Flawfinder version 1.30, (C) 2001-2014 David A. Wheeler.
-Number of dangerous functions in C/C++ ruleset: 160
+Number of rules (primarily dangerous function names) in C/C++ ruleset: 169
 Examining test.c
 Examining test2.c

 FINAL RESULTS:

 test.c:32:  [5] (buffer) gets:
-  Does not check for buffer overflows (CWE-120). Use fgets() instead.
+  Does not check for buffer overflows (CWE-120, CWE-20). Use fgets() instead.
 test.c:56:  [5] (buffer) strncat:
  Easily used incorrectly (e.g., incorrectly computing the correct maximum
  size to add) (CWE-120). Consider strcat_s, strlcat, or automatically
@ -48,12 +48,12 @@ test.c:23:  [4] (format) printf:
  (CWE-134). Use a constant for the format specification.
 test.c:25:  [4] (buffer) scanf:
  The scanf() family's %s operation, without a limit specification, permits
-  buffer overflows (CWE-120). Specify a limit to %s, or use a different input
-  function.
+  buffer overflows (CWE-120, CWE-20). Specify a limit to %s, or use a
+  different input function.
 test.c:27:  [4] (buffer) scanf:
  The scanf() family's %s operation, without a limit specification, permits
-  buffer overflows (CWE-120). Specify a limit to %s, or use a different input
-  function.
+  buffer overflows (CWE-120, CWE-20). Specify a limit to %s, or use a
+  different input function.
 test.c:38:  [4] (format) syslog:
  If syslog's format strings can be influenced by an attacker, they can be
  exploited (CWE-134). Use a constant format string for syslog.
@ -76,8 +76,8 @@ test.c:75:  [3] (shell) CreateProcess:
  different program to run.
 test.c:91:  [3] (buffer) getopt_long:
  Some older implementations do not protect against internal buffer overflows
-  (CWE-120). Check implementation on installation, or limit the size of all
-  string inputs.
+  (CWE-120, CWE-20). Check implementation on installation, or limit the size
+  of all string inputs.
 test.c:16:  [2] (buffer) strcpy:
  Does not check for buffer overflows when copying to destination (CWE-120).
  Consider using strcpy_s, strncpy, or strlcpy (warning, strncpy is easily
@ -86,13 +86,15 @@ test.c:19:  [2] (buffer) sprintf:
  Does not check for buffer overflows (CWE-120). Use sprintf_s, snprintf, or
  vsnprintf. Risk is low because the source has a constant maximum length.
 test.c:45:  [2] (buffer) char:
-  Statically-sized arrays can be overflowed or have other issues
-  (CWE-119,CWE-120). Perform bounds checking, use functions that limit
-  length, or ensure that the size is larger than the maximum possible length.
+  Statically-sized arrays can be improperly restricted, leading to potential
+  overflows or other issues (CWE-119:CWE-120). Perform bounds checking, use
+  functions that limit length, or ensure that the size is larger than the
+  maximum possible length.
 test.c:46:  [2] (buffer) char:
-  Statically-sized arrays can be overflowed or have other issues
-  (CWE-119,CWE-120). Perform bounds checking, use functions that limit
-  length, or ensure that the size is larger than the maximum possible length.
+  Statically-sized arrays can be improperly restricted, leading to potential
+  overflows or other issues (CWE-119:CWE-120). Perform bounds checking, use
+  functions that limit length, or ensure that the size is larger than the
+  maximum possible length.
 test.c:50:  [2] (buffer) memcpy:
  Does not check for buffer overflows when copying to destination (CWE-120).
  Make sure destination can always hold the source data.
--- a/53
+++ b/53
@ -286,7 +286,7 @@ def print_multi_line(text):
    position = position + len(w) + 1
      
 # This matches references to CWE identifiers, so we can HTMLize them.
-# We don't refer to CWE's with one digit, so we'll only match on 2+ digits.
+# We don't refer to CWEs with one digit, so we'll only match on 2+ digits.
 link_cwe_pattern = re.compile(r'(CWE-([1-9][0-9]+))([,()])')

 class Hit:
@ -493,7 +493,7 @@ def strip_i18n(text):

 p_trailingbackslashes = re.compile( r'(\s|\\(\n|\r))*$')

-p_c_singleton_string = re.compile( r'^\s*"([^\\]|\\[^0-6]|\\[0-6]+)?"\s*$')
+p_c_singleton_string = re.compile( r'^\s*L?"([^\\]|\\[^0-6]|\\[0-6]+)?"\s*$')

 def c_singleton_string(text):
  "Returns true if text is a C string with 0 or 1 character."
@ -501,7 +501,7 @@ def c_singleton_string(text):
  else: return 0

 # This string defines a C constant.
-p_c_constant_string = re.compile( r'^\s*"([^\\]|\\[^0-6]|\\[0-6]+)*"$')
+p_c_constant_string = re.compile( r'^\s*L?"([^\\]|\\[^0-6]|\\[0-6]+)*"$')

 def c_constant_string(text):
  "Returns true if text is a constant C string."
@ -764,14 +764,14 @@ c_ruleset = {
      "buffer", "", {}),
  "char|TCHAR|wchar_t":  # This isn't really a function call, but it works.
     (c_static_array, 2,
-      "Statically-sized arrays can be overflowed or have other issues " +
-        "(CWE-119,CWE-120)",
+      "Statically-sized arrays can be improperly restricted, " +
+      "leading to potential overflows or other issues (CWE-119:CWE-120)",
      "Perform bounds checking, use functions that limit length, " +
        "or ensure that the size is larger than the maximum possible length",
      "buffer", "", {'extract_lookahead' : 1}),

  "gets|_getts":
-     (normal, 5, "Does not check for buffer overflows (CWE-120)",
+     (normal, 5, "Does not check for buffer overflows (CWE-120, CWE-20)",
      "Use fgets() instead", "buffer", "", {'input' : 1}),

  # The "sprintf" hook will raise "format" issues instead if appropriate:
@ -781,14 +781,13 @@ c_ruleset = {
      "Use sprintf_s, snprintf, or vsnprintf",
      "buffer", "", {}),

-  # TODO: Add "wide character" versions of these functions.
-  "printf|vprintf|vwprintf|vfwprintf|_vtprintf":
+  "printf|vprintf|vwprintf|vfwprintf|_vtprintf|wprintf":
     (c_printf, 4,
      "If format strings can be influenced by an attacker, they can be exploited (CWE-134)",
      "Use a constant for the format specification",
      "format", "", {}),

-  "fprintf|vfprintf|_ftprintf|_vftprintf":
+  "fprintf|vfprintf|_ftprintf|_vftprintf|fwprintf|fvwprintf":
     (c_printf, 4,
      "If format strings can be influenced by an attacker, they can be exploited (CWE-134)",
      "Use a constant for the format specification",
@ -809,17 +808,17 @@ c_ruleset = {
      "Use a constant for the format specification",
      "format", "", { 'format_position' : 3}),

-  "scanf|vscanf|wscanf|_tscanf":
+  "scanf|vscanf|wscanf|_tscanf|vwscanf":
     (c_scanf, 4,
      "The scanf() family's %s operation, without a limit specification, " +
-        "permits buffer overflows (CWE-120)",
+        "permits buffer overflows (CWE-120, CWE-20)",
      "Specify a limit to %s, or use a different input function",
      "buffer", "", {'input' : 1}),

-  "fscanf|sscanf|vsscanf|vfscanf|_ftscanf":
+  "fscanf|sscanf|vsscanf|vfscanf|_ftscanf|fwscanf|vfwscanf|vswscanf":
     (c_scanf, 4,
      "The scanf() family's %s operation, without a limit specification, "
-      "permits buffer overflows",
+      "permits buffer overflows (CWE-120, CWE-20)",
      "Specify a limit to %s, or use a different input function",
      "buffer", "", {'input' : 1, 'format_position' : 2}),

@ -855,7 +854,7 @@ c_ruleset = {
  "realpath":
     (normal, 3,
      "This function does not protect against buffer overflows, " +
-        "and some implementations can overflow internally (CWE-120)",
+        "and some implementations can overflow internally (CWE-120/CWE-785)",
      "Ensure that the destination buffer is at least of size MAXPATHLEN, and" +
        "to protect against implementation problems, the input argument should also " +
        "be checked to ensure it is no larger than MAXPATHLEN",
@ -863,27 +862,27 @@ c_ruleset = {

  "getopt|getopt_long":
     (normal, 3,
-     "Some older implementations do not protect against internal buffer overflows (CWE-120)",
+     "Some older implementations do not protect against internal buffer overflows (CWE-120, CWE-20)",
      "Check implementation on installation, or limit the size of all string inputs",
      "buffer", "dangers-c", {'input' : 1}),

  "getpass":
     (normal, 3,
-     "Some implementations may overflow buffers (CWE-120)",
+     "Some implementations may overflow buffers (CWE-120, CWE-20)",
      "",
      "buffer", "dangers-c", {'input' : 1}),

  "getwd":
     (normal, 3,
     "This does not protect against buffer overflows "
-     "by itself, so use with caution (CWE-120)",
+     "by itself, so use with caution (CWE-120, CWE-20)",
      "Use getcwd instead",
      "buffer", "dangers-c", {'input' : 1}),

  # fread not included here; in practice I think it's rare to mistake it.
  "getchar|fgetc|getc|read|_gettc":
     (normal, 1,
-     "Check buffer boundaries if used in a loop including recursive loops (CWE-120)",
+     "Check buffer boundaries if used in a loop including recursive loops (CWE-120, CWE-20)",
      "",
      "buffer", "dangers-c", {'input' : 1}),

@ -892,7 +891,7 @@ c_ruleset = {
      "This usually indicates a security flaw.  If an " +
      "attacker can change anything along the path between the " +
      "call to access() and the file's actual use (e.g., by moving " +
-      "files), the attacker can exploit the race condition (CWE-362)",
+      "files), the attacker can exploit the race condition (CWE-362/CWE-367)",
      "Set up the correct permissions (e.g., using setuid()) and " +
      "try to open the file directly",
      "race",
@ -926,7 +925,7 @@ c_ruleset = {
      "This accepts filename arguments; if an attacker " +
      "can move those files or change the link content, " +
      "a race condition results.  " +
-      "Also, it does not terminate with ASCII NUL. (CWE-362)",
+      "Also, it does not terminate with ASCII NUL. (CWE-362, CWE-20)",
      # This is often just a bad idea, and it's hard to suggest a
      # simple alternative:
      "Reconsider approach",
@ -1009,7 +1008,7 @@ c_ruleset = {
        "or embedded spaces could allow an attacker to force a different program to run",
      "shell", "", {'check_for_null' : 1}),

-  "atoi|atol":
+  "atoi|atol|_wtoi|_wtoi64":
     (normal, 2,
      "Unless checked, the resulting number can exceed the expected range",
      " If source untrusted, check both minimum and maximum, even if the" +
@ -1060,7 +1059,7 @@ c_ruleset = {
  "getenv|curl_getenv":
     (normal, 3, "Environment variables are untrustable input if they can be" +
                 " set by an attacker.  They can have any content and" +
-                 " length, and the same variable can be set more than once (CWE-807)",
+                 " length, and the same variable can be set more than once (CWE-807, CWE-20)",
      "Check environment variables carefully before using them",
      "buffer", "", {'input' : 1}),

@ -1068,7 +1067,7 @@ c_ruleset = {
     (normal, 3, "This function is synonymous with 'getenv(\"HOME\")';" +
                 "it returns untrustable input if the environment can be" +
                 "set by an attacker.  It can have any content and length, " +
-                 "and the same variable can be set more than once (CWE-807)",
+                 "and the same variable can be set more than once (CWE-807, CWE-20)",
      "Check environment variables carefully before using them",
      "buffer", "", {'input' : 1}),

@ -1076,7 +1075,7 @@ c_ruleset = {
     (normal, 3, "This function is synonymous with 'getenv(\"TMP\")';" +
                 "it returns untrustable input if the environment can be" +
                 "set by an attacker.  It can have any content and length, " +
-                 "and the same variable can be set more than once (CWE-807)",
+                 "and the same variable can be set more than once (CWE-807, CWE-20)",
      "Check environment variables carefully before using them",
      "buffer", "", {'input' : 1}),

@ -1102,7 +1101,7 @@ c_ruleset = {
      "misc", "", {}),

  "LoadLibrary|LoadLibraryEx":
-     (normal, 3, "Ensure that the full path to the library is specified, or current directory may be used (CWE-829)",
+     (normal, 3, "Ensure that the full path to the library is specified, or current directory may be used (CWE-829, CWE-20)",
      "Use registry entry or GetWindowsDirectory to find library path, if you aren't already",
      "misc", "", {'input' : 1}),

@ -1171,7 +1170,7 @@ c_ruleset = {
   # Input functions, useful for -I
  "recv|recvfrom|recvmsg|fread|readv":
     (normal, 0, "Function accepts input from outside program",
-      "Make sure input data is filtered, especially if an attacker could manipulate it",
+      "Make sure input data is filtered, especially if an attacker could manipulate it (CWE-20)",
      "input", "", {'input' : 1}),


@ -1451,7 +1450,7 @@ def display_ruleset(ruleset):
 def initialize_ruleset():
  expand_ruleset(c_ruleset)
  if showheading:
-    print "Number of dangerous functions in C/C++ ruleset:", len(c_ruleset)
+    print "Number of rules (primarily dangerous function names) in C/C++ ruleset:", len(c_ruleset)
    if output_format: print "<p>"
  if list_rules:
    display_ruleset(c_ruleset)
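
The `L?` added to `p_c_constant_string` in the script diff above lets a wide string literal such as `L"hello"` count as a constant string. The sketch below compares the old and new patterns taken from the diff. The `is_constant` helper and the sample inputs are illustrative assumptions, not part of flawfinder.

```python
import re

# Both patterns are copied from the diff; only the optional 'L' prefix differs.
old_pattern = re.compile(r'^\s*"([^\\]|\\[^0-6]|\\[0-6]+)*"$')
new_pattern = re.compile(r'^\s*L?"([^\\]|\\[^0-6]|\\[0-6]+)*"$')

def is_constant(pattern, text):
    """Hypothetical helper: True if text looks like a constant C string."""
    return pattern.search(text) is not None

# Sample arguments: a narrow literal, a wide literal, and a variable name.
samples = ['"hello"', 'L"hello"', 'fmt']
results = [(s, is_constant(old_pattern, s), is_constant(new_pattern, s))
           for s in samples]

for text, old_ok, new_ok in results:
    print("%-10s old=%s new=%s" % (text, old_ok, new_ok))
```

With the old pattern the wide literal is rejected. A rule that lowers risk for constant format strings would presumably then treat `wprintf(L"hello")` like a call with a variable format.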
--- a/flawfinder.1
+++ b/flawfinder.1
@ -732,18 +732,61 @@ In a few cases more than one CWE identifier may be listed.
 The HTML report also includes hypertext links to the CWE definitions
 hosted at MITRE.
 In this way, flawfinder is designed to meet the CWE-Output requirement.
-Note that many of these CWEs are identified in the CWE/SANS top 25 list
-2011 (http://cwe.mitre.org/top25/).
+.PP
+Many of the CWEs reported by flawfinder
+are identified in the CWE/SANS top 25 list 2011 (http://cwe.mitre.org/top25/).
+Many people will want to search for CWEs in this list,
+such as CWE-120 (classic buffer overflow).
+When flawfinder maps to a CWE that is more general than a top 25 item,
+it lists it as more-general:more-specific
+(e.g., CWE-119:CWE-120), where more-general is the actual mapping.
+If flawfinder maps to a more specific CWE item that is a specific
+case of a top 25 item,
+it is listed in the form top-25/more-specific (e.g., CWE-362/CWE-367),
+where the real mapping is the more specific CWE entry.
+If the same entry maps to multiple CWEs, the CWEs are separated by commas
+(this often occurs with CWE-20, Improper Input Validation).
+This simplifies searching for certain CWEs.
+.PP
+CWE version 2.7 (released June 23, 2014) was used for the mapping.
+The current CWE mappings select the most specific CWE the tool can determine.
+In theory, most CWE security elements (signatures/patterns that the
+tool searches for) could be mapped to
+CWE-676 (Use of Potentially Dangerous Function), but such a mapping would
+not be useful.
+Thus, more specific mappings were preferred where one could be found.
+Flawfinder is a lexical analysis tool; as a result, it is impractical
+for it to be more specific than the mappings currently implemented.
+This also means that it is unlikely to need much
+updating for map currency; it simply doesn't have enough information to
+refine to a detailed CWE level that CWE changes would typically affect.
+The list of CWE identifiers was generated automatically using "make show-cwes",
+so there is confidence that this list is correct.
+Please report CWE mapping problems as bugs if you find any.
+.PP
+Flawfinder may fail to find a vulnerability, even if flawfinder covers
+one of these CWE weaknesses.
+That said, flawfinder does find vulnerabilities listed by the CWEs it covers,
+and it will not report lines without those vulnerabilities in many cases.
+Thus, as required for any tool intending to be CWE compatible,
+flawfinder has a rate of false positives less than 100%
+and a rate of false negatives less than 100%.
+Flawfinder almost always reports whenever it finds a match to a
+CWE security element (a signature/pattern as defined in its database),
+though certain obscure constructs can cause it to fail (see BUGS below).
 .PP
 Flawfinder can report on the following CWEs
 (these are the CWEs that flawfinder covers; ``*'' marks those in the
 CWE/SANS top 25 list):
 .IP \(bu 2
+CWE-20: Improper Input Validation
+.IP \(bu 2
 CWE-22: Improper Limitation of a Pathname to a Restricted Directory (``Path Traversal'')
 .IP \(bu
 CWE-78: Improper Neutralization of Special Elements used in an OS Command (``OS Command Injection'')*
 .IP \(bu
 CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
+(a parent of CWE-120*, so this is shown as CWE-119:CWE-120)
 .IP \(bu
 CWE-120: Buffer Copy without Checking Size of Input (``Classic Buffer Overflow'')*
 .IP \(bu
@ -765,37 +808,13 @@ CWE-676: Use of Potentially Dangerous Function*
 .IP \(bu
 CWE-732: Incorrect Permission Assignment for Critical Resource*
 .IP \(bu
+CWE-785: Use of Path Manipulation Function without Maximum-sized Buffer
+(a child of CWE-120*, so this is shown as CWE-120/CWE-785)
+.IP \(bu
 CWE-807: Reliance on Untrusted Inputs in a Security Decision*
 .IP \(bu
 CWE-829: Inclusion of Functionality from Untrusted Control Sphere*
 .PP
-CWE version 2.7 (released June 23, 2014) was used for the mapping.
-The current CWE mappings select the most specific CWE the tool can determine.
-In theory, most CWE security elements (signatures/patterns that the
-tool searches for) could theoretically be mapped to
-CWE-676 (Use of Potentially Dangerous Function), but such a mapping would
-not be useful.
-Thus, more specific mappings were preferred where one could be found.
-Flawfinder is a lexical analysis tool; as a result, it is impractical
-for it to be more specific than the mappings currently implemented.
-This also means that it is unlikely to need much
-updating for map currency; it simply doesn't have enough information to
-refine to a detailed CWE level that CWE changes would typically affect.
-The list of CWE identifiers was generated automatically using "make show-cwes",
-so there is confidence that this list is correct.
-Please report CWE mapping problems as bugs if you find any.
-.PP
-Flawfinder may fail to find a vulnerability, even if flawfinder covers
-one of these CWE weaknesses listed above.
-That said, flawfinder does find vulnerabilities listed by the CWEs it covers,
-and it will not report lines without those vulnerabilities in many cases.
-Thus, as required for any tool intending to be CWE compatible,
-flawfinder has a rate of false positives less than 100%
-and a rate of false negatives less than 100%.
-Flawfinder almost always reports whenever it finds a match to a
-CWE security element (a signature/pattern as defined in its database),
-though certain obscure constructs can cause it to fail (see BUGS below).
-.PP
 You can select a specific subset of CWEs to report by using
 the ``\-\-regex'' (-e) option.
 This option accepts a regular expression, so you can select multiple CWEs,
@ -937,6 +956,9 @@ intended to be secure against erroneous or maliciously constructed data.

 Flawfinder is currently limited to C/C++.
 In addition, when analyzing C++ it focuses primarily on the C subset of C++.
+In particular, flawfinder does not report on expressions like cin >> charbuf,
+where charbuf is a char array; flawfinder doesn't have type information,
+and ">>" is safe with many other types.
 That said,
 it's designed so that adding support for other languages should be easy.
 .PP
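
The man page text in this commit says that the `more-general:more-specific`, `top-25/more-specific`, and comma-separated forms simplify searching for certain CWEs. The sketch below shows one way such a search could work over message text. The messages are excerpts from rule strings in this commit, and the `mentions` helper is a hypothetical illustration rather than flawfinder's own matching logic.

```python
import re

# Message excerpts taken from the rule text changed in this commit.
messages = [
    "Does not check for buffer overflows (CWE-120, CWE-20)",
    "leading to potential overflows or other issues (CWE-119:CWE-120)",
    "some implementations can overflow internally (CWE-120/CWE-785)",
    "the attacker can exploit the race condition (CWE-362/CWE-367)",
]

def mentions(cwe_id, text):
    """Hypothetical helper: True if text names cwe_id exactly.
    The lookahead keeps a shorter id from matching inside a longer one."""
    return re.search(re.escape(cwe_id) + r'(?![0-9])', text) is not None

# One search for CWE-120 finds the comma, colon, and slash forms alike.
cwe120_hits = [m for m in messages if mentions("CWE-120", m)]
cwe20_hits = [m for m in messages if mentions("CWE-20", m)]

print("CWE-120 appears in %d messages" % len(cwe120_hits))
print("CWE-20 appears in %d messages" % len(cwe20_hits))
```

Because the top 25 identifier always appears in the message text, one search covers every entry related to it, whichever form the mapping takes.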