Writing Cppcheck rules
Daniel Marjamäki
Cppcheck
2010
Introduction
This manual is for developers who want to write Cppcheck rules.
There are two ways to write rules:
Regular expressions
Simple rules can be created by using regular expressions. No
compilation is required.
C++
Advanced rules must be created with C++. These rules must be
compiled and linked statically with Cppcheck.
The rules do not operate on the raw source code. Cppcheck reads the
source code and processes it before the rules are applied.
Data representation of the source code
There are two types of data you can use: the symbol database and the
token list.
Token lists
The code is stored in token lists (simple doubly linked
lists).
The token lists are designed for rule matching. All redundant
information is removed. A number of transformations are made
automatically on the token lists to simplify writing rules.
The class Tokenizer creates the token lists and
performs all simplifications.
The class Token is used for every token in the
token list. The Token class also contains
functionality for matching tokens.
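To make the idea of matching on a doubly linked token list concrete, here is a minimal sketch. It does not use Cppcheck's Token or Tokenizer classes: SketchTokenList, makeTokenList, matchesAt and countMatches are hypothetical names invented for this illustration, and std::list stands in for the real token list.

```cpp
#include <iterator>
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a token list; Cppcheck's own Token class differs.
// std::list is a doubly linked list, so each token can reach its neighbours.
using SketchTokenList = std::list<std::string>;

// Builds a token list from text whose tokens are separated by whitespace.
SketchTokenList makeTokenList(const std::string& code) {
    std::istringstream in(code);
    std::istream_iterator<std::string> first(in), last;
    return SketchTokenList(first, last);
}

// Returns true when the tokens starting at 'tok' equal 'pattern' one by one.
bool matchesAt(const SketchTokenList& tokens,
               SketchTokenList::const_iterator tok,
               const std::vector<std::string>& pattern) {
    for (const std::string& expected : pattern) {
        if (tok == tokens.end() || *tok != expected)
            return false;
        ++tok;
    }
    return true;
}

// Counts how many positions in the list start a match of 'pattern'.
int countMatches(const SketchTokenList& tokens,
                 const std::vector<std::string>& pattern) {
    int count = 0;
    for (auto tok = tokens.begin(); tok != tokens.end(); ++tok) {
        if (matchesAt(tokens, tok, pattern))
            ++count;
    }
    return count;
}
```

For example, countMatches(makeTokenList("x = a / 0 ;"), {"/", "0"}) visits every token and reports one match. A real rule would rely on the matching functionality of the Token class instead.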
Normal token list
The first token list that is created has many basic
simplifications. For example:
There are no templates. Templates have been
instantiated.
There is no "else if". These are converted into "else { if
.. }".
The bodies of "if", "else", "while", "do" and "for" are
always enclosed in "{" and "}".
A declaration of multiple variables is split up into
multiple variable declarations. "int a,b;" => "int a; int
b;"
All variables have unique ID numbers.
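As an illustration of what one of these simplifications does, the sketch below splits a declaration of several variables into one declaration per variable. This is not Cppcheck's implementation: splitDeclaration is a hypothetical helper that assumes a single-token type followed by plain variable names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Cppcheck: rewrites the tokens of
// "type a , b ;" into "type a ; type b ;". It only understands a
// single-token type followed by plain names separated by commas.
std::vector<std::string> splitDeclaration(const std::vector<std::string>& decl) {
    if (decl.size() < 3 || decl.back() != ";")
        return decl;  // not a shape this sketch understands; leave unchanged

    const std::string& type = decl.front();
    std::vector<std::string> out;
    out.push_back(type);
    // Walk the tokens between the type and the final ";".
    for (std::size_t i = 1; i + 1 < decl.size(); ++i) {
        if (decl[i] == ",") {
            out.push_back(";");   // end the current declaration
            out.push_back(type);  // and start a new one with the same type
        } else {
            out.push_back(decl[i]);
        }
    }
    out.push_back(";");
    return out;
}
```

Given the tokens of "int a , b ;", the helper returns the tokens of "int a ; int b ;". A rule written against the normal token list therefore only has to handle the single-variable form.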
Simplified token list
The second token list that is created has all the simplifications
of the normal token list plus many more. For
example:
There is no sizeof.
There are no templates.
Control flow transformations.
NULL is replaced with 0.
Static value flow analysis is performed. Known values are
inserted into the code.
Variable initialization is replaced with assignment.
The simple token list is printed if you use
--debug. For example, run cppcheck --debug
test1.cpp on this code:
void f1() {
    int a = 1;
    f2(a++);
}
The result is:
##file test1.cpp
1: void f1 ( ) {
2: ; ;
3: f2 ( 1 ) ;
4: }
Reference
To learn more about the token lists, see the doxygen documentation
for the Tokenizer class:
http://cppcheck.sourceforge.net/doxyoutput/classTokenizer.html
Regular expressions
Simple rules can be defined through regular expressions.
A rule consists of:
a pattern to search for
an error message that is reported when the pattern is found
Here is an example:
<?xml version="1.0"?>
<rule data="simple">
  <pattern>/ 0</pattern>
  <message>
    <id>divbyzero</id>
    <severity>error</severity>
    <summary>Division by zero</summary>
  </message>
</rule>
It is recommended that you use the simple token
list whenever you can. If you need information that has been removed from it,
try the normal token list.
When you write patterns, remember that:
tokens are always separated by single spaces. "1+2" is not
possible.
there are no indentation, extra spaces, comments or line breaks.
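The effect of these constraints can be sketched in a few lines of C++. The example assumes that a pattern is searched for in a flat string of space-separated tokens. std::regex stands in here for whichever regular expression engine Cppcheck actually uses, and joinTokens and ruleMatches are hypothetical names, not Cppcheck functions.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Collapses every run of whitespace (spaces, tabs, line breaks) so that
// tokens are separated by exactly one space. This mimics, as an assumption,
// the flat token string that a rule pattern is matched against.
std::string joinTokens(const std::string& code) {
    std::istringstream in(code);
    std::string token;
    std::string joined;
    while (in >> token) {
        if (!joined.empty())
            joined += ' ';
        joined += token;
    }
    return joined;
}

// Hypothetical rule check: true if 'pattern' occurs anywhere in the
// token string.
bool ruleMatches(const std::string& tokenString, const std::string& pattern) {
    return std::regex_search(tokenString, std::regex(pattern));
}
```

With this sketch, ruleMatches(joinTokens("x = 100 /\n 0 ;"), "/ 0") is true because the line break has been collapsed into a single space, while the pattern "/0" finds nothing because the token string never contains "/" and "0" without a space between them.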