Writing Cppcheck rules Daniel Marjamäki Cppcheck 2010
Introduction

This manual is for developers who want to write Cppcheck rules. There are two ways to write rules:

- Regular expressions: Simple rules can be created by using regular expressions. No compilation is required.
- C++: Advanced rules must be created with C++. These rules must be compiled and linked statically with Cppcheck.

The rules do not operate on the raw source code. Cppcheck reads the source code and processes it before the rules are applied.
Data representation of the source code

There are two types of data you can use: the symbol database and the token list.
Token lists

The code is stored in token lists (simple doubly linked lists). The token lists are designed for rule matching: all redundant information is removed, and a number of transformations are made automatically on the token lists to simplify writing rules.

The class Tokenizer creates the token lists and performs all simplifications. The class Token is used for every token in the token list. The Token class also contains functionality for matching tokens.
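To make the data structure concrete, here is a minimal sketch of a doubly linked token list with a small matching helper. It is not Cppcheck's implementation: the names Tok, TokList and matchesAt are hypothetical, and the real Tokenizer and Token classes do far more than split on whitespace.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token node; Cppcheck's real Token class is much richer.
struct Tok {
    std::string str;
    Tok* prev = nullptr;
    Tok* next = nullptr;
};

// Hypothetical container that owns the tokens and links them both ways.
class TokList {
public:
    // Assumption: the input is already separated by whitespace.
    explicit TokList(const std::string& code) {
        std::istringstream in(code);
        std::string word;
        while (in >> word) {
            auto tok = std::make_unique<Tok>();
            tok->str = word;
            if (!nodes_.empty()) {
                tok->prev = nodes_.back().get();
                tok->prev->next = tok.get();
            }
            nodes_.push_back(std::move(tok));
        }
    }

    Tok* front() const { return nodes_.empty() ? nullptr : nodes_.front().get(); }
    Tok* back() const { return nodes_.empty() ? nullptr : nodes_.back().get(); }

private:
    std::vector<std::unique_ptr<Tok>> nodes_;
};

// Returns true if the tokens starting at tok equal the
// space-separated words in pattern, one word per token.
bool matchesAt(const Tok* tok, const std::string& pattern) {
    std::istringstream in(pattern);
    std::string word;
    while (in >> word) {
        if (tok == nullptr || tok->str != word)
            return false;
        tok = tok->next;
    }
    return true;
}
```

With a list built from "f2 ( 1 ) ;", calling matchesAt on the second token with the pattern "( 1 )" succeeds, while the same pattern fails at the first token. The prev pointers let a rule walk backwards from a match, which is the reason for linking the list in both directions.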
Normal token list

The first token list that is created has many basic simplifications. For example:

- There are no templates. Templates have been instantiated.
- There is no "else if". These are converted into "else { if .."
- The bodies of "if", "else", "while", "do" and "for" are always enclosed in "{" and "}".
- A declaration of multiple variables is split up into multiple variable declarations: "int a,b;" => "int a; int b;"
- All variables have unique ID numbers.
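As an illustration of one of these simplifications, the sketch below splits a declaration of multiple variables into separate declarations. It is only a toy model: splitDeclaration is a hypothetical helper, and it assumes a single-token type name and no initializers, which the real Tokenizer does not require.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns the tokens of "int a , b ;" into
// the tokens of "int a ; int b ;".
// Assumptions: the first token is the whole type name, and no
// initializer contains a comma.
std::vector<std::string> splitDeclaration(const std::vector<std::string>& toks) {
    std::vector<std::string> out;
    if (toks.empty())
        return out;

    const std::string& type = toks.front();
    out.push_back(type);
    for (std::size_t i = 1; i < toks.size(); ++i) {
        if (toks[i] == ",") {
            // End the current declaration and start a new one
            // with the same type.
            out.push_back(";");
            out.push_back(type);
        } else {
            out.push_back(toks[i]);
        }
    }
    return out;
}
```

Given the tokens "int", "a", ",", "b", ";" the function returns "int", "a", ";", "int", "b", ";". A rule that looks for a variable declaration therefore only needs to handle the single-variable form.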
Simplified token list

The second token list that is created has all the simplifications of the normal token list, plus many more. For example:

- There is no sizeof.
- There are no templates.
- Control flow transformations are applied.
- NULL is replaced with 0.
- Static value flow analysis is made. Known values are inserted into the code.
- Variable initialization is replaced with assignment.

The simple token list is written if you use --debug. For example, use

    cppcheck --debug test1.cpp

and check this code:

    void f1() {
        int a = 1;
        f2(a++);
    }

The result is:

    ##file test1.cpp
    1: void f1 ( ) {
    2: ; ;
    3: f2 ( 1 ) ;
    4: }
Reference

To learn more about the token lists, see the doxygen documentation for the Tokenizer class: http://cppcheck.sourceforge.net/doxyoutput/classTokenizer.html
Regular expressions

Simple rules can be defined through regular expressions. A rule consists of:

- a pattern to search for
- an error message that is reported when the pattern is found

Here is an example:

    <?xml version="1.0"?>
    <rule data="simple">
      <pattern>/ 0</pattern>
      <message>
        <id>divbyzero</id>
        <severity>error</severity>
        <summary>Division by zero</summary>
      </message>
    </rule>

It is recommended that you use the simple token list whenever you can. If you need some information that is removed from it, try the normal token list.

When you write the patterns, remember that:

- tokens are always separated by spaces, so "1+2" never occurs in the data.
- there is no indentation, extra spaces, comments, or line breaks.
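The way such a rule is applied can be sketched as a regular expression search over the token list written out as one space-separated string. The sketch below is an approximation under stated assumptions: Rule and ruleMatches are hypothetical names, and std::regex stands in for whatever regular expression engine Cppcheck itself uses.

```cpp
#include <regex>
#include <string>

// Hypothetical record mirroring the fields of the XML rule.
struct Rule {
    std::string pattern;
    std::string id;
    std::string severity;
    std::string summary;
};

// Returns true if the rule's pattern occurs anywhere in the
// space-separated token string.
// Assumption: std::regex (ECMAScript syntax) is close enough to
// the engine Cppcheck uses for simple patterns like "/ 0".
bool ruleMatches(const Rule& rule, const std::string& tokens) {
    const std::regex re(rule.pattern);
    return std::regex_search(tokens, re);
}
```

With the divbyzero rule, the token string "x = 10 / 0 ;" matches and "x = 10 / 2 ;" does not. Because the search is purely textual, the pattern "/ 0" would also match the start of "/ 0.5", so a pattern may need a trailing space or a word boundary to avoid false positives.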