Writing Cppcheck rules

Daniel Marjamäki, Cppcheck, 2010
Introduction

This is a manual for developers who want to write Cppcheck rules. There are two ways to write rules:

- Regular expressions: simple rules can be created with regular expressions. No compilation is required.
- C++: advanced rules must be written in C++. These rules must be compiled and linked statically with Cppcheck.

The data used by the rules is not the raw source code: Cppcheck reads the source code and processes it before any rules are applied.
Data representation of the source code

There are two types of data you can use: the symbol database and the token lists.
Token lists

The code is stored in token lists (simple doubly-linked lists). The token lists are designed for rule matching: all redundant information is removed, and a number of transformations are made automatically to simplify writing rules. The class Tokenizer creates the token lists and performs all simplifications. The class Token is used for every token in a token list; it also contains functionality for matching tokens.
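To make the "simple doubly-linked list" structure concrete, here is a minimal, self-contained sketch. Note that MiniToken and tokenize are invented for this illustration; the real Token class (in Cppcheck's source) has a much richer interface, and the real Tokenizer also splits the source text and applies all the simplifications described in this manual.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Cppcheck's Token class, showing only the
// doubly-linked-list structure described above.
struct MiniToken {
    std::string str;              // the token's text, e.g. "int" or "a"
    MiniToken *next = nullptr;     // next token in the list
    MiniToken *previous = nullptr; // previous token in the list
};

// Build a doubly-linked token list from already-split token strings
// and return the head of the list.
MiniToken *tokenize(const std::vector<std::string> &words)
{
    MiniToken *head = nullptr;
    MiniToken *tail = nullptr;
    for (const std::string &w : words) {
        MiniToken *tok = new MiniToken;
        tok->str = w;
        tok->previous = tail;
        if (tail)
            tail->next = tok;
        else
            head = tok;
        tail = tok;
    }
    return head;
}
```

A rule can then walk the list forwards via next and backwards via previous, which is what makes context-sensitive matching cheap.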
Normal token list

The first token list that is created has many basic simplifications. For example:

- There are no templates; templates have been instantiated.
- There is no "else if"; these are converted into "else { if ..".
- The bodies of "if", "else", "while", "do" and "for" are always enclosed in "{" and "}".
- A declaration of multiple variables is split up into multiple variable declarations: "int a,b;" => "int a; int b;".
- All variables have unique ID numbers.
Simplified token list

The second token list that is created has all the simplifications of the normal token list and many more. For example:

- There is no "sizeof".
- There are no templates.
- Control flow transformations are made.
- "NULL" is replaced with "0".
- Static value flow analysis is made: known values are inserted into the code.
- Variable initialization is replaced with assignment.

The simplified token list is written if you use --debug. For example, run cppcheck --debug test1.cpp on this code:

    void f1()
    {
        int a = 1;
        f2(a++);
    }

The result is:

    ##file test1.cpp
    1: void f1 ( ) {
    2: ; ;
    3: f2 ( 1 ) ;
    4: }
Reference

To learn more about the token lists, the Doxygen documentation for the Tokenizer class is recommended: http://cppcheck.sourceforge.net/doxyoutput/classTokenizer.html
Symbol database

TODO: write more here.
Regular expressions

Simple rules can be defined through regular expressions. A rule consists of:

- a pattern to search for
- an error message that is reported when the pattern is found

Here is an example:

    <?xml version="1.0"?>
    <rule data="simple">
      <pattern> / 0</pattern>
      <message>
        <id>divbyzero</id>
        <severity>error</severity>
        <summary>Division by zero</summary>
      </message>
    </rule>

It is recommended that you use the simple token list whenever you can. If you need information that is removed from it, then try the normal token list.
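Remember that the pattern is matched against the processed token list, not the raw source code, so tokens in the pattern are separated by single spaces. As an illustrative sketch (the id, summary, and pattern below are made up for this example, not taken from Cppcheck's shipped rules), a rule that flags using strlen() to test for an empty string might look like:

```xml
<?xml version="1.0"?>
<rule data="simple">
  <!-- Tokens are space-separated; parentheses are escaped because the
       pattern is a regular expression. -->
  <pattern>strlen \( [a-z]+ \) == 0</pattern>
  <message>
    <id>emptyStrlen</id>
    <severity>style</severity>
    <summary>Using strlen() to check for an empty string is inefficient</summary>
  </message>
</rule>
```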
C++

Advanced rules are created with C++. Here is a simple function that detects division by zero:

    void CheckDivByZero::check()
    {
        // Scan through all tokens
        for (const Token *tok = _tokens; tok; tok = tok->next()) {
            // Match tokens to see if there is division by zero..
            if (Token::Match(tok, "/ 0")) {
                // Division by zero found. Report error
                reportError(tok);
            }
        }
    }

All rules must be encapsulated in classes, and these classes must inherit from the base class Check. It is also possible to inherit from ExecutionPath, which provides better control-flow analysis, but that is much more advanced; you should master Check before you try ExecutionPath.

Adding your rules to Cppcheck is easy: just make sure they are linked with Cppcheck when it is compiled. Cppcheck will automatically use all rules that are compiled into it.

TODO: A full example?

The recommendation is that you use the simple token list whenever possible. Only use the normal token list when necessary.

TODO: more descriptions