Writing rules: Target this guide for beginners. Skip C++ and only describe how rules are created with regular expressions.
This commit is contained in:
parent
04b811b74f
commit
80b2c0594b
|
@ -21,8 +21,8 @@
|
||||||
<section>
|
<section>
|
||||||
<title>Introduction</title>
|
<title>Introduction</title>
|
||||||
|
|
||||||
<para>This is supposed to be a manual for developers who want to write
|
<para>This is a short guide for developers who want to write Cppcheck
|
||||||
Cppcheck rules.</para>
|
rules.</para>
|
||||||
|
|
||||||
<para>There are two ways to write rules.</para>
|
<para>There are two ways to write rules.</para>
|
||||||
|
|
||||||
|
@ -46,130 +46,73 @@
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
</variablelist>
|
</variablelist>
|
||||||
|
|
||||||
<para>The data used by the rules are not the raw source code. Cppcheck
|
<para>Regular expressions are a good first step: they are easier to write
|
||||||
will read the source code and process it before the rules are used.</para>
|
and give results more quickly. This guide therefore focuses on regular
|
||||||
|
expressions.</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<title>Data representation of the source code</title>
|
<title>Data representation of the source code</title>
|
||||||
|
|
||||||
<para>There are two types of data you can use: symbol database and token
|
<para>The data used by the rules are not the raw source code.
|
||||||
list.</para>
|
<literal>Cppcheck</literal> will read the source code and process it
|
||||||
|
before the rules are used.</para>
|
||||||
|
|
||||||
<section>
|
<para>Cppcheck is designed to find bugs and dangerous code. Stylistic
|
||||||
<title>Token lists</title>
|
information such as indentation, comments, etc. is filtered out at an
|
||||||
|
early stage. You don't need to worry about such stylistic information when
|
||||||
|
you write rules.</para>
|
||||||
|
|
||||||
<para>The code is stored in token lists (simple double-linked
|
<para>Between each token in the code there is always a space. For instance
|
||||||
lists).</para>
|
the raw code "1+f()" is processed into "1 + f ( )".</para>
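<para>Because patterns are matched against the processed text and not the raw source, the regular expression has to contain the spaces as well. The rule below is a minimal sketch of that idea for the expression "1 + f ( )". The id, severity and summary values are made up for illustration, and the <literal>severity</literal> and <literal>summary</literal> elements are assumed from the Cppcheck rule file format.</para>

```xml
<?xml version="1.0"?>
<!-- Sketch only: matches the processed text "1 + f ( )". -->
<!-- The characters +, ( and ) are regex metacharacters, so they are escaped. -->
<rule version="1">
  <pattern>1 \+ f \( \)</pattern>
  <message>
    <!-- hypothetical id and texts, chosen for this example -->
    <id>exampleAddCall</id>
    <severity>style</severity>
    <summary>Found the expression 1 + f().</summary>
  </message>
</rule>
```

<para>A pattern written without the spaces, such as <literal>1\+f\(\)</literal>, would not be expected to match, since the processed code never contains the tokens glued together.</para>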
|
||||||
|
|
||||||
<para>The token lists are designed for rule matching. All redundant
|
<para>The code is simplified in many ways. For example:</para>
|
||||||
information is removed. A number of transformations are made
|
|
||||||
automatically on the token lists to simplify writing rules.</para>
|
|
||||||
|
|
||||||
<para>The class <literal>Tokenizer</literal> creates the token lists and
|
<itemizedlist>
|
||||||
performs all simplifications.</para>
|
<listitem>
|
||||||
|
<para>The templates are instantiated</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
<para>The class <literal>Token</literal> is used for every token in the
|
<listitem>
|
||||||
token list. The <literal>Token</literal> class also contains
|
<para>The typedefs are handled</para>
|
||||||
functionality for matching tokens.</para>
|
</listitem>
|
||||||
|
|
||||||
<section>
|
<listitem>
|
||||||
<title>Normal token list</title>
|
<para>There is no "else if". These are converted into "else {
|
||||||
|
if.."</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
<para>The first token list that is created has many basic
|
<listitem>
|
||||||
simplifications. For example:</para>
|
<para>The bodies of "if", "else", "while", "do" and "for" are always
|
||||||
|
enclosed in "{" and "}"</para>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
<itemizedlist>
|
<listitem>
|
||||||
<listitem>
|
<para>A declaration of multiple variables is split up into multiple
|
||||||
<para>There are no templates. Templates have been
|
variable declarations. "int a,b;" => "int a; int b;"</para>
|
||||||
instantiated.</para>
|
</listitem>
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>There is no "else if". These are converted into "else { if
|
<para>There is no sizeof</para>
|
||||||
.."</para>
|
</listitem>
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>The bodies of "if", "else", "while", "do" and "for" are
|
<para>NULL is replaced with 0</para>
|
||||||
always enclosed in "{" and "}".</para>
|
</listitem>
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>A declaration of multiple variables is split up into
|
<para>Static value flow analysis is made. Known values are inserted
|
||||||
multiple variable declarations. "int a,b;" => "int a; int
|
into the code.</para>
|
||||||
b;"</para>
|
</listitem>
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<para>All variables have unique ID numbers</para>
|
<para>.. and many more</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
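<para>These simplifications decide what a pattern has to look like. The following is a minimal sketch, not a definitive rule, that looks for a redundant check before <literal>free</literal>. It relies on the statement above that the body of an "if" is always enclosed in "{" and "}", so raw code such as "if (p) free(p);" is assumed to be seen by the rule as "if ( p ) { free ( p ) ; }". The id and message texts are hypothetical, and the <literal>severity</literal> and <literal>summary</literal> elements are assumed from the Cppcheck rule file format.</para>

```xml
<?xml version="1.0"?>
<!-- Sketch only: redundant "if (p)" around "free(p)". -->
<!-- (\w+) captures the variable name and \1 requires the same name again. -->
<!-- The braces are part of the pattern because the processed code always has them. -->
<rule version="1">
  <pattern>if \( (\w+) \) \{ free \( \1 \) ; \}</pattern>
  <message>
    <!-- hypothetical id and texts, chosen for this example -->
    <id>redundantConditionBeforeFree</id>
    <severity>style</severity>
    <summary>Redundant condition. It is valid to free a null pointer.</summary>
  </message>
</rule>
```

<para>Which simplifications are actually applied may depend on the Cppcheck version, so it is worth checking a pattern like this against a small test file before relying on it.</para>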
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
<para>The simplifications are made in the <literal>Cppcheck</literal>
|
||||||
<title>Simplified token list</title>
|
<literal>Tokenizer</literal>. For more information see:
|
||||||
|
<uri>http://cppcheck.sourceforge.net/doxyoutput/classTokenizer.html</uri></para>
|
||||||
<para>The second token list that is created has all simplifications
|
|
||||||
the normal token list has and then many more simplifications. For
|
|
||||||
example:</para>
|
|
||||||
|
|
||||||
<itemizedlist>
|
|
||||||
<listitem>
|
|
||||||
<para>There is no sizeof</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
|
||||||
<para>There are no templates.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
|
||||||
<para>Control flow transformations.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
|
||||||
<para>NULL is replaced with 0.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
|
||||||
<para>Static value flow analysis is made. Known values are
|
|
||||||
inserted into the code.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
|
||||||
<para>variable initialization is replaced with assignment</para>
|
|
||||||
</listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
|
|
||||||
<para>The simple token list is written if you use
|
|
||||||
<literal>--debug</literal>. For example, use <literal>cppcheck --debug
|
|
||||||
test1.cpp</literal> and check this code:</para>
|
|
||||||
|
|
||||||
<programlisting>void f1() {
|
|
||||||
int a = 1;
|
|
||||||
f2(a++);
|
|
||||||
}</programlisting>
|
|
||||||
|
|
||||||
<para>The result is:</para>
|
|
||||||
|
|
||||||
<programlisting>##file test1.cpp
|
|
||||||
1: void f1 ( ) {
|
|
||||||
2: ; ;
|
|
||||||
3: f2 ( 1 ) ;
|
|
||||||
4: }</programlisting>
|
|
||||||
|
|
||||||
<para></para>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Reference</title>
|
|
||||||
|
|
||||||
<para>To learn more about the token lists, the doxygen information for
|
|
||||||
the <literal>Tokenizer</literal> is recommended.</para>
|
|
||||||
|
|
||||||
<para>http://cppcheck.sourceforge.net/doxyoutput/classTokenizer.html</para>
|
|
||||||
</section>
|
|
||||||
</section>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
@ -189,10 +132,10 @@
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>Here is an example:</para>
|
<para>Here is a simple example:</para>
|
||||||
|
|
||||||
<programlisting><?xml version="1.0"?>
|
<programlisting><?xml version="1.0"?>
|
||||||
<rule data="simple">
|
<rule version="1">
|
||||||
<pattern>/ 0</pattern>
|
<pattern>/ 0</pattern>
|
||||||
<message>
|
<message>
|
||||||
<id>divbyzero</id>
|
<id>divbyzero</id>
|
||||||
|
@ -201,21 +144,8 @@
|
||||||
</message>
|
</message>
|
||||||
</rule></programlisting>
|
</rule></programlisting>
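<para>The pattern "/ 0" is an ordinary regular expression, so it may also match the beginning of a longer token such as "0.5" in "x / 0.5". The variant below is a sketch of one way to tighten it: a negative lookahead rejects a "0" that is followed by a word character or a dot, and the character class also covers the modulo operator. The id and message texts are hypothetical, and the <literal>severity</literal> and <literal>summary</literal> elements are assumed from the Cppcheck rule file format.</para>

```xml
<?xml version="1.0"?>
<!-- Sketch only: division or modulo by a literal zero. -->
<!-- [/%] matches either operator token. -->
<!-- (?![\w.]) rejects tokens that merely start with 0, such as 0.5 or 0x10. -->
<rule version="1">
  <pattern>[/%] 0(?![\w.])</pattern>
  <message>
    <!-- hypothetical id and texts, chosen for this example -->
    <id>divOrModByZero</id>
    <severity>error</severity>
    <summary>Division or modulo by zero</summary>
  </message>
</rule>
```

<para>A rule file like this can be passed to Cppcheck with the <literal>--rule-file</literal> option.</para>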
|
||||||
|
|
||||||
<para>It is recommended that you use the <literal>simple</literal> token
|
<para></para>
|
||||||
list whenever you can. If you need some information that is removed in it
|
|
||||||
then try the <literal>normal</literal> token list.</para>
|
|
||||||
|
|
||||||
<para>When you write the patterns remember that:</para>
|
<para></para>
|
||||||
|
|
||||||
<itemizedlist>
|
|
||||||
<listitem>
|
|
||||||
<para>tokens are always separated by spaces. "1+2" is not
|
|
||||||
possible.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem>
|
|
||||||
<para>there is no indentation, spaces, comments, line breaks.</para>
|
|
||||||
</listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
</section>
|
</section>
|
||||||
</article>
|
</article>
|
||||||
|
|