<?xml version="1.0" encoding="UTF-8"?>
<section>
<info>
<title>Part 2 - The Cppcheck data representation</title>

<author>
<firstname>Daniel</firstname>

<surname>Marjamäki</surname>

<affiliation>
<orgname>Cppcheck</orgname>
</affiliation>
</author>

<pubdate>2010</pubdate>
</info>

<section>
<title>Introduction</title>

<para>In this article I will discuss the data representation that Cppcheck
uses.</para>

<para>The data representation that Cppcheck uses is specifically designed
for static analysis. It is not intended to be generic and useful for other
tasks.</para>
</section>

<section>
<title>See the data</title>

<para>There are two ways to look at the data representation at
runtime.</para>

<para>Using <literal>--rule=.+</literal> is one way. All tokens are written
on a single line:</para>

<programlisting> int a ; int b ;</programlisting>

<para>Using <literal>--debug</literal> is another way. The tokens are line
separated in the same way as the original code:</para>

<programlisting>1: int a@1 ;
2: int b@2 ;</programlisting>

<para>In the <literal>--debug</literal> output there are "@1" and "@2"
shown. These are the variable ids (Cppcheck gives each variable a unique
id). You can ignore these if you only plan to write rules with regular
expressions, because variable ids cannot be used in regular
expressions.</para>

<para>In general, I will use the <literal>--rule=.+</literal> output in
this article because it is more compact.</para>
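
<para>Because the <literal>--rule=.+</literal> output is a single line of
space-separated tokens, a rule pattern can be tried out against such a line
with any regular expression library. The following is only a minimal sketch
and not Cppcheck code: the helper name <literal>countRuleMatches</literal>
is hypothetical, and <literal>std::regex</literal> merely stands in for the
regular expression engine that Cppcheck itself is built with, so the pattern
syntax may differ in details.</para>

<programlisting language="cpp"><![CDATA[#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of a rule
// pattern in a one-line token string such as " int a ; int b ;".
int countRuleMatches(const std::string& tokens, const std::string& pattern)
{
    const std::regex re(pattern);
    const auto first = std::sregex_iterator(tokens.begin(), tokens.end(), re);
    const auto last = std::sregex_iterator();
    return static_cast<int>(std::distance(first, last));
}]]></programlisting>

<para>For the token line <literal> int a ; int b ;</literal>, the pattern
<literal>int \w+ ;</literal> matches twice in this sketch, once per
declaration, while <literal>.+</literal> matches the whole line once.</para>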
</section>
<section>
<title>Some of the simplifications</title>

<para>The data is simplified in many ways.</para>

<section>
<title>Preprocessing</title>

<para>The Cppcheck data is preprocessed. There are no comments, #define,
#include, etc.</para>

<para>Original source code:</para>

<programlisting>#define SIZE 123
char a[SIZE];</programlisting>

<para>The Cppcheck data for that is:</para>

<programlisting> char a [ 123 ] ;</programlisting>
</section>

<section>
<title>typedef</title>

<para>Typedefs are simplified. Each use of a typedef name is replaced by
the type it stands for.</para>

<programlisting>typedef char s8;
s8 x;</programlisting>

<para>The Cppcheck data for that is:</para>

<programlisting> ; char x ;</programlisting>
</section>

<section>
<title>Calculations</title>

<para>Calculations are simplified.</para>

<programlisting>int a[10 + 4];</programlisting>

<para>The Cppcheck data for that is:</para>

<programlisting> int a [ 14 ] ;</programlisting>
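
<para>To make the idea concrete, here is a minimal sketch of folding a
constant addition in a token list. It is not how Cppcheck implements the
simplification: the helper names are hypothetical, only non-negative decimal
integers and the <literal>+</literal> operator are handled, and an addition
is left alone when a neighbouring operator would change its meaning.</para>

<programlisting language="cpp"><![CDATA[#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if the token consists only of decimal digits.
static bool isNumber(const std::string& tok)
{
    if (tok.empty())
        return false;
    for (unsigned char c : tok) {
        if (!std::isdigit(c))
            return false;
    }
    return true;
}

// True if an operator next to the addition makes folding unsafe:
// "*", "/" and "%" on either side, and "-" in front of the left operand.
static bool blocksFolding(const std::string& tok, bool before)
{
    if (tok == "*" || tok == "/" || tok == "%")
        return true;
    return before && tok == "-";
}

// Hypothetical helper: folds "<number> + <number>" into a single token.
std::vector<std::string> foldAdditions(std::vector<std::string> tokens)
{
    std::size_t i = 0;
    while (i + 2 < tokens.size()) {
        const bool candidate = isNumber(tokens[i]) && tokens[i + 1] == "+" &&
                               isNumber(tokens[i + 2]);
        const bool blockedBefore = i > 0 && blocksFolding(tokens[i - 1], true);
        const bool blockedAfter = i + 3 < tokens.size() &&
                                  blocksFolding(tokens[i + 3], false);
        if (candidate && !blockedBefore && !blockedAfter) {
            const long sum = std::stol(tokens[i]) + std::stol(tokens[i + 2]);
            tokens[i] = std::to_string(sum);
            // Remove the "+" and the right operand; stay at i so that
            // chains such as 1 + 2 + 3 are folded completely.
            tokens.erase(tokens.begin() + i + 1, tokens.begin() + i + 3);
        } else {
            ++i;
        }
    }
    return tokens;
}]]></programlisting>

<para>Applied to the tokens <literal>int a [ 10 + 4 ] ;</literal>, the
sketch yields <literal>int a [ 14 ] ;</literal>, matching the Cppcheck data
shown above.</para>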
</section>
<section>
<title>Variables</title>

<section>
<title>Variable declarations</title>

<para>Variable declarations are simplified. Only one variable can be
declared at a time. The initialization is also broken out into a
separate statement.</para>

<programlisting>int *a=0, b=2;</programlisting>

<para>The Cppcheck data for that is:</para>

<programlisting>int * a ; a = 0 ; int b ; b = 2 ;</programlisting>

<para>This is done even in the global scope, even though the resulting
assignment statements are invalid there in C/C++.</para>
</section>

<section>
<title>Known variable values</title>

<para>Known variable values are simplified.</para>

<programlisting>void f()
{
    int x = 0;
    x++;
    array[x + 2] = 0;
}</programlisting>

<para>The <literal>--debug</literal> output for that is:</para>

<programlisting>1: void f ( )
2: {
3: ; ;
4: ;
5: array [ 3 ] = 0 ;
6: }</programlisting>

<para>The variable x is removed because it is not used after the
simplification. It is therefore redundant.</para>

<para>Known values don't have to be numeric. Variable aliases,
pointer aliases, strings, etc. should be handled too.</para>

<para>Example code:</para>

<programlisting>void f()
{
    char *a = strdup("hello");
    char *b = a;
    free(b);
}</programlisting>

<para>The <literal>--debug</literal> output for that is:</para>

<programlisting>1: void f ( )
2: {
3: char * a@1 ; a@1 = strdup ( "hello" ) ;
4: ; ;
5: free ( a@1 ) ;
6: }</programlisting>
</section>
</section>
<section>
<title>if/for/while</title>

<section>
<title>Braces in if/for/while-body</title>

<para>Cppcheck makes sure that there are always braces in if/for/while
bodies.</para>

<programlisting>if (x)
    f1();</programlisting>

<para>The Cppcheck data for that is:</para>

<programlisting> if ( x ) { f1 ( ) ; }</programlisting>
</section>

<section>
<title>No else if</title>

<para>The simplified data representation doesn't have "else
if".</para>

<programlisting>void f(int x)
{
    if (x == 1)
        f1();
    else if (x == 2)
        f2();
}</programlisting>

<para>The <literal>--debug</literal> output:</para>

<programlisting>1: void f ( int x@1 )
2: {
3: if ( x@1 == 1 ) {
4: f1 ( ) ; }
5: else { if ( x@1 == 2 ) {
6: f2 ( ) ; } }
7: }</programlisting>
</section>

<section>
<title>Condition is always true / false</title>

<para>Conditions that are always true / false are simplified.</para>

<programlisting>void f()
{
    if (true) {
        f1();
    }
}</programlisting>

<para>The Cppcheck data is:</para>

<programlisting> void f ( ) { { f1 ( ) ; } }</programlisting>

<para>Another example:</para>

<programlisting>void f()
{
    if (false) {
        f1();
    }
}</programlisting>

<para>The Cppcheck data is:</para>

<programlisting> void f ( ) { }</programlisting>
</section>

<section>
<title>Assignments</title>

<para>Assignments within conditions are broken out from the
condition.</para>

<programlisting>void f()
{
    int x;
    if ((x = f1()) == 12) {
        f2();
    }
}</programlisting>

<para>The <literal>x=f1()</literal> is broken out. The
<literal>--debug</literal> output:</para>

<programlisting>1: void f ( )
2: {
3: int x@1 ;
4: x@1 = f1 ( ) ; if ( x@1 == 12 ) {
5: f2 ( ) ;
6: }
7: }</programlisting>

<para>Replacing the "if" with "while" in the above example:</para>

<programlisting>void f()
{
    int x;
    while ((x = f1()) == 12) {
        f2();
    }
}</programlisting>

<para>The <literal>x=f1()</literal> is broken out twice: once before the
loop and once at the end of the loop body, so that the condition still sees
a fresh value on every iteration. The <literal>--debug</literal>
output:</para>

<programlisting>1: void f ( )
2: {
3: int x@1 ;
4: x@1 = f1 ( ) ; while ( x@1 == 12 ) {
5: f2 ( ) ; x@1 = f1 ( ) ;
5:
6: }
7: }</programlisting>
</section>

<section>
<title>Comparison with &gt;</title>

<para>Comparisons are simplified. The two conditions in this example
are logically the same:</para>

<programlisting>void f()
{
    if (x &lt; 2);
    if (2 &gt; x);
}</programlisting>

<para>Cppcheck data doesn't use <literal>&gt;</literal> for
comparisons. It is converted into <literal>&lt;</literal> instead. In
the Cppcheck data there is no difference for <literal>2&gt;x</literal>
and <literal>x&lt;2</literal>.</para>

<programlisting>1:
2: void f ( )
3: {
4: if ( x &lt; 2 ) { ; }
5: if ( x &lt; 2 ) { ; }
6: }</programlisting>

<para>A similar conversion happens when <literal>&gt;=</literal> is
used.</para>
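
<para>The conversion can be pictured as swapping the operands and flipping
the operator. The following is a minimal sketch of that idea, not Cppcheck
code: the helper name <literal>normalizeGreater</literal> is hypothetical,
only single-token operands directly enclosed in parentheses are handled, and
it is an assumption here that <literal>&gt;=</literal> becomes
<literal>&lt;=</literal>.</para>

<programlisting language="cpp"><![CDATA[#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: rewrites "( a > b )" as "( b < a )" and
// "( a >= b )" as "( b <= a )". Only single-token operands that are
// directly enclosed in parentheses are handled.
std::vector<std::string> normalizeGreater(std::vector<std::string> tokens)
{
    for (std::size_t i = 2; i + 2 < tokens.size(); ++i) {
        if (tokens[i] != ">" && tokens[i] != ">=")
            continue;
        if (tokens[i - 2] != "(" || tokens[i + 2] != ")")
            continue;
        // Swap the operands and flip the operator.
        std::swap(tokens[i - 1], tokens[i + 1]);
        tokens[i] = (tokens[i] == ">") ? "<" : "<=";
    }
    return tokens;
}]]></programlisting>

<para>Applied to the tokens <literal>if ( 2 &gt; x )</literal>, the sketch
yields <literal>if ( x &lt; 2 )</literal>. Tokens that already use
<literal>&lt;</literal> are returned unchanged.</para>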
</section>
<section>
<title>if (x) and if (!x)</title>

<para>If possible a condition will be reduced to x or !x. Here is some
example code:</para>

<programlisting>void f()
{
    if (!x);
    if (NULL == x);
    if (x == 0);

    if (x);
    if (NULL != x);
    if (x != 0);
}</programlisting>

<para>The <literal>--debug</literal> output is:</para>

<programlisting>1: void f ( )
2: {
3: if ( ! x ) { ; }
4: if ( ! x ) { ; }
5: if ( ! x ) { ; }
6:
7: if ( x ) { ; }
8: if ( x ) { ; }
9: if ( x ) { ; }
10: }</programlisting>
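
<para>The reduction can be sketched as a small pattern match on the tokens
of a condition. This is only an illustration under simplifying assumptions,
not Cppcheck code: the helper name <literal>reduceCondition</literal> is
hypothetical, and it only recognizes a single-token operand compared with
<literal>0</literal> or <literal>NULL</literal>.</para>

<programlisting language="cpp"><![CDATA[#include <string>
#include <vector>

// Hypothetical helper: takes the tokens between the parentheses of a
// condition and reduces comparisons against 0 or NULL to "x" or "! x".
// Any other condition is returned unchanged.
std::vector<std::string> reduceCondition(const std::vector<std::string>& cond)
{
    if (cond.size() != 3)
        return cond;

    const bool isEqual = cond[1] == "==";
    const bool isNotEqual = cond[1] == "!=";
    if (!isEqual && !isNotEqual)
        return cond;

    const auto isZero = [](const std::string& tok) {
        return tok == "0" || tok == "NULL";
    };

    // Exactly one side must be the zero constant; the other is the operand.
    std::string operand;
    if (isZero(cond[0]) && !isZero(cond[2]))
        operand = cond[2];
    else if (isZero(cond[2]) && !isZero(cond[0]))
        operand = cond[0];
    else
        return cond;

    if (isEqual)
        return {"!", operand};  // x == 0  ->  ! x
    return {operand};           // x != 0  ->  x
}]]></programlisting>

<para>In this sketch both <literal>NULL == x</literal> and
<literal>x == 0</literal> reduce to <literal>! x</literal>, and both
<literal>NULL != x</literal> and <literal>x != 0</literal> reduce to
<literal>x</literal>, which mirrors the output above.</para>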
</section>
</section>
</section>
</section>