12 Jun 2019 - tsp

Last update 27 Nov 2019

19 mins

** Disclaimer**: This is only a short primer that shows basic principles of formal
verification of C programs. This article may be updates at a later point in time. Note
that formal verification is very different than most programmers think (as one has to
clearly specify behaviour - as one should normally think about software but - let’s be
fair - normally programms don’t do). Formal verification takes a lot more time than
just hacking something and is not really required in most circumstances where basic
testing (unit tests, smoke tests, etc.) is sufficient. But it adds the ability to be really
sure about properties about program behaviour which is interesting in critical applications.

Frama-C is a collection of tools to help with static and dynamic analysis of software. It can be used on plain C code but is mostly used with ACSL annotated C code that defines some expected behaviour (Hoare logic). Frama-C is a tool developed at Inria and by the French Alternative Energies and Atomic Energy Commission (CEA). It has a modular plugin architecture that provides

- Value analysis
- Deductive proofes (based on Why that uses theorem proofers like Alt-Ergo or Coq)
- Weakest precondition based proofs
- Impact analysis
- Slicing of programs (i.e. generating subsets of programs that preserve some of the original properties)
- Dead code / spare code removal
- Dominator computation
- Functional dependency computation

It depends on the C intermediate language during parsing - Frama-C does support most of but not all of ANSI C. There is one major feature often used in C that currently cannot be handeled by Frama-C: Function pointers.

Note that formal verification does not proof that your program is **errorless**. It can
only proof that the program fulfills some properties that you require. If you specify the
wrong properties wrong behaving programs are of course also verified correctly to conform
to the wrong specification.

The weakest precondition calculus is used to proof that a codeblock `C`

indeed transforms
a set of preconditions `P`

into a set of postconditions `Q`

. These three parts form
the Horae tripple `{P,C,Q}`

. For forward calculations one could in theory calculate if the
specified postconditions is as strong or weaker than the calculated postcondition (based on the
preconditions `P`

transformed by the transformation `C`

) to reason that the proof is
satisfied. For weakest precondition the proof is normally done backwards using Dijkstras approach - the
postcondition is inversely transformed. If the specified precondition is at least as strong as the
calculated required precondition the proof is satisfied.

Currently the usage of Frama-C is easy on most operating systems despite Microsoft Windows. It is
runnable on Windows in some cases but it’s hard to achieve. To install it on FreeBSD one can use
the `devel/frama-c`

port or package, on most Linux systems it’s available via the `frama-c`

package (i.e. `apt-get install frama-c`

, `yum install frama-c`

, etc.).

If one already has a running opam installation one can use opam to install frama-c and alt-ergo together with all dependencies:

```
opam install frama-c
opam install alt-ergo
```

If one decides to do that separately one also has to install either `math/coq`

or `math/alt-ergo`

(i.e. on Linux the coq and alt-ergo packages) or both to be used as a proof assistant.

One can use a really simple example to perform some basic arithmetics and proof the program is correct to verify that the installation worked indeed. We’ll write a short program that proofs that a swap and an add function performs correctly:

```
#include <stdio.h>
/*@
requires \valid(a) && \valid(b);
assigns *a, *b;
ensures *a == \old(*b);
ensures *b == \old(*a);
*/
static void swap(int* a, int* b) {
int temp;
temp = (*a);
(*a) = (*b);
(*b) = temp;
}
/*@
assigns \result;
ensures \result == (a - b);
*/
static int difference(int a, int b) {
return a-b;
}
int main(int argc, char* argv[]) {
int a = 21;
int b = 42;
swap(&a, &b);
//@ assert a == 42 && b == 21;
int c = difference(b, a);
//@ assert c == 22;
return 0;
}
```

First one can add the sourcefile to the Frama-C project when using the GUI

As one can see frama-c already parses and somewhat preprocesses the original code shown on the right into a slightly modified form shown on the left. If no proof assistant has been run all ACSL statements are marked with blue circles - that means that these are just statements that have not been proofen / processed until now.

Then one can use the WP plugin to proof the behaviour of the code. To run analysis one selects the **Analyses**
item from the **Analyses** menu. Then one can configure the WP plugin to use the WP module under WPs **goal selection**.

As one can see all ACSL statements have been processed successfully except the error we’ve made in our assertion.

After we’ve spotted both errors (hint: There is one in the assertion and one logical error) and fixed it we can
re-run the proofer after re-parsing the code with **File/Reparse**

To demonstrate that Frama-C / Alt-Ergo helps us to catch errors that are most often overlooked we add the
postcondition, that the subtraction result of `a-b`

should be smaller than `a`

.

This is of course wrong because of two conditions. The first one is easy to spot: b may be negative or zero.
Let’s just assume that `b`

always has to be positive as well as non-zero and add the requirement to the
subtraction function.

As we expect this is returns success.

Now let’s demonstrate another possible error that’s not easy to spot. Let’s implement a **absolute value** function.
This is mathematically defined to be the value itself for positive or zero input and the value times -1 for
negative input data - i.e. this is a function that always returns the positive numeric value and discards any
sign. Let’s implement that naively in C - and think about if you spot a bug here (interestingly it’s the
same kind of bug that lead to the anomality during the Ariane 5 launch which caused the rocket to explode):

```
#include <limits.h>
/*@
ensures \result >= 0;
behavior inPositive:
assumes inValue >= 0;
ensures \result == inValue;
assigns \result;
behavior inNegative:
assumes inValue < 0;
ensures \result == -inValue;
assigns \result;
complete behaviors inPositive, inNegative;
disjoint behaviors inPositive, inNegative;
*/
int absValue(int inValue) {
if(inValue >= 0) {
return inValue;
} else {
return -1 * inValue;
}
}
int main(int argc, char* argv[]) {
int a = +1;
int b = -1;
int c = -9223372036854775808;
int tmp = absValue(a);
//@ assert(tmp == 1);
tmp = absValue(b);
//@ assert(tmp == 1);
tmp = absValue(c);
//@ assert(tmp == 9223372036854775808);
}
```

One has to enable RTE guards this time.

As we can see we spot an error in our `absValue`

function. The RTE guards inserted
automatically assertions that clamp our signed integer to the supported range (on 32 bit
systems normally in the range −2147483648 to 2147483647 and on 64 bit systems normally
in the range from −9223372036854775808 to 9223372036854775807). As we can see there exists
the minimal signed integer value that is not projectable onto any positive number. This is
because of the encoding of integers on most platforms.

If one re-visits the previous example - especially the difference function - with RTE guards in place

```
/*@
requires b > 0;
assigns \result;
ensures \result == (a - b);
ensures (\result < a);
*/
static int difference(int a, int b) {
return a-b;
}
```

one can also spot that this function does ** not** satisfy the specified requirements!

Indeed this can only be fixed if we limit the input range of the function inside our
precondition (the `requires`

statement) - which will of course be checked against each
of our calling functions as well.

As we’ve already seen with the examples function contracts specify preconditions and postconditions as well as (not seen) axioms for functions.

The ** postcondition** is specified via the

`ensures`

statement. As all contracts it’s terminated
with a semicolon at the end. As usual there can be multiple `ensures`

statements per contract,
globally specified postconditions as well as per-behaviour conditions. Note that one could
also simply join postconditions with logical ands (since these are simple mathematical statements
and not code that’s run in sequence) but using multiple ensure statements is often more readable.As usual the normal well known connectors are available:

- Negation via
`!`

- Bitwise complement via
`~`

- Bitwise and
`&`

, Bitwise or`|`

, Implication`-->`

as well as equivalence relation`<-->`

- Boolean and
`&&`

, Boolean or`||`

, Boolean exclusive or`^^`

- All of the well known operations like addition, subtraction, multiplication, division, modulo, bitshifts
- Equality via double equal signs
`==`

, inqeuality`!=`

and of course well known larger and smaller comparistons - Pointer dereferencing via
`*`

and address-of operators`&`

are of course also supported.

One can always **name** the properties one is defining:

```
/*@
requires \valid(a) && \valid(b);
assigns *a, *b;
ensures elements_swapped : (*a == \old(*b)) && (*b == \old(*a));
*/
static void swap(int* a, int* b) {
int temp;
temp = (*a);
(*a) = (*b);
(*b) = temp;
}
```

Naming is especially useful for longer contracts.

** Preconditions** are specified via

`requires`

. Normally a C programmer would also check them
with parameter validation statements - but if the program is always proofen one can discard these
checks and rely on the proof alone - since the proofer proofs the whole program.One can use typical mathematical notation:

```
/*@
requires 0 < seed <= 100;
*/
void exampleFunctionRandomSeed1To100(int seed) {
/* ... */
}
```

As we have also seen we can access the values before applying the function via `\old`

as well
as the values after the execution without any modifier. If one wants to add statements about intermediate
state at the end one can add a label there and use the `\at`

modifier:

```
int a = 21;
notthewholetruth:
a = 42;
//@ assert a == 42;
//@ assert \at(a, notthewholetruth) == 21;
```

There are 6 predefined labels in ACSL:

`Pre`

and`Old`

is the value before the function call`Post`

is the value after the function call`LoopEntry`

is the value at the loop entry (i.e. before the first run of a loop)`LoopCurrent`

is the value at the beginning of the current step in the loop`Here`

is the value of the current program point - which is how we think of variables in a sequential program.

One can also require that pointers point to valid memory locations. This is done via the `\valid`

function.
Note that the memory model of proofers cannot handle arbitrary casts - in fact it normally models disjoint
memory locations for different types! One can verify a single pointer location with `\valid(a)`

or a whole
memory range with a statement like `\valid(a + (0 .. 100))`

which would require `a[0]`

up to `a[100]`

to be valid. If one only accesses a variable with reads one can also use `\valid_read`

that’s also true
for const pointers.

** Side effects** are specified via the

`assigns`

clause. One has to list all memory locations that
get modified by the function. If a function is side-effect free one can use `assigns \nothing`

To ensure that pointers do not point to overlapping memory locations one can use the `\separated`

statement
that receives a list of memory locations that are not allowed to overlap:

```
/*@
requires \separated(a,b,...,c);
*/
```

As has also been seen above one can specify different behaviours that have different preconditions that lead
to different postconditions. Note that behaviours have to be `complete`

(i.e. describe all possible outcomes)
and `disjoint`

which means they do not overlap (i.e. are true alternatives).

Note that for all examples we have annotated the functions directly at their implementation. To use modular verification one should add the specifications to the definitions (i.e. the header files). This allows to run the verification against the implementation (i.e. the source file) as well as use the definitions to validate users of the function. In case one verifies using the modular structure fetching definitions from the header files Frama-C assumes that the postconditions are true (this is shown when using the UI using half-green, half-blue bullets besides the annotations).

This is also how ANSI-C standard library is realized for Frama-C. It uses it’s own set of header files that contain standards conformant annotations for all standards library functions.

Up until now we’ve only seen the usage of ACSL to annotate functions one can also use it to provide statements
about loops. There is the `loop invariant`

and `loop assigns`

value. With `loop invariant`

we can
specify the range of our loop variable. For example when emulating a `for`

loop with an `while`

:

```
int i = 1;
int h = 10;
/*
Note: The following program contains an error!
*/
/*@
loop invariant 1 <= i < 100;
loop invariant h == \at(h, Pre) + (i-1);
loop assigns i;
*/
while(i < 100) {
i++;
h++;
}
//@ assert i == 99
```

One can also specify a `loop variant`

- that is an expression that has to be strictly decreasing
on each loop iteration (and thus guarantees termination of the loop)

```
/*@
loop assigns \nothing;
loop variant n-i;
*/
for(int i = 0; i < n; i++) {
// ...
}
```

In mathematics one often uses the `forall`

operator to specify that a statement applies to
a number of cases. This can be done in ACSL too. For example to specify that all entries in an
array will be set to a value in the range from 0 to 10 inclusive:

```
/*@
requires \valid(data + (0 .. len-1));
assigns data[0 .. len-1];
ensures \forall size_t i; 0 <= i < len ==> 0 <= data[i] <= 10;
*/
void setToRange(int* data, size_t len);
```

If we do that with an random statement:

```
/*@
requires \valid(data + (0 .. len-1));
assigns data[0 .. len-1];
ensures \forall size_t i; 0 <= i < len ==> 0 <= data[i] <= 10;
*/
void setToRange(int* data, size_t len) {
/*@
loop invariant 0 <= i <= len;
loop invariant \forall size_t j; 0 <= j < len ==> 0 <= data[j] <= 10;
loop assigns i, data[0 .. len-1];
loop variant len-i;
*/
for(size_t i = 0; i < len; i++) {
data[i] = rand() % 11;
}
}
```

To annotate a more complex clamping function that should limit an array of input integers to values between 0 to 255:

```
/*@
requires \valid(data + (0 .. len-1));
assigns data[0 .. len-1];
ensures \forall size_t i; 0 <= i < len && \old(data[i]) > 255 ==> data[i] == 255;
ensures \forall size_t i; 0 <= i < len && \old(data[i]) < 0 ==> data[i] == 0;
ensures \forall size_t i; 0 <= i < len && \old(data[i]) >= 0 && \old(data[i] <= 255) ==> data[i] == \old(data[i]);
*/
void clampData(int* data, size_t len) {
/*@
loop invariant 0 <= i <= len;
loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) > 255 ==> data[j] == 255;
loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) < 0 ==> data[j] == 0;
loop invariant \forall size_t j; 0 <= j < i && \at(data[j], Pre) >= 0 && \at(data[j], Pre) <= 255 ==> data[j] == \at(data[j], Pre);
loop assigns i, data[0 .. len-1];
loop variant len-i;
*/
for(size_t i = 0; i < len; i++) {
if(data[i] > 255) { data[i] = 255; }
else if(data[i] < 0) { data[i] = 0; }
}
}
```

Note that assertions about unreachable code are - as normal with mathematical implication - always true since it’s not deducible that a statement that never runs is false or an event that never happens leads to an error.

The following example function does a binary search (by reducing a bracket around the target value step by step) inside an ordered array:

```
/*@
requires len > 0;
requires \valid_read(data + (0 .. len-1));
requires \forall int i,j; 0 <= i <= j < len ==> data[i] < data[j];
behavior found:
requires \exists int i; 0 <= i < len ==> data[i] == key;
ensures \result >= 0 && \result < len;
behavior notfound:
requires \forall int i; 0 <= i < len ==> data[i] != key;
ensures \result == -1;
assigns \nothing;
*/
int searchData(int* data, int len, int key) {
if(len <= 0) { return -1; }
int low = 0;
int up = len-1;
/*@
loop assigns low,up;
loop invariant up < len;
loop invariant low < len;
loop invariant low >= 0;
loop invariant up >= 0;
*/
while(low < up) {
int mid = low + (up - low)/2;
if(data[mid] > key) {
low = mid;
} else if(data[mid] < key) {
up = mid;
} else {
return mid;
}
}
if(data[low] == key) { return low; }
if(data[up] == key) { return up; }
return -1;
}
```

Predicates are the (pure) function equivalent to C functions. They allow to write the conditions we have already used and name them. They can receive arguments as functions from programming languages and have to be either true or false. These can then be used later on in different places.

For example the previous searchData function can be rewritten with predicates:

```
/*@
predicate orderedIntArray(int* data, size_t len, int val) =
len > 0 &&
\valid_read(data + (0 .. len-1)) &&
\forall int i,j; 0 <= i <= j < len ==> data[i] < data[j];
predicate elementInIntArray(int* data, size_t len, int val) =
len > 0 &&
\valid_read(data + (0 .. len-1)) &&
\exists int i; 0 <= i < len ==> data[i] == key;
*/
/*@
requires orderedIntArray(data, len, key);
behavior found:
requires elementInIntArray(data, len, key);
ensures \result >= 0 && \result < len;
behavior notfound:
requires !elementInIntArray(data, len, key);
ensures \result == -1;
assigns \nothing;
*/
int searchData(int* data, int len, int key) {
if(len <= 0) { return -1; }
int low = 0;
int up = len-1;
/*@
loop assigns low,up;
loop invariant up < len;
loop invariant low < len;
loop invariant low >= 0;
loop invariant up >= 0;
*/
while(low < up) {
int mid = low + (up - low)/2;
if(data[mid] > key) {
low = mid;
} else if(data[mid] < key) {
up = mid;
} else {
return mid;
}
}
if(data[low] == key) { return low; }
if(data[up] == key) { return up; }
return -1;
}
```

As with C++ predicates can be overloaded as long as they differ in their type signature.

Logic functions on the other hand can be functions that describe program behaviour in a logical expression (for example the calculation performed by an function). These can be used in any comparison or similar context:

```
/*@
logic vectorInProduct2D(int* v1, int* v2) = v1[0] * v2[0] + v1[1] + v2[1];
*/
```

One can of course use recursion to realize loops but has to take care of the limits of the proof assistant used to verify the expression.

This article is tagged: Programming, Math, Formal, Frama C, Security

Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)

This webpage is also available via TOR at http://coihcmhmb6cg6bvtelykwlte45yqhxkl6ffdoco5kc3a4qn3uno53oqd.onion/