To C or not to C: deeper code insights through static analysis
C-STAT is an innovative static analysis tool fully integrated into IAR Embedded Workbench. This article will take a closer look at what C-STAT is, and what it isn’t.
A wise man once said: “Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live.” Static analysis is one of the techniques that can help you uphold this gold standard and save you trouble down the line.
Static analysis is one of those terms that mean different things to different people, and a number of alternative definitions exist. In this article, we will talk about static analysis as a general way of analyzing source code to highlight potential errors, or code patterns that can inhibit portability to other hardware platforms or expose the code to known security vulnerabilities.
Potential errors that can be found by static analysis include arithmetic issues like conversion errors or division by zero, memory management issues like memory leaks and out-of-bounds pointers, as well as dead code and related issues. A related area is adherence to coding standards like MISRA C, or avoidance of code constructs that, for example, CERT¹ or the CWE² initiative have identified as dangerous.
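To make these categories concrete, here is a small, hypothetical C sketch. Each function is shown in its corrected form, and the comment above it describes the naive version that a static analyzer would typically flag. The function names and the buffer size `NAME_LEN` are assumptions made for illustration; they are not taken from C-STAT's documentation or from any real code base.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 8u  /* hypothetical buffer size for the example */

/* Arithmetic: the naive "return sum / count;" divides by zero when
 * count == 0. The guard makes that path explicit. */
int average(const int *values, size_t count)
{
    if (values == NULL || count == 0u) {
        return 0;
    }
    long long sum = 0;
    for (size_t i = 0u; i < count; i++) {
        sum += values[i];
    }
    return (int)(sum / (long long)count);
}

/* Conversion: the naive "uint8_t level = raw;" silently truncates
 * values above 255. Saturating makes the narrowing deliberate. */
uint8_t to_level(uint16_t raw)
{
    return (raw > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)raw;
}

/* Out of bounds: the naive "strcpy(dst, src);" writes past the end of
 * dst when src has NAME_LEN or more characters. */
void copy_name(char dst[NAME_LEN], const char *src)
{
    size_t n = strlen(src);
    if (n >= NAME_LEN) {
        n = NAME_LEN - 1u;  /* leave room for the terminator */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Memory leak: the naive version returned early when the second
 * allocation failed, without freeing the first block. */
int make_pair(int **a, int **b)
{
    *a = malloc(sizeof **a);
    if (*a == NULL) {
        return -1;
    }
    *b = malloc(sizeof **b);
    if (*b == NULL) {
        free(*a);  /* release the first block on the error path */
        *a = NULL;
        return -1;
    }
    return 0;
}
```

In every case the defect only shows up on a particular path (an empty input, a large sensor value, a long string, a failed allocation), which is exactly the kind of path that is easy to miss in testing.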
The defining characteristic of static analysis in this context is that it is done at the source level, as opposed to runtime analysis, which involves running the program to identify potential issues.
This kind of static source code analysis can be contrasted with, for example, the worst-case stack depth analysis available for some IAR Embedded Workbench targets. That is also a kind of upfront, static analysis, but it is performed on a compiled and linked representation of the final program.
One of the major theoretical benefits of static analysis is that it does not affect the performance of your system, since the system is not even running. It is also independent of the quality of your test suites. After all, finding a specific error in running code depends on executing a specific path through the program with a specific data set, whereas a static analysis tool can, at least in theory, examine all possible paths through the code.
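The difference between testing and path exploration can be illustrated with a hypothetical lookup function. A test suite that only feeds it the documented error codes 0 to 3 never reaches the out-of-range case, while a path-sensitive analyzer reasons about every value the parameter can take. The table contents and the names `messages` and `error_message` are assumptions made for this sketch.

```c
#include <stddef.h>

/* Hypothetical table of error messages, indexed by error code. */
static const char *const messages[] = {
    "ok", "timeout", "checksum", "overrun"
};

#define MESSAGE_COUNT (sizeof messages / sizeof messages[0])

/* Without the range check, a call with code == 4 (or any negative
 * value) reads outside 'messages'. Tests that only use codes 0..3
 * pass either way; an analysis that explores all paths does not
 * depend on which values the tests happen to use. */
const char *error_message(int code)
{
    if (code < 0 || (size_t)code >= MESSAGE_COUNT) {
        return "unknown";
    }
    return messages[code];
}
```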
There is a variety of static analysis tools available for C/C++ developers. At one end of the scale are simple pattern matchers that can find certain typical issues, but with a lot of noise in the form of so-called false positives (we will get back to that issue in a little while). At the other end are very sophisticated tools, with likewise sophisticated price tags, which might run for hours before presenting their results. Those results are usually more accurate than the ones produced by the simpler tools, but they can still contain a large amount of noise relative to the real findings.

At the same time, most static analysis tools do not claim to find all potential errors of a given kind, and thus leave so-called false negatives undetected. The reason is that the computing resources needed to find all true errors of a certain kind scale very badly with program size, so an exhaustive analysis run might take days to complete even for modestly sized programs. Tools that do claim to detect all errors of a kind definitely have their place, but they are not really suited for the day-to-day work of implementing and verifying new functionality in projects with tight deadlines, and they are still plagued by false positives and prohibitive price tags.
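A classic source of false positives is a pointer whose validity depends on a condition that is tested twice. In the hypothetical sketch below, `p` is only dereferenced when `use_local` is true, and on that path it always points at `local`. A tool that does not track the correlation between the two `if` statements may still report a possible NULL dereference, while a path-sensitive tool that reasons about the branch conditions (for example by constraint solving) can see that the NULL path and the dereference never occur together. The function and its names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: 'p' is NULL exactly when use_local is false,
 * and it is only dereferenced when use_local is true. */
int scaled(int value, bool use_local)
{
    int local = 10;
    int *p = NULL;

    if (use_local) {
        p = &local;
    }

    /* Unrelated work that touches neither p nor use_local. */
    value *= 2;

    if (use_local) {
        value += *p;  /* safe: p == &local on this path */
    }
    return value;
}
```

Rewriting such code just to silence a warning is often possible, but a tool that can reject this kind of report on its own saves developers from wading through noise.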
Another very real issue with using general static analysis tools in the embedded industry is that they very seldom work right out of the box in your build environment. Instead, they need a heavy dose of configuration, if not outright magic hand-waving, to handle everything from locating your source and header files to understanding the language extensions tailored for efficient programming close to the hardware.
Comparing different static analysis tools is not easy: all tools are built with different assumptions, different focus areas and different technologies, so the practical outcome of a benchmark effort might very well be that all participating tools find completely different issues in the tested code. This is made worse by the fact that some tools specialize in finding very specific errors of a very specific category and nothing else. This is clearly illustrated by the SAMATE³ project administered by the US National Institute of Standards and Technology, which runs a periodic exposition of static analysis tools called SATE, where tool vendors can take part in a highly formalized evaluation of their tools.
So where does IAR Systems come into all this? When we started to look into static analysis with the intent of providing a product to our customers, we soon found that there was a gap in the market where excellent value for money meets an extremely easy-to-use workflow with zero tool integration issues. In addition, we wanted to support a wide range of detectable problems, as well as more formalized coding standards such as the various MISRA rule sets.
The result of our endeavor is now released as C-STAT, an add-on product to IAR Embedded Workbench. C-STAT is fully integrated into the IDE and is as simple to use as the regular build tools. There is no need for complex tool setup and no struggle with language support or general build issues. On top of this, you can also analyze your code from the command line with the same benefits, if you prefer to manage your own build environment.
The following points briefly describe what C-STAT is:
- A static analyzer that supports C and C++, including all language constructs that are specific to IAR Embedded Workbench and can be enabled in the build toolchain to simplify development of code that interfaces directly with hardware.
- The static analysis is based on leading-edge technology that identifies code patterns ranging from merely suspicious to plain wrong, including buffer overflows, arithmetic and conversion issues, and problems with the temporal properties of heap management, such as using or freeing memory that has already been freed.
- C-STAT features innovative technology, based on model checking and constraint solving, to manage and reject false positives.
- C-STAT supports approximately 250 unique checks, which are mapped to approximately 600 rules from different coding standards and rule sets, including rules from CERT and the CWE initiative, as well as MISRA C:2012, MISRA C++:2008 and MISRA C:2004. Note that MISRA C:2004 (and MISRA C:1998) are also supported directly by most IAR C/C++ Compilers. That compiler-based checking is built on different technology and thus complements the C-STAT analysis nicely.
- C-STAT gives you fine-grained control over which rules and checks you want to deal with.
- For checks that benefit from cross-module analysis, C-STAT propagates information across module boundaries. C-STAT can also be run in multi-file compilation mode, where the whole source is analyzed as one module to increase the analysis precision of certain checks.
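Two of the items above, temporal properties of heap management and cross-module analysis, can be pictured with a small hypothetical sketch. It is written as one file so that it compiles on its own, but imagine `sensor_open` and `sensor_close` living in one module and `read_average` in another. An analysis confined to `read_average` cannot know that `sensor_open` may return NULL; propagating that fact across module boundaries is what allows a missing check to be reported. The `sensor` type and all function names are assumptions for illustration, not part of C-STAT or IAR Embedded Workbench.

```c
#include <stddef.h>
#include <stdlib.h>

/* ---- imagine this part in sensor.c ---- */
typedef struct {
    int samples[4];
    size_t count;
} sensor;

/* May return NULL: a fact that only cross-module analysis can carry
 * to callers in other files. */
sensor *sensor_open(void)
{
    sensor *s = calloc(1, sizeof *s);
    return s;
}

/* Frees the sensor and clears the caller's handle, so an accidental
 * second call is harmless (free(NULL) does nothing) instead of a
 * double free. */
void sensor_close(sensor **s)
{
    free(*s);
    *s = NULL;
}

/* ---- imagine this part in app.c ---- */
int read_average(void)
{
    sensor *s = sensor_open();
    if (s == NULL) {  /* required: sensor_open can fail */
        return -1;
    }
    s->samples[0] = 8;
    s->samples[1] = 4;
    s->count = 2u;

    int sum = 0;
    for (size_t i = 0u; i < s->count; i++) {
        sum += s->samples[i];
    }
    int result = sum / (int)s->count;  /* count is 2 here, never 0 */

    sensor_close(&s);  /* result is computed before the memory is freed */
    return result;
}
```

Reading `s->samples` after `sensor_close`, or dropping the NULL check, are the kinds of temporal and cross-module defects the checks described above are meant to catch.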
In conclusion, C-STAT lets you take full control of your code and improve code quality during development. We believe companies have a lot to gain by including code quality checks in the daily work of each developer. Finding issues early minimizes the impact on the finished product as well as on the project timeline, not to mention the reduced personal risk if there actually is a violent psychopath maintaining your code down the line.
¹ The CERT C/C++ Secure Coding Standards are published by the Computer Emergency Response Team (CERT) and provide rules and recommendations for secure coding in the C/C++ programming languages. More information is available at www.cert.org
² CWE (the Common Weakness Enumeration) is a community-developed dictionary of software weaknesses and vulnerabilities. More information is available at cwe.mitre.org/
³ More information about SAMATE is available at samate.nist.gov/Main_Page.html