Static Code Analysis for PHP

Static Code Analysis for PHP

Static Code Analysis (SCA) is the first of the three pillars of automated testing. This is the practice of running tools that compile (depending on the language you use) and parse your source code, matching to see of various rules are violated. No code is executed per se. These are the cheapest possible checks that you can add to your continuous integration system. This is your first line of defense. Those tools are usually set only once and they do not need additional adjustment. If you’re applying them to existing project, you’l need to fix all the reported issues first, then add the check to your CI. Otherwise the build will constantly fail. There might be some significant work, depending on the size and the legacy of your application.

If any of the checks fail, it should break the build. While there are pretty graphic tools that generate marvelous reports, stick to command line tools, since you’ll be implementing them into your build process.

PHP hipsters use Notepad

Let’s dig deeper into PHP land. Since the language is interpreted, there is no native compilation. All the checks that are built into the compliers of the compiled languages are missing. And since given the attitude of the majority of the PHP developers to not use IDE at all, there are lots of simple to catch errors that are pushed on production every day. If you use a modern PHP IDE it will light up like a Christmas tree on most of those errors. Otherwise nothing will stop you from pushing a line of code missing the semicolon at the end. Your application will perform just fine, until the execution reaches the problematic line.

PHP linter

This is the first tool in our SCA for PHP arsenal. Surprisingly enough, it is built in into the PHP command line.

php -l the_file_to_check.php

Say you missed semicolon at the end:

$url_parts = parse_url($url)

The linter will complain:

php -l api/models/mobile_push_model.php
PHP Parse error: api/models/mobile_push_model.php on line 61
Errors parsing api/models/mobile_push_model.php

The linter will catch only syntax errors similar to the one above, but it's a cheap (as in very fast) check. You should break the build and not run any expensive tests if the linter reports an error.

HHVM

HHVM is created by Facebook, attempting to speed up PHP code execution by using just-in-time compilation. You can use it for this purpose (reports say you may get more than 40% speed improvement), however for SCA purposes, you'll be using its analyzer. The command looks like this:

hhvm --hphp -t analyze --input-list="php_files_to_check.txt" --include-path="libs/external" -o static_analysis

HHVM will catch more advanced errors such as calling a method that is not declared:

$pusher->unsubscribeDevice($params, $actions);

The error would be:

UnknownObjectMethod in file:
api/models/mobile_push_model.php, line: 55, problem entry: 
$pusher->unsubscribeDevice($params, $actions) 

Other errors include calling a method with more or with less arguments than defined, using undefined variable, using result form void method and so on. The full list of errors can be found here.

Since HHVM requires lots of resources and a particular OS - Linux, you may not be able to run it on your local machine (though once running, it's surprisingly fast).

I'm using a small script in Ruby to check HHVM output for errors, print them and exit with non zero status if they are present. This is needed because you'll want to use this script in your CI environment

The usage would be like this:

check_for_hhvm_errors.rb static_analysis/CodeError.js

It's too bad that Facebook decided to remove ‘-t analyze’ option from HHVM. You can still use it though, just use HHVM version 3.3.5 or lower.

PHP Mess Detector

PHPMD can detect lots of errors that, while not critical should be fixed to keep the code clean. Two of the most important ones are cyclomatic complexity and maximum method size.

PHPMD is easily configured with a xml config file. Here is how to do that for cyclomatic complexity:


<rule ref="rulesets/codesize.xml/CyclomaticComplexity">
        <properties>
            <property name="reportLevel" value="10" />
        </properties>
</rule>
  

Keep your method complexity lower than 10 and your method size lower than 100 lines (and even this is big). If you do not want your code to quickly become big ball of mud, fix those two violations after the commit that breaks the build. No need to do 'refactoring' days to lower you complexity. Build quality in - check Deming's point #3

PHP Code Sniffer

PHPCS is a tool to check if your code conforms to the coding guidelines your team agreed upon (you have such, right?). You can enable breakages to your build if these checks fail. I personally do not recommend build failures on code style violations for various reasons. The least of it is that those errors will not lead to immediate problems with your application, no need to stress the developers. Also they'll start calling you code style nazi.

Custom Tools based on PHP Parser

As every application is unique snowflake, it has specific needs. When you develop a custom tool you have two options to parse PHP code - use token_get_all() which is very limited, or use PHP parser - more advanced.

I've used the later to create PHP Reaper - a tool to detect SQL injection in code that is using ADOdb for database connection. Use it and you'll be notified super fast of ALL SQL injections in your code, on every commit, no need to run expensive (in terms of time and money) web application security scanners. More details will follow in a separate post about this tool.

Other Tips And Tricks

Bear in mind, that the SCA should be faster than your unit tests. For this reason you can configure your SCA checks to check only the files that were changed - e.g. lint only 1 file that is affected by the commit, instead of linting all 2,500 in your whole project. Also no need to check the source of the external libraries that your project is using.

PHP has a build tool called phing. It's xml based like Java build tools. Although you can build your own scripts, try to use phing, especially if you have developers using different OSes for development — e.g. Windows and OSX. The reason is that phing is portable and everyone will be able to share and use a single build system.

Lastly, as with all your CI jobs, all SCA checks should run when you type a sinle line. When you use phing, on the CI it can be:

phing ci

or phing on development box

phing local