Posted by cos
on March 15, 2006 at 5:20 PM PST
Talking about bug prediction technology
In this short article I'll try to summarize what I have been
discussing over the last couple of months.
So, let's briefly list key factors that are likely to affect our
judgment of software quality.
- our code quality expectations (good enough quality, remember?)
- coverage isn't everything
- code complexity and the frequency of changes
- number of bugs filed against source code modules/files
- testing methodologies
Although the last one doesn't sound like a beast, it might reduce the
defect discovery rate a lot. Obviously, it comes down to the choice of
approach to test failure analysis. A bulky one with a weak
false-positive detection algorithm pisses off engineers, and they
begin ignoring possibly important warnings and
Anyway, I want to talk about a combination of the first three bullets.
About a year ago, a few of Sun's fellas were chatting about simpler
ways of delivering better code. Static analysis and a variety of
testing approaches were among the things on the table. At some point,
the bright idea of mixing both of those and adding some other flavors
appeared. Afterwards, we came up with what was called Buggy Spots.
The idea itself is as follows:
- we create a static call graph (CFG) for any given source code,
  using commercial or home-grown tools
- having this, we can calculate a few things about the graph, e.g.
  the frequency of calls to any particular method; the frequency of
  calls from a method to other methods within the code; basic-block
  based complexity of the code, etc. (currently, we calculate about
  five to seven of them, e.g. coverage per function, basic blocks per
  function, et cetera)
- when executing tests against the instrumented build, we can
  prepare a code coverage metric for it
- combine these two lists of modules - from CFG and from code
  coverage runs - by module names
- sort the resulting list ascending by in-call frequency and
  descending by coverage scores
- let's assume that the most frequently called functions are,
  perhaps, the most important from the quality standpoint. Well,
  their code is called more often, so any problems will immediately
  affect top-level or at least quite important functionality.
- if such methods have low coverage numbers and high complexity or a
  high number of reported bugs, then it might be a good indication
  that the code has to be targeted by quality engineers and/or
  developers.
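
The steps above can be sketched in a few lines of Python. This is only
an illustration, not the actual Buggy Spots engine: the call edges, the
coverage, basic-block and bug numbers, the function names and the
thresholds are all made up for the example.

```python
from collections import Counter

# Hypothetical static call graph as (caller, callee) edges.
CALL_EDGES = [
    ("main", "parse"), ("main", "render"), ("render", "parse"),
    ("render", "format"), ("export", "parse"), ("export", "format"),
    ("main", "log"),
]

# Hypothetical per-method data: coverage from an instrumented test run,
# basic-block counts from static analysis, bug counts from the tracker.
COVERAGE = {"parse": 0.35, "format": 0.90, "render": 0.60, "log": 0.10}
BASIC_BLOCKS = {"parse": 42, "format": 8, "render": 15, "log": 3}
BUGS_FILED = {"parse": 4, "format": 0, "render": 1, "log": 0}


def in_call_frequency(edges):
    """Count how many call sites in the graph target each method."""
    return Counter(callee for _, callee in edges)


def merge_by_name(in_calls, coverage, blocks, bugs):
    """Join the static and the runtime lists on the method name."""
    names = sorted(set(in_calls) & set(coverage))
    return [
        {"name": n, "in_calls": in_calls[n], "coverage": coverage[n],
         "blocks": blocks.get(n, 0), "bugs": bugs.get(n, 0)}
        for n in names
    ]


def rank(rows):
    """Sort ascending by in-call frequency, then descending by coverage.

    With this ordering the suspicious methods sink to the end of the list.
    """
    return sorted(rows, key=lambda r: (r["in_calls"], -r["coverage"]))


def buggy_spots(rows, min_calls=2, max_coverage=0.5, min_blocks=20, min_bugs=2):
    """Flag frequently called, poorly covered methods that are also
    complex or already have several bugs filed (thresholds are assumed)."""
    return [
        r["name"] for r in rows
        if r["in_calls"] >= min_calls
        and r["coverage"] < max_coverage
        and (r["blocks"] >= min_blocks or r["bugs"] >= min_bugs)
    ]


ranked = rank(merge_by_name(in_call_frequency(CALL_EDGES),
                            COVERAGE, BASIC_BLOCKS, BUGS_FILED))
for row in ranked:
    print(row)
print("candidates:", buggy_spots(ranked))
```

In a real setup the thresholds would have to be tuned per project; the
fixed cut-offs here only keep the example short.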
So, all of that gives you a way of quickly selecting possibly buggy
spots, i.e. the pieces of source code which are likely to become the
root of defects found by your customers. Why? Well, simply because
coverage is low in these areas and the existing tests don't guarantee
an acceptable level of quality.
Like any heuristic approach, this one might produce incorrect
results. However, our preliminary predictions are quite consistent
with the fact that most externally reported defects were found in
poorly covered but frequently called methods.
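
One simple way to check a prediction like this is to count what share
of externally reported defects landed in the flagged methods. The
sketch below is only an illustration: the function name, the method
names and the defect records are invented, not our data.

```python
def hit_rate(flagged, defect_locations):
    """Share of reported defects located in one of the flagged methods."""
    if not defect_locations:
        return 0.0
    flagged = set(flagged)
    hits = sum(1 for method in defect_locations if method in flagged)
    return hits / len(defect_locations)


# Hypothetical input: methods flagged as buggy spots, and the method
# each externally reported defect was eventually traced to.
flagged_methods = ["parse", "connect"]
external_defects = ["parse", "parse", "connect", "format"]

rate = hit_rate(flagged_methods, external_defects)
print(f"{rate:.0%} of external defects fell in flagged methods")
```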
In an organization with limited QE resources, a manager might want to
address such hot spots first. This will help achieve a good-enough
quality level before concentrating on less important issues.
Yet another benefit is that the technique is language independent.
Once you build a universal representation for CFG and code coverage
information, you can use the same engine to measure Java, C++, and
programs written in other languages.
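
To give a feeling for what such a universal representation could look
like, here is a small sketch of a language-neutral record serialized
as JSON. The record layout, the field names and the sample entries are
assumptions made for the example, not the format we actually use.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MethodRecord:
    """Hypothetical language-neutral record for one method or function."""
    name: str          # qualified name, in whatever form the language uses
    language: str      # source language the record was extracted from
    in_calls: int      # number of call sites targeting this method
    basic_blocks: int  # basic-block count from static analysis
    coverage: float    # covered fraction from a test run, 0.0 to 1.0


def to_json(records):
    """Serialize records so that one engine can read any language's data."""
    return json.dumps([asdict(r) for r in records], indent=2)


def from_json(text):
    """Load records back, whichever front end produced them."""
    return [MethodRecord(**item) for item in json.loads(text)]


# Made-up entries from two different (hypothetical) front ends.
records = [
    MethodRecord("com.example.Parser.parse", "Java", 3, 42, 0.35),
    MethodRecord("example::Parser::parse", "C++", 5, 37, 0.50),
]
print(to_json(records))
```

The only point of the sketch is that the analysis engine sees the same
fields regardless of which tool extracted them.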
And, of course, our methodology doesn't replace human knowledge of
the importance of product features. It helps engineers see a valuable
projection of static-to-runtime boundaries and helps them focus on
some aspects of that complex matter.
And I just want to remind you about the Project Mustang (Java6)
Regressions Challenge. Please check