Building a Better Bug-Checker

The Parfait Project Takes a Layered Approach

Story by Al Riske. Photography by Howard Friedenberg.

May 12, 2009 - Cristina Cifuentes, a senior staff engineer in Sun Labs, says her latest project began with a simple observation: Sun has a lot of programmers writing a lot of code, but they don't use bug-checkers, even though there are any number of tools available.

"I talked to the Solaris security group, the Java security group, the QA guys and so on, just to see what their experiences were," she says, "and the feedback was pointing to the fact that a lot of the tools do not scale well."

The Solaris operating system/networking consolidation, for example, had more than 6 million lines of code at the time. It has more than 8 million now.

The team had tried various tools, but there was always some deal breaker. Some would not compile the code; others simply took too long.

"One of them ended up taking a whole week to run over the code," Cifuentes says, "so you basically couldn't integrate it into your build process. On a day-to-day basis you couldn't use it."

But speed and scale weren't the only concerns.

"There were other tools that ran faster -- several hours -- but when they looked at the reports, they found that there were too many that were not really bugs. Roughly 50 percent were false positives," she says. "For a large code base, that's a significant amount of engineering time to go through them all to decide which ones are real and which ones are not."

And, for quality-assurance inspectors, there was the danger that the tools could give them a false sense of security.

"If the tool says, 'In this module, you have five bugs,' and you clean them up, you might think, 'I'm done.' But, no, it turns out you still have other bugs in your code," Cifuentes says. "I talked to the QA folks about their experiences, and they had this sense of not really knowing how much the tool had missed."

The makers of commercial bug-checking tools provide no information about the accuracy of their results.

And, for all their shortcomings, such tools are extremely expensive.

"I talked to the QA folks about their experiences, and they had this sense of not really knowing how much the tool had missed: How many false negatives?"

Cristina Cifuentes
Senior Staff Engineer
Sun Labs

 

So that was the world into which Cifuentes ventured with the Parfait project. And guess what? Parfait can analyze 8 million lines of code in 20 minutes.

Not hours, not days, not a week -- 20 minutes.

How is that possible? It all stems from a single insight:

In a quick survey of available tools, Cifuentes found a range of techniques such as string matching, data-flow analysis, abstract interpretation, model checking, and theorem proving. The more complex the technique, the more bugs it would find and the longer it would take to run.

"But the interesting observation here is that, if you have a bug -- and there are some bugs that are easy bugs and some that are hard bugs -- the easy ones you could find with a cheaper computational analysis," she says.

And by cheaper she means you spend less time on the problem.

By getting the easy bugs out of the way quickly, you free up time to find the harder ones. "If you apply one of the more expensive analyses to try to find everything," Cifuentes says, "you are wasting computing time."

Based on that observation, Cifuentes set to work with a rotating team of researchers that included Dr. Bernhard Scholz, a visiting professor from the University of Sydney, post-docs Nathan Keynes and Lian Li, and a variety of interns and temporary contractors.

What they came up with was Parfait, a multi-layered tool named after the multi-layered dessert.

"We say, if you have a particular type of bug you're looking for, we're going to have several different types of analysis that are going to range from cheaper to more expensive in terms of run time," she says.

Take buffer overflows, for example. This kind of bug represents a potential security threat, because an attacker might be able to use the overflow to overwrite adjacent memory and inject malicious code. So any location where that could happen goes on a list of possible bug locations.

"We start from that big list, we send it to the first analysis, the cheap one, and that analysis may be able to find some bugs for us to fix, so we can then take those off the list," Cifuentes says. "It may also be able to find that in certain locations there are no bugs whatsoever."

In this way the list of all possible bug locations gets smaller and smaller.

"By the time you end up with your most expensive analysis, you are spending that time on fewer, harder bugs," she says.
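The shrinking list can be pictured with a small sketch. This is only an illustration of the layering idea, not Parfait's actual design: the `Candidate` record, the two checks (`constant_check`, `range_check`), and the three verdicts are all assumptions made up for the example. Each layer either confirms a bug, proves a location safe, or passes it on undecided to the next, more expensive layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one possible buffer-overflow location."""
    name: str
    buf_size: int
    index: Optional[int] = None                     # constant index, if known
    index_range: Optional[Tuple[int, int]] = None   # (lo, hi) bounds, if known


def constant_check(c: Candidate) -> str:
    """Cheapest layer (assumed): only decides writes with a constant index."""
    if c.index is None:
        return "unknown"
    return "safe" if 0 <= c.index < c.buf_size else "bug"


def range_check(c: Candidate) -> str:
    """Costlier layer (assumed): decides writes whose index bounds are known."""
    if c.index_range is None:
        return "unknown"
    lo, hi = c.index_range
    if lo >= 0 and hi < c.buf_size:
        return "safe"       # every possible index is in bounds
    if hi < 0 or lo >= c.buf_size:
        return "bug"        # every possible index is out of bounds
    return "unknown"        # some in, some out: leave for a deeper analysis


def run_layers(candidates: List[Candidate],
               layers: List[Callable[[Candidate], str]]):
    """Run layers from cheap to expensive, shrinking the list each time."""
    remaining = list(candidates)
    bugs = []
    for analysis in layers:
        undecided = []
        for c in remaining:
            verdict = analysis(c)
            if verdict == "bug":
                bugs.append(c)
            elif verdict == "unknown":
                undecided.append(c)
            # "safe" locations are simply dropped from the list
        remaining = undecided
    return bugs, remaining


CANDIDATES = [
    Candidate("a", 8, index=3),
    Candidate("b", 8, index=8),
    Candidate("c", 8, index_range=(0, 7)),
    Candidate("d", 8, index_range=(8, 12)),
    Candidate("e", 8),
    Candidate("f", 8, index_range=(4, 10)),
]

bugs, remaining = run_layers(CANDIDATES, [constant_check, range_check])
print([c.name for c in bugs])        # ['b', 'd']
print([c.name for c in remaining])   # ['e', 'f']
```

In this toy run the cheap layer settles two of the six locations, the second layer settles two more, and only the two undecided ones would be handed to a still more expensive analysis.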

Parfait also uses an approach called demand-driven analysis.

"What that means is you only do the analysis when you need to," Cifuentes says.

"We can't afford going over millions of lines of code multiple times, so we ask, 'What sort of things does this analysis need to do its job?' and we apply the analysis on a small subset of the code. That really saves a lot of time."

Using the buffer overflow example again, she points out that there are a limited number of places where a user can write to a buffer, so you only have to look for overflows in those locations, not across the entire code base.
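One way to picture "only do the analysis when you need to" is a backward walk from the question being asked. The sketch below is an assumption-laden toy, not Parfait's implementation: the `DEFS` map (each variable mapped to the variables its definition reads) and the `demand_slice` helper are hypothetical. Starting from the index used in a buffer write, it collects only the definitions that can influence that index and ignores the rest of the program.

```python
from typing import Dict, List, Set

# Hypothetical toy program: each variable maps to the variables its
# definition reads. Only "i" is used as a buffer index.
DEFS: Dict[str, List[str]] = {
    "i": ["n"],
    "n": ["user_len"],
    "user_len": [],
    "total": ["a", "b"],
    "a": [],
    "b": [],
    "label": [],
}


def demand_slice(defs: Dict[str, List[str]], query_var: str) -> Set[str]:
    """Return only the variables needed to answer a question about query_var."""
    needed: Set[str] = set()
    worklist = [query_var]
    while worklist:
        var = worklist.pop()
        if var in needed:
            continue            # already visited
        needed.add(var)
        worklist.extend(defs.get(var, []))
    return needed


relevant = demand_slice(DEFS, "i")
print(sorted(relevant))               # ['i', 'n', 'user_len']
print(len(relevant), "of", len(DEFS)) # 3 of 7
```

Here the question about the index `i` touches three of the seven definitions; a real code base would have a far larger gap between the slice and the whole program.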

The alpha version of Parfait, demonstrated last October for the Solaris group, focused on buffer overflow bugs, but more capabilities are being added.

"At the moment the types of bugs we're finding include other memory-pointer related bugs. Things like null pointer dereference, double free, use after free, memory leaks, format string type mismatches -- they can all be found with similar types of analysis," Cifuentes says. "Those are some that we're putting our emphasis on now."

Cifuentes and her colleagues in the Labs Down Under (an outpost of Sun Labs, based in Brisbane, Australia) also noted that there are no standard benchmarks for bug-checking tools.

So they put together what they call an accuracy suite -- composed of bug kernels taken from freely available open-source code -- that they feel is representative of real-world programs.

Since they already know where all the bugs are, they can then determine the accuracy of their tool -- how many false positives showed up, how many false negatives?
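Because the bug locations in such a suite are known in advance, scoring a tool comes down to set arithmetic. The sketch below is a minimal illustration with made-up data: the `(file, line)` encoding of a bug location, the `score` function, and the sample reports are assumptions, not the format or results of the actual accuracy suite.

```python
from typing import Dict, Iterable, Tuple

Location = Tuple[str, int]  # hypothetical encoding: (file name, line number)


def score(known_bugs: Iterable[Location],
          reported: Iterable[Location]) -> Dict[str, float]:
    """Compare a tool's reports with the known bugs of a benchmark."""
    known, found = set(known_bugs), set(reported)
    false_positives = found - known   # reported, but not a real bug
    false_negatives = known - found   # real bug that the tool missed
    return {
        "true_positives": len(known & found),
        "false_positives": len(false_positives),
        "false_negatives": len(false_negatives),
        # share of the tool's reports that are not real bugs
        "false_positive_share": len(false_positives) / len(found) if found else 0.0,
    }


# Made-up example data, for illustration only.
KNOWN = [("a.c", 10), ("a.c", 42), ("b.c", 7)]
REPORTED = [("a.c", 10), ("b.c", 7), ("b.c", 99), ("c.c", 1)]

result = score(KNOWN, REPORTED)
print(result["false_positives"], result["false_negatives"])  # 2 1
print(result["false_positive_share"])                        # 0.5
```

The false-negative count is the figure a QA team cannot get from a tool run on ordinary code, since there the full set of real bugs is unknown.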

That is helping them make Parfait even better.

Based on results so far, Cifuentes' manager, Mario Wolczko, believes Parfait is destined to be a huge hit.

"When you compare the system with the competition," he says, "the results are very impressive in every dimension."


Cristina Cifuentes

Title: Senior Staff Engineer, Sun Labs.

Focus: Design and implementation of program analyses for large-scale software programs, including bug-checking, parallelizing compilers, binary translation, and decompilation.

Quote: "Growing up in Bogota, Colombia, I never imagined I'd do something like this."

What Others Say: "Cristina is one of the most energetic, upbeat, and organized people I've ever met, and it looks like Parfait will be a huge hit. When you compare the system with the competition, the results are very impressive in every dimension." - Distinguished Engineer Mario Wolczko

Education: Doctorate in computer science from the Queensland University of Technology, in Brisbane, Australia.

Background: Assistant professor at the University of Tasmania, Hobart. Associate professor at the University of Queensland, Brisbane. Taught various computer science courses and did research on binary translation. Joined Sun as a visiting professor in July 1999 and ended up staying.

Patents: One granted, three pending.

Papers Published: 87.

Hobbies: Scrapbooking and biking.

Little-Known Fact: Was born in San Francisco, but grew up in Bogota, Colombia, which explains her accent.

Last Book Read: For One More Day by Mitch Albom.

Favorite Food: "Depends on where I am. In the San Francisco Bay Area, Mexican. You don't get much Mexican food in Australia. I also like Indian and Thai food. We do get some of that in Australia."

Favorite Movies: Australia and Titanic.

Pet Peeve: Smokers who cluster around entryways.

Proudest Moment: Received Outstanding Alumnus award from Queensland University of Technology in 2001. ("For my parents, this was really nice. They could see that, because we came to Australia, I had completely different opportunities. This was an accomplishment I wouldn't have been able to achieve if we had stayed in Colombia.")

First Job: Developing a database program as an intern for Australia Telecom.

Favorite Destinations: The water or the mountains.

What Brought Her to Sun: Was working on a binary translation program at the university, and met some people from Sun Labs, remotely.

What Keeps Her Here: "It is a great environment. In the Labs, you are given the freedom to decide what you are going to do, looking at what real product people need."