No Bad Dogs
How to Make a Dog-Slow System Sit Up and Speak
By Al Riske
14.Nov.05
Bryan Cantrill, the slender, energetic engineer behind dynamic tracing, has been recognized as one of the world's top young innovators. Technology Review -- the prestigious "Magazine of Innovation" published by MIT -- recently placed Cantrill among the top 35 innovators under the age of 35. "You look at the other people [on the TR35] and they're literally rocket scientists and people finding cures for cancer. Things that I'm used to talking about merely metaphorically, they're actually doing," Cantrill says. "So it's quite humbling in that regard."
One line from the magazine's summary -- "They gravitate to the most interesting and difficult scientific and engineering problems at hand, and arrive at solutions no one had imagined" -- seems especially apt in Cantrill's case, because his brainchild, DTrace, solves a problem software engineers had struggled with for decades. "He refuses to believe hard problems can't be solved simply because no one else has been able to solve them yet," Sun CTO Greg Papadopoulos says of Cantrill. "DTrace is a perfect example. Bryan and his teammates came up with an elegant solution to a seemingly intractable problem. Not only that, they solved it within the constraints of a production environment."
DTrace, the dynamic tracing facility built into Solaris 10, provides an unprecedented view into interactions between the operating system and the applications running on it. Simply put, DTrace makes it possible to quickly identify bottlenecks, and fixing those bottlenecks can dramatically increase system performance. "When you have customers like FedEx or the Philadelphia Stock Exchange stand on the stage, as they did when we launched Solaris 10, and talk about the wins they got in production, it just about brings a tear to your eye," Cantrill says.
In the case of the stock exchange, he recalls, the team used DTrace to examine a sluggish application and soon had it running two-thirds faster -- on a server that was one-third the size. "I love Jonathan's rhetoric about being the good guys. I think there are a lot of people at Sun who want to work for the good guys, and that's very important to me personally," Cantrill says. "That's why I like that example, but there are many more like that. We've done it over and over again."
Cantrill came up with the idea for DTrace in 1996, while he was still a computer science student at Brown University in Providence, Rhode Island. His faculty adviser told him it couldn't be done. "I sketched out some specific ideas on how I thought it would be possible, and his reaction was, 'Well, you know, if this were possible, they would have already done it by now,'" Cantrill recalls. "At the time, I took what he said at face value. I thought, There's obviously some subtlety I'm missing -- something about the microprocessor or something -- that makes this impossible." Fortunately, he didn't listen to the professor's other advice -- "This is the same person who told me that there was nothing interesting in operating system development, that operating systems were done" -- and joined Sun later that year to work on Solaris.
"I wanted to do operating system development, and I interviewed everywhere," Cantrill says. "The amount of energy at Sun was probably three orders of magnitude greater than any other place. All the other computer companies ... their operating system development groups were like morgues, because operating system development was viewed as something of the past." Not so at Sun.
"I remember exactly where I was when this happened. It's one of those moments in your life that is crystal clear in your memory. We were on Willow Road on the bridge over the 101. I was in the backseat of a blue minivan, talking to Jeff Bonwick [now a Distinguished Engineer at Sun, then an engineer in the operating system group] and sketching out some of these ideas. My question was, 'Why is this impossible? I understand it must be impossible or you would have done it by now, but why is it impossible? What am I missing?' And Jeff said, 'Yeah, I think that would work.'
"It was clear to me at that moment that Sun was, particularly in operating system development, an environment where things were not thought to be impossible simply because they hadn't been done before," Cantrill continues. "That's incredibly important. It's easy for us to forget, simply because we have such an innovative culture, that the idea that things are impossible simply because smart people have thought about the problem and they didn't come up with an answer -- that's an idea that's pervasive elsewhere."
The reason he pursued OS development says a lot about Cantrill -- and why he couldn't let go of his notions for dynamic tracing. "I had written video games, I had written spreadsheets, and I had the absurd idea that I could implement anything. Then I did some kernel development. This was at a small company that has an operating system called QNX, in Canada. They had a uniprocessor kernel and I brought it up on a multiprocessor -- designed that architecture -- and oh, my god, did that thing kick my ass," he says. "I got it working, but there were bugs where I thought to myself, 'I'm never going to find this.' It was a feeling that was completely foreign to me, because I had the idea that for a broken program, if I turned the crank long enough, I'd find the bug. But when you start implementing at the operating system kernel level, the pathologies you discover are much nastier and you lose that feeling. In fact, the feeling you get is, 'I could work in perpetuity on this and not find this bug. And this bug could prevent me from shipping.' That's the other thing that is just terrifying about doing operating system development work. You never know when the bug that you only run into during some stress test actually constitutes a design flaw that is a complete deal breaker for your software."
And this is a reason to pursue OS development? "If you come up to a problem and you don't know that you can solve it, if somewhere in the back of your brain there's a voice saying, 'This one is going to have the last laugh' -- that's a hard problem. But the satisfaction you get from solving that kind of problem is incomparable," Cantrill says.
The amazing thing about DTrace is not that it solves a hard problem, but that it does so within the constraints of a production environment. "The core problem we solved was the problem of a production system that is misbehaving in a transient way -- a polite way of saying the thing is dog slow," Cantrill explains. "It's not crashing, but in many ways a fatal software problem is an easier problem. Why did the browser crash? Why did the operating system crash? But when you're dealing with a performance pathology, you have to ask the question, 'Why is the system slow?' Much harder question to answer."
If an application misbehaves in development, the coder can kill it, recompile, and restart. Not so in a production environment. "If your Oracle database is misbehaving, you can't restart Oracle. That's not an option. You certainly can't recompile Oracle," he says. "So, on that production machine, how do you see what the software is doing? Before DTrace there wasn't a lot you could do. There were just a lot of ad hoc tools basically."
At first DTrace was a wish as much as anything. Cantrill and Mike Shapiro, a friend from Brown whom he recruited to join Sun, kicked the idea around over and over. "By 1999 or 2000, we had such a clear idea of what we wanted to go build that we knew exactly the kind of problems it would solve. So we would say, 'Oh, man, I really needed DTrace today' or 'DTrace would have saved my ass today.' It was kind of a ridiculous thing for something that didn't exist. Even worse, we started telling other people who were having problems that they couldn't figure out, 'You know, DTrace would solve that problem.'" Not a line of code had been written yet. "I think someone finally said, 'If DTrace would solve that problem, why don't you go write DTrace?'" And, to make a long story short, he and Shapiro -- joined by Adam Leventhal in early 2002 -- did just that.