Saturday, November 07, 2009

Given That Deterministic Replay is Still Slow, What Should We Do for Parallel Debugging?

Debugging has always been tough, and it has become tougher these days. One reason is that people write more parallel programs and thereby introduce concurrency bugs. Concurrency bugs are especially hard to debug because many of them do not show up consistently. Many things can make a bug hard to fix, but not being reproducible is probably the worst.

What is the ideal way to debug a parallel program? In my mind, a parallel debugging framework should provide the following:
  • In the production run, record the execution with zero or very low overhead.
  • In the debugging phase, let programmers deterministically replay what happened when the bug manifested. It is even better if the replay can go both forward and backward.
  • Programmers can attach their favorite debuggers to the replay session.
This sounds tempting even for sequential debugging, doesn't it? However, the real world is not ideal. The main problem is that recording introduces huge overhead in the production run. A lot of research has been done to make it faster, but there is still no truly practical solution, especially for applications that run on multiprocessors. In this year's SOSP, there are two papers on this topic: Do You Have to Reproduce the Bug at the First Replay Attempt? -- PRES: Probabilistic Replay with Execution Sketching on Multiprocessors and ODR: Output-Deterministic Replay for Multicore Debugging. Both send out three interesting messages:
  • They are both software-only solutions.
  • They both make a compromise: they push complexity from recording to replaying.
  • They both work on multiprocessors.
This trend shows that people are becoming more realistic about this hard problem. First, software-only solutions are preferred because they are easier to deploy. Don't get me wrong: I'm not saying hardware-based or hardware-software hybrid solutions are unrealistic, but they do take longer to become real. Second, people compromise on something less important to make the more important things feasible. Third, multiprocessors are so common today that a realistic solution should work with them.

The idea of PRES is to sacrifice the efficiency of replay. Instead of reproducing the buggy execution on the first try, it may need more than one attempt to reproduce it. In return, it only has to record much less information in the production run. One could think of it as PRES drawing a "sketch" in the production run instead of a "finished painting". Although it may take several tries to recover the "finished painting", the "sketch" is fast to draw and enough to capture a rough idea of what happened. If painters use this idea, why can't we? Very smart! During replay, PRES uses both the sketch from the production run and feedback from previously failed attempts. When PRES encounters a data race that was not recorded in the production run, it makes a random guess. As soon as PRES finds that the execution no longer matches the sketch, or does not reproduce the symptoms of the buggy execution, it rolls back a bit, flips some data races, and gives it another try. This actually reminds me of another paper, Rx, which is one of the systems research papers I like the most.
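To make the idea concrete, here is a toy Python sketch of feedback-guided replay search. The three-race model, the names, and the random-guess strategy are my own illustration, not PRES's actual implementation: the "sketch" records only one race outcome, and replay guesses the unrecorded races, using failed attempts as feedback.

```python
import random

random.seed(0)  # seeded only so this demo is reproducible

RACES = 3
BUGGY_SCHEDULE = (1, 0, 1)   # the (unknown) race outcomes of the production run
SKETCH = {0: 1}              # the production run recorded only race 0's outcome

def run(schedule):
    """Pretend to execute the program under a fixed race schedule.
    Returns True iff the bug's symptom (e.g. a crash) reproduces."""
    return schedule == BUGGY_SCHEDULE

def replay_with_search(max_attempts=100):
    """Replay repeatedly: respect the sketch, guess the unrecorded races,
    and use failed attempts as feedback to avoid repeating them."""
    tried = set()
    for attempt in range(1, max_attempts + 1):
        guess = tuple(SKETCH.get(i, random.randint(0, 1)) for i in range(RACES))
        if guess in tried:       # feedback from a previously failed attempt
            continue
        tried.add(guess)
        if run(guess):
            return attempt, guess
    return None

print(replay_with_search())
```

The sketch constrains race 0, so the search space shrinks from 8 schedules to 4; this is the trade-off in miniature: record less, search more at replay time.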

The idea of ODR is to sacrifice the accuracy of replay. Instead of reproducing the buggy execution based on internal values, it tries to reproduce an execution that has the same output as the buggy one. Output includes segmentation faults, core dumps, printed strings, and so on. The reproduced execution may differ from the buggy one, but the rationale is that if it has the same output, it is very likely that the bug that appeared in the buggy execution also manifests in the reproduced one. If output-determinism is enough, why would we want value-determinism?
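The contrast with value-determinism can be sketched in a few lines of Python. This toy example is my own illustration, not the actual ODR system: the only thing kept from the production run is the output string, and replay accepts any schedule that reproduces it.

```python
import itertools

def execute(schedule):
    """Toy program: the observable output depends only on which
    thread id writes to x last."""
    x = 0
    for writer in schedule:      # each step: a thread id writing to x
        x = writer
    return f"final x = {x}"      # the observable output (stdout, crash, ...)

RECORDED_OUTPUT = "final x = 2"  # all the production run kept

def output_deterministic_replay():
    """Search schedules for one whose output matches the recorded output.
    It need not be the original schedule -- same output is enough."""
    for schedule in itertools.product([1, 2], repeat=3):
        if execute(schedule) == RECORDED_OUTPUT:
            return schedule
    return None

# Returns the first matching schedule, which may differ from the
# schedule the production run actually took.
print(output_deterministic_replay())
```

Several schedules produce the same output here, and any of them is an acceptable replay under output-determinism; a value-deterministic replayer would have had to record enough to pick out the one original schedule.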

There are many other good papers on deterministic replay for debugging parallel programs. If you are interested, you can find more in the references of these two papers.

Besides academic research, VMware's replay debugger is a production-class parallel debugging tool. Although it only works for single-processor execution, it has almost everything an ideal parallel program debugger should have. I was really impressed. E Lewis gave a talk at Google about it:


6 comments:

Helen Li said...

The talk from Google is impressive, though the video was old; they have been working on that feature for a long time.

Xiao Ma said...

To Helen: Yeah, you're right. It was about a year ago that they put it into Workstation 6.5. For the latest information about the VMware Replay Debugger, you can take a look at the presenter's blog: http://www.replaydebugging.com/.
