Sunday, November 29, 2009

You Must Do Code Review And Do It Right

Our company started doing code review about one and a half years ago. It is one of the smartest decisions we have made: it significantly improves code quality and helps programmers learn from each other. Do code review, unless there is only one programmer in your company or you don't care about code quality. I just read a white paper about code review from Smart Bear, and it is a very interesting read. I also have a few tips of my own to share.

First, stop sending patches through email and pick the right tool for code review. As a start-up company, we always prefer free tools, and we found Review Board really great. It was developed by a couple of engineers at VMware and was originally used within VMware. It makes code review very convenient and efficient; we would not have started code review that early without Review Board. With Review Board, all a submitter needs to do is upload a diff. Review Board automatically pulls files from SCM systems, generates a side-by-side diff view, and highlights code differences and syntax. People can start conversations on any line in the diff. Review Board works asynchronously - you don't need to finish a complete review in one sitting; you can do it piece by piece at your own pace. Review Board also supports follow-up diffs, email notifications, and many other cool features, and its interface has improved significantly since the 1.0 release. If you are open to commercial tools, check out Code Collaborator from Smart Bear and Crucible from Atlassian. I'm not familiar with them.

Second, pick the right reviewers. The simple rule is to pick people who know the background of your changes and have time to do the review in the near future. Someone suggested checking the reviewer's schedule first, but I think that is too much for a small company. It would help if the review tool showed this kind of information - for example, how many pending reviews a reviewer has right now and how large they are.

Third, submit reviews of the right size. According to the user study at Cisco described in the white paper, 200-400 lines of changes is the right size for one review. The larger a review is, the fewer defects are found and the more boring the review process becomes. I absolutely agree with this. The largest code review I have done was about 80 files with thousands of lines of changed code. I needed 3 shots of espresso to finish it, and I don't think I did a good job. When a review is about 200-400 lines of changes, I feel I can finish it in 30-60 minutes, so I can stay focused and read every detail carefully.

Fourth, respect your reviewers. Keep in mind that a code review is a big favor your reviewer does for you, so try to make it as easy as possible. Prepare the code review well; don't just throw a diff over the wall and let your reviewer struggle to understand it. I often review frontend code from one of our engineers. He always attaches screenshots and draws figures to explain the layout and relationships of the UI components, which makes the review much easier. I think there are a few things a submitter should always do:
  • Explain the rationale behind the change and how it works
  • Explain why each file is changed, which is usually not mentioned in the code comments
  • Guide the reviewer through the review, for example by pointing out which files should be read first
  • Attach additional documents if they help
Fifth, respect your submitters. When someone sends you a review, it shows that she trusts you. Do the review in a timely manner; if you can't do it in time, talk to the submitter and pass the review to someone else. More importantly, do the review responsibly. Assuming the submitter has already fully tested the changes, code review is the last step at which problems can be found - after the review, the code will be committed to the repository. So try your best to find problems.

Sixth, a submitter should always address all comments and upload follow-up diffs. This is a ground rule in our company and it works really well. It makes sure that the submitter responds to every problem and fixes it in the code. The reviewer won't give a "Ship It!" until the follow-up diffs are uploaded.

Seventh, code review is a good opportunity to check unit test code. Submitters should submit unit test code together with the functional code, and reviewers should be responsible for checking whether the unit test code is good enough. Keep in mind that unit test code is as important as the functional code, so it needs to be reviewed as well. Including unit test code in code review has a few advantages:

  • Code review then covers all new code - perfect coverage.
  • The reviewer becomes very familiar with the changed code while reading it, so it's the best time to review the unit test code as well.
  • It is often the only chance to enforce writing unit tests. Coverage tools can also do this, but they are inflexible and cannot find missing unit tests at a finer granularity.
  • Unit test code helps the reviewer understand the changes.
Last but not least, build a positive code review culture. The white paper mentions this as well, and I think it is very important. We have never had culture problems in our company, but we may have them in the future as we grow. There are a few things I think people should do:
  • Everybody should realize that code review is a win-win deal: both reviewers and submitters learn something from it.
  • As a reviewer, don't be harsh. Ask questions before making statements. Code review is not a stage for showing off.
  • As a reviewer, don't force everybody else to write code exactly the way you do, and don't fixate on small things when there is disagreement.
  • As a reviewer, give compliments when you see cool snippets.
  • As a submitter, don't be defensive. Remember that the review is about the code, not about the people who wrote it.
  • As a submitter, respond to every comment, nicely. Explain your reasons if you think a comment is not correct.
I'd be happy to hear about other good code review experiences.

By the way, Smart Bear also published a book, "Best Kept Secrets of Peer Code Review". I just requested a free copy from here.


Update on Dec. 1, 2009: modified the description of how Review Board was developed, as pointed out by one of the authors of RB, Christian Hammond. Thanks, Christian!

Thursday, November 26, 2009

Peking University's New Tactic Will Not Work In Reality


(Picture by Xiaomo Tao, from www.jyb.cn)

In China, since 1977 the college entrance exam has been the only way for universities to select students. Although it is just a 3-day exam held once a year, it significantly affects tens of millions of people's lives. I'm not exaggerating at all: 10.2 million students took the college entrance exam in 2009, and when somebody prepares for it, that is the only thing her whole family cares about for about a year. I believe most of these students are only children, which brings the number of people affected to about 30 million. This single-exam policy has been criticized for many years, but nobody has come up with a better solution.

One of the big proposed changes is to give universities the power to select students, instead of depending on the nation's Ministry of Education to do everything for everyone. This is good, because universities can then use whatever they think is the best way to pick high school graduates. A few days ago, Peking University, one of the best universities in China, proposed a brave new tactic. It chose 39 high school principals and let each of them recommend 1-5 students. If these students pass an additional interview, they will get 30 extra points on the college entrance exam next year. This is a new attempt to change the current situation, but the question is: does it really work? Let's start with the things I don't like about this plan.

First, one of the key points of this new policy is that the names of the 39 high school principals are public. I don't know how you feel about this, but I find it kind of funny, and I'm not sure what it implies. Does it mean that if referees' names are not public, they cannot be trusted?

Second, it is not fair to pick certain high schools to recommend students. Does Peking University want to pick students, or another affiliated high school? On the other hand, this is probably a perfect advertisement for those 39 high schools. It delivers this message: these 39 high schools are the best in China, so go there and you will have a chance to be recommended to Peking University.

Third, I think teachers know students better than principals do, so teachers should be the ones recommending students. Back when I was in high school, I talked to my principal only once or twice. Besides teachers and principals, people outside of schools should also be able to recommend students.

Although I don't agree with all of the above, I do understand why Peking University did it. It all boils down to one problem: people do not trust each other. If everybody were trustworthy, Peking University could simply take any recommendation into account. Unfortunately, it can only trust 39 people, and only after letting everybody know their names. What it wants to do is make accountability very clear: if you have doubts about any of the recommended students, go ask those principals. This also strengthens supervision of the referees, so they will not do bad things such as taking bribes or recommending their friends and relatives.

So, it looks like a good plan, doesn't it? Unfortunately, I don't think it will work in reality. The execution of this new policy will turn into a formality. Most of those high school principals will probably just recommend the students who have the best exam scores - the traditional, but controversial, way to define the best students. That is obviously the safest thing to do. Everybody is watching, and they don't want to be questioned for doing something "new", or I would say "better". For example, this high school in Chongqing recommended three students who had the best scores in the last four exams. These students could probably get into Peking University even without this so-called recommendation. There is nothing wrong with doing this, but if all of the other 38 high schools do the same, the new policy does not make any sense, because its original goal is to give opportunities to students who are not usually considered the best - those who do not have the best exam scores. I'm very curious what the other high schools will do next.

Just like many other problems related to education, how to admit students to college is a very hard problem. Having admission hinge on a single exam is definitely a bad idea, but it is probably the best way for now, given the realities in China. Things take time to change. If Peking University's new policy can move things even a small step further along the way, it is worth the effort.

Thursday, November 12, 2009

Write Parallel Gomoku AI Programs With The Branch & Bound Pattern

In today's lecture, we learned the Branch & Bound Pattern. The main idea of this pattern is to explore a huge search space with multiple threads/processes while applying bounding or pruning strategies to control the size of the search space. This is a very interesting pattern. It reminds me of some fun times in high school, when a very popular game among computer geeks was writing programs to play Gomoku (also called Five in a Row). We let the programs compete with each other to see who had written the smartest one. The basic rule of Gomoku is straightforward: two players take turns placing black and white pieces on a 19x19 Go board, and whoever first gets five pieces of her color in a row wins the game. The Branch & Bound Pattern is a perfect fit for parallelizing this kind of game.

The basic idea for making a move in a Gomoku game is to start from the current state of the board, explore as many steps into the future as possible, and pick the move with the highest probability of eventually winning. For example, after the first white piece is placed, the search graph looks something like this:



It's not hard to see that the search space can be huge. Let's think about the four operations in the Branch & Bound Pattern. We define a function G(state) for each state, where G(state) indicates the program's chance of eventually winning from that state.

As for branching, it is straightforward. When the program makes a move, it needs to calculate G(state) for each of the candidate next states and pick the best choice. G(state) is actually a recursive function of G(state'), where state' ranges over the successor states of state, so we need to calculate all the G(state') values first.
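Just to make the branching step concrete, here is a minimal Python sketch of generating successor states. The helper names (candidate_moves, place) and the board representation are my own illustration, not taken from any particular Gomoku program:

```python
# A minimal sketch of branching: enumerate candidate moves and build the
# successor board states. All names and the board layout are illustrative.
SIZE = 19

def candidate_moves(board):
    """Yield every empty cell (row, col) as a possible next move.

    `board` is a SIZE x SIZE list of lists holding None, 'black', or 'white'.
    A real program would usually restrict candidates to cells near existing
    pieces to keep the branching factor manageable.
    """
    for r in range(SIZE):
        for c in range(SIZE):
            if board[r][c] is None:
                yield (r, c)

def place(board, move, color):
    """Return a new board with `color` placed at `move` (no mutation)."""
    r, c = move
    new_board = [row[:] for row in board]
    new_board[r][c] = color
    return new_board
```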

As for evaluating, G(state) is the function used to evaluate states during the search.

As for bounding, the Gomoku game is not as simple as the SAT or integer linear programming examples mentioned in the pattern: it is very hard to efficiently calculate a bounding value for G(state). What we can do here is come up with an estimate Ge(state) based on the current state and/or the Ge(state) of its sub-states. There are many ideas for calculating Ge(state). For example, we can count how many three-in-a-row cases each side has; the more three-in-a-row cases one side has, the more likely that side is to win. Usually we use several such strategies and give each of them a weight, and the final Ge(state) is calculated as a weighted combination of all of them.
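As a toy version of such an estimate, the sketch below only counts runs of each length for each side and combines them with hand-picked weights. Both the feature (plain run counting) and the weights are made up for illustration; a real Ge(state) would be much richer (open vs. blocked runs, double threats, and so on). It builds on the hypothetical board helpers from the previous sketch:

```python
# A toy Ge(state): score the board by counting horizontal/vertical/diagonal
# runs for each side and weighting longer runs more heavily. Weights and
# features are illustrative only.
WEIGHTS = {2: 1, 3: 10, 4: 100, 5: 100000}
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def count_runs(board, color, length):
    """Count maximal runs of exactly `length` consecutive pieces of `color`."""
    size = len(board)
    total = 0
    for r in range(size):
        for c in range(size):
            for dr, dc in DIRECTIONS:
                run = 0
                rr, cc = r, c
                while 0 <= rr < size and 0 <= cc < size and board[rr][cc] == color:
                    run += 1
                    rr += dr
                    cc += dc
                # Only count from the start of a run: the previous cell in this
                # direction must not hold the same color.
                pr, pc = r - dr, c - dc
                starts_run = not (0 <= pr < size and 0 <= pc < size and board[pr][pc] == color)
                if starts_run and run == length:
                    total += 1
    return total

def estimate(board, me, opponent):
    """Ge(state): positive means the position looks good for `me`."""
    score = 0
    for length, weight in WEIGHTS.items():
        score += weight * count_runs(board, me, length)
        score -= weight * count_runs(board, opponent, length)
    return score
```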

As for pruning, if a state has already been evaluated in another branch, or a state's Ge(state) is worse than the current best G(state), we can prune the branch rooted at that state. Of course, there can be many tricks for doing smarter pruning than this.
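Putting branching, evaluation, and pruning together, a depth-limited search might look roughly like the sketch below. It reuses the hypothetical candidate_moves, place, and estimate helpers from above, skips states already seen in other branches, and uses the Ge estimate as a (non-rigorous) bound against the best value found so far:

```python
# A sketch of depth-limited search with visited-state and estimate-based
# pruning. It builds on the hypothetical helpers above and is not a complete
# (or strong) Gomoku engine.

def board_key(board):
    """Hashable key for the visited-state set."""
    return tuple(tuple(row) for row in board)

def search(board, me, opponent, depth, visited):
    """Approximate G(state) with `me` to move on `board` (higher is better)."""
    if depth == 0:
        return estimate(board, me, opponent)

    best = float("-inf")
    for move in candidate_moves(board):
        child = place(board, move, me)
        key = (board_key(child), opponent)       # opponent moves next in `child`
        if key in visited:
            continue                             # already evaluated in another branch
        visited.add(key)

        # Estimate-based pruning: if the rough Ge of this child already looks
        # worse than the best value found so far, skip expanding the branch.
        if best != float("-inf") and estimate(child, me, opponent) < best:
            continue

        # The opponent moves next, so the child's value for us is the negative
        # of its value for the opponent (negamax convention).
        value = -search(child, opponent, me, depth - 1, visited)
        best = max(best, value)

    if best == float("-inf"):                    # everything pruned: fall back to Ge
        return estimate(board, me, opponent)
    return best

def best_move(board, me, opponent, depth=3):
    """Pick the move whose resulting state has the highest search value."""
    visited = set()
    return max(candidate_moves(board),
               key=lambda m: -search(place(board, m, me), opponent, me,
                                     depth - 1, visited))
```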

To parallelize the search, structure-based parallelization is the best fit for this problem. Because all processors should process states from a single pool, we can use either a Synchronized Single Pool (SSP) or an Asynchronized Single Pool (ASP) with a shared current-best state and a shared set of visited states. I think the best-state-first strategy may work better than the depth-first strategy. We need to carefully limit the maximum search depth regardless of which ordering strategy we use.
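To show what the single-pool structure could look like, here is a rough, thread-based sketch of the asynchronized-single-pool idea: workers repeatedly pull the most promising state from one shared priority queue, expand it, and update a shared best result and visited set under a lock. It only illustrates the structure (and reuses the hypothetical estimate, candidate_moves, place, and board_key helpers above); a serious implementation would use processes or a language without a GIL for real speedup and would need a more careful termination condition than the simplified one here.

```python
# A structural sketch of an asynchronized-single-pool (ASP) best-first search:
# one shared priority queue of states plus a shared best result and visited
# set, protected by a lock. Illustrative only; termination is simplified.
import heapq
import threading

class SharedPoolSearch:
    def __init__(self, root_board, me, opponent, max_depth=4, num_workers=4):
        self.lock = threading.Lock()
        self.heap = []                       # entries: (-Ge, tie_breaker, depth, board, to_move)
        self.tie_breaker = 0
        self.visited = set()
        self.best_value = float("-inf")
        self.best_board = None
        self.me, self.opponent = me, opponent
        self.max_depth = max_depth
        self.num_workers = num_workers
        self._push(root_board, 0, me)

    def _push(self, board, depth, to_move):
        score = estimate(board, self.me, self.opponent)
        heapq.heappush(self.heap, (-score, self.tie_breaker, depth, board, to_move))
        self.tie_breaker += 1                # keeps unorderable boards from being compared

    def _worker(self):
        while True:
            with self.lock:
                if not self.heap:
                    return                   # pool looks empty (simplified termination)
                neg_score, _, depth, board, to_move = heapq.heappop(self.heap)
                score = -neg_score
                if depth >= self.max_depth:  # leaf: record it if it beats the shared best
                    if score > self.best_value:
                        self.best_value, self.best_board = score, board
                    continue
                if score <= self.best_value:
                    continue                 # bound pruning against the shared best
            next_player = self.opponent if to_move == self.me else self.me
            for move in candidate_moves(board):
                child = place(board, move, to_move)
                key = (board_key(child), next_player)
                with self.lock:
                    if key not in self.visited:
                        self.visited.add(key)
                        self._push(child, depth + 1, next_player)

    def run(self):
        threads = [threading.Thread(target=self._worker) for _ in range(self.num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.best_board, self.best_value
```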

We are already close to the end of the class, otherwise I think it would be a good idea to have a Parallelized Gomoku Program Contest in class ;-)

Saturday, November 07, 2009

Given That Deterministic Replay is Still Slow, What Should We Do for Parallel Debugging?

Debugging has always been tough, and it is getting tougher these days. One reason is that people are writing more parallel programs and introducing concurrency bugs. Concurrency bugs are especially hard to debug because many of them do not always show up. Many things can make a bug hard to fix, and not being reproducible is probably the worst of them.

What is the ideal way to debug a parallel program? In my mind, a parallel debugging framework should provide the following:
  • In the production run, record the execution with zero or very low overhead.
  • In the debugging phase, let people deterministically replay what happened when the bug manifested. It is even better if the replay can go both forward and backward.
  • Let programmers attach their favorite debuggers to the replay session.
This sounds tempting even for sequential debugging, doesn't it? However, the real world is not ideal. The main problem is that recording introduces huge overhead in the production run. A lot of research has been done to make it faster, but there are still no very practical solutions, especially for applications that run on multiprocessors. In this year's SOSP, there are two papers on this topic: Do You Have to Reproduce the Bug at the First Replay Attempt? -- PRES: Probabilistic Replay with Execution Sketching on Multiprocessors and ODR: Output-Deterministic Replay for Multicore Debugging. They both send out three interesting messages:
  • They are both software-only solutions.
  • They both make a compromise: they push complexity from recording to replaying.
  • They both work on multiprocessors.
This trend shows that people are becoming more realistic about this hard problem. First, software-only solutions are preferred because they are easier to implement. Don't get me wrong: I'm not saying hardware-based or hardware-software-hybrid solutions are unrealistic, but they do take longer to become real. Second, people compromise on something less important to make the more important things feasible. Third, multiprocessors are so common today that a realistic solution should work with them.

The idea of PRES is to sacrifice the efficiency of replay. Instead of reproducing the buggy execution on the first try, it may require more than one attempt to reproduce it. In return, it needs to record much less information in the production run. One could think of it as PRES drawing a "sketch" in the production run instead of a "finished painting". Although it may take several tries to recover the "finished painting", a "sketch" is fast to draw and is enough to capture a rough idea of what happened. If painters use this idea, why can't we? Very smart! During replay, PRES uses both the sketch from the production run and feedback from previously failed attempts. The idea is that when PRES encounters a data race that was not recorded in the production run, it makes a random guess. As soon as PRES finds that the execution no longer matches the sketch, or does not reproduce the symptoms of the buggy execution, it rolls back a bit, flips some data races, and tries again. This actually reminds me of another paper, Rx, which is one of the system research papers I like the most.
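To make the retry loop concrete, here is a toy, self-contained illustration of the idea as I understand it. It is emphatically not the real PRES: the fake "program" whose behavior is decided by a handful of race outcomes, the prefix-only "sketch", and all the helper names are made up just to show the shape of the search.

```python
# A toy illustration of the replay-with-retries idea (NOT the real PRES).
# The "program" is faked: its behavior is fully determined by the outcomes of
# a few data races, the production-run "sketch" records only a cheap prefix of
# them, and the symptom shows up only under the exact buggy interleaving.
import random

NUM_RACES = 6
BUGGY_OUTCOME = [1, 0, 1, 1, 0, 0]       # race outcomes in the buggy production run

def run_with_guesses(guesses):
    """Stand-in for re-executing the program while forcing race outcomes."""
    return list(guesses)                  # the "trace" is just the outcomes here

def matches_sketch(trace, sketch):
    """The sketch recorded only the first few race outcomes (a coarse log)."""
    return trace[:len(sketch)] == sketch

def reproduces_symptom(trace):
    """The bug manifests only under the exact buggy interleaving."""
    return trace == BUGGY_OUTCOME

def replay_with_retries(sketch, max_attempts=10000):
    guesses = [random.choice([0, 1]) for _ in range(NUM_RACES)]
    for _ in range(max_attempts):
        trace = run_with_guesses(guesses)
        if not matches_sketch(trace, sketch):
            # Diverged from the sketch: fix the first recorded race we guessed wrong.
            for i, want in enumerate(sketch):
                if guesses[i] != want:
                    guesses[i] = 1 - guesses[i]
                    break
        elif reproduces_symptom(trace):
            return guesses                # reproduced the buggy execution
        else:
            # Sketch matches but no symptom: flip a race the sketch says nothing
            # about and try another interleaving.
            i = random.randrange(len(sketch), NUM_RACES)
            guesses[i] = 1 - guesses[i]
    return None

if __name__ == "__main__":
    sketch = BUGGY_OUTCOME[:3]            # what was cheap enough to record in production
    print(replay_with_retries(sketch))    # almost always converges to BUGGY_OUTCOME
```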

The idea of ODR is to sacrifice the accuracy of replay. Instead of reproducing the buggy execution based on internal values, it tries to reproduce an execution that has the same output as the buggy one. Output includes segmentation faults, core dumps, printed strings, and so on. The reproduced execution may be different from the buggy one, but the rationale behind this idea is that if it has the same output, it is very likely that the bug that appeared in the buggy execution also manifests in the reproduced execution. If output-determinism is enough, why would we want value-determinism?

There are many other good papers about deterministic replay for debugging parallel programs. If you are interested, you can find more in the references of these two papers.

Besides academic research, VMware's replay debugger is a production-class parallel debugging tool. Although it only works for single-processor execution, it has almost everything an ideal parallel program debugger should have. I was really, really impressed. E Lewis gave a talk at Google about it: