This idea seems to be simple and straightforward, but there are a lot of things to think about to make it work well. Without too much thinking deeper, this idea can be implemented by just what Java Thread class provides. For each task, if it is small enough, solve directly; otherwise create several threads and let each new thread handle a smaller chuck. Then call join() method on each new threads to wait until all of them finish. Finally merge results. There are a few problems of this approach. First of all, creating threads is expensive. When a task is fairly small, the time spent in creating more threads is even longer than the computing time spent in solving the task itself. Second, there will be a lot of threads created. These threads compete for CPU cycles which result in worse performance. Third, programmer needs to write specific code for each divide-and-conquer algorithm. It makes code harder to write, read and maintain.
If you are familiar with the java.util.concurrency package introduced since Java 5, you may want to suggest to use a thread pool based executor and call invokeAll() to process divided tasks. This saves the overhead of creating threads and limits the number of threads - a big step ahead, but there is good odds that this idea will not work well either.
- The tasks queue, most likely an unbounded blocking queue, that is shared by all threads in an executor could become a bottleneck, because every thread needs to lock the queue when it gets tasks from the queue or puts new sub-tasks into the queue.
- What should happen when a thread is done its job? It may not be able to process the next queued task because that task may still be waiting for join its sub-tasks. In this case, how could this thread pick another task to continue instead of being idle?
- Programmers still have to write a fairly big amount of code in addition to the code that runs the algorithm itself.
The Fork/join Framework solves these problems. Each worker thread maintains its own tasks scheduling queue. This prevent one single shared queue from being a bottleneck. What should a thread do when its own tasks queue becomes empty?
Here comes the smartest trick in Fork/join Framework - the "work-stealing" strategy. A scheduling queue of each thread is a double-end queue. When a task needs to be divided into smaller tasks, it sends all its sub-tasks to one worker thread. So as the time goes, the head of a queue contains bigger tasks and the tail of a queue contains smaller tasks. For each thread, it processes tasks from the tail of its queue (FIFO fashion). This utilizes data locality and also tries to concentrate on all sub-tasks of one task as much as possible. When a worker thread finishes all its own tasks, it "steals" tasks from the head of other threads' queues (LIFO fashion). There is a good chance that this thread will pick a big task and thus it can further divide this tasks without affecting other on-going work. It won't need to "steal" again in the near future. For each double-end queue, one thread works on one end and other threads may work on the other end, but not very often. This is very good to prevent contention on queues.
Last but not the least, with the Fork/join Framework, programmers can focus on the code that tackles their own problems. The code could be much cleaner and easier to read.
In his paper, Doug also evaluates the Fork/join Framework by solving a few real problems. He mentioned that the performance can also be affected by garbage collection, memory locality and bandwidth, task synchronization and task locality. These aspects should be considered careful when design the algorithm to utilize the Fork/join Framework. I like this paper, it is a typical Doug-style paper and it is a lot of fun to read.
The Fork/join Framework is a part of JSR-166. It can be downloaded from the JSR-166 interest site.
No comments:
Post a Comment