Tuesday, September 15, 2009

Domain-specific distributed programming platform

(Reading notes for Chapter 3 of "Beautiful Architecture" - Architecting for Scale)


This chapter talks about Project Darkstar, a Sun-led open source distributed programming platform for online games and virtual worlds. One major thing I learned from this chapter is to analyze a particular domain thoroughly and design a distributed programming platform for it, instead of trying to build a platform that works everywhere. Nowadays, distributed computing is an unavoidable task for almost every programmer. However, writing robust and efficient distributed programs is far from trivial. It requires deep knowledge of how computer systems work and very careful design. Distributed programs are also much more difficult to test and debug. Does this mean that we need to train every programmer for it? No! Programmers should not be distracted from their own domain-specific tasks. Instead, a platform can help programmers focus on what they are good at and quickly scale their programs up. Unfortunately, distributed computing is very complicated and often involves many design trade-offs. There is no single general design that fits everywhere. Project Darkstar is designed around the unique requirements of massively multiplayer online games (MMOs) and virtual worlds.

The target domain of Project Darkstar has a programming model similar to that of many web-based services: a server and a huge number of clients (players). Players send Tasks to the server to query or update state. At the same time, the domain has a few differences that lead to the design trade-offs of this system. I am especially interested in the design of its data model. I also found this technical report, which covers many details about it. It is very interesting to read.

First, latency is more critical than throughput for MMOs and virtual worlds. Very low latency is required so that people can play games smoothly. This becomes an especially big problem when we talk about data storage, because data access is slow due to disk performance. Achieving low latency can be hard even in a non-distributed environment. Project Darkstar uses a distributed caching system with callbacks to avoid frequent accesses to persistent storage and frequent network communication.
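To make the "cache with callbacks" idea concrete for myself, here is a minimal single-process sketch in Python. It is not Darkstar's actual API (Darkstar itself is written in Java), and the names DataServer, NodeCache and on_invalidate are my own assumptions for illustration. The point is only that repeated reads are served from a node-local cache, and that the central server calls back other nodes to evict a stale copy when someone writes. For simplicity every write goes straight to the server here, which a real system would likely batch or defer.

```python
class DataServer:
    """Hypothetical central data server; a dict stands in for disk storage."""

    def __init__(self):
        self._store = {}    # authoritative values
        self._holders = {}  # key -> set of node caches holding a copy
        self.fetches = 0    # counts the slow trips to the server

    def fetch(self, key, node):
        # A cache miss on some node: count it and remember who holds the key.
        self.fetches += 1
        self._holders.setdefault(key, set()).add(node)
        return self._store.get(key)

    def write(self, key, value, writer):
        # Callback step: every other node holding the key must evict it.
        for node in self._holders.get(key, set()) - {writer}:
            node.on_invalidate(key)
        self._holders[key] = {writer}
        self._store[key] = value


class NodeCache:
    """Hypothetical per-node cache sitting in front of the data server."""

    def __init__(self, server):
        self._server = server
        self._cache = {}

    def get(self, key):
        # Only go to the server when the key is not cached locally.
        if key not in self._cache:
            self._cache[key] = self._server.fetch(key, self)
        return self._cache[key]

    def put(self, key, value):
        self._server.write(key, value, self)
        self._cache[key] = value

    def on_invalidate(self, key):
        # Called back by the server when another node modifies the key.
        self._cache.pop(key, None)
```

With two nodes sharing one server, a second read of the same key on the same node does not touch the server at all. The node only pays for a fetch again after another node has modified that key and the callback has evicted the local copy.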

Second, the data access pattern is different. MMOs and virtual worlds have thick clients, which interact with users through rich graphical interfaces. Servers usually just keep state. So servers process a lot of small data accesses, and about half of them are modifications. On the other hand, for most other web-based applications, clients are significantly simpler and servers handle most of the work. This characteristic also mainly affects the design of the storage layer.

Third, users of MMOs and virtual worlds are more willing to tolerate temporary data unavailability or even loss. So the design of Project Darkstar makes trade-offs on this point.

Fourth, centralized and globally shared data management is especially important for MMOs and virtual worlds. The reason is twofold. First, it does not require game designers to divide games into playing areas, so every player is able to interact with every other player. Second, it prevents cheating more effectively.

Besides data storage, Project Darkstar also puts a lot of effort into other aspects of scaling an application, such as load balancing (demo), dynamic scaling, transaction management and so on.
