Tuesday, September 15, 2009

Facebook invites everyone to be involved

Facebook announced the Facebook Platform in May 2007, which I think is definitely one of the smartest moves Facebook has ever made. There were only a few applications available in May 2007, but today there are more than 350,000 active applications, 250 of which have more than one million monthly active users. The Facebook Platform provides an interface for developers to build applications within Facebook, just as an operating system provides an interface between hardware and programmers. One big difference is that Facebook's platform also provides a huge amount of very valuable social networking data.

Facebook's original goal is to let people connect to people. But connection by itself is not interesting enough to attract more people, and connection needs people in order to grow. So Facebook needs applications to make itself interesting and useful. Instead of developing all kinds of applications by itself, Facebook gives the opportunity to everyone. Facebook needs applications, and applications need existing social network data. This is clearly a win-win game. Very smart!

The Facebook Platform is very interesting to look at because it fits well into the software architecture of modern web-based applications. Before looking into how Facebook does it, ask yourself what you would do if you were asked to design such a platform. There are a few questions to answer:
  • How to let developers access data in Facebook with something they are already familiar with, like SQL?
  • How to let developers draw web content with something they are already familiar with, like HTML and CSS?
  • How to let developers write dynamic web content with something they are already familiar with, like client-side JavaScript?
  • How to let developers do all of the above in a secure way?
I have never developed a Facebook application, but after skimming through the documentation, I find the design nice and straightforward to follow. When you develop a Facebook application, you use FQL to access data. FQL is much like SQL. It supports complex operations that reduce the number of requests you need to make. To draw web content, you have basically two choices: use FBML or build an IFrame-based application. Jesse Farmer has a pretty nice introduction to FBML. But if you want to reuse some existing code, the IFrame-based approach might work better for you. This article describes how to choose between FBML and IFrame. The Facebook Platform makes dynamic, Ajax-style content possible by providing FBJS. FBJS is a good balance between security and flexibility. One big problem with FBJS is that many great JavaScript libraries won't work properly with it.
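To make the point about FQL concrete, here is a minimal sketch in Python that only builds a query string and sends nothing over the network. The table and field names (user, friend, uid, uid1, uid2, name) are as I recall them from the FQL documentation of that time, so treat them as assumptions, and the helper name friend_names_query is made up for illustration. The nested SELECT is what lets one request do the work of two.

```python
def friend_names_query(uid: int) -> str:
    """Build a single FQL statement listing the names of a user's friends.

    Hypothetical helper: table and field names are assumptions recalled
    from the FQL documentation, not verified against a live API.
    """
    # The inner SELECT finds the friend ids; the outer SELECT resolves
    # them to names. Without nesting this would take two round trips.
    return (
        "SELECT uid, name FROM user "
        f"WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = {int(uid)})"
    )


if __name__ == "__main__":
    print(friend_names_query(12345))
```

Anyone who has written SQL can read the result without learning anything new, which is presumably the point of the design.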

The Facebook Platform is good, but its problem is that applications written with it need to run within Facebook. In many cases, applications outside of Facebook also want to use the social networking information in Facebook. Facebook announced Facebook Connect in November 2008. It allows applications outside of Facebook to access Facebook users' identity information, social graph and streams. A lot of good web applications have started using Facebook Connect, such as Aardvark and many others. This is good for applications, because they can use the social networks that people have already created in Facebook, instead of making people create hundreds of different social networks. This is also critical for Facebook itself. Facebook foresees that one way to gain revenue is to provide something like PayPal. This can come true only if Facebook Connect is massively adopted.

Domain specific distributed programming platform

(Reading notes for Chapter 3 of "Beautiful Architecture" - Architecting for Scale)


This chapter talks about Project Darkstar, a Sun-led open source distributed programming platform for online games and virtual worlds. One major thing I learned from this chapter is to analyze a particular domain thoroughly and design a distributed programming platform for it, instead of trying to build a platform that works everywhere. Nowadays, distributed computing is an unavoidable task for almost every programmer. However, writing robust and efficient distributed programs is not trivial at all. It requires deep knowledge of how computer systems work and very careful design. Distributed programs are also much more difficult to test and debug. Does that mean we need to train every programmer for it? No! Programmers should not be distracted from their own domain-specific tasks. In this case, a platform helps programmers focus on what they are good at and quickly scale their programs up. Unfortunately, distributed computing is very complicated and often involves many design trade-offs. There is no single general design that fits everywhere. Project Darkstar is designed for the unique needs of massively multiplayer online games (MMOs) and virtual worlds.

The target domain of Project Darkstar has a programming model similar to that of many web-based services. It has a server and a huge number of clients (players). Players send tasks to the server to query or update state. At the same time, it has a few differences that lead to the design trade-offs of this system. I am especially interested in the design of its data model. I also found this technical report, which covers many details about it. It is very interesting to read.

First, latency is more critical than throughput for MMOs and virtual worlds. Very low latency is required to make sure that people can play games smoothly. This becomes an especially big problem when we talk about data storage, because data access is slow due to disk performance. Achieving low latency can be hard even in a non-distributed environment. Project Darkstar uses a distributed caching system with callbacks to avoid frequent accesses to permanent storage and frequent network communication.
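To picture what "caching with callbacks" means, here is a toy Python sketch of the general idea. It is not Darkstar's actual API or implementation: the class names CentralStore and NodeCache are invented, everything runs in one process, and concurrency, transactions and persistence are left out. Each node keeps a local copy of what it reads, and the central store calls back to the other nodes to evict a key when someone writes it.

```python
class CentralStore:
    """Stand-in for the slow, shared data store (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._holders = {}  # key -> set of NodeCache objects caching it
        self.reads = 0      # counts slow fetches, for illustration only

    def fetch(self, key, node):
        # A slow read; remember that this node now holds a copy.
        self.reads += 1
        self._holders.setdefault(key, set()).add(node)
        return self._data.get(key)

    def write(self, key, value, writer):
        # Call back every other node holding the key so it drops its copy.
        for node in self._holders.get(key, set()) - {writer}:
            node.evict(key)
        self._holders[key] = {writer}
        self._data[key] = value


class NodeCache:
    """Per-node cache that answers repeated reads locally."""

    def __init__(self, store):
        self._store = store
        self._local = {}

    def get(self, key):
        if key not in self._local:
            self._local[key] = self._store.fetch(key, self)
        return self._local[key]

    def put(self, key, value):
        self._store.write(key, value, self)
        self._local[key] = value

    def evict(self, key):
        # The callback invoked by the store when another node writes.
        self._local.pop(key, None)


if __name__ == "__main__":
    store = CentralStore()
    a, b = NodeCache(store), NodeCache(store)
    a.put("hp", 100)
    b.get("hp")
    b.get("hp")          # served from b's local cache
    a.put("hp", 90)      # store calls b.evict("hp")
    print(b.get("hp"), store.reads)
```

The repeated read on node b never touches the store, and the callback is what keeps b from serving a stale value after a writes.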

Second, the data access pattern is different. MMOs and virtual worlds have thick clients, which interact with users through fancy graphical interfaces. Servers usually just keep state. So servers process a lot of small data accesses, and about half of them are modifications. On the other hand, for most other web-based applications, clients are significantly simpler and servers handle most of the work. This characteristic also mainly affects the design of the storage layer.

Third, users of MMOs and virtual worlds are more willing to tolerate temporary data unavailability or even data loss. So the design of Project Darkstar makes a trade-off here.

Fourth, centralized and globally shared data management is especially important for MMOs and virtual worlds. The reason is twofold. It does not require game designers to divide games into playing areas, so every player is able to interact with every other player. It also prevents cheating more effectively.

Besides data storage, Project Darkstar also puts a lot of effort into other aspects of scaling an application, such as load balancing (demo), dynamic scaling, transaction management and so on.

Thursday, December 18, 2008

What do 2.6 billion minutes mean?



It is about 5000 years.

It is the time of 75 people's whole lives.

It is the time that 620 people work in their entire lives.

It is the time that a person can walk from the earth to the moon 45 times back and forth.

It is the time that people spend on Facebook.com every day...
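For anyone who wants to check the first two figures, here is a small Python sketch of the arithmetic. The only assumptions are a 365-day year and the helper name minutes_to_years, which is mine.

```python
MINUTES_PER_DAY = 60 * 24
DAYS_PER_YEAR = 365  # assumption: leap years ignored


def minutes_to_years(minutes: float) -> float:
    """Convert a number of minutes into 365-day years."""
    return minutes / (MINUTES_PER_DAY * DAYS_PER_YEAR)


if __name__ == "__main__":
    years = minutes_to_years(2.6e9)
    # Lifespan implied by spreading those years over 75 people.
    years_per_person = years / 75
    print(round(years), round(years_per_person))
```

That works out to roughly 4,947 years, or about 66 years for each of the 75 people.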

Tuesday, October 14, 2008

What would I ask more from Mint.com?


The answer is "no more for now". Jed told me about Mint.com today and I liked it immediately, which ended my long struggle with Quicken. I still feel a bit scared to put all my financial information there, though.

Mint.com has a very clean and intuitive interface, just like many other great web products. The feature I like the most about Mint.com is that it is so easy to set up bank accounts. I had a really tough time with that when I was using Quicken. Quicken always asked me for all kinds of PINs which I don't have. But on Mint.com, my online bank login information just worked. It also recognizes all my transactions perfectly, marking spending as negative and earnings as positive. So I can clearly see how much money I have and how much debt I carry.

Mint.com provides a great set of features to modify and tag transactions. You can easily add a note, split an amount, set filters and tag a transaction. Tagging is especially handy. For example, I can tag the expenses that I will be reimbursed for later and filter them out when I generate reports or charts.

Mint.com provides several really useful charting features. The pie chart of expenses is especially cool and functional. You can easily zoom into any category to see subcategories or individual transactions.

Besides basic bank information management, Mint.com can also help with investments, which I haven't had time to dig into yet.

Last but not least, Mint.com is web-based and I can use it whenever I want.

Many people hesitate to jump in. Their biggest concern is privacy. Aaron Patzer, Mint.com's founder, explained how Mint.com protects people's privacy. I think it makes sense. The only part I don't quite buy is:

We ask for your online banking user name and passwords, but we do not see or store that information.

I guess that once a user types in a username and password, Mint.com creates a tunnel to the bank, and that tunnel stays open until the user changes the password or something else happens. I'm not sure about this, though. I hope Mint.com will explain it in more detail.

The business model of Mint.com is also interesting. There are no obvious ads, but Mint.com recommends bank deals to you, such as credit cards with more rewards, checking accounts with higher interest rates and so on. I feel that Mint.com should do something beyond this to earn money.

The way I see Mint.com is that it requires almost zero effort to use but solves 80% of the problems. If you are just starting to manage your money, try Mint.com before you HAVE TO spend money and time on those complicated applications.

Tuesday, October 23, 2007

Machine translation has a long way to go.

Recently, Google switched from Systran's system to its own translation system. An interesting article compares the two by translating between English and German.

Honestly, I'm quite pessimistic about machine translation. I have never had a good experience with any translator. Just out of curiosity, I compared Google and Systran by translating an English snippet into Chinese. It turns out that neither of them did a good job and nobody can understand their results, though Google did slightly better. Everyone knows machine translation is a hard problem, and so do I. But judging by their results, I think they could have done much better. They just didn't put in enough effort.

Let's take a look at the translation result.

English to Chinese Translation Comparison

English snippet:

Apple infuriated developers and some iPhone users when it issued a software update September 27 that disabled unofficial programs installed on the handsets.

Until Wednesday, Apple had tried to control which applications consumers had on their iPhones.

Now, Jobs said the company intends to release a software development kit in February that will let coders create applications to work directly on the iPhone and the iPod Touch. The Touch is the new iPod portable player that resembles the iPhone but lacks the function of a cell phone.

"We are excited about creating a vibrant third-party developer community around the iPhone and enabling hundreds of new applications for our users," Jobs said in the posting.

Google Translator:

苹果激怒了开发商和一些iphone的用户时,它发出的一个软件更新9月27日伤残非官方程式安装于手机。

直到周三,苹果公司曾试图控制哪些申请消费者对它们iphones 。

现在,乔布斯表示,该公司打算发布一个软件开发工具包,在2月,可以让编码员创造应用工作,直接就对iphone与ipod的触摸。触摸是新的ipod便携式播放相似iphone的,但欠缺的功能手机。

"我们很高兴看到,创造一个充满活力的第三方开发社区周围iphone的和有利的数百名新申请用户, "乔布斯说,在张贴。

Systran translation:

苹果计算机公司激怒了开发商和一些iPhone 用户当它发布了残疾非官方的节目安装在手机的软件更新9月27 日。

直到星期三, 应用消费者有在他们的iPhones 的苹果计算机公司设法控制。

现在, 工作认为公司打算发布软件开发成套工具在将让编码人创造应用工作直接地在iPhone 和iPod 接触的2月。接触是类似iPhone 的新iPod 便携式的球员但缺乏手机的作用。

"我们被激发关于创造一个充满活力的第三方开发商社区在iPhone 附近和使能上百新应用为我们的用户," 工作说在0N 投稿。

In the first paragraph, both Google and Systran translate "disabled" as an adjective, like the "disabled" in "a disabled veteran", which messes up the meaning of the whole sentence. In Chinese, we use different words for the "disabled" in "a disabled veteran" and the "disabled" in "disable a functionality". As is well known, verbs are usually very important for understanding a sentence, so verbs should be handled very carefully. In fact, with better NLP technology, it can be determined that "disabled" here is a verb and that its object is "programs". Then it can be translated in a much better way. However, Systran translates the adverbial clause of time, "when it issued a ...", better. Chinese readers will misinterpret Google's translation as "When Apple infuriated developers and some iPhone users, it issued a software update ...".
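As a toy illustration of why the part of speech matters, here is a Python sketch that picks a Chinese word by (word, part-of-speech) pair. This is not how Google or Systran work. The table SENSES and the function translate_word are invented for this example, the tags are supplied by hand, and a real system would need a tagger or parser to produce them.

```python
# Hypothetical sense table: the same English surface form maps to
# different Chinese words depending on its part of speech.
SENSES = {
    ("disabled", "ADJ"): "残疾",   # as in "a disabled veteran"
    ("disabled", "VERB"): "禁用",  # as in "disabled the programs"
    ("jobs", "NOUN"): "工作",      # as in "look for a job"
    ("Jobs", "PROPN"): "乔布斯",   # the person's name
}


def translate_word(word: str, pos: str) -> str:
    """Look up a translation by word and part of speech.

    Falls back to the untranslated word when the pair is unknown.
    """
    return SENSES.get((word, pos), word)


if __name__ == "__main__":
    print(translate_word("disabled", "ADJ"))
    print(translate_word("disabled", "VERB"))
```

With the verb tag, the lookup returns the word for switching something off rather than the one for a physical disability, which is exactly the distinction both systems missed.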

It gets even worse in the second paragraph. The translation doesn't make any sense at all. The problems are the order of the words and the meaning of "application", which both translate as "the act of applying" instead of "computer software".

In the third paragraph, Google did a better job. It recognized "Jobs" as the name of Steve Jobs, but Systran translated "Jobs" as the "job" in "look for a job". Simply from the fact that the "J" is capitalized, Systran should have done better. Systran also translates "player" as in "football player"... Moreover, Google did a slightly better job on the order of words, but the result is still very hard to understand.

In the last paragraph, Google did a perfect job of translating "We are excited about" into a beautiful Chinese sentence, but Systran thought it was "we are inspired". I think Google benefits from a collection of tons of common phrases, even sentences, and their corresponding accurate translations. I actually use Google's translator a lot for phrases and short sentences.

To sum up, the two biggest problems I see are:
(1) how to figure out the meaning of a word in a particular context and use the accurate translation in the target language.
(2) how to put the translated words in an order that is natural in the target language. (more difficult)

Both Google and Systran try to leverage huge dictionaries, NLP knowledge and statistical models, but it seems to me that in most cases they are still simply translating text word by word. The translated text is very hard to understand and even misleading. So I would say they really have a long, long way to go before people can understand their translations.

Wednesday, October 17, 2007

Google is on its way to a to-do list.

I have been waiting for Google's to-do list for quite a long time. Not surprisingly, so have many others. Fortunately, we won't have to wait too long. Google is "working to add our special Google secret sauce to the to-do lists space".

Friday, October 05, 2007

Updated SOSP/OSDI HOF with SOSP07 papers.

I just updated the SOSP/OSDI Hall of Fame with the SOSP 07 papers.

Check it out!