Node.js vs Erlang: SyncPad’s Experience

Original post (hosted outside the firewall): http://blog.mysyncpad.com/post/2073441622/node-js-vs-erlang-syncpads-experience

Disclaimer

First and foremost, I want to be clear: this post is not a comparison of Erlang and Node.js. It is about my experience with one versus the other. The truth is, you can’t compare these two. Erlang was made for the largest global telecom systems, and JavaScript was made to make our web browsers more dynamic (an interesting side note: Brendan Eich, the creator of JS, mentions the Erlang actor model as a possible direction for JS). Regardless, they are both useful and elegant in their own ways, and I greatly enjoy both.

Secondly, I acknowledge that there are undoubtedly problems with the code and test below. My tests are neither exhaustive nor scientific; this is just a “back-of-the-napkin” test. It was conducted to illustrate the experience we had once we connected the iPad apps to the server. Ultimately, I hope that by sharing this with you I can gain insight into how to fix the issues we ran into.

Background

When I was first contacted about this project, 39inc had created the SyncPad app and prototyped an initial server in PHP/MySQL with polling. I didn’t think that would work very well; moreover, I had been waiting for a project I could build with Node.js. I recreated the server in Node.js to use long polling and opted for Redis as the datastore because of features like lists and pub/sub, and because of its blazing speed.

Once we started testing the new server with our iPads, we noticed the Node.js SyncPad server was consuming a lot of memory and CPU. It’s worth noting that because we are bootstrapping SyncPad, we were limited to running on a small VM for dev: only 524MB of RAM and a 2.4GHz AMD quad-core processor.

Code

Unfortunately (and hopefully understandably) I cannot release the server code, but every “draw” that the server receives is encoded as a JSON object and put into the datastore. For every consuming client, the server decodes the JSON, performs some functions, and re-encodes it before sending it to the consumer. Scribbling on SyncPad alone doesn’t consume the server’s resources, but as soon as another client is consuming the drawing, the server’s resources are eaten up.
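To make the per-consumer work described above concrete, here is a minimal sketch of a decode, transform, re-encode loop. It is not the SyncPad code: the draw fields, the `transformForConsumer` step, and the `deliver` function are all hypothetical stand-ins for whatever the real server does.

```javascript
// Hypothetical shape of a stored draw; the real SyncPad schema is not public.
const storedDraw = JSON.stringify({ padId: "demo", x: 10, y: 20, color: "#000000" });

// Illustrative stand-in for the "performs some functions" step.
function transformForConsumer(draw, consumerId) {
  return Object.assign({}, draw, { deliveredTo: consumerId });
}

// Each stored draw costs one JSON.parse and one JSON.stringify per consumer,
// and every call allocates short-lived objects and strings.
function deliver(rawDraws, consumerId) {
  return rawDraws.map(function (raw) {
    const draw = JSON.parse(raw);
    const out = transformForConsumer(draw, consumerId);
    return JSON.stringify(out);
  });
}

const payloads = deliver([storedDraw, storedDraw], "ipad-2");
console.log(payloads.length, "payloads re-encoded");
```

With no consumer attached, none of this parsing happens, which would be consistent with the observation that scribbling alone does not load the server.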

Testing

I conducted a simple test that puts a load on the server comparable to one iPad drawing non-stop and one iPad receiving that drawing. I measured how many requests I generate in 1 minute of absurd scribbling on SyncPad: about 3300 requests/min. I then generated load aiming for ~3300 requests/min and connected another SyncPad to poll for the results. The stats were gathered using vmstat at one-second intervals and graphed with a tool called vmstax.
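As a rough sanity check on that target rate, the sketch below works out how often a load generator would have to fire to hit ~3300 requests/min, and how long a 198k-request run (the total used in the runs described below) would take at that pace. The `pacing` function is only illustrative; the actual load generator is not shown in this post.

```javascript
// Pacing arithmetic for a fixed-rate load generator (illustrative only).
function pacing(targetPerMin, totalRequests) {
  const perSecond = targetPerMin / 60;                 // requests per second
  const intervalMs = 1000 / perSecond;                 // gap between requests
  const expectedMinutes = totalRequests / targetPerMin; // run length at target rate
  return { perSecond, intervalMs, expectedMinutes };
}

const plan = pacing(3300, 198000);
console.log(plan);
```

That works out to 55 requests per second, or one request roughly every 18 ms, and about an hour for the whole run if the server keeps up.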

SyncPad Node.js Server

I attempted to send 198k requests, but the server quit after 138k requests, which took almost 2hrs. 

Researching this issue led me to a Hacker News thread and a related blog post that made me theorize the issue may be related to garbage collection, JSON.parse, and the decoding of the JSON.

I tried to figure out how best to store the objects in something other than JSON.  I looked into buffers, but whatever I did, I was still doing it wrong.

Toy Node.js Server

I realize it would be hard for you to comment on the issues without seeing code, so I have recreated the code that you can test and review. This code, to a lesser degree, recreates the same issues that we saw in SyncPad’s node.js server code. The toyserver.js code is only intended as a working example.

This toyserver.js server performs better than the Node.js SyncPad server.  It was able to complete the 198k requests in 1.5hrs, but it still exhibits high CPU usage and memory leaking.

Additionally, I traced garbage collection during this test and for 2 hours after it ended. You can view the results here.
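For anyone who wants to watch memory on their own run, here is a minimal sketch that samples Node's built-in `process.memoryUsage()` at a fixed interval. The `sampleMemory` and `startMemoryLog` helpers are hypothetical names, and this is not necessarily how the trace above was produced. V8's `--trace-gc` flag (`node --trace-gc toyserver.js`) is another option for seeing individual collections.

```javascript
// Take one snapshot of process memory, converted to megabytes.
function sampleMemory() {
  const m = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMB: toMB(m.rss),             // total resident memory of the process
    heapTotalMB: toMB(m.heapTotal), // heap reserved by V8
    heapUsedMB: toMB(m.heapUsed),   // heap actually in use
  };
}

// Log a snapshot every intervalMs milliseconds.
function startMemoryLog(intervalMs) {
  const timer = setInterval(function () {
    console.log(new Date().toISOString(), sampleMemory());
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}

console.log(sampleMemory());
```

Calling `startMemoryLog(1000)` inside the server would give one line per second. A `heapUsedMB` value that keeps climbing after the load stops points toward a leak rather than garbage that has simply not been collected yet.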

SyncPad Erlang Server

I tried to resolve these issues by looking for other ways to encode the data. I looked into Buffers, but I wasn’t sure exactly how to apply them to my problem. Finally, for the sake of time, I rewrote the server in Erlang.

Aside from the fact that it’s Erlang, one of the major differences was that the draw “objects” were no longer stored as strings. It’s super easy to work with binary data in Erlang, and because the datastore handles binary as well, it was a perfect fit. Now all of the objects are stored as binary. Other than that, everything else was about the same.

When I ran the exact same tests, I had much better results. The Erlang server completed 198k requests in about 50 minutes, and memory and CPU usage were negligible.
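The Erlang code is not shown here, but the same idea can be sketched in Node.js with a Buffer. The 7-byte layout below (two 16-bit coordinates plus an RGB colour) is purely an assumption for illustration, not SyncPad's actual wire format, and `Buffer.alloc` is an API from more recent Node.js releases.

```javascript
// Hypothetical fixed layout: x (uint16), y (uint16), r, g, b (uint8 each).
const DRAW_SIZE = 7;

// Pack a draw into a fixed-size binary record.
function encodeDraw(draw) {
  const buf = Buffer.alloc(DRAW_SIZE);
  buf.writeUInt16BE(draw.x, 0);
  buf.writeUInt16BE(draw.y, 2);
  buf.writeUInt8(draw.r, 4);
  buf.writeUInt8(draw.g, 5);
  buf.writeUInt8(draw.b, 6);
  return buf;
}

// Unpack a record only when the fields are actually needed.
function decodeDraw(buf) {
  return {
    x: buf.readUInt16BE(0),
    y: buf.readUInt16BE(2),
    r: buf.readUInt8(4),
    g: buf.readUInt8(5),
    b: buf.readUInt8(6),
  };
}

const record = encodeDraw({ x: 512, y: 384, r: 255, g: 0, b: 0 });
console.log(record.length, decodeDraw(record));
```

A record like this can be stored and forwarded to consumers as-is, so the server only decodes when it needs to inspect a field, instead of parsing and re-stringifying JSON for every consumer.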
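To put the three runs side by side, the sketch below turns the figures quoted in this post into rough requests-per-minute numbers. The durations are the approximate ones given above ("almost 2hrs" is taken as 120 minutes, so the first rate is a lower bound), so treat the results as ballpark values only.

```javascript
// Figures as quoted in this post; durations are approximate.
const runs = [
  { name: "SyncPad Node.js server", requests: 138000, minutes: 120 },
  { name: "Toy Node.js server", requests: 198000, minutes: 90 },
  { name: "SyncPad Erlang server", requests: 198000, minutes: 50 },
];

// Effective throughput of one run.
function requestsPerMinute(run) {
  return run.requests / run.minutes;
}

const rates = runs.map(function (run) {
  return { name: run.name, perMin: Math.round(requestsPerMinute(run)) };
});

rates.forEach(function (r) {
  console.log(r.name + ": ~" + r.perMin + " requests/min");
});
```

By this rough measure the three servers handled about 1150, 2200, and 3960 requests per minute respectively.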

Multi-core vs Single Core

The graphs above do not represent the multi-core nature of the CPU very well. For example, the Node.js servers above only show CPU usage of about 25%; however, if you were watching htop, you would see that the single-threaded Node.js process had one of the CPUs pegged at 100%.

Erlang naturally spreads the work over all four cores. Node.js does not have this luxury out of the box yet.

I attempted to address the lack of Node.js multi-core concurrency using multi-node and a few other projects, but none worked out well in my initial attempts. I did not pursue it any further, as I ended up switching to Erlang anyhow.
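The 25% figure follows directly from averaging across cores, since vmstat reports CPU usage for the machine as a whole. A tiny sketch, with hypothetical per-core readings rather than measured ones:

```javascript
// Average per-core utilisation into a single whole-machine percentage.
function aggregateCpuPercent(perCorePercent) {
  const sum = perCorePercent.reduce(function (a, b) { return a + b; }, 0);
  return sum / perCorePercent.length;
}

// One core pegged by a single-threaded process, three cores idle.
const singleThreaded = aggregateCpuPercent([100, 0, 0, 0]);
console.log(singleThreaded);
```

So a single-threaded process that is completely CPU-bound on a quad-core machine shows up as only a quarter of total capacity in an aggregate graph.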

Conclusion

Node.js was fun to program and I want to use it in future projects, but I’d like to know what I can do to resolve the issues above. I encourage your constructive feedback.

Erlang was also a delight to code. It is a very rich language with lots of features, and OTP is a great framework with which to build servers. The SyncPad server acts as a messaging server, so Erlang was the best fit. I would encourage you to try your next project in Erlang as well.