
Why We Chose C++ Over Java

This document clarifies our position on C++ versus Java as the implementation language. There are two fundamental reasons why C++ is superior to Java for this particular application.

  1. Hypertable is memory (malloc) intensive. Hypertable caches all updates in an in-memory data structure (e.g. an STL map). Periodically, these in-memory data structures get spilled to disk. When the number of spilled files reaches a certain threshold, they are merged together to form larger files. The performance of the system is, in large part, dictated by how much memory it has available to it. Less memory means more frequent spilling and merging, which increases the load on the network and the underlying DFS. It also increases the CPU work required of the system, in the form of extra heap-merge operations. Java is a poor choice for memory-hungry applications. In particular, when managing a large in-memory map of key/value pairs, Java uses memory far less efficiently than C++: on the order of two to three times worse (if you don't believe me, try it).
  2. Hypertable is CPU intensive, and in several places. The first is the in-memory maps of key/value pairs: traversing and managing those maps can consume a lot of CPU. In addition, because Java uses memory inefficiently for these maps, the processor caches become much less effective. A recent run of the tool Calibrator (http://monetdb.cwi.nl/Calibrator/) on one of our 2GHz Opterons yields the following statistics:
    level  size     linesize    miss-latency        replace-time
    1       64 KB    64 bytes    6.06 ns =  12 cy    5.60 ns =  11 cy
    2      768 KB   128 bytes   74.26 ns = 149 cy   75.90 ns = 152 cy
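As a worked check on the Calibrator figures, assuming the nominal 2GHz clock stated above (one cycle lasts 0.5 ns), each latency converts to cycles by multiplying by the clock rate and rounding:

$$
\text{cycles} = t \cdot f_{\text{clk}}, \qquad
6.06\,\text{ns} \times 2\,\text{GHz} = 12.12 \approx 12\ \text{cy}, \qquad
74.26\,\text{ns} \times 2\,\text{GHz} = 148.52 \approx 149\ \text{cy}
$$

On these numbers, a level-2 miss costs roughly twelve times as many cycles as a level-1 miss ($148.52 / 12.12 \approx 12.3$), which is why a larger memory footprint that defeats the caches translates directly into CPU time.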
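Item 1 describes a write path in which updates accumulate in an in-memory map, get spilled to disk as sorted files, and are merged with a heap-merge once enough files pile up. The C++ sketch below illustrates that cycle under clearly stated assumptions: `MemTable`, `heapMerge`, `spillAt` and `mergeAt` are hypothetical names, not Hypertable's API; "spilled files" are modeled as in-memory sorted vectors rather than DFS files; and when a key appears in several runs, the value from the newest run is assumed to win.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a spilled on-disk file: key/value pairs sorted by key.
using Run = std::vector<std::pair<std::string, std::string>>;

// k-way heap merge of sorted runs, ordered oldest to newest.
// Assumption: for a key present in several runs, the newest run's value wins.
Run heapMerge(const std::vector<Run>& runs) {
    using Entry = std::tuple<std::string, std::size_t, std::size_t>;  // (key, run, position)
    // Smallest key on top; for equal keys, the newer (higher-index) run on top.
    auto lowerPriority = [](const Entry& a, const Entry& b) {
        if (std::get<0>(a) != std::get<0>(b)) return std::get<0>(a) > std::get<0>(b);
        return std::get<1>(a) < std::get<1>(b);
    };
    std::priority_queue<Entry, std::vector<Entry>, decltype(lowerPriority)> heap(lowerPriority);

    for (std::size_t r = 0; r < runs.size(); ++r) {
        // Precondition: every run is already sorted by key.
        assert(std::is_sorted(runs[r].begin(), runs[r].end(),
                              [](const auto& a, const auto& b) { return a.first < b.first; }));
        if (!runs[r].empty()) heap.emplace(runs[r][0].first, r, 0);
    }

    Run out;
    while (!heap.empty()) {
        auto [key, r, i] = heap.top();  // copy before popping
        heap.pop();
        // Keys leave the heap in ascending order, newest copy first; drop older copies.
        if (out.empty() || out.back().first != key) out.push_back(runs[r][i]);
        if (i + 1 < runs[r].size()) heap.emplace(runs[r][i + 1].first, r, i + 1);
    }
    return out;
}

// Hypothetical write buffer: updates go into an in-memory std::map; once it holds
// spillAt entries it is spilled as a sorted run, and once mergeAt runs have
// accumulated they are heap-merged into a single larger run.
class MemTable {
public:
    MemTable(std::size_t spillAt, std::size_t mergeAt) : spillAt_(spillAt), mergeAt_(mergeAt) {}

    void put(const std::string& key, const std::string& value) {
        cache_[key] = value;
        if (cache_.size() >= spillAt_) spill();
    }

    std::size_t cachedEntries() const { return cache_.size(); }
    const std::vector<Run>& runs() const { return runs_; }

private:
    void spill() {
        runs_.emplace_back(cache_.begin(), cache_.end());  // std::map iterates in key order
        cache_.clear();
        if (runs_.size() >= mergeAt_) {
            Run merged = heapMerge(runs_);
            runs_.clear();
            runs_.push_back(std::move(merged));
        }
    }

    std::size_t spillAt_;
    std::size_t mergeAt_;
    std::map<std::string, std::string> cache_;
    std::vector<Run> runs_;
};
```

For example, `MemTable t(2, 2)` followed by puts of `b`, `a`, `b` (with a new value) and `c` spills twice and then merges the two runs into a single run `a, b, c` in which `b` carries its newer value. Lowering `spillAt` (less memory for the map) means fewer entries per spill, so the same update stream produces more spills and more merge passes, which is the trade-off item 1 describes.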

posted @ 2008-07-05 18:04  大恐龙