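To solve our database multi-write problem, a colleague recommended HBase and ran an HBase training session. Our lead, Tim, also reported from a conference that Taobao uses HBase to replace part of its core MySQL applications, so I am studying it to see whether it fits our case.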

The fallacies of distributed computing:

1 The network is reliable.
2 Latency is zero.
3 Bandwidth is infinite.
4 The network is secure.
5 Topology doesn't change.
6 There is one administrator.
7 Transport cost is zero.
8 The network is homogeneous.

 Downloaded version 0.92.1: 889 files (counted with find . -name '*.java' | wc -l) and 285749 lines of Java code.

Chapter summary of "HBase: The Definitive Guide":

 

 

  1. HBase evolution

    November 2006
    Google releases paper on BigTable
    February 2007
    Initial HBase prototype created as Hadoop contrib
    October 2007
    First “usable” HBase (Hadoop 0.15.0)
    January 2008
    Hadoop becomes an Apache top-level project, HBase becomes subproject
    October 2008
    HBase 0.18.1 released
    January 2009
    HBase 0.19.0 released
    September 2009
    HBase 0.20.0 released, the performance release
    May 2010
    HBase becomes an Apache top-level project
    June 2010
    HBase 0.89.20100621, first developer release
    January 2011
    HBase 0.90.0 released, the durability and stability release
    Mid 2011
    HBase 0.92.0 released, tagged as coprocessor and security release

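  2. Limitations of the RDBMS
    The book uses the example application "Hush, the HBase URL Shortener". As traffic grows on an RDBMS, you add read slaves, then a cache, and restrict yourself to simple queries. You keep optimizing and scaling reads and writes, split tables and databases, change the application code to do sharding, and buy better hardware, and the nightmares never end.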
  3. HBase's column-oriented tables

    the most basic unit is a column. One or more columns form a
    row that is addressed uniquely by a row key. A number of rows, in turn, form a table,
    and there can be many of them. Each column may have multiple versions, with each
    distinct value contained in a separate cell.
    (Table, RowKey, Family, Column, Timestamp) → Value, which can be expressed in a programming language as:

    SortedMap<RowKey, List<SortedMap<Column, List<Value, Timestamp>>>>   (p19)
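    Cells under the same row key can carry different timestamps, each corresponding to a different version. Data is stored in HFiles, which are sequences of blocks (64 KB by default) with the block index stored at the end of the file; the index is kept in memory once the file is opened. HFiles are in turn stored in the Hadoop Distributed File System (HDFS), which replicates writes across servers so that data is not lost.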
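    The nested-map view above can be sketched in plain Java. This is only a toy in-memory model, not the HBase client API: the class name ToyTable, the "family:qualifier" string convention, and the use of String for keys and values (HBase uses byte arrays) are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the logical layout (hypothetical class, not part of HBase).
final class ToyTable {
    // rowKey -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one version of a cell; an existing (row, column, timestamp) is overwritten.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String get(String rowKey, String column) {
        NavigableMap<Long, String> versions = versionsOf(rowKey, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Returns the newest version whose timestamp is <= the given one, or null.
    String get(String rowKey, String column, long timestamp) {
        NavigableMap<Long, String> versions = versionsOf(rowKey, column);
        if (versions == null) {
            return null;
        }
        // Timestamps are sorted descending, so ceilingEntry yields the
        // closest entry that is not newer than the requested timestamp.
        Map.Entry<Long, String> e = versions.ceilingEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    private NavigableMap<Long, String> versionsOf(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("row1", "cf:url", 100L, "http://old.example");
        table.put("row1", "cf:url", 200L, "http://new.example");
        System.out.println(table.get("row1", "cf:url"));        // newest version
        System.out.println(table.get("row1", "cf:url", 150L));  // version as of t=150
    }
}
```

    Sorting timestamps newest-first means a plain read returns the latest version without scanning, which matches the idea that each version lives in its own cell.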

  4. HBase auto-sharding

    Regions are the unit that HBase manages and monitors in order to shard data: “Each region is served by exactly one region server, and each of these servers can serve many regions at any time.”
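    One way to picture the region-to-server mapping is a sorted map from each region's start key to the server that hosts it. The sketch below is a toy under stated assumptions: ToyRegionDirectory, split, and locate are hypothetical names, keys are compared as Java Strings rather than as byte arrays, and nothing here reflects how HBase actually stores or looks up region locations.

```java
import java.util.TreeMap;

// Toy region directory (hypothetical): each region covers the key range
// [startKey, nextStartKey) and is assigned to exactly one server.
final class ToyRegionDirectory {
    // region start key -> server name; the first region starts at the empty key
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    ToyRegionDirectory(String firstServer) {
        regionStartToServer.put("", firstServer);
    }

    // Splits the region containing splitKey; keys >= splitKey (up to the next
    // region's start key) now belong to a new region hosted by newServer.
    void split(String splitKey, String newServer) {
        regionStartToServer.put(splitKey, newServer);
    }

    // Finds the server hosting the region whose range contains rowKey.
    String locate(String rowKey) {
        // The empty start key sorts before every string, so floorEntry is never null.
        return regionStartToServer.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        ToyRegionDirectory directory = new ToyRegionDirectory("rs1");
        directory.split("m", "rs2");
        System.out.println(directory.locate("apple"));
        System.out.println(directory.locate("zebra"));
    }
}
```

    Several start keys may map to the same server name, which mirrors the quote: one server per region, many regions per server.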

  5. Data write path

    When data is updated it is first written to a commit log, called a write-ahead log (WAL)
    in HBase, and then stored in the in-memory memstore. Once the data in memory has
    exceeded a given maximum value, it is flushed as an HFile to disk. After the flush, the
    commit logs can be discarded up to the last unflushed modification. While the system
    is flushing the memstore to disk, it can continue to serve readers and writers without
    having to block them.  

    Since flushing memstores to disk causes more and more HFiles to be created, HBase
    has a housekeeping mechanism that merges the files into larger ones using compaction.
    There are two types of compaction: minor compactions and major compactions. (p24)
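    The write path quoted above (log first, then memstore, flush to a file, later compaction) can be sketched as a toy store. Everything here is an illustrative assumption: ToyStore is a hypothetical class, the flush threshold counts entries instead of bytes, "files" are in-memory sorted maps instead of HFiles on HDFS, and compact() merges all files at once without distinguishing minor from major compactions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy log-structured store (hypothetical, in-memory only).
final class ToyStore {
    private final int flushThreshold;                      // entries held in memory before a flush
    private final List<String> wal = new ArrayList<>();    // stand-in for the write-ahead log
    private TreeMap<String, String> memstore = new TreeMap<>();
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // oldest first

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);   // 1. record the change in the log first
        memstore.put(key, value);     // 2. then apply it to the sorted in-memory store
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Writes the memstore out as a new sorted "file" and discards the log
    // entries that the flush has made redundant.
    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        files.add(memstore);
        memstore = new TreeMap<>();
        wal.clear();
    }

    // Reads check the memstore first, then files from newest to oldest.
    String get(String key) {
        String value = memstore.get(key);
        if (value != null) {
            return value;
        }
        for (int i = files.size() - 1; i >= 0; i--) {
            value = files.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Merges all files into one; values from newer files overwrite older ones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : files) {
            merged.putAll(file);
        }
        files.clear();
        if (!merged.isEmpty()) {
            files.add(merged);
        }
    }

    int fileCount() { return files.size(); }

    int walSize() { return wal.size(); }
}
```

    Because every flush adds a file, reads have to consult more and more of them over time; compaction exists to bring that number back down.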

  6. HBase components
    The client library, one master server, and many region servers. The HBase master server uses ZooKeeper to manage the region servers and handles load balancing, moving regions off busy servers. Compared with Google Bigtable, HBase adds “push-down predicates, that is, filters, reducing data transferred over the network”.
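    The point of a push-down predicate is that the filter is evaluated where the data lives, so only matching rows cross the network. The following toy illustrates that idea only; ToyRegionServer, scan, and the rowsSent counter are hypothetical and do not correspond to the HBase Filter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy server that applies a caller-supplied filter before "sending" rows.
final class ToyRegionServer {
    private final TreeMap<String, String> rows = new TreeMap<>();
    private int rowsSent = 0;   // rows that crossed the simulated network

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Evaluates the filter on the server side and returns the keys of the
    // matching rows, in row-key order. Non-matching rows are never sent.
    List<String> scan(Predicate<String> valueFilter) {
        List<String> matchingKeys = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            if (valueFilter.test(row.getValue())) {
                matchingKeys.add(row.getKey());
                rowsSent++;
            }
        }
        return matchingKeys;
    }

    int rowsSent() {
        return rowsSent;
    }

    public static void main(String[] args) {
        ToyRegionServer server = new ToyRegionServer();
        server.put("r1", "http://a.example");
        server.put("r2", "ftp://b.example");
        server.put("r3", "http://c.example");
        System.out.println(server.scan(v -> v.startsWith("http://")));
        System.out.println(server.rowsSent());
    }
}
```

    Without push-down, the client would have to fetch every row and filter locally, so the transfer cost would grow with the table size rather than with the size of the result.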


     

     

     

 

 

posted on 2012-05-28 14:02 by 坚毅的刀刀