A top-down intro to NoSQL


Data storage

1.        Unstructured storage

2.        Structured storage


Unstructured storage

1.        Files

2.        Chunks (blocks)


Structured storage

http://en.wikipedia.org/wiki/Structured_storage

1.        SQL

2.        NoSQL


SQL

http://en.wikipedia.org/wiki/SQL

Relational Database

1.        SQL Server

2.        Oracle

3.        MySQL


NoSQL

http://en.wikipedia.org/wiki/Nosql

A movement promoting a loosely defined class of non-relational data stores.

Architecture

http://www.infoq.com/resource/articles/nosql-in-the-enterprise/en/resources/Image1.JPG

Properties

1.        No fixed table schemas

2.        Avoid join operations

3.        Scale horizontally
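The three properties above can be shown in a few lines. What follows is a minimal sketch, not any real product's code. It assumes a hypothetical three-node cluster in which each node is just an in-memory dict. Records are schemaless dicts. Related data (a user's orders) is embedded in the record instead of being joined from another table. Keys are spread across nodes by hashing, so adding machines adds capacity. The node names, record fields, and the functions `node_for`, `put`, and `get` are all illustrative.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical three-node cluster; each "node" is just an in-memory dict here.
NODES: Dict[str, Dict[str, dict]] = {"node-a": {}, "node-b": {}, "node-c": {}}
NODE_NAMES = sorted(NODES)


def node_for(key: str) -> str:
    """Choose a node by hashing the key (simple modulo partitioning)."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return NODE_NAMES[digest % len(NODE_NAMES)]


def put(key: str, record: dict) -> None:
    # No fixed schema: a record is any dict of fields.
    NODES[node_for(key)][key] = record


def get(key: str) -> Optional[dict]:
    # A lookup touches exactly one node; no cross-node join is needed.
    return NODES[node_for(key)].get(key)


# Avoiding joins: the user's orders are embedded in the user record
# instead of living in a separate table joined on a user id.
put("user:1", {"name": "Bob", "orders": [{"item": "sail", "qty": 1}]})
put("user:2", {"name": "Jonathan", "children": 4})
```

Plain modulo partitioning has a known weakness. When the node count changes, most keys map to a different node. Dynamo-style systems use consistent hashing instead, so that only a fraction of the keys move.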

Motivation

Data-intensive applications, such as:

- indexing a large number of documents

- serving pages on high-traffic websites

- delivering streaming media


Taxonomy

Document store

1.        CouchDB

2.        XML database

Graph

1.        Neo4j

Key/value store on disk

1.        BigTable

2.        Memcachedb

Key/value cache in RAM

1.        memcached

Eventually consistent key-value store

1.        Dynamo

2.        Cassandra

Ordered key-value store

1.        Memcachedb

Tabular

1.        BigTable

2.        HBase

Graph Database

http://en.wikipedia.org/wiki/Graph_database

A graph database uses graph structures with nodes, edges, and properties to represent and store information.

Properties

Faster for associative data sets

Maps more directly to the structure of object-oriented applications

Does not require expensive join operations
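To see why traversal avoids joins, here is a minimal sketch in plain Python. It is not Neo4j's API; the graph, the relationship names, and `traverse` are hypothetical. Each node stores its outgoing edges directly, so every hop just follows a stored reference. In a relational schema, the same "friends of friends" query would need one self-join on a relationship table per hop.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical example graph. Nodes carry properties...
nodes: Dict[str, dict] = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "carol": {"type": "person"},
    "project-x": {"type": "project"},
}
# ...and each node stores its outgoing edges directly: (relationship, target).
edges: Dict[str, List[Tuple[str, str]]] = {
    "alice": [("KNOWS", "bob")],
    "bob": [("KNOWS", "carol"), ("CONTRIBUTES_TO", "project-x")],
    "carol": [],
    "project-x": [],
}


def traverse(start: str, rel: str, max_depth: int) -> List[str]:
    """Breadth-first walk along `rel` edges, at most max_depth hops from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    found: List[str] = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for r, target in edges.get(node, []):
            if r == rel and target not in seen:
                seen.add(target)
                found.append(target)
                frontier.append((target, depth + 1))
    return found


print(traverse("alice", "KNOWS", 2))  # → ['bob', 'carol']
```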



Document-oriented database

http://en.wikipedia.org/wiki/Document-oriented_database

- Dynamic fields: each document can have its own set of fields.

- For example, here's a document:

  - FirstName="Bob", Address="5 Oak St.", Hobby="sailing".

- Another document could be:

  - FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2").
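The two documents above can be written as Python dicts to make "dynamic fields" concrete. Both sit in the same collection, yet neither is forced to carry the other's fields. One representation choice is made here for illustration only: the children are modeled as name/age objects, whereas the text uses "name,age" strings. The helpers `find` and `names_with_field` are hypothetical.

```python
# The two documents from the text as Python dicts. Children are modeled as
# name/age objects here (an illustrative choice; the text uses "name,age" strings).
doc1 = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
doc2 = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"name": "Michael", "age": 10},
        {"name": "Jennifer", "age": 8},
        {"name": "Samantha", "age": 5},
        {"name": "Elena", "age": 2},
    ],
}

# A "collection" holds documents with different fields; no schema is declared.
collection = [doc1, doc2]


def find(docs, **criteria):
    """Return documents whose fields equal all the given criteria.
    A document that lacks a field simply does not match on it."""
    return [d for d in docs if all(k in d and d[k] == v for k, v in criteria.items())]


def names_with_field(docs, field):
    """FirstName of every document that has `field` at all."""
    return [d["FirstName"] for d in docs if field in d]
```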



CouchDB

http://en.wikipedia.org/wiki/Couchdb

- Features

  - Document storage

  - ACID semantics

  - Map/Reduce views and indexes

  - Distributed architecture with replication

  - REST API:

    - REST uses the HTTP methods POST, GET, PUT, and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources.

  - MVCC (Multi-Version Concurrency Control):

    - Neither reads nor writes lock the database.

  - Server-side scripting: a pure JavaScript development environment
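The REST-to-CRUD mapping and MVCC can be illustrated with a hypothetical in-memory model. This is not CouchDB's code or client API. It imitates one documented CouchDB behavior: every document carries an `_id` and a `_rev`, and an update that names a stale `_rev` is rejected with 409 Conflict rather than waiting on a lock. The class names, the exceptions standing in for HTTP status codes, and the made-up `N-hex` revision strings are all assumptions.

```python
import uuid


class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class NotFound(Exception):
    """Stands in for an HTTP 404 Not Found response."""


class DocStore:
    """Hypothetical in-memory model of a CouchDB-style document database."""

    def __init__(self):
        # doc_id -> list of revisions, oldest first. Writers never modify a list
        # in place; they replace it, so a reader holding the old list is unaffected.
        self._docs = {}

    def put(self, doc_id, body, rev=None):
        """PUT: create (rev=None) or update (rev must be the current _rev)."""
        history = self._docs.get(doc_id, [])
        current = history[-1]["_rev"] if history else None
        if rev != current:
            raise Conflict(f"{doc_id}: current rev is {current}, got {rev}")
        new_rev = f"{len(history) + 1}-{uuid.uuid4().hex[:8]}"
        self._docs[doc_id] = history + [dict(body, _id=doc_id, _rev=new_rev)]
        return new_rev

    def post(self, body):
        """POST: create a document with a server-generated id."""
        doc_id = uuid.uuid4().hex
        return doc_id, self.put(doc_id, body)

    def get(self, doc_id):
        """GET: read the latest revision without taking any lock."""
        history = self._docs.get(doc_id)
        if not history:
            raise NotFound(doc_id)
        return dict(history[-1])

    def delete(self, doc_id, rev):
        """DELETE: remove the document, again only if rev is current."""
        history = self._docs.get(doc_id)
        if not history:
            raise NotFound(doc_id)
        if rev != history[-1]["_rev"]:
            raise Conflict(f"{doc_id}: current rev is {history[-1]['_rev']}, got {rev}")
        del self._docs[doc_id]


store = DocStore()
rev1 = store.put("bob", {"Hobby": "sailing"})
rev2 = store.put("bob", {"Hobby": "rowing"}, rev=rev1)
try:
    store.put("bob", {"Hobby": "golf"}, rev=rev1)  # stale revision
except Conflict as err:
    print("rejected:", err)
```

This is optimistic concurrency. A writer that loses the race is told so and must re-read and retry, which is what lets reads and writes proceed without locking.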



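Key-Value Database (currently the most widespread)

  HBase vs Cassandra

http://wangxu.me/blog/?p=371

CAP

- HBase: CP (consistency and partition tolerance); modeled on BigTable and GFS; better MapReduce support.

- Cassandra: AP (availability and partition tolerance); a later entrant and more flexible; based on Dynamo.

HBase

- More modular; built from several components.

- Harder to deploy, because several components must be deployed.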




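Cassandra

Cassandra was originally designed and developed at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik (a Facebook engineer). In 2008 Facebook contributed it to the open-source community. In many respects you can think of Cassandra as Dynamo 2.0, or as a combination of Dynamo and BigTable. Cassandra is running in production at Facebook, but it is still under heavy development.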


http://www.ruohai.org/?p=17

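- Background

  - Last September, Digg announced its plan to move to Cassandra. After carefully comparing other projects (HBase, Hypertable, Tokyo Cabinet/Tyrant, Voldemort, and Dynomite), they chose Cassandra.

- Architecture

  - Cluster model: Dynamo

    - (decentralized, plain key-value)

  - Data model: BigTable

- Key

  - Determines which nodes the data is distributed to

  - Keyspace: gives different applications separate scopes, roughly like separate schemas in an RDBMS

  - Each key corresponds to one row

- Value

  - Column: (name, value, timestamp)

  - SuperColumn: (name, sortedlist<Column>)

  - ColumnFamily: equivalent to a table in an RDBMS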
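"The key determines which nodes the data goes to" follows from the Dynamo cluster model: a consistent-hashing ring. Below is a toy sketch of that idea, not Cassandra's implementation. Each node gets one position (token) on a ring. A key is hashed onto the same ring. The key's replicas are the next `n` nodes clockwise. MD5 is used as the example hash; Cassandra's RandomPartitioner also derives tokens from MD5. The node names, the `Ring` class, and the one-token-per-node simplification are assumptions.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Place a string on the ring by hashing it (MD5, as an example hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring: one token per node, no virtual nodes."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, n: int = 3):
        """Walk clockwise from the key's position and take the next n nodes."""
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        count = min(n, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.replicas("ruohai"))  # three distinct nodes, always the same for this key
```

Compared with hashing modulo the node count, adding a node here only takes over the keys in one arc of the ring. Every other key keeps its owner.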

- Example 01

  - Users ColumnFamily

  - Made up of Columns

Users: { // ColumnFamily
    ruohai : { // the user's nick is the row key
        {name: "nick", value: "ruohai", timestamp: "123456"},
        {name: "email", value: "sucode@gmail.com", timestamp: "234567"},
        {name: "website", value: "http://www.ruohai.org", timestamp: "345678"},
        {name: "twitter", value: "sucode", timestamp: "456789"},
        // other properties
    },
    user2 : {
        // ...
    }
}
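The Users example can be modeled in Python to show two behaviors. Cassandra keeps a row's columns sorted by column name, according to the column family's comparator. It also resolves conflicting writes to the same column by timestamp. The sketch below is hypothetical and not the Cassandra client API. `insert` and `slice_row` are illustrative names. Plain string ordering stands in for the comparator. Timestamps are integers rather than the strings used in the example.

```python
from typing import Dict, List, NamedTuple, Optional


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


# ColumnFamily modeled as: row key -> {column name -> Column}
ColumnFamily = Dict[str, Dict[str, Column]]


def insert(cf: ColumnFamily, row_key: str, col: Column) -> None:
    """Write a column; if it already exists, the strictly newer timestamp wins."""
    row = cf.setdefault(row_key, {})
    old = row.get(col.name)
    if old is None or col.timestamp > old.timestamp:
        row[col.name] = col


def slice_row(cf: ColumnFamily, row_key: str,
              start: str = "", finish: Optional[str] = None) -> List[Column]:
    """Return a row's columns sorted by name, limited to names in [start, finish]."""
    row = cf.get(row_key, {})
    return [row[n] for n in sorted(row) if n >= start and (finish is None or n <= finish)]


users: ColumnFamily = {}
for c in [
    Column("nick", "ruohai", 123456),
    Column("email", "sucode@gmail.com", 234567),
    Column("website", "http://www.ruohai.org", 345678),
    Column("twitter", "sucode", 456789),
]:
    insert(users, "ruohai", c)

print([c.name for c in slice_row(users, "ruohai")])
# → ['email', 'nick', 'twitter', 'website']
```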

- Example 02

  - Favorites ColumnFamily

  - Made up of SuperColumns

Favorites: { // ColumnFamily
    ruohai : { // ruohai's favorites; row key
        lining : { // SuperColumn name: the tag the favorites are filed under
            {name: "123", value: "1", timestamp: 123},
            {name: "125", value: "7", timestamp: 125},
            {name: "139", value: "13", timestamp: 139}
        },
        nike : { // another tag
            {name: "223", value: "11", timestamp: 223},
            {name: "225", value: "9", timestamp: 225},
            {name: "239", value: "23", timestamp: 239}
        },
        // ... other tags
    },
    user2 : {
        // user2's favorites, grouped by tag
    }
}
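A SuperColumn adds one more level of nesting: row key, then SuperColumn name, then columns. The sketch below is hypothetical. It models that nesting with nested dicts and answers two questions: which tags does a user have, and what is filed under one tag. The column names and values are copied from the example. The extra `user2` entry, the helpers `add`, `tags`, and `columns_under`, and the use of plain string ordering (standing in for Cassandra's configured comparator) are assumptions.

```python
from typing import Dict, List, Tuple

# Super ColumnFamily modeled as nested dicts:
# row key -> SuperColumn name -> column name -> (value, timestamp)
SuperCF = Dict[str, Dict[str, Dict[str, Tuple[str, int]]]]

favorites: SuperCF = {
    "ruohai": {
        "lining": {"123": ("1", 123), "125": ("7", 125), "139": ("13", 139)},
        "nike": {"223": ("11", 223), "225": ("9", 225), "239": ("23", 239)},
    },
}


def add(cf: SuperCF, row_key: str, super_col: str, name: str, value: str, ts: int) -> None:
    """Insert one column under a SuperColumn, creating the row/SuperColumn as needed."""
    cf.setdefault(row_key, {}).setdefault(super_col, {})[name] = (value, ts)


def tags(cf: SuperCF, row_key: str) -> List[str]:
    """SuperColumn names in a row, sorted (here: plain string order)."""
    return sorted(cf.get(row_key, {}))


def columns_under(cf: SuperCF, row_key: str, super_col: str) -> List[Tuple[str, str]]:
    """(name, value) pairs inside one SuperColumn, sorted by column name."""
    sc = cf.get(row_key, {}).get(super_col, {})
    return [(name, sc[name][0]) for name in sorted(sc)]


add(favorites, "user2", "nike", "301", "5", 301)  # hypothetical data for user2
print(tags(favorites, "ruohai"))  # → ['lining', 'nike']
print(columns_under(favorites, "ruohai", "nike"))
# → [('223', '11'), ('225', '9'), ('239', '23')]
```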

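- Evaluation

  - Reads are relatively slow

  - Writes are relatively fast

- Approach

  - Distributed Key-Value Storage System: Getting Started with Cassandra (IBM developerWorks China, in Chinese)

    - http://www.ibm.com/developerworks/cn/opensource/os-cn-cassandra/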
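The evaluation (slow reads, fast writes) follows from a log-structured design. The toy store below works in the spirit of Cassandra's write and read paths; it is not its code. A write is one append to a commit log plus one in-memory update. When the memtable fills, it is frozen into an immutable sorted table. A read may have to check the memtable and then several tables, newest first. The class name, the tiny `memtable_limit`, and the "tables checked" counter are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple


class LSMStore:
    """Toy log-structured store in the spirit of Cassandra's write/read path."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log: List[Tuple[str, Any]] = []  # append-only, sequential writes
        self.memtable: Dict[str, Any] = {}           # recent writes, in memory
        self.sstables: List[Dict[str, Any]] = []     # immutable sorted tables, newest last
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: Any) -> None:
        # A write is one log append plus one in-memory update: no disk read, no seek.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into a sorted, immutable "SSTable"; the log entries
        # it covers are no longer needed for crash recovery.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key: str) -> Tuple[Optional[Any], int]:
        """Return (value, number of tables checked); the newest version wins."""
        checked = 1
        if key in self.memtable:
            return self.memtable[key], checked
        for table in reversed(self.sstables):
            checked += 1
            if key in table:
                return table[key], checked
        return None, checked


store = LSMStore()
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.write(k, v)
print(store.read("b"))  # → (2, 3)
```

Real systems limit the read cost with compaction, which merges tables, and with Bloom filters, which skip tables that cannot contain the key. Neither is modeled here.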




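Summary

Why NoSQL arose: in today's web systems, the emphasis is increasingly on the data itself (data-based) and less and less on a model of it (model-based); for now, the data does not need a complex structure.

Most applications, such as Twitter and Facebook, only need to store and retrieve data in loosely structured form; they do not need complex computation models or data models.

At the same time, the MapReduce computation model, widely used at Google, is flexible enough to apply to a wide range of computations and can scale horizontally.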
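The MapReduce model mentioned above splits a job into a map step, applied to each input independently, and a reduce step, applied to all values grouped under one key. The classic word count shows the shape. This single-process sketch is illustrative only and is not Google's or Hadoop's API. In a real deployment, the independent map calls are what get spread across many machines. The function names and sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for each word; doc_id is unused, kept to mirror a (key, value) input.
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key (the framework does this between phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    # Combine every value emitted for one key.
    return word, sum(counts)


def map_reduce(docs: Dict[str, str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())


docs = {"d1": "NoSQL scales out", "d2": "MapReduce scales out too"}
print(map_reduce(docs))
# → {'nosql': 1, 'scales': 2, 'out': 2, 'mapreduce': 1, 'too': 1}
```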

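The key-value data model is likewise a loose data structure.

Cassandra, Dynamo, and HBase have similar characteristics.



Most of the text is summarized from Wikipedia; I have given the corresponding links.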

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.


posted @ 2010-08-15 17:09  贺韬