A top-down intro to NoSQL
A top-down intro to NoSQL
Data storage
1. No structured storage
2. Structured storage
No structured storage
1. file
2. chunk
Structured storage
http://en.wikipedia.org/wiki/Structured_storage
1. SQL
2. NoSQL
SQL
http://en.wikipedia.org/wiki/SQL
Relational Database
1. SQL Server
2. Oracle
3. MySQL
NoSQL
http://en.wikipedia.org/wiki/Nosql
A movement promoting a loosely defined class.
Architecture
Properties
1. No fixed table schemas
2. Avoid join operations
3. Scale horizontally(水平伸缩)
Motivation
Data-intensive applications, such as:
l indexing a large number of documents
l serving pages on high-traffic websites
l delivering streaming media
Taxonomy
1. CouchDB
2. XML database
1. Neo4j
Key/value store on disk
1. BigTable
2. Memcachedb
Key/value cache in RAM
1. memcached
Eventually‐consistent key‐value store
1. Dynamo
2. Cassandra
Ordered key-value store
1. Memcachedb
Tabular
1. BigTable
2. Hbase
Graph Database
http://en.wikipedia.org/wiki/Graph_database
A database uses graph structures with nodes, edges and properties to represent and store information.
Properties
Faster for associative data sets
Map more to the structure of OOP
Not require expensive join operations
Document-oriented database
http://en.wikipedia.org/wiki/Document-oriented_database
¡ Dynamic Fields
¡ For example here's a document:
l FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
¡ Another document could be:
l FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2").
CouchDB
http://en.wikipedia.org/wiki/Couchdb
¡ Features
l Document Storage
l ACID Semantics
l Map/Reduce Views and Indexes
l Distributed Architecture with Replication
l REST API:
¡ REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources.
l MVCC(Multi-Version-Concurrency-Control)
¡ 读写均不锁定数据库
l 服务端脚本 —— 纯 JavaScript 开发环境
Key-Value Database(目前最广泛)
HBase vs Cassandra
CAP
¡ Hbase:CA,基于BigTable,GFS,对MapReduce支持更好
¡ Cassandra:AP,后来者,更灵活,基于Dynamo
Hbase
l 模块性更强,需要多个组件构成
l 因为要部署多个组件,部署困难
Cassandra
Cassandra最初由Avinash Lakshman (Amazon's Dynamo的作者之一) 和 Prashant Malik ( Facebook工程师)在Facebook设计开发,在2008年Facebook把它贡献给了开源社区。 在很多的地方你可以把Cassandra看成是Dynamo的升级版本2.0或者是Dynamo与BigTable的结合。Cassandra在Facebook投入实际应用运行,但仍然处于大量开发进展阶段。
l Backgound
n Digg在去年九月宣布了他们转向Cassandra的计划,仔细比对了其它项目——HBase,Hypertable,Tokyo Cabinet/Tyrant,Voldemort,以及Dynomite——,他们最终选择了Cassandra
l Architecture
n 集群模型:Dynamo
u (去中心&&单纯的KeyValue)
n 数据模型:BigTable
l Key
n 决定数据份分布在哪些节点上面
n Keyspace:解决不同应用间的作用域问题。相当于不同的scheme。
n 一个Key对应一个行
l Value
n Column: (name, value )
n SuperColumn: (name, sortedlist<Column> )
n ColumnFamily: 相当于RDBM中的Table
l Example 01
l Users ColumnFamily
n 由Column组成
Users: { // ColumnFamily
ruohai : { // 用户的nick作为key
{name: "nick", value: "ruohai", timestamp: "123456"},
{name: "email", value: "sucode@gmail.com", timestamp: "234567"},
{name: "website", value: "http://www.ruohai.org", timestamp: "345678"},
{name: "twitter", value: "sucode", timestamp: "456789"},
// other properties
}
user2 : {
// ...
}
}
l Example 02
n Favourites ColumnFamily
n 由SuperColumn组成
Favorites: { // ColumnFamily
ruohai : { // ruohai的收藏信息, Row key
lining : { // SuperColumn name,表示收藏的tag
{name: "123", value: "1", timestamp: 123},
{name: "125", value: "7", timestamp: 125},
{name: "139", value: "13", timestamp: 139}
},
nike : { // 另一个tag
{name: "223", value: "11", timestamp: 223},
{name: "225", value: "9", timestamp: 225},
{name: "239", value: "23", timestamp: 239}
},
// ... 其他tag
},
user2 : {
// user2的tag收藏信息
}
}
l Evaluation
n 读性能较差
n 写性能较好
l Approach
n 分布式 Key-Value 存储系统:Cassandra 入门
u http://www.ibm.com/developerworks/cn/opensource/os-cn-cassandra/
总结
NoSQL的起因是:目前的Web系统,Data-based越来越明显,Model-based越来越弱化,数据暂时不需要复杂的结构。
大多数应用只需要在松散的数据结构上存取数据。例如twitter,facebook。而不需要复杂的计算模型和数据模型。
同时,Google广泛使用的MapReduce计算模型,
在计算模型上具有广泛适用的灵活性与Scale horizontally(水平伸缩)能力。
其Key-Value数据模型,也是一种松散的数据结构。
Cassandra,Dynamo,Hbase也具有类似特征。
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.
浙公网安备 33010602011771号