逖靖寒's World

A little progress every day

New features in Cassandra-0.7.0-beta1

Cassandra-0.7.0-beta1 was released a while ago. Today I pulled the code and skimmed through it, and found the following main changes:


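1 Keyspaces and ColumnFamilies in the data model can be modified dynamically:

In previous versions, changing a Keyspace or ColumnFamily in Cassandra meant stopping Cassandra, editing the configuration file, and then restarting Cassandra for the change to take effect.

In the current version, we only need to define the new Keyspace and ColumnFamily and then call the Thrift interface to send those definitions to Cassandra.

The relevant struct and interface definitions can be found in the cassandra.thrift file:

/* Related struct definitions. */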

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* describes a column family. */
struct CfDef {
    1: required string keyspace,
    2: required string name,
    3: optional string column_type="Standard",
    4: optional string clock_type="Timestamp",
    5: optional string comparator_type="BytesType",
    6: optional string subcomparator_type="",
    7: optional string reconciler="",
    8: optional string comment="",
    9: optional double row_cache_size=0,
    10: optional bool preload_row_cache=0,
    11: optional double key_cache_size=200000,
    12: optional double read_repair_chance=1.0
    13: optional list<ColumnDef> column_metadata
    14: optional i32 gc_grace_seconds
}

/* describes a keyspace. */
struct KsDef {
    1: required string name,
    2: required string strategy_class,
    3: optional map<string,string> strategy_options,
    4: required i32 replication_factor,
    5: required list<CfDef> cf_defs,
}

/* Related interface definitions. */

/** adds a column family. returns the new schema id. */
string system_add_column_family(1:required CfDef cf_def)
throws (1:InvalidRequestException ire),
   
/** drops a column family. returns the new schema id. */
string system_drop_column_family(1:required string column_family)
throws (1:InvalidRequestException ire),
   
/** renames a column family. returns the new schema id. */
string system_rename_column_family(1:required string old_name, 2:required string new_name)
throws (1:InvalidRequestException ire),

/** adds a keyspace and any column families that are part of it. returns the new schema id. */
string system_add_keyspace(1:required KsDef ks_def)
throws (1:InvalidRequestException ire),

/** drops a keyspace and any column families that are part of it. returns the new schema id. */
string system_drop_keyspace(1:required string keyspace)
throws (1:InvalidRequestException ire),

/** renames a keyspace. returns the new schema id. */
string system_rename_keyspace(1:required string old_name, 2:required string new_name)
throws (1:InvalidRequestException ire),
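
To make the call sequence concrete, here is a minimal Python sketch. It does not use the real generated Thrift bindings: the dataclasses only mirror a subset of the fields of the structs quoted above, and `InMemorySchemaClient` is a hypothetical stand-in that keeps definitions in a dictionary instead of talking to a cluster. The keyspace name, column family names, strategy class string, and the UUID used as a "schema id" are all placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class CfDef:
    # Mirrors a subset of the CfDef struct quoted above.
    keyspace: str
    name: str
    column_type: str = "Standard"
    comparator_type: str = "BytesType"


@dataclass
class KsDef:
    # Mirrors the KsDef struct quoted above.
    name: str
    strategy_class: str
    replication_factor: int
    cf_defs: List[CfDef]
    strategy_options: Dict[str, str] = field(default_factory=dict)


class InvalidRequestException(Exception):
    """Local stand-in for the Thrift exception of the same name."""


class InMemorySchemaClient:
    """Hypothetical stand-in for a generated Thrift client (no network)."""

    def __init__(self) -> None:
        self.keyspaces: Dict[str, KsDef] = {}

    def system_add_keyspace(self, ks_def: KsDef) -> str:
        if ks_def.name in self.keyspaces:
            raise InvalidRequestException("keyspace exists: " + ks_def.name)
        self.keyspaces[ks_def.name] = ks_def
        return str(uuid.uuid4())  # placeholder for the new schema id

    def system_add_column_family(self, cf_def: CfDef) -> str:
        ks = self.keyspaces.get(cf_def.keyspace)
        if ks is None:
            raise InvalidRequestException("unknown keyspace: " + cf_def.keyspace)
        if any(cf.name == cf_def.name for cf in ks.cf_defs):
            raise InvalidRequestException("column family exists: " + cf_def.name)
        ks.cf_defs.append(cf_def)
        return str(uuid.uuid4())  # placeholder for the new schema id


client = InMemorySchemaClient()

# Define a keyspace together with one column family, then add a second
# column family afterwards, without any restart in between.
demo = KsDef(
    name="Demo",
    strategy_class="ExampleStrategy",  # placeholder, not a real class name
    replication_factor=1,
    cf_defs=[CfDef(keyspace="Demo", name="Users")],
)
client.system_add_keyspace(demo)
client.system_add_column_family(CfDef(keyspace="Demo", name="Posts"))

print([cf.name for cf in client.keyspaces["Demo"].cf_defs])
```

With a real cluster, the same two calls would presumably be made on the client generated from cassandra.thrift, using its own CfDef and KsDef classes.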

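2 Secondary indexes added, making it possible to query by Column value:

Like almost all K/V systems, Cassandra could only query by key. To find the rows in which some column holds a specific value, we had to fetch all the data and iterate over it, or use some other scheme to improve query efficiency and avoid a full table scan. Examples include my earlier article "反转Cassandra索引" (Inverting the Cassandra Index) and a project called Lucandra.

To use secondary indexes in the new version, you specify in the ColumnFamily which Columns should be indexed, together with the index type (currently only IndexType.KEYS is supported).

When a ColumnFamily that contains indexes is created in Cassandra, Cassandra additionally creates a separate IndexedColumnFamily for each Column that needs an index.

When data is written, it is stored not only in the ColumnFamily it belongs to; the IndexedColumnFamily also stores all the data relevant to that index.

When data is queried by index, Cassandra reads the corresponding data directly from the IndexedColumnFamily.

The relevant struct and interface definitions can be found in the cassandra.thrift file:

/* Related struct definitions. */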
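
The write and read paths described above can be illustrated with a toy in-memory model. This is only a sketch of the general idea (one extra value-to-keys structure per indexed column, updated on every write), not Cassandra's actual implementation; the class `IndexedTable` and its method names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class IndexedTable:
    """Toy model: a base 'column family' plus one index per indexed column."""

    def __init__(self, indexed_columns: Iterable[str]) -> None:
        # Base data: row key -> {column name -> value}
        self.rows: Dict[str, Dict[str, str]] = {}
        # One index per indexed column: value -> set of row keys
        self.indexes: Dict[str, Dict[str, Set[str]]] = {
            col: defaultdict(set) for col in indexed_columns
        }

    def insert(self, key: str, column: str, value: str) -> None:
        row = self.rows.setdefault(key, {})
        if column in self.indexes:
            old = row.get(column)
            if old is not None:
                # Drop the stale index entry before adding the new one.
                self.indexes[column][old].discard(key)
            self.indexes[column][value].add(key)
        row[column] = value

    def keys_where(self, column: str, value: str) -> List[str]:
        """Look up row keys through the index, without scanning self.rows."""
        if column not in self.indexes:
            raise KeyError("column is not indexed: " + column)
        return sorted(self.indexes[column].get(value, set()))


table = IndexedTable(indexed_columns=["state"])
table.insert("alice", "state", "TX")
table.insert("bob", "state", "CA")
table.insert("carol", "state", "TX")
table.insert("alice", "state", "CA")  # overwrite: alice moves from TX to CA

print(table.keys_where("state", "TX"))  # ['carol']
print(table.keys_where("state", "CA"))  # ['alice', 'bob']
```

The cost of this design is visible even in the toy version: every write to an indexed column touches two structures, in exchange for lookups by value that never scan the base data.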

enum IndexType {
    KEYS,
}

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* Related interface definitions. */

/** Returns the subset of columns specified in SlicePredicate for the rows matching the IndexClause */
list<KeySlice> get_indexed_slices(1:required ColumnParent column_parent,
				2:required IndexClause index_clause,
				3:required SlicePredicate column_predicate,
				4:required ConsistencyLevel consistency_level=ONE)
throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),

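3 Configuration file format change

The new version of Cassandra uses the YAML format for configuration, which is more readable.

We can compare how the cluster name option looks in the two formats:

Old version (storage-conf.xml):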

<!--
~ The name of this cluster. This is mainly used to prevent machines in
~ one logical cluster from joining another.
-->
<ClusterName>Test Cluster</ClusterName>

New version (cassandra.yaml):

# name of the cluster
cluster_name: 'Test Cluster'
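
As a small illustration of how the two formats correspond, the following sketch reads the cluster name from an old-style XML fragment and prints the equivalent YAML line. It uses only the Python standard library. The helper `cluster_name_to_yaml` is hypothetical, and the enclosing `<Storage>` root element is an assumption added so that the fragment parses as a complete XML document.

```python
import xml.etree.ElementTree as ET

# Old-style fragment; the <Storage> wrapper is assumed for this example.
OLD_CONFIG = """<Storage>
  <!-- The name of this cluster. -->
  <ClusterName>Test Cluster</ClusterName>
</Storage>"""


def cluster_name_to_yaml(xml_text: str) -> str:
    """Return the YAML line equivalent to the <ClusterName> element."""
    root = ET.fromstring(xml_text)
    name = root.findtext("ClusterName")
    if name is None:
        raise ValueError("no <ClusterName> element found")
    # In a single-quoted YAML scalar, a literal quote is written as ''.
    escaped = name.strip().replace("'", "''")
    return "cluster_name: '" + escaped + "'"


print(cluster_name_to_yaml(OLD_CONFIG))  # cluster_name: 'Test Cluster'
```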

Beyond these, there are many other changes:

 

0.7-beta1
* sstable versioning (CASSANDRA-389)
* switched to slf4j logging (CASSANDRA-625)
* add (optional) expiration time for column (CASSANDRA-699)
* access levels for authentication/authorization (CASSANDRA-900)
* add ReadRepairChance to CF definition (CASSANDRA-930)
* fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
* convert to byte[] keys internally and all public APIs (CASSANDRA-767)
* ability to alter schema definitions on a live cluster (CASSANDRA-44)
* renamed configuration file to cassandra.xml, and log4j.properties to
log4j-server.properties, which must now be loaded from the classpath (which
is how our scripts in bin/ have always done it) (CASSANDRA-971)
* change get_count to require a SlicePredicate. create multi_get_count
(CASSANDRA-744)
* re-organized endpointsnitch implementations and added SimpleSnitch
(CASSANDRA-994)
* Added preload_row_cache option (CASSANDRA-946)
* add CRC to commitlog header (CASSANDRA-999)
* removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
* add truncate thrift method (CASSANDRA-531)
* http mini-interface using mx4j (CASSANDRA-1068)
* optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
* replace constant-size 2GB mmaped segments and special casing for index
entries spanning segment boundaries, with SegmentedFile that computes
segments that always contain entire entries/rows (CASSANDRA-1117)
* avoid reading large rows into memory during compaction (CASSANDRA-16)
* added hadoop OutputFormat (CASSANDRA-1101)
* efficient Streaming (no more anticompaction) (CASSANDRA-579)
* split commitlog header into separate file and add size checksum to
mutations (CASSANDRA-1179)
* avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
* revise HH schema to be per-endpoint (CASSANDRA-1142)
* add joining/leaving status to nodetool ring (CASSANDRA-1115)
* allow multiple repair sessions per node (CASSANDRA-1190)
* optimize away MessagingService for local range queries (CASSANDRA-1261)
* make framed transport the default so malformed requests can't OOM the
server (CASSANDRA-475)
* significantly faster reads from row cache (CASSANDRA-1267)
* take advantage of row cache during range queries (CASSANDRA-1302)
* make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
* keep persistent row size and column count statistics (CASSANDRA-1155)
* add IntegerType (CASSANDRA-1282)
* page within a single row during hinted handoff (CASSANDRA-1327)
* push DatacenterShardStrategy configuration into keyspace definition,
eliminating datacenter.properties. (CASSANDRA-1066)
* optimize forward slices starting with '' and single-index-block name
queries by skipping the column index (CASSANDRA-1338)
* streaming refactor (CASSANDRA-1189)
* faster comparison for UUID types (CASSANDRA-1043)
* secondary index support (CASSANDRA-749 and subtasks)

 

More articles about Cassandra: http://www.cnblogs.com/gpcuster/tag/Cassandra/

posted on 2010-08-20 14:45 by 逖靖寒