redis 源码阅读内部数据结构--字符串

redis的内部数据结构主要有：字符串，双端链表，字典，跳跃表。

这里主要记录redise字符串的设计。相关的源码位于：src/sds.h 和 src/sds.c。

一字符串 sds的结构体

struct sdshdr {
int len; // buf 已占用长度
int free; // buf 剩余可用长度
char buf[]; // 实际保存字符串数据的地方
};

从这个结构可以看出，redis字符串和C的不一样，本质字符串是保存在内存的某一个位置，然后把它的指针放到buf上。.

这种方式对于读取字符串的长度的很快，是O(1)。

另一个原因是redis 对字符串的追加操作比较频繁。这种方式的追加可以减少对内存的申请频度。

对于这种可以举个简单的例子：

struct sdshdr {
len = 11;
free = 0;
buf = "hello world\0"; // buf 的实际长度为 len + 1
};

二字符串的追加

当在buf后追加字符串时，发现free=0或不足于让新追加的字符串加到buf时，就会按照策略去申请更大的空间。如果free的大小足够大，就不会去申请。

申请的策略在sdsMakeRoomFor中。如下是redis的源码。

/* Enlarge the free space at the end of the sds string so that the caller
* is sure that after calling this function can overwrite up to addlen
* bytes after the end of the string, plus one more byte for nul term.
*
* Note: this does not change the *length* of the sds string as returned
* by sdslen(), but only the free buffer space we have. */
sds sdsMakeRoomFor(sds s, size_t addlen) {
struct sdshdr *sh, *newsh;
size_t free = sdsavail(s);
size_t len, newlen;
 
if (free >= addlen) return s;
len = sdslen(s);
sh = (void*) (s-(sizeof(struct sdshdr)));
newlen = (len+addlen);
if (newlen < SDS_MAX_PREALLOC)
newlen *= 2;
else
newlen += SDS_MAX_PREALLOC;
newsh = zrealloc(sh, sizeof(struct sdshdr)+newlen+1);
if (newsh == NULL) return NULL;
 
newsh->free = newlen - len;
return newsh->buf;
}

其中，#define SDS_MAX_PREALLOC (1024*1024) 。如果新字符串的总长度小于 SDS_MAX_PREALLOC。那么为字符串分配 2 倍于所需长度的空间。否则就分配所需长度加上 SDS_MAX_PREALLOC 数量的空间。

三字符串的API

对于这样一个结构体，就应该有对应API提供给其他模块操作。sds对外API都在他的头文件中src/sds.h。

/*
字符串的长度
*/
static inline size_t sdslen(const sds s) {
struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
return sh->len;
}
 
/*
字符串剩余长度
*/
static inline size_t sdsavail(const sds s) {
struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
return sh->free;
}
 
sds sdsnewlen(const void *init, size_t initlen); /* 创建新字符串，内部申请了内存 */
sds sdsnew(const char *init); /* 对sdsnewlen的封装而已 */
sds sdsempty(void); /*创建一个空的字符串 调用sdsnewlen */
size_t sdslen(const sds s);
sds sdsdup(const sds s); /* 拷贝 */
void sdsfree(sds s); /* 释放 */
size_t sdsavail(const sds s);
sds sdsgrowzero(sds s, size_t len); /* 将给定 sds 的 buf 扩展至指定长度，无内容的部分用 \0 来填充 */
sds sdscatlen(sds s, const void *t, size_t len); /* 追加一个C类型的字符串 带长度len */
sds sdscat(sds s, const char *t); /* 调用sdscatlen， 在内部算长度 */
sds sdscatsds(sds s, const sds t); /* 追加一个sds 字符串 */
sds sdscpylen(sds s, const char *t, size_t len); /* 拷贝一个C类型的字符串 带长度len */
sds sdscpy(sds s, const char *t); /* 调用sdscpylen， 在内部算长度 */
 
sds sdscatvprintf(sds s, const char *fmt, va_list ap);
#ifdef __GNUC__
sds sdscatprintf(sds s, const char *fmt, ...)
__attribute__((format(printf, 2, 3)));
#else
sds sdscatprintf(sds s, const char *fmt, ...);
#endif
 
sds sdscatfmt(sds s, char const *fmt, ...);
sds sdstrim(sds s, const char *cset);
void sdsrange(sds s, int start, int end); /* 取出子串 end为负时从后面往前算起 */
void sdsupdatelen(sds s); /* 当手动强制把字符串砍掉时， 要用sdsupd telen更新len和free */
void sdsclear(sds s); /* 清除掉当前的字符串 */
int sdscmp(const sds s1, const sds s2); /* 比较两个字符串 */
/* 把s按sep分割, len是s的长度，seplen是sep的长度 */
sds *sdssplitlen(const char *s, int len, const char *sep, int seplen, int *count);
void sdsfreesplitres(sds *tokens, int count); /* 释放 sdssplitlen 的返回值，sdssplitlen专用啊,其实就是释放一个数组 */
void sdstolower(sds s); /* 转为小写 */
void sdstoupper(sds s); /* 转为大写 */
sds sdsfromlonglong(long long value); /*long long 转为字符串 */
sds sdscatrepr(sds s, const char *p, size_t len);
sds *sdssplitargs(const char *line, int *argc);
sds sdsmapchars(sds s, const char *from, const char *to, size_t setlen);
sds sdsjoin(char **argv, int argc, char *sep);
 
/* Low level functions exposed to the user API */
sds sdsMakeRoomFor(sds s, size_t addlen); /* 按策略申请长度 */
void sdsIncrLen(sds s, int incr);
sds sdsRemoveFreeSpace(sds s);
size_t sdsAllocSize(sds s);

大致看了一些实现，还算是比较清晰。可以了解几个比较主要的函数。其中sdsnewlen是对字符串的初始化。

/* Create a new sds string with the content specified by the 'init' pointer
* and 'initlen'.
* If NULL is used for 'init' the string is initialized with zero bytes.
*
* The string is always null-termined (all the sds strings are, always) so
* even if you create an sds string with:
*
* mystring = sdsnewlen("abc",3");
*
* You can print the string with printf() as there is an implicit \0 at the
* end of the string. However the string is binary safe and can contain
* \0 characters in the middle, as the length is stored in the sds header. */
sds sdsnewlen(const void *init, size_t initlen) {
struct sdshdr *sh;
if (init) {
sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
} else {
sh = zcalloc(sizeof(struct sdshdr)+initlen+1);
}
if (sh == NULL) return NULL;
sh->len = initlen;
sh->free = 0;
if (initlen && init)
memcpy(sh->buf, init, initlen);
sh->buf[initlen] = '\0';
return (char*)sh->buf;
}
/* Free an sds string. No operation is performed if 's' is NULL. */
void sdsfree(sds s) {
if (s == NULL) return;
zfree(s-sizeof(struct sdshdr));
}

其中zmalloc zcalloc 和zfree 是申请内存的。封装malloc和calloc，主要是考虑到跨平台的情况。不过从这个地方可以看出redis在对内存申请与释放到什么独到的管理方式。这种方式用sds字符串，一不小心就可能会内存泄漏了。

四好处与坏外

按照《redis 设计与实现》这书的说法，是：

对比 C 字符串，sds 有以下特性：

可以高效地执行长度计算（ strlen）；

可以高效地执行追加操作（ append）；

二进制安全；

sds 会为追加操作进行优化：加快追加操作的速度，并降低内存分配的次数，代价是多占

用了一些内存，而且这些内存不会被主动释放。

五抽出模块

看redis字符串模块的源码的过程中，抽出简化一些，做了一个test。放在了github上。地址是：https://github.com/CarlosFang/modrds/tree/master/string

posted @ 2016-07-22 15:52 xyecho 阅读(927) 评论(0) 收藏举报

刷新页面返回顶部

我所能努力的~

from myself.

redis 源码阅读内部数据结构--字符串

公告

我所能努力的~

from myself.

redis 源码阅读 内部数据结构--字符串

公告

redis 源码阅读内部数据结构--字符串