Redis学习笔记1-SDS数据结构

Redis6.0源码及《Redis设计与实现》阅读

简单动态字符串simple dynamic string, redis中默认的字符串表示。

源码阅读:

/src/sds.h & /src/sds.c

SDS的定义

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/* Note: sdshdr5 is never used, we just access the flags byte directly.
* However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
uint8_t len; /* used */
uint8_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
uint16_t len; /* used */
uint16_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
uint32_t len; /* used */
uint32_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
uint64_t len; /* used */
uint64_t alloc; /* excluding the header and null terminator */
unsigned char flags; /* 3 lsb of type, 5 unused bits */
char buf[];
};

SDS有5种不同的类型,使用非字节对齐的结构体让不同的字符串使用不同长度的数据结构,从而节省内存。

SDS 遵循 C 字符串以空字符结尾的惯例, 保存空字符的 1 字节空间不计算在 SDS 的 len 属性里面, 并且为空字符分配额外的 1 字节空间, 以及添加空字符到字符串末尾等操作都是由 SDS 函数自动完成的, 所以这个空字符对于 SDS 的使用者来说是完全透明的。

获取字符串长度

SDS中获取一个字符串长度的复杂度仅为O(1),在len属性中已经记录了字符串的长度,获取长度仅需访问属性即可

杜绝缓冲区溢出

当SDS api需要对SDS进行修改的时候,API会先检查SDS的空间是否满足修改所需的要求,如果不满足,api会自动将SDS空间扩展至执行修改所需要的大小,才进行修改操作,所以这种空间分配策略完全杜绝了发生字符串缓冲区溢出的可能性

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/* Append the specified binary-safe string pointed by 't' of 'len' bytes to the
* end of the specified sds string 's'.
*
* After the call, the passed sds string is no longer valid and all the
* references must be substituted with the new pointer returned by the call. */
sds sdscatlen(sds s, const void *t, size_t len) {
size_t curlen = sdslen(s);

s = sdsMakeRoomFor(s,len);
if (s == NULL) return NULL;
memcpy(s+curlen, t, len);
sdssetlen(s, curlen+len);
s[curlen+len] = '\0';
return s;
}

检查SDS空间是否足够,不够的话通过sdsMakeRoomFor自动为要拼接的字符串拓展空间

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
/* Enlarge the free space at the end of the sds string so that the caller
* is sure that after calling this function can overwrite up to addlen
* bytes after the end of the string, plus one more byte for nul term.
*
* Note: this does not change the *length* of the sds string as returned
* by sdslen(), but only the free buffer space we have. */
sds sdsMakeRoomFor(sds s, size_t addlen) {
void *sh, *newsh;
size_t avail = sdsavail(s);
size_t len, newlen;
char type, oldtype = s[-1] & SDS_TYPE_MASK;
int hdrlen;

/* Return ASAP if there is enough space left. */
if (avail >= addlen) return s;

/* SDS_MAX_PREALLOC is defined to 1MB according to sds.h.*/
len = sdslen(s);
sh = (char*)s-sdsHdrSize(oldtype);
newlen = (len+addlen);
if (newlen < SDS_MAX_PREALLOC)
newlen *= 2;
else
newlen += SDS_MAX_PREALLOC;

/* Change header if necessary.*/
type = sdsReqType(newlen);

/* Don't use type 5: the user is appending to the string and type 5 is
* not able to remember empty space, so sdsMakeRoomFor() must be called
* at every appending operation. */
if (type == SDS_TYPE_5) type = SDS_TYPE_8;

hdrlen = sdsHdrSize(type);
if (oldtype==type) {
newsh = s_realloc(sh, hdrlen+newlen+1);
if (newsh == NULL) return NULL;
s = (char*)newsh+hdrlen;
} else {
/* Since the header size changes, need to move the string forward,
* and can't use realloc */
newsh = s_malloc(hdrlen+newlen+1);
if (newsh == NULL) return NULL;
memcpy((char*)newsh+hdrlen, s, len+1);
s_free(sh);
s = (char*)newsh+hdrlen;
s[-1] = type;
sdssetlen(s, len);
}
sdssetalloc(s, newlen);
return s;
}

减少修改字符串时带来的内存重分配次数

  • 空间预分配

当SDS api对一个sds进行修改并需要对其进行空间扩展的时候,程序还会额外为其分配除了所需的空间之外的未使用空间。

实现方式如上sdsMakeRoomFor

  • 惰性空间释放
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
/* Remove the part of the string from left and from right composed just of
* contiguous characters found in 'cset', that is a null terminted C string.
*
* After the call, the modified sds string is no longer valid and all the
* references must be substituted with the new pointer returned by the call.
*
* Example:
*
* s = sdsnew("AA...AA.a.aa.aHelloWorld :::");
* s = sdstrim(s,"Aa. :");
* printf("%s\n", s);
*
* Output will be just "HelloWorld".
*/
sds sdstrim(sds s, const char *cset) {
char *start, *end, *sp, *ep;
size_t len;

sp = start = s;
ep = end = s+sdslen(s)-1;
while(sp <= end && strchr(cset, *sp)) sp++;
while(ep > sp && strchr(cset, *ep)) ep--;
len = (sp > ep) ? 0 : ((ep-sp)+1);
if (s != sp) memmove(s, sp, len);
s[len] = '\0';
sdssetlen(s,len);
return s;
}

可以看出sdstrim结束后并没有真正释放空间,SDS将多出来的空间作为未使用保存在SDS中,而是修改了结构体中的len属性,这样避免了缩短字符串所需的内存重分配操作,并为将来可能有的增长操作提供了优化。如果有需要真正释放内存空间的时候需要调用sdsRemoveFreeSpace函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
/* Reallocate the sds string so that it has no free space at the end. The
* contained string remains not altered, but next concatenation operations
* will require a reallocation.
*
* After the call, the passed sds string is no longer valid and all the
* references must be substituted with the new pointer returned by the call. */
sds sdsRemoveFreeSpace(sds s) {
void *sh, *newsh;
char type, oldtype = s[-1] & SDS_TYPE_MASK;
int hdrlen, oldhdrlen = sdsHdrSize(oldtype);
size_t len = sdslen(s);
size_t avail = sdsavail(s);
sh = (char*)s-oldhdrlen;

/* Return ASAP if there is no space left. */
if (avail == 0) return s;

/* Check what would be the minimum SDS header that is just good enough to
* fit this string. */
type = sdsReqType(len);
hdrlen = sdsHdrSize(type);

/* If the type is the same, or at least a large enough type is still
* required, we just realloc(), letting the allocator to do the copy
* only if really needed. Otherwise if the change is huge, we manually
* reallocate the string to use the different header type. */
if (oldtype==type || type > SDS_TYPE_8) {
newsh = s_realloc(sh, oldhdrlen+len+1);
if (newsh == NULL) return NULL;
s = (char*)newsh+oldhdrlen;
} else {
newsh = s_malloc(hdrlen+len+1);
if (newsh == NULL) return NULL;
memcpy((char*)newsh+hdrlen, s, len+1);
s_free(sh);
s = (char*)newsh+hdrlen;
s[-1] = type;
sdssetlen(s, len);
}
sdssetalloc(s, len);
return s;
}

二进制数据的存放

SDS api会以处理二进制的方式来处理任何放进buf数组的数据并不会对其中的数据作任何限制、过滤或者假设,数据在线写入的时候时什么样子的,被读取的时候就是什么样子的。

例如,SDS使用len属性来判断字符串是否结束而不是空字符。打个不完全恰当的比方,SDS就不会出现类似于php中%00截断的安全问题。

看源码的时候发现这个有趣的函数,意思是可以在手动修改字符串后,使用sdsupdatelen进行字符串长度的更新,即按照空字符进行截断,而不是上述默认的长度判定方式。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
/* Set the sds string length to the length as obtained with strlen(), so
* considering as content only up to the first null term character.
*
* This function is useful when the sds string is hacked manually in some
* way, like in the following example:
*
* s = sdsnew("foobar");
* s[2] = '\0';
* sdsupdatelen(s);
* printf("%d\n", sdslen(s));
*
* The output will be "2", but if we comment out the call to sdsupdatelen()
* the output will be "6" as the string was modified but the logical length
* remains 6 bytes. */
void sdsupdatelen(sds s) {
size_t reallen = strlen(s);
sdssetlen(s, reallen);
}

Redis版本:6.0

参考部分《Redis设计与实现》及其作者的源码注释,书中版本为3.0

为了看起来和谐,部分代码中自己添加的注释保持使用英文 :)

参考资料:

http://cs-cjl.com/2019/04_13_redis5_data_structure_1_sds

https://github.com/redis/redis

https://github.com/huangz1990/redis-3.0-annotated