This article walks through how an unexplained, sudden increase in Redis memory usage was tracked down.
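I. Symptoms
Instance: r-bp1cxxxxxxxxxd04 (master-replica)
Problem: memory grew by 2 GB within one minute, as shown in the figure below.
Number of keys: about 60 million
[Figure: memory grows by 2 GB within one minute]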
II. Redis memory analysis
1. Memory composition
The memory plotted in the figure above is the used_memory field reported by the Redis INFO memory command, for example:
redis> info memory
# Memory
used_memory:9195978072
used_memory_human:8.56G
used_memory_rss:9358786560
used_memory_peak:10190212744
used_memory_peak_human:9.49G
used_memory_lua:38912
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0
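Meaning of each field:
used_memory: total memory allocated by Redis's memory allocator, i.e. the memory Redis actually uses to store its data
used_memory_human: used_memory in human-readable form
used_memory_rss: total physical memory occupied by the Redis process, as seen by the operating system
used_memory_peak: the largest amount of memory the allocator has ever allocated, i.e. the historical peak of used_memory
used_memory_peak_human: used_memory_peak in human-readable form
used_memory_lua: memory consumed by the Lua engine
mem_fragmentation_ratio: the ratio used_memory_rss / used_memory, i.e. the memory fragmentation ratio
mem_allocator: the memory allocator Redis uses (jemalloc by default)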
The relationships are:
used_memory = self memory + object memory + buffer memory + Lua memory
used_memory_rss = used_memory + memory fragmentation
[Figure: composition of Redis memory]
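The second relationship is what mem_fragmentation_ratio expresses. As a minimal sketch (in Java, the language of the test code later in this article), the class below parses INFO memory text into fields and derives both the ratio and the fragmentation in bytes. The class and method names are hypothetical, and it works on a text string rather than a live connection.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: parse INFO memory text and derive fragmentation figures. */
final class InfoMemory {
    /** Turn "field:value" lines into a map, skipping "# Section" headers and blank lines. */
    static Map<String, String> parse(String info) {
        Map<String, String> fields = new HashMap<>();
        for (String raw : info.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return fields;
    }

    /** used_memory_rss / used_memory, as described for mem_fragmentation_ratio. */
    static double fragmentationRatio(Map<String, String> fields) {
        long rss = Long.parseLong(fields.get("used_memory_rss"));
        long used = Long.parseLong(fields.get("used_memory"));
        return (double) rss / used;
    }

    /** Approximate fragmentation in bytes: used_memory_rss - used_memory. */
    static long fragmentationBytes(Map<String, String> fields) {
        return Long.parseLong(fields.get("used_memory_rss")) - Long.parseLong(fields.get("used_memory"));
    }

    public static void main(String[] args) {
        String sample = "# Memory\nused_memory:9195978072\nused_memory_rss:9358786560\n";
        Map<String, String> f = parse(sample);
        System.out.printf(Locale.ROOT, "ratio=%.2f, fragmentation=%d bytes%n",
                fragmentationRatio(f), fragmentationBytes(f));
    }
}
```

With the sample values from the output above, main prints `ratio=1.02, fragmentation=162808488 bytes`, i.e. roughly 155 MiB of fragmentation.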
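2. Memory breakdown
(1) Self memory: an empty Redis instance uses very little memory, so this part is negligible.
(2) Key-value memory: key objects + value objects.
(3) Buffers: client buffers (normal clients + replicas, which the master treats as pseudo-clients + pub/sub clients) and the AOF buffer (fairly stable; usually not a problem).
(4) Lua: memory consumed by the Lua engine.
3. Common causes of sudden memory growth
(1) Key-value memory: big keys, or a burst of writes.
(2) Client buffers: typically normal client output buffers (for example, a client running MONITOR) or pub/sub client buffers.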
III. Troubleshooting
(1) Big keys? A big-key scan found none:
Sampled 67234427 keys in the keyspace!
Total key length in bytes is 1574032382 (avg len 23.41)
Biggest string found CCARD_DEVICE_CARD_REF_MAP_KEY_016817000004209 has 20862 bytes
Biggest list found CCARD_VALID_DEVICE_TRAIN_QUEUE_KEY has 51 items
Biggest hash found CCARD_VALID_DEVICE_TRAIN_MAP_KEY has 51 fields
67234359 strings with 71767890 bytes (100.00% of keys, avg size 1.07)
67 lists with 151 items (00.00% of keys, avg size 2.25)
0 sets with 0 members (00.00% of keys, avg size 0.00)
1 hashs with 51 fields (00.00% of keys, avg size 51.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
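(2) A jump in the number of keys? No noticeable change in the key count.
(3) Client buffers
Memory stayed high for a long time after the jump. If buffers were the cause, INFO clients would show an obvious anomaly, but running it gave:
redis> info clients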
# Clients
connected_clients:43
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
admin_clients:6
rejected_vpc_conn_count:0
close_idle_unknown_conn_count:0
CLIENT LIST also showed no client with omem greater than 0:
id=80207 addr=10.xx.0.4:63920 fd=46 name= age=624 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80215 addr=10.xx.0.23:43489 fd=36 name= age=591 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80366 addr=10.xx.0.8:59785 fd=18 name= age=84 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=del read=0 write=0 type=user
id=80356 addr=10.xx.0.33:32117 fd=13 name= age=114 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80064 addr=10.xx.59.4:53446 fd=38 name= age=1070 idle=1070 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=80276 addr=10.xx.0.23:48511 fd=8 name= age=387 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80188 addr=10.xx.0.33:16265 fd=42 name= age=681 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80326 addr=10.xx.0.32:59779 fd=16 name= age=209 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80065 addr=10.xx.59.4:53447 fd=45 name= age=1070 idle=1070 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=79936 addr=10.xx.0.22:10607 fd=30 name= age=1480 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80174 addr=10.xx.0.5:60914 fd=6 name= age=722 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80300 addr=10.xx.0.22:22757 fd=48 name= age=298 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80037 addr=10.xx.0.5:55189 fd=15 name= age=1143 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80330 addr=10.xx.0.8:48533 fd=17 name= age=199 idle=10 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79896 addr=10.xx.0.30:26814 fd=11 name= age=1616 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80299 addr=10.xx.0.24:11227 fd=44 name= age=303 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80086 addr=10.xx.0.32:52526 fd=40 name= age=1002 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80202 addr=10.xx.0.33:16658 fd=26 name= age=636 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80256 addr=10.xx.0.24:60496 fd=19 name= age=448 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79908 addr=10.xx.0.29:18975 fd=12 name= age=1583 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80365 addr=10.xx.0.29:46429 fd=14 name= age=85 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79869 addr=10.xx.27.4:48455 fd=35 name= age=1700 idle=1700 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=80334 addr=10.xx.0.23:50012 fd=39 name= age=189 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80041 addr=10.xx.0.32:51107 fd=33 name= age=1132 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=79992 addr=10.xx.0.22:12068 fd=28 name= age=1289 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80251 addr=10.xx.0.30:44213 fd=23 name= age=468 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80006 addr=10.xx.0.2:45895 fd=31 name= age=1242 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80321 addr=10.xx.0.30:48048 fd=5 name= age=224 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80381 addr=10.xx.0.8:13360 fd=22 name= age=24 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=del read=0 write=0 type=user
id=80200 addr=10.xx.0.24:59183 fd=24 name= age=640 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80113 addr=10.xx.0.2:52492 fd=21 name= age=915 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=174 addr=11.216.117.242:53027 fd=9 name= age=281390 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=replconf read=0 write=0 type=admin
id=79991 addr=10.xx.0.4:48412 fd=25 name= age=1296 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80301 addr=127.0.0.1:47869 fd=49 name= age=291 idle=261 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=strlen read=0 write=0 type=admin
id=80047 addr=10.xx.59.4:53184 fd=41 name= age=1114 idle=1114 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=80236 addr=10.xx.0.5:62546 fd=47 name= age=516 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80364 addr=10.xx.0.4:18794 fd=7 name= age=85 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80175 addr=10.xx.0.4:62245 fd=29 name= age=718 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80336 addr=10.xx.0.29:45701 fd=50 name= age=180 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80050 addr=10.xx.59.4:53188 fd=43 name= age=1114 idle=1114 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL read=0 write=0 type=admin
id=79765 addr=10.xx.0.2:33832 fd=37 name= age=2027 idle=177 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info read=0 write=0 type=user
id=80170 addr=10.xx.0.2:57853 fd=20 name= age=728 idle=24 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping read=0 write=0 type=user
id=80390 addr=127.0.0.1:49449 fd=27 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client read=0 write=0 type=admin
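With 43 connected clients, checking omem by eye is tedious and easy to get wrong. A minimal sketch (the class and method names are hypothetical) that scans CLIENT LIST text and reports clients whose omem exceeds a threshold. It assumes the standard format, one client per line with space-separated field=value pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: find CLIENT LIST entries with a non-trivial output buffer (omem). */
final class ClientListScan {
    /** Split one line such as "id=1 addr=... omem=0 cmd=ping" into field=value pairs. */
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1)); // "name=" yields ""
            }
        }
        return fields;
    }

    /** Return "id addr omem=N" for every client whose omem is greater than minBytes. */
    static List<String> clientsWithOutputBuffer(String clientList, long minBytes) {
        List<String> hits = new ArrayList<>();
        for (String line : clientList.split("\\r?\\n")) {
            if (line.trim().isEmpty()) continue;
            Map<String, String> f = parseLine(line);
            long omem = Long.parseLong(f.getOrDefault("omem", "0"));
            if (omem > minBytes) {
                hits.add(f.get("id") + " " + f.get("addr") + " omem=" + omem);
            }
        }
        return hits;
    }
}
```

An empty result for `clientsWithOutputBuffer(text, 0)` corresponds to the conclusion above: no client is holding output-buffer memory.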
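IV. Finding the culprit
We had tried all the usual checks and still had no answer. My colleague @径远 helped analyze the problem and suspected that Redis's key-value hash table was being rehashed.
1. How Redis stores key-value pairs
All key-value pairs in Redis are stored in a dict whose ht field holds two hash tables, ht[0] and ht[1]. Normally one of them stores the data and the other sits idle; ht[1] is only used during a rehash.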
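To make the two-table layout concrete, here is a toy Java model; it is an illustration under simplifying assumptions, not Redis's actual C implementation. Redis does not move every entry at once: it migrates buckets gradually as the dict is accessed (plus a periodic background step), so for a while both tables are allocated. The sketch models only the per-access step, and all names (ToyDict, rehashStep, and so on) are hypothetical.

```java
/** Toy model of a Redis-style dict: ht[1] is allocated only while an incremental rehash runs. */
final class ToyDict {
    private static final class Entry {
        final String key;
        String value;
        Entry next;

        Entry(String key, String value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[][] ht = {new Entry[4], null}; // ht[1] stays null unless rehashing
    private final long[] used = {0, 0};
    private int rehashIdx = -1;                          // next ht[0] bucket to migrate; -1 = idle

    boolean isRehashing() { return rehashIdx != -1; }

    int tableSize(int t) { return ht[t] == null ? 0 : ht[t].length; }

    long size() { return used[0] + used[1]; }

    private static int bucket(String key, int tableSize) {
        return key.hashCode() & (tableSize - 1); // table sizes are always powers of two
    }

    /** Migrate one non-empty bucket from ht[0] to ht[1]; promote ht[1] once ht[0] is drained. */
    private void rehashStep() {
        if (!isRehashing()) return;
        while (ht[0][rehashIdx] == null) rehashIdx++; // used[0] > 0, so a non-empty bucket exists
        for (Entry e = ht[0][rehashIdx]; e != null; ) {
            Entry next = e.next;
            int b = bucket(e.key, ht[1].length);
            e.next = ht[1][b];
            ht[1][b] = e;
            used[0]--;
            used[1]++;
            e = next;
        }
        ht[0][rehashIdx++] = null;
        if (used[0] == 0) { // old table is empty: drop it and make ht[1] the main table
            ht[0] = ht[1];
            used[0] = used[1];
            ht[1] = null;
            used[1] = 0;
            rehashIdx = -1;
        }
    }

    private Entry find(String key) {
        int tables = isRehashing() ? 2 : 1; // during a rehash a key may live in either table
        for (int t = 0; t < tables; t++) {
            for (Entry e = ht[t][bucket(key, ht[t].length)]; e != null; e = e.next) {
                if (e.key.equals(key)) return e;
            }
        }
        return null;
    }

    /** Once elements reach the slot count, allocate ht[1] at double the size and start rehashing. */
    private void expandIfNeeded() {
        if (isRehashing() || used[0] < ht[0].length) return;
        ht[1] = new Entry[ht[0].length * 2];
        rehashIdx = 0;
    }

    void put(String key, String value) {
        rehashStep(); // every access does a little migration work
        Entry existing = find(key);
        if (existing != null) {
            existing.value = value;
            return;
        }
        expandIfNeeded();
        int t = isRehashing() ? 1 : 0; // new keys go straight to ht[1] during a rehash
        int b = bucket(key, ht[t].length);
        ht[t][b] = new Entry(key, value, ht[t][b]);
        used[t]++;
    }

    String get(String key) {
        rehashStep();
        Entry e = find(key);
        return e == null ? null : e.value;
    }
}
```

While isRehashing() is true, both bucket arrays are alive at once; that overlap is the memory the rest of this section measures.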
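2. Dict rehashing in Redis
To keep the hash table's load factor in check, Redis expands the table through a rehash once the number of elements reaches the number of slots. After expansion, the capacity of ht[1] is the smallest power of two (2^n) that is greater than or equal to ht[0].size * 2. For example, if the hash table starts with a capacity of 4, the next expansion gives 8, then 16, and so on.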
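The rule above makes the next jump predictable from the key count alone. A minimal sketch, assuming keys are only ever added (no deletes or shrinking) and that the expansion happens as soon as the rule fires; the class and method names are hypothetical.

```java
/** Hypothetical helper applying the expansion rule above to a key count. */
final class RehashThreshold {
    static final long INITIAL_SIZE = 4; // initial capacity from the example above

    /** Smallest power of two that is >= n, never below the initial size. */
    static long nextPower(long n) {
        long size = INITIAL_SIZE;
        while (size < n) {
            size <<= 1;
        }
        return size;
    }

    /**
     * Slot count of a table that has grown to hold `keys` keys. A full table of `size`
     * slots doubles when key number size + 1 arrives, so this is nextPower(keys).
     */
    static long tableSize(long keys) {
        return nextPower(keys);
    }

    /** Key count at which the next expansion fires: one more key than the current slot count. */
    static long nextExpansionAt(long keys) {
        return tableSize(keys) + 1;
    }

    public static void main(String[] args) {
        long keys = 60_000_000L; // roughly the key count of the instance in this article
        System.out.println(tableSize(keys));       // 67108864 (2^26)
        System.out.println(nextExpansionAt(keys)); // 67108865
    }
}
```

For an instance holding about 60 million keys, the tables already have 2^26 slots, and the next doubling fires at key number 67,108,865.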
3. Testing
(1) Test method
First bulk-load keys to just below the rehash threshold, then write keys one at a time while watching memory.
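import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;

public class RehashMemoryTest {
    public static void main(String[] args) throws InterruptedException {
        // Connect to a local test instance (host and port are assumptions)
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // Give every key a 1-day TTL, so each key is also stored in the expires table
            int expireTime = 60 * 60 * 24;
            // Rehash threshold minus 50, so the memory change at rehash is easy to observe
            int rehashThreshold = (int) Math.pow(2, 25) - 50;
            // 1. Bulk load through a pipeline, syncing every 10000 commands.
            //    Acceptable for a local test; do not do this in production.
            Pipeline pipeline = jedis.pipelined();
            for (int i = 0; i < rehashThreshold; i++) {
                pipeline.setex(String.valueOf(i), expireTime, String.valueOf(i));
                if (i % 10000 == 0) {
                    pipeline.sync();
                }
            }
            pipeline.sync();
            // 2. Wait, then write keys one per second across the threshold
            TimeUnit.SECONDS.sleep(5);
            for (int i = rehashThreshold; i < rehashThreshold + 200; i++) {
                jedis.setex(String.valueOf(i), expireTime, String.valueOf(i));
                TimeUnit.SECONDS.sleep(1);
            }
        }
    }
}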
(2) Results
(a) Threshold = 2^15 = 32768: when the key count reaches 32769, memory rises a little (4.69M to 5.44M), but the change is not yet obvious.
keys mem clients blocked requests connections
32766 4.69M 3 0 32797 (+2) 4
32767 4.69M 3 0 32799 (+2) 4
32768 4.69M 3 0 32801 (+2) 4
32769 5.44M 3 0 32803 (+2) 4
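(b) Threshold = 2^20 = 1048576: when the key count reaches 1048577, memory jumps by 32 MB. The rehash expands the tables, so the new hash tables have 2^21 slots, and there are two of them because every key has an expiration time and is therefore also in the expires table. Each slot is an 8-byte pointer, so 2^21 × 2 × 8 = 2^25 bytes = 32 MB.
keys mem clients blocked requests connections
1048574 128.69M 3 0 3364129 (+2) 16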
1048575 128.69M 3 0 3364131 (+2) 16
1048576 128.69M 3 0 3364133 (+2) 16
1048577 160.69M 3 0 3364135 (+2) 16
1048578 160.69M 3 0 3364137 (+2) 16
(c) Threshold = 2^26 = 67108864: when the key count reaches 67108865, memory jumps by 2 GB. By the same reasoning, the new hash tables have 2^27 slots, two of them (main table plus expires table), with 8-byte pointers: 2^27 × 2 × 8 = 2^31 bytes = 2 GB.
keys mem clients blocked requests connections
67108862 9.70G 3 0 70473683 (+2) 18
67108863 9.70G 3 0 70473685 (+2) 18
67108864 9.70G 3 0 70473687 (+2) 18
67108865 11.70G 3 0 70473689 (+2) 18
67108866 11.70G 3 0 70473691 (+2) 18
67108867 11.70G 3 0 70473693 (+2) 18
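The arithmetic in (b) and (c) follows one formula: when tables of 2^n slots double, each dict gets a new bucket array of 2^(n+1) slots × 8 bytes, counted twice when every key is also in the expires table. A minimal sketch, assuming a 64-bit build (8-byte pointers); it counts only the bucket arrays, not the entries, and the names are hypothetical.

```java
/** Hypothetical helper reproducing the arithmetic in (b) and (c). */
final class RehashMemory {
    static final long POINTER_BYTES = 8; // assumes a 64-bit build: one pointer per slot

    /**
     * Bytes allocated for the new bucket arrays when tables of 2^exp slots double to
     * 2^(exp + 1) slots. dicts = 2 when every key has a TTL (main table + expires table).
     */
    static long newTableBytes(int exp, int dicts) {
        return (1L << (exp + 1)) * POINTER_BYTES * dicts;
    }

    public static void main(String[] args) {
        System.out.println((newTableBytes(20, 2) >> 20) + " MB"); // case (b): 32 MB
        System.out.println((newTableBytes(26, 2) >> 30) + " GB"); // case (c): 2 GB
    }
}
```

Both printed values match the jumps observed in the measurements above.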
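Looking back at the key-count and memory charts for r-bp1c15fd9b142d04 confirms the rule above: the big-key scan sampled 67,234,427 keys, which is past 2^26 = 67,108,864, the threshold at which test (c) showed the 2 GB jump.
[Figure: key count and memory of r-bp1c15fd9b142d04]
4. Follow-up observation
At 17:00 the rehash finished and memory dropped by half of the 2 GB increase: once every bucket has been migrated, the old ht[0] tables (2^26 slots each) are freed, and only the doubled tables remain.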
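The "half" follows from the same arithmetic as test (c), taking 2^26-slot tables doubling to 2^27 slots, two dicts (main and expires), and 8-byte pointers. During the rehash both generations of tables exist; afterwards only the new ones remain, so the net growth is the difference:

```latex
\begin{aligned}
\Delta_{\text{during rehash}} &= 2 \times 2^{27} \times 8\ \text{B} = 2^{31}\ \text{B} = 2\ \text{GB} \\
\Delta_{\text{after rehash}}  &= 2 \times \left(2^{27} - 2^{26}\right) \times 8\ \text{B} = 2^{30}\ \text{B} = 1\ \text{GB}
\end{aligned}
```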