In this post, the 丸趣 TV editors share a worked example of handling a Ceph pg unfound condition. Most people are not very familiar with this situation, so I'm sharing the walkthrough for reference; I hope you get something out of it. Let's dig in.
While checking the Ceph cluster today I found a PG with an unfound object, hence this post~~~
1. Check the cluster status
[root@k8snode001 ~]# ceph health detail
HEALTH_ERR 1/973013 objects unfound (0.000%); 17 scrub errors; Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair; Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
OBJECT_UNFOUND 1/973013 objects unfound (0.000%)
    pg 2.2b has 1 unfound objects
OSD_SCRUB_ERRORS 17 scrub errors
PG_DAMAGED Possible data damage: 1 pg recovery_unfound, 8 pgs inconsistent, 1 pg repair
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
    pg 2.44 is active+clean+inconsistent, acting [14,8,21]
    pg 2.73 is active+clean+inconsistent, acting [25,14,8]
    pg 2.80 is active+clean+scrubbing+deep+inconsistent+repair, acting [4,8,14]
    pg 2.83 is active+clean+inconsistent, acting [14,13,6]
    pg 2.ae is active+clean+inconsistent, acting [14,3,2]
    pg 2.c4 is active+clean+inconsistent, acting [8,21,14]
    pg 2.da is active+clean+inconsistent, acting [23,14,15]
    pg 2.fa is active+clean+inconsistent, acting [14,23,25]
PG_DEGRADED Degraded data redundancy: 1/2919039 objects degraded (0.000%), 1 pg degraded
    pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound
The key line in the output is: pg 2.2b is active+recovery_unfound+degraded, acting [14,22,4], 1 unfound.
Now let's query pg 2.2b and look at its detailed information.
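Before going further, it can help to see exactly which object is unfound. Ceph exposes this through the list_unfound subcommand (a standard Ceph command; its output depends on the cluster, so none is shown here):
[root@k8snode001 ~]# ceph pg 2.2b list_unfound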
[root@k8snode001 ~]# ceph pg dump_json pools | grep 2.2b
dumped all
2.2b  2487  1  1  0  1  9533198403  3048  3048  active+recovery_unfound+degraded  2020-07-23 08:56:07.669903  10373'5448370  10373:7312614  [14,22,4]  14  [14,22,4]  14  10371'5437258  2020-07-23 08:56:06.637012  10371'5437258  2020-07-23 08:56:06.637012  0
From the dump we can see this PG currently has 1 unfound (and 1 degraded) object.
2. Check the pg map
[root@k8snode001 ~]# ceph pg map 2.2b
osdmap e10373 pg 2.2b (2.2b) -> up [14,22,4] acting [14,22,4]
The pg map shows that pg 2.2b is placed on OSDs [14,22,4].
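If you need to know which hosts those OSDs live on (for example, to inspect the disks or daemon logs), ceph osd find reports an OSD's CRUSH location; a quick sketch for the primary, osd.14:
[root@k8snode001 ~]# ceph osd find 14
[root@k8snode001 ~]# ceph osd tree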
3. Check the pool status
[root@k8snode001 ~]# ceph osd pool stats k8s-1
pool k8s-1 id 2
  1/1955664 objects degraded (0.000%)
  1/651888 objects unfound (0.000%)
  client io 271 KiB/s wr, 0 op/s rd, 52 op/s wr

[root@k8snode001 ~]# ceph osd pool ls detail | grep k8s-1
pool 2 'k8s-1' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 88 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
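The health output above also reported 17 scrub errors across several inconsistent PGs (2.44, 2.73, and so on). Those are a separate issue from the unfound object, but rados can list the inconsistent objects per PG so you know what a later repair would touch; pg 2.44 below is just one PG taken from the health detail:
[root@k8snode001 ~]# rados list-inconsistent-obj 2.44 --format=json-pretty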
4. Try to recover pg 2.2b's lost object
[root@k8snode001 ~]# ceph pg repair 2.2b
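Note that ceph pg repair only queues the repair and returns immediately, so the PG state has to be polled to see whether it worked. A simple way with plain shell:
[root@k8snode001 ~]# watch -n 5 'ceph health detail | grep 2.2b'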
If the repair never succeeds, query the stuck PG for details and pay particular attention to recovery_state:
[root@k8snode001 ~]# ceph pg 2.2b query
{
  ......
  "recovery_state": [
    {
      "name": "Started/Primary/Active",
      "enter_time": "2020-07-21 14:17:05.855923",
      "might_have_unfound": [],
      "recovery_progress": {
        "backfill_targets": [],
        "waiting_on_backfill": [],
        "last_backfill_started": "MIN",
        "backfill_info": {
          "begin": "MIN",
          "end": "MIN",
          "objects": []
        },
        "peer_backfill_info": [],
        "backfills_in_flight": [],
        "recovering": [],
        "pg_backend": {
          "pull_from_peer": [],
          "pushing": []
        }
      },
      "scrub": {
        "scrubber.epoch_start": "10370",
        "scrubber.active": false,
        "scrubber.state": "INACTIVE",
        "scrubber.start": "MIN",
        "scrubber.end": "MIN",
        "scrubber.max_end": "MIN",
        "scrubber.subset_last_update": "0'0",
        "scrubber.deep": false,
        "scrubber.waiting_on_whom": []
      }
    },
    {
      "name": "Started",
      "enter_time": "2020-07-21 14:17:04.814061"
    }
  ],
  "agent_state": {}
}
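Since ceph pg query returns JSON, a tool like jq (assumed to be installed; it is not part of Ceph) makes it easy to pull out just the field of interest instead of scrolling through the whole document:
[root@k8snode001 ~]# ceph pg 2.2b query | jq '.recovery_state[0].might_have_unfound'
A non-empty might_have_unfound list means the primary still has peer OSDs left to probe for the missing object; an empty one, as here, means every candidate location has already been checked.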
If repair cannot fix it, there are two options: revert the unfound objects to an older version, or delete them outright. revert rolls each unfound object back to a previous version (or forgets it entirely if it was a new object), while delete forgets the objects unconditionally; note that revert is not available for erasure-coded pools.
5. Solutions
Revert to an older version:
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost revert
Delete outright:
[root@k8snode001 ~]# ceph pg 2.2b mark_unfound_lost delete
6. Verify
I went with delete here; the cluster then rebuilds the PG. Checking again after a short wait, the PG state has become active+clean.
[root@k8snode001 ~]# ceph pg 2.2b query
{
  "state": "active+clean",
  "snap_trimq": "[]",
  "snap_trimq_len": 0,
  "epoch": 11069,
  "up": [12, 22, 4],
  ......
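While the PG is being rebuilt you can also follow recovery live rather than re-querying by hand; either of these standard commands works:
[root@k8snode001 ~]# ceph -w
[root@k8snode001 ~]# watch -n 2 'ceph -s'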
Check the cluster status again:
[root@k8snode001 ~]# ceph health detail
HEALTH_OK
That wraps up this example analysis of handling a Ceph pg unfound condition. Thanks for reading! I hope it gave you a solid picture of the process; for more articles like this, follow the 丸趣 TV industry news channel.