Oracle Cluster Heartbeats and the misscount/disktimeout/reboottime Parameters



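I. OCSSD and CSS
OCSSD is a Linux or Unix process that manages and provides Cluster Synchronization Services (CSS). It runs as the oracle user and provides node membership management; if the process fails, the node reboots. CSS provides two heartbeat mechanisms: a network heartbeat and a disk heartbeat. Each has a maximum tolerated delay. For the network heartbeat it is called MC (Misscount); for the disk heartbeat it is called IOT (I/O Timeout).

Both parameters are measured in seconds. By default, Misscount < Disktimeout.

The two heartbeat mechanisms are described below.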

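II. Network heartbeat
As the name suggests, the network heartbeat checks node status over the private network. If a hardware or software fault prevents the cluster nodes from communicating over the private interconnect for a certain period of time, a split-brain results. Because the storage in a cluster is shared, the faulty node must then be isolated from the cluster to avoid data corruption. The network heartbeat behaves as follows: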
Every second, a sending thread in cssd sends a network TCP heartbeat to itself and all other nodes. The receiving thread of ocssd.bin receives the heartbeat.
If a network packet is dropped or has an error, the error correction mechanism of TCP retransmits the packet.
Oracle does not retransmit. In ocssd.log, you will see a WARNING message about a missing heartbeat if a node does not receive a heartbeat from another node for 15 seconds (50% of misscount). Another warning is reported in ocssd.log if the same node is still missing after 22 seconds (75% of misscount), and another if it is still missing after 27 seconds (90% of misscount). When the heartbeat has been missing for 100% of misscount, that is 30 seconds, the node is evicted.
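
The warning schedule quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Oracle's implementation: the function names are hypothetical, and truncating to whole seconds is an assumption chosen so that the result matches the 15/22/27 figures quoted for a misscount of 30.

```python
# Percentages of misscount at which the quoted text reports warnings.
# Eviction happens at 100%.
WARNING_PERCENTS = (50, 75, 90)


def warning_seconds(misscount: int = 30) -> list:
    """Seconds of missed heartbeats at which a warning would be logged.

    Integer division (truncation) is an assumption made to reproduce
    the 15/22/27 second figures given for misscount = 30.
    """
    return [misscount * percent // 100 for percent in WARNING_PERCENTS]


def heartbeat_state(missed_seconds: int, misscount: int = 30) -> str:
    """Classify a node by how long its network heartbeat has been missing."""
    if missed_seconds >= misscount:
        return "evicted"
    # Collect every warning threshold that has already been crossed.
    crossed = [s for s in warning_seconds(misscount) if missed_seconds >= s]
    if crossed:
        percent = WARNING_PERCENTS[len(crossed) - 1]
        return f"warning ({percent}% of misscount)"
    return "ok"
```

For example, `heartbeat_state(26)` falls between the 75% and 90% thresholds and is reported as a 75% warning, while `heartbeat_state(30)` is an eviction.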

The timeout for this network heartbeat is called misscount. It can be queried and modified with the crsctl tool.
[grid@Linux-01 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

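The query result above shows that if the delay on the interconnect between cluster nodes exceeds 30s, Oracle considers a split-brain to have occurred between the nodes and evicts the faulty node from the cluster.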

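How is the faulty node identified? Oracle decides with a voting algorithm. The following illustrative description of the algorithm is adapted from the book 大话 Oracle RAC.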
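Every node in a cluster needs the heartbeat mechanism to report its health to the others. Assume that each health report received from a node counts as one vote. In a three-node cluster running normally, every node holds 3 votes. When node A's heartbeat fails while node A itself is still running, the cluster splits into 2 small partitions: node A is one, and the remaining 2 nodes are the other. One partition must then be evicted to keep the cluster healthy.

In this three-node cluster, after A's heartbeat fails, B and C form one partition with 2 votes, while A has only 1 vote. Under the voting algorithm, the cluster formed by B and C gains control and A is evicted.

With only 2 nodes the voting algorithm breaks down, because each node holds just 1 vote. A third device is then needed: the Quorum Device. The Quorum Device is usually a shared disk, also called the Quorum Disk, and it also represents one vote. When the heartbeat between the 2 nodes fails, both nodes race for the Quorum Disk's vote, and the request that arrives first is served first. The node that obtains the Quorum Disk first therefore holds 2 votes, and the other node is evicted.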
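
The vote counting described above can be sketched as follows. This is a simplified model of the illustrative algorithm only, not Clusterware's actual code; `surviving_partition` and `quorum_disk_winner` are hypothetical names introduced for the example.

```python
def surviving_partition(partitions, quorum_disk_winner=None):
    """Return the partition (a set of node names) that keeps control.

    Each node contributes one vote. `quorum_disk_winner` is the node that
    reached the quorum disk first, if any; its partition gets one extra
    vote. Returns None when the votes are tied and nothing breaks the tie.
    """
    votes = []
    for partition in partitions:
        count = len(partition)
        if quorum_disk_winner in partition:
            count += 1  # the quorum disk counts as one vote
        votes.append(count)

    best = max(votes)
    if votes.count(best) > 1:
        return None  # tie: vote counting alone cannot decide
    return partitions[votes.index(best)]
```

With three nodes, `surviving_partition([{"A"}, {"B", "C"}])` returns the B and C partition. With two nodes, `surviving_partition([{"A"}, {"B"}])` is a tie until a quorum disk winner is supplied, for example `quorum_disk_winner="B"`.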

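Once a node has been isolated, releases before 11gR2 generally reboot the faulty node.

In 11gR2, Clusterware first tries to shut down all resources on that node and to clean up the failed components in the cluster, that is, to restart the failed components.

If cleaning up the failed components does not succeed, the node is rebooted to force the cleanup.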

III. Disk heartbeat
A thread in ocssd.bin updates the voting disk every second.
If a node does not update the voting disks for 200 seconds, it is evicted.
However, ocssd.bin on the local node has logic to bring down the node if it has I/O errors on a majority of the voting disks. Also, a CRS reconfiguration takes place when misscount reaches 27 seconds, and the local node is rebooted. As a result, you rarely see an eviction due to failure of the voting disk on 10.2.0.4 (this is more common in 10.2.0.1), because ocssd.bin will abort the node before it gets evicted by another node if writing to the voting disk is the problem.
As described above, every node updates the voting disk once per second. The shared voting disks are used to check the disk heartbeat.

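If the ocssd process takes longer than 200s, the value set by disktimeout, to update a voting disk, Oracle considers that voting disk offline and writes a voting disk offline record to the Clusterware alert log. If the number of offline voting disks on the current node is smaller than the number of online voting disks, the node survives. If the number of offline voting disks is greater than or equal to the number of online voting disks, Clusterware considers the disk heartbeat to have failed. The faulty node is evicted from the cluster and an automatic recovery process runs.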

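For example, suppose there are 3 voting disks and one of them goes offline on node A. Now offline disks (1) < online disks (2). Clusterware writes an offline record to the alert log but takes no action. If 2 or more voting disks are offline on the current node, then offline disks (2) > online disks (1), and node A is evicted from the cluster.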
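
The survival rule above reduces to a single comparison. The sketch below is a hedged illustration of that rule as stated in this article; `disk_heartbeat_ok` is a hypothetical helper, not a Clusterware API.

```python
def disk_heartbeat_ok(total_voting_disks: int, offline_voting_disks: int) -> bool:
    """Return True if the node survives under the rule described above.

    The node survives only while strictly fewer voting disks are offline
    than online; an equal split counts as a disk heartbeat failure.
    """
    if not 0 <= offline_voting_disks <= total_voting_disks:
        raise ValueError("offline disks must be between 0 and the total")
    online_voting_disks = total_voting_disks - offline_voting_disks
    return offline_voting_disks < online_voting_disks
```

With 3 voting disks, `disk_heartbeat_ok(3, 1)` is true and `disk_heartbeat_ok(3, 2)` is false. With an even number of disks, an equal split such as `disk_heartbeat_ok(2, 1)` is also false.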

IV. The RebootTime parameter
Note the RebootTime parameter, which is also important. Its default value is 3s.
Default 3 seconds: the amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted.
crsctl get css reboottime

V. Adjusting the heartbeat parameters
1) Procedure for versions 10.2.0.2 to 11.1.0.7
a) Shut down CRS on all but one node. For exact steps use note 309542.1
b) Execute crsctl as root to modify the misscount:
$CRS_HOME/bin/crsctl set css misscount n #### where n is the maximum private network latency in seconds
$CRS_HOME/bin/crsctl set css reboottime r [-force] #### (r is seconds)
$CRS_HOME/bin/crsctl set css disktimeout d [-force] #### (d is seconds)
c) Reboot the node where adjustment was made
d) Start all other nodes that were shut down in step a)
e) Execute crsctl as root to confirm the change:
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css reboottime
$CRS_HOME/bin/crsctl get css disktimeout

2) Procedure for 11gR2
With 11gR2, these settings can be changed online without taking any node down:

a) Execute crsctl as root to modify the misscount:
$CRS_HOME/bin/crsctl set css misscount n #### where n is the maximum private network latency in seconds
$CRS_HOME/bin/crsctl set css reboottime r [-force] #### (r is seconds)
$CRS_HOME/bin/crsctl set css disktimeout d [-force] #### (d is seconds)
b) Execute crsctl as root to confirm the change:
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css reboottime
$CRS_HOME/bin/crsctl get css disktimeout
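
The confirmation step can also be scripted. The following is a minimal sketch that assumes the output of `crsctl get css <parameter>` always has the shape of the CRS-4678 line shown earlier in this article; other Clusterware versions may print a different message, so treat the pattern as an assumption. The helper names `parse_css_get` and `get_css_parameter` are hypothetical.

```python
import re
import subprocess

# Pattern derived from the single sample line in this article:
# "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services."
_GET_PATTERN = re.compile(
    r"Successful get (\w+) (\d+) for Cluster Synchronization Services"
)


def parse_css_get(output: str):
    """Extract (parameter name, value in seconds) from crsctl output."""
    match = _GET_PATTERN.search(output)
    if match is None:
        raise ValueError(f"unrecognized crsctl output: {output!r}")
    return match.group(1), int(match.group(2))


def get_css_parameter(name: str, crsctl: str = "crsctl") -> int:
    """Run `crsctl get css <name>` and return the value as an integer.

    `crsctl` may be a full path such as $CRS_HOME/bin/crsctl.
    """
    result = subprocess.run(
        [crsctl, "get", "css", name],
        capture_output=True,
        text=True,
        check=True,  # raise if crsctl exits with a non-zero status
    )
    _, value = parse_css_get(result.stdout)
    return value
```

On a cluster node, calling `get_css_parameter` for `misscount`, `reboottime` and `disktimeout` after a change gives the three values to compare against what was set.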
