This article walks through a production case in which one node of an Oracle 11g RAC cluster could not start, and how the problem was diagnosed and the cluster recovered.
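I. Environment
Two-node Oracle 11g RAC running on AIX midrange servers
II. Symptoms
Node 2 cannot start.
Running crsctl start crs fails with an error.
III. Analysis and Resolution
1. Check the database alert log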
Archived Log entry 399348 added for thread 2 sequence 205493 ID 0xffffffff8452e669 dest 1:
Sat Dec 09 11:13:47 2017
Thread 2 advanced to log sequence 205495 (LGWR switch)
Current log# 3 seq# 205495 mem# 0: +DATA/orcl2/onlinelog/group_3.257.890091875
Sat Dec 09 11:13:51 2017
Archived Log entry 399349 added for thread 2 sequence 205494 ID 0xffffffff8452e669 dest 1:
Sat Dec 09 11:24:07 2017
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/orcl2/PTS22/trace/PTS22_asmb_8847608.trc:
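ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Errors in file /u01/app/oracle/diag/rdbms/orcl2/PTS22/trace/PTS22_asmb_8847608.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel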
ASMB (ospid: 8847608): terminating the instance due to error 15064
Sat Dec 09 11:24:07 2017
-- This suggests a communication problem; look up ORA-15064 with oerr
orcldb2:/u01/app/oracle/diag/rdbms/orcl2/orcl22/trace$oerr ora 15064
15064, 00000, communication failure with ASM instance
// *Cause: There was a failure to communicate with the ASM instance, most
// likely because the connection went down.
// *Action: Check the accompanying error messages for more information on the
// reason for the failure. Note that database instances will always
// return this error when the ASM instance is terminated abnormally.
2. Check the cluster (Clusterware) alert log
2017-12-09 11:23:51.026
[cssd(7667900)]CRS-1612:Network communication with node orcldb1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.523 seconds
2017-12-09 11:23:59.039
[cssd(7667900)]CRS-1611:Network communication with node orcldb1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.509 seconds
2017-12-09 11:24:03.052
[cssd(7667900)]CRS-1610:Network communication with node orcldb1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.497 seconds
2017-12-09 11:24:05.552
[cssd(7667900)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/11.2.0/grid/log/orcldb2/cssd/ocssd.log.
2017-12-09 11:24:05.552
[cssd(7667900)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/11.2.0/grid/log/orcldb2/cssd/ocssd.log
2017-12-09 11:24:05.614
[cssd(7667900)]CRS-1652:Starting clean up of CRSD resources.
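The CRS-1612/1611/1610 warnings form a countdown toward the moment the node is removed from the cluster. As a rough sketch (the function name `predicted_evictions` and the `LOG` list are illustrative, not part of any Oracle tool), adding each warning's remaining seconds to its own timestamp shows that all three project the same moment, a few milliseconds before the CRS-1609 entry at 11:24:05.552:

```python
import re
from datetime import datetime, timedelta

# (timestamp, message) pairs copied from the cluster alert log above.
LOG = [
    ("2017-12-09 11:23:51.026",
     "[cssd(7667900)]CRS-1612:Network communication with node orcldb1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.523 seconds"),
    ("2017-12-09 11:23:59.039",
     "[cssd(7667900)]CRS-1611:Network communication with node orcldb1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.509 seconds"),
    ("2017-12-09 11:24:03.052",
     "[cssd(7667900)]CRS-1610:Network communication with node orcldb1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.497 seconds"),
]

# Timestamp of the CRS-1609 entry ("this node ... is going down").
EVICTED_AT = datetime(2017, 12, 9, 11, 24, 5, 552000)

PATTERN = re.compile(
    r"(CRS-\d+):.*missing for (\d+)% of timeout interval\. .* in ([\d.]+) seconds"
)


def predicted_evictions(entries):
    """Return (code, percent, projected_deadline) for each countdown warning."""
    projected = []
    for stamp, message in entries:
        match = PATTERN.search(message)
        if match is None:
            continue  # not a countdown warning
        logged_at = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        remaining = timedelta(seconds=float(match.group(3)))
        projected.append((match.group(1), int(match.group(2)), logged_at + remaining))
    return projected


for code, percent, deadline in predicted_evictions(LOG):
    gap = (EVICTED_AT - deadline).total_seconds()
    print(f"{code} ({percent}%): projected {deadline.time()}, {gap:.3f}s before CRS-1609")
```

Lining the entries up this way makes it easier to correlate the CSS eviction with the instance termination recorded in the database alert log at 11:24:07.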
3. Check the system log (errpt output)
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
FE2DEE00 1209123617 P S SYSXAIXIF DUPLICATE IP ADDRESS DETECTED IN THE NET
FE2DEE00 1209122517 P S SYSXAIXIF DUPLICATE IP ADDRESS DETECTED IN THE NET
FE2DEE00 1209114417 P S SYSXAIXIF DUPLICATE IP ADDRESS DETECTED IN THE NET
FE2DEE00 1209114317 P S SYSXAIXIF DUPLICATE IP ADDRESS DETECTED IN THE NET
A924A5FC 1209112417 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
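The TIMESTAMP column in errpt output is packed as MMDDhhmmYY, which is easy to misread next to the other logs. A minimal sketch for decoding it follows; the helper name `parse_errpt_timestamp` is made up for illustration, and the rows are copied from the output above:

```python
from datetime import datetime


def parse_errpt_timestamp(stamp):
    """Decode an errpt TIMESTAMP value in MMDDhhmmYY form."""
    return datetime.strptime(stamp, "%m%d%H%M%y")


# (identifier, timestamp, resource, description) rows from the errpt output above.
ROWS = [
    ("FE2DEE00", "1209123617", "SYSXAIXIF", "DUPLICATE IP ADDRESS DETECTED IN THE NET"),
    ("FE2DEE00", "1209122517", "SYSXAIXIF", "DUPLICATE IP ADDRESS DETECTED IN THE NET"),
    ("FE2DEE00", "1209114417", "SYSXAIXIF", "DUPLICATE IP ADDRESS DETECTED IN THE NET"),
    ("FE2DEE00", "1209114317", "SYSXAIXIF", "DUPLICATE IP ADDRESS DETECTED IN THE NET"),
    ("A924A5FC", "1209112417", "SYSPROC", "SOFTWARE PROGRAM ABNORMALLY TERMINATED"),
]

# errpt lists the newest entry first; print oldest first to match the other logs.
for ident, stamp, resource, description in reversed(ROWS):
    when = parse_errpt_timestamp(stamp)
    print(when.strftime("%Y-%m-%d %H:%M"), ident, resource, description)
```

Decoded, the SYSPROC entry falls at 2017-12-09 11:24, the same minute as the instance termination in the database alert log, while the duplicate-IP entries run from 11:43 to 12:36.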
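Taken together, all of the logs point to a communication problem between the nodes.
Checking the heartbeat (interconnect) network: from node 1, pinging node 2 succeeded, and pinging itself of course succeeded too.
This was puzzling, because the heartbeat network looked fine. Stepping back, I pinged node 1 from node 2, and that direction really did fail. After raising this with the customer, it turned out that the network had just been reconfigured. Once the network engineers fixed it, the heartbeat network recovered, and it was my turn to bring the cluster back up.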
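The lesson from this step is that a successful ping in one direction says nothing about the other. A small sketch of that check follows. The helper names `ping_ok` and `asymmetric_links` and the labels `node1`/`node2` are hypothetical; `ping_ok` assumes a `ping` that accepts `-c` and has to be run locally on each node, with the results gathered by hand:

```python
import subprocess


def ping_ok(host, count=2):
    """Return True if `ping -c <count> host` exits with status 0 (run on the source node)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def asymmetric_links(results):
    """Given {(src, dst): reachable}, return (src, dst) pairs that work only in that direction."""
    one_way = []
    for (src, dst), reachable in results.items():
        reverse = results.get((dst, src))
        if reverse is None:
            continue  # reverse direction was never tested
        if reachable and not reverse:
            one_way.append((src, dst))
    return one_way


# What was observed in this case: node 1 could reach node 2, but not the reverse.
observed = {("node1", "node2"): True, ("node2", "node1"): False}
print(asymmetric_links(observed))
```

Testing both directions up front would have exposed the one-way failure without the detour described above.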
-- run as root
crsctl stop crs -- fails with an error
crsctl stop crs -f -- force stop
crsctl start crs
crsctl stat res -t
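The recovery sequence above can be sketched as a small wrapper. This is only an illustration under stated assumptions: that `crsctl` is on root's PATH and that it returns a non-zero exit status when a command fails. The function names `run_crsctl` and `restart_clusterware` are made up for the example.

```python
import subprocess


def run_crsctl(args):
    """Run one crsctl command as root and return its exit status (assumes crsctl is on PATH)."""
    return subprocess.run(["crsctl"] + list(args)).returncode


def restart_clusterware(run=run_crsctl):
    """Stop the stack (forcing only if the normal stop fails), start it, then list resources."""
    executed = []

    def step(*args):
        executed.append(list(args))
        return run(args)

    # Assumption: a failed stop is reported through a non-zero exit status.
    if step("stop", "crs") != 0:
        step("stop", "crs", "-f")
    step("start", "crs")
    # `crsctl start crs` returns before all resources are online,
    # so this listing may need to be repeated until they come up.
    step("stat", "res", "-t")
    return executed
```

Passing a fake `run` function makes it possible to dry-run the sequence without touching a real cluster.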