共计 9182 个字符,预计需要花费 23 分钟才能阅读完成。
行业资讯
数据库
关系型数据库
数据库中如何利用 systemstate dump trace 文件寻找 dg terminated 原因
这篇文章主要介绍了数据库中如何利用 systemstate dump trace 文件寻找 dg terminated 原因,具有一定借鉴价值,感兴趣的朋友可以参考下,希望大家阅读完这篇文章之后大有收获,下面让丸趣 TV 小编带着大家一起了解一下。
收到告警,一台 dataguard 宕机了。软件版本为 11.2.0.4. 登录上来首先查看 alert 文件.
Fri Jan 25 11:00:39 2019
System State dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc
Fri Jan 25 11:00:54 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc:
Fri Jan 25 11:00:58 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:01:19 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:01:39 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:01:59 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:02:19 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:02:40 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:03:00 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:03:21 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:03:41 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:04:01 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:04:21 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:04:41 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:05:01 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:05:21 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:05:27 2019
RFS[3]: Selected log 9 for thread 1 sequence 85314 dbid -1036715693 branch 899829907
Fri Jan 25 11:05:27 2019
Media Recovery Waiting for thread 1 sequence 85314 (in transit)
Recovery of Online Redo Log: Thread 1 Group 9 Seq 85314 Reading mem 0
Mem# 0: /san/data/oradata/EIN1/standby_redo03.log
Fri Jan 25 11:05:27 2019
Archived Log entry 165577 added for thread 1 sequence 85313 ID 0xc2352453 dest 1:
Fri Jan 25 11:05:41 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:06:01 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:06:21 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:06:41 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:
ORA-03170: deadlocked on readable physical standby (undo segment 65535)
Fri Jan 25 11:06:45 2019
Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_lgwr_56177.trc:
ORA-04020: deadlock detected while trying to lock object SYS.ERAC
LGWR (ospid: 56177): terminating the instance due to error 4020
Fri Jan 25 11:06:46 2019
System state dump requested by (instance=1, osid=56177 (LGWR)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_diag_56159_20190125110646.trc
可以看到持续大量的 ORA-03170 死锁告警,11:06:46 分的时候实例被终止。在贴出的 alert 日志开头生成了一个 trace 文件,首先分析一下这个文件。
Vi
编辑 /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc 文件,搜索关键字 final,定位到下面这段:
SO: 0xa1cccae30, type: 4, owner: 0xa0c7c8da8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xa0c7c8da8, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 2228 ser: 9 trans: (nil), creator: 0xa0c7c8da8
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 100 oct: 3, prv: 0, sql: 0xa33cf5e08, psql: 0xa3e7cdec0, user: 627/CONVERT_R
ksuxds FALSE at location: 0
service name: SYS$USERS
client details:
O/S info: user: EM-H6VQXD2$, term: unknown, ospid: 1234
machine: EM-H6VQXD2 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
0: waiting for library cache lock
handle address=0xa3ff9eb18, lock address=0x50de690a0, 100*mode+namespace=0x1004a0002
wait_id=945878 seq_num=28394 snap_id=1
wait times: snap=2 min 53 sec, exc=2 min 53 sec, total=2 min 53 sec
wait times: max=15 min 0 sec, heur=2 min 53 sec
wait counts: calls=60 os=60
in_wait=1 iflags=0x15a2
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 970, ser: 1
Dumping final blocker:
inst: 1, sid: 2059, ser: 26985
Wait State:
fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
elapsed time of 0.000134 sec since current wait
0: waited for SQL*Net message from client
driver id=0x54435000, #bytes=0x1, =0x0
wait_id=945877 seq_num=28393 snap_id=1
wait times: snap=3.470941 sec, exc=3.470941 sec, total=3.470941 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000001 sec of elapsed time
1: waited for SQL*Net message to client
driver id=0x54435000, #bytes=0x1, =0x0
wait_id=945876 seq_num=28392 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000048 sec of elapsed time
可以看到最源头的 blocker 是 sid: 2059, ser: 26985 这个 session。
这里的 SO 是 0xa1cccae30, type: 4 说明是会话信息, owner: 0xa0c7c8da8 是这个 SO(0xa1cccae30)的父节点。
搜索这个位置所在的 PROCESS 进程,发现正是 LGWR 进程
PROCESS 17: LGWR
然后直接搜索 sid: 2059 ser: 26985
SO: 0xa18befcd0, type: 4, owner: 0xa18786ed8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xa18786ed8, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 2059 ser: 26985 trans: 0xa1b361290, creator: 0xa18786ed8
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: 0xa0f20f998
edition#: 100 oct: 3, prv: 0, sql: 0xa345bb8d0, psql: 0xa2fe3c0a8,
user: 510/READER
ksuxds FALSE at location: 0
service name: SYS$USERS
client details:
O/S info: user: Administrator, term: ADMIN-PC, ospid: 8960:8832
machine: WORKGROUP\ADMIN-PC program: plsqldev.exe
application name: PL/SQL Developer, hash value=1190136663
action name: SQL
ISIN 香港净值.sql, hash value=506963986
Current Wait Stack:
0: waiting for SQL*Net message from
dblink
driver id=0x28444553, #bytes=0x1, =0x0
wait_id=474 seq_num=475 snap_id=1
wait times: snap=4 min 45 sec, exc=4 min 45 sec,
total=4 min 45 sec
wait times: max=infinite, heur=4 min 45 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x5a0
There are 28 sessions blocked by this session.
可以看到是 reader 用户从一台 Windows 的 PL/SQL developer 发起的一段 sql 文本导致的。而且这段 sql 使用了 db_link。已经持续了 4 分 45 秒,接近 5 分钟的时间。该会话阻塞了 28 个会话。
目前为止完整的阻塞情况可能并没有全部得到,但是已经得到了 LGWR 进程被一个 PL/SQL developer 会话所阻塞的结论,在 sql 文本执行了接近五分钟的时候,实例被 LGWR 终止。
这里为什么是五分钟呢?这是因为之前发生过的其他故障,那时修改了一个隐含参数_adg_parselock_timeout
SQL col name for a30
SQL col value for a20
SQL col description for a70
SQL set line 150
SQL select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
2 from x$ksppi a,x$ksppcv b
3 where a.inst_id = USERENV (Instance)
4 and b.inst_id = USERENV (Instance)
5 and a.indx = b.indx
6 and upper(a.ksppinm) LIKE upper(%adg_parselock_timeout%)
7 order by name;
NAME VALUE DESCRIPTION
—————————— ——————– ———————————————————————-
_adg_parselock_timeout 30000 timeout for parselock get on ADG in centiseconds
_adg_parselock_timeout_sleep 100 sleep duration after a parselock timeout on ADG in milliseconds
SQL col name for a30;
SQL col value for a10;
SQL select * from
2 (select
3 x.ksppinm name,
4 y.ksppstvl value,
5 y.ksppstdf isdefault,
6 decode(bitand(y.ksppstvf,7),1, MODIFIED ,4, SYSTEM_MOD , FALSE ) ismod,
7 decode(bitand(y.ksppstvf,2),2, TRUE , FALSE ) isadj
8 from
9 sys.x$ksppi x,
10 sys.x$ksppcv y
11 where
12 x.inst_id = userenv(Instance) and
13 y.inst_id = userenv(Instance) and
14 x.indx = y.indx
15 order by
16 translate(x.ksppinm, _ ,)) T where T.name like %adg_parselock_timeout%
NAME VALUE ISDEFAULT ISMOD ISADJ
—————————— ———- ————————— —————————— —————
_adg_parselock_timeout 30000 FALSE FALSE FALSE
_adg_parselock_timeout_sleep 100 TRUE FALSE FALSE
_adg_parselock_timeout 该值设置为了 30000 厘秒,约等于 300 秒,也就是五分钟。
搜索 MOS 后确认为 BUG
Bug 18515268 ORA-4020 in ADG Standby Database causing instance crash by LGWR
Solution
Apply interim patch 18515268, if available for your platform and Oracle version.
Fixed releases: 11.2.0.4.BP20 12.1.0.2 12.2.0.1
感谢你能够认真阅读完这篇文章,希望丸趣 TV 小编分享的“数据库中如何利用 systemstate dump trace 文件寻找 dg terminated 原因”这篇文章对大家有帮助,同时也希望大家多多支持丸趣 TV,关注丸趣 TV 行业资讯频道,更多相关知识等着你来学习!