数据库中如何利用systemstate dump trace文件寻找dg terminated原因

72次阅读
没有评论

共计 9182 个字符,预计需要花费 23 分钟才能阅读完成。

行业资讯    
数据库    
关系型数据库    
数据库中如何利用 systemstate dump trace 文件寻找 dg terminated 原因

这篇文章主要介绍了数据库中如何利用 systemstate dump trace 文件寻找 dg terminated 原因,具有一定借鉴价值,感兴趣的朋友可以参考下,希望大家阅读完这篇文章之后大有收获,下面让丸趣 TV 小编带着大家一起了解一下。

收到告警,一台 dataguard 宕机了。软件版本为 11.2.0.4. 登录上来首先查看 alert 文件.

Fri Jan 25   11:00:39 2019

System   State dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc

Fri Jan 25   11:00:54 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc:

Fri Jan 25   11:00:58 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:01:19 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked   on readable physical standby (undo segment 65535)

Fri Jan 25   11:01:39 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:01:59 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:02:19 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:02:40 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:03:00 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:03:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:03:41 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:04:01 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:04:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:04:41 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:05:01 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:05:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:05:27 2019

RFS[3]: Selected   log 9 for thread 1 sequence 85314 dbid -1036715693 branch 899829907

Fri Jan 25   11:05:27 2019

Media Recovery   Waiting for thread 1 sequence 85314 (in transit)

Recovery of   Online Redo Log: Thread 1 Group 9 Seq 85314 Reading mem 0

  Mem# 0:   /san/data/oradata/EIN1/standby_redo03.log

Fri Jan 25   11:05:27 2019

Archived Log   entry 165577 added for thread 1 sequence 85313 ID 0xc2352453 dest 1:

Fri Jan 25 11:05:41   2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:01 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:41 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:45 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_lgwr_56177.trc:

ORA-04020:   deadlock detected while trying to lock object SYS.ERAC

LGWR (ospid:   56177): terminating the instance due to error 4020

Fri Jan 25   11:06:46 2019

System state dump   requested by (instance=1, osid=56177 (LGWR)), summary=[abnormal instance   termination].

System State   dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_diag_56159_20190125110646.trc

 

可以看到持续大量的 ORA-03170 死锁告警,11:06:46 分的时候实例被终止。在贴出的 alert 日志开头生成了一个 trace 文件,首先分析一下这个文件。

Vi
编辑 /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc 文件,搜索关键字 final,定位到下面这段:

   
SO:   0xa1cccae30, type: 4, owner: 0xa0c7c8da8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

  proc=0xa0c7c8da8, name=session,   file=ksu.h LINE:12729, pg=0

  (session) sid: 2228 ser: 9 trans: (nil),   creator: 0xa0c7c8da8

  flags: (0x45) USR/- flags_idl:   (0x1) BSY/-/-/-/-/-

  flags2: (0x40009) -/-/INC

  DID: , short-term DID:

  txn branch: (nil)

  edition#: 100  oct: 3, prv: 0, sql:   0xa33cf5e08, psql: 0xa3e7cdec0, user: 627/CONVERT_R

  ksuxds FALSE at location: 0

  service name: SYS$USERS

  client details:

  O/S info: user: EM-H6VQXD2$, term:   unknown, ospid: 1234

  machine: EM-H6VQXD2 program: JDBC Thin   Client

  application name: JDBC Thin Client,   hash value=2546894660

  Current Wait Stack:

  0: waiting for library cache lock

  handle address=0xa3ff9eb18, lock   address=0x50de690a0, 100*mode+namespace=0x1004a0002

  wait_id=945878 seq_num=28394   snap_id=1

  wait times: snap=2 min 53 sec, exc=2   min 53 sec, total=2 min 53 sec

  wait times: max=15 min 0 sec, heur=2   min 53 sec

  wait counts: calls=60 os=60

  in_wait=1 iflags=0x15a2

There is at least one session blocking this session.

  Dumping 1   direct blocker(s):

  inst:   1, sid: 970, ser: 1

  Dumping   final blocker:

  inst:   1, sid: 2059, ser: 26985

  Wait State:

    fixed_waits=0 flags=0x22 boundary=(nil)/-1

  Session   Wait History:

  elapsed   time of 0.000134 sec since current wait

  0: waited   for SQL*Net message from client

   driver id=0x54435000, #bytes=0x1, =0x0

    wait_id=945877 seq_num=28393 snap_id=1

  wait   times: snap=3.470941 sec, exc=3.470941 sec, total=3.470941 sec

  wait   times: max=infinite

  wait   counts: calls=0 os=0

    occurred after 0.000001 sec of elapsed time

  1: waited   for SQL*Net message to client

  driver   id=0x54435000, #bytes=0x1, =0x0

    wait_id=945876 seq_num=28392 snap_id=1

  wait   times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec

  wait   times: max=infinite

  wait   counts: calls=0 os=0

    occurred after 0.000048 sec of elapsed time

可以看到最源头的 blocker 是 sid: 2059, ser: 26985 这个 session。

这里的 SO 是 0xa1cccae30, type: 4 说明是会话信息, owner: 0xa0c7c8da8 是这个 SO(0xa1cccae30)的父节点。

搜索这个位置所在的 PROCESS 进程,发现正是 LGWR 进程

PROCESS 17: LGWR

然后直接搜索 sid: 2059 ser: 26985

   
SO: 0xa18befcd0, type:   4, owner: 0xa18786ed8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

  proc=0xa18786ed8, name=session,   file=ksu.h LINE:12729, pg=0

  (session) sid: 2059 ser: 26985 trans:   0xa1b361290, creator: 0xa18786ed8

  flags: (0x41) USR/- flags_idl:   (0x1) BSY/-/-/-/-/-

  flags2: (0x40009) -/-/INC

  DID: , short-term DID:

  txn branch: 0xa0f20f998

  edition#: 100  oct: 3, prv: 0, sql:   0xa345bb8d0, psql: 0xa2fe3c0a8,
user: 510/READER

  ksuxds FALSE at location: 0

  service name: SYS$USERS

  client details:

 
 O/S info: user: Administrator, term:   ADMIN-PC, ospid: 8960:8832

  machine: WORKGROUP\ADMIN-PC program:   plsqldev.exe

  application name: PL/SQL Developer,   hash value=1190136663

  action name: SQL
ISIN 香港净值.sql, hash value=506963986

  Current Wait Stack:

  0: waiting for SQL*Net message from
dblink

  driver id=0x28444553, #bytes=0x1,   =0x0

  wait_id=474 seq_num=475 snap_id=1

  wait times: snap=4 min 45 sec, exc=4   min 45 sec,
total=4 min 45 sec

  wait times: max=infinite, heur=4 min   45 sec

  wait counts: calls=0 os=0

  in_wait=1 iflags=0x5a0

  There are 28 sessions blocked by this   session.

可以看到是 reader 用户从一台 Windows 的 PL/SQL developer 发起的一段 sql 文本导致的。而且这段 sql 使用了 db_link。已经持续了 4 分 45 秒,接近 5 分钟的时间。该会话阻塞了 28 个会话。

目前为止完整的阻塞情况可能并没有全部得到,但是已经得到了 LGWR 进程被一个 PL/SQL developer 会话所阻塞的结论,在 sql 文本执行了接近五分钟的时候,实例被 LGWR 终止。

这里为什么是五分钟呢?这是因为之前发生过的其他故障,那时修改了一个隐含参数_adg_parselock_timeout

SQL col name   for a30

SQL col value   for a20

SQL col   description for a70

SQL set line   150

SQL select   a.ksppinm name,b.ksppstvl value,a.ksppdesc description

  2    from x$ksppi a,x$ksppcv b

  3    where a.inst_id = USERENV (Instance)

  4    and b.inst_id = USERENV (Instance)

  5    and a.indx = b.indx

  6    and upper(a.ksppinm) LIKE upper(%adg_parselock_timeout%)

  7    order by name;

 

NAME  VALUE  DESCRIPTION

——————————   ——————–   ———————————————————————-

_adg_parselock_timeout  30000  timeout for parselock get on   ADG in centiseconds

_adg_parselock_timeout_sleep  100  sleep duration after a   parselock timeout on ADG in milliseconds

 

SQL col name   for a30;

SQL col value   for a10;

SQL select *   from

  2    (select

  3    x.ksppinm name,

  4    y.ksppstvl value,

  5    y.ksppstdf isdefault,

  6    decode(bitand(y.ksppstvf,7),1, MODIFIED ,4, SYSTEM_MOD , FALSE )   ismod,

  7    decode(bitand(y.ksppstvf,2),2, TRUE , FALSE ) isadj

  8    from

  9    sys.x$ksppi x,

 10    sys.x$ksppcv y

 11    where

 12    x.inst_id = userenv(Instance) and

 13    y.inst_id = userenv(Instance) and

 14    x.indx = y.indx

 15    order by

 16    translate(x.ksppinm, _ ,)) T where T.name like   %adg_parselock_timeout%

 

NAME  VALUE  ISDEFAULT  ISMOD  ISADJ

——————————   ———- ————————— ——————————   —————

_adg_parselock_timeout  30000  FALSE  FALSE  FALSE

_adg_parselock_timeout_sleep  100     TRUE  FALSE  FALSE

_adg_parselock_timeout 该值设置为了 30000 厘秒,约等于 300 秒,也就是五分钟。

搜索 MOS 后确认为 BUG

Bug 18515268 ORA-4020 in ADG Standby Database causing   instance crash by LGWR

Solution

Apply interim patch 18515268, if available for your platform and   Oracle version.

Fixed releases: 11.2.0.4.BP20 12.1.0.2 12.2.0.1

感谢你能够认真阅读完这篇文章,希望丸趣 TV 小编分享的“数据库中如何利用 systemstate dump trace 文件寻找 dg terminated 原因”这篇文章对大家有帮助,同时也希望大家多多支持丸趣 TV,关注丸趣 TV 行业资讯频道,更多相关知识等着你来学习!

正文完
 
丸趣
版权声明:本站原创文章,由 丸趣 2023-07-24发表,共计9182字。
转载说明:除特殊说明外本站除技术相关以外文章皆由网络搜集发布,转载请注明出处。
评论(没有评论)