您的位置:首页 > 数据库

记一次数据库无法增删改趋于HANG住状态的故障诊断和处理

2015-03-21 23:18 585 查看
数据库无法增删改,包括v$transaction视图无法查询,类似于HANG的状态,我首先我通过查询v$session_wait视图,情况如下:
SQL> select sid,event,p1,p2,p3,wait_time,seconds_in_wait,state from v$session_wait where wait_class <> 'Idle';

SID EVENT P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT STATE

---------- ----------------------------------------- ---------- ---------- ---------- ---------- --------------- -------------------

125 library cache lock 2130013560 2158412560 301 0 7850 WAITING

126 library cache lock 2130014088 2158399144 301 0 7850 WAITING

127 library cache lock 2130014088 2158495528 301 0 9231 WAITING

128 library cache lock 2130013560 2158380576 301 0 9231 WAITING

129 library cache lock 2130013560 2158307736 301 0 10611 WAITING

130 library cache lock 2130014088 2158498840 301 0 10611 WAITING

131 buffer busy waits 1 11170 1 0 11839 WAITING

132 library cache lock 2130013560 2158391432 301 0 11992 WAITING

133 log file switch (checkpoint incomplete) 0 0 0 0 12616 WAITING

136 log file switch (checkpoint incomplete) 0 0 0 0 12947 WAITING

138 enq: TX - row lock contention 1415053318 589854 665 0 13321 WAITING

139 buffer busy waits 2 9 17 0 12616 WAITING

141 enq: WF - contention 1464205318 0 0 0 1650 WAITING

144 enq: CI - contention 1128857606 1 5 0 15355 WAITING

150 log file switch (checkpoint incomplete) 0 0 0 0 12891 WAITING

159 switch logfile command 0 0 0 0 17051 WAITING

161 log file switch (checkpoint incomplete) 0 0 0 0 12715 WAITING

164 rdbms ipc reply 7 21457644 0 0 0 WAITING

18 rows selected

以上看到有三个log file switch (checkpoint incomplete)等待事件,意味着checkpoint没完成,于是查询v$log检查在线日志的情况:

SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM

---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------

1 1 113 52428800 1 NO CURRENT 8590086940 21-MAR-15 <<<<CURRENT

2 1 111 52428800 1 YES ACTIVE 8590086619 21-MAR-15 <<<<ACTIVE

3 1 112 52428800 1 YES ACTIVE 8590086938 21-MAR-15 <<<<ACTIVE

SQL> archive log list;

Database log mode Archive Mode

Automatic archival Enabled <<<<<<归模式档

Archive destination /home/oracle/arch

Oldest online log sequence 111

Next log sequence to archive 113

Current log sequence 113

SQL> !

[oracle@ora10g bdump]$ df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda3 95G 14G 77G 15% / <<<<<<<<<<<<磁盘空间足够,说明不是磁盘空间不够无法归档造成的

/dev/sda1 99M 12M 82M 13% /boot

tmpfs 1006M 0 1006M 0% /dev/shm

[oracle@ora10g bdump]$ ls -lrt /home/oracle/arch/

total 107176

-rw-r----- 1 oracle oinstall 952832 Mar 15 12:02 1_103_847657195.dbf

-rw-r----- 1 oracle oinstall 29585920 Mar 17 21:35 1_104_847657195.dbf

-rw-r----- 1 oracle oinstall 14306816 Mar 21 12:02 1_105_847657195.dbf

-rw-r----- 1 oracle oinstall 22298112 Mar 21 12:19 1_106_847657195.dbf

-rw-r----- 1 oracle oinstall 42112000 Mar 21 17:14 1_107_847657195.dbf

-rw-r----- 1 oracle oinstall 159232 Mar 21 17:20 1_108_847657195.dbf

-rw-r----- 1 oracle oinstall 1536 Mar 21 17:21 1_109_847657195.dbf

-rw-r----- 1 oracle oinstall 15360 Mar 21 17:24 1_110_847657195.dbf

-rw-r----- 1 oracle oinstall 148480 Mar 21 17:30 1_111_847657195.dbf

-rw-r----- 1 oracle oinstall 1024 Mar 21 17:30 1_112_847657195.dbf <<<<<<<<自17:30时间以后不再有归档日志产生

[oracle@ora10g bdump]$ date

Sat Mar 21 22:24:09 CST 2015

在线日志切换之后没有并归档之前都是处于ACTIVE状态,该状态下无法被复用,没有在线日志组可用时候,数据库会挂起.

做了HANGANALYZE:

==============

HANG ANALYSIS:

==============

Open chains found:

Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/133/2/0x83a69028/4790/log file switch (checkpoint inco>

-- <0/129/1/0x83a6afc8/4877/library cache lock>

-- <0/125/1/0x83a6cf68/4993/library cache lock>

-- <0/128/3/0x83a6b7b0/4931/library cache lock>

-- <0/132/3/0x83a69810/4815/library cache lock> <<<<<<<<<<<<sid为132,128,125,129被sid为133的会话阻塞,而133又在等检查点完成

Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/164/1/0x83a5f208/3048/rdbms ipc reply>

-- <0/144/47/0x83a660b8/4402/enq: CI - contention>

-- <0/138/137/0x83a68058/4757/enq: TX - row lock contention> <<<<<<sid为138,144的会话被sid为164的会话阻塞,而144在等待的enq: CI - contention却被

Other chains found:

Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/123/6/0x83a6d750/5378/No Wait>

Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/126/2/0x83a6c780/4991/library cache lock>

-- <0/127/1/0x83a6bf98/4933/library cache lock>

-- <0/130/1/0x83a6a7e0/4875/library cache lock>

Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/131/1/0x83a69ff8/4821/buffer busy waits>

-- <0/126/2/0x83a6c780/4991/library cache lock>

-- <0/127/1/0x83a6bf98/4933/library cache lock>

-- <0/130/1/0x83a6a7e0/4875/library cache lock>

Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/136/7/0x83a68840/4770/log file switch (checkpoint inco>

Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/139/10/0x83a67870/4609/buffer busy waits>

Chain 8 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/141/6/0x83a668a0/4748/enq: WF - contention>

Chain 9 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/147/3/0x83a658d0/3079/Streams AQ: qmn slave idle wait>

Chain 10 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/148/4/0x83a650e8/3077/Streams AQ: waiting for time man>

Chain 11 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/150/166/0x83a64900/4752/log file switch (checkpoint inco>

Chain 12 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/152/1/0x83a64118/3068/Streams AQ: qmn coordinator idle>

Chain 13 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/159/3/0x83a62960/3062/switch logfile command>

Chain 14 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :

<0/161/1/0x83a609c0/3054/log file switch (checkpoint inco>

Extra information that will be dumped at higher levels:

[level 4] : 2 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]

[level 5] : 8 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]

[level 6] : 12 node dumps -- [NLEAF]

[level 10] : 12 node dumps -- [IGN]

State of nodes

([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):

[122]/0/123/6/0x83b52750/5378/SINGLE_NODE_NW/1/2//none

[124]/0/125/1/0x83b55220/4993/NLEAF/3/8/[128][132]/127

[125]/0/126/2/0x83b56788/4991/NLEAF/9/12/[130]/126

[126]/0/127/1/0x83b57cf0/4933/NLEAF/13/14/[125][130]/129

[127]/0/128/3/0x83b59258/4931/NLEAF/15/16/[128][124][132]/131

[128]/0/129/1/0x83b5a7c0/4877/NLEAF/4/7/[132]/124

[129]/0/130/1/0x83b5bd28/4875/NLEAF/17/18/[125][126][130]/none

[130]/0/131/1/0x83b5d290/4821/NLEAF/10/11/[132]/125

[131]/0/132/3/0x83b5e7f8/4815/NLEAF/19/20/[128][124][127][132]/none

[132]/0/133/2/0x83b5fd60/4790/LEAF/5/6//128 <<<<<<<sid为133的阻塞了sid为128的会话

[135]/0/136/7/0x83b63d98/4770/SINGLE_NODE/21/22//none

[137]/0/138/137/0x83b66868/4757/NLEAF/23/28/[143]/none

[138]/0/139/10/0x83b67dd0/4609/NLEAF/29/30/[132]/none

[139]/0/140/8/0x83b69338/4605/IGN/31/32//none

[140]/0/141/6/0x83b6a8a0/4748/NLEAF/33/34/[143]/none

[143]/0/144/47/0x83b6e8d8/4402/NLEAF/24/27/[163]/137

[146]/0/147/3/0x83b72910/3079/SINGLE_NODE/35/36//none

[147]/0/148/4/0x83b73e78/3077/SINGLE_NODE/37/38//none

[149]/0/150/166/0x83b76948/4752/SINGLE_NODE/39/40//none

[151]/0/152/1/0x83b79418/3068/SINGLE_NODE/41/42//none

[154]/0/155/1/0x83b7d450/3066/IGN/43/44//none

[155]/0/156/1/0x83b7e9b8/3064/IGN/45/46//none

[158]/0/159/3/0x83b829f0/3062/SINGLE_NODE/47/48//none

[159]/0/160/1/0x83b83f58/3056/IGN/49/50//none

[160]/0/161/1/0x83b854c0/3054/SINGLE_NODE/51/52//none

[161]/0/162/1/0x83b86a28/3052/IGN/53/54//none

[162]/0/163/1/0x83b87f90/3050/IGN/55/56//none

[163]/0/164/1/0x83b894f8/3048/LEAF/25/26//143 <<<<<<<sid为164的阻塞了sid为143的会话
[164]/0/165/1/0x83b8aa60/3046/IGN/57/58//none

[165]/0/166/1/0x83b8bfc8/3044/IGN/59/60//none

[166]/0/167/1/0x83b8d530/3042/IGN/61/62//none

[167]/0/168/1/0x83b8ea98/3040/IGN/63/64//none

[168]/0/169/1/0x83b90000/3038/IGN/65/66//none

[169]/0/170/1/0x83b91568/3036/IGN/67/68//none

做了SYSTEMSTATE DUMP:
用ass对SYSTEMSTATE处理以后:

System State 1

~~~~~~~~~~~~~~~~

1:

2: waiting for 'pmon timer' wait

3: waiting for 'rdbms ipc message' wait

4: waiting for 'rdbms ipc message' wait

5: waiting for 'rdbms ipc message' wait

6: waiting for 'rdbms ipc message' wait

7: waiting for 'rdbms ipc message' wait

8: waiting for 'rdbms ipc reply' wait

9: waiting for 'rdbms ipc message' wait

10: waiting for 'rdbms ipc message' wait

11: waiting for 'log file switch (checkpoint incomplete)' wait

12: waiting for 'rdbms ipc message' wait

13:

14:

15: waiting for 'switch logfile command' wait

16: waiting for 'rdbms ipc message' wait

17: waiting for 'rdbms ipc message' wait

18: waiting for 'Streams AQ: qmn coordinator idle wait' wait

19: waiting for 'log file switch (checkpoint incomplete)' wait

Cmd: Insert

20: for 'Streams AQ: waiting for time management or cleanup tasks' wait

21: waiting for 'Streams AQ: qmn slave idle wait' wait

22: waiting for 'enq: CI - contention' [Enqueue CI-00000001-00000005] wait

23: waiting for 'enq: WF - contention' [Enqueue WF-00000000-00000000] wait

Cmd: PL/SQL Execute

24: waiting for 'SQL*Net message from client' wait

25: waiting for 'buffer busy waits' (2,9,11) wait

Cmd: Select

26: waiting for 'enq: TX - row lock contention'[Enqueue TX-0009001E-00000299] wait

27: waiting for 'log file switch (checkpoint incomplete)' wait

Cmd: Insert

28: waiting for 'log file switch (checkpoint incomplete)' wait

29: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

30: waiting for 'buffer busy waits' (1,2ba2,1) wait

31: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

32: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

33: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

34: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

35: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

36: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

37: last wait for 'ksdxexeotherwait'

Blockers

~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource

then it will be given in square brackets. Below is a summary of the

waited upon resources, together with the holder of that resource.

Notes:

~~~~~

o A process id of '???' implies that the holder was not found in the

systemstate.

Resource Holder State

Enqueue CI-00000001-00000005 8: waiting for 'rdbms ipc reply'

Enqueue WF-00000000-00000000 22: 22: is waiting for 8:

Enqueue TX-0009001E-00000299 22: 22: is waiting for 8:

LOCK: handle=7ef56d78 28: waiting for 'log file switch (checkpoint incomplete)'

LOCK: handle=7ef56f88 30: waiting for 'buffer busy waits' (1,2ba2,1)

Object Names

~~~~~~~~~~~~

Enqueue CI-00000001-00000005

Enqueue WF-00000000-00000000

Enqueue TX-0009001E-00000299

LOCK: handle=7ef56d78 TABL:EXFSYS.RLM$SCHDNEGACTION

LOCK: handle=7ef56f88 TABL:EXFSYS.RLM$EVTCLEANUP

System State 2

~~~~~~~~~~~~~~~~

1:

2: waiting for 'pmon timer' wait

3: waiting for 'rdbms ipc message' wait

4: waiting for 'rdbms ipc message' wait

5: waiting for 'rdbms ipc message' wait

6: waiting for 'rdbms ipc message' wait

7: waiting for 'rdbms ipc message' wait

8: waiting for 'rdbms ipc reply' wait

9: waiting for 'rdbms ipc message' wait

10: waiting for 'rdbms ipc message' wait

11: waiting for 'log file switch (checkpoint incomplete)' wait

12: waiting for 'rdbms ipc message' wait

13:

14:

15: waiting for 'switch logfile command' wait

16: waiting for 'rdbms ipc message' wait

17: waiting for 'rdbms ipc message' wait

18: waiting for 'Streams AQ: qmn coordinator idle wait' wait

19: waiting for 'log file switch (checkpoint incomplete)' wait

Cmd: Insert

20: for 'Streams AQ: waiting for time management or cleanup tasks' wait

21: waiting for 'Streams AQ: qmn slave idle wait' wait

22: waiting for 'enq: CI - contention' [Enqueue CI-00000001-00000005] wait

23: waiting for 'enq: WF - contention' [Enqueue WF-00000000-00000000] wait

Cmd: PL/SQL Execute

24: waiting for 'SQL*Net message from client' wait

25: waiting for 'buffer busy waits' (2,9,11) wait

Cmd: Select

26: waiting for 'enq: TX - row lock contention'[Enqueue TX-0009001E-00000299] wait

27: waiting for 'log file switch (checkpoint incomplete)' wait

Cmd: Insert

28: waiting for 'log file switch (checkpoint incomplete)' wait

29: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

30: waiting for 'buffer busy waits' (1,2ba2,1) wait

31: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

32: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

33: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

34: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

35: waiting for 'library cache lock' [LOCK: handle=7ef56f88] wait

36: waiting for 'library cache lock' [LOCK: handle=7ef56d78] wait

37: last wait for 'ksdxexeotherwait'

Blockers

~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource

then it will be given in square brackets. Below is a summary of the

waited upon resources, together with the holder of that resource.

Notes:

~~~~~

o A process id of '???' implies that the holder was not found in the

systemstate.

Resource Holder State

Enqueue CI-00000001-00000005 8: waiting for 'rdbms ipc reply'

Enqueue WF-00000000-00000000 22: 22: is waiting for 8:

Enqueue TX-0009001E-00000299 22: 22: is waiting for 8:

LOCK: handle=7ef56d78 28: waiting for 'log file switch (checkpoint incomplete)'

LOCK: handle=7ef56f88 30: waiting for 'buffer busy waits' (1,2ba2,1)

Object Names

~~~~~~~~~~~~

Enqueue CI-00000001-00000005

Enqueue WF-00000000-00000000

Enqueue TX-0009001E-00000299

LOCK: handle=7ef56d78 TABL:EXFSYS.RLM$SCHDNEGACTION

LOCK: handle=7ef56f88 TABL:EXFSYS.RLM$EVTCLEANUP

List of Processes That May Be Stuck

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

8: waiting for 'rdbms ipc reply' wait

11: waiting for 'log file switch (checkpoint incomplete)' wait

15: waiting for 'switch logfile command' wait

18: waiting for 'Streams AQ: qmn coordinator idle wait' wait

19: waiting for 'log file switch (checkpoint incomplete)' wait

20: for 'Streams AQ: waiting for time management or cleanup tasks' wait

21: waiting for 'Streams AQ: qmn slave idle wait' wait

22: waiting for 'enq: CI - contention' wait

23: waiting for 'enq: WF - contention' wait

24: waiting for 'SQL*Net message from client' wait

25: waiting for 'buffer busy waits' (2,9,11) wait

26: waiting for 'enq: TX - row lock contention' wait

27: waiting for 'log file switch (checkpoint incomplete)' wait

28: waiting for 'log file switch (checkpoint incomplete)' wait

29: waiting for 'library cache lock' wait

30: waiting for 'buffer busy waits' (1,2ba2,1) wait

31: waiting for 'library cache lock' wait

32: waiting for 'library cache lock' wait

33: waiting for 'library cache lock' wait

34: waiting for 'library cache lock' wait

35: waiting for 'library cache lock' wait

36: waiting for 'library cache lock' wait
SO: 0x83b5fd60, type: 4, owner: 0x83a69028, flag: INIT/-/-/0x00

(session) sid: 133 trans: 0x822d5790, creator: 0x83a69028, flag: (41) USR/- BSY/-/-/-/-/-

DID: 0001-001C-00000002, short-term DID: 0000-0000-00000000

txn branch: (nil)

oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS

service name: SYS$USERS

O/S info: user: oracle, term: UNKNOWN, ospid: 4790, machine: ora10g

program: oracle@ora10g (J001)

waiting for 'log file switch (checkpoint incomplete)' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=12130

=0, =0, =0

Dumping Session Wait History

for 'log file switch (checkpoint incomplete)' count=1 wait_time=1107898

=0, =0, =0

for 'log file switch (checkpoint incomplete)' count=1 wait_time=336
...

PROCESS 8:

----------------------------------------

SO: 0x83a5f208, type: 2, owner: (nil), flag: INIT/-/-/0x00

(process) Oracle pid=8, calls cur/top: 0x83ba9a28/0x83ba9a28, flag: (16) SYSTEM

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 33

last post received-location: ksrpublish

last process to post me: 83a68840 1 0

last post sent: 0 0 24

last post sent-location: ksasnd

last process posted by me: 83a5ea20 1 6

(latch info) wait_event=0 bits=0

Process Group: DEFAULT, pseudo proc: 0x83aa67e8

O/S info: user: oracle, term: UNKNOWN, ospid: 3048

OSD pid info: Unix process pid: 3048, image: oracle@ora10g (SMON)

Short stack dump:
...

PROCESS 22:

----------------------------------------

SO: 0x83a660b8, type: 2, owner: (nil), flag: INIT/-/-/0x00

(process) Oracle pid=22, calls cur/top: 0x83bafe00/0x83bafb40, flag: (2) SYSTEM

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 24

last post sent-location: ksasnd

last process posted by me: 83a5e238 1 6

(latch info) wait_event=0 bits=0

Process Group: DEFAULT, pseudo proc: 0x83aa67e8

O/S info: user: oracle, term: UNKNOWN, ospid: 4402

OSD pid info: Unix process pid: 4402, image: oracle@ora10g (m000)

Short stack dump:

自此,把注意力转移至SMON进程,首先检查SMON的trace文件:
/u01/app/oracle/admin/ora10g/bdump/ora10g_smon_3048.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /u01/app/oracle/product/10.2.0

System name: Linux

Node name: ora10g

Release: 2.6.18-194.el5

Version: #1 SMP Tue Mar 16 21:52:39 EDT 2010

Machine: x86_64

Instance name: ora10g

Redo thread mounted by this instance: 1

Oracle process number: 8

Unix process pid: 3048, image: oracle@ora10g (SMON)

*** 2015-03-21 17:34:17.611

*** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:34:17.611

*** SESSION ID:(164.1) 2015-03-21 17:34:17.611

Waited for detached process: CKPT for 300 seconds:

*** 2015-03-21 17:34:17.611

Dumping diagnostic information for CKPT:

OS pid = 3046

loadavg : 0.00 0.00 0.00

memory info: free memory = 0.00M

swap info: free = 0.00M alloc = 0.00M total = 0.00M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g

[Thread debugging using libthread_db enabled]

0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#1 0x0000000003be9b09 in sskgpwwait ()

#2 0x0000000003bccdf0 in skgpwwait ()

#3 0x0000000000855f4a in ksdxsus ()

#4 0x0000000000857118 in ksdxffrz ()

#5 0x0000000000853003 in ksdxcb ()

#6 0x0000000001ebd0cf in sspuser ()

#7 <signal handler called>

#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#9 0x0000000003be9b09 in sskgpwwait ()

#10 0x0000000003bccdf0 in skgpwwait ()

#11 0x0000000000798319 in kslwaitns_timed ()

#12 0x00000000008c3b1d in kskthbwt ()

#13 0x0000000000797e54 in kslwait ()

#14 0x00000000029f3f3b in ksarcv ()

#15 0x000000000082e8bf in ksbabs ()

#16 0x0000000000835822 in ksbrdp ()

#17 0x0000000002f4d840 in opirip ()

#18 0x000000000132b016 in opidrv ()

#19 0x0000000001eb3146 in sou2o ()

#20 0x0000000000723245 in opimai_real ()

#21 0x00000000007230fc in main ()

A debugging session is active.

Inferior 1 [process 3046] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]


*** 2015-03-21 17:34:19.125

*** 2015-03-21 17:34:29.129

Waited for detached process: CKPT for 310 seconds:

*** 2015-03-21 17:34:29.129

Dumping diagnostic information for CKPT:

OS pid = 3046

loadavg : 0.07 0.02 0.00

memory info: free memory = 0.00M

swap info: free = 0.00M alloc = 0.00M total = 0.00M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g

[Thread debugging using libthread_db enabled]

0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#1 0x0000000003be9b09 in sskgpwwait ()

#2 0x0000000003bccdf0 in skgpwwait ()

#3 0x0000000000855f4a in ksdxsus ()

#4 0x0000000000857118 in ksdxffrz ()

#5 0x0000000000853003 in ksdxcb ()

#6 0x0000000001ebd0cf in sspuser ()

#7 <signal handler called>

#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#9 0x0000000003be9b09 in sskgpwwait ()

#10 0x0000000003bccdf0 in skgpwwait ()

#11 0x0000000000798319 in kslwaitns_timed ()

#12 0x00000000008c3b1d in kskthbwt ()

#13 0x0000000000797e54 in kslwait ()

#14 0x00000000029f3f3b in ksarcv ()

#15 0x000000000082e8bf in ksbabs ()

#16 0x0000000000835822 in ksbrdp ()

#17 0x0000000002f4d840 in opirip ()

#18 0x000000000132b016 in opidrv ()

#19 0x0000000001eb3146 in sou2o ()

#20 0x0000000000723245 in opimai_real ()

#21 0x00000000007230fc in main ()

A debugging session is active.

Inferior 1 [process 3046] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]

*** 2015-03-21 17:34:29.895

*** 2015-03-21 17:34:39.899

Waited for detached process: CKPT for 320 seconds:

*** 2015-03-21 17:34:39.899

Dumping diagnostic information for CKPT:

OS pid = 3046

loadavg : 0.06 0.01 0.00

memory info: free memory = 0.00M

swap info: free = 0.00M alloc = 0.00M total = 0.00M

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g

[Thread debugging using libthread_db enabled]

0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#1 0x0000000003be9b09 in sskgpwwait ()

#2 0x0000000003bccdf0 in skgpwwait ()

#3 0x0000000000855f4a in ksdxsus ()

#4 0x0000000000857118 in ksdxffrz ()

#5 0x0000000000853003 in ksdxcb ()

#6 0x0000000001ebd0cf in sspuser ()

#7 <signal handler called>

#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6

#9 0x0000000003be9b09 in sskgpwwait ()

#10 0x0000000003bccdf0 in skgpwwait ()

#11 0x0000000000798319 in kslwaitns_timed ()

#12 0x00000000008c3b1d in kskthbwt ()

#13 0x0000000000797e54 in kslwait ()

#14 0x00000000029f3f3b in ksarcv ()

#15 0x000000000082e8bf in ksbabs ()

#16 0x0000000000835822 in ksbrdp ()

#17 0x0000000002f4d840 in opirip ()

#18 0x000000000132b016 in opidrv ()

#19 0x0000000001eb3146 in sou2o ()

#20 0x0000000000723245 in opimai_real ()

#21 0x00000000007230fc in main ()

A debugging session is active.

Inferior 1 [process 3046] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]

*** 2015-03-21 17:34:40.598

从日志文件看出,SMON进程在等待与CKPT进程通信,难道CKPT进程异常吗?再将注意力转移至CKPT,查看CKPT的trace文件:
*** 2015-03-21 17:26:15.972

*** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:26:15.972

*** SESSION ID:(165.1) 2015-03-21 17:26:15.972

Received ORADEBUG command 'suspend' from process Unix process pid: 3062, image: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<问题出在这里!!!!!

Received ORADEBUG command 'tracefile_name' from process Unix process pid: 3062, image:

自此,我们就明白了事情的缘由经过:

1. 由于CKPT进程被挂起,使得检查点无法完成、SMON进程长时间持有资源Enqueue CI-00000001-00000005,最后导致数据库HANG住;
2. 随着资源Enqueue CI-00000001-00000005被SMON进程持有,导致一系列等待事件library cache lock,enq: CI - contention,log file switch (checkpoint incomplete)的出现

重新把CKPT进程恢复:

SQL> oradebug setospid 3046

Oracle pid: 7, Unix process pid: 3046, image: oracle@ora10g (CKPT)

SQL> oradebug resume;

Statement processed.
SQL> select sid,event,p1,p2,p3,wait_time,seconds_in_wait,state from v$session_wait where wait_class <> 'Idle';

SID EVENT P1 P2 P3 WAIT_TIME SECONDS_IN_WAIT STATE

---------- ---------------------------------------------------------------- ---------- ---------- ---------- ---------- --------------- -------------------

到此,数据库恢复正常

---------------------------------------------------

道行尚浅,欢迎拍砖。

转载请注明出处.
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: