Solaris: Process spins/ASM and DB Crash if RAC Instance Is Up For > 248 Days by LMHB with ORA-29770 (Doc ID 2159643.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.4 to 12.1.0.2 [Release 10.2 to 12.1]
Oracle Solaris on SPARC (64-bit)
DESCRIPTION
ASM or DB processes may start spinning in a RAC environment after the instance has been running continuously for more than 248 days. This issue affects only Solaris platforms and is due to a faulty C compiler optimization.
The same problem can also affect non-RAC/ASM sessions (in particular if SQLNET.EXPIRE_TIME is used, but this is not a requirement to hit the problem).
OCCURRENCE
Affects only Solaris SPARC, with Oracle RAC or with non-RAC instances using ASM.
SYMPTOMS
ASM or DB processes may start spinning in a RAC environment after the instance has been running continuously for more than 248 days. This issue affects only Solaris platforms and is due to a faulty C compiler optimization. The same problem can also affect non-RAC/ASM sessions (in particular if SQLNET.EXPIRE_TIME is used, but this is not a requirement to hit the problem).

The RAC instance (DB or ASM) has been up for more than 248 days continuously with no shutdown. The spinning processes show stacks similar to:

sslssalck <- sskgxp_alarm_set <- skgxp_setalarm() <- sslsstehdlr() <- __sighndlr() <- call_user_handler() <- __pollsys() <- _pollsys() ...

Instance crashes can occur after one or more processes start to spin, due to various timeouts and blocked resources, so the crash symptoms caused by this issue can vary.

On non-RAC/ASM systems, user sessions are seen to spin with a stack like the one above, especially if SQLNET.EXPIRE_TIME is set in the server-side SQLNET.ORA. (Note: it is NOT a requirement to have EXPIRE_TIME set to encounter this bug, but if it is set the issue may be more visible.)

Example traces below are from 11gR2 on a two-node SPARC (64-bit) cluster.
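The note attributes the problem to a faulty C compiler optimization and does not spell out the mechanism, but the specific 248-day threshold is consistent with a signed 32-bit counter of 10 ms (centisecond) ticks wrapping around, since 2^31 centiseconds is roughly 248.55 days. That interpretation is an inference, not something stated in the note; the arithmetic itself is easy to check:

```python
# A signed 32-bit tick counter wraps at 2**31 ticks.
# At 100 ticks per second (10 ms / centisecond resolution),
# express that wrap point in days.
ticks = 2**31
seconds = ticks / 100          # 21,474,836.48 s
days = seconds / 86400         # seconds per day
print(f"{days:.2f} days")      # ~248.55 days
```

This matches the ">248 days" uptime condition described above.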
LMHB terminated one of the instances.
The following errors were reported in the Alert log:
Errors in file /oracle/product/diag/rdbms/<dbname>/<sid>/trace/<sid>_lmhb_29680.trc (incident=144129):
ORA-29770: global enqueue process LMON (OSID 29660) is hung for more than 70 seconds
ERROR: Some process(s) is not making progress.
LMHB (ospid: 29680) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 29680): terminating the instance due to error 29770

For each of the LM BG processes (LMON, LMD0, LMS0, LMS1 and LCK0 in this case), the trace file shows information similar to the following:
*** 2012-06-17 02:06:31.607
==============================
LMON (ospid: 29660) has not moved for 30 sec (1339891590.1339891560)
 : waiting for event 'rdbms ipc message' for 25 secs with wait_id 3381334669.
===[ Wait Chain ]===
Wait chain is empty.

*** 2012-06-17 02:06:36.617
==============================
LMD0 (ospid: 29662) has not moved for 32 sec (1339891595.1339891563)
 : waiting for event 'ges remote message' for 32 secs with wait_id 2766541183.
===[ Wait Chain ]===
Wait chain is empty.
Note that in the trace output, the wait_id associated with each of the BG processes does not change throughout the LMHB trace file.
Hence, in this example, all LMON 'waiting for event' reports in the trace file show the same wait_id (3381334669).
A non-RAC ASM instance crashed after the following error messages:
Sat Dec 28 15:53:27 2013
NOTE: ASM client db0:db died unexpectedly.
NOTE: Process state recorded in trace file /opt/app/oracle/diag/asm/+asm/+ASM0/trace/+ASM0_ora_295.trc
Sat Dec 28 15:54:06 2013
Errors in file /opt/app/oracle/diag/asm/+asm/+ASM0/trace/+ASM0_pmon_28911.trc:
ORA-00490: PSP process terminated with error
PMON (ospid: 28911): terminating the instance due to error 490
Sat Dec 28 15:54:09 2013
ORA-1092 : opitsk aborting process
Sat Dec 28 15:54:09 2013
License high water mark = 6
Instance terminated by PMON, pid = 28911
USER (ospid: 3728): terminating the instance
Instance terminated by USER, pid = 3728
Sat Dec 28 15:54:15 2013
Starting ORACLE instance (normal)

The instance had been up for about 248 days (last startup 2013-04-24, crash on 2013-12-28). The alert log shows the startup:

Wed Apr 24 02:36:39 2013
Starting ORACLE instance (normal)
....

and the crash:

Sat Dec 28 15:54:06 2013
Errors in file /opt/app/oracle/diag/asm/+asm/+ASM0/trace/+ASM0_pmon_28911.trc:
ORA-00490: PSP process terminated with error
PMON (ospid: 28911): terminating the instance due to error 490
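The interval between the startup and crash timestamps in this example can be verified directly; it comes out to exactly 248 days:

```python
from datetime import date

# Dates taken from the alert log excerpts above.
startup = date(2013, 4, 24)   # "Starting ORACLE instance (normal)"
crash = date(2013, 12, 28)    # PMON terminated the instance

print((crash - startup).days)  # 248
```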
WORKAROUND
As a workaround, restart the instance before it reaches 248 days of continuous uptime, or whenever you hit this problem.
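A minimal sketch of an uptime check that could support the workaround above. It assumes the instance startup time has already been fetched (for example from the STARTUP_TIME column of V$INSTANCE); the 240-day threshold is an illustrative safety margin below the 248-day mark, not a value from this note:

```python
from datetime import datetime

def days_until_restart_needed(startup_time, threshold_days=240, now=None):
    """Return how many days remain before a planned restart.

    threshold_days is deliberately below 248 to leave a maintenance
    window margin (illustrative value, not from the note).
    A negative result means the instance is already past the threshold.
    """
    now = now or datetime.now()
    uptime_days = (now - startup_time).days
    return threshold_days - uptime_days

# Example: instance started 100 days before the check date.
print(days_until_restart_needed(datetime(2013, 4, 24),
                                now=datetime(2013, 8, 2)))  # 140
```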
PATCHES
Request and apply the patch for bug 18740837, or apply the 12.1.0.2.160719 (Jul 2016) Database Patch Set Update (DB PSU).
HISTORY
07/11/2016 - note created
08/05/2016 - first published