Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions
2015-01-09 09:53
In this Document
Symptoms
Changes
Cause
Solution
Unsetting/Removing diagwait
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.5 to 11.1.0.7 [Release 10.1 to 11.1]
HP-UX PA-RISC (64-bit)
Linux x86
IBM AIX on POWER Systems (64-bit)
Oracle Solaris on SPARC (64-bit)
HP-UX Itanium
Red Hat Enterprise Linux Advanced Server x86-64 (AMD Opteron Architecture)
Red Hat Enterprise Linux Advanced Server Itanium
Oracle Solaris on x86-64 (64-bit)
Linux x86-64
UnitedLinux Itanium
***Checked for relevance on 10-Jul-2014***
SYMPTOMS
Oracle Clusterware evicts the node from the cluster when:
- Node is not pinging via the network heartbeat
- Node is not pinging the Voting disk
- Node is hung/busy and is unable to perform either of the earlier tasks
In most cases when a node is evicted, information is written to the logs that can be used to analyze the cause of the eviction. In certain cases, however, this information may be missing. The steps documented in this note are intended for those cases where there is little or no information to diagnose the cause of the eviction, on Clusterware versions earlier than 11gR2 (11.2.0.1).
Starting with 11.2.0.1, customers do not need to set diagwait because the architecture has changed.
CHANGES
None
CAUSE
When a node is evicted while it is extremely busy in terms of CPU (or starved of it), the OS may not have had time to flush the logs/traces to the file system. It may be useful to set the diagwait attribute to delay the node reboot, giving the OS additional time to write the traces. This setting provides more time for diagnostic data to be collected safely and will NOT increase the probability of corruption.
After diagwait is set, the Clusterware waits an additional 10 seconds (diagwait - reboottime) before rebooting the node. Customers can unset diagwait by following the steps documented below after fixing their OS scheduling issues.
* -- Diagwait can be set on Windows, but it does not change the behaviour as it does on Unix/Linux platforms.
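As a worked example of the arithmetic above, the extra delay is simply diagwait minus reboottime. The sketch below assumes a reboottime of 3 seconds, which is the value implied by the figures in this note (13 and 10) rather than one the note states explicitly. The function name is illustrative only.

```shell
#!/bin/sh
# Extra seconds the reboot is delayed once diagwait is set.
# $1 = diagwait in seconds, $2 = reboottime in seconds.
# reboottime=3 is an assumption inferred from the "13" and "10" in this note.
diagwait_extra_delay() {
    echo $(( $1 - $2 ))
}

diagwait_extra_delay 13 3   # prints 10
```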
@ For internal Support Staff
The diagwait attribute was introduced in 10.2.0.3 and is included in 10.2.0.4, 11.1.0.6, and higher releases. It was subsequently backported to 10.1.0.5 on most platforms. This means it is possible to set diagwait on 10.1.0.5 (or higher), 10.2.0.3 (or higher), and 11.1.0.6 (or higher). If the command crsctl set/get css diagwait reports "unrecognized parameter diagwait specified", it can be safely assumed that the Clusterware version does not have the necessary fixes to implement diagwait. In that case the customer is advised to apply the latest available patchset before attempting to set diagwait.
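The three outcomes this note describes for the diagwait query (a value, "not defined", or "unrecognized parameter") can be told apart with a small helper. This is a minimal sketch: the function name and the result labels are hypothetical, and it assumes that a set value is printed as a bare number, which the note implies ("should return 13") but does not show verbatim.

```shell
#!/bin/sh
# Classify the text printed by `crsctl get css diagwait`.
# Prints one of: set:<seconds>, unset, unsupported, unknown.
# The message fragments are the ones quoted in this note; the labels are made up here.
classify_diagwait_output() {
    case "$1" in
        *"unrecognized parameter diagwait"*) echo "unsupported" ;;  # patchset needed
        *"diagwait is not defined"*)         echo "unset" ;;        # never set, or unset
        *[!0-9]*|"")                         echo "unknown" ;;      # anything else
        *)                                   echo "set:$1" ;;       # bare number (assumed format)
    esac
}

# On a cluster node this would be called as (not executed here):
#   classify_diagwait_output "$(crsctl get css diagwait 2>&1)"
```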
SOLUTION
The Clusterware stack must be down on all nodes when changing diagwait. The following steps set diagwait.
1. Execute as root:
#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop
2. Ensure that the Clusterware stack is down on all nodes by executing:
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes. If there are Clusterware processes running and you proceed to the next step, you will corrupt your OCR. Do not continue until the Clusterware processes are down on all the nodes of the cluster.
3. From one node of the cluster, change the value of the "diagwait" parameter to 13 seconds by issuing the following command as root:
#crsctl set css diagwait 13 -force
Check that diagwait was set successfully by executing the following command. The command should return 13. If diagwait is not set, the message "Configuration parameter diagwait is not defined" is returned.
#crsctl get css diagwait
4. Restart Oracle Clusterware on all the nodes by executing:
#crsctl start crs
5. Validate that the node is running by executing:
#crsctl check crs
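Because proceeding while Clusterware processes are still running risks corrupting the OCR, the process check above is worth scripting as a guard. The sketch below is one hedged way to do that, with hypothetical function names. It reads ps output from standard input and drops lines containing "grep", since the egrep command itself can show up in the ps listing and would otherwise look like a running daemon. It checks only the node it runs on, so it would have to be repeated on every node of the cluster.

```shell
#!/bin/sh
# Reads `ps -ef` style output on stdin and prints any Clusterware daemon lines.
# Lines containing "grep" are dropped so the search command does not match itself.
clusterware_procs() {
    grep -E 'crsd\.bin|ocssd\.bin|evmd\.bin|oprocd' | grep -v 'grep' || true
}

# Succeeds (exit 0) only when no Clusterware daemon appears in the ps output on stdin.
stack_is_down() {
    [ -z "$(clusterware_procs)" ]
}

# Example use on a node, as root (shown as a comment, not executed here):
#   if ps -ef | stack_is_down; then
#       crsctl set css diagwait 13 -force
#   else
#       echo "Clusterware still running; do not change diagwait" >&2
#   fi
```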
Unsetting/Removing diagwait
Customers should not unset diagwait without fixing the OS scheduling issues, as that can lead to node evictions via reboot. Diagwait delays the node eviction (and reconfiguration) by diagwait (13) seconds, and as such setting diagwait does not affect most customers. If diagwait needs to be removed, follow the steps above, replacing the command in step 3 with the following:
#crsctl unset css diagwait -force
(Note: the -force option must be used when unsetting diagwait since CRS will be down when doing so)
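The set and unset procedures differ only in the third command, which the following dry-run sketch makes explicit. It prints the command sequence from this note without executing anything; the function name is hypothetical, <CRS_HOME> is left as the placeholder used in this note, and keeping the crsctl get check in the unset sequence is a choice made here (it would then be expected to report that diagwait is not defined).

```shell
#!/bin/sh
# Print (do not run) the command sequence from this note.
# $1 = "set" or "unset"; $2 = diagwait seconds, used only with "set" (default 13).
diagwait_plan() {
    case "$1" in
        set)   step3="crsctl set css diagwait ${2:-13} -force" ;;
        unset) step3="crsctl unset css diagwait -force" ;;
        *)     echo "usage: diagwait_plan set [seconds] | unset" >&2; return 1 ;;
    esac
    # Only the fourth line differs between the two modes.
    printf '%s\n' \
        'crsctl stop crs' \
        '<CRS_HOME>/bin/oprocd stop' \
        'ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"' \
        "$step3" \
        'crsctl get css diagwait' \
        'crsctl start crs' \
        'crsctl check crs'
}

diagwait_plan unset
```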
REFERENCES
NOTE:726833.1 - Linux: Hangcheck-Timer Module Requirements for Oracle 9i, 10g, and 11gR1 RAC