11.2.0.3 ASM实例出现ORA-4031错误导致数据库归档失败
2014-09-07 09:25
621 查看
环境:
平台:RedHat EnterPrise 5.8 X86_X64
数据库:Oracle EnterPrise 11.2.0.3
集群软件:Oracle grid 11.2.0.3
故障现象:
数据库出现了归档失败,报错 ARC3 ERROR 19504(alert_hdwkxt2.log)
日志信息如下:
[plain]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/9e12f1d3e499fc949c886e7c9e0484f9)
Fri Feb 28 19:49:04 2014
ARC1: Error 19504 Creating archive log file to '+DATA02'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16038: log 14 sequence# 68244 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
Archiver process freed from errors. No longer stopped
Fri Feb 28 19:50:22 2014
ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_3
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16014: log 14 sequence# 68244 not archived, no available destinations
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
ARC0: Archive log rejected (thread 1 sequence 68240) at host 'orclsh'
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance orcl1 - Archival Error. Archiver continuing.
分析:
由于归档失败发生在ASM磁盘上,首先检查ASM磁盘空间以及DB_RECOVERY_FILE_DEST_SIZE,ASM磁盘空间是足够的,而且由于只有一个节点出现出现了无法归档的问题,也可以排除是空间不足造成的。确认两个节点的DB_RECOVERY_FILE_DEST_SIZE参数设置都是0,基本上可以判断问题和当前节点的ASM实例状态不正常有关。
检查ASM实例的错误信息:
[plain]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/9e12f1d3e499fc949c886e7c9e0484f9)
Fri Feb 28 19:41:23 2014
Dumping diagnostic data in directory=[cdmp_20130702164115], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165521].
Fri Feb 28 19:49:19 2014
Dumping diagnostic data in directory=[cdmp_20130702164845], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165522].
Fri Feb 28 19:55:56 2014
Dumping diagnostic data in directory=[cdmp_20130702165517], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
当前节点ASM实例出现了的这个信息,说明报错发生在实例2上:
[plain]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/9e12f1d3e499fc949c886e7c9e0484f9)
Fri Feb 28 18:34:25 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186256):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Insufficient shared pool to allocate a GES object (ospid 2032294)
Fri Feb 28 18:29:53 2014
Sweep [inc][186256]: completed
Fri Feb 28 18:36:49 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186257):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Insufficient shared pool to allocate a GES object (ospid 2032294)
果然实例2上的ASM出现了大量ORA-4031错误。检查ASM启动的参数配置:
[sql]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/9e12f1d3e499fc949c886e7c9e0484f9)
Fri Feb 28 20:06:55 2012
NOTE: No asm libraries found in the system
ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK1)
ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK2)
MEMORY_TARGET defaulting to 411041792.
* instance_number obtained from CSS = 2, checking
for the existence of node 0...
* node 0 does not exist. instance_number = 2
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'en1' configured
from GPnP for use as a private interconnect.
[name='en1', type=1, ip=169.254.78.6, mac=00-1a-64-bb-50-7d, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface
'en0' configured from GPnP
for use as a public interface.
[name='en0', type=1, ip=10.1.16.35, mac=00-1a-64-bb-50-7c, net=10.1.16.32/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value
as /u01/app/11.2.0.3/grid/dbs/arch
Autotune of undo retention
is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NOTE: Volume support enabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters
and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0.3/grid
System name: AIX
Node name: orcldb2
Release: 1
Version: 6
Machine: 00C94E064C00
Using parameter settings in server-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM2.ora
System parameters with non-default
values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "/dev/ocr_*"
asm_diskstring = "/dev/voting_*"
asm_diskstring = "/dev/asm_*"
asm_diskgroups = "DATA"
asm_diskgroups = "DATA_DG01"
asm_diskgroups = "SPFILE_DG"
asm_power_limit = 1
diagnostic_dest = "/u01/app/grid"
Cluster communication is configured
to use the following interface(s) for this instance
169.254.78.6
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
调整及建议:
当前ASM实例使用默认的MEMORY_TARGET配置,分配大小大约是400M,根据Oracle的MOS文章:ASM & Shared Pool (ORA-4031) [ID 437924.1],在11.2.0.3中,Oracle增加了ASM实例所允许的默认进程数PROCESSES,但是默认的MEMORY_TARGET参数没有增加。
根据Oracle的建议,11.2.0.3的MEMORY_TARGET至少应该设置到1536M,而MEMORY_MAX_TARGET设置为4096M。
[sql]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/9e12f1d3e499fc949c886e7c9e0484f9)
SQL> alter system
set memory_max_target=4096m scope=spfile;
SQL> alter system
set memory_target=1536m scope=spfile;
对于当前的情况,如果短时间内无法重启DB和ASM实例,可以在问题节点配置一个本地归档路径,设置目标路径为本地磁盘,从而避免归档无法完成而导致的实例HANG死。
平台:RedHat EnterPrise 5.8 X86_X64
数据库:Oracle EnterPrise 11.2.0.3
集群软件:Oracle grid 11.2.0.3
故障现象:
数据库出现了归档失败,报错 ARC3 ERROR 19504(alert_hdwkxt2.log)
日志信息如下:
[plain]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
Fri Feb 28 19:49:04 2014
ARC1: Error 19504 Creating archive log file to '+DATA02'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16038: log 14 sequence# 68244 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
Archiver process freed from errors. No longer stopped
Fri Feb 28 19:50:22 2014
ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_3
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance orcl1 - Archival Error
ORA-16014: log 14 sequence# 68244 not archived, no available destinations
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883'
ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889'
ARC0: Archive log rejected (thread 1 sequence 68240) at host 'orclsh'
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance orcl1 - Archival Error. Archiver continuing.
Fri Feb 28 19:49:04 2014 ARC1: Error 19504 Creating archive log file to '+DATA02' ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance orcl1 - Archival Error ORA-16038: log 14 sequence# 68244 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883' ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889' Archiver process freed from errors. No longer stopped Fri Feb 28 19:50:22 2014 ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_3 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance orcl1 - Archival Error ORA-16014: log 14 sequence# 68244 not archived, no available destinations ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.264.792274883' ORA-00312: online log 14 thread 1: '+DATA02/orcl/onlinelog/group_14.265.792274889' ARC0: Archive log rejected (thread 1 sequence 68240) at host 'orclsh' FAL[server, ARC0]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance orcl1 - Archival Error. Archiver continuing.
分析:
由于归档失败发生在ASM磁盘上,首先检查ASM磁盘空间以及DB_RECOVERY_FILE_DEST_SIZE,ASM磁盘空间是足够的,而且由于只有一个节点出现出现了无法归档的问题,也可以排除是空间不足造成的。确认两个节点的DB_RECOVERY_FILE_DEST_SIZE参数设置都是0,基本上可以判断问题和当前节点的ASM实例状态不正常有关。
检查ASM实例的错误信息:
[plain]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
Fri Feb 28 19:41:23 2014
Dumping diagnostic data in directory=[cdmp_20130702164115], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165521].
Fri Feb 28 19:49:19 2014
Dumping diagnostic data in directory=[cdmp_20130702164845], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165522].
Fri Feb 28 19:55:56 2014
Dumping diagnostic data in directory=[cdmp_20130702165517], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
Fri Feb 28 19:41:23 2014 Dumping diagnostic data in directory=[cdmp_20130702164115], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165521]. Fri Feb 28 19:49:19 2014 Dumping diagnostic data in directory=[cdmp_20130702164845], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165522]. Fri Feb 28 19:55:56 2014 Dumping diagnostic data in directory=[cdmp_20130702165517], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
当前节点ASM实例出现了的这个信息,说明报错发生在实例2上:
[plain]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
Fri Feb 28 18:34:25 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186256):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Insufficient shared pool to allocate a GES object (ospid 2032294)
Fri Feb 28 18:29:53 2014
Sweep [inc][186256]: completed
Fri Feb 28 18:36:49 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186257):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Insufficient shared pool to allocate a GES object (ospid 2032294)
Fri Feb 28 18:34:25 2014 Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186256): ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Insufficient shared pool to allocate a GES object (ospid 2032294) Fri Feb 28 18:29:53 2014 Sweep [inc][186256]: completed Fri Feb 28 18:36:49 2014 Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186257): ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Insufficient shared pool to allocate a GES object (ospid 2032294)
果然实例2上的ASM出现了大量ORA-4031错误。检查ASM启动的参数配置:
[sql]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
Fri Feb 28 20:06:55 2012
NOTE: No asm libraries found in the system
ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK1)
ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK2)
MEMORY_TARGET defaulting to 411041792.
* instance_number obtained from CSS = 2, checking
for the existence of node 0...
* node 0 does not exist. instance_number = 2
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'en1' configured
from GPnP for use as a private interconnect.
[name='en1', type=1, ip=169.254.78.6, mac=00-1a-64-bb-50-7d, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface
'en0' configured from GPnP
for use as a public interface.
[name='en0', type=1, ip=10.1.16.35, mac=00-1a-64-bb-50-7c, net=10.1.16.32/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value
as /u01/app/11.2.0.3/grid/dbs/arch
Autotune of undo retention
is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NOTE: Volume support enabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters
and Automatic Storage Management options.
ORACLE_HOME = /u01/app/11.2.0.3/grid
System name: AIX
Node name: orcldb2
Release: 1
Version: 6
Machine: 00C94E064C00
Using parameter settings in server-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM2.ora
System parameters with non-default
values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "/dev/ocr_*"
asm_diskstring = "/dev/voting_*"
asm_diskstring = "/dev/asm_*"
asm_diskgroups = "DATA"
asm_diskgroups = "DATA_DG01"
asm_diskgroups = "SPFILE_DG"
asm_power_limit = 1
diagnostic_dest = "/u01/app/grid"
Cluster communication is configured
to use the following interface(s) for this instance
169.254.78.6
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Fri Feb 28 20:06:55 2012 NOTE: No asm libraries found in the system ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK1) ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK2) MEMORY_TARGET defaulting to 411041792. * instance_number obtained from CSS = 2, checking for the existence of node 0... * node 0 does not exist. instance_number = 2 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Private Interface 'en1' configured from GPnP for use as a private interconnect. [name='en1', type=1, ip=169.254.78.6, mac=00-1a-64-bb-50-7d, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62] Public Interface 'en0' configured from GPnP for use as a public interface. [name='en0', type=1, ip=10.1.16.35, mac=00-1a-64-bb-50-7c, net=10.1.16.32/27, mask=255.255.255.224, use=public/1] Picked latch-free SCN scheme 3 Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.3/grid/dbs/arch Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled NOTE: Volume support enabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options. ORACLE_HOME = /u01/app/11.2.0.3/grid System name: AIX Node name: orcldb2 Release: 1 Version: 6 Machine: 00C94E064C00 Using parameter settings in server-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM2.ora System parameters with non-default values: large_pool_size = 12M instance_type = "asm" remote_login_passwordfile= "EXCLUSIVE" asm_diskstring = "/dev/ocr_*" asm_diskstring = "/dev/voting_*" asm_diskstring = "/dev/asm_*" asm_diskgroups = "DATA" asm_diskgroups = "DATA_DG01" asm_diskgroups = "SPFILE_DG" asm_power_limit = 1 diagnostic_dest = "/u01/app/grid" Cluster communication is configured to use the following interface(s) for this instance 169.254.78.6 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2
调整及建议:
当前ASM实例使用默认的MEMORY_TARGET配置,分配大小大约是400M,根据Oracle的MOS文章:ASM & Shared Pool (ORA-4031) [ID 437924.1],在11.2.0.3中,Oracle增加了ASM实例所允许的默认进程数PROCESSES,但是默认的MEMORY_TARGET参数没有增加。
根据Oracle的建议,11.2.0.3的MEMORY_TARGET至少应该设置到1536M,而MEMORY_MAX_TARGET设置为4096M。
[sql]
view plaincopyprint?
![](https://oscdn.geek-share.com/Uploads/Images/Content/201611/a7c8e286f463007e2a900848b93dd72c.png)
SQL> alter system
set memory_max_target=4096m scope=spfile;
SQL> alter system
set memory_target=1536m scope=spfile;
SQL> alter system set memory_max_target=4096m scope=spfile; SQL> alter system set memory_target=1536m scope=spfile;
对于当前的情况,如果短时间内无法重启DB和ASM实例,可以在问题节点配置一个本地归档路径,设置目标路径为本地磁盘,从而避免归档无法完成而导致的实例HANG死。
相关文章推荐
- 11.2.0.3 ASM实例出现ORA-4031错误导致数据库归档失败
- 11.2.0.3 ASM实例出现ORA-4031导致数据库归档失败
- pfile文件错误导致数据库实例启动失败
- 连接ASM实例后出现ORA-1012错误1.2.3
- Oracle技术之ASM上恢复STANDBY数据库出现ORA-15173错误
- Oracle数据库案例整理-Oracle系统运行时故障-tnsnames.ora文件配置错误导致数据库TNSping失败
- oracle数据库连接时提示ora-00257错误,提示数据库归档日志归档失败
- myeclipse中建立oracle数据库连接时出现ora-12705错误的解决办法
- vs.net C# 连接数据库 出现错误 用户 '(null)' 登录失败。原因: 未与信任 SQL Server 连接相关联
- 安装oracle10g数据库出现错误oui-25031一些configuration assistant失败
- ora-00257数据库日志归档器错误解决方法
- 在破坏掉ASM 某个磁盘组对应的磁盘后,出现如下的错误,ORA-15001, 并引起instance 关闭.
- SPFILE 错误导致数据库无法启动(ORA-01565)
- 解决数据库实例启动ORA-00205或控制文件损坏错误
- 连接数据库经常出现ORA-12545错误
- Asp.Net : 捕捉和记录网站中出现的所有未处理错误,抛出详细的页面来源和访问ip,调用的接口方法及异常实例(记事本日志,系统日志及数据库日志)
- SPFILE 错误导致数据库无法启动(ORA-01565)
- Oracle 10g 关闭数据库操作不当,出现ORA-10997/09968的错误
- 向数据库中写入数字时出现ORA-02036错误的原因
- 关于dedecms近期频繁出现[错误警告:连接数据库失败..]错误的解决方法