11g OCR and Voting Disk Corruption Recovery
2015-08-28 13:43
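Overview: On 11g RAC, the ora.cssd and ora.crsd resources often fail to start. A cssd failure is usually caused by a damaged voting disk or by too few voting disks; a crsd failure is usually caused by a damaged OCR or damaged cluster configuration information.

1. By default the OCR is backed up automatically every 4 hours. The backup location normally holds five automatic backups: the three most recent 4-hourly backups (backup00.ocr to backup02.ocr), one daily backup (day.ocr) and one weekly backup (week.ocr).

    [grid@rac1 rac1]$ ocrconfig -showbackup
    rac2     2015/08/25 14:54:37     /u01/app/11.2/grid/cdata/rac-cluster/backup00.ocr
    rac2     2015/08/24 21:12:34     /u01/app/11.2/grid/cdata/rac-cluster/backup01.ocr
    rac2     2015/08/24 17:12:34     /u01/app/11.2/grid/cdata/rac-cluster/backup02.ocr
    rac2     2015/08/24 13:12:33     /u01/app/11.2/grid/cdata/rac-cluster/day.ocr
    rac1     2015/08/13 13:12:12     /u01/app/11.2/grid/cdata/rac-cluster/week.ocr

2. Manually back up the OCR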
A manual backup is taken as root with ocrconfig -manualbackup. Listing the backups afterwards shows the new manual backup as the last entry:

    [root@rac1 grid]# ocrconfig -showbackup
    rac2     2015/08/25 14:54:37     /u01/app/11.2/grid/cdata/rac-cluster/backup00.ocr
    rac2     2015/08/24 21:12:34     /u01/app/11.2/grid/cdata/rac-cluster/backup01.ocr
    rac2     2015/08/24 17:12:34     /u01/app/11.2/grid/cdata/rac-cluster/backup02.ocr
    rac2     2015/08/24 13:12:33     /u01/app/11.2/grid/cdata/rac-cluster/day.ocr
    rac1     2015/08/13 13:12:12     /u01/app/11.2/grid/cdata/rac-cluster/week.ocr
    rac1     2015/08/28 09:09:18     /u01/app/11.2/grid/cdata/rac-cluster/backup_20150828_090918.ocr

3. Simulate OCR disk corruption

Check which disks the OCR and voting files use:
    [root@rac1 grid]# crsctl query css votedisk
    ##  STATE    File Universal Id                File Name       Disk group
    --  -----    -----------------                ---------       ---------
     1. ONLINE   1da20ec3577a4fa9bf2882a391d66afb (/dev/raw/raw1) [DATA]

Simulate corruption of the OCR disk:
    dd if=/dev/zero of=/dev/raw/raw1 bs=4K count=100

4. Start the cluster and open the cluster alert log
    [grid@rac1 ~]$ cd /u01/app/11.2/grid/log/rac1/
    [grid@rac1 rac1]$ tail -f alertrac1.log
    [root@rac1 grid]# crsctl start cluster -all

The alert log shows the following errors:
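    2015-08-28 09:22:17.471: [ohasd(1990)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
    2015-08-28 09:22:17.471: [ohasd(1990)]CRS-2769:Unable to failover resource 'ora.diskmon'.
    2015-08-28 09:22:30.845: [cssd(7243)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2/grid/log/rac1/cssd/ocssd.log

In other words, the cluster cannot find any voting file. The OCR holds the cluster configuration, and in this environment the OCR and the voting file share the same disk (/dev/raw/raw1 in disk group DATA), so this is the expected result of overwriting that disk with dd.

The following steps restore the OCR and restart the cluster:

5. Stop the cluster on all nodes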
    [root@rac1 grid]# crsctl stop crs -f

If the stack cannot be stopped this way, use the following:
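    ps -elf | egrep "PID|d.bin|ohas|oraagent|orarootagent|cssdagent|cssdmonitor" | grep -v grep

With this form, each PID in the output has to be killed by hand with kill -9. Alternatively, kill them all in one pass (column 4 of ps -elf is the PID):

    ps -elf | egrep "d.bin|ohas|oraagent|orarootagent|cssdagent|cssdmonitor" | grep -v grep | awk '{print $4}' | xargs -n 10 kill -9

Confirm that the cluster has stopped as follows: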
    [root@rac1 grid]# ps -ef|grep crs
    root      9229  4909  0 09:42 pts/0    00:00:00 grep crs
    [root@rac1 grid]# ps -ef|grep css
    root      9231  4909  0 09:42 pts/0    00:00:00 grep css
    [root@rac1 grid]# ps -ef|grep evm
    root      9236  4909  0 09:42 pts/0    00:00:00 grep evm
    [root@rac1 grid]# ps -ef|grep ohas
    root      9204     1  0 09:41 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
    root      9239  4909  0 09:42 pts/0    00:00:00 grep ohas

6. Start CRS in exclusive mode
    [root@rac1 grid]# crsctl start crs -excl -nocrs
    CRS-4123: Oracle High Availability Services has been started.
    CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
    CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
    CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
    CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
    CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
    CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
    CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
    CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
    CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
    CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
    CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
    CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
    CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
    CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
    CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
    CRS-2672: Attempting to start 'ora.asm' on 'rac1'
    CRS-2676: Start of 'ora.asm' on 'rac1' succeeded

Notes:
-excl starts the stack in exclusive mode.
-nocrs starts the stack without the crsd resource, so the damaged OCR does not block startup.

Cluster status at this point:
    [grid@rac1 trace]$ crsctl stat res -t -init
    --------------------------------------------------------------------------------
    NAME           TARGET  STATE        SERVER                   STATE_DETAILS
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.asm
          1        ONLINE  INTERMEDIATE rac1                     OCR not started
    ora.cluster_interconnect.haip
          1        ONLINE  ONLINE       rac1
    ora.crf
          1        OFFLINE OFFLINE
    ora.crsd
          1        OFFLINE OFFLINE
    ora.cssd
          1        ONLINE  ONLINE       rac1
    ora.cssdmonitor
          1        ONLINE  ONLINE       rac1
    ora.ctssd
          1        ONLINE  ONLINE       rac1                     ACTIVE:0
    ora.diskmon
          1        OFFLINE OFFLINE
    ora.drivers.acfs
          1        ONLINE  ONLINE       rac1
    ora.evmd
          1        OFFLINE OFFLINE
    ora.gipcd
          1        ONLINE  ONLINE       rac1
    ora.gpnpd
          1        ONLINE  ONLINE       rac1
    ora.mdnsd
          1        ONLINE  ONLINE       rac1

7. Recreate the OCR/voting disk group
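    SQL> create diskgroup data external redundancy disk '/dev/raw/raw1'
         attribute 'au_size'='1M',
                   'compatible.asm' = '11.2.0',
                   'compatible.rdbms' = '11.2.0';

Note: the name of the OCR/voting disk group must be exactly the same as before the corruption (DATA in this example). Otherwise the OCR restore fails with: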
    PROT-35: The configured OCR locations are not accessible.

8. Restore the OCR from a backup
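Before running the restore, it can help to pick the newest file out of a saved ocrconfig -showbackup listing instead of reading it by eye. The following is only a minimal sketch and not part of the original procedure: the function name newest_ocr_backup is hypothetical, and it assumes every backup is printed on one line as node, date (YYYY/MM/DD), time, path, as in the listings earlier in this article. It reads the listing from standard input, so it also works on a copy saved before the OCR became unreadable.

```shell
#!/usr/bin/env bash
# newest_ocr_backup (hypothetical helper): read an `ocrconfig -showbackup`
# listing on stdin and print the path of the most recent backup.
# Assumed line layout: <node> <YYYY/MM/DD> <HH:MM:SS> <path>
newest_ocr_backup() {
  # Keep only lines whose second field looks like a date, drop the node name,
  # then sort newest first. Fixed-width dates sort correctly as plain text.
  awk 'NF >= 4 && $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { print $2, $3, $4 }' |
    LC_ALL=C sort -r |
    head -n 1 |
    awk '{ print $3 }'
}
```

On a node where the listing is available this would be called as ocrconfig -showbackup | newest_ocr_backup, and the printed path passed to ocrconfig -restore.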
    [root@rac1 bin]# ./ocrconfig -restore /u01/app/11.2/grid/cdata/rac-cluster/backup00.ocr

The result can be checked with the following commands:
    cluvfy comp ocr -n all
    ocrcheck

9. Restore the voting disk
    [root@rac1 bin]# ./crsctl replace votedisk +DATA
    Successful addition of voting disk 4201f39953204fbdbf2b502ef4abe9cb.
    Successfully replaced voting disk group with +DATA.
    CRS-4266: Voting file(s) successfully replaced

Note: this step may fail with an error such as:
    crsctl replace votedisk +ocrvote
    CRS-4602: Failed 27 to add voting file 5a71f4b0868e4f8abfc4808566c5c7fa.
    CRS-4602: Failed 27 to add voting file 66699f04c8a74f57bf08e0682294e449.
    CRS-4602: Failed 27 to add voting file 7181a4d009884fecbff2cab4c69f2de2.
    Failed to replace voting disk group with +ocrvote.
    CRS-4000: Command Replace failed, or completed with errors.

It can be resolved as follows:
    SQL> show parameter disk

    NAME                                 TYPE                   VALUE
    ------------------------------------ ---------------------- ------------------------------
    asm_diskgroups                       string                 OCRVOTE
    asm_diskstring                       string

    SQL> alter system set asm_diskstring='/dev/raw/*';

Then rerun the command to restore the voting disk, and confirm the result:
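    [grid@rac1 ~]$ crsctl query css votedisk

10. Recreate the spfile

Note: if the spfile used by the cluster's ASM instance was stored on the shared OCR disk group, it has to be recreated at this point. There are two ways to do it:

1) Use the 11g feature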
    SQL> create spfile from memory;

2) Create it manually
    [root@rac2 ~]# vi /tmp/asm_pfile.txt

Add the following parameters:
    *.asm_power_limit=1
    *.diagnostic_dest='/u01/app/grid/11.2.0/log'
    *.instance_type='asm'
    *.large_pool_size=12M
    *.remote_login_passwordfile='EXCLUSIVE'

Recreate the spfile from the file just edited:
    SQL> create spfile='+DATA' from pfile='/tmp/asm_pfile.txt';

11. Stop the cluster and restart it:
    [root@rac1 grid]# crsctl stop crs -f
    [root@rac1 grid]# crsctl start crs
    [root@rac1 grid]# crsctl start cluster -all

12. Check the cluster resource status:

1) Cluster information
    [root@rac1 grid]# crsctl stat res -t
    --------------------------------------------------------------------------------
    NAME           TARGET  STATE        SERVER                   STATE_DETAILS
    --------------------------------------------------------------------------------
    Local Resources
    --------------------------------------------------------------------------------
    ora.DATA.dg
                   ONLINE  ONLINE       rac1
                   ONLINE  ONLINE       rac2
    ora.LISTENER.lsnr
                   ONLINE  ONLINE       rac1
                   ONLINE  ONLINE       rac2
    ora.ORADATA.dg
                   ONLINE  ONLINE       rac1
                   ONLINE  ONLINE       rac2
    ora.asm
                   ONLINE  ONLINE       rac1                     Started
                   ONLINE  ONLINE       rac2                     Started
    ora.gsd
                   OFFLINE OFFLINE      rac1
                   OFFLINE OFFLINE      rac2
    ora.net1.network
                   ONLINE  ONLINE       rac1
                   ONLINE  ONLINE       rac2
    ora.ons
                   ONLINE  ONLINE       rac1
                   ONLINE  ONLINE       rac2
    ora.registry.acfs
                   ONLINE  ONLINE       rac1
                   ONLINE  ONLINE       rac2
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.LISTENER_SCAN1.lsnr
          1        ONLINE  ONLINE       rac1
    ora.cvu
          1        ONLINE  ONLINE       rac1
    ora.oc4j
          1        ONLINE  ONLINE       rac1
    ora.rac1.vip
          1        ONLINE  ONLINE       rac1
    ora.rac2.vip
          1        ONLINE  ONLINE       rac2
    ora.scan1.vip
          1        ONLINE  ONLINE       rac1
    ora.sunny.db
          1        ONLINE  ONLINE       rac1                     Open
          2        ONLINE  ONLINE       rac2                     Open

2) Check the OCR and voting disk information
    [root@rac1 grid]# ocrcheck
    Status of Oracle Cluster Registry is as follows :
             Version                  :          3
             Total space (kbytes)     :     262120
             Used space (kbytes)      :       3084
             Available space (kbytes) :     259036
             ID                       :  101930821
             Device/File Name         :      +DATA
                                        Device/File integrity check succeeded
                                        Device/File not configured
                                        Device/File not configured
                                        Device/File not configured
                                        Device/File not configured
             Cluster registry integrity check succeeded
             Logical corruption check succeeded

3) Check the spfile information
    SQL> show parameter spfile

    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ---------------------------------------
    spfile                               string      /u01/app/11.2/grid/dbs/spfile+ASM1.ora

4) Check that the disk groups are healthy
    [grid@rac1 rac-cluster]$ asmcmd lsdg
    State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
    MOUNTED  EXTERN  N         512   4096  1048576      3082     2687                0            2687              0             Y  DATA/
    MOUNTED  EXTERN  N         512   4096  1048576      8197     5802                0            5802              0             N  ORADATA/

The recovery is now complete. Finally, move the spfile into the DATA disk group:
    SQL> create pfile='/tmp/aa.txt' from spfile;
    File created.
    SQL> create spfile='+DATA' from pfile='/tmp/aa.txt';
    File created.

Stop the cluster and start it again:
    crsctl stop cluster -all
    crsctl start cluster -all

13. Further notes

1) About export and import

To back up the OCR manually:
    [root@rac1 rac-cluster]# ocrconfig -manualbackup

The -export option can also be used to export the OCR contents to a file, and that file can later be used to restore the OCR:
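    ocrconfig -export /tmp/ocr.bak
    ocrconfig -import file_name

If the OCR is restored from a file created with -export, the -restore option cannot be used; the -import option is required instead.

2) Using kfed to read the disk header and find where the voting file is located on the disk (the kfdhdb.vfstart and kfdhdb.vfend fields)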
    [root@rac1 rac-cluster]# kfed read /dev/raw/raw1 | grep -E 'vfstart|vfend'
    kfdhdb.vfstart:                     320 ; 0x0ec: 0x00000140
    kfdhdb.vfend:                       352 ; 0x0f0: 0x00000160

Important: if no backup exists, the OCR can only be recovered by rebuilding it, so make checking the OCR backups part of routine operations.
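A routine backup check can be scripted. The sketch below is one possible way to do it, under stated assumptions: the function name has_backup_since is hypothetical, the listing is read from standard input in the same one-line-per-backup layout shown earlier (node, YYYY/MM/DD, time, path), and the cutoff date is passed in the same YYYY/MM/DD form so that a plain string comparison is sufficient.

```shell
#!/usr/bin/env bash
# has_backup_since CUTOFF (hypothetical helper): succeed (exit 0) if the
# `ocrconfig -showbackup` listing on stdin contains at least one backup
# dated on or after CUTOFF, given as YYYY/MM/DD; otherwise exit 1.
has_backup_since() {
  awk -v cutoff="$1" '
    # Only consider lines whose second field looks like a date.
    $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ && $2 >= cutoff { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

A daily job could then run something like ocrconfig -showbackup | has_backup_since "$(date -d '1 day ago' +%Y/%m/%d)" || echo "no OCR backup in the last day". The date -d form assumes GNU date, as found on the Linux hosts used in this article.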
This article comes from the "oracle一体机" blog. Please keep this source link: http://woquer.blog.51cto.com/9290811/1689258