
A Fast Way to Recover RAC When the OCR Disk Has Failed and No Backup Exists

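2012-03-19 14:19
Background:

While the business database was being refreshed, disk I/O started to fail. A check showed that every RAC process had exited, and re-running /opt/oracrs/bin/crsctl start crs also failed.

Error message (tail -f /var/log/messages):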

Oracle Cluster Registry initialization failed with invalid format: PROC-22: The OCR backend has an invalid format

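I suspected physical damage to the OCR disk. (It later turned out that the array holding the OCR had raised a failed-disk alarm. That was repaired immediately; the array repair itself is not covered here.)

Next, on one RAC node, check whether any OCR backup exists:

ocrconfig -showbackup printed nothing (an automatic backup would have been listed here; unfortunately there was none).

/opt/oracle/product/11g/db/cdata contained no OCR backup (automatic backups, if any, are stored in this directory).

ocrcheck printed nothing (at this point the fault can be attributed to the OCR disk with reasonable confidence).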

GDGZ-DCS-SV01C-RAC01:/opt/oracle/product/11g/db/bin # ./crsctl check boot

Oracle Cluster Registry initialization failed with invalid format: PROC-22: The OCR backend has an invalid format

This matches the startup error shown earlier.
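The diagnosis above hinges on spotting the PROC-22 message in the system log. A minimal sketch of that check follows; `has_proc22` is a hypothetical helper name introduced for illustration, not an Oracle tool, and it only searches a log file for the error code.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle Clusterware):
# exit 0 if the given log file contains the PROC-22 error,
# exit 1 if it does not, exit 2 if the file cannot be read.
has_proc22() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        return 2
    fi
    grep -q 'PROC-22' "$log_file"
}
```

It could be used as `has_proc22 /var/log/messages && echo "OCR format error logged"`.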

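Resolution (how to quickly rebuild the OCR when it has failed and no backup exists):

I. Clear the OCR

1. Stop the CRS processes (in my case CRS had already exited on its own, so this step did not apply; if any CRS processes are still running, stop them manually)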

#/opt/oracrs/bin/crsctl stop crs

2. Back up the entire CRS home directory (/opt/oracrs)

#cp -rp /opt/oracrs /opt/oracrs_bak
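Copying a directory with cp requires the recursive flag, and it is easy to overwrite an earlier backup by accident. The sketch below wraps the copy in a guard; `backup_tree` is a hypothetical function name, and the choice of `-rp` (recursive, preserve mode and timestamps) is an assumption about what a useful backup needs.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory tree to a new location,
# refusing to run if the source is missing or the target already exists.
backup_tree() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "source $src is not a directory" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "destination $dst already exists" >&2
        return 1
    fi
    # -r copies recursively, -p preserves mode, ownership and timestamps
    cp -rp "$src" "$dst"
}
```

For the paths in this post the call would be `backup_tree /opt/oracrs /opt/oracrs_bak`, run as root.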

3. On every node, run $ORA_CRS_HOME/install/rootdelete.sh (must be run as root)

GDGZ-DCS-SV02C-RAC01:/etc/oracle # /opt/oracrs/install/rootdelete.sh

Getting local node name

NODE = GDGZ-DCS-SV02C-RAC01

PRKO-2006 : Invalid node name: GDGZ-DCS-SV02C-RAC01

Oracle Cluster Registry initialization failed with invalid format: PROC-22: The OCR backend has an invalid format

Oracle CRS stack is not running.

Oracle CRS stack is down now.

Removing script for Oracle Cluster Ready services

Updating ocr file for downgrade

Cleaning up SCR settings in '/etc/oracle/scls_scr'

Cleaning up Network socket directories

4. On one RAC node only, run /opt/oracrs/install/rootdeinstall.sh

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

Verifying existence of ocr.loc file

Removing contents from OCR device

2560+0 records in

2560+0 records out

10485760 bytes (10 MB) copied, 1.15907 seconds, 9.0 MB/s

5. Delete the old OCR information

Edit /etc/inittab and delete the following three lines.

h1:2:respawn:/etc/init.evmd run >/dev/null 2>&1 </dev/null

h2:2:respawn:/etc/init.cssd fatal >/dev/null 2>&1 </dev/null

h3:2:respawn:/etc/init.crsd run >/dev/null 2>&1 </dev/null

rm -rf /etc/oracle/*

rm -f /etc/init.d/init.cssd

rm -f /etc/init.d/init.crs

rm -f /etc/init.d/init.crsd

rm -f /etc/init.d/init.evmd

rm -f /etc/inittab.crs

cp /etc/inittab.orig /etc/inittab

rm -rf /var/tmp/.oracle

rm -rf /tmp/.oracle

(rootdelete.sh has already removed some of the directories and files listed above, so only the ones that still exist need to be deleted.)
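Editing /etc/inittab by hand is error-prone, so it can help to preview the result first. The following sketch prints an inittab with the three CRS respawn entries removed and leaves the original file untouched; `strip_crs_inittab` is a hypothetical name, and the pattern assumes the entries look exactly like the three lines quoted in this step.

```shell
#!/bin/sh
# Hypothetical helper: print the given inittab file to stdout without the
# h1/h2/h3 respawn entries for init.evmd, init.cssd and init.crsd.
# The input file itself is not modified.
strip_crs_inittab() {
    sed -E '\#^h[123]:[^:]*:respawn:/etc/init\.(evmd|cssd|crsd) #d' "$1"
}
```

A cautious workflow would be `strip_crs_inittab /etc/inittab > /tmp/inittab.new`, then comparing the two files with diff before replacing anything.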

6. Use dd to wipe the voting disk and the OCR; change the device names to match your own environment

dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=12800

dd if=/dev/zero of=/dev/raw/raw2 bs=8192 count=12800

GDGZ-DCS-SV02C-RAC01:/opt # dd if=/dev/zero of=/dev/raw/raw2 bs=8192 count=128000

dd: writing `/dev/raw/raw2': No space left on device

123500+0 records in

123499+0 records out

1011709440 bytes (1.0 GB) copied, 67.9911 seconds, 14.9 MB/s

GDGZ-DCS-SV02C-RAC01:/opt # dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=128000

dd: writing `/dev/raw/raw1': No space left on device

123496+0 records in

123495+0 records out

1011677184 bytes (1.0 GB) copied, 66.8413 seconds, 15.1 MB/s

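(Judging from the output of step 4, the OCR had probably already been cleared, so it may be enough to wipe only the voting disk. Unfortunately I had already run both dd commands.)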
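The template commands in this step use count=12800, while the logged runs used count=128000 and ended with "No space left on device". The arithmetic below shows why: with bs=8192 the larger count asks for more bytes than the roughly 1 GB raw devices hold. The function names are hypothetical, and the device sizes are simply the byte counts reported by dd in the output above.

```shell
#!/bin/sh
# Hypothetical helpers for sizing a dd wipe.

# Bytes that dd will try to write for a given block size and count.
dd_bytes() {
    echo $(( $1 * $2 ))
}

# Number of whole blocks of size bs that fit in dev_bytes.
full_blocks() {
    dev_bytes=$1
    bs=$2
    echo $(( dev_bytes / bs ))
}
```

With these, `dd_bytes 8192 12800` gives 104857600 (100 MiB), `dd_bytes 8192 128000` gives 1048576000, and `full_blocks 1011709440 8192` gives 123499, which matches the "123499+0 records out" line for /dev/raw/raw2.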

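II. Rebuild the OCR (my own approach here: use the graphical runInstaller and follow the same procedure as a fresh installation to rebuild the OCR)

1. A first-time CRS installation needs new, empty /opt/oracle and /opt/oracrs directories, so first run the following commands on every node: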

cd /opt

mv oracle oracle_old

mv oracrs oracrs_old

mkdir oracle

mkdir oracrs

mkdir /opt/oracle/oraInventory

chown oracle:dba /opt/oracle/oraInventory

chmod 755 /opt/oracle/oraInventory

chown -R oracle:dba /opt/oracle

chmod -R 770 /opt/oracle

chown -R oracle:dba /opt/oracrs

chmod -R 770 /opt/oracrs
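The directory shuffle above has to be repeated on every node, and running it twice would clobber the saved copies. The sketch below performs the move and mkdir steps under a configurable base directory and refuses to run if an *_old directory already exists. `prepare_empty_homes` is a hypothetical name; the chown and chmod commands are deliberately left out because they need root and the oracle:dba account, so they would still be run separately as shown in this step.

```shell
#!/bin/sh
# Hypothetical helper: under base_dir, move oracle and oracrs aside to
# oracle_old and oracrs_old, then create empty replacements plus
# oracle/oraInventory. Ownership and permissions are NOT changed here.
prepare_empty_homes() {
    base_dir=$1
    for name in oracle oracrs; do
        if [ -e "$base_dir/${name}_old" ]; then
            echo "$base_dir/${name}_old already exists, refusing to overwrite" >&2
            return 1
        fi
    done
    for name in oracle oracrs; do
        if [ -d "$base_dir/$name" ]; then
            mv "$base_dir/$name" "$base_dir/${name}_old"
        fi
        mkdir -p "$base_dir/$name"
    done
    mkdir -p "$base_dir/oracle/oraInventory"
}
```

For the layout in this post the call would be `prepare_empty_homes /opt`.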

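2. Log in to the RAC1 node through Xmanager and install CRS with the graphical installer (see the CRS part of the RAC installation procedure)

As the oracle user, run /opt/orabak/clusterware/runInstaller

Note that the CRS installation settings entered here must be exactly the same as in the original installation.

At the end of the installation, follow the installer's prompt and run root.sh on each RAC node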

GDGZ-DCS-SV01C-RAC01:/opt/oracle # /opt/oracrs/root.sh

Checking to see if Oracle CRS stack is already configured

/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory

Setting up Network socket directories

Oracle Cluster Registry configuration upgraded successfully

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node <nodenumber>: <nodename> <private interconnect name> <hostname>

node 1: gdgz-dcs-sv01c-rac01 gdgz-dcs-sv01c-rac01_base gdgz-dcs-sv01c-rac01

node 2: gdgz-dcs-sv02c-rac01 gdgz-dcs-sv02c-rac01_base gdgz-dcs-sv02c-rac01

node 3: gdgz-dcs-sv03c-rac01 gdgz-dcs-sv03c-rac01_base gdgz-dcs-sv03c-rac01

node 4: gdgz-dcs-sv04c-rac01 gdgz-dcs-sv04c-rac01_base gdgz-dcs-sv04c-rac01

Creating OCR keys for user 'root', privgrp 'root'..

Operation successful.

Now formatting voting device: /dev/raw/raw2

Format of 1 voting devices complete.

Startup will be queued to init within 30 seconds.

Adding daemons to inittab

Expecting the CRS daemons to be up within 600 seconds.

Cluster Synchronization Services is active on these nodes.

gdgz-dcs-sv01c-rac01

gdgz-dcs-sv02c-rac01

gdgz-dcs-sv03c-rac01

gdgz-dcs-sv04c-rac01

Cluster Synchronization Services is active on all the nodes.

Waiting for the Oracle CRSD and EVMD to start

Oracle CRS stack installed and running under init(1M)

Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (4) nodes.....

Creating GSD application resource on (4) nodes.....

Creating ONS application resource on (4) nodes.....

Starting VIP application resource on (4) nodes.....

Starting GSD application resource on (4) nodes.....

Starting ONS application resource on (4) nodes.....

Done.

After clicking OK, one check is reported as failed. As the prompt says, the failed command is logged under /opt/oracrs/cfgtoollogs:

/opt/oracrs/bin/cluvfy stage -post crsinst -n GDGZ-DCS-SV01C-RAC01,GDGZ-DCS-SV02C-RAC01,GDGZ-DCS-SV03C-RAC01,GDGZ-DCS-SV04C-RAC01

GDGZ-DCS-SV01C-RAC01:/opt/oracrs/cfgtoollogs # vi configToolFailedCommands

GDGZ-DCS-SV01C-RAC01:/opt/oracrs/cfgtoollogs # /opt/oracrs/bin/cluvfy stage -post crsinst -n GDGZ-DCS-SV01C-RAC01,GDGZ-DCS-SV02C-RAC01,GDGZ-DCS-SV03C-RAC01,GDGZ-DCS-SV04C-RAC01

This failure is caused by the root user trust relationship between the nodes and can be ignored.

3. At this point the three core CRS daemons are online, but first run /opt/oracrs/bin/crsctl stop crs on every node to shut CRS down.

4. Restore the original /opt/oracle (on every node)

cd /opt

mv oracle oracle_new

mv oracle_old oracle

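5. On every node, run /opt/oracrs/bin/crsctl start crs to start CRS

6. Manually register the database, instances, and related resources in the OCR

If ASM is used, register it in the OCR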

srvctl add asm -n gdgz-dcs-sv01c-rac01 -i +ASM1 -o $ORACLE_HOME

srvctl add asm -n gdgz-dcs-sv02c-rac01 -i +ASM2 -o $ORACLE_HOME

srvctl add asm -n gdgz-dcs-sv03c-rac01 -i +ASM3 -o $ORACLE_HOME

srvctl add asm -n gdgz-dcs-sv04c-rac01 -i +ASM4 -o $ORACLE_HOME

(For reference, the ASM resource information from the Shenzhen cluster)

NAME=ora.gdsz-dcs-sv01c-rac01.ASM1.asm

TYPE=application

TARGET=ONLINE

STATE=ONLINE on gdsz-dcs-sv01c-rac01

Register the database

srvctl add database -d ORA -o $ORACLE_HOME

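(To start it on its own: srvctl start database -d ORA. This also brings up the instances added below.)

(For reference, the database resource information from the Shenzhen cluster)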

NAME=ora.ORA.db

TYPE=application

TARGET=ONLINE

STATE=ONLINE on gdsz-dcs-sv01c-rac01

Register the instances

srvctl add instance -d ORA -i ORA1 -n gdgz-dcs-sv01c-rac01

srvctl add instance -d ORA -i ORA2 -n gdgz-dcs-sv02c-rac01

srvctl add instance -d ORA -i ORA3 -n gdgz-dcs-sv03c-rac01

srvctl add instance -d ORA -i ORA4 -n gdgz-dcs-sv04c-rac01

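(For reference, the instance resource information from the Shenzhen cluster)

NAME=ora.ORA.ORA1.inst

TYPE=application

TARGET=ONLINE

STATE=ONLINE on gdsz-dcs-sv01c-rac01

Register the service

srvctl add service -d ORA -s service_ora -r "ORA1,ORA2,ORA3,ORA4" -P BASIC

(To start it on its own: srvctl start service -d ORA -s service_ora. It can also be started later together with everything else.)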

Add the LISTENER (configuring it with netca is recommended; delete the existing listener first, then add the new one)
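The ASM and instance registrations in step 6 follow a fixed pattern: node N gets +ASMN and instance <db>N. The sketch below only prints the commands for a given database name and node list so they can be reviewed before being run by hand. `gen_srvctl_cmds` is a hypothetical name, and the numbering scheme is an assumption taken from the commands shown in this post; the service and listener are not covered.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) srvctl registration commands.
# Usage: gen_srvctl_cmds <db_name> <node1> [<node2> ...]
# Assumes node i hosts ASM instance +ASMi and database instance <db_name>i.
gen_srvctl_cmds() {
    db_name=$1
    shift
    i=1
    for node in "$@"; do
        echo "srvctl add asm -n $node -i +ASM$i -o \$ORACLE_HOME"
        i=$((i + 1))
    done
    echo "srvctl add database -d $db_name -o \$ORACLE_HOME"
    i=1
    for node in "$@"; do
        echo "srvctl add instance -d $db_name -i ${db_name}${i} -n $node"
        i=$((i + 1))
    done
}
```

For example, `gen_srvctl_cmds ORA gdgz-dcs-sv01c-rac01 gdgz-dcs-sv02c-rac01 gdgz-dcs-sv03c-rac01 gdgz-dcs-sv04c-rac01` prints the four ASM commands, the database command, and the four instance commands used above.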

7. Restart RAC

/opt/oracrs/bin/crsctl stop crs

/opt/oracrs/bin/crsctl start crs

8. Check that RAC has come up

oracle@GDGZ-DCS-SV04C-RAC01:~> crs_stat -t

Name           Type           Target    State     Host
------------------------------------------------------------
ora....A1.inst application    ONLINE    ONLINE    gdgz...ac01
ora....A2.inst application    ONLINE    ONLINE    gdgz...ac01
ora....A3.inst application    ONLINE    ONLINE    gdgz...ac01
ora....A4.inst application    ONLINE    ONLINE    gdgz...ac01
ora.ORA.db     application    ONLINE    ONLINE    gdgz...ac01
ora....RA1.srv application    ONLINE    ONLINE    gdgz...ac01
ora....RA2.srv application    ONLINE    ONLINE    gdgz...ac01
ora....RA3.srv application    ONLINE    ONLINE    gdgz...ac01
ora....RA4.srv application    ONLINE    ONLINE    gdgz...ac01
ora...._ora.cs application    ONLINE    ONLINE    gdgz...ac01
ora....SM1.asm application    ONLINE    ONLINE    gdgz...ac01
ora....01.lsnr application    ONLINE    ONLINE    gdgz...ac01
ora....c01.gsd application    ONLINE    ONLINE    gdgz...ac01
ora....c01.ons application    ONLINE    ONLINE    gdgz...ac01
ora....c01.vip application    ONLINE    ONLINE    gdgz...ac01
ora....SM2.asm application    ONLINE    ONLINE    gdgz...ac01
ora....01.lsnr application    ONLINE    ONLINE    gdgz...ac01
ora....c01.gsd application    ONLINE    ONLINE    gdgz...ac01
ora....c01.ons application    ONLINE    ONLINE    gdgz...ac01
ora....c01.vip application    ONLINE    ONLINE    gdgz...ac01
ora....SM3.asm application    ONLINE    ONLINE    gdgz...ac01
ora....01.lsnr application    ONLINE    ONLINE    gdgz...ac01
ora....c01.gsd application    ONLINE    ONLINE    gdgz...ac01
ora....c01.ons application    ONLINE    ONLINE    gdgz...ac01
ora....c01.vip application    ONLINE    ONLINE    gdgz...ac01
ora....SM4.asm application    ONLINE    ONLINE    gdgz...ac01
ora....01.lsnr application    ONLINE    ONLINE    gdgz...ac01
ora....c01.gsd application    ONLINE    ONLINE    gdgz...ac01
ora....c01.ons application    ONLINE    ONLINE    gdgz...ac01
ora....c01.vip application    ONLINE    ONLINE    gdgz...ac01
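With thirty resources in the listing, scanning for a stray OFFLINE by eye is easy to get wrong. The sketch below filters `crs_stat -t` style output and prints only the resources whose Target or State column is not ONLINE. `not_online` is a hypothetical name, and the parsing assumes the whitespace-separated column layout shown above, with resource names starting with "ora.".

```shell
#!/bin/sh
# Hypothetical helper: read `crs_stat -t` style output on stdin and print
# the name of every resource whose Target or State is not ONLINE.
# Header, separator and blank lines are skipped because they do not
# start with "ora.".
not_online() {
    awk '$1 ~ /^ora\./ && NF >= 4 && ($3 != "ONLINE" || $4 != "ONLINE") { print $1 }'
}
```

It would be used as `crs_stat -t | not_online`; empty output means every listed resource is ONLINE.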
