
Problem with the 10g RAC VOTEDISK after migrating VMware virtual machines, and a solution found online

2013-12-12 22:27
While migrating a 10g RAC running on VMware virtual machines, I ran into a strange problem with the VOTEDISK: it could not be restored from backup, and

the output of ./crsctl query css votedisk was garbled. In the end the only option was to re-register the relevant information; the procedure found online and reproduced below solved the problem.
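For reference, the normal 10g recovery path, when usable backups exist, is to check the OCR/Votedisk state and restore from backup. A minimal sketch of that path (the backup file names and raw device names here are placeholders, not values from this environment):

[root@node1 ~]# ocrcheck                        # verify OCR integrity
[root@node1 ~]# crsctl query css votedisk       # list the configured voting disks
[root@node1 ~]# ocrconfig -showbackup           # list automatic OCR backups
[root@node1 ~]# ocrconfig -restore /backup/ocr/backup00.ocr
[root@node1 ~]# dd if=/backup/votedisk.bak of=/dev/raw/raw2 bs=4k   # in 10g a votedisk is backed up and restored with dd

In my case this route failed, which is why the re-initialization procedure below was used instead.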

Suppose the OCR and Votedisk disks are completely destroyed and there are no backups of either. How can they be recovered? The simplest approach in that case is to re-initialize the OCR and Votedisk. The detailed steps are as follows.
Reference: the book 《大话Oracle RAC》
 
Simulate disk corruption:
[root@node1 ~]# crsctl stop crs 
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw1 bs=102400 count=1200
dd: writing `/dev/raw/raw1': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 6.68439 seconds, 16.0 MB/s
You have new mail in /var/spool/mail/root
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw2 bs=102400 count=1200
dd: writing `/dev/raw/raw2': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 7.62786 seconds, 14.0 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw5 bs=102400 count=1200
dd: writing `/dev/raw/raw5': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 8.75194 seconds, 12.2 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw6 bs=102400 count=1200
dd: writing `/dev/raw/raw6': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 6.50958 seconds, 16.4 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw6 bs=102400 count=3000
dd: writing `/dev/raw/raw6': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 6.61992 seconds, 16.2 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw7 bs=102400 count=3000
dd: writing `/dev/raw/raw7': No space left on device
2509+0 records in
2508+0 records out
256884736 bytes (257 MB) copied, 16.0283 seconds, 16.0 MB/s
[root@node1 ~]#
[root@node1 ~]#
 
 
 
1 Stop the Clusterware stack on all nodes:
crsctl stop crs
and format all of the OCR and Votedisk devices (with dd, as in the simulation above).
 
2 On each node, run the $CRS_HOME/install/rootdelete.sh script as root:
[root@node1 ~]# $CRS_HOME/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
 
3 On any one node, run the $CRS_HOME/install/rootdeinstall.sh script as root:
[root@node1 ~]# $CRS_HOME/install/rootdeinstall.sh
Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 2.36706 seconds, 4.4 MB/s
 
4 On the same node as in the previous step, run the $CRS_HOME/root.sh script as root:
[root@node1 ~]# $CRS_HOME/root.sh
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
Checking to see if Oracle CRS stack is already configured
 
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
assigning default hostname node1 for node 1.
assigning default hostname node2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw1
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
         node1
CSS is inactive on these nodes.
         node2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
 
5 On the remaining node(s), run the $CRS_HOME/root.sh script as root:
[root@node2 ~]# $CRS_HOME/root.sh
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
Checking to see if Oracle CRS stack is already configured
 
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname node1 for node 1.
assigning default hostname node2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
clscfg: Arguments check out successfully.
 
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
         node1
         node2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Error 0(Native: listNetInterfaces:[3])
  [Error 0(Native: listNetInterfaces:[3])]
[root@node2 ~]# vipca
Error 0(Native: listNetInterfaces:[3])
  [Error 0(Native: listNetInterfaces:[3])]
 
Fixing the error above:
[root@node1 ~]# oifcfg iflist
eth1  10.10.17.0
virbr0  192.168.122.0
eth0  192.168.1.0
[root@node1 ~]# oifcfg setif -global eth0/192.168.1.0:public 
[root@node1 ~]# oifcfg setif -global eth1/10.10.17.0:cluster_interconnect
[root@node1 ~]#
[root@node1 ~]#
[root@node1 ~]# oifcfg iflist
eth1  10.10.17.0
virbr0  192.168.122.0
eth0  192.168.1.0
[root@node1 ~]# oifcfg getif
eth0  192.168.1.0  global  public
eth1  10.10.17.0  global  cluster_interconnect
 
Because of this error, the ONS, GSD, and VIP resources were not created, so vipca has to be run manually (a sketch of the manual run follows).
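A minimal sketch of that manual run, assuming root has access to an X display (the DISPLAY value is only an example):

[root@node1 ~]# export DISPLAY=:0.0
[root@node1 ~]# vipca

vipca opens a GUI in which the public interface and the VIP address for each node are entered; once it completes, the resources show up in crs_stat as below.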
 
[root@node1 ~]# crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.node1.gsd  application    ONLINE    ONLINE    node1      
ora.node1.ons  application    ONLINE    ONLINE    node1      
ora.node1.vip  application    ONLINE    ONLINE    node1      
ora.node2.gsd  application    ONLINE    ONLINE    node2      
ora.node2.ons  application    ONLINE    ONLINE    node2      
ora.node2.vip  application    ONLINE    ONLINE    node2      
 
 
6 Reconfigure the listeners with netca and confirm that they are registered in Clusterware:
[root@node1 ~]# crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....E1.lsnr application    ONLINE    ONLINE    node1      
ora.node1.gsd  application    ONLINE    ONLINE    node1      
ora.node1.ons  application    ONLINE    ONLINE    node1      
ora.node1.vip  application    ONLINE    ONLINE    node1      
ora....E2.lsnr application    ONLINE    ONLINE    node2      
ora.node2.gsd  application    ONLINE    ONLINE    node2      
ora.node2.ons  application    ONLINE    ONLINE    node2      
ora.node2.vip  application    ONLINE    ONLINE    node2
 
At this point only the Listener, ONS, GSD, and VIP resources are registered in the OCR; the ASM instances and the database still have to be registered as well.
 
7 Add ASM to the OCR (this must be done as the oracle user; running it as root fails with PRKS-1030, as shown below):
[root@node1 dbs]# srvctl add asm -n node1 -i +ASM1 -o /opt/ora10g/product/10.2.0/db_1
null
  [PRKS-1030 : Failed to add configuration for ASM instance "+ASM1" on node "node1" in cluster registry, [PRKH-1001 : HASContext Internal Error]
  [PRKH-1001 : HASContext Internal Error]]
[root@node1 dbs]# su - oracle
[oracle@node1 ~]$ srvctl add asm -n node1 -i +ASM1 -o /opt/ora10g/product/10.2.0/db_1
[oracle@node1 ~]$ srvctl add asm -n node2 -i +ASM2 -o /opt/ora10g/product/10.2.0/db_1
 
8 Start ASM:
[oracle@node1 ~]$ srvctl start asm -n node1
[oracle@node1 ~]$ srvctl start asm -n node2
 
If an ORA-27550 error is raised during startup, it is because RAC cannot determine which NIC to use for the private interconnect. The fix is to add the following parameters to the pfile of each ASM instance (an spfile variant is sketched below):
+ASM1.cluster_interconnects='10.10.17.221'
+ASM2.cluster_interconnects='10.10.17.222'
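If the ASM instances are started from an spfile rather than a pfile, an equivalent way to set the same addresses (a sketch, assuming the instance names +ASM1/+ASM2 as above) is:

SQL>alter system set cluster_interconnects='10.10.17.221' scope=spfile sid='+ASM1';
SQL>alter system set cluster_interconnects='10.10.17.222' scope=spfile sid='+ASM2';

The ASM instances then need to be restarted (srvctl stop asm -n node1 / srvctl start asm -n node1, and likewise for node2) for the setting to take effect.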
 
9 Manually add the Database object to the OCR:
[oracle@node1 ~]$ srvctl add database -d racdb -o /opt/ora10g/product/10.2.0/db_1
 
10 Add the two instance objects:
[oracle@node1 ~]$ srvctl add instance -d racdb -i racdb1 -n node1
[oracle@node1 ~]$ srvctl add instance -d racdb -i racdb2 -n node2
 
11 Set the dependency between the database instances and the ASM instances (a verification sketch follows):
[oracle@node1 ~]$ srvctl modify instance -d racdb -i racdb1 -s +ASM1
[oracle@node1 ~]$ srvctl modify instance -d racdb -i racdb2 -s +ASM2
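As a quick sanity check (optional, not part of the original procedure), the registrations made in steps 9-11 can be listed back from the OCR:

[oracle@node1 ~]$ srvctl config database -d racdb

In 10.2 this prints one line per node with the node name, instance name, and Oracle home, so both racdb1/node1 and racdb2/node2 should appear.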
 
12 Start the database:
[oracle@node1 ~]$ srvctl start database -d racdb
 
If ORA-27550 appears here as well, it is again because RAC cannot determine which NIC to use for the private interconnect; set the parameter and restart the database to resolve it (a restart sketch follows the SQL):
SQL>alter system set cluster_interconnects='10.10.17.221' scope=spfile sid='RACDB1';
SQL>alter system set cluster_interconnects='10.10.17.222' scope=spfile sid='RACDB2';
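After setting the parameter, the database has to be bounced for the spfile change to take effect; a minimal sketch:

[oracle@node1 ~]$ srvctl stop database -d racdb
[oracle@node1 ~]$ srvctl start database -d racdb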
 
[root@node1 ~]# crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....B1.inst application    ONLINE    ONLINE    node1      
ora....B2.inst application    ONLINE    ONLINE    node2      
ora.RACDB.db   application    ONLINE    ONLINE    node1      
ora....SM1.asm application    ONLINE    ONLINE    node1      
ora....E1.lsnr application    ONLINE    ONLINE    node1      
ora.node1.gsd  application    ONLINE    ONLINE    node1      
ora.node1.ons  application    ONLINE    ONLINE    node1      
ora.node1.vip  application    ONLINE    ONLINE    node1      
ora....SM2.asm application    ONLINE    ONLINE    node2      
ora....E2.lsnr application    ONLINE    ONLINE    node2      
ora.node2.gsd  application    ONLINE    ONLINE    node2      
ora.node2.ons  application    ONLINE    ONLINE    node2      
ora.node2.vip  application    ONLINE    ONLINE    node2

Note: the command to unregister a resource is crs_unregister, e.g. crs_unregister ora.racdb.racdb1.inst. It is a very handy command; when srvctl cannot remove a resource, try this one (sketch below).
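A sketch of how it would typically be used, first looking up the exact resource name with crs_stat (the grep pattern is just an example):

[root@node1 ~]# crs_stat | grep -i inst        # find the exact resource name, e.g. ora.racdb.racdb1.inst
[root@node1 ~]# crs_unregister ora.racdb.racdb1.inst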