
Vertica: recovering a three-node cluster after one machine failed and lost all its files

2016-07-12 12:40
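All of the node's data directories were lost. After reinstalling the Vertica software on that node, the database still recognizes the failed node, but the node can be neither started nor removed, because the key medium it uses to communicate with the cluster, the spread.conf file, is missing.
Restarting the node fails with: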

"{0}. Error was: {1}.".format(repr(msg), e))
ATReceiveFailure_Init: Problem json decoding message '{"status": null, "content": {"special_environment": null}, "error_type": null, "error_message": null, "exec_stack": null}'. Error was: None is not a valid status.
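
The payload quoted in the error is itself valid JSON. Decoding fails because its `status` field is `null`, and the error and stack fields are empty too, so the reply carries no diagnostic detail. A quick Python check that uses nothing but the message text copied from the error:

```python
import json

# Message body copied verbatim from the ATReceiveFailure_Init error above.
MSG = ('{"status": null, "content": {"special_environment": null}, '
       '"error_type": null, "error_message": null, "exec_stack": null}')

payload = json.loads(MSG)  # decodes fine: the JSON itself is well-formed

# "None is not a valid status": the reply carries no status value at all,
# and every error/diagnostic field is empty as well.
print(payload["status"])   # None
print(payload["content"])  # {'special_environment': None}
print([k for k, v in payload.items() if v is None])
# ['status', 'error_type', 'error_message', 'exec_stack']
```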

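On the failed node (n1), recreate the directory structure under /opt, mirroring the catalog directory path used on the healthy node:

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo mkdir EmBigData_dev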

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo chown -R dbadmin:verticadba EmBigData_dev/

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ cd EmBigData_dev/

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo mkdir v_embigdata_dev_node0003_catalog

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo chown -R dbadmin:verticadba v_embigdata_dev_node0003_catalog/

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ cd v_embigdata_dev_node0003_catalog/
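
The directory names in this post all follow one pattern: `<base>/<database>/v_<database in lowercase>_node<4-digit number>_catalog`, with `_data` instead of `_catalog` for the data directory. The following is a minimal Python sketch that derives and creates such a path. It assumes that this naming pattern holds (it is inferred from the names shown here, not taken from Vertica documentation) and that the base is /opt as on this cluster. `node_dir` and `recreate_catalog_dir` are hypothetical helpers. Ownership (dbadmin:verticadba) still has to be set as root with `chown`, as in the transcript.

```python
import os
import tempfile

def node_dir(base: str, db: str, node_no: int, kind: str = "catalog") -> str:
    """Build e.g. /opt/EmBigData_dev/v_embigdata_dev_node0003_catalog.

    Hypothetical helper; the pattern is inferred from the directory names in this post.
    """
    node = f"v_{db.lower()}_node{node_no:04d}"
    return os.path.join(base, db, f"{node}_{kind}")

def recreate_catalog_dir(base: str, db: str, node_no: int) -> str:
    """Equivalent of the mkdir steps above; does NOT change ownership."""
    path = node_dir(base, db, node_no)
    os.makedirs(path, exist_ok=True)  # like `mkdir -p`: creates EmBigData_dev too
    return path

# Usage: demonstrate in a temporary directory instead of /opt.
with tempfile.TemporaryDirectory() as tmp:
    created = recreate_catalog_dir(tmp, "EmBigData_dev", 3)
    print(os.path.relpath(created, tmp))
    # EmBigData_dev/v_embigdata_dev_node0003_catalog
```

Note that node number 3 reproduces what was done here: node 3's directory was recreated on node 1. For node 1 the pattern gives v_embigdata_dev_node0001_catalog, which Vertica regenerated by itself during the restart.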

From a healthy node (n3), copy spread.conf over to the problem node:

[dbadmin@n3 v_embigdata_dev_node0003_catalog]$ scp spread.conf 172.16.57.26:/opt/EmBigData_dev/v_embigdata_dev_node0003_catalog/
spread.conf 100% 403 0.4KB/s 00:00

On the problem node, check that spread.conf arrived:

[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ ll
total 4
-rw-r--r-- 1 dbadmin verticadba 403 Jul 12 09:24 spread.conf
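
The `ll` listing only confirms the size: 403 bytes, the same as the scp transfer. The summary of this post rests on spread.conf being identical on every node, so comparing checksums is a stricter test. The following is a minimal Python sketch. It assumes that copies of the file from each node have been collected in one place, and the file names it uses are hypothetical. Running `sha256sum spread.conf` on each node and comparing the output gives the same answer.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable

def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def all_identical(paths: Iterable[Path]) -> bool:
    """True when at least one file is given and all of them have the same digest."""
    return len({sha256_of(p) for p in paths}) == 1

# Usage with stand-in files; real use would point at the spread.conf copies
# fetched from each node (these file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    n1 = Path(tmp, "spread.conf.n1")
    n3 = Path(tmp, "spread.conf.n3")
    n1.write_bytes(b"placeholder contents\n")
    n3.write_bytes(b"placeholder contents\n")
    print(all_identical([n1, n3]))  # True
```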

Then try to restart the problem node. The restart succeeded after the step below had been repeated several times:

Nodes UP: v_embigdata_dev_node0002, v_embigdata_dev_node0003
Nodes DOWN: v_embigdata_dev_node0001 (may be still initializing).
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes] yes
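
During the restart, admintools keeps reporting node0001 as DOWN and asking whether to continue waiting. To log or script around those retries, you can parse the two status lines. This sketch relies only on the text format shown above, not on any admintools API. `parse_node_status` is a hypothetical helper.

```python
import re

def parse_node_status(output: str) -> dict:
    """Collect node names from 'Nodes UP:' / 'Nodes DOWN:' lines of the restart output."""
    status = {}
    for line in output.splitlines():
        m = re.match(r"Nodes (UP|DOWN):(.*)", line.strip())
        if m:
            # Keep only node names; drops remarks such as "(may be still initializing)."
            status[m.group(1)] = re.findall(r"v_\w+", m.group(2))
    return status

OUTPUT = """\
Nodes UP: v_embigdata_dev_node0002, v_embigdata_dev_node0003
Nodes DOWN: v_embigdata_dev_node0001 (may be still initializing).
It is suggested that you continue waiting.
"""

status = parse_node_status(OUTPUT)
print(status["DOWN"])  # ['v_embigdata_dev_node0001']
still_waiting = bool(status.get("DOWN"))
print(still_waiting)   # True
```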

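To sum up:
Notice that on node 1 we created node 3's directory, v_embigdata_dev_node0003_catalog, and node 1 still started successfully!
Vertica evidently relies on the spread.conf file to identify the nodes of the cluster, and this file is the same on every node.
Here is the file layout on the problem node after the successful restart: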

[dbadmin@n1 EmBigData_dev]$ ll
total 36
-rw-r--r-- 1 dbadmin verticadba 7164 Jul 12 09:30 dbLog
drwx------ 9 dbadmin verticadba 4096 Jul 12 09:40 v_embigdata_dev_node0001_catalog
drwxrwx--- 502 dbadmin verticadba 20480 Jul 12 09:31 v_embigdata_dev_node0001_data
drwxr-xr-x 2 dbadmin verticadba 4096 Jul 12 09:24 v_embigdata_dev_node0003_catalog

Both v_embigdata_dev_node0001_catalog and v_embigdata_dev_node0001_data were regenerated from scratch.
The log shows the steps Vertica went through to recover the node:

2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Txn] <INFO> Looking for catalog at: /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/Catalog
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Init] <INFO> Catalog not loaded from /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog, probably being bootstrapped or empty
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Comms] <INFO> Spread domain socket will be opened in directory /opt/vertica/spread/tmp
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.conf'
2016-07-12 09:24:48.427 unknown:0x7f7352307720 [Comms] <INFO> forked spread pid=33233, wrote pidfile /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.pid

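This shows spread.conf being passed around within the cluster. Spread is launched with a spread.conf in the v_embigdata_dev_node0001_catalog directory, even though only v_embigdata_dev_node0003_catalog was created by hand.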
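
You can confirm which spread.conf the restarted node actually used by pulling the values out of the two launch log lines above. This Python 3.8+ sketch matches only the message formats shown in this excerpt. `spread_launch_info` is a hypothetical helper.

```python
import os
import re

LAUNCH = re.compile(r"About to launch spread with '(\S+) -c (\S+)'")
FORKED = re.compile(r"forked spread pid=(\d+), wrote pidfile (\S+)")

def spread_launch_info(log_text: str) -> dict:
    """Extract the spread binary, config file, pid and pidfile from the log lines."""
    info = {}
    for line in log_text.splitlines():
        if m := LAUNCH.search(line):
            info["binary"], info["config"] = m.group(1), m.group(2)
        elif m := FORKED.search(line):
            info["pid"], info["pidfile"] = int(m.group(1)), m.group(2)
    return info

LOG = """\
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.conf'
2016-07-12 09:24:48.427 unknown:0x7f7352307720 [Comms] <INFO> forked spread pid=33233, wrote pidfile /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.pid
"""

info = spread_launch_info(LOG)
print(os.path.basename(os.path.dirname(info["config"])))
# v_embigdata_dev_node0001_catalog
print(info["pid"])  # 33233
```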

...............
...............
...............

2016-07-12 09:25:00.021 Spread Client:0x7c4eb30 [VMPI] <INFO> We are invited to join the cluster
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30 [Init] <INFO> Startup [Waiting for Cluster Invite] Invited
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30-fff0000000000bca [Txn] <INFO> Begin Txn: fff0000000000bca 'Installing New Catalog'
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30-fff0000000000bca [Catalog] <INFO> installNewCatalog: Received new catalog, replacing current TRANSACTION-fff0000000000bca catalog (old version=0, new version 0x35f2d)

...............
...............
...............

2016-07-12 09:25:24.026 Spread Client:0x7c4eb30 [Txn] <INFO> Commit Complete: Txn: a0000000000001 at epoch 0xa93c
2016-07-12 09:25:24.026 Spread Client:0x7c4eb30 [Txn] <INFO> Joining DB group with catalog version 220973
2016-07-12 09:25:24.026 Spread Client:0x7c4eb30 [Comms] <INFO> joinDBGroup: Node v_embigdata_dev_node0001 (#r0-10#N172016057026) joining cluster for DB EmBigData_dev
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> nodeToState map:
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0001 : INITIALIZING
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0002 : UP
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0003 : UP
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Recover] <INFO> State change for node v_embigdata_dev_node0001: INITIALIZING; catalog 220973
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> Saw membership message 4352 (0x1100) on V:EmBigData_dev
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> DB Group changed
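
The catalog version appears in two notations. The catalog received from the cluster is logged in hex (`new version 0x35f2d`), while the join message and the state-change line print it in decimal (220973). Converting gives 3·16^4 + 5·16^3 + 15·16^2 + 2·16 + 13 = 220973, so the node joined the DB group with exactly the catalog it had just been sent. This one-off Python check uses only the log text above:

```python
import re

# Message text copied from the log excerpt above.
INSTALL_LINE = ("installNewCatalog: Received new catalog, replacing current "
                "TRANSACTION-fff0000000000bca catalog (old version=0, new version 0x35f2d)")
JOIN_LINE = "Joining DB group with catalog version 220973"

# int(..., 16) accepts the "0x" prefix.
received = int(re.search(r"new version (0x[0-9a-fA-F]+)", INSTALL_LINE).group(1), 16)
joined = int(re.search(r"catalog version (\d+)", JOIN_LINE).group(1))

print(received, joined, received == joined)  # 220973 220973 True
```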

...............
...............
...............

2016-07-12 09:25:24.688 DistCall Dispatch:0x7f72a07b6bb0 <LOG> @v_embigdata_dev_node0001: 00000/3298: Event Posted: Event Code:6 Event Id:1 Event Severity: Informational [6] PostedTimestamp: 2016-07-12 09:25:24.688094 ExpirationTimestamp: 2084-07-30 12:39:31.688094 EventCodeDescription: Node State Change ProblemDescription: Changing node v_embigdata_dev_node0001 startup state to RECOVERING DatabaseName: EmBigData_dev Hostname: n1
2016-07-12 09:25:24.688 DistCall Dispatch:0x7f72a07b6bb0 [Recover] <INFO> Changing node v_embigdata_dev_node0001 startup state from INITIALIZING to RECOVERING
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> nodeToState map:
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0001 : RECOVERING
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0002 : UP
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0003 : UP
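
The two nodeToState snapshots trace the failed node's progress: node0001 goes from INITIALIZING to RECOVERING, while node0002 and node0003 stay UP. In a log longer than this excerpt, a small parser makes that sequence easy to follow. This Python 3.8+ sketch assumes the line format shown above. `node_state_snapshots` is a hypothetical helper.

```python
import re

STATE_LINE = re.compile(r"\[Comms\] <INFO> (v_\w+) : (\w+)$")

def node_state_snapshots(log_text: str) -> list:
    """Return one {node: state} dict per 'nodeToState map:' block in the log."""
    snapshots = []
    for line in log_text.splitlines():
        line = line.rstrip()
        if line.endswith("nodeToState map:"):
            snapshots.append({})  # start a new snapshot
        elif snapshots and (m := STATE_LINE.search(line)):
            snapshots[-1][m.group(1)] = m.group(2)
    return snapshots

LOG = """\
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> nodeToState map:
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0001 : INITIALIZING
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0002 : UP
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0003 : UP
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Recover] <INFO> State change for node v_embigdata_dev_node0001: INITIALIZING; catalog 220973
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> nodeToState map:
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0001 : RECOVERING
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0002 : UP
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0003 : UP
"""

snaps = node_state_snapshots(LOG)
print([s["v_embigdata_dev_node0001"] for s in snaps])  # ['INITIALIZING', 'RECOVERING']
```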