repmgr配置备库报错File exists处理一例
2021-02-28 17:22
936 查看
背景
生产上选用repmgr给PostgreSQL数据库做高可用集群,在给生产上一套库做高可用改造时发现standby clone时报错,无法复制备库,报错内容如下:
原因
先说原因,是因为对PG和pg_basebackup比较了解的同学可能自己就可以想出解决方案,不需要再继续往下看了。原因是由于创建的独立表空间指定的目录放在$PGDATA目录下,repmgr的standby clone调用的是pg_basebackup,而且没有指定输出格式,默认为plain,会复制主库目录时把PGDATA目录下所有文件、目录和独立表空间目录,所以会报错File exists。
解决方案
- 迁移主库独立表空间到PGDATA以外的目录(会阻塞写)
- 指定新目录做standby clone,clone完后把把文件移到实际PGDATA目录
由于方案一涉及到对主库做操作,不建议在生产上操作,除非不介意对应用的影响。
测试方案二
节点1操作
添加表空间、创建database、写表
postgres=# create user pguser login password 'pguser'; CREATE ROLE postgres=# create tablespace tbs_mydb owner pguser location '/home/postgres/data/pg_tbs/tbs_mydb'; WARNING: tablespace location should not be inside the data directory CREATE TABLESPACE postgres=# create database mydb with owner=pguser template=template0 encoding='UTF8' tablespace =tbs_mydb; CREATE DATABASE postgres=# grant all on database mydb to pguser with grant option; GRANT postgres=# grant all on tablespace tbs_mydb to pguser; GRANT postgres=# \c mydb pguser You are now connected to database "mydb" as user "pguser". mydb=> create table t1 (id int); CREATE TABLE mydb=> insert into t1 values(1); INSERT 0 1 mydb=> select * from t1; id ---- 1 (1 row)
节点2操作
第一次尝试 standby clone,出现与生产上一致的报错,报错信息与生产一致
INFO: checking and correcting permissions on existing directory "/home/postgres/data" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/data -h 192.168.56.111 -p 6000 -U repmgr -X stream pg_basebackup: could not create directory "/home/postgres/data/pg_tbs": File exists pg_basebackup: removing contents of data directory "/home/postgres/data" pg_basebackup: changes to tablespace directories will not be undone ERROR: unable to take a base backup of the source server HINT: data directory ("/home/postgres/data") may need to be cleaned up manually
修改repmgr.conf中的data_directory='/home/postgres/repmgr'
再次尝试 standby clone,成功
[postgres@repmgr2 ~]$ repmgr -h 192.168.56.111 -U repmgr -d repmgr -f ~/repmgr.conf standby clone -p6000 NOTICE: destination directory "/home/postgres/repmgr" provided INFO: connecting to source node DETAIL: connection string is: host=192.168.56.111 user=repmgr port=6000 dbname=repmgr DETAIL: current installation size is 45 MB DEBUG: 1 node records returned by source node DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: upstream_node_id determined as 111 INFO: replication slot usage not requested; no replication slot will be set up for this standby NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: checking and correcting permissions on existing directory "/home/postgres/repmgr" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/repmgr -h 192.168.56.111 -p 6000 -U repmgr -X stream DEBUG: create_recovery_file(): creating "/home/postgres/repmgr/recovery.conf"... DEBUG: recovery.conf line: standby_mode = 'on' DEBUG: recovery.conf line: primary_conninfo = 'host=192.168.56.111 user=repmgr port=6000 application_name=repmgr2 connect_timeout=2' DEBUG: recovery.conf line: recovery_target_timeline = 'latest' NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: pg_ctl -D /home/postgres/repmgr start HINT: after starting the server, you need to register this standby with "repmgr standby register"
修改repmgr.conf为原来的配置,并把repmgr目录下的所有文件mv到data目录下
data_directory='/home/postgres/data' [postgres@repmgr2 repmgr]$ mv * ~/data/ mv: cannot move ‘pg_tbs’ to ‘/home/postgres/data/pg_tbs’: File exists
修改配置文件中的cluster_name参数并启动数据库
[postgres@repmgr2 data]$ pg_ctl -D /home/postgres/data/ start waiting for server to start....2021-02-28 10:09:15.905 CST [3498] LOG: listening on IPv4 address "0.0.0.0", port 6000 2021-02-28 10:09:15.912 CST [3498] LOG: listening on Unix socket "/tmp/.s.PGSQL.6000" 2021-02-28 10:09:15.949 CST [3498] LOG: redirecting log output to logging collector process 2021-02-28 10:09:15.949 CST [3498] HINT: Future log output will appear in directory "log". . done server started
注册备库成功
[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf standby register INFO: connecting to local node "repmgr2" (ID: 113) DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path=" INFO: connecting to primary database DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 111) INFO: standby registration complete NOTICE: standby node "repmgr2" (ID: 113) successfully registered
检查集群状态
[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf cluster show DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path=" DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path=" ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string -----+---------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------- 111 | repmgr1 | primary | * running | | default | 100 | 5 | host=192.168.56.111 port=6000 user=repmgr dbname=repmgr connect_timeout=2 113 | repmgr2 | standby | running | repmgr1 | default | 100 | 5 | host=192.168.56.113 port=6000 user=repmgr dbname=repmgr connect_timeout=2
测试数据同步
主库测试添加数据
mydb=> insert into t1 values(2); INSERT 0 1 mydb=> select * from t1; id ---- 1 2 (2 rows)
从库查询
[postgres@repmgr2 data]$ psql psql (10.11) Type "help" for help. postgres=# \c mydb pguser You are now connected to database "mydb" as user "pguser". mydb=> select * from t1; id ---- 1 2 (2 rows)
写在最后
其实在创建独立表空间时PG已经做了提示表空间不应用在DATA目录,所以出现上面的报错就是掉进了前人的坑。
WARNING: tablespace location should not be inside the data directory
如果想尝试方案一的可以提供一下思路
#新创建一个表空间 postgres=# create tablespace zhijian owner pguser location '/data/pgdata/11/pg_tbs/tbs_zhijian'; CREATE TABLESPACE #更改数据库的表空间 mydb=> \c postgres postgres postgres=# alter database mydb set tablespace zhijian; ALTER DATABASE
相关文章推荐
- [Linux]-SSL证书配置https及其报错处理
- Linux下配置APUE的编译 报错之后如何处理
- Lync就地升级Skype for Business报错处理一例
- IIS 报错500.13,服务器太忙,此时无法处理请求,通讯量超出网站的配置能力
- 处理第二个从库报错一例
- VMWare虚拟网络配置出错处理一例
- 批处理实现FTP(或者路由器)的快速配置
- day08 EL+JSTL处理Cookie乱码 jsp配置错误页面
- jumpserver注意事项以及报错处理
- 配置用户组策略环回处理模式
- 【故障处理】之Cisco 3650交换机配置无法保存
- Spring Hibernate 单元测试配置为事务自动回滚时需要提交事务时的处理
- 错误处理,与php配置文件php.ini相关函数
- spring中处理中文乱码问题的过滤器配置
- 关于服务器双网卡接口连接两个网络的静态路由配置---处理
- 配置PLSQL问题处理
- DICOM医学图像处理:DCMTK在VS2012中的配置
- SSH框架学习笔记(三)----struts2的开发步骤,常规配置,异常处理,国际化
- 安装MySQL配置时报错的处理方法
- 没有配置辨别者列造成的报错