您的位置:首页 > 数据库

postgresql 9.1 基于 async stream 的 master/salve 切换过程

2018-01-22 10:14 344 查看
vm: Oracle VM VirtualBox 5.2.4 r119785 (Qt5.6.2)

os: debian 8.2

postgresql: 9.1.22

ip 规划如下:两个网段

# 192.168.56.x

# 192.168.165.x

node1

bond0 192.168.56.100

bond0:1 192.168.56.101

eth0

eth1

bond1 192.168.165.100

bond1:1 192.168.165.101

eth2

eth3

node2

bond0 192.168.56.200

bond0:1

eth0

eth1

bond1 192.168.165.200

bond1:1

eth2

eth3

node3

eth0 192.168.56.50

eth1 192.168.165.50

其中 192.168.56.101、192.168.165.101 为 pgsql的 vip,应用程序只需要连接这两个ip中的一个就ok了。

#######################################################################

debian 8.2 安装 pgsql 9.1

#vi /etc/apt/sources.list.d/pgdg.list
deb http://apt.postgresql.org/pub/repos/apt/ jessie-pgdg main

#wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
apt-key add -
#apt-get update


#apt-get install postgresql-9.1


包含了如下包

postgresql-9.1

postgresql-client-9.1

postgresql-client-common

postgresql-common

postgresql-contrib-9.1

手动安装如下包

#apt-get install postgresql-server-dev-9.1


包含了如下包

libpq-dev

postgresql-server-dev-9.1

使用bonding,安装 ifenslave

# apt-get install ifenslave


bonding 配置,可以参考另外一篇blog。

#######################################################################

搭建 pgsql stream 环境,一主两从

node1 节点创建复制用户

$ psql
psql (9.1.22)
postgres=# CREATE USER replicator REPLICATION LOGIN ENCRYPTED PASSWORD 'rep123';


node1、node1、node3 节点修改 pg_hba.conf

$ cd /etc/postgresql/9.1/main
$ vi pg_hba.conf

host     replication     replicator     192.168.56.100/32         md5
host     replication     replicator     192.168.165.100/32        md5
host     replication     replicator     192.168.56.200/32         md5
host     replication     replicator     192.168.165.200/32        md5
host     replication     replicator     192.168.56.50/32          md5
host     replication     replicator     192.168.165.50/32         md5
host     replication     replicator     192.168.56.101/32         md5
host     replication     replicator     192.168.165.101/32        md5


node2,node3 两个节点上修改 postgresql.conf省略。

$ ls -l
drwx------ 5 postgres postgres 4096 1月  16 16:12 base
drwx------ 2 postgres postgres 4096 1月  16 18:51 global
drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_clog
drwx------ 4 postgres postgres 4096 1月  16 16:12 pg_multixact
drwx------ 2 postgres postgres 4096 1月  16 18:48 pg_notify
drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_serial
drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_stat_tmp
drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_subtrans
drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_tblspc
drwx------ 2 postgres postgres 4096 1月  16 16:12 pg_twophase
-rw------- 1 postgres postgres    4 1月  16 16:12 PG_VERSION
drwx------ 3 postgres postgres 4096 1月  16 16:12 pg_xlog
-rw------- 1 postgres postgres  133 1月  16 18:48 postmaster.opts
lrwxrwxrwx 1 root     root       36 1月  16 16:12 server.crt -> /etc/ssl/certs/ssl-cert-snakeoil.pem
lrwxrwxrwx 1 root     root       38 1月  16 16:12 server.key -> /etc/ssl/private/ssl-cert-snakeoil.key


node2,node3 两个节点上使用pg_basebackup来拉取文件

$ pg_basebackup -h 192.168.56.101 -D /var/lib/postgresql/9.1/main -U replicator -v -P


node2,node3 两个节点上配置 recovery.conf

$ cd /var/lib/postgresql/9.1/main
$ vi recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
trigger_file = '/tmp/postgresql.trigger.5432'


#######################################################################

node1 节点 pgbench 生成数据过程中关闭 pgsql

$ psql
psql (9.1.22)
postgres=# create database peiybdb
$ pgbench -i -s 200 peiybdb


node2 查看 lsn

$ psql
psql (9.1.22)
postgres=# select pg_last_xlog_receive_location(),
pg_last_xlog_replay_location(),
pg_last_xact_replay_timestamp(),
now() - pg_last_xact_replay_timestamp() AS replication_delay;
-[ RECORD 1 ]-----------------+------------------------------
pg_last_xlog_receive_location | 0/84001D8
pg_last_xlog_replay_location  | 0/84001D8
pg
10a84
_last_xact_replay_timestamp | 2018-01-16 22:07:30.637413+08
replication_delay             | 00:03:41.62903


node3 查看 lsn

$ psql
psql (9.1.22)
postgres=# select pg_last_xlog_receive_location(),
pg_last_xlog_replay_location(),
pg_last_xact_replay_timestamp(),
now() - pg_last_xact_replay_timestamp() AS replication_delay;
-[ RECORD 1 ]-----------------+------------------------------
pg_last_xlog_receive_location | 0/84001D8
pg_last_xlog_replay_location  | 0/84001D8
pg_last_xact_replay_timestamp | 2018-01-16 22:07:30.637413+08
replication_delay             | 00:04:04.309157


node1 节点 使用 pg_controldata 查看最后生成的 lsn

$ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main


#######################################################################

node1 的 postgresql 会话关闭

$ psql
psql (9.1.22)
postgres=# SELECT 'select pg_terminate_backend('|| psa.procpid || ');' as term_spid,
psa.usename <> 'postgres'
psa.procpid,
psa.waiting,
psa.client_addr,
psa.current_query,
now() - psa.xact_start as xact_time,
now() - psa.query_start as query_time,
psa.*
FROM pg_stat_activity psa
WHERE 1=1
and psa.usename <> 'postgres'
order by case when psa.current_query = '<IDLE>' then 99
when psa.current_query = '<insufficient privilege>' then 98
else 1
end,
now() - psa.xact_start desc
;


然后手动执行 select pg_terminate_backend(”);

node1 的文件 recovery.done

$ cd /var/lib/postgresql/9.1/main/
$ vi recovery.done
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
trigger_file = '/tmp/postgresql.trigger.5432'


node1 的 postgresql 关闭

# service postgresql stop
或者
$ /usr/lib/postgresql/9.1/bin/pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main


node1 的文件 recovery.done 重命名为 recovery.conf,谨防启动后变成双主

$ cd /var/lib/postgresql/9.1/main/
$ mv recovery.done recovery.conf


node1 的 bond 网卡子接口 vip 删除

# ip addr del 192.168.56.101/24 dev bond0 label bond0:1
# ip addr del 192.168.165.101/24 dev bond1 label bond1:1
# ifconfig
# ip a


node2 的 bond 网卡子接口 vip 添加

# ip addr add 192.168.56.101/24 brd 192.168.56.255 dev bond0 label bond0:1
# ip addr add 192.168.165.101/24 brd 192.168.165.255 dev bond1 label bond1:1
# ifconfig
# ip a


node2 进行 arping 广播

# arping -q -A -c 1 -I bond0:1 192.168.56.101
# arping -q -A -c 1 -I bond1:1 10.10.165.101


node2 的从库提升为主库

$ /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main


node2 查看pgsql 状态

$ /usr/lib/postgresql/9.1/bin/pg_controldata /var/lib/postgresql/9.1/main


node2 上生成的 时间线拷贝到其余的从节点上

pgsql 9.1 还需要手动 拷贝这个时间线文件,pgsql 9.3 开始就不需要手动拷贝生成的时间线文件了,会自动传输。这个一定要注意,否则再 log 文件会生成这样的FATAL 提示

timeline 2 of the primary does not match recovery target timeline 1

$ scp ./00000002.history postgres@192.168.56.100:/var/lib/postgresql/9.1/main/pg_xlog/
$ scp ./00000002.history postgres@192.168.56.50:/var/lib/postgresql/9.1/main/pg_xlog/


node1 启动pgsql 9.1

# service postgresql start
或者
$ /usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"


node2 查看从库复制情况

$ psql
psql (9.1.22)
postgres=# select client_addr,
pg_xlog_location_diff(pg_current_xlog_insert_location(), pg_current_xlog_location() ) as local_noflush_delay,
pg_xlog_location_diff(pg_current_xlog_location(), sent_location) as local_sent_delay,
pg_xlog_location_diff(sent_location, write_location) as stream_write_delay,
pg_xlog_location_diff(sent_location, flush_location) as stream_flush_delay,
pg_xlog_location_diff(sent_location, replay_location) as stream_replay_delay
from pg_stat_replication;


node2 查看master当前的 lsn

$ psql
psql (9.1.22)
postgres=# select pg_current_xlog_insert_location(),
pg_current_xlog_location()
;


node3 查看slave当前的 lsn

$ psql
psql (9.1.22)
postgres=# select pg_last_xlog_receive_location(),
pg_last_xlog_replay_location(),
pg_last_xact_replay_timestamp(),
now() - pg_last_xact_replay_timestamp() AS replication_delay
;


至此,pgsql 9.1 的master/slave 切换成功。由于使用了vip,应用不需要做任何的修改,只是在应用的日志里会输出到数据库的连接丢失或者无法连接数据库的提示。

#######################################################################

##### 参考

#######################################################################

1、查看ip

# ifconfig
或者
# ip -a


注意 scope global bond0

scope global secondary bond0:1

2、取消bond0:1 (192.168.56.101)

# ip addr del 192.168.56.101/24 dev bond0 label bond0:1
或者
# ifconfig bond0:1 192.168.56.101 down


3、添加bond0:1 (192.168.56.101)

# ip addr add 192.168.56.101/24 brd 192.168.56.255 dev bond0 label bond0:1
或者
# ifconfig bond0:1 192.168.56.101/24 up


4、postgresql 关闭

# service postgresql stop
或者
$ /usr/lib/postgresql/9.1/bin/pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main


5、postgresql 启动

# service postgresql start
或者
$ /usr/lib/postgresql/9.1/bin/pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"


6、postgresql slave提升为master

$ /usr/lib/postgresql/9.1/bin/pg_ctl promote -D /var/lib/postgresql/9.1/main
或者生成recovery.conf 的 trigger_file
$ touch /tmp/postgresql.trigger.5432


7、recovery.conf 文件

vi recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.56.101 port=5432 user=replicator password=rep123'
trigger_file = '/tmp/postgresql.trigger.5432'


注意,recovery_target_timeline 一定要设置为 latest ,否则就算把新的时间线文件拷贝到slave上,也会报 timeline 2 of the primary does not match recovery target timeline 1 的错误

8、bond0:1、bond1:1 变更后执行 arping

# arping -q -A -c 1 -I bond0:1 192.168.56.101
# arping -q -A -c 1 -I bond1:1 10.10.165.101


关于 arping 命令,可以参考 http://man.linuxde.net/arping
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: