
Configuring a SQL Server 2019 AG on Azure CentOS VMs (Part 1)

2020-04-29 16:39
### Preface

- This article assumes basic familiarity with Azure and SQL Server HA.
- It assumes basic familiarity with the Azure CLI.
- The goal is to create an availability group with three replicas on Azure Linux VMs, and to configure a listener and fencing.

### Environment

- SQL Server 2019 Developer on Linux
- Azure VM fencing agent
- Azure CLI for part of the configuration
- Three CentOS 7.7 Azure VMs named SQL19N1, SQL19N2, and SQL19N3, all in the same VNet

### Steps

- #### Create a resource group and an availability set for the VMs

```
# Create the resource group in China East 2
az group create --name SQL-DEMO-RG --location chinaeast2

# Create the availability set for the VMs, with 2 fault domains and 2 update domains
az vm availability-set create \
    --resource-group SQL-DEMO-RG \
    --name AGLinux-AvailabilitySet \
    --platform-fault-domain-count 2 \
    --platform-update-domain-count 2
```

- #### Deploy the three VMs from a template

When you create the first VM, a template is generated. Download and save it; after changing the parameter values, you can use it to create similarly configured VMs. The main VM settings are:

- Use the availability set created above
- Use the same subnet
- Use a Standard IP
- Configure an SSH public key

The template and parameter files are too long to show here. You can get them from the Azure Portal.

```
# Configuration for SQL19N2; after editing the parameter file, it can be reused to create SQL19N3
templateFile="./templateFile"
paramFile="./vmParams-sql19n2.json"

# Validate the template and parameters first
az deployment group validate --name sql19n2vm \
    -g SQL-DEMO-RG --template-file "$templateFile" --parameters "$paramFile"

# Then run the deployment itself
az deployment group create --name sql19n2vm \
    -g SQL-DEMO-RG --template-file "$templateFile" --parameters "$paramFile"
```

- #### Configure static private IPs and public DNS labels for the VMs

> All three VMs need this change. The commands below show one VM as an example.

```
# List the NIC and IP configuration details
az network nic list -g SQL-DEMO-RG --query "[].{nicName:name,configuration:ipConfigurations[].{ipName:name,ip:privateIpAddress,method:privateIpAllocationMethod}}" -o yaml

# Change privateIpAllocationMethod to Static
az network nic ip-config update -g SQL-DEMO-RG --nic-name sql19n1152 --name ipconfig1 --set privateIpAllocationMethod=Static

# List the public IP names
az network public-ip list -g SQL-DEMO-RG --query "[].name" -o tsv

# Set the DNS name of the public IP; stick to digits and lowercase letters
az network public-ip update -g SQL-DEMO-RG -n SQL19N1ip851 --dns-name sql19n1
```

- #### Install the HA packages

> It is best to update the system packages first, then install the HA packages.

```
yum update -y
yum install -y pacemaker pcs fence-agents-all resource-agents fence-agents-azure-arm
reboot
```

- #### Open the firewall ports for the cluster and SQL Server

```
# Pacemaker and Corosync ports
# TCP: 2224, 3121, 21064, 5405
# UDP: 5405
firewall-cmd --add-port=2224/tcp --permanent
firewall-cmd --add-port=3121/tcp --permanent
firewall-cmd --add-port=21064/tcp --permanent
firewall-cmd --add-port=5405/tcp --permanent
firewall-cmd --add-port=5405/udp --permanent

# SQL Server port and AG mirroring endpoint port
# TCP: 1433, 5022
firewall-cmd --add-port=1433/tcp --permanent
firewall-cmd --add-port=5022/tcp --permanent
firewall-cmd --reload
```

- #### Add hosts entries

```
vi /etc/hosts

# Add the following entries
172.17.2.8 SQL19N1
172.17.2.9 SQL19N2
172.17.2.10 SQL19N3
```

- #### Create the Pacemaker cluster

```
# Set the password of the default Pacemaker user, on all three VMs
passwd hacluster

# Enable pcsd and pacemaker at boot, on all three VMs
systemctl enable pcsd
systemctl start pcsd
systemctl enable pacemaker

# Create the cluster, from the node that will act as master
sudo pcs cluster auth SQL19N1 SQL19N2 SQL19N3 -u hacluster
sudo pcs cluster setup --name agcluster SQL19N1 SQL19N2 SQL19N3 --token 30000 --force
sudo pcs cluster start --all
sudo pcs cluster enable --all

# Check the cluster status
pcs status

# On the three nodes, set the quorum expected-votes to 3 (already the default for a three-node cluster)
# expected-votes is the total vote count from which quorum is calculated
# This change affects only the running cluster and is not saved as permanent cluster configuration
pcs quorum expected-votes 3
```

- #### Configure a Service Principal on Azure for the fencing agent

```
# <appId>, <subscriptionId>, and <resourceGroup> below are placeholders for your own values

# 1. Create the AAD app; record the appId on success
az ad app create --display-name sqldemorg-app --identifier-uris http://localhost \
    --password "1qaz@WSX3edc" --end-date '2030-04-27' --credential-description "sql19 ag secret"

# 2. Create the Service Principal for the AAD app
az ad sp create --id <appId>

# 3. Assign the Service Principal a management role on the VM; repeat for every VM
# I assigned the Owner role here, which is not a secure practice. Define a custom role with only the minimum permissions instead
# Custom roles require the Azure subscription to be at the P1 or P2 tier
az role assignment create --assignee <appId> --role owner \
    --scope /subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Compute/virtualMachines/SQL19N1
```

- #### Create the Azure STONITH device

> I am using Azure China, so cloud=china must be specified. On global Azure this parameter is not needed.
> Run fence_azure_arm -h for more help on this resource agent.

```
pcs property set stonith-timeout=900

# Fill in the empty values: the app ID and password of the Service Principal,
# the resource group, the tenant ID, and the subscription ID
pcs stonith create rsc_st_azure fence_azure_arm login="" passwd="" resourceGroup="" tenantId="" subscriptionId="" power_timeout=240 pcmk_reboot_timeout=900 cloud=china
```

- #### Install SQL Server 2019 and tools

```
# Install SQL Server 2019 and the HA resource agent
sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2019.repo
sudo yum install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup
sudo yum install mssql-server-ha

# Install mssql-tools
sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo
sudo yum install -y mssql-tools unixODBC-devel

# Add the mssql-tools directory to PATH for convenience
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc

# Install mssql-cli
sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
sudo curl -o /etc/yum.repos.d/mssql-cli.repo https://packages.microsoft.com/config/rhel/7/prod.repo
sudo yum install mssql-cli

# Check the SQL Server service status
systemctl status mssql-server
```

> If you are familiar with PowerShell for SQL Server, I recommend installing PowerShell and the SQLServer module as well. Configuring SQL Server is much more convenient with PowerShell.

```
# Install PowerShell and start it
yum install powershell -y
pwsh

# Inside pwsh: install the SQLServer module
Install-Module SQLServer

# List the SQL-related commands
Get-Command -Module SQLServer
```

- #### Configure the AG

- ##### Create a PowerShell function to simplify running T-SQL later

```
# Open the PowerShell profile file; create it if it does not exist
vi /root/.config/powershell/Microsoft.PowerShell_profile.ps1

# Add the following function to the profile so it can be called whenever pwsh starts
# It takes two parameters: $sql is the T-SQL to run, preferably a here-string to avoid escaping problems
# $servers is the array of target instances; it defaults to the three instances in this environment
function run-sql ($sql, $servers = ("SQL19N1","SQL19N2","SQL19N3"))
{
    $secpasswd = "1qaz@WSX" | ConvertTo-SecureString -AsPlainText -Force
    $cred = New-Object System.Management.Automation.PSCredential -ArgumentList 'sa', $secpasswd

    # Echo the statement, followed by a separator
    $sql
    "---------"

    foreach ($svr in $servers)
    {
        "Running T-SQL on $svr..."
        Invoke-Sqlcmd -ServerInstance $svr -Credential $cred -Query $sql
    }
}
```

---

- ##### Enable the HADR feature, on every instance

```
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
```

- ##### Start the AG extended event session

```
-- T-SQL, on every instance
ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);
GO
```

- ##### Create a certificate on the primary replica instance

This certificate authenticates communication between the mirroring endpoints. Copy the certificate and private key to the same directory on the other nodes, and grant the mssql user access to them.

```
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
GO
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
GO
BACKUP CERTIFICATE dbm_certificate
    TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
        ENCRYPTION BY PASSWORD = '1qaz@WSX'
    );
```

```
# Copy the certificate and private key to the secondary replica hosts SQL19N2 and SQL19N3
cd /var/opt/mssql/data
scp dbm_certificate.* root@SQL19N2:/var/opt/mssql/data/
scp dbm_certificate.* root@SQL19N3:/var/opt/mssql/data/

# On the secondary replica nodes, fix the ownership
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
```

- ##### Create the master key and import the certificate on the secondary replica instances

```
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
GO
CREATE CERTIFICATE dbm_certificate
    FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
        DECRYPTION BY PASSWORD = '1qaz@WSX'
    );
```

- ##### Create the AG mirroring endpoint; remember to allow the port in the firewall and the NSG

```
CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );
GO
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
```

- ##### Create the AG with three synchronous-commit replicas, on the primary replica instance

```
CREATE AVAILABILITY GROUP [ag1]
    WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
        N'SQL19N1' WITH (
            ENDPOINT_URL = N'tcp://SQL19N1:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
        ),
        N'SQL19N2' WITH (
            ENDPOINT_URL = N'tcp://SQL19N2:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
        ),
        N'SQL19N3' WITH (
            ENDPOINT_URL = N'tcp://SQL19N3:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE = EXTERNAL,
            SEEDING_MODE = AUTOMATIC,
            SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
        );
GO
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO
```

- ##### Create a SQL login for Pacemaker and grant it permissions, on every instance

```
USE [master]
GO
CREATE LOGIN [pacemakerLogin] WITH PASSWORD = N'1qaz@WSX'
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
GO
```

- ##### Save the Pacemaker login credentials to a local file

```
echo "pacemakerLogin" >> /var/opt/mssql/secrets/passwd
echo "1qaz@WSX" >> /var/opt/mssql/secrets/passwd

# Allow only root to read the file
chown root:root /var/opt/mssql/secrets/passwd
chmod 400 /var/opt/mssql/secrets/passwd
```

- ##### Join the secondary replicas to the AG, on each secondary replica

```
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
GO
-- Permission required by automatic seeding
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO
```

- ##### If you do not want pacemakerLogin to have sysadmin rights, remove it from sysadmin and grant the following permissions instead, on every instance

```
ALTER SERVER ROLE [sysadmin] DROP MEMBER [pacemakerLogin]
GO
GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin;
GO
GRANT VIEW SERVER STATE TO pacemakerLogin;
GO
```

- ##### Add a database to the AG, on the primary replica

```
CREATE DATABASE [db1];
GO
ALTER DATABASE [db1] SET RECOVERY FULL;
GO
BACKUP DATABASE [db1] TO DISK = N'nul';
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
GO
```

- ##### Check the availability database status

```
SELECT * FROM sys.databases WHERE name = 'db1';
GO
SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc
FROM sys.dm_hadr_database_replica_states;
```

- #### Configure the AG in the Pacemaker cluster

----

- ##### Create the AG resource; ag_name must be the name of the AG created earlier

```
pcs resource create agcluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=30s master notify=true
```

- ##### Create the virtual IP resource

```
# Disable fencing
pcs property set stonith-enabled=false

# Create the VIP
pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=172.17.2.7
```

- ##### Create a colocation constraint: the VIP and the master must run on the same node

```
pcs constraint colocation add virtualip agcluster-master INFINITY with-rsc-role=Master
```

- ##### Create an ordering constraint: the VIP starts only after the AG resource has been promoted to master

```
pcs constraint order promote agcluster-master then start virtualip

# Show the current constraints
pcs constraint show --full
```

- ##### Re-enable STONITH and check the cluster status

```
pcs property set stonith-enabled=true
pcs status

# Status output in my environment
---------------------------------------
Cluster name: agcluster
Stack: corosync
Current DC: SQL19N3 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Wed Apr 29 04:24:50 2020
Last change: Wed Apr 29 04:24:45 2020 by root via cibadmin on SQL19N1

3 nodes configured
5 resources configured

Online: [ SQL19N1 SQL19N2 SQL19N3 ]

Full list of resources:

 rsc_st_azure (stonith:fence_azure_arm): Started SQL19N1
 Master/Slave Set: agcluster-master [agcluster]
     Masters: [ SQL19N1 ]
     Slaves: [ SQL19N2 SQL19N3 ]
 virtualip (ocf::heartbeat:IPaddr2): Started SQL19N1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

- ##### Test failover and fencing

```
# Manual failover
pcs resource move agcluster-master SQL19N2 --master
pcs status

# A manual failover creates a constraint that keeps the AG resource from returning to the original node
# If you want the AG to be able to fail back later, remove the constraint manually
pcs constraint show --full
pcs constraint remove cli-prefer-agcluster-master

# Try fencing the cluster nodes; test each node in turn
# The command below only reboots the node; to shut the node down instead, use the --off option
pcs stonith fence SQL19N3 --debug
```
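When repeating the failover test, it can help to confirm from a script which node currently holds the master role. The following is a minimal sketch and not part of the original setup: `ag_masters` and `ag_is_master` are hypothetical helper names, and the parsing assumes the `Masters: [ ... ]` line format that `pcs status` printed in this environment (pcs on CentOS 7). Other pcs versions may lay out the status differently.

```shell
# Print the node name(s) found on the "Masters: [ ... ]" line of `pcs status` output.
# Reads the status text from stdin, so it also works on saved output.
# Assumption: the line looks like "     Masters: [ SQL19N1 ]".
ag_masters() {
    sed -n 's/^[[:space:]]*Masters:[[:space:]]*\[[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\][[:space:]]*$/\1/p'
}

# Return 0 if the given node appears on the Masters line, 1 otherwise.
# The status text is again read from stdin.
ag_is_master() {
    local node="$1" m
    for m in $(ag_masters); do
        [ "$m" = "$node" ] && return 0
    done
    return 1
}
```

For example, `pcs status | ag_masters` prints the current master node, and `pcs status | ag_is_master SQL19N2 && echo "failover done"` checks the result of the manual failover shown earlier.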