Installing Kubernetes 1.14.1 (HA) with kubeadm
2019-07-04 09:59
Official documentation:
https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product-uuid-are-unique-for-every-node
kubeadm init configuration file parameter reference:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/
-
Environment:
Five CentOS 7 hosts, fully updated. The etcd cluster runs on the three master nodes, and Calico is used as the network plugin.
Hostname | IP | Role | Components |
---|---|---|---|
k8s-company01-master01 ~ 03 | 172.16.4.201 ~ 203 | 3 master nodes | keepalived, haproxy, etcd, kubelet, kube-apiserver |
k8s-company01-worker001 ~ 002 | 172.16.4.204 ~ 205 | 2 worker nodes | kubelet |
k8s-company01-lb | 172.16.4.200 | keepalived virtual IP | |
-
Preparation (run on all nodes):
1. Make sure each VM has a unique MAC address and product UUID (check the UUID with: cat /sys/class/dmi/id/product_uuid).
2. Disable swap: swapoff -a; sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
3. Disable SELinux (out of habit) and set the timezone: timedatectl set-timezone Asia/Shanghai; optionally: echo "Asia/Shanghai" > /etc/timezone
4. Sync the clock (etcd is strict about clock consistency): ntpdate asia.pool.ntp.org, and add it to crontab: 8 * * * * /usr/sbin/ntpdate asia.pool.ntp.org && /sbin/hwclock --systohc
5. Run yum update to the latest packages and reboot so the new kernel takes effect.

Notes:

# Disable SELinux
setenforce 0
sed -i --follow-symlinks "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
sed -i --follow-symlinks "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/selinux/config

# Disable firewalld. If it stays on, many components outside Kubernetes end up with broken connectivity and tracking them down one by one is tedious; since this cluster runs on an internal network we simply turn it off.
systemctl stop firewalld.service
systemctl disable firewalld.service

# Set the hostnames (adjust to your own environment), one command per host:
hostnamectl set-hostname k8s-company01-master01
hostnamectl set-hostname k8s-company01-master02
hostnamectl set-hostname k8s-company01-master03
hostnamectl set-hostname k8s-company01-worker001
hostnamectl set-hostname k8s-company01-worker002

# Add the following entries to /etc/hosts on all 5 hosts:
cat >> /etc/hosts <<EOF
172.16.4.201 k8s-company01-master01.skymobi.cn k8s-company01-master01
172.16.4.202 k8s-company01-master02.skymobi.cn k8s-company01-master02
172.16.4.203 k8s-company01-master03.skymobi.cn k8s-company01-master03
172.16.4.200 k8s-company01-lb.skymobi.cn k8s-company01-lb
172.16.4.204 k8s-company01-worker001.skymobi.cn k8s-company01-worker001
172.16.4.205 k8s-company01-worker002.skymobi.cn k8s-company01-worker002
EOF

yum install wget git jq psmisc vim net-tools tcping bash-completion -y
yum update -y && reboot  # The reboot does more than activate the upgraded kernel; it also lets services that reference the hostname pick up the new one.
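Before moving on, it can help to confirm the preparation actually took effect on every node. A minimal check sketch (plain read-only commands; the expected values follow from the settings above):

# Run on each node as root
cat /sys/class/dmi/id/product_uuid           # must be unique per node
ip link | awk '/link\/ether/ {print $2}'     # MAC addresses must be unique per node
swapon --summary                             # should print nothing once swap is off
getenforce                                   # Permissive right after setenforce 0, Disabled after a reboot
systemctl is-active firewalld                # should print inactive
timedatectl | grep "Time zone"               # should show Asia/Shanghai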
-
Install a CRI on every node (Docker here; Kubernetes 1.12+ recommends Docker 18.06, but 18.06 has a root privilege-escalation vulnerability, so we use the latest version, 18.09.5).
Installation reference:
https://kubernetes.io/docs/setup/cri/
## Install prerequisites.
yum install -y yum-utils device-mapper-persistent-data lvm2

## Add the Docker repository.
yum-config-manager \
  --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo

## List all available docker-ce versions: yum list docker-ce --showduplicates | sort -r
## Install Docker. A plain 'yum install docker-ce' would install the latest version; here we pin a specific one:
yum install -y docker-ce-18.09.5 docker-ce-cli-18.09.5

# Set up the daemon.
mkdir /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart Docker.
systemctl daemon-reload
systemctl enable docker.service
systemctl restart docker
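Before installing kubeadm it is worth confirming Docker picked up the pinned version and the systemd cgroup driver, since the kubelet configuration further below reads the cgroup driver from docker info. A small check sketch (the grep patterns are only assumptions about the docker info output wording for this release):

docker version --format '{{.Server.Version}}'        # expect 18.09.5
docker info 2>/dev/null | grep -i 'cgroup driver'    # expect: Cgroup Driver: systemd
docker info 2>/dev/null | grep -i 'storage driver'   # expect: Storage Driver: overlay2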
-
Pin the Docker version to prevent an accidental upgrade to a different major release later:
yum -y install yum-plugin-versionlock
yum versionlock docker-ce docker-ce-cli
yum versionlock list
# Note: to remove the lock later:
# yum versionlock delete docker-ce docker-ce-cli
## Some users on RHEL/CentOS 7 have reported issues with traffic being routed incorrectly due to iptables being bypassed. You should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.may_detach_mounts = 1
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
EOF
modprobe br_netfilter
sysctl --system

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF

# Also install ipvsadm, since kube-proxy will run in ipvs mode later (cri-tools-1.12.0 and kubernetes-cni-0.7.5 are the two dependent packages)
yum install -y kubelet-1.14.1 kubeadm-1.14.1 kubectl-1.14.1 cri-tools-1.12.0 kubernetes-cni-0.7.5 ipvsadm --disableexcludes=kubernetes

# Load the ipvs-related kernel modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
modprobe br_netfilter

# Load them again at boot
cat <<EOF >>/etc/rc.d/rc.local
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
modprobe br_netfilter
EOF
## By default the file rc.local links to is not executable, so add the executable bit
chmod +x /etc/rc.d/rc.local

lsmod | grep ip_vs

# Point the kubelet at a pause image hosted in China and configure the kubelet's cgroup driver
# Read the cgroup driver from Docker
DOCKER_CGROUPS=$(docker info | grep 'Cgroup' | cut -d' ' -f3)
echo $DOCKER_CGROUPS
cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1"
EOF

# Enable the kubelet and start it now. It will keep failing and restarting every few seconds; that is expected, it is waiting for kubeadm to tell it what to do.
systemctl enable --now kubelet

# Enable tab completion for kubectl
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
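A quick sanity check that the sysctl settings and ipvs modules are in place before any cluster component starts (read-only commands; expected values follow from the configuration above):

sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward   # both should report 1
lsmod | egrep 'ip_vs|nf_conntrack_ipv4|br_netfilter'            # the ipvs and bridge modules should be listed
rpm -q kubelet kubeadm kubectl                                  # all three should report version 1.14.1
systemctl is-enabled kubelet                                    # should print enabled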
- Configure the haproxy proxy on the three masters:
Run the following on all three master nodes. Port 16443 proxies the Kubernetes apiserver port 6443; remember to adjust the hostnames and IPs at the end, e.g. server k8s-company01-master01 172.16.4.201:6443

# Pull the haproxy image (alpine-based, small)
docker pull reg01.sky-mobi.com/k8s/haproxy:1.9.1-alpine

mkdir /etc/haproxy
cat >/etc/haproxy/haproxy.cfg<<EOF
global
  log 127.0.0.1 local0 err
  maxconn 30000
  uid 99
  gid 99
  #daemon
  nbproc 1
  pidfile haproxy.pid

defaults
  mode http
  log 127.0.0.1 local0 err
  maxconn 30000
  retries 3
  timeout connect 5s
  timeout client 30s
  timeout server 30s
  timeout check 2s

listen admin_stats
  mode http
  bind 0.0.0.0:1080
  log 127.0.0.1 local0 err
  stats refresh 30s
  stats uri /haproxy-status
  stats realm Haproxy\ Statistics
  stats auth admin:skymobik8s
  stats hide-version
  stats admin if TRUE

frontend k8s-https
  bind 0.0.0.0:16443
  mode tcp
  #maxconn 30000
  default_backend k8s-https

backend k8s-https
  mode tcp
  balance roundrobin
  server k8s-company01-master01 172.16.4.201:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
  server k8s-company01-master02 172.16.4.202:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
  server k8s-company01-master03 172.16.4.203:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
EOF

# Start haproxy
docker run -d --name k8s-haproxy \
  -v /etc/haproxy:/usr/local/etc/haproxy:ro \
  -p 16443:16443 \
  -p 1080:1080 \
  --restart always \
  reg01.sky-mobi.com/k8s/haproxy:1.9.1-alpine

# Check that it started. Connection errors in the logs are normal at this point, because the kube-apiserver port 6443 is not up yet.
docker ps

# If the setup above fails and you need to retry, clean up first:
docker stop k8s-haproxy
docker rm k8s-haproxy
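To confirm haproxy is actually up before continuing, checks along these lines work (the stats credentials admin:skymobik8s come from the config above; backend connection errors remain expected until the apiservers start):

ss -lntp | egrep '16443|1080'                                                 # both ports should be listening
curl -s -u admin:skymobik8s http://127.0.0.1:1080/haproxy-status | head -n 5  # the stats page should respond
docker logs --tail 20 k8s-haproxy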
- Configure keepalived on the three masters
# Pull the keepalived image
docker pull reg01.sky-mobi.com/k8s/keepalived:2.0.10

# Start keepalived. Adjust the interface name and IPs:
# - eth0 is the interface on the 172.16.4.0/24 network in this setup (change it to your own interface name; usage reference: https://github.com/osixia/docker-keepalived/tree/v2.0.10)
# - Keep the password at 8 characters or fewer. With a password like "skymobik8s" only the first 8 characters end up in the VRRP packets: addrs: k8s-master-lb auth "skymobik"
# - Set KEEPALIVED_PRIORITY to 200 on the master node and 150 on the backups
docker run --net=host --cap-add=NET_ADMIN \
  -e KEEPALIVED_ROUTER_ID=55 \
  -e KEEPALIVED_INTERFACE=eth0 \
  -e KEEPALIVED_VIRTUAL_IPS="#PYTHON2BASH:['172.16.4.200']" \
  -e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['172.16.4.201','172.16.4.202','172.16.4.203']" \
  -e KEEPALIVED_PASSWORD=skyk8stx \
  -e KEEPALIVED_PRIORITY=150 \
  --name k8s-keepalived \
  --restart always \
  -d reg01.sky-mobi.com/k8s/keepalived:2.0.10

# Check the logs: two nodes should become BACKUP and one MASTER
docker logs k8s-keepalived
# If the logs show "received an invalid passwd!", another host on the network uses the same ROUTER_ID; change ROUTER_ID and retry.

# Ping the virtual IP from any master
ping -c 4 172.16.4.200

# If the setup above fails and you need to retry, clean up first:
docker stop k8s-keepalived
docker rm k8s-keepalived
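To see which node currently holds the VIP, and whether the VIP answers on the proxied apiserver port, a small sketch (assumes the eth0 interface above and the tcping utility installed during preparation):

ip addr show eth0 | grep 172.16.4.200    # only the current keepalived MASTER prints the address
tcping 172.16.4.200 16443                # the haproxy frontend behind the VIP should accept TCP connections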
- High availability
Reference:
https://kubernetes.io/docs/setup/independent/high-availability/
First node: perform the following on k8s-master01
# Note: adjust the virtual-IP hostname in controlPlaneEndpoint: "k8s-company01-lb:16443" to match your environment
cat << EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
# add the available imageRepository in china
imageRepository: reg01.sky-mobi.com/k8s/k8s.gcr.io
controlPlaneEndpoint: "k8s-company01-lb:16443"
networking:
  podSubnet: "10.254.0.0/16"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
ipvs:
  minSyncPeriod: 1s
  syncPeriod: 10s
mode: ipvs
EOF
kubeadm-config parameter reference:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/
Pre-pull the images:
kubeadm config images pull --config kubeadm-config.yaml
Initialize master01:
kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs

Pay close attention to the messages printed at the start and resolve every WARNING they report. If you want to start over, run kubeadm reset, clear the iptables and ipvs configuration as the reset output suggests (a sketch follows below), and then restart the Docker service.
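A minimal clean-up sketch for the start-over case, assuming the standard hints that kubeadm reset prints (flush iptables, clear ipvs) apply to this setup:

kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X   # flush the rules created by kubeadm/kube-proxy
ipvsadm --clear                                                             # drop the ipvs virtual services
systemctl restart docker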
After it reports success, record all the join parameters printed at the end; they are used to add the remaining nodes (valid for two hours; one command is for joining master nodes, the other for worker nodes):
You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
    --discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383 \
    --experimental-control-plane --certificate-key b56be86f65e73d844bb60783c7bd5d877fe20929296a3e254854d3b623bb86f7

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --experimental-upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

  kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
    --discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383
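If the join commands were not recorded or the token has expired, they can be regenerated on master01; a sketch using standard kubeadm commands (nothing specific to this cluster):

# Print a fresh worker join command (new token, same CA cert hash):
kubeadm token create --print-join-command
# Re-upload the control-plane certificates and print a new certificate key; for masters, append
# --experimental-control-plane --certificate-key <key> to the join command printed above:
kubeadm init phase upload-certs --experimental-upload-certs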
Remember to run the following commands so that kubectl can access the cluster:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Without this, kubectl fails like this:
# [root@k8s-master01 ~]# kubectl -n kube-system get pod
# The connection to the server localhost:8080 was refused - did you specify the right host or port?
When checking the cluster status, it is fine for the coredns pods to be Pending, because the network plugin has not been installed yet.
# Reference output
[root@k8s-master01 ~]# kubectl get pod -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-56c9dc7946-5c5z2               0/1     Pending   0          34m
coredns-56c9dc7946-thqwd               0/1     Pending   0          34m
etcd-k8s-master01                      1/1     Running   2          34m
kube-apiserver-k8s-master01            1/1     Running   2          34m
kube-controller-manager-k8s-master01   1/1     Running   1          33m
kube-proxy-bl9c6                       1/1     Running   2          34m
kube-scheduler-k8s-master01            1/1     Running   1          34m
Join master02 and master03 to the cluster
# Use the join parameters generated earlier to add master02 and master03 to the cluster
# (--experimental-control-plane makes the node join as a control-plane member)
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
    --discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383 \
    --experimental-control-plane --certificate-key b56be86f65e73d844bb60783c7bd5d877fe20929296a3e254854d3b623bb86f7

# If the join parameters were not recorded, or have expired, see:
# http://wiki.sky-mobi.com:8090/pages/viewpage.action?pageId=9079715

# After a successful join, give kubectl access to the cluster on each new master as well:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Install the Calico network plugin (on master01)
Reference: https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico

Download the yaml file (version v3.6.1 here; the file is based on the official
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/typha/calico.yaml
with the pod CIDR, replicas and image addresses modified):

# From outside the datacenter (access-restricted; the company's own public address)
curl http://111.1.17.135/yum/scripts/k8s/calico_v3.6.1.yaml -O
# From inside the datacenter
curl http://192.168.160.200/yum/scripts/k8s/calico_v3.6.1.yaml -O

## Changes made to the yaml file:
## - the pod network CIDR was changed to match podSubnet in kubeadm-config.yaml:
##   export POD_CIDR="10.254.0.0/16" ; sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico.yaml
## - replicas was raised to 3 for production use (the default is 1)
## - the image addresses were changed to point at reg01.sky-mobi.com

# Allow pods to be scheduled onto the master nodes (running this on master01 is enough)
[root@k8s-company01-master01 ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
node/k8s-company01-master01 untainted
node/k8s-company01-master02 untainted
node/k8s-company01-master03 untainted

# Install Calico (to uninstall: kubectl delete -f calico_v3.6.1.yaml)
[root@k8s-company01-master01 ~]# kubectl apply -f calico_v3.6.1.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
service/calico-typha created
deployment.apps/calico-typha created
poddisruptionbudget.policy/calico-typha created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
deployment.extensions/calico-kube-controllers created
serviceaccount/calico-kube-controllers created

# At this point all pods come up and run normally
[root@k8s-company01-master01 ~]# kubectl -n kube-system get pod
NAME                                             READY   STATUS    RESTARTS   AGE
calico-kube-controllers-749f7c8df8-knlx4         0/1     Running   0          20s
calico-kube-controllers-749f7c8df8-ndf55         0/1     Running   0          20s
calico-kube-controllers-749f7c8df8-pqxlx         0/1     Running   0          20s
calico-node-4txj7                                0/1     Running   0          21s
calico-node-9t2l9                                0/1     Running   0          21s
calico-node-rtxlj                                0/1     Running   0          21s
calico-typha-646cdc958c-7j948                    0/1     Pending   0          21s
coredns-56c9dc7946-944nt                         0/1     Running   0          4m9s
coredns-56c9dc7946-nh2sk                         0/1     Running   0          4m9s
etcd-k8s-company01-master01                      1/1     Running   0          3m26s
etcd-k8s-company01-master02                      1/1     Running   0          2m52s
etcd-k8s-company01-master03                      1/1     Running   0          110s
kube-apiserver-k8s-company01-master01            1/1     Running   0          3m23s
kube-apiserver-k8s-company01-master02            1/1     Running   0          2m53s
kube-apiserver-k8s-company01-master03            1/1     Running   1          111s
kube-controller-manager-k8s-company01-master01   1/1     Running   1          3m28s
kube-controller-manager-k8s-company01-master02   1/1     Running   0          2m52s
kube-controller-manager-k8s-company01-master03   1/1     Running   0          56s
kube-proxy-8wm4v                                 1/1     Running   0          4m9s
kube-proxy-vvdrl                                 1/1     Running   0          2m53s
kube-proxy-wnctx                                 1/1     Running   0          2m2s
kube-scheduler-k8s-company01-master01            1/1     Running   1          3m18s
kube-scheduler-k8s-company01-master02            1/1     Running   0          2m52s
kube-scheduler-k8s-company01-master03            1/1     Running   0          55s

# All master nodes are in Ready state
[root@k8s-company01-master01 ~]# kubectl get node
NAME                     STATUS   ROLES    AGE     VERSION
k8s-company01-master01   Ready    master   4m48s   v1.14.1
k8s-company01-master02   Ready    master   3m12s   v1.14.1
k8s-company01-master03   Ready    master   2m21s   v1.14.1

# We once saw coredns restarting in a loop; it recovered after firewalld was turned off, and kept working after firewalld was turned back on...
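Since kube-proxy was configured for ipvs mode in kubeadm-config.yaml, it is worth verifying that ipvs is actually in use (kube-proxy silently falls back to iptables when the ipvs modules are missing). A hedged check sketch:

ipvsadm -Ln | head -n 20                                                     # the kubernetes service ClusterIP should appear as an ipvs virtual server
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50 | grep -i ipvs   # expect a "Using ipvs Proxier" style message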
- Join the two worker nodes to the cluster (do the base preparation described earlier and install docker, kubeadm, etc. first)
# The only difference from joining a master is that the --experimental-control-plane flag is omitted
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
    --discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383

# If the join parameters were not recorded, or have expired, see
# http://wiki.sky-mobi.com:8090/pages/viewpage.action?pageId=9079715

# On success it prints:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.
### Run the kubectl get nodes command on any master node.
[root@k8s-company01-master01 ~]# kubectl get pod -n kube-system -o wide
NAME                                             READY   STATUS             RESTARTS   AGE     IP             NODE                      NOMINATED NODE   READINESS GATES
calico-kube-controllers-749f7c8df8-knlx4         1/1     Running            1          5m2s    10.254.28.66   k8s-company01-master02    <none>           <none>
calico-kube-controllers-749f7c8df8-ndf55         1/1     Running            4          5m2s    10.254.31.67   k8s-company01-master03    <none>           <none>
calico-kube-controllers-749f7c8df8-pqxlx         1/1     Running            4          5m2s    10.254.31.66   k8s-company01-master03    <none>           <none>
calico-node-4txj7                                1/1     Running            0          5m3s    172.16.4.203   k8s-company01-master03    <none>           <none>
calico-node-7fqwh                                1/1     Running            0          68s     172.16.4.205   k8s-company01-worker002   <none>           <none>
calico-node-9t2l9                                1/1     Running            0          5m3s    172.16.4.201   k8s-company01-master01    <none>           <none>
calico-node-rkfxj                                1/1     Running            0          86s     172.16.4.204   k8s-company01-worker001   <none>           <none>
calico-node-rtxlj                                1/1     Running            0          5m3s    172.16.4.202   k8s-company01-master02    <none>           <none>
calico-typha-646cdc958c-7j948                    1/1     Running            0          5m3s    172.16.4.204   k8s-company01-worker001   <none>           <none>
coredns-56c9dc7946-944nt                         0/1     CrashLoopBackOff   4          8m51s   10.254.28.65   k8s-company01-master02    <none>           <none>
coredns-56c9dc7946-nh2sk                         0/1     CrashLoopBackOff   4          8m51s   10.254.31.65   k8s-company01-master03    <none>           <none>
etcd-k8s-company01-master01                      1/1     Running            0          8m8s    172.16.4.201   k8s-company01-master01    <none>           <none>
etcd-k8s-company01-master02                      1/1     Running            0          7m34s   172.16.4.202   k8s-company01-master02    <none>           <none>
etcd-k8s-company01-master03                      1/1     Running            0          6m32s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-apiserver-k8s-company01-master01            1/1     Running            0          8m5s    172.16.4.201   k8s-company01-master01    <none>           <none>
kube-apiserver-k8s-company01-master02            1/1     Running            0          7m35s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-apiserver-k8s-company01-master03            1/1     Running            1          6m33s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-controller-manager-k8s-company01-master01   1/1     Running            1          8m10s   172.16.4.201   k8s-company01-master01    <none>           <none>
kube-controller-manager-k8s-company01-master02   1/1     Running            0          7m34s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-controller-manager-k8s-company01-master03   1/1     Running            0          5m38s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-proxy-8wm4v                                 1/1     Running            0          8m51s   172.16.4.201   k8s-company01-master01    <none>           <none>
kube-proxy-k8rng                                 1/1     Running            0          68s     172.16.4.205   k8s-company01-worker002   <none>           <none>
kube-proxy-rqnkv                                 1/1     Running            0          86s     172.16.4.204   k8s-company01-worker001   <none>           <none>
kube-proxy-vvdrl                                 1/1     Running            0          7m35s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-proxy-wnctx                                 1/1     Running            0          6m44s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-scheduler-k8s-company01-master01            1/1     Running            1          8m      172.16.4.201   k8s-company01-master01    <none>           <none>
kube-scheduler-k8s-company01-master02            1/1     Running            0          7m34s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-scheduler-k8s-company01-master03            1/1     Running            0          5m37s   172.16.4.203   k8s-company01-master03    <none>           <none>

[root@k8s-company01-master01 ~]# kubectl get nodes
NAME                      STATUS   ROLES    AGE     VERSION
k8s-company01-master01    Ready    master   9m51s   v1.14.1
k8s-company01-master02    Ready    master   8m15s   v1.14.1
k8s-company01-master03    Ready    master   7m24s   v1.14.1
k8s-company01-worker001   Ready    <none>   2m6s    v1.14.1
k8s-company01-worker002   Ready    <none>   108s    v1.14.1

[root@k8s-company01-master01 ~]# kubectl get csr
NAME        AGE     REQUESTOR                            CONDITION
csr-94f5v   8m27s   system:bootstrap:fp0x6g              Approved,Issued
csr-g9tbg   2m19s   system:bootstrap:fp0x6g              Approved,Issued
csr-pqr6l   7m49s   system:bootstrap:fp0x6g              Approved,Issued
csr-vwtqq   2m      system:bootstrap:fp0x6g              Approved,Issued
csr-w486d   10m     system:node:k8s-company01-master01   Approved,Issued

[root@k8s-company01-master01 ~]# kubectl get componentstatuses
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
- Install metrics-server for basic monitoring, e.g. the kubectl top nodes command
# Without metrics-server:
[root@k8s-master03 ~]# kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

Here we install it with helm.

Install helm (on master01):
wget http://192.168.160.200/yum/scripts/k8s/helm-v2.13.1-linux-amd64.tar.gz
# or
wget http://111.1.17.135/yum/scripts/k8s/helm-v2.13.1-linux-amd64.tar.gz
tar xvzf helm-v2.13.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
# Verify
helm help

Run on every node:
yum install -y socat

Use the Microsoft mirror (the Aliyun mirror has not been updated for a long time!):
# helm init --client-only --stable-repo-url https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts/
# helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm init --client-only --stable-repo-url http://mirror.azure.cn/kubernetes/charts/
helm repo add incubator http://mirror.azure.cn/kubernetes/charts-incubator/
helm repo update

# Install the Tiller service into Kubernetes. The official image cannot be pulled from here, so use -i to point at your own image; one option is registry.cn-hangzhou.aliyuncs.com/google_containers/tiller (Aliyun). The image tag must match the helm client version, which you can check with helm version.
helm init --service-account tiller --upgrade \
  -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 \
  --tiller-tls-cert /etc/kubernetes/ssl/tiller001.pem \
  --tiller-tls-key /etc/kubernetes/ssl/tiller001-key.pem \
  --tls-ca-cert /etc/kubernetes/ssl/ca.pem \
  --tiller-namespace kube-system \
  --stable-repo-url http://mirror.azure.cn/kubernetes/charts/ \
  --history-max 200
Grant Tiller permissions (run on master01)
# Helm's server side, Tiller, is a Deployment in the kube-system namespace of Kubernetes; it talks to the kube-apiserver to create and delete applications in the cluster.
# Since Kubernetes 1.6 the API Server has RBAC enabled. The default Tiller deployment defines no authorized ServiceAccount, so its requests to the API Server get rejected. We therefore explicitly grant the Tiller deployment a ServiceAccount.

# Create the Kubernetes service account and bind the role
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

# Verify that the authorization worked
[root@k8s-company01-master01 ~]# kubectl -n kube-system get pods|grep tiller
tiller-deploy-7bf47568d4-42wf5   1/1   Running   0   17s
[root@k8s-company01-master01 ~]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
[root@k8s-company01-master01 ~]# helm repo list
NAME       URL
stable     http://mirror.azure.cn/kubernetes/charts/
local      http://127.0.0.1:8879/charts
incubator  http://mirror.azure.cn/kubernetes/charts-incubator/

## To switch repositories, remove the old one first
#helm repo remove stable
## then add the new addresses
#helm repo add stable http://mirror.azure.cn/kubernetes/charts/
#helm repo add incubator http://mirror.azure.cn/kubernetes/charts-incubator/
#helm repo update
Install metrics-server with helm (run on master01, since only master01 has helm installed)
# Create metrics-server-custom.yaml
cat >> metrics-server-custom.yaml <<EOF
image:
  repository: reg01.sky-mobi.com/k8s/gcr.io/google_containers/metrics-server-amd64
  tag: v0.3.1
args:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
EOF

# Install metrics-server (-n here is the release name)
[root@k8s-master01 ~]# helm install stable/metrics-server -n metrics-server --namespace kube-system --version=2.5.1 -f metrics-server-custom.yaml

[root@k8s-company01-master01 ~]# kubectl get pod -n kube-system | grep metrics
metrics-server-dcbdb9468-c5f4n   1/1   Running   0   21s

# After the custom yaml is applied, the old metrics-server pod is torn down and a new one is started. Once the new pod is up, wait a minute or two and kubectl top starts returning data:
[root@k8s-company01-master01 ~]# kubectl top node
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-company01-master01    404m         5%     1276Mi          4%
k8s-company01-master02    493m         6%     1240Mi          3%
k8s-company01-master03    516m         6%     1224Mi          3%
k8s-company01-worker001   466m         0%     601Mi           0%
k8s-company01-worker002   244m         0%     516Mi           0%
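With node metrics working, pod-level metrics come from the same API; for example (standard kubectl top usage, not specific to this chart):

kubectl top pod -n kube-system                      # resource usage of control-plane and add-on pods
kubectl top pod --all-namespaces | grep coredns     # usage of a single workload across namespaces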
- Install prometheus-operator with helm
# For easier management, create a dedicated Namespace "monitoring"; all Prometheus Operator components will be deployed into it.
kubectl create namespace monitoring

## Customize the prometheus-operator values
# helm fetch stable/prometheus-operator --version=5.0.3 --untar
# cat prometheus-operator/values.yaml | grep -v '#' | grep -v ^$ > prometheus-operator-custom.yaml
# Keep only the parts where we override images, plus the https connection to etcd, for example:
# Reference: https://fengxsong.github.io/2018/05/30/Using-helm-to-manage-prometheus-operator/
cat >> prometheus-operator-custom.yaml << EOF
## prometheus-operator/values.yaml
alertmanager:
  service:
    nodePort: 30503
    type: NodePort
  alertmanagerSpec:
    image:
      repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/alertmanager
      tag: v0.16.1
prometheusOperator:
  image:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/prometheus-operator
    tag: v0.29.0
    pullPolicy: IfNotPresent
  configmapReloadImage:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/configmap-reload
    tag: v0.0.1
  prometheusConfigReloaderImage:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/prometheus-config-reloader
    tag: v0.29.0
  hyperkubeImage:
    repository: reg01.sky-mobi.com/k8s/k8s.gcr.io/hyperkube
    tag: v1.12.1
    pullPolicy: IfNotPresent
prometheus:
  service:
    nodePort: 30504
    type: NodePort
  prometheusSpec:
    image:
      repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/prometheus
      tag: v2.7.1
    secrets: [etcd-client-cert]
kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: ""
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key
## prometheus-operator/charts/grafana/values.yaml
grafana:
  service:
    nodePort: 30505
    type: NodePort
  image:
    repository: reg01.sky-mobi.com/k8s/grafana/grafana
    tag: 6.0.2
  sidecar:
    image: reg01.sky-mobi.com/k8s/kiwigrid/k8s-sidecar:0.0.13
## prometheus-operator/charts/kube-state-metrics/values.yaml
kube-state-metrics:
  image:
    repository: reg01.sky-mobi.com/k8s/k8s.gcr.io/kube-state-metrics
    tag: v1.5.0
## prometheus-operator/charts/prometheus-node-exporter/values.yaml
prometheus-node-exporter:
  image:
    repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/node-exporter
    tag: v0.17.0
EOF

## Note: the grafana, kube-state-metrics and prometheus-node-exporter sections above follow the sub-chart layout under prometheus-operator/charts/:
#[root@k8s-master01 ~]# ll prometheus-operator/charts/
#total 0
#drwxr-xr-x 4 root root 114 Apr  1 00:48 grafana
#drwxr-xr-x 3 root root  96 Apr  1 00:18 kube-state-metrics
#drwxr-xr-x 3 root root 110 Apr  1 00:20 prometheus-node-exporter

# Create the secret with the client certificates used to connect to etcd:
kubectl -n monitoring create secret generic etcd-client-cert --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key

helm install stable/prometheus-operator --version=5.0.3 --name=monitoring --namespace=monitoring -f prometheus-operator-custom.yaml

## To delete everything and start over, remove the release by name with helm:
#helm del --purge monitoring
#kubectl delete crd prometheusrules.monitoring.coreos.com
#kubectl delete crd servicemonitors.monitoring.coreos.com
#kubectl delete crd alertmanagers.monitoring.coreos.com

# To re-apply changes, do not delete the release first (reinstalling over leftovers can error out); just use upgrade:
helm upgrade monitoring stable/prometheus-operator --version=5.0.3 --namespace=monitoring -f prometheus-operator-custom.yaml

[root@k8s-company01-master01 ~]# kubectl -n monitoring get pod
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-monitoring-prometheus-oper-alertmanager-0   2/2     Running   0          29m
monitoring-grafana-7dd5cf9dd7-wx8mz                      2/2     Running   0          29m
monitoring-kube-state-metrics-7d98487cfc-t6qqw           1/1     Running   0          29m
monitoring-prometheus-node-exporter-fnvp9                1/1     Running   0          29m
monitoring-prometheus-node-exporter-kczcq                1/1     Running   0          29m
monitoring-prometheus-node-exporter-m8kf6                1/1     Running   0          29m
monitoring-prometheus-node-exporter-mwc4g                1/1     Running   0          29m
monitoring-prometheus-node-exporter-wxmt8                1/1     Running   0          29m
monitoring-prometheus-oper-operator-7f96b488f6-2j7h5     1/1     Running   0          29m
prometheus-monitoring-prometheus-oper-prometheus-0       3/3     Running   1          28m

[root@k8s-company01-master01 ~]# kubectl get svc -n monitoring
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,6783/TCP   31m
monitoring-grafana                        NodePort    10.109.159.105   <none>        80:30579/TCP        32m
monitoring-kube-state-metrics             ClusterIP   10.100.31.235    <none>        8080/TCP            32m
monitoring-prometheus-node-exporter       ClusterIP   10.109.119.13    <none>        9100/TCP            32m
monitoring-prometheus-oper-alertmanager   NodePort    10.105.171.135   <none>        9093:31309/TCP      32m
monitoring-prometheus-oper-operator       ClusterIP   10.98.135.170    <none>        8080/TCP            32m
monitoring-prometheus-oper-prometheus     NodePort    10.96.15.36      <none>        9090:32489/TCP      32m
prometheus-operated                       ClusterIP   None             <none>        9090/TCP            31m

# Check for unexpected alerts; the first alert, Watchdog, is expected and exists only to prove the alerting pipeline works.
http://172.16.4.200:32489/alerts
http://172.16.4.200:32489/targets

# The following would install kubernetes-dashboard; it is of limited use, so we skip it in production for now:
#helm install --name=kubernetes-dashboard stable/kubernetes-dashboard --version=1.4.0 --namespace=kube-system --set image.repository=reg01.sky-mobi.com/k8s/k8s.gcr.io/kubernetes-dashboard-amd64,image.tag=v1.10.1,rbac.clusterAdminRole=true
# Heapster was removed in Kubernetes 1.13 (https://github.com/kubernetes/heapster/blob/master/docs/deprecation.md); metrics-server and Prometheus are the recommended replacements.
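To log in to the Grafana instance deployed by this release, the admin password is stored in a secret; a hedged sketch (the secret and service names below follow the "monitoring" release shown above and may differ if the release name changes):

kubectl -n monitoring get secret monitoring-grafana -o jsonpath='{.data.admin-password}' | base64 -d; echo
# Then open the Grafana NodePort shown by "kubectl get svc -n monitoring" on any node, e.g. http://172.16.4.200:30579/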