
Installing Kubernetes 1.14.1 (HA) with kubeadm

2019-07-04

Official reference:

https://kubernetes.io/docs/setup/independent/install-kubeadm/#verify-the-mac-address-and-product-uuid-are-unique-for-every-node

kubeadm init configuration file parameters:

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/
  • Environment:

5 CentOS 7 hosts, fully up to date
The etcd cluster runs on the 3 master nodes
Calico is used as the network plugin
Hostname                       IP                   Role                    Components
k8s-company01-master01 ~ 03    172.16.4.201 ~ 203   3 master nodes          keepalived, haproxy, etcd, kubelet, kube-apiserver
k8s-company01-worker001 ~ 002  172.16.4.204 ~ 205   2 worker nodes          kubelet
k8s-company01-lb               172.16.4.200         keepalived virtual IP
  • Preparation (run on all nodes):

1. On VMs, verify that the MAC address and product UUID are unique for every node.
(check the UUID with: cat /sys/class/dmi/id/product_uuid)
2. Disable swap.
(run: swapoff -a; sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab)
3. Disable SELinux (as usual) and set the time zone: timedatectl set-timezone Asia/Shanghai; optionally: echo "Asia/Shanghai" > /etc/timezone
4. Sync the clock (etcd is sensitive to clock skew): ntpdate asia.pool.ntp.org
(add to crontab: 8 * * * * /usr/sbin/ntpdate asia.pool.ntp.org && /sbin/hwclock --systohc )
5. Run yum update to the latest packages and reboot so the new kernel takes effect.

Note: disable SELinux
setenforce 0
sed -i --follow-symlinks "s/^SELINUX=enforcing/SELINUX=disabled/g"  /etc/selinux/config
sed -i --follow-symlinks "s/^SELINUX=permissive/SELINUX=disabled/g" /etc/selinux/config

Disable firewalld. If it is left running, many non-k8s components will hit networking problems that are tedious to track down one by one; since this cluster runs on an internal network, we simply turn it off.
systemctl stop firewalld.service
systemctl disable firewalld.service

Configure hostnames (adjust them to your own environment):
Set the hostname on each of the 5 hosts:
hostnamectl set-hostname k8s-company01-master01
hostnamectl set-hostname k8s-company01-master02
hostnamectl set-hostname k8s-company01-master03
hostnamectl set-hostname k8s-company01-worker001
hostnamectl set-hostname k8s-company01-worker002

Add the following to /etc/hosts on all 5 hosts:
cat >> /etc/hosts <<EOF
172.16.4.201 k8s-company01-master01.skymobi.cn k8s-company01-master01
172.16.4.202 k8s-company01-master02.skymobi.cn k8s-company01-master02
172.16.4.203 k8s-company01-master03.skymobi.cn k8s-company01-master03
172.16.4.200 k8s-company01-lb.skymobi.cn k8s-company01-lb
172.16.4.204 k8s-company01-worker001.skymobi.cn k8s-company01-worker001
172.16.4.205 k8s-company01-worker002.skymobi.cn k8s-company01-worker002
EOF

yum install wget git jq psmisc vim net-tools tcping bash-completion -y
yum update -y && reboot
# Rebooting not only activates the newly upgraded kernel, it also lets services that rely on the hostname pick up the new one
  • Install a CRI on every node (Docker is used here; since k8s 1.12 the recommended version is Docker 18.06, but 18.06 has a root privilege-escalation vulnerability, so we use the latest version, 18.09.5)

Installation reference:

https://kubernetes.io/docs/setup/cri/

## Install prerequisites.
yum install -y yum-utils device-mapper-persistent-data lvm2

## Add docker repository.
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

## List all available docker-ce versions: yum list docker-ce --showduplicates | sort -r
## Install Docker. A plain 'yum install docker-ce' installs the latest version; here we pin a specific one:
yum install -y docker-ce-18.09.5 docker-ce-cli-18.09.5

# Setup daemon.
mkdir /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart docker.
systemctl daemon-reload
systemctl enable docker.service
systemctl restart docker
  • Pin the Docker version to prevent an accidental upgrade to another major version later:

yum -y install yum-plugin-versionlock
yum versionlock docker-ce docker-ce-cli
yum versionlock list

# Note:
# To unlock:
# yum versionlock delete docker-ce docker-ce-cli
## Some users on RHEL/CentOS 7 have reported issues with traffic being routed incorrectly due to iptables being bypassed. You should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.

cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.may_detach_mounts = 1
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.netfilter.nf_conntrack_max=2310720
EOF

modprobe br_netfilter
sysctl --system
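
As a quick sanity check, the keys can be read back; each of the following should print 1 if the settings above were applied (a minimal verification sketch, assuming br_netfilter is loaded as above):

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward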

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF

Also install ipvsadm, since kube-proxy will run in IPVS mode later (cri-tools-1.12.0 and kubernetes-cni-0.7.5 are two related dependency packages)
yum install -y kubelet-1.14.1 kubeadm-1.14.1 kubectl-1.14.1 cri-tools-1.12.0 kubernetes-cni-0.7.5 ipvsadm --disableexcludes=kubernetes

# Load the IPVS-related kernel modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
modprobe br_netfilter

# Load them on boot as well
cat <<EOF >>/etc/rc.d/rc.local
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
modprobe br_netfilter
EOF

## By default the file that the rc.local symlink points to is not executable, so add the executable bit
chmod +x /etc/rc.d/rc.local

lsmod | grep ip_vs

# Configure kubelet to use a pause image mirrored in China
# Configure kubelet's cgroup driver
# Read the cgroup driver Docker is using
DOCKER_CGROUPS=$(docker info | grep 'Cgroup' | cut -d' ' -f3)
echo $DOCKER_CGROUPS
cat > /etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1"
EOF

# Enable at boot and start now. kubelet will fail and restart every few seconds for now; it is waiting for kubeadm to tell it what to do.
systemctl enable --now kubelet

# Enable tab completion for kubectl
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
  • Configure the haproxy proxy on the three masters:
Run the following on all three master nodes. Port 16443 proxies the k8s API server port 6443; adjust the hostnames and IPs at the end, e.g. server k8s-company01-master01 172.16.4.201:6443
# Pull the haproxy image (the small alpine variant)
docker pull reg01.sky-mobi.com/k8s/haproxy:1.9.1-alpine
mkdir /etc/haproxy
cat >/etc/haproxy/haproxy.cfg<<EOF
global
    log 127.0.0.1 local0 err
    maxconn 30000
    uid 99
    gid 99
    #daemon
    nbproc 1
    pidfile haproxy.pid

defaults
    mode http
    log 127.0.0.1 local0 err
    maxconn 30000
    retries 3
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    timeout check 2s

listen admin_stats
    mode http
    bind 0.0.0.0:1080
    log 127.0.0.1 local0 err
    stats refresh 30s
    stats uri     /haproxy-status
    stats realm   Haproxy\ Statistics
    stats auth    admin:skymobik8s
    stats hide-version
    stats admin if TRUE

frontend k8s-https
    bind 0.0.0.0:16443
    mode tcp
    #maxconn 30000
    default_backend k8s-https

backend k8s-https
    mode tcp
    balance roundrobin
    server k8s-company01-master01 172.16.4.201:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
    server k8s-company01-master02 172.16.4.202:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
    server k8s-company01-master03 172.16.4.203:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
EOF

# Start haproxy
docker run -d --name k8s-haproxy \
-v /etc/haproxy:/usr/local/etc/haproxy:ro \
-p 16443:16443 \
-p 1080:1080 \
--restart always \
reg01.sky-mobi.com/k8s/haproxy:1.9.1-alpine

# Check that it started. Connection errors in the log are expected at this point, because kube-apiserver is not listening on 6443 yet.
docker ps
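
To look at those logs directly, a minimal check might be the following (the connection errors stop once kube-apiserver is up; the stats page configured above is at http://<master IP>:1080/haproxy-status):

docker logs --tail 20 k8s-haproxy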

# If the setup above went wrong, clean up and try again
docker stop k8s-haproxy
docker rm k8s-haproxy
  • Configure keepalived on the three masters
# Pull the keepalived image
docker pull reg01.sky-mobi.com/k8s/keepalived:2.0.10

# Start keepalived; adjust the interface name and IPs
# eth0 is the interface on the 172.16.4.0/24 network in this setup (if yours differs, change it to your own interface name; usage reference: https://github.com/osixia/docker-keepalived/tree/v2.0.10)
# Keep the password to 8 characters or fewer; with e.g. skymobk8s only the first 8 characters end up in the VRRP packets: addrs: k8s-master-lb auth "skymobik"
# Set KEEPALIVED_PRIORITY to 200 on the MASTER node and 150 on the BACKUP nodes
docker run --net=host --cap-add=NET_ADMIN \
-e KEEPALIVED_ROUTER_ID=55 \
-e KEEPALIVED_INTERFACE=eth0 \
-e KEEPALIVED_VIRTUAL_IPS="#PYTHON2BASH:['172.16.4.200']" \
-e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['172.16.4.201','172.16.4.202','172.16.4.203']" \
-e KEEPALIVED_PASSWORD=skyk8stx \
-e KEEPALIVED_PRIORITY=150 \
--name k8s-keepalived \
--restart always \
-d reg01.sky-mobi.com/k8s/keepalived:2.0.10

# Check the logs
# You should see two nodes become BACKUP and one become MASTER
docker logs k8s-keepalived
# If the log shows "received an invalid passwd!", something else on the network uses the same ROUTER_ID; just change KEEPALIVED_ROUTER_ID.

# Ping test from any master
ping -c 4 <virtual IP>

# If the setup above went wrong, clean up and try again
docker stop k8s-keepalived
docker rm k8s-keepalived
  • High availability
    Reference:
https://kubernetes.io/docs/setup/independent/high-availability/

On the first master, master01:

# Note: change the virtual-IP hostname in controlPlaneEndpoint: "k8s-company01-lb:16443" to match your environment
cat << EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
# add the available imageRepository in china
imageRepository: reg01.sky-mobi.com/k8s/k8s.gcr.io
controlPlaneEndpoint: "k8s-company01-lb:16443"
networking:
  podSubnet: "10.254.0.0/16"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
ipvs:
  minSyncPeriod: 1s
  syncPeriod: 10s
mode: ipvs
EOF

kubeadm-config parameter reference:

https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/

Pre-pull the images:

kubeadm config images pull --config kubeadm-config.yaml

Initialize master01:

kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs
Pay attention to the messages printed at the start and resolve every WARNING it reports.
If you want to start over, run kubeadm reset, clear the iptables and IPVS rules as its output suggests, and then restart the Docker service.
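
A rough sketch of that cleanup, assuming the IPVS setup used in this guide (adapt it to whatever kubeadm reset actually prints):

kubeadm reset
# flush iptables rules and clear the IPVS tables, as suggested by the reset output
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear
systemctl restart docker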

After it reports success, record all the join parameters printed at the end; they are used to join the remaining nodes (valid for two hours; one command is for joining master nodes, the other for worker nodes).

You can now join any number of the control-plane node running the following command on each as root:

kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383 \
--experimental-control-plane --certificate-key b56be86f65e73d844bb60783c7bd5d877fe20929296a3e254854d3b623bb86f7

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --experimental-upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383

Remember to run the following so kubectl can access the cluster:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# If you skip this, you will see the following error:
# [root@k8s-master01 ~]# kubectl -n kube-system get pod
# The connection to the server localhost:8080 was refused - did you specify the right host or port?

When checking cluster status, it is fine for coredns to be Pending, because the network plugin is not installed yet.

# Sample output for reference
[root@k8s-master01 ~]# kubectl get pod -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-56c9dc7946-5c5z2               0/1     Pending   0          34m
coredns-56c9dc7946-thqwd               0/1     Pending   0          34m
etcd-k8s-master01                      1/1     Running   2          34m
kube-apiserver-k8s-master01            1/1     Running   2          34m
kube-controller-manager-k8s-master01   1/1     Running   1          33m
kube-proxy-bl9c6                       1/1     Running   2          34m
kube-scheduler-k8s-master01            1/1     Running   1          34m
Join master02 and master03 to the cluster
# Use the previously generated join parameters to add master02 and master03 (--experimental-control-plane makes them join as control-plane nodes)

kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383 \
--experimental-control-plane --certificate-key b56be86f65e73d844bb60783c7bd5d877fe20929296a3e254854d3b623bb86f7

# If the join parameters were not recorded, or have already expired, see:
http://wiki.sky-mobi.com:8090/pages/viewpage.action?pageId=9079715
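
If that page is not reachable, the parameters can also be regenerated on master01. A minimal sketch using standard kubeadm 1.14 subcommands (the key printed by upload-certs goes into --certificate-key of the control-plane join command):

# print a fresh worker join command with a new token
kubeadm token create --print-join-command
# re-upload the control-plane certificates and print a new certificate key (valid for two hours)
kubeadm init phase upload-certs --experimental-upload-certs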

# After joining successfully, set up kubectl access on these nodes as well
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Install the calico network plugin (run on master01)

Reference:
https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico

Download the yaml file (version v3.6.1 here; the file comes from the official manifest https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/typha/calico.yaml with the pod CIDR, replicas and image addresses modified)

# For use outside the data center (access restricted; the company's own public address)
curl http://111.1.17.135/yum/scripts/k8s/calico_v3.6.1.yaml -O
# For use inside the data center
curl http://192.168.160.200/yum/scripts/k8s/calico_v3.6.1.yaml -O

## Edit the yaml file so the network CIDR matches podSubnet in kubeadm-config.yaml:
##
## export POD_CIDR="10.254.0.0/16" ; sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico.yaml
## Change replicas to 3 for production use (the default is 1)
## The image addresses were also changed; the images are hosted on reg01.sky-mobi.com
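
A sketch of those edits, assuming the downloaded file is named calico_v3.6.1.yaml, that the stock manifest references images as calico/..., and that the replicas change targets the typha Deployment's default replicas: 1 (check your copy of the file before running these):

export POD_CIDR="10.254.0.0/16"
sed -i -e "s?192.168.0.0/16?$POD_CIDR?g" calico_v3.6.1.yaml
# hypothetical: point the images at the internal registry
sed -i -e "s?image: calico/?image: reg01.sky-mobi.com/k8s/calico/?g" calico_v3.6.1.yaml
# hypothetical: raise the typha replicas from the default 1 to 3
sed -i -e "s/replicas: 1/replicas: 3/" calico_v3.6.1.yaml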

# Allow pods to be scheduled onto the master nodes (running this on master01 is enough)
[root@k8s-company01-master01 ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
node/k8s-company01-master01 untainted
node/k8s-company01-master02 untainted
node/k8s-company01-master03 untainted

# Install calico (to uninstall: kubectl delete -f calico_v3.6.1.yaml)
[root@k8s-company01-master01 ~]# kubectl apply -f calico_v3.6.1.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
service/calico-typha created
deployment.apps/calico-typha created
poddisruptionbudget.policy/calico-typha created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
deployment.extensions/calico-kube-controllers created
serviceaccount/calico-kube-controllers created

# At this point all the pods are up and running (some are still becoming Ready)
[root@k8s-company01-master01 ~]# kubectl -n kube-system get pod
NAME                                             READY   STATUS    RESTARTS   AGE
calico-kube-controllers-749f7c8df8-knlx4         0/1     Running   0          20s
calico-kube-controllers-749f7c8df8-ndf55         0/1     Running   0          20s
calico-kube-controllers-749f7c8df8-pqxlx         0/1     Running   0          20s
calico-node-4txj7                                0/1     Running   0          21s
calico-node-9t2l9                                0/1     Running   0          21s
calico-node-rtxlj                                0/1     Running   0          21s
calico-typha-646cdc958c-7j948                    0/1     Pending   0          21s
coredns-56c9dc7946-944nt                         0/1     Running   0          4m9s
coredns-56c9dc7946-nh2sk                         0/1     Running   0          4m9s
etcd-k8s-company01-master01                      1/1     Running   0          3m26s
etcd-k8s-company01-master02                      1/1     Running   0          2m52s
etcd-k8s-company01-master03                      1/1     Running   0          110s
kube-apiserver-k8s-company01-master01            1/1     Running   0          3m23s
kube-apiserver-k8s-company01-master02            1/1     Running   0          2m53s
kube-apiserver-k8s-company01-master03            1/1     Running   1          111s
kube-controller-manager-k8s-company01-master01   1/1     Running   1          3m28s
kube-controller-manager-k8s-company01-master02   1/1     Running   0          2m52s
kube-controller-manager-k8s-company01-master03   1/1     Running   0          56s
kube-proxy-8wm4v                                 1/1     Running   0          4m9s
kube-proxy-vvdrl                                 1/1     Running   0          2m53s
kube-proxy-wnctx                                 1/1     Running   0          2m2s
kube-scheduler-k8s-company01-master01            1/1     Running   1          3m18s
kube-scheduler-k8s-company01-master02            1/1     Running   0          2m52s
kube-scheduler-k8s-company01-master03            1/1     Running   0          55s

# All master nodes are in Ready state
[root@k8s-company01-master01 ~]# kubectl get node
NAME                     STATUS   ROLES    AGE     VERSION
k8s-company01-master01   Ready    master   4m48s   v1.14.1
k8s-company01-master02   Ready    master   3m12s   v1.14.1
k8s-company01-master03   Ready    master   2m21s   v1.14.1

# We ran into coredns restarting in a loop; it recovered after firewalld was stopped, and kept working even after firewalld was turned back on...
  • Join the two worker nodes to the cluster (do the base preparation described earlier first: Docker, kubeadm, and so on)
# The only difference from joining a master is that the --experimental-control-plane flag is dropped
kubeadm join k8s-company01-lb:16443 --token fp0x6g.cwuzedvtwlu1zg1f \
--discovery-token-ca-cert-hash sha256:5d4095bc9e4e4b5300abe5a25afe1064f32c1ddcecc02a1f9b0aeee7710c3383

# If the join parameters were not recorded, or have already expired, see (or regenerate them as sketched earlier):
http://wiki.sky-mobi.com:8090/pages/viewpage.action?pageId=9079715

# On success it prints:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

### The kubectl commands below can be run on any master node.
[root@k8s-company01-master01 ~]# kubectl get pod -n kube-system -o wide
NAME                                             READY   STATUS             RESTARTS   AGE     IP             NODE                      NOMINATED NODE   READINESS GATES
calico-kube-controllers-749f7c8df8-knlx4         1/1     Running            1          5m2s    10.254.28.66   k8s-company01-master02    <none>           <none>
calico-kube-controllers-749f7c8df8-ndf55         1/1     Running            4          5m2s    10.254.31.67   k8s-company01-master03    <none>           <none>
calico-kube-controllers-749f7c8df8-pqxlx         1/1     Running            4          5m2s    10.254.31.66   k8s-company01-master03    <none>           <none>
calico-node-4txj7                                1/1     Running            0          5m3s    172.16.4.203   k8s-company01-master03    <none>           <none>
calico-node-7fqwh                                1/1     Running            0          68s     172.16.4.205   k8s-company01-worker002   <none>           <none>
calico-node-9t2l9                                1/1     Running            0          5m3s    172.16.4.201   k8s-company01-master01    <none>           <none>
calico-node-rkfxj                                1/1     Running            0          86s     172.16.4.204   k8s-company01-worker001   <none>           <none>
calico-node-rtxlj                                1/1     Running            0          5m3s    172.16.4.202   k8s-company01-master02    <none>           <none>
calico-typha-646cdc958c-7j948                    1/1     Running            0          5m3s    172.16.4.204   k8s-company01-worker001   <none>           <none>
coredns-56c9dc7946-944nt                         0/1     CrashLoopBackOff   4          8m51s   10.254.28.65   k8s-company01-master02    <none>           <none>
coredns-56c9dc7946-nh2sk                         0/1     CrashLoopBackOff   4          8m51s   10.254.31.65   k8s-company01-master03    <none>           <none>
etcd-k8s-company01-master01                      1/1     Running            0          8m8s    172.16.4.201   k8s-company01-master01    <none>           <none>
etcd-k8s-company01-master02                      1/1     Running            0          7m34s   172.16.4.202   k8s-company01-master02    <none>           <none>
etcd-k8s-company01-master03                      1/1     Running            0          6m32s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-apiserver-k8s-company01-master01            1/1     Running            0          8m5s    172.16.4.201   k8s-company01-master01    <none>           <none>
kube-apiserver-k8s-company01-master02            1/1     Running            0          7m35s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-apiserver-k8s-company01-master03            1/1     Running            1          6m33s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-controller-manager-k8s-company01-master01   1/1     Running            1          8m10s   172.16.4.201   k8s-company01-master01    <none>           <none>
kube-controller-manager-k8s-company01-master02   1/1     Running            0          7m34s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-controller-manager-k8s-company01-master03   1/1     Running            0          5m38s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-proxy-8wm4v                                 1/1     Running            0          8m51s   172.16.4.201   k8s-company01-master01    <none>           <none>
kube-proxy-k8rng                                 1/1     Running            0          68s     172.16.4.205   k8s-company01-worker002   <none>           <none>
kube-proxy-rqnkv                                 1/1     Running            0          86s     172.16.4.204   k8s-company01-worker001   <none>           <none>
kube-proxy-vvdrl                                 1/1     Running            0          7m35s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-proxy-wnctx                                 1/1     Running            0          6m44s   172.16.4.203   k8s-company01-master03    <none>           <none>
kube-scheduler-k8s-company01-master01            1/1     Running            1          8m      172.16.4.201   k8s-company01-master01    <none>           <none>
kube-scheduler-k8s-company01-master02            1/1     Running            0          7m34s   172.16.4.202   k8s-company01-master02    <none>           <none>
kube-scheduler-k8s-company01-master03            1/1     Running            0          5m37s   172.16.4.203   k8s-company01-master03    <none>           <none>

[root@k8s-company01-master01 ~]# kubectl get nodes
NAME                      STATUS   ROLES    AGE     VERSION
k8s-company01-master01    Ready    master   9m51s   v1.14.1
k8s-company01-master02    Ready    master   8m15s   v1.14.1
k8s-company01-master03    Ready    master   7m24s   v1.14.1
k8s-company01-worker001   Ready    <none>   2m6s    v1.14.1
k8s-company01-worker002   Ready    <none>   108s    v1.14.1

[root@k8s-company01-master01 ~]# kubectl get csr
NAME        AGE     REQUESTOR                            CONDITION
csr-94f5v   8m27s   system:bootstrap:fp0x6g              Approved,Issued
csr-g9tbg   2m19s   system:bootstrap:fp0x6g              Approved,Issued
csr-pqr6l   7m49s   system:bootstrap:fp0x6g              Approved,Issued
csr-vwtqq   2m      system:bootstrap:fp0x6g              Approved,Issued
csr-w486d   10m     system:node:k8s-company01-master01   Approved,Issued

[root@k8s-company01-master01 ~]# kubectl get componentstatuses
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
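
Since kubeadm-config.yaml switched kube-proxy to IPVS mode, a quick check (a minimal verification sketch) is to look at the kube-proxy ConfigMap and at the IPVS tables on any node:

# should print "mode: ipvs"
kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode:"
# the cluster service VIPs should show up as IPVS virtual servers
ipvsadm -Ln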
  • Install metrics-server for basic monitoring, e.g. the kubectl top nodes command
# Without it installed:
[root@k8s-master03 ~]# kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

Here we install it with helm.
Install helm (run on master01):

wget http://192.168.160.200/yum/scripts/k8s/helm-v2.13.1-linux-amd64.tar.gz
or: wget http://111.1.17.135/yum/scripts/k8s/helm-v2.13.1-linux-amd64.tar.gz
tar xvzf helm-v2.13.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
# Verify
helm help

Run on every node:
yum install -y socat

Use the Microsoft (Azure China) mirror (the Aliyun mirror has not been updated for a long time!):
# helm init --client-only --stable-repo-url https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts/
# helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm init --client-only --stable-repo-url http://mirror.azure.cn/kubernetes/charts/
helm repo add incubator http://mirror.azure.cn/kubernetes/charts-incubator/
helm repo update

# Install the Tiller service into Kubernetes. The official image cannot be pulled for the usual reasons, so use -i to specify your own image; an alternative is registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.9.1 (Aliyun). The image version should match the helm client version, which you can check with helm version.

helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 --tiller-tls-cert /etc/kubernetes/ssl/tiller001.pem --tiller-tls-key /etc/kubernetes/ssl/tiller001-key.pem --tls-ca-cert /etc/kubernetes/ssl/ca.pem --tiller-namespace kube-system --stable-repo-url http://mirror.azure.cn/kubernetes/charts/ --history-max 200

Grant Tiller permissions (run on master01)

# Helm's server side, Tiller, is a Deployment in the kube-system Namespace; it talks to the kube-apiserver to create and delete applications inside Kubernetes.

# Since Kubernetes 1.6 the API server has RBAC enabled. The default Tiller deployment does not define an authorized ServiceAccount, so its requests to the API server get rejected; we therefore have to grant it one explicitly.

# Create the Kubernetes service account and bind the cluster role

kubectl create serviceaccount --namespace kube-system tiller

kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

# Check that the authorization worked
[root@k8s-company01-master01 ~]# kubectl -n kube-system get pods|grep tiller
tiller-deploy-7bf47568d4-42wf5                   1/1     Running   0          17s

[root@k8s-company01-master01 ~]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}

[root@k8s-company01-master01 ~]# helm repo list
NAME            URL
stable          http://mirror.azure.cn/kubernetes/charts/
local           http://127.0.0.1:8879/charts
incubator       http://mirror.azure.cn/kubernetes/charts-incubator/

## To switch repositories, remove the old one first
#helm repo remove stable
## then add the new repository addresses
#helm repo add stable http://mirror.azure.cn/kubernetes/charts/
#helm repo add incubator http://mirror.azure.cn/kubernetes/charts-incubator/
#helm repo update

Install metrics-server with helm (run on master01, since only master01 has helm installed)

# Create metrics-server-custom.yaml
cat >> metrics-server-custom.yaml <<EOF
image:
  repository: reg01.sky-mobi.com/k8s/gcr.io/google_containers/metrics-server-amd64
  tag: v0.3.1
args:
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
EOF

# Install metrics-server (here -n is the release name)
[root@k8s-master01 ~]# helm install stable/metrics-server -n metrics-server --namespace kube-system --version=2.5.1 -f metrics-server-custom.yaml

[root@k8s-company01-master01 ~]# kubectl get pod -n kube-system  | grep metrics
metrics-server-dcbdb9468-c5f4n                   1/1     Running   0          21s

# After saving the yaml file and exiting, the old metrics-server pod is torn down and a new one comes up automatically. Once the new pod is up, wait a minute or two and kubectl top will return data:
[root@k8s-company01-master01 ~]# kubectl top node
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-company01-master01    404m         5%     1276Mi          4%
k8s-company01-master02    493m         6%     1240Mi          3%
k8s-company01-master03    516m         6%     1224Mi          3%
k8s-company01-worker001   466m         0%     601Mi           0%
k8s-company01-worker002   244m         0%     516Mi           0%
  • Install prometheus-operator with helm
# For easier management, create a dedicated Namespace called monitoring; all Prometheus Operator components will be deployed into it.

kubectl create namespace monitoring

## Customize the prometheus-operator values
# helm fetch stable/prometheus-operator --version=5.0.3  --untar
# cat prometheus-operator/values.yaml  | grep -v '#' | grep -v ^$ > prometheus-operator-custom.yaml
# Keep only the sections whose images we override, plus the https settings for scraping etcd, for example:
Reference: https://fengxsong.github.io/2018/05/30/Using-helm-to-manage-prometheus-operator/

cat >> prometheus-operator-custom.yaml << EOF
## prometheus-operator/values.yaml
alertmanager:
  service:
    nodePort: 30503
    type: NodePort
  alertmanagerSpec:
    image:
      repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/alertmanager
      tag: v0.16.1
prometheusOperator:
  image:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/prometheus-operator
    tag: v0.29.0
    pullPolicy: IfNotPresent
  configmapReloadImage:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/configmap-reload
    tag: v0.0.1
  prometheusConfigReloaderImage:
    repository: reg01.sky-mobi.com/k8s/quay.io/coreos/prometheus-config-reloader
    tag: v0.29.0
  hyperkubeImage:
    repository: reg01.sky-mobi.com/k8s/k8s.gcr.io/hyperkube
    tag: v1.12.1
    pullPolicy: IfNotPresent
prometheus:
  service:
    nodePort: 30504
    type: NodePort
  prometheusSpec:
    image:
      repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/prometheus
      tag: v2.7.1
    secrets: [etcd-client-cert]
kubeEtcd:
  serviceMonitor:
    scheme: https
    insecureSkipVerify: false
    serverName: ""
    caFile: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-client-cert/healthcheck-client.key

## prometheus-operator/charts/grafana/values.yaml
grafana:
  service:
    nodePort: 30505
    type: NodePort
  image:
    repository: reg01.sky-mobi.com/k8s/grafana/grafana
    tag: 6.0.2
  sidecar:
    image: reg01.sky-mobi.com/k8s/kiwigrid/k8s-sidecar:0.0.13

## prometheus-operator/charts/kube-state-metrics/values.yaml
kube-state-metrics:
  image:
    repository: reg01.sky-mobi.com/k8s/k8s.gcr.io/kube-state-metrics
    tag: v1.5.0

## prometheus-operator/charts/prometheus-node-exporter/values.yaml
prometheus-node-exporter:
  image:
    repository: reg01.sky-mobi.com/k8s/quay.io/prometheus/node-exporter
    tag: v0.17.0
EOF

## Note: the prometheus-operator/charts/grafana/values.yaml settings above are nested under a grafana: key (keys are nested following the charts directory layout:)
#[root@k8s-master01 ~]#  ll prometheus-operator/charts/
#total 0
#drwxr-xr-x 4 root root 114 Apr  1 00:48 grafana
#drwxr-xr-x 3 root root  96 Apr  1 00:18 kube-state-metrics
#drwxr-xr-x 3 root root 110 Apr  1 00:20 prometheus-node-exporter

# Create the secret holding the client certificates for connecting to etcd:
kubectl -n monitoring create secret generic etcd-client-cert --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key

helm install stable/prometheus-operator --version=5.0.3 --name=monitoring --namespace=monitoring -f prometheus-operator-custom.yaml

## To delete everything and start over, remove the release by its name, monitoring, with helm
#helm del --purge monitoring
#kubectl delete crd prometheusrules.monitoring.coreos.com
#kubectl delete crd servicemonitors.monitoring.coreos.com
#kubectl delete crd alertmanagers.monitoring.coreos.com

To re-apply after changes, do not delete the previous release first (reinstalling on top may error out); just use upgrade:
helm upgrade monitoring stable/prometheus-operator --version=5.0.3  --namespace=monitoring -f prometheus-operator-custom.yaml

[root@k8s-company01-master01 ~]# kubectl -n monitoring get pod
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-monitoring-prometheus-oper-alertmanager-0   2/2     Running   0          29m
monitoring-grafana-7dd5cf9dd7-wx8mz                      2/2     Running   0          29m
monitoring-kube-state-metrics-7d98487cfc-t6qqw           1/1     Running   0          29m
monitoring-prometheus-node-exporter-fnvp9                1/1     Running   0          29m
monitoring-prometheus-node-exporter-kczcq                1/1     Running   0          29m
monitoring-prometheus-node-exporter-m8kf6                1/1     Running   0          29m
monitoring-prometheus-node-exporter-mwc4g                1/1     Running   0          29m
monitoring-prometheus-node-exporter-wxmt8                1/1     Running   0          29m
monitoring-prometheus-oper-operator-7f96b488f6-2j7h5     1/1     Running   0          29m
prometheus-monitoring-prometheus-oper-prometheus-0       3/3     Running   1          28m

[root@k8s-company01-master01 ~]# kubectl get svc -n monitoring
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,6783/TCP   31m
monitoring-grafana                        NodePort    10.109.159.105   <none>        80:30579/TCP        32m
monitoring-kube-state-metrics             ClusterIP   10.100.31.235    <none>        8080/TCP            32m
monitoring-prometheus-node-exporter       ClusterIP   10.109.119.13    <none>        9100/TCP            32m
monitoring-prometheus-oper-alertmanager   NodePort    10.105.171.135   <none>        9093:31309/TCP      32m
monitoring-prometheus-oper-operator       ClusterIP   10.98.135.170    <none>        8080/TCP            32m
monitoring-prometheus-oper-prometheus     NodePort    10.96.15.36      <none>        9090:32489/TCP      32m
prometheus-operated                       ClusterIP   None             <none>        9090/TCP            31m

# Check whether there are any unexpected alerts; the first alert, Watchdog, is normal and exists to verify that the alerting pipeline works.
http://172.16.4.200:32489/alerts
http://172.16.4.200:32489/targets
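
The same checks can be scripted against the Prometheus HTTP API with curl and jq (installed earlier), assuming the NodePort 32489 shown in the service listing above:

# list the names of current alerts (only Watchdog is expected)
curl -s http://172.16.4.200:32489/api/v1/alerts | jq -r '.data.alerts[].labels.alertname'
# list scrape targets and their health
curl -s http://172.16.4.200:32489/api/v1/targets | jq -r '.data.activeTargets[] | "\(.labels.job): \(.health)"'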

# The following would install kubernetes-dashboard; it is of limited use, so we are not installing it in production for now
#helm install --name=kubernetes-dashboard stable/kubernetes-dashboard --version=1.4.0 --namespace=kube-system --set image.repository=reg01.sky-mobi.com/k8s/k8s.gcr.io/kubernetes-dashboard-amd64,image.tag=v1.10.1,rbac.clusterAdminRole=true

# Heapster was removed in Kubernetes 1.13 (https://github.com/kubernetes/heapster/blob/master/docs/deprecation.md); metrics-server and Prometheus are recommended instead.