
Deploying the Prometheus Operator monitoring system on Kubernetes

2019-03-14 13:48 · 936 views

 

Copyright notice: this is an original post; please credit the source when reposting. https://blog.csdn.net/networken/article/details/85620793

Introduction to Prometheus Operator

What each component does:
1. MetricsServer: an aggregator of cluster resource-usage data, serving consumers inside Kubernetes such as kubectl top, the HPA, and the scheduler.
2. Prometheus Operator: deploys and manages the Prometheus-based monitoring components on Kubernetes, driven declaratively by custom resources.
3. NodeExporter: exposes key metrics about the state of each node.
4. KubeStateMetrics: exposes metrics about the state of Kubernetes objects (Deployments, Pods, and so on), which alerting rules can be written against.
5. Prometheus: scrapes (pulls) metrics over HTTP from the apiserver, scheduler, controller-manager, kubelet, and other targets, and stores the time series.
6. Grafana: a platform for visualizing and exploring metrics.
7. Alertmanager: handles alert notification, e.g. by SMS or email.
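
The Operator works by watching custom resources such as ServiceMonitor, Prometheus, and Alertmanager and reconciling the running deployment to match. As a rough illustration, a minimal ServiceMonitor telling Prometheus which Services to scrape might look like this (all names below are hypothetical, not taken from the kube-prometheus manifests):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app          # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app       # scrape Services carrying this label
  endpoints:
  - port: web                # a named port on the Service
    interval: 30s
```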

Preparing the environment

A running Kubernetes cluster is required.
Prometheus Operator on GitHub:
https://github.com/coreos/prometheus-operator

All of the Prometheus Operator YAML manifests live at:
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests

Clone the prometheus-operator repository:

```
git clone https://github.com/coreos/prometheus-operator.git
```

Copy the manifests to a working directory:

```
cp -R prometheus-operator/contrib/kube-prometheus/manifests/ $HOME && cd $HOME/manifests
```

Apply all the manifests in one shot:

```
[centos@k8s-master manifests]$ kubectl apply -f .
```
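
A first `kubectl apply -f .` can fail for some resources with errors like `no matches for kind "Prometheus"`, because the custom resources are submitted before the CRDs that define them have registered; re-running the apply resolves it. A small retry helper (a sketch, not part of the upstream manifests) can automate the wait:

```shell
# Retry helper (illustrative): re-run a command until it succeeds,
# giving up after a fixed number of attempts.
wait_for() {
  tries="$1"; shift
  i=1
  while ! "$@" >/dev/null 2>&1; do
    [ "$i" -ge "$tries" ] && return 1
    i=$((i + 1))
    sleep 2
  done
}

# Example: wait for one of the Operator's CRDs, then re-apply:
#   wait_for 30 kubectl get crd prometheuses.monitoring.coreos.com && kubectl apply -f .
```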

Check the status of all the pods. Some pods may fail to start because their images could not be pulled:

```
[centos@k8s-master manifests]$ kubectl get all -n monitoring  -o wide
NAME                                       READY   STATUS              RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
pod/grafana-6689854d5-xtj6c                1/1     Running             0          25m   10.244.1.219    k8s-node1    <none>           <none>
pod/kube-state-metrics-86bc74fd4c-9pzj7    0/4     ContainerCreating   0          25m   <none>          k8s-node1    <none>           <none>
pod/node-exporter-5992x                    0/2     ErrImagePull        0          24m   192.168.92.56   k8s-master   <none>           <none>
pod/node-exporter-9mnpg                    0/2     ErrImagePull        0          24m   192.168.92.58   k8s-node2    <none>           <none>
pod/node-exporter-xzgsv                    0/2     ContainerCreating   0          24m   192.168.92.57   k8s-node1    <none>           <none>
pod/prometheus-adapter-5cc8b5d556-n9nvw    0/1     ContainerCreating   0          25m   <none>          k8s-node1    <none>           <none>
pod/prometheus-operator-5cfb7f4c54-bzc29   0/1     ContainerCreating   0          25m   <none>          k8s-node1    <none>           <none>
```

Identify which images failed to pull:

```
[centos@k8s-master manifests]$  kubectl describe pod node-exporter-5992x -n monitoring
......
Events:
Type     Reason     Age                From                 Message
----     ------     ----               ----                 -------
Normal   Scheduled  30m                default-scheduler    Successfully assigned monitoring/node-exporter-5992x to k8s-master
Warning  Failed     24m                kubelet, k8s-master  Failed to pull image "quay.io/prometheus/node-exporter:v0.16.0": rpc error: code = Unknown desc = context canceled
Warning  Failed     24m                kubelet, k8s-master  Error: ErrImagePull
Normal   Pulling    24m                kubelet, k8s-master  pulling image "quay.io/coreos/kube-rbac-proxy:v0.4.0"
Normal   Pulling    19m (x2 over 30m)  kubelet, k8s-master  pulling image "quay.io/prometheus/node-exporter:v0.16.0"
Warning  Failed     19m                kubelet, k8s-master  Failed to pull image "quay.io/coreos/kube-rbac-proxy:v0.4.0": rpc error: code = Unknown desc = net/http: TLS handshake timeout
Warning  Failed     19m                kubelet, k8s-master  Error: ErrImagePull
```

The Events show that two images failed to pull on the k8s-master node:
quay.io/coreos/kube-rbac-proxy:v0.4.0
quay.io/prometheus/node-exporter:v0.16.0

Pulling the images from Aliyun
Log in to the k8s-master node and pull the images by hand, or find equivalents on the Aliyun or Docker Hub registries, pull those, and re-tag them to the original names.

Images that need to be pulled:

```
#node-exporter-daemonset.yaml
quay.io/prometheus/node-exporter:v0.16.0
quay.io/coreos/kube-rbac-proxy:v0.4.0

#kube-state-metrics-deployment.yaml
quay.io/coreos/kube-state-metrics:v1.4.0
quay.io/coreos/addon-resizer:1.0

#0prometheus-operator-deployment.yaml
quay.io/coreos/configmap-reload:v0.0.1
quay.io/coreos/prometheus-config-reloader:v0.26.0
quay.io/coreos/prometheus-operator:v0.26.0

#alertmanager-alertmanager.yaml
quay.io/prometheus/alertmanager:v0.15.3

#prometheus-adapter-deployment.yaml
quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1

#prometheus-prometheus.yaml
quay.io/prometheus/prometheus:v2.5.0

#grafana-deployment.yaml
grafana/grafana:5.2.4
```

The images above have already been pushed to an Aliyun registry. Save the image list as imagepath.txt in the $HOME directory; for other versions, search for mirrored images yourself.

```
cat $HOME/imagepath.txt
quay.io/prometheus/node-exporter:v0.16.0
quay.io/coreos/kube-rbac-proxy:v0.4.0
......
```

Run the following script on every node to pull all the images listed in imagepath.txt:

```
wget -O- https://raw.githubusercontent.com/zhwill/LinuxShell/master/pull-aliyun-images.sh | sh
```
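
The gist of such a script is to pull each image from the mirror and re-tag it back to its original quay.io name so the manifests can find it locally. A sketch of that logic (the mirror namespace REGISTRY below is hypothetical; substitute the one you pushed to):

```shell
# Hypothetical mirror namespace -- replace with your own Aliyun registry path.
REGISTRY="registry.cn-hangzhou.aliyuncs.com/example"

# Map an original image name to its mirror location, e.g.
# quay.io/prometheus/node-exporter:v0.16.0 -> $REGISTRY/node-exporter:v0.16.0
mirror_of() {
  echo "${REGISTRY}/${1##*/}"
}

# Dry run: print the pull/re-tag commands for every image in the list.
# Drop the leading "echo" to actually execute them on each node.
if [ -f "$HOME/imagepath.txt" ]; then
  while read -r image; do
    [ -z "$image" ] && continue
    echo docker pull "$(mirror_of "$image")"
    echo docker tag "$(mirror_of "$image")" "$image"
  done < "$HOME/imagepath.txt"
fi
```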

Check the pod status again:

```
[centos@k8s-master manifests]$ kubectl get pod -n monitoring -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   2          19h   10.244.1.227    k8s-node1    <none>           <none>
alertmanager-main-1                    2/2     Running   2          18h   10.244.2.200    k8s-node2    <none>           <none>
alertmanager-main-2                    2/2     Running   2          17h   10.244.1.230    k8s-node1    <none>           <none>
grafana-6689854d5-xtj6c                1/1     Running   1          19h   10.244.1.233    k8s-node1    <none>           <none>
kube-state-metrics-75fd9687fc-dmmlw    4/4     Running   4          19h   10.244.2.205    k8s-node2    <none>           <none>
node-exporter-5992x                    2/2     Running   2          19h   192.168.92.56   k8s-master   <none>           <none>
node-exporter-9mnpg                    2/2     Running   2          19h   192.168.92.58   k8s-node2    <none>           <none>
node-exporter-xzgsv                    2/2     Running   2          19h   192.168.92.57   k8s-node1    <none>           <none>
prometheus-adapter-5cc8b5d556-n9nvw    1/1     Running   1          19h   10.244.1.234    k8s-node1    <none>           <none>
prometheus-k8s-0                       3/3     Running   11         19h   10.244.1.235    k8s-node1    <none>           <none>
prometheus-k8s-1                       3/3     Running   5          17h   10.244.2.207    k8s-node2    <none>           <none>
prometheus-operator-5cfb7f4c54-bzc29   1/1     Running   1          19h   10.244.1.229    k8s-node1    <none>           <none>
[centos@k8s-master manifests]$
```

Once every pod's status is Running, the deployment has succeeded.
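
For scripting this check, a small helper (a sketch) can scan the STATUS column of the `kubectl get pod` output:

```shell
# Succeeds only when every pod in the piped `kubectl get pod` output
# has STATUS == Running (the third column); the header row is skipped.
all_running() {
  ! tail -n +2 | awk '{print $3}' | grep -qv '^Running$'
}

# Usage:
#   kubectl get pod -n monitoring | all_running && echo "all pods Running"
```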

Configuring NodePort access

Edit grafana-service.yaml so that Grafana is reachable via a NodePort:

```
[centos@k8s-master manifests]$ vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort      # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100   # added
  selector:
    app: grafana
```

Edit prometheus-service.yaml in the same way, changing the type to NodePort:

```
[centos@k8s-master manifests]$ vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30200
  selector:
    app: prometheus
    prometheus: k8s
```

Edit alertmanager-service.yaml likewise, changing the type to NodePort:

```
[centos@k8s-master manifests]$ vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30300
  selector:
    alertmanager: main
    app: alertmanager
```
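
Instead of editing the three files, the same change can be made in place with `kubectl patch` (a sketch; kubectl's default strategic merge matches Service ports by their `port` value, so only the fields being changed need to appear). The commands below are printed as a dry run; remove the leading `echo` to apply them:

```shell
# Build the JSON patch that switches a Service to NodePort and pins
# the nodePort for the port with the given number.
nodeport_patch() {
  printf '{"spec":{"type":"NodePort","ports":[{"port":%s,"nodePort":%s}]}}' "$1" "$2"
}

echo kubectl -n monitoring patch svc grafana -p "$(nodeport_patch 3000 30100)"
echo kubectl -n monitoring patch svc prometheus-k8s -p "$(nodeport_patch 9090 30200)"
echo kubectl -n monitoring patch svc alertmanager-main -p "$(nodeport_patch 9093 30300)"
```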

Accessing Prometheus

Prometheus is exposed on NodePort 30200; open http://192.168.92.56:30200 in a browser.

Visiting http://192.168.92.56:30200/targets shows that Prometheus has successfully connected to the Kubernetes apiserver.

The Service Discovery page lists the targets Prometheus has discovered.

Prometheus's own metrics can also be browsed from the web UI.

The Prometheus web UI supports ad-hoc queries; for example, the per-pod CPU usage across the cluster:
`sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )`

If this query returns data, metrics are being written into Prometheus correctly. Next we can set up Grafana for a friendlier web UI over the same data.
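
A similar query (a sketch using the cAdvisor metric names of this Kubernetes/Prometheus era) gives per-pod memory usage:

```
sum by (pod_name)( container_memory_usage_bytes{image!="", pod_name!=""} )
```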

Accessing Grafana

Check which port the Grafana service is exposed on:

```
[centos@k8s-master ~]$ kubectl get service -n monitoring  | grep grafana
grafana                 NodePort    10.107.56.143    <none>        3000:30100/TCP      20h
[centos@k8s-master ~]$
```

As shown above, Grafana is exposed on port 30100; open http://192.168.92.56:30100 in a browser.
The default credentials are admin/admin.

Change the password and log in.

Adding a data source
In this stack Grafana already has the Prometheus data source configured by default. Grafana supports many time-series data sources, each with its own query editor.

The data-source settings page shows the connection parameters for the Prometheus source; the officially supported sources include Prometheus, Graphite, InfluxDB, Elasticsearch, and others.

Importing dashboards:
To import a dashboard, either enter template ID 315 to import it online, or download the corresponding JSON template and import it from a local file. Dashboard templates:
https://grafana.com/dashboards/315
https://grafana.com/dashboards/8919

Once a dashboard is imported, the corresponding monitoring data appears. Click Home to switch between views; Grafana also ships with a set of predefined dashboards.

Viewing cluster monitoring information
The other dashboard template covers:

 - the overall health of the Kubernetes cluster
 - cluster-wide resource usage
 - the status of the Kubernetes control-plane components
 - per-node resource usage
 - the status of Deployments
 - the status of Pods

Together these dashboards show the health of everything from the cluster down to individual Pods, helping operators run Kubernetes more effectively.
