Deploying the Prometheus Operator Monitoring Stack on Kubernetes
Introduction to Prometheus Operator
What each component does:
1. metrics-server: aggregates resource-usage data for the cluster and serves it to in-cluster consumers such as kubectl top, the HPA, and the scheduler.
2. Prometheus Operator: manages Prometheus and Alertmanager instances on Kubernetes through custom resources; it deploys and configures the monitoring stack rather than storing monitoring data itself.
3. node-exporter: exposes key metrics about the state of each node.
4. kube-state-metrics: generates metrics from the state of Kubernetes objects (Deployments, Pods, and so on) inside the cluster.
5. Prometheus: collects metrics over HTTP in pull mode from the apiserver, scheduler, controller-manager, kubelet, and other components.
6. Grafana: a platform for visualizing and monitoring data.
7. Alertmanager: handles alerts and sends notifications, for example by email or SMS.
Preparing the Deployment Environment
A working Kubernetes cluster is assumed.
The Prometheus Operator GitHub repository:
https://github.com/coreos/prometheus-operator
The directory containing all of the Prometheus Operator YAML manifests:
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests
Clone the prometheus-operator repository locally:
git clone https://github.com/coreos/prometheus-operator.git
Copy the YAML manifests to a working directory:
cp -R prometheus-operator/contrib/kube-prometheus/manifests/ $HOME && cd $HOME/manifests
Deploy all of the YAML manifests in one step:
[code][centos@k8s-master manifests]$ kubectl apply -f .
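Note that the manifests include CustomResourceDefinitions (Prometheus, ServiceMonitor, Alertmanager, and so on), so the first apply can fail with "no matches for kind ..." errors if the CRDs have not finished registering. Simply re-running the apply usually resolves this; a minimal retry sketch:
[code]# Re-apply until every manifest is accepted; harmless if the first pass succeeded.
until kubectl apply -f .; do
  echo "waiting for CRDs to register, retrying..."
  sleep 5
done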
Check the status of all pods.
Some pods may fail to start because their images cannot be pulled:
[code][centos@k8s-master manifests]$ kubectl get all -n monitoring -o wide
NAME                                       READY   STATUS              RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
pod/grafana-6689854d5-xtj6c                1/1     Running             0          25m   10.244.1.219    k8s-node1    <none>           <none>
pod/kube-state-metrics-86bc74fd4c-9pzj7    0/4     ContainerCreating   0          25m   <none>          k8s-node1    <none>           <none>
pod/node-exporter-5992x                    0/2     ErrImagePull        0          24m   192.168.92.56   k8s-master   <none>           <none>
pod/node-exporter-9mnpg                    0/2     ErrImagePull        0          24m   192.168.92.58   k8s-node2    <none>           <none>
pod/node-exporter-xzgsv                    0/2     ContainerCreating   0          24m   192.168.92.57   k8s-node1    <none>           <none>
pod/prometheus-adapter-5cc8b5d556-n9nvw    0/1     ContainerCreating   0          25m   <none>          k8s-node1    <none>           <none>
pod/prometheus-operator-5cfb7f4c54-bzc29   0/1     ContainerCreating   0          25m   <none>          k8s-node1    <none>           <none>
Identify which images failed to pull:
[code][centos@k8s-master manifests]$ kubectl describe pod node-exporter-5992x -n monitoring
......
Events:
  Type     Reason     Age                From                 Message
  ----     ------     ----               ----                 -------
  Normal   Scheduled  30m                default-scheduler    Successfully assigned monitoring/node-exporter-5992x to k8s-master
  Warning  Failed     24m                kubelet, k8s-master  Failed to pull image "quay.io/prometheus/node-exporter:v0.16.0": rpc error: code = Unknown desc = context canceled
  Warning  Failed     24m                kubelet, k8s-master  Error: ErrImagePull
  Normal   Pulling    24m                kubelet, k8s-master  pulling image "quay.io/coreos/kube-rbac-proxy:v0.4.0"
  Normal   Pulling    19m (x2 over 30m)  kubelet, k8s-master  pulling image "quay.io/prometheus/node-exporter:v0.16.0"
  Warning  Failed     19m                kubelet, k8s-master  Failed to pull image "quay.io/coreos/kube-rbac-proxy:v0.4.0": rpc error: code = Unknown desc = net/http: TLS handshake timeout
  Warning  Failed     19m                kubelet, k8s-master  Error: ErrImagePull
The Events section shows that two images failed to pull on the k8s-master node:
quay.io/coreos/kube-rbac-proxy:v0.4.0
quay.io/prometheus/node-exporter:v0.16.0
Pulling the Images from Alibaba Cloud
Log in to the k8s-master node and pull the images manually, or find mirrors of them on Alibaba Cloud or Docker Hub, pull those, and retag them to the original names.
The full list of images to pull:
[code]#node-exporter-daemonset.yaml
quay.io/prometheus/node-exporter:v0.16.0
quay.io/coreos/kube-rbac-proxy:v0.4.0
#kube-state-metrics-deployment.yaml
quay.io/coreos/kube-state-metrics:v1.4.0
quay.io/coreos/addon-resizer:1.0
#0prometheus-operator-deployment.yaml
quay.io/coreos/configmap-reload:v0.0.1
quay.io/coreos/prometheus-config-reloader:v0.26.0
quay.io/coreos/prometheus-operator:v0.26.0
#alertmanager-alertmanager.yaml
quay.io/prometheus/alertmanager:v0.15.3
#prometheus-adapter-deployment.yaml
quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1
#prometheus-prometheus.yaml
quay.io/prometheus/prometheus:v2.5.0
#grafana-deployment.yaml
grafana/grafana:5.2.4
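If you prefer to generate this list yourself rather than copy it, you can grep the manifests for image references; a quick sketch (note that the Prometheus and Alertmanager custom resources split theirs into baseImage and version fields, so assemble those tags by hand):
[code]# List unique image references found in the manifest directory.
grep -rhE 'image:|baseImage:' . | awk '{print $2}' | sort -u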
The images above have already been pushed to an Alibaba Cloud repository. Save the image list as imagepath.txt in the $HOME directory; for other image versions, search for suitable mirrors yourself.
[code]cat $HOME/imagepath.txt
quay.io/prometheus/node-exporter:v0.16.0
quay.io/coreos/kube-rbac-proxy:v0.4.0
......
Run the following script to pull every image listed in imagepath.txt onto all of the nodes:
[code]wget -O- https://raw.githubusercontent.com/zhwill/LinuxShell/master/pull-aliyun-images.sh | sh
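The script itself is not reproduced here, but the idea is a pull-retag-cleanup loop over the list. A minimal sketch, assuming the images are mirrored under a hypothetical Alibaba Cloud namespace (run it on every node, since each kubelet pulls images locally):
[code]#!/bin/bash
# MIRROR is a placeholder namespace; substitute the real mirror repository.
MIRROR=registry.cn-hangzhou.aliyuncs.com/example
while read -r image; do
  name=${image##*/}                      # e.g. node-exporter:v0.16.0
  docker pull "$MIRROR/$name"            # pull from the mirror
  docker tag  "$MIRROR/$name" "$image"   # retag to the original name
  docker rmi  "$MIRROR/$name"            # drop the mirror tag
done < "$HOME/imagepath.txt"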
Check the pod status again:
[code][centos@k8s-master manifests]$ kubectl get pod -n monitoring -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   2          19h   10.244.1.227    k8s-node1    <none>           <none>
alertmanager-main-1                    2/2     Running   2          18h   10.244.2.200    k8s-node2    <none>           <none>
alertmanager-main-2                    2/2     Running   2          17h   10.244.1.230    k8s-node1    <none>           <none>
grafana-6689854d5-xtj6c                1/1     Running   1          19h   10.244.1.233    k8s-node1    <none>           <none>
kube-state-metrics-75fd9687fc-dmmlw    4/4     Running   4          19h   10.244.2.205    k8s-node2    <none>           <none>
node-exporter-5992x                    2/2     Running   2          19h   192.168.92.56   k8s-master   <none>           <none>
node-exporter-9mnpg                    2/2     Running   2          19h   192.168.92.58   k8s-node2    <none>           <none>
node-exporter-xzgsv                    2/2     Running   2          19h   192.168.92.57   k8s-node1    <none>           <none>
prometheus-adapter-5cc8b5d556-n9nvw    1/1     Running   1          19h   10.244.1.234    k8s-node1    <none>           <none>
prometheus-k8s-0                       3/3     Running   11         19h   10.244.1.235    k8s-node1    <none>           <none>
prometheus-k8s-1                       3/3     Running   5          17h   10.244.2.207    k8s-node2    <none>           <none>
prometheus-operator-5cfb7f4c54-bzc29   1/1     Running   1          19h   10.244.1.229    k8s-node1    <none>           <none>
Once every pod reports Running, the deployment has succeeded.
Configuring NodePort Services
Edit grafana-service.yaml to expose Grafana through a NodePort:
[code][centos@k8s-master manifests]$ vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort      # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100   # added
  selector:
    app: grafana
Edit prometheus-service.yaml, switching it to NodePort as well:
[code][centos@k8s-master manifests]$ vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30200
  selector:
    app: prometheus
    prometheus: k8s
Edit alertmanager-service.yaml, switching it to NodePort as well:
[code][centos@k8s-master manifests]$ vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30300
  selector:
    alertmanager: main
    app: alertmanager
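After editing the three files, re-apply them so the Services actually switch to NodePort, and verify the assigned ports:
[code]kubectl apply -f grafana-service.yaml \
              -f prometheus-service.yaml \
              -f alertmanager-service.yaml
# The PORT(S) column should show 3000:30100, 9090:30200 and 9093:30300.
kubectl get svc -n monitoring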
Accessing Prometheus
Prometheus is exposed on NodePort 30200; browse to http://192.168.115.5:30200.
Visiting http://192.168.115.5:30200/targets shows that Prometheus has successfully connected to the Kubernetes apiserver.
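The same check can be scripted against the Prometheus HTTP API (a quick sketch using the NodePort address above; jq is assumed to be installed):
[code]# Count scrape targets by health; all of them should eventually report "up".
curl -s http://192.168.115.5:30200/api/v1/targets \
  | jq -r '.data.activeTargets[].health' | sort | uniq -c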
The service-discovery page lists the discovered scrape targets.
Prometheus also exposes metrics about itself.
The Prometheus web UI supports basic queries; for example, the CPU usage of every pod in the cluster can be queried with:
sum by (pod_name) (rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m]))
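The same expression can also be issued through the Prometheus HTTP query API, which is convenient for scripting (again using the NodePort address from above):
[code]curl -sG http://192.168.115.5:30200/api/v1/query \
  --data-urlencode 'query=sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m]))'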
If the query returns data, metric collection is working correctly (these particular series come from the kubelet's cAdvisor endpoint rather than node-exporter). Next we can turn to Grafana, which provides a friendlier web UI over the same data.
Accessing Grafana
Check the NodePort exposed by the Grafana service:
[code][centos@k8s-master ~]$ kubectl get service -n monitoring | grep grafana
grafana   NodePort   10.107.56.143   <none>   3000:30100/TCP   20h
As shown, Grafana is exposed on port 30100; browse to http://192.168.92.56:30100.
The default username and password are admin/admin.
Change the password and log in.
Adding a Data Source
In this deployment Grafana comes with the Prometheus data source already provisioned. Grafana supports a range of time-series data sources (officially including Prometheus, Graphite, InfluxDB, Elasticsearch, OpenTSDB, CloudWatch, MySQL, and PostgreSQL), each with its own query editor. For the Prometheus data source, the relevant parameters are the URL of the in-cluster service (here http://prometheus-k8s.monitoring.svc:9090) and the proxy access mode.
Importing Dashboards
To import a dashboard, either enter a template ID such as 315 to import it online, or download the corresponding JSON template and import it from a local file. Template pages:
https://grafana.com/dashboards/315
https://grafana.com/dashboards/8919
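For offline import, the JSON for a template can be fetched directly from grafana.com (the revisions/latest/download path is the standard download endpoint; adjust if it changes):
[code]curl -sL https://grafana.com/api/dashboards/315/revisions/latest/download -o dashboard-315.json
curl -sL https://grafana.com/api/dashboards/8919/revisions/latest/download -o dashboard-8919.json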
After importing a template, the corresponding monitoring data appears. Click HOME to switch between dashboards; Grafana also ships with a series of predefined dashboards:
Viewing Cluster Monitoring Information
Another dashboard template
The second template linked above (8919) can monitor:
- the overall health of the Kubernetes cluster
- cluster-wide resource usage
- the status of the Kubernetes control-plane components
- per-node resource usage
- the status of Deployments
- the status of Pods
These dashboards cover everything from the cluster down to individual Pods, and help users operate Kubernetes more effectively.