Kube State Metrics
Overview
K8s Application Monitoring
Pod in CrashLoopBackOff state
- Alert rule
alert: KubePodStatusCrashLooping
expr: |
  max_over_time(kube_pod_container_status_waiting_reason{appid!~"uptime",reason="CrashLoopBackOff"}[5m]) >= 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a pod in the k8s cluster is in CrashLoopBackOff
- Principle analysis
pass
Image pull taking too long during pod deployment
- Alert rule
alert: KubePodStatusImagePullBackOff
expr: |
  max_over_time(kube_pod_container_status_waiting_reason{appid!~"uptime",reason="ImagePullBackOff"}[5m]) >= 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a pod in the k8s cluster has not successfully pulled its image
A companion rule, KubePodStatusErrImagePull, presumably pairs with this one: the same expression with reason="ErrImagePull".
- Principle analysis
pass
Pod in a state other than Running or Succeeded
- Alert rule
alert: KubePodStatusNotRunning
expr: kube_pod_status_phase{appid!~"uptime",phase!~"Running|Succeeded"} == 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a pod in the k8s cluster is in a phase other than Running or Succeeded
Deployment generation does not match the expected value
- Alert rule
alert: KubeDeploymentGenerationMismatch
expr: |
  kube_deployment_status_observed_generation{appid!~"uptime"} != kube_deployment_metadata_generation{appid!~"uptime"}
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a deployment's observed generation does not match its metadata generation
- Rule analysis
Every change to a Deployment increments metadata.generation by 1; the controller then syncs until status.observedGeneration equals metadata.generation. Under normal operation observedGeneration can therefore only be less than or equal to generation, so if the two stay unequal for an extended period, someone needs to step in and investigate. The relevant source lives mainly at:
pkg/controller/deployment/sync.go
The code also suggests a scenario where observedGeneration ends up greater than generation; the exact conditions are still unclear and remain to be tested (TODO).
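As a quick check, the same mismatch can be expressed as a lag query; a minimal sketch, dropping the appid filter used in the rule above:
# Deployments whose controller has not yet observed the latest spec change,
# and how many generations they are behind
kube_deployment_metadata_generation - kube_deployment_status_observed_generation > 0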
Deployment-managed pod replicas differ from the expected count
- Alert rule
alert: KubeDeploymentReplicasMismatch
expr: |
  (
    kube_deployment_spec_replicas{appid!~"uptime"} != kube_deployment_status_replicas_available{appid!~"uptime"}
  )
  and
  (
    changes(kube_deployment_status_replicas_updated{appid!~"uptime"}[10m]) == 0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a deployment's replica count does not match the expected value
- Rule analysis
Excludes Deployments that have been rolling out within the last 10 minutes, then checks whether the desired replica count matches the number of replicas actually available.
( changes(kube_deployment_status_replicas_updated{appid!~"uptime"}[10m]) == 0 ) asserts that deployment.status.updatedReplicas has not changed in the last 10 minutes; this value normally only changes while a rollout is in progress.
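To see which Deployments are short and by how many pods, a minimal ad-hoc sketch, dropping the rollout guard from the rule above:
# Gap between desired and available replicas per deployment
kube_deployment_spec_replicas - kube_deployment_status_replicas_available > 0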
- Remediation
TODO;
StatefulSet-managed pod replicas differ from the expected count
- Alert rule
alert: KubeStatefulSetReplicasMismatch
expr: |
  (
    kube_statefulset_replicas{appid!~"uptime"} != kube_statefulset_status_replicas_available{appid!~"uptime"}
  )
  and
  (
    changes(kube_statefulset_status_replicas_updated{appid!~"uptime"}[10m]) == 0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a statefulset's replica count does not match the expected value
- Rule analysis
Same pattern as the Deployment rule above: excludes StatefulSets that have been rolling out within the last 10 minutes, then checks whether the desired replica count matches the number actually available.
( changes(kube_statefulset_status_replicas_updated{appid!~"uptime"}[10m]) == 0 ) asserts that statefulset.status.updatedReplicas has not changed in the last 10 minutes; this value normally only changes while a rollout is in progress.
- Remediation
TODO;
StatefulSet generation does not match the expected value
- Alert rule
alert: KubeStatefulSetGenerationMismatch
expr: |
  kube_statefulset_status_observed_generation{appid!~"uptime"} != kube_statefulset_metadata_generation{appid!~"uptime"}
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a statefulset's observed generation does not match its metadata generation
- Rule analysis
Same mechanism as the Deployment rule above: every change to a StatefulSet increments metadata.generation by 1, and the controller syncs until status.observedGeneration equals metadata.generation, so observedGeneration can normally only be less than or equal to generation. A prolonged mismatch warrants investigation. The relevant source lives mainly at:
pkg/controller/statefulset/stateful_set_control.go
The code also suggests a scenario where observedGeneration ends up greater than generation; the exact conditions are still unclear and remain to be tested (TODO).
StatefulSet update not rolled out
- Alert rule
alert: KubeStatefulSetUpdateNotRolledOut
expr: |
  (
    max without (revision) (
      kube_statefulset_status_current_revision{appid!~"uptime"}
        unless
      kube_statefulset_status_update_revision{appid!~"uptime"}
    )
    *
    (
      kube_statefulset_replicas{appid!~"uptime"}
        !=
      kube_statefulset_status_replicas_updated{appid!~"uptime"}
    )
  ) and (
    changes(kube_statefulset_status_replicas_updated{appid!~"uptime"}[5m])
      ==
    0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a statefulset update has not fully rolled out
- Rule analysis
The first operand uses unless to keep the kube_statefulset_status_current_revision series only when its revision label differs from kube_statefulset_status_update_revision, i.e. some pods are still on an old revision; max without (revision) then drops the revision label so the result can be matched against other metrics. The != comparison keeps only StatefulSets whose desired replica count differs from status.updatedReplicas. Finally, the and changes(...[5m]) == 0 clause requires that updatedReplicas has made no progress in the last 5 minutes, so the alert targets rollouts that are stuck rather than merely in flight.
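The revision clause can also be evaluated on its own to list the affected StatefulSets; a minimal sketch without the appid filter:
# StatefulSets with pods still on an old revision; the current_revision series
# survives `unless` only when no update_revision series carries identical labels
max without (revision) (
  kube_statefulset_status_current_revision
    unless
  kube_statefulset_status_update_revision
)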
DaemonSet update not rolled out
- Alert rule
alert: KubeDaemonSetRolloutStuck
expr: |
  (
    (
      kube_daemonset_status_current_number_scheduled{appid!~"uptime"}
        !=
      kube_daemonset_status_desired_number_scheduled{appid!~"uptime"}
    ) or (
      kube_daemonset_status_number_misscheduled{appid!~"uptime"}
        !=
      0
    ) or (
      kube_daemonset_status_updated_number_scheduled{appid!~"uptime"}
        !=
      kube_daemonset_status_desired_number_scheduled{appid!~"uptime"}
    ) or (
      kube_daemonset_status_number_available{appid!~"uptime"}
        !=
      kube_daemonset_status_desired_number_scheduled{appid!~"uptime"}
    )
  ) and (
    changes(kube_daemonset_status_updated_number_scheduled{appid!~"uptime"}[5m])
      ==
    0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a daemonset update has not fully rolled out
- Rule analysis
Fires when any of four conditions holds: the number of nodes currently running the DaemonSet pod differs from the number that should run it; some pods are misscheduled onto nodes that should not run them; not every scheduled pod is on the updated template; or not every scheduled pod is available. As with the StatefulSet rule, the and changes(...[5m]) == 0 clause restricts this to DaemonSets whose rollout has made no progress in the last 5 minutes.
DaemonSet has pods not yet scheduled
- Alert rule
alert: KubeDaemonSetNotScheduled
expr: (kube_daemonset_status_desired_number_scheduled{appid!~"uptime"} - kube_daemonset_status_current_number_scheduled{appid!~"uptime"}) > 0
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a daemonset in the k8s cluster has nodes where its pod has not been scheduled
- Rule analysis
Subtracts the number of nodes currently running the pod from the number of nodes that should be running it; a difference greater than 0 means some nodes have not been scheduled yet.
Pod container waiting too long
- Alert rule
alert: KubeContainerWaiting
expr: sum by (appid,namespace,pod,container,reason) (kube_pod_container_status_waiting_reason{appid!~"uptime"}) > 0
for: 30m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 30 minutes: a container in the k8s cluster is still waiting to start
- Rule analysis
kube-state-metrics periodically reads the pod.status.containerStatuses array, checks whether item.state.waiting is non-nil for each entry, and if so records item.state.waiting.reason as the reason label:
github.com/kubernetes/kube-state-metrics/internal/store/pod.go
func createPodContainerStatusWaitingReasonFamilyGenerator() generator.FamilyGenerator {
	return *generator.NewFamilyGenerator(
		"kube_pod_container_status_waiting_reason",
		"Describes the reason the container is currently in waiting state.",
		metric.Gauge,
		"",
		wrapPodFunc(func(p *v1.Pod) *metric.Family {
			ms := make([]*metric.Metric, 0, len(p.Status.ContainerStatuses))
			for _, cs := range p.Status.ContainerStatuses {
				// Skip creating series for running containers.
				if cs.State.Waiting != nil {
					ms = append(ms, &metric.Metric{
						LabelKeys:   []string{"container", "reason"},
						LabelValues: []string{cs.Name, cs.State.Waiting.Reason},
						Value:       1,
					})
				}
			}
			return &metric.Family{
				Metrics: ms,
			}
		}),
	)
}
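To see which waiting reasons are currently present across the cluster, the same metric can be aggregated ad hoc; a minimal sketch:
# Number of waiting containers per reason (ContainerCreating, CrashLoopBackOff, ImagePullBackOff, ...)
sum by (reason) (kube_pod_container_status_waiting_reason) > 0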
DaemonSet running on nodes where it should not be scheduled
- Alert rule
alert: KubeDaemonSetMisScheduled
expr: kube_daemonset_status_number_misscheduled{appid!~"uptime"} > 0
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: a daemonset in the k8s cluster has misscheduled pods
- Rule analysis
A DaemonSet's pods are supposed to land on nodes selected by its labels and affinities, but in some scenarios a pod can end up running on a node it should not be on. Polling daemonset.status.numberMisscheduled detects such mismatched scheduling. The relevant controller logic is at:
github.com/kubernetes/kubernetes-1.18.8/pkg/controller/daemon/daemon_controller.go
Job running too long
- Alert rule
alert: KubeJobNotCompleted
expr: |
  (
    time() - max by(namespace, job_name) (kube_job_status_start_time{appid!~"uptime"}
    and
    kube_job_status_active{appid!~"uptime"} > 0) > 43200
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: Job {{ $labels.namespace }}/{{ $labels.job_name }} has been running longer than {{ "43200" | humanizeDuration }}
- Rule analysis
Takes the Jobs that are currently active and subtracts each Job's start time from the current time; the alert fires when the elapsed time exceeds the threshold (43200 seconds, i.e. 12 hours).
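The same pattern works for ad-hoc inspection with any threshold; a minimal sketch showing how long each active Job has been running:
# Seconds each currently-active Job has been running
time() - (kube_job_status_start_time and kube_job_status_active > 0)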
Job failed
- Alert rule
alert: KubeJobFailed
expr: kube_job_failed{appid!~"uptime"} > 0
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed
- Rule analysis
Judges failure from job.status.conditions. This complements KubeJobNotCompleted: a Job that simply never finishes may have no entries in its conditions array at all, in which case only the duration-based rule catches it.
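For a quick overview, failed Jobs can be grouped by namespace; the condition label on kube_job_failed reflects the status of the Job's Failed condition. A minimal sketch:
# Number of failed Jobs per namespace
sum by (namespace) (kube_job_failed{condition="true"})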
K8s Resource Monitoring
CPU overcommitted
- Alert rule
alert: KubeCPUOvercommit
expr: |
  sum(kube_pod_container_resource_requests{appid!~"uptime",resource="cpu"})
    /
  sum(kube_node_status_allocatable{resource="cpu"})
    >
  (count(kube_node_status_allocatable{resource="cpu"}) - 1)
    /
  count(kube_node_status_allocatable{resource="cpu"})
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Evaluated every 2 minutes, sustained for 6 minutes: CPU requests in the k8s cluster are overcommitted relative to allocatable capacity
- Rule analysis
Total CPU requested by all pods, divided by the cluster's total allocatable CPU, should stay below (N-1)/N, where N is the node count; this leaves enough headroom for the cluster to absorb the loss of one node. For example, with 4 nodes of 8 allocatable cores each, the alert fires once total requests exceed 24 of the 32 cores. Per-container requests can be inspected directly, e.g.:
kube_pod_container_resource_requests{resource="cpu",namespace="user-cy7053",container="cms3html"}
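To find which namespaces drive the overcommit, requests can be aggregated per namespace; a minimal sketch:
# Total CPU cores requested per namespace
sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})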