Kube State Metrics

Brief Overview

kube-state-metrics

K8s Application Monitoring

Pod in CrashLoopBackOff State

  • Alert rule
alert: KubePodStatusCrashLooping
expr: |
  max_over_time(kube_pod_container_status_waiting_reason{appid!~"uptime",reason="CrashLoopBackOff"}[5m]) >= 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a pod in the k8s cluster is in the CrashLoopBackOff state
  • Principle analysis

pass
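The analysis above is still to be filled in, but as a starting point, an ad-hoc query along the following lines (a sketch built only on labels this metric actually exposes) locates which containers are crash-looping before drilling into their logs:

```promql
# List containers currently reported in CrashLoopBackOff,
# grouped so the offending namespace/pod/container is visible.
sum by (namespace, pod, container) (
  kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}
)
```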

Pod Image Pull Taking Too Long

  • Alert rule

alert: KubePodStatusErrImagePull
expr: |
  max_over_time(kube_pod_container_status_waiting_reason{appid!~"uptime",reason="ErrImagePull"}[5m]) >= 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a pod in the k8s cluster is failing to pull its image (ErrImagePull)

alert: KubePodStatusImagePullBackOff
expr: |
  max_over_time(kube_pod_container_status_waiting_reason{appid!~"uptime",reason="ImagePullBackOff"}[5m]) >= 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a pod in the k8s cluster has not successfully pulled its image
  • Principle analysis

pass
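Image-pull failures surface under two related waiting reasons: ErrImagePull (the initial failed attempt) and ImagePullBackOff (the subsequent retry back-off). A single sketch query can watch both:

```promql
# Containers whose image pull is failing, either on the first
# attempt (ErrImagePull) or during back-off retries (ImagePullBackOff).
sum by (namespace, pod, container, reason) (
  kube_pod_container_status_waiting_reason{reason=~"ErrImagePull|ImagePullBackOff"}
)
```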

Pod in a State Other Than Running or Succeeded

alert: KubePodStatusNotRunning
expr: kube_pod_status_phase{appid!~"uptime",phase!~"Running|Succeeded"} == 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a pod in the k8s cluster is in a state other than Running or Succeeded

Deployment Generation Differs from Desired

  • Alert rule
alert: KubeDeploymentGenerationMismatch
expr: |
  kube_deployment_status_observed_generation{appid!~"uptime"} != kube_deployment_metadata_generation{appid!~"uptime"}
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a deployment's observed generation in the k8s cluster differs from the desired one
  • Rule analysis

Every change to a deployment resource increments metadata.generation by 1; the controller then syncs so that status.observedGeneration catches up to metadata.generation. In this flow, observedGeneration can only be less than or equal to generation, so if the two remain unequal for an extended period, manual investigation is needed. The relevant source code lives mainly under the following path:

pkg/controller/deployment/sync.go

Judging from the code, there is also a scenario in which observedGeneration is greater than generation; the specifics are still unclear and remain to be tested (TODO).
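The invariant above can be illustrated with a minimal model (the class and method names here are hypothetical; the real logic lives in the controller sync path cited above): every spec edit bumps metadata.generation immediately, and only a later controller sync copies it into status.observedGeneration, so the observed value trails the metadata value until the sync completes.

```python
class Deployment:
    """Toy model of generation tracking on a Deployment-like object."""

    def __init__(self):
        self.metadata_generation = 1
        self.status_observed_generation = 1

    def edit_spec(self):
        # Any spec change increments metadata.generation immediately.
        self.metadata_generation += 1

    def controller_sync(self):
        # The controller records the generation it has finished processing.
        self.status_observed_generation = self.metadata_generation


d = Deployment()
d.edit_spec()
d.edit_spec()
# Before the controller catches up, observedGeneration lags behind.
assert d.status_observed_generation < d.metadata_generation
d.controller_sync()
assert d.status_observed_generation == d.metadata_generation
```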

Deployment-Managed Pod Replicas Differ from Desired

  • Alert rule
alert: KubeDeploymentReplicasMismatch
expr: |
  (
    kube_deployment_spec_replicas{appid!~"uptime"} != kube_deployment_status_replicas_available{appid!~"uptime"}
  ) 
    and 
  ( 
    changes(kube_deployment_status_replicas_updated{appid!~"uptime"}[10m]) == 0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a deployment's replica count in the k8s cluster differs from the desired count
  • Rule analysis

Excluding deployment resources that have been rolling out within the last 10 minutes, check whether each deployment's desired replica count matches the number of replicas actually available.

( changes(kube_deployment_status_replicas_updated{appid!~"uptime"}[10m]) == 0 ) checks that deployment.status.updatedReplicas has not changed in the last 10 minutes; this value normally only changes while a rollout is in progress.
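The behavior of changes() can be sketched in a few lines (a simplified model of the PromQL function, counting value changes between consecutive samples in the window):

```python
def changes(samples):
    """Count how many times the value changed between consecutive samples,
    mirroring what PromQL's changes() does over a range vector."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


# updatedReplicas sampled every 2 minutes over a 10-minute window:
steady = [3, 3, 3, 3, 3]    # no rollout in progress -> gate passes (== 0)
rolling = [1, 2, 2, 3, 3]   # rollout in progress    -> alert suppressed
assert changes(steady) == 0
assert changes(rolling) == 2
```

Gating on changes(...) == 0 therefore keeps the replicas-mismatch alert quiet while a rollout is actively making progress.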

  • Alert handling

TODO;

StatefulSet-Managed Pod Replicas Differ from Desired

  • Alert rule
alert: KubeStatefulSetReplicasMismatch
expr: |
  (
    kube_statefulset_spec_replicas{appid!~"uptime"} != kube_statefulset_status_replicas_available{appid!~"uptime"}
  ) 
    and 
  ( 
    changes(kube_statefulset_status_replicas_updated{appid!~"uptime"}[10m]) == 0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a statefulset's replica count in the k8s cluster differs from the desired count
  • Rule analysis

Excluding statefulset resources that have been rolling out within the last 10 minutes, check whether each statefulset's desired replica count matches the number of replicas actually available.

( changes(kube_statefulset_status_replicas_updated{appid!~"uptime"}[10m]) == 0 ) checks that statefulset.status.updatedReplicas has not changed in the last 10 minutes; this value normally only changes while a rollout is in progress.

  • Alert handling

TODO;

StatefulSet Generation Differs from Desired

  • Alert rule
alert: KubeStatefulSetGenerationMismatch
expr: |
  kube_statefulset_status_observed_generation{appid!~"uptime"} != kube_statefulset_metadata_generation{appid!~"uptime"}
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a statefulset's observed generation in the k8s cluster differs from the desired one
  • Rule analysis

Every change to a statefulset resource increments metadata.generation by 1; the controller then syncs so that status.observedGeneration catches up to metadata.generation. In this flow, observedGeneration can only be less than or equal to generation, so if the two remain unequal for an extended period, manual investigation is needed. The relevant source code lives mainly under the following path:

pkg/controller/statefulset/stateful_set_control.go

Judging from the code, there is also a scenario in which observedGeneration is greater than generation; the specifics are still unclear and remain to be tested (TODO).

StatefulSet Update Not Rolled Out

  • Alert rule
alert: KubeStatefulSetUpdateNotRolledOut
expr: |
  (
    max without (revision) (
      kube_statefulset_status_current_revision{appid!~"uptime"}
        unless
      kube_statefulset_status_update_revision{appid!~"uptime"}
    )
      *
    (
      kube_statefulset_replicas{appid!~"uptime"}
        !=
      kube_statefulset_status_replicas_updated{appid!~"uptime"}
    )
  )  and (
    changes(kube_statefulset_status_replicas_updated{appid!~"uptime"}[5m])
      ==
    0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a statefulset update in the k8s cluster has not rolled out successfully
  • Rule analysis

TODO

DaemonSet Update Not Rolled Out

  • Alert rule
alert: KubeDaemonSetRolloutStuck
expr: |
  (
    (
      kube_daemonset_status_current_number_scheduled{appid!~"uptime"}
       !=
      kube_daemonset_status_desired_number_scheduled{appid!~"uptime"}
    ) or (
      kube_daemonset_status_number_misscheduled{appid!~"uptime"}
       !=
      0
    ) or (
      kube_daemonset_status_updated_number_scheduled{appid!~"uptime"}
       !=
      kube_daemonset_status_desired_number_scheduled{appid!~"uptime"}
    ) or (
      kube_daemonset_status_number_available{appid!~"uptime"}
       !=
      kube_daemonset_status_desired_number_scheduled{appid!~"uptime"}
    )
  ) and (
    changes(kube_daemonset_status_updated_number_scheduled{appid!~"uptime"}[5m])
      ==
    0
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a daemonset update in the k8s cluster has not rolled out successfully
  • Rule analysis

TODO

DaemonSet Has Pods Not Successfully Scheduled

  • Alert rule
alert: KubeDaemonSetNotScheduled
expr: (kube_daemonset_status_desired_number_scheduled{appid!~"uptime"} - kube_daemonset_status_current_number_scheduled{appid!~"uptime"}) > 0
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a daemonset in the k8s cluster has nodes where pods were not successfully scheduled
  • Rule analysis

Subtract the number of nodes currently scheduled from the number of nodes that should be scheduled; if the result is greater than 0, some desired nodes in the cluster have not been scheduled successfully.

Pod Waiting Too Long

  • Alert rule
alert: KubeContainerWaiting
expr: sum by (appid,namespace,pod,container,reason) (kube_pod_container_status_waiting_reason{appid!~"uptime"}) > 0
for: 30m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 30 minutes, a container in the k8s cluster is still waiting to start
  • Rule analysis

The exporter periodically reads the pod.status.containerStatuses array and, for each item, checks whether item.state.waiting is non-nil; if so, it records the item.state.waiting.reason value.

github.com/kubernetes/kube-state-metrics/internal/store/pod.go
func createPodContainerStatusWaitingReasonFamilyGenerator() generator.FamilyGenerator {
    return *generator.NewFamilyGenerator(
        "kube_pod_container_status_waiting_reason",
        "Describes the reason the container is currently in waiting state.",
        metric.Gauge,
        "",
        wrapPodFunc(func(p *v1.Pod) *metric.Family {
            ms := make([]*metric.Metric, 0, len(p.Status.ContainerStatuses))
            for _, cs := range p.Status.ContainerStatuses {
                // Skip creating series for running containers.
                if cs.State.Waiting != nil {
                    ms = append(ms, &metric.Metric{
                        LabelKeys:   []string{"container", "reason"},
                        LabelValues: []string{cs.Name, cs.State.Waiting.Reason},
                        Value:       1,
                    })
                }
            }
            return &metric.Family{
                Metrics: ms,
            }
        }),
    )
}

DaemonSet Running on Nodes Where It Should Not Be Scheduled

  • Alert rule
alert: KubeDaemonSetMisScheduled
expr: kube_daemonset_status_number_misscheduled{appid!~"uptime"} > 0
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, a daemonset in the k8s cluster has misscheduled pods
  • Rule analysis

A daemonset should be scheduled onto nodes according to specific labels, but in some scenarios its pods can end up running on unexpected nodes. The daemonset.status.numberMisscheduled value is read periodically to detect such mismatched scheduling.

github.com/kubernetes/kubernetes-1.18.8/pkg/controller/daemon/daemon_controller.go

Job Running Too Long

  • Alert rule
alert: KubeJobNotCompleted
expr: |
  (
    time() - max by(namespace, job_name) (kube_job_status_start_time{appid!~"uptime"}
    and
    kube_job_status_active{appid!~"uptime"} > 0) > 43200  
  )
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, Job {{ $labels.namespace }}/{{ $labels.job_name }} in the k8s cluster has been running longer than {{ "43200" | humanizeDuration }}
  • Rule analysis

Take the list of currently active jobs and subtract each job's start time from the current time; alert when the difference exceeds the threshold of 43200 seconds (12 hours).
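The threshold arithmetic is straightforward; a sketch of the rule's comparison (the function name is hypothetical, chosen just for illustration):

```python
THRESHOLD_SECONDS = 43200  # the rule's cut-off: 12 hours

def job_overdue(start_time: float, now: float) -> bool:
    """Mirror the rule: flag a still-active job whose runtime
    exceeds the threshold."""
    return (now - start_time) > THRESHOLD_SECONDS


assert THRESHOLD_SECONDS == 12 * 3600
assert job_overdue(start_time=0, now=43201)
assert not job_overdue(start_time=0, now=43200)
```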

Job Failed

  • Alert rule
alert: KubeJobFailed
expr: kube_job_failed{appid!~"uptime"} > 0
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, Job {{ $labels.namespace }}/{{ $labels.job_name }} in the k8s cluster has failed
  • Rule analysis

Read job.status.conditions to determine whether the job has failed. This complements KubeJobNotCompleted: sometimes a job never finishes at all, and in that case the conditions array carries no values.

K8s Resource Monitoring

CPU Overcommit

  • Alert rule
alert: KubeCPUOvercommit
expr: |
  sum(kube_pod_container_resource_requests{appid!~"uptime",resource="cpu"})
    /
  sum(kube_node_status_allocatable{resource="cpu"})
  > 1
for: 6m
labels:
  api_opsaid_cn_fault_priority: P5
  api_opsaid_cn_pm2_uuid: 99feafb5-bed6-4daf-927a-69a2ab80c485
annotations:
  summary: Checked every 2 minutes; if this persists for 6 minutes, total container CPU requests in the k8s cluster exceed the allocatable CPU of the nodes
  • Rule analysis
kube_pod_container_resource_requests{resource="cpu",namespace="user-cy7053",container="cms3html"}
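A worked example of the overcommit arithmetic on these metrics (all numbers are hypothetical): sum every container's CPU request across the cluster and compare the total with the allocatable CPU reported by the nodes.

```python
# Hypothetical per-container CPU requests (cores), as exposed by
# kube_pod_container_resource_requests{resource="cpu"}.
cpu_requests = [0.5, 1.0, 2.0, 0.25, 1.25]

# Hypothetical per-node allocatable CPU (cores), as exposed by
# kube_node_status_allocatable{resource="cpu"}.
allocatable = [2.0, 2.0]

ratio = sum(cpu_requests) / sum(allocatable)
overcommitted = ratio > 1  # requests exceed what the nodes can provide
assert overcommitted
```

A ratio above 1 means that if every container actually consumed its full request, the nodes could not supply the CPU; how far above 1 you tolerate is a cluster-specific policy decision.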



Last modified 2023.02.13: feat: add alert rules (2fcc453)