kube-apiserver

Overview

Notes on problems related to kube-apiserver.

Startup failures

“dial tcp 127.0.0.1:8080: connect: connection refused”

  • Key logs
root@k8s/etc/kubernetes/manifests# kubectl get nodes
E0803 10:01:41.668372  300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.669287  300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.671326  300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.672078  300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.674043  300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
root@k8s/etc/kubernetes/manifests#
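  • Cause

Newer versions no longer open the local “127.0.0.1:8080” port. kubectl falls back to localhost:8080 when it cannot find a kubeconfig, so configure the kubeconfig path, for example $HOME/.kube/config.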
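A minimal sketch of pointing kubectl at a kubeconfig. The helper name `find_kubeconfig` is hypothetical, and `/etc/kubernetes/admin.conf` is only an assumption (it is the default admin kubeconfig on kubeadm-provisioned control-plane nodes); adjust the candidate paths to your environment.

```shell
# Print the first candidate path that exists as a regular file.
# find_kubeconfig is a hypothetical helper for this sketch.
find_kubeconfig() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations: the per-user default, then the kubeadm admin
# kubeconfig (assumed path, only present on kubeadm control-plane nodes).
if cfg=$(find_kubeconfig "$HOME/.kube/config" /etc/kubernetes/admin.conf); then
    export KUBECONFIG="$cfg"
    echo "using kubeconfig: $cfg"
else
    echo "no kubeconfig found; kubectl would fall back to localhost:8080" >&2
fi
```

With `KUBECONFIG` exported, `kubectl get nodes` in the same shell should contact the server named in that file rather than localhost:8080.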

“failed to verify certificate: x509”

  • Key logs
root@k8s/etc/kubernetes/manifests# kubectl get nodes
E0803 10:27:46.079494  300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.094227  300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.105654  300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.117784  300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.127952  300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority
root@k8s/etc/kubernetes/manifests#
  • Cause

The CA certificate embedded in “$HOME/.kube/config” is wrong. To check:

  1. Take the value of “certificate-authority-data” from “$HOME/.kube/config” and decode it with base64:
echo -n '${certificate-authority-data}' | base64 -d
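The check above could be scripted roughly as follows. This is a sketch under assumptions: `extract_ca` is a hypothetical helper that reads only the first “certificate-authority-data” entry in the file, and `/etc/kubernetes/pki/ca.crt` is assumed to be the cluster CA (the kubeadm default location).

```shell
# Print the decoded certificate-authority-data of the first cluster
# entry in a kubeconfig file. extract_ca is a hypothetical helper.
extract_ca() {
    awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1" | base64 -d
}

kubeconfig="${KUBECONFIG:-$HOME/.kube/config}"
cluster_ca="/etc/kubernetes/pki/ca.crt"   # assumed kubeadm default path

# Compare byte for byte only when both files are present on this host.
if [ -f "$kubeconfig" ] && [ -f "$cluster_ca" ]; then
    if extract_ca "$kubeconfig" | cmp -s - "$cluster_ca"; then
        echo "kubeconfig CA matches $cluster_ca"
    else
        echo "kubeconfig CA differs from $cluster_ca"
    fi
fi
```

The comparison is byte for byte, so a difference in trailing whitespace alone would also be reported as a mismatch; comparing certificate fingerprints is a stricter alternative.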

Resources that cannot be released

Namespace cannot be deleted

NAME              STATUS        AGE
argo              Active        3d21h
argo-events       Terminating   6d
argocd            Active        3d22h

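The namespace “argo-events” cannot be deleted and stays in the “Terminating” state. This usually happens because custom resources (instances of CRDs) in the namespace were not fully removed. To see what is left, list the matching keys in etcd: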

ETCDCTL_API=3 etcdctl \
	--endpoints https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/server.crt \
	--key=/etc/kubernetes/pki/etcd/server.key \
	get / --prefix --keys-only | grep argo-events
/kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
/kubernetes/fake/registry/argoproj.io/eventsources/argo-events/oneops-syncmi-v1
/kubernetes/fake/registry/argoproj.io/sensors/argo-events/log
/kubernetes/fake/registry/clusterrolebindings/argo-events-binding
/kubernetes/fake/registry/clusterrolebindings/argo-events-webhook-binding
/kubernetes/fake/registry/clusterroles/argo-events-aggregate-to-admin
/kubernetes/fake/registry/clusterroles/argo-events-aggregate-to-edit
/kubernetes/fake/registry/clusterroles/argo-events-aggregate-to-view
/kubernetes/fake/registry/clusterroles/argo-events-role
/kubernetes/fake/registry/clusterroles/argo-events-webhook
/kubernetes/fake/registry/namespaces/argo-events
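To pick the custom resources out of a listing like the one above, a small filter can help. This is a heuristic sketch, not a rule of the etcd key layout: the hypothetical helper `namespaced_custom_keys` assumes keys of the form `<prefix>/<group>/<resource>/<namespace>/<name>` and treats any group containing a dot as a custom or non-core API group.

```shell
# Read etcd keys on stdin and print those that look like namespaced
# objects in the given namespace whose API group contains a dot.
# namespaced_custom_keys is a hypothetical helper (heuristic only).
namespaced_custom_keys() {
    awk -F/ -v ns="$1" 'NF >= 4 && $(NF-1) == ns && $(NF-3) ~ /\./'
}

# Sample input copied from the listing in this document.
namespaced_custom_keys argo-events <<'EOF'
/kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
/kubernetes/fake/registry/argoproj.io/eventsources/argo-events/oneops-syncmi-v1
/kubernetes/fake/registry/argoproj.io/sensors/argo-events/log
/kubernetes/fake/registry/clusterroles/argo-events-role
/kubernetes/fake/registry/namespaces/argo-events
EOF
```

For the sample input, only the three `argoproj.io` keys pass the filter; the cluster-scoped role and the namespace key itself are dropped.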

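The keys show which resources still exist. Once the following custom resources are removed, the namespace is deleted automatically. The command below removes one key; repeat it for each of the listed keys.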

# /kubernetes/fake/registry/argoproj.io/sensors/argo-events/log
# /kubernetes/fake/registry/argoproj.io/eventsources/argo-events/oneops-syncmi-v1
# /kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default

ETCDCTL_API=3 etcdctl \
	--endpoints https://127.0.0.1:2379 \
	--cacert=/etc/kubernetes/pki/etcd/ca.crt \
	--cert=/etc/kubernetes/pki/etcd/server.crt \
	--key=/etc/kubernetes/pki/etcd/server.key \
	del /kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
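Since three keys are listed but the command handles one at a time, the deletion can be wrapped in a loop. A sketch, assuming the same endpoint and certificate paths as above; `delete_keys` and the `DRY_RUN` switch are hypothetical names introduced here, and by default the commands are only printed.

```shell
# Keys taken from the listing in this document.
keys="
/kubernetes/fake/registry/argoproj.io/sensors/argo-events/log
/kubernetes/fake/registry/argoproj.io/eventsources/argo-events/oneops-syncmi-v1
/kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
"

# delete_keys and DRY_RUN are hypothetical names for this sketch.
# With DRY_RUN=1 (the default) each command is printed, not executed.
delete_keys() {
    for key in $keys; do
        set -- etcdctl \
            --endpoints https://127.0.0.1:2379 \
            --cacert=/etc/kubernetes/pki/etcd/ca.crt \
            --cert=/etc/kubernetes/pki/etcd/server.crt \
            --key=/etc/kubernetes/pki/etcd/server.key \
            del "$key"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ETCDCTL_API=3 $*"
        else
            ETCDCTL_API=3 "$@"
        fi
    done
}

delete_keys
```

Deleting keys directly in etcd bypasses the API server, so printing the commands first gives a chance to review them; set `DRY_RUN=0` to actually run them.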



Last modified 2024.06.18: docs: add troubleshooting cases (23e0734)