kube-apiserver
Overview
Notes on problems encountered with kube-apiserver.
Startup and connection failures
“dial tcp 127.0.0.1:8080: connect: connection refused”
- Key logs
root@k8s/etc/kubernetes/manifests# kubectl get nodes
E0803 10:01:41.668372 300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.669287 300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.671326 300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.672078 300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0803 10:01:41.674043 300010 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
root@k8s/etc/kubernetes/manifests#
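- Cause
Newer versions no longer open the local insecure port “127.0.0.1:8080”, and kubectl falls back to that address when it finds no kubeconfig. Point kubectl at a kubeconfig file, for example $HOME/.kube/config.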
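A minimal sketch of a check for which kubeconfig file kubectl is likely to pick up. The function name find_kubeconfig is hypothetical, and the lookup is simplified: it assumes $KUBECONFIG holds a single path and ignores the --kubeconfig flag.

```shell
# find_kubeconfig is a hypothetical helper, not part of kubectl.
# It prints the kubeconfig file kubectl would most likely use:
# the file named by $KUBECONFIG if that variable is set (a single path
# is assumed), otherwise $HOME/.kube/config.
# It returns 1 when that file does not exist.
find_kubeconfig() {
    _kc_candidate="${KUBECONFIG:-$HOME/.kube/config}"
    if [ -f "$_kc_candidate" ]; then
        printf '%s\n' "$_kc_candidate"
    else
        echo "no kubeconfig at $_kc_candidate; kubectl will try localhost:8080" >&2
        return 1
    fi
}
```

If the function reports nothing, one option on a kubeadm-installed control-plane node (an assumption, since the source does not say how the cluster was set up) is to copy /etc/kubernetes/admin.conf to $HOME/.kube/config, or to export KUBECONFIG=/etc/kubernetes/admin.conf for the current shell.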
“failed to verify certificate: x509”
- Key logs
root@k8s/etc/kubernetes/manifests# kubectl get nodes
E0803 10:27:46.079494 300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.094227 300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.105654 300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.117784 300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
E0803 10:27:46.127952 300470 memcache.go:265] couldn't get current server API group list: Get "https://10.49.2.108:6443/api?timeout=32s": tls: failed to verify certificate: x509: certificate signed by unknown authority
Unable to connect to the server: tls: failed to verify certificate: x509: certificate signed by unknown authority
root@k8s/etc/kubernetes/manifests#
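- Cause
The CA certificate embedded in “$HOME/.kube/config” does not match the one that signed the API server certificate. To check:
- Take the “certificate-authority-data” value from “$HOME/.kube/config” and decode it with base64: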
echo -n '${certificate-authority-data}' | base64 -d
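The manual copy-and-decode step can be wrapped in a small function. This is a sketch under assumptions: kubeconfig_ca is a hypothetical name, the function reads only the first “certificate-authority-data” entry in the file, and the /etc/kubernetes/pki/ca.crt path in the usage comment is the kubeadm default rather than something stated in the source.

```shell
# kubeconfig_ca is a hypothetical helper: it prints the decoded CA
# certificate embedded in the kubeconfig file given as $1.
# Only the first certificate-authority-data entry is read.
kubeconfig_ca() {
    awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1" \
        | base64 -d
}

# Example usage: compare the embedded CA with the cluster CA file.
# The path below is the kubeadm default and is an assumption.
#   kubeconfig_ca "$HOME/.kube/config" | diff - /etc/kubernetes/pki/ca.crt \
#       && echo "CA matches"
```

If diff reports differences, the kubeconfig was most likely generated for another cluster or before the CA was regenerated.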
Resources that cannot be released
Namespace cannot be deleted
NAME STATUS AGE
argo Active 3d21h
argo-events Terminating 6d
argocd Active 3d22h
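The namespace “argo-events” cannot be deleted and stays in the “Terminating” state. This usually means that custom resources (instances of CRDs) in the namespace were not fully removed. To find them, list the matching keys in etcd: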
ETCDCTL_API=3 etcdctl \
--endpoints https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
get / --prefix --keys-only | grep argo-events
/kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
/kubernetes/fake/registry/argoproj.io/eventsources/argo-events/oneops-syncmi-v1
/kubernetes/fake/registry/argoproj.io/sensors/argo-events/log
/kubernetes/fake/registry/clusterrolebindings/argo-events-binding
/kubernetes/fake/registry/clusterrolebindings/argo-events-webhook-binding
/kubernetes/fake/registry/clusterroles/argo-events-aggregate-to-admin
/kubernetes/fake/registry/clusterroles/argo-events-aggregate-to-edit
/kubernetes/fake/registry/clusterroles/argo-events-aggregate-to-view
/kubernetes/fake/registry/clusterroles/argo-events-role
/kubernetes/fake/registry/clusterroles/argo-events-webhook
/kubernetes/fake/registry/namespaces/argo-events
The keys show which resources still exist. Once the following custom resources are removed, the namespace is deleted automatically.
# /kubernetes/fake/registry/argoproj.io/sensors/argo-events/log
# /kubernetes/fake/registry/argoproj.io/eventsources/argo-events/oneops-syncmi-v1
# /kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
ETCDCTL_API=3 etcdctl \
--endpoints https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
del /kubernetes/fake/registry/argoproj.io/eventbus/argo-events/default
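The command above removes one key, and the same deletion has to be repeated for the other two. A hedged sketch of how the key list could be narrowed to objects stored inside the namespace: namespaced_keys is a hypothetical helper, and it matches every key that has the namespace as a full path segment, so core objects in the namespace would be listed too, not only custom resources.

```shell
# namespaced_keys is a hypothetical helper: it reads etcd keys on stdin
# and keeps only those that contain the namespace ($1) as a full path
# segment. Cluster-scoped objects whose names merely start with the
# namespace (for example clusterroles/argo-events-role) are dropped.
namespaced_keys() {
    grep -F "/$1/"
}

# Example usage (dry run), with the same etcdctl connection flags as in
# the "get" command above in place of <flags>:
#   ETCDCTL_API=3 etcdctl <flags> get / --prefix --keys-only \
#       | namespaced_keys argo-events \
#       | while IFS= read -r key; do echo "would delete: $key"; done
```

Printing the keys first keeps the step reviewable; once the list looks right, the echo can be replaced by the del command shown above.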
Last modified 2024.06.18: docs: add troubleshooting cases (23e0734)