etcd
Overview
Notes on etcd problems encountered in practice, grouped by category.
Startup failures
“received signal; shutting down”
- Key log lines
{"level":"info","ts":"2023-07-27T13:31:34.277Z","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream MsgApp v2","local-member-id":"edacac26a3fb24b3","remote-peer-id":"2c8b55a3459bb7ff"}
{"level":"info","ts":"2023-07-27T13:33:03.898Z","caller":"osutil/interrupt_unix.go:64","msg":"received signal; shutting down","signal":"terminated"}
{"level":"info","ts":"2023-07-27T13:33:03.898Z","caller":"embed/etcd.go:367","msg":"closing etcd server","name":"node3","data-dir":"/data/etcd/","advertise-peer-urls":["https://10.49.2.36:2380"],"advertise-client-urls":["https://10.49.2.36:2379"]}
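- Cause
In /etc/containerd/config.toml, SystemdCgroup was set to false, so containerd (runc) used the cgroupfs driver while kubelet uses systemd. kubelet and the container runtime must use the same cgroup driver.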
- Fix
Configure the runc runtime to use the systemd cgroup driver, the same as kubelet, then restart containerd:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
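A minimal sketch for checking and applying this fix from a shell, assuming bash and GNU sed. The function names and the kubelet config path /var/lib/kubelet/config.yaml (the usual kubeadm location) are assumptions, not from the original. The sketch only flips an existing `SystemdCgroup = false` line; if the key is missing, add it by hand under the section shown above.

```shell
#!/usr/bin/env bash
# Sketch: compare kubelet's cgroup driver with containerd's runc SystemdCgroup setting.
# Function names are hypothetical; /var/lib/kubelet/config.yaml is an assumed
# (kubeadm default) location for the KubeletConfiguration file.

# Print the cgroupDriver value (e.g. "systemd") from a KubeletConfiguration file.
kubelet_cgroup_driver() {
  sed -n 's/^cgroupDriver:[[:space:]]*//p' "$1"
}

# Succeed only if the containerd config sets SystemdCgroup = true.
systemd_cgroup_enabled() {
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1"
}

# Rewrite "SystemdCgroup = false" to "true", keeping indentation; saves a .bak copy.
enable_systemd_cgroup() {
  sed -i.bak -E 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*)false/\1true/' "$1"
}

# Usage (as root on the affected node):
#   kubelet_cgroup_driver /var/lib/kubelet/config.yaml        # expect: systemd
#   systemd_cgroup_enabled /etc/containerd/config.toml || {
#     enable_systemd_cgroup /etc/containerd/config.toml
#     systemctl restart containerd
#   }
```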
Backup and restore
Restoring etcd data on a control-plane node
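- Log in to the faulty node and remove the etcd static Pod (moving its manifest out of the manifests directory makes kubelet stop the Pod)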
cd /etc/kubernetes/manifests/
mv etcd.yaml /usr/local/src/
- Log in to one of the healthy etcd cluster nodes
List the members:
ctr --namespace k8s.io run \
--rm \
--net-host \
--mount type=bind,src=/etc/kubernetes,dst=/etc/kubernetes,options=rbind:ro \
registry.cn-hangzhou.aliyuncs.com/kube-image-repo/etcd:v3.5.10 etcd-health \
etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://127.0.0.1:2379 \
member list
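The same ctr/etcdctl invocation is repeated for every etcdctl call in this section. A minimal sketch of a bash function that wraps it; the function name `etcdctl_ctr` and the `CTR`/`ETCD_IMAGE` override variables are hypothetical and introduced only here (setting `CTR=echo` prints the command instead of running it).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the ctr + etcdctl command used in this section.
# ETCD_IMAGE and CTR are assumptions; override them to match your environment.
ETCD_IMAGE="${ETCD_IMAGE:-registry.cn-hangzhou.aliyuncs.com/kube-image-repo/etcd:v3.5.10}"

etcdctl_ctr() {
  # Runs etcdctl in a throwaway container on the host network, with the
  # Kubernetes PKI mounted read-only; extra arguments go to etcdctl.
  "${CTR:-ctr}" --namespace k8s.io run \
    --rm \
    --net-host \
    --mount type=bind,src=/etc/kubernetes,dst=/etc/kubernetes,options=rbind:ro \
    "$ETCD_IMAGE" etcd-health \
    etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt \
    --key /etc/kubernetes/pki/etcd/peer.key \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --endpoints https://127.0.0.1:2379 \
    "$@"
}

# Usage:
#   etcdctl_ctr member list -w table
#   etcdctl_ctr member remove 63e2da0114a7b9b7
#   etcdctl_ctr member add node2 --peer-urls="https://192.168.201.122:2380"
```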
- Remove the failed member, using its ID from the member list output:
ctr --namespace k8s.io run \
--rm \
--net-host \
--mount type=bind,src=/etc/kubernetes,dst=/etc/kubernetes,options=rbind:ro \
registry.cn-hangzhou.aliyuncs.com/kube-image-repo/etcd:v3.5.10 etcd-health \
etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://127.0.0.1:2379 \
member remove 63e2da0114a7b9b7
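To avoid removing the wrong member, the ID can be looked up by name. A minimal sketch, assuming etcdctl's default (simple) `member list` output, whose comma-separated fields are `ID, status, name, peer URLs, client URLs, is learner`; the function name `member_id_by_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl member list` output (simple format) on stdin
# and print the ID of the member whose name matches $1.
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage: pipe the output of the member list command shown above into it, e.g.
#   <member list command> | member_id_by_name node2
```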
- Add the faulty node's etcd back as a new member:
ctr --namespace k8s.io run \
--rm \
--net-host \
--mount type=bind,src=/etc/kubernetes,dst=/etc/kubernetes,options=rbind:ro \
registry.cn-hangzhou.aliyuncs.com/kube-image-repo/etcd:v3.5.10 etcd-health \
etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://127.0.0.1:2379 \
member add node2 --peer-urls="https://192.168.201.122:2380"
The command prints the settings for the new member (excerpt):
ETCD_NAME="node2"
ETCD_INITIAL_CLUSTER="node3=https://192.168.201.123:2380,node2=https://192.168.201.122:2380,node1=https://192.168.201.121:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.201.122:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
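- Start etcd on the faulty node
In the saved manifest (/usr/local/src/etcd.yaml), change
--initial-cluster-state=new
to
--initial-cluster-state=existing
then move the manifest back to /etc/kubernetes/manifests/ so kubelet starts the static Pod again.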
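The manifest changes for this step can be scripted. A minimal sketch, assuming bash, GNU sed, and a manifest that passes flags in the `--flag=value` form; the function name `prepare_rejoin_manifest` and the `DATA_DIR` variable are hypothetical. The etcd runtime-reconfiguration docs start a newly added member with the ETCD_INITIAL_CLUSTER value printed by `member add` and with an empty data directory, so, as an assumption beyond the original steps, the sketch also rewrites `--initial-cluster` and the usage moves the old data directory aside (check `--data-dir` in your manifest; the log earlier shows /data/etcd/).

```shell
#!/usr/bin/env bash
# Hypothetical helper: patch a saved etcd static Pod manifest so the member
# rejoins an existing cluster. Writes a .bak copy next to the manifest.
#   $1 = manifest path, $2 = ETCD_INITIAL_CLUSTER value printed by `member add`
prepare_rejoin_manifest() {
  local manifest="$1" initial_cluster="$2"
  sed -i.bak -E \
    -e 's/--initial-cluster-state=new/--initial-cluster-state=existing/' \
    -e "s|--initial-cluster=[^[:space:]]*|--initial-cluster=${initial_cluster}|" \
    "$manifest"
}

# Usage on the faulty node:
#   prepare_rejoin_manifest /usr/local/src/etcd.yaml \
#     "node3=https://192.168.201.123:2380,node2=https://192.168.201.122:2380,node1=https://192.168.201.121:2380"
#   DATA_DIR=/var/lib/etcd            # assumption: use --data-dir from the manifest
#   mv "$DATA_DIR" "$DATA_DIR.bak"    # re-added member starts with an empty data dir
#   mv /usr/local/src/etcd.yaml /etc/kubernetes/manifests/
```

The .bak copy stays in /usr/local/src/, outside the manifests directory, so kubelet does not pick it up as a second static Pod.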
Last modified 2024.05.06: docs: move common troubleshooting to a separate page (96c4309)