Bryan Tanoue:
Need help. No matter what I do, after I back up and restore etcd the pod always stays in Pending. I’ll put the commands I used in this thread.
Bryan Tanoue:
sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save my-backup
Bryan Tanoue:
sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --data-dir=/var/lib/etcd/my-backup snapshot restore my-backup
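Editor's note:
A hedged aside on the restore command above: in etcd 3.4, `etcdctl snapshot restore` reads the local snapshot file and writes a new data directory. It does not contact the running server, so the endpoint and certificate flags should not be needed for that step, and restore generally refuses to write into a data directory that already exists. A minimal sketch, assuming the snapshot name and target directory from the thread; the `run` wrapper, the `DRY_RUN` variable, and the `restore_snapshot` function are hypothetical helpers so the commands can be previewed before anything is executed:

```shell
#!/bin/sh
# Preview-first wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

SNAPSHOT=my-backup                    # snapshot file name used in the thread
RESTORE_DIR=/var/lib/etcd/my-backup   # restore target used in the thread

restore_snapshot() {
  # Inspect the snapshot (hash, revision, key count) before restoring it.
  run sudo env ETCDCTL_API=3 etcdctl snapshot status "$SNAPSHOT" --write-out=table
  # Restore works on the local file only: no server or certificate flags here.
  run sudo env ETCDCTL_API=3 etcdctl snapshot restore "$SNAPSHOT" --data-dir="$RESTORE_DIR"
}

restore_snapshot
```

With the default `DRY_RUN=1` this only prints the two commands. Running it as `DRY_RUN=0 sh restore.sh` (file name assumed) would execute them on the master node.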
Bryan Tanoue:
Snippet of etcd.yaml:
Bryan Tanoue:
  - hostPath:
      path: /var/lib/etcd/my-backup
      type: DirectoryOrCreate
    name: etcd-data
Bryan Tanoue:
Here are the docker logs for the container:
Bryan Tanoue:
2021-06-11 17:16:47.484028 W | etcdserver: failed to apply request "header:<ID:7587855070753099372 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-scheduler-kubemaster.1687964b3116b19e\" mod_revision:61845 > success:<request_put:<key:\"/registry/events/kube-system/kube-scheduler-kubemaster.1687964b3116b19e\" value_size:665 lease:7587855070661415016 >> failure:<request_range:<key:\"/registry/events/kube-system/kube-scheduler-kubemaster.1687964b3116b19e\" > >>" with response "" took (82.914µs) to execute, err is lease not found
2021-06-11 17:16:47.885029 W | etcdserver: failed to apply request "header:<ID:7587855070753099375 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" value_size:698 lease:7587855070661415016 >> failure:<>>" with response "" took (45.58µs) to execute, err is lease not found
2021-06-11 17:16:48.285410 W | etcdserver: failed to apply request "header:<ID:7587855070753099378 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" value_size:698 lease:7587855070661415016 >> failure:<>>" with response "" took (32.813µs) to execute, err is lease not found
2021-06-11 17:16:48.682883 W | etcdserver: failed to apply request "header:<ID:7587855070753099380 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" value_size:698 lease:7587855070661415016 >> failure:<>>" with response "" took (35.698µs) to execute, err is lease not found
2021-06-11 17:16:48.883419 W | etcdserver: failed to apply request "header:<ID:7587855070753099381 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec171ab29\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec171ab29\" value_size:671 lease:7587855070661415016 >> failure:<>>" with response "" took (34.04µs) to execute, err is lease not found
2021-06-11 17:16:49.085105 W | etcdserver: failed to apply request "header:<ID:7587855070753099382 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec2e94320\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec2e94320\" value_size:625 lease:7587855070661415016 >> failure:<>>" with response "" took (809.937µs) to execute, err is lease not found
2021-06-11 17:16:49.125564 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-06-11 17:16:49.305458 W | etcdserver: failed to apply request "header:<ID:7587855070753099384 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec9938908\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec9938908\" value_size:625 lease:7587855070661415016 >> failure:<>>" with response "" took (70.839µs) to execute, err is lease not found
2021-06-11 17:16:59.114768 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-06-11 17:20:44.135847 I | mvcc: store.index: compact 62419
2021-06-11 17:20:44.175159 I | mvcc: finished scheduled compaction at 62419 (took 38.944774ms)
Mayur Sharma:
Those are the exact steps I follow. How long did you wait for the etcd container to come up?
I hope you are running all the commands on the master node only.
Bryan Tanoue:
Master node only.
SaidBen:
$ systemctl daemon-reload && systemctl restart kubelet
Bryan Tanoue:
Actually, I just read this: https://github.com/mmumshad/kubernetes-the-hard-way/blob/master/practice-questions-answers/cluster-maintenance/backup-etcd/etcd-backup-and-restore.md
Bryan Tanoue:
If you do change --data-dir to /var/lib/etcd-from-backup in the YAML file, make sure that the volumeMounts for etcd-data is updated as well, with the mountPath pointing to /var/lib/etcd-from-backup (THIS COMPLETE STEP IS OPTIONAL AND NEED NOT BE DONE FOR COMPLETING THE RESTORE)
Bryan Tanoue:
So when I changed etcd.yaml, the pod was recreated, since it is a static pod. However, it just stayed in Pending.
Bryan Tanoue:
I deleted the pod, let it be recreated, and then it went into Running.
Bryan Tanoue:
I’m happy it works, but I don’t know why.
Bryan Tanoue:
Note2: If the etcd pod is not getting Ready 1/1, then restart it by kubectl delete pod -n kube-system etcd-controlplane and wait 1 minute.
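Editor's note:
The "wait 1 minute" step quoted above can be scripted as a small polling loop instead of a fixed sleep. This is only a sketch: `wait_for` and `etcd_ready` are hypothetical helpers, and the pod name `etcd-kubemaster` is an assumption based on the node name that appears in the logs earlier in the thread (the quoted note uses `etcd-controlplane`). Retrying around `kubectl wait` is useful here because the API server may be briefly unreachable while etcd restarts:

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds after each failed attempt. Hypothetical helper.
wait_for() {
  wf_tries=$1
  wf_delay=$2
  shift 2
  wf_i=0
  while [ "$wf_i" -lt "$wf_tries" ]; do
    if "$@"; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Assumed pod name: etcd-<node name>, i.e. etcd-kubemaster for this cluster.
etcd_ready() {
  kubectl -n kube-system wait --for=condition=Ready pod/etcd-kubemaster --timeout=5s
}

# Example (not executed here): up to 12 attempts, 5 seconds apart.
# wait_for 12 5 etcd_ready
```

If the loop gives up, deleting the mirror pod as the quoted note suggests and running the loop again is the next thing to try.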
SaidBen:
Yup, you need to update the path in three fields in etcd.yaml.
Bryan Tanoue:
I thought it was just the hostPath?
Bryan Tanoue:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.1.20:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.1.20:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.1.20:2380
    - --initial-cluster=kubemaster=https://192.168.1.20:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.20:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.1.20:2380
    - --name=kubemaster
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.4.13-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        ephemeral-storage: 100Mi
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd/my-backup
      type: DirectoryOrCreate
    name: etcd-data
status: {}
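Editor's note:
On the "three fields" question: `--data-dir` and the `etcd-data` `mountPath` are paths inside the container and have to agree with each other, while the `etcd-data` `hostPath` is the directory on the node that gets mounted there. The manifest above changes only the `hostPath`, which is consistent with that: etcd still reads `/var/lib/etcd` inside the container, and that path now maps to the restored directory on the host. A rough sketch for printing the three values side by side; `etcd_paths` is a hypothetical helper, its awk rules assume the kubeadm-style layout shown in the thread, and the sample file is a trimmed copy of that manifest:

```shell
#!/bin/sh
# etcd_paths MANIFEST: print the three path fields tied to etcd's data volume.
# Hypothetical helper; relies on the field order of a kubeadm-style manifest.
etcd_paths() {
  awk '
    /--data-dir=/                      { sub(/.*--data-dir=/, ""); print "data-dir=" $0; next }
    $1 == "-" && $2 == "mountPath:"    { last = "mountPath=" $3 }
    $1 == "-" && $2 == "hostPath:"     { in_host = 1 }
    in_host && $1 == "path:"           { last = "hostPath=" $2; in_host = 0 }
    $1 == "name:" && $2 == "etcd-data" { print last }
  ' "$1"
}

# Trimmed sample carrying the same values as the manifest in the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
spec:
  containers:
  - command:
    - etcd
    - --data-dir=/var/lib/etcd
    livenessProbe:
      httpGet:
        path: /health
    name: etcd
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd/my-backup
      type: DirectoryOrCreate
    name: etcd-data
EOF

etcd_paths "$sample"
# data-dir=/var/lib/etcd
# mountPath=/var/lib/etcd
# hostPath=/var/lib/etcd/my-backup
```

On a kubeadm node the same function could be pointed at `/etc/kubernetes/manifests/etcd.yaml`, assuming the default static pod path. The first two values should match each other, and the third should be the directory the snapshot was restored into.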
Bryan Tanoue:
I did another test, this time repeating exactly the same steps from my command history. I just created a new backup and a new path, then edited etcd.yaml, and this time it worked fine without deleting the pod.