Need help. No matter what I have done, I backup and restore etcd but the pod is . . .

Bryan Tanoue:
Need help. No matter what I do, I back up and restore etcd but the pod is always in Pending. I'll put the commands I used in this thread.

Bryan Tanoue:

sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save my-backup
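
The snapshot file itself can be sanity-checked before going further; assuming it was written to the working directory as above, something like this should print its hash, revision and key count:

sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status my-backup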

Bryan Tanoue:

sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --data-dir=/var/lib/etcd/my-backup snapshot restore my-backup
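
If the restore succeeded, the new data directory should already contain a member/ subdirectory with the snap and wal folders; a quick way to confirm, using the same path as above:

sudo ls /var/lib/etcd/my-backup/member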

Bryan Tanoue:
snip of etcd.yaml

Bryan Tanoue:

- hostPath:
      path: /var/lib/etcd/my-backup
      type: DirectoryOrCreate
    name: etcd-data

Bryan Tanoue:
Here are the docker logs for the container

Bryan Tanoue:

2021-06-11 17:16:47.484028 W | etcdserver: failed to apply request "header:<ID:7587855070753099372 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-scheduler-kubemaster.1687964b3116b19e\" mod_revision:61845 > success:<request_put:<key:\"/registry/events/kube-system/kube-scheduler-kubemaster.1687964b3116b19e\" value_size:665 lease:7587855070661415016 >> failure:<request_range:<key:\"/registry/events/kube-system/kube-scheduler-kubemaster.1687964b3116b19e\" > >>" with response "" took (82.914µs) to execute, err is lease not found
2021-06-11 17:16:47.885029 W | etcdserver: failed to apply request "header:<ID:7587855070753099375 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" value_size:698 lease:7587855070661415016 >> failure:<>>" with response "" took (45.58µs) to execute, err is lease not found
2021-06-11 17:16:48.285410 W | etcdserver: failed to apply request "header:<ID:7587855070753099378 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" value_size:698 lease:7587855070661415016 >> failure:<>>" with response "" took (32.813µs) to execute, err is lease not found
2021-06-11 17:16:48.682883 W | etcdserver: failed to apply request "header:<ID:7587855070753099380 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-apiserver-kubemaster.16879695ca899307\" value_size:698 lease:7587855070661415016 >> failure:<>>" with response "" took (35.698µs) to execute, err is lease not found
2021-06-11 17:16:48.883419 W | etcdserver: failed to apply request "header:<ID:7587855070753099381 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec171ab29\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec171ab29\" value_size:671 lease:7587855070661415016 >> failure:<>>" with response "" took (34.04µs) to execute, err is lease not found
2021-06-11 17:16:49.085105 W | etcdserver: failed to apply request "header:<ID:7587855070753099382 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec2e94320\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec2e94320\" value_size:625 lease:7587855070661415016 >> failure:<>>" with response "" took (809.937µs) to execute, err is lease not found
2021-06-11 17:16:49.125564 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-06-11 17:16:49.305458 W | etcdserver: failed to apply request "header:<ID:7587855070753099384 username:\"kube-apiserver-etcd-client\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec9938908\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/etcd-kubemaster.1687969ec9938908\" value_size:625 lease:7587855070661415016 >> failure:<>>" with response "" took (70.839µs) to execute, err is lease not found
2021-06-11 17:16:59.114768 I | etcdserver/api/etcdhttp: /health OK (status code 200)
2021-06-11 17:20:44.135847 I | mvcc: store.index: compact 62419
2021-06-11 17:20:44.175159 I | mvcc: finished scheduled compaction at 62419 (took 38.944774ms)

Mayur Sharma:
What exact steps did you follow, and how much time did you wait for the etcd container to come up?
Hope you are running all the commands on the master node only.

Bryan Tanoue:
Master node only.

SaidBen:
$ systemctl daemon-reload && systemctl restart kubelet
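
After that, it can help to watch for the etcd container and pod to come back before deciding it is stuck, e.g.:

sudo docker ps | grep etcd
kubectl -n kube-system get pods -w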

Bryan Tanoue:
Actually, I just read this: https://github.com/mmumshad/kubernetes-the-hard-way/blob/master/practice-questions-answers/cluster-maintenance/backup-etcd/etcd-backup-and-restore.md

Bryan Tanoue:

If you do change --data-dir to /var/lib/etcd-from-backup in the YAML file, make sure that the volumeMounts for etcd-data is updated as well, with the mountPath pointing to /var/lib/etcd-from-backup (THIS COMPLETE STEP IS OPTIONAL AND NEED NOT BE DONE FOR COMPLETING THE RESTORE)
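
If you did take that optional route, the sketch below shows roughly how the three related places in etcd.yaml would line up (the /var/lib/etcd-from-backup path is just the example from that doc, not what was used here):

    - --data-dir=/var/lib/etcd-from-backup
    # ...
    volumeMounts:
    - mountPath: /var/lib/etcd-from-backup
      name: etcd-data
    # ...
  volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data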

Bryan Tanoue:
So when I changed etcd.yaml, the pod got recreated since it is static. However, it just stayed in Pending.

Bryan Tanoue:
I deleted the pod and let it get recreated, and then it went into Running.

Bryan Tanoue:
I’m happy it works, but I don’t know why.

Bryan Tanoue:

Note2: If the etcd pod is not getting Ready 1/1, then restart it by kubectl delete pod -n kube-system etcd-controlplane and wait 1 minute.
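
In this cluster the static pod name follows the node name rather than "controlplane", so the equivalent command would presumably be:

kubectl -n kube-system delete pod etcd-kubemaster
kubectl -n kube-system get pods -w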

SaidBen:
Yup, you need to update the path for three fields in etcd.yaml

Bryan Tanoue:
I thought it was just the hostPath?

Bryan Tanoue:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.1.20:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.1.20:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.1.20:2380
    - --initial-cluster=kubemaster=https://192.168.1.20:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.20:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.1.20:2380
    - --name=kubemaster
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.4.13-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        ephemeral-storage: 100Mi
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd/my-backup
      type: DirectoryOrCreate
    name: etcd-data
status: {}
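
This probably also explains why only the hostPath change was needed: the etcd-data volume mounts the host directory /var/lib/etcd/my-backup into the container at /var/lib/etcd, so --data-dir=/var/lib/etcd inside the container already points at the restored data. One way to double-check which host directory is actually mounted (the container ID is a placeholder, taken from docker ps):

sudo docker ps | grep etcd
sudo docker inspect --format '{{ json .Mounts }}' <etcd-container-id>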

Bryan Tanoue:
Did another test, this time doing the exact same thing from my command history. I just created a new backup and a new path, edited etcd.yaml, and this time it worked fine without deleting the pod.
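
As a final check after a restore like this, it may be worth confirming the member is healthy and the cluster objects are back, reusing the same certs as the backup command:

sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint health
kubectl get pods -A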