*Hello all, I saw a lot of ETCD backup process posts here but none of them are c . . .

Sarma Pasumarthi:
Hello all, I saw a lot of ETCD backup process posts here, but none of them are conclusive. So I went ahead and prepared a set of steps (keeping the exam console window in mind).
Please suggest any changes if required. Thank you.

  1. Make sure you are on the master/control plane node
  2. Identify etcd pod by running the following command:
        k get pods --all-namespaces
  1. Run the following command to describe etcd pod:
          k -n kube-system describe pod <etcd-pod-name-here>
  1. From this output, identify and copy the following lines (under the “Command:” section) and paste them into the notepad in the exam window:
    a) --cert-file=<value>
    b) --data-dir=<value>
    c) --listen-client-urls=<value>
    d) --key-file=<value>
    e) --peer-trusted-ca-file=<value>
  2. Now, at command prompt, issue this command to get help on “etcdctl snapshot save”. Notice the “-h” option
        ETCDCTL_API=3 etcdctl snapshot save -h
  1. From the help output, identify and copy (to notepad) the necessary flags that should be passed along:
    a) --cacert=
    b) --cert=
    c) --endpoints=
    d) --key=
  2. Based on this information, build the “snapshot save” command in notepad. It should look something like this. You can get more info from kubernetes documentation tab:
        ETCDCTL_API=3 etcdctl --endpoints <listen-client-urls> --cert=<cert-file> --cacert=<peer-trusted-ca-file>  --key=<key-file> snapshot save <path-from-exam-question>
  1. Once you replace values, it looks like this:
        ETCDCTL_API=3 etcdctl --endpoints <> --cert=/etc/kubernetes/pki/etcd/server.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt  --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/snapshot-pre-boot.db
  1. Run this command and make sure the snapshot file is created in the target folder
  2. Now, to restore, use the help screen “ETCDCTL_API=3 etcdctl snapshot restore -h”. From the help output, identify the additional restore flags and add them to the save command you created above, replacing “snapshot save” with “snapshot restore”.
    So, here’s the final command:
        ETCDCTL_API=3 etcdctl --endpoints <> --cert=/etc/kubernetes/pki/etcd/server.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt  --key=/etc/kubernetes/pki/etcd/server.key --data-dir=/var/lib/etcd-from-backup --initial-advertise-peer-urls="<>" --initial-cluster="default=<>" --initial-cluster-token="etcd-cluster-1" snapshot restore /opt/snapshot-pre-boot.db
  1. Once done, navigate to the --data-dir path and make sure a new directory named “member” has been created
  2. The final task is to make changes in the “/etc/kubernetes/manifests/etcd.yaml” file:
    a) Under the “command” section, change the --data-dir value to the --data-dir value you used in the above command (i.e. /var/lib/etcd-from-backup)
    b) Under the “command” section, add a new entry: - --initial-cluster-token=etcd-cluster-1 (you can find this item in the restore command above; copy and paste it as is, but add the leading hyphen (-) that a YAML list entry needs)
    c) Under the “volumes” section, in “hostPath”, change the directory path to the --data-dir path you used in the above command (i.e. /var/lib/etcd-from-backup)
    d) Under the “volumeMounts” section, change the mount path to the --data-dir path as well (i.e. /var/lib/etcd-from-backup)
  3. Since this etcd pod is deployed as a static pod, the kubelet should automatically detect the changes and restart it
        Check the status of the newly created container using: watch "docker ps -a | grep etcd" (you will see that the etcd container started a few seconds ago and is running)
  1. Optionally, you can use this command to check: ETCDCTL_API=3 etcdctl member list
  2. Finally, execute any kubectl commands and make sure you are getting results.
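
The flag lookup in the steps above can also be scripted instead of copied by hand. Here is a minimal sketch, assuming a kubeadm-style manifest where each flag sits on its own `- --name=value` line. `etcd_flag` and `build_save_cmd` are made-up helper names, not part of etcdctl. Note one assumption: the sketch reads `--trusted-ca-file` (the client CA) where the steps use `--peer-trusted-ca-file`; in a default kubeadm setup both point to the same ca.crt. It only prints the command, so you can review it before running anything.

```shell
#!/bin/sh
# Sketch: read flag values from an etcd static pod manifest and print
# (not run) the matching "snapshot save" command.
# etcd_flag and build_save_cmd are hypothetical helper names.

# Print the value of a "- --<flag>=<value>" line from the manifest.
# $1 = manifest path, $2 = flag name without the leading dashes.
etcd_flag() {
    sed -n "s/^[[:space:]]*-[[:space:]]*--$2=//p" "$1" | head -n 1
}

# Print the etcdctl command for a snapshot written to $2.
# $1 = manifest path, $2 = snapshot target path.
build_save_cmd() {
    manifest=$1
    target=$2
    printf 'ETCDCTL_API=3 etcdctl --endpoints=%s --cacert=%s --cert=%s --key=%s snapshot save %s\n' \
        "$(etcd_flag "$manifest" listen-client-urls)" \
        "$(etcd_flag "$manifest" trusted-ca-file)" \
        "$(etcd_flag "$manifest" cert-file)" \
        "$(etcd_flag "$manifest" key-file)" \
        "$target"
}

# Example (commented out so the block has no side effects):
# build_save_cmd /etc/kubernetes/manifests/etcd.yaml /opt/snapshot-pre-boot.db
```

Printing rather than executing keeps it close to the "build it in notepad first" approach from the steps.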

One improvement you can make on this: instead of waiting indefinitely for the cluster to come back up after a restore, you can do the following:

# watch docker ps

You should see the etcd container restart and, after about 30 seconds, the API server container restart as well. Once the API server container is up and running, you can start to issue:

$ kubectl ...
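
If you would rather not stare at the watch output, the waiting can be put in a small retry loop. This is only a sketch: `wait_until` is a hypothetical helper name, and the attempt count and interval in the commented example are arbitrary values, not anything recommended by the course.

```shell
#!/bin/sh
# wait_until is a hypothetical helper: run a command repeatedly until it
# succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
wait_until() {
    attempts=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Example (commented out so the block has no side effects):
# wait_until 60 5 kubectl get nodes && echo "API server is answering again"
```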


Andrey Tsediakov:
Question: I cannot find a FULL example of ETCD backup in the docs accessible during the exam. I don't think that memorizing all the steps is the right way to go :neutral_face: . Any advice?

No. But the help can give you all the parameters you need and you can copy/paste their values from etcd.yaml. For the help:

$ ETCDCTL_API=3 etcdctl snapshot restore -h

Sarma Pasumarthi:
That's the exact reason I mentioned the steps to get the commands from the help screens…

Andrey Tsediakov:
I don't see that it gives me the keys ETCD is asking for …

Andrey Tsediakov:
controlplane $ ETCDCTL_API=3 etcdctl snapshot restore -h

Command 'etcdctl' not found, but can be installed with:

snap install etcd # version 3.4.5, or
apt install etcd-client

See 'snap info etcd' for additional versions.

controlplane $

Andrey Tsediakov:
My bad, I hadn't installed etcdctl :slightly_smiling_face:

Sarma Pasumarthi:
Looks like your OS doesn’t have etcdctl installed. Can you go ahead and install it?

Andrey Tsediakov:
Now it's working …
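
A quick pre-check avoids the "command not found" surprise in the middle of the steps. A minimal sketch follows; `have_cmd` and `check_etcdctl` are hypothetical helper names, and the install hint simply repeats what the shell suggested in the output above.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether etcdctl is available before starting the backup steps.
check_etcdctl() {
    if have_cmd etcdctl; then
        echo "etcdctl found"
    else
        # Hint taken from the shell output quoted in this thread.
        echo "etcdctl not found (the shell suggested: apt install etcd-client)"
    fi
}
```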

Sarma Pasumarthi:
Great… Please follow steps and suggest any changes that are helpful to others…

@Andrey Tsediakov In general, try to install the version that mirrors the version used by etcd. I’ve run into scenarios where a restore failed because the versions were different (and it didn’t even warn me so I spent hours troubleshooting it)
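
To make the version comparison less error-prone, something like the sketch below could be used. It assumes the etcd image reference ends in a tag like `3.4.3-0` and that `etcdctl version` prints a first line of the form `etcdctl version: 3.4.3`; the helper names and the sample image string in the comments are illustrative only.

```shell
#!/bin/sh
# All three function names are hypothetical helpers.

# Extract "3.4.3" from an image reference such as "k8s.gcr.io/etcd:3.4.3-0".
image_version() {
    printf '%s\n' "$1" | sed -n 's/.*:\([0-9][0-9.]*\).*/\1/p'
}

# Extract "3.4.3" from "etcdctl version" output, whose first line is assumed
# to look like "etcdctl version: 3.4.3".
client_version() {
    printf '%s\n' "$1" | sed -n 's/^etcdctl version: *//p'
}

# Succeed only if the server image and the client report the same version.
# $1 = image reference, $2 = output of "etcdctl version".
versions_match() {
    [ "$(image_version "$1")" = "$(client_version "$2")" ]
}

# Example (commented out; needs the manifest and etcdctl to be present):
# image=$(awk '/image:/ {print $2}' /etc/kubernetes/manifests/etcd.yaml)
# versions_match "$image" "$(ETCDCTL_API=3 etcdctl version)" || echo "version mismatch"
```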

Andrey Tsediakov:
@Sarma Pasumarthi I am wondering why you need:
Restore option worked without it for me …

@Andrey Tsediakov You don’t need them in simple single-node clusters. I think you do when you have multiple nodes in your control plane. In fact, you can get away with this if you have just a single control node but you have to run this on the control node:

ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db --data-dir /var/lib/etcd-backup
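
After a restore like this one, the "member" directory check from the original steps can be done with a one-line test. This is just a sketch; `restore_looks_ok` is a hypothetical helper name, and it only checks that the directory exists, not that the data inside is valid.

```shell
#!/bin/sh
# restore_looks_ok is a hypothetical helper: succeed only if the restore
# created a "member" directory under the given data dir.
# $1 = the --data-dir path passed to "snapshot restore".
restore_looks_ok() {
    [ -d "$1/member" ]
}

# Example (commented out so the block has no side effects):
# restore_looks_ok /var/lib/etcd-backup && echo "member directory is there"
```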

Sarma Pasumarthi:
As I don’t have a multi-node cluster on my Ubuntu VM, I followed the course labs to prepare those instructions… They are good to have, but you can skip them if you are trying this on a single-node cluster

It would have been preferable if the course had covered both scenarios (single node control plane vs multi-node control plane) and when to use the various switches

Rixin Lan:
Step 13: Should we change volumeMounts:
- mountPath: /var/lib/etcd
to the directory path to “/var/lib/etcd-from-backup” ?

Sarma Pasumarthi:
@Rixin Lan if you change the volume that has “hostPath”, both volume mounts are using that volume, so it's a change in one place. Optionally, you can change volumeMounts and leave the volume as it is…
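
For reference, hostPath is the directory on the node and mountPath is where that directory shows up inside the container, so whatever --data-dir says has to match mountPath. One variant that should work, as far as I understand it, is to leave --data-dir and mountPath at /var/lib/etcd and change only the hostPath. Below is a sketch of that single edit; `point_hostpath_at` is a hypothetical helper name, and it prints the result instead of editing the manifest in place so you can review it first.

```shell
#!/bin/sh
# point_hostpath_at is a hypothetical helper. It prints the manifest with the
# hostPath "path: <old>" line changed to "path: <new>" and leaves the file
# itself untouched. mountPath lines are not affected.
# $1 = manifest path, $2 = old host directory, $3 = new host directory.
point_hostpath_at() {
    sed "s|^\([[:space:]]*path: \)$2\$|\1$3|" "$1"
}

# Example (commented out; write to a scratch file and review before copying):
# point_hostpath_at /etc/kubernetes/manifests/etcd.yaml \
#     /var/lib/etcd /var/lib/etcd-from-backup > /tmp/etcd.yaml
```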

Sarma Pasumarthi:
I will try again tomorrow and let you know

Andrey Tsediakov:
@Sarma Pasumarthi when I perform the restore on a 2-node cluster with the optional flags, I am getting an error: