Error in CKA labs - Backup and Restore methods

Hello,

I was attempting the CKA labs in the "Udemy Labs - Certified Kubernetes Administrator with Practice Tests" course. This issue is specifically about the “PRACTICE TEST BACKUP AND RESTORE METHODS” lab under Cluster Maintenance. In the final section, where I am supposed to restore etcd, the control plane fails to come back up after I modify the etcd.yaml file. After a couple of minutes, I keep getting this error whenever I run any kubectl command:

The connection to the server controlplane:6443 was refused - did you specify the right host or port?

I modified the etcd.yaml file by setting the etcd-data volume location to /var/lib/etcd-from-backup. This is also the data-dir that I specified when running ‘etcdctl snapshot restore’. Eventually, I gave up, restarted the lab, and tried following the given solution in the Solution tab. However, after the step where I modify the etcd.yaml file, I ran into the exact same error, so the expected solution to the lab’s final question does not work for me.
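For anyone hitting the same thing, a rough sketch of checks that should narrow it down while the error persists (this assumes a kubeadm-style lab; whether you use docker or crictl depends on the node's container runtime, and the container ID below is a placeholder):

# Is the kubelet itself running? It is what (re)starts the static pods.
systemctl status kubelet

# Is the etcd container running, restarting, or exited?
crictl ps -a | grep etcd          # or: docker ps -a | grep etcd

# If etcd keeps exiting, its logs usually point at a data-dir or volume problem
crictl logs <etcd-container-id>   # or: docker logs <etcd-container-id>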

Could someone please help? I’m not certain why this error would be occurring.

I don’t know if it’s related or not, but when I checked the /etc/systemd/system folder, there didn’t seem to be a kube-apiserver service.
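As far as I can tell, that part is expected on a kubeadm cluster: kube-apiserver runs as a static pod rather than a systemd service, so only the kubelet shows up as a unit. A quick way to confirm (paths assume the standard kubeadm layout used in these labs):

# The control plane components are static pod manifests watched by the kubelet
ls /etc/kubernetes/manifests
# typically: etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# Only the kubelet itself is managed by systemd
systemctl list-units --type=service | grep -i kube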


Hello @alexander.vendryes,
This should be working fine.
First, restore the snapshot:

root@controlplane:~# ETCDCTL_API=3 etcdctl  --data-dir /var/lib/etcd-from-backup \
snapshot restore /opt/snapshot-pre-boot.db


2022-03-25 09:19:27.175043 I | mvcc: restore compact to 2552
2022-03-25 09:19:27.266709 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
root@controlplane:~# 

Note: In this case, we are restoring the snapshot to a different directory but on the same server where we took the backup (the controlplane node). As a result, the only required option for the restore command is --data-dir.
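A quick sanity check, not part of the official solution but useful before touching the manifest, is to confirm the restore created a fresh member directory under the new data-dir:

# etcdctl snapshot restore writes a new member/ directory under --data-dir
ls -l /var/lib/etcd-from-backup
# expected to contain: member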

Next, update the /etc/kubernetes/manifests/etcd.yaml:

We have now restored the etcd snapshot to a new path on the controlplane - /var/lib/etcd-from-backup - so the only change to be made in the YAML file is to change the hostPath for the volume called etcd-data from the old directory (/var/lib/etcd) to the new directory (/var/lib/etcd-from-backup).

  volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data

With this change, /var/lib/etcd on the container points to /var/lib/etcd-from-backup on the controlplane (which is what we want).
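To see the container-side half of that mapping, the volumeMounts entry for etcd-data in the same manifest still mounts at /var/lib/etcd; a quick grep sketch (output indentation trimmed):

# The mount path inside the container stays /var/lib/etcd
grep -A 1 "mountPath: /var/lib/etcd" /etc/kubernetes/manifests/etcd.yaml
# - mountPath: /var/lib/etcd
#   name: etcd-data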

When this file is updated, the ETCD pod is automatically re-created as this is a static pod placed under the /etc/kubernetes/manifests directory.

Note 1: As the ETCD pod has changed, it will automatically restart, along with kube-controller-manager and kube-scheduler. Wait 1-2 minutes for these pods to restart. You can run a watch "docker ps | grep etcd" command to see when the ETCD pod is restarted.
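If the node uses containerd rather than Docker, the equivalent check would be something like:

# On containerd-based nodes, use crictl instead of docker
watch "crictl ps | grep etcd"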

Note 2: If the etcd pod does not reach Ready 1/1, restart it with kubectl delete pod -n kube-system etcd-controlplane and wait 1 minute.

Note 3: This is the simplest way to make sure that ETCD uses the restored data after the ETCD pod is recreated. You don’t have to change anything else.

If you do change --data-dir to /var/lib/etcd-from-backup in the YAML file, make sure that the volumeMounts for etcd-data is updated as well, with the mountPath pointing to /var/lib/etcd-from-backup. (THIS COMPLETE STEP IS OPTIONAL AND IS NOT REQUIRED TO COMPLETE THE RESTORE.)
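To make that optional variant concrete, here is a sketch of the three places in /etc/kubernetes/manifests/etcd.yaml that would all have to agree if you did change --data-dir as well (checked together with a single grep; output trimmed):

# All three must point to the same path if --data-dir is changed too
grep -E "data-dir|mountPath: /var/lib|path: /var/lib" /etc/kubernetes/manifests/etcd.yaml
# --data-dir=/var/lib/etcd-from-backup
# mountPath: /var/lib/etcd-from-backup
# path: /var/lib/etcd-from-backup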

Thanks,
KodeKloud Support

Thanks, KodeKloud support.

I think you changed your initial message. Before, it contained a link to “GitHub - mmumshad/kubernetes-cka-practice-test-solution-etcd-backup-and-restore: This is the solution to the practice test for backing up and restoring an ETCD Cluster”. Following that old solution actually worked in the lab for me; I did have to change ‘/tmp/snapshot-pre-boot.db’ to ‘/opt/snapshot-pre-boot.db’, but otherwise copying and pasting the solution from that link worked out.

FYI, I did try the solution you posted above first. It is the same solution as in the Solution tab in the lab. Following those instructions still resulted in the ‘connection to the server controlplane:6443 was refused’ error, so that didn't work for me. I wonder why? Is it because it does not specify an ‘--initial-cluster-token’ argument? That argument was in the GitHub link's solution (see the sketch after the list below), despite:

  • that argument not appearing in the etcd.yaml manifest file in the lab
  • that argument not being referenced in the Certified Kubernetes Administrator (CKA) with Practice Tests course - Backup and Restore Methods lecture.
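For reference, a restore invocation in the style of that older GitHub solution would look roughly like this; only --data-dir is actually required for this lab, and the name, token, and URLs below are illustrative single-node defaults rather than values copied from the repo:

ETCDCTL_API=3 etcdctl snapshot restore /opt/snapshot-pre-boot.db \
  --data-dir /var/lib/etcd-from-backup \
  --name controlplane \
  --initial-cluster controlplane=https://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls https://127.0.0.1:2380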

You can skip the --initial-cluster-token argument. Kindly check the steps in the attached gif:
(gif: etcd-backup1)

I restarted kubelet, and the ‘connection to the server controlplane:6443 was refused’ error was resolved.


Yes, the final step for me was to run:

systemctl daemon-reload
systemctl restart kubelet

Hello @headkaze,
Thanks for sharing!