Exam failed: blocking questions on etcd restore and cluster upgrade

hisseinsouleyman · January 16, 2021, 12:14pm

Hi All

I am contacting you because I have taken the CKA exam twice and unfortunately failed both times.

I scored 40% on the first attempt and 57% on the second.

I need your help with three issues that prevent me from passing the new version of the CKA certification.

The first two are the questions on the cluster upgrade and the etcd restore.

The operations do not work because the files admin.conf, kubelet.conf, config.yaml and etcd.yaml are not present in their default directories, and I cannot find them. As a result, I did not succeed in these two tasks.

The third question asks to create a pod which contains a container with multiple images. I don’t know how to do it.

I would appreciate your help so that I can pass this certification.

Thank you for your course, it’s very well done.

Sorry for my English, I am a French speaker.

Best Regards

ramalrg · January 20, 2021, 6:32pm

Steps:

SSH to the master node and run the command below:
[master@k8s-master ~]$ systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2021-01-13 14:16:44 PST; 6 days ago
----Truncated output--------

View "/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf" (the Drop-In from the command above):

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

Look for --config in the file. Open the file that --config points to and look for “staticPodPath”. The ‘etcd.yaml’ file should be in that directory.
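The lookup described above can be sketched in shell. This is only a minimal sketch, assuming a kubeadm-style node where the drop-in passes the kubelet config as --config=<path> and the config file has a top-level staticPodPath key. The function names kubelet_config_path and static_pod_path are hypothetical helpers, not part of kubeadm or the kubelet.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of any Kubernetes tool.

# Print the path passed to --config in a kubelet systemd drop-in file.
kubelet_config_path() {
  local dropin="$1"
  # Match "--config=<path>", stopping at whitespace or a closing quote.
  grep -o -- '--config=[^" ]*' "$dropin" | head -n 1 | cut -d= -f2
}

# Print the staticPodPath value from a kubelet config file.
static_pod_path() {
  local config="$1"
  # Assumes an unquoted "staticPodPath: <dir>" line at the top level.
  awk -F': *' '$1 == "staticPodPath" { print $2 }' "$config"
}
```

On a node set up like the one above, running static_pod_path "$(kubelet_config_path /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf)" should print the directory that holds etcd.yaml, if etcd runs as a static pod there.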

hisseinsouleyman · January 25, 2021, 9:51am

Thanks for your answer, I will try it.

ramalrg · February 1, 2021, 4:49pm

In the CKA exam, etcd is installed on the edge node, and the question asks you to take the etcd backup and restore it on the edge node only. Keep this in mind when attempting the question.

Some tips to check etcd server on edge node:

Step 1: check the etcd server status:

cloud@edge-node:~$ systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-01-29 21:58:40 CST; 2 days ago
Docs: https://github.com/coreos
Main PID: 1299767 (etcd)
Tasks: 10 (limit: 4615)
Memory: 8.3M
CGroup: /system.slice/etcd.service
└─1299767 /usr/bin/etcd --name master-1 --cert-file=/etc/etcd/etcd.crt --key-file=/etc/etcd/etcd.key --peer-cert-file=/etc/etcd/etcd.cr>

Step 2: cat /etc/systemd/system/etcd.service

[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
ExecStart=/usr/bin/etcd \
  --name master-1 \
  --cert-file=/etc/etcd/etcd.crt \
  --key-file=/etc/etcd/etcd.key \
  --peer-cert-file=/etc/etcd/etcd.crt \
  --peer-key-file=/etc/etcd/etcd.key \
  --trusted-ca-file=/etc/etcd/ca.crt \
  --peer-trusted-ca-file=/etc/etcd/ca.crt \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://11.0.0.79:2380 \
  --listen-peer-urls https://11.0.0.79:2380 \
  --listen-client-urls https://11.0.0.79:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://11.0.0.79:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster master-1=https://11.0.0.79:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Step 3: You will see the certificate and data dir paths on the edge node. Keep in mind that the certificate paths will be provided in the question itself.

Step 4: The backup and restore commands from the course should work on the edge node.

Step 5: When you restore to a different --data-dir, make sure to change the owner of the folder.
For example, if the new data-dir is /var/lib/etcd-from-back-up, run the command below to change the ownership:
chown -R etcd.etcd /var/lib/etcd-from-back-up

Step 6: Finally, run “systemctl daemon-reload”, “systemctl restart etcd” and “systemctl enable etcd”.
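Steps 4 to 6 can be sketched as a small script. This is a rough sketch under stated assumptions, not the exam's exact solution: the certificate paths are taken from the unit file shown in Step 2 (the exam question gives its own), the snapshot path is whatever you pass in, and the run, etcd_backup and etcd_restore names are hypothetical helpers. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only. With DRY_RUN=1 (default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Save a snapshot of the etcd that listens on 127.0.0.1:2379.
# Certificate paths are assumptions copied from the unit file in Step 2.
etcd_backup() {
  local snapshot="$1"
  run env ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/etcd/ca.crt \
    --cert=/etc/etcd/etcd.crt \
    --key=/etc/etcd/etcd.key \
    snapshot save "$snapshot"
}

# Restore a snapshot into a new data dir and hand it to the etcd user.
etcd_restore() {
  local snapshot="$1" new_dir="$2" owner="$3"
  run env ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir="$new_dir"
  run chown -R "$owner" "$new_dir"
  # Before this point, --data-dir in /etc/systemd/system/etcd.service has to
  # be edited by hand to point at $new_dir; this sketch does not do that.
  run systemctl daemon-reload
  run systemctl restart etcd
}
```

If you are unsure which owner to pass, stat -c '%U:%G' /var/lib/etcd prints the user and group that own the current data dir.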

Hope this helps.

hisseinsouleyman · February 1, 2021, 9:47pm

Thank you very much, it is very clear.
I hope to succeed in this task.
Please, one last question:
is the cluster upgrade done on the master node or on the edge node?

ramalrg · February 1, 2021, 11:29pm

The cluster upgrade is on the master node only. You need to SSH to the master node and perform the upgrade there.
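For reference, a kubeadm upgrade of a control plane node usually looks roughly like the sketch below. It assumes a Debian/Ubuntu node with apt and a kubeadm-managed cluster; the node name, target version and package version are placeholders you pass in, and the run and upgrade_control_plane names are hypothetical helpers. If the packages are held with apt-mark, they would need to be unheld first. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only. With DRY_RUN=1 (default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upgrade one control plane node with kubeadm (to be run on that node).
#   $1 node name, $2 Kubernetes version (e.g. v1.x.y), $3 apt package version
upgrade_control_plane() {
  local node="$1" k8s_version="$2" pkg_version="$3"
  run kubectl drain "$node" --ignore-daemonsets
  run apt-get update
  run apt-get install -y "kubeadm=$pkg_version"
  run kubeadm upgrade plan
  run kubeadm upgrade apply "$k8s_version"
  run apt-get install -y "kubelet=$pkg_version" "kubectl=$pkg_version"
  run systemctl daemon-reload
  run systemctl restart kubelet
  run kubectl uncordon "$node"
}
```

Calling the function in dry-run mode first is a cheap way to review the exact commands before running them with DRY_RUN=0.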

All the best for your exam.

vpd.k8s · February 2, 2021, 9:25am

The --data-dir is restored on the student node (base node). Do we need to copy this directory to the master node, since the etcd service is running on the master node and not on the base node? Also, what about the cluster token details?

hisseinsouleyman · February 2, 2021, 10:10am

Thank you, ramalrg. That is all from me.

ramalrg · February 2, 2021, 6:37pm

In the exam you do not need to copy the --data-dir to the master node. Restart the etcd server on the student node after the restore. I believe that is enough.

vpd.k8s · February 4, 2021, 6:44pm

Thanks for the clarification.

vpd.k8s · February 6, 2021, 3:57pm

I was just thinking: the etcd service/process is running on the master node, right? So how does the process get the restored data? Is NFS configured between the student node and the master node? Please clarify. Thanks.

hisseinsouleyman · February 6, 2021, 7:56pm

I also think that in the exam it will be necessary either to copy the directory from the base node to the master node or to run the restore on the master, so that the directory ends up on the master.

At the same time, I also doubt that etcd is running on the base node.

ramalrg, are you sure that etcd is running as a service on the base node?

ramalrg · February 13, 2021, 5:10pm

Hmmm. The question clearly says that etcd is running at 127.0.0.1:2379. Also, systemctl status etcd on the base node reports that it is active and running. The same command on the master node reports that no etcd is running. So it is evident that etcd is running on the base node. I couldn’t check for NFS between the master and base node because of the time limit in the exam.

The steps in my previous post were tried on my 3-node cluster.

I know this is a little confusing, but this is the solution I put together from various blogs and forums and after trying it out on our local cluster.

hisseinsouleyman · February 13, 2021, 9:23pm

OK, thank you, I understand. I agree, since the question specifies it and you have run the tests.

Best regards

bistub1986 · April 17, 2021, 7:34am

@ramalrg During the etcd restore on the edge/worker node, after the restore command is executed, do we need to update the etcd.yaml static pod file with the new data-dir value? If yes, where can we find this YAML file on the worker node? Or is it not needed at all because etcd is running as a service?

ramalrg · April 27, 2021, 3:18am

Step 1: check the etcd server status:

cloud@edge-node:~$ systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-01-29 21:58:40 CST; 2 days ago
Docs: https://github.com/coreos
Main PID: 1299767 (etcd)
Tasks: 10 (limit: 4615)
Memory: 8.3M
CGroup: /system.slice/etcd.service
└─1299767 /usr/bin/etcd --name master-1 --cert-file=/etc/etcd/etcd.crt --key-file=/etc/etcd/etcd.key --peer-cert-file=/etc/etcd/etcd.cr>

Step 2: cat /etc/systemd/system/etcd.service

[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
ExecStart=/usr/bin/etcd \
  --name master-1 \
  --cert-file=/etc/etcd/etcd.crt \
  --key-file=/etc/etcd/etcd.key \
  --peer-cert-file=/etc/etcd/etcd.crt \
  --peer-key-file=/etc/etcd/etcd.key \
  --trusted-ca-file=/etc/etcd/ca.crt \
  --peer-trusted-ca-file=/etc/etcd/ca.crt \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://11.0.0.79:2380 \
  --listen-peer-urls https://11.0.0.79:2380 \
  --listen-client-urls https://11.0.0.79:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://11.0.0.79:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster master-1=https://11.0.0.79:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Step 3: You will see the data dir path on the edge node.

You can delete the etcd folder in the data-dir before attempting the restore. That way you do not need to edit the YAML file.
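A rough sketch of this in-place variant follows, assuming etcd runs as a systemd service named etcd as shown in Step 1. Note one deliberate change from the advice above: the sketch moves the old data dir aside to a .bak copy instead of deleting it, so it can be put back if the restore fails. The run and etcd_restore_in_place names are hypothetical helpers, and with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only. With DRY_RUN=1 (default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restore a snapshot into the data dir etcd already uses, so the
# --data-dir setting of the service does not have to change.
etcd_restore_in_place() {
  local snapshot="$1" data_dir="$2" owner="$3"
  run systemctl stop etcd
  # Assumption: keep the old data as a .bak copy rather than deleting it.
  run mv "$data_dir" "$data_dir.bak"
  # The restore creates the data dir; it must not exist beforehand.
  run env ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir="$data_dir"
  run chown -R "$owner" "$data_dir"
  run systemctl start etcd
}
```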

Hope that clears it up.

nitin194 · May 22, 2021, 5:50am

Hey @ramalrg

Thank you so much for your detailed explanation. This helped me clear my doubt as well … I was able to upgrade the cluster but was facing issues in restoring etcd. As per the course, I restored the etcd backup in another dir but then I couldn’t find the manifest path and YAML file. Then I tried to look for the kubelet service so that I can find out the manifest folder path if it’s different on this student node. But to my surprise, there was no kubelet service running. I mean service kubelet status or systemctl status kubelet didn’t return any output therefore I was unable to find the config file to look for a manifest folder. Little did I know that the etcd was in fact running as a service.

But do you think it was possible that there was no kubelet service running on that node?

ahmooody · November 1, 2021, 3:43pm

Guys, can anybody explain the above with a sample master node or student node setup, following the steps mentioned, and also show how to edit the data dir in the service file? I would appreciate a prompt response. As far as I know, etcd runs on the master node, so should we take the backup on the master and then restore it on the student/edge node? Any further clarification of the steps would be highly appreciated.

diegoashraf · February 7, 2022, 2:22pm

@ramalrg ,

For example, if I restored the backup to /var/lib/etcd/backup,
in this case, instead of editing the etcd config --data-dir=,
can I copy the restored files to /var/lib/etcd/? Would that work?

<<For example if the new data-dir is /var/lib/etcd-from-back-up issue the below command to change the permissions
chown -R etcd.etcd /var/lib/etcd-from-back-up>>

Is “etcd.etcd” the user name and group of etcdctl? How can we find it?

Thanks in advance …

liuyi647 · April 27, 2022, 11:46pm

This is a million-dollar answer. I used a very complicated solution for this task, which accidentally broke the etcd of the ok8s cluster, which is NOT used in this task.

When I started the task that uses the ok8s cluster, I got “the connection to the server xx.xx.xx.xx 6443 was refused - did you specify the right host or port”. Therefore I could not do it and lost 7%.

Then I found that another cluster, mk8s, was completely down: ‘kubectl get node’ returned nothing. That cluster is shared with the etcd task as well. I lost another 7%.

In total I lost 21% because of this failed task. I don’t think that is fair, because the other two tasks shouldn’t have been impacted.

This guy made a good point.