Worker Node Failure Troubleshooting Exercises

Rupam Bezbarua:
Hi, I need some help with the Worker Node Failure troubleshooting exercises. While working on Question 2, I ran journalctl -u kubelet to look at the logs and got this line:

Apr 30 18:02:57 master kubelet[562]: F0430 18:02:57.860270 562 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml"

However, the file is definitely present at /var/lib/kubelet/config.yaml, so I am not sure where to look to make the kubelet pick up the config file from that path.

Sahil Rahi:
That's because you are running it on the master node. The node that is not working is node01, so ssh into node01 and run the command there.
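
Editor's note:
The hostname printed right after the timestamp in the quoted log line (master) shows which node produced the entry, which is a quick way to confirm where a command was run. As a minimal sketch, assuming the journal lines follow the same layout as the one quoted above, the fatal entries can be filtered out like this. The helper name kubelet_fatal_lines is hypothetical, not part of any tool.

```shell
# Keep only klog FATAL entries (severity letter F followed by the mmdd date)
# from kubelet journal text supplied on stdin.
kubelet_fatal_lines() {
  grep -E 'kubelet\[[0-9]+\]: F[0-9]{4} '
}

# Intended use on the failing node (node01 in this exercise):
#   ssh node01
#   systemctl status kubelet
#   journalctl -u kubelet --no-pager | kubelet_fatal_lines
```

The filter only narrows the output; the full journal is still the place to read the surrounding context.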

Rahul Soni:
Kubelet is present on all nodes.
Can you check the config and verify that the certificate paths and names are correct?

Rupam Bezbarua:
@Sahil Rahi @Rahul Soni Thank you so much for your help. I was indeed running it on the master instead of node01.

Rahul Soni:
That's not wrong: the kubelet runs on the master node too, as it does on every node in the cluster.

Rupam Bezbarua:
@Rahul Soni I am actually stuck on Q3 of the same section. In /etc/kubernetes/kubelet.conf, I can see the following:
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

Rahul Soni:
Check the certificate name in the directory.

Rupam Bezbarua:
It’s using the same file for both the certificate and the key. However, under /var/lib/kubelet/pki/, I see the following:
kubelet-client-2020-05-01-12-47-18.pem
kubelet-client-current.pem
kubelet.crt
kubelet.key
What’s the best way to find out which one is relevant? I tried viewing the details of each .pem and .crt file with openssl x509, but I can’t work out which one is the right one.
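
Editor's note:
One way to narrow down the files, offered as a sketch rather than the exercise's intended answer: on kubeadm-style nodes with certificate rotation, kubelet-client-current.pem is normally a symlink to the dated file, and that file holds both the client certificate and its key, which would explain why kubelet.conf names it twice. The helper name show_links is hypothetical; it assumes GNU readlink is available.

```shell
# For every symlink in a directory, print "name -> resolved target".
show_links() {
  dir=$1
  for f in "$dir"/*; do
    if [ -L "$f" ]; then
      printf '%s -> %s\n' "$(basename "$f")" "$(basename "$(readlink -f "$f")")"
    fi
  done
}

# Intended use on the node:
#   show_links /var/lib/kubelet/pki
# To see who a certificate identifies, print its subject and issuer:
#   openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -subject -issuer
```

A kubelet client certificate is expected to show a subject in the system:nodes group with a common name of the form system:node:<node name>, which helps tell it apart from a serving certificate.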

Rahul Soni:
Use kubelet.crt.

Rahul Soni:
I would urge you to go through the lectures again; it would be helpful for you.
And yes, we are always here to help you.

Rupam Bezbarua:
Yes, I tried kubelet.crt as well. Not working for some reason.

Rupam Bezbarua:
@Rahul Soni Yes, it’s a good idea to go through the entire certificate lectures again. I will do that.
I just can’t wrap my head around what I am missing here. I used kubelet.crt as the certificate and kubelet.key as the key for the node to authenticate itself, then restarted the service, but that didn’t fix it. I will go through the TLS Certs section again.

Sahil Rahi:
Hi Rupam, for cluster issues, first check the deployments and get their logs, then check whether the nodes are Ready. If all nodes are Ready, run journalctl -u kubelet -f on the master node first; if the issue is not there, ssh into each of the other nodes and check the kubelet there with the same command. Also check the pods in kube-system for problems. If the kubelet is not working on the master node and you get an error like "did you specify the right host or port?", run docker ps -a and then docker logs <container-id>. If any of these steps is unfamiliar, check the lectures again. Hope it helps.
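
Editor's note:
The "check whether the nodes are Ready" step can be sketched as a small filter over kubectl get nodes output. This assumes the default column layout (NAME, then STATUS); the helper name not_ready_nodes is hypothetical.

```shell
# Read `kubectl get nodes` output on stdin and print the names of nodes
# whose STATUS column is anything other than exactly "Ready".
# Note: a cordoned node ("Ready,SchedulingDisabled") is also listed.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Intended use from the master node:
#   kubectl get nodes | not_ready_nodes
#   kubectl get pods -n kube-system
# Then, on each node the filter reports:
#   ssh <node>
#   journalctl -u kubelet -f
```

Starting on the node that is reported as not Ready avoids reading the logs of a healthy kubelet first.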

Sahil Rahi:
For the third question, since worker node01 is not working, ssh into node01 and run journalctl -u kubelet -f. You will see the issue.

Rupam Bezbarua:
@Sahil Rahi Yep, I was looking in the wrong place. The journalctl command really helped. The kubelet service was unable to connect to the API server because of a wrong port configured in the kubeconfig file. Once I fixed that and reloaded the service, it seems to be working fine.

Thanks to you both for your help. The third one took the greater part of my day. Apologies for the multiple pings on what really was an oversight on my part.
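
Editor's note:
For the wrong-port case described above, a quick check is to print the server entry from the kubelet's kubeconfig and compare its port with the one the API server listens on (6443 is the kube-apiserver default; the lab's actual value is not given in this thread). A minimal sketch with hypothetical helper names, assuming a plain host:port URL:

```shell
# Print the API server URL(s) listed in a kubeconfig file.
kubeconfig_server() {
  awk '$1 == "server:" { print $2 }' "$1"
}

# Print only the port of each server URL.
# A URL without an explicit port is printed unchanged.
kubeconfig_port() {
  kubeconfig_server "$1" | sed -E 's#^https?://[^:/]+:([0-9]+).*#\1#'
}

# Intended use on the node:
#   kubeconfig_port /etc/kubernetes/kubelet.conf
# After correcting the file:
#   systemctl restart kubelet
#   journalctl -u kubelet -f
```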