etcd Backup and Restore Issue: Restoring etcd causes invalid bearer token in kube-apiserver

roy.joachim.jabs · July 7, 2020, 9:47am

Hi,

I have an issue with the Practice Test (.117).

When I restore etcd and restart the kube-apiserver, I get spammed with the message "Unable To Authenticate Request due to an error: Invalid Bearer Token [Token has been invalidated]". (Sorry, I can't copy and paste from the console here.)

I've now tried this test 4-5 times and tried different ways of restarting the Kubernetes control plane, and I'm at my wits' end. (In real life I would just tear the cluster down and set it up from scratch at this point, and I would avoid etcd backups altogether in favor of a repository that holds the YAML configs.)

Any idea?

Tej-Singh-Rana · July 7, 2020, 9:56am

Can you share your steps? Not everything, just the commands.

roy.joachim.jabs · July 7, 2020, 10:10am

I think I figured it out: I'm simply not supposed to restart the kube-apiserver. I watched the solution video (which I normally never do) and saw that, contrary to the lesson, the kube-apiserver was not stopped and restarted. (Since there was no service or unit file for it, I had stopped it by stopping kubelet and then stopping the container with docker stop. After the etcd restore finished, I restarted kubelet, which automatically started the API server again.)

Tej-Singh-Rana · July 7, 2020, 10:12am

kubelet watches the static Pod manifest files. If a manifest changes, kubelet restarts that Pod so the change is applied in the cluster.
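To make that mechanism concrete, here is a simplified Python sketch of the idea. It is not kubelet's actual implementation. It assumes a static Pod directory such as `/etc/kubernetes/manifests` (the kubeadm default), fingerprints each manifest by its SHA-256 hash, and reports which files were added, changed, or removed between two polls. A watcher like kubelet would react to a "changed" entry by recreating that Pod. The function names `snapshot_manifests` and `diff_manifests` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_manifests(manifest_dir: str) -> dict[str, str]:
    """Map each visible file in manifest_dir to a SHA-256 hash of its contents."""
    hashes = {}
    for path in sorted(Path(manifest_dir).iterdir()):
        # Skip subdirectories and hidden files (e.g. editor swap files).
        if path.is_file() and not path.name.startswith("."):
            hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def diff_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify manifests as added, changed, or removed between two snapshots."""
    return {
        "added": sorted(n for n in after if n not in before),
        "changed": sorted(n for n in after if n in before and after[n] != before[n]),
        "removed": sorted(n for n in before if n not in after),
    }


if __name__ == "__main__":
    # Demo in a throwaway directory; the manifest contents are placeholders.
    with tempfile.TemporaryDirectory() as d:
        manifest = Path(d) / "kube-apiserver.yaml"
        manifest.write_text("image: example:v1\n")
        first = snapshot_manifests(d)
        manifest.write_text("image: example:v2\n")  # simulate an edit
        second = snapshot_manifests(d)
        # A watcher like kubelet would recreate the Pod for each "changed" entry.
        print(diff_manifests(first, second))
```

On kubeadm-style clusters this is also why there is no service or unit file for the kube-apiserver to stop: the control-plane components run as static Pods that kubelet manages from these manifests.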

roy.joachim.jabs · July 7, 2020, 10:18am

Yes. But I just double-checked: in the lesson you are explicitly told to stop and restart the kube-apiserver, but that won't work in the lab. I think that should have been mentioned somewhere. I would also expect that restarting the kube-apiserver should not break the environment, and I don't know whether this is an issue with the lab or with Kubernetes itself. What I learned from this lesson is to make sure that all changes to a Kubernetes cluster (if I have to run one) are made with the declarative approach. I will not back up etcd and recover from snapshots in real life, since this seems error-prone, and the last thing I want in production is an unreliable recovery procedure.

ronny1 · June 10, 2022, 1:55pm

Did you ever fix this issue? I got the same error after trying to recover with etcd. I've tried everything; it's quite a big cluster and I really don't want to redo everything.