K3s (SQLite): Promoting a Node to the Control Plane
K3s is a lightweight Kubernetes distribution in which the core components are packaged into a single binary. With the default embedded SQLite datastore, the only way to promote a different node to the control plane is to back up the datastore and restore it on the new node.
These instructions are intended for administrators using the default embedded SQLite datastore. Expect some downtime (~15 minutes). A zero-downtime promotion is possible, but it would require an external database.
Why I Needed to Make A Different Node the Control Plane
I was running my cluster on two Raspberry Pis with MicroSD cards. For those unaware, Kubernetes is relatively write-heavy, and even high-endurance SD cards tend to give out after a year or so. I was running my cluster on a ticking time bomb, and an unexpected SD card failure would mean a total rebuild.
To avoid this, I decided to add a new SSD-backed node to serve as my control plane. As a result, I’m less worried about my SD cards giving out, because it’s a lot easier to remediate a broken worker node than a control plane node.
Requirements
- Existing K3s cluster using the default embedded SQLite datastore
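If you're not sure which datastore your cluster uses, a quick check (assuming the default data directory) is to look for the SQLite file on the current control plane:

# On the existing control plane: confirm the embedded SQLite datastore is in use
sudo ls /var/lib/rancher/k3s/server/db/
# Expect to see state.db; an etcd/ directory here means the cluster uses embedded etcd and these steps don't apply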
Step 1: Take Backups
On the existing control plane server:
# Stop k3s before taking the backup
sudo systemctl stop k3s
# Create a directory to hold the backups
mkdir -p ~/k3s_backup
# Backup the SQLite datastore
sudo cp /var/lib/rancher/k3s/server/db/state.db ~/k3s_backup/
# Backup the server token
sudo cp /var/lib/rancher/k3s/server/token ~/k3s_backup/
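Optionally, verify the copies before tearing anything down — a quick sanity check with sha256sum:

# Optional: confirm the backup copy is intact (the two hashes should match)
sudo sha256sum /var/lib/rancher/k3s/server/db/state.db ~/k3s_backup/state.db
sudo sha256sum /var/lib/rancher/k3s/server/token ~/k3s_backup/token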
Step 2: Wipe K3s from all nodes
Run the following on all nodes (control plane and workers):
sudo k3s-killall.sh
sudo rm -rf /etc/rancher/k3s /var/lib/rancher/k3s
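If you prefer the cleanup scripts bundled with the K3s installer (present if K3s was installed via get.k3s.io, under the default /usr/local/bin location), the equivalent looks roughly like this — note the agent variant on worker nodes:

# On the old control plane (server install)
sudo /usr/local/bin/k3s-uninstall.sh
# On each worker (agent install)
sudo /usr/local/bin/k3s-agent-uninstall.sh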
Step 3: Prepare the NEW control plane node
- SCP the state.db and token files from the old control plane to the new one (any temp directory is fine for now); see the sketch after this list.
- On Raspberry Pis, add the following args to the end of the single line in /boot/firmware/cmdline.txt (without these, the k3s install will fail):
cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=1
- Then reboot:
sudo reboot
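A sketch of the copy step, assuming the backups live in ~/k3s_backup on the old control plane; {USER} and {NEW_CONTROLPLANE_IP} are placeholders for your own login and the new node's address:

# If the backup copies are root-owned, make them readable by your user first
sudo chown $(id -u):$(id -g) ~/k3s_backup/state.db ~/k3s_backup/token
# Copy both files into the new node's home directory
scp ~/k3s_backup/state.db ~/k3s_backup/token {USER}@{NEW_CONTROLPLANE_IP}:~/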
Step 4: Install K3s on new control plane node
Run the K3s installation script from https://docs.k3s.io/quick-start:
curl -sfL https://get.k3s.io | sh -
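If the old cluster was on an older K3s release, it's safer to install the same version on the new node; the install script accepts an INSTALL_K3S_VERSION environment variable for this. The version string below is only an example — use the output of k3s --version from the old node (the wipe in Step 2 removes the data directories, not the binary):

# Optional: pin the install to the version the old control plane was running
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.30.2+k3s1" sh -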
Then:
sudo systemctl stop k3s
# Restore the backups (run from the directory where you copied state.db and token in Step 3)
sudo cp state.db /var/lib/rancher/k3s/server/db/state.db
sudo cp token /var/lib/rancher/k3s/server/token
# Clean up conflicting data - these will be regenerated
sudo rm -rf /var/lib/rancher/k3s/server/tls
sudo rm -f /etc/rancher/k3s/k3s.yaml
sudo rm -f /var/lib/rancher/k3s/server/cred/*
sudo systemctl start k3s
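Before moving on, it's worth confirming that k3s came back up cleanly with the restored datastore — a quick check, assuming systemd:

# Confirm the service is running and scan the logs for certificate or datastore errors
sudo systemctl status k3s --no-pager
sudo journalctl -u k3s -n 50 --no-pager
# Node entries restored from the backup will show NotReady until they are deleted (Step 5) or rejoined (Step 6)
sudo kubectl get nodes -o wide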
Step 5: Remove old control plane metadata
sudo kubectl get nodes -o wide
# Delete the original control plane node before rejoining it as a worker
sudo kubectl delete node {OLD_CONTROLPLANE}
Step 6: Join all worker nodes to the new control plane
Retrieve the node token from the new control plane node:
sudo cat /var/lib/rancher/k3s/server/node-token
On each worker node:
curl -sfL https://get.k3s.io | K3S_URL=https://{CONTROL_PLANE_IP}:6443 K3S_TOKEN={TOKEN} sh -s - agent
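The same command rejoins the old control plane machine as a worker, since it was wiped in Step 2 and its stale node object was deleted in Step 5.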
Step 7: Confirm all nodes are in ready state
sudo kubectl get nodes -o wide
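If nodes take a minute to register, you can block until everything reports Ready — a small sketch using kubectl wait:

# Wait up to 5 minutes for every node to report Ready
sudo kubectl wait --for=condition=Ready node --all --timeout=300s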
Extras
Allow your non-root user to use kubectl
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config # Or 644 if multiple users need read access
export KUBECONFIG=~/.kube/config # Add this to your ~/.bashrc too
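After that, kubectl should work without sudo:

kubectl get nodes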
Enable autocompletion
# These commands append the completion setup to ~/.bashrc
echo 'source <(kubectl completion bash)' >>~/.bashrc
echo 'alias k=kubectl' >>~/.bashrc
echo 'complete -o default -F __start_kubectl k' >>~/.bashrc
source ~/.bashrc
Other
- Reinstall Helm
- Reinstall Docker
- Reinstall the Argo CD CLI
- Take care to clean up old PVCs
- Update your firewall rules (ufw or other); see the sketch below
- Uninstall and reinstall the Traefik ingress controller (this resolved a number of RBAC conflicts that the backup & restore seemed to cause)
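For the firewall item, the ports K3s typically needs between nodes are 6443/tcp (API server), 10250/tcp (kubelet), and 8472/udp (Flannel VXLAN). A ufw sketch, assuming your nodes sit on 192.168.1.0/24 — substitute your own subnet:

# Allow cluster traffic from the node subnet (example subnet; adjust to yours)
sudo ufw allow from 192.168.1.0/24 to any port 6443 proto tcp   # Kubernetes API server
sudo ufw allow from 192.168.1.0/24 to any port 10250 proto tcp  # kubelet
sudo ufw allow from 192.168.1.0/24 to any port 8472 proto udp   # Flannel VXLAN
sudo ufw reload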