K3s is a lightweight Kubernetes distribution in which the core components are packaged into a single binary. The only way to accomplish a Control Plane promotion in K3s is by doing a backup & restore of the embedded SQLite datastore.

These instructions are intended for administrators using the default embedded SQLite. There will be some downtime (~15 min). A zero downtime promotion is possible but would require an external DB.

Why I Needed to Make A Different Node the Control Plane

I was running my cluster on 2 Raspberry Pi’s with MicroSD cards. For those unaware, Kubernetes is relatively write-heavy and even the high-endurance SD cards will give out after a year or so. I was running my cluster on a ticking time bomb and an unexpected SD card failure would mean a total rebuild.

To avoid this, I decided to add a new SSD-backed node to serve as my control plane. As a result, I’m less worred about my SD cards giving out because it’s a lot easier to remediate a broken worker node than a control plane node.

Requirements

  • Existing K3s cluster using the default embedded SQLite datastore

Step 1: Take Backups

On the existing control plane server:

  # Stop k3s before taking backup
  sudo systemctl stop k3s

  # Backup the SQLite datastore
  sudo cp /var/lib/rancher/k3s/server/db/state.db ~/k3s_backup/

  # Backup the server token
  sudo cp /var/lib/rancher/k3s/server/token ~/k3s_backup/

Step 2: Wipe K3s from all nodes

Run on all nodes (control plane and workers)

  sudo k3s-killall.sh 
  sudo rm -rf /etc/rancher/k3s /var/lib/rancher/k3s

Step 3: Prepare the NEW control plane node

  • SCP the state.db and token from the old control plane to the new one (any temp directory is fine for now)
  • On Raspberry Pis, add the following args to the end of the single line in /boot/firmware/cmdline.txt (without these, the k3s install will fail)

    cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=1
    
  • Then reboot:

    sudo reboot
    

Step 4: Install K3s on new control plane node

Run k3s installation script from https://docs.k3s.io/quick-start

  curl -sfL https://get.k3s.io | sh -

Then:

  sudo systemctl stop k3s
  
  # Restore backups
  sudo cp state.db /var/lib/rancher/k3s/server/db/state.db
  sudo cp token /var/lib/rancher/k3s/server/token

  # Clean up conflicting data - these will be regenerated
  sudo rm -rf /var/lib/rancher/k3s/server/tls 
  sudo rm -f /etc/rancher/k3s/k3s.yaml
  sudo rm -f /var/lib/rancher/k3s/server/cred/*

  sudo systemctl start k3s

Step 5: Remove old control plane metadata

  sudo kubectl get nodes -o wide
  # Delete the original control plane node before rejoining it as a worker
  sudo kubectl delete node {OLD_CONTROLPLANE}

Step 6: Join all worker nodes to the new control plane

Retrieve Node Token from new control plane node:

  sudo cat /var/lib/rancher/k3s/server/node-token

On each worker node:

  curl -sfL https://get.k3s.io | K3S_URL=https://{CONTROL_PLANE_IP}:6443 K3S_TOKEN={TOKEN} sh -s - agent

Step 7: Confirm all nodes are in ready state

  sudo kubectl get nodes -o wide

Extras

Allow your non-root user privilege to use kubectl

  sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
  sudo chown $(id -u):$(id -g) ~/.kube/config
  chmod 600 ~/.kube/config  # Or 644 if multiple users need read access
  export KUBECONFIG=~/.kube/config # Add this to bashrc too

Enable autocompletion

  # Add all of the below to ~/.bashrc 
  echo 'source <(kubectl completion bash)' >>~/.bashrc
  echo 'alias k=kubectl' >>~/.bashrc
  echo 'complete -o default -F __start_kubectl k' >>~/.bashrc
  source ~/.bashrc

Other

  • Reinstall helm
  • Reinstall docker
  • Reinstall argoCD CLI
  • Take care to clean up old PVCs
  • Update your firewall (ufw or other)
  • Uninstall and reinstall Traefik ingress controller (resolved a bunch of conflicts with RBAC that the backup & restore seemed to have caused)