GPUs on Kubernetes

Motivation

I wanted to add a GPU node to my cluster to run/serve some local LLMs, as something fun and new to play with. Is the Kubernetes part of this strictly necessary? No. Am I going to do it anyway? Hell yeah! After all, overkill is underrated!

Problems Encountered

There's actually a fair amount of incomplete (or inconsistent) information out there on getting GPUs to work in Kubernetes. Unfortunately, it's fragmented, and at least for many of the more "home lab" style installations it seems the information is out of date, or has been tainted by drifting configurations and installation instructions from vendors over time.

Procedure

  1. Install the NVIDIA Container Toolkit. curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list && sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

  2. Install your drivers. sudo apt install -y nvidia-container-runtime cuda-drivers-fabricmanager-550 nvidia-headless-550-server

  3. Configure your runtimes, then restart them so they pick up the change, as shown in the note after this list. (Only containerd is needed for Kubernetes, but I'm doing Docker as well because the GPU computer is a desktop.) sudo nvidia-ctk runtime configure --runtime=docker && sudo nvidia-ctk runtime configure --runtime=containerd

  4. Test that Docker can see the GPU. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
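A quick sanity check and a gotcha before running the test in step 4: the host itself should be able to run nvidia-smi once the driver from step 2 is loaded (a reboot may be needed for the kernel module), and Docker/containerd only re-read the configs that nvidia-ctk edited after a restart. A rough sketch, assuming the standard service names on your machine:

```bash
# Confirm the host sees the GPU with the freshly installed driver
nvidia-smi

# Restart the runtimes so they reload the configs written by nvidia-ctk
sudo systemctl restart docker
sudo systemctl restart containerd
```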

The docker run command from step 4 should return something like:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:0B:00.0  On |                  N/A |
|  0%   53C    P3             32W /  170W |     316MiB /  12288MiB |     36%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```
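For context on what step 3 actually did: nvidia-ctk writes an nvidia runtime entry into Docker's daemon.json and into /etc/containerd/config.toml. Note that k3s (installed next) ships its own embedded containerd with its own config, so this file mostly matters for the standalone containerd, but it's a quick way to see what the configure command changed (assuming the default config path):

```bash
# Rough check that containerd's config now carries an nvidia runtime entry
sudo grep -B1 -A4 nvidia /etc/containerd/config.toml
```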
  5. Install k3s on your node now. curl -sfL https://get.k3s.io | K3S_URL=https://${PRIMARY_K3S_IP}:6443 K3S_TOKEN=${K3S_CLUSTER_TOKEN} sh -s -

  6. Label the node as having a GPU. kubectl label nodes <your-node-name> gpu=true

  7. Create a RuntimeClass object on the cluster (apply and verify it with the commands after the manifest).

```yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```
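A sketch of applying and verifying this, where runtimeclass.yaml is just whatever you saved the manifest above as:

```bash
# Create the RuntimeClass from the manifest above
kubectl apply -f runtimeclass.yaml

# Confirm the new node joined the cluster and carries the gpu=true label
kubectl get nodes -l gpu=true

# k3s runs its own embedded containerd; it should have auto-detected the
# nvidia runtime installed earlier. Check its generated config if GPU pods
# fail to schedule later on.
sudo grep -A3 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
```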
  8. Create a DaemonSet for the NVIDIA device plugin (apply and verify it with the commands after the manifest). Notice we set the nodeSelector to gpu: "true", since we only need the DaemonSet to run on GPU nodes.
```yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  # name/namespace assumed here; they match NVIDIA's upstream device-plugin manifest
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      nodeSelector:
        gpu: "true"
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      runtimeClassName: nvidia
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1
        name: nvidia-device-plugin-ctr
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
```
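Apply it and check that the GPU now shows up as a schedulable resource (the filename is again just an assumption for wherever you saved the manifest):

```bash
# Deploy the device plugin DaemonSet from the manifest above
kubectl apply -f nvidia-device-plugin.yaml

# The plugin pod should land on the GPU node...
kubectl -n kube-system get pods -l name=nvidia-device-plugin-ds -o wide

# ...and advertise the GPU to the scheduler
kubectl describe node <your-node-name> | grep nvidia.com/gpu
```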
  9. Test your new toy with the nbody benchmark.
```bash
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvcr.io/nvidia/k8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
EOF
```
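The image pull takes a little while; something like this watches the pod until it reaches Completed, after which the logs should be available:

```bash
# Watch the benchmark pod's status (Ctrl-C to stop watching)
kubectl get pod nbody-gpu-benchmark -n default -w
```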

kubectl logs nbody-gpu-benchmark -n default

```
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3060]
28672 bodies, total time for 10 iterations: 22.099 ms
= 372.001 billion interactions per second
= 7440.026 single-precision GFLOP/s at 20 flops per interaction
```
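That's the whole chain working end to end (kubectl delete pod nbody-gpu-benchmark cleans up the test pod). For the original goal of serving local LLMs, the only GPU-specific pieces a real workload needs are the same ones the benchmark pod used: runtimeClassName: nvidia and an nvidia.com/gpu resource limit. A minimal sketch, using Ollama purely as a placeholder image; swap in whatever you actually serve:

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama                   # hypothetical example workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      nodeSelector:
        gpu: "true"              # land on the node we labelled earlier
      runtimeClassName: nvidia   # use the RuntimeClass created above
      containers:
      - name: ollama
        image: ollama/ollama:latest   # assumed image; use your own LLM server
        ports:
        - containerPort: 11434        # Ollama's default API port
        resources:
          limits:
            nvidia.com/gpu: 1         # request the GPU from the device plugin
```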