Why GPU acceleration matters
Running AI models on CPU alone is functional but slow. With a compatible NVIDIA GPU, inference speed improves dramatically:

| Setup | Typical speed |
|---|---|
| CPU only | 10–15 tokens/second |
| NVIDIA GPU (accelerated) | 100+ tokens/second |
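
To put the table in practical terms, the short sketch below converts generation speed into waiting time for a single reply. The 500-token reply length is an assumption chosen for illustration; the speeds are the typical figures from the table.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Rough time to generate a reply of num_tokens at a steady speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


REPLY_TOKENS = 500  # assumed reply length, for illustration only

# Speeds taken from the table above (100 is the lower bound of "100+").
for label, speed in [
    ("CPU only (low end)", 10),
    ("CPU only (high end)", 15),
    ("NVIDIA GPU", 100),
]:
    print(f"{label}: {seconds_to_generate(REPLY_TOKENS, speed):.1f} s")
```

Under these assumptions, a reply that takes roughly 33 to 50 seconds on CPU takes about 5 seconds or less on a GPU.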
Requirements
- A server with a compatible NVIDIA GPU installed
- The NVIDIA Container Toolkit installed on the host OS
AMD GPUs are not currently supported for GPU-accelerated AI inference in N.O.M.A.D.
Setup
Install an NVIDIA GPU
Install a compatible NVIDIA GPU in your server if one is not already present. Any modern NVIDIA GPU with sufficient VRAM will work. More VRAM allows you to run larger models.
For build recommendations at different price points, see the Hardware Guide.
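
As a rough guide to how VRAM relates to model size, the sketch below estimates the memory needed for a model's weights alone: parameter count times bits per weight, divided by 8. The model sizes and precisions shown are illustrative assumptions, not N.O.M.A.D. recommendations, and real usage is higher because the runtime also needs memory for the context cache and other overhead.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative model sizes and precisions (assumptions, not from the guide).
for params, bits in [(7, 4), (7, 16), (13, 4)]:
    size = weight_memory_gb(params, bits)
    print(f"{params}B parameters at {bits}-bit: ~{size:.1f} GB for weights")
```

Treat the result as a lower bound when choosing a GPU, and leave headroom above it.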
Install the NVIDIA Container Toolkit
Follow the official installation guide for the NVIDIA Container Toolkit on your host OS:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
This toolkit allows Docker containers to access the GPU. Without it, containers (including the AI Assistant) cannot use the GPU even if one is present.
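
Before reinstalling anything, it can help to confirm that the host looks ready. The following is a minimal sketch, not part of N.O.M.A.D.: it checks whether `nvidia-smi` and `nvidia-ctk` are on the PATH and whether `docker info` lists a runtime named `nvidia`. It assumes the toolkit was configured to register that runtime with Docker; a setup that exposes GPUs another way may report `False` here and still work.

```python
import json
import shutil
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """Return True if Docker's runtime map contains an 'nvidia' entry."""
    try:
        runtimes = json.loads(runtimes_json)
    except json.JSONDecodeError:
        return False
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def docker_runtimes_json() -> str:
    """Ask Docker for its registered runtimes; empty string if that fails."""
    if shutil.which("docker") is None:
        return ""
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{json .Runtimes}}"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


if __name__ == "__main__":
    print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)
    print("nvidia-ctk on PATH:", shutil.which("nvidia-ctk") is not None)
    print("Docker lists an nvidia runtime:",
          has_nvidia_runtime(docker_runtimes_json()))
```

Run it on the host OS, not inside a container. If any line prints `False`, revisit the NVIDIA installation guide before reinstalling the AI Assistant.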
Force reinstall the AI Assistant
N.O.M.A.D. detects the GPU during the AI Assistant installation process. If the AI Assistant was already installed before you set up the toolkit, you need to reinstall it:
- Go to Settings → Apps
- Find AI Assistant
- Click Force Reinstall
To confirm that the GPU was picked up, look for "NVIDIA container runtime detected" in the installation progress log.
Force Reinstall preserves your downloaded AI models, so you will not need to re-download them.
After upgrading or swapping your GPU
If you add a new GPU or swap to a different GPU after the initial AI Assistant install, the AI container still has its old configuration and will continue running on CPU only. The fix is the same: Force Reinstall the AI Assistant from Settings → Apps. This recreates the container with the correct GPU configuration for your current hardware.
"GPU passthrough not working" warning
This typically happens when:
- The NVIDIA Container Toolkit was installed after the AI Assistant
- The GPU was added or swapped after the AI Assistant was installed
- The container runtime configuration changed
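
One way to see whether an existing container was created with GPU access is to inspect its Docker configuration. The sketch below is an assumption-laden diagnostic rather than an official N.O.M.A.D. tool: it treats a container as GPU-enabled if its `HostConfig` uses the `nvidia` runtime or includes a device request with the `gpu` capability. How N.O.M.A.D. actually configures the container is not documented here, and the container name is something you supply yourself (find it with `docker ps`).

```python
import json
import subprocess
import sys


def container_has_gpu_access(host_config: dict) -> bool:
    """True if HostConfig uses the nvidia runtime or requests a 'gpu' device."""
    if host_config.get("Runtime") == "nvidia":
        return True
    for request in host_config.get("DeviceRequests") or []:
        for capability_set in request.get("Capabilities") or []:
            if "gpu" in capability_set:
                return True
    return False


def inspect_host_config(container_name: str) -> dict:
    """Read HostConfig via `docker inspect`; empty dict if that fails."""
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{json .HostConfig}}",
             container_name],
            capture_output=True, text=True, timeout=30,
        )
        if result.returncode != 0:
            return {}
        config = json.loads(result.stdout)
    except (OSError, subprocess.SubprocessError, json.JSONDecodeError):
        return {}
    return config if isinstance(config, dict) else {}


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: the AI container's name (user-supplied, not assumed).
    config = inspect_host_config(sys.argv[1])
    print("GPU access configured:", container_has_gpu_access(config))
```

If this reports `False` for the AI container on a host with a working GPU and toolkit, that matches the stale-configuration cases listed above, and a Force Reinstall is the documented fix.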
Measuring the improvement
Run the System Benchmark before and after enabling GPU acceleration to measure the performance difference:
- Go to Settings → System Benchmark
- Run a Full Benchmark before making any changes
- Enable GPU acceleration (steps above)
- Run the Full Benchmark again
- Compare your NOMAD Score and tokens-per-second results
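
To turn the two benchmark runs into a single figure, divide the tokens-per-second result after enabling the GPU by the result from before. The numbers in this sketch are hypothetical placeholders; substitute the values from your own benchmark runs.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """How many times faster the second run is than the first."""
    if before_tps <= 0:
        raise ValueError("before_tps must be positive")
    return after_tps / before_tps


# Hypothetical results; replace with your own tokens-per-second figures.
cpu_only_tps = 12.5
gpu_tps = 100.0

print(f"GPU run is {speedup(cpu_only_tps, gpu_tps):.1f}x the CPU-only speed")
```

A ratio close to 1 after a Force Reinstall suggests the container is still running on CPU.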