Why GPU acceleration matters

Running AI models on CPU alone is functional but slow. With a compatible NVIDIA GPU, inference speed improves dramatically:
Setup                        Typical speed
CPU only                     10–15 tokens/second
NVIDIA GPU (accelerated)     100+ tokens/second
That is a 10–20x improvement — the difference between waiting 30 seconds for a response and getting one in under 3 seconds.

Requirements

  • A server with a compatible NVIDIA GPU installed
  • The NVIDIA Container Toolkit installed on the host OS
N.O.M.A.D. detects GPU availability automatically when the Container Toolkit is present. You do not need to configure anything manually beyond the toolkit installation.
AMD GPUs are not currently supported for GPU-accelerated AI inference in N.O.M.A.D.
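You can confirm both requirements from a host shell before starting setup. A minimal sketch, assuming a Linux host with a standard PATH: `nvidia-smi` ships with the NVIDIA driver, and `nvidia-ctk` ships with the Container Toolkit, so both should be present on a correctly prepared host.

```shell
# Check for the NVIDIA driver (nvidia-smi) and the Container Toolkit (nvidia-ctk).
# Prints "found" or "missing" for each tool.
for tool in nvidia-smi nvidia-ctk; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

If either tool reports "missing", complete the corresponding requirement before installing (or reinstalling) the AI Assistant.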

Setup

Step 1: Install an NVIDIA GPU

Install a compatible NVIDIA GPU in your server if one is not already present. Any modern NVIDIA GPU with sufficient VRAM will work. More VRAM allows you to run larger models. For build recommendations at different price points, see the Hardware Guide.
Step 2: Install the NVIDIA Container Toolkit

Follow the official installation guide for the NVIDIA Container Toolkit on your host OS:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

This toolkit allows Docker containers to access the GPU. Without it, containers (including the AI Assistant) cannot use the GPU even if one is present.
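On apt-based hosts (Debian, Ubuntu), the official guide's steps currently boil down to roughly the following. Treat this as a sketch rather than a substitute for the linked guide: repository URLs and package names are NVIDIA's and may change between toolkit releases, and other distributions use different package managers.

```shell
# Add NVIDIA's signing key and package repository (apt-based distros only)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the NVIDIA runtime with Docker, and restart the daemon
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

The `nvidia-ctk runtime configure` step is what actually wires the NVIDIA runtime into Docker's daemon configuration; skipping it is a common reason containers still cannot see the GPU after the package install.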
Step 3: Force reinstall the AI Assistant

N.O.M.A.D. detects the GPU during the AI Assistant installation process. If the AI Assistant was already installed before you set up the toolkit, you need to reinstall it:
  1. Go to Settings → Apps
  2. Find AI Assistant
  3. Click Force Reinstall
During reinstall, N.O.M.A.D. detects the NVIDIA Container Toolkit and configures the AI container with GPU support. You will see “NVIDIA container runtime detected” in the installation progress log.
Force Reinstall preserves your downloaded AI models. You will not need to re-download them.

After upgrading or swapping your GPU

If you add a new GPU or swap to a different GPU after the initial AI Assistant install, the AI container still has its old configuration and will continue running on CPU only. The fix is the same: Force Reinstall the AI Assistant from Settings → Apps. This recreates the container with the correct GPU configuration for your current hardware.
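To confirm that containers can actually reach the GPU after a reinstall or hardware swap, you can run a throwaway container from the host shell. A sketch, assuming Docker with the NVIDIA runtime configured; the CUDA image tag is an example and will be pulled from Docker Hub on first run:

```shell
# If passthrough works, this prints the same GPU table that nvidia-smi shows on the host.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

An error such as "could not select device driver" at this point indicates the Container Toolkit is not registered with Docker, independent of anything N.O.M.A.D. configures.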

“GPU passthrough not working” warning

If you see a “GPU passthrough not working” banner on the System Information or AI Settings pages, it means N.O.M.A.D. detected a GPU on the host but the AI container cannot access it.
This typically happens when:
  • The NVIDIA Container Toolkit was installed after the AI Assistant
  • The GPU was added or swapped after the AI Assistant was installed
  • The container runtime configuration changed
To fix it: Click the “Fix: Reinstall AI Assistant” button shown in the warning banner, or manually go to Settings → Apps → AI Assistant → Force Reinstall. This recreates the container with proper GPU access and preserves your downloaded models.

Measuring the improvement

Run the System Benchmark before and after enabling GPU acceleration to measure the performance difference:
  1. Go to Settings → System Benchmark
  2. Run a Full Benchmark before making any changes
  3. Enable GPU acceleration (steps above)
  4. Run the Full Benchmark again
  5. Compare your NOMAD Score and tokens-per-second results
You can also submit your score to the community leaderboard to see how your build compares.