
PyTorch on Ubuntu

Note

PyTorch does not support AMD AIE (NPU).

Prerequisites

Hardware

The experimental device is a Beelink (零刻) SER mini-PC. Specification:

- AMD Ryzen 8845HS
- 32 GiB RAM
- 1 TiB SSD

OS requirement

ROCm only supports Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.1, and RHEL 9.7.

Some desktop environments may conflict with the newest amdgpu driver.

Tested:

- Ubuntu 24.04 LTS Server (no GUI)
- Ubuntu MATE (Ubuntu 24.04 LTS)

Not supported:

- Ubuntu 24.04 LTS Cinnamon (cannot boot after installing the newest amdgpu driver, error code -22)

Untested:

- Ubuntu 24.04 LTS Desktop (GNOME)
- ...

VRAM

It is recommended to allocate a fixed 16 GiB of VRAM, as my experiments take about 8.8 GiB of iGPU memory per model with the recorded parameters.

The standard way to configure this is: BIOS -> Advanced -> AMD CBS -> NBIO -> GFX config.
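After rebooting with the new BIOS setting, the allocation can be verified from Linux. A minimal sketch reading the amdgpu sysfs counters (the sysfs path assumes the amdgpu driver is loaded; with a 16 GiB carve-out it should report roughly 16384 MiB):

```shell
# Report the VRAM carve-out the amdgpu driver sees (run after reboot).
for f in /sys/class/drm/card*/device/mem_info_vram_total; do
  [ -r "$f" ] || continue
  bytes=$(cat "$f")                       # value is in bytes
  echo "$f: $((bytes / 1024 / 1024)) MiB" # convert to MiB for readability
done
```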

Building Tools

sudo apt update
# origin of this package list not recorded
sudo apt install gfortran git ninja-build cmake g++ pkg-config xxd patchelf automake libtool python3-venv python3-dev libegl1-mesa-dev texinfo bison flex

Installation

There are typically three ways. The easiest way now is to use the pre-built tarballs / wheels / debs.

Prebuilt

For more info, see the TheRock release page.

Install ROCm

# ROCm installation
# https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/install/quick-start.html
wget https://repo.radeon.com/amdgpu-install/7.2/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb
# This takes about 30 GiB of disk space and hours of time
sudo apt install -y ./amdgpu-install_7.2.70200-1_all.deb
sudo apt update
sudo apt install -y python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
# Group membership changes only apply to new sessions: log out and back in
# (or reboot), or start a subshell with `newgrp render` for the current one
sudo apt install -y rocm
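After installation and a reboot, a quick sanity check can confirm the runtime sees the iGPU. This assumes `rocminfo` and `rocm-smi` were pulled in by the `rocm` meta-package; on the 8845HS the iGPU is expected to report a gfx11xx target name:

```shell
# List the GPU agents ROCm can see; the iGPU should appear as a gfx11xx target.
rocminfo | grep -E 'Name:\s+gfx' || echo "No GPU agent found"
# Show how much VRAM is visible to the runtime.
rocm-smi --showmeminfo vram
```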

Checking AMDGPU Driver

# Driver check (skip reinstall if already OK)
# If the driver matches the required version and the kernel module is loaded, do nothing.
# Otherwise, print a warning and let you reinstall manually.
REQUIRED_AMDGPU="1:7.2.70200-2278374.24.04"
INSTALLED_AMDGPU=$(dpkg-query -W -f='${Version}' amdgpu 2>/dev/null || true)

if [ "$INSTALLED_AMDGPU" = "$REQUIRED_AMDGPU" ] && lsmod | grep -q amdgpu; then
  echo "✔ AMDGPU driver version $INSTALLED_AMDGPU is installed and module is loaded. Skipping driver reinstall."
else
cat <<EOF

⚠️ AMDGPU driver check failed (version mismatch or module not loaded).

Installed version: ${INSTALLED_AMDGPU:-<none>}
Required version:  $REQUIRED_AMDGPU

To reinstall the driver manually, run the steps below from a text console (Ctrl+Alt+F3):

  sudo apt autoremove -y amdgpu-dkms
  sudo rm -f /etc/apt/sources.list.d/amdgpu.list
  sudo rm -rf /var/cache/apt/*
  sudo apt clean all
  sudo apt install -y ./amdgpu-install_7.2.70200-1_all.deb
  sudo apt update
  sudo apt install -y "linux-headers-\$(uname -r)" "linux-modules-extra-\$(uname -r)"
  sudo apt install -y amdgpu-dkms
  sudo update-initramfs -u -k all
  sudo reboot

EOF
  exit 1
fi

Install uv and Activate uv Virtual Environment

sudo apt update
sudo apt install curl -y
curl -LsSf https://astral.sh/uv/install.sh | sh
# restart the shell (or source the env file the installer reports) so `uv` is on PATH

uv venv venv
source venv/bin/activate

Install Pre-built Packages

uv pip install --pre torch torchvision torchaudio \
  --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/

Test ROCm Availability

import torch
import time

print("ROCm:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Create deterministic tensors
tensor_a_cpu = torch.full((1000, 1000), 2.0, device='cpu')
tensor_b_cpu = torch.full((1000, 1000), 3.0, device='cpu')

# CPU computation
start_time = time.time()
result_cpu = tensor_a_cpu + tensor_b_cpu
cpu_time = time.time() - start_time
print(f"CPU operation took: {cpu_time:.6f} seconds")

# GPU computation (the first GPU op may include one-time initialization overhead)
if torch.cuda.is_available():
    tensor_a_gpu = tensor_a_cpu.to('cuda')
    tensor_b_gpu = tensor_b_cpu.to('cuda')

    torch.cuda.synchronize()  # ensure accurate timing
    start_time = time.time()

    result_gpu = tensor_a_gpu + tensor_b_gpu

    torch.cuda.synchronize()  # wait for GPU to finish
    gpu_time = time.time() - start_time
    print(f"GPU operation took: {gpu_time:.6f} seconds")

    # Move GPU result back to CPU for comparison
    result_gpu_cpu = result_gpu.to('cpu')

    # Verify correctness
    if torch.allclose(result_cpu, result_gpu_cpu):
        print("CPU and GPU results match!")
    else:
        print("Results differ!")

Old Method

Before December 2025, PyTorch with ROCm support had to be installed with the GPU architecture overridden to gfx1100 via export HSA_OVERRIDE_GFX_VERSION=11.0.0. This is no longer necessary.
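For reference, the old workaround looked roughly like this (shown only for context; it assumes the iGPU reports a gfx1103 target, and the override is no longer needed on current nightlies):

```shell
# Old workaround: make the HSA runtime treat the gfx1103 iGPU as gfx1100.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
python -c "import torch; print(torch.cuda.is_available())"
```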

Build from Source

  • [WIP]

ROCm can be built from source with TheRock.

TheRock now supports the gfx110X-all target family.