PyTorch on Windows (ROCm on AMD Radeon 780M gfx1103)
Current Windows Support Status
ROCm support on Windows
ROCm support on Windows is currently incomplete and under active development. Users should expect potential instability, limited feature sets, and library-specific issues compared to the Linux implementation.
Key Resources:
- GPU Compatibility: Check SUPPORTED_GPUS.md for the latest list of verified hardware.
- Component Support Status: Refer to windows_support.md for a detailed breakdown of ROCm components.
- GFX1103 Progress: Track the latest updates for Radeon 780M and other consumer GPUs in TheRock Issue #1337.
Support Summary:
- Supported: Core math libraries (rocBLAS, rocRAND, rocFFT, rocSOLVER, rocSPARSE), ML libraries (MIOpen, hipDNN), and the AMD-LLVM compiler toolchain are generally functional.
- Unsupported/Limited: Profiling tools (rocprofiler-sdk, aqlprofile), communication libraries (RCCL), and media decoding (rocDecode, rocJPEG) are currently unsupported or restricted on Windows. System-level tools like amdsmi and rocr-runtime are also pending full support.
Prerequisites
- Install the latest AMD Software: Adrenalin Edition driver.
- Read TheRock release guidance (Windows and compatibility notes): https://github.com/ROCm/TheRock/blob/main/RELEASES.md.
ROCm installation paths
ROCm can be installed in three ways:
Option 1 (recommended for most users): pip + nightlies
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/ "rocm[libraries,devel]"
Option 2 (recommended for reproducibility): artifacts download
- Clone TheRock repository
- Use the artifact installation helper:
TheRock\build_tools\install_rocm_from_artifacts.py
- Supported channels:
dev (recommended), nightly
- Optional: pin a specific GitHub Actions run ID for a reproducible install. You can install specific versions verified for certain device models. Example reference run:
- https://github.com/ROCm/TheRock/actions/runs/23688149485/job/69056949871
See the Artifact install docs for more details.
Option 3 (source build, least recommended)
Source builds require roughly 100 GB of disk space and many hours of build time; the resulting code path should be equivalent to the artifacts. Use this option only if you need custom build control.
Prerequisites and Set up Environment
- Confirm prerequisites in RELEASES.md
- Reinstall Git with the "Use Git and optional Unix tools from the Windows Command Prompt" option selected.
- Set these git options:
git config --global core.symlinks true
git config --global core.longpaths true
git config --global core.autocrlf true
- Clone https://github.com/ROCm/TheRock.
- Validate environment using:
.\TheRock\build_tools\validate_windows_install.ps1
VS Build Tools environment note
- The x64 Native Tools Command Prompt for VS 2022 may fail to run PowerShell scripts. The Developer PowerShell for VS 2022 targets x86 by default, which is not sufficient for TheRock builds.
- Use scripts\activate_building_tools.ps1 in this repo to map the vsDevCmd output into PowerShell.
- Usage: open PowerShell, copy the script content, and run it in the terminal. Note that the vsDevCmd path needs adjusting to match your actual install location. The script converts the .bat environment settings into PowerShell environment variables so that the VS Build Tools are recognized.
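The conversion the activation script performs can be sketched in Python: run the VS batch file, dump the resulting environment with `set`, and mirror each KEY=VALUE line into the current process. This is a minimal illustration of the mechanism, not the script's actual implementation, and the VsDevCmd path shown is a placeholder you must replace with your real install location.

```python
import subprocess

def parse_set_output(output: str) -> dict:
    """Parse `set`-style KEY=VALUE lines into a dict, skipping other output."""
    env = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:  # ignore banner text or blank lines the .bat printed
            env[key] = value
    return env

# On Windows, capturing the environment after the batch file runs would
# look roughly like this (path and arch flag are illustrative):
# out = subprocess.check_output(
#     r'call "C:\path\to\VsDevCmd.bat" -arch=amd64 && set',
#     shell=True, text=True)
# vs_env = parse_set_output(out)
```

The parsed dict can then be copied into `os.environ` so child build tools inherit the VS settings.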
Build flow
- uv pip install -r requirements.txt (activate the venv first if it is not already active).
- uv run python ./build_tools/fetch_sources.py (takes a long time; it fetches many files).
- TheRock\build_tools\setup_ccache.py (optional).
- Build:
cmake -B build -GNinja . -DTHEROCK_AMDGPU_FAMILIES=gfx110X-all
Additional docs:
- https://github.com/ROCm/TheRock/blob/main/docs/development/README.md
- https://github.com/ROCm/TheRock/blob/main/docs/development/windows_support.md
PyTorch installation
This step usually has no issues. Python 3.10-3.12 is currently supported for Torch 2.9 and 2.10.
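Before installing, it can save time to confirm the interpreter falls in that range. A small stdlib-only check (the version bounds reflect the support statement above and may change in later releases):

```python
import sys

def python_is_supported(version=None):
    """Torch 2.9/2.10 nightly wheels currently target Python 3.10-3.12."""
    major_minor = tuple(version or sys.version_info)[:2]
    return (3, 10) <= major_minor <= (3, 12)

print("Supported Python:", python_is_supported())
```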
Recommended:
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/ torch torchaudio torchvision
This pulls prebuilt wheels that usually have the best compatibility.
Other Methods
If your device still has compatibility issues, inspect the Windows PyTorch wheel release workflow for artifact options or source-build instructions.
Fixing a known issue: HIP API error 0100 on iGPU mapping
checkHipErrors() HIP API error = 0100 "no ROCm-capable device is detected"
Observed GPU enumeration issue on Windows
TheRock issues confirm that Windows iGPU + discrete GPU enumeration is still under active work:
- AMD iGPU is cuda:0 when using ROCm wheels built by TheRock
- AMD developer confirms Windows device enumeration priority is buggy and can skip the iGPU or conflict with dGPU.
- Standard workaround: set HIP_VISIBLE_DEVICES=0 explicitly.
- Confused about current status of support for Radeon 780M (gfx1103) on Windows
- Confirms that gfx1103 support is enabled in TheRock nightlies, but some libraries are not yet fully robust; first-load or op-compilation hangs are reported.
- Hangs/fails on Radeon 780M (gfx1103)
- Multiple iGPU users report silent lockups without explicit visible device, or memory scheduling issues.
- See more on Issue #1184
Workaround (for current session)
To force the HIP runtime to recognize the iGPU, you must explicitly set the HIP_VISIBLE_DEVICES environment variable to 0.
In PowerShell (Current Session):
$env:HIP_VISIBLE_DEVICES = "0"
In cmd.exe (Current Session):
set HIP_VISIBLE_DEVICES=0
To make it permanent, add a HIP_VISIBLE_DEVICES user environment variable with the value 0 in Windows Settings. More information can be found in the GitHub issues.
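The variable can also be set from inside a Python script, but it must happen before torch is imported, since the HIP runtime reads it during initialization. A minimal sketch:

```python
import os

# Set before importing torch: the HIP runtime reads this variable when it
# initializes, so exporting it after the import may have no effect.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

# import torch  # import only after the variable is set

print("HIP_VISIBLE_DEVICES =", os.environ["HIP_VISIBLE_DEVICES"])
```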
Final Verification and Testing
import torch
import time

print("ROCm:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Create deterministic tensors
tensor_a_cpu = torch.full((1000, 1000), 2.0, device='cpu')
tensor_b_cpu = torch.full((1000, 1000), 3.0, device='cpu')

# CPU computation
start_time = time.time()
result_cpu = tensor_a_cpu + tensor_b_cpu
cpu_time = time.time() - start_time
print(f"CPU operation took: {cpu_time:.6f} seconds")

# GPU computation
if torch.cuda.is_available():
    tensor_a_gpu = tensor_a_cpu.to('cuda')
    tensor_b_gpu = tensor_b_cpu.to('cuda')
    torch.cuda.synchronize()  # ensure accurate timing
    start_time = time.time()
    result_gpu = tensor_a_gpu + tensor_b_gpu
    torch.cuda.synchronize()  # wait for GPU to finish
    gpu_time = time.time() - start_time
    print(f"GPU operation took: {gpu_time:.6f} seconds")

    # Move GPU result back to CPU for comparison
    result_gpu_cpu = result_gpu.to('cpu')

    # Verify correctness
    if torch.allclose(result_cpu, result_gpu_cpu):
        print("CPU and GPU results match!")
    else:
        print("Results differ!")
Sample output
CPU operation took: 0.004181 seconds
GPU operation took: 0.001731 seconds
CPU and GPU results match!
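The single-run timings above are noisy, especially for an operation this small. A steadier approach is to repeat the operation and take the fastest run with time.perf_counter. A stdlib-only sketch (for the GPU case, swap in the tensor add and keep the torch.cuda.synchronize() calls inside the timed function):

```python
import time

def best_time(fn, repeats=10):
    """Return the fastest wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; replace with the tensor addition from the script above.
elapsed = best_time(lambda: sum(range(100_000)))
print(f"best of 10: {elapsed:.6f} s")
```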