PyTorch on Windows (ROCm on AMD Radeon 780M gfx1103)
Current Windows Support Status
ROCm support on Windows
ROCm support on Windows is currently incomplete and under active development. Users should expect potential instability, limited feature sets, and library-specific issues compared to the Linux implementation.
Key Resources:
- GPU Compatibility: Check SUPPORTED_GPUS.md for the latest list of verified hardware.
- Component Support Status: Refer to windows_support.md for a detailed breakdown of ROCm components.
- GFX1103 Progress: Track the latest updates for Radeon 780M and other consumer GPUs in TheRock Issue #1337.
Support Summary:
- Supported: Core math libraries (rocBLAS, rocRAND, rocFFT, rocSOLVER, rocSPARSE), ML libraries (MIOpen, hipDNN), and the AMD-LLVM compiler toolchain are generally functional.
- Unsupported/Limited: Profiling tools (rocprofiler-sdk, aqlprofile), communication libraries (RCCL), and media decoding (rocDecode, rocJPEG) are currently unsupported or restricted on Windows. System-level tools like amdsmi and rocr-runtime are also pending full support.
Prerequisites
- Install the latest AMD Software: Adrenalin Edition driver.
- Read TheRock release guidance (Windows and compatibility notes): https://github.com/ROCm/TheRock/blob/main/RELEASES.md.
ROCm installation paths
ROCm can be installed in three ways:
Option 1 (recommended for most users): pip + nightlies
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/ "rocm[libraries,devel]"
Option 2 (recommended for reproducibility): artifacts download
- Clone TheRock repository
- Use the artifact installation helper:
TheRock\build_tools\install_rocm_from_artifacts.py
- Supported channels:
dev (recommended), nightly
- Optional: pin a specific GitHub Actions run ID for a reproducible install. You can install specific versions verified for certain device models. Example reference run:
- https://github.com/ROCm/TheRock/actions/runs/23688149485/job/69056949871
See the Artifact install docs for more details.
Option 3 (source build, least recommended)
Source builds require roughly 100 GB of disk space and many hours of build time; the resulting code path should be equivalent to the artifacts. Use this option only if you need custom build control.
Prerequisites and Set up Environment
- Confirm prerequisites in RELEASES.md
- Reinstall Git with the "Use Git and optional Unix tools from the Windows Command Prompt" option selected.
- Set these git options:
git config --global core.symlinks true
git config --global core.longpaths true
git config --global core.autocrlf true
- Clone https://github.com/ROCm/TheRock.
- Validate environment using:
.\TheRock\build_tools\validate_windows_install.ps1
VS Build Tools environment note
- The x64 Native Tools Command Prompt for VS 2022 may fail to run PowerShell scripts. The Developer PowerShell for VS 2022 targets x86 by default, which is not sufficient for TheRock builds.
- Use scripts\activate_building_tools.ps1 in this repo to map the vsDevCmd output into PowerShell.
- Usage: open PowerShell, copy the script content, and run it in the terminal. Note that the vsDevCmd path needs adjusting to match your actual install location. The script converts the .bat environment settings into PowerShell environment variables so that the VS Build Tools are recognized.
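The conversion the activation script performs can be sketched in Python: run the VS batch file, dump the resulting environment with `set`, and mirror each KEY=VALUE line into the current process. This is a minimal illustration of the mechanism, not the script's actual implementation, and the VsDevCmd path shown is a placeholder you must replace with your real install location.

```python
import subprocess

def parse_set_output(output: str) -> dict:
    """Parse `set`-style KEY=VALUE lines into a dict, skipping other output."""
    env = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:  # ignore banner text or blank lines the .bat printed
            env[key] = value
    return env

# On Windows, capturing the environment after the batch file runs would
# look roughly like this (path and arch flag are illustrative):
# out = subprocess.check_output(
#     r'call "C:\path\to\VsDevCmd.bat" -arch=amd64 && set',
#     shell=True, text=True)
# vs_env = parse_set_output(out)
```

The parsed dict can then be copied into `os.environ` so child build tools inherit the VS settings.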
Build flow
- uv pip install -r requirements.txt (activate the venv first if it is not already active).
- uv run python ./build_tools/fetch_sources.py (takes a long time; it fetches many files).
- TheRock\build_tools\setup_ccache.py (optional).
- Build:
cmake -B build -GNinja . -DTHEROCK_AMDGPU_FAMILIES=gfx110X-all
Additional docs:
- https://github.com/ROCm/TheRock/blob/main/docs/development/README.md
- https://github.com/ROCm/TheRock/blob/main/docs/development/windows_support.md
PyTorch installation
This step usually has no issues. Python 3.10-3.12 is currently supported for Torch 2.9 and 2.10.
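Before installing, it can save time to confirm the interpreter falls in that range. A small stdlib-only check (the version bounds reflect the support statement above and may change in later releases):

```python
import sys

def python_is_supported(version=None):
    """Torch 2.9/2.10 nightly wheels currently target Python 3.10-3.12."""
    major_minor = tuple(version or sys.version_info)[:2]
    return (3, 10) <= major_minor <= (3, 12)

print("Supported Python:", python_is_supported())
```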
Recommended:
uv pip install --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/ torch torchaudio torchvision
This pulls prebuilt wheels that usually have the best compatibility.
Other Methods
If your device still has compatibility issues, inspect the Windows PyTorch wheel release workflow for artifact options or source-build instructions.
Fixing a known issue: HIP API error 0100 on iGPU mapping
checkHipErrors() HIP API error = 0100 "no ROCm-capable device is detected"
Observed GPU enumeration issue on Windows
TheRock issues confirm that Windows iGPU + discrete GPU enumeration is still under active work:
- AMD iGPU is cuda:0 when using ROCm wheels built by TheRock
- AMD developer confirms Windows device enumeration priority is buggy and can skip the iGPU or conflict with dGPU.
- Standard workaround: set HIP_VISIBLE_DEVICES=0 explicitly.
- Confused about current status of support for Radeon 780M (gfx1103) on Windows
- Confirms that gfx1103 support is enabled in TheRock nightlies, but some libraries are not yet fully robust; first-load or op-compilation hangs are reported.
- Hangs/fails on Radeon 780M (gfx1103)
- Multiple iGPU users report silent lockups without explicit visible device, or memory scheduling issues.
- See more on Issue #1184
Workaround (for current session)
To force the HIP runtime to recognize the iGPU, you must explicitly set the HIP_VISIBLE_DEVICES environment variable to 0.
In PowerShell (Current Session):
$env:HIP_VISIBLE_DEVICES = "0"
In cmd.exe (Current Session):
set HIP_VISIBLE_DEVICES=0
To make it permanent, add a HIP_VISIBLE_DEVICES user environment variable with the value 0 in Windows Settings. More information can be found in the GitHub issues.
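The variable can also be set from inside a Python script, but it must happen before torch is imported, since the HIP runtime reads it during initialization. A minimal sketch:

```python
import os

# Set before importing torch: the HIP runtime reads this variable when it
# initializes, so exporting it after the import may have no effect.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

# import torch  # import only after the variable is set

print("HIP_VISIBLE_DEVICES =", os.environ["HIP_VISIBLE_DEVICES"])
```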
Final Verification and Testing
import torch
import time

print("ROCm:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Create deterministic tensors
tensor_a_cpu = torch.full((1000, 1000), 2.0, device='cpu')
tensor_b_cpu = torch.full((1000, 1000), 3.0, device='cpu')

# CPU computation
start_time = time.time()
result_cpu = tensor_a_cpu + tensor_b_cpu
cpu_time = time.time() - start_time
print(f"CPU operation took: {cpu_time:.6f} seconds")

# GPU computation
if torch.cuda.is_available():
    tensor_a_gpu = tensor_a_cpu.to('cuda')
    tensor_b_gpu = tensor_b_cpu.to('cuda')
    torch.cuda.synchronize()  # ensure accurate timing
    start_time = time.time()
    result_gpu = tensor_a_gpu + tensor_b_gpu
    torch.cuda.synchronize()  # wait for GPU to finish
    gpu_time = time.time() - start_time
    print(f"GPU operation took: {gpu_time:.6f} seconds")

    # Move GPU result back to CPU for comparison
    result_gpu_cpu = result_gpu.to('cpu')

    # Verify correctness
    if torch.allclose(result_cpu, result_gpu_cpu):
        print("CPU and GPU results match!")
    else:
        print("Results differ!")
Sample output
CPU operation took: 0.004181 seconds
GPU operation took: 0.001731 seconds
CPU and GPU results match!
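The single-run timings above are noisy, especially for an operation this small. A steadier approach is to repeat the operation and take the fastest run with time.perf_counter. A stdlib-only sketch (for the GPU case, swap in the tensor add and keep the torch.cuda.synchronize() calls inside the timed function):

```python
import time

def best_time(fn, repeats=10):
    """Return the fastest wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; replace with the tensor addition from the script above.
elapsed = best_time(lambda: sum(range(100_000)))
print(f"best of 10: {elapsed:.6f} s")
```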