Overview of ROCm Ecosystem (v6.4.1-20250526)¶
Work-in-progress
This document is a work-in-progress. It may still contain inaccuracies or mistakes.
This overview is being created in the context of adding support for ROCm to EESSI, the European Environment for Scientific Software Installations (https://eessi.io).
Last update: 26 May 2025
Jump to Overview | Jump to ABC
Table of Contents¶
- Introduction
- AMD GPU Microarchitectures
- Core Components
- Programming Models
- Compiler Ecosystem
- Developer Tools
- Libraries and Frameworks
- Compatibility Policies
- AMD GPUs in Azure
Introduction¶
The AMD ROCm™ (Radeon Open Compute) platform is an open-source software stack designed for GPU computing. ROCm 6.4.x provides a comprehensive set of tools, libraries, and software development kits that enable developers to harness the power of AMD's hardware accelerators.
ROCm serves as AMD's unified platform for high-performance computing (HPC), artificial intelligence (AI), and machine learning workloads, offering a viable alternative to NVIDIA's CUDA ecosystem. The platform is designed with portability, performance, and open standards in mind.
This overview of the ROCm ecosystem is organized into six major parts:
- AMD GPU Microarchitectures: the microarchitectures used by AMD GPU hardware
- Core Components: software essential to using AMD GPUs (drivers, runtimes, etc)
- Programming Models: how to create programs that run on AMD GPUs
- Compiler Ecosystem: compilers with support for the programming models
- Developer Tools: debugging, profiling, and tracing tools
- Libraries and Frameworks: for common operations and programming structures
AMD GPU Microarchitectures¶
AMD's GPU architectures have evolved significantly over the years, with distinct product lines targeting different market segments.
CDNA (Compute DNA)¶
CDNA is AMD's data center and HPC-focused architecture for GPU compute workloads.
- CDNA 1 (2020)
- Used in AMD Instinct MI100 accelerator
- Matrix Core Technology for AI/ML workloads
- CDNA 2 (2021)
- Powers AMD Instinct MI200 series
- MCM (multi-chip module) design with chiplets
- Infinity Fabric connections for multi-GPU scaling
- CDNA 3 (2023)
- Powers AMD Instinct MI300 series
- Integrates CPU and GPU in the same package (APU, Accelerated Processing Unit; for example the MI300A)
- Enhanced AI and HPC capabilities
RDNA (Radeon DNA)¶
RDNA is AMD's consumer-focused graphics architecture, designed for gaming and content creation.
- RDNA 1 (2019)
- First introduced with the Radeon RX 5000 series
- RDNA 2 (2020)
- Powers Radeon RX 6000 series
- Used in PlayStation 5 and Xbox Series X/S consoles
- RDNA 3 (2022)
- Powers Radeon RX 7000 series
- Chiplet-based design (first for consumer GPUs)
- RDNA 4 (2024)
- Powers the latest Radeon RX 9000 series
Earlier Architectures¶
- GCN (Graphics Core Next) - 2011-2019
- Five generations (GCN 1-5)
- Transitioned to RDNA for consumer products
- Powered Radeon HD 7000 through RX Vega and some RX 500 series
- Vega (2017)
- Based on GCN 5
- Used in Radeon RX Vega and Radeon VII
GFX Codes¶
In LLVM, each AMDGPU processor has an architecture (GFX) code that indicates which specific microarchitecture is used. These codes are critical for hardware compatibility and optimization with ROCm. Generally, AMD uses the "gfxAB" format, where A is a major version indicator and B is a two-digit minor version indicator. The format "gfxA" is also used to refer to a family of architectures with the same major version indicator.
An overview of gfx codes (a short example of querying a device's gfx code at runtime follows the list):
- GFX6 (GCN): gfx600, gfx601, gfx602
- GFX7 (GCN): gfx700, ..., gfx705
- GFX8 (GCN): gfx801, gfx802, gfx803, gfx805, gfx810
- GFX9 (Vega): gfx900, gfx902, gfx904, gfx906
- GFX9 (CDNA1): gfx908
- GFX9 (CDNA2): gfx90a
- GFX9 (CDNA3): gfx942
- GFX10.1 (RDNA1): gfx1010, ..., gfx1013
- GFX10.3 (RDNA2): gfx1030, ..., gfx1036
- GFX11 (RDNA3): gfx1100, ..., gfx1103
- GFX11 (RDNA3.5): gfx1150, ..., gfx1153
- GFX12 (RDNA4): gfx1200, gfx1201
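As an illustration of how these codes are used in practice, the sketch below (not taken from the ROCm documentation) queries the gfx code of each visible GPU through the HIP runtime; the same codes are what hipcc expects in its --offload-arch flag, as noted in the comments. The field and flag names are standard HIP, but details may vary between ROCm releases.

```cpp
// Minimal sketch: print the gfx code of each visible GPU via the HIP runtime.
// hipDeviceProp_t::gcnArchName reports strings such as "gfx90a" or "gfx942",
// which must match (or be compatible with) the targets given at compile time, e.g.:
//   hipcc --offload-arch=gfx90a --offload-arch=gfx942 query_gfx.cpp -o query_gfx
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::fprintf(stderr, "No HIP-capable device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, dev);
        std::printf("Device %d: %s (arch: %s)\n", dev, prop.name, prop.gcnArchName);
    }
    return 0;
}
```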
Core Components¶
- AMDGPU Driver with KFD
- The kernel-mode driver for AMD GPUs
- Github
- Platform Runtime
- Runtime that manages GPU resources, scheduling, and memory management
- Github
- ROCm-LLVM
- AMD-maintained fork of the LLVM git repository
- Github
- AMD SMI (System Management Interface)
- AMD SMI - equivalent to nvidia-smi
- Successor to ROCm SMI
- Github
- ROCm SMI (System Management Interface) (deprecated)
- ROCm SMI LIB - equivalent to nvidia-smi
- Github
- ROCmInfo
- ROCm Application for Reporting System Info
- Github
- ROCTracer
- ROCm tracer callback/activity library for performance tracing of AMD GPUs
- Github
- ROCm examples
- A collection of examples for the ROCm software stack
- Github
Core components dependencies¶
graph LR;
subgraph Driver
A[AMDGPU Driver with KFD]
end
subgraph Runtime
B[ROCm Platform Runtime]
end
subgraph Compiler
C[ROCm LLVM Compiler]
end
subgraph ROCm
D[ROCm core]
end
subgraph AMD smi
E[AMD smi]
end
subgraph Programming Model
F[HIP]
end
subgraph Reporting
G[ROCminfo]
end
B -->|Depends on| A
B -->|Depends on| C
B -->|Depends on| D
D -->|Depends on| E
D -->|Depends on| F
G -->|Depends on| C
Programming Models¶
HIP (Heterogeneous-Computing Interface for Portability)¶
HIP is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD and NVIDIA GPUs. It's a key component of ROCm's strategy for facilitating code migration from CUDA.
- HIP Github
- CLR Github
- Features:
- C++ kernel language and runtime API closely modeled on CUDA
- Single source code that can target both AMD GPUs (via ROCm) and NVIDIA GPUs (via CUDA)
- HIPIFY tools for translating existing CUDA code to HIP
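As a concrete illustration of the programming model, here is a minimal HIP sketch (not taken from the ROCm documentation): a vector-add kernel together with the host-side allocation, copies, and launch. It should build with hipcc on a supported GPU, though the exact flags depend on the target architecture.

```cpp
// Minimal HIP sketch: vector addition on the GPU.
// Compile with, e.g.: hipcc vector_add.cpp -o vector_add
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    // Allocate device buffers and copy the inputs over.
    float *da, *db, *dc;
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Launch one thread per element.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("c[0] = %f (expected 3.0)\n", hc[0]);

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```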
OpenMP Support¶
ROCm supports OpenMP offloading, which allows developers to use directive-based programming to offload computations to GPUs.
- Features:
- Familiar pragma-based approach
- Incremental parallelization of existing CPU code
- Support for target offload constructs
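A minimal sketch of what OpenMP offloading looks like in practice is shown below; the pragma is standard OpenMP target offload, while the compile command in the comment (using amdclang++ and --offload-arch) is only illustrative and may differ between ROCm releases.

```cpp
// Minimal OpenMP target-offload sketch (SAXPY: y = a*x + y on the GPU).
// Illustrative compile command (names/flags may vary per ROCm release):
//   amdclang++ -fopenmp --offload-arch=gfx90a saxpy_omp.cpp -o saxpy_omp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);
    float* xp = x.data();
    float* yp = y.data();

    // Map the arrays to the device, run the loop there, copy y back.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    std::printf("y[0] = %f (expected 5.0)\n", yp[0]);
    return 0;
}
```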
OpenCL Support¶
While not the primary focus of ROCm, OpenCL support is maintained for compatibility with existing code bases and as an open standard option.
Programming Models Dependencies¶
graph LR;
subgraph Key Programming Models
A[HIP]
B[OpenMP]
C[OpenCL]
end
subgraph Compiler
D[ROCm-LLVM]
end
subgraph ROCm Components
E[rocm-cmake]
F[ROCmInfo]
G[ROCm Core]
end
A -->|Depends on| D
A -->|Depends on| E
A -->|Depends on| F
A -->|Depends on| G
B -->|Depends on| G
C -->|Depends on| G
Compiler Ecosystem¶
ROCm provides a comprehensive set of compilers to support various programming languages and models. These compilers are essential for translating high-level code into optimized machine code for AMD GPUs.
C/C++ Compilers¶
- ROCm-LLVM (AMDGPU LLVM):
- AMD-maintained fork of LLVM/Clang with the AMDGPU back end
- Primary compiler for HIP, OpenCL, and OpenMP device code in ROCm
- AOMP (AMD OpenMP Compiler) (preview):
- Specialized for OpenMP target offloading to AMD GPUs
- Based on the LLVM project with specific optimizations for OpenMP
- Supports OpenMP 5.0+ features relevant to GPU offloading
- Currently a development-preview, not yet a full product
- Github
- AOCC (AMD Optimizing C/C++ Compiler):
- Primarily focused on AMD CPU optimization
- Can be used in conjunction with ROCm for heterogeneous computing
- Based on LLVM/Clang with AMD-specific optimizations
- Closed source
- hipcc:
- Compiler wrapper for HIP applications
- Simplifies the compilation process by handling complex flag combinations
- Part of the HIP package
Fortran Compilers¶
- AOCC (AMD Optimizing C/C++/Fortran Compiler):
- Based on Flang and LLVM
- Supports GPU offloading via OpenMP directives
- Optimized for AMD architectures
- Flang for ROCm (deprecated):
- AMD's earlier Fortran front end for ROCm, based on the classic Flang implementation
- Being superseded by the new LLVM Flang implementation (as described in LLVM's blog post), which brings improved compatibility and performance
- Github
Compilers Dependencies¶
graph LR;
subgraph Compilers
A[ROCm LLVM]
B[AOCC]
C[HIPCC]
end
subgraph ROCm Components
D[ROCm Core]
E[HIP]
end
A -->|Depends on| D
C -->|Depends on| E
Developer Tools¶
ROCm offers several tools to aid in development, debugging, and performance optimization:
- ROCgdb: Debugger for HIP and OpenCL applications
- ROCProfiler: Performance profiling tool
- rocm-cmake: CMake modules for ROCm
- ROCm Compute Profiler: Performance analysis tool for AMD GPUs
- ROCTracer: API tracing library
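To illustrate how tracing hooks into application code, here is a hedged sketch using the rocTX marker API shipped with ROCTracer (roctxRangePush/roctxRangePop), which lets ROCm profiling tools attribute time to named phases; the exact header path and link flag (-lroctx64 below) can differ between ROCm releases.

```cpp
// Sketch: annotate host code with rocTX ranges for ROCm profiling/tracing tools.
// Assumed header location and link flag (may vary per ROCm release):
//   hipcc annotated.cpp -lroctx64 -o annotated
#include <roctracer/roctx.h>

void initialize();
void compute();

int main() {
    roctxMark("application start");      // single point-in-time marker

    roctxRangePush("initialize");        // open a named range
    initialize();
    roctxRangePop();                     // close it

    roctxRangePush("compute");
    compute();
    roctxRangePop();

    return 0;
}

void initialize() { /* allocate and fill buffers */ }
void compute()    { /* launch kernels, etc. */ }
```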
Developer Tools Dependencies¶
graph LR;
subgraph Developer tools
D[ROCProfiler]
A[ROCm-cmake]
B[ROCTracer]
H[ROCm Compute Profiler]
C[ROCgdb]
end
subgraph ROCm Components
F[ROCm LLVM]
E[ROCm Core]
G[ROCminfo]
end
B -->|Depends on| E
B -->|Depends on| F
D -->|Depends on| F
D -->|Depends on| E
D -->|Depends on| G
A -->|Depends on| F
H -->|Depends on| E
Libraries and Frameworks¶
ROCm provides a rich set of libraries to accelerate various computational workloads.
Core Math Libraries¶
- rocBLAS: Basic Linear Algebra Subprograms implementation (a minimal usage sketch follows this list)
- rocSOLVER: Linear algebra solver library
- rocFFT: Fast Fourier Transform implementation
- rocRAND: Random number generator library
- rocSPARSE: Sparse matrix routines
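As an example of how these libraries are typically used, the following hedged sketch calls rocBLAS to compute y = alpha*x + y (SAXPY) on the device; the header path and link flag shown in the comments may differ slightly between ROCm releases.

```cpp
// Minimal rocBLAS sketch: SAXPY on the GPU (y = alpha*x + y).
// Link with -lrocblas; the header is <rocblas/rocblas.h> in recent ROCm
// releases (older releases used <rocblas.h>).
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1024;
    const float alpha = 2.0f;
    std::vector<float> hx(n, 1.0f), hy(n, 3.0f);

    float *dx, *dy;
    hipMalloc((void**)&dx, n * sizeof(float));
    hipMalloc((void**)&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    rocblas_saxpy(handle, n, &alpha, dx, 1, dy, 1);   // y = alpha*x + y

    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("y[0] = %f (expected 5.0)\n", hy[0]);

    rocblas_destroy_handle(handle);
    hipFree(dx);
    hipFree(dy);
    return 0;
}
```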
ML/DL Frameworks¶
- MIOpen: Deep learning primitives library
- ROCm TensorFlow: TensorFlow support for AMD GPUs
- ROCm PyTorch: PyTorch support for AMD GPUs
- RCCL: Communication library for multi-GPU/multi-node training
Communication Libraries¶
- ROCm Communication Collectives Library (RCCL): Optimized collective operations (see the all-reduce sketch after this list)
- UCX: Unified Communication X support
- ROCm MPI: Message Passing Interface integration
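For illustration, the sketch below performs a single-process all-reduce across two GPUs with RCCL, whose API mirrors NCCL; it assumes at least two visible devices, and the header path and link flag noted in the comments may vary between ROCm releases.

```cpp
// Sketch: single-process all-reduce over two GPUs with RCCL (NCCL-style API).
// Link with -lrccl; the header is <rccl/rccl.h> in recent ROCm releases
// (older releases used <rccl.h>). Assumes at least two visible GPUs.
#include <hip/hip_runtime.h>
#include <rccl/rccl.h>
#include <cstdio>

int main() {
    const int ndev = 2;
    const int count = 1024;
    int devs[ndev] = {0, 1};

    ncclComm_t comms[ndev];
    float* sendbuf[ndev];
    float* recvbuf[ndev];
    hipStream_t streams[ndev];

    // Allocate per-device buffers and streams.
    for (int i = 0; i < ndev; ++i) {
        hipSetDevice(devs[i]);
        hipMalloc((void**)&sendbuf[i], count * sizeof(float));
        hipMalloc((void**)&recvbuf[i], count * sizeof(float));
        hipMemset(sendbuf[i], 0, count * sizeof(float));
        hipStreamCreate(&streams[i]);
    }

    // One communicator per device, all created by this single process.
    ncclCommInitAll(comms, ndev, devs);

    // Group the per-device collective calls so they proceed concurrently.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        hipSetDevice(devs[i]);
        hipStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    std::printf("all-reduce finished\n");
    return 0;
}
```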
Compatibility Policies¶
ROCm Version and GPU Driver Compatibility¶
ROCm follows a versioning scheme that ensures compatibility between the software stack and GPU drivers.
- Major Version Compatibility:
- Major ROCm versions (e.g., 6.x) typically maintain driver compatibility within the same major version.
- Major version upgrades may require driver updates.
- Minor Version Compatibility:
- Minor versions (e.g., 6.4.x) are generally compatible with drivers designed for the same major version.
- Backward compatibility is maintained where possible, but newer hardware features may require newer drivers.
ROCm Version and glibc Compatibility¶
Versions of glibc supported by ROCm 6.4:
- 2.28
- 2.31
- 2.34
- 2.35
- 2.36
- 2.38
- 2.39
AMD GPUs in Azure¶
Azure offers several VM series featuring AMD GPUs. The following is an overview of available SKUs.
- NVv4 series
- cpu: AMD EPYC 7V12 (Rome) [x86-64]
- gpu: AMD Instinct MI25 GPU (16GB)
- Azure
- NGads_V620 series
- cpu: AMD EPYC 7763 (Milan) [x86-64]
- gpu: AMD Radeon PRO V620 GPU (32GB)
- Azure
- NVads_V710_v5 series
- cpu: AMD EPYC 9V64 F (Genoa) [x86-64]
- gpu: AMD Radeon™ Pro V710
- Azure
- ND-MI300X-V5 series
- cpu: Intel Xeon (Sapphire Rapids) [x86-64]
- gpu: AMD Instinct MI300X GPU (192GB)
- Azure
ABC of ROCm¶
AMDGPU Driver | AMD Instinct | AMD SMI | AOCC | AOMP | CDNA | GCN | GFX | HIP | HIPIFY | OpenCL | OpenMP | Platform Runtime | Radeon RX | RDNA | ROCm | ROCm-LLVM | ROCm SMI | Vega
A¶
AMDGPU Driver¶
The kernel-mode driver for AMD GPUs that provides the foundation for ROCm's functionality, including the Kernel Fusion Driver (KFD) that enables compute capabilities.
AMD Instinct¶
AMD Instinct is AMD's dedicated compute accelerator lineup for data centers and AI/HPC applications, optimized for the ROCm software platform.
AMD SMI¶
AMD SMI (System Management Interface) is a command-line tool within the ROCm ecosystem that allows users to query and control various aspects of AMD GPUs. It supersedes the older, more limited ROCm SMI interface and offers a broader range of management capabilities.
AOCC¶
AMD Optimizing C/C++/Fortran Compiler, AMD's optimizing compiler suite primarily focused on AMD CPU optimization but can work with ROCm for heterogeneous computing.
AOMP¶
AMD OpenMP Compiler, a specialized compiler (development-preview) for OpenMP target offloading to AMD GPUs, supporting OpenMP 5.0+ features for GPU computing.
C¶
CDNA¶
CDNA (Compute DNA) is AMD's GPU architecture optimized specifically for data center and high-performance computing workloads within the ROCm ecosystem.
G¶
GCN¶
GCN (Graphics Core Next) is AMD's older GPU architecture that served as the foundation for their compute-focused platforms in early ROCm releases.
GFX¶
GFX codes in AMD ROCm are architecture identifiers that specify GPU hardware generations, determining compatibility and optimization targets for HPC and machine learning workloads on AMD GPUs.
H¶
HIP¶
HIP (Heterogeneous-Compute Interface for Portability) is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD GPUs and NVIDIA GPUs, serving as a key component of the ROCm platform for high-performance computing and machine learning workloads.
HIPIFY¶
HIPIFY is a tool within AMD's ROCm platform that converts CUDA code into portable HIP (Heterogeneous-computing Interface for Portability) code to enable GPU applications to run on AMD hardware.
O¶
OpenCL¶
OpenCL is a framework that allows developers to write programs that execute across heterogeneous platforms (including AMD GPUs) by using the OpenCL runtime and compiler infrastructure provided within the ROCm ecosystem.
OpenMP¶
OpenMP is a parallel programming model supported through the ROCm toolchain that allows developers to write multi-threaded CPU and GPU code using familiar OpenMP directives, targeting AMD GPUs via the Clang/LLVM compiler infrastructure.
P¶
Platform Runtime¶
The Platform Runtime refers to the ROCr (ROCm Runtime) layer that provides low-level APIs for managing GPU resources, memory, and queues, forming the foundation upon which higher-level programming models like HIP operate.
R¶
Radeon RX¶
AMD Radeon RX is a consumer-focused GPU series primarily designed for gaming and content creation.
RDNA¶
RDNA (Radeon DNA) is AMD's consumer-focused graphics architecture optimized for gaming and media applications within the ROCm ecosystem.
ROCm¶
Radeon Open Compute is an open-source software stack developed by AMD for GPU computing and machine learning applications.
ROCm-LLVM¶
ROCm-LLVM is AMD's fork of the LLVM compiler infrastructure that provides code generation and optimization for AMD GPUs within the ROCm (Radeon Open Compute) platform, supporting high-performance computing and machine learning workloads.
ROCm SMI¶
ROCm SMI (System Management Interface) is a command-line utility for monitoring and managing AMD GPUs within the ROCm ecosystem, providing functionality to query hardware information, control power states, monitor temperature, configure memory, and manage device performance. ROCm SMI is an older, more limited interface primarily used for hardware monitoring that has been largely superseded by AMD SMI, which offers a broader range of management capabilities.
V¶
Vega¶
Vega refers to AMD's GPU architecture that was one of the first to fully support the ROCm ecosystem for high-performance computing and machine learning workloads.