Overview of ROCm Ecosystem (v6.4.1-20250526)¶
Work-in-progress
This document is a work-in-progress. It may still contain inaccuracies or mistakes.
This overview is being created in the context of adding support for ROCm to EESSI, the European Environment for Scientific Software Installations (https://eessi.io).
Last update: 26 May 2025
Jump to Overview | Jump to ABC
Table of Contents¶
- Introduction
- AMD GPU Microarchitectures
- Core Components
- Programming Models
- Compiler Ecosystem
- Developer Tools
- Libraries and Frameworks
- Compatibility Policies
- AMD GPUs in Azure
Introduction¶
The AMD ROCm™ (Radeon Open Compute) platform is an open-source software stack designed for GPU computing. ROCm 6.4.x provides a comprehensive set of tools, libraries, and software development kits that enable developers to harness the power of AMD's hardware accelerators.
ROCm serves as AMD's unified platform for high-performance computing (HPC), artificial intelligence (AI), and machine learning workloads, offering a viable alternative to NVIDIA's CUDA ecosystem. The platform is designed with portability, performance, and open standards in mind.
This overview of the ROCm ecosystem is organized into six major parts:
- AMD GPU Microarchitectures: the microarchitectures used by AMD GPU hardware
- Core Components: software essential to using AMD GPUs (drivers, runtimes, etc)
- Programming Models: how to create programs that run on AMD GPUs
- Compiler Ecosystem: compilers with support for the programming models
- Developer Tools: debugging, profiling, and tracing tools
- Libraries and Frameworks: for common operations and programming structures
AMD GPU Microarchitectures¶
AMD's GPU architectures have evolved significantly over the years, with distinct product lines targeting different market segments.
CDNA (Compute DNA)¶
CDNA is AMD's data center and HPC-focused architecture for GPU compute workloads.
- CDNA 1 (2020)
- Used in AMD Instinct MI100 accelerator
- Matrix Core Technology for AI/ML workloads
- CDNA 2 (2021)
- Powers AMD Instinct MI200 series
- MCM (multi-chip module) design with chiplets
- Infinity Fabric connections for multi-GPU scaling
- CDNA 3 (2023)
- Powers AMD Instinct MI300 series
- Integrates CPU and GPU in the same package (APU, Accelerated Processing Unit; for example the MI300A)
- Enhanced AI and HPC capabilities
RDNA (Radeon DNA)¶
RDNA is AMD's consumer-focused graphics architecture, designed for gaming and content creation.
- RDNA 1 (2019)
- First introduced with the Radeon RX 5000 series
- RDNA 2 (2020)
- Powers Radeon RX 6000 series
- Used in PlayStation 5 and Xbox Series X/S consoles
- RDNA 3 (2022)
- Powers Radeon RX 7000 series
- Chiplet-based design (first for consumer GPUs)
- RDNA 4 (2024)
- Powers the latest Radeon RX 9000 series
Earlier Architectures¶
- GCN (Graphics Core Next) - 2011-2019
- Five generations (GCN 1-5)
- Transitioned to RDNA for consumer products
- Powered Radeon HD 7000 through RX Vega and some RX 500 series
- Vega (2017)
- Based on GCN 5
- Used in Radeon RX Vega and Radeon VII
GFX Codes¶
In LLVM, each AMDGPU processor has an architecture (GFX) code that indicates which specific microarchitecture is used. These codes are critical for hardware compatibility and optimization with ROCm. Generally, AMD uses the "gfxAB" format, where A is a major version indicator and B is a two-digit minor version indicator. The format "gfxA" is also used to refer to a family of architectures with the same major version indicator.
An overview of gfx codes (a short example of querying a device's gfx code at runtime follows the list):
- GFX6 (GCN): gfx600, gfx601, gfx602
- GFX7 (GCN): gfx700, ..., gfx705
- GFX8 (GCN): gfx801, gfx802, gfx803, gfx805, gfx810
- GFX9 (Vega): gfx900, gfx902, gfx904, gfx906
- GFX9 (CDNA1): gfx908
- GFX9 (CDNA2): gfx90a
- GFX9 (CDNA3): gfx942
- GFX10.1 (RDNA1): gfx1010, ..., gfx1013
- GFX10.3 (RDNA2): gfx1030, ..., gfx1036
- GFX11 (RDNA3): gfx1100, ..., gfx1103
- GFX11 (RDNA3.5): gfx1150, ..., gfx1153
- GFX12 (RDNA4): gfx1200, gfx1201
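As an illustration of how these codes are used in practice, the sketch below (not taken from the ROCm documentation) queries the gfx code of each visible GPU through the HIP runtime; the same codes are what hipcc expects in its --offload-arch flag, as noted in the comments. The field and flag names are standard HIP, but details may vary between ROCm releases.

```cpp
// Minimal sketch: print the gfx code of each visible GPU via the HIP runtime.
// hipDeviceProp_t::gcnArchName reports strings such as "gfx90a" or "gfx942",
// which must match (or be compatible with) the targets given at compile time, e.g.:
//   hipcc --offload-arch=gfx90a --offload-arch=gfx942 query_gfx.cpp -o query_gfx
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::fprintf(stderr, "No HIP-capable device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, dev);
        std::printf("Device %d: %s (arch: %s)\n", dev, prop.name, prop.gcnArchName);
    }
    return 0;
}
```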
Core Components¶
- AMDGPU Driver with KFD
- The kernel-mode driver for AMD GPUs
- Github
- Platform Runtime
- Runtime that manages GPU resources, scheduling, and memory management
- Github
- ROCm-LLVM
- AMD-maintained fork of the LLVM git repository
- Github
- AMD SMI (System Management Interface)
- AMD SMI - equivalent to nvidia-smi
- Successor to ROCm SMI
- Github
- ROCm SMI (System Management Interface) (deprecated)
- ROCm SMI LIB - equivalent to nvidia-smi
- Github
- ROCmInfo
- ROCm Application for Reporting System Info
- Github
- ROCTracer
- ROCm tracer callback/activity library for performance tracing of AMD GPUs
- Github
- ROCm examples
- A collection of examples for the ROCm software stack
- Github
Core components dependencies¶
graph LR;
subgraph Driver
A[AMDGPU Driver with KFD]
end
subgraph Runtime
B[ROCm Platform Runtime]
end
subgraph Compiler
C[ROCm LLVM Compiler]
end
subgraph ROCm
D[ROCm core]
end
subgraph AMD smi
E[AMD smi]
end
subgraph Programming Model
F[HIP]
end
subgraph Reporting
G[ROCminfo]
end
B -->|Depends on| A
B -->|Depends on| C
B -->|Depends on| D
D -->|Depends on| E
D -->|Depends on| F
G -->|Depends on| C
Programming Models¶
HIP (Heterogeneous-Computing Interface for Portability)¶
HIP is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD and NVIDIA GPUs. It's a key component of ROCm's strategy for facilitating code migration from CUDA.
- HIP Github
- CLR Github
- Features:
- C++ kernel language and runtime API closely modeled on CUDA
- Single source code that can target both AMD GPUs (via ROCm) and NVIDIA GPUs (via CUDA)
- HIPIFY tools for translating existing CUDA code to HIP
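As a concrete illustration of the programming model, here is a minimal HIP sketch (not taken from the ROCm documentation): a vector-add kernel together with the host-side allocation, copies, and launch. It should build with hipcc on a supported GPU, though the exact flags depend on the target architecture.

```cpp
// Minimal HIP sketch: vector addition on the GPU.
// Compile with, e.g.: hipcc vector_add.cpp -o vector_add
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    // Allocate device buffers and copy the inputs over.
    float *da, *db, *dc;
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Launch one thread per element.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("c[0] = %f (expected 3.0)\n", hc[0]);

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```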
OpenMP Support¶
ROCm supports OpenMP offloading, which allows developers to use directive-based programming to offload computations to GPUs.
- Features:
- Familiar pragma-based approach
- Incremental parallelization of existing CPU code
- Support for target offload constructs
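A minimal sketch of what OpenMP offloading looks like in practice is shown below; the pragma is standard OpenMP target offload, while the compile command in the comment (using amdclang++ and --offload-arch) is only illustrative and may differ between ROCm releases.

```cpp
// Minimal OpenMP target-offload sketch (SAXPY: y = a*x + y on the GPU).
// Illustrative compile command (names/flags may vary per ROCm release):
//   amdclang++ -fopenmp --offload-arch=gfx90a saxpy_omp.cpp -o saxpy_omp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);
    float* xp = x.data();
    float* yp = y.data();

    // Map the arrays to the device, run the loop there, copy y back.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    std::printf("y[0] = %f (expected 5.0)\n", yp[0]);
    return 0;
}
```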
OpenCL Support¶
While not the primary focus of ROCm, OpenCL support is maintained for compatibility with existing code bases and as an open standard option.
Programming Models Dependencies¶
graph LR;
subgraph Key Programming Models
A[HIP]
B[OpenMP]
C[OpenCL]
end
subgraph Compiler
D[ROCm-LLVM]
end
subgraph ROCm Components
E[rocm-cmake]
F[ROCmInfo]
G[ROCm Core]
end
A -->|Depends on| D
A -->|Depends on| E
A -->|Depends on| F
A -->|Depends on| G
B -->|Depends on| G
C -->|Depends on| G
Compiler Ecosystem¶
ROCm provides a comprehensive set of compilers to support various programming languages and models. These compilers are essential for translating high-level code into optimized machine code for AMD GPUs.
C/C++ Compilers¶
- ROCm-LLVM (AMDGPU LLVM):
- AMD-maintained fork of LLVM/Clang with the AMDGPU back end
- Primary compiler for HIP, OpenCL, and OpenMP device code in ROCm
- AOMP (AMD OpenMP Compiler) (preview):
- Specialized for OpenMP target offloading to AMD GPUs
- Based on the LLVM project with specific optimizations for OpenMP
- Supports OpenMP 5.0+ features relevant to GPU offloading
- Currently a development-preview, not yet a full product
- Github
- AOCC (AMD Optimizing C/C++ Compiler):
- Primarily focused on AMD CPU optimization
- Can be used in conjunction with ROCm for heterogeneous computing
- Based on LLVM/Clang with AMD-specific optimizations
- Closed source
- hipcc:
- Compiler wrapper for HIP applications
- Simplifies the compilation process by handling complex flag combinations
- Part of the HIP package
Fortran Compilers¶
- AOCC (AMD Optimizing C/C++/Fortran Compiler):
- Based on Flang and LLVM
- Supports GPU offloading via OpenMP directives
- Optimized for AMD architectures
- Flang for ROCm (deprecated):
- AMD's earlier Fortran front end for ROCm, based on the classic Flang implementation
- Being superseded by the new LLVM Flang implementation (as described in LLVM's blog post), which brings improved compatibility and performance
- Github
Compilers Dependencies¶
graph LR;
subgraph Compilers
A[ROCm LLVM]
B[AOCC]
C[HIPCC]
end
subgraph ROCm Components
D[ROCm Core]
E[HIP]
end
A -->|Depends on| D
C -->|Depends on| E
Developer Tools¶
ROCm offers several tools to aid in development, debugging, and performance optimization:
- ROCgdb: Debugger for HIP and OpenCL applications
- ROCProfiler: Performance profiling tool
- rocm-cmake: CMake modules for ROCm
- ROCm Compute Profiler: Performance analysis tool for AMD GPUs
- ROCTracer: API tracing library
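To illustrate how tracing hooks into application code, here is a hedged sketch using the rocTX marker API shipped with ROCTracer (roctxRangePush/roctxRangePop), which lets ROCm profiling tools attribute time to named phases; the exact header path and link flag (-lroctx64 below) can differ between ROCm releases.

```cpp
// Sketch: annotate host code with rocTX ranges for ROCm profiling/tracing tools.
// Assumed header location and link flag (may vary per ROCm release):
//   hipcc annotated.cpp -lroctx64 -o annotated
#include <roctracer/roctx.h>

void initialize();
void compute();

int main() {
    roctxMark("application start");      // single point-in-time marker

    roctxRangePush("initialize");        // open a named range
    initialize();
    roctxRangePop();                     // close it

    roctxRangePush("compute");
    compute();
    roctxRangePop();

    return 0;
}

void initialize() { /* allocate and fill buffers */ }
void compute()    { /* launch kernels, etc. */ }
```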
Developer Tools Dependencies¶
graph LR;
subgraph Developer tools
D[ROCProfiler]
A[ROCm-cmake]
B[ROCTracer]
H[ROCm Compute Profiler]
C[ROCgdb]
end
subgraph ROCm Components
F[ROCm LLVM]
E[ROCm Core]
G[ROCminfo]
end
B -->|Depends on| E
B -->|Depends on| F
D -->|Depends on| F
D -->|Depends on| E
D -->|Depends on| G
A -->|Depends on| F
H -->|Depends on| E
Libraries and Frameworks¶
ROCm provides a rich set of libraries to accelerate various computational workloads.
Core Math Libraries¶
- rocBLAS: Basic Linear Algebra Subprograms implementation (a minimal usage sketch follows this list)
- rocSOLVER: Linear algebra solver library
- rocFFT: Fast Fourier Transform implementation
- rocRAND: Random number generator library
- rocSPARSE: Sparse matrix routines
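As an example of how these libraries are typically used, the following hedged sketch calls rocBLAS to compute y = alpha*x + y (SAXPY) on the device; the header path and link flag shown in the comments may differ slightly between ROCm releases.

```cpp
// Minimal rocBLAS sketch: SAXPY on the GPU (y = alpha*x + y).
// Link with -lrocblas; the header is <rocblas/rocblas.h> in recent ROCm
// releases (older releases used <rocblas.h>).
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1024;
    const float alpha = 2.0f;
    std::vector<float> hx(n, 1.0f), hy(n, 3.0f);

    float *dx, *dy;
    hipMalloc((void**)&dx, n * sizeof(float));
    hipMalloc((void**)&dy, n * sizeof(float));
    hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    rocblas_saxpy(handle, n, &alpha, dx, 1, dy, 1);   // y = alpha*x + y

    hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("y[0] = %f (expected 5.0)\n", hy[0]);

    rocblas_destroy_handle(handle);
    hipFree(dx);
    hipFree(dy);
    return 0;
}
```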
ML/DL Frameworks¶
- MIOpen: Deep learning primitives library
- ROCm TensorFlow: TensorFlow support for AMD GPUs
- ROCm PyTorch: PyTorch support for AMD GPUs
- RCCL: Communication library for multi-GPU/multi-node training
Communication Libraries¶
- ROCm Communication Collectives Library (RCCL): Optimized collective operations (see the all-reduce sketch after this list)
- UCX: Unified Communication X support
- ROCm MPI: Message Passing Interface integration
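For illustration, the sketch below performs a single-process all-reduce across two GPUs with RCCL, whose API mirrors NCCL; it assumes at least two visible devices, and the header path and link flag noted in the comments may vary between ROCm releases.

```cpp
// Sketch: single-process all-reduce over two GPUs with RCCL (NCCL-style API).
// Link with -lrccl; the header is <rccl/rccl.h> in recent ROCm releases
// (older releases used <rccl.h>). Assumes at least two visible GPUs.
#include <hip/hip_runtime.h>
#include <rccl/rccl.h>
#include <cstdio>

int main() {
    const int ndev = 2;
    const int count = 1024;
    int devs[ndev] = {0, 1};

    ncclComm_t comms[ndev];
    float* sendbuf[ndev];
    float* recvbuf[ndev];
    hipStream_t streams[ndev];

    // Allocate per-device buffers and streams.
    for (int i = 0; i < ndev; ++i) {
        hipSetDevice(devs[i]);
        hipMalloc((void**)&sendbuf[i], count * sizeof(float));
        hipMalloc((void**)&recvbuf[i], count * sizeof(float));
        hipMemset(sendbuf[i], 0, count * sizeof(float));
        hipStreamCreate(&streams[i]);
    }

    // One communicator per device, all created by this single process.
    ncclCommInitAll(comms, ndev, devs);

    // Group the per-device collective calls so they proceed concurrently.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        hipSetDevice(devs[i]);
        hipStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    std::printf("all-reduce finished\n");
    return 0;
}
```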
Compatibility Policies¶
ROCm Version and GPU Driver Compatibility¶
ROCm follows a versioning scheme that ensures compatibility between the software stack and GPU drivers.
- Major Version Compatibility:
- Major ROCm versions (e.g., 6.x) typically maintain driver compatibility within the same major version.
- Major version upgrades may require driver updates.
- Minor Version Compatibility:
- Minor versions (e.g., 6.4.x) are generally compatible with drivers designed for the same major version.
- Backward compatibility is maintained where possible, but newer hardware features may require newer drivers.
ROCm Version and glibc Compatibility¶
Versions of glibc supported by ROCm 6.4:
- 2.28
- 2.31
- 2.34
- 2.35
- 2.36
- 2.38
- 2.39
AMD GPUs in Azure¶
Azure offers several VM series featuring AMD GPUs. The following is an overview of available SKUs.
- NVv4 series
- cpu: AMD EPYC 7V12 (Rome) [x86-64]
- gpu: AMD Instinct MI25 GPU (16GB)
- Azure
- NGads_V620 series
- cpu: AMD EPYC 7763 (Milan) [x86-64]
- gpu: AMD Radeon PRO V620 GPU (32GB)
- Azure
- NVads_V710_v5 series
- cpu: AMD EPYC 9V64 F (Genoa) [x86-64]
- gpu: AMD Radeon™ Pro V710
- Azure
- ND-MI300X-V5 series
- cpu: Intel Xeon (Sapphire Rapids) [x86-64]
- gpu: AMD Instinct MI300X GPU (192GB)
- Azure
ABC of ROCm¶
AMDGPU Driver | AMD Instinct | AMD SMI | AOCC | AOMP | CDNA | GCN | GFX | HIP | HIPIFY | OpenCL | OpenMP | Platform Runtime | Radeon RX | RDNA | ROCm | ROCm-LLVM | ROCm SMI | Vega
A¶
AMDGPU Driver¶
The kernel-mode driver for AMD GPUs that provides the foundation for ROCm's functionality, including the Kernel Fusion Driver (KFD) that enables compute capabilities.
AMD Instinct¶
AMD Instinct is AMD's dedicated compute accelerator lineup for data centers and AI/HPC applications, optimized for the ROCm software platform.
AMD SMI¶
AMD SMI (System Management Interface) is a command-line tool within the ROCm ecosystem that allows users to query and control various aspects of AMD GPUs. It supersedes the older, more limited ROCm SMI interface and offers a broader range of management capabilities.
AOCC¶
AMD Optimizing C/C++/Fortran Compiler, AMD's optimizing compiler suite primarily focused on AMD CPU optimization but can work with ROCm for heterogeneous computing.
AOMP¶
AMD OpenMP Compiler, a specialized compiler (development-preview) for OpenMP target offloading to AMD GPUs, supporting OpenMP 5.0+ features for GPU computing.
C¶
CDNA¶
CDNA (Compute DNA) is AMD's GPU architecture optimized specifically for data center and high-performance computing workloads within the ROCm ecosystem.
G¶
GCN¶
GCN (Graphics Core Next) is AMD's older GPU architecture that served as the foundation for their compute-focused platforms in early ROCm releases.
GFX¶
GFX codes in AMD ROCm are architecture identifiers that specify GPU hardware generations, determining compatibility and optimization targets for HPC and machine learning workloads on AMD GPUs.
H¶
HIP¶
HIP (Heterogeneous-Compute Interface for Portability) is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD GPUs and NVIDIA GPUs, serving as a key component of the ROCm platform for high-performance computing and machine learning workloads.
HIPIFY¶
HIPIFY is a tool within AMD's ROCm platform that converts CUDA code into portable HIP (Heterogeneous-computing Interface for Portability) code to enable GPU applications to run on AMD hardware.
O¶
OpenCL¶
OpenCL is a framework that allows developers to write programs that execute across heterogeneous platforms (including AMD GPUs) by using the OpenCL runtime and compiler infrastructure provided within the ROCm ecosystem.
OpenMP¶
OpenMP is a parallel programming model supported through the ROCm toolchain that allows developers to write multi-threaded CPU and GPU code using familiar OpenMP directives, targeting AMD GPUs via the Clang/LLVM compiler infrastructure.
P¶
Platform Runtime¶
The Platform Runtime refers to the ROCr (ROCm Runtime) layer that provides low-level APIs for managing GPU resources, memory, and queues, forming the foundation upon which higher-level programming models like HIP operate.
R¶
Radeon RX¶
AMD Radeon RX is a consumer-focused GPU series primarily designed for gaming and content creation.
RDNA¶
RDNA (Radeon DNA) is AMD's consumer-focused graphics architecture optimized for gaming and media applications within the ROCm ecosystem.
ROCm¶
Radeon Open Compute is an open-source software stack developed by AMD for GPU computing and machine learning applications.
ROCm-LLVM¶
ROCm-LLVM is AMD's fork of the LLVM compiler infrastructure that provides code generation and optimization for AMD GPUs within the ROCm (Radeon Open Compute) platform, supporting high-performance computing and machine learning workloads.
ROCm SMI¶
ROCm SMI (System Management Interface) is a command-line utility for monitoring and managing AMD GPUs within the ROCm ecosystem, providing functionality to query hardware information, control power states, monitor temperature, configure memory, and manage device performance. ROCm SMI is an older, more limited interface primarily used for hardware monitoring that has been largely superseded by AMD SMI, which offers a broader range of management capabilities.
V¶
Vega¶
Vega refers to AMD's GPU architecture that was one of the first to fully support the ROCm ecosystem for high-performance computing and machine learning workloads.