Overview of ROCm Ecosystem (v6.4.1-20250526)

Work-in-progress

This document is a work-in-progress. It may still contain inaccuracies or mistakes.

This overview is being created in the context of adding support for ROCm to EESSI, the European Environment for Scientific Software Installations (https://eessi.io).

Last update: 26 May 2025

Table of Contents

  1. Introduction
  2. AMD GPU Microarchitectures
  3. Core Components
  4. Programming Models
  5. Compiler Ecosystem
  6. Developer Tools
  7. Libraries and Frameworks
  8. Compatibility Policies
  9. AMD GPUs in Azure

Introduction

The AMD ROCm™ (Radeon Open Compute) platform is an open-source software stack designed for GPU computing. ROCm 6.4.x provides a comprehensive set of tools, libraries, and software development kits that enable developers to harness the power of AMD's hardware accelerators.

ROCm serves as AMD's unified platform for high-performance computing (HPC), artificial intelligence (AI), and machine learning workloads, offering a viable alternative to NVIDIA's CUDA ecosystem. The platform is designed with portability, performance, and open standards in mind.

The ROCm software stack consists of six major parts:

  1. AMD GPU Microarchitectures: the microarchitectures used by AMD GPU hardware
  2. Core Components: software essential to using AMD GPUs (drivers, runtimes, etc)
  3. Programming Models: how to create programs that run on AMD GPUs
  4. Compiler Ecosystem: compilers with support for the programming models
  5. Developer Tools: debugging, profiling, and tracing tools
  6. Libraries and Frameworks: for common operations and programming structures

AMD GPU Microarchitectures

AMD's GPU architectures have evolved significantly over the years, with distinct product lines targeting different market segments.

CDNA (Compute DNA)

CDNA is AMD's data center and HPC-focused architecture for GPU compute workloads.

  1. CDNA 1 (2020)
    • Used in AMD Instinct MI100 accelerator
    • Matrix Core Technology for AI/ML workloads
  2. CDNA 2 (2021)
    • Powers AMD Instinct MI200 series
    • MCM (multi-chip module) design with chiplets
    • Infinity Fabric connections for multi-GPU scaling
  3. CDNA 3 (2023)
    • Powers AMD Instinct MI300 series
    • Integrates CPU and GPU in the same package, for example the MI300A APU (Accelerated Processing Unit)
    • Enhanced AI and HPC capabilities

RDNA (Radeon DNA)

RDNA is AMD's consumer-focused graphics architecture, designed for gaming and content creation.

  1. RDNA 1 (2019)
    • First introduced with the Radeon RX 5000 series
  2. RDNA 2 (2020)
    • Powers Radeon RX 6000 series
    • Used in PlayStation 5 and Xbox Series X/S consoles
  3. RDNA 3 (2022)
    • Powers Radeon RX 7000 series
    • Chiplet-based design (first for consumer GPUs)
  4. RDNA 4 (2024)
    • Powers the latest Radeon RX 9000 series

Earlier Architectures

  1. GCN (Graphics Core Next) - 2011-2019
    • Five generations (GCN 1-5)
    • Transitioned to RDNA for consumer products
    • Powered products from the Radeon HD 7000 series through the RX 500 and RX Vega series
  2. Vega (2017)
    • Based on GCN 5
    • Used in Radeon RX Vega and Radeon VII

GFX Codes

In LLVM, each AMDGPU processor has an architecture (GFX) code that indicates which specific microarchitecture is used. These codes are critical for hardware compatibility and optimization with ROCm. Generally, AMD uses the "gfxAB" format, where A is the major version and B consists of two further hexadecimal digits encoding the minor version and stepping (hence codes such as gfx90a). The shorter "gfxA" form is also used to refer to the family of architectures that share the same major version. A runtime query example follows the list below.

An overview of gfx codes:

  • GFX6 (GCN): gfx600, gfx601, gfx602
  • GFX7 (GCN): gfx700, ..., gfx705
  • GFX8 (GCN): gfx801, gfx802, gfx803, gfx805, gfx810
  • GFX9 (Vega): gfx900, gfx902, gfx904, gfx906
  • GFX9 (CDNA1): gfx908
  • GFX9 (CDNA2): gfx90a
  • GFX9 (CDNA3): gfx942
  • GFX10.1 (RDNA1): gfx1010, ..., gfx1013
  • GFX10.3 (RDNA2): gfx1030, ..., gfx1036
  • GFX11 (RDNA3): gfx1100, ..., gfx1103
  • GFX11 (RDNA3.5): gfx1150, ..., gfx1153
  • GFX12 (RDNA4): gfx1200, gfx1201

Source
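
At the application level, the gfx code of each visible GPU can be queried through the HIP runtime (or listed with the rocminfo tool). The sketch below assumes a working ROCm installation and compilation with hipcc; it reads the gcnArchName field of hipDeviceProp_t, which reports the LLVM target, possibly with target features appended (e.g. gfx90a:sramecc+:xnack-).

// query_gfx.cpp - print the gfx code of each visible AMD GPU (sketch)
// build (assumption): hipcc query_gfx.cpp -o query_gfx
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::fprintf(stderr, "no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        // gcnArchName holds the LLVM target, e.g. "gfx90a:sramecc+:xnack-"
        std::printf("device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
    }
    return 0;
}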

Core Components

Source Github

  • AMDGPU Driver with KFD
    • The kernel-mode driver for AMD GPUs
    • Github
  • Platform Runtime
    • Runtime that manages GPU resources, scheduling, and memory management
    • Github
  • ROCm-LLVM
    • AMD-maintained fork of the LLVM git repository
    • Github
  • AMD SMI (System Management Interface)
    • AMD SMI - equivalent to nvidia-smi
    • Successor to ROCm SMI
    • Github
  • ROCm SMI (System Management Interface) (deprecated)
    • ROCm SMI LIB - equivalent to nvidia-smi
    • Github
  • ROCmInfo
    • ROCm Application for Reporting System Info
    • Github
  • ROCTracer
    • ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs
    • Github
  • ROCm examples
    • A collection of examples for the ROCm software stack
    • Github

Core Components Dependencies

graph LR;
    subgraph Driver
        A[AMDGPU Driver with KFD]
    end
    subgraph Runtime
        B[ROCm Platform Runtime]
    end
    subgraph Compiler
        C[ROCm LLVM Compiler]
    end
    subgraph ROCm
        D[ROCm core]
    end
    subgraph AMD smi
        E[AMD smi]
    end
    subgraph Programming Model
        F[HIP]
    end
    subgraph Reporting
        G[ROCminfo]
    end
        B -->|Depends on| A
        B -->|Depends on| C
        B -->|Depends on| D
        D -->|Depends on| E
        D -->|Depends on| F
        G -->|Depends on| C

Programming Models

HIP (Heterogeneous-Computing Interface for Portability)

HIP is AMD's C++ runtime API and kernel language that allows developers to write portable code that can run on both AMD and NVIDIA GPUs. It's a key component of ROCm's strategy for facilitating code migration from CUDA.

  • HIP Github
  • CLR Github
  • Features:
    • CUDA-like programming model with familiar syntax
    • Source-level compatibility with CUDA
    • Tools to automate conversion of CUDA code (HIPIFY)
    • Runtime API and kernel language for GPU computing
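
A minimal vector-addition sketch illustrates the model: a __global__ kernel, hipMalloc/hipMemcpy for device memory, and a CUDA-style triple-chevron launch. The compile line and the gfx90a offload target in the comment are assumptions; adjust them to the local ROCm installation and GPU.

// vecadd.hip - minimal HIP example (a sketch, not a tuned implementation)
// build (assumption): hipcc --offload-arch=gfx90a vecadd.hip -o vecadd
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    float *da, *db, *dc;
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, b.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    vec_add<<<grid, block>>>(da, db, dc, n);   // CUDA-style kernel launch

    hipMemcpy(c.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("c[0] = %f (expected 3.0)\n", c[0]);

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}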

OpenMP Support

ROCm supports OpenMP offloading, which allows developers to use directive-based programming to offload computations to GPUs.

  • Features:
    • Familiar pragma-based approach
    • Incremental parallelization of existing CPU code
    • Support for target offload constructs
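
A minimal sketch of such an offloaded loop is shown below. The compile line is an assumption (ROCm's amdclang++ with -fopenmp and an --offload-arch matching the local GPU); the pragma itself is standard OpenMP.

// saxpy_omp.cpp - OpenMP target offload sketch
// build (assumption): amdclang++ -fopenmp --offload-arch=gfx90a saxpy_omp.cpp -o saxpy_omp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 1.0f);
    float* xp = x.data();
    float* yp = y.data();

    // Offload the loop to the GPU; map clauses move the arrays to and from device memory.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    std::printf("y[0] = %f (expected 3.0)\n", yp[0]);
    return 0;
}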

OpenCL Support

While not the primary focus of ROCm, OpenCL support is maintained for compatibility with existing code bases and as an open standard option.

Programming Models Dependencies

graph LR;
    subgraph Key Programming Models
        A[HIP]
        B[OpenMP]
        C[OpenCL]
    end
    subgraph Compiler
        D[ROCm-LLVM]
    end
    subgraph ROCm Components
        E[rocm-cmake]
        F[ROCmInfo]
        G[ROCm Core]
    end
        A -->|Depends on| D
        A -->|Depends on| E
        A -->|Depends on| F
        A -->|Depends on| G
        B -->|Depends on| G
        C -->|Depends on| G

Compiler Ecosystem

ROCm provides a comprehensive set of compilers to support various programming languages and models. These compilers are essential for translating high-level code into optimized machine code for AMD GPUs.

C/C++ Compilers

  • ROCm-LLVM (AMDGPU LLVM):
    • The foundation of ROCm's compiler toolchain
    • Based on LLVM/Clang infrastructure with AMD GPU-specific additions
    • Supports HIP, OpenMP offloading, and other programming models
    • Github
  • AOMP (AMD OpenMP Compiler) (preview):
    • Specialized for OpenMP target offloading to AMD GPUs
    • Based on the LLVM project with specific optimizations for OpenMP
    • Supports OpenMP 5.0+ features relevant to GPU offloading
    • Currently a development-preview, not yet a full product
    • Github
  • AOCC (AMD Optimizing C/C++ Compiler):
    • Primarily focused on AMD CPU optimization
    • Can be used in conjunction with ROCm for heterogeneous computing
    • Based on LLVM/Clang with AMD-specific optimizations
    • Closed source
  • hipcc:
    • Compiler wrapper for HIP applications
    • Simplifies compilation process by handling complex flag combinations
    • Part of the HIP package

Fortran Compilers

  • AOCC Fortran compiler (part of the AMD Optimizing C/C++ and Fortran Compilers suite):
    • Based on Flang and LLVM
    • Supports GPU offloading via OpenMP directives
    • Optimized for AMD architectures
  • Flang for ROCm (deprecated):
    • ROCm's classic Flang-based Fortran implementation, now deprecated
    • Being replaced by the new LLVM Flang implementation (as described in LLVM's blog post), which brings improved compatibility and performance
    • Github

Compilers Dependencies

graph LR;
    subgraph Compilers
        A[ROCm LLVM]
        B[AOCC]
        C[HIPCC]
    end
    subgraph ROCm Components
        D[ROCm Core]
        E[HIP]
    end
    A -->|Depends on| D
    C -->|Depends on| E

Developer Tools

ROCm offers several tools to aid in development, debugging, and performance optimization:

  • ROCgdb: Debugger for HIP and OpenCL applications
  • ROCProfiler: Performance profiling tool
  • rocm-cmake: CMake modules for ROCm
  • ROCm Compute Profiler: Performance analysis tool for AMD GPUs
  • ROCTracer: API tracing library (see the annotation sketch after this list)
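
ROCProfiler and ROCTracer can attribute time to application-defined regions when the code is annotated with the ROCTX markup API that ships with ROCTracer. A minimal sketch follows; the header path (roctracer/roctx.h) and link flag (-lroctx64) are assumptions based on a default ROCm installation.

// roctx_demo.cpp - annotate a code region for tracing/profiling tools (sketch)
// build (assumption): hipcc roctx_demo.cpp -lroctx64 -o roctx_demo
#include <roctracer/roctx.h>

void expensive_phase() {
    // ... application work (kernels, memcpys, ...) ...
}

int main() {
    roctxMark("start of run");           // a single instant marker
    roctxRangePush("expensive_phase");   // open a named range
    expensive_phase();
    roctxRangePop();                     // close the innermost open range
    return 0;
}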

Developer Tools Dependencies

graph LR;
    subgraph Developer tools
        D[ROCProfiler]
        A[ROCm-cmake]
        B[ROCTracer]
        H[ROCm Compute Profiler]
        C[ROCgdb]
    end
    subgraph ROCm Components
        F[ROCm LLVM]
        E[ROCm Core]
        G[ROCminfo]
    end
    B -->|Depends on| E
    B -->|Depends on| F
    D -->|Depends on| F
    D -->|Depends on| E
    D -->|Depends on| G
    A -->|Depends on| F
    H -->|Depends on| E

Libraries and Frameworks

ROCm provides a rich set of libraries to accelerate various computational workloads.

Core Math Libraries

  • rocBLAS: Basic Linear Algebra Subprograms implementation (usage sketch after this list)
  • rocSOLVER: Linear algebra solver library
  • rocFFT: Fast Fourier Transform implementation
  • rocRAND: Random number generator library
  • rocSPARSE: Sparse matrix routines
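
As an illustration of how these libraries are used from host code, here is a minimal rocBLAS sketch computing y = alpha*x + y (SAXPY) on device memory: create a handle, pass it device pointers, and destroy it. The header path rocblas/rocblas.h and the -lrocblas link flag are assumptions that may differ between ROCm versions.

// saxpy_rocblas.cpp - y = alpha*x + y with rocBLAS (sketch)
// build (assumption): hipcc saxpy_rocblas.cpp -lrocblas -o saxpy_rocblas
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    const float alpha = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 1.0f);

    float *dx, *dy;
    hipMalloc((void**)&dx, n * sizeof(float));
    hipMalloc((void**)&dy, n * sizeof(float));
    hipMemcpy(dx, x.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dy, y.data(), n * sizeof(float), hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    rocblas_saxpy(handle, n, &alpha, dx, 1, dy, 1);   // alpha read from host memory (default pointer mode)
    rocblas_destroy_handle(handle);

    hipMemcpy(y.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("y[0] = %f (expected 3.0)\n", y[0]);

    hipFree(dx); hipFree(dy);
    return 0;
}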

ML/DL Frameworks

  • MIOpen: Deep learning primitives library
  • ROCm TensorFlow: TensorFlow support for AMD GPUs
  • ROCm PyTorch: PyTorch support for AMD GPUs
  • RCCL: Communication library for multi-GPU/multi-node training

Communication Libraries

  • ROCm Communication Collectives Library (RCCL): Optimized collective operations
  • UCX: Unified Communication X support
  • ROCm MPI: Message Passing Interface integration

Compatibility Policies

Source

ROCm Version and GPU Driver Compatibility

ROCm follows a versioning scheme that ensures compatibility between the software stack and GPU drivers.

  1. Major Version Compatibility:
    • Major ROCm versions (e.g., 6.x) typically maintain driver compatibility within the same major version.
    • Major version upgrades may require driver updates.
  2. Minor Version Compatibility:
    • Minor versions (e.g., 6.4.x) are generally compatible with drivers designed for the same major version.
    • Backward compatibility is maintained where possible, but newer hardware features may require newer drivers.
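
At run time, an application can check which HIP runtime and driver it is actually paired with, which helps diagnose mismatches between the ROCm user-space stack and the installed kernel driver. Below is a minimal sketch using hipRuntimeGetVersion and hipDriverGetVersion; the integer encodings differ between releases, so treat them as opaque version stamps.

// versions.cpp - print HIP runtime and driver versions (sketch)
// build (assumption): hipcc versions.cpp -o versions
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int runtime = 0, driver = 0;
    hipRuntimeGetVersion(&runtime);   // version of the HIP runtime the application is using
    hipDriverGetVersion(&driver);     // version reported by the underlying driver stack
    std::printf("HIP runtime: %d, driver: %d\n", runtime, driver);
    return 0;
}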

ROCm Version and glibc Compatibility

Source

Versions of glibc supported by ROCm 6.4:

  • 2.28
  • 2.31
  • 2.34
  • 2.35
  • 2.36
  • 2.38
  • 2.39

AMD GPUs in Azure

Azure offers several VM series featuring AMD GPUs. The following is an overview of available SKUs.

Source

  • NVv4 series
    • cpu: AMD EPYC 7V12 (Rome) [x86-64]
    • gpu: AMD Instinct MI25 GPU (16GB)
    • Azure
  • NGads_V620 series
    • cpu: AMD EPYC 7763 (Milan) [x86-64]
    • gpu: AMD Radeon PRO V620 GPU (32GB)
    • Azure
  • NVads_V710_v5 series
    • cpu: AMD EPYC 9V64F (Genoa) [x86-64]
    • gpu: AMD Radeon Pro V710
    • Azure
  • ND-MI300X-V5 series
    • cpu: Intel Xeon (Sapphire Rapids) [x86-64]
    • gpu: AMD Instinct MI300X GPU (192GB)
    • Azure

ABC of ROCm

AMDGPU Driver | AMD Instinct | AMD SMI | AOCC | AOMP | CDNA | GCN | GFX | HIP | HIPIFY | OpenCL | OpenMP | Platform Runtime | Radeon RX | RDNA | ROCm | ROCm-LLVM | ROCm SMI | Vega

A

AMDGPU Driver

The kernel-mode driver for AMD GPUs that provides the foundation for ROCm's functionality, including the Kernel Fusion Driver (KFD) that enables compute capabilities.

AMD Instinct

AMD Instinct is AMD's dedicated compute accelerator lineup for data centers and AI/HPC applications, optimized for the ROCm software platform.

AMD Docs

AMD SMI

AMD SMI (System Management Interface) is a command-line tool within the ROCm ecosystem that allows users to query and control various aspects of AMD GPUs. It supersedes ROCm SMI, an older, more limited interface primarily used for hardware monitoring, and offers a broader range of management capabilities.

AMD Docs

AOCC

AOCC (AMD Optimizing C/C++ and Fortran Compilers) is AMD's optimizing compiler suite. It is primarily focused on AMD CPU optimization, but it can be used together with ROCm for heterogeneous computing.

AMD Docs

AOMP

AMD OpenMP Compiler, a specialized compiler (development-preview) for OpenMP target offloading to AMD GPUs, supporting OpenMP 5.0+ features for GPU computing.

C

CDNA

CDNA (Compute DNA) is AMD's GPU architecture optimized specifically for data center and high-performance computing workloads within the ROCm ecosystem.

AMD Docs and AMD Docs

G

GCN

GCN (Graphics Core Next) is AMD's older GPU architecture that served as the foundation for their compute-focused platforms in early ROCm releases.

AMD Docs and AMD Docs

GFX

GFX codes in AMD ROCm are architecture identifiers that specify GPU hardware generations, determining compatibility and optimization targets for HPC and machine learning workloads on AMD GPUs.

AMD Docs and AMD Docs

H

HIP

HIP (Heterogeneous-computing Interface for Portability) is AMD's C++ runtime API and kernel language that allows developers to write portable code that runs on both AMD and NVIDIA GPUs, serving as a key component of the ROCm platform for high-performance computing and machine learning workloads.

AMD Docs

HIPIFY

HIPIFY is a tool within AMD's ROCm platform that converts CUDA code into portable HIP (Heterogeneous-computing Interface for Portability) code to enable GPU applications to run on AMD hardware.

AMD Docs
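
The conversion is largely a mechanical cudaX-to-hipX renaming (for example, hipify-perl kernel.cu > kernel.hip). Below is a hedged sketch of what the hipified output of a small CUDA file typically looks like, with the original CUDA calls noted in comments; the exact output depends on the HIPIFY version used.

// hipified.cpp - illustrative result of running HIPIFY on a small CUDA file (sketch)
// build (assumption): hipcc hipified.cpp -o hipified
#include <hip/hip_runtime.h>   // was: #include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* a, float s, int n) {   // kernel body is unchanged
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= s;
}

int main() {
    const int n = 256;
    const size_t bytes = n * sizeof(float);
    float h_a[n];
    for (int i = 0; i < n; ++i) h_a[i] = 1.0f;

    float* d_a;
    hipMalloc((void**)&d_a, bytes);                      // was: cudaMalloc
    hipMemcpy(d_a, h_a, bytes, hipMemcpyHostToDevice);   // was: cudaMemcpy(..., cudaMemcpyHostToDevice)
    scale<<<(n + 63) / 64, 64>>>(d_a, 2.0f, n);          // launch syntax is unchanged
    hipMemcpy(h_a, d_a, bytes, hipMemcpyDeviceToHost);   // was: cudaMemcpy(..., cudaMemcpyDeviceToHost)
    hipFree(d_a);                                        // was: cudaFree
    std::printf("h_a[0] = %f (expected 2.0)\n", h_a[0]);
    return 0;
}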

O

OpenCL

OpenCL is a framework that allows developers to write programs that execute across heterogeneous platforms (including AMD GPUs) by using the OpenCL runtime and compiler infrastructure provided within the ROCm ecosystem.

OpenMP

OpenMP is a parallel programming model supported through the ROCm toolchain that allows developers to write multi-threaded CPU and GPU code using familiar OpenMP directives, targeting AMD GPUs via the Clang/LLVM compiler infrastructure.

P

Platform Runtime

The Platform Runtime refers to the ROCr (ROCm Runtime) layer that provides low-level APIs for managing GPU resources, memory, and queues, forming the foundation upon which higher-level programming models like HIP operate.

AMD Docs

R

Radeon RX

AMD Radeon RX is a consumer-focused GPU series primarily designed for gaming and content creation.

RDNA

RDNA (Radeon DNA) is AMD's consumer-focused graphics architecture optimized for gaming and media applications within the ROCm ecosystem.

AMD Docs and AMD Docs

ROCm

Radeon Open Compute is an open-source software stack developed by AMD for GPU computing and machine learning applications.

AMD Docs

ROCm-LLVM

AMD ROCm's LLVM implementation is a modified version of the LLVM compiler infrastructure that enables GPU code generation, optimization, and execution for AMD GPUs within the ROCm (Radeon Open Compute) platform, providing essential support for high-performance computing and machine learning workloads.

AMD Docs

ROCm SMI

ROCm SMI (System Management Interface) is a command-line utility for monitoring and managing AMD GPUs within the ROCm ecosystem, providing functionality to query hardware information, control power states, monitor temperature, configure memory, and manage device performance. ROCm SMI is an older, more limited interface primarily used for hardware monitoring that has been largely superseded by AMD SMI, which offers a broader range of management capabilities.

AMD Docs

V

Vega

Vega refers to AMD's GPU architecture that was one of the first to fully support the ROCm ecosystem for high-performance computing and machine learning workloads.

AMD Docs and AMD Docs