Blog

MPI at Warp Speed: EESSI Meets Slingshot-11

High-performance computing environments are constantly evolving, and keeping pace with the latest interconnect technologies is crucial for maximising application performance. However, we cannot rebuild all the software in EESSI that depends on improvements to communication libraries. So how do we take advantage of new technological developments?

Specifically, we look at taking advantage of HPE/Cray Slingshot-11. Slingshot-11 promises a significant advancement in HPC networking: improved bandwidth, lower latency, and better scalability for exascale computing workloads ... so this should be worth the effort!

In this blog post, we present the requirements for building OpenMPI 5.x with Slingshot-11 support on HPE/Cray systems, and show how to integrate it with EESSI via the host_injections mechanism for injecting custom-built OpenMPI libraries. This approach overrides EESSI's default MPI library with an ABI-compatible, Slingshot-optimized version, which should deliver optimal performance.
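To make the host_injections mechanism concrete, here is a minimal sketch of how a custom-built MPI library can be linked into the override location. The path pattern follows the `rpath_overrides` layout described in the EESSI documentation, but the EESSI version, CPU target, and the location of the custom OpenMPI build are illustrative assumptions, not values from this post:

```shell
# Hedged sketch: injecting a site-built OpenMPI into EESSI via host_injections.
# EESSI_VERSION, EESSI_ARCH and HOST_MPI_LIB are assumed example values.
EESSI_VERSION=2023.06
EESSI_ARCH=x86_64/amd/zen3
HOST_MPI_LIB=/opt/openmpi-5.0-slingshot/lib   # hypothetical custom build location

# host_injections is a CernVM-FS variant symlink pointing at a writable
# host directory; the rpath_overrides subtree is picked up by EESSI binaries.
OVERRIDE_DIR=/cvmfs/software.eessi.io/host_injections/${EESSI_VERSION}/software/linux/${EESSI_ARCH}/rpath_overrides/OpenMPI/system/lib
echo "Would link ${HOST_MPI_LIB} into ${OVERRIDE_DIR}"

# On a real system, the actual injection would look like:
# mkdir -p "${OVERRIDE_DIR}"
# ln -sf "${HOST_MPI_LIB}"/libmpi.so* "${OVERRIDE_DIR}/"
```

Because the override library sits on the rpath ahead of EESSI's bundled OpenMPI, applications pick it up transparently, provided the replacement is ABI-compatible.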

Building ROCm Support in EESSI

Following our overview of the ROCm ecosystem, we're excited to share our progress on the next phase of our ROCm initiative: actually building and integrating ROCm support into EESSI. This work represents a significant step forward in making AMD GPU computing more accessible to the scientific computing community through our software stack.

Mapping the AMD ROCm Ecosystem

Within the EESSI community and Inuits, we're excited to share our latest contribution to the scientific computing community: a high-level overview of AMD's ROCm ecosystem. This document is the result of our recent efforts to prepare for adding ROCm support to EESSI, and we believe it will serve as a valuable resource for anyone working with AMD GPUs in scientific computing environments.

The full overview document can be found at Overview of ROCm Ecosystem.

GPU Support in EESSI: From Zero to Science in Seconds

"How long until I can run my simulation?"

It's the question every computational scientist asks when setting up a new environment. With GPU-accelerated EESSI, the answer might surprise you: as little as 15 seconds from login to launching your first computation.

In the high-stakes world of scientific computing, every minute spent configuring software is a minute not spent on discovery. That's why we've developed a metric we call Mean-Time-To-Science – the total time from system access to running your first scientific computation. By optimizing this crucial metric, EESSI's GPU support transforms the traditional hours-long setup process into a seamless experience that keeps researchers focused on their science.
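The login-to-compute flow behind that number can be sketched in a few lines. The init script path matches the layout of the EESSI repository; the module name is just an illustrative example, and the snippet degrades gracefully on a machine without CernVM-FS mounted:

```shell
# Hedged sketch of the EESSI "login to science" flow.
# GROMACS is an example module name, not a guarantee of availability.
EESSI_INIT=/cvmfs/software.eessi.io/versions/2023.06/init/bash
echo "Initializing EESSI from ${EESSI_INIT}"

if [ -f "${EESSI_INIT}" ]; then
    # Detects the host CPU (and GPU) and points the module tree
    # at the matching pre-built software stack.
    source "${EESSI_INIT}"
    module load GROMACS
else
    echo "EESSI not mounted on this machine"
fi
```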

Although EESSI aims to provide pre-built software for all common HPC architectures, GPU support introduces multiplicative requirements for software builds. Each GPU compute capability (e.g., CC 7.5, CC 8.0, CC 8.6) needs to be combined with each CPU architecture (zen2, zen3, generic x86_64), creating a large matrix of possible configurations. While it is possible to pre-build all software for every CPU/GPU combination, testing every configuration is not: some CPU/GPU combinations may not even exist in real-world hardware.
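A quick sketch makes the multiplicative growth explicit. The target lists below are illustrative examples taken from the text, not EESSI's actual coverage:

```shell
# Illustrative sketch: the CPU x GPU build matrix grows multiplicatively.
cpu_targets="generic zen2 zen3"          # example x86_64 CPU targets
gpu_ccs="7.5 8.0 8.6"                    # example CUDA compute capabilities

count=0
for cpu in ${cpu_targets}; do
    for cc in ${gpu_ccs}; do
        echo "build target: ${cpu} + CC ${cc}"
        count=$((count + 1))
    done
done
echo "total builds: ${count}"            # 3 CPU targets x 3 CCs = 9
```

Adding one more CPU target or compute capability grows the matrix by a whole row or column, which is why exhaustive testing quickly becomes impractical.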

To address this challenge, we're developing additional documentation highlighting which CPU/GPU combinations are already built into EESSI. Additionally, we provide the tools and process for users to build any EasyBuild-enabled software on EESSI, allowing them to create architecture-specific builds for their particular needs when a specific combination isn't available in the standard distribution.

Integration in the EuroHPC Federation Platform

A couple of weeks ago the EuroHPC Joint Undertaking (EuroHPC JU) announced the consortium that will develop the EuroHPC Federation Platform (EFP).

This ambitious effort will deliver a 'one-stop shop' for researchers using the EuroHPC supercomputers, as well as the upcoming EuroHPC AI Factories and quantum computers, built with open source software.

Ghent University is part of this consortium to integrate EESSI into the EuroHPC Federation Platform as the common software stack.

Henrik Nortamo (CSC), the technical lead of the EFP consortium, gave an excellent 20-minute talk on EFP last weekend in the 10th HPC, Big Data, and Data Science devroom at FOSDEM'25 in Brussels. Slides and recording of the talk are available here.