
Extrae available in EESSI

Thanks to the work carried out under the MultiXscale CoE, we are proud to announce that, as of 22 July 2024, Extrae v4.2.0 is available in the EESSI production repository software.eessi.io, optimized for the 8 CPU targets that are fully supported by version 2023.06 of EESSI. This allows using Extrae effortlessly on EuroHPC systems where EESSI is already available, like Vega and Karolina.

It is worth noting that, as of that date, Extrae is also available in the EESSI RISC-V repository riscv.eessi.io.

Extrae is a package developed at BSC that generates Paraver trace files for post-mortem analysis of application performance. It uses different interposition mechanisms to inject probes into the target application in order to gather information about its performance. It is one of the tools used in the POP3 CoE.

The work to incorporate Extrae into EESSI started in early May. It took quite some time and effort, but it resulted in a number of updates, improvements and bug fixes for Extrae. The following sections describe the work done, the issues encountered, and the solutions adopted.

Adapting the EESSI software layer

During the first attempt to build Extrae (at that point v4.0.6) in the EESSI context, we ran into two issues:

  1. the configure script of Extrae was not able to find binutils in the location where it is provided by the EESSI compat layer, and
  2. the configure/make files of Extrae make use of the which command, which does not work in our build container.

Both problems were solved by adding a pre_configure_hook in the eb_hooks.py file of the EESSI software layer that:

  • avoids the use of which during the configure and build steps by replacing it with command -v in the necessary files, and
  • specifies the correct path to binutils in the EESSI compat layer by passing the --with-binutils option to the Extrae configure script (a sketch of such a hook is shown below).
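
As an illustration, the snippet below sketches what such a pre-configure hook could look like in eb_hooks.py. It is a hypothetical, simplified example: the function name, the list of files patched, and the use of $EESSI_EPREFIX to locate the compat layer are assumptions, and the actual hook in the EESSI software layer differs in its details.

import os

from easybuild.tools.filetools import apply_regex_substitutions


def pre_configure_hook_extrae(self, *args, **kwargs):
    """Hypothetical pre-configure hook for Extrae (simplified sketch)."""
    if self.name == 'Extrae':
        # replace 'which' with 'command -v' in files used during configure/build
        # (the list of files below is illustrative only)
        regex_subs = [(r'\bwhich ', 'command -v ')]
        for fname in ('configure', os.path.join('config', 'macros.m4')):
            apply_regex_substitutions(os.path.join(self.start_dir, fname), regex_subs)

        # point Extrae's configure script to binutils as provided by the EESSI compat layer;
        # assuming here that $EESSI_EPREFIX points to the compat layer prefix
        binutils_path = os.path.join(os.getenv('EESSI_EPREFIX', '/'), 'usr')
        self.cfg.update('configopts', '--with-binutils=%s' % binutils_path)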

Moving to version 4.1.6

By the time we completed this work, v4.1.6 of Extrae was available, so we decided to switch to that version, as v4.0.6 was throwing errors in the test suite provided by Extrae through the make check command.

When first trying to build this new version, we noticed that there were still problems with binutils detection: the configure scripts of Extrae assume that the binutils libraries are located under a lib subdirectory of the provided binutils path, while in the EESSI compat layer they sit directly in the provided directory (i.e. without the /lib). This was solved with a patch file committed to the EasyBuild easyconfigs repository that modifies both configure and config/macros.m4 to make binutils detection more robust. This patch was also provided to the Extrae developers for incorporation into future releases.

The next step was to submit a Pull Request to the EasyBuild easyblocks repository with some modifications to the extrae.py easyblock that:

  • Removed the configure options --enable-xml and --with-dwarf, which are no longer available in Extrae starting from v4.1.0.
  • Added the --with-xml option to specify the libxml2 root directory.
  • Added the --enable-posix-clock configure option for RISC-V, which is needed to build Extrae because no lower-level clock seems to be available yet on RISC-V architectures (see the sketch after this list).
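
To give an idea of what such easyblock changes look like, here is a simplified, hypothetical sketch of the configure-option handling described above; the actual extrae.py easyblock in EasyBuild differs in its details.

from easybuild.tools.modules import get_software_root
from easybuild.tools.systemtools import RISCV64, get_cpu_architecture

configopts = []

# point Extrae to the libxml2 dependency (replacing the removed --enable-xml option)
libxml2_root = get_software_root('libxml2')
if libxml2_root:
    configopts.append('--with-xml=%s' % libxml2_root)

# no lower-level clock is available yet on RISC-V, so fall back to the POSIX clock
if get_cpu_architecture() == RISCV64:
    configopts.append('--enable-posix-clock')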

With all of this in place, we managed to build Extrae correctly, but found that many tests failed, including all 21 under the MPI directory. We reported this to the Extrae developers, who answered that version 4.1.7 contained a critical bug fix related to MPI tracing, so we switched to that version before continuing our work.

Work with version 4.1.7

We tested the build of that version (of course including all the work done for the previous versions) and still saw some errors in the make check phase. We focused first on the following three:

  • FAIL: mpi_commranksize_f_1proc.sh
  • FAIL: pthread.sh
  • FAIL: check_Extrae_xml_envvar_counters.sh

Regarding the first one, we found a bug in the Extrae test itself: mpi_comm_ranksize_f_1proc.sh was invoking trace-ldpreload.sh instead of the Fortran version trace-ldpreloadf.sh, which caused the test to fail. We submitted a Pull Request with the bugfix to the Extrae repository; it has already been merged and incorporated into new releases.

The second one was reported to the Extrae developers as an issue. They suggested commenting out a call in src/tracer/wrappers/pthread/pthread_wrapper.c at line 240: //Backend_Flush_pThread (pthread_self());. We confirmed that this fixed the issue, so this change has also been incorporated into the Extrae main branch for future releases.

The last failing test was related to access to hardware counters on the build/test system. The problem was that the test assumed that Extrae (through PAPI) can access hardware counters (in this case, PAPI_TOT_INS), which is not guaranteed since it is very system-dependent (it involves permissions, etc.). As a solution, we proposed a patch to the Extrae repository which ensures that the test does not fail if PAPI_TOT_CYC is unavailable on the testing system. As this has not yet been incorporated into the Extrae repository, we also committed a patch file to the EasyBuild easyconfigs repository that solves the problem for this specific test as well as for others that suffered from the same issue.

Finally, version 4.2.0

Since the bug fixes mentioned in the previous section were incorporated into the Extrae repository, we switched once more to an updated version of Extrae (in this case v4.2.0). With that updated version, together with the easyconfig (and patches) and easyblock modifications, the tests started to pass successfully on most of the testing platforms.

We noticed, however, that Extrae produced segmentation faults when using libunwind on Arm architectures. Our approach was to report the issue to the Extrae developers and to make this dependency architecture-specific (i.e. forcing --without-unwind when building for Arm, while keeping the dependency for the other architectures). We did this in a Pull Request to the EasyBuild easyconfigs repository that has already been merged. In the same Pull Request, we added zlib as an explicit dependency in the easyconfig file for all architectures.

The last issue we encountered was similar to the previous one, but in this case it was seen on some RISC-V platforms and related to dynamic memory instrumentation. We adopted the same approach: we reported the issue to the Extrae developers and added --disable-instrument-dynamic-memory to the configure options in a Pull Request that has already been merged into the EasyBuild easyconfigs repository (the sketch below illustrates this kind of architecture-specific handling).
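
Purely as an illustration, architecture-specific configure options like these can be expressed in an easyconfig along the following lines, using EasyBuild's ARCH constant; the actual Extrae easyconfig may handle this differently.

# hypothetical sketch of architecture-specific configure options in an Extrae easyconfig
configopts = ''
if ARCH == 'aarch64':
    # avoid the libunwind-related segmentation faults observed on Arm
    configopts += '--without-unwind '
elif ARCH == 'riscv64':
    # work around the dynamic memory instrumentation issue seen on some RISC-V platforms
    configopts += '--disable-instrument-dynamic-memory '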

With that, all tests passed on all platforms, and we were able to add Extrae to the list of software available in both the software.eessi.io and riscv.eessi.io repositories of EESSI.

Portable test run of ESPResSo on EuroHPC systems via EESSI

ESPResSo logo

Since 14 June 2024, ESPResSo v4.2.2 has been available in the EESSI production repository software.eessi.io, optimized for the 8 CPU targets that are fully supported by version 2023.06 of EESSI. This allows running ESPResSo effortlessly on EuroHPC systems where EESSI is already available, like Vega and Karolina.

On 27 June 2024, an additional installation of ESPResSo v4.2.2, optimized for Arm A64FX processors, was added; this also enables running ESPResSo efficiently on Deucalion, even though EESSI is not yet available system-wide on that system (see below for more details).

With the portable test for ESPResSo that is available in the EESSI test suite we can easily evaluate the scalability of ESPResSo across EuroHPC systems, even if those systems have different system architectures.

Simulating Lennard-Jones fluids using ESPResSo

Lennard-Jones fluids model interacting soft spheres with a potential that is weakly attractive at medium range and strongly repulsive at short range. Originally designed to model noble gases, this simple setup now underpins most particle-based simulations, such as ionic liquids, polymers, proteins and colloids, where strongly repulsive pairwise potentials are desirable to prevent particles from overlapping with one another. In addition, solvated systems with atomistic resolution typically have a large excess of solvent atoms compared to solute atoms, thus Lennard-Jones interactions tend to account for a large portion of the simulation time. Compared to other potentials, the Lennard-Jones interaction is inexpensive to calculate, and its limited range allows us to partition the simulation domain into arbitrarily small regions that can be distributed among many processors.
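
For reference, the Lennard-Jones pair potential between two particles at distance r has the well-known form

V(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]

where \varepsilon sets the depth of the attractive well and \sigma is the distance at which the potential crosses zero; the r^{-12} term provides the strong short-range repulsion and the r^{-6} term the weaker medium-range attraction described above.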

Portable test to evaluate performance of ESPResSo

To evaluate the performance of ESPResSo, we have implemented a portable test for ESPResSo in the EESSI test suite; the results shown here were collected using version 0.3.2 of the test suite.

After installing and configuring the EESSI test suite on Vega, Karolina, and Deucalion, running the Lennard-Jones (LJ) test case with the ESPResSo 4.2.2 installation available in EESSI can be done with:

reframe --name "ESPRESSO_LJ.*%module_name=ESPResSo/4.2.2"

This will automatically run the LJ test case with ESPResSo across all known scales in the EESSI test suite, which range from a single core up to 8 full nodes.

Performance + scalability results on Vega, Karolina, Deucalion

The performance results of the tests are collected by ReFrame in a detailed JSON report.

The parallel performance of ESPResSo, expressed in particles integrated per second, scales linearly with the number of cores. On Vega using 8 nodes (1024 MPI ranks, one per physical core), ESPResSo 4.2.2 can integrate the equations of motion of roughly 615 million particles every second. On Deucalion using 8 nodes (384 cores), we observe a performance of roughly 62 million particles integrated per second.

Performance of ESPResSo 4.2.2 on Vega, Karolina, Deucalion

Plotting the parallel efficiency of ESPResSo 4.2.2 (weak scaling, 2000 particles per MPI rank) on the three EuroHPC systems we used shows that it decreases approximately linearly with the logarithm of the number of cores.

Parallel efficiency of ESPResSo 4.2.2 on Vega, Karolina, Deucalion
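
Here, parallel efficiency is assumed to follow the usual weak-scaling definition: the throughput on N cores divided by N times the single-core throughput,

E(N) = \frac{P(N)}{N \, P(1)}

where P(N) is the number of particles integrated per second on N cores, so that perfect weak scaling corresponds to an efficiency of 1.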

Running ESPResSo on Deucalion via EESSI + cvmfsexec

While EESSI has been available system-wide on both Vega and Karolina for some time (see here and here for more information, respectively), it was not yet available on Deucalion when these performance experiments were run.

Nevertheless, we were able to use the optimized installation of ESPResSo for A64FX that has been available in EESSI since 27 June 2024, by leveraging the cvmfsexec tool and by creatively implementing two simple shell wrapper scripts.

cvmfsexec wrapper script

The first wrapper script, cvmfsexec_eessi.sh, can be used to run a command in a subshell in which the EESSI CernVM-FS repository (software.eessi.io) is mounted via cvmfsexec. This script can be used by regular users on Deucalion: it does not require any special privileges beyond the Linux kernel features that cvmfsexec relies on, like user namespaces.

Contents of ~/bin/cvmfsexec_eessi.sh:

#!/bin/bash
if [ -d /cvmfs/software.eessi.io ]; then
    # run command directly, EESSI CernVM-FS repository is already mounted
    "$@"
else
    # run command in a subshell in which the EESSI CernVM-FS repository is mounted,
    # via cvmfsexec, which is set up in a unique temporary directory
    orig_workdir=$(pwd)
    mkdir -p /tmp/$USER
    tmpdir=$(mktemp -p /tmp/$USER -d)
    cd $tmpdir
    git clone https://github.com/cvmfs/cvmfsexec.git > $tmpdir/git_clone.out 2>&1
    cd cvmfsexec
    ./makedist default > $tmpdir/cvmfsexec_makedist.out 2>&1
    cd $orig_workdir
    $tmpdir/cvmfsexec/cvmfsexec software.eessi.io -- "$@"
    # cleanup
    rm -rf $tmpdir
fi

Do make sure that this script is executable:

chmod u+x ~/bin/cvmfsexec_eessi.sh

A simple way to test this script is to use it to inspect the contents of the EESSI repository:

~/bin/cvmfsexec_eessi.sh ls /cvmfs/software.eessi.io

or to start an interactive shell in which the EESSI repository is mounted:

~/bin/cvmfsexec_eessi.sh /bin/bash -l

The job scripts that were submitted by ReFrame on Deucalion leverage cvmfsexec_eessi.sh to set up the environment and get access to the ESPResSo v4.2.2 installation that is available in EESSI (see below).

orted wrapper script

In order to get multi-node runs of ESPResSo working without having EESSI available system-wide, we also had to create a small wrapper script for the orted command that is used by Open MPI to start processes on remote nodes. This is necessary because mpirun launches orted, which must be run in an environment in which the EESSI repository is mounted. If not, MPI startup will fail with an error like "error: execve(): orted: No such file or directory".

This wrapper script must be named orted, and must be located in a directory that is listed in $PATH.

We placed it in ~/bin/orted, and added export PATH=$HOME/bin:$PATH to our ~/.bashrc login script.

Contents of ~/bin/orted:

#!/bin/bash

# first remove path to this orted wrapper from $PATH, to avoid infinite loop
orted_wrapper_dir=$(dirname $0)
export PATH=$(echo $PATH | tr ':' '\n' | grep -v $orted_wrapper_dir | tr '\n' ':')

~/bin/cvmfsexec_eessi.sh orted "$@"

Do make sure that this orted wrapper script is also executable:

chmod u+x ~/bin/orted

If not, you will likely run into an error that starts with:

An ORTE daemon has unexpectedly failed after launch ...

Slurm job script

We can use the cvmfsexec_eessi.sh script in a Slurm job script on Deucalion to initialize the EESSI environment in a subshell in which the EESSI CernVM-FS repository is mounted, and subsequently load the module for ESPResSo v4.2.2 and launch the Lennard-Jones fluid simulation via mpirun:

Job script (example using 2 full 48-core nodes on A64FX partition of Deucalion):

#!/bin/bash
#SBATCH --ntasks=96
#SBATCH --ntasks-per-node=48
#SBATCH --cpus-per-task=1
#SBATCH --time=5:0:0
#SBATCH --partition normal-arm
#SBATCH --export=None
#SBATCH --mem=30000M
~/bin/cvmfsexec_eessi.sh << EOF
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=aarch64/a64fx
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load ESPResSo/4.2.2-foss-2023a
export SLURM_EXPORT_ENV=HOME,PATH,LD_LIBRARY_PATH,PYTHONPATH
mpirun -np 96 python3 lj.py
EOF

(the lj.py Python script is available in the EESSI test suite, see here)

EESSI promo tour @ ISC'24 (May 2024, Hamburg)

ISC logo

This week, we had the privilege of attending the ISC'24 conference in the beautiful city of Hamburg, Germany. This was an excellent opportunity for us to showcase EESSI, and gain valuable insights and feedback from the HPC community.


BoF session on EESSI

The EESSI Birds-of-a-Feather (BoF) session on Tuesday morning, part of the official ISC'24 program, was the highlight of our activities in Hamburg.

It was well attended, with well over 100 people joining us at 9am.

EESSI BoF session at ISC'24

During this session, we introduced the EESSI project with a short presentation, followed by a well-received live hands-on demo of installing and using EESSI by spinning up an "empty" Linux virtual machine instance in Amazon EC2 and getting optimized installations of popular scientific applications like GROMACS and TensorFlow running in a matter of minutes.

During the second part of the BoF session, we engaged with the audience through an interactive poll and by letting attendees ask questions.

The presentation slides, including the results of the interactive poll and questions that were raised by attendees, are available here.


Workshops

On the last day of ISC'24, EESSI was present in no fewer than three different workshops.

RISC-V workshop

At the Fourth International workshop on RISC-V for HPC, Julián Morillo (BSC) presented our paper "Preparing to Hit the Ground Running: Adding RISC-V support to EESSI" (slides available here).

EESSI @ RISC-V workshop at ISC'24

Julián covered the initial work that was done in the scope of the MultiXscale EuroHPC Centre-of-Excellence to add support for RISC-V to EESSI, outlined the challenges we encountered, and shared the lessons we have learned along the way.

AHUG workshop

During the Arm HPC User Group (AHUG) workshop, Kenneth Hoste (HPC-UGent) gave a talk entitled "Extending Arm’s Reach by Going EESSI" (slides available here).

Next to a high-level introduction to EESSI, we briefly covered some of the challenges we encountered when testing the optimized software installations that we had built for the Arm Neoverse V1 microarchitecture, including bugs in OpenMPI and GROMACS.

EESSI @ Arm HPC User Group workshop at ISC'24

Kenneth gave a live demonstration of how to get access to EESSI and start running the optimized software installations we provide through our CernVM-FS repository on a fresh AWS Graviton 3 instance in a matter of minutes.

POP workshop

In the afternoon on Thursday, Lara Peeters (HPC-UGent) presented MultiXscale during the Readiness of HPC Extreme-scale Applications workshop, which was organised by the POP EuroHPC Centre-of-Excellence (slides available here).

EESSI @ POP workshop at ISC'24

Lara outlined the pilot use cases on which MultiXscale focuses, and explained how EESSI helps to achieve the goals of MultiXscale in terms of Productivity, Performance, and Portability.

Group picture @ POP workshop at ISC'24

At the end of the workshop, a group picture was taken with both organisers and speakers, which was a great way to wrap up a busy week in Hamburg!


Talks and demos on EESSI at exhibit

Not only was EESSI part of the official ISC'24 program via a dedicated BoF session and various workshops: we were also prominently present on the exhibit floor.

Microsoft Azure booth

Microsoft Azure invited us to give a 1-hour introductory presentation on EESSI on both Monday and Wednesday at their booth during the ISC'24 exhibit, as well as to provide live demonstrations at the demo corner of their booth on Tuesday afternoon on how to get access to EESSI and the user experience it provides.

Talk @ Microsoft Azure booth at ISC'24

Exhibit attendees were welcome to pass by and ask questions, and did so throughout the full 4 hours we were present there.

Demo @ Microsoft Azure booth at ISC'24

Both Microsoft Azure and AWS have been graciously providing resources in their cloud infrastructure free-of-cost for developing, testing, and demonstrating EESSI for several years now.

EuroHPC booth

The MultiXscale EuroHPC Centre-of-Excellence, in which we are actively involved and through which the development of EESSI has been co-funded since Jan'23, was invited by the EuroHPC JU to present its goals and preliminary achievements at their booth.

Talk @ EuroHPC booth at ISC'24

Elisabeth Ortega (HPCNow!) did the honours of giving the last talk at the EuroHPC JU booth of the ISC'24 exhibit.


Stickers!

Last but not least: we handed out a boatload of free stickers with the logos of both MultiXscale and EESSI, as well as of various open source software projects we leverage, including EasyBuild, Lmod, and CernVM-FS.

Stickers at ISC'24

We mostly exhausted our sticker collection during ISC'24, but don't worry: we will make sure to have more available at upcoming events...