Installing and configuring the EESSI test suite

This page covers the installation and configuration of the EESSI test suite.

For information on using the test suite, see here.

Installation

Requirements

The EESSI test suite requires Python >= 3.6 and ReFrame v4.3.3 (or newer).

Details on the ReFrame version requirement:

Two important bugs were resolved in ReFrame's CPU autodetect functionality in version 4.3.3.

We strongly recommend you use ReFrame >= 4.3.3.

If you are using an older version of ReFrame, you may encounter some issues:

  • ReFrame will try to use the parallel launcher command configured for each partition (e.g. mpirun) when doing the remote autodetect. If there is no system-version of mpirun available, that will fail (see ReFrame issue #2926).
  • CPU autodetection only worked when using a clone of the ReFrame repository, not when it was installed with pip or EasyBuild (as is also the case for the ReFrame shipped with EESSI) (see ReFrame issue #2914).
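
The version requirement above can be checked programmatically. The following is a minimal sketch, not part of the EESSI test suite: the helper names parse_version and reframe_is_recent_enough are hypothetical, and the version string is assumed to be the output of reframe --version.

```python
import re

MINIMUM_REFRAME = (4, 3, 3)  # minimum ReFrame version stated on this page


def parse_version(text):
    """Extract the first 'X.Y.Z' found in a version string as a tuple of ints.

    Any suffix after X.Y.Z (for example a development tag) is ignored.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def reframe_is_recent_enough(version_text, minimum=MINIMUM_REFRAME):
    """Return True if version_text is at least the minimum version."""
    # Tuples compare element by element, so (4, 10, 0) > (4, 3, 3).
    return parse_version(version_text) >= minimum
```

Comparing tuples of integers rather than raw strings avoids ranking a version such as 4.10.0 below 4.3.3.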

Installing ReFrame (incl. test library)

You need to make sure that ReFrame is available - that is, the reframe command should work:

reframe --version

General instructions for installing ReFrame are available in the ReFrame documentation.

ReFrame test library (hpctestlib)

The EESSI test suite requires that the ReFrame test library (hpctestlib) is available, which is currently not included in a standard installation of ReFrame.

We recommend installing ReFrame using EasyBuild (version 4.8.1, or newer), or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer).

For example (using EESSI):

source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash
module load ReFrame/4.2.0

To check whether the ReFrame test library is available, try importing a submodule of the hpctestlib Python package:

python3 -c 'import hpctestlib.sciapps.gromacs'

Installing the EESSI test suite

To install the EESSI test suite, you can either use pip or clone the GitHub repository directly:

Using pip

pip install git+https://github.com/EESSI/test-suite.git

Cloning the repository

git clone https://github.com/EESSI/test-suite $HOME/EESSI-test-suite
cd $HOME/EESSI-test-suite
export PYTHONPATH=$PWD:$PYTHONPATH

Verify installation

To check whether the EESSI test suite installed correctly, try importing the eessi.testsuite Python package:

python3 -c 'import eessi.testsuite'
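
Both import checks on this page can also be combined in a single script. This is a small sketch using only the Python standard library; the helper name module_available is our own and is not provided by ReFrame or the EESSI test suite.

```python
import importlib.util


def module_available(name):
    """Return True if the (possibly dotted) module name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# The two packages whose import is checked on this page.
REQUIRED = ["hpctestlib.sciapps.gromacs", "eessi.testsuite"]

if __name__ == "__main__":
    for name in REQUIRED:
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If either package is reported as missing, revisit the corresponding installation step above.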

Configuration

Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run.

Example configuration files are available in the config subdirectory of the EESSI/test-suite GitHub repository (https://github.com/EESSI/test-suite/tree/main/config), which you can use as a template to create your own.

Configuring ReFrame environment variables

We recommend setting a couple of $RFM_* environment variables to configure ReFrame, so that you do not need to pass the same options to the reframe command over and over again.

ReFrame configuration file ($RFM_CONFIG_FILES)

(see also RFM_CONFIG_FILES in ReFrame docs)

Define the $RFM_CONFIG_FILES environment variable to instruct ReFrame which configuration file to use, for example:

export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py

Alternatively, you can use the --config-file (or -C) reframe option.

See the section on the ReFrame configuration file below for more information.

Search path for tests ($RFM_CHECK_SEARCH_PATH)

(see also RFM_CHECK_SEARCH_PATH in ReFrame docs)

Define the $RFM_CHECK_SEARCH_PATH environment variable to tell ReFrame which directory to search for tests.

In addition, define $RFM_CHECK_SEARCH_RECURSIVE to ensure that ReFrame searches $RFM_CHECK_SEARCH_PATH recursively (i.e. so that tests in subdirectories are also found).

For example:

export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests
export RFM_CHECK_SEARCH_RECURSIVE=1

Alternatively, you can use the --checkpath (or -c) and --recursive (or -R) reframe options.

ReFrame prefix ($RFM_PREFIX)

(see also RFM_PREFIX in ReFrame docs)

Define the $RFM_PREFIX environment variable to tell ReFrame where to store the files it produces. For example:

export RFM_PREFIX=$HOME/reframe_runs

This includes:

  • test output directories (which contain e.g. the job script, stderr and stdout for each of the test jobs);
  • staging directories (unless otherwise specified by stagedir, see below);
  • performance logs.

Note that, by default, ReFrame uses the current directory as prefix. We recommend setting a prefix so that logs are not scattered across directories and are appended to in the same location on each run.

If our common logging configuration is used, the regular ReFrame log file will also end up in the location specified by $RFM_PREFIX.

Warning

Using the --prefix option in your reframe command is not equivalent to setting $RFM_PREFIX, since our common logging configuration only picks up on the $RFM_PREFIX environment variable to determine the location for the ReFrame log file.
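
If you drive ReFrame from a Python wrapper script rather than from an interactive shell, the same variables can be collected in a dictionary. This is a sketch only: the function name reframe_env is hypothetical, and the paths simply mirror the examples used on this page.

```python
import os


def reframe_env(suite_dir, prefix):
    """Return the $RFM_* settings described on this page as a dict of strings."""
    return {
        "RFM_CONFIG_FILES": os.path.join(suite_dir, "config", "example.py"),
        "RFM_CHECK_SEARCH_PATH": os.path.join(suite_dir, "eessi", "testsuite", "tests"),
        "RFM_CHECK_SEARCH_RECURSIVE": "1",
        "RFM_PREFIX": prefix,
    }


home = os.path.expanduser("~")
env = dict(os.environ)  # start from the current environment
env.update(
    reframe_env(
        os.path.join(home, "EESSI-test-suite"),
        os.path.join(home, "reframe_runs"),
    )
)
# 'env' can then be handed to a child process,
# e.g. subprocess.run(["reframe", "--list"], env=env)
```

Setting $RFM_PREFIX in the environment of the child process (rather than passing --prefix) keeps this consistent with the warning above.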

ReFrame configuration file

In order for ReFrame to run tests on your system, it needs to know some properties about your system. For example, it needs to know what kind of job scheduler you have, which partitions the system has, how to submit to those partitions, etc. All of this has to be described in a ReFrame configuration file (see also the section on $RFM_CONFIG_FILES above).

The official ReFrame documentation provides the full description of how to configure ReFrame for your site. However, some configuration settings are specifically required for the EESSI test suite. Also, ReFrame offers a large number of configuration settings, which can make the official documentation a bit overwhelming.

Here, we will describe how to create a configuration file that works with the EESSI test suite, starting from an example configuration file settings_example.py, which defines the most common configuration settings.

You can look at other example configurations in the config directory for more inspiration.

Python imports

The EESSI test suite standardizes a few string-based values as constants, as well as the logging format used by ReFrame. Every ReFrame configuration file used for running the EESSI test suite should therefore start with the following import statements:

from eessi.testsuite.common_config import common_logging_config
from eessi.testsuite.constants import *

High-level system info (systems)

First, we describe the system at its highest level through the systems keyword.

You can define multiple systems in a single configuration file (systems is a Python list value). We recommend defining just a single system in each configuration file, as it makes the configuration file a bit easier to digest (for humans).

An example of the systems section of the configuration file would be:

import os  # needed for os.environ in 'stagedir' below

site_configuration = {
    'systems': [
        # We could list multiple systems. Here, we just define one
        {
            'name': 'example',
            'descr': 'Example cluster',
            'modules_system': 'lmod',
            'hostnames': ['.*'],  # regular expression that matches any hostname
            'stagedir': f'/some/shared/dir/{os.environ.get("USER")}/reframe_output/staging',
            'partitions': [...],
        }
    ]
}

The most common configuration items defined at this level are:

  • name: The name of the system. Pick whatever makes sense for you.
  • descr: Description of the system. Again, pick whatever you like.
  • modules_system: The modules system used on your system. EESSI provides modules in lmod format. There is no need to change this, unless you want to run tests from the EESSI test suite with non-EESSI modules.
  • hostnames: The names of the hosts on which you will run the ReFrame command, as regular expressions. Using these names, ReFrame can automatically determine which of the listed configurations in the systems list to use, which is useful if you're defining multiple systems in a single configuration file. If you follow our recommendation to limit yourself to one system per configuration file, simply define 'hostnames': ['.*'] (a regular expression that matches any hostname; note that a bare '*' is not a valid regular expression).
  • prefix: Prefix directory for a ReFrame run on this system. Any directories or files produced by ReFrame will use this prefix, if not specified otherwise. We recommend setting the $RFM_PREFIX environment variable rather than specifying prefix in your configuration file, so our common logging configuration can pick up on it (see also $RFM_PREFIX).
  • stagedir: A shared directory that is available on all nodes that will execute ReFrame tests. This is used for storing (temporary) files related to the test. Typically, you want to set this to a path on a (shared) scratch filesystem. Defining this is optional: the default is a 'stage' directory inside the prefix directory.
  • partitions: Details on system partitions, see below.
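
To illustrate how hostname patterns select a system, here is a simplified sketch. It assumes patterns are tried in order and matched from the start of the hostname with re.match; ReFrame's actual selection logic may differ in its details, and the cluster_a entry and its pattern are made up for the example.

```python
import re
import socket


def select_system(systems, hostname):
    """Return the name of the first system with a pattern matching hostname."""
    for system in systems:
        for pattern in system["hostnames"]:
            if re.match(pattern, hostname):
                return system["name"]
    return None


systems = [
    # Hypothetical system that only matches its own login nodes.
    {"name": "cluster_a", "hostnames": [r"login\d+\.a\.example\.org"]},
    # Catch-all entry, as recommended for single-system configuration files.
    {"name": "example", "hostnames": [r".*"]},
]

print(select_system(systems, socket.gethostname()))
```

Because the first match wins in this sketch, a catch-all pattern should come last when several systems are listed.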

System partitions (systems.partitions)

The next step is to add the system partitions to the configuration file. These are also specified as a Python list, since a system can have multiple partitions.

The partitions section of the configuration for a system with two Slurm partitions (one CPU partition, and one GPU partition) could for example look something like this:

site_configuration = {
    'systems': [
        {
            ...
            'partitions': [
                {
                    'name': 'cpu_partition',
                    'descr': 'CPU partition',
                    'scheduler': 'slurm',
                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
                    'launcher': 'mpirun',
                    'access':  ['-p cpu'],
                    'environs': ['default'],
                    'max_jobs': 4,
                    'features': [FEATURES[CPU]],
                },
                {
                    'name': 'gpu_partition',
                    'descr': 'GPU partition',
                    'scheduler': 'slurm',
                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
                    'launcher': 'mpirun',
                    'access':  ['-p gpu'],
                    'environs': ['default'],
                    'max_jobs': 4,
                    'resources': [
                        {
                            'name': '_rfm_gpu',
                            'options': ['--gpus-per-node={num_gpus_per_node}'],
                        }
                    ],
                    'devices': [
                        {
                            'type': DEVICE_TYPES[GPU],
                            'num_devices': 4,
                        }
                    ],
                    'features': [
                        FEATURES[CPU],
                        FEATURES[GPU],
                    ],
                    'extras': {
                        GPU_VENDOR: GPU_VENDORS[NVIDIA],
                    },
                },
            ]
        }
    ]
}

The most common configuration items defined at this level are:

  • name: The name of the partition. Pick anything you like.
  • descr: Description of the partition. Again, pick whatever you like.
  • scheduler: The scheduler used to submit to this partition, for example slurm. All valid options can be found in the ReFrame documentation.
  • launcher: The parallel launcher used on this partition, for example mpirun or srun. All valid options can be found in the ReFrame documentation.
  • access: A list of arguments that you would normally pass to the scheduler when submitting to this partition (for example '-p cpu' for submitting to a Slurm partition called cpu). If supported by your scheduler, we recommend not exporting the submission environment (for example by using '--export=NONE' with Slurm). This avoids test failures caused by environment variables that are set in the submission environment and passed down to submitted jobs.
  • prepare_cmds: Commands to execute at the start of every job that runs a test. If your batch scheduler does not export the environment of the submit host, this is typically where you can initialize the EESSI environment.
  • environs: The names of the programming environments (to be defined later in the configuration file via environments) that may be used on this partition. A programming environment is required for tests that are compiled first, before they can run. The EESSI test suite however only tests existing software installations, so no compilation (or specific programming environment) is needed. Simply specify 'environs': ['default'], since ReFrame requires that a default environment is defined.
  • max_jobs: The maximum number of jobs ReFrame is allowed to submit in parallel. Some batch systems limit how many jobs users are allowed to have in the queue. You can use this to make sure ReFrame doesn't exceed that limit.
  • resources: This field defines how additional resources can be requested in a batch job. Specifically, on a GPU partition, you have to define a resource with the name '_rfm_gpu'. The options field should then contain the argument to be passed to the batch scheduler in order to request a certain number of GPUs per node, which could be different for different batch schedulers. For example, when using Slurm you would specify:
    'resources': [
      {
          'name': '_rfm_gpu',
          'options': ['--gpus-per-node={num_gpus_per_node}'],
      },
    ],
    
  • processor: We recommend NOT defining this field, unless CPU autodetection is not working for you. The EESSI test suite relies on information about your processor topology to run. Using CPU autodetection is the easiest way to ensure that all processor-related information needed by the EESSI test suite is defined. Only if CPU autodetection is failing for you do we advise you to set the processor in the partition configuration as an alternative. Although additional fields might be used by future EESSI tests, at this point you'll have to specify at least the following fields:
    'processor': {
        'num_cpus': 64,  # Total number of CPU cores in a node
        'num_sockets': 2,  # Number of sockets in a node
        'num_cpus_per_socket': 32,  # Number of CPU cores per socket
        'num_cpus_per_core': 1,  # Number of hardware threads per CPU core
    }                 
    
  • features: The features field is used by the EESSI test suite to run tests only on a partition if it supports a certain feature (for example if GPUs are available). Feature names are standardized in the EESSI test suite in the eessi.testsuite.constants.FEATURES dictionary. Typically, you want to define features: [FEATURES[CPU]] for CPU-based partitions, and features: [FEATURES[GPU]] for GPU-based partitions. The first tells the EESSI test suite that this partition can only run CPU-based tests, whereas the second indicates that this partition can only run GPU-based tests. You can define a single partition to have both the CPU and GPU features (since features is a Python list). However, since the CPU-based tests will not ask your batch scheduler for GPU resources, this may fail on batch systems that force you to ask for at least one GPU on GPU-based nodes. Also, running CPU-only code on a GPU node is typically considered bad practice, so testing that scenario is usually not relevant.
  • devices: This field specifies information on devices (for example GPUs) present in the partition. Device types are standardized in the EESSI test suite in the eessi.testsuite.constants.DEVICE_TYPES dictionary. This is used by the EESSI test suite to determine how many of these devices it can/should use per node. Typically, there is no need to define devices for CPU partitions. For GPU partitions, you want to define something like:
    'devices': [
        {
            'type': DEVICE_TYPES[GPU],
            'num_devices': 4,  # or however many GPUs you have per node
        }
    ],
    
  • extras: This field specifies extra information on the partition, such as the GPU vendor. Valid fields for extras are standardized as constants in eessi.testsuite.constants (for example GPU_VENDOR). This is used by the EESSI test suite to decide if a partition can run a test that specifically requires a certain brand of GPU. Typically, there is no need to define extras for CPU partitions. For GPU partitions, you typically want to specify the GPU vendor, for example:
    'extras': {
        GPU_VENDOR: GPU_VENDORS[NVIDIA]
    }
    

Note that as more tests are added to the EESSI test suite, the use of features, devices and extras by the EESSI test suite may be extended, which may require an update of your configuration file to define newly recognized fields.
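
The GPU-related settings described above (resources, devices and extras) can be sanity-checked with a few lines of Python. The sketch below is not part of the EESSI test suite: gpu_partition_problems is a hypothetical helper, and the strings 'gpu' and 'gpu_vendor' are placeholders standing in for FEATURES[GPU], DEVICE_TYPES[GPU] and GPU_VENDOR, whose actual values are defined in eessi.testsuite.constants.

```python
def gpu_partition_problems(partition, gpu_feature, gpu_device_type, gpu_vendor_key):
    """Return a list of likely omissions for a partition with the GPU feature."""
    problems = []
    if gpu_feature not in partition.get("features", []):
        return problems  # not a GPU partition, nothing to check

    resource_names = [res.get("name") for res in partition.get("resources", [])]
    if "_rfm_gpu" not in resource_names:
        problems.append("no '_rfm_gpu' resource defined")

    gpu_devices = [
        dev for dev in partition.get("devices", [])
        if dev.get("type") == gpu_device_type
    ]
    if not any(dev.get("num_devices", 0) > 0 for dev in gpu_devices):
        problems.append("no GPU device with num_devices > 0 defined")

    if gpu_vendor_key not in partition.get("extras", {}):
        problems.append("no GPU vendor defined in 'extras'")
    return problems


# Placeholder values; in a real configuration file these come from
# eessi.testsuite.constants (FEATURES[GPU], DEVICE_TYPES[GPU], GPU_VENDOR).
GPU_FEATURE, GPU_DEVICE, VENDOR_KEY = "gpu", "gpu", "gpu_vendor"

incomplete = {"name": "gpu_partition", "features": [GPU_FEATURE]}
for problem in gpu_partition_problems(incomplete, GPU_FEATURE, GPU_DEVICE, VENDOR_KEY):
    print(f"{incomplete['name']}: {problem}")
```

Passing the constants in as arguments keeps the sketch runnable without the test suite installed.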

Note

Keep in mind that ReFrame partitions are virtual entities: they may or may not correspond to a partition as it is configured in your batch system. One might for example have a single partition in the batch system, but configure it as two separate partitions in the ReFrame configuration file based on additional constraints that are passed to the scheduler, see for example the AWS CitC example configuration.

The EESSI test suite (and more generally, ReFrame) assumes the hardware within a partition defined in the ReFrame configuration file is homogeneous.

Environments

ReFrame needs a programming environment to be defined in its configuration file for tests that need to be compiled before they are run. While we don't have such tests in the EESSI test suite, ReFrame requires some programming environment to be defined:

site_configuration = {
    ...
    'environments': [
        {
            'name': 'default',  # Note: needs to match whatever we set for 'environs' in the partition
            'cc': 'cc',
            'cxx': '',
            'ftn': '',
        }
    ]
}

Note

The name here needs to match whatever we specified for the environs property of the partitions.
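
A mismatch between environs and the defined environments is easy to detect before running ReFrame. The following sketch uses a hypothetical helper, undefined_environs, that walks a site_configuration dictionary shaped like the examples on this page.

```python
def undefined_environs(site_configuration):
    """Return (system, partition, environ) triples that lack a definition."""
    defined = {env["name"] for env in site_configuration.get("environments", [])}
    missing = []
    for system in site_configuration.get("systems", []):
        for partition in system.get("partitions", []):
            for name in partition.get("environs", []):
                if name not in defined:
                    missing.append((system["name"], partition["name"], name))
    return missing


site_configuration = {
    "systems": [
        {
            "name": "example",
            "partitions": [
                {"name": "cpu_partition", "environs": ["default"]},
                {"name": "gpu_partition", "environs": ["default"]},
            ],
        }
    ],
    "environments": [{"name": "default", "cc": "cc", "cxx": "", "ftn": ""}],
}

# An empty list means every partition refers to a defined environment.
print(undefined_environs(site_configuration))
```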

Logging

ReFrame allows a large degree of control over what gets logged, and where. For convenience, we have created a common logging configuration in eessi.testsuite.common_config that provides a reasonable default. It can be used by importing common_logging_config and calling it as a function to define the 'logging' setting:

from eessi.testsuite.common_config import common_logging_config

site_configuration = {
    ...
    'logging':  common_logging_config(),
}

When combined with setting the $RFM_PREFIX environment variable, the output, performance log, and regular ReFrame logs will all end up in the directory specified by $RFM_PREFIX, which we recommend doing.

Alternatively, a prefix can be passed as an argument like common_logging_config(prefix), which will control where the regular ReFrame log ends up. Note that the performance logs do not respect this prefix: they will still end up in the standard ReFrame prefix (by default the current directory, unless otherwise set with $RFM_PREFIX or --prefix).

Auto-detection of processor information

You can let ReFrame auto-detect the processor information for your system.

ReFrame will automatically use auto-detection when two conditions are met:

  1. The partitions section of your configuration file does not specify processor information for a particular partition (as per our recommendation in the previous section);
  2. The remote_detect option is enabled in the general part of the configuration, as follows:
    site_configuration = {
        'systems': ...,
        'logging': ...,
        'general': [
            {
                'remote_detect': True,
            }
        ]
    }
    

To trigger the auto-detection of processor information, it is sufficient to let ReFrame list the available tests:

reframe --list

ReFrame will store the processor information for your system in ~/.reframe/topology/<system>-<partition>/processor.json.
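
To inspect the detected processor information from Python, you can load that file directly. This is a minimal sketch: the helper names topology_file and load_processor_info are our own, the path layout is the one given above, and no assumption is made about which keys the JSON file contains.

```python
import json
from pathlib import Path


def topology_file(system, partition, home=None):
    """Return the path where the processor info for a partition is stored."""
    base = Path(home) if home is not None else Path.home()
    return base / ".reframe" / "topology" / f"{system}-{partition}" / "processor.json"


def load_processor_info(system, partition, home=None):
    """Return the stored processor info as a dict, or None if not detected yet."""
    path = topology_file(system, partition, home)
    if not path.is_file():
        return None
    with path.open() as handle:
        return json.load(handle)


info = load_processor_info("example", "cpu_partition")
if info is None:
    print("No processor information found yet; run 'reframe --list' first.")
else:
    print(json.dumps(info, indent=2))
```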

Verifying your ReFrame configuration

To verify the ReFrame configuration, you can query the configuration using --show-config.

To see the full configuration, use:

reframe --show-config

To only show the configuration of a particular system partition, you can use the --system option. To query a specific setting, you can pass an argument to --show-config.

For example, to show the configuration of the gpu partition of the example system:

reframe --system example:gpu --show-config systems/0/partitions

You can drill down further to only show the value of a particular configuration setting.

For example, to only show the launcher value for the gpu partition of the example system:

reframe --system example:gpu --show-config systems/0/partitions/@gpu/launcher
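
The path passed to --show-config can be read as a walk through the nested configuration: a name selects a dictionary key, a number selects a list index, and @name selects the list element whose name field matches. The sketch below mimics that reading on a plain Python dictionary; it is an illustration under those assumptions, not ReFrame's implementation, and resolve is a hypothetical helper.

```python
def resolve(config, path):
    """Walk a nested dict/list structure following a '/'-separated path."""
    node = config
    for key in path.split("/"):
        if isinstance(node, list):
            if key.startswith("@"):
                # '@name' picks the list element whose 'name' equals name.
                matches = [item for item in node if item.get("name") == key[1:]]
                if not matches:
                    raise KeyError(key)
                node = matches[0]
            else:
                node = node[int(key)]  # plain numbers are list indices
        else:
            node = node[key]
    return node


config = {
    "systems": [
        {
            "name": "example",
            "partitions": [
                {"name": "cpu", "launcher": "mpirun"},
                {"name": "gpu", "launcher": "mpirun"},
            ],
        }
    ]
}

print(resolve(config, "systems/0/partitions/@gpu/launcher"))
```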

Last update: October 4, 2023