Build-test-deploy bot

Building, testing, and deploying software is done by one or more bot instances.

The EESSI build-test-deploy bot is implemented as a GitHub App in the eessi-bot-software-layer repository.

It operates in the context of pull requests to the compatibility-layer repository or the software-layer repository, and follows the instructions supplied by humans, so the procedure of adding software to EESSI is semi-automatic.

It leverages the scripts provided in the bot/ subdirectory of the target repository, such as bot/build.sh to build software and bot/check-result.sh to check whether the software was built correctly.

High-level design

The bot consists of two components: the event handler, and the job manager.

Event handler

The bot event handler is responsible for handling GitHub events for the GitHub repositories it is registered to.

It is triggered for every event that it receives from GitHub. Most events are ignored, but specific events trigger the bot to take action.

Examples of actionable events are submitting a comment that starts with bot:, which may specify an instruction for the bot such as building software, and adding a bot:deploy label to a pull request (see Deploying).
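To make the event filtering concrete, here is a minimal Python sketch that classifies a single webhook delivery. It is not the bot's actual code: the function name classify_event and the labels it returns are hypothetical. Only the payload fields action, comment.body and label.name are taken from GitHub's webhook payloads for issue_comment and pull_request events.

```python
from typing import Any, Optional


def classify_event(event_type: str, payload: dict[str, Any]) -> Optional[str]:
    """Hypothetical sketch: return "instruction" or "deploy" for actionable events, else None."""
    if event_type == "issue_comment" and payload.get("action") == "created":
        body = payload.get("comment", {}).get("body", "")
        # A newly posted comment is actionable if it contains bot: instructions.
        if any(line.strip().startswith("bot:") for line in body.splitlines()):
            return "instruction"
    if event_type == "pull_request" and payload.get("action") == "labeled":
        # Adding the bot:deploy label to a pull request triggers deployment.
        if payload.get("label", {}).get("name") == "bot:deploy":
            return "deploy"
    # All other events are ignored.
    return None
```

For example, a newly created comment with body "bot: help" is classified as "instruction", while a push event yields None.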

Job manager

The bot job manager is responsible for monitoring the queued and running jobs, and for reporting back when jobs have completed.

It runs every couple of minutes as a cron job.

Basics

Instructions for the bot should always start with bot:.

To get help from the bot, post a comment with bot: help.

To make the bot report how it is configured, post a comment with bot: show_config.

Permissions

The bot is configured to only act on instructions issued by specific GitHub accounts.

There are separate configuration options that control which accounts are allowed to send instructions to the bot, to trigger building of software, and to deploy software installations into the EESSI repository.
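The three separate permission levels can be pictured as three independent allow-lists of GitHub accounts. The sketch below is hypothetical: the class, method and field names do not correspond to the bot's actual configuration options.

```python
from dataclasses import dataclass, field


@dataclass
class Permissions:
    """Hypothetical model of the bot's three separate allow-lists of GitHub accounts."""

    command_users: set[str] = field(default_factory=set)  # may send bot: instructions
    build_users: set[str] = field(default_factory=set)  # may trigger builds
    deploy_users: set[str] = field(default_factory=set)  # may trigger deployment

    def may_send_command(self, account: str) -> bool:
        return account in self.command_users

    def may_build(self, account: str) -> bool:
        return account in self.build_users

    def may_deploy(self, account: str) -> bool:
        return account in self.deploy_users
```

Keeping the lists separate means that, for instance, an account can be allowed to ask for bot: help without being allowed to trigger builds or deployments.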

Note

Ask for help in the #software-layer-bot channel of the EESSI Slack if needed!

Building

To instruct the bot to build software, post one or more build instructions as a comment in the pull request.

The most basic build instruction that can be sent to the bot is:

bot: build for:arch=<for_arch>

Here, <for_arch> could be, for example, x86_64/amd/zen4. This instruction triggers the bot to allocate a node of that type and to build in the /cvmfs/software.eessi.io/versions/<eessi-version>/software/linux/x86_64/amd/zen4 prefix.
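As a minimal sketch of how the for:arch=... value relates to the installation prefix, the hypothetical helper below builds the path and rejects obvious typos such as x64_64. The list of known CPU families is an assumption made only for this example.

```python
# Assumption for this sketch: EESSI CPU targets start with one of these families.
KNOWN_CPU_FAMILIES = ("x86_64", "aarch64")


def install_prefix(eessi_version: str, for_arch: str) -> str:
    """Hypothetical helper: installation prefix for an EESSI version and a for:arch value."""
    for_arch = for_arch.strip("/")
    family = for_arch.split("/")[0]
    if family not in KNOWN_CPU_FAMILIES:
        raise ValueError(f"unknown CPU family {family!r} in for:arch={for_arch}")
    return f"/cvmfs/software.eessi.io/versions/{eessi_version}/software/linux/{for_arch}"
```

For example, install_prefix("2023.06", "x86_64/amd/zen4") returns "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4".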

Note

The for: and on: arguments (the latter is described below) were introduced in bot version v0.9.0. They replace the architecture=... accelerator=... syntax used in bot versions up to and including v0.8.0.

Warning

Most likely, you want to supply one or more filters, to avoid triggering all bot instances to build for every configuration that matches the above command.

Filters

Build instructions can include filters that are applied by each bot instance to determine which builds should be executed, based on:

instance: the name of the bot instance, for example instance:aws for the bot instance running in AWS;
repository: the target repository, for example eessi-2023.06-software which corresponds to the 2023.06 version of the EESSI software layer;
on:architecture=<on_arch>,accelerator=<on_accelerator>: the name of the CPU microarchitecture and GPU accelerator you want to build on, for example on:architecture=x86_64/amd/zen4,accelerator=nvidia/cc90;
for:architecture=<for_arch>,accelerator=<for_accelerator>: the name of the CPU microarchitecture and GPU accelerator you want to build for, for example for:architecture=x86_64/amd/zen4,accelerator=nvidia/cc90.

Note

Use : as the separator between a filter and its value, and do not add spaces after the :.

The bot recognizes the following shorthands for the supported filters: inst for instance, repo for repository, arch for architecture, and accel for accelerator. For example, inst:aws is equivalent to instance:aws, and for:arch=x86_64/amd/zen4 is equivalent to for:architecture=x86_64/amd/zen4.
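The sketch below shows one way a single build instruction could be split into filters, expanding the shorthands described above. It assumes that filters are whitespace-separated name:value words; the function and the dictionary layout it returns are hypothetical, not the bot's own parser.

```python
# Shorthands from the text above; full names are used as dictionary keys.
FILTER_SHORTHANDS = {"inst": "instance", "repo": "repository"}
TARGET_SHORTHANDS = {"arch": "architecture", "accel": "accelerator"}


def parse_build_instruction(line: str) -> dict:
    """Hypothetical parser for 'bot: build <filter> <filter> ...'."""
    if not line.startswith("bot:"):
        raise ValueError(f"not a bot instruction: {line!r}")
    words = line[len("bot:"):].split()
    if not words or words[0] != "build":
        raise ValueError(f"not a build instruction: {line!r}")
    filters: dict = {}
    for word in words[1:]:
        name, sep, value = word.partition(":")
        if not sep or not value:
            # Also catches "repo: value", i.e. a space after the ':'.
            raise ValueError(f"malformed filter {word!r}")
        name = FILTER_SHORTHANDS.get(name, name)
        if name in ("on", "for"):
            # on: and for: take comma-separated key=value pairs.
            target = {}
            for item in value.split(","):
                key, eq, val = item.partition("=")
                if not eq or not val:
                    raise ValueError(f"expected key=value, got {item!r}")
                target[TARGET_SHORTHANDS.get(key, key)] = val
            filters[name] = target
        else:
            filters[name] = value
    return filters
```

For example, bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen2,accel=nvidia/cc80 yields {"repository": "eessi.io-2023.06-software", "for": {"architecture": "x86_64/amd/zen2", "accelerator": "nvidia/cc80"}}.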

Combining filters

You can combine multiple filters in a single build instruction. Separate filters with a space; their order does not matter.

For example:

bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen2
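Because filters are looked up by name, their order in the instruction does not matter. The hypothetical function below sketches how one bot instance might decide which of its configured repositories an instruction applies to. It takes the filters as a plain dictionary keyed by full filter names, and the substring matching it uses is an assumption standing in for the partial matching described further down.

```python
def repositories_to_build(filters: dict, instance_name: str, repositories: list[str]) -> list[str]:
    """Hypothetical: configured repositories this instance should build for, possibly none."""
    # Assumption: a filter matches if its value is a substring of the configured value.
    wanted_instance = filters.get("instance")
    if wanted_instance is not None and wanted_instance not in instance_name:
        return []  # the instruction targets a different bot instance
    wanted_repo = filters.get("repository")
    if wanted_repo is None:
        return list(repositories)  # no repository filter: all configured repositories
    return [repo for repo in repositories if wanted_repo in repo]
```

Swapping the order of repo: and for: in the example instruction produces the same dictionary, and therefore the same result.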

Multiple build instructions

You can issue multiple build instructions in a single comment, even across multiple bot instances, repositories, and CPU targets. Specify one build instruction per line.

For example:

bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen3 inst:aws
bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen4 inst:azure
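Conceptually, each line of such a comment is handled as a separate instruction, and each bot instance then applies the filters on that line. A minimal sketch of the splitting step, assuming that lines which do not start with bot: are simply skipped (the function name is hypothetical):

```python
def split_instructions(comment_body: str) -> list[str]:
    """Hypothetical: one bot instruction per line; other lines are skipped."""
    instructions = []
    for line in comment_body.splitlines():
        line = line.strip()
        if line.startswith("bot:"):
            instructions.append(line)
    return instructions
```

Applied to the example above, it returns the two bot: build lines; the instance running in AWS would then only act on the first one, because of its inst:aws filter.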

Native builds

If you want to allocate the same node type as the one you want to build for, you can omit the on: argument. For example, bot: build for:arch=x86_64/amd/zen4 is fully equivalent to bot: build on:arch=x86_64/amd/zen4 for:arch=x86_64/amd/zen4.
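The native-build default can be expressed in a few lines. In this hypothetical sketch, on: and for: are dictionaries of target properties, and a missing on: is replaced by a copy of for:. Treating for: as mandatory is an assumption based on the basic instruction shown earlier.

```python
def resolve_targets(filters: dict) -> tuple[dict, dict]:
    """Hypothetical: return (on, for) targets, defaulting on: to for: for native builds."""
    for_target = filters.get("for")
    if not for_target:
        raise ValueError("a build instruction needs a for: argument")
    # Native build: allocate the same node type as the one being built for.
    on_target = filters.get("on") or dict(for_target)
    return on_target, for_target
```

With an explicit on: argument, as in the cross-compilation example below, the two targets differ: on: selects a CPU-only zen2 node, while for: also carries the nvidia/cc80 accelerator.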

Cross-compiling

The on: and for: arguments are separate to allow cross-compilation, and to make explicit which architecture should be allocated when doing so. The typical use case is building GPU software on a CPU-only node.

For example, bot: build on:arch=x86_64/amd/zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80 instructs the bot to build on a zen2 CPU-only node, for the combination of a zen2 CPU with a GPU of CUDA Compute Capability 8.0.

Warning

Cross-compilation for different CPU targets cannot be done with the current setup. This is not a limitation of the bot, but of the build scripts in the software-layer-scripts repository. All the bot does is prepare a job directory in which the configuration passed through for: is stored in a job.cfg file; what is done with that information is up to the build scripts from software-layer-scripts. These currently set EasyBuild's CUDA Compute Capability configuration based on the accelerator target defined in job.cfg, but use the architecture target only to determine the installation path. The architecture target is not used to set EasyBuild's optarch configuration, which therefore still defaults to native optimization (i.e. for the host).
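To illustrate this split, the sketch below reads a job.cfg and derives a value for EasyBuild's CUDA compute capability setting from the accelerator target, while the architecture target is only returned for building the installation path. The section and key names in the example job.cfg, and both function names, are hypothetical; the real file layout and the logic in software-layer-scripts may differ.

```python
import configparser
from typing import Optional

# Hypothetical job.cfg contents; section and key names are illustrative only.
EXAMPLE_JOB_CFG = """
[architecture]
software_subdir = x86_64/amd/zen2
accelerator = nvidia/cc80
"""


def cuda_cc_from_accelerator(accelerator: str) -> Optional[str]:
    """Turn e.g. 'nvidia/cc80' into '8.0'; return None for non-NVIDIA targets."""
    vendor, _, model = accelerator.partition("/")
    digits = model[2:]
    if vendor != "nvidia" or not model.startswith("cc") or len(digits) < 2 or not digits.isdigit():
        return None
    return f"{digits[:-1]}.{digits[-1]}"


def read_targets(cfg_text: str) -> tuple[str, Optional[str]]:
    """Return (architecture target, CUDA compute capability or None)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    # The architecture target only determines the installation path (no optarch).
    arch = cfg.get("architecture", "software_subdir")
    accel = cfg.get("architecture", "accelerator", fallback="")
    return arch, (cuda_cc_from_accelerator(accel) if accel else None)
```

A build script could then pass the returned "8.0" to EasyBuild's --cuda-compute-capabilities option, while leaving optarch untouched.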

Partial filter matching

The bot applies filters with partial matching, except for the for: argument. That is, you can use bot: build on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc80, but not bot: build on:arch=zen4 for:arch=zen4,accel=nvidia/cc80. The reason is that for the on: argument, the bot can compare the value against its configured node types to find a match, whereas for the for: argument there is no such reference to match against.
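One way to picture this is the hypothetical helper below, which compares a shortened on:arch=... value against the architectures of the node types an instance is configured with. The exact matching rule is an assumption here (the value must equal the trailing components of a configured architecture); there is no equivalent for for:, since there is nothing to compare it against.

```python
def match_on_arch(on_arch: str, configured_archs: list[str]) -> list[str]:
    """Hypothetical: configured architectures that a possibly shortened on:arch value matches."""
    wanted = on_arch.strip("/").split("/")
    matches = []
    for arch in configured_archs:
        parts = arch.split("/")
        # Assumed rule: 'zen4' and 'amd/zen4' both match 'x86_64/amd/zen4'.
        if parts[-len(wanted):] == wanted:
            matches.append(arch)
    return matches
```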

Accelerator filter matching

If the bot configuration declares that a node type has a certain accelerator, that node type is only allocated if a corresponding accel=<on_accel> is specified. This prevents on:arch=x86_64/amd/zen4 from triggering builds on a zen4+GPU node when the same system also has a CPU-only zen4 node type.
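A sketch of this rule, with hypothetical node type names and class: requiring the requested accelerator to equal the configured one (including both being absent) keeps GPU node types out of CPU-only requests and vice versa. For brevity, the architecture is compared exactly here, without partial matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeType:
    """Hypothetical node type entry from a bot instance's configuration."""

    name: str
    arch: str
    accel: Optional[str] = None


def select_node_types(node_types: list[NodeType], on_arch: str, on_accel: Optional[str] = None) -> list[NodeType]:
    # A node type with an accelerator only matches if that accelerator was requested;
    # a CPU-only node type only matches requests without an accelerator.
    return [nt for nt in node_types if nt.arch == on_arch and nt.accel == on_accel]
```

With a CPU-only node type and a GPU node type that both have arch x86_64/amd/zen4, a request without accel selects only the CPU-only one.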

Behind-the-scenes

Processing build instructions

When the bot receives build instructions through a comment in a pull request, they are processed by the event handler component. It will:

1) Combine its active configuration (instance name, repositories, supported CPU targets) and the build instructions to prepare a list of jobs to submit;

2) Create a working directory for each job, including a Slurm job script that runs the bot/build.sh script in the context of the changes proposed in the pull request to build the software, and runs bot/check-result.sh script at the end to check whether the build was successful;

3) Submit each prepared job, in a held state, to a worker node that can build for the specified CPU target.
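Steps 2 and 3 can be sketched in Python as follows. The directory layout, job script contents, partition name and function name are hypothetical and much simpler than what the bot actually generates (for instance, setting up the pull request changes is omitted); the sbatch options --hold and --chdir are standard Slurm options.

```python
import tempfile
from pathlib import Path

# Illustrative job script: run the build, then check its result.
# (The real job script also sets up the pull request changes; omitted here.)
JOB_SCRIPT = """#!/bin/bash
bot/build.sh
bot/check-result.sh
"""


def prepare_job(base_dir: Path, job_name: str, slurm_args: list[str]) -> tuple[Path, list[str]]:
    """Hypothetical: create a job directory and return it with the sbatch command to run."""
    job_dir = base_dir / job_name
    job_dir.mkdir(parents=True, exist_ok=True)
    script = job_dir / "job.sh"
    script.write_text(JOB_SCRIPT)
    script.chmod(0o755)
    # --hold submits the job in a held state, so it will not start until released.
    command = ["sbatch", "--hold", f"--chdir={job_dir}", *slurm_args, str(script)]
    return job_dir, command


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        _, cmd = prepare_job(Path(tmp), "pr-example-zen4", ["--partition=zen4"])
        print(" ".join(cmd))  # on a Slurm cluster: subprocess.run(cmd, check=True)
```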

Managing build jobs

During the next iteration of the job manager, the submitted jobs are released and queued for execution.

The job manager also monitors the running jobs at regular intervals, and reports back in the pull request when a job has completed. Its report includes the result (SUCCESS or FAILURE), based on the outcome of the bot/check-result.sh script.
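A single job manager iteration could be planned as in the hypothetical sketch below, which works on the output of squeue --noheader --format="%i %T %r" (job ID, state, and reason). Jobs submitted with --hold are pending with the reason JobHeldUser and can be released with scontrol release; submitted jobs that are no longer listed are treated as finished. Determining SUCCESS or FAILURE via bot/check-result.sh is not shown.

```python
def plan_iteration(squeue_output: str, submitted_jobs: set[str]) -> tuple[list[list[str]], set[str]]:
    """Hypothetical: return (scontrol commands to release held jobs, finished job IDs)."""
    in_queue = {}
    for line in squeue_output.splitlines():
        if not line.strip():
            continue
        job_id, state, reason = line.split(maxsplit=2)
        in_queue[job_id] = (state, reason)
    # Only release the bot's own jobs that were submitted with 'sbatch --hold'.
    release = [
        ["scontrol", "release", job_id]
        for job_id, (state, reason) in sorted(in_queue.items())
        if job_id in submitted_jobs and state == "PENDING" and reason == "JobHeldUser"
    ]
    # Submitted jobs that have left the queue (completed, failed or cancelled).
    finished = submitted_jobs - set(in_queue)
    return release, finished
```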

Artefacts

If all goes well, each job should produce a tarball as an artefact, which contains the software installations and the corresponding environment module files.

The message reported by the job manager includes an overview of the contents of the artefact, as produced by the bot/check-result.sh script.
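In the bot this overview comes from bot/check-result.sh; the Python sketch below only illustrates the idea. It lists the installations and module files found in a tarball, assuming member paths of the form <prefix>/software/<name>/<version>/... and <prefix>/modules/all/<name>/<version>.lua. This layout and both function names are assumptions.

```python
import io
import tarfile
from typing import BinaryIO


def summarize_tarball(fileobj: BinaryIO, prefix: str) -> dict[str, list[str]]:
    """Hypothetical: list installations and module files below prefix in a tarball."""
    software, modules = set(), set()
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for name in tar.getnames():
            if not name.startswith(prefix + "/"):
                continue
            parts = name[len(prefix) + 1:].split("/")
            if parts[0] == "software" and len(parts) >= 3:
                # <prefix>/software/<name>/<version>/...
                software.add(f"{parts[1]}/{parts[2]}")
            elif parts[:2] == ["modules", "all"] and len(parts) == 4 and parts[3].endswith(".lua"):
                # <prefix>/modules/all/<name>/<version>.lua
                modules.add(f"{parts[2]}/{parts[3].removesuffix('.lua')}")
    return {"software": sorted(software), "modules": sorted(modules)}


def build_example_tarball(member_names: list[str]) -> io.BytesIO:
    """Create a small in-memory .tar.gz with empty files, for trying out the summary."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for member_name in member_names:
            tar.addfile(tarfile.TarInfo(member_name), io.BytesIO(b""))
    buffer.seek(0)
    return buffer
```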

Testing

The bot also runs the tests from the EESSI test suite that match the software being installed. These tests are run after the build step, through the bot/test.sh script. Which tests run for which software is defined in tests/eessi_test_mapping/software_to_tests.yml. Finally, the job manager reports a summary of the test results, created by the bot/check-test.sh script.

Deploying

To deploy the artefacts that were obtained in the build phase, you should add the bot:deploy label to the pull request.

This will trigger the event handler to upload the artefacts for ingestion into the EESSI repository.

Behind-the-scenes

The current setup for the software-layer repository is as follows:

The bot deploys the artefacts (tarballs) to an S3 bucket in AWS, along with a metadata file, using the eessi-upload-to-staging script;
A cron job that runs every couple of minutes on the CernVM-FS Stratum-0 server opens a pull request to the (private) EESSI/staging repository, to move the metadata file for each uploaded tarball from the staged to the approved directory;
Once that pull request is merged, the tarball is automatically ingested into the EESSI repository by a cron job on the Stratum-0 server, and the metadata file is moved from approved to ingested in the EESSI/staging repository.
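The life cycle of a metadata file can be summarized as a small state machine. This Python sketch only models the steps listed above, with hypothetical names; it does not interact with S3, GitHub, or the Stratum-0 server.

```python
from enum import Enum


class Stage(Enum):
    """Directories of the EESSI/staging repository that a metadata file passes through."""

    STAGED = "staged"
    APPROVED = "approved"
    INGESTED = "ingested"


# staged -> approved: the pull request opened by the Stratum-0 cron job is merged
# approved -> ingested: the tarball has been ingested by the cron job on Stratum-0
NEXT_STAGE = {Stage.STAGED: Stage.APPROVED, Stage.APPROVED: Stage.INGESTED}


def advance(stage: Stage) -> Stage:
    """Return the next stage, or raise ValueError if the tarball is already ingested."""
    try:
        return NEXT_STAGE[stage]
    except KeyError:
        raise ValueError(f"{stage.value} is the final stage") from None


def metadata_location(stage: Stage, metadata_file: str) -> str:
    """Hypothetical relative path of a metadata file in the staging repository."""
    return f"{stage.value}/{metadata_file}"
```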