Build-test-deploy bot
Building, testing, and deploying software is done by one or more bot instances.
The EESSI build-test-deploy bot is implemented as a GitHub App in the eessi-bot-software-layer repository.
It operates in the context of pull requests to the compatibility-layer repository or the software-layer repository, and follows the instructions supplied by humans, so the procedure of adding software to EESSI is semi-automatic.
It leverages the scripts provided in the bot/ subdirectory of the target repository, such as bot/build.sh to build software and bot/check-result.sh to check whether the software was built correctly.
High-level design
The bot consists of two components: the event handler and the job manager.
Event handler
The bot event handler is responsible for handling GitHub events for the GitHub repositories it is registered to.
It is triggered for every event that it receives from GitHub. Most events are ignored, but specific events trigger the bot to take action.
Examples of actionable events are posting a comment that starts with bot:, which may specify an instruction for the bot such as building software, and adding a bot:deploy label (see Deploying).
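To make the event filtering concrete, here is a minimal Python sketch of how a handler could decide whether an incoming webhook event needs action. It relies on the standard GitHub webhook payload fields (action, comment.body, label.name) for issue_comment and pull_request events; the function name is_actionable and its exact rules are hypothetical simplifications, not the bot's actual code.

```python
from typing import Any, Dict


# Hypothetical helper: decide whether a GitHub webhook event needs action.
# Uses standard GitHub payload fields; the decision rules are a simplified sketch.
def is_actionable(event_type: str, payload: Dict[str, Any]) -> bool:
    action = payload.get("action")
    if event_type == "issue_comment" and action == "created":
        # A new comment is actionable if one of its lines starts with "bot:"
        body = payload.get("comment", {}).get("body", "")
        return any(line.strip().startswith("bot:") for line in body.splitlines())
    if event_type == "pull_request" and action == "labeled":
        # Adding the "bot:deploy" label triggers deployment
        return payload.get("label", {}).get("name") == "bot:deploy"
    # All other events are ignored
    return False
```

Checking every line of the comment, rather than only the first, anticipates comments that contain several instructions, one per line.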
Job manager
The bot job manager is responsible for monitoring the queued and running jobs, and for reporting back when jobs have completed.
It runs every couple of minutes as a cron job.
Basics
Instructions for the bot should always start with bot:.
To get help from the bot, post a comment with bot: help.
To make the bot report how it is configured, post a comment with bot: show_config.
Permissions
The bot is configured to only act on instructions issued by specific GitHub accounts.
There are separate configuration options that control who may send instructions to the bot, who may trigger building of software, and who may deploy software installations into the EESSI repository.
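The three separate permission levels can be pictured as three allow-lists. The Python sketch below is only an illustration under that assumption: the Permissions class, its field names, and the action names are hypothetical and do not correspond to the bot's real configuration keys.

```python
from dataclasses import dataclass, field
from typing import Set


# Hypothetical permission model: one allow-list per kind of action.
# Field names are illustrative, not the bot's actual configuration keys.
@dataclass
class Permissions:
    command: Set[str] = field(default_factory=set)  # may send instructions
    build: Set[str] = field(default_factory=set)    # may trigger builds
    deploy: Set[str] = field(default_factory=set)   # may trigger deployment

    def allowed(self, account: str, action: str) -> bool:
        allow_lists = {"command": self.command, "build": self.build, "deploy": self.deploy}
        # Unknown actions are denied by default
        return account in allow_lists.get(action, set())
```

Keeping the lists independent means an account can, for example, be allowed to trigger builds without being allowed to deploy.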
Note
Ask for help in the #software-layer-bot channel of the EESSI Slack if needed!
Building
To instruct the bot to build software, one or more build instructions should be issued by posting a comment in the pull request.
The most basic build instruction that can be sent to the bot is:
bot: build for:arch=<for_arch>
where for_arch could be, for example, x86_64/amd/zen4. This will trigger the bot to allocate a node of that type and build in the /cvmfs/software.eessi.io/versions/<eessi-version>/software/linux/x86_64/amd/zen4 prefix.
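The relation between the for: CPU target and the installation prefix is a simple path join. The sketch below follows the layout shown above for CPU-only targets; the function name install_prefix is hypothetical, and accelerator-specific subdirectories are deliberately not modelled.

```python
# Sketch: installation prefix for a given EESSI version and for: CPU target,
# following the path layout shown above (CPU-only targets only).
def install_prefix(eessi_version: str, for_arch: str) -> str:
    return f"/cvmfs/software.eessi.io/versions/{eessi_version}/software/linux/{for_arch}"


print(install_prefix("2023.06", "x86_64/amd/zen4"))
# prints /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4
```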
Note
The for: (and on:, see below) arguments were introduced in bot version v0.9.0. They replace the architecture=... accelerator=... syntax used in bot versions <= v0.8.0.
Warning
Most likely, you want to supply one or more filters, to avoid triggering all bots to build for all configurations that match the above command.
Filters
Build instructions can include filters that are applied by each bot instance to determine which builds should be executed, based on:
- instance: the name of the bot instance, for example instance:aws for the bot instance running in AWS;
- repository: the target repository, for example eessi-2023.06-software, which corresponds to the 2023.06 version of the EESSI software layer;
- on:architecture=<on_arch>,accelerator=<on_accelerator>: the name of the CPU microarchitecture and GPU accelerator you want to build on, for example on:architecture=x86_64/amd/zen4,accelerator=nvidia/cc90;
- for:architecture=<for_arch>,accelerator=<for_accelerator>: the name of the CPU microarchitecture and GPU accelerator you want to build for, for example for:architecture=x86_64/amd/zen4,accelerator=nvidia/cc90.
Note
Use : as the separator to specify a value for a particular argument, and do not add spaces after the :.
The bot recognizes shorthands for the supported filters, so you can use inst:... instead of instance:..., repo:... instead of repository:..., arch=... instead of architecture=..., and accel=... instead of accelerator=....
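The filter syntax and shorthands above can be summarised by a small parser. The Python sketch below is illustrative only and is not the bot's own parsing code; the function name parse_build_instruction and the shape of the returned dict are assumptions. It expands the shorthands, splits on: and for: values into architecture/accelerator pairs, and rejects a filter whose value is empty, which is what happens when a space follows the :.

```python
from typing import Dict, Union

# Shorthands listed above, mapped to their full names
FILTER_SHORTHANDS = {"inst": "instance", "repo": "repository"}
ARG_SHORTHANDS = {"arch": "architecture", "accel": "accelerator"}

Filters = Dict[str, Union[str, Dict[str, str]]]


def parse_build_instruction(line: str) -> Filters:
    """Parse 'bot: build <filter> <filter> ...' into a dict (illustrative only)."""
    text = line.strip()
    if not text.startswith("bot:"):
        raise ValueError(f"not a bot instruction: {line!r}")
    words = text[len("bot:"):].split()
    if not words or words[0] != "build":
        raise ValueError(f"not a build instruction: {line!r}")
    filters: Filters = {}
    for word in words[1:]:
        # Each filter is 'key:value'; a space after ':' leaves the value empty
        key, sep, value = word.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed filter: {word!r}")
        key = FILTER_SHORTHANDS.get(key, key)
        if key in ("on", "for"):
            # 'arch=...,accel=...' -> {'architecture': ..., 'accelerator': ...}
            args: Dict[str, str] = {}
            for item in value.split(","):
                name, _, val = item.partition("=")
                args[ARG_SHORTHANDS.get(name, name)] = val
            filters[key] = args
        else:
            filters[key] = value
    return filters
```

Because filters are collected into a dict keyed by filter name, their order in the instruction does not affect the result.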
Combining filters
You can combine multiple filters in a single build instruction.
Separate filters with a space; the order of filters does not matter.
For example:
bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen4 inst:aws
Multiple build instructions
You can issue multiple build instructions in a single comment, even across multiple bot instances, repositories, and CPU targets. Specify one build instruction per line.
For example:
bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen3 inst:aws
bot: build repo:eessi.io-2023.06-software for:arch=x86_64/amd/zen4 inst:azure
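Since every bot instance sees the same comment, each one has to pick out the lines addressed to it. The Python sketch below shows one way to do that under stated assumptions: the function name instructions_for_instance is hypothetical, and the substring comparison is only a stand-in for the bot's partial matching, not its actual implementation.

```python
from typing import List


def instructions_for_instance(comment: str, instance: str) -> List[str]:
    """Return the bot instructions in a comment that this instance should act on."""
    selected = []
    for line in comment.splitlines():
        line = line.strip()
        if not line.startswith("bot:"):
            continue  # ordinary text, not an instruction
        # Collect the values of any inst:/instance: filters on this line
        wanted = [word.split(":", 1)[1] for word in line.split()
                  if word.startswith(("inst:", "instance:"))]
        # No instance filter means every instance acts; substring matching
        # is an assumed stand-in for the bot's partial matching
        if not wanted or any(value in instance for value in wanted):
            selected.append(line)
    return selected
```

With the example comment above, the aws instance keeps only the zen3 line and the azure instance only the zen4 line.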
Native builds
If you want to build on the same node type that you are building for, you can omit the on: argument.
For example, bot:build for:arch=x86_64/amd/zen4 is fully equivalent to bot:build on:arch=x86_64/amd/zen4 for:arch=x86_64/amd/zen4.
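The defaulting rule for native builds fits in a few lines. In this Python sketch, targets are assumed to be dicts of architecture and, optionally, accelerator, and the function name resolve_targets is hypothetical: when no on: target is given, the for: target is reused.

```python
from typing import Dict, Optional, Tuple

Target = Dict[str, str]


def resolve_targets(for_target: Target, on_target: Optional[Target] = None) -> Tuple[Target, Target]:
    """Return (on, for); without an on: argument, build natively on the for: target."""
    on = dict(for_target) if on_target is None else dict(on_target)
    return on, dict(for_target)
```

Returning copies keeps later changes to one target from silently altering the other.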
Cross-compiling
The on: and for: arguments are separate to allow cross-compilation, and to be specific about which architecture to allocate when doing so. The typical use case is building GPU software on a CPU-only node.
For example, bot:build on:arch=x86_64/amd/zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80 will instruct the bot to build on a zen2 CPU-only node, for a combination of a zen2 CPU with a GPU with CUDA Compute Capability 8.0.
Warning
Cross-compilation for different CPU targets cannot be done with the current setup. This is not a limitation of the bot, but of the build scripts in the software-layer-scripts repository. The bot only prepares a job directory in which the configuration passed through for: is stored in a job.cfg file. What is done with that information is up to the build scripts from software-layer-scripts. These currently set the CUDA Compute Capability configuration item for EasyBuild based on the accelerator target defined in job.cfg, but the architecture target is only used to determine the installation path. It is not used to set EasyBuild's optarch configuration, which therefore still defaults to native optimization (i.e. for the host CPU).
Partial filter matching
The bot applies partial matching to filters, except for the for: argument. That is, you can use bot:build on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc80, but not bot:build on:arch=zen4 for:arch=zen4,accel=nvidia/cc80.
The reason is that for the on: argument, the bot can compare against the configured node types to find a match, while for the for: argument there is no such reference to match against.
Accelerator filter matching
If the bot configuration declares that a node type has a certain accelerator, that node type is only allocated if a corresponding accel=<on_accel> argument is passed. This prevents on:arch=x86_64/amd/zen4 from triggering builds on a zen4+GPU node when the same system also has a CPU-only zen4 node type.
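The rule above can be sketched in Python as follows. The NodeType structure, its fields, and the node names are hypothetical. The sketch also assumes that CPU-only node types are skipped when an accelerator is requested, which the text above does not state explicitly.

```python
from typing import List, NamedTuple, Optional


class NodeType(NamedTuple):
    name: str
    arch: str
    accel: Optional[str] = None  # None for CPU-only node types


def select_node_types(node_types: List[NodeType], on_arch: str,
                      on_accel: Optional[str] = None) -> List[str]:
    selected = []
    for node in node_types:
        if on_arch not in node.arch:
            continue
        if node.accel is None:
            # CPU-only node types match only when no accelerator is requested
            if on_accel is None:
                selected.append(node.name)
        elif on_accel is not None and on_accel in node.accel:
            # GPU node types are only allocated for a matching accel= argument
            selected.append(node.name)
    return selected
```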
Behind-the-scenes
Processing build instructions
When the bot receives build instructions through a comment in a pull request, they are processed by the event handler component. It will:
1) Combine its active configuration (instance name, repositories, supported CPU targets) with the build instructions to prepare a list of jobs to submit;
2) Create a working directory for each job, including a Slurm job script that runs the bot/build.sh script in the context of the changes proposed in the pull request to build the software, and runs the bot/check-result.sh script at the end to check whether the build was successful;
3) Submit each prepared job to a worker node that can build for the specified CPU target, and put a hold on it.
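Step 2 can be sketched in Python as creating one directory per job that contains a job.cfg and a job script. This is a simplified illustration: the function name prepare_job_dir, the section and key names in job.cfg, and the minimal script contents are all assumptions. The real job script also sets up the pull request's changes, which is not modelled here.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


def prepare_job_dir(base: Path, job_name: str, for_arch: str,
                    for_accel: Optional[str] = None) -> Path:
    """Create a job directory with a job.cfg and a Slurm job script (sketch)."""
    job_dir = base / job_name
    job_dir.mkdir(parents=True)
    # Record the for: target; section and key names are hypothetical
    cfg = configparser.ConfigParser()
    cfg["architecture"] = {"software_subdir": for_arch,
                           "accelerator": for_accel or ""}
    with open(job_dir / "job.cfg", "w") as fh:
        cfg.write(fh)
    # Build first, then always run the result check so failures are reported too
    (job_dir / "job.sh").write_text(
        "#!/bin/bash\n"
        "bot/build.sh\n"
        "bot/check-result.sh\n"
    )
    return job_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = prepare_job_dir(Path(tmp), "job_zen4", "x86_64/amd/zen4")
        print((job_dir / "job.cfg").read_text())
```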
Managing build jobs
During the next iteration of the job manager, the submitted jobs are released and queued for execution.
The job manager also monitors the running jobs at regular intervals, and reports back in the pull request when a job has completed. It also reports the result (SUCCESS or FAILURE), based on the result of the bot/check-result.sh script.
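The hold-and-release cycle maps onto standard Slurm commands. The Python sketch below only builds the command lines, which could then be passed to subprocess.run. That the bot uses exactly these invocations is an assumption, and the function names are hypothetical.

```python
from typing import List


def submit_cmd(job_script: str, job_dir: str) -> List[str]:
    # --hold submits the job in a held state; --chdir sets its working directory
    return ["sbatch", "--hold", f"--chdir={job_dir}", job_script]


def release_cmd(job_ids: List[str]) -> List[str]:
    # scontrol release takes a comma-separated list of job IDs
    return ["scontrol", "release", ",".join(job_ids)]


def result_label(check_passed: bool) -> str:
    # Result reported back in the pull request
    return "SUCCESS" if check_passed else "FAILURE"
```

Submitting held jobs and releasing them on the next job manager run lets the job manager track every job from the moment it starts to queue.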
Artefacts
If all goes well, each job should produce a tarball as an artefact, which contains the software installations and the corresponding environment module files.
The message reported by the job manager provides an overview of the contents of the artefact, as produced by the bot/check-result.sh script.
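To illustrate what such an overview involves, the Python sketch below groups the members of an artefact tarball into software installations and module files. It assumes a typical EasyBuild layout under a given prefix (software/<name>/<version>/... and modules/<...>.lua). The prefix parameter, both function names, and the demo tarball helper are hypothetical. The overview produced by bot/check-result.sh may look different.

```python
import io
import tarfile
from typing import Dict, Iterable, Set


def summarize_artefact(tar: tarfile.TarFile, prefix: str) -> Dict[str, Set[str]]:
    """Group tarball members under prefix into software installations and module files."""
    software: Set[str] = set()
    modules: Set[str] = set()
    for name in tar.getnames():
        if not name.startswith(prefix + "/"):
            continue
        rel = name[len(prefix) + 1:].split("/")
        if rel[0] == "software" and len(rel) >= 3:
            software.add(f"{rel[1]}/{rel[2]}")  # <name>/<version>
        elif rel[0] == "modules" and name.endswith(".lua"):
            modules.add("/".join(rel[1:]))
    return {"software": software, "modules": modules}


def make_demo_tarball(names: Iterable[str]) -> io.BytesIO:
    """Build a small in-memory .tar.gz with one-byte files, for trying the function out."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 1
            tar.addfile(info, io.BytesIO(b"x"))
    buf.seek(0)
    return buf
```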
Testing
The bot also runs tests from the EESSI test suite if they match the software being installed. These tests are run by the bot after the build step, through the bot/test.sh script. Which tests run for the built software is defined in tests/eessi_test_mapping/software_to_tests.yml. Finally, the job manager reports a summary of the test results, created by the bot/check-test.sh script.
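The idea of mapping built software to tests can be sketched with an in-memory dictionary of module-name patterns. Everything concrete here is an assumption: the real software_to_tests.yml may use a different schema, and the test names below are placeholders rather than actual EESSI test suite names.

```python
import re
from typing import Dict, List, Optional

# Hypothetical stand-in for tests/eessi_test_mapping/software_to_tests.yml:
# module-name regexes mapped to placeholder test names.
SOFTWARE_TO_TESTS: Dict[str, List[str]] = {
    r"^GROMACS/": ["GROMACS_test"],
    r"^OSU-Micro-Benchmarks/": ["OSU_test"],
}


def select_tests(built_modules: List[str],
                 mapping: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Return the tests to run for the built modules, without duplicates."""
    if mapping is None:
        mapping = SOFTWARE_TO_TESTS
    selected: List[str] = []
    for module in built_modules:
        for pattern, tests in mapping.items():
            if re.search(pattern, module):
                for test in tests:
                    if test not in selected:
                        selected.append(test)
    return selected
```

Software without an entry in the mapping simply selects no tests, so the test step can be skipped for it.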
Deploying
To deploy the artefacts obtained in the build phase, add the bot:deploy label to the pull request.
This triggers the event handler to upload the artefacts for ingestion into the EESSI repository.
Behind-the-scenes
The current setup for the software-layer repository is as follows:
- The bot deploys the artefacts (tarballs) to an S3 bucket in AWS, along with a metadata file, using the eessi-upload-to-staging script;
- A cron job that runs every couple of minutes on the CernVM-FS Stratum-0 server opens a pull request to the (private) EESSI/staging repository, to move the metadata file for each uploaded tarball from the staged to the approved directory;
- Once that pull request gets merged, the tarball is automatically ingested into the EESSI repository by a cron job on the Stratum-0 server, and the metadata file is moved from approved to ingested in the EESSI/staging repository.
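The staging workflow amounts to a small state machine for each metadata file. The Python sketch below models the three directories as stages. The Stage enum and the helper names are hypothetical, and so is the assumption that the directories sit at the top level of EESSI/staging.

```python
from enum import Enum


class Stage(Enum):
    STAGED = "staged"      # tarball uploaded, waiting for approval
    APPROVED = "approved"  # pull request in EESSI/staging has been merged
    INGESTED = "ingested"  # Stratum-0 cron job has ingested the tarball


# Each metadata file moves forward one directory at a time
NEXT_STAGE = {Stage.STAGED: Stage.APPROVED, Stage.APPROVED: Stage.INGESTED}


def advance(stage: Stage) -> Stage:
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value} is the final stage")
    return NEXT_STAGE[stage]


def metadata_path(stage: Stage, metadata_file: str) -> str:
    # Location of the metadata file inside the EESSI/staging repository
    return f"{stage.value}/{metadata_file}"
```

Allowing only forward moves reflects the one-way flow described above: a tarball is never ingested without first being approved.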