Known issues
EESSI Production Repository (v2023.06)
Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
This error occurs with OpenMPI after updating to OFED 23.10.
An upstream issue for this problem has been opened with EasyBuild; see https://github.com/easybuilders/easybuild-easyconfigs/issues/20233
Workarounds

You can instruct OpenMPI not to use libfabric and to turn off `uct` (see https://openucx.readthedocs.io/en/master/running.html#running-mpi) by passing the options `--mca pml ucx --mca btl '^uct,ofi' --mca mtl '^ofi'` to `mpirun`.
Or equivalently, you can set the environment variables `OMPI_MCA_btl='^uct,ofi'`, `OMPI_MCA_pml='ucx'`, and `OMPI_MCA_mtl='^ofi'`.

You may also set these environment variables via site-specific Lmod hooks:

require("strict")
local hook = require("Hook")

-- Fix "Failed to modify UD QP to INIT on mlx5_0: Operation not permitted"
function fix_ud_qp_init_openmpi(t)
    -- Module name without the version, e.g. "OpenMPI" from "OpenMPI/<version>"
    local simpleName = string.match(t.modFullName, "(.-)/")
    if simpleName == 'OpenMPI' then
        setenv('OMPI_MCA_btl', '^uct,ofi')
        setenv('OMPI_MCA_pml', 'ucx')
        setenv('OMPI_MCA_mtl', '^ofi')
    end
end

-- Run the EESSI load hook first (if it is defined), then apply the fix
local function combined_load_hook(t)
    if eessi_load_hook ~= nil then
        eessi_load_hook(t)
    end
    fix_ud_qp_init_openmpi(t)
end

hook.register("load", combined_load_hook)
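
To confirm that the workaround is active in a given shell or job script, a check along the following lines may help. This is a minimal sketch, not part of EESSI: the values are the ones set by the hook above, the helper name `check_ompi_workaround` and the application `./my_app` are hypothetical, and the commented `mpirun` line assumes Open MPI's standard `--mca <name> <value>` syntax.

```shell
# Set the variables by hand (same values as in the Lmod hook above).
export OMPI_MCA_btl='^uct,ofi'
export OMPI_MCA_pml='ucx'
export OMPI_MCA_mtl='^ofi'

# Hypothetical helper: succeeds only if all three variables hold the
# values used by the workaround.
check_ompi_workaround() {
    [ "${OMPI_MCA_btl:-}" = '^uct,ofi' ] &&
    [ "${OMPI_MCA_pml:-}" = 'ucx' ] &&
    [ "${OMPI_MCA_mtl:-}" = '^ofi' ]
}

if check_ompi_workaround; then
    echo "OpenMPI workaround active"
else
    echo "OpenMPI workaround NOT active" >&2
fi

# Per-invocation form (./my_app is a placeholder application):
# mpirun --mca pml ucx --mca btl '^uct,ofi' --mca mtl '^ofi' ./my_app
```

When the Lmod hook is installed, the `export` lines can be dropped and the check run right after `module load OpenMPI`.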
GCC-12.2.0 and foss-2022b based modules cannot be loaded on zen4 architecture
The zen4 architecture was released in late 2022. As a result, the compilers and BLAS libraries that are part of the 2022b toolchain generation did not yet (fully) support this architecture. Concretely, it was found in this PR that unit tests in the OpenBLAS version that is part of the foss-2022b toolchain were failing. It was therefore decided not to support this toolchain generation at all on the zen4 architecture.
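
A job script can guard against this situation before trying to load a 2022b module. The sketch below is illustrative only: it assumes that the EESSI initialization sets `EESSI_SOFTWARE_SUBDIR` to a path ending in the CPU target (for example `x86_64/amd/zen4`), and the helper name `targets_zen4` is hypothetical.

```shell
# Hypothetical helper: succeeds if the given EESSI software
# subdirectory ends in the zen4 CPU target.
targets_zen4() {
    case "$1" in
        */zen4) return 0 ;;
        *)      return 1 ;;
    esac
}

# Assumption: EESSI_SOFTWARE_SUBDIR is set once EESSI is initialized,
# e.g. x86_64/amd/zen4. An empty value is treated as "not zen4".
if targets_zen4 "${EESSI_SOFTWARE_SUBDIR:-}"; then
    echo "zen4 detected: GCC-12.2.0 and foss-2022b based modules are not available" >&2
else
    echo "not zen4 (or EESSI not initialized): this known issue does not apply"
fi
```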