How to Test Your Workflow on the New RHEL8 Operating System
In advance of the March 2025 Quest downtime, during which Northwestern IT will upgrade Quest's operating system from Red Hat Enterprise Linux 7 (RHEL7) to Red Hat Enterprise Linux 8 (RHEL8), Quest users have the opportunity to test their software and research workflows on CPU and GPU compute nodes that are already running the new RHEL8 OS.
The RHEL8 Pilot Environment is available for use now.
What Is the RHEL8 Pilot?
The RHEL8 test environment consists of 72 Quest10 CPU compute nodes and 24 Quest13 NVIDIA H100 GPU compute nodes, all running RHEL8. The CPU compute nodes will be available for testing to users with either a Quest buy-in or General Access allocation. The GPU compute nodes will only be accessible to users with a General Access allocation.
What Am I Testing For?
Although the Research Computing and Data Services team has verified that much of the software on Quest works on CPU and GPU nodes running RHEL8, Quest hosts many software modules and users are free to install their own software, so we cannot guarantee that everything will run without issues. To that end, we invite researchers to provide feedback during the pilot by contacting the Research Computing and Data Services team at quest-help@northwestern.edu with the Slurm Job ID of any job that failed to run due to software or environment issues.
Please Note: If you are using the H100 GPUs, your application must have been compiled with a CUDA Toolkit version between 11.8 and 12.4. Instructions for installing popular GPU-accelerated Python libraries against a compatible CUDA Toolkit are available in our Using GPUs on Quest KB article.
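As a quick compatibility check before targeting the H100 nodes, you can confirm which CUDA Toolkit version your application was built against. The commands below are a minimal sketch: the first assumes a PyTorch-based workflow (substitute the equivalent check for your own framework), and the second assumes a CUDA Toolkit module is already loaded in your environment.
## report the CUDA Toolkit version PyTorch was built with (should fall between 11.8 and 12.4 for the H100 nodes)
python -c "import torch; print(torch.version.cuda)"
## report the CUDA compiler version, if you build your own CUDA code
nvcc --version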
How Do I Access It?
GPU Compute Nodes
The GPU compute nodes will only be accessible to users with a General Access allocation.
Users with General Access Allocations Only
Batch Jobs
To participate in the RHEL8 pilot and use the H100 GPU nodes, users of the General Access partition gengpu will need to update the GPU request in their Slurm submission script from
#SBATCH --gres=gpu:a100:1
to
#SBATCH --gres=gpu:h100:1
In addition, you will need to add the line #SBATCH --constraint=rhel8 to your script, or update an existing --constraint setting to rhel8. Please see the summary table below for examples of modifying your script for these partitions; a complete example script follows the table.
Partition: gengpu

Original Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:h100:1 ## modify this line
#SBATCH --constraint=rhel8 ## add this line
#SBATCH --time=HH:MM:SS
...

Partition: gengpu (with sxm)

Original Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH --constraint=sxm
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:h100:1 ## modify this line
#SBATCH --constraint=rhel8 ## modify this line
#SBATCH --time=HH:MM:SS
...

Partition: gengpu (with pcie)

Original Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH --constraint=pcie
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:h100:1 ## modify this line
#SBATCH --constraint=rhel8 ## modify this line
#SBATCH --time=HH:MM:SS
...
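For reference, a complete batch script targeting the H100 RHEL8 nodes might look like the sketch below. The allocation name, resource amounts, job name, and the final command are placeholders to adjust for your own workflow.
#!/bin/bash
#SBATCH --account=p12345 ## replace with your General Access allocation
#SBATCH --partition=gengpu
#SBATCH --gres=gpu:h100:1 ## request one H100 GPU
#SBATCH --constraint=rhel8 ## run on the RHEL8 pilot nodes
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=16G ## example memory request; adjust as needed
#SBATCH --time=01:00:00 ## example walltime; adjust as needed
#SBATCH --job-name=rhel8_gpu_test ## example job name

## load the modules your job needs, then run your application (placeholders below)
## module load <your modules>
python my_gpu_script.py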
Interactive Jobs
The following salloc and srun commands start an interactive session on one of the RHEL8 H100 GPU nodes in the gengpu partition. Replace p12345 with the name of your General Access allocation and update the other parameters (--time, --mem, --ntasks-per-node) as needed. In the salloc example, replace qgpu3001 with the name of the node allocated to you by salloc.
## srun example
srun --account=p12345 --partition=gengpu --gres=gpu:h100:1 --constraint=rhel8 --time=01:00:00 --mem=7G --nodes=1 --ntasks-per-node=1 --pty bash -l

## salloc example
salloc --account=p12345 --partition=gengpu --gres=gpu:h100:1 --constraint=rhel8 --time=00:30:00 --mem=7G --nodes=1 --ntasks-per-node=1
ssh qgpu3001
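Once the interactive session starts, you can confirm that you have landed on a RHEL8 node with an H100 GPU. The checks below are a minimal sketch using standard tools available on the GPU nodes.
## confirm the node is running RHEL8
cat /etc/redhat-release
## confirm an H100 GPU is visible to your session
nvidia-smi --query-gpu=name --format=csv,noheader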
Quest OnDemand
Similarly, users can access the RHEL8 GPU nodes through a Quest OnDemand session. When launching the Quest OnDemand application, update the "GPU" parameter to an H100 option and set the "Constraint" parameter to the "rhel8" option.
CPU Compute Nodes
The CPU compute nodes will be available for testing for users with either a Quest buy-in or General Access allocation.
Users with General Access Allocations
Batch Jobs
To participate in the RHEL8 pilot, users of the General Access partitions short, normal, and long will need to add the following line to their existing Slurm submission script:
#SBATCH --constraint=rhel8
If your submission script already contains a --constraint setting, update it to the line above, since --constraint cannot be set more than once in a submission script. Please see the summary table below for examples of modifying your script for these partitions; a complete example script follows the table.
Partition: short

Original Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=short
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=short
#SBATCH --constraint=rhel8
#SBATCH --time=HH:MM:SS
...

Partition: normal

Original Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=normal
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=normal
#SBATCH --constraint=rhel8
#SBATCH --time=HH:MM:SS
...

Partition: long

Original Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=long
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=p12345
#SBATCH --partition=long
#SBATCH --constraint=rhel8
#SBATCH --time=HH:MM:SS
...
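For reference, a complete batch script targeting the RHEL8 CPU nodes might look like the sketch below. The allocation name, partition choice, resource amounts, job name, and the final command are placeholders to adjust for your own workflow.
#!/bin/bash
#SBATCH --account=p12345 ## replace with your General Access allocation
#SBATCH --partition=short ## or normal/long, depending on your walltime
#SBATCH --constraint=rhel8 ## run on the RHEL8 pilot nodes
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=7G ## example memory request; adjust as needed
#SBATCH --time=01:00:00 ## example walltime; adjust as needed
#SBATCH --job-name=rhel8_cpu_test ## example job name

## load the modules your job needs, then run your application (placeholders below)
## module load <your modules>
python my_script.py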
Interactive Jobs
The following salloc and srun commands start an interactive session on one of the RHEL8 nodes in the short, normal, or long partition. Replace p12345 with the name of your General Access allocation and update the other parameters (--time, --mem, --ntasks-per-node, --partition) as needed. In the salloc example, replace qnode0437 with the name of the node allocated to you by salloc.
## srun example
srun --account=p12345 --partition=short --constraint=rhel8 --time=01:00:00 --mem=7G --nodes=1 --ntasks-per-node=1 --pty bash -l

## salloc example
salloc --account=p12345 --partition=short --constraint=rhel8 --time=01:00:00 --mem=7G --nodes=1 --ntasks-per-node=1
ssh qnode0437
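If you would like to see which nodes in a partition are part of the RHEL8 pilot before requesting a session, you can list each node's feature tags with sinfo. This is a minimal sketch; the exact feature strings displayed depend on the node configuration.
## list node names and their feature tags for the short partition; pilot nodes advertise the rhel8 feature
sinfo -p short -o "%n %f"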
Quest OnDemand
To access the RHEL8 CPU nodes via a Quest OnDemand session, be sure to update the "Constraint" parameter to "rhel8" when launching your OnDemand application.
Users with Buy-In
To participate in the RHEL8 pilot, users in buy-in allocations with access to compute resources will need to modify the partition setting in their submission script from either
#SBATCH --partition=b12345
or
#SBATCH --partition=buyin
to
#SBATCH --partition=buyin-dev
Please see the summary table below for examples of modifying your script for these partitions; a complete example script follows the table.
Partition: b12345

Original Submission Script:
#SBATCH --account=b12345
#SBATCH --partition=b12345
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=b12345
#SBATCH --partition=buyin-dev
#SBATCH --time=HH:MM:SS
...

Partition: buyin

Original Submission Script:
#SBATCH --account=b12345
#SBATCH --partition=buyin
#SBATCH --time=HH:MM:SS
...

Modified Submission Script:
#SBATCH --account=b12345
#SBATCH --partition=buyin-dev
#SBATCH --time=HH:MM:SS
...
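For reference, a complete buy-in batch script for the pilot might look like the sketch below. As in the table above, the buyin-dev partition is selected directly and no rhel8 constraint is added; the allocation name, resource amounts, job name, and the final command are placeholders to adjust for your own workflow.
#!/bin/bash
#SBATCH --account=b12345 ## replace with your buy-in allocation
#SBATCH --partition=buyin-dev ## RHEL8 pilot partition for buy-in users
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=7G ## example memory request; adjust as needed
#SBATCH --time=01:00:00 ## example walltime; adjust as needed
#SBATCH --job-name=rhel8_buyin_test ## example job name

## load the modules your job needs, then run your application (placeholders below)
## module load <your modules>
python my_script.py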
Quest Analytics Nodes
The Quest Analytics Node environment is not part of the initial test cluster.
However, a RHEL8 test Analytics environment will be available for Quest user testing before the March 2025 Quest Downtime.
Additional information regarding testing on Quest Analytics Nodes will be provided at a later time.
Submit Feedback
We invite researchers to provide us with feedback during the pilot by contacting the Research Computing and Data Services team at quest-help@northwestern.edu with the Slurm Job ID of any jobs that failed to run due to software or environmental issues.
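If you are unsure of the Job ID of a failed job, sacct can list your recent jobs and their final states. The command below is a minimal sketch; adjust the start date to cover the period you are interested in.
## list your jobs since the given date with Job ID, name, partition, state, and exit code
sacct -X --starttime=2025-01-01 --format=JobID,JobName,Partition,State,ExitCode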