GPUs on QUEST

Quest RHEL8 Pilot Environment - November 18.

Starting November 18, all Quest users are invited to test and run their workflows in a RHEL8 pilot environment to prepare for Quest moving completely to RHEL8 in March 2025. We invite researchers to provide us with feedback during the pilot by contacting the Research Computing and Data Services team at quest-help@northwestern.edu. The pilot environment will consist of 24 H100 GPU nodes and 72 CPU nodes, and it will expand with additional nodes through March 2025. Details on how to access this pilot environment will be published in a KB article on November 18.

What GPUs are available on QUEST?

There are 58 GPU nodes available to the Quest General Access allocations. These nodes have driver version 550.127.05, which is compatible with CUDA 12.4 or earlier:

  • 16 nodes which each have 2 x 40GB Tesla A100 PCIe GPU cards, 52 CPU cores, and 192 GB of CPU RAM.
  • 18 nodes which each have 4 x 80GB Tesla A100 SXM GPU cards, 52 CPU cores, and 512 GB of CPU RAM.
  • 24 nodes which each have 4 x 80GB Tesla H100 SXM GPU cards, 64 CPU cores, and 1 TB of CPU RAM.

There are 4 GPU nodes in the Genomics Compute Cluster (b1042). These nodes have driver version 525.105.17 which is compatible with CUDA 12.0 or earlier:

  • 2 nodes which each have 4 x 40GB Tesla A100 PCIe GPU cards, 52 CPU cores, and 192 GB of CPU RAM.
  • 2 nodes which each have 4 x 80GB Tesla A100 PCIe GPU cards, 64 CPU cores, and 512 GB of CPU RAM.
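
If you'd like to confirm these configurations yourself, two standard tools can help: from a login node, sinfo can list the GPU nodes in a partition along with their GPU (GRES), CPU, and memory configuration, and from within a job on a GPU node, nvidia-smi reports the GPU model, driver version, and GPU memory. These are generic Slurm and NVIDIA commands shown as a sketch (the gengpu partition is described in the next section, and the exact output format may differ):

$ sinfo -p gengpu -N -o "%N %G %c %m"
$ nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv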

 


Using General Access GPUs

The maximum run time for a job on these nodes is 48 hours. To submit jobs to the general access GPU nodes, set gengpu as the partition and specify the number of GPUs in your job submission command or script. You can also identify the type of GPU you want in your job submission. For instance, to request one A100 GPU, add the following lines to your job submission script:

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

Note that the memory you request here is CPU memory. You are automatically given access to the entire memory of the GPU, but you will also need CPU memory because you will be copying data between CPU memory and GPU memory.

To schedule another type of GPU, e.g. an H100, change the a100 designation to the other GPU type, e.g. h100, as shown below.
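
For instance, this is the only line that changes from the A100 example above to request one H100 instead (a minimal sketch; the rest of the script stays the same):

#SBATCH --gres=gpu:h100:1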

Specifying GPU Interconnect Types

There are two flavors of A100 GPUs on Quest: PCIe and SXM. With the submission script in the block above, there is no way to know which type of A100 GPU Slurm will assign the job to. However, you can specify which type of A100 node you'd prefer using the --constraint flag. The choices are pcie for the 40GB A100s or sxm for the 80GB A100s.

Whether you want your job to land on a PCIe or an SXM A100 largely depends on the kind of job you are running, in particular whether it uses a single GPU card or multiple GPU cards.

Considerations for Using a Single GPU Card

If you only need one GPU card, consider how much GPU memory you will need on that card. If you need less than 40GB, request a PCIe A100; if you need more than 40GB on a single GPU card, request an SXM A100.

Considerations for Using Multiple GPU Cards

If you know that you want to use multiple GPU cards for your job, keep in mind that sharing data between two, three, or four SXM cards will be much faster than sharing data between the two cards on a PCIe node (see the multi-GPU example below).

The following example submission script would request one 80GB SXM A100 card.

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH --constraint=sxm
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

Replace sxm with pcie and you'd receive a 40GB A100. 

If you don't specify any constraint, you will be assigned either type of A100 at random. In either case you are automatically given 100% of the memory on the GPU, whether 40GB or 80GB. GPU memory is treated separately from the system memory you request with the --mem flag; they are not the same thing.
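
If you do want multiple cards connected by the faster SXM interconnect, the request looks like the single-card example above with the GPU count increased. The following is a minimal sketch requesting two 80GB SXM A100s on one node; adjust the allocation, task count, time, and memory for your own workload:

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:2
#SBATCH --constraint=sxm
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG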

Using Genomics Compute Cluster GPUs

The maximum run time is 48 hours for a job on these nodes. Feinberg members of the Genomics Compute Cluster should use the partition genomics-gpu, while non-Feinberg members should use genomicsguest-gpu. To submit a job to these GPUs, include the appropriate partition name and specify the type and number of GPUs:

 

#SBATCH -A b1042
#SBATCH -p genomics-gpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

As with the general access GPUs, the memory you request here is CPU memory; you are automatically given access to the entire memory of the GPU.

Interactive GPU jobs

If you want to start an interactive session on a GPU instead of submitting a batch job, you can use a command similar to one of the examples below; both request an A100.

srun will start a session on the node immediately after the job has been scheduled.

$ srun -A pXXXXX -p gengpu --mem=XX --gres=gpu:a100:1 -N 1 -n 1 -t 1:00:00 --pty bash -l

salloc will allocate the resources, after which you will have to SSH to the GPU node (see the sketch after the command below).

$ salloc -A pXXXXX -p gengpu --mem=XX --gres=gpu:a100:1 -N 1 -n 1 -t 1:00:00
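
Once salloc grants the allocation, you can look up the name of the GPU node assigned to your job with squeue and then SSH to it. This is a sketch using standard Slurm commands; replace <node_name> with the node shown in the squeue output:

$ squeue -u $USER -o "%i %P %N"
$ ssh <node_name>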

Install Popular GPU Accelerated Python Software with Anaconda Virtual Environments

PyTorch

All instructions use the software management utility Mamba. Please see Using Python on QUEST for more information on Mamba virtual environments.

These install instructions should work for PyTorch v2.4.0 through v2.5.1. They may also work for newer releases of PyTorch, but that will depend on the version of CUDA that was used to compile the conda package. If that CUDA version is older than 11.8 or newer than 12.4, it will not work with the Quest GPUs. You can search all versions of PyTorch that have been compiled with CUDA toolkit 12.4 with the following command.

Check Available Versions

$ module purge
$ module load mamba/24.3.0
$ mamba search 'pytorch::pytorch[subdir=linux-64,build=*cuda12.4*]'

Installation

Please run the following commands to create the environment.

module purge
module load mamba/24.3.0
CONDA_OVERRIDE_CUDA="12.4" mamba create --prefix ./pytorch-cuda-12-4 -c nvidia -c pytorch 'pytorch[build=*cuda12.4*]'
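
Once the environment is created, you would activate it before using PyTorch. This is a minimal sketch following the same source activate pattern used in the multi-node batch script later on this page; the prefix path is the one chosen in the create command above:

module purge
module load mamba/24.3.0
source activate ./pytorch-cuda-12-4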

Testing

This Python script verifies that the PyTorch installation can see and use the GPU devices on Quest. It must be run on a GPU node, either through a batch job or an interactive job.

Python Script

test_gpu.py

import torch

# Confirm that PyTorch can see a CUDA-capable GPU
print(torch.cuda.is_available())
# Report how many GPU devices are visible to this job
print(torch.cuda.device_count())
# Report the name of the first GPU device
print(torch.cuda.get_device_name(0))

Run Python Script

$ python3 test_gpu.py
True
4
NVIDIA H100 PCIe
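
To run the same test through a batch job instead, you can combine the general access GPU request from earlier on this page with the environment created above. This is a sketch; the allocation ID, memory, time, and environment path are placeholders you should adjust:

#!/bin/bash
#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 0:10:00
#SBATCH --mem=8G

module purge
module load mamba/24.3.0
source activate ./pytorch-cuda-12-4
python3 test_gpu.py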

PyTorch on Multiple Nodes with Multiple GPUs

The batch submission script below shows how to use multiple nodes with multiple GPUs for your jobs. Please check out the GitHub page below, which provides the Python code for setting this up; you can modify the Python code to suit your needs.

https://github.com/nuitrcs/examplejobs/tree/master/python/pytorch_ddp

#!/bin/bash
#SBATCH --account=pXXXX
#SBATCH --partition=gengpu
#SBATCH --time=04:00:00
#SBATCH --job-name=multinode-example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:a100:2
#SBATCH --mem=20G
#SBATCH --cpus-per-task=4

module purge
module load mamba/24.3.0
# Activate the PyTorch environment created in the installation section (adjust the path to wherever you created it)
source activate ../pytorch-cuda-12-4/

export LOGLEVEL=INFO

srun torchrun \
    --nnodes 2 \
    --nproc_per_node 2 \
    --rdzv_id $RANDOM \
    --rdzv_backend c10d \
    --rdzv_endpoint "$SLURMD_NODENAME:29500" \
    ./multinode_torchrun.py 10000 100
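
To use this workflow, save the batch script above (for example as multinode_job.sh, a name chosen here for illustration) alongside multinode_torchrun.py from the repository, submit it with sbatch, and monitor it with squeue:

$ sbatch multinode_job.sh
$ squeue -u $USER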

