GPUs on QUEST

Quest RHEL8 Pilot Environment - November 18.

Starting November 18, all Quest users are invited to test and run their workflows in a RHEL8 pilot environment to prepare for Quest moving completely to RHEL8 in March 2025. We invite researchers to provide us with feedback during the pilot by contacting the Research Computing and Data Services team at quest-help@northwestern.edu. The pilot environment will consist of 24 H100 GPU nodes and 72 CPU nodes, and it will expand with additional nodes through March 2025. Details on how to access this pilot environment will be published in a KB article on November 18.

What GPUs are available on QUEST?

There are 58 GPU nodes available to the Quest General Access allocations. These nodes have driver version 550.127.05, which is compatible with CUDA 12.4 or earlier:

  • 16 nodes which each have 2 x 40GB Tesla A100 PCIe GPU cards, 52 CPU cores, and 192 GB of CPU RAM.
  • 18 nodes which each have 4 x 80GB Tesla A100 SXM GPU cards, 52 CPU cores, and 512 GB of CPU RAM.
  • 24 nodes which each have 4 x 80GB Tesla H100 SXM GPU cards, 64 CPU cores, and 1 TB of CPU RAM.

There are 4 GPU nodes in the Genomics Compute Cluster (b1042). These nodes have driver version 525.105.17 which is compatible with CUDA 12.0 or earlier:

  • 2 nodes which each have 4 x 40GB Tesla A100 PCIe GPU cards, 52 CPU cores, and 192 GB of CPU RAM.
  • 2 nodes which each have 4 x 80GB Tesla A100 PCIe GPU cards, 64 CPU cores, and 512 GB of CPU RAM.
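
If you'd like to confirm these configurations yourself, two standard tools can help: from a login node, sinfo can list the GPU nodes in a partition along with their GPU (GRES), CPU, and memory configuration, and from within a job on a GPU node, nvidia-smi reports the GPU model, driver version, and GPU memory. These are generic Slurm and NVIDIA commands shown as a sketch (the gengpu partition is described in the next section, and the exact output format may differ):

$ sinfo -p gengpu -N -o "%N %G %c %m"
$ nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv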

 


Using General Access GPUs

The maximum run time for a job on these nodes is 48 hours. To submit jobs to the general access GPU nodes, set gengpu as the partition and specify the number of GPUs in your job submission command or script. You can also identify the type of GPU you want in your job submission. For instance, to request one A100 GPU, add the following lines to your job submission script:

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

Note that the memory you request here is CPU memory. You are automatically given access to the entire memory of the GPU, but you will also need CPU memory because you will be copying data between CPU memory and GPU memory.

To schedule another type of GPU, e.g. an H100, change the a100 designation to the other GPU type, e.g. h100, as shown below.
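
For instance, this is the only line that changes from the A100 example above to request one H100 instead (a minimal sketch; the rest of the script stays the same):

#SBATCH --gres=gpu:h100:1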

Specifying GPU Interconnect Types

There are two flavors of A100 GPUs on Quest: PCIe and SXM. With the submission script in the block above, there is no way to know which type of A100 GPU Slurm will assign the job to. However, you can specify which type of A100 node you'd prefer using the --constraint flag. The choices are pcie for the 40GB A100s or sxm for the 80GB A100s.

Whether you want your job to land on a PCIe or an SXM A100 largely depends on the kind of job you are running, in particular whether it uses a single GPU card or multiple GPU cards.

Considerations for Using a Single GPU Card

If you only need one GPU card, consider how much GPU memory you will need on that card. If you need less than 40GB, request a PCIe A100; if you need more than 40GB on a single GPU card, request an SXM A100.

Considerations for Using Multiple GPU Cards

If you know that you want to use multiple GPU cards for your job, keep in mind that sharing data between two, three, or four SXM cards will be much faster than sharing data between the two cards on a PCIe node (see the multi-GPU example below).

The following example submission script would request one 80GB SXM A100 card.

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH --constraint=sxm
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

Replace sxm with pcie and you'd receive a 40GB A100. 

If you don't specify any constraint, you will be assigned either type of A100 at random. In either case you are automatically given 100% of the memory on the GPU, whether 40GB or 80GB. GPU memory is treated separately from the system memory you request with the --mem flag; they are not the same thing.
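
If you do want multiple cards connected by the faster SXM interconnect, the request looks like the single-card example above with the GPU count increased. The following is a minimal sketch requesting two 80GB SXM A100s on one node; adjust the allocation, task count, time, and memory for your own workload:

#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:2
#SBATCH --constraint=sxm
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG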

Using Genomics Compute Cluster GPUs

The maximum run time is 48 hours for a job on these nodes. Feinberg members of the Genomics Compute Cluster should use the partition genomics-gpu, while non-Feinberg members should use genomicsguest-gpu. To submit a job to these GPUs, include the appropriate partition name and specify the type and number of GPUs:

 

#SBATCH -A b1042
#SBATCH -p genomics-gpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 1:00:00
#SBATCH --mem=XXG

As with the general access GPUs, the memory you request here is CPU memory; you are automatically given access to the entire memory of the GPU.

Interactive GPU jobs

If you want to start an interactive session on a GPU instead of submitting a batch job, you can use a command similar to one of the examples below; both request an A100.

srun will start a session on the node immediately after the job has been scheduled.

$ srun -A pXXXXX -p gengpu --mem=XX --gres=gpu:a100:1 -N 1 -n 1 -t 1:00:00 --pty bash -l

salloc will allocate the resources, after which you will have to SSH to the GPU node (see the sketch after the command below).

$ salloc -A pXXXXX -p gengpu --mem=XX --gres=gpu:a100:1 -N 1 -n 1 -t 1:00:00
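
Once salloc grants the allocation, you can look up the name of the GPU node assigned to your job with squeue and then SSH to it. This is a sketch using standard Slurm commands; replace <node_name> with the node shown in the squeue output:

$ squeue -u $USER -o "%i %P %N"
$ ssh <node_name>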

Install Popular GPU Accelerated Python Software with Anaconda Virtual Environments

PyTorch

All instructions use the software management utility Mamba. Please see Using Python on QUEST for more information on Mamba virtual environments.

These install instructions should work for PyTorch v2.4.0 through v2.5.1. They may also work for newer releases of PyTorch, but that will depend on the version of CUDA that was used to compile the conda package. If that CUDA version is older than 11.8 or newer than 12.4, it will not work with the Quest GPUs. You can search all versions of PyTorch that have been compiled with CUDA toolkit 12.4 with the following command.

Check Available Versions

$ module purge
$ module load mamba/24.3.0
$ mamba search 'pytorch::pytorch[subdir=linux-64,build=*cuda12.4*]'

Installation

Please run the following commands to create the environment.

module purge
module load mamba/24.3.0
CONDA_OVERRIDE_CUDA="12.4" mamba create --prefix ./pytorch-cuda-12-4 -c nvidia -c pytorch 'pytorch[build=*cuda12.4*]'
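
Once the environment is created, you would activate it before using PyTorch. This is a minimal sketch following the same source activate pattern used in the multi-node batch script later on this page; the prefix path is the one chosen in the create command above:

module purge
module load mamba/24.3.0
source activate ./pytorch-cuda-12-4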

Testing

This Python script verifies that the PyTorch installation can see and use the GPU devices on Quest. It must be run on a GPU node, either through a batch job or an interactive job.

Python Script

test_gpu.py

import torch

# Confirm that PyTorch can see a CUDA-capable GPU
print(torch.cuda.is_available())
# Report how many GPU devices are visible to this job
print(torch.cuda.device_count())
# Report the name of the first GPU device
print(torch.cuda.get_device_name(0))

Run Python Script

$ python3 test_gpu.py
True
4
NVIDIA H100 PCIe
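
To run the same test through a batch job instead, you can combine the general access GPU request from earlier on this page with the environment created above. This is a sketch; the allocation ID, memory, time, and environment path are placeholders you should adjust:

#!/bin/bash
#SBATCH -A <allocationID>
#SBATCH -p gengpu
#SBATCH --gres=gpu:a100:1
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -t 0:10:00
#SBATCH --mem=8G

module purge
module load mamba/24.3.0
source activate ./pytorch-cuda-12-4
python3 test_gpu.py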

PyTorch on Multiple Nodes with Multiple GPUs

The batch submission script below shows how to use multiple nodes with multiple GPUs for your jobs. Please check out the GitHub page below, which provides the Python code for setting this up; you can modify the Python code to suit your needs.

https://github.com/nuitrcs/examplejobs/tree/master/python/pytorch_ddp

#!/bin/bash
#SBATCH --account=pXXXX
#SBATCH --partition=gengpu
#SBATCH --time=04:00:00
#SBATCH --job-name=multinode-example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:a100:2
#SBATCH --mem=20G
#SBATCH --cpus-per-task=4

module purge
module load mamba/24.3.0
# Activate the PyTorch environment created in the installation section (adjust the path to wherever you created it)
source activate ../pytorch-cuda-12-4/

export LOGLEVEL=INFO

srun torchrun \
    --nnodes 2 \
    --nproc_per_node 2 \
    --rdzv_id $RANDOM \
    --rdzv_backend c10d \
    --rdzv_endpoint "$SLURMD_NODENAME:29500" \
    ./multinode_torchrun.py 10000 100
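
To use this workflow, save the batch script above (for example as multinode_job.sh, a name chosen here for illustration) alongside multinode_torchrun.py from the repository, submit it with sbatch, and monitor it with squeue:

$ sbatch multinode_job.sh
$ squeue -u $USER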

