Quest Partitions/Queues


Quest and Kellogg Linux Cluster Downtime, December 14 - 18.

Quest, including the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand, will be unavailable for scheduled maintenance starting at 8 A.M. on Saturday, December 14, and ending at approximately 5 P.M. on Wednesday, December 18. During the maintenance window, you will not be able to log in to Quest, the Quest Analytics Nodes, the GCC, KLC, or Quest OnDemand; submit new jobs; run jobs; or access files stored on Quest in any way, including through Globus. For details on this maintenance, please see the Status of University IT Services page.

Quest RHEL8 Pilot Environment - November 18.

Starting November 18, all Quest users are invited to test and run their workflows in a RHEL8 pilot environment to prepare for Quest's complete move to RHEL8 in March 2025. We invite researchers to provide feedback during the pilot by contacting the Research Computing and Data Services team at quest-help@northwestern.edu. The pilot environment will initially consist of 24 H100 GPU nodes and 72 CPU nodes, and it will expand with additional nodes through March 2025. Details on how to access the pilot environment will be published in a KB article on November 18.

Overview

Quest offers several partitions, or queues, where you can run your jobs. Select the most appropriate partition based on the duration of your job, the number of cores it requires, and your type of access to Quest. A partition must be specified when you submit your job; otherwise the scheduler will return the error "sbatch: error: Batch job submission failed: No partition specified or system default partition". Users with full access to Quest or the Genomics Compute Cluster must specify the appropriate partition for those resources. To specify the partition, include the -p option in your job submission script:

#SBATCH -p <PartitionName>
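
For example, a complete job submission script for the normal partition might look like the following sketch. The allocation ID p12345, resource requests, and program name are illustrative; substitute your own values:

#!/bin/bash
#SBATCH -A p12345           # your allocation ID (illustrative)
#SBATCH -p normal           # partition name
#SBATCH -t 24:00:00         # requested walltime; must not exceed the partition maximum
#SBATCH -N 1                # number of nodes
#SBATCH -n 4                # number of cores
#SBATCH --mem=8G            # memory per node

module load python          # load any software modules your job needs (illustrative)
python my_script.py         # your program (illustrative)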

Partition Definitions: General Access ("p" and "e" accounts)

Partition   Maximum Walltime   Notes
short       04:00:00           Short jobs have access to more cores on Quest than longer jobs do and are usually scheduled sooner.
normal      48:00:00           Normal jobs may run for up to 2 days.
long        168:00:00          Long jobs may run for up to 7 days.
gengpu      48:00:00           Use this partition only if your job requires GPUs.
genhimem    48:00:00           Use this partition only if your job requires more than 243 GB of memory per node.
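
To check the current state of these partitions, you can use the standard Slurm sinfo command, for example:

sinfo -p short,normal,long,gengpu,genhimem

The output lists each partition's time limit, node counts, and node states.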

Partition Definitions: Full Access (buy-ins, or "b" accounts)

Partition: Allocation name (e.g., "b1234")
Maximum Walltime: Allocation-specific
Notes: Using the allocation name as the partition name is available only to users with full access to Quest. The resources available and any limits on jobs are governed by the policies of the specific full-access allocation.
Example: #SBATCH -p b1234

Partition: buyin
Maximum Walltime: Allocation-specific
Notes: When using the buyin partition, you must also specify the appropriate buy-in allocation ID in your job submission script, using the -A flag. Using the buyin partition is equivalent to using your allocation name as the partition name.
Example: #SBATCH -p buyin

If your allocation has specific partition names, such as genomics, ciera-std, or grail-std, use those partition names instead of your allocation name or the buyin partition.
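
For allocations without their own partition names, a buy-in job can be submitted with either of the following equivalent sets of directives; the allocation ID b1234 is illustrative:

#SBATCH -A b1234            # buy-in allocation ID (illustrative)
#SBATCH -p b1234            # allocation name used as the partition name

or, equivalently:

#SBATCH -A b1234            # buy-in allocation ID (illustrative)
#SBATCH -p buyin            # buyin partition; routes the job to your allocation's nodes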

Notes

Additional specialized partitions exist for specific allocations. You may be instructed to use a partition name that isn't listed above.

Note that jobs that have not finished on their own by the end of the requested walltime are terminated by the scheduler.
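
Request walltime with the -t (or --time) directive and keep it within the partition maximum, for example:

#SBATCH -p long
#SBATCH -t 168:00:00        # 7 days, the maximum walltime for the long partition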

When resources for a job are not immediately available, the job will be assigned a pending (PD) status while the scheduler waits for resources to become available. There is a hard limit of 5,000 total submitted jobs per user at one time.
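
You can check whether your jobs are running or pending with the standard Slurm squeue command; the ST column shows R for running and PD for pending:

squeue -u $USER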

General access allocations have access to compute nodes with GPUs under the gengpu partition. To request one A100 GPU, you should set gengpu as the partition and add the following line to your job submission script: #SBATCH --gres=gpu:a100:1. Please see GPUs on QUEST for more information about the GPUs.
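
Putting this together, a minimal GPU job script might look like the following sketch; the allocation ID, walltime, and program are illustrative:

#!/bin/bash
#SBATCH -A p12345           # your allocation ID (illustrative)
#SBATCH -p gengpu           # general-access GPU partition
#SBATCH --gres=gpu:a100:1   # request one A100 GPU
#SBATCH -t 12:00:00         # walltime; the gengpu maximum is 48:00:00
#SBATCH -N 1
#SBATCH --mem=16G

module load cuda            # illustrative; load whatever software your code requires
python train.py             # illustrative GPU program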

If you need to run jobs longer than one week, contact Research Computing for a consultation. Some special accommodations can be made for jobs requiring the resources of up to a single node for a month or less.
