Examples of Jobs on Quest

Quest and Kellogg Linux Cluster Downtime, December 14 - 18.

Quest, including the Quest Analytics Nodes, the Genomics Compute Cluster (GCC), the Kellogg Linux Cluster (KLC), and Quest OnDemand, will be unavailable for scheduled maintenance starting at 8 A.M. on Saturday, December 14, and ending approximately at 5 P.M. on Wednesday, December 18. During the maintenance window, you will not be able to login to Quest, Quest Analytics Nodes, the GCC, KLC, or Quest OnDemand submit new jobs, run jobs, or access files stored on Quest in any way including Globus. For details on this maintenance, please see the Status of University IT Services page.

Quest RHEL8 Pilot Environment - November 18.

Starting November 18, all Quest users are invited to test and run their workflows in a RHEL8 pilot environment to prepare for Quest moving completely to RHEL8 in March 2025. We invite researchers to provide us with feedback during the pilot by contacting the Research Computing and Data Services team at quest-help@northwestern.edu. The pilot environment will consist of 24 H100 GPU nodes and seventy-two CPU nodes, and it will expand with additional nodes through March 2025. Details on how to access this pilot environment will be published in a KB article on November 18.

Additional details and examples of Quest job submission scripts and commands.

Interactive Jobs

Batch Jobs

Job Array

Dependent Jobs

See Research Computing's GitHub repository of example jobs for language/software specific examples and scripts you can copy and modify.

Interactive Jobs

Example: Interactive Job with More than One Node

If you submit an interactive job requiring more than one node, you will land on the Head Node for your job once the scheduler finds the resources. For more information see Submitting an Interactive Job.

[abc123@quser22 ~]$ srun -A p12345 -p normal -N 2 --ntasks-per-node=10 --mem-per-cpu=1G --time=01:00:00 --pty bash -l
----------------------------------------
srun job start: Thu Feb 28 16:31:27 CST 2019
Job ID: 557781
Username: abc123
Queue: normal
Account: p12345
----------------------------------------
The following variables are not
guaranteed to be the same in
prologue and the job run script
----------------------------------------
PATH (in prologue) : /hpc/slurm/usertools:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lpp/mmfs/bin:/opt/ibutils/bin:/home/abc123/bin
WORKDIR is: /home/abc123
----------------------------------------
[abc123@qnode4217 ~]$

This interactive job requests 2 nodes and 10 cores in each of those nodes. In total you will have access to 20 cores. Since the job reserved 1 GB per core, 20 GB of RAM is allocated in total (i.e. 10 GB in each node).

To find the name of the other compute node(s) that you have been assigned as part of your job, you can use the squeue command from the same terminal session. Look under NODELIST in the squeue output:

[abc123@qnode4217 ~]$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
557781    normal     bash  abc123   R       5:51      2 qnode[4217-4218]

You can then ssh to the other node(s) if needed as long as your job is active.

ssh qnode4218

Batch Jobs

Job Array

Job arrays can be used to submit multiple jobs at once that use the same application script. This can be useful if you want to run the same script multiple times with different input parameters.

In the example below, the --array option defines the job array, with a specification of the index numbers you want to use (in this case, 0 through 9). The $SLURM_ARRAY_TASK_ID bash environmental variable takes on the value of the job array index for each job (so here, integer values 0 through 9, one value for each job). In this example, the value of $SLURM_ARRAY_TASK_ID is used to select the correct index from the input_args bash array which was constructed by reading in input_args.txt, each row of which is then passed on to a script as command line arguments.

jobsubmission.sh
#!/bin/bash
#SBATCH --account=w10001  ## YOUR ACCOUNT pXXXX or bXXXX
#SBATCH --partition=w10001  ### PARTITION (buyin, short, normal, w10001, etc)
#SBATCH --array=0-9 ## number of jobs to run "in parallel" 
#SBATCH --nodes=1 ## how many computers do you need
#SBATCH --ntasks-per-node=1 ## how many cpus or processors do you need on each computer
#SBATCH --time=00:10:00 ## how long does this need to run (remember different partitions have restrictions on this param)
#SBATCH --mem-per-cpu=1G ## how much RAM do you need per CPU (this effects your FairShare score so be careful to not ask for more than you need))
#SBATCH --job-name="sample_job_\${SLURM_ARRAY_TASK_ID}" ## use the task id in the name of the job
#SBATCH --output=sample_job.%A_%a.out ## use the jobid (A) and the specific job index (a) to name your log file
#SBATCH --mail-type=ALL ## you can receive e-mail alerts from SLURM when your job begins and when your job finishes (completed, failed, etc)
#SBATCH --mail-user=email@u.northwestern.edu  ## your email

module purge all
module load python-anaconda3
source activate /projects/intro/envs/slurm-py37-test

IFS=$'\n' read -d '' -r -a input_args < input_args.txt

python slurm_test.py --filename ${input_args[$SLURM_ARRAY_TASK_ID]}

where input_args.txt contains the following:

input_args.txt
filename1.txt
filename2.txt
filename3.txt
filename4.txt
filename5.txt
filename6.txt
filename7.txt
filename8.txt
filename9.txt
filename10.txt

and myscript.py contains the following code:

myscript.py
import argparse
import time


def parse_commandline():
    """Parse the arguments given on the command-line.
    """
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--filename",
                       help="Name of file",
                       default=None)


    args = parser.parse_args()

    return args


###############################################################################
# BEGIN MAIN FUNCTION
###############################################################################
if __name__ == '__main__':
    args = parse_commandline()
    #time.sleep(10) # Sleep for 3 seconds
    print(args.filename)

In this example, myscript.py will receive the values in input.csv as arguments: the first field will be sys.argv[1], the second field will be sys.argv[2], etc.

Note: make sure the number you specify for the --array parameter matches the number of lines in your input file!

Also, note that in this example standard output and error files are printed separately for each element of the job array with the --output and --error options. To avoid each element overwriting these files, tag them with jobID (%A) and elementID (%a) variables (which are automatically assigned by the scheduler) so elements have their own distinct output and error files.

Submit this script with:

sbatch jobsubmission.sh

The job array will then be submitted to the scheduler.

Dependent Jobs

Dependent jobs are a series of jobs which run or wait to run conditional on the state of another job. For instance, you may submit two jobs and you want the first job to complete successfully before the second job runs. In order to submit this type of workflow, you pass sbatch the jobid of the job that needs to finish before this job starts via the command line argument:

--dependency=afterok:<jobid>

To accomplish this, it is helpful to write all of your sbatch commands in bash script. You will notice that anything you can tell slurm via #SBATCH in the submission script itself, you can also pass to sbatch via the command line. The key here is that the bash variable jid0, jid1, jid2 will contain the jobid that SLURM assigns after you run the sbatchcommand.

wrapper_script.sh
#!/bin/bash

jid0=($(sbatch --time=00:10:00 --account=w10001 --partition=w10001 --nodes=1 --ntasks-per-node=1 --mem=8G --job-name=example --output=job_%A.out example_submit.sh))

echo "jid0 ${jid0[-1]}" >> slurm_ids

jid1=($(sbatch --dependency=afterok:${jid0[-1]} --time=00:10:00 --account=w10001 --partition=w10001 --nodes=1 --ntasks-per-node=1 --mem=8G --job-name=example --output=job_%A.out --export=DEPENDENTJOB=${jid0[-1]} example_submit.sh))

echo "jid1 ${jid1[-1]}" >> slurm_ids

jid2=($(sbatch --dependency=afterok:${jid1[-1]} --time=00:10:00 --account=w10001 --partition=w10001 --nodes=1 --ntasks-per-node=1 --mem=8G --job-name=example --output=job_%A.out --export=DEPENDENTJOB=${jid1[-1]} example_submit.sh))

echo "jid2 ${jid2[-1]}" >> slurm_ids

In the above, the second job will not start until the first job is finished and the third job will not start until the second one is finished. The actual submission script that is being run is below.

example_submit.sh
#!/bin/bash
#SBATCH --mail-type=ALL ## you can receive e-mail alerts from SLURM when your job begins and when your job finishes (completed, failed, etc)
#SBATCH --mail-user=email@u.northwestern.edu ## your email

if [[ -z "${DEPENDENTJOB}" ]]; then
    echo "First job in workflow"
else
    echo "Job started after " $DEPENDENTJOB
fi

module purge all
module load python-anaconda3
source activate /projects/intro/envs/slurm-py37-test

python --version
python myscript.py --job-id $DEPENDENTJOB

where myscript.py contains the following code:

myscript.py
import argparse
import time


def parse_commandline():
    """Parse the arguments given on the command-line.
    """
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--job-id",
                       help="Job number",
                       default=0)

    args = parser.parse_args()

    return args


###############################################################################
# BEGIN MAIN FUNCTION
###############################################################################
if __name__ == '__main__':
    args = parse_commandline()
    time.sleep(3) # Sleep for 3 seconds
    print(args.job_id)

In this example, we print the job id that had to finish in order for the dependent job to begin. Therefore, the very first job should print 0 because it did not rely on any job to finish in order to run but the second job should print the jobid of the first job and so on.

bash wrapper_script.sh

This will submit the three jobs in sequence and you should see jobs 2 and 3 pending for reason DEPENDENCY.

Was this helpful?
0 reviews