Note that MATLAB versions before R2017a (i.e. R2016a, R2015a, R2014b, R2014a and R2013a) are not supported by the Slurm scheduler for scalable parallelization.
To run parallel jobs on Quest that use cores across multiple nodes, you need to create a parallel profile.
Log in to Quest with X-forwarding enabled (use the -X option when connecting via ssh or connect with FastX to have X-forwarding enabled by default).
Launch MATLAB on the login node you land on from your home directory:
module load matlab/r2021b
matlab
CONFIGURATION OF QUEST CLUSTER
We will configure MATLAB to run parallel jobs on Quest by calling configCluster. configCluster only needs to be called once per version of MATLAB. Please be aware that running configCluster more than once per version will reset your cluster profile back to default settings and erase any saved modifications to the profile.
>> rehash toolboxcache
>> configCluster
Jobs will now default to the cluster rather than submit to the local machine.
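One way to confirm that configCluster took effect is to inspect the default profile from the MATLAB prompt. The following is a minimal sketch, assuming the Parallel Computing Toolbox functions parallel.clusterProfiles and parallel.defaultClusterProfile as they exist in R2021b; the profile name that is printed depends on your MATLAB version.

```matlab
% List every cluster profile known to this MATLAB installation
allProfiles = parallel.clusterProfiles;
disp(allProfiles)

% Show which profile parcluster uses when called with no arguments
defaultProfile = parallel.defaultClusterProfile;
fprintf('Default cluster profile: %s\n', defaultProfile);

% Get a handle to the default cluster and display its settings
c = parcluster;
disp(c)
```

If the default profile is still the local one, rerunning the configuration step is the likely fix, keeping in mind that doing so resets any saved modifications.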
Although the Cluster Profile created by this script will work out of the box, we recommend slightly modifying the Cluster Profile.
If you are using the MATLAB GUI on Quest, you should be able to see and modify the new cluster profile by going to Parallel > Create and Manage Clusters. You should see that the default Cluster Profile is now quest R20XXa/b, depending on which version of MATLAB (R2017a or later) you are using.
Click Edit, change the JobStorageLocation entry, and then click Done. We recommend setting JobStorageLocation to its default value, the current working directory, which can be achieved by leaving the PATH blank.
CONFIGURING JOBS
Prior to submitting the job, we can specify various parameters to pass to our jobs, such as partition, allocation, walltime, etc.
>> % Get a handle to the cluster
>> c = parcluster;
The following options are required in order to submit a MATLAB job to the cluster.
>> % Specify the walltime (e.g. 4 hours)
>> c.AdditionalProperties.WallTime = '04:00:00';
>> % Specify an account to use for MATLAB jobs (e.g. pXXXX, bXXXX, etc)
>> c.AdditionalProperties.AccountName = 'account-name';
>> % Specify a queue to use for MATLAB jobs (e.g. short, normal, long)
>> c.AdditionalProperties.QueueName = 'queue-name';
The following arguments are optional but are worth considering when running MATLAB jobs on the cluster
>> % Specify memory to use for MATLAB jobs, per core (default: 4gb)
>> c.AdditionalProperties.MemUsage = '6gb';
>> % Specify number of nodes to use
>> c.AdditionalProperties.Nodes = 1;
>> % Require exclusive node
>> c.AdditionalProperties.RequireExclusiveNode = false;
>> % Specify e-mail address to receive notifications about your job
>> c.AdditionalProperties.EmailAddress = 'user-id@northwestern.edu';
The following arguments apply when running GPU-accelerated MATLAB jobs.
>> % Specify number of GPUs
>> c.AdditionalProperties.GpusPerNode = 1;
>> % Specify type of GPU card to use (e.g. a100)
>> c.AdditionalProperties.GpuCard = '';
Save changes after modifying AdditionalProperties for the above changes to persist between MATLAB sessions.
>> c.saveProfile
To see the values of the current configuration options, display AdditionalProperties.
>> % To view current properties
>> c.AdditionalProperties
Unset a value when no longer needed.
>> % Turn off email notifications
>> c.AdditionalProperties.EmailAddress = '';
>> c.saveProfile
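Because a job cannot be submitted without the three required options, it can be useful to check them before calling batch. The snippet below is only a sketch: it assumes the option names shown above (WallTime, AccountName, QueueName) were created by configCluster, that they hold character vectors, and that AdditionalProperties accepts dynamic property references of the form obj.(name).

```matlab
% Get a handle to the cluster
c = parcluster;

% Option names taken from the required settings described above
requiredOptions = {'WallTime', 'AccountName', 'QueueName'};

for k = 1:numel(requiredOptions)
    name  = requiredOptions{k};
    value = c.AdditionalProperties.(name);   % assumed: dynamic property access
    if isempty(value)
        fprintf('Required option %s is not set.\n', name);
    else
        fprintf('%s = %s\n', name, value);
    end
end
```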
PARALLEL BATCH JOB
Users can submit parallel workflows with the batch command either with or without the MATLAB GUI. Let’s use the following example for a parallel job, which is saved as quest_parallel_example.m.
disp('Start Sim')
iter = 100000;
t0 = tic;
parfor idx = 1:iter
A(idx) = idx;
end
t = toc(t0);
X = sprintf('Simulation took %f seconds',t);
disp(X)
save RESULTS A t
SUBMITTING A PARALLEL BATCH JOB THROUGH THE MATLAB GUI
First, we will make a MATLAB script called submit_matlab_job.m, which we will place in the same directory as quest_parallel_example.m and which will look a lot like a SLURM submission script.
% Get a handle to the cluster
c = parcluster;
%% Required arguments in order to submit MATLAB job
% Specify the walltime (e.g. 1 hour)
c.AdditionalProperties.WallTime = '01:00:00';
% Specify an account to use for MATLAB jobs (e.g. pXXXX, bXXXX, etc)
c.AdditionalProperties.AccountName = 'w10001';
% Specify a queue/partition to use for MATLAB jobs (e.g. short, normal, long)
c.AdditionalProperties.QueueName = 'w10001';
% Constrain this MPI job
c.AdditionalProperties.Constraint = '"[quest9|quest10|quest11|quest12]"';
%% optional arguments but worth considering
% Specify memory to use for MATLAB jobs, per core (default: 4gb)
c.AdditionalProperties.MemUsage = '5gb';
% Specify number of nodes to use
c.AdditionalProperties.Nodes = 1;
% Specify e-mail address to receive notifications about your job
c.AdditionalProperties.EmailAddress = 'quest_demo@northwestern.edu';
% The script that you want to run through SLURM needs to be in the MATLAB PATH
% Here we assume that quest_parallel_example.m lives in the same folder as submit_matlab_job.m
addpath(pwd)
% Finally we will submit the MATLAB script quest_parallel_example to SLURM such that MATLAB
% will request enough resources to run a parallel pool of size 16 (i.e. parallelize over 16 CPUs).
job = c.batch('quest_parallel_example', 'Pool', 16, 'CurrentFolder', '.');
After you have written this MATLAB program, which plays the role of a SLURM submission script, run submit_matlab_job.m using the Run button in the MATLAB GUI.
MATLAB should output the arguments that it passes to the sbatch command used to submit the job to SLURM; these are based on the configuration settings in submit_matlab_job.m.
Note that the number of tasks submitted is the size of the parallel pool plus one. This extra CPU is for the root or main MATLAB worker. You should also see a folder called JobX appear in your current working directory, as can be seen in the screenshot below. This is the folder where the MATLAB job is running and where the output of any print statements in your code will show up.
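The resource arithmetic can be worked through for the example settings. This is a small illustrative calculation, assuming the values used in submit_matlab_job.m (a pool of 16 and 5gb of memory per core) and one CPU per task; the variable names are hypothetical.

```matlab
poolSize    = 16;  % value passed as 'Pool' in c.batch
memPerCpuGB = 5;   % from c.AdditionalProperties.MemUsage = '5gb'

% Pool workers plus one task for the main MATLAB worker
nTasks = poolSize + 1;

% Memory is requested per CPU, so the total scales with the task count
totalMemGB = nTasks * memPerCpuGB;

fprintf('--ntasks=%d, total memory requested = %d GB\n', nTasks, totalMemGB);
% prints: --ntasks=17, total memory requested = 85 GB
```

With Nodes set to 1, all of those tasks and that memory have to fit on a single node, which is worth keeping in mind when increasing the pool size or MemUsage.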
If we enter the Job1 folder, we will see that every MATLAB task and/or worker has a set of files associated with it. The most important file to keep in mind is Task1.diary.txt, which is where the output of any print or display statements in your script will go. We demonstrate what Task1.diary.txt would contain based on our example job quest_parallel_example.m.
MONITORING YOUR JOB THROUGH THE JOB MONITOR
MATLAB provides an application called the Job Monitor, which can also be a handy way of checking on the status of your MATLAB jobs. To launch Job Monitor from the MATLAB GUI, go to Parallel > Job Monitor.
Job Monitor allows you to…
- Check on the state of your MATLAB job (pending, running, finished).
- If the job failed, Show Errors will display any MATLAB error messages.
- If you had print statements in your MATLAB job (e.g. disp, fprintf, etc.), then Show Diary will show all the standard output from your job.
- Finally, Load Variables will allow you to load all of the variables defined in your program into your current workspace.
We demonstrate running Show Diary and Load Variables after successfully running the program quest_parallel_example.m. Note the variables added to our workspace after running Load Variables.
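The same information can also be retrieved from the command line. The following is a sketch of rough command-line counterparts to the Job Monitor actions, assuming that job is the object returned by c.batch in submit_matlab_job.m and that it was submitted as a batch script, so that load(job) applies.

```matlab
% Check the state of the job (e.g. 'queued', 'running', 'finished', 'failed')
disp(job.State)

% Block until the job has finished
wait(job)

% Counterpart to Show Errors: print any error message recorded for each task
for k = 1:numel(job.Tasks)
    if ~isempty(job.Tasks(k).ErrorMessage)
        fprintf('Task %d: %s\n', job.Tasks(k).ID, job.Tasks(k).ErrorMessage);
    end
end

% Counterpart to Show Diary: print the captured command window output
diary(job)

% Counterpart to Load Variables: load the script's workspace variables
load(job)
whos A t   % A and t are the variables created by quest_parallel_example.m
```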
SUBMITTING A PARALLEL BATCH JOB WITHOUT THE MATLAB GUI
If you do not plan to use the MATLAB GUI to run submit_matlab_job.m, then you will want to create a bash script which loads MATLAB and runs it in command-line-only mode.
Create a file called submit_matlab_job_wrapper.sh which contains these two lines:
module load matlab/r2021b
matlab -singleCompThread -batch submit_matlab_job
All that is left to do is to submit the job by running
$ bash submit_matlab_job_wrapper.sh
on the command line. This will take a little while to run, but you will know that MATLAB has submitted a job to SLURM when it outputs the sbatch command that it ran based on the configuration settings in submit_matlab_job.m.
For example, the above MATLAB submission script would produce this output:
$ bash submit_matlab_job_wrapper.sh
additionalSubmitArgs =
'--ntasks=17 --cpus-per-task=1 --ntasks-per-core=1 -A w10001 -t 01:00:00 -p w10001 -N 1 --mem-per-cpu=5gb --mail-type=ALL --mail-user=quest_demo@northwestern.edu'
Note that the number of tasks submitted is the size of the parallel pool plus one. This extra CPU is for the root or main MATLAB worker.
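When MATLAB is run with -batch, the session ends once the submission script finishes, so the job variable is no longer available afterwards. One possible way to get the job back in a later MATLAB session is sketched below. It assumes the job metadata is still present in the profile's JobStorageLocation (if that is the current working directory, start MATLAB from the same directory you submitted from), and the job ID of 1 is only a hypothetical example.

```matlab
% Get a handle to the cluster
c = parcluster;

% List the jobs recorded in the job storage location
disp(c.Jobs)

% Look up one job by its ID (1 is a placeholder; use the ID listed above)
job = findJob(c, 'ID', 1);

% Check on it and, once it has finished, pull back its output and variables
disp(job.State)
if strcmp(job.State, 'finished')
    diary(job)
    load(job)
end
```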
GPU BATCH JOB
Users can submit GPU workflows with the batch command either with or without the MATLAB GUI. Let’s use the following example for a GPU job, which is saved as quest_gpu_example.m.
display(gpuDevice)
A = gpuArray([1 0 1; -1 -2 0; 0 1 -1]);
e = eig(A);
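In the example above, e remains a gpuArray that lives in GPU memory. If the result needs to be inspected later on a machine without a GPU, one option is to copy it back to host memory with gather before saving. This is a sketch of a variant of the example, not part of the original script; the output file name GPU_RESULTS and the variable eHost are hypothetical.

```matlab
% Report which GPU the job landed on
gpuInfo = gpuDevice;
fprintf('Running on GPU: %s\n', gpuInfo.Name);

% Same computation as in quest_gpu_example.m
A = gpuArray([1 0 1; -1 -2 0; 0 1 -1]);
e = eig(A);

% Copy the eigenvalues from GPU memory back to ordinary host memory
eHost = gather(e);

% Save the host copy (hypothetical file name)
save GPU_RESULTS eHost
```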
SUBMITTING A GPU BATCH JOB WITHOUT THE MATLAB GUI
In order to submit the above MATLAB job without the GUI, we will create two additional scripts, one BASH script and one MATLAB script.
First, we will make a MATLAB script called submit_matlab_job.m which will look a lot like a GPU SLURM submission script.
% Get a handle to the cluster
c = parcluster;
%% Required arguments in order to submit a MATLAB GPU job
% Specify the walltime (e.g. 1 hour)
c.AdditionalProperties.WallTime = '01:00:00';
% Specify an account to use for MATLAB jobs (e.g. pXXXX, bXXXX, etc)
c.AdditionalProperties.AccountName = 'w10001';
% Specify a queue/partition to use for MATLAB jobs (e.g. short, normal, long)
c.AdditionalProperties.QueueName = 'w10001';
% Specify number of GPUs
c.AdditionalProperties.GpusPerNode = 1;
% Specify type of GPU card to use (e.g. a100)
c.AdditionalProperties.GpuCard = 'a100';
%% optional arguments but worth considering
% Specify memory to use for MATLAB jobs, per core (default: 4gb)
c.AdditionalProperties.MemUsage = '5gb';
% Specify number of nodes to use
c.AdditionalProperties.Nodes = 1;
% Specify e-mail address to receive notifications about your job
c.AdditionalProperties.EmailAddress = 'quest_demo@northwestern.edu';
% The script that you want to run through SLURM needs to be in the MATLAB PATH
% Here we assume that quest_gpu_example.m lives in the same folder as submit_matlab_job.m
addpath(pwd)
% Finally we will submit the MATLAB script quest_gpu_example to SLURM
job = c.batch('quest_gpu_example', 'CurrentFolder', '.');
After you have written this MATLAB program, which plays the role of a SLURM submission script, we create a bash script which simply runs it so that the job is submitted to SLURM.
Create a file called submit_matlab_job_wrapper.sh which contains these two lines:
module load matlab/r2021b
matlab -singleCompThread -batch submit_matlab_job
All that is left to do is to submit the job by running
$ bash submit_matlab_job_wrapper.sh
on the command line. This will take a little while to run, but you will know that MATLAB has submitted a job to SLURM when it outputs the sbatch command that it ran based on the configuration settings in submit_matlab_job.m.
For example, the above MATLAB submission script would produce this output:
$ bash submit_matlab_job_wrapper.sh
additionalSubmitArgs =
'--ntasks=1 --cpus-per-task=1 --ntasks-per-core=1 -A w10001 -t 01:00:00 -p w10001 -N 1 --gres=gpu:a100:1 --mem-per-cpu=5gb --mail-type=ALL --mail-user=quest_demo@northwestern.edu'
Note the argument --gres=gpu:a100:1, which lets you know that you have correctly requested that this job run on a GPU resource.
INTERACTIVE JOBS
To run an interactive pool job on the cluster, continue to use parpool as you’ve done before.
>> % Get a handle to the cluster
>> c = parcluster;
>> % Open a pool of 12 workers on the cluster
>> pool = c.parpool(12);
As the screenshot below shows, when you create the pool object MATLAB will show you the arguments that it is passing to SLURM in order to run the pool on the cluster.
Rather than running on the local machine, the pool can now run across multiple nodes on the cluster.
>> % Run a parfor over 1000 iterations
>> parfor idx = 1:1000
a(idx) = …
end
Once we’re done with the pool, delete it.
>> % Delete the pool
>> pool.delete
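Put together, an interactive session might look like the sketch below. The loop body is a placeholder computation chosen purely for illustration (it is not the elided body from the example above), and the pool size of 12 is taken from the earlier snippet.

```matlab
% Get a handle to the cluster
c = parcluster;

% Open a pool of 12 workers on the cluster
pool = c.parpool(12);
fprintf('Pool is running with %d workers\n', pool.NumWorkers);

% Run a parfor over 1000 iterations with a placeholder computation
n = 1000;
a = zeros(1, n);          % preallocate the sliced output variable
parfor idx = 1:n
    a(idx) = idx^2;       % hypothetical work; replace with your own
end
fprintf('Sum of results: %d\n', sum(a));

% Delete the pool so the cluster resources are released
delete(pool);
```

Deleting the pool as soon as the work is done matters more on a cluster than locally, since the workers otherwise keep holding the allocated cores until the walltime runs out.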
DEBUGGING
If a serial job produces an error, call the getDebugLog method to view the error log file. When submitting independent jobs with multiple tasks, specify the task number.
>> c.getDebugLog(job.Tasks(3))
For Pool jobs, only specify the job object.
>> c.getDebugLog(job)
When troubleshooting a job, the cluster admin may request the scheduler ID of the job. This can be derived by calling schedID:
>> schedID(job)
ans =
25539
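For an independent job with many tasks, it may be more convenient to find the failing tasks first and only then pull their logs. The loop below is a sketch under the assumption that c and job are the cluster and job objects from the earlier examples and that the job is an independent (not Pool) job, so getDebugLog is called per task as described above.

```matlab
% Walk over every task of an independent job
for k = 1:numel(job.Tasks)
    task = job.Tasks(k);

    % ErrorMessage is empty for tasks that completed without an error
    if ~isempty(task.ErrorMessage)
        fprintf('Task %d failed: %s\n', task.ID, task.ErrorMessage);

        % Show the scheduler-side log for just this task
        c.getDebugLog(task)
    end
end
```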
TO LEARN MORE
To learn more about the MATLAB Parallel Computing Toolbox, check out these resources: