Running jobs

Quick access

Slurm
List of Slurm options
Example script for a single-core job
Example script for a multi-core job
Example script for a job array
Modifying job options once they are running
Interactive sessions
Benchmarking jobs
Useful slurm commands


Slurm job scheduler

Marbits uses SLURM as a job scheduler. From its website: «As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.»

In order to submit a job you need to tell slurm what resources you will need, so that it can allocate them and make sure that all users share them fairly.

The easiest way of doing that is to embed your program calls in a script that slurm can understand and then submit it with the command sbatch.

sbatch <your_script_name>
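
sbatch answers with the number assigned to your job, which you will need later to monitor or cancel it. For example (the script name and job number below are just illustrative):

$ sbatch my_first_job.sh
Submitted batch job 123456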

Slurm options explained

You can call sbatch from the command line and specify the options inline. However, that is only recommended for experienced users who have absolute contempt for scientific reproducibility. The best way of submitting a job is to craft a script that embeds all slurm options in a text file. You can then re-use it and add comments about what the script is supposed to do, what programs and versions it uses, etc.

All the script options are preceded by #SBATCH. Remember that a # works as a comment, so your shell is not going to interpret the option, although slurm will.

The following options are compulsory. Failure to add them will prevent your job from running.

#SBATCH --account=<your_bank_account>
  • --account refers to the «bank» or project account you want your job to be charged to. One user may have more than one such account, so you may have to choose which one is going to be billed for your job. Also, the same bank/project account is shared between the users that belong to the same group/project. Hence, don’t mistake your «user» account (that is, the user name you use to log in to marbits) for the bank/project account, which is used for billing purposes. See how to know your «bank» account.
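
If that section is not at hand, a generic way of listing the accounts associated with your user is the standard Slurm accounting command below (nothing marbits-specific; check the linked section if the output looks different):

sacctmgr show associations where user=$(whoami) format=Account,User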

Other options

All these options have default values, so not specifying them is fine. Nevertheless, they are recommended if you want to fine-tune the behavior of your script and may be necessary if you are going to use parallelization or job arrays.

#SBATCH --job-name=<your_job_name>
#SBATCH --time=00-00:01:00
#SBATCH --mem=<real_amount_of_memory_to_be_used_by_your_job_in_MB>
#SBATCH --nodes=<number_nodes_to_be_used_by_your_job>
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=<number_of_cores> 
#SBATCH --output=path_to_log_file/log_%J.out
#SBATCH --error=path_to_log_file/log_%J.err 
#SBATCH --array=i-j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@icm.csic.es
  • --job-name. The name you want your job to have. Defaults to the script name.
  • --time. Time that your job will need to complete. If your job runs longer than the specified time it will be killed, but when you declare this option, the job scheduler is able to better optimize the available resources. Format is DD-HH:MM:SS.
  • --mem. Amount of memory that your job will need per node. Units are MB, but you can also use suffixes, e.g. 10G instead of 10000. You need to benchmark your jobs before using this parameter confidently (see how to benchmark your jobs).
  • --nodes. Number of nodes that your job needs to run. Generally you can skip this one, unless you are using mpi or job arrays; in those cases it may help you to better control the cluster usage.
  • --ntasks. Number of tasks you are going to run. It is generally one, and defaults to one.
  • --cpus-per-task. Number of cores that your job is going to use. It defaults to 1. It is very important that the number of cores/threads your program asks for equals the value of this option.
  • --output. Specifies a path for the job standard output file. The variable %J adds the job number to the file name. The variable %A has the same effect when your job is an array of jobs, and %a adds the array index to the file name, e.g. log_%A_%a.out.
  • --error. Idem for standard error.
  • --array. If you have number-indexed input files or a similar workaround, this option creates as many sub-jobs as there are elements in the list i-j (e.g. --array=1-10 will create 10 sub-jobs). Each sub-job sets the variable $SLURM_ARRAY_TASK_ID to the corresponding index. The list also accepts intervals such as 1-5,6,10-15 to run sub-jobs 1 to 5, 6 and 10 to 15. Appending the modifier %n limits the number of concurrent sub-jobs to n (e.g. --array=1-10%2 will create 10 sub-jobs with only 2 running at the same time).
  • --mail-type. Send an email to the submitter in the following cases (pick one): BEGIN, END, FAIL, ALL (self-explanatory, I think). There are more options; see the sbatch man page. Currently, this option needs to be specified alongside --mail-user.
  • --mail-user. E-mail address of recipient of the --mail-type option.

Example script for a single-core job

#!/bin/sh
# Remember you can add as many comments to your
# script as you want, preceded by `#`.
#
# SLURM HEADER
#############################################
# JOB INFO
#SBATCH --account=test
#SBATCH --job-name=myFirstJob
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=12G
#SBATCH --output=path_to_my_output_directory/jobLog_%J.out
#SBATCH --error=path_to_my_output_directory/jobLog_%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=pepito@tuenti.com

# Load the modules you'll need (i.e. megahit)
##############################################
module load megahit/1.1.3

# Declare variables or write more comments so as
# you know in 3 months what this script is supposed
# to do ;-)

reads1=data/in1.fastq
reads2=data/in2.fastq
outdir=results

# And now do the job...
megahit -o ${outdir} --out-prefix megahit.test -1 ${reads1} -2 ${reads2}
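
Assuming you saved the script above as single_core.sh (a name chosen here just for the example), you would submit it and check its status like this:

sbatch single_core.sh
squeue -u $(whoami)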

Example script for a multi-core job

This is the same job as before. Just note that the --cpus-per-task option in the header and the megahit option that actually activates more than one core (-t) have the same value (12 in this case). This is very important for scheduling your job properly.

#!/bin/sh
# Remember you can add as many comments to your
# script as you want, preceded by `#`.
#
# SLURM HEADER
#############################################
# JOB INFO
#SBATCH --account=test
#SBATCH --job-name=myFirstMultiCoreJob
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=12G
#SBATCH --output=path_to_my_output_directory/jobLog_%J.out
#SBATCH --error=path_to_my_output_directory/jobLog_%J.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=pepito@tuenti.com

# Load the modules you'll need (i.e. megahit)
##############################################
module load megahit/1.1.3

# Declare variables or write more comments so as
# you know in 3 months what this script is supposed
# to do ;-)

reads1=data/in1.fastq
reads2=data/in2.fastq
outdir=results

# And now do the job...
megahit -o ${outdir} --out-prefix megahit.test -t 12 -1 ${reads1} -2 ${reads2}
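
If you prefer not to write the number of cores twice, Slurm exports the value of --cpus-per-task in the environment variable SLURM_CPUS_PER_TASK (it is only set when --cpus-per-task is given, which is the case here), so the last line could also read:

megahit -o ${outdir} --out-prefix megahit.test -t ${SLURM_CPUS_PER_TASK} -1 ${reads1} -2 ${reads2}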

Example script for a job array

Job arrays are an easy way of submitting many similar jobs when you have several input files that have to go through the same workflow. They take advantage of the variable SLURM_ARRAY_TASK_ID (see above), which takes the values specified in the --array option.
For instance, if we have

$ ls -1 data
infile1.fasta
infile2.fasta
infile3.fasta

we can submit a single slurm script that spawns 3 different jobs, one for each input file, as follows:

#!/bin/sh
# Remember you can add as many comments to your
# script as you want, preceded by `#`.
#
# SLURM HEADER
#############################################
# JOB INFO
#SBATCH --account=test
#SBATCH --job-name=simpleArray
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --output=path_to_my_output_directory/jobLog_%A_%a.out
#SBATCH --error=path_to_my_output_directory/jobLog_%A_%a.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=pepito@tuenti.com
#SBATCH --array=1-3%1

# Load the modules you'll need
##############################################

# Declare variables or write more comments so as
# you know in 3 months what this script is supposed
# to do ;-)


# And now do the job...
# This simple job locates the `>` character at the beginning of a line
# in file data/infile#.fasta, where # is the value of the ${SLURM_ARRAY_TASK_ID}
# variable, taken from the range given to `--array`,
# and counts how many of them there are, dumping the result to
# infile#.count.txt

grep -c '^>' data/infile${SLURM_ARRAY_TASK_ID}.fasta > infile${SLURM_ARRAY_TASK_ID}.count.txt

You can run job arrays in which each task uses more than one cpu. If you run the example above with --cpus-per-task=4, each task (each element of the array) will use 4 cpus. If you don’t specify %n in --array=1-3 and there are enough free resources, all three tasks will run at the same time, using a total of 3 x 4 = 12 cpus. Notice that the cpu value applies to each task, not to the whole array!

You are encouraged to be respectful of other users and to specify a sensible number of simultaneous jobs in your job array with %n, where n is a number that won’t clog the cluster. In this example, only one job will run at a time. If you need to change the number of simultaneous jobs once they have started, see scontrol update below.

Notice that in the --output and --error options we have replaced %J with %A plus %a. The first is equivalent to %J, as it takes the value of the job ID. The second takes the value of the array index, so you can identify each log file first by the job submission and then by the input file that was processed.
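
For instance, with the header above and a hypothetical job ID of 123456, the three tasks would write their logs to:

jobLog_123456_1.out   jobLog_123456_1.err
jobLog_123456_2.out   jobLog_123456_2.err
jobLog_123456_3.out   jobLog_123456_3.err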

If you are going to submit large arrays with lots of tasks, please use the option --exclusive=user. Otherwise, the tasks of your job will spread throughout the cluster. Many users like to have empty nodes for their jobs, and this would interfere with them.
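
If your input files are not number-indexed, a common workaround (plain shell, nothing specific to marbits) is to list them in a text file and let each task pick the line that matches its index:

# files.txt holds one input file name per line, e.g. created with:
#   ls data/*.fasta > files.txt
# Each task reads the line whose number equals its array index:
infile=$(sed -n "${SLURM_ARRAY_TASK_ID}p" files.txt)
grep -c '^>' "${infile}" > "$(basename "${infile}" .fasta).count.txt"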

Modifying job options once they are running

Sometimes you may need to change options of your jobs once they have started. You can achieve this with the command scontrol update. Many of the commands issued with scontrol are reserved for the administrator, but some can be issued on jobs you own. From the scontrol man page:

Update job, step, node, partition, powercapping or reservation configuration per the supplied specification. SPECIFICATION is in the same format as the Slurm configuration file and the output of the show command described above. It may be desirable to execute the show command (described above) on the specific entity you want to update, then use cut-and-paste tools to enter updated configuration values to the update. Note that while most configuration values can be changed using this command, not all can be changed using this mechanism.

See some examples here:

If you want to throttle up or down the number of simultaneous tasks that can be executed in a job array (remember, the number behind % in --array=1-100%5), do

scontrol update JobId=<JobNumber> ArrayTaskThrottle=10

and it will go from executing 5 simultaneous tasks to executing 10 of them. Alternatively, you can use the job name:

scontrol update JobName=<JobName> ArrayTaskThrottle=10

Following the same syntax, you can also update the amount of memory you want to use:

scontrol update JobName=<JobName> MinMemoryNode=<megabytes>
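
As the man page suggests, it is a good idea to inspect the current values with scontrol show job before updating them. For example (the job number is just illustrative):

scontrol show job 123456
# ...check the output, then e.g. raise the array throttle:
scontrol update JobId=123456 ArrayTaskThrottle=10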

Interactive sessions

You can ask slurm to allocate an interactive session on a compute node. Instead of submitting a script with sbatch, you may use the command srun (it does many other things; see the srun manpage). The simplest way of asking for an interactive slot is

srun --pty /bin/bash

This will give you one slot and as much memory as the system has bound to a core. You can tune your srun job to ask for more memory, more cores, or more nodes. You may use the abbreviated option flags (again, see the srun manpage):

srun -A <account> -c 4 --mem=10G --pty /bin/bash

The previous instance will allocate 4 cores and 10 GB for an interactive bash prompt. You may use /bin/sh or any other shell you fancy (provided that it is installed).

The option --pty is important. It gives you a prompt and a session that looks very much like a normal login session, but on one of the compute nodes instead of on the master node. If you forget --pty you will not get a login prompt.

Exit the interactive session by typing

logout
# or
exit

or by simply pressing the Ctrl+D key combination.

Benchmarking jobs

In the beginning it may seem difficult to choose values for slurm options like --mem or --time. You can always give conservative values and, based on the resources you actually used, declare similar future jobs more precisely. It is important to do this right, as slurm can manage resources better if it knows how long jobs are going to run and how much memory they are going to take. To avoid abuse (not that you are going to do it…), slurm monitors the difference between the resources asked for by the user and the real usage. It then awards «karma» points to each user that affect the priority of their future jobs, to ensure fair use of the cluster.

You can monitor your jobs with

scontrol show job <jobNumber>

When they have finished, you can use the command sacct to see the resources consumed by your latest jobs (CPU time, maximum memory usage…). Get information about older jobs with

sacct -j <jobNumber>
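
A selection of format fields that is handy for benchmarking (all of them are standard sacct fields; adjust to taste):

sacct -j <jobNumber> --format=JobID,JobName,Elapsed,NCPUS,MaxRSS,ReqMem,State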

Useful slurm commands

  • squeue: Shows the status of the queue, including all jobs from all users. It can be customized to show more or less information.
  • sinfo: Shows the status of the nodes of the different queues (or partitions). It can also be customized.
  • sacct: Shows accounting information for your latest jobs. You can specify the job id to limit the output.
  • scancel: Allows you to kill one or more of your jobs. You can specify the job number or its name (with -n), or all your jobs (with -u <your_user_name>); see the examples below.
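
For example (the job number is just illustrative):

scancel 123456          # kill job number 123456
scancel -n myFirstJob   # kill the job(s) named myFirstJob
scancel -u $(whoami)    # kill all your jobs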

Make it an alias

It’s a good idea to make the previous commands a bit more friendly. Here is my suggestion for declaring some aliases in your ~/.bashrc file to get more information with fewer keystrokes…

# SLURM aliases
alias sq="squeue -o \"%10A %12j %5K %10u %8a %10P %3t %10M %6D %6C %10f %R\" -u $(whoami)"
alias sqa="squeue -ao \"%10A %12j %5K %10u %8a %10P %3t %10M %6D %6C %10f %R\""
alias si="sinfo -o \"%14R %10n %5a %7t %6c %8O %8m %8e %7z %60E\""
alias sacct="sacct --format=jobid,jobname,submit,start,end,elapsed,ncpus,ntasks,MaxVMSize,AllocNodes,state"