===== HPC Jobs =====
  
In the context of HPC, a **job** is a task (a program or script) that you ask the computer to run. Jobs are managed by a **scheduler** that accepts submissions from all cluster users and works out when, and on which resources, each job will run. When you connect to the cluster, you connect to a login node or **master node**. Jobs are not run on the login node because the high resource demand of a job would hinder the function of this node, which is to provide access to your files and accept job submissions. Instead, your jobs should be run on the **compute nodes** by submitting a request to the scheduler, which allocates a compute node (or nodes) for your job and executes the job when those nodes are available. Jobs take two forms: interactive and non-interactive. **Interactive jobs** are jobs that require you to provide input while the job is running. For these jobs, the scheduler allocates a compute node and connects you to an interactive shell on the allocated node. From the scheduler's standpoint, the job is running as long as the shell is open; when the shell is exited, the job is completed. Interactive jobs can also be applications, like Jupyter Notebook or RStudio, that allow user interaction. Typically, interactive jobs are run to experiment and to test scripts and workflows. **Non-interactive jobs** are jobs that can be executed on a compute node without any interaction or input while the job is running. Generally speaking, your goal is to design jobs that can reliably run non-interactively, so they can be submitted to the scheduler and you can return to the output when the job is finished, which may be hours, days, or weeks later.
  
===== Interactive Jobs =====
  
  
An interactive job can be run on a compute node using the ''salloc'' command. The ''salloc'' command obtains a resource allocation from the scheduler and executes a command. For an interactive job, the command passed to ''salloc'' is an interactive shell like ''bash''. Once the command finishes executing (i.e. when the bash shell is exited), the allocated resources are released.

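As a minimal sketch, you could request a single task and pass ''bash'' as the command to run (the task count and time limit here are illustrative values, not recommendations):

  $ salloc -n 1 -t 00:30:00 bash   # request 1 task for 30 minutes and start a bash shell
  $ exit                           # end the shell: the job completes and the allocation is released
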
''salloc'' has a number of options that can be used to specify the requested resources and to set some properties of the job. Some useful and frequently used options are:

  * Working directory (''-D''): the remote process will change into this directory before running
  * Nodes (''-N''): sets the number of nodes to be allocated to the job
  * Tasks (''-n''): specifies the maximum number of tasks that job steps (discrete commands run within the job) will launch
  * Memory required per node (''--mem''): the default unit is megabytes
  * Time limit (''-t days-HH:MM:SS''): sets a limit on the job's runtime; a job that exceeds this limit may be killed
  * Name (''-J''): a name for the job (this makes it easier to spot in logs and scheduler status queries)

An example command requesting 2 nodes for a job that you anticipate will take 4 hours and that will run a script requiring 8 tasks:

  $ salloc -N 2 -n 8 -t 04:00:00

The execution environment allocated by this command includes 2 nodes, with 8 cores distributed across those nodes.
Commands issued within this shell will still be executed on the master node unless they are invoked with the SLURM command ''srun''. If you issue the ''srun'' command within this execution environment, SLURM will distribute the tasks invoked by ''srun'' across these resources. Each invocation of the ''srun'' command is a job step within the ''salloc'' job.
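
Putting this together, a sketch of an interactive session might look like the following (the job name is arbitrary, and the exact behaviour will depend on your cluster's configuration):

  $ salloc -N 2 -n 8 -t 04:00:00 -J interactive-test
  $ hostname        # still runs on the master node
  $ srun hostname   # a job step: runs the 8 tasks spread across the 2 allocated nodes
  $ exit            # leave the shell: the job ends and the allocation is released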
  
===== Non-interactive Jobs =====

**Non-interactive jobs** are submitted to the scheduler using one of two commands: ''srun'' and ''sbatch''.
The ''srun'' command submits a job for execution in real time. If ''srun'' is invoked within an allocation acquired through ''salloc'', then ''srun'' uses that allocation to execute the job immediately. If ''srun'' is invoked without a previously acquired allocation, the job parameters issued with the ''srun'' command are used to request an allocation and execute the job.

This ''srun'' command will execute the ''hostname'' command with the default allocation of 1 node and 1 task:

  srun hostname

The result is the hostname of the compute node on which the ''hostname'' command is executed. This ''srun'' command will execute the ''hostname'' command 10 times, 5 times on each of 2 requested nodes:

  srun -N 2 -n 10 hostname

The result is that the hostname of each of the two compute nodes is printed 5 times.
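
For example, if the two allocated compute nodes happen to be named ''node1'' and ''node2'' (hypothetical names), the output might look like the following, although the order of the lines is not guaranteed:

  node1
  node1
  node1
  node1
  node1
  node2
  node2
  node2
  node2
  node2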

The ''sbatch'' command submits a batch script to be executed at a time of the scheduler's choosing, typically as soon as all the requested resources become available. A batch script contains SLURM directives (specifying required resources and job properties) and commands to be executed as part of the job.

A batch script submitted to ''sbatch'' will contain SLURM directives at the start of the script. These directives mirror the options and parameters that can be passed to ''srun'' as flags to specify the number of nodes, tasks, and other SLURM-related options of the job. The directives are placed in the batch script, one per line, with each line beginning with ''#SBATCH''. These options set up the execution environment for the commands that follow later in the script.

  #!/bin/bash
  
  #SBATCH -N 2
  #SBATCH -n 10
  ...

Following the ''#SBATCH'' directives are the commands or scripts that make up the compute job you want to execute. To distribute the tasks across the allocated processors, the batch script should contain an ''srun'' command or a similar command (''mpirun'', ''mpiexec'', etc.) that assigns tasks to the allocated processors. Without such a command, the commands issued in the batch script will just be steps in a single task assigned to a single processor. The output of the following batch script illustrates this:

File ''printhostname.sub'':

  #!/bin/bash
  
  #SBATCH -N 2
  #SBATCH -n 10
  
  echo "This command is executed on $(hostname)"
  echo "The following 'hostname' commands come from 'srun':"
  srun hostname | sort

Submit a batch job by supplying the batch script as the first argument of the ''sbatch'' command:

  sbatch printhostname.sub

Output:

  This command is executed on node1
  The following 'hostname' commands come from 'srun':
  node1
  node1
  node1
  node1
  node1
  node2
  node2
  node2
  node2
  node2
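
As a closing sketch, the directives described earlier for ''salloc'' can be used in the same way in a batch script. The values below are illustrative, and ''myscript.sh'' is a hypothetical program standing in for your own:

  #!/bin/bash
  
  #SBATCH -J myjob
  #SBATCH -N 2
  #SBATCH -n 10
  #SBATCH -t 1-00:00:00
  #SBATCH --mem=8G
  
  srun ./myscript.sh   # hypothetical program; replace with your own

Submitted with ''sbatch'', this script requests 2 nodes and 10 tasks for up to one day, with 8 gigabytes of memory per node, and ''srun'' launches 10 copies of ''myscript.sh'' (one per task) distributed across the allocated nodes.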