====== The SLURM Scheduler ======
  
Compute jobs on the cluster are managed by a scheduler: SLURM. To run jobs on the cluster you'll need to prepare your jobs, then submit them to the scheduler along with some information about the resources needed to run each job. (Refer to the [[https://slurm.schedmd.com/quickstart.html#commands|SLURM Quickstart guide]] for more information on using and interacting with the SLURM scheduler.)
  
Jobs can be submitted to the scheduler using the ''srun'' command. ''srun'' can be used with arguments to set parameters directly, or with a script that sets parameters and defines the job to be executed. At this time, each node on the Grinnell HPC cluster has two CPUs (sockets) with 10 cores each. To run a job that requires 16 cores, you would request 1 node, 2 sockets, and 8 cores per socket. To run a job that requires 32 cores, you would request 2 nodes, 2 sockets per node, and 8 cores per socket. Alternatively, SLURM also uses the concept of //tasks//, so rather than specifying sockets per node and cores per socket, you can specify tasks per node. For a 32-core job you could request 2 nodes and 16 tasks per node, as in the sketch below.
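
As a sketch (''./my_program'' stands in for your own executable and is not part of this page), the same 32-core request could be written either way:

    # 2 nodes x 2 sockets per node x 8 cores per socket = 32 cores
    srun -N 2 --sockets-per-node=2 --cores-per-socket=8 ./my_program
    # the same 32 cores expressed as 16 tasks on each of 2 nodes
    srun -N 2 --ntasks-per-node=16 ./my_program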
  
These are common parameters passed to SLURM when submitting a job (a combined example follows below):
  * Set Working Directory (''-D'') The remote process will change into this directory before running
  * Nodes (''-N'') The number of nodes to allocate
  * Sockets (or CPUs) per node (''--sockets-per-node'')
  * Cores per socket (''--cores-per-socket'')
  * Tasks per node (''--ntasks-per-node'')
  * Memory required per node (''--mem'')
  * Time limit (''-t NN:NN:NN'') Set a time limit on the runtime; a job that exceeds this limit may be killed
  * Output Path (''-o'') The path for the output file of the job
  * Error Path (''-e'') The path for the error file of the job
  * Name (''-J'') A name for the job
  
A complete list of parameters that ''srun'' accepts is available in the [[https://slurm.schedmd.com/srun.html|manpage]] for ''srun''.
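
As an illustration, several of these options can be combined in one submission (''./my_program'', the job name, and the file paths here are placeholders):

    srun -D ~/project -J MyJob -N 1 --ntasks-per-node=4 --mem=4G -t 00:10:00 -o myjob.out -e myjob.err ./my_program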
  
====== Run an Interactive Job on a Compute Node ======

Running an interactive job on a compute node is the preferred method of testing jobs and scripts. Doing so avoids running resource-intensive workloads on the master (or login) node. To start a job with an interactive shell on a compute node, you can use this command:

    srun -N 1 -n 1 --pty /bin/bash
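
Once the shell starts you are working on a compute node. A quick check and a clean exit might look like this (''hostname'' and ''exit'' are ordinary shell commands, not SLURM-specific):

    $ hostname    # prints the compute node's name rather than the login node's
    $ exit        # ends the interactive job and releases the allocation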
 + 
====== Basic job submission ======

Consider this command. If you enter it on the command line of a terminal, it will print the date, wait 5 seconds, then print the date again:

    $ date; sleep 5; date;
  
To execute this series of commands as a job on the cluster, you can create a very simple script:
    
File ''date.sub'':
  
    #!/bin/bash
    
    date
    sleep 5
    date
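
If you like, you can first check that the script behaves as expected by running it directly in your shell (this runs it locally, without the scheduler):

    $ bash date.sub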

The commands are now in a format that can be submitted to the scheduler and run as a job on the cluster. But we also need to tell the scheduler what resources are needed. This information can be passed to SLURM directly on the command line (to ''srun'' or ''sbatch'') or it can be placed in the script itself.

The job can be submitted to the scheduler by adding flags to the ''sbatch'' command to request one node and one task per node:

    $ sbatch -J SleepJob -N 1 --ntasks-per-node=1 date.sub
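
If the submission succeeds, ''sbatch'' replies with the assigned JOBID; the output looks roughly like this (the number will differ):

    Submitted batch job 12345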
  
To place the resource request in the script itself, we modify the script to include additional ''#SBATCH'' lines for SLURM:

File ''date.sub'':
  
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH --ntasks-per-node=1
    #SBATCH -J SleepJob
    
    date
    sleep 5
    date
  
Then the job is submitted with ''sbatch'':
  
    $ sbatch date.sub
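
If an option appears both in the script and on the ''sbatch'' command line, the command-line value takes precedence, so you can still override the script's settings at submission time, for example (an illustrative variation):

    $ sbatch -J AnotherName date.sub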
    
The ''sbatch'' command will return a number for your job, the JOBID. Now that the job is in the queue, you can check its status using the ''squeue'' command:
  
    $ squeue --job <JOBID>
-or-
    $ squeue -n <Job name if specified upon submission>
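
You can also list all of your own queued and running jobs at once (assuming your shell's ''$USER'' variable holds your cluster username, as it normally does):

    $ squeue -u $USER

When the job finishes it leaves the queue, and its output is written to the directory you submitted from; because the script above does not set ''-o'', SLURM's default output file name of the form ''slurm-<JOBID>.out'' applies.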

====== Next Steps ======
  
A step-by-step tutorial for creating and submitting a job using Open OnDemand is available here.
  
Comprehensive [[https://slurm.schedmd.com/documentation.html|documentation]] for SLURM, including various [[https://slurm.schedmd.com/tutorials.html|tutorials]], is available on the SchedMD website.