Running Applications with Jobs

Because our HPC system is shared among many researchers, Research Computing manages system usage through jobs. A job is an allotment of resources that can be used to execute processes. Research Computing uses a program named the Simple Linux Utility for Resource Management, or Slurm, to create and manage jobs.

In order to run a program on a cluster, you must request resources from Slurm to generate a job. Resources can be requested from a login node or a compile node. You must then provide commands to run your program on those requested resources. Where you provide your commands depends on whether you are running a batch job or an interactive job.

When you submit a batch job or an interactive job, it is placed in a queue and waits there until the requested resources become available.
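As a rough sketch of how you might check on queued work, the snippet below wraps Slurm's squeue command, which lists jobs that are pending or running. The function name show_my_jobs is a hypothetical helper, not a Research Computing tool, and the fallback message only exists so the snippet behaves sensibly on a machine without Slurm installed.

```shell
#!/bin/bash
# show_my_jobs is a hypothetical helper name used for illustration only.
show_my_jobs() {
    # Fall back to `id -un` if USER is not set in this environment.
    local me="${USER:-$(id -un)}"

    if command -v squeue >/dev/null 2>&1; then
        # List only the jobs that belong to the current user.
        squeue --user="$me"
    else
        echo "squeue not found; run this from a login node or compile node"
    fi
}

show_my_jobs
```

A job that is still waiting for resources typically appears with the state PD (pending), and a running job with the state R.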

See also

A detailed guide on the Slurm queue and accounting tools can be found in the Useful Slurm Commands page.

Note

Alpine is a heterogeneous system, meaning compute nodes have different hardware configurations. Nodes with similar capabilities are grouped into partitions, each offering different resources. When submitting a job, you must choose the partition that best supports your job’s needs. For more information, see the Alpine Hardware page.

Batch Jobs

The primary method of running applications on Research Computing resources is through a batch job. A batch job is a job that runs on a compute node with little or no interaction with the user. You should use batch jobs for:

  • Any computationally expensive application that could take hours or days to run

  • Any application that requires little or no user input

  • Applications that you do not need to monitor extensively

Unlike running an application on your personal machine, you do not call the application you wish to run directly. Instead, you create a job script that includes a call to your application. Job scripts are simply a set of resource requests and commands. When a job script is run, all the commands in the job script are executed on a compute node.

Once you have created a job script, submit it to the Slurm queue with the sbatch command followed by the name of your job script:

sbatch <your-jobscript-name>

If no job script name is provided, sbatch reads the job script from standard input instead.
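For illustration, a minimal job script might look like the sketch below. The partition, QoS, and time values are borrowed from the interactive example on this page. The job name, the output file name, and the echo command are placeholders and should be replaced with values and commands that suit your own application.

```shell
#!/bin/bash
# Resource requests: lines beginning with #SBATCH are read by sbatch.
# The values below are assumptions chosen for illustration.
#SBATCH --partition=amilan
#SBATCH --qos=normal
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --job-name=example-job
#SBATCH --output=example-job.%j.out

# Commands: everything below runs on the compute node once the job starts.
# SLURM_JOB_ID is set by Slurm inside a job; the fallback text is used
# only when the script is run outside of Slurm.
job_id="${SLURM_JOB_ID:-not-under-slurm}"
echo "Job ${job_id} running on $(hostname)"
```

If this were saved as example-job.sh (a hypothetical file name), it would be submitted with sbatch example-job.sh. The %j in the output file name is replaced by Slurm with the job ID.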

See also

A detailed guide on constructing and running job scripts can be found in the Batch Jobs and Job Scripting page.

Interactive Jobs

Another method of running applications on Research Computing resources is through an interactive job. As the name implies, an interactive job allows users to interact with requested resources in real time. Users can run applications, execute scripts, or run other commands directly on a compute node. Interactive jobs should be used for:

  • Debugging applications or workflows

  • Any application that requires user input at runtime

  • Any application with a GUI (Graphical User Interface)

You can request an interactive job by using the sinteractive command. As with sbatch, resources must be requested. With sinteractive, this is done on the command line through flags. At a minimum, you will need to include the --partition, --qos, and --time flags. We encourage using the --ntasks and --nodes flags as well; otherwise the job will default to 1 task and 1 node.

sinteractive --partition=amilan --qos=normal --time=00:10:00 --ntasks=4 --nodes=1

The example above will submit an interactive job requesting the amilan partition with 4 cores on one node with the normal quality of service (QoS) for ten minutes. Once the interactive session has started, you will be provided a terminal session on a compute node. Within this session, you can run any interactive terminal application you may need from the command line.
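Once the session starts, one way to confirm what was allocated is to print the environment variables that Slurm normally sets inside a job. The sketch below assumes the interactive session exports these standard Slurm variables; the function name show_allocation is a hypothetical helper used only for this example.

```shell
#!/bin/bash
# show_allocation is a hypothetical helper name used for illustration only.
# Each fallback ("not set") is printed only when the variable is missing,
# for example when this is run outside of a Slurm job.
show_allocation() {
    echo "Job ID:     ${SLURM_JOB_ID:-not set}"
    echo "Tasks:      ${SLURM_NTASKS:-not set}"
    echo "Node count: ${SLURM_JOB_NUM_NODES:-not set}"
    echo "Node list:  ${SLURM_JOB_NODELIST:-not set}"
}

show_allocation
```

For the sinteractive example above, you would expect the task count to match --ntasks and the node count to match --nodes.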

Important

Be careful when setting --ntasks and ensure you also set --nodes. If --nodes is not set, Slurm may spread your job across multiple nodes. Also, be aware that GPU-based interactive jobs must set --nodes=1 and cannot currently run across multiple nodes.

See also

More details on sinteractive parameters can be found in the Slurm Flags, Partitions, and QoS page and in the Interactive Jobs page.