Queuing System Guide - UGE

Overview

In an HPC cluster, users' tasks are executed on compute nodes under the control of a batch queuing system. On Talon 2, we have chosen the Univa Grid Engine (UGE), the successor to the now-discontinued Sun Grid Engine (SGE). The queuing system manages job requests (shell scripts, generally referred to as jobs) submitted by users. In other words, to get your computations done by the cluster, you must submit a job request to a specific batch queue. The scheduler assigns your job to a compute node in the order determined by the policy on that queue and the availability of an idle compute node. Talon 2 currently has several policies in place to help guarantee fair resource utilization.

Queues

Name        Description
serial.q    For running single-processor jobs, sharing compute-node resources with other users.
parallel.q  For running multi-processor jobs, whether on a single host or multiple hosts. Requires the "-l exclusive_job=1" and "-pe" options. See below.
test.q      For running quick test computations for debugging purposes. Jobs are limited to 12 hours and a total of 32 cores.
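
For example, a quick debugging run can be steered to test.q with the directives below. The script body is illustrative; only the queue limits above are enforced by the site:

```shell
#!/bin/bash
#$ -V               # export current environment variables to the job
#$ -cwd             # run from the submission directory
#$ -q test.q        # debugging queue: 12-hour / 32-core limits
# Job body: record where the scheduler placed us.
msg="test run on $(hostname)"
echo "$msg"
```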

UGE Commands

The following table lists frequently used commands:

UGE Command        Description                              LSF Equivalent
qsub script.job    submit a job                             bsub < script.job
qstat              display the status of the user's jobs    bjobs
qstat -g c         display a queue summary status           bqueues
qquota             display current resource quotas
qdel               delete a job                             bkill
qalter             modify a pending job                     bmod
qlogin or qrsh     run an interactive job                   bsub -Is bash
qmake              run a parallel make to compile code
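
A typical submit-and-monitor workflow might look like the following sketch (the script name and job ID are illustrative, and these commands only run on the cluster's login nodes):

```
qsub script.job    # submit; UGE replies "Your job <id> (...) has been submitted"
qstat              # watch the job move from qw (queued) to r (running)
qstat -g c         # check overall queue load before submitting more work
qdel <job_id>      # remove the job if it is no longer needed
```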

Job State

When using qstat, the following job states are possible; state letters may appear in combination. Example state: "qw", meaning the job is queued and waiting.
State Description
r Running state. The job is running on the execution host.
t Job is in a transferring state. The job is sent to the execution host.
d The job is in a deletion state. The job is currently being deleted by the system.
E The job is in an error state.
R The job was restarted.
T The job is in a suspended state because of threshold limitations.
w The job is in a waiting state.
h The job is in a hold state. The hold state prevents scheduling of the job.
S The job is in an automatic suspended state. The suspension was not triggered directly.
s The job is in a manual suspend state. The job suspension was triggered manually.
z The job is in a zombie state.
Entity Description
g global
q queue
h host

The following table lists common UGE variables; for a complete list, see the qsub manpage:

UGE Variable    Description
SGE_O_WORKDIR   current working directory of the submitting client
JOB_ID          unique identifier assigned when the job was submitted
NSLOTS          number of queue slots in use by a parallel job
NHOSTS          number of hosts in use by a parallel job
SGE_TASK_ID     index number of the current array job task
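
The snippet below sketches how these variables are typically used inside a job script. The fallback values are assumptions for illustration only, since UGE sets the real values at dispatch time:

```shell
#!/bin/bash
# Fallbacks let this snippet run outside UGE; on the cluster the
# scheduler presets JOB_ID, NSLOTS, and SGE_TASK_ID for each job.
JOB_ID=${JOB_ID:-12345}
NSLOTS=${NSLOTS:-16}
SGE_TASK_ID=${SGE_TASK_ID:-1}
# Build a per-job scratch path and a per-task input filename from them.
scratch="/storage/scratch2/${USER:-nobody}/$JOB_ID"
input="file.in.$SGE_TASK_ID"
echo "job $JOB_ID: $NSLOTS slots, scratch $scratch, input $input"
```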

Job Submission Tips

At the top of your job script, begin with the special directive prefix #$, which introduces qsub options. Alternatively, these options can be passed to qsub on the command line.

  • #$ -V

    Exports all environment variables at job submission time into the job script.

  • #$ -cwd

    Execute the job from the current working directory.  NOTE: If you do not include this, your job will try to execute from your home directory on the compute node.

  • #$ -q parallel.q

    Defines the queue which may be used to execute this job.

  • #$ -pe openmpi_16 64

    Defines the parallel environment as OpenMPI with 16 cores per host and 64 cores (slots) in total.  NOTE: your job will not start if you request fewer total slots than the openmpi_X per-host slot count. Ex: "-pe openmpi_16 8" will not run.

  • #$ -l exclusive_job=1

    Sets the job to be exclusive, not allowing other jobs to share the compute node.  This is required for all parallel.q submissions.

  • #$ -l mem_free=48G

    Sets the minimum amount of free memory required per host to run your job. For example, requesting more than 32GB, as "48G" does above, will place your job on a 64GB node.

  • #$ -m abe

    Defines under which circumstances mail is to be sent to the job owner: (a) the job is aborted, (b) the job begins, (e) the job ends.

  • #$ -M user@domain.com

    Sends mail to the specified email destination user@domain.com.
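
The per-host versus total slot rule for "-pe openmpi_X N" can be sketched with a small check. This is a hypothetical illustration, not part of UGE, and the numbers are examples:

```shell
#!/bin/bash
# Sanity check mirroring the openmpi_X rule described above:
# the total slot request must be at least the per-host slot count.
per_host=16
total=64
if [ "$total" -lt "$per_host" ]; then
  echo "will not start: $total total slots < $per_host slots per host"
  hosts=0
else
  hosts=$(( total / per_host ))
  echo "ok: $total slots across $hosts host(s) of $per_host cores"
fi
```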

Basic Serial (non-parallel) job script:

#!/bin/bash
#$ -V
#$ -cwd
#$ -q serial.q

## SETUP STORAGE ##
STORAGE_DIR="/storage/scratch2/$USER/$JOB_ID"
mkdir -pv $STORAGE_DIR
cd $STORAGE_DIR
## COPY FILES ##
cp $SGE_CWD_PATH/file.in $STORAGE_DIR
## EXECUTE CODE ##
time executable < file.in > file.out
## COPY OUTPUT BACK ##
cp -a file.out $SGE_CWD_PATH

Basic Parallel (OpenMPI) job script:

#!/bin/bash
#$ -V
#$ -cwd
#$ -q parallel.q
#$ -pe openmpi_16 64
#$ -l exclusive_job=1

## SETUP STORAGE ##
STORAGE_DIR="/storage/scratch2/$USER/$JOB_ID"
mkdir -pv $STORAGE_DIR
cd $STORAGE_DIR
## COPY FILES ##
cp $SGE_CWD_PATH/file.in $STORAGE_DIR
## EXECUTE CODE ##
time mpirun executable < file.in > file.out
## COPY OUTPUT BACK ##
cp -a file.out $SGE_CWD_PATH


Basic Interactive Jobs:

You can submit interactive jobs utilizing the qlogin command. This is ideal for many tasks, including interactive shell sessions and pre/post-processing. In the simplest form, you can request an interactive shell on a node:

$ qlogin

At this point basic job information will print followed by spawning the shell:

Your job 764 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 764 has been successfully scheduled.
Establishing builtin session to host c64-6-7.local ...

Once the job has been dispatched, you can proceed interacting as with any other bash shell prompt. When you are done, type exit to end the interactive session and release the node.

Basic job array script:

#!/bin/bash
#$ -V
#$ -cwd
#$ -q serial.q
#$ -t 1-10

## SETUP STORAGE ##
STORAGE_DIR="/storage/scratch2/$USER/$JOB_ID"
mkdir -pv $STORAGE_DIR
cd $STORAGE_DIR
## COPY FILES ##
cp $SGE_CWD_PATH/file.in.$SGE_TASK_ID $STORAGE_DIR
## EXECUTE CODE ##
time executable < file.in.$SGE_TASK_ID > file.out.$SGE_TASK_ID
## COPY OUTPUT BACK ##
cp -a file.out.$SGE_TASK_ID $SGE_CWD_PATH
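
Each task of an array job sees its own SGE_TASK_ID; the task range is set with qsub's -t option, either as a #$ directive in the script or on the command line, e.g. "qsub -t 1-10 script.job". A local sketch of the resulting per-task file mapping, with the task IDs simulated outside UGE:

```shell
#!/bin/bash
# Simulate three array tasks; on the cluster, UGE sets SGE_TASK_ID
# separately for each task in the range given to -t.
for SGE_TASK_ID in 1 2 3; do
  echo "task $SGE_TASK_ID: file.in.$SGE_TASK_ID -> file.out.$SGE_TASK_ID"
done
```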