Working with the Euler high-performance cluster
As an ETH student you get access to the Euler supercomputing cluster. Inaugurated in 2014 and hosted at CSCS in Lugano, Euler is a computer cluster dedicated for use by researchers and students alike. Courses like High-Performance Computing for Science and Engineering use it to allow students to practice the principles of working with a supercomputer. This page will contain a summary of my notes from this course.
Connecting to Euler
You can only connect to Euler from within the ETHZ network or through a VPN connection. Log in by using the command:
$ ssh username@euler.ethz.ch
You will be prompted for your ETHZ password.
Modules
The Euler environment is organized in modules, which are conceptually software packages that can be loaded and unloaded as needed. The basic commands for working with modules are:
- module load <modulename>: set environment variables related to modulename.
- module unload <modulename>: unset environment variables related to modulename.
- module list: list loaded modules.
- module avail: list all available modules.
Example: module load gcc. Now we can compile C++ programs!
Jobs
High-performance software is not run on the login nodes, but submitted to the computing nodes with a job system. To submit a job use a command like:
$ bsub -n 24 -W 08:00 -o output_file ./program_name program_args
This command submits a job requesting 24 processing cores from a single node and a wall-clock runtime of 8 hours. If the program is still running after 8 hours, it will be terminated. The job report, along with the output that would usually appear in the terminal, is appended to the file output_file in the folder where the job was started.
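If the program picks its own thread count, it should match the number of cores requested with -n rather than the number of cores physically present on the node. Below is a minimal C++ sketch of one way to do that. It assumes the job system exports the allocated slot count in the LSB_DJOB_NUMPROC environment variable, which is an LSF convention and should be checked on the cluster before relying on it. The helper name allocated_cores is made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <limits>

// Hypothetical helper: number of cores granted to the current job.
// Assumption: LSF exports the slot count as LSB_DJOB_NUMPROC.
// Returns `fallback` when the variable is missing or is not a plain
// positive integer that fits into an unsigned.
unsigned allocated_cores(unsigned fallback) {
    assert(fallback > 0);
    const char* value = std::getenv("LSB_DJOB_NUMPROC");
    if (value == nullptr) {
        return fallback;  // e.g. when running outside the job system
    }
    // Reject empty strings, signs, and leading whitespace up front.
    if (!std::isdigit(static_cast<unsigned char>(value[0]))) {
        return fallback;
    }
    char* end = nullptr;
    const unsigned long parsed = std::strtoul(value, &end, 10);
    if (*end != '\0' || parsed == 0 ||
        parsed > std::numeric_limits<unsigned>::max()) {
        return fallback;  // trailing garbage, zero, or out of range
    }
    return static_cast<unsigned>(parsed);
}
```

If the assumption holds, allocated_cores(1) would return 24 inside a job submitted with the command above, and 1 on a machine where the variable is not set.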
While one or more jobs are running you can use the command bjobs to get the state and the IDs of submitted jobs.
In order to terminate a job you can use the command bkill <jobID>.
I/O performance and $SCRATCH
Since your simulations might involve a lot of I/O, you must never run your software in the $HOME directory; set up your runs in the $SCRATCH space instead. The disks associated with this space are designed for heavy loads. However, $SCRATCH is not designed for frequent writes. If you are logging temporary results to a file, open it once at the beginning of the run and flush it only occasionally. Note that std::endl not only appends \n but also flushes the stream.
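The logging advice can be sketched in C++ as follows: write lines with '\n' instead of std::endl and flush explicitly every few steps. The function name write_log and the flush_every parameter are illustrative choices, not something prescribed by the course or by Euler.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for trying this without touching disk

// Writes `steps` log lines to `out` and flushes only every `flush_every`
// lines, plus once at the end. Returns the number of explicit flushes.
std::size_t write_log(std::ostream& out, std::size_t steps,
                      std::size_t flush_every) {
    assert(flush_every > 0);
    std::size_t flushes = 0;
    for (std::size_t step = 1; step <= steps; ++step) {
        out << "step " << step << '\n';  // '\n' ends the line without flushing
        if (step % flush_every == 0) {
            out.flush();  // occasional explicit flush
            ++flushes;
        }
    }
    out.flush();  // make sure the tail of the log reaches the stream's target
    ++flushes;
    return flushes;
}
```

Passing a std::ofstream that was opened once at the start of the run sends the log to disk in a few larger chunks instead of one write per line. The stream may still flush on its own whenever its internal buffer fills up.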
Furthermore, your quota in $HOME is much smaller than in $SCRATCH, but any files on $SCRATCH older than 15 days will be automatically deleted (see $SCRATCH/__USAGE_RULES__).
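One way to keep runs off $HOME is to build every output path from the SCRATCH environment variable inside the program. Below is a minimal C++ sketch. The function name run_directory and its fallback argument are made up for this example and are not part of any Euler tooling.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns "<$SCRATCH>/<run_name>", or
// "<fallback>/<run_name>" when SCRATCH is unset or empty
// (for instance on a laptop).
std::string run_directory(const std::string& run_name,
                          const std::string& fallback) {
    assert(!run_name.empty());
    const char* scratch = std::getenv("SCRATCH");
    const std::string base = (scratch != nullptr && *scratch != '\0')
                                 ? std::string(scratch)
                                 : fallback;
    return base + "/" + run_name;
}
```

The function only builds the path. Creating the directory, and copying results you want to keep out of $SCRATCH before the 15-day limit, is left to the caller.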
Starting an interactive session
To start an interactive bash session on a compute node, use the -Is flags and point to the bash installation, for example:
[bones@eu-login-02-ng ~]$ bsub -n 1 -W 1:00 -Is /bin/bash
Generic job.
Job <75404124> is submitted to queue <normal.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-ms-016-35>>
FILE: /sys/fs/cgroup/cpuset/lsf/euler/job.75404124.19517.1539329284/tasks
[bones@eu-ms-016-35 ~]$
This allows you to run several programs in succession without waiting in the queue each time. However, anything still running in the session will be killed if you log out.
Requesting specific CPUs
Euler has several clusters, and different CPUs are available. For benchmarking, it is usually better to use the same CPU model for every run, so we can specify which processor we would like:
$ bsub -n 24 -R fullnode -R "select[model=XeonE5_2680v3]" -W 00:10 -Is bash
The following processors are available:
- XeonE5_2697v2
- XeonE5_2680v3
- XeonE7_8867v3
- XeonGold_6150
- XeonGold_5118