Working with the Euler high-performance cluster

As an ETH student you get access to the Euler supercomputing cluster. Inaugurated in 2014 and hosted at CSCS in Lugano, Euler is a computer cluster available to researchers and students alike. Courses like High-Performance Computing for Science and Engineering use it to let students practice the principles of working with a supercomputer. This page summarizes my notes from this course.

Connecting to Euler

You can only connect to Euler from within the ETHZ network or through a VPN connection. Log in by using the command:

$ ssh username@euler.ethz.ch

You will be prompted for your ETHZ password.

Modules

The Euler environment is organized in modules, which are conceptually software packages that can be loaded and unloaded as needed. The basic commands for working with modules are:

  • module load <modulename>: set environment variables related to modulename.
  • module unload <modulename>: unset environment variables related to modulename.
  • module list: list loaded modules.
  • module avail: list all available modules.

Example: module load gcc. Now we can compile C++ programs!

Jobs

High-performance software is not run on the login nodes, but submitted to the computing nodes with a job system. To submit a job use a command like:

$ bsub -n 24 -W 08:00 -o output_file ./program_name program_args

This command will submit a job requesting 24 processing cores from a single node and a wall-clock runtime of 8 hours. If the program is still running after 8 hours, it will be terminated. The report of the job, along with the output that would usually appear in the terminal, will be appended to the file output_file, in the folder where the job started.
While one or more jobs are pending or running, you can use the command bjobs to get the state and the IDs of submitted jobs.
In order to terminate a job you can use the command bkill <jobID>.
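
The same options can also be kept in a job script instead of being typed on the command line each time. The following is a minimal sketch, assuming the standard LSF #BSUB directive syntax; the file name job.sh is my own choice, and program_name and program_args are the same placeholders as above.

```shell
# Write a job script. The #BSUB lines carry the same options as the bsub call above.
cat > job.sh <<'EOF'
#!/bin/bash
# Number of cores
#BSUB -n 24
# Wall-clock limit (hh:mm)
#BSUB -W 08:00
# Job report and terminal output are appended to this file
#BSUB -o output_file

./program_name program_args
EOF

# Submit only where bsub exists (i.e. on the cluster).
# Elsewhere this sketch just leaves job.sh behind for inspection.
if command -v bsub >/dev/null 2>&1; then
    bsub < job.sh
fi
```

Note the input redirection: with bsub < job.sh the script is read from standard input, which is how the #BSUB lines get picked up.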

I/O performance and $SCRATCH

Since your simulations might involve a lot of I/O, you must never run your software in the $HOME directory; set up your runs in the $SCRATCH space instead. The disks associated with this space are specifically designed for heavy loads. However, $SCRATCH is not designed for frequent small writes. If you are logging temporary results to a file, you should open it once at the beginning of the run and flush it only occasionally. Note that std::endl not only appends \n but also flushes the stream, so prefer a plain \n inside loops.
Furthermore, your quota in $HOME is much smaller than in $SCRATCH, but any files on $SCRATCH older than 15 days will be automatically deleted (see $SCRATCH/__USAGE_RULES__).
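
A run can be set up in its own directory under $SCRATCH before submitting. This is only a sketch of one possible layout: the run_<timestamp> naming scheme is my own assumption, and the fallback to a temporary directory exists only so the snippet also runs on a machine where $SCRATCH is not set.

```shell
# SCRATCH is set on Euler. The mktemp fallback is an assumption for other machines.
scratch_root="${SCRATCH:-$(mktemp -d)}"

# Hypothetical naming scheme: one directory per run, tagged with the start time.
run_dir="$scratch_root/run_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$run_dir"

# Work from the run directory so that output files land on scratch, not in $HOME.
cd "$run_dir"
echo "Run directory: $run_dir"
```

Since files older than 15 days are deleted, results worth keeping should be copied away from $SCRATCH once the run has finished.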

Starting an interactive session

To start an interactive bash session on a compute node, use the -Is option and give the path to the bash executable, for example:

[bones@eu-login-02-ng ~]$ bsub -n 1 -W 1:00 -Is /bin/bash
Generic job.
Job <75404124> is submitted to queue <normal.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-ms-016-35>>
FILE: /sys/fs/cgroup/cpuset/lsf/euler/job.75404124.19517.1539329284/tasks
[bones@eu-ms-016-35 ~]$

This allows you to run several programs one after another without waiting in the queue each time. However, anything still running in the session will be killed if you log out.

Requesting specific CPUs

Euler has several clusters, and different CPUs are available. For benchmarking, it is usually better to use the same CPU model for every run so that the timings are comparable. To do so, we can specify which processor we would like:

$ bsub -n 24 -R fullnode -R "select[model=XeonE5_2680v3]" -W 00:10 -Is bash

The following processors are available:

  • XeonE5_2697v2
  • XeonE5_2680v3
  • XeonE7_8867v3
  • XeonGold_6150
  • XeonGold_5118
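
To keep repeated benchmark runs comparable, every run can be pinned to the same model in a small loop. This is a dry-run sketch: it only prints the bsub commands, using the resource options from the example above. The output file naming and the three repetitions are my own assumptions, and program_name is a placeholder.

```shell
# CPU model to pin every run to (one of the models listed above).
model="XeonE5_2680v3"

# Three repetitions on the same model. The leading echo makes this a dry run
# that prints each command. Remove the echo to actually submit the jobs.
for run in 1 2 3; do
    echo bsub -n 24 -R fullnode -R "select[model=$model]" \
         -W 00:10 -o "bench_${model}_run${run}.out" ./program_name
done
```

Writing each run to its own output file keeps the job reports apart, so the timings can be compared afterwards.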

Useful links