System Overview

Compute Nodes

There are 8 compute nodes in the Sakura system. Each node has the following specifications:

CPU           Xeon D-1571, 16 cores @ 1.30GHz
Memory        32GB
Accelerators  PEZY-SC2, 700MHz, 64GB
Interconnect  InfiniBand EDR
Storage       NFS /home
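
Once you are logged in, you can check the node configuration that Slurm reports with standard commands (the exact columns depend on the site configuration)::

$ sinfo -N -l            # one line per node, including CPUs and memory
$ scontrol show nodes    # detailed attributes for every node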

File Systems

There is one storage area, mounted at /home, which is shared with other users. /home has 100GB of capacity and there are no quotas, so please use it considerately.
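
It is worth checking your own usage occasionally. A minimal check with standard tools, assuming your home directory lives under /home::

$ du -sh $HOME     # space used by your home directory
$ df -h /home      # remaining space on the shared file system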

Connecting

Connect with SSH

System Name   Hostname                 RSA Key Fingerprint
Sakura        matsu.exascaler.co.jp    SHA256:gywcL7XDCOgXm4UXV4m0hi2Xzo2I4XLUD2CDnqeDPlA

For example, to connect to the Sakura login node from a UNIX-based system, use the following commands::

[your machine] $ ssh-add                               # Add ssh key to auth agent
[your machine] $ ssh -A userid@matsu.exascaler.co.jp   # SSH agent forwarding is required

Then you are on the login node. To connect to the Sakura head node from the Sakura login node, use the following command::

[ssh] $ ssh esfe
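
If your local OpenSSH supports the -J (ProxyJump) option, you can reach the head node in a single command. This is an optional shortcut and assumes the same key is accepted on both hosts::

[your machine] $ ssh -A -J userid@matsu.exascaler.co.jp userid@esfe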

On your first login, you should change your password. You have to use the yppasswd command instead of the passwd command::

[esfe] $ yppasswd

Compiling

..(snip)..

Running Jobs

We use Slurm to schedule user jobs. Please use the --exclusive option to reserve an entire node.

Single Node Job

Sample Job File:

#!/bin/bash
#SBATCH -p debug  # partition name
#SBATCH -N 1      # number of nodes
#SBATCH -n 1      # number of processes
#SBATCH --exclusive

./myprogram

To submit a job, use the sbatch command::

$ sbatch job.sh
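
On success, sbatch prints the ID of the submitted job, and standard output and error are written to slurm-<jobid>.out in the submission directory by default (1234 below is a placeholder ID)::

$ sbatch job.sh
Submitted batch job 1234
$ cat slurm-1234.out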

Partition

There are multiple partitions in the system. To see partition information, use the sinfo command::

$ sinfo

You can choose the partition name in your job script with the -p option.
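
If you are only interested in one partition, sinfo can be restricted to it. The partition name below is the one from the sample job script and may differ on your system::

$ sinfo -p debug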

MPI Job

You can use Open MPI. The location of mpivars.sh is shown below, followed by a short compile example.

  • openmpi:

    $ source /usr/mpi/gcc/openmpi-1.8.4/bin/mpivars.sh
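
After sourcing this file, the Open MPI compiler wrappers (mpicc, mpicxx, mpif90) are on your PATH. A minimal compile example, where myprogram.c is a placeholder source file::

$ mpicc -O2 -o myprogram myprogram.c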
    

MPI with Slurm

Sample Job File:

#!/bin/bash
#SBATCH -p debug
#SBATCH -N 1 # number of nodes
#SBATCH -n 4 # number of processes
#SBATCH --ntasks-per-node=4
#SBATCH --exclusive

source /usr/mpi/gcc/openmpi-1.8.4/bin/mpivars.sh

mpirun ./myprogram
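
A job that spans two nodes only needs the node and task counts adjusted. The values below are illustrative (4 ranks on each of 2 of the 8 nodes):

#!/bin/bash
#SBATCH -p debug
#SBATCH -N 2                # number of nodes
#SBATCH -n 8                # total number of processes
#SBATCH --ntasks-per-node=4 # processes per node
#SBATCH --exclusive

source /usr/mpi/gcc/openmpi-1.8.4/bin/mpivars.sh

mpirun ./myprogram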

Monitoring Jobs

To monitor your job, use the squeue command::

$ squeue
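
By default squeue lists every job in the system. To list only your own jobs::

$ squeue -u $USER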

Canceling Jobs

To cancel your job, use the scancel command::

$ scancel [jobid]
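
scancel also accepts a user filter, which cancels all of your queued and running jobs at once, so use it with care::

$ scancel -u $USER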

More Information about Slurm

See the official Slurm man pages for more information about sbatch, srun, squeue, and scancel.

Debugging

You can start an interactive session with the following command::

$ srun -N 1 --exclusive --pty bash

If you want to use 2 nodes::

$ srun -N 2 --exclusive --pty bash
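
srun accepts the same resource options as sbatch, so you can also pick a partition for the interactive session. The partition name below is the one from the sample job scripts::

$ srun -p debug -N 1 --exclusive --pty bash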