h1. User instructions for Dione cluster

University of Turku
Åbo Akademi
Jussi Salmi (jussi.salmi@utu.fi)

h2. 1. Resources

h3. 1.1. Computation nodes

<pre>
PARTITION NODES NODELIST  MEMORY
normal    36    di[1-36]  192GB
gpu       6     di[37-42] 384GB
</pre>

Dione has 6 GPU nodes on which users can run computations that benefit from very fast, highly parallel number crunching, for example training neural networks. The other 36 nodes are general-purpose compute nodes. Please use the 'normal' or 'gpu' partition when starting jobs so that the GPU resources are kept separate. The nodes are connected by a fast Infiniband network, which enables MPI (Message Passing Interface) jobs in the cluster (see the sketch below). In addition, the cluster is connected to the EGI grid (European Grid Infrastructure) and NORDUGRID, which are allowed to use a part of the computational resources.

The website

https://p55cc.utu.fi/

contains information on the cluster, hosts a cluster monitor, and provides instructions on getting access to and using the cluster.
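
For illustration, a minimal MPI batch job could look like the following sketch. The module name (openmpi) and the executable are placeholders, not something documented for Dione; check 'module avail' for the MPI modules actually installed.

<pre>
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --partition=normal
#SBATCH --nodes=2              # two compute nodes
#SBATCH --ntasks-per-node=4    # four MPI ranks per node
#SBATCH -t 10:00

module purge
module load openmpi            # placeholder module name

srun ./my_mpi_program          # srun launches the MPI ranks on the allocated nodes
</pre>
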
h3. 1.2. Disk space

The system has an NFS4 file system with 100 TB of capacity on the home partition. The file system is not backed up, so users must take care of their own backups.
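
As an illustration, important data can be copied off the cluster with rsync. The hostname and paths below are placeholders; run the command on your own machine.

<pre>
# Copy a results directory from Dione to a local backup directory
rsync -avz <username>@<dione login node>:~/results/ ~/dione-backup/results/
</pre>
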
h3. 1.3. Software

The system uses the SLURM (Simple Linux Utility for Resource Management) workload manager for scheduling jobs.

The cluster uses the module system for loading software; different versions of a package are available as separate modules (see section 3).

h2. 2. Executing jobs in the cluster

Users may not execute jobs on the login node. All jobs must be dispatched to the cluster using SLURM commands. Normally a script is used to define the job and its parameters for SLURM. A large number of parameters and environment variables can be used to control how jobs are executed; see the SLURM manual for a complete list.

A typical job script (here named batch-submit.job) can look as follows:

<pre>
#!/bin/bash
#SBATCH --job-name=test
#SBATCH -o result.txt
#SBATCH --workdir=<Workdir path>
#SBATCH -c 1
#SBATCH -t 10:00
#SBATCH --mem=10M
#SBATCH --partition=normal

module purge # Purge modules for a clean start
module load <desired modules if needed> # You can either inherit the module environment or load modules here

srun <executable>
srun sleep 60
</pre>

The script is run with

sbatch batch-submit.job

The script defines several parameters that will be used for the job.

<pre>
--job-name          defines the job name
-o result.txt       redirects the standard output to result.txt
--workdir           defines the working directory
-c 1                sets the number of CPUs per task to 1
-t 10:00            sets the time limit of the task to 10 minutes; after that the job is stopped
--mem=10M           sets the memory required for the task to 10 MB
--partition=normal  use the 'normal' partition. Please use the 'normal' or 'gpu' partition to keep the GPU resources separate ('all' uses all partitions).
</pre>

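
A GPU job is submitted to the 'gpu' partition. The sketch below is illustrative only: whether GPUs on Dione must be requested explicitly with --gres, and which CUDA module is available, is not documented here and should be checked locally.

<pre>
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1      # assumption: GPUs are allocated via GRES
#SBATCH -t 10:00
#SBATCH --mem=4G

module purge
module load cuda          # placeholder module name; check 'module avail'

srun ./my_gpu_program     # placeholder executable
</pre>
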
srun starts a task. When a task is started, SLURM gives it a job ID, which can be used to track its execution with e.g. the squeue command.

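
For example, a submitted job can be tracked and, if necessary, cancelled like this (the job ID 12345 is only illustrative):

<pre>
$ sbatch batch-submit.job
Submitted batch job 12345
$ squeue -j 12345      # show the status of this job
$ scancel 12345        # cancel the job if needed
</pre>
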
h2. 3. The module system

Many of the software packages on Dione require you to load the corresponding environment module before using the software. Different versions of a package can be selected with the module command.

<pre>
module avail               Show available modules

module list                Show loaded modules

module unload <module>     Unload a module

module load <module>       Load a module

module load <module>/10.0  Load version 10.0 of <module>

module purge               Unload all modules
</pre>

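
For example, a session could look like the following sketch; the module names and version numbers are illustrative, not a list of what is installed on Dione.

<pre>
$ module avail           # see what is installed
$ module load gcc        # load the default version of a module
$ module load cuda/10.0  # or pin a specific version
$ module list            # verify what is loaded
</pre>
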
h2. 4. Useful commands in SLURM

<pre>
sinfo                       Show the current status of the cluster
sinfo -p gpu                Show the status of the GPU partition
sinfo -O all                Show a comprehensive status report, node by node

sstat <job id>              Show information on your job

squeue                      Show the status of the job queue
squeue -u <username>        Show only your jobs

srun <command>              Dispatch a job to the scheduler
sbatch <script>             Run a script defining the jobs to be run

scontrol                    Control your jobs in many aspects
scontrol show job <job id>  Show details about a job
scontrol -u <username>      Show only a certain user's jobs

scancel <job id>            Cancel a job
scancel -u <username>       Cancel all your jobs
</pre>

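
As a short illustrative workflow combining the commands above (the script name is a placeholder):

<pre>
$ sinfo -p gpu                        # check the state of the GPU nodes
$ sbatch --partition=gpu my_job.job   # submit a job to the gpu partition
$ squeue -u $USER                     # follow your own jobs
$ scontrol show job <job id>          # inspect a single job in detail
</pre>
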
h2. 5. Further information

Further information can be requested from the administrators (fgi-admins@lists.utu.fi).