MSc Teaching Cluster

The teaching cluster will be receiving additional nodes in early 2018 and will switch over to an alternative scheduler, Slurm. The idea is to reserve the original nodes for project work and use the new nodes for teaching and undergraduate project work. Unfortunately this is all very dynamic at the moment; watch this space for more information.

The MSc Teaching cluster (also known as the MLP cluster) is made up of nodes which are used to support MSc teaching courses. Currently the cluster consists of two groups of machines: one supporting GPU-based parallel processing courses and one supporting Extreme Computing. The head nodes are accessible as mlp, mlp1 and mlp2.

Home directories

Please note that, because of the way the cluster starts jobs on the compute nodes, jobs do not have Kerberos credentials (and hence no AFS tokens), so they are unable to access files in the /afs filestore. We have provided home directories on a local distributed filesystem to give users space to manage their data. You can cp files across from AFS; you will find your AFS home directory under /afs/inf.ed.ac.uk/users/...
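
For example, working on a head node (where your AFS token is valid) you might copy a dataset across into your cluster home directory. This is only a sketch: the elided part of the AFS path and the "mydata" directory name are placeholders for your own.

# Run this on a head node, where your AFS token is valid; batch jobs cannot read /afs.
# Replace the elided part of the path with your own AFS home directory,
# and "mydata" with the directory you actually want to copy.
cp -r /afs/inf.ed.ac.uk/users/.../mydata ~/mydata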

Quick Start

You can see what partitions are available using sinfo:

[escience5]iainr: sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
Interactive    up    2:00:00      2   idle landonia[01,25]
Standard*      up    8:00:00      2    mix landonia[04,11]
Standard*      up    8:00:00     10   idle landonia[13-17,20-24]
Short          up    4:00:00      1    mix landonia18
Short          up    4:00:00      1   idle landonia02
LongJobs       up 3-08:00:00      1  drain landonia10
LongJobs       up 3-08:00:00      5    mix landonia[03-04,11,18-19]

Note that you will have to specify the partition you wish to use and, if you want to use GPUs, how many GPUs you want. By default your jobs will run in the starred (default) partition and you will get one GPU.
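
Both srun and sbatch accept --partition and --gres options. The following is a sketch using partition names from the sinfo output above; the GPU counts are illustrative and myjob.sh is a placeholder for your own submission script.

# interactive session with one GPU on the Interactive partition
srun --partition=Interactive --gres=gpu:1 --pty bash

# batch job with two GPUs on the LongJobs partition
# (myjob.sh is a placeholder for your own script)
sbatch --partition=LongJobs --gres=gpu:2 myjob.sh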

For Interactive jobs

[escience6]iainr: srun --pty bash
[charles17]iainr: nvidia-smi
Thu Jun 14 08:45:12 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   25C    P8     9W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[charles17]iainr: exit
[escience6]iainr: 

For batch jobs (assuming test2.sh is a shell script):


[escience5]iainr: ls test2.sh
test2.sh
[escience5]iainr: cat test2.sh
#!/bin/sh

/bin/hostname
/usr/bin/who
/usr/bin/nvidia-smi

Submit the job using sbatch, requesting 2 GPUs as requestable resources. In this case we run squeue on the same line, immediately after the sbatch command, because otherwise the job would have been scheduled and run before we could type the squeue command.

[escience5]iainr: sbatch  --gres=gpu:2 test2.sh ; squeue
Submitted batch job 127096
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            126488  Standard   run.sh s1302760 PD       0:00      1 (PartitionTimeLimit)
            127096  Standard test2.sh    iainr PD       0:00      1 (None)
            126716  LongJobs   run.sh s1302760  R 1-19:24:23      1 landonia04
            126745  LongJobs   run.sh s1302760  R 1-18:36:28      1 landonia03
            126751  LongJobs   run.sh s1302760  R 1-18:14:14      1 landonia03
            126769  LongJobs   run.sh s1302760  R 1-17:44:40      1 landonia18
            126770  LongJobs   run.sh s1302760  R 1-17:44:08      1 landonia18
            126851  LongJobs run_ted_ s1723861  R 1-12:23:27      1 landonia04
            126852  LongJobs run_ted_ s1723861  R 1-12:21:41      1 landonia04
            126998  LongJobs CNN_BALD s1718004  R   16:03:31      1 landonia11
            127000  LongJobs CNN_BALD s1718004  R   16:02:59      1 landonia03
            127002  LongJobs CNN_BALD s1718004  R   16:02:53      1 landonia04
            127026  LongJobs CNN_Kcen s1718004  R   11:59:02      1 landonia18
            127027  LongJobs CNN_Kcen s1718004  R   11:58:55      1 landonia18
            127028  LongJobs CNN_Kcen s1718004  R   11:58:51      1 landonia18
            127030  LongJobs run-epoc s1739461  R   11:46:35      1 landonia11
            127039  LongJobs run-epoc s1739461  R   11:16:33      1 landonia03
            127054  LongJobs run-epoc s1739461  R   10:46:32      1 landonia03
            127087  LongJobs run-epoc s1739461  R    8:56:28      1 landonia18
            127088  LongJobs run-epoc s1739461  R    8:46:27      1 landonia19
            127089  LongJobs run-epoc s1739461  R    8:36:26      1 landonia03
            127090  LongJobs run-epoc s1739461  R    8:26:26      1 landonia19
            127091  LongJobs run-epoc s1739461  R    8:16:25      1 landonia19
            127092  LongJobs run-epoc s1739461  R    8:06:25      1 landonia03
            127093  LongJobs run-epoc s1739461  R    8:06:25      1 landonia19
            127094  LongJobs run-epoc s1739461  R    7:46:24      1 landonia19
[escience5]iainr: ls *.out
slurm-127096.out
[escience5]iainr: cat slurm-127096.out
landonia05.inf.ed.ac.uk
Fri Jun 15 09:38:10 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:04:00.0 Off |                  N/A |
| 24%   33C    P0    26W / 120W |      0MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:09:00.0 Off |                  N/A |
| 24%   33C    P0    28W / 120W |      0MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[escience5]iainr: 
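
The same resource requests can also be embedded in the script itself as #SBATCH directives, so the script can be submitted with a plain sbatch command. The following is a sketch; the partition, GPU count, time limit and output file name are illustrative.

#!/bin/sh
#SBATCH --partition=Standard      # partition to run in
#SBATCH --gres=gpu:2              # number of GPUs to request
#SBATCH --time=02:00:00           # wall-clock limit; must fit within the partition's limit
#SBATCH --output=slurm-%j.out     # output file; %j expands to the job ID

/bin/hostname
/usr/bin/nvidia-smi

Submitting such a script with sbatch and no extra options will use the values from the directives; options given on the command line override those in the script.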

Limits/Quotas

There is currently no disk quota on the shared home directory; however, this is a fixed, finite resource and we would encourage users to use it responsibly and to clear away data when you are finished with it. Please note that whilst this filesystem has some built-in redundancy to allow disaster recovery, it is NOT backed up. DO NOT USE THIS FILESYSTEM AS THE ONLY STORAGE FOR IMPORTANT DATA. Disk space will be reclaimed once your access is finished; it is your responsibility to ensure that copies are made of anything you wish to keep.
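
For example, from a head node you might periodically copy results back to your AFS home directory with rsync. This is a sketch; the "results" directory and the elided part of the AFS destination path are placeholders for your own.

# Run on a head node so that your AFS token is valid.
# "results" and the elided AFS path are placeholders.
rsync -av ~/results/ /afs/inf.ed.ac.uk/users/.../results/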

Extreme Computing

The scutter nodes are currently reserved for students on the Extreme Computing course; please see the course notes for details of how to use them.

Access

You should automatically be given access to the clusters if you are on the appropriate course. For project access, please have your supervisor submit an RT ticket giving details of the project.

Last reviewed: 13/02/2017
