MSc Teaching Cluster

The MSc Teaching cluster is made up of nodes which are used to support MSc teaching courses. Currently the cluster consists of two groups of machines: one supporting GPU-based parallel processing courses and one supporting extreme computing.

Home directories

Please note that because of the way the cluster starts jobs on the compute nodes, the jobs do not have Kerberos credentials, and hence have no AFS tokens, so they are unable to access files in the /afs filestore. We have provided home directories on a local distributed filesystem to give users space to manage their data. You can copy files across from AFS with cp; you will find your AFS home directory under /afs/inf.ed.ac.uk/users/...
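As a rough sketch of the copy step, the helper below copies a directory into a destination, creating the destination if needed. The function name copy_in and the variable AFS_HOME in the usage comment are hypothetical; set AFS_HOME yourself to your own directory under /afs/inf.ed.ac.uk/users/. It assumes you run it in a session where your AFS token is valid, not inside a job on a compute node.

```shell
#!/bin/sh
# copy_in SRC DEST: recursively copy SRC into DEST, creating DEST if needed.
# (copy_in is an illustrative helper, not a cluster-provided command.)
copy_in() {
    src=$1
    dest=$2
    if [ ! -e "$src" ]; then
        echo "copy_in: $src does not exist" >&2
        return 1
    fi
    mkdir -p "$dest" && cp -R "$src" "$dest/"
}

# Example with hypothetical paths (not run here):
#   copy_in "$AFS_HOME/coursework" "$HOME/data"
```

After a successful call the copied directory appears inside the destination, for example $HOME/data/coursework in the usage comment above.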

CPU

We have a number of nodes which are available for CPU processing. Please submit jobs via the cpu queue using

qsub -q cpu 
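A minimal sketch of a complete CPU submission might look like the following. The script name cpu_job.sh and its contents are illustrative assumptions; only the -q cpu option comes from the text above, and -cwd is the option described under Quick Start. The submission is guarded so that nothing is submitted on a machine without qsub.

```shell
#!/bin/sh
# Hypothetical job script name; choose your own.
JOB=cpu_job.sh

# Write a trivial job script that reports which node it ran on.
cat > "$JOB" <<'EOF'
#!/bin/sh
/bin/hostname
EOF
chmod +x "$JOB"

# Submit to the cpu queue only if qsub is available on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub -q cpu -cwd "$JOB"
else
    echo "qsub not found; run this on a cluster head node" >&2
fi
```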

GPGPU

We currently have 5 nodes, each providing 3 GeForce GTX Titan X GPUs. Scheduling is provided by Grid Engine; in our case we are using the Son of Grid Engine open source fork. This is a batch scheduler which takes jobs into a queue and allocates them to nodes as the nodes become free. Interactive sessions can also be scheduled. See this link for a basic tutorial.

The head nodes are aliased to msccluster and msccluster1; the compute nodes themselves are letha01 to letha05.

Quick Start

For interactive jobs

[porthemmet]iainr: qlogin -q gpuinteractive
Your job 55 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 55 has been successfully scheduled.
Establishing builtin session to host letha01.inf.ed.ac.uk ...
[letha01]iainr: nvidia-smi
Tue Jun 14 17:57:07 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 361.45     Driver Version: 361.45.11      |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:04:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[letha01]iainr: 

For batch jobs (assuming test.sh is a shell script)

[porthemmet]iainr: ls 
test.sh
[porthemmet]iainr: cat test.sh
#!/bin/sh

/bin/hostname
/usr/bin/who
/usr/bin/nvidia-smi

Submit the job using qsub. The output will go to <script>.o(jobnumber) in ~, or in the current working directory if you use -cwd.

[porthemmet]iainr: qsub -cwd test.sh 
Your job 65 ("test.sh") has been submitted

Check on the status of the queue with qstat (qstat -u \* for all users)

[porthemmet]iainr: qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
     65 0.55500 test.sh    iainr        r     06/14/2016 18:11:31 gpgpu@letha02.inf.ed.ac.uk         1    

The results should be in the current working directory.

[porthemmet]iainr: ls
test.sh  test.sh.e65  test.sh.o65
[porthemmet]iainr: cat test.sh.e65
[porthemmet]iainr: cat test.sh.o65
letha02.inf.ed.ac.uk
Tue Jun 14 18:11:31 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 361.45     Driver Version: 361.45.11      |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   30C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:04:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   29C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[porthemmet]iainr: 
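As a small sketch of how the two result files might be checked from a script, the helper below succeeds only when the job's error file exists and is empty, as test.sh.e65 is in the transcript above. The function name job_clean is hypothetical; the file naming follows the <script>.e(jobnumber) and <script>.o(jobnumber) pattern shown in the listing.

```shell
#!/bin/sh
# job_clean SCRIPT JOBID: succeed if SCRIPT.eJOBID exists and is empty.
# (job_clean is an illustrative helper, not part of Grid Engine.)
job_clean() {
    err="$1.e$2"
    [ -f "$err" ] && [ ! -s "$err" ]
}

# Example (not run here): print the output only if nothing went to stderr.
#   job_clean test.sh 65 && cat test.sh.o65
```

An empty error file does not prove the job did what you wanted; it only shows that nothing was written to standard error.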

Limits/Quotas

There are currently no restrictions on the number of jobs people can submit. However, there is a soft (default) runtime limit of 6 hours and a hard (requestable) limit of 12 hours.
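The page does not say how the longer limit is requested. In Grid Engine, a job's hard runtime is normally requested with the h_rt resource via qsub -l, so a sketch under that assumption could look like this. The helper name hrt is hypothetical, and whether this cluster enforces its limits through h_rt is an assumption; test.sh is the example script from the Quick Start section. The command line is only printed, not executed.

```shell
#!/bin/sh
# hrt HOURS: format a whole number of hours as HH:MM:SS for -l h_rt.
# (hrt is an illustrative helper, not part of Grid Engine.)
hrt() {
    printf '%02d:00:00\n' "$1"
}

# Build the qsub command line for a job needing up to 12 hours.
# Assumption: the cluster's runtime limits map onto the h_rt resource.
CMD="qsub -cwd -l h_rt=$(hrt 12) test.sh"
echo "$CMD"
```

The same request can also be embedded in the job script itself as a directive line beginning with #$, for example #$ -l h_rt=12:00:00.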

There is currently no disk quota on the shared home directory. However, it is a fixed, finite resource, and we would encourage users to use it responsibly and to clear away data when they are finished with it. Please note that whilst this filesystem has some built-in redundancy to allow disaster recovery, it is NOT backed up. DO NOT USE THIS FILESYSTEM AS THE ONLY STORAGE FOR IMPORTANT DATA. Disk space will be reclaimed once your access is finished; it is your responsibility to ensure that copies are made of anything you wish to keep.

Extreme Computing

under construction

Access

You should automatically be given access to the clusters if you are on the appropriate course. For project access, please have your supervisor submit an RT ticket giving details of the project.

Last reviewed: 
13/02/2017
