
James and Charles cluster


The James and Charles cluster is available for use by PPAR and DS CDT students.

Admin mailing list

To receive administrative emails about the cluster, please sign up to the cdtcluster mailing list. This is a low-traffic list containing information about downtime and other notices.

Nodes

The cluster is made up of the following nodes:

GPU Nodes
charles01 to charles15. Nodes 1 and 2 have one NVIDIA Tesla K40m and one GeForce GTX TITAN X installed, nodes 3 to 10 have two NVIDIA Tesla K40ms installed, and nodes 13 and 14 currently have a single GeForce GTX TITAN X installed. All nodes have two 16-core Xeons.
Multiprocessor nodes
james01 to james21. Each node has four 16-core Opterons.
Big memory nodes
We also have two nodes, anna and mary, which are similar to the james nodes but have 1TB of memory.

Software

The cluster is running the SL7 version of DICE. Since this is still being developed, some software may be missing; if there is anything you would like added, please submit an RT ticket.

Scheduling

The clusters are switching over to Grid Engine as their scheduler. This will be staged: the charles machines will be switched first, and other nodes will then be added opportunistically as they become free.

Note that the nodes will be reinstalled as part of this process and that /disk/scratch is cleared between installs, so you will need to keep copies elsewhere of any files you want to keep.

In our case we are using Son of Grid Engine, an open-source fork. This is a batch scheduler which takes jobs into a queue and allocates them to nodes as the nodes become free. Interactive sessions can also be scheduled. See this link for a basic tutorial.

The head nodes are aliased to cdtcluster and cdtcluster1.

Quick Start

For interactive jobs:

[porthemmet]iainr: qlogin -q gpuinteractive
Your job 55 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 55 has been successfully scheduled.
Establishing builtin session to host letha01.inf.ed.ac.uk ...
[letha01]iainr: nvidia-smi
Tue Jun 14 17:57:07 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 361.45     Driver Version: 361.45.11      |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:04:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[letha01]iainr: 

For batch jobs (assuming test.sh is a shell script):

[porthemmet]iainr: ls 
test.sh
[porthemmet]iainr: cat test.sh
#!/bin/sh

/bin/hostname
/usr/bin/who
/usr/bin/nvidia-smi

Submit the job using qsub. Its standard output goes to <script>.o<jobnumber> and its standard error to <script>.e<jobnumber>, either in your home directory (~) or, if you use -cwd, in the current working directory:

[porthemmet]iainr: qsub -q gpgpu -cwd test.sh 
Your job 65 ("test.sh") has been submitted
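Grid Engine also reads qsub options embedded in the job script itself, on lines beginning with `#$`, which saves retyping them for every submission. The sketch below is test.sh rewritten that way, assuming the gpgpu queue used above. The `-S /bin/sh` line (which asks Grid Engine to run the script with sh rather than the queue's default shell) and the check for nvidia-smi are additions for illustration, not part of the original example.

```shell
#!/bin/sh
# Lines starting with "#$" are ordinary comments to sh, but qsub reads
# them as command-line options when the script is submitted.
#$ -q gpgpu
#$ -cwd
#$ -S /bin/sh

/bin/hostname
/usr/bin/who

# Assumption: guard the GPU query so the script still completes on a node
# (or test machine) without the NVIDIA tools installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found on $(/bin/hostname)"
fi
```

Submitted this way, `qsub test.sh` should behave like `qsub -q gpgpu -cwd test.sh`; an option given on the qsub command line takes precedence over the same option embedded in the script.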

Check the status of the queue with qstat (use qstat -u \* to see jobs from all users):

[porthemmet]iainr: qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
     65 0.55500 test.sh    iainr        r     06/14/2016 18:11:31 gpgpu@letha02.inf.ed.ac.uk         1    
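When scripting around the scheduler it can help to pull a single job's state out of qstat. The sketch below assumes the default plain-text layout shown above (two header lines, then the job ID in the first column and the state in the fifth); `job_state` is a hypothetical helper name, not a Grid Engine command. A running job shows `r`, and a job still waiting in the queue shows `qw`.

```shell
#!/bin/sh
# job_state JOB_ID: read plain "qstat" output on stdin and print the state
# column for that job. Assumes the default layout: two header lines, job ID
# in column 1, state in column 5. Returns non-zero if the job is not listed.
job_state() {
    awk -v id="$1" 'NR > 2 && $1 == id { print $5; found = 1 }
                    END { exit !found }'
}

# Example: parse the sample qstat output shown on this page.
sample='job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
     65 0.55500 test.sh    iainr        r     06/14/2016 18:11:31 gpgpu@letha02.inf.ed.ac.uk         1'

printf '%s\n' "$sample" | job_state 65    # prints: r
```

On the cluster it would be used as `qstat | job_state 65`. A finished job no longer appears in qstat, so the helper then prints nothing and returns a non-zero status.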

Once the job has finished, the results should be in the current working directory:

[porthemmet]iainr: ls
test.sh  test.sh.e65  test.sh.o65
[porthemmet]iainr: cat test.sh.e65
[porthemmet]iainr: cat test.sh.o65
letha02.inf.ed.ac.uk
Tue Jun 14 18:11:31 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 361.45     Driver Version: 361.45.11      |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
| 22%   30C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:04:00.0     Off |                  N/A |
| 22%   32C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:84:00.0     Off |                  N/A |
| 22%   29C    P8    15W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[porthemmet]iainr: 
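Because the output files follow the <script>.o<jobnumber> and <script>.e<jobnumber> pattern, checking a finished job is easy to script. A sketch, with `show_job_output` as a hypothetical helper name: it prints the job's standard output file and warns on stderr if the error file is not empty (in the example above, test.sh.e65 is empty).

```shell
#!/bin/sh
# show_job_output SCRIPT JOB_ID [DIR]: print SCRIPT.oJOB_ID from DIR and warn
# if SCRIPT.eJOB_ID is non-empty. DIR defaults to the current directory,
# which is where the files land for jobs submitted with -cwd.
show_job_output() {
    script=$(basename "$1")   # the default job name is the script's file name
    dir=${3:-.}
    out="$dir/$script.o$2"
    err="$dir/$script.e$2"

    if [ ! -f "$out" ]; then
        echo "no output file $out (job not started yet, or wrong directory?)" >&2
        return 1
    fi
    if [ -s "$err" ]; then
        echo "warning: $err is not empty:" >&2
        cat "$err" >&2
    fi
    cat "$out"
}
```

For the example above, `show_job_output test.sh 65` run in the submission directory would print the same contents as `cat test.sh.o65`; for a job submitted without -cwd, pass "$HOME" as the third argument.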

Last reviewed: 13/09/2017
