Hadoop cluster
The DICE Hadoop cluster is for Extreme Computing students, who get access automatically when needed. When Extreme Computing isn't running, researchers can use the cluster for tests - just ask for access using the Computing Support form.
Using the cluster
The cluster's head node or "namenode" is scutter01. To log in to it, run ssh scutter01. Hadoop commands, such as those for accessing your HDFS file space or scheduling jobs with YARN, can be run from it. Please do not run jobs on scutter01!
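As a rough illustration of a first session on the head node, the sketch below wraps a few standard Hadoop 2.x commands in a shell function. The function name hadoop_first_look and the HDFS directory demo_input are made up for this example, and it assumes you have already logged in with ssh scutter01 and that hdfs and yarn are on your PATH there.

```shell
# Sketch only: define this after logging in with `ssh scutter01`.
# hadoop_first_look and demo_input are hypothetical names for this example.
hadoop_first_look() {
    local_file=$1

    # List your HDFS home directory (no path means /user/<your username>).
    hdfs dfs -ls || return 1

    # Create a directory in HDFS, relative to your HDFS home directory.
    hdfs dfs -mkdir -p demo_input || return 1

    # Copy a local file into that HDFS directory.
    hdfs dfs -put "$local_file" demo_input/ || return 1

    # Show the applications that YARN currently knows about.
    yarn application -list
}
```

You might then call it as hadoop_first_look notes.txt, where notes.txt is any small local file. These commands only talk to HDFS and YARN; they do not run a job on scutter01 itself.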
Hardware and software
The cluster uses Hadoop 2.9.2.
Each of the twelve nodes is a Dell PowerEdge R430 with 96GB of memory, 6TB of SATA disk and two 10-core Xeon E5-2650 v3 2.3GHz CPUs.
All nodes are on the same network switches.
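To get a feel for the overall size of the cluster, the per-node figures above can be multiplied out. This is back-of-the-envelope arithmetic only: it counts all twelve nodes as equal, ignores operating system and Hadoop overheads, and assumes HDFS's default replication factor of 3, which may not match how this cluster is configured.

```shell
# Raw totals from the per-node hardware figures (sketch, not measured values).
nodes=12
mem_per_node_gb=96
disk_per_node_tb=6
cpus_per_node=2
cores_per_cpu=10
replication=3   # assumption: HDFS default replication factor

total_cores=$((nodes * cpus_per_node * cores_per_cpu))
total_mem_gb=$((nodes * mem_per_node_gb))
raw_disk_tb=$((nodes * disk_per_node_tb))
# Each block is stored `replication` times, so usable space is roughly raw/replication.
approx_hdfs_tb=$((raw_disk_tb / replication))

echo "CPU cores:           $total_cores"      # 240
echo "Memory (GB):         $total_mem_gb"     # 1152
echo "Raw disk (TB):       $raw_disk_tb"      # 72
echo "Approx. HDFS (TB):   $approx_hdfs_tb"   # 24
```

The resources actually available to YARN and HDFS will be lower than these raw totals.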
Test cluster
From time to time there's also a small Hadoop test cluster. It's useful for testing software and configuration changes without disrupting users of the main cluster. The test cluster uses virtual machines.
Learning Hadoop
The Hadoop documentation includes a Single Node tutorial which lets you have a play with your own temporary one-node Hadoop cluster. After that, try the Map/Reduce Tutorial on the DICE Hadoop cluster.
Questions?
If you have a question about the Hadoop cluster, or a problem with it, please ask Computing Support.
Further reading
For information about Hadoop see: