Hadoop cluster

The DICE Hadoop cluster is for Extreme Computing students, who get access automatically when needed. When Extreme Computing isn't running, researchers can use the cluster for tests - just ask for access using the Computing Support form.

Using the cluster

The cluster's head node or "namenode" is scutter01. To log in to it, ssh scutter01. Hadoop commands - such as those for accessing your HDFS file space or scheduling jobs with YARN - can be run from it. Please do not run jobs on scutter01 itself!
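
For example, a typical session might look something like this (a minimal sketch - the file and directory names are made up, and the exact behaviour depends on how Hadoop 2.9.2 is set up on the cluster):

  ssh scutter01
  # Once logged in, HDFS commands are available, e.g.:
  hdfs dfs -mkdir -p books                  # create a directory in your HDFS space
  hdfs dfs -put war-and-peace.txt books/    # copy a local file into HDFS
  hdfs dfs -ls books                        # list what's there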

Hardware and software

The cluster uses Hadoop 2.9.2.
Each of the twelve nodes is a Dell PowerEdge R430 with 96GB of memory, 6TB of SATA disk and two 10-core Xeon E5-2650 v3 2.3GHz CPUs.
All nodes are connected to the same network switches.

Test cluster

From time to time there's also a small Hadoop test cluster. It's useful for testing software and configuration changes without disrupting users of the main cluster. The test cluster uses virtual machines.

Learning Hadoop

The Hadoop documentation includes a Single Node tutorial which lets you have a play with your own temporary one-node Hadoop cluster. After that, try the Map/Reduce Tutorial on the DICE Hadoop cluster.
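
Once you have some data in HDFS, a quick way to check that YARN job submission works is to run the stock WordCount example that ships with Hadoop (the examples jar path below is an assumption - check where Hadoop 2.9.2 is installed on the cluster):

  # Run WordCount over the 'books' directory, writing results to 'output'
  yarn jar /path/to/hadoop-mapreduce-examples-2.9.2.jar wordcount books output
  # Inspect the first few lines of the result
  hdfs dfs -cat output/part-r-00000 | head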

Questions?

If you have a question or a problem to do with the Hadoop cluster please ask Computing Support.

Further reading

For information about Hadoop see the Apache Hadoop website, https://hadoop.apache.org/.

Last reviewed: 25/01/2019
