Networking in the cluster can sometimes be confusing because some aspects are configured differently from the rest of the Informatics network. This page explains the configuration and how it affects access to the cluster, including access over the VPNs.
The principal differences are:
- cluster compute nodes are not accessible by SSH;
- cluster nodes use RFC1918 addressing.
Inbound connections
Head nodes for clusters are accessible by SSH from within the Informatics network, and you can also access other servers that you run on the head nodes, such as Jupyter or wandb.
Compute nodes cannot be connected to by SSH: you must submit jobs using the Slurm system. Within your jobs you can run servers and make connections between nodes (with MPI, for example).
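For example, a shell or a batch job on a compute node is obtained through Slurm rather than SSH (the script name below is a placeholder):

```
sbatch myjob.sh    # submit a batch job script (placeholder name)
srun --pty bash    # or request an interactive shell on a compute node
```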
Most compute nodes use RFC1918 addresses and so cannot be connected to from outside the Informatics routing realm. When using the Informatics OpenVPN, your computer can make connections to compute nodes if you have selected a configuration that routes their addresses. To avoid conflicts with your local network, most of the configurations filter out Informatics' RFC1918 networks, but you can add these back into a local copy of the config file (a sketch follows below). Alternatively, you can use the AllNets configuration. This is a good choice when you're on site and using the eduroam wifi: all your network access will essentially be 'inside Informatics'. On your home network, however, AllNets will prevent you from accessing local resources such as printers, AirPlay, Chromecast and so on.
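As an illustrative sketch only: OpenVPN client configs accept `route` lines, so a local copy of the config can pull the cluster's RFC1918 ranges back in. The ranges below are the generic RFC1918 blocks, not necessarily the exact ranges Informatics routes, so check the Informatics VPN documentation for the real values.

```
# Example additions to a local copy of the .ovpn config file.
# These are generic RFC1918 blocks, shown only as placeholders -
# substitute the ranges that Informatics actually routes.
route 10.0.0.0 255.0.0.0
route 172.16.0.0 255.240.0.0
```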
| Destination | Inf network | Inf OpenVPN | EdLAN and wifi | Uni VPN | Internet |
|---|---|---|---|---|---|
| remote.ssh gateways | yes | yes | yes | yes | yes |
| other ssh gateways | yes | yes | yes | yes | no |
| xrdp/login | yes | yes | yes | yes | no |
| most DICE & SM servers | yes | yes | no | no | no |
| compute servers | yes | yes | no | no | no |
| cluster head nodes | yes | yes | no | no | no |
| cluster compute nodes (*1) | yes | yes (*2) | no | no | no |
| DICE w/ RFC1918 v4 addr | yes | yes (*2) | no | no | no |
(*1) No SSH login access - all access is through Slurm
(*2) RFC1918 address ranges are not included in the InfNets OpenVPN config by default
Jupyter servers
If you're running a Jupyter server as part of a compute job, review the table above to see where you can access the server from. If none of those networks is available to you, you can use an SSH port forward on your connection to the cluster head node: for example, -L 8888:crannog01.inf.ed.ac.uk:8888.
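As a sketch of the full command (the head node name and username are placeholders; crannog01 is just the example compute node from above):

```
# Forward local port 8888 to a Jupyter server on a compute node,
# tunnelled via a cluster head node (host names are placeholders).
ssh -L 8888:crannog01.inf.ed.ac.uk:8888 username@headnode.inf.ed.ac.uk
```

With the tunnel up, the server is available at http://localhost:8888 on your own machine.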
Outbound connections
In general, outbound connections are allowed: examples are downloading a model from HuggingFace or code from GitHub. However, because most cluster nodes have RFC1918 IPv4 addresses, you'll find that outbound connections to some services fail. This is because some popular services do not publish IPv6 addresses, and we do not operate NAT or proxies that would allow connections from RFC1918 addresses to be routed beyond the Informatics network.
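If you want to check whether a particular service is reachable from a node, one rough approach (the host name below is just an example) is to look for published IPv6 addresses and try a connection:

```
getent ahostsv6 huggingface.co    # does the service publish IPv6 addresses?
curl -sI https://huggingface.co   # will fail or time out if there is no usable route
```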
HuggingFace
By default the HuggingFace Python libraries, such as transformers, will download and cache models as needed. If the libraries try to perform a download on a compute node, it will often fail because of the network configuration discussed above. HuggingFace is flexible, however, and can use models that have been cached or explicitly downloaded to the shared cluster filesystem (see the sketch after the documentation links below).
HuggingFace has documentation on its caching system, offline mode, and how to download models:
- Installation: Cache Setup
- Installation: Offline mode
- Guide: Download files from the Hub
- Guide: Manage huggingface_hub cache-system
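As a rough sketch of that workflow, assuming an illustrative cache path and model name (neither is an Informatics-specific setting), first download on a head node or another machine with outbound access:

```python
# Preparation step, run where outbound connections work (e.g. a head node).
import os

# Point the HuggingFace cache at the shared filesystem so the compute job
# will see the same files. The path is a placeholder.
os.environ["HF_HOME"] = "/home/you/hf-cache"

from huggingface_hub import snapshot_download

snapshot_download(repo_id="bert-base-uncased")  # illustrative model
```

Then, inside the compute job, point at the same cache and tell the libraries not to attempt any downloads:

```python
# Compute job: reuse the shared cache and forbid Hub network access.
import os

os.environ["HF_HOME"] = "/home/you/hf-cache"
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
```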
You may find that some HuggingFace host names have IPv6 addresses: these may work for downloads on cluster nodes that have global unicast IPv6 addresses. However, it's generally best not to download data during your compute jobs, but to do this as part of your preparation before running them.