Informatics-managed services
Friday 9th May 2025 - FORUM POWER LOSS
This morning an electricity failure affected the Informatics Forum.
- Power came back at 09:20.
- Services have largely returned to normal.
- Self-managed servers were powered down to prevent damage from overheating, as our sensors indicated a rapid rise in temperature in their server rooms. Cooling has since been restored to these rooms, and the managers of these machines have been given the all clear to power them back on.
- Many DICE GPU servers were also powered down temporarily.
- Appleton Tower was not affected.
mlp/ilcc-cluster shared filesystem (Lustre) - updated 2025-01-21 16:40
We are investigating a problem with one of the Lustre OSTs (object storage targets) serving the filesystem. The affected OST (OST0003) has been deactivated, so any attempt to access affected files will now return an I/O or permissions error.
If you can recreate the affected files, please do so. Affected files can still be removed or renamed with rm or mv.
If you need the affected data, please get in touch: we have a snapshot of the filesystem taken on 2025-01-07.
The snapshot data can be accessed, as it appeared on that date, from directories under /lustre/telemach/recovery/te250107/. If you need expedited access to this snapshot data, please submit an RT ticket.
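One way to find out which of your files are affected is to try reading each one and note which fail with the I/O or permissions errors mentioned above. The Python sketch below does this for a directory tree. `find_affected_files` is a hypothetical helper, not a tool provided on the cluster, and it assumes affected files fail with EIO, EACCES or EPERM when read. It reads every file in full, so files striped across several OSTs are checked too, but this can be slow on large trees.

```python
import errno
import os
import sys
from typing import Callable, List

# Assumption: affected files fail with one of these error codes when read.
AFFECTED_ERRNOS = {errno.EIO, errno.EACCES, errno.EPERM}


def find_affected_files(root: str, opener: Callable = open) -> List[str]:
    """Hypothetical helper: return sorted paths under `root` that cannot be
    read because of an I/O or permissions error."""
    affected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with opener(path, "rb") as f:
                    # Read the whole file in 1 MiB chunks so every stripe is touched.
                    while f.read(1 << 20):
                        pass
            except OSError as exc:
                if exc.errno in AFFECTED_ERRNOS:
                    affected.append(path)
                # Other errors (e.g. a file deleted mid-scan) are skipped.
    return sorted(affected)


if __name__ == "__main__":
    # Check each directory given on the command line; does nothing without arguments.
    for top in sys.argv[1:]:
        for path in find_affected_files(top):
            print(path)
```

For example, `python3 find_affected.py /path/to/your/dir` (the script name and path are illustrative) prints one affected path per line. The `opener` parameter exists only so the check can be exercised without a broken filesystem.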
Downtime for Informatics services is announced here or on the sys-announce list. It may also be mentioned on the Computing Systems blog or on Mastodon at mastodon.social/@infalerts.
University-wide services
The status of University-wide computing services can be tracked at alerts.is.ed.ac.uk/.