Informatics-managed services
Friday 9th May 2025 - FORUM POWER LOSS
This morning an electricity failure affected the Informatics Forum.
- Power came back at 09:20.
- As of 10:05 many services have been restored, but we may not be entirely back to normal for a while yet.
- Self-managed servers were powered down to prevent damage from overheating, as our sensors indicated a rapid rise in temperature in their server rooms. Cooling has since been restored to these rooms.
- Many DICE GPU servers were also powered down.
- Appleton Tower was not affected.
mlp/ilcc-cluster shared filesystem (Lustre) - updated 2025-01-21 16:40
A problem with one of the Lustre OSTs serving the filesystem is being investigated. The affected OST (OST0003) has been deactivated, so you will now see I/O or permissions errors if you try to access affected files.
If affected files can be recreated, you are free to do so; the affected originals can be removed or moved aside with rm/mv.
If you need access to affected data, please get in touch as we have a snapshot of the filesystem from 2025-01-07.
You can access snapshot data as it appears in directories under /lustre/telemach/recovery/te250107/. If you need expedited access to this snapshot data, please submit an RT ticket.
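To work out which of your own files are affected, one option is to try reading each file and note the ones that fail. The following is a minimal sketch, not an official tool: the function name find_unreadable and its full_read parameter are illustrative, and it assumes that affected files raise an I/O or permissions error (an OSError in Python) when opened or read, as described above.

```python
import os


def find_unreadable(root, full_read=False, chunk_size=1 << 20):
    """Return a sorted list of files under root that fail to open or read.

    By default only the first byte of each file is read, which is quick
    but may miss a file whose later data sits on the deactivated OST.
    Pass full_read=True to read every byte (slow on large trees).
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if full_read:
                        # Read to the end of the file in fixed-size chunks.
                        while fh.read(chunk_size):
                            pass
                    else:
                        fh.read(1)
            except OSError:
                # Covers I/O errors, permission errors and missing targets.
                bad.append(path)
    return sorted(bad)
```

Calling find_unreadable on one of your project directories returns the paths that could not be read. Treat the result as a starting point only: a file can also be unreadable for reasons unrelated to OST0003, such as ordinary permissions or a dangling symlink.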
Downtime for Informatics services is announced here or on the sys-announce list. It may also get a mention on the Computing Systems blog or on Mastodon at mastodon.social/@infalerts.
University-wide services
The status of University-wide computing services can be tracked at alerts.is.ed.ac.uk/.