Other services

Informatics-managed services

Monday 8 April - MLP cluster storage

The MLP cluster has an ongoing storage issue affecting its Lustre filesystem. The problem first appeared intermittently on Friday 22 March and became serious at some point over the weekend of 23-24 March. The Lustre filesystem is made up of four ZFS storage pools, one of which has become corrupted. The affected pool has been isolated, which means that files stored on it are currently unavailable.

Users should be able to log in and run jobs on the cluster. If you are unable to log in, please report this using the Support Form so that we can restore access with a clean home directory. Where we have had to do this, we are working to restore access to the files that are still accessible as quickly as possible.

We have moved all the affected files out of the way, and will restore unaffected files into a directory called ~/<homedir>.recovered. We are working through the affected files as quickly as possible. If you need expedited access to files, please submit an RT ticket.

We have lost access to some of the data stored in the filesystem. Efforts to recover this data are ongoing. Recovery may not be possible, and if it is, it is likely to be a long and involved process. Attempts to access files in the inaccessible store will cause the shell to lock up. If this happens during conda updates, we advise moving your conda directory to one side and reinstalling your conda environment.
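As an illustration of "moving your conda directory to one side", the following is a minimal Python sketch that renames a directory to a timestamped sibling so that a fresh install can take its place. The function name move_aside, the ".aside-" suffix and the example path ~/miniconda3 are assumptions made for illustration; your conda directory may live elsewhere. The sketch does not guard against the rename itself blocking if the directory sits on the inaccessible store.

```python
import os
import time
from pathlib import Path


def move_aside(directory: Path) -> Path:
    """Rename `directory` to a timestamped sibling and return the new path.

    Hypothetical helper: the name and the ".aside-" suffix are illustrative.
    """
    directory = directory.expanduser()
    if not directory.is_dir():
        raise FileNotFoundError(f"{directory} is not a directory")
    # A timestamp keeps repeated runs from colliding with an earlier copy.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = directory.with_name(f"{directory.name}.aside-{stamp}")
    # A rename within the same filesystem does not copy any file contents.
    os.rename(directory, target)
    return target
```

For example, move_aside(Path("~/miniconda3")) would leave the old installation next to its original location under a new name, after which conda can be reinstalled from scratch.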

Downtime for Informatics services is announced here or on the sys-announce list. It may also get a mention on the Computing Systems blog or on Mastodon at mastodon.social/@infalerts.


University-wide services

The status of University-wide computing services can be tracked at alerts.is.ed.ac.uk/.

Last update: 24-Apr-2024 12:35 pm
