
Data storage options

This page is a guide to where to store research data.

Factors to think about

When looking for the most suitable storage, there are several issues to consider. They are usually related - for example faster storage will be more expensive than slow storage, and redundant storage will cost more than a single disk. The old joke that "your storage can be fast, reliable or cheap; pick two out of the three" applies, but with many more than three factors to consider:

Cost

How much do you want to pay? 1TB disks can be bought for under £50, but top-end storage may cost many hundreds of pounds per terabyte.

Availability

Do you only need to access your data on a single host? A small number of hosts? Do you want to be able to access your data from your machine at home as well as the School? From anywhere in the world? Do other School members need to access the data? Other University users? Collaborators from outside the University?

Security

How safe from prying eyes does your data have to be? Should only you or a small number of collaborators have access to the data, or should it be accessible by anyone? Does it need to be encrypted on the disk? If it is transferred to your host via the network, does it have to be encrypted on the wire?

Performance

How are you going to use your data? Are you going to be doing a lot of reading and writing of the data or will there be very little movement of data on the disk? Will you be using parallel processing? You may need a file system optimised for this sort of work.

Reliability

Can your data be recreated easily, so that it doesn't matter if it is lost when a disk fails? Or should it be resilient enough to survive foreseeable hardware problems?

Longevity

Is your data just scratch work which can be deleted as soon as you are done with it? Or does it need to be available for many years to come?

Legal Matters

Many funding bodies have strict conditions about how, where and for how long data should be stored. Does your storage choice meet these conditions?

What's Available

These options are available to Informatics users:

Local Disk
Every DICE desktop machine has some spare disk space left over after the operating system has been installed. Normally this is on the order of 100GB, but the exact amount will depend on the age of the machine. If the machine has been assigned to you, this spare space can be configured for your use. There are two ways in which it can be used: as scratch space, meant for data with a strictly limited lifespan and therefore not mirrored or backed up, or as a local data partition which can be mirrored or backed up to tape if required. Local disk space is free and is fairly fast since there is no network overhead, but it is only available on the hosting machine.
AFS file space
AFS is the default Informatics distributed file system. We use AFS for home directories and for most group space. Its advantages are that it is free (within reason: if you want more than 0.2TB we'll charge you, though this depends on how much you want and why), secure and highly accessible from most operating systems anywhere in the world. On the downside, you need to be authenticated to access your data, which can be an issue with long-running jobs (though there are workarounds), and performance can be slow compared to other options, particularly if you are doing a lot of writing and re-reading. AFS-based space can be mirrored (1 day retention period) and/or backed up onto tape (13 month retention period) nightly, but these options will incur extra cost.
NFS3 File Space
This is what the School's distributed file system used to look like. We had good reasons for moving to AFS, but NFS may have an advantage for a few niche use cases. It is far less secure than AFS, but this does mean that your application doesn't have to be authenticated to access the file system. It may also be slightly faster than AFS in some situations. On the debit side, it is only accessible within the Informatics network perimeter. For most users, it's hard to see why you might prefer NFS to AFS, but in a few special cases it might make sense. Like AFS space, it can be mirrored and backed up to tape at extra cost.
Self Managed Storage
If you need a lot of storage (multiple terabytes) but you don't want to, or can't afford to, pay the School or University large amounts of money for it, you have a problem. There is another way: buy storage and attach it to your DICE or self-managed workstation. Options range from simple USB enclosures attached to a single machine to NAS boxes attached to the School network. The advantage of this course of action is that the cost per terabyte of your storage will be less than with the fully supported options. On the other hand, the storage may well be less resilient than the fully supported options, and you will continue to be responsible for all costs for this storage (replacements for failed disks, maintenance contracts and so on). You should not just buy a piece of kit from a website and expect it to be attached to the School's infrastructure: the School's computing staff should be consulted and their agreement obtained before making any purchases.
Centrally Provided Storage
The University's Information Services provides two separate SAN-based storage networks: the Infrastructure SAN and the ECDF SAN, also known as RDM DataStore. Research groups looking for storage are encouraged to use DataStore, so that's what we'll talk about here. Each researcher in the University is automatically allocated 0.5TB of DataStore storage free of charge for research data. Half of this can be allocated to a research group pool at the researcher's discretion. Additional storage on DataStore can be purchased at a yearly cost. Storage on DataStore is mirrored and backed up to tape every night at no extra cost; the retention periods are 14 and 60 days respectively.

The access mechanisms for DataStore are not optimal for providing automatic access from all Informatics workstations - though access can be arranged from individual DICE workstations and self-managed machines. Instructions for connecting from DICE can be found below.

In addition, ECDF also offers a high performance file system for users of the EDDIE cluster. The same remarks about access from DICE apply.

Note that as mentioned above, and unlike School provided storage where the cost is a one-off, IS managed storage is charged on a yearly basis. The costs are listed on the Research Services Charges page, below.
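The difference between one-off and yearly charging can be made concrete with a little arithmetic. The following is a rough Python sketch only, using the indicative prices from the Summary table on this page (which are under review and may change). It assumes that charges apply only to space above the free allowance, which this page does not state explicitly, and the function names are made up for illustration.

```python
def afs_one_off_cost(tb, price_per_tb=500.0, free_tb=0.2):
    """One-off charge in pounds for `tb` terabytes of AFS space.

    price_per_tb is 500 (plain), 1000 (mirrored) or 1500 (mirrored and
    backed up to tape) in the Summary table. Charging only for the excess
    over the free allowance is an assumption.
    """
    return max(0.0, tb - free_tb) * price_per_tb


def datastore_cost(tb, years, price_per_tb_year=175.0, free_tb=0.5):
    """Total charge in pounds for `tb` terabytes of DataStore over `years`.

    Charging only for the excess over the free 0.5TB is an assumption.
    """
    return max(0.0, tb - free_tb) * price_per_tb_year * years


def first_year_datastore_costs_more(tb, afs_price_per_tb, max_years=50):
    """Return the first year in which the cumulative DataStore charge
    exceeds the one-off AFS charge, or None if that never happens
    within max_years."""
    one_off = afs_one_off_cost(tb, afs_price_per_tb)
    for year in range(1, max_years + 1):
        if datastore_cost(tb, year) > one_off:
            return year
    return None


if __name__ == "__main__":
    size_tb = 2.0  # example size, chosen arbitrarily
    for label, price in [("plain", 500.0), ("mirrored + tape", 1500.0)]:
        year = first_year_datastore_costs_more(size_tb, price)
        print(f"AFS ({label}): DataStore becomes dearer in year {year}")
```

Since DataStore includes mirroring and tape backup at no extra cost, the fairest comparison is probably against the mirrored and backed up AFS price rather than the plain one.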

For more details of central data storage options, including DataStore, DataSync, Data Safe Haven, DataVault and more, see our page on Central data storage.

The Cloud
... by which is meant any form of storage located outwith the University of Edinburgh - for example Dropbox, Google Drive and iCloud. At first these seem attractive since they are usually easy to use, inexpensive and offer an easy way of sharing data with collaborators outside the University. However there are pitfalls, especially the issue of security. You are potentially giving access to your data to whoever is running the cloud service. If the data is in any way confidential, this may be very bad. Some cloud storage providers have a reputation for prioritising usability over security. Another issue is that some funders require that project data not go outside certain geographical areas. With cloud storage you will probably be unable to say with any certainty exactly which country houses your data. Finally, the rate at which data can be transferred to or from the cloud is highly dependent on the quality of one's connection to the internet. If you are thinking of using a cloud storage service, be sure to read its terms and conditions carefully.

Our policies and guidelines page has guidance which may be relevant here, including laws affecting our computing provision and advice on data security.
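To put the point about cloud transfer rates in perspective, it can help to estimate how long a dataset would take to move over a given connection. This is a minimal Python sketch. The link speeds are hypothetical examples rather than measurements of any particular service, and the efficiency parameter is an assumed allowance for protocol overhead and contention.

```python
def transfer_time_hours(size_tb, link_mbit_per_s, efficiency=1.0):
    """Estimate the hours needed to move `size_tb` terabytes over a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits).
    `efficiency` is the assumed fraction of the nominal link speed that
    is actually achieved (1.0 means the full nominal rate).
    """
    if link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical link speeds in Mbit/s, for illustration only.
    for speed in (20, 100, 1000):
        hours = transfer_time_hours(1.0, speed)
        print(f"1 TB at {speed} Mbit/s: about {hours:.1f} hours")
```

Even at a sustained 100 Mbit/s, a terabyte takes the best part of a day to move, so large datasets may be impractical to keep in cloud storage if they have to be fetched regularly.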

Summary

Storage type | Cost | Availability | Security | Performance | Reliability | Longevity | Legal issues
Local disk | Free | Local machine | Low | Good | Data loss on single disk failure | For life of machine | None
AFS | Under review but currently <200GB free, thereafter £500/TB, £1000/TB (mirrored) or £1500/TB (mirrored and backed up to tape) (all one-off charges) | Worldwide | High | Low | RAID | No fixed end of life | None
NFS | Under review but currently <200GB free, thereafter £500/TB, £1000/TB (mirrored) or £1500/TB (mirrored and backed up to tape) (all one-off charges) | Within School | Low | Better than AFS | RAID | No fixed end of life | None
Self-managed storage | Depends on kit bought but probably lowest of cost options | Within School | Depends on kit bought | Depends on kit bought | Depends on kit bought | For as long as you can keep it running | None
Central storage | £175/TB/year (DataStore), £400/TB/year (EDDIE) | Within University | Low | Depends on type of storage and access method | RAID | For as long as you keep paying the recurring cost | None
Cloud storage | iCloud £3.50/TB/month; Google Drive £4.00/TB/month; Dropbox £2.75/person/TB/month | Worldwide | Questionable | Low | Depends on service but probably robust | For as long as you keep paying the monthly cost | Research funders etc. may prescribe location of data
Last reviewed: 02/05/2023
