Data storage options

Anyone seeking a suitable home for data can be faced with a bewildering array of choices. Local storage, network attached storage, storage area networks and (the most nebulous option) the cloud all compete for attention. This page provides some guidance.

Factors to think about

When identifying the most suitable type of storage, there are a number of issues to consider. They are usually related: for example, faster storage will generally be more expensive than slow storage, and redundant storage will cost more than a single disk. The old joke that "your storage can be fast, reliable or cheap; pick two out of the three" applies, but with many more than three factors to consider:

Cost

How much do you want to pay? 1TB disks can be bought for under £50, but top-end storage may cost many hundreds of pounds per terabyte.

Availability

Do you only need to access your data on a single host? A small number of hosts? Do you want to be able to access your data from your machine at home as well as from within the School? From anywhere in the world? Do other School members need to access the data? Other University users? Collaborators from outside the University?

Security

How safe from prying eyes does your data have to be? Should only you or a small number of collaborators have access to the data, or should it be accessible by anyone? Does it need to be encrypted on the disk? If it is transferred to your host via the network, does it have to be encrypted on the wire?

Performance

How are you going to use your data? Are you going to be doing a lot of reading and writing of the data or will there be very little movement of data on the disk? Will you be using parallel processing? You may need a file system optimised for this sort of work.
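If you are unsure how a candidate location performs for write-heavy work, a rough measurement can help. The following is a minimal sketch, not a proper benchmark: it times one sequential write of a temporary file and reports megabytes per second. The function name and default sizes are illustrative assumptions, and a single sequential write says nothing about read speed, small-file workloads or parallel access.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, chunk_mb=1):
    """Write about size_mb of data to a temporary file in `directory`.

    Returns the observed sequential write rate in MB/s. The name and
    defaults are illustrative, not part of any School tooling.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # incompressible test data
    n_chunks = max(size_mb // chunk_mb, 1)
    # The temporary file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(n_chunks):
            f.write(chunk)
        f.flush()
        # Ask the OS to push the data to the storage itself, so that
        # in-memory caching does not flatter the result.
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Uses the system temporary directory; substitute the directory you
    # are actually considering (local scratch, group space and so on).
    target = tempfile.gettempdir()
    print(f"{target}: {write_throughput_mb_s(target, size_mb=8):.1f} MB/s")
```

Run it a few times against each location you are considering and compare the figures with each other rather than treating any single number as definitive.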

Reliability

Can your data be recreated easily, so that it doesn't matter if it is lost when a disk fails? Or should it be resilient enough to survive foreseeable hardware problems?

Longevity

Is your data just scratch work which can be deleted as soon as you are done with it? Or does it need to be available for many years to come?

Legal Matters

Many funding bodies now have strict conditions about how, where and for how long data should be stored. Does your storage choice meet these conditions?

What's Available

These options are available to Informatics users:

Local Disk

Every DICE desktop machine has some spare disk space left over after the operating system has been installed. Normally this is on the order of 100GB, but the exact amount will depend on the age of the machine. If the machine has been assigned to you, this spare space can be configured for your use. There are two ways in which it can be used: as scratch space, meant for data with a strictly limited lifespan and therefore not mirrored or backed up, or as a local data partition which can be mirrored or backed up to tape if required. Local disk space is free and is fairly fast since there is no network overhead, but it is only available on the hosting machine.

AFS file space

AFS is the default Informatics distributed file system. We use AFS for home directories and for most group space. Its advantages are that it is free within reason (if you want more than 0.2TB we will charge you, though the charge depends on how much you want and why), secure, and highly accessible from most operating systems anywhere in the world. There is even a client for iOS. On the down side, you need to be authenticated to access your data, which can be an issue with long-running jobs (though there are workarounds), and performance can be slow compared to other options, particularly if you are doing a lot of writing and re-reading.

NFS3 File Space

This is what the School's distributed file system used to look like. We had good reasons for moving to AFS, but NFS may have an advantage for a few niche use cases. It is far less secure than AFS, but this does mean that your application doesn't have to be authenticated to access the file system. It may also be slightly faster than AFS in some situations. On the debit side, it is only accessible within the Informatics network perimeter. For most users, it's hard to see why you might prefer NFS to AFS, but in a few special cases it might make sense.

Self Managed Storage

If you need a lot of storage (multiple terabytes) but don't want to, or can't afford to, pay the School or University large amounts of money for it, there is another way: buy storage and attach it to your DICE or self-managed workstation. Options range from simple USB enclosures attached to a single machine, through NAS boxes attached to the School network, right up to fibre-attached disk arrays connected to the School SAN. The advantage of this course is that the cost per terabyte of your storage will be less than with the fully supported options. On the other hand, the storage may well be less resilient than the fully supported options, and you will remain responsible for all of its costs (replacements for failed disks, maintenance contracts and so on). You should not just buy a piece of kit from a website and expect it to be attached to the School's infrastructure: the School's computing staff should be consulted and their agreement obtained before making any purchases.

Centrally Provided Storage

Somewhat confusingly, Information Services provides two separate SAN-based storage networks: the Infrastructure SAN and the ECDF SAN, also known as RDM DataStore. Research groups looking for storage are encouraged to use DataStore, so that's what we'll talk about here. Each researcher in the University is automatically allocated 0.5TB of DataStore storage free of charge for research data, half of which can be allocated to a research group pool at the researcher's discretion. Additional storage on DataStore can be purchased at a yearly cost. Unfortunately, the access mechanisms for DataStore are not well suited to providing automatic access from all Informatics workstations, though access can be arranged from individual DICE workstations and self-managed machines. Instructions for connecting from DICE can be found here and for self-managed machines here.

ECDF also offers a high-performance file system for users of the EDDIE cluster. The same remarks about access from DICE apply.

Note that, as mentioned above, IS-managed storage is charged on a yearly basis, unlike School-provided storage, where the cost is a one-off. Details of the costs for ECDF storage can be found on this page.
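The difference between a one-off and a yearly charge is easiest to see with a little arithmetic. The sketch below uses the figures from the Summary table on this page (200GB free then £500/TB one-off for School AFS or NFS space; £175/TB/year for DataStore, with the 0.5TB free allocation). It assumes, purely for illustration, that charges scale linearly with the space above the free allowance and that 1TB is 1000GB; actual charges depend on how much you want and why, so treat the function names and results as a rough guide only.

```python
# Figures from the Summary table on this page (last reviewed 2019).
SCHOOL_FREE_GB = 200           # AFS/NFS: first 200GB free
SCHOOL_ONE_OFF_PER_TB = 500    # pounds per TB, paid once
DATASTORE_FREE_GB = 500        # 0.5TB free per researcher
DATASTORE_PER_TB_YEAR = 175    # pounds per TB, paid every year
GB_PER_TB = 1000               # assumption: decimal terabytes


def school_one_off_cost(gb):
    """Illustrative one-off cost in pounds for School AFS/NFS space."""
    chargeable_gb = max(gb - SCHOOL_FREE_GB, 0)
    return chargeable_gb * SCHOOL_ONE_OFF_PER_TB / GB_PER_TB


def datastore_cost(gb, years):
    """Illustrative total cost in pounds for DataStore over `years`."""
    chargeable_gb = max(gb - DATASTORE_FREE_GB, 0)
    return chargeable_gb * DATASTORE_PER_TB_YEAR * years / GB_PER_TB


if __name__ == "__main__":
    wanted_gb = 2000  # example requirement: 2TB
    print(f"School, one-off: £{school_one_off_cost(wanted_gb):.2f}")
    for years in (1, 3, 5):
        total = datastore_cost(wanted_gb, years)
        print(f"DataStore, {years} year(s): £{total:.2f}")
```

Under these assumptions, 2TB costs £900 once on School storage and £262.50 per year on DataStore, so DataStore becomes the more expensive option during the fourth year.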

For more details of central data storage options, see here.

The Cloud

... by which is meant any form of storage located outwith the University of Edinburgh, for example Dropbox, Google Drive and iCloud. At first these seem attractive since they are usually easy to use, inexpensive and offer an easy way of sharing data with collaborators outside the University. However, there are pitfalls, especially the issue of security. You are potentially giving access to your data to whoever is running the cloud service. If the data is in any way confidential, this may be a serious problem. Some cloud storage providers have a reputation for prioritising usability over security. Another issue is that some funders require that project data not go outside certain geographical areas. With cloud storage you will probably be unable to say with any certainty exactly which country houses your data. Finally, the rate at which data can be transferred to or from the cloud is highly dependent on the quality of your connection to the internet. If you are thinking of using a cloud storage service, be sure to read its terms and conditions carefully.

Our policies and guidelines page has guidance which may be relevant here including laws affecting our computing provision and advice on data security.

Summary

| Storage type | Cost | Availability | Security | Performance | Reliability | Longevity | Legal Issues |
|---|---|---|---|---|---|---|---|
| Local disk | Free | Local machine | Low | Good | Data loss on single disk failure | For life of machine | None |
| AFS | <200GB free, thereafter £500/TB (one-off charge) | Worldwide | High | Low | RAID | No fixed end of life | None |
| NFS | <200GB free, thereafter £500/TB (one-off charge) | Within School | Low | Better than AFS | RAID | No fixed end of life | None |
| Self-managed storage | Depends on kit bought but probably lowest of cost options | Within School | Depends on kit bought | Depends on kit bought | Depends on kit bought | For as long as you can keep it running | None |
| Central storage | £175/TB/year (DataStore), £400/TB/year (EDDIE) | Within University | Low | Depends on type of storage and access method | RAID | For as long as you keep paying the recurring cost | None |
| Cloud storage | iCloud £15/TB/month; Google Drive £5.20 ($10)/TB/month; Dropbox £7.99/TB/month | Worldwide | Questionable | Low | Depends on service but probably robust | For as long as you keep paying the monthly cost | Research funders etc. may prescribe location of data |

Last reviewed: 19/02/2019
