Data storage options

Anyone seeking a suitable home for data can be faced with a bewildering array of choices. Local storage, network attached storage, storage area networks and (the most nebulous option) the cloud all compete for attention. This page provides some guidance.

Factors to think about

When identifying the most suitable type of storage, there are a number of issues to consider. They are usually related: for example, faster storage will inevitably be more expensive than slow storage, and redundant storage will cost more than a single disk. The old joke that "your storage can be fast, reliable or cheap; pick two out of the three" applies, but with many more than three factors to consider:

Cost

How much do you want to pay? 1TB disks can be bought for under £50, but top-end storage may cost many hundreds of pounds per terabyte.

Availability

Do you only need to access your data on a single host? A small number of hosts? Do you want to be able to access your data from your machine at home as well as the School? From anywhere in the world? Do other School members need to access the data? Other University users? Collaborators from outside the University?

Security

How safe from prying eyes does your data have to be? Should only you or a small number of collaborators have access to the data, or should it be accessible by anyone? Does it need to be encrypted on the disk? If it is transferred to your host via the network, does it have to be encrypted on the wire?

Performance

How are you going to use your data? Are you going to be doing a lot of reading and writing of the data or will there be very little movement of data on the disk? Will you be using parallel processing? You may need a file system optimised for this sort of work.

Reliability

Can your data be recreated easily, so that it doesn't matter if it is lost when a disk fails? Or should it be resilient enough to survive foreseeable hardware problems?

Longevity

Is your data just scratch work which can be deleted as soon as you are done with it? Or does it need to be available for many years to come?

Legal Matters

Many funding bodies now have strict conditions about how, where and for how long data should be stored. Does your storage choice meet these conditions?

What's Available

These options are available to Informatics users:

Local Disk

Every DICE desktop machine has some spare disk space left over after the operating system has been installed. Normally this is on the order of 100GB, but the exact amount will depend on the age of the machine. If the machine has been assigned to you, this spare space can be configured for your use. There are two ways in which it can be used: as scratch space, meant for data with a strictly limited lifespan and therefore not mirrored or backed up, or as a local data partition which can be mirrored or backed up to tape if required. Local disk space is free and is fairly fast since there is no network overhead, but it is only available on the hosting machine.
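As a rough illustration of checking how much room a local partition actually has, the sketch below uses Python's standard library. The function name `free_space_gb` and the path `"."` are placeholders chosen for this example; they are not DICE conventions, and the real scratch or data partition path would need to be substituted.

```python
import shutil


def free_space_gb(path: str) -> float:
    """Return the free space, in GB (10**9 bytes), on the file system holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 10**9


if __name__ == "__main__":
    # "." is a placeholder: replace it with the partition you want to inspect.
    print(f"{free_space_gb('.'):.1f} GB free")
```

The figure reported is for whichever file system holds the given path, so pointing it at a home directory on AFS will not describe the local disk.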

AFS file space

AFS is the default Informatics distributed file system. We use AFS for home directories and for most group space. Its advantages are that it is free (within reason: if you want more than 0.2TB we'll charge you, though that depends on how much you want and why), secure and highly accessible from most operating systems anywhere in the world. There is even a client for iOS. On the down side, you need to be authenticated to access your data, which can be an issue with long-running jobs (though there are workarounds), and performance can be slow compared to other options, particularly if you are doing a lot of writing and re-reading.
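If you want a feel for how a write-then-re-read workload behaves on a particular file system, a crude timing can help. The following is a minimal sketch, not a proper benchmark: the function name `time_write_and_reread` and the default size are assumptions made for this example, and the re-read may be served from the operating system's cache rather than from the storage itself.

```python
import os
import tempfile
import time


def time_write_and_reread(directory: str, size_mb: int = 16) -> tuple[float, float]:
    """Write `size_mb` MB to a temporary file in `directory`, read it back,
    and return (write_seconds, read_seconds)."""
    chunk = os.urandom(1024 * 1024)  # 1 MB of random data, reused for each write
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the storage
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1024 * 1024):  # read back in 1 MB pieces until EOF
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return write_s, read_s


if __name__ == "__main__":
    w, r = time_write_and_reread(tempfile.gettempdir())
    print(f"write: {w:.3f}s, re-read: {r:.3f}s")
```

Running it once against a directory on local disk and once against a directory in AFS file space gives a rough comparison. Results will vary with network load and caching, so treat them as indicative only.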

NFS3 File Space

This is what the School's distributed file system used to look like. We had good reasons for moving to AFS, but NFS may have an advantage for a few niche use cases. It is far less secure than AFS, but this does mean that your application doesn't have to be authenticated to access the file system. It may also be slightly faster than AFS in some situations. On the debit side, it is only accessible within the Informatics network perimeter. For most users, it's hard to see why you might prefer NFS to AFS, but in a few special cases it might make sense.

Self Managed Storage

If you need a lot of storage (multiple terabytes) but you don't want to, or can't afford to, pay the School or University large amounts of money for it, you have a problem. There is another way: buy storage and attach it to your DICE or self-managed workstation. Options range from simple USB enclosures attached to a single machine, through NAS boxes attached to the School network, right up to fibre-attached disk arrays attached to the School SAN. The advantage of this course is that the cost per terabyte of your storage will be less than with the fully supported options. On the other hand, the storage may well be less resilient than the fully supported options, and you will continue to be responsible for all costs for this storage (replacements for failed disks, maintenance contracts and so on). You should not just buy a piece of kit from a website and expect it to be attached to the School's infrastructure: the School's computing staff should be consulted and their agreement obtained before making any purchases.

Centrally Provided Storage

Information Services provides a service called RDM DataStore. It aims to provide each researcher in the University with 0.5TB of storage for research data. Half of this can be allocated to a research group pool at the researcher's discretion. Unfortunately the access mechanisms for this storage are not optimal for Informatics workstations, so provision of this space to Informatics users has been delayed until we know whether or not it can be provided via AFS. Instructions on connecting to DataStore from DICE are available.

In addition to RDM DataStore, IS also runs a commodity SAN where space can be purchased. Three tiers of storage are offered, ranging from low performance but low cost SATA based storage (similar to that offered by the School) right up to very high performance storage with a correspondingly high price point. The commodity SAN offers the same access mechanisms as RDM DataStore and as a consequence it's similarly unsuitable for use with Informatics workstations. Note that unlike School provided storage, when buying storage from IS there is both an upfront and a recurring charge. Costs for IS SAN storage are detailed here.

The Cloud

... by which is meant any form of storage located outwith the University of Edinburgh - for example Dropbox, Google Drive and iCloud. At first these seem attractive since they are usually easy to use, inexpensive and offer an easy way of sharing data with collaborators outside the University. However there are pitfalls, especially the issue of security. You are potentially giving access to your data to whoever is running the cloud service. If the data is in any way confidential, this may be very bad. Some cloud storage providers have a reputation for prioritising usability over security. Another issue is that some funders require that project data not go outside certain geographical areas. With cloud storage you will probably be unable to say with any certainty exactly which country houses your data. Finally, the rate at which data can be transferred to or from the cloud is highly dependent on the quality of one's connection to the internet. If you are thinking of using a cloud storage service, be sure to read its terms and conditions carefully.

Our policies and guidelines page has guidance which may be relevant here including laws affecting our computing provision and advice on data security.

Summary

Storage type | Cost | Availability | Security | Performance | Reliability | Longevity | Legal Issues
Local disk | Free | Local machine | Low | Good | Data loss on single disk failure | For life of machine | None
AFS | <200GB free, thereafter £500/TB | Worldwide | High | Low | RAID | No fixed end of life | None
NFS | <200GB free, thereafter £500/TB | Within School | Low | Better than AFS | RAID | No fixed end of life | None
Self-managed storage | Depends on kit bought but probably lowest of cost options | Within School | Depends on kit bought | Depends on kit bought | Depends on kit bought | For as long as you can keep it running | None
Central storage | From £1788/TB capital cost and £286/TB/year | Within University | Low | Depends on type of storage and access method | RAID | For as long as you keep paying the recurring cost | None
Cloud storage | iCloud £15/TB/month; Google Drive £5.20 ($10)/TB/month; Dropbox £7.99/TB/month | Worldwide | Questionable | Low | Depends on service but probably robust | For as long as you keep paying the monthly cost | Research funders etc. may prescribe location of data
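To see how the upfront and recurring charges in the summary compare over time, the arithmetic can be sketched as below. The prices are the ones quoted in the summary and date from when this page was last reviewed. The function names are invented for this example, and two things are assumptions rather than stated policy: that the £500/TB School charge is a one-off payment on capacity beyond the free 200GB, and that the lowest ("from") price applies to central storage.

```python
def afs_cost(tb: float) -> float:
    """School AFS/NFS space: first 0.2TB free, then £500/TB.
    Assumption: the £500/TB is a one-off charge, not an annual one."""
    free_tb = 0.2
    return max(0.0, tb - free_tb) * 500.0


def central_cost(tb: float, years: float) -> float:
    """IS commodity SAN, lowest tier: £1788/TB capital plus £286/TB/year."""
    return tb * (1788.0 + 286.0 * years)


def cloud_cost(tb: float, years: float, per_tb_month: float) -> float:
    """Cloud storage billed per TB per month, e.g. 7.99 for the Dropbox price quoted."""
    return tb * per_tb_month * 12 * years


if __name__ == "__main__":
    tb, years = 2.0, 3.0  # example requirement: 2TB kept for 3 years
    print(f"AFS:     £{afs_cost(tb):.2f}")
    print(f"Central: £{central_cost(tb, years):.2f}")
    print(f"Dropbox: £{cloud_cost(tb, years, 7.99):.2f}")
```

For 2TB over three years this gives £900.00 for AFS, £5292.00 for central storage and £575.28 for Dropbox under the assumptions above. Cost is only one column of the summary, so a cheaper figure here says nothing about security, reliability or funder requirements.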
Last reviewed: 27/09/2016
