Data Archival: Moving at a Glacial Pace

The realm of online data archival is speeding up and slowing down at the same time.

What I’m referring to, of course, is one of Amazon Web Services’ newer offerings: Glacier.

Glacier is a long-term data archival service from the fine folks over at AWS. It’s similar to their S3 (Simple Storage Service), but differs in that it’s designed for long-term storage: retrieval is a request-wait-download process. You ask for an archive, wait for it to become available, and only then download it.
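To make the request-wait-download flow concrete, here is a minimal Python sketch. It assumes a client object shaped like the one returned by `boto3.client("glacier")` (boto3 is not mentioned in this post; it is my assumption). The vault name, archive ID, destination path, and the 15-minute polling interval are all hypothetical placeholders.

```python
import time


def retrieve_archive(client, vault_name, archive_id, dest_path, poll_seconds=900):
    """Request an archive, wait until the job completes, then download it.

    `client` is assumed to behave like boto3.client("glacier").
    """
    # 1. Request: ask Glacier to prepare the archive for download.
    job = client.initiate_job(
        accountId="-",  # "-" means "the account that owns the credentials"
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # 2. Wait: poll until the job is reported as completed.
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        time.sleep(poll_seconds)

    # 3. Download: read the job output and write it to disk.
    # Note: this reads the whole archive into memory, fine for a sketch only.
    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    with open(dest_path, "wb") as fh:
        fh.write(output["body"].read())
    return job_id
```

A call might look like `retrieve_archive(boto3.client("glacier"), "my-vault", "ARCHIVE_ID", "project.zip")`, where every argument is a made-up example value. Polling is the simplest thing to show here; it is not necessarily the best way to wait.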

Where Glacier really shines is in its rock-bottom pricing for storage: $0.01 per GB per month in the N. Virginia region. That’s right – 1 cent per gigabyte per month. That means I can store 100 gigabytes for $1 a month. Talk about cheap. Upload and retrieval requests cost just $0.05 per 1,000 requests. The long-term storage benefits are overwhelming. For very little investment, I can indefinitely store every project I’ve ever completed with all the benefits of a state-of-the-art data center: redundancy, uptime and reliability.
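The storage arithmetic is easy to check with a few lines of Python. This is only a sketch using the rates quoted above; the function and constant names are my own, and the rates may not match current pricing.

```python
# Rates as quoted in this post (N. Virginia); treat them as assumptions.
STORAGE_USD_PER_GB_MONTH = 0.01
REQUEST_USD_PER_1000 = 0.05


def monthly_storage_cost(gigabytes):
    """Cost in USD of keeping `gigabytes` stored for one month."""
    return gigabytes * STORAGE_USD_PER_GB_MONTH


def request_cost(num_requests):
    """Cost in USD of `num_requests` upload or retrieval requests."""
    return num_requests / 1000 * REQUEST_USD_PER_1000


if __name__ == "__main__":
    print(f"100 GB for one month: ${monthly_storage_cost(100):.2f}")
    print(f"100 GB for one year:  ${monthly_storage_cost(100) * 12:.2f}")
    print(f"2,000 requests:       ${request_cost(2000):.2f}")
```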

Glacier ‘makes its money’, so to speak, on data transfer and retrieval. All data transfer to EC2 (Elastic Compute Cloud) in the same region is free, and transferring data to another AWS region will only cost you $0.02 per GB. Data transfer out to the Internet (i.e., your computer) is what can really be expensive. The first GB each month is free, and from there up to 10 TB costs $0.12 per GB. That means that if you did transfer out 10 TB in a single month, it would cost roughly $1,200 just to download. There is a catch, though. Glacier is designed for infrequent retrieval, and you can retrieve 5% of your average monthly storage (pro-rated daily) for free (plus transfer fees). However, if you exceed that threshold, you are subject to a retrieval fee of $0.01 per gigabyte. So moving that 10 TB in and out in the same month (or year, for that matter) would cost roughly $1,800!
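Here is a rough Python sketch of the download-side math, using only the rates quoted above. It assumes 10 TB means 10,000 GB, covers only the first transfer tier (up to 10 TB), and treats the retrieval fee as a flat per-GB charge on anything over the free 5%. That is a simplified reading of the numbers in this post, not Amazon's billing formula, so it will not necessarily match a real invoice or the total quoted above. All names are my own.

```python
# Rates as quoted in this post; treat them as assumptions.
TRANSFER_OUT_USD_PER_GB = 0.12   # Internet transfer, first tier (up to 10 TB)
FREE_TRANSFER_GB = 1             # first GB out each month is free
RETRIEVAL_USD_PER_GB = 0.01      # fee beyond the free retrieval allowance
FREE_RETRIEVAL_FRACTION = 0.05   # 5% of stored data can be retrieved free


def transfer_out_cost(gigabytes):
    """USD cost of transferring `gigabytes` out to the Internet in a month."""
    billable = max(0.0, gigabytes - FREE_TRANSFER_GB)
    return billable * TRANSFER_OUT_USD_PER_GB


def retrieval_fee(retrieved_gb, stored_gb):
    """USD retrieval fee under the simplified flat per-GB assumption."""
    free_allowance = stored_gb * FREE_RETRIEVAL_FRACTION
    return max(0.0, retrieved_gb - free_allowance) * RETRIEVAL_USD_PER_GB


if __name__ == "__main__":
    ten_tb = 10_000  # GB, decimal terabytes assumed
    print(f"Transfer out 10 TB:   ${transfer_out_cost(ten_tb):,.2f}")
    print(f"Retrieval fee, 10 TB: ${retrieval_fee(ten_tb, ten_tb):,.2f}")
```

Under these assumptions, the transfer charge dominates the retrieval fee, which is why pulling everything back out at once is the expensive way to use Glacier.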

Also, like S3 when it started, Glacier does not offer a web interface for the service. If you don’t know how to program, you “can’t” upload to Glacier. Amazon offers a Java SDK, a .NET SDK and a RESTful API.

Although Amazon itself does not offer a GUI, other developers have filled that role nicely. I used a cross-platform Java application called “SAGU”: Simple Amazon Glacier Uploader. The title says uploader, but it will also send download requests and notify you once the download is ready. You can find SAGU here.

In all, Glacier is fantastic for ‘upload and forget’ archiving. For me that would be old client files: project files, invoices, emails, etc. These are things I shouldn’t ever need again, but I don’t want to go to the trouble of putting them on a hard drive (that could fail) or eating up my Mozy storage with them. Glacier isn’t designed for, and shouldn’t be used for, backups that happen every night or for frequent retrievals.

It’s great for the one thing it was designed for: archival. And I think it does a ‘cool’ job of it.
