Digital archiving suffers from a perception problem, though one that is probably well deserved. Because it is perceived as difficult to cost-justify and hard to implement, and because its benefits can often be achieved by simply throwing more disk at the problem, most companies have had a hard time justifying its deployment. However, a wave of fundamental changes in the storage industry as a whole and in digital archiving technology itself is setting archiving up to be one of the hottest technologies in the months and years to come.
To say that no one invests in archiving would be at best inaccurate. A 2012 IDC worldwide storage software market report highlighted that even as the growth of storage software in general remains flat, archiving software was a bright spot, coming in #2 behind only data protection and recovery software in terms of year-over-year growth. However, the #2 spot only translated into $404 million of total revenue: not too shabby, but certainly nowhere near what organizations collectively spend annually on storage hardware.
This is poised to change. A number of trends occurring in the storage industry specifically and in the broader computing industry are setting the stage for archiving to play a much larger role in organizations of all sizes beginning in 2014. The specific trends that are setting up archiving to assume this broader role include:
- Archiving appliances. One of the obstacles associated with deploying archiving is simply getting the archiving software configured and deployed. Software, servers and storage must all be acquired and then configured. To do so, organizations either hire consultants or have someone do it in-house – all of which takes time and typically results in custom implementations that can be difficult to manage and support.
Archiving appliances such as the Nexsan Assureon (now part of Imation) include both software and hardware. These appliances expedite setup and configuration, ship with default policies so organizations can immediately implement best practices when setting up archiving, and offer sufficient scalability to hold hundreds of terabytes, if not petabytes, of data.
- Centralized data stores. Archiving really only works well when data is centrally stored. While archiving has enjoyed some success in database (Oracle, Sybase, SQL Server) and email (Exchange and Lotus Notes) environments, organizations also need to centrally store their file data for archiving to gain broader adoption. As more organizations centralize the storage of their files on file servers, it becomes both more practical and more cost-effective to archive the data residing in these file repositories, as much of this data is inactive (60 to 95%, depending on the source) and rarely or never accessed.
- Flash memory and hybrid storage systems. The future of primary storage systems is definitely storage systems with flash memory or some combination of flash and hard disk drives (hybrid). The potential downside of either type of system is that storage capacity costs anywhere from 3 to 20 times what it costs on HDDs now. Further, at those prices the traditional approach to handling data growth (throwing more disk drives at the problem) becomes impractical. With archiving in place, only the most active and/or performance-intensive data needs to reside on flash and hybrid systems, with the rest moved off to archival storage.
- “Infinite” retention periods. Some vendors promote the idea of deleting data, and attorneys may even advise their own corporations to delete data that they are no longer legally bound to keep.
However, the individual who actually has to delete the data often feels like the individual who has to push the button to launch a nuclear weapon. That person may receive the order to push the button (or in this case, delete the data) but also knows that if anything goes wrong, or it is determined later that the data is needed, they are the one who will more than likely be blamed for deleting it. (I know because I was once this person.)
Archiving gives organizations the flexibility to economically keep data much longer, potentially even forever using the latest optical technologies, while individuals do not have to worry about losing their job for deleting the wrong data.
- Public storage clouds to create “infinite” storage capacity. In November 2012, NetApp announced its NetApp Private Storage Cloud for Amazon Web Services (AWS) Direct Connect so that users of NetApp storage can transparently move data stored on NetApp filers to a back-end Amazon cloud. According to various individuals within NetApp, this alliance has created more interest among its customer base than almost anything else it has announced in its history. The appeal of this type of solution is that organizations can theoretically store as much data as they want on a file server, as the file server now essentially acts as both an infinite storage pool and an archive.
- Too much data to back up and restore. A question that does not get asked nearly enough is, “How do you quickly back up or restore tens of TBs, hundreds of TBs or even PBs of data?” The answer is you don’t. With a robust archiving solution, data is moved off of primary storage so the data that does remain in production can be backed up within established backup windows. Once data is in the archive, two or maybe three copies of that data are made, with the archive itself backed up once or twice. Further, once data is in the archive, a robust archiving solution will continually monitor and check the integrity of the data in the archive and then repair it should irregularities be detected.
- Unstructured data growth. Companies are creating data from a variety of sources. While humans create much of it, machines are quickly becoming the largest generators of data, as machine-generated data may come in from multiple sources 24 hours a day, 7 days a week. The questions organizations then struggle with are, “How valuable is this data?” “When in the data’s life cycle is it most valuable?” and “Will it have value again (and again)?” Archiving provides a cost-effective means to keep this data around and easily accessible until such determinations can be made.
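For readers who want a concrete picture of the integrity monitoring mentioned in the backup and restore item above, here is a minimal Python sketch of the general idea. It assumes the archive keeps two or three copies of each file and records a SHA-256 digest when the file is first archived. The function names `sha256_of` and `verify_and_repair` are hypothetical and chosen for illustration; this is not how any particular archiving product works.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_repair(copies: list[Path], expected: str) -> list[Path]:
    """Compare every copy with the digest recorded at archive time.

    Copies that are missing or no longer match are overwritten from the
    first intact copy. Returns the list of copies that were repaired and
    raises RuntimeError when no intact copy is left.
    """
    good = [c for c in copies if c.exists() and sha256_of(c) == expected]
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    repaired = []
    for copy in copies:
        if copy not in good:
            shutil.copyfile(good[0], copy)  # restore from a verified copy
            repaired.append(copy)
    return repaired
```

A scheduled job could call `verify_and_repair` for every archived file and log whatever comes back. The point of keeping more than one copy is visible here: with a single copy, a mismatch can be detected but not repaired.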
Archiving has suffered from a perception problem for years, if not decades, in large part because it has been difficult to cost-justify and hard to implement and manage. In fact, it often felt like archiving vendors were trying to fit a square peg in a round hole.
That analogy no longer applies. A combination of trends is coming together to form a perfect storm that should move archiving to the top of the technology heap and make it one of the more needed and sought-after storage technologies for organizations of all sizes.