“Keep it all” sounds like a great corporate data management strategy on the surface. That is, until someone requests information needed to satisfy a particular inquiry, or the organization is required to bring all of its data into compliance with some external mandate. Then the ease of implementing the “Keep it all” strategy is suddenly offset by the pain of identifying the information needed to fulfill the request or satisfy the mandate. Challenges like these are driving companies to implement solutions like CommVault Simpana® Reference Copy to lower today’s Big Data complexities, costs and risks while putting in place the infrastructure that tomorrow’s Big Data analytics tools will need.
Whether companies like it or not, the era of Big Data is here, and many companies are ill-prepared for its arrival. An article in the Wall Street Journal entitled “The Risks of Big Data for Companies” highlights some of the key issues that organizations encounter when trying to derive new benefits from Big Data. These include:
- Knowing what data to examine and what to ignore
- Working with Big Data analytics tools that are still immature
- Relying on people who are ill-equipped to use these tools
- Responding appropriately to the wealth of information that these tools will eventually provide
Yet the author of this article appears to assume that data within organizations is readily accessible, available and referenceable. “Referenceable” implies that there is a primary copy of data that these data analytics tools should access and use to draw conclusions. In practice, many organizations have multiple copies of the same data, only one of which should be treated as viable and used for analysis.
It is therefore incumbent upon organizations to first bring their data under control and make it centrally accessible. A viable copy of data is certainly necessary to take full advantage of the data analytics tools that many organizations plan to implement at some point. But creating a central, referenceable copy of data also offers other practical, near-term benefits, including:
- Effectively responding to and managing continual data growth
- Expediting backups and recoveries
- Minimizing or even eliminating manual processes to search for requested data
- Positioning companies to quickly and effectively respond to eDiscovery searches or litigation holds
- Reducing storage costs
These benefits combine to improve an organization’s overall productivity, as fewer IT staff can confidently accomplish more tasks in less time. Of course, realizing these benefits depends on companies first having a tool in place that actually performs these functions.
This is where CommVault Simpana and its Reference Copy feature come into play. Unlike other solutions that create and manage separate copies of data for archival and backup purposes, Simpana can store and manage as few as one copy of data, which it references through its ContentStore (the underlying metadata database and policy engine for Simpana) for both archive and backup. Because Simpana indexes data as it is stored, organizations get the flexibility to set policies based on what information the data contains, and those policies determine where data is placed, how long it is retained and who has access to it.
Simpana Reference Copy builds upon these existing features to provide even more granular data management capabilities. For instance, more organizations expect to store data with a public storage cloud provider such as Amazon Web Services (AWS) S3 or even AWS Glacier. However, an organization may incur unexpected costs and lag times with a public cloud storage provider if it stores data in the cloud and then needs to access that data again shortly afterward.
This puts organizations in a bind. They may have a large amount of data that they know they will likely never need and that can justifiably be stored in the cloud. But they also know that a subset of that data may be frequently accessed, subject to a legal hold or otherwise needed on short notice. They do not want to store that subset in the cloud, since they want neither to wait to retrieve it nor to pay the extra fees that accessing it from the cloud incurs.
Reference Copy addresses this dilemma by offering more granular data management policies so the right data is stored on the right tier of storage. Whereas organizations may previously have created policies that migrated all data more than one year old from local storage to the cloud, with Reference Copy they can be more specific.
For instance, they can still instruct Simpana to send any data that is over one year old to the cloud. With Reference Copy, however, they can keep a specific file out of the cloud if it is frequently accessed or subject to a legal hold. If that file must be placed in the cloud to meet some external compliance requirement, they can also retain a copy on local storage. The local copy provides fast search and retrieval to satisfy internal organizational requirements. Once the file is no longer accessed or the legal hold expires, the local copy can be deleted, since a copy of the data already exists in the cloud.
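To make the decision logic in this example concrete, here is a minimal sketch in Python. It is purely illustrative and is not Simpana’s API or policy syntax: the `FileRecord` fields, the `placement` function and the 365-day threshold are all hypothetical names and values chosen to mirror the scenario described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata for one file (not a Simpana data structure)."""
    name: str
    created: date
    frequently_accessed: bool = False
    legal_hold: bool = False
    cloud_required: bool = False  # e.g. an external compliance mandate


def placement(record, today, max_local_age=timedelta(days=365)):
    """Return the set of storage tiers ("local", "cloud") that should hold the file."""
    is_old = (today - record.created) > max_local_age
    is_hot = record.frequently_accessed or record.legal_hold

    tiers = set()
    # Keep a local copy while the file is young, frequently accessed or on hold.
    if is_hot or not is_old:
        tiers.add("local")
    # Send to the cloud when compliance requires it, or when the file is old and cold.
    if record.cloud_required or (is_old and not is_hot):
        tiers.add("cloud")
    return tiers


today = date(2024, 1, 1)
held = FileRecord("contract.pdf", date(2022, 6, 1), legal_hold=True, cloud_required=True)
print(sorted(placement(held, today)))  # both tiers while the hold is active

held.legal_hold = False  # the hold expires
print(sorted(placement(held, today)))  # cloud only, so the local copy can be deleted
```

Re-evaluating the same record after the legal hold expires drops the local tier, which corresponds to deleting the local copy once the cloud copy is the only one still needed.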
This example is just one of many ways that companies can practically leverage Reference Copy to capitalize on the metadata already contained in the Simpana ContentStore. In so doing, they can lower their storage costs while still meeting internal user expectations and satisfying external compliance mandates.
Yet putting Simpana Reference Copy in place to realize these near-term benefits also serves a longer-term purpose. As companies look to fully mine and understand the value of the data they have under management, they will have a head start on their competitors: they will already have in place an infrastructure with data that is accessible and ready for use by tomorrow’s rapidly maturing data analytics tools.