Over the last few months I have been doing a series of interviews with end users and then preparing more formalized DCIG case studies based upon these interviews. In these particular instances, all of the end users have been EMC Data Domain users. What I have found particularly intriguing in my conversations with them is that while “data deduplication” initially grabs their attention, once EMC Data Domain is implemented it simply becomes the icing on the backup-to-disk cake.
The two EMC Data Domain end users I have interviewed to date shared a common experience when it came to performing backups prior to implementing EMC Data Domain’s solution: PAIN.
One of the end users I spoke to who was located in Alaska was responsible for managing backups in four data centers and dozens of remote offices. Further, when he assumed his role of Infrastructure Manager in 2007, his organization was using a mix of tape drives and formats as well as multiple different backup software products at all of these offices. Yet this was only the tip of the iceberg (pardon the pun) in terms of the problems he was facing.
- The process of procuring data protection supplies was inefficient. Every site purchased its own backup software, tape drives and tape cartridges, so he had no sense of how much was being spent, whether it was being used or whether the solution they had in place even worked.
- Backup management was highly distributed. IT staff in each site performed their own backups so they were done randomly with no formal routines in place to verify that backups completed successfully or that data could be recovered.
- The existing setup would not support his planned virtualization initiatives. He needed to back up dozens of physical machines that hosted hundreds of virtual machines, and he saw his current situation as a bottleneck to delivering on that objective.
Not surprisingly, another user I spoke with in California just a few weeks ago told a similar story about his pre-EMC Data Domain environment. His team was backing up over 300 servers across four sites to tape with EMC NetWorker, and here are just some of the quotes he shared about how difficult it was to use tape as a primary backup target:
- “There was a good chance that in any one we were going to have a problem somewhere.”
- “We were constantly duking it out with tape changers as they were not staging data fast enough for us, particularly at our larger sites.”
- “The tape changers were just bound and determined to break.”
- “We easily had to spend 10 hours a week managing backups and that’s a low ball estimate. In essence, we were forced to sit around nursing our backups.”
Needless to say, these users found that once they implemented EMC Data Domain in their environments all of their backup problems went away. (After all, why else would EMC Data Domain let me talk to these users if it did not solve their problems, right?)
Deduplication was what initially got their attention and perhaps why they were talking to EMC Data Domain in the first place. Yet somewhat to my surprise, and maybe even to EMC’s, at the end of the day the benefits that deduplication offered were more akin to the icing on the cake for these two users.
When I spoke to the user in Alaska and asked him what his deduplication ratio was, he couldn’t even tell me. The user in California could not quote his ratio either. In fact, if his Network Design Specialist had not been on the phone with us while we were talking, I doubt he could have told me the deduplication ratio he was achieving (it was about 20:1).
What both users talked at length about were: (a) getting their life and the life of their IT staff back so they were not constantly worrying about backups; and, (b) having time to pursue more strategic initiatives like implementing server virtualization or security measures that were long overdue. The user in California said it best. “It is not like now that backup is fixed and we are reading the paper more and kicking up our heels. We are working on security more and other processes that need improvement.”
However, readers of this blog entry should not walk away from it thinking that deduplication has not played a large role in the success of EMC Data Domain in these environments. It certainly has. In both cases they are using deduplication to minimize their data stores and efficiently replicate data over long distances.
In the case of the user in the state of Alaska, he has what he refers to as WAN connectivity that is “dicey at best” to remote offices, limited in bandwidth and shared with other offices. Despite these restrictions he found that EMC Data Domain could replicate data so efficiently that even over his worst WAN links he could still replicate data from his most remote data centers to his primary data center by noon the next day.
The user in California had a similar experience to share. While he was also able to efficiently replicate data, the real benefit he found was in moving large VM images from one data center to another. Since EMC Data Domain deduplicates data so efficiently and controls replication so effectively, he has found he can quickly and easily migrate a VM from one site to another and bring it up for testing, DR or even production.
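For readers who want to see why deduplication helps with both storage and replication, here is a minimal sketch of the general idea. It is not how EMC Data Domain implements it: the fixed 4 KB chunk size, the function names and the toy data are all assumptions made for illustration, and the toy data is deliberately built so the ratio comes out at 20:1.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """Yield (SHA-256 hex digest, chunk) for each fixed-size chunk of data."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def dedup_ratio(backups, chunk_size=CHUNK_SIZE):
    """Keep each unique chunk once; return (logical/physical ratio, chunk store)."""
    store = {}
    logical = 0
    for data in backups:
        logical += len(data)
        for digest, chunk in chunk_digests(data, chunk_size):
            store.setdefault(digest, chunk)  # a repeated chunk is not stored again
    physical = sum(len(chunk) for chunk in store.values())
    return logical / physical, store


def chunks_to_send(data, remote_digests, chunk_size=CHUNK_SIZE):
    """Return only the chunks whose digests the remote site does not already hold."""
    return [chunk for digest, chunk in chunk_digests(data, chunk_size)
            if digest not in remote_digests]


# Toy data (an assumption, not customer data): 20 identical nightly backups
# of an "image" made of 10 distinct chunks.
image = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
ratio, store = dedup_ratio([image] * 20)
print(f"deduplication ratio: {ratio:.0f}:1")  # prints "deduplication ratio: 20:1"

# Replicating a copy of the image in which only the last chunk has changed.
changed = image[:-CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE
to_send = chunks_to_send(changed, set(store))
print(f"chunks sent over the WAN: {len(to_send)} of 10")  # prints "... 1 of 10"
```

The second half is the part that matters for a slow WAN link: if the remote site already holds most of the chunks, only the new ones have to cross the wire.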
This is not the first time I have heard stories like these and I doubt it will be the last. But what organizations that are still sitting on the fence about deduplication should take away from this blog entry is that deduplication is just the icing on the backup-to-disk cake.
Yes, deduplication makes storing data on disk more efficient and enables the effective replication of data over WAN links. But the initial benefits of backing up to disk – ensuring that backups complete quickly and successfully, and what that means for the productivity and quality of life of backup administrators – are not being given sufficient credit and really should be the principal reasons that organizations implement a disk-based backup solution that offers deduplication.