I have made no secret of my skepticism about using dual controller architectures for inline deduplication, specifically at the enterprise level. My concern was that the workloads in enterprise backup environments would essentially overwhelm just two controllers and negatively impact backup jobs. However, a recent briefing I had with Data Domain’s VP of Product Management, Brian Biles, has started to change my perspective as to why inline deduplication on dual controller architectures is becoming a more viable option for enterprise environments.
So what caused me to soften my sentiments? Well, it was not the DD660 or its new internal hardware that was announced on March 23. Yes, it offers performance and capacity improvements of 50% or more over previous product releases. However, so does the new release of every vendor’s product when they put the latest CPUs and the newest, largest-capacity disk drives in their system. What I found more interesting was the announcement Data Domain made earlier in the month about the new operating system release for its appliances – DD OS 4.6 – and how it relates to the new hardware features.
If one reads the press release, Data Domain claims that users can achieve performance increases of 50% to 100% merely by upgrading the OS on a Data Domain system from DD OS 4.5 to 4.6 while all of the underlying hardware stays the same. The reason for this boost in performance is that Data Domain made two important changes to how its OS operates in this release:
- Enhanced its CPU-centric model of analyzing data. One of the keys to doing inline deduplication at the enterprise level is to minimize or even eliminate accesses to the index of deduplicated data that is kept on disk. The more of the index that is kept in cache, the faster the deduplication algorithm performs. By tweaking how it analyzes data as it is ingested, DD OS 4.6 uses the existing hardware more effectively and extracts greater levels of performance.
- Changed more processes from single-threaded to multi-threaded. Prior to this release, some of its code was already multi-threaded, but this release makes much more of it so. As a result, simply upgrading the OS on existing hardware improves performance on existing Data Domain solutions, since the code can take better advantage of the processors in current as well as future appliances.
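The value of keeping the fingerprint index in memory can be sketched roughly as follows. This is a hypothetical illustration of inline deduplication in general, not Data Domain's actual implementation; the `DedupStore` class, chunk sizes and use of SHA-256 are my assumptions for the example:

```python
import hashlib

# Hypothetical sketch: an inline dedup store whose fingerprint index
# lives entirely in memory, so duplicate detection needs no disk I/O.
class DedupStore:
    def __init__(self):
        self.index = {}   # fingerprint -> chunk id, kept in RAM/cache
        self.chunks = []  # unique chunk payloads actually stored

    def ingest(self, chunk: bytes) -> int:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.index:           # index hit: duplicate, store nothing
            return self.index[fp]
        chunk_id = len(self.chunks)    # index miss: store the new chunk
        self.chunks.append(chunk)
        self.index[fp] = chunk_id
        return chunk_id

store = DedupStore()
ids = [store.ingest(b"block-A"), store.ingest(b"block-B"), store.ingest(b"block-A")]
print(ids, len(store.chunks))  # the repeated block-A maps back to the first copy
```

The point of the in-cache index is the `if fp in self.index` test: when that lookup can be answered from memory, the appliance avoids a disk seek per ingested chunk, which is exactly where an inline design would otherwise bottleneck.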
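The single-threaded versus multi-threaded distinction can be illustrated with a generic chunk-fingerprinting loop. Again, this is my own sketch, not Data Domain's code; the worker count and chunk sizes are arbitrary assumptions:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative only: fingerprinting chunks with a pool of worker
# threads instead of a single-threaded loop over the same data.
def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

chunks = [bytes([i]) * 4096 for i in range(8)]

# Single-threaded baseline
serial = [fingerprint(c) for c in chunks]

# Multi-threaded version: identical results, but the hashing work can
# be spread across however many cores the appliance's CPUs provide.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(fingerprint, chunks))

print(serial == parallel)  # True: the output does not change, only the scheduling
```

The design point is that the multi-threaded version gets faster for free as CPUs gain cores, while the serial loop does not, which is why the same code upgrade pays off on both current and future hardware.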
Yet it was what Biles said later in our conversation that prompted me to reconsider the feasibility of dual controller inline deduplication for enterprise use. Because Data Domain is improving its underlying code in conjunction with introducing new hardware, it can now achieve backup throughput of 750 MB/s, or 2.7 TB/hour, which is sufficient for many enterprise shops. Biles also argued that post-processing approaches are ultimately capped by the speed of the disk drives. Data Domain’s approach, by contrast, uses code to take advantage of faster processor speeds and more cache, so it can conceivably overcome this throughput barrier.
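For what it's worth, the two throughput figures quoted are consistent with each other, assuming the usual vendor convention of decimal (base-10) units:

```python
# Sanity check: 750 MB/s sustained for one hour, using decimal units
# (1 TB = 1,000,000 MB), as storage vendors typically quote capacity.
mb_per_s = 750
tb_per_hour = mb_per_s * 3600 / 1_000_000  # seconds per hour, then MB -> TB
print(tb_per_hour)  # 2.7
```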
Biles brings forward some salient points as to how dual controller inline deduplication architectures are evolving to move into the enterprise space. While I still suspect that some of Data Domain’s throughput claims are only achieved in specific circumstances (the fact that it highlights Veritas NetBackup OpenStorage (OST) by Symantec so prominently makes me wonder if it can attain these speeds with other software products), it is clear that the gap between dual controller inline deduplication and post-processing deduplication appliances in the enterprise space is starting to close or has closed. Going forward, I’ll be curious to see what enterprise users of Data Domain’s devices say about the backup performance they are seeing, what types of workloads they subject its appliances to and what backup software they use in conjunction with them.