Deduplicating disk-based appliances are well on their way to becoming an integral component of backup processes in small, midsize and large enterprises. Yet as this transition occurs, more organizations are recognizing that replication software on these appliances is still immature at best, even as their need to move more data quickly increases. In this fifth part of my interview series with Sepaton’s Director of Product Management, Peter Quirk, he describes the enhancements Sepaton made to its VirtuoSO platform so that its replication meets enterprise expectations.
Jerome: Please elaborate on the intelligent data mover software that is available in VirtuoSO.
Peter: The VTL platform had two replication engines: one was very tape-oriented and the other was OST-oriented. In the VirtuoSO architecture, data movement is a core primitive (not a bolt-on process), and the data mover architecture is extensible. In the product today, the data mover is split into a front end and a back end. The data mover software sits down in the data store layer, but it knows how to talk to the upper layer of the protocol stack: the VirtuoSO inline deduplication engine.
When a VirtuoSO system replicates to another VirtuoSO system, the front end of the remote system knows that the back end of the source system could be sending hash-based data, differential data or even raw data. The front end of the data mover is aware of these different types and has a conversation with the sender about which kinds of data the sender wants to send.
If the data mover on the source system is sending hash-based data, it sends a bunch of hashes to the data mover on the target system. The target system returns a bit vector indicating which hashes it already has, and the source system responds by sending the blocks of data for which the target system has no hash entries.
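To make the hash-based exchange concrete, here is a minimal Python sketch of one round: the source fingerprints its blocks, the target answers with a bit vector (modeled here as a list of booleans), and only the blocks the target lacks are sent. This is an illustration, not Sepaton’s implementation; SHA-256 as the fingerprint and the names `fingerprint`, `Target` and `replicate_hashes` are assumptions.

```python
import hashlib

# Hypothetical sketch: the names and the choice of SHA-256 are illustrative assumptions.

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class Target:
    def __init__(self):
        self.store = {}  # fingerprint -> block (the target's dictionary)

    def query(self, hashes):
        # Bit vector modeled as a list of booleans: True means "I already have it".
        return [h in self.store for h in hashes]

    def receive(self, blocks):
        self.store.update(blocks)

def replicate_hashes(blocks, target):
    """One round of hash-based replication; returns the number of blocks sent."""
    hashes = [fingerprint(b) for b in blocks]
    present = target.query(hashes)
    # Keyed by fingerprint, so a block repeated within the batch is sent once.
    missing = {h: b for h, b, have in zip(hashes, blocks, present) if not have}
    target.receive(missing)
    return len(missing)

target = Target()
target.receive({fingerprint(b"alpha"): b"alpha"})
sent = replicate_hashes([b"alpha", b"beta", b"beta"], target)
print(sent)  # 1: "alpha" was already on the target, "beta" is sent once
```

A production protocol would pack the reply into real bits and pipeline many rounds; the sketch keeps one synchronous round for readability.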
If the source system needs to send a delta, it sends the target a list of the IDs of the deltas it depends on, asking which supporting data must be sent before the delta itself. The target replies with the list of deltas it already has.
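The delta exchange can be sketched the same way: the source works out every delta the new one depends on, asks the target which of those it already holds, ships the missing ones in dependency order, and only then sends the delta. The `Delta` record, the `depends_on` field and the function names below are hypothetical; the interview does not describe how VirtuoSO represents delta chains.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of dependency-aware delta replication; not Sepaton's code.

@dataclass
class Delta:
    delta_id: str
    depends_on: list = field(default_factory=list)  # ids this delta is expressed against
    payload: bytes = b""

def dependency_closure(delta_id, catalog):
    """All transitive dependencies of delta_id, each listed after its own dependencies."""
    order, seen = [], set()

    def visit(current):
        for dep in catalog[current].depends_on:
            if dep not in seen:
                seen.add(dep)
                visit(dep)
                order.append(dep)

    visit(delta_id)
    return order

class DeltaTarget:
    def __init__(self, seed=()):
        self.deltas = {d.delta_id: d for d in seed}

    def which_present(self, ids):
        """Answer the source's inquiry: which of these deltas do I already hold?"""
        return [i for i in ids if i in self.deltas]

    def receive(self, delta):
        # A delta is only usable once everything it depends on has arrived.
        absent = [d for d in delta.depends_on if d not in self.deltas]
        if absent:
            raise ValueError(f"{delta.delta_id} arrived before {absent}")
        self.deltas[delta.delta_id] = delta

def replicate_delta(delta_id, catalog, target):
    """Send missing supporting deltas first, then the delta itself; return ids sent."""
    deps = dependency_closure(delta_id, catalog)
    already = set(target.which_present(deps))
    sent = []
    for dep in deps:
        if dep not in already:
            target.receive(catalog[dep])
            sent.append(dep)
    target.receive(catalog[delta_id])
    sent.append(delta_id)
    return sent

# Chain: d0 (base) <- d1 <- d2. The target already holds d0.
catalog = {
    "d0": Delta("d0"),
    "d1": Delta("d1", ["d0"]),
    "d2": Delta("d2", ["d1"]),
}
target = DeltaTarget([catalog["d0"]])
sent = replicate_delta("d2", catalog, target)
print(sent)  # ['d1', 'd2']
```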
The two kinds of data can be interleaved on the wire, for both the queries to the target and the actual data being replicated. This is designed to optimize utilization of the available network bandwidth by interspersing hash-based data and delta-differenced data in the same set of transactions on the wire.
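One way to picture the interleaving is to pack both message streams into size-limited transactions, alternating between them and falling back to whichever still has traffic, so a transaction is not left half-empty while one stream is idle. The `Message` type, the "hash"/"delta" tags, the byte sizes and the alternating policy are all hypothetical; the interview does not describe VirtuoSO’s actual framing.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical sketch of interleaving two message streams into wire transactions.

@dataclass(frozen=True)
class Message:
    kind: str  # hypothetical tags: "hash" or "delta"
    size: int  # bytes on the wire

def pack_transactions(hash_msgs, delta_msgs, max_bytes):
    """Interleave two streams into transactions of at most max_bytes each.

    Alternates between the streams and falls back to whichever still has
    messages. A single message larger than max_bytes gets its own transaction.
    """
    queues = [deque(hash_msgs), deque(delta_msgs)]
    transactions, current, used, turn = [], [], 0, 0
    while any(queues):
        q = queues[turn] if queues[turn] else queues[1 - turn]
        msg = q[0]
        if used + msg.size > max_bytes and current:
            transactions.append(current)  # this transaction is full; start a new one
            current, used = [], 0
            continue
        q.popleft()
        current.append(msg)
        used += msg.size
        turn = 1 - turn
    if current:
        transactions.append(current)
    return transactions

hash_msgs = [Message("hash", 40)] * 3
delta_msgs = [Message("delta", 100)] * 2
txns = pack_transactions(hash_msgs, delta_msgs, max_bytes=150)
print([[m.kind for m in t] for t in txns])
# [['hash', 'delta'], ['hash', 'delta'], ['hash']]
```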
Just as the VirtuoSO system decides during ingest whether to deduplicate inline or post-process, the data mover decides whether to send data in a deduplicated or a raw format based on the likelihood that the hashes are already in the target’s dictionary. During a transfer, the target system continually updates the source system on the hit-rate for hash entries in the target’s dictionary. If the hit-rate is low (or zero, indicating an initial backup), the source system optimizes the transfer by not querying about every hash or delta and instead sends the raw data, suitably compressed.
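The hit-rate feedback loop can be sketched as a sender that keeps a moving average of the target’s reports and switches between hash-based and raw transfer. The threshold of 0.2, the smoothing weight, the use of zlib for compression, and the assumption that the target keeps reporting hits even for raw data (for example by hashing it on arrival) are all illustrative choices, not Sepaton’s parameters.

```python
import zlib

# Hypothetical sketch; threshold, alpha and zlib are illustrative assumptions.

class AdaptiveSender:
    """Picks hash-based or raw transfer from the target's reported hit-rate."""

    def __init__(self, threshold=0.2, alpha=0.5):
        self.threshold = threshold  # below this hit-rate, per-hash queries are not worth it
        self.alpha = alpha          # weight given to the newest report
        self.hit_rate = 0.0         # assume nothing is on the target yet (initial backup)

    def report(self, hits, looked_up):
        """Fold one hit-rate report from the target into a moving average.

        Assumes the target still reports hits while raw data is flowing,
        e.g. by hashing raw blocks as they arrive.
        """
        if looked_up:
            observed = hits / looked_up
            self.hit_rate = self.alpha * observed + (1 - self.alpha) * self.hit_rate

    def mode(self):
        return "hash" if self.hit_rate >= self.threshold else "raw"

    @staticmethod
    def encode_raw(blocks):
        # Raw mode skips the per-hash queries and ships compressed data instead.
        return zlib.compress(b"".join(blocks))

sender = AdaptiveSender()
print(sender.mode())  # raw (no hits reported yet)
sender.report(hits=90, looked_up=100)
print(sender.mode())  # hash (moving average is now 0.45)
```

The moving average keeps one unusual batch from flipping the mode back and forth; a real system would tune that trade-off against how quickly it needs to react.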
For client-side deduplication (or source data reduction, as it is sometimes known), Sepaton will deliver a plug-in that combines the inline deduplication engine with an installable file system. In this way the VirtuoSO system can take advantage of its data mover architecture by enabling a conversation between the inline engine running on the client machine and the back end of the data mover software, which talks to VirtuoSO’s data store.
Another example of extending the data mover architecture is to say, “Let’s make the back end aware of new kinds of storage, like dumb storage out in an Amazon S3 bucket, or perhaps a slower storage tier such as an archiving tier with big, fat, slow disks.” When we implement this, the back end will know how to optimize data transfers to those big, slow disks, since they might require data to be sent in a different way to perform well.
It’s a very flexible and extensible model that allows us to cleave the data mover software, marrying the source-aware portion with a backup client, and the target-aware portion with the back end of a different system, or even a different storage platform such as Amazon S3. Both halves of the data mover still have the same conversation that they would have if they were on a single system.
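The split between a source-aware front end and a target-aware back end can be sketched as a small interface: the front end always holds the same conversation (“which of these do you have?”, then “here are the missing blocks”), and each back end decides how to store them. Both back ends below are in-memory stand-ins; `ObjectStoreBackEnd` only mimics the shape of a “dumb” object store with a hypothetical `chunks/<hash>` key scheme and does not call any real S3 API.

```python
import hashlib
from abc import ABC, abstractmethod

# Hypothetical sketch of a split data mover; all class names are illustrative.

class BackEnd(ABC):
    """Target-aware half: answers queries and stores blocks its own way."""

    @abstractmethod
    def has(self, hashes): ...

    @abstractmethod
    def put(self, blocks): ...

class DataStoreBackEnd(BackEnd):
    """Stand-in for a deduplicating data store: one dict of hash -> block."""

    def __init__(self):
        self.blocks = {}

    def has(self, hashes):
        return [h in self.blocks for h in hashes]

    def put(self, blocks):
        self.blocks.update(blocks)

class ObjectStoreBackEnd(BackEnd):
    """Stand-in for object storage; a real one would use an object-store client."""

    def __init__(self):
        self.objects = {}  # object key -> bytes

    def has(self, hashes):
        return [f"chunks/{h}" in self.objects for h in hashes]

    def put(self, blocks):
        for h, data in blocks.items():
            self.objects[f"chunks/{h}"] = data

class FrontEnd:
    """Source-aware half: the same conversation whichever back end it talks to."""

    def __init__(self, back_end: BackEnd):
        self.back_end = back_end

    def send(self, blocks):
        hashes = [hashlib.sha256(b).hexdigest() for b in blocks]
        have = self.back_end.has(hashes)
        missing = {h: b for h, b, ok in zip(hashes, blocks, have) if not ok}
        self.back_end.put(missing)
        return len(missing)

for back_end in (DataStoreBackEnd(), ObjectStoreBackEnd()):
    front = FrontEnd(back_end)
    print(type(back_end).__name__, front.send([b"x", b"y"]), front.send([b"x", b"z"]))
# DataStoreBackEnd 2 1
# ObjectStoreBackEnd 2 1
```

Because only the back end changes, the same front end could in principle sit in a backup client plug-in, as described above for client-side deduplication.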
Jerome: Your description of the data mover software makes it sound as though Sepaton is readying, or has already released, client-side backup acceleration software along the lines of OST or DD Boost. Is that the case?
Peter: Sepaton will actually be demonstrating that kind of source-side backup acceleration software in Q1 2014. Sepaton uses this data mover architecture to replicate all of the data in a system, whether it arrives through NFS or OST, because both are built on top of the data mover. Over time, however, Sepaton will also use it to move data off to archive platforms from partners and even to cloud storage.
Jerome: Should organizations expect to see performance enhancements in the VirtuoSO platform as well?
Peter: That is a reasonable assumption. While we’re talking about replication we should note how VirtuoSO’s replication model improves the time-to-safety at the remote site.
The replication model in VirtuoSO is more like a syncing operation: it constantly tries to keep the DR system in sync with the source system in the data center. As soon as any change is made to an object on the source system, VirtuoSO schedules replication of that change to the target system, so the data becomes available on the target sooner.
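The difference from batch-window replication can be sketched as a queue that is fed on every change rather than at the end of a backup job. In this minimal sketch, a second change to an object that is still waiting overwrites its data but keeps its place in line, so the oldest outstanding change ships first. The class, the coalescing policy and the dict standing in for the DR site are hypothetical, not a description of VirtuoSO internals.

```python
from collections import OrderedDict

# Hypothetical sketch of change-driven ("sync") replication; not Sepaton's code.

class SyncReplicator:
    def __init__(self, target):
        self.target = target          # stands in for the DR system
        self.pending = OrderedDict()  # object name -> latest unreplicated data

    def on_change(self, name, data):
        # Schedule replication as soon as the object changes. A repeat change
        # to a still-pending object replaces its data but keeps its position.
        self.pending[name] = data

    def drain(self, max_items=None):
        """Ship pending changes oldest-first; return the names replicated."""
        shipped = []
        while self.pending and (max_items is None or len(shipped) < max_items):
            name, data = self.pending.popitem(last=False)
            self.target[name] = data
            shipped.append(name)
        return shipped

dr_site = {}
rep = SyncReplicator(dr_site)
rep.on_change("vol1/seg7", b"v1")
rep.on_change("vol1/seg8", b"a")
rep.on_change("vol1/seg7", b"v2")  # coalesced with the earlier change
print(rep.drain())  # ['vol1/seg7', 'vol1/seg8']
```

In a running system, `drain` would be called continuously by a background worker; the `max_items` limit lets that worker ship work in bounded slices.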