In the first part of our interview series with GreenBytes CEO Bob Petrocelli, we got a glimpse into the company’s groundwork with solid-state drives (SSDs) that led to the development of Solidarity. It is a high-availability (HA), globally optimized SSD storage array solution receiving a great deal of attention because it does away with magnetic drives and delivers a massive 200,000-plus IOPS performance. Today I resume my interview with Petrocelli as he lays out the configurations and processes that make Solidarity hum.
Bob: There are actually a couple of different hardware configurations. But for the initial introduction, you’re right: it is basically the same platform–with the addition of a hardware compression accelerator.
Ben: Even with the high-duty SSD, though, when you’re dumping that much data onto an SSD, you must be dealing with garbage collection pretty frequently?
Bob: Actually, no. Because as you’ll see, what we’ve done is we’ve created three tiers of storage in the system. There is a tiny sliver of SSD now that’s used for logging.
And those SSDs are RAM-based SSDs. So they have no requirement for trim, if you will. They have infinite endurance, they don’t have any issues in terms of garbage collection.
Ben: They just have to stay powered.
Bob: Well, they don’t. They actually have a super capacitor protection as well as flash backup. So it’s sort of spooky when you turn the system off, they stay lit up. And then they dump the contents to SLC [single-level cell flash memory] that’s internal.
Ben: But they have to stay powered by some means. The capacitor is how that is done, right?
Bob: Yes. They have to stay powered on long enough to dump their contents to the backup flash, in the same drive. So, basically, you can pull one of these guys out while it’s writing and it will commit that last write to flash that’s already on board, and then it doesn’t need power anymore for basically ever.
There’s only enough capacity on there to just commit those writes to ordinary SLC. Then the SLC has the customary shelf life of a couple years or so.
Ben: You said that you have a hardware accelerator coming. What have you previously done for your software? Is it just a standard RAID [redundant array of independent disks] stack from somewhere? Or is it your own proprietary RAID stack?
Bob: We’ve always used software RAID because we have a lot of cores. We found that our performance with software RAID has been better than hardware RAID so we don’t need the extra power consumption and space on the controller.
The decoder–the hardware encoder–is a specialized set of ASICs [application-specific integrated circuits] that do high-level gzip at the block level. That allows us to do our software-based option and the LZ style encoder runs very fast. We can do decoding of a single stream with the HA blocks at about 800 MB a second; so, it’s pretty fast.
But our compression rates of, let’s say, Canterbury Corpus–which is a standard compression benchmark for a set of data–our compression rates of that data don’t get much higher than about 1.8 to 1, because this is a software compression that’s optimized for throughput.
With the ASIC unit, that can run a single stream at a gigabyte per second; so, it’s very fast. But that’s really not why we have it. The reason we’re using that is its compression on the same data is 3.3 to 1. It’s just dramatically more efficient.
And we’re not doing it for CPU offload, which would be your first thought. We’re actually doing it for space. So when you combine that with the deduplication engine, which is particularly effective in virtual environments–and I’ll get into that in a minute–you’re starting to really drive down your cost per gig of the flash.
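The space argument above comes down to simple arithmetic, sketched here in Python. The function name and the normalized raw cost of 1.0 are illustrative assumptions, not GreenBytes figures; only the 1.8:1 and 3.3:1 ratios come from the interview. Multiplying the compression and deduplication ratios together is also a simplifying assumption.

```python
def effective_cost_per_gb(raw_cost_per_gb: float,
                          compression_ratio: float,
                          dedup_ratio: float = 1.0) -> float:
    """Cost per logical GB after data reduction.

    Assumes (hypothetically) that compression and deduplication
    savings combine multiplicatively.
    """
    if compression_ratio <= 0 or dedup_ratio <= 0:
        raise ValueError("ratios must be positive")
    return raw_cost_per_gb / (compression_ratio * dedup_ratio)


# Normalized raw flash cost of 1.0 per GB (an assumed unit, not a real price).
software_cost = effective_cost_per_gb(1.0, 1.8)  # software LZ-style ratio
asic_cost = effective_cost_per_gb(1.0, 3.3)      # hardware gzip ratio

print(f"software: {software_cost:.2f}, ASIC: {asic_cost:.2f}")
```

With these ratios, the same raw flash holds a little under twice as much logical data when compressed in hardware, before any deduplication is counted.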
Also, the flash we’re using is already quite inexpensive because we are using 3K flash. But we’re using 3K flash that’s packaged in a SAS-2 [Serial Attached SCSI-2] drive with power fail protection. So it’s not a consumer drive.
We’re using real honest-to-goodness drives; they’re just designed to sit out in front of an ingest buffer that’s very large. So we have 16 GB of battery-backed RAM [random-access memory] in the form of these SAS RAM drives that will sit there and coalesce writes for anywhere between 10 and 30 seconds, depending on the I/O [input/output] rate that’s involved.
What’s interesting is, let’s say you’re doing a classical SQL Server database update, where you’re putting a bunch of updates to a database. And you update one column. The way anything works is you update the whole page. You don’t just update the one value.
And so if you have a bunch of inbound transactions, what will happen over that period of 10 seconds is you get a bunch of successive rewrites of the same pages that are dirty. What our system literally does is rewrite them, but it’s rewriting them in the RAM log. It does not commit them to the final repository until the transaction fires. And then it only commits the last one.
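The coalescing behavior described here can be sketched as a small in-memory log that keeps only the latest version of each dirty page until it is flushed. This is a minimal illustration, not GreenBytes’ implementation; the class name, method names, and counters are all hypothetical.

```python
class CoalescingLog:
    """Toy write log: repeated writes to a page overwrite each other
    in memory, and only the last version is committed on flush."""

    def __init__(self):
        self._dirty = {}          # page_id -> latest data for that page
        self.inbound_writes = 0   # writes received from the front end
        self.backend_writes = 0   # writes committed to the back end

    def write(self, page_id, data):
        # Every inbound write is counted, but it only replaces the
        # pending copy of the page; nothing reaches the back end yet.
        self.inbound_writes += 1
        self._dirty[page_id] = data

    def flush(self):
        # Commit one write per dirty page, then clear the log.
        committed = dict(self._dirty)
        self.backend_writes += len(committed)
        self._dirty.clear()
        return committed


log = CoalescingLog()
for version in range(10):
    log.write("page-7", f"row update {version}")
log.write("page-9", "single update")

committed = log.flush()
print(log.inbound_writes, log.backend_writes)
```

In this toy run, eleven inbound writes touch two pages, so only two writes reach the back end, and `page-7` is committed with its final contents only.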
If you’re doing a lot of update-in-place–which is typical of database traffic or of VDI paging traffic–you take all that noise right out of the equation. And then it’s easy because you can actually measure it. You can see, when you look on the UI [user interface] of the system, how many I/Os per second are inbound into the log drives, and how many I/Os per second are inbound to the storage reduction.
So it’s a pretty easy, straight-line extrapolation to see that there is very often a 10-times reduction in the I/Os because of this consolidation effect that occurs between what’s coming in, in terms of chatter on the front end, and what happens on the back end.
Now, that’s not even considering any of the deduplication or compression effects. Because once you turn on compression and deduplication, and you start handling blocks that are either patterned with zeroes, highly compressible, or duplicates, then you drive your I/O down even further.
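As a rough illustration of how zero-filled and duplicate blocks drop out before they reach the flash, the sketch below classifies incoming blocks using a content hash. Using SHA-256 digests to detect duplicates is an assumption made for this example; the interview does not say how the deduplication engine identifies duplicates, and the function name is hypothetical.

```python
import hashlib


def reduce_blocks(blocks):
    """Count how many blocks are all zeroes, duplicates of an earlier
    block, or unique blocks that would actually need to be stored."""
    seen = set()  # digests of blocks already stored
    stats = {"zero": 0, "duplicate": 0, "stored": 0}
    for block in blocks:
        if not any(block):
            # Every byte is zero, so no data needs to be written.
            stats["zero"] += 1
            continue
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            stats["duplicate"] += 1
        else:
            seen.add(digest)
            stats["stored"] += 1
    return stats


zero = bytes(4096)
block_a = b"a" * 4096
block_b = b"b" * 4096
stats = reduce_blocks([zero, block_a, block_a, block_b])
print(stats)
```

Of the four blocks in this example, only two unique ones would be written, which is the kind of further back-end I/O reduction described above.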
And, of course, you’re further extending the life of the flash. We’re able to measure the life of the flash because Smart [Solidarity Smart Write technology] tells us how many cycles you have on the drive.
So, what we have is each controller has multi-core Xeon CPUs that we use to run our stack. Each unit has hardware compression. It’s got removable canisters that are highly available.
We have enterprise-quality SAS-2 SSDs that are just running lower-endurance, much less expensive flash. We have big caches so that we can front a lot of write traffic in the caches and further enhance the characteristics of the SSDs. Each controller has dual 10 Gb FC HBA.
So, we have built in a best practice where we actually create these right out of the box. So you can have a replication network and a primary data network right out of the box. You just plug it into your switches and go.
In Part I of this series, GreenBytes CEO Bob Petrocelli gives us some background on how forays into SSD and the replacement of magnetic drives led to the development of Solidarity, a solution that’s got people talking.
In Part III of this interview series, GreenBytes CEO Bob Petrocelli shares what he refers to as some of the “dirty little secrets” about some hardware that was cleverly repurposed to give Solidarity an edge in compression.
In Part IV of this interview series, GreenBytes CEO Bob Petrocelli talks about Solidarity’s failover response, including a failover response time of merely three seconds between canisters measured during testing.