Data Loss on SATA-Based Storage Systems - Coming Soon to Your Company?
Deciding whether to recommend that users deploy SATA-based storage systems remains one of my more frustrating tasks as an analyst. On one hand, I want to encourage companies to use storage systems built on SATA disk drives because they are economical, meet the availability requirements of many corporate applications and have reliability ratings that, on paper, are arguably as good as, if not better than, those of FC and SCSI/SAS drives. What prevents me from doing so is that, having worked in enterprise organizations, I know these same companies often have unforgiving mission-critical application requirements, and the last thing I want to do is encourage them to deploy SATA-based storage systems that fail in some key area.
This point was brought home to me in a recent joint briefing that I received from NEC's vice president of Advanced Storage Products, Karen Dutch, and RAID Inc.'s COO, Robert Picardi. The purpose of the briefing was to discuss the new OEM agreement announced today between NEC and RAID Inc. RAID Inc. will be reselling NEC's D-Series storage systems as its Xanadu storage system line to its HPC (high performance computing) and government markets. However, what is more interesting is the story behind the headline and what prompted this OEM relationship in the first place.
I was first briefed on the NEC D-Series about a year ago and was, at that time, impressed by its breadth of functions and scalability. However, it then promptly and curiously disappeared from view (as has happened before with other computing products offered by NEC), such that I largely forgot about the D-Series product line. Then last week the D-Series reappeared out of the blue in conjunction with the announcement of an OEM relationship with RAID Inc. In my mind, this did not make sense. Why would a company like RAID Inc. take a chance on a relatively unknown product in the US storage market when it could partner with any number of established storage system providers?
The answer that RAID Inc.'s Picardi gave to this question goes to the very heart of the concerns that I raised about SATA-based storage systems at the outset of this blog entry. Picardi indicated that RAID Inc. was getting some very disturbing feedback from its existing HPC customers that used SATA-based storage systems. In these environments, some of its clients were running the same query against the same data set and coming back with different answers. This was not a one-time occurrence; it happened frequently enough that they felt the need to change out their storage systems. These clients then began internal procedures to double-check their answers. They also checked with their colleagues at other HPC locations and found that they too were having similar problems with SATA-based storage systems.
Granted, this is not the first time I have heard about HPC shops having problems with SATA. When I talked with Mark Seager, the assistant department head for advanced technology at the Lawrence Livermore National Laboratory, about a year ago, he indicated that SATA disk drives in his systems went catatonic from time to time, but he never indicated that there was any sort of data loss. The fact that its HPC clients were reporting these problems put RAID Inc. in crisis mode to identify the source of the problem and to provide a SATA-based storage system that would not lose data while it was being written or later when it was read.
As one of RAID Inc.'s customers investigated the source of the problem, it discovered that many, if not most, providers of SATA-based storage systems do not provide an extra 8-byte integrity field at the end of each sector on a SATA disk drive. Rather than using the 520-byte sectors found on FC and SAS storage systems, many SATA-based storage system providers use only 512-byte sectors. While 512-byte sectors deliver faster performance than 520-byte sectors, the trade-off is that data integrity is sacrificed. Until now, the need for this extra 8-byte integrity field on SATA-based storage systems was relatively unknown. Only now, after data loss has begun to occur, are companies starting to find out the importance of this 8-byte integrity field.
The fact that the NEC D-Series addresses this concern is a major reason that RAID Inc. selected NEC over competing products. Jack Igoe, NEC's Director, Product Management, says that NEC's engineers recognized the lack of a data integrity check in other SATA-based storage systems as a fundamental problem and addressed it early in the design of the D-Series array controllers. The D-Series array controllers add the extra 8-byte integrity field to the data as it is stored to ensure that the integrity of the data is maintained. This helps put the data that the D-Series manages and stores on par with data on storage systems that use FC and SAS drives, so customers can have the same level of assurance that the integrity of their data is maintained.
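To make the idea of a per-sector integrity field more concrete, here is a minimal Python sketch of how a controller might append an 8-byte field to each 512-byte sector on write and check it on read. This is an illustration under my own assumptions, not a description of how the D-Series controllers actually implement it: the field layout (a 4-byte CRC-32 of the data followed by the low 4 bytes of the sector's logical block address) and the names `protect_sector`, `verify_sector` and `IntegrityError` are hypothetical.

```python
import struct
import zlib

DATA_SIZE = 512                       # user data per sector
FIELD_SIZE = 8                        # extra integrity field
SECTOR_SIZE = DATA_SIZE + FIELD_SIZE  # 520 bytes stored per sector


class IntegrityError(Exception):
    """Raised when a stored sector fails its integrity check (hypothetical name)."""


def protect_sector(data: bytes, lba: int) -> bytes:
    """Return a 520-byte sector: 512 data bytes plus an 8-byte integrity field.

    Assumed layout of the field: 4-byte CRC-32 of the data, then the
    low 32 bits of the logical block address (LBA), both big-endian.
    """
    if len(data) != DATA_SIZE:
        raise ValueError(f"expected {DATA_SIZE} data bytes, got {len(data)}")
    field = struct.pack(">II", zlib.crc32(data), lba & 0xFFFFFFFF)
    return data + field


def verify_sector(sector: bytes, lba: int) -> bytes:
    """Check the integrity field and return the 512 data bytes if it matches."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} sector bytes, got {len(sector)}")
    data, field = sector[:DATA_SIZE], sector[DATA_SIZE:]
    stored_crc, stored_lba = struct.unpack(">II", field)
    if stored_crc != zlib.crc32(data):
        # The data changed after it was written (silent corruption).
        raise IntegrityError(f"checksum mismatch at LBA {lba}")
    if stored_lba != (lba & 0xFFFFFFFF):
        # The data is intact but belongs to a different sector.
        raise IntegrityError(f"sector at LBA {lba} was written for LBA {stored_lba}")
    return data
```

With a plain 512-byte sector there is nothing to compare against, so a flipped bit or a misplaced write is simply returned to the application as if it were good data. With the extra field, `verify_sector` raises an error instead, which gives the array a chance to rebuild the sector from its RAID redundancy.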
RAID Inc.'s selection of NEC's D-Series is a disconcerting statement about the storage industry as a whole. While kudos go to NEC for recognizing this problem early on and designing its storage systems to address it, one has to wonder about EMC, HDS, IBM and NetApp as well as Compellent, Dell EqualLogic, Pillar Data Systems and the many others that offer SATA disk drives as an option in some of their storage systems.
While these other storage systems may not regularly go into HPC environments that have specific low cost/high capacity requirements, the storage capacity in all storage systems is climbing, especially now that 750 GB and 1 TB SATA disk drives are becoming more widely available. This rapid capacity growth, coupled with RAID Inc.'s willingness to select the NEC D-Series, a relative newcomer to the US storage market, should prompt all companies to question how much longer they have to wait before they experience unexpected data loss on their SATA-based storage systems - or whether data loss has already occurred and they just don't know about it yet.