In 2010, the amount of information created in the world crossed a zettabyte for the first time, driven in large part by highly motivated organizations seeking out actionable business intelligence. Identifying that information requires organizations to aggregate their data into data warehouses in order to gain new operational efficiencies, identify emerging market trends and/or provide better support. In direct response to these demands, Teradata has evolved its data warehouse platform so organizations can quickly read or write data, store it cost-effectively and scale the solution without hassle.
In the last few years, the roles that data warehouses and business intelligence play in organizations have fundamentally changed. Data warehouses were once reserved primarily for strategic decisions, where waiting hours or even days for business intelligence was acceptable. Information is now needed in real time to support day-to-day and even minute-to-minute operations.
Teradata exemplifies the model that organizations have come to expect from data warehouses. It has for years embraced an integrated, turnkey approach to providing two key features that are integral to data warehouse management.
- Scale-out architecture. This architecture enables organizations to start small and then grow as their needs dictate. To accomplish this, Teradata uses a “shared nothing” architecture built from independent nodes, each dedicated to processing data. Modular storage systems are assigned to each node to give it sufficient capacity, with data placed on them to optimize performance for that particular node. This technique of using independent nodes and modular storage systems gives organizations the flexibility to scale out either capacity or performance as their needs change over time.
- Self-managing. A scale-out architecture is interesting only inasmuch as organizations can easily and effectively manage it. Teradata’s differentiator is that its performance grows linearly while it stays self-managing, automating the optimization of storage capacity and performance as it scales out.
Yet as more organizations look to glean information from their data warehouses in real time, new technology is needed that delivers higher levels of performance without sacrificing the attributes to which they have become accustomed. This is why we are seeing the introduction of solid state disk (SSD) into many storage systems used by data warehouses to provide this extra performance kick.
SSD is a new form of storage that is becoming widely available in storage systems, as it provides up to a 3x boost in write performance and a 20x or greater boost in read performance over hard disk drives (HDDs). However, this boost in performance is accompanied by a similar boost in cost (about 10x more per GB than HDDs).
To offset this cost, many storage systems include storage tiering functionality that promises to “place the right data on the right storage at the right time.” This limits the number of SSDs needed, and therefore their cost, by placing only the most frequently accessed data on SSDs while putting the rest on HDDs.
This approach works reasonably well for normal business applications, but data warehouses have special requirements. Teradata has found in analyzing data from its customer base that 20-25% of data in a Teradata data warehouse is “hot” or actively accessed, while the remaining 75-80% of the data is “warm” and suitable for storing on HDDs.
On the surface, this would seem to make a Teradata data warehouse ideal for use with storage system-based storage tiering. However, when this approach is closely examined, its limitations quickly become evident. Consider:
- Policy-driven. Data is only moved based upon preset policies or on a scheduled basis. Implemented this way, data may or may not be on the right tier of storage when the data warehouse needs it.
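The cost argument for tiering can be made concrete with a little arithmetic. The sketch below is only an illustration built from the figures quoted above (SSD at roughly 10x the cost per GB of HDD, 20-25% of data hot); the function name and the normalization of HDD cost to 1.0 are assumptions made for the example, not Teradata pricing.

```python
def blended_cost_multiplier(hot_fraction: float, ssd_cost_ratio: float = 10.0) -> float:
    """Cost per GB of a hybrid system relative to an all-HDD system.

    HDD cost per GB is normalized to 1.0. ssd_cost_ratio is how many times
    more an SSD costs per GB (about 10x in the article).
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    # Hot data sits on SSD at the higher price; the rest sits on HDD at 1.0.
    return hot_fraction * ssd_cost_ratio + (1.0 - hot_fraction) * 1.0


if __name__ == "__main__":
    for hot in (0.20, 0.25):
        multiplier = blended_cost_multiplier(hot)
        print(f"{hot:.0%} of data on SSD: about {multiplier:.2f}x the all-HDD cost per GB")
```

Under these assumptions, a hybrid system costs roughly 2.8x to 3.25x the all-HDD price per GB, compared with 10x for an all-SSD system.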
- Manual. An administrator has to understand the storage system and the data warehouse and then set storage tiering policies accordingly. Data placement is then only as good as the storage system’s storage tiering feature and the administrator managing it.
- All data kept in cache. Some storage systems seek to minimize the need for storage tiering by keeping all data in cache. However, using cache is even more cost-prohibitive than just using SSDs.
It is for these reasons that Teradata took advantage of one of the special ingredients of its data warehouse: Teradata Virtual Storage™. Teradata Virtual Storage™ assumes the responsibility of placing data on the right tier at the right time. It already understands how data is used in the data warehouse and can use that information to place the data on the most appropriate storage, regardless of whether it is SSD or HDD.
It is this flexibility that has enabled Teradata to seamlessly introduce a hybrid storage mix of both SSDs and HDDs into its solution. The real significance of hybrid storage is that the 20x drive-level speed gain that SSD provides enables the full performance of Teradata to be applied to a far smaller amount of data, resulting in a higher-performing solution that does more with your data, uses less storage and requires a smaller upfront investment.
This is the game-changing alternative to the data warehouse industry’s traditional use of HDDs, which requires large numbers of drives to match or balance processing performance across the system. With an HDD-only approach, the only way to accelerate the performance of the same data is to limit how much data is stored on each HDD. That means buying a large storage system with more HDDs than the capacity alone requires so the performance requirements can be spread across them.
Further, organizations can be confident that they will realize this gain in performance per unit of data space, as Teradata Virtual Storage recognizes the performance characteristics of the SSDs and will proactively place “hot” data on them as the applications need it.
To get this level of visibility into the backend storage, Teradata Virtual Storage integrates with the NetApp E-Series modular storage systems. While there are other storage solutions available, the E-Series aligns with Teradata in two key ways. It delivers modular storage growth as a turnkey solution in a single enclosure that can be “right-sized” for the customer environment, and it provides the storage-level performance that Teradata demands.
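The general idea of usage-driven placement can be sketched in a few lines. This is a hypothetical illustration of ranking data by observed access frequency, not Teradata Virtual Storage's actual algorithm or API; the names `place_extents`, `access_log` and `ssd_slots` are invented for the example.

```python
from collections import Counter


def place_extents(access_log, ssd_slots):
    """Assign each observed extent to "SSD" or "HDD" by access frequency.

    access_log: iterable of extent ids, one entry per access (hypothetical input).
    ssd_slots:  how many extents fit on the SSD tier (hypothetical limit).
    """
    counts = Counter(access_log)
    # Hottest first; ties are broken by extent id so the result is deterministic.
    ranked = sorted(counts, key=lambda extent: (-counts[extent], extent))
    hot = set(ranked[:ssd_slots])
    return {extent: ("SSD" if extent in hot else "HDD") for extent in ranked}


if __name__ == "__main__":
    log = ["orders", "customers", "orders", "archive_2009", "orders", "customers"]
    for extent, tier in place_extents(log, ssd_slots=1).items():
        print(f"{extent}: {tier}")
```

Because the ranking is derived from actual accesses rather than from a preset schedule, placement follows the workload as it changes, which is the contrast with policy-driven tiering drawn above.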
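The sizing problem described here comes down to taking the larger of two drive counts: the number needed for capacity and the number needed for performance. The sketch below uses entirely hypothetical figures (a 20 TB warehouse needing 40,000 IOPS, 1 TB HDDs at 200 IOPS each, 0.5 TB SSDs at 20x that rate, and 90% of I/O aimed at the hot 25% of data) purely to show the shape of the calculation.

```python
import math


def drives_needed(capacity_tb, iops_required, drive_capacity_tb, drive_iops):
    """Drives required to satisfy both the capacity and the performance target."""
    for_capacity = math.ceil(capacity_tb / drive_capacity_tb)
    for_performance = math.ceil(iops_required / drive_iops)
    # Whichever requirement is larger determines the purchase.
    return max(for_capacity, for_performance)


if __name__ == "__main__":
    # HDD-only: capacity needs 20 drives, but performance needs 200.
    hdd_only = drives_needed(20, 40_000, drive_capacity_tb=1.0, drive_iops=200)

    # Hybrid: 5 TB of hot data and 36,000 IOPS on SSD, the rest on HDD.
    ssds = drives_needed(5, 36_000, drive_capacity_tb=0.5, drive_iops=4_000)
    hdds = drives_needed(15, 4_000, drive_capacity_tb=1.0, drive_iops=200)

    print(f"HDD-only: {hdd_only} drives")
    print(f"Hybrid:   {ssds} SSDs + {hdds} HDDs = {ssds + hdds} drives")
```

With these made-up inputs, the HDD-only system is sized by performance rather than capacity, so most of its drive space goes unused, while the hybrid system meets the same targets with far fewer drives.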
This architecture of the NetApp E-Series coupled with Teradata’s tight integration with the NetApp E-Series firmware gives Teradata a solution that meets an organization’s data warehousing requirements. It also continues to deliver the simplicity of management that organizations still expect their data warehouses to deliver even as they get a needed boost in performance.
SSDs have the potential to open up a plethora of new ways for data warehouses to solve some long-standing analytical challenges. But randomly deploying SSDs in data warehouse environments is no guarantee of success. Rather, SSDs require a coordinated approach between the data warehouse and storage providers to ensure a successful transition to this new environment.
Teradata Virtual Storage coupled with the NetApp E-Series hybrid storage puts organizations on this path to success, as it places data on the most appropriate tier of storage without placing any new burdens on organizations to manage it. But more important than keeping “hot” data on “hot” storage (SSDs) and “warm/cold” data on “warm/cold” storage (HDDs), Teradata ensures that organizations get the performance they now need while maintaining the no-hassle data warehouse experience to which they are accustomed.