Blockchain Technology Being Used to Protect Data as Opposed to Holding It Ransom

Blockchain technology holds the potential to dramatically enhance global commerce and every supply chain. Unfortunately, the first real-world experience many organizations have had with it is using Bitcoin, its best-known implementation, to pay a ransom to cybercriminals who have encrypted their company’s files. The good news is that vendors like Nexsan see the upside of blockchain and are using it for more noble purposes: protecting files stored on its Unity Active Archive appliances.

Blockchain technology is on the verge of becoming really big. As in HUGE big. In a 2016 TED Talk, Don Tapscott referred to it as the technology that is likely to have the greatest impact in the next few decades for one simple reason: it facilitates the creation of trust. In fact, in the video he calls it “the trust protocol.”

This brings me to Nexsan and its use of blockchain technology in its Active Archive product. Why does Nexsan use blockchain? So users can trust that when they go to retrieve a file, they know it will be available in its original, undefiled state.

In the case of the Unity Active Archive, whenever it ingests a file, it stores two copies of the file and generates two cryptographic file hashes or digital fingerprints. It stores those fingerprints separately, in a hardened private blockchain internal to the device.

These digital fingerprints are more than a “just-in-case” technology. Rather, they are used in automated file integrity audits that guard the data against silent data corruption. When the appliance discovers a mismatch between an original fingerprint and the fingerprint generated during an audit, it replaces the corrupted file with the other copy of the file from the archive’s object store.
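For readers who want a concrete feel for how fingerprint-based integrity auditing works in general, here is a minimal Python sketch of the technique, assuming SHA-256 fingerprints and two on-disk copies. It illustrates the general approach only; it is not Nexsan’s implementation.

    import hashlib
    import shutil
    from pathlib import Path

    def fingerprint(path: Path) -> str:
        """Compute a cryptographic digest of a file's contents."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def audit(primary: Path, replica: Path, recorded: str) -> None:
        """Re-hash the primary copy and self-heal from the replica on a mismatch."""
        if fingerprint(primary) != recorded:
            if fingerprint(replica) == recorded:      # the second copy is still intact
                shutil.copy2(replica, primary)        # repair the silent corruption
            else:
                raise RuntimeError(f"both copies of {primary.name} failed the audit")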

If you are like me, you are sick and tired of seeing criminals use the latest, greatest technologies like blockchain for nefarious purposes. Nexsan’s use of blockchain technology to protect files in its Active Archive is particularly satisfying to me on two levels. Not only does it provide a great way to verify file authenticity, it does it using the very technology that cybercriminals are using to get paid and avoid detection by authorities. I cannot think of a better way to capitalize on blockchain technology while playing turnabout on cybercriminals at the same time. Kudos to Nexsan for doing so!




Ransomware Detection and Event-based Backup Scheduling Lead the Acronis Backup 12.5 Feature Parade

It’s summertime, and nothing typifies it more in the United States than a parade on one of its summer holidays. Keeping with this tradition, the Acronis Backup 12.5 release rolls out a parade of new features that help differentiate it in a crowded market. Three features lead its parade and caught my eye because few other backup software products currently offer them: security software that authenticates preexisting backups, the flexibility to customize the names of archived backups, and event-based backup scheduling.

These days almost any product briefing with any backup software provider starts with some mention of how it deals with ransomware and, really, how can you blame them? Management is usually more aware of the quality of the coffee in the break room than of its company’s ability to recover from backup. Ransomware has changed all of that. Suddenly having viable backups from which one can quickly and easily recover from a ransomware attack has the attention of everyone from department level managers to executives in corporate boardrooms.

While it can be said with some level of certainty that any properly configured backup software product provides some level of protection against ransomware, the ability of each backup software product to do so varies. Further, ransomware is rapidly evolving. Acronis recently became aware of at least one iteration of ransomware that corrupts or infects older backups that reside on disk. In this variation, even possessing older backups may not guarantee a good restore since the ransomware may infect those backups.

This is how the Acronis Backup 12.5 feature parade begins: by setting itself apart using security software to authenticate backup files. Part of the new Acronis Active Protection feature, the security software actively monitors preexisting backup files for any changes and compares them to the original state of the backup. If unauthorized changes to older backups are detected, it creates an alert and restores the backup to its original state.
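Conceptually, protecting existing backups comes down to recording what each backup file looked like when it was written and then re-checking it on a schedule. The Python sketch below shows that baseline-and-verify pattern; it is my own illustration with an invented manifest format, not Acronis code.

    import hashlib
    import json
    from pathlib import Path

    def digest(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def baseline(backup_dir: Path, manifest: Path) -> None:
        """Record the original state of every backup file in a manifest."""
        state = {p.name: digest(p) for p in backup_dir.glob("*.bak")}
        manifest.write_text(json.dumps(state))

    def verify(backup_dir: Path, manifest: Path) -> list[str]:
        """Return the backup files that no longer match their recorded state."""
        state = json.loads(manifest.read_text())
        return [name for name, d in state.items()
                if digest(backup_dir / name) != d]
    # A real product would alert on anything verify() returns and restore a known-good copy.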

The second feature found in Acronis Backup 12.5 focuses less on engineering and more on Acronis listening to its customer base. Its beta testing for 12.5 was done by nearly 3,000 of its customers. Feedback from that testing included a desire to give archived backups more meaningful names. Up to that point, archived backups were given an automated, predetermined name assigned by Acronis Backup. In version 12.5, users have the flexibility to assign names to archived backups that are easier to identify and to use with applications other than Acronis.

The third feature up in the Acronis Backup 12.5 parade of features is its introduction of event-based backup scheduling. Almost every administrator has had a moment of concern or regret after decommissioning, patching, or upgrading a server and realizing that no one thought to back it up first in case a failback or recovery became necessary. Event-based backup scheduling takes that worry off their plate. Once this feature is configured in Acronis Backup 12.5 for protected servers, Acronis detects when one of these activities is initiated and completes a backup before the server is decommissioned, patched, or upgraded.
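The idea behind event-based scheduling can be reduced to a pre-change hook: the disruptive action is only allowed to run once a safety backup has completed. The sketch below is a generic Python illustration of that pattern; run_backup and the event names are placeholders, not Acronis’s API.

    def run_backup(server: str) -> None:
        # Placeholder: a real implementation would call the backup product's API or CLI.
        print(f"backing up {server} ...")

    def with_pre_event_backup(server: str, event: str, action) -> None:
        """Take a backup of a server before a decommission, patch, or upgrade proceeds."""
        print(f"{event} requested for {server}; taking a safety backup first")
        run_backup(server)
        action()   # the disruptive step only runs once the backup has completed

    with_pre_event_backup("db01", "patch", lambda: print("applying patch to db01"))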

Young or old, everyone seems to love a parade in the summer as they watch the participants and anticipate the next float coming around the bend. The Acronis Backup 12.5 release contains a parade of features with three of them providing organizations new benchmarks by which to measure backup software in both innovative and practical manners. Whether it is helping companies to better detect and protect against ransomware or giving them better, more practical means to manage backups in their environment, Acronis Backup 12.5 provides some of the key new features that organizations need in today’s rapidly evolving world of data protection.




DCIG Quick Look: NGD Systems 24TB Catalina SSD Puts Tape on Notice for Active Archive Use Cases

Each passing week seems to bring new use cases for solid state drives (SSDs) further to the forefront and brings into question the viability of disk and tape for them. This week was no exception. The announcement of the NGD Systems 24TB Catalina SSD directly targets use cases such as active archive, where tape predominates but where the 24TB Catalina SSD emerges as a potential replacement.

The use cases for flash, disk, and tape in today’s enterprise data centers largely break down as follows:

  • Flash is the preferred medium for production data
  • Disk is the preferred medium for primary backups
  • Disk and/or tape are the preferred medium for long term data retention and archives

That said, SSD manufacturers have these existing use cases for tape clearly in their sights, with the NGD Systems 24TB Catalina SSD specifically taking aim at the active archive use case. In this situation, organizations typically store large amounts of data that never changes but that they frequently reference and access. The best examples are media-related content such as audio, photographs, or video.

Once this type of content is in its final production format, organizations may never want it changed. Further, they may prefer to store it on low-cost, high-performance storage media that is optimized for read-intensive applications.

This is where the NGD Systems 24TB Catalina SSD fits nicely for the following two reasons.

  1. Acceptable trade-off between power consumption and performance. SSDs may never equal tape cartridges in terms of their ability to consume zero energy. Conversely, tape cartridges will probably never match SSDs in their ability to deliver high levels of performance. The Catalina SSD, with its ability to deliver 24TB of capacity at less than 0.65 watts per terabyte, provides companies with a very reasonable trade-off between power consumption and performance.
  2. Smaller data center footprint. The Catalina SSD packs 24TB of storage capacity onto a single PCIe card. This is four times the uncompressed capacity of a single LTO-7 tape cartridge. While LTO-7 tape cartridges promote compressed capacities of 15TB, the 6TB uncompressed capacity is more meaningful in use cases such as media. The combination of storing 24TB of data on a single PCIe SSD coupled with the performance that flash offers over tape makes the Catalina SSD very compelling in these use cases (the quick arithmetic after this list spells out both points).
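For reference, here is the simple arithmetic behind both points, using only the figures cited above (0.65 watts per terabyte, 24TB per PCIe card, and 6TB of native capacity per LTO-7 cartridge):

    capacity_tb = 24
    watts_per_tb = 0.65
    lto7_native_tb = 6

    print(capacity_tb * watts_per_tb)     # ~15.6 watts to power a fully loaded 24TB Catalina card
    print(capacity_tb / lto7_native_tb)   # 4.0 LTO-7 cartridges of native capacity per card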

Despite these two benefits that the Catalina 24TB SSD offers, widespread adoption for it may still be some time off. A visit to NGD Systems’ website will make any enterprise wonder about the viability of either the product or its company. When I last checked its website the morning after I received its press release, the website listed neither the Catalina SSD nor the press release. This lack of visibility into the product and the company will certainly give organizations pause.

Second, the press release indicates that this product, while shipping, is still in qualification with OEM providers. Further, in speaking with NGD Systems CEO Nader Salessi, he indicated that the SSD will initially be placed and used in servers. Large enterprises storing hundreds of terabytes or potentially petabytes of data will likely want to deploy purpose-built storage appliances qualified with the Catalina SSDs. Until that qualification occurs, look at these 24TB SSDs to foretell the future of active archive in the next couple of years, but do not expect them to show up on your data center floor as a replacement for tape any time soon.




Three Tips for Breaking through the Backup Complexity of Today’s Enterprises

Perhaps nowhere does the complexity of the IT infrastructure within today’s organizations come more clearly into focus than when viewed from the perspective of data protection. Backup and recovery software sees firsthand all of the applications and operating systems in an enterprise’s environment. Yet, at the same time, it is expected to account for this complexity by centralizing management, holding the line on costs, and simplifying these tasks even as it meets heightened end-user demands for faster backups and recoveries. To break through this complexity, there are three tips that any organization can follow to help both accelerate and simplify the protection and recovery of data in their environment.

A Firsthand View of Backup Complexity in Today’s IT Infrastructure

Backup software literally touches and interacts with nearly every application, file system and/or operating system that an organization possesses. Though the exact applications, file systems and operating systems that an organization has in-house may vary, it can safely be said that the larger the organization, the more likely it is they have applications other than those from Microsoft, operating systems other than Linux and Microsoft Windows and at least two hypervisors. Despite this level of complexity, organizations expect the backup software to unobtrusively protect them, running in the background.

To do so, application performance must remain unaffected and backups must occur successfully and complete within designated backup windows (if a window even exists). Further, organizations want backup software to scale to meet the data protection needs of their increasingly virtualized environment, handle the protection of their legacy physical environment and provide near instantaneous recoveries. Lastly, they want backup software to deliver all of these features while remaining simple to deploy and manage.

Backup software sits in the crosshairs of these growing organizational expectations with decreasing margins for error. As such, organizations need to choose the right backup solution that meets their heightened expectations for non-disruptive backup, fast recoveries and comprehensive data protection. Here are three tips that organizations can follow to pick the right solution to more quickly and easily back up and recover their increasingly complex IT infrastructures.

Tip #1 – Eliminate Backup Windows and Shorten Recovery Times

Backup software has become, if nothing else, all about speed in both backup and recovery. Organizations are less inclined than ever to tolerate a prolonged interruption in application performance while a backup occurs, nor are they prone to twiddle their thumbs for hours on end while a restore occurs.

Data protection software now delivers on these heightened expectations for faster backups and recoveries. For example, the recent Symantec NetBackup 7.6 release includes its Accelerator for VMware feature. This feature capitalizes on VMware vSphere’s Changed Block Tracking (CBT) feature, which NetBackup has supported for years, to accelerate and provide near real-time backup of virtual machines (VMs).

NetBackup Accelerator (Source: Symantec)

Leveraging Accelerator for VMware, NetBackup 7.6 nearly eliminates backup windows. By identifying the appropriate deduplicated blocks associated with each VM stored on the NetBackup media server and then synthesizing them, NetBackup 7.6 creates a recoverable full backup image of the VM.

To then restore the VM, an administrator only needs to present this full backup image of the VM to the VMware vSphere ESXi server on which it is to be recovered. vSphere may then boot the VM and make it live even while the VM still resides on backup storage. Once up and running, vSphere may then vMotion the VM from backup storage to the production VMware vSphere ESXi server. The result is that data can be made available to users within the time it takes to boot a VM, typically within a few minutes.
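For readers unfamiliar with synthesized full backups, the Python sketch below illustrates the general idea: overlay the blocks captured through changed block tracking onto the blocks already held from the previous full image to produce a new, fully recoverable image. This is a conceptual illustration of the technique, not NetBackup code.

    def synthesize_full(previous_full: dict[int, bytes],
                        changed_blocks: dict[int, bytes]) -> dict[int, bytes]:
        """Merge CBT deltas into the prior full image to create a new full image."""
        new_full = dict(previous_full)   # start from the blocks already on the media server
        new_full.update(changed_blocks)  # overwrite only the blocks the VM changed
        return new_full

    prior = {0: b"boot", 1: b"data-v1", 2: b"logs"}
    delta = {1: b"data-v2"}              # only block 1 changed since the last backup
    assert synthesize_full(prior, delta) == {0: b"boot", 1: b"data-v2", 2: b"logs"}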

Tip #2 – Store Archive Data in the Cloud

The tipping point for storing more backup data with public storage cloud providers has arrived for at least two reasons. First, the economics of storing data online make more sense than ever. While the price per GB for online storage is still more than disk or tape, enterprises can minimize or eliminate their capital and operational costs associated with acquiring the infrastructure needed to store and maintain their backup data long term.

Minimally, enterprises should look to store most if not all of their archive copies (backup data that is to be retained 90–180 days or longer) with public storage cloud providers. Aside from the flexibility of having these copies immediately available and accessible online should they ever need them, standard storage rates for Amazon Web Services (AWS) start at 3 cents per GB per month. Further, if one opts to use Amazon’s Glacier storage, rates drop to as low as a penny per GB per month, though there are performance trade-offs associated with using this tier of storage, mostly because of network bandwidth limitations.
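A quick back-of-the-envelope calculation using the rates quoted above shows what those tiers mean in practice; the 50TB archive size is just an illustrative assumption.

    archive_gb = 50 * 1000                     # example: a 50TB archive expressed in GB
    standard_rate, glacier_rate = 0.03, 0.01   # $/GB/month figures cited above

    print(archive_gb * standard_rate)          # $1,500 per month at the standard AWS rate
    print(archive_gb * glacier_rate)           # $500 per month on Glacier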

Symantec Backup Exec 2014 and NetBackup 7.6 already give enterprises this flexibility to connect to AWS. Symantec Backup Exec 2014 provides Amazon Cloud Gateway VTL support so organizations may store archived backups in the Amazon cloud, while NetBackup 7.6 offers its Connector feature to store and retrieve backup data directly from AWS.

Second, public cloud storage and backup providers are partnering to provide new options to recover applications in their cloud once the data is stored there. Symantec recently announced its Disaster Recovery Orchestrator (DRO) to complement its backup and recovery solutions. DRO provides automated takeover and failback of Microsoft Windows applications residing on either physical or virtual machines to the Microsoft Azure cloud.

Tip #3 – Verify the Provider Offers an Integrated Backup Appliance

A story that was recently shared with me aptly illustrates why backup software also needs to be available as an integrated appliance. An organization had acquired two new NetBackup media server licenses but after two months the software was still “shelfware.” The problem? The server and security teams were stretched thin and did not have the time to approve the installation of the software on the new servers much less get the backup software itself up and operational.

To expedite the deployment of the NetBackup media servers, the organization instead opted to acquire two NetBackup 5230 integrated backup appliances. This solved the problem. The NetBackup appliances were ordered, shipped, installed, configured and running within three weeks, as the appliance approach reduced the internal approval processes that the backup team needed to go through while also reducing the workload for both the security and server teams.

Clearly other organizations are seeing the same benefits of acquiring Symantec’s Backup Exec 3600 and NetBackup 5230 appliances in lieu of just buying the software and then doing the installation and configuration themselves. The IDC Integrated PBBA Revenue and Market Share report reveals that Symantec integrated backup appliances have captured 36 percent market share as of 2013.

2013 Integrated PBBA Market Share (Source: IDC 2014Q1 Worldwide Purpose Built Backup Appliance Quarterly Tracker)

Aside from speeding the configuration and deployment of backup software, backup appliances also simplify its ongoing maintenance and upkeep. Since the hardware is delivered to an organization as a known quantity, Symantec can more effectively pre-test all patches, updates and upgrades before they ship. In turn, this gives organizations more confidence to apply patches as they come out and/or upgrade their backup software to the most current version to use its newest features.

The Bottom Line

IT infrastructure complexity has become almost synonymous with today’s organizations. However, that does not mean that backup and recovery software has to be as complex as the environment that it protects. By using solutions such as Symantec Backup Exec 2014 and NetBackup 7.6, organizations can take significant strides toward eliminating their backup and recovery windows, centralizing the management of their physical and virtual backups, and simplifying the deployment, configuration and ongoing maintenance of their backup software using integrated backup appliances, all while laying the foundation to more effectively protect and recover their enterprise applications going forward, locally or in the cloud.




ZL Technologies Brings Virtual Scale-out to Enterprise Archiving and eDiscovery

One of the most exciting and terrifying times in the lifecycle of a company is the transition from small to midsize, or from midsize to enterprise. Well-led companies that survive those transitions have often been planning for the occasion for some time. The longer they have been planning, the more likely they’ve become aware of the need for long term archiving. Of everything.

In some circumstances companies may find that in order to meet compliance or litigation standards every instant message, document, revision of a SharePoint archive, blog or email in the company may need to be retained. When viewed from this perspective, the volume of data created by even a small company quickly becomes staggering. Worse, if left unclassified all that data becomes next to worthless.

Clearly, the sooner companies in circumstances such as these adopt systems and processes to capture and classify all that data, the better off they will be. The trick is identifying the processes and infrastructure that can scale with these businesses as they grow.

An entire class of software products has been built to meet this herculean task. One of the most feature-rich is ZL Technologies’ flagship product, Unified Archive. Companies with large data retention requirements that are looking to grow should consider adding ZL UA to their list of candidates.

The data stores ZL UA can manage range from a small workgroup with a few hundred gigabytes under management to several petabytes. In one case study ZL states that a “top 5 US bank” has over 165,000 mailboxes under management.

Deployments of such wildly different scale are made possible by what ZL refers to as “ZL GRID.” This refers to ZL’s utilization of virtualization, clustering and other “cloud computing” techniques. The use of these techniques provides customers with a great deal of flexibility, scalability and performance.

Virtualization promotes flexibility and scalability by allowing the administrator to dynamically add and remove processing resources. Think of the approach as a variation on the old saying “Many hands make light work.” The system works by running a number of smaller independent components that break tasks up into small chunks and then spread the work over several virtual machines.

Because the virtual machines work in a cluster, the work of the cluster as a whole is not interrupted as VMs are added or removed from the cluster. This “cloud” approach allows for permanent long-term scaling as a company grows, but also enables it to adjust to temporary surges in demand or even one-time jobs such as ingesting a large legacy data store.

When demand increases, additional virtual machines are brought online to meet it. As the peak subsides, the extra virtual machines can be shut down, returning their resources for use elsewhere.

A common scenario would be during the “ingestion” of a legacy archival system into the new ZL data store. Ingestion is the process of importing existing data into Unified Archive. This is a very resource-intensive process, as each document, email or similar item of structured data is indexed, categorized and deduplicated for storage.
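At a conceptual level, ingestion is a pipeline: each item is fingerprinted, deduplicated against what the archive already holds, indexed, and then stored. The Python sketch below illustrates that flow in miniature; it is not ZL’s implementation and the field names are invented.

    import hashlib

    archive: dict[str, bytes] = {}        # content store keyed by fingerprint
    index: list[dict] = []                # searchable metadata index

    def ingest(doc_id: str, body: bytes, category: str) -> None:
        fp = hashlib.sha256(body).hexdigest()
        if fp not in archive:             # deduplicate: store each unique body only once
            archive[fp] = body
        index.append({"id": doc_id, "category": category, "fingerprint": fp})

    ingest("msg-001", b"quarterly results attached", "email")
    ingest("msg-002", b"quarterly results attached", "email")   # duplicate body, new index entry
    assert len(archive) == 1 and len(index) == 2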

To prevent any user performance impact, administrators could temporarily add additional virtual appliances, perhaps on non-production hardware, to handle the load of the ingest process. During a spike in user load, the ingestion machines could be suspended and user-facing VMs could be powered up. After the user load returns to normal, the ingest VMs could be restored, continuing from where they left off.

Scaling is not just for ingest though. Each component in ZL’s architecture can be scaled separately. For instance, more VMs can be assigned to the eDiscovery component when legal is creating heavy traffic in the morning. Those VMs can be released when unneeded and then additional VMs spun up for the compliance engine in the afternoon. Clearly, such flexibility to focus resources where they are most needed is a great advantage over legacy systems.
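The scheduling logic behind this kind of elasticity can be reduced to a simple rule: compare each component’s backlog to the capacity of a single worker VM and size the component accordingly. The Python sketch below is a conceptual illustration of that rule, not ZL GRID code; the per-worker capacity figure is an arbitrary assumption.

    def rebalance(backlog: dict[str, int], per_worker_capacity: int = 100) -> dict[str, int]:
        """Return how many worker VMs each component should run for its current backlog."""
        return {component: max(1, -(-queued // per_worker_capacity))   # ceiling division
                for component, queued in backlog.items()}

    # Morning: legal drives heavy eDiscovery traffic; afternoon: the compliance engine takes over.
    print(rebalance({"ediscovery": 950, "compliance": 40}))    # {'ediscovery': 10, 'compliance': 1}
    print(rebalance({"ediscovery": 60, "compliance": 800}))    # {'ediscovery': 1, 'compliance': 8}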

High-availability clustered computing systems such as ZL GRID have the further advantage of being very resilient. If one or more nodes are lost the rest of the cluster is able to identify that they are not responding and reprocess the work that was lost. This often occurs without any impact to the user. This failover can even occur over multiple geographic locations.

For instance, in most cases companies will elect to have their most frequently used data stored locally. As the data ages it is moved further away, often to an off-site location or into the semi-public cloud. ZL GRID can be configured to failover to a remote site to prevent service interruption, though at reduced performance.

ZL Technologies makes no bones about the fact that it is targeting enterprise customers. Unified Archive is one of the most feature-rich platforms on the market and not every company can fully utilize its power. However, businesses that have the need and are expecting growth will have a hard time saying no to a system that can scale to meet any challenge thrown at it.




Spectra Logic Outlines What It will Take to Transform Tape from Being an “Outie” to an “Innie”

Anyone involved with managing any serious amounts of data (and when I say “serious amounts of data,” I mean multiple PBs of data) knows that today’s disk-based storage solutions are, for the most part, not equipped to meet the diverse requirements of storing this amount of data. While still an extreme use case, a growing number of organizations have to manage PBs of data.

As such, they need a new type of storage solution – or an old storage solution with a new interface – that addresses this particular situation. While attending Spectra Logic’s analyst conference this week in Broomfield, CO, it presented what it considers the six attributes that tape must offer to transform itself from being on the “outs” with organizations to being viewed as the “in” technology again.

Driving the need for this new type of deep storage solution is an entirely new set of requirements being imposed on organizations. While most organizations still use – and will continue to use – storage systems intended for use by databases and file servers into the foreseeable future, the new world of mobile devices and multimedia has its own particular storage needs that these traditional disk-based storage systems are not so well positioned to meet. These workloads require storage solutions that are:

  • Extremely low cost
  • Power efficient
  • Dense
  • Reasonably responsive

While these storage requirements of mobile devices and multimedia would seem to align perfectly with tape, tape has been on the “outs” with organizations for one major reason: it has an “old world” interface. Programmers no longer use block and file protocols when writing code – they write to object interfaces and manage data as objects. Currently tape systems have no such interface to meet these new demands.

This is not to imply tape storage vendors are ignorant of this deficiency in their tape systems. In fact, this is a big part of the reason why Spectra Logic brought a gaggle of analysts to its fall analyst event in Broomfield, CO – to share with us what it considers the six attributes that a deep storage solution such as a tape library must offer to go from being an “outie” with organizations to once again being an “in” storage technology.

  • Provides a REST interface. REST is an interface style built on HTTP that most programmers are already familiar with and can readily write code against. By offering a REST interface, tape libraries become a viable target to store data (see the sketch after this list).
  • Persistent. The data storage solution must essentially maintain data forever. This includes transparently copying data to new media types, doing bit error detection and correction, and ensuring that the integrity of the data is as viable 30, 50 or even 100 years from now as the day it was written.
  • Cost-effective. This refers to both the system’s up-front cost and its cost over time. While tape and disk are arguably now about on par in terms of the upfront cost per GB, tape beats disk’s costs hands down over time. Further, a new generation of high capacity tape technologies is about to emerge that could re-establish tape’s lead over disk in its upfront cost per GB as well.
  • Energy efficient. Organizations are more sensitive than ever about ongoing operational costs, and for data that is rarely accessed they want storage solutions that minimize the impact to the bottom line over time.
  • Encryption. While this may not be a requirement in every environment, any organization housing sensitive personal information will need to encrypt it and have a means to manage the encryption keys for the life of the data.
  • Easy to deploy. This covers a number of bases, including providing web interfaces to access and manage the solution, keeping setup simple, and avoiding low-level block and file protocols as the only means to store data on it.
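To make the first attribute concrete, here is the shape of a RESTful object write as a programmer would see it, sketched in Python with the requests library. The endpoint, bucket and object names are hypothetical; this is not Spectra Logic’s actual API.

    import requests

    # Hypothetical deep-storage endpoint fronting a tape library.
    url = "https://archive.example.com/buckets/video-masters/episode-042.mov"

    with open("episode-042.mov", "rb") as f:
        resp = requests.put(url, data=f)        # store the object via a simple HTTP PUT
    resp.raise_for_status()

    meta = requests.head(url)                   # later: confirm the object is still there
    print(meta.headers.get("Last-Modified"))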

This is Spectra Logic’s outline for what storage solutions in general and tape libraries specifically must deliver to evolve to meet the next generation of end-user demands as well as remain relevant in today’s disk-centric world. Based on what else I saw and heard during the rest of the analyst conference, and which will be formally announced by Spectra Logic tomorrow, Spectra Logic clearly took these principles to heart as it prepares to transform how people should view and deploy its tape solutions.




DCIG 2013 Private Cloud Storage Array Buyer’s Guide Now Available

DCIG is pleased to announce the availability of its 2013 Private Cloud Storage Array Buyer’s Guide that weights, scores and ranks over 150 features on 25 different cloud storage arrays from 15 different providers. This Buyer’s Guide provides the critical information that organizations of all sizes need when selecting a private cloud storage array that provides the availability, ease-of-use, flexibility and scalability features to meet the demands of their most data-intensive applications.

In October 2011 IDC estimated that the total amount that enterprises spent for on-premise storage in 2010 was around $30.8 billion. In that same interview IDC also forecast that expenditures on cloud storage solutions (both public and private) would explode to $11.7 billion by 2015. In other words, in just five (5) short years organizations would reallocate anywhere from 25 to 33 percent of their total storage budget and invest it in cloud storage technologies.

While no one yet knows precisely what percentage that companies will spend on either public or private cloud storage solutions, DCIG anticipates that in the near term (0-5 years) organizations will invest the majority of this new line item in their storage budget in private cloud storage arrays.

ApplicationContinuity.org found in a separate survey of 3,300 midmarket companies that 90 percent of IT managers felt it was critical to keep their key applications and data in-house and out of the public cloud. Yet these organizations still want the benefits that public storage clouds offer; they just do not want the inherent risks and uncertainties that they perceive public storage clouds still present.

This is where private cloud storage arrays enter the picture. They provide the key features that organizations need today more than ever as they enable organizations to:

  • Achieve high levels of availability
  • Easily configure and manage these systems so that as the private cloud storage array grows larger the management of it does not become more complex
  • Non-disruptively perform routine maintenance and upgrades
  • Perform reliably
  • Scale predictably
  • Start small with only the capacity and performance that they need with the flexibility to grow larger

The ability of private cloud storage arrays to deliver on all of these features is particularly important for those organizations that opt to implement private cloud storage arrays to host many and/or all of their production applications. When deployed into these environments, any planned or unplanned downtime or disruption in service at any time for any reason can have potentially catastrophic consequences for the entire business.

It is in this context that DCIG presents its 2013 Private Cloud Storage Array Buyer’s Guide. As prior Buyer’s Guides have done, it puts at the fingertips of organizations a comprehensive list of private cloud storage arrays and the features they offer in the form of detailed, standardized data sheets that can assist them in this important buying decision.

The DCIG 2013 Private Cloud Storage Array Buyer’s Guide accomplishes the following objectives:

  • Provides an objective, third-party evaluation of currently available private cloud storage arrays
  • Evaluates, scores and ranks private cloud storage arrays from an end-user’s perspective
  • Includes recommendations on how to best utilize this Buyer’s Guide
  • Provides data sheets on 25 private cloud storage arrays from fifteen (15) different providers so organizations may do a quick comparison of features while having sufficient detail at their fingertips to make an informed decision
  • Provides insight into each private cloud storage array’s support for various applications, the robustness of its hardware, its management and replication capabilities, its integration with VMware vSphere and what levels of support it offers

The DCIG 2013 Private Cloud Storage Array Buyer’s Guide Top 10 solutions include (in alphabetical order):

  • Coraid EtherDrive
  • Dell Compellent Storage Center SAN + FS8600 NAS
  • EMC Isilon NL-Series
  • EMC Isilon S-Series
  • EMC Isilon X-Series
  • IBM SONAS
  • IceWEB 6500 Series
  • NetApp FAS3250
  • NetApp FAS3220
  • Nimbus Data E-Class Flash Memory

The NetApp FAS3250 achieved the “Best-in-Class” ranking among the private cloud storage arrays that DCIG evaluated, with its companion FAS3220 close on its heels. The NetApp FAS3250 and FAS3220 successfully deliver the broadest range of features for those organizations looking for a single array that may be used in multiple roles.

Small and midsize enterprises will find that the FAS3250 and FAS3220 offer the breadth of features in a single system to meet their various application needs ranging from data-intensive, cost-sensitive applications to performance-intensive, highly available ones with the only measurable differences between the two models being the FAS3250 offering more capacity and performance.

In doing its research for this Buyer’s Guide, DCIG uncovered some interesting statistics about private cloud storage arrays in general:

  • 100% support a federated management interface from which all storage arrays in an environment can be managed from a single, common portal
  • 89% support thin provisioning
  • 82% license their standard management software as part of the system’s acquisition cost
  • 80% support storage tiering
  • 67% support more than 1 PB of storage capacity
  • 60% support connectivity to a public cloud storage provider with OpenStack being the most widely supported option
  • 56% support VAAI
  • 48% support NDMP
  • 41% support some type of deduplication technology
  • 26% support more than 1 exabyte of storage capacity

The DCIG 2013 Private Cloud Storage Array Buyer’s Guide is immediately available. It may be downloaded for no charge with registration by following this link.




The Perfect Storm for Archiving is Forming Right Now

Digital archiving suffers from a perception problem, though one that is probably well-deserved. Because it is perceived as difficult to cost-justify, hard to implement, and as offering benefits that can often be achieved by simply throwing more disk at the problem, most companies have had a hard time justifying its deployment. However, a wave of fundamental changes in the storage industry as a whole and in digital archiving technology itself is setting this technology up to be one of the hottest technologies in the months and years to come.

To say that no one invests in archiving would be at best inaccurate. A 2012 IDC worldwide storage software market report highlighted that even as the growth of storage software in general remains flat, archiving software was a bright spot, coming in #2 behind only data protection and recovery software in terms of its year-over-year growth. However, the #2 spot only translated into $404 million of total revenue – not too shabby, but certainly nowhere near what organizations collectively spend annually on storage hardware.

This is poised to change. A number of trends occurring in the storage industry specifically and in the broader computing industry are setting the stage for archiving to play a much larger role in all size organizations beginning in 2014. The specific trends that are setting up archiving to assume this broader role include:

  • Archiving appliances. One of the obstacles associated with deploying archiving is simply getting the archiving software configured and deployed. Software, servers and storage must all be acquired and then configured. To do so, organizations either hire consultants or have someone do it in-house – all of which takes time and typically results in custom implementations that can be difficult to manage and support.

Archiving appliances such as the Nexsan Assureon (now part of Imation) include both software and hardware. These appliances expedite setup and configuration, ship with default policies so organizations can immediately implement best practices when setting up archiving and offer sufficient scalability to hold hundreds of TBs of data if not petabytes of data.

  • Centralized data stores. Archiving really only works well when data is centrally stored. While archiving has enjoyed some success in database (Oracle, Sybase, SQL Server) and email (Exchange and Lotus Notes) environments, for it to gain broader adoption organizations also need to centrally store their file data. As more organizations centralize the storage of their files on file servers, it becomes both more practical and cost-effective to archive the data residing on these file repositories, as much of this data is inactive (60–95% depending on whom you read) and rarely or never accessed.
  • Flash memory and hybrid storage systems. The future of primary storage systems is definitely storage systems with flash memory or some combination of flash and hard disk drives (hybrid). The potential downside associated with either of these systems is that the cost of storage capacity is anywhere from 3–20x what storage costs on HDDs now. Further, these flash and hybrid systems make the traditional approach to handling data growth (throwing more disk drives at the problem) impractical. By implementing archiving, only the most active and/or performance-intensive data needs to reside on flash and hybrid systems, with the rest moved off to archival storage.
  • “Infinite” retention periods. Some vendors promote the idea of deleting data and attorneys may even advise their own corporations to delete data that they are no longer legally bound to keep.

However, the individual who actually has to delete data often feels like the individual who has to push the button to launch a nuclear weapon. That person may receive the order to push the button (or in this case, delete the data), but also knows that if anything goes wrong or it is determined later that the data is needed, he or she will more than likely be blamed for deleting it. (I know because I was once this person.)

Archiving gives organizations the flexibility to economically keep data much longer, potentially even forever using the latest optical technologies, while individuals do not have to worry about losing their job for deleting the wrong data.

  • Public storage clouds to create “infinite” storage capacity. In November 2012 NetApp announced its NetApp Private Storage Cloud for Amazon Web Services (AWS) Direct Connect so that users of NetApp storage can transparently move data stored on NetApp filers to a back end Amazon cloud. According to various individuals within NetApp, this alliance has created more interest among its customer base than almost anything else it has announced in its history. The appeal of this type of solution is that organizations can theoretically store as much data as they want on a file server as the file server essentially now acts as both an infinite storage pool and an archive.
  • Too much data to backup and restore. A question that does not get asked nearly enough is, “How do you quickly back up or restore tens of TBs, hundreds of TBs or even PBs of data?” The answer is you don’t (the quick arithmetic after this list shows why). By using a robust archiving solution, data is moved off of primary storage so the data that does remain in production can be backed up within established backup windows. Once data is in the archive, two or maybe three copies of that data are made with the archive itself backed up once or twice. Further, once data is in the archive, a robust archiving solution will continually monitor and check on the integrity of the data in the archive and then repair it should irregularities in the data be detected.
  • Unstructured data growth. Companies are creating data from a variety of sources. While humans create much of it, machine-generated data is quickly becoming the largest source of data, as it may come in from multiple sources 24 hours a day, 7 days a week. The questions organizations then struggle with are, “How valuable is this data?” “When in the data’s life cycle is it most valuable?” and “Will it have value again (and again)?” Archiving provides a cost-effective means to keep this data around and easily accessible until such determinations can be made.
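The arithmetic behind the “too much data to back up” point is sobering. The Python snippet below assumes, purely for illustration, a sustained backup throughput of 1GB per second, which is optimistic for many environments:

    throughput_gb_per_sec = 1            # assumed sustained backup rate
    for tb in (10, 100, 1000):           # 10TB, 100TB, 1PB
        hours = tb * 1000 / throughput_gb_per_sec / 3600
        print(f"{tb}TB takes about {hours:.0f} hours")
    # roughly 3 hours, 28 hours and 278 hours (over 11 days) respectively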

Archiving has suffered from a perception problem for years if not decades, in large part because it has been difficult to cost-justify and hard to implement and manage. In fact, it often felt like archiving vendors were trying to fit a square peg into a round hole.

That analogy no longer applies. A combination of trends is coming together to form a perfect storm that will push archiving to the top of the technology heap and make it one of the more needed and sought-after storage technologies for organizations of all sizes.




The Real World PST Problem and How to Deal with It; Executive Interview with C2C Systems CTO Ken Hughes Part VI

Everyone frequently talks about archiving data when they know what the make-up of the data is and where it is located. But what no one wants to discuss is the more common real-world problem of not even knowing where data is so it may be archived – especially as it pertains to Outlook PST files. In this sixth and final blog entry in my interview series with C2C Systems’ CTO Ken Hughes, he talks about the real world problem of finding and archiving PST files in organizations and how ArchiveOne takes that into account in its architecture.

Charles: How does C2C handle archives such as PSTs that are already out there? Are you relying on the existing organization’s infrastructure to access that?

Ken: PSTs are a significant problem. Many people try to avoid or hide the scale of the problem that they have. Most of our competitors will just say, “If you want to ingest a PST, you have got to find it. You have to move it to a location where our product can ingest it.”

This is not reality. C2C is talking to some companies today with 100,000 employees who believe they have a million PSTs. A million PSTs at a GB each is a PB of data. That is a huge amount of data.

Further, if you have the normal employee turnover of about 15 percent, which is roughly the industry norm across all industries, then that is going to create 2.5 times the head count of your existing head count today over a 10 year period.

This means if an organization’s policies were, as an employee left the organization, to ingest all of that employee’s data into a PST and then put it somewhere, the organization will have a lot of PSTs scattered around its network.

Then factor in backup, unassociated PSTs, corruption, overhead, managing them, and storing them, and it becomes overwhelming. C2C knows customers who believe that 50 percent of their SAN is taken up with PSTs. Those PSTs have been created because of their users trying to get around the mailbox quotas imposed by the company.

So companies have completely shot themselves in the foot and now cannot manage the PSTs they have. They cannot look inside them because there is no simple tool to look inside them. Further, should any eDiscovery request come along, it is a nightmare.

Even with people moving to Office 365 or Exchange 2010, they still cannot manage the PSTs. Although Office 365 can ingest PSTs, organizations still cannot go and find them, they still cannot manage them, they still cannot look at what is inside them to make the decision of whether or not the organization wants to keep the data.

C2C has complete PST management within ArchiveOne. But C2C has also split some of it out into its PST Enterprise product. This product only goes out and looks at and does management of PST data.

Over the last couple of years C2C has been particularly talking to companies of 100,000 or more users that have somewhere in the region of 250,000 to 400,000 PSTs. Around 30 to 40 percent of those PSTs are on desktops or laptops, with a good number of those PSTs unassociated and companies having no idea who these PSTs belong to. So they have to find out who these PSTs belong to, since before you can understand what you want to do with that data, you first have to find it.

In the eDiscovery world there are lots of people on the right-hand side of the EDRM spectrum moving to the left. C2C is solidly in that information management and identification space, along with the ability to preserve and collect data.

C2C wants to preserve it and collect it. We do not care about the location. If you can see the archives, then that is fantastic. But C2C’s view is that seeing the PST archives is just not real world. A lot of that data is simply scattered around the network.

C2C goes out and finds it, takes a copy of it, preserves it, and then allows organizations to make the decision about how to best manage it without your legal counsel sweating over whether or not that data has been deleted. So can C2C really find it? The answer is yes. It is not difficult for us.

C2C sees archiving and eDiscovery and retention management as a balance. It is not a case of archive it first and then we can do eDiscovery and then we can do retention. It’s all about the balance.

Charles:  Are you selling this solution as a separate product? Or is that part of your overall installation?

Ken:  It is ArchiveOne. When you buy ArchiveOne, you buy all the discovery modules, you get the PST management, and you get the retention. Granted, there are some bolt-on extras that make this data available to the general counsel through a simpler user interface.

C2C is just trying to build on its ease of use for the end user. C2C’s focus has always been trying to make accessing data as transparent as possible. C2C has given organizations the exact same experience they get with Outlook Web Access or with normal Outlook to browse through their archives. This is what makes our solution completely seamless, since all of the archive data is accessible within Outlook as well.




2013: ARM, SSD and Common Slot Servers

Bad news is only bad until you hear it, and then it’s just information followed by opportunity. Information may arrive in political, personal, technological and economic forms.  It creates opportunity which brings people, vision, ideas and investment together.  When thinking about a future history of 2013 opportunities, three (3) come to mind:

  • Solid state storage
  • 64bit ARM servers
  • Common slot architecture for servers

While two of these are not new by themselves, an amalgamated version of them is a recipe for necessity. The most novel of the three is common slot architecture for servers. Common slot architecture allows an Intel or AMD x86 CPU and a Samsung or Calxeda ARM CPU to be plugged into the same board and system. But, let’s start by looking at solid state storage’s impact on storage architecture. It can eliminate or mitigate at least four (4) storage architecture constraints:

  1. IOPS – Inputs/Outputs per Second
  2. Latency – The time between when the workload generator makes an IO request and when it receives notification of the request’s completion.
  3. Cooling – The design and implementation of racks and HVAC for data centers
  4. Amperage – The design and implementation of electrical systems for data centers and cities

While some may disagree with the assertion, a majority will agree that solid state storage modules or disks (SSD) are fast, much faster than their hard disk drive (HDD) brethren. In less than two years the measurement for IOPS has increased from a few hundred thousand to over one (1) million, as expressed in “Performance of Off-the-Shelf Storage Systems is Leaving Their Enterprise Counterparts in the Dust.” Thus it can be assumed the median IOPS requirement is somewhere between a few hundred thousand and one (1) million.

In that regard, it’s fair to say that most applications and systems would perform quite well with the median output of a solid state storage system. Thus, when implementing an all solid state storage system the median IOPS requirement can be met – CHECK.

Secondary to IOPS is latency. Latency is a commonly overlooked requirement when gauging the adaptability of a storage system to an application. While defined above, latency is referred to as “overall response time (ORT)” by Don Capps, chair at SpecSFS. In 2012 Mr. Capps wrote to DCIG suggesting this format when sharing SpecSFS results: “XXX SPECsfs2008_cifs ops per second with an overall response time of YYY ms.”

ORT and IOPS do not track each other. In that regard, a high IOPS number doesn’t result in a lower latency. For example, the Alacritech, Inc. ANX 1500-20 posts 120,954 SPECsfs2008_nfs ops per second with an overall response time of 0.92 ms, whereas the Avere Systems, Inc. FXT 3500 (44 Node Cluster) posts 1,564,404 SPECsfs2008_nfs ops per second with an overall response time of 0.99 ms. In both cases the ORT is under 1 ms and meets the latency requirements for the broadest application cases, but the IOPS are nearly 10x different.
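One way to see why high IOPS and low latency are independent is Little’s Law: the average number of requests in flight equals throughput multiplied by response time. Applying it to the two published results above is a rough, illustrative calculation of my own, not part of either vendor’s disclosure:

    def in_flight(ops_per_sec: float, ort_ms: float) -> float:
        """Little's Law: average outstanding requests = throughput x response time."""
        return ops_per_sec * (ort_ms / 1000)

    print(round(in_flight(120_954, 0.92)))    # ~111 outstanding ops (Alacritech ANX 1500-20)
    print(round(in_flight(1_564_404, 0.99)))  # ~1,549 outstanding ops (Avere 44-node cluster)
    # The 10x IOPS gap comes from parallelism across nodes, not from lower per-request latency.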

The examples above are designed to illustrate a point: architecting a system to meet a balance of IOPS and latency can go on for hours of discussion about controllers, memory, disk and networking (as does all performance baselining and bottleneck detection). Conversely, SSD has the ability to meet IOPS performance requirements while delivering low latency with little modification or discussion. Consequently, latency and IOPS are easily balanced when using SSD – CHECK.

The final two constraints mitigated when using SSD compound each other – cooling and power. Let’s take cooling first.  For a system to be properly cooled it must be properly powered or geographically located. For simplicity, let’s assume you can’t build your data center in Prineville, OR. In that regard, it must be properly powered.

Since power must be adequate, the first thing a storage architect must consider is whether or not they can cool and power storage devices. Larger capacity systems offering higher IOPS and balanced latency require more power to cool and run them, thus compounding requirements. An architect must work with data center operations to balance cooling power with storage device power.

Here is where borrowing from Jerome Wendt, Lead Analyst and President of DCIG, is prudent:

Quantifying performance on storage systems has always been a bit like trying to understand Russia. Winston Churchill once famously said in October 1939, “I cannot forecast to you the action of Russia. It is a riddle, wrapped in a mystery, inside an enigma; but perhaps there is a key. That key is Russian national interest.”

Power is limited to the amperage available from a public utility. Limitations on available amperage create a fixed constraint. Choosing storage with reduced power and cooling needs mitigates the consequences of that fixed constraint. In that regard, SSD reduces the complexities introduced by conflicting power and cooling architectures. While some may disagree, we know SSD requires less power and less cooling, and with less cooling, power needs are further reduced. SSD can or will eliminate the complexity related to power and cooling requirements – CHECK (Reference: The real costs to power and cool, IDC 06/2008).
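The arithmetic behind that fixed constraint is straightforward. The sketch below shows how a fixed circuit amperage caps the number of storage shelves a rack can host and how lower-power shelves ease the cap; every figure in it (voltage, breaker rating, per-shelf wattage) is an assumption for illustration, not a measured value.

```python
# How a fixed circuit amperage caps storage shelves per rack.
# All figures are illustrative assumptions.

CIRCUIT_VOLTS = 208     # assumed rack feed voltage
CIRCUIT_AMPS = 30       # assumed breaker rating
DERATE = 0.80           # typical continuous-load derating factor

budget_watts = CIRCUIT_VOLTS * CIRCUIT_AMPS * DERATE   # usable power budget

shelves = {"HDD shelf": 750, "SSD shelf": 300}          # assumed watts per shelf

for name, watts in shelves.items():
    fit = int(budget_watts // watts)
    print(f"{name}: {fit} shelves fit within a {budget_watts:.0f} W budget")
```

Under these assumptions the same circuit hosts roughly two and a half times as many SSD shelves as HDD shelves, before the reduced cooling load is even counted.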

Articulating storage needs is not based solely on capacity. Storage architects must consider IOPS, latency (ORT), the capacity required to meet IOPS and latency needs, data center rack space in the form of “U space,” square footage for cooling, and physical (cooling/power) operations. While some will disagree that these are required with SSD, their complexity is significantly reduced, if not eliminated, in a broad deployment of SSD.

While SSD can meet capacity, IOPS and ORT requirements while reducing power and cooling costs, many of today’s all flash memory storage arrays are based on x86 software and hardware. It is x86 that creates a barrier to entry for data center deployment of SSD. While some may argue for x86 processing despite its high power-to-heat requirement, ARM can deliver processing with substantially lower power-to-heat requirements.

To the point of power-to-heat, Calxeda published benchmarks indicating a 10x difference in x86-to-ARM power consumption versus data production at a +5 ms storage response time. From a marketing standpoint 10x is a great number, but even a 5x difference is “good enough.” 5x enables one to start thinking about replacing individual network attached storage (NAS) systems with private cloud scale-out storage systems using ARM processors and solid state storage modules or disk.

In that regard, it is my opinion that the market will desire ARM servers based on common slot architectures. A common slot architecture allows an Intel or AMD x86 CPU, or a Samsung or Calxeda ARM CPU, to be plugged into the same board and system. Slot homogenization will reduce dependency on specific manufacturer motherboard designs (e.g. Intel) and allow for better elasticity in data center deployments.

As a result of that homogenization, the market will pressure ARM processor vendors to enter the scale-out NAS space in 2013. To that end, Calxeda quietly noted its desire in late 2012 to enter the enterprise storage market in this piece by Stacy Higginbotham of GigaOM. Ms. Higginbotham writes:

Its tests show roughly a 4X improvement in IOPs for a rack of Calxeda SoCs versus x86-based systems. Adding Calxeda’s SoCs also cuts complexity because the entire system of processing and networking components are integrated on the SoC, and the terabit-plus fabric between cores also offers more network capacity between cores in a system – the so-called east-west networking traffic.

Calxeda’s commentary undersells the value of SSD, because Calxeda believes that power-hungry storage systems are not concerned about power consumption. Instead, it believes storage systems are looking for more IOPS by adding processing and memory capability to work through a backlog of disk operations. While that assertion has some flaws, the real value of ARM is reduced power consumption and a reduced heat signature. ARM combined with SSD delivers an annuity of operational and capital expenditure savings.

Complementing Calxeda’s commitment to ARM is Apache’s port of popular software that meets big data processing and storage requirements. Some will argue that SSD does not make sense in big data. But common sense indicates that storing 10 PB of data on spinning disk (HDD) over a period of a few years means you need to start migrating about the time the last TB is added. Controller aging alone requires the data to find a new home almost immediately, or at least a common slot server upgrade.

Factoring it all in, an all-SSD, ARM-based scale-out storage system using an open compute common slot architecture reduces or eliminates the top four (4) storage architect constraints AND delivers a storage ecosystem with the flexibility to remain in service in excess of 10 years.

Further complementing the marriage, ARM and SSD should have similar data center architecture requirements as tape. For example, track a company like Quantum with StorNext. It may port StorNext to ARM and take advantage of $1/GB SSD prices as a way to transition customers from tape to new storage systems. Using ARM and SSD, very little would need to change with the data center’s power and cooling.

Finally, look for companies like Samsung to be a powerful force in componentry as they continue to produce SSD and start the development of their ARM server processors. DCIG believes that as 2013 progresses, we’ll experience a pull from the market for these storage systems long before the manufacturers are geared up to push them.




Synchronizing Archival Data Stores in the Cloud; Interview with C2C Systems CTO Ken Hughes Part V

One of the most common initial use cases for cloud storage is the storage of archival data. However, that does not mean every organization is ready to move all of its archival data to the cloud or, for the data it does move, to trust that the cloud will be available to provide access when needed. In this fifth blog entry in my interview series with C2C Systems’ CTO Ken Hughes, he talks about the importance of having access to cloud storage repositories for archival data and the advantages of keeping on-premise and cloud archives synchronized.

Charles:  How important is it to organizations to actually go in and manage their data store because they recognize their data stores are a mess? For example, they know they have all these emails sitting out there that are completely useless in that they know they do not need them anymore. Rather they just need to identify them and then apply a data retention policy to them. Is that a big or just a partial draw of your solution?

Ken: That is a big draw as that side of the business is increasing. Further, the push for this is NOT just coming from the IT department. The new driver is from the records management side of the house which encompasses compliance and the general counsel who want to ensure that data is correctly treated. That is what we call the information governance driver taking control.

Charles: Would you say their data stores are a mess in part because they are just storing everything and they do not even know what they have? If so, how do organizations put controls in place?

Ken: Yes, that is largely correct. That is why C2C offers the ability to exercise this control with an advanced discovery module that allows the non-IT person to go and search and manage data, map it out and identify different business processes.

This effectively gives the general counsel the ability to go and say, “Find all the emails that are between these people in this date range.” He is going to get back a list of either 10 or 10,000 emails. In our view, you are trying to do the very, very early stages of eDiscovery to understand the scale of a problem.

We can also archive that data and put it on litigation hold. Using C2C, it can search all these PSTs for all these custodians. If it finds anything, it makes a copy of it to make sure it is not deleted. This copy may then be put into a temporary litigation repository until somebody has made a decision this is a real case to answer.

We can then optionally turn that temporary repository into a permanent archive so that data is secure. This can actually be done by the general counsel without any involvement by the IT department.

ArchiveOne Enterprise already has all of these features in it. Most competitors will tell you they have got it all in there. While that is true to a degree, what separates C2C is the level of sophistication that ArchiveOne gets into.

For example, in our recently launched ArchiveOne Enterprise Cloud, C2C provides the ability to have a hybrid cloud. Using this new option, C2C gives organizations the option of using cloud storage as a form of disaster recovery for their archives.

These archives are completely synchronized so an organization can move to and from the cloud with just a right click and the selection of a button. This also becomes very helpful for the organization that has run out of storage and cannot physically add more storage due to a lack of space. Using this option, it may offload all of its archived data to cloud storage, giving it the best of both worlds: anything from on-premise archives to archives completely stored in cloud storage.

Charles:  What cloud storage providers are you working with? Any specific providers like AWS or Rackspace? Or does it not matter?

Ken: C2C works with a couple in this first release. One is the Amazon S3 service and the other is Nirvanix. We basically have two options for the hosted storage. We do front it ourselves and sell it on as a service, for people that want to buy a one-stop turnkey solution from C2C. Or if they already have their own account set up with any of those vendors, then they can use the second option which allows them to use their existing account.

In Part I of this interview series, we discuss C2C’s focus on Microsoft Exchange and which size environments C2C’s products are best positioned to handle.

In Part II of this interview series, Ken explains why eDiscovery and retention management are becoming the new driving forces behind archiving and why C2C’s ArchiveOne is so well positioned to respond to that trend.

In Part III of this interview series, Ken discusses C2C’s policy management features and the granular ways in which users may manage deletion in their data stores.

In Part IV of this interview series Ken explains how C2C does search using a combination of both centralized and distributed search methodologies.

In the sixth and final part of this interview series Ken explains why PST management is such a problem in large enterprises and offers some suggestions as to how to get PST files under control.




Tips to Doing Search and Establishing Data Ownership; Executive Interview with C2C Systems CTO Ken Hughes Part IV

Doing searches across unstructured data stores and understanding who owns this data are emerging as higher priorities in today’s Big Data era. However archiving software can vary greatly in how it performs these tasks of search and assigning data ownership. In this fourth blog entry in my interview series with C2C Systems’ CTO Ken Hughes, he examines how C2C performs search across distributed email and file systems and what techniques it employs to establish data ownership.

Charles:  How does C2C do search? The fact is that it has the ability to set up processes, policy definition, and implementation, based on certain time frames, from an architecture standpoint.  Is this a distributed type of system that’s doing this? Or is it all done from a central location?

Ken:  The answer is both of those. C2C offers software that runs on a server which is typically in your data center or close by or next to your Exchange server.

All of the mailbox data, or data that we have previously pulled into the archive, is easy to process. Most vendors can do that because they can guarantee that the source of the data is online. So the real question becomes, “How do we access the remote workstations with PSTs on them, or the file servers with PSTs?”

To do that C2C does have some client technology that just runs in memory so there is no install required. Typically what C2C does is make a one line edit or change to a user’s login script that basically runs the client from a network share.

In many cases users do not even know that it is running in the background. It will basically sit in memory and wait to be told what to do by the central server. So every five minutes it will wake up and ask if there is anything to do.

It basically works in two modes. One is that it works in Outlook mode whereby it will wait for the user to open Outlook and then examine their PSTs that are open within the Outlook session, and do whatever processing is needed. It just searches through all the messages, finds everything that matches the criteria, and then carries out “X” actions on them.

The other mode is file system mode. A user basically tells it to either search all the drives or gives it some paths to search. Then it does not care whether Outlook is open or not. It simply goes through the file system, finds any PSTs, loads them into its own version of Outlook, checks through all the data, and carries out the actions on them.
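To illustrate the kind of discovery pass file system mode performs, here is a generic Python sketch, not C2C’s client, that walks a set of assumed search roots, finds PST files and reports the basic metadata a central server might use to decide what to process.

```python
# Generic illustration of a file-system-mode PST discovery pass
# (not C2C's client). Walks assumed search roots and reports metadata.

import os
from datetime import datetime

def find_psts(search_paths):
    for root_path in search_paths:
        for dirpath, _dirnames, filenames in os.walk(root_path):
            for name in filenames:
                if name.lower().endswith(".pst"):
                    full_path = os.path.join(dirpath, name)
                    stat = os.stat(full_path)
                    yield {
                        "path": full_path,
                        "size_on_disk": stat.st_size,
                        "modified": datetime.fromtimestamp(stat.st_mtime),
                    }

if __name__ == "__main__":
    # Hypothetical search roots for illustration only.
    for pst in find_psts([r"C:\Users", r"D:\Shares"]):
        print(pst["path"], pst["size_on_disk"], pst["modified"])
```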

Then it either has to send the data off for archiving which it does over web services so an organization can run it inside corporate HQ, or individuals can run it from home or really anywhere since web services usually traverse firewalls. So yes, C2C is archiving data, sending it up to the server, or if C2C is moving or copying the data, it is sent to the server and processed from there.

The precursor to all of this obviously is that when an organization starts today it does not know what PSTs are out there, which ones it wants to migrate, or even which ones it wants to process. Organizations do not know the scale of that problem that they have. Is it 1 TB? Is it 100 TB? Who knows?

Organizations can make some guesstimates with some storage tools. But until they actually have the data and understand it, then they do not really know. So what C2C offers is a couple of nodes that tell organizations about the PSTs that C2C has found.

As C2C searches for these, the first thing C2C does is say, “Hey, we found a PST! Here is the kind of data that is in it, here is the metadata about that PST, who owns it, where it is on disk, what client and workstation it was on, and some information about the size of it and the size of the data inside it.”

C2C will tell the organization the mailbox name. It will give it the title of that PST. It will reveal the size on disk as well as the size of the data inside it. There can be vast differences here if there is a lot of wasted space, or if it was a big PST and an individual has deleted a lot of data. The differences can be drastic.

C2C has seen organizations with over 100,000 PSTs where, based on their storage tools, they think they have hundreds of GB or even TBs. But when they actually analyze the volume of data inside the PSTs, the actual sizes of the messages reveal a completely different amount of data. It can be 50 percent or more wasted space.

This also tells you some things about mailbox quotas. You obviously do not want to go ahead and ingest the PST if it is going to blow your mailbox quota.

The other problem is when you have just found a PST on a disk somewhere. This is a different kettle of fish. What C2C has to do with these is basically examine all the data contained within it, and then make a calculation as to who we believe the owner is.

C2C does that with four separate algorithms and each one comes with a different confidence factor. The best case is obviously a PST loaded in Outlook. If we cannot do that, then we are just looking at uncoupled PSTs. Then the first thing we do is look in the sent items folder. If everything has been sent from one user, we can pretty much be sure that that user is the owner of the data, which gives us about a 95 percent confidence factor.

From there we fall back to other methodologies, such as counting recipients and tallying up who is copied on the cc and bcc fields. Then we look for the most prominent user. The ratio of that most prominent user to the next candidates yields different confidence factors.

That allows us to basically work out in an automated fashion who the owner of the data is. Because at some point if an organization is doing a discovery or wants to migrate data into a mailbox, it needs to know which mailbox to migrate in.
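As a hypothetical illustration of this kind of heuristic, and not C2C’s actual algorithms, the sketch below assigns high confidence when every sent item comes from a single address and otherwise falls back to the most prominent recipient, scaling confidence by that user’s lead over the runner-up.

```python
# Hypothetical ownership heuristic, illustrating the fallback logic
# described above (not C2C's actual algorithm).

from collections import Counter

def infer_owner(sent_from, recipients):
    senders = Counter(sent_from)
    if len(senders) == 1:
        # Every sent item came from one address: very high confidence.
        return senders.most_common(1)[0][0], 0.95

    tally = Counter(recipients)
    if not tally:
        return None, 0.0
    ranked = tally.most_common(2)
    top_user, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Confidence grows with the top user's lead over the next candidate.
    confidence = round(top_count / (top_count + runner_up), 2)
    return top_user, confidence

owner, conf = infer_owner(
    sent_from=["ann@example.com", "ann@example.com", "bob@example.com"],
    recipients=["ann@example.com"] * 8 + ["bob@example.com"] * 2,
)
print(owner, conf)   # ann@example.com 0.8
```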

This is automated, though an organization can manually override it if, for instance, it does not have a high level of confidence in the accuracy of the results. In those cases C2C can leave the PST unassigned, and the organization can go ahead and manually assign ownership if needed.

In Part I of this interview series, we discuss C2C’s focus on Microsoft Exchange and which size environments C2C’s products are best positioned to handle.

In Part II of this interview series, Ken explains why eDiscovery and retention management are becoming the new driving forces behind archiving and why C2C’s ArchiveOne is so well positioned to respond to that trend.

In Part III of this interview series, Ken discusses C2C’s policy management features and the granular ways in which users may manage deletion in their data stores.

In Part V of this interview series Ken explains how C2C manages archival data stores.




Data Management Software Finally Poised to Become a Data Center Priority in 2013

Ever since I got involved with IT in general and data storage specifically, the predominant way that organizations manage their data growth is by throwing more storage at the problem. Sure, they pay homage to technologies like archiving, data lifecycle management and storage resource management (SRM) but at the end of the day the “just buy more” principle prevails. Yet as we enter 2013, data management is finally poised to become a data center priority.

The rationale as to why the “just buy more” storage principle has prevailed over the years and will continue into the future is fairly simple to understand.

  • First and foremost, the cost of buying more storage has been perceived as “cheaper” than trying to put in place all of the software, procedures and people needed to manage the data. Whether or not this perception is accurate is unclear (I suggest it is not.) But I know from my own experience that when organizations I worked for in the past had to choose between investing in more people and technology to better manage the data or buying more storage to hold the data, the storage technology got installed and the people managing the data were turned out on the street to look for new jobs.
  • Second, a turnkey solution to manage data did not exist. Everyone (at least on the vendor side) always seems to act like it is no big deal to manage data. What they always seem to fail to grasp is that data is rarely managed the same way in any two organizations. Even seemingly “routine” data associated with workflows such as accounting, HR and payroll may take various paths through a company. Further, each company has its own expectations as to how it expects to access data and then manage and retain it long term. Software that translates these expectations into a universally available form that any organization may easily implement simply does not exist. As such, companies have their own internal processes that generate more data which requires more storage.
  • Third, companies capture more data than ever before. Existing business processes continue to generate new data but new technologies are accelerating data generation. As companies do more video surveillance, encourage BYOD, store data in the cloud, adopt social networks and collaborate locally, nationally and globally, they generate increasing amounts of data. These by default dictate they buy more storage in support of these initiatives.

In this respect, the concept of “just buy more” will continue unabated into the foreseeable future and probably even accelerate in the years to come.

Yet the reason that data management software is poised to take on a larger role in data centers in 2013 is that the benefits of better managing the data will finally outweigh the costs of just buying more storage and letting the data residing on it grow in an unchecked manner.

By this I do not mean that organizations are necessarily going to start buying more archiving or SRM software and then try to implement it themselves. While this may occur in a few organizations, the ways in which most are going to deal with it is through getting more mileage out of technology they are already buying.

Here are some key examples of companies that are already capitalizing on this trend.

  • CommVault exemplifies a company that has transformed itself from a provider of backup software into a data management software provider. Sure, CommVault still does backup and arguably is the best product on the market based on DCIG’s research. But what it brings to the table is also archiving, SRM and search. So while companies may buy CommVault because they have a budget for backup, they get a lot more than backup when they implement CommVault.
  • EMC figured out a long time ago that more production data meant more data to back up and brought companies like Avamar and Data Domain into its portfolio. Now its Backup and Recovery Services (BRS) division is the fastest growing part of EMC’s portfolio, in large part because EMC has done the best job of delivering an end-to-end enterprise backup solution. The key to its success? Automating the management of backup data. So while EMC still “sells more storage,” companies buy much less of it than they would have since EMC built deduplication (aka data management) functionality into its products.
  • STORServer is carving out a nice niche for itself with its EBA backup appliances. These appliances are powered by IBM TSM beneath the covers but they come with a STORServer software wrapper. This makes them easy to deploy and manage for the backup tasks that companies are first looking to solve when they deploy these appliances while giving them the flexibility to easily expand into more comprehensive data management functionality when they are ready to do so.

These and other products from the likes of HP, NetApp and Symantec all reflect this growing trend of vendors bundling in data management software with their traditional “backup” and/or “storage” solutions which companies already have line items for in their annual budgets.

The bundling of data management software in these respective backup and storage solutions is not necessarily “new.” All of the aforementioned companies have offered data management software (or variations of it) for years.

What is “new” is the fact that organizations are becoming aware that this data management software is included in these products and are more fully taking advantage of the capabilities that it has to offer. This transition is in large part occurring because the growth of data is finally outstripping the ability of companies to buy enough storage to house it all. Further, whatever objections they may have had in the past to using it are being overcome by the fact that the data management software is available with existing backup and storage solutions and the ease and non-disruptive nature by which it may now be deployed.




Top 5 Most Viewed Blog Entries on DCIG’s Site in 2012

As the last business day of 2012 arrives, it is time for DCIG to unveil its most read blog entries of 2012. While a few long time reader favorites remain in this year’s Top 5, a couple of newcomers also made first time appearances on this year’s list, driven by what is likely growing user interest in (and concern about) managing Big Data and doing eDiscovery across their unstructured data stores.

#5 – Electronic Mail Gains Further Scrutiny in Electronic Discovery during 2007 (link). Joshua Konkle. This blog entry cracks the year end DCIG list for the first time ever and is representative of the heightened interest that readers had in 2012 of eDiscovery related content on DCIG’s website. In this particular blog entry, DCIG Sr Analyst and Partner Joshua Konkle examines what prompted the rise of eDiscovery software in the first half of the 2000 decade in general and the rise of KVS Software (now part of Symantec) in particular. He then takes a look at recent court cases that challenge the authenticity of emails, their legitimacy and privacy and even how well the software that archives and retains these emails stand up under scrutiny.

#4 – DCIG 2011 Virtual Server Backup Software Buyer’s Guide Now Available (link). Jerome Wendt. The release of the DCIG 2011 Virtual Server Backup Software Buyer’s Guide generated a huge amount of interest in December 2010 when it was released. Since then, interest in that Buyer’s Guide has remained high as evidenced by the large number of individuals coming to DCIG’s site looking for information about this Guide. While this blog entry has done well the past two years, I expect interest in this particular blog entry to wane in the coming months once the 2013 version of the Virtual Server Backup Software Buyer’s Guide is released.

#3 – Data Center Management 101 Part 1 (Cable Management) (link). Tim Anderson. This blog entry is another of DCIG’s most consistent performers year after year in terms of generating user interest. Since DCIG formally launched its website in 2008, this blog entry has made it into the Top 5 every year as one of DCIG’s most viewed blog entries. Despite its rather ordinary title, it continues to attract attention in an age where esoteric topics such as Big Data management, the Cloud and deduplication predominate. Tim’s thoughts and guidance on cable management remain particularly relevant and arguably have taken on increased importance over the years as more emphasis is put on data center availability.

#2 – Huron Consulting Announces V2locity (velocity) (link). Joshua Konkle. This blog entry also rode the growing interest that readers have in better managing their Big Data stores and performing eDiscovery across them. While the title of this blog is fairly innocuous, it is this blog entry’s content that appears to be generating user interest. One of the larger concerns that organizations have when performing eDiscoveries (aside from how the eDiscovery is actually performed by the software) is, “What are some of the different ways eDiscovery offerings are priced?”

This particular blog entry examines some of the different ways consulting firms price their eDiscovery services and Huron Consulting’s new (or new in 2007) pay-per-page offering V3locity. Again, what exactly sparked the heightened interest in this blog on DCIG’s site in 2012 is unclear, but we attribute it to the growing concern that organizations have about the costs associated with managing Big Data.

#1 – Prerequisites for Introducing All-in-One Computing into Enterprise IT (link). Jerome Wendt. This marks the second consecutive year that this blog entry came in #1 on DCIG’s website as its most read blog entry. More enterprises are looking to simplify IT but doing so requires products that come with a comprehensive list of features that can scale to meet the particular needs of their applications (availability, capacity, performance, reliability support) without becoming too complex or cumbersome to manage. This blog entry remains particularly relevant in today’s era of cloud storage and cloud computing where enterprises want easy to manage yet flexible and cost-effective storage solutions.

Interested in seeing what other blog entries were frequently read on DCIG’s site in 2012? If so, check out these three blog entries:

  • Last Friday’s blog entry that looks at the blog entries that ranked #6 – #10 on DCIG’s site in 2012
  • Last Thursday’s blog entry that reveals the most read blog entries on DCIG’s site in 2012 written in 2012
  • Last Wednesday’s blog entry that looks at the blog entries on DCIG’s website that earned an honorable mention for 2012



Overriding Desire is for Archiving to Deliver eDiscovery and Information Management; C2C Systems Executive Interview Part II

The purpose of archiving is becoming more than simply facilitating smaller email stores, faster response times or better use of expensive storage capacity. The growing driver behind archiving is to enable organizations to implement information governance. In this second blog entry in my interview series with C2C Systems’ CTO Ken Hughes, Ken explains why eDiscovery and retention management are becoming the new driving forces behind archiving and why C2C’s ArchiveOne is so well positioned to respond to that trend.

Charles: How has archiving changed from the past?

Ken: The world of email archiving is evolving. Traditionally people are archiving for either capacity reasons or compliance reasons as about 70 percent of our business is pure play archiving.

This does not mean they are doing only PST management. They are doing compliance management. They are doing discovery. They are doing retention. But it is all at a fairly simplistic level.

But there is a lot more going on that is moving us toward a new world of information governance. In this world the overriding desire is for discovery or retention management.

We have some very large customers today who say the driver has come from corporate counsel. They do not want to keep emails for more than a certain period. They will define one type of email with one retention period; another type of email gets a different retention period. This has been the driver for our business and it is that part of our business that is increasing.

Although it is email archiving, the driver for it is coming from discovery management or retention management. Life is changing in information management and information governance. These drivers are starting to come more and more to the forefront of customers’ requests.

The core problems of archiving and email management C2C Systems essentially solved years ago. C2C developed a product called “Max Compression” which is the ability to automatically zip and unzip attachments which is invisible to the user.

C2C sold this to very advanced technology companies who could not get enough bandwidth or storage to satisfy their users. The max compression also sold very quickly to the big oil companies, telecom companies and silicon chip companies, who said we have to get more power to our users so that’s what we delivered. This sort of technology has been embedded in ArchiveOne for many years.

Everything we have done since has focused on improving life in terms of managing the email system while at the same time not making life any more difficult for the end user. The user has a business to run and he or she does not want IT or IT applications interfering with the operation of it.

Charles: Are there other patents wrapped around any of the compression technology that you’re using?

Ken:  No, C2C actually uses zip technology. Within zip C2C actually just changes the icon when it sends the zip file so the user would not see a Word icon on his desktop and believe it was a real Word document. It is in fact a compressed Word document with the idea being that a user never has to call a help desk.

C2C had a deal with Nestle in the works where Nestle delayed its initial evaluation of the pilot roll out because our product generated two calls to the help desk on the first day. At that time C2C had not done a good job of hiding everything. It took us almost a year to get that deal back in the pipeline as Nestle wanted no change in the user interface so that nothing the user saw was going to worry him.

Charles: So how do email archiving products themselves differ? From my own research I know there is quite a bit of difference between them.

Ken: Every archiving vendor will tell you it does discovery, retention, manage the repositories, and has good administration. But the level of sophistication varies between those vendors.

If you were just given a tick list, 20 competitors will say, “Yes, we do that.” If you were to then score their features, some will get 1 out of 10 and some will get 8, 9, or even 10 out of 10. In our view, we score the 8, 9 and 10 out of 10 on all those points.

C2C has been doing this for 10 years during which it has been on Gartner’s Magic Quadrant. Its level of sophistication stacks up completely against Symantec Enterprise Vault which is widely recognized as the market leader. C2C has no problem taking it on and no problem beating it technically.

There is one differentiator between competitors which C2C sees as the ability to discover data. Low end vendors just go and grab data from the email system and information store, archive it, and then offer a fairly low grade level of discovery, retention management and archiving features.

Symantec Enterprise Vault can go and find information in its information store. If you point it at PSTs that reside on a file server on the corporate network, Enterprise Vault can ingest those PSTs into its archive and do sophisticated management of the data.

C2C’s biggest differentiator is it does not care where the PSTs are located. PSTs may be scattered anywhere. C2C has done a lot of work in recent months or even the last couple of years around PST management.

C2C believes that somewhere in the region of 30 to 40 percent of PSTs exist on people’s desktops or laptops. C2C’s competitors go and find them but if they can’t find them, they can’t ingest them. In fact even if they could find them, all of their processes are about ingesting them from a known location with that known location having to be a file server. C2C can find them and make the admin aware that those PSTs exist.

The other differentiator is C2C can look at the age of the data inside of the PST and then make a decision as to whether or not to ingest it into the archive. Our competitors, even if given the PST, ingest all the data into their archive. They then examine it and say, “Look, 95 percent of it is more than 5 years old, past our retention period, so let’s delete it.”

C2C’s view is to look at the PST on the desktop and say, “95 percent of the data is more than 5 years old, past the retention period, so let’s just delete it now.” C2C will therefore only incur the pain of ingesting 5 percent of the data.
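To sketch the “filter before ingest” idea in code, and only as an illustration rather than ArchiveOne itself, the snippet below classifies messages already sitting in a PST by age against an assumed retention period so that only in-retention items would ever be ingested.

```python
# Illustrative "filter before ingest" logic (not ArchiveOne): classify
# messages by age against an assumed retention period.

from datetime import datetime, timedelta

RETENTION = timedelta(days=5 * 365)   # assumed 5-year retention period

def split_by_retention(messages, now=None):
    """messages: iterable of (message_id, received_datetime) pairs."""
    now = now or datetime.now()
    to_ingest, to_purge = [], []
    for msg_id, received in messages:
        (to_purge if now - received > RETENTION else to_ingest).append(msg_id)
    return to_ingest, to_purge

sample = [("old-1", datetime(2006, 3, 1)), ("new-1", datetime(2012, 11, 5))]
ingest, purge = split_by_retention(sample, now=datetime(2012, 12, 31))
print(ingest, purge)   # ['new-1'] ['old-1']
```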

C2C’s view is that in the real world data exists everywhere. PSTs may have been restored to a file server of which the company was completely unaware. This PST may belong to a former employee so it may have no association to any living employee in the company.

In Part I of this interview series, Ken discusses C2C’s focus on Microsoft Exchange and which size environments C2C’s products are best positioned to handle.

In Part III of this interview series, Ken discusses C2C’s policy management features and the granular ways in which users may manage deletion in their data stores.

In Part IV of this interview series, Ken examines how C2C performs search across distributed email and file systems and what techniques it employs to establish data ownership.

In Part V of this interview series Ken explains how C2C manages archival data stores.




Exchange Administrators Need a Consistent Interface for Email Archiving; Interview with C2C Systems CTO Ken Hughes Part I

Archiving is emerging as one of the hot new trends of the next decade with organizations looking for better ways to manage their Big Data stores. Perhaps nowhere is data growth more rampant – and the need for better ways to manage it – more evident than with corporate email stores. In this blog entry, I begin an interview series with C2C System’s CTO Ken Hughes in which we initially discuss C2C’s focus on Microsoft Exchange and which size environments C2C’s products are best positioned to handle.

Charles: Ken, thanks for joining me today. To start this off, can you do a brief introduction of C2C Systems?

Ken: Thanks for having me, Charles. Here is where C2C Systems believes it stands: it delivers consistent information management, including discovery, retention and archiving of email, regardless of the email’s location. This is probably C2C’s biggest differentiator.

C2C first introduced email archiving in 2002 primarily around Microsoft Exchange though C2C started working with email as far back as Microsoft Mail before Microsoft Exchange even existed in 1996. So our team started learning about Microsoft Exchange in the mid 90s. What C2C knows about Exchange is probably the same as the best guys in Microsoft as our team has been in place for a long time.

Charles: Why only Exchange? Why not other products like Domino and SharePoint?

Ken: Domino is just a smaller market. It is complex and though we have a solution, it is just easier for us to focus on Exchange. In regards to SharePoint, C2C has never really seen it as a big enough market to warrant the investment from our development team as the demand just is not there.

Charles: So is all of your development done from a Microsoft perspective and is your platform a Microsoft stack also?

Ken: C2C is a complete Microsoft stack. While we do work on the Domino product, that is done to just hook into the data. All of our core processing and functionality is developed on the Microsoft stack.

This comes back to our idea of consistent data management. C2C believes organizations should manage all data in the same way. You do not want to have different methods of managing different data depending on its location.

Ease of installation and operation are also important as that is feedback that we consistently get. This is why C2C made ArchiveOne easy to install and manage.

Now people may say that you only install an archiving product once so it does not matter if it takes 10 or 15 days to complete the task. Our view is that you absolutely have to show that you can install it quickly, easily and intuitively so you can show that you can do the ongoing management.

Another item that comes through is that we have a consistent interface for the Exchange administrator that presents itself in the way they tend to think today. Everything C2C does is very familiar to the Exchange administrator.

This is particularly important to the market that is our focus. Our key target market is 500 to 5,000 or maybe 10,000 users. We picked that market because the EMCs, the Autonomys and the Symantecs of the world do not focus heavily on it, so it is an easier sale.

However we are happy to scale to meet the needs of very large companies. We have some of the largest oil companies and multi-national banks that have over 100,000 users implemented on some of our products. In fact, C2C’s largest archiving installation is some 70,000+ users in the chemical industry.

C2C’s difference is that it is one of the classic on-premise archiving vendors. We do that for the security and the integration with Exchange. If you’re going to that level of complexity, it’s very difficult in very small companies with fewer than 50 users, as C2C is setting up a complex set of systems, so C2C is really targeting the enterprise class company.

In Part II of this interview series, Ken explains why eDiscovery and retention management are becoming the new driving forces behind archiving and why C2C’s ArchiveOne is so well positioned to respond to that trend.

In Part III of this interview series, Ken discusses C2C’s policy management features and the granular ways in which users may manage deletion in their data stores.

In Part IV of this interview series, Ken examines how C2C performs search across distributed email and file systems and what techniques it employs to establish data ownership.

In Part V of this interview series Ken explains how C2C manages archival data stores.




Reflections on SNW 2012: Time to Revisit Assumptions About Storage

SNW 2012 revealed a dynamic industry that is innovating across all storage tiers. From incorporating super-low-latency flash memory into the data center to new tape formats that essentially turn tape libraries into high-latency disk drives, lots of talent is being applied to meet the growing demands that enterprises have for their storage systems.

As an IT Director at three different universities over the last 24 years I have researched and purchased multiple storage systems and taken two universities through the data center virtualization process, including establishing off-site disaster recovery capabilities. SNW 2012 was my first opportunity to sit down and talk with some of the people who envision and create these systems.  

As my father-in-law advised my son, who was about to start his first job, “There is a knack to everything.” In other words, there is specialized knowledge or skill in every endeavor that makes a real difference in the time and energy required to produce a given amount of work as well as the quality of the results achieved. This specialized knowledge and skill distinguishes the novice from the amateur, and the mere wage earner from the expert.

When it comes to storage systems, engineering still matters. There is specialized storage knowledge that applies across time whether designing/implementing a file system, writing software that implements storage protocols, or qualifying hard drives for use in storage arrays. On the other hand, flash memory presents new opportunities and challenges as engineers seek to leverage the strengths and mitigate the weaknesses of flash memory as it is introduced into enterprise storage systems.

Not surprisingly, flash storage and SSDs were the primary focus of storage system innovation. Multiple permutations of where flash belongs in the data center storage infrastructure were in evidence here, including all-flash PCI cards, SSD-based arrays and appliances, and SSD as one layer in multi-layer storage systems. In some cases this flash memory is intended to serve as primary storage, in other cases just as a cache in front of primary storage.

Somewhat surprisingly, tape seems to be experiencing a resurgence. Tape lives at the opposite end of the storage tier from SSD. The renewed interest in tape is based largely on the Linear Tape File System (LTFS), an open format released by IBM in 2010 that makes accessing files stored on LTFS-formatted tape similar to accessing files stored on disk.  

LTFS is being used to make archive data much more readily accessible since files can be retrieved and used directly from the tape media without having to be restored to disk first. Although there are many applications of this technology, there has been particular interest among broadcasters and others who need to make available large amounts of archived video content. Although latency is high, once a file is found it can in many cases be retrieved from tape as rapidly as from a disk array.

Another dynamic I observed at last week’s SNW 2012 is that there are experts–not only CxO’s but also engineers–who have worked in storage companies from startup through acquisition (plus the required period before stock options can be exercised) and then repeated the cycle. Related to this dynamic is how easy it is for the process of being acquired to slow the pace of innovation in a given product as the product and people from the startup are integrated into the larger entity.  

From a customer perspective this dynamic may drive some of the risk out of adopting a startup’s technology–if the startup has a solid team of storage experts engaged in creating their products. This dynamic also suggests that the acquisition of a startup by a major player entails its own set of risks for current customers. The bottom line is that due diligence is still very much required of customers to go beyond check-boxes on a requirements list to understand not only present capabilities but the likely future path of a given solution.

Cloud computing and storage, including file-sync-and-share technologies, were also topics of significant interest among attendees. Intuit’s CIO shared an insider’s view of how Intuit has labored to move IT from primarily a compliance focus to being a provider of Global Enterprise Solutions that owns business outcomes; even as their own private cloud has become key to meeting customer requirements for mobile, social and global service capabilities. It is a testament to the rapid maturation of public cloud offerings that Intuit is now seriously evaluating a transition to the public cloud even for sensitive financial data.

There are new dynamics in data storage and retrieval–especially the demands that Big Data puts on storage systems and the emergence of flash memory–that mean it is time to revisit assumptions about storage systems. The need for fresh thinking about storage is as true for the businesses that purchase storage systems as it is for the people who create those systems. 




Tape Libraries Just Keep Getting Better with Age; Interview with Spectra Logic CEO Thompson Part III

In today’s information age our focus always tends to be on the here and now and how quickly we can access information that was made sometimes just seconds ago. But in terms of the total amount of data in the digital universe, that is just the tip of the iceberg with possibly as much as 90% of today’s data existing as archival data. Ensuring the integrity of that data and making sure it is stored cost effectively for decades is the responsibility of today’s new generation of tape libraries. In part 3 of my interview series with Spectra Logic’s CEO Nathan Thompson, we discuss how tape libraries have continued to mature to meet today’s new business demands for retaining archival data for even longer periods of time.

Jerome: How have tape libraries continued to mature – even in the last year?

Nathan:  This is a story that is not very well told. 20 years ago disk drives had low reliability but they have tremendously improved over that time. In much the same way, tape libraries and tape applications have also tremendously improved, as has the reliability of tape media.

Today, the reliability of LTO drives and tape libraries are nothing short of spectacular.  Development of new features, capabilities and intelligence in tape has continually been invested in and delivered upon year after year.

In that vein, I’ll speak to a feature that Spectra Logic put out a year and a half ago that really became deployed in customer environments over the last 12 months.

We built a feature into our libraries called Data Integrity Verification. Here is what that is: if a user writes a tape on Tuesday, the library itself will load that tape into a separate tape drive on Wednesday and conduct a quick read verification to confirm that there are no errors that could not be corrected by the tape drive’s integrated error correction system as the data was written to the drive.

Our T-Series tape libraries can be configured to verify data integrity every six months, or every year, or every five years from that point forward. So a verification and validation system is now built into our tape libraries, at no cost to the user.
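As a generic illustration of how such a recurring verification schedule might be tracked, not Spectra Logic’s implementation, the sketch below records when each cartridge was last verified and the interval chosen for it, then reports which cartridges are due for another read verification.

```python
# Generic illustration of a periodic read-verification schedule
# (not Spectra Logic's implementation). Barcodes and dates are made up.

from datetime import date, timedelta

VERIFY_INTERVALS = {
    "6mo": timedelta(days=182),
    "1yr": timedelta(days=365),
    "5yr": timedelta(days=5 * 365),
}

cartridges = [
    {"barcode": "AB0001L5", "policy": "6mo", "last_verified": date(2012, 1, 10)},
    {"barcode": "AB0002L5", "policy": "1yr", "last_verified": date(2012, 6, 1)},
]

def due_for_verification(carts, today):
    return [c["barcode"] for c in carts
            if today - c["last_verified"] >= VERIFY_INTERVALS[c["policy"]]]

print(due_for_verification(cartridges, date(2012, 12, 1)))   # ['AB0001L5']
```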

We also announced in November 2011 a new technology for tape media health assurance called CarbideClean. CarbideClean does an initial cleaning of “Green” tapes that have never been written to before they are deployed in the customer environment. This reduces debris on the heads of the tape drives as well as decreases needs for tape drive cleaning.

This CarbideClean process has in fact also resulted in an improvement in actual tape capacity and performance. It is a relatively small increase, maybe a two to three percent capacity improvement coupled with a five to eight percent improvement in speed using this innovation.

The concept of pre-cleaning tape media was brought to our attention by a large customer who observed some characteristics of continuous tape use so we built a process into our tape libraries to address it.

Another example of ongoing tape library innovation is improved usability features. Tape libraries from decades ago were considered very hard to use and to manage. Usability features released in the most recent Spectra BlueScale 12 software over the last year continued our efforts to make tape as easy to manage as disk.

BlueScale 12 included an XML interface. As our customers upgrade to BlueScale 12 (a free upgrade) they can interface, monitor and manage their tape library using XML as well as programmatically interface with it.

So if they want somebody monitoring a tape library for any variety of conditions that might occur in a data center, it’s very easy to do. These are just some of the innovations that have recently occurred in tape libraries.

Jerome:  So as tape libraries offer these new features, what percentage of tape libraries is still being used in the traditional backup and recovery role and what percentage is being deployed in new capacities?

Nathan:   I would estimate that, on a day-to-day basis, approximately 70 percent of tape libraries in the field are being used primarily to support backup and disaster recovery while the other 30 percent are being used to support archive.

However, if you look at the amount of data on tape libraries and what type it is, the percentages are probably the other way around. Probably 70 to 80 percent of the amount of information that is stored in the aggregate set of tape libraries that we have installed around the world is archival information and it may be as high as 90 percent. The rest would be backup data.  

The growth in unstructured data over the past decade has dramatically increased the amount of data on tape for archive. Most of the really big libraries we have installed (those with over 5,000 tape slots) are being used for archival.

We have a T-Finity tape library  at the Korean Meteorological Institute, which captures weather history for the southern part of the Korean peninsula. KMA’s Spectra T-Finity tape library is tied to a Cray supercomputer, and archives PB upon PB of weather history. The reason is that they run weather models that predict the weather in South Korea so they built a model that inputs the history of the weather and uses it to predict the future.

In their case they need to keep weather history forever. The weather history in Korea that was captured five years ago, or two years ago, or one year ago is being stored in our T-Finity tape library. 100 years from now that information is still going to be important and relevant– because weather will still be predicted.

The only way you can really predict the weather is to access historical climate models. In those climate models you have to plug in previous data and previous weather patterns to see if you are correctly predicting it. That’s one example of the kind of application that will store information forever.

We also have the National Archives and the Library of Congress as accounts, both of which are storing video information. They have large central libraries and are required to maintain information for the life of the republic plus 100 years. So, how best to store all of that data? On spinning disk drives? I hope not.

Then there is all of the airfoil design data that every airplane uses, from NASA (formerly known as the NACA). NASA runs simulations and wind tunnel tests at NASA Ames and it keeps that data forever. So there are an enormous number of applications like that and you just can’t realistically keep that information on disk.

In Part I of this interview series Nathan shares how and why Spectra Logic got its start in the tape business and what differentiates it from almost every other tape manufacturer even today.

In Part II of this interview series Nathan discusses why Spectra Logic decided to double down on tape even as many experts were forecasting its death.

In Part IV of this interview series, Nathan discusses why tape will remain an integral part of backup processes for a long time to come.

In Part V of this interview series, Nathan talks about what new features we can expect to see from tape and what new roles it will be able to assume in just a few years.




“Archive, Replicate, Recover” is a Natural Progression to the Cloud; Interview with AIS VP of Network Engineering Steve Wallace Part II

Crawl. Walk. Run. That progression pretty well summarizes how most people look to take advantage of cloud service providers over time though, in cloud services terminology, the progression may be better summed up as: Archive, Replicate, Recover. Today I conclude my conversation with American Internet Service’s VP of Network Engineering, Steve Wallace, as we examine how many of AIS’ clients initially get their data into the AIS cloud and then expand their use of AIS cloud services over time.

Jerome: In terms of how you implement your offering, there are a number of ways to configure Nirvanix. Do you encourage people to deploy Nirvanix inside of their environment and virtualize their existing storage in pass-through mode so transactional data may be both kept local and moved offsite to either your data center or another one of their data centers? Alternatively, do you give them an appliance so they may do a mirrored write or make a copy of their existing data? Or do you give them multiple options?

Wallace: Nirvanix is an archival solution and is really not intended for live replication of transactional data. It is an excellent way to ensure that you have a recovery point that is independent of your local facilities. Their service is generally used with a gateway application or appliance that manages the ingestion of data into their storage nodes. For the folks who do not have the manpower or expertise to implement that type of solution, AIS provides a simple appliance.

We present them with an NFS or CIFS share that is on a VM, so they can essentially dump their data and have the appliance send it out into the cloud. That makes it a one step process since most folks understand the concept of a CIFS or NFS share and grasp that if they copy data or dump data on the share it will get to the archive.

Another AIS benefit is that we also operate a large wide-area network, so it costs us virtually nothing to send data to Nirvanix. We make sure the clients do not get charged for that data that gets pushed out.

Jerome: Generally the easiest way for people to move to the cloud is to provide them with a “bolt-on” type of approach. In that scenario, they leave their existing infrastructure in place but add a gateway, file share or portal to it that they can access. They then use their backup software, archival software, data management software – whatever -that copies their data to the portal that puts data into the cloud. Would that be a correct way to typify how you are encouraging people to move into the cloud?

Wallace: That is correct. When you say cloud in this sense, we are just talking about cloud archiving because we have cloud services which, in some ways, are actually easier to manage than that.

When looking at a DR scenario in a state like California, even if you push your data up to the Nirvanix cloud archive, if California becomes a smoking crater and you do not have infrastructure anywhere else, you are in big trouble! It’s great that you have your data, but what are you going to do to get back online?

The DR solution is really incomplete without having some type of virtualized infrastructure and that is a service we provide. We can recreate your primary application infrastructure on a virtual infrastructure at our remote facility and then synchronize data between the two sites.

That pushes back the Nirvanix solution to a solution of a last resort or a protected archive service. If all else fails, we still have that. The data is fully protected and we know exactly where it is.

Jerome: So if I understand you correctly, you use Nirvanix as a cloud archive offering as a way to help organizations get started in the cloud. Once there, they are more comfortable starting this 360-degree discussion about how they can create a more highly available, recoverable environment for their applications within the state, out of the state or even internationally. Is that a correct conclusion to draw?

Wallace: That is correct. The first thing you need to do is get your data offsite. Get it somewhere else but somewhere other than on a USB drive that you put in the trunk of your car every week and swap out. Get that data somewhere else where it is accessible. That is a great play for the cloud archiving.

The next step is to get down to the practicalities of recovery. That is the next logical step, which is true even for smaller businesses. Then you are looking at recovering in hours as opposed to being down for a couple of weeks.

Jerome: So to summarize what AIS is doing: AIS gives companies an entry point to solve their immediate cloud archiving, cloud backup or compliance requirements. Then when a company is ready, it can take it up to this 360 degree approach.

Wallace: That is a good overview. 85% of our clients use some sort of cloud infrastructure and can extend their IT infrastructure into the AIS cloud. We have built our cloud infrastructure deliberately so it can be used as an extension of their existing cloud.

Our infrastructure is compatible with VMware, so they can manage it from their own space with their own management tools. It is totally private and secure. So rather than having to buy more equipment for their corporate data center, they can put a network cable in place, and their cloud servers and our cloud servers will effectively sit right next to one another in the same rack.

That concept may be extended even to our Phoenix site. If they want a backup DR site with their backup infrastructure ready to go, we can transmit their data across our 10 Gb links to Phoenix. We can achieve almost synchronous data replication across that link.

What we are hearing from our customers these days is that none of them want to buy any more equipment. We are their infrastructure service provider. We provide space, power and cooling so they don’t have to manage data centers. They recognize that is not their core business, so they expect us to do that for them. Now we talk about utility compute and utility storage, so they don’t have the expense and distraction of managing hardware.
 
That is how many cloud service providers in general, and AIS specifically, have evolved. We started by getting their data offsite and into the data center. Now that we have our clients’ data offsite, we want to replicate it into another data center or another safe place. With the cloud, we can mirror their physical environment locally and in a remote data center at a much lower cost. We still need to keep that safety net of cloud archiving.

The economic drivers push businesses to reduce their total cost of ownership. No one wants to own and maintain cloud archiving storage or any other type of storage. It is not as simple as you would think, and the data requirements are tremendous.

The genomics people sometimes require 100 TB or more for raw data. After they are done processing it, the data might drop down to a TB or two, but considering the number of jobs they are processing, we are still talking about huge amounts of data. That’s a lot of infrastructure built around moving and storing data. These clients are not interested in owning that infrastructure; they’d rather be sequencing genomes.

In Part I of this interview series with Steve Wallace, we talk about how a convergence in service offerings is occurring among cloud service providers, driven by their clients’ need for business continuity, disaster recovery and compliance services.




DCIG 2012 Big Data Tape Library Buyer’s Guide Now Available

DCIG is very excited to announce the availability of its inaugural DCIG 2012 Big Data Tape Library Buyer’s Guide, which weights, scores and ranks over 140 features on more than 60 tape libraries from 8 different storage providers. Driven by the explosion of storage requirements to address “Big Data” and the “Cloud,” organizations are looking more than ever for cost-effective, viable storage media on which to store this data. This is why DCIG believes tape libraries are poised to be one of the big beneficiaries of these growing storage demands, which prompted DCIG to produce its first-ever Tape Library Buyer’s Guide to help enterprises choose the right solution for their environment.

The world of Big Data is upon us. More organizations of all sizes capture, store and retain more data for longer periods of time than ever before. Even as the traditional drivers of data growth remain with us (backups, growth of structured data stores, etc.), new ways in which organizations capture data are driving today’s unprecedented growth.

As enterprise organizations come to grips with their Big Data requirements and/or look to store data in the cloud, the cost of retaining all of that data is beginning to come fully into focus, especially if they look solely at disk to do so.

2, 3 and even 4 TB disks, coupled with compression and data deduplication, have certainly helped lower the upfront cost of disk such that, on a per-GB basis, it is now on par with and may even be lower than tape. But what is getting the attention of more organizations is the operational expenses (OPEX) associated with keeping these disks powered on.

Further, not all data is created equal. While it may be “valuable” to the organization, it may not have an immediate value that justifies storing it to disk and incurring ongoing operational costs. Additionally, organizations are looking to store data that:

  • Cannot be easily or cost-effectively reacquired
  • Consumes little or no power
  • Does not compress or deduplicate well or at all
  • Is rarely or infrequently accessed
  • Needs to be retained for years or even decades
  • Scales into the hundreds of terabytes or even petabytes

It is for reasons like these that DCIG produced this 2012 Big Data Tape Library Buyer’s Guide, as DCIG sees tape as a viable storage medium for the foreseeable future. In it, DCIG accounts for tape’s historical use case of backup as well as its emerging role as a cost-effective storage medium for the archival data resulting from the advent of Big Data and the Cloud.

In doing its research for this Buyer’s Guide, DCIG uncovered some interesting statistics about tape libraries in general:

  • 100% supported the LTO format
  • 100% supported a FC interface
  • 39% still support a SCSI interface
  • 64% have 1 year warranties
  • 36% have 3 year warranties
  • 22% scale to support over one (1) petabyte of storage capacity
  • Only 7.5% support tape media other than LTO (SDLT, 9840, TS1140, etc.)
  • 6% offer dual robotics

As with prior DCIG Buyer’s Guides, it accomplishes the following objectives for end users:

  • Provides an objective, third party evaluation of tape libraries that weights, scores and ranks their features from an end user’s viewpoint
  • Includes recommendations on how to best use this Buyer’s Guide
  • Scores and ranks the features on each tape library based upon criteria that matter most to end users so they can quickly know which tape libraries are the most appropriate for them to use and under what conditions
  • Provides data sheets for 66 tape libraries from 8 different storage providers so end users can do quick comparisons of the features that are supported and not supported on each tape library
  • Provides insight into which features on a tape library will result in improved availability and increased storage capacities
  • Provides insight into which tape libraries are supported by popular archiving and backup software products
  • Gives any organization the ability to request competitive bids from different providers of tape libraries that are “apples-to-apples” comparisons

The DCIG 2012 Big Data Tape Library Buyer’s Guide is available immediately and may be downloaded for no charge with registration by following this link.

Bitnami