Last week I wrote about Symantec’s introduction of the Data Insight feature into its Data Loss Prevention (DLP) product. But afterwards a number of questions came to my mind as to how the DLP product itself worked, especially when compared to other solutions in the eDiscovery, search and storage management space, as well as how the Data Insight feature is implemented. So to get those questions answered, I got back on the phone with Robert Hamilton, Symantec’s Senior Product Marketing Manager for DLP.
My reason for reaching out to Robert was to clarify some of the points he made during our initial briefing. Specifically, he made some comments that the Data Insight product would only support file systems from NetApp and EMC and not others.
But as I thought about those remarks after the call, they puzzled me. After all, the DLP product works with Microsoft Active Directory (AD) as well as LDAP, so why would it not work with any network attached storage (NAS) filer, especially one based on Microsoft’s Windows Storage Server? So Robert was gracious enough to get back on the phone to answer these questions for me.
To kick off our conversation, Robert provided some further context on the specific problem that Data Insight is intended to solve. Even without Data Insight, DLP can already generally determine ownership of active files. It does this when these files are crossing the network or as they are used on PCs. In these circumstances, DLP can infer file ownership as it knows where the email is coming from or who owns the PC.
However, organizations deploy DLP to address three types of data loss: data at rest on NAS, data moving across the network and data on endpoints such as laptops and PCs. Data Insight is primarily intended to help DLP solve the problem of identifying the ownership of files that are at rest on file servers and are not being actively accessed by any one particular user or group of users.
So with that additional background, my first question was a point of clarification. Robert used the term “ACL” in our initial conversation, so I first wanted to confirm that he and I were on the same page and that “ACL” meant “Access Control List”. (Note: I have learned not to assume what acronyms mean.) Assuming that was the case, if DLP does support ACLs, why does it only integrate with NetApp and EMC and not other file servers such as Windows Storage Server?
Robert assured me that we were on the same page as to the meaning of the “ACL” acronym. Data Insight uses ACLs to resolve the ownership of files at rest to a real AD or LDAP user. It then correlates those file system ACLs with the appropriate user and group permissions that it retrieves from AD or LDAP.
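To make that correlation step concrete, here is a minimal sketch of expanding a file's ACL entries into individual users by way of directory group membership. The group data, user names and the `resolve_acl` function are all hypothetical; the article does not describe how Data Insight actually queries AD or LDAP.

```python
# Hypothetical directory data standing in for what AD or LDAP would return.
GROUP_MEMBERS = {
    "finance": {"alice", "bob"},
    "auditors": {"carol"},
}
USERS = {"alice", "bob", "carol", "dave"}


def resolve_acl(acl_entries):
    """Expand ACL entries (user or group names) into a set of known users."""
    resolved = set()
    for principal in acl_entries:
        if principal in GROUP_MEMBERS:
            # A group entry grants access to every member of the group.
            resolved |= GROUP_MEMBERS[principal]
        elif principal in USERS:
            resolved.add(principal)
        # Unknown principals are skipped in this sketch.
    return resolved


users_with_access = resolve_acl(["finance", "dave", "ghost"])
```

In this toy data set the ACL resolves to alice, bob and dave; the unknown entry is simply ignored.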
The reason that Data Insight initially supports only NetApp and EMC filers is more involved. Both of these filers not only generate and retain file transaction data but also possess APIs that expose this transaction data so that applications such as DLP can access and use it.
Data Insight uses these APIs to get the raw materials it needs to do its job of determining file ownership. Every time a file is accessed, changed, created or deleted, the EMC or NetApp filer creates a record of that activity. Using their APIs, Data Insight can then access this information to determine who owns the file.
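As a rough illustration of how activity records could be turned into an ownership guess, the sketch below scores each user by weighted activity per file and picks the top scorer. The record format, the weights and the `infer_owners` function are assumptions made for this example; the article does not say what heuristic Data Insight applies or what the filer APIs return.

```python
from collections import Counter, defaultdict

# Hypothetical activity records: (user, operation, path).
EVENTS = [
    ("alice", "create", "/share/q3.xlsx"),
    ("alice", "write", "/share/q3.xlsx"),
    ("bob", "read", "/share/q3.xlsx"),
    ("bob", "write", "/share/plan.docx"),
    ("bob", "write", "/share/plan.docx"),
    ("carol", "write", "/share/plan.docx"),
]

# Assumed weights: creating or changing a file counts more than reading it.
WEIGHTS = {"create": 3, "write": 2, "read": 1, "delete": 1}


def infer_owners(events):
    """Return {path: user} where user has the highest weighted activity."""
    scores = defaultdict(Counter)
    for user, operation, path in events:
        scores[path][user] += WEIGHTS.get(operation, 0)
    return {path: counts.most_common(1)[0][0] for path, counts in scores.items()}


owners = infer_owners(EVENTS)
```

With this sample data, alice comes out as the likely owner of the spreadsheet and bob as the likely owner of the document.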
Their APIs are also the primary reason that these two file systems are the only ones that DLP supports for now. Over time, Robert anticipates that DLP will integrate with other file systems; it is just that more work is required to integrate with those file systems that do not have these APIs.
I then probed Robert on some particulars as to how quickly Data Insight processes information once it is deployed. The scenario that I gave him as an example was this: if an enterprise has one million Microsoft Office documents that are each about 50 KB in size, how long would it take Data Insight to do the initial processing of that data?
As with any performance-related question, the answer is, “It depends.” But Rob did go on to explain that the initial scan of a system like the one I described would likely take hours. It does not happen instantaneously, but it will not take days, weeks or months either. Then, once the initial scan is complete, Data Insight processes new files as they arrive because the EMC and NetApp APIs report when new files are created.
The next question also had me puzzled after our first conversation. Rob mentioned that DLP supports encryption, but the more I thought about it, the more confused I became. How can you confidently deploy a solution like DLP into your environment without first having solved the whole encryption key management problem?
In short, Robert explained that DLP does not have to solve this problem. DLP has a FlexResponse remediation capability that can, out of the box, automatically protect a file that is deemed sensitive using PGP Universal encryption, Liquid Machines Enterprise Rights Management, Oracle IRM, Microsoft RMS, or GigaTrust Enterprise Rights Management.
To apply this protection, DLP essentially calls the other application to encrypt the file or apply enterprise rights management to it, and then lets that application, with its own key management and policies, take over from there. In that sense, DLP hands off the problem of encryption key management.
The encryption keys are all managed by these other applications so that application can apply its own encryption or enterprise rights management to the file. DLP only automates the application of encryption or enterprise rights management but stays out of their way once it tells them to do it.
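The hand-off described here is essentially a delegation pattern, which can be sketched as follows. The two protector functions are hypothetical stand-ins for external encryption or rights management products, and `remediate` is an invented name; none of this reflects the real FlexResponse interface.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for external products. In practice each product
# would manage its own keys and policies; this sketch holds no keys at all.
def fake_encryption_tool(path: str) -> str:
    return f"encrypted:{path}"


def fake_rights_tool(path: str) -> str:
    return f"rights-applied:{path}"


PROTECTORS: Dict[str, Callable[[str], str]] = {
    "encryption": fake_encryption_tool,
    "rights": fake_rights_tool,
}


def remediate(path: str, is_sensitive: bool, protector: str) -> str:
    """Delegate protection of a sensitive file to the configured product."""
    if not is_sensitive:
        return "no-action"
    # The caller only decides that protection is needed and which product
    # to call; everything about keys stays inside that product.
    return PROTECTORS[protector](path)


result = remediate("/share/payroll.xlsx", True, "encryption")
```

The point of the design is that the calling side never sees a key: it only decides when to invoke the other application.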
My final question for Robert was trying to understand how DLP compares against traditional storage management and eDiscovery products like Kazeon (now EMC) or Autonomy. It seems like there is some overlap but also some distinct differences.
Rob explained that all of these products (including DLP) can find sensitive data. Where DLP differs is that it has a more extensive policy building capability. For example, it can fingerprint databases of customer numbers or account numbers, so that a large credit bureau that holds everybody’s Social Security numbers, account numbers and last names can create policies that monitor for that data. Then if anyone in the organization attempts to send this information out in an email or copy it to a thumb drive, this activity can be detected and caught.
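One simple way to picture fingerprinting is exact-match hashing: hash every sensitive value once, then hash the tokens in outbound content and look for collisions. The sketch below assumes that approach; the function names and sample values are invented for illustration, and the article does not describe the technique Symantec actually uses.

```python
import hashlib
import re


def fingerprint(value: str) -> str:
    """Hash a normalized value so the index never stores it in the clear."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(values):
    """Build a set of fingerprints from a list of sensitive values."""
    return {fingerprint(v) for v in values}


def find_matches(text: str, index) -> list:
    """Return tokens in text whose fingerprint appears in the index."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    return [t for t in tokens if fingerprint(t) in index]


# Made-up sample values standing in for a customer database export.
index = build_index(["123-45-6789", "ACCT-0042"])
hits = find_matches("Please send 123-45-6789 to the vendor", index)
```

A policy engine could then block or flag the message whenever the list of hits is not empty.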
DLP also has a workflow capability that is specifically designed to protect data, whereas information servers are more concerned with finding data that is growing for purposes of storage management.
Data loss prevention is about developing policies to guard data that you do not want to lose, as opposed to classifying any type of data. DLP is a classic security application all the way, whereas these other products are more about enterprise search that is being repurposed for other activities.
So I hope these additional comments from Rob help you as much as they did me in understanding what DLP is all about, what set of problems it solves, how Data Insight performs and how it compares to other products in this space.