Data Access and Usage Patterns Replacing File Metadata as the Best Way to Determine Ownership of Unstructured Data

Controlling storage costs as unstructured data (files) grows remains a key concern for IT, but unmanaged data growth has other implications for the organization, such as compliance burden, security risk and operational costs. Further, as regulations become more defined and numerous, the gaps left by relying solely on file metadata to determine which data to retain, and for how long, become more pronounced. This is why more organizations are turning to Symantec’s Data Insight to keep their storage costs under control while ensuring their data meets this complex set of regulatory requirements.

Using file metadata such as size, age and type as the primary way to ascertain how long to keep data has historically made sense. This metadata includes when the file was created, when it was modified, who owned it (including users and groups) and when it was last accessed. However, as regulations multiply, the gaps in this metadata-only approach to deciding which data to retain and for how long become more pronounced.

Consider this scenario:

An organization classifies data based on ownership and establishes retention periods. A person then creates a folder or a file and stores it on a shared network drive that may be accessible by twenty other people.

Over time, the person changes jobs and no longer uses or manages this data, even though fifteen (15) of the twenty individuals still actively access it. These individuals all work in different departments yet are subject to the same set of regulations, which requires the data to be kept for a minimum of three (3) years. However, the person who created the data works in a department subject to a separate set of regulations that requires the data to be kept for at least five (5) years.

The dilemma then becomes two-fold:

  1. How does an organization determine who really owns the data?
  2. What is the right retention period for this data in light of the conflicting retention requirements?

While some may argue that five (5) years is the right retention period as it satisfies the more expansive set of requirements, there are two problems with making this assumption.

  • First, the individual who created the data is no longer using it.
  • Second, this does not take into account the active users of the data who are likely the ‘true’ data owners.

So when the organization sets the retention period for the data, it lacks insight into who the ‘true’ data owners are: the fifteen (15) people using the file or the department in which the original data owner now works. As a result, organizations are left in a bit of a quandary as to who owns the data and how long it should be retained.

The challenge for organizations then becomes: how do they know who owns the data and who is using it? Even assuming they do know who is using the data, how do they set the right retention policies based upon that knowledge?

This is where the new capabilities of Data Insight come into play. Data Insight first does what it always has done: it establishes who should own the data by going beyond the file metadata and its record of who created the file.

Rather, Data Insight identifies the individual or individuals, or even the programs, that regularly access the file. In this way an organization may do more than just identify who owns the file; it may leverage Data Insight to assign file ownership to the right individual or department.

By analyzing this information about data usage and access patterns, Data Insight can help organizations build a map of the data and the users and/or departments to which it belongs. Organizations may then use Data Insight to classify data by ownership and roles so that data can be tied to the retention policies appropriate for it.
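As a rough illustration of the idea, and not a depiction of how Data Insight itself works, the sketch below infers a likely owning department for a file by counting recent accesses. The event format, the infer_owner function and the 90-day window are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_owner(events, now, window_days=90):
    """Return the department with the most accesses inside the window.

    events is an iterable of (user, department, timestamp) tuples.
    This format is hypothetical; a real access log will look different.
    Returns None when nobody accessed the file inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    # Count only accesses that fall inside the recent window.
    counts = Counter(dept for _user, dept, ts in events if ts >= cutoff)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Illustrative data: the creator stopped using the file long ago,
# while two HR users still open it regularly.
now = datetime(2024, 1, 1)
events = [
    ("alice", "Finance", datetime(2021, 3, 1)),  # creator, long inactive
    ("bob", "HR", datetime(2023, 12, 1)),
    ("carol", "HR", datetime(2023, 12, 15)),
    ("bob", "HR", datetime(2023, 12, 20)),
]
likely_owner = infer_owner(events, now)
```

Counting accesses per department is only one possible heuristic. Weighting writes more heavily than reads, or requiring a minimum number of distinct users, would be equally reasonable choices.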

For example, a policy may state that data owned by the HR group should be retained for three (3) years. If it is determined that the files are used by HR, ownership can be assigned to HR and the retention periods for these files set accordingly.

In the earlier example, Data Insight would first alert the organization that the folder or file is a candidate for expiry. Based on its usage analytics, it would then inform the organization that fifteen (15) people belonging to a department were the known active users of that data and that the department therefore owns the data. Using this information, the organization may then apply the retention policy for that department: the three (3) year retention period.
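Once an owning department is known, applying its retention policy is a simple lookup. The sketch below is a minimal illustration under stated assumptions: the policy table, the "Finance" department name, the function names and the choice to measure retention from the creation date are all hypothetical, and none of it reflects a Data Insight interface.

```python
from datetime import date

# Hypothetical policy table: owning department -> retention period in years.
# The three- and five-year values mirror the scenario in this article.
RETENTION_YEARS = {"HR": 3, "Finance": 5}


def expiry_date(owner_department, created, policies=RETENTION_YEARS):
    """Return the date on which the retention period ends.

    Assumes retention is measured from the creation date.
    """
    target_year = created.year + policies[owner_department]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year.
        return created.replace(year=target_year, day=28)


def is_expired(owner_department, created, today):
    """True once the retention period for the owning department has passed."""
    return today >= expiry_date(owner_department, created)


created = date(2020, 6, 1)
today = date(2024, 1, 1)
expired_for_active_users = is_expired("HR", created, today)
expired_for_creator = is_expired("Finance", created, today)
```

The same file is past retention under the three-year policy but not under the five-year one, which is why settling ownership first matters.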

The multitude of regulations to which organizations are now subject, coupled with the numerous ways in which they may share and access data, has put organizations in the almost impossible situation of trying to control storage costs while simultaneously satisfying all of these regulations. This is why Symantec’s Data Insight is finding a new home in more organizations.

It eliminates their reliance on antiquated file metadata methods of determining file ownership by giving them the actionable analytics they need to determine who the real data owners are and what the right retention period is for their data, so they keep data only as long as they need to. In this way they may lower their storage costs and the level of risk to which their business is exposed while simultaneously improving their ability to manage their unstructured data stores.

Jerome M. Wendt

About Jerome M. Wendt

Jerome Wendt is the President and Lead Analyst of DCIG, Inc., an independent storage analyst and consulting firm. Mr. Wendt founded the company in September 2006.
