Inherent Challenges & Risks Associated with Data Collections

Thursday, December 30, 2010 by Thought Leadership Team

Data storage has become less expensive over the years, and as a result many organizations are storing larger volumes of data in disparate locations for longer periods of time. The tendency to “hold onto” data has significantly increased the cost, effort and time required to collect electronically stored information (ESI). In addition, the advent of new data storage technologies has made locating and collecting relevant data more difficult. As a result, data collection practitioners need more education and experience to properly manage newer technical issues, such as collecting data from Microsoft® SharePoint® environments, and the many security and accessibility issues surrounding the storage and collection of data in web-accessible cloud computing environments (“the cloud”).

Defensible Data Collection

“Data collection” is a broad term that may refer to collecting electronic information for any purpose, including collecting data in anticipation of and preparation for a legal proceeding. ESI can come in almost any format, such as emails, audio files (for example, WAV files), pictures, spreadsheets, databases and loose files, as well as information contained on social media sites. Electronic data can be stored on something as simple as a single hard drive in one computer, or as complicated as massive storage area networks (SANs) that comprise thousands of individual drives accessed concurrently by thousands of computers and users.

Before a collection occurs, a data collection plan should be created to ensure an orchestrated and comprehensive approach. This tends to result in a smoother collection process than simply reacting ad hoc to an incident. The data collection plan should identify potential data locations and key players (individuals who are likely to have or control relevant information), internal and external contact information, procedural guidelines and documented chain of custody expectations. Once a data collection plan is in place and key players are identified, custodians should be interviewed about their computer use and data storage habits to help identify the location(s) of potentially relevant ESI.

One way to strengthen the defensibility of a given data collection methodology is to begin with an IT application inventory and data map, which is designed to provide a clear, easily understood reference point regarding the location of potential data sources. At a minimum, the data map should include:

• Name of the ESI, its description and aliases (if any)
• Name of the application or system used to create the record
• ESI location(s) on the organization’s systems or hosted by third parties (including any copies of the data stored for disaster recovery or redundancy)
• A record of past and present IT platforms including operating systems, applications and databases
• Backup procedures, retention and disposal policies as well as backup rotation schedules
• Contact information for a designated point-person for each ESI source including: 1) a business line expert that understands the data and its connection to the business, 2) an IT contact that supports the platform and 3) a user that understands how the application or system works

When creating the data map, it is important to remember to include data that may be stored outside the company through alternative storage means such as cloud computing, third-party application service providers and/or online/offline archive storage facilities. Once the data map is built, it must be continually updated and maintained to ensure that it remains an accurate picture of the organization’s technology environment.
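
As an illustration only, the minimum data map fields listed above could be kept as structured records so that gaps and out-of-date entries are easy to spot. The sketch below is one possible shape, not a prescribed format: the class names, field names, the three contact role labels, the 180-day review window and the sample values are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical role labels for the three point-persons named in the list above.
REQUIRED_ROLES = {"business", "it", "user"}


@dataclass
class Contact:
    role: str   # one of REQUIRED_ROLES
    name: str
    email: str


@dataclass
class DataMapEntry:
    name: str                 # name of the ESI
    description: str
    application: str          # application or system that creates the record
    locations: list[str]      # internal systems and third-party hosts, including DR copies
    platforms: list[str]      # past and present operating systems, applications, databases
    backup_procedure: str     # backup procedures and rotation schedule
    retention_policy: str     # retention and disposal policy
    contacts: list[Contact]
    last_reviewed: date       # when this entry was last confirmed as accurate
    aliases: list[str] = field(default_factory=list)

    def missing_contact_roles(self) -> set[str]:
        """Return the point-person roles that have not been filled in yet."""
        return REQUIRED_ROLES - {c.role for c in self.contacts}

    def is_stale(self, today: date, max_age_days: int = 180) -> bool:
        """Flag entries not reviewed within the (assumed) review window."""
        return today - self.last_reviewed > timedelta(days=max_age_days)


# Sample entry with made-up values.
entry = DataMapEntry(
    name="Sales email archive",
    description="Archived mailboxes for the sales department",
    application="Email archiving system",
    locations=["on-premises mail server", "third-party cloud archive"],
    platforms=["Windows Server"],
    backup_procedure="Nightly incremental, weekly full, four-week tape rotation",
    retention_policy="Retain 7 years, then dispose",
    contacts=[
        Contact("business", "A. Example", "a.example@example.com"),
        Contact("it", "B. Example", "b.example@example.com"),
    ],
    last_reviewed=date(2010, 6, 1),
)

print(entry.missing_contact_roles())        # {'user'}
print(entry.is_stale(date(2010, 12, 30)))   # True
```

Recording a review date on every entry is one simple way to support the ongoing maintenance described above, since stale entries can be listed and reassigned for review.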

Avoid Data Collection Pitfalls

As discussed above, the process of collecting data presents many challenges based on data volume, type, complexity and potential locations. It is important for those collecting data to understand and avoid common pitfalls, which can create significant legal and business continuity issues for corporate and outside counsel as well as IT. One common hazard arises when a desk-side collection is performed. Risky desk-side collection practices include forwarding email to a central mailbox for collection, or directing employees to print relevant documents. In addition, many software products are available for clients who wish to collect data themselves, but these products should not be used unsupervised or by untrained individuals. Do-it-yourself data collection practices create a high risk of omitting relevant documents and can alter metadata, both of which increase the risk of spoliation claims.

Another damaging pitfall is failing to create and maintain proper chain of custody documentation. A proper chain of custody supports the reliability of evidence and minimizes the risk that the evidence was altered from its original form. It keeps a chronological record from the time data is gathered through the entire analysis process, showing that the evidence was safeguarded and protected from theft, damage and other potentially deleterious change. Maintaining a proper chain of custody strengthens the defensibility of the collection process, mitigates the risk of sanctions arising from spoliation claims, and increases the likelihood that electronic evidence will be admitted at a future hearing or trial.
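
To make the idea of a chronological custody record concrete, here is a minimal sketch of a log that timestamps each handling event and stores a cryptographic hash of the item at that moment. Hashing is a commonly used integrity check and is an assumption of this example rather than something the article prescribes. The class names, the action labels and the file name are hypothetical, and a real collection would rely on validated forensic tools and procedures.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class CustodyEvent:
    timestamp: datetime   # when the event was recorded (UTC)
    item: str             # path of the evidence item
    action: str           # e.g. "collected", "transferred to lab"
    handler: str          # person responsible for the action
    sha256: str           # hash of the item when the event was recorded


class CustodyLog:
    """Append-only, chronological record of who handled an item and when."""

    def __init__(self) -> None:
        self._events: list[CustodyEvent] = []

    def record(self, path: Path, action: str, handler: str) -> CustodyEvent:
        event = CustodyEvent(
            datetime.now(timezone.utc), str(path), action, handler, sha256_of(path)
        )
        self._events.append(event)
        return event

    def history(self, path: Path) -> list[CustodyEvent]:
        return [e for e in self._events if e.item == str(path)]

    def is_unchanged(self, path: Path) -> bool:
        """True if every recorded hash matches and the file still has that hash."""
        hashes = {e.sha256 for e in self.history(path)}
        return len(hashes) == 1 and sha256_of(path) in hashes


# Demonstration with a throwaway file standing in for collected evidence.
with tempfile.TemporaryDirectory() as tmp:
    evidence = Path(tmp) / "mailbox_export.bin"
    evidence.write_bytes(b"example contents")

    log = CustodyLog()
    log.record(evidence, "collected", "Examiner A")
    log.record(evidence, "transferred to lab", "Examiner B")
    intact_before = log.is_unchanged(evidence)

    evidence.write_bytes(b"modified contents")   # simulate an alteration
    intact_after = log.is_unchanged(evidence)

print(intact_before, intact_after)  # True False
```

Because each event stores the hash observed at the time, any later mismatch points to the interval in which the item changed.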


The pitfalls mentioned above represent only a small sampling of the dangers that may arise when a data collection is not performed properly. As technology and storage options continue to evolve, data collection will likely grow more complicated and present a greater likelihood of error in this critical stage of the ediscovery process. Creating a solid data collection plan based upon an application inventory and data map will help ensure that the collection process is thorough, complete and defensible. One way to help ensure your data collection is performed properly and defensibly is to enlist the assistance of a data collection expert. Outside experts can be used for a number of purposes. Not only can they aid in the collection process, but they can also serve as Fed. R. Civ. P. 30(b)(6) deponents and/or trial witnesses. In fact, courts have frequently ordered the use of computer forensic experts in such circumstances. Establishing a relationship with a reputable expert sooner rather than later may help ensure that your data collection process proceeds more smoothly and efficiently, and also stands up to legal scrutiny if and when that scrutiny occurs.

Special thanks to Jason Paroff, Esq., Senior Director of Computer Forensic Operations with the Electronic Evidence Services group at Kroll Ontrack. Mr. Paroff is a former New York prosecutor who has testified as an expert witness in both federal and state courts, and has examined numerous computers and computer systems for evidence of fraud, theft of trade secrets, harassment and other improper civil and/or criminal conduct.