When collecting raw ESI from multiple individuals, you are bound to encounter large volumes of duplicate documents. In company-wide e-mail chains, for example, a single message sent to multiple recipients is stored in each recipient's mailbox. Depending on your organization's data retention policies, copies of the same file might also be found on an employee's hard drive, a file server, or a company backup tape.
For the attorney tasked with identifying, collecting and reviewing ESI, an exhaustive review of a document set rife with duplicates threatens the timeliness, cost effectiveness and efficiency of a project. Duplicates also increase the risk that identical documents will receive inconsistent privilege and responsiveness decisions during review.
To mitigate these concerns, many practitioners turn to de-duplication technologies, which identify and manage duplicate documents during e-discovery processing to minimize redundant review. In practice, de-duplication can reduce the number of documents to be reviewed by as much as 90 percent, and by 30 to 40 percent on average.
De-duplication uses a hashing algorithm to create an electronic "fingerprint" of each document at the bit level. The resulting fingerprints are compared against one another, and documents with matching fingerprints are exact duplicates. Because nearly any modification to a file, such as an extra space or a formatting change, produces a different fingerprint, an altered copy stands out as a distinct document when compared against the rest of the document universe.
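A minimal Python sketch of fingerprint-based exact de-duplication follows. It assumes SHA-256 from the standard library as the hashing algorithm (the text does not name one, and processing tools may use a different hash) and file contents already loaded into memory. The document IDs and function names are hypothetical. It groups files by fingerprint and shows that a single added space yields a new fingerprint.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Return a hex digest of the file's raw bytes (a bit-level fingerprint)."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each fingerprint to the IDs of every document that shares it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, data in documents.items():
        groups[fingerprint(data)].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample collection.
    docs = {
        "DOC-001": b"Quarterly results attached.",
        "DOC-002": b"Quarterly results attached.",   # exact copy of DOC-001
        "DOC-003": b"Quarterly results attached. ",  # one extra trailing space
    }
    groups = group_exact_duplicates(docs)
    for digest, ids in groups.items():
        print(digest[:12], ids)

    # Review-set reduction: share of documents removed as exact duplicates.
    unique = len(groups)
    reduction = 1 - unique / len(docs)
    print(f"{unique} unique of {len(docs)} documents ({reduction:.0%} fewer to review)")
```

Only DOC-001 and DOC-002 share a fingerprint; DOC-003 is treated as a separate document because its bytes differ.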
However, identifying duplicates is only the first step. Simply removing every duplicate document robs the reviewing attorney of potentially important contextual information, such as who maintained or had access to an important e-mail or document. Modern e-discovery technologies therefore give discovery teams several options for preserving and examining these associated details.
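The text does not describe how any particular platform stores this context. The following hypothetical Python sketch shows one general approach: send a single copy to review while recording every custodian and storage location that held an identical file. The class names, fields, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical fields for a collected item.
    doc_id: str
    custodian: str
    path: str
    data: bytes


@dataclass
class ReviewRecord:
    """One reviewable copy plus the context of every duplicate it stands for."""
    primary: Document
    custodians: set[str] = field(default_factory=set)
    locations: list[str] = field(default_factory=list)


def deduplicate_with_context(docs: list[Document]) -> list[ReviewRecord]:
    records: dict[str, ReviewRecord] = {}
    for doc in docs:
        digest = hashlib.sha256(doc.data).hexdigest()
        record = records.get(digest)
        if record is None:
            # The first copy seen becomes the one sent to review.
            record = records[digest] = ReviewRecord(primary=doc)
        # Every copy, including the first, contributes its custodian and location,
        # so removing duplicates from review does not discard who held them.
        record.custodians.add(doc.custodian)
        record.locations.append(f"{doc.custodian}:{doc.path}")
    return list(records.values())


if __name__ == "__main__":
    memo = b"Board meeting moved to Friday."
    collection = [
        Document("E-1", "Alice", "mailbox/Inbox", memo),
        Document("E-2", "Bob", "mailbox/Inbox", memo),
        Document("F-7", "Bob", "fileserver/shared/memo.msg", memo),
    ]
    for rec in deduplicate_with_context(collection):
        print(rec.primary.doc_id, sorted(rec.custodians), rec.locations)
```

Here the three copies collapse to one review record, yet the reviewer can still see that both Alice and Bob held the memo and where each copy was stored.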
The KLDiscovery e-discovery processing engine offers case teams several de-duplication options. When choosing a de-duplication method, teams should carefully weigh the needs of the case against each of the following options: