Two Truths and a Fib About Intelligent Categorization

Wednesday, April 13, 2011 by Thought Leadership Team


Time is money, and linear document review is almost prohibitively expensive due to the surge in electronic data volume over the past several years and the corresponding increase in resources required to review the data. Besides time and costs, having a multitude of attorneys reviewing and categorizing documents for (potentially) months on end can yield inconsistent results. Innovative technological advances have arrived on the document review scene, but concerns about overall effectiveness persist as the legal industry remains hesitant to explore new technology.

Enter Intelligent Categorization (iC). Intelligent Categorization, the third component of Intelligent Review Technology (IRT), analyzes and learns from category decisions made by human reviewers, then identifies and elevates documents most likely to be relevant and suggests categories for documents not yet reviewed. Along with Automated Workflow and Intelligent Prioritization, the other two legs of IRT, Intelligent Categorization is on its way to becoming a well-established practice in 2011. Differing opinions about this technology have given rise to misconceptions about what iC is, what iC is not and what iC can do for electronic discovery. Today we will dispel the confusion and set the record straight by exploring two important truths and a common fib associated with Intelligent Categorization.

Defensible? True.

First and foremost, Intelligent Categorization is defensible. One of the early qualms about iC was that until the technology became court-tested, it was too risky to use. That simply is not the case. In fact, such fears have preceded the acceptance of all new technology, including features such as advanced searching and sampling, which are now embraced by jurists and litigants alike.[1] Case law supports the use of a systematic, well-documented and repeatable process, and Intelligent Categorization is specifically designed to increase accuracy and effectiveness while decreasing review time. Indeed, when using all three components of Intelligent Review Technology, it is possible to save 50 percent on review costs.

Intelligent Categorization also supports the notions of proportionality reflected in Rule 1 of the Federal Rules of Civil Procedure, which calls for proceedings to be “just, speedy and inexpensive.”[2] As an integrated component of IRT, iC is fully transparent, with real-time metrics and analytics available throughout the review process. In addition, experts can explain the technology to judges, opponents, clients and staff if necessary.

Further, the Sedona Conference® has endorsed the use of automated methods (although it has not endorsed particular technologies). The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in Ediscovery, Practice Point 1 states:

[R]eliance solely on a manual search process for the purpose of finding documents may be infeasible or unwarranted. In such cases, the use of automated search methods should be viewed as reasonable, valuable, and even necessary.[3]

In addition, The Sedona Conference Commentary on Achieving Quality in the Ediscovery Process advises practitioners to utilize technology that “reasonably and appropriately enable a party to safely and substantially reduce the amount of ESI that must be reviewed by humans.”[4] These commentaries stress the use of technology to achieve proportionality in the electronic discovery process, which has unfortunately spiraled out of control in recent years.

Effective? True.

Closely linked to defensibility is effectiveness. Before investing in new technology, legal teams must be confident that the new feature will work and is worth the change. With sufficient training data, supervised learning can target documents most likely to be relevant. Using supervised learning to identify and pull responsive documents into categories early reduces the time spent organizing documents responsive to particular requests, and helps reviewers and the legal team better understand the case early on. Also, related documents can be dealt with more efficiently as a group and can even be assigned to a reviewer with expertise in a particular category.
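To make the idea of supervised learning concrete, here is a minimal sketch of how a system might learn from reviewer decisions and suggest a category for an unreviewed document. It is an illustration only: it uses a simple naive Bayes word-count model, which is an assumption chosen for brevity and not a description of how iC works internally. The function names, category labels and sample documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()

def train(labeled_docs):
    """Build word counts per category from (text, category) pairs
    that human reviewers have already coded."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab

def suggest_category(model, text):
    """Return the category with the highest naive Bayes score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids log(0).
                count = word_counts[category][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[category] = score
    return max(scores, key=scores.get)

# Hypothetical reviewer-coded training documents.
training = [
    ("pricing agreement with distributor", "contracts"),
    ("signed contract renewal terms", "contracts"),
    ("lunch menu for friday", "non-responsive"),
    ("office party friday afternoon", "non-responsive"),
]
model = train(training)
suggestion = suggest_category(model, "renewal of distributor contract")
```

With only four training documents the suggestion is crude; the point of the sketch is that both responsive and non-responsive decisions feed the model, and more coded documents give it more to learn from.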

The effectiveness of this technology may also be tested through sampling. Sampling is the key to measuring, monitoring, controlling and correcting potential errors in categorization, and is useful in any review to validate results. The technology can systematically and iteratively test the data to evaluate the accuracy of iC (in addition to Intelligent Prioritization). Where a party did not use sampling, some courts have concluded that it did not take reasonable steps to prevent disclosure.[5] With the flexibility to conduct as much or as little sampling as desired, iC not only reduces the time needed to complete a review, it improves the consistency of and confidence in category determinations.[6]
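As a rough illustration of how sampling can put a number on categorization accuracy, the sketch below draws a random sample of document IDs for human quality-control review and then estimates the agreement rate with an approximate 95 percent confidence interval. The function names, the sample size of 400 and the count of 368 agreements are hypothetical, and the normal approximation is one common statistical choice among several, not a method prescribed by the sources cited here.

```python
import math
import random

def draw_sample(doc_ids, sample_size, seed=0):
    """Pick a simple random sample of document IDs, without
    replacement. A fixed seed keeps the draw repeatable."""
    return random.Random(seed).sample(doc_ids, sample_size)

def accuracy_interval(num_agree, sample_size, z=1.96):
    """Estimate the share of documents where the suggested category
    matched the reviewer's decision, with a normal-approximation
    confidence interval (z=1.96 corresponds to roughly 95 percent)."""
    p = num_agree / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical review population of 10,000 documents.
sample_ids = draw_sample(list(range(10000)), 400)

# Suppose reviewers agreed with the suggested category on 368 of
# the 400 sampled documents (an invented figure for illustration).
rate, low, high = accuracy_interval(368, 400)
```

A larger sample narrows the interval, which is the trade-off behind conducting "as much or as little sampling as desired."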

Independent studies are also showing that the use of Intelligent Review Technology (including Intelligent Categorization) is more effective than traditional, manual review processes. The Ediscovery Institute released a survey showing that using the technology equivalent of Intelligent Categorization reduced review costs by 45 percent or more.[7] In addition, the TREC Legal Track study from 2008 demonstrated that a “Boolean keyword search found only 24% of the total number of responsive documents in the target data set” while automated searching methods found 76 percent of the responsive documents.[8]

Devoid of Human Control? False.

Intelligent Categorization is not a process devoid of critical human insight and control. In some instances, this new technology has been pitched as a purely hands-off, eyes-off solution. In reality, Intelligent Review Technology as a whole does not replace human reviewers, nor should it. iC works by “learning” from human decisions and applying human logic when suggesting document categories. Human input is required so the technology has data sets with applied classifications from which to learn, and the system learns from both responsive and non-responsive decisions of human reviewers. As more documents are received and sorted, legal teams can rely on technology to continually improve the model while human reviewers can focus their efforts on the content and substance of the documents. In addition, because the tool was designed to increase consistency and accuracy, it affords the flexibility and scalability to give the ediscovery team more control over the review and to leverage as much or as little human input and oversight as is appropriate for the project. Thus, iC is not a substitute for skilled lawyers; rather, it enhances and complements the work they do.

The question of whether it is reasonable to omit review of some documents altogether is an as-yet undetermined legal question. From a technical standpoint, however, IRT systems can support a range of approaches to selective review, such as extracting documents with a sufficiently low probability of responsiveness from review, guiding a review to read just the most important portions of long documents or focusing extra review on documents likely to belong to sensitive categories.
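One way to picture the selective review approaches described above is as a simple routing rule applied to each scored document. The sketch below is hypothetical throughout: the field names, the 0.05 cutoff and the "privileged" category are assumptions chosen for illustration, and whether setting documents aside is legally appropriate remains the open question noted above.

```python
def route_documents(scored_docs, low_cutoff=0.05,
                    sensitive_categories=("privileged",)):
    """Assign each document to a review route.

    scored_docs: list of dicts with 'id', 'p_responsive' (estimated
    probability of responsiveness) and 'category' (suggested category).
    """
    routes = {"extra_review": [], "set_aside": [], "standard_review": []}
    for doc in scored_docs:
        if doc["category"] in sensitive_categories:
            # Sensitive categories get extra attention regardless of score.
            routes["extra_review"].append(doc["id"])
        elif doc["p_responsive"] < low_cutoff:
            # Very unlikely to be responsive: candidate to skip review.
            routes["set_aside"].append(doc["id"])
        else:
            routes["standard_review"].append(doc["id"])
    return routes

# Hypothetical scored documents.
docs = [
    {"id": "DOC-1", "p_responsive": 0.91, "category": "contracts"},
    {"id": "DOC-2", "p_responsive": 0.02, "category": "non-responsive"},
    {"id": "DOC-3", "p_responsive": 0.40, "category": "privileged"},
]
routes = route_documents(docs)
```

The sensitive-category check runs first so that a potentially privileged document is never set aside on score alone; the cutoff itself is a judgment the legal team, not the software, would have to make and defend.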

In short, Intelligent Categorization is a defensible, effective, cost-saving measure that leverages the work of talented attorneys to decrease the time required to complete document review. It is designed to meet the client's flexibility and repeatability needs, and is proving to be a key differentiator in the ability to respond to electronic discovery demands quickly and proportionately.

Note: The above post appeared in the April 2011 issue of the free, monthly e-newsletter, Case Law Update & Trends published by Kroll Ontrack. This newsletter is designed to help busy legal professionals keep pace with case law and information pertaining to electronic evidence. Subscribe and gain valuable and timely information on new ESI court decisions, as well as informative articles and tips for both the corporate and law firm audience.

[1] See, e.g., William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 2009 WL 724954 (S.D.N.Y. Mar. 19, 2009).

[2] Federal Rules of Civil Procedure, Rule 1.

[3] The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in Ediscovery. (Published August 2007).

[4] The Sedona Conference® Commentary on Achieving Quality in the Ediscovery Process, available for download at (Published May 2009).

[5] Mt. Hawley Ins. Co. v. Felman Prods. Inc., 2010 WL 1990555 (S.D.W.Va. May 18, 2010). 

[6] The Sedona Conference® Commentary on Achieving Quality in the Ediscovery Process, Principle 2, states: “In the ediscovery context, statistical sampling can serve as a check on the effectiveness of…automated tools in identifying responsive information and on the reviewers’ ability to correctly code documents.”

[7] See “Ediscovery Institute Survey on Predictive Coding,” available at

[8] For complete results from TREC Legal Track, visit