Legal Technology - Numbers Don’t Lie, but They Don’t Always Tell the Whole Story

Monday, March 25, 2013 by Thought Leadership Team


Major League Baseball’s opening day will soon be upon us, so it’s time to dust off your favorite squad’s cap and embrace the unabashed optimism of a new season that makes you forget past disappointments… even if only for a week or two.


Much like electronic discovery, with its wave of new, advanced legal technologies, the game of baseball is in the midst of a sea change. As the 2011 Academy Award-nominated film Moneyball showed, many general managers and talent evaluators are turning to advanced statistics, such as “wins above replacement” (WAR) and “batting average on balls in play” (BABIP), to measure player performance and predict outcomes.

Despite the rise of advanced statistics, a group of baseball purists still relies on traditional statistics (such as batting average and runs batted in), scouting, and “intangibles.” Even so, advanced metrics are steadily gaining ground in many front offices. WAR has even made its way onto the front page of many stat sheets and into last year’s debate over the American League’s Most Valuable Player.

As the ediscovery community enters a similar new age of advanced litigation and legal technologies, its own culture war will likely be resolved the same way: each side will leverage the other’s strengths, and the two will meet in the middle. Ultimately, although numbers may not tell the whole story on their own, they are extremely useful when combined with seasoned, knowledgeable, and well-trained experts.

Know What Your Data Says

While we continue to strive for a solution that is “good enough,” we naturally compare as many data points as we can get our hands on. For example, different legal technologies run against the same Outlook Personal Storage Table (PST) file may yield different numbers of documents. Such an inconsistency should prompt a closer look: you need to know what assumptions each technology made and what tools it applied in order to understand what it treated as “good enough” output. On further investigation, you might find that the two technologies handle corrupt data differently. One treats corrupt data as a problem document and generates a placeholder, while the other applies additional tools in an attempt to repair the file and produce the best possible output. If that repair fails, the second technology simply reports the document as an exception rather than generating a placeholder, which results in more reported exceptions and fewer documents delivered for review.

It is important to note that neither solution in this hypothetical is truly correct or incorrect: both processing engines handle and report the problem files. But unless you analyze the data and understand how each technology works, one will appear to have clearly “outperformed” the other in the total number of documents produced.
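To make the hypothetical concrete, here is a minimal Python sketch. The two “engines,” the SourceFile record, and its corrupt/repairable flags are all invented for illustration; real processing tools are far more involved and report exceptions in their own formats. The sketch shows only that identical input can produce different delivered counts while every file is still accounted for.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    name: str
    corrupt: bool = False
    repairable: bool = False  # hypothetical flag, only consulted when corrupt is True


def engine_a(files):
    """Hypothetical engine A: each corrupt file is delivered as a flagged placeholder."""
    delivered = [f.name + " (placeholder)" if f.corrupt else f.name for f in files]
    withheld = []  # corrupt files still reach review as placeholders, so none are withheld
    return delivered, withheld


def engine_b(files):
    """Hypothetical engine B: try to repair corrupt files; report failures instead of delivering them."""
    delivered, withheld = [], []
    for f in files:
        if not f.corrupt:
            delivered.append(f.name)
        elif f.repairable:
            delivered.append(f.name + " (repaired)")
        else:
            withheld.append(f.name)  # listed on an exception report, not delivered
    return delivered, withheld


# Hypothetical contents of one PST: two clean items, one fixable, one unrecoverable.
pst = [
    SourceFile("msg-001"),
    SourceFile("msg-002"),
    SourceFile("att-003", corrupt=True, repairable=True),
    SourceFile("att-004", corrupt=True, repairable=False),
]

for label, engine in (("Engine A", engine_a), ("Engine B", engine_b)):
    delivered, withheld = engine(pst)
    accounted = len(delivered) + len(withheld)
    print(f"{label}: {len(delivered)} delivered, {len(withheld)} reported as exceptions, "
          f"{accounted} of {len(pst)} accounted for")
```

Engine A reports four delivered documents and Engine B reports three, yet both account for all four source files. Comparing only the “delivered” column is what creates the illusion that one engine outperformed the other.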

Similarly, a different discrepancy could stem from embedded file handling. Suppose a spreadsheet is fully embedded within a word processing document, but the word processing document displays only one chart from one tab of that spreadsheet. What should happen to the spreadsheet? Is it identified? Extracted? Neither? Both? This matters because if you plan to deliver native files, opposing counsel can see those embedded files, which often contain more data than is visible in the parent file.
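As a rough illustration of the “identified” step, the sketch below lists the embedded parts inside a Word .docx file. It relies on the fact that .docx files are ZIP-based Office Open XML packages in which Word stores embedded objects under a word/embeddings/ folder. The exact file names vary (an embedded workbook may appear as an .xlsx file or as an OLE .bin object), and older binary .doc files are not ZIP packages at all, so this approach would not apply to them. The helper names are hypothetical, and the in-memory sample only mimics a document's folder layout; it is not a real Word file.

```python
import io
import zipfile


def list_embedded_files(docx_bytes):
    """Return the names of parts stored under word/embeddings/ in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith("word/embeddings/") and not name.endswith("/")
        ]


def build_sample_docx():
    """Build a hypothetical, stripped-down .docx-style package in memory for the demo.

    It only mimics the folder layout; Word itself would not open it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<w:document/>")
        package.writestr("word/embeddings/Microsoft_Excel_Worksheet.xlsx", b"stand-in bytes")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_embedded_files(build_sample_docx()):
        print("Embedded file found:", name)
```

Identifying the embedded workbook this way says nothing about whether it should also be extracted as its own document; that remains a processing decision to understand and agree on before delivering natives.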

Granted, the wealth of new technologies in the world of litigation can make it a somewhat confusing place, but ignorance (or indifference) is no excuse: know what you’re looking for, and more importantly, know how to look for it. Courts increasingly expect litigants to be tech-savvy, so truly understanding what’s in your data, or engaging qualified professionals to help you do so, is well within the scope of your duties.