Predictive Coding and Benedict Cumberbatch: Part 2
How – I hear you ask – can I relate Predictive Coding to Benedict Cumberbatch, not once but twice? Well, the answer lies in another of Benedict’s famous roles: Sherlock Holmes. Sherlock Holmes, the famous detective, was renowned for his powers of observation, making the most fantastical deductions from apparently limited information. For example, he accurately describes Dr James Mortimer without ever having met him, simply by examining his walking stick (The Hound of the Baskervilles)!
What can Predictive Coding learn from the world’s greatest detective?
As data volumes grow and grow, the old cliché of looking for a needle in a haystack becomes ever more pertinent. In investigations, for example, the sheer volume of custodians and data collected is significant to say the least – that much is obvious, even to those of us who lack the Holmes finesse. What is less obvious, perhaps, is that while the volume of data is increasing, the proportion of relevant data within the corpus is declining. This is a trend – unlike the deerstalker – that is not going out of style.
It is clear that the next step for Predictive Coding is to become more like Sherlock Holmes: Predictive Coding has to learn more from less. As the trend continues, the volume of not-relevant data will continue to outweigh its relevant counterpart, and there will be fewer examples of relevancy from which the machine can learn. Realistically, this means the machine will over-guess what might be relevant, and so our Precision scores will suffer. (Precision, dear Watson – combined with Recall – is one of the scores used to measure the success of Predictive Coding. Recall is the fraction of all relevant documents that are actually retrieved, and so is a measure of completeness. Precision is the fraction of retrieved documents that are relevant, and so is a measure of exactness.)
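To make those two definitions concrete, here is a minimal sketch of the calculations in Python. The document IDs and set sizes are entirely hypothetical, purely for illustration:

```python
# Hypothetical review outcome: which documents are truly relevant,
# and which documents the machine retrieved as relevant.
relevant = {"doc1", "doc2", "doc3", "doc4", "doc5"}   # all truly relevant documents
retrieved = {"doc1", "doc2", "doc9", "doc10"}         # documents predicted relevant

true_positives = relevant & retrieved                 # correctly retrieved documents

# Precision: what fraction of the retrieved documents are actually relevant?
precision = len(true_positives) / len(retrieved)      # 2 / 4 = 0.50

# Recall: what fraction of all relevant documents were retrieved?
recall = len(true_positives) / len(relevant)          # 2 / 5 = 0.40

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Notice how over-guessing plays out in these numbers: retrieving more documents indiscriminately can only raise Recall, but every wrong guess drags Precision down.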
The machine learning algorithm will need to make deductions from less information, and to become steadier when making predictions, in order to pull those Precision scores back up. And while the machine learns to eliminate the impossible and keep whatever remains, however improbable, there are also strategies that we can take to aid it in its learning.
Predictive Coding should never exist in a vacuum: it is one of a variety of technologies available to us. So while Predictive Coding is learning to be Holmes, we humans can play Watson. When relevancy rates are low, we are trying to answer two questions. Firstly, has the machine seen examples of every type of relevant document from which to learn? And secondly, how can we sharpen the definition of relevance? One practical way to address both is to enrich and rebalance the training set, as sketched below.
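As a sketch of that idea, the snippet below trains a simple text classifier on a tiny, imbalanced seed set and tells it to weight the scarce relevant examples more heavily. It assumes the scikit-learn library; the documents, labels, and prevalence are hypothetical, and a real workflow would involve far larger samples and human review of the results:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny, imbalanced seed set: far more not-relevant (0) than relevant (1),
# mirroring the low prevalence described above.
seed_docs = [
    "invoice for catering services",                # 0
    "minutes of the facilities meeting",            # 0
    "holiday rota for the finance team",            # 0
    "newsletter from the social committee",         # 0
    "email agreeing the disputed contract terms",   # 1
    "draft side letter amending the contract",      # 1
]
seed_labels = [0, 0, 0, 0, 1, 1]

# class_weight='balanced' up-weights the scarce relevant examples so the
# model is not swamped by the not-relevant majority.
model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(class_weight="balanced"),
)
model.fit(seed_docs, seed_labels)

# Rank unreviewed documents by predicted probability of relevance, so the
# human Watson can review (and correct) the most promising candidates first.
unreviewed = ["email about amending contract terms", "lunch menu for Friday"]
for doc, prob in zip(unreviewed, model.predict_proba(unreviewed)[:, 1]):
    print(f"{prob:.2f}  {doc}")
```

The design choice here is the human-in-the-loop: the ranking does not replace review, it directs Watson's attention to the documents most likely to supply the missing examples of relevance.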
So, if faced with a large number of documents and a low prevalence of relevance, think Sherlock Holmes and remember: while the machine is learning to be Sherlock, you can still be Watson and assist in cracking the case!