Can AI really predict case outcomes?
If historical outcomes are perhaps the best indicator of the likely success of a particular case, then why not simply use AI to quickly perform research into the success rate of that particular type of case before the allocated judge, writes Travis Schultz.
Generative AI, and its capacity to work magic in our professional lives, seems to be a theme that transcends every legal journal, conference program, and LinkedIn feed these days. Much has been said about its ability to support, and even replace, functions that we as lawyers routinely perform; if some pundits are to be believed, many of our jobs are at risk. But will artificial intelligence ever really progress to the point that it can accurately predict case outcomes? I, for one, very much doubt that AI will be better at forecasting outcomes than a data-informed and experienced lawyer.
For most of us in the legal profession, one of the questions clients ask most often is, “Am I going to win the case?” And as lawyers, we draw upon our experience, education, and precedent, sniff the breeze, and usually provide a very guarded opinion. “The identity of the trial judge is perhaps the most important factor,” we often say, “and then it depends on how the evidence falls.”
Companies that provide specialised legal industry AI services make all sorts of claims about the accuracy and reliability of their artificial intelligence tools, particularly when it comes to legal research and case analysis. I was interested to read, however, that according to an article by Varun Magesh et al, the hallucination rates in these products can range between 17 and 33 per cent. The hallucinations by which large language models like ChatGPT generate fake cases and precedents have become infamous!
One high-profile AI tool that claims to be able to predict outcomes in legal cases is Lex Machina. I read with interest that they claim to have performed better than experienced lawyers when trying to predict the outcomes of decisions of the US Supreme Court. But as I understand it, tools like Lex Machina make their predictions using information available about historical events and outcomes – such as the judge, the lawyers, and the issues – rather than by applying a deep understanding of the law to the known facts. If that’s the case, then how could such a tool predict the outcome of a case in an emerging area of law any better than a member of the legal profession? And if the evidence has not yet been received by a court, how could an AI tool realistically make predictions about issues such as the credibility or impact of witnesses, or whether the evidence will support the pleaded case?
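To make the distinction concrete, here is a minimal sketch, in Python, of the kind of metadata-driven prediction these tools appear to rely on – a classifier trained purely on historical features such as the judge and the case type, with no understanding of the pleadings at all. The features, data, and labels are my own hypothetical illustration, not Lex Machina’s actual method.

```python
# A hypothetical metadata-only outcome predictor: it "learns" from who the
# judge was and what type of case it was - never from the law or the evidence.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented historical records: (judge, case type) -> plaintiff won (1) or lost (0).
X = [["Judge A", "personal injury"],
     ["Judge A", "contract"],
     ["Judge B", "personal injury"],
     ["Judge B", "contract"],
     ["Judge A", "personal injury"],
     ["Judge B", "personal injury"]]
y = [1, 0, 0, 0, 1, 1]

# One-hot encode the categorical features, then fit a simple classifier.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LogisticRegression())
model.fit(X, y)

# The "prediction" for a new matter is just a probability extrapolated from
# historical metadata - nothing here reads the pleadings or weighs evidence.
print(model.predict_proba([["Judge A", "personal injury"]])[0][1])
```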
If historical outcomes are perhaps the best indicator of the likely success of a particular case, then why not simply use AI to quickly perform research into the success rate of that particular type of case before the allocated judge?
I was interested to read an article published in the UNSW Law Journal in 2022, authored by Daniel Ghezelbash, Keyvan Dorostkar and Shannon Walsh, in which the judicial review decisions of different judges of the Federal Circuit and Family Court relating to refugee cases were analysed. The authors used an AI tool to review some 6,700 decisions between 2013 and 2021, and the results were staggering! At one end of the spectrum, Judge Vasta allowed only one of the 165 applications that he heard (a success rate of about 0.6 per cent), while at the other end, three judges allowed judicial review applications over 20 per cent of the time (Jones J 23.08 per cent, Riley J 21.74 per cent and Riethmuller J 21.51 per cent). The median success rate across the 30 judicial officers was about 8 per cent – which means the variation between individual judges is substantial.
Data of this type, collated by an AI tool, would provide an undeniably helpful guide to lawyers who were trying to predict the likelihood of success before a particular judge.
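The aggregation itself is the easy part once decisions are captured in structured form. By way of illustration only, here is a minimal sketch in Python, assuming a hypothetical dataset with one row per decision, a “judge” column and an “allowed” flag – the file name and layout are my invention, not the study’s:

```python
# Per-judge success rates from a hypothetical structured dataset of decisions.
import pandas as pd

# Assumed layout: one row per decision, with the judge's name and a
# boolean "allowed" flag recording whether the application succeeded.
decisions = pd.read_csv("judicial_review_decisions.csv")

rates = (decisions.groupby("judge")["allowed"]
         .agg(total="count", allowed="sum"))
rates["success_rate_%"] = 100 * rates["allowed"] / rates["total"]

print(rates.sort_values("success_rate_%"))
print(f"Median success rate: {rates['success_rate_%'].median():.1f}%")
```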
A Queensland perspective
As a compensation lawyer practising in Queensland for over 30 years, I’ve privately held poorly informed views as to the decision-making tendencies of our judiciary. Human nature being what it is, some judges have a reputation for being naturally more sympathetic towards plaintiffs than others, just as medico-legal report writing specialists exist on a spectrum from mean to compassionate! However, when I recently embarked on a research project to analyse the decisions of the Queensland bench and attempt to identify any trends, the results took me a little by surprise.
Lacking any form of technology skills (unlike the authors of the judicial review report), I didn’t have an AI tool to assist in the task. So, with the assistance of a team of graduate lawyers and paralegals in my firm – Shelby Bennett, Isabella Blunt, Aiden Warneke, Emmi Airaksinen, and Abbey Pringle – my “analogue” process involved a physical review of every civil decision of currently serving District Court and Supreme Court judges who have served for at least five years. So that the data sets would be meaningful, judicial officers of shorter tenure were not considered, as their number of cases may be too small to support any conclusions.
In conducting our research, my real interest was in looking for patterns of decision making in personal injury cases, to enable an analysis of whether particular judges took a different approach in compensation law cases (compared to other civil litigation) and whether there were any patterns that might help to predict outcomes. In civil judgments where the parties were each successful on different issues, the research table records the party who was “most successful” as the successful party, as sketched below. Some decisions concerned only costs or an interlocutory application, and did not necessarily determine a substantive dispute between the parties.
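To make that recording rule explicit, here is a minimal sketch of how it might be expressed in code – the outcome labels, and the tie-break in favour of the defendant, are my own illustrative assumptions:

```python
# A sketch of the coding rule used in the research table: where each side won
# on some issues, the party that was "most successful" is recorded as the winner.
def most_successful_party(issue_outcomes: list[str]) -> str:
    """issue_outcomes is a list of "plaintiff" / "defendant" labels,
    one per issue decided in the judgment."""
    plaintiff_wins = issue_outcomes.count("plaintiff")
    defendant_wins = issue_outcomes.count("defendant")
    # Ties are broken in favour of the defendant here - an assumption,
    # not necessarily how the research team resolved them.
    return "plaintiff" if plaintiff_wins > defendant_wins else "defendant"

# A judgment where the plaintiff won on two issues and lost on one
# is recorded as a finding for the plaintiff.
print(most_successful_party(["plaintiff", "defendant", "plaintiff"]))  # plaintiff
```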
Ignoring appellate-level decisions, the research conducted for the period 1 January 2014 to 31 December 2023 is tabulated below:
First instance decisions
(The names of the judges have been redacted, as it is not my intention to create a “league table”.)

| Judge | Total number of all civil judgments | Finding for plaintiff (all civil judgments) | Total number of PI judgments | Finding for plaintiff (PI judgments) |
| --- | --- | --- | --- | --- |
| Judge 1 | 34 | 32% | 8 | 25% |
| Judge 2 | 11 | 36% | 5 | 40% |
| Judge 3 | 34 | 44% | 7 | 29% |
| Judge 4 | 25 | 56% | 8 | 50% |
| Judge 5 | 100 | 59% | 12 | 50% |
| Judge 6 | 31 | 58% | 12 | 67% |
| Judge 7 | 44 | 80% | 15 | 80% |
| Judge 8 | 32 | 41% | 4 | 0% |
| Judge 9 | 74 | 54% | 6 | 50% |
| Judge 10 | 70 | 57% | 9 | 56% |
| Judge 11 | 146 | 43% | 15 | 40% |
| Judge 12 | 23 | 44% | 2 | 50% |
| Judge 13 | 165 | 58% | 30 | 60% |
| Judge 14 | 77 | 38% | 2 | 100% |
| Judge 15 | 13 | 46% | 5 | 60% |
| Judge 16 | 7 | 14% | 4 | 25% |
| Judge 17 | 90 | 48% | 12 | 50% |
| Judge 18 | 11 | 55% | 3 | 67% |
| Judge 19 | 19 | 37% | 5 | 40% |
| Judge 20 | 62 | 58% | 7 | 72% |
| Judge 21 | 131 | 47% | 13 | 77% |
| Judge 22 | 90 | 46% | 5 | 80% |
| Judge 23 | 59 | 37% | 12 | 8% |
| Judge 24 | 15 | 53% | 7 | 57% |
Considering this data, it can confidently be asserted that there is a statistically significant variation in the success rate of personal injury cases in front of particular judges – some as low as 8 per cent, and others as high as 80 per cent! A pattern such as this would be clearly visible to a well-trained AI tool – an application that could undertake the exercise in seconds rather than the months my research took.
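For readers who want to probe that assertion formally, a chi-square test of homogeneity over the per-judge win/loss counts is one natural check. Here is a minimal sketch using the PI figures from the table above – the win counts are reconstructed from rounded percentages, and with samples this small the chi-square approximation is rough, so treat any p-value as indicative only:

```python
# Does the plaintiff success rate in PI cases vary across judges more than
# chance alone would explain? A chi-square test of homogeneity is one check.
from scipy.stats import chi2_contingency

# (total PI judgments, finding-for-plaintiff %) per judge, from the table above.
pi_data = [(8, 25), (5, 40), (7, 29), (8, 50), (12, 50), (12, 67), (15, 80),
           (4, 0), (6, 50), (9, 56), (15, 40), (2, 50), (30, 60), (2, 100),
           (5, 60), (4, 25), (12, 50), (3, 67), (5, 40), (7, 72), (13, 77),
           (5, 80), (12, 8), (7, 57)]

# Reconstruct approximate win/loss counts per judge from the rounded percentages.
table = [[round(n * pct / 100), n - round(n * pct / 100)] for n, pct in pi_data]

# Note: many expected counts here are small, so the chi-square
# approximation is loose - the sketch shows the method, not a proof.
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.4f}")
```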
Conclusion
While AI is becoming more sophisticated and is continuously learning to teach itself, even generative AI takes a statistical approach to analysing historical data. My firm’s analogue research tends to support the notion that trends can be spotted in particular types of cases before particular judges – nuances that an AI tool could find in a matter of seconds – but I am yet to be convinced that AI can effectively use pleadings alone to predict outcomes any better than an experienced litigator who has access to reliable historical datasets.