Can GenAI outperform Australian law students?

An NSW-based law lecturer recently undertook an experiment, pitting his criminal law cohort against 10 separate AI-generated responses for an end-of-semester exam. The results might surprise you.

Jerome Doraisamy · 25 September 2024 · Big Law

Since generative AI (GenAI) exploded into mainstream consciousness, much has been made of its capacity to perform the duties of legal professionals, reviving the discourse around lawyers being replaced by emerging technology.

Dr Armin Alimardani, a lecturer in law and emerging technologies at the University of Wollongong (UOW), has been investigating whether GenAI can outperform law students – or, indeed, an overwhelming majority of them.

His findings form the basis of a new paper, Generative Artificial Intelligence vs. Law Students: An Empirical Study on Criminal Law Exam Performance, published yesterday (Tuesday, 24 September) in Law, Innovation and Technology.

The trigger was OpenAI’s claim that its GPT-4 model scored in the top 10 per cent of test takers on a simulated United States bar exam. He said: “The OpenAI claim was impressive and could have significant implications in higher education; for instance, does this mean [that] students can just copy their assignments into generative AI and ace their tests?”

“Many of us have played around with generative AI models, and they don’t always seem that smart, so I thought why not test it out myself with some experiments.”

The experiment

Last year, in UOW’s second semester, Alimardani – in his capacity as subject coordinator for criminal law – compiled 10 AI-generated answers to the end-of-semester exam. Five responses were generated simply by feeding the exam into different versions of ChatGPT, and another five used various prompt engineering techniques to elicit stronger answers.
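
The paper’s exact workflow isn’t reproduced here, but a minimal sketch of how such answers might be collected programmatically is below. Everything in it – the model names, the prompt wording and the exam_question placeholder – is an illustrative assumption rather than a detail from the study, which refers to versions of ChatGPT rather than to any particular script.

    # Hypothetical sketch only: gathering exam answers from different models,
    # with and without a prompt-engineered instruction. Not the study's code.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    exam_question = "..."  # the criminal law exam problem would go here

    PLAIN = [{"role": "user", "content": exam_question}]
    ENGINEERED = [
        {"role": "system",
         "content": ("You are an Australian criminal law student. Answer in "
                     "IRAC structure, applying only the legal principles and "
                     "facts given in the question.")},  # one common technique
        {"role": "user", "content": exam_question},
    ]

    for model in ("gpt-3.5-turbo", "gpt-4"):  # illustrative model choices
        for label, messages in (("plain", PLAIN), ("engineered", ENGINEERED)):
            reply = client.chat.completions.create(model=model, messages=messages)
            print(model, label, reply.choices[0].message.content[:80])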

“My research assistant and I hand-wrote the AI-generated answers in different exam booklets and used fake student names and numbers. These booklets were indistinguishable from the real ones,” Alimardani said.

After the criminal law exam was held, Alimardani mixed the AI-generated papers with the real student papers and handed them to tutors for grading; each tutor unknowingly marked two AI papers in their allocated bundle.

The results showed that – for a cohort of 225 students sitting an exam marked out of 60 – the average mark was approximately 40 (i.e. 66 per cent).

Of the five papers generated with different versions of ChatGPT alone, two received bare passes and three failed. The best-performing paper of that quintet scored better than only 14.7 per cent of students.

“… this small sample suggests that if the students simply copied the exam question into one of the OpenAI models, they would have a 50 per cent chance of passing,” Alimardani said.
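
That percentile figure is straightforward to unpack: in a cohort of 225, scoring better than 14.7 per cent of students means outscoring roughly 33 of them. A minimal sketch of the computation, using made-up marks since the study’s raw data aren’t reproduced here:

    # Illustrative only: what share of a cohort a given mark beats.
    def percentile_rank(mark: float, cohort_marks: list[float]) -> float:
        """Percentage of cohort marks strictly below `mark`."""
        beaten = sum(1 for m in cohort_marks if m < mark)
        return 100 * beaten / len(cohort_marks)

    # Hypothetical 225-student distribution in which 33 students score below 30
    cohort = [40.0] * 192 + [20.0] * 33
    print(percentile_rank(30.0, cohort))  # -> 14.666..., i.e. ~14.7 per cent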

Of the five papers that used prompt engineering techniques, three “weren’t that impressive”, Alimardani noted, but two performed reasonably well, with one scoring 73 per cent and the other 78 per cent.

Ultimately, he said, “these results don’t quite match the glowing benchmarks from OpenAI’s United States bar exam simulation, and none of the 10 AI papers performed better than 90 per cent of the students”.

Another potentially surprising result was that “hallucination” – the generation of fabricated information by an AI tool – did not occur in this experiment; the models stayed true to existing legal principles and to the facts provided in the exam, Alimardani noted.

Implications

Looking ahead, “alignment” – the degree to which AI-generated outputs match the user’s intentions – will be the real problem, Alimardani warned.

“The AI-generated answers weren’t as comprehensive as we expected. It seemed to me that the models were fine-tuned to avoid hallucination by playing it safe and providing less detailed answers,” he said.

“My research shows that people can’t get too excited about the performance of GenAI models in benchmarks. The reliability of benchmarks may be questionable, and the way they evaluate models could differ significantly from how we evaluate students.”

Moreover, the findings suggest that graduates who know how to work with AI could have an advantage in the job market, Alimardani continued.

“Prompt engineering can significantly enhance the performance of GenAI models, and therefore, it is more likely that future employers would have higher expectations regarding students’ GenAI proficiency,” he said.

“It’s likely students will be increasingly assessed on their ability to collaborate with AI to complete tasks more efficiently and with higher quality.”

Elsewhere, there may be implications for legal educators, Alimardani said.

None of the tutors tasked with grading suspected that any of the papers were AI-generated, he noted, and they were “genuinely surprised” when they found out.

In addition, “three of the tutors admitted that even if the submissions were online, they wouldn’t have caught it”.

“So, if academics think they can spot an AI-generated paper, they should think again,” he said.

Jerome Doraisamy

Jerome Doraisamy is the editor of Lawyers Weekly. A former lawyer, he has worked at Momentum Media as a journalist on Lawyers Weekly since February 2018, and has served as editor since March 2022. He is also the host of all five shows under The Lawyers Weekly Podcast Network, and has overseen the brand's audio medium growth from 4,000 downloads per month to over 60,000 downloads per month, making The Lawyers Weekly Show the most popular industry-specific podcast in Australia. Jerome is also the author of The Wellness Doctrines book series, an admitted solicitor in NSW, and a board director of Minds Count.
