Investigating marker accuracy in differentiating between university scripts written by students and those produced using ChatGPT

Abstract

The introduction of OpenAI’s ChatGPT has widely been considered a turning point for assessment in higher education. Whilst we find ourselves on the precipice of a profoundly disruptive technology, generative artificial intelligence (AI) is here to stay. At present, institutions around the world are considering how best to respond to such new and emerging tools, with responses ranging from outright bans to the re-evaluation of assessment strategies. To evaluate the extent of the problem that these tools pose to the marking of assessments, a study was designed to investigate marker accuracy in differentiating between scripts prepared by students and those produced using generative AI. A survey containing undergraduate reflective writing scripts and postgraduate extended essays was administered to markers at a medical school in Wales, UK. The markers were asked to assess the scripts on writing style and content, and to indicate whether they believed each script to have been produced by a student or by ChatGPT. Of the 34 markers recruited, only 23% and 19% were able to correctly identify the ChatGPT undergraduate and postgraduate scripts, respectively. A significant effect of suspected script authorship was found for script content, χ²(4, n = 34) = 10.41, p < 0.05, suggesting that written content holds clues as to how markers assign authorship. We recommend that consideration be given to how generative AI can be responsibly integrated into assessment strategies, and to expanding our definition of what constitutes academic misconduct in light of this new technology.

https://doi.org/10.37074/jalt.2023.6.2.13
