Automated versus Human Essay Scoring: A Comparative Study

Rania Zribi, Chokri Smaoui


This study investigated the validity of automated essay scoring (Paper Rater) of EFL learners’ written performances by comparing the group mean scores assigned by Paper Rater with those assigned by human raters. Ten intermediate EFL learners responded to a topic and received scores from both the automated and the human scoring processes. A one-way repeated-measures ANOVA, run in SPSS, was used to test the difference between the computerized mean scores and the human raters’ mean scores. Unlike previous studies, this study found differences between the scores awarded by the two procedures. The mean scores assigned by the automated essay scoring tool Paper Rater were significantly higher than the human raters’ scores for the same essays, and the Paper Rater scores did not seem to correlate well with those of the human raters. The implication for English teachers is that, despite its cost-effectiveness, the automated scoring system lacks the ability to award scores as reliable as those of human raters. However, the application of a computerized scoring system in education can still play a key role in improving the learning process: thanks to its instant feedback, this software may contribute to the improvement of EFL learners’ writing.

