Don't Just Judge the Spelling! The Influence of Spelling on Assessing Second-Language Student Essays

Thorben Jansen
Cristina Vögelin
Nils Machts
Stefan Daniel Keller
Jens Möller


When judging subject-specific aspects of students’ texts, teachers should assess various characteristics, e.g., spelling and content, independently of one another, since these characteristics are indicators of different skills. Independent judgments enable teachers to adapt their classroom instruction to students’ skills. It is still unclear how well teachers meet this challenge and which interventions could help them. In Study 1, N = 51 pre-service teachers assessed four authentic English as a Second Language (ESL) essays with different overall text qualities and different qualities of spelling using holistic and analytic rating scales. Results showed a negative influence of the experimentally manipulated spelling errors on the judgment of almost all textual characteristics. In Study 2, an experimental prompt was used to reduce this judgment error. Participants who were made aware of the judgment error caused by spelling errors were less influenced by spelling in their judgments, indicating a reduction of bias. The determinants of the observed effects and their practical implications are discussed.

How to Cite
Jansen, T., Vögelin, C., Machts, N., Keller, S. D., & Möller, J. (2021). Don’t Just Judge the Spelling! The Influence of Spelling on Assessing Second-Language Student Essays. Frontline Learning Research, 9(1), 44–65.

