Don't Just Judge the Spelling! The Influence of Spelling on Assessing Second-Language Student Essays

Thorben Jansen
Cristina Vögelin
Nils Machts
Stefan Daniel Keller
Jens Möller

Abstract

When judging subject-specific aspects of students’ texts, teachers should assess various characteristics, e.g., spelling and content, independently of one another, since these characteristics are indicators of different skills. Independent judgments enable teachers to adapt their classroom instruction to students’ skills. It is still unclear how well teachers meet this challenge and which interventions could help them. In Study 1, N = 51 pre-service teachers assessed four authentic English as a Second Language (ESL) essays with different overall text qualities and different qualities of spelling using holistic and analytic rating scales. Results showed a negative influence of the experimentally manipulated spelling errors on the judgment of almost all textual characteristics. In Study 2, an experimental prompt was used to reduce this judgment error. Participants who were made aware of the judgment error caused by spelling errors formed less biased judgments. The determinants of the observed effects and their practical implications are discussed.

Article Details

How to Cite
Jansen, T., Vögelin, C., Machts, N., Keller, S. D., & Möller, J. (2021). Don’t Just Judge the Spelling! The Influence of Spelling on Assessing Second-Language Student Essays. Frontline Learning Research, 9(1), 44–65. https://doi.org/10.14786/flr.v9i1.541
Section
Articles
