When judging subject-specific aspects of students' texts, teachers should assess different characteristics, such as spelling and content, independently of one another, because these characteristics indicate different skills. Independent judgments enable teachers to adapt their classroom instruction to students' skills. It remains unclear how well teachers meet this challenge and which interventions could help them. In Study 1, N = 51 pre-service teachers assessed four authentic English as a Second Language (ESL) essays that varied in overall text quality and in spelling quality, using both holistic and analytic rating scales. The experimentally manipulated spelling errors negatively affected judgments of almost all textual characteristics. In Study 2, an experimental prompt was used to reduce this judgment error. Participants who were made aware of the judgment error caused by spelling errors formed less biased judgments. The determinants of the observed effects and their practical implications are discussed.
FLR adopts the Creative Commons Attribution-NonCommercial-NoDerivs License (CC BY-NC-ND). That is, copyright for articles published in this journal is retained by the authors, with first publication rights granted to the journal. Because they appear in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.