Ensuring content validity of psychological and educational tests – the role of experts

Klaus Beck


Many test developers try to ensure the content validity of their tests by having external experts review the items for relevance, difficulty, clarity, and so on. Although this approach is widely accepted, a closer look reveals several pitfalls that must be avoided if experts’ advice is to be truly helpful. First, I offer a classification of the tasks that test developers give to experts, as reported in the literature on procedures for drawing on experts’ advice. Second, I review a sample of reports on test development (N = 72) to identify the procedures currently used for selecting and consulting experts. The results indicate that the choice of experts often seems somewhat arbitrary, the questions posed to experts lack precision, and the methods used to evaluate experts’ feedback are questionable. Given these findings, I explore in more depth which prerequisites must be met for experts’ contributions to be useful in ensuring the content validity of tests. In conclusion, explicit guidelines on this matter need to be elaborated and standardised (above all, by the AERA, APA, and NCME in their “Standards”).

How to Cite
Beck, K. (2020). Ensuring content validity of psychological and educational tests – the role of experts. Frontline Learning Research, 8(6), 1–37. https://doi.org/10.14786/flr.v8i6.517

References


AERA, APA, & NCME (1985). Standards for Educational and Psychological Testing. Washington: APA.

AERA, APA, & NCME (2014). Standards for Educational and Psychological Testing. Washington: AERA.

Allen, M. J. & Yen, W. M. (2002). Introduction to measurement theory (2nd ed.). Prospect Heights, IL: Waveland Press.

Anastasi, A. & Urbina, S. (1997). Psychological testing (7th ed.). New York, NY: Prentice Hall.

Anderson, D., Irvin, S., Alonzo, J., & Tindal, G. A. (2015). Gauging Item Alignment Through Online Systems While Controlling for Rater Effects. Educational Measurement: Issues and Practice, 34(1), 22–33.

Angoff, W. H. (1988). Validity: an evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 9–13). Hillsdale, NJ: Lawrence Erlbaum.

Baker, E. (2013). The Chimera of Validity. Teachers College Record, 115(9).

Beck, K. (1987). Die empirischen Grundlagen der Unterrichtsforschung [The empirical foundations of research on classroom teaching. A critical analysis of the descriptive power of observation methods]. Goettingen: Hogrefe.

Beck, K., Landenberger, M. & Oser, F. (Eds.) (2016). Technologiebasierte Kompetenzmessung in der beruflichen Bildung [Technology-based measurement in vocational education and training]. Bielefeld: Bertelsmann.

Berk, R. (1990). Importance of expert judgment in content-related validity evidence. Western Journal of Nursing Research, 12(5), 659–671. DOI: 10.1177/019394599001200507

Brennan, R. L. (2013). Commentary on “Validating the Interpretations and Uses of Test Scores”. Journal of Educational Measurement, 50(1), 74–83.

Dodd, J. (2002). Truth. Analytic Philosophy, 43(4), 279–291. DOI: 10.1111/1468-0149.00270 [https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0149.00270]

Ericsson, K. A. & Smith, J. (1991). Prospects and limits of the empirical study of expertise: an introduction. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise (pp. 1–39). New York: Cambridge Univ. Press.

Field, J. (2013). Cognitive validity. In A. Geranpayeh & L. Taylor (Eds.), Examining Listening. Research and practice in assessing second language listening (pp. 77–151). Cambridge, UK: Cambridge Univ. Press.

Gay, L. R. (1980). Educational evaluation and measurement: Competencies for analysis and application. Columbus, OH: Charles E. Merrill.

Grant, J. S. & Davis, L. L. (1997). Selection and Use of Content Experts for Instrument Development. Research in Nursing & Health, 20, 269–274.

Guion, R. M. (1977). Content validity: the source of my discontent. Applied Psychological Measurement, 1(1), 1–10. DOI: 10.1177/014662167700100103

Henig, J. R. (2013). The Politics of Testing When Measures “Go Public”. Teachers College Record, 115(9), 1–11.

Jackson, F., Oppy, G. & Smith, M. (1994). Minimalism and truth aptness. Mind, 103(411), 287–302. [https://www.jstor.org/stable/2253741?seq=1#page_scan_tab_contents]

Kane, M. T. (2013a). Validating the Interpretations and Uses of Test Scores. Journal of Educational Measurement, 50(1), 1–73.

Kane, M. T. (2013b). Validation as a Pragmatic, Scientific Activity. Journal of Educational Measurement, 50(1), 115–122.

Kerlinger, F. N. (1986). Foundations of behavioral research (3rd ed.). New York, NY: Holt, Rinehart, & Winston.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575.

Lynn, M. (1986). Determination and quantification of content validity. Nursing Research, 35, 382–385.

Maas, Van der, H. L. J., Kan, K.-J. & Borsboom, D. (2014). Intelligence Is What the Intelligence Test Measures. Seriously. Journal of Intelligence, 2(1), 12–15. DOI: https://doi.org/10.3390/jintelligence2010012

Messick, S. (1987). Validity. ETS Research Report Series, 1987(2), 1–108. http://onlinelibrary.wiley.com/doi/10.1002/j.2330-8516.1987.tb00244.x/abstract; date accessed 2018/05/06; DOI: 10.1002/j.2330-8516.1987.tb00244.x

Messick, S. (1990). Validity of Test Interpretation and Use. ETS Research Report Series. https://eric.ed.gov/?id=ED395031; date accessed 2019/03/30.

Newton, P. E. & Shaw, S. D. (2014). Validity in Educational and Psychological Assessment. Los Angeles: Sage.

Pant, H. A., Zlatkin-Troitschanskaia, O., Lautenbach, C., Toepper, M. & Molerov, D. (Eds.) (2016). Modelling and Measuring Competencies in Higher Education – Validation and Methodological Innovations (KoKoHs) – Overview of the Research Projects (KoKoHs Working Papers, 11). Berlin & Mainz: Humboldt University & Johannes Gutenberg University. http://www.kompetenzen-im-hochschulsektor.de/617_DEU_HTML.php; date accessed 2018/03/20.

Popper, K. R. (1972). Objective Knowledge. Oxford: Clarendon.

Reynolds, C. R., Livingston, R. B. & Willson, V. (2009). Measurement and assessment in education (2nd ed.). Upper Saddle River, NJ: Pearson.

Rovinelli, R. J. & Hambleton, R. K. (1977). On the use of content specialists in the assessment of criterion-referenced test item validity. Dutch Journal for Educational Research, 2, 49–60.

Savigny, von, E. (1971). Grundkurs im wissenschaftlichen Definieren [Basic course on scientific defining] (2nd ed.). München: DTV.

Shavelson, R. J., Gao, X. & Baxter, G. P. (1995). On the content validity of performance assessments: Centrality of domain specification. In M. Birenbaum & F. Dochy (Eds.), Alternatives in Assessment of Achievements, Learning Processes, and Prior Knowledge (pp. 131–141). Boston: Kluwer Academic.

Shepard, L. A. (2013). Validity for What Purpose? Teachers College Record, 115(9), 1–12. http://www.tcrecord.org ID Number: 17116, date accessed: 2019/01/23.

Siedentop, D. & Eldar, E. (1989). Expertise, experience, and effectiveness. Journal of Teaching in Physical Education, 8, 254–260.

Sireci, S. G. (1998). The construct of content validity. Social Indicators Research, 45(1), 83–117. DOI: 10.1023/A:100698552

Smith, M. D. (2017). Cognitive Validity: Can Multiple-Choice Items Tap Historical Thinking Processes? American Educational Research Journal, 54(6), 1256–1287. DOI: 10.3102/0002831217717949

Thorn, D. W. & Deitz, J. C. (1989). Examining Content Validity Through the Use of Content Experts. The Occupational Therapy Journal of Research, 9, 334–346.

Vogt, D. S., King, D. W. & King, L. A. (2004). Focus Groups in Psychological Assessment: Enhancing Content Validity by Consulting Members of the Target Population. Psychological Assessment, 16(3), 231–243.

Walstad, W. B. & Rebeck, K. (2001). Test of Economic Literacy (3rd ed.). New York: National Council on Economic Education.

Welner, K. G. (2013). Consequential Validity and the Transformation of Tests from Measurement Tools to Policy Tools. Teachers College Record, 115(9), 1–6. http://www.tcrecord.org ID Number: 17115, date accessed: 2015/05/06.

White, M. C. (2018). Rater Performance Standards for Classroom Observation Instruments. Educational Researcher, 47(8), 492–501.

Wilson, F. R., Pan, W. & Schumsky, D. A. (2012). Recalculation of the Critical Values for Lawshe’s Content Validity Ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197–210.

Zlatkin-Troitschanskaia, O., Pant, H. A., Nagel, Th.-M., Molerov, D., Lautenbach, C. & Toepper, M. (Eds.) (2020). Portfolio of KoKoHs Assessments. Test Instruments for Modelling and Measuring Domain-specific and Generic Competencies of Higher Education Students and Graduates. Mainz & Berlin. https://www.wihoforschung.de/_medien/downloads/KoKoHs_Kompetenztest-Verfahren_Englisch.pdf