Understanding Screening: Validity
Validity is broadly defined as how well something measures what it is supposed to measure. The reliability and validity of assessment scores are closely related concepts that inform each other. Marianne Miserandino’s Ice Cream Personality Test, which is often used to demonstrate both concepts, can help illustrate their overlap and differences. A group of participants answers two questions. The first asks each participant to select their favorite ice cream flavor from a list of six options. The second asks which of six personality descriptions best fits their personality. The reliability of scores could be demonstrated by administering the questions twice over a two-week period and then correlating the responses to each question across the two administrations. A higher correlation for each question would provide evidence that participants’ ice cream preferences and personality selections were stable over the two weeks. Whereas reliability is evaluated through the consistency of scores, validity is concerned with how well a set of scores reflects the intended construct or domain being assessed.
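The test-retest idea described above can be sketched in a few lines of Python. Because the ice cream and personality answers are categories rather than numbers, this sketch swaps the correlation for Cohen's kappa, a chance-corrected agreement statistic. That substitution, the function name `cohen_kappa`, and the six responses are all assumptions made for illustration; none of them comes from the original test.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Chance-corrected agreement between two administrations of one categorical question."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty response lists")
    n = len(first)
    # Proportion of participants who gave the same answer both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from how often each answer was chosen.
    counts_first = Counter(first)
    counts_second = Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        return 1.0  # everyone chose the same single answer on both occasions
    return (observed - expected) / (1 - expected)


# Hypothetical flavor choices from six participants, two weeks apart.
week_0 = ["vanilla", "chocolate", "strawberry", "vanilla", "chocolate", "vanilla"]
week_2 = ["vanilla", "chocolate", "strawberry", "chocolate", "chocolate", "vanilla"]

kappa = cohen_kappa(week_0, week_2)
print(f"test-retest agreement (kappa): {kappa:.2f}")
```

A value near 1 would suggest that choices were stable across the two administrations, while a value near 0 would suggest agreement no better than chance.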
In our Ice Cream Personality Test example, if a researcher hypothesized that ice cream preference is associated with personality description, a moderate to strong correlation between ice cream preference and personality description would provide one piece of evidence of validity. The qualifier "one piece of evidence" is used because there are many ways to operationalize and provide evidence for how well scores (e.g., from an assessment such as a screener) reflect what an assessment is supposed to measure. From a unifying perspective, six broad forms of validity for screener scores may be housed under the umbrella term of construct validity:
- Content validity concerns the relevance of the assessment content, the overall representativeness of that content (e.g., test items or stimuli), and the quality of the test items or stimuli.
- Substantive validity is established through a description of the theoretical rationales that explain consistency in a person’s responses to test items.
- Structural validity describes how well the grouping of scores within an assessment aligns with the theoretical grouping of what the item content measures.
- Generalizability is concerned with the interpretation of scores and how well they generalize across different samples and different time points.
- External validity includes the sub-areas of convergent validity (i.e., how well sets of scores that should be correlated are in fact correlated), discriminant validity (i.e., how well sets of scores that should not be correlated are in fact uncorrelated), and predictive validity (i.e., how well a set of scores at one time point predicts scores at another time point).
- Consequential validity describes the implications of correct decisions and decision errors made on the basis of screener scores.
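The three sub-areas of external validity listed above all come down to comparing correlations, which the following minimal Python sketch illustrates. The helper `pearson` and every score in it are hypothetical, invented for illustration, and the comments do not propose any cutoff for what counts as "high" or "low".

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numeric scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six students (all values invented).
screener = [10, 12, 15, 18, 20, 25]            # the screener being evaluated
established = [11, 13, 14, 19, 21, 24]         # measure of the same construct
unrelated = [7, 3, 8, 2, 9, 4]                 # measure of a different construct
later_outcome = [30, 33, 39, 41, 47, 55]       # outcome at a later time point

r_convergent = pearson(screener, established)    # expected to be high
r_discriminant = pearson(screener, unrelated)    # expected to be near zero
r_predictive = pearson(screener, later_outcome)  # expected to be high

print(f"convergent:   {r_convergent:.2f}")
print(f"discriminant: {r_discriminant:.2f}")
print(f"predictive:   {r_predictive:.2f}")
```

In practice each correlation would be judged alongside the other forms of validity evidence rather than on its own.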
Petscher, Y., Pentimonti, J., & Stanley, C. (2019). Validity. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education, Office of Special Education Programs, National Center on Improving Literacy. Retrieved from improvingliteracy.org.
The research reported here is funded by awards to the National Center on Improving Literacy from the Office of Elementary and Secondary Education, in partnership with the Office of Special Education Programs (Award #: S283D160003). The opinions expressed are those of the authors and do not represent views of OESE, OSEP, or the U.S. Department of Education. Copyright © 2021 National Center on Improving Literacy. https://improvingliteracy.org