This study investigates how incorporating teacher ratings in the gifted identification (ID) process affects the diversity and academic profile of the identified population. Given the cost of teacher rating scales, in both materials and time, it is important to evaluate how they affect the ID process. Do teacher ratings improve the accuracy and validity of the ID process, are they an unnecessary expense, or, worst of all, do they lead to decreased sensitivity and greater inequity? Our study examines the impact of teacher ratings on gifted ID rates for different racial/ethnic and socioeconomic groups, as well as the degree to which teacher ratings correlate with measures of student ability and achievement. We also examine the impact of various identification policies and practices (not only teacher ratings, but also achievement scores, cognitive ability scores, and the ways in which these measures are combined) on the representation of historically underserved students. We do this by examining data from participating school districts.
We find that between 10% and 25% of the variance in a student's teacher rating scale (TRS) score can be attributed to the teacher doing the rating, and the between-teacher standard deviation corresponds to an effect size of one-third to one-half of a standard deviation unit. Our results suggest that teacher ratings of students are not easily comparable across teachers, making it impossible to set a cut score for admission into a program (or for further screening) that functions equitably across teachers. Teacher ratings also diverge from measured ability. For example, fewer than one-third of students who scored in the top 10% on the cognitive ability measure also scored in the top 10% on the TRS. Even with a lenient TRS cut score, almost 30% of students who are in the top 10% on ability do not score in the top 30% on the TRS. (In some datasets, almost half of the students who score in the top 10% on ability are not in the top 30% on the TRS.) Therefore, teacher rating scales should never be used as the sole universal screening instrument to determine which students move forward to a second-stage gifted identification process. On a positive note, race/ethnicity, free or reduced-price lunch status, and English learner status do not consistently predict a teacher's rating of a student after controlling for ability.
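To make the teacher-attributable share concrete, the following is a minimal sketch of one common way to estimate it: a one-way analysis-of-variance estimate of the intraclass correlation on a balanced design. It is not the study's estimation method, which is not described here. Reading the 10% to 25% figure as a share of variance is an assumption, though it is consistent with the reported effect sizes, since the square roots of 0.10 and 0.25 are roughly one-third and one-half. The function name `teacher_variance_share` and all simulated values are hypothetical.

```python
import random
from statistics import mean


def teacher_variance_share(ratings_by_teacher):
    """One-way ANOVA estimate of the share of rating variance between teachers.

    Assumes a balanced design: every teacher rates the same number of students.
    """
    groups = list(ratings_by_teacher.values())
    k = len(groups)        # number of teachers
    n = len(groups[0])     # students rated per teacher
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch requires a balanced design")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Mean square between teachers and mean square within teachers.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # Intraclass correlation: estimated between-teacher share of total variance.
    return (msb - msw) / (msb + (n - 1) * msw)


# Simulated (not real) ratings: 200 teachers, 20 students each, with a true
# teacher share of 0.20 on a scale whose total variance is 1.
rng = random.Random(42)
ratings = {}
for teacher in range(200):
    teacher_effect = rng.gauss(0, 0.2 ** 0.5)
    ratings[teacher] = [teacher_effect + rng.gauss(0, 0.8 ** 0.5) for _ in range(20)]

share = teacher_variance_share(ratings)
# Between-teacher standard deviation, in units of the total standard deviation.
between_teacher_sd = max(share, 0.0) ** 0.5
print(f"teacher share of variance: {share:.2f}")
print(f"between-teacher SD (total-SD units): {between_teacher_sd:.2f}")
```

Under these assumptions, a between-teacher standard deviation of this size means the same student could land on opposite sides of a fixed cut score depending only on who does the rating.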
With respect to using combinations of teacher rating scales, achievement test scores, and cognitive ability scores, we modeled 64 hypothetical identification systems to examine how the selected measures, norms, cut-off levels, and combination rules influenced the size, equity/parity, and academic profiles of the group of students identified for gifted services. OR rules (under which a student qualifies by meeting the cut-off on any one measure) and teacher rating scales generate the largest increases in identification rates for students from traditionally underrepresented groups, but they also result in an identified gifted population with a lower mean and higher variability in ability and achievement. In addition, different rules that yield similar average characteristics can identify very different sets of students. However, no set of measures, norms, cut-off levels, or combination rules results in complete parity in gifted identification for students from traditionally underserved racial/ethnic groups or low-income families. We offer a Shiny app for comparing the characteristics of students selected for gifted programs when these hypothetical gifted identification systems are applied to real student data from seven districts. The app produces a forest plot of the hypothetical results for a selected outcome variable across the different identification systems.
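The difference between combination rules can be illustrated with a minimal sketch on simulated data. It does not reproduce any of the 64 modeled systems or the Shiny app. The measure names, the correlations among measures, the top-10% cut-offs, and the helper functions `percentile_cutoff` and `identify` are all hypothetical choices made for illustration.

```python
import random


def percentile_cutoff(students, measure, top_fraction):
    """Lowest score that still falls in the top `top_fraction` on `measure`."""
    scores = sorted((s[measure] for s in students), reverse=True)
    k = max(1, int(len(scores) * top_fraction))
    return scores[k - 1]


def identify(students, cutoffs, rule):
    """Select students meeting the cut-offs on any ("OR") or all ("AND") measures."""
    if rule not in ("OR", "AND"):
        raise ValueError("rule must be 'OR' or 'AND'")
    combine = any if rule == "OR" else all
    return [
        s for s in students
        if combine(s[m] >= cut for m, cut in cutoffs.items())
    ]


# Simulated (not real) students with three positively correlated measures.
rng = random.Random(0)
students = []
for student_id in range(1000):
    ability = rng.gauss(0, 1)
    students.append({
        "id": student_id,
        "ability": ability,
        "achievement": 0.7 * ability + rng.gauss(0, 0.7),
        "trs": 0.5 * ability + rng.gauss(0, 0.85),
    })

measures = ("ability", "achievement", "trs")
cutoffs = {m: percentile_cutoff(students, m, 0.10) for m in measures}

or_group = identify(students, cutoffs, "OR")
and_group = identify(students, cutoffs, "AND")

for label, group in (("OR", or_group), ("AND", and_group)):
    if group:
        mean_ability = sum(s["ability"] for s in group) / len(group)
        print(f"{label}: n={len(group)}, mean ability={mean_ability:.2f}")
    else:
        print(f"{label}: n=0")
```

By construction, every student selected under the AND rule is also selected under the OR rule, so the OR rule always yields a group at least as large. How the two groups differ in mean ability and in composition depends on the data and the cut-offs chosen.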