-
-
Autor*innen: Buchholz, Janine; Hartig, Johannes
Titel: The impact of ignoring the partially compensatory relation between ability dimensions on norm-referenced test scores
In: Psychological Test and Assessment Modeling, 60 (2018) 3, S. 369-385
URL: https://www.psychologie-aktuell.com/fileadmin/Redaktion/Journale/ptam_3-2018_369-385.pdf
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Beitrag in Sonderheft
Sprache: Englisch
Schlagwörter: Schülerleistung; Leistungsmessung; Test; Interpretation; Item-Response-Theory; Modell; Methode; Validität; Mathematische Kompetenz; Sprachfertigkeit; Simulation; Empirische Untersuchung
Abstract (english): The IRT models most commonly employed to estimate within-item multidimensionality are compensatory and suggest that some dimensions (e.g., traits or abilities) can make up for a lack in others. However, many assessment frameworks in educational large-scale assessments suggest partially compensatory relations among dimensions. In two Monte Carlo simulation studies we varied the loading pattern, the latent correlation between dimensions, and the ability distribution to evaluate the impact on test scores when a compensatory model is incorrectly applied to partially compensatory data. Findings imply only negligible effects when true abilities are bivariate normal. Assuming a uniform distribution, however, analyses of differences in test scores demonstrated systematic effects for specific patterns of true ability: High abilities are largely underestimated when the other ability required to solve some of the items is low. These findings highlight the necessity of applying the partially compensatory model under data conditions likely to occur in educational large-scale assessments. (DIPF/Orig.)
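To make the modeling contrast concrete: in a compensatory two-dimensional model, a deficit on one dimension can be offset by a surplus on the other, whereas the partially compensatory form keeps the success probability low whenever either dimension-specific term is low. A sketch in generic notation (an editorial illustration using the Sympson-type product structure common in this literature, not equations reproduced from the article):

```latex
% Compensatory two-dimensional 2PL: the weighted abilities are added,
% so a high \theta_{i2} can compensate for a low \theta_{i1}.
P(X_{ij}=1 \mid \boldsymbol{\theta}_i)
  = \frac{\exp(a_{1j}\theta_{i1} + a_{2j}\theta_{i2} - b_j)}
         {1 + \exp(a_{1j}\theta_{i1} + a_{2j}\theta_{i2} - b_j)}

% Partially compensatory form (after Sympson, 1978): the product of
% dimension-wise probabilities stays small if either factor is small.
P(X_{ij}=1 \mid \boldsymbol{\theta}_i)
  = \prod_{k=1}^{2}
    \frac{\exp\big(a_{kj}(\theta_{ik} - b_{kj})\big)}
         {1 + \exp\big(a_{kj}(\theta_{ik} - b_{kj})\big)}
```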
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Frey, Andreas; Spoden, Christian; Goldhammer, Frank; Wenzel, S. Franziska C.
Titel: Response time-based treatment of omitted responses in computer-based testing
In: Behaviormetrika, 45 (2018) 2, S. 505-526
DOI: 10.1007/s41237-018-0073-9
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Beitrag in Sonderheft
Sprache: Englisch
Schlagwörter: Methode; Technologiebasiertes Testen; Antwort; Dauer; Verhalten; Item-Response-Theory; Fehlende Daten; Datenanalyse; Testaufgabe; Typologie; Medienkompetenz; Schülerleistungstest; Testauswertung
Abstract: A new response time-based method for coding omitted item responses in computer-based testing is introduced and illustrated with empirical data. The new method is derived from the theory of missing data problems of Rubin and colleagues and embedded in an item response theory framework. Its basic idea is to use item response times to statistically test, for each individual item, whether omitted responses are missing completely at random (MCAR) or missing due to a lack of ability and thus not at random (MNAR), with fixed Type I and Type II error levels. If the MCAR hypothesis is maintained, omitted responses are coded as not administered (NA), and as incorrect (0) otherwise. The empirical illustration draws on the responses given by N = 766 students to 70 items of a computer-based ICT skills test. The new method is compared with the two common deterministic methods of scoring omitted responses as 0 or as NA. Response time thresholds between 18 and 58 s were identified. More omitted responses were recoded as 0 (61%) than as NA (39%). Differences in item difficulty were larger between the new method and deterministic scoring of omitted responses as NA than between the new method and deterministic scoring of omitted responses as 0. The variances and reliabilities obtained under the three methods showed only small differences. The paper concludes with a discussion of the practical relevance of the observed effect sizes and with recommendations for applying the new method in the early stages of data processing. (DIPF/Orig.)
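The resulting coding rule can be sketched in a few lines. This is a simplified editorial illustration of the general idea (an item-specific response-time threshold deciding between NA and 0); the authors' actual method is a statistical test with controlled Type I and Type II error levels, and all names below are hypothetical:

```python
import numpy as np

def recode_omissions(responses, times, thresholds):
    """Recode omitted responses (np.nan) item by item.

    responses  : (n_persons, n_items) array with entries 1, 0, or np.nan
    times      : (n_persons, n_items) response times in seconds
    thresholds : (n_items,) item-specific response-time thresholds

    An omission with a response time at or above the item's threshold is
    read as a serious but failed attempt and scored 0 (MNAR); a faster
    omission is treated as skipping and left as not administered (NA).
    """
    scored = responses.copy()
    omitted = np.isnan(responses)
    slow = times >= thresholds        # broadcasts thresholds over persons
    scored[omitted & slow] = 0.0      # MNAR -> incorrect
    return scored                     # omitted & fast stays np.nan (NA)
```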
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Köhler, Carmen
Titel: Isn't something missing? Latent variable models accounting for item nonresponse
Erscheinungsvermerk: Berlin: Freie Universität, 2017
URN: urn:nbn:de:kobv:188-fudissthesis000000103203-8
URL: http://www.diss.fu-berlin.de/diss/receive/FUDISS_thesis_000000103203
Dokumenttyp: 1. Monographien (Autorenschaft); Monographie
Sprache: Englisch
Schlagwörter: Empirische Forschung; Evaluation; Fehlende Daten; Item-Response-Theory; Kompetenz; Leistungsmessung; Modell; Schülerleistung; Schülerleistungstest; Statistische Methode; Testauswertung
Abstract: Item nonresponse in competence tests poses a threat to valid and reliable competence measurement, especially if the missing values occur systematically and relate to the unobserved response. This is often the case in the context of large-scale assessments, where the failure to respond to an item relates to examinee ability. Researchers have developed methods that consider the dependency between ability and item nonresponse by incorporating a model for the process that causes missing values into the measurement model for ability. These model-based approaches seem very promising and might prove superior to common missing data approaches, which typically fail to take the dependency between ability and nonresponse into account. Up to this point, the approaches have barely been investigated in terms of their applicability and performance with regard to the scaling of competence tests in large-scale assessments. The current dissertation bridges the gap between these theoretically postulated models and their possible implementation in the context of large-scale assessments. It aims at (1) testing the applicability of model-based approaches to competence test data, and (2) evaluating whether and under what missing data conditions these approaches are superior to common missing data approaches. Three research studies were conducted for this purpose. Study 1 investigated the assumptions of model-based approaches, whether they hold in empirical practice, and how violations of those assumptions affect individual person parameters. Study 2 focused on features of examinees' nonresponse behavior, such as its stability across different competence tests and how it relates to other examinee characteristics. Study 3 examined the performance of model-based approaches compared to other approaches.
Results demonstrate that model-based approaches can be applied to large-scale assessment data, though slight extensions of the models might enhance accuracy in parameter estimates. Further, persons' tendencies not to respond can be considered person-specific attributes, which are relatively constant across different competence tests and also relate to other stable person characteristics. Findings from the third study confirmed the superiority of the model-based approaches compared to common missing data approaches, although a model that simply ignores missing values also led to acceptable results.
Model-based approaches show several advantages over common missing data approaches. Considering their complexity, however, the benefits and drawbacks of the different methods need to be weighed. Important issues in the debate on an appropriate scaling method concern model complexity, consequences for examinees' test-taking behavior, and the precision of parameter estimates. For many large-scale assessments, a change in the missing data treatment is clearly necessary. Whether model-based approaches will replace former methods is yet to be determined. They certainly count amongst the most advanced methods for handling missing values in the scaling of competence tests. (DIPF/Orig.)
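The class of model-based approaches at issue typically couples the measurement model for ability with a second IRT model for the response indicators, in the style of Holman and Glas (2005). A generic sketch with illustrative notation (not the dissertation's exact specification):

```latex
% Measurement model for ability \theta (here: Rasch):
P(X_{ij}=1 \mid \theta_i) = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}

% Model for the response indicators D_{ij} (1 = item j answered),
% driven by a latent response propensity \xi:
P(D_{ij}=1 \mid \xi_i) = \frac{\exp(\xi_i - \gamma_j)}{1 + \exp(\xi_i - \gamma_j)}

% The dependency between nonresponse and ability is captured by the
% correlation \rho of the two latent variables:
(\theta_i, \xi_i) \sim N_2\!\left(\mathbf{0},
  \begin{pmatrix} \sigma_\theta^2 & \rho\,\sigma_\theta\sigma_\xi \\
                  \rho\,\sigma_\theta\sigma_\xi & \sigma_\xi^2 \end{pmatrix}\right)
```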
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Goldhammer, Frank; Martens, Thomas; Lüdtke, Oliver
Titel: Conditioning factors of test-taking engagement in PIAAC. An exploratory IRT modelling approach considering person and item characteristics
In: Large-scale Assessments in Education, 5 (2017), S. 18
DOI: 10.1186/s40536-017-0051-9
URL: https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-017-0051-9
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Beitrag in Sonderheft
Sprache: Englisch
Schlagwörter: Antwort; Einflussfaktor; Erwachsener; Item-Response-Theory; Kanada; Längsschnittuntersuchung; Leistungstest; Lesekompetenz; Mathematische Kompetenz; Messung; Motivation; PIAAC (Programme for the International Assessment of Adult Competencies); Problemlösen; Selbstkonzept; Technologiebasiertes Testen; Verhalten
Abstract: Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods: Analyses were based on the Canadian sample of Round 1 of the PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results: The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant's native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item's location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions: The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement). (DIPF/Orig.)
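The explanatory item response models named here can be written, in generic notation, as a doubly explanatory generalized linear mixed model in the sense of De Boeck and Wilson; this is a sketch of the general framework, not the article's exact specification:

```latex
% Binary disengagement indicator D_{ij} (1 = person i disengaged on item j):
\mathrm{logit}\, P(D_{ij}=1)
  = \underbrace{\mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i}_{\text{person side}}
  + \underbrace{\mathbf{z}_j^{\top}\boldsymbol{\delta} + v_j}_{\text{item side}},
\qquad u_i \sim N(0, \sigma_u^2),\; v_j \sim N(0, \sigma_v^2)

% x_i: person covariates (education, cognitive skill, test language,
%      gender, age); z_j: item covariates (module position, difficulty);
% a cross-level product term adds the skill-by-difficulty interaction.
```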
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: He, Jia; Buchholz, Janine; Klieme, Eckhard
Titel: Effects of anchoring vignettes on comparability and predictive validity of student self-reports in 64 cultures
In: Journal of Cross-Cultural Psychology, 48 (2017) 3, S. 319-334
DOI: 10.1177/0022022116687395
URN: urn:nbn:de:0111-dipfdocs-156073
URL: http://www.dipfdocs.de/volltexte/2018/15607/pdf/0022022116687395_A.pdf
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Bewertung; Internationaler Vergleich; Item-Response-Theory; Klassenführung; Kultureinfluss; Mathematikunterricht; Messverfahren; Modell; Motivation; OECD-Länder; PISA <Programme for International Student Assessment>; Qualität; Schüler; Schülerleistungstest; Schülerorientierter Unterricht; Selbsteinschätzung; Unterricht; Validität; Vignette <Methode>
Abstract (english): Anchoring vignettes are item batteries specifically designed to correct responses that might be affected by incomparability. This article investigates the effects of anchoring vignettes on the validity of student self-report data in 64 cultures. Using secondary data analysis of the 2012 Programme for International Student Assessment (PISA), we checked the validity of ratings on vignette questions and investigated how rescaled item responses on two student scales, Teacher Support and Classroom Management, enhanced comparability and predictive validity. The main findings are that (a) responses to vignette questions represent valid individual and cultural differences; in particular, violations in these responses (i.e., misorderings) are related to low socioeconomic status and low cognitive sophistication; (b) the rescaled responses tend to show higher levels of comparability; and (c) the associations of rescaled Teacher Support and Classroom Management with math achievement, Student-Oriented Instruction, and Teacher-Directed Instruction differ slightly from those of the raw scores of the two target constructs, and the associations with rescaled scores seem to be more in line with the literature. Namely, the associations among all self-report Likert-type scales are weaker with rescaled scores, presumably reducing common method variance, and both rescaled scale scores are more positively related to math achievement. The country ranking also changes substantially; in particular, Asian cultures top the ranking on Teacher Support after rescaling. However, anchoring vignettes are not a cure-all for measurement bias in cross-cultural surveys; we discuss technical issues and directions for further research on this technique. (DIPF/Orig.)
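For illustration, the simplest vignette-based correction is the nonparametric recode of King et al. (2004), which places each self-rating on a scale anchored by the respondent's own vignette ratings. A minimal sketch assuming a clean vignette ordering (the article additionally examines misordered responses and model-based rescaling):

```python
def rescale_with_vignettes(self_rating, vignette_ratings):
    """Nonparametric anchoring-vignette recode (after King et al., 2004).

    With k vignettes the recoded score runs from 1 (below the lowest
    vignette) to 2k + 1 (above the highest); odd values fall between
    vignettes and even values tie a vignette exactly.
    """
    score = 1
    for v in sorted(vignette_ratings):
        if self_rating > v:
            score += 2              # clears this vignette entirely
        elif self_rating == v:
            score += 1              # ties this vignette
            break
        else:
            break                   # falls below this vignette
    return score

# Example: vignettes rated (2, 3, 5), self-rating 4 -> score 5
```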
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Köhler, Carmen; Hartig, Johannes
Titel: Practical significance of item misfit in educational assessments
In: Applied Psychological Measurement, 41 (2017) 5, S. 388-400
DOI: 10.1177/0146621617692978
URN: urn:nbn:de:0111-pedocs-156084
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156084
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Item-Response-Theory; Korrelation; Leistungsmessung; Rasch-Modell; Schülerleistung; Schülerleistungstest; Testkonstruktion; Testtheorie; Validität
Abstract: Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. Numerous item fit statistics exist in the literature, sometimes leading to contradictory conclusions regarding which items should be excluded from a test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify the potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation indicates whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test. (DIPF/Orig.)
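One transparent way to gauge such practical significance is to contrast a relationship estimate under two scalings of the same test. This sketch is an editorial illustration of that general logic, not the article's specific quantification method:

```python
import numpy as np

def misfit_impact_on_correlation(scores_full, scores_reduced, criterion):
    """Difference between two correlation estimates with a criterion.

    scores_full    : person scores from the full test
    scores_reduced : person scores after removing the flagged items
    criterion      : external variable used in the substantive analysis

    A near-zero difference suggests the misfit is of little practical
    consequence for this particular relationship estimate.
    """
    r_full = np.corrcoef(scores_full, criterion)[0, 1]
    r_reduced = np.corrcoef(scores_reduced, criterion)[0, 1]
    return r_full - r_reduced
```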
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Titel: Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
In: International Journal of Testing, 17 (2017) 4, S. 277-301
DOI: 10.1080/15305058.2017.1291517
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Arithmetik; Diagnostik; Empirische Untersuchung; Item-Response-Theory; Leistungsmessung; Mathematische Kompetenz; Modell; Regressionsanalyse; Reliabilität; Schülerleistung; Schülerleistungstest; Schuljahr 04; Testkonstruktion; Validität
Abstract (english): Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2,032 fourth-grade students in Germany, we evaluated whether the multidimensional proficiency scores from the best-fitting DCM have an added value, over and above the unidimensional proficiency score from a simpler unidimensional IRT model, in explaining variance in two external criteria: (a) school grades in mathematics and (b) unidimensional proficiency scores from a standards-based large-scale assessment of mathematics. Results revealed high classification reliabilities as well as interpretable parameter estimates for items and students for the best-fitting DCM. However, while the DCM scores were moderately correlated with both external criteria, only a negligible incremental validity of the multivariate attribute scores was found. (DIPF/Orig.)
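The incremental-validity question reduces to a hierarchical regression: enter the unidimensional IRT score first, add the DCM attribute profile, and inspect the gain in explained variance. A minimal sketch under that assumption (names and setup are illustrative):

```python
import numpy as np

def incremental_r2(criterion, irt_score, attribute_scores):
    """Delta R^2 of DCM attribute scores over a unidimensional score.

    criterion        : (n,) external criterion (e.g., math grade)
    irt_score        : (n,) unidimensional proficiency estimate
    attribute_scores : (n, k) multivariate DCM attribute profile
    """
    n = len(criterion)
    base = np.column_stack([np.ones(n), irt_score])
    full = np.column_stack([base, attribute_scores])

    def r_squared(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1.0 - resid.var() / y.var()

    return r_squared(full, criterion) - r_squared(base, criterion)
```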
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Nagy, Gabriel; Retelsdorf, Jan; Goldhammer, Frank; Schiepe-Tiska, Anja; Lüdtke, Oliver
Titel: Veränderungen der Lesekompetenz von der 9. zur 10. Klasse. Differenzielle Entwicklungen in Abhängigkeit der Schulform, des Geschlechts und des soziodemografischen Hintergrunds?
In: Zeitschrift für Erziehungswissenschaft. Sonderheft, 33 (2017), S. 177-203
DOI: 10.1007/s11618-017-0747-1
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Beitrag in Sonderheft
Sprache: Deutsch
Schlagwörter: Lesekompetenz; Schülerleistung; Entwicklung; Veränderung; Schuljahr 09; Schuljahr 10; Sekundarstufe I; Schulform; Geschlecht; Migrationshintergrund; Sozioökonomische Lage; Schülerleistungstest; Datenanalyse; Item-Response-Theory; PISA <Programme for International Student Assessment>; Längsschnittuntersuchung; Deutschland
Abstract: This article examines the development of reading competence in the final phase of lower secondary education (Grades 9 to 10). In addition to the change in test performance in the overall population, the associations of selected institutional (school type), family-related (immigration background and family socioeconomic status), and individual characteristics (gender) with performance development were assessed. Consistent with recent studies showing a flattening development of reading performance in later phases of schooling, we found no performance gain in the overall sample. Likewise, there was no robust evidence that the explanatory variables considered are associated with competence development. However, the analyses provided clear indications that students' persistence in working on the test, captured by means of position effects, changed systematically as a function of the background variables, with stronger declines in test-taking persistence at non-academic-track schools, among boys, and among students from socioeconomically disadvantaged families. Ignoring test-taking behavior led to spurious effects of the explanatory variables school type and gender which, on closer examination, did not reflect real competence growth. (DIPF/Orig.)
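Position effects of the kind used here to capture test-taking persistence are commonly modeled with a person-specific slope on item position (e.g., Debeer & Janssen, 2013); the article's exact specification may differ. A generic sketch:

```latex
% Person p, item j administered at booklet position pos(j):
\mathrm{logit}\, P(X_{pj}=1) = \theta_p + \gamma_p \cdot \mathrm{pos}(j) - \beta_j,
\qquad (\theta_p, \gamma_p) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})

% \gamma_p < 0 indicates declining persistence over the test session;
% regressing \gamma_p on school type, gender, and socioeconomic status
% yields the differential persistence effects summarized above.
```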
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Titel: Absolute and relative measures of instructional sensitivity
In: Journal of Educational and Behavioral Statistics, 42 (2017) 6, S. 678-705
DOI: 10.3102/1076998617703649
URN: urn:nbn:de:0111-pedocs-156029
URL: http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Bewertung; DESI <Deutsch-Englisch-Schülerleistungen-International>; Deutschland; Englischunterricht; Item-Response-Theory; Leistungsmessung; Messverfahren; Schüler; Schülerleistung; Schuljahr 09; Sprachkompetenz; Test; Testkonstruktion; Testtheorie; Unterricht; Wirkung
Abstract: Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity to capture effects of instruction, independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity to capture effects of instruction relative to the test's sensitivity. We then propose a longitudinal multilevel item response theory model that allows both types of measures to be estimated, depending on the identification constraints. (DIPF/Orig.)
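In generic notation, such a model can be sketched as a pre/post multilevel IRT parameterization in which the change in item difficulty is split into an overall shift and a between-class deviation (an editorial sketch; the article's identification constraints are not reproduced here):

```latex
% Person p in class c, item j, occasion t = 0 (pretest), 1 (posttest):
\mathrm{logit}\, P(X_{pcjt}=1) = \theta_{pct} - \beta_j + t\,(\delta_j + u_{cj}),
\qquad u_{cj} \sim N(0, \tau_j^2)

% \delta_j : item-specific difficulty shift across all classes
%            -> basis for an absolute sensitivity measure
% \tau_j^2 : between-class variation of that shift
%            -> basis for a relative sensitivity measure
```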
DIPF-Abteilung: Bildungsqualität und Evaluation
-
-
Autor*innen: Frey, Andreas; Kröhne, Ulf; Seitz, Nicki-Nils; Born, Sebastian
Titel: Multidimensional adaptive measurement of competences
Aus: Leutner, Detlev; Fleischer, Jens; Grünkorn, Juliane; Klieme, Eckhard (Hrsg.): Competence assessment in education: Research, models and instruments, Cham: Springer, 2017 (Methodology of Educational Measurement and Assessment), S. 369-387
DOI: 10.1007/978-3-319-50030-0_22
Dokumenttyp: 4. Beiträge in Sammelwerken; Sammelband (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Adaptives Testen; Computerunterstütztes Verfahren; Item-Response-Theory; Test; Software; Simulation
Abstract: Even though multidimensional adaptive testing (MAT) is advantageous in the measurement of complex competencies, operational applications are still rare. In an attempt to change this situation, this chapter presents four recent developments that foster the applicability of MAT. First, in a simulation study, we show that multiple constraints can be accounted for in MAT without a loss of measurement precision, by using the multidimensional maximum priority index method. Second, the results from another simulation study show that the high efficiency of MAT is mainly due to the fact that MAT considers prior information in the final ability estimation, and not to the fact that MAT uses prior information for item selection. Third, the multidimensional adaptive testing environment is presented. This software can be used to assemble, configure, and apply multidimensional adaptive tests. Last, the application of the software is illustrated for unidimensional and multidimensional adaptive tests. The application of MAT is especially recommended for large-scale assessments of student achievement. (DIPF/Orig.)
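The maximum priority index mentioned first can be sketched compactly: each candidate item's information is down-weighted by the fill level of every constraint it touches, and the item with the largest index is administered next. A simplified unidimensional sketch (the multidimensional variant adds dimension-wise bookkeeping; all names are illustrative):

```python
import numpy as np

def priority_index(info, memberships, remaining, totals):
    """Priority index for constrained adaptive item selection.

    info        : (n_items,) current Fisher information per item
    memberships : (n_items, n_constraints) 0/1 constraint membership
    remaining   : (n_constraints,) quota still open per constraint
    totals      : (n_constraints,) total quota per constraint

    A constraint that is nearly exhausted (remaining/totals -> 0)
    drives the index of its member items toward zero.
    """
    fill = remaining / totals        # (n_constraints,)
    weights = fill ** memberships    # factor 1 where item is unaffected
    return info * weights.prod(axis=1)

# next_item = int(np.argmax(priority_index(info, M, remaining, totals)))
```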
DIPF-Abteilung: Bildungsqualität und Evaluation