Search results in the DIPF database of publications
Your query:
(Keywords: "Item-Response-Theory")
64 items matching your search terms.
Author(s):
Goldhammer, Frank; Martens, Thomas; Lüdtke, Oliver
Title:
Conditioning factors of test-taking engagement in PIAAC. An exploratory IRT modelling approach considering person and item characteristics
In:
Large-scale Assessments in Education, 5 (2017), p. 18
DOI:
10.1186/s40536-017-0051-9
URL:
https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-017-0051-9
Publication Type:
3a. Articles in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Response; Influencing factor; Adult; Item-Response-Theory; Canada; Longitudinal study; Achievement test; Reading literacy; Mathematical competence; Measurement; Motivation; PIAAC (Programme for the International Assessment of Adult Competencies); Problem solving; Self-concept; Technology-based testing; Behavior
Abstract:
Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used to investigate whether item indicators of disengagement constitute a continuous latent person variable by domain. Second, the effects of person and item characteristics were jointly tested using explanatory item response models. Methods: Analyses were based on the Canadian sample of Round 1 of the PIAAC, with N = 26,683 participants completing test items in the domains of literacy, numeracy, and problem solving. Binary item disengagement indicators were created by means of item response time thresholds. Results: The results showed that disengagement indicators define a latent dimension by domain. Disengagement increased with lower educational attainment, lower cognitive skills, and when the test language was not the participant's native language. Gender did not exert any effect on disengagement, while age had a positive effect for problem solving only. An item's location in the second of two assessment modules was positively related to disengagement, as was item difficulty. The latter effect was negatively moderated by cognitive skill, suggesting that poor test-takers are especially likely to disengage with more difficult items. Conclusions: The negative effect of cognitive skill, the positive effect of item difficulty, and their negative interaction effect support the assumption that disengagement is the outcome of individual expectations about success (informed disengagement). (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
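The abstract above describes deriving binary disengagement indicators from item response times by means of thresholds. A minimal sketch of that indicator step (the threshold value and the response-time data below are invented for illustration; the study's actual per-item thresholds are not reproduced here):

```python
import numpy as np

# Invented response-time matrix (seconds), persons x items.
rng = np.random.default_rng(42)
response_times = rng.lognormal(mean=3.0, sigma=0.8, size=(6, 4))

RAPID_GUESS_THRESHOLD = 5.0  # assumed threshold in seconds (illustrative only)

# Binary disengagement indicator: 1 = response faster than the threshold.
disengaged = (response_times < RAPID_GUESS_THRESHOLD).astype(int)

# Per-item disengagement rates, the raw material for the IRT modelling step.
item_rates = disengaged.mean(axis=0)
```

In the study, such binary indicators then enter explanatory item response models as the observed variables.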
Author(s):
He, Jia; Buchholz, Janine; Klieme, Eckhard
Title:
Effects of anchoring vignettes on comparability and predictive validity of student self-reports in 64 cultures
In:
Journal of Cross-Cultural Psychology, 48 (2017) 3, pp. 319-334
DOI:
10.1177/0022022116687395
URN:
urn:nbn:de:0111-dipfdocs-156073
URL:
http://www.dipfdocs.de/volltexte/2018/15607/pdf/0022022116687395_A.pdf
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Assessment; International comparison; Item-Response-Theory; Classroom management; Cultural influence; Mathematics instruction; Measurement method; Model; Motivation; OECD countries; PISA <Programme for International Student Assessment>; Quality; Student; Student achievement test; Student-oriented instruction; Self-assessment; Instruction; Validity; Vignette <method>
Abstract (english):
Anchoring vignettes are item batteries especially designed for correcting responses that might be affected by incomparability. This article investigates the effects of anchoring vignettes on the validity of student self-report data in 64 cultures. Using secondary data analysis from the 2012 Programme for International Student Assessment (PISA), we checked the validity of ratings on vignette questions, and investigated how rescaled item responses of two student scales, Teacher Support and Classroom Management, enhanced comparability and predictive validity. The main findings include that (a) responses to vignette questions represent valid individual and cultural differences; in particular, violations in these responses (i.e., misorderings) are related to low socioeconomic status and low cognitive sophistication; (b) the rescaled responses tend to show higher levels of comparability; and (c) the associations of rescaled Teacher Support and Classroom Management with math achievement, Student-Oriented Instruction, and Teacher-Directed Instruction are slightly different from raw scores of the two target constructs, and the associations with rescaled scores seem to be more in line with the literature. Namely, the associations among all self-report Likert-type scales are weaker with rescaled scores, presumably reducing common method variance, and both rescaled scale scores are more positively related to math achievement. The country ranking also changes substantially; in particular, Asian cultures top the ranking on Teacher Support after rescaling. However, anchoring vignettes are not a cure-all in solving measurement bias in cross-cultural surveys; we discuss technical issues and directions for further research on this technique. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
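As background to the rescaling idea in the abstract above, here is a sketch of the classic nonparametric anchoring-vignette recode (in the style of King et al.), in which a respondent's self-rating is placed relative to their own ordered vignette ratings. This is a generic illustration of the technique, not the specific rescaling procedure used in the article:

```python
def vignette_recode(self_rating, vignette_ratings):
    """Nonparametric anchoring-vignette recode: locate a self-rating
    relative to the respondent's ordered vignette ratings. J vignettes
    yield 2*J + 1 recoded categories. Assumes the vignette ratings are
    correctly ordered (misorderings, which the article shows do occur,
    need extra handling not sketched here)."""
    category = 1
    for v in sorted(vignette_ratings):
        if self_rating > v:
            category += 2   # strictly above this vignette
        elif self_rating == v:
            category += 1   # tied with this vignette
            break
        else:
            break           # below this vignette: stop counting
    return category
```

For example, with two vignettes rated 3 and 4, a self-rating below both recodes to category 1 and one above both to category 5, making responses comparable across respondents who use the raw scale differently.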
Author(s):
Köhler, Carmen; Hartig, Johannes
Title:
Practical significance of item misfit in educational assessments
In:
Applied Psychological Measurement, 41 (2017) 5, pp. 388-400
DOI:
10.1177/0146621617692978
URN:
urn:nbn:de:0111-pedocs-156084
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156084
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Correlation; Achievement measurement; Rasch model; Student achievement; Student achievement test; Test construction; Test theory; Validity
Abstract:
Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. In the literature, numerous item fit statistics exist, sometimes resulting in contradictory conclusions regarding which items should be excluded from the test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation indicates whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
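For context on the item fit statistics the abstract refers to, a minimal sketch of one common statistic, the outfit mean-square under the Rasch model (toy abilities and responses invented for illustration; this is standard background, not the bias-quantification method the article introduces):

```python
import numpy as np

def rasch_p(theta, b):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def outfit_msq(x, theta, b):
    """Outfit mean-square for one item: average squared standardized
    residual over persons. Values near 1 indicate fit; rules of thumb
    often flag items outside roughly [0.7, 1.3]."""
    p = rasch_p(theta, b)
    z_sq = (x - p) ** 2 / (p * (1.0 - p))
    return float(z_sq.mean())

theta = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # invented person abilities
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0])        # invented responses to one item
msq = outfit_msq(x, theta, b=0.0)
```

A Guttman-like response pattern such as this one yields an outfit below 1 (overfit); unexpected correct answers from low-ability persons would push it above 1.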
Author(s):
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Title:
Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
In:
International Journal of Testing, 17 (2017) 4, pp. 277-301
DOI:
10.1080/15305058.2017.1291517
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Arithmetic; Diagnostics; Empirical study; Item-Response-Theory; Achievement measurement; Mathematical competence; Model; Regression analysis; Reliability; Student achievement; Student achievement test; Grade 4; Test construction; Validity
Abstract (english):
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically-driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2,032 fourth-grade students in Germany, we evaluated whether the multidimensional proficiency scores from the best-fitting DCM have an added value, over and above the unidimensional proficiency score from a simpler unidimensional IRT model, in explaining variance in external (a) school grades in mathematics and (b) unidimensional proficiency scores from a standards-based large-scale assessment of mathematics. Results revealed high classification reliabilities as well as interpretable parameter estimates for items and students for the best-fitting DCM. However, while DCM scores were moderately correlated with both external criteria, only a negligible incremental validity of the multivariate attribute scores was found. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
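To make the DCM idea in the abstract concrete, here is the item response function of the DINA model, one widely used DCM. The abstract does not say which DCM fit best in the study, so this particular model, its Q-matrix row, and the parameter values are purely illustrative:

```python
from itertools import product

def dina_prob(alpha, q_row, guess, slip):
    """DINA model: the probability of a correct response is 1 - slip if
    the student masters every attribute the item requires (its Q-matrix
    row), and guess otherwise."""
    masters_all = all(a >= needed for a, needed in zip(alpha, q_row))
    return (1.0 - slip) if masters_all else guess

# Toy item requiring attributes 1 and 3 (of 3); guess = .2, slip = .1.
q_row = (1, 0, 1)
probs = {alpha: dina_prob(alpha, q_row, guess=0.2, slip=0.1)
         for alpha in product((0, 1), repeat=3)}
```

The discrete attribute profiles `alpha` are the "multidimensional proficiency scores" the article compares against a single continuous IRT score.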
Author(s):
Nagy, Gabriel; Retelsdorf, Jan; Goldhammer, Frank; Schiepe-Tiska, Anja; Lüdtke, Oliver
Title:
Veränderungen der Lesekompetenz von der 9. zur 10. Klasse. Differenzielle Entwicklungen in Abhängigkeit der Schulform, des Geschlechts und des soziodemografischen Hintergrunds?
In:
Zeitschrift für Erziehungswissenschaft. Sonderheft, 33 (2017), pp. 177-203
DOI:
10.1007/s11618-017-0747-1
Publication Type:
3a. Articles in peer-reviewed journals; contribution in a special issue
Language:
German
Keywords:
Reading literacy; Student achievement; Development; Change; Grade 9; Grade 10; Lower secondary education; School type; Gender; Immigrant background; Socioeconomic status; Student achievement test; Data analysis; Item-Response-Theory; PISA <Programme for International Student Assessment>; Longitudinal study; Germany
Abstract:
This article examined the development of reading competence in the final phase of lower secondary school (grades 9 to 10). In addition to the change in test performance in the overall population, the associations of selected institutional (school type), family-related (immigrant background and family socioeconomic status), and individual characteristics (gender) with achievement development were assessed. Consistent with recent studies showing that reading achievement growth flattens in later phases of schooling, we found no achievement gain in the overall sample. Likewise, there was no robust evidence that the explanatory variables under consideration were associated with competence development. However, the analyses provided clear indications that students' test-taking persistence, measured by means of position effects, changed systematically as a function of the background variables, with stronger declines in persistence found at non-academic-track schools, among boys, and among students from socioeconomically disadvantaged families. Ignoring test-taking behavior led to "spurious effects" of the explanatory variables school type and gender which, on closer inspection, did not reflect real competence growth. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
Author(s):
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Title:
Absolute and relative measures of instructional sensitivity
In:
Journal of Educational and Behavioral Statistics, 42 (2017) 6, pp. 678-705
DOI:
10.3102/1076998617703649
URN:
urn:nbn:de:0111-pedocs-156029
URL:
http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Assessment; DESI <Deutsch-Englisch-Schülerleistungen-International>; Germany; English instruction; Item-Response-Theory; Achievement measurement; Measurement method; Student; Student achievement; Grade 9; Language competence; Test; Test construction; Test theory; Instruction; Effect
Abstract:
Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity of capturing effects of instruction, which is independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity of capturing effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures depending on the identification constraints. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
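For contrast with the model-based measures the abstract proposes, here is the classical pre-post difference index, the simplest traditional measure of an item's instructional sensitivity (toy data invented for illustration; this is not the article's longitudinal multilevel IRT model):

```python
def pre_post_difference_index(pre_correct, post_correct):
    """Classical instructional-sensitivity index for one item:
    proportion answering correctly after instruction minus the
    proportion before. Ranges from -1 to 1; values near 0 suggest
    the item does not capture effects of instruction."""
    p_pre = sum(pre_correct) / len(pre_correct)
    p_post = sum(post_correct) / len(post_correct)
    return p_post - p_pre

# Invented 0/1 responses from four students before and after instruction.
ppdi = pre_post_difference_index([0, 0, 1, 0], [1, 1, 1, 0])
```

Unlike this single number, the article's absolute and relative measures separate an item's total capacity to capture instruction from its sensitivity relative to the test as a whole.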
Author(s):
Frey, Andreas; Kröhne, Ulf; Seitz, Nicki-Nils; Born, Sebastian
Title:
Multidimensional adaptive measurement of competences
In:
Leutner, Detlev; Fleischer, Jens; Grünkorn, Juliane; Klieme, Eckhard (Eds.): Competence assessment in education: Research, models and instruments. Cham: Springer, 2017 (Methodology of educational measurement and assessment), pp. 369-387
DOI:
10.1007/978-3-319-50030-0_22
Publication Type:
4. Contributions in edited volumes; edited volume (no special category)
Language:
English
Keywords:
Adaptive testing; Computer-based method; Item-Response-Theory; Test; Software; Simulation
Abstract:
Even though multidimensional adaptive testing (MAT) is advantageous in the measurement of complex competencies, operational applications are still rare. In an attempt to change this situation, this chapter presents four recent developments that foster the applicability of MAT. First, in a simulation study, we show that multiple constraints can be accounted for in MAT without a loss of measurement precision, by using the multidimensional maximum priority index method. Second, the results from another simulation study show that the high efficiency of MAT is mainly due to the fact that MAT considers prior information in the final ability estimation, and not to the fact that MAT uses prior information for item selection. Third, the multidimensional adaptive testing environment is presented. This software can be used to assemble, configure, and apply multidimensional adaptive tests. Last, the application of the software is illustrated for unidimensional and multidimensional adaptive tests. The application of MAT is especially recommended for large-scale assessments of student achievement. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
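The core adaptive step behind (multidimensional) adaptive testing can be sketched in its unidimensional form: after each response, administer the remaining item with maximum Fisher information at the current ability estimate. This is a deliberate simplification; the chapter's multidimensional machinery (e.g., the multidimensional maximum priority index method) is not reproduced here, and the item pool below is invented:

```python
import math

def rasch_info(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_next_item(theta_hat, pool):
    """Greedy adaptive step: pick the most informative remaining item."""
    return max(pool, key=lambda item: rasch_info(theta_hat, item["b"]))

# Invented item pool with difficulty parameters b.
pool = [{"id": 1, "b": -1.5}, {"id": 2, "b": 0.1}, {"id": 3, "b": 2.0}]
best = select_next_item(0.0, pool)
```

Under the Rasch model, information peaks where difficulty matches ability, so the item with b closest to the current estimate is chosen; MAT generalizes this to information matrices over several dimensions plus content constraints.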
Author(s):
Hartig, Johannes; Harsch, Claudia
Title:
Multidimensional structures of competencies. Focusing on text comprehension in English as a foreign language
In:
Leutner, Detlev; Fleischer, Jens; Grünkorn, Juliane; Klieme, Eckhard (Eds.): Competence assessment in education: Research, models and instruments. Cham: Springer, 2017 (Methodology of educational measurement and assessment), pp. 357-368
DOI:
10.1007/978-3-319-50030-0_21
Publication Type:
4. Contributions in edited volumes; edited volume (no special category)
Language:
English
Keywords:
Germany; English instruction; English as a second language; Text comprehension; Listening comprehension; Item-Response-Theory; Item analysis; Difficulty; Test; Student; Grade 9; Text; Reception; Rasch model
Abstract:
The project "Modeling competencies with multidimensional item-response-theory models" examined different psychometric models for student performance in English as a foreign language. On the basis of the results of re-analyses of data from completed large scale assessments, a new test of reading and listening comprehension was constructed. The items within this test use the same text material both for reading and for listening tasks, thus allowing a closer examination of the relations between abilities required for the comprehension of both written and spoken texts. Furthermore, item characteristics (e.g., cognitive demands and response format) were systematically varied, allowing us to disentangle the effects of these characteristics on item difficulty and dimensional structure. This chapter presents results on the properties of the newly developed test: Both reading and listening comprehension can be reliably measured (rel = .91 for reading and .86 for listening). Abilities for both sub-domains prove to be highly correlated yet empirically distinguishable, with a latent correlation of .84. Despite the listening items being more difficult, in terms of absolute correct answers, the difficulties of the same items in the reading and listening versions are highly correlated (r = .84). Implications of the results for measuring language competencies in educational contexts are discussed. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
Author(s):
Bürger, Sarah; Kröhne, Ulf; Goldhammer, Frank
Title:
The transition to computer-based testing in large-scale assessments. Investigating (partial) measurement invariance between modes
In:
Psychological Test and Assessment Modeling, 58 (2016) 4, pp. 597-616
URL:
http://www.psychologie-aktuell.com/fileadmin/download/ptam/4-2016_20161219/03_Buerger.pdf
Publication Type:
3a. Articles in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Equivalence; Computer-based method; Experiment; Item-Response-Theory; Measurement; Student achievement test; Technology-based testing; Test method; Effect
Abstract (english):
This paper provides an overview and recommendations on how to conduct a mode effect study in large-scale assessments by addressing criteria of equivalence between paper-based and computer-based tests. These criteria are selected according to the intended use of test scores and test score interpretations. A mode effect study can be implemented using experimental designs. The major benefit of combining experimental design considerations with the IRT methodology of mode effects is the possibility of investigating partial measurement invariance. This allows test scores from different modes to be used interchangeably and means of latent variables or mean differences and correlations to be compared on the population level even if some items differ in difficulty between modes. For this purpose, a multiple-group IRT model approach for analyzing mode effects on the test and item levels is presented. Instances where partial measurement invariance suffices to combine item parameters into one metric are reviewed in this paper. Furthermore, relevant study design requirements and potential sources of mode effects are discussed. Finally, an extension of the modelling approach to explain mode effects by means of item properties such as response format is presented. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
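A crude sketch of the flagging step implied by partial measurement invariance: items whose difficulty estimates differ between paper and computer modes beyond some tolerance are excluded from the common anchor. Both the 0.3-logit tolerance and the difficulty values below are invented for illustration; the paper's multiple-group IRT approach handles this jointly at the model level rather than by simple differencing:

```python
def flag_noninvariant_items(b_paper, b_computer, tolerance=0.3):
    """Compare item difficulties between modes (assumed to be already on
    a common scale) and return the indices of items whose difference
    exceeds the tolerance; the remaining items can serve as anchors for
    a partial-invariance link between modes."""
    flagged = []
    for item, (bp, bc) in enumerate(zip(b_paper, b_computer)):
        if abs(bp - bc) > tolerance:
            flagged.append(item)
    return flagged

# Invented difficulty estimates for three items in each mode.
flags = flag_noninvariant_items([0.0, 1.2, -0.5], [0.1, 0.5, -0.45])
```

Here only the second item (index 1) shows a mode effect large enough to be excluded from the anchor set.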
Author(s):
Kaspar, Roman; Döring, Ottmar; Wittmann, Eveline; Hartig, Johannes; Weyland, Ulrike; Nauerth, Annette; Möllers, Michaela; Rechenbach, Simone; Simon, Julia; Worofka, Iberé
Title:
Competencies in geriatric nursing. Empirical evidence from a computer-based large-scale assessment calibration study
In:
Vocations and Learning, 9 (2016) 2, pp. 185-206
DOI:
10.1007/s12186-015-9147-y
URL:
http://link.springer.com/article/10.1007/s12186-015-9147-y
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Geriatric care; Vocational education; Vocational school; Computer-based method; Germany; Item-Response-Theory; Competence; Measurement method; Technology-based testing; Test
Abstract (english):
Valid and reliable standardized assessment of nursing competencies is needed to monitor the quality of vocational education and training (VET) in nursing and evaluate learning outcomes for care work trainees with increasingly heterogeneous learning backgrounds. To date, however, the modeling of professional competencies has not yet evolved into procedures that would meet large-scale assessment (LSA) standards in VET. To empirically test a proposed structural model for client-directed nursing competence and to estimate psychometric properties of a newly developed video- and computer-based test (CBT) to inform subsequent LSA in nursing VET, 402 final-year nursing students from 24 German schools responded to a 77-item CBT. Multi-dimensional IRT modeling was employed to test the subdomain structure and estimate students' competencies in geriatric nursing. The standardized CBT measures nursing students' client-directed care competence with acceptable precision (WLE = 0.76) and does so across the whole range of observed proficiency levels. Structural validity was supported by substantive contributions of test items from all proposed process-oriented subdomains, practice field scenarios, as well as items with and without reference to emotional demands. However, it was not possible to empirically separate the diagnostic, practical or communicative subdomains, probably reflecting parallel, recursive and hierarchical care processes in complex care situations. On average, students in our sample attained 45% of the maximum test score, so it is a demanding assessment of nursing competence. An extensively piloted, valid and reliable CBT is suggested to assess nursing students' client-directed care competencies at the end of the third year of the VET program. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation