Search results from the DIPF publication database
Your query:
(Keywords: "Item-Response-Theory")
64 results found
Changes in the speed-ability relation through different treatments of rapid guessing
Authors:
Deribo, Tobias; Goldhammer, Frank; Kröhne, Ulf
Title:
Changes in the speed-ability relation through different treatments of rapid guessing
In:
Educational and Psychological Measurement, 83 (2023) 3, pp. 473-494
DOI:
10.1177/00131644221109490
URL:
https://journals.sagepub.com/doi/10.1177/00131644221109490
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Response; Germany; Empirical study; Skill; Information and communication technology; Item-Response-Theory; Achievement test; Model; Panel; Psychometrics; Reliability; Student; Test; Validity; Behavior; Time
Abstract (English):
As researchers in the social sciences, we are often interested in studying constructs that are not directly observable, using assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is only skimmed briefly rather than read and engaged with in depth. Hence, responses given under rapid-guessing behavior bias the constructs and relations of interest. Bias is also plausible for latent speed estimates obtained under rapid-guessing behavior, as well as for the identified relation between speed and ability. This bias seems especially problematic considering that the relation between speed and ability has been shown to improve precision in ability estimation. For this reason, we investigate if and how responses and response times obtained under rapid-guessing behavior affect the identified speed-ability relation and the precision of ability estimates in a joint model of speed and ability. The study presents an empirical application that highlights a specific methodological problem resulting from rapid-guessing behavior. We show that different (non-)treatments of rapid guessing can lead to different conclusions about the underlying speed-ability relation. Furthermore, different rapid-guessing treatments led to markedly different conclusions about gains in precision through joint modeling. The results show the importance of taking rapid guessing into account when the psychometric use of response times is of interest. (DIPF/Orig.)
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
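The (non-)treatments of rapid guessing compared in the abstract above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the 10%-of-mean-RT threshold, the function names, and the toy data are assumptions for the example.

```python
def rapid_guess_flags(rts, fraction=0.10):
    """Flag response times below `fraction` of the item's mean RT
    (an assumed normative-threshold heuristic)."""
    threshold = fraction * (sum(rts) / len(rts))
    return [rt < threshold for rt in rts]

def recode(responses, flags, treatment="missing"):
    """Recode flagged responses: treat as missing (None) or as incorrect (0)."""
    out = []
    for resp, flagged in zip(responses, flags):
        if not flagged:
            out.append(resp)
        else:
            out.append(None if treatment == "missing" else 0)
    return out

rts = [12.0, 14.0, 0.8, 11.0, 13.0]   # seconds; 0.8 s is a rapid guess
responses = [1, 0, 1, 1, 0]
flags = rapid_guess_flags(rts)
print(recode(responses, flags, "missing"))    # [1, 0, None, 1, 0]
print(recode(responses, flags, "incorrect"))  # [1, 0, 0, 1, 0]
```

Downstream ability and speed estimates can differ depending on which of the two recodings (or no treatment at all) feeds the joint model, which is the comparison the article makes.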
Practical significance of item misfit and its manifestations in constructs assessed in large-scale studies
Authors:
Fährmann, Katharina; Köhler, Carmen; Hartig, Johannes; Heine, Jörg-Henrik
Title:
Practical significance of item misfit and its manifestations in constructs assessed in large-scale studies
In:
Large-scale Assessments in Education, 10 (2022), Article 7
DOI:
10.1186/s40536-022-00124-w
URL:
https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-022-00124-w
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Abstract (English):
When scaling psychological tests with methods of item response theory, it is necessary to investigate to what extent the responses correspond to the model predictions. In addition to the statistical evaluation of item misfit, the question arises as to its practical significance. Although item removal is undesirable for several reasons, its practical consequences are rarely investigated, and investigations focus mostly on main survey data with pre-selected items. In this paper, we identify criteria to evaluate practical significance and discuss them with respect to various types of assessments and their particular purposes. We then demonstrate the practical consequences of item misfit using two data examples from the German PISA 2018 field trial study: one with cognitive data and one with non-cognitive/metacognitive data. For the former, we scale the data under the GPCM with and without the misfitting items and investigate how this influences the trait distribution and the allocation to reading competency levels. For the non-cognitive/metacognitive data, we explore the effect of excluding misfitting items on estimated gender differences. Our results indicate minor practical consequences for person allocation and no changes in the estimated gender-difference effects. (DIPF/Orig.)
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
Performance of infit and outfit confidence intervals calculated via parametric bootstrapping
Authors:
Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes
Title:
Performance of infit and outfit confidence intervals calculated via parametric bootstrapping
In:
Applied Measurement in Education, 35 (2022) 2, pp. 116-132
DOI:
10.1080/08957347.2022.2067540
URL:
https://www.tandfonline.com/doi/full/10.1080/08957347.2022.2067540
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Rasch model; Statistics; Method; Procedure; Sample; Test; Analysis; Simulation
Abstract:
Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. The infit and outfit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors such as sample size. Consequently, the traditional use of fixed infit and outfit cutoff points is an ineffective practice. This article evaluates whether confidence intervals estimated via parametric bootstrapping provide more suitable cutoff points than the conventionally applied range of 0.8-1.2 and outfit critical ranges adjusted by sample size. The performance is evaluated under different sizes of misfit, sample sizes, and numbers of items. Results show that the confidence intervals performed better in terms of power but had inflated Type I error rates, which resulted from mean square values pushed below unity in the large-misfit conditions. However, when performing a one-sided test with the upper bound of the confidence intervals, the aforementioned inflation was corrected. (DIPF/Orig.)
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
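The core of the bootstrapped-cutoff idea in this abstract fits in a short pure-Python sketch. Everything below (a single Rasch item with difficulty 0, standard-normal abilities, 500 bootstrap replications, a percentile interval) is an assumed toy setup, not the article's simulation design.

```python
import math
import random

def p_rasch(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit(responses, thetas, b):
    """Outfit: mean squared standardized residual over persons."""
    total = 0.0
    for x, t in zip(responses, thetas):
        p = p_rasch(t, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

def bootstrap_interval(thetas, b, reps=500, alpha=0.05, seed=1):
    """Parametric bootstrap: simulate model-fitting data and take
    percentiles of the resulting outfit values as cutoff points."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sim = [1 if rng.random() < p_rasch(t, b) else 0 for t in thetas]
        stats.append(outfit(sim, thetas, b))
    stats.sort()
    return stats[int(alpha / 2 * reps)], stats[int((1 - alpha / 2) * reps) - 1]

rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(300)]
lo, hi = bootstrap_interval(thetas, b=0.0)
print(lo < 1.0 < hi)  # True: under the null, the interval covers 1
```

Unlike the fixed 0.8-1.2 rule, the interval adapts to sample size: with more persons the simulated outfit values concentrate around 1 and the cutoffs tighten.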
On the speed sensitivity parameter in the lognormal model for response times. Implications for test assembly
Authors:
Becker, Benjamin; Debeer, Dries; Weirich, Sebastian; Goldhammer, Frank
Title:
On the speed sensitivity parameter in the lognormal model for response times. Implications for test assembly
In:
Applied Psychological Measurement, 45 (2021) 6, pp. 407-422
DOI:
10.1177/01466216211008530
URL:
https://journals.sagepub.com/doi/abs/10.1177/01466216211008530
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Software; Technology-based testing; Measurement procedure; Item-Response-Theory; Achievement test; Question; Response; Duration; Influencing factor; Test construction; Model; Comparison; Test theory; Simulation
Abstract:
In high-stakes testing, multiple test forms are often used while a common time limit is enforced. Test fairness requires that ability estimates not depend on the administration of a specific test form. This requirement may be violated if speededness differs between test forms. We investigated the impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations of the proposed approach and further research questions are discussed. (DIPF/Orig.)
DIPF-Abteilung:
Lehr und Lernqualität in Bildungseinrichtungen
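The difference between the two response-time models can be shown concretely: in van der Linden's lognormal model the mean of log RT for person v on item i is beta_i - tau_v, while the Klein Entink et al. extension scales the speed effect by an item-specific speed sensitivity phi_i. The parameter values below are assumed for illustration only.

```python
import math
import random

def expected_log_rt(tau, beta, phi=1.0):
    """Mean of log response time; phi = 1 recovers van der Linden's model."""
    return beta - phi * tau

def simulate_rt(tau, beta, alpha, phi, rng):
    """One response time (seconds) from the extended lognormal model;
    alpha is the time discrimination (inverse residual SD)."""
    return math.exp(rng.gauss(expected_log_rt(tau, beta, phi), 1.0 / alpha))

# A slow test taker (tau = -1) on two forms with the same time intensity
# (beta = 4) but different average speed sensitivity:
slow = -1.0
print(expected_log_rt(slow, 4.0, phi=0.5))  # 4.5
print(expected_log_rt(slow, 4.0, phi=1.5))  # 5.5 -> this form is more speeded
rng = random.Random(7)
print(simulate_rt(slow, 4.0, alpha=2.0, phi=1.5, rng=rng) > 0.0)  # True
```

With phi constrained to 1, both forms would look equally speeded for this test taker, which is exactly the comparability problem the article examines.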
Model-based treatment of rapid guessing
Authors:
Deribo, Tobias; Kröhne, Ulf; Goldhammer, Frank
Title:
Model-based treatment of rapid guessing
In:
Journal of Educational Measurement, 58 (2021) 2, pp. 281-303
DOI:
10.1111/jedm.12290
URL:
https://onlinelibrary.wiley.com/doi/10.1111/jedm.12290?af=R
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Achievement test; Test construction; Measurement procedure; Computer-assisted procedure; Question; Response; Behavior; Duration; Problem solving; Model; Student; Media literacy; Item-Response-Theory; Multiple-choice procedure; Validity; Panel; Longitudinal study
Abstract (English):
The increased availability of time-related information as a result of computer-based assessment has enabled new ways to measure test-taking engagement. One of these ways is to distinguish between solution and rapid-guessing behavior. Prior research has recommended response-level filtering to deal with rapid guessing. However, response-level filtering can lead to parameter bias if rapid guessing depends on the measured trait or on (un-)observed covariates. Therefore, a model based on Mislevy and Wu (1996) was applied to investigate the assumption of ignorable missing data underlying response-level filtering. The model allowed us to investigate different approaches to treating response-level-filtered responses in a single framework through model parameterization. The study found that lower-ability test-takers tend to guess rapidly more frequently and are more likely to be unable to solve an item they guessed on, indicating a violation of the assumption of ignorable missing data underlying response-level filtering. Furthermore, ability estimation seemed sensitive to the different approaches to treating response-level-filtered responses. Moreover, model-based approaches exhibited better model fit and higher convergent validity evidence compared to more naïve treatments of rapid guessing. The results illustrate the need to thoroughly investigate the assumptions underlying specific treatments of rapid guessing, as well as the need for robust methods. (DIPF/Orig.)
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
Measuring hygiene competence. The picture-based situational judgement test HygiKo
Authors:
Heininger, Susanne Katharina; Baumgartner, Maria; Zehner, Fabian; Burgkart, Rainer; Söllner, Nina; Berberat, Pascal O.; Gartmeier, Martin
Title:
Measuring hygiene competence. The picture-based situational judgement test HygiKo
In:
BMC Medical Education, 21 (2021), Article 410
DOI:
10.1186/s12909-021-02829-y
URL:
https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-021-02829-y
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Hygiene; Competence; Testing procedure; Health care; Medicine; Student; Physician; Medical staff; Situation; Assessment; Vignette; Item-Response-Theory; Rasch model
Abstract:
Background: With the onset of the COVID-19 pandemic at the beginning of 2020, the crucial role of hygiene in healthcare settings has once again become very clear. For diagnostic and didactic purposes, standardized and reliable tests suitable for assessing the competencies involved in "working hygienically" are required. However, existing tests usually use self-report questionnaires, which are suboptimal for this purpose. In the present study, we introduce the newly developed, competence-oriented HygiKo test instrument focusing on health-care professionals' hygiene competence and report empirical evidence regarding its psychometric properties. Methods: HygiKo is a situational judgement test (SJT) to assess hygiene competence. The HygiKo test consists of twenty pictures (items); each item presents exactly one unambiguous hygiene lapse. For each item, test respondents are asked (1) whether they recognize a problem in the picture with respect to hygiene guidelines and, (2) if yes, to describe the problem in a short verbal response. Our sample comprised n = 149 health care professionals (79.1% female; age: M = 26.7 years, SD = 7.3 years) working as clinicians or nurses. The written responses were rated by two independent raters with high agreement (α > 0.80), indicating high reliability of the measurement. We used item response theory (IRT) for further data analysis. Results: The IRT analyses show that the HygiKo test is suitable for assessing hygiene competence and that it distinguishes between persons at different levels of ability for seventeen of the twenty items, especially in the range of low to medium person abilities. Hence, the HygiKo SJT provides a reliable, competence-oriented measure of hygiene competence. Conclusions: In its present form, the HygiKo test can be used to assess the hygiene competence of medical students, medical doctors, nurses, and trainee nurses in cross-sectional measurements. In order to broaden the difficulty spectrum of the current test, additional test items with higher difficulty should be developed. The situational judgement test designed to assess hygiene competence can be helpful in testing and teaching the ability to work hygienically. Further validity research is needed. (DIPF/Orig.)
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
A semiparametric approach for item response function estimation to detect item misfit
Authors:
Köhler, Carmen; Robitzsch, Alexander; Fährmann, Katharina; von Davier, Matthias; Hartig, Johannes
Title:
A semiparametric approach for item response function estimation to detect item misfit
In:
British Journal of Mathematical and Statistical Psychology, 74 (2021) S1, pp. 157-175
DOI:
10.1111/bmsp.12224
URL:
https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12224
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Test theory
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
A bias corrected RMSD item fit statistic. An evaluation and comparison to alternatives
Authors:
Köhler, Carmen; Robitzsch, Alexander; Hartig, Johannes
Title:
A bias corrected RMSD item fit statistic. An evaluation and comparison to alternatives
In:
Journal of Educational and Behavioral Statistics, 45 (2020) 3, pp. 251-273
DOI:
10.3102/1076998619890566
URL:
https://journals.sagepub.com/doi/10.3102/1076998619890566
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Test construction; Model; Question; Response; Measurement procedure; Statistical method; Evaluation; Comparison; Educational research; Empirical research
Abstract:
Testing whether items fit the assumptions of an item response theory model is an important step in evaluating a test. In the literature, numerous item fit statistics exist, many of which show severe limitations. The current study investigates the root mean squared deviation (RMSD) item fit statistic, which is used for evaluating item fit in various large-scale assessment studies. The three research questions of this study are (1) whether the empirical RMSD is an unbiased estimator of the population RMSD; (2) if this is not the case, whether this bias can be corrected; and (3) whether the test statistic provides an adequate significance test to detect misfitting items. Using simulation studies, it was found that the empirical RMSD is not an unbiased estimator of the population RMSD, and nonparametric bootstrapping falls short of entirely eliminating this bias. Using parametric bootstrapping, however, the RMSD can be used as a test statistic that outperforms the other approaches (infit and outfit, S-X2) with respect to both Type I error rate and power. The empirical application showed that parametric bootstrapping of the RMSD results in rather conservative item fit decisions, which suggests more lenient cut-off criteria. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
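The RMSD statistic investigated here is simple to state: a weighted root mean squared deviation between a (pseudo-)observed item characteristic curve and the model-implied one over an ability grid. The 2PL curves, grid, and normal weights below are assumptions for illustration, not the operational large-scale-assessment implementation.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rmsd(observed, model, weights):
    """Weighted RMSD between two curves evaluated on the same grid."""
    num = sum(w * (o - m) ** 2 for o, m, w in zip(observed, model, weights))
    return math.sqrt(num / sum(weights))

grid = [-2.0 + 0.5 * k for k in range(9)]          # theta grid from -2 to 2
weights = [math.exp(-t * t / 2) for t in grid]     # ~ N(0,1) weights
model = [icc_2pl(t, a=1.0, b=0.0) for t in grid]
misfit = [icc_2pl(t, a=0.4, b=0.0) for t in grid]  # flatter "observed" curve

print(rmsd(model, model, weights))          # 0.0 for a perfectly fitting item
print(rmsd(misfit, model, weights) > 0.05)  # True: clear misfit
```

The article's point is about the sampling behavior of this quantity: the empirical RMSD is positively biased at finite sample sizes, so cutoffs should come from a (parametric) bootstrap rather than a fixed value.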
Interpretation von Testwerten in der Item-Response-Theorie (IRT)
Authors:
Rauch, Dominique; Hartig, Johannes
Title:
Interpretation von Testwerten in der Item-Response-Theorie (IRT) [Interpreting test scores in item response theory (IRT)]
From:
Moosbrugger, Helfried; Kelava, Augustin (Eds.): Testtheorie und Fragebogenkonstruktion, Berlin: Springer, 2020, pp. 411-424
DOI:
10.1007/978-3-662-61532-4_17
URL:
https://link.springer.com/chapter/10.1007%2F978-3-662-61532-4_17
Document type:
4. Contributions in edited volumes; edited volume (no special category)
Language:
German
Keywords:
Test; Score; Test evaluation; Interpretation; Item-Response-Theory; Model; Educational research; Empirical research; Competence; Definition; Rasch model; Data analysis
Abstract:
This chapter deals with the application of IRT models in empirical educational research. Large-scale school achievement studies exploit specific advantages of IRT, for example to enable matrix sampling of test items, the construction of parallel test forms, and the development of computerized adaptive tests. Another key advantage of IRT models is the possibility of criterion-referenced interpretation of IRT-based test scores. This becomes feasible through the joint location of item difficulties and person abilities on a common scale. If the Rasch model holds, individual test scores can be interpreted through their distances to item difficulties. So-called "competence levels" also build on this central property of Rasch models. To ease interpretation, the continuous scale is divided into sections (competence levels), which are then described as a whole in criterion-referenced terms. In this chapter, a shared example illustrates the definition and description of competence levels, both via an approach using post-hoc analyses of the items and via the use of a-priori task characteristics. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
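The joint-scale idea of the chapter can be made concrete in a few lines: under the Rasch model, a person whose ability equals an item's difficulty solves it with probability .5, and the continuous scale can be divided into competence levels via cut scores. The cut scores and level labels below are invented for illustration.

```python
import bisect
import math

def p_solve(theta, b):
    """Rasch probability that a person with ability theta solves an item
    with difficulty b (both on the same joint scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def competence_level(theta, cuts=(-1.0, 0.0, 1.0)):
    """Map theta to level I-IV via assumed cut scores on the joint scale."""
    return ["I", "II", "III", "IV"][bisect.bisect_right(cuts, theta)]

print(round(p_solve(0.5, 0.5), 2))  # 0.5: theta exactly at the item difficulty
print(competence_level(-1.5), competence_level(0.3), competence_level(2.0))
# I III IV
```

Each level can then be described criterion-referenced by the items whose difficulties fall into that section of the scale, which is what the chapter demonstrates with its shared example.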
Comparing attitudes across groups. An IRT-based item-fit statistic for the analysis of measurement invariance
Authors:
Buchholz, Janine; Hartig, Johannes
Title:
Comparing attitudes across groups. An IRT-based item-fit statistic for the analysis of measurement invariance
In:
Applied Psychological Measurement, 43 (2019) 3, pp. 241-250
DOI:
10.1177/0146621617748323
URN:
urn:nbn:de:0111-dipfdocs-174393
URL:
http://www.dipfdocs.de/volltexte/2020/17439/pdf/APM_2019_3_Buchholz_Hartig_Comparing_attitudes_across_groups_A.pdf
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Attitude <Psy>; Measurement; Questionnaire; International comparison; Group; Comparison; Item-Response-Theory; Scaling; Model; Statistical method; Simulation
Abstract (English):
Questionnaires for the assessment of attitudes and other psychological traits are crucial in educational and psychological research, and item response theory (IRT) has become a viable tool for scaling such data. Many international large-scale assessments aim at comparing these constructs across countries, and the invariance of measures across countries is thus required. In its most recent cycle, the Programme for International Student Assessment (PISA 2015) implemented an innovative approach for testing the invariance of IRT-scaled constructs in the context questionnaires administered to students, parents, school principals, and teachers. On the basis of a concurrent calibration with equal item parameters across all groups (i.e., languages within countries), a group-specific item-fit statistic (root mean square deviance; RMSD) was used as a measure of the invariance of item parameters for individual groups. The present simulation study examines the statistic's distribution under different types and extents of (non-)invariance in polytomous items. Responses to five four-point Likert-type items were generated under the Generalized Partial Credit Model (GPCM) for 1,000 simulees in each of 50 groups. For one of the five items, either location or discrimination parameters were drawn from a normal distribution. In addition to this type of non-invariance, we varied the extent of non-invariance by manipulating the variation of these distributions. Results indicate that the RMSD statistic is better at detecting non-invariance related to between-group differences in item location than in item discrimination. The study's findings may be used as a starting point for sensitivity analyses aiming to define cut-off values for determining (non-)invariance. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
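The GPCM used to generate the four-point Likert responses in this simulation can be sketched as follows; the discrimination, location, and threshold values are assumed for illustration and are not the study's generating parameters.

```python
import math

def gpcm_probs(theta, a, b, deltas):
    """GPCM category probabilities P(X = k) for k = 0..len(deltas).
    a: discrimination, b: item location, deltas: step thresholds."""
    zs = [0.0]
    for d in deltas:
        zs.append(zs[-1] + a * (theta - (b + d)))  # cumulative step logits
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# One four-point item at theta = 0 with symmetric thresholds:
probs = gpcm_probs(0.0, a=1.2, b=0.0, deltas=[-1.0, 0.0, 1.0])
print(len(probs), round(sum(probs), 6))  # 4 1.0
```

Group-specific non-invariance as in the study would amount to drawing b (location) or a (discrimination) from a normal distribution per group and then checking how far each group's observed category curves deviate from the common calibration.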