-
-
Autor*innen: Deribo, Tobias; Goldhammer, Frank; Kröhne, Ulf
Titel: Changes in the speed-ability relation through different treatments of rapid guessing
In: Educational and Psychological Measurement, 83 (2023) 3, S. 473-494
DOI: 10.1177/00131644221109490
URL: https://journals.sagepub.com/doi/10.1177/00131644221109490
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Antwort; Deutschland; Empirische Untersuchung; Fertigkeit; Informations- und Kommunikationstechnologie; Item-Response-Theory; Leistungstest; Modell; Panel; Psychometrie; Reliabilität; Student; Test; Validität; Verhalten; Zeit
Abstract (english): As researchers in the social sciences, we are often interested in studying constructs that are not directly observable through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is only briefly skimmed rather than read and engaged with in depth. Hence, a response given under rapid-guessing behavior biases the constructs and relations of interest. Bias is also plausible for latent speed estimates obtained under rapid-guessing behavior, as well as for the identified relation between speed and ability. This bias seems especially problematic considering that the relation between speed and ability has been shown to improve precision in ability estimation. For this reason, we investigate if and how responses and response times obtained under rapid-guessing behavior affect the identified speed-ability relation and the precision of ability estimates in a joint model of speed and ability. The study presents an empirical application that highlights a specific methodological problem resulting from rapid-guessing behavior. We show that different (non-)treatments of rapid guessing can lead to different conclusions about the underlying speed-ability relation. Furthermore, different rapid-guessing treatments led to markedly different conclusions about gains in precision through joint modeling. The results underline the importance of taking rapid guessing into account when the psychometric use of response times is of interest. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
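The response-level treatment of rapid guessing discussed in the abstract above can be illustrated with a minimal Python sketch. The simulated data, the 10%-of-median time threshold, and all variable names are illustrative assumptions, not the procedure used in the article:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 100 test takers x 15 items, scored responses (0/1)
# and response times in seconds (both purely illustrative).
rt = rng.lognormal(mean=3.0, sigma=1.0, size=(100, 15))
resp = (rng.random((100, 15)) < 0.6).astype(float)

# One common threshold heuristic: flag a response as a rapid guess when
# its time falls below 10% of the item's median response time.
threshold = 0.10 * np.median(rt, axis=0)
rapid = rt < threshold[None, :]

# Response-level filtering: treat flagged responses as missing before scaling.
filtered = resp.copy()
filtered[rapid] = np.nan
```

Whether responses flagged this way may be treated as ignorably missing is precisely what such studies examine; the flags could alternatively be scored as incorrect or handled in a joint model of speed and ability.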
-
-
Autor*innen: Fährmann, Katharina; Köhler, Carmen; Hartig, Johannes; Heine, Jörg-Henrik
Titel: Practical significance of item misfit and its manifestations in constructs assessed in large-scale studies
In: Large-scale Assessments in Education, 10 (2022), S. 7
DOI: 10.1186/s40536-022-00124-w
URL: https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-022-00124-w
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Abstract (english): When scaling psychological tests with methods of item response theory, it is necessary to investigate to what extent the responses correspond to the model predictions. In addition to the statistical evaluation of item misfit, the question arises as to its practical significance. Although item removal is undesirable for several reasons, its practical consequences are rarely investigated, and existing investigations focus mostly on main survey data with pre-selected items. In this paper, we identify criteria to evaluate practical significance and discuss them with respect to various types of assessments and their particular purposes. We then demonstrate the practical consequences of item misfit using two data examples from the German PISA 2018 field trial study: one with cognitive data and one with non-cognitive/metacognitive data. For the former, we scale the data under the GPCM with and without the inclusion of misfitting items, and investigate how this influences the trait distribution and the allocation to reading competency levels. For non-cognitive/metacognitive data, we explore the effect of excluding misfitting items on estimated gender differences. Our results indicate minor practical consequences for person allocation and no changes in the estimated gender-difference effects. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
-
-
Autor*innen: Franzen, Patrick; Arens, A. Katrin; Greiff, Samuel; Westhuizen, Lindie van der; Fischbach, Antoine; Wollschläger, Rachel; Niepel, Christoph
Titel: Developing and validating a short-form questionnaire for the assessment of seven facets of conscientiousness in large-scale assessments
In: Journal of Personality Assessment, 104 (2022) 6, S. 759-773
DOI: 10.1080/00223891.2021.1998083
URL: https://www.tandfonline.com/doi/full/10.1080/00223891.2021.1998083
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Auswahl; Bildungsforschung; Datenanalyse; Entwicklung; Item; Luxemburg; Messbarkeit; Messinstrument; Pflichtbewusstsein; Psychometrie; Reliabilität; Schüler; Schülerin; Schuljahr 09; Studie; Validität
Abstract: Conscientiousness is the most important personality predictor of academic achievement. It consists of several lower order facets with differential relations to academic achievement. There is currently no short instrument assessing facets of conscientiousness in the educational context. Therefore, in the present multi-study report, we develop and validate a short-form questionnaire for the assessment of seven Conscientiousness facets, namely Industriousness, Perfectionism, Tidiness, Procrastination Refrainment, Control, Caution, and Task Planning. To this end, we examined multiple representative samples totaling N = 14,604 Grade 9 and 10 students from Luxembourg. The questionnaire was developed by adapting and shortening an existing scale using an exhaustive search algorithm. The algorithm was specified to select the best item combination based on model fit, reliability, and measurement invariance across the German and French language versions. The resulting instrument showed the expected factorial structure. The relations of the facets with personality constructs and academic achievement were in line with theoretical assumptions. Reliability was acceptable for all facets. Measurement invariance across language versions, gender, immigration status and cohort was established. We conclude that the presented questionnaire provides a short measurement of seven facets of Conscientiousness with valid and reliable scores. (DIPF/Orig.)
DIPF-Abteilung: Bildung und Entwicklung
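The exhaustive-search idea described in the abstract above — selecting the best item combination from a candidate pool — can be sketched in a few lines. The data, the pool size, and the use of Cronbach's alpha as the sole criterion are illustrative assumptions; the article combines model fit, reliability, and measurement invariance:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

# Illustrative item scores: 500 respondents x 8 candidate items for one
# facet, generated from a single common factor plus noise.
scores = rng.normal(size=(500, 1)) + 0.8 * rng.normal(size=(500, 8))

def cronbach_alpha(x):
    # Classical internal-consistency estimate for the columns of x.
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

# Exhaustive search: evaluate every 4-item subset and keep the one that
# maximizes the criterion.
best = max(combinations(range(8), 4),
           key=lambda idx: cronbach_alpha(scores[:, list(idx)]))
```

With 8 candidate items and 4 slots this is only 70 evaluations; real short-form searches over larger pools and multiple simultaneous criteria grow combinatorially, which is why such algorithms are typically constrained.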
-
-
Autor*innen: Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes
Titel: Performance of infit and outfit confidence intervals calculated via parametric bootstrapping
In: Applied Measurement in Education, 35 (2022) 2, S. 116-132
DOI: 10.1080/08957347.2022.2067540
URL: https://www.tandfonline.com/doi/full/10.1080/08957347.2022.2067540
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Item-Response-Theory; Rasch-Modell; Statistik; Methode; Verfahren; Stichprobe; Test; Analyse; Simulation
Abstract: Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. Infit and outfit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors such as sample size. Consequently, the traditional use of fixed infit and outfit cutoff points is an ineffective practice. This article evaluates whether confidence intervals estimated via parametric bootstrapping provide more suitable cutoff points than the conventionally applied range of 0.8-1.2 and outfit critical ranges adjusted by sample size. Performance is evaluated under different sizes of misfit, sample sizes, and numbers of items. Results show that the confidence intervals performed better in terms of power but had inflated type-I error rates, which resulted from mean-square values being pushed below unity in the conditions with large misfit. However, when performing a one-sided test with the upper bound of the confidence intervals, the aforementioned inflation was eliminated. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
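The parametric-bootstrap approach evaluated in the article above can be sketched as follows: simulate model-conforming data from given Rasch parameters, recompute infit and outfit each time, and take empirical quantiles as item-specific cutoffs. All parameter values and function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def rasch_probs(theta, b):
    # P(X = 1) under the Rasch model, persons in rows, items in columns.
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def infit_outfit(x, p):
    w = p * (1.0 - p)                       # binomial variance weights
    sq = (x - p) ** 2
    infit = sq.sum(axis=0) / w.sum(axis=0)  # information-weighted mean square
    outfit = (sq / w).mean(axis=0)          # unweighted mean square
    return infit, outfit

def bootstrap_cutoffs(theta, b, n_boot=500, alpha=0.05):
    # Empirical (1 - alpha) intervals of infit/outfit under the fitted model.
    p = rasch_probs(theta, b)
    stats = np.empty((n_boot, 2, len(b)))
    for r in range(n_boot):
        x = (rng.random(p.shape) < p).astype(float)
        stats[r, 0], stats[r, 1] = infit_outfit(x, p)
    return (np.quantile(stats, alpha / 2, axis=0),
            np.quantile(stats, 1 - alpha / 2, axis=0))

theta = rng.normal(size=300)    # person abilities
b = np.linspace(-2.0, 2.0, 10)  # item difficulties
lo, hi = bootstrap_cutoffs(theta, b)  # each of shape (2, n_items)
```

An item would then be flagged when its observed mean square falls outside its own interval; the article's finding suggests using only the upper bound as a one-sided test to avoid inflated type-I error rates.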
-
-
Autor*innen: Dohrmann, Julia
Titel: Überzeugungen von Lehrkräften. Ihre Bedeutung für das pädagogische Handeln und die Lernergebnisse in den Fächern Englisch und Mathematik
Erscheinungsvermerk: Münster: Waxmann, 2021 (Empirische Erziehungswissenschaft, 78)
DOI: 10.31244/9783830994176
URN: urn:nbn:de:0111-pedocs-224986
URL: http://nbn-resolving.org/urn:nbn:de:0111-pedocs-224986
Dokumenttyp: 1. Monographien (Autorenschaft); Monographie
Sprache: Deutsch
Schlagwörter: 20. Jahrhundert; Allgemeine Pädagogik; Bildungsgeschichte; Datenanalyse; Deutschland; Dissertation; Einflussfaktor; Einstellung; Empirische Forschung; Englischunterricht; Gesamtschule; Gymnasium; Handlungskompetenz; Hauptschule; Hessen; Item; Item-Response-Theorie; Lehrer; Lernergebnis; Mathematikunterricht; Merkmal; Niedersachsen; Nordrhein-Westfalen; Pädagogisches Handeln; Professionalisierung; Qualität; Realschule; Schülerleistung; Schulform; Schulforschung; Schuljahr 09; Schulkultur; Schulqualität; Sekundäranalyse; Test; Überzeugung; Unterricht; Unterrichtsforschung; Unterrichtsklima; Unterrichtspraxis; Unterrichtsqualität; Wandel
Abstract: Pädagogische Überzeugungen von Lehrkräften sind ein zentraler Aspekt ihrer professionellen Kompetenz, der für Schul- und Unterrichtsqualität bedeutsam ist. In dieser Studie wird untersucht, mit welchen Unterrichtsmerkmalen allgemeine pädagogische Überzeugungen von Lehrkräften zusammenhängen und wie diese Überzeugungen, vermittelt über das Unterrichtshandeln, mit Lernergebnissen von Schülerinnen und Schülern korrespondieren. Dies geschieht durch eine Sekundäranalyse der Drei-Länder-Studie von Helmut Fend aus den Jahren 1978/79. Ziel der Arbeit ist es, die Beziehungen zwischen allgemeinpädagogischen Überzeugungen, Unterricht und Lernergebnissen unter Berücksichtigung des aktuellen theoretischen und methodischen Forschungsstandes zu analysieren. Es zeigt sich, dass im Englischunterricht die pädagogischen Überzeugungen der Lehrkräfte mit einem unterstützenden Unterrichtsklima und - vermittelt über adaptives Unterrichtshandeln - mit den Lernergebnissen der Schülerinnen und Schüler im affektiven Bereich zusammenhängen.
Abstract (english): Teachers' pedagogical beliefs are a central aspect of their professional competence and are significant for school and instructional quality. This study examines which characteristics of instruction teachers' general pedagogical beliefs are related to, and how these beliefs, mediated through teaching practice, correspond with students' learning outcomes. This is done through a secondary analysis of Helmut Fend's Drei-Länder-Studie (three-state study) from 1978/79. The aim of the work is to analyze the relationships between general pedagogical beliefs, instruction, and learning outcomes in light of the current theoretical and methodological state of research. The results show that in English instruction, teachers' pedagogical beliefs are related to a supportive classroom climate and, mediated through adaptive teaching practice, to students' learning outcomes in the affective domain. (Translated from the German abstract.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
-
-
Autor*innen: Becker, Benjamin; Debeer, Dries; Weirich, Sebastian; Goldhammer, Frank
Titel: On the speed sensitivity parameter in the lognormal model for response times. Implications for test assembly
In: Applied Psychological Measurement, 45 (2021) 6, S. 407-422
DOI: 10.1177/01466216211008530
URL: https://journals.sagepub.com/doi/abs/10.1177/01466216211008530
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Software; Technologiebasiertes Testen; Messverfahren; Item-Response-Theory; Leistungstest; Frage; Antwort; Dauer; Einflussfaktor; Testkonstruktion; Modell; Vergleich; Testtheorie; Simulation
Abstract: In high-stakes testing, multiple test forms are often used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially those with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations of the proposed approach and further research questions are discussed. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
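The role of the speed sensitivity parameter can be made concrete with a small simulation of the extended lognormal model, log T_ij = lambda_j - phi_j * tau_i + epsilon_ij, where phi_j is the item's speed sensitivity. All numeric values and names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_log_rt(tau, lam, phi, sigma):
    # Extended lognormal response-time model:
    # log T_ij = lam_j - phi_j * tau_i + eps_ij, eps_ij ~ N(0, sigma_j^2).
    # phi_j scales how strongly person speed tau_i shortens log time on item j.
    eps = rng.normal(0.0, sigma[None, :], size=(len(tau), len(lam)))
    return lam[None, :] - phi[None, :] * tau[:, None] + eps

tau = np.array([-1.0, 0.0, 1.0])  # a slow, an average, and a fast test taker
lam = np.full(20, 3.0)            # item time intensities (log seconds)
sigma = np.full(20, 0.3)          # residual standard deviations

# Two 20-item forms with identical time intensities but different
# average speed sensitivity.
form_a = simulate_log_rt(tau, lam, np.full(20, 0.5), sigma)
form_b = simulate_log_rt(tau, lam, np.full(20, 1.5), sigma)

total_a = np.exp(form_a).sum(axis=1)  # total test time per person, form A
total_b = np.exp(form_b).sum(axis=1)  # total test time per person, form B
```

Under these values the high-sensitivity form is far more time-consuming for the slow test taker and far less so for the fast one, so a common time limit speeds the two forms differently, which is the comparability problem the article addresses.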
-
-
Autor*innen: Deribo, Tobias; Kröhne, Ulf; Goldhammer, Frank
Titel: Model‐based treatment of rapid guessing
In: Journal of Educational Measurement, 58 (2021) 2, S. 281-303
DOI: 10.1111/jedm.12290
URL: https://onlinelibrary.wiley.com/doi/10.1111/jedm.12290?af=R
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Leistungstest; Testkonstruktion; Messverfahren; Computerunterstütztes Verfahren; Frage; Antwort; Verhalten; Dauer; Problemlösen; Modell; Student; Medienkompetenz; Item-Response-Theory; Multiple-Choice-Verfahren; Validität; Panel; Längsschnittuntersuchung
Abstract (english): The increased availability of time-related information as a result of computer-based assessment has enabled new ways to measure test-taking engagement. One of these ways is to distinguish between solution and rapid-guessing behavior. Prior research has recommended response-level filtering to deal with rapid guessing. However, response-level filtering can lead to parameter bias if rapid guessing depends on the measured trait or on (un-)observed covariates. Therefore, a model based on Mislevy and Wu (1996) was applied to investigate the assumption of ignorable missing data underlying response-level filtering. The model allowed us to investigate different approaches to treating response-level filtered responses in a single framework through model parameterization. The study found that lower-ability test takers tend to rapidly guess more frequently and are more likely to be unable to solve an item they guessed on, indicating a violation of the assumption of ignorable missing data underlying response-level filtering. Furthermore, ability estimation seemed sensitive to the different treatments of response-level filtered responses. Moreover, model-based approaches exhibited better model fit and higher convergent validity evidence compared with more naïve treatments of rapid guessing. The results illustrate the need to thoroughly investigate the assumptions underlying specific treatments of rapid guessing as well as the need for robust methods. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
-
-
Autor*innen: Engelhardt, Lena; Naumann, Johannes; Goldhammer, Frank; Frey, Andreas; Horz, Holger; Hartig, Katja; Wenzel, S. Franziska C.
Titel: Development and evaluation of a framework for the performance-based testing of ICT skills
In: Frontiers in Education, 6 (2021), S. 668860
DOI: 10.3389/feduc.2021.668860
URL: https://www.frontiersin.org/articles/10.3389/feduc.2021.668860/full
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Informations- und Kommunikationstechnologie; Praktische Fertigkeit; Wissen; Problemlösen; Textverständnis; Bildverstehen; Bewertung; Modell; Item; Entwicklung; Testvalidität; Itemanalyse; Rasch-Modell; Implementation; Evaluation; Test; Testverhalten; Schüler; Sekundarstufe I; Baden-Württemberg; Rheinland-Pfalz; Deutschland
Abstract (english): This paper addresses the development of performance-based assessment items for ICT skills, i.e., skills in dealing with information and communication technologies, a construct that is rather broadly and only operationally defined. Item development followed a construct-driven approach to ensure that test scores could be interpreted as intended. Specifically, ICT-specific knowledge as well as problem solving and the comprehension of text and graphics were defined as components of ICT skills, along with cognitive ICT tasks (i.e., accessing, managing, integrating, evaluating, creating). In order to capture the construct in a valid way, design principles for constructing the simulation environment and response formats were formulated. To empirically evaluate the very heterogeneous items and detect malfunctioning items, item difficulties were analyzed, and behavior-related indicators with item-specific thresholds were developed and applied. The Rasch-model difficulty scores of the 69 items fell within a comparable range for each cognitive task. Process indicators addressing time use and test-taker interactions were used to analyze whether most test takers executed the intended processes, exhibited disengagement, or got lost among the items. Most items were capable of eliciting the intended behavior; for the few exceptions, conclusions for item revisions were drawn. The results affirm the utility of the proposed framework for developing and implementing performance-based items to assess ICT skills. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
-
-
Autor*innen: Heininger, Susanne Katharina; Baumgartner, Maria; Zehner, Fabian; Burgkart, Rainer; Söllner, Nina; Berberat, Pascal O.; Gartmeier, Martin
Titel: Measuring hygiene competence. The picture-based situational judgement test HygiKo
In: BMC Medical Education, 21 (2021), S. 410
DOI: 10.1186/s12909-021-02829-y
URL: https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-021-02829-y
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Hygiene; Kompetenz; Testverfahren; Gesundheitswesen; Medizin; Student; Arzt; Medizinisches Personal; Situation; Bewertung; Vignette; Item-Response-Theory; Rasch-Modell
Abstract: Background: With the onset of the COVID-19 pandemic at the beginning of 2020, the crucial role of hygiene in healthcare settings has once again become very clear. For diagnostic and didactic purposes, standardized and reliable tests suitable for assessing the competencies involved in "working hygienically" are required. However, existing tests usually use self-report questionnaires, which are suboptimal for this purpose. In the present study, we introduce the newly developed, competence-oriented HygiKo test instrument focusing on health care professionals' hygiene competence and report empirical evidence regarding its psychometric properties.
Methods: HygiKo is a Situational Judgement Test (SJT) for assessing hygiene competence. The HygiKo-test consists of twenty pictures (items), each presenting exactly one unambiguous hygiene lapse. For each item, test respondents are asked (1) whether they recognize a problem in the picture with respect to hygiene guidelines and, (2) if yes, to describe the problem in a short verbal response. Our sample comprised n = 149 health care professionals (79.1% female; age: M = 26.7 years, SD = 7.3 years) working as clinicians or nurses. The written responses were rated by two independent raters with high agreement (α > 0.80), indicating high reliability of the measurement. We used item response theory (IRT) for further data analysis.
Results: We report IRT analyses showing that the HygiKo-test is suitable for assessing hygiene competence and that it distinguishes between persons at different levels of ability (for seventeen of the twenty items), especially in the range of low to medium person abilities. Hence, the HygiKo-SJT yields a reliable and competence-oriented measure of hygiene competence.
Conclusions: In its present form, the HygiKo-test can be used to assess the hygiene competence of medical students, medical doctors, nurses, and trainee nurses in cross-sectional measurements. In order to broaden the difficulty spectrum of the current test, additional items with higher difficulty should be developed. The Situational Judgement Test designed to assess hygiene competence can be helpful in testing and teaching the ability to work hygienically. Further validity research is needed. (DIPF/Orig.)
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen
-
-
Autor*innen: Köhler, Carmen; Robitzsch, Alexander; Fährmann, Katharina; von Davier, Matthias; Hartig, Johannes
Titel: A semiparametric approach for item response function estimation to detect item misfit
In: British Journal of Mathematical and Statistical Psychology, 74 (2021) S1, S. 157-175
DOI: 10.1111/bmsp.12224
URL: https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12224
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Item-Response-Theory; Testtheorie
DIPF-Abteilung: Lehr und Lernqualität in Bildungseinrichtungen