Search results in the DIPF database of publications
Your query: (Keywords: "Testkonstruktion")
73 items matching your search terms.
Author(s):
Bengs, Daniel; Brefeld, Ulf; Kröhne, Ulf
Title:
Adaptive item selection under matroid constraints
In:
Journal of Computerized Adaptive Testing, 6 (2018) 2, pp. 15-36
DOI:
10.7333/1808-0602015
URN:
urn:nbn:de:0111-dipfdocs-166953
URL:
http://www.dipfdocs.de/volltexte/2020/16695/pdf/JCAT_2018_2_Bengs_Brefeld_Kroehne_Adaptive_item_selection_under_matroid_constraints_A.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Adaptives Testen; Algorithmus; Computerunterstütztes Verfahren; Itembank; Messverfahren; Technologiebasiertes Testen; Testkonstruktion
Abstract (English):
The shadow testing approach (STA; van der Linden & Reese, 1998) is considered the state of the art in constrained item selection for computerized adaptive tests. The present paper shows that certain types of constraints (e.g., bounds on categorical item attributes) induce a matroid on the item bank. This observation is used to devise item selection algorithms that are based on matroid optimization and lead to optimal tests, as the STA does. In particular, a single matroid constraint can be treated optimally by an efficient greedy algorithm that selects the most informative item preserving the integrity of the constraints. A simulation study shows that for applicable constraints, the optimal algorithms realize a decrease in standard error (SE) corresponding to a reduction in test length of up to 10% compared to the maximum priority index (Cheng & Chang, 2009) and up to 30% compared to Kingsbury and Zara's (1991) constrained computerized adaptive testing.
DIPF-Departments:
Bildungsqualität und Evaluation
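The greedy step described in the abstract above can be sketched in a few lines. This is an editor's illustration under assumptions, not the authors' implementation: it assumes a 2PL information function and per-category upper bounds on item counts, i.e. a partition matroid, one constraint type for which greedy selection is optimal; item and category names are invented.

from dataclasses import dataclass
import math

@dataclass
class Item:
    item_id: str
    category: str  # categorical item attribute, e.g. content area
    a: float       # 2PL discrimination
    b: float       # 2PL difficulty

def information(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)

def greedy_select(bank, theta, caps, test_length):
    """Select the most informative items one by one, skipping any item
    whose category cap is exhausted; the caps define a partition
    matroid, so this greedy scan yields an optimal feasible test."""
    chosen, used = [], {}
    for item in sorted(bank, key=lambda it: information(it, theta), reverse=True):
        if len(chosen) == test_length:
            break
        if used.get(item.category, 0) < caps.get(item.category, test_length):
            chosen.append(item)
            used[item.category] = used.get(item.category, 0) + 1
    return chosen

bank = [Item("i1", "algebra", 1.2, 0.0),
        Item("i2", "algebra", 1.5, 0.3),
        Item("i3", "geometry", 0.9, -0.5)]
# The algebra cap blocks the second algebra item: -> ['i2', 'i3']
print([it.item_id for it in greedy_select(bank, 0.0, {"algebra": 1}, 2)])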
Author(s):
Kroehne, Ulf; Goldhammer, Frank
Title:
How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items
In:
Behaviormetrika, 45 (2018) 2, pp. 527-563
DOI:
10.1007/s41237-018-0063-y
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Bildungsforschung; Empirische Forschung; Logdatei; Datenanalyse; Technologiebasiertes Testen; PISA <Programme for International Student Assessment>; Fragebogen; Konzeption; Testkonstruktion; Daten; Typologie; Hardware; Antwort; Verhalten; Dauer; Interaktion; Mensch-Maschine-Kommunikation; Indikator
Abstract:
Log data from educational assessments attract increasing attention, and large-scale assessment programs have started providing log data as scientific use files. Such data, generated as a by-product of computer-assisted data collection, have been known as paradata in survey research. In this paper, we integrate log data from educational assessments into a taxonomy of paradata. To provide a generic framework for the analysis of log data, finite state machines are suggested. Beyond their computational value, the specific benefit of using finite state machines is achieved by separating platform-specific log events from the definition of indicators by states. Specifically, states represent filtered log data given a theoretical process model and therefore encode the information of log files selectively. The approach is empirically illustrated using log data of the context questionnaires of the Programme for International Student Assessment (PISA). We extracted item-level response time components from questionnaire items that were administered as item batteries with multiple questions on one screen and related them to the item responses. Finally, the taxonomy and the finite state machine approach are discussed with respect to the definition of complete log data, the verification of log data, and the reproducibility of log data analyses. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
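The separation of platform-specific events from indicator-defining states, as proposed in the abstract above, can be illustrated with a toy finite state machine. Event names, state names, and the single-item scenario are invented for illustration; they are not taken from the paper.

# Transition table: (current state, log event) -> next state.
TRANSITIONS = {
    ("off_screen", "item_displayed"): "reading",
    ("reading", "radio_clicked"): "answering",
    ("answering", "radio_clicked"): "answering",
    ("answering", "next_pressed"): "off_screen",
}

def run_machine(events, start="off_screen"):
    """Replay timestamped (time, event) pairs; events without a defined
    transition count as platform-specific noise and are filtered out.
    The returned state trace is the basis for indicators such as the
    time spent in the 'answering' state."""
    state, trace = start, [(0.0, start)]
    for t, ev in events:
        nxt = TRANSITIONS.get((state, ev))
        if nxt is not None:
            state = nxt
            trace.append((t, state))
    return trace

log = [(0.0, "item_displayed"), (3.2, "radio_clicked"), (5.0, "next_pressed")]
print(run_machine(log))
# -> [(0.0, 'off_screen'), (0.0, 'reading'), (3.2, 'answering'), (5.0, 'off_screen')]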
Author(s):
Engelhardt, Lena; Goldhammer, Frank; Naumann, Johannes; Frey, Andreas
Title:
Experimental validation strategies for heterogeneous computer-based assessment items
In:
Computers in Human Behavior, 76 (2017), pp. 683-692
DOI:
10.1016/j.chb.2017.02.020
URN:
urn:nbn:de:0111-dipfdocs-176056
URL:
http://www.dipfdocs.de/volltexte/2019/17605/pdf/Engelhardt_et_al._2017_ManuscriptAccepted_A.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Leistungstest; Leistungsmessung; Medienkompetenz; Computerunterstütztes Verfahren; Validität; Testaufgabe; Testkonstruktion; Anpassung; Strategie; Veränderung; Testmethodik; Testtheorie
Abstract (English):
Computer-based assessments open up new possibilities to measure constructs in authentic settings. They are especially promising for measuring 21st century skills, such as information and communication technology (ICT) skills. Items tapping such constructs may be diverse regarding design principles and content and thus form a heterogeneous item set. Existing validation approaches, such as the construct representation approach by Embretson (1983), however, require homogeneous item sets in the sense that a particular task characteristic can be applied to all items. To apply this validation rationale to heterogeneous item sets as well, two experimental approaches are proposed, based on the idea of creating variants of items by systematically manipulating task characteristics. The change approach investigates whether the manipulation affects construct-related demands, and the eliminate approach whether the test score represents the targeted skill dimension. Both approaches were applied within an empirical study (N = 983) using heterogeneous items from an ICT skills test. The results show how changes of ICT-specific task characteristics influenced item difficulty without changing the represented construct. Additionally, eliminating the intended skill dimension led to easier items and partly changed the construct. Overall, the suggested experimental approaches provide a useful validation tool for 21st century skills assessed by heterogeneous items. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
Author(s):
Goldhammer, Frank; Steinwascher, Merle A.; Kroehne, Ulf; Naumann, Johannes
Title:
Modeling individual response time effects between and within experimental speed conditions. A GLMM approach for speeded tests
In:
British Journal of Mathematical and Statistical Psychology, 70 (2017) 2, pp. 238-256
DOI:
10.1111/bmsp.12099
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Test; Testkonstruktion; Antwort; Dauer; Unterschied; Messverfahren; Entscheidung; Einflussfaktor; Fehler; Modell; Vergleich
Abstract:
Completing test items under multiple speed conditions avoids the performance measure being confounded with individual differences in the speed-accuracy compromise, and offers insights into the response process, that is, how response time relates to the probability of a correct response. This relation is traditionally represented by two conceptually different functions: the speed-accuracy trade-off function (SATF) across conditions relating the condition average response time to the condition average of accuracy, and the conditional accuracy function (CAF) within a condition describing accuracy conditional on response time. Using a generalized linear mixed modelling approach, we propose an item response modelling framework that is suitable for item response and response time data from experimental speed conditions. The proposed SATF and CAF model accommodates response time effects between conditions (i.e., person and item SATF slope) and within conditions (i.e., residual CAF slopes), captures person and item differences in these effects, and is suitable for measures with a strong speed component. Moreover, for a single condition a CAF model is proposed distinguishing person, item and residual CAF. The properties of the models are illustrated with an empirical example. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
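For orientation, the within-condition part of such a model (the residual CAF slope with person and item deviations) might be written as a logistic GLMM along the following lines. This is an editor's sketch with assumed notation, not the authors' exact specification:

\operatorname{logit} P(Y_{pic} = 1)
  = (\theta_p - \beta_i + \gamma_c)
  + (\lambda_0 + \lambda_p + \lambda_i)\,\tilde{t}_{pic}

Here Y_{pic} is the scored response of person p to item i in speed condition c, \tilde{t}_{pic} the within-condition centered (log) response time, and the \lambda terms the residual CAF slope with person- and item-specific deviations; the SATF is carried by the condition-level effects across c.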
Author(s):
Köhler, Carmen; Hartig, Johannes
Title:
Practical significance of item misfit in educational assessments
In:
Applied Psychological Measurement, 41 (2017) 5, pp. 388-400
DOI:
10.1177/0146621617692978
URN:
urn:nbn:de:0111-pedocs-156084
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156084
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Korrelation; Leistungsmessung; Rasch-Modell; Schülerleistung; Schülerleistungstest; Testkonstruktion; Testtheorie; Validität
Abstract:
Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. In the literature, numerous item fit statistics exist, sometimes resulting in contradictory conclusions regarding which items should be excluded from the test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify the potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation informs about whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
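To make "practical significance" concrete, here is a hedged empirical analogue of the idea: compare a relationship estimate computed with and without the flagged items. Function and variable names are invented, sum scores stand in for IRT-based estimates, and the article's actual method quantifies the potential bias directly rather than by this rescoring.

import numpy as np

def misfit_impact(resp, criterion, misfit_idx):
    """Difference between the score-criterion correlation based on all
    items and the correlation after removing the misfitting items;
    resp is a persons x items 0/1 matrix."""
    full = resp.sum(axis=1)
    reduced = np.delete(resp, misfit_idx, axis=1).sum(axis=1)
    r_full = np.corrcoef(full, criterion)[0, 1]
    r_reduced = np.corrcoef(reduced, criterion)[0, 1]
    return r_full - r_reduced

A deviation near zero would suggest that the misfit, however statistically significant, barely matters for the substantive outcome.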
Author(s):
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Title:
Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
In:
International Journal of Testing, 17 (2017) 4, pp. 277-301
DOI:
10.1080/15305058.2017.1291517
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Arithmetik; Diagnostik; Empirische Untersuchung; Item-Response-Theory; Leistungsmessung; Mathematische Kompetenz; Modell; Regressionsanalyse; Reliabilität; Schülerleistung; Schülerleistungstest; Schuljahr 04; Testkonstruktion; Validität
Abstract (English):
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2,032 fourth-grade students in Germany, we evaluated whether the multidimensional proficiency scores from the best-fitting DCM have an added value, over and above the unidimensional proficiency score from a simpler unidimensional IRT model, in explaining variance in two external criteria: (a) school grades in mathematics and (b) unidimensional proficiency scores from a standards-based large-scale assessment of mathematics. Results revealed high classification reliabilities as well as interpretable parameter estimates for items and students for the best-fitting DCM. However, while DCM scores were moderately correlated with both external criteria, only a negligible incremental validity of the multivariate attribute scores was found. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
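The incremental-validity question in the abstract above amounts to a hierarchical regression: does adding the multivariate attribute scores raise R^2 over the unidimensional score? A deliberately simplified OLS sketch follows; variable names are assumptions, and the paper's analyses are more elaborate than plain OLS.

import numpy as np

def r_squared(X, y):
    """R^2 of an OLS regression of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def incremental_validity(uni_score, dcm_attrs, criterion):
    """Delta R^2 of the DCM attribute scores over the unidimensional
    IRT score when predicting an external criterion such as math
    grades; a negligible delta mirrors the paper's finding."""
    base = r_squared(uni_score.reshape(-1, 1), criterion)
    full = r_squared(np.column_stack([uni_score, dcm_attrs]), criterion)
    return full - base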
Author(s):
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Title:
Absolute and relative measures of instructional sensitivity
In:
Journal of Educational and Behavioral Statistics, 42 (2017) 6, pp. 678-705
DOI:
10.3102/1076998617703649
URN:
urn:nbn:de:0111-pedocs-156029
URL:
http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Bewertung; DESI <Deutsch-Englisch-Schülerleistungen-International>; Deutschland; Englischunterricht; Item-Response-Theory; Leistungsmessung; Messverfahren; Schüler; Schülerleistung; Schuljahr 09; Sprachkompetenz; Test; Testkonstruktion; Testtheorie; Unterricht; Wirkung
Abstract:
Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity to capture effects of instruction, independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity to capture effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures, depending on the identification constraints. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
Author(s):
Naumann, Johannes; Goldhammer, Frank
Title:
Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' demands
In:
Learning and Individual Differences, 53 (2017), pp. 1-16
DOI:
10.1016/j.lindif.2016.10.002
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Digitale Medien; Hypertext; Internationaler Vergleich; Kognitive Prozesse; Leistungsmessung; Lesekompetenz; Lesen; Leseverstehen; Modell; OECD-Länder; PISA <Programme for International Student Assessment>; Problemlösen; Schülerleistung; Technologiebasiertes Testen; Testaufgabe; Testkonstruktion; Wirkung; Zeit
Abstract:
Time-on-task effects on response accuracy in digital reading tasks were examined using PISA 2009 data (N = 34,062, 19 countries/economies). As a baseline, task responses were explained by time on task, tasks' easiness, and persons' digital reading skill (Model 1). Model 2 added a quadratic time-on-task effect, persons' comprehension skill, and tasks' navigation demands as predictors. In each country, linear and quadratic time-on-task effects were moderated by person and task characteristics. Strongly positive linear time-on-task effects were found for poor digital readers (Model 1) and poor comprehenders (Model 2), and these effects decreased with increasing skill. Positive linear time-on-task effects were found for hard tasks (Model 1) and tasks high in navigation demands (Model 2). For easy tasks and tasks low in navigation demands, the time-on-task effects were negative or close to zero, respectively. The negative quadratic component of the time-on-task effect was more pronounced for strong comprehenders, while the linear component was weaker. Correspondingly, for tasks high in navigation demands the negative quadratic component of the time-on-task effect was weaker, and the linear component was stronger. These results are in line with a dual-processing account of digital reading that distinguishes automatic reading components from resource-demanding regulation and navigation processes. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
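Schematically, Model 2's quadratic time-on-task effect with person- and task-specific moderation of the linear term could be written as follows; this is an editor's sketch, and the paper's random-effects structure is richer:

\operatorname{logit} P(Y_{pt} = 1)
  = \beta_0 + \theta_p + \delta_t
  + (\beta_1 + u_p + v_t)\, T_{pt}
  + \beta_2\, T_{pt}^{2}

Here Y_{pt} is the response of person p to task t, T_{pt} the time on task, \theta_p and \delta_t person skill and task easiness, and u_p, v_t the skill- and demand-related moderation of the linear effect reported above.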
Author(s):
Woerner, Wolfgang; Müller, Christian; Hasselhorn, Marcus
Title:
Bedeutung und Berechnung der Prozentränge und T-Werte beim Erstellen von Testnormen. Anmerkungen und Empfehlungen
In:
Trautwein, Ulrich; Hasselhorn, Marcus (Eds.): Begabungen und Talente, Göttingen: Hogrefe, 2017 (Test und Trends. N. F., 15), pp. 245-263
Publication Type:
4. Contributions in edited volumes; edited volume (no special category)
Language:
German
Keywords:
Pädagogische Diagnostik; Begabtenauslese; Leistungstest; Testkonstruktion; Testmethodik; Qualität; Testauswertung; SPSS; Stichprobe; Testverfahren; Testtheorie
Abstract:
The usefulness and scientific value of an instrument for educational-psychological diagnostics presuppose, beyond evidence that the relevant quality criteria are adequately met and sufficiently detailed documentation of the methods used, that suitable norm values are available. Given the central role of the norming process, it is surprising that even currently used (scholastic) achievement tests show a regrettable heterogeneity in how norm values are computed, at times with considerable consequences for decisions in individual diagnostics. Standard textbooks describe various alternative methods without, however, offering concrete recommendations on their use. To close this gap, this chapter discusses in detail the meaning and computation of percentile ranks and of the standard-norm equivalents built on them. In particular, the difference between cumulative percentages and the interval-midpoint percentile rank (IM-PR), which is emphatically recommended here, is explained. To make the computation of IM-PR values easier for future test developers, model SPSS syntax is provided in the appendix, in the hope that this will lead to a uniform basis for computing the norm values of psychodiagnostic instruments in the future. (DIPF/Orig.)
DIPF-Departments:
Bildung und Entwicklung
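The interval-midpoint percentile rank (IM-PR) recommended in the abstract above has a compact definition: the percentage of the norm sample strictly below a raw score plus half the percentage exactly at it, which is then mapped to T scores via the normal quantile function. The chapter itself provides SPSS syntax; the following Python transcription of that logic is the editor's sketch.

from statistics import NormalDist

def im_percentile_ranks(raw_scores):
    """IM-PR for each observed raw score: percentage of the sample
    strictly below the score plus half the percentage exactly at it."""
    n = len(raw_scores)
    pr = {}
    for x in set(raw_scores):
        below = sum(1 for s in raw_scores if s < x)
        at = raw_scores.count(x)
        pr[x] = 100.0 * (below + 0.5 * at) / n
    return pr

def t_norm(pr_percent):
    """T-score equivalent (mean 50, SD 10) of a percentile rank."""
    return 50.0 + 10.0 * NormalDist().inv_cdf(pr_percent / 100.0)

pr = im_percentile_ranks([1, 2, 2, 3, 4])
print(pr[2], round(t_norm(pr[2]), 1))  # -> 40.0 47.5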
Editor(s):
Reiss, Kristina; Sälzer, Christine; Schiepe-Tiska, Anja; Klieme, Eckhard; Köller, Olaf
Title:
PISA 2015. Eine Studie zwischen Kontinuität und Innovation
Published:
Münster: Waxmann, 2016
URL:
https://www.waxmann.com/fileadmin/media/zusatztexte/3555Volltext.pdf
Publication Type:
2. Editorship; edited volume (no special category)
Language:
German
Keywords:
Deutschland; Einstellung <Psy>; Eltern; Empirische Untersuchung; Entdeckendes Lernen; Forschendes Lernen; Fragebogen; Freude; Geschlechtsspezifischer Unterschied; Interesse; Internationale Organisation; Internationaler Vergleich; Jugendlicher; Kompetenzerwerb; Konzeption; Leistungsmessung; Lernbedingungen; Lernumgebung; Lesekompetenz; Mathematische Kompetenz; Migrationshintergrund; Motivation; Naturwissenschaftliche Kompetenz; Naturwissenschaftlicher Unterricht; OECD-Länder; Organisation; PISA <Programme for International Student Assessment>; Qualität; Querschnittuntersuchung; Reliabilität; Schulentwicklung; Schülerleistung; Schülerleistungstest; Schulform; Schulklima; Sekundarbereich; Selbstwirksamkeit; Skalierung; Soziale Herkunft; Stichprobe; Technologiebasiertes Testen; Teilnehmer; Testaufgabe; Testauswertung; Testdurchführung; Testkonstruktion; Testmethodik; Überzeugung; Validität; Veränderung; Wahrnehmung
Abstract:
Every three years, PISA tests the basic competencies of fifteen-year-old students in science, mathematics, and reading, thereby examining the strengths and weaknesses of education systems in comparison across the OECD countries. The central question is to what extent the participating countries succeed in preparing their students, during compulsory schooling, for their further educational and vocational paths. The national report presents the results from PISA 2015 achieved by students in Germany and relates them to the results of other OECD countries. The assessments and analyses focus on science. As the sixth round of the OECD's Programme for International Student Assessment, PISA 2015 marks both the conclusion of the study's second cycle and the beginning of computer-based testing. While maintaining essential standards of data collection and analysis, PISA 2015 introduced several innovations: computer-based administration, a more differentiated scaling model, and an extended test design. These reflect changes in students' learning environments and everyday lives and will improve the informative value of the PISA studies in the long run. With a view to this balance between continuity and innovation, this volume contextualizes and discusses the findings from PISA 2015. (DIPF/Verlag)
DIPF-Departments:
Bildungsqualität und Evaluation