-
-
Authors: Fährmann, Katharina; Köhler, Carmen; Hartig, Johannes; Heine, Jörg-Henrik
Title: Practical significance of item misfit and its manifestations in constructs assessed in large-scale studies
In: Large-scale Assessments in Education, 10 (2022), Art. 7
DOI: 10.1186/s40536-022-00124-w
URL: https://largescaleassessmentsineducation.springeropen.com/articles/10.1186/s40536-022-00124-w
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Abstract (English): When scaling psychological tests with methods of item response theory, it is necessary to investigate to what extent the responses correspond to the model predictions. In addition to the statistical evaluation of item misfit, the question arises as to its practical significance. Although item removal is undesirable for several reasons, its practical consequences are rarely investigated, and existing studies focus mostly on main-survey data with pre-selected items. In this paper, we identify criteria to evaluate practical significance and discuss them with respect to various types of assessments and their particular purposes. We then demonstrate the practical consequences of item misfit using two data examples from the German PISA 2018 field trial study: one with cognitive data and one with non-cognitive/metacognitive data. For the former, we scale the data under the GPCM with and without the inclusion of misfitting items and investigate how this influences the trait distribution and the allocation to reading competency levels. For the non-cognitive/metacognitive data, we explore the effect of excluding misfitting items on estimated gender differences. Our results indicate minor practical consequences for person allocation and no changes in the estimated gender-difference effects. (DIPF/Orig.)
DIPF Department: Lehr- und Lernqualität in Bildungseinrichtungen
-
-
Authors: Goldhammer, Frank; Hahnel, Carolin; Kroehne, Ulf; Zehner, Fabian
Title: From byproduct to design factor. On validating the interpretation of process indicators based on log data
In: Large-scale Assessments in Education, 9 (2021), Art. 20
DOI: 10.1186/s40536-021-00113-5
URN: urn:nbn:de:0111-pedocs-250050
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-250050
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Keywords: Achievement test; log file; PISA <Programme for International Student Assessment>; PIAAC <Programme for the International Assessment of Adult Competencies>; data analysis; interpretation; performance measurement; measurement method; indicator; typology; test construction; test theory
Abstract (English): International large-scale assessments such as PISA or PIAAC have started to provide public or scientific use files for log data; that is, events, event-related attributes and timestamps of test-takers' interactions with the assessment system. Log data and the process indicators derived from it can be used for many purposes. However, the intended uses and interpretations of process indicators require validation, which here means a theoretical and/or empirical justification that inferences about (latent) attributes of the test-taker's work process are valid. This article reviews and synthesizes measurement concepts from various areas, including the standard assessment paradigm, the continuous assessment approach, the evidence-centered design (ECD) framework, and test validation. Based on this synthesis, we address the questions of how to ensure the valid interpretation of process indicators by means of an evidence-centered design of the task situation, and how to empirically challenge the intended interpretation of process indicators by developing and implementing correlational and/or experimental validation strategies. For this purpose, we explicate the process of reasoning from log data to low-level features and process indicators as the outcome of evidence identification. In this process, contextualizing information from log data is essential in order to reduce interpretative ambiguities regarding the derived process indicators. Finally, we show that empirical validation strategies can be adapted from classical approaches investigating the nomothetic span and construct representation. Two worked examples illustrate possible validation strategies for the design phase of measurements and their empirical evaluation. (DIPF/Orig.)
DIPF Department: Lehr- und Lernqualität in Bildungseinrichtungen
-
-
Authors: Goldhammer, Frank; Kroehne, Ulf; Hahnel, Carolin; De Boeck, Paul
Title: Controlling speed in component skills of reading improves the explanation of reading comprehension
In: Journal of Educational Psychology, 113 (2021) 5, pp. 861-878
DOI: 10.1037/edu0000655
URN: urn:nbn:de:0111-pedocs-237977
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-237977
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Keywords: Reading competence; skill; cognitive processes; performance; response; time; word; semantics; text; reading comprehension; PISA <Programme for International Student Assessment>; students; measurement method; test; experimental study; empirical study; Germany
Abstract (English): Efficiency in reading component skills is crucial for reading comprehension, as efficient subprocesses do not extensively consume limited cognitive resources, making them available for comprehension processes. Cognitive efficiency is typically measured with speeded tests of relatively easy items. Observed responses and response times indicate the latent variables of ability and speed. Interpreting only ability or speed as efficiency may be misleading because there is a within-person dependency between both variables (speed-ability tradeoff [SAT]). Therefore, the present study measures efficiency as ability conditional on speed by controlling speed experimentally with item-level time limits. The proposed timed ability measures of reading component skills are expected to have a clearer interpretation in terms of efficiency and to be better predictors for reading comprehension. To support this claim, this study investigates two component skills, visual word recognition and sentence-level semantic integration (sentence reading), to understand how differences in ability in a timed condition are related to differences in ability and speed in a traditional untimed condition. Moreover, untimed and timed reading component skill measures were used to explain reading comprehension. A German subsample from Programme for International Student Assessment (PISA) 2012 completed the reading component skills tasks with and without item-level time limits and PISA reading tasks. The results showed that timed ability is only moderately related to untimed ability. Furthermore, timed ability measures proved to be stronger predictors of sentence-level and text-level reading comprehension than the corresponding untimed ability and speed measures, although using untimed ability and speed jointly as predictors increased the amount of explained variance.
DIPF Department: Lehr- und Lernqualität in Bildungseinrichtungen
-
-
Authors: Weis, Mirjam; Reiss, Kristina; Mang, Julia; Schiepe-Tiska, Anja; Diedrich, Jennifer; Roczen, Nina; Jude, Nina
Title: Global competence in PISA 2018. Einstellungen von Fünfzehnjährigen in Deutschland zu globalen und interkulturellen Themen [Attitudes of fifteen-year-olds in Germany towards global and intercultural topics]
Published: Münster: Waxmann, 2020 (Wissenschaft macht Schule, 2)
DOI: 10.31244/978383099300
URN: urn:nbn:de:0111-pedocs-210696
URL: https://www.pedocs.de/frontdoor.php?source_opus=21069
Document type: 1. Monographs (authorship); monograph
Language: German
Keywords: Germany; attitude <psych.>; parents; questionnaire survey; global thinking; global learning; globalization; interculturality; intercultural competence; international comparison; adolescents; teachers; PISA <Programme for International Student Assessment>; students; student perspective; school type; school principals; self-assessment
Abstract: In the PISA 2018 study, Global Competence was assessed for the first time as an innovative domain among fifteen-year-old students. This supplementary assessment examines students' self-assessed knowledge of topics of local and global significance (e.g., climate change, poverty, pandemics) as well as their attitudes towards global and intercultural issues, such as the respectful treatment of people of different national origins and correspondingly different ethnic, religious, social, or cultural backgrounds. This brochure presents the results for students in Germany from the supplementary Global Competence assessment of the PISA 2018 study and places them in an international comparison. In addition, it includes the perspectives of school principals and teachers at the various school types as well as the perspective of parents. (DIPF/Orig.)
DIPF Department: Bildungsqualität und Evaluation
-
-
Authors: Aditomo, Anindito; Köhler, Carmen
Title: Do student ratings provide reliable and valid information about teaching quality at the school level? Evaluating measures of science teaching in PISA 2015
In: Educational Assessment, Evaluation and Accountability, 32 (2020) 3, pp. 275-310
DOI: 10.1007/s11092-020-09328-6
URL: https://link.springer.com/article/10.1007/s11092-020-09328-6
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Keywords: Science teaching; quality; measurement; evaluation; students; assessment; student ratings; validity; reliability; school climate; influencing factor; effect; PISA <Programme for International Student Assessment>; model; factor analysis; empirical study; OECD countries
Abstract: Large-scale educational surveys, including PISA, often collect student ratings to assess teaching quality. Because of the sampling design in PISA, student ratings must be aggregated at the school level instead of the classroom level. To what extent does school-level aggregation of student ratings yield reliable and valid measures of teaching quality? We investigate this question for six PISA 2015 scales measuring classroom management, emotional support, inquiry-based instruction, teacher-directed instruction, adaptive instruction, and feedback. The sample consisted of 503,146 students from 17,678 schools in 69 countries/regions. Multilevel CFA and SEM were conducted for each scale in each country/region to evaluate school-level reliability (intraclass correlations 1 and 2), factorial validity, and predictive validity. In most countries/regions, school-level reliability was found to be adequate for the classroom management scale, but only low to moderate for the other scales. Examination of factorial and predictive validity indicated that the classroom management, emotional support, adaptive instruction, and teacher-directed instruction scales capture meaningful differences in teaching quality between schools. Meanwhile, the inquiry scale exhibited poor validity in almost all countries/regions. These findings suggest the possibility of using student ratings in PISA to investigate some aspects of school-level teaching quality in most countries/regions. (DIPF/Orig.)
DIPF Department: Bildungsqualität und Evaluation
-
-
Authors: Buchholz, Janine; Hartig, Johannes
Title: Measurement invariance testing in questionnaires. A comparison of three Multigroup-CFA and IRT-based approaches
In: Psychological Test and Assessment Modelling, 62 (2020) 1, pp. 29-54
URL: https://www.psychologie-aktuell.com/fileadmin/Redaktion/Journale/ptam-2020-1/03_Buchholz.pdf
Document type: 3a. Contributions in peer-reviewed journals; contribution in a special issue
Language: English
Keywords: PISA <Programme for International Student Assessment>; item response theory; factor analysis; student achievement; performance measurement; measurement; invariance; validity; statistical method
Abstract (English): International Large-Scale Assessments aim at comparisons of countries with respect to latent constructs such as attitudes, values and beliefs. Measurement invariance (MI) needs to hold in order for such comparisons to be valid. Several statistical approaches to test for MI have been proposed: While Multigroup Confirmatory Factor Analysis (MGCFA) is particularly popular, a newer, IRT-based approach was introduced for non-cognitive constructs in PISA 2015, thus raising the question of consistency between these approaches. A total of three approaches (MGCFA for ordinal and continuous data, multi-group IRT) were applied to simulated data containing different types and extents of MI violations, and to the empirical non-cognitive PISA 2015 data. Analyses are based on indices of the magnitude (i.e., parameter-specific modification indices resulting from MGCFA and group-specific item fit statistics resulting from the IRT approach) and direction of local misfit (i.e., standardized parameter change and mean deviation, respectively). Results indicate that all measures were sensitive to (some) MI violations and more consistent in identifying group differences in item difficulty parameters.
DIPF Department: Bildungsqualität und Evaluation
-
-
Authors: Eichmann, Beate; Goldhammer, Frank; Greiff, Samuel; Brandhuber, Liene; Naumann, Johannes
Title: Using process data to explain group differences in complex problem solving
In: Journal of Educational Psychology, 112 (2020) 8, pp. 1546-1562
DOI: 10.1037/edu0000446
URN: urn:nbn:de:0111-pedocs-232721
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-232721
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Keywords: PISA <Programme for International Student Assessment>; problem solving; student achievement; performance measurement; gender-specific difference; migration background; computer-based method; log file; interaction; exploration; behaviour; prior knowledge; effect; indicator; performance; difference; measurement method; OECD countries
Abstract: In large-scale assessments, performance differences across different groups are regularly found. These group differences (e.g., gender differences) are often relevant for educational policy decisions and measures. However, the formation of these group differences usually remains unclear. We propose an approach for investigating this formation by considering behavioral process measures as mediating variables between group membership and performance on the 2012 Programme for International Student Assessment complex problem solving (CPS) items. We found that across all investigated countries interactive behavior can fully explain gender differences in CPS, but cannot explain differences between students with and without a migration background. However, in some countries these results differ from the cross-country results. Our results indicate that process measures derived from log data are useful for further investigating and explaining performance differences between girls and boys and between students with and without a migration background. (DIPF/Orig.)
DIPF Department: Bildungsqualität und Evaluation
-
-
Authors: Eichmann, Beate; Greiff, Samuel; Naumann, Johannes; Brandhuber, Liene; Goldhammer, Frank
Title: Exploring behavioural patterns during complex problem-solving
In: Journal of Computer Assisted Learning, 36 (2020) 6, pp. 933-956
DOI: 10.1111/jcal.12451
URN: urn:nbn:de:0111-pedocs-232225
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-232225
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Keywords: Problem solving; exploration; data analysis; PISA <Programme for International Student Assessment>; behavioural patterns; sequence; analysis
Abstract: In this explorative study, we investigate how sequences of behaviour are related to success or failure in complex problem-solving (CPS). To this end, we analysed log data from two different tasks of the problem-solving assessment of the Programme for International Student Assessment 2012 study (n = 30,098 students). We first coded every interaction of students as (initial or repeated) exploration, (initial or repeated) goal-directed behaviour, or resetting the task. We then split the data according to task successes and failures. We used full-path sequence analysis to identify groups of students with similar behavioural patterns in the respective tasks. Double-checking and minimalistic behaviour was associated with success in CPS, while guessing and exploring task-irrelevant content was associated with failure. Our findings held for both investigated tasks, which stem from two different CPS measurement frameworks. We thus gained detailed insight into the behavioural processes that are related to success and failure in CPS. (DIPF/Orig.)
DIPF Department: Bildungsqualität und Evaluation
-
-
Authors: Fischer, Jessica; He, Jia; Klieme, Eckhard
Title: The structure of teaching practices across countries. A combination of factor analysis and network analysis
In: Studies in Educational Evaluation, 65 (2020), Art. 100861
DOI: 10.1016/j.stueduc.2020.100861
URN: urn:nbn:de:0111-pedocs-203522
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-203522
Document type: 3a. Contributions in peer-reviewed journals; article (no special category)
Language: English
Keywords: Teaching practice; student achievement; performance assessment; factor analysis; network analysis; data analysis; secondary analysis; PISA <Programme for International Student Assessment>; international comparison; cross-cultural comparison
Abstract (English): Teaching practices are pivotal for student learning. Due to pedagogical traditions and national cultures, the structure of teaching practices may differ across countries. This study investigates the structure of teaching practices across 12 countries grouped into four major linguistic/cultural clusters. First, factor analysis is applied to investigate whether the theoretical distinction between teacher-directed and student-centred practices is generalizable across countries. Then, network analysis is used to explore how individual classroom assessment practices relate to either teacher-directed or student-centred practices. The main findings are that: (1) teacher-directed and student-centred practices are two distinct factors across countries; (2) the overall structure and connectivity of teaching practices differ across countries, with smaller differences within linguistic/cultural clusters; and (3) assessment practices that aim to structure and guide learning relate strongly to teacher-directed practices, whereas assessment practices that aim to individualize instruction relate more to student-centred practices. We discuss the global patterning and implications.
DIPF Department: Bildungsqualität und Evaluation
-
-
Authors: He, Jia; Fischer, Jessica
Title: Differential associations of school practices with achievement and sense of belonging of immigrant and non-immigrant students
In: Journal of Applied Developmental Psychology, 67 (2020), Art. 101089
DOI: 10.1016/j.appdev.2019.101089
URL: https://www.sciencedirect.com/science/article/pii/S0193397319301078
Document type: 3a. Contributions in peer-reviewed journals; contribution in a special issue
Language: English
Keywords: Germany; Italy; Spain; students; immigrants; learning outcomes; school integration; influencing factor; homogeneous grouping; grading; school practice; extracurricular activities; multilevel analysis; data analysis; PISA <Programme for International Student Assessment>
Abstract (English): We are interested in identifying "malleable" school and classroom practices to enhance immigrant students' learning. Using PISA 2015 data from Germany, Italy, and Spain, we test the differential associations of school-level practices with achievement and sense of belonging at school for students with and without an immigrant background. We found that (1) in-school ability grouping was invariably negatively related to achievement of both student groups, and the effects were stronger for immigrant than non-immigrant students; (2) grading based on "hard" factors was not related to achievement, but it showed differential associations with sense of belonging in Germany; and (3) grading based on "soft" factors and provision of extracurricular activities also showed mixed associations with the outcomes across countries and did not fulfil the potential to enhance immigrant students' outcomes. We discuss these findings and implications. (DIPF/Orig.)
DIPF Department: Bildungsqualität und Evaluation