Search results in the DIPF database of publications
Your query: (Keywords: "Leistungsmessung")
228 items matching your search terms.
Author(s):
Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H.
Title:
Dealing with item nonresponse in large-scale cognitive assessments. The impact of missing data methods on estimated explanatory relationships
In:
Journal of Educational Measurement, 54 (2017) 4, pp. 397-419
DOI:
10.1111/jedm.12154
URN:
urn:nbn:de:0111-dipfdocs-174619
URL:
http://www.dipfdocs.de/volltexte/2019/17461/pdf/KoehlerPohlCarstensen2017_A.pdf
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Data analysis; Student achievement test; Achievement measurement; Missing data; Questionnaire; Response; Effect; Measurement procedure; PISA <Programme for International Student Assessment>; Panel; Reading competency; Regression; Simulation
Abstract:
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical approaches such as ignoring missing values or treating them as incorrect are currently applied in many large-scale studies, while recent model-based approaches that can account for nonignorable nonresponse have been developed. Estimates of item and person parameters have been demonstrated to be biased for classical approaches when missing data are missing not at random (MNAR). In our study, we focus on parameter estimates of the structural model (i.e., the true regression coefficient when regressing competence on an explanatory variable), simulating data according to various missing data mechanisms. We found that model-based approaches and ignoring missing values performed well in retrieving regression coefficients even when we induced missing data that were MNAR. Treating missing values as incorrect responses can lead to substantial bias. We demonstrate the validity of our approach empirically and discuss the relevance of our results. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
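The comparison of missing-data treatments described in this abstract can be made concrete with a small simulation. The following numpy sketch is only illustrative: it scores competence with simple sum-score proxies rather than the IRT-based estimators used in the study, and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 5000, 30
true_beta = 0.5  # structural coefficient of competence on x

# Explanatory variable and latent competence (both standardized).
x = rng.normal(size=n_persons)
theta = true_beta * x + rng.normal(scale=np.sqrt(1 - true_beta**2), size=n_persons)

# Rasch-type item responses.
b = rng.uniform(-1.5, 1.5, size=n_items)
p_correct = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
resp = (rng.uniform(size=(n_persons, n_items)) < p_correct).astype(float)

# MNAR mechanism: lower-competence persons omit items more often.
p_omit = 1 / (1 + np.exp(2.5 + theta))
observed = rng.uniform(size=(n_persons, n_items)) >= p_omit[:, None]

def std_slope(score):
    """OLS slope of the standardized score regressed on x."""
    z = (score - score.mean()) / score.std()
    return np.polyfit(x, z, 1)[0]

# Treatment 1: missing responses scored as incorrect.
score_incorrect = np.where(observed, resp, 0.0).mean(axis=1)
# Treatment 2: missing responses ignored (proportion correct of answered items).
n_answered = observed.sum(axis=1)
score_ignored = np.where(observed, resp, 0.0).sum(axis=1) / np.maximum(n_answered, 1)

print(f"true structural coefficient: {true_beta:.3f}")
print(f"slope, missing as incorrect: {std_slope(score_incorrect):.3f}")
print(f"slope, missing ignored:      {std_slope(score_ignored):.3f}")
```

Under an MNAR mechanism of this kind, the two classical treatments yield visibly different slope estimates, which is the contrast the study examines with proper model-based estimators.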
Author(s):
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Title:
Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
In:
International Journal of Testing, 17 (2017) 4, pp. 277-301
DOI:
10.1080/15305058.2017.1291517
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Arithmetic; Diagnostics; Empirical study; Item response theory; Achievement measurement; Mathematical competency; Model; Regression analysis; Reliability; Student achievement; Student achievement test; Grade 4; Test construction; Validity
Abstract (english):
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2,032 fourth-grade students in Germany, we evaluated whether the multidimensional proficiency scores from the best-fitting DCM have an added value, over and above the unidimensional proficiency score from a simpler unidimensional IRT model, in explaining variance in external (a) school grades in mathematics and (b) unidimensional proficiency scores from a standards-based large-scale assessment of mathematics. Results revealed high classification reliabilities as well as interpretable parameter estimates for items and students for the best-fitting DCM. However, while DCM scores were moderately correlated with both external criteria, only a negligible incremental validity of the multivariate attribute scores was found. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
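The incremental-validity question reduces to comparing nested regressions. Below is a minimal Python sketch of that delta-R^2 logic; the unidimensional score, the four attribute scores, and the external criterion are simulated stand-ins (the study obtains them from fitted measurement models and real grades).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2032  # sample size borrowed from the study; all data here are simulated

# Stand-ins: a unidimensional proficiency score, four correlated attribute
# scores, and an external criterion (e.g., a math grade).
g = rng.normal(size=n)
attrs = 0.8 * g[:, None] + 0.6 * rng.normal(size=(n, 4))
criterion = 0.6 * g + 0.1 * attrs[:, 0] + rng.normal(scale=0.7, size=n)

def r_squared(X, y):
    """R^2 of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return 1 - (y - X1 @ beta).var() / y.var()

r2_uni = r_squared(g[:, None], criterion)
r2_full = r_squared(np.column_stack([g[:, None], attrs]), criterion)
print(f"R^2, unidimensional score only:   {r2_uni:.3f}")
print(f"R^2, plus attribute scores:       {r2_full:.3f}")
print(f"incremental validity (delta R^2): {r2_full - r2_uni:.3f}")
```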
Author(s):
Müller-Kalthoff, Hanno; Jansen, Malte; Schiefer, Irene; Helm, Friederike; Nagy, Nicole; Möller, Jens
Title:
A double-edged sword? On the benefit, detriment, and net effect of dimensional comparison on self-concept
In:
Journal of Educational Psychology, 109 (2017) 7, pp. 1029-1047
DOI:
10.1037/edu0000171
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Survey; German language teaching; Empirical study; Experimental study; Feedback; Field studies; Student teacher; Achievement assessment; Achievement measurement; Mathematics teaching; Student; Student achievement; Grade 6; Grade 9; School grades; Lower secondary education; Self-concept; Comparison; Vignette <method>; Effect
Abstract:
Dimensional comparison theory (DCT; Möller & Marsh, 2013) assumes that students compare their academic achievement intraindividually across domains to form domain-specific self-concepts. Upward dimensional comparisons are believed to lead to lower self-concepts in the worse-off domain, while downward dimensional comparisons should lead to higher self-concepts in the better-off domain. Furthermore, DCT assumes the net effect of upward and downward dimensional comparisons to be beneficial to the self. To test these assumptions, 3 experiments and 2 field studies were conducted investigating the relative effects of upward and downward dimensional comparisons as well as their net effect. In Studies 1 (N = 149), 2 (N = 150) and 3 (N = 300), participants were asked to infer self-concepts of fictitious students after receiving experimentally manipulated information about their achievements in 2 domains, whereas participants in Studies 4 (N = 2,268) and 5 (N = 20,662) assessed their own self-concepts in German and mathematics. In all studies, downward dimensional comparisons resulted in higher self-concepts, whereas upward dimensional comparisons led to lower self-concepts. The net effect of dimensional comparisons was always found to be not statistically different from zero. The findings therefore support the central prediction of DCT on the discreteness of the effects of upward and downward dimensional comparisons, yet do not support the assumed positivity of their net effect. Furthermore, results indicate the effect patterns to be rather universal as they were stable across different samples, domains, achievement situations, research designs, and types of assessment. (DIPF/Orig.)
DIPF-Departments:
Struktur und Steuerung des Bildungswesens
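The positivity-of-the-net-effect hypothesis can be expressed in a toy model: if upward and downward dimensional comparisons are equally strong, their effects on total self-concept cancel. The sketch below encodes exactly that; the parameters d and u and the linear mechanism are hypothetical simplifications, not the authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Standardized achievement in two domains.
ach_math, ach_verbal = rng.normal(size=(2, n))

# Toy DCT mechanism: self-concept in the better-off domain is raised by a
# downward comparison (+d * gap), in the worse-off domain lowered by an
# upward comparison (-u * gap).
d, u = 0.15, 0.15  # equal strengths, so a net effect of zero is expected
gap = np.abs(ach_math - ach_verbal)
math_better = ach_math > ach_verbal

sc_math = ach_math + np.where(math_better, d * gap, -u * gap)
sc_verbal = ach_verbal + np.where(math_better, -u * gap, d * gap)

# Net effect: total self-concept with comparisons minus the baseline without.
net = (sc_math + sc_verbal) - (ach_math + ach_verbal)
print(f"mean net effect on total self-concept: {net.mean():+.4f}")
```

Setting d > u in this toy model would produce the positive net effect DCT predicts; the study's finding corresponds to the d = u case.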
Author(s):
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Title:
Absolute and relative measures of instructional sensitivity
In:
Journal of Educational and Behavioral Statistics, 42 (2017) 6, pp. 678-705
DOI:
10.3102/1076998617703649
URN:
urn:nbn:de:0111-pedocs-156029
URL:
http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Assessment; DESI <Deutsch-Englisch-Schülerleistungen-International>; Germany; English language teaching; Item response theory; Achievement measurement; Measurement procedure; Student; Student achievement; Grade 9; Language competency; Test; Test construction; Test theory; Teaching; Effect
Abstract:
Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity of capturing effects of instruction, which is independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity of capturing effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures depending on the identification constraints. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
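The absolute/relative distinction can be illustrated with a toy computation: treat each item's pre-to-post instruction change (in logits) as its absolute sensitivity, and its change relative to the test average as its relative sensitivity. This is a conceptual sketch only; the paper estimates these quantities within the proposed longitudinal multilevel IRT model, and all numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n_items = 8

# Invented pre- and post-instruction proportions correct per item.
p_pre = rng.uniform(0.3, 0.6, n_items)
p_post = np.clip(p_pre + rng.uniform(0.0, 0.3, n_items), None, 0.95)

def logit(p):
    return np.log(p / (1 - p))

shift = logit(p_post) - logit(p_pre)  # item-level pre-post change in logits

absolute = shift                  # total capacity to capture instruction
relative = shift - shift.mean()   # capacity relative to the whole test

print(f"test sensitivity (mean logit shift): {shift.mean():.2f}")
for i in range(n_items):
    print(f"item {i + 1}: absolute {absolute[i]:+.2f}, relative {relative[i]:+.2f}")
```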
Author(s):
Naumann, Johannes; Goldhammer, Frank
Title:
Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' demands
In:
Learning and Individual Differences, 53 (2017), pp. 1-16
DOI:
10.1016/j.lindif.2016.10.002
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Digital media; Hypertext; International comparison; Cognitive processes; Achievement measurement; Reading competency; Reading; Reading comprehension; Model; OECD countries; PISA <Programme for International Student Assessment>; Problem solving; Student achievement; Technology-based testing; Test item; Test construction; Effect; Time
Abstract:
Time-on-task effects on response accuracy in digital reading tasks were examined using PISA 2009 data (N = 34,062, 19 countries/economies). As a baseline, task responses were explained by time on task, tasks' easiness, and persons' digital reading skill (Model 1). Model 2 added a quadratic time-on-task effect, persons' comprehension skill and tasks' navigation demands as predictors. In each country, linear and quadratic time-on-task effects were moderated by person and task characteristics. Strongly positive linear time-on-task effects were found for persons being poor digital readers (Model 1) and poor comprehenders (Model 2), which decreased with increasing skill. Positive linear time-on-task effects were found for hard tasks (Model 1) and tasks high in navigation demands (Model 2). For easy tasks and tasks low in navigation demands, the time-on-task effects were negative, or close to zero, respectively. A negative quadratic component of the time-on-task effect was more pronounced for strong comprehenders, while the linear component was weaker. Correspondingly, for tasks high in navigation demands the negative quadratic component to the time-on-task effect was weaker, and the linear component was stronger. These results are in line with a dual-processing account of digital reading that distinguishes automatic reading components from resource-demanding regulation and navigation processes. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
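A heavily simplified version of such a model can be sketched as a single-level logistic regression with a quadratic time term and time-by-moderator interactions; the study itself fits mixed models to responses nested within persons and countries. The data-generating coefficients below are invented to echo the reported pattern (positive linear time effects that shrink with skill, a negative quadratic component).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 20_000

skill = rng.normal(size=n)        # digital reading / comprehension skill
time = rng.normal(size=n)         # standardized (log) time on task
nav = rng.integers(0, 2, size=n)  # 0 = low, 1 = high navigation demands

# Invented data-generating model: the linear time effect is positive for
# low-skill readers and high-demand tasks; the quadratic term is negative.
eta = 0.3 + 0.8 * skill + (0.5 - 0.4 * skill + 0.5 * nav) * time - 0.2 * time**2
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(int)

df = pd.DataFrame({"y": y, "skill": skill, "time": time, "nav": nav})
fit = smf.logit("y ~ time + I(time**2) + skill + nav + time:skill + time:nav",
                df).fit(disp=0)
print(fit.params.round(3))
```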
Author(s):
Penk, Christiane; Richter, Dirk
Title:
Change in test-taking motivation and its relationship to test performance in low-stakes assessments
In:
Educational Assessment, Evaluation and Accountability, 29 (2017) 1, pp. 55-79
DOI:
10.1007/s11092-016-9248-7
URN:
urn:nbn:de:0111-pedocs-174284
URL:
http://nbn-resolving.org/urn:nbn:de:0111-pedocs-174284
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Data analysis; Germany; Success; Expectancy; Cognitive competency; Longitudinal study; Latent growth curve models; Achievement measurement; Mathematics; Motivation; Student; Student achievement test; Structural equation model; Test
Abstract (english):
Since the turn of the century, an increasing number of low-stakes assessments (i.e., assessments without direct consequences for the test-takers) are being used to evaluate the quality of educational systems. Internationally, research has shown that low-stakes test results can be biased due to students' low test-taking motivation and that students' effort levels can vary throughout a testing session involving both cognitive and noncognitive tests. Thus, it is possible that students' motivation varies throughout a single cognitive test and in turn affects test performance. This study examines the change in test-taking motivation within a 2-h cognitive low-stakes test and its association with test performance. Based on expectancy-value theory, we assessed three components of test-taking motivation (expectancy for success, value, and effort) and investigated their change. Using data from a large-scale student achievement study of German ninth-graders, we employed second-order latent growth modeling and structural equation modeling to predict test performance in mathematics. On average, students' effort and perceived value of the test decreased, whereas expectancy for success remained stable. Overall, initial test-taking motivation was a better predictor of test performance than change in motivation. Only the variability of change in the expectancy component was positively related to test performance. The theoretical and practical implications for test practitioners are discussed. (DIPF/Orig.)
DIPF-Departments:
Struktur und Steuerung des Bildungswesens
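A drastically simplified sketch of the growth-modeling step: manifest effort ratings are regressed on measurement occasion with student-specific intercepts and slopes. The study itself fits second-order latent growth models to latent motivation factors, so the mixed model below only illustrates the declining-effort pattern; all variable names and values are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_students, n_waves = 500, 4

# Effort ratings at several points in a long test session: effort declines
# on average, with student-specific starting levels and rates of change.
intercepts = rng.normal(3.0, 0.5, n_students)
slopes = rng.normal(-0.2, 0.1, n_students)
rows = [(i, t, intercepts[i] + slopes[i] * t + rng.normal(scale=0.3))
        for i in range(n_students) for t in range(n_waves)]
df = pd.DataFrame(rows, columns=["student", "wave", "effort"])

# Random-intercept, random-slope growth model for effort across the session.
model = smf.mixedlm("effort ~ wave", df, groups=df["student"], re_formula="~wave")
print(model.fit().summary())
```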
Author(s):
Ricken, Norbert; Reh, Sabine
Title:
Prüfungen - systematische Perspektiven der Geschichte einer pädagogischen Praxis. Einführung in den Thementeil
In:
Zeitschrift für Pädagogik, 63 (2017) 3, pp. 247-258
URN:
urn:nbn:de:0111-pedocs-185389
URL:
http://nbn-resolving.org/urn:nbn:de:0111-pedocs-185389
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
German
Keywords:
Examination; History of education; Achievement measurement; Objectivity; Student assessment; Teacher; Student; Knowledge; Transformation <sociology>; Competency; Introduction
Abstract:
The introduction examines the relationship between pedagogy and examinations from a historical perspective. The authors question the "logics" of examinations and trace the change in their societal function. (DIPF/Bal)
DIPF-Departments:
Bibliothek für Bildungsgeschichtliche Forschung
Author(s):
Robitzsch, Alexander; Lüdtke, Oliver; Köller, Olaf; Kröhne, Ulf; Goldhammer, Frank; Heine, Jörg-Henrik
Title:
Herausforderungen bei der Schätzung von Trends in Schulleistungsstudien. Eine Skalierung der deutschen PISA-Daten
In:
Diagnostica, 63 (2017) 2, pp. 148-165
DOI:
10.1026/0012-1924/a000177
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
German
Keywords:
Germany; Influencing factor; Achievement measurement; Reading competency; Mathematical competency; Model; Science competency; PISA <Programme for International Student Assessment>; Student achievement; Student achievement test; Scaling; Technology-based testing; Test evaluation
Abstract:
International school achievement studies such as the Programme for International Student Assessment (PISA) serve the participating countries as a means of gauging the performance of their school systems. In PISA, the target population (15-year-old students) is tested every 3 years. Of particular importance is the trend information, which indicates for the target population whether performance has changed compared with earlier assessments. For such trends to be interpreted validly, the PISA assessments should be administered under conditions that are as comparable as possible, and the statistical procedures used should remain comparable. PISA 2015 was the first computer-based administration; previous cycles used paper-and-pencil tests. The scaling model was also changed, and new task formats were introduced in science. In this article, we use the national PISA samples from 2000 to 2015 to examine how the change of test mode and the change of scaling model affect the interpretation of the trend estimates. The analyses show that the switch from paper-and-pencil to computer-based testing may have biased the trend estimate for Germany. (DIPF/Orig.)
Abstract (english):
International large-scale assessments, for instance, the Programme for International Student Assessment (PISA), are conducted to provide information on the effectiveness of educational systems. In PISA, the target population of 15-year-old students is assessed every 3 years. Trends show whether competencies have changed for the target population between PISA cycles. To ensure valid trend information, it is necessary to keep the test conditions and statistical methods in all PISA cycles as constant as possible. In PISA 2015, however, several changes were introduced: the test mode changed from paper-and-pencil to computer-based testing, scaling methods were changed, and new types of tasks were used in science. In this article, we investigate the effects of these changes on trend estimation in PISA using German data from all PISA cycles (2000-2015). Findings suggest that the change from paper-and-pencil to computer-based testing could have biased the trend estimation. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
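The core identification problem, namely that a mode effect is confounded with the trend when the administration mode changes between cycles, can be shown with a toy mean comparison. The paper works with rescalings of the German PISA data under IRT models; all numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

true_trend = -5.0   # real change in mean competence (PISA-like scale)
mode_effect = -8.0  # computer administration lowers observed scores, say

cycle_a = rng.normal(500, 100, n)                             # paper-based
cycle_b = rng.normal(500 + true_trend, 100, n) + mode_effect  # computer-based

naive = cycle_b.mean() - cycle_a.mean()
print(f"naive trend estimate: {naive:+.1f}  (true trend {true_trend:+.1f})")

# A randomly equivalent subsample taking the paper version in the later
# cycle identifies the mode effect, which can then be removed.
paper_bridge = rng.normal(500 + true_trend, 100, n)
est_mode = cycle_b.mean() - paper_bridge.mean()
print(f"mode-corrected trend: {naive - est_mode:+.1f}")
```

The correction step presupposes a randomly equivalent bridge sample taking the old mode, which is the logic of a mode-effect study; without it, trend and mode effect cannot be separated from the means alone.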
Author(s):
Stäbler, Franziska; Dumont, Hanna; Becker, Michael; Baumert, Jürgen
Title:
What happens to the fish's achievement in a little pond? A simultaneous analysis of class-average achievement effects on achievement and academic self-concept
In:
Journal of Educational Psychology, 109 (2017) 2, pp. 191-207
DOI:
10.1037/edu0000135
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Data analysis; Germany; Achievement measurement; Achievement gain; Learning environment; Mathematical competency; Student; Student achievement; Grade 7; School class; Secondary analysis; Self-concept; Effect
Abstract (english):
Empirical studies have demonstrated that students who are taught in a group of students with higher average achievement benefit in terms of their achievement. However, there is also evidence showing that being surrounded by high-achieving students has a negative effect on students' academic self-concept, also known as the big-fish-little-pond effect. In view of the reciprocal relationship between achievement and academic self-concept, the present study aims to scrutinize how the average achievement of a class affects students' achievement and academic self-concept, and how that, in turn, affects subsequent achievement and academic self-concept. Using a sample of 6,463 seventh-graders from 285 classes in Germany, multilevel path models showed that the class-average achievement at the beginning of the school year positively affected individual achievement in the middle and at the end of the school year, and negative effects on academic self-concept occurred only at the beginning of Grade 7, but not later in the school year. In addition, mediation analyses revealed that the effects of class-average achievement on students' achievement and academic self-concept at the end of the school year were mediated by midterm achievement, but not by midterm academic self-concept. This pattern was found for mathematics, biology, physics, and English as a foreign language. The results of our study indicate that the consequences for students of belonging to a group of high-achieving students should be analyzed with respect to both academic self-concept and achievement. (DIPF/Orig.)
DIPF-Departments:
Struktur und Steuerung des Bildungswesens
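The contextual-effect specification at the heart of such analyses, entering individual achievement and the class average jointly in a multilevel model, can be sketched as follows. The coefficients in the simulated data are invented to mimic the reported pattern (positive individual effect, negative class-average effect on self-concept); the study itself uses multilevel path models with repeated measures and mediation tests.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_classes, n_per_class = 100, 25
n = n_classes * n_per_class

class_id = np.repeat(np.arange(n_classes), n_per_class)
class_level = np.repeat(rng.normal(0, 0.5, n_classes), n_per_class)
ach = class_level + rng.normal(size=n)  # individual achievement

# Invented pattern: individual achievement raises self-concept, the class
# average lowers it (the big-fish-little-pond contextual effect).
self_concept = 0.4 * ach - 0.3 * class_level + rng.normal(scale=0.8, size=n)

df = pd.DataFrame({"class_id": class_id, "ach": ach, "self_concept": self_concept})
df["class_ach"] = df.groupby("class_id")["ach"].transform("mean")

# Random-intercept model: the class_ach coefficient is the contextual effect.
fit = smf.mixedlm("self_concept ~ ach + class_ach", df, groups=df["class_id"]).fit()
print(fit.summary())
```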
Author(s):
Vidmar, Maša; Niklas, Frank; Schneider, Wolfgang; Hasselhorn, Marcus
Title:
On-entry assessment of school competencies and academic achievement. A comparison between Slovenia and Germany
In:
European Journal of Psychology of Education, 32 (2017) 2, pp. 311-331
DOI:
10.1007/s10212-016-0294-9
URN:
urn:nbn:de:0111-pedocs-174345
URL:
http://nbn-resolving.org/urn:nbn:de:0111-pedocs-174345
Publication Type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Germany; Ability; Factor analysis; Primary school student; International comparison; Correlation analysis; Achievement assessment; Achievement measurement; Reading; Prognosis; Prognostic test; Arithmetic; Writing; School beginner; Student achievement; Slovenia; Structural equation model
Abstract:
The foundation of school success is laid early in children's lives. Consequently, assessments of academic precursors may help to identify children in need of additional support. Such early assessments could also be interesting from an international perspective when educational systems are compared. This analysis examines the comparability of the Slovenian and German versions of the English on-school-entry assessment tool "Performance Indicators in Primary School" (PIPS; Tymms and Albone 2002). PIPS was also used to predict later academic achievement in the two national samples. The German sample consisted of 468 children with a mean age of about 6;6 years at school entry (48.7 % girls). In Slovenia, 328 children (49 % girls) were assessed (mean age of about 6;3 years at school entry). Multi-group confirmatory factor analyses for PIPS did not support weak measurement invariance. However, results indicated that the number of factors as well as the pattern of loadings seems to be comparable. Further research is needed to examine in which respects PIPS might work as a tool for international comparisons. Structural equation modelling indicated that PIPS can be used as a predictor of academic achievement and that overall academic achievement could be predicted best by early numeracy. PIPS measures of literacy and numeracy skills were specific and significant predictors of children's later language and math achievement in grade 1. (DIPF/Orig.)
DIPF-Departments:
Bildung und Entwicklung