Search results from the DIPF publication database
Your query:
(Keywords: "Konstruktion")
85 items found
Analysis using TALIS 2018 scale scores
Rozman, Mojca; Wild, Justin; Stancel-Piątak, Agnes
Miscellaneous documents
| 2019
Authors:
Rozman, Mojca; Wild, Justin; Stancel-Piątak, Agnes
Title:
Analysis using TALIS 2018 scale scores
Published:
Paris: OECD, 2019 (OECD (Ed.): TALIS 2018 and TALIS Starting Strong 2018 user guide)
URL:
www.oecd.org/education/talis/TALIS_2018-TALIS_Starting_Strong_2018_User_Guide.pdf#page=130
Document type:
5. Working and discussion papers; research report/project reports/school feedback reports
Language:
English
Keywords:
Scale; Evaluation; Scale construction; Analysis; Example; International comparison; OECD countries
Abstract:
This chapter provides a brief summary of scale evaluation and scale score construction in the OECD Teaching and Learning International Survey (TALIS) 2018 and the OECD Starting Strong Teaching and Learning International Survey (TALIS Starting Strong) 2018. In addition, it provides an example of how to read the results from scale evaluation and also offers two examples of analyses with scale scores. The first one demonstrates comparisons of scale scores between participating countries/economies within one ISCED level, and the second one describes scale score comparison within one country between different ISCED levels. The results from statistical analysis using the scale scores should be interpreted taking into account the limitations based on the level of measurement invariance achieved by each scale. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Validation of scales and construction of scale scores
Stancel-Piątak, Agnes; Wild, Justin; Chen, Minge; Rozman, Mojca; Mirazchiyski, Plamen; Cigler, Hynek
Miscellaneous documents
| 2019
Authors:
Stancel-Piątak, Agnes; Wild, Justin; Chen, Minge; Rozman, Mojca; Mirazchiyski, Plamen; Cigler, Hynek
Title:
Validation of scales and construction of scale scores
Published:
Paris: OECD, 2019 (OECD (Ed.): TALIS 2018 technical report)
URL:
https://www.oecd.org/education/talis/TALIS_2018_Technical_Report.pdf#page=192
Document type:
5. Working and discussion papers; research report/project reports/school feedback reports
Language:
English
Keywords:
Scale; Evaluation; Index; Scale construction; International comparison; OECD countries
Abstract:
To enable reporting on a latent trait (sometimes referred to as a construct) or other abstract trait, some questions in the TALIS 2018 questionnaires were combined into an index or scale. This chapter explains how the indices were created and describes the methodology used to validate scales and construct scale scores. It details latent trait evaluation and the procedure involved in computing scale scores and illustrates the implications of the evaluation results for using scale scores in further analyses. The chapter also describes the possibilities and limitations of using scale scores for cross-country/economy comparisons and presents each scale in more detail together with its statistical properties. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Adaptive item selection under matroid constraints
Bengs, Daniel; Brefeld, Ulf; Kröhne, Ulf
Journal article
| In: Journal of Computerized Adaptive Testing | 2018
Authors:
Bengs, Daniel; Brefeld, Ulf; Kröhne, Ulf
Title:
Adaptive item selection under matroid constraints
In:
Journal of Computerized Adaptive Testing, 6 (2018) 2, pp. 15-36
DOI:
10.7333/1808-0602015
URN:
urn:nbn:de:0111-dipfdocs-166953
URL:
http://www.dipfdocs.de/volltexte/2020/16695/pdf/JCAT_2018_2_Bengs_Brefeld_Kroehne_Adaptive_item_selection_under_matroid_constraints_A.pdf
Document type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Adaptive testing; Algorithm; Computer-assisted procedure; Item bank; Measurement procedure; Technology-based testing; Test construction
Abstract (English):
The shadow testing approach (STA; van der Linden & Reese, 1998) is considered the state of the art in constrained item selection for computerized adaptive tests. The present paper shows that certain types of constraints (e.g., bounds on categorical item attributes) induce a matroid on the item bank. This observation is used to devise item selection algorithms that are based on matroid optimization and lead to optimal tests, as the STA does. In particular, a single matroid constraint can be treated optimally by an efficient greedy algorithm that selects the most informative item preserving the integrity of the constraints. A simulation study shows that for applicable constraints, the optimal algorithms realize a decrease in standard error (SE) corresponding to a reduction in test length of up to 10% compared to the maximum priority index (Cheng & Chang, 2009) and up to 30% compared to Kingsbury and Zara's (1991) constrained computerized adaptive testing.
DIPF Department:
Bildungsqualität und Evaluation
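The greedy selection that the abstract above describes for a single matroid constraint can be sketched in a few lines. This is a hypothetical illustration under assumed names (`greedy_select`, the item tuples, and the category bounds are invented here), not the authors' implementation:

```python
def greedy_select(items, category_bounds, test_length):
    """Greedily pick the most informative items subject to a single
    partition-matroid constraint: at most category_bounds[c] items
    per content category c.

    items: list of (item_id, category, information) tuples.
    """
    counts = {c: 0 for c in category_bounds}
    selected = []
    # Consider items in order of decreasing Fisher information.
    for item_id, cat, info in sorted(items, key=lambda t: -t[2]):
        if len(selected) == test_length:
            break
        # Adding the item must keep the selection independent in the
        # matroid, i.e. respect the per-category bound.
        if counts[cat] < category_bounds[cat]:
            selected.append(item_id)
            counts[cat] += 1
    return selected

bank = [("i1", "algebra", 0.9), ("i2", "algebra", 0.8),
        ("i3", "geometry", 0.7), ("i4", "geometry", 0.4)]
print(greedy_select(bank, {"algebra": 1, "geometry": 2}, 3))
```

Because a bound on a categorical item attribute defines a partition matroid, picking the highest-information item that keeps the selection feasible is optimal for this constraint type, which is the structural property the paper exploits.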
How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items
Kroehne, Ulf; Goldhammer, Frank
Journal article
| In: Behaviormetrika | 2018
Authors:
Kroehne, Ulf; Goldhammer, Frank
Title:
How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items
In:
Behaviormetrika, 45 (2018) 2, pp. 527-563
DOI:
10.1007/s41237-018-0063-y
Document type:
3a. Contributions to peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Educational research; Empirical research; Log file; Data analysis; Technology-based testing; PISA <Programme for International Student Assessment>; Questionnaire; Conception; Test construction; Data; Typology; Hardware; Response; Behavior; Duration; Interaction; Human-machine communication; Indicator
Abstract:
Log data from educational assessments attract more and more attention, and large-scale assessment programs have started providing log data as scientific use files. Such data, generated as a by-product of computer-assisted data collection, have been known as paradata in survey research. In this paper, we integrate log data from educational assessments into a taxonomy of paradata. To provide a generic framework for the analysis of log data, finite state machines are suggested. Beyond their computational value, the specific benefit of using finite state machines is achieved by separating platform-specific log events from the definition of indicators by states. Specifically, states represent filtered log data given a theoretical process model and therefore encode the information of log files selectively. The approach is empirically illustrated using log data from the context questionnaires of the Programme for International Student Assessment (PISA). We extracted item-level response time components from questionnaire items that were administered as item batteries with multiple questions on one screen and related them to the item responses. Finally, the taxonomy and the finite state machine approach are discussed with respect to the definition of complete log data, the verification of log data and the reproducibility of log data analyses. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
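The finite-state-machine idea from the abstract above — platform-specific log events drive transitions between theoretically defined states, and indicators are computed on the filtered state sequence — can be sketched as follows. The states, event names, and log data here are invented for illustration and are not taken from the PISA logs:

```python
# Transition table: (current_state, event) -> next_state.
# States and event names are illustrative only.
TRANSITIONS = {
    ("off_item", "item_shown"): "reading",
    ("reading", "first_interaction"): "answering",
    ("answering", "answer_changed"): "answering",
    ("answering", "item_hidden"): "off_item",
    ("reading", "item_hidden"): "off_item",
}

def time_in_states(log):
    """log: list of (timestamp, event) pairs, timestamps in seconds.
    Returns the total time spent in each state as an indicator."""
    state, last_t = "off_item", None
    totals = {}
    for t, event in log:
        if last_t is not None:
            # Credit the elapsed interval to the state we were in.
            totals[state] = totals.get(state, 0) + (t - last_t)
        # Unknown events leave the state unchanged (the filtering step).
        state = TRANSITIONS.get((state, event), state)
        last_t = t
    return totals

log = [(0, "item_shown"), (5, "first_interaction"),
       (9, "answer_changed"), (12, "item_hidden")]
print(time_in_states(log))
```

Events with no entry in the transition table simply leave the state unchanged, so only transitions that matter to the theoretical process model affect the derived indicators — which is the selective encoding the abstract describes.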
Experimental validation strategies for heterogeneous computer-based assessment items
Engelhardt, Lena; Goldhammer, Frank; Naumann, Johannes; Frey, Andreas
Journal article
| In: Computers in Human Behavior | 2017
Authors:
Engelhardt, Lena; Goldhammer, Frank; Naumann, Johannes; Frey, Andreas
Title:
Experimental validation strategies for heterogeneous computer-based assessment items
In:
Computers in Human Behavior, 76 (2017), pp. 683-692
DOI:
10.1016/j.chb.2017.02.020
URN:
urn:nbn:de:0111-dipfdocs-176056
URL:
http://www.dipfdocs.de/volltexte/2019/17605/pdf/Engelhardt_et_al._2017_ManuscriptAccepted_A.pdf
Document type:
3a. Contributions to peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Achievement test; Performance measurement; Media literacy; Computer-assisted procedure; Validity; Test item; Test construction; Adaptation; Strategy; Change; Test methodology; Test theory
Abstract (English):
Computer-based assessments open up new possibilities to measure constructs in authentic settings. They are especially promising for measuring 21st century skills, such as information and communication technology (ICT) skills. Items tapping such constructs may be diverse regarding design principles and content and thus form a heterogeneous item set. Existing validation approaches, such as the construct representation approach by Embretson (1983), however, require homogeneous item sets in the sense that a particular task characteristic can be applied to all items. To apply this validation rationale to heterogeneous item sets as well, two experimental approaches are proposed, based on the idea of creating variants of items by systematically manipulating task characteristics. The change-approach investigates whether the manipulation affects construct-related demands, and the eliminate-approach whether the test score represents the targeted skill dimension. Both approaches were applied within an empirical study (N = 983) using heterogeneous items from an ICT skills test. The results show how changes of ICT-specific task characteristics influenced item difficulty without changing the represented construct. Additionally, eliminating the intended skill dimension led to easier items and partly changed the construct. Overall, the suggested experimental approaches provide a useful validation tool for 21st century skills assessed by heterogeneous items. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Modeling individual response time effects between and within experimental speed conditions. A GLMM approach for speeded tests
Goldhammer, Frank; Steinwascher, Merle A.; Kroehne, Ulf; Naumann, Johannes
Journal article
| In: British Journal of Mathematical and Statistical Psychology | 2017
Authors:
Goldhammer, Frank; Steinwascher, Merle A.; Kroehne, Ulf; Naumann, Johannes
Title:
Modeling individual response time effects between and within experimental speed conditions. A GLMM approach for speeded tests
In:
British Journal of Mathematical and Statistical Psychology, 70 (2017) 2, pp. 238-256
DOI:
10.1111/bmsp.12099
Document type:
3a. Contributions to peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Test; Test construction; Response; Duration; Difference; Measurement procedure; Decision; Influencing factor; Error; Model; Comparison
Abstract:
Completing test items under multiple speed conditions avoids the performance measure being confounded with individual differences in the speed-accuracy compromise, and offers insights into the response process, that is, how response time relates to the probability of a correct response. This relation is traditionally represented by two conceptually different functions: the speed-accuracy trade-off function (SATF) across conditions relating the condition average response time to the condition average of accuracy, and the conditional accuracy function (CAF) within a condition describing accuracy conditional on response time. Using a generalized linear mixed modelling approach, we propose an item response modelling framework that is suitable for item response and response time data from experimental speed conditions. The proposed SATF and CAF model accommodates response time effects between conditions (i.e., person and item SATF slope) and within conditions (i.e., residual CAF slopes), captures person and item differences in these effects, and is suitable for measures with a strong speed component. Moreover, for a single condition a CAF model is proposed distinguishing person, item and residual CAF. The properties of the models are illustrated with an empirical example. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Practical significance of item misfit in educational assessments
Köhler, Carmen; Hartig, Johannes
Journal article
| In: Applied Psychological Measurement | 2017
Authors:
Köhler, Carmen; Hartig, Johannes
Title:
Practical significance of item misfit in educational assessments
In:
Applied Psychological Measurement, 41 (2017) 5, pp. 388-400
DOI:
10.1177/0146621617692978
URN:
urn:nbn:de:0111-pedocs-156084
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156084
Document type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item response theory; Correlation; Performance measurement; Rasch model; Student achievement; Student achievement test; Test construction; Test theory; Validity
Abstract:
Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. In the literature, numerous item fit statistics exist, sometimes resulting in contradictory conclusions regarding which items should be excluded from the test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify the potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation indicates whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Journal article
| In: International Journal of Testing | 2017
Authors:
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Title:
Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
In:
International Journal of Testing, 17 (2017) 4, pp. 277-301
DOI:
10.1080/15305058.2017.1291517
Document type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Arithmetic; Diagnostics; Empirical study; Item response theory; Performance measurement; Mathematical competence; Model; Regression analysis; Reliability; Student achievement; Student achievement test; School year 04; Test construction; Validity
Abstract (English):
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2,032 fourth-grade students in Germany, we evaluated whether the multidimensional proficiency scores from the best-fitting DCM have an added value, over and above the unidimensional proficiency score from a simpler unidimensional IRT model, in explaining variance in external criteria: (a) school grades in mathematics and (b) unidimensional proficiency scores from a standards-based large-scale assessment of mathematics. Results revealed high classification reliabilities as well as interpretable parameter estimates for items and students for the best-fitting DCM. However, while DCM scores were moderately correlated with both external criteria, only a negligible incremental validity of the multivariate attribute scores was found. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Absolute and relative measures of instructional sensitivity
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Journal article
| In: Journal of Educational and Behavioral Statistics | 2017
Authors:
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Title:
Absolute and relative measures of instructional sensitivity
In:
Journal of Educational and Behavioral Statistics, 42 (2017) 6, pp. 678-705
DOI:
10.3102/1076998617703649
URN:
urn:nbn:de:0111-pedocs-156029
URL:
http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Document type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Assessment; DESI <Deutsch-Englisch-Schülerleistungen-International>; Germany; English instruction; Item response theory; Performance measurement; Measurement procedure; Students; Student achievement; School year 09; Language competence; Test; Test construction; Test theory; Teaching; Effect
Abstract:
Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity of capturing effects of instruction, which is independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity of capturing effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures depending on the identification constraints. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' demands
Naumann, Johannes; Goldhammer, Frank
Journal article
| In: Learning and Individual Differences | 2017
Authors:
Naumann, Johannes; Goldhammer, Frank
Title:
Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' demands
In:
Learning and Individual Differences, 53 (2017), pp. 1-16
DOI:
10.1016/j.lindif.2016.10.002
Document type:
3a. Contributions to peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Digital media; Hypertext; International comparison; Cognitive processes; Performance measurement; Reading literacy; Reading; Reading comprehension; Model; OECD countries; PISA <Programme for International Student Assessment>; Problem solving; Student achievement; Technology-based testing; Test item; Test construction; Effect; Time
Abstract:
Time-on-task effects on response accuracy in digital reading tasks were examined using PISA 2009 data (N = 34,062, 19 countries/economies). As a baseline, task responses were explained by time on task, tasks' easiness, and persons' digital reading skill (Model 1). Model 2 added a quadratic time-on-task effect, persons' comprehension skill and tasks' navigation demands as predictors. In each country, linear and quadratic time-on-task effects were moderated by person and task characteristics. Strongly positive linear time-on-task effects were found for persons being poor digital readers (Model 1) and poor comprehenders (Model 2), which decreased with increasing skill. Positive linear time-on-task effects were found for hard tasks (Model 1) and tasks high in navigation demands (Model 2). For easy tasks and tasks low in navigation demands, the time-on-task effects were negative, or close to zero, respectively. A negative quadratic component of the time-on-task effect was more pronounced for strong comprehenders, while the linear component was weaker. Correspondingly, for tasks high in navigation demands the negative quadratic component to the time-on-task effect was weaker, and the linear component was stronger. These results are in line with a dual-processing account of digital reading that distinguishes automatic reading components from resource-demanding regulation and navigation processes. (DIPF/Orig.)
DIPF Department:
Bildungsqualität und Evaluation
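The non-linear time-on-task effect described in the abstract above can be illustrated with a simple logistic response model containing a linear and a quadratic time term. The function name and coefficients below are invented for illustration and are not estimates from the PISA data:

```python
import math

def p_correct(time_on_task, b0=-1.0, b_linear=0.08, b_quad=-0.0004):
    """Probability of a correct response as a logistic function of
    time on task (in seconds). The negative quadratic coefficient
    makes the effect non-monotonic. Coefficients are illustrative."""
    z = b0 + b_linear * time_on_task + b_quad * time_on_task ** 2
    return 1.0 / (1.0 + math.exp(-z))

# Accuracy first rises with time on task, then declines:
for t in (10, 60, 120, 240):
    print(t, round(p_correct(t), 3))
```

With a negative quadratic coefficient, accuracy increases with time on task up to a point and then falls off, mirroring the non-monotonic pattern the paper reports for some combinations of person skill and task demands.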