Results of the search in the DIPF publication database
Your query:
(Keywords: "Item-Response-Theory")
59 items found
On the speed sensitivity parameter in the lognormal model for response times. Implications for test assembly
Becker, Benjamin; Debeer, Dries; Weirich, Sebastian; Goldhammer, Frank
Journal article
| In: Applied Psychological Measurement | 2021
Authors:
Becker, Benjamin; Debeer, Dries; Weirich, Sebastian; Goldhammer, Frank
Title:
On the speed sensitivity parameter in the lognormal model for response times. Implications for test assembly
In:
Applied Psychological Measurement, 45 (2021) 6, pp. 407-422
DOI:
10.1177/01466216211008530
URL:
https://journals.sagepub.com/doi/abs/10.1177/01466216211008530
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Software; Technology-based testing; Measurement procedure; Item-Response-Theory; Achievement test; Question; Answer; Duration; Influencing factor; Test construction; Model; Comparison; Test theory; Simulation
Abstract:
In high-stakes testing, often multiple test forms are used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation was conducted, which showed that test forms with different average speed sensitivity yielded substantial different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations to the proposed approach and further research questions are discussed. (DIPF/Orig.)
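For orientation, the two response time models compared in this article can be sketched as follows; the notation is a common textbook form and is not quoted from the paper. In van der Linden's lognormal model, the log response time of person p on item i is

\ln T_{pi} = \beta_i - \tau_p + \varepsilon_{pi}, \qquad \varepsilon_{pi} \sim N(0, \alpha_i^{-2}),

with time intensity \beta_i, person speed \tau_p, and time discrimination \alpha_i. The extension by Klein Entink, van der Linden, and Fox adds a speed sensitivity parameter \phi_i that scales how strongly an item's expected log response time reacts to differences in speed:

\ln T_{pi} = \beta_i - \phi_i \tau_p + \varepsilon_{pi}.

Setting \phi_i = 1 for all items recovers the basic lognormal model; test forms whose items differ in average \phi_i are what drives the speededness differences examined in the simulation.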
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
Model‐based treatment of rapid guessing
Deribo, Tobias; Kröhne, Ulf; Goldhammer, Frank
Journal article
| In: Journal of Educational Measurement | 2021
Authors:
Deribo, Tobias; Kröhne, Ulf; Goldhammer, Frank
Title:
Model‐based treatment of rapid guessing
In:
Journal of Educational Measurement, 58 (2021) 2, pp. 281-303
DOI:
10.1111/jedm.12290
URL:
https://onlinelibrary.wiley.com/doi/10.1111/jedm.12290?af=R
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Achievement test; Test construction; Measurement procedure; Computer-based method; Question; Answer; Behavior; Duration; Problem solving; Model; Student; Media literacy; Item-Response-Theory; Multiple-choice method; Validity; Panel; Longitudinal study
Abstract (English):
The increased availability of time-related information as a result of computer-based assessment has enabled new ways to measure test-taking engagement. One of these ways is to distinguish between solution and rapid guessing behavior. Prior research has recommended response-level filtering to deal with rapid guessing. Response-level filtering can lead to parameter bias if rapid guessing depends on the measured trait or (un-)observed covariates. Therefore, a model based on Mislevy and Wu (1996) was applied to investigate the assumption of ignorable missing data underlying response-level filtering. The model allowed us to investigate different approaches to treating response-level filtered responses in a single framework through model parameterization. The study found that lower-ability test-takers tend to rapidly guess more frequently and are more likely to be unable to solve an item they guessed on, indicating a violation of the assumption of ignorable missing data underlying response-level filtering. Further ability estimation seemed sensitive to different approaches to treating response-level filtered responses. Moreover, model-based approaches exhibited better model fit and higher convergent validity evidence compared to more naïve treatments of rapid guessing. The results illustrate the need to thoroughly investigate the assumptions underlying specific treatments of rapid guessing as well as the need for robust methods. (DIPF/Orig.)
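For context on the response-level filtering that the article argues should not be applied naively, the following minimal Python sketch shows the usual recoding of rapid guesses as missing responses based on a response-time threshold; the data, column names, and threshold are hypothetical and only illustrate the idea.

import numpy as np
import pandas as pd

# Hypothetical person-by-item records with scored responses and response times.
data = pd.DataFrame({
    "person": [1, 1, 2, 2],
    "item": ["i01", "i02", "i01", "i02"],
    "correct": [1, 0, 0, 1],
    "rt_seconds": [14.2, 1.1, 9.8, 2.3],
})

# Illustrative common threshold; in practice thresholds are often set per item.
RAPID_GUESS_THRESHOLD = 3.0

# Flag rapid guesses and recode the affected responses as missing (NaN),
# which is what response-level filtering amounts to before IRT scaling.
data["rapid_guess"] = data["rt_seconds"] < RAPID_GUESS_THRESHOLD
data["correct_filtered"] = data["correct"].where(~data["rapid_guess"], np.nan)

print(data)

The article's point is that such filtered responses are only ignorable if rapid guessing is unrelated to the measured trait, an assumption that the model based on Mislevy and Wu allows one to test.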
DIPF department:
Lehr und Lernqualität in Bildungseinrichtungen
A bias corrected RMSD item fit statistic. An evaluation and comparison to alternatives
Köhler, Carmen; Robitzsch, Alexander; Hartig, Johannes
Journal article
| In: Journal of Educational and Behavioral Statistics | 2020
Authors:
Köhler, Carmen; Robitzsch, Alexander; Hartig, Johannes
Title:
A bias corrected RMSD item fit statistic. An evaluation and comparison to alternatives
In:
Journal of Educational and Behavioral Statistics, 45 (2020) 3, pp. 251-273
DOI:
10.3102/1076998619890566
URL:
https://journals.sagepub.com/doi/10.3102/1076998619890566
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Test construction; Model; Question; Answer; Measurement procedure; Statistical method; Evaluation; Comparison; Educational research; Empirical research
Abstract:
Testing whether items fit the assumptions of an item response theory model is an important step in evaluating a test. In the literature, numerous item fit statistics exist, many of which show severe limitations. The current study investigates the root mean squared deviation (RMSD) item fit statistic, which is used for evaluating item fit in various large-scale assessment studies. The three research questions of this study are (1) whether the empirical RMSD is an unbiased estimator of the population RMSD; (2) if this is not the case, whether this bias can be corrected; and (3) whether the test statistic provides an adequate significance test to detect misfitting items. Using simulation studies, it was found that the empirical RMSD is not an unbiased estimator of the population RMSD, and nonparametric bootstrapping falls short of entirely eliminating this bias. Using parametric bootstrapping, however, the RMSD can be used as a test statistic that outperforms the other approaches (infit and outfit, S-X2) with respect to both Type I error rate and power. The empirical application showed that parametric bootstrapping of the RMSD results in rather conservative item fit decisions, which suggests more lenient cut-off criteria. (DIPF/Orig.)
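As a rough orientation (notation added here, not quoted from the article), the RMSD item fit statistic compares the model-implied item characteristic curve of item i with the (pseudo-)observed curve, weighted by the ability distribution:

\mathrm{RMSD}_i = \sqrt{ \int \left( P_i^{\mathrm{obs}}(\theta) - P_i^{\mathrm{model}}(\theta) \right)^2 f(\theta)\, d\theta }.

A value of 0 indicates perfect fit; the study's contribution is to show that the sample estimate of this quantity is biased and that parametric bootstrapping of its distribution yields a workable significance test.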
DIPF department:
Bildungsqualität und Evaluation
Interpretation von Testwerten in der Item-Response-Theorie (IRT)
Rauch, Dominique; Hartig, Johannes
Book chapter
| From: Moosbrugger, Helfried; Kelava, Augustin (Eds.): Testtheorie und Fragebogenkonstruktion | Berlin: Springer | 2020
Authors:
Rauch, Dominique; Hartig, Johannes
Title:
Interpretation von Testwerten in der Item-Response-Theorie (IRT)
From:
Moosbrugger, Helfried; Kelava, Augustin (Eds.): Testtheorie und Fragebogenkonstruktion, Berlin: Springer, 2020, pp. 411-424
DOI:
10.1007/978-3-662-61532-4_17
URL:
https://link.springer.com/chapter/10.1007%2F978-3-662-61532-4_17
Document type:
4. Contributions to edited volumes; edited volume (no special category)
Language:
German
Keywords:
Test; Score; Test evaluation; Interpretation; Item-Response-Theory; Model; Educational research; Empirical research; Competence; Definition; Rasch model; Data analysis
Abstract:
This chapter deals with the application of IRT models in empirical educational research. Large-scale assessments of student achievement exploit specific advantages of IRT, for example to enable matrix sampling of test items, the construction of parallel test forms, and the development of computerized adaptive tests. A further major advantage of IRT models is the possibility of a criterion-referenced interpretation of IRT-based test scores. This becomes feasible because item difficulties and person abilities are located on a common (joint) scale. If the Rasch model holds, individual test scores can be interpreted in terms of their distances from item difficulties. So-called "competence levels" also build on this central property of Rasch models. To ease interpretation, the continuous scale is divided into sections (competence levels), which are then described as a whole in criterion-referenced terms. Using a shared example, this chapter illustrates the definition and description of competence levels both via a procedure based on post-hoc analyses of the items and via the use of a-priori task characteristics. (DIPF/Orig.)
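The criterion-referenced interpretation described above rests on the Rasch model, in which (in its standard form, added here for illustration) the probability of solving item i depends only on the difference between person ability \theta_p and item difficulty \beta_i:

P(X_{pi} = 1 \mid \theta_p, \beta_i) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}.

Because abilities and difficulties lie on the same scale, a person located exactly at an item's difficulty solves it with probability 0.5, which is what allows score intervals (competence levels) to be described by the items located within them.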
DIPF department:
Bildungsqualität und Evaluation
Comparing attitudes across groups. An IRT-based item-fit statistic for the analysis of measurement invariance
Buchholz, Janine; Hartig, Johannes
Journal article
| In: Applied Psychological Measurement | 2019
Authors:
Buchholz, Janine; Hartig, Johannes
Title:
Comparing attitudes across groups. An IRT-based item-fit statistic for the analysis of measurement invariance
In:
Applied Psychological Measurement, 43 (2019) 3, pp. 241-250
DOI:
10.1177/0146621617748323
URN:
urn:nbn:de:0111-dipfdocs-174393
URL:
http://www.dipfdocs.de/volltexte/2020/17439/pdf/APM_2019_3_Buchholz_Hartig_Comparing_attitudes_across_groups_A.pdf
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Attitude <Psy>; Measurement; Questionnaire; International comparison; Group; Comparison; Item-Response-Theory; Scaling; Model; Statistical method; Simulation
Abstract (English):
Questionnaires for the assessment of attitudes and other psychological traits are crucial in educational and psychological research, and Item Response Theory (IRT) has become a viable tool for scaling such data. Many international large-scale assessments aim at comparing these constructs across countries, and the invariance of measures across countries is thus required. In its most recent cycle, the Programme for International Student Assessment (PISA 2015) implemented an innovative approach for testing the invariance of IRT-scaled constructs in the context questionnaires administered to students, parents, school principals and teachers. On the basis of a concurrent calibration with equal item parameters across all groups (i.e., languages within countries), a group-specific item-fit statistic (root-mean-square deviance; RMSD) was used as a measure for the invariance of item parameters for individual groups. The present simulation study examines the statistic's distribution under different types and extents of (non-) invariance in polytomous items. Responses to five four-point Likert-type items were generated under the Generalized Partial Credit Model (GPCM) for 1000 simulees in 50 groups each. For one of the five items, either location or discrimination parameters were drawn from a normal distribution. In addition to this type of non-invariance, we varied the extent of non-invariance by manipulating the variation of these distributions. Results indicate that the RMSD statistic is better at detecting non-invariance related to between-group differences in item location than in item discrimination. The study's findings may be used as a starting point to sensitivity analysis aiming to define cut-off values for determining (non-) invariance. (DIPF/Orig.)
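The generalized partial credit model used to simulate the Likert-type responses can be written, in a commonly used parameterization added here for illustration, as

P(X_{pi} = k \mid \theta_p) = \frac{\exp \sum_{v=1}^{k} a_i (\theta_p - b_{iv})}{\sum_{c=0}^{m_i} \exp \sum_{v=1}^{c} a_i (\theta_p - b_{iv})}, \qquad k = 0, \dots, m_i,

with the convention that the empty sum for k = 0 equals zero. Here a_i is the item discrimination and the b_{iv} are the step (location) parameters; it is the between-group variation of one item's location or discrimination parameters that the simulation manipulates to create non-invariance.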
DIPF department:
Bildungsqualität und Evaluation
Cross-cultural comparability of noncognitive constructs in TIMSS and PISA
He, Jia; Barrera-Pedemonte, Fabian; Buchholz, Janine
Journal article
| In: Assessment in Education | 2019
Authors:
He, Jia; Barrera-Pedemonte, Fabian; Buchholz, Janine
Title:
Cross-cultural comparability of noncognitive constructs in TIMSS and PISA
In:
Assessment in Education, 26 (2019) 4, pp. 369-385
DOI:
10.1080/0969594X.2018.1469467
URL:
https://www.tandfonline.com/doi/full/10.1080/0969594X.2018.1469467
Document type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
PISA <Programme for International Student Assessment>; TIMSS <Third International Mathematics and Science Study>; Student achievement; Achievement measurement; Mathematics instruction; Science instruction; Enjoyment; Motivation; School; Identification <Psy>; Lower secondary level; Student; Measurement procedure; Comparison; Item-Response-Theory; Factor analysis; OECD countries
Abstract:
Noncognitive assessments in the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS) share certain similarities and provide complementary information, yet their comparability is seldom checked and convergence not sought. We made use of student self-report data of Instrumental Motivation, Enjoyment of Science and Sense of Belonging to School targeted in both surveys in 29 overlapping countries to (1) demonstrate levels of measurement comparability, (2) check convergence of different scaling methods within survey and (3) check convergence of these constructs with student achievement across surveys. We found that the three scales in either survey (except Sense of Belonging to School in PISA) reached at least metric invariance. The scale scores from the multigroup confirmatory factor analysis and the item response theory analysis were highly correlated, pointing to the robustness of the scaling methods. The correlations between each construct and achievement were generally positive within each culture in each survey, and the correlational pattern was similar across surveys (except for Sense of Belonging), indicating a certain convergence in the cross-survey validation. We stress the importance of checking measurement invariance before making comparative inferences, and we discuss implications for the quality and relevance of these constructs in understanding learning. (DIPF/Orig.)
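As a brief reminder of the invariance hierarchy referred to above (standard multigroup factor-analytic terminology, not specific to this article): with the measurement model in group g written as

x_{jg} = \nu_{jg} + \lambda_{jg}\, \eta_g + \varepsilon_{jg},

configural invariance requires only the same factor structure across groups, metric invariance additionally constrains the loadings (\lambda_{jg} = \lambda_j), and scalar invariance also constrains the intercepts (\nu_{jg} = \nu_j). Metric invariance, the level most of these scales reached, licenses comparisons of relationships such as construct-achievement correlations across countries, but not comparisons of latent means.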
DIPF department:
Bildungsqualität und Evaluation
Construct equivalence of PISA reading comprehension measured with paper‐based and computer‐based assessments
Kroehne, Ulf; Buerger, Sarah; Hahnel, Carolin; Goldhammer, Frank
Journal article
| In: Educational Measurement | 2019
Authors:
Kroehne, Ulf; Buerger, Sarah; Hahnel, Carolin; Goldhammer, Frank
Title:
Construct equivalence of PISA reading comprehension measured with paper‐based and computer‐based assessments
In:
Educational Measurement, 38 (2019) 3, pp. 97-111
DOI:
10.1111/emip.12280
URL:
https://onlinelibrary.wiley.com/doi/abs/10.1111/emip.12280
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Influencing factor; Student achievement; Question; Answer; Interaction; Difference; Comparison; Item-Response-Theory; Germany; PISA <Programme for International Student Assessment>; Reading comprehension; Measurement procedure; Test construction; Correlation; Equivalence; Paper-and-pencil test; Computer-based method; Technology-based testing; Achievement measurement; Test procedure; Test administration
Abstract:
For many years, reading comprehension in the Programme for International Student Assessment (PISA) was measured via paper‐based assessment (PBA). In the 2015 cycle, computer‐based assessment (CBA) was introduced, raising the question of whether central equivalence criteria required for a valid interpretation of the results are fulfilled. As an extension of the PISA 2012 main study in Germany, a random subsample of two intact PISA reading clusters, either computerized or paper‐based, was assessed using a random group design with an additional within‐subject variation. The results are in line with the hypothesis of construct equivalence. That is, the latent cross‐mode correlation of PISA reading comprehension was not significantly different from the expected correlation between the two clusters. Significant mode effects on item difficulties were observed for a small number of items only. Interindividual differences found in mode effects were negatively correlated with reading comprehension, but were not predicted by basic computer skills or gender. Further differences between modes were found with respect to the number of missing values.
DIPF department:
Bildungsqualität und Evaluation
Invariance of the response processes between gender and modes in an assessment of reading
Kroehne, Ulf; Hahnel, Carolin; Goldhammer, Frank
Journal article
| In: Frontiers in Applied Mathematics and Statistics | 2019
Authors:
Kroehne, Ulf; Hahnel, Carolin; Goldhammer, Frank
Title:
Invariance of the response processes between gender and modes in an assessment of reading
In:
Frontiers in Applied Mathematics and Statistics, 5 (2019), Article 2
DOI:
10.3389/fams.2019.00002
URL:
https://www.frontiersin.org/articles/10.3389/fams.2019.00002/full
Document type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Reading skills; Technology-based testing; Computer-based method; Paper-and-pencil test; Answer; Time; Measurement; Item-Response-Theory; Model; Gender-specific difference; Log file; Data analysis; Empirical study; Germany
Abstract:
In this paper, we developed a method to extract item-level response times from log data that are available in computer-based assessments (CBA) and paper-based assessments (PBA) with digital pens. Based on response times that were extracted using only time differences between responses, we used the bivariate generalized linear IRT model framework (B-GLIRT, [1]) to investigate response times as indicators for response processes. A parameterization that includes an interaction between the latent speed factor and the latent ability factor in the cross-relation function was found to fit the data best in CBA and PBA. Data were collected with a within-subject design in a national add-on study to PISA 2012 administering two clusters of PISA 2009 reading units. After investigating the invariance of the measurement models for ability and speed between boys and girls, we found the expected gender effect in reading ability to coincide with a gender effect in speed in CBA. Taking this result as indication for the validity of the time measures extracted from time differences between responses, we analyzed the PBA data and found the same gender effects for ability and speed. Analyzing PBA and CBA data together we identified the ability mode effect as the latent difference between reading measured in CBA and PBA. Similar to the gender effect the mode effect in ability was observed together with a difference in the latent speed between modes. However, while the relationship between speed and ability is identical for boys and girls we found hints for mode differences in the estimated parameters of the cross-relation function used in the B-GLIRT model. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
Instruktionssensitivität von Tests und Items
Naumann, Alexander; Musow, Stephanie; Aichele, Christine; Hochweber, Jan; Hartig, Johannes
Journal article
| In: Zeitschrift für Erziehungswissenschaft | 2019
Authors:
Naumann, Alexander; Musow, Stephanie; Aichele, Christine; Hochweber, Jan; Hartig, Johannes
Title:
Instruktionssensitivität von Tests und Items
In:
Zeitschrift für Erziehungswissenschaft, 22 (2019) 1, pp. 181-202
DOI:
10.1007/s11618-018-0832-0
URL:
https://link.springer.com/article/10.1007%2Fs11618-018-0832-0
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
German
Keywords:
Teaching; Effectiveness; Student achievement; Achievement measurement; Test; Measurement procedure; Empirical research; Conception; Validity; Data; Interpretation; Psychometrics; Item-Response-Theory; Model
Abstract:
Students' test results regularly serve as a central criterion for judging the effectiveness of schools and teaching. Valid conclusions about schools and teaching presuppose that the test instruments used can pick up possible effects of instruction, that is, that they are instructionally sensitive. However, this prerequisite is rarely examined empirically. As a result, it sometimes remains unclear whether a test was not instructionally sensitive or the teaching was not effective. Resolving this question requires an empirical investigation of the instructional sensitivity of the tests and items used. While instructional sensitivity has long been discussed in the USA, the concept has so far received little attention in the German-speaking discourse. Our work therefore aims to embed the concept of instructional sensitivity in the German-speaking discourse on school achievement testing. To this end, three topics are addressed: (a) the theoretical background of the concept of instructional sensitivity, (b) the measurement of instructional sensitivity, and (c) the identification of further research needs. (DIPF/Orig.)
Abstract (English):
Students' performance in assessments is regularly attributed to more or less effective teaching. Valid interpretation requires that outcomes are affected by instruction to a significant degree. Hence, instruments need to be capable of detecting effects of instruction, that is, instruments need to be instructionally sensitive. However, the instructional sensitivity of tests and items is rarely investigated empirically in practice. In consequence, in many cases it remains unclear whether teaching was ineffective or the instrument was insensitive. While there is a lively discussion on the instructional sensitivity of tests and items in the USA, the concept of instructional sensitivity is rather unknown in German-speaking countries. Thus, the present study aims at (a) introducing the concept of instructional sensitivity, (b) providing an overview of current approaches to measuring instructional sensitivity, and (c) identifying further research directions. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
Predicting the difficulty of exercise items for dynamic difficulty adaptation in adaptive language tutoring
Pandarova, Irina; Schmidt, Torben; Hartig, Johannes; Boubekki, Ahcène; Jones, Roger Dale; Brefeld, Ulf
Journal article
| In: International Journal of Artificial Intelligence in Education | 2019
Authors:
Pandarova, Irina; Schmidt, Torben; Hartig, Johannes; Boubekki, Ahcène; Jones, Roger Dale; Brefeld, Ulf
Title:
Predicting the difficulty of exercise items for dynamic difficulty adaptation in adaptive language tutoring
In:
International Journal of Artificial Intelligence in Education, 29 (2019) 3, pp. 342-367
DOI:
10.1007/s40593-019-00180-4
URL:
https://link.springer.com/article/10.1007%2Fs40593-019-00180-4
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Foreign language instruction; English instruction; Digital media; Artificial intelligence; Tutoring system; Grammar; Task; Second language acquisition; Problem solving; Difficulty; Prediction; Measurement; Computer-assisted learning; Student; Grade 9; Grade 10; Paper-and-pencil test; Gymnasium; Integrated comprehensive school; Item-Response-Theory; Item analysis; Lower Saxony; Germany
Abstract:
Advances in computer technology and artificial intelligence create opportunities for developing adaptive language learning technologies which are sensitive to individual learner characteristics. This paper focuses on one form of adaptivity in which the difficulty of learning content is dynamically adjusted to the learner's evolving language ability. A pilot study is presented which aims to advance the (semi-)automatic difficulty scoring of grammar exercise items to be used in dynamic difficulty adaptation in an intelligent language tutoring system for practicing English tenses. In it, methods from item response theory and machine learning are combined with linguistic item analysis in order to calibrate the difficulty of an initial exercise pool of cued gap-filling items (CGFIs) and isolate CGFI features predictive of item difficulty. Multiple item features at the gap, context and CGFI levels are tested and relevant predictors are identified at all three levels. Our pilot regression models reach encouraging prediction accuracy levels which could, pending additional validation, enable the dynamic selection of newly generated items ranging from moderately easy to moderately difficult. The paper highlights further applications of the proposed methodology in the area of adapting language tutoring, item design and second language acquisition, and sketches out issues for future research. (DIPF/Orig.)
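The core workflow sketched in the abstract (calibrating item difficulties with an IRT model and then regressing them on item features) can be illustrated in a few lines of Python; the feature names and numbers below are purely hypothetical and are not taken from the study.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical IRT-calibrated difficulties for six cued gap-filling items.
irt_difficulty = np.array([-1.2, -0.4, 0.1, 0.6, 1.3, 1.8])

# Hypothetical gap-, context-, and item-level features
# (e.g., regular verb form, ambiguous cue, sentence length in words).
item_features = np.array([
    [1, 0, 8],
    [1, 0, 12],
    [0, 1, 10],
    [0, 1, 15],
    [0, 1, 18],
    [0, 0, 22],
])

# Pilot-style difficulty model: predict calibrated difficulty from features,
# so that newly generated items can be placed without prior calibration.
model = LinearRegression().fit(item_features, irt_difficulty)
print("feature weights:", model.coef_)
print("predicted difficulty of a new item:", model.predict([[1, 1, 14]]))

Difficulty prediction of this kind is what would allow newly generated items of a targeted difficulty to be selected dynamically during tutoring, pending the additional validation the authors call for.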
DIPF department:
Bildungsqualität und Evaluation
Page 1 of 6 (59 items in total).