Search results in the DIPF database of publications
Your query: (Keywords: "Testkonstruktion")
73 items matching your search terms.
Author(s):
Bengs, Daniel; Brefeld, Ulf; Kröhne, Ulf
Title:
Adaptive item selection under matroid constraints
In:
Journal of Computerized Adaptive Testing, 6 (2018) 2, pp. 15-36
DOI:
10.7333/1808-0602015
URN:
urn:nbn:de:0111-dipfdocs-166953
URL:
http://www.dipfdocs.de/volltexte/2020/16695/pdf/JCAT_2018_2_Bengs_Brefeld_Kroehne_Adaptive_item_selection_under_matroid_constraints_A.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Adaptives Testen; Algorithmus; Computerunterstütztes Verfahren; Itembank; Messverfahren; Technologiebasiertes Testen; Testkonstruktion
Abstract (English):
The shadow testing approach (STA; van der Linden & Reese, 1998) is considered the state of the art in constrained item selection for computerized adaptive tests. The present paper shows that certain types of constraints (e.g., bounds on categorical item attributes) induce a matroid on the item bank. This observation is used to devise item selection algorithms that are based on matroid optimization and lead to optimal tests, as the STA does. In particular, a single matroid constraint can be treated optimally by an efficient greedy algorithm that selects the most informative item preserving the integrity of the constraints. A simulation study shows that for applicable constraints, the optimal algorithms realize a decrease in standard error (SE) corresponding to a reduction in test length of up to 10% compared to the maximum priority index (Cheng & Chang, 2009) and up to 30% compared to Kingsbury and Zara's (1991) constrained computerized adaptive testing.
DIPF-Departments:
Bildungsqualität und Evaluation
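The greedy step described in the abstract above can be sketched in a few lines. This is an editor's illustration under assumptions, not the authors' implementation: it assumes a 2PL information function and per-category upper bounds on item counts, i.e. a partition matroid, one constraint type for which greedy selection is optimal; item and category names are invented.

from dataclasses import dataclass
import math

@dataclass
class Item:
    item_id: str
    category: str  # categorical item attribute, e.g. content area
    a: float       # 2PL discrimination
    b: float       # 2PL difficulty

def information(item: Item, theta: float) -> float:
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)

def greedy_select(bank, theta, caps, test_length):
    """Select the most informative items one by one, skipping any item
    whose category cap is exhausted; the caps define a partition
    matroid, so this greedy scan yields an optimal feasible test."""
    chosen, used = [], {}
    for item in sorted(bank, key=lambda it: information(it, theta), reverse=True):
        if len(chosen) == test_length:
            break
        if used.get(item.category, 0) < caps.get(item.category, test_length):
            chosen.append(item)
            used[item.category] = used.get(item.category, 0) + 1
    return chosen

bank = [Item("i1", "algebra", 1.2, 0.0),
        Item("i2", "algebra", 1.5, 0.3),
        Item("i3", "geometry", 0.9, -0.5)]
# The algebra cap blocks the second algebra item: -> ['i2', 'i3']
print([it.item_id for it in greedy_select(bank, 0.0, {"algebra": 1}, 2)])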
Author(s):
Kroehne, Ulf; Goldhammer, Frank
Title:
How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items
In:
Behaviormetrika, 45 (2018) 2, pp. 527-563
DOI:
10.1007/s41237-018-0063-y
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Bildungsforschung; Empirische Forschung; Logdatei; Datenanalyse; Technologiebasiertes Testen; PISA <Programme for International Student Assessment>; Fragebogen; Konzeption; Testkonstruktion; Daten; Typologie; Hardware; Antwort; Verhalten; Dauer; Interaktion; Mensch-Maschine-Kommunikation; Indikator
Abstract:
Log data from educational assessments attract increasing attention, and large-scale assessment programs have started providing log data as scientific use files. Such data, generated as a by-product of computer-assisted data collection, have been known as paradata in survey research. In this paper, we integrate log data from educational assessments into a taxonomy of paradata. To provide a generic framework for the analysis of log data, finite state machines are suggested. Beyond their computational value, the specific benefit of using finite state machines is achieved by separating platform-specific log events from the definition of indicators by states. Specifically, states represent filtered log data given a theoretical process model and therefore encode the information of log files selectively. The approach is empirically illustrated using log data of the context questionnaires of the Programme for International Student Assessment (PISA). We extracted item-level response time components from questionnaire items that were administered as item batteries with multiple questions on one screen and related them to the item responses. Finally, the taxonomy and the finite state machine approach are discussed with respect to the definition of complete log data, the verification of log data, and the reproducibility of log data analyses. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
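The separation of platform-specific events from indicator-defining states, as proposed in the abstract above, can be illustrated with a toy finite state machine. Event names, state names, and the single-item scenario are invented for illustration; they are not taken from the paper.

# Transition table: (current state, log event) -> next state.
TRANSITIONS = {
    ("off_screen", "item_displayed"): "reading",
    ("reading", "radio_clicked"): "answering",
    ("answering", "radio_clicked"): "answering",
    ("answering", "next_pressed"): "off_screen",
}

def run_machine(events, start="off_screen"):
    """Replay timestamped (time, event) pairs; events without a defined
    transition count as platform-specific noise and are filtered out.
    The returned state trace is the basis for indicators such as the
    time spent in the 'answering' state."""
    state, trace = start, [(0.0, start)]
    for t, ev in events:
        nxt = TRANSITIONS.get((state, ev))
        if nxt is not None:
            state = nxt
            trace.append((t, state))
    return trace

log = [(0.0, "item_displayed"), (3.2, "radio_clicked"), (5.0, "next_pressed")]
print(run_machine(log))
# -> [(0.0, 'off_screen'), (0.0, 'reading'), (3.2, 'answering'), (5.0, 'off_screen')]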
Author(s):
Engelhardt, Lena; Goldhammer, Frank; Naumann, Johannes; Frey, Andreas
Title:
Experimental validation strategies for heterogeneous computer-based assessment items
In:
Computers in Human Behavior, 76 (2017), pp. 683-692
DOI:
10.1016/j.chb.2017.02.020
URN:
urn:nbn:de:0111-dipfdocs-176056
URL:
http://www.dipfdocs.de/volltexte/2019/17605/pdf/Engelhardt_et_al._2017_ManuscriptAccepted_A.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Leistungstest; Leistungsmessung; Medienkompetenz; Computerunterstütztes Verfahren; Validität; Testaufgabe; Testkonstruktion; Anpassung; Strategie; Veränderung; Testmethodik; Testtheorie
Abstract (English):
Computer-based assessments open up new possibilities to measure constructs in authentic settings. They are especially promising for measuring 21st century skills, such as information and communication technology (ICT) skills. Items tapping such constructs may be diverse regarding design principles and content and thus form a heterogeneous item set. Existing validation approaches, such as the construct representation approach by Embretson (1983), however, require homogeneous item sets in the sense that a particular task characteristic can be applied to all items. To apply this validation rationale to heterogeneous item sets as well, two experimental approaches are proposed, based on the idea of creating variants of items by systematically manipulating task characteristics. The change approach investigates whether the manipulation affects construct-related demands, and the eliminate approach whether the test score represents the targeted skill dimension. Both approaches were applied within an empirical study (N = 983) using heterogeneous items from an ICT skills test. The results show how changes of ICT-specific task characteristics influenced item difficulty without changing the represented construct. Additionally, eliminating the intended skill dimension led to easier items and partly changed the construct. Overall, the suggested experimental approaches provide a useful validation tool for 21st century skills assessed by heterogeneous items. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
Author(s):
Goldhammer, Frank; Steinwascher, Merle A.; Kroehne, Ulf; Naumann, Johannes
Title:
Modeling individual response time effects between and within experimental speed conditions. A GLMM approach for speeded tests
In:
British Journal of Mathematical and Statistical Psychology, 70 (2017) 2, pp. 238-256
DOI:
10.1111/bmsp.12099
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Test; Testkonstruktion; Antwort; Dauer; Unterschied; Messverfahren; Entscheidung; Einflussfaktor; Fehler; Modell; Vergleich
Abstract:
Completing test items under multiple speed conditions avoids the performance measure being confounded with individual differences in the speed-accuracy compromise, and offers insights into the response process, that is, how response time relates to the probability of a correct response. This relation is traditionally represented by two conceptually different functions: the speed-accuracy trade-off function (SATF) across conditions relating the condition average response time to the condition average of accuracy, and the conditional accuracy function (CAF) within a condition describing accuracy conditional on response time. Using a generalized linear mixed modelling approach, we propose an item response modelling framework that is suitable for item response and response time data from experimental speed conditions. The proposed SATF and CAF model accommodates response time effects between conditions (i.e., person and item SATF slope) and within conditions (i.e., residual CAF slopes), captures person and item differences in these effects, and is suitable for measures with a strong speed component. Moreover, for a single condition a CAF model is proposed distinguishing person, item and residual CAF. The properties of the models are illustrated with an empirical example. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
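For orientation, the within-condition part of such a model (the residual CAF slope with person and item deviations) might be written as a logistic GLMM along the following lines. This is an editor's sketch with assumed notation, not the authors' exact specification:

\operatorname{logit} P(Y_{pic} = 1)
  = (\theta_p - \beta_i + \gamma_c)
  + (\lambda_0 + \lambda_p + \lambda_i)\,\tilde{t}_{pic}

Here Y_{pic} is the scored response of person p to item i in speed condition c, \tilde{t}_{pic} the within-condition centered (log) response time, and the \lambda terms the residual CAF slope with person- and item-specific deviations; the SATF is carried by the condition-level effects across c.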
Author(s):
Köhler, Carmen; Hartig, Johannes
Title:
Practical significance of item misfit in educational assessments
In:
Applied Psychological Measurement, 41 (2017) 5, pp. 388-400
DOI:
10.1177/0146621617692978
URN:
urn:nbn:de:0111-pedocs-156084
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156084
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Korrelation; Leistungsmessung; Rasch-Modell; Schülerleistung; Schülerleistungstest; Testkonstruktion; Testtheorie; Validität
Abstract:
Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. In the literature, numerous item fit statistics exist, sometimes resulting in contradictory conclusions regarding which items should be excluded from the test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify the potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation informs about whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
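To make "practical significance" concrete, here is a hedged empirical analogue of the idea: compare a relationship estimate computed with and without the flagged items. Function and variable names are invented, sum scores stand in for IRT-based estimates, and the article's actual method quantifies the potential bias directly rather than by this rescoring.

import numpy as np

def misfit_impact(resp, criterion, misfit_idx):
    """Difference between the score-criterion correlation based on all
    items and the correlation after removing the misfitting items;
    resp is a persons x items 0/1 matrix."""
    full = resp.sum(axis=1)
    reduced = np.delete(resp, misfit_idx, axis=1).sum(axis=1)
    r_full = np.corrcoef(full, criterion)[0, 1]
    r_reduced = np.corrcoef(reduced, criterion)[0, 1]
    return r_full - r_reduced

A deviation near zero would suggest that the misfit, however statistically significant, barely matters for the substantive outcome.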
Author(s):
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver
Title:
Incremental validity of multidimensional proficiency scores from diagnostic classification models: An illustration for elementary school mathematics
In:
International Journal of Testing, 17 (2017) 4, pp. 277-301
DOI:
10.1080/15305058.2017.1291517
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Arithmetik; Diagnostik; Empirische Untersuchung; Item-Response-Theory; Leistungsmessung; Mathematische Kompetenz; Modell; Regressionsanalyse; Reliabilität; Schülerleistung; Schülerleistungstest; Schuljahr 04; Testkonstruktion; Validität
Abstract (English):
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2,032 fourth-grade students in Germany, we evaluated whether the multidimensional proficiency scores from the best-fitting DCM have an added value, over and above the unidimensional proficiency score from a simpler unidimensional IRT model, in explaining variance in two external criteria: (a) school grades in mathematics and (b) unidimensional proficiency scores from a standards-based large-scale assessment of mathematics. Results revealed high classification reliabilities as well as interpretable parameter estimates for items and students for the best-fitting DCM. However, while DCM scores were moderately correlated with both external criteria, only a negligible incremental validity of the multivariate attribute scores was found. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
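The incremental-validity question in the abstract above amounts to a hierarchical regression: does adding the multivariate attribute scores raise R^2 over the unidimensional score? A deliberately simplified OLS sketch follows; variable names are assumptions, and the paper's analyses are more elaborate than plain OLS.

import numpy as np

def r_squared(X, y):
    """R^2 of an OLS regression of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def incremental_validity(uni_score, dcm_attrs, criterion):
    """Delta R^2 of the DCM attribute scores over the unidimensional
    IRT score when predicting an external criterion such as math
    grades; a negligible delta mirrors the paper's finding."""
    base = r_squared(uni_score.reshape(-1, 1), criterion)
    full = r_squared(np.column_stack([uni_score, dcm_attrs]), criterion)
    return full - base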
Author(s):
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Title:
Absolute and relative measures of instructional sensitivity
In:
Journal of Educational and Behavioral Statistics, 42 (2017) 6, pp. 678-705
DOI:
10.3102/1076998617703649
URN:
urn:nbn:de:0111-pedocs-156029
URL:
http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Bewertung; DESI <Deutsch-Englisch-Schülerleistungen-International>; Deutschland; Englischunterricht; Item-Response-Theory; Leistungsmessung; Messverfahren; Schüler; Schülerleistung; Schuljahr 09; Sprachkompetenz; Test; Testkonstruktion; Testtheorie; Unterricht; Wirkung
Abstract:
Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity to capture effects of instruction, independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity to capture effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures, depending on the identification constraints. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
Author(s):
Naumann, Johannes; Goldhammer, Frank
Title:
Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' demands
In:
Learning and Individual Differences, 53 (2017), pp. 1-16
DOI:
10.1016/j.lindif.2016.10.002
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Digitale Medien; Hypertext; Internationaler Vergleich; Kognitive Prozesse; Leistungsmessung; Lesekompetenz; Lesen; Leseverstehen; Modell; OECD-Länder; PISA <Programme for International Student Assessment>; Problemlösen; Schülerleistung; Technologiebasiertes Testen; Testaufgabe; Testkonstruktion; Wirkung; Zeit
Abstract:
Time-on-task effects on response accuracy in digital reading tasks were examined using PISA 2009 data (N = 34,062, 19 countries/economies). As a baseline, task responses were explained by time on task, tasks' easiness, and persons' digital reading skill (Model 1). Model 2 added a quadratic time-on-task effect, persons' comprehension skill, and tasks' navigation demands as predictors. In each country, linear and quadratic time-on-task effects were moderated by person and task characteristics. Strongly positive linear time-on-task effects were found for poor digital readers (Model 1) and poor comprehenders (Model 2), and these effects decreased with increasing skill. Positive linear time-on-task effects were found for hard tasks (Model 1) and tasks high in navigation demands (Model 2). For easy tasks and tasks low in navigation demands, the time-on-task effects were negative or close to zero, respectively. The negative quadratic component of the time-on-task effect was more pronounced for strong comprehenders, while the linear component was weaker. Correspondingly, for tasks high in navigation demands the negative quadratic component of the time-on-task effect was weaker, and the linear component was stronger. These results are in line with a dual-processing account of digital reading that distinguishes automatic reading components from resource-demanding regulation and navigation processes. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation
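Schematically, Model 2's quadratic time-on-task effect with person- and task-specific moderation of the linear term could be written as follows; this is an editor's sketch, and the paper's random-effects structure is richer:

\operatorname{logit} P(Y_{pt} = 1)
  = \beta_0 + \theta_p + \delta_t
  + (\beta_1 + u_p + v_t)\, T_{pt}
  + \beta_2\, T_{pt}^{2}

Here Y_{pt} is the response of person p to task t, T_{pt} the time on task, \theta_p and \delta_t person skill and task easiness, and u_p, v_t the skill- and demand-related moderation of the linear effect reported above.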
Author(s):
Woerner, Wolfgang; Müller, Christian; Hasselhorn, Marcus
Title:
Bedeutung und Berechnung der Prozentränge und T-Werte beim Erstellen von Testnormen. Anmerkungen und Empfehlungen
In:
Trautwein, Ulrich; Hasselhorn, Marcus (Eds.): Begabungen und Talente, Göttingen: Hogrefe, 2017 (Test und Trends. N. F., 15), pp. 245-263
Publication Type:
4. Contributions in edited volumes; edited volume (no special category)
Language:
German
Keywords:
Pädagogische Diagnostik; Begabtenauslese; Leistungstest; Testkonstruktion; Testmethodik; Qualität; Testauswertung; SPSS; Stichprobe; Testverfahren; Testtheorie
Abstract:
The usefulness and scientific value of an instrument for educational-psychological diagnostics presuppose, beyond evidence that the relevant quality criteria are adequately met and sufficiently detailed documentation of the methods used, that suitable norm values are available. Given the central role of the norming process, it is surprising that even currently used (scholastic) achievement tests show a regrettable heterogeneity in how norm values are computed, at times with considerable consequences for decisions in individual diagnostics. Standard textbooks describe various alternative methods without, however, offering concrete recommendations on their use. To close this gap, this chapter discusses in detail the meaning and computation of percentile ranks and of the standard-norm equivalents built on them. In particular, the difference between cumulative percentages and the interval-midpoint percentile rank (IM-PR), which is emphatically recommended here, is explained. To make the computation of IM-PR values easier for future test developers, model SPSS syntax is provided in the appendix, in the hope that this will lead to a uniform basis for computing the norm values of psychodiagnostic instruments in the future. (DIPF/Orig.)
DIPF-Departments:
Bildung und Entwicklung
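The interval-midpoint percentile rank (IM-PR) recommended in the abstract above has a compact definition: the percentage of the norm sample strictly below a raw score plus half the percentage exactly at it, which is then mapped to T scores via the normal quantile function. The chapter itself provides SPSS syntax; the following Python transcription of that logic is the editor's sketch.

from statistics import NormalDist

def im_percentile_ranks(raw_scores):
    """IM-PR for each observed raw score: percentage of the sample
    strictly below the score plus half the percentage exactly at it."""
    n = len(raw_scores)
    pr = {}
    for x in set(raw_scores):
        below = sum(1 for s in raw_scores if s < x)
        at = raw_scores.count(x)
        pr[x] = 100.0 * (below + 0.5 * at) / n
    return pr

def t_norm(pr_percent):
    """T-score equivalent (mean 50, SD 10) of a percentile rank."""
    return 50.0 + 10.0 * NormalDist().inv_cdf(pr_percent / 100.0)

pr = im_percentile_ranks([1, 2, 2, 3, 4])
print(pr[2], round(t_norm(pr[2]), 1))  # -> 40.0 47.5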
Editor(s):
Reiss, Kristina; Sälzer, Christine; Schiepe-Tiska, Anja; Klieme, Eckhard; Köller, Olaf
Title:
PISA 2015. Eine Studie zwischen Kontinuität und Innovation
Published:
Münster: Waxmann, 2016
URL:
https://www.waxmann.com/fileadmin/media/zusatztexte/3555Volltext.pdf
Publication Type:
2. Editorship; edited volume (no special category)
Language:
German
Keywords:
Deutschland; Einstellung <Psy>; Eltern; Empirische Untersuchung; Entdeckendes Lernen; Forschendes Lernen; Fragebogen; Freude; Geschlechtsspezifischer Unterschied; Interesse; Internationale Organisation; Internationaler Vergleich; Jugendlicher; Kompetenzerwerb; Konzeption; Leistungsmessung; Lernbedingungen; Lernumgebung; Lesekompetenz; Mathematische Kompetenz; Migrationshintergrund; Motivation; Naturwissenschaftliche Kompetenz; Naturwissenschaftlicher Unterricht; OECD-Länder; Organisation; PISA <Programme for International Student Assessment>; Qualität; Querschnittuntersuchung; Reliabilität; Schulentwicklung; Schülerleistung; Schülerleistungstest; Schulform; Schulklima; Sekundarbereich; Selbstwirksamkeit; Skalierung; Soziale Herkunft; Stichprobe; Technologiebasiertes Testen; Teilnehmer; Testaufgabe; Testauswertung; Testdurchführung; Testkonstruktion; Testmethodik; Überzeugung; Validität; Veränderung; Wahrnehmung
Abstract:
Every three years, PISA tests the basic competencies of fifteen-year-old students in science, mathematics, and reading, thereby examining the strengths and weaknesses of education systems in comparison across the OECD countries. The central question is to what extent the participating countries succeed in preparing their students, during compulsory schooling, for their further educational and vocational paths. The national report presents the results from PISA 2015 achieved by students in Germany and relates them to the results of other OECD countries. The assessments and analyses focus on science. As the sixth round of the OECD's Programme for International Student Assessment, PISA 2015 marks both the conclusion of the study's second cycle and the beginning of computer-based testing. While maintaining essential standards of data collection and analysis, PISA 2015 introduced several innovations: computer-based administration, a more differentiated scaling model, and an extended test design. These reflect changes in students' learning environments and everyday lives and will improve the informative value of the PISA studies in the long run. With a view to this balance between continuity and innovation, this volume contextualizes and discusses the findings from PISA 2015. (DIPF/Verlag)
DIPF-Departments:
Bildungsqualität und Evaluation