Search results from the DIPF publications database

Your query:

(Keywords: "Itemanalyse" [item analysis])

Authors: Engelhardt, Lena; Naumann, Johannes; Goldhammer, Frank; Frey, Andreas; Horz, Holger; Hartig, Katja; Wenzel, S. Franziska C.
Title: Development and evaluation of a framework for the performance-based testing of ICT skills
In: Frontiers in Education, 6 (2021), p. 668860
DOI: 10.3389/feduc.2021.668860
URL: https://www.frontiersin.org/articles/10.3389/feduc.2021.668860/full
Document type: 3a. Articles in peer-reviewed journals; Article (no special category)
Language: English
Keywords: Information and communication technology; Practical skill; Knowledge; Problem solving; Text comprehension; Image comprehension; Assessment; Model; Item; Development; Test validity; Item analysis; Rasch model; Implementation; Evaluation; Test; Test-taking behavior; School student; Lower secondary level; Baden-Württemberg; Rhineland-Palatinate; Germany
Abstract (english): This paper addresses the development of performance-based assessment items for ICT skills, skills in dealing with information and communication technologies, a construct which is rather broadly and only operationally defined. Item development followed a construct-driven approach to ensure that test scores could be interpreted as intended. Specifically, ICT-specific knowledge as well as problem-solving and the comprehension of text and graphics were defined as components of ICT skills and cognitive ICT tasks (i.e., accessing, managing, integrating, evaluating, creating). In order to capture the construct in a valid way, design principles for constructing the simulation environment and response format were formulated. To empirically evaluate the very heterogeneous items and detect malfunctioning items, item difficulties were analyzed and behavior-related indicators with item-specific thresholds were developed and applied. The 69 items' difficulty scores from the Rasch model fell within a comparable range for each cognitive task. Process indicators addressing time use and test-taker interactions were used to analyze whether most test-takers executed the intended processes, exhibited disengagement, or got lost among the items. Most items were capable of eliciting the intended behavior; for the few exceptions, conclusions for item revisions were drawn. The results affirm the utility of the proposed framework for developing and implementing performance-based items to assess ICT skills. (DIPF/Orig.)
DIPF department: Lehr- und Lernqualität in Bildungseinrichtungen

Authors: Mihaly, Kata; Klieme, Eckhard; Fischer, Jessica; Doan, Sy
Title: Questionnaire scale characteristics
From: OECD (Ed.): Global teaching insights technical report, Paris: OECD Publishing, 2021, pp. 1-22
URL: https://www.oecd.org/education/school/GTI-TechReport-Chapter18.pdf
Document type: 4. Contributions to edited volumes; Edited volume (no special category)
Language: English
Keywords: Psychometrics; Questionnaire; Scale; Item analysis; Reliability; Measurement; Invariance; Classroom research
Abstract: The chapter describes the psychometric properties of the questionnaire scales used in the TALIS Video Study, including a check of measurement invariance. (DIPF/Autor)
Abstract (english): The chapter documents the psychometric features of all questionnaire scales used in the TALIS Video Study, including checks of measurement invariance for some of the scales. (DIPF/Orig.)
DIPF department: Lehr- und Lernqualität in Bildungseinrichtungen

Authors: Hahnel, Carolin; Eichmann, Beate; Goldhammer, Frank
Title: Evaluation of online information in university students. Development and scaling of the screening instrument EVON
In: Frontiers in Psychology, 11 (2020), p. 562128
DOI: 10.3389/fpsyg.2020.562128
URN: urn:nbn:de:0111-pedocs-232241
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-232241
Document type: 3a. Articles in peer-reviewed journals; Article (no special category)
Language: English
Keywords: Germany; Internet; Information literacy; Resource; Credibility; Relevance; Assessment; Test; Test development; Item analysis; Search engine; Simulation; Technology-based testing; Interview; Survey instrument; Evaluation; University student; Rasch model; Empirical study
Abstract: As Internet sources provide information of varying quality, it is an indispensable prerequisite skill to evaluate the relevance and credibility of online information. Based on the assumption that competent individuals can use different properties of information to assess its relevance and credibility, we developed the EVON (evaluation of online information), an interactive computer-based test for university students. The developed instrument consists of eight items that assess the skill to evaluate online information in six languages. Within a simulated search engine environment, students are requested to select the most relevant and credible link for a respective task. To evaluate the developed instrument, we conducted two studies: (1) a pre-study for quality assurance and observing the response process (cognitive interviews of n = 8 students) and (2) a main study aimed at investigating the psychometric properties of the EVON and its relation to other variables (n = 152 students). The results of the pre-study provided first evidence for a theoretically sound test construction with regard to students' item processing behavior. The results of the main study showed acceptable psychometric outcomes for a standardized screening instrument with a small number of items. The item design criteria affected the item difficulty as intended, and students' choice to visit a website had an impact on their task success. Furthermore, the probability of task success was positively predicted by general cognitive performance and reading skill. Although the results uncovered a few weaknesses (e.g., a lack of difficult items), and the efforts of validating the interpretation of EVON outcomes still need to be continued, the overall results speak in favor of a successful test construction and provide first indication that the EVON assesses students' skill in evaluating online information in search engine environments. (DIPF/Orig.)
DIPF department: Bildungsqualität und Evaluation

Authors: Pandarova, Irina; Schmidt, Torben; Hartig, Johannes; Boubekki, Ahcène; Jones, Roger Dale; Brefeld, Ulf
Title: Predicting the difficulty of exercise items for dynamic difficulty adaptation in adaptive language tutoring
In: International Journal of Artificial Intelligence in Education, 29 (2019) 3, pp. 342-367
DOI: 10.1007/s40593-019-00180-4
URL: https://link.springer.com/article/10.1007%2Fs40593-019-00180-4
Document type: 3a. Articles in peer-reviewed journals; Article (no special category)
Language: English
Keywords: Foreign language teaching; English language teaching; Digital media; Artificial intelligence; Tutoring system; Grammar; Task; Second language acquisition; Problem solving; Difficulty; Prediction; Measurement; Computer-assisted learning; School student; Grade 9; Grade 10; Paper-and-pencil test; Gymnasium; Integrated comprehensive school; Item response theory; Item analysis; Lower Saxony; Germany
Abstract: Advances in computer technology and artificial intelligence create opportunities for developing adaptive language learning technologies which are sensitive to individual learner characteristics. This paper focuses on one form of adaptivity in which the difficulty of learning content is dynamically adjusted to the learner's evolving language ability. A pilot study is presented which aims to advance the (semi-)automatic difficulty scoring of grammar exercise items to be used in dynamic difficulty adaptation in an intelligent language tutoring system for practicing English tenses. In it, methods from item response theory and machine learning are combined with linguistic item analysis in order to calibrate the difficulty of an initial exercise pool of cued gap-filling items (CGFIs) and isolate CGFI features predictive of item difficulty. Multiple item features at the gap, context and CGFI levels are tested and relevant predictors are identified at all three levels. Our pilot regression models reach encouraging prediction accuracy levels which could, pending additional validation, enable the dynamic selection of newly generated items ranging from moderately easy to moderately difficult. The paper highlights further applications of the proposed methodology in the area of adapting language tutoring, item design and second language acquisition, and sketches out issues for future research. (DIPF/Orig.)
DIPF department: Bildungsqualität und Evaluation

Authors: Schmidt, Laura I.; Scheiter, Fabian; Neubauer, Andreas B.; Sieverding, Monika
Title: Anforderungen, Entscheidungsfreiräume und Stress im Studium. Erste Befunde zu Reliabilität und Validität eines Fragebogens zu strukturellen Belastungen und Ressourcen (StrukStud) in Anlehnung an den Job Content Questionnaire
In: Diagnostica, 65 (2019) 2, pp. 63-74
DOI: 10.1026/0012-1924/a000213
URN: urn:nbn:de:0111-pedocs-180602
URL: http://nbn-resolving.org/urn:nbn:de:0111-pedocs-180602
Document type: 3a. Articles in peer-reviewed journals; Article (no special category)
Language: German
Keywords: University studies; University; Stress; Strain; Well-being; Health; Decision; Freedom; Support; University student; Self-assessment; Model; Questionnaire; Psychometrics; Validity; Reliability; Factor analysis; Item analysis; Empirical study; Heidelberg; Germany
Abstract: The demand-control model and the associated Job Content Questionnaire (JCQ) provide an established model for predicting physical and mental health risks in the work environment. To predict these risks among university students on a theoretical basis as well, we adapted the JCQ to the higher education context and used our questionnaire on structural demands and resources in university studies (StrukStud) to examine its contribution to explaining perceived stress and well-being. In 4 studies with a total of 732 students (Psychology, Teaching, Social Work, Business Law, and Educational Science), the demand-control dimensions (StrukStud), perceived stress (Heidelberg Stress Index HEI-STRESS and Perceived Stress Questionnaire), and further reference constructs such as study satisfaction and physical complaints were assessed. Findings on reliability and validity are presented. The results support the psychometric quality of the StrukStud and its potential for explaining stress in university studies. The StrukStud is the first economical self-report instrument for the German-speaking countries that assesses psychological demands and decision latitude in university studies.
Abstract (english): Karasek's demand-control model and the corresponding Job Content Questionnaire (JCQ) have greatly influenced research conducted on psychosocial factors at work and health. In our questionnaire on structural conditions (StrukStud), we applied the JCQ to the situation of university students in order to explore the contribution of the Karasek dimensions on outcomes such as psychological distress. In 4 studies of 732 university students (Psychology, Teaching, Social Work, Business Law, and Educational Science) we assessed the demand-control dimensions (StrukStud), stress (Heidelberg Stress Index [HEI-STRESS] and Perceived Stress Questionnaire), and related constructs such as study satisfaction and physical health complaints. Initial findings on reliability and validity are presented. Results demonstrate the psychometric properties of the StrukStud and its potential to explain study-related stress. For the German-speaking countries, the StrukStud is the first economic self-report measure for psychological demands and decision latitude in the context of higher education.
DIPF department: Bildung und Entwicklung

Authors: Hartig, Johannes; Harsch, Claudia
Title: Multidimensional structures of competencies. Focusing on text comprehension in English as a foreign language
From: Leutner, Detlev; Fleischer, Jens; Grünkorn, Juliane; Klieme, Eckhard (Eds.): Competence assessment in education: Research, models and instruments, Cham: Springer, 2017 (Methodology of educational measurement and assessment), pp. 357-368
DOI: 10.1007/978-3-319-50030-0_21
Document type: 4. Contributions to collected works; Edited volume (no special category)
Language: English
Keywords: Germany; English language teaching; English as a second language; Text comprehension; Listening comprehension; Item response theory; Item analysis; Difficulty; Test; School student; Grade 9; Text; Reception; Rasch model
Abstract: The project "Modeling competencies with multidimensional item-response-theory models" examined different psychometric models for student performance in English as a foreign language. On the basis of the results of re-analyses of data from completed large scale assessments, a new test of reading and listening comprehension was constructed. The items within this test use the same text material both for reading and for listening tasks, thus allowing a closer examination of the relations between abilities required for the comprehension of both written and spoken texts. Furthermore, item characteristics (e.g., cognitive demands and response format) were systematically varied, allowing us to disentangle the effects of these characteristics on item difficulty and dimensional structure. This chapter presents results on the properties of the newly developed test: Both reading and listening comprehension can be reliably measured (rel = .91 for reading and .86 for listening). Abilities for both sub-domains prove to be highly correlated yet empirically distinguishable, with a latent correlation of .84. Despite the listening items being more difficult, in terms of absolute correct answers, the difficulties of the same items in the reading and listening versions are highly correlated (r = .84). Implications of the results for measuring language competencies in educational contexts are discussed. (DIPF/Orig.)
DIPF department: Bildungsqualität und Evaluation

Authors: Dumont, Hanna
Title: Die empirische Untersuchung von individueller Förderung als Perspektive für die Unterrichtsqualitätsforschung
From: McElvany, Nele; Bos, Wilfried; Holtappels, Heinz Günter; Gebauer, Miriam M.; Schwabe, Franziska (Eds.): Bedingungen und Effekte guten Unterrichts: Aktueller Stand und Perspektiven der Unterrichtsforschung, Münster: Waxmann, 2016 (Dortmunder Symposium der Empirischen Bildungsforschung, 1), pp. 107-116
Document type: 4. Contributions to collected works; Conference volume/Conference paper/Proceedings
Language: German
Keywords: Baden-Württemberg; Germany; Empirical study; Development; Evaluation; Factor analysis; Questionnaire; Primary school student; Individual support; Item analysis; Item bank; Method; North Rhine-Westphalia; Quality; School student; Grade 3; Grade 4; Instruction; Classroom research; Validity
Abstract: This chapter [addresses] the question of how individual support can be studied empirically within research on instructional quality. It first outlines how the concept has been treated in this context so far. It then presents the development and validation of a student questionnaire for assessing individual support in primary school. Finally, on the basis of the empirical findings, it discusses how individual support and its effectiveness can be studied empirically. (Orig.)
DIPF department: Struktur und Steuerung des Bildungswesens

Authors: Göllner, Richard; Wagner, Wolfgang; Klieme, Eckhard; Lüdtke, Oliver; Nagengast, Benjamin; Trautwein, Ulrich
Title: Erfassung der Unterrichtsqualität mithilfe von Schülerurteilen. Chancen, Grenzen und Forschungsperspektiven
From: Bundesministerium für Bildung und Forschung (Ed.): Forschungsvorhaben in Ankopplung an Large-Scale-Assessments, Berlin: Bundesministerium für Bildung und Forschung, 2016 (Bildungsforschung, 44), pp. 63-82
URL: https://www.bmbf.de/pub/Bildungsforschung_Band_44.pdf#page=65
Document type: 4. Contributions to collected works; Edited volume (no special category)
Language: German
Keywords: Assessment; Academic language; Germany; Empirical study; Research project; Item analysis; Competence; Psychometrics; Quality; School student; Student rating; Instruction; Classroom research; Judgment ability; Validity
Abstract: Student ratings are an important data source for measuring aspects of instructional quality that promote learning and achievement (Clausen, 2002; Klieme, Schümer & Knoll, 2001; Fraser & Walberg, 1991; Seidel & Shavelson, 2007). Students are regarded as experts on instruction. They are able to draw comparisons with other teachers, they can base their judgment on an extended observation period, and they can report even on rare events in the classroom. In addition, student surveys make it possible to question a large number of raters about a single object of assessment and thus to achieve a high density (and, where applicable, diversity) of information. Student ratings are potentially of great importance particularly for large school achievement studies such as PISA, IGLU, or TRAIN, because collecting them is relatively inexpensive and takes little time (Clausen, 2002; Lüdtke, Trautwein, Kunter & Baumert, 2006). But how good are these student ratings? Too little is still known about the psychometric quality (such as reliability and validity) of student ratings of instruction. Can students assess theoretically distinct facets of classroom processes reliably and accurately? To what extent are students' ratings comparable across different contexts (e.g., school subjects or classes)? To what extent does the wording of the items influence the rating process? These questions, which have received little attention so far, were the focus of the BMBF-funded project "Erfassung der Unterrichtsqualität in Large-Scale-Studien. Optimierung der Modellierung und Itemauswahl" (01LSA008; Trautwein, Lüdtke, Klieme, Nagengast & Wagner, 2011). This chapter presents key results of the project. It first gives a general overview of the assessment of instructional quality and then uses empirical results from the project to examine the opportunities and limits of student ratings of instruction in more detail. (DIPF/Orig.)
DIPF department: Bildungsqualität und Evaluation

Authors: Harks, Birgit; Klieme, Eckhard; Hartig, Johannes; Leiss, Dominik
Title: Separating cognitive and content domains in mathematical competence
In: Educational Assessment, 19 (2014) 4, pp. 243-266
DOI: 10.1080/10627197.2014.964114
URN: urn:nbn:de:0111-pedocs-179870
URL: http://nbn-resolving.org/urn:nbn:de:0111-pedocs-179870
Document type: 3a. Articles in peer-reviewed journals; Article (no special category)
Language: English
Keywords: Educational research; Empirical research; Gender difference; Content; Item analysis; Item response theory; Cognition; Mathematical competence; Grade 9; Comparative study
Abstract: The present study investigates the empirical separability of mathematical (a) content domains, (b) cognitive domains, and (c) content-specific cognitive domains. There were 122 items representing two content domains (linear equations vs. theorem of Pythagoras) combined with two cognitive domains (modeling competence vs. technical competence) administered in a study with 1,570 German ninth graders. A unidimensional item response theory model, two two-dimensional multidimensional item response theory (MIRT) models (dimensions: content domains and cognitive domains, respectively), and a four-dimensional MIRT model (dimensions: content-specific cognitive domains) were compared with regard to model fit and latent correlations. Results indicate that the two content and the two cognitive domains can each be empirically separated. Content domains are better separable than cognitive domains. A differentiation of content-specific cognitive domains shows the best fit to the empirical data. Differential gender effects mostly confirm that the separated dimensions have different psychological meaning. Potential explanations, practical implications, and possible directions for future research are discussed. (DIPF/Orig.)
DIPF department: Bildungsqualität und Evaluation

Authors: Naumann, Alexander; Hochweber, Jan; Hartig, Johannes
Title: Modeling instructional sensitivity using a longitudinal multilevel differential item functioning approach
In: Journal of Educational Measurement, 51 (2014) 4, pp. 381-399
DOI: 10.1111/jedm.12051
URN: urn:nbn:de:0111-dipfdocs-189977
URL: http://www.dipfdocs.de/volltexte/2020/18997/pdf/Naumann_et_al_2014_Modeling_instructional_sensitivity_using_LML-DIF_A.pdf
Document type: 3a. Articles in peer-reviewed journals; Article (no special category)
Language: English
Keywords: Curriculum; DESI <Deutsch-Englisch-Schülerleistungen-International>; Primary school; Item analysis; Longitudinal study; Achievement measurement; Achievement test; Model; Quality; Student achievement; Secondary education; Test construction; Instruction; Validity; Effect
Abstract (english): Students' performance in assessments is commonly attributed to more or less effective teaching. This implies that students' responses are significantly affected by instruction. However, the assumption that outcome measures indeed are instructionally sensitive is scarcely investigated empirically. In the present study, we propose a longitudinal multilevel-differential item functioning (DIF) model to combine two existing yet independent approaches to evaluate items' instructional sensitivity. The model permits for a more informative judgment of instructional sensitivity, allowing the distinction of global and differential sensitivity. Exemplarily, the model is applied to two empirical data sets, with classical indices (Pretest-Posttest Difference Index and posttest multilevel-DIF) computed for comparison. Results suggest that the approach works well in the application to empirical data, and may provide important information to test developers. (DIPF/Autor)
DIPF department: Bildungsqualität und Evaluation