Search results in the DIPF database of publications

Your query:

(Schlagwörter: "Testkonstruktion")

Absolute and relative measures of instructional sensitivity
Journal Article | In: Journal of Educational and Behavioral Statistics | 2017 | 37374
Author(s): Naumann, Alexander; Hartig, Johannes; Hochweber, Jan
Title: Absolute and relative measures of instructional sensitivity
In: Journal of Educational and Behavioral Statistics, 42 (2017) 6, S. 678-705
DOI: 10.3102/1076998617703649
URN: urn:nbn:de:0111-pedocs-156029
URL: http://www.dipfdocs.de/volltexte/2018/15602/pdf/1076998617703649_A.pdf
Publication Type: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Language: Englisch
Keywords: Bewertung; DESI <Deutsch-Englisch-Schülerleistungen-International>; Deutschland; Englischunterricht; Item-Response-Theory; Leistungsmessung; Messverfahren; Schüler; Schülerleistung; Schuljahr 09; Sprachkompetenz; Test; Testkonstruktion; Testtheorie; Unterricht; Wirkung
Abstract: Valid inferences on teaching drawn from students' test scores require that tests are sensitive to the instruction students received in class. Accordingly, measures of the test items' instructional sensitivity provide empirical support for validity claims about inferences on instruction. In the present study, we first introduce the concepts of absolute and relative measures of instructional sensitivity. Absolute measures summarize a single item's total capacity of capturing effects of instruction, which is independent of the test's sensitivity. In contrast, relative measures summarize a single item's capacity of capturing effects of instruction relative to test sensitivity. Then, we propose a longitudinal multilevel item response theory model that allows estimating both types of measures depending on the identification constraints. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation

Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' […]
Journal Article | In: Learning and Individual Differences | 2017 | 36715
Author(s): Naumann, Johannes; Goldhammer, Frank
Title: Time-on-task effects in digital reading are non-linear and moderated by persons' skills and tasks' demands
In: Learning and Individual Differences, 53 (2017), S. 1-16
DOI: 10.1016/j.lindif.2016.10.002
Publication Type: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Language: Englisch
Keywords: Digitale Medien; Hypertext; Internationaler Vergleich; Kognitive Prozesse; Leistungsmessung; Lesekompetenz; Lesen; Leseverstehen; Modell; OECD-Länder; PISA <Programme for International Student Assessment>; Problemlösen; Schülerleistung; Technologiebasiertes Testen; Testaufgabe; Testkonstruktion; Wirkung; Zeit
Abstract: Time-on-task effects on response accuracy in digital reading tasks were examined using PISA 2009 data (N = 34,062, 19 countries/economies). As a baseline, task responses were explained by time on task, tasks' easiness, and persons' digital reading skill (Model 1). Model 2 added a quadratic time-on-task effect, persons' comprehension skill and tasks' navigation demands as predictors. In each country, linear and quadratic time-on-task effects were moderated by person and task characteristics. Strongly positive linear time-on-task effects were found for persons being poor digital readers (Model 1) and poor comprehenders (Model 2), which decreased with increasing skill. Positive linear time-on-task effects were found for hard tasks (Model 1) and tasks high in navigation demands (Model 2). For easy tasks and tasks low in navigation demands, the time-on-task effects were negative, or close to zero, respectively. A negative quadratic component of the time-on-task effect was more pronounced for strong comprehenders, while the linear component was weaker. Correspondingly, for tasks high in navigation demands the negative quadratic component to the time-on-task effect was weaker, and the linear component was stronger. These results are in line with a dual-processing account of digital reading that distinguishes automatic reading components from resource-demanding regulation and navigation processes. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation

Bedeutung und Berechnung der Prozentränge und T-Werte beim Erstellen von Testnormen. Anmerkungen […]
Book Chapter | Aus: Trautwein, Ulrich; Hasselhorn, Marcus (Hrsg.): Begabungen und Talente | Göttingen: Hogrefe | 2017 | 37063
Author(s): Woerner, Wolfgang; Müller, Christian; Hasselhorn, Marcus
Title: Bedeutung und Berechnung der Prozentränge und T-Werte beim Erstellen von Testnormen. Anmerkungen und Empfehlungen
In: Trautwein, Ulrich; Hasselhorn, Marcus (Hrsg.): Begabungen und Talente, Göttingen: Hogrefe, 2017 (Test und Trends. N. F., 15), S. 245-263
Publication Type: 4. Beiträge in Sammelwerken; Sammelband (keine besondere Kategorie)
Language: Deutsch
Keywords: Pädagogische Diagnostik; Begabtenauslese; Leistungstest; Testkonstruktion; Testmethodik; Qualität; Testauswertung; SPSS; Stichprobe; Testverfahren; Testtheorie
Abstract: The usefulness and scientific value of an educational-psychological diagnostic instrument require not only evidence that quality criteria are adequately met and sufficiently detailed documentation of the methods used, but also the availability of suitable norm values. Given the central role of the norming process, it is surprising that, even among currently used (school) achievement tests, there is a regrettable heterogeneity in the methods used to compute norm values, at times with considerable consequences for individual diagnostic decisions. Relevant textbooks describe various alternative methods, but without giving concrete recommendations on their use. To fill this gap, this chapter discusses in detail the meaning and calculation of percentile rank values and the standard-norm equivalents derived from them. In particular, it explains the difference between cumulative percentages and the mid-interval percentile rank (Intervallmitten-Prozentrang, IM-PR), which is strongly recommended here. To make it easier for future test developers to calculate IM-PR values, sample SPSS syntax is provided in the appendix, in the hope that this will lead to a uniform basis for calculating the norm values of psychodiagnostic instruments in the future. (DIPF/Orig.)
DIPF-Departments: Bildung und Entwicklung

PISA 2015. Eine Studie zwischen Kontinuität und Innovation
Compilation Book | Münster: Waxmann | 2016 | 36828
Editor(s): Reiss, Kristina; Sälzer, Christine; Schiepe-Tiska, Anja; Klieme, Eckhard; Köller, Olaf
Title: PISA 2015. Eine Studie zwischen Kontinuität und Innovation
Published: Münster: Waxmann, 2016
URL: https://www.waxmann.com/fileadmin/media/zusatztexte/3555Volltext.pdf
Publication Type: 2. Herausgeberschaft; Sammelband (keine besondere Kategorie)
Language: Deutsch
Keywords: Deutschland; Einstellung <Psy>; Eltern; Empirische Untersuchung; Entdeckendes Lernen; Forschendes Lernen; Fragebogen; Freude; Geschlechtsspezifischer Unterschied; Interesse; Internationale Organisation; Internationaler Vergleich; Jugendlicher; Kompetenzerwerb; Konzeption; Leistungsmessung; Lernbedingungen; Lernumgebung; Lesekompetenz; Mathematische Kompetenz; Migrationshintergrund; Motivation; Naturwissenschaftliche Kompetenz; Naturwissenschaftlicher Unterricht; OECD-Länder; Organisation; PISA <Programme for International Student Assessment>; Qualität; Querschnittuntersuchung; Reliabilität; Schulentwicklung; Schülerleistung; Schülerleistungstest; Schulform; Schulklima; Sekundarbereich; Selbstwirksamkeit; Skalierung; Soziale Herkunft; Stichprobe; Technologiebasiertes Testen; Teilnehmer; Testaufgabe; Testauswertung; Testdurchführung; Testkonstruktion; Testmethodik; Überzeugung; Validität; Veränderung; Wahrnehmung
Abstract: Every three years, PISA tests the level of basic literacy of fifteen-year-olds in science, mathematics, and reading, and in doing so examines the strengths and weaknesses of education systems across OECD countries. The central question is to what extent participating countries succeed in preparing students during compulsory schooling for their further educational and career paths. The national report volume presents the results achieved by students in Germany in PISA 2015 and relates them to the results in other OECD countries. The surveys and analyses focus on science. As the sixth survey round of the OECD's Programme for International Student Assessment, PISA 2015 marks both the end of the study's second cycle and the beginning of computer-based testing. While essential standards of data collection and analysis were maintained, PISA 2015 introduced several innovations: computer-based administration, a more differentiated scaling model, and an extended test design. These take account of changes in the worlds in which students learn and live and will improve the informative value of the PISA studies in the long term. With this balance between continuity and innovation in view, the findings from PISA 2015 are contextualized and discussed in this volume. (DIPF/Verlag)
DIPF-Departments: Bildungsqualität und Evaluation

Overidentification of learning disorders among language-minority students. Implications for the […]
Journal Article | In: Journal for Educational Research Online | 2016 | 36124
Author(s): Brandenburg, Janin; Fischbach, Anne; Labuhn, Andju Sara; Rietz, Chantal Sabrina; Schmid, Johanna; Hasselhorn, Marcus
Title: Overidentification of learning disorders among language-minority students. Implications for the standardization of school achievement tests
In: Journal for Educational Research Online, 8 (2016) 1, S. 42-65
URN: urn:nbn:de:0111-pedocs-120293
URL: http://www.j-e-r-o.com/index.php/jero/article/view/621
Publication Type: 3a. Beiträge in begutachteten Zeitschriften; Beitrag in Sonderheft
Language: Englisch
Keywords: Deutsch; Deutsch als Zweitsprache; Deutschland; Diagnose; Empirische Untersuchung; Grundschule; Intelligenzmessung; Leistungsbeurteilung; Leistungsmessung; Lernstörung; Lesekompetenz; Migrationshintergrund; Muttersprache; Schülerleistung; Schuljahr 03; Standard; Test; Testkonstruktion
Abstract: Die Prävalenzstudie untersucht bei Kindern, die Deutsch als Muttersprache (DaM) bzw. als Zweitsprache (DaZ) sprechen, die Häufigkeit von Lernstörungen nach ICD-10 (WHO, 1992). Die meisten deutschen Schulleistungstests, die zur Lernstörungsdiagnose herangezogen werden, stellen keine gesonderten Normen für Kinder mit DaZ bereit. Es ist anzunehmen, dass dies zu einer Überidentifikation von Lernstörungen bei Kindern mit DaZ führt, da die besondere Spracherwerbssituation dieser Kinder nicht berücksichtigt wird. Dennoch ist bislang wenig über das Ausmaß dieses Effektes bekannt. Die vorliegende Studie vergleicht daher die Lernstörungsprävalenz zwischen Drittklässlern mit DaM (n = 566) bzw. mit DaZ (n = 478) wenn gemeinsame versus getrennte Schulleistungsnormen zur Leistungsbeurteilung herangezogen werden. Die Studie erbrachte drei wesentliche Ergebnisse: (1) Wie erwartet kam es bei Verwendung gemeinsamer Schulleistungsnormen zu einer deutlichen Erhöhung der Lernstörungsprävalenz bei Kindern mit DaZ. Die Wahrscheinlichkeit einer Lernstörungsdiagnose belief sich für diese Teilstichprobe auf 25-30 % und war damit annähernd doppelt so groß wie bei Kindern mit DaM, für die sich eine Gesamtprävalenz von 14-18 % ergab. (2) Die Gruppenunterschiede variierten dabei in Abhängigkeit des Lernstörungstypus: Während keine signifikant unterschiedlichen Prävalenzraten für die isolierte Rechenstörung (F81.2) nachweisbar waren, zeigten sich für die verbalen Lernstörungstypen (d. h. Lese-Rechtschreibstörung [F81.0], isolierte Rechtschreibstörung [F81.1] und kombinierte Störung schulischer Fertigkeiten [F81.3]) signifikant erhöhte Prävalenzraten für Kinder mit DaZ. (3) Werden hingegen getrennte Schulleistungsnormen zur Lernstörungsdiagnose herangezogen um für die besondere Spracherwerbssituation von Kindern mit DaZ zu kontrollieren, nähern sich die Prävalenzraten beider Gruppen wie erwartet auf ein vergleichbares Niveau an. 
Es wird diskutiert, welche Herausforderungen sich bei der Lernstörungsdiagnostik von Kindern mit DaZ ergeben. (DIPF/Orig.)
Abstract (english): This German prevalence study examined disproportionate representation of language-minority students among children identified with learning disorder (LD) according to ICD-10 (WHO, 1992). Most German school achievement tests used in LD diagnostics do not provide separate norms for language-minority students, and thus do not take these children's second language status into account when evaluating their academic performance. Although this is likely to result in an LD overidentification of language-minority students, little is known about the magnitude of this effect. Therefore, we compared the estimation of LD prevalence between native German speaking students (n = 566) and language-minority students (n = 478) when pooled versus group-specific achievement norms were used for LD classification. Three important findings emerged from our study: Firstly, and as expected, significant disproportionality effects occurred under pooled norms. Specifically, the likelihood of being diagnosed with LD amounted to 14-18 % among native German speakers and nearly doubled to 25-30 % among language-minority students. Secondly, disproportionality varied as a function of LD subtype: Whereas no disproportionate representation was revealed for arithmetic LD (F81.2), overidentification of language-minority students was found for verbal LD subtypes (namely, reading disorder [F81.0], spelling disorder [F81.1], and mixed disorder of scholastic skills [F81.3]). Thirdly, disproportionality effects were absent when group-specific norms were used for LD classification that controlled for second-language issues. Challenges that have to be met when testing language-minority students for LD are discussed. (DIPF/Orig.)
DIPF-Departments: Bildung und Entwicklung

Comparing C-tests and Yes/No vocabulary size tests as predictors of receptive language skills
Journal Article | In: Language Testing | 2016 | 35732
Author(s): Harsch, Claudia; Hartig, Johannes
Title: Comparing C-tests and Yes/No vocabulary size tests as predictors of receptive language skills
In: Language Testing, 33 (2016) 4, S. 555-575
DOI: 10.1177/0265532215594642
URN: urn:nbn:de:0111-pedocs-125709
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-125709
Publication Type: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Language: Englisch
Keywords: Deutschland; Einstufung; Hörverstehen; Leseverstehen; Methode; Prognostischer Test; Regressionsanalyse; Reliabilität; Schüler; Sekundarbereich; Sprachtest; Strukturgleichungsmodell; Test; Testkonstruktion; Testverfahren; Validität
Abstract (english): Placement and screening tests serve important functions, not only with regard to placing learners at appropriate levels of language courses but also with a view to maximizing the effectiveness of administering test batteries. We examined two widely reported formats suitable for these purposes, the discrete decontextualized Yes/No vocabulary test and the embedded contextualized C-test format, in order to determine which format can explain more variance in measures of listening and reading comprehension. Our data stem from a large-scale assessment with over 3000 students in the German secondary educational context; the four measures relevant to our study were administered to a subsample of 559 students. Using regression analysis on observed scores and SEM on a latent level, we found that the C-test outperforms the Yes/No format in both methodological approaches. The contextualized nature of the C-test seems to be able to explain large amounts of variance in measures of receptive language skills. The C-test, being a reliable, economical and robust measure, appears to be an ideal candidate for placement and screening purposes. In a side-line of our study, we also explored different scoring approaches for the Yes-No format. We found that using the hit rate and the false-alarm rate as two separate indicators yielded the most reliable results. These indicators can be interpreted as measures for vocabulary breadth and as guessing factors respectively, and they allow controlling for guessing. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation

Emotional competencies in geriatric nursing. Empirical evidence from a computer based large scale […]
Journal Article | In: Advances in Health Sciences Education | 2016 | 35733
Author(s): Kaspar, Roman; Hartig, Johannes
Title: Emotional competencies in geriatric nursing. Empirical evidence from a computer based large scale assessment calibration study
In: Advances in Health Sciences Education, 21 (2016) 1, S. 105-109
DOI: 10.1007/s10459-015-9616-y
Publication Type: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Language: Englisch
Keywords: Berufsanforderung; Berufsausbildung; Bewertung; Computerunterstütztes Verfahren; Deutschland; Emotionale Kompetenz; Empathie; Geriatrie; Item-Response-Theory; Krankenpflege; Krankenpflegeberuf; Professionalität; Test; Testkonstruktion
Abstract (english): The care of older people was described as involving substantial emotion-related affordances. Scholars in vocational training and nursing disagree whether emotion-related skills could be conceptualized and assessed as a professional competence. Studies on emotion work and empathy regularly neglect the multidimensionality of these phenomena and their relation to the care process, and are rarely conclusive with respect to nursing behavior in practice. To test the status of emotion-related skills as a facet of client-directed geriatric nursing competence, 402 final-year nursing students from 24 German schools responded to a 62-item computer-based test. 14 items were developed to represent emotion-related affordances. Multi-dimensional IRT modeling was employed to assess a potential subdomain structure. Emotion-related test items did not form a separate subdomain, and were found to be discriminating across the whole competence continuum. Tasks concerning emotion work and empathy are reliable indicators for various levels of client-directed nursing competence. Claims for a distinct emotion-related competence in geriatric nursing, however, appear excessive with a process-oriented perspective. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation

Automatic coding of short text responses via clustering in educational assessment
Journal Article | In: Educational and Psychological Measurement | 2016 | 35473
Author(s): Zehner, Fabian; Sälzer, Christine; Goldhammer, Frank
Title: Automatic coding of short text responses via clustering in educational assessment
In: Educational and Psychological Measurement, 76 (2016) 2, S. 280-303
DOI: 10.1177/0013164415590022
URN: urn:nbn:de:0111-pedocs-149795
URL: https://nbn-resolving.org/urn:nbn:de:0111-pedocs-149795
Publication Type: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Language: Englisch
Keywords: Antwort; Automatisierung; Codierung; Computerlinguistik; Leistungstest; Methode; PISA <Programme for International Student Assessment>; Software; Technologiebasiertes Testen; Testkonstruktion
Abstract: Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free text responses of 10 items with [formula not rendered] responses in total were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on performance of the implemented system. The system reached fair to good up to excellent agreement with human codings [formula not rendered]. Especially items that are solved by naming specific semantic concepts appeared properly coded. The system performed equally well with [formula not rendered] and somewhat poorer but still acceptable down to [formula not rendered]. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation

Die Rolle transversaler Kompetenzen für schulisches Lernen. Das Beispiel des komplexen Problemlösens
Book Chapter | Aus: Bundesministerium für Bildung und Forschung (Hrsg.): Forschungsvorhaben in Ankopplung an Large-Scale-Assessments | Berlin: Bundesministerium für Bildung und Forschung | 2016 | 36590
Author(s): Niepel, Christoph; Rudolph, Julia; Goldhammer, Frank; Greiff, Samuel
Title: Die Rolle transversaler Kompetenzen für schulisches Lernen. Das Beispiel des komplexen Problemlösens
In: Bundesministerium für Bildung und Forschung (Hrsg.): Forschungsvorhaben in Ankopplung an Large-Scale-Assessments, Berlin: Bundesministerium für Bildung und Forschung, 2016 (Bildungsforschung, 44), S. 48-62
URL: https://www.bmbf.de/pub/Bildungsforschung_Band_44.pdf#page=50
Publication Type: 4. Beiträge in Sammelwerken; Sammelband (keine besondere Kategorie)
Language: Deutsch
Keywords: Bildungsstandards; Kompetenz; Konzeption; Lernen; Naturwissenschaftlicher Unterricht; PISA <Programme for International Student Assessment>; Problemlösen; Projekt; Schüler; Schülerleistung; Schülerleistungstest; Technologiebasiertes Testen; Testaufgabe; Testkonstruktion
Abstract: Over the past two decades, a number of large-scale assessments (LSA) have been initiated at the international level. The central task of such LSAs is educational monitoring, that is, examining whether and to what extent politically set educational goals have been achieved and in which central competence domains particular strengths or weaknesses can be observed. Designed as large-scale surveys and tests of students, LSAs provide empirically grounded feedback on education systems at the national and, in part, international level. LSAs have had, and continue to have, a major influence on education policy and educational research. Because LSAs are primarily oriented toward education policy, additional accompanying research efforts are needed to open up the collected data in greater depth for educational research, efforts such as those of the LSA004 project on complex problem solving in the context of LSA. In what follows, we give a combined and broader overview of findings on complex problem solving from the PISA study and from the LSA004 project. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation

The history and current status of testing across cultures and countries
Book Chapter | Aus: Leong, Frederick T. L.; Bartram, Dave; Cheung, Fanny M.; Geisinger, Kurt F.; Iliescu, Dragos (Hrsg.): The ITC international handbook of testing and assessment | New York, NY: Oxford University Press | 2016 | 36618
Author(s): Poortinga, Ype H.; Klieme, Eckhard
Title: The history and current status of testing across cultures and countries
In: Leong, Frederick T. L.; Bartram, Dave; Cheung, Fanny M.; Geisinger, Kurt F.; Iliescu, Dragos (Hrsg.): The ITC international handbook of testing and assessment, New York, NY: Oxford University Press, 2016, S. 14-28
Publication Type: 4. Beiträge in Sammelwerken; Sammelband (keine besondere Kategorie)
Language: Englisch
Keywords: Bildungsforschung; Empirische Forschung; Geschichte <Histor>; Internationaler Vergleich; Konzeption; Kulturdifferenz; Leistungsmessung; Leistungstest; Psychometrie; Qualität; Schülerleistung; Testkonstruktion; Testverfahren; Vergleichende Bildungsforschung
Abstract: There are a few hundred countries and many more culturally distinct groups identified by cultural anthropologists (Human Relations Area Files; Murdock et al., 2008). In many of these groups, psychometric instruments have been applied for at least half a century. There are hundreds of psychological tests and scales, of which dozens are widely used. The permutation of peoples, tests and historical time defines the global success of the testing enterprise. It also defines the scope of this chapter, making clear that our treatment of the topic can only be scant. We will focus on a few general themes, leaving aside, for the most part, specific countries or cultures and specific tests. In the first section we look at the early use of tests cross-culturally and the growing awareness that test scores are not likely to have the same meaning across cultures. In the second section we address the resulting issues of inequivalence or incomparability and the methods of analysis that were devised to deal with these issues. The third section covers more recent history, in our view characterized by concern about inequivalence, but also by a pragmatic approach to the use of tests with test takers who differ in the language they speak and in cultural background. The chapter ends with a section in which we address some inherent difficulties in international test use and point to important achievements over the period of almost a century. (DIPF/Orig.)
DIPF-Departments: Bildungsqualität und Evaluation