Deutsches Institut für Internationale Pädagogische Forschung


Modelling Competencies with Multi-Dimensional Item-Response-Theory Models (IRT)

The MIRT project deals with the psychometric modelling of competencies in English as a foreign language. Different Item-Response-Theory (IRT) models are compared in order to formulate general criteria for the selection of psychometric models.

Project description

The project provides a framework for comparing different models from Item-Response Theory (IRT) for modelling competencies, applied to test data on English as a foreign language. First, existing data from completed large-scale assessments were analysed. In the second phase, new test tasks for assessing reading and listening comprehension were developed with systematically varied task demands and administered to a random sample of students. At present, the third project phase is under way, focusing on the analysis of the data with regard to dimensionality and item difficulties. The goal is a competence model for receptive foreign-language competencies that is empirically sound with respect to dimensions and levels. Furthermore, the task characteristics used in constructing the model will be validated by independent judgments, and the scales will be linked to the levels of the Common European Framework of Reference for Languages (CEFR).

Time at the end of the current funding phase is reserved for presenting the findings of the empirical studies in international journal publications.


In the third phase, funded from October 2011 until September 2013, five main objectives are pursued:

1. The task characteristics used in developing tasks in the second project phase will be confirmed by independent coders. To this end, and in analogy to the procedure used for the DESI tasks and the educational-standards tasks in the first project phase, trained students will act as coders and evaluate the task characteristics.

2. The dimensional structure of the competencies assessed by the newly developed tasks will be tested with multi-dimensional IRT models. We expect reading and listening competence to form two dimensions that are each homogeneous and correlate with each other. This assumed model will be tested against competing models that include further dimensions defined by certain task characteristics (e.g. open response format).

3. Within each competence dimension (e.g. reading and listening comprehension), the effects of task characteristics on item difficulties will be assessed. We expect every characteristic considered in developing a task to have a substantial effect on its difficulty. Testing these effects serves to validate the constructs, and the findings from modelling item difficulties will be used to define competence levels.

4. The competence-level models developed in the project will be linked to the CEFR. On the one hand, experts will assign the actual test tasks to CEFR levels; on the other hand, the task characteristics considered in constructing the tasks, and their combinations, will be mapped to the CEFR.

5. The diagnostic statements resulting from the application of different psychometric models will be compared systematically. The practical use and comprehensibility of findings from the different models for foreign-language instruction will be assessed by interviewing English language teachers.
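The modelling in objectives 2 and 3 can be summarised in standard IRT notation. The following is a schematic sketch with symbols chosen here for illustration; the project's actual parameterisation may differ:

```latex
% Two-dimensional Rasch-type model: person p answers item i, which
% belongs to dimension d(i) (reading or listening):
P(X_{pi} = 1 \mid \theta_p) =
  \frac{\exp\bigl(\theta_{p,d(i)} - \beta_i\bigr)}
       {1 + \exp\bigl(\theta_{p,d(i)} - \beta_i\bigr)}

% Explanatory (LLTM-style) decomposition of item difficulty into
% effects \eta_k of coded task characteristics q_{ik}:
\beta_i = \sum_{k=1}^{K} \eta_k \, q_{ik} + \varepsilon_i
```

The second equation expresses the idea in objective 3: if the coded task characteristics drive difficulty, the effects \(\eta_k\) should be substantial and the residuals \(\varepsilon_i\) small.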


Research questions

1. Can the task characteristics used in constructing the tasks also be assessed by independent evaluators?

2. How can the structure of reading and listening comprehension in English as a foreign language be described with respect to dimensional structure?

3. Can levels of reading and listening comprehension in English as a foreign language be described by task demands, and can these descriptions be generalized across both domains?

4. Can connections be drawn between the demands posed by the developed task material and the levels defined in the CEFR?

5. To what extent do teachers regard feedback on results derived from complex psychometric models as comprehensible and useful?

Methodological approach

1. Test data are analysed using uni- and multi-dimensional IRT models.

2. Task difficulties are predicted by regression analyses and explanatory IRT models.

3. The task characteristics are evaluated by trained coders (students of English as a foreign language / teacher trainees).

4. Linkage to the Common European Framework of Reference for Languages (CEFR) is subject to expert judgments.

5. Results of the reading and listening comprehension tests are systematically prepared for feedback; their comprehensibility and usefulness are assessed through questionnaire-based teacher ratings.
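Step 2 above, predicting item difficulties from coded task characteristics, can be illustrated with a minimal regression sketch. The design matrix, characteristic labels, and difficulty values below are invented for this example and do not come from the project's data:

```python
# Minimal sketch of an LLTM-style difficulty decomposition:
# regress IRT-scaled item difficulties on a 0/1 matrix of coded
# task characteristics (all numbers here are illustrative).
import numpy as np

# Q[i, k] = 1 if item i carries characteristic k
# (e.g. open response format, long text, inference required)
Q = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
], dtype=float)

# Item difficulties on the IRT scale (invented values)
b = np.array([-0.8, 0.3, 1.1, 0.4, -0.2, 1.5])

# Add an intercept column and solve the least-squares problem b ~ X @ eta
X = np.column_stack([np.ones(len(b)), Q])
eta, _, _, _ = np.linalg.lstsq(X, b, rcond=None)

# Variance in difficulties explained by the coded characteristics
r2 = 1 - np.sum((b - X @ eta) ** 2) / np.sum((b - b.mean()) ** 2)
print("characteristic effects:", np.round(eta[1:], 2))
print("variance explained (R^2):", round(r2, 2))
```

A high R² would support the construct interpretation in objective 3: the characteristics deliberately varied during task construction account for most of the difficulty differences between items.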

Project milestones throughout the current year

  • Completion of the data set and feedback on results
  • Analyses regarding item and scale characteristics
  • Analyses regarding dimensionality
  • Analyses regarding prediction of item difficulties


Publications

  • Hartig, J., Frey, A., Nold, G. & Klieme, E. (in press). An application of explanatory item response modeling for model-based proficiency scaling. Educational and Psychological Measurement.
  • Hartig, J. & Frey, A. (2012). Konstruktvalidierung und Skalenbeschreibung in der Kompetenzdiagnostik durch die Vorhersage von Aufgabenschwierigkeiten. Psychologische Rundschau, 63, 43–49. [Construct validation and scale description in competence diagnostics through the prediction of task difficulties]
  • Hartig, J. & Höhler, J. (2010). Modellierung von Kompetenzen mit mehrdimensionalen IRT-Modellen. Zeitschrift für Pädagogik, 56, 189–198. [Modelling competencies with multi-dimensional IRT models]
  • Höhler, J., Hartig, J. & Goldhammer, F. (2010). Modeling the multidimensional structure of students' foreign language competence within and between classrooms. Psychological Test and Assessment Modeling, 52, 323–340.
  • Hartig, J. & Höhler, J. (2009). Multidimensional IRT models for the assessment of competences. Studies in Educational Evaluation, 35, 57–63.
  • Hartig, J. & Höhler, J. (2008). Representation of competencies in multidimensional IRT models with within- and between-item multidimensionality. Journal of Psychology, 216, 89–101.
  • Hartig, J. (2008). Psychometric models for the assessment of competencies. In J. Hartig, E. Klieme & D. Leutner (Eds.), Assessment of competencies in educational contexts (pp. 69–90). Göttingen: Hogrefe & Huber Publishers.


The project receives funding from the German Research Foundation (Deutsche Forschungsgemeinschaft, reg. no. HA 5050/2-3) within the research priority programme "Competence Models for Assessing Individual Learning Outcomes and Evaluating Educational Processes" (SPP 1293).


Project Management


Prof. Dr. Johannes Hartig

Project Details

Completed projects
10/2007 – 9/2013
External funding
Department: Educational Quality and Evaluation
last modified Aug 04, 2017