Automatic Coding of Free Text Formats for Elaborated Educational Measurement

The project AKTeur aims at establishing an interdisciplinary cooperation network and a broad technological basis for automatic coding of free text formats in different scenarios.

Project description

Two exemplary application scenarios from educational measurement will serve as the basis for the respective research, in which automatic language processing procedures will be implemented. To enable automated assessment, closed answer formats such as multiple choice are often applied in the field. A downside of such approaches is an unavoidable loss of information from the outset, owing to categorisation. Free text formats, by contrast, allow for an elaborate and content-valid measurement of competencies as well as the estimation of several facets of a writing product. Automated coding of open responses can moreover provide graded assessments (partial credit) as well as confidence values for weighting individual assessments.


AKTeur pursues the central objective of creating a broad technological basis for the automated coding of free text responses in diverse scenarios. To this end, an interdisciplinary co-operation network will be founded, consisting of psychologists, pedagogues and computer scientists. We will focus on two exemplary application scenarios from educational measurement:

  • Automatic coding of quality dimensions of a free text in research on learning to write (multidimensional rating)
  • Automatic coding of short free answer formats in psychological diagnostics of achievement characteristics (unidimensional rating)


The planned automatic systems are based on natural language processing (NLP) procedures. Results will be evaluated via inter-rater agreement: the agreement between system output and human ratings on the one hand will be compared with the agreement between a pair of human raters on the other. We can thereby ascertain whether the system's error rate can be reduced to the level of disagreement between two human raters.
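The evaluation idea above can be illustrated with a chance-corrected agreement statistic. The project description does not specify which coefficient will be used; Cohen's kappa is shown here as one common choice, and the response codes in the example are entirely hypothetical. A minimal sketch:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters coding the same items."""
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the two raters coded independently
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical partial-credit codes (0 = no credit, 1 = partial, 2 = full)
human_1 = [2, 1, 0, 2, 1, 1, 0, 2, 2, 1]
human_2 = [2, 1, 0, 2, 1, 0, 0, 2, 1, 1]
system  = [2, 1, 0, 2, 0, 1, 0, 2, 1, 1]

kappa_hh = cohens_kappa(human_1, human_2)  # human-human baseline
kappa_sh = cohens_kappa(system, human_1)   # system-human agreement
```

If the system-human kappa reaches the human-human baseline, the system's errors are comparable in magnitude to the disagreement between two trained human raters.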

Diagnostic research questions explore the developed system with respect to systematic sources of error, e.g. regarding items, response types and groups of test takers. This also pertains to explaining inter-task variation in the agreement between human and automatic assessment. Moreover, we will test the comparability of coding variants at the scale level by examining measurement-model and structural invariance.


Funded from the institutional budget, subject to DIPF 2015

Project Details
Department: Teacher and Teaching Quality
04/2013 – 12/2014