NEPS TBT: Work Package Technology Based Testing

As a consortium partner, the DIPF contributes to the planning and implementation of the National Educational Panel Study (NEPS). One focal point of this contribution is the work package on Technology-Based Testing (TBT).

Project Description

The Technology-Based Testing (TBT) work package is part of the NEPS methods group and is located at the DIPF in the Centre for Technology-Based Assessment (TBA Centre). It is under the scientific leadership of Prof. Dr. Frank Goldhammer, the scientific co-leadership of Dr. Daniel Schiffner, and the operational leadership of Dr. Lena Engelhardt. NEPS-TBT works closely with the Leibniz Institute for Educational Trajectories (LIfBi) and is concerned with innovative survey and test methods, for example, computer- and Internet-based skills testing.

Project Objectives

The TBT work package supports the implementation of technology-based testing in NEPS, especially in the domains of reading and mathematics, with science-based services, project-specific adaptations of software products, and accompanying scientific research.

Project Phase 2023-2027

In addition to providing scientific services, NEPS-TBT also aims to conduct accompanying scientific research on currently relevant topics in NEPS. In the current project phase, this includes:

  1. Co-design and implementation of proctored vs. unproctored online surveys
    The focus is on the experimental investigation of possible future online survey formats and their effects on, for example, processing behavior and data quality. Compared to the classic one-to-one interview situation in the household, the survey formats to be tested open up promising new possibilities. For example, respondents could complete the competency tests online accompanied by a virtually connected interviewer (proctored mode), or complete them independently in an online setting (unproctored mode). Indicators of potentially deviant processing behavior (e.g., prolonged inactivity or rapid guessing) are developed and read out at runtime, and appropriate prompts are designed and presented as interventions. It will be tested whether such prompts can induce behavioral adaptations, and whether the different conditions permit an interpretation of the outcomes as valid as in the classical one-to-one setting.
  2. Diagnostic use of process indicators, e.g., to predict panel readiness
    On the basis of log data, process indicators are to be extracted that can be used for modeling competency data and, for example, for the research-based further development of existing scaling models. Process indicators can also inform aspects of data quality or missing coding, i.e., the assignment of missing responses to missing-value categories.
    In addition, process data will be used together with outcome data and paradata, such as response times, to predict willingness to participate in follow-up surveys. This can yield risk profiles with regard to drop-out, from which implications for panel maintenance and incentivization can be derived.
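As a rough illustration of how runtime indicators of deviant processing behavior might work, the following Python sketch flags rapid guessing and prolonged inactivity from per-item processing times and decides when to show an intervention prompt. All names and threshold values are illustrative assumptions, not the NEPS implementation.

```python
# Hypothetical sketch: flagging rapid guessing / prolonged inactivity from
# response times so a prompt can be shown at runtime.
# Thresholds are assumed for illustration only.

RAPID_GUESS_THRESHOLD_S = 3.0    # assumed lower bound for a serious attempt
INACTIVITY_THRESHOLD_S = 120.0   # assumed upper bound before flagging inactivity

def classify_response(time_on_task_s: float) -> str:
    """Classify a single item response by its processing time."""
    if time_on_task_s < RAPID_GUESS_THRESHOLD_S:
        return "rapid_guess"
    if time_on_task_s > INACTIVITY_THRESHOLD_S:
        return "prolonged_inactivity"
    return "regular"

def should_prompt(recent_classifications: list[str], window: int = 3) -> bool:
    """Trigger an intervention prompt after `window` consecutive flagged items."""
    tail = recent_classifications[-window:]
    return len(tail) == window and all(c != "regular" for c in tail)
```

In a real delivery system such classifications would be computed from logged events as the test runs, and the prompt text itself would be part of the experimental design.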

Research Topics

  • Investigation of different survey formats in online settings (e.g., proctoring, prompts)
  • Investigation of processing behavior in online tests and effectiveness of behavioral interventions
  • Predicting willingness to participate in follow-up surveys using multiple data sources, such as process indicators, outcome data, paradata
  • Creation and validation of innovative item and response formats for computer-based testing
  • Analysis and validation of process-related behavioral data from competency measurements
  • Modeling of processing speed

Science-based Services

  • Provision of the CBA ItemBuilder and the deployment software (IRTlib) for the delivery of computer-based test modules
  • Study-specific support in the creation of test items and test modules
  • Regular workshops as well as the development of a knowledge database to support item authors in the independent creation of computer-based test modules
  • Prototypical creation of innovative and new item formats
  • Coordination of requirements for the further development of the authoring tool CBA ItemBuilder and the deployment software (IRTlib) for use in NEPS
  • Preparation and analysis of data sets (outcome and process data) and provision of existing TBA Centre tools for the analysis of the collected data

Completed Project Phase 2018-2022

The overarching aim of NEPS-TBT was the operation of scientifically grounded technology-based assessments that meet international standards. Five central innovation aspects contributed to this objective: (1) step-by-step updates of software components, (2) transfer of assessment innovations (e.g., innovative item formats and increased measurement efficiency) into panel studies, (3) cross-mode linking, also for heterogeneous assessment hardware (tablets, touch input), (4) processing of all TBT data via log data, and (5) automated software testing and quality assurance. These foci of innovation were implemented in the following work packages:

  1. A strategy was developed for the testing and quality assurance of study-specific TBT modules. Automated testing enabled complete checks of data storage; it also served the quality assurance of fixed test assembly and allowed adaptive test assembly to be checked.
  2. The development of a standardized editor enabled automated checking of codebooks and test definitions for multistage tests.
  3. A generic, study-independent concept was developed for coding missing responses taking into account indicators from log data.
  4. Prerequisites for implementing psychometrically sophisticated test designs, such as adaptive algorithms, were prepared. The TBA Centre developed an infrastructure to configure CAT algorithms for test development from R. These were tested in simulation studies and operatively integrated into the delivery software.
  5. Following the principle of economy, result data and log data were not processed in parallel. Instead, result data were derived from log data. To this end, criteria were developed for defining the completeness of log data (cf. Kroehne & Goldhammer, 2018). These developments were used to create generic tools that enable reproducible and transparent data processing.
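The idea of deriving result data from log data, together with a minimal completeness criterion, can be sketched as follows. This is an illustrative toy example under an assumed event format, not the generic tools developed in the project.

```python
# Illustrative sketch (assumed event format): deriving result data from raw
# log events and checking log completeness via consecutive sequence numbers,
# in the spirit of Kroehne & Goldhammer (2018).

def logs_complete(events: list[dict]) -> bool:
    """Minimal completeness criterion: no gaps in event sequence numbers."""
    seq = sorted(e["seq"] for e in events)
    return bool(seq) and seq == list(range(seq[0], seq[0] + len(seq)))

def final_responses(events: list[dict]) -> dict:
    """Reconstruct result data: the last recorded value per item wins."""
    result = {}
    for e in sorted(events, key=lambda e: e["seq"]):
        if e["type"] == "response":
            result[e["item"]] = e["value"]
    return result
```

Because result data are a pure function of the log, any response can be reproduced and audited from the event stream, which is what makes this processing reproducible and transparent.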

Selected Publications:

  • Deribo, T., Goldhammer, F. & Kröhne, U. (2022). Changes in the speed-ability relation through different treatments of rapid guessing. Educational and Psychological Measurement, online first. doi: 10.1177/00131644221109490
  • Deribo, T., Kröhne, U. & Goldhammer, F. (2021). Model‐based treatment of rapid guessing. Journal of Educational Measurement, 58(2), 281-303. doi: 10.1111/jedm.12290
  • Kröhne, U., Deribo, T. & Goldhammer, F. (2020). Rapid guessing rates across administration mode and test setting. Psychological Test and Assessment Modeling, 62(2), 144-177. doi: 10.25656/01:23630
  • Kroehne, U. & Goldhammer, F. (2018). How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. Behaviormetrika, 45(2), 527-563. doi: 10.1007/s41237-018-0063-y
  • Engelhardt, L., Goldhammer, F., Naumann, J., & Frey, A. (2017). Experimental validation strategies for heterogeneous computer-based assessment items. Computers in Human Behavior, 76(11), 683-692. doi: 10.1016/j.chb.2017.02.020

Completed Project Phase 2014-2017

For the domains reading, mathematics, science, and ICT literacy, which are surveyed multiple times in the longitudinal NEPS, changes in the measurement instruments resulting from computerization were psychometrically explored on the basis of combined mode-effect and link studies as well as experimental mode variation (see, e.g., Buerger, Kroehne & Goldhammer, 2016). For this purpose, procedures for quantifying and correcting mode effects were investigated and applied to enable the introduction of computer-based competency testing in NEPS. Research and development in this project phase focused on using the properties of technology-based testing for the further development and optimization of NEPS competency tests (e.g., testing multiple highlighting as a response format).
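As a toy illustration of what quantifying a mode effect can mean at the simplest descriptive level, the following sketch compares the proportion correct of an item between a paper-based and a computer-based group. The data and function names are assumptions for demonstration; actual NEPS analyses rely on psychometric (e.g., IRT-based) models.

```python
# Toy illustration (assumed data, not a NEPS analysis): a descriptive mode
# effect as the difference in proportion correct between paper-based (PBA)
# and computer-based (CBA) administrations of the same item.

def proportion_correct(responses: list[int]) -> float:
    """Share of correct (1) responses among all scored responses."""
    return sum(responses) / len(responses)

def mode_effect(pba: list[int], cba: list[int]) -> float:
    """Positive values: the item was easier on paper than on computer."""
    return proportion_correct(pba) - proportion_correct(cba)
```

Linking studies then use such comparisons, at the level of item parameters rather than raw proportions, to decide whether scores from both modes can be placed on a common scale.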

For in-depth research on mode and setting effects, a procedure for log data collection in paper-based testing was developed at TBA and used in selected NEPS studies (see, e.g., Kroehne & Goldhammer, 2018). In this approach, digital ballpoint pens are used to answer the paper-based questions in test booklets printed with a special dot pattern (see, among others, Dirk et al., 2017, for a description). While the entries made with these digital ballpoint pens look to panelists as if they had been made with an ordinary ballpoint pen, the coordinates and timestamps of all responses are additionally recorded via a Bluetooth-connected computer. This data collection method allows the analysis of answering processes, such as the comparison of processing times between paper-based and computer-based testing (see, e.g., Kroehne, Hahnel, & Goldhammer, 2019).

Selected Publications:

  • Kroehne, U., Gnambs, T., & Goldhammer, F. (2019). Disentangling setting and mode effects for online competence assessment. In H.-P. Blossfeld & H.-G. Roßbach (Eds.), Education as a lifelong process (2nd ed.). Wiesbaden, Germany: Springer VS. doi: 10.1007/978-3-658-23162-0
  • Buerger, S., Kroehne, U., Köhler, C. & Goldhammer, F. (2019). What makes the difference? The impact of item properties on mode effects in reading assessments. Studies in Educational Evaluation, 62, 1-9. doi: 10.1016/j.stueduc.2019.04.005
  • Kroehne, U., Hahnel, C. & Goldhammer, F. (2019). Invariance of the response processes between gender and modes in an assessment of reading. Frontiers in Applied Mathematics and Statistics, 5:2. doi: 10.3389/fams.2019.00002
  • Kroehne, U., Buerger, S., Hahnel, C. & Goldhammer, F. (2019). Construct equivalence of PISA reading comprehension measured with paper‐based and computer‐based assessments. Educational Measurement: Issues and Practice, 38(3), 97-111. doi: 10.1111/emip.12280
  • Dirk, J., Kratzsch, G. K., Prindle, J. P., Kroehne, U., Goldhammer, F., & Schmiedek, F. (2017). Paper-based assessment of the effects of aging on response time: A diffusion model analysis. Journal of Intelligence, 5(2), 12. doi: 10.3390/jintelligence5020012
  • Buerger, S., Kroehne, U., & Goldhammer, F. (2016). The transition to computer-based testing in large-scale assessments: Investigating (partial) measurement invariance between modes. Psychological Test and Assessment Modeling, 58(4), 597-616.
  • Goldhammer, F., & Kroehne, U. (2014). Controlling individuals' time spent on task in speeded performance measures: Experimental time limits, posterior time limits, and response time modeling. Applied Psychological Measurement, 38(4), 255–267. doi: 10.1177/0146621613517164

Completed Project Phase 2009-2013

In the project phase from 2009 to 2013, which prepared the ground for the work package Technology-Based Testing (TBT), DIPF was responsible for the following tasks:


  • The software development for a data warehouse was located at the TBA Centre; the data warehouse ensures fast data access while complying with data protection requirements.
  • The development of the data warehouse was intended to guarantee a central data stock for the entire NEPS study and to provide appropriate tools for filtering and report production.
  • Data warehouse: Three parallel software development processes proceeded during the project: (1) implementation, optimisation and further development of databases, (2) implementation, optimisation and further development of ETL and reporting tools, and (3) implementation, optimisation and further development of a web portal.
  • With the data warehouse, the data from the four assessment waves as well as the tools for filtering and report production were subsequently to be made accessible to the whole scientific community.


  • In preparation for electronic test assessment, empirical studies were conducted to identify possible differences between paper-based and computer-based testing (quantification of mode effects) and to link results across modes (cross-mode linking).
  • Mode-effect studies (equivalence studies combined with NEPS linking studies) were conducted to prepare test administration on a technological basis. They aimed at testing the equivalence of paper-based assessment (PBA) and computer-based assessment (CBA) by means of different criteria. The organisation and execution of the mode-effect studies was carried out together with pillar 1 (Competence Development in the Life Course).

Selected Publications:

  • Kroehne, U., & Martens, T. (2011). Computer-based competence tests in the National Educational Panel Study: The challenge of mode effects. Zeitschrift für Erziehungswissenschaft, 14(S2), 169–. doi: 10.1007/s11618-011-0185-4
  • Rölke, H. (2012). The ItemBuilder: A graphical authoring system for complex item development. In T. Bastiaens & G. Marks (Eds.), Proceedings of E-Learn: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 1 (pp. 344-353). Chesapeake, VA: AACE.


This project is being carried out in cooperation with the Leibniz Institute for Educational Trajectories (LIfBi).


Selected Publications:

  • Goldhammer, F., Hahnel, C., Kroehne, U. & Zehner, F. (2021). From byproduct to design factor: On validating the interpretation of process indicators based on log data. Large-scale Assessments in Education, 9:20. doi: 10.1186/s40536-021-00113-5
  • Goldhammer, F., Kroehne, U., Hahnel, C. & De Boeck, P. (2021). Controlling speed in component skills of reading improves the explanation of reading comprehension. Journal of Educational Psychology, 113(5), 861-878. doi: 10.1037/edu0000655
  • Engelhardt, L. & Goldhammer, F. (2019). Validating test score interpretations using time information. Frontiers in Psychology, 10:1131. doi: 10.3389/fpsyg.2019.01131
  • Engelhardt, L., Goldhammer, F., Naumann, J. & Frey, A. (2017). Experimental validation strategies for heterogeneous computer-based assessment items. Computers in Human Behavior, 76, 683-692. doi: 10.1016/j.chb.2017.02.020

Selected Talks:

  • Deribo, T., Kröhne, U., Hahnel, C., & Goldhammer, F. (2023, March). Time-on-task from log and eye movement data: Commonalities and differences. Talk presented at the Annual Meeting of the National Council on Measurement in Education (NCME), Chicago, USA (virtual), March 28-30, 2023.
  • Engelhardt, L., Kroehne, U., Hahnel, C., Deribo, T., Goldhammer, F. (2021, June). Validating ability-related time components in reading tasks with unit structure. Talk presented at the NEPS Conference 2021, Virtual, June 8, 2021.

Project Management

Project Team

Project Details

Current project
Area of Focus: Education in the Digital World
Department: Teacher and Teaching Quality
Unit: Technology-Based Assessment
Education Sectors: Extracurricular Learning, Higher Education, Primary and Secondary Education
Duration: 01/2023 – 12/2027
Funding: External funding