Search results in the DIPF database of publications
Your query:
(Persons: "Zehner" and "Fabian")
24 items matching your search terms.
Advancing educational assessment through log data analysis, natural language processing, and machine learning
Zehner, Fabian; Hahnel, Carolin (Eds.)
Compilation Book
| Hoboken, NJ: Wiley | 2023
Editor(s)
Zehner, Fabian; Hahnel, Carolin
Title:
Advancing educational assessment through log data analysis, natural language processing, and machine learning
Published:
Hoboken, NJ: Wiley, 2023 (Journal of Computer Assisted Learning, Vol. 39, No. 3)
URL:
https://onlinelibrary.wiley.com/toc/13652729/2023/39/3
Publication Type:
2. Editorship; journal special issue
Language:
English
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
Semi-automatic coding of open-ended text responses in large-scale assessments
Andersen, Nico; Zehner, Fabian; Goldhammer, Frank
Journal Article
| In: Journal of Computer Assisted Learning | 2023
Author(s):
Andersen, Nico; Zehner, Fabian; Goldhammer, Frank
Title:
Semi-automatic coding of open-ended text responses in large-scale assessments
In:
Journal of Computer Assisted Learning, 39 (2023) 3, pp. 841-854
DOI:
10.1111/jcal.12717
URL:
https://onlinelibrary.wiley.com/doi/10.1111/jcal.12717
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Abstract (english):
Background: In the context of large-scale educational assessments, the effort required to code open-ended text responses is considerably more expensive and time-consuming than the evaluation of multiple-choice responses because it requires trained personnel and long manual coding sessions. Aim: Our semi-supervised coding method eco (exploring coding assistant) dynamically supports human raters by automatically coding a subset of the responses. Method: We map normalized response texts into a semantic space and cluster response vectors based on their semantic similarity. Assuming that similar codes represent semantically similar responses, we propagate codes to responses in optimally homogeneous clusters. Cluster homogeneity is assessed by strategically querying informative responses and presenting them to a human rater. Following each manual coding, the method estimates the code distribution respecting a certainty interval and assumes a homogeneous distribution if certainty exceeds a predefined threshold. If a cluster is determined to certainly comprise homogeneous responses, all remaining responses are coded accordingly automatically. We evaluated the method in a simulation using different data sets. Results: With an average miscoding of about 3%, the method reduced the manual coding effort by an average of about 52%. Conclusion: Combining the advantages of automatic and manual coding produces considerable coding accuracy and reduces the required manual effort. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
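The semi-supervised workflow the abstract describes (cluster responses in a semantic space, query a human rater, auto-code clusters once they appear homogeneous) can be sketched roughly as follows. This is an illustrative sketch, not the authors' eco implementation; the function name, the simple dominance heuristic, and the query strategy are assumptions:

```python
import numpy as np

def propagate_codes(labels, manual_code, certainty=0.95, max_queries=5):
    """Sketch of the 'eco' idea: within each cluster of semantically
    similar responses, query a few responses from a human rater; once
    one code clearly dominates, auto-code the cluster's remainder.
    `labels` are precomputed cluster labels of the response embeddings;
    `manual_code(i)` stands in for the human rater."""
    codes = np.full(len(labels), -1)          # -1 = not yet coded
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        seen = []
        for i in idx[:max_queries]:           # query informative responses
            seen.append(manual_code(i))       # human codes this response
            codes[i] = seen[-1]
            top = max(set(seen), key=seen.count)
            if len(seen) >= 3 and seen.count(top) / len(seen) >= certainty:
                # cluster deemed homogeneous: auto-code remaining responses
                codes[idx] = np.where(codes[idx] == -1, top, codes[idx])
                break
        # clusters that never reach certainty stay for manual coding
    return codes
```

Responses left at -1 would fall back to fully manual coding, which is how the method trades off accuracy against saved effort.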
Artificial intelligence on the advance to enhance educational assessment. Scientific clickbait or genuine gamechanger?
Zehner, Fabian; Hahnel, Carolin
Journal Article
| In: Journal of Computer Assisted Learning | 2023
Author(s):
Zehner, Fabian; Hahnel, Carolin
Title:
Artificial intelligence on the advance to enhance educational assessment. Scientific clickbait or genuine gamechanger?
In:
Journal of Computer Assisted Learning, 39 (2023) 3, pp. 695-702
DOI:
10.1111/jcal.12810
URL:
https://onlinelibrary.wiley.com/doi/10.1111/jcal.12810
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Abstract (english):
Contributions in the Special Issue: The special issue assembles papers centring around log data analysis, natural language processing, and machine learning used to advance educational assessment. They demonstrate how semi- and unstructured data such as log and text data can, despite their challenging nature, be handled appropriately to benefit educational assessment. In this editorial, we contextualize the special issue's contributions within the diverse field of modern technology-based assessments. Reflection on Terminology: Moreover, we raise concerns about nowadays' use of the term artificial intelligence (AI) in scientific communication. While the contribution of AI to scientific progress is indisputable, the mere use of methods that have evolved within AI research does not necessarily render tools or studies AI-related. We argue that academics have the social responsibility to adopt accurate terminology, given it is integral to scientific rigour and proper scientific communication. Implications: In view of the inflationary use of the term AI in science, we propose a scheme to locate one's research in the field by focusing on (1) the type of data, (2) the processing involved, and (3) the output of a study and the actions derived from it, which are situated within the (4) scope of a study. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
To score or not to score. Factors influencing performance and feasibility of automatic content scoring of text responses
Zesch, Torsten; Horbach, Andrea; Zehner, Fabian
Journal Article
| In: Educational Measurement: Issues and Practice | 2023
Author(s):
Zesch, Torsten; Horbach, Andrea; Zehner, Fabian
Title:
To score or not to score. Factors influencing performance and feasibility of automatic content scoring of text responses
In:
Educational Measurement: Issues and Practice, 42 (2023) 1, pp. 44-58
DOI:
10.1111/emip.12544
URL:
https://onlinelibrary.wiley.com/doi/10.1111/emip.12544
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Response; Automation; Assessment; Influencing factor; Content; Performance; Text; Tool; Method
Abstract (english):
In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced by other factors such as target population or input modality. Extending previous work, we distinguish conceptual, realization, and nonconformity variance, which are differentially impacted by the various factors. While conceptual variance relates to different concepts embedded in the text responses, realization variance refers to their diverse manifestation through natural language. Nonconformity variance is added by aberrant response behavior. Furthermore, besides its performance, the feasibility of using an automatic scoring system depends on external factors, such as ethical or computational constraints, which influence whether a system with a given performance is accepted by stakeholders. Our work provides (i) a framework for assessment practitioners to decide a priori whether automatic content scoring can be successfully applied in a given setup as well as (ii) new empirical findings and the integration of empirical findings from the literature on factors that influence automatic systems' performance. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
shinyReCoR: A shiny application for automatically coding text responses using R
Andersen, Nico; Zehner, Fabian
Journal Article
| In: Psych | 2021
Author(s):
Andersen, Nico; Zehner, Fabian
Title:
shinyReCoR: A shiny application for automatically coding text responses using R
In:
Psych, 3 (2021) 3, pp. 422-446
DOI:
10.3390/psych3030030
URL:
https://www.mdpi.com/2624-8611/3/3/30
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Natural language; Language processing; Text; Coding; Computer program; Methodology
Abstract (english):
In this paper, we introduce shinyReCoR: a new app that utilizes a cluster-based method for automatically coding open-ended text responses. Reliable coding of text responses from educational or psychological assessments requires substantial organizational and human effort. The coding of natural language in responses to tests depends on the texts' complexity, corresponding coding guides, and the guides' quality. Manual coding is thus not only expensive but also error-prone. With shinyReCoR, we provide a more efficient alternative. The use of natural language processing makes texts utilizable for statistical methods. shinyReCoR is a Shiny app deployed as an R-package that allows users with varying technical affinity to create automatic response classifiers through a graphical user interface based on annotated data. The present paper describes the underlying methodology, including machine learning, as well as peculiarities of the processing of language in the assessment context. The app guides users through the workflow with steps like text corpus compilation, semantic space building, preprocessing of the text data, and clustering. Users can adjust each step according to their needs. Finally, users are provided with an automatic response classifier, which can be evaluated and tested within the process. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
Physicians as clinical teachers. Motivation and attitudes
Gartmeier, Martin; Coppi, Renato Alves; Zehner, Fabian; Koumpouli, Konstantina; Wijnen-Meijer, Marjo; Berberat, Pascal O.
Journal Article
| In: Beiträge zur Hochschulforschung | 2021
Author(s):
Gartmeier, Martin; Coppi, Renato Alves; Zehner, Fabian; Koumpouli, Konstantina; Wijnen-Meijer, Marjo; Berberat, Pascal O.
Title:
Physicians as clinical teachers. Motivation and attitudes
In:
Beiträge zur Hochschulforschung, 43 (2021) 4, pp. 74-95
URL:
https://www.bzh.bayern.de/archiv/artikelarchiv/artikeldetail/physicians-as-clinical-teachers-motivation-and-attitudes
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Physician; Teaching activity; Instruction; Hospital; Motivation; Attitude <Psy>; Motive <Psy>; Questionnaire; Germany
Abstract (english):
Especially in university hospitals, many physicians have to fulfil multiple roles as they treat patients, conduct research and act as clinical teachers. The present study focuses upon the latter role and analyses which attitudes and motivational patterns guide physicians in their teaching activities. With regard to motivation, we draw on self-determination theory and distinguish between autonomous and controlled motives. In terms of attitude, we examine the extent to which clinical teachers use teaching activities that relate to a transmissive or constructivist paradigm. These questions are investigated using data from a questionnaire study conducted at a German university hospital. The respondents were 314 physicians who participated in one of two didactic qualification workshops at different points in their professional career. Physicians reported higher scores for autonomous types of motivation (derived from the self) than for controlled types (influenced by external factors). Further, we found that overall, the physicians considered both transmissive and constructivist concepts relevant for their teaching, but agreed even stronger to the constructivist paradigm. Latent class analyses revealed distinct patterns of attitudes towards teaching, but no relation between different motivations and teaching attitudes was found. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
From byproduct to design factor. On validating the interpretation of process indicators based on log data
Goldhammer, Frank; Hahnel, Carolin; Kroehne, Ulf; Zehner, Fabian
Journal Article
| In: Large-scale Assessments in Education | 2021
Author(s):
Goldhammer, Frank; Hahnel, Carolin; Kroehne, Ulf; Zehner, Fabian
Title:
From byproduct to design factor. On validating the interpretation of process indicators based on log data
In:
Large-scale Assessments in Education, 9 (2021), p. 20
DOI:
10.1186/s40536-021-00113-5
URN:
urn:nbn:de:0111-pedocs-250050
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-250050
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Achievement test; Log file; PISA <Programme for International Student Assessment>; PIAAC <Programme for the International Assessment of Adult Competencies>; Data analysis; Interpretation; Performance measurement; Measurement method; Indicator; Typology; Test construction; Test theory
Abstract (english):
International large-scale assessments such as PISA or PIAAC have started to provide public or scientific use files for log data; that is, events, event-related attributes and timestamps of test-takers' interactions with the assessment system. Log data and the process indicators derived from it can be used for many purposes. However, the intended uses and interpretations of process indicators require validation, which here means a theoretical and/or empirical justification that inferences about (latent) attributes of the test-taker's work process are valid. This article reviews and synthesizes measurement concepts from various areas, including the standard assessment paradigm, the continuous assessment approach, the evidence-centered design (ECD) framework, and test validation. Based on this synthesis, we address the questions of how to ensure the valid interpretation of process indicators by means of an evidence-centered design of the task situation, and how to empirically challenge the intended interpretation of process indicators by developing and implementing correlational and/or experimental validation strategies. For this purpose, we explicate the process of reasoning from log data to low-level features and process indicators as the outcome of evidence identification. In this process, contextualizing information from log data is essential in order to reduce interpretative ambiguities regarding the derived process indicators. Finally, we show that empirical validation strategies can be adapted from classical approaches investigating the nomothetic span and construct representation. Two worked examples illustrate possible validation strategies for the design phase of measurements and their empirical evaluation. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
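The abstract's notion of deriving process indicators from log data can be illustrated with a minimal sketch: computing time on task from timestamped events. This is an assumed event format for illustration, not the schema of any particular assessment system:

```python
def time_on_task(events, item_id):
    """Derive a simple process indicator from log data: seconds elapsed
    between entering and leaving an item. Each event is assumed to be a
    (timestamp, event_type, item_id) tuple."""
    enter = next(t for t, e, i in events if e == "enter" and i == item_id)
    leave = next(t for t, e, i in events if e == "leave" and i == item_id)
    return leave - enter
```

As the article argues, such an indicator only supports valid interpretation if the task situation is designed so that the time span actually reflects the intended work process.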
Measuring hygiene competence. The picture-based situational judgement test HygiKo
Heininger, Susanne Katharina; Baumgartner, Maria; Zehner, Fabian; Burgkart, Rainer; Söllner, Nina; Berberat, Pascal O.; Gartmeier, Martin
Journal Article
| In: BMC Medical Education | 2021
Author(s):
Heininger, Susanne Katharina; Baumgartner, Maria; Zehner, Fabian; Burgkart, Rainer; Söllner, Nina; Berberat, Pascal O.; Gartmeier, Martin
Title:
Measuring hygiene competence. The picture-based situational judgement test HygiKo
In:
BMC Medical Education, 21 (2021), p. 410
DOI:
10.1186/s12909-021-02829-y
URL:
https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-021-02829-y
Publication Type:
3a. Contributions in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Hygiene; Competence; Test procedure; Health care; Medicine; Student; Physician; Medical staff; Situation; Assessment; Vignette; Item Response Theory; Rasch model
Abstract:
Background: With the onset of the COVID-19 pandemic at the beginning of 2020, the crucial role of hygiene in healthcare settings has once again become very clear. For diagnostic and didactic purposes, standardized and reliable tests suitable to assess the competencies involved in "working hygienically" are required. However, existing tests usually use self-report questionnaires, which are suboptimal for this purpose. In the present study, we introduce the newly developed, competence-oriented HygiKo test instrument focusing on health-care professionals' hygiene competence and report empirical evidence regarding its psychometric properties. Methods: HygiKo is a Situational Judgement Test (SJT) to assess hygiene competence. The HygiKo test consists of twenty pictures (items); each item presents only one unambiguous hygiene lapse. For each item, test respondents are asked (1) whether they recognize a problem in the picture with respect to hygiene guidelines and, (2) if yes, to describe the problem in a short verbal response. Our sample comprised n = 149 health care professionals (79.1 % female; age: M = 26.7 years, SD = 7.3 years) working as clinicians or nurses. The written responses were rated by two independent raters with high agreement (α > 0.80), indicating high reliability of the measurement. We used Item Response Theory (IRT) for further data analysis. Results: We report IRT analyses showing that the HygiKo test is suitable to assess hygiene competence and that it distinguishes between persons demonstrating different levels of ability for seventeen of the twenty items, especially for the range of low to medium person abilities. Hence, the HygiKo SJT is suitable to obtain a reliable and competence-oriented measure of hygiene competence. Conclusions: In its present form, the HygiKo test can be used to assess the hygiene competence of medical students, medical doctors, nurses and trainee nurses in cross-sectional measurements. In order to broaden the difficulty spectrum of the current test, additional test items with higher difficulty should be developed. The Situational Judgement Test designed to assess hygiene competence can be helpful in testing and teaching the ability of working hygienically. Further research on validity is needed. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
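For reference, the Rasch model named in the keywords gives the probability of a correct response as a logistic function of person ability θ and item difficulty b; a one-line sketch (standard textbook form, not code from the study):

```python
import math

def rasch_probability(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b)),
    with person ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

A person whose ability equals an item's difficulty answers it correctly with probability 0.5, which is why adding harder items, as the conclusions suggest, extends measurement toward higher abilities.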
Applying psychometric modeling to aid feature engineering in predictive log-data analytics. The NAEP EDM Competition
Zehner, Fabian; Eichmann, Beate; Deribo, Tobias; Harrison, Scott; Bengs, Daniel; Andersen, Nico; Hahnel, Carolin
Journal Article
| In: Journal of Educational Data Mining | 2021
Author(s):
Zehner, Fabian; Eichmann, Beate; Deribo, Tobias; Harrison, Scott; Bengs, Daniel; Andersen, Nico; Hahnel, Carolin
Title:
Applying psychometric modeling to aid feature engineering in predictive log-data analytics. The NAEP EDM Competition
In:
Journal of Educational Data Mining, 13 (2021) 2, pp. 80-107
DOI:
10.5281/zenodo.5275316
URN:
urn:nbn:de:0111-dipfdocs-250034
URL:
https://nbn-resolving.org/urn:nbn:de:0111-dipfdocs-250034
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
Psychometrics; Modeling; Log; Data analysis; Test-taking behavior; Cluster
Abstract (english):
The NAEP EDM Competition required participants to predict efficient test-taking behavior based on log data. This paper describes our top-down approach for engineering features by means of psychometric modeling, aiming at machine learning for the predictive classification task. For feature engineering, we employed, among others, the Log-Normal Response Time Model for estimating latent person speed, and the Generalized Partial Credit Model for estimating latent person ability. Additionally, we adopted an n-gram feature approach for event sequences. Furthermore, instead of using the provided binary target label, we distinguished inefficient test takers who were going too fast and those who were going too slow for training a multi-label classifier. Our best-performing ensemble classifier comprised three sets of low-dimensional classifiers, dominated by test-taker speed. While our classifier reached moderate performance, relative to the competition leaderboard, our approach makes two important contributions. First, we show how classifiers that contain features engineered through literature-derived domain knowledge can provide meaningful predictions if results can be contextualized to test administrators who wish to intervene or take action. Second, our re-engineering of test scores enabled us to incorporate person ability into the models. However, ability was hardly predictive of efficient behavior, leading to the conclusion that the target label's validity needs to be questioned. Beyond competition-related findings, we furthermore report a state sequence analysis for demonstrating the viability of the employed tools. The latter yielded four different test-taking types that described distinctive differences between test takers, providing relevant implications for assessment practice. (DIPF/Orig.)
DIPF-Departments:
Lehr und Lernqualität in Bildungseinrichtungen
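The n-gram feature approach for event sequences mentioned in the abstract can be illustrated with a small sketch (illustrative only; the function name and event labels are assumptions, not the competition's feature set):

```python
from collections import Counter

def event_ngrams(events, n=2):
    """Count contiguous n-grams over a test-taker's event sequence,
    yielding sparse count features for a downstream classifier."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))
```

Each distinct n-gram becomes one feature dimension, so short recurring interaction patterns (e.g. repeated clicking) become visible to the classifier.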
PISA reading. Mode effects unveiled in short text responses
Zehner, Fabian; Kroehne, Ulf; Hahnel, Carolin; Goldhammer, Frank
Journal Article
| In: Psychological Test and Assessment Modeling | 2020
Author(s):
Zehner, Fabian; Kroehne, Ulf; Hahnel, Carolin; Goldhammer, Frank
Title:
PISA reading. Mode effects unveiled in short text responses
In:
Psychological Test and Assessment Modeling, 62 (2020) 1, pp. 85-105
URN:
urn:nbn:de:0111-pedocs-203542
URL:
https://www.psychologie-aktuell.com/fileadmin/Redaktion/Journale/ptam-2020-1/05_Zehner.pdf
Publication Type:
3a. Contributions in peer-reviewed journals; contribution in a special issue
Language:
English
Keywords:
PISA <Programme for International Student Assessment>; Germany; Student achievement; Achievement test; Computer-based procedure; Paper; Pencil; Response; Text; Content; Information; Quantity; Mode change; Effect; Impact research; Data analysis; Secondary analysis
Abstract (english):
Educational large-scale assessments risk their temporal comparability when shifting from paper- to computer-based assessment. A recent study showed how text responses have altered alongside PISA's mode change, indicating mode effects. Uncertainty remained, however, because it compared students from 2012 and 2015. We aimed at reproducing the findings in an experimental setting, in which n = 836 students answered PISA reading questions on computer, on paper, or both. Text response features for information quantity and relevance were extracted automatically. Results show a comprehensive recovery of findings. Students incorporated more information into their text responses on computer than on paper, with some items being more affected than others. Regarding information relevance, we found less mode effect variance across items than the original study. Hints of a relationship between mode effect and gender across items could be reproduced. The study demonstrates the stability of linguistic feature extraction from text responses. (DIPF/Orig.)
DIPF-Departments:
Bildungsqualität und Evaluation