Search results from the DIPF publication database
Your query: (Schlagwörter: "Rasch-Modell")
20 results found
Record 42707
Authors:
Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes
Title:
Performance of infit and outfit confidence intervals calculated via parametric bootstrapping
In:
Applied Measurement in Education, 35 (2022) 2, pp. 116-132
DOI:
10.1080/08957347.2022.2067540
URL:
https://www.tandfonline.com/doi/full/10.1080/08957347.2022.2067540
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Rasch-Modell; Statistik; Methode; Verfahren; Stichprobe; Test; Analyse; Simulation
Abstract:
Testing item fit is central in item response theory (IRT) modeling, since good fit is necessary to draw valid inferences from estimated model parameters. Infit and outfit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors such as sample size. Consequently, the traditional use of fixed infit and outfit cutoff points is an ineffective practice. This article evaluates whether confidence intervals estimated via parametric bootstrapping provide more suitable cutoff points than the conventionally applied range of 0.8-1.2 and outfit critical ranges adjusted by sample size. The performance is evaluated under different sizes of misfit, sample sizes, and numbers of items. Results show that the confidence intervals performed better in terms of power but had inflated type-I error rates, which resulted from mean square values being pushed below unity in the conditions with large misfit. However, performing a one-sided test with the upper bound of the confidence intervals fixed the aforementioned inflation. (DIPF/Orig.)
DIPF department:
Lehr- und Lernqualität in Bildungseinrichtungen
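The parametric bootstrap procedure described in the abstract above can be sketched in a few lines: simulate response data from the fitted Rasch model, recompute infit and outfit for each replication, and use the empirical percentiles as item-specific cutoffs. This is a minimal illustration, not the article's simulation design; all parameter values are invented, and the estimated person and item parameters are treated as known.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical Rasch parameters (assumed known for this sketch;
# in practice they would be estimated from the observed data).
theta = rng.normal(0.0, 1.0, size=500)   # person abilities
b = np.linspace(-2.0, 2.0, 10)           # item difficulties

def rasch_prob(theta, b):
    """P(X=1) under the Rasch model, persons x items."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def infit_outfit(x, p):
    """Infit and outfit mean-square statistics per item."""
    w = p * (1.0 - p)                     # response variances
    z2 = (x - p) ** 2 / w                 # squared standardized residuals
    outfit = z2.mean(axis=0)              # unweighted mean square
    infit = ((x - p) ** 2).sum(axis=0) / w.sum(axis=0)  # information-weighted
    return infit, outfit

# Parametric bootstrap: simulate data from the fitted model and collect
# the distribution of the fit statistics under the hypothesis of model fit.
p = rasch_prob(theta, b)
B = 500
stats = np.empty((B, 2, len(b)))
for r in range(B):
    x = (rng.random(p.shape) < p).astype(float)
    stats[r, 0], stats[r, 1] = infit_outfit(x, p)

# Item-specific 95% cutoffs instead of a fixed 0.8-1.2 rule.
lo, hi = np.percentile(stats, [2.5, 97.5], axis=0)
print("infit 95% interval per item :", np.round(lo[0], 2), np.round(hi[0], 2))
print("outfit 95% interval per item:", np.round(lo[1], 2), np.round(hi[1], 2))
```

A one-sided variant, as discussed in the abstract, would flag an item only when its observed statistic exceeds the upper percentile.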
Record 41203
Authors:
Engelhardt, Lena; Naumann, Johannes; Goldhammer, Frank; Frey, Andreas; Horz, Holger; Hartig, Katja; Wenzel, S. Franziska C.
Title:
Development and evaluation of a framework for the performance-based testing of ICT skills
In:
Frontiers in Education, 6 (2021), Art. 668860
DOI:
10.3389/feduc.2021.668860
URL:
https://www.frontiersin.org/articles/10.3389/feduc.2021.668860/full
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Informations- und Kommunikationstechnologie; Praktische Fertigkeit; Wissen; Problemlösen; Textverständnis; Bildverstehen; Bewertung; Modell; Item; Entwicklung; Testvalidität; Itemanalyse; Rasch-Modell; Implementation; Evaluation; Test; Testverhalten; Schüler; Sekundarstufe I; Baden-Württemberg; Rheinland-Pfalz; Deutschland
Abstract (english):
This paper addresses the development of performance-based assessment items for ICT skills, that is, skills in dealing with information and communication technologies, a construct that is defined rather broadly and only operationally. Item development followed a construct-driven approach to ensure that test scores could be interpreted as intended. Specifically, ICT-specific knowledge as well as problem solving and the comprehension of text and graphics were defined as components of ICT skills and cognitive ICT tasks (i.e., accessing, managing, integrating, evaluating, creating). In order to capture the construct in a valid way, design principles for constructing the simulation environment and the response format were formulated. To empirically evaluate the very heterogeneous items and detect malfunctioning ones, item difficulties were analyzed and behavior-related indicators with item-specific thresholds were developed and applied. The difficulty scores of the 69 items, obtained from the Rasch model, fell within a comparable range for each cognitive task. Process indicators addressing time use and test-taker interactions were used to analyze whether most test-takers executed the intended processes, showed disengagement, or got lost among the items. Most items were capable of eliciting the intended behavior; for the few exceptions, conclusions for item revisions were drawn. The results affirm the utility of the proposed framework for developing and implementing performance-based items to assess ICT skills. (DIPF/Orig.)
DIPF department:
Lehr- und Lernqualität in Bildungseinrichtungen
Record 41439
Authors:
Heininger, Susanne Katharina; Baumgartner, Maria; Zehner, Fabian; Burgkart, Rainer; Söllner, Nina; Berberat, Pascal O.; Gartmeier, Martin
Title:
Measuring hygiene competence. The picture-based situational judgement test HygiKo
In:
BMC Medical Education, 21 (2021), Art. 410
DOI:
10.1186/s12909-021-02829-y
URL:
https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-021-02829-y
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Hygiene; Kompetenz; Testverfahren; Gesundheitswesen; Medizin; Student; Arzt; Medizinisches Personal; Situation; Bewertung; Vignette; Item-Response-Theory; Rasch-Modell
Abstract:
Background: With the onset of the COVID-19 pandemic at the beginning of 2020, the crucial role of hygiene in healthcare settings has once again become very clear. For diagnostic and didactic purposes, standardized and reliable tests suitable for assessing the competencies involved in "working hygienically" are required. However, existing tests usually use self-report questionnaires, which are suboptimal for this purpose. In the present study, we introduce the newly developed, competence-oriented HygiKo test instrument focusing on health-care professionals' hygiene competence and report empirical evidence regarding its psychometric properties. Methods: HygiKo is a situational judgement test (SJT) for assessing hygiene competence. The HygiKo test consists of twenty pictures (items); each item presents exactly one unambiguous hygiene lapse. For each item, test respondents are asked (1) whether they recognize a problem in the picture with respect to hygiene guidelines and, (2) if yes, to describe the problem in a short verbal response. Our sample comprised n = 149 health care professionals (79.1 % female; age: M = 26.7 years, SD = 7.3 years) working as clinicians or nurses. The written responses were rated by two independent raters with high agreement (α > 0.80), indicating high reliability of the measurement. We used item response theory (IRT) for further data analysis. Results: IRT analyses show that the HygiKo test is suitable for assessing hygiene competence and distinguishes between persons at different levels of ability for seventeen of the twenty items, especially in the range of low to medium person abilities. Hence, the HygiKo SJT provides a reliable, competence-oriented measure of hygiene competence. Conclusions: In its present form, the HygiKo test can be used to assess the hygiene competence of medical students, medical doctors, nurses, and trainee nurses in cross-sectional measurements. In order to broaden the difficulty spectrum of the current test, additional items with higher difficulty should be developed. The situational judgement test designed to assess hygiene competence can be helpful for testing and teaching the ability to work hygienically. Further validation research is needed. (DIPF/Orig.)
DIPF department:
Lehr- und Lernqualität in Bildungseinrichtungen
Record 40881
Authors:
Hahnel, Carolin; Eichmann, Beate; Goldhammer, Frank
Title:
Evaluation of online information in university students. Development and scaling of the screening instrument EVON
In:
Frontiers in Psychology, 11 (2020), Art. 562128
DOI:
10.3389/fpsyg.2020.562128
URN:
urn:nbn:de:0111-pedocs-232241
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-232241
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Deutschland; Internet; Informationskompetenz; Ressource; Glaubwürdigkeit; Relevanz; Bewertung; Test; Testentwicklung; Itemanalyse; Suchmaschine; Simulation; Technologiebasiertes Testen; Interview; Erhebungsinstrument; Evaluation; Student; Rasch-Modell; Empirische Untersuchung
Abstract:
As Internet sources provide information of varying quality, the ability to evaluate the relevance and credibility of online information is an indispensable prerequisite skill. Based on the assumption that competent individuals can use different properties of information to assess its relevance and credibility, we developed the EVON (evaluation of online information), an interactive computer-based test for university students. The developed instrument consists of eight items that assess the skill to evaluate online information in six languages. Within a simulated search engine environment, students are requested to select the most relevant and credible link for a respective task. To evaluate the developed instrument, we conducted two studies: (1) a pre-study for quality assurance and observation of the response process (cognitive interviews of n = 8 students) and (2) a main study aimed at investigating the psychometric properties of the EVON and its relation to other variables (n = 152 students). The results of the pre-study provided initial evidence for a theoretically sound test construction with regard to students' item-processing behavior. The results of the main study showed acceptable psychometric outcomes for a standardized screening instrument with a small number of items. The item design criteria affected item difficulty as intended, and students' choice to visit a website had an impact on their task success. Furthermore, the probability of task success was positively predicted by general cognitive performance and reading skill. Although the results uncovered a few weaknesses (e.g., a lack of difficult items), and efforts to validate the interpretation of EVON outcomes must continue, the overall results speak in favor of a successful test construction and provide a first indication that the EVON assesses students' skill in evaluating online information in search engine environments. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
Record 40525
Authors:
Hartig, Johannes; Köhler, Carmen; Naumann, Alexander
Title:
Using a multilevel random item Rasch model to examine item difficulty variance between random groups
In:
Psychological Test and Assessment Modeling, 62 (2020) 1, pp. 11-27
URL:
https://www.psychologie-aktuell.com/fileadmin/Redaktion/Journale/ptam-2020-1/02_Hartig.pdf
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Rasch-Modell; Mehrebenenanalyse; Methode; Leistungsfähigkeit; Vergleichsuntersuchung; Simulation
Abstract:
In educational assessments, item difficulties are typically assumed to be invariant across groups (e.g., schools or countries). We refer to variance of item difficulties at the group level, which violates this assumption, as random-group differential item functioning (RG-DIF). We examine the performance of three methods to estimate RG-DIF: (1) three-level generalized linear mixed models (GLMMs), (2) three-level GLMMs with anchor items, and (3) item-wise multilevel logistic regression (ML-LR) controlling for the estimated trait score. In a simulation study, the magnitude of RG-DIF and the covariance of the item difficulties at the group level were varied. When group-level effects were independent, all three methods performed well. With correlated DIF, estimated variances at the group level were biased for the full three-level GLMM and ML-LR. This bias was more pronounced for ML-LR than for the full three-level GLMM. Using a three-level GLMM with anchor items allowed unbiased estimation of RG-DIF.
DIPF department:
Bildungsqualität und Evaluation
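The notion of random-group DIF from the abstract above can be illustrated with a small simulation: one item's difficulty varies randomly across groups, and the group-level variance is recovered by subtracting the average sampling variance from the observed variance of per-group difficulty estimates. This is a simplified method-of-moments illustration, not the three-level GLMM estimation the article evaluates; all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

G, n = 100, 200          # number of groups, persons per group (made up)
sigma2_dif = 0.25        # true between-group variance of the item difficulty

# Random-group DIF: the item's difficulty varies randomly across groups.
b_g = rng.normal(0.0, np.sqrt(sigma2_dif), size=G)

def mle_difficulty(theta, x, iters=25):
    """Newton-Raphson MLE of a single Rasch item difficulty, treating the
    person abilities theta as known (possible here because we simulate)."""
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        info = (p * (1.0 - p)).sum()          # Fisher information
        b += (p.sum() - x.sum()) / info       # Newton step
    return b, 1.0 / info                      # estimate, sampling variance

est = np.empty(G)
samp_var = np.empty(G)
for g in range(G):
    theta = rng.normal(0.0, 1.0, size=n)
    p = 1.0 / (1.0 + np.exp(-(theta - b_g[g])))
    x = (rng.random(n) < p).astype(float)
    est[g], samp_var[g] = mle_difficulty(theta, x)

# Method of moments: observed variance of the estimates decomposes into
# the RG-DIF variance plus the average sampling variance.
sigma2_hat = est.var(ddof=1) - samp_var.mean()
print(f"true RG-DIF variance: {sigma2_dif:.3f}  recovered: {sigma2_hat:.3f}")
```

With many small groups, ignoring the sampling-variance correction would overstate the amount of RG-DIF, which is why the decomposition matters.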
Record 40527
Authors:
Rauch, Dominique; Hartig, Johannes
Title:
Interpretation von Testwerten in der Item-Response-Theorie (IRT)
In:
Moosbrugger, Helfried; Kelava, Augustin (Eds.): Testtheorie und Fragebogenkonstruktion, Berlin: Springer, 2020, pp. 411-424
DOI:
10.1007/978-3-662-61532-4_17
URL:
https://link.springer.com/chapter/10.1007%2F978-3-662-61532-4_17
Document type:
4. Contributions to edited volumes; edited volume (no special category)
Language:
German
Keywords:
Test; Wert; Testauswertung; Interpretation; Item-Response-Theory; Modell; Bildungsforschung; Empirische Forschung; Kompetenz; Definition; Rasch-Modell; Datenanalyse
Abstract:
This chapter deals with the application of IRT models in empirical educational research. Large-scale school achievement studies exploit specific advantages of IRT, for example to enable matrix sampling of test items, the construction of parallel test forms, and the development of computerized adaptive tests. A further key advantage of IRT models is the possibility of a criterion-referenced interpretation of IRT-based test scores. This becomes feasible through the joint location of item difficulties and person abilities on a common scale. If the Rasch model holds, individual test scores can be interpreted through their distances to item difficulties. So-called "competence levels" also build on this central property of Rasch models. To ease interpretation, the continuous scale is divided into sections (competence levels), which are then described as a whole in criterion-referenced terms. Using a shared example, the chapter illustrates the definition and description of competence levels both via a procedure based on post-hoc analyses of the items and via the use of a-priori task characteristics. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
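The criterion-referenced interpretation described in the abstract above fits in a few lines of code: with persons and items located on the joint Rasch scale, a person's score translates directly into solution probabilities, and cut scores partition the scale into competence levels. The cut points and level names below are invented for the sketch.

```python
import numpy as np

def p_solve(theta, b):
    """Rasch probability that a person with ability theta solves an item of
    difficulty b; theta == b corresponds to a 50% solution probability."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Hypothetical cut points dividing the continuous joint scale into levels.
cuts = [-1.0, 0.0, 1.0]                  # level boundaries (invented)
labels = ["I", "II", "III", "IV"]        # competence level names (invented)

def competence_level(theta):
    """Map a scale score to its competence level via the cut points."""
    return labels[int(np.searchsorted(cuts, theta, side="right"))]

# A person at theta = 0.3: solution probabilities for items of varying
# difficulty on the same scale, plus the assigned competence level.
theta = 0.3
for b in (-1.5, 0.3, 1.5):
    print(f"item b={b:+.1f}: P(solve)={p_solve(theta, b):.2f}")
print("competence level:", competence_level(theta))
```

The distance interpretation is visible directly: items at the person's own location are solved with probability .50, easier items with higher probability, harder items with lower probability.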
Record 37161
Authors:
Köhler, Carmen; Hartig, Johannes
Title:
Practical significance of item misfit in educational assessments
In:
Applied Psychological Measurement, 41 (2017) 5, pp. 388-400
DOI:
10.1177/0146621617692978
URN:
urn:nbn:de:0111-pedocs-156084
URL:
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156084
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Item-Response-Theory; Korrelation; Leistungsmessung; Rasch-Modell; Schülerleistung; Schülerleistungstest; Testkonstruktion; Testtheorie; Validität
Abstract:
Testing item fit is an important step when calibrating and analyzing item response theory (IRT)-based tests, as model fit is a necessary prerequisite for drawing valid inferences from estimated parameters. In the literature, numerous item fit statistics exist, sometimes leading to contradictory conclusions regarding which items should be excluded from a test. Recently, researchers have argued for shifting the focus from statistical item fit analyses to evaluating the practical consequences of item misfit. This article introduces a method to quantify the potential bias of relationship estimates (e.g., correlation coefficients) due to misfitting items. The potential deviation indicates whether item misfit is practically significant for the outcomes of substantive analyses. The method is demonstrated using data from an educational test. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
Record 37126
Authors:
Hartig, Johannes; Harsch, Claudia
Title:
Multidimensional structures of competencies. Focusing on text comprehension in English as a foreign language
In:
Leutner, Detlev; Fleischer, Jens; Grünkorn, Juliane; Klieme, Eckhard (Eds.): Competence assessment in education: Research, models and instruments, Cham: Springer, 2017 (Methodology of Educational Measurement and Assessment), pp. 357-368
DOI:
10.1007/978-3-319-50030-0_21
Document type:
4. Contributions to edited volumes; edited volume (no special category)
Language:
English
Keywords:
Deutschland; Englischunterricht; Englisch als Zweitsprache; Textverständnis; Hörverstehen; Item-Response-Theory; Itemanalyse; Schwierigkeit; Test; Schüler; Schuljahr 09; Text; Rezeption; Rasch-Modell
Abstract:
The project "Modeling competencies with multidimensional item-response-theory models" examined different psychometric models for student performance in English as a foreign language. On the basis of re-analyses of data from completed large-scale assessments, a new test of reading and listening comprehension was constructed. The items within this test use the same text material for both reading and listening tasks, allowing a closer examination of the relations between the abilities required for comprehending written and spoken texts. Furthermore, item characteristics (e.g., cognitive demands and response format) were systematically varied, allowing us to disentangle the effects of these characteristics on item difficulty and dimensional structure. This chapter presents results on the properties of the newly developed test: both reading and listening comprehension can be measured reliably (rel = .91 for reading and .86 for listening). The abilities in the two sub-domains prove to be highly correlated yet empirically distinguishable, with a latent correlation of .84. Although the listening items are more difficult in terms of the absolute number of correct answers, the difficulties of the same items in the reading and listening versions are highly correlated (r = .84). Implications of the results for measuring language competencies in educational contexts are discussed. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
Record 35734
Authors:
Wittmann, Eveline; Weyland, Ulrike; Kaspar, Roman; Döring, Ottmar; Hartig, Johannes; Nauerth, Annette; Rechenbach, Simone; Möllers, Michaela; Simon, Julia; Worofka, Iberé
Title:
Betriebliche Ausbildungsmerkmale und berufsfachliche Handlungskompetenz in der Altenpflege
In:
Zeitschrift für Berufs- und Wirtschaftspädagogik, 111 (2015) 3, pp. 359-378
Document type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
German
Keywords:
Altenpflege; Auszubildender; Berufsausbildung; Betriebliche Ausbildung; Bildungsinhalt; Deutschland; Handlungskompetenz; Praxisbezug; Rasch-Modell; Regressionsanalyse; Test
Abstract (english):
Using multi-level regression models, this contribution examines the relationship between the practical training conditions perceived by geriatric nursing students and their scores in a psychometric test of domain-specific competence in immediately client-related care for older people. In addition to quality features of practical training described in the literature, we analyze the task content of the practical training sites as well as, against the background of theoretical assumptions about the learning effects of coordinating content between training sites and schools, the congruence of training-site tasks with the content taught at the nursing schools. In a cross-sectional design, 402 geriatric nursing students from 24 classes were surveyed and tested at the end of their vocational training. The results are only partially in line with expectations. The findings suggest paying more attention to proximal features of training at the practical sites, such as the range of task content experienced and the qualitative engagement with specific task content, rather than to global measures of training quality. (DIPF/Orig.)
DIPF department:
Bildungsqualität und Evaluation
Record 35623
Authors:
Boubekki, Ahcène; Brefeld, Ulf; Delacroix, Thomas
Title:
Generalising IRT to discriminate between examinees
In:
Santos, O. C.; Boticario, J. G.; Romero, C.; Pechenizkiy, M.; Merceron, A.; Mitros, P.; Luna, J. M.; Mihaescu, C.; Moreno, P.; Hershkovitz, A.; Ventura, S.; Desmarais, M. (Eds.): Proceedings of the 8th International Conference on Educational Data Mining (EDM 2015), 26-29 June 2015, Madrid, Spain, Madrid: International Educational Data Mining Society, 2015, pp. 604-606
URL:
http://www.educationaldatamining.org/EDM2015/proceedings/edm2015_proceedings.pdf
Document type:
4. Contributions to edited volumes; conference proceedings
Language:
English
Keywords:
Item-Response-Theory; Leistungstest; PISA <Programme for International Student Assessment>; Rasch-Modell; Testauswertung; Testkonstruktion
Abstract:
We present a generalisation of the IRT framework that makes it possible to discriminate between examinees. To this end, our model introduces examinee parameters that can be optimised with Expectation-Maximisation-like algorithms. We provide empirical results on PISA data showing that our approach leads to a more appropriate grouping of PISA countries than grouping by test scores or socio-economic indicators. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung