Search results in the DIPF database of publications
Your query:
(Keywords: "Nachschlagewerk")
24 items matching your search terms.
Record ID: 37229
Author(s):
Müller, Lars
Title:
Historische Begriffe der Erziehungswissenschaft. Erzeugung einer Ontologie
In:
Digital Humanities im deutschsprachigen Raum (Eds.): DHd 2016: Modellierung - Vernetzung - Visualisierung; Die Digital Humanities als fächerübergreifendes Paradigma, Konferenzabstracts, Universität Leipzig, 7. bis 12. März 2016, Duisburg: nisaba, 2016, pp. 352-354
URL:
http://dhd2016.de/boa.pdf#page=352
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
German
Keywords:
Erziehungswissenschaft; Bildungsgeschichte; Nachschlagewerk; Begriff; Terminologie; Ontologie
Abstract:
Historical terms of the emerging discipline of educational science are to be made available as a machine-readable terminology for digital research on the history of education. To this end, bibliographic records of lemmata from 24 historical educational reference works (1774-1942) are transformed into an ontology. (DIPF/Orig.)
DIPF-Departments:
Bibliothek für Bildungsgeschichtliche Forschung
Record ID: 34974
Author(s):
Matuschek, Michael; Gurevych, Iryna
Title:
High performance word sense alignment by joint modeling of sense distance and gloss similarity
In:
Tsujii, Junichi; Hajic, Jan (Eds.): Proceedings of COLING 2014: Technical papers, Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 245-256
URL:
http://www.aclweb.org/anthology/C/C14/C14-1025.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Algorithmus; Automatisierung; Computerlinguistik; Nachschlagewerk; Online; Semantik; Sinn; Wort
Abstract:
In this paper, we present a machine learning approach for word sense alignment (WSA) which combines distances between senses in the graph representations of lexical-semantic resources with gloss similarities. In this way, we significantly outperform the state of the art on each of the four datasets we consider. Moreover, we present two novel datasets for WSA between Wiktionary and Wikipedia in English and German. The latter dataset is not only of unprecedented size, but also created by the large community of Wiktionary editors instead of expert annotators, making it an interesting subject of study in its own right as the first crowdsourced WSA dataset. We will make both datasets freely available along with our computed alignments. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
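The abstract above combines distances between senses in a resource graph with similarities between their glosses. The paper's actual features and learner are not reproduced here; the following is a rough, hypothetical sketch of how the two signals can be combined (the function names, the BFS distance, the Jaccard gloss measure, and the equal weighting are illustrative assumptions, not the authors' method):

```python
from collections import deque

def graph_distance(adj, start, goal):
    """Shortest-path distance between two senses in a resource graph (BFS)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")  # senses not connected in the graph

def gloss_similarity(gloss_a, gloss_b):
    """Jaccard overlap of the two glosses' token sets (a simple stand-in)."""
    a, b = set(gloss_a.lower().split()), set(gloss_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def alignment_score(adj, s1, s2, gloss1, gloss2, w_dist=0.5, w_gloss=0.5):
    """Combine inverse graph distance with gloss similarity into one score."""
    dist_feature = 1.0 / (1.0 + graph_distance(adj, s1, s2))
    return w_dist * dist_feature + w_gloss * gloss_similarity(gloss1, gloss2)
```

In the paper these two signals feed a trained classifier rather than a fixed weighted sum; the sketch only illustrates why the signals complement each other (close-in-graph senses can have dissimilar glosses and vice versa).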
Record ID: 34575
Author(s):
Cholakov, Kostadin; Biemann, Chris; Eckle-Kohler, Judith; Gurevych, Iryna
Title:
Lexical substitution dataset for German
In:
Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Loftsson, Hrafn; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Eds.): Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik: European Language Resources Association, 2014, pp. 1406-1411
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/545_Paper.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Computerlinguistik; Computerunterstütztes Verfahren; Daten; Deutsch; Nachschlagewerk; Online; Sprachanalyse; Synonym; Textanalyse; World wide web 2.0; Wort
Abstract:
This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences have been annotated by 4 professional annotators and the remaining sentences by 1 professional annotator and 5 additional annotators who have been recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems, but also different sense inventories and word sense disambiguation systems.
DIPF-Departments:
Informationszentrum Bildung
Record ID: 34577
Author(s):
Daxenberger, Johannes; Gurevych, Iryna
Title:
Automatically detecting corresponding edit-turn-pairs in Wikipedia
In:
Association for Computational Linguistics (Eds.): Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Short Papers, Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 187-192
URL:
http://anthology.aclweb.org//P/P14/P14-2031.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Automatisierung; Computerunterstütztes Verfahren; Information; Nachschlagewerk; Online; Soziale Software; Textanalyse; Wissen; World wide web 2.0
Abstract:
In this study, we analyze links between edits in Wikipedia articles and turns from their discussion page. Our motivation is to better understand implicit details about the writing process and knowledge flow in collaboratively created resources. Based on properties of the involved edit and turn, we have defined constraints for corresponding edit-turn-pairs. We manually annotated a corpus of 636 corresponding and non-corresponding edit-turn-pairs. Furthermore, we show how our data can be used to automatically identify corresponding edit-turn-pairs. With the help of supervised machine learning, we achieve an accuracy of .87 for this task.
DIPF-Departments:
Informationszentrum Bildung
Record ID: 34576
Author(s):
Flekova, Lucie; Ferschke, Oliver; Gurevych, Iryna
Title:
What makes a good biography? Multidimensional quality analysis based on Wikipedia article feedback data
In:
IW3C2 (Eds.): Proceedings of the 23rd International World Wide Web Conference (WWW 2014), Geneva: International World Wide Web Conferences Steering Committee, 2014, pp. 855-866
DOI:
10.1145/2566486.2567972
URL:
http://dl.acm.org/citation.cfm?doid=2566486.2567972
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Bewertung; Biografie; Feedback; Information; Information Retrieval; Inhaltsanalyse; Nachschlagewerk; Online; Qualität; Qualitätssicherung; World wide web 2.0
Abstract:
With more than 22 million articles, the largest collaborative knowledge resource never sleeps, experiencing several article edits every second. Over one fifth of these articles describes individual people, the majority of which are still alive. Such articles are, by their nature, prone to corruption and vandalism. Manual quality assurance by experts can barely cope with this massive amount of data. Can it be effectively replaced by feedback from the crowd? Can we provide meaningful support for quality assurance with automated text processing techniques? Which properties of the articles should then play a key role in the machine learning algorithms and why? In this paper, we study the user-perceived quality of Wikipedia articles based on a novel Wikipedia user feedback dataset. In contrast to previous work on quality assessment which mostly relied on judgements of active Wikipedia authors, we analyze ratings of ordinary Wikipedia users along four quality dimensions (complete, well written, trustworthy and objective). We first present an empirical analysis of the novel dataset with over 36 million Wikipedia article ratings. We then select a subset of biographical articles and perform classification experiments to predict their quality ratings along each of the dimensions, exploring multiple linguistic, surface and network properties of the rated articles. Additionally, we study the classification performance and differences for the biographies of living and dead people as well as those for men and women. We demonstrate the effectiveness of our approach by the F1 scores of 0.94, 0.89, 0.73, and 0.73 for the dimensions complete, well written, trustworthy, and objective. Based on the results, we believe that the quality assessment of big textual data can be effectively supported by current text classification and language processing tools.
DIPF-Departments:
Informationszentrum Bildung
Record ID: 34574
Author(s):
Miller, Tristan; Gurevych, Iryna
Title:
WordNet-Wikipedia-Wiktionary. Construction of a three-way alignment
In:
Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Loftsson, Hrafn; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Eds.): Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik: European Language Resources Association, 2014, pp. 2094-2100
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/4_Paper.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Analyse; Computerlinguistik; Evaluation; Nachschlagewerk; Online; Semantik; World wide web 2.0; Wort; Wörterbuch
Abstract:
The coverage and quality of conceptual information contained in lexical semantic resources is crucial for many tasks in natural language processing. Automatic alignment of complementary resources is one way of improving this coverage and quality; however, past attempts have always been between pairs of specific resources. In this paper we establish some set-theoretic conventions for describing concepts and their alignments, and use them to describe a method for automatically constructing n-way alignments from arbitrary pairwise alignments. We apply this technique to the production of a three-way alignment from previously published WordNet-Wikipedia and WordNet-Wiktionary alignments. We then present a quantitative and informal qualitative analysis of the aligned resource. The three-way alignment was found to have greater coverage, an enriched sense representation, and coarser sense granularity than both the original resources and their pairwise alignments, though this came at the cost of accuracy. An evaluation of the induced word sense clusters in a word sense disambiguation task showed that they were no better than random clusters of equivalent granularity. However, use of the alignments to enrich a sense inventory with additional sense glosses did significantly improve the performance of a baseline knowledge-based WSD algorithm.
DIPF-Departments:
Informationszentrum Bildung
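The abstract above describes constructing an n-way alignment from arbitrary pairwise alignments. One hedged way to sketch the core merging step is to treat each aligned sense pair as an edge and take connected components with a union-find structure; the sense identifiers and function names below are invented for illustration and are not the authors' code:

```python
class UnionFind:
    """Minimal union-find (disjoint-set) over arbitrary hashable sense IDs."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def three_way_clusters(wn_wp, wn_wkt):
    """Merge two pairwise alignments (lists of aligned sense-ID pairs)
    into n-way concept clusters by taking connected components."""
    uf = UnionFind()
    for pair in wn_wp + wn_wkt:
        uf.union(*pair)
    clusters = {}
    for sense in uf.parent:
        clusters.setdefault(uf.find(sense), set()).add(sense)
    return list(clusters.values())
```

Note that taking components assumes alignment is transitive; as the abstract points out, this coarsens sense granularity and can cost accuracy when two pairwise links chain together senses that are not actually equivalent.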
Record ID: 32811
Editor(s):
Gurevych, Iryna; Kim, Jungi
Title:
The people's web meets NLP. Collaboratively Constructed Language Resources
Published:
Dordrecht: Springer, 2013 (Theory and applications of natural language processing)
DOI:
10.1007/978-3-642-35085-6
URL:
https://link.springer.com/book/10.1007/978-3-642-35085-6
Publication Type:
2. Editorship; edited volume (no special category)
Language:
English
Keywords:
Automatisierung; Computerlinguistik; Computerspiel; Data Mining; Forschung; Gemeinschaft; Indexierung; Kooperation; Mehrsprachigkeit; Methodologie; Nachschlagewerk; Ontologie; Schreiben; Semantic Web; Soziale Software; Sprachanalyse; Sprache; Textanalyse; Textverarbeitung; Wissen; World wide web 2.0
Abstract (English):
The application of collective intelligence in the domain of language has yielded collaboratively constructed language resources (CCLRs) that can be used in a variety of ways. For example, Wikipedia, Wiktionary, and other language resources constructed through crowdsourcing, such as Games with a Purpose and Mechanical Turk, have found many uses in NLP. Researchers have started using such resources to substitute for or supplement conventional lexical semantic resources, such as WordNet or linguistically annotated corpora, in different NLP tasks. Another research direction is to utilize NLP techniques to enhance the collaboration process and its outcome. Overall, the emergence of CCLRs has generated new challenges for the research field, which are addressed in the present book. As the research field of CCLRs matures, it has become necessary to summarize a set of results in order to advance and focus further research efforts.
DIPF-Departments:
Informationszentrum Bildung
Record ID: 34053
Author(s):
Daxenberger, Johannes; Gurevych, Iryna
Title:
Automatically classifying edit categories in Wikipedia revisions
In:
Yarowsky, David; Baldwin, Timothy; Korhonen, Anna; Livescu, Karen; Bethard, Steven (Eds.): Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 578-589
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/EMNLP2013_DaxenbergerGurevych.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Automatisierung; Computerlinguistik; Computerunterstütztes Verfahren; Evaluation; Korrektur; Nachschlagewerk; Qualität; Taxonomie; Textanalyse; World wide web 2.0
Abstract:
In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of .62 on a corpus of edits from the English Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically labeled edits in the revision histories of different Wikipedia articles. Our results suggest that high-quality articles show a higher degree of homogeneity with respect to their collaboration patterns as compared to random articles.
DIPF-Departments:
Informationszentrum Bildung
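The micro-averaged F1 score of .62 reported in the abstract pools true positives, false positives, and false negatives across all 21 labels before computing precision and recall, which suits multi-labeled edits. A minimal sketch of that metric (the label names in the usage are invented for illustration):

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label classification.

    gold, pred: parallel lists of label sets, one set per instance.
    Counts are pooled over all instances and labels before computing
    precision and recall, so frequent labels dominate the score.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but wrong
        fn += len(g - p)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For example, with gold labels `[{"spelling"}, {"vandalism", "paraphrase"}]` and predictions `[{"spelling"}, {"vandalism"}]`, the pooled counts give precision 1.0 and recall 2/3, hence F1 = 0.8.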
Record ID: 33527
Author(s):
Ferschke, Oliver; Gurevych, Iryna; Rittberger, Marc
Title:
The impact of topic bias on quality flaw prediction in Wikipedia
In:
Association for Computational Linguistics (Eds.): 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference System Demonstrations, Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 721-730
URN:
urn:nbn:de:0111-dipfdocs-184570
URL:
http://www.dipfdocs.de/volltexte/2020/18457/pdf/The_impact_of_topic_bias_on_quality_flaw_prediction_in_Wikipedia_A.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Algorithmus; Computerunterstütztes Verfahren; Evaluation; Nachschlagewerk; Online; Qualität; Qualitätssicherung; Reliabilität; Soziale Software; Standard; World wide web 2.0
Abstract:
With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
DIPF-Departments:
Informationszentrum Bildung
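The abstract above describes extracting training instances whose topic distribution is similar to that of the labeled articles, so the classifier learns the flaw rather than the topic. A hedged, simplified sketch of such topic-matched sampling using cosine similarity over topic vectors (the threshold, vector layout, and function names are assumptions, not taken from the paper):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def topic_matched_sample(candidates, target_topics, threshold=0.9):
    """Keep only candidate training instances whose topic vector is close
    to the aggregate topic distribution of the labeled (flawed) articles.

    candidates: list of (doc_id, topic_vector) pairs.
    """
    return [doc for doc, topics in candidates
            if cosine(topics, target_topics) >= threshold]
```

Filtering this way trades training-set size for a topic distribution that matches the labeled data, which is the point the abstract makes about avoiding a skewed classifier and overly optimistic evaluation.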
Record ID: 34044
Author(s):
Kim, Jungi; Gurevych, Iryna
Title:
UKP at CrossLink2. CJK-to-English Subtasks
In:
Kando, Noriko; Kishida, Kazuaki (Eds.): Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo: NTCIR, 2013, pp. 57-61
URL:
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings10/pdf/NTCIR/CrossLink-2/05-NTCIR10-CROSSLINK2-KimJ.pdf
Publication Type:
4. Contributions in edited volumes; conference proceedings
Language:
English
Keywords:
Automatisierung; Computerlinguistik; Computerunterstütztes Verfahren; Information Retrieval; Mehrsprachigkeit; Nachschlagewerk; Online; Sprachanalyse
Abstract:
This paper describes UKP's participation in the cross-lingual link discovery task at NTCIR-10 (CrossLink2). The task addressed in our work is to find valid anchor texts in a Chinese, Japanese, or Korean (CJK) Wikipedia page and retrieve the corresponding target Wiki pages in the English language. The CrossLink framework was developed based on our previous CrossLink system, which worked in the opposite direction of the language pairs, i.e. it discovered anchor texts in English Wikipedia pages and their corresponding targets in CJK languages. The framework consists of anchor selection, anchor ranking, anchor translation, and target discovery sub-modules. Each sub-module in the framework has been shown to work well both in monolingual settings and for English-to-CJK language pairs. We seek to find out whether the approach that worked very well for English to CJK would still work for CJK to English. We use the same experimental settings that were used in our previous participation, and our experimental runs show that the CJK-to-English CrossLink task is a much harder task when using the same resources as the English-to-CJK one.
DIPF-Departments:
Informationszentrum Bildung