Results of a search in the DIPF publication database
Your query:
(Keywords: "Linguistik")
115 items found
Authors:
Habernal, Ivan; Daxenberger, Johannes; Gurevych, Iryna
Title:
Mass collaboration on the web. Textual content analysis by means of natural language processing
In:
Cress, Ulrike; Moskaliuk, Johannes; Jeong, Heisawn (Eds.): Mass collaboration and education, Cham: Springer, 2016, pp. 367-390
DOI:
10.1007/978-3-319-13536-6_18
Document type:
4. Contributions to edited volumes; edited volume (no special category)
Language:
English
Keywords:
Argumentation; Computational linguistics; Data mining; Data; Content analysis; Text; Web log; Wiki; Knowledge
Abstract:
This chapter describes perspectives for utilizing natural language processing (NLP) to analyze artifacts arising from mass collaboration on the web. In recent years, the amount of user-generated content on the web has grown drastically. This content is typically noisy and unstructured, or at best semi-structured, so traditional analysis tools cannot handle it properly, and manual analysis of such large quantities of data to discover linguistic structures is not feasible. In this chapter, we explain and analyze web-based resources of mass collaboration, namely wikis, web forums, debate platforms, and blog comments. We introduce recent advances and ongoing efforts to analyze textual content in two of these resources with the help of NLP. This includes an approach to discover flows of knowledge in online mass collaboration as well as methods to mine argumentative structures in natural language text. Finally, we outline application scenarios for the previously discussed techniques and resources within the domain of education. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
Authors:
Habernal, Ivan; Gurevych, Iryna
Title:
What makes a convincing argument? Empirical analysis and detecting attributes of convincingness in Web argumentation
In:
Association for Computational Linguistics (Eds.): Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Stroudsburg, PA: Association for Computational Linguistics, 2016, pp. 1214-1223
URL:
https://aclweb.org/anthology/D16-1129
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Argumentation; Evaluation; Computational linguistics; Classification; Quality; Persuasion
Abstract (English):
This article tackles a challenging new task in computational argumentation. Given a pair of arguments on a controversial topic, we aim to directly assess qualitative properties of the arguments in order to explain why one argument is more convincing than the other. We approach this task in a fully empirical manner by annotating 26k explanations written in natural language. These explanations describe the convincingness of the arguments in a given pair, such as their strengths or flaws. We create a new crowd-sourced corpus containing 9,111 argument pairs, multi-labeled with 17 classes, which was cleaned and curated by employing several strict quality measures. We propose two tasks on this dataset, namely (1) predicting the full label distribution and (2) classifying the types of flaws in less convincing arguments. Our experiments with feature-rich SVM learners and bidirectional LSTM neural networks with convolution and attention mechanisms reveal that such a novel fine-grained analysis of Web argument convincingness is a very challenging task. We release the new UKPConvArg2 corpus and software to the research community under permissive licenses. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
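The pairwise setup described in this abstract, deciding which of two arguments is more convincing, can be illustrated with a minimal classifier. This is only a sketch of the paper's feature-rich SVM baseline idea: the pair data and the simple TF-IDF features are invented placeholders, not the UKPConvArg2 setup.

# Minimal sketch: classify which argument in a pair is more convincing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from scipy.sparse import hstack

# Toy data: label 0 = first argument judged more convincing, 1 = second.
pairs = [
    ("We should ban X because studies show clear harm.", "Ban X, it is bad."),
    ("X is fine.", "Keeping X is justified: the evidence of harm is weak."),
]
labels = [0, 1]

vec = TfidfVectorizer().fit([t for pair in pairs for t in pair])
# Concatenate the TF-IDF vectors of both arguments as the pair features.
X = hstack([vec.transform([a for a, _ in pairs]),
            vec.transform([b for _, b in pairs])])
clf = LinearSVC().fit(X, labels)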
Authors:
Habernal, Ivan; Zayed, Omnia; Gurevych, Iryna
Title:
C4Corpus. Multilingual web-size corpus with free license
In:
European Language Resources Association (Eds.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož: European Language Resources Association, 2016, pp. 914-922
URL:
https://www.informatik.tu-darmstadt.de/de/forschung/veroeffentlichungen/details/?no_cache=1&tx_bibtex_pi1%5Bpub_id%5D=TUD-CS-2016-0023
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Computational linguistics; Data analysis; Document; Internet; Text; Text analysis; Copyright
Abstract:
Large Web corpora containing full documents with permissive licenses are crucial for many NLP tasks. In this article we present the construction of a 12-million-page Web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs. Our highly scalable Hadoop-based framework is able to process the full CommonCrawl corpus on a 2000+ CPU cluster on the Amazon Elastic Map/Reduce infrastructure. The processing pipeline includes license identification, state-of-the-art boilerplate removal, exact-duplicate and near-duplicate document removal, and language detection. The construction of the corpus is highly configurable and fully reproducible, and we provide both the framework (DKPro C4CorpusTools) and the resulting data (C4Corpus) to the research community. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
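One stage of the pipeline named in this abstract, exact-duplicate removal, reduces to hashing document contents. The sketch below shows only that core idea on an in-memory list; the actual DKPro C4CorpusTools framework runs this at CommonCrawl scale on Hadoop, which this sketch does not attempt.

# Sketch: exact-duplicate removal by content hashing (toy, in-memory).
import hashlib

def dedup_exact(docs):
    seen, unique = set(), []
    for text in docs:
        h = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if h not in seen:          # keep only the first copy of each document
            seen.add(h)
            unique.append(text)
    return unique

print(dedup_exact(["page one", "page two", "page one"]))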
Authors:
Mencía, Eneldo Loza; De Melo, Gerard; Nam, Jinseok
Title:
Medical concept embeddings via labeled background corpora
In:
European Language Resources Association (Eds.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož: European Language Resources Association, 2016, pp. 3629-3636
URL:
http://www.lrec-conf.org/proceedings/lrec2016/pdf/1190_Paper.pdf
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Algorithm; Automation; Computational linguistics; Medicine; Semantics; Language; Text analysis
Abstract:
In recent years, we have seen increasing interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The best-known examples of algorithms producing representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Given the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any system relying on medical concept similarity measures. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
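The downstream use the abstract points to, scoring similarity between medical concepts from their embeddings, is typically just cosine similarity. A minimal sketch follows; the random vectors are placeholders standing in for the paper's trained concept embeddings.

# Sketch: concept similarity as cosine similarity of embedding vectors.
import numpy as np

rng = np.random.default_rng(0)
emb = {"aspirin": rng.random(50), "ibuprofen": rng.random(50)}  # placeholders

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["aspirin"], emb["ibuprofen"]))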
Authors:
Mousselly-Sergieh, Hatem; Gurevych, Iryna
Title:
Enriching wikidata with frame semantics
In:
Association for Computational Linguistics (Eds.): Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016, held in conjunction with NAACL 2016, Stroudsburg, PA: Association for Computational Linguistics, 2016, pp. 29-34
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/2016_NAACL_AKBC_HMS.pdf
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Lexicon; Multilingualism; Online; Semantics
Abstract (English):
Wikidata is a large-scale, multilingual, and freely available knowledge base. It contains more than 14 million facts; however, it still lacks linguistic information. In this paper, we aim to bridge this gap by aligning Wikidata with the FrameNet lexicon. We propose an approach based on word embeddings to identify a mapping between Wikidata relations, called properties, and FrameNet frames, and to annotate the arguments of each relation with the semantic roles of the matching frames. Early empirical results show the advantage of our approach compared to other baseline methods. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
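The core alignment idea in this abstract, matching each Wikidata property to its nearest FrameNet frame in an embedding space, fits in a few lines. This is a sketch only: the vectors are random placeholders, whereas the paper derives them from trained word embeddings of property and frame labels.

# Sketch: map each property to the frame with the highest cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
prop_vecs = {"spouse": rng.random(100), "employer": rng.random(100)}
frame_vecs = {"Personal_relationship": rng.random(100),
              "Employment": rng.random(100)}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

mapping = {p: max(frame_vecs, key=lambda f: cosine(v, frame_vecs[f]))
           for p, v in prop_vecs.items()}
print(mapping)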
Authors:
Remus, Steffen; Biemann, Chris
Title:
Domain-specific corpus expansion with focused webcrawling
In:
European Language Resources Association (Eds.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož: European Language Resources Association, 2016, pp. 3607-3611
URL:
http://www.lrec-conf.org/proceedings/lrec2016/pdf/316_Paper.pdf
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Algorithm; Automation; Education; Computational linguistics; Data mining; Hypertext; Model; Language; Text; Text analysis
Abstract:
This work presents a straightforward method for extending or creating in-domain web corpora by focused webcrawling. The focused webcrawler uses statistical N-gram language models to estimate the relatedness of documents and weblinks, and needs as input only N-grams or plain texts of a predefined domain and seed URLs as starting points. Two experiments demonstrate that our focused crawler is able to stay focused with respect to both domain and language. The first experiment shows that the crawler stays in the target domain; the second demonstrates that language models trained on focused crawls obtain better perplexity scores on in-domain corpora. We distribute the focused crawler as open-source software. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
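The focusing criterion the abstract describes, deciding from N-gram evidence whether a fetched page belongs to the target domain before following its links, can be sketched as below. The overlap score and threshold are invented simplifications; the actual crawler uses proper statistical language models rather than raw N-gram set overlap.

# Sketch: keep crawling from a page only if its trigrams overlap the domain.
def ngrams(tokens, n=3):
    return {tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)}

def in_domain(page_text, domain_ngrams, threshold=0.05):
    grams = ngrams(page_text.lower().split())
    if not grams:
        return False
    return len(grams & domain_ngrams) / len(grams) >= threshold

domain = ngrams("large corpora help language model training for the education domain".split())
print(in_domain("Language model training data for the education domain is scarce.", domain))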
Authors:
Sukhareva, Maria; Eckle-Kohler, Judith; Habernal, Ivan; Gurevych, Iryna
Title:
Crowdsourcing a large dataset of domain-specific context-sensitive semantic verb relations
In:
European Language Resources Association (Eds.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož: European Language Resources Association, 2016, pp. 2131-2137
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/lrec2016_sukhareva.pdf
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Data mining; Classification; Semantics; Text analysis
Abstract (English):
We present a new large dataset of 12,403 context-sensitive verb relations, manually annotated via crowdsourcing. These relations capture fine-grained semantic information between verb-centric propositions, such as temporal or entailment relations. We propose a novel semantic verb relation scheme and design a multi-step annotation approach for scaling up the annotation using crowdsourcing. We employ several quality measures and report agreement scores. The resulting dataset is available under a permissive Creative Commons license. It represents a valuable resource for various applications, such as automatic information consolidation or automatic summarization. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
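One common quality measure in crowd annotation pipelines like the one this abstract describes is label aggregation with an agreement cutoff. The sketch below shows simple majority voting with a threshold; the threshold value and data are invented, and the paper's actual quality measures may differ.

# Sketch: aggregate crowd labels per item, keeping only high-agreement items.
from collections import Counter

def aggregate(labels_per_item, min_agreement=0.7):
    kept = {}
    for item, labels in labels_per_item.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            kept[item] = label
    return kept

print(aggregate({"pair_1": ["entailment", "entailment", "temporal"],
                 "pair_2": ["temporal", "entailment", "other"]}))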
Authors:
Veeranna, Sappadla Prateek; Nam, Jinseok; Mencía, Eneldo Loza; Fürnkranz, Johannes
Title:
Using semantic similarity for multi-label zero-shot classification of text documents
In:
European Symposium on Artificial Neural Networks (Eds.): ESANN 2016 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium), 27-29 April 2016, Bruges: European Symposium on Artificial Neural Networks, 2016, pp. 423-428
URL:
https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-174.pdf
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Computational linguistics; Classification; Semantics; Text
Abstract (English):
In this paper, we examine a simple approach to zero-shot multi-label text classification, i.e., the problem of predicting multiple, possibly previously unseen labels for a document. In particular, we propose to use a semantic embedding of label and document words, and to base the prediction of previously unseen labels on the similarity between the label name and the document words in this embedding. Experiments on three textual datasets across various domains show that even such a simple technique yields considerable performance improvements over an uninformed baseline. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
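The zero-shot idea in this abstract, scoring an unseen label by the similarity between its name and the document's words in a shared embedding space, is simple enough to sketch directly. The random vectors below are placeholders for pretrained word embeddings, and the averaged document vector is one plausible reading of "document words".

# Sketch: score unseen labels by cosine similarity to the document vector.
import numpy as np

rng = np.random.default_rng(1)
word_vecs = {w: rng.random(50)
             for w in ["gene", "protein", "biology", "court", "law"]}

def doc_vec(words):
    return np.mean([word_vecs[w] for w in words if w in word_vecs], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

doc = ["gene", "protein"]
for label in ["biology", "law"]:   # unseen labels, known only by name
    print(label, cosine(word_vecs[label], doc_vec(doc)))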
Authors:
Burrows, Steven; Gurevych, Iryna; Stein, Benno
Title:
The eras and trends of automatic short answer grading
In:
International Journal of Artificial Intelligence in Education, 25 (2015) 1, pp. 60-117
DOI:
10.1007/s40593-014-0026-8
URL:
http://link.springer.com/article/10.1007/s40593-014-0026-8
Document type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Automation; Computational linguistics; Computer-assisted methods; Question; Performance assessment; Method; Grading; Technology-based testing; Test item
Abstract:
Automatic short answer grading (ASAG) is the task of assessing short natural language responses to objective questions using computational methods. Active research in this field has increased enormously of late, with over 80 papers fitting a definition of ASAG. However, past efforts have generally been ad hoc and non-comparable until recently, hence the need for a unified view of the whole field. The goal of this paper is to address this aim with a comprehensive review of ASAG research and systems according to history and components. Our historical analysis identifies 35 ASAG systems within 5 temporal themes that mark advancements in methodology or evaluation. In contrast, our component analysis reviews 6 common dimensions from preprocessing to effectiveness. A key conclusion is that an era of evaluation is the newest trend in ASAG research, which is paving the way for the consolidation of the field. (DIPF/Orig.)
DIPF department:
Informationszentrum Bildung
Authors:
Flekova, Lucie; Ruppert, Eugen; Preotiuc-Pietro, Daniel
Title:
Analyzing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
In:
Association for Computational Linguistics (Eds.): 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015): Workshop proceedings, 17 September 2015, Lisboa, Portugal, Red Hook, NY: Association for Computational Linguistics, 2015, pp. 77-84
URL:
http://www.emnlp2015.org/proceedings/WASSA/WASSA-2015.pdf#page=89
Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Emotion; Communication; Lexicography; Lexicon; Online; Quality; Social software; Text analysis; Thesaurus
Abstract:
Contemporary sentiment analysis approaches rely on lexicon-based methods, mainly because of their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams in which a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information, leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon-based sentiment detection and can further be used to quantify ambiguous words. (DIPF/Orig.)
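The bigram scoring the abstract names, Lexicographer's Mutual Information, is commonly defined as LMI(a, b) = count(a, b) * log2(P(a, b) / (P(a) * P(b))), i.e., pointwise mutual information weighted by the joint count. The sketch below computes it on toy counts; the corpus size and counts are invented, and the paper's full method additionally compares polarity across domains.

# Sketch: Lexicographer's Mutual Information for a candidate bipolar bigram.
import math

N = 1_000_000                                        # total tokens (toy)
count = {"cold": 5000, "beer": 800, ("cold", "beer"): 300}

def lmi(a, b):
    p_ab = count[(a, b)] / N
    p_a, p_b = count[a] / N, count[b] / N
    return count[(a, b)] * math.log2(p_ab / (p_a * p_b))

print(lmi("cold", "beer"))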