Search results from the DIPF publication database

Your query:

(Keywords: "Text analysis")

Generating training data for semantic role labeling based on label transfer from linked lexical […] | Hartmann, Silvana; Eckle-Kohler, Judith; Gurevych, Iryna | Journal article | In: Transactions of the Association for Computational Linguistics | 2016 | 36232
Authors: Hartmann, Silvana; Eckle-Kohler, Judith; Gurevych, Iryna
Title: Generating training data for semantic role labeling based on label transfer from linked lexical resources
In: Transactions of the Association for Computational Linguistics, (2016)
URL: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/717
Document type: 3a. Contributions to peer-reviewed journals; Article (no special category)
Language: English
Keywords: Ambiguity; Automation; Computational linguistics; Computer-assisted method; Semantics; Text analysis; Word; Dictionary
Abstract (English): We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., WordNet, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and roles. The automatically labeled training data (http://www.ukp.tu-darmstadt.de/knowledge-based-srl/) are evaluated on four corpora from different domains for the tasks of word sense disambiguation and semantic role classification. Results show that classifiers trained on our generated data equal those resulting from a standard supervised setting. (DIPF/Orig.)
DIPF department: Information Center for Education

Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using […] | Habernal, Ivan; Gurevych, Iryna | Contribution to edited volume | From: Association for Computational Linguistics (ed.): Proceedings of the 54th annual meeting of the Association for Computational Linguistics (ACL 2016): Long papers | Stroudsburg, PA: Association for Computational Linguistics | 2016 | 36970
Authors: Habernal, Ivan; Gurevych, Iryna
Title: Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidirectional LSTM
From: Association for Computational Linguistics (ed.): Proceedings of the 54th annual meeting of the Association for Computational Linguistics (ACL 2016): Long papers, Stroudsburg, PA: Association for Computational Linguistics, 2016, pp. 1589-1599
URL: http://www.aclweb.org/anthology/P16-1150
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Algorithm; Argumentation; Automation; Computational linguistics; Communication; Online; Prediction; Quality; Rhetoric; Social software; Text analysis; Persuasion; World Wide Web 2.0
Abstract (English): We propose a new task in the field of computational argumentation in which we investigate qualitative properties of Web arguments, namely their convincingness. We cast the problem as relation classification, where a pair of arguments having the same stance to the same prompt is judged. We annotate a large dataset of 16k pairs of arguments over 32 topics and investigate whether the relation "A is more convincing than B" exhibits properties of total ordering; these findings are used as global constraints for cleaning the crowdsourced data. We propose two tasks: (1) predicting which argument from an argument pair is more convincing and (2) ranking all arguments to the topic based on their convincingness. We experiment with feature-rich SVM and bidirectional LSTM and obtain 0.76-0.78 accuracy and 0.35-0.40 Spearman's correlation in a cross-topic evaluation. We release the newly created corpus UKPConvArg1 and the experimental software under open licenses. (DIPF/Orig.)
DIPF department: Information Center for Education
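
Note on the record above: the abstract asks whether the relation "A is more convincing than B" behaves like a total ordering. One necessary condition is that the pairwise judgments contain no cycle (A beats B, B beats C, C beats A). The following is a minimal sketch of such a consistency check, not the authors' cleaning procedure; the function name `has_cycle` and the `(winner, loser)` pair format are assumptions made for illustration.

```python
from collections import defaultdict


def has_cycle(preferences):
    """Return True if the (winner, loser) pairs contain a preference cycle.

    Hypothetical helper: each pair means "winner is more convincing than loser".
    A cycle means the judgments cannot be arranged into a consistent ordering.
    """
    graph = defaultdict(list)
    for winner, loser in preferences:
        graph[winner].append(loser)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            if color[nxt] == GRAY:  # back edge: we returned to the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


consistent = [("A", "B"), ("B", "C"), ("A", "C")]
inconsistent = consistent + [("C", "A")]
print(has_cycle(consistent), has_cycle(inconsistent))
```

Pairs that participate in a detected cycle are candidates for removal or re-annotation; how the authors actually resolve such cases is not stated in the abstract.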

Improve sentiment analysis of citations with author modelling | Ma, Zheng; Nam, Jinseok; Weihe, Karsten | Contribution to edited volume | From: Association for Computational Linguistics (ed.): Proceedings of the 7th workshop on computational approaches to subjectivity, sentiment and Social media analysis (WASSA 2016) held in conjunction with NAACL 2016 | Stroudsburg, PA: Association for Computational Linguistics | 2016 | 36981
Authors: Ma, Zheng; Nam, Jinseok; Weihe, Karsten
Title: Improve sentiment analysis of citations with author modelling
From: Association for Computational Linguistics (ed.): Proceedings of the 7th workshop on computational approaches to subjectivity, sentiment and Social media analysis (WASSA 2016) held in conjunction with NAACL 2016, Stroudsburg, PA: Association for Computational Linguistics, 2016, pp. 122-127
URL: http://www.aclweb.org/anthology/W16-0420
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Automation; Author; Bibliometrics; Model; Text; Text analysis; Citation
Abstract (English): In this paper, we introduce a novel approach to sentiment polarity classification of citations, which integrates data about the authors' reputation. More specifically, our method extends the h-index with citation polarities and utilizes it in sentiment classification of citation sentences. Our computational results show that our method yields significant improvement in terms of classification performance. (DIPF/Orig.)
DIPF department: Information Center for Education
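
Note on the record above: the abstract mentions extending the h-index with citation polarities but does not say how. The sketch below shows the standard h-index and, purely as an illustrative assumption, a variant that counts only citations of one polarity. The function `polarity_h_index` and the per-paper dictionary layout are hypothetical and should not be read as the authors' definition.

```python
def h_index(citation_counts):
    """Standard h-index: largest h such that h papers have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def polarity_h_index(papers, polarity):
    """Hypothetical variant: h-index over citations of a single polarity only."""
    return h_index([paper.get(polarity, 0) for paper in papers])


# Made-up example data: per-paper citation counts split by polarity.
papers = [
    {"positive": 10, "negative": 1},
    {"positive": 4, "negative": 0},
    {"positive": 3, "negative": 5},
    {"positive": 1, "negative": 0},
]
total = h_index([p["positive"] + p["negative"] for p in papers])
print(total, polarity_h_index(papers, "positive"), polarity_h_index(papers, "negative"))
```

Such polarity-specific scores could serve as author-level features for a citation sentiment classifier, which is the general direction the abstract describes.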

Token-level metaphor detection using neural networks | Do Dinh, Erik-Lân; Gurevych, Iryna | Contribution to edited volume | From: Association for Computational Linguistics (ed.): Proceedings of the fourth workshop on metaphor in NLP held in conjunction with NAACL 2016 | Stroudsburg, PA: Association for Computational Linguistics | 2016 | 36978
Authors: Do Dinh, Erik-Lân; Gurevych, Iryna
Title: Token-level metaphor detection using neural networks
From: Association for Computational Linguistics (ed.): Proceedings of the fourth workshop on metaphor in NLP held in conjunction with NAACL 2016, Stroudsburg, PA: Association for Computational Linguistics, 2016, pp. 28-33
URL: https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/2016_DoDinh_NAACL_pages.pdf
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Automation; Computational linguistics; Network; Semantics; Text analysis
Abstract (English): Automatic metaphor detection usually relies on various features, incorporating e.g. selectional preference violations or concreteness ratings to detect metaphors in text. These features rely on background corpora, hand-coded rules or additional, manually created resources, all specific to the language the system is being used on. We present a novel approach to metaphor detection using a neural network in combination with word embeddings, a method that has already proven to yield promising results for other natural language processing tasks. We show that foregoing manual feature engineering by solely relying on word embeddings trained on large corpora produces comparable results to other systems, while removing the need for additional resources. (DIPF/Orig.)
DIPF department: Information Center for Education

C4Corpus. Multilingual web-size corpus with free license | Habernal, Ivan; Zayed, Omnia; Gurevych, Iryna | Contribution to edited volume | From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016 | 37065
Authors: Habernal, Ivan; Zayed, Omnia; Gurevych, Iryna
Title: C4Corpus. Multilingual web-size corpus with free license
From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 914-922
URL: https://www.informatik.tu-darmstadt.de/de/forschung/veroeffentlichungen/details/?no_cache=1&tx_bibtex_pi1%5Bpub_id%5D=TUD-CS-2016-0023
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Data analysis; Document; Internet; Text; Text analysis; Copyright
Abstract: Large Web corpora containing full documents with permissive licenses are crucial for many NLP tasks. In this article we present the construction of a 12-million-page Web corpus (over 10 billion tokens) licensed under the CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs. Our highly-scalable Hadoop-based framework is able to process the full CommonCrawl corpus on a 2000+ CPU cluster on the Amazon Elastic Map/Reduce infrastructure. The processing pipeline includes license identification, state-of-the-art boilerplate removal, exact duplicate and near-duplicate document removal, and language detection. The construction of the corpus is highly configurable and fully reproducible, and we provide both the framework (DKPro C4CorpusTools) and the resulting data (C4Corpus) to the research community. (DIPF/Orig.)
DIPF department: Information Center for Education

Medical concept embeddings via labeled background corpora | Mencía, Eneldo Loza; De Melo, Gerard; Nam, Jinseok | Contribution to edited volume | From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016 | 37067
Authors: Mencía, Eneldo Loza; De Melo, Gerard; Nam, Jinseok
Title: Medical concept embeddings via labeled background corpora
From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 3629-3636
URL: http://www.lrec-conf.org/proceedings/lrec2016/pdf/1190_Paper.pdf
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Algorithm; Automation; Computational linguistics; Medicine; Semantics; Language; Text analysis
Abstract: In recent years, we have seen an increasing amount of interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The most well-known example of algorithms to produce representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any systems relying on medical concept similarity measures. (DIPF/Orig.)
DIPF department: Information Center for Education

Domain-specific corpus expansion with focused webcrawling | Remus, Steffen; Biemann, Chris | Contribution to edited volume | From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016 | 37066
Authors: Remus, Steffen; Biemann, Chris
Title: Domain-specific corpus expansion with focused webcrawling
From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 3607-3611
URL: http://www.lrec-conf.org/proceedings/lrec2016/pdf/316_Paper.pdf
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Algorithm; Automation; Education; Computational linguistics; Data mining; Hypertext; Model; Language; Text; Text analysis
Abstract: This work presents a straightforward method for extending or creating in-domain web corpora by focused webcrawling. The focused webcrawler uses statistical N-gram language models to estimate the relatedness of documents and weblinks and needs as input only N-grams or plain texts of a predefined domain and seed URLs as starting points. Two experiments demonstrate that our focused crawler is able to stay focused in domain and language. The first experiment shows that the crawler stays in a focused domain, the second experiment demonstrates that language models trained on focused crawls obtain better perplexity scores on in-domain corpora. We distribute the focused crawler as open source software. (DIPF/Orig.)
DIPF department: Information Center for Education

Crowdsourcing a large dataset of domain-specific context-sensitive semantic verb relations | Sukhareva, Maria; Eckle-Kohler, Judith; Habernal, Ivan; Gurevych, Iryna | Contribution to edited volume | From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016 | 36972
Authors: Sukhareva, Maria; Eckle-Kohler, Judith; Habernal, Ivan; Gurevych, Iryna
Title: Crowdsourcing a large dataset of domain-specific context-sensitive semantic verb relations
From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 2131-2137
URL: https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/lrec2016_sukhareva.pdf
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Automation; Computational linguistics; Data mining; Classification; Semantics; Text analysis
Abstract (English): We present a new large dataset of 12403 context-sensitive verb relations manually annotated via crowdsourcing. These relations capture fine-grained semantic information between verb-centric propositions, such as temporal or entailment relations. We propose a novel semantic verb relation scheme and design a multi-step annotation approach for scaling-up the annotations using crowdsourcing. We employ several quality measures and report on agreement scores. The resulting dataset is available under a permissive CreativeCommons license. It represents a valuable resource for various applications, such as automatic information consolidation or automatic summarization. (DIPF/Orig.)
DIPF department: Information Center for Education

Analyzing domain suitability of a sentiment lexicon by identifying distributionally bipolar words | Flekova, Lucie; Ruppert, Eugen; Preotiuc-Pietro, Daniel | Contribution to edited volume | From: Association for Computational Linguistics (ed.): 6th workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015): Workshop proceedings, 17 September 2015, Lisboa, Portugal | Red Hook, NY: Association for Computational Linguistics | 2015 | 37028
Authors: Flekova, Lucie; Ruppert, Eugen; Preotiuc-Pietro, Daniel
Title: Analyzing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
From: Association for Computational Linguistics (ed.): 6th workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015): Workshop proceedings, 17 September 2015, Lisboa, Portugal, Red Hook, NY: Association for Computational Linguistics, 2015, pp. 77-84
URL: http://www.emnlp2015.org/proceedings/WASSA/WASSA-2015.pdf#page=89
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Automation; Computational linguistics; Emotion; Communication; Lexicography; Lexicon; Online; Quality; Social software; Text analysis; Thesaurus
Abstract: Contemporary sentiment analysis approaches rely on lexicon based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographers Mutual Information and leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon based sentiment detection and can be further used to quantify ambiguous words. (DIPF/Orig.)
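
Note on the record above: the abstract scores bigrams with Lexicographers Mutual Information (LMI). One common formulation is LMI(a, b) = f(a, b) * log2(N * f(a, b) / (f(a) * f(b))), where f(a, b) is the bigram count, f(a) and f(b) are the counts of the words in first and second position, and N is the total number of bigrams. The sketch below assumes that formulation; the paper may use different marginal counts or corpus preprocessing, and the function name `lmi_scores` is hypothetical.

```python
import math
from collections import Counter


def lmi_scores(tokens):
    """Score adjacent word pairs with LMI under the formulation assumed above."""
    bigrams = list(zip(tokens, tokens[1:]))
    total = len(bigrams)
    pair_counts = Counter(bigrams)
    left_counts = Counter(a for a, _ in bigrams)   # word in first position
    right_counts = Counter(b for _, b in bigrams)  # word in second position
    return {
        (a, b): count * math.log2(total * count / (left_counts[a] * right_counts[b]))
        for (a, b), count in pair_counts.items()
    }


# Made-up toy corpus for illustration only.
tokens = "not good not good very good".split()
scores = lmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Because LMI multiplies pointwise mutual information by the raw bigram frequency, it favors pairs that are both strongly associated and frequent, which fits the abstract's focus on frequent bigrams where a polar word switches polarity.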

Linking the thoughts. Analysis of argumentation structures in scientific publications | Kirschner, Christian; Eckle-Kohler, Judith; Gurevych, Iryna | Contribution to edited volume | From: Association for Computational Linguistics (ed.): Proceedings of the 2nd Workshop on Argumentation Mining held in conjunction with the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015) | Denver, CO: Association for Computational Linguistics | 2015 | 35503
Authors: Kirschner, Christian; Eckle-Kohler, Judith; Gurevych, Iryna
Title: Linking the thoughts. Analysis of argumentation structures in scientific publications
From: Association for Computational Linguistics (ed.): Proceedings of the 2nd Workshop on Argumentation Mining held in conjunction with the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), Denver, CO: Association for Computational Linguistics, 2015, pp. 1-11
URL: https://aclweb.org/anthology/W/W15/W15-05.pdf
Document type: 4. Contributions to edited volumes; Conference volume/conference paper/proceedings
Language: English
Keywords: Argumentation; Automation; Educational research; Computational linguistics; Data mining; Classification; Text analysis; Publication
Abstract: This paper presents the results of an annotation study focused on the fine-grained analysis of argumentation structures in scientific publications. Our new annotation scheme specifies four types of binary argumentative relations between sentences, resulting in the representation of arguments as small graph structures. We developed an annotation tool that supports the annotation of such graphs and carried out an annotation study with four annotators on 24 scientific articles from the domain of educational research. For calculating the inter-annotator agreement, we adapted existing measures and developed a novel graph based agreement measure which reflects the semantic similarity of different annotation graphs. (DIPF/Orig.)
DIPF department: Information Center for Education