Search results in the DIPF database of publications
Your query:
(Keywords: "Textanalyse")
54 items matching your search terms.
Recognizing partial textual entailment
Levy, Omer; Zesch, Torsten; Dagan, Ido; Gurevych, Iryna
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: Short Papers (ACL Short Papers 2013) | Stroudsburg, PA: Association for Computational Linguistics | 2013
Record ID: 33525
Author(s):
Levy, Omer; Zesch, Torsten; Dagan, Ido; Gurevych, Iryna
Title:
Recognizing partial textual entailment
In:
Association for Computational Linguistics (Ed.): Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: Short Papers (ACL Short Papers 2013), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 1-2
URL:
http://u.cs.biu.ac.il/~dagan/publications/PartialEntailment_Fixed.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Computerlinguistik; Computerunterstütztes Verfahren; Evaluation; Hypothese; Klassifikation; Methode; Semantik; Text; Textanalyse
Abstract:
Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is "almost entailed" by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.
DIPF-Departments:
Informationszentrum Bildung
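The faceted view of partial entailment summarized in the abstract above can be made concrete with a small sketch. The following Python snippet is a hypothetical illustration, not the authors' system: it breaks a hypothesis into word-pair facets and scores how many facets are lexically covered by the text.

```python
# Hypothetical sketch of partial (faceted) textual entailment: decompose the
# hypothesis into word-pair facets and test each facet against the text.
# Toy heuristic for illustration only, not the system described in the paper.
from itertools import combinations

def _tokens(s: str):
    return [w.strip(".,;:!?").lower() for w in s.split()]

def facets(hypothesis: str):
    """Treat every unordered pair of hypothesis words (length > 2) as one facet."""
    words = [w for w in _tokens(hypothesis) if len(w) > 2]
    return list(combinations(words, 2))

def facet_entailed(text: str, facet) -> bool:
    """Toy decision rule: a facet holds if both of its words occur in the text."""
    text_words = set(_tokens(text))
    return all(w in text_words for w in facet)

def partial_entailment_score(text: str, hypothesis: str) -> float:
    """Fraction of hypothesis facets covered by the text (1.0 = fully covered)."""
    fs = facets(hypothesis)
    return sum(facet_entailed(text, f) for f in fs) / len(fs) if fs else 0.0

print(partial_entailment_score(
    "The museum in Berlin was closed for renovation.",
    "The Berlin museum was permanently closed."))  # < 1.0: "permanently" is not covered
```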
Supervised all-words lexical substitution using delexicalized features
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) | Stroudsburg, PA: Association for Computational Linguistics | 2013
Record ID: 33528
Author(s):
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Title:
Supervised all-words lexical substitution using delexicalized features
In:
Association for Computational Linguistics (Ed.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 1131-1141
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/SzarvasBiemannGurevych_naaclhlt2013.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Automatisierung; Computerlinguistik; Information Retrieval; Methode; Modell; Sinn; Synonym; Textanalyse; Thesaurus; Verfahren; Wort
Abstract (english):
We propose a supervised lexical substitution system that does not use separate classifiers per word and is therefore applicable to any word in the vocabulary. Instead of learning word-specific substitution patterns, a global model for lexical substitution is trained on delexicalized (i.e., non-lexical) features, which makes it possible to exploit the power of supervised methods while generalizing beyond the target words in the training set. In this way, our approach remains technically straightforward while providing better performance and similar coverage compared to unsupervised approaches. Using features from lexical resources, as well as a variety of features computed from large corpora (n-gram counts, distributional similarity) and a ranking method based on the posterior probabilities obtained from a Maximum Entropy classifier, we improve over the state of the art in the LexSub Best-Precision metric and the Generalized Average Precision measure. The robustness of our approach is demonstrated by evaluating it successfully on two different datasets.
DIPF-Departments:
Informationszentrum Bildung
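To illustrate the delexicalized-feature idea summarized in the abstract above, the following hypothetical Python sketch scores substitution candidates with word-independent features (a frequency ratio and a context-fit value) and ranks them with a fixed linear score standing in for a trained classifier. The feature set, weights, and names are assumptions for illustration only.

```python
# Hypothetical sketch: rank lexical-substitution candidates with delexicalized
# (word-independent) features so that a single model covers all target words.
from dataclasses import dataclass
from math import log

@dataclass
class Candidate:
    substitute: str
    target_freq: int        # corpus frequency of the original target word
    subst_freq: int         # corpus frequency of the substitute
    context_overlap: float  # e.g., n-gram or distributional fit to the context

def delexicalized_features(c: Candidate):
    # None of these features refer to the identity of the target word itself.
    return [
        log((c.subst_freq + 1) / (c.target_freq + 1)),  # relative frequency
        c.context_overlap,                              # fit to the sentence context
    ]

def score(c: Candidate, weights=(0.3, 1.0)) -> float:
    # Stand-in for the posterior of a trained classifier (weights are made up).
    return sum(w * f for w, f in zip(weights, delexicalized_features(c)))

candidates = [
    Candidate("bright", 120_000, 90_000, 0.62),
    Candidate("intelligent", 120_000, 40_000, 0.71),
]
for c in sorted(candidates, key=score, reverse=True):
    print(c.substitute, round(score(c), 3))
```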
UKP-BIU: Similarity and entailment metrics for student response analysis
Zesch, Torsten; Levy, Omer; Gurevych, Iryna; Dagan, Ido
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the 2nd Joint Conference on Lexical and Computational Semantics (*SEM 2013) | Stroudsburg, PA: Association for Computational Linguistics | 2013
Record ID: 33554
Author(s):
Zesch, Torsten; Levy, Omer; Gurevych, Iryna; Dagan, Ido
Title:
UKP-BIU: Similarity and entailment metrics for student response analysis
In:
Association for Computational Linguistics (Ed.): Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the 2nd Joint Conference on Lexical and Computational Semantics (*SEM 2013), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 285-289
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/S13-2048.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Antwort; Automatisierung; Computerunterstütztes Verfahren; Evaluation; Messverfahren; Qualität; Schülerleistungstest; Semantik; Textanalyse
Abstract:
Our system combines text similarity measures with a textual entailment system. In the main task, we focused on the influence of lexicalized versus unlexicalized features, and how they affect performance on unseen questions and domains. We also participated in the pilot partial entailment task, where our system significantly outperforms a strong baseline.
DIPF-Departments:
Informationszentrum Bildung
Acquisition of multiword lexical units for FrameNet
Hartmann, Silvana; Gurevych, Iryna
Working Papers
| 2013
Record ID: 33590
Author(s):
Hartmann, Silvana; Gurevych, Iryna
Title:
Acquisition of multiword lexical units for FrameNet
Published:
Berkeley: Språkbanken (the Swedish Language Bank), 2013 (International FrameNet Workshop 2013)
URL:
http://spraakbanken.gu.se/sites/spraakbanken.gu.se/files/fn_mwe_at_fn_ws_130419.pdf
Publication Type:
5. Working and discussion papers; working/discussion paper (no specific category)
Language:
English
Keywords:
Automatisierung; Computerlinguistik; Computerunterstütztes Verfahren; Lexikon; Semantik; Textanalyse; Wort
Abstract (english):
FrameNet [1] is a well-known resource for modeling the predicate argument structure of words and organizing them in situation-specific frames and semantic roles (i.e., frame elements). Its interesting formalism to represent the semantics of multiword expressions (MWEs) is often overlooked [2]. FrameNet can represent the relation between constituents of MWEs. The following example from [2] illustrates this: storage container and bread container evoke the Container frame. Roles of this frame are the Material of the container, its Contents, Size, or Function. For storage container, storage fills the Function role, while for bread container, bread fills the Contents role (Fig. 1: Incorporated roles). The FrameNet lexicon model provides the option to annotate Function and Contents as an "incorporated role" (ICR) for the respective MWEs. Thus, the implicit relations between the constituents of the MWEs are made explicit. A large FrameNet MWE lexicon can enhance FrameNet-based semantic role labeling (SRL) by providing a better model for MWEs; see analogous developments integrating MWE detection in parsing [3]. Moreover, the lexicon can be used as an information source for the automatic interpretation of MWEs in applications such as information extraction, question answering, or machine translation, for instance by providing features for noun compound interpretation (NCI) [5]. Finally, it provides a basis for further theoretical investigation of MWE semantics. Unfortunately, the coverage of MWEs in FrameNet 1.5 is low; it contains less than 1,000 multi-word entries. This also affects the performance of FrameNet-based SRL [4]. Currently, FrameNet does not make use of its potential to model the relations within MWEs: even though leather jacket does occur in the FrameNet example sentences for the Clothing frame with the desired incorporated role (Material), it does not receive a separate lexical entry. To close this gap, and to make full use of FrameNet's potential, an automatic process for the acquisition of MWE lexical units and MWE semantics is desired. Such an automatic approach needs to be based on solid theoretical foundations. Therefore, we present an analysis of the current state of MWEs in FrameNet. Then, we focus on the acquisition of MWE semantics, specifically of ICRs, which, to our knowledge, has not been addressed before. We present a new approach to bootstrap the ICRs of MWEs in FrameNet by annotating their paraphrases with semantic roles, for instance container that contains bread for bread container. The semantic dependencies between the verb contains, which evokes the Container frame, and bread, which fills the Contents role, mirror the relations between the constituents in bread container (Fig. 2). Thus, we can extract the incorporated arguments from the explicit role annotations on the paraphrases. Our approach is related to the work on NCI using paraphrases [6], but is not restricted to compounds and is applicable in a multilingual setting. For lexical acquisition of MWEs, previous work on lexical acquisition for FrameNet, for instance using distributional methods [7], can be adapted to MWEs. Our contributions are (i) analyzing the state of MWEs in FrameNet, and (ii) a preliminary evaluation and discussion of the proposed method for ICR detection on MWEs.
DIPF-Departments:
Informationszentrum Bildung
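As a toy illustration of the paraphrase-based ICR bootstrapping outlined above: the paraphrase of an MWE is annotated with frame roles, and the role carried by the MWE's modifier word is read off as the incorporated role. The data structures and the single example below are assumptions made for illustration, not the proposed method itself.

```python
# Hypothetical sketch: read off the incorporated role (ICR) of an MWE from
# semantic-role annotations on its paraphrase, e.g.
# "bread container" <- paraphrase "container that contains bread".
from typing import Optional

def incorporated_role(mwe: str, paraphrase_roles: dict) -> Optional[str]:
    """Return the role that the MWE's modifier word fills in the annotated paraphrase."""
    modifier = mwe.split()[0]  # "bread" in "bread container"
    return paraphrase_roles.get(modifier)

# Role annotation of the paraphrase "container that contains bread" (Container frame).
paraphrase_roles = {"container": "Container", "bread": "Contents"}
print(incorporated_role("bread container", paraphrase_roles))  # -> Contents
```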
Cross-genre and cross-domain detection of semantic uncertainty
Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna
Journal Article
| In: Computational Linguistics Journal | 2012
Record ID: 32810
Author(s):
Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna
Title:
Cross-genre and cross-domain detection of semantic uncertainty
In:
Computational Linguistics Journal, 38 (2012) 2, pp. 335-367
URL:
http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00098
Publication Type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Computerlinguistik; Computerunterstütztes Verfahren; Information; Information Retrieval; Klassifikation; Modell; Natürlichsprachiges System; Semantik; Sprachanalyse; Textanalyse; Wissenschaftsdisziplin
Abstract (english):
Uncertainty is an important linguistic phenomenon that is relevant in various Natural Language Processing applications, in genres ranging from medical to community-generated, newswire, or scientific discourse, and in domains from science to the humanities. The semantic uncertainty of a proposition can in most cases be identified using a finite dictionary, i.e., lexical cues; the key steps of uncertainty detection in an application are locating the (genre- and domain-specific) lexical cues, disambiguating them, and linking them with the units of interest for the particular application (e.g., identified events in information extraction). In this study, we focus on the genre and domain differences of the context-dependent semantic uncertainty cue recognition task. We introduce a unified subcategorization of semantic uncertainty, as different domain applications can apply different uncertainty categories. Based on this categorization, we normalized the annotation of three corpora and present results with a state-of-the-art uncertainty cue recognition model for four fine-grained categories of semantic uncertainty. Our results reveal the domain and genre dependence of the problem; nevertheless, we also show that even a distant source-domain dataset can contribute to the recognition and disambiguation of uncertainty cues, efficiently reducing the annotation costs needed to cover a new domain. Thus, the unified subcategorization and domain adaptation for training the models offer an efficient solution for cross-domain and cross-genre semantic uncertainty recognition.
DIPF-Departments:
Informationszentrum Bildung
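The cue-based pipeline sketched in the abstract above (locate lexical cues, disambiguate them in context, link them to units of interest) can be illustrated with a minimal hypothetical example; the cue lexicon and the disambiguation rule below are invented for this purpose.

```python
# Hypothetical sketch of uncertainty-cue detection: match cues from a small
# lexicon, then apply a trivial context rule to discard non-uncertain uses.
UNCERTAINTY_CUES = {"may", "might", "suggest", "possibly", "appear"}

def tokenize(sentence: str):
    return sentence.lower().replace(".", "").split()

def find_cues(tokens):
    return [i for i, t in enumerate(tokens) if t in UNCERTAINTY_CUES]

def is_uncertain_use(tokens, idx) -> bool:
    # Toy disambiguation: "may" followed by a number is a date, not a cue.
    nxt = tokens[idx + 1] if idx + 1 < len(tokens) else ""
    return not nxt.isdigit()

tokens = tokenize("The results suggest that the drug may reduce symptoms.")
print([tokens[i] for i in find_cues(tokens) if is_uncertain_use(tokens, i)])
# -> ['suggest', 'may']
```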
Detecting and correcting language errors using measures of contextual fitness
Zesch, Torsten
Journal Article
| In: TAL Journal | 2012
Record ID: 33563
Author(s):
Zesch, Torsten
Title:
Detecting and correcting language errors using measures of contextual fitness
In:
TAL Journal, 53 (2012) 3, pp. 11-31
URL:
http://www.atala.org/IMG/pdf/Zesch-TAL3-3.pdf
Publication Type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Automatisierung; Computerlinguistik; Fehler; Messung; Nachschlagewerk; Online; Rechtschreibung; Textanalyse
Abstract (english):
While detecting simple language errors (e.g., misspellings or number agreement) is nowadays standard functionality in all but the simplest text editors, other, more complicated language errors might go unnoticed. A particularly difficult case is errors that come in the disguise of a valid word that fits syntactically into the sentence. We use the Wikipedia revision history to extract a dataset with such errors in their context. We show that the new dataset provides a more realistic picture of the performance of contextual fitness measures. The achieved error detection quality is generally sufficient for competent language users who are willing to accept a certain level of false alarms, but it might be problematic for non-native writers who accept all suggestions made by the systems. We make the full experimental framework publicly available, which will allow other scientists to reproduce our experiments and to conduct follow-up experiments.
DIPF-Departments:
Informationszentrum Bildung
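As a rough, hypothetical illustration of a contextual fitness measure of the kind evaluated above, the sketch below flags a word as a suspected real-word error when an alternative from a small confusion set fits the surrounding bigram context better. The confusion set and frequency table are toy assumptions, not resources used in the paper.

```python
# Hypothetical sketch of a contextual-fitness check for real-word errors:
# a word is suspicious if a confusable alternative fits its bigram context better.
CONFUSION_SETS = {"form": {"from"}, "from": {"form"}}

# Toy bigram counts standing in for statistics from a large corpus.
BIGRAM_COUNTS = {("comes", "from"): 950, ("comes", "form"): 3,
                 ("the", "form"): 600, ("the", "from"): 1}

def fitness(prev_word: str, word: str) -> int:
    return BIGRAM_COUNTS.get((prev_word, word), 0)

def detect_real_word_errors(sentence: str):
    tokens = sentence.lower().split()
    flags = []
    for i in range(1, len(tokens)):
        word, prev = tokens[i], tokens[i - 1]
        for alt in CONFUSION_SETS.get(word, ()):
            if fitness(prev, alt) > fitness(prev, word):
                flags.append((word, alt))
    return flags

print(detect_real_word_errors("The signal comes form the sensor"))  # -> [('form', 'from')]
```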
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
Bär, Daniel; Biemann, Chris; Gurevych, Iryna; Zesch, Torsten
Book Chapter
| From: Agirre, Eneko (Ed.): *SEM First Joint Conference on Lexical and Computational Semantics | Montreal: Association for Computational Linguistics | 2012
Record ID: 32698
Author(s):
Bär, Daniel; Biemann, Chris; Gurevych, Iryna; Zesch, Torsten
Title:
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
In:
Agirre, Eneko (Ed.): *SEM First Joint Conference on Lexical and Computational Semantics, Montreal: Association for Computational Linguistics, 2012, pp. 435-440
URL:
http://aclweb.org/anthology-new/S/S12/S12-1059.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Analyse; Computerlinguistik; Semantik; Textanalyse; Verfahren
Abstract (english):
We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.
DIPF-Departments:
Informationszentrum Bildung
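A minimal, hypothetical sketch of the general idea summarized above: several simple text-similarity features are combined by a linear model into one score. The features and the hand-set weights below are illustrative stand-ins for the roughly 20 learned features of the UKP system, not its actual implementation.

```python
# Hypothetical sketch: combine several simple similarity measures into one
# score with a linear model (weights are made up, standing in for coefficients
# learned by regression on a training set).
def char_ngram_overlap(a: str, b: str, n: int = 3) -> float:
    grams = lambda s: {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb)

def word_overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def combined_similarity(a: str, b: str) -> float:
    features = [char_ngram_overlap(a, b), word_overlap(a, b)]
    weights, bias = [2.5, 2.0], 0.3   # illustrative stand-ins for learned values
    return bias + sum(w * f for w, f in zip(weights, features))

print(round(combined_similarity("A man is playing a guitar.",
                                "Someone plays the guitar."), 2))
```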
Text reuse detection using a composition of text similarity measures
Bär, Daniel; Zesch, Torsten; Gurevych, Iryna
Book Chapter
| From: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Record ID: 33289
Author(s):
Bär, Daniel; Zesch, Torsten; Gurevych, Iryna
Title:
Text reuse detection using a composition of text similarity measures
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 167-184
URL:
http://www.aclweb.org/anthology/C/C12/C12-1011.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Computerunterstütztes Verfahren; Erkennen; Inhalt; Messung; Plagiat; Struktur; Text; Textanalyse; Vergleich
Abstract:
Detecting text reuse is a fundamental requirement for a variety of tasks and applications, ranging from journalistic text reuse to plagiarism detection. Text reuse is traditionally detected by computing similarity between a source text and a possibly reused text. However, existing text similarity measures exhibit a major limitation: They compute similarity only on features which can be derived from the content of the given texts, thereby inherently implying that any other text characteristics are negligible. In this paper, we overcome this traditional limitation and compute similarity along three characteristic dimensions inherent to texts: content, structure, and style. We explore and discuss possible combinations of measures along these dimensions, and our results demonstrate that the composition consistently outperforms previous approaches on three standard evaluation datasets, and that text reuse detection greatly benefits from incorporating a diverse feature set that reflects a wide variety of text characteristics.
DIPF-Departments:
Informationszentrum Bildung
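To illustrate the three-dimensional composition described above (content, structure, style), the hypothetical sketch below computes one deliberately simplistic measure per dimension and averages them; the individual measures and the equal weighting are assumptions for illustration.

```python
# Hypothetical sketch: score text reuse along content, structure, and style
# dimensions and compose the three scores (toy measures, equal weights).
def content_sim(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def structure_sim(a: str, b: str) -> float:
    # Toy structural cue: similarity of sentence lengths.
    la, lb = len(a.split()), len(b.split())
    return min(la, lb) / max(la, lb)

def style_sim(a: str, b: str) -> float:
    # Toy stylistic cue: similarity of average word length.
    avg = lambda s: sum(len(w) for w in s.split()) / len(s.split())
    return 1 - abs(avg(a) - avg(b)) / max(avg(a), avg(b))

def reuse_score(a: str, b: str) -> float:
    return (content_sim(a, b) + structure_sim(a, b) + style_sim(a, b)) / 3

print(round(reuse_score("The committee approved the new budget today.",
                        "Today the committee approved a new budget."), 2))
```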
Measuring contextual fitness using error contexts extracted from the Wikipedia revision history
Zesch, Torsten
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012) | Avignon: Association for Computational Linguistics | 2012
Record ID: 32799
Author(s):
Zesch, Torsten
Title:
Measuring contextual fitness using error contexts extracted from the Wikipedia revision history
In:
Association for Computational Linguistics (Ed.): Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon: Association for Computational Linguistics, 2012, pp. 529-538
URL:
http://aclweb.org/anthology-new/E/E12/E12-1054.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Computerlinguistik; Evaluation; Fehler; Messung; Semantik; Statistische Methode; Textanalyse; Verfahren
Abstract (english):
We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated. Additionally, we show that knowledge-based approaches can be improved by using semantic relatedness measures that make use of knowledge beyond classical taxonomic relations. Finally, we show that statistical and knowledge-based methods can be combined for increased performance.
DIPF-Departments:
Informationszentrum Bildung
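The mining step described in the abstract above, extracting naturally occurring real-word errors together with their context from revision pairs, can be illustrated with a small hypothetical example; the tokenization and the single-token-substitution filter are assumptions made for this sketch.

```python
# Hypothetical sketch: extract an error/correction pair with context from two
# revisions of a sentence, keeping only single-token substitutions between
# real words (as opposed to insertions, deletions, or non-word typos).
def extract_error_context(old: str, new: str, window: int = 2):
    old_t, new_t = old.split(), new.split()
    if len(old_t) != len(new_t):
        return None                      # skip insertions/deletions
    diffs = [i for i, (o, n) in enumerate(zip(old_t, new_t)) if o != n]
    if len(diffs) != 1:
        return None                      # keep only single-word substitutions
    i = diffs[0]
    context = old_t[max(0, i - window): i + window + 1]
    return {"error": old_t[i], "correction": new_t[i], "context": " ".join(context)}

old_rev = "The signal comes form the sensor on the left"
new_rev = "The signal comes from the sensor on the left"
print(extract_error_context(old_rev, new_rev))
# -> {'error': 'form', 'correction': 'from', 'context': 'signal comes form the sensor'}
```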
HOO 2012 shared task: UKP lab system description
Zesch, Torsten; Haase, Jens
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications at NAACL-HLT | Montreal: Association for Computational Linguistics | 2012
Record ID: 32998
Author(s):
Zesch, Torsten; Haase, Jens
Title:
HOO 2012 shared task: UKP lab system description
In:
Association for Computational Linguistics (Ed.): Proceedings of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications at NAACL-HLT, Montreal: Association for Computational Linguistics, 2012, pp. 302-306
URL:
http://aclweb.org/anthology-new/W/W12/W12-2036.pdf
Publication Type:
4. Contributions in edited volumes; conference paper/proceedings
Language:
English
Keywords:
Anwendungsbeispiel; Computerlinguistik; Fehler; Messung; Software; Textanalyse; Verfahren
Abstract (english):
In this paper, we describe the UKP Lab system participating in the HOO 2012 Shared Task on preposition and determiner error correction. Our focus was to implement a highly flexible and modular system which can be easily augmented by other researchers. The system might be used to provide a level playing field for subsequent shared tasks and enable further progress in this important research field on top of the state of the art identified by the shared task.
DIPF-Departments:
Informationszentrum Bildung