Search results in the DIPF database of publications
Your query:
(Keywords: "Linguistik")
115 items matching your search terms.
Supervised all-words lexical substitution using delexicalized features
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) | Stroudsburg, PA: Association for Computational Linguistics | 2013
Author(s):
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Title:
Supervised all-words lexical substitution using delexicalized features
In:
Association for Computational Linguistics (Ed.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 1131-1141
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/SzarvasBiemannGurevych_naaclhlt2013.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English
Keywords:
Automation; Computational Linguistics; Information Retrieval; Method; Model; Sense; Synonym; Text Analysis; Thesaurus; Procedure; Word
Abstract (english):
We propose a supervised lexical substitution system that does not use separate classifiers per word and is therefore applicable to any word in the vocabulary. Instead of learning word-specific substitution patterns, a global model for lexical substitution is trained on delexicalized (i.e., non-lexical) features, which allows us to exploit the power of supervised methods while being able to generalize beyond target words in the training set. This way, our approach remains technically straightforward and provides better performance and similar coverage in comparison to unsupervised approaches. Using features from lexical resources, as well as a variety of features computed from large corpora (n-gram counts, distributional similarity) and a ranking method based on the posterior probabilities obtained from a Maximum Entropy classifier, we improve over the state of the art in the LexSub Best-Precision metric and the Generalized Average Precision measure. Robustness of our approach is demonstrated by evaluating it successfully on two different datasets.
DIPF-Departments:
Informationszentrum Bildung
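The ranking step described in the abstract above, scoring each substitution candidate by a classifier posterior computed from delexicalized features, can be sketched as follows. This is a minimal illustration, not the authors' system: the feature values, the weights, and the candidate words are all invented, whereas the real model learns its weights from annotated data and uses far richer features.

```python
import math

def posterior(features, weights, bias=0.0):
    """Logistic (maximum-entropy style) posterior P(good substitute | features)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_candidates(candidates, weights):
    """Rank substitution candidates by descending classifier posterior."""
    scored = [(posterior(f, weights), word) for word, f in candidates.items()]
    return [word for _, word in sorted(scored, reverse=True)]

# Delexicalized features describe the target/candidate pair, not the words
# themselves; here: [n-gram count ratio, distributional similarity] (made up).
candidates = {
    "bright": [0.8, 0.7],
    "smart": [0.9, 0.9],
    "shiny": [0.3, 0.2],
}
weights = [1.5, 2.0]  # assumed weights; a real model learns these
ranking = rank_candidates(candidates, weights)
```

Because the model scores feature vectors rather than specific words, it can rank candidates for any target word, which is the point the abstract makes about generalizing beyond the training vocabulary.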
Uncertainty detection for natural language watermarking
Szarvas, György; Gurevych, Iryna
Book Chapter
| From: Mitkov, Ruslan; Park, Jong C. (Eds.): Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP 2013) | Nagoya: Asian Federation of Natural Language Processing | 2013
Author(s):
Szarvas, György; Gurevych, Iryna
Title:
Uncertainty detection for natural language watermarking
In:
Mitkov, Ruslan; Park, Jong C. (Eds.): Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya: Asian Federation of Natural Language Processing, 2013, pp. 1188-1194
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/IJCNLP_2013_Szarvas.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English
Keywords:
Algorithm; Computational Linguistics; Data; Information; Synonym; Text; Change; Word
Abstract:
In this paper we investigate the application of uncertainty detection to text watermarking, a problem where the aim is to produce individually identifiable copies of a source text via small manipulations to the text (e.g. synonym substitutions). As previous attempts showed, accurate paraphrasing is challenging in an open vocabulary setting, so we propose the use of the closed word class of uncertainty cues. We demonstrate that these words are promising for text watermarking as they can be accurately disambiguated (from the non-cue uses of the same words) and their substitution with other cues has marginal impact on the meaning of the text.
DIPF-Departments:
Informationszentrum Bildung
UKP-WSI. UKP Lab Semeval-2013 task 11 system description
Zorn, Hans-Peter; Gurevych, Iryna
Book Chapter
| From: Association for Computational Linguistics (Ed.): Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the 2nd Joint Conference on Lexical and Computational Semantics (*SEM 2013) | Stroudsburg, PA: Association for Computational Linguistics | 2013
Author(s):
Zorn, Hans-Peter; Gurevych, Iryna
Title:
UKP-WSI. UKP Lab Semeval-2013 task 11 system description
In:
Association for Computational Linguistics (Ed.): Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the 2nd Joint Conference on Lexical and Computational Semantics (*SEM 2013), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 248-252
URL:
https://www.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/semeval2013_task11_hpz_ig.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English
Keywords:
Ambiguity; Analysis; Computational Linguistics; Evaluation; Sense; Word
Abstract (english):
In this paper, we describe the UKP Lab system participating in the Semeval-2013 task "Word Sense Induction and Disambiguation within an End-User Application". Our approach uses preprocessing, co-occurrence extraction, graph clustering, and a state-of-the-art word sense disambiguation system. We developed a configurable pipeline which can be used to integrate and evaluate other components for the various steps of the complex task.
DIPF-Departments:
Informationszentrum Bildung
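The pipeline in the abstract above (preprocessing, co-occurrence extraction, graph clustering) can be illustrated with a deliberately minimal stand-in: build a word co-occurrence graph from result snippets for an ambiguous query and treat connected components as induced senses. The snippets are invented, and connected components replace the real graph-clustering algorithm purely for brevity.

```python
from itertools import combinations
from collections import defaultdict

def cooccurrence_graph(snippets, target):
    """Build a word co-occurrence graph from snippets, excluding the
    ambiguous target word itself."""
    graph = defaultdict(set)
    for snippet in snippets:
        words = [w for w in snippet.lower().split() if w != target]
        for a, b in combinations(set(words), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Connected components as a crude stand-in for graph clustering;
    each component is read as one induced word sense."""
    seen, clusters = set(), []
    for node in list(graph):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

# Toy snippets for the classic 'jaguar' ambiguity (animal vs. car):
snippets = [
    "jaguar speed animal jungle",
    "jaguar animal cat jungle",
    "jaguar car engine dealer",
    "jaguar engine model dealer",
]
clusters = connected_components(cooccurrence_graph(snippets, "jaguar"))
```

In an end-user application, each search result would then be assigned to the cluster whose words it shares most with, which is the disambiguation half of the task.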
Acquisition of multiword lexical units for FrameNet
Hartmann, Silvana; Gurevych, Iryna
Working Papers
| 2013
Author(s):
Hartmann, Silvana; Gurevych, Iryna
Title:
Acquisition of multiword lexical units for FrameNet
Published:
Berkeley: Språkbanken (the Swedish Language Bank), 2013 (International FrameNet Workshop 2013)
URL:
http://spraakbanken.gu.se/sites/spraakbanken.gu.se/files/fn_mwe_at_fn_ws_130419.pdf
Publication Type:
5. Working and discussion papers; working/discussion paper (no special category)
Language:
English
Keywords:
Automation; Computational Linguistics; Computer-Assisted Methods; Lexicon; Semantics; Text Analysis; Word
Abstract (english):
FrameNet [1] is a well-known resource for modeling the predicate argument structure of words and organizing them in situation-specific frames and semantic roles (i.e., frame elements). Its interesting formalism for representing the semantics of multiword expressions (MWEs) is often overlooked [2]. FrameNet can represent the relations between the constituents of MWEs. The following example from [2] illustrates this: storage container and bread container evoke the Container frame. Roles of this frame are the Material of the container, its Contents, Size, or Function. For storage container, storage fills the Function role, while for bread container, bread fills the Contents role (Fig. 1). The FrameNet lexicon model provides the option to annotate Function and Contents as an "incorporated role" (ICR) for the respective MWEs. Thus, the implicit relations between the constituents of the MWEs are made explicit. A large FrameNet MWE lexicon can enhance FrameNet-based semantic role labeling (SRL) by providing a better model for MWEs; see analogous developments integrating MWE detection in parsing [3]. Moreover, the lexicon can be used as an information source for the automatic interpretation of MWEs in applications such as information extraction, question answering, or machine translation, for instance by providing features for noun compound interpretation (NCI) [5]. Finally, it provides a basis for further theoretical investigation of MWE semantics. Unfortunately, the coverage of MWEs in FrameNet 1.5 is low; it contains fewer than 1,000 multiword entries. This also affects the performance of FrameNet-based SRL [4]. Currently, FrameNet does not make use of its potential to model the relations within MWEs: even though leather jacket does occur in the FrameNet example sentences for the Clothing frame with the desired incorporated role (Material), it does not receive a separate lexical entry.
To close this gap, and to make full use of FrameNet's potential, an automatic process for the acquisition of MWE lexical units and MWE semantics is desirable. Such an automatic approach needs to be based on solid theoretical foundations. Therefore, we present an analysis of the current state of MWEs in FrameNet. Then, we focus on the acquisition of MWE semantics, specifically of ICRs, which, to our knowledge, has not been addressed before. We present a new approach to bootstrap the ICRs of MWEs in FrameNet by annotating their paraphrases with semantic roles, for instance container that contains bread for bread container. The semantic dependencies between the verb contains, which evokes the Container frame, and bread, which fills the Contents role, mirror the relations between the constituents in bread container (Fig. 2). Thus, we can extract the incorporated arguments from the explicit role annotations on the paraphrases. Our approach is related to work on NCI using paraphrases [6], but it is not restricted to compounds and is applicable in a multilingual setting. For lexical acquisition of MWEs, previous work on lexical acquisition for FrameNet, for instance using distributional methods [7], can be adapted to MWEs. Our contributions are (i) an analysis of the state of MWEs in FrameNet, and (ii) a preliminary evaluation and discussion of the proposed method for ICR detection on MWEs.
DIPF-Departments:
Informationszentrum Bildung
Cross-genre and cross-domain detection of semantic uncertainty
Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna
Journal Article
| In: Computational Linguistics Journal | 2012
Author(s):
Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna
Title:
Cross-genre and cross-domain detection of semantic uncertainty
In:
Computational Linguistics Journal, 38 (2012) 2, S. 335-367
URL:
http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00098
Publication Type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Computational Linguistics; Computer-Assisted Methods; Information; Information Retrieval; Classification; Model; Natural Language System; Semantics; Language Analysis; Text Analysis; Academic Discipline
Abstract (english):
Uncertainty is an important linguistic phenomenon that is relevant in various Natural Language Processing applications, in diverse genres, from medical to community-generated, newswire, or scientific discourse, and in domains from science to the humanities. The semantic uncertainty of a proposition can in most cases be identified using a finite dictionary, i.e. lexical cues, and the key steps of uncertainty detection in an application are locating the (genre- and domain-specific) lexical cues, disambiguating them, and linking them with the units of interest for the particular application (e.g. identified events in information extraction). In this study, we focus on the genre and domain differences of the context-dependent semantic uncertainty cue recognition task. We introduce a unified subcategorization of semantic uncertainty, as different domain applications can apply different uncertainty categories. Based on this categorization, we normalized the annotation of three corpora and present results with a state-of-the-art uncertainty cue recognition model for four fine-grained categories of semantic uncertainty. Our results reveal the domain and genre dependence of the problem; nevertheless, we also show that even a distant source domain dataset can contribute to the recognition and disambiguation of uncertainty cues, efficiently reducing the annotation costs needed to cover a new domain. Thus, the unified subcategorization and domain adaptation for training the models offer an efficient solution for cross-domain and cross-genre semantic uncertainty recognition.
DIPF-Departments:
Informationszentrum Bildung
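The cue-based detection described in the abstract above, locating lexical cues and then disambiguating cue from non-cue uses, can be sketched with a toy lexicon and one hand-written rule. The cue words, categories, and the date rule below are illustrative inventions, not the paper's actual subcategorization or model.

```python
# Toy cue lexicon mapping uncertainty cues to a fine-grained category;
# both the entries and the category names are assumptions for illustration.
CUE_LEXICON = {
    "may": "epistemic",
    "might": "epistemic",
    "suggest": "hypothesis",
    "possible": "epistemic",
    "whether": "condition",
}

def find_cues(tokens):
    """Return (position, token, category) for each uncertainty cue,
    skipping the classic false positive 'May' used as a month name."""
    hits = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low not in CUE_LEXICON:
            continue
        # crude disambiguation rule: 'May' followed by a number is a date
        if low == "may" and i + 1 < len(tokens) and tokens[i + 1].isdigit():
            continue
        hits.append((i, tok, CUE_LEXICON[low]))
    return hits

tokens = "The results suggest that treatment may help ; revised May 2012".split()
cues = find_cues(tokens)
```

The paper replaces the hand-written rule with a trained, context-dependent recognizer, which is exactly where the cross-domain and cross-genre adaptation question arises.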
Detecting and correcting language errors using measures of contextual fitness
Zesch, Torsten
Journal Article
| In: TAL Journal | 2012
Author(s):
Zesch, Torsten
Title:
Detecting and correcting language errors using measures of contextual fitness
In:
TAL Journal, 53 (2012) 3, S. 11-31
URL:
http://www.atala.org/IMG/pdf/Zesch-TAL3-3.pdf
Publication Type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Automation; Computational Linguistics; Error; Measurement; Reference Work; Online; Spelling; Text Analysis
Abstract (english):
While detecting simple language errors (e.g. misspellings, number agreement, etc.) is nowadays standard functionality in all but the simplest text editors, other, more complicated language errors might go unnoticed. A difficult case is errors that come in the disguise of a valid word that fits syntactically into the sentence. We use the Wikipedia revision history to extract a dataset with such errors in their context. We show that the new dataset provides a more realistic picture of the performance of contextual fitness measures. The achieved error detection quality is generally sufficient for competent language users who are willing to accept a certain level of false alarms, but might be problematic for non-native writers who accept all suggestions made by the system. We make the full experimental framework publicly available, which will allow other scientists to reproduce our experiments and to conduct follow-up experiments.
DIPF-Departments:
Informationszentrum Bildung
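A contextual fitness measure of the kind the abstract above evaluates can be sketched as: score a word by how often it appears in its surrounding n-gram context in a background corpus, so that a real-word error ("contact" for "contract") scores lower than the intended word. The corpus and the bare trigram count below are illustrative simplifications of ours; the paper's measures are more sophisticated.

```python
from collections import Counter

def trigram_counts(corpus):
    """Count word trigrams in a (toy) background corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts

def contextual_fitness(left, word, right, counts):
    """How well `word` fits between `left` and `right`, measured by the
    raw trigram count, a minimal stand-in for real fitness measures."""
    return counts[(left, word, right)]

corpus = [
    "he signed the contract yesterday",
    "she read the contract carefully",
    "the contact lens was dirty",
]
counts = trigram_counts(corpus)
# The real word 'contract' fits 'the _ yesterday' better than the
# syntactically valid error 'contact':
fit_contract = contextual_fitness("the", "contract", "yesterday", counts)
fit_contact = contextual_fitness("the", "contact", "yesterday", counts)
```

A detector would flag a word whenever some other candidate from a confusion set scores markedly higher in the same context, which is also where the false-alarm trade-off mentioned in the abstract comes from.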
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
Bär, Daniel; Biemann, Chris; Gurevych, Iryna; Zesch, Torsten
Book Chapter
| From: Agirre, Eneko (Ed.): *SEM First Joint Conference on Lexical and Computational Semantics | Montreal: Association for Computational Linguistics | 2012
Author(s):
Bär, Daniel; Biemann, Chris; Gurevych, Iryna; Zesch, Torsten
Title:
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
In:
Agirre, Eneko (Ed.): *SEM First Joint Conference on Lexical and Computational Semantics, Montreal: Association for Computational Linguistics, 2012, pp. 435-440
URL:
http://aclweb.org/anthology-new/S/S12/S12-1059.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English
Keywords:
Analysis; Computational Linguistics; Semantics; Text Analysis; Procedure
Abstract (english):
We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.
DIPF-Departments:
Informationszentrum Bildung
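The core idea of the abstract above, combining multiple text similarity measures of varying complexity with learned weights, can be sketched with two cheap features. The real system combines roughly 20 features with weights fit by log-linear regression per dataset; the two features and the fixed weights below are assumptions for illustration only.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of a string (one of the simple features)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def word_overlap(s1, s2):
    """Jaccard overlap of the word sets of two sentences."""
    return jaccard(set(s1.lower().split()), set(s2.lower().split()))

def combined_similarity(s1, s2, weights=(0.5, 0.5)):
    """Weighted combination of the two features; the weights are assumed
    here, whereas the paper's system learns them from training data."""
    features = [jaccard(char_ngrams(s1), char_ngrams(s2)),
                word_overlap(s1, s2)]
    return sum(w * f for w, f in zip(weights, features))

sim_same = combined_similarity("the cat sat", "the cat sat")
sim_diff = combined_similarity("a cat sat", "dogs run fast")
```

Shallow surface features like these miss paraphrases with no word overlap, which is why the full system adds semantic features (Explicit Semantic Analysis, lexical-semantic resources) and lexical substitution to bridge lexical gaps.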
Learning semantics with deep belief network for cross-language information retrieval
Kim, Jungi; Nam, Jinseok; Gurevych, Iryna
Book Chapter
| From: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Author(s):
Kim, Jungi; Nam, Jinseok; Gurevych, Iryna
Title:
Learning semantics with deep belief network for cross-language information retrieval
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 579-588
URL:
http://aclweb.org/anthology-new/C/C12/C12-2057.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English
Keywords:
Computational Linguistics; Information Retrieval; Multilingualism; Semantics
Abstract:
This paper introduces a cross-language information retrieval (CLIR) framework that combines the state-of-the-art keyword-based approach with a latent semantic-based retrieval model. To capture and analyze the hidden semantics in cross-lingual settings, we construct latent semantic models that map text in different languages into a shared semantic space. Our proposed framework consists of deep belief networks (DBN) for each language and we employ canonical correlation analysis (CCA) to construct a shared semantic space. We evaluated the proposed CLIR approach on a standard ad hoc CLIR dataset, and we show that the cross-lingual semantic analysis with DBN and CCA improves the state-of-the-art keyword-based CLIR performance.
DIPF-Departments:
Informationszentrum Bildung
To exhibit is not to loiter. A multilingual, sense-disambiguated Wiktionary for measuring verb similarity
Meyer, M. Christian; Gurevych, Iryna
Book Chapter
| From: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Author(s):
Meyer, M. Christian; Gurevych, Iryna
Title:
To exhibit is not to loiter. A multilingual, sense-disambiguated Wiktionary for measuring verb similarity
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 1763-1780
URL:
http://aclweb.org/anthology-new/C/C12/C12-1108.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English; German
Keywords:
Relation; Computational Linguistics; Multilingualism; Semantics; Sense; Translation; Comparison; Word; Dictionary
Abstract:
We construct a new multilingual lexical resource from Wiktionary by disambiguating semantic relations and translations. For this task, we propose and evaluate an automatic disambiguation method that outperforms previous approaches significantly. We additionally introduce a method for inferring new semantic relations based on the disambiguated translations. Our resource fills the gap between expert-built resources suffering from high cost and small size and Wikipedia-based resources that are restricted to encyclopedic knowledge about nouns. We demonstrate this by applying our new resource to measuring monolingual and cross-lingual verb similarity. For the latter, our resource yields better results than Wikipedia and expert-built multilingual wordnets. We make our final resource and the evaluation datasets publicly available.
DIPF-Departments:
Informationszentrum Bildung
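The disambiguation task in the abstract above, attaching a Wiktionary translation to the correct sense of the source entry, can be sketched as gloss overlap: pick the sense whose gloss shares the most words with the translation's own gloss. The entry, glosses, and the bare word-overlap score below are simplified inventions, not the paper's actual (significantly stronger) method.

```python
def gloss_overlap(gloss_a, gloss_b):
    """Number of words shared by two sense glosses."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))

def disambiguate_translation(translation_gloss, senses):
    """Attach a translation to the sense whose gloss overlaps most with
    the translation's gloss (a crude gloss-similarity heuristic)."""
    return max(senses, key=lambda s: gloss_overlap(translation_gloss, senses[s]))

# Toy Wiktionary-style entry, loosely inspired by the German verb
# 'ausstellen', which can mean 'exhibit' or 'issue (a document)':
senses = {
    "exhibit": "to display a work in a public show",
    "issue": "to officially provide a document",
}
best = disambiguate_translation("to display paintings in a public show", senses)
```

Once each translation is linked to a specific sense rather than to the whole entry, translations of the same sense across languages can be used to infer new semantic relations, as the abstract describes.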
Using distributional similarity for lexical expansion in knowledge-based word sense disambiguation
Miller, Tristan; Biemann, Chris; Zesch, Torsten; Gurevych, Iryna
Book Chapter
| From: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Author(s):
Miller, Tristan; Biemann, Chris; Zesch, Torsten; Gurevych, Iryna
Title:
Using distributional similarity for lexical expansion in knowledge-based word sense disambiguation
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 1781-1796
URL:
http://aclweb.org/anthology-new/C/C12/C12-1109.pdf
Publication Type:
4. Contributions to edited volumes; conference paper/proceedings
Language:
English
Keywords:
Computational Linguistics; Sense; Thesaurus; Distribution; Word
Abstract:
We explore the contribution of distributional information for purely knowledge-based word sense disambiguation. Specifically, we use a distributional thesaurus, computed from a large parsed corpus, for lexical expansion of context and sense information. This bridges the lexical gap that is seen as the major obstacle for word overlap-based approaches. We apply this mechanism to two traditional knowledge-based methods and show that distributional information significantly improves disambiguation results across several datasets. This improvement exceeds the state of the art for disambiguation without sense frequency information, a situation which is especially encountered with new domains or languages for which no sense-annotated corpus is available.
DIPF-Departments:
Informationszentrum Bildung
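The lexical expansion described in the abstract above can be sketched as a Lesk-style overlap where both the context and the sense gloss are expanded with distributionally similar words before counting shared words. The thesaurus entries, the context, and the glosses below are toy data of ours; the paper computes its thesaurus from a large parsed corpus.

```python
def lesk_score(context, sense_gloss, thesaurus):
    """Simplified Lesk: word overlap between context and sense gloss,
    with both sides expanded via a (toy) distributional thesaurus to
    bridge the lexical gap."""
    def expand(words):
        expanded = set(words)
        for w in words:
            expanded |= set(thesaurus.get(w, []))
        return expanded

    ctx = expand(set(context.lower().split()))
    gls = expand(set(sense_gloss.lower().split()))
    return len(ctx & gls)

# Toy distributional thesaurus: each word maps to its most similar words.
thesaurus = {
    "deposit": ["money", "cash", "savings"],
    "money": ["cash", "deposit"],
    "water": ["river", "stream"],
}

# Disambiguating 'branch' (bank office vs. river-adjacent land senses):
context = "i made a deposit at the branch"
gloss_financial = "an institution handling money"
gloss_river = "sloping land beside a body of water"
score_financial = lesk_score(context, gloss_financial, thesaurus)
score_river = lesk_score(context, gloss_river, thesaurus)
```

Without expansion the context and the financial gloss share no content words at all; the thesaurus links "deposit" and "money", which is exactly the lexical gap the abstract says the expansion bridges.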