Search results from the DIPF publication database
Your query:
(Keywords: "Computerlinguistik")
110 results found
Acquisition of multiword lexical units for FrameNet
Hartmann, Silvana; Gurevych, Iryna
Miscellaneous documents
| 2013
Authors:
Hartmann, Silvana; Gurevych, Iryna
Title:
Acquisition of multiword lexical units for FrameNet
Published:
Berkeley: Språkbanken (the Swedish Language Bank), 2013 (International FrameNet Workshop 2013)
URL:
http://spraakbanken.gu.se/sites/spraakbanken.gu.se/files/fn_mwe_at_fn_ws_130419.pdf
Document type:
5. Working and discussion papers; working/discussion paper (no special category)
Language:
English
Keywords:
Automation; Computational linguistics; Computer-assisted methods; Lexicon; Semantics; Text analysis; Word
Abstract (English):
FrameNet [1] is a well-known resource for modeling the predicate-argument structure of words and organizing them in situation-specific frames and semantic roles (i.e., frame elements). Its interesting formalism for representing the semantics of multiword expressions (MWEs) is often overlooked [2]. FrameNet can represent the relations between the constituents of MWEs. The following example from [2] illustrates this: storage container and bread container evoke the Container frame. Roles of this frame are the Material of the container, its Contents, Size, or Function. For storage container, storage fills the Function role, while for bread container, bread fills the Contents role (Fig. 1). The FrameNet lexicon model provides the option to annotate Function and Contents as an "incorporated role" (ICR) for the respective MWEs. Thus, the implicit relations between the constituents of the MWEs are made explicit. A large FrameNet MWE lexicon can enhance FrameNet-based semantic role labeling (SRL) by providing a better model for MWEs; see the analogous developments integrating MWE detection into parsing [3]. Moreover, the lexicon can be used as an information source for the automatic interpretation of MWEs in applications such as information extraction, question answering, or machine translation, for instance by providing features for noun compound interpretation (NCI) [5]. Finally, it provides a basis for further theoretical investigation of MWE semantics. Unfortunately, the coverage of MWEs in FrameNet 1.5 is low; it contains fewer than 1,000 multiword entries. This also affects the performance of FrameNet-based SRL [4]. Currently, FrameNet does not make use of its potential to model the relations within MWEs: even though leather jacket does occur in the FrameNet example sentences for the Clothing frame with the desired incorporated role (Material), it does not receive a separate lexical entry.
To close this gap, and to make full use of FrameNet's potential, an automatic process for the acquisition of MWE lexical units and MWE semantics is desired. Such an automatic approach needs to be based on solid theoretical foundations. Therefore, we present an analysis of the current state of MWEs in FrameNet. Then, we focus on the acquisition of MWE semantics, specifically of ICRs, which, to our knowledge, has not been addressed before. We present a new approach to bootstrap the ICRs of MWEs in FrameNet by annotating their paraphrases with semantic roles, for instance container that contains bread for bread container. The semantic dependencies between the verb contains, which evokes the Container frame, and bread, which fills the Contents role, mirror the relations between the constituents in bread container (Fig. 2). Thus, we can extract the incorporated arguments from the explicit role annotations on the paraphrases. Our approach is related to work on NCI using paraphrases [6], but is not restricted to compounds and is applicable in a multilingual setting. For the lexical acquisition of MWEs, previous work on lexical acquisition for FrameNet, for instance using distributional methods [7], can be adapted to MWEs. Our contributions are (i) an analysis of the state of MWEs in FrameNet, and (ii) a preliminary evaluation and discussion of the proposed method for ICR detection for MWEs.
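The core bootstrapping idea of the abstract, projecting explicit role annotations on a paraphrase back onto the MWE constituents, can be sketched as follows. The helper `incorporated_role` and the toy role annotations are illustrative assumptions, not the authors' implementation.

```python
def incorporated_role(mwe, paraphrase_roles):
    """Map explicit role annotations on a paraphrase back to an MWE.

    `paraphrase_roles` maps words of a paraphrase (e.g. "container that
    contains bread") to the FrameNet roles they fill. A role whose filler
    is also a constituent of the MWE is taken as its incorporated role.
    Toy sketch; real annotations come from semantic role labeling.
    """
    constituents = set(mwe.lower().split())
    for word, role in paraphrase_roles.items():
        if word.lower() in constituents:
            return role
    return None

# Toy annotation for the paraphrase "container that contains bread"
icr = incorporated_role("bread container", {"bread": "Contents"})  # "Contents"
```

The same projection yields Function for storage container given an annotated paraphrase such as "container used for storage".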
DIPF department:
Informationszentrum Bildung
Cross-genre and cross-domain detection of semantic uncertainty
Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna
Journal article
| In: Computational Linguistics Journal | 2012
Authors:
Szarvas, György; Vincze, Veronika; Farkas, Richárd; Móra, György; Gurevych, Iryna
Title:
Cross-genre and cross-domain detection of semantic uncertainty
In:
Computational Linguistics Journal, 38 (2012) 2, pp. 335-367
URL:
http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00098
Document type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Computational linguistics; Computer-assisted methods; Information; Information retrieval; Classification; Model; Natural-language system; Semantics; Language analysis; Text analysis; Academic discipline
Abstract (English):
Uncertainty is an important linguistic phenomenon that is relevant in various natural language processing applications, in diverse genres from medical to community-generated, newswire, or scientific discourse, and in domains from science to the humanities. The semantic uncertainty of a proposition can in most cases be identified using a finite dictionary, i.e., lexical cues, and the key steps of uncertainty detection in an application are locating the (genre- and domain-specific) lexical cues, disambiguating them, and linking them with the units of interest for the particular application (e.g., identified events in information extraction). In this study, we focus on the genre and domain differences of the context-dependent semantic uncertainty cue recognition task. We introduce a unified subcategorization of semantic uncertainty, as different domain applications can apply different uncertainty categories. Based on this categorization, we normalize the annotation of three corpora and present results with a state-of-the-art uncertainty cue recognition model for four fine-grained categories of semantic uncertainty. Our results reveal the domain and genre dependence of the problem; nevertheless, we also show that even a distant source-domain dataset can contribute to the recognition and disambiguation of uncertainty cues, efficiently reducing the annotation costs needed to cover a new domain. Thus, the unified subcategorization and domain adaptation for training the models offer an efficient solution for cross-domain and cross-genre semantic uncertainty recognition.
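The two-stage pipeline sketched in the abstract (locate lexical cues, then disambiguate them in context) can be illustrated with a toy cue lexicon; the cue list and the disambiguation heuristic below are assumptions for illustration only, not the paper's trained model.

```python
# Toy lexicon of uncertainty cues (illustrative; real cue lexicons are
# genre- and domain-specific and far larger)
CUES = {"may", "might", "possibly", "suggests", "appears"}

def find_cues(tokens):
    """Step 1: locate candidate uncertainty cues in a tokenized sentence."""
    return [(i, t) for i, t in enumerate(tokens) if t.lower() in CUES]

def is_uncertain(tokens, i):
    """Step 2: rough context-dependent disambiguation (toy heuristic):
    'May' directly followed by a number is likely the month, not a modal."""
    if tokens[i] == "May" and i + 1 < len(tokens) and tokens[i + 1][:1].isdigit():
        return False  # e.g. "May 2012"
    return True

tokens = "The results may indicate a new trend".split()
uncertain_cues = [(i, t) for i, t in find_cues(tokens) if is_uncertain(tokens, i)]
```

In a real system the second step would be a trained classifier over contextual features rather than a hand-written rule.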
DIPF department:
Informationszentrum Bildung
Detecting and correcting language errors using measures of contextual fitness
Zesch, Torsten
Journal article
| In: TAL Journal | 2012
Authors:
Zesch, Torsten
Title:
Detecting and correcting language errors using measures of contextual fitness
In:
TAL Journal, 53 (2012) 3, pp. 11-31
URL:
http://www.atala.org/IMG/pdf/Zesch-TAL3-3.pdf
Document type:
3a. Articles in peer-reviewed journals; contribution to a special issue
Language:
English
Keywords:
Automation; Computational linguistics; Errors; Measurement; Reference works; Online; Spelling; Text analysis
Abstract (English):
While detecting simple language errors (e.g., misspellings, number agreement) is nowadays standard functionality in all but the simplest text editors, other, more complicated language errors may go unnoticed. A difficult case is errors that come in the disguise of a valid word that fits syntactically into the sentence. We use the Wikipedia revision history to extract a dataset with such errors in their context. We show that the new dataset provides a more realistic picture of the performance of contextual fitness measures. The achieved error detection quality is generally sufficient for competent language users who are willing to accept a certain level of false alarms, but might be problematic for non-native writers who accept all suggestions made by the system. We make the full experimental framework publicly available, which allows other scientists to reproduce our experiments and conduct follow-up experiments.
DIPF department:
Informationszentrum Bildung
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
Bär, Daniel; Biemann, Chris; Gurevych, Iryna; Zesch, Torsten
Edited-volume contribution
| In: Agirre, Eneko (Ed.): *SEM First Joint Conference on Lexical and Computational Semantics | Montreal: Association for Computational Linguistics | 2012
Authors:
Bär, Daniel; Biemann, Chris; Gurevych, Iryna; Zesch, Torsten
Title:
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
In:
Agirre, Eneko (Ed.): *SEM First Joint Conference on Lexical and Computational Semantics, Montreal: Association for Computational Linguistics, 2012, pp. 435-440
URL:
http://aclweb.org/anthology-new/S/S12/S12-1059.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English
Keywords:
Analysis; Computational linguistics; Semantics; Text analysis; Methods
Abstract (English):
We present the UKP system, which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and the aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the 300+ features implemented.
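The idea of combining several content similarity measures with a linear model can be sketched as follows; the two toy features and the fixed weights are illustrative assumptions (the described system combines about 20 features per dataset with a trained log-linear regression).

```python
def char_ngrams(text, n=3):
    """Set of overlapping character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def features(s1, s2):
    """Two toy similarity measures: character trigrams and word overlap."""
    s1, s2 = s1.lower(), s2.lower()
    return [
        jaccard(char_ngrams(s1), char_ngrams(s2)),
        jaccard(set(s1.split()), set(s2.split())),
    ]

def sts_score(s1, s2, weights=(0.6, 0.4)):
    """Fixed linear combination standing in for the trained regression."""
    return sum(w * f for w, f in zip(weights, features(s1, s2)))
```

Identical sentences score 1.0 and unrelated sentences score near 0; in the real system, the weights are fitted to the gold similarity judgments of each dataset.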
DIPF department:
Informationszentrum Bildung
Learning semantics with deep belief network for cross-language information retrieval
Kim, Jungi; Nam, Jinseok; Gurevych, Iryna
Edited-volume contribution
| In: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Authors:
Kim, Jungi; Nam, Jinseok; Gurevych, Iryna
Title:
Learning semantics with deep belief network for cross-language information retrieval
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 579-588
URL:
http://aclweb.org/anthology-new/C/C12/C12-2057.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English
Keywords:
Computational linguistics; Information retrieval; Multilingualism; Semantics
Abstract:
This paper introduces a cross-language information retrieval (CLIR) framework that combines the state-of-the-art keyword-based approach with a latent semantic retrieval model. To capture and analyze the hidden semantics in cross-lingual settings, we construct latent semantic models that map text in different languages into a shared semantic space. Our proposed framework consists of a deep belief network (DBN) for each language, and we employ canonical correlation analysis (CCA) to construct the shared semantic space. We evaluate the proposed CLIR approach on a standard ad hoc CLIR dataset and show that cross-lingual semantic analysis with DBN and CCA improves the state-of-the-art keyword-based CLIR performance.
DIPF department:
Informationszentrum Bildung
To exhibit is not to loiter. A multilingual, sense-disambiguated Wiktionary for measuring verb […]
Meyer, Christian M.; Gurevych, Iryna
Edited-volume contribution
| In: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Authors:
Meyer, Christian M.; Gurevych, Iryna
Title:
To exhibit is not to loiter. A multilingual, sense-disambiguated Wiktionary for measuring verb similarity
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 1763-1780
URL:
http://aclweb.org/anthology-new/C/C12/C12-1108.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English; German
Keywords:
Relations; Computational linguistics; Multilingualism; Semantics; Sense; Translation; Comparison; Word; Dictionary
Abstract:
We construct a new multilingual lexical resource from Wiktionary by disambiguating semantic relations and translations. For this task, we propose and evaluate an automatic disambiguation method that significantly outperforms previous approaches. We additionally introduce a method for inferring new semantic relations based on the disambiguated translations. Our resource fills the gap between expert-built resources, which suffer from high cost and small size, and Wikipedia-based resources, which are restricted to encyclopedic knowledge about nouns. We demonstrate this by applying our new resource to measuring monolingual and cross-lingual verb similarity. For the latter, our resource yields better results than Wikipedia and expert-built multilingual wordnets. We make our final resource and the evaluation datasets publicly available.
DIPF department:
Informationszentrum Bildung
Using distributional similarity for lexical expansion in knowledge-based word sense disambiguation
Miller, Tristan; Biemann, Chris; Zesch, Torsten; Gurevych, Iryna
Edited-volume contribution
| In: Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) | Mumbai: The COLING 2012 Organizing Committee | 2012
Authors:
Miller, Tristan; Biemann, Chris; Zesch, Torsten; Gurevych, Iryna
Title:
Using distributional similarity for lexical expansion in knowledge-based word sense disambiguation
In:
Kay, Martin; Boitet, Christian (Eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 1781-1796
URL:
http://aclweb.org/anthology-new/C/C12/C12-1109.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English
Keywords:
Computational linguistics; Sense; Thesaurus; Distribution; Word
Abstract:
We explore the contribution of distributional information to purely knowledge-based word sense disambiguation. Specifically, we use a distributional thesaurus, computed from a large parsed corpus, for lexical expansion of context and sense information. This bridges the lexical gap that is seen as the major obstacle for word-overlap-based approaches. We apply this mechanism to two traditional knowledge-based methods and show that distributional information significantly improves disambiguation results across several datasets. This improvement exceeds the state of the art for disambiguation without sense frequency information, a situation which is especially encountered with new domains or languages for which no sense-annotated corpus is available.
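The lexical expansion idea can be sketched with a simplified Lesk-style overlap: expand both the context and each sense description with distributionally similar words, then pick the sense with the largest overlap. The toy thesaurus and glosses below are invented for illustration and do not reflect the paper's actual resources.

```python
def expand(words, thesaurus):
    """Add distributionally similar words to bridge the lexical gap."""
    expanded = set(words)
    for w in words:
        expanded.update(thesaurus.get(w, ()))
    return expanded

def disambiguate(context, sense_glosses, thesaurus):
    """Pick the sense whose expanded gloss overlaps most with the
    expanded context (simplified Lesk with lexical expansion)."""
    ctx = expand(context, thesaurus)
    return max(
        sense_glosses,
        key=lambda s: len(ctx & expand(sense_glosses[s], thesaurus)),
    )

# Toy distributional thesaurus and sense inventory for "bank"
thesaurus = {"deposit": ["money", "cash"], "shore": ["water"]}
glosses = {
    "bank/finance": ["money", "institution"],
    "bank/river": ["shore", "water"],
}
sense = disambiguate(["deposit", "interest"], glosses, thesaurus)  # "bank/finance"
```

Without the expansion step, the context {deposit, interest} shares no word with either gloss, which is exactly the lexical gap the paper addresses.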
DIPF department:
Informationszentrum Bildung
Measuring contextual fitness using error contexts extracted from the Wikipedia revision history
Zesch, Torsten
Edited-volume contribution
| In: Association for Computational Linguistics (Ed.): Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012) | Avignon: Association for Computational Linguistics | 2012
Authors:
Zesch, Torsten
Title:
Measuring contextual fitness using error contexts extracted from the Wikipedia revision history
In:
Association for Computational Linguistics (Ed.): Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon: Association for Computational Linguistics, 2012, pp. 529-538
URL:
http://aclweb.org/anthology-new/E/E12/E12-1054.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English
Keywords:
Computational linguistics; Evaluation; Errors; Measurement; Semantics; Statistical methods; Text analysis; Methods
Abstract (English):
We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely overestimated, while the precision of knowledge-based approaches has been underestimated. Additionally, we show that knowledge-based approaches can be improved by using semantic relatedness measures that make use of knowledge beyond classical taxonomic relations. Finally, we show that statistical and knowledge-based methods can be combined for increased performance.
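A statistical contextual-fitness check of the kind evaluated here can be sketched with n-gram counts: a word is suspicious if a confusable alternative is far more frequent in the same context. The toy counts, the confusion set, and the threshold are illustrative assumptions rather than the paper's actual model.

```python
# Toy trigram counts standing in for a large corpus model (illustrative)
TRIGRAM_COUNTS = {
    ("out", "of", "sight"): 120,
    ("out", "of", "site"): 1,
}

def fitness(prev2, prev1, word, counts=TRIGRAM_COUNTS):
    """How well `word` fits after `prev2 prev1`, by corpus frequency."""
    return counts.get((prev2, prev1, word), 0)

def flag_real_word_error(prev2, prev1, word, confusion_set, threshold=10):
    """Flag `word` if a confusable alternative fits the context much
    better: a likely real-word spelling error (e.g. site vs. sight)."""
    own = fitness(prev2, prev1, word)
    best_alt = max(fitness(prev2, prev1, alt) for alt in confusion_set)
    return best_alt > threshold * max(own, 1)

suspicious = flag_real_word_error("out", "of", "site", ["sight"])  # True
```

The threshold trades precision against false alarms, which is exactly the trade-off the abstract discusses for competent versus non-native writers.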
DIPF department:
Informationszentrum Bildung
HOO 2012 shared task: UKP lab system description
Zesch, Torsten; Haase, Jens
Edited-volume contribution
| In: Association for Computational Linguistics (Ed.): Proceedings of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications at NAACL-HLT | Montreal: Association for Computational Linguistics | 2012
Authors:
Zesch, Torsten; Haase, Jens
Title:
HOO 2012 shared task: UKP lab system description
In:
Association for Computational Linguistics (Ed.): Proceedings of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications at NAACL-HLT, Montreal: Association for Computational Linguistics, 2012, pp. 302-306
URL:
http://aclweb.org/anthology-new/W/W12/W12-2036.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English
Keywords:
Application example; Computational linguistics; Errors; Measurement; Software; Text analysis; Methods
Abstract (English):
In this paper, we describe the UKP Lab system participating in the HOO 2012 shared task on preposition and determiner error correction. Our focus was to implement a highly flexible and modular system that can easily be augmented by other researchers. The system may be used to provide a level playing field for subsequent shared tasks and enable further progress in this important research field, on top of the state of the art identified by the shared task.
DIPF department:
Informationszentrum Bildung
The Working Group for Open Data in Linguistics
Chiarcos, Christian; Hellmann, Sebastian; Nordhoff, Sebastian; Cimiano, Philipp; McCrae, John; […]
Edited-volume contribution
| In: DGfS-CL (Ed.): Sprache als komplexes System: Proceedings der 34. Jahrestagung der DGfS | Frankfurt am Main: DGfS-CL | 2012
Authors:
Chiarcos, Christian; Hellmann, Sebastian; Nordhoff, Sebastian; Cimiano, Philipp; McCrae, John; Brekle, Jonas; Eckle-Kohler, Judith; Gurevych, Iryna; Hartmann, Silvana; Matuschek, Michael; Meyer, Christian M.; Littauer, Richard
Title:
The Working Group for Open Data in Linguistics
In:
DGfS-CL (Ed.): Sprache als komplexes System: Proceedings der 34. Jahrestagung der DGfS, Frankfurt am Main: DGfS-CL, 2012, p. 1
URL:
https://www.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2012/dgfs2012-posterOWLG.pdf
Document type:
4. Contributions to edited volumes; conference proceedings contribution
Language:
English
Keywords:
Working group; Exchange; Computational linguistics; Data; Database; Linguistics; Open Access; Semantic Web; Semantics; Language analysis; Networking; Web 2.0
Abstract (English):
The Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (OKFN) is an initiative of experts from different fields concerned with linguistic data, including academic linguistics (e.g., typology, corpus linguistics), applied linguistics (e.g., computational linguistics, lexicography, and language documentation), and NLP (e.g., the Semantic Web community). The primary goals of the working group are 1) promoting the idea of open linguistic resources, 2) developing means for their representation, and 3) encouraging the exchange of ideas across different disciplines. Here, we focus on one particular aspect of our work, the promotion of linked data in linguistics.
DIPF department:
Informationszentrum Bildung
Page 10 of 11