Authors: Hartmann, Silvana; Gurevych, Iryna
Title: Acquisition of multiword lexical units for FrameNet
Publication note: Berkeley: Språkbanken (the Swedish Language Bank), 2013 (International FrameNet Workshop 2013)
URL: http://spraakbanken.gu.se/sites/spraakbanken.gu.se/files/fn_mwe_at_fn_ws_130419.pdf
Document type: 5. Working and discussion papers; Working and discussion paper (no specific category)
Language: English
Keywords: Automation; Computational linguistics; Computer-assisted method; Lexicon; Semantics; Text analysis; Word
Abstract (English): FrameNet [1] is a well-known resource for modeling the predicate-argument structure of words and organizing them in situation-specific frames and semantic roles (i.e., frame elements). Its interesting formalism for representing the semantics of multiword expressions (MWEs) is often overlooked [2]. FrameNet can represent the relation between the constituents of MWEs. The following example from [2] illustrates this: storage container and bread container evoke the Container frame. Roles of this frame are the Material of the container, its Contents, Size, or Function. For storage container, storage fills the Function role, while for bread container, bread fills the Contents role (Fig. 1: Incorporated roles). The FrameNet lexicon model provides the option to annotate Function and Contents as an "incorporated role" (ICR) for the respective MWEs. Thus, the implicit relations between the constituents of the MWEs are made explicit. A large FrameNet MWE lexicon can enhance FrameNet-based semantic role labeling (SRL) through a better model for MWEs (see analogous developments integrating MWE detection in parsing [3]). Moreover, the lexicon can be used as an information source for the automatic interpretation of MWEs in applications such as information extraction, question answering, or machine translation, for instance by providing features for noun compound interpretation (NCI) [5]. Finally, it provides a basis for further theoretical investigation of MWE semantics. Unfortunately, the coverage of MWEs in FrameNet 1.5 is low; it contains fewer than 1,000 multiword entries. This also affects the performance of FrameNet-based SRL [4]. Currently, FrameNet does not make use of its potential to model the relations within MWEs: even though leather jacket does occur in the FrameNet example sentences for the Clothing frame with the desired incorporated role (Material), it does not receive a separate lexical entry.
To close this gap, and to make full use of FrameNet's potential, an automatic process for the acquisition of MWE lexical units and MWE semantics is desired. Such an automatic approach needs to be based on solid theoretical foundations. Therefore, we present an analysis of the current state of MWEs in FrameNet. Then, we focus on the acquisition of MWE semantics, specifically of ICRs, which, to our knowledge, has not been addressed before. We present a new approach to bootstrap the ICRs of MWEs in FrameNet by annotating their paraphrases with semantic roles, for instance container that contains bread for bread container. The semantic dependencies between the verb contains, which evokes the Container frame, and bread, which fills the Contents role, mirror the relations between the constituents in bread container (Fig. 2). Thus, we can extract the incorporated arguments from the explicit role annotations on the paraphrases. Our approach is related to the work on NCI using paraphrases [6], but it is not restricted to compounds and is applicable in a multilingual setting. For the lexical acquisition of MWEs, previous work on lexical acquisition for FrameNet, for instance using distributional methods [7], can be adapted to MWEs. Our contributions are (i) an analysis of the state of MWEs in FrameNet, and (ii) a preliminary evaluation and discussion of the proposed method for ICR detection on MWEs.
DIPF department: Informationszentrum Bildung (Information Center for Education)