-
-
Autor*innen: Mousselly-Sergieh, Hatem; Gurevych, Iryna
Titel: Enriching wikidata with frame semantics
Aus: Association for Computational Linguistics (Hrsg.): Proceedings of the 5th workshop on automated knowledge base construction (AKBC) 2016 held in conjunction with NAACL 2016, Stroudsburg; PA: Association for Computational Linguistics, 2016 , S. 29-34
URL: https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/2016_NAACL_AKBC_HMS.pdf
Dokumenttyp: 4. Beiträge in Sammelwerken; Tagungsband/Konferenzbeitrag/Proceedings
Sprache: Englisch
Schlagwörter: Automatisierung; Computerlinguistik; Lexikon; Mehrsprachigkeit; Online; Semantik
Abstract (english): Wikidata is a large-scale, multilingual and freely available knowledge base. It contains more than 14 million facts; however, it is still missing linguistic information. In this paper, we aim to bridge this gap by aligning Wikidata with the FrameNet lexicon. We propose an approach based on word embeddings to identify a mapping between Wikidata relations, called properties, and FrameNet frames, and to annotate the arguments of each relation with the semantic roles of the matching frames. Early empirical results show the advantage of our approach compared to other baseline methods. (DIPF/Orig.)
DIPF-Abteilung: Informationszentrum Bildung
-
-
Autor*innen: Flekova, Lucie; Ruppert, Eugen; Preotiuc-Pietro, Daniel
Titel: Analyzing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
Aus: Association for Computational Linguistics (Hrsg.): 6th workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015): Workshop proceedings, 17 September 2015, Lisboa, Portugal, Red Hook; NY: Association for Computational Linguistics, 2015 , S. 77-84
URL: http://www.emnlp2015.org/proceedings/WASSA/WASSA-2015.pdf#page=89
Dokumenttyp: 4. Beiträge in Sammelwerken; Tagungsband/Konferenzbeitrag/Proceedings
Sprache: Englisch
Schlagwörter: Automatisierung; Computerlinguistik; Emotion; Kommunikation; Lexikographie; Lexikon; Online; Qualität; Soziale Software; Textanalyse; Thesaurus
Abstract: Contemporary sentiment analysis approaches rely on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information and leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon-based sentiment detection and can be further used to quantify ambiguous words. (DIPF/Orig.)
-
-
Autor*innen: Stisser, Anna; Hild, Anne; Ell, Basil; Schindler, Christoph
Titel: Neue Forschungswerkzeuge in der Historischen Bildungsforschung. Die virtuelle Forschungsumgebung SMW-CorA für die kollaborative Analyse und Auswertung umfangreicher digitalisierter Quellen
In: Jahrbuch für Historische Bildungsforschung, 19 (2014) , S. 305-325
Dokumenttyp: 3a. Beiträge in begutachteten Zeitschriften; Aufsatz (keine besondere Kategorie)
Sprache: Deutsch
Schlagwörter: Bildungsforschung; Bildungsgeschichte; Codierung; Daten; Digitalisierung; Forscher; Kooperation; Lexikon; MediaWiki; Metadaten; Netzwerk; Qualitätssicherung; Quelle; Semantic Web; Text
Abstract: The authors present a web-based research environment for the collaborative, cooperative analysis and evaluation of extensive digitized sources. […] "For the development of such research environments, collaboration between developers and researchers, together with a focus on very concrete research interests and data sets, is considered necessary in order to meet concrete needs and requirements. This is the starting point for the development of a Virtual Research Environment based on Semantic MediaWiki (SMW) for the collaborative analysis of extensive digitized text corpora, which is presented in the following."
DIPF-Abteilung: Informationszentrum Bildung
-
-
Autor*innen: Eckle-Kohler, Judith; Gurevych, Iryna; Hartmann, Silvana; Matuschek, Michael; Meyer, Christian M.
Titel: UBY-LMF. Exploring the boundaries of language-independent lexicon models
Aus: Francopoulo, Gil (Hrsg.): LMF Lexical Markup Framework, Hoboken; NJ: Wiley, 2013 (Computer engineering and IT series), S. 145-156
Dokumenttyp: 4. Beiträge in Sammelwerken; Sammelband (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Computerlinguistik; Datenverarbeitung; Lexikographie; Lexikon; Mehrsprachigkeit; Modell; Semantik; Syntax
Abstract: Following a long series of successful scientific projects and collaborations, the community responsible for developing lexicons for Natural Language Processing (NLP) and Machine Readable Dictionaries (MRDs) decided to jump-start their International Organization for Standardization (ISO) standardization activities in 2003. A group of 60 researchers (cited herein as the "LMF team") spent 5 years gathering requirements and developing the ideas which resulted in the LMF standard. The LMF specification is a success. Numerous lexicon managers currently use LMF in different languages and contexts. This book is dedicated to reporting on a number of these applications. It is structured as follows: Chapter 1 presents the historical context of LMF. Chapter 2 provides an overview of the LMF model. Chapter 3 deals with the Data Category Registry, which provides a flexible means for applying constants like /grammatical gender/ in a variety of different settings. The remaining chapters present concrete applications and experiments on real data, which are important for developers who want to learn about the use of LMF. Despite this success, we do not claim that LMF is perfect. Indeed, several chapters describe a number of limitations and/or proposals for its improvement.
DIPF-Abteilung: Informationszentrum Bildung
-
-
Autor*innen: Ell, Basil; Schindler, Christoph; Rittberger, Marc
Titel: Semantically enhanced interactions between heterogeneous data life-cycles. Analyzing educational lexica in a Virtual Research Environment
Aus: Garoufallou, Emmanouel; Greenberg, Jane (Hrsg.): Metadata and Semantics Research: 7th Research Conference, MTSR 2013, Thessaloniki, Greece, November 19-22, 2013, proceedings, Heidelberg: Springer, 2013 (Communications in Computer and Information Science, 390), S. 277-288
DOI: 10.1007/978-3-319-03437-9_28
URL: http://link.springer.com/chapter/10.1007/978-3-319-03437-9_28
Dokumenttyp: 4. Beiträge in Sammelwerken; Tagungsband/Konferenzbeitrag/Proceedings
Sprache: Englisch
Schlagwörter: Bildungsforschung; Daten; Datenaustausch; Elektronische Bibliothek; Forschung; Infrastruktur; Interaktion; Lexikon; Qualitative Forschung; Quantitative Forschung; Semantic Web; Wissenschaftler
Abstract: This paper highlights how Semantic Web technologies facilitate new socio-technical interactions between researchers and libraries focussing research data in a Virtual Research Environment. Concerning data practices in the fields of social sciences and humanities, the worlds of researchers and librarians have so far been separate. The increased digitization of research data and the ubiquitous use of Web technologies change this situation and offer new capacities for interaction. This is realized as a semantically enhanced Virtual Research Environment, which offers the possibility to align the previously disparate data life-cycles in research and in libraries covering a variety of inter-activities from importing research data via enriching research data and cleansing to exporting and sharing to allow for reuse. Currently, collaborative qualitative and quantitative analyses of a large digital corpus of educational lexica are carried out using this semantic and wiki-based research environment.
DIPF-Abteilung: Informationszentrum Bildung
-
-
Autor*innen: Gurevych, Iryna; Eckle-Kohler, Judith; Hartmann, Silvana; Matuschek, Michael; Meyer, Christian M.; Nghiem, Tri-Duc
Titel: UBY - A large-scale lexical-semantic resource [Abstract]
Aus: Theune, M. ; Nijholt, A. (Hrsg.): Book of abstracts of the 23rd Meeting of Computational Linguistics in the Netherlands (CLIN 2013), Enschede: Universiteit Twente, 2013 , S. 81
URL: http://hmi.ewi.utwente.nl/clin2013-dir/bookofabstracts.pdf
Dokumenttyp: 4. Beiträge in Sammelwerken; Tagungsband/Konferenzbeitrag/Proceedings
Sprache: Englisch
Schlagwörter: Automatisierung; Computerlinguistik; Computerunterstütztes Verfahren; Deutsch; Englisch; Lexikon
Abstract: We present UBY, a large-scale lexical-semantic resource combining a wide range of information from expert-constructed and collaboratively created resources for English and German. It currently contains nine resources in two languages: English WordNet, Wiktionary, Wikipedia, FrameNet and VerbNet, German Wikipedia, Wiktionary, and GermaNet, and the multilingual OmegaWiki. The main contributions of our work can be summarised as follows. First, we define a standardised format for modelling the heterogeneous information coming from the various lexical-semantic resources (LSRs) and languages included in UBY. For this purpose, we employ the ISO standard Lexical Markup Framework and Data Categories selected from ISOCat. In this way, all types of information provided by the LSRs in UBY are easily accessible on a fine-grained level. Further, this standardised format facilitates the extension of UBY with new languages and resources. This is different from previous efforts in combining LSRs which usually targeted particular applications and thus focused on aligning specific types of information only. Second, UBY contains nine pairwise sense alignments between resources. Through these alignments, we provide access to the complementary information for a word sense in different resources. For example, if one looks up a particular verb sense in UBY, one has simultaneous access to the sense in WordNet and to the corresponding sense in FrameNet. Third, UBY is freely available and we have developed an easy-to-use Java API which provides unified access to all types of information contained in UBY. This facilitates the utilization of UBY for a variety of NLP tasks.
DIPF-Abteilung: Informationszentrum Bildung
-
-
Autor*innen: Hartmann, Silvana; Gurevych, Iryna
Titel: FrameNet on the way to Babel. Creating a bilingual FrameNet using Wiktionary as interlingual connection
Aus: Association for Computational Linguistics (Hrsg.): 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference System Demonstrations, Stroudsburg; PA: Association for Computational Linguistics, 2013 , S. 1363-1373
URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.386.1725&rep=rep1&type=pdf#page=49pdf
Dokumenttyp: 4. Beiträge in Sammelwerken; Tagungsband/Konferenzbeitrag/Proceedings
Sprache: Englisch
Schlagwörter: Computerlinguistik; Evaluation; Konzeption; Lexikographie; Lexikon; Methode; Semantik; Soziale Software; World wide web 2.0; Zweisprachigkeit
Abstract: We present a new bilingual FrameNet lexicon for English and German. It is created through a simple, but powerful approach to construct a FrameNet in any language using Wiktionary as an interlingual representation. Our approach is based on a sense alignment of FrameNet and Wiktionary, and subsequent translation disambiguation into the target language. We perform a detailed evaluation of the created resource and a discussion of Wiktionary as an interlingual connection for the cross-language transfer of lexical-semantic resources. The created resource is publicly available at http://www.ukp.tu-darmstadt.de/fnwkde/
DIPF-Abteilung: Informationszentrum Bildung
-
-
Autor*innen: Hartmann, Silvana; Gurevych, Iryna
Titel: Acquisition of multiword lexical units for FrameNet
Erscheinungsvermerk: Berkeley: Språkbanken (the Swedish Language Bank), 2013 (International FrameNet Workshop 2013)
URL: http://spraakbanken.gu.se/sites/spraakbanken.gu.se/files/fn_mwe_at_fn_ws_130419.pdf
Dokumenttyp: 5. Arbeits- und Diskussionspapiere; Arbeits- und Diskussionspapier (keine besondere Kategorie)
Sprache: Englisch
Schlagwörter: Automatisierung; Computerlinguistik; Computerunterstütztes Verfahren; Lexikon; Semantik; Textanalyse; Wort
Abstract (english): FrameNet [1] is a well-known resource for modeling the predicate argument structure of words and organizing them in situation-specific frames and semantic roles (i.e., frame elements). Its interesting formalism to represent the semantics of multiword expressions (MWEs) is often overlooked [2]. FrameNet can represent the relation between constituents of MWEs. The following example from [2] illustrates this: storage container and bread container evoke the Container frame. Roles of this frame are the Material of the container, its Contents, Size, or Function. For storage container, storage fills the Function role, while for bread container, bread fills the Contents role (Fig. 1: Incorporated roles). The FrameNet lexicon model provides the option to annotate Function and Contents as an "incorporated role" (ICR) for the respective MWEs. Thus, the implicit relations between the constituents of the MWEs are made explicit. A large FrameNet MWE lexicon can enhance FrameNet-based semantic role labeling (SRL) by a better model for MWEs (see analogous developments integrating MWE detection in parsing [3]). Moreover, the lexicon can be used as an information source for the automatic interpretation of MWEs in applications such as information extraction, question answering, or machine translation, for instance by providing features for noun compound interpretation (NCI) [5]. Finally, it provides a basis for further theoretical investigation of MWE semantics. Unfortunately, the coverage of MWEs in FrameNet 1.5 is low; it contains fewer than 1,000 multi-word entries. This also affects the performance of FrameNet-based SRL [4]. Currently, FrameNet does not make use of its potential to model the relations within MWEs: even though leather jacket does occur in the FrameNet example sentences for the Clothing frame with the desired incorporated role (Material), it does not receive a separate lexical entry.
To close this gap, and to make full use of FrameNet's potential, an automatic process for the acquisition of MWE lexical units and MWE semantics is desired. Such an automatic approach needs to be based on solid theoretical foundations. Therefore, we present an analysis of the current state of MWEs in FrameNet. Then, we focus on the acquisition of MWE semantics, specifically of ICRs, which, to our knowledge, has not been addressed before. We present a new approach to bootstrap the ICRs of MWEs in FrameNet by annotating their paraphrases with semantic roles, for instance container that contains bread for bread container. The semantic dependencies between the verb contains, which evokes the Container frame, and bread, which fills the Contents role, mirror the relations between the constituents in bread container (Fig. 2). Thus, we can extract the incorporated arguments from the explicit role annotations on the paraphrases. Our approach is related to the work on NCI using paraphrases [6], but is not restricted to compounds and is applicable in a multilingual setting. For lexical acquisition of MWEs, previous work on lexical acquisition for FrameNet, for instance using distributional methods [7], can be adapted to MWEs. Our contributions are (i) an analysis of the state of MWEs in FrameNet, and (ii) a preliminary evaluation and discussion of the proposed method for ICR detection on MWEs.
DIPF-Abteilung: Informationszentrum Bildung
-
-
Herausgeber*innen: Horn, Klaus-Peter; Kemnitz, Heidemarie; Marotzki, Winfried; Sandfuchs, Uwe; Füssel, Hans-Peter
Titel: Klinkhardt Lexikon Erziehungswissenschaft
Erscheinungsvermerk: Bad Heilbrunn: UTB/Klinkhardt, 2012
Dokumenttyp: 2. Herausgeberschaft; Sammelband (keine besondere Kategorie)
Sprache: Deutsch
Schlagwörter: Berufspädagogik; Bildungsforschung; Bildungsgeschichte; Bildungspolitik; Bildungsrecht; Biografie; Erwachsenenbildung; Erziehungswissenschaft; Familie; Historische Pädagogik; Interkulturelle Pädagogik; Lexikon; Medienpädagogik; Methode; Pädagoge; Pädagogik; Psychologie; Schulpädagogik; Sonderpädagogik; Sozialpädagogik; Soziologie; Vergleichende Erziehungswissenschaft; Vorschulerziehung; Weiterbildung; Wirtschaftspädagogik
Abstract: The lexicon covers all subfields of educational science. For 16 defined subject areas - General Educational Science, Vocational and Business Education, Education Policy, Adult and Continuing Education, Family and Preschool Education, History of Education, Intercultural Education, Media Education, Methods of Educational Research, Psychology, Law, School Pedagogy, Special Education, Social Pedagogy, Sociology, Comparative Education - keywords graded by importance were generated and developed in cooperation with competent representatives of each field.
DIPF-Abteilung: Struktur und Steuerung des Bildungswesens
-
-
Autor*innen: Daxenberger, Johannes; Gurevych, Iryna
Titel: A corpus-based study of edit categories in featured and non-featured Wikipedia articles
Aus: Kay, Martin; Boitet, Christian (Hrsg.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012 , S. 711-726
URL: http://aclweb.org/anthology-new/C/C12/C12-1044.pdf
Dokumenttyp: 4. Beiträge in Sammelwerken; Tagungsband/Konferenzbeitrag/Proceedings
Sprache: Englisch
Schlagwörter: Bewertung; Internet; Lexikon; Mitarbeit; Online; Publizieren; Qualität; Schreiben
Abstract: In this paper, we present a study of the collaborative writing process in Wikipedia. Our work is based on a corpus of 1,995 edits obtained from 891 article revisions in the English Wikipedia. We propose a 21-category classification scheme for edits based on Faigley and Witte's (1981) model. Example edit categories include spelling error corrections and vandalism. In a manual multi-label annotation study with 3 annotators, we obtain an inter-annotator agreement of α = 0.67. We further analyze the distribution of edit categories for distinct stages in the revision history of 10 featured and 10 non-featured articles. Our results show that the information content in featured articles tends to become more stable after their promotion. By contrast, this is not true for non-featured articles. We make the resulting corpus and the annotation guidelines freely available at http://www.ukp.tu-darmstadt.de/data/wiki-edits/
DIPF-Abteilung: Informationszentrum Bildung