-
-
Authors: Veeranna, Sappadla Prateek; Nam, Jinseok; Mencía, Eneldo Loza; Fürnkranz, Johannes
Title: Using semantic similarity for multi-label zero-shot classification of text documents
From: European Symposium on Artificial Neural Networks (Ed.): ESANN 2016 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium), 27-29 April 2016, Bruges: European Symposium on Artificial Neural Networks, 2016, pp. 423-428
URL: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-174.pdf
Document type: 4. Contributions to collected works; conference volume/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Classification; Semantics; Text
Abstract (english): In this paper, we examine a simple approach to zero-shot multi-label text classification, i.e., to the problem of predicting multiple, possibly previously unseen labels for a document. In particular, we propose to use a semantic embedding of label and document words and base the prediction of previously unseen labels on the similarity between the label name and the document words in this embedding. Experiments on three textual datasets across various domains show that even such a simple technique yields considerable performance improvements over a simple uninformed baseline. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Sousa, Tahir; Flekova, Lucie; Mieskes, Margot; Gurevych, Iryna
Title: Constructive feedback, thinking process and cooperation. Assessing the quality of classroom interaction
From: Möller, Sebastian (Ed.): Proceedings of the Interspeech 2015 Conference Dresden, Berlin: Technische Universität, 2015, pp. 2739-2743
Document type: 4. Contributions to collected works; conference volume/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Data analysis; Thinking; Germany; Discourse analysis; Feedback; Interaction analysis; Classification; Cooperation; Mathematics instruction; Quality; School class; Switzerland; Semantics; Social interaction; Language analysis; Research on teaching; Video
Abstract: Analyzing and assessing the quality of classroom lessons on a range of quality dimensions is a central topic in educational research, as it allows teacher training and interventions to be developed to improve lesson quality. We model this assessment as a text classification task, exploiting linguistic features to predict the scores in several lesson quality dimensions relevant for educational researchers. Our work relies on a variety of phenomena, among them paralinguistic features, such as laughter, from real classroom interactions. We used these features to train machine learning models to assess various quality dimensions of school lessons. Our results show that features focusing on discourse and semantics are especially beneficial for this classification task. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Beinborn, Lisa; Zesch, Torsten; Gurevych, Iryna
Title: Readability for foreign language learning. The importance of cognates
In: Recent Advances in Automatic Readability Assessment and Text Simplification: Special Issue of the International Journal of Applied Linguistics, 165 (2014) 2, pp. 136-162
DOI: 10.1075/itl.165.2.02bei
Document type: 3a. Articles in peer-reviewed journals; article in special issue
Language: English
Keywords: Association; Deduction; Influencing factor; Foreign language; Reading competence; Reading comprehension; Measurement; Model; Native language; Semantics; Language acquisition; Text comprehension; Second language acquisition
Abstract: In this paper, we analyse the differences between L1 acquisition and L2 learning and identify four main aspects: input quality and quantity, mapping processes, cross-lingual influence, and reading experience. As a consequence of these differences, we conclude that L1 readability measures cannot be directly mapped to L2 readability. We propose to calculate L2 readability for various dimensions and for smaller units. It is particularly important to account for the cross-lingual influence from the learner's L1 and other previously acquired languages and for the learner's higher experience in reading. In our analysis, we focus on lexical readability as it has been found to be the most influential dimension for L2 reading comprehension. We discuss the features frequency, lexical variation, concreteness, polysemy, and context specificity and analyse their impact on L2 readability. As a new feature specific to L2 readability, we propose the cognateness of words with words in languages the learner already knows. A pilot study confirms our assumption that learners can deduce the meaning of new words by their cognateness to other languages. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Ihrke, Matthias; Behrendt, Jörg; Menge, Uwe; Titz, Cora; Hasselhorn, Marcus
Title: Response-retrieval in identity negative priming is modulated by temporal discriminability
In: Frontiers in Psychology, (2014), p. 5:621
DOI: 10.3389/fpsyg.2014.00621
URL: http://journal.frontiersin.org/Journal/10.3389/fpsyg.2014.00621/abstract
Document type: 3a. Articles in peer-reviewed journals; article (no special category)
Language: English
Keywords: Attention; Germany; Adult; Experiment; Experimental study; Memory; Reaction; Semantics; Sensory impression; Inattention; Visual perception; Repetition
Abstract (english): Reaction times to previously ignored information are often delayed, a phenomenon referred to as negative priming (NP). Rothermund et al. (2005) proposed that NP is caused by the retrieval of incidental stimulus-response associations when consecutive displays share visual features but require different responses. In two experiments we examined whether the features (color, shape) that reappear in consecutive displays, or their level of processing (early-perceptual, late-semantic) moderate the likelihood that stimulus-response associations are retrieved. Using a perceptual matching task (Experiment 1), NP occurred independently of whether responses were repeated or switched. Only when implementing a semantic-matching task (Experiment 2), negative priming was determined by response-repetition as predicted by response-retrieval theory. The results can be explained in terms of a task-dependent temporal discrimination process (Milliken et al., 1998): Response-relevant features are encoded more strongly and/or are more likely to be retrieved than irrelevant features. (DIPF/Orig.)
DIPF department: Education and Human Development
-
-
Authors: Oelke, Daniela; Strobelt, Hendrik; Rohrdantz, Christian; Gurevych, Iryna; Deussen, Oliver
Title: Comparative exploration of document collections. A visual analytics approach
In: Computer Graphics Forum, 33 (2014) 3, pp. 201-210
DOI: 10.1111/cgf.12376
Document type: 3a. Articles in peer-reviewed journals; article (no special category)
Language: English
Keywords: Automation; Computer graphics; Computational linguistics; Information system; Method; Modeling; Semantics; Text analysis; Comparison; Visualization
Abstract: We present an analysis and visualization method for computing what distinguishes a given document collection from others. We determine topics that discriminate a subset of collections from the remaining ones by applying probabilistic topic modeling and subsequently approximating the two relevant criteria distinctiveness and characteristicness algorithmically through a set of heuristics. Furthermore, we suggest a novel visualization method called DiTop-View, in which topics are represented by glyphs (topic coins) that are arranged on a 2D plane. Topic coins are designed to encode all information necessary for performing comparative analyses such as the class membership of a topic, its most probable terms and the discriminative relations. We evaluate our topic analysis using statistical measures and a small user experiment and present an expert case study with researchers from political sciences analyzing two real-world datasets. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Erbs, Nicolai; Gurevych, Iryna; Zesch, Torsten
Title: Sense and similarity. A study of sense-level similarity measures
From: Bos, Johan; Frank, Anette; Navigli, Roberto (Eds.): Proceedings of the 3rd Joint Conference on Lexical and Computational Semantics (SEM 2014), Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 30-39
URL: http://www.aclweb.org/anthology/S14-1004
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Ambiguity; Concept; Computational linguistics; Measurement; Semantics; Sense; Text analysis; Word
Abstract: In this paper, we investigate the difference between word and sense similarity measures and present means to convert a state-of-the-art word similarity measure into a sense similarity measure. In order to evaluate the new measure, we create a special sense similarity dataset and re-rate an existing word similarity dataset using two different sense inventories from WordNet and Wikipedia. We discover that word-level measures were not able to differentiate between different senses of one word, while sense-level measures actually increase correlation when shifting to sense similarities. Sense-level similarity measures improve when evaluated with a re-rated sense-aware gold standard, while correlation with word-level similarity measures decreases. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Flekova, Lucie; Ferschke, Oliver; Gurevych, Iryna
Title: UKPDIPF. A lexical semantic approach to sentiment polarity prediction in Twitter data
From: Nakov, Preslav; Zesch, Torsten (Eds.): Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 704-710
URL: http://alt.qcri.org/semeval2014/cdrom/pdf/SemEval2014126.pdf
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Expression <Psy>; Computational linguistics; Emotion; Classification; Written language; Semantics; Social software; Text analysis
Abstract: We present a sentiment classification system that participated in the SemEval 2014 shared task on sentiment analysis in Twitter. Our system expands tokens in a tweet with semantically similar expressions using a large novel distributional thesaurus and calculates the semantic relatedness of the expanded tweets to word lists representing positive and negative sentiment. This approach helps to assess the polarity of tweets that do not directly contain polarity cues. Moreover, we incorporate syntactic, lexical and surface sentiment features. On the message level, our system achieved the 8th place in terms of macro-averaged F-score among 50 systems, with particularly good performance on the Life-Journal corpus (F1=71.92) and the Twitter sarcasm (F1=54.59) dataset. On the expression level, our system ranked 14 out of 27 systems, based on macro-averaged F-score. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Matuschek, Michael; Gurevych, Iryna
Title: High performance word sense alignment by joint modeling of sense distance and gloss similarity
From: Tsujii, Junichi; Hajic, Jan (Eds.): Proceedings of COLING 2014: Technical papers, Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 245-256
URL: http://www.aclweb.org/anthology/C/C14/C14-1025.pdf
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Algorithm; Automation; Computational linguistics; Reference work; Online; Semantics; Sense; Word
Abstract: In this paper, we present a machine learning approach for word sense alignment (WSA) which combines distances between senses in the graph representations of lexical-semantic resources with gloss similarities. In this way, we significantly outperform the state of the art on each of the four datasets we consider. Moreover, we present two novel datasets for WSA between Wiktionary and Wikipedia in English and German. The latter dataset is not only of unprecedented size, but also created by the large community of Wiktionary editors instead of expert annotators, making it an interesting subject of study in its own right as the first crowdsourced WSA dataset. We will make both datasets freely available along with our computed alignments. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Matuschek, Michael; Miller, Tristan; Gurevych, Iryna
Title: A language-independent sense clustering approach for enhanced WSD
From: Ruppenhofer, Josef; Faaß, Gertrud (Eds.): Proceedings of the 12th edition of the Konvens Conference, Hildesheim: Universitätsverlag Hildesheim, 2014, pp. 11-21
URL: http://nbn-resolving.de/urn:nbn:de:gbv:hil2-opus-2893
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Semantics; Sense; Linguistics; Word; Vocabulary
Abstract: We present a method for clustering word senses of a lexical-semantic resource by mapping them to those of another sense inventory. This is a promising way of reducing polysemy in sense inventories and consequently improving word sense disambiguation performance. In contrast to previous approaches, we use Dijkstra-WSA, a parameterizable alignment algorithm which is largely resource- and language-agnostic. To demonstrate this, we apply our technique to GermaNet, the German equivalent to WordNet. The GermaNet sense clusterings we induce through alignments to various collaboratively constructed resources achieve a significant boost in accuracy, even though our method is far less complex and less dependent on language-specific knowledge than past approaches. (DIPF/Orig.)
DIPF department: Information Center for Education
-
-
Authors: Remus, Steffen
Title: Unsupervised relation extraction of in-domain data from focused crawls
From: Association for Computational Linguistics (Ed.): Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics (ACL), Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 11-20
URL: http://aclweb.org/anthology//E/E14/E14-3002.pdf
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Semantics; Text analysis
Abstract: This thesis proposal approaches unsupervised relation extraction from web data, which is collected by crawling only those parts of the web that are from the same domain as a relatively small reference corpus. The first part of this proposal is concerned with the efficient discovery of web documents for a particular domain and in a particular language. We create a combined, focused web crawling system that automatically collects relevant documents and minimizes the amount of irrelevant web content. The collected web data is semantically processed in order to acquire rich in-domain knowledge. Here, we focus on fully unsupervised relation extraction by employing the extended distributional hypothesis. We use distributional similarities between two pairs of nominals based on dependency paths as context and vice versa for identifying relational structure. We apply our system to the domain of educational sciences by focusing primarily on crawling scientific educational publications on the web. We are able to produce promising initial results on relation identification and we will discuss future directions. (DIPF/Orig.)
DIPF department: Information Center for Education