Search results in the DIPF database of publications
Your query:
(Keywords: "Computerlinguistik" [computational linguistics])
110 items matching your search terms.
Medical concept embeddings via labeled background corpora
Mencía, Eneldo Loza; De Melo, Gerard; Nam, Jinseok
Book Chapter
| From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016
Author(s):
Mencía, Eneldo Loza; De Melo, Gerard; Nam, Jinseok
Title:
Medical concept embeddings via labeled background corpora
In:
European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 3629-3636
URL:
http://www.lrec-conf.org/proceedings/lrec2016/pdf/1190_Paper.pdf
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Algorithm; Automation; Computational linguistics; Medicine; Semantics; Language; Text analysis
Abstract:
In recent years, we have seen an increasing amount of interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The most well-known examples of algorithms to produce representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any systems relying on medical concept similarity measures. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Enriching wikidata with frame semantics
Mousselly-Sergieh, Hatem; Gurevych, Iryna
Book Chapter
| From: Association for Computational Linguistics (ed.): Proceedings of the 5th workshop on automated knowledge base construction (AKBC) 2016 held in conjunction with NAACL 2016 | Stroudsburg, PA: Association for Computational Linguistics | 2016
Author(s):
Mousselly-Sergieh, Hatem; Gurevych, Iryna
Title:
Enriching wikidata with frame semantics
In:
Association for Computational Linguistics (ed.): Proceedings of the 5th workshop on automated knowledge base construction (AKBC) 2016 held in conjunction with NAACL 2016, Stroudsburg, PA: Association for Computational Linguistics, 2016, pp. 29-34
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/2016_NAACL_AKBC_HMS.pdf
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Lexicon; Multilingualism; Online; Semantics
Abstract (English):
Wikidata is a large-scale, multilingual and freely available knowledge base. It contains more than 14 million facts; however, it is still missing linguistic information. In this paper, we aim to bridge this gap by aligning Wikidata with the FrameNet lexicon. We propose an approach based on word embeddings to identify a mapping between Wikidata relations, called properties, and FrameNet frames and to annotate the arguments of each relation with the semantic roles of the matching frames. Early empirical results show the advantage of our approach compared to other baseline methods. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Domain-specific corpus expansion with focused webcrawling
Remus, Steffen; Biemann, Chris
Book Chapter
| From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016
Author(s):
Remus, Steffen; Biemann, Chris
Title:
Domain-specific corpus expansion with focused webcrawling
In:
European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 3607-3611
URL:
http://www.lrec-conf.org/proceedings/lrec2016/pdf/316_Paper.pdf
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Algorithm; Automation; Education; Computational linguistics; Data mining; Hypertext; Model; Language; Text; Text analysis
Abstract:
This work presents a straightforward method for extending or creating in-domain web corpora by focused webcrawling. The focused webcrawler uses statistical N-gram language models to estimate the relatedness of documents and weblinks and needs as input only N-grams or plain texts of a predefined domain and seed URLs as starting points. Two experiments demonstrate that our focused crawler is able to stay focused in domain and language. The first experiment shows that the crawler stays in a focused domain; the second experiment demonstrates that language models trained on focused crawls obtain better perplexity scores on in-domain corpora. We distribute the focused crawler as open source software. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Crowdsourcing a large dataset of domain-specific context-sensitive semantic verb relations
Sukhareva, Maria; Eckle-Kohler, Judith; Habernal, Ivan; Gurevych, Iryna
Book Chapter
| From: European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) | Portoroz: European Language Resources Association | 2016
Author(s):
Sukhareva, Maria; Eckle-Kohler, Judith; Habernal, Ivan; Gurevych, Iryna
Title:
Crowdsourcing a large dataset of domain-specific context-sensitive semantic verb relations
In:
European Language Resources Association (ed.): Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portoroz: European Language Resources Association, 2016, pp. 2131-2137
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2016/lrec2016_sukhareva.pdf
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Data mining; Classification; Semantics; Text analysis
Abstract (English):
We present a new large dataset of 12403 context-sensitive verb relations manually annotated via crowdsourcing. These relations capture fine-grained semantic information between verb-centric propositions, such as temporal or entailment relations. We propose a novel semantic verb relation scheme and design a multi-step annotation approach for scaling up the annotations using crowdsourcing. We employ several quality measures and report on agreement scores. The resulting dataset is available under a permissive Creative Commons license. It represents a valuable resource for various applications, such as automatic information consolidation or automatic summarization. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Using semantic similarity for multi-label zero-shot classification of text documents
Veeranna, Sappadla Prateek; Nam, Jinseok; Mencía, Eneldo Loza; Fürnkranz, Johannes
Book Chapter
| From: European Symposium on Artificial Neural Networks (ed.): ESANN 2016 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium), 27-29 April 2016 | Bruges: European Symposium on Artificial Neural Networks | 2016
Author(s):
Veeranna, Sappadla Prateek; Nam, Jinseok; Mencía, Eneldo Loza; Fürnkranz, Johannes
Title:
Using semantic similarity for multi-label zero-shot classification of text documents
In:
European Symposium on Artificial Neural Networks (ed.): ESANN 2016 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium), 27-29 April 2016, Bruges: European Symposium on Artificial Neural Networks, 2016, pp. 423-428
URL:
https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-174.pdf
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Computational linguistics; Classification; Semantics; Text
Abstract (English):
In this paper, we examine a simple approach to zero-shot multi-label text classification, i.e., to the problem of predicting multiple, possibly previously unseen labels for a document. In particular, we propose to use a semantic embedding of label and document words and base the prediction of previously unseen labels on the similarity between the label name and the document words in this embedding. Experiments on three textual datasets across various domains show that even such a simple technique yields considerable performance improvements over a simple uninformed baseline. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
The eras and trends of automatic short answer grading
Burrows, Steven; Gurevych, Iryna; Stein, Benno
Journal Article
| In: International Journal of Artificial Intelligence in Education | 2015
Author(s):
Burrows, Steven; Gurevych, Iryna; Stein, Benno
Title:
The eras and trends of automatic short answer grading
In:
International Journal of Artificial Intelligence in Education, 25 (2015) 1, pp. 60-117
DOI:
10.1007/s40593-014-0026-8
URL:
http://link.springer.com/article/10.1007/s40593-014-0026-8
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Automation; Computational linguistics; Computer-assisted method; Question; Performance assessment; Method; Grading; Technology-based testing; Test item
Abstract:
Automatic short answer grading (ASAG) is the task of assessing short natural language responses to objective questions using computational methods. The active research in this field has increased enormously of late with over 80 papers fitting a definition of ASAG. However, the past efforts have generally been ad-hoc and non-comparable until recently, hence the need for a unified view of the whole field. The goal of this paper is to address this aim with a comprehensive review of ASAG research and systems according to history and components. Our historical analysis identifies 35 ASAG systems within 5 temporal themes that mark advancement in methodology or evaluation. In contrast, our component analysis reviews 6 common dimensions from preprocessing to effectiveness. A key conclusion is that an era of evaluation is the newest trend in ASAG research, which is paving the way for the consolidation of the field. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Analyzing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
Flekova, Lucie; Ruppert, Eugen; Preotiuc-Pietro, Daniel
Book Chapter
| From: Association for Computational Linguistics (ed.): 6th workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015): Workshop proceedings, 17 September 2015, Lisboa, Portugal | Red Hook, NY: Association for Computational Linguistics | 2015
Author(s):
Flekova, Lucie; Ruppert, Eugen; Preotiuc-Pietro, Daniel
Title:
Analyzing domain suitability of a sentiment lexicon by identifying distributionally bipolar words
In:
Association for Computational Linguistics (ed.): 6th workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015): Workshop proceedings, 17 September 2015, Lisboa, Portugal, Red Hook, NY: Association for Computational Linguistics, 2015, pp. 77-84
URL:
http://www.emnlp2015.org/proceedings/WASSA/WASSA-2015.pdf#page=89
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Emotion; Communication; Lexicography; Lexicon; Online; Quality; Social software; Text analysis; Thesaurus
Abstract:
Contemporary sentiment analysis approaches rely on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information and leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon-based sentiment detection and can be further used to quantify ambiguous words. (DIPF/Orig.)
Linking the thoughts. Analysis of argumentation structures in scientific publications
Kirschner, Christian; Eckle-Kohler, Judith; Gurevych, Iryna
Book Chapter
| From: Association for Computational Linguistics (ed.): Proceedings of the 2nd Workshop on Argumentation Mining held in conjunction with the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015) | Denver, CO: Association for Computational Linguistics | 2015
Author(s):
Kirschner, Christian; Eckle-Kohler, Judith; Gurevych, Iryna
Title:
Linking the thoughts. Analysis of argumentation structures in scientific publications
In:
Association for Computational Linguistics (ed.): Proceedings of the 2nd Workshop on Argumentation Mining held in conjunction with the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), Denver, CO: Association for Computational Linguistics, 2015, pp. 1-11
URL:
https://aclweb.org/anthology/W/W15/W15-05.pdf
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Argumentation; Automation; Educational research; Computational linguistics; Data mining; Classification; Text analysis; Publication
Abstract:
This paper presents the results of an annotation study focused on the fine-grained analysis of argumentation structures in scientific publications. Our new annotation scheme specifies four types of binary argumentative relations between sentences, resulting in the representation of arguments as small graph structures. We developed an annotation tool that supports the annotation of such graphs and carried out an annotation study with four annotators on 24 scientific articles from the domain of educational research. For calculating the inter-annotator agreement, we adapted existing measures and developed a novel graph-based agreement measure which reflects the semantic similarity of different annotation graphs. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Constructive feedback, thinking process and cooperation. Assessing the quality of classroom […]
Sousa, Tahir; Flekova, Lucie; Mieskes, Margot; Gurevych, Iryna
Book Chapter
| From: Möller, Sebastian (ed.): Proceedings of the Interspeech 2015 Conference Dresden | Berlin: Technische Universität | 2015
Author(s):
Sousa, Tahir; Flekova, Lucie; Mieskes, Margot; Gurevych, Iryna
Title:
Constructive feedback, thinking process and cooperation. Assessing the quality of classroom interaction
In:
Möller, Sebastian (ed.): Proceedings of the Interspeech 2015 Conference Dresden, Berlin: Technische Universität, 2015, pp. 2739-2743
Publication Type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Computational linguistics; Data analysis; Thinking; Germany; Discourse analysis; Feedback; Interaction analysis; Classification; Cooperation; Mathematics instruction; Quality; School class; Switzerland; Semantics; Social interaction; Language analysis; Classroom research; Video
Abstract:
Analyzing and assessing the quality of classroom lessons on a range of quality dimensions is a number one educational research topic, as this allows developing teacher trainings and interventions to improve lesson quality. We model this assessment as a text classification task, exploiting linguistic features to predict the scores in several lesson quality dimensions relevant for educational researchers. Our work relies on a variety of phenomena, amongst them paralinguistic features, such as laughter, from real classroom interactions. We used these features to train machine learning models to assess various quality dimensions of school lessons. Our results show that features focusing on discourse and semantics are especially beneficial for this classification task. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung
Predicting the difficulty of language proficiency tests
Beinborn, Lisa; Zesch, Torsten; Gurevych, Iryna
Journal Article
| In: Transactions of the Association for Computational Linguistics | 2014
Author(s):
Beinborn, Lisa; Zesch, Torsten; Gurevych, Iryna
Title:
Predicting the difficulty of language proficiency tests
In:
Transactions of the Association for Computational Linguistics, 2 (2014), pp. 517-529
URL:
http://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/414/88
Publication Type:
3a. Articles in peer-reviewed journals; article (no special category)
Language:
English
Keywords:
Computational linguistics; Data analysis; Germany; Foreign language; Knowledge; Learning success; Prediction; Difficulty; Language proficiency; Language test; Student; Procedure
Abstract:
Language proficiency tests are used to evaluate and compare the progress of language learners. We present an approach for automatic difficulty prediction of C-tests that performs on par with human experts. On the basis of a detailed analysis of newly collected data, we develop a model for C-test difficulty introducing four dimensions: solution difficulty, candidate ambiguity, inter-gap dependency, and paragraph difficulty. We show that cues from all four dimensions contribute to C-test difficulty. (DIPF/Orig.)
DIPF-Departments:
Informationszentrum Bildung