Search results in the DIPF database of publications
Your query:
(Keywords: "Synonym")
3 items matching your search terms.
Lexical substitution dataset for German
Cholakov, Kostadin; Biemann, Chris; Eckle-Kohler, Judith; Gurevych, Iryna
Book Chapter
In: Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Loftsson, Hrafn; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Eds.): Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik: European Language Resources Association, 2014
Author(s):
Cholakov, Kostadin; Biemann, Chris; Eckle-Kohler, Judith; Gurevych, Iryna
Title:
Lexical substitution dataset for German
In:
Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Loftsson, Hrafn; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Eds.): Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavik: European Language Resources Association, 2014, pp. 1406-1411
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/545_Paper.pdf
Publication Type:
4. Contributions in edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Computational linguistics; Computer-assisted method; Data; German; Reference work; Online; Language analysis; Synonym; Text analysis; World Wide Web 2.0; Word
Abstract:
This article describes a lexical substitution dataset for German. The dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. The 51 target nouns, 51 adjectives, and 51 verbs were randomly selected from three frequency groups based on the lemma frequency list of the German WaCky corpus. 200 sentences were annotated by four professional annotators; the remaining sentences were annotated by one professional annotator and five additional annotators recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems but also different sense inventories and word sense disambiguation systems.
DIPF-Departments:
Informationszentrum Bildung
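The annotation scheme the abstract describes (several annotators each proposing substitutes for a marked target word, pooled into a ranked gold standard) can be sketched as a small data structure. The class and field names below are illustrative, not the dataset's actual release format:

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical representation of one lexical substitution item: a sentence,
# a marked target word with its part of speech, and one list of proposed
# substitutes per annotator.
@dataclass
class LexSubItem:
    sentence: str
    target: str
    pos: str  # "noun", "adjective", or "verb"
    annotations: list = field(default_factory=list)

    def gold_substitutes(self):
        """Pool all annotators' substitutes, ranked by how many annotators proposed each."""
        counts = Counter(s for ann in self.annotations for s in set(ann))
        return counts.most_common()

item = LexSubItem(
    sentence="Das alte Haus wurde renoviert.",
    target="alte",
    pos="adjective",
    annotations=[["betagte", "historische"], ["historische"], ["historische", "frühere"]],
)
print(item.gold_substitutes())  # "historische" ranks first with 3 votes
```

Ranking substitutes by annotator agreement is the convention used in lexical substitution evaluation, where substitutes proposed by more annotators receive more credit.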
Supervised all-words lexical substitution using delexicalized features
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Book Chapter
In: Association for Computational Linguistics (Eds.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). Stroudsburg, PA: Association for Computational Linguistics, 2013
Author(s):
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Title:
Supervised all-words lexical substitution using delexicalized features
In:
Association for Computational Linguistics (Eds.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT). Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 1131-1141
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/SzarvasBiemannGurevych_naaclhlt2013.pdf
Publication Type:
4. Contributions in edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Information retrieval; Method; Model; Sense; Synonym; Text analysis; Thesaurus; Procedure; Word
Abstract:
We propose a supervised lexical substitution system that does not use separate classifiers per word and is therefore applicable to any word in the vocabulary. Instead of learning word-specific substitution patterns, a global model for lexical substitution is trained on delexicalized (i.e., non-lexical) features, which makes it possible to exploit the power of supervised methods while generalizing beyond the target words in the training set. This way, our approach remains technically straightforward while providing better performance and similar coverage compared to unsupervised approaches. Using features from lexical resources, as well as a variety of features computed from large corpora (n-gram counts, distributional similarity) and a ranking method based on the posterior probabilities obtained from a maximum entropy classifier, we improve over the state of the art in the LexSub Best-Precision metric and the Generalized Average Precision measure. The robustness of our approach is demonstrated by evaluating it successfully on two different datasets.
DIPF-Departments:
Informationszentrum Bildung
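The key idea in the abstract — delexicalized features describe the target-substitute pair (frequency ratios, context fit, distributional similarity) rather than the words themselves, and candidates are ranked by the posterior of a log-linear model — can be illustrated with a toy sketch. All feature values and weights below are invented for illustration; the actual system learns its weights from annotated data:

```python
import math

def delex_features(target_freq, cand_freq, ngram_ratio, dist_sim):
    """Delexicalized features: none of them mention the words themselves.

    - log frequency ratio of candidate to target
    - how well the candidate fits the n-gram context (0..1)
    - distributional similarity between target and candidate (0..1)
    """
    return [math.log(cand_freq / target_freq), ngram_ratio, dist_sim]

def posterior(features, weights, bias=0.0):
    """Logistic (maximum-entropy-style) score used to rank candidates."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.2, 1.5, 2.0]  # invented weights; a real model estimates these
candidates = {
    "bright": delex_features(1000, 800, 0.7, 0.9),
    "smart":  delex_features(1000, 1200, 0.4, 0.6),
}
ranking = sorted(candidates, key=lambda c: posterior(candidates[c], weights), reverse=True)
print(ranking)  # prints ['bright', 'smart']
```

Because the features carry no lexical identity, the same weight vector scores substitutes for any target word, which is what lets one global model replace per-word classifiers.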
Uncertainty detection for natural language watermarking
Szarvas, György; Gurevych, Iryna
Book Chapter
In: Mitkov, Ruslan; Park, Jong C. (Eds.): Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP 2013). Nagoya: Asian Federation of Natural Language Processing, 2013
Author(s):
Szarvas, György; Gurevych, Iryna
Title:
Uncertainty detection for natural language watermarking
In:
Mitkov, Ruslan; Park, Jong C. (Eds.): Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP 2013). Nagoya: Asian Federation of Natural Language Processing, 2013, pp. 1188-1194
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/IJCNLP_2013_Szarvas.pdf
Publication Type:
4. Contributions in edited volumes; conference volume/conference paper/proceedings
Language:
English
Keywords:
Algorithm; Computational linguistics; Data; Information; Synonym; Text; Change; Word
Abstract:
In this paper, we investigate the application of uncertainty detection to text watermarking, a problem where the aim is to produce individually identifiable copies of a source text via small manipulations of the text (e.g., synonym substitutions). As previous attempts have shown, accurate paraphrasing is challenging in an open-vocabulary setting, so we propose using the closed word class of uncertainty cues instead. We demonstrate that these words are promising for text watermarking, as they can be accurately disambiguated from the non-cue uses of the same words, and their substitution with other cues has marginal impact on the meaning of the text.
DIPF-Departments:
Informationszentrum Bildung
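The core mechanism the abstract describes — making each copy of a text identifiable by choosing between interchangeable uncertainty cues — amounts to encoding a bit string into the text. A minimal sketch, with cue pairs that are illustrative examples rather than the inventory used in the paper:

```python
# Each cue slot in the text encodes one bit by selecting between two
# near-synonymous hedges; swapping them barely changes the meaning.
CUE_PAIRS = [("possibly", "perhaps"), ("suggests", "indicates")]

def embed(template_slots, bits):
    """Fill each cue slot with the variant selected by the corresponding bit."""
    return [CUE_PAIRS[i][bit] for i, bit in zip(template_slots, bits)]

def extract(filled, template_slots):
    """Recover the watermark bits by checking which variant occupies each slot."""
    return [CUE_PAIRS[i].index(word) for i, word in zip(template_slots, filled)]

slots = [0, 1]   # which cue pair governs each slot in the text
bits = [1, 0]    # watermark payload for this particular copy
words = embed(slots, bits)
print(words)     # prints ['perhaps', 'suggests']
assert extract(words, slots) == bits
```

With n cue slots, 2^n distinct copies can be produced, which is why reliably detecting cue (versus non-cue) uses of these words is the prerequisite the paper focuses on.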