Search results from the DIPF publication database
Your query: (Keywords: "Synonym")
3 results found
Lexical substitution dataset for German
Record no. 34575
Authors:
Cholakov, Kostadin; Biemann, Chris; Eckle-Kohler, Judith; Gurevych, Iryna
Titel:
Lexical substitution dataset for German
In:
Calzolari, Nicoletta; Choukri, Khalid; Declerck, Thierry; Loftsson, Hrafn; Maegaard, Bente; Mariani, Joseph; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios (Eds.): Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik: European Language Resources Association, 2014, pp. 1406-1411
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/545_Paper.pdf
Document type:
Contribution to an edited volume; conference proceedings
Language:
English
Keywords:
Computational linguistics; Computer-assisted methods; Data; German; Reference work; Online; Language analysis; Synonym; Text analysis; Web 2.0; Word
Abstract:
This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences have been annotated by 4 professional annotators and the remaining sentences by 1 professional annotator and 5 additional annotators who have been recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems, but also different sense inventories and word sense disambiguation systems.
DIPF department:
Informationszentrum Bildung
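The abstract above notes that the dataset records substitutes proposed by several annotators per target word, and that it can be used to evaluate lexical substitution systems. A minimal sketch of how such an evaluation could score a system's guesses, in the spirit of the standard LexSub "best" metric; the data layout (a dict mapping each gold substitute to its annotator count) is an assumption for illustration, not the published file format:

```python
# Hedged sketch: scoring a system's guesses for one sentence against
# the annotator-provided gold substitutes. Credit is shared across the
# system's guesses and normalized by the total annotator mass.

def best_score(guesses, gold_counts):
    """guesses: substitutes proposed by the system for one target.
    gold_counts: dict mapping gold substitute -> number of annotators
    who proposed it."""
    if not guesses:
        return 0.0
    total = sum(gold_counts.values())
    hit = sum(gold_counts.get(g, 0) for g in guesses)
    return hit / (len(guesses) * total)

# Example: 4 annotators suggested "Auto" (3x) and "Wagen" (1x).
gold = {"Auto": 3, "Wagen": 1}
print(best_score(["Auto"], gold))           # 0.75
print(best_score(["Auto", "Wagen"], gold))  # (3+1)/(2*4) = 0.5
```

Averaging this score over all annotated sentences gives a single figure of merit for a substitution system on the dataset.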
Supervised all-words lexical substitution using delexicalized features
Record no. 33528
Authors:
Szarvas, György; Biemann, Chris; Gurevych, Iryna
Titel:
Supervised all-words lexical substitution using delexicalized features
In:
Association for Computational Linguistics (Ed.): Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Stroudsburg, PA: Association for Computational Linguistics, 2013, pp. 1131-1141
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/SzarvasBiemannGurevych_naaclhlt2013.pdf
Document type:
Contribution to an edited volume; conference proceedings
Language:
English
Keywords:
Automation; Computational linguistics; Information retrieval; Method; Model; Sense; Synonym; Text analysis; Thesaurus; Technique; Word
Abstract:
We propose a supervised lexical substitution system that does not use separate classifiers per word and is therefore applicable to any word in the vocabulary. Instead of learning word-specific substitution patterns, a global model for lexical substitution is trained on delexicalized (i.e., non-lexical) features, which makes it possible to exploit the power of supervised methods while generalizing beyond the target words in the training set. This way, our approach remains technically straightforward and provides better performance and similar coverage in comparison to unsupervised approaches. Using features from lexical resources, as well as a variety of features computed from large corpora (n-gram counts, distributional similarity) and a ranking method based on the posterior probabilities obtained from a Maximum Entropy classifier, we improve over the state of the art in the LexSub Best-Precision metric and the Generalized Average Precision measure. The robustness of our approach is demonstrated by evaluating it successfully on two different datasets.
DIPF department:
Informationszentrum Bildung
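The key idea in the abstract above is that the feature vector for a (target, candidate) pair contains no word identities, only word-independent scores, so a single global ranker applies to any target word. A minimal sketch of that idea; the specific features, their values, and the fixed weights (standing in for a trained Maximum Entropy model) are illustrative assumptions, not the paper's actual feature set:

```python
# Hedged sketch of delexicalized ranking: every candidate is described
# only by word-independent scores, and one global model ranks them.
import math

def delex_features(ngram_ratio, dist_sim, in_thesaurus):
    # ngram_ratio: corpus frequency of the context with the candidate
    #   substituted in, relative to the original (illustrative).
    # dist_sim: distributional similarity between target and candidate.
    # in_thesaurus: whether a lexical resource lists the pair as synonyms.
    return [math.log(ngram_ratio + 1e-9), dist_sim, float(in_thesaurus)]

def rank_candidates(candidates, weights=(0.5, 2.0, 1.0)):
    # candidates: dict candidate -> (ngram_ratio, dist_sim, in_thesaurus).
    # A trained Maximum Entropy model would supply the weights; fixed
    # ones stand in for it here.
    def score(feats):
        z = sum(w * f for w, f in zip(weights, feats))
        return 1.0 / (1.0 + math.exp(-z))  # posterior-style probability
    scored = {c: score(delex_features(*v)) for c, v in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Candidates for the target "smart" in some context (made-up scores).
cands = {"bright": (0.8, 0.7, True),
         "intelligent": (0.3, 0.9, True),
         "shiny": (0.5, 0.2, False)}
print(rank_candidates(cands))  # ['bright', 'intelligent', 'shiny']
```

Because no feature names a specific word, the same weights rank candidates for targets never seen during training, which is what lets the supervised model cover the whole vocabulary.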
Uncertainty detection for natural language watermarking
Record no. 34038
Authors:
Szarvas, György; Gurevych, Iryna
Titel:
Uncertainty detection for natural language watermarking
In:
Mitkov, Ruslan; Park, Jong C. (Eds.): Proceedings of the Sixth International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya: Asian Federation of Natural Language Processing, 2013, pp. 1188-1194
URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/IJCNLP_2013_Szarvas.pdf
Document type:
Contribution to an edited volume; conference proceedings
Language:
English
Keywords:
Algorithm; Computational linguistics; Data; Information; Synonym; Text; Modification; Word
Abstract:
In this paper we investigate the application of uncertainty detection to text watermarking, a problem where the aim is to produce individually identifiable copies of a source text via small manipulations of the text (e.g., synonym substitutions). As previous attempts have shown, accurate paraphrasing is challenging in an open-vocabulary setting, so we propose the use of the closed word class of uncertainty cues. We demonstrate that these words are promising for text watermarking, as they can be accurately disambiguated from the non-cue uses of the same words, and their substitution with other cues has marginal impact on the meaning of the text.
DIPF department:
Informationszentrum Bildung
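The abstract above rests on the observation that an uncertainty cue can be swapped for a near-equivalent cue with little change in meaning, so each cue occurrence can carry information identifying a copy. A minimal sketch of that mechanism; the cue inventory and the one-bit-per-occurrence scheme are illustrative assumptions, not the paper's actual encoding:

```python
# Hedged sketch: embed a bit string into a text by choosing, at each
# uncertainty-cue position, which member of an interchangeable cue
# pair to use; extraction reads the choices back.

CUE_PAIRS = {  # cue -> (form for bit 0, form for bit 1)
    "perhaps": ("perhaps", "possibly"),
    "possibly": ("perhaps", "possibly"),
    "likely": ("likely", "probably"),
    "probably": ("likely", "probably"),
}

def embed(tokens, bits):
    """Rewrite cue occurrences so that they spell out `bits`."""
    out, i = [], 0
    for tok in tokens:
        if tok in CUE_PAIRS and i < len(bits):
            out.append(CUE_PAIRS[tok][bits[i]])
            i += 1
        else:
            out.append(tok)
    return out

def extract(tokens):
    """Read the embedded bits back from a marked copy."""
    return [CUE_PAIRS[t].index(t) for t in tokens if t in CUE_PAIRS]

text = "it will perhaps rain and temperatures will likely drop".split()
marked = embed(text, [1, 0])
print(" ".join(marked))  # it will possibly rain and temperatures will likely drop
print(extract(marked))   # [1, 0]
```

Giving each distributed copy a distinct bit string makes a leaked copy traceable; the paper's point is that restricting substitutions to the closed class of uncertainty cues keeps such rewrites accurate, unlike open-vocabulary synonym substitution.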