Authors: Bär, Daniel; Zesch, Torsten; Gurevych, Iryna
Title: Text reuse detection using a composition of text similarity measures
In: Kay, Martin; Boitet, Christian (eds.): Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai: The COLING 2012 Organizing Committee, 2012, pp. 167-184
URL: http://www.aclweb.org/anthology/C/C12/C12-1011.pdf
Document type: 4. Contributions to edited volumes; conference proceedings/conference paper/proceedings
Language: English
Keywords: Comparison; Computer-aided method; Content; Detection; Measurement; Plagiarism; Structure; Text; Text analysis
Abstract: Detecting text reuse is a fundamental requirement for a variety of tasks and applications,
ranging from journalistic text reuse to plagiarism detection. Text reuse is traditionally detected
by computing similarity between a source text and a possibly reused text. However, existing text
similarity measures exhibit a major limitation: They compute similarity only on features which
can be derived from the content of the given texts, thereby inherently implying that any other
text characteristics are negligible. In this paper, we overcome this traditional limitation and
compute similarity along three characteristic dimensions inherent to texts: content, structure,
and style. We explore and discuss possible combinations of measures along these dimensions,
and our results demonstrate that the composition consistently outperforms previous approaches
on three standard evaluation datasets, and that text reuse detection greatly benefits from
incorporating a diverse feature set that reflects a wide variety of text characteristics.
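The abstract's core idea is to score a text pair along content, structure, and style, then combine those scores. A minimal Python sketch of that idea follows. The specific measures are assumptions chosen for brevity and are not the measures or classifier used in the paper: word bigram Jaccard stands in for content, Jaccard over bigrams of stopwords only stands in for structure, and type-token ratio closeness stands in for style. The stopword list, the helper names, and the equal-weight `composite_score` are also hypothetical.

```python
# Hypothetical stopword list used for illustration only; not taken from the paper.
STOPWORDS = frozenset({
    "the", "a", "an", "of", "to", "in", "on", "and", "or", "is", "was",
    "by", "for", "with", "it", "that", "as", "at", "from",
})


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def ngrams(tokens, n):
    """Set of contiguous n-grams (as tuples) over a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for an empty list."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def similarity_features(source, suspect, n=2):
    """One illustrative similarity score per dimension: content, structure, style."""
    s_tok, t_tok = tokenize(source), tokenize(suspect)
    # Content: overlap of word n-grams over the full token stream.
    content = jaccard(ngrams(s_tok, n), ngrams(t_tok, n))
    # Structure (assumed stand-in): overlap of n-grams built from stopwords only,
    # which ignores content words but keeps the order of function words.
    s_stop = [t for t in s_tok if t in STOPWORDS]
    t_stop = [t for t in t_tok if t in STOPWORDS]
    structure = jaccard(ngrams(s_stop, n), ngrams(t_stop, n))
    # Style (assumed stand-in): closeness of lexical richness via type-token ratio.
    style = 1.0 - abs(type_token_ratio(s_tok) - type_token_ratio(t_tok))
    return {"content": content, "structure": structure, "style": style}


def composite_score(features, weights=None):
    """Weighted mean of the per-dimension scores.

    Equal weights are a placeholder; a classifier trained on labeled
    text pairs would normally replace this step.
    """
    weights = weights or {k: 1.0 for k in features}
    total = sum(weights[k] for k in features)
    return sum(features[k] * weights[k] for k in features) / total
```

For example, `similarity_features(a, a)` for any sentence `a` that contains at least two stopwords returns 1.0 on every dimension. Keeping the three scores as separate features, rather than collapsing them early, is what lets a downstream learner weight whichever dimension best separates reused from independent text.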
DIPF department: Information Center for Education (Informationszentrum Bildung)