Deutsches Institut für Internationale Pädagogische Forschung

Authors:
Jamison, Emily; Gurevych, Iryna

Title:
Headerless, quoteless, but not hopeless?
Using pairwise email classification to disentangle email threads

Source:
In: Angelova, Galia; Bontcheva, Kalina; Mitkov, Ruslan (eds.): Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013). Shoumen: INCOMA Ltd. (2013), 327-335

Full-text URL:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/RANLP_2013_EJIG_Camera.pdf

Language:
English

Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings

Keywords:
Computational linguistics, data analysis, email, content, classification, semantics, structure, text


Abstract (original):
Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balanced and class-imbalanced setting, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly-extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.


DIPF department:
Informationszentrum Bildung

Last modified: 11.11.2016