Author(s): Jamison, Emily; Gurevych, Iryna
Title: Headerless, quoteless, but not hopeless? Using pairwise email classification to disentangle email threads
In: Angelova, Galia; Bontcheva, Kalina; Mitkov, Ruslan (eds.): Proceedings of the 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013), Shoumen: INCOMA Ltd., 2013, pp. 327-335
URL: https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/RANLP_2013_EJIG_Camera.pdf
Publication Type: 4. Contributions to edited volumes; Conference proceedings/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Data analysis; Email; Content; Classification; Semantics; Structure; Text
Abstract: Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balanced and a class-imbalanced setting, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
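The abstract frames disentanglement as pairwise classification: for each pair of emails, decide whether they belong to the same thread, using text similarity computed only on the non-quoted text. The sketch below illustrates that framing under stated assumptions and is not the authors' system. It assumes quoted lines start with ">", uses a single bag-of-words cosine similarity as a stand-in "content" feature, and replaces the trained classifier with a hypothetical fixed threshold (0.3); the function names are likewise hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def non_quoted_text(email_body: str) -> str:
    """Keep only non-quoted lines (assumption: quoted lines start with '>')."""
    return "\n".join(
        line for line in email_body.splitlines()
        if not line.lstrip().startswith(">")
    )


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a deliberately simple content representation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def same_thread(email_a: str, email_b: str, threshold: float = 0.3) -> bool:
    """Hypothetical pairwise decision: one similarity feature vs. a fixed
    threshold, standing in for a trained classifier over many features."""
    sim = cosine_similarity(
        bag_of_words(non_quoted_text(email_a)),
        bag_of_words(non_quoted_text(email_b)),
    )
    return sim >= threshold


def classify_pairs(emails: list[str], threshold: float = 0.3) -> dict[tuple[int, int], bool]:
    """Label every unordered pair of emails as same-thread (True) or not (False)."""
    return {
        (i, j): same_thread(emails[i], emails[j], threshold)
        for i, j in combinations(range(len(emails)), 2)
    }


emails = [
    "Can we move the gas pipeline meeting to Friday?",
    "Friday works for the pipeline meeting.\n> Can we move the gas pipeline meeting to Friday?",
    "Reminder: the holiday party is next week.",
]
print(classify_pairs(emails))
```

Stripping quoted text first matters here: without it, the reply in the example would trivially match the original message through the copied quote, which is exactly the signal that is missing in the "headerless, quoteless" setting the title describes.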
DIPF Departments: Informationszentrum Bildung (Information Center for Education)