DIPF database of publications

Authors:
Jamison, Emily; Gurevych, Iryna

Title:
Headerless, quoteless, but not hopeless?
Using pairwise email classification to disentangle email threads

Source:
In: Angelova, Galia; Bontcheva, Kalina; Mitkov, Ruslan (Eds.): Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013). Shoumen: INCOMA Ltd. (2013), 327-335

URL of full text:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2013/RANLP_2013_EJIG_Camera.pdf

Language:
English

Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings

Keywords:
Computational linguistics, Data analysis, Email, Content, Classification, Semantics, Structure, Text


Abstract (original):
Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balanced and class-imbalanced setting, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly-extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
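
Illustrative sketch (not part of the original record): the abstract describes deciding, for each pair of emails, whether the two belong to the same thread based on the similarity of their non-quoted text. The Python below is a minimal sketch of that general idea only, not the authors' system. It assumes a single content feature (bag-of-words cosine similarity), a fixed threshold in place of a trained classifier, and a simple merge step that groups emails connected by positive pairs. The function names, the threshold value of 0.3, and the sample emails are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased token counts; a deliberately simple content representation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_thread(email_a, email_b, threshold=0.3):
    """Hypothetical pairwise decision: one feature and one fixed threshold
    stand in for a trained classifier over many similarity features."""
    score = cosine_similarity(bag_of_words(email_a), bag_of_words(email_b))
    return score >= threshold


def disentangle(emails, threshold=0.3):
    """Group email indices by merging every pair judged to share a thread."""
    parent = list(range(len(emails)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(emails)), 2):
        if same_thread(emails[i], emails[j], threshold):
            parent[find(i)] = find(j)

    threads = {}
    for i in range(len(emails)):
        threads.setdefault(find(i), []).append(i)
    return sorted(threads.values())


# Hypothetical email bodies with headers and quoted text already removed.
emails = [
    "Can we move the pipeline budget meeting to Thursday?",
    "Lunch order for Friday: pizza or sandwiches?",
    "Thursday works for the pipeline budget meeting.",
    "Pizza for Friday lunch, please.",
]
print(disentangle(emails))
```

With these four sample messages, the two scheduling emails and the two lunch emails each share enough vocabulary to be paired, while the cross-topic pairs overlap only on the word "for" and stay below the threshold. A fixed threshold is the simplest possible stand-in; style and structure features, class imbalance, and corpus-level semantic similarity, which the abstract discusses, are not modeled here.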


DIPF Departments:
Information Center for Education (Informationszentrum Bildung)

Notes: