Deutsches Institut für Internationale Pädagogische Forschung

Publications database

Authors:
Erbs, Nicolai; Gurevych, Iryna; Zesch, Torsten

Title:
Hierarchy identification for automatically generating table-of-contents

Source:
In: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov (eds.): Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP) 2013. Shoumen, Bulgaria: RANLP (2013), 252-260

Full-text URL:
http://lml.bas.bg/ranlp2013/docs/RANLP_main.pdf

Language:
English

Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings

Keywords:
Algorithm, Analysis, Content, Content analysis, Structure, Text


Abstract (English):
A table-of-contents (TOC) provides a quick reference to a document's content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g., from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs at a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.


last modified Nov 11, 2016