Authors: Erbs, Nicolai; Gurevych, Iryna; Zesch, Torsten
Title: Hierarchy identification for automatically generating table-of-contents
In: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov (eds.): Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP) 2013, Shoumen, Bulgaria: RANLP, 2013, pp. 252-260
URL: http://lml.bas.bg/ranlp2013/docs/RANLP_main.pdf
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Algorithm; Analysis; Content; Content analysis; Structure; Text
Abstract (English): A table-of-contents (TOC) provides a quick reference to a document's content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints e.g. from HTML-tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs on a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.
DIPF department: Informationszentrum Bildung (Information Center for Education)