Authors: Erbs, Nicolai; Gurevych, Iryna; Rittberger, Marc
Title: Bringing order to digital libraries. From keyphrase extraction to index term assignment
In: D-Lib Magazine, 19 (2013) 9
DOI: 10.1045/september2013-erbs
URL: http://www.dlib.org/dlib/september13/erbs/09erbs.html
Document type: 3b. Articles in other journals; research-oriented
Language: English
Keywords: Automation; Education; Computational linguistics; Document; Digital library; Indexing; Classification; Full text
Abstract: Collections of topically related documents held by digital libraries are valuable resources for users; however, as collections grow, it becomes more difficult to search them for specific information. Structure needs to be introduced to facilitate searching. Assigning index terms is helpful, but it is a tedious task even for professional indexers, requiring knowledge about the collection in general, and the document in particular. Automatic index term assignment (ITA) is considered to be a great improvement. In this paper we present a hybrid approach to index term assignment, using a combination of keyphrase extraction and multi-label classification. Keyphrase extraction efficiently assigns infrequently used index terms, while multi-label classification assigns frequently used index terms. We compare results to other state-of-the-art approaches for related tasks. The assigned index terms allow for a clustering of the document collection. Using hybrid and individual approaches, we evaluate a dataset consisting of German educational documents that was created by professional indexers, and is the first one with German data that allows estimating performance of ITA on languages other than English.
DIPF department: Informationszentrum Bildung (Information Center for Education)