Deutsches Institut für Internationale Pädagogische Forschung (DIPF)

Authors:
Erbs, Nicolai; Gurevych, Iryna; Rittberger, Marc

Title:
Bringing order to digital libraries: from keyphrase extraction to index term assignment

Source:
In: D-Lib Magazine, 19 (2013) 9

Full-text URL:
http://www.dlib.org/dlib/september13/erbs/09erbs.html

Language:
English

Document type:
3b. Articles in other journals; research-oriented

Keywords:
Automation, Education, Computational linguistics, Document, Digital library, Indexing, Classification, Full text


Abstract (original):
Collections of topically related documents held by digital libraries are valuable resources for users; however, as collections grow, it becomes more difficult to search them for specific information. Structure needs to be introduced to facilitate searching. Assigning index terms is helpful, but it is a tedious task even for professional indexers, requiring knowledge about the collection in general, and the document in particular. Automatic index term assignment (ITA) is considered to be a great improvement. In this paper we present a hybrid approach to index term assignment, using a combination of keyphrase extraction and multi-label classification. Keyphrase extraction efficiently assigns infrequently used index terms, while multi-label classification assigns frequently used index terms. We compare results to other state-of-the-art approaches for related tasks. The assigned index terms allow for a clustering of the document collection. Using hybrid and individual approaches, we evaluate a dataset consisting of German educational documents that was created by professional indexers, and is the first one with German data that allows estimating performance of ITA on languages other than English.
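
The hybrid idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy training data, the function names, the word-profile "classifier", and the frequency-based keyphrase ranking are all assumptions chosen to keep the example self-contained. It only shows the division of labour, where frequently used index terms come from a multi-label classifier and infrequently used ones come from keyphrase extraction.

```python
import re
from collections import Counter

# Hypothetical stopword list and toy training data (not from the paper).
STOPWORDS = {"and", "in", "the", "of", "for", "a", "to"}
TRAINING = [
    ("school reform and teacher training", {"Education"}),
    ("teacher shortage in rural school districts", {"Education"}),
    ("library catalog indexing rules", {"Indexing"}),
    ("automatic indexing of library records", {"Indexing", "Automation"}),
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def extract_keyphrases(text, top_k=2):
    """Keyphrase extraction stand-in: rank single-word candidates by frequency."""
    counts = Counter(tokenize(text))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:top_k])

def train_label_profiles(training, min_count=2):
    """Multi-label classification stand-in: one cue-word profile per frequent label."""
    label_counts = Counter(label for _, labels in training for label in labels)
    profiles = {}
    for label, n in label_counts.items():
        if n < min_count:
            continue  # too rare to learn from; left to keyphrase extraction
        words = Counter()
        for text, labels in training:
            if label in labels:
                words.update(set(tokenize(text)))
        # Keep only words that occur in every training document with this label.
        profiles[label] = {w for w, c in words.items() if c == n}
    return profiles

def classify(text, profiles):
    """Assign every label whose cue words all appear in the document."""
    tokens = set(tokenize(text))
    return {label for label, cue in profiles.items() if cue and cue <= tokens}

def assign_index_terms(text, profiles, top_k=2):
    """Hybrid assignment: union of classified labels and extracted keyphrases."""
    return classify(text, profiles) | extract_keyphrases(text, top_k)

profiles = train_label_profiles(TRAINING)
doc = "Teacher mentoring: mentoring programmes for new school staff"
print(sorted(assign_index_terms(doc, profiles, top_k=1)))
# prints ['Education', 'mentoring']
```

In this sketch "Education" never occurs in the document text and is assigned by the classifier, while "mentoring" is not a trained label and is picked up by extraction. A realistic system would replace both stand-ins with trained models and a proper candidate ranking.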


DIPF department:
Informationszentrum Bildung (Information Center for Education)

Last modified: Nov 11, 2016