Authors: Remus, Steffen
Title: Unsupervised relation extraction of in-domain data from focused crawls
In: Association for Computational Linguistics (ed.): Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics (ACL), Stroudsburg, PA: Association for Computational Linguistics, 2014, pp. 11-20
URL: http://aclweb.org/anthology//E/E14/E14-3002.pdf
Document type: 4. Contributions to edited volumes; conference volume/conference paper/proceedings
Language: English
Keywords: Computational linguistics; Semantics; Text analysis
Abstract: This thesis proposal approaches unsupervised relation extraction from web data, which is collected by crawling only those parts of the web that are from the same domain as a relatively small reference corpus. The first part of this proposal is concerned with the efficient discovery of web documents for a particular domain and in a particular language. We create a combined, focused web crawling system that automatically collects relevant documents and minimizes the amount of irrelevant web content. The collected web data is semantically processed in order to acquire rich in-domain knowledge. Here, we focus on fully unsupervised relation extraction by employing the extended distributional hypothesis. We use distributional similarities between two pairs of nominals based on dependency paths as context and vice versa for identifying relational structure. We apply our system for the domain of educational sciences by focusing primarily on crawling scientific educational publications in the web. We are able to produce promising initial results on relation identification and we will discuss future directions. (DIPF/Orig.)
DIPF department: Informationszentrum Bildung