Publications database

Authors:
Ferschke, Oliver; Gurevych, Iryna; Rittberger, Marc

Title:
The impact of topic bias on quality flaw prediction in Wikipedia

Source:
In: Association for Computational Linguistics (ed.): 51st Annual Meeting of the Association for Computational Linguistics. Stroudsburg, Pa.: Association for Computational Linguistics (ACL) (2013), 721-730

Full-text URL:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.386.1725&rep=rep1&type=pdf#page=49pdf

Language:
English

Document type:
4. Contributions to edited volumes; conference volume/conference paper/proceedings

Keywords:
Algorithm, Computer-assisted method, Evaluation, Reference work, Online, Quality, Quality assurance, Reliability, Social software, Standard, World Wide Web 2.0


Abstract (original):
With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.


DIPF department:
Informationszentrum Bildung (Information Center for Education)

Notes:

last modified Nov 11, 2016