Author(s): Flekova, Lucie; Gurevych, Iryna
Title: Can we hide in the web? Large scale simultaneous age and gender author profiling in social media. Notebook for PAN at CLEF 2013
In: Forner, Pamela; Navigli, Roberto; Tufis, Dan (eds.): CLEF 2013 Labs and Workshops: Online working notes, Padua: PROMISE, 2013, pp. 1-11
URL: http://ims-sites.dei.unipd.it/documents/71612/430938/CLEF2013wn-PAN-FlekovaEt2013.pdf
Publication Type: 4. Contributions to edited volumes; Conference volume/Conference paper/Proceedings
Language: English
Keywords: Age; Chatting <communication>; Computational linguistics; Computer-assisted method; Data analysis; Gender; Social software; Language analysis; Stylistics; Web log; Target group
Abstract: Would you target your audience differently, knowing the real age and gender of the text authors on your website forum? This paper examines hundreds of thousands of online documents, e.g. chat lines or blog posts, showing that computers are capable of addressing this task better than humans, without relying on content stereotypes. Pointing out that age and gender profiling are not independent problems, we approach the task as a multiclass classification problem, combining the age and gender information to define six classes. Utilizing a wide range of stylistic and content features and a large number of readability measures, we demonstrate the high predictive abilities of the parts of speech, the punctuation and the amount of emotions and slang used in the text, independently of the topic discussed.
DIPF-Departments: Informationszentrum Bildung