Flekova, Lucie; Gurevych, Iryna:

Can we hide in the web? Large scale simultaneous age and gender author profiling in social media
Notebook for PAN at CLEF 2013

In: Forner, Pamela; Navigli, Roberto; Tufis, Dan (eds.): CLEF 2013 Labs and Workshops. Padua: PROMISE (2013), 1-11

4. Contributions to edited volumes; conference volume/conference paper/proceedings

Age, Chatting (communication), Computational linguistics, Computer-assisted method, Data analysis, Gender, Social software, Language analysis, Stylistics, Web log, Target audience

Would you target your audience differently if you knew the real age and gender of the authors writing on your website forum? This paper examines hundreds of thousands of online documents, such as chat lines and blog posts, and shows that computers can address this task better than humans, without relying on content stereotypes. Because age and gender profiling are not independent problems, we approach the task as a single multiclass classification problem, combining the age and gender information to define six classes. Using a wide range of stylistic and content features and a large number of readability measures, we demonstrate the high predictive power of parts of speech, punctuation, and the amount of emotions and slang used in the text, independently of the topic discussed.
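
The six-class formulation described in the abstract can be illustrated with a minimal sketch. The abstract states only that age and gender are combined into six classes; the concrete age bands (`10s`, `20s`, `30s`), the gender labels, and the helper names `to_joint_label` and `from_joint_label` are assumptions introduced here for illustration and are not taken from the paper.

```python
from itertools import product
from typing import Tuple

# Assumed label sets: the abstract only says there are six classes in total.
AGE_GROUPS = ("10s", "20s", "30s")   # hypothetical age bands
GENDERS = ("female", "male")         # hypothetical gender labels

# Every (age, gender) pair becomes one class of a single multiclass problem.
JOINT_CLASSES = [f"{age}_{gender}" for age, gender in product(AGE_GROUPS, GENDERS)]


def to_joint_label(age: str, gender: str) -> str:
    """Merge separate age and gender labels into one joint class label."""
    label = f"{age}_{gender}"
    if label not in JOINT_CLASSES:
        raise ValueError(f"unknown age/gender combination: {age!r}, {gender!r}")
    return label


def from_joint_label(label: str) -> Tuple[str, str]:
    """Split a joint class label back into its (age, gender) parts."""
    age, gender = label.split("_", 1)
    return age, gender
```

With a joint label, a single classifier predicts both attributes at once, so any dependence between age and gender in writing style can be captured directly rather than being lost across two separate models.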

