Two Projects in Text Data Mining and Natural Language Processing
Department of Computer Systems Technology, New York City College of Technology, City University of New York
In this presentation I will describe two projects I am working on: Automatic Sarcasm Detection and Information Asymmetries in Multilingual Wikipedia.
Sarcasm detection: Humans are good at identifying sarcasm in text and speech. Can we teach a computer to identify sarcasm? Is it possible to point out the parts of a text that make it sarcastic? To answer these questions I use a corpus of sarcastic and regular Amazon product reviews. I analyze the sentiment flow of these reviews and demonstrate that classification features based on sentiment flow can be used to reliably classify documents as sarcastic or non-sarcastic.
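As a rough illustration of what sentiment-flow features might look like, the sketch below scores each sentence of a review and summarizes how the polarity changes from sentence to sentence. It is a minimal sketch under stated assumptions, not the method presented in the talk: the toy word lists, the function names (sentence_polarity, sentiment_flow, flow_features), and the particular features are all hypothetical.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would use a full sentiment lexicon or a trained model.
POSITIVE = {"great", "love", "perfect", "best", "wonderful"}
NEGATIVE = {"broke", "useless", "worst", "terrible", "waste"}


def sentence_polarity(sentence):
    """Count positive words minus negative words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_flow(review):
    """Return the per-sentence polarity sequence of a review."""
    # Naive sentence split on whitespace following ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]
    return [sentence_polarity(s) for s in sentences]


def flow_features(review):
    """Summarize the flow as a small feature dictionary (hypothetical features)."""
    flow = sentiment_flow(review)
    # Ignore neutral sentences when counting polarity reversals.
    signs = [1 if p > 0 else -1 for p in flow if p != 0]
    reversals = sum(a != b for a, b in zip(signs, signs[1:]))
    return {
        "n_sentences": len(flow),
        "mean_polarity": sum(flow) / len(flow) if flow else 0.0,
        "reversals": reversals,
        "end_minus_start": flow[-1] - flow[0] if flow else 0,
    }


review = "Great product. It broke after one day. Best purchase ever!"
print(sentiment_flow(review))
print(flow_features(review))
```

Feature dictionaries of this kind could then be passed to any standard classifier. Counting reversals reflects one plausible intuition, that sarcastic reviews may swing between positive and negative sentences more often than regular ones.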
Multilingual Wikipedia: Wikipedia is currently treated as THE source of information, and the quality of that information is rarely questioned. However, Wikipedia articles about the same entry (person, location, event, etc.) written in different languages differ substantially in what information they include. I discuss the nature of information asymmetries in Multilingual Wikipedia and outline my plan for using these asymmetries to automatically extend Wikipedia articles.
Bio: Dr. Filatova has been an Assistant Professor in the Computer Systems Technology department at CUNY CityTech since Fall 2015. Prior to that she was a faculty member at the Fordham CIS department. She received her Ph.D. in Computer Science from Columbia University in 2008.