In this post I present my dissertation, the Capstone project that concluded the MSc Applied Social Data Science program. For this project, I used data from various Reddit communities, which I scraped myself in Python using the Pushshift API, together with AskReddit data from Kaggle. I preprocessed the data and conducted some preliminary analysis in R, mainly using the quanteda package for quantitative text analysis. Finally, I estimated predictive models using scikit-learn and TensorFlow; the neural network achieved the best performance.

The corresponding code files are available in this repository.
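
To give a rough flavour of the modelling step, here is a minimal scikit-learn sketch of a text classifier. It is not the project's actual code: the toy texts, the `cooking`/`tech` labels, and the choice of TF-IDF features with logistic regression are all assumptions made purely for illustration, and the real models in the repository may be set up quite differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the scraped Reddit posts.
texts = [
    "what is the best pizza topping",
    "how do I bake sourdough bread at home",
    "favourite recipe for pasta sauce",
    "which graphics card should I buy for gaming",
    "my laptop will not boot after the update",
    "how to install linux on an old computer",
]
# Hypothetical labels; the project's real prediction target may differ.
labels = ["cooking", "cooking", "cooking", "tech", "tech", "tech"]

# Turn raw text into TF-IDF features, then fit a linear classifier on them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Predict labels for two unseen example posts.
predictions = model.predict(["recipe for bread", "install update on laptop"])
print(predictions)
```

A pipeline like this keeps the vectoriser and the classifier together, so the same preprocessing is applied at training and prediction time. It can serve as a simple baseline to compare a neural network against.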