Predicting whether a song becomes a hit with probabilistic models - applied econometrics essay
Below I have attached a file containing an essay that I wrote as the final project for my undergraduate econometrics course. I used the Spotify Hit Predictor Dataset and estimated logit, probit, and linear probability models (LPM) in R. I first found the best specification within each type of model and then compared their error rates, which were 0.266 for the LPM, 0.279 for logit, and 0.328 for probit. This result is quite unusual: normally we would not expect the LPM to outperform the non-linear probability models, given its well-known flaws, such as fitted probabilities that can fall outside the [0, 1] interval and heteroskedastic errors. However, the LPM tends to perform quite well when the data is homogeneous, which I believe was the case here.
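For readers who want a feel for the comparison, here is a minimal sketch in R of fitting the three model types and computing an error rate for each. It is not the code from the essay. It assumes simulated stand-in data with hypothetical column names (`danceability`, `energy`, `hit`), a single shared specification, a 0.5 classification cutoff, and a held-out test set; the essay's actual specifications and error-rate definition may differ.

```r
# Simulated stand-in data: column names and coefficients are hypothetical,
# chosen only for illustration, not taken from the Spotify dataset.
set.seed(1)
n <- 1000
songs <- data.frame(
  danceability = runif(n),
  energy       = runif(n)
)
songs$hit <- rbinom(n, 1, plogis(-1 + 2 * songs$danceability - 0.5 * songs$energy))

# Hold out 20% of the rows for evaluation (an assumption of this sketch).
train_idx <- sample(n, size = 0.8 * n)
train <- songs[train_idx, ]
test  <- songs[-train_idx, ]

# The LPM is ordinary least squares on the 0/1 outcome;
# logit and probit are binomial GLMs with different link functions.
lpm    <- lm(hit ~ danceability + energy, data = train)
logit  <- glm(hit ~ danceability + energy, data = train,
              family = binomial(link = "logit"))
probit <- glm(hit ~ danceability + energy, data = train,
              family = binomial(link = "probit"))

# Share of observations misclassified at the given cutoff.
error_rate <- function(p, y, cutoff = 0.5) mean((p > cutoff) != y)

rates <- c(
  lpm    = error_rate(predict(lpm, newdata = test), test$hit),
  logit  = error_rate(predict(logit, newdata = test, type = "response"), test$hit),
  probit = error_rate(predict(probit, newdata = test, type = "response"), test$hit)
)
print(rates)
```

Note that `predict()` on the `lm` object returns fitted values that are not constrained to [0, 1], while `type = "response"` is needed for the GLMs to get probabilities rather than values on the link scale.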