A Python reproduction of the Maas et al. (2011) paper on learning word vectors for sentiment analysis using IMDb reviews. The post walks through building a vocabulary from 75,000 reviews, implementing a probabilistic semantic model with MAP estimation, adding a supervised sentiment objective based on star ratings, and evaluating the learned representations with a linear SVM classifier. Four document representations are compared: a Bag of Words baseline, semantic-only word vectors, full semantic+sentiment vectors, and a combined dense+sparse representation. Results closely match the original paper, showing how the model learns semantic structure from unlabeled reviews while labeled star ratings inject sentiment polarity into the same vector space.
Table of contents
What problem does the paper solve?
Data structure
Semantic component
Sentiment component
Implementation in Python
Conclusion
Image Credits
References
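To make the compared document representations concrete, here is a minimal sketch of how a sparse Bag of Words vector, a dense word-vector feature, and their concatenation could be built. Everything in it is an assumption for illustration: the toy reviews, the two-dimensional `word_vectors` (standing in for a learned word matrix), and the helper names `bag_of_words`, `dense_features`, and `combined_features` are hypothetical, and the post's actual code may weight terms differently (for example with tf-idf) before feeding features to the linear SVM.

```python
import math
from collections import Counter

# Toy tokenized reviews (hypothetical data, not taken from the IMDb set).
reviews = [
    ["great", "movie", "great", "cast"],
    ["boring", "movie"],
]

# Vocabulary: one index per distinct token, assigned in sorted order.
vocab = {w: i for i, w in enumerate(sorted({t for r in reviews for t in r}))}

# Hypothetical 2-d word vectors standing in for learned representations.
word_vectors = {
    "boring": [-1.0, 0.2],
    "cast": [0.1, 0.9],
    "great": [1.0, 0.3],
    "movie": [0.0, 1.0],
}
DIM = 2


def bag_of_words(tokens):
    """Sparse representation: raw term counts over the vocabulary."""
    counts = Counter(tokens)
    return [counts.get(w, 0) for w in vocab]


def dense_features(tokens):
    """Dense representation: count-weighted sum of word vectors, L2-normalized."""
    bow = bag_of_words(tokens)
    vec = [0.0] * DIM
    for w, i in vocab.items():
        for d in range(DIM):
            vec[d] += bow[i] * word_vectors[w][d]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
    return [x / norm for x in vec]


def combined_features(tokens):
    """Combined representation: dense features concatenated with sparse counts."""
    return dense_features(tokens) + bag_of_words(tokens)


for r in reviews:
    print(bag_of_words(r), combined_features(r))
```

The semantic-only and semantic+sentiment variants would differ only in which word vectors are plugged into `dense_features`; the feature construction itself stays the same in this sketch.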