Analyzing Review Text

Note: The code and the final results cannot be displayed publicly because they are copyrighted by The Data Incubator; however, they can be shared privately upon request.
After downloading a Yelp reviews dataset, I filtered it down to restaurant reviews only. I started with a standard bag-of-words model, strengthened it with normalization, and then improved it further by adding bigrams as features.
Once the model was working well, I analyzed the text further to find bigrams that occur together with unusual regularity. To score each bigram, I divided the probability of the bigram appearing by the product of the probabilities of its two individual words appearing. A ratio well above 1 means the two words co-occur more often than they would if they were independent. Adding Bayesian smoothing kept pairs of rare words, which can produce very large ratios from just one or two occurrences, from dominating the results.
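The bigram scoring described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: tokenization is a simple lowercase regex split, bigrams are counted only within a single review, and the "Bayesian smoothing" is assumed to be additive (pseudo-count) smoothing of the unigram probabilities, which is the posterior mean under a symmetric Dirichlet prior. The pseudo-count `alpha`, the function names, and the toy reviews are all hypothetical. Each bigram gets the score p(w1 w2) / (p(w1) p(w2)).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(reviews):
    """Count unigrams and adjacent-word bigrams across all reviews."""
    unigrams, bigrams = Counter(), Counter()
    for review in reviews:
        tokens = tokenize(review)
        unigrams.update(tokens)
        # Bigrams are taken within one review, never across review boundaries.
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def collocation_scores(unigrams, bigrams, alpha=0.0):
    """Score each observed bigram by p(w1 w2) / (p(w1) * p(w2)).

    Assumption: only the unigram probabilities are smoothed, by adding the
    pseudo-count `alpha` to every word count (alpha=0 gives the raw ratio).
    A larger alpha inflates the probability of rare words, which shrinks the
    scores of bigrams built from words seen only once or twice.
    """
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    vocab = len(unigrams)

    def p_word(word):
        return (unigrams[word] + alpha) / (n_uni + alpha * vocab)

    return {
        (w1, w2): (count / n_bi) / (p_word(w1) * p_word(w2))
        for (w1, w2), count in bigrams.items()
    }


def top_collocations(reviews, k=5, alpha=0.0):
    """Return the k highest-scoring bigrams as (bigram, score) pairs."""
    unigrams, bigrams = count_ngrams(reviews)
    scores = collocation_scores(unigrams, bigrams, alpha)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Made-up toy corpus: "fried chicken" recurs, while "i love" and
    # "zzz qqq" pair words that each appear only once.
    reviews = [
        "the fried chicken was great",
        "fried chicken and waffles",
        "I love fried chicken",
        "the waffles were great",
        "zzz qqq",
    ]
    print(top_collocations(reviews, k=1))              # a pair of one-off words ranks first
    print(top_collocations(reviews, k=1, alpha=10.0))  # ('fried', 'chicken') ranks first
```

Smoothing only the unigram estimates is a deliberate choice in this sketch. If the same pseudo-count were also added to the bigram count, a pair seen k times whose words each appear k times would score proportionally to 1/(k + alpha), so one-off pairs would still rank first. Inflating only p(w) makes that score proportional to k/(k + alpha)^2, which penalizes words too rare to give a reliable estimate; `alpha` then acts as a tuning knob for how much evidence a bigram needs.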
Learning Points: Memory Optimization, Advanced Transformers, Count and Hashing Vectorizers, Lemmas, Bigrams, Dimension Reduction, Bayesian Smoothing