Analyzing Review Text

Using advanced machine learning techniques, process review texts to predict restaurant star ratings.

Note: The code and the end result cannot be publicly displayed because they are under copyright by The Data Incubator; however, they can be sent privately upon request.

After downloading a Yelp reviews data set, I first pulled out only the restaurant reviews. I started with a standard bag-of-words model, strengthened it with normalization, and then improved it further by adding bigrams as features. Once the model was successful, I analyzed the text further to find bigrams whose words showed up together with significant regularity. I did this by dividing the probability of a particular bigram appearing by the product of the probabilities of each of its two words appearing. Adding Bayesian smoothing let me minimize meaningless results.
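The bigram scoring step can be sketched roughly as follows. This is an illustration only, not the project code: the function name `bigram_scores`, the whitespace tokenizer, and the pseudo-count `alpha` are assumptions, and the smoothing shown is plain additive smoothing on the single-word counts, which is one common way to apply a prior and may differ from what the project used.

```python
from collections import Counter


def bigram_scores(docs, alpha=1.0):
    """Score each bigram as p(w1, w2) / (p(w1) * p(w2)).

    `alpha` is a hypothetical pseudo-count added to every single-word
    count, so that words seen only once or twice do not produce
    inflated ratios. With alpha=0 the score is the unsmoothed ratio.
    """
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive tokenizer, for illustration
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_bigrams = sum(bigrams.values())
    # Smoothed denominator: total word count plus alpha per vocabulary entry.
    n_words = sum(unigrams.values()) + alpha * len(unigrams)

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_bigram = count / n_bigrams
        p_w1 = (unigrams[w1] + alpha) / n_words
        p_w2 = (unigrams[w2] + alpha) / n_words
        scores[(w1, w2)] = p_bigram / (p_w1 * p_w2)
    return scores


if __name__ == "__main__":
    # Made-up toy reviews, not taken from the Yelp data set.
    reviews = [
        "hot dog hot dog hot dog",
        "the food was good",
        "the service was good",
        "xylo phone",
    ]
    for a in (0.0, 5.0):
        scores = bigram_scores(reviews, alpha=a)
        best = max(scores, key=scores.get)
        print(f"alpha={a}: top bigram = {best}")
```

In this toy input, a pair of words that each occur only once gets the highest unsmoothed score, while a larger `alpha` moves the repeated pair to the top. The choice of `alpha` is a tuning decision and would need to be set against the real data.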

Learning Points: Memory Optimization, Advanced Transformers, Count and Hashing Vectorizers, Lemmas, Bigrams, Dimension Reduction, Bayesian Smoothing


© 2017. All rights reserved.