Seattle Confidential: unpacking Airbnb reviews with sentiment
Sentiment analysis is a text classification technique that takes a message, such as a tweet or a Tripadvisor comment, and tells you whether the underlying sentiment is positive, negative or neutral. It is a quick way to extract information from social media text streams or to classify them, and it can help a business understand the sentiment around its brand by monitoring online conversations.
Can we find a way to classify negative and positive reviews based on text?
Machine learning can be a powerful tool for classifying text, but it usually requires training an algorithm on a set of “labeled” (that is, already manually classified) data before it can generalize to new data. Unfortunately, you don’t always have such a set.
In this study we instead used a lexicon- and rule-based sentiment analysis tool known as VADER (from “VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text” by C.J. Hutto and Eric Gilbert), which works very well on text originating from social media.
VADER produces four sentiment metrics. The first three (positive, neutral and negative) represent the proportion of the text that falls into each category. The fourth is a compound score: the sum of all the lexicon ratings, normalized to a range between -1 and 1.
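To make the compound score more concrete, here is a minimal sketch of the normalization step, assuming the form used in the reference VADER implementation (x / sqrt(x² + alpha), with alpha = 15). The per-word ratings below are invented for illustration, and the ±0.05 cut-offs in the hypothetical `label` helper are the thresholds conventionally suggested for VADER, not values stated in this study. The real tool also adjusts ratings for punctuation, capitalization and negation before summing, which this sketch leaves out.

```python
import math

def normalize(valence_sum, alpha=15.0):
    """Squash an unbounded sum of lexicon ratings into the range [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Guard against floating-point drift outside the target range.
    return max(-1.0, min(1.0, score))

def label(compound, threshold=0.05):
    """Map a compound score to a coarse class (thresholds are an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical ratings for the sentiment-bearing words of one short review.
ratings = [1.9, 3.1, -0.5]
compound = normalize(sum(ratings))
print(compound, label(compound))
```

Because the denominator grows with the sum itself, long and very enthusiastic reviews approach 1 without ever exceeding it, which is what makes compound scores comparable across reviews of different lengths.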
We fed the algorithm a dataset of reviews left by Airbnb guests in Seattle and Boston between 2009 and 2016, made available by Inside Airbnb.
The sentiment score distributions are quite similar between the two cities. The plots show that just over 6% of the reviews have a strong positive sentiment (score > 0.5), while the majority (61%) fall in the mild 0.2–0.4 range and most of the reviews (93%) are neutral. Negative sentiment, however, is very low: the majority of comments have a negativity score below 0.1. So although guests tend not to overdo positive comments, they reserve negativity for extreme cases.
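Percentages like the ones above come from counting how many reviews fall inside a score band. A minimal sketch of that calculation follows; the `share_in_range` helper and the sample scores are hypothetical and are not the study's data.

```python
def share_in_range(scores, low, high):
    """Return the fraction of scores s with low <= s < high."""
    if not scores:
        return 0.0
    hits = sum(1 for s in scores if low <= s < high)
    return hits / len(scores)

# Hypothetical positive scores for a handful of reviews (not the real data).
pos_scores = [0.15, 0.25, 0.31, 0.38, 0.22, 0.55, 0.05, 0.29]

# Band edges mirror the ones discussed in the text; the upper bound of the
# "strong" band is left open so that a score of exactly 1.0 is included.
strong = share_in_range(pos_scores, 0.5, float("inf"))
mild = share_in_range(pos_scores, 0.2, 0.4)
print(f"strong: {strong:.1%}, mild: {mild:.1%}")
```

The same helper can be applied to the neutral and negative scores, or run separately per city to compare distributions.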
The compound scores tell an interesting story: while more than 88% of guests in Seattle leave overall positive reviews, only 63% of guests in Boston do so. Our friends on the East Coast seem to be a bit fussier. The reason might be simple: they pay more. Boston is more expensive than Seattle, with an average price per listing of 201 dollars compared to 137 dollars in Seattle. The graph below shows price trends in the two cities:
Conclusion
With the proliferation of social media, businesses increasingly look at reviews, ratings and other forms of online opinion to manage their reputations or market their products. The huge amount of data available makes it necessary to automate the process of filtering out the noise and identifying the relevant content to act on. This will require more sophisticated sentiment analysis tools that go beyond single terms and take into account cultural factors, linguistic nuances and differing contexts.