Piegenstuff

Mostly python and math stuff

Fri 25 August 2017

Seattle Airbnb Reviews

Posted by Simon Orlovsky in Posts   

DISCLAIMER: I am not trying to make any political or population-wide claims. I just came across this dataset and was curious whether I could make this analysis work.

Is there a difference between how men and women rate Airbnbs in Seattle? In this post, we're going to look at 1500 Airbnb reviews to answer this question. We will measure differences by scoring the sentiment of each review's comments.

The data we will be using (the reviews.csv file from the Seattle Airbnb Open Data set) can be found on Kaggle.

The gender_guesser library guesses whether a first name belongs to a man or a woman. We will use it to infer the gender of the Airbnb reviewers. vaderSentiment is a Python library that scores how positive, negative, or neutral a piece of text is. Together, these two modules let us look for differences in how men and women review Airbnbs.

To get started, we're going to import the following libraries. If you don't have them, they can be installed with pip (pip install pandas numpy gender-guesser vaderSentiment).

In [1]:
import pandas as pd
import numpy as np
import gender_guesser.detector as gender
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Next, we will read in the data and see what we're working with. We're only going to look at the first 1500 reviews.

In [2]:
reviews = pd.read_csv('seattle-airbnb-open-data/reviews.csv').head(1500)
reviews.head(3)
Out[2]:
listing_id id date reviewer_id reviewer_name comments
0 7202016 38917982 2015-07-19 28943674 Bianca Cute and cozy place. Perfect location to every...
1 7202016 39087409 2015-07-20 32440555 Frank Kelly has a great room in a very central locat...
2 7202016 39820030 2015-07-26 37722850 Ian Very spacious apartment, and in a great neighb...
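Calling head(1500) after read_csv still parses the entire file before throwing most of it away. pandas' read_csv also accepts an nrows argument that stops reading after that many rows. A minimal sketch, using a tiny in-memory CSV with hypothetical rows in place of the real reviews.csv:

```python
import io

import pandas as pd

# A tiny stand-in for reviews.csv (hypothetical rows, same column names)
csv_text = """listing_id,id,date,reviewer_id,reviewer_name,comments
1,10,2015-07-19,100,Alice,Great place
1,11,2015-07-20,101,Bob,Nice host
2,12,2015-07-26,102,Carol,Too noisy
"""

# nrows=2 stops parsing after the first two data rows
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(first_two[['reviewer_name', 'comments']])
```

On the real data this would be pd.read_csv('seattle-airbnb-open-data/reviews.csv', nrows=1500), which should give the same rows as .head(1500) without loading the rest of the file.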

This is what the data currently looks like. As we can see, we don't know the reviewers' gender or the sentiment of their reviews yet. We just have names and comments.
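Both of the next steps work on strings, and pandas reads a missing name or comment as NaN (a float), which could trip up the gender and sentiment functions. A quick sketch of checking for and dropping such rows, using a hypothetical three-row frame rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one review has no comment text, one has no name
reviews = pd.DataFrame({
    'reviewer_name': ['Bianca', 'Frank', np.nan],
    'comments': ['Cute and cozy place.', np.nan, 'Great stay.'],
})

# Count missing values per column before scoring anything
print(reviews[['reviewer_name', 'comments']].isna().sum())

# Keep only rows that have both a name and a comment
clean = reviews.dropna(subset=['reviewer_name', 'comments'])
print(len(clean))  # 1
```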

Determining Gender

We know the first name of each reviewer, but not their gender.

Using the gender_guesser library, we can assign a gender to each review based on the reviewer's first name.

In [3]:
d = gender.Detector()
reviews['gender'] = reviews['reviewer_name'].apply(d.get_gender)
reviews.head(3)
Out[3]:
listing_id id date reviewer_id reviewer_name comments gender
0 7202016 38917982 2015-07-19 28943674 Bianca Cute and cozy place. Perfect location to every... female
1 7202016 39087409 2015-07-20 32440555 Frank Kelly has a great room in a very central locat... male
2 7202016 39820030 2015-07-26 37722850 Ian Very spacious apartment, and in a great neighb... male
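Besides 'male' and 'female', get_gender can also return 'mostly_male', 'mostly_female', 'andy' (androgynous) and 'unknown' (name not found). The comparison later in this post only keeps the exact 'male' and 'female' labels, so it is worth seeing how many reviews fall outside them. A sketch on a handful of hypothetical labels; folding the mostly_ labels into the main groups is one possible choice, not something this post does:

```python
import pandas as pd

# Hypothetical labels in the form gender_guesser returns them
labels = pd.Series(['female', 'male', 'mostly_female', 'andy',
                    'unknown', 'male', 'mostly_male'])

# How many reviews fall into each category?
print(labels.value_counts())

# One possible choice: fold the "mostly_" labels into the main groups
# and leave 'andy' and 'unknown' out of the comparison
folded = labels.replace({'mostly_female': 'female', 'mostly_male': 'male'})
print(folded.value_counts())
```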

Figuring Out the Sentiment of the Reviews

Next, we need to assign sentiment to each of the reviews.

VADER returns its polarity scores as a dictionary (with neg, neu, pos and compound keys), so we need to do a little bit of pandas finagling to split them into separate columns.

In [4]:
analyzer = SentimentIntensityAnalyzer()

analyzer.polarity_scores('Cute and cozy place. Perfect location to everything! ')
Out[4]:
{'compound': 0.7901, 'neg': 0.0, 'neu': 0.462, 'pos': 0.538}
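The compound value is a single normalized score between -1 (most negative) and +1 (most positive), while neg, neu and pos are the proportions of the text falling into each category, which is why they add up to about 1 (0.0 + 0.462 + 0.538 = 1.0 here). If a single label per review is useful, a common convention from the vaderSentiment README is to treat compound >= 0.05 as positive and compound <= -0.05 as negative. A sketch of that rule (the helper name label_compound is hypothetical):

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score (-1 to 1) to a coarse label.

    The +/-0.05 cut-offs follow the convention suggested in the
    vaderSentiment README; they are a rule of thumb, not a requirement.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# The score computed above for the first review
print(label_compound(0.7901))  # positive
print(label_compound(0.0))     # neutral
```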
In [5]:
reviews['vader'] = reviews['comments'].apply(analyzer.polarity_scores)
reviews = reviews.join(reviews['vader'].apply(pd.Series))
reviews.head(3)
Out[5]:
listing_id id date reviewer_id reviewer_name comments gender vader compound neg neu pos
0 7202016 38917982 2015-07-19 28943674 Bianca Cute and cozy place. Perfect location to every... female {'neg': 0.0, 'neu': 0.462, 'pos': 0.538, 'comp... 0.7901 0.000 0.462 0.538
1 7202016 39087409 2015-07-20 32440555 Frank Kelly has a great room in a very central locat... male {'neg': 0.0, 'neu': 0.609, 'pos': 0.391, 'comp... 0.9872 0.000 0.609 0.391
2 7202016 39820030 2015-07-26 37722850 Ian Very spacious apartment, and in a great neighb... male {'neg': 0.043, 'neu': 0.772, 'pos': 0.185, 'co... 0.8718 0.043 0.772 0.185
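apply(pd.Series) builds a new Series for every row, which can be slow on larger frames. An alternative that should give the same columns is to hand the list of dictionaries straight to the DataFrame constructor, keeping the original index so the result can still be joined back onto reviews. A sketch using the three score dictionaries shown in the output above:

```python
import pandas as pd

# Score dictionaries as returned by polarity_scores (values from the output above)
scores = pd.Series([
    {'neg': 0.0, 'neu': 0.462, 'pos': 0.538, 'compound': 0.7901},
    {'neg': 0.0, 'neu': 0.609, 'pos': 0.391, 'compound': 0.9872},
    {'neg': 0.043, 'neu': 0.772, 'pos': 0.185, 'compound': 0.8718},
])

# Build all four columns at once; keeping the index lets join() line rows up
expanded = pd.DataFrame(scores.tolist(), index=scores.index)
print(expanded)
```

With the real data this would be reviews.join(pd.DataFrame(reviews['vader'].tolist(), index=reviews.index)).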

Now we have the sentiment for each review. We can use this information to compute each group's average positive and negative sentiment.

In [6]:
female_pos = reviews.loc[reviews.gender == 'female'].pos.mean()
male_pos = reviews.loc[reviews.gender == 'male'].pos.mean()
male_neg = reviews.loc[reviews.gender == 'male'].neg.mean()
female_neg = reviews.loc[reviews.gender == 'female'].neg.mean()
In [7]:
print("Female positivity score: {}, Male positivity score: {}".format(female_pos, male_pos))
print("Female negativity score: {}, Male negativity score: {}".format(female_neg, male_neg))
Female positivity score: 0.30064392678868573, Male positivity score: 0.30319217081850547
Female negativity score: 0.01214309484193011, Male negativity score: 0.011062277580071167
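The four .loc lines above can be collapsed into a single groupby, which computes the mean for every gender label (including 'andy', 'unknown' and the mostly_ labels) in one table. A sketch on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical scores standing in for the reviews frame
reviews = pd.DataFrame({
    'gender': ['female', 'male', 'male', 'andy', 'female'],
    'pos': [0.5, 0.3, 0.4, 0.2, 0.3],
    'neg': [0.0, 0.1, 0.0, 0.0, 0.05],
})

# Mean positive and negative score for every gender label at once
means = reviews.groupby('gender')[['pos', 'neg']].mean()
print(means)

# Restrict to the two groups compared in this post
print(means.loc[['female', 'male']])
```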

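The average positive scores (about 0.301 for women and 0.303 for men) and negative scores (about 0.012 and 0.011) are nearly identical, so men and women appear to write reviews with about the same sentiment.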

There we have it. We used pandas, gender_guesser and VADER to compare the sentiment of Airbnb reviews written by men and women in Seattle.

Next Steps

In the next tutorial, we will run a statistical analysis on this data to see whether the difference is significant. We will also compare the standard deviations and add some visualizations.
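Judging whether 0.301 versus 0.303 is a real difference needs the spread of each group, not just the means. One standard option, not necessarily the one the follow-up post will use, is Welch's t-test, which does not assume the two groups have equal variances. A stdlib-only sketch of the statistic and its degrees of freedom on hypothetical scores (welch_t is a hypothetical helper name):

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples
    whose variances may differ."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se1, se2 = v1 / n1, v2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical pos scores for two groups
female = [0.30, 0.25, 0.40, 0.35]
male = [0.28, 0.33, 0.31, 0.20]
print(statistics.stdev(female), statistics.stdev(male))
print(welch_t(female, male))
```

Turning t and the degrees of freedom into a p-value needs the t distribution; if SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) performs the same test and returns the p-value directly.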