Things to remember while planning a trip to Seattle.
Exploring Seattle city Airbnb listings.
Airbnb began in 2008 when two designers decided to rent out a room to three travellers looking for a place to stay. Airbnb has since turned into a marketplace that lets people rent out their properties or spare rooms to guests for a minimal charge.
In this article, I use data analysis techniques in Python, following the CRISP-DM process, to analyze the Airbnb data.
CRISP-DM (Cross-Industry Standard Process for Data Mining)
Why choose Seattle?
Located on the northwest coast of America, the city of Seattle is a seamless hybrid of urban buzz and rural escapes.
To get started, the data used in this project is available from https://www.kaggle.com/airbnb/seattle/data. The data consists of:
- reviews.csv contains information such as a unique identifier for each reviewer and detailed comments.
- listings.csv contains 3818 rows and 92 attributes such as id, property type, neighbourhood, price, room type, availability, etc., along with some host-related information.
- calendar.csv contains information about each listing's availability on a specific date.
After exploring the data a little, I decided to focus on some specific attributes which seem to have an effect on the price of listings and their popularity. Some interesting questions I had are the following:
- What are the most expensive property types in Seattle?
- How do booking prices change at different times of the year?
- What are the most and least expensive neighbourhoods in Seattle?
- Which are the most impactful attributes for predicting the price of a listing?
1) What are the most expensive property types in Seattle?
After some data cleaning and other necessary preparation of the listings data, we can clearly see the availability of each property type. Almost 90% of the property listings in the dataset are houses and apartments.
Grouping the data by property type and aggregating gives the average price for each type of property.
From the above figure the following can be observed:
- Listed properties fall into three price categories: high, mid and low.
- Boat houses are the most expensive property type, as they are extremely luxurious and have low availability.
- Houses, apartments and bungalows are spacious and highly available, so they fall under the mid price range category.
- Tents and dorms have an availability of less than 1% and are not in demand, so they fall under the low price range category.
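The grouping step behind this comparison can be sketched in a few lines. This is a minimal, standard-library sketch and not the code from the project: it assumes each listing row is a dict (as `csv.DictReader` would yield) with `property_type` and `price` keys, and that prices are stored as strings such as `$1,250.00`. The helper names and the sample rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def parse_price(text):
    """Convert a price string such as '$1,250.00' to a float."""
    return float(text.replace("$", "").replace(",", ""))


def mean_price_by_property_type(rows):
    """Return (property_type, mean price) pairs, most expensive first."""
    prices = defaultdict(list)
    for row in rows:
        # Skip rows with a missing price or property type.
        if row.get("price") and row.get("property_type"):
            prices[row["property_type"]].append(parse_price(row["price"]))
    averages = {ptype: mean(values) for ptype, values in prices.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for illustration only; these are not values from the dataset.
sample_rows = [
    {"property_type": "Boat", "price": "$1,250.00"},
    {"property_type": "House", "price": "$100.00"},
    {"property_type": "House", "price": "$140.00"},
    {"property_type": "Tent", "price": "$40.00"},
    {"property_type": "Apartment", "price": ""},  # missing price is skipped
]
ranking = mean_price_by_property_type(sample_rows)
for property_type, average in ranking:
    print(f"{property_type}: {average:.2f}")
```

With the real file, the rows could come from `csv.DictReader` opened on listings.csv, provided the column names match the assumptions above.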
2) Effect on booking prices at different times of the year?
Using the information available in calendar data we can plot the booking prices listed throughout the year.
Looking at the above figure, we can observe that prices are quite low at the start of the year and increase towards the middle of the year.
We can compare the available listings with the daily listed prices to find the relation between availability and prices at different times of the year.
From the above figure we can conclude:
- There is an inverse relationship between listing availability and daily prices.
- Prices are at their peak from June to September, while the availability of houses drops.
- The cheapest time to visit Seattle is from the start of the year to March.
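The comparison between availability and listed prices can be reproduced roughly as follows. This is only a sketch under assumptions: it expects calendar rows as dicts with `date` (ISO format), `available` (`t` or `f`) and `price` keys, and it assumes the price is blank on dates when a listing is not available. The function name and the sample rows are illustrative, not taken from the project.

```python
from collections import defaultdict
from datetime import date


def monthly_summary(rows):
    """Return {month: (mean listed price, share of rows marked available)}."""
    price_sum = defaultdict(float)
    price_count = defaultdict(int)
    available = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        month = date.fromisoformat(row["date"]).month
        total[month] += 1
        if row["available"] == "t":
            available[month] += 1
        if row["price"]:  # assumed to be blank when the listing is unavailable
            value = float(row["price"].replace("$", "").replace(",", ""))
            price_sum[month] += value
            price_count[month] += 1
    summary = {}
    for month in sorted(total):
        if price_count[month]:
            mean_price = price_sum[month] / price_count[month]
        else:
            mean_price = None  # no priced rows in this month
        summary[month] = (mean_price, available[month] / total[month])
    return summary


# Made-up rows for illustration only; these are not values from the dataset.
sample_calendar = [
    {"date": "2016-01-04", "available": "t", "price": "$80.00"},
    {"date": "2016-01-05", "available": "t", "price": "$100.00"},
    {"date": "2016-07-04", "available": "t", "price": "$150.00"},
    {"date": "2016-07-05", "available": "f", "price": ""},
]
summary = monthly_summary(sample_calendar)
for month, (mean_price, share) in summary.items():
    print(month, mean_price, share)
```

Plotting the two values per month side by side is one way to check whether higher prices line up with lower availability.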
3) Most and least expensive neighbourhoods in Seattle?
It is important to know which neighbourhoods have high availability in the listings and which ones suit your convenience.
The majority of listings in Seattle are from the Capitol Hill neighbourhood.
Insights drawn from the above figure are:
- The most expensive neighbourhood is Fairmount Park, followed by Industrial District and Portage Bay, with prices ranging between $150 and $350.
- The least expensive ones are Roxhill and Olympic Hills, ranging from $50 to $100, a huge difference compared to the expensive ones.
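A neighbourhood ranking like this is sensitive to neighbourhoods with only a handful of listings, so it can help to keep the listing count next to the average price. The sketch below assumes listing rows as dicts with `neighbourhood` and `price` keys; the `min_listings` threshold is my own addition for illustration and is not part of the original analysis, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def neighbourhood_price_table(rows, min_listings=1):
    """Return (neighbourhood, listing count, mean price), most expensive first."""
    prices = defaultdict(list)
    for row in rows:
        if row.get("price") and row.get("neighbourhood"):
            value = float(row["price"].replace("$", "").replace(",", ""))
            prices[row["neighbourhood"]].append(value)
    table = [
        (name, len(values), mean(values))
        for name, values in prices.items()
        if len(values) >= min_listings  # drop thinly covered neighbourhoods
    ]
    return sorted(table, key=lambda entry: entry[2], reverse=True)


# Made-up rows for illustration only; these are not values from the dataset.
sample_listings = [
    {"neighbourhood": "Capitol Hill", "price": "$100.00"},
    {"neighbourhood": "Capitol Hill", "price": "$120.00"},
    {"neighbourhood": "Capitol Hill", "price": "$140.00"},
    {"neighbourhood": "Portage Bay", "price": "$250.00"},
    {"neighbourhood": "Roxhill", "price": "$60.00"},
    {"neighbourhood": "Roxhill", "price": "$80.00"},
]
table = neighbourhood_price_table(sample_listings, min_listings=2)
most_expensive, least_expensive = table[0], table[-1]
print(most_expensive, least_expensive)
```

With `min_listings=1` every neighbourhood is kept, which matches a plain group-and-average; raising it trades coverage for more stable averages.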
4) Which are the most impactful attributes for predicting price of a listing?
To find the attributes that can help predict the price, I used a clustering method and xgboost. With the results from these modelling methods, I was able to segment the data into different clusters.
Cluster 9 has the listings with the highest prices, and cluster 2 has the listings with the lowest prices compared to the rest of the clusters.
We can observe here that cluster 2 is strongly negative compared to the other clusters and shows the largest difference in price.
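One way to read a "negative" cluster is as a cluster whose average price sits below the overall average. The sketch below computes that gap for each cluster. It is an assumption about how such a comparison could be made, not the exact calculation behind the figure, and it takes the cluster labels as given rather than running any clustering itself. The labels and prices in the sample are made up.

```python
from collections import defaultdict
from statistics import mean


def cluster_price_gaps(labelled_prices):
    """Return {cluster label: mean cluster price minus overall mean price}."""
    labelled_prices = list(labelled_prices)
    overall = mean(price for _, price in labelled_prices)
    by_cluster = defaultdict(list)
    for label, price in labelled_prices:
        by_cluster[label].append(price)
    return {
        label: mean(values) - overall
        for label, values in sorted(by_cluster.items())
    }


# Made-up (cluster label, price) pairs for illustration only.
sample = [(2, 50.0), (2, 70.0), (5, 120.0), (9, 300.0), (9, 340.0)]
gaps = cluster_price_gaps(sample)
cheapest_cluster = min(gaps, key=gaps.get)
priciest_cluster = max(gaps, key=gaps.get)
print(gaps, cheapest_cluster, priciest_cluster)
```

A negative gap marks a cluster that is cheaper than average, and the size of the gap shows how far it sits from the rest.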
After modelling the data with the xgboost classifier, the important features affecting price are surfaced.
From the above figure it can be observed that:
- Availability, number of reviews, how many persons can be accommodated, room type, neighbourhood, property type and cancellation policy are some important features that strongly affect the listing prices.
- With these features it is possible to predict the prices with reasonable accuracy.
In this article, I looked into Airbnb listings data for Seattle and found some interesting insights which can help while exploring the city.
- Got the names of some of the most and least expensive neighbourhoods in Seattle.
- Compared the available listings to the daily listing prices to find the relationship between them, which helped identify the high season and the most expensive and cheapest months to visit Seattle.
- Identified the distribution of availability across neighbourhoods, along with some of the most and least expensive neighbourhoods.
- Identified attributes associated with price using clustering methods and xgboost model:
- Property Type
- Room Type
- Cancellation Policy
- No. of Reviews
What are your suggestions? What other features did you find important?
For the code and detailed analysis, please refer to the GitHub link: https://github.com/rahul81/Data-Scientist-Nanodegree/tree/master/blogpost