IBM Data Science Capstone
Choosing the Best Cities for Middle Eastern Eateries in Australia: An exploratory analysis of Sydney, Melbourne, Canberra, Brisbane and Perth
In this capstone project, I applied what I learned in IBM’s Professional Data Science Certificate to a problem that I often encountered while living in Australia. It is also a common problem for local and foreign tourists who travel to Australia’s major metropolitan areas and are overwhelmed by the food choices. This project therefore explores which major Australian cities are the best options for visiting restaurants and eateries offering Middle Eastern cuisine.
The cities I analyzed were Sydney, Melbourne, Canberra, Perth and Brisbane. Sydney is the densest city by population (ABS, 2020), so one would normally assume that visiting various venues there is easier and more convenient. In my personal experience, however, it takes longer to explore venues in Sydney than in Melbourne or Perth. This project tries to explore why that is the case, especially given that Sydney has more venues than any other city in Australia.
The data required for this project was mainly on Middle Eastern restaurants in the major cities of Australia, namely Sydney, Melbourne, Perth, Brisbane and Canberra. I used the Foursquare API to extract this data. The fields required from it were the name, address, latitude and longitude of each venue. Data from Wikipedia and the Australian Bureau of Statistics (ABS) was also used to expand the analysis. The pandas, NumPy and folium (for maps) libraries were used to create data frames and analyze the data.
As John Rollins rightly argues, those who “lack sufficient understanding of how to go about solving problems using data science techniques” fail to address the problem at hand adequately (Rollins, 2015). Hence, in sections 1 and 2 of this report, we started by introducing the problem, the location and our data sources. We then used simple exploratory data analysis and map visualizations to find out which major cities in Australia are the best for local and foreign tourists in terms of eating at Middle Eastern restaurants.
Data Preparation and Modelling
We used the Foursquare API to search for Middle Eastern venues in our sample set of cities using the following code:
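As a minimal sketch of what such a search request might look like, the snippet below builds one query URL per city. It assumes the legacy Foursquare v2 `venues/search` endpoint; the search term, version date, result limit and placeholder credentials are illustrative assumptions, not values taken from the project.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy Foursquare Places API (v2).
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"

# The five cities analyzed in this project, written as "near" strings.
CITIES = ["Sydney, NSW", "Melbourne, VIC", "Canberra, ACT", "Perth, WA", "Brisbane, QLD"]


def build_search_url(city, client_id, client_secret,
                     query="Middle Eastern", version="20200101", limit=50):
    """Return a venues/search URL for one city (no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,    # API version date in YYYYMMDD form (assumed value)
        "near": city,    # free-text place name, geocoded by Foursquare
        "query": query,  # search term (assumed; the project's term is not given)
        "limit": limit,  # maximum number of venues per call (assumed value)
    }
    return SEARCH_URL + "?" + urlencode(params)


# Placeholder credentials; replace them with real ones before sending requests.
urls = {city: build_search_url(city, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
        for city in CITIES}
```

Each URL could then be fetched with an HTTP client such as `requests`. In the v2 API the venues are expected under the `response` and `venues` keys of the returned JSON, though this should be checked against the API version actually in use.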
As our data was in a highly nested JSON format, we turned it into a pandas DataFrame using json_normalize. After normalizing, the data was in our required format, with the columns “Name”, “Address”, “Lat” and “Lng”.
Once we had normalized our data, we used a two-step approach to analyze it. In the first step, we plotted the venues in each sample city as clusters. In the second step, we calculated the geographical centre of each city and then the mean distance of venues from that centre. Although not directly within our scope, the analysis also showed how Middle Eastern restaurants are clustered in each of the five cities, indicating the diversity of the cities.
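The flattening step can be illustrated with a small sketch. The two venues below are hypothetical and only mimic the nested shape of a Foursquare response; the dotted source column names are assumptions about that shape.

```python
import pandas as pd

# Hypothetical venues shaped like entries of a Foursquare "venues" list.
venues = [
    {"name": "Venue A",
     "location": {"address": "1 Example St", "lat": -33.87, "lng": 151.21}},
    {"name": "Venue B",
     "location": {"lat": -33.90, "lng": 151.18}},  # address missing
]

# Flatten nested dictionaries into dotted column names such as "location.lat".
flat = pd.json_normalize(venues)

# Keep only the fields used in the analysis and give them short names.
columns = {
    "name": "Name",
    "location.address": "Address",
    "location.lat": "Lat",
    "location.lng": "Lng",
}
df = flat[list(columns)].rename(columns=columns)
```

A venue without an address ends up with a missing value in the “Address” column rather than raising an error. Note that `pd.json_normalize` is available from pandas 1.0; older versions expose the same function as `pandas.io.json.json_normalize`.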
Results with normal plotting
For each city, we first plotted the venues gathered from the Foursquare API as they were and calculated the city’s central location. Because this first step does not require the distance of venues from the centre, the venues were plotted without it.
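A plot of this kind might be produced along the following lines. This is only a sketch: the coordinates are hypothetical, the centre is assumed to be the mean of the venue coordinates, and the marker styling and zoom level are arbitrary choices.

```python
import pandas as pd

# Hypothetical venue coordinates; real values would come from the API data.
df = pd.DataFrame({
    "Name": ["Venue A", "Venue B", "Venue C"],
    "Lat": [-37.80, -37.82, -37.81],
    "Lng": [144.95, 144.97, 144.99],
})


def mean_centre(frame):
    """Return the (lat, lng) mean of all venues, used here as the city centre."""
    return float(frame["Lat"].mean()), float(frame["Lng"].mean())


def build_map(frame, zoom_start=12):
    """Plot every venue, plus the mean centre, on a folium map."""
    import folium  # imported here so mean_centre works without folium installed

    centre = mean_centre(frame)
    city_map = folium.Map(location=list(centre), zoom_start=zoom_start)
    for row in frame.itertuples():
        folium.CircleMarker(location=[row.Lat, row.Lng], radius=5,
                            popup=row.Name, fill=True).add_to(city_map)
    folium.Marker(location=list(centre), popup="Mean centre").add_to(city_map)
    return city_map


centre = mean_centre(df)
```

Calling `build_map(df)` in a notebook displays the map, and `build_map(df).save("city.html")` writes it to a standalone HTML file.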
Results with mean distance from the mean (center) location of the city:
Now that we have seen the clusters plotted on the map, we calculate the mean (average) distance of venues from the geographical centre of the city. This gives a clearer picture of how far a tourist or traveller would actually have to go to visit various venues.
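The distance calculation described above could look like the sketch below. It assumes the centre is the mean of the venue coordinates and measures plain Euclidean distance in degrees of latitude and longitude; the coordinates are hypothetical.

```python
import numpy as np

# Hypothetical venue coordinates as (lat, lng) pairs in degrees.
coords = np.array([
    [-31.95, 115.86],
    [-31.95, 115.90],
    [-31.98, 115.86],
    [-31.92, 115.86],
])

# Centre taken as the mean of the venue coordinates (an assumption).
centre = coords.mean(axis=0)

# Straight-line (Euclidean) distance of each venue from the centre, in degrees.
distances = np.linalg.norm(coords - centre, axis=1)

# Average distance of a venue from the centre.
mean_distance = distances.mean()
```

Distances in degrees are convenient for comparing cities on the same scale, but they are not kilometres: a degree of longitude covers less ground than a degree of latitude away from the equator, so a great-circle formula would be needed for real travel distances.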
Values of Mean Distance (from mean center) of each city
Although the mean square distance already gave us a clear picture, we recalculated it after excluding the maximum value (the “outlier”) to get a more homogeneous figure. For this purpose, we used the NumPy mean function for each city, which gave the following values:
Sydney, NSW: 0.11287612594187721
Canberra, ACT: 0.07047281616415198
Melbourne, VIC: 0.03318223070418223
Perth, WA: 0.04176312230297355
Brisbane, QLD: 0.05986191980294174
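The calculation described above, dropping the largest distance before averaging, can be sketched as follows. The function name and the sample distances are hypothetical and do not reproduce the values listed above. Whether the distances were squared before averaging is not stated, so this sketch averages plain distances.

```python
import numpy as np


def mean_without_max(distances):
    """Mean of the distances after dropping the single largest value."""
    values = np.asarray(distances, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two distances to drop the maximum")
    # np.delete removes one element only, even if the maximum is repeated.
    trimmed = np.delete(values, values.argmax())
    return float(np.mean(trimmed))


# Hypothetical per-city distances in degrees; not the project's data.
sample = {
    "City A": [0.02, 0.03, 0.04, 0.50],
    "City B": [0.05, 0.06, 0.07, 0.08],
}
results = {city: mean_without_max(d) for city, d in sample.items()}
```

In this example the single distant venue in “City A” no longer dominates its average, whereas “City B”, which has no extreme value, changes only slightly.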
Discussion and Conclusion:
The findings from our analysis present a more nuanced picture than the maps alone. When the venues were plotted without the mean square distance, both Melbourne and Sydney appeared, to a casual observer, to be dense clusters of venues. However, the mean square distance shows that while Melbourne is indeed a good option, Sydney is not a convenient one: its value is the highest of all five cities. One likely factor is area. Sydney covers 12,368 km², compared with 15,826 km² for Brisbane, 9,990 km² for Melbourne, 6,418 km² for Perth and 814.2 km² for Canberra (Wikipedia, 2020). Although Brisbane is, by area, the largest “proper” city in Australia, it has a lower population than Sydney and Melbourne, which may explain why it has fewer venues than those two cities.
Hence, our findings suggest that the best city for a tourist to try Middle Eastern food is Melbourne, as they will have to travel less to explore different venues. After Melbourne, the next best options by the values reported above are Perth and Brisbane, followed by Canberra. Sydney’s high value is consistent with its large area, but area alone does not explain the ranking: Brisbane is the largest city by area yet has a mid-range value, while Canberra, the smallest, has the second-highest.
The project started by defining the problem statement, which asked which major cities in Australia are the best for local and foreign tourists to visit Middle Eastern restaurants. From the plain cluster plots, it seemed that Melbourne and Sydney were the best locations for visiting Middle Eastern venues. However, the mean square distance showed that Sydney, with its larger area, had the highest value between the geographic centre of the city and its venues. In contrast, Melbourne was the best city in this regard, as its mean square distance was the lowest, suggesting that tourists would have to travel less to explore different venues. This project also illustrated how data science can address even the simplest of problems in a more engaging manner.
The full repository for the project can be found here
ABS. (2020). Australian Demographic Statistics, Sep 2019. Retrieved from https://www.abs.gov.au/ausstats/abs@.nsf/mf/3101.0
Rollins, J. (2015). Why we need a methodology for data science. Big Data and Analytics Hub. Retrieved from https://www.ibmbigdatahub.com/blog/why-we-need-methodology-data-science
Wikipedia. (2020). List of cities in Australia by population. Retrieved from https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population