Using my IBM Data Science Skills: Choosing the Best Cities for Middle Eastern Eateries in Australia

IBM Data Science Capstone

Choosing the Best Cities for Middle Eastern Eateries in Australia: An exploratory analysis of Sydney, Melbourne, Canberra, Brisbane and Perth

Farooq Yousaf

In this capstone project, I applied what I learned in IBM’s Professional Data Science Certificate to a problem that I often encountered while living in Australia. It is also a common issue for many local and foreign tourists who travel to Australia’s major metropolitan areas and are overwhelmed by the food choices. Hence, I used this project to “explore” which major metropolitan areas in Australia are the best options for visiting restaurants and eateries offering “Middle Eastern” cuisine.

The cities that I analyzed were Sydney, Melbourne, Canberra, Perth and Brisbane. Sydney is the densest city by population (ABS, 2020), so one would normally assume that it is easier and more convenient to visit various venues there. From my personal experience, however, it takes longer to explore venues in Sydney than in Melbourne or Perth. Hence, this project tried to explore why that is the case, especially when Sydney has more venues than any other city in Australia.

Data

The data required for this project mainly concerned Middle Eastern restaurants in the major cities of Australia, namely Sydney, Melbourne, Perth, Brisbane and Canberra. For that purpose, I used the Foursquare API to extract data on Middle Eastern restaurants in these cities. The main fields required for this project were the name, address, latitude and longitude. Moreover, data from Wikipedia and the Australian Bureau of Statistics (ABS) was used to expand the analysis. To analyze the data and create data frames, the pandas, NumPy and folium (for maps) libraries were used.

Methodology

As John Rollins rightly argues, those who “lack sufficient understanding of how to go about solving problems using data science techniques” end up failing to adequately address the problem at hand (Rollins, 2015). Hence, in sections 1 and 2 of this report, we started by introducing the problem, the locations and our data sources. We then used simple exploratory data analysis and map visualizations to find out which major cities in Australia are the best for local and foreign tourists in terms of eating at Middle Eastern restaurants.

Data Preparation and Modelling

We used the Foursquare API to search for Middle Eastern venues in our sample set of cities using the following code:

Foursquare API Code
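A minimal sketch of what such a search request might look like, assuming the legacy Foursquare v2 venues/search endpoint and placeholder credentials. The helper name build_search_url, the query string, the version date and the limit are illustrative assumptions, not the project's actual code:

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 venue search.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"

CITIES = ["Sydney, NSW", "Melbourne, VIC", "Canberra, ACT", "Perth, WA", "Brisbane, QLD"]


def build_search_url(city, client_id, client_secret,
                     query="Middle Eastern", version="20200301", limit=50):
    """Return a venue-search URL for one city (hypothetical helper)."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD
        "near": city,                    # free-text place name
        "query": query,                  # search term
        "limit": limit,                  # number of venues requested
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# One URL per city; each could then be fetched with any HTTP client.
urls = {city: build_search_url(city, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
        for city in CITIES}
```

Building the URLs separately from fetching them keeps the credentials and search parameters in one place, so the same request can be repeated for each of the five cities.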

As our data was in a highly nested JSON format, we turned it into a pandas DataFrame using json_normalize. After normalizing the data, we had it in our required format of “Name”, “Address”, “Lat” and “Lng”.

Normalizing our JSON Data
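The flattening step can be sketched as follows. The sample response is made up and heavily trimmed, and its shape (a list of venues, each with a nested location object) is an assumption based on the v2 venue search format; the venue names and addresses are hypothetical:

```python
import pandas as pd

# Hypothetical, trimmed response in the assumed shape of a venue search result.
sample_response = {
    "response": {
        "venues": [
            {"name": "Example Kebab House",
             "location": {"address": "1 Example St", "lat": -33.87, "lng": 151.21}},
            {"name": "Sample Falafel Bar",
             "location": {"lat": -33.88, "lng": 151.20}},  # address missing
        ]
    }
}

venues = sample_response["response"]["venues"]
flat = pd.json_normalize(venues)  # nested keys become "location.lat", etc.

# Keep only the four fields the analysis needs and give them short names.
columns = {"name": "Name", "location.address": "Address",
           "location.lat": "Lat", "location.lng": "Lng"}
df = flat[list(columns)].rename(columns=columns)
```

Venues without an address simply get a missing value in the “Address” column, so they can still be plotted by latitude and longitude.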

Once we had normalized our data, we used a two-step approach to analyze it. In the first step, we plotted the venues as clusters in our sample cities. In the second step, we calculated the geographical centre of each city and then calculated the mean distance of venues from that centre. The analysis, even though not directly in our scope, also showed us how Middle Eastern restaurants are clustered in each of the five cities, indicating the diversity of the cities.

Results with normal plotting

When plotting our data for the various cities, we first plotted the venues gathered from the Foursquare API as they were, and also calculated the central location of each city. As this first step does not require the distance of venues from the centre, the venues were plotted without it.

Results with mean distance from the mean (center) location of the city:

Now that we have seen the clusters plotted on the map, we calculate the mean (average) distance of venues from the geographical centre of each city. The plot gives us a clearer picture of the true distance a tourist or traveller would have to cover to visit various venues.
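A minimal sketch of this calculation, assuming the centre is the mean latitude and longitude of the venues and that distance is measured as a straight line in degrees (an assumption; a haversine distance in kilometres would be more accurate on the ground). The coordinates below are made up:

```python
import numpy as np

# Hypothetical venue coordinates (latitude, longitude) for one city.
coords = np.array([
    [-37.80, 144.95],
    [-37.82, 144.97],
    [-37.81, 144.99],
    [-37.83, 144.93],
])

center = coords.mean(axis=0)                         # mean latitude, mean longitude
distances = np.linalg.norm(coords - center, axis=1)  # per-venue distance in degrees
mean_distance = distances.mean()                     # average distance from the centre
```

A smaller mean_distance means the venues sit closer to the centre, which is the sense in which one city is “more convenient” than another here.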

Values of Mean Distance (from mean center) of each city

Even though the mean square distance value gave us a clear picture, we recalculated it after excluding the maximum value (the “outlier”) to get a more homogeneous value. For this purpose, we used the NumPy mean function for each city, which gave the following values:

Sydney, NSW: 0.11287612594187721

Canberra, ACT: 0.07047281616415198

Melbourne, VIC: 0.03318223070418223

Perth, WA: 0.04176312230297355

Brisbane, QLD: 0.05986191980294174
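The outlier exclusion described above can be sketched as follows, assuming that “excluding the maximum value” means dropping the single largest distance before averaging. The function name mean_without_max and the example values are illustrative and not taken from the project:

```python
import numpy as np


def mean_without_max(distances):
    """Mean of the distances after dropping the single largest one."""
    d = np.sort(np.asarray(distances, dtype=float))
    if d.size < 2:
        raise ValueError("need at least two distances")
    return float(np.mean(d[:-1]))  # everything except the maximum


example = [0.02, 0.03, 0.04, 0.50]  # made-up values; 0.50 plays the outlier
trimmed = mean_without_max(example)
plain = float(np.mean(example))     # for comparison: mean with the outlier kept
```

Comparing trimmed with plain shows how strongly a single far-away venue can inflate a city's average.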

Discussion and Conclusion:

The findings from our analysis present an elaborate picture. When the venues were plotted without the mean square distance, the map might have suggested to a layperson that both Melbourne and Sydney are good clusters with higher density. However, our mean square distance shows that even though Melbourne is indeed good, Sydney does not present a convenient picture, as its mean square distance is the highest of all the cities. This is because Sydney has a total area of 12,368 km², whereas the other cities have the following areas: Brisbane 15,826 km², Melbourne 9,990 km², Perth 6,418 km², and Canberra 814.2 km² (Wikipedia, 2020). Even though Brisbane is, area-wise, the largest “proper” city in Australia, it has a lower population than Sydney and Melbourne, which also suggests why Brisbane has fewer venues than those two cities.

Hence, our findings suggest that the best location for any tourist to try Middle Eastern food is Melbourne, as they will have to travel less to explore different venues in the city. After Melbourne, the two best options, in terms of less travel, are Perth and Canberra. The reason both Sydney and Brisbane, despite being major metropolitan areas in Australia, have a higher mean square distance is that both are, area-wise, the biggest cities in Australia.

The project started by defining the problem statement, which asked which major cities in Australia were the best for local and foreign tourists to visit Middle Eastern restaurants. Initially, from the plain clusters, it would seem that Melbourne and Sydney were the best locations to visit Middle Eastern venues. However, the mean square distance suggested that Sydney, with a larger area, had a higher mean square distance between the geographic centre of the city and all the venues. Melbourne, in contrast, was the best city in this regard as its mean square distance was the lowest, suggesting that tourists would have to travel less to explore different venues. This project also showed how data science can address even the simplest of problems in a more attractive manner.

Github Link:

The full repository for the project can be found here

References:

ABS. (2020). Australian Demographic Statistics, Sep 2019. Retrieved from https://www.abs.gov.au/ausstats/abs@.nsf/mf/3101.0

Rollins, J. (2015). Why we need a methodology for data science. Big Data and Analytics Hub. Retrieved from https://www.ibmbigdatahub.com/blog/why-we-need-methodology-data-science

Wikipedia. (2020). List of cities in Australia by population. Retrieved from https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population

Sentiment Analysis of Pakistani Twitter: #SindhGovt. and #DGISPR leading with “positive sentiments”

Dr Farooq Yousaf [1]

The confusion surrounding the PTI Government’s handling of the Coronavirus (COVID-19) outbreak is also denting the image of the party, and especially of PM Imran Khan, on social media. The three major stakeholders currently under discussion on social media in relation to the COVID-19 outbreak are the Sindh Government, the PTI Government and, more recently, DG ISPR.

Even though it should not be taken as a major generalizable indicator, a basic sentiment analysis conducted on these three stakeholders suggests that both the Sindh Government, along with its CM Murad Ali Shah, and DG ISPR, after his latest press talk on the coronavirus outbreak, are leading Pakistani Twitter, with users expressing “positive sentiments” towards both of them. Even though sentiments towards PM Imran Khan are also predominantly positive, his ratio/percentage of positive to negative tweets is lower than that of both CM Sindh and DG ISPR.

In the vast field of data science, sentiment analysis is a text-mining technique used to analyze the emotions associated with a text. More specifically, it can refer to a “text mining technique for analyzing the underlying sentiment of a text message, i.e., a tweet. Twitter sentiment or opinion expressed through it may be positive, negative or neutral”. I have used a variety of text analysis libraries to run and re-run sentiment analyses of various datasets that I imported from Twitter using Twitter’s API in Python.

Due to Twitter’s data extraction limitations, one can only extract a certain number of tweets at a given time. Hence, the following sentiment analyses are based on 500-1000 tweets for each of the three stakeholders. This analysis was performed to gauge the general sentiment of Pakistani Twitter in terms of how it was reacting to various major stakeholders in the country. Moreover, this analysis should be treated as a sample to explore wider applications of Machine Learning and Natural Language Processing in the analysis of political narrative and discourse in Pakistan.

The following analyses do not include retweets. The scale for reading the sentiment values can be found at the bottom of this page*.
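As a rough illustration of how such percentages can be derived, the sketch below drops retweets and counts the share of positive tweets. It assumes each tweet already carries a polarity score on the -1 to 1 scale produced by some sentiment library; the "RT @" prefix check, the cut-off at 0 and the sample tweets are all assumptions made for illustration:

```python
def is_retweet(text):
    """Treat tweets that start with 'RT @' as retweets (a common convention)."""
    return text.startswith("RT @")


def label(score):
    """Map a polarity score in [-1, 1] to a label; the 0 cut-off is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up (text, polarity) pairs standing in for scored tweets.
scored = [
    ("Great press talk today", 0.8),
    ("RT @someone: Great press talk today", 0.8),
    ("Confusing policy on the lockdown", -0.4),
    ("Press conference at 5 pm", 0.0),
    ("Well handled so far", 0.5),
]

originals = [(t, s) for t, s in scored if not is_retweet(t)]  # retweets removed
labels = [label(s) for _, s in originals]
positive_share = 100 * labels.count("positive") / len(labels)  # percent positive
```

Whether neutral tweets are kept in the denominator changes the resulting percentage, so that choice should be stated alongside any reported figure.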

(Anyone interested in performing a Sentiment Analysis on their own can read this resource.)

Sindh Government:

The COVID-19 pandemic has somewhat brought the Pakistan Peoples Party back in the country’s mainstream politics. According to his supporters and critics alike, CM Murad Ali Shah of Sindh Province has led from the front to tackle the outbreak in the province. It is no wonder that over 80% of the tweets from our random sample of tweets expressed positive sentiments towards the Sindh CM.

Twitter Sentiment on DG ISPR

On the other hand, the newly appointed DG ISPR Maj Gen Babar Iftikhar has received positive feedback on his ‘apolitical’ press talk on Pakistan Resolution Day and the coronavirus pandemic. The DG argued that even though the ‘geographical’ borders had been closed as a preventive measure, “the actual border was between the man and the coronavirus, which we (Pakistan) have yet to take control of”. The tweets mentioning #DGISPR as their hashtag had an 80% positive sentiment ratio.

Twitter Sentiment on PM Imran Khan and PTI

Finally, PM Imran Khan and his PTI Government had the lowest “positive sentiment” ratio of the three stakeholders under discussion, with 67% positive tweets mentioning either #ImranKhan or #PTIGovernment. Even though the majority of the tweets in the sample had a positive sentiment, the lowest percentage of the three suggests that the PM, due to a lack of clarity in his plan for dealing with the outbreak, might have taken a hit to his popularity on Twitter. This is especially interesting because PTI’s social media team and campaigns are considered to be the most sophisticated in the country among all political and non-political stakeholders. Hence, this is a point of concern for the PM.

Even though this sentiment analysis of Pakistani Twitter presents an interesting picture of the apparent popularity, in terms of positive sentiments, of various political and non-political stakeholders, it should not be treated as the “final word” on the topic, for two reasons: 1) the sample size is limited and consists of 500-1000 tweets for each stakeholder, and 2) the analysis excludes tweets in local and national languages, which might present a different picture altogether. However, the analysis should be treated as an example of how policymakers and politicians can use Machine Learning and Natural Language Processing to gauge the sentiments of social media users in the country.

———————————-

*Scale:

-1: Very Negative Sentiment

 1: Very Positive Sentiment

[1] The author holds a PhD in Politics from the University of Newcastle Australia. He has also previously received his Masters in Public Policy from the University of Erfurt in Germany.

COVID-19 outbreak, terrorism and regional peace – Farooq Yousaf

Within a matter of weeks, the novel coronavirus – which came into the global limelight after spreading in China’s Hubei province and resulting in lockdowns in the country – has created major risks for the global economy and public health.

On December 31, 2019, China alerted the World Health Organization (WHO) to “several flu-like cases” in Wuhan, the capital of Hubei province. Patients were then quarantined and health authorities commenced work on tracing the source of the “flu”. Soon after, on January 1, 2020, the US Centers for Disease Control and Prevention identified a seafood market in Wuhan as the “suspected hub of the outbreak”. Since then, and at the time of writing this piece, the COVID-19 strain of the coronavirus has caused a global pandemic, affecting over 150 countries, infecting nearly 200,000 people and killing nearly 8,000 (more details and live tracking of the coronavirus data can be found here).

The World Health Organization (WHO) chief Tedros Adhanom Ghebreyesus went as far as claiming that “a virus is more powerful in creating political, economic and social upheaval than any terrorist attack”. Hence, pandemics like these also raise concerns over the possible use of a deadly virus as a “bioterrorism” tool by terrorist groups around the world.

In 2015, it was reported that scientists at a top-secret facility in the UK were assessing the “potential use of Ebola as a bioterrorism weapon”. The unit was tasked with evaluating whether terrorist organisations like Al Qaida and the Islamic State could use the virus to attack targets in the West[i]. Moreover, Stephen Strauss argues that much of the research on Ebola was funded due to growing fears that terrorist organisations could turn the virus into a bio-weapon.[ii] In addition, outbreaks have also been used in the past to influence political outcomes. According to The Lancet, ‘on Dec 26, 2018, DR Congo’s Independent National Electoral Commission (CENI) invoked concerns about the Ebola outbreak and terrorism to postpone the elections in three areas in North Kivu (Beni, Beni Ville, and Butembo) until March 2019.’[iii] This move invited strong criticism, as many of the 1.2 million people in these three areas were likely to vote for the opposition leader, Martin Fayulu.

These precedents make the coronavirus epidemic a point of concern for state officials, policymakers and counter-terrorism experts around the world. Opinion pieces have already started showing up in major news outlets, especially in the US, particularly on the lines of mild “conspiracy theories” and orientalist discourses, on how the coronavirus can be used as a major bioterrorism weapon by non-state and ‘hidden’ state actors, especially in the “desert caves of Middle East”.

In this regard, Grady Means, in The Hill, writes:

Regardless of the source of the coronavirus, it is now a roadmap for future bioterrorism. The damage has been quick and enormous — much greater than 9/11 — and worldwide. The responses have been predictable and ineffective. And the cost of a potential weapon such as this is close to zero. It represents the perfect asymmetric warfare strategy, and there should be little doubt these lessons are being studied carefully by military planners in North Korea, Tehran, Moscow, Beijing and desert caves throughout the Middle East [iv].

However, along with raising “bio-terrorism” concerns, the coronavirus outbreak is also impacting (and somewhat hampering) the everyday operations of various terrorist and militant groups around the world. Moreover, due to the urgency and importance of covering the outbreak in the mainstream and alternative media, terrorist attacks around the world are also going unnoticed.

COVID-19, Terrorism and Peace

The Islamic State recently included a full-page infographic on coronavirus prevention in the new issue of its official weekly “al-Naba newsletter”, the Homeland Security reported. The group also asked its followers to “stay away from countries affected by the outbreak”. The group has been constantly “reporting” on the Coronavirus in its newsletter since last month, with one of its February issues noting that even though the epidemic was a punishment from God for China’s rights abuses against the Uighur Muslims, the interconnectedness of this world would facilitate the transfer of epidemics. Therefore, “Muslims should seek help from God Almighty to avoid illness and keep it away from their countries”. These updates from the group suggest that the outbreak has, in some way, affected the Islamic State, at least for the time being.

US, Taliban and Peace in Afghanistan

Afghanistan has remained a point of concern for many: not only is the country battling through a major presidential crisis between Ashraf Ghani and Abdullah Abdullah, but a possible spread of coronavirus is also threatening to endanger the recently signed ceasefire agreement between the United States and the Taliban. Locals also fear that a lack of supplies and facilities could be detrimental to the Afghan population, especially if the number of cases rises exponentially in the country.

“A doctor or a nurse may be able to buy some hand sanitiser and gloves for their homes, but we have hospitals in Kabul that don’t have clean water for doctors to wash their hands,” Najmusama Shefajo, an obstetrician-gynaecologist, told Al Jazeera news. She also said that if doctors lack the supplies to guarantee their own hygiene, it will then be hard for the patients to trust the doctors.

Therefore, political instability amid growing coronavirus concerns has caught the attention of foreign and senior officials overseeing security matters in Afghanistan.

Nick Kay, a senior representative of NATO, in a video message on Saturday called on the Afghan leaders to find an amicable solution to the political instability in the country. “As the coronavirus sweeps the world causing public health crisis and potential economic crisis…it is strange that the political leadership cannot find a way to resolve their differences and unite the country both in the interests of public health but also peace,” Kay said.

Ironically, the Taliban have also raised concerns about the spread of coronavirus in government prisons. “About 40,000 people are living in prisons run by the Kabul administration where there are no hygiene or healthcare facilities, making it a serious threat. This virus can spread very easily in such conditions. If something goes wrong, it will be the responsibility of the Kabul government,” Taliban spokesperson Zabihullah Mujahid said on Sunday.

On the other hand, US expert and scholar Dr Barnett Rubin succinctly summarises the threat coronavirus poses to peace and stability in the region. In this regard, he writes:

The pandemic seems likely to spread quickly from both Iran and Pakistan into Afghanistan. All of the known cases are related to Iran—with which Afghanistan has a 572-mile largely unmonitored border, and where more than 10,000 coronavirus cases have been recorded and over 300 have died, making it the fourth-most-affected country after China, South Korea, and Italy  …… Afghanistan also has a 1,510 mile long border with Pakistan, which is at risk because of the weakness of its public health system, its 596-mile-long border with Iran, and the China-Pakistan Economic Corridor, which has brought tens of thousands of Chinese citizens into Pakistan…. The pandemic makes it even more important to end the war. The virus makes no political, national, religious, or sectarian distinctions. [v].

Meanwhile, U.S. peace envoy Zalmay Khalilzad, who was primarily responsible for giving the final touches to the Afghan peace plan, has repeatedly met both presidential claimants, Ghani and Abdullah, but there is no end in sight to the political turmoil. Moreover, even after the signing of the deal with the US, violence has continued in the country: a recent “insider attack” killed seven Afghan security officials.

In these testing times, and with a near-global lockdown, regional actors like China, Russia, Qatar, Saudi Arabia and Pakistan can do little in terms of mediating between the Afghan government, the US and the Taliban. Hence, Barnett Rubin asks a pertinent question:

Will Qatar still welcome a delegation from Afghanistan, plus hundreds of journalists, under these circumstances?

Probably not!

The latest data (see graph below) on coronavirus infections from Johns Hopkins University’s Center for Systems Science and Engineering indicates two positive trends: first, the “infections curve” (orange) in China is flattening, and second, the patient recovery rates (green) are getting higher.

Source: CSSE

However, in terms of its impact on terrorism and terrorist activities, even though little can be said with certainty about whether the coronavirus outbreak will hamper terrorist attacks by groups like the Islamic State and the Taliban, the concerns expressed by both groups over the pandemic suggest that it may affect them in “some” way, probably in terms of movement and operations. The Taliban, even though seemingly concerned about the pandemic, have carried on with their attacks on the Afghan security forces. The Islamic State, on the other hand, is also fighting for its survival and relevance in Afghanistan. Only time will tell whether the COVID-19 outbreak will negatively affect the Afghan peace deal and the frequency of terrorist attacks around the world. However, this also leaves a gap for future research projects on global pandemics and their impact on the frequency of terrorist attacks.

Notes:


[i] The Guardian (2015), URL: https://www.theguardian.com/uk-news/2015/feb/21/top-secret-ebola-biological-weapon-terror-warning-al-qaida-isis

[ii] Strauss, Stephen. “Ebola research fueled by bioterrorism threat.” Canadian Medical Association. Journal 186.16 (2014): 1206.

[iii] The Lancet, URL: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30002-9/fulltext

[iv] Means, Grady, The Hill, URL: https://thehill.com/opinion/national-security/485921-the-coronavirus-blueprint-for-bioterrorism

[v] Rubin, Barnett, The Coronavirus Risk to the Afghan Peace Process, CIC-NYU, URL: https://cic.nyu.edu/publications/coronavirus-risk-afghan-peace-process