Location and Language
There are times when understanding the mood or the thoughts of a particular region of the world is of primary importance. For example, if we are interested in understanding the social opinions or concerns of youths in India, monitoring data from the United States isn’t all that practical. To be complete, we acknowledge that US-based traffic may contain some spillover discussion about conditions in India; however, the likelihood of finding any significant content there is probably not worth the effort of searching for it in a vast sea of other (unrelated) data. Obviously, this is a decision that each data scientist or organization needs to make; our intent is simply to point out where there may be value in looking only at a particular region of the world.
As an example, consider the diagram shown in Figure 3.3; it shows social media mentions for a particular bank for which we were performing an analysis. The bank had recently made some announcements and wanted to see whether social media traffic had increased or decreased, perhaps as a result of the media attention. Figure 3.3 summarizes the top 10 languages across all of the media mentions we were able to collect over the previous two days.
Figure 3.3 Top 10 languages used in mentions.
What we saw was a large amount of traffic coming not from English-speaking (US) individuals, but from Turkish social media participants. Not only that, but the Portuguese and Spanish numbers appeared to be almost as high. This was all the more interesting because the announcements had been made in the United States.
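A summary like the one in Figure 3.3 is essentially a frequency count over the language tag of each collected mention. The following minimal Python sketch illustrates the idea. It assumes that each mention is a dictionary with a hypothetical "lang" field holding a language code, and the sample records are toy data for illustration, not the data behind the figure.

```python
from collections import Counter

# Toy mentions for illustration only. The "lang" field name is an assumption
# about how each collected record stores its language code.
mentions = [
    {"text": "...", "lang": "tr"},
    {"text": "...", "lang": "tr"},
    {"text": "...", "lang": "tr"},
    {"text": "...", "lang": "es"},
    {"text": "...", "lang": "es"},
    {"text": "...", "lang": "pt"},
    {"text": "...", "lang": "en"},
    {"text": "..."},  # no language tag available
]


def top_languages(records, n=10):
    """Return the n most frequent language codes as (code, count) pairs.

    Records without a language tag are counted under "und" (undetermined).
    """
    counts = Counter(r.get("lang", "und") for r in records)
    return counts.most_common(n)


# Print each language with its count and its share of all mentions.
for code, count in top_languages(mentions):
    share = 100 * count / len(mentions)
    print(f"{code}: {count} ({share:.1f}%)")
```

Counting the untagged records under a separate label, rather than silently dropping them, makes it visible how much of the collected traffic could not be attributed to any language.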
One of the obvious facts to gather would be the location of the individuals making the comments. In some cases, this information is easy to retrieve, for example through the GPS technology on mobile devices. In the case of Twitter, geolocation allows someone to find tweets that have been sent from a specific location, whether a country, a city, or multiple regions around the world. When Twitter users opt in to location-based services on their accounts, Twitter uses geotagging to categorize each tweet by location and makes that information available to subscribers of the data. In theory, this gives users of that data the ability to track tweets sent from a specific city or country. Unfortunately, the statistics on the use of this feature aren’t promising (yet): only about 10% of the total user population has enabled it.⁵
Lacking exact geolocation, we could assume that those posting in Turkish, for example, were sending their tweets from Turkey. It may not be a perfect one-to-one match, but in the absence of any other information, it was the best we could do.
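The geotag-first, language-second approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in the original analysis: each mention is assumed to be a dictionary with a hypothetical "country" field (present only when the user opted in to geotagging) and a "lang" field, and LANG_TO_REGION is a hypothetical mapping that encodes the imperfect language-to-location assumption.

```python
from collections import Counter

# Hypothetical language-to-region mapping. This is a heuristic, not a
# one-to-one relationship: Spanish and Portuguese are spoken in many
# countries, so they map to broad regions rather than to one country.
LANG_TO_REGION = {
    "tr": "Turkey",
    "es": "Spanish-speaking countries",
    "pt": "Portuguese-speaking countries",
}


def infer_location(record):
    """Return (location, source) for one mention.

    Prefers an explicit geotag (the hypothetical "country" field) when the
    user opted in; otherwise falls back to a guess based on the language code.
    """
    country = record.get("country")
    if country:
        return country, "geotag"
    region = LANG_TO_REGION.get(record.get("lang"))
    if region:
        return region, "language"
    return None, "unknown"


def location_summary(records):
    """Count mentions per inferred location; unresolved ones become "unknown"."""
    return Counter(infer_location(r)[0] or "unknown" for r in records)


sample = [
    {"lang": "tr", "country": "Germany"},  # geotag wins over language
    {"lang": "tr"},                        # falls back to language
    {"lang": "es"},
    {"lang": "ja"},                        # no mapping, so unknown
]
print(location_summary(sample))
```

Returning the source alongside the location lets an analyst report how many locations were actually observed through geotags versus inferred from language, which matters when the inference is as rough as this one.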
In this case, the bank in question had made an announcement (in the US press) about some branch closings in Europe. From the backlash we were able to mine from social media sources, it appeared that the most affected customers were located in Spanish-speaking countries as well as Turkey. While we don’t know exactly how the bank handled this situation (our job was simply to discover any potential issues), we do know that it immediately focused customer relations on branches and banking in those regions in an effort to minimize any fallout from its announcements.