In a market economy, companies constantly measure themselves against their competitors to make sure they stay ahead and stay in business. One way they do this is by looking at what consumers are saying about them.
Last year, I was fortunate enough to participate in one of these competitor audits during my time at Wunderman DC. Some of the things we looked at were how many users were talking about the brand (i.e., how many posts mentioned our client's name versus those of the competitors), which tweets had the highest user engagement (i.e., were retweeted or liked the most), and what the sentiments surrounding the topic were, using reports from Synthesio. While this process is very time consuming, aggregating these granular statistics and exploring the findings can be key to serving consumers more effectively. Because this is an important analysis that is repeated periodically, I wondered whether the process could be automated, reducing the time spent on data collection and freeing up valuable time for the analysis itself.
Recently, I did just that! Using Python and a few Python libraries (Tweepy, WordCloud, and TextBlob), I created a program that prints a PDF report for any user-defined keyword(s). A loop runs the program for as many keywords as the user specifies, and the user can also choose how many tweets to pull per keyword. The right number will differ by industry and brand popularity: running the program with 5,000 tweets for Nike returns a report covering the last couple of hours, while running it with 5,000 tweets for Danone covers the past couple of days.
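To give a sense of the moving parts, here is a minimal sketch of the core pipeline, assuming Tweepy 3.x (where the standard search endpoint is exposed as `API.search`). The credentials, keyword list, and report layout are placeholders of mine, not the exact structure of my program:

```python
import tweepy
from textblob import TextBlob
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Placeholder credentials -- get your own at developer.twitter.com
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

def build_report(keyword, max_tweets):
    """Collect tweets for one keyword and write a one-page PDF report."""
    tweets = [status for status in
              tweepy.Cursor(api.search, q=keyword, lang="en",
                            tweet_mode="extended").items(max_tweets)]
    texts = [t.full_text for t in tweets]

    # TextBlob polarity ranges from -1 (negative) to +1 (positive)
    polarities = [TextBlob(text).sentiment.polarity for text in texts]

    # Word cloud built from all collected tweet text
    cloud = WordCloud(stopwords=STOPWORDS, background_color="white")
    cloud.generate(" ".join(texts))

    with PdfPages(f"{keyword}_report.pdf") as pdf:
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 10))
        ax1.hist(polarities, bins=20)
        ax1.set_title(f"Sentiment polarity for '{keyword}'")
        ax2.imshow(cloud, interpolation="bilinear")
        ax2.axis("off")
        pdf.savefig(fig)
        plt.close(fig)

# Loop over as many keywords as the user specifies
for kw in ["Nike", "Danone"]:
    build_report(kw, max_tweets=500)
```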
Another reason I allowed this flexibility is that I see two uses for the program. First, as mentioned above, the program can automate the tedious task of social listening for competitive audits. Because of the Twitter API rate limit, a marketer can only make 180 calls in 15 minutes. So that the program can collect all of the tweets for the past six months without hitting that limit, I added a five-second time buffer: whenever the program completes one call, it waits five seconds before making the next one. The logic is simple: fifteen minutes is 900 seconds, and 900 seconds divided by 180 calls works out to 5 seconds per call. Therefore, the program can't make more than 180 calls in 15 minutes even if it wanted to.
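As a sketch of that buffer (again assuming Tweepy 3.x; the function name is mine), each page returned by the cursor corresponds to one API call, so sleeping five seconds per page keeps the program under the limit:

```python
import time
import tweepy

def collect_with_buffer(api, keyword, max_tweets):
    """Page through search results, pausing 5 seconds between requests.

    The standard search endpoint allows 180 calls per 15-minute window;
    900 seconds / 180 calls = 5 seconds per call, so sleeping 5 seconds
    after every page keeps the program safely under the limit.
    """
    tweets = []
    # Each page from the cursor is one API call (up to 100 tweets)
    for page in tweepy.Cursor(api.search, q=keyword, count=100).pages():
        tweets.extend(page)
        if len(tweets) >= max_tweets:
            break
        time.sleep(5)  # buffer before the next call
    return tweets[:max_tweets]
```

Tweepy can also handle this for you: passing `wait_on_rate_limit=True` to `tweepy.API` makes the client sleep until the rate-limit window resets instead of pacing every call.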
However, running this code over a six-month period could take a long time, depending on the popularity of the brand. Therefore, I wanted to give the user the flexibility to decide how many tweets to source, in case they only want a daily snapshot. This gives the program a second function: a daily brand health check. Depending on the average chatter associated with the brand, the marketer can set the maximum-tweet parameter to return roughly a day's worth of tweets. Because the program shows both the popularity of the brand and the engagement of consumers, it can give early warning of changes in the landscape. If the metrics dip from one day to the next, the marketer might want to see what caused the change; if they skyrocket, it's worth finding the cause in order to capitalize on the behavior and improve the marketing efforts.
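A day-over-day check like that could be as simple as the following sketch, where the metrics file, the 50% alert threshold, and the helper name are all hypothetical choices of mine rather than part of the original program:

```python
import json
from datetime import date

def brand_health_snapshot(tweets, history_path="brand_health.json"):
    """Compare today's volume and engagement to yesterday's snapshot."""
    today = {
        "volume": len(tweets),
        "engagement": sum(t.retweet_count + t.favorite_count for t in tweets),
    }
    try:
        with open(history_path) as f:
            history = json.load(f)
    except FileNotFoundError:
        history = {}

    yesterday = history.get("latest")
    if yesterday:
        for metric, value in today.items():
            prev = yesterday[metric]
            if prev and abs(value - prev) / prev > 0.5:  # >50% swing
                print(f"ALERT: {metric} moved from {prev} to {value}")

    # Record today's numbers for tomorrow's comparison
    history["latest"] = today
    history[str(date.today())] = today
    with open(history_path, "w") as f:
        json.dump(history, f)
```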
If you are interested in seeing the code or running the program, you can download the Python code here.
Currently, I am working on converting the plots from matplotlib to Plotly and using Dash to turn the report into an interactive HTML dashboard (stored locally); a rough sketch of what I have in mind follows below. Do you have any suggestions for future updates?
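For anyone curious what that might look like, here is a minimal Dash sketch (Dash 2.x imports; the DataFrame holds placeholder numbers standing in for the metrics the program would collect):

```python
import dash
from dash import dcc, html
import plotly.express as px
import pandas as pd

# Placeholder metrics standing in for the program's real output
df = pd.DataFrame({
    "date": ["2019-06-01", "2019-06-02", "2019-06-03"],
    "volume": [4800, 5100, 3900],
    "engagement": [12000, 15400, 8700],
})

app = dash.Dash(__name__)
app.layout = html.Div([
    html.H1("Brand health dashboard"),
    dcc.Graph(figure=px.line(df, x="date", y="volume",
                             title="Tweet volume")),
    dcc.Graph(figure=px.line(df, x="date", y="engagement",
                             title="Engagement")),
])

if __name__ == "__main__":
    app.run_server(debug=True)  # serves the dashboard on localhost
```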