I recently stumbled upon an article published on Shekhar Gupta’s portal, The Print. Titled “Retweeting Modi increases BJP MPs’ chances of getting a Lok Sabha ticket: Study,” the article was written by Barbara De Alfaro, a student of political science at UC Berkeley. Needless to say, the unlettered folks (in the sciences, at least) who occupy the bastions of our media space went about bandying it around as statistical evidence against Modi, touting Berkeley’s reputation to lend it credibility.
How many BJP MPs who routinely retweeted Modi got tickets? A Berkeley student investigates in this delightful piece. https://t.co/NB9G0DY6bd
— rama lakshmi (@RamaNewDelhi) April 22, 2019
Brlliant finding by a Berkeley researcher….
Retweeting Narendra Modi increases BJP MPs’ chances of getting a Lok Sabha ticket again…
Political science research student Barbara de Alfaro writes for ThePrint https://t.co/lUgraj7yqn
— Shekhar Gupta (@ShekharGupta) April 23, 2019
As a student of the mathematical sciences currently working in the field of statistics & machine learning, I can assert, without a speck of doubt, that this article is riddled with the most fundamental of statistical fallacies. I’ve taught classes (as a student instructor) in the Department of Statistics at one of the most prestigious universities in the world (name withheld, lest the left cabal let the dogs out to hound me). Now that my pedigree has been established (lest someone call me unqualified), let me commence with a point-by-point post-mortem of this spurious argument:
1. “I analyse the rate at which a BJP MP retweeted the leadership in February 2018. The retweet rates of February 2018 are a good benchmark for two reasons. First, it is more than a year before the general elections. We are, therefore, not observing retweets being conveniently used to make a good impression right before the elections”, the article says.
There are about 60 months between two general elections. Choosing one month out of them comes down to a tiny sample of about 1.67% (1 in 60). And why the arbitrary choice of February 2018? Why not March 2016 or November 2017 or Moronember 2015? They could have taken a uniform random sample (or used any other random sampling strategy) of the months from May 2014 to February 2018 (to avoid the year before elections, if that is what they wished). That obviously didn’t happen: February 2018 was chosen for no good reason. Besides, has the dataset even been made public for others to analyze, as is the norm in academia?
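A minimal sketch of what such a uniform random draw of months could look like. The helper `months_between`, the seed, and the sample size of 6 are my own illustrative assumptions, not anything taken from the study:

```python
import random
from datetime import date

def months_between(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

# Candidate window suggested above: May 2014 to February 2018.
candidate_months = months_between(date(2014, 5, 1), date(2018, 2, 1))

# Fixed seed (an arbitrary choice) so that the draw is reproducible.
rng = random.Random(42)

# Draw 6 months uniformly at random, without replacement.
# The sample size of 6 is an assumption made purely for illustration.
sampled_months = sorted(rng.sample(candidate_months, k=6))

for m in sampled_months:
    print(m.strftime("%B %Y"))
```

Publishing the seed along with the data would let anyone reproduce exactly which months were analysed.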
2. “How much a BJP MP retweeted key members of the BJP is calculated as a percentage of their total number of tweets during the period for which data was collected. For example, if an MP tweeted 2000 times and retweeted party leadership 1000 times, the retweet rate for that MP is 50 per cent.” This is a dastardly assault on my technical sensibilities. This study is rotten at its core, with a metric that is flawed by design. Let me explain: consider the situation mentioned above where the “key members” of the BJP tweeted a total of 1000 times. Assume BJP MP X tweeted 1,800 times, including every single one of the 1000 tweets sent out by the “key members”: her retweet rate would be about 55.56%. Now, assume MP Y tweeted 1,500 times, again including every single one of the 1000 tweets sent out by the “key members”: her “retweet rate” would be 66.67%. Now, assume a third MP Z tweeted exactly 10 times, out of which 8 tweets were from the “key members”: her retweet rate would be 80%. By the metric suggested by the study, Z would be considered most “loyal” because she has the highest retweet rate, despite having retweeted less than 1% (8 out of 1000) of the tweets from the “key members!” This is an even more ridiculous proposition because X & Y both retweeted every single tweet from the “key members,” clearly contributing more to advancing the image of the party & these members. X & Y are, in fact, being penalized by the “study” for being more active on Twitter, because the retweet rate decreases with an increase in the total number of tweets! Why is the “retweet rate” even dependent on the total number of tweets made by the MP? What does “loyalty” have to do with the other things an MP tweets about?
Furthermore, assuming X & Y are not working against their own party, a large portion of the total 1800 & 1500 tweets would likely be working towards advancing the interests of the party: hence one could say they’re likelier to be more loyal to Modi/BJP than Z is, given that they’re sharing more pro-BJP/pro-Modi content with their followers. Choosing a deeply flawed metric — this is what amateur statistical analysis looks like, for the uninitiated.
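The X, Y and Z example can be worked through in a few lines. This is only a sketch: the `retweet_rate` function follows the definition quoted from the article, while `coverage` (the share of the leadership's tweets that an MP retweeted) is a name and a metric I am introducing for comparison, not something the study defines.

```python
# Total tweets sent out by the "key members" in the hypothetical example.
LEADERSHIP_TWEETS = 1000

# The three hypothetical MPs from the example above.
mps = {
    "X": {"total_tweets": 1800, "leadership_retweets": 1000},
    "Y": {"total_tweets": 1500, "leadership_retweets": 1000},
    "Z": {"total_tweets": 10, "leadership_retweets": 8},
}

def retweet_rate(mp):
    """The article's metric: leadership retweets as a % of the MP's own tweets."""
    return 100 * mp["leadership_retweets"] / mp["total_tweets"]

def coverage(mp, leadership_tweets=LEADERSHIP_TWEETS):
    """Alternative (my assumption): % of the leadership's tweets the MP retweeted."""
    return 100 * mp["leadership_retweets"] / leadership_tweets

for name, mp in mps.items():
    print(f"{name}: retweet rate {retweet_rate(mp):.2f}%, coverage {coverage(mp):.2f}%")

# Rank the MPs under each metric, highest first.
by_rate = sorted(mps, key=lambda n: retweet_rate(mps[n]), reverse=True)
by_coverage = sorted(mps, key=lambda n: coverage(mps[n]), reverse=True)
print("Most 'loyal' by retweet rate:", by_rate[0])
print("Least 'loyal' by coverage:", by_coverage[-1])
```

Z tops the ranking by retweet rate and comes last by coverage, which is exactly the inversion described above.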
3. “The key members of the BJP identified for this analysis were: Prime Minister Narendra Modi, the official BJP Twitter account, BJP president Amit Shah, and cabinet members Arun Jaitley, Sushma Swaraj, Rajnath Singh, and Uma Bharti.” — Again, arbitrary choices. A lot of leaders could happen to tweet content from other handles, the most prominent being that of PMO India. Why was that given a skip? Why were the accounts of BJP bigwigs like Yogi Adityanath, Smriti Irani, Piyush Goyal, Nirmala Sitharaman etc. given a skip? This is the most obvious limitation of the dataset used for this drivel that passes for “research.” There is no sound strategy used to identify who the “key members” are — they could, at the very least, have chosen the top-N percentile of most popular BJP-affiliated accounts (by Twitter follower count) as the “key members”. But who cares about generating datasets correctly when you can instead cherry-pick data i.e. the match-fixing equivalent of statistics?
4. “Of the top 20 per cent of MPs who retweeted Modi, 69 per cent were renominated to contest for the Lok Sabha. In contrast, of the bottom 20 per cent of people who retweeted Modi the least, only 46 per cent were renominated. This gives us a difference in renomination rates of 23 points.” Yet another arbitrary choice. Why 20%? Was that also cherry-picked to arrive at the desired conclusion? In that case, the author of the “study” might want to learn what confirmation bias is. Besides, across the two cohorts, the author takes into account only 40% of the data, discarding the middle 60%. Again, novice mistakes in the study of statistics.
5. “But if there is such a high correlation between Twitter behaviour and political outcomes, it is worth investigating more closely the relationship between social media presence and BJP mobility” — The article talks about “high correlation” but does not even compute a correlation coefficient to quantify correlation! In what world can a “study” quantify correlation without a single metric for the same?
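For reference, here is a rough sketch of how a correlation coefficient could be computed between retweet rate and renomination. The data below is made up purely for illustration (the study's dataset is not available to me), and `pearson_r` is my own helper. With a 0/1 outcome such as renomination, Pearson's r is the same quantity as the point-biserial correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# HYPOTHETICAL data, invented for illustration only; NOT the study's data.
retweet_rates = [5.0, 12.0, 20.0, 33.0, 41.0, 58.0, 64.0, 80.0]  # per cent
renominated = [0, 1, 0, 1, 0, 1, 1, 1]  # 1 = got a ticket, 0 = did not

r = pearson_r(retweet_rates, renominated)
print(f"r = {r:.3f}")
```

Only a number like this, computed on all the MPs rather than two hand-picked slices, would justify talking about a “high correlation.”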
6. And now, onto the cardinal sin of this “study”: implying causation i.e. implying that retweeting Modi leads to an increase in an MP’s chance of getting a ticket. This is in direct contradiction with the token, blanket statement of denial that “[…] the argument presented here is not causal.” Firstly, the study itself fails to prove correlation, as shown above. Even if they did prove correlation, correlation does not imply causation! This is a core tenet in the field of statistics, one that every student learns in an introductory stats class. Here are the causal statements in this sham of a piece, starting with the outlandish headline itself: “Retweeting Modi increases BJP MPs’ chances of getting a Lok Sabha ticket.” These casual statements of causal nature continue through the piece: “Being in the top retweeters of the cabinet members all combined boosted renomination rates by 30 points. […] Members of Parliament may no longer need to be well connected to make sure they can get a ticket for the next election cycle. They may no longer need to be able to attract large crowds or even have overwhelming support in their respective constituencies. Perhaps, they just need to retweet Modi.” This is what happens when politics meets science. You get neither sound science nor proper political analysis, just a shoddy concoction of the two.
Another fact that the ‘study’ fails to account for is that the BJP has stitched multiple alliances since February 2018, the time frame of the alleged ‘study’. The BJP has stitched an alliance with the Shiv Sena which means that the party will have to concede at least some seats to its alliance partner which it had won the last time around. Similarly, the BJP has an alliance with the JD(U) in Bihar, another alliance that did not exist in 2014 and carries similar implications. The ‘study’ fails to take into account these factors.
More importantly, social settings are multivariate and therefore, a particular situation is affected by a multitude of factors and not just one. In this particular ‘study’, the person has basically attempted to plot a simple graph between the number of retweets and renomination without accounting for various other important factors which affect the situation.
The author says further, “This analysis is limited by the fact that data was only available for 131 of 266 sitting BJP MPs. It is also based on renominations as they have been announced so far, but new lists of candidates are still being released.” Therefore, not only has the author ignored the effects of other variables, but the far-reaching conclusions she has drawn are based on incomplete data.
While most postulates of my argument indicate a lack of statistical rigour in the “study,” I am almost willing to attribute them to pure malice, in defiance of Hanlon’s Razor. I could further probe this excuse for a study and find holes gaping enough for me to walk through, but that would be an exercise in futility, because cooking data & making outlandish conjectures masquerading as “research” is the name of the game for Lutyens’ media. Not very surprising though, given that Mr Gupta has a splendid track record behind him.
I’m not sure if this short piece would even reach the hallowed corridors of The Print’s office. Perhaps, all I need to do instead is pen down some “research” that says “Creating bogus studies increases chances of getting published by The Print.” However, one thing the ‘study’ and the applause it has received from liberal circles surely proves is that liberals are basically illiterate when it comes to statistics.