In a statistical study that I did with a couple of my 4.0 friends [1], we set out to find whether the data that the Election Commission collects from each candidate has any predictive power over their chances of winning. Our stated objective was:
The election commission of India collects various data on candidates of Lok Sabha elections. This data includes such variables as age, assets, liabilities, number and nature of criminal cases registered against the candidate, educational status etc.
The authors were interested in studying the effects (or the absence thereof) of these variables on the outcome of the election. [2]
There were two parts to the analysis. The first analysed actual data from the 2004 elections. The second analysed data collected through a sample survey of IIMB students [3]. Not very surprisingly, the preferences of the two groups were markedly different. Here are some examples from the report.
It is interesting to note that the IT sector people prefer candidate with medium wealth. Candidates with low wealth and candidates with high wealth are not preferred by the IT sector employees.
Preference of test constituency people is exactly opposite; they prefer candidates with either very low or very high wealth. Candidates with medium wealth are not preferred. [4]
The following observations were along similar lines.
It is interesting to note that the IT sector people prefer candidate with no cases while the people of test constituency prefer candidate with cases. [5]
IT sector people prefer national party candidate while the people of test constituency prefer candidate regional party candidate. [6]
It is interesting to note that education is the most important attribute for the IT sector people while wealth is the most important attribute for the test constituency people. [7]
After a lot of data collection and analysis, we ended up with this conclusion:
The study found that data collected by election commission does not have statistically robust power for predicting or discriminating between winners and losers in an election. Discounting this, however, we find that urban elite as well as general voting public in some constituencies prefer national parties rather than regional parties or independents. Independents, in general have least preference. This observation is vindicated in the 2009 results as well. The study also finds, again not very robustly, that preference structures for various attributes such as number of cases, assets and education are markedly different between the urban elite and the general population – perhaps not surprising.
The main learning w.r.t., multivariate data analysis techniques is that the quality of data and the selection of variables determine the utility of the techniques to a very large extent. In this case, the objective of the study determined the choice of variables. This, in turn, lead to unfruitful results from various techniques. [8]
If you are interested in statistical techniques and Indian elections, you may find the report interesting.
1. In IIMB, grade points are awarded for each course out of a max of 4.0. These classmates of mine, Gaurav Pathak and Amit Purohit, could find it with ease!
2. Page 4 of the report
3. Who were mostly PGSEM students. Hence the report refers to them as “IT sector people”
4. Page 28 of the report
5. Page 29 of the report
6. Page 29 of the report
7. Page 30 of the report
8. Page 35 of the report