Regional balance during disruptive times
by Mahesh Vyas
National and regional lockdowns and other restrictions on mobility and human interactions posed a serious challenge to the execution of the Consumer Pyramids Household Survey (CPHS). We overcame those challenges and continued to conduct the survey even during the lockdowns. We switched from face-to-face interviews to telephonic interviews and escalated the interviews to supervisors to ensure that there was no loss in quality. One potential casualty of all this was the balance between the composition of responding households and the composition of the sample. This note describes our experience in this respect.
The CPHS sample is skewed in favour of urban regions: the urban sample is disproportionately larger than the rural sample. This is by design, to account for the greater variance in most characteristics across towns, even within a state. Larger towns within a state are significantly different from smaller towns, whereas villages are relatively homogeneous, particularly within a state. To capture the greater variance across towns, CPHS stratifies urban regions more finely and therefore selects a larger urban sample.
Such a skewed sample is often criticised, implicitly, on the grounds that estimates from it would be biased in favour of urban regions. But this is not true, because CPHS unit-level data is always delivered with appropriate weights. If a region is over-sampled, its observations are assigned correspondingly lower weights. Using these weights ensures that the estimates are not biased in favour of any region.
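To see why lower weights for an over-sampled region remove the urban bias, the minimal Python sketch below uses two hypothetical strata. The population counts, sample sizes and stratum means are invented for illustration and are not CPHS data. The weight is assumed to be the simple inverse of the sampling fraction (households in the stratum divided by households sampled), which may differ from how CPHS weights are actually constructed.

```python
# Hypothetical strata: these numbers are made up for illustration, not CPHS data.
population = {"urban": 30_000, "rural": 70_000}   # households in the population
sample = {"urban": 600, "rural": 400}              # urban is deliberately over-sampled
stratum_mean = {"urban": 20.0, "rural": 10.0}     # mean of some household characteristic

# Assumed design weight: how many population households each sampled household represents.
weights = {s: population[s] / sample[s] for s in population}


def unweighted_mean():
    """Simple mean over all sampled households, ignoring the design."""
    total = sum(sample[s] * stratum_mean[s] for s in sample)
    return total / sum(sample.values())


def weighted_mean():
    """Mean over sampled households, each scaled by its stratum's design weight."""
    num = sum(weights[s] * sample[s] * stratum_mean[s] for s in sample)
    den = sum(weights[s] * sample[s] for s in sample)
    return num / den


print(weights)
print(unweighted_mean())
print(weighted_mean())
```

Because urban households are over-sampled (600 of 30,000 against 400 of 70,000), each urban household receives a weight of 50 against 175 for a rural household. The unweighted mean (16.0) is pulled towards the urban value, while the weighted mean (13.0) recovers the population mean.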
While the CPHS sample is a panel, it has undergone several changes since 2014. The biggest change came in 2017. In Wave 12 of September-December 2017, the number of villages covered shot up by nearly 30 per cent, from 2,964 to 3,838. Rural households in the sample increased from 48,526 to 61,404. As a result, the urban:rural ratio of sample households dropped from 2.3 to 1.7.
Wave 12 therefore marks a break in the urban:rural balance of the CPHS sample. Till Wave 11, the ratio averaged 2.38, although it was on a gradually declining trend. From Wave 12, the ratio has been steadier, at around its average of 1.75. The ratio moved up to 1.79 in the latest wave, Wave 22, when we expanded the sample in select towns in Karnataka, Telangana and Andhra Pradesh.
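A quick arithmetic check of the Wave 12 expansion described above, in Python, using only the village and rural-household figures quoted in the text:

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Figures for the Wave 12 expansion, as quoted in the text.
villages = pct_change(2_964, 3_838)
rural_households = pct_change(48_526, 61_404)

print(f"Villages: +{villages:.1f}%")                  # Villages: +29.5%
print(f"Rural households: +{rural_households:.1f}%")  # Rural households: +26.5%
```

The 29.5 per cent rise in villages matches the "nearly 30 per cent" cited, while the number of rural sample households rose by about 26.5 per cent.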
Responses broadly reflected the sample. Till Wave 11, while the average urban:rural ratio of sample households was 2.38, the average urban:rural ratio of responding households was 2.29. After the expansion of the rural sample in Wave 12, the gap between the urban:rural balance of the sample and that of the responses widened. From Wave 12, while the average ratio of the sample was 1.75, the average ratio of responses was higher, at 1.93.
The proportion of urban responses has increased disproportionately during the pandemic-induced lockdowns, starting with Wave 19 of January-April 2020. As a result, the urban:rural ratio of responses since Wave 19 has been a shade over 2, while the urban:rural ratio of the sample was 1.76. This rising skew in responses was arrested in Wave 22, but the ratio is still high at 2.13 and needs to be brought closer to the sample ratio of 1.79.
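Dividing the response ratio by the sample ratio gives a useful reading of these numbers. It equals the urban response rate divided by the rural response rate, since (urban responses / rural responses) / (urban sample / rural sample) = (urban responses / urban sample) / (rural responses / rural sample). The sketch below applies this identity to the ratios quoted above. Applied to period averages rather than to wave-by-wave figures, it is only an approximation.

```python
def relative_response_rate(sample_ratio, response_ratio):
    """Urban response rate divided by rural response rate.

    response_ratio / sample_ratio
      = (urban_resp / rural_resp) / (urban_sample / rural_sample)
      = (urban_resp / urban_sample) / (rural_resp / rural_sample)
    """
    return response_ratio / sample_ratio


# (sample urban:rural ratio, response urban:rural ratio), as quoted in the text.
periods = {
    "Waves 1-11 (average)": (2.38, 2.29),
    "Wave 12 onwards (average)": (1.75, 1.93),
    "Wave 22": (1.79, 2.13),
}

for label, (s, r) in periods.items():
    print(f"{label}: {relative_response_rate(s, r):.2f}")
```

On these figures, urban households responded at roughly 0.96 times the rural rate till Wave 11, about 1.10 times on average from Wave 12 onwards, and about 1.19 times in Wave 22.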
Deploying weights together with non-response factors ensures that population estimates are computed correctly, even though the difference between the sample and response ratios has widened somewhat recently. CPdx provides both weights and non-response factors. Besides, the large sample size of CPHS provides a cushion against the skew adversely affecting research.
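The note does not spell out how CPdx combines weights and non-response factors. A common survey convention, assumed here purely for illustration, multiplies a design weight by a stratum-level non-response factor equal to households sampled divided by households that responded. The minimal sketch below uses hypothetical numbers (not CPHS data) to show why the factor matters when urban households respond at a higher rate than rural ones.

```python
import math

# Hypothetical numbers for illustration only; not CPHS data.
population = {"urban": 30_000, "rural": 70_000}
sampled = {"urban": 600, "rural": 400}
responded = {"urban": 540, "rural": 300}       # urban households respond at a higher rate
stratum_mean = {"urban": 20.0, "rural": 10.0}


def estimate(weight):
    """Weighted mean over responding households, given a per-stratum weight."""
    num = sum(weight[s] * responded[s] * stratum_mean[s] for s in responded)
    den = sum(weight[s] * responded[s] for s in responded)
    return num / den


design = {s: population[s] / sampled[s] for s in population}
# Assumed form of the non-response factor: sampled / responded within the stratum.
non_response = {s: sampled[s] / responded[s] for s in sampled}
final = {s: design[s] * non_response[s] for s in design}

print(round(estimate(design), 3))  # skewed towards the urban value
print(round(estimate(final), 3))   # matches the population mean
```

With design weights alone, the estimate (13.396) drifts towards the urban value because urban households are over-represented among respondents. Multiplying by the non-response factor restores the population mean of 13.0.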
The CPHS sample is not the outcome of a population-proportionate sampling process. Therefore, the state-wise distribution of the sample does not reflect the state-wise distribution of the population. Yet, there are some similarities between a state's sample size and its population. Uttar Pradesh and Maharashtra lead in sample size, as they do in population. They account for 12.9 per cent and 11.2 per cent of the total sample, respectively. According to Census 2011, Uttar Pradesh accounted for 16.5 per cent of the population and Maharashtra for 9.3 per cent.
Rajasthan and Tamil Nadu come next, with a share of 6.2 per cent each in the total CPHS sample, followed by Karnataka and West Bengal with shares of 6.0 and 5.9 per cent, respectively.
Of the top six states by population according to Census 2011, four are also among the top six by sample size. The exceptions are Bihar and Madhya Pradesh, which are replaced by Rajasthan and Karnataka. Bihar and Madhya Pradesh do not figure in the top six by sample size because of their much lower levels of urbanisation.
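One simple way to express how far a state's sample share departs from its population share is the ratio of the two. The term "representation index" and the function name below are our own, used only for illustration, with the shares for Uttar Pradesh and Maharashtra quoted above.

```python
def representation_index(sample_share, population_share):
    """Hypothetical helper: sample share divided by population share.

    Above 1: the state's share of the sample exceeds its share of the population.
    Below 1: the state has a smaller share of the sample than of the population.
    """
    return sample_share / population_share


# Shares in per cent: CPHS sample (from the text) and Census 2011 population.
shares = {
    "Uttar Pradesh": (12.9, 16.5),
    "Maharashtra": (11.2, 9.3),
}

for state, (sample_share, population_share) in shares.items():
    print(f"{state}: {representation_index(sample_share, population_share):.2f}")
```

A value below 1 (Uttar Pradesh, about 0.78) indicates a smaller share in the sample than in the population, and a value above 1 (Maharashtra, about 1.20) a larger one. Such differences are by design and are handled through the weights.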
What is important is that the distribution of responses should match the distribution of the sample; that is, survey execution should not distort the distribution of the sample. Some deviation is inevitable, but our effort is to minimise it.
One summary way of comparing the two distributions is to check whether the ordinal ranking of states by sample size is strongly correlated with their ranking by number of responses. This can be done using the Spearman rank correlation coefficient. A perfectly monotonic increasing relation between the two sets of ranks (i.e. the ranks of states in the sample and their ranks in the responses) yields a Spearman correlation coefficient of 1.
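The Spearman coefficient is the Pearson correlation computed on ranks rather than on raw values. The self-contained Python sketch below implements it, assigning tied values the average of their ranks; the state counts in the example are hypothetical. In practice, a library routine such as scipy.stats.spearmanr computes the same statistic.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical sample and response counts for five states.
sample_counts = [100, 80, 60, 40, 20]
same_order = [95, 70, 65, 30, 25]   # responses preserve the sample ordering
one_swap = [95, 70, 65, 25, 30]     # the last two states swap places

print(spearman(sample_counts, same_order))
print(spearman(sample_counts, one_swap))
```

When the two orderings agree exactly, the coefficient is 1.0; swapping the order of just two neighbouring states out of five lowers it to 0.9. With n states and no ties, a single swap of neighbours lowers the coefficient by 12/(n(n² - 1)), so the same swap costs much less when more states are ranked.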
The Spearman correlation coefficient averaged 0.992 till Wave 17, within a range of 0.985 to 0.997. Thus, the ordinal ranking of states by sample size and by response size was very close for the first 17 waves. This indicates that the basic processes deployed in survey execution were adequate to avoid any material bias in execution, at least as far as the ranking of states is concerned.
In Wave 18 and Wave 19, the Spearman correlation coefficient dropped to 0.975 and 0.976, respectively. The impact of the lockdown was evident in Wave 20, when the coefficient dropped sharply to 0.921. In the following two waves, it recovered partly, to 0.932 and 0.946. Evidently, while the ranks remain very closely related, there was a deterioration in the wave of May-August 2020, which has been partly rectified in the waves of September-December 2020 and January-April 2021. Our effort will be to bring the two rankings closer.
The two extremes of deviation between responses and sample were Uttar Pradesh and West Bengal. Uttar Pradesh has a share of 15.6 per cent in the total responses, compared with its 12.9 per cent share in the sample, while West Bengal has a share of 3.8 per cent in the responses, compared with its 5.9 per cent share in the sample. The next largest deviation is in Odisha, which accounts for 3.8 per cent of the total sample but only 2.2 per cent of responses.
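The deviations described above can be expressed both in percentage points and as a ratio of response share to sample share. A short Python sketch using the shares quoted in the text:

```python
# Shares of the total, in per cent, as quoted in the text.
shares = {
    "Uttar Pradesh": {"sample": 12.9, "responses": 15.6},
    "West Bengal": {"sample": 5.9, "responses": 3.8},
    "Odisha": {"sample": 3.8, "responses": 2.2},
}


def deviation_pp(share):
    """Response share minus sample share, in percentage points (rounded to 0.1)."""
    return round(share["responses"] - share["sample"], 1)


def response_to_sample(share):
    """Response share as a multiple of sample share (1.0 means no deviation)."""
    return share["responses"] / share["sample"]


for state, share in shares.items():
    print(f"{state}: {deviation_pp(share):+.1f} pp, ratio {response_to_sample(share):.2f}")
```

In percentage points, Uttar Pradesh (+2.7) and West Bengal (-2.1) are the extremes, with Odisha next (-1.6). Relative to sample share, however, Odisha's shortfall is proportionally the largest of the three: its response share is about 0.58 of its sample share, against about 0.64 for West Bengal.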