Limitations? Sure. But Bias? Nope.
by Mahesh Vyas
We thank Jean Dreze and Anmol Somanchi for having critically examined CMIE’s Consumer Pyramids Household Survey. As the authors want CMIE “to reassert or retract its claim that CPHS is a nationally representative survey”, we first do that.
CPHS is a household survey of a nationally representative sample in the sense that it is selected through a process of multistage stratification and random sample selection over 98.5 per cent of India’s population, with no bias in sample selection or survey execution. Limitations with respect to national representation include the exclusion of four border States / UTs of the northeast, some islands and one small UT on the mainland. We hope to expand into these regions eventually. Our concern is the security of the team, given that we do not carry the imprimatur or power of the state.
Purists can disqualify CPHS from being called a nationally representative sample because of these exclusions. But practical users accept these limitations, and CMIE’s claim that CPHS is a nationally representative sample, because the rest of the nation is sampled well.
Dreze and Somanchi say that a bias is inevitable because of the way households are selected within villages: CMIE begins its selection from the main street and proceeds to the outskirts only if the sample size requires it. As a result, they say, poor households are bound to be under-represented. We explain why our methodology does not introduce such a bias.
The CPHS sampling begins at one end of a main street but ends in an inner street. Often, the start of the main street is on the outskirts. It is not easy to avoid the outskirts in the CMIE sampling system. The average village in India has 300 households. The systematic random sampling exercise of CPHS requires the selection of every nth household in the village, where n is a random number between 5 and 15 and the required sample size is 16. If the random number is 5, CMIE would exhaust the selection of 16 households on the main street only if the main street contained at least 80 households. It is not easy to find 80 households on just one street of a village, so access to inner streets is inevitable. If the random number is 10, we need 160 households, and if it is 15, we need 240 households on the main street for CMIE to exhaust the sample selection on the main street itself. Evidently, the CPHS sample cannot escape including households from the outskirts.
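The arithmetic above can be checked with a short Python sketch. It is only an illustration that uses the figures stated in this note (a sample of 16 households, a random interval n between 5 and 15, and an average village of 300 households). The function name is ours for illustration and is not part of any CMIE system.

```python
SAMPLE_SIZE = 16          # households selected per village, as stated above
AVG_VILLAGE_HOUSEHOLDS = 300  # average village size quoted above


def households_needed_on_main_street(n, sample_size=SAMPLE_SIZE):
    """Households one street must hold for every-nth selection
    to pick the full sample without leaving that street."""
    return n * sample_size


for n in (5, 10, 15):
    needed = households_needed_on_main_street(n)
    share = needed / AVG_VILLAGE_HOUSEHOLDS
    # Share of an average village that would have to sit on one street
    print(f"n={n}: {needed} households ({share:.0%} of an average village)")
```

Even at the smallest interval, the main street alone would have to hold more than a quarter of an average village for the selection to stay on it.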
Our first choice was to enumerate all households, like the NSSO, and do simple random selection. But this is not possible without local state support in all the 12,000 primary survey units because of security concerns. We have tried enumeration repeatedly and have been stopped midway by the local arms of government or, in some cases, non-government forces. So we are compelled to use the next best option, which is systematic random sampling.
Nevertheless, we take the criticism of the authors seriously enough to undertake a thorough study of the sample and measure its distribution over the main area (street, circle or square) as against the outskirts. Wherever necessary, we will expand the sample to the outskirts. We hope to complete this exercise by April 2022. We are committed to creating and maintaining a robust and representative sample to the extent possible without the luxury of the heft of Sarkar with us.
Dreze and Somanchi say that CPHS-based estimates showing adult (15-49 years) literacy at 100 per cent in urban India in late 2019 are too good to be true. First, CPHS does not give a number like 100 per cent adult literacy. The number is 99.6 per cent. Rounding it to 100 is legitimate, but it carries a misleading connotation. Besides, it is better to use a full calendar year, as we do below, in line with the Census data.
According to the Census, literacy in the 15-49 years age group in urban India was 81.6 per cent in 2001 and 86.1 per cent in 2011. CPHS shows a rapid increase in literacy in this age group in urban India, from 89.4 per cent in 2014 to 98.1 per cent in 2020. Note that this is not 100 per cent. The annual CPHS estimates imply that urban adult literacy increased by 3.3 percentage points between 2011 and 2014, a plausible 1.1 percentage point increase per annum. Then, between 2014 and 2020, urban adult literacy increased at the rate of 1.45 percentage points per annum. Is this acceleration in urban adult literacy evidence of limitations of the CPHS sample, or of improvement in literacy among adult urbanites in a period when many were forced into participating in the aggressive digitalisation of India? Literacy is defined as a person’s ability to read. CPHS offers evidence that adult literacy accelerated, perhaps because of digitalisation.
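The per-annum rates quoted above follow from simple division, sketched here in Python. The inputs are the Census and CPHS figures cited in this note; the helper name is ours, for illustration only.

```python
def annual_pp_change(start_value, end_value, start_year, end_year):
    """Average change in percentage points per year between two estimates."""
    return (end_value - start_value) / (end_year - start_year)


# Urban literacy, 15-49 years age group (per cent), as cited above
census_2011 = 86.1
cphs_2014 = 89.4
cphs_2020 = 98.1

rate_2011_2014 = annual_pp_change(census_2011, cphs_2014, 2011, 2014)
rate_2014_2020 = annual_pp_change(cphs_2014, cphs_2020, 2014, 2020)

print(f"2011-2014: {rate_2011_2014:.2f} percentage points per annum")
print(f"2014-2020: {rate_2014_2020:.2f} percentage points per annum")
```

The first rate joins a Census figure to a CPHS figure, so it is a comparison across two sources rather than within one series.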
Dreze and Somanchi state that the CPHS sample is becoming more biased towards the better off. But the reality is that it merely reflects the progress made by Indian households. Household incomes improved between 2014 and 2019. In 2014, households that earned less than Rs.100,000 per annum accounted for 31 per cent of the sample. By late 2019, their share had dropped to 6.6 per cent. But in 2020, when the economy shrank, the share of households earning less than Rs.100,000 rose to 9.6 per cent.
Including the very poor and the very rich adequately is always a challenge. The homeless are systematically missed. The rich are becoming increasingly inaccessible. There are practical limitations but no bias in the CPHS.