In a nation as vast and diverse as India, data is the bedrock on which policy, planning, and public health interventions must stand. Yet that very foundation appears worryingly shaky.
A recent compilation of 11 research papers on large-scale demographic and health surveys in India reveals an uncomfortable truth: we may be collecting more data than ever before, but we’re not necessarily collecting better data.
India’s flagship health surveys—such as the National Family Health Survey (NFHS), Longitudinal Ageing Study in India (LASI), and Indian Human Development Survey (IHDS)—are among the most cited data sources used to design government programmes and measure policy outcomes.
However, concerns over data quality—ranging from interviewer bias to inconsistent measurements and flawed methodologies—are not just technical irritants. They are structural red flags that could lead to deeply flawed policies, misallocation of funds, and skewed political narratives.
The 2019–21 NFHS reported, among other findings, that 13% of children were born prematurely and 17% had low birth weight—data that rightly raised alarms. But the worry now is not just the findings themselves but how they were arrived at. With rising demand for granular data at district and sub-district levels, survey questionnaires have become longer and sample sizes larger.
The natural consequence? Interviewer fatigue, poor training standards, inconsistencies in data collection, and delays in analysis. The result is a paradox: as we gather more data, we risk making it less reliable. Several studies in this special issue underscore how interviewer-level bias has crept into critical variables such as the number of children born, maternal care indicators, and even basic anthropometric measurements. Workload discrepancies among interviewers were found to skew results, raising serious questions about the accuracy of statistics we routinely treat as sacrosanct.
Solutions do exist, and some are refreshingly pragmatic. The recommendations include using outlier-based real-time monitoring tools to detect interviewer bias during fieldwork, modularizing the questionnaire, extending survey duration in data-heavy regions, and tailoring survey implementation to local conditions.
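To make the first of those fixes concrete, here is a minimal sketch of what outlier-based monitoring might look like during fieldwork: flag interviewers whose average recorded values for key indicators drift far from the survey-wide average, so supervisors can intervene before data collection closes. It is an illustration only; the column names, threshold, and Python/pandas implementation are assumptions, not a description of any tool the surveys actually use.

```python
import pandas as pd

def flag_outlier_interviewers(df, value_cols, id_col="interviewer_id", z_threshold=3.0):
    """Flag interviewers whose per-interviewer means deviate sharply from the survey-wide mean.

    df         : completed-interview records streaming in from the field
    value_cols : key indicators to monitor (e.g., children ever born, child weight in kg)
    Returns rows of (interviewer, indicator, z-score) that exceed the threshold.
    """
    flags = []
    for col in value_cols:
        per_interviewer = df.groupby(id_col)[col].mean()
        z = (per_interviewer - per_interviewer.mean()) / per_interviewer.std(ddof=0)
        for interviewer, score in z[z.abs() > z_threshold].items():
            flags.append({"interviewer": interviewer, "indicator": col, "z_score": round(score, 2)})
    return pd.DataFrame(flags)

# Hypothetical usage on a nightly extract of field data:
# alerts = flag_outlier_interviewers(field_df, ["children_ever_born", "child_weight_kg"])
# print(alerts)  # supervisors follow up with flagged interviewers while fieldwork is still under way
```

The design choice here is deliberate simplicity: a nightly batch check on routine field extracts needs no new infrastructure, which is exactly the spirit of the pragmatic, low-cost fixes the studies recommend.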
These aren’t radical overhauls—they’re common-sense fixes that have long been ignored in the rush to meet targets and publish reports. Another uncomfortable finding pertains to the use of the “wealth index” as a proxy for economic status in health surveys.
One paper makes a compelling argument for dropping this metric in favour of a simplified consumption expenditure schedule—one that would provide a more accurate picture of economic inequality in health outcomes.
Similarly, discrepancies between self-reported morbidity and test-based diagnosis, as well as the influence of third-party presence during sensitive interviews, highlight how social and cultural contexts must inform both data collection and interpretation.
There’s also an urgent call to evaluate the underutilised Health Management Information System (HMIS) data. With improvements, HMIS could supplement or even substitute for certain aspects of survey data, especially when timeliness and cost-efficiency are concerns. At the institutional level, the National Data Quality Forum (NDQF), a collaboration between ICMR and the Population Council, has taken steps in the right direction.
But its efforts, however commendable, remain peripheral. Unless the culture of data quality assurance is baked into the DNA of every large-scale survey in India—right from questionnaire design to training, implementation, monitoring, and analysis—the trustworthiness of our health statistics will remain in doubt.
In short, India doesn’t just need more data—it needs better data. Policymakers, researchers, and civil society must demand that data quality analysis become a standard practice before public release and policy use, not an afterthought buried in academic journals. As the adage goes, garbage in, garbage out.
And when that garbage is feeding into national health policies, the cost is measured not in charts or spreadsheets, but in human lives.