The creation of the e-Courts platform for disseminating data from the subordinate judiciary was an important step in making Indian courts more transparent. The platform has also prompted interest in data-driven research on courts. While the e-Courts platform is a major reform in itself, there are numerous obstacles to using its data successfully for research. Previous work has pointed out that the data has standardisation issues, particularly in case-type nomenclature. It has also been shown that other fields, such as statute names and section numbers, are missing in some cases. In this paper, we quantify these error rates, which have so far been known only anecdotally. We also identify new issues with the data, notably wrong data being entered in certain fields. We report and quantify mismatches between case-types and statute names, as well as missing and malformed data in the statute name, section number, and date-time fields. We also show how error rates vary across states. The Indian Supreme Court eCommittee has taken cognisance of some of these issues and initiated interventions to address them. However, the fundamental cause of poor-quality data, viz. the lack of systematic data quality reviews and of capacity building for such reviews, does not seem to be part of the committee’s plans. Until these quality issues are addressed, the use of this data for research will be limited.
On 29th October 2020, a roundtable was held to discuss various aspects of judicial data quality. It brought together stakeholders from the legal fraternity, as well as researchers and data experts. A compilation of the points from this discussion is available for download here, along with the paper.