Expanded coverage data

Today we updated the coverage data page to give more detail on the level of coverage we’re providing on the site. Previously we showed only total coverage figures; we now also break them down by Full Members and by Associates and Affiliates, and show coverage figures for matches between the two groups. Our overall coverage figure stays at 92.02%, as before, but it can now be seen that we have coverage for 98.24% of the matches we’ve attempted to cover between Full Members. We’ve also broken the yearly figures down by the same criteria.

This change is partly motivated by a desire to see more accurate figures ourselves. The overall coverage has been hovering at around 92% for a number of months now, and we felt that the number was being dragged down by the difficulty of sourcing data for the Associates and Affiliates. We were surprised at how high our coverage of Full Member matches is, and the breakdown has shown us how much work remains to be done for the Associates and Affiliates.

The other motivation for splitting the coverage data is that we’re going to start trying to add more international matches. This will include any full international match for a country for which we can source data, such as the recent Italy vs Denmark T20 match. Inevitably this will reduce our overall coverage figure, so more detailed figures will help us see a more nuanced picture.
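To illustrate why the overall figure drops while the per-group figures stay informative, here is a minimal sketch in Python. The match counts are entirely hypothetical and are not our real numbers, the `Group` class and `overall_coverage` function are names made up for this example, and only two groups are modelled for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    attempted: int  # matches we tried to source data for (hypothetical count)
    covered: int    # matches we actually hold data for (hypothetical count)

    @property
    def coverage(self) -> float:
        """Coverage for this group, as a percentage."""
        return 100.0 * self.covered / self.attempted


def overall_coverage(groups: list[Group]) -> float:
    """Overall coverage: total covered over total attempted, as a percentage."""
    attempted = sum(g.attempted for g in groups)
    covered = sum(g.covered for g in groups)
    return 100.0 * covered / attempted


# Hypothetical counts before attempting more international matches.
full_members = Group("Full Members", attempted=1000, covered=980)
before = [
    full_members,
    Group("Associates and Affiliates", attempted=200, covered=120),
]

# Hypothetical counts after attempting 200 more hard-to-source matches,
# of which only 80 could be covered. Full Members are unchanged.
after = [
    full_members,
    Group("Associates and Affiliates", attempted=400, covered=200),
]

for label, groups in (("before", before), ("after", after)):
    print(f"{label}: overall {overall_coverage(groups):.2f}%")
    for g in groups:
        print(f"  {g.name}: {g.coverage:.2f}%")
```

In this sketch the overall figure falls once the extra matches are attempted, while the Full Members figure does not move, which is the kind of distinction the split figures are meant to expose.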

We’ll go into more detail regarding our plan to expand our coverage in the near future.

2 thoughts on “Expanded coverage data”

  1. I came across your blog while searching for cricket data. For the past 10 years or so, I have been collecting ball-by-ball data for academic research on statistical modelling of cricket outcomes. Together with a colleague and graduate students, I have published papers in Operations Research, Mathematics and Statistics journals.
    The data collection part is quite tedious, but we have been improving our methods. We start by downloading ball-by-ball commentary from cricinfo.com and then use an R script to extract the name of the bowler, the batsman, over.ball, the outcome and some additional info such as the type of ball bowled, run outs, the match id etc.

  2. Hi Parmajit,
    I recently started using R for statistical analysis and I’d like to use the dataset on this site. Can you share some info on your R script to parse ball-by-ball data from cricinfo?

