Hello again

Cricsheet officially went “on hiatus” on the 30th of October 2017, with the last match data actually being added on the 16th of July that year. Since then various factors (some very important, some merely awkward) have kept the site in a state of limbo. You don’t really need (or probably care for) the details, and I’d be reluctant to share anyway, but the important news is that the hiatus is finally over.

I’ve released the data for the matches played so far in the 2019 Indian Premier League (33 matches as I post this) and will continue to do so for the rest of the tournament. I’m hopeful that the data is up to the usual (hopefully high) Cricsheet standards, but would be grateful if people report any errors to me. There have been major changes behind the scenes to how I generate this data and, while precautions have been taken, there is the chance something incorrect will have slipped through.

My short-term plan is to continue adding IPL 2019 matches, while also looking to gradually fill in the gaps in domestic T20 competitions played since the beginning of the hiatus. After that I’ll look to add one-day internationals, before contemplating Test matches. I suspect Test matches will take a while to appear as I took certain shortcuts while working behind the scenes in order to be able to start updating sooner. Dealing with that tech debt will take a little time.

Anyway, hello again. It’s good to be back.

A multitude of updates

It has been over a month since I added new data to the site and, to mark the end of that apparently inactive period, I’m going to provide an update on what I’ve actually been doing, and the results of that work. Make yourself comfortable because this is going to be a longer update.

Big Bash League data

I’ve added data for most matches from the last 2 years of the Big Bash League. This comprises 67 matches, and 15,859 deliveries. There are a few missing matches (you can see which ones on the Missing page), but the majority are there.

I plan to gradually add earlier years for the Big Bash League, and eventually expand to other major domestic T20 competitions, over the next number of months.

Replacement changes

As part of the work to add the Big Bash League I’ve also changed how the data format deals with replacements. Previously replacement information was confined to “super-substitutions” and nothing else. This has now been expanded to cover replacements generally, both at a “match” and “role” level. “match” replacements can be super- or concussion-substitutions (and involve a player replacing another in the match), whereas “role” replacements are occasions where a player replaces another as a bowler or batter (but does not take their place in the match).

This expansion of replacements has resulted in the “replacement” of the “super-subs” field on a delivery with the new and improved “replacements” field. More details on this new field can be found on the Format page.
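A minimal sketch of how code reading the data might pick up the new field, working on a delivery that has already been parsed into a dictionary. The nested layout used here (a replacements mapping holding separate match and role lists) and the keys inside each entry are assumptions made for illustration, as is the function name; the Format page remains the authority on the real layout.

```python
def split_replacements(delivery):
    """Return (match_level, role_level) replacement lists for one delivery.

    Assumed layout: {"replacements": {"match": [...], "role": [...]}}.
    Deliveries without the field simply yield two empty lists.
    """
    replacements = delivery.get("replacements") or {}
    match_level = list(replacements.get("match", []))
    role_level = list(replacements.get("role", []))
    return match_level, role_level


# Hypothetical delivery: one player takes over from another as bowler,
# without taking their place in the match.
example_delivery = {
    "replacements": {
        "role": [{"in": "Player B", "out": "Player C", "role": "bowler"}],
    },
}

match_level, role_level = split_replacements(example_delivery)
print(len(match_level), "match-level,", len(role_level), "role-level")
```

Returning two plain lists means calling code never has to check whether the field, or either level within it, is present.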

Version update to 0.9

The expansion of replacements mentioned above has, unsurprisingly, resulted in the data version in the YAML files being changed from 0.8 to 0.9. My plan is that the next update to the data file will change the version to 1.0.0, and that I will stick strictly to semantic versioning from then on.

Name updates

Part of the work I have undertaken in the last month has been to perform a thorough update of the names used for players and officials in the data files. This has consisted of two real changes. The first was to consolidate cases where a player appeared under multiple names into a single name, while the second was to ensure that a name is used for only a single player. An example would be Rashid Khan, which was being used for 2 players. I have gone through all such instances and put steps in place to stop them re-occurring.

Innings-level penalty runs – Data version updated to 0.8

47 Test matches were played in 2016. However, as dedicated observers of Cricsheet will have noted, I have, until now, only provided data for 46 of those matches. The 3rd Test of the 2016 New Zealand tour of India was the sole omission. As of today that has been rectified, and we now provide full Test data for 2016.

In that 3rd Test, while batting, Ravindra Jadeja persistently ran on the pitch resulting in 5 penalty runs being awarded to New Zealand before the start of their 1st innings. This method of applying penalty runs was not one that my existing data format could support (as we previously expected penalty runs to be applied on a particular delivery), meaning that I’ve had to apply a small update to the format to support this new development. This has resulted in an update of the data format version from 0.7 to 0.8.

The only change between versions is the addition of an optional penalty_runs field within each innings. If penalty runs were added to an innings, either before or after the innings, then this field will be provided (with pre or post used as appropriate).
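As a rough sketch of the kind of tweak this implies, the function below adds innings-level penalty runs to a running total. It assumes penalty_runs is a mapping from pre or post to a number of runs, and it uses a deliberately simplified delivery layout (a list of dictionaries with an integer "runs" key) that is not the real file layout; the function name is hypothetical.

```python
def innings_total(innings):
    """Sum delivery runs plus any innings-level penalty runs.

    Assumptions (not taken from the format specification):
      - innings["deliveries"] is a list of dicts with an integer "runs" key
      - innings["penalty_runs"], when present, maps "pre"/"post" to run counts
    """
    total = sum(d.get("runs", 0) for d in innings.get("deliveries", []))
    penalties = innings.get("penalty_runs") or {}
    total += penalties.get("pre", 0) + penalties.get("post", 0)
    return total


# Hypothetical innings that starts with 5 penalty runs already awarded.
example_innings = {
    "penalty_runs": {"pre": 5},
    "deliveries": [{"runs": 1}, {"runs": 4}],
}

print(innings_total(example_innings))
```

Because the field is optional, an innings without it is totalled exactly as before.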

Previous data files will be exactly as they were, save for the change of version number, while the newly-added data file for the aforementioned Test match will actually use the new field. If you’ve written code that uses the data I provide you may want to tweak it to take account of the new field.

A long overdue addition – Women’s data

I’ve been making data available on this site since 2009 and have gradually increased the number of files I provide to the point that, as I write, 2,780 matches are available. Over time I’ve expanded from just matches involving Full Members, to the Indian Premier League, non-ODI international one-day matches, and international T20s. This gradual expansion means that I’m now providing over 380 matches involving only the Associates and Affiliates, meaning that I’m not just covering the Full Members. This has been an improvement, however there is still one issue, and that is that I’ve only been providing data for Men’s cricket.

I’ve wanted to add data for Women’s cricket for a while. I started the project with the idea of providing cricket data, but I didn’t really think of anything beyond Men’s cricket. Raf Nicholson expressed very well the trap I let myself fall into.

At its heart, it comes down to this: The first “C” in ICC has always, since its formation in 1909, stood for “cricket”, though what it should really have been called, up until it took control of women’s cricket in 2005, was the IMCC – the International Men’s Cricket Council. When a male journalist says, “I am a cricket correspondent”, he means “I am a men’s cricket correspondent.” When a blog refers to itself as an “England cricket blog”, what this generally means is “an England men’s cricket blog”. And when ordinary cricket fans say “cricket”, almost without exception what they really mean is “men’s cricket”. In short, men’s cricket is the default setting.

I very much fell into the trap of viewing Men’s cricket data as cricket data, and not considering Women’s cricket at all. This is unfair, and something I’ve been planning to fix. Men’s sports have awesome data as Allison McCann has noted, while Women’s sport is poorly served.

And just because the data doesn’t exist doesn’t mean we can’t compile it ourselves or make estimates based on what is available. I just think that in addition to praising the virtues of men’s sports data, we need to acknowledge that good women’s sports data is severely lacking.

I’m happy to announce that, as of today, Cricsheet will finally be providing data for Women’s cricket. The initial release consists of 257 matches, comprising 148 T20Is, 69 ODIs, 37 International T20s, and 3 Test Matches, and includes matches from as far back as 2009.

The addition of Women’s data has a practical implication for the data we already provide. The Data Format has just been changed to update the version from 0.6 to 0.7, to allow for the addition of gender as a new field in the info section. Right now this field contains either female or male, but I reserve the right to have other values in the future.
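A small sketch of how code might group parsed matches by the new field without assuming that female and male are the only values that will ever appear. The "unknown" fallback for a file with no gender field, and the function name, are assumptions for illustration.

```python
from collections import defaultdict


def group_by_gender(matches):
    """Group parsed match dicts by info.gender.

    Any value is accepted as a group key, so values added in future
    do not break the grouping. "unknown" is a hypothetical fallback
    for a file that lacks the field.
    """
    groups = defaultdict(list)
    for match in matches:
        gender = match.get("info", {}).get("gender", "unknown")
        groups[gender].append(match)
    return dict(groups)


# Hypothetical, heavily trimmed match records.
example_matches = [
    {"info": {"gender": "female"}},
    {"info": {"gender": "male"}},
    {"info": {"gender": "female"}},
    {"info": {}},
]

grouped = group_by_gender(example_matches)
print({gender: len(found) for gender, found in grouped.items()})
```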

The Downloads page on the site has also been updated to allow users to download Women’s or Men’s matches in all of the variations we previously provided, as well as continuing to download all matches for all genders.

As Raf Nicholson wrote, men’s cricket is the default setting, and I’ve been guilty of having that mindset. Today is a small step on the path to changing that, and to no longer viewing men’s cricket as the default.

Until that changes, we have a problem. Until that changes, I’m going to keep telling the world that I am a feminist. Cricket needs feminism. End of story.

Version updated to 0.6

The data version included in every data file I provide, and explained on the format page of the site, has just been changed from 0.5 to 0.6. This actually reflects a relatively minor change, and is the first time I’ve bumped the version number since February 2013.

In the 1st Test of 2014 between Pakistan and Australia, Sarfraz Ahmed was dismissed and play stopped for tea. After the break Zulfiqar Babar, who had been batting with Ahmed, didn’t come back out and retired hurt. This meant that in the data I needed to record 2 dismissals related to a single delivery. A complication had arisen.

As I’d never even considered multiple wickets on a single delivery as a possibility, and since it had never occurred in the previous 31,271 wickets I provide data for, I’ve had to tweak the data format, along with numerous scripts, to allow for this possibility. The change I’ve implemented allows the wicket entry on a delivery to contain a list of wickets, rather than always assuming just one. Balls where only a single wicket fell (all 31,271 of them thus far) are unchanged; this tweak simply allows for the possibility of something different.

If you’ve written code that uses the data I provide you should make a small tweak to check for the existence of multiple wickets on a delivery. If you don’t, you’ll probably be fine, apart from when you try to process the single Test match where this issue occurs.
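One way to make that tweak, sketched under the assumption that a delivery has already been parsed into a dictionary: always hand back a list, whether the wicket entry holds a single wicket or several. The keys inside each wicket (player_out, kind) and the function name are illustrative assumptions.

```python
def wickets_on(delivery):
    """Return the wickets on a delivery as a list (possibly empty).

    Handles both shapes described above: a single wicket mapping,
    or a list of wicket mappings.
    """
    wicket = delivery.get("wicket")
    if wicket is None:
        return []
    if isinstance(wicket, list):
        return wicket
    return [wicket]


# Hypothetical deliveries; the keys inside each wicket are assumed.
single = {"wicket": {"player_out": "Batter A", "kind": "bowled"}}
double = {
    "wicket": [
        {"player_out": "Batter A", "kind": "bowled"},
        {"player_out": "Batter B", "kind": "retired hurt"},
    ]
}
no_wicket = {"runs": 1}

for delivery in (single, double, no_wicket):
    print([w["player_out"] for w in wickets_on(delivery)])
```

Normalising at the point of reading keeps the rest of the code unaware of which shape a particular file uses.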

There will be substantial changes to the data format coming in the next number of months, which will add new information for many of the matches currently covered. These may require tweaks to some of your code, but I will be providing parallel versions of the data files for a period of time, allowing users to continue to use the older version while updating their code. More details on these changes soon.

9 new countries, 91 new matches

3 months ago, in October, we said that we would “need to give some further thought as to how we will deal with T20 matches that aren’t regarded by the ICC as ‘T20 Internationals’”.

Well we’ve reached a conclusion, and implemented it. We’ve just updated the site with 91 new data files, including 53 international T20 matches (not T20 Internationals), and added an extra 9 countries to those we have some data for, namely Denmark, Hong Kong, Italy, Namibia, Nepal, Papua New Guinea, Uganda, United Arab Emirates, and the USA. All 53 of these matches come from the World T20 Qualifier in November 2013. The match_type used for these matches is IT20, meaning “International T20”.

What is the difference between a T20 International and an International T20?

“T20 International” and “International T20” sound like they refer to the same type of match, but there is a subtle difference. A “T20 International” is a match that is recognised by the ICC as being a full international. This means a match between Full Members or those Associates and Affiliates to whom the ICC has “granted” T20 status. An “International T20” is the name we’re using to cover all other international T20 matches, such as those involving a country that hasn’t been granted T20 status.

Confusingly a country can play both types of T20. Ireland did so during the World T20 Qualifiers, playing a “T20 international” against Canada in the group stage while playing “International T20s” in the other group matches.

We don’t think the distinction should exist. If a match is played between any two countries and follows the T20 rules we think it should have the same status as any other similar match. The ICC disagree sadly and we note the difference for accuracy.

What about One-day matches that aren’t ODIs?

We don’t currently have any non-ODI one-day matches on the site. We’ll start adding these when the World Cup Qualifiers start in a week’s time. These will slowly appear on the site with a match_type of ODM.

We will look into adding older data for these types of match too, however finding the data will be the main problem as always. It’s rare that ball-by-ball commentary is provided for these matches, and it’s even more rare for it to be accurate when it is done, sadly. We’re quite good at fixing errors by now (practice will do that) but some sources are so catastrophically bad that we have to just throw our hands up and walk away.
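For anyone processing the files, a minimal sketch of handling the two match_type codes named in this post (IT20 and ODM). Treating every other code as having official ICC status is a simplifying assumption, and the names used are hypothetical.

```python
# The two codes introduced above, with the names used for them on the site.
UNOFFICIAL_MATCH_TYPES = {
    "IT20": "International T20",
    "ODM": "One-day match",
}


def describe_match_type(match_type):
    """Return (label, has_official_status) for a match_type code.

    Assumption: any code not listed above denotes a match with
    official ICC status, and is returned unchanged as its own label.
    """
    if match_type in UNOFFICIAL_MATCH_TYPES:
        return UNOFFICIAL_MATCH_TYPES[match_type], False
    return match_type, True


for code in ("IT20", "ODM", "ODI"):
    label, official = describe_match_type(code)
    print(code, "->", label, "(official)" if official else "(not official)")
```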

When is an ODI not an ODI?

A person without knowledge of the cricket world, when asked to define a “one-day international”, might say that it would be a match between two countries played on a single day under an appropriate set of rules. This apparently reasonable explanation would be incorrect. This illogical position is currently causing uncertainty as we investigate a forthcoming change of emphasis, and is one of many small signs of the inequality in the world of cricket.

As we write there are 106 nations listed as members of the ICC, 10 Full Members, 37 Associates, and 59 Affiliates. Of those 106 members only the Full Members have the permanent right to play ODIs. 6 of the Associates/Affiliates are “granted” the right to play ODIs for a limited period, depending on how they do in the World Cricket League. The other 90 members don’t get to play ODIs, but they can play one-day matches against other countries (although rarely, if ever, a Full Member).

In June, Andrew Nixon made an observation on Twitter regarding “the lack of variety at the top of international cricket. Same teams playing each other over and over”. We responded that “It’s disturbing how many people have no idea more than 10 countries play the game, and that they don’t see that as a problem”. At that point we realised that we were falling into a variation of that mistake.

At the moment we provide data for Tests, ODIs, T20 Internationals, and IPL matches. This means that for international cricket we’re focussing exclusively on 16 out of the 106 ICC members. The vast majority of our data files feature only Full Members. We hadn’t even been attempting to include any other international matches. We’ve now decided this must change to include matches featuring all countries. This causes a dilemma for us as we have to decide how to refer to these matches on the site and within the data we provide. It may seem like a minor issue, but it’s one that is standing in the way of this expansion.

We were originally tempted to call all one-day matches between countries ODIs and to be done with it, however we feel that there should be a way to indicate that the matches were viewed as distinct from ODIs at the time they were played. In an ideal world there would be no such distinction but our opinion on this shouldn’t cloud our work in creating reliable data, so that option is off the table. We’re gradually coming to the view that we’ll just call each of these matches a “One-day match” and use the short code of ‘ODM’ for the match_type in the data files. It’s a small difference but it seems to fulfill our requirements.

We haven’t yet implemented this addition in any data file. We still have a number of issues to deal with first, mainly on the website, so that we can adequately distinguish between ODIs and ODMs, but also so that we can have useful coverage figures. We also need to give some further thought as to how we will deal with T20 matches that aren’t regarded by the ICC as ‘T20 Internationals’, as well as dealing with multi-day matches such as the Intercontinental Cup. Once we deal with some, if not all, of these issues we’ll look into adding matches involving the rest of the cricket world to the site. The main problem there will be in finding any source of ball-by-ball data, but we’ll worry about that problem when we can.

Expanded coverage data

Today we updated the coverage data page to provide extra information on the level of coverage we’re providing on the site. Previously we were providing total coverage figures; now we’re providing extra breakdowns by Full Members, and Associates and Affiliates, as well as showing coverage figures for matches between the two groups. Our overall coverage figure stays at 92.02%, as it was before, but it can now be seen that we have coverage for 98.24% of matches we’ve attempted to cover between the Full Members. We’ve also broken the yearly figures down by the same criteria.

This change is partly motivated by a desire to see more accurate figures ourselves. The overall coverage has been floating at around 92% for a number of months now, and we felt that the number was being dragged down by the difficulty in sourcing the data for the Associates and Affiliates. We were surprised at how high our coverage of Full Members matches is, and it has shown us how much work needs to be done for the Affiliates and Associates.

The other motivation we have for splitting the coverage data is that we’re going to start trying to add more international matches. This will include any full international match for a country for which we can source data, such as the recent Italy vs Denmark T20 match. Inevitably this will have the effect of reducing our overall coverage figure, so it will be useful to have more detailed figures so that we can see a more nuanced picture.
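The arithmetic behind a split coverage figure can be sketched as below. The match counts are invented purely for illustration and are not the site's real numbers; the only point is that a single overall percentage can hide a large gap between groups.

```python
def coverage(covered, attempted):
    """Percentage of attempted matches that have data, to 2 decimal places."""
    if attempted == 0:
        return 0.0
    return round(100 * covered / attempted, 2)


# Made-up (covered, attempted) counts per group, for illustration only.
groups = {
    "full_members": (90, 100),
    "associates_affiliates": (10, 25),
}

per_group = {name: coverage(c, a) for name, (c, a) in groups.items()}
overall = coverage(
    sum(c for c, _ in groups.values()),
    sum(a for _, a in groups.values()),
)

print(per_group)
print(overall)
```

Adding more hard-to-source matches raises the attempted count faster than the covered count, which is why the overall figure falls even when Full Member coverage is unchanged.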

We’ll go into more detail regarding our plan to expand our coverage in the near future.

Now available: Zip files

I’ve had a couple of requests over the years for zip files of different groups of matches, particularly international T20s, and Indian Premier League matches. I’ve generally created a file for the requested group, sent the requester a link to that file, and left it at that. I’ve now made a slight tweak to that process. You’ll now find a section called *Zip Files* on the homepage which contains links to 5 zip files, one each for Test matches, One-day internationals, T20 internationals, IPL matches, and one zip file of all 1,182 matches we currently have. The zip files contain the same data files you can still download individually, however if you’re after a number of matches one of the zip files might make more sense for you.

I’ve written a script to generate the zip files based on the criteria I provide, so if anyone wants/needs a different subset feel free to ask.
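For anyone wanting to build their own subsets locally, here is a minimal sketch of the general idea using Python's standard zipfile module. It is not the script used for the site; the function name, the file names, and the crude text-matching criterion are all assumptions for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def build_zip(data_dir, zip_path, wanted):
    """Zip every .yaml file in data_dir for which wanted(path) is true.

    Returns the number of files added to the archive.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(data_dir).glob("*.yaml")):
            if wanted(path):
                # Store only the file name so the archive has no directories.
                archive.write(path, arcname=path.name)
                count += 1
    return count


# Demonstration with two throwaway files (hypothetical names and content).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "1001.yaml").write_text("info:\n  match_type: IT20\n")
    (data_dir / "1002.yaml").write_text("info:\n  match_type: ODM\n")

    zip_path = Path(tmp) / "it20.zip"
    # Crude criterion: a plain text search rather than real YAML parsing.
    added = build_zip(data_dir, zip_path, lambda p: "match_type: IT20" in p.read_text())

    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

print(added, names)
```

Passing the criterion in as a function keeps the zipping logic the same for every subset; a real version would parse each file properly instead of searching its text.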

A new data version

I’ve been adding matches to the site on a fairly regular basis for the last few years, despite the lack of new articles on the site. Now, however, that period of silence is finally over as there is a new data version to announce. Today I’ve moved all of the data files to version 0.5 and made a few other small changes to the site. First of all we’ll deal with the data format changes for version 0.5. These are fairly minor for the most part.

The first change is the addition of a revision field to the meta section of the file. This is set to 1 for every file at the moment and will increment any time there is a revision to the file. This replaced the updated field which I’ve decided was of little use.
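A minimal sketch of how the revision field might be used by someone keeping a local copy of the data: compare it with the revision seen last time and re-process only when it has gone up. The function name and the idea of storing the last-seen revision are assumptions for illustration.

```python
def needs_refresh(meta, last_seen_revision):
    """Return True when a file's revision is newer than the one seen before.

    `meta` is the parsed meta section of a data file. A missing revision
    is treated as 1, the value every file starts with.
    """
    return meta.get("revision", 1) > last_seen_revision


print(needs_refresh({"revision": 1}, last_seen_revision=1))
print(needs_refresh({"revision": 2}, last_seen_revision=1))
```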

The second change is the addition of new fields to deal with the situation where a match is decided by a bowl-out. The first is a bowl_out field in the outcome part of the info section, which indicates which team won the match by the bowl-out. The second bowl_out, an addition to info itself, is an array containing the details of the actual bowl-out. It lists each ball bowled, showing the bowler and the outcome. An example of a bowl-out can be seen in the file for the first West Indian T20 international in 2006.
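A rough sketch of reading the two bowl_out fields from a parsed info section. The keys inside each bowl-out entry (bowler, outcome) follow the description above, but the outcome values "hit" and "miss", the team and player names, and the function name are assumptions for illustration.

```python
def bowl_out_summary(info):
    """Summarise a bowl-out from a parsed info section, or return None.

    Reads info["outcome"]["bowl_out"] for the winning team and
    info["bowl_out"] for the ball-by-ball list.
    """
    winner = info.get("outcome", {}).get("bowl_out")
    if winner is None:
        return None
    balls = info.get("bowl_out", [])
    hits = sum(1 for ball in balls if ball.get("outcome") == "hit")
    return {"winner": winner, "balls": len(balls), "hits": hits}


# Hypothetical info section; names and outcome values are placeholders.
example_info = {
    "outcome": {"bowl_out": "Team A"},
    "bowl_out": [
        {"bowler": "Bowler 1", "outcome": "hit"},
        {"bowler": "Bowler 2", "outcome": "miss"},
        {"bowler": "Bowler 3", "outcome": "hit"},
    ],
}

print(bowl_out_summary(example_info))
print(bowl_out_summary({"outcome": {"winner": "Team B"}}))
```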

The final change is the addition of a supersub entry to any delivery in which a super-substitution was made. This will be an array containing an entry for each substitution, containing in, out, and team fields showing which player came in, who was replaced, and which team made the substitution. You can see the only example on the site at this time in a South Africa vs New Zealand T20 match from 2005.
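A short sketch of collecting super-substitutions from parsed deliveries, using the in, out, and team fields described above. The simplified delivery layout (a flat list of dictionaries), the player and team names, and the function name are assumptions for illustration.

```python
def supersubs_in(deliveries):
    """Collect (team, player_out, player_in) for every super-substitution.

    Each delivery may carry a "supersub" list whose entries have
    "in", "out", and "team" fields; most deliveries have no such list.
    """
    substitutions = []
    for delivery in deliveries:
        for sub in delivery.get("supersub", []):
            substitutions.append((sub["team"], sub["out"], sub["in"]))
    return substitutions


# Hypothetical deliveries with placeholder names.
example_deliveries = [
    {"runs": 0},
    {"runs": 1, "supersub": [{"in": "Player X", "out": "Player Y", "team": "Team A"}]},
]

print(supersubs_in(example_deliveries))
```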

A number of changes are already in the works for version 0.6 of the data. More details on what those will be will come in the next few weeks.