A multitude of updates

It has been over a month since I added new data to the site and, to mark the end of that apparently inactive period, I’m going to provide an update on what I’ve actually been doing, and the results of that work. Make yourself comfortable because this is going to be a longer update.

Big Bash League data

I’ve added data for most matches from the last 2 years of the Big Bash League. This comprises 67 matches, and 15,859 deliveries. There are a few missing matches (you can see which ones on the Missing page), but the majority are there.

I plan to gradually add earlier years for the Big Bash League, and eventually expand to other major domestic T20 competitions, over the coming months.

Replacement changes

As part of the work to add the Big Bash League I’ve also changed how the data format deals with replacements. Previously replacement information was confined to “super-substitutions” and nothing else. This has now been expanded to cover replacements generally, both at a “match” and “role” level. “match” replacements can be super- or concussion-substitutions (and involve a player replacing another in the match), whereas “role” replacements are occasions where a player replaces another as a bowler or batter (but does not take their place in the match).

This expansion of replacements has resulted in the “replacement” of the “super-subs” field on a delivery with the new and improved “replacements” field. More details on this new field can be found on the Format page.

Version update to 0.9

The expansion of replacements mentioned above has, unsurprisingly, resulted in the data version in the YAML files being changed from 0.8 to 0.9. My plan is that the next update to the data format will change the version to 1.0.0, and that I will stick strictly to semantic versioning from then on.
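For anyone whose code needs to cope with both the old and new formats, here is a minimal sketch of one way to compare data versions. It assumes you have already read the version value out of a file yourself. The function names are hypothetical and not part of anything Cricsheet provides.

```python
def parse_version(value):
    """Turn a version such as "0.9" or "1.0.0" into a 3-tuple of ints.

    The value is passed through str() because a YAML parser may hand back
    a number (0.9) rather than a string ("0.9").
    """
    parts = [int(piece) for piece in str(value).split(".")]
    # Pad short versions with zeros so "0.9" compares cleanly with "1.0.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_replacements_field(data_version):
    """Hypothetical helper: True when the file is at version 0.9 or later,
    the version in which the "replacements" field was introduced."""
    return parse_version(data_version) >= (0, 9, 0)


if __name__ == "__main__":
    for version in ("0.8", "0.9", "1.0.0"):
        print(version, has_replacements_field(version))
```

Comparing tuples of integers rather than raw strings avoids surprises once versions gain a third component.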

Name updates

Part of the work I have undertaken in the last month has been a thorough update of the names used for players and officials in the data files. This consisted of two changes. The first was to consolidate cases where a player appeared under multiple names into a single name, while the second was to ensure that each name is used for only a single player. An example would be Rashid Khan, which was being used for 2 players. I have gone through all such instances and put steps in place to stop them recurring.

A long overdue addition – Women’s data

I’ve been making data available on this site since 2009 and have gradually increased the number of files I provide to the point that, as I write, 2,780 matches are available. Over time I’ve expanded from just matches involving Full Members, to the Indian Premier League, non-ODI international one-day matches, and international T20s. This gradual expansion means that I’m now providing over 380 matches involving only the Associates and Affiliates, so I’m no longer covering just the Full Members. This has been an improvement; however, there is still one issue: I’ve only been providing data for Men’s cricket.

I’ve wanted to add data for Women’s cricket for a while. I started the project with the idea of providing cricket data, but I didn’t really think of anything beyond Men’s cricket. Raf Nicholson expressed very well the trap I let myself fall into:

At its heart, it comes down to this: The first “C” in ICC has always, since its formation in 1909, stood for “cricket”, though what it should really have been called, up until it took control of women’s cricket in 2005, was the IMCC – the International Men’s Cricket Council. When a male journalist says, “I am a cricket correspondent”, he means “I am a men’s cricket correspondent.” When a blog refers to itself as an “England cricket blog”, what this generally means is “an England men’s cricket blog”. And when ordinary cricket fans say “cricket”, almost without exception what they really mean is “men’s cricket”. In short, men’s cricket is the default setting.

I very much fell into the trap of viewing Men’s cricket data as cricket data, and not considering Women’s cricket at all. This is unfair, and something I’ve been planning to fix. Men’s sports have awesome data, as Allison McCann has noted, while Women’s sport is poorly served.

And just because the data doesn’t exist doesn’t mean we can’t compile it ourselves or make estimates based on what is available. I just think that in addition to praising the virtues of men’s sports data, we need to acknowledge that good women’s sports data is severely lacking.

I’m happy to announce that, as of today, Cricsheet will finally be providing data for Women’s cricket. The initial release consists of 257 matches, comprising 148 T20Is, 69 ODIs, 37 International T20s, and 3 Test Matches, and includes matches from as far back as 2009.

The addition of Women’s data has a practical implication for the data we already provide. The Data Format has just been changed to update the version from 0.6 to 0.7, to allow for the addition of gender as a new field in the info section. Right now this field contains either female or male, but I reserve the right to have other values in the future.
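As a rough illustration of how the new field might be used, the sketch below filters and counts matches by gender. It assumes each match has already been loaded into a Python dictionary with an "info" key holding the info section. The sample data and function names are made up for the example. Because other values may appear in future, the code avoids assuming that only female and male exist.

```python
def matches_for_gender(matches, gender):
    """Return only the matches whose info section has the given gender."""
    return [m for m in matches if m.get("info", {}).get("gender") == gender]


def count_by_gender(matches):
    """Count matches per gender value, without assuming a fixed set of values.

    Files that lack the field (for example, ones written before version 0.7)
    are counted under "unknown".
    """
    counts = {}
    for match in matches:
        gender = match.get("info", {}).get("gender", "unknown")
        counts[gender] = counts.get(gender, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up stand-ins for parsed data files.
    sample = [
        {"info": {"gender": "female", "match_type": "ODI"}},
        {"info": {"gender": "male", "match_type": "ODI"}},
        {"info": {"gender": "female", "match_type": "Test"}},
        {"info": {"match_type": "ODI"}},  # no gender field present
    ]
    print(len(matches_for_gender(sample, "female")))
    print(count_by_gender(sample))
```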

The Downloads page on the site has also been updated to allow users to download Women’s or Men’s matches in all of the variations we previously provided, as well as continuing to download all matches for all genders.

As Raf Nicholson wrote, men’s cricket is the default setting, and I’ve been guilty of having that mindset. Today is a small step on the path to changing that, and to no longer viewing men’s cricket as the default.

Until that changes, we have a problem. Until that changes, I’m going to keep telling the world that I am a feminist. Cricket needs feminism. End of story.

9 new countries, 91 new matches

3 months ago, in October, we said that we would “need to give some further thought as to how we will deal with T20 matches that aren’t regarded by the ICC as ‘T20 Internationals’”.

Well, we’ve reached a conclusion, and implemented it. We’ve just updated the site with 91 new data files, including 53 international T20 matches (not T20 Internationals), and added an extra 9 countries to those we have some data for, namely Denmark, Hong Kong, Italy, Namibia, Nepal, Papua New Guinea, Uganda, United Arab Emirates, and the USA. All 53 of these matches come from the World T20 Qualifier in November 2013. The match_type used for these matches is IT20, meaning “International T20”.

What is the difference between a T20 International and an International T20?

“T20 International” and “International T20” sound like they refer to the same type of match, but there is a subtle difference. A “T20 International” is a match that is recognised by the ICC as being a full international. This means a match between Full Members or those Associates and Affiliates to whom the ICC has “granted” T20 status. An “International T20” is the name we’re using to cover all other international T20 matches, such as those involving a country that hasn’t been granted T20 status.

Confusingly, a country can play both types of T20. Ireland did so during the World T20 Qualifier, playing a “T20 International” against Canada in the group stage while playing “International T20s” in the other group matches.
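The rule described above can be sketched in a few lines. This is only our reading of it expressed as code, not anything the ICC publishes: a match counts as a “T20 International” when both sides hold T20 status, and as an “International T20” otherwise. The function name is hypothetical, and the set of teams with status in the example is illustrative only, not a real list.

```python
def classify_t20(team_a, team_b, teams_with_t20_status):
    """Label an international T20 match using the rule described above.

    teams_with_t20_status is supplied by the caller. It stands for the
    Full Members plus any Associates and Affiliates granted T20 status.
    """
    if team_a in teams_with_t20_status and team_b in teams_with_t20_status:
        return "T20 International"
    return "International T20"


if __name__ == "__main__":
    # Illustrative only: NOT a complete or authoritative list of T20 status.
    status = {"Ireland", "Canada"}
    print(classify_t20("Ireland", "Canada", status))
    # A hypothetical opponent without status, chosen purely for the example.
    print(classify_t20("Ireland", "Team without status", status))
```

The same team can end up on either side of the line depending on its opponent, which is exactly the situation Ireland found itself in.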

We don’t think the distinction should exist. If a match is played between any two countries and follows the T20 rules, we think it should have the same status as any other similar match. Sadly, the ICC disagrees, so we note the difference for accuracy.

What about One-day matches that aren’t ODIs?

We don’t currently have any non-ODI one-day matches on the site. We’ll start adding these when the World Cup Qualifiers start in a week’s time. These will slowly appear on the site with a match_type of ODM.

We will look into adding older data for these types of match too; however, finding the data will be the main problem, as always. It’s rare that ball-by-ball commentary is provided for these matches, and sadly it’s even rarer for it to be accurate when it is. We’re quite good at fixing errors by now (practice will do that) but some sources are so catastrophically bad that we have to just throw our hands up and walk away.

When is an ODI not an ODI?

A person without knowledge of the cricket world, when asked to define a “one-day international”, might say that it would be a match between two countries played on a single day under an appropriate set of rules. This apparently reasonable explanation would be incorrect. This illogical position is currently causing uncertainty as we investigate a forthcoming change of emphasis, and is one of many small signs of the inequality in the world of cricket.

As we write, there are 106 nations listed as members of the ICC: 10 Full Members, 37 Associates, and 59 Affiliates. Of those 106 members only the Full Members have the permanent right to play ODIs. 6 of the Associates/Affiliates are “granted” the right to play ODIs for a limited period, depending on how they do in the World Cricket League. The other 90 members don’t get to play ODIs, but they can play one-day matches against other countries (although rarely, if ever, a Full Member).

In June, Andrew Nixon made an observation on Twitter regarding “the lack of variety at the top of international cricket. Same teams playing each other over and over”. We responded that “It’s disturbing how many people have no idea more than 10 countries play the game, and that they don’t see that as a problem”. At that point we realised that we were falling into a variation of that mistake.

At the moment we provide data for Tests, ODIs, T20 Internationals, and IPL matches. This means that for international cricket we’re focussing exclusively on 16 out of the 106 ICC members. The vast majority of our data files feature only Full Members. We hadn’t even been attempting to include any other international matches. We’ve now decided this must change to include matches featuring all countries. This causes a dilemma for us as we have to decide how to refer to these matches on the site and within the data we provide. It may seem like a minor issue, but it’s one that is standing in the way of this expansion.

We were originally tempted to call all one-day matches between countries ODIs and to be done with it; however, we feel that there should be a way to indicate that the matches were viewed as distinct from ODIs at the time they were played. In an ideal world there would be no such distinction, but our opinion on this shouldn’t cloud our work in creating reliable data, so that option is off the table. We’re gradually coming to the view that we’ll just call each of these matches a “One-day match” and use the short code of ‘ODM’ for the match_type in the data files. It’s a small difference but it seems to fulfill our requirements.

We haven’t yet implemented this addition in any data file. We still have a number of issues to deal with first, mainly on the website, so that we can adequately distinguish between ODIs and ODMs, but also so that we can have useful coverage figures. We also need to give some further thought as to how we will deal with T20 matches that aren’t regarded by the ICC as ‘T20 Internationals’, as well as dealing with multi-day matches such as the Intercontinental Cup. Once we deal with some, if not all, of these issues we’ll look into adding matches involving the rest of the cricket world to the site. The main problem there will be in finding any source of ball-by-ball data, but we’ll worry about that problem when we can.