A long overdue addition – Women’s data

I’ve been making data available on this site since 2009 and have gradually increased the number of files I provide to the point that, as I write, 2,780 matches are available. Over time I’ve expanded from just matches involving Full Members to the Indian Premier League, non-ODI international one-day matches, and international T20s. This gradual expansion means that I’m now providing over 380 matches involving only the Associates and Affiliates, so I’m no longer covering just the Full Members. This has been an improvement; however, one issue remains: I’ve only been providing data for Men’s cricket.

I’ve wanted to add data for Women’s cricket for a while. I started the project with the idea of providing cricket data, but I didn’t really think of anything beyond Men’s cricket. Raf Nicholson expressed very well the trap I let myself fall into.

At its heart, it comes down to this: The first “C” in ICC has always, since its formation in 1909, stood for “cricket”, though what it should really have been called, up until it took control of women’s cricket in 2005, was the IMCC – the International Men’s Cricket Council. When a male journalist says, “I am a cricket correspondent”, he means “I am a men’s cricket correspondent.” When a blog refers to itself as an “England cricket blog”, what this generally means is “an England men’s cricket blog”. And when ordinary cricket fans say “cricket”, almost without exception what they really mean is “men’s cricket”. In short, men’s cricket is the default setting.

I very much fell into the trap of viewing Men’s cricket data as cricket data, and not considering Women’s cricket at all. This is unfair, and something I’ve been planning to fix. Men’s sports have awesome data, as Allison McCann has noted, while Women’s sport is poorly served.

And just because the data doesn’t exist doesn’t mean we can’t compile it ourselves or make estimates based on what is available. I just think that in addition to praising the virtues of men’s sports data, we need to acknowledge that good women’s sports data is severely lacking.

I’m happy to announce that, as of today, Cricsheet will finally be providing data for Women’s cricket. The initial release consists of 257 matches, comprising 148 T20Is, 69 ODIs, 37 International T20s, and 3 Test Matches, and includes matches from as far back as 2009.

The addition of Women’s data has a practical implication for the data we already provide. The Data Format has just been changed to update the version from 0.6 to 0.7, to allow for the addition of gender as a new field in the info section. Right now this field contains either female or male, but I reserve the right to have other values in the future.
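
For anyone reading the files programmatically, here is a minimal sketch of how the new field might be used. It assumes each match has already been parsed into a Python dictionary with an `info` section; the helper names and the sample records are hypothetical, and the fallback to `male` for files without the field is my assumption for pre-0.7 data, not part of the format.

```python
def match_gender(match, default="male"):
    """Return the gender recorded in a match's info section.

    Assumption: a file with no gender field predates version 0.7
    and covers a men's match, hence the default.
    """
    return match.get("info", {}).get("gender", default)


def filter_by_gender(matches, gender):
    """Keep only the matches whose info section records `gender`."""
    return [m for m in matches if match_gender(m) == gender]


# Hypothetical parsed matches; only the field used here is shown.
matches = [
    {"info": {"gender": "female"}},
    {"info": {"gender": "male"}},
    {"info": {}},  # no gender field, treated as a pre-0.7 file
]

womens = filter_by_gender(matches, "female")
mens = filter_by_gender(matches, "male")
```

The comparison is against the stored string rather than a fixed pair of options, so the code keeps working if other values appear in the field later.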

The Downloads page on the site has also been updated to allow users to download Women’s or Men’s matches in all of the variations we previously provided, as well as continuing to download all matches for all genders.

As Raf Nicholson wrote, men’s cricket is the default setting, and I’ve been guilty of having that mindset. Today is a small step on the path to changing that, and to no longer viewing men’s cricket as the default.

Until that changes, we have a problem. Until that changes, I’m going to keep telling the world that I am a feminist. Cricket needs feminism. End of story.

Small Steps

This site contains the initial data files I generated for a number of international matches in 2009. They’re accurate, but not in a format I’m completely happy with. Rather than wait until I work out the right format, I’m just throwing them out there and will change them as required.

Three things inspired me to do my initial work on generating data files for cricket matches: the first was the book Moneyball by Michael Lewis, about the efforts of the Oakland Athletics baseball team to use statistical analysis to build their roster; the second was a post on Pappus plane briefly mentioning a database of cricket data; the third was my discovery of the inspiring work of Aneesh at Against The Spin in providing data for numerous T20 matches.

After a brief discussion with Aneesh, I decided to put some effort into expanding on his work. Rather than going into mind-numbing detail regarding the process, I’ll simply say that I succeeded in adding further details of each wicket (who was out, how, and who was involved), better player names, and non-striker information. These additions are merely the first small steps towards the level of data I would like to see available to statisticians. My thoughts on where this may go will come at a later date.