YORK, ENGLAND - MAY 13: A gentleman studies the form at York Racecourse on May 13, 2011 in York, England. (Photo by Alan Crowhurst/Getty Images)
This is the third in a five-part series examining the role and implementation of technology within the horse racing industry. Part I featured a review of horse racing websites. Part II examined mobile applications.
Handicapping horse races, for many, is an exercise in data analysis. Raw data -- final times, splits, distance, class conditions, or odds -- is an essential part of the handicapping process. Whether you calculate your own speed or pace figures, study pedigree data, or keep charts of track times and variants, it's essential to have good, accurate data. Good data is also critical for evaluating the history of the sport - the horses, jockeys and trainers that have come before. Without objective data we are left with only subjective impressions and memories. Those are certainly important, because our subjective conclusions help to contextualize the raw data, but they're only part of the big picture.
Horse racing, perhaps more than any other sport, is an industry built on numbers and data. Odds, times, distances, weights, pace fractions, parimutuel payouts, pool handles; every year huge amounts of data are generated within the industry. But while tons of data exist in the horse racing industry, much of it, sadly, is locked away from most players and fans.
Compare horse racing, a sport that is heavily dependent on gambling dollars from handicappers, to baseball, a sport that is heavily dependent on TV revenue (which is a product of fan interest). Anybody can go to one of several websites and obtain, free of charge, statistics for every single man to ever play professional baseball. Every single one. One of my favorite sites, run by database journalist Sean Lahman, provides zipped files of complete batting and pitching statistics back to 1871, along with "fielding statistics, standings, team stats, managerial records, post-season data, and more." His site isn't the only one where you can gather this kind of data, but it's one of the best. And it's all available for free (although he asks for donations to help maintain and update the data).
Fantasy sports are estimated to have a $3 to $4 billion annual impact across the sports industry as a whole, according to a University of Mississippi study in 2005. Baseball is a huge part of that share. Would that be the case if baseball's statistics and historical records were kept under lock and key, only available to those willing to pay a fee?
While we can only speculate about how much fantasy sports, and fans' ability to access detailed statistical records, directly benefit baseball, it seems foolish to think that viewership and interest would be as high if the system were closed. What does this have to do with horse racing? If you are a player who likes to use data as part of your handicapping process, it means everything.
Currently, the lifeblood of handicapping horse races - the raw data - is sheltered from most players and fans. Compiling a database that tracks the history of the sport (as you can with baseball data) would be extremely expensive or time-consuming. You could construct the database yourself, hard-keying data from result charts in a painstakingly slow process (something I've done many times in the past). Or you could pay for track results in a more user-friendly, comma-delimited format. Chart files generally cost around $0.50 per track, per day, and while that's not a lot of money, if you want to collect a several-year history for multiple tracks the costs will quickly add up.
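To put rough numbers on how those costs add up, here is a back-of-the-envelope sketch in Python. Only the roughly $0.50 per track, per day price comes from the figure above; the function name `chart_file_cost` and the scenario (three tracks, 200 race days a year each, five years of history) are hypothetical assumptions chosen for illustration, not actual racing calendars.

```python
# Back-of-the-envelope cost of building a results archive from purchased chart files.
# Only the ~$0.50 per track, per day price comes from the article; everything else
# here (function name, scenario numbers) is a hypothetical illustration.
PRICE_PER_TRACK_DAY = 0.50  # dollars, approximate price cited in the article


def chart_file_cost(tracks: int, race_days_per_year: int, years: int,
                    price: float = PRICE_PER_TRACK_DAY) -> float:
    """Total cost of buying one chart file per track per race day."""
    return price * tracks * race_days_per_year * years


# Assumed scenario: 3 tracks, 200 race days a year each, 5 years of history.
total = chart_file_cost(tracks=3, race_days_per_year=200, years=5)
print(f"${total:,.2f}")  # prints $1,500.00
```

The total scales linearly with every input, so each extra track or extra year of history adds to the bill at the same rate; that is why a multi-track, multi-year archive gets expensive quickly even at a low per-file price.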
This isn't an argument that the Daily Racing Form, or any past performance product, should be free or that the creators of these products shouldn't sell them for a profit. Proprietary information - such as speed figures, pace figures, track variants, whatever the case may be - certainly deserves to be sold for value. Where the trouble lies is with the raw, unadulterated data generated every day at tracks across America - the results. And I'll concede that the issue isn't clear-cut; a company like Equibase incurs costs in order to gather results data at tracks across the country, costs that it probably can't recoup without some kind of fee. But at the same time, to return to the baseball analogy, someone has to physically keep score of every baseball game, and yet that data flows freely to fans.
While there are certainly differences between the business structures of baseball and horse racing, one has to wonder: could we find a better way to collect and disseminate results data to the masses? Could the system open up, making it easier for new players to enter the market? Could we generate more user-friendly data, instead of using formats that preclude efficient analysis?
Information that is available for free is generally presented in formats that make it difficult for the common fan to use it effectively, at least when trying to perform analysis on a large scale. Currently, most result information is available in PDF charts, a format that is not user-friendly for data analysis. Some tracks - like Keeneland, Turfway, Arlington and Del Mar - provide a database of all the winners at their tracks over the last several years. The database downloads easily into Excel and provides a much quicker and more efficient way to study results data. This is an excellent service for their players, although it would be even better if the data included all runners in each race and not just the winners. But that's still more than most tracks provide.
I get the sense that, in some ways, the shielding of the data -- the reason behind the wall of payments for even the most basic historical data in user-friendly formats -- is driven by a parimutuel motive: keeping that data out of the mainstream. What I mean by that is that gambling on horse racing is a game of player vs. player. One player succeeds because he either a) has more information than those who bet incorrectly, or b) is better at analyzing the same information. I wonder if there is a fear that more robust and easily attainable data would invariably lead to lower parimutuel prices on winning horses.
Again turning to the baseball analogy, let's say that every Major League Baseball box score, as well as the yearly statistics, were published only in PDF. If you wanted data files (most likely in a comma-delimited format that requires additional manipulation), you would have to pay a fee of something like $200 a year, or more. Do we think that would be a good thing for baseball? Would selling the nuts and bolts of the game - the on-field results - generate more interest in the sport, or less? It would likely severely dent fantasy baseball, as well as the countless publications and websites that provide in-depth statistical analysis of the sport on a daily basis. I don't think it's a stretch to suggest that would depress interest in the sport on some level.
Even if you don't believe that the sabermetric revolution in baseball is everything it's made out to be, I think most would admit that it's provided a fascinating way to look at the game that never existed before. Think of how many very smart people have used statistical analysis to change how fans look at the game. Could the same thing happen to horse racing if it were easier for its most die-hard fans to analyze the sport further? What could a horse racing version of Bill James or Tom Tango provide to the sport?
I think this is a problem for horse racing, and it's a bigger one than many are willing to admit. This is a sport dependent on people gambling and churning money through the windows. In order to bet, they need to handicap races. In order to handicap races, they need some kind of data to form their conclusions. The more difficult it is to obtain and understand the data, the more difficult the handicapping process becomes. Additionally, when data is available, if it's in a format that makes analysis cumbersome and time-consuming, racing loses even more value. The easier it is for handicappers to digest data, the more time they will have to do what tracks want and need them to do - bet.
Very few people within the horse racing industry have spoken out about the statistical wall in horse racing, with the exception of Ray Paulick of the Paulick Report, who penned a fantastic piece entitled "Free Our Stats" in June of 2010.*
*Mr. Paulick also used the baseball statistics analogy, but to illustrate how hard it is to compare great players from different eras, like Ken Griffey, Jr. and Hank Aaron. He noted the difficulty of comparing the greats in our sport beyond the mundane 'starts-win-place-show' and 'earnings' lines.
There may be a faint light on the horizon in the form of Trakus, although it remains to be seen how bright that light will be. Trakus, a system that captures race data from transmitters placed inside the saddle cloth of each horse, eliminates much of the subjectivity of result charts and timing. Trakus provides another source of race data although, at present, that data is no easier to access on a large scale. You can find Trakus charts for free on the company's website, but you can't download results into a more data-friendly format, or in large quantities.* Still, the advent of Trakus brings renewed hope that the statistical black box of horse racing will one day be more open and available. It also provides a much more robust and accurate system of charting results.
*You can also find Trakus charts at tracks like Keeneland, Del Mar, Woodbine, or any other track that uses the Trakus system.
The Big Picture
Statistics and data are the lifeblood of horseplayers and handicapping. As in many other areas of the industry, an open and robust system of gathering and reporting data to players and fans can only serve to increase the reach of the sport. While it may be unrealistic for horse racing to match baseball in the amount of data freely available to its fans, it's not unreasonable to expect the sport to improve the quantity and quality of its information.