Monday, March 4, 2013

More on methodology....

The core of this project is a simulated season in which 10 of the best college baseball teams of all time go head to head to determine who triumphs over the long haul of a 162-game schedule. To run it, the batting, pitching, and fielding statistics for these 10 teams were entered into a league library database for Digital Diamond Baseball, an excellent program that simulates individual baseball games using the actual statistics of individual players. Digital Diamond Baseball (or DDBB) offers a great variety of options for handling simulations, but at the core of the game is the confrontation between batter and pitcher. For you SABR-heads out there, the probability of any result (home run, strikeout, single, etc.) of such a confrontation in DDBB is based upon Bill James' Log5 method, where:

Combined event probability = ((a * b) / c) / (((a * b) / c) + ((1-a)*(1-b))/(1-c))

Where a = batter's event probability; b = pitcher's event probability; and c = league average pitcher's event probability  
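For readers who would like to try the formula themselves, here is a minimal Python sketch of the calculation as described above, with the numerator term (a * b) / c repeated in the denominator. The function name `log5` and the sample probabilities are my own illustrations, not anything taken from DDBB, and the program's internal implementation may well differ.

```python
def log5(a: float, b: float, c: float) -> float:
    """Combine a batter's (a), pitcher's (b), and league-average (c)
    probability of the same event into one matchup probability.

    Illustrative sketch only; not DDBB's actual code.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("league average c must be strictly between 0 and 1")
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must be probabilities between 0 and 1")

    favorable = (a * b) / c                      # event happens
    unfavorable = ((1 - a) * (1 - b)) / (1 - c)  # event does not happen
    return favorable / (favorable + unfavorable)


# Hypothetical numbers: a .400 hitter facing a staff that allows a .300
# average, in a league where the average is .350.
matchup = log5(0.400, 0.300, 0.350)
print(f"Expected hit probability: {matchup:.3f}")

# Sanity check: against a league-average pitcher (b == c),
# the batter's own probability comes back unchanged.
print(f"Versus an average pitcher: {log5(0.400, 0.350, 0.350):.3f}")
```

The second call shows why the league average matters: a batter only keeps his raw numbers against an exactly average pitcher, and anything better than average pulls the result down.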


This reference to the "league average" is essential because it involves normalization, or measuring every player against the reference point of the league average. For this "great teams" project, the average will not be the opposition the teams actually faced; it will be nine other very strong teams from college history. Thus, although the league sports some .400 hitters, this is somewhat less impressive when some team batting averages were in the .350 range. Furthermore, these batters will now be facing the pitching staffs of some of the best teams in history. So, expect some of the gaudy offensive numbers to be toned down. However, given the gradual shift in the nature of college baseball between the 1967 and 2001 seasons sampled here, also expect the teams from the '60s to the early '80s to have the best pitching, and the teams from the late '80s to the 2000s to have the best hitters.


The DDBB Interface, with Omaha's Rosenblatt Stadium hosting all games.

The DDBB program is tremendously flexible in how it handles simulations, but because I wished to keep the project semi-objective, I retained all default settings, other than changing the fatigue settings tied to season-long usage. This adjustment was necessary because these college teams played about 70 games per season on average, so the AB and IP totals of players who actually appeared in every game would look like limited use over a 162-game season. Thus, any usage restriction features were adjusted with a 230% multiplier (162 / 70 ≈ 2.3). All games will be managed by the computer, using lineups and rotations derived from examining box scores and season statistics for games started. Before this project plays its first game, I have to thank the sports information departments of all of these fine universities for their friendly cooperation with my requests for obscure information, as much of the material I needed was not available online or in other sources.

Well, enough of the numerical mumbo-jumbo, and on with the season!
