Thoughts about the baseball world.

Sunday, November 06, 2005

2005 Database

Those of you who follow me and/or fantasy baseball may know that I put together a database every year. It started out as simply a compilation of the typical stats, then progressed to sabermetrics, and, perhaps ironically, finally to the pinnacle of sophistication and utility, a fantasy baseball rotisserie database.

I've toyed around with formulas for the past few years, and last year I finally got serious and started using some real, hard data directly from Yahoo! fantasy leagues. This year it's the same formula, although I hope my data set is a bit more refined and accurate this time around (data from 36 randomly chosen Yahoo! leagues).

Anyhow, the basic premise is this: calculate the average delta between ranks for each rotisserie category. For example, take home runs. The league leader may have 240, the last-place team might have something like 150, and of course you have the middle 10 in between. You take the delta between each pair of adjacent ranks: the difference in home runs between first and second place, second and third place, and so on. Finding the mean of these, you arrive at the average number of home runs you would need to hit to move up a rank (and gain one point).
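To make that concrete, here is a minimal Python sketch of the average-delta calculation for a single league. The twelve home run totals are made up for illustration (only the 240 and 150 endpoints come from the example above), and the function name is my own.

```python
def average_rank_delta(totals):
    """Mean gap between adjacent ranks for one rotisserie category."""
    ordered = sorted(totals, reverse=True)  # best total first
    gaps = [hi - lo for hi, lo in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical home run totals for a 12-team league (illustration only).
league_hr = [240, 231, 225, 214, 208, 199, 193, 186, 177, 169, 158, 150]

hr_per_rank = average_rank_delta(league_hr)
print(round(hr_per_rank, 2))  # 8.18
```

Note that the adjacent gaps telescope, so for one league the mean is just (first place - last place) / (number of teams - 1), here 90 / 11. Averaging that figure over many leagues is one plausible way to arrive at a single constant per category.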

Ahh, so, with that, you can simply divide a player's total home runs (or projected total, if you're doing predictions for next season) by that home-run-per-rank constant, which gives you how many points that player's HR total is worth on average. Say that the HR constant is 9 HR/rank. If a player hits 45 home runs, that's equivalent to 45/9 = 5 points. In layman's terms, if the average delta per rank is 9 HR, then +45 HR signifies an average boost of 5 ranks (and +5 points). Thus, if a player adds 45 HRs to your team's total, it stands to reason that his contribution alone amounts to 5 points in the home run category.

So, simply calculate this for every rotisserie category, and add the results up for each player. A simple enough concept, although I must tell you it is a mountain of work to come up with these constants each year.
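As a rough sketch of the divide-and-sum step in Python: the 9 HR/rank constant is the example figure from above, while the SB constant and the stat line are hypothetical round numbers chosen to keep the arithmetic easy. The function names are mine.

```python
def category_points(total, per_rank):
    """Points one category total is worth: total divided by the per-rank constant."""
    return total / per_rank

def player_points(stats, constants):
    """Sum the per-category points over every category in the stat line."""
    return sum(category_points(stats[cat], constants[cat]) for cat in stats)

# HR constant is the example value from the text; SB is a made-up round number.
example_constants = {"HR": 9.0, "SB": 10.0}
example_stats = {"HR": 45, "SB": 20}  # hypothetical stat line

print(category_points(45, 9.0))                         # 5.0
print(player_points(example_stats, example_constants))  # 7.0
```

This only covers the counting categories; the rate categories (AVG, ERA, WHIP) need the playing-time adjustment described next.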

For the average categories, things are a bit different, because they're not solely dependent on the raw numbers. A 2.50 ERA from your starter is worth much more than a 2.50 ERA from your reliever, because your starter may have three times as many innings. I used to handle this by simply setting an average number of innings (something like 180) and adjusting based on that (so, if a player pitched 200 innings, his ERA value would be multiplied by a factor of 200/180). Needless to say, it was a wholly arbitrary system with little grounding in direct statistical calculation (although it was serviceable for pitcher-to-pitcher comparisons).

Last year, I developed a formula actually based on the Yahoo! league format, where the IP limit is set at 1250 innings per season. Take the number of innings the player has, and subtract it from the 1250. Assume that all of the remaining innings are pitched at the league average 4.something ERA, then compute the cumulative ERA (the player's innings at his ERA, the rest of the innings at league average ERA) and compare it with the league average ERA. The difference, divided by the Yahoo! ERA delta constant, is the player's points.

Again, in layman's terms: we assume a plain league average team, and then find what the team ERA would be if the player is inserted. We thus find the ERA drop (or, in cases like Jose Lima, the elevation) that the player contributes, and find out how many points this is worth by dividing by the (delta ERA)/(rank) constant. The same type of system goes for WHIP, and similarly for hitters, where AVG is calculated by inserting the hitter as the 9th man and assuming the other 8 guys hit at league average.
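Here is a small Python sketch of that ERA calculation under stated assumptions. The 1250-inning cap and the ERA constant (listed further down in this post) are from the post itself; the 4.30 league ERA is a placeholder I picked, since the exact "4.something" figure is not given, and the two pitcher lines are hypothetical.

```python
IP_CAP = 1250.0                    # Yahoo! innings limit
ERA_PER_RANK = -0.111262626262626  # ERA constant from this post
LEAGUE_ERA = 4.30                  # ASSUMPTION: placeholder league average

def blended_rate(player_rate, player_ip, league_rate, ip_cap=IP_CAP):
    """Team rate stat with the player's innings at his rate, the rest at league average."""
    rest_ip = ip_cap - player_ip
    return (player_rate * player_ip + league_rate * rest_ip) / ip_cap

def rate_points(player_rate, player_ip, league_rate, per_rank, ip_cap=IP_CAP):
    """Points from the change the player causes in an otherwise average team's rate."""
    delta = blended_rate(player_rate, player_ip, league_rate, ip_cap) - league_rate
    return delta / per_rank  # negative delta / negative constant = positive points

# Same 2.50 ERA, very different workloads (hypothetical pitchers).
starter = rate_points(2.50, 200, LEAGUE_ERA, ERA_PER_RANK)
reliever = rate_points(2.50, 70, LEAGUE_ERA, ERA_PER_RANK)
print(round(starter, 2), round(reliever, 2))
```

Weighting ERA by innings is the same as pooling earned runs and innings, so the blended figure is a true combined ERA. The same function would work for WHIP with the WHIP constant, and an AVG version would weight by at-bats instead of innings.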

Ahh, so the stats, the stats! And those constants, you ask? Well here they are.

HR: 9.01262626262626
Run: 23.3434343434343
RBI: 24.7550505050505
SB: 10.2954545454545
AVG: 0.00248989898989899
Win: 4.10606060606061
Save: 11.7121212121212
K: 47.0606060606061
ERA: -0.111262626262626
WHIP: -0.0193939393939394

Note that ERA and WHIP are negative, because better ranks are for lower stats.

So what does it all mean? Well, judge for yourself. The age-old questions are answered here, however. Yes, the HR is ever so slightly more valuable than the SB. You need 10.295 SBs per point, but only 9.013 HRs (or, if it's easier to see this way, 1 SB = 0.097 points, while 1 HR = 0.111 points). And the 20-game winner is far, far ahead of the 40-save closer, or even the 50-save closer, for that matter.
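For anyone who wants to check the arithmetic, here is a quick Python sketch using the counting-stat constants listed above. The dictionary and function names are mine; the numbers are straight from the list.

```python
# Per-rank constants for the counting categories, copied from the list above.
CONSTANTS = {
    "HR": 9.01262626262626,
    "Run": 23.3434343434343,
    "RBI": 24.7550505050505,
    "SB": 10.2954545454545,
    "Win": 4.10606060606061,
    "Save": 11.7121212121212,
    "K": 47.0606060606061,
}

def points(category, total):
    """Points that a counting-stat total is worth in one category."""
    return total / CONSTANTS[category]

print(round(points("HR", 1), 3))     # 0.111
print(round(points("SB", 1), 3))     # 0.097
print(round(points("Win", 20), 2))   # 4.87
print(round(points("Save", 40), 2))  # 3.42
print(round(points("Save", 50), 2))  # 4.27
```

These figures cover the win and save categories alone. A starter also piles up far more strikeouts and innings than a closer, which presumably widens the overall gap once K, ERA, and WHIP are added in.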

And so, the top players of 2005? Dominated by hitters, which seems like a bit of a shift from my old constants, which had guys like Johan Santana and Randy Johnson way ahead of the hitters, followed by a block of hitters before the next pitchers showed up. But then again, there were no Santanas or Johnsons this season.

  1. Alex Rodriguez, NYY, 3B - 20.397 points
  2. Albert Pujols, STL, 1B - 19.172 points
  3. Derrek Lee, CHC, 1B - 19.072 points
  4. David Ortiz, BOS, DH - 17.918 points
  5. Mark Teixeira, TEX, 1B - 17.482 points
  6. Manny Ramirez, BOS, OF - 16.840 points
  7. Jason Bay, PIT, OF - 16.170 points
  8. Alfonso Soriano, TEX, 2B - 15.662 points
  9. Miguel Cabrera, FLA, OF - 15.587 points
  10. Chris Carpenter, STL, SP - 15.582 points

No real surprises here. Alex Rodriguez certainly takes the cake as an absolutely complete player overall. Pujols and Lee are, perhaps more surprisingly, up there in the same range, with Pujols aided by an out-of-nowhere speed surge (16 SBs, after having no more than 5 in any of his previous 4 years), and Lee by his breakout power season (46 HRs, compared to a previous career high of 32). After those three, there is a significant drop-off to the next group. Alfonso Soriano ranks extremely high for a second baseman, which should show up once I calculate the position constants (essentially the deviation from the average of the top 12 players at that position; 12 because 12 players, one for each of 12 teams, are needed at that position, except for the OF and P positions).

The clear trend, however, is toward power hitters. The top 3 guys are power hitters, with a bit of speed, for sure, but directly after them come three straight power-only hitters. It is not until the 12th-ranked player, Carl Crawford, at 15.438 points, that we find our first non-power hitter.

That's it for now. Will update later, as I get the position constants calculated, and get into predicting 2006 stats. A downloadable version (Excel file) may be posted in the future, although I'm toying with the idea of actually publishing the thing.

