The Problem With WAR

1490 · August 18, 2015

Greetings, I am 1490, I am a four-year lurker for Twins Daily, and this is my first post.

Why now? Well, frankly, I feel like I can actually contribute to a valuable discussion that I don’t think has really been had since I’ve been lurking (perhaps it has and I missed it, but I digress). As I was browsing B-Ref the other day, I stumbled across something that perturbed me quite a bit about WAR.

I like advanced metrics. I like them a lot. I believe they have had a fantastic impact on the sport and are beginning to force the average fan to consider baseball matters more thoroughly in order to keep up (in other words, they force people to use their brains more, which is always good). But I’ve never been super crazy about WAR. I like the idea behind WAR, but I believe it needs to be replaced, or at least adjusted, as the “end-all statistic” of a player’s worth.

Why?

Well, the whole idea of WAR is based on how much a team is bound to suffer by losing a player and replacing him with a replacement-level player. This implies that there is some uniform value for a replacement player. It may fluctuate from year to year or by position, but over the course of a single year, it should be uniform. In my mind, this implies that in any given season, there is a uniform standard for how many wins a team composed entirely of replacement-level talent would have. Perhaps that team goes 50-112. Perhaps 65-97. Perhaps something else. Whatever. But in any event, different teams should not have different ideas of what a “replacement level player” looks like.

Well, the 2015 Minnesota Twins currently have a cumulative team bWAR of 7.4 and sit at a .500 record of 59-59 (I’ll round and call that 7 WAR). In other words, if the Twins consisted entirely of replacement-level players, they would have a record of 52-66 (extrapolated to a full season, 71-91). OK, so I should be able to check a team with a record of 52-66, and it should have a cumulative team total of zero WAR. Luckily enough, the Boston Red Sox have exactly that record. However, not only does their cumulative team total not equal zero WAR, it equals 15.6 WAR, over twice the Twins’ total. So a “replacement level team” for the Bosox would currently have a record of 36-82 (or 49-113 over 162 games).
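For anyone who wants to check that arithmetic, here is a minimal Python sketch of the method used above: subtract the rounded team WAR from actual wins, then scale to 162 games. The function name `replacement_record` is made up for illustration, and the WAR figures are simply the ones quoted in this post, not anything pulled from B-Ref programmatically.

```python
def replacement_record(wins, losses, team_war, season_games=162):
    """Record a team would have with zero WAR, per the method in this post.

    Returns two (wins, losses) tuples: the record to date and the
    full-season extrapolation.
    """
    games = wins + losses
    # Round WAR to whole wins first, as the post does (7.4 -> 7, 15.6 -> 16).
    repl_wins = wins - round(team_war)
    # Scale the partial-season win total up to a full season.
    full_wins = round(repl_wins * season_games / games)
    return (repl_wins, games - repl_wins), (full_wins, season_games - full_wins)


twins = replacement_record(59, 59, 7.4)
red_sox = replacement_record(52, 66, 15.6)

print("Twins:  ", twins)    # ((52, 66), (71, 91))
print("Red Sox:", red_sox)  # ((36, 82), (49, 113))
```

The sketch only reproduces the subtraction described here; it assumes the WAR totals fed into it are complete team totals.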

So to recap, the record of a replacement-level team in the 2015 season could be 71-91, or it could be 49-113.

I say that is an unacceptable disparity.

Those two teams were the only ones I checked (I can check more if need be, and probably will to satisfy my own curiosity), but hopefully it is enough to illustrate the point I am trying to make (that WAR must be either fixed or discarded).

How can this problem be solved? Well, I have ideas, but I’ll save them for another post (this one is long enough already).

Thoughts?

sane · August 19, 2015

WAR!, huh yeah
What is it good for?
Absolutely nothing, oh hoh, oh
WAR! huh yeah
What is it good for?
Absolutely nothing, say it again y'all
WAR!, huh good God, y'all!

JustinCB · August 19, 2015

For one thing, you're only accounting for batting WAR for each team. If you add Boston's pitching WAR (7) you get 22.6 for the team. If you add the Twins' team pitching WAR (10.7), you get 18.1 for them.

Second thing, team WAR isn't intended to match up perfectly with team win-loss record (which shows past performance only), it's meant to fall roughly in the range between team record and pythagorean record (which is a function of runs scored and allowed, and which is a better predictor of future performance.)

Pythag W%=[(Runs Scored)^1.81]/[(Runs Scored)^1.81 + (Runs Allowed)^1.81]
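As a rough sketch, that formula can be written in Python like this. The function name `pythag_win_pct` and the run totals in the example calls are made up for illustration; only the 1.81 exponent comes from the formula above.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=1.81):
    """Pythagorean winning percentage with the 1.81 exponent quoted above."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical run totals, not real team data.
even_team = pythag_win_pct(500, 500)   # scores as many as it allows
good_team = pythag_win_pct(550, 500)   # outscores its opponents

print(f"{even_team:.3f}")
print(f"{good_team:.3f}")
```

Multiplying the result by games played gives the "pythag wins" figure used in the comparison that follows.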

If you plug that in for Boston (.450), you'd expect their WAR-implied win total to fall somewhere between their 52 actual wins and 53.1 pythag wins. B-Ref sets replacement level at a .320 winning percentage (roughly 38 replacement wins over the 118 games played so far), so if you add Boston's team WAR to that, you get about 60. For Minnesota, a .478 pythag works out to 57 wins compared with their actual 59, and team WAR gives you 56.
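Here is a small sketch of that last step, assuming the .320 replacement level and the batting plus pitching WAR totals quoted in this post. The helper name `war_implied_wins` is hypothetical.

```python
REPLACEMENT_WIN_PCT = 0.320  # replacement level as quoted in this post


def war_implied_wins(games_played, team_war, repl_pct=REPLACEMENT_WIN_PCT):
    """Wins a team 'should' have: replacement-level wins plus total team WAR."""
    return repl_pct * games_played + team_war


GAMES_SO_FAR = 118

# Batting WAR + pitching WAR, using the figures from this thread.
boston = war_implied_wins(GAMES_SO_FAR, 15.6 + 7.0)
twins = war_implied_wins(GAMES_SO_FAR, 7.4 + 10.7)

print(round(boston), round(twins))  # 60 56

# Actual wins minus WAR-implied wins: negative means underperforming.
print(round(52 - boston, 1))  # Boston
print(round(59 - twins, 1))   # Minnesota
```

Both teams are measured against the same 118-game replacement baseline here, so the gap between them comes from the WAR totals rather than from two different ideas of replacement level.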

It's not perfect by any means, but it is useful as a rough tool for comparing the value of different players with a single metric. The Twins are right about where you'd expect them to be as a team based on WAR, and Boston is fairly significantly underperforming theirs, but like any other statistic there are almost always outliers.

1490 · August 19, 2015

Thanks for the response. I don't know how I missed that, but that does make it a lot better. It seemed like 7.4 was pretty low...

That also makes me feel better that there is some type of standard of how many wins a team of replacement level players ought to have.

Still, can't we avoid those outliers and things that kinda mess up the idea of WAR? Why not simply calculate a team's total WAR based on their record compared to that .320 winning percentage, then divvy it up appropriately among players? Wouldn't that fix outliers like Boston?

Jham · August 19, 2015

Really well thought out post. Other posters know more about WAR than me, but I'm glad someone other than me is challenging our stats guys to come up with better stats and add interpretation to their comparisons. Also, there shouldn't be different ways to calculate the same stat...

kab21 · August 19, 2015

WAR has its problems but you have completely misused WAR and a team's record. According to nearly every advanced metric the Twins are very lucky to have a .500 record so you can't just subtract off X number of wins to find what a replacement level team would be. I think you would get a more consistent number if you used base runs for a team's record for example.

http://www.fangraphs.com/library/team-record-pythagorean-record-and-base-runs/

jimmer · August 19, 2015

From Dave Cameron:

'WAR is context neutral, so it doesn’t account for any kinds of sequencing, which can have a big impact on W-L record; just ask the A’s about that. It’s also a linear model, where team run scoring is not linear, so at the team level, it’s better to use something like BaseRuns.'

'WAR isn’t “inaccurate” because it doesn’t align with team win-loss records perfectly. It isn’t designed to, because team W-L records include a lot of things that aren’t player-skills.'

Secondary User · August 19, 2015

1490 wrote: "Still, can't we avoid those outliers and things that kinda mess up the idea of WAR? Why not simply calculate a team's total WAR based on their record compared to that .320 winning percentage, then divvy it up appropriately among players? Wouldn't that fix outliers like Boston?"

Sequencing. It makes a huge difference in observed results.

Edit: And as I post this, I see the post directly above mine quotes Dave Cameron's talk about sequencing

JustinCB · August 19, 2015

1490 wrote: "Still, can't we avoid those outliers and things that kinda mess up the idea of WAR? Why not simply calculate a team's total WAR based on their record compared to that .320 winning percentage, then divvy it up appropriately among players? Wouldn't that fix outliers like Boston?"

Are you talking about just taking any given team's number of wins above replacement level and arbitrarily dividing them among the players? It's still fundamentally an individual statistic, and its usefulness is in what it tells you about a given player compared to another, with a single number derived from various measures of a player's performance. You can extrapolate team performance based on cumulative player WAR, but it doesn't work the other way around.

1490 · August 19, 2015

JustinCB wrote: "Are you talking about just taking any given team's number of wins above replacement level and arbitrarily dividing them among the players? It's still fundamentally an individual statistic, and its usefulness is in what it tells you about a given player compared to another, with a single number derived from various measures of a player's performance. You can extrapolate team performance based on cumulative player WAR, but it doesn't work the other way around."

Arbitrarily? No, of course not. It would be divvied up more along the lines of Bill James' Win Shares (he still doesn't allow free use of that statistic, correct?)

If WAR is a context-free, individual statistic, then it needs to be measured using something that is individually quantifiable (or at least a number that isn't intrinsically baseball-related). Wins are not individually quantifiable, because individual players don't win games, teams do.

VORP seems like it captures my idea better than WAR (runs are a lot more individually quantifiable than wins, though not totally). Why isn't VORP more widely used? I'm not as familiar with it.

Otto von Ballpark · August 19, 2015

RAR is the runs based version of WAR. 10 runs in that metric is approximately 1 win.

1490 · August 19, 2015

Otto von Ballpark wrote: "RAR is the runs based version of WAR. 10 runs in that metric is approximately 1 win."

This makes a lot more sense as a measure for player ability, but what is the reasoning behind 10 runs ~ 1 win?

markos · August 19, 2015

1490 wrote: "VORP seems like it captures my idea better than WAR (runs are a lot more individually quantifiable than wins, though not totally). Why isn't VORP more widely used? I'm not as familiar with it."

Using wins instead of runs is useful because the run environment changes from season to season, so it is hard to compare players from different eras if you use RAR or VORP. So just think of it as a handy conversion factor, just like how people adjust for inflation when comparing economic data.

Otto von Ballpark · August 19, 2015

1490 wrote: "This makes a lot more sense as a measure for player ability, but what is the reasoning behind 10 runs ~ 1 win?"

It's based on the Pythagorean formula.
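A quick numerical sketch of why that works out: take an average team, add 10 runs scored, and see how many extra wins the Pythagorean formula gives over a full season. The 4.5 runs per game figure is an assumed round number for a league-average team, not a measured value, and the 1.81 exponent is the one quoted earlier in this thread.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=1.81):
    """Pythagorean winning percentage (exponent as quoted in this thread)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162
RUNS_PER_GAME = 4.5                # assumed league-average scoring rate
avg_runs = RUNS_PER_GAME * GAMES   # 729 runs scored and allowed

# An exactly average team projects to 81 wins.
baseline_wins = pythag_win_pct(avg_runs, avg_runs) * GAMES
# The same team with 10 extra runs scored.
boosted_wins = pythag_win_pct(avg_runs + 10, avg_runs) * GAMES

extra_wins = boosted_wins - baseline_wins
print(f"10 extra runs is worth about {extra_wins:.2f} wins")
```

Under those assumptions the answer comes out very close to one win. Changing `RUNS_PER_GAME` shifts the result, which is why the runs-to-wins conversion moves a bit with the run environment.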
