Debating WAR

DocBauer · August 16, 2014

I'd love to invite debate about, and explanation of, WAR: its impact and its reality.

I'm no expert on advanced metrics, and I admit this freely.

I think they have their place...at least to some degree...especially in a sport that has long been established in, and obsessed with, statistics to quantify performance and success. Is there anything in baseball that isn't measured? Forget the obvious, such as BA, OBP, power numbers, RBIs, or situational hitting for position players, or ERA and SO-to-BB ratios for pitchers...we now measure everything from BABIP to home/road and day/night BA. Now we even try to turn the rule-book strike zone (which is NOT followed), pitching performance, umpires' strike calling, and catchers framing pitches to each umpire's obscure and poorly interpreted zone into a measurable statistic.

(Shaking head)

I think we sometimes hear or read certain modern statistics and just nod our heads without considering the very nature of those statistics. WAR, for instance. Someone prints/says/claims a player is worth plus or minus so many WAR and we just go "ooh" or "ahh". Once in a while we might even exclaim, "ah-ha!"

I just read an article on another website about Danny Santana. You know, the top 20/10 prospect, depending on whose prospect list you read: loaded with athletic talent and potential, needing to harness his ability consistently to amount to anything, but with real potential. He's the speedy, talented SS with flashes of pop/power, range, and a good arm who opened the 2014 season at AAA, his first ever season at that level. A lack of OF/CF depth, brought about by some FO mismanagement, thrust him into the mostly unfamiliar role of playing CF AT THE ML LEVEL.

And while everyone was surprised, enthralled, but certain of almost immediate regression/failure...the kid hit, and hit, and then hit some more. He bunted, stole bases, got extra base hits, and generally sparked a mundane lineup. When he went on the DL, I think most of us thought the magic carpet ride was probably over. Except, he came back playing and hitting as well as ever.

Now, I don't pretend to have a crystal ball to tell Santana's future, but said statistical experts claim he is worth 1.9 WAR. So here is where I get confused. Despite making mistakes, he has mostly been OK in CF and has gotten better as the season progresses. He has continued to be a huge spark for the Twins' somewhat limited offense. And I don't think I'm the only one who noticed a dip in production when he was out. Yet, the Twins' overall record notwithstanding, experts state that this surprising spark plug has given the Twins roughly TWO more wins than an average ML player could provide. Am I missing something here?

I'm really interested if someone can explain a measurable of this nature. I hate to sound naive, because I'm not. And at the end of the day, baseball is still a team game, and a roster wins and loses together. But have statistics grown to such an absurd level that a single player can be measured to the point of actually predicting wins and losses for a team?

USAFChief · August 16, 2014

If you're interested in learning how WAR is calculated, go to Fangraphs and read the explanation:

http://www.fangraphs.com/library/misc/war/

Baseball Reference does WAR a bit differently:

http://www.baseball-reference.com/about/war_explained.shtml

If it's a debate about its value, you might google it...there are dozens of articles out there discussing the subject.

If it's opinions from TD'ers you're looking for, my personal opinion is WAR is so flawed as to be worthless. I am probably in the minority with that opinion.

One minor point about the question you pose in your last sentence...WAR isn't intended to be predictive, it's intended to be a record of what happened.

kab21 · August 16, 2014

WAR is a good quick and dirty way to value a player.

There are too many flaws for it to be used as a definitive value. For one, it's very difficult to put defensive value, offensive value, positional value, and pitching value onto the same scale. Inevitably, when these are combined, they result in a questionable answer.
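To make the "same scale" point concrete: the FanGraphs WAR library page linked earlier in the thread describes position-player WAR as several run components (batting, baserunning, fielding, positional adjustment, league adjustment, replacement runs) added together and then divided by a runs-per-win figure. The sketch below assumes that structure; the component values are invented, and the 10 runs per win is a rule-of-thumb assumption, not the exact FanGraphs constant for any season.

```python
# Sketch of the FanGraphs-style position-player WAR structure.
# All inputs are in runs; the numbers used below are hypothetical.
# RUNS_PER_WIN = 10 is a rule-of-thumb assumption (it varies by season).

RUNS_PER_WIN = 10.0


def position_player_war(batting, baserunning, fielding, positional,
                        league, replacement, runs_per_win=RUNS_PER_WIN):
    """Add the run components together, then convert runs to wins."""
    total_runs = batting + baserunning + fielding + positional + league + replacement
    return total_runs / runs_per_win


# Same hypothetical hitter, modest glove vs. a big fielding number.
base = position_player_war(batting=10.0, baserunning=2.0, fielding=5.0,
                           positional=2.5, league=0.5, replacement=20.0)
big_glove = position_player_war(batting=10.0, baserunning=2.0, fielding=15.0,
                                positional=2.5, league=0.5, replacement=20.0)

print(base, big_glove)  # 4.0 5.0
```

In this sketch every extra 10 runs of fielding value moves WAR by a full win, which is why a noisy fielding number can swing the total.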

Another big issue is that WAR is often used with samples that are too small. Santana has only played 62 games so far. Offensive stats need 1+ seasons of data to be useful. Defensive stats need roughly 3x that (a ballpark number) to come close to being useful.

One thing that I find odd is that pitching WAR tries to eliminate luck from the equation by using Fielding Independent Pitching (at FanGraphs), while offensive WAR doesn't care if someone has had a historically lucky season. For example, Santana currently has a .395 BABIP. Joe Mauer has one of the highest active career BABIPs at .348, so it's safe to assume that Santana's BABIP is going to drop a lot. His WAR looks good while he has an .840 OPS, but if/when his BABIP drops to .320ish, his OPS is going to drop below .700. Still a valuable player, but not great like his current WAR says.
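The BABIP argument can be sketched with the standard formula, BABIP = (H - HR) / (AB - K - HR + SF). The stat line below is invented for illustration (counts chosen to land near .395); it is not Santana's actual line.

```python
# BABIP = (H - HR) / (AB - K - HR + SF)
# Hypothetical counts, NOT a real player's stat line.


def balls_in_play(ab, k, hr, sf):
    """At-bats that ended with the ball in play, plus sacrifice flies."""
    return ab - k - hr + sf


def babip(h, hr, ab, k, sf):
    return (h - hr) / balls_in_play(ab, k, hr, sf)


h, hr, ab, k, sf = 70, 4, 220, 50, 1
bip = balls_in_play(ab, k, hr, sf)   # 167
current = babip(h, hr, ab, k, sf)    # about .395

# Hits on balls in play that disappear if the same 167 balls in play
# had fallen in at a .320 rate instead.
hits_lost = (h - hr) - 0.320 * bip

print(f"{current:.3f}", round(hits_lost, 1))  # 0.395 12.6
```

Over roughly 220 at-bats, losing about a dozen hits is a drop of more than 50 points of batting average, which is the mechanism behind the OPS drop described above.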

TheLeviathan · August 16, 2014

It's a cute little stat to look at, but defensive value is so subjective I'd much rather consult other stats to judge value.

jay · August 16, 2014

To clarify one thing from the original post, it's 2 Wins Above Replacement (WAR) level... NOT above ML average. That's a big difference. Plenty of articles out there on how replacement level is defined.

The concept is great -- combine all of the contributions a player makes in different aspects of the game and give it a value. It's relatively easy to understand. It's one number to tell you how good a player was. It's not perfect, but certainly a massive improvement over gawd-awful stats like RBIs and saves to value a player.

notoriousgod71 · August 16, 2014

Well, according to FanGraphs, Alex Gordon is the second-highest position player in WAR this season, even though no GM would ever draft him second. Jason Heyward is tenth. Ben Zobrist is 11th.

Just when you begin to think the only thing that matters in WAR is defense you get to Yan Gomes at 24 who throws every other ball into center field.

All of these guys are rated higher than Miguel Cabrera.

My other least favorite statistic is xFIP.

Some uppity FanGraphs writer dared to call me an idiot for not agreeing with him that Strasburg was having the best season of any pitcher this season (at the time he was leading in xFIP). At no point during this season has Strasburg been close to the best pitcher in baseball, when you have Kershaw, Wainwright, Cueto, Hernandez, and now Kluber joining the party.

kab21 · August 16, 2014

I rather like xFIP for forward projection. It's not perfect for a season in review but ERA is pretty bad also. Strasburg has had a much better season than his 3.50 ERA suggests. Saying that he has had a better season than Felix or Kershaw is silly though.
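For readers following the xFIP exchange: FIP weights home runs, walks plus hit batters, and strikeouts, and xFIP swaps the pitcher's actual home runs for his fly balls times a league-average HR/FB rate. The sketch assumes innings are given as a decimal (100.1 IP in box-score notation would need converting to 100 1/3 first) and uses a hypothetical constant of 3.10 and a league HR/FB of 10%; the real FanGraphs constant and HR/FB rate are recalculated every season, and the pitching line is invented.

```python
# FIP and xFIP sketch. The constant and league HR/FB rate are
# assumptions; the pitching line is made up.

FIP_CONSTANT = 3.10      # assumed; puts FIP on an ERA-like scale
LEAGUE_HR_PER_FB = 0.10  # assumed league-average home runs per fly ball


def fip(hr, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """Fielding Independent Pitching; ip must be decimal innings."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, bb, hbp, k, ip, lg_hr_fb=LEAGUE_HR_PER_FB,
         constant=FIP_CONSTANT):
    """Same as FIP, but home runs are replaced by an expected count."""
    expected_hr = fly_balls * lg_hr_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher: 100 IP, 12 HR allowed on 100 fly balls.
line = dict(bb=25, hbp=3, k=110, ip=100.0)
print(round(fip(hr=12, **line), 2), round(xfip(fly_balls=100, **line), 2))  # 3.3 3.04
```

This hypothetical pitcher allowed more homers than the assumed league rate predicts, so his xFIP comes in below his FIP. That gap is what makes xFIP handy for projection, and also why it can flatter a pitcher in a season review.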

TheLeviathan · August 16, 2014

I rather like xFIP for forward projection. It's not perfect for a season in review but ERA is pretty bad also. Strasburg has had a much better season than his 3.50 ERA suggests. Saying that he has had a better season than Felix or Kershaw is silly though.

WAR is a good example of the kind of stat that becomes its own worst enemy when people use it in too heavy-handed a manner. Just like xFIP in this instance.

Thrylos · August 16, 2014

There are a lot of advanced statistical measurements out there, and most of them were built for a particular reason, but a whole bunch of them are frequently used incorrectly. Chief's links about WAR are great reading.

There are 2 main categories of stats: Rate stats & Cumulative stats. Think Batting Average & ERA vs Home Runs and Strikeouts.

WAR is a cumulative stat and is best used in retrospect (and best in comparison with contemporaries, even though it was meant to be era-neutral) to shed some light on discussions like whether someone belongs in the Hall of Fame, who should be the MVP, who was the best overall player for a particular team, etc. It has about as much predictive value as HRs, RBIs, and Es.

As a cumulative stat, if a player has played 62 games (like Santana), it is very likely that his WAR will be less than that of someone who has played 117 games (like Dozier). Comparing WARs in this manner does not make much sense, because that is not what the measurement was devised to do...
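One rough way to put a cumulative number next to a rate is to rescale it to a full season. This is a rescaling, not a projection, and it keeps all of the small-sample noise. The 1.9 WAR comes from the opening post and the 62 games from an earlier reply, so the two figures may not be from the same date.

```python
# Convert a cumulative stat into a per-162-game rate so that players with
# different playing time can at least be put side by side.

GAMES_PER_SEASON = 162


def war_per_162(war, games):
    return war / games * GAMES_PER_SEASON


santana_pace = war_per_162(1.9, 62)
print(round(santana_pace, 1))  # 5.0
```

Dozier's figure would be rescaled the same way with his 117 games; a 5-win pace built on 62 games still says little on its own.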

Same with comparisons (even though they are really, really enticing) between position players and pitchers based on WAR. Pitcher WAR and position player WAR are built from totally different individual components, so they are not equivalent. It is like having a discussion about which car is the most powerful and comparing horsepower numbers with torque numbers.

Mike Sixel · August 16, 2014

My favorite part of this is the belief that people that watch a few games know more about defensive value than people that watch every single play, chart every play, analyze every play, and score every play. The defensive stats are not precise, but they are better than any other option we have.

WAR is cumulative, and backward looking........

TheLeviathan · August 16, 2014

My favorite part of this is the belief that people that watch a few games know more about defensive value than people that watch every single play, chart every play, analyze every play, and score every play. The defensive stats are not precise, but they are better than any other option we have.

WAR is cumulative, and backward looking........

"Not precise" is not the issue. It's largely subjective. I prefer stats, cumulative or rate, that are objective.

clone52 · August 16, 2014

I may be wrong, but 1.9 WAR is NOT 2 wins above an average MLB player. It's 2 wins above a replacement-level player: 2 wins above the average AAA player who could take his spot. In theory, anyway.

jorgenswest · August 16, 2014

In partial season samples, the rate stats are meaningful. Strike out rate, walk rate, ground ball rate and fly ball rate. For batters, ISO also becomes meaningful by 200 plate appearances. WAR describes the past but is not that useful in projecting the future.

When comparing Suzuki's rates in those areas to his career norms, a change in pattern can be seen. His change in performance is not just due to luck. His strikeout rate is a career low and his walk rate a career high. His fly ball rate and isolated power are career lows. He has become a different hitter. His rate stats do a much better job than WAR of both projecting and describing, but they are much harder to translate into a value. I would be surprised if WAR gets much consideration in any team's analytics department. The basic rate stats play a key role in data analysis.

jay · August 16, 2014

Well, according to FanGraphs, Alex Gordon is the second-highest position player in WAR this season, even though no GM would ever draft him second. Jason Heyward is tenth. Ben Zobrist is 11th.

Just when you begin to think the only thing that matters in WAR is defense you get to Yan Gomes at 24 who throws every other ball into center field.

All of these guys are rated higher than Miguel Cabrera.

This is exactly how NOT to use WAR, if you ask me... So-n-so has 4.24 WAR this year and This-Guy has 4.21 WAR so So-n-so is better at baseball and that's really dumb cause everyone I know thinks This-Guy is better at baseball.

Unfortunately, it happens all the time.

USAFChief · August 16, 2014

This is exactly how NOT to use WAR, if you ask me... So-n-so has 4.24 WAR this year and This-Guy has 4.21 WAR so So-n-so is better at baseball and that's really dumb cause everyone I know thinks This-Guy is better at baseball.

Unfortunately, it happens all the time.

If I shouldn't use it to compare players, which is what it's designed to do, and which the designers say it does, what should it be used for?

And why should I pay any attention to it?

drjim · August 17, 2014

To me WAR represents a quick and dirty snapshot of how a player performed relative to his position and taking into account other contexts like ballpark. I also think it can provide a nice look at players across eras. A nice tool, but like most stats needs a lot of other information to say anything definitive.

I am especially skeptical of the defensive metrics and quantifying them in the same way offensive counting numbers are quantified. This, and the definition of replacement player and how it distorts some positional values (especially in small sample sizes), strike me as the biggest weaknesses of the stat.

The other key thing, in my mind, is to not pay too much attention to the tenths place. It is better to think of it as a range and not as a completely accurate order.

All that said, I very rarely look at WAR.

TheLeviathan · August 17, 2014

If you don't mind drjim, the next time I disagree with someone about the value of WAR....I'm using your post. It's spot on.

jay · August 17, 2014

If I shouldn't use it to compare players, which is what it's designed to do, and which the designers say it does, what should it be used for?

And why should I pay any attention to it?

My comment is more about the overly precise level at which it is commonly used. I think that comes from the tradition of ranking players by the traditional counting stats (i.e., leading the league in RBIs). If the main complaint is accuracy, why insist on saying the SS ranked 11th by WAR was definitively better than the SS ranked 12th? I think it is accurate enough to show the range where a player's output across the full game fits, which is a massive improvement over simply looking at standard offensive counting stats.

Also, comparing quite commonly actually means predicting. The comment about who would get drafted where is a perfect example. Alex Gordon being 2nd in WAR this season among position players is a view of his performance this season. It is not saying GMs would predict him to do that going forward and hence draft him 2nd in some mythical all-player draft.

DocBauer · August 17, 2014

If I shouldn't use it to compare players, which is what it's designed to do, and which the designers say it does, what should it be used for?

And why should I pay any attention to it?

Kind of my point, I guess.

USAFChief · August 17, 2014

My comment is more about the overly precise level at which it is commonly used. I think that comes from the tradition of ranking players by the traditional counting stats (i.e., leading the league in RBIs). If the main complaint is accuracy, why insist on saying the SS ranked 11th by WAR was definitively better than the SS ranked 12th? I think it is accurate enough to show the range where a player's output across the full game fits, which is a massive improvement over simply looking at standard offensive counting stats.

Also, comparing quite commonly actually means predicting. The comment about who would get drafted where is a perfect example. Alex Gordon being 2nd in WAR this season among position players is a view of his performance this season. It is not saying GMs would predict him to do that going forward and hence draft him 2nd in some mythical all-player draft.

Fair enough. My bad.

kab21 · August 17, 2014

Alex Gordon is basically the poster child for my two main hesitations about using WAR as a definitive ranking tool:

A) Sample size: defensive data does have value, but multiple seasons are needed for it to have any value at all. Gordon is a good LF defensively, but he's having one of the best defensive seasons in MLB this year (not taking into account positional adjustment). Typically his defense rates a little above average, so he benefits from a small sample size defensively.

B) Quantifying defensive and offensive value (and pitching) using the same scale: defense is important, but I have never really agreed with FanGraphs WAR about how much value is assigned defensively. Does Heyward's great defense in LF really boost his average bat that much? IMO Freeman is an infinitely more valuable player, but he will at best match Heyward's WAR averaged over multiple seasons.

Thegrin · August 17, 2014

WAR

What is it good for ?

Absolutely nothing.

TheLeviathan · August 17, 2014

Does Heyward's great defense in LF really boost his average bat that much? IMO Freeman is an infinitely more valuable player, but he will at best match Heyward's WAR averaged over multiple seasons.

Wow...I was having almost that exact debate with someone the other day. I used Heyward as an example of a guy whose WAR vastly overstates his significance. Couldn't agree with your post more.

jay · August 17, 2014

Quantifying defensive and offensive value (and pitching) using the same scale: defense is important, but I have never really agreed with FanGraphs WAR about how much value is assigned defensively. Does Heyward's great defense in LF really boost his average bat that much? IMO Freeman is an infinitely more valuable player, but he will at best match Heyward's WAR averaged over multiple seasons.

I agree with most of your post, but this part seems both accurate and inaccurate.

Offense and defense are measured on the same scale -- runs. At the end of the day, scoring 1 run and preventing 1 run accomplishes the exact same thing so that part certainly makes sense. However, simply being on the same scale doesn't mean they have the same influence.

When you look at the wRAA leaderboard (the batting component of WAR), the leaders are around 40 runs above average (RAA) and the top 30 are all above 16. The UZR leaders (the defensive component of WAR) are around 25 runs (Gordon and Heyward), but the top 30 are only above 5. That pretty well shows it is easier to generate WAR with the bat than with the glove.

Somewhat ironically, it also highlights that the UZR numbers over this sample size are unreliable. Surely, Gordon and Heyward aren't 5x better on defense than the defenders at the bottom of the top 30. Until defense valuations are more accurate over smaller samples, it's my opinion that the UZR contributions to WAR should be weighted to account for less of the value.

kab21 · August 17, 2014

I agree with most of your post, but this part seems both accurate and inaccurate.

Offense and defense are measured on the same scale -- runs. At the end of the day, scoring 1 run and preventing 1 run accomplishes the exact same thing so that part certainly makes sense. However, simply being on the same scale doesn't mean they have the same influence.

When you look at the wRAA leaderboard (the batting component of WAR), the leaders are around 40 runs above average (RAA) and the top 30 are all above 16. The UZR leaders (the defensive component of WAR) are around 25 runs (Gordon and Heyward), but the top 30 are only above 5. That pretty well shows it is easier to generate WAR with the bat than with the glove.

Somewhat ironically, it also highlights that the UZR numbers over this sample size are unreliable. Surely, Gordon and Heyward aren't 5x better on defense than the defenders at the bottom of the top 30. Until defense valuations are more accurate over smaller samples, it's my opinion that the UZR contributions to WAR should be weighted to account for less of the value.

I'm not saying that a run scored (offense) isn't equal to a run saved (defense).

The reason putting them on the same scale doesn't really work is that a player's offense doesn't translate directly into a runs-scored number (other than the obvious) and first must go through all kinds of mathematics to output a number. Defensive stats first face difficulty in accurately measuring plays made that are better or worse than the average defender. But the even bigger issue in putting offense and defense on the same scale (runs) is that you need to mathematically determine how many plays made equal one run.

The Wise One · August 17, 2014

Way back when, there was what was called reliability and margin of error. Baseball statisticians do not talk about the margin of error, which goes down as the sample size grows. There is some number of ABs and games played at which defensive metrics, and hence WAR, would have some validity. I am sure that for the OF it would take a larger number of games than for, say, a shortstop. For longer careers there would be something closer to validity than there would be for half seasons. Understanding what the statistic measures is the first key. Knowing its reliability would be the second.
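The margin-of-error point can be illustrated with batting average, the simplest case. The sketch treats each at-bat as an independent trial with the same hit probability (a simplifying assumption) and uses a hypothetical .270 hitter; defensive metrics are not this simple, but the same principle, that error shrinks with the square root of the sample, is why they need far more games.

```python
import math

# Margin of error for a batting average, treating each at-bat as an
# independent trial with the same hit probability (a simplifying
# assumption). z = 1.96 gives an approximate 95% interval.


def batting_avg_margin(avg, at_bats, z=1.96):
    standard_error = math.sqrt(avg * (1 - avg) / at_bats)
    return z * standard_error


for ab in (200, 600):
    print(ab, round(batting_avg_margin(0.270, ab), 3))
# 200 0.062
# 600 0.036
```

Tripling the at-bats only shrinks the margin by a factor of about 1.73 (the square root of 3), so even a full season leaves a hypothetical .270 hitter with a band of roughly 35 points either way.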

stringer bell · August 17, 2014

I've always found WAR misses the boat when comparing players who play different positions. Here are two ex-Twins, one to whom WAR is kind and one to whom it is not: Nick Punto (36, with 3700+ PAs), career WAR of 14.9; Michael Cuddyer (35, with 5600+ PAs), career WAR of 15.3. Cuddy makes over $10M per year, Punto $3M. Per game played and per plate appearance, Punto has been more "valuable" according to WAR.
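The per-plate-appearance comparison works out as follows when career WAR is scaled to a 600-PA season. The "3700+" and "5600+" figures are lower bounds, so these rates slightly overstate both players.

```python
# Career WAR scaled to a 600-plate-appearance season, using the
# figures quoted in the post above (PA counts are lower bounds).

def war_per_600_pa(war, pa):
    return war / pa * 600


punto = war_per_600_pa(14.9, 3700)
cuddyer = war_per_600_pa(15.3, 5600)
print(round(punto, 2), round(cuddyer, 2))  # 2.42 1.64
```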

Thegrin · August 17, 2014

Cuddyer was hurt and Span replaced him. Span was hurt and Revere replaced him.

In real life it all depends on the quality of the replacement.

drivlikejehu · August 17, 2014

The problem with defense is that you can't just ignore it. The difference in run-prevention between a gold glove SS and an immobile 1B is massive. Over a long period of time, the defensive fluctuations tend to even out and allow for comparisons across positions and eras. Sometimes the numbers don't produce the answer someone wants, but most likely it's because the person has misjudged player value due to their own biases.

For small sample use, WAR is still a fine starting point, from which adjustments can be made subjectively if desired.

About xFIP there is no question - it is a very handy metric. In a small sample size it is, by far, the best way to quickly evaluate pitcher performance.

jay · August 17, 2014

Way back when, there was what was called reliability and margin of error. Baseball statisticians do not talk about the margin of error, which goes down as the sample size grows. There is some number of ABs and games played at which defensive metrics, and hence WAR, would have some validity.

Right on! The FanGraphs primer on UZR talks about how a single season of UZR needs to be regressed by half and stresses the need to regress over and over (quote below). On the other hand, offensive stats generally become reliable within less than a season's worth of data.

...a player’s UZR, be it one year, one month or 5 years, is not necessarily what happened on the field and is not necessarily that player’s true talent level over that period of time either. That is why we regress, regress, and regress! A player can have a plus UZR and have played terrible defense, because the data we are using is far from perfect. It is exactly the same with offense and pitching. Do not for a second think that that is a unique problem with defensive metrics. It is not! The more data we have, however, the less likely the gap between UZR and what actually happened, and the smaller the gap between UZR and that player’s true defensive talent. And once we regress the sample numbers appropriately, we essentially shrink those gaps to zero, although there is still uncertainty with regard to the regressed number itself.

I wish FG would provide WAR values that are regressed instead of raw. You'd get rid of the big flaw people love to point out -- players who have single season values highly influenced by their unreliable defensive value.
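A minimal sketch of the regressed value wished for above, assuming "regressed by half" means keeping half the distance between the observed UZR and league average (0 runs), and assuming 10 runs per win. This is an illustration of the idea, not how FanGraphs computes anything; the 25-run input is the rough Gordon/Heyward UZR figure mentioned earlier in the thread.

```python
# Shrink an observed stat toward a mean. "Regress by half" here means
# keep half of the distance from the mean. For UZR the mean is 0.

def regress(observed, fraction=0.5, mean=0.0):
    """Move `observed` toward `mean` by `fraction` of the gap."""
    return mean + (1 - fraction) * (observed - mean)


RUNS_PER_WIN = 10.0  # rule-of-thumb assumption, varies by season

raw_uzr = 25.0
regressed_uzr = regress(raw_uzr)
war_change = (regressed_uzr - raw_uzr) / RUNS_PER_WIN
print(regressed_uzr, war_change)  # 12.5 -1.25
```

Under these assumptions, a league-leading single-season UZR would cost its owner about a win and a quarter of WAR once regressed.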
