Providing independent coverage of the Minnesota Twins.
Debating WAR

126 replies to this topic

#41 TheLeviathan

TheLeviathan

    Twins News Team

  • Members
  • 13,240 posts

Posted 17 August 2014 - 12:44 PM

Using UZR to project the future needs a good sample. But the numbers still describe the past, just as a player's OPS over a half-season sample does. Neither is adequate for projection in that sample. WAR describes past value. Defense should be part of that value.

 

Jay answered this earlier. (Last post, first page)  OPS becomes a fairly accurate measure of past performance much more quickly than UZR or, by extension, WAR.  So while it's true that UZR is a reflection of the past, it's a far less accurate one than OPS.


#42 drjim

drjim

    Senior Member

  • Members
  • 7,754 posts
  • LocationSt. Paul

Posted 17 August 2014 - 12:53 PM

Right, WAR describes what did happen. I'm not sure I get the "what it is" argument. It is a measure that attempts to value players on offense and defense. Are people using it for something else? Lots of people seem to dwell on things other than "we think this guy is better than this other guy" based on offensive and defensive measures that are the best we have available to us today. 

 

I would ask, again, what is a better measure? Should we just throw out defense entirely, or should we take UZR and other measures as directional?

 

I'll try again with defense:

 

1. The measure is inexact. This isn't really in dispute; the people who designed UZR admit as much. It probably has some value as a snapshot, but the numbers cannot be taken as exact, especially in small samples, and really in any sample smaller than three full years.

 

2. The second problem is how it is quantified and measured on par with offensive stats. This is not to say defense has no value; it certainly does. But it is problematic, in light of the previous point, to try to quantify defense into a value that can be compared to offense with any sort of reliability. This also doesn't take into account outlier numbers, especially at corner positions, or the fact that they really don't know how to properly quantify catchers or 1B.

 

Defense just seems to factor in so much more than offense - such as teammates, pitching styles, shifts, etc. These surely skew the numbers, especially in smaller samples of a season or less.

 

To answer your questions, I would pretty much get rid of WAR as it is currently construed. I would keep defensive, offensive and baserunning statistics separate and to be judged on their own merit. I wouldn't get rid of UZR or other defensive metrics, but I would keep hammering home the point that they are best as snapshots and ranges, not definitive ranks. And I would especially be leery of any efforts to quantify them into the same common value as offensive runs and then think it is a definitive stat.

 

WAR is probably the *best* measure only in the sense that there are no others. But it is seriously flawed and doesn't tell us much without looking at many other stats. There is value in comparing across positions and across eras, but that is a quick measure that should not be seen as definitive.

Papers...business papers.

#43 Mike Sixel

Mike Sixel

    Now living in Oregon

  • Members
  • 22,482 posts

Posted 17 August 2014 - 12:59 PM

I agree with all of that, except one part. The offensive numbers also are skewed by things out of control of the hitter, but people seem to dismiss luck and other things on d, without acknowledging the luck in o.

But we agree completely that UZR is not precise. I think everyone that is serious about stats agrees with that.
  • Mike Frasier Law likes this

I don't know, it is a site to discuss sports, not airline safety.....maybe we should take it less seriously?


#44 TheLeviathan

TheLeviathan

    Twins News Team

  • Members
  • 13,240 posts

Posted 17 August 2014 - 01:03 PM

I agree with all of that, except one part. The offensive numbers also are skewed by things out of control of the hitter, but people seem to dismiss luck and other things on d, without acknowledging the luck in o.

But we agree completely that UZR is not precise. I think everyone that is serious about stats agrees with that.

 

"Luck" on offense stabilizes fairly quickly, but even if you want to equate luck on offense and defense, it says nothing of the difficulty of quantifying defense.  That's the problem, not luck.


#45 USAFChief

USAFChief

    Anyone got a smoke?

  • Twins Mods
  • 18,445 posts
  • LocationTucson

Posted 17 August 2014 - 01:44 PM

I agree with all of that, except one part. The offensive numbers also are skewed by things out of control of the hitter, but people seem to dismiss luck and other things on d, without acknowledging the luck in o.

But we agree completely that UZR is not precise. I think everyone that is serious about stats agrees with that.

To me, the problem with comparing offense and defense and attributing it to "luck" is this:

 

While it most certainly is true to say that (to pick a name at random) Pedro Florimon was "lucky" to string together a good month and hit .400, it is not at all true to say it didn't really happen.  No matter how lucky it was, we know for a fact he hit .400 and the value of that cannot be disputed.  He had 100 ABs, got 40 hits, and hit .400.  

 

Nobody contends he didn't really hit .400.  It represents exactly what happened on the field.

 

Compare that with what the designers of UZR say...a good or bad UZR doesn't mean that's what happened:

 

...a player’s UZR, be it one year, one month or 5 years, is not necessarily what happened on the field and is not necessarily that player’s true talent level over that period of time either. That is why we regress, regress, and regress! A player can have a plus UZR and have played terrible defense, because the data we are using is far from perfect.

 

If it doesn't represent what happened on the field, I have a hard time giving it any credit.  If we're going on opinions, I'll go with the opinions of professional baseball people, or myself, rather than MGL.

 

You can't have a .400 BA and have hit poorly.  Luckily maybe, but not poorly.  Those hits actually happened.

 

But if you can have a plus UZR, and "played terrible defense," the measurement system needs work.

 

EDIT (meant to include this with the original post):  As for regression,  I'd say if you're "regressing" data to represent what happened in the past, you're doing it wrong.  I can understand regressing data to guess at what might happen in the future.  But what happened, happened.  It shouldn't need to be regressed to more closely represent truth, should it?  Does Baseball Reference regress Joe Mauer's 2009, because it probably doesn't represent his "true talent level?"  No.  Nor should it.  

 

While I'm at it...I'll take a shot at the "UZR is accurate in large sample sizes" argument as well.  (Unsurprisingly) I don't buy that argument.  If it's inaccurate in small sample sizes, it is inaccurate in larger sample sizes as well.  Adding up several small bunches of inaccurate data gives me one big bunch of inaccurate data, IMO.  

 

Not to mention, if WAR is being used to compare players across eras, there isn't PBP data for any players except those from the last decade or so, so "WAR" for Babe Ruth, for example, is derived without even the flawed data used to derive WAR for today's players.

 

Personally, I don't understand how smart people pay any attention to it.

  • notoriousgod71, TheLeviathan and Hosken Bombo Disco like this

I am not the paranoid you're looking for.


#46 jay

jay

    Senior Member

  • Members
  • 1,504 posts

Posted 17 August 2014 - 05:39 PM

While I'm at it...I'll take a shot at the "UZR is accurate in large sample sizes" argument as well.  (Unsurprisingly) I don't buy that argument.  If it's inaccurate in small sample sizes, it is inaccurate in larger sample sizes as well.  Adding up several small bunches of inaccurate data gives me one big bunch of inaccurate data, IMO.  

 

This is the only part I wouldn't agree with.  The data is mostly accurate, though certainly nowhere near perfect, and should form a bell curve around the player's true talent level.  Getting the first 3 data points on one far end of the bell curve doesn't mean 300 data points aren't going to look like a bell curve.  

 

The key assumption there is that the data is in fact a bell curve around the player's true talent level.  When you look at how players are rated by the defensive metrics and compare it to the scouting approach or other methods, I think they are in agreement enough to believe that assumption.
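To put rough numbers on that bell-curve argument, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the true talent of 0, the noise standard deviation of 10, and the function names are made up, and it has nothing to do with how UZR is actually computed. It only shows that if the noise really is centered on true talent, the average of 300 observations wanders far less than the average of 3.

```python
import random
import statistics

def average_rating(true_talent, noise_sd, n_plays, rng, bias=0.0):
    """Mean of n_plays noisy observations centered on true_talent + bias."""
    observations = [rng.gauss(true_talent + bias, noise_sd) for _ in range(n_plays)]
    return statistics.fmean(observations)

def spread_of_averages(n_plays, trials=1000, seed=0):
    """Standard deviation of the average rating across many simulated samples."""
    rng = random.Random(seed)
    # Hypothetical player: true talent 0, per-observation noise SD of 10.
    averages = [average_rating(0.0, 10.0, n_plays, rng) for _ in range(trials)]
    return statistics.pstdev(averages)

small_sample_spread = spread_of_averages(3)
large_sample_spread = spread_of_averages(300)
print(small_sample_spread, large_sample_spread)
```

The `bias` argument is there to make the other side's point as well: a nonzero bias shifts every observation the same way, so no amount of extra data averages it out. Larger samples only fix random noise, not systematic error.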


#47 Craig Arko

Craig Arko

    Cassini-Huygens

  • Members
  • 7,689 posts
  • LocationSaturn's atmosphere and on Titan

Posted 17 August 2014 - 06:02 PM

"A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue.

A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?"
“Segui il tuo corso et lascia dir le genti!” - Dante Alighieri.

#48 B Richard

B Richard

    Senior Member

  • Members
  • 530 posts

Posted 17 August 2014 - 06:23 PM

There are some well-reasoned, cogent explanations of WAR's value and shortcomings in this thread. In particular, I believe jay, kab21 and drjim have all provided some insight for those who might be interested in learning more about WAR and advanced statistics.

 

Some important kernels that bear repeating:

 

-No metric is perfect- each statistic carries its own blind spots. Some statistics have more blind spots than others. 

 

-All statistics are susceptible to misuse when small sample size or confirmation bias exist. Look at this thread for an example: http://twinsdaily.co...h-numbers-2014/

 

-Some statistics are better for describing what happened in the past, while others have more predictive value.

 

--------

 

The present clamoring against WAR's defensive component is not at all unfounded. 

 

For those of you who are frustrated with the difficulty of comparing player defense and WAR, consider another statistic I believe is much simpler-- wRC+

 

wRC+ is a metric used to measure offensive production. wRC+ considers individual events (how much value a single has compared to a double or home run) and weights them accordingly. Finally, wRC+ measures a player's offensive output against LEAGUE average, not replacement level, and presents it as x% better or worse than league average. A player with a wRC+ of 140 is currently producing runs at a rate 40% better than league average. It is a rate stat rather than a counting stat, so be wary of SSS. wRC+ accounts for park factors and is a reliable way to compare players' offense from different eras.
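As a rough illustration of the "weight each event, then compare to league average" idea, here is a simplified sketch. It is not the FanGraphs formula: the event weights are hypothetical placeholders, the park adjustment is a bare division, and the function names are invented for this example. See the FanGraphs explanation for the real calculation.

```python
# Illustrative event weights (hypothetical values, not published linear weights).
WEIGHTS = {"walk": 0.7, "single": 0.9, "double": 1.25, "triple": 1.6, "home_run": 2.0}

def weighted_rate(events, plate_appearances):
    """Weighted offensive value per plate appearance."""
    total = sum(WEIGHTS[name] * count for name, count in events.items())
    return total / plate_appearances

def simple_index(player_rate, league_rate, park_factor=1.0):
    """Scale a park-adjusted rate so that league average equals 100."""
    return 100.0 * (player_rate / park_factor) / league_rate

# A hitter producing at 0.42 per PA in a league averaging 0.30 per PA.
index = simple_index(0.42, 0.30)
print(round(index))  # 140, i.e. 40% better than league average
```

Because the result is a rate scaled to 100, a hot two weeks can produce a huge number, which is the small-sample caution mentioned above.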

 

wRC+ is absolutely my go-to when I need to figure out how good a player's offense really is. Of course, a SS with a wRC+ of 120 is probably more valuable than a first baseman with a wRC+ of 120. In conjunction with other stats, wRC+ is a great way to measure player value. I think it's a good starting point for anyone interested in sabermetrics. 

 

See link for a thorough yet accessible explanation of wRC+:  http://www.fangraphs...ry/offense/wrc/

  • USAFChief, Mike Frasier Law, chuchadoro and 1 other like this

#49 TheLeviathan

TheLeviathan

    Twins News Team

  • Members
  • 13,240 posts

Posted 17 August 2014 - 06:38 PM

I like wRC+, largely because it does many of the things WAR does well without the crummy defensive metrics.


#50 markos

markos

    Senior Member

  • Members
  • 1,115 posts

Posted 17 August 2014 - 08:09 PM

Compare that with what the designers of UZR say...a good or bad UZR doesn't mean that's what happened:

 

...a player’s UZR, be it one year, one month or 5 years, is not necessarily what happened on the field and is not necessarily that player’s true talent level over that period of time either. That is why we regress, regress, and regress! A player can have a plus UZR and have played terrible defense, because the data we are using is far from perfect.

 

If it doesn't represent what happened on the field, I have a hard time giving it any credit.  If we're going on opinions, I'll go with the opinions of professional baseball people, or myself, rather than MGL.

 

You can't have a .400 BA and have hit poorly.  Luckily maybe, but not poorly.  Those hits actually happened.

 

But if you can have a plus UZR, and "played terrible defense," the measurement system needs work.

 

I'm not sure that the problem you have identified is as bad as it sounds. UZR splits every batted ball into a bucket based on location and some simple hit trajectory/speed factors. These buckets aren't terribly granular. Each bucket is assigned a certain run value based on the historical probability of a hit or out being recorded. For each play UZR does represent what happens in the field in the sense that it records whether or not an out was made and assigns runs (positively or negatively) accordingly. The "problem" is that UZR doesn't necessarily capture the context of the play to dynamically adjust the value of the play due to circumstances. For example, you can imagine that due to positioning (ex. playing deep and shifting for a power hitter), an outfielder makes a very easy catch on a ball that UZR determined was very difficult historically. This is where the disconnect happens between the UZR rating and "what happened on the field", and why a player can have a plus UZR but not play very good defense. I don't think this is a huge deal, because in many ways this is similar to what happens in offense. A batter gets credit for a hit regardless if the hit was a weak grounder that squeaked through the infield or if it was a solid line drive. So as long as the fielder is making plays that have traditionally been difficult, then that seems valuable to me.
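A stripped-down sketch of that bucket idea, under loud assumptions: the out probabilities, the single run value of 0.8, and the `play_credit` function are all hypothetical, and real UZR has many more adjustments than this. The only point is that each play is scored against the historical out rate for its bucket, not against how hard the play actually looked.

```python
def play_credit(out_probability, run_value, out_made):
    """Runs credited for one play, relative to the historical average fielder.

    out_probability: historical chance that balls in this bucket become outs.
    run_value: assumed run difference between an out and a hit in this bucket.
    """
    if out_made:
        # Credit is larger when the bucket is rarely converted into an out.
        return (1.0 - out_probability) * run_value
    # A missed play costs more when the bucket is usually converted.
    return -out_probability * run_value

# (out_probability, run_value, out_made) for three hypothetical plays.
plays = [
    (0.25, 0.8, True),   # "difficult" bucket, out made
    (0.95, 0.8, True),   # routine bucket, out made
    (0.95, 0.8, False),  # routine bucket, out missed
]
total = sum(play_credit(p, rv, made) for p, rv, made in plays)
print(round(total, 2))  # -0.12
```

An outfielder who was already positioned deep gets the full "difficult" credit on the first play even if the catch was easy, which is exactly the disconnect described above.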

 

Overall, I think UZR is okay. Not great, but okay. I'm very excited and optimistic about the Trackman features that they will be rolling out soon. That should provide enough data to answer some of these fielding questions much more conclusively. 

 

One thing that WAR is missing is the quality of the competition. A home run hit off of Clayton Kershaw in Dodger Stadium seems like it should be much more valuable than a homer off of 2013 Phil Hughes in Yankee Stadium. I'm not sure if there is a reasonable way to include that information, but I think it would be important. 

  • Mike Frasier Law and twinsfan34 like this

#51 USAFChief

USAFChief

    Anyone got a smoke?

  • Twins Mods
  • 18,445 posts
  • LocationTucson

Posted 17 August 2014 - 09:16 PM

"A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue.
A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?"


According to son number 2, 41%.
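For anyone who wants to check the 41%, this is a plain Bayes' rule calculation using only the numbers in the puzzle (15% Blue cabs, a witness who is right 80% of the time). The function and variable names are just for illustration.

```python
def posterior_blue(prior_blue=0.15, witness_accuracy=0.80):
    """Probability the cab was Blue, given that the witness said Blue."""
    prior_green = 1.0 - prior_blue
    # Witness says Blue and the cab really is Blue.
    says_blue_and_is_blue = prior_blue * witness_accuracy
    # Witness says Blue but the cab is actually Green (a misidentification).
    says_blue_but_is_green = prior_green * (1.0 - witness_accuracy)
    return says_blue_and_is_blue / (says_blue_and_is_blue + says_blue_but_is_green)

print(round(posterior_blue(), 4))  # 0.4138
```

The witness is right most of the time, but Green cabs are so much more common that mistaken "Blue" calls on Green cabs (0.17) outnumber correct "Blue" calls on Blue cabs (0.12).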
  • Craig Arko and twinsfan34 like this

I am not the paranoid you're looking for.


#52 Hosken Bombo Disco

Hosken Bombo Disco

    Minnesota Twins

  • Members
  • 7,337 posts

Posted 17 August 2014 - 10:28 PM

"A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue.
A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?"

Fine, but what does this have to do with a train leaving Cleveland at 75 miles per hour?
  • Craig Arko likes this

It's a mere moment in a man's life between the All Star Game and an old timer's game. - Vin Scully


#53 Hosken Bombo Disco

Hosken Bombo Disco

    Minnesota Twins

  • Members
  • 7,337 posts

Posted 17 August 2014 - 10:56 PM

It's why Alex Gordon is laughably two wins better than Miguel Cabrera this year.  Something no rational person would agree is an accurate valuation.

Not a big WAR supporter, but how many times did Gordon put out Mauer on balls hit to left this weekend?

And in a strange way, provided Kansas City reaches the postseason ahead of Detroit, I might even buy in to Gordon being two wins better than Cabrera. Like Chief was getting at, things that actually happen on the field go a long way with me.

It's a mere moment in a man's life between the All Star Game and an old timer's game. - Vin Scully


#54 Badsmerf

Badsmerf

    Senior Member

  • Members
  • 2,608 posts

Posted 17 August 2014 - 10:56 PM

I should be better at statistics questions, considering I have a BS in mathematics... My first hunch is 80. I know it can't be, though, because the actual percentage is too low. So my next hunch is 12, but that seems too low.

Edited by Badsmerf, 17 August 2014 - 11:10 PM.

  • Craig Arko likes this
Do or do not. There is no try.

#55 nicksaviking

nicksaviking

    Billy G.O.A.T

  • Twins Mods
  • 10,539 posts

Posted 17 August 2014 - 11:03 PM

Let's not make agreeing with each other too much of a trend.

 

Ha, yes! I was just going to post that we seem to have finally found a topic that Twins Daily can agree on!


#56 Craig Arko

Craig Arko

    Cassini-Huygens

  • Members
  • 7,689 posts
  • LocationSaturn's atmosphere and on Titan

Posted 18 August 2014 - 12:59 AM

According to son number 2, 41%.


He did his papa proud. What does he think about WAR?
  • chuchadoro likes this
“Segui il tuo corso et lascia dir le genti!” - Dante Alighieri.

#57 USAFChief

USAFChief

    Anyone got a smoke?

  • Twins Mods
  • 18,445 posts
  • LocationTucson

Posted 18 August 2014 - 06:05 AM

What does he think about WAR?


Friend only to the undertaker.
  • ashburyjohn, Kirby_waved_at_me and Craig Arko like this

I am not the paranoid you're looking for.


#58 Kirby_waved_at_me

Kirby_waved_at_me

    Senior Member

  • Members
  • 2,141 posts

Posted 18 August 2014 - 06:36 AM

I'm enjoying reading this conversation - I think there are good points for both sides of the argument.

 

I think WAR is an interesting stat to look at in hindsight, but I agree with the criticism that taking less than accurate defensive metrics and lumping them into the equation is problematic.

 

The last couple MVP debates have used WAR a lot to compare Miggy and Mike Trout. Trout gets a boost for playing CF (which, ya know, I think he should, at least a little bit) and Cabrera just hit and hit and hit - average, power, and taking advantage of his teammates being on base in front of him.

 

The career WAR stat is kind of hard to rely upon as well - It's not recalculated, is it? It looks like they take each season's WAR and just call the sum the career WAR.

If the Defensive metrics have to be adjusted over the years . . . Wouldn't the "Career" WAR need to be a different math problem than simple addition? 

What about a guy like Biggio or Yount, who played in CF for chunks (but not all) of their career, compared to Edgar Martinez or Big Papi, who get no defensive boost - can't really compare the Career WAR of those players. Or Joe Mauer, who now has to be a different hitter entirely to be a 5 WAR player since he's not getting the benefit of doing what he does as a Catcher anymore.

 

I think that stats are cool. WAR really only compares a player to his own shadow, and not really to another real player. You can make the argument that a 10+ WAR season like Mickey Mantle in the 50's or Babe Ruth or Barry Bonds is better than pretty much anyone else. But if there are two players, say one with a season WAR of 5.3 and another with a season WAR of 4.9, it's going to be harder to say that 5.3 is for certain the better player that year. There's a lot more to account for, including a margin of error for the defensive portion, what position the guy plays, etc. etc. 


#59 TheDean

TheDean

    Senior Member

  • Members
  • 238 posts

Posted 18 August 2014 - 08:58 AM

The career WAR stat is kind of hard to rely upon as well - It's not recalculated, is it? It looks like they take each season WAR and just called the sum the career WAR.

If the Defensive metrics have to be adjusted over the years . . . Wouldn't the "Career" WAR need to be a different math problem than simple addition? 

What about a guy like Biggio or Yount, who played in CF for chunks (but not all) of their career, compared to Edgar Martinez or Big Papi, who get no defensive boost - can't really compare the Career WAR of those players. Or Joe Mauer, who now has to be a different hitter entirely to be a 5 WAR player since he's not getting the benefit of doing what he does as a Catcher anymore.

 

The thing with WAR (fWAR or rWAR) is that the defensive component is still based on "runs" (either Ultimate Zone Runs or Defensive Runs Saved, respectively), so they accumulate with playing time.  There's no need to "recalculate" because you're just summing totals just like career RBIs or HRs or HBPs or whatever.  The only difference is that a below-average defensive year can result in a negative value, whereas traditional counting stats cannot (no such thing as a negative HBP... I think!).  All the aforementioned flaws of UZR and DRS (and Total Zone) remain, but just wanted to make that point.

 

Also, the position changes are already taken care of by assigning the bonus runs on a per-inning basis.  Mauer earned a lot of "bonus" catching runs earlier in his career, but he no longer does.
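A small sketch of both points, with made-up numbers: the per-season values and the 10-run positional bonus are hypothetical, and the 1,458-inning full season (162 games times 9 innings) is my assumption for the example, not a figure quoted from either WAR implementation.

```python
FULL_SEASON_INNINGS = 1458  # assumed: 162 games x 9 innings

def positional_runs(runs_per_full_season, innings_played):
    """Prorate a position's bonus (or penalty) by innings actually played there."""
    return runs_per_full_season * innings_played / FULL_SEASON_INNINGS

def career_total(season_values):
    """Career value is a plain sum, so a negative season simply subtracts."""
    return sum(season_values)

# Half a season at a position with a hypothetical +10 run full-season bonus.
half_season_bonus = positional_runs(10.0, 729)
print(half_season_bonus)  # 5.0

# Three hypothetical seasons, one of them below zero.
print(career_total([5.0, -0.5, 3.0]))  # 7.5
```

Since the bonus is earned inning by inning, a player who moves off a premium position just stops accruing it going forward. Nothing from earlier seasons has to be recalculated.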

 

If you agree with the premise of WAR, you can still compare Biggio to Papi, understanding that Biggio offered value in being able to "run" faster than 2 MPH at an all out sprint and do things with baseballs in his hand or glove to prevent the other team from scoring, and that that component of his game should be included in his "value" as represented by WAR to his team(s).  WAR has flaws like all stats, but the point of creating it was to give us a starting point for comparisons between players as different as Ortiz and Biggio.  We all have our own opinions on whether we think it succeeds at its intentions!


#60 kab21

kab21

    Senior Member

  • Members
  • 4,443 posts

Posted 18 August 2014 - 09:02 AM

 

 

USAFChief

 

 

While I'm at it...I'll take a shot at the "UZR is accurate in large sample sizes" argument as well.  (Unsurprisingly) I don't buy that argument.  If it's inaccurate in small sample sizes, it is inaccurate in larger sample sizes as well.  Adding up several small bunches of inaccurate data gives me one big bunch of inaccurate data, IMO.

 

 

 

Offensive stats are inaccurate for the same reasons in small sample sizes. It would not be a fair conclusion to say that offensive stats are also inaccurate in larger sample sizes.

 

I know that you do not trust defensive stats at all but when I look at multiple years of data for players I don't find a lot of surprises. Usually they match up with observations. Looking at 1/2 or 1 season of data isn't much different than looking at 1-2 months of offensive data. All sorts of players have great hitting months even if they aren't very good.

 

I also disagree that a player can have a great UZR rating in a small sample size while playing terrible defense. It's not as reliable as hitting but that player made a lot of difficult plays and avoided missing easy ones. Of course it doesn't take into account defense after the initial batted ball (tags, scooping the ball at 1B or relay throws) but a player with a great UZR rating did make a lot of plays.

Edited by kab21, 18 August 2014 - 09:08 AM.

Is 2016 2017 the year that a good pitching prospect is truly blocked by 5 good pitchers in the starting rotation? 

Looks like we will have to wait another year until a good pitching prospect is actually blocked.