Debating WAR

TheDean · August 18, 2014

Offensive stats are inaccurate for the same reasons in small sample sizes. It would not be a fair conclusion to say that offensive stats are also inaccurate in larger sample sizes.

I know that you do not trust defensive stats at all, but when I look at multiple years of data for players I don't find a lot of surprises. Usually they match up with observations. Looking at half a season or one season of data isn't much different from looking at one or two months of offensive data. All sorts of players have great hitting months even if they aren't very good.

I also disagree that a player can have a great UZR rating in a small sample size while playing terrible defense. It's not as reliable as hitting stats, but that player made a lot of difficult plays and avoided missing easy ones. Of course, it doesn't take into account defense after the initial batted ball (tags, scooping the ball at 1B, or relay throws), but a player with a great UZR rating did make a lot of plays.

I mean, even in small samples it does a decent job for the 2014 Twins. Guess who has the worst UZR ratings: Kubel, CC, and Hammer. Who played good defense? Fuld, Florimon, Plouffe 2.0, and Escobar. The big small-sample-size surprise is Dozier, playing pretty much league-average 2B according to UZR. I'm also a bit surprised by how many negative UZR runs Hicks accumulated this year, but his defense has been criticized here before.

ashbury · August 18, 2014

Offensive stats are inaccurate for the same reasons in small sample sizes.

I think they are not quite on the same level, because the granularity of offensive stats actually contains several individual components. Since most plate appearances consist of several pitches, the batter has several opportunities to get himself out on a slider in the dirt or to mash a pitch that catches a little more of the plate than the pitcher intended. Each fielding chance, by contrast, is essentially "make this play, or not". A few cans o' corn for one center fielder versus long runs to the gap for another center fielder take a good deal more time to even out, as a result.

If you rated a batter based on a 100-pitch sample, you'd probably have roughly the same SSS trouble as with a 100-chance sample for a fielder. (Give or take.)
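To make the small-sample point concrete, here is a minimal Python simulation, assuming every chance is an independent make-or-miss trial with the same true success rate (a simplification, since real chances vary in difficulty). The 90% rate and the sample sizes are hypothetical.

```python
import random

def observed_rates(true_rate, chances, trials, seed=2014):
    """Simulate `trials` players who share the same true success rate and
    return the success rate each one actually shows over `chances` attempts."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        made = sum(1 for _ in range(chances) if rng.random() < true_rate)
        rates.append(made / chances)
    return rates

def spread(rates):
    """Gap between the luckiest and unluckiest simulated player."""
    return max(rates) - min(rates)

# Hypothetical: identical 90% fielders, judged on 100 vs. 2,000 chances.
small = observed_rates(0.90, 100, 500)
large = observed_rates(0.90, 2000, 500)
print(f"100 chances:  spread {spread(small):.3f}")
print(f"2000 chances: spread {spread(large):.3f}")
```

Every simulated player has exactly the same skill, yet the gap between the best-looking and worst-looking player is far wider over 100 chances than over 2,000.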

Badsmerf · August 18, 2014

He did his papa proud. What does he think about WAR?

I haven't been able to get 41 yet. I was driving and trying to figure it out while everyone was sleeping yesterday. Now, working through it, I am still not able to get 41. What the hell. I'm getting pissed.

The part that is throwing me off is the 80 percent chance thing. Without caring to look up the answers, it should be .15 x .8, coming out to .12. I know it's too low. So then I thought, "Well, at a 50% chance of being right, the chance would be 15%." So where does it go from there? Now I'm more confused and am contemplating getting out my calc-based statistics book. Help my sanity and just do that ****ing problem for me.

Craig Arko · August 18, 2014

I haven't been able to get 41 yet. I was driving and trying to figure it out while everyone was sleeping yesterday. Now, working through it, I am still not able to get 41. What the hell. I'm getting pissed.

The part that is throwing me off is the 80 percent chance thing. Without caring to look up the answers, it should be .15 x .8, coming out to .12. I know it's too low. So then I thought, "Well, at a 50% chance of being right, the chance would be 15%." So where does it go from there? Now I'm more confused and am contemplating getting out my calc-based statistics book. Help my sanity and just do that ****ing problem for me.

There is a 12% chance (15% times 80%) of the witness correctly identifying a blue cab.

There is a 17% chance (85% times 20%) of the witness incorrectly identifying a green cab as blue.

There is therefore a 29% chance (12% plus 17%) the witness will identify the cab as blue.

This results in a 41% chance (12% divided by 29%) that the cab identified as blue is actually blue.
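Those four steps are Bayes' rule. A minimal Python sketch of the same arithmetic, using the puzzle's 15% share of blue cabs and the witness's 80% accuracy:

```python
def posterior_blue(p_blue, p_correct):
    """P(cab is blue | witness says blue), via Bayes' rule."""
    p_green = 1 - p_blue
    true_positive = p_blue * p_correct          # blue cab, correctly called blue
    false_positive = p_green * (1 - p_correct)  # green cab, wrongly called blue
    return true_positive / (true_positive + false_positive)

print(round(posterior_blue(0.15, 0.80), 3))  # 0.414
print(round(posterior_blue(0.50, 0.80), 3))  # 0.8
```

With a 50/50 fleet the answer would simply be the witness's 80% accuracy; it is the 15% base rate of blue cabs that drags the answer down to about 41%.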

Badsmerf · August 18, 2014

I feel better now. Thanks for that.

Kirby_waved_at_me · August 18, 2014

Ah - but where does the 'grit' of the witness get factored in?

diehardtwinsfan · August 18, 2014

When someone says you need multiple seasons' worth of data in order to get "accurate" data, I become a bit skeptical. That's where I am with UZR. This was the same metric that said that Carlos Gomez was saving 80 runs in a season in CF, and while he's a good CF, UZR seems to miss pretty big when it misses. If that's the main basis of WAR, I think we have a problem. I also question any stat where the best pitchers in the league are rated lower than the best hitters.

USAFChief · August 19, 2014

Offensive stats are inaccurate for the same reasons in small sample sizes.

...

I also disagree that a player can have a great UZR rating in a small sample size while playing terrible defense.

1. Offensive stats are not inaccurate. Lucky maybe, or unlucky. Unsustainable, unrepeatable, likely to regress, sure. But not inaccurate...remember, WAR isn't forward looking, it looks at what happened. We know exactly what someone hit, for any amount of PAs. Offensive stats are nothing if not accurate.

2. Well, you can disagree, but you're disagreeing with the definition as posted at Fangraphs, which says precisely that: "A player can have a plus UZR and have played terrible defense."

Those aren't my words.

kab21 · August 19, 2014

I think they are not quite on the same level, because the granularity of offensive stats actually contains several individual components. Since most plate appearances consist of several pitches, the batter has several opportunities to get himself out on a slider in the dirt or to mash a pitch that catches a little more of the plate than the pitcher intended. Each fielding chance, by contrast, is essentially "make this play, or not". A few cans o' corn for one center fielder versus long runs to the gap for another center fielder take a good deal more time to even out, as a result.

If you rated a batter based on a 100-pitch sample, you'd probably have roughly the same SSS trouble as with a 100-chance sample for a fielder. (Give or take.)

I'm not sure if this actually is a counterpoint to anything I said. A small sample size of data is not useful by itself in almost every baseball and non-baseball application. And I think your argument basically explains the logic that significantly more data is needed for defensive stats which is what everyone in this thread has been saying is needed.

diehardtwinsfan wrote: When someone says you need multiple seasons' worth of data in order to get "accurate" data, I become a bit skeptical. That's where I am with UZR. This was the same metric that said that Carlos Gomez was saving 80 runs in a season in CF, and while he's a good CF, UZR seems to miss pretty big when it misses. If that's the main basis of WAR, I think we have a problem. I also question any stat where the best pitchers in the league are rated lower than the best hitters.

Don't use a misuse of UZR as a counterargument to using UZR. I remember the 80 runs saved (or was it 40?) analysis, but I don't know where that data came from anymore. His UZR/150 in two Twins seasons was only about 14, and his career rate is 15. That does make him one of the best defensive CF'ers, but not exponentially better than the other good CF'ers.
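UZR/150 prorates a fielder's UZR total to a common 150-game scale so that players with different playing time can be compared. A minimal sketch, assuming a simple proration by defensive games (FanGraphs' exact method may count defensive games or innings differently); the inputs are hypothetical, not Gomez's actual totals.

```python
def uzr_per_150(uzr_runs, defensive_games):
    """Prorate a UZR total to a 150-game rate (simple games-based proration)."""
    if defensive_games <= 0:
        raise ValueError("defensive_games must be positive")
    return uzr_runs * 150 / defensive_games

# Hypothetical half season: +7 UZR over 75 defensive games.
print(uzr_per_150(7, 75))  # 14.0
```

Prorating a short stint also scales up its noise: a handful of extra plays over 75 games counts double once it is expressed per 150 games.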

USAFChief · August 19, 2014

Don't use a misuse of UZR as a counterargument to using UZR. I remember the 80 runs saved (or was it 40?) analysis but I don't know where that data came from anymore.

http://www.minnpost.com/sports/2009/05/twins-pick-worst-option-benching-carlos-gomez-favor-delmon-young

Try that for starters.

kab21 · August 19, 2014

1. Offensive stats are not inaccurate. Lucky maybe, or unlucky. Unsustainable, unrepeatable, likely to regress, sure. But not inaccurate...remember, WAR isn't forward looking, it looks at what happened. We know exactly what someone hit, for any amount of PAs. Offensive stats are nothing if not accurate.

2. Well, you can disagree, but you're disagreeing with the definition as posted at Fangraphs, which says precisely that: "A player can have a plus UZR and have played terrible defense."

Those aren't my words.

Please don't selectively quote long texts.

This is also from FanGraphs, and it explains exactly what they meant by your selective quote: http://www.fangraphs.com/blogs/the-fangraphs-uzr-primer/#15

A player can have a plus UZR and have played terrible defense, because the data we are using is far from perfect. It is exactly the same with offense and pitching. Do not for a second think that that is a unique problem with defensive metrics.

USAFChief · August 19, 2014

Please don't selectively quote long texts.

This is also from FanGraphs, and it explains exactly what they meant by your selective quote: http://www.fangraphs.com/blogs/the-fangraphs-uzr-primer/#15

Then one would think they would regress offensive stats too, right? Mauer's 2009 didn't really happen, we need to regress it.

I edited your earlier post because I only intended to respond to two specific things you said, and didn't want to clog up the thread with unnecessary quotes.

The entire FanGraphs definition was posted earlier in this thread, as well.

drjim · August 19, 2014

So they are saying someone could hit poorly and still have a high OPS? That is an interesting concept.

I can understand a bad hitter having a hot stretch but that isn't what they are saying.

kab21 · August 19, 2014

So they are saying someone could hit poorly and still have a high OPS? That is an interesting concept.

I can understand a bad hitter having a hot stretch but that isn't what they are saying.

Chris Colabello did in April. Parker (or someone) did an analysis that concluded he wasn't pulling many balls and was just dropping them into the opposite field while connecting for a few HRs. He didn't hit the ball very well or often, but they dropped for hits. That's a perfect example of what they are talking about.

TheLeviathan · August 19, 2014

Chris Colabello did in April. Parker (or someone) did an analysis that concluded he wasn't pulling many balls and was just dropping them into the opposite field while connecting for a few HRs. He didn't hit the ball very well or often, but they dropped for hits. That's a perfect example of what they are talking about.

That just means it isn't sustainable. The problem cited about UZR is more analogous to it recording a "hit" when reality was a groundout.

ashbury · August 19, 2014

I'm not sure if this actually is a counterpoint to anything I said.

I was offering a possible reason why a season's worth of offensive stats is often considered meaningful, while it is stated that one needs several seasons' worth of defensive stats to be meaningful.

Let me say it another way: in the course of a season a batter may have to make decisions on two or three thousand pitches, many of which are easy and some fraction of which are what separate the good batters from the bad. In the course of a season a right fielder may have to make decisions on two or three hundred fly balls, many of which are easy and some fraction of which are what separate the good fielders from the bad.
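The pitch-versus-fly-ball comparison can be quantified with the standard error of an observed success rate, sqrt(p(1 - p) / n). A minimal Python sketch, assuming independent trials with a fixed underlying rate; 2,500 and 250 are midpoints of the rough season workloads above, and p = 0.5 is a hypothetical worst case (it maximizes p(1 - p)).

```python
import math

def standard_error(p, n):
    """Standard error of an observed success rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

pitches, fly_balls = 2500, 250
se_batter = standard_error(0.5, pitches)
se_fielder = standard_error(0.5, fly_balls)
print(f"batter SE:  {se_batter:.4f}")               # 0.0100
print(f"fielder SE: {se_fielder:.4f}")              # 0.0316
print(f"ratio:      {se_fielder / se_batter:.2f}")  # 3.16
```

Ten times fewer decisions means about sqrt(10), or roughly 3.2 times, as much noise in the observed rate, which fits the idea that one season of fielding data is shakier than one season of batting data.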

kab21 · August 19, 2014

Then one would think they would regress offensive stats too, right? Mauer's 2009 didn't really happen, we need to regress it.

I edited your earlier post because I only intended to respond to two specific things you said, and didn't want to clog up the thread with unnecessary quotes.

The entire FanGraphs definition was posted earlier in this thread, as well.

You are still misusing that one sentence of FanGraphs' analysis, because if you take it as part of the larger analysis, the conclusion is much different.

Offensive stats are accurate the same way defensive stats are accurate, although defensive stats have more room for error. We know whether a hitter got a hit the same way we know whether a defender made a play. We don't know if a hitter got a hit because he hit an easy fly to Delmon and Delmon spun around five times and ended up doing a faceplant. The same is true of defensive stats. We don't know why or how someone makes plays, but we know that a positive UZR means he's making more plays than an average defender. BA and other offensive stats don't tell you whether a hitter actually hit the ball well, any more than counting the plays a defender makes tells you he plays good defense. It does tell you that he makes outs.

The reason, IMO, that FanGraphs wrote that exact quote is that there are things UZR doesn't even attempt to include that influence how well a defender actually plays. It doesn't take into account bad or great teammates in the field who increase or decrease the number of plays someone makes. It doesn't take into account ballpark or environmental effects (neither does BA). It doesn't take into account defensive positioning decisions made by the manager. It doesn't take into account relay throws, receiving throws, or tagging runners. And there are probably several other important defensive qualities that are hard to quantify.

Nobody has said that it's perfect, but if you actually look at the relevant numbers (multi-season), they do tell a pretty good story of defense.

kab21 · August 19, 2014

That just means it isn't sustainable. The problem cited about UZR is more analogous to it recording a "hit" when reality was a groundout.

It really isn't any different though. Chris didn't hit the ball well but the metric said he did. That could be true defensively. But usually if someone has a hot streak they are actually hitting the ball well and the same is true defensively. If someone is making a lot more plays than usual they are probably having a hot streak defensively. It doesn't make them a great defender but they are making plays/outs.

TheLeviathan · August 19, 2014

It really isn't any different though. Chris didn't hit the ball well but the metric said he did. That could be true defensively. But usually if someone has a hot streak they are actually hitting the ball well and the same is true defensively. If someone is making a lot more plays than usual they are probably having a hot streak defensively. It doesn't make them a great defender but they are making plays/outs.

I think you're making more out of "good hit" vs "bad hit" to try and make a point. If such a distinction even exists.

snepp · August 19, 2014

This was the same metric that said that Carlos Gomez was saving 80 runs in a season in CF,

No, it never remotely said any such thing.

There are many valid arguments to be made against defensive metrics (most of which have already been made in this thread), but grossly inaccurate statements like this are not one of them.

USAFChief · August 19, 2014

No, it never remotely said any such thing.

There are many valid arguments to be made against defensive metrics (most of which have already been made in this thread), but grossly inaccurate statements like this are not one of them.

Our Friend Aaron Gleeman:

According to Ultimate Zone Rating as a duo Gomez in center field and Span in left field (or right field) has been 30 to 35 runs above average per 150 games. Meanwhile, as a duo, Span in center field and Young in left field has been 45 to 50 runs below average per 150 games.

The latter total is inflated by Span's unsustainably horrible numbers in limited action as a center fielder, but even if you ignore them to give him credit for being exactly average in center field — which at this point is far from a safe assumption — the Young-Span alignment is 40 to 50 runs worse than the Span-Gomez alignment. In other words, by benching Gomez for Young, the Twins are gaining 15 runs offensively and losing 40 to 50 runs defensively. All of which is why focusing on their batting averages is silly.

http://www.minnpost.com/sports/2009/05/twins-pick-worst-option-benching-carlos-gomez-favor-delmon-young

I remember this specific debate at BYTO...
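Gleeman's trade-off is plain net-run arithmetic. A minimal sketch using the ranges in the quoted passage (15 runs gained on offense, 40 to 50 runs lost on defense); it takes the UZR-based inputs at face value and says nothing about whether they were sound.

```python
def net_runs(offense_gained, defense_lost):
    """Net run impact of a lineup swap: runs gained at the plate minus runs lost in the field."""
    return offense_gained - defense_lost

# Ranges as quoted: +15 offensive runs, 40 to 50 defensive runs lost.
for lost in (40, 50):
    print(f"lose {lost} on defense -> net {net_runs(15, lost):+d} runs")
# lose 40 on defense -> net -25 runs
# lose 50 on defense -> net -35 runs
```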

snepp · August 19, 2014

Our Friend Aaron Gleeman:

I remember this specific debate at BYTO...

I do as well. Very, very well.

Note the very first line of your post. That particular bit of analysis was a prime example of an egregious misuse of a statistic, one that made me cringe.

USAFChief · August 19, 2014

I do as well. Very, very well.

Note the very first line of your post. That particular bit of analysis was a prime example of an egregious misuse of a statistic, one that made me cringe.

My bad...I misunderstood. Not unusual for me, sadly.

snepp · August 19, 2014

My bad...I misunderstood. Not unusual for me, sadly.

We could re-enact some of those old BYTO discussions on UZR/WAR, but we probably wouldn't make it past page 1 before someone would have to close it down.

jay · August 19, 2014

I think you're making more out of "good hit" vs "bad hit" to try and make a point. If such a distinction even exists.

Makes plenty of sense to me. A guy with an .800 BABIP is either a cyborg or got plenty of "bad hits".
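For reference, BABIP is usually computed as (H - HR) / (AB - K - HR + SF): hits on balls put in play divided by balls put in play. A minimal Python sketch; the stat line in the example is hypothetical.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play

# Hypothetical hot month: 30 H, 5 HR, 90 AB, 25 K, 1 SF.
print(f"{babip(30, 5, 90, 25, 1):.3f}")  # 0.410
```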

jay · August 19, 2014

Offensive counting stats mostly reflect the reality of the past. Scoring decisions for offensive stats can face the same subjectivity that applies to defensive stats. Once that decision is made, you can put it in all sorts of counting and rate stats, but the downside is you lose an amazing amount of context.

Defensive stats start at that same point of reality -- did he catch it or not -- but have to use that same type of scorer's interpretation of whether it should have been caught based on the context in order to place a value on it. There's more subjectivity to it, hence the need for larger sample sizes.

Neither one necessarily reflects true talent level or what you'd expect to happen in the future.

drjim · August 19, 2014

Joe Posnanski just tweeted this:

Irony would be Mike Trout, finally in line to win his MVP, LOSING to Alex Gordon or Josh Donaldson because they have higher WAR.

If Gordon wins because of a defensive bonus from playing LF, that has to be the death of WAR as a meaningful comparative statistic.

He is not good enough to stick at 3B, but he can do well in LF and earn a bunch of dWAR. What a bunch of nonsense. A slightly-above-mediocre CF or 3B is so much more valuable than an elite LF, because it is hard to find even an adequate (or replacement-level) defensive player at those positions. The replacement-level players for LF are the 70% of MLBers who don't play there because they are good enough defenders to play elsewhere, not the lugs (ahem, Willingham) whom teams put out there because they can't play anywhere else.
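In FanGraphs-style WAR, a player's defensive value is roughly his fielding runs (UZR) plus a positional adjustment that credits harder positions and debits easier ones; that positional term is what this complaint is aimed at. A minimal Python sketch: the per-162-game adjustments are illustrative values close to the ones FanGraphs publishes (treat them as assumptions), and the UZR figures are hypothetical.

```python
# Illustrative per-162-game positional adjustments, in runs. These approximate
# FanGraphs' published values; treat them as assumptions, not official data.
POSITIONAL_ADJUSTMENT = {"C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
                         "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5}

def defensive_runs(uzr, position, games=162):
    """Fielding runs plus a positional adjustment prorated by games played."""
    return uzr + POSITIONAL_ADJUSTMENT[position] * games / 162

# Hypothetical: an elite left fielder vs. a league-average center fielder.
elite_lf = defensive_runs(uzr=20.0, position="LF")
average_cf = defensive_runs(uzr=0.0, position="CF")
print(elite_lf, average_cf)  # 12.5 2.5
```

Under these assumptions the elite left fielder still comes out about 10 runs ahead; whether a 10-run gap between LF and CF adequately reflects how scarce competent center fielders are is exactly the question raised here.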

Badsmerf · August 19, 2014

I have stayed out of the conversation, but I love (yes, love) WAR. You want to pick it apart as not an all-encompassing stat... whatever. It's not perfect, but it isn't terrible like many are claiming. Usually the best players in the league have the highest WAR. I think defensive position should also play into it more, but the metric for that would then be debated too. The thing is, you have to take it for what it is. It's only a number; making judgments based on one stat is foolish.

markos · August 19, 2014

Then one would think they would regress offensive stats too, right? Mauer's 2009 didn't really happen, we need to regress it.

While offensive stats are not necessarily "regressed", it should be pointed out that offensive stats are heavily adjusted when they are added into the WAR calculations. They are adjusted for run environment, stadium and position (and probably other things). This is done so that players on different teams, positions and leagues can be directly compared, as well as across multiple seasons.

Using your example, if Mauer had the exact same offensive counting stats in 2014 that he had in 2009, he would actually be credited for a lot more offensive value because he plays in a much tougher ballpark and a lower run-scoring environment. On the other hand, he will be docked value because he plays 1B instead of C.
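The direction of the Mauer example can be shown with a deliberately simplified adjustment: neutralize a hitter's runs for his park, then compare them with what a league-average hitter would produce in the same number of plate appearances. A minimal sketch only; this is not FanGraphs' actual wRAA or park-factor formula, and every number below is hypothetical.

```python
def adjusted_runs_above_average(player_runs, plate_appearances,
                                league_runs_per_pa, park_factor):
    """Simplified illustration, not FanGraphs' formula.
    park_factor > 1.0 means a hitter-friendly park, < 1.0 a pitcher-friendly one."""
    neutral_runs = player_runs / park_factor           # remove the park's effect
    league_average = league_runs_per_pa * plate_appearances
    return neutral_runs - league_average

# Same hypothetical raw production (100 runs created in 600 PA) in two contexts:
# a higher-scoring league in a hitter's park vs. a lower-scoring league in a pitcher's park.
high_offense = adjusted_runs_above_average(100, 600, 0.120, 1.05)
low_offense = adjusted_runs_above_average(100, 600, 0.110, 0.97)
print(f"{high_offense:.1f} vs {low_offense:.1f}")  # 23.2 vs 37.1
```

Identical counting stats earn noticeably more credit in the tougher park and lower-scoring league; the positional adjustment (1B vs. C) would then be applied separately.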

TheLeviathan · August 19, 2014

Makes plenty of sense to me. A guy with an .800 BABIP is either a cyborg or got plenty of "bad hits".

A ridiculous BABIP doesn't make him a bad hitter. Just prime for regression.

Here's another way to say this point: UZR, in part, can be based on inaccurately recorded past results. That isn't the same kind of problem as a high BABIP.
