
Providing independent coverage of the Minnesota Twins.

Article: Mailbag: Available Pitchers, Buxton Hype, Baseball Time Machine

91 replies to this topic

#41 USAFChief

USAFChief

    Bad puns. That's how eye roll.

  • Twins Mods
  • 22,936 posts
  • LocationTucson

Posted 05 March 2019 - 02:25 PM

 

I'd like to talk a little bit more about Morneau. His 2010 concussion halted an absolute monster year where he would have been a really strong candidate for MVP and might have made a difference in the season and the playoffs. However, if I could go back in time, I might go back to 2008 and at a minimum keep him out of the home run derby and maybe the All-Star Game itself. Then I'd rest him as much as I could the rest of the way. He had a subpar August and a very poor September. Later we found out he had been playing with a stress fracture in his back. A healthy Morneau for those last two months very likely makes a game 163 with the White Sox unnecessary, and we would have played the Rays, who we matched up OK with. Maybe that was the year things would have fallen our way. Who knows?

 

On an individual level we talk about how injuries have derailed or dimmed Oliva's and Mauer's chances for the HOF. How about Morneau? Without playing 163 games and the stress fracture he would have been a shoo-in for MVP that year if he had a strong September and the Twins made the playoffs. Couple that with a possible MVP in 2010 if he did not have the concussion and now you have a guy who wins 3 MVPs. Just on the surface that should get him into the Hall. Adding in his performance after 2010, if he never suffered that concussion, just makes an already easy pick that much easier. No one talks about him and the Hall, but if things break a different way he definitely had the ability.

Concur 100 percent.

 

Morneau was the better player between him and Mauer.

  • Loosey and ken like this

Cutting my carbs...with a pizza slicer.


#42 Mike Sixel

Mike Sixel

    Now living in Oregon

  • Members
  • 29,149 posts

Posted 05 March 2019 - 02:55 PM

 

I hope they never stop trying to do that.

 

I hope they stop wasting precious 25 man spots......

  • Riverbrian likes this

It's IL now, btw, not DL.....


#43 Mike Sixel

Mike Sixel

    Now living in Oregon

  • Members
  • 29,149 posts

Posted 05 March 2019 - 02:55 PM

 

Tell that to the defensive metrics fans. 

 

Well, to properly use them, you need three years. Not sure your point at all.

It's IL now, btw, not DL.....


#44 Riverbrian

Riverbrian

    Goofy Moderator

  • Twins Mods
  • 20,567 posts
  • LocationGrand Forks, ND

Posted 05 March 2019 - 07:38 PM

Agreed, as long as they don't stubbornly hold on to that player and gift them a roster spot.


That’s exactly the result I’m hoping to avoid. No more Logan Morrison walking past the lineup card without checking to see if his name is on it.

A Skeleton walks into a bar and says... "Give me a beer... And a mop".

 

President of the "Baseball Player Positional Flexibility" Club 

Founded 4-23-16 

 

Strike Zone Automation Advocate

 

I'm not a starting 9 guy!!!


#45 USAFChief

USAFChief

    Bad puns. That's how eye roll.

  • Twins Mods
  • 22,936 posts
  • LocationTucson

Posted 06 March 2019 - 12:13 AM

Well, to properly use them, you need three years. Not sure your point at all.

unreliable small sample size, added to unreliable small sample size, is suddenly considered reliable, just because you now have more of the unreliable data.

Doesn't compute, IMO.

Cutting my carbs...with a pizza slicer.


#46 spycake

spycake

    Senior Member

  • Members
  • 15,315 posts

Posted 06 March 2019 - 10:51 AM

 

unreliable small sample size, added to unreliable small sample size, is suddenly considered reliable, just because you now have more of the unreliable data.

Doesn't compute, IMO.

Isn't this a pretty foundational concept of statistics? That the larger the sample, the more reliable it can be?

 

1000 plate appearances doesn't have the exact same reliability as each 10 PA chunk. Collectively, it basically sums the meager reliability of each of those small samples, and after awhile, it has meaningfully more reliability. You're never going to reach perfect 100% reliability in this field, and where you draw the line at "reliable enough" can be a little fuzzy/subjective/context-dependent, but the basic idea that a larger sample (3 seasons of defensive data) is more reliable than a smaller sample (1 season) is certainly true.
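The sample-size point above can be illustrated with a rough simulation. This is only a sketch under stated assumptions: the hitter's "true" on-base rate of .330, the trial count, and the function name `observed_rate_spread` are all made up for illustration, and every plate appearance is treated as an independent coin flip, which real baseball is not.

```python
import random
import statistics


def observed_rate_spread(true_rate, n_pa, n_trials=1000, seed=0):
    """Std. dev. of the observed on-base rate across many simulated
    samples of n_pa plate appearances (hypothetical model)."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        # Each PA reaches base with probability true_rate.
        times_on_base = sum(rng.random() < true_rate for _ in range(n_pa))
        rates.append(times_on_base / n_pa)
    return statistics.pstdev(rates)


TRUE_RATE = 0.330  # assumed talent level, not a real player's number

spread_10 = observed_rate_spread(TRUE_RATE, 10)
spread_1000 = observed_rate_spread(TRUE_RATE, 1000)

print(f"spread of observed rate over 10 PA:   {spread_10:.3f}")
print(f"spread of observed rate over 1000 PA: {spread_1000:.3f}")
```

In this toy model the 1000 PA sample bounces around the true rate far less than any 10 PA chunk does, shrinking roughly with the square root of the sample size. It says nothing about whether the thing being counted was recorded correctly in the first place.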

 

Edit: obviously if you're measuring something meaningless, you're never going to get a meaningful conclusion no matter how large the measurement sample. "Garbage in, garbage out" as they say. But I don't think it's fair to characterize the inputs of defensive metrics as meaningless, as imperfect as they might be. Even without Statcast, the inputs are basically just recorded scouting observations -- the location of the ball, the position of the fielder, the result of the play, etc. It's less reliable than offensive measurement, which is why we have the "three season" rule of thumb, but it's not meaningless.

Edited by spycake, 06 March 2019 - 10:59 AM.

  • ashbury, Mike Sixel, 70charger and 1 other like this

#47 ashbury

ashbury

    Haighters gonna Haight

  • Twins Mods
  • 21,671 posts
  • LocationLake Tahoe, NV

Posted 06 March 2019 - 12:32 PM

I concur with all that Spy wrote above, but want to add that defensive stats suffer from an inherent small sample size compared to batting stats. Each plate appearance consists on average of around 4 pitches, these days. Even though only outcomes are recorded, there is a richness in the experience being measured that goes far beyond the basic numbers. How well does the batter lay off bad pitches, how often does he whiff on good pitches - all these micro-results wind up contributing to what we know of as a plate appearance.

 

By comparison, defensive stats suffer the opposite problem. There is less to the data than meets the eye. An awful lot of Total Chances are on plays like cans of corn to the outfield and routine grounders to the infielders. Separating the wheat from the chaff is the first task of the data analysis, and there is an awful lot of chaff. That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.
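To put rough numbers on the wheat-and-chaff idea, here is a back-of-the-envelope sketch. Every figure in it is an assumption chosen for illustration, not real data: 400 chances in a season, 360 of them routine plays that any fielder converts 99% of the time, and 40 hard plays that a good fielder converts 60% of the time versus 40% for an average one. The function name `signal_to_noise` is also hypothetical.

```python
import math


def signal_to_noise(n_routine, n_hard, p_routine,
                    p_hard_good, p_hard_avg, seasons=1):
    """Expected gap in plays made between two fielders, divided by the
    random (binomial) noise in that gap. All inputs are hypothetical."""
    # Only the hard chances separate the two fielders.
    signal = seasons * n_hard * (p_hard_good - p_hard_avg)

    def variance(p_hard):
        # One-season binomial variance of plays made for one fielder.
        return (n_routine * p_routine * (1 - p_routine)
                + n_hard * p_hard * (1 - p_hard))

    noise = math.sqrt(seasons * (variance(p_hard_good) + variance(p_hard_avg)))
    return signal / noise


one_season = signal_to_noise(360, 40, 0.99, 0.60, 0.40, seasons=1)
three_seasons = signal_to_noise(360, 40, 0.99, 0.60, 0.40, seasons=3)

print(f"signal-to-noise, 1 season:  {one_season:.2f}")
print(f"signal-to-noise, 3 seasons: {three_seasons:.2f}")
```

With these made-up inputs the real gap is only about 1.6 times the noise after one season and about 2.7 times after three, because the 360 routine plays add almost nothing to tell the fielders apart. Different assumptions would move the numbers, but the shape of the argument stays the same.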

  • spycake and jkcarew like this

Nothing is impossible for the one who doesn't have to do the work.


#48 Mike Sixel

Mike Sixel

    Now living in Oregon

  • Members
  • 29,149 posts

Posted 06 March 2019 - 12:37 PM

 

I concur with all that Spy wrote above, but want to add that defensive stats suffer from an inherent small sample size compared to batting stats. Each plate appearance consists on average of around 4 pitches, these days. Even though only outcomes are recorded, there is a richness in the experience being measured that goes far beyond the basic numbers. How well does the batter lay off bad pitches, how often does he whiff on good pitches - all these micro-results wind up contributing to what we know of as a plate appearance.

 

By comparison, defensive stats suffer the opposite problem. There is less to the data than meets the eye. An awful lot of Total Chances are on plays like cans of corn to the outfield and routine grounders to the infielders. Separating the wheat from the chaff is the first task of the data analysis, and there is an awful lot of chaff. That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.

 

I don't think anyone is arguing otherwise. Clearly defensive measures will always be behind offensive ones......

It's IL now, btw, not DL.....


#49 ashbury

ashbury

    Haighters gonna Haight

  • Twins Mods
  • 21,671 posts
  • LocationLake Tahoe, NV

Posted 06 March 2019 - 12:42 PM

I don't think anyone is arguing otherwise. Clearly defensive measures will always be behind offensive ones......

Understanding why they are different (and I offer no guarantee that I do) allows one to judge when and how much to trust the defensive stats we do have. It's not, for instance, all subjective on the defensive side - that's not the core problem.

Nothing is impossible for the one who doesn't have to do the work.


#50 spycake

spycake

    Senior Member

  • Members
  • 15,315 posts

Posted 06 March 2019 - 12:46 PM

 

That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.

Yup. And I'm also comfortable saying that defensive stats (at least as we know them right now) can probably never take on the same meaningfulness as their offensive counterparts. Most notably, whereas K and BB rates might stabilize around 150 PA or whatever, once you're looking at 3 years of defensive stats, you essentially have to start accounting for age too, not to mention other factors (coaching and personnel changes?).

 

That's not to say that they're meaningless, or fully unreliable. Maybe the conclusions drawn from them have to be a little less specific. Maybe they have more value looking backward than forward, etc.


#51 ashbury

ashbury

    Haighters gonna Haight

  • Twins Mods
  • 21,671 posts
  • LocationLake Tahoe, NV

Posted 06 March 2019 - 12:49 PM

Yup. And I'm also comfortable saying that defensive stats (at least as we know them right now) can probably never take on the same meaningfulness as their offensive counterparts. Most notably, whereas K and BB rates might stabilize around 150 PA or whatever, once you're looking at 3 years of defensive stats, you essentially have to start accounting for age too, not to mention other factors (coaching and personnel changes?).

 

That's not to say that they're meaningless, or fully unreliable. Maybe the conclusions drawn from them have to be a little less specific. Maybe they have more value looking backward than forward, etc.

The good news may be that it's less important to be able to make the distinctions between good and so-so defense, for the same SSS reasons that make the analysis harder: Robbie Grossman in the outfield doesn't get a chance to affect the game's outcome as often as we sometimes think. :)

Nothing is impossible for the one who doesn't have to do the work.


#52 jorgenswest

jorgenswest

    Senior Member

  • Members
  • 4,179 posts

Posted 06 March 2019 - 07:46 PM

Defensive stats do need a significant sample size, but so do many of the batting and pitching numbers that are heavily used on almost any baseball telecast. Not only are these stats presented as meaningful in a partial-season sample, they are often used in splits that further shrink the sample.

 

Here is one 2018 article with some discussion of the reliability of defensive metrics.

 

https://blogs.fangra...verage-and-uzr/

 

 

For those that don’t trust defensive numbers, keep in mind that r-squared for Offensive runs above average per 600 plate appearances for this same group of players was .21. Defense had a higher correlation than wOBA, wRC+, slugging percentage, and on-base percentage and was roughly equivalent to walk percentage and ISO. Much of this relationship is going to be due to positional factors, but positional factors are a very important part of determining a player’s value and overall defensive value is pretty consistent year to year.

Defensive stats have evolved since 2012 but here is an early article on their reliability. They have improved since.

 

https://www.billjame...tics_correlate/

 

 

We are at the point where our defensive analytics are nearly as reliable as offensive and pitching analytics. Just looking at the single best statistic in each: OPS is .69, Opponent OPS is .61, Defensive Runs Saved is .59. We’ve come a long way.

 

I join those of you that are skeptical of defensive metrics in samples smaller than a full season or even more. I think we should be just as skeptical of slash stats, ERA, FIP and others that are much more commonly cited.

Edited by jorgenswest, 06 March 2019 - 07:46 PM.

  • Mike Sixel likes this

#53 USAFChief

USAFChief

    Bad puns. That's how eye roll.

  • Twins Mods
  • 22,936 posts
  • LocationTucson

Posted 07 March 2019 - 07:08 AM

 

Isn't this a pretty foundational concept of statistics? That the larger the sample, the more reliable it can be?

 

1000 plate appearances doesn't have the exact same reliability as each 10 PA chunk. Collectively, it basically sums the meager reliability of each of those small samples, and after awhile, it has meaningfully more reliability. You're never going to reach perfect 100% reliability in this field, and where you draw the line at "reliable enough" can be a little fuzzy/subjective/context-dependent, but the basic idea that a larger sample (3 seasons of defensive data) is more reliable than a smaller sample (1 season) is certainly true.

 

Edit: obviously if you're measuring something meaningless, you're never going to get a meaningful conclusion no matter how large the measurement sample. "Garbage in, garbage out" as they say. But I don't think it's fair to characterize the inputs of defensive metrics as meaningless, as imperfect as they might be. Even without Statcast, the inputs are basically just recorded scouting observations -- the location of the ball, the position of the fielder, the result of the play, etc. It's less reliable than offensive measurement, which is why we have the "three season" rule of thumb, but it's not meaningless.

The difference is, while 10 PAs are certainly not predictive, nobody would argue those 10 PAs (whether the dude hit 1.000 or .000) weren't accurately measured. Those 10 PAs did happen, and we know, for certain, what the results were. We can use those 10 accurately measured PAs, in conjunction with another 10 and another 10, and so on, to accurately measure what happened, and perhaps form an educated opinion about what is likely to happen in the future.

 

Defensive metrics aren't sold that way. Everyone admits small sample sizes do not necessarily represent what actually happened. But those same people then turn around and claim "more inaccurate data" will solve the accuracy issue. 

 

I don't think it does. I think adding 10 accurately measured PAs to another 990 of the same gives you an accurate picture of what happened. But adding 10 inaccurately measured defensive plays to 990 other inaccurately measured defensive plays doesn't.
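Whether more data fixes inaccurate measurement depends on what kind of inaccuracy it is. The sketch below is a toy model only: it assumes each recorded play is off by some fixed bias plus random noise, and the noise level (1.0), the bias (0.3), and the function name `mean_measurement_error` are invented for illustration rather than taken from any real defensive metric.

```python
import random


def mean_measurement_error(n_plays, noise_sd, bias, seed=0):
    """Average recording error over n_plays when each recorded value
    is off by a fixed bias plus zero-mean random noise (toy model)."""
    rng = random.Random(seed)
    errors = [bias + rng.gauss(0.0, noise_sd) for _ in range(n_plays)]
    return sum(errors) / n_plays


for n in (10, 1000, 100000):
    random_only = mean_measurement_error(n, noise_sd=1.0, bias=0.0)
    with_bias = mean_measurement_error(n, noise_sd=1.0, bias=0.3)
    print(f"{n:>6} plays: random-only error {random_only:+.3f}, "
          f"biased error {with_bias:+.3f}")
```

In this model, error that is purely random washes out as plays pile up, while a consistent bias stays put no matter how many plays are added. So the disagreement really comes down to which kind of error dominates in the recorded defensive data, and the simulation cannot answer that.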

Cutting my carbs...with a pizza slicer.


#54 Riverbrian

Riverbrian

    Goofy Moderator

  • Twins Mods
  • 20,567 posts
  • LocationGrand Forks, ND

Posted 07 March 2019 - 07:33 AM

 

I concur with all that Spy wrote above, but want to add that defensive stats suffer from an inherent small sample size compared to batting stats. Each plate appearance consists on average of around 4 pitches, these days. Even though only outcomes are recorded, there is a richness in the experience being measured that goes far beyond the basic numbers. How well does the batter lay off bad pitches, how often does he whiff on good pitches - all these micro-results wind up contributing to what we know of as a plate appearance.

 

By comparison, defensive stats suffer the opposite problem. There is less to the data than meets the eye. An awful lot of Total Chances are on plays like cans of corn to the outfield and routine grounders to the infielders. Separating the wheat from the chaff is the first task of the data analysis, and there is an awful lot of chaff. That's IMO why it takes multiple seasons for defensive stats to take on the same meaningfulness of their offensive counterparts.

 

I concur with you... Yet small sample size single year defensive metrics based on a majority of "can of corn" plays are then tossed into the WAR calculations and from WAR... Win probability, and everything else.

 

And very few of us look at 3 year increased sample sizes. Most of us see 3.4 in RF and -2 in LF for 2018 and the assumption is made that the player can't play LF.

 

 

  • TheLeviathan likes this

A Skeleton walks into a bar and says... "Give me a beer... And a mop".

 

President of the "Baseball Player Positional Flexibility" Club 

Founded 4-23-16 

 

Strike Zone Automation Advocate

 

I'm not a starting 9 guy!!!


#55 ashbury

ashbury

    Haighters gonna Haight

  • Twins Mods
  • 21,671 posts
  • LocationLake Tahoe, NV

Posted 07 March 2019 - 07:38 AM

You should always be wary. OPS is quick-and-dirty. WAR is quick-and-dirty. RBI and ERA and Wins all have extenuating factors.

 

Including defensive wins into WAR tells you something additional, even if it's staticky and sometimes even misleading. You teach your kids to watch their step on ice - you don't tell them not to walk on ice.

  • Mike Sixel likes this

Nothing is impossible for the one who doesn't have to do the work.


#56 Riverbrian

Riverbrian

    Goofy Moderator

  • Twins Mods
  • 20,567 posts
  • LocationGrand Forks, ND

Posted 07 March 2019 - 09:25 AM

You should always be wary. OPS is quick-and-dirty. WAR is quick-and-dirty. RBI and ERA and Wins all have extenuating factors.

Including defensive wins into WAR tells you something additional, even if it's staticky and sometimes even misleading. You teach your kids to watch their step on ice - you don't tell them not to walk on ice.


I tell old people to watch their step on ice.

With the kids you hope they get a million dollar contract playing for the Vancouver Canucks. And if that’s not possible... you hope that they will hold the old guy's arm as he navigates the slippery sidewalk.
  • Platoon likes this

A Skeleton walks into a bar and says... "Give me a beer... And a mop".

 

President of the "Baseball Player Positional Flexibility" Club 

Founded 4-23-16 

 

Strike Zone Automation Advocate

 

I'm not a starting 9 guy!!!


#57 spycake

spycake

    Senior Member

  • Members
  • 15,315 posts

Posted 07 March 2019 - 11:06 AM

 

The difference is, while 10 PAs are certainly not predictive, nobody would argue those 10 PAs (whether the dude hit 1.000 or .000) weren't accurately measured. Those 10 PAs did happen, and we know, for certain, what the results were. We can use those 10 accurately measured PAs, in conjunction with another 10 and another 10, and so on, to accurately measure what happened, and perhaps form an educated opinion about what is likely to happen in the future.

 

Defensive metrics aren't sold that way. Everyone admits small sample sizes do not necessarily represent what actually happened. But those same people then turn around and claim "more inaccurate data" will solve the accuracy issue. 

 

I don't think it does. I think adding 10 accurately measured PAs to another 990 of the same gives you an accurate picture of what happened. But adding 10 inaccurately measured defensive plays to 990 other inaccurately measured defensive plays doesn't.

I think what you are claiming is more or less "garbage in, garbage out" which I addressed in my post.

 

I'm not sure how you think defensive metrics are calculated, but the foundation of them is stringers recording what actually happened. Is there some subjectivity involved? Sure -- I remember one issue was classifying fly balls vs liners (Statcast can help with that now). But that doesn't mean the data is worthless. There's subjectivity in scouting too but that data isn't worthless, and can gain some reliability if you get a large enough sample.

  • ashbury, Mike Sixel and 70charger like this

#58 diehardtwinsfan

diehardtwinsfan

    G.O.A.T.

  • Twins Mods
  • 13,297 posts
  • Locationthe charred ruins of BYTO

Posted 07 March 2019 - 09:10 PM

 

I think what you are claiming is more or less "garbage in, garbage out" which I addressed in my post.

 

I'm not sure how you think defensive metrics are calculated, but the foundation of them is stringers recording what actually happened. Is there some subjectivity involved? Sure -- I remember one issue was classifying fly balls vs liners (Statcast can help with that now). But that doesn't mean the data is worthless. There's subjectivity in scouting too but that data isn't worthless, and can gain some reliability if you get a large enough sample.

 

I think my problem is that the reliability takes years to accumulate, and I'm not sure I trust the logic in that statement. It's not like there aren't tons of defensive plays available each year. 600 PAs for a batter ultimately leads to what... 400ish chances for a defender somewhere? Granted, there are 9 defenders on the field, but times the 9 batters in the lineup, you still have plenty of players with far more defensive activity than at bats. Yes, I know some have far less too, that's the nature of the game, but I'd think at this point they would be able to come up with something. It's quite literally 600 PAs against 400 or so defensive attempts per player on average. That's enough of a sample size to determine something.

 

My problem with all of this, to be honest, is the concept of luck that gets thrown into this way way way too much. When Buxton or Kepler posts a low BABIP, I don't think it's b/c it's simply that he's unlucky. What we view as luck is simply variation within the top 1/100th of 1% of people, and in that case, Buxton is performing like someone in the 1/50th of 1% of people. It can trick someone who doesn't recognize the skill required to perform in the range of what we could call random variation. It will take an adjustment for them to perform within that range. You see players saying this all the time, but we get lazy and call it luck. 

 

This was a long time ago, but I remember reading a paper on how easy it was to build design into a series of numbers and make it look random. I think, at least to some extent, we're dealing with much of the same problem in trying to quantify these types of things. We want to chalk up way too much to randomness that is skill because we cannot see the skill in the numbers due to the fact that we are dealing with the top .0001% of people on this planet who happen to possess this skill.

 

I'm not sure it's as simple as needing more years. I think it requires a better eye both physically and mentally.


#59 Mike Sixel

Mike Sixel

    Now living in Oregon

  • Members
  • 29,149 posts

Posted 08 March 2019 - 01:00 PM

 

I think my problem is that the reliability takes years to accumulate, and I'm not sure I trust the logic in that statement. It's not like there aren't tons of defensive plays available each year. 600 PAs for a batter ultimately leads to what... 400ish chances for a defender somewhere? Granted, there are 9 defenders on the field, but times the 9 batters in the lineup, you still have plenty of players with far more defensive activity than at bats. Yes, I know some have far less too, that's the nature of the game, but I'd think at this point they would be able to come up with something. It's quite literally 600 PAs against 400 or so defensive attempts per player on average. That's enough of a sample size to determine something.

 

My problem with all of this, to be honest, is the concept of luck that gets thrown into this way way way too much. When Buxton or Kepler posts a low BABIP, I don't think it's b/c it's simply that he's unlucky. What we view as luck is simply variation within the top 1/100th of 1% of people, and in that case, Buxton is performing like someone in the 1/50th of 1% of people. It can trick someone who doesn't recognize the skill required to perform in the range of what we could call random variation. It will take an adjustment for them to perform within that range. You see players saying this all the time, but we get lazy and call it luck. 

 

This was a long time ago, but I remember reading a paper on how easy it was to build design into a series of numbers and make it look random. I think, at least to some extent, we're dealing with much of the same problem in trying to quantify these types of things. We want to chalk up way too much to randomness that is skill because we cannot see the skill in the numbers due to the fact that we are dealing with the top .0001% of people on this planet who happen to possess this skill.

 

I'm not sure it's as simple as needing more years. I think it requires a better eye both physically and mentally.

 

statcast is that better eye......

 

It's a boring argument, chief and others have decided they don't trust defensive stats, and no one is going to change his mind. I guess it's possible others will read this thread, and change their mind.....so it isn't worthless overall, but no one is changing chief's mind.

It's IL now, btw, not DL.....


#60 Jham

Jham

    Junior Member

  • Members
  • 1,821 posts

Posted 09 March 2019 - 12:16 PM

statcast is that better eye......

It's a boring argument, chief and others have decided they don't trust defensive stats, and no one is going to change his mind. I guess it's possible others will read this thread, and change their mind.....so it isn't worthless overall, but no one is changing chief's mind.


Defensive metrics continue to improve. No doubt. I think all that skeptics like chief and myself are saying is to acknowledge the blind spots. Many say they do, but don't really. Statcast is a game changer. It can track the route of the defender, the trajectory and speed of the batted ball, and the place on the field it lands. The illusion is that it takes all of that into consideration. FSN will show Buxton on a route and catch and show it as caught only 6% of the time. But that's not the route. It's simply based on how many balls of that trajectory are hit to that particular spot on the field and turned into outs. Useful, but with modern shifting the data can be heavily skewed. We've all seen the can of corn to the left side turned into a hit because of an extreme outfield shift. The left fielder would get crushed on that ball because Statcast would show that ball as a FO 7 to that spot 97% of the time when in reality he had no shot. Shifted the other way in the infield, a 2B might be called under that ball and Statcast might have the play as nearly miraculous. Statcast tracks both, but assembling it all is still difficult. SSS exacerbates this since one unlucky shift play can skew the numbers given the low number of chances. Then there's the arbitrary position adjustment for WAR, which I don't want to get into again...