Providing independent coverage of the Minnesota Twins.

The same great Twins Daily coverage, now for the Vikings.


Projections?

Tags: nate silver, pecota, playoff probabilities, projection
22 replies to this topic

#1 stringer bell

    Manager-in-Waiting

  • Twins News Team
  • 4,354 posts
  • Location: Zumbrota MN

Posted 17 May 2014 - 10:14 AM

I check ESPN and mlb.com every once in a while for a non-Twins-centric view of baseball and to see what the outside world thinks of the Twins. Recently, I checked a couple of player pages, and they listed this year's stats plus a PECOTA projection for the balance of the season, then combined the two into an overall projection. Most of the projections weren't too kind to the Twins: Dozier is projected to fall off to a .735 OPS for the season, the pitchers aren't tabbed to improve much, and Hughes is seen as regressing to the mean. I am mystified by how these projections are made. I checked Wikipedia, and it says the whole system was designed by Nate Silver, a pretty well-respected numbers guy (unless you're a Republican), and among other things the article notes that the exact methodology is secret.
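The "combined" full-season number described above can be sketched as a playing-time-weighted blend of the season-to-date rate and the rest-of-season projection. This is a minimal sketch under assumptions: the thread doesn't say exactly how ESPN or mlb.com combine the two, the numbers are hypothetical (not Dozier's real line or PECOTA's actual output), and it blends OBP while treating plate appearances as the OBP denominator, which is a simplification. OPS can't be blended exactly this way, because OBP and SLG have different denominators.

```python
def combine_rate(ytd_rate: float, ytd_pa: int, ros_rate: float, ros_pa: int) -> float:
    """Blend a season-to-date rate with a rest-of-season projected rate,
    weighting each by its plate appearances (a simplifying assumption)."""
    total_pa = ytd_pa + ros_pa
    if total_pa <= 0:
        raise ValueError("need a positive number of plate appearances")
    return (ytd_rate * ytd_pa + ros_rate * ros_pa) / total_pa


# Hypothetical inputs: a .360 OBP over 190 PA so far,
# and a projected .320 OBP over the remaining 480 PA.
print(round(combine_rate(0.360, 190, 0.320, 480), 3))  # prints 0.331
```

Because the projection covers more than twice as many plate appearances as the hot start, the full-season figure lands much closer to the projection than to the current line.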

Also, in checking the standings, I noticed the mlb.com site has a playoff projection. The Twins currently stand at 12.5%. How on earth can that be projected at this point in the season? In the East, first-place Baltimore has the lowest playoff probability, and the Yankees and Red Sox have the highest.
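One common way to put a percentage on the playoffs this early is simulation: play out the rest of the season many times and count how often the team gets in. The sketch below is a deliberately crude, single-team version built on assumptions that are not from the thread: every remaining game is an independent coin flip with a fixed win probability, and "making the playoffs" is reduced to reaching a hypothetical win total. Real systems presumably simulate every team's actual schedule and division race, which is one way a first-place team with weaker projected talent can still show lower odds than the teams chasing it.

```python
import random


def playoff_odds(current_wins: int, games_left: int, win_prob: float,
                 wins_needed: int, trials: int = 20_000, seed: int = 1) -> float:
    """Estimate the chance of reaching wins_needed by simulating the rest of the season."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    successes = 0
    for _ in range(trials):
        # Each remaining game is an independent win with probability win_prob.
        wins = current_wins + sum(rng.random() < win_prob for _ in range(games_left))
        if wins >= wins_needed:
            successes += 1
    return successes / trials


# Hypothetical: 20 wins so far, 122 games left, a .480 true-talent team,
# and an assumed 88-win cutoff for a playoff spot.
print(f"{playoff_odds(20, 122, 0.480, 88):.1%}")
```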

#2 jokin

    Twins News Team

  • Twins News Team
  • 7,326 posts

Posted 17 May 2014 - 10:33 AM

stringer bell wrote: "...the whole system was designed by Nate Silver, a pretty well-respected numbers guy (unless you're a Republican)..."

Actually, Nate Silver is now a pariah to the Democrats as well, given his current forward-looking political projections, and he's been pretty opaque about his methodology in both sports and politics, inviting controversy and speculation. He has now wisely returned to a sports focus to reduce the slings and arrows, leaving the NY Times and joining ESPN.

Good stuff, though, Stringer. Do you have the direct links to the meaty projection stuff?

#3 drivlikejehu

    Senior Member

  • Members
  • 534 posts

Posted 17 May 2014 - 11:42 AM

There are a variety of projection systems and massive amounts of discussion out there with respect to methodology. They are all based on the players' past performance... The differences have to do with adjustments that are made based on historical data (e.g., differences in how players at different positions age).

A player with a big breakout or collapse season will obviously veer a lot from the projection, which is really more of an expectation baseline than a true prediction.

#4 Briley

    Junior Member

  • Members
  • 8 posts

Posted 17 May 2014 - 04:20 PM

As said above, it's generally some combination of the player's history and historical data. I would imagine they are predicting that Dozier's power numbers will start to regress at some point, since he never really showed that power in the minors.
Once held a .900 OBP in Church League Softball.

#5 halfchest

    Senior Member

  • Members
  • 253 posts

Posted 17 May 2014 - 05:43 PM

There will always be outliers in these formulas, but they are fascinating, as I believe that overall they are pretty accurate. For individuals there will be a decent amount of error, but if you look at an entire team the predictions are more likely to be on target. Some players will outhit their projections and others will underhit them, which evens things out.
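The idea that individual misses tend to even out across a team can be illustrated with a small simulation. It assumes, hypothetically, that each of nine hitters misses his OPS projection by an independent, normally distributed error with a standard deviation of .040; both the independence and the .040 figure are illustrative assumptions, not measured values. Under independence, the error of the team average shrinks by roughly the square root of the number of players (about a third of a single player's error for nine hitters).

```python
import random
import statistics


def typical_errors(n_players: int = 9, player_sd: float = 0.040,
                   trials: int = 10_000, seed: int = 7) -> tuple[float, float]:
    """Return (mean absolute error for one player, mean absolute error of the team average)."""
    rng = random.Random(seed)
    one_player, team_average = [], []
    for _ in range(trials):
        # Independent projection misses for each hitter in the lineup.
        misses = [rng.gauss(0.0, player_sd) for _ in range(n_players)]
        one_player.append(abs(misses[0]))
        team_average.append(abs(statistics.fmean(misses)))
    return statistics.fmean(one_player), statistics.fmean(team_average)


single, team = typical_errors()
print(f"one hitter: {single:.3f} OPS, team average: {team:.3f} OPS")
```

If the misses were correlated (say, a league-wide change in offense pushing everyone the same way), the cancellation would be weaker than this sketch shows.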

That said, I was at least somewhat skeptical of Dozier, and there almost has to be some regression coming soon. If not, holy crap, he's on pace for what? 35 to 40 home runs? That would shock me. Still, I think some of his homers will be replaced by doubles and he'll end up with a .750 to .800 OPS.
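The "on pace for" arithmetic, and the way a projection tempers it, can be written out directly. Both functions are illustrative, the numbers are hypothetical (not Dozier's actual 2014 totals), and both assume the player appears in every remaining team game. The naive pace extrapolates the hot start over the whole season; the projected finish keeps the home runs already hit but uses a projected rate for the games that remain.

```python
def naive_pace(hr_so_far: int, games_played: int, season_games: int = 162) -> float:
    """Extrapolate the current home-run rate over a full season."""
    return hr_so_far / games_played * season_games


def projected_finish(hr_so_far: int, games_played: int,
                     projected_hr_per_game: float, season_games: int = 162) -> float:
    """Bank the home runs already hit, then apply a projected rate to the remaining games."""
    return hr_so_far + projected_hr_per_game * (season_games - games_played)


# Hypothetical: 10 HR through 42 games, and a projection of 18 HR per 162 games.
print(round(naive_pace(10, 42), 1))                  # prints 38.6
print(round(projected_finish(10, 42, 18 / 162), 1))  # prints 23.3
```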

#6 tarheeltwinsfan

    Senior Member

  • Members
  • 150 posts

Posted 17 May 2014 - 08:45 PM

That's exactly what I was thinking. It doesn't take a degree in statistics to say Dozier will not hit 40 HR this year. My hope is that he does and I'll be forced to eat these words. This leads me to the following question: exactly how does one eat his words when they are electronically conveyed?

#7 tarheeltwinsfan

    Senior Member

  • Members
  • 150 posts

Posted 17 May 2014 - 08:50 PM

Interesting post. I guess all I can say is that's why we play the games. Did anyone project both the Twins and the Braves going from worst to first in 1991? I doubt it.

#8 Brock Beauchamp

    Owner

  • Administrators
  • 8,940 posts

Posted 17 May 2014 - 09:15 PM

And this is why projection models are kinda broken. They can't tell you that Dozier changed his swing and had immediate success with it. They can't tell you that he looked like a different player after that date. All models can do is look at his season, compare it to his age, cross-compare it to similar players, and then spit out a number.

Except that Dozier was no longer the player his overall 2013 stats said he was, which renders the projection mostly useless.

#9 drivlikejehu

    Senior Member

  • Members
  • 534 posts

Posted 17 May 2014 - 09:32 PM

Brock Beauchamp wrote: "And this is why projection models are kinda broken."

Except that the projection models aren't "broken" just because they do not attempt to consider unusual changes in player ability. A good projection tells us what to reasonably expect. That's all. It's a starting point. The fact that some people don't understand a tool, or use it incorrectly, has zero bearing on its validity.

#10 CRArko

    Agent of SHIELD

  • Members
  • 1,695 posts
  • Location: The Playground
  • Twitter: crarko

Posted 18 May 2014 - 05:00 AM

drivlikejehu wrote: "A good projection tells us what to reasonably expect. That's all. It's a starting point."

Except that we can't know, because the basis of the model is being kept as a secret-sauce recipe, and the results are being thrown around with no mention of confidence levels, how large the variance is, or what assumptions the model makes.

If one did have training in statistics one might say it's pretty shoddy work. If one had training in advertising one might call it brilliant work. It's all in how you treat ambiguity.
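As an example of the kind of confidence statement being asked for here, a minimal sketch of a sampling-noise interval around a rate stat. It treats each plate appearance as an independent trial with the same true on-base probability (a simplifying assumption) and uses the normal approximation to the binomial. It captures only the luck in a season's worth of plate appearances and ignores uncertainty in the projection itself, so an honest interval around a projection would be wider still.

```python
import math


def rate_interval(rate: float, pa: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval (default ~95%) for an observed rate from sampling noise alone."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    se = math.sqrt(rate * (1.0 - rate) / pa)  # binomial standard error
    return rate - z * se, rate + z * se


# A hitter whose true OBP is .330, over a 600-PA season.
low, high = rate_interval(0.330, 600)
print(f"{low:.3f} to {high:.3f}")  # prints 0.292 to 0.368
```

Even with no change at all in true talent, luck alone can move a full-season OBP by more than thirty points in either direction under these assumptions.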

Go dark.

#11 Brock Beauchamp

    Owner

  • Administrators
  • 8,940 posts

Posted 18 May 2014 - 08:34 AM

drivlikejehu wrote: "...they do not attempt to consider unusual changes in player ability."

Many projection models simply predict regression from good players and progression from bad players.

In the aggregate they have some use but I've found them mostly useless for individual performance because, as you just admitted, they cannot compensate for individual drive, changes to approach, and other assorted things that human beings do on a daily basis.

#12 TheLeviathan

    Twins News Team

  • Twins News Team
  • 5,236 posts

Posted 18 May 2014 - 08:42 AM

Dismissing a projection because it isn't Nostradamus is as silly as putting total faith in it.

#13 Guest_USAFChief_*

  • Guests

Posted 18 May 2014 - 08:47 AM

drivlikejehu wrote: "...they do not attempt to consider unusual changes in player ability."

Wouldn't a projection system that, by design, doesn't attempt to include new information, be of questionable validity?

#14 PseudoSABR

    Twins News Team

  • Twins News Team
  • 1,973 posts

Posted 18 May 2014 - 11:11 AM

Brock Beauchamp wrote: "Many projection models simply predict regression from good players and progression from bad players. In the aggregate they have some use but I've found them mostly useless for individual performance..."

Right. Projection models must demonstrate their worth in the aggregate (where sums, or league totals, stay relatively stable from year to year), which means they necessarily emphasize regression to the mean for nearly all players (toward both league averages and the player's own career averages).

They are fun to look at and tell you something about where league totals sit and about the biases of their authors but probably aren't practically useful for much.
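The "regression from good players and progression from bad players" pattern falls out of a simple shrinkage estimate: blend a player's observed rate with a mean, weighting the mean as if it were k extra plate appearances of average performance. This is a generic illustration, not any particular system's formula; the league OBP of .315 and k = 300 are hypothetical, and the mean could be either the league average or the player's own career average, as described above.

```python
def regress_to_mean(observed: float, pa: int, mean: float, k: float) -> float:
    """Shrink an observed rate toward a mean, treating the mean as k PA of evidence."""
    return (observed * pa + mean * k) / (pa + k)


LEAGUE_OBP = 0.315  # hypothetical league average
K = 300             # hypothetical regression constant, in plate appearances

hot = regress_to_mean(0.400, 150, LEAGUE_OBP, K)   # hot start gets pulled down
cold = regress_to_mean(0.250, 150, LEAGUE_OBP, K)  # cold start gets pulled up
print(round(hot, 3), round(cold, 3))  # prints 0.343 0.293
```

More evidence means less shrinkage: the same .400 OBP sustained over 600 PA would come out at about .372 with these constants.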

#15 TheLeviathan

    Twins News Team

  • Twins News Team
  • 5,236 posts

Posted 18 May 2014 - 11:17 AM

The fact that projections are accurate in the aggregate is informing. In sports, fans like to believe their guy is the outlier because he changed his approach, he's a new player, he's "fill in token optimism here!"

In reality, most players DO regress to the mean over time. There will be outliers who establish new performance standards, but they are just that - outliers. We can all hope that Dozier is an outlier like Jose Bautista was. But for every Dozier there are a dozen Colabellos or Diamonds.

Their practical use is in tempering the expectations of overzealous fans.

#16 PseudoSABR

    Twins News Team

  • Twins News Team
  • 1,973 posts

Posted 18 May 2014 - 11:20 AM

drivlikejehu wrote: "A good projection tells us what to reasonably expect."

I think the point of debate is that it's not very reasonable to expect a player to perform to a model limited by historical data and 'secret sauce' metrics. I think projections are a far more opaque tool (or lens) than simply looking at the historical data ourselves.

If a glass hammer breaks, it's not user error.


#17 PseudoSABR

    Twins News Team

  • Twins News Team
  • 1,973 posts

Posted 18 May 2014 - 11:22 AM

TheLeviathan wrote: "The fact that projections are accurate in the aggregate is informing."

Is it? For me, it's a symptom that league totals don't change over time. Of course, as one guy gets inexplicably better, another guy gets inexplicably worse; for me, the usefulness is in identifying who is who.

#18 drivlikejehu

    Senior Member

  • Members
  • 534 posts

Posted 18 May 2014 - 11:23 AM

USAFChief wrote: "Wouldn't a projection system that, by design, doesn't attempt to include new information, be of questionable validity?"

They are based on objective, comparable information. Every year there is a huge amount of talk about adjustments players are making, new pitches, new approaches, etc. Most of the time it is noise.

But again, the issue here is just not understanding what the system is and what it is trying to do. For any individual player, there may be information that could potentially improve a projection. But that defeats the purpose of objectivity and consistency.

That's why the correct use is to start with the baseline provided and consider whether other factors may result in exceeding or falling short of the projection. Switching the order of that process is human error that does nothing to diminish the projections.

Also, it's not true they are all "secret sauce." The formula for Marcel (Tom Tango) is published, among others.

It is true you can basically replace their use by just studying a player's historical performance. That's not very efficient though.
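Since Marcel's method is public, a simplified Marcel-style rate projection can be sketched. The ingredients follow the system as it is commonly summarized (weight the last three seasons 5/4/3, regress by adding about 1200 PA of league-average performance, then apply a small age adjustment centered on age 29), but treat the exact constants as assumptions to check against Tango's own write-up. Marcel also projects playing time, which this sketch omits.

```python
def marcel_style_rate(seasons: list[tuple[float, int]], league_rate: float,
                      age: int, regression_pa: float = 1200.0) -> float:
    """Simplified Marcel-style projection of a per-PA rate.

    seasons: (rate, plate_appearances) pairs, most recent season first.
    """
    weights = (5, 4, 3)  # most recent season counts most
    weighted_events = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))

    # Regress toward the league: add regression_pa of league-average performance.
    rate = (weighted_events + league_rate * regression_pa) / (weighted_pa + regression_pa)

    # Age adjustment (constants assumed): improve before 29, decline after.
    if age < 29:
        factor = 1.0 + (29 - age) * 0.006
    else:
        factor = 1.0 + (29 - age) * 0.003
    return rate * factor


# Hypothetical hitter: OBP of .350, .330, .310 over the last three seasons, age 27.
history = [(0.350, 600), (0.330, 550), (0.310, 500)]
print(round(marcel_style_rate(history, league_rate=0.320, age=27), 3))  # prints 0.336
```

The whole thing is a few lines of arithmetic applied identically to every player, which is the objectivity and consistency point made above.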


#19 TheLeviathan

    Twins News Team

  • Twins News Team
  • 5,236 posts

Posted 18 May 2014 - 11:27 AM

PseudoSABR wrote: "...as one guy gets inexplicably better, another guy gets inexplicably worse; for me, the usefulness is in identifying who is who."

Did Diamond suddenly become a turd overnight? Colabello's deal with the devil expired at midnight on April 23rd?

The myth is that you can identify the trends any better than a projection system, even one that functions like a caveman club. Parker, who is an excellent analyst of baseball, wrote two blog posts earlier this year detailing the successes of Kubel and Colabello. He identified things that had changed that would explain why they were being successful.

They still succumbed to regression. There are few that don't over time and it's a valuable lesson for fans to keep in mind. (Sometimes, it's also a very good thing! Nudge Nudge to Mauer right now)

#20 Guest_USAFChief_*

  • Guests

Posted 18 May 2014 - 12:42 PM

TheLeviathan wrote: "The fact that projections are accurate in the aggregate is informing."

My projection system predicts the combined W-L record of the AL and NL this year will be .500, and my system has been accurate in the aggregate for going on two centuries now. However, I'm having trouble deciding what good the data is to me.

#21 TheLeviathan

    Twins News Team

  • Twins News Team
  • 5,236 posts

Posted 18 May 2014 - 12:56 PM

USAFChief wrote: "My projection system predicts the combined W-L record of the AL and NL this year will be .500..."

It tempers predictions. For example, every fan board thinks its team will improve by 5-10 games next year. We know that can't happen for everyone, and we can identify the culprits most likely to fall short.

If projections and analysis are only useful when they are 100% accurate, someone should let Parker and every other baseball commentator know they're wasting their time. It's a rather unfair bar to hold anything to.

The funny thing is we hear these kinds of objections when there is pessimism about the favored player/club. But if someone were to post the same projections that say we can expect better from Nolasco, Mauer, and probably Perkins? Well, hell, regression to the mean is awesome then!
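The fan-board point (and the .500 joke it answers) rests on a hard accounting constraint: every game produces exactly one win, so the teams in a league can't all improve at once. A minimal check, using hypothetical team names and projections, and assuming every game is played between two teams in the set:

```python
def excess_projected_wins(projected_wins: dict[str, float], games_per_team: int = 162) -> float:
    """How many more wins a set of projections claims than the schedule can produce."""
    n_teams = len(projected_wins)
    # Each game involves two teams and yields exactly one win.
    available_wins = n_teams * games_per_team / 2
    return sum(projected_wins.values()) - available_wins


# Hypothetical: fans of all 30 teams expect 86 wins (a .500 team plus 5).
optimistic = {f"team_{i}": 86.0 for i in range(30)}
print(excess_projected_wins(optimistic))  # prints 150.0
```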

#22 70charger

    Senior Member

  • Members
  • 1,176 posts

Posted 18 May 2014 - 11:02 PM

The best explanation for why projection models can't reflect reality in anything but the aggregate comes from an analogy to financial modeling, and the best explanation of the limits of financial modeling comes from Nassim Taleb's work, which I'd recommend even if you aren't interested in finance, because this kind of thinking is so universally applicable. I'm partial to "Fooled by Randomness" because the book blew my hair back when I first read it, but "The Black Swan" is quite good too.

Basically, one of the core tenets of the argument is that, while a bell-curve distribution may seem to describe things, the "tails" (events so statistically improbable that they are treated as virtually impossible) are not actually impossible. If a model treats them as impossible, it will not only fail, it will fail catastrophically, with cascading effects everywhere else. Distributions where those extreme events happen far more often than a bell curve would predict are said to have "fat tails."

We got an object lesson in fat tails when mortgage-backed securities, considered so "safe" as to be statistically almost riskless, blew up and took down the market. Baseball reveals its fat tails again and again (remember the old saying "that's why they play the games"?), and, as one poster previously mentioned, I would consider the Twins' 1991 season to be a resident of a pretty darn fat part of a fat tail.

TL;DR, the models may be able to accurately tell you what will happen in the aggregate, but there will be outliers each and every year that make you stand back, scratch your head, and ask "what the actual ****?"
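To make "fat tails" concrete, the sketch below compares two-sided tail probabilities for a standard normal (bell curve) and for a Student's t distribution with 3 degrees of freedom, rescaled to the same unit variance. The t distribution is simply a textbook example of a fat-tailed distribution, chosen here for illustration; nothing in this sketch models actual baseball or market outcomes. The t tail uses the closed-form CDF that exists for 3 degrees of freedom.

```python
import math


def normal_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))


def t3_tail(k: float) -> float:
    """P(|X| > k), where X is Student's t with 3 df scaled to unit variance."""
    # t with 3 df has variance 3, so |X| > k is the same event as |T| > k * sqrt(3).
    t = k * math.sqrt(3.0)
    # Closed-form CDF for 3 df: 1/2 + (t / (sqrt(3) * (1 + t^2/3)) + atan(t / sqrt(3))) / pi
    cdf = 0.5 + (t / (math.sqrt(3.0) * (1.0 + t * t / 3.0)) + math.atan(t / math.sqrt(3.0))) / math.pi
    return 2.0 * (1.0 - cdf)


for k in (1, 2, 3, 4):
    print(f"{k} sd: normal {normal_tail(k):.6f}  fat-tailed {t3_tail(k):.6f}")
```

At 1 and 2 standard deviations the fat-tailed version actually puts slightly less probability outside, but at 3 it is about five times the normal's tail and at 4 roughly a hundred times. A model that assumes a bell curve badly understates how often the extreme seasons show up.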

#23 spycake

    Senior Member

  • Members
  • 2,047 posts

Posted 19 May 2014 - 08:24 AM

Somewhat back on topic:

Don't know PECOTA, but the author of ZiPS has found that there is less in-season regression than season-to-season. Still small samples this year, but that bodes well for a few Twins (and not so well for a few others). Fun full article here:
http://www.hardballt...jection-system/

Relevant excerpt (Lesson #8):

Simply put, there was significantly less regression toward the mean for in-season stats than you would expect from the sample size, relative to season-to-season stats.

One notable example was BABIP, in that the BABIP overperformance, in the context of in-season projections, tended to stick more than one would expect from the heavier regression from season-to-season. That .400 first-half BABIP may be doomed next year, but players retain a surprisingly large amount of that bounce within the same season.
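Lesson #8 can be read in shrinkage terms: how much of a hot first half survives into a projection depends on the regression constant used. The sketch below uses the common shrinkage form, in which the retained share of a deviation from the mean is n / (n + k), with two hypothetical constants: a smaller k for an in-season projection and a larger k for next season. The k values and BABIP figures are illustrative assumptions, not ZiPS's actual parameters.

```python
def shrunk_babip(observed: float, balls_in_play: int, mean: float, k: float) -> float:
    """Keep a fraction n / (n + k) of the observed deviation from the mean."""
    retained = balls_in_play / (balls_in_play + k)
    return mean + (observed - mean) * retained


MEAN_BABIP = 0.300     # hypothetical league-average BABIP
K_IN_SEASON = 300      # hypothetical: lighter regression within the same season
K_NEXT_SEASON = 900    # hypothetical: heavier regression season to season

# A .400 first-half BABIP on 250 balls in play.
print(round(shrunk_babip(0.400, 250, MEAN_BABIP, K_IN_SEASON), 3))    # prints 0.345
print(round(shrunk_babip(0.400, 250, MEAN_BABIP, K_NEXT_SEASON), 3))  # prints 0.322
```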