Twins Daily

What If Everything is Actually Data?



Data and analytics have become scary words across the land of Minnesota Twins baseball recently.  Rocco Baldelli gets criticized regularly for never lifting his head out of the spreadsheet or for not trusting his players because the computer told him not to. Many say that the manager needs to go with his gut more often in order to win games, or that he needs to take game situations into account when making decisions.  Let’s dig in a little bit.

So what is this “data”?  Is data all those batting, pitching, and fielding statistics broken down into every conceivable combination and minute detail?  Of course it is.  That’s what we all think of.  How does Batter X perform against a particular pitcher?  Are there platoon advantages to be gained from Batter Y?  We need a home run, who is most likely to hit one right now?  We just need to advance the runner, is Batter Z the right guy to do that?  On the pitching side, how well does Pitcher A perform in his third time through the lineup?  What pitch should Pitcher B throw to Hitter W to get him out?  There are literally hundreds of different statistics out there to analyze and utilize.  The breakdown can go on forever and possibly to the point of silliness, like “What is Batter Q’s hitting line against a submarining lefty pitcher wearing a red uniform north of the Mason-Dixon line on a windy Thursday during Lent?”

So that’s what we understand data to be.  It’s all about numbers, right?  Well, maybe not.  The things we think of as data are merely numerically quantifying and confirming what is true (or disproving what is thought to be true). For example, in 1977, everyone knew that Rod Carew was the guy you wanted batting if you wanted to start a rally.  That was common knowledge.  Why? Well, because he seemed to get a lot of hits and walks and didn’t strike out a ton.  It’s a no-brainer, right?  Yes.  That’s right.  However, to use a simple piece of “data”, his on-base percentage that year was almost .450 (I guess that happens when you hit .388!).  Those numbers reinforce or “prove” that he was the guy that the Twins wanted batting in that situation.  Until they don’t.   Sometimes Rod Carew struck out.  In fact, in arguably the greatest hitting season in team history, he made outs 55% of the time.  Even so, he was still the best option in Gene Mauch’s and pretty much everyone else’s mind.
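For anyone who wants to see where that 55% comes from, here is a minimal sketch of the arithmetic. It assumes the rounded ".450" figure quoted above and treats on-base percentage as the share of plate appearances in which the batter reaches base, which is a simplification of the official formula. The function name `out_rate` is made up for illustration.

```python
# Illustrative only: the OBP is the article's rounded "almost .450",
# and out_rate() is a hypothetical helper, not an official statistic.
def out_rate(on_base_percentage: float) -> float:
    """Approximate share of plate appearances ending in an out."""
    return 1.0 - on_base_percentage


carew_1977_obp = 0.450  # approximate figure quoted in the article
print(f"Approximate out rate: {out_rate(carew_1977_obp):.0%}")
```

Even the best hitter of his era, by this rough measure, failed more often than he succeeded.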

What about that “gut feeling”?  It’s called anecdotal data.  It is a belief in something based on some evidence that the decision-maker values.  It’s “the eye test”.  He “looks like a major leaguer”.  “What a great pitch!”  Why do people say that?  Because they have seen things happen that confirm their feelings. Their brain is comparing it to other things they have seen and is making a value judgement based on their experiences.  We don’t realize it, but the personal computer in our head is keeping track and counting occurrences of how things play out on the baseball field.  The brain is analyzing the data that it sees and is coming to a decision.  We don’t think about it that way because we don’t think out loud and verbalize that we are analyzing.  We just “do it”.  No one needs to tell us to drive on the right side of the road, we just know (without knowing any numerical statistics) that driving on the left side would lead to very bad outcomes eventually. 

Back in days of old, when the 1927 Yankees came to town, managers (and pitchers) knew that they were in trouble getting through the heart of the order.  They probably knew Babe Ruth’s and Lou Gehrig’s batting averages and the number of home runs they hit, but that’s about all they had.  The rest was just their gut – what they thought might be true based on what they saw in the past. As time went on, more and more ways to quantify those gut feelings came along and gradually came into broader use across the league.  Do you think that manager Bucky Harris of the 1927 Washington Senators would have liked to have some statistical analysis that would help inform his decisions when facing the Bronx Bombers?  I’m certain that he would have.  He would likely have tried to use any advantage he could come up with, and knowing where Ruth’s and Gehrig’s weak spots in the strike zone were would have come in very handy.  Goose Goslin and Tris Speaker were good, but they were never going to keep up with the unchecked Bambino and Slambino. By the way, Bucky Harris was the second baseman as well as the manager that year, and he used whatever data he could conceive of to beat those damn Yankees.  It didn’t work.  The Senators were pretty good in 1927, but still finished in 3rd place.

So let’s return to 2023. Why do people think that Rocco Baldelli uses data and analytics too much?  Probably because he talks about it a lot and because the game across the league has changed more than fans of one team realize.  Rocco is a smart guy, and a numbers guy.  He’s playing the odds using as much actuarial science as he can in most of the baseball decisions he makes.  Spoiler Alert: This will not always result in decisions working out!  Just as with Rod Carew making outs 55% of the time in 1977, it is not an exact science. If Choice A has a 45% chance of success and Choice B has a 25% chance of success, I’m going with Choice A every single time, even if sometimes it will go the other direction.  This is what insurance companies do all the time when they set the rates that they charge for your insurance policy.  They know that sometimes they will be wrong, but the odds (informed by more statistical analysis than I want to think about or can comprehend) say that over the long term they will have made a good decision.  Add in the human element and those decisions get even more complex. 
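To make the "playing the odds" point concrete, here is a small simulation sketch. It uses the 45% and 25% figures from the paragraph above; everything else (the function names, the number of trials, the random seed, and the assumption that each decision is independent of the others) is invented for illustration and is not how any team actually models decisions.

```python
import random


def count_successes(p_success: float, trials: int, rng: random.Random) -> int:
    """Count how many of `trials` independent attempts succeed."""
    return sum(rng.random() < p_success for _ in range(trials))


def compare_choices(p_a: float = 0.45, p_b: float = 0.25,
                    trials: int = 10_000, seed: int = 2023) -> tuple[int, int]:
    """Simulate always picking Choice A versus always picking Choice B.

    The probabilities come from the article's example; trials and seed
    are arbitrary assumptions chosen only to make the run repeatable.
    """
    rng = random.Random(seed)
    return count_successes(p_a, trials, rng), count_successes(p_b, trials, rng)


wins_a, wins_b = compare_choices()
print(f"Choice A worked out {wins_a} times out of 10,000")
print(f"Choice B worked out {wins_b} times out of 10,000")
```

Choice A still fails more than half the time in a run like this, which is exactly what the second-guessing feeds on, but over a long stretch it comes out well ahead of Choice B.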

But Rocco still uses too much data!  If by that you mean he takes all the information available to him and factors it into the decision, then yes, by that definition he uses too much data.  Did Tom Kelly use all the information he had to make decisions?  Ron Gardenhire?  I’m pretty sure they did, and I’m pretty sure they would like to use the additional information that’s available now as well.  Are they better or worse managers than Rocco Baldelli?  I’m not here to answer that, but I’m certain that the determining factor shouldn’t be whether they used the most complete information available to them to make decisions. Sometimes the data will lead you in the right direction and sometimes it will be wrong, but decisions have to be based on something!  What do you think?

17 Comments


Reptevia

Posted

I think, still being in its infancy, a lot of the data is unreliable. Management does not agree, despite what reality is telling them.   

GOONER759

Posted

If you’re going to use analytics, you need an actual analytics department. The Twins have a few people in theirs, far fewer than good organizations have. Don’t pretend to use analytics. And yes, gut instinct has always been part of the game. You saw it recently with Jayce Tingler, who took over for Rocco after an ejection. He let Maeda pitch to the lefty.  Never would have happened under Rocco.

specialiststeve

Posted

Data and numbers are fine but everyone needs to understand it comes down to how they are used and the validity of the numbers...

Example is exit velocity... Great that Gallo can hit it 118 miles per hour out of the park... Buxton... Correa... etc.. but

What is the exit velocity of a strike out? Zero! A better stat would be average exit velocity across all AB's. My guess is that Gallo and Bux are at about 20 MPH.

A stat the "analytics" hate is hitting with runners on base or in scoring position... to that crowd it is an invalid stat saying it is all the same... WRONG again. Hitting with runner on or in scoring position is a skill and an art. Swinging from your heels hoping to run into one is not... We lack that fundamental skill and Rocco and the new hitting coach "don't get it"... 

Data... would say they both need to go..  

Brassman

Posted

The key piece of data right now is W’s and L’s. You have a team with talent superior to the rest of the division barely staying at .500 and ahead by a single game, along with historic numbers of strikeouts. Deep dives into data aren’t getting it done. 

Richie the Rally Goat

Posted

@Rod Carews Birthday great post! Very thoughtful. The buzzwords that get thrown around carry very different meanings for different fans.

I'm a pro-data fan, and I am an analyst by trade too.

I wonder if some of the debate is based on comfort with statistical/quantitative process?

I'm not a huge fan of modern baseball theory. 3 true outcomes hitters are boring as heck to watch. It seems like Manfred has identified some of the flaws in the modern game and has instituted rules changes. I wonder if we’ll see more (like lowering of the mound) to drive more change?

Strombomb

Posted

My problem is not that Rocco uses too much data. I think he uses the wrong data. As you stated, data comes from infinite sources. Sometimes "the human element" or some other intangible perception may outweigh the available numbers on your spreadsheet. 

Rod Carews Birthday

Posted

30 minutes ago, Strombomb said:

My problem is not that Rocco uses too much data. I think he uses the wrong data. As you stated, data comes from infinite sources. Sometimes "the human element" or some other intangible perception may outweigh the available numbers on your spreadsheet. 

I would say that you have a great point.  With infinite choices out there, it becomes a question of which items to choose and what points to prioritize, and sometimes getting that right is an imperfect science.  However, that "human element" is a data point as well, just not one easily quantified as a single number.  It's not that I think the game can be managed "by spreadsheet", it's that I think it never is by anyone, Rocco Baldelli included.  The use of analytics just gives us all a nice target to fire at when the team isn't playing up to expectations.  Put another way, what if the players just aren't as good as we thought they were?

Rod Carews Birthday

Posted

2 hours ago, Brassman said:

The key piece of data right now is W’s and L’s. You have a team with talent superior to the rest of the division barely staying at .500 and ahead by a single game, along with historic numbers of strikeouts. Deep dives into data aren’t getting it done. 

You are correct that the key is W's and L's.  I would suggest that the talent on the field not getting it done might be because they aren't as good as we want to think they are.  On the flip side, since the pitching talent IS getting it done, the road to winning is mostly tied up in the bats at this point, so it isn't completely unfixable.  No matter what the analytics say, it is still up to the players to produce and the offense is failing to do that.  Fix that, and the W's and L's will take care of themselves. Now, if only it was that easy.

Rod Carews Birthday

Posted

1 hour ago, Richie the Rally Goat said:

I wonder if some of the debate is based on comfort with statistical/quantitative process?

I'm not a huge fan of modern baseball theory. 3 true outcomes hitters are boring as heck to watch. It seems like Manfred has identified some of the flaws in the modern game and has instituted rules changes. I wonder if we’ll see more (like lowering of the mound) to drive more change?

I'm certain that the debate is absolutely based on comfort with statistical/quantitative process -- or more accurately, discomfort with the process.  Face it, this is a "change" from a more traditional approach, and people hate change, especially if they can't completely wrap their heads around it.

As to more rule changes, I'm not sure.  I think there are interesting possibilities, and things like the pitch clock and new steal/on base rules have been great.  However, I think there is enough blowback on the extra innings runner on 2nd rule that we may be in a temporary pause on new rules.  Or maybe he says "Hold my beer!"  That would be fun!

Gary Ray

Posted

I agree that analytics properly applied will increase the probability of winning. But I wonder if Rocco is taking into account sample size when making decisions on situational hitting.

tony&rodney

Posted

12 hours ago, specialiststeve said:

Data and numbers are fine but everyone needs to understand it comes down to how they are used and the validity of the numbers...

Example is exit velocity... Great that Gallo can hit it 118 miles per hour out of the park... Buxton... Correa... etc.. but

What is the exit velocity of a strike out? Zero! A better stat would be average exit velocity across all AB's. My guess is that Gallo and Bux are at about 20 MPH.

A stat the "analytics" hate is hitting with runners on base or in scoring position... to that crowd it is an invalid stat saying it is all the same... WRONG again. Hitting with runner on or in scoring position is a skill and an art. Swinging from your heels hoping to run into one is not... We lack that fundamental skill and Rocco and the new hitting coach "don't get it"... 

Data... would say they both need to go..  

This is part of what I used as a baseball coach. When a batter swings the bat there is always a result. The optimal end in my view was a ball totally squared up, hit into fair territory. A foul ball or a swing and a miss are just strikes. A wind blown home run or a ball lost in the sun or misplayed for extra bases is nice but my data recorded the relative manner in which the batter was able to successfully hit the ball. It is a goal of all hitters - hitting the ball. 

The exit velocity stat is nonsense as is the framing stat among others. Of course individuals have their opinions on a range of plays. I favored defenders getting outs, throwing to the right base, and range. If someone looked awkward but was effective, that worked for me. A guy who steals bases and goes first to third or knows where the fielders are when a ball is hit beats a burner who doesn't know where to go or when. And so forth.

Data has always been kept in various forms, but data fans now collect information on just about everything, so it is true that there is more data than ever. As specialiststeve argues, the way in which the data is collected and used can be very misleading, which creates numbers that make a player seem decent when he is actually not very useful. The job of the front office and field staff is to discern which players put together will produce the best results.

The Twins have one of the best records in baseball if you use the expected wins totals. This stat really demonstrates some of the futility of chasing all of the numbers. None of those expected wins count. The games are played and there is a result in the end, either a win or a loss. I sometimes think the Twins actually look at their expected wins to mollify or soften the blow of their mediocre record. You know, process being the point. 

There is another game tomorrow and the guys who dig through the data can find a stat for every Twin to show us that our guys are in the top ten of something to flash on the screen. But even they get frustrated and put up the RISP stat.

Muppet

Posted

After the game last night it is clear that the Twins are just out of their league. They are a good team, but not a great team. They have too many players that don't hit well and strike out too often. Half of their relief pitchers are not consistent enough. I'm sure they were well aware of their weaknesses going into the season. Their only hope of squeaking out a winning season and a possible playoff run was to rely on good starting pitching, defense, and finding new ways to play the data. None of those things are working out right now, but I don't blame them for trying. 

While they might have a better chance of winning if Royce Lewis hits instead of Kepler with a righty on the mound --- despite the reverse platoon issue, they still don't have a very good chance of winning against a team that far outmatches them. 

Once we go through the however many stages of grief there are, and accept that this is not a World Series winning team, we can sit back and have fun watching the young players develop; get excited about those rare good defensive plays (if they ever happen) from our supposed Gold Glove, Mendoza-line hitting outfield; and hope for the miracle that Cleveland, Detroit, and Chicago end up even worse, catapulting the Twins into the playoffs. Once in the playoffs, any team, no matter how bad, can get lucky and take it all.

Playing the data is one of their only hopes. It isn't working. But neither is stellar starting pitching. 

Rod Carews Birthday

Posted

9 minutes ago, Muppet said:

After the game last night it is clear that the Twins are just out of their league. They are a good team, but not a great team. They have too many players that don't hit well and strike out too often. Half of their relief pitchers are not consistent enough. . . .

Once we go through the however many stages of grief there are, and accept that this is not a World Series winning team, we can sit back and have fun watching the young players develop; get excited about those rare good defensive plays (if they ever happen) from our supposed Gold Glove, Mendoza-line hitting outfield; and hope for the miracle that Cleveland, Detroit, and Chicago end up even worse, catapulting the Twins into the playoffs. Once in the playoffs, any team, no matter how bad, can get lucky and take it all.

I can agree entirely that the Twins are not really in the same league as the Atlanta Braves are right now.  It's a really good team on a really big hot streak.  Ouch!  It's like getting hit by a runaway freight train. 

Like you, I'm very willing to sit back and watch some baseball.  Sometimes it will be frustrating, sometimes gratifying.  Watching baseball has been like that forever, especially with the Twins.  I also agree that once in the playoffs, anything can happen.  If playing with the data can get the team a few more wins and get them there, we need to embrace it and hope that it can make up for some of the other deficiencies. 

Jocko87

Posted

On 6/27/2023 at 2:35 PM, Strombomb said:

My problem is not that Rocco uses too much data. I think he uses the wrong data. As you stated, data comes from infinite sources. Sometimes "the human element" or some other intangible perception may outweigh the available numbers on your spreadsheet. 

And this is the key. Analytics is not about simply having data, it’s knowing which data to use in a given situation.  The OP listed a few good sources of other data points that a manager would use to know which Statcast inputs to apply.  As a manager in business gets more senior, this is the skill that is most valuable for success. Seeing the field and knowing what is important is everything. With our team, the hitters seem to be focused on the wrong thing and we see what happens.

I like the old school examples in the article. Here’s one I like to use:  “Pitching, defense, and the 3-run home run.”  It isn’t very different from what all the modern analytics tell us.  That was in the 60s. It was just derived from different data sources.  Getting guys on base and getting a big hit has been the core of baseball since the beginning, but the Moneyball era acts like it discovered it.  It’s just been different presentations of the same thing for 100 years.  Sometimes I think Rocco forgets that.  Yes, the data is better and it can better inform the decisions, but it should not override what your eye and experience see and know, it just informs it.

 

Unwinder

Posted

I think when fans complain about "analytics" they're really complaining about one of the following:

-Management continuing to play a player who has been in a deep slump because the "numbers" say he's a good player. In the fan's mind, "analytics" here means management having their noses in spreadsheets which compile data from previous months, or previous seasons, and ignoring the player's current performance.

-Management following a very simple top-level rule (righties should always hit against lefty pitchers, always pull the starting pitcher the third time through the order, hitters should always swing for the fences, never steal bases, never bunt...) and seemingly ignoring any complicating factor like which hitters are hot, whether the pitcher is cruising, whether there's a guy on third base and you only need one more run to walk it off, etc. What this fan is reacting against isn't "analytics," but bad, overly broad analytics. All those complicating factors are measurable, and could be used to make decisions mathematically.

-Management valuing a player for some esoteric stat (spin rate? launch angle?) and continuing to trust that statistic over the player's actual bad performance. I think fans sometimes even apply this to useful stats like OPS or WAR if the eye test tells them that a player strikes out too much or throws meatballs down the middle of the zone in clutch moments or whatever.

None of these errors are inherently what analytics are about, they're just a caricature of analytics.

smitbret

Posted

Data, like statistics analyzing player performance in various situations (batting averages, on-base percentages, pitching matchups), offers valuable insights. But similar to how even greats like Rod Carew strike out, data alone doesn't ensure success. Intuition, based on experience and past observations (anecdotal data), plays a role. Our brains subconsciously analyze situations and make judgments based on this data. Data analysis provides a more objective way to quantify those gut feelings. Historically, managers relied heavily on intuition due to limited data availability. However, with the rise of CRM data enrichment, teams can now gather deeper insights into player performance and potential by integrating past performance data with customer relationship management (CRM) data. This enrichment process can include scouting reports, medical history, and even social media sentiment analysis, providing a more holistic view of a player's value.
