What If Everything is Actually Data?
Data and analytics have become scary words across the land of Minnesota Twins baseball recently. Rocco Baldelli is regularly criticized for never lifting his head out of the spreadsheet, or for not trusting his players because the computer told him not to. Many say the manager needs to go with his gut more often to win games, or that he needs to keep game situations in mind when making decisions. Let’s dig in a little bit.
So what is this “data”? Is data all those batting, pitching, and fielding statistics broken down into every conceivable combination and minute detail? Of course it is. That’s what we all think of. How does Batter X perform against a particular pitcher? Are there platoon advantages to be gained from Batter Y? We need a home run; who is most likely to hit one right now? We just need to advance the runner; is Batter Z the right guy to do that? On the pitching side, how well does Pitcher A perform his third time through the lineup? What pitch should Pitcher B throw to Hitter W to get him out? There are literally hundreds of different statistics out there to analyze and utilize. The breakdown can go on forever, possibly to the point of silliness, like “What is Batter Q’s hitting line against a submarining lefty pitcher wearing a red uniform north of the Mason-Dixon Line on a windy Thursday during Lent?”
So that’s what we understand data to be. It’s all about numbers, right? Well, maybe not. The things we think of as data are merely numerically quantifying and confirming what is true (or disproving what is thought to be true). For example, in 1977, everyone knew that Rod Carew was the guy you wanted batting if you wanted to start a rally. That was common knowledge. Why? Well, because he seemed to get a lot of hits and walks and didn’t strike out a ton. It’s a no-brainer, right? Yes. That’s right. However, to use a simple piece of “data”, his on-base percentage that year was almost .450 (I guess that happens when you hit .388!). Those numbers reinforce or “prove” that he was the guy the Twins wanted batting in that situation. Until they don’t. Sometimes Rod Carew struck out. In fact, in arguably the greatest hitting season in team history, he made outs 55% of the time. Even so, he was still the best option in Gene Mauch’s and pretty much everyone else’s mind.
What about that “gut feeling”? It’s called anecdotal data. It is a belief in something based on some evidence that the decision-maker values. It’s “the eye test”. He “looks like a major leaguer”. “What a great pitch!” Why do people say that? Because they have seen things happen that confirm their feelings. Their brain is comparing it to other things they have seen and is making a value judgement based on their experiences. We don’t realize it, but the personal computer in our head is keeping track and counting occurrences of how things play out on the baseball field. The brain is analyzing the data that it sees and is coming to a decision. We don’t think about it that way because we don’t think out loud and verbalize that we are analyzing. We just “do it”. No one needs to tell us to drive on the right side of the road. We just know (without knowing any numerical statistics) that driving on the left side would lead to very bad outcomes eventually.
Back in the days of old, when the 1927 Yankees came to town, managers (and pitchers) knew that they were in trouble getting through the heart of the order. They probably knew Babe Ruth’s and Lou Gehrig’s batting averages and the number of home runs they hit, but that’s about all they had. The rest was just their gut: what they thought might be true based on what they had seen in the past. As time went on, more and more ways to quantify those gut feelings came along and gradually came into broader use across the league. Do you think that manager Bucky Harris of the 1927 Washington Senators would have liked some statistical analysis to help inform his decisions when facing the Bronx Bombers? I’m certain that he would have. He would likely have tried to use any advantage he could come up with, and knowing where Ruth’s and Gehrig’s weak spots in the strike zone were would have come in very handy. Goose Goslin and Tris Speaker were good, but they were never going to keep up with the unchecked Bambino and Slambino. By the way, Bucky Harris was the second baseman as well as the manager that year, and he used whatever data he could conceive of to beat those damn Yankees. It didn’t work. The Senators were pretty good in 1927, but still finished in 3rd place.
So let’s return to 2023. Why do people think that Rocco Baldelli uses data and analytics too much? Probably because he talks about it a lot and because the game across the league has changed more than fans of one team realize. Rocco is a smart guy, and a numbers guy. He’s playing the odds using as much actuarial science as he can in most of the baseball decisions he makes. Spoiler Alert: This will not always result in decisions working out! Just as with Rod Carew making outs 55% of the time in 1977, it is not an exact science. If Choice A has a 45% chance of success and Choice B has a 25% chance of success, I’m going with Choice A every single time, even if sometimes it will go the other direction. This is what insurance companies do all the time when they set the rates that they charge for your insurance policy. They know that sometimes they will be wrong, but the odds (informed by more statistical analysis than I want to think about or can comprehend) say that over the long term they will have made a good decision. Add in the human element and those decisions get even more complex.
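For anyone who wants to see the "play the odds" idea as arithmetic, here is a minimal sketch. It uses only the 45% and 25% figures from the example above; the function names, the 100-decision sample, and the assumption that each decision is independent of the others are mine, purely for illustration, and not a description of how any team actually models things.

```python
# Hypothetical success rates, taken from the Choice A / Choice B example.
P_CHOICE_A = 0.45
P_CHOICE_B = 0.25


def expected_successes(p_success: float, decisions: int) -> float:
    """Average number of successes over repeated, independent decisions."""
    return p_success * decisions


def chance_of_at_least_one_failure(p_success: float, decisions: int) -> float:
    """Probability that the choice fails at least once in `decisions` tries."""
    return 1 - p_success ** decisions


if __name__ == "__main__":
    decisions = 100  # illustrative sample size, not a real season total
    a = expected_successes(P_CHOICE_A, decisions)
    b = expected_successes(P_CHOICE_B, decisions)
    print(f"Choice A: about {a:.0f} successes in {decisions} decisions")
    print(f"Choice B: about {b:.0f} successes in {decisions} decisions")

    # Even the better choice fails more often than it works on any one try.
    print(f"Choice A fails on a single try {1 - P_CHOICE_A:.0%} of the time")
    fail3 = chance_of_at_least_one_failure(P_CHOICE_A, 3)
    print(f"Chance Choice A fails at least once in 3 tries: {fail3:.1%}")
```

The point of the sketch is the gap between the two views: over many decisions Choice A clearly comes out ahead, yet on any single night it is still more likely to fail than succeed, which is exactly when the second-guessing starts.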
But Rocco still uses too much data! If by that you mean he takes all the information available to him and factors it into the decision, then yes, by that definition he uses too much data. Did Tom Kelly use all the information he had to make decisions? Ron Gardenhire? I’m pretty sure they did, and I’m pretty sure they would have liked to use the additional information that’s available now as well. Are they better or worse managers than Rocco Baldelli? I’m not here to answer that, but I’m certain that the determining factor shouldn’t be whether they used the most complete information available to them to make decisions. Sometimes the data will lead you in the right direction and sometimes it will be wrong, but decisions have to be based on something! What do you think?

