I haven't read enough of this thread to know whose side I'm taking, but I do understand that people who follow baseball and watch baseball and coach and manage and scout baseball are way better at telling you whether someone is a "good" or "not good" fielder than UZR. The human eye can tell the difference between routine plays and great plays, botched easy plays and botched tough plays, and also tell you whether a player is more or less likely to boot a baseball or be out of position, or not paying attention, when the pressure is on than during the late innings of a blowout.
I certainly understand statistics (WAR, OPS+, etc.) that attempt to take individual statistics and turn them into something that measures a player's impact on his team and the game. I think those statistics are maturing and are valuable. I think there will also be a measure of fielding at some point. But getting into a discussion about the "value" of UZR as a way to tell whether someone is a good or bad fielder, regardless of the sample size, ignores the realities of baseball (though I do agree that larger sample sizes over several seasons are more valuable than whether someone went a month without an error).
JJ was a sure-handed shortstop with limited range (and that's exactly why the Twins got rid of him - they thought they needed more speed on the bases and in the field in order to take that next step, which hindsight tells us was an improvident move); Valencia occasionally made good plays, but too often did not make routine plays, and this failing seemed to increase with the pressure of a situation. He also consistently struck me as a person whose head was not 100% into the game. I don't know what UZR says about either of those guys - I just know by watching a bunch of baseball that I trusted Hardy and didn't trust Valencia.
If someone tells me, then, that UZR says that Valencia is a better fielder than Hardy, or even that he was for a short period of time, I tell them that UZR is an inaccurate measure of fielding.
This post is nearly perfect. The only thing I disagree with... I don't recall ever seeing Valencia make a great play. At least... not as often as I see others make a great play... Everything else is spot on.
This discussion is a case of everyone being kinda right. A big part of my job is stats, and one advantage I have over a lot of people who are looking at the same stats is recognizing conclusions based on bad or insufficient data. Others will take the flawed data and let it lead them in the wrong direction. I believe I'm able to toss out the bad stuff.
UZR is just plain bad data. It's based on typical zones that are assigned to a position. It doesn't take into account that players don't play in the same spot batter to batter. This is just one thing in the many shades of defense that help keep runs off the board. Failing to get an out keeps your defense on the field, and the damage can be 5 or 6 runs or it can be nothing if the next batter is simply retired.
Chief is right... You can put as much bad data together and increase the sample size... It's still bad data and the result will be flawed...
Brock is right... Increasing the sample size will stabilize the result and make it more reliable. If you study jackknife replication... You understand that results will replicate at a certain point.
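To put some numbers on why Chief and Brock can both be right, here's a rough simulation sketch. Everything in it is made up for illustration: the 80% true conversion rate, the 5-point measurement bias, and the function name `simulate_estimates` are my assumptions, not anything taken from UZR itself. It just shows that more plays shrink the spread of the estimate (Brock's point) while the estimate still settles on the wrong number if the measurement is off (Chief's point).

```python
import random
import statistics


def simulate_estimates(true_rate, bias, n_plays, n_trials=200, seed=0):
    """Estimate a fielder's conversion rate from n_plays, repeated n_trials times.

    true_rate and bias are hypothetical: the metric is assumed to "see"
    true_rate + bias instead of true_rate (for example, because it
    misjudges where the fielder was standing).
    Returns (average estimate, spread of the estimates across trials).
    """
    rng = random.Random(seed)
    measured_rate = true_rate + bias
    estimates = []
    for _ in range(n_trials):
        # Count how many of the n_plays the flawed metric credits as made.
        made = sum(rng.random() < measured_rate for _ in range(n_plays))
        estimates.append(made / n_plays)
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (50, 500, 5000):
        mean, spread = simulate_estimates(true_rate=0.80, bias=0.05, n_plays=n)
        print(f"plays={n:5d}  average estimate={mean:.3f}  spread={spread:.3f}")
```

The spread column gets smaller as the number of plays goes up, but the average stays near 0.85 rather than the true 0.80. A bigger sample makes the answer more stable, not more correct.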
One thing to always keep in mind with baseball metrics... Including offensive metrics. The data is constantly changing. Players improve... Players get worse. They get hot... They get cold... The length of the hotness and coldness goes up and down. Sample size only matters if the data is consistent. There is nothing consistent about baseball. Baseball stats are great... Baseball stats are valuable, but there is some bogus stuff out there and making it gospel will ultimately be a mistake. Combo the stats with your eyes and you have a better chance at a solid conclusion.
Defensive metrics will get better because smart people are working on them, but if the result rendered today says that Valencia is a defensive friend of the pitchers, it's bad data leading to a bad result.
This post from tmerrickkeller says it perfectly... Just watch Valencia play defense.