By Andy Singer
December 30, 2022
Most of you know me as someone who uses a combination of video and statistical analysis to evaluate players and on-field performance. I believe very strongly that many of the publicly available statistics from places like Baseball-Reference and Fangraphs have always given fans a relatively accurate assessment of player value. One such statistic, calculated separately by both of those bastions of baseball statistics, is WAR (bWAR and fWAR, respectively). While WAR has often been criticized (badly, I might add) by some prominent members of the baseball community, we've reached the point where most arguments against WAR fall flat, and most of us have accepted that WAR gets pretty close to allowing us to compare the total on-field value of players both within and across generations.
The counter to the previous paragraph comes in the form of a quote popularized by Mark Twain: "There are three kinds of lies: lies, damned lies, and statistics." I want to quickly note a growing problem I see with the way WAR is currently calculated, and why WAR is not the be-all, end-all for arguments about player value in the current game.
The trickiest thing to measure statistically in baseball has always been defense. As the statistical revolution hit the game, we really didn't have concrete measurements for fielding events the way we did for offense. That is why Billy Beane's Moneyball A's largely ignored defense: no one had a reliable way to measure it. Errors give a very limited, and subjective, view of defense, and until the last 20 years there really wasn't much else, so it stands to reason that defensive statistical analysis is decades behind offensive and pitching analysis.
Multiple statistics have been developed over the years in an attempt to give us more well-rounded measures of defensive value. Early on, I remember heated arguments about Derek Jeter's defense at SS that hinged on range factor, a statistic we almost never talk about just 15 years later. In more recent seasons, we have depended on Defensive Runs Saved (DRS) and Ultimate Zone Rating (UZR) for publicly available calculations of defensive value, and these statistics are prominent in both Baseball-Reference's and Fangraphs' calculations of WAR.
The problem? As we get better at evaluating defense, we have reached the point where it is clear that neither DRS nor UZR is the best statistic out there. Since Baseball Savant began publishing Statcast data picked up by the tracking systems installed in every Major League ballpark, it has become clear that Outs Above Average (OAA) is far superior to DRS and UZR, and it blends the game we see in numbers with the game we see with our eyes. OAA produces a single number for defensive value, yes, but we can also dig into a physical map of plays made versus plays missed, with value assigned to each play based on the range and reaction time it took to make it and the probability the play was makeable, estimated from statistically similar plays given the batted ball's exit velocity, launch angle, and direction, as well as the defender's positioning.
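To make the bookkeeping concrete, here is a minimal sketch of how an OAA-style tally works once you have an estimated catch probability for every play. The `Play` structure and the probabilities below are made up for illustration; Statcast's actual models are far more involved than this.

```python
# Toy sketch of OAA-style bookkeeping, not Statcast's actual model.
# Each play is credited with (actual outcome - expected outcome), where the
# expected outcome is the catch probability for an average fielder.

from dataclasses import dataclass
from typing import List

@dataclass
class Play:
    catch_probability: float  # estimated chance an average fielder makes this play
    converted: bool           # did this fielder actually record the out?

def outs_above_average(plays: List[Play]) -> float:
    """Sum of (out recorded minus expected outs) across all plays."""
    return sum((1.0 if p.converted else 0.0) - p.catch_probability for p in plays)

# Hypothetical fielder: converts a tough play (20%), converts a routine play (95%),
# and misses a 50/50 ball.
plays = [Play(0.20, True), Play(0.95, True), Play(0.50, False)]
print(round(outs_above_average(plays), 2))  # 0.8 + 0.05 - 0.5 = 0.35
```

The appeal of this framing is that hard plays made earn large credit, routine plays earn almost none, and missed 50/50 balls only cost half an out, which is why the resulting number tracks the eye test better than error-based measures.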
I used to get hung up on the times DRS, UZR, and OAA wouldn't agree, and those disagreements led to some bad evaluations. One of particular note: I argued very strongly last offseason that Isiah Kiner-Falefa was a good target for the Yankees if they sought a defensive-minded SS with a contact-oriented bat as a stopgap to keep the seat warm for Volpe and Peraza. If you believe Baseball-Reference's WAR calculation, I was right, as IKF produced 3.0 bWAR in 2022! Fangraphs, though, was decidedly less thrilled with IKF's performance, valuing his play at 1.3 fWAR. What a difference?!? I think Fangraphs hit much closer to the mark here, but that's beside the point. Fangraphs and Baseball-Reference evaluated IKF's offensive contribution similarly but gave wildly different evaluations of his defense. OAA aligned much more closely with Fangraphs' calculation, but that is not always the case.
My issue is that we now have enough data to prove that OAA is far superior to the defensive metrics Baseball-Reference and Fangraphs have long used to calculate WAR. Fangraphs has seemingly already acknowledged as much by including OAA among the fielding statistics it publishes, even if it isn't used for WAR. The time has come for both Fangraphs and Baseball-Reference to update their WAR calculations to include OAA.
WAR is not perfect, but it could be a lot better. Until OAA is included as part of any WAR calculation, I will only cite WAR explicitly in situations where defensive valuations match closely; otherwise, I will be forced to note the discrepancy as a form of asterisk. I hope we see updates in the near future.
OAA is far superior to the defensive metrics Baseball-Reference (bWAR), Fangraphs (fWAR), and Baseball Prospectus (WARP) have long used to calculate overall player performance versus a replacement-level or "League Average" player. I never liked the term "replacement-level player" because it's very ambiguous. Who, or what, is a replacement player anyway? I have no idea and I never have! But I certainly can wrap my noodle around a term like "League Average." For this reason, I very much appreciate Baseball-Reference for their willingness to list League Average on many of their various team stats pages, but unfortunately they don't give a baseline for League Average on player pages. That's where Baseball Savant comes in, as they do this!
I'm old school and, like a few others, like the old stat line, but thanks, Andy, I learned something after reading your article. Like you said, defensive metrics are a work in progress, as is WAR! 😀
WAR lost all credibility with me when it ranked Mattingly at -6.2 and Keith Hernandez at 1.3 dWAR.
In 1989, Ozzie Smith was assigned a dWAR of 4.8. Over the course of a long career, Smith ended up with a career dWAR of 44.2; this seems reasonable. Over the course of a very long career, Derek Jeter ended his tenure with a career dWAR of -9.4. This also seems roughly reasonable... although I suspect that he was not really that valuable as a defender.
I know that Statcast calculates catch expectancy on plays. Is there somewhere a listing of each player's average catch expectancy success rate? I'd love to see if Player A caught 92% of plays that had an 80% catch expectancy, while Player B was only at 78%.
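For what it's worth, the comparison described here is a simple aggregation once per-play catch probabilities are in hand. A minimal sketch, using invented play data rather than anything pulled from Baseball Savant:

```python
# Sketch of the comparison described above: within a bucket of plays at a given
# expected catch rate, how often did a particular fielder actually make the out?
# The (catch_probability, out_made) pairs below are invented for illustration;
# real inputs would come from a per-play catch-probability source.

from typing import List, Optional, Tuple

def success_rate_in_bucket(plays: List[Tuple[float, bool]],
                           low: float, high: float) -> Optional[float]:
    """Actual out rate on plays whose catch probability falls in [low, high)."""
    outcomes = [made for prob, made in plays if low <= prob < high]
    return sum(outcomes) / len(outcomes) if outcomes else None

player_a = [(0.82, True), (0.78, True), (0.85, False), (0.80, True), (0.83, True)]
print(success_rate_in_bucket(player_a, 0.75, 0.90))  # 0.8 -> made 80% of ~80% plays
```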