SSTN Interviews Author and Statistician Sean Lahman
Today we are here with Sean Lahman. Sean is an investigative reporter for the USA Today Network, but among baseball fans, he is best known as the creator and maintainer of the Lahman Baseball Database. Sean has edited or contributed to more than a dozen sports reference books. He is the Data Projects Manager for SABR.
Sean, it is great to have this discussion with you. Thanks for coming to Start Spreading the News.
Thanks for having me. Always appreciate a chance to talk about baseball.
Please begin by telling us a little about your career. How did you get started in writing?
I started writing about baseball when I was in college in the late 1980s. I think my first paid piece was for one of the Bill James books in 1987 or so.
There was an interesting window of time between the boom of interest in fantasy baseball – or “rotisserie baseball” as it was called then – and widespread access to the Internet. There was a huge hunger for information about individual players, for scouting reports and basic analysis. Most of that content was published in books and magazines, and I wrote for many of the most popular ones. I didn’t have my mind set on a journalism career yet, but that was how I got my start.
As you know, I am an elementary school principal. If I were to talk to the students of my school about what it takes to make it in journalism, what advice would you tell me to share with them?
I always tell people that if you want to be a writer, you have to read voraciously. You may not understand why a piece of writing moves you, but it gives you something to aspire to.
And secondly, I would say that the craft of writing is a discipline, and as with most other things, the only way to get better is to write and write and write. I wrote for my high school paper and probably did four or five articles during my senior year. Now, as a reporter for the USA Today Network, I write that many just about every week. It’s not uncommon for me to file two or three stories in a day.
Please tell us about the Lahman Baseball Database.
It’s a collection of batting, pitching and fielding statistics for every player, team and manager in major league history. I started making it available on my website in the mid-1990s and it’s become a useful resource for anyone interested in statistical analysis.
As you know, it is a resource used by so many.
How did you compile all of this information? The task just seems so overwhelming to even consider.
I started the hard way, by manually typing names and numbers from the baseball encyclopedia into spreadsheets. I worked with a book called “The Sports Encyclopedia Baseball” by Neft & Cohen that was organized by season. I would tackle one year at a time over the course of a few days and then work back in time.
The process had some flaws, beyond just my own typos. That book didn’t have fielding statistics or some other things like hitting by pitchers. And my version of the encyclopedia only went back to 1900.
In the mid-nineties, Pete Palmer and John Thorn published a CD-ROM version of their baseball encyclopedia, called Total Baseball, which contained each page of the printed version in a text format. I figured out how to write a series of programs to scrape the data from those pages and built a database structure to contain it all. The process helped me fill in gaps and correct errors.
The amazing thing is that all of this information is free to the public. What made you offer this great research work for free?
I just assumed that others would follow suit and build their own databases in the same way, but it was clear that wasn’t happening.
When I started getting interested in baseball history and statistical analysis, I realized that there was a huge barrier to getting started – the raw data just wasn’t available to the average person. If you wanted to ask a basic question – like which Yankees’ hitters had the most sacrifice bunts in their career – there was really no easy way to find an answer. If you wanted to tackle more complicated questions, like understanding how aging affects player performance, you had to start by building the entire dataset yourself.
So I wanted to help remove that barrier for others, knowing that there were a lot of people smarter than I was who’d come up with new and interesting ways of analyzing the data. And I think that’s exactly what happened.
This was, and is, fantastic work.
What was the first great bit of trivia that you uncovered in your research?
Never a big trivia guy, but I was always fascinated by oddities, like the fact that Ralph Kiner led the league in home runs in each of his first seven seasons in the majors.
Or that Hall of Famers Stan Musial and Ken Griffey, Jr. were both born in the same tiny town of Donora, PA.
How about this one… Hall of Fame pitcher Hoyt Wilhelm hit a homer in his first big league at bat. He played 21 seasons and never hit another.
Those are all great pieces of trivia. Absolutely.
How often does the baseball database get updated? Do the statistics change in real time?
It’s updated once a year. When it first launched, the process of making annual updates was cumbersome and took months. There was a group of volunteers who did the work together. Recently, that process has become largely automated. A fellow named Ted Turocy wrote some programs that handle the process now.
And automation makes it possible for others to make updates every day. If you go to the great Baseball Reference website in the morning, you’ll find every player’s record has been updated to include what he did the night before.
Like many fans, I visit Baseball-Reference probably every single day.
Please tell us about your thoughts on adding Negro Leagues statistics to MLB’s official record. What are some of the most impressive stats that will now be part of baseball’s Major League history?
I think it is a fantastic step, and long overdue. The segregation of baseball is a dark chapter in the history of the game and our society. I think it’s important to acknowledge the players who were denied an opportunity to compete at the highest levels… not just to honor their athletic prowess but to acknowledge the shameful practice that kept them out.
A handful of those Negro Leagues stars are already well known and some have been honored with induction into the Baseball Hall of Fame. But this step now to elevate all of the players on all of the teams is vitally important.
Twenty years ago, this process would have been much more difficult, but a great team of researchers has been working diligently to document every Negro Leagues game from newspaper accounts, giving us the most accurate and complete version of statistics for those leagues.
I think what’s most interesting about this is that we can move beyond some of the tall tales and put specific numbers beside those players. For example, Josh Gibson, long recognized as a great power hitter, also had a career batting average of .361. That could put him at number two on the all-time list behind only Ty Cobb’s .366.
You were on the team to write The Baseball Biographical Encyclopedia. Are there plans to update that great work?
Probably not, if only because it’s harder to publish those kinds of 8-pound books anymore. A group of us wrote about 2,000 biographical entries to comprise that book, and it was a big task to update them a few years later.
I will say that SABR launched a great project, called the BioProject, with the goal of writing biographies of every major league player. That’s more than 20,000 people, with more being added every year.
I also love the BioProject and look forward to contributing to it soon.
Do you plan to write a book on your own?
I wouldn’t rule it out. I worked on more than a dozen sports encyclopedias and wrote a handful of books in the early 2000s. Baseball’s my first passion, so if I do write another it would likely be a baseball book.
In looking at the history of the Yankees, or baseball in general, what person or event would you like to see a book written about?
If I had a great answer to that, maybe I’d have the idea for my next book. There have been so many books written about great and colorful players from the Yankees, and across baseball. I’d certainly be interested in a book that took an objective look at the legacy of Commissioner Bud Selig. For better or worse, he had a huge impact on the game.
In the book and the movie The Natural, the main character wants nothing more than to walk down the street and have people say, “There goes Roy Hobbs, the best there ever was.” Who was the best baseball player you ever saw?
That was a great book, and let me take this opportunity to recommend it to folks who only saw the movie. It’s such a rich story, and – spoiler alert – the book has a much different ending.
The best player of my lifetime was Barry Bonds, and nobody else was close. You might make an argument for Henry Aaron or Willie Mays, and I only got to see each of them at the tail end of their careers. But for my money, nobody was as uniquely great as Bonds was in every facet of the game. A disciplined hitter. Tremendous power. Great speed both in the outfield and on the base paths, but more than just raw talent. Kind of prickly, yes, and tainted by steroids, but the best I ever saw.
A guy like Mike Trout might surpass him, and he is certainly off to a great start. But the hardest part is continuing to play well into your thirties and beyond. We’ll just have to wait and see where he ends up.
Our final question is really just a collection of short answers…
What was your favorite baseball team growing up?
I grew up in Cincinnati in the 1970s so I was and remain a huge fan of the Reds. I fell in love with baseball when I was listening to Reds games on the radio in my bedroom every night, following along and looking at baseball cards to learn more about the players.
Who was your favorite player?
I first got interested in baseball as a young kid while following Henry Aaron’s pursuit of the all-time home run record. He retired a few years later and as I said, I only got to see the tail end of his career. As a Reds fan in the 1970s, Pete Rose was my favorite, and later it was their great centerfielder Eric Davis.
What is your most prized collectible?
I’m not really much of a memorabilia collector, but I have way too many baseball books. I have some really old ones and some autographed by writers I really admire.
Who is your favorite musical group or artist?
Grew up listening to the Beatles and Bob Dylan, but I love all kinds of music. The Beastie Boys have always been one of my favorites. During the pandemic, I’ve been listening to a lot of the contemplative songs of Wilco. They seem to speak to the experience we’re all going through, for me at least.
What is your favorite food (if it is pizza, what is your favorite pizza restaurant)?
Have always been a sucker for a good cheeseburger, and Five Guys is my current favorite. I always loved Skyline Chili, too, which is a Cincinnati landmark but probably little known elsewhere. I haven’t lived in Cincinnati for nearly thirty years now, and it’s hard to get that authentic Cincinnati-style chili anywhere else.
Please share anything else you’d like with our audience…
Thanks for the invitation, and let’s hope we can get through this pandemic so we can all get back out to the ballpark.
This was so great. We appreciate that you took this time with us. Keep up the great work. It’s people like you that provide so much to the people who love this sport.