metachronistic

Fri, 01 Apr 2011

Opening day!

Tim Lincecum and the Giants

Yesterday was opening day in Major League Baseball. Yahoo!

I watched bits of Tigers v. Yankees, Padres v. Cardinals, and much of the Giants v. Dodgers game, all on my iPhone. I haven’t subscribed to MLB.tv, but yesterday’s games were sponsored by Volvo, so anyone with the MLB 11 app could watch for free. The quality is reasonable, and the app seems smart about not consuming all the available bandwidth.

I root for the A’s, Giants and Phillies (and favor the Tigers and Cubs when there are no other rooting interests). All three are pretty good teams with the starting pitching to contend, but all three may have trouble scoring runs. The A’s have no power threat and a cadre of fragile, replacement-level position players; the Giants need a good season from Posey, Sandoval and rookie Brandon Belt; and the Phillies are hurting without Werth and Utley. But however it turns out, it’s great to have baseball back.

Well, however it turns out as long as the Yankees, Angels, Braves and Dodgers don’t wind up in the World Series…

cswingle @ 6:06:36 -0800

Sun, 31 Jan 2010

A’s 2010 Roster heatmap

I recently saw a pair of blog posts showing how to make heatmaps with straight R and with ggplot2. Basketball doesn’t really interest me, so I figured I’d attempt to do the same thing for the 2010 Oakland Athletics 40-man roster. Results are at the bottom of the post.

First, I needed to get the 40-man roster:

$ w3m -dump "http://oakland.athletics.mlb.com/team/roster_40man.jsp?c_id=oak" > 40man

Then trim it down so it’s just a listing of the players’ names.

Next, get the Baseball Databank (BDB) database from http://baseball-databank.org/, then convert it and load it into a PostgreSQL database using mysql2pgsql.perl.

A Python script reads the names from the roster, and dumps a CSV file of the batting and pitching data for the past two seasons for the players passed in.

$ cat 40man_names | ./get_two-year_batter_stats.py

The batting data looks like this:

            name  , age,   g,    ba,   obp,   slg,   ops,  rc,   hrr,    kr,   bbr
Daric Barton (1B) ,  25, 194, 0.238, 0.342, 0.365, 0.707,  73, 0.017, 0.173, 0.134
Travis Buck (RF)  ,  27,  74, 0.223, 0.289, 0.392, 0.682,  28, 0.035, 0.202, 0.073
Chris Carter (LF) ,  28,  13, 0.261, 0.320, 0.261, 0.581,   1, 0.000, 0.360, 0.080
...

I’ve used the counting stats in the BDB to calculate batting average (ba), on-base percentage (obp), slugging percentage (slg), OPS (on-base percentage + slugging percentage), runs created (rc), home run rate (hrr), strikeout rate (kr) and walk rate (bbr).
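For readers who want to reproduce these columns, here is a minimal Python sketch of the arithmetic. It is not the script used for the post: the function name and argument names are hypothetical, the three rates are assumed to be per plate appearance, and runs created is assumed to be the basic (H + BB) × TB / (AB + BB) version, none of which the post confirms.

```python
def batting_rates(ab, h, doubles, triples, hr, bb, so, hbp=0, sf=0):
    """Convert batting counting stats into rate stats.

    Assumptions (not confirmed by the post): hrr, kr and bbr are per
    plate appearance, and rc is basic runs created.
    """
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    pa = ab + bb + hbp + sf  # simplified plate appearances

    ba = h / ab if ab else 0.0
    obp = (h + bb + hbp) / pa if pa else 0.0
    slg = total_bases / ab if ab else 0.0
    rc = (h + bb) * total_bases / (ab + bb) if (ab + bb) else 0.0

    return {
        "ba": ba,
        "obp": obp,
        "slg": slg,
        "ops": obp + slg,
        "rc": rc,
        "hrr": hr / pa if pa else 0.0,
        "kr": so / pa if pa else 0.0,
        "bbr": bb / pa if pa else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical stat line, not a real player
    line = batting_rates(ab=500, h=150, doubles=30, triples=5, hr=20, bb=50, so=100)
    print(", ".join(f"{name}={value:.3f}" for name, value in line.items()))
```

Summing the two seasons of counting stats before calling the function gives two-year rates rather than an average of yearly rates, which weights each season by playing time.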

And the pitching data:

            name   , age,   g,     ip,  w,   l,  sv,    wp,    lp,    wf,   era,    k9,   bb9,   hr9
Brett Anderson (P) ,  22,  30, 175.33, 11,  11,   0,  0.37,  0.37,  0.00,  4.06,  7.70,  2.36,  1.03
Andrew Bailey (P)  ,  26,  68,  83.33,  6,   3,  26,  0.09,  0.04,  0.04,  1.84,  9.83,  2.92,  0.54
Jerry Blevins (P)  ,  27,  56,  60.00,  1,   3,   0,  0.02,  0.05, -0.04,  3.75,  8.70,  3.30,  0.60
...

Here I’ve calculated innings pitched (ip), winning percentage (wp), losing percentage (lp), win frequency (wf), earned run average (era), strikeouts per nine innings (k9), walks per nine (bb9), and home runs given up per nine innings (hr9). All these stats are for the last two Major League seasons.
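The pitching columns can be sketched the same way. This is an illustration under assumptions rather than the original script: the function and argument names are hypothetical, innings are assumed to come from a count of outs recorded (Lahman-style tables store this as IPouts), and wp, lp and wf are assumed to be wins per game, losses per game and their difference. That last assumption is at least consistent with the sample rows, where 11 wins in 30 games shows up as 0.37.

```python
def pitching_rates(g, w, l, ipouts, er, so, bb, hr):
    """Convert pitching counting stats into the rate stats in the table.

    Assumptions (inferred, not confirmed by the post): wp = w / g,
    lp = l / g, wf = wp - lp, and ip = ipouts / 3.
    """
    if g <= 0 or ipouts <= 0:
        raise ValueError("need at least one game and one out recorded")

    ip = ipouts / 3.0

    def per_nine(count):
        # Scale a counting stat to a nine-inning game
        return 9.0 * count / ip

    wp = w / g
    lp = l / g
    return {
        "ip": ip,
        "wp": wp,
        "lp": lp,
        "wf": wp - lp,
        "era": per_nine(er),
        "k9": per_nine(so),
        "bb9": per_nine(bb),
        "hr9": per_nine(hr),
    }


if __name__ == "__main__":
    # Hypothetical pitcher, not a row from the roster
    line = pitching_rates(g=30, w=12, l=9, ipouts=540, er=80, so=160, bb=50, hr=20)
    print(", ".join(f"{name}={value:.2f}" for name, value in line.items()))
```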

Finally, generate the heat maps in R. For batting statistics:

library(ggplot2)
library(reshape2)  # melt()
library(plyr)      # ddply()
library(scales)    # rescale()
# Read the CSV and order players by OPS so the best hitters sort to the top
mlb <- read.csv('batting.csv')
mlb$name <- with(mlb, reorder(name, ops))
# Reshape to long format and rescale each statistic to the 0-1 range
mlb.m <- melt(mlb)
mlb.m <- ddply(mlb.m, .(variable), transform, rescale = rescale(value))
# One tile per player and statistic, coloured by the rescaled value
(p <- ggplot(mlb.m, aes(variable, name)) +
    geom_tile(aes(fill = rescale), colour = "white") +
    scale_fill_gradient(low = "gold", high = "darkgreen"))
base_size <- 14
# Strip the legend and tick marks, and tidy up the axis labels
p + theme_grey(base_size = base_size) + labs(x = "", y = "") +
    scale_x_discrete(expand = c(0, 0)) + scale_y_discrete(expand = c(0, 0)) +
    theme(legend.position = "none", axis.ticks = element_blank(),
          axis.text.x = element_text(size = base_size * 0.8, angle = 0, hjust = 0.5, colour = "black"),
          axis.text.y = element_text(size = base_size * 0.8, lineheight = 0.9, colour = "black", hjust = 1))

Pitching statistics are the same, except that the line ordering the data frame is:

mlb$name <- with(mlb, reorder(name, 1/(era+0.1)))
    

The results:

A’s batting heatmap, ordered by OPS

A’s pitching heatmap, ordered by ERA

You have to keep the number of games (or innings pitched for pitchers) in mind when you look at these charts. I don’t even know who some of those guys are, probably because they’ve only barely played in the majors. It might make some sense to split the pitching plot into plots for starters and relievers, but I’d need a good way to determine a pitcher’s status (innings pitched divided by games beyond some threshold, perhaps?).
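The innings-per-game idea can be sketched in a few lines of Python. The 3.0 threshold and the function name are assumptions chosen for illustration, not something the post settled on; the three pitchers are the sample rows from the pitching table above.

```python
def pitcher_role(ip, g, threshold=3.0):
    """Label a pitcher by average innings per appearance.

    The threshold is a hypothetical cut-off; swingmen and spot
    starters will land on either side depending on where it is set.
    """
    if g <= 0:
        raise ValueError("games must be positive")
    return "starter" if ip / g >= threshold else "reliever"


if __name__ == "__main__":
    # (name, innings pitched, games) from the sample pitching rows
    sample = [
        ("Brett Anderson", 175.33, 30),
        ("Andrew Bailey", 83.33, 68),
        ("Jerry Blevins", 60.00, 56),
    ]
    for name, ip, g in sample:
        print(f"{name}: {ip / g:.2f} IP per game, {pitcher_role(ip, g)}")
```

With this cut-off Anderson is labelled a starter and Bailey and Blevins relievers. Games started, if the database has that column, would be a more direct signal than innings per game.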

As for the A’s, I like their pitching, but have serious doubts about their offense. I sure hope some of the younger guys on this chart start reaching their power potential because having Jack Cust as your only offensive weapon doesn’t bode well for the team scoring runs.

cswingle @ 14:40:21 -0800

Sun, 05 Oct 2008

Baseball fuzz

oct 5, 2008; red sox v. angels

Major League Baseball is doing its best to ruin my baseball experience this season. I finally gave up on Gameday Audio after all my struggles trying to get it to play smoothly on my MacBook Pro (Windows Media format might work well on Windows, but it frigging sucks everywhere else). Once the piteous spectacle of the NFL started, my local AM radio station stopped broadcasting Sunday Night Baseball (and all the baseball playoff games this Sunday), which was the only opportunity I had to listen to games. And the first round of the playoffs and half of the second round aren’t on regular television anymore (they’re on TBS).

Turns out that TBS is broadcast over the air on channel 28, but as you can see from the image, it doesn’t come in very well at our house. Despite all that, though, I’m watching and scoring the game. MLB hasn’t completely lost me yet.

cswingle @ 17:23:37 -0800

Sat, 04 Oct 2008

General Pencils

Oct 3, 2008; Red Sox v. Anaheim

oct 3, 2008; red sox v. angels

Yesterday I got three dozen pencils from General Pencil Company, one of the few remaining pencil manufacturers that still make their pencils in the United States. My favorite pencils had been Dixon Ticonderogas, but Dixon has moved all its production to foreign countries, including China. Most people probably don’t think much about pencils, but there’s a big difference between a good pencil and a bad one. The crap they sell at office superstores has uneven graphite, poorly centered lead, small erasers and thin paint, is commonly made in China, and probably doesn’t use sustainably produced wood for the case. Mechanical pencils stay sharp and are refillable, but they just don’t feel as good as a wooden-cased pencil, and I think the environmental impact of a sustainably produced wooden pencil is lower than all the plastic and packaging of mechanical pencils and their supplies.

I got three dozen “Semi-Hex,” #2/HB pencils (number 492-2/HB). I’d never seen a General’s pencil in the store, and never (as far as I know) used one, so this was an experiment to see if I’ve found an American-made replacement for the Ticonderoga. Since it’s baseball playoff time, I tested them out by scoring yesterday’s playoff game between the Boston Red Sox and the California Angels (or whatever they’re calling themselves this year). In this year’s playoffs I’m rooting for the Cubs and Phillies in the National League and the Rays in the American League. But I’m primarily an A’s fan, so a loss by the Angels is always a win for this A’s fan.

The game was a good one (especially since the Angels lost), with a reasonable amount of scoring and a very exciting ending. And the pencil was fantastic. The lead is very even, with none of the little hard bits you’ll find in a poorly made pencil; it makes a nice dark line and isn’t so soft that it smudges easily. With the Ticonderoga, I’m torn between the #2 and #2/HB because the HB is just a touch too soft, and the #2 is too hard and doesn’t write well outside because the paper gets softer when it’s humid. The General #2/HB seems slightly harder than the Ticonderoga HB, so I was able to make it through the whole game on one sharpening. I think I’ve found a winner.

Hopefully when I’ve used all three dozen, General will still be making pencils in the U.S.

Note that if you’re interested in learning to score, I’ve got a reasonably complete Guide to Scoring Baseball, and a series of free scorecards you can download and print.

cswingle @ 11:19:41 -0800

Sat, 12 Apr 2008

Living room panorama

living room from the couch

I’m sitting here on the couch watching baseball (Yankees v. Red Sox again) and admiring the view out all our large windows. It was supposed to be cloudy today, but thus far it’s been clear and sunny. Makes me feel a bit guilty to be sitting here.

The panoramic image from where I’m sitting was stitched together using hugin. Despite making no effort to control the exposure on my little point-and-shoot camera and a pretty casual shooting technique, hugin really made it easy. You load the images into the program, select control points between adjacent photos, and it warps and manipulates the images so they fit together. If you click on the image to view the full size version, you can see some of the blurring and idiosyncrasies, but for very little effort, I think the results are quite impressive.

From left to right, you can see the front door and east window which looks out over the deck and the Creek. On the south wall is a bookshelf in the corner, the kitchen table and large south facing window overlooking the dog yard, DVD cabinet, TV and stereo, and the sliding glass doors that lead out to the deck. Piper is sitting in front of the door looking outside. The west wall has a second bookshelf, a side table (which is blocked by my laptop next to me), another large window overlooking Dog Island and the slough. To the right of the window is our heater and the baby gate that blocks off the stairs. The corner of the blue wall behind me shows up on the right of the image.

Might have to give this tool a try outside…

cswingle @ 13:14:39 -0800

Wed, 25 Apr 2007

Why Sports?

summer of ’49

The following comes from The Millions blog about David Halberstam’s passing (two of his best-known works are The Best and the Brightest, about the war in Vietnam, and Summer of ’49, about the 1949 pennant race between the Red Sox and Yankees). I think it’s a great commentary on why people watch and enjoy sports.

There is something to the notion of sports as a balm for citizens suffering from war fatigue. They are soldiers abroad gathered in a tent in the desert somewhere to watch the Super Bowl on television, and they are children bypassing front page headlines that scream death and destruction in favor of the sports section and the box scores of games that they were forbidden to watch because of woefully premature bed times. Sporting events bring people together in celebration of achievement, rather than in protest of failure, and are thus both a distraction from the duty of citizens as witnesses to history, no matter how grim, and at the same time real and not insignificant demonstrations of the values of a free society, complete with overpriced cotton candy, and (today) overpriced athletes. Athletic competition, so often couched in terms of battle when described, transcends violence. It is an elevated and, I would argue, rather sophisticated form of human interaction.

cswingle @ 5:39:20 -0800

Mon, 23 Apr 2007

Four in a row

four homers

four consecutive homers

When I saw that the game on Sunday Night Baseball was yet another Yankees and Red Sox matchup, I complained to Andrea about how often the major media outlets show this particular matchup. It was the game on Fox Saturday Baseball this week, and guess what? It’s the game on Fox Saturday Baseball next week too. Haven’t we all seen enough Derek Jeter?* It seems like three quarters of the games I can watch on TV or listen to on the radio are Yankees / Red Sox, Cubs / Cardinals or Giants / Dodgers games. What about everyone else? It might be fun to see all the young talent in Tampa or Miami, or see a game televised from the new ballpark in Pittsburgh. I’m a Giants and A’s fan, but I still like to see the rest of the league play once in a while.

Baseball is baseball, though, so I grudgingly listened to yesterday’s game. And what a game it turned out to be! One of the great things about baseball between any two teams is that there’s always something interesting going on, and in last night’s game the Red Sox hit four consecutive home runs off the same pitcher. That’s only the fifth time in Major League history that a team has hit four in a row (last year’s Dodgers did it in a late inning comeback), and only the second time in history it’s been done against the same pitcher.

And even better than that, it was a tight game featuring the Japanese phenomenon Dice-K Matsuzaka (he didn’t pitch very well, but got the win), and ended in the top of the ninth with the go-ahead run at the plate in the form of Alex Rodriguez striking out to Jonathan Papelbon.

*Yes!

cswingle @ 6:44:23 -0800

Sun, 01 Apr 2007

Opening Day 2007

opening day 2007

opening day, mets scoring

In Alaska, winter seems to turn to spring very quickly. That’s especially true this year because we’ve had more than six weeks of well below normal temperatures. Suddenly this weekend, it’s above freezing, the snow on the roof is starting to melt into the gutters, and the deck is dry for the first time since the baseball season ended last October.

Right on the heels of the warm spell is the start of the baseball season. We’ve been so busy this winter with dog mushing that I haven’t actually missed baseball that much, but listening to tonight’s game between the Mets and Cardinals at New Busch Stadium brought the game, its intricacies, and the excitement back. The game wasn’t a nail biter, with the Mets scoring two unanswered runs in both the third and fourth innings, but there was plenty of defensive excitement to go around. Some spectacular double plays, a perfect strike from center field to nail David Eckstein at home plate, and a great pitching performance from Tom Glavine were a great way to start off the 2007 season.

Tomorrow the A’s start their season in Seattle without Barry Zito, but hopefully with an improved offense and some better luck keeping players healthy. A full season of Rich Harden and former Alaska Goldpanner Bobby Crosby should go a long way to another AL West title.

cswingle @ 18:19:44 -0800

Sun, 26 Feb 2006

There’s no such thing as a doubles hitter

A couple of days ago, in an article about prospect analysis in baseball (subscription required), Nate Silver produced a cool table showing the year-to-year correlations of the six major batting events. This morning, while I wait for my dough to rise, I decided to replicate this analysis with my newfound baseball hack ability.

You can download the R program code for the analysis by clicking on the link.

Here’s the result, showing the 2004 to 2005 correlations for rate-adjusted batting statistics for all players with more than 250 at-bats in both seasons:

Hits / PA           0.422
Singles / PA        0.663
Doubles / PA        0.369
Triples / PA        0.501
Home Runs / PA      0.702
Walks / PA          0.718
Strikeouts / PA     0.813
Plate appearances   0.405
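The calculation behind a table like this can be sketched in Python; the original analysis was done in R, so this is only an illustration. The function names and the layout of the season dictionaries are hypothetical, the numbers in the demo are made up, and only the 250 at-bat cut-off comes from the text above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (2 or more)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year(season_a, season_b, stat, min_ab=250):
    """Correlate a per-PA rate across two seasons.

    season_a and season_b map a player id to a dict holding 'ab', 'pa'
    and counting stats (a hypothetical layout). Only players with more
    than min_ab at-bats in both seasons are compared.
    """
    players = [
        p for p in season_a
        if p in season_b
        and season_a[p]["ab"] > min_ab
        and season_b[p]["ab"] > min_ab
    ]
    rates_a = [season_a[p][stat] / season_a[p]["pa"] for p in players]
    rates_b = [season_b[p][stat] / season_b[p]["pa"] for p in players]
    return pearson(rates_a, rates_b)


if __name__ == "__main__":
    # Made-up players and numbers, for illustration only
    y2004 = {
        "a": {"ab": 500, "pa": 550, "hr": 11},
        "b": {"ab": 400, "pa": 440, "hr": 22},
        "c": {"ab": 300, "pa": 330, "hr": 30},
        "d": {"ab": 100, "pa": 110, "hr": 9},  # dropped: too few at-bats
    }
    y2005 = {
        "a": {"ab": 510, "pa": 560, "hr": 15},
        "b": {"ab": 420, "pa": 470, "hr": 20},
        "c": {"ab": 350, "pa": 390, "hr": 31},
        "d": {"ab": 120, "pa": 130, "hr": 2},
    }
    print(f"HR/PA year-to-year r = {year_to_year(y2004, y2005, 'hr'):.3f}")
```

Dividing by plate appearances before correlating is what keeps playing time from leaking into every row; plate appearances then get their own row at the bottom.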

What Silver was trying to show by presenting his table (which included all year to year correlations since World War II) is that “there’s really no such thing as a doubles hitter.”

You can see from the table that doubles have the weakest year-to-year relationship of any batting event: the rate at which a hitter hit doubles in 2004 says relatively little about his rate in 2005. But a home run hitter in one season is likely to hit them at a similar rate in the next season. Also note that strikeout and walk rates are each very highly correlated from one year to the next. So, 2004 and 2005 strikeout leader Adam Dunn is likely to strike out more than 150 times in 2006. Thankfully for Reds fans, he’ll probably also hit more than 40 home runs.

The last number is also interesting. Plate appearances aren’t strongly correlated from one season to the next. This is probably a combination of older players breaking down between 2004 and 2005, and younger players stepping in to take their place at the plate.

cswingle @ 9:57:43 -0800

Sat, 25 Feb 2006

Buying wins in Major League Baseball

Yesterday I discovered (and ordered!) a new book from O’Reilly called Baseball Hacks by Joseph Adler. I’ve got a bookshelf full of O’Reilly books on other computer subjects, so I’m very excited to see this. On the web site for the book, there are a couple of example hacks from the book.

Last year I spent some time getting the Lahman database into MySQL so I could fool around with some advanced baseball statistics. The Lahman database is a Microsoft Access database, and doesn’t allow re-distribution, so for an open-source advocate like me, this isn’t exactly the best source for baseball information. It took me a few days to get it all into MySQL successfully, and any of my improvements couldn’t be distributed.

Well, from reading the sample hacks, I discovered there’s a less restrictive database that’s also available for MySQL (a free database server). In addition, the author of Baseball Hacks shows how to connect a MySQL database with the fantastic statistical package R. R is also free, and is incredibly powerful. I also found a previous article by the same author. Some of what appears below is based on that article.

Anyway, I can’t wait to get the book to see what’s in it, but in the meantime I did a very simple analysis comparing payroll to wins for the 2005 season. Team payrolls that year ranged from a low of $29.7 million for the Tampa Bay Devil Rays to the Yankees’ astronomical $208.3 million. The second place team, the Red Sox, spent only $123.5 million on player payroll in 2005. What does all that money buy? I’m sure the owners hope it’ll buy them enough wins to make it to the playoffs, and hopefully win the World Series. The White Sox, winners in 2005, were 13th in payroll at $75.2 million.

It turns out that payroll doesn’t account for much of whether a team wins or loses. It explained only 24% of the variation in wins in 2005. For comparison, a team’s hits and earned run average together explain 72% of the variation in wins. Obviously, getting lots of hits and keeping your opponent from scoring runs will contribute to winning a lot of games.

But what I want to see is whether a team did better than expected based on their player spending. The Yankees didn’t wind up with the best record in baseball, despite spending more than twice as much as every other team in baseball except the Red Sox. How badly did they under-perform?

Not that badly, actually. The plot below shows the relationship between payroll and wins for 2005. The straight line is the regression line showing the best linear fit to the data. The team letters on the plot show how they actually performed. Teams that show up above the line played better than their payrolls would have predicted. Those below did worse.

Payroll v. Wins, 2005

For example, look how far the Chicago White Sox (CHA) are from the regression line. The Cardinals also wind up well above what we would expect based solely on their salaries (and that’s with Scott Rolen on the DL the whole season!). Also check out the Cleveland Indians. They’re a team that has a lot of very good younger players who aren’t eligible for arbitration yet, but have loads of talent.

You can see the Yankees over on the right, far from all the other teams. Based on their payroll, they should have won 102 games in 2005, but only managed 95. The Kansas City Royals were much worse, only managing to win 56 games when their player salaries predicted 75 wins. It’s easy to explain why teams like the Dodgers or Giants didn’t do well in 2005 (their highly paid players were injured for most of the year), but something else must be going on with Seattle and Kansas City.
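The expected-win figures in this kind of comparison come from a straight-line fit of wins on payroll, and the gap between actual and expected wins is the residual. Here is a minimal Python sketch of that arithmetic, assuming ordinary least squares with a single predictor; the post’s own analysis was done in R, and the function names and the payroll and win numbers below are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (2 or more)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Actual minus predicted: positive means above the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals(xs, ys, slope, intercept))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical payrolls (millions of dollars) and win totals
    payroll = [30.0, 60.0, 90.0, 120.0, 200.0]
    wins = [67, 88, 79, 95, 95]
    slope, intercept = fit_line(payroll, wins)
    print(f"wins = {intercept:.1f} + {slope:.3f} * payroll")
    print(f"R^2 = {r_squared(payroll, wins, slope, intercept):.2f}")
    for p, w, r in zip(payroll, wins, residuals(payroll, wins, slope, intercept)):
        print(f"payroll {p:6.1f}: {w} wins, {r:+.1f} vs. expected")
```

A positive residual corresponds to a team plotted above the regression line, and the R² value is the "share of variation explained" figure quoted earlier in the post.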

What does all this tell us about baseball? Well, I’d argue that this metric (payroll vs. wins) tells us something about how effective the front office of a team is. Smart general managers will pick up talent that is undervalued by the market, buying more wins than they’re paying for. Also, teams with a good farm system can “grow their own” talent, rather than having to buy it on the free market. Teams like Cleveland and Oakland are good examples of this. The excesses of George Steinbrenner should have been enough to buy a World Series championship, but the Yankee front office overpaid for all their veteran talent, and in 2005, they didn’t live up to their high salaries.

If you want to see the R code I used to generate the plot, you can download it from the link.

cswingle @ 20:34:26 -0800