Baseball Time Series

I recently came across the MLB interactive standings chart, a pretty cool visualization of team performance over time. I think time-series analysis is relatively undervalued in baseball: we generally look at a team’s performance to date, but rarely look in any clear way at the ups and downs of the season as it progresses. I like this chart a lot: it shows relative performance over time, and you can quickly learn a lot about the season each of these teams has had, both individually and in a comparative sense.

As far as I can tell, MLB doesn’t have one built for this season, so I thought I’d put one together (above). I’ll link to the code below, but the high-level workflow for putting this chart together is as follows:

  1. Scrape daily win/loss data from Baseball Reference. BR allows you to navigate to a specific day and see all of the standings information across the league (for instance, July 30, 2016). Each day is on a different page, but the URLs are identical apart from the day and month pieces, so generating a list of all URLs for the season to date wasn’t too difficult. With that list in hand, I put an rvest call inside a loop that scrapes the team, wins, and losses from every page.

  2. Clean up the resulting data. After the scraping loop runs over all of the day pages, it is just a matter of selecting the AL East teams, cleaning up the date variable a little bit, and generating the ‘games above/below .500’ variable (wins - losses).

  3. Build the chart. I created it using the highcharter package, a fantastic R wrapper for the Highcharts JavaScript charting library. I haven’t figured out a way to pull in custom series icons for each team, but maybe I’ll have that done by the time I need to update this for the end of the regular season. I used the Team Colors site to get the right color for each team.
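
To make step 1 a little more concrete, here is a minimal sketch of the URL-generation idea. It is written in Python purely for illustration and is not the R code behind the chart. The `BASE_URL` pattern, the `daily_standings_urls` helper, and the start date are my assumptions; check the real URL format on the site before relying on it.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# ASSUMPTION: illustrative URL pattern for a daily standings page.
# Verify the real format on Baseball Reference before scraping.
BASE_URL = "https://www.baseball-reference.com/games/standings.cgi"


def daily_standings_urls(start, end):
    """Return one standings URL per calendar day from start to end, inclusive."""
    urls = []
    day = start
    while day <= end:
        # Only the date pieces of the query string change from page to page.
        query = urlencode({"year": day.year, "month": day.month, "day": day.day})
        urls.append(f"{BASE_URL}?{query}")
        day += timedelta(days=1)
    return urls


# ASSUMPTION: example date range, chosen only to show the call.
urls = daily_standings_urls(date(2016, 4, 3), date(2016, 7, 30))
print(len(urls))
print(urls[-1])
```

Each URL in the list would then be handed to the scraping loop, which pulls the team, wins, and losses from that page.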

The resulting chart shows how far above or below .500 each team is on a day-to-day basis. You can select or de-select each series using the legend at the bottom of the chart to de-clutter it.
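
As a rough illustration of the clean-up step, the sketch below filters scraped rows down to the AL East and computes games above/below .500 (wins minus losses) for each day. It is Python rather than the R used for the chart, the `games_above_500` helper and the team abbreviations are my own naming, and the rows are made-up placeholder values, not real standings.

```python
# ASSUMPTION: team abbreviations as they might appear in the scraped data.
AL_EAST = {"BAL", "BOS", "NYY", "TBR", "TOR"}

# Made-up placeholder rows: (date, team, wins, losses). Not real standings.
rows = [
    ("2016-07-30", "TOR", 58, 47),
    ("2016-07-29", "TOR", 57, 47),
    ("2016-07-29", "BOS", 55, 46),
    ("2016-07-29", "CHC", 60, 40),  # not AL East, so it is dropped
]


def games_above_500(rows, teams):
    """Map each selected team to a date-sorted list of (date, wins - losses)."""
    series = {}
    for day, team, wins, losses in rows:
        if team in teams:
            series.setdefault(team, []).append((day, wins - losses))
    for points in series.values():
        points.sort()  # ISO dates sort chronologically as strings
    return series


series = games_above_500(rows, AL_EAST)
for team, points in sorted(series.items()):
    print(team, points)
```

Each team’s list of points is the kind of series a line chart needs: one value per day, positive when the team is above .500 and negative when it is below.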

You can get the R code to produce the chart above here.

Game-to-game progression isn’t just interesting at a team level: we can also look at how players are performing as the season goes on. I decided to take a look at how a few of the Jays have been playing this season. The statistic I chose to look at is OPS (on-base plus slugging), the sum of a player’s on-base percentage (how often a player reaches base) and their slugging percentage (total bases over at bats, a measure of hitting power). OPS isn’t perfect, but it is a better representation of a player’s offensive value than their batting average, and I wasn’t able to find game-to-game data for more advanced statistics like wOBA, wRC+, or WAR. I think a full-season time-series look at cumulative WAR build-up would be very cool to see, so I’ll keep hunting for that data. In the meantime, the chart below shows cumulative OPS for some of the Jays, starting about a month into the season to avoid capturing the noise generated by the small sample sizes at the beginning of play. Keep in mind that according to Fangraphs/Wikipedia, 1.000 is excellent, 0.900 is great, 0.800 is above average, and 0.710 is average for the league.
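
For anyone who wants to see the arithmetic, here is a small sketch of how cumulative OPS can be built up from per-game batting lines. It uses the standard definitions, OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = TB / AB, computed on season-to-date totals after each game. It is Python for illustration only, not the R code behind the charts; the `GameLine` record, the `cumulative_ops` helper, and the `skip_first` parameter are hypothetical names, and the two sample games are made up.

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One player's batting line for a single game (hypothetical record)."""
    ab: int   # at bats
    h: int    # hits
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies
    tb: int   # total bases


def cumulative_ops(games, skip_first=0):
    """Return season-to-date OPS after each game, dropping the first skip_first values."""
    ab = h = bb = hbp = sf = tb = 0
    values = []
    for g in games:
        # Accumulate season totals, then recompute both rates from the totals.
        ab += g.ab
        h += g.h
        bb += g.bb
        hbp += g.hbp
        sf += g.sf
        tb += g.tb
        obp_denominator = ab + bb + hbp + sf
        obp = (h + bb + hbp) / obp_denominator if obp_denominator else 0.0
        slg = tb / ab if ab else 0.0
        values.append(obp + slg)
    # Dropping the early values hides the small-sample noise, but the totals
    # still include every game from the start of the season.
    return values[skip_first:]


# Made-up sample: two singles in game one, an 0-for-4 with a walk in game two.
games = [
    GameLine(ab=4, h=2, bb=0, hbp=0, sf=0, tb=2),
    GameLine(ab=4, h=0, bb=1, hbp=0, sf=0, tb=0),
]
ops = cumulative_ops(games)
print([round(value, 3) for value in ops])
```

Note that the rates are recomputed from running totals rather than averaged game by game, so a single big game moves the line less and less as the season goes on.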

I think the best way to look at this is by de-selecting all but one player and looking at the trend, but you can play with it to look at relative performance as well. Donaldson, Encarnacion, and Saunders are having fantastic seasons so far, and Josh and Edwin have been playing especially well of late. Josh’s OPS is so good that I thought it would be interesting to see how he is doing relative to some of the top players in MLB this season. I identified this select group of players by taking the top 5 in terms of Fangraphs WAR (wins above replacement).

In terms of OPS, Donaldson, Trout, and Altuve are clearly playing fantastic offensive baseball. Their season trajectories, however, are quite different. Trout recovered quickly from a slow offensive start and has consistently played at a league-leading level since early May. Altuve had a stellar start to the season and has regressed slightly to being “only” one of the best in the game. Donaldson had a bit of a slump in May (keep in mind, he was still playing well above the league average) and has been on fire since then. It might be a little early to call it, but the AL MVP looks like it will have to go to one of these three players.

UPDATE: I re-ran all of the charts above now that the regular season is over. It’s interesting to see how stable the Jays’ OPS trends got. In terms of MVP-calibre players, the usual suspects are still at the top of the league. By the end of the regular season, Trout still dominates in terms of OPS and WAR, accumulating a ridiculous 9.4 WAR, which is a whole win above the next best player, Kris Bryant (8.4). Betts (7.7 WAR) and Donaldson (7.6) are still near the top in the 3 and 4 positions respectively; this despite the semi-slump Donaldson went through after his best play in July.

You can get the R code to produce the player charts above here.