Analyzing Toronto’s Albacore Friday Night Race

I like sailing, and I like doing data analysis with R. It isn’t often I can bring those two things together! Don’t take this too seriously, it’s just a bit of fun, procrastination, and me noodling around on my computer for a bit. Enjoy, and thanks to all the sailors for a great summer!

Prelims

Load packages and set preliminary functions:

library(tidyverse)
library(rvest)
library(ggthemes)
library(lubridate)
library(ggrepel)
library(ggridges)

setwd("~/Dropbox/Code/Sailing/")

theme <- theme_classic() +
  theme(legend.position="bottom",
        axis.ticks = element_blank(),
        legend.title = element_blank())
theme_set(theme)

Import data from albacore.ca:

raw <- read_html("https://albacore.ca/race_results/2019_results_FNR_Helm_club-owned-boats.html") %>% 
  html_table() %>% 
  as.data.frame()

Data Cleansing

Clean the data into a useable dataframe by peforming the following steps:

remove unused columns
pivoting data into long format (with Weeks in one column instead of several)
adjusting the Week variable into a date that is machine-readable
adjusting the Score variable into a quantiative value that we can use

df <- raw %>%
  select(-Rank, -Total, -Drop, -Club.Rank, -Avg) %>% 
  gather(Week, Score, -c(1:2)) %>%
  mutate(Week = sub('X', '', Week)) %>%
  mutate(Week = mdy(paste(Week, "2019", sep = "."))) %>% 
  mutate(Score = case_when(grepl("\\(", Score) ~ "drop",
                           grepl("DNC", Score) ~ "1000",
                           grepl("OCS", Score) ~ sub(" OCS", "", Score),
                           grepl("DNF", Score) ~ sub(" DNF", "", Score),
                           TRUE ~ Score)) %>%
  mutate(Score = as.numeric(Score))

Helm Stats

The first thing we can do is take a look at some basic summary statistics. In the table below, helms are ranked from 1:n based on their final position after drops. I’ve only included helms who don’t have a 1000 DNC being counted in their final score. The average score represents the average of retained races (all score after drops). It’s neat to see that with one exception, the worst retained score amongst the top 10 sailors is a 14, indicating a high degree of consistency at the top of the fleet. The exception is Allan’s OCS 35; without it, he would have been second or third in the overall rankings.

df %>% 
  group_by(Name, Club) %>% 
  summarize(`Total Points` = sum(Score, na.rm = T),
            Best = min(Score, na.rm = T),
            Average = round(mean(Score, na.rm = T), 1),
            Worst = max(Score, na.rm = T)) %>%
  ungroup() %>% 
  arrange(`Total Points`) %>%
  filter(`Total Points` < 1000) %>% 
  mutate(`Final Rank` = row_number()) %>% 
  select(7, 1:6)

Helm Results

In the visual below, darker dots indicate more than one race with that score. Top 8 results only, and helms with a DNC in the top 8 are excluded. Helms are ranked from top to bottom based on cumulative score (top being the best or lowest scoring helm). There’s a pretty clear trend in the variability of results: the higher the overall finish, the more consistent the results. For an exception, see David O’Neil (TISC).

df %>% 
  group_by(Name, Club) %>% 
  mutate(Total_Points = sum(Score, na.rm = T)) %>% 
  filter(Total_Points < 1000) %>%
  na.omit() %>% 
  ggplot(aes(Score, reorder(Name, -Total_Points), color = Club)) +
  geom_vline(xintercept = c(1:5, 10, 20, 30), color = "grey90") +
  geom_point(alpha = .5) +
  geom_path(alpha = .3) +
  scale_x_continuous(breaks=c(1,5,10,20,30), position = "top") +
  labs(y = "", x = "") +
  scale_color_brewer(palette = "Set1")

Helm Trends

Retained results (the ones kept after drops) are of course only part of the story. Each sailor’s own sense of how the season went likely also includes the bad results they (thankfully) were able to drop, as well as the weeks they weren’t able to make it. The visual below shows the season results trend for the top 9 helms overall.¹ The 0’s represent races that were run, but that helm didn’t compete. The red scores were dropped from the final totals. Gaps indicate weeks were there was no race.

trenddf <- raw %>%
  select(-Rank, -Total, -Drop, -Club.Rank, -Avg) %>%
  gather(Week, Score, -c(1:2)) %>%
  mutate(Week = sub('X', '', Week)) %>%
  mutate(Week = paste(Week, "2019", sep = ".")) %>%
  mutate(Week = mdy(Week)) %>%
  mutate(Score2 = as.numeric(str_extract(Score, "(\\d)+"))) %>% 
  mutate(Score_Type = case_when(Score2 > 0 & grepl("\\(", Score) ~ "Drop",
                                Score2 == 1000 ~ "DNC",
                                TRUE ~ "Race")) %>% 
  mutate(Score_Type = if_else(Score2 == 1000 & Score_Type == "Drop", "DNC Drop", Score_Type)) %>% 
  mutate(Score = if_else(Score2 == 1000 & Score_Type == "DNC Drop", 0, Score2)) %>% 
  group_by(Name) %>% 
  mutate(Total = sum(Score[Score_Type %in% c("Race","DNC")])) %>% 
  ungroup() %>% 
  arrange(Week,Total) %>% 
  group_by(Week) %>% 
  mutate(Rank = row_number())

trenddf %>%
  filter(Rank %in% c(1:9)) %>% 
  ggplot(aes(Week, Score, label = Score)) +
  geom_text(aes(color = Score_Type), size = 2.5) +
  facet_wrap(~Name) +
  scale_color_manual(values = c("grey80", "red","forestgreen")) +
  theme_few() +
  labs(x = "", y = "") +
  ylim(-5, 40) +
  theme(legend.title = element_blank(),
        legend.position = "none", 
        axis.ticks = element_blank(),
        axis.text = element_blank())

There are some interesting trends in here. It looks like JTown racers got better at the tail end of the season, with Lianna, Ken, and Tom all having a pretty strong back half. George bookended his stellar season (6 bullets!!) with his only two non-mugs races, thankfully both dropped. Paul made the most FNR appearances of any racer in the top 9 – he was able to drop 6 races, and missed only one race all year long. Interestingly, all of the sailors above benefitted from a drop. Allan finished 10th, the highest ranked sailor without a drop (you’ll see his plot in the one below); if he had one more race he could have dropped the OCS 35, and would have finished second or third in the overall standings.

I like the trend facets above — let’s see some more people (ranked 10-18).

trenddf %>%
  filter(Rank %in% c(10:18)) %>% 
  ggplot(aes(Week, Score, label = Score)) +
  geom_text(aes(color = Score_Type), size = 2.5) +
  facet_wrap(~Name) +
  scale_color_manual(values = c("grey80", "red","forestgreen")) +
  theme_few() +
  labs(x = "", y = "") +
  ylim(-5, 50) +
  theme(legend.title = element_blank(),
        legend.position = "none", 
        axis.ticks = element_blank(),
        axis.text = element_blank())

And from 19 to the lowest ranked helm without a retained DNC:

trenddf %>%
  filter(Total < 1000) %>% 
  filter(Rank > 18) %>% 
  ggplot(aes(Week, Score, label = Score)) +
  geom_text(aes(color = Score_Type), size = 2.5) +
  facet_wrap(~Name) +
  scale_color_manual(values = c("grey80", "red","forestgreen")) +
  theme_few() +
  labs(x = "", y = "") +
  ylim(-5, 40) +
  theme(legend.title = element_blank(),
        legend.position = "none", 
        axis.ticks = element_blank(),
        axis.text = element_blank())

Club Helm Stats

Two different ways of visualizing the distribution of results by club. On the left, boxplots, showing the variation in individual helm scores by club as well as summary statistics encoded in the boxes. On the right, a more traditional histogram view.

df %>%
  filter(Score < 1000) %>%
  group_by(Club) %>% 
  mutate(Club_Total = median(Score, na.rm = T)) %>% 
  ggplot(aes(reorder(Club, -Club_Total), Score, color = Club)) +
  geom_boxplot(aes(colour = Club)) +
  geom_jitter(width = 0.05, alpha = .5) +
  coord_flip() +
  scale_color_brewer(palette = "Set1") +
  labs(x = "", y = "Race Result") +
  theme(legend.position = "none")

df %>%
  filter(Score < 1000) %>%
  group_by(Club) %>% 
  mutate(Club_Total = median(Score, na.rm = T)) %>% 
  ggplot(aes(x = Score, y = reorder(Club, -Club_Total), fill = Club)) +
  geom_density_ridges(stat = "binline",scale = .9, binwidth = 1) +
  labs(x = "Race Result", y = "") +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set1")

Crew Stats

rawcrew <- read_html(
  "https://albacore.ca/race_results/2019_results_FNR_Crew_club-owned-boats.html") %>% 
  html_table() %>% 
  as.data.frame()

dfcrew <- rawcrew %>%
  select(-Rank, -Total, -Drop, -Club.Rank, -Avg) %>% 
  gather(Week, Score, -c(1:2)) %>%
  mutate(Week = sub('X', '', Week)) %>%
  mutate(Week = paste(Week, "2019", sep = ".")) %>% 
  mutate(Week = mdy(Week)) %>%
  mutate(Score = case_when(grepl("\\(", Score) ~ "drop",
                           grepl("DNC", Score) ~ "1000",
                           grepl("OCS", Score) ~ sub(" OCS", "", Score),
                           grepl("DNF", Score) ~ sub(" DNF", "", Score),
                           TRUE ~ Score)) %>%
  mutate(Score = as.numeric(Score))

dfcrew %>% 
  group_by(Name, Club) %>% 
  summarize(`Total Points` = sum(Score, na.rm = T),
            Best = min(Score, na.rm = T),
            Average = round(mean(Score, na.rm = T), 1),
            Worst = max(Score, na.rm = T)) %>%
  arrange(`Total Points`) %>%
  filter(`Total Points` < 1000)

dfcrew %>% 
  group_by(Name, Club) %>% 
  mutate(Total_Points = sum(Score, na.rm = T)) %>% 
  filter(Total_Points < 1000) %>%
  na.omit() %>% 
  ggplot(aes(Score, reorder(Name, -Total_Points), color = Club)) +
  geom_vline(xintercept = c(1:5, 10, 20, 30), color = "grey90") +
  geom_point(alpha = .5) +
  geom_path(alpha = .3) +
  scale_x_continuous(breaks=c(1,5,10,20,30), position = "top") +
  labs(y = "", x = "") +
  scale_color_brewer(palette = "Set1")

Crew Trends

trenddfcrew <- rawcrew %>%
  select(-Rank, -Total, -Drop, -Club.Rank, -Avg) %>%
  gather(Week, Score, -c(1:2)) %>%
  mutate(Week = sub('X', '', Week)) %>%
  mutate(Week = paste(Week, "2019", sep = ".")) %>%
  mutate(Week = mdy(Week)) %>%
  mutate(Score2 = as.numeric(str_extract(Score, "(\\d)+"))) %>% 
  mutate(Score_Type = case_when(Score2 > 0 & grepl("\\(", Score) ~ "Drop",
                                Score2 == 1000 ~ "DNC",
                                TRUE ~ "Race")) %>% 
  mutate(Score_Type = if_else(Score2 == 1000 & Score_Type == "Drop", "DNC Drop", Score_Type)) %>% 
  mutate(Score = if_else(Score2 == 1000 & Score_Type == "DNC Drop", 0, Score2)) %>% 
  group_by(Name) %>% 
  mutate(Total = sum(Score[Score_Type %in% c("Race","DNC")])) %>% 
  ungroup() %>% 
  arrange(Week,Total) %>% 
  group_by(Week) %>% 
  mutate(Rank = row_number())

trenddfcrew %>%
  filter(Rank %in% c(1:12)) %>% 
  ggplot(aes(Week, Score, label = Score)) +
  geom_text(aes(color = Score_Type), size = 2.5) +
  facet_wrap(~Name) +
  scale_color_manual(values = c("grey80", "red","forestgreen")) +
  theme_few() +
  labs(x = "", y = "") +
  ylim(-5, 40) +
  theme(legend.title = element_blank(),
        legend.position = "bottom", 
        axis.ticks = element_blank(),
        axis.text = element_blank())

Imputing Helm results

The final accounting for the FNR takes into account your top 8 races. In all of the analysis above, we’ve restricted our view to those helms (and crews) who have at least 8 completed races, so they don’t have a 1000 DNC counting in their final results. There are a bunch of sailors who just missed on a proper ranking by only having 7 completed races. One way of seeing how they ‘could have done’ is by replacing that one 1000 DNC with the mean result over their 7 completed races.

The visual below re-does the helm ranking above, but adds in the helms who had one 1000 DNC in their top 8 results. that 1000 DNC has been replaced with their mean score on the other 7 nights they raced. The helms who have now been added in are identified in red. As you can see, the only change in the top 10 is that Jeff Krause is now 7^th. Jeff, Maxim, Milutin and Colum all had at least one top 5 finish over the course of the summer, but don’t crack the rankings above because they had one too few races.

df %>%
  na.omit() %>% 
  group_by(Name, Club) %>% 
  mutate(Total_Points = sum(Score, na.rm = T)) %>%
  arrange(Total_Points, Name) %>% 
  filter(Total_Points < 2000) %>%
  mutate(Score = if_else(Score == 1000, NA_real_, Score),
         Mean_Score = mean(Score, na.rm = T),
         Score = if_else(is.na(Score), Mean_Score, Score)) %>% 
  select(Name, Club, Score, Total_Points) %>% 
  mutate(Modified_Tag = if_else(Total_Points >= 1000, "Modified", "Unmodified")) %>% 
  mutate(Modified_Total = sum(Score, na.rm = T)) %>% 
  ggplot(aes(Score, reorder(Name, -Modified_Total), color = Modified_Tag)) +
  geom_vline(xintercept = c(1:5, 10, 20, 30), color = "grey90") +
  geom_point(alpha = .5) +
  geom_path(alpha = .3) +
  scale_x_continuous(breaks=c(1,5,10,20,30), position = "top") +
  scale_color_manual(values = c("red", "black")) +
  labs(y = "", x = "") +
  theme(legend.position = "none")

Crews and Helms together

Helms and Crews don’t just perform in isolation, however: they impact each other. Intuitively, a crew sailing with a better helm will, all else equal, perform better than a crew sailing with a worse helm (and vice versa). One way of thinking about this as a research question might be: how much does crew quality influence helm results?

The tricky part about that is measuring quality. One way of doing it might be looking at the average of all previous races, and then generating a score based on that. So the first race of the season nobody would have a score. One potential problem here might be that if you have a shit draw that first race, then that will impact your score going forward. So maybe we can apply a discount factor to that first race.

Ok, first things first: can we calculate a “crew quality” metric on a sailor-by-sailor, race-by-race basis? The approach below takes the following steps to do this:

replaces 1000 DNCs with NAs
sorts and groups the data by sailor and date, so that future operations are done on a within-sailor basis
creates a lagged weekly Score variable using dplyr’s lag function, which basically gives the value of the previous week’s result. This is necessary because if we calculated a cumulative average without doing this (ie. on the Score variable) then the cumulative average in any given week would include that week’s result, which isn’t what we want.
creates a cumulative average variable, crew_quality, where “quality” is basically an average of the sailor’s previous results.

quality_crew <- rawcrew %>% 
  select(-Rank, -Total, -Drop, -Club.Rank, -Avg) %>%
  gather(Week, Score, -c(1:2)) %>%
  mutate(Week = sub('X', '', Week)) %>%
  mutate(Week = paste(Week, "2019", sep = ".")) %>%
  mutate(Week = mdy(Week)) %>%
  mutate(Score = as.numeric(str_extract(Score, "(\\d)+"))) %>% 
  mutate(Score = if_else(Score == 1000, NA_real_, Score)) %>% 
  arrange(Name, Week) %>% 
  filter(!is.na(Score)) %>% 
  group_by(Name) %>% 
  mutate(Prev_Score = lag(Score)) %>% 
  filter(!is.na(Prev_Score)) %>% 
  mutate(crew_quality = cummean(Prev_Score)) %>% 
  rename(crew_name = Name) %>% 
  select(-Club, -Prev_Score)

quality_crew %>% 
  filter(grepl("Bem", crew_name))

That actually seems to work well. Let’s look at Michal Bem’s results to see how this works in practice. Michal started the season with a 13, buf followed up with two consecutive weeks in the mugs, dropping (improving) his quality score from 13 to 6.3. While the 46th (DNF) hurts, Michal’s quality score at the end of the season is one of the lowest in the fleet.

The next step is to do the same for helms, and then join the two together. The join combines the crew and helm datasets by Week and Score (ie. if we know when the race happened, and we know how each helm and crew placed in that race, then we can tie them together):

quality_df <- raw %>% 
  select(-Rank, -Total, -Drop, -Club.Rank, -Avg) %>%
  gather(Week, Score, -c(1:2)) %>%
  mutate(Week = sub('X', '', Week)) %>%
  mutate(Week = paste(Week, "2019", sep = ".")) %>%
  mutate(Week = mdy(Week)) %>%
  mutate(Score = as.numeric(str_extract(Score, "(\\d)+"))) %>% 
  mutate(Score = if_else(Score == 1000, NA_real_, Score)) %>% 
  arrange(Name, Week) %>% 
  filter(!is.na(Score)) %>% 
  group_by(Name) %>% 
  mutate(Prev_Score = lag(Score)) %>% 
  filter(!is.na(Prev_Score)) %>% 
  mutate(helm_quality = cummean(Prev_Score)) %>% 
  rename(helm_name = Name) %>% 
  select(-Prev_Score) %>% 
  inner_join(., quality_crew)

## Joining, by = c("Week", "Score")

quality_df

Cool! So now what we have is a dataset where each row is a boat with both Helm and Crew in a given Friday Night Race. So using this: which helms got the best draws over the course of the summer?

quality_df %>% 
  group_by(helm_name) %>% 
  summarize(avg_crew_quality = mean(crew_quality),
            number_of_races = n()) %>% 
  arrange(avg_crew_quality)

Looks like Ann and Frank had the best draws… but you can’t read too much into that: it looks like Ann only had one race where we know the quality of her crew, and Frank only 2. Robbin had a pretty good draw over the course of the summer, it looks like. We can confirm:

quality_df %>% 
  filter(grepl("Robbin", helm_name))

Yep, those are pretty good draws. Now Robbin will say hey, you’re missing my 9th place finish in August, what gives? Unfortunately we don’t have a crew_quality value for that race because Robbin was sailing with Colin McAuley, and it was his first race, and so we don’t have any info about how good he is.

But how much does the quality of the crew matter anyways? We can take a look at that by plotting crew_quality against Score across the whole fleet. While we do that, let’s also take a look at the relationship between helm_quality and Score as well. So, just to clarify, in the plots below each dot is a specific sailor’s race: the x axis corresponds to the quality score of the crew (or helm, on the right) in that race, and the y axis corresponds to the result in that specific race.

quality_df %>% 
  ungroup() %>% 
  select(Club, Score, helm_quality, crew_quality) %>% 
  rename(`Helm Quality` = helm_quality,
         `Crew Quality` = crew_quality) %>% 
  gather(quality_type, value, -c(1:2)) %>% 
  ggplot(aes(value, Score)) +
  geom_point() +
  facet_wrap(~ quality_type, scales = "free") +
  geom_smooth(method = "lm") +
  labs(x = "Quality (Avg. of Previous Races)",
       y = "Race Result") +
  theme_grey()

It doesn’t look like there’s much of a relationship between Crew Quality and Results – or, at least, the relationship isn’t as obvious as it could be: drawing a good crew (on the left hand side of the x axis) doesn’t deliver good results with any predictability. There does, however, seem to be a strong relationship between Helm Quality and Race Results. On any given night, the helms who do well are the ones we would expect to do well given their previous races.

A model predicting Score by helm_quality and crew_quality confirms this: positive, strong, and statistically significant effect of helm_quality on Score, no similar finding for crew_quality. The major caveat here is that this is basically a toy model, and doesn’t explain much of the variance in Score. helm_quality is only part of the story: if we could, it would also be great to quantify the quality of a start (maybe by first mark rounding position?), wind conditions and direction, etc. At the end of the day most everyone, based on common sense, would agree that drawing a good crew is on balance a very good thing, but measuring the impact of that effect is tricky because it’s impact is likely a lot more subtle than, for instance, the impact of being a good helm.

summary(lm(Score ~ helm_quality + crew_quality, data = quality_df))

## 
## Call:
## lm(formula = Score ~ helm_quality + crew_quality, data = quality_df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.833  -7.512  -2.101   5.672  27.298 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.78763    1.85059   4.749 3.14e-06 ***
## helm_quality  0.62484    0.07674   8.143 9.62e-15 ***
## crew_quality -0.06434    0.09278  -0.693    0.489    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.22 on 309 degrees of freedom
## Multiple R-squared:  0.1769, Adjusted R-squared:  0.1716 
## F-statistic: 33.21 on 2 and 309 DF,  p-value: 8.602e-14

One last cut at this: it would be interesting to know whether this “finding” holds at each of the clubs, or whether it is driven by some clubs and not others:

quality_df %>% 
  ungroup() %>% 
  select(Club, Score, helm_quality, crew_quality) %>% 
  rename(`Helm Quality` = helm_quality,
         `Crew Quality` = crew_quality) %>% 
  gather(quality_type, value, -c(1:2)) %>% 
  ggplot(aes(value, Score)) +
  geom_point() +
  facet_grid(Club ~ quality_type, scales = "free") +
  geom_smooth(method = "lm") +
  labs(x = "Quality (Avg. of Previous Races)",
       y = "Race Result") +
  theme_grey()

It looks like it holds: on the Helm Quality side, every club shows a positive (and probably statistically significant) trend. None of the clubs show the same trend for Crew Quality: TISC actually shows a negative trend, though that might be an artifact of both data sparseness and poor crew identification.

Thanks for reading!

knitr::include_graphics("albacores.jpg")

Why 9? I like having symmetrical plots, and 10 facets in the plot would have made the row column dimensions uneven :).↩