thu, 13-sep-2018, 17:40

Introduction

A couple years ago I wrote a post about past Equinox Marathon weather. Since that post Andrea and I have run the relay twice, and I plan on running the full marathon in a couple days. This post updates the statistics and plots to include two more years of the race.

Methods

Methods and data are the same as in my previous post, except the daily data has been updated to include 2016 and 2017. The R code is available at the end of the previous post.

Results

Race day weather

Temperatures at the airport on race day ranged from 19.9 °F in 1972 to 68.0 °F in 1969, but the average range is between 34.1 and 53.1 °F. Using our model of Ester Dome temperatures, we get an average range of 29.5 to 47.3 °F and an overall min / max of 16.1 / 61.3 °F. Generally speaking, it will be below freezing on Ester Dome at some point on race day, but possibly only before most of the runners get up there.

Precipitation (rain, sleet or snow) has fallen on 16 out of 55 race days, or 29% of the time, and measurable snowfall has been recorded on four of those sixteen. The highest amount fell in 2014 with 0.36 inches of liquid precipitation (no snow was recorded and the temperatures were between 45 and 51 °F so it was almost certainly all rain, even on Ester Dome). More than a quarter of an inch of precipitation fell in three of the sixteen years when it rained or snowed (1990, 1993, and 2014), but most rainfall totals are much smaller.

Measurable snow fell at the airport in four years, or seven percent of the time: 4.1 inches in 1993, 2.1 inches in 1985, 1.2 inches in 1996, and 0.4 inches in 1992. But that’s at the airport station. In five of the 12 years when measurable precipitation fell at the airport but no snow was recorded, the modeled minimum temperatures on Ester Dome were below freezing. It’s likely that some of the precipitation recorded at the airport in those years was coming down as snow up on Ester Dome. If so, that means snow may have fallen on nine race days, bringing the percentage up to sixteen percent.
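The counting behind those percentages can be reproduced from the table at the end of this post. Here's a minimal R sketch using the sixteen race days with measurable precipitation. The data frame and column names (wet, ed_min, snow) are illustrative labels, not names from the original analysis, and 32 °F is assumed as the freezing threshold.

```r
# Race days with measurable precipitation at the airport, copied from the
# table at the end of the post (Ester Dome minimum in °F, snow in inches)
wet <- data.frame(
  year   = c(1974, 1975, 1977, 1981, 1982, 1985, 1986, 1990,
             1991, 1992, 1993, 1994, 1996, 2014, 2015, 2016),
  ed_min = c(42.5, 33.1, 31.3, 32.2, 37.0, 26.5, 31.3, 33.1,
             31.3, 20.1, 23.8, 22.8, 24.6, 39.7, 33.1, 29.4),
  snow   = c(0, 0, 0, 0, 0, 2.1, 0, 0,
             0, 0.4, 4.1, 0, 1.2, 0, 0, 0)
)
n_races <- 55

# Years with snow recorded at the airport
snow_airport <- wet$year[wet$snow > 0]

# Years with no airport snow, but a modeled Ester Dome minimum below freezing
snow_possible <- wet$year[wet$snow == 0 & wet$ed_min < 32]

pct <- function(n) round(100 * n / n_races)

pct(nrow(wet))                                     # 29
pct(length(snow_airport))                          # 7
pct(length(snow_airport) + length(snow_possible))  # 16
snow_possible                                      # 1977 1986 1991 1994 2016
```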

Wind data from the airport has only been recorded since 1984, but from those years the average wind speed at the airport on race day is 4.8 miles per hour. The highest 2-minute wind speed during Equinox race day was 21 miles per hour in 2003. Unfortunately, no wind data is available for Ester Dome, but wind speeds up there are likely to be higher than what is recorded at the airport.

Weather from the week prior

It’s also useful to look at the weather from the week before the race, since excessive pre-race rain or snow can make conditions on race day very different, even if the race day weather is pleasant. The year I ran the full marathon (2013), it snowed the week before and much of the trail in the woods before the water stop near Henderson and all of the out and back were covered in snow.

The most dramatic example of this was 1992, when 23 inches (!) of snow fell at the airport in the week prior to the race, with much higher totals up on the summit of Ester Dome. Measurable snow has been recorded at the airport in the week prior to six races, but all the weekly totals are under an inch except for the snow year of 1992.

Precipitation has fallen in 44 of 55 pre-race weeks (80% of the time). Three years have had more than an inch of precipitation prior to the race: 1.49 inches in 2015, 1.26 inches in 1992 (most of which fell as snow), and 1.05 inches in 2007. On average, just over two tenths of an inch of precipitation falls in the week before the race.
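These pre-race numbers can be checked against the last-but-one column of the table at the end of the post. A short sketch, with the weekly precipitation totals copied from that table; the vector names are illustrative.

```r
years <- 1963:2017

# Precipitation (inches) in the week prior to each race, from the table
p_prcp <- c(
  0.01, 0.03, 0.80, 0.01, 0.00, 0.04, 0.00, 0.42, 0.14, 0.01,  # 1963-1972
  0.05, 0.00, 0.02, 0.54, 0.20, 0.10, 0.17, 0.35, 0.04, 0.22,  # 1973-1982
  0.05, 0.08, 0.57, 0.21, 0.00, 0.11, 0.07, 0.00, 0.03, 1.26,  # 1983-1992
  0.37, 0.08, 0.00, 0.26, 0.03, 0.37, 0.26, 0.30, 0.00, 0.03,  # 1993-2002
  0.00, 0.25, 0.09, 0.00, 1.05, 0.08, 0.25, 0.00, 0.44, 0.33,  # 2003-2012
  0.13, 0.00, 1.49, 0.61, 0.02                                 # 2013-2017
)

wet_weeks <- sum(p_prcp > 0)

wet_weeks                                # 44
round(100 * wet_weeks / length(p_prcp))  # 80
years[p_prcp > 1]                        # 1992 2007 2015
round(mean(p_prcp), 2)                   # 0.22
```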

Summary

The following stacked plot shows the weather for all 55 runnings of the Equinox Marathon. The top panel shows the range of temperatures on race day from the airport station (wide bars) and estimated on Ester Dome (thin lines below bars). The shaded area at the bottom shows where temperatures are below freezing.

The middle panel shows race day liquid precipitation (rain, melted snow). Bars marked with an asterisk indicate years where snow was also recorded at the airport, but remember that five of the other years with liquid precipitation probably experienced snow on Ester Dome (1977, 1986, 1991, 1994, and 2016) because the temperatures were likely to be below freezing at elevation.

The bottom panel shows precipitation totals from the week prior to the race. Bars marked with an asterisk indicate weeks where snow was also recorded at the airport.

[Figure: Equinox Marathon Weather]

Here’s a table with most of the data from the analysis. A CSV with this data can be downloaded from all_wx.csv

Temperatures are in °F, awnd is the airport wind speed in miles per hour, and precipitation and snow are in inches. The ED columns are the modeled Ester Dome temperatures, the p_ columns are totals for the week prior to the race, and NA marks the years before airport wind data was recorded.

Date min_t max_t ED_min_t ED_max_t awnd prcp snow p_prcp p_snow
1963-09-21 32.0 54.0 27.5 48.2 NA 0.00 0.0 0.01 0.0
1964-09-19 34.0 57.9 29.4 51.8 NA 0.00 0.0 0.03 0.0
1965-09-25 37.9 60.1 33.1 53.9 NA 0.00 0.0 0.80 0.0
1966-09-24 36.0 62.1 31.3 55.8 NA 0.00 0.0 0.01 0.0
1967-09-23 35.1 57.9 30.4 51.8 NA 0.00 0.0 0.00 0.0
1968-09-21 23.0 44.1 19.1 38.9 NA 0.00 0.0 0.04 0.0
1969-09-20 35.1 68.0 30.4 61.3 NA 0.00 0.0 0.00 0.0
1970-09-19 24.1 39.9 20.1 34.9 NA 0.00 0.0 0.42 0.0
1971-09-18 35.1 55.9 30.4 50.0 NA 0.00 0.0 0.14 0.0
1972-09-23 19.9 42.1 16.1 37.0 NA 0.00 0.0 0.01 0.2
1973-09-22 30.0 44.1 25.6 38.9 NA 0.00 0.0 0.05 0.0
1974-09-21 48.0 60.1 42.5 53.9 NA 0.08 0.0 0.00 0.0
1975-09-20 37.9 55.9 33.1 50.0 NA 0.02 0.0 0.02 0.0
1976-09-18 34.0 59.0 29.4 52.9 NA 0.00 0.0 0.54 0.0
1977-09-24 36.0 48.9 31.3 43.4 NA 0.06 0.0 0.20 0.0
1978-09-23 30.0 42.1 25.6 37.0 NA 0.00 0.0 0.10 0.3
1979-09-22 35.1 62.1 30.4 55.8 NA 0.00 0.0 0.17 0.0
1980-09-20 30.9 43.0 26.5 37.8 NA 0.00 0.0 0.35 0.0
1981-09-19 37.0 43.0 32.2 37.8 NA 0.15 0.0 0.04 0.0
1982-09-18 42.1 61.0 37.0 54.8 NA 0.02 0.0 0.22 0.0
1983-09-17 39.9 46.9 34.9 41.5 NA 0.00 0.0 0.05 0.0
1984-09-22 28.9 60.1 24.6 53.9 5.8 0.00 0.0 0.08 0.0
1985-09-21 30.9 42.1 26.5 37.0 6.5 0.14 2.1 0.57 0.0
1986-09-20 36.0 52.0 31.3 46.3 8.3 0.07 0.0 0.21 0.0
1987-09-19 37.9 61.0 33.1 54.8 6.3 0.00 0.0 0.00 0.0
1988-09-24 37.0 45.0 32.2 39.7 4.0 0.00 0.0 0.11 0.0
1989-09-23 36.0 61.0 31.3 54.8 8.5 0.00 0.0 0.07 0.5
1990-09-22 37.9 50.0 33.1 44.4 7.8 0.26 0.0 0.00 0.0
1991-09-21 36.0 57.0 31.3 51.0 4.5 0.04 0.0 0.03 0.0
1992-09-19 24.1 33.1 20.1 28.5 6.7 0.01 0.4 1.26 23.0
1993-09-18 28.0 37.0 23.8 32.2 4.9 0.29 4.1 0.37 0.3
1994-09-24 27.0 51.1 22.8 45.5 6.0 0.02 0.0 0.08 0.0
1995-09-23 43.0 66.9 37.8 60.3 4.0 0.00 0.0 0.00 0.0
1996-09-21 28.9 37.9 24.6 33.1 6.9 0.06 1.2 0.26 0.0
1997-09-20 27.0 55.0 22.8 49.1 3.8 0.00 0.0 0.03 0.0
1998-09-19 42.1 60.1 37.0 53.9 4.9 0.00 0.0 0.37 0.0
1999-09-18 39.0 64.9 34.1 58.4 3.8 0.00 0.0 0.26 0.0
2000-09-16 28.9 50.0 24.6 44.4 5.6 0.00 0.0 0.30 0.0
2001-09-22 33.1 57.0 28.5 51.0 1.6 0.00 0.0 0.00 0.0
2002-09-21 33.1 48.9 28.5 43.4 3.8 0.00 0.0 0.03 0.0
2003-09-20 26.1 46.0 22.0 40.7 9.6 0.00 0.0 0.00 0.0
2004-09-18 26.1 48.0 22.0 42.5 4.3 0.00 0.0 0.25 0.0
2005-09-17 37.0 63.0 32.2 56.6 0.9 0.00 0.0 0.09 0.0
2006-09-16 46.0 64.0 40.7 57.6 4.3 0.00 0.0 0.00 0.0
2007-09-22 25.0 45.0 20.9 39.7 4.7 0.00 0.0 1.05 0.0
2008-09-20 34.0 51.1 29.4 45.5 4.5 0.00 0.0 0.08 0.0
2009-09-19 39.0 50.0 34.1 44.4 5.8 0.00 0.0 0.25 0.0
2010-09-18 35.1 64.9 30.4 58.4 2.5 0.00 0.0 0.00 0.0
2011-09-17 39.9 57.9 34.9 51.8 1.3 0.00 0.0 0.44 0.0
2012-09-22 46.9 66.9 41.5 60.3 6.0 0.00 0.0 0.33 0.0
2013-09-21 24.3 44.1 20.3 38.9 5.1 0.00 0.0 0.13 0.6
2014-09-20 45.0 51.1 39.7 45.5 1.6 0.36 0.0 0.00 0.0
2015-09-19 37.9 44.1 33.1 38.9 2.9 0.01 0.0 1.49 0.0
2016-09-17 34.0 57.9 29.4 51.8 2.2 0.01 0.0 0.61 0.0
2017-09-16 33.1 66.0 28.5 59.5 3.1 0.00 0.0 0.02 0.0
sun, 09-sep-2018, 10:54

Introduction

In previous posts (Fairbanks Race Predictor, Equinox from Santa Claus, Equinox from Gold Discovery) I’ve looked at predicting Equinox Marathon results based on results from earlier races. In all those cases I’ve looked at single race comparisons: how results from Gold Discovery can predict Marathon times, for example. In this post I’ll look at all the Usibelli Series races I completed this year to see how they can inform my expectations for next Saturday’s Equinox Marathon.

Methods

I’ve been collecting the results from all Usibelli Series races since 2010. I grouped that data by runner name and year, and found all runners who completed the same set of Usibelli Series races that I finished in 2018, along with their Equinox Marathon finish pace. Between 2010 and 2017 there are 160 records that match.

The data looks like this. crr is that person’s Chena River Run pace in minutes per mile, msr is the Midnight Sun Run pace for the same person and year, rotv is the pace from Run of the Valkyries, gdr is the pace from the Gold Discovery Run, and em is the Equinox Marathon pace for that same person and year.

crr msr rotv gdr em
8.1559 8.8817 8.1833 10.2848 11.8683
8.7210 9.1387 9.2120 11.0152 13.6796
8.7946 9.0640 9.0077 11.3565 13.1755
9.4409 10.6091 9.6250 11.2080 13.1719
7.3581 7.1836 7.1310 8.0001 9.6565
7.4731 7.5349 7.4700 8.2465 9.8359
... ... ... ... ...
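Getting from one row per race result to the wide layout above is a reshape followed by a filter for complete rows. The appendix does this with tidyverse functions against a database; the sketch below shows the same idea in base R with a tiny set of made-up results (the runner names and paces are hypothetical, and only three races are included to keep it short).

```r
# Hypothetical long-format results: one row per runner, year, and race
results <- data.frame(
  name = c("Runner A", "Runner A", "Runner A",
           "Runner B", "Runner B",
           "Runner C", "Runner C", "Runner C"),
  year = c(2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017),
  race = c("crr", "gdr", "em", "crr", "em", "crr", "gdr", "em"),
  pace_min = c(8.2, 10.3, 11.9, 8.7, 13.7, 7.4, 8.0, 9.7)
)

# One row per runner and year, one pace column per race
wide <- reshape(results, idvar = c("name", "year"),
                timevar = "race", direction = "wide")
names(wide) <- sub("^pace_min\\.", "", names(wide))

# Keep only runner-years with a result in every race
complete <- wide[complete.cases(wide), ]
complete
```

Runner B never ran gdr in this made-up data, so that row drops out, which is the same reason the real data ends up with only 160 matching records.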

I will use two methods to predict Equinox Marathon times from these records: multiple linear regression and Random Forest.

The R code for the analysis appears at the end of this post.

Results

Linear regression

We start with linear regression, which isn’t entirely appropriate for this analysis because the independent variables (pre-Equinox race pace times) aren’t really independent of one another. A person who runs a 6 minute pace in the Chena River Run is likely to also be someone who runs Gold Discovery faster than the average runner. This relationship, in fact, is the basis for this analysis.
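To get a feel for how strongly the race paces move together, here's a quick check using only the six sample rows shown in the Methods section. Six rows is far too few to say anything about the full set of 160 records, so treat this as an illustration of the idea rather than a result.

```r
# The six sample rows from the table in the Methods section
sample_rows <- data.frame(
  crr  = c(8.1559, 8.7210, 8.7946, 9.4409, 7.3581, 7.4731),
  msr  = c(8.8817, 9.1387, 9.0640, 10.6091, 7.1836, 7.5349),
  rotv = c(8.1833, 9.2120, 9.0077, 9.6250, 7.1310, 7.4700),
  gdr  = c(10.2848, 11.0152, 11.3565, 11.2080, 8.0001, 8.2465)
)

# Pairwise Pearson correlations between the pre-Equinox race paces
pace_cor <- cor(sample_rows)
round(pace_cor, 2)
```

Every pairwise correlation in this small sample is high, which is what you'd expect when fast runners are fast in every race. Correlated predictors like these make individual regression coefficients harder to interpret.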

I started with a model that includes all the races I completed in 2018, but pace time for the Midnight Sun Run wasn’t statistically significant so I removed it from the final model, which included Chena River Run, Run of the Valkyries, and Gold Discovery.

This model is significant, as are all the coefficients except the intercept, and the model explains about 79% of the variation in the data:

##
## Call:
## lm(formula = em ~ crr + gdr + rotv, data = input_pivot)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.8837 -0.6534 -0.2265  0.3549  5.8273
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.6217     0.5692   1.092 0.276420
## crr          -0.3723     0.1346  -2.765 0.006380 **
## gdr           0.8422     0.1169   7.206 2.32e-11 ***
## rotv          0.7607     0.2119   3.591 0.000442 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.278 on 156 degrees of freedom
## Multiple R-squared:  0.786,  Adjusted R-squared:  0.7819
## F-statistic:   191 on 3 and 156 DF,  p-value: < 2.2e-16
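The last three lines of that summary are tied together by a bit of arithmetic, which makes for a handy sanity check. This sketch recomputes the number of records, the adjusted R-squared, and the F-statistic from the rounded values printed above, so small rounding differences are possible.

```r
# Values copied from the lm() summary above
r_squared   <- 0.786
resid_df    <- 156   # residual degrees of freedom
n_predictor <- 3     # crr, gdr, rotv

# Records used in the fit: residual df + predictors + intercept
n_records <- resid_df + n_predictor + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n_records - 1) / resid_df

# F-statistic for the overall model
f_stat <- (r_squared / n_predictor) / ((1 - r_squared) / resid_df)

n_records                # 160
round(adj_r_squared, 4)  # 0.7819
round(f_stat)            # 191
```

The 160 here matches the number of matching records mentioned in the Methods section.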

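Using this model and my 2018 results, my overall pace and finish times for Equinox are predicted to be 10:45 and 4:41:50. The 95% confidence intervals for these predictions are 10:30–11:01 and 4:35:11–4:48:28. These are confidence intervals for the average pace of runners with results like mine; a prediction interval for a single runner would be wider.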
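A prediction from this model is just the intercept plus a weighted sum of the three race paces. Since my 2018 paces aren't listed in this post, the sketch below applies the printed coefficients to the first sample row from the Methods table instead, then converts the predicted pace to a finish time. The 26.2 mile distance and the helper names fmt_pace and fmt_finish are assumptions made for this example.

```r
# Coefficients from the linear model summary above
coefs <- c(intercept = 0.6217, crr = -0.3723, gdr = 0.8422, rotv = 0.7607)

# First sample runner from the data table (paces in minutes per mile)
runner <- c(crr = 8.1559, gdr = 10.2848, rotv = 8.1833)

# Assumption: a marathon distance of 26.2 miles
marathon_miles <- 26.2

pred_pace <- coefs[["intercept"]] + sum(coefs[names(runner)] * runner)
pred_finish <- pred_pace * marathon_miles

# Format decimal minutes as m:ss or h:mm:ss, rounding to whole seconds
fmt_pace <- function(minutes) {
  s <- round(minutes * 60)
  sprintf("%d:%02d", as.integer(s %/% 60), as.integer(s %% 60))
}
fmt_finish <- function(minutes) {
  s <- round(minutes * 60)
  sprintf("%d:%02d:%02d", as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60), as.integer(s %% 60))
}

fmt_pace(pred_pace)      # "12:28"
fmt_finish(pred_finish)  # "5:26:46"
```

That runner's actual Equinox pace was 11.8683 minutes per mile (about 11:52), so the model is off by roughly 36 seconds per mile here, well within the residual standard error of 1.278 minutes.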

Random Forest

Random Forest is another regression method, but it doesn’t require that the predictor variables be independent of one another. Here are the results of building 5,000 random trees from the data:

##
## Call:
##  randomForest(formula = em ~ ., data = input_pivot, ntree = 5000)
##                Type of random forest: regression
##                      Number of trees: 5000
## No. of variables tried at each split: 1
##
##           Mean of squared residuals: 1.87325
##                     % Var explained: 74.82

##      IncNodePurity
## crr       260.8279
## gdr       321.3691
## msr       268.0936
## rotv      295.4250

This model, which includes all race results, explains just under 75% of the variation in the data. And you can see from the importance result that Gold Discovery results factor more heavily in the result than earlier races in the season like Chena River Run and the Midnight Sun Run.
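One way to make the importance numbers easier to compare is to express each as a share of the total. This is only a convenience for reading the table; IncNodePurity values aren't probabilities, and the share variable is a name made up for this sketch.

```r
# IncNodePurity values from the importance output above
inc_node_purity <- c(crr = 260.8279, gdr = 321.3691,
                     msr = 268.0936, rotv = 295.4250)

# Each race's share of the total node purity increase
share <- inc_node_purity / sum(inc_node_purity)

round(100 * sort(share, decreasing = TRUE), 1)
```

Gold Discovery comes out on top, but the shares only range from roughly 23% to 28%, so all four races are contributing.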

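Using this model, my predicted pace is 10:13 and my finish time is 4:27:46. The 95% intervals, taken as the 2.5% and 97.5% quantiles of the individual tree predictions, are 9:23–11:40 and 4:05:58–5:05:34. You’ll notice that these intervals are wider than with linear regression, probably because there are fewer assumptions with Random Forest and less power, and because they describe the spread among individual trees rather than the uncertainty in a fitted mean.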
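The interval calculation is the same for any ensemble: collect one prediction per ensemble member, average them for the point estimate, and take quantiles for the interval. The sketch below illustrates that with a stand-in ensemble of linear models fit to bootstrap resamples of simulated data, so it runs in base R without the randomForest package. It is not a random forest, and the simulated paces are made up.

```r
set.seed(42)

# Simulated (hypothetical) data: Gold Discovery pace and marathon pace
n <- 100
gdr <- runif(n, 7, 12)
em <- 1 + 1.1 * gdr + rnorm(n, sd = 1)
dat <- data.frame(gdr, em)

# The runner we want a prediction for
new_runner <- data.frame(gdr = 9)

# Stand-in ensemble: one linear model per bootstrap resample of the data
n_fits <- 500
member_pred <- vapply(seq_len(n_fits), function(i) {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  unname(predict(lm(em ~ gdr, data = boot), new_runner))
}, numeric(1))

# Point estimate is the mean across members; the interval is the
# middle 95% of the individual member predictions
aggregate_pred <- mean(member_pred)
interval <- quantile(member_pred, c(0.025, 0.975))

aggregate_pred
interval
```

In the appendix, rfp_all$aggregate and rfp_all$individual play the roles of aggregate_pred and member_pred.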

Conclusion

My number one goal for this year’s Equinox Marathon is simply to finish without injuring myself, something I wasn’t able to do the last time I ran the whole race in 2013. I finished in 4:49:28 with an overall pace of 11:02, but the race or my training for it resulted in a torn hip labrum.

If I’m able to finish uninjured, I’d like to beat my time from 2013. These results suggest I should have no problem achieving my second goal, and perhaps knowing how much faster these predictions are than my 2013 time, I can race conservatively and still get a personal best time.

Appendix - R code

library(tidyverse)
library(RPostgres)
library(lubridate)
library(glue)
library(randomForest)
library(knitr)

races <- dbConnect(Postgres(),
                   host = "localhost",
                   dbname = "races")

all_races <- races %>%
    tbl("all_races")

usibelli_races <- tibble(race = c("Chena River Run",
                                  "Midnight Sun Run",
                                  "Jim Loftus Mile",
                                  "Run of the Valkyries",
                                  "Gold Discovery Run",
                                  "Santa Claus Half Marathon",
                                  "Golden Heart Trail Run",
                                  "Equinox Marathon"))

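# My 2018 Usibelli Series results
css_2018 <- all_races %>%
    inner_join(usibelli_races, by = "race", copy = TRUE) %>%
    filter(year == 2018,
           name == "Christopher Swingley") %>%
    collect()

# The races I completed in 2018, plus the Equinox Marathon (the response)
candidate_races <- css_2018 %>%
    select(race) %>%
    bind_rows(tibble(race = c("Equinox Marathon")))

# All results for those races from runners with known gender and birth year
input_data <- all_races %>%
    inner_join(candidate_races, by = "race", copy = TRUE) %>%
    filter(!is.na(gender), !is.na(birth_year)) %>%
    collect()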

# One row per runner and year with one pace column per race, keeping
# only runners who finished every one of the races in that year
input_pivot <- input_data %>%
    group_by(race, name, year) %>%
    mutate(n = n()) %>%
    # drop name / year / race combinations with more than one result
    filter(n == 1) %>%
    ungroup() %>%
    select(name, year, race, pace_min) %>%
    spread(race, pace_min) %>%
    rename(crr = `Chena River Run`,
           msr = `Midnight Sun Run`,
           rotv = `Run of the Valkyries`,
           gdr = `Gold Discovery Run`,
           em = `Equinox Marathon`) %>%
    filter(!is.na(crr), !is.na(msr), !is.na(rotv),
           !is.na(gdr), !is.na(em)) %>%
    select(-c(name, year))

kable(input_pivot %>% head)

css_2018_pivot <- css_2018 %>%
    select(name, year, race, pace_min) %>%
    spread(race, pace_min) %>%
    rename(crr = `Chena River Run`,
           msr = `Midnight Sun Run`,
           rotv = `Run of the Valkyries`,
           gdr = `Gold Discovery Run`) %>%
    select(-c(name, year))

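# Format a pace in decimal minutes as "m:ss"
pace <- function(minutes) {
    # round to whole seconds first so 10.999 becomes "11:00", not "10:60"
    total_seconds <- round(minutes * 60)
    mm <- total_seconds %/% 60
    seconds <- total_seconds %% 60

    glue('{mm}:{sprintf("%02.0f", seconds)}')
}

# Format a duration in decimal minutes as "h:mm:ss"
finish_time <- function(minutes) {
    total_seconds <- round(minutes * 60)
    hh <- total_seconds %/% 3600
    mm <- (total_seconds %% 3600) %/% 60
    seconds <- total_seconds %% 60

    glue('{hh}:{sprintf("%02.0f", mm)}:{sprintf("%02.0f", seconds)}')
}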

lm_model <- lm(em ~ crr + gdr + rotv,
               data = input_pivot)

summary(lm_model)

prediction <- predict(lm_model, css_2018_pivot,
                      interval = "confidence", level = 0.95)

prediction

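# Random forest using all four earlier races as predictors
rf <- randomForest(em ~ .,
                   data = input_pivot,
                   ntree = 5000)
rf
importance(rf)

# Predictions from each of the 5,000 trees, plus the aggregate prediction
rfp_all <- predict(rf, css_2018_pivot, predict.all = TRUE)

rfp_all$aggregate

# Middle 95% of the individual tree predictions
rf_ci <- quantile(rfp_all$individual, c(0.025, 0.975))

rf_ci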