sat, 28-sep-2019, 18:46

Introduction

At the 57th running of the Equinox Marathon last weekend, Aaron Fletcher broke Stan Justice’s 1985 course record, one of the oldest running records in Alaska sports. On the Equinox Marathon Facebook page, Stan and Matias Saari were discussing whether more favorable weather might have meant an even faster record-breaking effort. Stan writes:

Where is a statistician when you need one. Would be interesting to compare times of all 2018 runners with their 2019 times.

I’m not a statistician, but let’s take a look.

Results

We’ve got Equinox Marathon finish time data going back to 1997, so we’ll compare the finish times for all runners who competed in consecutive years, subtracting their previous year finish time (in hours) from their current year finish time. By this metric, negative values indicate individuals who ran faster in the current year than the previous. For example, I completed the race in 4:40:05 in 2018, and finished in 4:33:42 this year. My “hours_delta” for 2019 is -0.106 hours, or 6 minutes, 23 seconds faster.
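The hours_delta arithmetic can be checked with a few lines of code. This is a minimal Python sketch rather than the R code behind the post; the helper names `to_hours` and `hours_delta` are made up for illustration, and the only data used are the two finish times quoted above.

```python
def to_hours(hms: str) -> float:
    """Convert an 'H:MM:SS' finish time to decimal hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def hours_delta(previous: str, current: str) -> float:
    """Current-year time minus previous-year time; negative means faster."""
    return to_hours(current) - to_hours(previous)


# Finish times quoted in the post: 2018 first, then 2019.
delta = hours_delta("4:40:05", "4:33:42")

# Express the size of the difference as whole minutes and seconds.
total_seconds = round(abs(delta) * 3600)
minutes, seconds = divmod(total_seconds, 60)

print(f"{delta:.3f} hours ({minutes} min {seconds} s)")
# prints: -0.106 hours (6 min 23 s)
```

The sign convention matters: with current minus previous, a negative delta is an improvement, which matches the -0.106 in the example.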

Here’s the distribution of this statistic for 2019:

//media.swingleydev.com/img/blog/2019/09/twenty_nineteen_histogram.svgz

There are several people who were dramatically faster (on the left side of the graph), but the overall picture shows that times in 2019 were slower than in 2018. The dark cyan line is the median value, which is at 0.18 hours, or 10 minutes, 35 seconds slower. There were 53 runners who ran the race faster in 2019 than in 2018 (including me), and 115 who were slower, so about 68% of those 168 runners slowed down. That’s a pretty dramatic difference.
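The summary numbers in this paragraph (median difference, count of faster and slower runners) are simple to compute once the per-runner differences exist. Here is a rough Python sketch of that step. The `deltas` list is invented for illustration and is not the real 2019 data; only the counts 53 and 115 come from the post.

```python
from statistics import median

# Hypothetical hours_delta values (current minus previous year), NOT real data.
deltas = [-0.45, -0.106, 0.05, 0.18, 0.22, 0.40, 0.75]


def summarize(deltas):
    """Median difference plus counts of runners who were faster or slower."""
    med = median(deltas)
    return {
        "median_hours": med,
        "median_minutes": med * 60,
        "faster": sum(1 for d in deltas if d < 0),
        "slower": sum(1 for d in deltas if d > 0),
    }


summary = summarize(deltas)
print(summary)

# Share of slower runners using the counts reported in the post for 2019.
share_slower_2019 = 115 / (53 + 115)
print(f"{share_slower_2019:.1%} of runners were slower in 2019")
```

The median is used rather than the mean because a handful of runners with very large improvements or blow-ups would otherwise dominate the summary.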

Here’s that relationship for all the years where we have data:

//media.swingleydev.com/img/blog/2019/09/slower_faster.svgz

The orange bars are runners who ran that year’s Equinox faster than the previous year and the dark cyan bars are those who were slower. 2019 is dramatically different than most other years for how much slower most people ran. 2013 is another particularly slow year. Fast years include 2007, 2009, and last year.

Here’s another way to look at the data. It shows the median number of minutes runners ran Equinox faster (negative numbers) or slower (positive) in consecutive years.

//media.swingleydev.com/img/blog/2019/09/median_diff_one_year.svgz

You can see that finish times were dramatically slower in 2019, and much faster in 2018. Since each value compares a runner’s time against the same runner’s time in the previous year, at least part of the reason 2019 seemed like such a slow race is that 2018 was a fast one.

Two-year lag

Let’s see what happens if we use a two-year lag to calculate the differences. Instead of comparing the current year’s results with the previous year for individual runners who raced in both years, we’ll compare the current year with two years prior. For 2019, for example, that means runners who ran the race this year and in 2017.
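The one-year and two-year comparisons are the same self-join with a different lag. The sketch below shows one way to express that in Python, under stated assumptions: the runner names and times are hypothetical, and `lagged_deltas` is an illustrative helper, not the function used for the post.

```python
# Hypothetical results: (runner, year, finish time in hours). Not real data.
results = [
    ("runner_a", 2017, 4.90),
    ("runner_a", 2018, 4.67),
    ("runner_a", 2019, 4.56),
    ("runner_b", 2017, 5.20),
    ("runner_b", 2019, 5.50),
    ("runner_c", 2018, 3.80),
]


def lagged_deltas(results, lag):
    """Return (runner, year, hours_delta) for runners who raced in both
    `year` and `year - lag`; hours_delta is current minus earlier."""
    times = {(runner, year): hours for runner, year, hours in results}
    rows = []
    for (runner, year), hours in sorted(times.items()):
        earlier = times.get((runner, year - lag))
        if earlier is not None:
            rows.append((runner, year, round(hours - earlier, 3)))
    return rows


one_year = lagged_deltas(results, lag=1)
two_year = lagged_deltas(results, lag=2)
print(one_year)
print(two_year)
```

Note that the set of runners changes with the lag: runner_b skipped 2018, so they appear only in the two-year comparison, while runner_c has a single race and never appears.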

Here’s what the distribution looks like comparing 2019 and 2017 results from the same runner.

//media.swingleydev.com/img/blog/2019/09/two_year_histogram.svgz

It’s a similar pattern, with the median value at 0.18 hours, indicating that runners were a little over 10 minutes slower in 2019 than in 2017. This strengthens the evidence that 2019 was a particularly difficult year to run the race.

Median difference by year for all years of the two-year lag data:

//media.swingleydev.com/img/blog/2019/09/two_year_diffs.svgz

Remember that the dark cyan bars are years with slower finish times and orange are faster. 2019 still comes out as an outlier, along with 2013. 2007 is the clear winner for fast times.

All pairwise race results

If we can do one- and two-year lags, how about combining all the pairwise race results? At some point the comparison is no longer a good one because of the large time interval between races, so we will restrict the comparisons to six or fewer years between results. We’ll also remove the earliest years from the results because those years are likely biased by having fewer long-lag results.
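Building every pairwise comparison for a runner, capped at a six-year gap, can be sketched as follows. This is an illustrative Python version, not the post’s R code. The sample data are hypothetical, and the `first_year` cutoff is an assumption, since the post does not say which early years were dropped. The field names `years_delta` and `hours_delta` follow the regression output shown later in the post.

```python
from collections import defaultdict
from itertools import combinations

MAX_GAP = 6  # from the post: six or fewer years between results

# Hypothetical results: (runner, year, finish time in hours). Not real data.
results = [
    ("runner_a", 2010, 4.50),
    ("runner_a", 2013, 4.80),
    ("runner_a", 2019, 5.10),
    ("runner_b", 2018, 3.90),
    ("runner_b", 2019, 4.10),
]


def pairwise_deltas(results, max_gap=MAX_GAP, first_year=None):
    """All within-runner race pairs at most `max_gap` years apart.

    `first_year` (an assumed parameter) drops pairs whose later race falls
    before that year, mimicking the removal of the earliest years."""
    by_runner = defaultdict(list)
    for runner, year, hours in results:
        by_runner[runner].append((year, hours))

    rows = []
    for runner, races in by_runner.items():
        # Sorting by year guarantees the second race in each pair is later.
        for (y1, h1), (y2, h2) in combinations(sorted(races), 2):
            years_delta = y2 - y1
            if years_delta > max_gap:
                continue
            if first_year is not None and y2 < first_year:
                continue
            rows.append({
                "runner": runner,
                "year": y2,
                "years_delta": years_delta,
                "hours_delta": round(h2 - h1, 3),
            })
    return rows


all_pairs = pairwise_deltas(results)
recent_pairs = pairwise_deltas(results, first_year=2015)
print(len(all_pairs), len(recent_pairs))
```

In this toy data, runner_a’s 2010 and 2019 races are nine years apart and are excluded, while the 2013 and 2019 pair sits exactly at the six-year limit and is kept.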

Here’s the same plot showing difference times in minutes for all pairwise race results, six years or fewer apart.

//media.swingleydev.com/img/blog/2019/09/all_through_six_median_diff_minutes.svgz

You can see that there’s a pretty strong bias toward slower times, which is likely due to people aging and their times getting slower. The conditions were good enough in 2007 that this aging effect was offset and people running in that race tended to do it faster than their earlier performances despite being older. Even so, 2019 still stands out as one of the most difficult races.

Here’s the aging effect:

##
## Call:
## lm(formula = hours_delta ~ years_delta, data = all_through_six)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.3242 -0.3639 -0.0415  0.3115  6.4441
##
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)
## (Intercept) -0.03390    0.01934  -1.752              0.0797 .
## years_delta  0.05152    0.00558   9.234 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8845 on 8664 degrees of freedom
## Multiple R-squared:  0.009745,   Adjusted R-squared:  0.00963
## F-statistic: 85.26 on 1 and 8664 DF,  p-value: < 0.00000000000000022

There’s a highly significant positive relationship between the gap in years and the difference in marathon times for those runners (years_delta in the coefficient results above). Each additional year between races is associated with a finish time about 0.05 hours, or just over 3 minutes, slower. Notice, however, that the data are so noisy that this model, no matter how significant the coefficients, explains only about 1% of the variation in the difference in marathon times (R-squared of 0.0097).
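To make the coefficients concrete, here is a small Python sketch that converts the fitted values from the output above into minutes. The three constants are copied from the regression summary; the function name `predicted_delta_minutes` is invented for illustration, and nothing here refits the model.

```python
# Values copied from the lm() summary above (units: hours).
INTERCEPT = -0.03390
SLOPE = 0.05152        # change in hours_delta per additional year of gap
R_SQUARED = 0.009745


def predicted_delta_minutes(years_delta):
    """Model-predicted slowdown in minutes for a given gap between races."""
    return (INTERCEPT + SLOPE * years_delta) * 60


slope_minutes = SLOPE * 60
print(f"Slope: {slope_minutes:.2f} minutes per year of gap")
print(f"Variance explained: {R_SQUARED:.1%}")

# Predicted median-ish slowdown for each gap used in the pairwise analysis.
for gap in range(1, 7):
    print(f"{gap}-year gap: {predicted_delta_minutes(gap):5.1f} minutes slower")
```

The predictions describe the average trend only. With a residual standard error of about 0.88 hours, an individual runner’s change between two races can easily differ from these values by an hour or more.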

Weather

The conditions in this year’s race were particularly harsh: a fairly constant 40 °F and light rain at valley level, and below-freezing temperatures, high winds, and falling snow up on Ester Dome. The trail was muddy, soft, and slippery in places, especially on the single track and on the unpaved section of Henderson Road. Compare this with last year, when the weather was gorgeous: dry, sunny, and temperatures ranging from 39 to 60 °F.

We took a look at the differences in weather between years to see if there is a relationship between weather differences and finish time differences, but none of the models we tried were any good at predicting differences in finish times, probably because of the huge variation in finish times that had nothing to do with the weather. There are too many other factors contributing to an individual’s performance from one year to the next to be able to pull out just the effects of weather on the results.

Conclusion

2019 was a very slow year when we compared runners who completed Equinox in 2019 and earlier years. In fact, there’s some evidence that it’s the slowest year of all the years considered here (1997–2019). We could find no statistical evidence to show that weather was the cause of this, but anyone who was out there on race day this year knows it played a part in their finish times. I ran the race this year and last and managed to improve on my time despite the conditions, but I don’t think there’s any question that I would have improved my time even more had it been warm and sunny instead of cold, windy, and wet. Congratulations to all the competitors in this year’s race. It was a fun but challenging year for Equinox.

tags: running  R  Equinox Marathon 