While riding to work this morning I figured out a way to disentangle the effects of trail quality and physical conditioning (both of which improve over the season) from temperature, which also tends to increase throughout the season. As you may recall from my previous post, I found that days into the season (winter day of year) and minimum temperature were both negatively related to fat bike energy consumption. But because those variables are also related to each other, we can’t make statements about them individually.
But what if we look at pairs of trips that are within two days of each other and compare the difference in temperature between those trips with the difference in energy consumption? We’ll only pair trips going the same direction (to or from work), and we’ll restrict the pairings to trips that start less than 60 hours apart, which allows for two calendar days plus some variation in start time. That largely removes seasonality from the data because we’re always comparing two trips from the same few days.
For this analysis, I’m using SQL to filter the data because I’m better at window functions and filtering in SQL than in R. Here’s the code to grab the data from the database. (The CSV file and RMarkdown script are in my GitHub repo for this analysis.) The trick here is to categorize trips as being to work (“north”) or from work (“south”) and then include this field in the partition clause of the window function so I’m only getting the next trip that matches direction.
library(dplyr)
library(ggplot2)
library(scales)
library(knitr)   # provides kable(), used at the end

exercise_db <- src_postgres(host="example.com", dbname="exercise_data")

diffs <- tbl(exercise_db, build_sql(
   "WITH all_to_work AS (
        -- morning trips are to work (north), later trips are from work (south)
        SELECT *,
            CASE WHEN extract(hour from start_time) < 11
                 THEN 'north' ELSE 'south' END AS direction
        FROM track_stats
        WHERE type = 'Fat Biking'
            AND miles between 4 and 4.3
    ), with_next AS (
        -- attach the next trip in the same direction to each trip
        SELECT track_id, start_time, direction, kcal, miles, min_temp,
            lead(direction) OVER w AS next_direction,
            lead(start_time) OVER w AS next_start_time,
            lead(kcal) OVER w AS next_kcal,
            lead(miles) OVER w AS next_miles,
            lead(min_temp) OVER w AS next_min_temp
        FROM all_to_work
        WINDOW w AS (PARTITION BY direction ORDER BY start_time)
    )
    SELECT start_time, next_start_time, direction,
        min_temp, next_min_temp,
        kcal / miles AS kcal_per_mile,
        next_kcal / next_miles as next_kcal_per_mile,
        next_min_temp - min_temp AS temp_diff,
        (next_kcal / next_miles) - (kcal / miles) AS kcal_per_mile_diff
    FROM with_next
    -- keep only pairs that start less than 60 hours apart
    WHERE next_start_time - start_time < '60 hours'
    ORDER BY start_time")) %>%
    collect()

write.csv(diffs, file="fat_biking_trip_diffs.csv", quote=TRUE,
          row.names=FALSE)
kable(head(diffs))
| start time | next start time | temp diff (°F) | kcal / mile diff |
|---|---|---|---|
| 2013-12-03 06:21:49 | 2013-12-05 06:31:54 | 3.0 | -13.843866 |
| 2013-12-03 15:41:48 | 2013-12-05 15:24:10 | 3.7 | -8.823329 |
| 2013-12-05 06:31:54 | 2013-12-06 06:39:04 | 23.4 | -22.510564 |
| 2013-12-05 15:24:10 | 2013-12-06 16:38:31 | 13.6 | -5.505662 |
| 2013-12-09 06:41:07 | 2013-12-11 06:15:32 | -27.7 | -10.227048 |
| 2013-12-09 13:44:59 | 2013-12-11 16:00:11 | -25.4 | -1.034789 |
Out of a total of 123 trips, the pairing produced 70 pairs of same-direction trips within 2 days of each other. We still don’t have a measure of trail quality, so pairs where the trail is smooth and hard one day and covered with fresh snow the next won’t be particularly good data points.
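For anyone who would rather stay in R, the pairing step can be sketched with dplyr’s `lead()` inside a `group_by()`, which plays the same role as the partitioned window in the SQL. This is a minimal sketch and not the code I used for the results in this post: the `trips` data frame and every value in it are made up for illustration, with column names chosen to mirror the query.

```r
library(dplyr)

# Hypothetical toy data standing in for the track_stats table (values invented)
trips <- data.frame(
  start_time = as.POSIXct(c("2013-12-03 06:20:00", "2013-12-03 15:40:00",
                            "2013-12-05 06:30:00", "2013-12-05 15:25:00",
                            "2013-12-12 06:30:00"), tz = "UTC"),
  kcal     = c(400, 380, 350, 360, 390),
  miles    = c(4.1, 4.1, 4.1, 4.1, 4.1),
  min_temp = c(-10, -5, -7, -1, 5)
)

pairs <- trips %>%
  # morning trips are to work (north), later trips are from work (south)
  mutate(direction = ifelse(as.integer(format(start_time, "%H")) < 11,
                            "north", "south")) %>%
  group_by(direction) %>%
  arrange(start_time, .by_group = TRUE) %>%
  # lead() looks at the next trip within the same direction
  mutate(next_start_time    = lead(start_time),
         temp_diff          = lead(min_temp) - min_temp,
         kcal_per_mile_diff = lead(kcal / miles) - kcal / miles) %>%
  ungroup() %>%
  # drop the last trip in each direction and pairs 60 or more hours apart
  filter(!is.na(next_start_time),
         as.numeric(difftime(next_start_time, start_time,
                             units = "hours")) < 60)

print(pairs)
```

With this toy input, the two December 3 to December 5 pairs survive and the week-long gap before the December 12 trip is filtered out.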
Let’s look at a plot of the data.
s = ggplot(data=diffs,
           aes(x=temp_diff, y=kcal_per_mile_diff)) +
    geom_point() +
    geom_smooth(method="lm", se=FALSE) +
    scale_x_continuous(name="Temperature difference between paired trips (degrees F)",
                       breaks=pretty_breaks(n=10)) +
    scale_y_continuous(name="Energy consumption difference (kcal / mile)",
                       breaks=pretty_breaks(n=10)) +
    theme_bw() +
    ggtitle("Paired fat bike trips to and from work within 2 days of each other")

print(s)
This shows that when the temperature difference between two paired trips is negative (the second trip is colder than the first), additional energy is required for the second (colder) trip. This matches the pattern we saw in my earlier post where minimum temperature and winter day of year were negatively associated with energy consumption. But because we’ve used differences to remove seasonal effects, we can actually determine how large an effect temperature has.
There are quite a few outliers here. Those in the region with very little difference in temperature are likely due to snowfall changing the trail conditions from one trip to the next. I’m not sure why there is so much scatter among the points on the left side of the graph, but I don’t see any particular pattern among those points that might explain the higher than normal variation, and we don’t see the same variation in the points with a large positive difference in temperature, so I think this is just normal variation in the data not explained by temperature.
Here are the linear regression results for the paired trips.
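To see why differencing helps, here is a small simulation in base R. It is only a sketch under invented assumptions: the true temperature effect (−0.8 kcal / mile per degree), the seasonal trend, and the noise levels are all made-up numbers, not estimates from my data. Energy consumption depends on both temperature and day of season, and temperature itself drifts upward over the season, so a regression on the raw values mixes the two effects while a regression on trip-to-trip differences does not.

```r
set.seed(42)

# Invented "truth" for the simulation (assumptions, not real estimates)
true_temp_effect <- -0.8   # kcal / mile per degree F
season_effect    <- -0.2   # kcal / mile per day (conditioning, trail quality)

n_days   <- 120
day      <- seq_len(n_days)
min_temp <- -20 + 0.3 * day + rnorm(n_days, sd = 10)   # warms over the season
kcal_per_mile <- 100 + true_temp_effect * min_temp +
  season_effect * day + rnorm(n_days, sd = 3)

# Naive regression on raw values: the temperature slope absorbs part of the trend
naive_fit <- lm(kcal_per_mile ~ min_temp)

# Regression on differences between consecutive trips: the trend contributes
# the same small amount to every pair, so it ends up in the intercept
d_temp <- diff(min_temp)
d_kcal <- diff(kcal_per_mile)
diff_fit <- lm(d_kcal ~ d_temp)

print(coef(naive_fit)["min_temp"])
print(coef(diff_fit)["d_temp"])
```

In this setup the slope from the differenced model should land close to the invented −0.8, while the naive slope is pulled away from it by the seasonal trend.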
summary(lm(data=diffs, kcal_per_mile_diff ~ temp_diff))
##
## Call:
## lm(formula = kcal_per_mile_diff ~ temp_diff, data = diffs)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -40.839  -4.584  -0.169   3.740  47.063
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -2.1696     1.5253  -1.422    0.159
## temp_diff    -0.7778     0.1434  -5.424 8.37e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.76 on 68 degrees of freedom
## Multiple R-squared:  0.302,  Adjusted R-squared:  0.2917
## F-statistic: 29.42 on 1 and 68 DF,  p-value: 8.367e-07
The model and coefficient are both highly significant, and as we might expect, the intercept in the model is not significantly different from zero (if there weren’t a difference in temperature between two trips, there shouldn’t be a difference in energy consumption either, on average). Temperature difference alone explains 30% of the variation in the energy consumption differences, and the coefficient tells us the scale of the effect: each one degree (°F) drop in temperature is associated with an increase in energy consumption of 0.78 kilocalories per mile. So for a 4 mile commute like mine, the difference between a trip at 10°F vs −20°F is an additional 93 kilocalories (30 × 0.7778 × 4 = 93.34) on the colder trip. That might not sound like much in the context of the calories in food (93 kilocalories is about the energy in a large orange or a light beer), but my average energy consumption across all fat bike trips to and from work is 377 kilocalories, so 93 kilocalories is about a quarter of the total.
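The arithmetic above, plus a rough sense of the uncertainty around it, can be reproduced in a few lines of base R. This is a sketch that plugs in the slope, standard error, and degrees of freedom from the regression output; the variable names are my own, and the confidence interval is a standard t-based interval that assumes the usual linear model assumptions hold.

```r
# Values copied from the regression output above
slope     <- -0.7778   # kcal / mile per degree F
slope_se  <- 0.1434
resid_df  <- 68

# Scenario from the text: a 4 mile commute at -20F instead of 10F
temp_change   <- -20 - 10   # degrees F
commute_miles <- 4
avg_trip_kcal <- 377

extra_kcal <- slope * temp_change * commute_miles   # 93.336
share_of_average <- extra_kcal / avg_trip_kcal

# 95% confidence interval for the slope, carried through to the extra kcal
slope_ci <- slope + c(-1, 1) * qt(0.975, resid_df) * slope_se
extra_kcal_range <- sort(slope_ci * temp_change * commute_miles)

print(extra_kcal)
print(round(share_of_average, 2))
print(round(slope_ci, 3))
print(round(extra_kcal_range))
```

The interval is a reminder that the 93 kilocalorie figure is a point estimate; the plausible range for the same 30 degree drop is fairly wide.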