metachronistic

sat, 01-dec-2012, 07:41

It’s now December 1st, and the last time we got new snow was on November 11th. In my last post I looked at the lengths of snow-free periods in the available weather data for Fairbanks; the current one now stands at 20 days. That’s a long time, but what I’m interested in looking at today is whether the monthly pattern of snowfall in Fairbanks is changing.

The Alaska Dog Musher’s Association holds a series of weekly sprint races starting at the beginning of December. For the past several years, this one included, there hasn’t been enough snow to hold the earliest of the races, because it takes a certain depth of snowpack for a snow hook to hold a team back should the driver need to stop. I’m curious to know whether scheduling a bunch of races in December and early January is wishful thinking, or whether we used to get more snow earlier in the season than we do now. In other words, has the pattern of snowfall in Fairbanks changed?

One way to get at this is to look at the earliest date in the “winter year” (which I’m defining as starting on September 1st, since we do sometimes get significant snowfall in September) by which a cumulative 12 inches of snow has fallen. Here’s what that relationship looks like:
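The “first foot” date can be computed from a daily snowfall record along the lines of the sketch below. The helper names (winter_year_of, winter_doy_of, first_foot_doy) and the toy data are hypothetical, not from the original analysis. It also assumes each winter is labeled by the calendar year in which it starts and that September 1st counts as day 1; the post doesn’t say how either was handled.

```r
# Hypothetical helpers, not from the original analysis.
# Assumption: a winter year starts September 1st (day 1) and is labeled
# by the calendar year in which it starts.
winter_year_of <- function(d) {
  d <- as.Date(d)
  yr <- as.integer(format(d, "%Y"))
  mon <- as.integer(format(d, "%m"))
  ifelse(mon >= 9, yr, yr - 1L)
}

winter_doy_of <- function(d) {
  d <- as.Date(d)
  start <- as.Date(paste0(winter_year_of(d), "-09-01"))
  as.integer(d - start) + 1L
}

# Day of the winter year on which cumulative snowfall first reaches
# `threshold` inches, or NA if it never does. Inputs cover a single winter.
first_foot_doy <- function(dates, snow_in, threshold = 12) {
  o <- order(dates)
  hit <- which(cumsum(snow_in[o]) >= threshold)[1]
  if (is.na(hit)) NA_integer_ else winter_doy_of(dates[o][hit])
}

# Toy daily record (made-up numbers, for illustration only)
daily <- data.frame(
  date = as.Date(c("2011-10-02", "2011-10-20", "2011-11-11",
                   "2012-10-05", "2012-10-21", "2012-11-11")),
  snow = c(3.0, 6.5, 4.0, 2.0, 3.0, 1.5))
daily$winter_year <- winter_year_of(daily$date)

# One row per winter, with the column names used in the lm() call below
first_foot <- do.call(rbind, lapply(split(daily, daily$winter_year), function(w)
  data.frame(winter_year = w$winter_year[1],
             winter_doy = first_foot_doy(w$date, w$snow))))
print(first_foot)
```

In the toy data the 2011 winter reaches a foot on November 11th (day 72), while the 2012 winter never does and gets NA. Winters like that would have to be dropped or handled separately before fitting a regression.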

And the results from a linear regression:

Call:
lm(formula = winter_doy ~ winter_year, data = first_foot)

Residuals:
    Min      1Q  Median      3Q     Max
-60.676 -25.149  -0.596  20.984  77.152

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -498.5005   462.7571  -1.077    0.286
winter_year    0.3067     0.2336   1.313    0.194

Residual standard error: 33.81 on 60 degrees of freedom
Multiple R-squared: 0.02793,    Adjusted R-squared: 0.01173
F-statistic: 1.724 on 1 and 60 DF,  p-value: 0.1942
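As a worked check on the winter_year row, using only the numbers printed above plus the standard table value t ≈ 2.00 for 60 degrees of freedom: the t value is the estimate divided by its standard error, and an approximate 95% confidence interval for the trend is the estimate plus or minus two standard errors:

$$
t = \frac{0.3067}{0.2336} \approx 1.31, \qquad
0.3067 \pm 2.00 \times 0.2336 \approx (-0.16,\ 0.77)\ \text{days per year}
$$

The interval includes zero, so no change at all is consistent with the data, which is what the p-value of 0.194 says. Taken at face value, the point estimate works out to about 0.3067 × 60 ≈ 18 days later over 60 winters.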

According to these results, the date of the first foot of snow is getting later by about 0.3 days per year, but the trend isn’t statistically significant (p = 0.19), so we can’t say with any authority that the pattern we see isn’t just random. Worse, this analysis could be confounded by what appears to be a decline in total yearly snowfall in Fairbanks:

This relationship (less snow each year) is even less statistically significant. If we combine the two variables in a single model, however, some of the terms do reach significance:

Call:
lm(formula = winter_year ~ winter_doy * snow, data = yearly_data)

Residuals:
   Min     1Q Median     3Q    Max
-35.15 -11.78   0.49  14.15  32.13

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.947e+03  2.082e+01  93.520   <2e-16 ***
winter_doy       4.297e-01  1.869e-01   2.299   0.0251 *
snow             5.248e-01  2.877e-01   1.824   0.0733 .
winter_doy:snow -7.022e-03  3.184e-03  -2.206   0.0314 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.95 on 58 degrees of freedom
Multiple R-squared: 0.1078,     Adjusted R-squared: 0.06163
F-statistic: 2.336 on 3 and 58 DF,  p-value: 0.08317

Here we’re “predicting” winter year from the yearly snowfall, the first date by which a foot of snow had fallen, and the interaction between the two. Although two of the terms are significant at the 0.05 level and the model as a whole is close to significance (p = 0.083), it doesn’t do a very good job of explaining the data: with an R² of 0.108, almost 90% of the variation is unexplained.
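Writing out the fitted model makes the interaction term easier to read. Let d be the first-foot day of the winter year and s the total yearly snowfall. I’m assuming s is in inches, which the output doesn’t state. The coefficients above then give the fitted equation below. Its derivative with respect to d shows that the effect of d depends on s:

$$
\widehat{\text{winter\_year}} = 1947 + 0.4297\,d + 0.5248\,s - 0.007022\,d\,s
$$

$$
\frac{\partial\,\widehat{\text{winter\_year}}}{\partial d} = 0.4297 - 0.007022\,s = 0
\quad\Longrightarrow\quad
s = \frac{0.4297}{0.007022} \approx 61.2
$$

So in this model, a later first foot of snow goes with later winter years only when that winter’s total snowfall is below about 61 (inches, under the assumption above). Above that total, the association reverses.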

One problem with boiling the data down to one or two values for each year is that we’re reducing the amount of data being analyzed, which lowers our power to detect a significant relationship between the pattern of snowfall and year. Here’s what the overall pattern for all years looks like:

And the individual plots for each year in the record:

Because the snowiest “winter month” is a categorical outcome rather than a continuous variable, we can’t use ordinary linear regression to evaluate the relationship between year and monthly snowfall. Instead we’ll use multinomial logistic regression to investigate the relationship between which month is the snowiest, and year:

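library(nnet)
# multinom() treats the lowest winter_month level (2, October) as the baseline,
# so each row of coefficients compares one month with October
model <- multinom(winter_month ~ winter_year, data = snowiest_month)
summary(model)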

Call:
multinom(formula = winter_month ~ winter_year, data = snowiest_month)

Coefficients:
  (Intercept)  winter_year
3    30.66572 -0.015149192
4    62.88013 -0.031771508
5    38.97096 -0.019623059
6    13.66039 -0.006941225
7   -68.88398  0.034023510
8   -79.64274  0.039217108

Std. Errors:
   (Intercept)  winter_year
3 9.992962e-08 0.0001979617
4 1.158940e-07 0.0002289479
5 1.120780e-07 0.0002218092
6 1.170249e-07 0.0002320081
7 1.668613e-07 0.0003326432
8 1.955969e-07 0.0003901701

Residual Deviance: 221.5413
AIC: 245.5413

I’m not exactly sure how to interpret the results, but typically you look at whether the intercepts and coefficients are significantly different from zero. Each row compares one month with the baseline month, October (2); a negative winter_year coefficient means that month is becoming less likely, relative to October, to be the snowiest. The coefficients are many times larger than their standard errors, which suggests they are statistically significant.
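summary() on a multinom fit doesn’t print p-values. One way to check the coefficients is a Wald test: divide each estimate by its standard error and compare the result with a standard normal. The sketch below copies the values printed above, so it runs on its own, but only to the printed precision. It also adds a sanity check that is not part of the original analysis. With an uncentered predictor near 2000, an intercept’s standard error should be roughly the mean year times the slope’s standard error, as in the first lm() output (462.7571 / 0.2336 ≈ 1981). The printed multinom standard errors don’t follow that pattern, which hints that the Hessian behind them may be numerically unreliable.

```r
# Estimates and standard errors copied from summary(model) above.
# Rows: winter months 3 (Nov) through 8 (Apr), each relative to month 2 (Oct).
months <- c("nov", "dec", "jan", "feb", "mar", "apr")
est <- cbind(
  intercept = c(30.66572, 62.88013, 38.97096, 13.66039, -68.88398, -79.64274),
  slope     = c(-0.015149192, -0.031771508, -0.019623059,
                -0.006941225, 0.034023510, 0.039217108))
se <- cbind(
  intercept = c(9.992962e-08, 1.158940e-07, 1.120780e-07,
                1.170249e-07, 1.668613e-07, 1.955969e-07),
  slope     = c(0.0001979617, 0.0002289479, 0.0002218092,
                0.0002320081, 0.0003326432, 0.0003901701))
rownames(est) <- months
rownames(se) <- months

# Wald test: z = estimate / SE, two-sided p-value from the standard normal
z <- est / se
p <- 2 * pnorm(-abs(z))
print(signif(z, 3))

# Sanity check: with years near 2000, SE(intercept) should be on the order of
# mean(year) * SE(slope), as in the lm() fit (462.7571 / 0.2336, about 1981).
se_ratio <- se[, "intercept"] / se[, "slope"]
print(signif(se_ratio, 3))
```

If the same check fails on the real fit, one option is to refit with a centered predictor, for example multinom(winter_month ~ I(winter_year - 1980), data = snowiest_month, Hess = TRUE). Rerunning summary() on that fit is likely to give standard errors that are easier to trust. The centering year of 1980 is an arbitrary choice.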

To see what the coefficients imply, we’ll calculate the probability that each month will wind up as the snowiest month, and plot the resulting curves by year.

library(reshape2)  # melt()
library(ggplot2)

# Predicted probability that each month is the snowiest, for every winter year
fit_snowiest <- data.frame(winter_year = 1949:2012)
probs <- cbind(fit_snowiest, predict(model, newdata = fit_snowiest, type = "probs"))
probs.melted <- melt(probs, id.vars = 'winter_year')
names(probs.melted) <- c('winter_year', 'winter_month', 'probability')
# Relabel the numeric winter months (2 = October ... 8 = April) with month names
probs.melted$month <- factor(probs.melted$winter_month)
levels(probs.melted$month) <-
  list('oct' = 2, 'nov' = 3, 'dec' = 4, 'jan' = 5, 'feb' = 6, 'mar' = 7, 'apr' = 8)
q <- ggplot(data = probs.melted, aes(x = winter_year, y = probability, colour = month))
# R continues a line that ends with an operator, so each '+' goes at the end
q + theme_bw() + geom_line(linewidth = 1) +  # use size = 1 with ggplot2 < 3.4.0
  scale_y_continuous(name = "Model probability") +
  scale_x_continuous(name = 'Winter year', breaks = seq(1945, 2015, 5)) +
  ggtitle('Snowiest month probabilities by year from logistic regression model,\nFairbanks Airport station') +
  scale_colour_manual(values =
    c("violet", "blue", "cyan", "green", "#FFCC00", "orange", "red"))

The result:

Here’s how to interpret this graph. Each line shows the model’s probability that a given month will be the snowiest month of the winter. November has the highest probability in every year, so it is always the most likely snowiest month. For any year, the order of the lines ranks the months by how likely each one is to be the snowiest (in 1950, November, December and January were the most likely, in that order). Months whose lines slope downward (November, December, January) are becoming less likely to be the snowiest month.
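To connect the coefficients to these curves, here is a sketch of what predict(model, type = "probs") computes for a multinomial logit. Each month’s log-odds relative to October is intercept + slope × year, and October’s log-odds is fixed at zero. The probabilities are then the exponentials of those log-odds, normalized to sum to one (a softmax). The sketch reuses the coefficients printed above, so it only matches the real model to that printed precision. snowiest_probs() is a hypothetical helper, not part of nnet.

```r
# Coefficients from summary(model) above: log-odds of each month vs. October
coefs <- matrix(c(
   30.66572, -0.015149192,
   62.88013, -0.031771508,
   38.97096, -0.019623059,
   13.66039, -0.006941225,
  -68.88398,  0.034023510,
  -79.64274,  0.039217108),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("nov", "dec", "jan", "feb", "mar", "apr"),
                  c("intercept", "slope")))

# Hypothetical helper: probability that each month is the snowiest in `year`
snowiest_probs <- function(year) {
  eta <- coefs[, "intercept"] + coefs[, "slope"] * year  # log-odds vs. October
  w <- c(oct = 1, exp(eta))                              # October: exp(0) = 1
  w / sum(w)                                             # softmax
}

round(rbind(`1950` = snowiest_probs(1950), `2012` = snowiest_probs(2012)), 3)
```

Comparing the two rows shows the same pattern as the graph. November leads in both years but with a smaller share in 2012, and March and April pick up probability.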

November is the most likely snowiest month in every year, but its probability is declining, as are those for December and January. The probabilities for October, February, March and April are increasing. From these results, it appears that we’re getting more snow at the very beginning (October) and at the end of the winter, and less in the middle of the winter.

tags: Fairbanks R statistics weather