# Introduction

It’s November 10th in Fairbanks and we have only an inch of snow on the ground. The average depth on this date is 6.1 inches, but that a little deceptive because snow depth doesn’t follow a normal distribution; it can never be below zero, and has a long tail toward deeper snow depths. In the 92 years of snow depth data for the Fairbanks Airport, we’ve had less than an inch of snow only six times (6.5%). At the other end of the distribution, there have been seven years with more than 14 inches of snow on November 10th.

My question is: what does snow depth on November 10th tell us about how much snow we are going to get later on in the winter? Is there a relationship between depth on November 10th and depths later in the winter, and if there is, how much snow can we expect this winter?

# Data

We’ll use the 92-year record of snow depth data from the Fairbanks International Airport station that’s in the Global Historical Climate Network.

The correlation coefficients (a 1 means a perfect correlation, and a 0 is no correlation) between snow depth on November 10th, and the first of the months of December, January and February of that same winter are shown below:

nov_10 | dec_01 | jan_01 | feb_01 | |
---|---|---|---|---|

nov_10 | 1.00 | 0.65 | 0.49 | 0.46 |

dec_01 | 0.65 | 1.00 | 0.60 | 0.39 |

jan_01 | 0.49 | 0.60 | 1.00 | 0.74 |

feb_01 | 0.46 | 0.39 | 0.74 | 1.00 |

Looking down the `nov_10` column, you can see a high correlation
between snow depth on November 10th and depth on December 1st, but lower
(and similar) correlations with depths in January and February.

This makes sense. In Fairbanks, snow that falls after the second week in October is likely to be around for the rest of the winter, so all the snow on the ground on November 10th, will still be there in December, and throughout the winter.

But what can a snow depth of one inch on November 10th tell us about how much snow we will have in December or later on?

Here’s the data for those six years with a snow depth of 1 inch on November 10th:

wyear | dec_01 | jan_01 | feb_01 |
---|---|---|---|

1938 | 5 | 11 | 24 |

1940 | 6 | 8 | 9 |

1951 | 12 | 22 | 31 |

1953 | 1 | 5 | 17 |

1954 | 9 | 15 | 12 |

1979 | 3 | 8 | 14 |

Not exactly encouraging data for our current situation, although 1951 gives us some hope of a good winter.

# Methods

We used Bayesian linear regression to predict snow depth on December
1st, January 1st and February 1st, based on our snow depth data and the
current snow depth in Fairbanks. We used the `rstanarm` R package,
which mimics the `glm` function that’s part of base R.

Because of the non-zero, skewed nature of the distribution of snow depths, a
log-linked Gamma distribution is appropriate. We used the `rstanarm` defaults
for priors.

One of the great things about Bayesian linear regression is that it incorporates our uncertainty about the model coefficients to produce a distribution of predicted values. The more uncertainty there is in our model, the wider the range of predicted values. We examine the distribution of these predicted snow depth values and compare them with the distribution of actual values.

The code for the analysis appears at the bottom of the post.

# Results

The following figure shows a histogram and density function plot for the predicted snow depth on December 1st (top pane) for this year, and the actual December 1st snow depth data in past years (bottom).

The predicted snow depth ranges from zero to almost 27 inches of snow, but the distribution is concentrated around 5 inches. The lower plot showing the distribution of actual snow depth on December 1st isn’t as smooth, but it has a similar shape and peaks at 9 inches.

If we run the same analysis for January and February, we get a set of frequency distributions that look like the following plot, again with the predicted snow depth distribution on top and the distribution of actual data on the bottom.

The December densities are repeated here, in red, along with the January (green) and February (blue) results. In the top plot, you can clearly see that the shape of the distribution gets more spread out as we get farther from November, indicating our increasing uncertainty in our predictions, although some of that pattern is also from the source data (below), which also gets more spread out in January and February.

Despite our increasing uncertainty, it’s clear from comparing the peaks in these curves that our models expect there to be less snow in December, January and February this year, compared with historical values. By my reckoning, we can expect around 5 inches on December 1st, 10 inches on January 1st, and 12 or 13 inches by February. In an average year, these values would be closer to 9, 12, and 15 inches.

# Conclusion

There is a relationship between snow depth on November 10th and depths later in the winter, but the distributions of predicted values are so spread out that we could easily receive as much or more snow as we have in previous years. Last year on this date we had 5 inches, on December 1st we had 11 inches, 13 inches on New Year’s Day, and 20 inches on February 1st. Here’s hoping we quickly reach, and surpass those values in 2018/2019.

# Appendix

```
library(tidyverse)
library(glue)
library(ggpubr)
library(scales)
library(lubridate)
library(RPostgres)
library(rstanarm)
noaa <- dbConnect(Postgres(),
dbname = "noaa")
ghcnd_stations <- noaa %>%
tbl("ghcnd_stations") %>%
filter(station_name == "FAIRBANKS INTL AP")
ghcnd_variables <- noaa %>%
tbl("ghcnd_variables") %>%
filter(variable == "SNWD")
ghcnd_obs <- noaa %>%
tbl("ghcnd_obs") %>%
inner_join(ghcnd_stations, by = "station_id") %>%
inner_join(ghcnd_variables, by = "variable") %>%
mutate(month = date_part("month", dte),
day = date_part("day", dte)) %>%
filter((month == 11 & day == 10) |
(month == 12 & day == 1) |
(month == 1 & day == 1) |
(month == 2 & day == 1),
is.na(meas_flag) | meas_flag == "") %>%
mutate(value = raw_value * raw_multiplier) %>%
select(dte, month, day, variable, value) %>%
collect()
snow_depths <- ghcnd_obs %>%
mutate(wyear = year(dte - days(91)),
mmdd = factor(glue("{str_to_lower(month.abb[month])}",
"_{sprintf('%02d', day)}"),
levels = c("nov_10", "dec_01",
"jan_01", "feb_01")),
value = value / 25.4) %>%
select(wyear, mmdd, value) %>%
spread(mmdd, value) %>%
filter(!is.na(nov_10))
write_csv(snow_depths, "snow_depths.csv", na = "")
dec <- stan_glm(dec_01 ~ nov_10,
data = snow_depths,
family = Gamma(link = "log"),
# prior = normal(0.7, 3),
# prior_intercept = normal(1, 3),
iter = 5000)
# What does the model day about 2018?
dec_prediction_mat <- posterior_predict(dec,
newdata = tibble(nov_10 = 1))
dec_prediction <- tibble(pred_dec_01 = dec_prediction_mat[,1])
dec_hist <- ggplot(data = dec_prediction,
aes(x = pred_dec_01, y = ..density..)) +
theme_bw() +
geom_histogram(binwidth = 0.25, color = 'black',
fill = 'darkorange') +
geom_density() +
scale_x_continuous(name = "Snow depth (inches)",
limits = c(0, 40),
breaks = seq(0, 40, 5)) +
scale_y_continuous(name = "Frequency") +
theme(plot.margin = unit(c(1, 1, 0, 0.5), 'lines')) +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_blank()) +
labs(title = "December Snow Depth",
subtitle = "Fairbanks Airport Station")
actual_december <- ggplot(data = snow_depths,
aes(x = dec_01, y = ..density..)) +
theme_bw() +
geom_histogram(binwidth = 1, color = 'black',
fill = 'darkorange') +
geom_density() +
scale_x_continuous(name = "Snow depth (inches)",
limits = c(0, 40),
breaks = seq(0, 40, 5)) +
scale_y_continuous(name = "Frequency") +
theme(plot.margin = unit(c(0, 1, 0.5, 0.5), 'lines'))
height <- 9
width <- 16
rescale <- 0.75
heights <- c(0.5, 0.5) * height
gt <- ggarrange(dec_hist, actual_december,
ncol = 1, nrow = 2, align = "v",
widths = c(1, 1), heights = heights)
svg('december_comparison.svg',
width=width*rescale, height=height*rescale)
gt
dev.off()
jan <- stan_glm(jan_01 ~ nov_10,
data = snow_depths,
# family = gaussian(link = "identity"),
family = Gamma(link = "log"),
# prior = normal(0.7, 3),
# prior_intercept = normal(1, 3),
iter = 5000)
jan_prediction_mat <- posterior_predict(jan,
newdata = tibble(nov_10 = 1))
jan_prediction <- tibble(pred_jan_01 = jan_prediction_mat[,1])
feb <- stan_glm(feb_01 ~ nov_10,
data = snow_depths,
# family = gaussian(link = "identity"),
family = Gamma(link = "log"),
# family = poisson(link = "identity"),
# prior = normal(0.7, 3),
# prior_intercept = normal(1, 3),
iter = 5000)
feb_prediction_mat <- posterior_predict(feb,
newdata = tibble(nov_10 = 1))
feb_prediction <- tibble(pred_feb_01 = feb_prediction_mat[,1])
all_predictions <- bind_cols(dec_prediction,
jan_prediction,
feb_prediction) %>%
rename(`Dec 1` = pred_dec_01,
`Jan 1` = pred_jan_01,
`Feb 1` = pred_feb_01) %>%
gather(prediction, snow_depth_inches) %>%
mutate(prediction = factor(prediction,
levels = c("Dec 1", "Jan 1", "Feb 1")))
pred_density_plot <- ggplot(data = all_predictions,
aes(x = snow_depth_inches, colour = prediction)) +
theme_bw() +
geom_density() +
scale_x_continuous(name = "Snow depth (inches)",
limits = c(0, 55),
breaks = pretty_breaks(n = 10)) +
theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
axis.ticks.x = element_blank()) +
labs(title = "Predicted and actual snow depths based on November 10 depth",
subtitle = "Fairbanks Airport Station")
actual_data <- snow_depths %>%
transmute(`Dec 1` = dec_01,
`Jan 1` = jan_01,
`Feb 1` = feb_01) %>%
gather(actual, snow_depth_inches) %>%
mutate(actual = factor(actual, levels = c("Dec 1", "Jan 1", "Feb 1")))
actual_density_plot <- ggplot(data = actual_data,
aes(x = snow_depth_inches, colour = actual)) +
theme_bw() +
geom_density() +
scale_x_continuous(name = "Snow depth (inches)",
limits = c(0, 55),
breaks = pretty_breaks(n = 10)) +
theme(plot.margin = unit(c(0, 1, 0.5, 0.5), 'lines'))
height <- 9
width <- 16
rescale <- 0.75
heights <- c(0.5, 0.5) * height
gt <- ggarrange(pred_density_plot, actual_density_plot,
ncol = 1, nrow = 2, align = "v",
widths = c(1, 1), heights = heights)
svg('dec_jan_feb.svg',
width=width*rescale, height=height*rescale)
gt
dev.off()
```