This morning’s weather forecast:

SUNNY. HIGHS IN THE UPPER 70S TO LOWER 80S. LIGHT WINDS.

May 13th seems very early in the year to hit 80 degrees in Fairbanks, so I
decided to check it out. What I’m doing here is selecting all the dates where
the temperature is above 80°F, then ranking those dates by year and date, and
extracting the “winner” for each year (where `rank` is 1).

```
WITH warm AS (
SELECT extract(year from dte) AS year, dte,
c_to_f(tmax_c) AS tmax_f
FROM ghcnd_pivot
WHERE station_name = 'FAIRBANKS INTL AP'
AND c_to_f(tmax_c) >= 80.0),
ranked AS (
SELECT year, dte, tmax_f,
row_number() OVER (PARTITION BY year
ORDER BY dte) AS rank
FROM warm)
SELECT dte,
extract(doy from dte) AS doy,
round(tmax_f, 1) as tmax_f
FROM ranked
WHERE rank = 1
ORDER BY doy;
```

And the results:

Date | Day of year | High temperature (°F) |
---|---|---|

1995-05-09 | 129 | 80.1 |

1975-05-11 | 131 | 80.1 |

1942-05-12 | 132 | 81.0 |

1915-05-14 | 134 | 80.1 |

1993-05-16 | 136 | 82.0 |

2002-05-20 | 140 | 80.1 |

2015-05-22 | 142 | 80.1 |

1963-05-22 | 142 | 84.0 |

1960-05-23 | 144 | 80.1 |

2009-05-24 | 144 | 80.1 |

… | … | … |

If we hit 80°F today, it’ll be the fourth earliest day of year to hit that temperature since records started being kept in 1904.

Update: We didn’t reach 80°F on the 13th, but got to 82°F on May 14th, tied with that date in 1915 for the fourth earliest 80 degree temperature.

In a post last week I examined how often Fairbanks gets more than two inches of snow in late spring. We only got 1.1 inches on April 24th, so that event didn’t qualify, but another snowstorm hit Fairbanks this week. Enough that I skied to work a couple days ago (April 30th) and could have skied this morning too.

Another, probably more relevant statistic would be to look at storm totals rather than the amount of snow that fell within a single, somewhat arbitrary 24-hour period (midnight to midnight for the Fairbanks Airport station, 6 AM to 6 AM for my COOP station). With SQL window functions we can examine the totals over a moving window, in this case five days and see what the largest late season snowfall totals were in the historical record.

Here’s a list of the late spring (after April 21st) snowfall totals for Fairbanks where the five day snowfall was greater than three inches:

Storm start | Five day snowfall (inches) |
---|---|

1916-05-03 | 3.6 |

1918-04-26 | 5.1 |

1918-05-15 | 2.5 |

1923-05-03 | 3.0 |

1937-04-24 | 3.6 |

1941-04-22 | 8.1 |

1948-04-26 | 4.0 |

1952-05-05 | 3.0 |

1964-05-13 | 4.7 |

1982-04-30 | 3.1 |

1992-05-12 | 12.3 |

2001-05-04 | 6.4 |

2002-04-25 | 6.7 |

2008-04-30 | 4.3 |

2013-04-29 | 3.6 |

Anyone who was here in 1992 remembers that “summer,” with more than a foot of snow in mid May, and two feet of snow in a pair of storms starting on September 11th, 1992. I don’t expect that all the late spring cold weather and snow we’re experiencing this year will necessarily translate into a short summer like 1992, but we should keep the possibility in mind.

A couple days ago I got an email from a Galoot who was hoping to come north to see the aurora and wondered if March was a good time to come to Fairbanks. I know that March and September are two of my favorite months, but wanted to check to see if my perception of how sunny it is in March was because it really is sunny in March or if it’s because March is the month when winter begins to turn to spring in Fairbanks and it just seems brighter and sunnier, with longer days and white snow on the ground.

I found three sources of data for “cloudiness.” I’ve been parsing the Fairbanks Airport daily climate summary since 2002, and it has a value in it called Average Sky Cover which ranges from 0.0 (completely clear) to 1.0 (completely cloudy). I’ll call this data “pafa.”

The second source is the Global Historical Climatology - Daily for the Fairbanks Airport station. There’s a variable in there named ACMH, which is described as Cloudiness, midnight to midnight (percentage). For the Airport station, this value appears in the database from 1965 through 1997. One reassuring thing about this parameter is that it specifically says it’s from midnight to midnight, so it would include cloudiness when it was dark outside (and the aurora would be visible if it was present). This data set is named “ghcnd.”

The final source is modelled data from the North American Regional Reanalysis. This data set includes TCDC, or total cloud cover (percentage), and is available in three-hour increments over a grid covering North America. I chose the nearest grid point to the Fairbanks Airport and retrieved the daily mean of total cloud cover for the period of the database I have downloaded (1979—2012). In the plots that follow, this is named “narr.”

After reading the data and merging the three data sets together, I generate monthly means of cloud cover (scaled to percentages from 0 to 100) in each of the data sets, in R:

```
library(plyr)
cloud_cover <- merge(pafa, ghcnd, by = 'date', all = TRUE)
cloud_cover <- merge(cloud_cover, narr, by = 'date', all = TRUE)
cloud_cover$month <- month(cloud_cover$date)
by_month_mean <- ddply(
subset(cloud_cover,
select = c('month', 'pafa', 'ghcnd', 'narr')),
.(month),
summarise,
pafa = mean(pafa, na.rm = TRUE),
ghcnd = mean(ghcnd, na.rm = TRUE),
narr = mean(narr, na.rm = TRUE))
by_month_mean$mon <- factor(by_month_mean$month,
labels = c('jan', 'feb', 'mar',
'apr', 'may', 'jun',
'jul', 'aug', 'sep',
'oct', 'nov', 'dec'))
```

In order to plot it, I generate text labels for the year range of each data set and melt the data so it can be faceted:

```
library(lubridate)
library(reshape2)
text_labels <- rbind(
data.frame(variable = 'pafa',
str = paste(min(year(pafa$date)), '-', max(year(pafa$date)))),
data.frame(variable = 'ghcnd',
str = paste(min(year(ghcnd$date)), '-', max(year(ghcnd$date)))),
data.frame(variable = 'narr',
str = paste(min(year(narr$date)), '-', max(year(narr$date)))))
mean_melted <- melt(by_month_mean,
id.vars = 'mon',
measure.vars = c('pafa', 'ghcnd', 'narr'))
```

Finally, the plotting:

```
library(ggplot2)
q <- ggplot(data = mean_melted, aes(x = mon, y = value))
q +
theme_bw() +
geom_bar(stat = 'identity', colour = "darkred", fill = "darkorange") +
facet_wrap(~ variable, ncol = 1) +
scale_x_discrete(name = "Month") +
scale_y_continuous(name = "Mean cloud cover") +
ggtitle('Cloud cover data for Fairbanks Airport Station') +
geom_text(data = text_labels, aes(x = 'feb', y = 70, label = str), size = 4) +
geom_text(aes(label = round(value, digits = 1)), vjust = 1.5, size = 3)
```

The good news for the guy coming to see the northern lights is that March is indeed the least cloudy month in Fairbanks, and all three data sources show similar patterns, although the NARR dataset has September and October as the cloudiest months, and anyone who has lived in Fairbanks knows that August is the rainiest (and probably cloudiest) month. PAFA and GHCND have a late summer pattern that seems more like what I recall.

Another way to slice the data is to get the average number of days in a month with less than 20% cloud cover; a measure of the clearest days. This is a pretty easy calculation:

```
by_month_less_than_20 <- ddply(
subset(cloud_cover,
select = c('month', 'pafa', 'ghcnd', 'narr')),
.(month),
summarise,
pafa = sum(pafa < 20, na.rm = TRUE) / sum(!is.na(pafa)) * 100,
ghcnd = sum(ghcnd < 20, na.rm = TRUE) / sum(!is.na(ghcnd)) * 100,
narr = sum(narr < 20, na.rm = TRUE) / sum(!is.na(narr)) * 100);
```

And the results:

We see the same pattern as in the mean cloudiness plot. March is the month with the greatest number of days with less that 20% cloud cover. Depending on the data set, between 17 and 24 percent of March days are quite clear. In contrast, the summer months rarely see days with no cloud cover. In June and July, the days are long and convection often builds large clouds in the late afternoon, and by August, the rain has started. Just like in the previous plot, NARR has September as the month with the fewest clear days, which doesn’t match my experience.

It’s now December 1st and the last time we got new snow was on November 11th. In my last post I looked at the lengths of snow-free periods in the available weather data for Fairbanks, now at 20 days. That’s a long time, but what I’m interested in looking at today is whether the monthly pattern of snowfall in Fairbanks is changing.

The Alaska Dog Musher’s Association holds a series of weekly sprint races starting at the beginning of December. For the past several years—and this year—there hasn’t been enough snow to hold the earliest of the races because it takes a certain depth of snowpack to allow a snow hook to hold a team back should the driver need to stop. I’m curious to know if scheduling a bunch of races in December and early January is wishful thinking, or if we used to get a lot of snow earlier in the season than we do now. In other words, has the pattern of snowfall in Fairbanks changed?

One way to get at this is to look at the earliest data in the “winter year” (which I’m defining as starting on September 1st, since we do sometimes get significant snowfall in September) when 12 inches of snow has fallen. Here’s what that relationship looks like:

And the results from a linear regression:

```
Call:
lm(formula = winter_doy ~ winter_year, data = first_foot)
Residuals:
Min 1Q Median 3Q Max
-60.676 -25.149 -0.596 20.984 77.152
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -498.5005 462.7571 -1.077 0.286
winter_year 0.3067 0.2336 1.313 0.194
Residual standard error: 33.81 on 60 degrees of freedom
Multiple R-squared: 0.02793, Adjusted R-squared: 0.01173
F-statistic: 1.724 on 1 and 60 DF, p-value: 0.1942
```

According to these results the date of the first foot of snow is getting later
in the year, but it’s not significant, so we can’t say with any authority that
the pattern we see isn’t just random. Worse, this analysis could be confounded
by what appears to be a decline in the total *yearly* snowfall in Fairbanks:

This relationship (less snow every year) has even less statistical significance. If we combine the two analyses, however, there is a significant relationship:

```
Call:
lm(formula = winter_year ~ winter_doy * snow, data = yearly_data)
Residuals:
Min 1Q Median 3Q Max
-35.15 -11.78 0.49 14.15 32.13
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.947e+03 2.082e+01 93.520 <2e-16 ***
winter_doy 4.297e-01 1.869e-01 2.299 0.0251 *
snow 5.248e-01 2.877e-01 1.824 0.0733 .
winter_doy:snow -7.022e-03 3.184e-03 -2.206 0.0314 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 17.95 on 58 degrees of freedom
Multiple R-squared: 0.1078, Adjusted R-squared: 0.06163
F-statistic: 2.336 on 3 and 58 DF, p-value: 0.08317
```

Here we’re “predicting” winter year based on the yearly snowfall, the first date where a foot of snow had fallen, and the interaction between the two. Despite the near-significance of the model and the parameters, it doesn’t do a very good job of explaining the data (almost 90% of the variation is unexplained by this model).

One problem with boiling the data down into a single (or two) values for each year is that we’re reducing the amount of data being analyzed, lowering our power to detect a significant relationship between the pattern of snowfall and year. Here’s what the overall pattern for all years looks like:

And the individual plots for each year in the record:

Because “winter month” isn’t a continuous variable, we can’t use normal linear regression to evaluate the relationship between year and monthly snowfall. Instead we’ll use multinominal logistic regression to investigate the relationship between which month is the snowiest, and year:

```
library(nnet)
model <- multinom(data = snowiest_month, winter_month ~ winter_year)
summary(model)
Call:
multinom(formula = winter_month ~ winter_year, data = snowiest_month)
Coefficients:
(Intercept) winter_year
3 30.66572 -0.015149192
4 62.88013 -0.031771508
5 38.97096 -0.019623059
6 13.66039 -0.006941225
7 -68.88398 0.034023510
8 -79.64274 0.039217108
Std. Errors:
(Intercept) winter_year
3 9.992962e-08 0.0001979617
4 1.158940e-07 0.0002289479
5 1.120780e-07 0.0002218092
6 1.170249e-07 0.0002320081
7 1.668613e-07 0.0003326432
8 1.955969e-07 0.0003901701
Residual Deviance: 221.5413
AIC: 245.5413
```

I’m not exactly sure how to interpret the results, but typically you’re looking to see if the intercepts and coefficients are significantly different from zero. If you look at the difference in magnitude between the coefficients and the standard errors, it appears they are significantly different from zero, which would imply they are statistically significant.

In order to examine what they have to say, we’ll calculate the probability curves for whether each month will wind up as the snowiest month, and plot the results by year.

```
fit_snowiest <- data.frame(winter_year = 1949:2012)
probs <- cbind(fit_snowiest, predict(model, newdata = fit_snowiest, "probs"))
probs.melted <- melt(probs, id.vars = 'winter_year')
names(probs.melted) <- c('winter_year', 'winter_month', 'probability')
probs.melted$month <- factor(probs.melted$winter_month)
levels(probs.melted$month) <- \
list('oct' = 2, 'nov' = 3, 'dec' = 4, 'jan' = 5, 'feb' = 6, 'mar' = 7, 'apr' = 8)
q <- ggplot(data = probs.melted, aes(x = winter_year, y = probability, colour = month))
q + theme_bw() + geom_line(size = 1) + scale_y_continuous(name = "Model probability") \
+ scale_x_continuous(name = 'Winter year', breaks = seq(1945, 2015, 5)) \
+ ggtitle('Snowiest month probabilities by year from logistic regression model,\n
Fairbanks Airport station') \
+ scale_colour_manual(values = \
c("violet", "blue", "cyan", "green", "#FFCC00", "orange", "red"))
```

The result:

Here’s how you interpret this graph. Each line shows how likely it is that a month will be the snowiest month (November is always the snowiest month because it always has the highest probabilities). The order of the lines for any year indicates the monthly order of snowiness (in 1950, November, December and January were predicted to be the snowiest months, in that order), and months with a negative slope are getting less snowy overall (November, December, January).

November is the snowiest month for all years, but it’s declining, as is snow in December and January. October, February, March and April are increasing. From these results, it appears that we’re getting more snow at the very beginning (October) and at the end of the winter, and less in the middle of the winter.

Reading today’s weather discussion from the Weather Service, they state:

LITTLE OR NO SNOWFALL IS EXPECTED IN THE FORECAST AREA FOR THE NEXT WEEK.

The last time it snowed was November 11th, so if that does happen, it will be at least 15 days without snow. That seems unusual for Fairbanks, so I checked it out.

Finding the lengths of consecutive events is something I’ve wondered how to do in SQL for some time, but it’s not all that difficult if you can get a listing of just the dates where the event (snowfall) happens. Once you’ve got that listing, you can use window functions to calculate the intervals between dates (rows), eliminate those that don’t matter, and rank them.

For this exercise, I’m looking for days with more than 0.1 inches of snow where the maximum temperature was below 10°C. And I exclude any interval where the end date is after March. Without this exclusion I’d get a bunch of really long intervals between the last snowfall of the year, and the first from the next year.

Here’s the SQL (somewhat simplified), using the GHCN-Daily database for the Fairbanks airport station:

```
SELECT * FROM (
SELECT dte AS start,
LEAD(dte) OVER (ORDER BY dte) AS end,
LEAD(dte) OVER (ORDER BY dte) - dte AS interv
FROM (
SELECT dte
FROM ghcnd_obs
WHERE station_id = 'USW00026411'
AND tmax < 10.0
AND snow > 0
) AS foo
) AS bar
WHERE extract(month from foo.end) < 4
AND interv > 6
ORDER BY interv DESC;
```

Here’s the top-8 longest periods:

Start | End | Days |
---|---|---|

1952‑12‑01 | 1953‑01‑19 | 49 |

1978‑02‑08 | 1978‑03‑16 | 36 |

1968‑02‑23 | 1968‑03‑28 | 34 |

1969‑11‑30 | 1970‑01‑02 | 33 |

1959‑01‑02 | 1959‑02‑02 | 31 |

1979‑02‑01 | 1979‑03‑03 | 30 |

2011‑02‑26 | 2011‑03‑27 | 29 |

1950‑02‑02 | 1950‑03‑03 | 29 |

Kinda scary that there have been periods where no snow fell for more than a month!

Here’s how many times various snow-free periods longer than a week have come since 1948:

Days | Count |
---|---|

7 | 39 |

10 | 32 |

9 | 30 |

8 | 23 |

12 | 17 |

11 | 17 |

13 | 12 |

18 | 10 |

15 | 8 |

14 | 8 |

We can add one more to the 8-day category as of midnight tonight.