sat, 10-nov-2018, 10:02

Introduction

It’s November 10th in Fairbanks and we have only an inch of snow on the ground. The average depth on this date is 6.1 inches, but that’s a little deceptive because snow depth doesn’t follow a normal distribution; it can never be below zero, and has a long tail toward deeper snow depths. In the 92 years of snow depth data for the Fairbanks Airport, we’ve had an inch or less of snow only six times (6.5%). At the other end of the distribution, there have been seven years with more than 14 inches of snow on November 10th.

My question is: what does snow depth on November 10th tell us about how much snow we are going to get later on in the winter? Is there a relationship between depth on November 10th and depths later in the winter, and if there is, how much snow can we expect this winter?

Data

We’ll use the 92-year record of snow depth data from the Fairbanks International Airport station that’s in the Global Historical Climatology Network.

The correlation coefficients (a 1 means a perfect correlation, and a 0 is no correlation) between snow depth on November 10th, and the first of the months of December, January and February of that same winter are shown below:

        nov_10  dec_01  jan_01  feb_01
nov_10    1.00    0.65    0.49    0.46
dec_01    0.65    1.00    0.60    0.39
jan_01    0.49    0.60    1.00    0.74
feb_01    0.46    0.39    0.74    1.00

Looking down the nov_10 column, you can see a high correlation between snow depth on November 10th and depth on December 1st, but lower (and similar) correlations with depths in January and February.

This makes sense. In Fairbanks, snow that falls after the second week in October is likely to be around for the rest of the winter, so the snow on the ground on November 10th will still be there in December and throughout the winter.
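A correlation matrix like the one above can be produced with base R's cor function. The sketch below uses a small made-up data frame (the values are hypothetical, not the airport record) with the same column names as the snow_depths table built in the appendix. The use = "pairwise.complete.obs" argument is my assumption about how to handle winters with a missing observation.

```r
# Hypothetical depths (inches), one row per winter; NA marks a missing value.
# These numbers are for illustration only, not the Fairbanks data.
depths <- data.frame(
    nov_10 = c(1, 4, 6, 9, 12, 3, 7),
    dec_01 = c(5, 7, 9, 14, 15, NA, 10),
    jan_01 = c(11, 9, 15, 20, 18, 8, 16),
    feb_01 = c(24, 12, 17, 22, 25, 14, NA)
)

# Pearson correlations between every pair of dates, using all winters
# where both dates in a pair were observed
cors <- round(cor(depths, use = "pairwise.complete.obs"), 2)
print(cors)
```

The diagonal is always 1, and the matrix is symmetric, which is why only the first column needs to be read to see how November 10th relates to the later dates.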

But what can a snow depth of one inch on November 10th tell us about how much snow we will have in December or later on?

Here’s the data for those six years with a snow depth of 1 inch on November 10th:

wyear dec_01 jan_01 feb_01
1938 5 11 24
1940 6 8 9
1951 12 22 31
1953 1 5 17
1954 9 15 12
1979 3 8 14

Not exactly encouraging data for our current situation, although 1951 gives us some hope of a good winter.

Methods

We used Bayesian linear regression to predict snow depth on December 1st, January 1st and February 1st, based on our snow depth data and the current snow depth in Fairbanks. We used the stan_glm function from the rstanarm R package, which mimics the glm function that’s part of base R.

Because snow depths can’t be negative and their distribution is skewed, a log-linked Gamma distribution is appropriate. We used the rstanarm defaults for priors.

One of the great things about Bayesian linear regression is that it incorporates our uncertainty about the model coefficients to produce a distribution of predicted values. The more uncertainty there is in our model, the wider the range of predicted values. We examine the distribution of these predicted snow depth values and compare them with the distribution of actual values.
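To make the idea concrete without needing Stan installed, here is a rough base R sketch. It is not the analysis from this post. It fits the same kind of log-linked Gamma model with glm on simulated data, then imitates a predictive distribution by drawing coefficients from a normal approximation around the estimates and adding Gamma noise. The simulated depths, the normal approximation, and all variable names are assumptions for illustration; the appendix uses stan_glm and posterior_predict instead.

```r
set.seed(42)

# Simulated (not real) depths in inches: log(mean dec_01) = 1.5 + 0.05 * nov_10
n <- 92
nov_10 <- runif(n, min = 1, max = 20)
mu <- exp(1.5 + 0.05 * nov_10)
shape <- 4
sim <- data.frame(nov_10 = nov_10,
                  dec_01 = rgamma(n, shape = shape, rate = shape / mu))

# Maximum-likelihood analogue of the Bayesian model
fit <- glm(dec_01 ~ nov_10, data = sim, family = Gamma(link = "log"))

# Stand-in for posterior draws: coefficients sampled from a multivariate
# normal centred on the estimates, with the estimated covariance
n_draws <- 4000
chol_v <- chol(vcov(fit))                 # t(chol_v) %*% chol_v equals vcov(fit)
z <- matrix(rnorm(n_draws * 2), ncol = 2)
beta_draws <- sweep(z %*% chol_v, 2, coef(fit), "+")

# Expected depth when nov_10 = 1, one value per coefficient draw
mu_draws <- exp(beta_draws[, 1] + beta_draws[, 2] * 1)

# Add observation noise using the estimated Gamma shape (1 / dispersion)
shape_hat <- 1 / summary(fit)$dispersion
pred_draws <- rgamma(n_draws, shape = shape_hat, rate = shape_hat / mu_draws)

print(quantile(pred_draws, c(0.05, 0.5, 0.95)))
```

The spread of pred_draws is much wider than the spread of mu_draws because it combines uncertainty in the coefficients with the year-to-year scatter around the mean, and the log link keeps every predicted depth positive.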

The code for the analysis appears at the bottom of the post.

Results

The following figure shows a histogram and density function plot for the predicted snow depth on December 1st for this year (top panel), and the actual December 1st snow depth data in past years (bottom panel).

December Snow Depth

The predicted snow depth ranges from zero to almost 27 inches of snow, but the distribution is concentrated around 5 inches. The lower plot showing the distribution of actual snow depth on December 1st isn’t as smooth, but it has a similar shape and peaks at 9 inches.

If we run the same analysis for January and February, we get a set of frequency distributions that look like the following plot, again with the predicted snow depth distribution on top and the distribution of actual data on the bottom.

Snow Depth Distribution

The December densities are repeated here, in red, along with the January (green) and February (blue) results. In the top plot, you can clearly see that the shape of the distribution gets more spread out as we get farther from November, indicating our increasing uncertainty in our predictions, although some of that pattern is also from the source data (below), which also gets more spread out in January and February.

Despite our increasing uncertainty, it’s clear from comparing the peaks in these curves that our models expect there to be less snow in December, January and February this year, compared with historical values. By my reckoning, we can expect around 5 inches on December 1st, 10 inches on January 1st, and 12 or 13 inches by February. In an average year, these values would be closer to 9, 12, and 15 inches.
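Reading a peak off a density curve by eye can also be done numerically. A minimal sketch, using simulated Gamma draws as a stand-in for the predicted depths (the shape and rate here are arbitrary assumptions, not values from the models):

```r
set.seed(1)

# Simulated stand-in for predicted snow depths; the true mode of a
# Gamma(shape = 3, rate = 0.5) distribution is (3 - 1) / 0.5 = 4
draws <- rgamma(10000, shape = 3, rate = 0.5)

# Kernel density estimate, then the x value where the curve is highest
dens <- density(draws, from = 0)
peak <- dens$x[which.max(dens$y)]
print(peak)
```

With the real analysis, draws would be one of the prediction columns from the appendix, such as pred_dec_01.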

Conclusion

There is a relationship between snow depth on November 10th and depths later in the winter, but the distributions of predicted values are so spread out that we could easily receive as much snow as we have in previous years, or more. Last year we had 5 inches on this date, 11 inches on December 1st, 13 inches on New Year’s Day, and 20 inches on February 1st. Here’s hoping we quickly reach and surpass those values in 2018/2019.

Appendix

library(tidyverse)
library(glue)
library(ggpubr)
library(scales)
library(lubridate)
library(RPostgres)
library(rstanarm)

noaa <- dbConnect(Postgres(),
                  dbname = "noaa")

ghcnd_stations <- noaa %>%
    tbl("ghcnd_stations") %>%
    filter(station_name == "FAIRBANKS INTL AP")

ghcnd_variables <- noaa %>%
    tbl("ghcnd_variables") %>%
    filter(variable == "SNWD")

ghcnd_obs <- noaa %>%
    tbl("ghcnd_obs") %>%
    inner_join(ghcnd_stations, by = "station_id") %>%
    inner_join(ghcnd_variables, by = "variable") %>%
    mutate(month = date_part("month", dte),
           day = date_part("day", dte)) %>%
    filter((month == 11 & day == 10) |
           (month == 12 & day == 1) |
           (month == 1 & day == 1) |
           (month == 2 & day == 1),
           is.na(meas_flag) | meas_flag == "") %>%
    mutate(value = raw_value * raw_multiplier) %>%
    select(dte, month, day, variable, value) %>%
    collect()

snow_depths <- ghcnd_obs %>%
    mutate(wyear = year(dte - days(91)),
           mmdd = factor(glue("{str_to_lower(month.abb[month])}",
                              "_{sprintf('%02d', day)}"),
                         levels = c("nov_10", "dec_01",
                                    "jan_01", "feb_01")),
           value = value / 25.4) %>%
    select(wyear, mmdd, value) %>%
    spread(mmdd, value) %>%
    filter(!is.na(nov_10))

write_csv(snow_depths, "snow_depths.csv", na = "")

dec <- stan_glm(dec_01 ~ nov_10,
                data = snow_depths,
                family = Gamma(link = "log"),
                # prior = normal(0.7, 3),
                # prior_intercept = normal(1, 3),
                iter = 5000)

# What does the model say about 2018?
dec_prediction_mat <- posterior_predict(dec,
                                        newdata = tibble(nov_10 = 1))
dec_prediction <- tibble(pred_dec_01 = dec_prediction_mat[,1])
dec_hist <- ggplot(data = dec_prediction,
                   aes(x = pred_dec_01, y = ..density..)) +
    theme_bw() +
    geom_histogram(binwidth = 0.25, color = 'black',
                   fill = 'darkorange') +
    geom_density() +
    scale_x_continuous(name = "Snow depth (inches)",
                       limits = c(0, 40),
                       breaks = seq(0, 40, 5)) +
    scale_y_continuous(name = "Frequency") +
    theme(plot.margin = unit(c(1, 1, 0, 0.5), 'lines')) +
    theme(axis.text.x = element_blank(),
          axis.title.x = element_blank(),
          axis.ticks.x = element_blank()) +
    labs(title = "December Snow Depth",
         subtitle = "Fairbanks Airport Station")

actual_december <- ggplot(data = snow_depths,
                          aes(x = dec_01, y = ..density..)) +
    theme_bw() +
    geom_histogram(binwidth = 1, color = 'black',
                   fill = 'darkorange') +
    geom_density() +
    scale_x_continuous(name = "Snow depth (inches)",
                       limits = c(0, 40),
                       breaks = seq(0, 40, 5)) +
    scale_y_continuous(name = "Frequency") +
    theme(plot.margin = unit(c(0, 1, 0.5, 0.5), 'lines'))

height <- 9
width <- 16
rescale <- 0.75
heights <- c(0.5, 0.5) * height
gt <- ggarrange(dec_hist, actual_december,
                ncol = 1, nrow = 2, align = "v",
                widths = c(1, 1), heights = heights)
svg('december_comparison.svg',
    width=width*rescale, height=height*rescale)
gt
dev.off()

jan <- stan_glm(jan_01 ~ nov_10,
                data = snow_depths,
                # family = gaussian(link = "identity"),
                family = Gamma(link = "log"),
                # prior = normal(0.7, 3),
                # prior_intercept = normal(1, 3),
                iter = 5000)

jan_prediction_mat <- posterior_predict(jan,
                                        newdata = tibble(nov_10 = 1))
jan_prediction <- tibble(pred_jan_01 = jan_prediction_mat[,1])

feb <- stan_glm(feb_01 ~ nov_10,
                data = snow_depths,
                # family = gaussian(link = "identity"),
                family = Gamma(link = "log"),
                # family = poisson(link = "identity"),
                # prior = normal(0.7, 3),
                # prior_intercept = normal(1, 3),
                iter = 5000)

feb_prediction_mat <- posterior_predict(feb,
                                        newdata = tibble(nov_10 = 1))
feb_prediction <- tibble(pred_feb_01 = feb_prediction_mat[,1])

all_predictions <- bind_cols(dec_prediction,
                             jan_prediction,
                             feb_prediction) %>%
    rename(`Dec 1` = pred_dec_01,
           `Jan 1` = pred_jan_01,
           `Feb 1` = pred_feb_01) %>%
    gather(prediction, snow_depth_inches) %>%
    mutate(prediction = factor(prediction,
                               levels = c("Dec 1", "Jan 1", "Feb 1")))

pred_density_plot <- ggplot(data = all_predictions,
                       aes(x = snow_depth_inches, colour = prediction)) +
    theme_bw() +
    geom_density() +
    scale_x_continuous(name = "Snow depth (inches)",
                       limits = c(0, 55),
                       breaks = pretty_breaks(n = 10)) +
    theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
          axis.ticks.x = element_blank()) +
    labs(title = "Predicted and actual snow depths based on November 10 depth",
         subtitle = "Fairbanks Airport Station")


actual_data <- snow_depths %>%
    transmute(`Dec 1` = dec_01,
              `Jan 1` = jan_01,
              `Feb 1` = feb_01) %>%
    gather(actual, snow_depth_inches) %>%
    mutate(actual = factor(actual, levels = c("Dec 1", "Jan 1", "Feb 1")))

actual_density_plot <- ggplot(data = actual_data,
                       aes(x = snow_depth_inches, colour = actual)) +
    theme_bw() +
    geom_density() +
    scale_x_continuous(name = "Snow depth (inches)",
                       limits = c(0, 55),
                       breaks = pretty_breaks(n = 10)) +
    theme(plot.margin = unit(c(0, 1, 0.5, 0.5), 'lines'))

height <- 9
width <- 16
rescale <- 0.75
heights <- c(0.5, 0.5) * height
gt <- ggarrange(pred_density_plot, actual_density_plot,
                ncol = 1, nrow = 2, align = "v",
                widths = c(1, 1), heights = heights)
svg('dec_jan_feb.svg',
    width=width*rescale, height=height*rescale)
gt
dev.off()

thu, 13-sep-2018, 17:40

Introduction

A couple years ago I wrote a post about past Equinox Marathon weather. Since that post Andrea and I have run the relay twice, and I plan on running the full marathon in a couple days. This post updates the statistics and plots to include two more years of the race.

Methods

Methods and data are the same as in my previous post, except the daily data has been updated to include 2016 and 2017. The R code is available at the end of the previous post.

Results

Race day weather

Temperatures at the airport on race day ranged from 19.9 °F in 1972 to 68.0 °F in 1969, but the average range is between 34.1 and 53.1 °F. Using our model of Ester Dome temperatures, we get an average range of 29.5 to 47.3 °F and an overall min / max of 16.1 / 61.3 °F. Generally speaking, it will be below freezing on Ester Dome at some point on race day, but possibly before most of the runners get up there.
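The Ester Dome model itself is described in the previous post, but the estimates in the data table at the end of this post track the airport values almost linearly. As a quick check (not the original model), the sketch below fits a straight line to six rows copied from that table. The column names min_t and ed_min_t are my own labels.

```r
# Airport and estimated Ester Dome minimum temperatures (°F), six rows
# copied from the table at the end of this post
wx <- data.frame(
    year     = c(1963, 1964, 1965, 1968, 1972, 1974),
    min_t    = c(32.0, 34.0, 37.9, 23.0, 19.9, 48.0),
    ed_min_t = c(27.5, 29.4, 33.1, 19.1, 16.1, 42.5)
)

# Straight-line fit of the Ester Dome estimate on the airport value
fit <- lm(ed_min_t ~ min_t, data = wx)
print(coef(fit))

# Largest miss of the fitted line, in °F
print(max(abs(residuals(fit))))
```

The residuals are small, which suggests the estimates are close to a fixed linear adjustment of the airport temperature, at least for these rows.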

Precipitation (rain, sleet or snow) has fallen on 16 out of 55 race days, or 29% of the time, and measurable snowfall has been recorded on four of those sixteen. The highest amount fell in 2014 with 0.36 inches of liquid precipitation (no snow was recorded and the temperatures were between 45 and 51 °F so it was almost certainly all rain, even on Ester Dome). More than a quarter of an inch of precipitation fell in three of the sixteen years when it rained or snowed (1990, 1993, and 2014), but most rainfall totals are much smaller.

Measurable snow fell at the airport in four years, or seven percent of the time: 4.1 inches in 1993, 2.1 inches in 1985, 1.2 inches in 1996, and 0.4 inches in 1992. But that’s at the airport station. Five of the 12 years where measurable precipitation fell at the airport but no snow fell had possible minimum temperatures on Ester Dome that were below freezing. It’s likely that some of the precipitation recorded at the airport in those years was coming down as snow up on Ester Dome. If so, that means snow may have fallen on nine race days, bringing the percentage up to sixteen percent.
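The percentages in the last two paragraphs are simple ratios of the counts given in the text to the 55 race days. A short check, with names chosen only for this example:

```r
race_days <- 55

# Counts taken from the text above
counts <- c(any_precip = 16, airport_snow = 4, possible_snow = 9)

pct <- round(100 * counts / race_days, 1)
print(pct)
# any_precip 29.1, airport_snow 7.3, possible_snow 16.4
```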

Wind data from the airport has only been recorded since 1984, but from those years the average wind speed at the airport on race day is 4.8 miles per hour. The highest 2-minute wind speed during Equinox race day was 21 miles per hour in 2003. Unfortunately, no wind data is available for Ester Dome, but wind speeds there are likely to be higher than what is recorded at the airport.

Weather from the week prior

It’s also useful to look at the weather from the week before the race, since excessive pre-race rain or snow can make conditions on race day very different, even if the race day weather is pleasant. The year I ran the full marathon (2013), it snowed the week before, and much of the trail in the woods before the water stop near Henderson, as well as all of the out and back, was covered in snow.

The most dramatic example of this was 1992, when 23 inches (!) of snow fell at the airport in the week prior to the race, with much higher totals up on the summit of Ester Dome. Measurable snow has been recorded at the airport in the week prior to six races, but all the weekly totals are under an inch except for the snow year of 1992.

Precipitation has fallen in 44 of 55 pre-race weeks (80% of the time). Three years have had more than an inch of precipitation prior to the race: 1.49 inches in 2015, 1.26 inches in 1992 (most of which fell as snow), and 1.05 inches in 2007. On average, just over two tenths of an inch of precipitation falls in the week before the race.
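A pre-race weekly total is just a sum of daily values over a date window. The sketch below uses made-up daily precipitation and a made-up race date. I am assuming the window is the seven days before race day, not including race day itself, since the exact window isn't stated here.

```r
# Hypothetical daily precipitation (inches) leading up to a hypothetical race
daily <- data.frame(
    dte  = seq(as.Date("2030-09-12"), as.Date("2030-09-21"), by = "day"),
    prcp = c(0.50, 0.00, 0.10, 0.20, 0.00, 0.30, 0.05, 0.00, 0.40, 0.01)
)
race_day <- as.Date("2030-09-21")

# Assumed window: the seven days before race day, excluding race day
in_window <- daily$dte >= race_day - 7 & daily$dte < race_day
pre_race_prcp <- sum(daily$prcp[in_window])
print(pre_race_prcp)
```

Here the first two days fall outside the window and the last row is race day, so the total is the sum of the seven values in between.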

Summary

The following stacked plots show the weather for all 55 runnings of the Equinox marathon. The top panel shows the range of temperatures on race day from the airport station (wide bars) and estimated on Ester Dome (thin lines below bars). The shaded area at the bottom shows where temperatures are below freezing.

The middle panel shows race day liquid precipitation (rain, melted snow). Bars marked with an asterisk indicate years where snow was also recorded at the airport, but remember that five of the other years with liquid precipitation probably experienced snow on Ester Dome (1977, 1986, 1991, 1994, and 2016) because the temperatures were likely to be below freezing at elevation.

The bottom panel shows precipitation totals from the week prior to the race. Bars marked with an asterisk indicate weeks where snow was also recorded at the airport.

Equinox Marathon Weather

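Here’s a table with most of the data from the analysis. Temperatures are in °F, average wind speed (awnd) is in miles per hour, and precipitation and snowfall are in inches. The ED columns are the Ester Dome estimates, and the last two columns are precipitation and snowfall totals for the week prior to the race. A CSV with this data can be downloaded from all_wx.csv.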

Date min_t max_t ED_min_t ED_max_t awnd prcp snow p_prcp p_snow
1963-09-21 32.0 54.0 27.5 48.2   0.00 0.0 0.01 0.0
1964-09-19 34.0 57.9 29.4 51.8   0.00 0.0 0.03 0.0
1965-09-25 37.9 60.1 33.1 53.9   0.00 0.0 0.80 0.0
1966-09-24 36.0 62.1 31.3 55.8   0.00 0.0 0.01 0.0
1967-09-23 35.1 57.9 30.4 51.8   0.00 0.0 0.00 0.0
1968-09-21 23.0 44.1 19.1 38.9   0.00 0.0 0.04 0.0
1969-09-20 35.1 68.0 30.4 61.3   0.00 0.0 0.00 0.0
1970-09-19 24.1 39.9 20.1 34.9   0.00 0.0 0.42 0.0
1971-09-18 35.1 55.9 30.4 50.0   0.00 0.0 0.14 0.0
1972-09-23 19.9 42.1 16.1 37.0   0.00 0.0 0.01 0.2
1973-09-22 30.0 44.1 25.6 38.9   0.00 0.0 0.05 0.0
1974-09-21 48.0 60.1 42.5 53.9   0.08 0.0 0.00 0.0
1975-09-20 37.9 55.9 33.1 50.0   0.02 0.0 0.02 0.0
1976-09-18 34.0 59.0 29.4 52.9   0.00 0.0 0.54 0.0
1977-09-24 36.0 48.9 31.3 43.4   0.06 0.0 0.20 0.0
1978-09-23 30.0 42.1 25.6 37.0   0.00 0.0 0.10 0.3
1979-09-22 35.1 62.1 30.4 55.8   0.00 0.0 0.17 0.0
1980-09-20 30.9 43.0 26.5 37.8   0.00 0.0 0.35 0.0
1981-09-19 37.0 43.0 32.2 37.8   0.15 0.0 0.04 0.0
1982-09-18 42.1 61.0 37.0 54.8   0.02 0.0 0.22 0.0
1983-09-17 39.9 46.9 34.9 41.5   0.00 0.0 0.05 0.0
1984-09-22 28.9 60.1 24.6 53.9 5.8 0.00 0.0 0.08 0.0
1985-09-21 30.9 42.1 26.5 37.0 6.5 0.14 2.1 0.57 0.0
1986-09-20 36.0 52.0 31.3 46.3 8.3 0.07 0.0 0.21 0.0
1987-09-19 37.9 61.0 33.1 54.8 6.3 0.00 0.0 0.00 0.0
1988-09-24 37.0 45.0 32.2 39.7 4.0 0.00 0.0 0.11 0.0
1989-09-23 36.0 61.0 31.3 54.8 8.5 0.00 0.0 0.07 0.5
1990-09-22 37.9 50.0 33.1 44.4 7.8 0.26 0.0 0.00 0.0
1991-09-21 36.0 57.0 31.3 51.0 4.5 0.04 0.0 0.03 0.0
1992-09-19 24.1 33.1 20.1 28.5 6.7 0.01 0.4 1.26 23.0
1993-09-18 28.0 37.0 23.8 32.2 4.9 0.29 4.1 0.37 0.3
1994-09-24 27.0 51.1 22.8 45.5 6.0 0.02 0.0 0.08 0.0
1995-09-23 43.0 66.9 37.8 60.3 4.0 0.00 0.0 0.00 0.0
1996-09-21 28.9 37.9 24.6 33.1 6.9 0.06 1.2 0.26 0.0
1997-09-20 27.0 55.0 22.8 49.1 3.8 0.00 0.0 0.03 0.0
1998-09-19 42.1 60.1 37.0 53.9 4.9 0.00 0.0 0.37 0.0
1999-09-18 39.0 64.9 34.1 58.4 3.8 0.00 0.0 0.26 0.0
2000-09-16 28.9 50.0 24.6 44.4 5.6 0.00 0.0 0.30 0.0
2001-09-22 33.1 57.0 28.5 51.0 1.6 0.00 0.0 0.00 0.0
2002-09-21 33.1 48.9 28.5 43.4 3.8 0.00 0.0 0.03 0.0
2003-09-20 26.1 46.0 22.0 40.7 9.6 0.00 0.0 0.00 0.0
2004-09-18 26.1 48.0 22.0 42.5 4.3 0.00 0.0 0.25 0.0
2005-09-17 37.0 63.0 32.2 56.6 0.9 0.00 0.0 0.09 0.0
2006-09-16 46.0 64.0 40.7 57.6 4.3 0.00 0.0 0.00 0.0
2007-09-22 25.0 45.0 20.9 39.7 4.7 0.00 0.0 1.05 0.0
2008-09-20 34.0 51.1 29.4 45.5 4.5 0.00 0.0 0.08 0.0
2009-09-19 39.0 50.0 34.1 44.4 5.8 0.00 0.0 0.25 0.0
2010-09-18 35.1 64.9 30.4 58.4 2.5 0.00 0.0 0.00 0.0
2011-09-17 39.9 57.9 34.9 51.8 1.3 0.00 0.0 0.44 0.0
2012-09-22 46.9 66.9 41.5 60.3 6.0 0.00 0.0 0.33 0.0
2013-09-21 24.3 44.1 20.3 38.9 5.1 0.00 0.0 0.13 0.6
2014-09-20 45.0 51.1 39.7 45.5 1.6 0.36 0.0 0.00 0.0
2015-09-19 37.9 44.1 33.1 38.9 2.9 0.01 0.0 1.49 0.0
2016-09-17 34.0 57.9 29.4 51.8 2.2 0.01 0.0 0.61 0.0
2017-09-16 33.1 66.0 28.5 59.5 3.1 0.00 0.0 0.02 0.0

sun, 31-dec-2017, 11:10

Introduction

I’m planning a short trip to visit family in Florida and thought I’d take advantage of being in a new place to do some late winter backpacking where it’s warmer than in Fairbanks. I think I’ve settled on a 3‒5 day backpacking trip in Big South Fork National River and Recreation Area, which is in northeastern Tennessee and southeastern Kentucky.

Except for a couple summer trips in New England in the 80s, my backpacking experience has been in summer, in places where it doesn’t rain much and is typically hot and dry (California, Oregon). So I’d like to find out what the weather should be like when I’m there.

Data

I’ll use the Global Historical Climatology Network — Daily dataset, which contains daily weather observations for more than 100 thousand stations across the globe. There are more than 26 thousand active stations in the United States, and data for some U.S. stations goes back to 1836. I loaded the entire dataset—2.4 billion records as of last week—into a PostgreSQL database, partitioning the data by year. I’m interested in daily minimum and maximum temperature (TMIN, TMAX), precipitation (PRCP) and snowfall (SNOW), and in stations within 50 miles of the center of the recreation area.

The following map shows the recreation area boundary (with some strange drawing errors, probably due to using the fortify command) in green, the Tennessee/Kentucky border across the middle of the plot, and the 19 stations used in the analysis.

//media.swingleydev.com/img/blog/2017/12/biso_stations.svgz

Here are the details on the stations:

station_id station_name start_year end_year latitude longitude miles
USC00407141 PICKETT SP 2000 2017 36.5514 -84.7967 6.13
USC00406829 ONEIDA 1959 2017 36.5028 -84.5308 9.51
USC00400081 ALLARDT 1928 2017 36.3806 -84.8744 12.99
USC00404590 JAMESTOWN 2003 2017 36.4258 -84.9419 14.52
USC00157677 STEARNS 2S 1936 2017 36.6736 -84.4792 16.90
USC00401310 BYRDSTOWN 1998 2017 36.5803 -85.1256 24.16
USC00406493 NEWCOMB 1999 2017 36.5517 -84.1728 29.61
USC00158711 WILLIAMSBURG 1NW 2011 2017 36.7458 -84.1753 33.60
USC00405332 LIVINGSTON RADIO WLIV 1961 2017 36.3775 -85.3364 36.52
USC00154208 JAMESTOWN WWTP 1971 2017 37.0056 -85.0617 39.82
USC00406170 MONTEREY 1904 2017 36.1483 -85.2650 40.04
USC00406619 NORRIS 1936 2017 36.2131 -84.0603 41.13
USC00402202 CROSSVILLE ED & RESEARCH 1912 2017 36.0147 -85.1314 41.61
USW00053868 OAK RIDGE ASOS 1999 2017 36.0236 -84.2375 42.24
USC00401561 CELINA 1948 2017 36.5408 -85.4597 42.31
USC00157510 SOMERSET 2 N 1950 2017 37.1167 -84.6167 42.36
USW00003841 OAK RIDGE ATDD 1948 2017 36.0028 -84.2486 43.02
USW00003847 CROSSVILLE MEM AP 1954 2017 35.9508 -85.0814 43.87
USC00404871 KINGSTON 2000 2017 35.8575 -84.5278 45.86
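The miles column comes from a PostGIS distance calculation in projected coordinates (see the appendix). As a rough cross-check that doesn't need a database, a great-circle (haversine) distance gives nearly the same answer over distances this short. This is a substitute technique, not what the query does, and the function name and Earth radius value are my own choices.

```r
# Great-circle (haversine) distance in miles between two lon/lat points
haversine_miles <- function(lon1, lat1, lon2, lat2, radius_miles = 3958.8) {
    to_rad <- pi / 180
    dlat <- (lat2 - lat1) * to_rad
    dlon <- (lon2 - lon1) * to_rad
    a <- sin(dlat / 2)^2 +
        cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
    2 * radius_miles * asin(sqrt(a))
}

# Centre point used in the appendix query, and the Pickett SP station
center <- c(lon = -84.701553, lat = 36.506800)
pickett <- c(lon = -84.7967, lat = 36.5514)

d <- haversine_miles(center[["lon"]], center[["lat"]],
                     pickett[["lon"]], pickett[["lat"]])
print(round(d, 2))
```

The result should land within a small fraction of a mile of the 6.13 miles listed for Pickett SP in the table.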

To perform the analysis, I collected all valid observations for the stations listed, then restricted the results to observations where the day of the year was between 45 and 52 (February 14‒21).

variable observations
PRCP 5,942
SNOW 5,091
TMAX 4,900
TMIN 4,846
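The day-of-year bounds can be checked in base R. The helper name doy is just for this example:

```r
# Day of year for a date, as an integer
doy <- function(x) as.integer(format(as.Date(x), "%j"))

print(doy(c("2017-02-14", "2017-02-21")))
# 45 52
```

Because both dates fall before the end of February, the same day numbers apply in leap years.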

Results

Temperature

We will consider temperature first. The following two plots show the distribution of daily minimum and maximum temperatures. In both plots, the bars represent the number of observations at that temperature, the vertical red line through the middle of the plot shows the average temperature, and the light orange and blue sections show the ranges of temperatures enclosing 80% and 98% of the data.

//media.swingleydev.com/img/blog/2017/12/min_temp_dist.svgz
//media.swingleydev.com/img/blog/2017/12/max_temp_dist.svgz

The minimum daily temperature figure shows that the average minimum temperature is below freezing (28.9 °F), and eighty percent of all days in the third week of February were between 15 and 43 °F (the light orange region). The minimum temperature was colder than 15 °F or warmer than 54 °F 2% of the time (the light blue region). Maximum daily temperature was an average of 51 °F, and was rarely below freezing or above 72 °F.
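The shaded ranges are quantiles: the 80% band runs from the 10th to the 90th percentile and the 98% band from the 1st to the 99th, which is how tmin_rects is built in the appendix. A small sketch with simulated temperatures (the mean and standard deviation are assumptions, not the station data) shows the idea:

```r
set.seed(123)

# Simulated minimum temperatures (°F); not the station observations
tmin_f <- rnorm(5000, mean = 29, sd = 11)

# Bounds enclosing the middle 80% and the middle 98% of the values
range_80 <- quantile(tmin_f, c(0.10, 0.90))
range_98 <- quantile(tmin_f, c(0.01, 0.99))

# Fraction of values that actually fall inside each band
inside_80 <- mean(tmin_f >= range_80[1] & tmin_f <= range_80[2])
inside_98 <- mean(tmin_f >= range_98[1] & tmin_f <= range_98[2])
print(c(inside_80 = inside_80, inside_98 = inside_98))
```

By construction, the 98% band always extends beyond the 80% band on both sides.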

Another way to look at this sort of data is to count particular occurrences and divide by the total, “binning” the data into groups. Here we look at the number of days that were below freezing, colder than 20 °F or colder than 10 °F.

temperature observed days percent chance
below freezing 3,006 62.0
colder than 20 1,079 22.3
colder than 10 203 4.2
TOTAL 4,846 100.0
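The binning amounts to testing each observation against a threshold and taking the share that pass. GHCND temperatures are handled in °C in the appendix code, so the sketch converts to °F for the 20 and 10 degree cutoffs. The ten values below are made up for illustration.

```r
# Hypothetical daily minimum temperatures in °C (not station data)
tmin_c <- c(-12.2, -6.7, -3.3, -1.1, 0.0, 1.7, 4.4, -8.9, -15.0, 2.2)
tmin_f <- tmin_c * 9 / 5 + 32

# Percent of days in each bin: the mean of a logical vector is the
# fraction of TRUE values
pct <- 100 * c(
    `below freezing` = mean(tmin_c < 0),
    `colder than 20` = mean(tmin_f < 20),
    `colder than 10` = mean(tmin_f < 10)
)
print(pct)
```

Note that the bins overlap: every day colder than 10 °F is also counted as colder than 20 °F and below freezing, which is why the percentages in the table don't sum to 100.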

What about the daily maximum temperature?

temperature observed days percent chance
colder than 20 22 0.4
below freezing 371 7.6
below 40 1,151 23.5
above 50 2,569 52.4
above 60 1,157 23.6
above 70 80 1.6
TOTAL 4,900 100.0

The chances of it being below freezing during the day are pretty slim, and more than half the time it’s warmer than 50 °F, so even if it’s cold at night, I should be able to get plenty warm hiking during the day.

Precipitation

How often it rains, and how much falls when it does, are also important for planning a successful backpacking trip. Most of my backpacking has been done in the summer in California, where rainfall is rare and even when it does rain, it’s typically over quickly. Daily weather data can’t tell us about the hourly pattern of rainfall, but we can find out how often and how much it has rained in the past.

rainfall amount observed days percent chance
raining 2,375 40.0
tenth 1,610 27.1
quarter 1,136 19.1
half 668 11.2
inch 308 5.2
TOTAL 5,942 100.0

This data shows that the chance of rain on any given day between February 14th and the 21st is 40%, and the chance of getting at least a tenth of an inch is 27%. That’s certainly higher than in the Sierra Nevada in July, although by August, afternoon thunderstorms are more common in the mountains.

When there is precipitation, the distribution of precipitation totals looks like this:

cumulative frequency precipitation
1% 0.01
5% 0.02
10% 0.02
25% 0.07
50% 0.22
75% 0.59
90% 1.18
95% 1.71
99% 2.56

These numbers are cumulative, which means that on 1 percent of the days with precipitation, there was a hundredth of an inch of liquid precipitation or less. Ten percent of the days had 0.02 inches or less. And 50 percent of rainy days had 0.22 inches of liquid precipitation or less. Reading the numbers from the top of the distribution, there was more than an inch of rain on 10 percent of the days on which it rained, which is a little disturbing.

One final question about precipitation is how long it rains once it starts raining. Do we get little showers here and there, or are there large storms that dump rain for days without a break? To answer this question, I counted the number of days between zero-rainfall days, which is equal to the number of consecutive days where it rained.

consecutive days percent chance
1 53.0
2 24.4
3 11.9
4 7.5
5 2.2
6 0.9
7 0.1

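The results show that more than half the time, a single day of rain is followed by at least one day without. Rainy spells lasting exactly three days made up 11.9% of all spells, and spells of three days or more made up 22.6%, so rain on every day of a three-day trip to this area in mid-February is possible but not the usual pattern.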
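Counting consecutive rainy days is a run-length problem, and base R's rle function does most of the work. The sketch below uses a made-up series of daily totals for a single station. It is one way to do the count, not necessarily how the table above was produced.

```r
# Hypothetical daily precipitation (inches) for one station
prcp <- c(0, 0.2, 0, 0.1, 0.4, 0, 0, 0.3, 0.1, 0.05, 0, 0.6)

# Run-length encode wet/dry days, then keep only the wet runs
wet <- prcp > 0
runs <- rle(wet)
wet_run_lengths <- runs$lengths[runs$values]
print(wet_run_lengths)
# 1 2 3 1

# Percent of wet spells by length
pct <- round(100 * prop.table(table(wet_run_lengths)), 1)
print(pct)
```

In this toy series there are four wet spells: two lasting one day, one lasting two days, and one lasting three days.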

Snowfall

Repeating the precipitation analysis with snowfall:

snowfall amount observed days percent chance
snowing 322 6.3
inch 148 2.9
two 115 2.3
TOTAL 5,091 100.0

Snowfall isn’t common on these dates, but it did happen, so I will need to be prepared for it. Also, the PRCP variable includes melted snow, so a small portion of the precipitation from the previous section overlaps with the snowfall shown here.

Conclusion

Based on this analysis, a 3‒5 day backpacking trip to the Big South Fork National River and Recreation Area seems well within my abilities and my gear. It will probably be below freezing at night (62% of the observed minimum temperatures were), but isn’t likely to be much below 20 °F, snowfall is uncommon, and even though I will probably experience some rain, it shouldn’t be too much or carry on for the entire trip.

Appendix

The R code for this analysis appears below. I’ve loaded the GHCND data into a PostgreSQL database with observation data partitioned by year. The database tables are structured basically as they come from the National Centers for Environmental Information.

library(tidyverse)
library(dbplyr)
library(glue)
library(maps)
library(sp)
library(rgdal)
library(scales)
library(knitr)

noaa <- src_postgres(dbname = "noaa")

biso_stations <- noaa %>%
    tbl(build_sql(
        "WITH inv AS (
            SELECT station_id, max(start_year) AS start_year,
                min(end_year) AS end_year,
                array_agg(variable::text) AS variables
            FROM ghcnd_inventory
            WHERE variable IN ('TMIN', 'TMAX', 'PRCP', 'SNOW')
            GROUP BY station_id)
         SELECT station_id, station_name, start_year, end_year,
            latitude, longitude,
            ST_Distance(ST_Transform(a.the_geom, 32617),
                        ST_Transform(b.the_geom, 32617))/1609 AS miles
         FROM ghcnd_stations AS a
            INNER JOIN inv USING(station_id),
            (SELECT ST_SetSRID(
                ST_MakePoint(-84.701553,
                              36.506800), 4326) AS the_geom) AS b
         WHERE inv.variables @> ARRAY['TMIN', 'TMAX', 'PRCP', 'SNOW']
            AND end_year = 2017
            AND ST_Distance(ST_Transform(a.the_geom, 32617),
                            ST_Transform(b.the_geom, 32617))/1609 < 65
         ORDER BY miles"))

start_doy <- 32  # Feb 1
end_doy <- 59    # Feb 28

ghcnd_variables <- noaa %>% tbl("ghcnd_variables")

# ghcnd_obs partitioned by year, so query by year
obs_by_year <- function(conn, year, start_doy, end_doy) {
    print(year)
    filter_start_dte <- glue("{year}-01-01")
    filter_end_dte <- glue("{year}-12-31")
    conn %>% tbl("ghcnd_obs") %>%
        inner_join(biso_stations) %>%
        inner_join(ghcnd_variables) %>%
        mutate(doy = date_part('doy', dte),
            value = raw_value * raw_multiplier) %>%
        filter(dte >= filter_start_dte,
               dte <= filter_end_dte,
               doy >= start_doy, doy <= end_doy,
               is.na(qual_flag),
               variable %in% c('TMIN', 'TMAX', 'PRCP', 'SNOW')) %>%
        select(-c(raw_value, time_of_obs, qual_flag, description,
                  raw_multiplier)) %>%
        collect()
}

feb_obs <- map_df(1968:2017, function(x)
                  obs_by_year(noaa, x, start_doy, end_doy))

# MAP
restrict_miles <- 50
# biso_stations is a lazy database table; collect() pulls it into R
biso_filtered <- biso_stations %>%
    collect() %>%
    filter(miles < restrict_miles)

nps_boundary <- readOGR("nps_boundary.shp", verbose = FALSE)
biso_boundary <- subset(nps_boundary, UNIT_CODE == 'BISO')
biso_df <- fortify(biso_boundary) %>% tbl_df()

q <- ggplot(data = biso_filtered,
            aes(x = longitude, y = latitude)) +
    theme_bw() +
    theme(axis.text = element_blank(), axis.ticks = element_blank(),
            panel.grid = element_blank()) +
    geom_hline(yintercept = 36.6,
               colour = "darkcyan",
               size = 0.5) +
    geom_point(colour = "darkred") +
    geom_text(aes(label = str_to_title(station_name)), size = 3,
              hjust = 0.5, vjust = 0, nudge_y = 0.01) +
    geom_polygon(data = biso_df,
                 aes(x = long, y = lat),
                 fill = "darkgreen") +
    scale_x_continuous(name = "",
                       limits = c(min(biso_filtered$longitude) - 0.02,
                                  max(biso_filtered$longitude) + 0.02)) +
    scale_y_continuous(name = "",
                       limits = c(min(biso_filtered$latitude) - 0.02,
                                  max(biso_filtered$latitude) + 0.02)) +
    coord_quickmap()

print(q)

# OBS
feb_obs_filtered <- feb_obs %>%
    filter(miles < restrict_miles,
           doy >= 45, doy <= 52)  # feb 14-21

# TEMP PLOTS
tmin_rects <- tibble(pwidth = c("80", "98"),
                     xmin = quantile((feb_obs_filtered %>%
                                      filter(variable == 'TMIN'))$value*9/5+32,
                                     c(0.10, 0.01)),
                     xmax = quantile((feb_obs_filtered %>%
                                      filter(variable == 'TMIN'))$value*9/5+32,
                                     c(0.90, 0.99)),
                     ymin = -Inf, ymax = Inf)
q <- ggplot(data = feb_obs_filtered %>% filter(variable == 'TMIN'),
            aes(x = value*9/5+32)) +
    theme_bw() +
    geom_rect(data = tmin_rects %>% filter(pwidth == "98"), inherit.aes = FALSE,
              aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              fill = "darkcyan", alpha = 0.2) +
    geom_rect(data = tmin_rects %>% filter(pwidth == "80"), inherit.aes = FALSE,
              aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              fill = "darkorange", alpha = 0.2) +
    geom_vline(xintercept = mean((feb_obs_filtered %>%
                                      filter(variable == 'TMIN'))$value*9/5+32),
               colour = "red",
               size = 0.5) +
    geom_histogram(binwidth = 1) +
    scale_x_continuous(name = "Minimum temperature (°F)",
                       breaks = pretty_breaks(n = 10)) +
    scale_y_continuous(name = "Days", breaks = pretty_breaks(n = 6)) +
    ggtitle("Minimum daily temperature distribution, February 14‒21")

print(q)

max_temp_distribution <-
    quantile((feb_obs_filtered %>%
                filter(variable == 'TMAX'))$value*9/5 + 32,
    c(0.01, 0.05, 0.10, 0.25, 0.5, 0.75, 0.90, 0.95, 0.99))

tmax_rects <- tibble(pwidth = c("80", "98"),
                     xmin = quantile((feb_obs_filtered %>%
                                      filter(variable == 'TMAX'))$value*9/5+32,
                                     c(0.10, 0.01)),
                     xmax = quantile((feb_obs_filtered %>%
                                      filter(variable == 'TMAX'))$value*9/5+32,
                                     c(0.90, 0.99)),
                     ymin = -Inf, ymax = Inf)

q <- ggplot(data = feb_obs_filtered %>% filter(variable == 'TMAX'),
            aes(x = value*9/5+32)) +
    theme_bw() +
    geom_rect(data = tmax_rects %>% filter(pwidth == "98"), inherit.aes = FALSE,
              aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              fill = "darkcyan", alpha = 0.2) +
    geom_rect(data = tmax_rects %>% filter(pwidth == "80"), inherit.aes = FALSE,
              aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
              fill = "darkorange", alpha = 0.2) +
    geom_vline(xintercept = mean((feb_obs_filtered %>%
                                      filter(variable == 'TMAX'))$value*9/5+32),
               colour = "red",
               size = 0.5) +
    geom_histogram(binwidth = 1) +
    scale_x_continuous(name = "Maximum temperature (°F)",
                       breaks = pretty_breaks(n = 10)) +
    scale_y_continuous(name = "Days", breaks = pretty_breaks(n = 8)) +
    ggtitle("Maximum daily temperature distribution, February 14‒21")

print(q)

# TEMP BINS
below_freezing_percent <- feb_obs_filtered %>%
    filter(variable == 'TMIN') %>%
    mutate(`below freezing` = ifelse(value < 0, 1, 0),
           `colder than 20` = ifelse(value*9/5 + 32 < 20, 1, 0),
           `colder than 10` = ifelse(value*9/5 + 32 < 10, 1, 0)) %>%
    summarize(`below freezing` = sum(`below freezing`),
              `colder than 20` = sum(`colder than 20`),
              `colder than 10` = sum(`colder than 10`),
              TOTAL = n(),
              total = n()) %>%
    gather(temperature, `observed days`, -total) %>%
    mutate(`percent chance` = `observed days` / total * 100) %>%
    select(temperature, `observed days`, `percent chance`)

kable(below_freezing_percent, digits = 1,
      align = "lrr",
      format.args = list(big.mark = ","))

# PRCP BINS
prcp_percent <- feb_obs_filtered %>%
    filter(variable == 'PRCP') %>%
    mutate(raining = ifelse(value > 0, 1, 0),
           tenth = ifelse(value > 0.1 * 25.4, 1, 0),
           quarter = ifelse(value > 0.25 * 25.4, 1, 0),
           half = ifelse(value > 0.5 * 25.4, 1, 0),
           inch = ifelse(value > 1 * 25.4, 1, 0)) %>%
    summarize(raining = sum(raining),
              tenth = sum(tenth),
              quarter = sum(quarter),
              half = sum(half),
              inch = sum(inch),
              TOTAL = n(),
              total = n()) %>%
    gather(`rainfall amount`, `observed days`, -total) %>%
    mutate(`percent chance` = `observed days` / total * 100) %>%
    select(`rainfall amount`, `observed days`, `percent chance`)

kable(prcp_percent, digits = 1,
      align = "lrr",
      format.args = list(big.mark = ","))

# PRCP DIST
prcp_cum_freq <-
    tibble(`cumulative frequency` = c("1%", "5%", "10%", "25%", "50%", "75%", "90%",
                                      "95%", "99%"),
       precipitation = quantile((feb_obs_filtered %>% filter(variable == "PRCP",
                                                           value > 0))$value/25.4,
                              c(0.01, 0.05, 0.10, 0.25, 0.5, 0.75, 0.90, 0.95, 0.99)))

kable(prcp_cum_freq, digits = 2, align="lr")

# PRCP PATTERN
no_prcp <- feb_obs %>% filter(variable == 'PRCP', value == 0,
                              miles < restrict_miles, doy >= 44, doy <= 53)
consecutive_rain <- no_prcp %>%
    group_by(station_name) %>%
    arrange(station_name, dte) %>%
    mutate(days = as.integer(dte - lag(dte) - 1)) %>%
    filter(!is.na(days), days > 0, days < 10)

consecutive_days_dist <- consecutive_rain %>%
    ungroup() %>%
    mutate(total = n()) %>%
    arrange(days) %>%
    group_by(days, total) %>%
    summarize(`percent chance` = n()/max(total)*100) %>%
    rename(`consecutive days` = days) %>%
    select(`consecutive days`, `percent chance`)

kable(consecutive_days_dist, digits = 1,
      align = "lr")

# SNOW DIST
snow_percent <- feb_obs_filtered %>%
    filter(variable == 'SNOW') %>%
    mutate(snowing = ifelse(value > 0, 1, 0),
           half = ifelse(value > 0.5 * 25.4, 1, 0),
           inch = ifelse(value > 1 * 25.4, 1, 0),
           two = ifelse(value > 2 * 25.4, 1, 0)) %>%
    summarize(snowing = sum(snowing),
              inch = sum(inch),
              two = sum(two),
              TOTAL = n(),
              total = n()) %>%
    gather(`snowfall amount`, `observed days`, -total) %>%
    mutate(`percent chance` = `observed days` / total * 100) %>%
    select(`snowfall amount`, `observed days`, `percent chance`)

kable(snow_percent, digits = 1,
      align = "lrr",
      format.args = list(big.mark = ","))

tags: R  weather  BISO  Tennessee  Kentucky

mon, 09-jan-2017, 09:49

Introduction

The latest forecast discussions for Northern Alaska have included warnings that we are likely to experience an extended period of below-normal temperatures starting at the end of this week, and yesterday’s Deep Cold blog post discusses the similarity of model forecast patterns to patterns seen in the 1989 and 1999 extreme cold events.

Our dogs spend most of their time in the house when we’re home, but if both of us are at work they’re outside in the dog yard. They have insulated dog houses, but when it’s colder than −15° F, we put them into a heated dog barn. That means one of us has to come home in the middle of the day to let them out to go to the bathroom.

Since we’re past the Winter Solstice, and day length is now increasing, I was curious to see if that has an effect on daily temperature, hopeful that the frequency of days when we need to put the dogs in the barn is decreasing.

Methods

We’ll use daily minimum and maximum temperature data from the Fairbanks International Airport station. For each calendar date, we count the years in which the temperature was below −15° F and divide by the number of years with data to get a frequency. We live in a cold valley on Goldstream Creek, so our temperatures are typically several degrees colder than at the Fairbanks Airport, and we often don’t warm up as much during the day as other places do, but minimum airport temperature is a reasonable proxy for the overall winter temperature at our house.
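
A minimal sketch of that frequency calculation, using a tiny made-up data frame rather than the airport record. The column names (`dte`, `tmin_c`) mirror those in the appendix code, but the values are invented for illustration. The threshold is converted to Celsius because the temperature columns are in °C.

```r
# Toy data: three years of minimum temperatures (°C) for two calendar dates.
# These values are invented for illustration; they are not airport observations.
toy <- data.frame(
    dte = as.Date(c("2001-01-12", "2002-01-12", "2003-01-12",
                    "2001-04-01", "2002-04-01", "2003-04-01")),
    tmin_c = c(-30.0, -20.0, -35.5, -28.0, -5.0, NA))

# -15 °F expressed in °C
threshold_c <- (-15 - 32) * 5 / 9

toy$mmdd <- format(toy$dte, "%m-%d")
toy$below <- toy$tmin_c < threshold_c

# Frequency = years below the threshold / years with a valid observation
# (the formula interface drops rows with a missing value before averaging)
freq <- aggregate(below ~ mmdd, data = toy, FUN = mean)
print(freq)
```

Averaging a logical column gives the share of years that met the condition, and the missing observation is left out of the denominator rather than counted as a warm day.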

Results

The following plot shows the frequency of minimum (the top of each line) and maximum (the bottom) temperature colder than −15° F at the airport over the period of record, 1904−2016. The curved blue line represents a best fit line through the minimum temperature frequency, and the vertical blue line is drawn at the date when the frequency is the highest.

Frequency of days with temperatures below −15° F

The frequency peaks on January 12th, so we have a few more days before the likelihood of needing to put the dogs in the barn starts to decline. The plot also shows that we could still reach that threshold all the way into April.
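
One way to locate a peak like this is to smooth the daily frequencies and take the day where the smoothed curve is highest. The sketch below does that on synthetic data; the data frame `toy`, its hump centered on day 60, and the alternating wiggle are all assumptions made up for illustration, not the airport frequencies.

```r
# Synthetic daily frequencies: a smooth hump peaking at day 60 plus a small
# alternating wiggle standing in for day-to-day noise (invented values)
toy <- data.frame(doy = 1:100)
toy$freq <- 0.5 - ((toy$doy - 60) / 100)^2 + 0.01 * (-1)^toy$doy

# Smooth with loess (default span), then find the day where the fitted curve peaks
fit <- loess(freq ~ doy, data = toy)
toy$pred <- predict(fit)
peak <- toy[which.max(toy$pred), c("doy", "pred")]
print(peak)
```

Taking the maximum of the smoothed curve, rather than of the raw frequencies, keeps a single noisy date from being reported as the peak.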

For fun, here’s the same plot using −40° as the threshold:

Frequency of days with temperatures below −40°

The date when the frequency starts to decline shifts slightly later, to January 15th, and the frequencies are lower overall. In mid-January, we can expect the minimum temperature to be colder than −15° F more than half the time, but temperatures colder than −40° occur just under 15% of the time. There’s also an interesting anomaly in mid to late December where the frequency of very cold temperatures appears to drop.

Appendix: R code

library(tidyverse)
library(lubridate)
library(scales)

noaa <- src_postgres(host="localhost", dbname="noaa")

fairbanks <- tbl(noaa, build_sql("SELECT * FROM ghcnd_pivot
                                  WHERE station_name='FAIRBANKS INTL AP'")) %>%
    collect()

save(fairbanks, file="fairbanks_ghcnd.rdat")

for_plot <- fairbanks %>%
    mutate(doy=yday(dte),
           dte_str=format(dte, "%d %b"),
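           min_below=ifelse(tmin_c < -26.11,1,0),  # -26.11 °C is −15 °F
           max_below=ifelse(tmax_c < -26.11,1,0)) %>%
    filter(dte_str!="29 Feb") %>%
    # renumber leap-year days after Feb 29 so every year runs 1..365,
    # then shift by 120 days so the winter falls mid-axis (Jan 1 becomes 121)
    mutate(doy=ifelse(leap_year(dte) & doy>60, doy-1, doy),
           doy=(doy+31+28+31+30)%%365) %>%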
    group_by(doy, dte_str) %>%
    mutate(n_min=sum(ifelse(!is.na(min_below), 1, 0)),
           n_max=sum(ifelse(!is.na(max_below), 1, 0))) %>%
    summarize(min_freq=sum(min_below, na.rm=TRUE)/max(n_min, na.rm=TRUE),
              max_freq=sum(max_below, na.rm=TRUE)/max(n_max, na.rm=TRUE))

x_breaks <- for_plot %>%
    filter(doy %in% seq(49, 224, 7))

stats <- tibble(doy=seq(49, 224),
                pred=predict(loess(min_freq ~ doy,
                                   for_plot %>%
                                       filter(doy >= 49, doy <= 224))))

max_stats <- stats %>%
    arrange(desc(pred)) %>% head(n=1)

p <- ggplot(data=for_plot,
            aes(x=doy, ymin=min_freq, ymax=max_freq)) +
    geom_linerange() +
    geom_smooth(aes(y=min_freq), se=FALSE, size=0.5) +
    geom_segment(aes(x=max_stats$doy, xend=max_stats$doy,
                     y=-Inf, yend=max_stats$pred),
                 colour="blue", size=0.5) +
    scale_x_continuous(name=NULL,
                       limits=c(49, 224),
                       breaks=x_breaks$doy,
                       labels=x_breaks$dte_str) +
    scale_y_continuous(name="Frequency of days colder than −15° F",
                       breaks=pretty_breaks(n=10)) +
    theme_bw() +
    theme(axis.text.x=element_text(angle=30, hjust=1))

print(p)

# Minus 40
for_plot <- fairbanks %>%
    mutate(doy=yday(dte),
           dte_str=format(dte, "%d %b"),
           min_below=ifelse(tmin_c < -40,1,0),
           max_below=ifelse(tmax_c < -40,1,0)) %>%
    filter(dte_str!="29 Feb") %>%
    mutate(doy=ifelse(leap_year(dte) & doy>60, doy-1, doy),
           doy=(doy+31+28+31+30)%%365) %>%
    group_by(doy, dte_str) %>%
    mutate(n_min=sum(ifelse(!is.na(min_below), 1, 0)),
           n_max=sum(ifelse(!is.na(max_below), 1, 0))) %>%
    summarize(min_freq=sum(min_below, na.rm=TRUE)/max(n_min, na.rm=TRUE),
              max_freq=sum(max_below, na.rm=TRUE)/max(n_max, na.rm=TRUE))

x_breaks <- for_plot %>%
    filter(doy %in% seq(63, 203, 7))

stats <- tibble(doy=seq(63, 203),
                pred=predict(loess(min_freq ~ doy,
                                   for_plot %>%
                                       filter(doy >= 63, doy <= 203))))

max_stats <- stats %>%
    arrange(desc(pred)) %>% head(n=1)

q <- ggplot(data=for_plot,
            aes(x=doy, ymin=min_freq, ymax=max_freq)) +
    geom_linerange() +
    geom_smooth(aes(y=min_freq), se=FALSE, size=0.5) +
    geom_segment(aes(x=max_stats$doy, xend=max_stats$doy,
                     y=-Inf, yend=max_stats$pred),
                 colour="blue", size=0.5) +
    scale_x_continuous(name=NULL,
                       limits=c(63, 203),
                       breaks=x_breaks$doy,
                       labels=x_breaks$dte_str) +
    scale_y_continuous(name="Frequency of days colder than −40°",
                       breaks=pretty_breaks(n=10)) +
    theme_bw() +
    theme(axis.text.x=element_text(angle=30, hjust=1))

print(q)

tags: weather  climate  temperature  R

sat, 19-nov-2016, 15:50

Introduction

So far this winter we’ve gotten only 4.1 inches of snow, well below the normal 19.7 inches, and there are only 2 inches of snow on the ground. At this point last year we had 8 inches and I’d been biking and skiing on the trail to work for two weeks. In his North Pacific Temperature Update blog post, Richard James mentions that winters like this one, with a combined strongly positive Pacific Decadal Oscillation phase and strongly negative North Pacific Mode phase, tend to produce a “distinctly dry” pattern for interior Alaska. I don’t pretend to understand these large-scale climate patterns, but I thought it would be interesting to look at snowfall and snow depth in years with very little mid-November snow. In other years like this one, do we eventually get enough snow that the trails fill in and we can fully participate in winter sports like skiing, dog mushing, and fat biking?

Data

We will use daily data from the Global Historical Climatology Network Daily (GHCN-D) data set for the Fairbanks International Airport station. Data prior to 1950 is excluded because the snowfall and snow depth data from that period is of poor quality, and because there’s a good chance that our climate has changed since then, so patterns from that era aren’t a good model for the current climate in Alaska.

We will look at both snow depth and the cumulative winter snowfall.
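
Cumulative winter snowfall needs a "winter year" that keeps December and January together. A minimal sketch of one way to do that, assuming the same 120-day date shift used in the appendix query; the data frame and its values are invented for illustration.

```r
# Toy daily snowfall (mm) spanning a New Year boundary; values are invented.
toy <- data.frame(
    dte = as.Date(c("2015-12-30", "2015-12-31", "2016-01-01", "2016-01-02",
                    "2016-12-31", "2017-01-01")),
    snow_mm = c(25.4, NA, 50.8, 0, 12.7, 12.7))

# Shift dates back 120 days so December and January share a "winter year"
toy$wyear <- as.integer(format(toy$dte - 120, "%Y"))

# Treat missing snowfall as zero, then accumulate within each winter
snow_filled <- ifelse(is.na(toy$snow_mm), 0, toy$snow_mm)
toy$snow_in_cum <- round(ave(snow_filled, toy$wyear, FUN = cumsum) / 25.4, 1)
print(toy)
```

Counting missing days as zero keeps the running total defined for every date, at the cost of possibly understating the total in winters with gaps in the record.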

Results

The following tables show the ten years with the lowest cumulative snowfall and snow depth values from 1950 to the present on November 18th.

Year Cumulative Snowfall (inches)
1953 1.5
2016 4.1
1954 4.3
2014 6.0
2006 6.4
1962 7.5
1998 7.8
1960 8.5
1995 8.8
1979 10.2

Year Snow depth (inches)
1953 1
1954 1
1962 1
2016 2
2014 2
1998 3
1964 3
1976 3
1971 3
2006 4

2016 has the second-lowest cumulative snowfall, behind 1953, and is tied with 2014 for the second-lowest snow depth; 1953, 1954 and 1962 all had only 1 inch of snow on the ground on November 18th.

It also seems like recent years appear in these tables more frequently than would be expected. Grouping by decade and averaging cumulative snowfall and snow depth yields the pattern in the chart below. The error bars (not shown) are fairly large, so the differences between decades aren’t likely to be statistically significant, but there is a pattern of lower snowfall amounts in recent decades.
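
The decadal grouping can be sketched as follows. The years and depths in `toy` are invented for illustration (they are not the values behind the chart), and the standard deviation is included because it is what the error bars would be built from.

```r
# Invented Nov 18 snow depths (inches) for a few years; not the airport record.
toy <- data.frame(wyear = c(1953, 1958, 1962, 1967, 2014, 2016),
                  snwd_in = c(1, 9, 1, 7, 2, 2))

# Truncate the year to its decade: 1953 -> 1950, 2016 -> 2010
toy$decade <- (toy$wyear %/% 10) * 10

# Mean and standard deviation per decade; a large sd relative to the
# difference in means makes a decadal pattern hard to call significant
decadal <- aggregate(snwd_in ~ decade, data = toy,
                     FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(decadal)
```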

Decadal average cumulative snowfall and snow depth

Now let’s see what happened in those years with low snowfall and snow depth values in mid-November starting with cumulative snowfall. The following plot (and the subsequent snow depth plot) shows the data for the low-value years (and one very high snowfall year—1990), with each year’s data as a separate line. The smooth dark cyan line through the middle of each plot is the smoothed line through the values for all years; a sort of “average” snowfall and snow depth curve.

Cumulative snowfall, years with low snow on November 18

In all four mid-November low-snowfall years, the cumulative snowfall values remain below average throughout the winter, but snow did continue to fall as the season went on. Even the lowest-snowfall winter here, 2006–2007, still ended the winter with 15 inches of snow on the ground.

The following plot shows snow depth for the four years with the lowest snow depth on November 18th. The data is formatted the same as in the previous plot except we’ve jittered the values slightly to make the plot easier to read.

Snow depth, years with low snow on November 18

The pattern here is similar, but the snow depths get much closer to the average values. Snow depth for all four low-snow years remains low throughout November but starts rising in December, dramatically in 1954 and 2014.

One of the highest snowfall years between 1950 and 2016 was 1990–1991 (shown on both plots). An impressive 32.8 inches of snow fell in eight days between December 21st and December 28th, accounting for the sharp increase in cumulative snowfall and snow depth shown on both plots. There are five years in the record where the cumulative total for the entire winter was lower than these eight days in 1990.
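
Stretches like that can be found with a rolling eight-day snowfall total. A small sketch on invented daily values (not the 1990 observations), using differences of the cumulative sum so that no extra packages are needed:

```r
# Invented daily snowfall (inches) for twelve consecutive days
snow_in <- c(0, 0.5, 0, 4, 6, 3, 0, 5, 7, 2, 0, 0)

# Sum over every run of eight consecutive days using differences of the
# cumulative total: rolling[k] is the total for days k .. k + 7
window <- 8
cs <- c(0, cumsum(snow_in))
rolling <- cs[(window + 1):length(cs)] - cs[1:(length(cs) - window)]

max(rolling)          # largest eight-day total
which.max(rolling)    # first day of the (first) window with that total
```

The same vector of rolling totals could then be compared against each winter's season total to count the winters that fell short of a single snowy week.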

Conclusion

Despite the lack of snow on the ground to this point in the year, the record shows that we are still likely to get enough snow to fill in the trails. We may need to wait until mid to late December, but it’s even possible we’ll eventually reach the long term average depth before spring.

Appendix

Here’s the R code used to generate the statistics, tables and plots from this post:

library(tidyverse)
library(lubridate)
library(scales)
library(knitr)

noaa <- src_postgres(host="localhost", dbname="noaa")

snow <- tbl(noaa, build_sql(
   "WITH wdoy_data AS (
         SELECT dte, dte - interval '120 days' as wdte,
            tmin_c, tmax_c, (tmin_c+tmax_c)/2.0 AS tavg_c,
            prcp_mm, snow_mm, snwd_mm
         FROM ghcnd_pivot
         WHERE station_name = 'FAIRBANKS INTL AP'
         AND dte > '1950-09-01')
   SELECT dte, date_part('year', wdte) AS wyear, date_part('doy', wdte) AS wdoy,
         to_char(dte, 'Mon DD') AS mmdd,
         tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm
   FROM wdoy_data")) %>%
   mutate(wyear=as.integer(wyear),
            wdoy=as.integer(wdoy),
            snwd_mm=as.integer(snwd_mm)) %>%
   select(dte, wyear, wdoy, mmdd,
            tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm) %>% collect()

write_csv(snow, "pafa_data_with_wyear_post_1950.csv")
save(snow, file="pafa_data_with_wyear_post_1950.rdata")

cum_snow <- snow %>%
   mutate(snow_na=ifelse(is.na(snow_mm),1,0),
         snow_mm=ifelse(is.na(snow_mm),0,snow_mm)) %>%
   group_by(wyear) %>%
   mutate(snow_mm_cum=cumsum(snow_mm),
         snow_na=cumsum(snow_na)) %>%
   ungroup() %>%
   mutate(snow_in_cum=round(snow_mm_cum/25.4, 1),
         snwd_in=round(snwd_mm/25.4, 0))

nov_18_snow <- cum_snow %>%
   filter(mmdd=='Nov 18') %>%
   select(wyear, snow_in_cum, snwd_in) %>%
   arrange(snow_in_cum)

decadal_avg <- nov_18_snow %>%
   mutate(decade=as.integer(wyear/10)*10) %>%
   group_by(decade) %>%
   summarize(`Snow depth`=mean(snwd_in),
            snwd_sd=sd(snwd_in),
            `Cumulative Snowfall`=mean(snow_in_cum),
            snow_cum_sd=sd(snow_in_cum))

decadal_averages <- ggplot(decadal_avg %>%
                              gather(variable, value, -decade) %>%
                              filter(variable %in% c("Cumulative Snowfall",
                                                      "Snow depth")),
                           aes(x=as.factor(decade), y=value, fill=variable)) +
            theme_bw() +
            geom_bar(stat="identity", position="dodge") +
            scale_x_discrete(name="Decade", breaks=c(1950, 1960, 1970, 1980,
                                                   1990, 2000, 2010)) +
            scale_y_continuous(name="Inches", breaks=pretty_breaks(n=10)) +
            scale_fill_discrete(name="Measurement")

print(decadal_averages)

date_x_scale <- cum_snow %>%
   filter(grepl(' (01|15)', mmdd), wyear==1994) %>%
   select(wdoy, mmdd)

cumulative_snowfall <-
   ggplot(cum_snow %>% filter(wyear %in% c(1953, 1954, 2014, 2006, 1990),
                              wdoy>183,
                              wdoy<320),
            aes(x=wdoy, y=snow_in_cum, colour=as.factor(wyear))) +
   theme_bw() +
   geom_smooth(data=cum_snow %>% filter(wdoy>183, wdoy<320),
               aes(x=wdoy, y=snow_in_cum),
               size=0.5, colour="darkcyan",
               inherit.aes=FALSE,
               se=FALSE) +
   geom_line(position="jitter") +
   scale_x_continuous(name="",
                     breaks=date_x_scale$wdoy,
                     labels=date_x_scale$mmdd) +
   scale_y_continuous(name="Cumulative snowfall (in)",
                     breaks=pretty_breaks(n=10)) +
   scale_color_discrete(name="Winter year")

print(cumulative_snowfall)

snow_depth <-
   ggplot(cum_snow %>% filter(wyear %in% c(1953, 1954, 1962, 2014, 1990),
                              wdoy>183,
                              wdoy<320),
            aes(x=wdoy, y=snwd_in, colour=as.factor(wyear))) +
   theme_bw() +
   geom_smooth(data=cum_snow %>% filter(wdoy>183, wdoy<320),
               aes(x=wdoy, y=snwd_in),
               size=0.5, colour="darkcyan",
               inherit.aes=FALSE,
               se=FALSE) +
   geom_line(position="jitter") +
   scale_x_continuous(name="",
                     breaks=date_x_scale$wdoy,
                     labels=date_x_scale$mmdd) +
   scale_y_continuous(name="Snow Depth (in)",
                     breaks=pretty_breaks(n=10)) +
   scale_color_discrete(name="Winter year")

print(snow_depth)

tags: snow depth  snowfall  weather  climate  R
