Rainfall Days

weather
rain
R
Published

May 26, 2024

Summer Clouds

Introduction

Over the last three days we’ve gotten 0.92 inches of rain at our house (0.40, 0.39 and 0.13 each day). The airport got 0.79 inches (0.03, 0.51, 0.25). The daily differences are due to the different times that the daily totals are reported (midnight at the airport, 6 AM at our house). I thought it would be interesting to look at how many days each year we experience rainfall greater than a particular amount. For example, in an average year, on how many days would we expect to get more than 0.4 inches of rain?

Methods

We’ll retrieve precipitation and snowfall data from the Airport, and exclude any day where there was measurable snowfall. Precipitation data includes measuring melted snowfall, and we’re only interested in rain events, so we discard any precipitation value that could be partially snow. The data comes from the Global Historical Climatology Network daily (GHCNd) data set, and we are retrieving data for the Fairbanks, Anchorage, Juneau, Portland (Oregon), and Rochester (New York) airports, as well as the Davis California experimental farm for the full extent of their record. We remove the data for the earliest and latest year because they are likely to be incomplete.

Code
library(tidyverse)
library(glue)
library(fs)
library(RPostgres)
library(units)
library(scales)
library(patchwork)
library(gt)

if (!file_exists("obs_pivot.rds")) {
  db <- dbConnect(
    Postgres(),
    host = "bbma",
    dbname = "noaa",
    timezone = "US/Alaska"
  )

  ghcnd_stations <- db |>
    tbl("ghcnd_stations") |>
    filter(
      station_name == "FAIRBANKS INTL AP" |
        station_name == "ROCHESTER GTR INTL AP" |
        station_name == "PORTLAND INTL AP" |
        station_name == "ANCHORAGE MERRILL FLD" |
        station_name == "JUNEAU INTL AP" |
        station_name == "DAVIS 2 WSW EXP FARM"
    )

  ghcnd_variables <- db |>
    tbl("ghcnd_variables") |>
    filter(variable %in% c("PRCP", "SNOW"))

  ghcnd_obs <- db |>
    tbl("ghcnd_obs") |>
    inner_join(ghcnd_stations, join_by(station_id)) |>
    inner_join(ghcnd_variables, join_by(variable)) |>
    filter(!is.na(raw_value)) |>
    mutate(value = raw_value * raw_multiplier) |>
    select(station_name, dte, variable, value, meas_flag) |>
    collect()

  obs_pivot <- ghcnd_obs |>
    mutate(
      value = set_units(set_units(value, "mm"), "inches"),
      value = case_when(
        meas_flag == "T" & variable == "PRCP" ~ set_units(0.005, "inches"),
        meas_flag == "T" & variable == "SNOW" ~ set_units(0.05, "inches"),
        TRUE ~ value
      )
    ) |>
    select(station_name, dte, variable, value) |>
    pivot_wider(id_cols = c(station_name, dte), names_from = variable, values_from = value)

  saveRDS(obs_pivot, "obs_pivot.rds")
} else {
  obs_pivot <- readRDS("obs_pivot.rds")
}

# Remove earliest/latest year
obs_pivot <- obs_pivot |>
  group_by(station_name) |>
  filter(
    year(dte) != min(year(dte)),
    year(dte) != min(year(dte))
  ) |>
  ungroup()

just_rain <- obs_pivot |>
  filter(
    !is.na(PRCP),
    !is.na(SNOW),
    SNOW == set_units(0, "inches")
  )

After filtering the data to contain only rain, we’ll write a function to count the number of days in each year where the rainfall was greater than or equal to the value passed in as an argument. Then summarize those yearly values into the mean, median, maximum, and 5th and 95th percentiles. Of those statistics, median and maximum are the most useful. Because these are counts (days where an event occurs) the middle value (median) is a better estimator than the average (mean), which will be biased in a skewed distribution.

Finally, we select a series of rainfall amounts from 0 to 4 inches of rain in a day, and run the function for each of these values. We also created a subset of the data since 1990 to see if the pattern of rain has changed more recently.

Code
days_gte <- function(prcp, rain) {
  rain |>
    mutate(
      year = year(dte),
      gte = as.integer(PRCP >= set_units(prcp, "inches"))
    ) |>
    summarize(days = sum(gte), .by = c(station_name, year)) |>
    summarize(
      mean = mean(days),
      median = median(days),
      prob05 = quantile(days, prob = 0.05),
      prob95 = quantile(days, prob = 0.95),
      max = max(days),
      n = n(),
      .by = station_name
    ) |>
    mutate(prcp_level = prcp) |>
    select(prcp_level, everything())
}

if (!file_exists("all_recent.rds")) {
  days_rain <- map(
    seq(0, 3.4, 0.01),
    function(prcp) days_gte(prcp, just_rain)
  ) |>
    list_rbind()

  days_rain_1990 <- map(
    seq(0, 3.4, 0.01),
    function(prcp) {
      days_gte(
        prcp,
        just_rain |> filter(dte >= ymd("1990-01-01"))
      )
    }
  ) |>
    list_rbind()

  all_recent <- bind_rows(
    days_rain |> mutate(period = "all years"),
    days_rain_1990 |> mutate(period = "since 1990")
  )

  saveRDS(all_recent, "all_recent.rds")
} else {
  all_recent <- readRDS("all_recent.rds")
}

Results

Fairbanks

Here’s a pair of plots showing the median (top plot) and maximum (bottom) number of days per year (the y-axis) for various rainfall amounts (the x-axis). The full record (1930‒2023) for the Fairbanks station is shown in orange, and the data from 1990 to 2023 is shown in dark cyan.

Code
median_plot <- ggplot(
  data = all_recent |> filter(station_name == "FAIRBANKS INTL AP"),
  aes(x = prcp_level, y = median, color = period),
  alpha = 0.75
) +
  geom_line() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 30)) +
  theme_bw() +
  scale_x_continuous(
    name = "Precipitation (inches)",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_y_continuous(
    name = "Median days per year",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_color_manual(
    values = c("darkorange", "darkcyan")
  ) +
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    plot.margin = margin(1, 1, 0, 1, "pt")
  ) +
  labs(title = "Median days of precipitation")
max_plot <- ggplot(
  data = all_recent |> filter(station_name == "FAIRBANKS INTL AP"),
  aes(x = prcp_level, y = max, color = period),
  alpha = 0.75
) +
  geom_line() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 30)) +
  theme_bw() +
  scale_x_continuous(
    name = "Precipitation (inches)",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_y_continuous(
    name = "Maximum days in one year",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_color_manual(
    values = c("darkorange", "darkcyan")
  ) +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.7)
  ) +
  labs(title = "Maximum days of precipitation")
median_plot / max_plot

From this plot, you can see that getting 0.4 inches of rain in one day happens about 4 times in an average year and that at least once, Fairbanks got 0.4 inches or more on 13 days in a single year.

There isn’t much of a difference between the two curves except a slight uptick in the number of days with heavier rainfall events. The dark cyan line (more recent data) looks like it’s slightly ahead of the orange line for rain above 0.45 inches. These patterns would probably be more pronounced if the graph were to compare pre- and post-1990 data instead of comparing all years against the last 33 years.

Alaska

What about the precipitation data from other locations in the state? Here’s a set of median plots for Fairbanks, Anchorage, and Juneau.

Code
station_labels <- tribble(
  ~station_name, ~station_label,
  "FAIRBANKS INTL AP", "Fairbanks Alaska",
  "ANCHORAGE MERRILL FLD", "Anchorage Alaska",
  "JUNEAU INTL AP", "Juneau Alaska",
  "ROCHESTER GTR INTL AP", "Rochester New York",
  "PORTLAND INTL AP", "Portland Oregon",
  "DAVIS 2 WSW EXP FARM" , "Davis California"
)
median_plot <- ggplot(
  data = all_recent |>
    filter(
      station_name %in% c(
        "FAIRBANKS INTL AP",
        "ANCHORAGE MERRILL FLD",
        "JUNEAU INTL AP"
      )
    ) |>
    inner_join(station_labels, join_by(station_name)) |>
    mutate(station_label = factor(
      station_label,
      levels = c(
        "Fairbanks Alaska", "Anchorage Alaska",
        "Juneau Alaska"
      )
    )),
  aes(x = prcp_level, y = median, color = period),
  alpha = 0.75
) +
  geom_line() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 30)) +
  theme_bw() +
  scale_x_continuous(
    name = "Precipitation (inches)",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_y_continuous(
    name = "Median days per year",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_color_manual(
    values = c("darkorange", "darkcyan")
  ) +
  facet_wrap(vars(station_label), ncol = 1)
median_plot

It’s a bit rainier in Anchorage than Fairbanks, but not by as much as I would have expected. Juneau, on the other hand, is off the charts rainy with more than 30 days in the year with more than half an inch of rain. The change in precipitation in the last 30 years is also quite pronounced in both Anchorage (lower precipitation compared with the full record) and Juneau (higher precipitation).

Outside

Finally, Portland, Rochester, and Davis California.

Code
station_labels <- tribble(
  ~station_name, ~station_label,
  "FAIRBANKS INTL AP", "Fairbanks Alaska",
  "ANCHORAGE MERRILL FLD", "Anchorage Alaska",
  "JUNEAU INTL AP", "Juneau Alaska",
  "ROCHESTER GTR INTL AP", "Rochester New York",
  "PORTLAND INTL AP", "Portland Oregon",
  "DAVIS 2 WSW EXP FARM" , "Davis California"
)
median_plot <- ggplot(
  data = all_recent |>
    filter(
      station_name %in% c(
        "FAIRBANKS INTL AP",
        "ROCHESTER GTR INTL AP",
        "PORTLAND INTL AP",
        "DAVIS 2 WSW EXP FARM"
      )
    ) |>
    inner_join(station_labels, join_by(station_name)) |>
    mutate(station_label = factor(
      station_label,
      levels = c(
        "Fairbanks Alaska", "Rochester New York",
        "Portland Oregon", "Davis California"
      )
    )),
  aes(x = prcp_level, y = median, color = period),
  alpha = 0.75
) +
  geom_line() +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 30)) +
  theme_bw() +
  scale_x_continuous(
    name = "Precipitation (inches)",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_y_continuous(
    name = "Median days per year",
    breaks = pretty_breaks(n = 10)
  ) +
  scale_color_manual(
    values = c("darkorange", "darkcyan")
  ) +
  facet_wrap(vars(station_label), ncol = 1)
median_plot

Rochester and Portland are similar to each other, with around 15 days with more than half an inch of rain per year, but opposite patterns recently, with Rochester experiencing slightly higher rainfall and slightly lower in Portland. Having lived in Davis for three years, I’m surprised at how rainy it actually is, with 10 days a year with more than half an inch of rain.

Tabular data

Here’s all the data in tabular form.

Code
station_labels <- tribble(
  ~station_name, ~station_label,
  "FAIRBANKS INTL AP", "Fairbanks Alaska",
  "ANCHORAGE MERRILL FLD", "Anchorage Alaska",
  "JUNEAU INTL AP", "Juneau Alaska",
  "ROCHESTER GTR INTL AP", "Rochester New York",
  "PORTLAND INTL AP", "Portland Oregon",
  "DAVIS 2 WSW EXP FARM", "Davis California"
)
tabular <- all_recent |>
  inner_join(station_labels, join_by(station_name)) |>
  mutate(station_label = factor(
    station_label,
    levels = c(
      "Fairbanks Alaska", "Anchorage Alaska",
      "Juneau Alaska", "Rochester New York",
      "Portland Oregon", "Davis California"
    )
  )) |>
  select(-station_name) |>
  filter(prcp_level %in% c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2, 2.5, 3)) |>
  pivot_wider(
    id_cols = c(station_label, prcp_level),
    names_from = period,
    values_from = c(median, max)
  ) |>
  group_by(station_label) |>
  arrange(station_label, prcp_level) |>
  gt() |>
  tab_spanner(
    label = "Median",
    columns = contains("median")
  ) |>
  tab_spanner(
    label = "Maximum",
    columns = contains("max")
  ) |>
  tab_spanner(
    label = "Days per year",
    columns = !c(station_label, prcp_level)
  ) |>
  cols_label(
    prcp_level = "Minimum Precipitation (inches)",
    `median_all years` = "All years",
    `median_since 1990` = "1990‒2023",
    `max_all years` = "All years",
    `max_since 1990` = "1990‒2023",
  )
tabular
Days per year
Minimum Precipitation (inches) Median Maximum
All years 1990‒2023 All years 1990‒2023
Fairbanks Alaska
0.0 264.0 263.0 300 295
0.1 20.0 21.0 36 31
0.2 11.0 10.0 22 21
0.3 6.0 6.0 16 16
0.4 4.0 4.0 13 13
0.5 2.0 2.0 9 9
0.6 1.0 2.0 9 9
0.8 0.0 1.0 8 8
0.9 0.0 0.0 5 4
1.0 0.0 0.0 4 4
1.5 0.0 0.0 1 1
2.0 0.0 0.0 1 1
2.5 0.0 0.0 1 0
3.0 0.0 0.0 1 0
Anchorage Alaska
0.0 254.0 199.5 336 264
0.1 25.0 16.0 39 38
0.2 15.0 8.5 26 26
0.3 9.0 4.5 17 14
0.4 5.0 3.5 12 9
0.5 2.0 1.5 8 6
0.6 1.0 1.0 6 6
0.8 0.0 0.0 3 2
0.9 0.0 0.0 3 2
1.0 0.0 0.0 2 1
1.5 0.0 0.0 1 1
2.0 0.0 0.0 0 0
2.5 0.0 0.0 0 0
3.0 0.0 0.0 0 0
Juneau Alaska
0.0 282.0 291.0 324 324
0.1 95.0 102.0 133 133
0.2 69.0 78.0 112 112
0.3 48.0 56.0 84 84
0.4 36.0 45.0 69 69
0.5 26.0 33.0 51 51
0.6 19.0 25.0 41 41
0.8 11.0 15.0 25 25
0.9 8.0 12.0 19 19
1.0 6.0 9.0 16 16
1.5 1.0 2.0 6 6
2.0 0.0 0.0 3 3
2.5 0.0 0.0 2 2
3.0 0.0 0.0 1 1
Rochester New York
0.0 266.5 268.0 306 306
0.1 48.5 54.0 68 68
0.2 35.5 36.0 51 51
0.3 24.0 25.0 41 41
0.4 18.0 19.0 31 31
0.5 14.0 16.0 26 26
0.6 10.0 11.0 19 19
0.8 5.0 6.0 14 14
0.9 4.0 5.0 12 12
1.0 3.0 4.0 9 9
1.5 1.0 1.0 6 6
2.0 0.0 0.0 3 2
2.5 0.0 0.0 2 2
3.0 0.0 0.0 1 1
Portland Oregon
0.0 349.0 350.5 362 360
0.1 78.0 70.5 103 96
0.2 54.0 46.5 76 68
0.3 34.0 31.0 54 49
0.4 25.0 21.0 42 35
0.5 16.0 14.0 31 23
0.6 13.0 9.5 23 16
0.8 7.0 5.0 14 9
0.9 5.0 3.5 11 8
1.0 4.0 2.0 9 8
1.5 1.0 0.0 4 4
2.0 0.0 0.0 3 3
2.5 0.0 0.0 0 0
3.0 0.0 0.0 0 0
Davis California
0.0 365.0 363.0 366 366
0.1 29.0 28.5 64 56
0.2 22.0 23.0 49 42
0.3 17.0 18.0 42 31
0.4 14.0 14.5 36 24
0.5 10.5 11.0 25 22
0.6 7.0 8.5 21 21
0.8 5.0 5.0 16 16
0.9 4.0 4.0 15 15
1.0 3.0 3.0 11 10
1.5 1.0 1.0 4 4
2.0 0.0 0.0 3 2
2.5 0.0 0.0 2 2
3.0 0.0 0.0 1 1