Days above 80 °F

weather
temperature
R
Published

June 26, 2024

Cranes in the smoke

Introduction

It’s hot, sunny, and smoky in Fairbanks today, the seventh day this year with a high temperature of 80 °F or above. Air conditioners are uncommon in Fairbanks because we reach 90 °F in fewer than 30% of years, and it cools off at night. Of course, opening the windows to cool off the house at night isn’t an option when the AQI is 348, as it is right now.

One way to think about high temperatures is to count how many days in a year the high is warmer than a particular threshold. For example, Phoenix, Arizona, averages 235 days per year with a high temperature above 80 °F and a whopping 171 days above 90 °F.
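Counting the days above a threshold amounts to summing a true/false condition within each year. Here is a minimal base R sketch. The daily highs are made up and stand in for real observations, and `count_days()` is a hypothetical helper:

```r
# Hypothetical daily highs (degrees F) for two years; not real station data.
highs <- data.frame(
  year   = c(2022, 2022, 2022, 2022, 2023, 2023, 2023),
  temp_f = c(78, 81, 85, 92, 79, 80, 88)
)

# Count days at or above a threshold in each year by summing a logical vector.
count_days <- function(df, threshold) {
  tapply(df$temp_f >= threshold, df$year, sum)
}

count_days(highs, 80)  # 2022: 3, 2023: 2
count_days(highs, 90)  # 2022: 1, 2023: 0
```

Because `tapply()` groups over the years present in the data, 2023 shows a count of 0 at 90 °F rather than disappearing. That only works when every year has at least one row.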

Methods

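We’ll get daily high temperatures (TMAX) for the Fairbanks and Phoenix airports from the GHCN-Daily database. For each year, we’ll count the number of days when the high was 80 °F or above, and separately the number when it was 90 °F or above. Only years with more than 360 days of observations are included.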
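GHCN-Daily stores TMAX in tenths of a degree Celsius, which is why the query multiplies `raw_value` by 0.1. The sketch below assumes what the code comment about the F->C->F conversion suggests: the airport observations were taken in whole °F and rounded to the nearest 0.1 °C for storage. Under that assumption, the sketch shows why the filters use 79.9 and 89.9 instead of 80 and 90. The helpers `f_to_stored()` and `round_trip()` are hypothetical:

```r
# Convert GHCN-Daily TMAX (tenths of a degree C) to degrees F, as in the query.
tenths_c_to_f <- function(raw_value) raw_value * 0.1 * 9 / 5 + 32

# Assumption: a whole-degree F reading is stored as C rounded to 0.1 degree.
f_to_stored <- function(temp_f) round((temp_f - 32) * 5 / 9 * 10)

round_trip <- function(temp_f) tenths_c_to_f(f_to_stored(temp_f))

round_trip(80)  # 80.06
round_trip(90)  # 89.96, which a filter of temp_f >= 90 would miss
round_trip(89)  # 89.06

# Storage rounding moves whole-degree readings by less than 0.1 F, so a
# cutoff 0.1 F below the threshold keeps every reading at or above it.
max(abs(round_trip(-60:110) - (-60:110)))
```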

library(tidyverse)
library(glue)
library(fs)
library(RPostgres)
library(gt)
library(rswingley)

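# Query the NOAA database unless both cached result files already exist.
if (!all(file_exists(c("above_80_by_year.rds", "above_80_by_year_month.rds")))) {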
  db <- dbConnect(
    Postgres(),
    host = "bbma",
    dbname = "noaa",
    user = "reader",
    timezone = "US/Alaska"
  )

  # Station IDs and names for the two airports
  airport <- db |>
    tbl("ghcnd_stations") |>
    filter(str_detect(station_name, "(FAIRBANKS INTL|PHOENIX SKY HARBOR INTL)")) |>
    select(station_id, station_name)

  # Daily highs at or above 80 F, limited to years with more than 360 TMAX observations
  above_80 <- db |>
    tbl("ghcnd_obs") |>
    inner_join(airport, join_by(station_id)) |>
    filter(variable == "TMAX") |>
    mutate(year = year(dte)) |>
    group_by(station_id, station_name, year) |>
    mutate(days = n()) |>
    filter(days > 360) |>
    ungroup() |>
    mutate(temp_f = raw_value * 0.1 * 9 / 5 + 32) |>
    filter(temp_f >= 79.9) |> # 0.1 degree wiggle for F->C->F conversion
    select(station_id, station_name, year, dte, temp_f)

  # Same query with a 90 F threshold
  above_90 <- db |>
    tbl("ghcnd_obs") |>
    inner_join(airport, join_by(station_id)) |>
    filter(variable == "TMAX") |>
    mutate(year = year(dte)) |>
    group_by(station_id, station_name, year) |>
    mutate(days = n()) |>
    filter(days > 360) |>
    ungroup() |>
    mutate(temp_f = raw_value * 0.1 * 9 / 5 + 32) |>
    filter(temp_f >= 89.9) |> # 0.1 degree wiggle for F->C->F conversion
    select(station_id, station_name, year, dte, temp_f)

  # Count qualifying days per station and year, excluding the incomplete 2024
  by_year <- above_80 |>
    filter(year < 2024) |>
    group_by(station_name, year) |>
    summarize(
      above_80 = n(),
      .groups = "drop"
    ) |>
    collect() |>
    left_join(
      above_90 |>
        filter(year < 2024) |>
        group_by(station_name, year) |>
        summarize(
          above_90 = n(),
          .groups = "drop"
        ) |>
        collect(),
      join_by(station_name, year)
    )

  # Every station and year from 1934 to 2023; years absent from by_year become zero below
  gap_fill <- expand_grid(
    station_name = unique(by_year$station_name),
    year = seq(1934, 2023)
  )

  by_year_filled <- gap_fill |>
    left_join(by_year, join_by(station_name, year)) |>
    mutate(
      above_80 = coalesce(as.integer(above_80), 0),
      above_90 = coalesce(as.integer(above_90), 0)
    ) |>
    filter(
      station_name == "FAIRBANKS INTL AP" |
      year > 1947 # Bad Phoenix data in 1947 and earlier
    )

  # Monthly counts for Fairbanks only
  by_year_month <- above_80 |>
    filter(year < 2024, station_name == "FAIRBANKS INTL AP") |>
    mutate(month = month(dte)) |>
    group_by(station_name, year, month) |>
    summarize(
      above_80 = n(),
      .groups = "drop"
    ) |>
    collect()

  gap_fill <- expand_grid(
    station_name = unique(by_year_month$station_name),
    year = seq(1930, 2023),
    month = seq(5, 9)
  )

  by_year_month_filled <- gap_fill |>
    left_join(by_year_month, join_by(station_name, year, month)) |>
    mutate(
      above_80 = coalesce(as.integer(above_80), 0)
    )

  dbDisconnect(db)

  saveRDS(by_year_filled, "above_80_by_year.rds")
  saveRDS(by_year_month_filled, "above_80_by_year_month.rds")
} else {
  by_year_filled <- readRDS("above_80_by_year.rds")
  by_year_month_filled <- readRDS("above_80_by_year_month.rds")
}

# Percentiles of the yearly Fairbanks counts, one column per percentile
quantiles <- quantile(
  by_year_filled |>
    filter(station_name == "FAIRBANKS INTL AP") |>
    pull(above_80),
  probs = c(0, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 1)
) |>
  enframe(name = "quantile", value = "days") |>
  pivot_wider(names_from = quantile, values_from = days)

# 10th, 50th, and 90th percentiles of the monthly Fairbanks counts
monthly_pattern <- by_year_month_filled |>
  group_by(month) |>
  summarize(
    days = list(quantile(
      as.integer(above_80),
      probs = c(0.1, 0.5, 0.9)
    ))
  ) |>
  unnest_wider(days) |>
  mutate(month_name = month.name[month]) |>
  select(month_name, `10%`, `50%`, `90%`)

# Correlations between monthly counts across years, May through August
monthly_correlations <- by_year_month_filled |>
  mutate(month_name = month.name[month]) |>
  pivot_wider(id_cols = c(station_name, year), names_from = month_name, values_from = above_80) |>
  select(May:August) |>
  cor()
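The query drops years with 360 or fewer TMAX observations. The `expand_grid()` and `coalesce()` steps then add back years that had no qualifying days as zeros, so those years still count toward the percentiles. Here is a base R sketch of the same two ideas, using made-up counts:

```r
# Hypothetical inputs: TMAX observations per year, and counts of days >= 80 F
# for the years that had at least one such day.
obs_per_year <- data.frame(
  year  = c(2018, 2019, 2020, 2021, 2022, 2023),
  n_obs = c(365, 365, 300, 365, 365, 365)
)
hot_days <- data.frame(year = c(2018, 2019, 2022), above_80 = c(12, 4, 20))

# Keep only years with nearly complete records (more than 360 observations).
complete_years <- obs_per_year$year[obs_per_year$n_obs > 360]

# Left join the counts onto every complete year, then turn missing counts into 0.
filled <- merge(data.frame(year = complete_years), hot_days,
                by = "year", all.x = TRUE)
filled$above_80[is.na(filled$above_80)] <- 0

median(filled$above_80)    # 4, counting 2021 and 2023 as zero-day years
median(hot_days$above_80)  # 12, if the zero years were dropped instead
```

One design choice here: zeros are filled only for years that passed the completeness check. A year with too much missing data is left out rather than counted as a zero-day year.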

Results

Because we’re dealing with a count (the number of days in a year above 80 °F), the values can’t go below zero, and a few very hot years stretch out the upper tail. Under these conditions, the mean and standard deviation aren’t as useful as statistics that don’t assume the data are approximately normal. The median is more informative here, and percentiles give a better view of the distribution.

For example, the 50th percentile is 11 days/year. This means that half of the years had 11 or fewer days above 80 °F and half had 11 or more. This is the median, or what we’d expect in a typical year in Fairbanks. The 0th percentile is the minimum (1 day in 1945), and the 100th percentile is the maximum (36 days in 2013).
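A small made-up example shows why the median is the steadier summary for counts like these. Adding a single unusually hot year moves the mean by more than three days, but moves the median by only half a day:

```r
# Hypothetical yearly counts of days >= 80 F, then the same years plus one very hot year.
typical  <- c(5, 7, 9, 11, 12, 14, 16)
with_hot <- c(typical, 40)

c(mean = mean(typical), median = median(typical))    # mean about 10.6, median 11
c(mean = mean(with_hot), median = median(with_hot))  # mean 14.25, median 11.5
```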

Percentiles, days per year above 80 °F
Percentile  0%  1%    5%    10%  25%  50%  75%  90%  95%    99%    100%
Days        1   1.89  3.45  4.9  7    11   15   20   23.55  30.66  36
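Percentiles like 1.89 and 23.55 are not whole days because R’s `quantile()` defaults to type 7, which interpolates linearly between neighboring sorted values. The hypothetical `type7_quantile()` below writes that rule out by hand and checks it against `quantile()` on made-up counts (not the Fairbanks record):

```r
# Type 7 quantile (R's default): find position h = (n - 1) * p + 1 in the
# sorted data and interpolate linearly between the values on either side.
type7_quantile <- function(x, p) {
  x  <- sort(x)
  h  <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

# Hypothetical yearly counts.
counts <- c(1, 3, 4, 7, 11, 15, 36)
probs  <- c(0, 0.01, 0.1, 0.5, 0.95, 1)

type7_quantile(counts, probs)           # about 1, 1.12, 2.2, 7, 29.7, 36
quantile(counts, probs, names = FALSE)  # the same values
```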

So far this year we’ve had 7 days above 80 °F. In a median year, Fairbanks ends the summer with 11. The distribution by month is shown below.

Days above 80 °F by month (10th, 50th, and 90th percentiles)
Month 10% 50% 90%
May 0 0.0 2.0
June 0 4.0 10.0
July 1 4.5 9.0
August 0 1.0 4.7

June and July have the most days above 80 °F. In hot years, June tends to have more (a 90th percentile of 10 days, compared with 9 for July). In a typical year, July has slightly more (a median of 4.5 days, compared with 4 for June).

What this doesn’t tell us is whether a June with more hot days than usual, like this one, means the rest of the summer is likely to be warmer than normal too. To check, we can compute the within-year correlations between the monthly counts. For each pair of months, a correlation measures how closely the number of days above 80 °F in one month tracks the number in the other, across all years.
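The correlation table needs one row per year and one column per month. The sketch below shows that reshaping and the correlation step in base R. It uses four years of made-up June to August counts in place of the real data; the post’s own code does the same with `pivot_wider()` and `cor()`:

```r
# Hypothetical long-format counts: one row per year and month.
long <- data.frame(
  year     = rep(2020:2023, each = 3),
  month    = rep(c("June", "July", "August"), times = 4),
  above_80 = c(6, 3, 1,  2, 5, 0,  9, 4, 2,  4, 8, 3)
)

# Reshape to a year-by-month matrix (xtabs sums above_80 within each cell),
# then put the months back in calendar order.
wide <- unclass(xtabs(above_80 ~ year + month, data = long))
wide <- wide[, c("June", "July", "August")]

# Pearson correlation between each pair of month columns, computed across years.
round(cor(wide), 2)
```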

Within-year correlations between months
Month May June July August
May 1.00 −0.04 0.18 0.16
June −0.04 1.00 0.07 0.14
July 0.18 0.07 1.00 0.28
August 0.16 0.14 0.28 1.00

A correlation of 0 means no linear relationship, 1 means a perfect positive relationship, and −1 means a perfect negative relationship, in which a high value in the first month goes with a low value in the second. All of the between-month correlations for days above 80 °F are weak; the largest is 0.28, between July and August. So a hot June gives little reason to predict a hot July (a correlation of 0.07).
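Pearson’s correlation, which `cor()` computes by default, is the covariance of two series divided by the product of their standard deviations. Squaring it gives the share of one month’s year-to-year variation that a straight-line fit on the other month accounts for. This puts the small values in the table in perspective: a June to July correlation of 0.07 corresponds to about 0.5%. The vectors below are made up; only 0.07 and 0.28 come from the table, and `pearson()` is a hypothetical helper:

```r
# Pearson correlation written out: the sum of centered cross-products divided
# by the square root of the product of the centered sums of squares.
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Hypothetical June and July counts for five years.
june <- c(2, 4, 6, 8, 10)
july <- c(3, 1, 6, 5, 9)
all.equal(pearson(june, july), cor(june, july))  # TRUE
pearson(june, rev(june))                         # -1: perfectly opposite

# Share of variance a straight-line fit explains (r squared) for two table values.
c(june_july = 0.07, july_august = 0.28)^2  # 0.0049 and 0.0784
```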

Conclusion

This is a long-winded way of saying that it’s been a warm June this year, but there’s no particular reason to think the rest of the summer will be as hot as it has been so far.