thu, 02-feb-2012, 16:56

In my last post I discussed the January 2012 cold snap, which was the fifth coldest January on record in Fairbanks. I wondered how much of this was due to the arbitrary choice of “month” as the aggregation level for the analysis. In other words, maybe January 2012 was really remarkable only because the cold happened to fall within the margins of that particular month?

So I ran a query against the GHCN daily database for the Fairbanks Airport station. This data set won’t allow me to answer the exact question I’m interested in because the data for Fairbanks only goes back to 1948 and because I don’t have the data for the last day in January 2012. But I think the analysis is still valid, even if it’s not perfect.

The following query calculates the average temperature in Fairbanks for every 31-day period possible, and ranks it according to a descending sort of the average temperature for those periods. The results show the start date, end date, the year and month that the majority of data appears in (more on that later), and the average temperature over the period.

One other note: the temperatures in this post are in degrees Celsius, which is why they’re different than those in my previous post (in °F).

```SELECT date(dte - interval '15 days') AS start,
date(dte + interval '15 days') AS end,
to_char(dte, 'YYYY-MM') AS yyyymm,
round(avg_31_days::numeric, 1),
rank() OVER (ORDER BY avg_31_days) AS rank
FROM (
SELECT dte,
avg(tavg_c) OVER (
ORDER BY dte
ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING
) AS avg_31_days
FROM (
SELECT dte,
(sum(CASE WHEN variable = 'TMIN'
THEN raw_value * 0.1
ELSE NULL END)
+ sum(CASE WHEN variable = 'TMAX'
THEN raw_value * 0.1
ELSE NULL END)) / 2.0 AS tavg_c
FROM ghcnd_obs
WHERE station_id = 'USW00026411'
GROUP BY dte
) AS foo
) AS bar
ORDER BY rank;
```

The funky CASE WHEN ... END stuff in the innermost query is because of the structure of the data in the GHCN database. The exciting part is the window function in the first subquery that calculates the average temperature for a 31-day window surrounding every date in the database.

Some of the results:

start end yyyymm avg (°C) rank
1964-12-10 1965-01-09 1964-12 -36.9 1
1964-12-09 1965-01-08 1964-12 -36.9 2
1964-12-11 1965-01-10 1964-12 -36.9 2
1971-01-06 1971-02-05 1971-01 -36.8 4
1964-12-08 1965-01-07 1964-12 -36.7 5
1971-01-07 1971-02-06 1971-01 -36.6 6
1971-01-05 1971-02-04 1971-01 -36.5 7
1964-12-12 1965-01-11 1964-12 -36.4 8
1968-12-21 1969-01-20 1969-01 -36.3 9
1968-12-22 1969-01-21 1969-01 -36.3 10
...
2012-01-14 2012-02-13 2012-01 -33.9 42
2012-01-13 2012-02-12 2012-01 -33.9 43

There are some interesting things here. First, the January we just went through doesn’t show up until the 42nd coldest. You can also see that there was a very cold period from mid-December 1964 through mid-January 1964. This “even” appears five times in the first ten coldest periods. The January 1971 event occurs three times in the top ten.

To me, this means a couple things. First, we’re diluting the rankings because the same cold weather event shows up multiple times. If we use the yyyymm column to combine these, we can get a better sense of where January 2012 fits. Also, if an event shows up on here a bunch of times, that probably means that the event was longer than the 31-day window we’ve set at the outset. If you look at the minimum and maximum dates for the 1964 event, the real even lasted from December 8, 1964 through January 11, 1964 (35 days).

If we use this query as a subquery of one that groups on yyyymm, we’ll get a ranking of the overall events:

```SELECT yyyymm, min_avg,
rank() OVER (ORDER BY min_rank)
FROM (
SELECT yyyymm,
min(avg_31_days) AS min_avg,
min(rank) AS min_rank
FROM (ABOVE QUERY) AS foobie
GROUP BY yyyymm
) AS barfoo
ORDER BY min_rank;
```

Here’s the results of the new query:

yyyymm min_avg (°C) rank
1964-12 -36.9 1
1971-01 -36.8 2
1969-01 -36.3 3
1951-01 -34.2 4
1996-11 -34.0 5
2012-01 -33.9 6
1953-01 -33.7 7
1956-12 -33.5 8
1966-01 -33.3 9
1965-01 -33.2 10

Our cold snap winds up in sixth place. I’m sure that if I had all the data the National Weather Service used in their analysis, last month would drop even lower in an analysis like this one.

My conclusion: last month was a really cold month. But the exceptional nature of it was at least partly due to the coincidence of it happening within the confines of an arbitrary 31-day period called “January.”

tags: SQL  weather
Meta Photolog Archives