wed, 14-nov-2012, 05:29

Early-season ski from work

Yesterday a co-worker and I were talking about how we weren’t able to enjoy the new snow because the weather had turned cold as soon as the snow stopped falling. Along the way, she mentioned that it seemed to her that the really cold winter weather was coming later and later each year. She mentioned years past when it was bitter cold by Halloween.

The first question to ask before trying to determine if there has been a change in the date of the first cold snap is what qualifies as “cold.” My officemate said that she and her friends had a contest to guess the first date when the temperature didn’t rise above -20°F. So I started there, looking for the month and day of the winter when the maximum daily temperature was below -20°F.

I’m using the GHCN-Daily dataset from NCDC, which includes daily minimum and maximum temperatures, along with other variables collected at each station in the database.

When I brought in the data for the Fairbanks Airport, which has data available from 1948 to the present, there was absolutely no relationship between the first -20°F or colder daily maximum and year.

However, when I changed the definition of “cold” to the first date when the daily minimum temperature is below -40, I got a weak (but not statistically significant) positive trend between date and year.

The SQL query looks like this:

```SELECT year, water_year, water_doy, mmdd, temp
FROM (
SELECT year, water_year, water_doy, mmdd, temp,
row_number() OVER (PARTITION BY water_year ORDER BY water_doy) AS rank
FROM (
SELECT extract(year from dte) AS year,
extract(year from dte + interval '92 days') AS water_year,
extract(doy from dte + interval '92 days') AS water_doy,
to_char(dte, 'mm-dd') AS mmdd,
sum(CASE WHEN variable = 'TMIN'
THEN raw_value * raw_multiplier
ELSE NULL END
) AS temp
FROM ghcnd_obs
INNER JOIN ghcnd_variables USING(variable)
WHERE station_id = 'USW00026411'
GROUP BY extract(year from dte),
extract(year from dte + interval '92 days'),
extract(doy from dte + interval '92 days'),
to_char(dte, 'mm-dd')
ORDER BY water_year, water_doy
) AS foo
WHERE temp < -40 AND temp > -80
) AS bar
WHERE rank = 1
ORDER BY water_year;
```

I used “water year” instead of the actual year because the winter is split between two years. The water year starts on October 1st (we’re in the 2013 water year right now, for example), which converts a split winter (winter of 2012/2013) into a single year (2013, in this case). To get the water year, you add 92 days (the sum of the days in October, November and December) to the date and use that as the year.

Here’s what it looks like (click on the image to view a PDF version):

The dots are the observed date of first -40° daily minimum temperature for each water year, and the blue line shows a linear regression model fitted to the data (with 95% confidence intervals in grey). Despite the scatter, you can see a slightly positive slope, which would indicate that colder temperatures in Fairbanks are coming later now, than they were in the past.

As mentioned, however, our eyes often deceive us, so we need to look at the regression model to see if the visible relationship is significant. Here’s the R lm results:

```Call:
lm(formula = water_doy ~ water_year, data = first_cold)

Residuals:
Min      1Q  Median      3Q     Max
-45.264 -15.147  -1.409  13.387  70.282

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -365.3713   330.4598  -1.106    0.274
water_year     0.2270     0.1669   1.360    0.180

Residual standard error: 23.7 on 54 degrees of freedom
Multiple R-squared: 0.0331,     Adjusted R-squared: 0.01519
F-statistic: 1.848 on 1 and 54 DF,  p-value: 0.1796
```

The first thing to check in the model summary is the p-value for the entire model on the last line of the results. It’s only 0.1796, which means that there’s an 18% chance of getting these results simply by chance. Typically, we’d like this to be below 5% before we’d consider the model to be valid.

You’ll also notice that the coefficient of the independent variable (water_year) is positive (0.2270), which means the model predicts that the earliest cold snap is 0.2 days later every year, but that this value is not significantly different from zero (a p-value of 0.180).

Still, this seems like a relationship worth watching and investigating further. It might be interesting to look at other definitions of “cold,” such as requiring three (or more) consecutive days of -40° temperatures before including that period as the earliest cold snap. I have a sense that this might reduce the year to year variation in the date seen with the definition used here.

thu, 02-feb-2012, 16:56

In my last post I discussed the January 2012 cold snap, which was the fifth coldest January on record in Fairbanks. I wondered how much of this was due to the arbitrary choice of “month” as the aggregation level for the analysis. In other words, maybe January 2012 was really remarkable only because the cold happened to fall within the margins of that particular month?

So I ran a query against the GHCN daily database for the Fairbanks Airport station. This data set won’t allow me to answer the exact question I’m interested in because the data for Fairbanks only goes back to 1948 and because I don’t have the data for the last day in January 2012. But I think the analysis is still valid, even if it’s not perfect.

The following query calculates the average temperature in Fairbanks for every 31-day period possible, and ranks it according to a descending sort of the average temperature for those periods. The results show the start date, end date, the year and month that the majority of data appears in (more on that later), and the average temperature over the period.

One other note: the temperatures in this post are in degrees Celsius, which is why they’re different than those in my previous post (in °F).

```SELECT date(dte - interval '15 days') AS start,
date(dte + interval '15 days') AS end,
to_char(dte, 'YYYY-MM') AS yyyymm,
round(avg_31_days::numeric, 1),
rank() OVER (ORDER BY avg_31_days) AS rank
FROM (
SELECT dte,
avg(tavg_c) OVER (
ORDER BY dte
ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING
) AS avg_31_days
FROM (
SELECT dte,
(sum(CASE WHEN variable = 'TMIN'
THEN raw_value * 0.1
ELSE NULL END)
+ sum(CASE WHEN variable = 'TMAX'
THEN raw_value * 0.1
ELSE NULL END)) / 2.0 AS tavg_c
FROM ghcnd_obs
WHERE station_id = 'USW00026411'
GROUP BY dte
) AS foo
) AS bar
ORDER BY rank;
```

The funky CASE WHEN ... END stuff in the innermost query is because of the structure of the data in the GHCN database. The exciting part is the window function in the first subquery that calculates the average temperature for a 31-day window surrounding every date in the database.

Some of the results:

start end yyyymm avg (°C) rank
1964-12-10 1965-01-09 1964-12 -36.9 1
1964-12-09 1965-01-08 1964-12 -36.9 2
1964-12-11 1965-01-10 1964-12 -36.9 2
1971-01-06 1971-02-05 1971-01 -36.8 4
1964-12-08 1965-01-07 1964-12 -36.7 5
1971-01-07 1971-02-06 1971-01 -36.6 6
1971-01-05 1971-02-04 1971-01 -36.5 7
1964-12-12 1965-01-11 1964-12 -36.4 8
1968-12-21 1969-01-20 1969-01 -36.3 9
1968-12-22 1969-01-21 1969-01 -36.3 10
...
2012-01-14 2012-02-13 2012-01 -33.9 42
2012-01-13 2012-02-12 2012-01 -33.9 43

There are some interesting things here. First, the January we just went through doesn’t show up until the 42nd coldest. You can also see that there was a very cold period from mid-December 1964 through mid-January 1964. This “even” appears five times in the first ten coldest periods. The January 1971 event occurs three times in the top ten.

To me, this means a couple things. First, we’re diluting the rankings because the same cold weather event shows up multiple times. If we use the yyyymm column to combine these, we can get a better sense of where January 2012 fits. Also, if an event shows up on here a bunch of times, that probably means that the event was longer than the 31-day window we’ve set at the outset. If you look at the minimum and maximum dates for the 1964 event, the real even lasted from December 8, 1964 through January 11, 1964 (35 days).

If we use this query as a subquery of one that groups on yyyymm, we’ll get a ranking of the overall events:

```SELECT yyyymm, min_avg,
rank() OVER (ORDER BY min_rank)
FROM (
SELECT yyyymm,
min(avg_31_days) AS min_avg,
min(rank) AS min_rank
FROM (ABOVE QUERY) AS foobie
GROUP BY yyyymm
) AS barfoo
ORDER BY min_rank;
```

Here’s the results of the new query:

yyyymm min_avg (°C) rank
1964-12 -36.9 1
1971-01 -36.8 2
1969-01 -36.3 3
1951-01 -34.2 4
1996-11 -34.0 5
2012-01 -33.9 6
1953-01 -33.7 7
1956-12 -33.5 8
1966-01 -33.3 9
1965-01 -33.2 10

Our cold snap winds up in sixth place. I’m sure that if I had all the data the National Weather Service used in their analysis, last month would drop even lower in an analysis like this one.

My conclusion: last month was a really cold month. But the exceptional nature of it was at least partly due to the coincidence of it happening within the confines of an arbitrary 31-day period called “January.”

tags: SQL  weather
sat, 11-jul-2009, 11:04

West weather sensors

Fairbanks had some very hot weather earlier this week, including breaking a high temperature record at the airport on Wednesday afternoon. We got up to 94°F here on the Creek, but what was worse was that the low temperature on Wednesday night was 60°F, too warm to cool the house much. We were tempted to sleep out in the back cabin because the house was so warm.

It’s been cooler in the last couple days. But how much cooler?

The new version of PostgreSQL (8.4) has support for window functions, which make it easier to answer questions like this. Window functions allow you to calculate aggregates (average, sum, etc.) at one level, while display the data from another level. In this case, I want to compare the hourly average temperatures over the last 24 hours with the overall hourly average temperature over the past week. Without window functions, I’d need to combine a query that yields the hourly average temperature over the last 24 hours with a query that calculates overall seven-day hourly average temperatures. And if I want the difference between the two, the first two queries become a subquery of a third query.

Here’s the query:

```SELECT dt, t_avg,
seven_day_avg::numeric(4,1),
(t_avg - seven_day_avg)::numeric(4,1) AS anomaly
FROM (
SELECT dt, t_avg::numeric(4,1),
avg(t_avg::numeric(4,1)) OVER (PARTITION BY extract(hour from dt)) AS seven_day_avg
FROM hourly WHERE dt > current_timestamp - interval ’7 days’
) AS sevenday
WHERE dt > current_timestamp - interval ’24 hours’
ORDER BY dt;
```

And the result for the last 24 hours of temperature data:

```         dt       | t_avg | seven_day_avg | anomaly
------------------+-------+---------------+---------
2009-07-10 11:00 |  67.1 |          72.3 |    -5.2
2009-07-10 12:00 |  70.8 |          75.8 |    -5.0
2009-07-10 13:00 |  72.9 |          77.4 |    -4.5
2009-07-10 14:00 |  74.1 |          78.5 |    -4.4
2009-07-10 15:00 |  74.6 |          80.2 |    -5.6
2009-07-10 16:00 |  75.9 |          80.4 |    -4.5
2009-07-10 17:00 |  76.1 |          81.0 |    -4.9
2009-07-10 18:00 |  76.9 |          80.4 |    -3.5
2009-07-10 19:00 |  76.5 |          79.3 |    -2.8
2009-07-10 20:00 |  73.1 |          77.0 |    -3.9
2009-07-10 21:00 |  69.1 |          73.5 |    -4.4
2009-07-10 22:00 |  63.7 |          68.2 |    -4.5
2009-07-10 23:00 |  57.6 |          62.3 |    -4.7
2009-07-11 00:00 |  52.1 |          56.5 |    -4.4
2009-07-11 01:00 |  48.5 |          52.6 |    -4.1
2009-07-11 02:00 |  45.5 |          49.3 |    -3.8
2009-07-11 03:00 |  43.4 |          47.9 |    -4.5
2009-07-11 04:00 |  42.2 |          47.1 |    -4.9
2009-07-11 05:00 |  44.8 |          47.6 |    -2.8
2009-07-11 06:00 |  47.5 |          49.7 |    -2.2
2009-07-11 07:00 |  51.2 |          53.8 |    -2.6
2009-07-11 08:00 |  55.3 |          59.2 |    -3.9
2009-07-11 09:00 |  60.4 |          64.0 |    -3.6
2009-07-11 10:00 |  65.5 |          68.3 |    -2.8
```

Conclusion? It’s been several degrees cooler in the last 24 hours compared with the last seven days.

And window functions are groovy.

sat, 13-dec-2008, 16:43

Heat gun, vent pipe

Last winter our vent pipe froze solid and I spent a couple hours in the attic with a heat gun melting the blockage. A couple days ago I noticed that flushing the toilet was pulling water out of the traps in the sink and bathtub, so I knew the vent was getting plugged again. My attempt at a fix over the summer was to put a larger pipe over the vent on the roof so the condensation might happen in the outer pipe, which could be easily removed and cleaned out. Unfortunately, when I got up on the roof today, not only was it impossible to get the outer pipe free, but the growing constriction wasn’t in the outer pipe anyway. Mid-summer my neighbor suggested replacing the section of the vent in the attic with a much larger diameter pipe (6” was his suggestion) and then insulating it. I never got around to it; almost frozen vent pipe; a priori.

The total freezing degree days to this point in the winter season has been 2,258, which ought to give me a pretty good way of estimating when I’ll have another problem. Thus far we’ve had 74 days with an average daily temperature below freezing, and the average temperature for those days was 1.4°F. That’s below normal for the year: nine of the last eleven weeks have been below average. But the heart of the winter is approaching, so it’ll take many fewer than 74 days to double our freezing degree days for the season.

My first attempt to fix the clog was to insert a heat gun into the clean out and let the warm air from the heat gun melt it (shown in the photo). I don’t think this had any real effect. I kept it going for a little over an hour, monitoring the backside of the pipe to make sure the heat gun wasn’t melting it. But it was around -10°F this morning, so I’ll bet the hot air was pretty cold by the time it rose to the ice. The second attempt was using a pipe snake from up on the roof, but it couldn’t go past the constriction, and even if it had, I’m not sure it really would have cleared out much ice. Finally, I dragged a five gallon bucket filled with hot water up onto the roof. I got about three gallons into the vent before it was filled to the top. At first I was nervous that this wasn’t going to work and I’d have three gallons of water turning to ice in the vent, but the water started dropping slowly, and after a minute, the blockage gave way and hot water came plunging down the pipe. I poured the final gallon or two through the pipe, and came down off the roof.

Tomorrow I’ll put some 2” fiberglass insulation around the part of the pipe that’s exposed in the attic, which should help. And I’ll be keeping my eye on the total freezing degree days for the rest of the winter. Once it gets up to 4,000, I may want to go back up on the roof with another bucket of hot water.

SQL query from my weather database: SELECT sum(32.0 - t_avg::double precision) FROM daily;
Meta Photolog Archives