A couple of days ago I got an email from a Galoot who was hoping to come north to see the aurora and wondered if March was a good time to come to Fairbanks. March and September are two of my favorite months, but I wanted to check whether March really is sunny, or whether it only seems that way because it’s the month when winter begins to turn to spring in Fairbanks, with longer days and white snow on the ground making everything look brighter.
I found three sources of data for “cloudiness.” I’ve been parsing the Fairbanks Airport daily climate summary since 2002, and it has a value in it called Average Sky Cover which ranges from 0.0 (completely clear) to 1.0 (completely cloudy). I’ll call this data “pafa.”
The second source is the Global Historical Climatology Network - Daily data for the Fairbanks Airport station. There’s a variable in there named ACMH, which is described as Cloudiness, midnight to midnight (percentage). For the Airport station, this value appears in the database from 1965 through 1997. One reassuring thing about this parameter is that it specifically says it’s from midnight to midnight, so it includes cloudiness when it was dark outside (when the aurora would be visible if it were present). This data set is named “ghcnd.”
The final source is modelled data from the North American Regional Reanalysis. This data set includes TCDC, or total cloud cover (percentage), and is available in three-hour increments over a grid covering North America. I chose the nearest grid point to the Fairbanks Airport and retrieved the daily mean of total cloud cover for the period of the database I have downloaded (1979–2012). In the plots that follow, this is named “narr.”
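As a rough illustration of what “daily mean” means for three-hourly data, here is a minimal base R sketch. The data frame `narr_3h`, its column names, and all of the values are made up for the example; the real NARR extraction isn’t shown in this post, and the choice of UTC for defining a day is an assumption.

```r
# Hypothetical three-hourly total cloud cover (percent) for two days;
# the values are invented for illustration only.
narr_3h <- data.frame(
  time = seq(as.POSIXct("2012-03-01 00:00", tz = "UTC"),
             by = "3 hours", length.out = 16),
  tcdc = c(10, 20, 0, 0, 30, 40, 20, 0,        # 2012-03-01
           80, 90, 100, 100, 70, 60, 50, 90)   # 2012-03-02
)

# Collapse each timestamp to its calendar date (UTC here; a real analysis
# would have to decide which time zone defines a "day").
narr_3h$date <- as.Date(narr_3h$time, tz = "UTC")

# Daily mean of the eight three-hourly values
narr_daily <- aggregate(tcdc ~ date, data = narr_3h, FUN = mean)
names(narr_daily)[2] <- "narr"
print(narr_daily)
```

With these invented values the first day averages 15 percent and the second 80 percent.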
After reading the data and merging the three data sets together, I generate monthly means of cloud cover (scaled to percentages from 0 to 100) in each of the data sets, in R:
library(plyr)
library(lubridate)  # month() comes from lubridate

# Full outer joins on date, so each data set keeps its whole period of record
cloud_cover <- merge(pafa, ghcnd, by = 'date', all = TRUE)
cloud_cover <- merge(cloud_cover, narr, by = 'date', all = TRUE)
cloud_cover$month <- month(cloud_cover$date)

# Monthly mean for each source, ignoring days with missing values
by_month_mean <- ddply(
    subset(cloud_cover, select = c('month', 'pafa', 'ghcnd', 'narr')),
    .(month), summarise,
    pafa = mean(pafa, na.rm = TRUE),
    ghcnd = mean(ghcnd, na.rm = TRUE),
    narr = mean(narr, na.rm = TRUE))

# Ordered month labels for the x-axis
by_month_mean$mon <- factor(by_month_mean$month,
    labels = c('jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec'))
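To see what the outer merge and the `na.rm = TRUE` means are doing, here is a small base R sketch on toy data. The dates and values are invented, only two of the three sources are included, and scaling “pafa” from its 0.0 to 1.0 range to percent at this point is an assumption about where that step happens in the real workflow.

```r
# Toy daily records; all values are invented for illustration.
# pafa arrives as a 0.0-1.0 fraction, so scale it to percent to match
# the other sources (where this happens in the real workflow is assumed).
pafa <- data.frame(date = as.Date(c("2005-03-01", "2005-03-02")),
                   pafa = c(0.1, 0.5) * 100)
ghcnd <- data.frame(date = as.Date(c("1990-03-01", "1990-08-01")),
                    ghcnd = c(20, 80))

# all = TRUE keeps dates found in either data set, padding with NA
cloud_cover <- merge(pafa, ghcnd, by = "date", all = TRUE)
cloud_cover$month <- as.integer(format(cloud_cover$date, "%m"))

# Monthly means; na.rm = TRUE is passed through to mean()
by_month <- aggregate(cloud_cover[c("pafa", "ghcnd")],
                      by = list(month = cloud_cover$month),
                      FUN = mean, na.rm = TRUE)
print(by_month)
```

Because the periods of record don’t overlap, every merged row has an NA in one column, and each monthly mean is computed only from the years a source actually covers. A month with no observations at all in a source (August for the toy “pafa” here) comes back as NaN rather than an error.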
To plot the monthly means, I generate text labels for the year range of each data set and melt the data so it can be faceted:
library(lubridate)
library(reshape2)

# One label per data set giving its first and last year of record
text_labels <- rbind(
    data.frame(variable = 'pafa',
        str = paste(min(year(pafa$date)), '-', max(year(pafa$date)))),
    data.frame(variable = 'ghcnd',
        str = paste(min(year(ghcnd$date)), '-', max(year(ghcnd$date)))),
    data.frame(variable = 'narr',
        str = paste(min(year(narr$date)), '-', max(year(narr$date)))))

# Long format: one row per month and data set, for faceting
mean_melted <- melt(by_month_mean, id.vars = 'mon',
    measure.vars = c('pafa', 'ghcnd', 'narr'))
Finally, the plotting:
library(ggplot2)

q <- ggplot(data = mean_melted, aes(x = mon, y = value))
q + theme_bw() +
    geom_bar(stat = 'identity', colour = "darkred", fill = "darkorange") +
    facet_wrap(~ variable, ncol = 1) +  # one panel per data set
    scale_x_discrete(name = "Month") +
    scale_y_continuous(name = "Mean cloud cover") +
    ggtitle('Cloud cover data for Fairbanks Airport Station') +
    # Year range of each data set, drawn in its own panel
    geom_text(data = text_labels,
        aes(x = 'feb', y = 70, label = str), size = 4) +
    # Value labels just inside the top of each bar
    geom_text(aes(label = round(value, digits = 1)), vjust = 1.5, size = 3)
The good news for the guy coming to see the northern lights is that March is indeed the least cloudy month in Fairbanks, and all three data sources show similar patterns. The one oddity is that the NARR dataset has September and October as the cloudiest months, while anyone who has lived in Fairbanks knows that August is the rainiest (and probably cloudiest) month. PAFA and GHCND have a late summer pattern that seems more like what I recall.
Another way to slice the data is to get the percentage of days in each month with less than 20% cloud cover, a measure of the clearest days. This is a pretty easy calculation:
# Percentage of observed days in each month with cloud cover below 20%
by_month_less_than_20 <- ddply(
    subset(cloud_cover, select = c('month', 'pafa', 'ghcnd', 'narr')),
    .(month), summarise,
    pafa = sum(pafa < 20, na.rm = TRUE) / sum(!is.na(pafa)) * 100,
    ghcnd = sum(ghcnd < 20, na.rm = TRUE) / sum(!is.na(ghcnd)) * 100,
    narr = sum(narr < 20, na.rm = TRUE) / sum(!is.na(narr)) * 100)
And the results:
We see the same pattern as in the mean cloudiness plot. March is the month with the greatest number of days with less than 20% cloud cover. Depending on the data set, between 17 and 24 percent of March days are quite clear. In contrast, the summer months rarely see days with no cloud cover. In June and July, the days are long and convection often builds large clouds in the late afternoon, and by August, the rain has started. Just like in the previous plot, NARR has September as the month with the fewest clear days, which doesn’t match my experience.
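For anyone checking the clear-day arithmetic, here is a small base R sketch on a made-up vector of daily values (the name `cover` and the numbers are hypothetical). It shows why the denominator is the count of non-missing days rather than the length of the record, which matters because each source covers a different range of years.

```r
# Hypothetical daily cloud cover (percent) for one month; NA = no observation
cover <- c(5, 15, 25, NA, 60, 10, NA, 90)

clear_days <- sum(cover < 20, na.rm = TRUE)  # days under 20%: 5, 15 and 10
observed <- sum(!is.na(cover))               # days with an observation: 6

# Same form as the ddply() calculation in the post
pct_clear <- clear_days / observed * 100

# Equivalent shortcut: the mean of a logical vector is a proportion
pct_clear_alt <- mean(cover < 20, na.rm = TRUE) * 100

# Dividing by length(cover) would count the missing days as cloudy
pct_wrong <- clear_days / length(cover) * 100

print(c(pct_clear, pct_clear_alt, pct_wrong))
```

The first two values agree at 50 percent, while the naive version gives 37.5 percent because the two missing days are treated as if they weren’t clear.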