sat, 08-apr-2017, 11:30

Introduction

The Alaska Department of Transportation is working on updating their bicycling and pedestrian master plan for the state and their web site mentions Alaska as having high percentages of bicycle and pedestrian commuters relative to the rest of the country. I’m interested because I commute to work by bicycle (and occasionally ski or run) every day, either on the trails in the winter, or the roads in the summer. The company I work for (ABR) pays its employees $3.50 per day for using non-motorized means of transportation to get to work. I earned more than $700 last year as part of this program and ABR has paid its employees almost $40K since 2009 not to drive to work.

The Census Bureau keeps track of how people get to work in the American Community Survey, easily accessible from their web site. We’ll use this data to see if Alaska really does have higher than average rates of non-motorized commuters.

Data

The data comes from FactFinder. I chose ‘American Community Survey’ from the list of data sources near the bottom, searched for ‘bicycle’, chose ‘Commuting characteristics by sex’ (Table S0801), and added the ‘All States within United States and Puerto Rico’ as the Geography of interest. The site generates a zip file containing the data as a CSV file along with several other informational files. The code for extracting the data appears at the bottom of this post.
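
If you’d rather do the extraction in R as well, base R’s unzip() can pull the CSV out of the download. The zip file name below is a guess based on the CSV name, so substitute whatever FactFinder actually gave you.

# Extract just the data file from the FactFinder download
# (zip file name is an assumption; adjust to match your download)
unzip("ACS_15_1YR_S0801.zip", files = "ACS_15_1YR_S0801.csv")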

The data are the percentages of workers 16 years and over using each means of transportation to get to work. Here’s a table showing the top 10 states, ordered by the combined percentage of bicycling and walking commuters.

  state total motorized carpool public_trans walk bicycle
1 District of Columbia 358,150 38.8 5.2 35.8 14.0 4.1
2 Alaska 363,075 80.5 12.6 1.5 7.9 1.1
3 Montana 484,043 84.9 10.4 0.8 5.6 1.6
4 New York 9,276,438 59.3 6.6 28.6 6.3 0.7
5 Vermont 320,350 85.1 8.2 1.3 5.8 0.8
6 Oregon 1,839,706 81.4 10.2 4.8 3.8 2.5
7 Massachusetts 3,450,540 77.6 7.4 10.6 5.0 0.8
8 Wyoming 289,163 87.3 10.0 2.2 4.6 0.6
9 Hawaii 704,914 80.9 13.5 7.0 4.1 0.9
10 Washington 3,370,945 82.2 9.8 6.2 3.7 1.0

Alaska has the second highest rates of walking and biking to work behind the District of Columbia. The table is an interesting combination of states with large urban centers (DC, New York, Oregon, Massachusetts) and those that are more rural (Alaska, Montana, Vermont, Wyoming).

Another way to rank the data is by combining all forms of transportation besides single-vehicle motorized transport (car pooling, public transportation, walking and bicycling).

  state total motorized carpool public_trans walk bicycle
1 District of Columbia 358,150 38.8 5.2 35.8 14.0 4.1
2 New York 9,276,438 59.3 6.6 28.6 6.3 0.7
3 Massachusetts 3,450,540 77.6 7.4 10.6 5.0 0.8
4 New Jersey 4,285,182 79.3 7.5 11.6 3.3 0.3
5 Alaska 363,075 80.5 12.6 1.5 7.9 1.1
6 Hawaii 704,914 80.9 13.5 7.0 4.1 0.9
7 Oregon 1,839,706 81.4 10.2 4.8 3.8 2.5
8 Illinois 6,094,828 81.5 7.9 9.3 3.0 0.7
9 Washington 3,370,945 82.2 9.8 6.2 3.7 1.0
10 Maryland 3,001,281 82.6 8.9 9.0 2.6 0.3

Here, the states with large urban centers come out higher because of the number of commuters using public transportation. Despite very low availability of public transportation, Alaska still winds up 5th on this list because of high rates of car pooling, in addition to walking and bicycling.

Map data

To look at regional patterns, we can make a map of the United States colored by non-motorized transportation percentage. This can be a little challenging because Alaska and Hawaii are so far from the rest of the country. What I’m doing here is loading the state data, transforming the data to a projection that’s appropriate for Alaska, and moving Alaska and Hawaii closer to the lower-48 for display. Again, the code appears at the bottom.
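
The code at the bottom does this with sp, rgdal and maptools. As an aside, here’s a minimal sketch of the same transform-and-shift idea using the sf package; the states_sf object (an sf data frame with a state_postal column and the commute percentages already joined to it) is an assumption, not something built in the code below.

library(sf)       # simple features; not used in the code at the bottom
library(dplyr)
library(ggplot2)

# Project to Alaska Albers (EPSG:3338), then shift Alaska and Hawaii
# toward the lower 48 for display
states_3338 <- st_transform(states_sf, 3338)

ak <- filter(states_3338, state_postal == "AK")
hi <- filter(states_3338, state_postal == "HI")
rest <- filter(states_3338, !state_postal %in% c("AK", "HI"))

# Adding a numeric offset shifts the geometry but drops the CRS,
# so restore it afterwards
st_geometry(ak) <- st_geometry(ak) + c(2e6, 1e6)
st_geometry(hi) <- st_geometry(hi) + c(2e6, 2.3e6)
ak <- st_set_crs(ak, 3338)
hi <- st_set_crs(hi, 3338)

ggplot(rbind(rest, ak, hi)) +
    geom_sf(aes(fill = bicycle + walk), colour = "black") +
    theme_void()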

You can see that non-motorized transportation is very low throughout the Deep South and tends to be higher in the western half of the country, but the really high rates of bicycling and walking to work are isolated: high Vermont sits next to low New Hampshire, and Oregon and Montana are split by Idaho.

Urban and rural, median age of the population

What explains the high rates of non-motorized commuting in Alaska and the other states at the top of the list? Urbanization is certainly one important factor explaining why the District of Columbia and states like New York, Oregon and Massachusetts have high rates of walking and bicycling. But what about Montana, Vermont, and Wyoming?

Age of the population might have an effect as well, as younger people are more likely to walk and bike to work than older people. Alaska has the second youngest population (33.3 years) in the U.S. and DC is third (33.8), but the other states in the top five (Utah, Texas, North Dakota) don’t have high non-motorized transportation.

So it’s more complicated than just these factors. California is a good example, with a combination of high urbanization (second, 95.0% urban), low median age (eighth, 36.2) and great weather year round, but it is 19th for non-motorized commuting. Who walks in California, after all?

Conclusion

I hope DOT comes up with a progressive plan for improving opportunities for pedestrian and bicycle transportation in Alaska. They’ve made some progress here in Fairbanks, building new paths for non-motorized traffic, but they also seem blind to the realities of actually using the roads and paths on a bicycle. The “bike path” near my house abruptly turns from asphalt to gravel a third of the way down Miller Hill, and the shoulders of the roads I commute on are filled with deep snow in winter, gravel in spring, and all manner of detritus year round. Many roads don’t have a usable shoulder at all.

Code

library(tidyverse)  # data import, manipulation
library(knitr)      # pretty tables
library(rpostgis)   # PostGIS support
library(rgdal)      # geographic transformation
library(maptools)   # geographic transformation
library(viridis)    # color blind color palette

# Read the heading
heading <- read_csv('ACS_15_1YR_S0801.csv', n_max = 1) %>% names()

# Read the data
s0801 <- read_csv('ACS_15_1YR_S0801.csv', col_names = FALSE, skip = 2)

names(s0801) <- heading

# Extract only the columns we need, add state postal codes
commute <- s0801 %>%
    transmute(state = `GEO.display-label`,
              total = HC01_EST_VC01,
              motorized = HC01_EST_VC03,
              carpool = HC01_EST_VC05,
              public_trans = HC01_EST_VC10,
              walk = HC01_EST_VC11,
              bicycle = HC01_EST_VC12) %>%
    filter(state != 'Puerto Rico') %>%
    mutate(state_postal = c('AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT',
                            'DE', 'DC', 'FL', 'GA', 'HI', 'ID', 'IL',
                            'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD',
                            'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE',
                            'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND',
                            'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD',
                            'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV',
                            'WI', 'WY'))

# Print top ten tables
kable(commute %>% select(-state_postal) %>%
      arrange(desc(walk + bicycle)) %>% head(n = 10),
  format.args = list(big.mark = ","),
  row.names = TRUE)

kable(commute %>% select(-state_postal) %>%
      arrange(motorized) %>% head(n = 10),
  format.args = list(big.mark = ","),
  row.names = TRUE)

# Connect to the database with the state layer
layers <- src_postgres(host = "localhost",
                       dbname = "layers")

states <- pgGetGeom(layers$con, c("public", "states"),
                    geom = "wkb_geometry", gid = "ogc_fid")

# Transform to srid 3338 (Alaska Albers)
states_3338 <- spTransform(states, CRS("+proj=aea +lat_1=55 +lat_2=65 +lat_0=50
                                       +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80
                                       +towgs84=0,0,0,0,0,0,0 +units=m
                                       +no_defs"))

# Convert to a data frame suitable for ggplot, move AK and HI
ggstates <- fortify(states_3338, region = "state") %>%
    filter(id != 'PR') %>%
    inner_join(commute, by = (c("id" = "state_postal"))) %>%
    mutate(lat = ifelse(state == 'Hawaii', lat + 2300000, lat),
           long = ifelse(state == 'Hawaii', long + 2000000, long),
           lat = ifelse(state == 'Alaska', lat + 1000000, lat),
           long = ifelse(state == 'Alaska', long + 2000000, long))

# Plot it
p <- ggplot() + geom_polygon(data = ggstates, colour = "black",
                        aes(x = long, y = lat, group = group,
                            fill = bicycle + walk)) +
    coord_fixed(ratio = 1) +
    scale_fill_viridis(name = "Non-motorized\n commuters (%)",
                       option = "plasma",
                       limits = c(0, 9), breaks = seq(0, 9, 3)) +
    theme_void() +
    theme(legend.position = c(0.9, 0.2))

width <- 16
height <- 9
resize <- 0.75

svg("non_motorized_commute_map.svg", width = width*resize, height = height*resize)
print(p)
dev.off()
pdf("non_motorized_commute_map.pdf", width = width*resize, height = height*resize)
print(p)
dev.off()

# Urban and rural percentages by state
heading <- read_csv('../urban_rural/DEC_10_SF1_P2.csv', n_max = 1) %>% names()

dec10 <- read_csv('../urban_rural/DEC_10_SF1_P2.csv', col_names = FALSE, skip = 2)

names(dec10) <- heading

urban_rural <- dec10 %>%
    transmute(state = `GEO.display-label`,
              total = D001,
              urban = D002,
              rural = D005) %>%
    filter(state != 'Puerto Rico') %>%
    mutate(urban_percentage = urban / total * 100)

# Median age by state
heading <- read_csv('../age_sex/ACS_15_1YR_S0101.csv', n_max = 1) %>% names()

s0101 <- read_csv('../age_sex/ACS_15_1YR_S0101.csv', col_names = FALSE, skip = 2)

names(s0101) <- heading

age <- s0101 %>%
    transmute(state = `GEO.display-label`,
              median_age = HC01_EST_VC35) %>%
    filter(state != 'Puerto Rico')

# Do urban percentage and median age explain anything about
# non-motorized transit?
census_data <- commute %>%
    inner_join(urban_rural, by = "state") %>%
    inner_join(age, by = "state") %>%
    select(state, state_postal, walk, bicycle, urban_percentage, median_age) %>%
    mutate(non_motorized = walk + bicycle)

u <- ggplot(data = census_data,
       aes(x = urban_percentage, y = non_motorized)) +
    geom_text(aes(label = state_postal))

svg("urban.svg", width = width*resize, height = height*resize)
print(u)
dev.off()

a <- ggplot(data = census_data,
       aes(x = median_age, y = non_motorized)) +
    geom_text(aes(label = state_postal))

svg("age.svg", width = width*resize, height = height*resize)
print(a)
dev.off()

# Not significant:
model = lm(data = census_data,
           non_motorized ~ urban_percentage + median_age)

# Only significant because of DC (a clear outlier)
model = lm(data = census_data,
           non_motorized ~ urban_percentage * median_age)
tags: bicycling  walking  census data  R 
mon, 09-jan-2017, 09:49

Introduction

The latest forecast discussions for Northern Alaska have included warnings that we are likely to experience an extended period of below normal temperatures starting at the end of this week, and yesterday’s Deep Cold blog post discusses the similarity of model forecast patterns to patterns seen in the 1989 and 1999 extreme cold events.

Our dogs spend most of their time in the house when we’re home, but if both of us are at work they’re outside in the dog yard. They have insulated dog houses, but when it’s colder than −15° F, we put them into a heated dog barn. That means one of us has to come home in the middle of the day to let them out to go to the bathroom.

Since we’re past the Winter Solstice, and day length is now increasing, I was curious to see if that has an effect on daily temperature, hopeful that the frequency of days when we need to put the dogs in the barn is decreasing.

Methods

We’ll use daily minimum and maximum temperature data from the Fairbanks International Airport station, and for each day of the year count the number of years the temperature was below −15° F, dividing by the total number of years to get a frequency. We live in a cold valley on Goldstream Creek, so our temperatures are typically several degrees colder than the Fairbanks Airport, and we often don’t warm up as much during the day as in other places, but minimum airport temperature is a reasonable proxy for the overall winter temperature at our house.
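
One detail worth noting: the GHCN data is stored in degrees Celsius, so the −15° F threshold has to be converted using C = (F − 32) × 5/9, which is where the −26.11 in the code below comes from.

# -15° F expressed in degrees Celsius
(-15 - 32) * 5 / 9
# [1] -26.11111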

Results

The following plot shows the frequency of minimum (the top of each line) and maximum (the bottom) temperature colder than −15° F at the airport over the period of record, 1904−2016. The curved blue line represents a best fit line through the minimum temperature frequency, and the vertical blue line is drawn at the date when the frequency is the highest.

Frequency of days with temperatures below −15° F

The maximum frequency is January 12th, so we have a few more days before the likelihood of needing to put the dogs in the barn starts to decline. The plot also shows that we could still reach that threshold all the way into April.

For fun, here’s the same plot using −40° as the threshold:

Frequency of days with temperatures below −40°

The date when the frequency starts to decline is shifted slightly to January 15th, and you can see the frequencies are lower. In mid-January, we can expect the minimum temperature to be colder than −15° F more than half the time, but the frequency of temperatures colder than −40° is just under 15%. There’s also an interesting anomaly in mid to late December where the frequency of very cold temperatures appears to drop.

Appendix: R code

library(tidyverse)
library(lubridate)
library(scales)

noaa <- src_postgres(host="localhost", dbname="noaa")

fairbanks <- tbl(noaa, build_sql("SELECT * FROM ghcnd_pivot
                                  WHERE station_name='FAIRBANKS INTL AP'")) %>%
    collect()

save(fairbanks, file="fairbanks_ghcnd.rdat")

for_plot <- fairbanks %>%
    mutate(doy=yday(dte),
           dte_str=format(dte, "%d %b"),
           min_below=ifelse(tmin_c < -26.11,1,0),
           max_below=ifelse(tmax_c < -26.11,1,0)) %>%
    filter(dte_str!="29 Feb") %>%
    mutate(doy=ifelse(leap_year(dte) & doy>60, doy-1, doy),
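           # shifting the day of year by 120 days (Jan+Feb+Mar+Apr) makes the
           # winter season contiguous on the x axis instead of wrapping at the
           # new year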
           doy=(doy+31+28+31+30)%%365) %>%
    group_by(doy, dte_str) %>%
    mutate(n_min=sum(ifelse(!is.na(min_below), 1, 0)),
           n_max=sum(ifelse(!is.na(max_below), 1, 0))) %>%
    summarize(min_freq=sum(min_below, na.rm=TRUE)/max(n_min, na.rm=TRUE),
              max_freq=sum(max_below, na.rm=TRUE)/max(n_max, na.rm=TRUE))

x_breaks <- for_plot %>%
    filter(doy %in% seq(49, 224, 7))

stats <- tibble(doy=seq(49, 224),
                pred=predict(loess(min_freq ~ doy,
                                   for_plot %>%
                                       filter(doy >= 49, doy <= 224))))

max_stats <- stats %>%
    arrange(desc(pred)) %>% head(n=1)

p <- ggplot(data=for_plot,
            aes(x=doy, ymin=min_freq, ymax=max_freq)) +
    geom_linerange() +
    geom_smooth(aes(y=min_freq), se=FALSE, size=0.5) +
    geom_segment(aes(x=max_stats$doy, xend=max_stats$doy,
                     y=-Inf, yend=max_stats$pred),
                 colour="blue", size=0.5) +
    scale_x_continuous(name=NULL,
                       limits=c(49, 224),
                       breaks=x_breaks$doy,
                       labels=x_breaks$dte_str) +
    scale_y_continuous(name="Frequency of days colder than −15° F",
                       breaks=pretty_breaks(n=10)) +
    theme_bw() +
    theme(axis.text.x=element_text(angle=30, hjust=1))

# Minus 40
for_plot <- fairbanks %>%
    mutate(doy=yday(dte),
           dte_str=format(dte, "%d %b"),
           min_below=ifelse(tmin_c < -40,1,0),
           max_below=ifelse(tmax_c < -40,1,0)) %>%
    filter(dte_str!="29 Feb") %>%
    mutate(doy=ifelse(leap_year(dte) & doy>60, doy-1, doy),
           doy=(doy+31+28+31+30)%%365) %>%
    group_by(doy, dte_str) %>%
    mutate(n_min=sum(ifelse(!is.na(min_below), 1, 0)),
           n_max=sum(ifelse(!is.na(max_below), 1, 0))) %>%
    summarize(min_freq=sum(min_below, na.rm=TRUE)/max(n_min, na.rm=TRUE),
              max_freq=sum(max_below, na.rm=TRUE)/max(n_max, na.rm=TRUE))

x_breaks <- for_plot %>%
    filter(doy %in% seq(63, 203, 7))

stats <- tibble(doy=seq(63, 203),
                pred=predict(loess(min_freq ~ doy,
                                   for_plot %>%
                                       filter(doy >= 63, doy <= 203))))

max_stats <- stats %>%
    arrange(desc(pred)) %>% head(n=1)

q <- ggplot(data=for_plot,
            aes(x=doy, ymin=min_freq, ymax=max_freq)) +
    geom_linerange() +
    geom_smooth(aes(y=min_freq), se=FALSE, size=0.5) +
    geom_segment(aes(x=max_stats$doy, xend=max_stats$doy,
                     y=-Inf, yend=max_stats$pred),
                 colour="blue", size=0.5) +
    scale_x_continuous(name=NULL,
                       limits=c(63, 203),
                       breaks=x_breaks$doy,
                       labels=x_breaks$dte_str) +
    scale_y_continuous(name="Frequency of days colder than −40°",
                       breaks=pretty_breaks(n=10)) +
    theme_bw() +
    theme(axis.text.x=element_text(angle=30, hjust=1))
tags: weather  climate  temperature  R 
sat, 19-nov-2016, 15:50

Introduction

So far this winter we’ve gotten only 4.1 inches of snow, well below the normal 19.7 inches, and there are only 2 inches of snow on the ground. At this point last year we had 8 inches and I’d been biking and skiing on the trail to work for two weeks. In his North Pacific Temperature Update blog post, Richard James mentions that winters like this one, with a strongly positive Pacific Decadal Oscillation phase combined with a strongly negative North Pacific Mode phase, tend to produce a “distinctly dry” pattern for interior Alaska. I don’t pretend to understand these large scale climate patterns, but I thought it would be interesting to look at snowfall and snow depth in years with very little mid-November snow. In other years like this one, do we eventually get enough snow that the trails fill in and we can fully participate in winter sports like skiing, dog mushing, and fat biking?

Data

We will use daily data from the Global Historical Climate Data set for the Fairbanks International Airport station. Data prior to 1950 is excluded because of poor quality snowfall and snow depth data and because there’s a good chance that our climate has changed since then and patterns from that era aren’t a good model for the current climate in Alaska.

We will look at both snow depth and the cumulative winter snowfall.

Results

The following tables show the ten years with the lowest cumulative snowfall and snow depth values from 1950 to the present on November 18th.

Year Cumulative Snowfall (inches)
1953 1.5
2016 4.1
1954 4.3
2014 6.0
2006 6.4
1962 7.5
1998 7.8
1960 8.5
1995 8.8
1979 10.2
Year Snow depth (inches)
1953 1
1954 1
1962 1
2016 2
2014 2
1998 3
1964 3
1976 3
1971 3
2006 4

2016 has the second-lowest cumulative snowfall, behind 1953, and is tied with 2014 for the second-lowest snow depth; 1953, 1954 and 1962 all had only 1 inch of snow on the ground on November 18th.

It also seems like recent years appear in these tables more frequently than would be expected. Grouping by decade and averaging cumulative snowfall and snow depth yields the pattern in the chart below. The error bars (not shown) are fairly large, so the differences between decades aren’t likely to be statistically significant, but there is a pattern of lower snowfall amounts in recent decades.

Decadal average cumulative snowfall and snow depth

Now let’s see what happened in those years with low snowfall and snow depth values in mid-November, starting with cumulative snowfall. The following plot (and the subsequent snow depth plot) shows the data for the low-value years (and one very high snowfall year, 1990), with each year’s data as a separate line. The smooth dark cyan line through the middle of each plot is fit through the values for all years: a sort of “average” snowfall and snow depth curve.

Cumulative snowfall, years with low snow on November 18

In all four mid-November low-snowfall years, the cumulative snowfall values remain below average throughout the winter, but snow did continue to fall as the season went on. Even the lowest winter year here, 2006–2007, still ended the winter with 15 inches of snow on the ground.

The following plot shows snow depth for the four years with the lowest snow depth on November 18th. The data is formatted the same as in the previous plot except we’ve jittered the values slightly to make the plot easier to read.

Snow depth, years with low snow on November 18

The pattern here is similar, but the snow depths get much closer to the average values. Snow depth for all four low-snow years remains low throughout November, but starts rising in December, dramatically so in 1954 and 2014.

One of the highest snowfall years between 1950 and 2016 was 1990–1991 (shown on both plots). An impressive 32.8 inches of snow fell in eight days between December 21st and December 28th, accounting for the sharp increase in cumulative snowfall and snow depth shown on both plots. There are five years in the record where the cumulative total for the entire winter was lower than these eight days in 1990.

Conclusion

Despite the lack of snow on the ground to this point in the year, the record shows that we are still likely to get enough snow to fill in the trails. We may need to wait until mid to late December, but it’s even possible we’ll eventually reach the long term average depth before spring.

Appendix

Here’s the R code used to generate the statistics, tables and plots from this post:

library(tidyverse)
library(lubridate)
library(scales)
library(knitr)

noaa <- src_postgres(host="localhost", dbname="noaa")

snow <- tbl(noaa, build_sql(
   "WITH wdoy_data AS (
         SELECT dte, dte - interval '120 days' as wdte,
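         -- subtracting 120 days groups a whole winter (Sep through Aug)
         -- under the calendar year it started in; this becomes the
         -- "winter year" (wyear) below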
            tmin_c, tmax_c, (tmin_c+tmax_c)/2.0 AS tavg_c,
            prcp_mm, snow_mm, snwd_mm
         FROM ghcnd_pivot
         WHERE station_name = 'FAIRBANKS INTL AP'
         AND dte > '1950-09-01')
   SELECT dte, date_part('year', wdte) AS wyear, date_part('doy', wdte) AS wdoy,
         to_char(dte, 'Mon DD') AS mmdd,
         tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm
   FROM wdoy_data")) %>%
   mutate(wyear=as.integer(wyear),
            wdoy=as.integer(wdoy),
            snwd_mm=as.integer(snwd_mm)) %>%
   select(dte, wyear, wdoy, mmdd,
            tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm) %>% collect()

write_csv(snow, "pafa_data_with_wyear_post_1950.csv")
save(snow, file="pafa_data_with_wyear_post_1950.rdata")

cum_snow <- snow %>%
   mutate(snow_na=ifelse(is.na(snow_mm),1,0),
         snow_mm=ifelse(is.na(snow_mm),0,snow_mm)) %>%
   group_by(wyear) %>%
   mutate(snow_mm_cum=cumsum(snow_mm),
         snow_na=cumsum(snow_na)) %>%
   ungroup() %>%
   mutate(snow_in_cum=round(snow_mm_cum/25.4, 1),
         snwd_in=round(snwd_mm/25.4, 0))

nov_18_snow <- cum_snow %>%
   filter(mmdd=='Nov 18') %>%
   select(wyear, snow_in_cum, snwd_in) %>%
   arrange(snow_in_cum)

decadal_avg <- nov_18_snow %>%
   mutate(decade=as.integer(wyear/10)*10) %>%
   group_by(decade) %>%
   summarize(`Snow depth`=mean(snwd_in),
            snwd_sd=sd(snwd_in),
            `Cumulative Snowfall`=mean(snow_in_cum),
            snow_cum_sd=sd(snow_in_cum))

decadal_averages <- ggplot(decadal_avg %>%
                              gather(variable, value, -decade) %>%
                              filter(variable %in% c("Cumulative Snowfall",
                                                      "Snow depth")),
                           aes(x=as.factor(decade), y=value, fill=variable)) +
            theme_bw() +
            geom_bar(stat="identity", position="dodge") +
            scale_x_discrete(name="Decade", breaks=c(1950, 1960, 1970, 1980,
                                                   1990, 2000, 2010)) +
            scale_y_continuous(name="Inches", breaks=pretty_breaks(n=10)) +
            scale_fill_discrete(name="Measurement")

print(decadal_averages)

date_x_scale <- cum_snow %>%
   filter(grepl(' (01|15)', mmdd), wyear=='1994') %>%
   select(wdoy, mmdd)

cumulative_snowfall <-
   ggplot(cum_snow %>% filter(wyear %in% c(1953, 1954, 2014, 2006, 1990),
                              wdoy>183,
                              wdoy<320),
            aes(x=wdoy, y=snow_in_cum, colour=as.factor(wyear))) +
   theme_bw() +
   geom_smooth(data=cum_snow %>% filter(wdoy>183, wdoy<320),
               aes(x=wdoy, y=snow_in_cum),
               size=0.5, colour="darkcyan",
               inherit.aes=FALSE,
               se=FALSE) +
   geom_line(position="jitter") +
   scale_x_continuous(name="",
                     breaks=date_x_scale$wdoy,
                     labels=date_x_scale$mmdd) +
   scale_y_continuous(name="Cumulative snowfall (in)",
                     breaks=pretty_breaks(n=10)) +
   scale_color_discrete(name="Winter year")

print(cumulative_snowfall)

snow_depth <-
   ggplot(cum_snow %>% filter(wyear %in% c(1953, 1954, 1962, 2014, 1990),
                              wdoy>183,
                              wdoy<320),
            aes(x=wdoy, y=snwd_in, colour=as.factor(wyear))) +
   theme_bw() +
   geom_smooth(data=cum_snow %>% filter(wdoy>183, wdoy<320),
               aes(x=wdoy, y=snwd_in),
               size=0.5, colour="darkcyan",
               inherit.aes=FALSE,
               se=FALSE) +
   geom_line(position="jitter") +
   scale_x_continuous(name="",
                     breaks=date_x_scale$wdoy,
                     labels=date_x_scale$mmdd) +
   scale_y_continuous(name="Snow Depth (in)",
                     breaks=pretty_breaks(n=10)) +
   scale_color_discrete(name="Winter year")

print(snow_depth)
tags: snow depth  snowfall  weather  climate  R 
sat, 29-oct-2016, 21:14
Equinox Marathon Relay leg 2, 2016

Introduction

A couple years ago I compared racing data between pairs of races run in the same season (Gold Discovery and Equinox, Santa Claus and Equinox) for all runners that ran in both events. The result was an estimate of how fast I might run the Equinox Marathon based on my times for Gold Discovery and the Santa Claus Half Marathon.

Several years have passed, I’ve run more races and collected more racing data for all the major Fairbanks races, and I wanted to run the same analysis for all combinations of races.

Data

The data comes from a database I’ve built of race times for all competitors, mostly taken from the results available from Chronotrack, but including some race results from SportAlaska.

We started by loading the required R packages and reading in all the racing data, a small subset of which looks like this.

race year name finish_time birth_year sex
Beat Beethoven 2015 thomas mcclelland 00:21:49 1995 M
Equinox Marathon 2015 jennifer paniati 06:24:14 1989 F
Equinox Marathon 2014 kris starkey 06:35:55 1972 F
Midnight Sun Run 2014 kathy toohey 01:10:42 1960 F
Midnight Sun Run 2016 steven rast 01:59:41 1960 M
Equinox Marathon 2013 elizabeth smith 09:18:53 1987 F
... ... ... ... ... ...

Next we loaded in the names and distances of the races and combined this with the individual racing data. The data from Chronotrack doesn’t include the mileage and we will need that to calculate pace (minutes per mile).
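
Pace is simply the finish time in minutes divided by the race distance in miles; for example, the 50.60 minute, 6.2 mile Midnight Sun Run result shown in the table below works out like this.

# pace (minutes per mile) = finish time in minutes / race miles
50.60 / 6.2
# [1] 8.16129, shown as 8.16 in the table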

My database doesn’t have complete information about all the racers that competed, and in some cases the information for a runner in one race conflicts with the information for the same runner in a different race. In order to resolve this, we generated a list of runners, grouped by their name, and threw out racers where their name matches but their gender was reported differently from one race to the next. Please understand we’re not doing this to exclude those who have changed their gender identity along the way, but to eliminate possible bias from data entry mistakes.

Finally, we combined the racers with the individual racing data, substituting our corrected runner information for what appeared in the individual race’s data. We also calculated minutes per mile (pace) and the age of the runner during the year of the race (age). Because we assign each runner’s birth year as the minimum reported across all their races, the age variable won’t change during the running season, which is closer to the way age categories are calculated in Europe. Lastly, we removed results where pace was greater than 20 minutes per mile for races longer than ten miles, and greater than 16 minutes per mile for races shorter than ten miles. These are likely to be outliers, or competitors not actually running the race.

name birth_year gender race_str year miles minutes pace age
aaron austin 1983 M midnight_sun_run 2014 6.2 50.60 8.16 31
aaron bravo 1999 M midnight_sun_run 2013 6.2 45.26 7.30 14
aaron bravo 1999 M midnight_sun_run 2014 6.2 40.08 6.46 15
aaron bravo 1999 M midnight_sun_run 2015 6.2 36.65 5.91 16
aaron bravo 1999 M midnight_sun_run 2016 6.2 36.31 5.85 17
aaron bravo 1999 M spruce_tree_classic 2014 6.0 42.17 7.03 15
... ... ... ... ... ... ... ... ...

We combined all available results for each runner in all years they participated such that the resulting rows are grouped by runner and year and columns are the races themselves. The values in each cell represent the pace for the runner × year × race combination.

For example, here’s the first six rows for runners that completed Beat Beethoven and the Chena River Run in the years I have data. I also included the column for the Midnight Sun Run in the table, but the actual data has a column for all the major Fairbanks races. You’ll see that two of the six runners listed ran BB and CRR but didn’t run MSR in that year.

name gender age year beat_beethoven chena_river_run midnight_sun_run
aaron schooley M 36 2016 8.19 8.15 8.88
abby fett F 33 2014 10.68 10.34 11.59
abby fett F 35 2016 11.97 12.58 NA
abigail haas F 11 2015 9.34 8.29 NA
abigail haas F 12 2016 8.48 7.90 11.40
aimee hughes F 43 2015 11.32 9.50 10.69
... ... ... ... ... ... ...

With this data, we build a whole series of linear models, one for each race combination. We created a series of formula strings and formula objects for all the combinations, then fit the models using map(). We combined the start and predicted race names with the linear models, and used glance() and tidy() from the broom package to turn the models into statistics and coefficients.

All of the models between races were highly significant, but many of them contain coefficients that aren’t significantly different than zero. That means that including that term (age, gender or first race pace) isn’t adding anything useful to the model. We used the significance of each term to reduce our models so they only contained coefficients that were significant and regenerated the statistics and coefficients for these reduced models.

The full R code appears at the bottom of this post.

Results

Here are the statistics from the ten best-performing models (based on R²).

start_race predicted_race n r.squared p-value
run_of_the_valkyries golden_heart_trail_run 40 0.956 0
golden_heart_trail_run equinox_marathon 36 0.908 0
santa_claus_half_marathon golden_heart_trail_run 34 0.896 0
midnight_sun_run gold_discovery_run 139 0.887 0
beat_beethoven golden_heart_trail_run 32 0.886 0
run_of_the_valkyries gold_discovery_run 44 0.877 0
midnight_sun_run golden_heart_trail_run 52 0.877 0
gold_discovery_run santa_claus_half_marathon 111 0.876 0
chena_river_run golden_heart_trail_run 44 0.873 0
run_of_the_valkyries santa_claus_half_marathon 91 0.851 0

It’s interesting how many times the Golden Heart Trail Run appears on this list, since that run is something of an outlier in the Usibelli running series: it’s the only race entirely on trails. Maybe it’s because its distance (5K) is comparable with a lot of the earlier races in the season, but because it’s on trails it matches well with the later races that are at least partially on trails, like Gold Discovery or Equinox.

Here are the ten worst models.

start_race predicted_race n r.squared p-value
midnight_sun_run equinox_marathon 431 0.525 0
beat_beethoven hoodoo_half_marathon 87 0.533 0
beat_beethoven midnight_sun_run 818 0.570 0
chena_river_run equinox_marathon 196 0.572 0
equinox_marathon hoodoo_half_marathon 90 0.584 0
beat_beethoven equinox_marathon 265 0.585 0
gold_discovery_run hoodoo_half_marathon 41 0.599 0
beat_beethoven santa_claus_half_marathon 163 0.612 0
run_of_the_valkyries equinox_marathon 125 0.642 0
midnight_sun_run hoodoo_half_marathon 118 0.657 0

Most of these models are shorter races like Beat Beethoven or the Chena River Run predicting longer races like Equinox or one of the half marathons. Even so, each model explains more than half the variation in the data, which isn’t terrible.

Application

Now that we have all our models and their coefficients, we can use them to make predictions of future performance. I’ve written an online calculator based on the reduced models that lets you predict your race results as you go through the running season. The calculator is here: Fairbanks Running Race Converter.

For example, I ran a 7:41 pace for Run of the Valkyries this year. Entering that, plus my age and gender, into the converter predicts an 8:57 pace for the first running of the HooDoo Half Marathon. The R² for this model was a respectable 0.71 even though only 23 runners ran both races this year (including me). My actual pace for HooDoo was 8:18, so I came in quite a bit faster than predicted. No wonder my knee and hip hurt after the race! Using my time from the Golden Heart Trail Run, the converter predicts a HooDoo Half pace of 8:16.2, less than a minute off my 1:48:11 finish.
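
The calculator is simply evaluating the reduced model coefficients. Here’s a rough sketch of the same prediction in R using the objects built in the code below; the age value is a placeholder (substitute your own), and 7.68 is the 7:41 pace converted to decimal minutes.

# Pull out the reduced model predicting HooDoo Half pace from
# Run of the Valkyries pace
valk_to_hoodoo <- reduced_models %>%
    ungroup() %>%
    filter(start_race == "run_of_the_valkyries",
           predicted_race == "hoodoo_half_marathon")

predict(valk_to_hoodoo$lm_models[[1]],
        newdata = tibble(run_of_the_valkyries = 7.68,  # 7:41 pace
                         gender = "M",
                         age = 47))                    # placeholder age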

Appendix: R code

library(tidyverse)
library(lubridate)
library(broom)
library(scales)      # pretty_breaks
library(viridis)     # scale_color_viridis

races_db <- src_postgres(host="localhost", dbname="races")

combined_races <- tbl(races_db, build_sql(
    "SELECT race, year, lower(name) AS name, finish_time,
        year - age AS birth_year, sex
     FROM chronotrack
     UNION
     SELECT race, year, lower(name) AS name, finish_time,
        birth_year,
        CASE WHEN age_class ~ 'M' THEN 'M' ELSE 'F' END AS sex
     FROM sportalaska
     UNION
     SELECT race, year, lower(name) AS name, finish_time,
        NULL AS birth_year, NULL AS sex
     FROM other"))

races <- tbl(races_db, build_sql(
    "SELECT race,
        lower(regexp_replace(race, '[ ’]', '_', 'g')) AS race_str,
        date_part('year', date) AS year,
        miles
     FROM races"))

racing_data <- combined_races %>%
    inner_join(races) %>%
    filter(!is.na(finish_time))

racers <- racing_data %>%
    group_by(name) %>%
    summarize(races=n(),
              birth_year=min(birth_year),
              gender_filter=ifelse(sum(ifelse(sex=='M',1,0))==
                                   sum(ifelse(sex=='F',1,0)),
                                   FALSE, TRUE),
              gender=ifelse(sum(ifelse(sex=='M',1,0))>
                            sum(ifelse(sex=='F',1,0)),
                            'M', 'F')) %>%
    ungroup() %>%
    filter(gender_filter) %>%
    select(-gender_filter)

racing_data_filled <- racing_data %>%
    inner_join(racers, by="name") %>%
    mutate(birth_year=birth_year.y) %>%
    select(name, birth_year, gender, race_str, year, miles, finish_time) %>%
    group_by(name, race_str, year) %>%
    mutate(n=n()) %>%
    filter(!is.na(birth_year), n==1) %>%
    ungroup() %>%
    collect() %>%
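    # times reported without an hours component ("MM:SS") get "00:" prepended
    # so hms() can parse everything as HH:MM:SS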
    mutate(fixed=ifelse(grepl('[0-9]+:[0-9]+:[0-9.]+', finish_time),
                        finish_time,
                        paste0('00:', finish_time)),
           minutes=as.numeric(seconds(hms(fixed)))/60.0,
           pace=minutes/miles,
           age=year-birth_year,
           age_class=as.integer(age/10)*10,
           group=paste0(gender, age_class),
           gender=as.factor(gender)) %>%
    filter((miles<10 & pace<16) | (miles>=10 & pace<20)) %>%
    select(-fixed, -finish_time, -n)

speeds_combined <- racing_data_filled %>%
    select(name, gender, age, age_class, group, race_str, year, pace) %>%
    spread(race_str, pace)

main_races <- c('beat_beethoven', 'chena_river_run', 'midnight_sun_run',
                'run_of_the_valkyries', 'gold_discovery_run',
                'santa_claus_half_marathon', 'golden_heart_trail_run',
                'equinox_marathon', 'hoodoo_half_marathon')

race_formula_str <-
    lapply(seq(1, length(main_races)-1),
           function(i)
               lapply(seq(i+1, length(main_races)),
                      function(j) paste(main_races[[j]], '~',
                                        main_races[[i]],
                                        '+ gender', '+ age'))) %>%
    unlist()

race_formulas <- lapply(race_formula_str, function(i) as.formula(i)) %>%
    unlist()

lm_models <- map(race_formulas, ~ lm(.x, data=speeds_combined))

models <- tibble(start_race=factor(gsub('.* ~ ([^ ]+).*',
                                        '\\1',
                                        race_formula_str),
                                   levels=main_races),
                 predicted_race=factor(gsub('([^ ]+).*',
                                            '\\1',
                                            race_formula_str),
                                       levels=main_races),
                 lm_models=lm_models) %>%
    arrange(start_race, predicted_race)

model_stats <- glance(models %>% rowwise(), lm_models)
model_coefficients <- tidy(models %>% rowwise(), lm_models)

reduced_formula_str <- model_coefficients %>%
    ungroup() %>%
    filter(p.value<0.05, term!='(Intercept)') %>%
    mutate(term=gsub('genderM', 'gender', term)) %>%
    group_by(predicted_race, start_race) %>%
    summarize(independent_vars=paste(term, collapse=" + ")) %>%
    ungroup() %>%
    transmute(reduced_formulas=paste(predicted_race, independent_vars, sep=' ~ '))

reduced_formula_str <- reduced_formula_str$reduced_formulas

reduced_race_formulas <- lapply(reduced_formula_str,
                                function(i) as.formula(i)) %>% unlist()

reduced_lm_models <- map(reduced_race_formulas, ~ lm(.x, data=speeds_combined))

n_from_lm <- function(model) {
    summary_object <- summary(model)

    summary_object$df[1] + summary_object$df[2]
}

reduced_models <- tibble(start_race=factor(gsub('.* ~ ([^ ]+).*', '\\1', reduced_formula_str),
                                           levels=main_races),
                         predicted_race=factor(gsub('([^ ]+).*', '\\1', reduced_formula_str),
                                               levels=main_races),
                         lm_models=reduced_lm_models) %>%
    arrange(start_race, predicted_race) %>%
    rowwise() %>%
    mutate(n=n_from_lm(lm_models))

reduced_model_stats <- glance(reduced_models %>% rowwise(), lm_models)
reduced_model_coefficients <- tidy(reduced_models %>% rowwise(), lm_models) %>%
    ungroup()

coefficients_and_stats <- reduced_model_stats %>%
    inner_join(reduced_model_coefficients,
               by=c("start_race", "predicted_race", "n")) %>%
    select(start_race, predicted_race, n, r.squared, term, estimate)

write_csv(coefficients_and_stats,
          "coefficients.csv")

make_scatterplot <- function(start_race, predicted_race) {
   age_limits <- speeds_combined %>%
      filter_(paste("!is.na(", start_race, ")"),
               paste("!is.na(", predicted_race, ")")) %>%
      summarize(min=min(age), max=max(age)) %>%
      unlist()

   q <- ggplot(data=speeds_combined,
               aes_string(x=start_race, y=predicted_race)) +
            # plasma works better with a grey background
            # theme_bw() +
            geom_abline(slope=1, color="darkred", alpha=0.5) +
            geom_smooth(method="lm", se=FALSE) +
            geom_point(aes(shape=gender, color=age)) +
            scale_color_viridis(option="plasma",
                              limits=age_limits) +
            scale_x_continuous(breaks=pretty_breaks(n=10)) +
            scale_y_continuous(breaks=pretty_breaks(n=6))

   svg_filename <- paste0(paste(start_race, predicted_race, sep="-"), ".svg")

   height <- 9
   width <- 16
   resize <- 0.75

   svg(svg_filename, height=height*resize, width=width*resize)
   print(q)
   dev.off()
}

lapply(seq(1, length(main_races)-1),
      function(i)
            lapply(seq(i+1, length(main_races)),
                  function(j)
                        make_scatterplot(main_races[[i]], main_races[[j]])
                  )
      )
sun, 28-feb-2016, 09:22

Introduction

I’ve been brewing beer since the early 90s, and since those days the number of hops available to homebrewers has gone from a handful of varieties (Northern Brewer, Goldings, Fuggle, Willamette, Cluster) to well over a hundred. Whenever I go to my local brewing store I’m bewildered by the large variety of hops, most of which I’ve never heard of. I’m also not all that fond of super-citrusy hops like Cascade or its variants, so it is a challenge to find flavor and aroma hops that aren’t citrusy among the several dozen new varieties on display.

Most of the hops at the store are Yakima Chief – Hop Union branded, and they’ve got a great web site that lists all their varieties and has information about each hop. As convenient as a website can be, I’d rather have the data in a database where I can search and organize it myself. Since the data is all on the website, we can use a web scraping library to grab it and format it however we like.

One note: websites change all the time, so whenever the structure of a site changes, the code to grab the data will need to be updated. I originally wrote the code for this post a couple weeks ago, scraping data from the Hop Union web site. This morning, that site had been replaced with an entirely different Yakima Chief – Hop Union site and I had to completely rewrite the code.

rvest

I’m using the rvest package from Hadley Wickham and RStudio to do the work of pulling the data from each page. In the Python world, Beautiful Soup would be the library I’d use, but there’s a fair amount of data manipulation needed here and I felt like dplyr would be easier.

Process

First, load all the libraries we need.

library(rvest)       # scraping data from the web
library(dplyr)       # manipulation, filtering, grouping into tables
library(stringr)     # string functions
library(tidyr)       # creating columns from rows
library(RPostgreSQL) # dump final result to a PostgreSQL database

Next, we retrieve the data from the main page that has all the varieties listed on it, and extract the list of links to each hop. In the code below, we read the entire document into a variable, hop_varieties using the read_html function.

Once we’ve got the web page, we need to find the HTML nodes that contain links to the page for each individual hop. To do that, we use html_nodes(), passing a CSS selector to the function. In this case, we’re looking for anchor (a) tags that have a class of card__name. I figured this out by looking at the raw source code from the page using my web browser’s inspection tools. If you right-click on what looks like a link on the page, one of the options in the pop-up menu is “inspect”, and when you choose that, it will show you the HTML for the element you clicked on. It can take a few tries to find the proper combination of tag, class, attribute or id to uniquely identify the things you want.

The YCH site is pretty well organized, so this isn’t too difficult. Once we’ve got the nodes, we extract the links by retrieving the href attribute from each one with html_attr().

hop_varieties <- read_html("http://ychhops.com/varieties")

hop_page_links <- hop_varieties %>%
    html_nodes("a.card__name") %>%
    html_attr("href")

Now we have a list of links to all the varieties on the page. It turns out that they made a mistake when they transitioned to the new site and the links all have the wrong host (ych.craft.dev). We can fix that by replacing the host in all the links.

fixed_links <- sapply(hop_page_links,
                     FUN=function(x) sub('ych.craft.dev',
                                         'ychhops.com', x)) %>%
    as.vector()

Each page will need to be loaded, the relevant information extracted, and the data formatted into a suitable data structure. I think a data frame is the best format for this, where each row in the data frame represents the data for a single hop and each column is a piece of information from the web page.

First we write a function that retrieves the data for a single hop and returns a one-row data frame with that data. Most of the data is pretty simple, with a single value for each hop: name, description, type of hop, etc. Where it gets more complicated is that each hop can have more than one aroma category, and the statistics for each hop vary from one to the next. What we’ve done here is combine the aromas together into a single string, using the at symbol (@) to separate the parts. Since it’s unlikely that symbol will appear in the data, we can split it back apart later. We do the same thing for the other parameters, creating an @-delimited string for the items and their values.

get_hop_stats <- function(p) {
    hop_page <- read_html(p)

    hop_name <- hop_page %>%
        html_nodes('h1[itemprop="name"]') %>%
        html_text()

    type <- hop_page %>%
        html_nodes('div.hop-profile__data div[itemprop="additionalProperty"]') %>%
        html_text()
    type <- (str_split(type, ' '))[[1]][2]

    region <- hop_page %>%
        html_nodes('div.hop-profile__data h5') %>%
        html_text()

    description <- hop_page %>%
        html_nodes('div.hop-profile__profile p[itemprop="description"]') %>%
        html_text()

    aroma_profiles <- hop_page %>%
        html_nodes('div.hop-profile__profile h3.headline a[itemprop="category"]') %>%
        html_text()

    aroma_profiles <- sapply(aroma_profiles,
                             FUN=function(x) sub(',$', '', x)) %>%
        as.vector() %>%
        paste(collapse="@")

    composition_keys <- hop_page %>%
        html_nodes('div.hop-composition__item') %>%
        html_text()

    composition_keys <- sapply(composition_keys,
                               FUN=function(x)
                                   tolower(gsub('[ -]', '_', x))) %>%
        as.vector() %>%
        paste(collapse="@")

    composition_values <- hop_page %>%
        html_nodes('div.hop-composition__value') %>%
        html_text() %>%
        paste(collapse="@")

    hop <- data.frame('hop'=hop_name, 'type'=type, 'region'=region,
                      'description'=description,
                      'aroma_profiles'=aroma_profiles,
                      'composition_keys'=composition_keys,
                      'composition_values'=composition_values)

    hop
}

With a function that takes a URL as input, and returns a single-row data frame, we use a common idiom in R to combine everything together. The inner-most lapply function will run the function on each of the fixed variety links, and each single-row data frame will then be combined together using rbind within do.call.

all_hops <- do.call(rbind,
                    lapply(fixed_links, get_hop_stats)) %>% tbl_df() %>%
    arrange(hop) %>%
    mutate(id=row_number())

At this stage we’ve retrieved all the data from the website, but some of it has been encoded in a less than useful format.

Data tidying

To tidy the data, we want to extract only a few of the item / value pairs from the data frame: alpha acid, beta acid, co-humulone and total oil. We also need to remove carriage returns from the description and remove the aroma column.

We split the keys and values back apart again using the @ symbol used earlier to combine them, then use unnest to create duplicate columns with each of the key / value pairs in them. spread pivots these up into columns such that the end result has one row per hop with the relevant composition values as columns in the tidy data set.

hops <- all_hops %>%
    arrange(hop) %>%
    mutate(description=gsub('\\r', '', description),
           keys=str_split(composition_keys, "@"),
           values=str_split(composition_values, "@")) %>%
    unnest(keys, values) %>%
    spread(keys, values) %>%
    select(id, hop, type, region, alpha_acid, beta_acid, co_humulone, total_oil, description)

kable(hops %>% select(id, hop, type, region, alpha_acid) %>% head())
id hop type region alpha_acid
1 Admiral Bittering United Kingdom 13 - 16%
2 Ahtanum™ Aroma Pacific Northwest 3.5 - 6.5%
3 Amarillo® Aroma Pacific Northwest 7 - 11%
4 Aramis Aroma France 7.9 - 8.3%
5 Aurora Dual Slovenia 7 - 9.5%
6 Bitter Gold Dual Pacific Northwest 12 - 14.5%

For the aromas we have a one to many relationship where each hop has one or more aroma categories associated with it. We could fully normalize this by creating an aroma table and a join table that connects hop and aroma, but this data is simple enough that I just created the join table itself. We’re using the same str_split / unnest method we used before, except that in this case we don't want to turn those into columns, we want a separate row for each hop × aroma combination.

hops_aromas <- all_hops %>%
    select(id, hop, aroma_profiles) %>%
    mutate(aroma=str_split(aroma_profiles, "@")) %>%
    unnest(aroma) %>%
    select(id, hop, aroma)

Saving and exporting

Finally, we save the data and export it into a PostgreSQL database.

save(list=c("hops", "hops_aromas"),
     file="ych_hops.rdata")

beer <- src_postgres(host="dryas.swingleydev.com", dbname="beer",
                     port=5434, user="cswingle")

dbWriteTable(beer$con, "ych_hops", hops %>% data.frame(), row.names=FALSE)
dbWriteTable(beer$con, "ych_hops_aromas", hops_aromas %>% data.frame(), row.names=FALSE)

Usage

I created a view in the database that combines all the aroma categories into a Postgres array type using this query. I also use a pair of regular expressions to convert the alpha acid string into a Postgres numrange.

CREATE VIEW ych_basic_hop_data AS
SELECT ych_hops.id, ych_hops.hop, array_agg(aroma) AS aromas, type,
    numrange(
        regexp_replace(alpha_acid, '([0-9.]+).*', E'\\1')::numeric,
        regexp_replace(alpha_acid, '.*- ([0-9.]+)%', E'\\1')::numeric,
        '[]') AS alpha_acid_percent, description
FROM ych_hops
    INNER JOIN ych_hops_aromas USING(id)
GROUP BY ych_hops.id, ych_hops.hop, type, alpha_acid, description
ORDER BY hop;

With this, we can, for example, find US aroma hops that are spicy, but without citrus using the ANY() and ALL() array functions.

SELECT hop, region, type, aromas, alpha_acid_percent
FROM ych_basic_hop_data
WHERE type = 'Aroma' AND region = 'Pacific Northwest' AND 'Spicy' = ANY(aromas)
AND 'Citrus' != ALL(aromas) ORDER BY alpha_acid_percent;

     hop    |      region       | type  |            aromas            | alpha_acid_percent
 -----------+-------------------+-------+------------------------------+--------------------
  Crystal   | Pacific Northwest | Aroma | {Floral,Spicy}               | [3,6]
  Hallertau | Pacific Northwest | Aroma | {Floral,Spicy,Herbal}        | [3.5,6.5]
  Tettnang  | Pacific Northwest | Aroma | {Earthy,Floral,Spicy,Herbal} | [4,6]
  Mt. Hood  | Pacific Northwest | Aroma | {Spicy,Herbal}               | [4,6.5]
  Santiam   | Pacific Northwest | Aroma | {Floral,Spicy,Herbal}        | [6,8.5]
  Ultra     | Pacific Northwest | Aroma | {Floral,Spicy}               | [9.2,9.7]

Code

The RMarkdown version of this post, including the code, can be downloaded from GitHub:

https://github.com/cswingle/ych_hops_scraper

