metachronistic http://swingleydev.com/blog/ Latest metachronistic posts en-us Sat, 10 Jun 2017 10:21:21 -0800 Koidern, 2002—2017 http://swingleydev.com/blog/p/2004/ <div class="document"> <div class="figure align-right"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2017/06/koidern_bed.jpg"><img alt="Koidern on her bed" src="//media.swingleydev.com/img/blog/2017/06/koidern_bed_300.jpg" style="width: 300px; height: 225px;" /></a> <p class="caption">Koidern</p> </div> <p>Yesterday we lost Koidern to complications from laryngeal paralysis. Koidern came to us in 2006 from Andrea’s mushing partner who thought she was too “ornery.” It is true that she wouldn’t hesitate to growl at a dog or cat who got too close to her food bowl, and she was protective of her favorite bed, but in every other way she was a very sweet dog. When she was younger she loved to give hugs, jumping up on her hind legs and wrapping her front legs around your waist. She was part Saluki, which made her very distinctive in Andrea’s dog teams and she never lost her beautiful brown coat, perky ears, and curled tail. 
I will miss her continual energy in the dog yard, racing around after the other dogs, how she’d pounce on dog bones and toss them around, “smash” the cats, and the way she’d bark right before coming into the house as if to announce her entrance.</p> <br clear="all" /><div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2017/06/koidern_hug.jpg"><img alt="Koidern hug" class="img-responsive" src="//media.swingleydev.com/img/blog/2017/06/koidern_hug_600.jpg" /></a> <p class="caption">Koidern hug (with her sister Kluane and Carol Kaynor)</p> </div> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2017/06/tok_with_piper_buddy.jpg"><img alt="Koidern with Piper and Buddy" class="img-responsive" src="//media.swingleydev.com/img/blog/2017/06/tok_with_piper_buddy_600.jpg" /></a> <p class="caption">Koidern in Tok, with Piper and Buddy</p> </div> </div> Sat, 10 Jun 2017 10:21:21 -0800 http://swingleydev.com/blog/p/2004/ Koidern dogs memorial Non-motorized commuting by state http://swingleydev.com/blog/p/2003/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>The Alaska Department of Transportation is working on updating their <a class="reference external" href="http://www.akbikeped.com/">bicycling and pedestrian master plan</a> for the state and their web site mentions Alaska as having high percentages of bicycle and pedestrian commuters relative to the rest of the country. I’m interested because I commute to work by bicycle (and occasionally ski or run) every day, either on the trails in the winter, or the roads in the summer. The company I work for (<a class="reference external" href="https://www.abrinc.com/">ABR</a>) pays its employees $3.50 per day for using non-motorized means of transportation to get to work.
I earned more than $700 last year as part of this program and ABR has paid its employees almost $40K since 2009 not to drive to work.</p> <p>The Census Bureau keeps track of how people get to work in the American Community Survey, easily accessible from their web site. We’ll use this data to see if Alaska really does have higher than average rates of non-motorized commuters.</p> </div> <div class="section" id="data"> <h1>Data</h1> <p>The data come from <a class="reference external" href="https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t">FactFinder</a>. I chose ‘American Community Survey’ from the list of data sources near the bottom, searched for ‘bicycle’, chose ‘Commuting characteristics by sex’ (Table S0801), and added ‘All States within United States and Puerto Rico’ as the Geography of interest. The site generates a zip file containing the data as a CSV file along with several other informational files. The code for extracting the data appears at the bottom of this post.</p> <p>The data are percentages of workers 16 years and over and their means of transportation to work.
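The FactFinder CSV has two header rows: machine-readable column names, then human-readable labels, which is why the R code at the bottom reads the names separately and then skips two rows. Here is a minimal sketch of the same pattern in Python with pandas, using a tiny in-memory stand-in for the real ACS_15_1YR_S0801.csv (the column names `GEO.display-label`, `HC01_EST_VC11`, and `HC01_EST_VC12` come from the R code; the label row shown is illustrative):

```python
import io
import pandas as pd

# Tiny stand-in for the FactFinder download: the real file
# (ACS_15_1YR_S0801.csv) has the same two-row header structure,
# a machine-readable name row followed by a descriptive label row.
csv = io.StringIO(
    "GEO.display-label,HC01_EST_VC01,HC01_EST_VC11,HC01_EST_VC12\n"
    "Geography,Total,Walked,Bicycle\n"
    "Alaska,363075,7.9,1.1\n"
)

# Read the machine names from the first row, then re-read the data,
# skipping both header rows -- mirroring the R code at the bottom.
heading = pd.read_csv(csv, nrows=0).columns
csv.seek(0)
s0801 = pd.read_csv(csv, skiprows=2, header=None, names=heading)
```

With the real download, the `io.StringIO` object would simply be replaced by the file path.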
Here’s a table showing the top 10 states ordered by the combination of bicycling and walking percentage.</p> <table border="1" class="docutils"> <colgroup> <col width="6%" /> <col width="23%" /> <col width="13%" /> <col width="13%" /> <col width="11%" /> <col width="17%" /> <col width="8%" /> <col width="11%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">&nbsp;</th> <th class="head">state</th> <th class="head">total</th> <th class="head">motorized</th> <th class="head">carpool</th> <th class="head">public_trans</th> <th class="head">walk</th> <th class="head">bicycle</th> </tr> </thead> <tbody valign="top"> <tr><td>1</td> <td>District of Columbia</td> <td>358,150</td> <td>38.8</td> <td>5.2</td> <td>35.8</td> <td>14.0</td> <td>4.1</td> </tr> <tr><td>2</td> <td>Alaska</td> <td>363,075</td> <td>80.5</td> <td>12.6</td> <td>1.5</td> <td>7.9</td> <td>1.1</td> </tr> <tr><td>3</td> <td>Montana</td> <td>484,043</td> <td>84.9</td> <td>10.4</td> <td>0.8</td> <td>5.6</td> <td>1.6</td> </tr> <tr><td>4</td> <td>New York</td> <td>9,276,438</td> <td>59.3</td> <td>6.6</td> <td>28.6</td> <td>6.3</td> <td>0.7</td> </tr> <tr><td>5</td> <td>Vermont</td> <td>320,350</td> <td>85.1</td> <td>8.2</td> <td>1.3</td> <td>5.8</td> <td>0.8</td> </tr> <tr><td>6</td> <td>Oregon</td> <td>1,839,706</td> <td>81.4</td> <td>10.2</td> <td>4.8</td> <td>3.8</td> <td>2.5</td> </tr> <tr><td>7</td> <td>Massachusetts</td> <td>3,450,540</td> <td>77.6</td> <td>7.4</td> <td>10.6</td> <td>5.0</td> <td>0.8</td> </tr> <tr><td>8</td> <td>Wyoming</td> <td>289,163</td> <td>87.3</td> <td>10.0</td> <td>2.2</td> <td>4.6</td> <td>0.6</td> </tr> <tr><td>9</td> <td>Hawaii</td> <td>704,914</td> <td>80.9</td> <td>13.5</td> <td>7.0</td> <td>4.1</td> <td>0.9</td> </tr> <tr><td>10</td> <td>Washington</td> <td>3,370,945</td> <td>82.2</td> <td>9.8</td> <td>6.2</td> <td>3.7</td> <td>1.0</td> </tr> </tbody> </table> <p>Alaska has the second highest rates of walking and biking to work behind the District of Columbia. 
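The ranking in the table above is straightforward to reproduce. The post’s analysis is in R (see the Code section at the bottom); as a hedged illustration, here is the same sort in Python with pandas, using a few rows copied from the table:

```python
import pandas as pd

# A few rows from the table above (percentages of workers 16 and over).
commute = pd.DataFrame({
    "state": ["New York", "Montana", "District of Columbia", "Alaska"],
    "walk": [6.3, 5.6, 14.0, 7.9],
    "bicycle": [0.7, 1.6, 4.1, 1.1],
})

# Rank by the combined walking and bicycling percentage, highest first.
commute["non_motorized"] = commute["walk"] + commute["bicycle"]
top = commute.sort_values("non_motorized", ascending=False)
```

Sorting these four rows reproduces the top of the table: District of Columbia, then Alaska, Montana, and New York.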
The table is an interesting combination of states with large urban centers (DC, New York, Oregon, Massachusetts) and those that are more rural (Alaska, Montana, Vermont, Wyoming).</p> <p>Another way to rank the data is by combining all forms of transportation besides single-vehicle motorized transport (car pooling, public transportation, walking and bicycling).</p> <table border="1" class="docutils"> <colgroup> <col width="6%" /> <col width="23%" /> <col width="13%" /> <col width="13%" /> <col width="11%" /> <col width="17%" /> <col width="8%" /> <col width="11%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">&nbsp;</th> <th class="head">state</th> <th class="head">total</th> <th class="head">motorized</th> <th class="head">carpool</th> <th class="head">public_trans</th> <th class="head">walk</th> <th class="head">bicycle</th> </tr> </thead> <tbody valign="top"> <tr><td>1</td> <td>District of Columbia</td> <td>358,150</td> <td>38.8</td> <td>5.2</td> <td>35.8</td> <td>14.0</td> <td>4.1</td> </tr> <tr><td>2</td> <td>New York</td> <td>9,276,438</td> <td>59.3</td> <td>6.6</td> <td>28.6</td> <td>6.3</td> <td>0.7</td> </tr> <tr><td>3</td> <td>Massachusetts</td> <td>3,450,540</td> <td>77.6</td> <td>7.4</td> <td>10.6</td> <td>5.0</td> <td>0.8</td> </tr> <tr><td>4</td> <td>New Jersey</td> <td>4,285,182</td> <td>79.3</td> <td>7.5</td> <td>11.6</td> <td>3.3</td> <td>0.3</td> </tr> <tr><td>5</td> <td>Alaska</td> <td>363,075</td> <td>80.5</td> <td>12.6</td> <td>1.5</td> <td>7.9</td> <td>1.1</td> </tr> <tr><td>6</td> <td>Hawaii</td> <td>704,914</td> <td>80.9</td> <td>13.5</td> <td>7.0</td> <td>4.1</td> <td>0.9</td> </tr> <tr><td>7</td> <td>Oregon</td> <td>1,839,706</td> <td>81.4</td> <td>10.2</td> <td>4.8</td> <td>3.8</td> <td>2.5</td> </tr> <tr><td>8</td> <td>Illinois</td> <td>6,094,828</td> <td>81.5</td> <td>7.9</td> <td>9.3</td> <td>3.0</td> <td>0.7</td> </tr> <tr><td>9</td> <td>Washington</td> <td>3,370,945</td> <td>82.2</td> <td>9.8</td> <td>6.2</td> 
<td>3.7</td> <td>1.0</td> </tr> <tr><td>10</td> <td>Maryland</td> <td>3,001,281</td> <td>82.6</td> <td>8.9</td> <td>9.0</td> <td>2.6</td> <td>0.3</td> </tr> </tbody> </table> <p>Here, the states with large urban centers come out higher because of the number of commuters using public transportation. Despite very low availability of public transportation, Alaska still winds up 5th on this list because of high rates of car pooling, in addition to walking and bicycling.</p> </div> <div class="section" id="map-data"> <h1>Map data</h1> <p>To look at regional patterns, we can make a map of the United States colored by non-motorized transportation percentage. This can be a little challenging because Alaska and Hawaii are so far from the rest of the country. What I’m doing here is loading the state data, transforming the data to a projection that’s appropriate for Alaska, and moving Alaska and Hawaii closer to the lower-48 for display. Again, the code appears at the bottom.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2017/04/non_motorized_commute_map.pdf"><object class="img-responsive" data="//media.swingleydev.com/img/blog/2017/04/non_motorized_commute_map.svg" type="image/svg+xml">Non-motorized commuting percentage by state</object></a> </div> <p>You can see that non-motorized transportation is very low throughout the deep south, and tends to be higher in the western half of the country, but the really high rates of bicycling and walking to work are isolated. High Vermont next to low New Hampshire, or Oregon and Montana split by Idaho.</p> </div> <div class="section" id="urban-and-rural-median-age-of-the-population"> <h1>Urban and rural, median age of the population</h1> <p>What explains the high rates of non-motorized commuting in Alaska and the other states at the top of the list? 
Urbanization is certainly one important factor explaining why the District of Columbia and states like New York, Oregon and Massachusetts have high rates of walking and bicycling. But what about Montana, Vermont, and Wyoming?</p> <p>Age of the population might have an effect as well, as younger people are more likely to walk and bike to work than older people. Alaska has the second youngest population (33.3 years) in the U.S. and DC is third (33.8), but the other states in the top five (Utah, Texas, North Dakota) don’t have high rates of non-motorized transportation.</p> <p>So it’s more complicated than just these factors. California is a good example, with a combination of high urbanization (second, 95.0% urban), low median age (eighth, 36.2) and great weather year round, but it is only 19th for non-motorized commuting. Who walks in California, after all?</p> </div> <div class="section" id="conclusion"> <h1>Conclusion</h1> <p>I hope DOT comes up with a progressive plan for improving opportunities for pedestrian and bicycle transportation in Alaska. They’ve made some progress here in Fairbanks, building new paths for non-motorized traffic, but they also seem blind to the realities of actually using the roads and paths on a bicycle. The “bike path” near my house abruptly turns from asphalt to gravel a third of the way down Miller Hill, and the shoulders of the roads I commute on are filled with deep snow in winter, gravel in spring, and all manner of detritus year round.
Many roads don’t have a useable shoulder at all.</p> </div> <div class="section" id="code"> <h1>Code</h1> <div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>tidyverse<span class="p">)</span> <span class="c1"># data import, manipulation</span> <span class="kn">library</span><span class="p">(</span>knitr<span class="p">)</span> <span class="c1"># pretty tables</span> <span class="kn">library</span><span class="p">(</span>rpostgis<span class="p">)</span> <span class="c1"># PostGIS support</span> <span class="kn">library</span><span class="p">(</span>rgdal<span class="p">)</span> <span class="c1"># geographic transformation</span> <span class="kn">library</span><span class="p">(</span>maptools<span class="p">)</span> <span class="c1"># geographic transformation</span> <span class="kn">library</span><span class="p">(</span>viridis<span class="p">)</span> <span class="c1"># color blind color palette</span> <span class="c1"># Read the heading</span> heading <span class="o">&lt;-</span> read_csv<span class="p">(</span><span class="s">&#39;ACS_15_1YR_S0801.csv&#39;</span><span class="p">,</span> n_max <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="kp">names</span><span class="p">()</span> <span class="c1"># Read the data</span> s0801 <span class="o">&lt;-</span> read_csv<span class="p">(</span><span class="s">&#39;ACS_15_1YR_S0801.csv&#39;</span><span class="p">,</span> col_names <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> skip <span class="o">=</span> <span class="m">2</span><span class="p">)</span> <span class="kp">names</span><span class="p">(</span>s0801<span class="p">)</span> <span class="o">&lt;-</span> heading <span class="c1"># Extract only the columns we need, add state postal codes</span> commute <span class="o">&lt;-</span> s0801 <span class="o">%&gt;%</span> transmute<span class="p">(</span>state <span 
class="o">=</span> <span class="sb">`GEO.display-label`</span><span class="p">,</span> total <span class="o">=</span> HC01_EST_VC01<span class="p">,</span> motorized <span class="o">=</span> HC01_EST_VC03<span class="p">,</span> carpool <span class="o">=</span> HC01_EST_VC05<span class="p">,</span> public_trans <span class="o">=</span> HC01_EST_VC10<span class="p">,</span> walk <span class="o">=</span> HC01_EST_VC11<span class="p">,</span> bicycle <span class="o">=</span> HC01_EST_VC12<span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>state <span class="o">!=</span> <span class="s">&#39;Puerto Rico&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>state_postal <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="s">&#39;AL&#39;</span><span class="p">,</span> <span class="s">&#39;AK&#39;</span><span class="p">,</span> <span class="s">&#39;AZ&#39;</span><span class="p">,</span> <span class="s">&#39;AR&#39;</span><span class="p">,</span> <span class="s">&#39;CA&#39;</span><span class="p">,</span> <span class="s">&#39;CO&#39;</span><span class="p">,</span> <span class="s">&#39;CT&#39;</span><span class="p">,</span> <span class="s">&#39;DE&#39;</span><span class="p">,</span> <span class="s">&#39;DC&#39;</span><span class="p">,</span> <span class="s">&#39;FL&#39;</span><span class="p">,</span> <span class="s">&#39;GA&#39;</span><span class="p">,</span> <span class="s">&#39;HI&#39;</span><span class="p">,</span> <span class="s">&#39;ID&#39;</span><span class="p">,</span> <span class="s">&#39;IL&#39;</span><span class="p">,</span> <span class="s">&#39;IN&#39;</span><span class="p">,</span> <span class="s">&#39;IA&#39;</span><span class="p">,</span> <span class="s">&#39;KS&#39;</span><span class="p">,</span> <span class="s">&#39;KY&#39;</span><span class="p">,</span> <span class="s">&#39;LA&#39;</span><span class="p">,</span> <span 
class="s">&#39;ME&#39;</span><span class="p">,</span> <span class="s">&#39;MD&#39;</span><span class="p">,</span> <span class="s">&#39;MA&#39;</span><span class="p">,</span> <span class="s">&#39;MI&#39;</span><span class="p">,</span> <span class="s">&#39;MN&#39;</span><span class="p">,</span> <span class="s">&#39;MS&#39;</span><span class="p">,</span> <span class="s">&#39;MO&#39;</span><span class="p">,</span> <span class="s">&#39;MT&#39;</span><span class="p">,</span> <span class="s">&#39;NE&#39;</span><span class="p">,</span> <span class="s">&#39;NV&#39;</span><span class="p">,</span> <span class="s">&#39;NH&#39;</span><span class="p">,</span> <span class="s">&#39;NJ&#39;</span><span class="p">,</span> <span class="s">&#39;NM&#39;</span><span class="p">,</span> <span class="s">&#39;NY&#39;</span><span class="p">,</span> <span class="s">&#39;NC&#39;</span><span class="p">,</span> <span class="s">&#39;ND&#39;</span><span class="p">,</span> <span class="s">&#39;OH&#39;</span><span class="p">,</span> <span class="s">&#39;OK&#39;</span><span class="p">,</span> <span class="s">&#39;OR&#39;</span><span class="p">,</span> <span class="s">&#39;PA&#39;</span><span class="p">,</span> <span class="s">&#39;RI&#39;</span><span class="p">,</span> <span class="s">&#39;SC&#39;</span><span class="p">,</span> <span class="s">&#39;SD&#39;</span><span class="p">,</span> <span class="s">&#39;TN&#39;</span><span class="p">,</span> <span class="s">&#39;TX&#39;</span><span class="p">,</span> <span class="s">&#39;UT&#39;</span><span class="p">,</span> <span class="s">&#39;VT&#39;</span><span class="p">,</span> <span class="s">&#39;VA&#39;</span><span class="p">,</span> <span class="s">&#39;WA&#39;</span><span class="p">,</span> <span class="s">&#39;WV&#39;</span><span class="p">,</span> <span class="s">&#39;WI&#39;</span><span class="p">,</span> <span class="s">&#39;WY&#39;</span><span class="p">))</span> <span class="c1"># Print top ten tables</span> kable<span class="p">(</span>commute 
<span class="o">%&gt;%</span> select<span class="p">(</span><span class="o">-</span>state_postal<span class="p">)</span> <span class="o">%&gt;%</span> arrange<span class="p">(</span>desc<span class="p">(</span>walk <span class="o">+</span> bicycle<span class="p">))</span> <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">(</span>n <span class="o">=</span> <span class="m">10</span><span class="p">),</span> format.args <span class="o">=</span> <span class="kt">list</span><span class="p">(</span>big.mark <span class="o">=</span> <span class="s">&quot;,&quot;</span><span class="p">),</span> row.names <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> kable<span class="p">(</span>commute <span class="o">%&gt;%</span> select<span class="p">(</span><span class="o">-</span>state_postal<span class="p">)</span> <span class="o">%&gt;%</span> arrange<span class="p">(</span>motorized<span class="p">)</span> <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">(</span>n <span class="o">=</span> <span class="m">10</span><span class="p">),</span> format.args <span class="o">=</span> <span class="kt">list</span><span class="p">(</span>big.mark <span class="o">=</span> <span class="s">&quot;,&quot;</span><span class="p">),</span> row.names <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="c1"># Connect to the database with the state layer</span> layers <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host <span class="o">=</span> <span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname <span class="o">=</span> <span class="s">&quot;layers&quot;</span><span class="p">)</span> states <span class="o">&lt;-</span> pgGetGeom<span class="p">(</span>layers<span class="o">$</span>con<span class="p">,</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;public&quot;</span><span class="p">,</span> <span 
class="s">&quot;states&quot;</span><span class="p">),</span> geom <span class="o">=</span> <span class="s">&quot;wkb_geometry&quot;</span><span class="p">,</span> gid <span class="o">=</span> <span class="s">&quot;ogc_fid&quot;</span><span class="p">)</span> <span class="c1"># Transform to srid 3338 (Alaska Albers)</span> states_3338 <span class="o">&lt;-</span> spTransform<span class="p">(</span>states<span class="p">,</span> CRS<span class="p">(</span><span class="s">&quot;+proj=aea +lat_1=55 +lat_2=65 +lat_0=50</span> <span class="s"> +lon_0=-154 +x_0=0 +y_0=0 +ellps=GRS80</span> <span class="s"> +towgs84=0,0,0,0,0,0,0 +units=m</span> <span class="s"> +no_defs&quot;</span><span class="p">))</span> <span class="c1"># Convert to a data frame suitable for ggplot, move AK and HI</span> ggstates <span class="o">&lt;-</span> fortify<span class="p">(</span>states_3338<span class="p">,</span> region <span class="o">=</span> <span class="s">&quot;state&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>id <span class="o">!=</span> <span class="s">&#39;PR&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span> inner_join<span class="p">(</span>commute<span class="p">,</span> by <span class="o">=</span> <span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="s">&quot;id&quot;</span> <span class="o">=</span> <span class="s">&quot;state_postal&quot;</span><span class="p">)))</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>lat <span class="o">=</span> <span class="kp">ifelse</span><span class="p">(</span>state <span class="o">==</span> <span class="s">&#39;Hawaii&#39;</span><span class="p">,</span> lat <span class="o">+</span> <span class="m">2300000</span><span class="p">,</span> lat<span class="p">),</span> long <span class="o">=</span> <span class="kp">ifelse</span><span class="p">(</span>state <span class="o">==</span> <span class="s">&#39;Hawaii&#39;</span><span
class="p">,</span> long <span class="o">+</span> <span class="m">2000000</span><span class="p">,</span> long<span class="p">),</span> lat <span class="o">=</span> <span class="kp">ifelse</span><span class="p">(</span>state <span class="o">==</span> <span class="s">&#39;Alaska&#39;</span><span class="p">,</span> lat <span class="o">+</span> <span class="m">1000000</span><span class="p">,</span> lat<span class="p">),</span> long <span class="o">=</span> <span class="kp">ifelse</span><span class="p">(</span>state <span class="o">==</span> <span class="s">&#39;Alaska&#39;</span><span class="p">,</span> long <span class="o">+</span> <span class="m">2000000</span><span class="p">,</span> long<span class="p">))</span> <span class="c1"># Plot it</span> p <span class="o">&lt;-</span> ggplot<span class="p">()</span> <span class="o">+</span> geom_polygon<span class="p">(</span>data <span class="o">=</span> ggstates<span class="p">,</span> colour <span class="o">=</span> <span class="s">&quot;black&quot;</span><span class="p">,</span> aes<span class="p">(</span>x <span class="o">=</span> long<span class="p">,</span> y <span class="o">=</span> lat<span class="p">,</span> group <span class="o">=</span> group<span class="p">,</span> fill <span class="o">=</span> bicycle <span class="o">+</span> walk<span class="p">))</span> <span class="o">+</span> coord_fixed<span class="p">(</span>ratio <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">+</span> scale_fill_viridis<span class="p">(</span>name <span class="o">=</span> <span class="s">&quot;Non-motorized\n commuters (%)&quot;</span><span class="p">,</span> option <span class="o">=</span> <span class="s">&quot;plasma&quot;</span><span class="p">,</span> limits <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">9</span><span class="p">),</span> breaks <span class="o">=</span> <span class="kp">seq</span><span 
class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">9</span><span class="p">,</span> <span class="m">3</span><span class="p">))</span> <span class="o">+</span> theme_void<span class="p">()</span> <span class="o">+</span> theme<span class="p">(</span>legend.position <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">0.9</span><span class="p">,</span> <span class="m">0.2</span><span class="p">))</span> width <span class="o">&lt;-</span> <span class="m">16</span> height <span class="o">&lt;-</span> <span class="m">9</span> resize <span class="o">&lt;-</span> <span class="m">0.75</span> svg<span class="p">(</span><span class="s">&quot;non_motorized_commute_map.svg&quot;</span><span class="p">,</span> width <span class="o">=</span> width<span class="o">*</span>resize<span class="p">,</span> height <span class="o">=</span> height<span class="o">*</span>resize<span class="p">)</span> <span class="kp">print</span><span class="p">(</span>p<span class="p">)</span> dev.off<span class="p">()</span> pdf<span class="p">(</span><span class="s">&quot;non_motorized_commute_map.pdf&quot;</span><span class="p">,</span> width <span class="o">=</span> width<span class="o">*</span>resize<span class="p">,</span> height <span class="o">=</span> height<span class="o">*</span>resize<span class="p">)</span> <span class="kp">print</span><span class="p">(</span>p<span class="p">)</span> dev.off<span class="p">()</span> <span class="c1"># Urban and rural percentages by state</span> heading <span class="o">&lt;-</span> read_csv<span class="p">(</span><span class="s">&#39;../urban_rural/DEC_10_SF1_P2.csv&#39;</span><span class="p">,</span> n_max <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="kp">names</span><span class="p">()</span> dec10 <span class="o">&lt;-</span> read_csv<span class="p">(</span><span 
class="s">&#39;../urban_rural/DEC_10_SF1_P2.csv&#39;</span><span class="p">,</span> col_names <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> skip <span class="o">=</span> <span class="m">2</span><span class="p">)</span> <span class="kp">names</span><span class="p">(</span>dec10<span class="p">)</span> <span class="o">&lt;-</span> heading urban_rural <span class="o">&lt;-</span> dec10 <span class="o">%&gt;%</span> transmute<span class="p">(</span>state <span class="o">=</span> <span class="sb">`GEO.display-label`</span><span class="p">,</span> total <span class="o">=</span> D001<span class="p">,</span> urban <span class="o">=</span> D002<span class="p">,</span> rural <span class="o">=</span> D005<span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>state <span class="o">!=</span> <span class="s">&#39;Puerto Rico&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>urban_percentage <span class="o">=</span> urban <span class="o">/</span> total <span class="o">*</span> <span class="m">100</span><span class="p">)</span> <span class="c1"># Median age by state</span> heading <span class="o">&lt;-</span> read_csv<span class="p">(</span><span class="s">&#39;../age_sex/ACS_15_1YR_S0101.csv&#39;</span><span class="p">,</span> n_max <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="kp">names</span><span class="p">()</span> s0101 <span class="o">&lt;-</span> read_csv<span class="p">(</span><span class="s">&#39;../age_sex/ACS_15_1YR_S0101.csv&#39;</span><span class="p">,</span> col_names <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> skip <span class="o">=</span> <span class="m">2</span><span class="p">)</span> <span class="kp">names</span><span class="p">(</span>s0101<span class="p">)</span> <span class="o">&lt;-</span> heading age <span class="o">&lt;-</span> s0101 <span 
class="o">%&gt;%</span> transmute<span class="p">(</span>state <span class="o">=</span> <span class="sb">`GEO.display-label`</span><span class="p">,</span> median_age <span class="o">=</span> HC01_EST_VC35<span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>state <span class="o">!=</span> <span class="s">&#39;Puerto Rico&#39;</span><span class="p">)</span> <span class="c1"># Do urban percentage and median age explain anything about</span> <span class="c1"># non-motorized transit?</span> census_data <span class="o">&lt;-</span> commute <span class="o">%&gt;%</span> inner_join<span class="p">(</span>urban_rural<span class="p">,</span> by <span class="o">=</span> <span class="s">&quot;state&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> inner_join<span class="p">(</span>age<span class="p">,</span> by <span class="o">=</span> <span class="s">&quot;state&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>state<span class="p">,</span> state_postal<span class="p">,</span> walk<span class="p">,</span> bicycle<span class="p">,</span> urban_percentage<span class="p">,</span> median_age<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>non_motorized <span class="o">=</span> walk <span class="o">+</span> bicycle<span class="p">)</span> u <span class="o">&lt;-</span> ggplot<span class="p">(</span>data <span class="o">=</span> census_data<span class="p">,</span> aes<span class="p">(</span>x <span class="o">=</span> urban_percentage<span class="p">,</span> y <span class="o">=</span> non_motorized<span class="p">))</span> <span class="o">+</span> geom_text<span class="p">(</span>aes<span class="p">(</span>label <span class="o">=</span> state_postal<span class="p">))</span> svg<span class="p">(</span><span class="s">&quot;urban.svg&quot;</span><span class="p">,</span> width <span class="o">=</span> width<span class="o">*</span>resize<span 
class="p">,</span> height <span class="o">=</span> height<span class="o">*</span>resize<span class="p">)</span> <span class="kp">print</span><span class="p">(</span>u<span class="p">)</span> dev.off<span class="p">()</span> a <span class="o">&lt;-</span> ggplot<span class="p">(</span>data <span class="o">=</span> census_data<span class="p">,</span> aes<span class="p">(</span>x <span class="o">=</span> median_age<span class="p">,</span> y <span class="o">=</span> non_motorized<span class="p">))</span> <span class="o">+</span> geom_text<span class="p">(</span>aes<span class="p">(</span>label <span class="o">=</span> state_postal<span class="p">))</span> svg<span class="p">(</span><span class="s">&quot;age.svg&quot;</span><span class="p">,</span> width <span class="o">=</span> width<span class="o">*</span>resize<span class="p">,</span> height <span class="o">=</span> height<span class="o">*</span>resize<span class="p">)</span> <span class="kp">print</span><span class="p">(</span>a<span class="p">)</span> dev.off<span class="p">()</span> <span class="c1"># Not significant:</span> model <span class="o">=</span> lm<span class="p">(</span>data <span class="o">=</span> census_data<span class="p">,</span> non_motorized <span class="o">~</span> urban_percentage <span class="o">+</span> median_age<span class="p">)</span> <span class="c1"># Only significant because of DC (a clear outlier)</span> model <span class="o">=</span> lm<span class="p">(</span>data <span class="o">=</span> census_data<span class="p">,</span> non_motorized <span class="o">~</span> urban_percentage <span class="o">*</span> median_age<span class="p">)</span> </pre></div> </div> </div> Sat, 08 Apr 2017 11:30:55 -0800 http://swingleydev.com/blog/p/2003/ bicycling walking census data R Cold temperature (−15° F) frequency http://swingleydev.com/blog/p/2002/ <div class="document"> <div class="section" id="introduction"> <h1>Introduction</h1> <p>The latest forecast discussions for Northern Alaska have included 
warnings that we are likely to experience an extended period of below normal temperatures starting at the end of this week, and yesterday’s <a class="reference external" href="http://ak-wx.blogspot.com/">Deep Cold</a> blog post discusses the similarity of model forecast patterns to patterns seen in the 1989 and 1999 extreme cold events.</p> <p>Our dogs spend most of their time <a class="reference external" href="https://swingleydev.com/photolog/p/894/">in the house</a> when we’re home, but if both of us are at work they’re outside in the dog yard. They have insulated dog houses, but when it’s colder than −15° F, we put them into a heated dog barn. That means one of us has to come home in the middle of the day to let them out to go to the bathroom.</p> <p>Since we’re past the Winter Solstice, and day length is now increasing, I was curious to see if that has an effect on daily temperature, hopeful that the frequency of days when we need to put the dogs in the barn is decreasing.</p> </div> <div class="section" id="methods"> <h1>Methods</h1> <p>We’ll use daily minimum and maximum temperature data from the Fairbanks International Airport station, keeping track of how many years the temperatures are below −15° F and dividing by the total to get a frequency. We live in a cold valley on Goldstream Creek, so our temperatures are typically several degrees colder than the Fairbanks Airport, and we often don’t warm up as much during the day as in other places, but minimum airport temperature is a reasonable proxy for the overall winter temperature at our house.</p> </div> <div class="section" id="results"> <h1>Results</h1> <p>The following plot shows the frequency of minimum (the top of each line) and maximum (the bottom) temperature colder than −15° F at the airport over the period of record, 1904−2016. 
The curved blue line represents a smoothed best-fit line through the minimum temperature frequencies, and the vertical blue line is drawn at the date when the frequency is highest.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2017/01/frequency_fifteen_below.pdf"><img alt="Frequency of days with temperatures below −15° F" class="img-responsive" src="//media.swingleydev.com/img/blog/2017/01/frequency_fifteen_below.svgz" /></a> </div> <p>The frequency peaks on January 12th, so we have a few more days before the likelihood of needing to put the dogs in the barn starts to decline. The plot also shows that we could still reach that threshold all the way into April.</p> <p>For fun, here’s the same plot using −40° as the threshold:</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2017/01/frequency_forty_below.pdf"><img alt="Frequency of days with temperatures below −40°" class="img-responsive" src="//media.swingleydev.com/img/blog/2017/01/frequency_forty_below.svgz" /></a> </div> <p>The date when the frequency starts to decline shifts slightly later, to January 15th, and you can see the frequencies are lower. In mid-January, we can expect the minimum temperature to be colder than −15° F more than half the time, but the frequency of temperatures colder than −40° is just under 15%.
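</p>
<p>The peak dates reported above come from fitting a loess curve to the daily frequencies and taking the day of year with the highest predicted value. A simplified sketch of that step, with made-up frequencies (the variable names here are illustrative, not the ones from the appendix code):</p>
<div class="highlight"><pre># Find the day of year where a smoothed frequency curve peaks
set.seed(1)
doy &lt;- 1:150
freq &lt;- 0.5 - ((doy - 100) / 100)^2 + rnorm(150, sd = 0.02)  # peaks near day 100
fit &lt;- loess(freq ~ doy)
peak_doy &lt;- doy[which.max(predict(fit, doy))]
</pre></div>
<p>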
There’s also an interesting anomaly in mid to late December where the frequency of very cold temperatures appears to drop.</p> </div> <div class="section" id="appendix-r-code"> <h1>Appendix: R code</h1> <div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>tidyverse<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>lubridate<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>scales<span class="p">)</span> noaa <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host<span class="o">=</span><span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname<span class="o">=</span><span class="s">&quot;noaa&quot;</span><span class="p">)</span> fairbanks <span class="o">&lt;-</span> tbl<span class="p">(</span>noaa<span class="p">,</span> build_sql<span class="p">(</span><span class="s">&quot;SELECT * FROM ghcnd_pivot</span> <span class="s"> WHERE station_name=&#39;FAIRBANKS INTL AP&#39;&quot;</span><span class="p">))</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> <span class="kp">save</span><span class="p">(</span>fairbanks<span class="p">,</span> file<span class="o">=</span><span class="s">&quot;fairbanks_ghcnd.rdat&quot;</span><span class="p">)</span> for_plot <span class="o">&lt;-</span> fairbanks <span class="o">%&gt;%</span> mutate<span class="p">(</span>doy<span class="o">=</span>yday<span class="p">(</span>dte<span class="p">),</span> dte_str<span class="o">=</span><span class="kp">format</span><span class="p">(</span>dte<span class="p">,</span> <span class="s">&quot;%d %b&quot;</span><span class="p">),</span> min_below<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>tmin_c <span class="o">&lt;</span> <span class="m">-26.11</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">),</span> max_below<span class="o">=</span><span 
class="kp">ifelse</span><span class="p">(</span>tmax_c <span class="o">&lt;</span> <span class="m">-26.11</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">))</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>dte_str<span class="o">!=</span><span class="s">&quot;29 Feb&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>doy<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>leap_year<span class="p">(</span>dte<span class="p">)</span> <span class="o">&amp;</span> doy<span class="o">&gt;</span><span class="m">60</span><span class="p">,</span> doy<span class="m">-1</span><span class="p">,</span> doy<span class="p">),</span> doy<span class="o">=</span><span class="p">(</span>doy<span class="m">+31+28+31+30</span><span class="p">)</span><span class="o">%%</span><span class="m">365</span><span class="p">)</span> <span class="o">%&gt;%</span> group_by<span class="p">(</span>doy<span class="p">,</span> dte_str<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>n_min<span class="o">=</span><span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>min_below<span class="p">),</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">)),</span> n_max<span class="o">=</span><span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>max_below<span class="p">),</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">)))</span> <span class="o">%&gt;%</span> summarize<span class="p">(</span>min_freq<span class="o">=</span><span class="kp">sum</span><span 
class="p">(</span>min_below<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="o">/</span><span class="kp">max</span><span class="p">(</span>n_min<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">),</span> max_freq<span class="o">=</span><span class="kp">sum</span><span class="p">(</span>max_below<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="o">/</span><span class="kp">max</span><span class="p">(</span>n_max<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">))</span> x_breaks <span class="o">&lt;-</span> for_plot <span class="o">%&gt;%</span> filter<span class="p">(</span>doy <span class="o">%in%</span> <span class="kp">seq</span><span class="p">(</span><span class="m">49</span><span class="p">,</span> <span class="m">224</span><span class="p">,</span> <span class="m">7</span><span class="p">))</span> stats <span class="o">&lt;-</span> tibble<span class="p">(</span>doy<span class="o">=</span><span class="kp">seq</span><span class="p">(</span><span class="m">49</span><span class="p">,</span> <span class="m">224</span><span class="p">),</span> pred<span class="o">=</span>predict<span class="p">(</span>loess<span class="p">(</span>min_freq <span class="o">~</span> doy<span class="p">,</span> for_plot <span class="o">%&gt;%</span> filter<span class="p">(</span>doy <span class="o">&gt;=</span> <span class="m">49</span><span class="p">,</span> doy <span class="o">&lt;=</span> <span class="m">224</span><span class="p">))))</span> max_stats <span class="o">&lt;-</span> stats <span class="o">%&gt;%</span> arrange<span class="p">(</span>desc<span class="p">(</span>pred<span class="p">))</span> <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">(</span>n<span class="o">=</span><span class="m">1</span><span 
class="p">)</span> p <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>for_plot<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>doy<span class="p">,</span> ymin<span class="o">=</span>min_freq<span class="p">,</span> ymax<span class="o">=</span>max_freq<span class="p">))</span> <span class="o">+</span> geom_linerange<span class="p">()</span> <span class="o">+</span> geom_smooth<span class="p">(</span>aes<span class="p">(</span>y<span class="o">=</span>min_freq<span class="p">),</span> se<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> size<span class="o">=</span><span class="m">0.5</span><span class="p">)</span> <span class="o">+</span> geom_segment<span class="p">(</span>aes<span class="p">(</span>x<span class="o">=</span>max_stats<span class="o">$</span>doy<span class="p">,</span> xend<span class="o">=</span>max_stats<span class="o">$</span>doy<span class="p">,</span> y<span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span> yend<span class="o">=</span>max_stats<span class="o">$</span>pred<span class="p">),</span> colour<span class="o">=</span><span class="s">&quot;blue&quot;</span><span class="p">,</span> size<span class="o">=</span><span class="m">0.5</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="kc">NULL</span><span class="p">,</span> limits<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">49</span><span class="p">,</span> <span class="m">224</span><span class="p">),</span> breaks<span class="o">=</span>x_breaks<span class="o">$</span>doy<span class="p">,</span> labels<span class="o">=</span>x_breaks<span class="o">$</span>dte_str<span class="p">)</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Frequency of days colder than −15° 
F&quot;</span><span class="p">,</span> breaks<span class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">10</span><span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> theme<span class="p">(</span>axis.text.x<span class="o">=</span>element_text<span class="p">(</span>angle<span class="o">=</span><span class="m">30</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">1</span><span class="p">))</span> <span class="c1"># Minus 40</span> for_plot <span class="o">&lt;-</span> fairbanks <span class="o">%&gt;%</span> mutate<span class="p">(</span>doy<span class="o">=</span>yday<span class="p">(</span>dte<span class="p">),</span> dte_str<span class="o">=</span><span class="kp">format</span><span class="p">(</span>dte<span class="p">,</span> <span class="s">&quot;%d %b&quot;</span><span class="p">),</span> min_below<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>tmin_c <span class="o">&lt;</span> <span class="m">-40</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">),</span> max_below<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>tmax_c <span class="o">&lt;</span> <span class="m">-40</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">))</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>dte_str<span class="o">!=</span><span class="s">&quot;29 Feb&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>doy<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span>leap_year<span class="p">(</span>dte<span class="p">)</span> <span class="o">&amp;</span> doy<span class="o">&gt;</span><span class="m">60</span><span class="p">,</span> doy<span class="m">-1</span><span class="p">,</span> 
doy<span class="p">),</span> doy<span class="o">=</span><span class="p">(</span>doy<span class="m">+31+28+31+30</span><span class="p">)</span><span class="o">%%</span><span class="m">365</span><span class="p">)</span> <span class="o">%&gt;%</span> group_by<span class="p">(</span>doy<span class="p">,</span> dte_str<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>n_min<span class="o">=</span><span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>min_below<span class="p">),</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">)),</span> n_max<span class="o">=</span><span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>max_below<span class="p">),</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">)))</span> <span class="o">%&gt;%</span> summarize<span class="p">(</span>min_freq<span class="o">=</span><span class="kp">sum</span><span class="p">(</span>min_below<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="o">/</span><span class="kp">max</span><span class="p">(</span>n_min<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">),</span> max_freq<span class="o">=</span><span class="kp">sum</span><span class="p">(</span>max_below<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="o">/</span><span class="kp">max</span><span class="p">(</span>n_max<span class="p">,</span> na.rm<span class="o">=</span><span class="kc">TRUE</span><span class="p">))</span> x_breaks <span class="o">&lt;-</span> for_plot <span 
class="o">%&gt;%</span> filter<span class="p">(</span>doy <span class="o">%in%</span> <span class="kp">seq</span><span class="p">(</span><span class="m">63</span><span class="p">,</span> <span class="m">203</span><span class="p">,</span> <span class="m">7</span><span class="p">))</span> stats <span class="o">&lt;-</span> tibble<span class="p">(</span>doy<span class="o">=</span><span class="kp">seq</span><span class="p">(</span><span class="m">63</span><span class="p">,</span> <span class="m">203</span><span class="p">),</span> pred<span class="o">=</span>predict<span class="p">(</span>loess<span class="p">(</span>min_freq <span class="o">~</span> doy<span class="p">,</span> for_plot <span class="o">%&gt;%</span> filter<span class="p">(</span>doy <span class="o">&gt;=</span> <span class="m">63</span><span class="p">,</span> doy <span class="o">&lt;=</span> <span class="m">203</span><span class="p">))))</span> max_stats <span class="o">&lt;-</span> stats <span class="o">%&gt;%</span> arrange<span class="p">(</span>desc<span class="p">(</span>pred<span class="p">))</span> <span class="o">%&gt;%</span> <span class="kp">head</span><span class="p">(</span>n<span class="o">=</span><span class="m">1</span><span class="p">)</span> q <span class="o">&lt;-</span> ggplot<span class="p">(</span>data<span class="o">=</span>for_plot<span class="p">,</span> aes<span class="p">(</span>x<span class="o">=</span>doy<span class="p">,</span> ymin<span class="o">=</span>min_freq<span class="p">,</span> ymax<span class="o">=</span>max_freq<span class="p">))</span> <span class="o">+</span> geom_linerange<span class="p">()</span> <span class="o">+</span> geom_smooth<span class="p">(</span>aes<span class="p">(</span>y<span class="o">=</span>min_freq<span class="p">),</span> se<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> size<span class="o">=</span><span class="m">0.5</span><span class="p">)</span> <span class="o">+</span> geom_segment<span 
class="p">(</span>aes<span class="p">(</span>x<span class="o">=</span>max_stats<span class="o">$</span>doy<span class="p">,</span> xend<span class="o">=</span>max_stats<span class="o">$</span>doy<span class="p">,</span> y<span class="o">=-</span><span class="kc">Inf</span><span class="p">,</span> yend<span class="o">=</span>max_stats<span class="o">$</span>pred<span class="p">),</span> colour<span class="o">=</span><span class="s">&quot;blue&quot;</span><span class="p">,</span> size<span class="o">=</span><span class="m">0.5</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="kc">NULL</span><span class="p">,</span> limits<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">63</span><span class="p">,</span> <span class="m">203</span><span class="p">),</span> breaks<span class="o">=</span>x_breaks<span class="o">$</span>doy<span class="p">,</span> labels<span class="o">=</span>x_breaks<span class="o">$</span>dte_str<span class="p">)</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Frequency of days colder than −40°&quot;</span><span class="p">,</span> breaks<span class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">10</span><span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> theme<span class="p">(</span>axis.text.x<span class="o">=</span>element_text<span class="p">(</span>angle<span class="o">=</span><span class="m">30</span><span class="p">,</span> hjust<span class="o">=</span><span class="m">1</span><span class="p">))</span> </pre></div> </div> </div> Mon, 09 Jan 2017 09:49:33 -0900 http://swingleydev.com/blog/p/2002/ weather climate temperature R Low snow years in Fairbanks http://swingleydev.com/blog/p/2001/ <div class="document"> <div class="section" 
id="introduction"> <h1>Introduction</h1> <p>So far this winter we’ve gotten only 4.1&nbsp;inches of snow, well below the normal 19.7&nbsp;inches, and there are only 2&nbsp;inches of snow on the ground. At this point last year we had 8&nbsp;inches and I’d been biking and skiing on the trail to work for two weeks. In his <a class="reference external" href="http://ak-wx.blogspot.com/2016/11/north-pacific-temperature-update.html">North Pacific Temperature Update</a> blog post, Richard James mentions that winters like this one, with a strongly positive Pacific Decadal Oscillation phase combined with a strongly negative North Pacific Mode phase, tend to produce a “distinctly dry” pattern for interior Alaska. I don’t pretend to understand these large-scale climate patterns, but I thought it would be interesting to look at snowfall and snow depth in years with very little mid-November snow. In other years like this one, did we eventually get enough snow that the trails filled in and we could fully participate in winter sports like skiing, dog mushing, and fat biking?</p> </div> <div class="section" id="data"> <h1>Data</h1> <p>We will use daily data from the Global Historical Climatology Network (GHCN) data set for the Fairbanks International Airport station.
Data prior to 1950 is excluded because of poor-quality snowfall and snow depth records, and because there’s a good chance that our climate has changed since then, so patterns from that era aren’t a good model for the current climate in Alaska.</p> <p>We will look at both snow depth and the cumulative winter snowfall.</p> </div> <div class="section" id="results"> <h1>Results</h1> <p>The following tables show the ten years with the lowest cumulative snowfall and snow depth values on November&nbsp;18th, from 1950 to the present.</p> <table border="1" class="tosf docutils"> <colgroup> <col width="20%" /> <col width="80%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Year</th> <th class="head">Cumulative Snowfall (inches)</th> </tr> </thead> <tbody valign="top"> <tr><td>1953</td> <td>1.5</td> </tr> <tr><td>2016</td> <td>4.1</td> </tr> <tr><td>1954</td> <td>4.3</td> </tr> <tr><td>2014</td> <td>6.0</td> </tr> <tr><td>2006</td> <td>6.4</td> </tr> <tr><td>1962</td> <td>7.5</td> </tr> <tr><td>1998</td> <td>7.8</td> </tr> <tr><td>1960</td> <td>8.5</td> </tr> <tr><td>1995</td> <td>8.8</td> </tr> <tr><td>1979</td> <td>10.2</td> </tr> </tbody> </table> <table border="1" class="tosf docutils"> <colgroup> <col width="26%" /> <col width="74%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Year</th> <th class="head">Snow depth (inches)</th> </tr> </thead> <tbody valign="top"> <tr><td>1953</td> <td>1</td> </tr> <tr><td>1954</td> <td>1</td> </tr> <tr><td>1962</td> <td>1</td> </tr> <tr><td>2016</td> <td>2</td> </tr> <tr><td>2014</td> <td>2</td> </tr> <tr><td>1998</td> <td>3</td> </tr> <tr><td>1964</td> <td>3</td> </tr> <tr><td>1976</td> <td>3</td> </tr> <tr><td>1971</td> <td>3</td> </tr> <tr><td>2006</td> <td>4</td> </tr> </tbody> </table> <p>2016 has the second-lowest cumulative snowfall, behind 1953, and is tied with 2014 for the second-lowest snow depth; 1953, 1954 and 1962 all had only 1&nbsp;inch of snow on the ground on November 18th.</p> <p>It also seems like recent years
appear in these tables more frequently than would be expected. Grouping by decade and averaging cumulative snowfall and snow depth yields the pattern in the chart below. The error bars (not shown) are fairly large, so the differences between decades aren’t likely to be statistically significant, but there is a pattern of lower snowfall amounts in recent decades.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2016/11/decadal_averages.pdf"><img alt="Decadal average cumulative snowfall and snow depth" class="img-responsive" src="//media.swingleydev.com/img/blog/2016/11/decadal_averages.svgz" /></a> </div> <p>Now let’s see what happened in those years with low snowfall and snow depth values in mid-November, starting with cumulative snowfall. The following plot (and the subsequent snow depth plot) shows the data for the low-value years (and one very high snowfall year—1990), with each year’s data as a separate line. The dark cyan line through the middle of each plot is a smoothed fit through the values for all years; a sort of “average” snowfall and snow depth curve.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2016/11/cumulative_snowfall.pdf"><img alt="Cumulative snowfall, years with low snow on November 18" class="img-responsive" src="//media.swingleydev.com/img/blog/2016/11/cumulative_snowfall.svgz" /></a> </div> <p>In all four mid-November low-snowfall years, the cumulative snowfall values remain below average throughout the winter, but snow did continue to fall as the season went on. Even the lowest winter year here, 2006–2007, still ended the winter with 15 inches of snow on the ground.</p> <p>The following plot shows snow depth for the four years with the lowest snow depth on November 18th.
The data is formatted the same as in the previous plot, except that we’ve jittered the values slightly to make the plot easier to read.</p> <div class="figure"> <a class="reference external image-reference" href="//media.swingleydev.com/img/blog/2016/11/snow_depth.pdf"><img alt="Snow depth, years with low snow on November 18" class="img-responsive" src="//media.swingleydev.com/img/blog/2016/11/snow_depth.svgz" /></a> </div> <p>The pattern here is similar, but the snow depths get much closer to the average values. Snow depths for all four low-snow years remain low throughout November, but start rising in December, dramatically so in 1954 and 2014.</p> <p>One of the highest snowfall years between 1950 and 2016 was 1990–1991 (shown on both plots). An impressive 32.8&nbsp;inches of snow fell in eight days between December&nbsp;21st and December&nbsp;28th, accounting for the sharp increase in cumulative snowfall and snow depth shown on both plots. There are five years in the record where the cumulative total for the entire winter was lower than these eight days in 1990.</p> </div> <div class="section" id="conclusion"> <h1>Conclusion</h1> <p>Despite the lack of snow on the ground to this point in the year, the record shows that we are still likely to get enough snow to fill in the trails.
We may need to wait until mid to late December, but it’s even possible we’ll eventually reach the long term average depth before spring.</p> </div> <div class="section" id="appendix"> <h1>Appendix</h1> <p>Here’s the R code used to generate the statistics, tables and plots from this post:</p> <div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>tidyverse<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>lubridate<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>scales<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>knitr<span class="p">)</span> noaa <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host<span class="o">=</span><span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname<span class="o">=</span><span class="s">&quot;noaa&quot;</span><span class="p">)</span> snow <span class="o">&lt;-</span> tbl<span class="p">(</span>noaa<span class="p">,</span> build_sql<span class="p">(</span> <span class="s">&quot;WITH wdoy_data AS (</span> <span class="s"> SELECT dte, dte - interval &#39;120 days&#39; as wdte,</span> <span class="s"> tmin_c, tmax_c, (tmin_c+tmax_c)/2.0 AS tavg_c,</span> <span class="s"> prcp_mm, snow_mm, snwd_mm</span> <span class="s"> FROM ghcnd_pivot</span> <span class="s"> WHERE station_name = &#39;FAIRBANKS INTL AP&#39;</span> <span class="s"> AND dte &gt; &#39;1950-09-01&#39;)</span> <span class="s"> SELECT dte, date_part(&#39;year&#39;, wdte) AS wyear, date_part(&#39;doy&#39;, wdte) AS wdoy,</span> <span class="s"> to_char(dte, &#39;Mon DD&#39;) AS mmdd,</span> <span class="s"> tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm</span> <span class="s"> FROM wdoy_data&quot;</span><span class="p">))</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>wyear<span class="o">=</span><span class="kp">as.integer</span><span class="p">(</span>wyear<span class="p">),</span> 
wdoy<span class="o">=</span><span class="kp">as.integer</span><span class="p">(</span>wdoy<span class="p">),</span> snwd_mm<span class="o">=</span><span class="kp">as.integer</span><span class="p">(</span>snwd_mm<span class="p">))</span> <span class="o">%&gt;%</span> select<span class="p">(</span>dte<span class="p">,</span> wyear<span class="p">,</span> wdoy<span class="p">,</span> mmdd<span class="p">,</span> tmin_c<span class="p">,</span> tmax_c<span class="p">,</span> tavg_c<span class="p">,</span> prcp_mm<span class="p">,</span> snow_mm<span class="p">,</span> snwd_mm<span class="p">)</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> write_csv<span class="p">(</span>snow<span class="p">,</span> <span class="s">&quot;pafa_data_with_wyear_post_1950.csv&quot;</span><span class="p">)</span> <span class="kp">save</span><span class="p">(</span>snow<span class="p">,</span> file<span class="o">=</span><span class="s">&quot;pafa_data_with_wyear_post_1950.rdata&quot;</span><span class="p">)</span> cum_snow <span class="o">&lt;-</span> snow <span class="o">%&gt;%</span> mutate<span class="p">(</span>snow_na<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span><span class="kp">is.na</span><span class="p">(</span>snow_mm<span class="p">),</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">),</span> snow_mm<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span><span class="kp">is.na</span><span class="p">(</span>snow_mm<span class="p">),</span><span class="m">0</span><span class="p">,</span>snow_mm<span class="p">))</span> <span class="o">%&gt;%</span> group_by<span class="p">(</span>wyear<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>snow_mm_cum<span class="o">=</span><span class="kp">cumsum</span><span class="p">(</span>snow_mm<span class="p">),</span> snow_na<span class="o">=</span><span class="kp">cumsum</span><span 
class="p">(</span>snow_na<span class="p">))</span> <span class="o">%&gt;%</span> ungroup<span class="p">()</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>snow_in_cum<span class="o">=</span><span class="kp">round</span><span class="p">(</span>snow_mm_cum<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> snwd_in<span class="o">=</span><span class="kp">round</span><span class="p">(</span>snwd_mm<span class="o">/</span><span class="m">25.4</span><span class="p">,</span> <span class="m">0</span><span class="p">))</span> nov_18_snow <span class="o">&lt;-</span> cum_snow <span class="o">%&gt;%</span> filter<span class="p">(</span>mmdd<span class="o">==</span><span class="s">&#39;Nov 18&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>wyear<span class="p">,</span> snow_in_cum<span class="p">,</span> snwd_in<span class="p">)</span> <span class="o">%&gt;%</span> arrange<span class="p">(</span>snow_in_cum<span class="p">)</span> decadal_avg <span class="o">&lt;-</span> nov_18_snow <span class="o">%&gt;%</span> mutate<span class="p">(</span>decade<span class="o">=</span><span class="kp">as.integer</span><span class="p">(</span>wyear<span class="o">/</span><span class="m">10</span><span class="p">)</span><span class="o">*</span><span class="m">10</span><span class="p">)</span> <span class="o">%&gt;%</span> group_by<span class="p">(</span>decade<span class="p">)</span> <span class="o">%&gt;%</span> summarize<span class="p">(</span><span class="sb">`Snow depth`</span><span class="o">=</span><span class="kp">mean</span><span class="p">(</span>snwd_in<span class="p">),</span> snwd_sd<span class="o">=</span>sd<span class="p">(</span>snwd_in<span class="p">),</span> <span class="sb">`Cumulative Snowfall`</span><span class="o">=</span><span class="kp">mean</span><span class="p">(</span>snow_in_cum<span class="p">),</span> snow_cum_sd<span 
class="o">=</span>sd<span class="p">(</span>snow_in_cum<span class="p">))</span> decadal_averages <span class="o">&lt;-</span> ggplot<span class="p">(</span>decadal_avg <span class="o">%&gt;%</span> gather<span class="p">(</span>variable<span class="p">,</span> value<span class="p">,</span> <span class="o">-</span>decade<span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>variable <span class="o">%in%</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;Cumulative Snowfall&quot;</span><span class="p">,</span> <span class="s">&quot;Snow depth&quot;</span><span class="p">)),</span> aes<span class="p">(</span>x<span class="o">=</span><span class="kp">as.factor</span><span class="p">(</span>decade<span class="p">),</span> y<span class="o">=</span>value<span class="p">,</span> fill<span class="o">=</span>variable<span class="p">))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_bar<span class="p">(</span>stat<span class="o">=</span><span class="s">&quot;identity&quot;</span><span class="p">,</span> position<span class="o">=</span><span class="s">&quot;dodge&quot;</span><span class="p">)</span> <span class="o">+</span> scale_x_discrete<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Decade&quot;</span><span class="p">,</span> breaks<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1950</span><span class="p">,</span> <span class="m">1960</span><span class="p">,</span> <span class="m">1970</span><span class="p">,</span> <span class="m">1980</span><span class="p">,</span> <span class="m">1990</span><span class="p">,</span> <span class="m">2000</span><span class="p">,</span> <span class="m">2010</span><span class="p">))</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Inches&quot;</span><span class="p">,</span> breaks<span 
class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">10</span><span class="p">))</span> <span class="o">+</span> scale_fill_discrete<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Measurement&quot;</span><span class="p">)</span> <span class="kp">print</span><span class="p">(</span>decadal_averages<span class="p">)</span> date_x_scale <span class="o">&lt;-</span> cum_snow <span class="o">%&gt;%</span> filter<span class="p">(</span><span class="kp">grepl</span><span class="p">(</span><span class="s">&#39; (01|15)&#39;</span><span class="p">,</span> mmdd<span class="p">),</span> wyear<span class="o">==</span><span class="s">&#39;1994&#39;</span><span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>wdoy<span class="p">,</span> mmdd<span class="p">)</span> cumulative_snowfall <span class="o">&lt;-</span> ggplot<span class="p">(</span>cum_snow <span class="o">%&gt;%</span> filter<span class="p">(</span>wyear <span class="o">%in%</span> <span class="kt">c</span><span class="p">(</span><span class="m">1953</span><span class="p">,</span> <span class="m">1954</span><span class="p">,</span> <span class="m">2014</span><span class="p">,</span> <span class="m">2006</span><span class="p">,</span> <span class="m">1990</span><span class="p">),</span> wdoy<span class="o">&gt;</span><span class="m">183</span><span class="p">,</span> wdoy<span class="o">&lt;</span><span class="m">320</span><span class="p">),</span> aes<span class="p">(</span>x<span class="o">=</span>wdoy<span class="p">,</span> y<span class="o">=</span>snow_in_cum<span class="p">,</span> colour<span class="o">=</span><span class="kp">as.factor</span><span class="p">(</span>wyear<span class="p">)))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_smooth<span class="p">(</span>data<span class="o">=</span>cum_snow <span class="o">%&gt;%</span> filter<span 
class="p">(</span>wdoy<span class="o">&gt;</span><span class="m">183</span><span class="p">,</span> wdoy<span class="o">&lt;</span><span class="m">320</span><span class="p">),</span> aes<span class="p">(</span>x<span class="o">=</span>wdoy<span class="p">,</span> y<span class="o">=</span>snow_in_cum<span class="p">),</span> size<span class="o">=</span><span class="m">0.5</span><span class="p">,</span> colour<span class="o">=</span><span class="s">&quot;darkcyan&quot;</span><span class="p">,</span> inherit.aes<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> se<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span> <span class="o">+</span> geom_line<span class="p">(</span>position<span class="o">=</span><span class="s">&quot;jitter&quot;</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;&quot;</span><span class="p">,</span> breaks<span class="o">=</span>date_x_scale<span class="o">$</span>wdoy<span class="p">,</span> labels<span class="o">=</span>date_x_scale<span class="o">$</span>mmdd<span class="p">)</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Cumulative snowfall (in)&quot;</span><span class="p">,</span> breaks<span class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">10</span><span class="p">))</span> <span class="o">+</span> scale_color_discrete<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Winter year&quot;</span><span class="p">)</span> <span class="kp">print</span><span class="p">(</span>cumulative_snowfall<span class="p">)</span> snow_depth <span class="o">&lt;-</span> ggplot<span class="p">(</span>cum_snow <span class="o">%&gt;%</span> filter<span class="p">(</span>wyear <span class="o">%in%</span> <span class="kt">c</span><span class="p">(</span><span 
class="m">1953</span><span class="p">,</span> <span class="m">1954</span><span class="p">,</span> <span class="m">1962</span><span class="p">,</span> <span class="m">2014</span><span class="p">,</span> <span class="m">1990</span><span class="p">),</span> wdoy<span class="o">&gt;</span><span class="m">183</span><span class="p">,</span> wdoy<span class="o">&lt;</span><span class="m">320</span><span class="p">),</span> aes<span class="p">(</span>x<span class="o">=</span>wdoy<span class="p">,</span> y<span class="o">=</span>snwd_in<span class="p">,</span> colour<span class="o">=</span><span class="kp">as.factor</span><span class="p">(</span>wyear<span class="p">)))</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> geom_smooth<span class="p">(</span>data<span class="o">=</span>cum_snow <span class="o">%&gt;%</span> filter<span class="p">(</span>wdoy<span class="o">&gt;</span><span class="m">183</span><span class="p">,</span> wdoy<span class="o">&lt;</span><span class="m">320</span><span class="p">),</span> aes<span class="p">(</span>x<span class="o">=</span>wdoy<span class="p">,</span> y<span class="o">=</span>snwd_in<span class="p">),</span> size<span class="o">=</span><span class="m">0.5</span><span class="p">,</span> colour<span class="o">=</span><span class="s">&quot;darkcyan&quot;</span><span class="p">,</span> inherit.aes<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> se<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span> <span class="o">+</span> geom_line<span class="p">(</span>position<span class="o">=</span><span class="s">&quot;jitter&quot;</span><span class="p">)</span> <span class="o">+</span> scale_x_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;&quot;</span><span class="p">,</span> breaks<span class="o">=</span>date_x_scale<span class="o">$</span>wdoy<span class="p">,</span> labels<span class="o">=</span>date_x_scale<span 
class="o">$</span>mmdd<span class="p">)</span> <span class="o">+</span> scale_y_continuous<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Snow Depth (in)&quot;</span><span class="p">,</span> breaks<span class="o">=</span>pretty_breaks<span class="p">(</span>n<span class="o">=</span><span class="m">10</span><span class="p">))</span> <span class="o">+</span> scale_color_discrete<span class="p">(</span>name<span class="o">=</span><span class="s">&quot;Winter year&quot;</span><span class="p">)</span> <span class="kp">print</span><span class="p">(</span>snow_depth<span class="p">)</span> </pre></div> </div> </div> Sat, 19 Nov 2016 15:50:20 -0900 http://swingleydev.com/blog/p/2001/ snow depth snowfall weather climate R Fairbanks Race Pace Prediction http://swingleydev.com/blog/p/2000/ <div class="document"> <div class="figure align-right"> <a class="reference external image-reference" href="//swingleydev.com/photolog/p/876/"><img alt="Equinox Marathon Relay leg 2, 2016" src="//media.swingleydev.com/img/blog/2016/10/equinox_relay_leg_2_2016.jpg" style="width: 300px; height: 169px;" /></a> <p class="caption">Equinox Marathon Relay leg 2, 2016</p> </div> <div class="section" id="introduction"> <h1>Introduction</h1> <p>A couple years ago I compared racing data between two races (<a class="reference external" href="/blog/p/1967/">Gold Discovery and Equinox</a>, <a class="reference external" href="/blog/p/1968/">Santa Claus and Equinox</a>) in the same season for all runners that ran in both events. 
The result was an estimate of how fast I might run the Equinox Marathon based on my times for Gold Discovery and the Santa Claus Half Marathon.</p> <p>Several years have passed, I've run more races, and I've collected racing data for all the major Fairbanks races, so I wanted to run the same analysis for all combinations of races.</p> </div> <div class="section" id="data"> <h1>Data</h1> <p>The data comes from a database I’ve built of race times for all competitors, mostly coming from the results available from Chronotrack, but including some race results from SportAlaska.</p> <p>We started by loading the required R packages and reading in all the racing data, a small subset of which looks like this.</p> <table border="1" class="tosf docutils"> <colgroup> <col width="23%" /> <col width="9%" /> <col width="24%" /> <col width="18%" /> <col width="17%" /> <col width="8%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">race</th> <th class="head">year</th> <th class="head">name</th> <th class="head">finish_time</th> <th class="head">birth_year</th> <th class="head">sex</th> </tr> </thead> <tbody valign="top"> <tr><td>Beat Beethoven</td> <td>2015</td> <td>thomas mcclelland</td> <td>00:21:49</td> <td>1995</td> <td>M</td> </tr> <tr><td>Equinox Marathon</td> <td>2015</td> <td>jennifer paniati</td> <td>06:24:14</td> <td>1989</td> <td>F</td> </tr> <tr><td>Equinox Marathon</td> <td>2014</td> <td>kris starkey</td> <td>06:35:55</td> <td>1972</td> <td>F</td> </tr> <tr><td>Midnight Sun Run</td> <td>2014</td> <td>kathy toohey</td> <td>01:10:42</td> <td>1960</td> <td>F</td> </tr> <tr><td>Midnight Sun Run</td> <td>2016</td> <td>steven rast</td> <td>01:59:41</td> <td>1960</td> <td>M</td> </tr> <tr><td>Equinox Marathon</td> <td>2013</td> <td>elizabeth smith</td> <td>09:18:53</td> <td>1987</td> <td>F</td> </tr> <tr><td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr> </tbody> </table> <p>Next we loaded in the names and distances of the races and
combined this with the individual racing data. The data from Chronotrack doesn’t include the mileage, which we need in order to calculate pace (minutes per mile).</p> <p>My database doesn’t have complete information about all the racers that competed, and in some cases the information for a runner in one race conflicts with the information for the same runner in a different race. To resolve this, we generated a list of runners grouped by name and threw out racers whose names matched but whose gender was reported differently from one race to the next. Please understand we’re not doing this to exclude those who have changed their gender identity along the way, but to eliminate possible bias from data entry mistakes.</p> <p>Finally, we combined the racers with the individual racing data, substituting our corrected runner information for what appeared in the individual race’s data. We also calculated minutes per mile (<tt class="docutils literal">pace</tt>) and the age of the runner during the year of the race (<tt class="docutils literal">age</tt>). Because we assign each runner the minimum birth year reported across all their races, our age variable won’t change during the running season, which is closer to the way age categories are calculated in Europe. We then removed results where the pace was greater than 20 minutes per mile for races longer than ten miles, or greater than 16 minutes per mile for races shorter than ten miles.
These are likely to be outliers, or competitors not running the race.</p> <table border="1" class="tosf docutils"> <colgroup> <col width="14%" /> <col width="13%" /> <col width="10%" /> <col width="23%" /> <col width="8%" /> <col width="9%" /> <col width="9%" /> <col width="7%" /> <col width="7%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">name</th> <th class="head">birth_year</th> <th class="head">gender</th> <th class="head">race_str</th> <th class="head">year</th> <th class="head">miles</th> <th class="head">minutes</th> <th class="head">pace</th> <th class="head">age</th> </tr> </thead> <tbody valign="top"> <tr><td>aaron austin</td> <td>1983</td> <td>M</td> <td>midnight_sun_run</td> <td>2014</td> <td>6.2</td> <td>50.60</td> <td>8.16</td> <td>31</td> </tr> <tr><td>aaron bravo</td> <td>1999</td> <td>M</td> <td>midnight_sun_run</td> <td>2013</td> <td>6.2</td> <td>45.26</td> <td>7.30</td> <td>14</td> </tr> <tr><td>aaron bravo</td> <td>1999</td> <td>M</td> <td>midnight_sun_run</td> <td>2014</td> <td>6.2</td> <td>40.08</td> <td>6.46</td> <td>15</td> </tr> <tr><td>aaron bravo</td> <td>1999</td> <td>M</td> <td>midnight_sun_run</td> <td>2015</td> <td>6.2</td> <td>36.65</td> <td>5.91</td> <td>16</td> </tr> <tr><td>aaron bravo</td> <td>1999</td> <td>M</td> <td>midnight_sun_run</td> <td>2016</td> <td>6.2</td> <td>36.31</td> <td>5.85</td> <td>17</td> </tr> <tr><td>aaron bravo</td> <td>1999</td> <td>M</td> <td>spruce_tree_classic</td> <td>2014</td> <td>6.0</td> <td>42.17</td> <td>7.03</td> <td>15</td> </tr> <tr><td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr> </tbody> </table> <p>We combined all available results for each runner in all years they participated such that the resulting rows are grouped by runner and year and columns are the races themselves. 
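</p> <p>As a sketch of this reshaping step (the post does it in R with tidyr’s <tt class="docutils literal">spread()</tt>, shown in the appendix; the rows and paces below are illustrative, not real results), the same pivot in Python looks like this:</p>

```python
# Pivot long (runner, year, race, pace) records into a wide table keyed by
# runner and year, with one entry per race. This is a sketch of what
# tidyr's spread() does; the example rows are made up for illustration.
rows = [
    ("aaron schooley", 2016, "beat_beethoven", 8.19),
    ("aaron schooley", 2016, "chena_river_run", 8.15),
    ("abby fett", 2014, "beat_beethoven", 10.68),
]

wide = {}
for name, year, race, pace in rows:
    # races a runner skipped that year are simply absent (NA in R)
    wide.setdefault((name, year), {})[race] = pace

print(wide[("aaron schooley", 2016)])
# {'beat_beethoven': 8.19, 'chena_river_run': 8.15}
```

<p>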
The values in each cell represent the pace for the runner × year × race combination.</p> <p>For example, here’s the first six rows for runners that completed Beat Beethoven and the Chena River Run in the years I have data. I also included the column for the Midnight Sun Run in the table, but the actual data has a column for all the major Fairbanks races. You’ll see that two of the six runners listed ran BB and CRR but didn’t run MSR in that year.</p> <table border="1" class="tosf docutils"> <colgroup> <col width="17%" /> <col width="10%" /> <col width="7%" /> <col width="8%" /> <col width="18%" /> <col width="20%" /> <col width="21%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">name</th> <th class="head">gender</th> <th class="head">age</th> <th class="head">year</th> <th class="head">beat_beethoven</th> <th class="head">chena_river_run</th> <th class="head">midnight_sun_run</th> </tr> </thead> <tbody valign="top"> <tr><td>aaron schooley</td> <td>M</td> <td>36</td> <td>2016</td> <td>8.19</td> <td>8.15</td> <td>8.88</td> </tr> <tr><td>abby fett</td> <td>F</td> <td>33</td> <td>2014</td> <td>10.68</td> <td>10.34</td> <td>11.59</td> </tr> <tr><td>abby fett</td> <td>F</td> <td>35</td> <td>2016</td> <td>11.97</td> <td>12.58</td> <td>NA</td> </tr> <tr><td>abigail haas</td> <td>F</td> <td>11</td> <td>2015</td> <td>9.34</td> <td>8.29</td> <td>NA</td> </tr> <tr><td>abigail haas</td> <td>F</td> <td>12</td> <td>2016</td> <td>8.48</td> <td>7.90</td> <td>11.40</td> </tr> <tr><td>aimee hughes</td> <td>F</td> <td>43</td> <td>2015</td> <td>11.32</td> <td>9.50</td> <td>10.69</td> </tr> <tr><td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr> </tbody> </table> <p>With this data, we build a whole series of linear models, one for each race combination. We created a series of formula strings and objects for all the combinations, then executed them using <tt class="docutils literal">map()</tt>. 
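</p> <p>The formula-generation step can be sketched like this (Python for illustration; the appendix builds the same strings with nested <tt class="docutils literal">lapply()</tt> calls in R, and only three of the nine races are shown here):</p>

```python
# Build one model formula string per ordered pair of races: the earlier
# race's pace (plus gender and age) predicts the later race's pace.
from itertools import combinations

main_races = ["beat_beethoven", "chena_river_run", "midnight_sun_run"]

formulas = [f"{later} ~ {earlier} + gender + age"
            for earlier, later in combinations(main_races, 2)]

for f in formulas:
    print(f)
# chena_river_run ~ beat_beethoven + gender + age
# midnight_sun_run ~ beat_beethoven + gender + age
# midnight_sun_run ~ chena_river_run + gender + age
```

<p>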
We combined the start and predicted race names with the linear models, and used <tt class="docutils literal">glance()</tt> and <tt class="docutils literal">tidy()</tt> from the <tt class="docutils literal">broom</tt> package to turn the models into statistics and coefficients.</p> <p>All of the models between races were highly significant, but many of them contain coefficients that aren’t significantly different from zero. That means that including that term (age, gender, or first race pace) isn’t adding anything useful to the model. We used the significance of each term to reduce our models so they only contained coefficients that were significant, and regenerated the statistics and coefficients for these reduced models.</p> <p>The full R code appears at the bottom of this post.</p> </div> <div class="section" id="results"> <h1>Results</h1> <p>Here are the statistics from the ten best-performing models (based on <em>R²</em>).</p> <table border="1" class="tosf docutils"> <colgroup> <col width="36%" /> <col width="36%" /> <col width="6%" /> <col width="8%" /> <col width="13%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">start_race</th> <th class="head">predicted_race</th> <th class="head">n</th> <th class="head"><em>R²</em></th> <th class="head"><em>p</em>-value</th> </tr> </thead> <tbody valign="top"> <tr><td>run_of_the_valkyries</td> <td>golden_heart_trail_run</td> <td>40</td> <td>0.956</td> <td>0</td> </tr> <tr><td>golden_heart_trail_run</td> <td>equinox_marathon</td> <td>36</td> <td>0.908</td> <td>0</td> </tr> <tr><td>santa_claus_half_marathon</td> <td>golden_heart_trail_run</td> <td>34</td> <td>0.896</td> <td>0</td> </tr> <tr><td>midnight_sun_run</td> <td>gold_discovery_run</td> <td>139</td> <td>0.887</td> <td>0</td> </tr> <tr><td>beat_beethoven</td> <td>golden_heart_trail_run</td> <td>32</td> <td>0.886</td> <td>0</td> </tr> <tr><td>run_of_the_valkyries</td> <td>gold_discovery_run</td> <td>44</td> <td>0.877</td> <td>0</td> </tr>
<tr><td>midnight_sun_run</td> <td>golden_heart_trail_run</td> <td>52</td> <td>0.877</td> <td>0</td> </tr> <tr><td>gold_discovery_run</td> <td>santa_claus_half_marathon</td> <td>111</td> <td>0.876</td> <td>0</td> </tr> <tr><td>chena_river_run</td> <td>golden_heart_trail_run</td> <td>44</td> <td>0.873</td> <td>0</td> </tr> <tr><td>run_of_the_valkyries</td> <td>santa_claus_half_marathon</td> <td>91</td> <td>0.851</td> <td>0</td> </tr> </tbody> </table> <p>It’s interesting how many times the Golden Heart Trail Run appears on this list, since that run is something of an outlier in the Usibelli running series: it’s the only race entirely on trails. Maybe it’s because its distance (5K) is comparable to a lot of the earlier races in the season, but because it’s on trails it matches well with the later races that are at least partially on trails, like Gold Discovery or Equinox.</p> <p>Here are the ten worst models.</p> <table border="1" class="tosf docutils"> <colgroup> <col width="32%" /> <col width="38%" /> <col width="6%" /> <col width="9%" /> <col width="14%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">start_race</th> <th class="head">predicted_race</th> <th class="head">n</th> <th class="head"><em>R²</em></th> <th class="head"><em>p</em>-value</th> </tr> </thead> <tbody valign="top"> <tr><td>midnight_sun_run</td> <td>equinox_marathon</td> <td>431</td> <td>0.525</td> <td>0</td> </tr> <tr><td>beat_beethoven</td> <td>hoodoo_half_marathon</td> <td>87</td> <td>0.533</td> <td>0</td> </tr> <tr><td>beat_beethoven</td> <td>midnight_sun_run</td> <td>818</td> <td>0.570</td> <td>0</td> </tr> <tr><td>chena_river_run</td> <td>equinox_marathon</td> <td>196</td> <td>0.572</td> <td>0</td> </tr> <tr><td>equinox_marathon</td> <td>hoodoo_half_marathon</td> <td>90</td> <td>0.584</td> <td>0</td> </tr> <tr><td>beat_beethoven</td> <td>equinox_marathon</td> <td>265</td> <td>0.585</td> <td>0</td> </tr> <tr><td>gold_discovery_run</td> <td>hoodoo_half_marathon</td>
<td>41</td> <td>0.599</td> <td>0</td> </tr> <tr><td>beat_beethoven</td> <td>santa_claus_half_marathon</td> <td>163</td> <td>0.612</td> <td>0</td> </tr> <tr><td>run_of_the_valkyries</td> <td>equinox_marathon</td> <td>125</td> <td>0.642</td> <td>0</td> </tr> <tr><td>midnight_sun_run</td> <td>hoodoo_half_marathon</td> <td>118</td> <td>0.657</td> <td>0</td> </tr> </tbody> </table> <p>Most of these models pair shorter races like Beat Beethoven or the Chena River Run with longer races like Equinox or one of the half marathons. Even so, each model explains more than half the variation in the data, which isn’t terrible.</p> </div> <div class="section" id="application"> <h1>Application</h1> <p>Now that we have all our models and their coefficients, we can use these models to make predictions of future performance. I’ve written an online calculator based on the reduced models that lets you predict your race results as you go through the running season. The calculator is here: <a class="reference external" href="https://swingleydev.com/misc/fairbanks_race_conversion.html">Fairbanks Running Race Converter</a>.</p> <p>For example, I ran a 7:41 pace for Run of the Valkyries this year. Entering that, plus my age and gender, into the converter predicts an 8:57 pace for the first running of the HooDoo Half Marathon. The <em>R²</em> for this model was a respectable 0.71, even though only 23 runners ran both races this year (including me). My actual pace for HooDoo was 8:18, so I came in quite a bit faster than predicted. No wonder my knee and hip hurt after the race!
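</p> <p>Under the hood, the converter simply evaluates the reduced linear model for the chosen race pair. A minimal sketch of that arithmetic (with invented coefficients for illustration, not the fitted ones):</p>

```python
# Evaluate a reduced linear model: predict the target race's pace
# (minutes per mile) from the starting race's pace, plus whichever age
# and gender terms survived the significance filter. All coefficient
# values here are made up; the real ones come from the fitted models.
def predict_pace(start_pace, age, is_male,
                 intercept=1.20, b_pace=0.95, b_age=0.01, b_male=-0.05):
    return (intercept
            + b_pace * start_pace
            + b_age * age
            + b_male * (1 if is_male else 0))

# e.g. a 7:41 starting pace is 7 + 41/60, about 7.68 minutes per mile
predicted = predict_pace(start_pace=7.68, age=40, is_male=True)
```

<p>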
Using my time from the Golden Heart Trail Run, the converter predicts a HooDoo Half pace of 8:16.2, less than a minute off my 1:48:11 finish.</p> </div> <div class="section" id="appendix-r-code"> <h1>Appendix: R code</h1> <div class="highlight"><pre><span></span><span class="kn">library</span><span class="p">(</span>tidyverse<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>lubridate<span class="p">)</span> <span class="kn">library</span><span class="p">(</span>broom<span class="p">)</span> races_db <span class="o">&lt;-</span> src_postgres<span class="p">(</span>host<span class="o">=</span><span class="s">&quot;localhost&quot;</span><span class="p">,</span> dbname<span class="o">=</span><span class="s">&quot;races&quot;</span><span class="p">)</span> combined_races <span class="o">&lt;-</span> tbl<span class="p">(</span>races_db<span class="p">,</span> build_sql<span class="p">(</span> <span class="s">&quot;SELECT race, year, lower(name) AS name, finish_time,</span> <span class="s"> year - age AS birth_year, sex</span> <span class="s"> FROM chronotrack</span> <span class="s"> UNION</span> <span class="s"> SELECT race, year, lower(name) AS name, finish_time,</span> <span class="s"> birth_year,</span> <span class="s"> CASE WHEN age_class ~ &#39;M&#39; THEN &#39;M&#39; ELSE &#39;F&#39; END AS sex</span> <span class="s"> FROM sportalaska</span> <span class="s"> UNION</span> <span class="s"> SELECT race, year, lower(name) AS name, finish_time,</span> <span class="s"> NULL AS birth_year, NULL AS sex</span> <span class="s"> FROM other&quot;</span><span class="p">))</span> races <span class="o">&lt;-</span> tbl<span class="p">(</span>races_db<span class="p">,</span> build_sql<span class="p">(</span> <span class="s">&quot;SELECT race,</span> <span class="s"> lower(regexp_replace(race, &#39;[ ’]&#39;, &#39;_&#39;, &#39;g&#39;)) AS race_str,</span> <span class="s"> date_part(&#39;year&#39;, date) AS year,</span> <span class="s"> miles</span> 
<span class="s"> FROM races&quot;</span><span class="p">))</span> racing_data <span class="o">&lt;-</span> combined_races <span class="o">%&gt;%</span> inner_join<span class="p">(</span>races<span class="p">)</span> <span class="o">%&gt;%</span> filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>finish_time<span class="p">))</span> racers <span class="o">&lt;-</span> racing_data <span class="o">%&gt;%</span> group_by<span class="p">(</span>name<span class="p">)</span> <span class="o">%&gt;%</span> summarize<span class="p">(</span>races<span class="o">=</span>n<span class="p">(),</span> birth_year<span class="o">=</span><span class="kp">min</span><span class="p">(</span>birth_year<span class="p">),</span> gender_filter<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span><span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span>sex<span class="o">==</span><span class="s">&#39;M&#39;</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">))</span><span class="o">==</span> <span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span>sex<span class="o">==</span><span class="s">&#39;F&#39;</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">)),</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="kc">TRUE</span><span class="p">),</span> gender<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span><span class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span>sex<span class="o">==</span><span class="s">&#39;M&#39;</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">))</span><span class="o">&gt;</span> <span 
class="kp">sum</span><span class="p">(</span><span class="kp">ifelse</span><span class="p">(</span>sex<span class="o">==</span><span class="s">&#39;F&#39;</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">)),</span> <span class="s">&#39;M&#39;</span><span class="p">,</span> <span class="s">&#39;F&#39;</span><span class="p">))</span> <span class="o">%&gt;%</span> ungroup<span class="p">()</span> <span class="o">%&gt;%</span> filter<span class="p">(</span>gender_filter<span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span><span class="o">-</span>gender_filter<span class="p">)</span> racing_data_filled <span class="o">&lt;-</span> racing_data <span class="o">%&gt;%</span> inner_join<span class="p">(</span>racers<span class="p">,</span> by<span class="o">=</span><span class="s">&quot;name&quot;</span><span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>birth_year<span class="o">=</span>birth_year.y<span class="p">)</span> <span class="o">%&gt;%</span> select<span class="p">(</span>name<span class="p">,</span> birth_year<span class="p">,</span> gender<span class="p">,</span> race_str<span class="p">,</span> year<span class="p">,</span> miles<span class="p">,</span> finish_time<span class="p">)</span> <span class="o">%&gt;%</span> group_by<span class="p">(</span>name<span class="p">,</span> race_str<span class="p">,</span> year<span class="p">)</span> <span class="o">%&gt;%</span> mutate<span class="p">(</span>n<span class="o">=</span>n<span class="p">())</span> <span class="o">%&gt;%</span> filter<span class="p">(</span><span class="o">!</span><span class="kp">is.na</span><span class="p">(</span>birth_year<span class="p">),</span> n<span class="o">==</span><span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> ungroup<span class="p">()</span> <span class="o">%&gt;%</span> collect<span class="p">()</span> <span 
class="o">%&gt;%</span> mutate<span class="p">(</span>fixed<span class="o">=</span><span class="kp">ifelse</span><span class="p">(</span><span class="kp">grepl</span><span class="p">(</span><span class="s">&#39;[0-9]+:[0-9]+:[0-9.]+&#39;</span><span class="p">,</span> finish_time<span class="p">),</span> finish_time<span class="p">,</span> <span class="kp">paste0</span><span class="p">(</span><span class="s">&#39;00:&#39;</span><span class="p">,</span> finish_time<span class="p">)),</span> minutes<span class="o">=</span><span class="kp">as.numeric</span><span class="p">(</span>seconds<span class="p">(</span>hms<span class="p">(</span>fixed<span class="p">)))</span><span class="o">/</span><span class="m">60.0</span><span class="p">,</span> pace<span class="o">=</span>minutes<span class="o">/</span>miles<span class="p">,</span> age<span class="o">=</span>year<span class="o">-</span>birth_year<span class="p">,</span> age_class<span class="o">=</span><span class="kp">as.integer</span><span class="p">(</span>age<span class="o">/</span><span class="m">10</span><span class="p">)</span><span class="o">*</span><span class="m">10</span><span class="p">,</span> group<span class="o">=</span><span class="kp">paste0</span><span class="p">(</span>gender<span class="p">,</span> age_class<span class="p">),</span> gender<span class="o">=</span><span class="kp">as.factor</span><span class="p">(</span>gender<span class="p">))</span> <span class="o">%&gt;%</span> filter<span class="p">((</span>miles<span class="o">&lt;</span><span class="m">10</span> <span class="o">&amp;</span> pace<span class="o">&lt;</span><span class="m">16</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span>miles<span class="o">&gt;=</span><span class="m">10</span> <span class="o">&amp;</span> pace<span class="o">&lt;</span><span class="m">20</span><span class="p">))</span> <span class="o">%&gt;%</span> select<span class="p">(</span><span class="o">-</span>fixed<span class="p">,</span> 
<span class="o">-</span>finish_time<span class="p">,</span> <span class="o">-</span>n<span class="p">)</span> speeds_combined <span class="o">&lt;-</span> racing_data_filled <span class="o">%&gt;%</span> select<span class="p">(</span>name<span class="p">,</span> gender<span class="p">,</span> age<span class="p">,</span> age_class<span class="p">,</span> group<span class="p">,</span> race_str<span class="p">,</span> year<span class="p">,</span> pace<span class="p">)</span> <span class="o">%&gt;%</span> spread<span class="p">(</span>race_str<span class="p">,</span> pace<span class="p">)</span> main_races <span class="o">&lt;-</span> <span class="kt">c</span><span class="p">(</span><span class="s">&#39;beat_beethoven&#39;</span><span class="p">,</span> <span class="s">&#39;chena_river_run&#39;</span><span class="p">,</span> <span class="s">&#39;midnight_sun_run&#39;</span><span class="p">,</span> <span class="s">&#39;run_of_the_valkyries&#39;</span><span class="p">,</span> <span class="s">&#39;gold_discovery_run&#39;</span><span class="p">,</span> <span class="s">&#39;santa_claus_half_marathon&#39;</span><span class="p">,</span> <span class="s">&#39;golden_heart_trail_run&#39;</span><span class="p">,</span> <span class="s">&#39;equinox_marathon&#39;</span><span class="p">,</span> <span class="s">&#39;hoodoo_half_marathon&#39;</span><span class="p">)</span> race_formula_str <span class="o">&lt;-</span> <span class="kp">lapply</span><span class="p">(</span><span class="kp">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="kp">length</span><span class="p">(</span>main_races<span class="p">)</span><span class="m">-1</span><span class="p">),</span> <span class="kr">function</span><span class="p">(</span>i<span class="p">)</span> <span class="kp">lapply</span><span class="p">(</span><span class="kp">seq</span><span class="p">(</span>i<span class="m">+1</span><span class="p">,</span> <span class="kp">length</span><span 
class="p">(</span>main_races)), function(j)
        paste(main_races[[j]], '~', main_races[[i]], '+ gender', '+ age'))) %&gt;%
    unlist()

# Full models: predict each race's speed from each other race, plus gender and age
race_formulas &lt;- lapply(race_formula_str, function(i) as.formula(i)) %&gt;% unlist()
lm_models &lt;- map(race_formulas, ~ lm(.x, data=speeds_combined))

models &lt;- tibble(start_race=factor(gsub('.* ~ ([^ ]+).*', '\\1', race_formula_str),
                                   levels=main_races),
                 predicted_race=factor(gsub('([^ ]+).*', '\\1', race_formula_str),
                                       levels=main_races),
                 lm_models=lm_models) %&gt;%
    arrange(start_race, predicted_race)

model_stats &lt;- glance(models %&gt;% rowwise(), lm_models)
model_coefficients &lt;- tidy(models %&gt;% rowwise(), lm_models)

# Reduced models: refit using only the terms significant at p &lt; 0.05
reduced_formula_str &lt;- model_coefficients %&gt;%
    ungroup() %&gt;%
    filter(p.value &lt; 0.05, term != '(Intercept)') %&gt;%
    mutate(term=gsub('genderM', 'gender', term)) %&gt;%
    group_by(predicted_race, start_race) %&gt;%
    summarize(independent_vars=paste(term, collapse=" + ")) %&gt;%
    ungroup() %&gt;%
    transmute(reduced_formulas=paste(predicted_race, independent_vars, sep=' ~ '))
reduced_formula_str &lt;- reduced_formula_str$reduced_formulas

reduced_race_formulas &lt;- lapply(reduced_formula_str, function(i) as.formula(i)) %&gt;%
    unlist()
reduced_lm_models &lt;- map(reduced_race_formulas, ~ lm(.x, data=speeds_combined))

# Sample size from a fitted model: model df plus residual df
n_from_lm &lt;- function(model) {
    summary_object &lt;- summary(model)
    summary_object$df[1] + summary_object$df[2]
}

reduced_models &lt;- tibble(start_race=factor(gsub('.* ~ ([^ ]+).*', '\\1', reduced_formula_str),
                                           levels=main_races),
                         predicted_race=factor(gsub('([^ ]+).*', '\\1', reduced_formula_str),
                                               levels=main_races),
                         lm_models=reduced_lm_models) %&gt;%
    arrange(start_race, predicted_race) %&gt;%
    rowwise() %&gt;%
    mutate(n=n_from_lm(lm_models))

reduced_model_stats &lt;- glance(reduced_models %&gt;% rowwise(), lm_models)
reduced_model_coefficients &lt;- tidy(reduced_models %&gt;% rowwise(), lm_models) %&gt;%
    ungroup()

coefficients_and_stats &lt;- reduced_model_stats %&gt;%
    inner_join(reduced_model_coefficients,
               by=c("start_race", "predicted_race", "n")) %&gt;%
    select(start_race, predicted_race, n, r.squared, term, estimate)
write_csv(coefficients_and_stats, "coefficients.csv")

# Scatterplot of one race pair, colored by age, shaped by gender, saved as SVG
make_scatterplot &lt;- function(start_race, predicted_race) {
    age_limits &lt;- speeds_combined %&gt;%
        filter_(paste("!is.na(", start_race, ")"),
                paste("!is.na(", predicted_race, ")")) %&gt;%
        summarize(min=min(age), max=max(age)) %&gt;%
        unlist()

    q &lt;- ggplot(data=speeds_combined,
                aes_string(x=start_race, y=predicted_race)) +
        # plasma works better with a grey background
        # theme_bw() +
        geom_abline(slope=1, color="darkred", alpha=0.5) +
        geom_smooth(method="lm", se=FALSE) +
        geom_point(aes(shape=gender, color=age)) +
        scale_color_viridis(option="plasma", limits=age_limits) +
        scale_x_continuous(breaks=pretty_breaks(n=10)) +
        scale_y_continuous(breaks=pretty_breaks(n=6))

    svg_filename &lt;- paste0(paste(start_race, predicted_race, sep="-"), ".svg")
    height &lt;- 9
    width &lt;- 16
    resize &lt;- 0.75
    svg(svg_filename, height=height*resize, width=width*resize)
    print(q)
    dev.off()
}

# Generate a plot for every unique pair of races
lapply(seq(1, length(main_races)-1), function(i)
    lapply(seq(i+1, length(main_races)), function(j)
        make_scatterplot(main_races[[i]], main_races[[j]])))
</pre></div> </div> </div> Sat, 29 Oct 2016 21:14:02 -0800 http://swingleydev.com/blog/p/2000/ running R races statistics data science