New Zealand fatal traffic crashes
At a glance:
I explore half a million rows of disaggregated crash data for New Zealand, and along the way illustrate geo-spatial projections, maps, forecasting with ensembles of methods, a state space model for change over time, and a generalized linear model for understanding interactions in a three-way cross tab.Traffic fatalities in the news
Fatalities from traffic crashes have been in the news in New Zealand recently, for tragic reasons. For once, data beyond the short term have come into the debate. Sam Warburton has written two good articles on the topic, with good consideration of the data. He has also, via Twitter, made his compilation of NZTA and other data and his considerable workings available.
Traffic, and crashes in particular, are characterised by large volumes of high quality data. Sam Warburton’s workings can be supplemented with a range of data from the NZTA. I know little about the subject and thought it seemed a good one to explore. This is a long blog post but I still feel I’ve barely touched the surface. I nearly split this into a series of posts but decided I won’t get back to this for months at least, so best to get down what I fiddled around with straight away.
Long term
I always like to see as long term a picture as I can in a new area. In one of the tabs in Sam Warburton’s Excel book is a table with annual data back to 1950. Check this out:
Interesting!
Here’s the R code that draws that graph, plus what’s needed in terms of R packages for the rest of the post:
library(tidyverse)
library(mapproj)
library(proj4)
library(ggmap)
library(ggthemes) # for theme_map
library(stringr)
library(forcats)
library(viridis)
library(leaflet)
library(htmltools)
library(broom)
library(openxlsx)
library(forecastHybrid)
library(rstan)
#=================long term picture========================
longterm <- read.xlsx("../data/vkt2.1.xlsx", sheet = "safety (2)",
cols = c(1, 3), rows = 2:69)
ggplot(longterm, aes(x = Year, y = Deaths)) +
geom_line() +
ggtitle("Road deaths in New Zealand 1950 - 2016",
"Not adjusted for population or vehicle kilometres travelled") +
labs(caption = "Source: NZTA data compiled by Sam Warburton")
Spatial elements
There have been a few charts going around of crashes and fatalities over time aggregated by region, but I was curious to see a more granular picture. It turns out the NZTA publish a CSV of all half a million recorded crashes (fatal and non-fatal) back to 2000. The data is cut back a bit - it doesn’t have comments on what may have caused the crash for example, or the exact date - but it does have the exact NZTM coordinates of the crash location, which we can convert to latitude and longitude and use for maps.
So I started as always with the big picture - where have all the fatal crashes been since 2000?
Unsurprisingly, they are heavily concentrated in the population centres and along roads. What happens when we look at this view over time?
So, turns out a map isn’t a great way to see subtle trends. In fact, I doubt there’s an effective simple visualisation that would show changing spatial patterns here; it’s definitely a job for a sophisticated model first, followed by a visualisation of its results. Outside today’s scope.
A map is a good way to just get to understand what is in some data though. Here are another couple of variables the NZTA provide. First, whether the accidents happened on a public holiday:
… and combination of vehicles they involved:
Here’s the code for downloading the individual crash data, converting coordinates from NZTM to latitude and longitude, and building those maps. This is a nice example of the power of the ggplot2
universe. Once I’ve defined my basic map and its theme, redrawing it with new faceting variables is just one line of code in each case.
#============download data and tidy up a bit======================
# The individual crash data come from
# https://www.nzta.govt.nz/safety/safety-resources/road-safety-information-and-tools/disaggregated-crash-data/
# caution, largish file - 27 MB
download.file("https://www.nzta.govt.nz/assets/Safety/docs/disaggregated-crash-data.zip",
destfile = "disaggregated-crash-data.zip", mode = "wb")
unzip("disaggregated-crash-data.zip")
crash <- read.csv("disaggregated-crash-data.csv", check.names = FALSE)
dim(crash) # 586,189 rows
# I prefer lower case names, and underscores rather than dots:
names(crash) <- gsub(" ", "_", str_trim(tolower(names(crash))), fixed = TRUE)
# Convert the NZTM coordinates to latitude and longitude:
# see https://gis.stackexchange.com/questions/20389/converting-nzmg-or-nztm-to-latitude-longitude-for-use-with-r-map-library
p4str <- "+proj=tmerc +lat_0=0.0 +lon_0=173.0 +k=0.9996 +x_0=1600000.0 +y_0=10000000.0 +datum=WGS84 +units=m"
p <- proj4::project(crash[ , c("easting", "northing")], proj = p4str, inverse = TRUE)
crash$longitude <- p$x
crash$latitude <- p$y
#====================NZ maps========================
my_map_theme <- theme_map(base_family = "myfont") +
theme(strip.background = element_blank(),
strip.text = element_text(size = rel(1), face = "bold"),
plot.caption = element_text(colour = "grey50"))
m <- crash %>%
as_tibble() %>%
filter(fatal_count > 0) %>%
ggplot(aes(x = longitude,