Choropleth maps are an excellent way to visualise numbers of observations across regions, but when working with point data, calculating the number of points that fall inside each polygon of a shapefile can be difficult. Here we demonstrate how to extract and summarise the number of points in each polygon to create a choropleth map.
Choropleth maps are an excellent way to visualise differences in variables (e.g. number of species observed) across several geographic regions (e.g. countries, states, study areas).

However, creating a choropleth map of species observations requires us to summarise our points to a single statistic for each polygon of our shapefile. This conversion from points to polygons can sometimes be tricky!
Here, we show you how to extract and count the number of points inside each polygon of a shapefile to create a choropleth map of the total number of species observations in each suburb of the Australian Capital Territory (ACT).
First we will load the R packages that we need:
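Based on the functions used throughout this post, the packages loaded are likely the following (this list is inferred from the code below, not stated in the original):

```r
library(galah)       # download species observations from the ALA
library(ggplot2)     # plotting
library(dplyr)       # data wrangling: mutate(), filter()
library(tidyr)       # replace_na()
library(purrr)       # pmap()
library(sf)          # spatial objects and operations
library(rmapshaper)  # ms_simplify() to simplify polygons
library(here)        # project-relative file paths
```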
Next we will need a shapefile. You can find many shapefiles online from reputable sources. For this example, I’ve downloaded a shapefile of suburb boundaries in the city of Canberra, ACT from the ACT’s open-access map database.
Usually when you download a shapefile, it is compressed within a zip folder. Save this downloaded zip folder in a local folder inside your current R project. If you need to unzip your folder, you can do so with the following code:
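A minimal sketch of the unzipping step, using base R's `unzip()`. The file and folder names here are hypothetical; substitute your own:

```r
# unzip the downloaded shapefile into a folder inside the project
# (paths are hypothetical; substitute the names you used)
unzip(here::here("data", "act_suburbs.zip"),
      exdir = here::here("data", "act_suburbs"))
```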
Now we load this unzipped shapefile into R. To save space, we'll remove some complexity from our shapefile polygons with ms_simplify() from the rmapshaper package. The actsuburbs shapefile contains both suburb boundaries and "district" boundaries, which can encompass several suburbs. To avoid confusion, we will remove districts using filter(LOC_CLASS != "District"). We'll also use st_make_valid() to make sure any weird invalid geometries in our shapefile are made valid, and therefore plot correctly.
actsuburbs <- st_read(here("folder-name", "folder-name-2", "shapefilename.shp")) |>
  ms_simplify(keep = 0.1) |>
  st_transform(crs = st_crs("WGS84")) |>
  st_make_valid() |>
  filter(LOC_CLASS != "District")
Now to see if our shapefile plots correctly, we can use geom_sf() (and it looks like it does!)
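A quick check plot might look like this (the fill and outline colours are illustrative choices, not the original's styling):

```r
# plot the suburb polygons to check the shapefile loaded correctly
ggplot() +
  geom_sf(data = actsuburbs, fill = "white", colour = "grey40") +
  theme_void()
```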
We can download all Aves (bird) data provided by BirdLife Australia within the ACT by using galah_filter() to narrow our download. We'll also add the ALA's data profile (what the ALA calls a set of data quality filters to remove suspicious records) using galah_apply_profile(). You will need to provide a registered email with the ALA using galah_config(email = "email@example.com", verbose = FALSE).
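A sketch of the download step. The exact filters are an assumption: cl22 is the ALA's contextual layer for Australian states and territories, and the ALA profile is applied with galah_apply_profile():

```r
# register your email with the ALA before downloading
galah_config(email = "email@example.com", verbose = FALSE)

# download bird occurrence records within the ACT,
# applying the ALA's data-quality profile
birds <- galah_call() |>
  galah_identify("aves") |>
  galah_filter(cl22 == "Australian Capital Territory") |>
  galah_apply_profile(ALA) |>
  atlas_occurrences()
```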
To prepare our data, we'll convert each observation into a format suitable for spatial mapping. st_as_sf() transforms each point into an sf spatial object (which plots nicely with ggplot2). We'll also make sure the points are projected with crs = st_crs("WGS84"), the same as our shapefile, so that the points line up correctly.
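A sketch of this conversion, assuming the download contains the standard Darwin Core coordinate columns decimalLongitude and decimalLatitude:

```r
# convert occurrence records to an sf points object;
# sf needs complete coordinates, so drop rows with missing values first
birds_sf <- birds |>
  tidyr::drop_na(decimalLongitude, decimalLatitude) |>
  st_as_sf(coords = c("decimalLongitude", "decimalLatitude"),
           crs = st_crs("WGS84"))
```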
Now we'll find and count how many points are in each of our suburbs. The st_intersects() function checks whether each point is within, or "intersects", a specified POLYGON, marking it as TRUE or FALSE in a matrix. Using st_intersects() in a loop with pmap() allows us to run st_intersects() on each row of a supplied data frame. In our case, because each row of our shapefile corresponds to a suburb, pmap() iteratively checks which points are within each of our ACT suburbs! Counting the rows returned by st_intersects() for each suburb then gives the total number of points that intersect that suburb. We've saved this count in a new column, bird_count.
Note: This function takes ~3.5 minutes to run
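The counting step described above can also be sketched more simply: st_intersects() returns, for each polygon, the list of point indices that intersect it, so lengths() gives a per-suburb count. This is an equivalent alternative to the pmap() loop, with object names (actsuburbs, birds_sf) assumed from the surrounding text:

```r
# for each suburb polygon, count the bird observations that intersect it
counts <- st_intersects(actsuburbs, birds_sf) |> lengths()

act_counts <- actsuburbs |>
  mutate(bird_count = counts) |>
  select(LOC_NAME, bird_count)  # geometry column is kept automatically by sf
```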
Simple feature collection with 8 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 149.0563 ymin: -35.48032 xmax: 149.2188 ymax: -35.15856
Geodetic CRS:  WGS 84
    LOC_NAME bird_count                       geometry
1      Acton        670 POLYGON ((149.128 -35.28592...
2    Ainslie        120 POLYGON ((149.1374 -35.2583...
3     Amaroo         50 POLYGON ((149.1157 -35.1682...
4     Aranda         38 POLYGON ((149.0876 -35.2551...
5      Banks          0 POLYGON ((149.1019 -35.4803...
6     Barton        755 POLYGON ((149.1295 -35.3070...
7      Beard          0 POLYGON ((149.2156 -35.3407...
8  Belconnen       1521 POLYGON ((149.0789 -35.2304...
If we check our bird counts, we notice that the data are skewed: many regions have low numbers of observations and only a few regions have very high numbers of observations.
Log transformation will reduce the skew in our data, ultimately making our final choropleth map easier to interpret.
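To see why a log transform helps, here is a toy example with hypothetical skewed counts (the +1 is one common way to handle zeros; the post itself instead replaces the resulting NA bins with 0 later):

```r
# hypothetical skewed counts: most regions low, a few very high
counts <- c(0, 3, 8, 12, 40, 120, 755, 1521)

# log10 compresses the long right tail
log_counts <- log10(counts + 1)

range(counts)      # raw counts span four orders of magnitude
range(log_counts)  # log counts span a compact, comparable scale
```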
Let's add log-transformed counts in a new column, log_counts. Next we'll add a column called counts_discrete and use cut() to assign each log count to a bin depending on which range it falls within (0-1, 1-2, 2-3, 3-4 or 4-5). To make this easier to interpret, we've added labels that correspond to the un-transformed values (0, 10, 100, 1000, 10000). Finally, we replace any pesky NA values with 0.
act_counts <- act_counts |>
  rowwise() |>
  mutate(log_counts = log10(bird_count)) |>
  mutate(counts_discrete = cut(log_counts,
                               breaks = c(0, 1, 2, 3, 4, 5),
                               labels = c(0, 10, 100, 1000, 10000),
                               include.lowest = TRUE)) |>
  replace_na(list(counts_discrete = "0"))

act_counts |> rmarkdown::paged_table() # final data frame
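With the counts binned, the choropleth itself can be drawn by mapping counts_discrete to fill in geom_sf(). The palette and theme below are illustrative assumptions, not the original's styling:

```r
# draw the choropleth: one fill colour per discrete count bin
ggplot() +
  geom_sf(data = act_counts,
          aes(fill = counts_discrete),
          colour = "grey40") +
  scale_fill_viridis_d(name = "Observations") +
  theme_void()
```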