Choropleth maps visually summarise how variables (like species richness or population density, for example) vary across geographic areas. These maps require two inputs:
- a geospatial object with information on regional boundaries
- a numerical variable that can be mapped to each geographic unit using colour
Here, I walk through the process of mapping the density of plant records from the ALA to geographic bioregions across Australia, using two colour scales to differentiate between marine and terrestrial records.
Get geospatial and count data
Let’s start by loading the packages we’ll need.
library(galah)
library(here)
library(sf)
library(rmapshaper)
library(dplyr)
library(stringr)
library(ggplot2)
library(ggnewscale)
Next, we’ll need some regional boundaries. I think the IBRA7 and IMCRA4 bioregions will work nicely for what we’re planning. These boundaries classify Australia’s landscapes and waters into geographically distinct bioregions based on variables like climate, geomorphology, and species information.
After downloading the data, we can read it in using the sf
package and check that it looks correct. Here, I’ve also elected to use ms_simplify()
from the rmapshaper
package to simplify the geospatial features and speed up computation.
# read in IMCRA shapefile
<- st_read(here("posts",
imcra_shp "data",
"imcra_mesoscale_bioregions",
"imcra4_meso.shp"),
quiet = TRUE) |>
ms_simplify(keep = 0.1)
# read in IBRA shapefile
<- st_read(here("posts",
ibra_shp "data",
"IBRA7_regions",
"ibra7_regions.shp"),
quiet = TRUE) |>
ms_simplify(keep = 0.1)
And finally, let’s get the number of plant records in the ALA using the galah
package, grouped by IBRA or IMCRA region. To do this, we need to know what the ALA calls the IBRA and IMCRA layers.
Using the search_fields()
function, we can determine that the IBRA layer we’re after is called cl1048 and the IMCRA layer, cl966.
search_fields("IBRA")
# A tibble: 3 × 3
id description type
<chr> <chr> <chr>
1 cl20 IBRA 6 Regions fields
2 cl1048 IBRA 7 Regions fields
3 cl1049 IBRA 7 Subregions fields
search_fields("IMCRA")
# A tibble: 2 × 3
id description type
<chr> <chr> <chr>
1 cl21 IMCRA Regions fields
2 cl966 IMCRA Meso-scale Bioregions fields
To get counts of records from the ALA, we can pass a query with galah_call()
and build our query using pipes.
We will specify that we only want plant records matching Plantae or Chlorophyta using galah_identify()
, apply the default set of ALA data quality filters to remove poor quality records using galah_filter()
, group records by region using galah_group_by()
, and finally return the counts of records that match all our criteria with atlas_counts()
.
# counts in IBRA regions
<- galah_call() |>
ibra_counts galah_identify("plantae", "chlorophyta") |>
galah_filter(profile = "ALA") |>
galah_group_by("cl1048") |> # IBRA regions
atlas_counts()
head(ibra_counts)
# A tibble: 6 × 2
cl1048 count
<chr> <int>
1 Sydney Basin 2809259
2 South Eastern Highlands 1833611
3 South Eastern Queensland 1104671
4 NSW North Coast 925380
5 South East Corner 909572
6 Murray Darling Depression 904340
# counts in IMCRA regions
<- galah_call() |>
imcra_counts galah_identify("plantae", "chlorophyta") |>
galah_filter(profile = "ALA") |>
galah_group_by("cl966") |> # IMCRA bioregions
atlas_counts()
head(imcra_counts)
# A tibble: 6 × 2
cl966 count
<chr> <int>
1 Shoalwater Coast 434913
2 Lucinda-Mackay Coast 260228
3 Torres Strait 242812
4 West Cape York 215772
5 Wet Tropic Coast 143362
6 Pellew 91955
Join geospatial and count data
We now have the two things we need to make a choropleth map:
- IBRA/IMCRA boundaries
- counts of plant records in each region
To create a plot, we need to combine the geospatial and numeric data. But first, let’s check if the data needs to be tidied.
As we’re going to be joining the spatial and count data, we need to be sure that the names of the IBRA/IMCRA regions match in both datasets. To double check that all of our region names match, we’ll use setdiff()
. There are no name mismatches when character(0)
is returned, but if any region names are returned that means there is a problem somewhere that we need to fix before joining our dataframes.
When we run setdiff()
, the IBRA names match perfectly, but there’s a mismatch in two IMCRA names.
# check region names match
setdiff(ibra_counts$cl1048, ibra_shp$REG_NAME_7)
character(0)
setdiff(imcra_counts$cl966, imcra_shp$MESO_NAME)
[1] "Pilbarra (nearshore)" "Pilbarra (offshore)"
Reversing the order of IMCRA data frames in setdiff()
reveals that that Pilbara is misspelled in the imcra_counts
dataset. We can easily change this and confirm both sets of names match before continuing.
# check the reverse for IMCRA names
setdiff(imcra_shp$MESO_NAME, imcra_counts$cl966)
[1] "Pilbara (offshore)" "Pilbara (nearshore)"
# replace "Pilbarra" with "Pilbara"
<- imcra_counts |>
imcra_counts mutate(cl966 = str_replace(string = cl966,
pattern = "Pilbarra",
replacement = "Pilbara"))
# check names match
setdiff(imcra_counts$cl966, imcra_shp$MESO_NAME)
character(0)
Now let’s check how our data are distributed so we can decide whether we should scale them with a transformation before plotting. Data skewed too far to the right will not show differences very clearly when they are mapped.
Checking the distribution of counts in each dataset shows a substantial skew to the right.
hist(imcra_counts$count)
hist(ibra_counts$count)
Applying a log-transformation to the count data makes the distribution more symmetrical.
hist(log(imcra_counts$count))
hist(log(ibra_counts$count))
Next, we join the geospatial and numeric data. Along the way, we rename some columns, remove unnecessary columns, calculate counts as a proportion of the area of each region (so we’re plotting density of records, not counts of records), and convert the resulting dataframe into a simple features object.
<- imcra_counts |>
imcra_join full_join(y = imcra_shp, by = c("cl966" = "MESO_NAME")) |>
rename("imcra" = "cl966") |>
select(imcra, count, AREA_KM2, geometry) |>
mutate(density_log10 = log10(count / AREA_KM2)) |>
select(imcra, density_log10, geometry) |>
st_as_sf()
<- ibra_counts |>
ibra_join full_join(y = ibra_shp, by = c("cl1048" = "REG_NAME_7")) |>
rename("ibra" = "cl1048") |>
select(ibra, count, SQ_KM, geometry) |>
mutate(density_log10 = log10(count / SQ_KM)) |>
select(ibra, density_log10, geometry) |>
st_as_sf()
Make a map
Finally, we’ll use the ggnewscale
package to apply different colour palettes to the marine and terrestrial data in a choropleth map.
ggplot() +
geom_sf(data = imcra_join,
aes(fill = density_log10),
colour = NA) +
scale_fill_distiller(name = "IMCRA",
type = "seq",
palette = "BuPu",
direction = 1,
labels = c("0.001", "0.01", "0.1", "1", "10"),
guide = guide_colorsteps(direction = "horizontal",
label.position = "bottom",
title.position = "left")) +
# adds new colour scale
::new_scale_fill() +
ggnewscalegeom_sf(data = ibra_join,
aes(fill = density_log10),
colour = NA) +
scale_fill_distiller(name = "IBRA",
type = "seq",
palette = "YlOrBr",
direction = 1,
labels = c("0.1", "1", "10", "100"),
guide = guide_colorsteps(direction = "horizontal",
label.position = "bottom",
title.position = "left")) +
# adds a title for both legends
annotate("text",
x = 133,
y = -45.5,
label = "No. of records per square km",
size = 6) +
coord_sf(xlim = c(110, 155), ylim = c(-45, -10)) +
theme_void() +
theme(legend.position = "bottom",
legend.key.width = unit(12, 'mm'))
Success!
One thing to note is that we didn’t necessarily have to use ggnewscale
here; we could just as easily have combined all the data and plotted them on the same map without keeping the IBRA and IMCRA datasets separate. But, i) it’s nice to be able to differentiate marine and terrestrial regions at a glance, and ii) using two legends also makes it clear that there’s a stark difference in the number of plant records for marine and terrestrial regions.
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2024-02-12
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
galah * 2.0.1 2024-02-06 [1] CRAN (R 4.3.2)
ggnewscale * 0.4.9 2023-05-25 [1] CRAN (R 4.3.2)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.3.2)
htmltools * 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
rmapshaper * 0.5.0 2023-04-11 [1] CRAN (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
sf * 1.0-14 2023-07-11 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library
──────────────────────────────────────────────────────────────────────────────