Author

Date

23 May 2022

Choropleth maps visually summarise how variables (like species richness or population density, for example) vary across geographic areas. These maps require two inputs:

a geospatial object with information on regional boundaries
a numerical variable that can be mapped to each geographic unit using colour

Here, I walk through the process of mapping the density of plant records from the ALA to geographic bioregions across Australia, using two colour scales to differentiate between marine and terrestrial records.

Get geospatial and count data

Let’s start by loading the packages we’ll need.

library(galah)
library(here)
library(sf)
library(rmapshaper)
library(dplyr)
library(stringr)
library(ggplot2)
library(ggnewscale)

Next, we’ll need some regional boundaries. I think the IBRA7 and IMCRA4 bioregions will work nicely for what we’re planning. These boundaries classify Australia’s landscapes and waters into geographically distinct bioregions based on variables like climate, geomorphology, and species information.

After downloading the data, we can read it in using the sf package and check that it looks correct. Here, I’ve also elected to use ms_simplify() from the rmapshaper package to simplify the geospatial features and speed up computation.

# read in IMCRA shapefile
imcra_shp <- st_read(here("posts", 
                          "data",
                          "imcra_mesoscale_bioregions",
                          "imcra4_meso.shp"), 
                     quiet = TRUE) |> 
  ms_simplify(keep = 0.1)

# read in IBRA shapefile
ibra_shp <- st_read(here("posts",
                         "data",
                         "IBRA7_regions",
                         "ibra7_regions.shp"),
                    quiet = TRUE) |> 
  ms_simplify(keep = 0.1)

And finally, let’s get the number of plant records in the ALA using the galah package, grouped by IBRA or IMCRA region. To do this, we need to know what the ALA calls the IBRA and IMCRA layers.

Using the search_fields() function, we can determine that the IBRA layer we’re after is called cl1048 and the IMCRA layer, cl966.

search_fields("IBRA")

# A tibble: 3 × 3
  id     description       type  
  <chr>  <chr>             <chr> 
1 cl20   IBRA 6 Regions    fields
2 cl1048 IBRA 7 Regions    fields
3 cl1049 IBRA 7 Subregions fields

search_fields("IMCRA")

# A tibble: 2 × 3
  id    description                 type  
  <chr> <chr>                       <chr> 
1 cl21  IMCRA Regions               fields
2 cl966 IMCRA Meso-scale Bioregions fields

To get counts of records from the ALA, we can pass a query with galah_call() and build our query using pipes.

We will specify that we only want plant records matching Plantae or Chlorophyta using galah_identify(), apply the default set of ALA data quality filters to remove poor quality records using galah_filter(), group records by region using galah_group_by(), and finally return the counts of records that match all our criteria with atlas_counts().

# counts in IBRA regions
ibra_counts <- galah_call() |>
  galah_identify("plantae", "chlorophyta") |>
  galah_filter(profile = "ALA") |> 
  galah_group_by("cl1048") |>      # IBRA regions
  atlas_counts()

head(ibra_counts)

# A tibble: 6 × 2
  cl1048                      count
  <chr>                       <int>
1 Sydney Basin              2809259
2 South Eastern Highlands   1833611
3 South Eastern Queensland  1104671
4 NSW North Coast            925380
5 South East Corner          909572
6 Murray Darling Depression  904340

# counts in IMCRA regions
imcra_counts <- galah_call() |>
  galah_identify("plantae", "chlorophyta") |>
  galah_filter(profile = "ALA") |> 
  galah_group_by("cl966") |>      # IMCRA bioregions
  atlas_counts()

head(imcra_counts)

# A tibble: 6 × 2
  cl966                 count
  <chr>                 <int>
1 Shoalwater Coast     434913
2 Lucinda-Mackay Coast 260228
3 Torres Strait        242812
4 West Cape York       215772
5 Wet Tropic Coast     143362
6 Pellew                91955

Join geospatial and count data

We now have the two things we need to make a choropleth map:

IBRA/IMCRA boundaries
counts of plant records in each region

To create a plot, we need to combine the geospatial and numeric data. But first, let’s check if the data needs to be tidied.

As we’re going to be joining the spatial and count data, we need to be sure that the names of the IBRA/IMCRA regions match in both datasets. To double check that all of our region names match, we’ll use setdiff(). There are no name mismatches when character(0) is returned, but if any region names are returned that means there is a problem somewhere that we need to fix before joining our dataframes.

When we run setdiff(), the IBRA names match perfectly, but there’s a mismatch in two IMCRA names.

# check region names match
setdiff(ibra_counts$cl1048, ibra_shp$REG_NAME_7)

character(0)

setdiff(imcra_counts$cl966, imcra_shp$MESO_NAME)

[1] "Pilbarra (nearshore)" "Pilbarra (offshore)"

Reversing the order of IMCRA data frames in setdiff() reveals that that Pilbara is misspelled in the imcra_counts dataset. We can easily change this and confirm both sets of names match before continuing.

# check the reverse for IMCRA names
setdiff(imcra_shp$MESO_NAME, imcra_counts$cl966)

[1] "Pilbara (offshore)"  "Pilbara (nearshore)"

# replace "Pilbarra" with "Pilbara" 
imcra_counts <- imcra_counts |> 
  mutate(cl966 = str_replace(string = cl966, 
                             pattern = "Pilbarra", 
                             replacement = "Pilbara"))

# check names match
setdiff(imcra_counts$cl966, imcra_shp$MESO_NAME)

character(0)

Now let’s check how our data are distributed so we can decide whether we should scale them with a transformation before plotting. Data skewed too far to the right will not show differences very clearly when they are mapped.

Checking the distribution of counts in each dataset shows a substantial skew to the right.

hist(imcra_counts$count)
hist(ibra_counts$count)

Applying a log-transformation to the count data makes the distribution more symmetrical.

hist(log(imcra_counts$count))
hist(log(ibra_counts$count))

Next, we join the geospatial and numeric data. Along the way, we rename some columns, remove unnecessary columns, calculate counts as a proportion of the area of each region (so we’re plotting density of records, not counts of records), and convert the resulting dataframe into a simple features object.

imcra_join <- imcra_counts |> 
  full_join(y = imcra_shp, by = c("cl966" = "MESO_NAME")) |> 
  rename("imcra" = "cl966") |> 
  select(imcra, count, AREA_KM2, geometry) |> 
  mutate(density_log10 = log10(count / AREA_KM2)) |> 
  select(imcra, density_log10, geometry) |> 
  st_as_sf()

ibra_join <- ibra_counts |> 
  full_join(y = ibra_shp, by = c("cl1048" = "REG_NAME_7")) |> 
  rename("ibra" = "cl1048") |> 
  select(ibra, count, SQ_KM, geometry) |> 
  mutate(density_log10 = log10(count / SQ_KM)) |> 
  select(ibra, density_log10, geometry) |> 
  st_as_sf()

Make a map

Finally, we’ll use the ggnewscale package to apply different colour palettes to the marine and terrestrial data in a choropleth map.

ggplot() + 
  geom_sf(data = imcra_join,
          aes(fill = density_log10),
          colour = NA) +
  scale_fill_distiller(name = "IMCRA",
                       type = "seq",
                       palette = "BuPu",
                       direction = 1,
                       labels = c("0.001", "0.01", "0.1", "1", "10"),
                       guide = guide_colorsteps(direction = "horizontal",
                                                label.position = "bottom",
                                                title.position = "left")) +
  # adds new colour scale
  ggnewscale::new_scale_fill() +
  geom_sf(data = ibra_join,
          aes(fill = density_log10),
          colour = NA) +
  scale_fill_distiller(name = "IBRA",
                       type = "seq",
                       palette = "YlOrBr",
                       direction = 1,
                       labels = c("0.1", "1", "10", "100"),
                       guide = guide_colorsteps(direction = "horizontal",
                                                label.position = "bottom",
                                                title.position = "left")) +
  # adds a title for both legends
  annotate("text", 
           x = 133, 
           y = -45.5, 
           label = "No. of records per square km",
           size = 6) +
  coord_sf(xlim = c(110, 155), ylim = c(-45, -10)) +
  theme_void() +
  theme(legend.position = "bottom",
        legend.key.width = unit(12, 'mm'))

Success!

One thing to note is that we didn’t necessarily have to use ggnewscale here; we could just as easily have combined all the data and plotted them on the same map without keeping the IBRA and IMCRA datasets separate. But, i) it’s nice to be able to differentiate marine and terrestrial regions at a glance, and ii) using two legends also makes it clear that there’s a stark difference in the number of plant records for marine and terrestrial regions.

Expand for session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2024-02-12
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 galah       * 2.0.1   2024-02-06 [1] CRAN (R 4.3.2)
 ggnewscale  * 0.4.9   2023-05-25 [1] CRAN (R 4.3.2)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.2)
 htmltools   * 0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
 rmapshaper  * 0.5.0   2023-04-11 [1] CRAN (R 4.3.2)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 sf          * 1.0-14  2023-07-11 [1] CRAN (R 4.3.2)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.2)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library

──────────────────────────────────────────────────────────────────────────────