Download a species list and cross-reference with conservation status lists in R

Knowing which species have been observed in a local area is a regular need in ecosystem management. Here we show how to make a species list and cross-reference it with threatened and sensitive species lists. We show two methods: one using {galah}, the other using an external shapefile and species list. We then show how to visualise this information as a bar chart and a waffle chart using {ggplot2}.

Eukaryota
Animalia
Plantae
Summaries
R
Authors

Dax Kellie

Amanda Buyan

Published

July 13, 2025

Date

20 July 2025

Knowing what species inhabit an area is important for conservation and ecosystem management. In particular, it can help us find how many known species are in a given area, and whether any species are vulnerable or endangered.

In this post, we will present two options, one using the galah package, the other using an external shapefile and list. Using either workflow, we will show you how to download a list of species within a Local Government Area (Shoalhaven, NSW), cross-reference this list with a state conservation status list, and visualise the number of threatened species in the region with waffle and ggplot2.

For those unfamiliar with Australian geography, Shoalhaven is a Local Government Area on the south coast of New South Wales.

Let’s first load our packages. To download species lists, you will also need to provide an email address registered with the ALA using galah_config().

library(tidyverse)
library(readxl)
library(sf)
library(rmapshaper)
library(here)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(showtext)
library(galah)


galah_config(email = "your-email-here") # ALA-registered email

Download threatened species in an area

There are two methods to choose from:

  • galah (using fields downloaded from the Atlas of Living Australia)
  • Downloaded shapefile + species list

The method you choose depends on whether the region or list you wish to return species for is already in galah, or whether you wish to filter for a more specific area defined by a separate shapefile or list. Keep in mind that using an external list may require additional work matching taxonomic names.
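As a rough illustration of the name-matching work an external list can involve, here is a minimal sketch using toy data. The table names, column names, placeholder species names and the clean_name() helper are all hypothetical and are not taken from the ALA or any real conservation list. The sketch only handles formatting differences (stray spaces, capitalisation); it does not resolve synonyms or other taxonomic changes.

```r
library(dplyr)

# Toy species list for an area (placeholder names, not real data)
species_in_area <- tibble(
  species = c("Genus alpha", "Genus beta", "Genus gamma"),
  class   = c("Aves", "Mammalia", "Reptilia")
)

# Toy external status list; the second name has untidy formatting
status_list <- tibble(
  scientific_name = c("Genus alpha", "genus  beta ", "Genus delta"),
  status          = c("Vulnerable", "Endangered", "Endangered")
)

# Hypothetical helper: trim ends, collapse repeated spaces, lower-case
clean_name <- function(x) tolower(gsub("\\s+", " ", trimws(x)))

# Joining on the raw names misses the untidy entry
raw_match <- species_in_area |>
  inner_join(status_list, by = c("species" = "scientific_name"))

# Joining on a cleaned key recovers it
clean_match <- species_in_area |>
  mutate(name_key = clean_name(species)) |>
  inner_join(
    status_list |> mutate(name_key = clean_name(scientific_name)),
    by = "name_key"
  )

# Listed names with no counterpart in the area list
unmatched <- status_list |>
  mutate(name_key = clean_name(scientific_name)) |>
  anti_join(clean_match, by = "name_key")
```

Comparing nrow(raw_match) with nrow(clean_match), and scanning unmatched for names you expected to find, is a quick way to see whether formatting is costing you matches before moving on to proper taxonomic name matching.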

Whichever method you’ve followed, you will end up with very similar datasets containing threatened species and their statuses, though the number of matched species might differ⁷.

To finish, we can save our dataframe as a csv file.

# save
write.csv(threatened_status,
          here("path", "to", "file-name.csv"))

Visualise species conservation status

Along with a species list, we can also summarise threatened_status visually. Few options are as simple and easy to understand as a bar plot. Here we’ve made a simple bar plot displaying the number of species by conservation status, and styled it with a custom font and some nicer colours.

# custom font
font_add_google("Roboto")
showtext_auto()

# count number of species by status
status_count <- threatened_status |>
  group_by(status) |>
  count()

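# bar plot
# bars follow the order of `status` itself (alphabetical for a character
# column), so sorting rows beforehand does not change the plot
bar_status <- 
  status_count |>
  ggplot() +
  geom_col(
    mapping = aes(x = status,
                  y = n,
                  fill = status),
    colour = "transparent"
  ) + 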
  labs(title = "Threatened species status in Shoalhaven, NSW (2024)",
       x = "Conservation status",
       y = "Number of species") +
  scale_fill_manual(values = c('#ab423f', '#cd826d', '#ebc09e'),
                    labels = c("Vulnerable", "Endangered", "Critically Endangered")) +
  pilot::theme_pilot(legend_position = "none",
                     grid = "",
                     axes = "l") + 
  theme(text = element_text(family = "Roboto"),
        plot.title = element_text(size = 29),
        axis.title = element_text(size = 18),
        axis.text = element_text(size = 16))

bar_status
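If status is stored as plain text, ggplot2 places the bars in alphabetical order rather than by severity or by count. One way to control this is to convert status to a factor with explicit levels before plotting. The sketch below uses base R and made-up counts; the three status labels are assumed to match the values in threatened_status, which may not hold for your data.

```r
# Made-up counts for illustration only (not results from the ALA)
status_count <- data.frame(
  status = c("Vulnerable", "Critically Endangered", "Endangered"),
  n      = c(40, 5, 20)
)

# Assumed severity order, least to most threatened
severity <- c("Vulnerable", "Endangered", "Critically Endangered")

# A factor's level order is what ggplot2 uses for a discrete axis and fill
status_count$status <- factor(status_count$status, levels = severity)

# Sorting rows by the factor is optional, but makes the table read in the same order
status_count <- status_count[order(status_count$status), ]
```

With the levels set this way, the colours supplied to scale_fill_manual() are also assigned in severity order, so the first colour goes to the first level.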

A more eye-catching way to see a taxonomic breakdown of species is a waffle chart. Waffle charts are useful because they display number and proportion all at once. They take a little more code to build than a bar plot, so they are a handy summary tool for more advanced R users.

library(waffle)
library(glue)
library(marquee)

# Count number of species by taxonomic group
taxa_table <- threatened_status |>
  mutate(
    taxa_group = case_when(
      class == "Aves" ~ "Birds",
      class == "Reptilia" ~ "Reptiles",
      class == "Mammalia" ~ "Mammals",
      kingdom == "Plantae" ~ "Plants",
      .default = "Other"
    )
  ) |>
  group_by(taxa_group) |>
  summarise(n = n()) |>
  mutate(proportion = n/sum(n)*100)

# waffle chart
waffle_taxa <- 
  ggplot() +
  waffle::geom_waffle(
    data = taxa_table |> arrange(-n),             # reorder highest to lowest
    mapping = aes(fill = reorder(taxa_group, -n), # reorder legend
                  values = n),
    colour = "white",
    n_rows = 8,
    size = 1
    ) +
  scale_fill_manual(name = "",
                    # legend labels come from the reordered taxa_group levels,
                    # so they always match the fill order
                    values = c('#567c7c', '#687354', '#C3CB80', '#c4ac79', '#38493a')
                    ) +
  labs(title = marquee_glue("Taxonomic breakdown of threatened species in Shoalhaven, NSW (2024)"),
       caption = marquee_glue("1 {cli::symbol$square_small_filled} = 1 species")) +
  coord_equal() + 
  theme_void() + 
  theme(legend.position = "bottom",
        text = element_text(family = "Roboto"),
        legend.title = element_text(hjust = 0.5, size = 20),
        legend.text = element_text(size = 18),
        plot.title = element_marquee(hjust = 0.5, size = 14, margin = margin(b=5), family = "Roboto"),
        plot.caption = element_marquee(size = 12, hjust = 1),
        plot.margin = margin(0.5, 1, 0.5, 1, unit = "cm"))

waffle_taxa
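When choosing n_rows, it can help to work out the grid size in advance. The sketch below assumes the squares are laid out in a grid with n_rows rows, as in the chart above, and uses made-up counts rather than the real Shoalhaven numbers.

```r
# Made-up counts per taxonomic group (hypothetical, for illustration only)
taxa_counts <- c(Plants = 45, Birds = 30, Mammals = 12, Reptiles = 5, Other = 3)

n_rows <- 8                           # same value passed to geom_waffle()
total  <- sum(taxa_counts)            # one square per species
n_cols <- ceiling(total / n_rows)     # columns needed to fit every square
empty  <- n_rows * n_cols - total     # unfilled cells in the last column

# Percentage of the grid taken up by each group
share <- round(taxa_counts / total * 100, 1)
```

A large value of empty leaves a ragged final column, so it can be worth trying a few values of n_rows until the grid fills more evenly.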

Final thoughts

We hope this post has helped you understand how to download a species list for a specific area and compare it to conservation lists. It’s also possible to compare species with other information like lists of migratory species or seasonal species.

For other posts, check out our beginner’s guide to map species observations or see an investigation of dingo observations in the ALA.

Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.0 (2025-04-11 ucrt)
 os       Windows 11 x64 (build 22631)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2025-07-23
 pandoc   3.4 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.2)
 galah       * 2.1.2   2025-06-12 [1] CRAN (R 4.5.0)
 ggplot2     * 3.5.1   2024-04-23 [1] CRAN (R 4.4.3)
 glue        * 1.8.0   2024-09-30 [1] CRAN (R 4.4.2)
 here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.2)
 htmltools   * 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
 lubridate   * 1.9.4   2024-12-08 [1] CRAN (R 4.4.2)
 marquee     * 1.0.0   2025-01-20 [1] CRAN (R 4.5.0)
 ozmaps      * 0.4.5   2021-08-03 [1] CRAN (R 4.3.2)
 pilot       * 4.0.0   2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
 purrr       * 1.0.4   2025-02-05 [1] CRAN (R 4.4.3)
 readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.3.3)
 readxl      * 1.4.3   2023-07-06 [1] CRAN (R 4.3.2)
 rmapshaper  * 0.5.0   2023-04-11 [1] CRAN (R 4.3.2)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 sf          * 1.0-20  2025-03-24 [1] CRAN (R 4.4.3)
 showtext    * 0.9-7   2024-03-02 [1] CRAN (R 4.4.1)
 showtextdb  * 3.0     2020-06-04 [1] CRAN (R 4.3.2)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.2)
 sysfonts    * 0.8.9   2024-03-02 [1] CRAN (R 4.4.1)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
 tidyr       * 1.3.1   2024-01-24 [1] CRAN (R 4.3.3)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.2)
 waffle      * 1.0.2   2024-05-03 [1] Github (hrbrmstr/waffle@767875b)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.5.0/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

  1. Each spatial layer has a two letter code, along with a number to identify it. The abbreviations are as follows:
    * cl = contextual layer (e.g. boundaries of LGAs, Indigenous Protected Areas, States/Territories etc.)
    * 11170 = number associated with the spatial layer in the atlas

  2. We used right_join() this time because we wanted to first select columns from nsw_threatened, then join so that we keep all 90+ rows in threatened (using left_join() would keep all 1,000+ rows in nsw_threatened instead).

  3. Simplifying a shapefile reduces the total number of points that draw the shape outline.

  4. Check out this post for a better explanation of what CRS is and how it affects maps.

  5. On a related note, it’s possible to download a list specifically for Shoalhaven on the BioNet Atlas website. However, results from BioNet will be matched to BioNet records only. As a result, fewer species will be identified compared to the ALA, which matches NSW BioNet data as well as data from other sources.

  6. We can double check status information by viewing the species list in Excel and clicking on links in the info column. This is handy for double checking species status codes or learning more about each species and status.

  7. This is due to differences in taxonomic names in the externally downloaded list and in ALA data. More info can be found under the “Names Matching” tab in the Shapefile + list section.