Knowing what species inhabit an area is important for conservation and ecosystem management. In particular, it can help us find how many known species are in a given area, and whether any species are vulnerable or endangered.
In this post, we will present two options, one using the galah package, the other using an external shapefile and list. Using either workflow, we will show you how to download a list of species within a Local Government Area (Shoalhaven, NSW), cross-reference this list with a state conservation status list, and visualise the number of threatened species in the region with waffle and ggplot2.
Let’s first load our packages. To download species lists, you will also need to enter an email address registered with the ALA using galah_config().
library(tidyverse)
library(readxl)
library(sf)
library(rmapshaper)
library(here)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(showtext)
library(galah)
galah_config(email = "your-email-here") # ALA-registered email
Download threatened species in an area
Choose which method you would like to view:
- galah (using fields downloaded from the Atlas of Living Australia)
- Downloaded shapefile + species list
The method you choose depends on whether the region or list you wish to return species for is already in galah, or whether you wish to filter for a more specific area defined by a separate shapefile or list. Keep in mind that using an external list may require additional work matching taxonomic names.
Search for fields
To find what exists in galah to help us narrow our query, we can use search_all() to search for available fields. A field in galah refers to a column or layer stored in a living atlas. Let’s do a text search to find what fields contain information on “Local Government Areas”.
search_all(fields, "Local Government Areas")
# A tibble: 4 × 3
id description type
<chr> <chr> <chr>
1 cl10923 PSMA Local Government Areas (2018) fields
2 cl110923 PSMA Local Government Areas - Abbreviated (2018) fields
3 cl11170 Local Government Areas 2023 fields
4 cl959 Local Government Areas fields
The field cl11170¹ contains the most recent available data (from 2023). We can preview what values are within field cl11170 using show_values().
search_all(fields, "cl11170") |>
show_values()
• Showing values for 'cl11170'.
# A tibble: 547 × 1
cl11170
<chr>
1 Unincorporated ACT
2 Brisbane
3 Greater Geelong
4 East Gippsland
5 Moreton Bay
6 Unincorporated SA
7 Yarra Ranges
8 Sunshine Coast
9 Cairns
10 Mareeba
# ℹ 537 more rows
There are lots of Local Government Areas! To check whether Shoalhaven is included, we can do a text search for values that match “shoalhaven”.
search_all(fields, "cl11170") |>
search_values("shoalhaven")
• Showing values for 'cl11170'.
# A tibble: 1 × 1
cl11170
<chr>
1 Shoalhaven
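galah's exact matching rules aren't shown here, but a case-insensitive partial match is why a lowercase query like “shoalhaven” can still find “Shoalhaven”. The base R sketch below illustrates the idea on a few values copied from the cl11170 output (plus Shoalhaven itself). Treat it as an illustration of text matching, not as how search_values() is implemented.

```r
# A few values from the cl11170 output above, plus "Shoalhaven"
lga_values <- c("Unincorporated ACT", "Brisbane", "Greater Geelong", "Shoalhaven")

# Case-insensitive partial match: lowercase "shoalhaven" still matches
matches <- lga_values[grepl("shoalhaven", lga_values, ignore.case = TRUE)]
matches
```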
Download data
Using the field and value returned above, we can now build our query. We begin our query with galah_call() and filter to only Shoalhaven in the year 2024. Ending our query with atlas_species() will return a list of species.
species_shoal <- galah_call() |>
  filter(cl11170 == "Shoalhaven",
         year == 2024) |>
  atlas_species()
species_shoal
# A tibble: 2,936 × 11
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
<chr> <chr> <chr> <chr> <chr>
1 https://biodiversity.… Gymnorhina … (Latham, 1801) species Animal…
2 https://biodiversity.… Malurus (Ma… (Ellis, 1782) species Animal…
3 https://biodiversity.… Vanellus (L… (Boddaert, 1783) species Animal…
4 https://biodiversity.… Macropus gi… Shaw, 1790 species Animal…
5 https://biodiversity.… Corvus coro… Vigors & Horsfield, 1… species Animal…
6 https://biodiversity.… Anthochaera… (Latham, 1801) species Animal…
7 https://biodiversity.… Dacelo (Dac… (Hermann, 1783) species Animal…
8 https://biodiversity.… Potorous tr… (Kerr, 1792) species Animal…
9 https://biodiversity.… Chroicoceph… (Stephens, 1826) species Animal…
10 https://biodiversity.… Grallina cy… (Latham, 1801) species Animal…
# ℹ 2,926 more rows
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>
atlas_species() returns taxonomic information at the species level (for more info, see the tab below). To make sure we return taxonomic information at the lowest level to which each occurrence was identified, we’ll group_by(taxonConceptID), which is a unique ID attached to each occurrence record’s taxonomic identification (read the box below for more on what this means).
By default atlas_species() only returns taxonomic information at the species level. This means that if some taxa are identified to subspecies on a specific list like the NSW Conservation Status list, atlas_species() will return the species-level match rather than the subspecies-level match. For example, the name "Potorous tridactylus" is returned instead of "Potorous tridactylus trisulcatus".
Grouping by taxonConceptID like we do below specifies that we wish to match to the identified taxon, rather than only to the species level.
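To see why grouping changes the number of rows returned (2,936 vs 4,282 above), here is a minimal base R sketch on hypothetical toy records. The names come from the outputs in this post, but the records themselves are made up. Collapsing to species merges a subspecies-level identification into its parent species, whereas keeping the identified taxon leaves it as a separate row.

```r
# Hypothetical toy records: two potoroo records identified to different ranks
occ <- data.frame(
  species_name = c("Potorous tridactylus", "Potorous tridactylus", "Macropus giganteus"),
  identified_taxon = c("Potorous tridactylus trisulcatus", "Potorous tridactylus",
                       "Macropus giganteus")
)

# Species level: the subspecies record collapses into its parent species
by_species <- unique(occ$species_name)
length(by_species)  # 2

# Identified taxon: the subspecies is kept as its own row
by_taxon <- unique(occ$identified_taxon)
length(by_taxon)    # 3
```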
species_shoal <- galah_call() |>
  filter(cl11170 == "Shoalhaven",
         year == 2024) |>
  group_by(taxonConceptID) |>
  atlas_species()
species_shoal
# A tibble: 4,282 × 11
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
<chr> <chr> <chr> <chr> <chr>
1 https://biodiversity.… Gymnorhina … (Latham, 1801) species Animal…
2 https://biodiversity.… Malurus (Ma… (Ellis, 1782) species Animal…
3 https://biodiversity.… Macropus gi… Shaw, 1790 species Animal…
4 https://biodiversity.… Corvus coro… Vigors & Horsfield, 1… species Animal…
5 https://biodiversity.… Trichogloss… Stephens, 1826 genus Animal…
6 https://biodiversity.… Vanellus (L… (Boddaert, 1783) species Animal…
7 https://biodiversity.… Anthochaera… (Latham, 1801) species Animal…
8 https://biodiversity.… Dacelo (Dac… (Hermann, 1783) species Animal…
9 https://biodiversity.… Potorous tr… (McCoy, 1865) subspecies Animal…
10 https://biodiversity.… Chroicoceph… (Stephens, 1826) species Animal…
# ℹ 4,272 more rows
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>
It’s also possible to return the number of observations by ending our query with atlas_counts(). In this case, we can group by scientificName (the name of the lowest taxonomic level to which each observation was identified).
galah_call() |>
  filter(cl11170 == "Shoalhaven",
         year == 2024) |>
  group_by(scientificName) |>
  atlas_counts()
# A tibble: 4,282 × 2
scientificName count
<chr> <int>
1 Gymnorhina tibicen 934
2 Malurus (Malurus) cyaneus 926
3 Macropus giganteus 888
4 Corvus coronoides 862
5 Trichoglossus 846
6 Vanellus (Lobipluvia) miles 845
7 Anthochaera (Anellobia) chrysoptera 820
8 Dacelo (Dacelo) novaeguineae 817
9 Potorous tridactylus trisulcatus 806
10 Chroicocephalus novaehollandiae 789
# ℹ 4,272 more rows
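Conceptually, a grouped atlas_counts() query tallies records per name on the ALA's servers. As a rough local analogue, assuming you already had one name per record, base R's table() produces the same kind of name-by-count summary, sorted here from most to least records. The record vector below is hypothetical.

```r
# Hypothetical record-level names (one element per occurrence record)
records <- c("Gymnorhina tibicen", "Macropus giganteus", "Gymnorhina tibicen",
             "Potorous tridactylus trisulcatus", "Gymnorhina tibicen")

# Tally records per name, then sort by descending count
counts <- as.data.frame(table(scientificName = records),
                        responseName = "count", stringsAsFactors = FALSE)
counts <- counts[order(-counts$count), ]
rownames(counts) <- NULL
counts
```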
Cross-reference with threatened species lists
Next we’ll compare our Shoalhaven species list species_shoal with a state-wide conservation status list. We can use galah to access lists that are available on the Atlas of Living Australia. Shoalhaven is within the state of New South Wales, so let’s search for “New South Wales” to see what state-specific lists are available.
search_all(lists, "New South Wales")
# A tibble: 2 × 22
species_list_uid listName description listType dateCreated lastUpdated
<chr> <chr> <chr> <chr> <chr> <chr>
1 dr650 New South Wales… "Classific… CONSERV… 2015-04-04… 2025-07-08…
2 dr487 New South Wales… "The NSW G… SENSITI… 2013-06-20… 2025-07-08…
# ℹ 16 more variables: lastUploaded <chr>, lastMatched <chr>, username <chr>,
# itemCount <int>, region <chr>, isAuthoritative <lgl>, isInvasive <lgl>,
# isThreatened <lgl>, isBIE <lgl>, isSDS <lgl>, wkt <chr>, category <chr>,
# generalisation <chr>, authority <chr>, sdsType <chr>, looseSearch <lgl>
Two lists are returned, and both appear relevant. With the help of some additional columns returned by search_all() (listType, isAuthoritative and isThreatened), we can learn more about which list suits our needs best. Although both lists are authoritative, only one list (dr650) contains threatened species, whereas the other (dr487) contains sensitive species.
search_all(lists, "New South Wales") |>
select(species_list_uid, listType, isAuthoritative, isThreatened)
# A tibble: 2 × 4
species_list_uid listType isAuthoritative isThreatened
<chr> <chr> <lgl> <lgl>
1 dr650 CONSERVATION_LIST TRUE TRUE
2 dr487 SENSITIVE_LIST TRUE FALSE
By specifying the ID dr650 and using show_values(), we can view the complete New South Wales threatened species list.
search_all(lists, "dr650") |>
show_values()
• Showing values for 'dr650'.
# A tibble: 1,064 × 6
id name commonName scientificName lsid dataResourceUid
<int> <chr> <chr> <chr> <chr> <chr>
1 6791272 Delma impar Striped L… Delma impar http… dr650
2 6790725 Callocephalon fimbri… Gang-gang… Callocephalon… http… dr650
3 6790769 Cacophis harriettae White-cro… Cacophis harr… http… dr650
4 6791482 Litoria booroolongen… Booroolon… Litoria booro… http… dr650
5 6790526 Anthochaera phrygia Regent Ho… Anthochaera (… http… dr650
6 6791456 Calidris tenuirostris Great Knot Calidris (Cal… http… dr650
7 6790500 Neochmia ruficauda Star Finch Neochmia (Neo… http… dr650
8 6790752 Uvidicolus sphyrurus Border Th… Uvidicolus sp… http… dr650
9 6791291 Amaurornis moluccana Pale-vent… Amaurornis mo… http… dr650
10 6791135 Phascogale tapoatafa Brush-tai… Phascogale ta… http… dr650
# ℹ 1,054 more rows
As of galah version 2.1.2, we can also use show_values() to return conservation status columns along with the list. By adding the argument all_fields = TRUE, we can add any columns stored in the ALA from the original list. For conservation lists, this includes columns like status, sourceStatus and IUCN_equivalent_status.
nsw_threatened <- search_all(lists, "dr650") |>
  show_values(all_fields = TRUE)
nsw_threatened |>
  # reposition cols
  select(status, sourceStatus, IUCN_equivalent_status,
         scientificName, everything())
# A tibble: 1,064 × 13
status sourceStatus IUCN_equivalent_status scientificName id name
<chr> <chr> <chr> <chr> <int> <chr>
1 Vulnerable Vulnerable Vulnerable Delma impar 6.79e6 Delm…
2 Endangered Endangered Endangered Callocephalon… 6.79e6 Call…
3 Vulnerable Vulnerable Vulnerable Cacophis harr… 6.79e6 Caco…
4 Endangered Endangered Endangered Litoria booro… 6.79e6 Lito…
5 Critically E… Critically … Critically Endangered Anthochaera (… 6.79e6 Anth…
6 Vulnerable Vulnerable Vulnerable Calidris (Cal… 6.79e6 Cali…
7 Extinct Extinct Extinct Neochmia (Neo… 6.79e6 Neoc…
8 Vulnerable Vulnerable Vulnerable Uvidicolus sp… 6.79e6 Uvid…
9 Vulnerable Vulnerable Vulnerable Amaurornis mo… 6.79e6 Amau…
10 Vulnerable Vulnerable Vulnerable Phascogale ta… 6.79e6 Phas…
# ℹ 1,054 more rows
# ℹ 7 more variables: commonName <chr>, lsid <chr>, dataResourceUid <chr>,
# raw_scientificName <chr>, vernacularName <chr>, rank <chr>, family <chr>
Adding status info can be handy if we want to join this with other information like record counts.
# get record counts for each species on the NSW Conservation Status list
threatened_counts <- galah_call() |>
  galah_filter(species_list_uid == "dr650",
               cl11170 == "Shoalhaven",
               year == 2024) |>
  galah_group_by(scientificName) |>
  atlas_counts()
threatened_counts
# A tibble: 94 × 2
scientificName count
<chr> <int>
1 Potorous tridactylus trisulcatus 806
2 Haematopus longirostris 542
3 Haliaeetus (Pontoaetus) leucogaster 343
4 Haematopus fuliginosus 221
5 Sternula albifrons 208
6 Numenius (Numenius) madagascariensis 197
7 Calyptorhynchus (Calyptorhynchus) lathami lathami 124
8 Dasyurus maculatus 111
9 Callocephalon fimbriatum 93
10 Esacus magnirostris 63
# ℹ 84 more rows
# join counts to status information
threatened_counts_joined <-
  threatened_counts |>
  left_join(nsw_threatened,
            join_by(scientificName == scientificName)) |>
  # reposition cols
  select(scientificName, count, status, commonName, everything())
threatened_counts_joined
# A tibble: 94 × 14
scientificName count status commonName id name lsid dataResourceUid
<chr> <int> <chr> <chr> <int> <chr> <chr> <chr>
1 Potorous tridacty… 806 Vulne… Long-nose… 6.79e6 Poto… http… dr650
2 Haematopus longir… 542 Endan… Australia… 6.79e6 Haem… http… dr650
3 Haliaeetus (Ponto… 343 Vulne… White-bel… 6.79e6 Hali… http… dr650
4 Haematopus fuligi… 221 Vulne… Sooty Oys… 6.79e6 Haem… http… dr650
5 Sternula albifrons 208 Endan… Little Te… 6.79e6 Ster… http… dr650
6 Numenius (Numeniu… 197 Criti… Eastern C… 6.79e6 Nume… http… dr650
7 Calyptorhynchus (… 124 Vulne… South-eas… 6.79e6 Caly… http… dr650
8 Dasyurus maculatus 111 Vulne… Bindjulang 6.79e6 Dasy… http… dr650
9 Callocephalon fim… 93 Endan… Gang-gang… 6.79e6 Call… http… dr650
10 Esacus magnirostr… 63 Criti… Beach Sto… 6.79e6 Esac… http… dr650
# ℹ 84 more rows
# ℹ 6 more variables: raw_scientificName <chr>, vernacularName <chr>,
# rank <chr>, family <chr>, sourceStatus <chr>, IUCN_equivalent_status <chr>
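A left join keeps every row of the left table and fills the status columns with NA where no name matches, so counting NA statuses is a quick way to spot names that didn't line up. The sketch below shows this with base R's merge(all.x = TRUE), which behaves like left_join() for this purpose. The two real names and counts are taken from the output above; “Hypothetical taxon” is made up to force a mismatch.

```r
counts <- data.frame(
  scientificName = c("Potorous tridactylus trisulcatus", "Haematopus longirostris",
                     "Hypothetical taxon"),
  count = c(806L, 542L, 3L)
)
status <- data.frame(
  scientificName = c("Potorous tridactylus trisulcatus", "Haematopus longirostris"),
  status = c("Vulnerable", "Endangered")
)

# Keep every row of `counts`; unmatched names get NA in `status`
joined <- merge(counts, status, by = "scientificName", all.x = TRUE)
joined

# Number of counted names with no status match
sum(is.na(joined$status))
```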
To return which species on the New South Wales Conservation Status List (dr650) were recorded in Shoalhaven in 2024, we can add species_list_uid == "dr650" as a filter to a query ending with atlas_species(). To make sure we return taxonomic information at the lowest level to which each occurrence was identified, we’ll group_by(taxonConceptID).
threatened <- galah_call() |>
  galah_filter(cl11170 == "Shoalhaven",
               year == 2024,
               species_list_uid == "dr650") |>
  group_by(taxonConceptID) |>
  atlas_species()
threatened
# A tibble: 94 × 11
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
<chr> <chr> <chr> <chr> <chr>
1 https://biodiversity.… Potorous tr… (McCoy, 1865) subspecies Animal…
2 https://biodiversity.… Haematopus … Vieillot, 1817 species Animal…
3 https://biodiversity.… Haliaeetus … (Gmelin, 1788) species Animal…
4 https://biodiversity.… Haematopus … Gould, 1845 species Animal…
5 https://biodiversity.… Sternula al… (Pallas, 1764) species Animal…
6 https://biodiversity.… Numenius (N… (Linnaeus, 1766) species Animal…
7 https://biodiversity.… Calyptorhyn… (Temminck, 1807) subspecies Animal…
8 https://biodiversity.… Dasyurus ma… (Kerr, 1792) species Animal…
9 https://biodiversity.… Callocephal… (Grant, 1803) species Animal…
10 https://biodiversity.… Esacus magn… Vieillot, 1818 species Animal…
# ℹ 84 more rows
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>
Note that status information is not included in the query above, but it can be joined in the same way we added status information to threatened_counts².
# select status columns, join status information
threatened_status <-
  nsw_threatened |>
  select(scientificName, status, sourceStatus, IUCN_equivalent_status) |>
  right_join(threatened,
             join_by(scientificName == species_name)) |>
  # reposition cols
  select(scientificName, status, sourceStatus, everything())
threatened_status
# A tibble: 94 × 14
scientificName status sourceStatus IUCN_equivalent_status taxon_concept_id
<chr> <chr> <chr> <chr> <chr>
1 Callocephalon fi… Endan… Endangered Endangered https://biodive…
2 Ninox (Rhabdogla… Vulne… Vulnerable Vulnerable https://biodive…
3 Limosa limosa Vulne… Vulnerable Vulnerable https://biodive…
4 Numenius (Numeni… Criti… Critically … Critically Endangered https://biodive…
5 Hirundapus cauda… Vulne… Vulnerable Vulnerable https://biodive…
6 Haliaeetus (Pont… Vulne… Vulnerable Vulnerable https://biodive…
7 Tyto tenebricosa Vulne… Vulnerable Vulnerable https://biodive…
8 Chalinolobus dwy… Endan… Endangered Endangered https://biodive…
9 Ixobrychus flavi… Vulne… Vulnerable Vulnerable https://biodive…
10 Hoplocephalus bu… Endan… Endangered Endangered https://biodive…
# ℹ 84 more rows
# ℹ 9 more variables: scientific_name_authorship <chr>, taxon_rank <chr>,
# kingdom <chr>, phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>
Download shapefile
To retrieve the spatial outline of Shoalhaven, let’s download the latest Local Government Areas data from the Australian Bureau of Statistics Digital Boundary files page. Find “Local Government Areas - 2023 - Shapefile” and click “Download ZIP”. Save the zip folder in your current directory and unzip it.
Let’s read the file into R. We will also simplify the shapefile³ using ms_simplify() from the rmapshaper package, because complex shapefiles can sometimes cause problems when sending queries to the ALA.
lga <- sf::st_read(here("LGA_2023_AUST_GDA2020.shp")) |>
  rmapshaper::ms_simplify(keep = 0.01)
lga
Simple feature collection with 544 features and 8 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 105.5335 ymin: -43.6331 xmax: 167.9969 ymax: -9.229273
Geodetic CRS: GDA2020
First 10 features:
LGA_CODE23 LGA_NAME23 AUS_CODE21 STE_CODE21 STE_NAME21 AREASQKM
1 10050 Albury AUS 1 New South Wales 305.6386
2 10180 Armidale AUS 1 New South Wales 7809.4406
3 10250 Ballina AUS 1 New South Wales 484.9692
4 10300 Balranald AUS 1 New South Wales 21690.7493
5 10470 Bathurst AUS 1 New South Wales 3817.8645
6 10500 Bayside (NSW) AUS 1 New South Wales 50.6204
7 10550 Bega Valley AUS 1 New South Wales 6278.5013
8 10600 Bellingen AUS 1 New South Wales 1600.4338
9 10650 Berrigan AUS 1 New South Wales 2065.8878
10 10750 Blacktown AUS 1 New South Wales 238.8471
AUS_NAME21 LOCI_URI21
1 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10050
2 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10180
3 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10250
4 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10300
5 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10470
6 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10500
7 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10550
8 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10600
9 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10650
10 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/10750
geometry
1 POLYGON ((146.8177 -36.0673...
2 POLYGON ((152.2957 -30.9310...
3 POLYGON ((153.4496 -28.7550...
4 POLYGON ((143.5525 -33.1404...
5 POLYGON ((149.3947 -33.9975...
6 POLYGON ((151.155 -33.92618...
7 POLYGON ((149.9762 -37.5051...
8 POLYGON ((152.8035 -30.1895...
9 POLYGON ((145.4845 -35.5119...
10 POLYGON ((150.8129 -33.8223...
Now let’s transform our shapefile to use the Coordinate Reference System (CRS) EPSG:4326 (the standard used in cartography and GPS, also known as WGS84) so that it matches the coordinate reference system of our data from the ALA⁴.
lga <- lga |>
  st_transform(crs = 4326)
Next we’ll filter our shapefile to Shoalhaven. The column LGA_NAME23 contains area names, and we can filter our data frame to only rows where LGA_NAME23 is equal to Shoalhaven. We are left with a single polygon shape of Shoalhaven.
shoalhaven_sf <- lga |>
  filter(LGA_NAME23 == "Shoalhaven")
shoalhaven_sf
Simple feature collection with 1 feature and 8 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 149.9774 ymin: -35.64458 xmax: 150.8494 ymax: -34.65044
Geodetic CRS: WGS 84
LGA_CODE23 LGA_NAME23 AUS_CODE21 STE_CODE21 STE_NAME21 AREASQKM
1 16950 Shoalhaven AUS 1 New South Wales 4567.201
AUS_NAME21 LOCI_URI21
1 Australia https://linked.data.gov.au/dataset/asgsed3/LGA2023/16950
geometry
1 POLYGON ((150.7813 -34.7921...
Download data
Now that shoalhaven_sf contains our LGA shape, we can build our query. Once again, we’ll begin with galah_call() and filter to only records from 2024. We can specify that we want records within shoalhaven_sf using geolocate(). To make sure we return taxonomic information at the level to which occurrences were identified, we’ll group_by(taxonConceptID), which is a unique ID attached to each occurrence record’s taxonomic identification (read the box below for more on what this means). Finally, we can return a species list by ending our query with atlas_species().
By default atlas_species() only returns taxonomic information at the species level. This means that if some taxa are identified to subspecies on a specific list like the NSW Conservation Status list, atlas_species() will return the species-level match rather than the subspecies-level match. For example, the name "Potorous tridactylus" is returned instead of "Potorous tridactylus trisulcatus".
Grouping by taxonConceptID like we do below specifies that we wish to match to the identified taxon, rather than only to the species level.
species_shoal <- galah_call() |>
  filter(year == 2024) |>
  geolocate(shoalhaven_sf) |>
  group_by(taxonConceptID) |>
  atlas_species()
species_shoal
# A tibble: 4,459 × 11
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
<chr> <chr> <chr> <chr> <chr>
1 https://biodiversity.… Malurus (Ma… (Ellis, 1782) species Animal…
2 https://biodiversity.… Gymnorhina … (Latham, 1801) species Animal…
3 https://biodiversity.… Macropus gi… Shaw, 1790 species Animal…
4 https://biodiversity.… Corvus coro… Vigors & Horsfield, 1… species Animal…
5 https://biodiversity.… Vanellus (L… (Boddaert, 1783) species Animal…
6 https://biodiversity.… Trichogloss… Stephens, 1826 genus Animal…
7 https://biodiversity.… Potorous tr… (McCoy, 1865) subspecies Animal…
8 https://biodiversity.… Anthochaera… (Latham, 1801) species Animal…
9 https://biodiversity.… Dacelo (Dac… (Hermann, 1783) species Animal…
10 https://biodiversity.… Chroicoceph… (Stephens, 1826) species Animal…
# ℹ 4,449 more rows
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>
It’s also possible to return the number of observations by ending our query with atlas_counts(). In this case, we can group by scientificName (the name of the lowest taxonomic level to which each observation was identified).
galah_call() |>
filter(year == 2024) |>
geolocate(shoalhaven_sf) |>
group_by(scientificName) |>
atlas_counts()
# A tibble: 4,459 × 2
scientificName count
<chr> <int>
1 Malurus (Malurus) cyaneus 931
2 Gymnorhina tibicen 923
3 Macropus giganteus 889
4 Corvus coronoides 856
5 Vanellus (Lobipluvia) miles 847
6 Trichoglossus 845
7 Potorous tridactylus trisulcatus 818
8 Anthochaera (Anellobia) chrysoptera 815
9 Dacelo (Dacelo) novaeguineae 801
10 Chroicocephalus novaehollandiae 795
# ℹ 4,449 more rows
Use external list
We can use our own conservation status lists from an external source to compare to our Shoalhaven species list. As an example, we are using the New South Wales Conservation Status List downloaded from the NSW BioNet Atlas website⁵.
We downloaded this list on 2025/07/08. To download a complete NSW threatened species list, we selected the following options:
- Which species or group? All entities
- Legal status? Select records that fall under one or more categories ➝ Threatened NSW
- What area? Entire area
- Period of records? All records
- Status? Valid records only
Save the downloaded .xls file in your working directory. We’ll read in our .xls file, which we have renamed to nsw_threatened.xls.
nsw_threatened_list <- readxl::read_excel(here("path", "to", "nsw_threatened.xls"),
                                          skip = 3) # skip first 3 rows
It’s possible you might receive the following error.
Error:
filepath: [path-to-file]
libxls error: Unable to open file
This relates to a formatting issue that prevents read_excel() from reading the file correctly, which seems related to the way BioNet saves its files. To fix this issue, open the list file on your computer, then re-save it as an .xlsx document (File ➝ Save As ➝ select file format .xlsx ➝ Save). Then you can use read_excel() to read the new file in.
nsw_threatened_list <- readxl::read_excel(here("path", "to", "nsw_threatened.xlsx"),
                                          skip = 3) # skip first 3 rows
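The skip = 3 argument drops the rows above the header before the column names are read. The same idea in base R, using read.csv() on a hypothetical export whose first three lines are preamble (we are assuming this layout, not reproducing BioNet's actual file), looks like this:

```r
# Hypothetical export: three preamble lines before the real header row
raw_lines <- c(
  "Hypothetical report title",
  "Hypothetical filter description",
  "Hypothetical download date",
  "Kingdom,Scientific Name,NSW status",
  "Animalia,Delma impar,V"
)

# skip = 3 ignores the preamble; the 4th line becomes the header
tbl <- read.csv(text = raw_lines, skip = 3, check.names = FALSE)
names(tbl)
```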
Cross-reference with threatened species lists
First we’ll clean the column names to make them easier to use in R using the amazing function janitor::clean_names(). We also need to remove the ^ that precedes some names on the list.
nsw_threatened_list <- nsw_threatened_list |>
  janitor::clean_names() |>
  mutate(
    scientific_name = stringr::str_remove_all(scientific_name, "\\^")
  )
nsw_threatened_list
# A tibble: 1,218 × 11
kingdom class family species_code scientific_name exotic common_name
<chr> <chr> <chr> <chr> <chr> <lgl> <chr>
1 Animalia Amphibia Myobatrach… 3007 Assa darlingto… NA Pouched Fr…
2 Animalia Amphibia Myobatrach… 3135 Crinia sloanei NA Sloane's F…
3 Animalia Amphibia Myobatrach… 3137 Crinia tinnula NA Wallum Fro…
4 Animalia Amphibia Myobatrach… 3073 Mixophyes balb… NA Stuttering…
5 Animalia Amphibia Myobatrach… 3008 Mixophyes flea… NA Fleay's Ba…
6 Animalia Amphibia Myobatrach… 3075 Mixophyes iter… NA Giant Barr…
7 Animalia Amphibia Myobatrach… 3116 Pseudophryne a… NA Red-crowne…
8 Animalia Amphibia Myobatrach… 3119 Pseudophryne c… NA Southern C…
9 Animalia Amphibia Myobatrach… 3306 Pseudophryne p… NA Northern C…
10 Animalia Amphibia Myobatrach… 3932 Uperoleia maho… NA Mahony's T…
# ℹ 1,208 more rows
# ℹ 4 more variables: nsw_status <chr>, comm_status <chr>, records <chr>,
# info <lgl>
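If you want to see roughly what these two steps do, the sketch below uses a simplified, hypothetical stand-in for janitor::clean_names() (lower-case, with runs of spaces and punctuation collapsed to underscores) and base R's gsub() with fixed = TRUE to remove a literal caret. The raw column names and caret placement are assumptions for illustration; janitor::clean_names() handles many more cases than this.

```r
# Simplified, hypothetical stand-in for janitor::clean_names()
simple_clean_names <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of spaces/punctuation become one underscore
  gsub("^_+|_+$", "", x)           # drop leading/trailing underscores
}

raw_names <- c("Scientific Name", "NSW status", "Comm. status")  # assumed raw headers
clean <- simple_clean_names(raw_names)
clean

# Remove a literal caret; fixed = TRUE stops "^" being read as a regex anchor
sci <- c("^Assa darlingtoni", "Crinia sloanei")  # hypothetical caret placement
sci_clean <- gsub("^", "", sci, fixed = TRUE)
sci_clean
```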
Now we can filter our Shoalhaven list to only those species that match names in nsw_threatened_list.
threatened_filter <- species_shoal |>
  filter(species_name %in% nsw_threatened_list$scientific_name)
threatened_filter
# A tibble: 83 × 11
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
<chr> <chr> <chr> <chr> <chr>
1 https://biodiversity.… Potorous tr… (McCoy, 1865) subspecies Animal…
2 https://biodiversity.… Haematopus … Vieillot, 1817 species Animal…
3 https://biodiversity.… Perameles n… Geoffroy, 1804 species Animal…
4 https://biodiversity.… Haematopus … Gould, 1845 species Animal…
5 https://biodiversity.… Sternula al… (Pallas, 1764) species Animal…
6 https://biodiversity.… Dasyurus ma… (Kerr, 1792) species Animal…
7 https://biodiversity.… Callocephal… (Grant, 1803) species Animal…
8 https://biodiversity.… Esacus magn… Vieillot, 1818 species Animal…
9 https://biodiversity.… Tyto novaeh… (Stephens, 1826) species Animal…
10 https://biodiversity.… Hirundapus … (Latham, 1801) species Animal…
# ℹ 73 more rows
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 6 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>
To preserve status information, we can instead join species_shoal and nsw_threatened_list, then keep only rows with a matching NSW status. This retains the status columns while still filtering results.
threatened_joined <- species_shoal |>
  left_join(
    nsw_threatened_list |>
      select(scientific_name, common_name, nsw_status, comm_status),
    join_by(species_name == scientific_name)
  ) |>
  filter(!is.na(nsw_status))
threatened_joined
# A tibble: 87 × 14
taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom
<chr> <chr> <chr> <chr> <chr>
1 https://biodiversity.… Potorous tr… (McCoy, 1865) subspecies Animal…
2 https://biodiversity.… Haematopus … Vieillot, 1817 species Animal…
3 https://biodiversity.… Perameles n… Geoffroy, 1804 species Animal…
4 https://biodiversity.… Perameles n… Geoffroy, 1804 species Animal…
5 https://biodiversity.… Haematopus … Gould, 1845 species Animal…
6 https://biodiversity.… Sternula al… (Pallas, 1764) species Animal…
7 https://biodiversity.… Dasyurus ma… (Kerr, 1792) species Animal…
8 https://biodiversity.… Callocephal… (Grant, 1803) species Animal…
9 https://biodiversity.… Esacus magn… Vieillot, 1818 species Animal…
10 https://biodiversity.… Tyto novaeh… (Stephens, 1826) species Animal…
# ℹ 77 more rows
# ℹ abbreviated name: ¹scientific_name_authorship
# ℹ 9 more variables: phylum <chr>, class <chr>, order <chr>, family <chr>,
# genus <chr>, vernacular_name <chr>, common_name <chr>, nsw_status <chr>,
# comm_status <chr>
Species lists from BioNet Atlas can sometimes include both a species and specific populations of that species, each with its own conservation status. When matching species names, this means there can be multiple matches for the same species. For example, there are separate conservation statuses assigned to the Yellow-bellied Glider and to the Yellow-bellied Glider population on the Bago Plateau.
threatened_joined |>
  filter(species_name == "Petaurus australis") |>
  select(species_name, common_name, nsw_status)
# A tibble: 2 × 3
species_name common_name nsw_status
<chr> <chr> <chr>
1 Petaurus australis Yellow-bellied Glider V,P
2 Petaurus australis Yellow-bellied Glider population on the Bago Pl… E2,V,P
These multiple statuses explain why there are several more rows when we join dataframes (threatened_joined) than when we filter by species names (threatened_filter).
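The difference between filtering and joining can be reproduced with a couple of toy data frames. In the sketch below (base R, with names and statuses copied from the outputs above), %in% keeps one row per Shoalhaven species, while merge(), like a dplyr join, returns one row per matching list entry, so Petaurus australis appears twice. The last lines show one way to flag species that matched more than one entry.

```r
shoal <- data.frame(species_name = c("Petaurus australis", "Dasyurus maculatus"))
statuses <- data.frame(
  scientific_name = c("Petaurus australis", "Petaurus australis", "Dasyurus maculatus"),
  nsw_status = c("V,P", "E2,V,P", "V,P")
)

# Filtering: one row per Shoalhaven species
filtered <- shoal[shoal$species_name %in% statuses$scientific_name, , drop = FALSE]
nrow(filtered)  # 2

# Joining: one row per matching list entry
joined <- merge(shoal, statuses, by.x = "species_name", by.y = "scientific_name")
nrow(joined)    # 3

# Species that matched more than one list entry
dupes <- names(which(table(joined$species_name) > 1))
dupes
```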
You might notice that fewer species are returned when using an externally downloaded list than when using galah. This discrepancy is due to differences between scientific names on the BioNet Atlas and those on the ALA. Name mismatches are a risk when using external species lists, and additional work is usually needed to avoid unexpected mismatches. The Cleaning Biodiversity Data in R book details some methods for finding name synonyms, but amending taxonomic names can be difficult.
When the ALA ingests data, it matches those data to the ALA’s taxonomic backbone, with the goal of minimising name mismatches. We recommend using galah because it makes name matching easier. However, not all lists exist on the ALA, so some tasks inevitably require matching to externally downloaded lists.
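One source of mismatch visible in the outputs above is that ALA names sometimes include a subgenus in parentheses, such as Haliaeetus (Pontoaetus) leucogaster. If an external list writes the same species without the subgenus (an assumption you should check for your own list), an exact %in% comparison will miss it. The hypothetical helper strip_subgenus() below is one simple heuristic for that case only; it does not resolve synonyms or other taxonomic changes.

```r
# Hypothetical helper: drop a parenthesised subgenus, e.g. " (Pontoaetus)"
strip_subgenus <- function(x) {
  trimws(gsub("\\s*\\([^)]*\\)", "", x))
}

ala_names <- c("Haliaeetus (Pontoaetus) leucogaster", "Dasyurus maculatus")
external_names <- c("Haliaeetus leucogaster", "Dasyurus maculatus")  # assumed to lack subgenera

# Exact matching misses the name that carries a subgenus
before <- ala_names %in% external_names
before

# Matching after stripping the subgenus
after <- strip_subgenus(ala_names) %in% external_names
after
```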
To use this list for summarising or plotting, it can be useful to add a simplified status (vulnerable, endangered, critically endangered or extinct) for each species in threatened_joined. To add this info, we’ll extract the first value of nsw_status by removing everything after the first comma, and save that value in nsw_status_extracted. Then we’ll recode these values⁶ and save them in nsw_status_simple.
threatened_clean <- threatened_joined |>
  mutate(
    # keep the first status code (everything before the first comma)
    nsw_status_extracted = stringr::str_remove_all(nsw_status, "\\,.*"),
    nsw_status_simple = case_match(
      nsw_status_extracted,
      "V" ~ "Vulnerable",
      c("E1", "E2", "E3") ~ "Endangered",
      c("E4A") ~ "Critically Endangered",
      c("E4") ~ "Extinct",
      .default = nsw_status_extracted
    )
  )
threatened_clean |>
  # re-position cols
  select(nsw_status, nsw_status_extracted, nsw_status_simple, species_name, everything())
# A tibble: 87 × 16
nsw_status nsw_status_extracted nsw_status_simple species_name
<chr> <chr> <chr> <chr>
1 V,P V Vulnerable Potorous tridactylus t…
2 E1,P E1 Endangered Haematopus longirostris
3 E2,P E2 Endangered Perameles nasuta
4 E2,P E2 Endangered Perameles nasuta
5 V,P V Vulnerable Haematopus fuliginosus
6 E1,P E1 Endangered Sternula albifrons
7 V,P V Vulnerable Dasyurus maculatus
8 E1,P,3 E1 Endangered Callocephalon fimbriat…
9 E4A,P E4A Critically Endangered Esacus magnirostris
10 V,P,3 V Vulnerable Tyto novaehollandiae
# ℹ 77 more rows
# ℹ 12 more variables: taxon_concept_id <chr>,
# scientific_name_authorship <chr>, taxon_rank <chr>, kingdom <chr>,
# phylum <chr>, class <chr>, order <chr>, family <chr>, genus <chr>,
# vernacular_name <chr>, common_name <chr>, comm_status <chr>
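The same recoding can be written in base R with a named lookup vector, which some readers may find easier to extend. This sketch mirrors the codes recoded above; “XYZ” is a hypothetical unrecognised code included to show the fallback, which behaves like .default in case_match().

```r
nsw_status <- c("V,P", "E1,P", "E2,V,P", "E4A,P", "E4", "XYZ")  # "XYZ" is hypothetical

# Keep only the first code (everything before the first comma)
extracted <- sub(",.*", "", nsw_status)

# Named lookup: names are codes, values are simplified statuses
lookup <- c(V = "Vulnerable", E1 = "Endangered", E2 = "Endangered", E3 = "Endangered",
            E4A = "Critically Endangered", E4 = "Extinct")
simple <- unname(lookup[extracted])

# Fall back to the extracted code when there is no match
simple[is.na(simple)] <- extracted[is.na(simple)]
simple
```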
Whichever method you’ve followed, you will end up with very similar datasets containing threatened species and their statuses, though the number of matched species might differ⁷.
threatened_status |>
  rmarkdown::paged_table()
threatened_clean |>
  select(species_name, nsw_status_simple, everything()) |>
  rmarkdown::paged_table()
To finish, we can save our dataframe as a csv file.
# save (row.names = FALSE avoids writing an extra row-number column)
write.csv(threatened_status,
          here("path", "to", "file-name.csv"),
          row.names = FALSE)
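By default, write.csv() also writes the row names as an unnamed first column, which read.csv() later reads back as a column called X. Setting row.names = FALSE avoids this. The sketch below writes a small hypothetical data frame to temporary files both ways so you can compare.

```r
df <- data.frame(scientificName = c("Delma impar", "Callocephalon fimbriatum"),
                 status = c("Vulnerable", "Endangered"))

path_default <- tempfile(fileext = ".csv")
path_no_rownames <- tempfile(fileext = ".csv")

write.csv(df, path_default)                        # includes row names
write.csv(df, path_no_rownames, row.names = FALSE) # data columns only

cols_default <- names(read.csv(path_default))
cols_no_rownames <- names(read.csv(path_no_rownames))
cols_default
cols_no_rownames
```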
Visualise species conservation status
Along with a species list, we can also summarise threatened_status visually. Few options are as simple and easy to understand as a bar plot. Here we’ve made a simple bar plot displaying the number of species by conservation status, styled with a custom font and some nicer colours.
# custom font
font_add_google("Roboto")
showtext_auto()
# count number of species by status
status_count <- threatened_status |>
  group_by(status) |>
  count()
# bar plot
bar_status <-
  status_count |>
  arrange(-n) |>
  ggplot() +
  geom_bar(
    mapping = aes(x = status,
                  y = n,
                  fill = status),
    stat = "identity",
    colour = "transparent"
  ) +
  labs(title = "Threatened species status in Shoalhaven, NSW (2024)",
       x = "Conservation status",
       y = "Number of species") +
  scale_fill_manual(values = c('#ab423f', '#cd826d', '#ebc09e'),
                    labels = c("Vulnerable", "Endangered", "Critically Endangered")) +
  pilot::theme_pilot(legend_position = "none",
                     grid = "",
                     axes = "l") +
  theme(text = element_text(family = "Roboto"),
        plot.title = element_text(size = 29),
        axis.title = element_text(size = 18),
        axis.text = element_text(size = 16))
bar_status
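In the bar plot code, `arrange(-n)` reorders the rows of `status_count`, but ggplot2 draws a discrete x axis in alphabetical (or factor-level) order, not in row order. If bars sorted by count are wanted, one option is to reorder the factor levels with `reorder()`, the same idiom the waffle chart code uses for its legend. The counts below are hypothetical and only illustrate the ordering.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts for illustration only (not the Shoalhaven results)
toy_count <- tibble(
  status = c("Critically Endangered", "Endangered", "Vulnerable"),
  n = c(2, 10, 25)
)

# reorder() turns status into a factor whose levels follow -n,
# i.e. from the most common status to the least common
toy_count <- toy_count |>
  mutate(status_ordered = reorder(status, -n))

levels(toy_count$status_ordered)

# Bars are now drawn in level order rather than alphabetically
bar_sorted <- ggplot(toy_count) +
  geom_col(aes(x = status_ordered, y = n, fill = status_ordered)) +
  labs(x = "Conservation status", y = "Number of species")
# print(bar_sorted) to draw the chart
```

Note that unnamed `values` in `scale_fill_manual()` are assigned in level order, so reordering the bars also changes which colour each status receives.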
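Another useful, and more eye-catching, way to see a taxonomic breakdown of species is a waffle chart. Waffle charts are great because they show counts and proportions at the same time: each square represents one species, so the share of squares in each colour is that group’s proportion of the total. For more advanced R users, waffle charts can be a useful summary tool.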
library(waffle)
library(glue)
library(marquee)
# Count number of species by taxonomic group
taxa_table <- threatened_status |>
  mutate(
    taxa_group = case_when(
      class == "Aves" ~ "Birds",
      class == "Reptilia" ~ "Reptiles",
      class == "Mammalia" ~ "Mammals",
      kingdom == "Plantae" ~ "Plants",
      .default = "Other"
    )
  ) |>
  group_by(taxa_group) |>
  summarise(n = n()) |>
  mutate(proportion = n/sum(n)*100)
# waffle chart
waffle_taxa <-
  ggplot() +
  waffle::geom_waffle(
    data = taxa_table |> arrange(-n), # reorder highest to lowest
    mapping = aes(fill = reorder(taxa_group, -n), # reorder legend
                  values = n),
    colour = "white",
    n_rows = 8,
    size = 1
  ) +
  # legend labels come from the reordered taxa_group levels,
  # so each label stays matched to its colour whatever the ranking
  scale_fill_manual(name = "",
                    values = c('#567c7c', '#687354', '#C3CB80', '#c4ac79', '#38493a')
  ) +
  labs(title = marquee_glue("Taxonomic breakdown of threatened species in Shoalhaven, NSW (2024)"),
       caption = marquee_glue("1 {cli::symbol$square_small_filled} = 1 species")) +
  coord_equal() +
  theme_void() +
  theme(legend.position = "bottom",
        text = element_text(family = "Roboto"),
        legend.title = element_text(hjust = 0.5, size = 20),
        legend.text = element_text(size = 18),
        plot.title = element_marquee(hjust = 0.5, size = 14, margin = margin(b=5), family = "Roboto"),
        plot.caption = element_marquee(size = 12, hjust = 1),
        plot.margin = margin(0.5, 1, 0.5, 1, unit = "cm"))
waffle_taxa
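To see how the counts and proportions behind the waffle chart come together, here is the same `case_when()` grouping applied to a small hypothetical set of 20 species (the class and kingdom values are invented for illustration). `count()` is used as a shortcut for `group_by()` plus `summarise(n = n())`. The last line is a rough check of the grid width, assuming `geom_waffle()` lays the squares out in the 8 rows requested by `n_rows = 8`.

```r
library(dplyr)

# Hypothetical taxonomy for 20 species, for illustration only
toy_taxa <- tibble(
  kingdom = c(rep("Animalia", 14), rep("Plantae", 6)),
  class   = c(rep("Aves", 8), rep("Mammalia", 4), "Reptilia", "Amphibia",
              rep("Magnoliopsida", 6))
)

toy_table <- toy_taxa |>
  mutate(
    taxa_group = case_when(
      class == "Aves" ~ "Birds",
      class == "Reptilia" ~ "Reptiles",
      class == "Mammalia" ~ "Mammals",
      kingdom == "Plantae" ~ "Plants",
      .default = "Other"
    )
  ) |>
  count(taxa_group) |>                    # one row per group, with n
  mutate(proportion = n / sum(n) * 100)   # percentage of all species

toy_table
sum(toy_table$proportion)        # proportions add up to 100
ceiling(sum(toy_table$n) / 8)    # approximate number of columns in an 8-row grid
```

With one square per species, 20 species in an 8-row grid need 3 columns, and each group's share of squares matches its `proportion` value.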
Final thoughts
We hope this post has helped you understand how to download a species list for a specific area and compare it to conservation lists. It’s also possible to compare species with other information like lists of migratory species or seasonal species.
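As a minimal sketch of such a comparison, assume a second list of species names; here it is a tiny hypothetical "migratory" list made up for illustration, not an official one. `semi_join()` keeps the threatened species that also appear on the other list, while `%in%` flags membership without dropping any rows.

```r
library(dplyr)

# Hypothetical threatened species table (statuses as shown earlier in the post)
threatened_toy <- tibble(
  species_name = c("Haematopus longirostris", "Sternula albifrons",
                   "Perameles nasuta", "Esacus magnirostris"),
  nsw_status_simple = c("Endangered", "Endangered", "Endangered",
                        "Critically Endangered")
)

# Hypothetical second list; membership chosen for illustration only
migratory_toy <- tibble(species_name = c("Sternula albifrons", "Numenius madagascariensis"))

# Keep only threatened species that are also on the second list
both_lists <- threatened_toy |>
  semi_join(migratory_toy, by = "species_name")

# Or keep every threatened species and add a TRUE/FALSE flag
flagged <- threatened_toy |>
  mutate(migratory = species_name %in% migratory_toy$species_name)

both_lists
flagged
```

As with the conservation lists, matching on names only works when both lists use the same taxonomy, so name mismatches are worth checking first.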
For other posts, check out our beginner’s guide to mapping species observations or see an investigation of dingo observations in the ALA.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.5.0 (2025-04-11 ucrt)
os Windows 11 x64 (build 22631)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2025-07-23
pandoc 3.4 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2)
galah * 2.1.2 2025-06-12 [1] CRAN (R 4.5.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.3)
glue * 1.8.0 2024-09-30 [1] CRAN (R 4.4.2)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.3.2)
htmltools * 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
lubridate * 1.9.4 2024-12-08 [1] CRAN (R 4.4.2)
marquee * 1.0.0 2025-01-20 [1] CRAN (R 4.5.0)
ozmaps * 0.4.5 2021-08-03 [1] CRAN (R 4.3.2)
pilot * 4.0.0 2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
purrr * 1.0.4 2025-02-05 [1] CRAN (R 4.4.3)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.3)
readxl * 1.4.3 2023-07-06 [1] CRAN (R 4.3.2)
rmapshaper * 0.5.0 2023-04-11 [1] CRAN (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
sf * 1.0-20 2025-03-24 [1] CRAN (R 4.4.3)
showtext * 0.9-7 2024-03-02 [1] CRAN (R 4.4.1)
showtextdb * 3.0 2020-06-04 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
sysfonts * 0.8.9 2024-03-02 [1] CRAN (R 4.4.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.3)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2)
waffle * 1.0.2 2024-05-03 [1] Github (hrbrmstr/waffle@767875b)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.5.0/library
──────────────────────────────────────────────────────────────────────────────
Footnotes
1. Each spatial layer has a two-letter code, along with a number to identify it. The abbreviations are as follows:
   - cl = contextual layer (i.e. boundaries of LGAs, Indigenous Protected Areas, States/Territories, etc.)
   - 11170 = number associated with the spatial layer in the atlas
2. We used right_join() this time because we wanted to first select columns from nsw_threatened, then join so that we keep all 90+ rows in threatened (using left_join() would keep all 1,000+ rows in nsw_threatened instead).
3. Simplifying a shapefile reduces the total number of points that draw the shape outline.
4. Check out this post for a better explanation of what CRS is and how it affects maps.
5. On a related note, it’s possible to download a list specifically for Shoalhaven on the BioNet Atlas website. However, results from BioNet will only include BioNet records. As a result, fewer species will be identified compared to the ALA, which matches NSW BioNet data as well as data from other sources.
6. We can double-check status information by viewing the species list in Excel and clicking on links in the info column. This is handy for double-checking species status codes or learning more about each species and status.
7. This is due to differences in taxonomic names in the externally downloaded list and in ALA data. More info can be found under the “Names Matching” tab in the Shapefile + list section.
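The row-keeping behaviour described in the right_join() footnote can be checked on toy data. The sketch below uses two small hypothetical tables standing in for nsw_threatened (a longer state-wide list) and threatened (a shorter local list); the "Hypothetical species" rows and status strings are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for the long state-wide list
nsw_toy <- tibble(
  species_name = c("Perameles nasuta", "Dasyurus maculatus",
                   "Hypothetical species A", "Hypothetical species B"),
  nsw_status = c("E2,P", "V,P", "V,P", "E1,P")
)

# Hypothetical stand-in for the shorter local species list
local_toy <- tibble(
  species_name = c("Perameles nasuta", "Dasyurus maculatus", "Tyto novaehollandiae")
)

# right_join() keeps every row of the second (local) table;
# local species missing from the state list get NA for nsw_status
kept_right <- nsw_toy |>
  select(species_name, nsw_status) |>
  right_join(local_toy, by = "species_name")

# left_join() keeps every row of the first (state-wide) table instead
kept_left <- nsw_toy |>
  select(species_name, nsw_status) |>
  left_join(local_toy, by = "species_name")

nrow(kept_right)  # one row per local species
nrow(kept_left)   # one row per state-wide species
```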