Author

Date

2 April 2025

Intern Post

Bush fires are a frequent and natural part of Australia’s ecosystems. Australia’s flora and fauna have adapted alongside fire, with some plants needing fire to germinate and regenerate.

Aboriginal and Torres Strait Islander people have also expertly used fire for tens of thousands of years to care for Country, managing vegetation, reducing wildfire risk, and fostering biodiversity. Since European colonisation, however, the disruption of these practices has created a build-up of fuel loads and introduced invasive plant species, increasing Australia’s risk of larger, uncontrollable fires as temperatures rise.

In 2019-2020, Australia experienced one of the most catastrophic bushfire seasons on record. Fires burnt approximately 19 million hectares, of which 12.6 million were primarily forest and bushland. An estimated 900 species of plants and animals were severely impacted, and 3 billion animals were killed or displaced.

In this post we will explore the impact of the 2019-2020 bushfires on a population of greater gliders (Petauroides volans), a forest-dwelling marsupial species found along the east coast of Australia. We’ll determine greater gliders’ overall habitat range using tidymodels and tidysdm, then use the output of our model to explore whether the impact of fire on greater glider observations since 2019-2020.

To begin, we can load the following packages.

library(galah)
library(tidyverse)
library(tidymodels) 
library(tidysdm) # devtools::install_github("EvolEcolGroup/tidysdm")
library(terra)
library(tidyterra)
library(here)
library(sf)
library(ozmaps)
library(elevatr)
library(geodata)
library(stacks)

We are interested in testing whether observations of gliders changed in areas burnt by the 2019/2020 Black Summer bushfires compared to those that remained unburnt.

That means we’ll need a few components for our investigation:

Data of greater glider observations made before and after the fires
Environmental variables that determine greater gliders’ likely habitat range
The extent of area burnt by the 2019-2020 bushfires

Our hypothesised effect is that there are fewer occurrences of greater gliders after the fire, and a driver of this effect is that an area was previously burnt during the 2019/2020 bushfires.

First, let’s establish our study area and download records of our species of interest.

Download data

Observational data

The Southern Greater Glider (Petauroides volans) is Australia’s largest gliding marsupial, found in tall eucalypt forests along the east coast, from Queensland to Victoria. They are an endagered nocturnal, tree-dwelling herbivore that primarily feed on eucalyptus leaves and den in hollow bear trees¹.

Left: Petauroides volans (Josh Bowell | CC-BY-NC 3.0 (Au)), Right: Petauroides volans (David Sinnott | CC-BY-NC 4.0 (Int))

We’re going to focus on a region of South-East New South Wales and the Australian Captial Territory for our investigation. To start, let’s establish a bounding box around our area in a few different formats (tibble, SpatExtent and sf). We’ll also pull in a map of Australia for later rendering and working with our rasters.

# define geographic region of bounding box
# tibble format is handy for plotting
se_nsw_bbox <- tibble(
  ymin = -37.5,
  ymax = -35,
  xmin = 148.5,
  xmax = 151
)
# Create a terra extent 
# SpatExtent objects are used for modifying raster layers later on
bbox_ext <- terra::ext(
  c(se_nsw_bbox[["xmin"]], 
    se_nsw_bbox[["xmax"]], 
    se_nsw_bbox[["ymin"]], 
    se_nsw_bbox[["ymax"]]
  ))

# Create an sf object of our bounding box
# sf object will help specify elevation data later on
bbox_sf <- st_as_sf(as.polygons(bbox_ext, crs = "EPSG:4326"))

# Get outline of Australia
aus <- ozmaps::ozmap_country |>
  st_transform(crs = "EPSG:4326")

Now let’s use the galah package to download greater glider records from the Atlas of Living Australia over a 10-year time period from 2014 to 2024, which captures observations of greater gliders before and after the 2019-2020 bushfires. Passing our se_nsw_bbox to galah_geolocate() will return only the records in our bounding box. You’ll need to register your email address with the ALA, then pass it to galah using galah_config().

galah_config(email = "your-email-here") # Registered ALA email

# Collect all greater glider records between 2014 and 2024 for the region we defined
gliders <- galah_call() |>
  identify("Petauroides volans") |>
  filter(year >= 2014 & year <= 2024) |>
  galah_apply_profile(ALA) |>
  galah_geolocate(se_nsw_bbox, type = "bbox") |>
  atlas_occurrences()

# Create an sf object for spatial analysis & mapping
gliders_sf <- gliders |>
  st_as_sf(coords = c("decimalLongitude", "decimalLatitude")) |>
  st_set_crs(4326)

Before or after fire

Let’s add a new column fire_period to categorise whether an observation was recorded before or after the fires. Fires lasted over several months so it’s difficult to pinpoint an exact cut-off date. Let’s choose the 1st of December 2019 as a cut-off between pre_fire and post_fire, which marks the beginning of peak bush fire season. Many of the largest bush fires had moved into our study area by this point in time.

# Classify each glider record as pre or post fire
gliders_sf <- gliders_sf |>
  mutate(
    fire_period = if_else(eventDate < as.Date("2019-12-01"), "pre_fire", "post_fire"),
    .before = eventDate # position the column so we can see it
    )

Code

ggplot() +
  geom_sf(data = aus, fill = "grey97", color = "grey40") +
  geom_rect(data = se_nsw_bbox,
            mapping = aes(xmin = xmin, 
                          ymin = ymin, 
                          xmax = xmax, 
                          ymax = ymax),
            colour = "grey50",
            fill = NA) + 
  geom_sf(data = gliders_sf, 
          aes(color = fire_period), 
          size = 2, 
          alpha = 0.7) +
  scale_colour_manual(values = c("#0F3F5C", "#CF5F37")) +
  labs(color = "Fire Period") +
  coord_sf(xlim = c(se_nsw_bbox$xmin, se_nsw_bbox$xmax), 
           ylim = c(se_nsw_bbox$ymin, se_nsw_bbox$ymax)) +
  theme_void()

Greater glider observations before and after the 2019-2020 bushfires

Now that we have greater glider observations, we can start pulling in our raster data for our model.

Spatial data

There are a few environmental factors that are useful for determining the suitable habitat area for greater gliders. With the help of previous studies on gliders², we’ve chosen to download the following four raster layers³ to use as environmental predictors in our species distribution model:

Elevation (from the elevatr package)
Tree cover (from Global Land Analysis & Discovery)
Mean annual temperature (BIO1) (from CHELSA)
Annual precipitation (BIO12) (from CHELSA)

To assess bush fire impact, we will also need to download a fifth raster layer that maps the area burnt over the 2019-2020 bushfires:

Fire extent and severity mapping (2019-2020) (from the NSW government)

Each raster layer will need to be wrangled so that they fit neatly together (same area, same projection, same resolution). We will perform a common series of modifications to each layer that generally fall into the following steps:

Crop the layer to our study area
Mask (or remove) the ocean from our layer so that it does not skew our model
Rename the layer to something sensible
Resample each layer to the same resolution (so that every layer’s grid aligns correctly)

These steps might not always be in the same order, but you will recognise them as we go through the next few sections to download our spatial data.

Elevation

First we’ll download elevation data for our defined region using the {elevatr} package’s handy get_elev_raster() function. By passing our bounding box bbox_sf to the locations argument we can return elevation data (in metres) for our specified area!

# Download elevation raster
elevation_data <- get_elev_raster(locations = bbox_sf, 
                                  z = 9, 
                                  prj = "EPSG:4326")

# Remove raster information outside of the aus land boundary
elevation_aligned <- elevation_data |>
  terra::rast() |>      # convert to SpatRaster class
  terra::mask(aus) |>   # remove information outside of aus boundary
  terra::crop(bbox_ext) # crop layer to bbox

# Rename layer for simplicity
names(elevation_aligned) <- "elevation"

Code

ggplot() +
  geom_spatraster(data = elevation_aligned, aes(fill = elevation)) +
  scale_fill_terrain_c(na.value = NA) +
  guides(fill = guide_colorbar(title = "Elevation (m)")) +
  theme_minimal()

Tree cover

Next we’ll download tree cover data. This tree cover raster layer contains satellite data from Hansen et al (2010) where tree cover is recorded as a percentage (0 = no cover and 100 = complete tree cover).

Spatial data are held on the Global Land Analysis and Discovery website. Tree cover data files are divided in tiles of 10 x 10 latitude/longitude. Because our study area crosses over two tiles (30S_150 and 30S_140), we will need to download and stitch together two rasters. We can then crop them down to our specified area.

To download the files, go to the global 2010 treecover dataset –> click on the url link under the Data Links subheading near the bottom of the page –> click on the relevant data links to download. The two file names we are interested in are: treecover2010_30S_140E.tif and treecover2010_30S_150E.tif. Save these files in your local working directory.

Use code to download

Alternatively, we can use the following code chunk to download files:

download.file("https://glad.umd.edu/Potapov/TCC_2010/treecover2010_30S_140E.tif", destfile="treecover2010_30S_140E.tif")
download.file("https://glad.umd.edu/Potapov/TCC_2010/treecover2010_30S_150E.tif", destfile="treecover2010_30S_150E.tif")

Let’s load our files into R as rasters.

# Load the raster
tree_cover_150 <- terra::rast(here("treecover2010_30S_150E.tif"))
tree_cover_140 <- terra::rast(here("treecover2010_30S_140E.tif"))

Now we can crop them to our study area and merge them together.

# Crop to bbox
tree_cover_150_cropped <- tree_cover_150 |> terra::crop(bbox_ext)
tree_cover_140_cropped <- tree_cover_140 |> terra::crop(bbox_ext)

# Merge
tree_cover <- merge(tree_cover_150_cropped, tree_cover_140_cropped)

# Rename for simplicity
names(tree_cover) <- "treecover"

Best practice

When working with spatial data we tend to use up a lot of memory! That’s a big reason why we are performing these cropping steps—so that we can use up the least amount of memory necessary to get our model to function. Because of this, after we have stitched our two rasters together, we should safely delete the other two as we will no longer require them to clear space.

# We can now safely delete the first two raster files to save memory
rm(tree_cover_140); rm(tree_cover_150)

Code

ggplot() +
  geom_spatraster(data = tree_cover, aes(fill = treecover)) +
  scale_fill_viridis_c(direction = -1, begin = 0.8, end = 0.1) +
  guides(fill = guide_colorbar(title = "Tree Cover (%)")) +
  theme_minimal()

Temperature & precipitation

Next we’ll download two bioclimatic raster layers from CHELSA. CHELSA hosts climate projections at high resolutions intended for ecological use (here’s the paper about it). Climate projections extend from 1981 to 2100. Vegetation that greater gliders inhabit is sensitive to temperature and precipitation changes, so we decided to use climate projections for 2011-2040 for the following two climate variables:

BIO1: Mean Annual Temperature
BIO12: Annual Precipitation

To download these layers on the CHELSA website, click on ‘Downloads’ in the top bar, under version 2.1 click the ‘Download’ button, then navigate to GLOBAL/* –> climatologies/ –> UKESM1-0-LL –> ssp370/ –> bio/. Then select the files with names beginning with CHELSA_bio1_2011-2040 and CHELSA_bio12_2011-2040. Save these files in your local directory.

Use code to download

Alternatively, we can use the following code chunk to download files:

# Download our two raster files
download.file("https://os.zhdk.cloud.switch.ch/chelsav2/GLOBAL/climatologies/2011-2040/UKESM1-0-LL/ssp370/bio/CHELSA_bio1_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif",
              destfile="CHELSA_bio1_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif", mode="wb")  

download.file("https://os.zhdk.cloud.switch.ch/chelsav2/GLOBAL/climatologies/2011-2040/UKESM1-0-LL/ssp370/bio/CHELSA_bio12_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif",
              destfile="CHELSA_bio12_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif", mode="wb")

Let’s load our files into R as rasters.

temp <- rast("CHELSA_bio1_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif")
precip <- rast("CHELSA_bio12_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif")

Now we’ll mask and crop each layer, then rename them.

# mask and crop to study area
temp <- temp |>
  terra::mask(aus) |>   # remove oceans
  terra::crop(bbox_ext) # crop to bbox

precip <- precip |>
  terra::mask(aus) |>   # remove oceans
  terra::crop(bbox_ext) # crop to bbox

# Rename for simplicity
names(temp) <- "temp_bio1"
names(precip) <- "precip_bio12"

Burnt area

Finally, we will download a fire extent raster layer, documenting the area burnt by the 2019-2020 bushfires in New South Wales. The NSW Government’s Fire Extent and Severity Mapping (FESM) 2019/20 dataset contains information on the extent and severity of burnt areas across the state.

To download, navigate to the NSW Government website, click on ‘Dataset Packages’, then select the download icon next to ‘FESM v3-data in IMG and TIFF format’. Save this zip folder in your working directory and uncompress the folder.

Warning: Big zip folder

While the download itself is only several hundred megabytes, when uncompressed the .tif file is very large (10.3 GB). Please keep this in mind when choosing a place to store the folder!

Let’s load the fire extent layer into R as a raster.

# Load raster
fire_extent <- rast("cvmsre_NSW_20192020_ag1l0.tif")

This file is pretty huge and it’s in the wrong projection, which we can see under coord. ref when we view the object. Our desired CRS is WGS84/EPSG:4326 but this raster is projected using CRS GDA_94_Lambert, which affects the extent and coordinates.

fire_extent

class       : SpatRaster 
dimensions  : 103386, 107910, 1  (nrow, ncol, nlyr)
resolution  : 9.993738, 9.993738  (x, y)
extent      : 8857447, 9935872, 4022290, 5055502  (xmin, xmax, ymin, ymax)
coord. ref. : GDA94_NSW_Lambert 
source      : cvmsre_NSW_20192020_ag1l0.tif 
color table : 1 
name        : Layer_1

This poses an issue for us. Reprojecting the entire file to WGS84 (like we have for other rasters) will take up a lot of processing time and memory if we do this reprojection as a first step.

To save time and memory, let’s use an alternative method of cropping first and reprojecting second. We’ll first crop fire_extent to our desired study area by taking our original bbox_ext object and matching its projection to the CRS of fire_extent. We’ll use bbox_ext_gda94 to crop fire_extent to a much smaller area. Then, second, we can reproject our fire_extent_cropped object to the correct projection (and crop again to be certain our layer matches with other layers).

First let’s convert our bounding box bbox_ext, convert it to a polygon and reproject that polygon to use the CRS of fire_extent.

# Convert bbox to polygon with new projection that matches fire_extent
bbox_ext_gda94 <- bbox_ext |> 
  terra::as.polygons(
    crs = gliders_sf           # set crs to match glider data
    ) |>
  terra::project(fire_extent)  # reproject crs to match fire_extent

bbox_ext_gda94

 class       : SpatVector 
 geometry    : polygons 
 dimensions  : 1, 0  (geometries, attributes)
 extent      : 9432876, 9664886, 4021704, 4305054  (xmin, xmax, ymin, ymax)
 coord. ref. : GDA94_NSW_Lambert

Now we can use our new bounding box bbox_ext_gda94 to crop fire_extent to our study area, then reproject it to our desired CRS WGS84. When complete, you’ll notice the extent and coord. ref. or fire_extent_cropped have changed to match our expected bounding box and CRS.

This process takes a while to run

The next few steps to crop and reproject fire_extent takes ~5 minutes to run.

# Crop fire_extent using the reprojected bbox, then reproject to WGS84
fire_extent_cropped <- fire_extent |>
  terra::crop(bbox_ext_gda94) |>
  terra::project(crs(gliders_sf))   # reproject to match glider data

fire_extent_cropped

class       : SpatRaster 
dimensions  : 26110, 26732, 1  (nrow, ncol, nlyr)
resolution  : 9.962733e-05, 9.962733e-05  (x, y)
extent      : 148.4563, 151.1196, -37.54717, -34.9459  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : spat_38b02a6426e6_14512_80Q6udnsxVen2Mq.tif 
name        : Layer_1 
min value   :       0 
max value   :       5

If there are a few tailing decimal places that appeared while converting the extent of fire_extent_cropped, we can crop it again using bbox_ext to make sure it matches other rasters (the extent will slightly but noticeably change).

fire_extent_cropped <- fire_extent_cropped |>
  terra::crop(bbox_ext)

We’ll now mask out the ocean like we did for other layers.

# Remove ocean
fire_extent_cropped <- fire_extent_cropped |>
    terra::mask(aus)

fire_extent_cropped

class       : SpatRaster 
dimensions  : 2053, 2054, 1  (nrow, ncol, nlyr)
resolution  : 0.001217593, 0.001217593  (x, y)
extent      : 148.4994, 151.0003, -37.5001, -35.00038  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : cvmsre_NSW_20192020_ag1l0 
name        : fire_extent 
min value   :           0 
max value   :           5

Save your cropped layer

Now would be a good time to save your new cropped raster layer locally. This cropped raster layer is much smaller than our original layer, and it will be easier (and faster) to use this smaller layer in a workflow. Just be sure to document where you downloaded the larger file and the process you used to crop it!

Code

ggplot() +
  geom_spatraster(data = fire_extent_cropped, aes(fill = fire_extent)) +
  scale_fill_princess_c(palette = "america")  +
  guides(fill = guide_colorbar(title = "Fire Severity")) +
  theme_minimal()

The fire_extent layer contains information of the fire’s severity on a scale from 0 (unburnt) to 5 (extreme). For our purposes, we are mainly interested in whether the area was affected by fire or not, rather than its severity.

Fire severity values

If you are interested, the fire_extent dataset categorises fire severity using the following values:

0 - Unburnt (0% canopy and understory burnt)
1 - Reserved (Experimental category, is not used in raster right now)
2 - Low (> 10% burnt upderstory, >90% green canopy)
3 - Moderate (20-90% canopy scorch)
4 - High (> 90% canopy scorched, <50% canopy consumed)
5 - Extreme (>50% canopy biomass consumed)

For simplicity, let’s recode fire_extent to whether an area is burnt or unburnt and save this info in a new column burnt. We’ll use a dummy variable to make our results easier to interpret where burnt is 1 and unburnt is -1. Then we’ll rename the layer.

burnt_cropped <- fire_extent_cropped |>
  mutate(
    # make `burnt` column
    burnt = case_when(
      fire_extent >= 1 ~ 1, # burnt
      fire_extent == 0 ~ -1, # unburnt
      .default = NA_integer_
    )
  ) |>
  select(-fire_extent) # remove fire_extent column

# Rename for simplicity
names(burnt_cropped) <- "burnt"

burnt_cropped

class       : SpatRaster 
dimensions  : 2053, 2054, 1  (nrow, ncol, nlyr)
resolution  : 0.001217593, 0.001217593  (x, y)
extent      : 148.4994, 151.0003, -37.5001, -35.00038  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
name        : burnt 
min value   :    -1 
max value   :     1

Match resolution

A final important step is to make sure our layers are projected at the same resolution. Aligning rasters allows for more accurate and reliable results because grid cells won’t overlap in unexpected ways (which would affect our model). Typically, the resolution should match the layer with the lowest resolution. In our case, this is elevation_aligned layer, which you can see if you print the object to the console and compare with other layers.

elevation_aligned

class       : SpatRaster 
dimensions  : 2053, 2054, 1  (nrow, ncol, nlyr)
resolution  : 0.001217593, 0.001217593  (x, y)
extent      : 148.4994, 151.0003, -37.5001, -35.00038  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
source(s)   : memory
varname     : file6d6488e4000 
name        : elevation 
min value   :      -728 
max value   :      1901

Let’s resample each layer to match the resolution of elevation_aligned using terra::resample().

# Resample our tree cover to the resolution of the elevation data
tree_cover <- terra::resample(tree_cover, elevation_aligned)
temp <- terra::resample(temp, elevation_aligned)
precip <- terra::resample(precip, elevation_aligned)
burnt <- terra::resample(burnt_cropped, elevation_aligned)

Combine spatial layers

We can now combine all of our raster layers into one object containing:

Elevation
Tree cover
Mean annual temperature (BIO1)
Annual precipitation (BIO12)
Burnt area from the 2019-2020 bushfires

combined_rasters <- c(elevation_aligned, tree_cover, temp, precip, burnt)

combined_rasters

class       : SpatRaster 
dimensions  : 2053, 2054, 5  (nrow, ncol, nlyr)
resolution  : 0.001217593, 0.001217593  (x, y)
extent      : 148.4994, 151.0003, -37.5001, -35.00038  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
source(s)   : memory
varnames    : file6d6488e4000 
              file6d6488e4000 
              file6d6488e4000 
              ...
names       : elevation, treecover, temp_bio1, precip_bio12, burnt 
min values  :      -728,   0.00000,  6.293441,     468.7325,    -1 
max values  :      1901,  94.93295, 18.281870,    1358.0767,     1

Phew! That was a lot of data preparation. But with our spatial data sorted, we are now ready to begin preparing our model!

Build our model

If you’re new to species distribution modeling (SDM) or want a deeper dive into the details of species distribution modelling using tidymodels, check out this ALA Labs post. We’ll use a similar workflow but will spend less time explaining the ins-and-outs of how species distribution modelling works.

Prepare data

First, we will thin our data so that there is only one glider observation in any individual grid cell, so that each grid cell contains a “presence” or “absence”.

gliders_thin <- tidysdm::thin_by_cell(gliders_sf, 
                                      raster = combined_rasters)

Next, we will add pseudo-absences⁴ to our data because our glider observations from the ALA are presence-only⁵.

# Generate pseudo-absences
gliders_pseudoabs <- tidysdm::sample_pseudoabs(
  gliders_thin,
  n = 3 * nrow(combined_rasters), 
  raster = combined_rasters,
  method = c("dist_min", tidysdm::km2m(5))
  )

# Extract environmental data for each pseudo-absence
gliders_pseudoabs <- gliders_pseudoabs |>
  # Extract tree cover, elevation and climate values for pseudoabs points
  bind_cols(
  terra::extract(combined_rasters,
                 gliders_pseudoabs,
                 ID = FALSE)
  )

gliders_pseudoabs

Simple feature collection with 7849 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.5 ymin: -37.49949 xmax: 150.8305 ymax: -35.00099
Geodetic CRS:  WGS 84
# A tibble: 7,849 × 7
   class                geometry elevation treecover temp_bio1 precip_bio12
 * <fct>             <POINT [°]>     <dbl>     <dbl>     <dbl>        <dbl>
 1 presence (149.5415 -35.71064)      1111      82.8      11.2         835.
 2 presence  (149.444 -36.59851)       920      87.1      11.7         861.
 3 presence (149.5139 -35.60314)      1041      82.3      11.4         781.
 4 presence (148.9655 -37.19917)       994      90        10.5         948.
 5 presence (150.2896 -35.48085)       143      90.0      16.9        1010.
 6 presence (149.5631 -35.93422)       970      71.7      11.5         884.
 7 presence (149.4088 -36.74284)       947      85.4      11.3         863.
 8 presence (148.8274 -37.13912)       979      90        10.5         925.
 9 presence (150.2758 -35.48776)       128      86.4      17.0         970.
10 presence (149.5228 -35.86793)      1233      76.5      10.3         889.
# ℹ 7,839 more rows
# ℹ 1 more variable: burnt <dbl>

And we’ll join our pseudo-absences to our thinned presence data gliders_thin to attach any missing columns.

gliders_joined <- gliders_pseudoabs |>
    st_join(
    gliders_thin, 
    left = TRUE
    )

gliders_joined

Simple feature collection with 7849 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.5 ymin: -37.49949 xmax: 150.8305 ymax: -35.00099
Geodetic CRS:  WGS 84
# A tibble: 7,849 × 14
   class                geometry elevation treecover temp_bio1 precip_bio12
 * <fct>             <POINT [°]>     <dbl>     <dbl>     <dbl>        <dbl>
 1 presence (149.5415 -35.71064)      1111      82.8      11.2         835.
 2 presence  (149.444 -36.59851)       920      87.1      11.7         861.
 3 presence (149.5139 -35.60314)      1041      82.3      11.4         781.
 4 presence (148.9655 -37.19917)       994      90        10.5         948.
 5 presence (150.2896 -35.48085)       143      90.0      16.9        1010.
 6 presence (149.5631 -35.93422)       970      71.7      11.5         884.
 7 presence (149.4088 -36.74284)       947      85.4      11.3         863.
 8 presence (148.8274 -37.13912)       979      90        10.5         925.
 9 presence (150.2758 -35.48776)       128      86.4      17.0         970.
10 presence (149.5228 -35.86793)      1233      76.5      10.3         889.
# ℹ 7,839 more rows
# ℹ 8 more variables: burnt <dbl>, recordID <chr>, scientificName <chr>,
#   taxonConceptID <chr>, fire_period <chr>, eventDate <dttm>,
#   occurrenceStatus <chr>, dataResourceName <chr>

Earlier, we categorised our greater glider observations as pre_fire or post_fire. We will need to categorise our pseudo-absences into the same categories. Doing so allows our model to generate a probability of presence in both conditions.

As a simple way to categorise our pseudo-absence points, we’ll use sample() to randomly assign them to pre_fire or post-fire which will assign a more-or-less equal number of randomly selected pseudo-absence points for both timeframes⁶. Doing this with case_when() allows us to conditionally assign a new category only when there isn’t one already (i.e. only for pseudo-absence points).

# Assign `pre_fire` and `post_fire` categories to pseudoabsences
gliders_categorised <- gliders_joined |>
  mutate(
    fire_period = case_when(
      # If there is no category already, sample one
      is.na(fire_period) ~ sample(c("pre_fire", "post_fire"),
                                  size = nrow(gliders_joined),
                                  replace = TRUE),
      .default = fire_period
    )
  ) |>
  select(1:2, fire_period, everything()) # reorder

gliders_categorised

Simple feature collection with 7849 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.5 ymin: -37.49949 xmax: 150.8305 ymax: -35.00099
Geodetic CRS:  WGS 84
# A tibble: 7,849 × 14
   class                geometry fire_period elevation treecover temp_bio1
   <fct>             <POINT [°]> <chr>           <dbl>     <dbl>     <dbl>
 1 presence (149.5415 -35.71064) post_fire        1111      82.8      11.2
 2 presence  (149.444 -36.59851) post_fire         920      87.1      11.7
 3 presence (149.5139 -35.60314) pre_fire         1041      82.3      11.4
 4 presence (148.9655 -37.19917) pre_fire          994      90        10.5
 5 presence (150.2896 -35.48085) post_fire         143      90.0      16.9
 6 presence (149.5631 -35.93422) post_fire         970      71.7      11.5
 7 presence (149.4088 -36.74284) pre_fire          947      85.4      11.3
 8 presence (148.8274 -37.13912) pre_fire          979      90        10.5
 9 presence (150.2758 -35.48776) pre_fire          128      86.4      17.0
10 presence (149.5228 -35.86793) post_fire        1233      76.5      10.3
# ℹ 7,839 more rows
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
#   scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
#   occurrenceStatus <chr>, dataResourceName <chr>

Pseudo-absences, assigned to `pre-fire` or `post-fire`

To make the modelled results of our fire_period column easier to interpret, let’s reformat our categories to a dummy variable format, where pre_fire is -1 and post_fire is 1. Whichever number our coefficient is closer to will tell us which period had more greater glider observations.

# Change fire period to a dummy variable 
gliders_filtered <- gliders_categorised |>
  mutate(
    fire_period = as.numeric(
      if_else(fire_period == "post_fire", 1, -1) # post_fire = 1
      )
    )

gliders_filtered |> head(5L)

Simple feature collection with 5 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.9655 ymin: -37.19917 xmax: 150.2896 ymax: -35.48085
Geodetic CRS:  WGS 84
# A tibble: 5 × 14
  class                geometry fire_period elevation treecover temp_bio1
  <fct>             <POINT [°]>       <dbl>     <dbl>     <dbl>     <dbl>
1 presence (149.5415 -35.71064)           1      1111      82.8      11.2
2 presence  (149.444 -36.59851)           1       920      87.1      11.7
3 presence (149.5139 -35.60314)          -1      1041      82.3      11.4
4 presence (148.9655 -37.19917)          -1       994      90        10.5
5 presence (150.2896 -35.48085)           1       143      90.0      16.9
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
#   scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
#   occurrenceStatus <chr>, dataResourceName <chr>

Train our model

Our data is ready to be used for model training and testing. Let’s split our data into training and testing datasets, allocating ~75% of points to training and ~25% to testing.

set.seed(100)

# Allocate data into training or testing datasets
gliders_split <- 
  gliders_filtered |>
  initial_split()
gliders_split

<Training/Testing/Total>
<5886/1963/7849>

# Create datasets
gliders_train <- training(gliders_split)
gliders_test <- testing(gliders_split)

Train
Test

gliders_train |> head(5L)

Simple feature collection with 5 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.858 ymin: -36.86635 xmax: 150.4548 ymax: -35.07283
Geodetic CRS:  WGS 84
# A tibble: 5 × 14
  class                 geometry fire_period elevation treecover temp_bio1
  <fct>              <POINT [°]>       <dbl>     <dbl>     <dbl>     <dbl>
1 pseudoabs (149.7846 -36.86635)           1       512    85.1       14.1 
2 presence  (150.4548 -35.35563)           1        86    70.9       17.5 
3 pseudoabs (149.0406 -35.76564)          -1      1353    67.0        9.68
4 pseudoabs  (148.858 -35.07283)           1       643     0         13.6 
5 pseudoabs (149.0455 -35.39184)          -1       621     0.172     13.6 
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
#   scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
#   occurrenceStatus <chr>, dataResourceName <chr>

gliders_test |> head(5L)

Simple feature collection with 5 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.617 ymin: -37.19054 xmax: 149.6419 ymax: -35.35184
Geodetic CRS:  WGS 84
# A tibble: 5 × 14
  class                geometry fire_period elevation treecover temp_bio1
  <fct>             <POINT [°]>       <dbl>     <dbl>     <dbl>     <dbl>
1 presence (149.5139 -35.60314)          -1      1041      82.3      11.4
2 presence (148.8274 -37.13912)          -1       979      90        10.5
3 presence  (148.617 -35.35184)           1      1175      69.8      10.3
4 presence (149.6419 -35.35771)           1       899      81.7      12.2
5 presence (148.8454 -37.19054)           1       919      78.0      10.9
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
#   scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
#   occurrenceStatus <chr>, dataResourceName <chr>

Now we’ll resample our data into folds (i.e. smaller resampled datasets) using block cross-validation, a type of resampling better-suited to spatial data⁷. We will use these folds to train and tune our model.

# Perform Cross validation
gliders_cv <- spatial_block_cv(gliders_train, v = 5)

gliders_cv

#  5-fold spatial block cross-validation 
# A tibble: 5 × 2
  splits              id   
  <list>              <chr>
1 <split [4873/1013]> Fold1
2 <split [4613/1273]> Fold2
3 <split [4426/1460]> Fold3
4 <split [4890/996]>  Fold4
5 <split [4742/1144]> Fold5

Next, we’ll define our model’s “recipe”. We wish to test how our response variable class (presence or absence) is affected by each predictor variable (burnt, fire_period, elevation, temp_bio1, temp_bio12, and treecover). We’ll also add an interaction between burnt and fire_period because we expect greater glider occurrences to be more strongly affected in burnt areas and after the fires.

gliders_recipe <- recipe(
  gliders_train, 
  formula = class ~ burnt + fire_period + elevation + temp_bio1 + precip_bio12 + treecover
  ) |>
  step_interact(terms = ~burnt:fire_period)

gliders_recipe

Now we can set our workflow, which uses our gliders_recipe and training data to train several types of models (see tidysdm for more information on these model specifications).

gliders_models <-
  workflow_set(
    preproc = list(default = gliders_recipe), # Use the same recipe for all
    models = list(
      glm = sdm_spec_glm(),        # Generalised Linear Model
      rf = sdm_spec_rf(),          # Random Forest
      gbm = sdm_spec_boost_tree(), # Gradient Boosting Machine
      maxent = sdm_spec_maxent()   # Maximum Entropy
    ),
    cross = TRUE 
  ) |>
  option_add(control = control_ensemble_grid())

gliders_models

# A workflow set/tibble: 4 × 4
  wflow_id       info             option    result    
  <chr>          <list>           <list>    <list>    
1 default_glm    <tibble [1 × 4]> <opts[1]> <list [0]>
2 default_rf     <tibble [1 × 4]> <opts[1]> <list [0]>
3 default_gbm    <tibble [1 × 4]> <opts[1]> <list [0]>
4 default_maxent <tibble [1 × 4]> <opts[1]> <list [0]>

We’re ready to tune and optimise our models using our workflow above and the cross validation blocks we generated earlier. This step helps us find which model parameters and hyperparameters make reasonable predictions.

Using autoplot(), we can quickly see which models performed best using three performance metrics for evaluating species distribution models (boyce_cont, roc_auc and tss_max). In general, results are pretty varied depending on the metric, though Maxent models performed best according to the Boyce Continuous Index.

set.seed(9999)

# Tune the model using cross validation
gliders_models_tune <-
  gliders_models |>
  workflow_map("tune_grid",
               resamples = gliders_cv,     # Use our cross-validation blocks for tuning
               grid = 6,                   # number of tuning iterations
               metrics = sdm_metric_set(), # Evaluate model performance
               verbose = TRUE,
               control = stacks::control_stack_grid()
  )

gliders_models_tune

# A workflow set/tibble: 4 × 4
  wflow_id       info             option    result   
  <chr>          <list>           <list>    <list>   
1 default_glm    <tibble [1 × 4]> <opts[4]> <rsmp[+]>
2 default_rf     <tibble [1 × 4]> <opts[4]> <tune[+]>
3 default_gbm    <tibble [1 × 4]> <opts[4]> <tune[+]>
4 default_maxent <tibble [1 × 4]> <opts[4]> <tune[+]>

autoplot(gliders_models_tune)

To help improve accuracy and generalisation of our model predictions, let’s “stack” our models into an ensemble model to blend predictions. Again using autoplot(), we can see the relative weighting each model adds to our final prediction.

set.seed(98765)

gliders_stacked <- 
  stacks() |>                                # Initialize the stack
  add_candidates(gliders_models_tune) |>     # Add models
  blend_predictions() |>                     # Blend their predictions
  fit_members()                              # Fit the final model

# See model contribution
autoplot(gliders_stacked, 
         type = "weights")

Now that we have our stacked ensemble model, it’s time to test how well it performs on new data. We do this by making predictions about the points in gliders_test and comparing them to the “true” results.

# Predict probability of presence 
gliders_test_predictions <-
  gliders_test %>%
  bind_cols(predict(gliders_stacked, ., 
                    type = "prob", 
                    save_pred = TRUE)) 

gliders_test_predictions |>
  select(class, .pred_pseudoabs, .pred_presence, everything()) # reorder cols

Simple feature collection with 1963 features and 15 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 148.5 ymin: -37.49706 xmax: 150.8 ymax: -35.00221
Geodetic CRS:  WGS 84
# A tibble: 1,963 × 16
   class    .pred_pseudoabs .pred_presence             geometry fire_period
   <fct>              <dbl>          <dbl>          <POINT [°]>       <dbl>
 1 presence          0.153           0.847 (149.5139 -35.60314)          -1
 2 presence          0.0680          0.932 (148.8274 -37.13912)          -1
 3 presence          0.578           0.422  (148.617 -35.35184)           1
 4 presence          0.244           0.756 (149.6419 -35.35771)           1
 5 presence          0.176           0.824 (148.8454 -37.19054)           1
 6 presence          0.403           0.597 (149.4186 -36.57799)          -1
 7 presence          0.0669          0.933   (148.923 -37.2277)          -1
 8 presence          0.261           0.739  (149.507 -35.80711)           1
 9 presence          0.0761          0.924 (150.2911 -35.65586)           1
10 presence          0.288           0.712  (148.659 -35.25845)           1
# ℹ 1,953 more rows
# ℹ 11 more variables: elevation <dbl>, treecover <dbl>, temp_bio1 <dbl>,
#   precip_bio12 <dbl>, burnt <dbl>, recordID <chr>, scientificName <chr>,
#   taxonConceptID <chr>, eventDate <dttm>, occurrenceStatus <chr>,
#   dataResourceName <chr>

Using tidysdm::sdm_metric_set(), we can print several helpful metrics about our model’s performance. Overall, our model seems to have performed very well⁸.

# Evaluate performance
gliders_test_predictions |> 
  sdm_metric_set()(truth = class, .pred_presence)

# A tibble: 3 × 3
  .metric    .estimator .estimate
  <chr>      <chr>          <dbl>
1 boyce_cont binary         0.869
2 roc_auc    binary         0.969
3 tss_max    binary         0.843

Specifics about estimates

boyce_cont: The Boyce Continuous Index is from -1 to 1, where values closer to 1 indicate the model’s predictions are consistent with the actual value.
roc_auc: The Relative Operating Characteristics Area Under the Curve (ROC AUC) is from 0.5 to 1, where a value of 0.5 indicates that the model’s predictions are no better than random chance, whereas a value of 1 indicates perfect prediction.
tss_max: The True Skill Statistic is from -1 to 1, where values closer to 1 indicate better model performance.

Greater glider distribution

We are nearly ready to map our predicted distribution. There’s just one final step before predicting a distribution surface.

Our model includes the variable fire_period, a categorical variable without an associated spatial layer. However, in order for our model to predict a spatial surface with fire_period included (and it must be included because our model uses it), we have to create a raster based on the point locations and values of fire_period. We’ll follow the same general steps that we did earlier, building a raster, cropping and matching the resolution of the other layers.

# Add fire_period as a spatial layer
fire_period_rast <- gliders_filtered |>
  select(fire_period) |>
  terra::vect() |>                   # convert to vector
  terra::rast() |>                   # rasterise
  terra::crop(bbox_ext) |>           # crop to bounding box
  terra::resample(elevation_aligned) # match resolution

# add fire_period data to raster
terra::values(fire_period_rast) <- gliders_filtered$fire_period
names(fire_period_rast) <- "fire_period" # Rename for simplicity

fire_period_rast

class       : SpatRaster 
dimensions  : 2053, 2054, 1  (nrow, ncol, nlyr)
resolution  : 0.001217593, 0.001217593  (x, y)
extent      : 148.4994, 151.0003, -37.5001, -35.00038  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : file6d6488e4000 
name        : fire_period 
min value   :          -1 
max value   :           1

Now we can join fire_period_rast with the other raster layers

combined_rasters_complete <- c(combined_rasters, fire_period_rast)
combined_rasters_complete

class       : SpatRaster 
dimensions  : 2053, 2054, 6  (nrow, ncol, nlyr)
resolution  : 0.001217593, 0.001217593  (x, y)
extent      : 148.4994, 151.0003, -37.5001, -35.00038  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
source(s)   : memory
varnames    : file6d6488e4000 
              file6d6488e4000 
              file6d6488e4000 
              ...
names       : elevation, treecover, temp_bio1, precip_bio12, burnt, fire_period 
min values  :      -728,   0.00000,  6.293441,     468.7325,    -1,          -1 
max values  :      1901,  94.93295, 18.281870,    1358.0767,     1,           1

Ok! We’re finally ready to make our final prediction of greater gliders’ distribution. We’ll use tidysdm::predict_raster(), a very helpful function for spatial predictions, and maps our results!

# Predict
prediction_present <- predict_raster(gliders_stacked, 
                                     combined_rasters_complete, 
                                     type = "prob",
                                     wopt = list(steps=32))

# Map
ggplot() +
  geom_spatraster(data = prediction_present, 
                  aes(fill = .pred_presence)) +
  scale_fill_viridis_c(option = "E", na.value = NA) +
  guides(fill = guide_colorbar(
    title = "Relative\nHabitat\nSuitability")
    ) +
  labs(title = "Predicted distribution of greater gliders",
       subtitle = "Southeast New South Wales") +
  pilot::theme_pilot(grid="hv") +
  theme(
    legend.text = element_text(hjust = 0.5)
    )

Predicted distribution of greater gliders

As we used pseudo-absences rather than true absences, it’s best to interpret the map as showing us the relative habitat suitability for greater gliders across the area (as opposed to a true distribution of where greater gliders occur). The highest habitat suitability for greater gliders appears to be along the middle forest area where most of our greater glider observations were located (which makes sense), along with a few larger areas in the upper and lower west, and near the upper coastline.

Impact of fire

As the second part of our investigation, we wanted to know whether the 2019/2020 bushfires had an impact on greater gliders. Our hypothesis was that there are fewer greater glider occurrences after the fire, and a driver of this effect is that an area was burnt during the 2019/2020 bushfires.

To see the more traditional statistical test results (e.g., estimates, test statistics, p-values), we could extract them from our Generalised Linear Model (GLM) which we ran over our tuning process. These types of summary statistics are easier to interpret, but they might not be the best option for us to use.

Why can’t we use GLM results?

Generalised Linear Models fit a straight line of best-fit to the data. As a result, it’s possible to return a coefficient that estimates the slope of the line and some test statistics about each variable’s importance in the model.

Other models we’ve run like Maxent and Random Forest models are a little different, though. They use weights to “penalize” a model to get a better fitted line; one that isn’t just straight, but shaped and curved to better fit the data. This shaped, squiggly line can be represented by a unique equation, not just a coefficient of slope. When asked to return results for models like these, tidymodels returns penalty values (lambda) used by the model, not a summary of statistics and p values.

We have to use other packages (like {vip} and {DALEX}) to assess more complex types of models like our complete gliders_stacked model.

To see what we mean, let’s have a look at our GLM results. Before we see them, let’s remind ourselves what we are expecting to see given our hypothesis:

Fewer occurrences after the fire. This should appear as a significant, positive fire_period coefficient.
Fewer occurrences in burnt areas. This should appear as a significant, positive burnt coefficient.
An interaction between fire period and burnt area, because burnt areas after fire will have very few records⁹.

Code

# See the models that performed best
# gliders_models_tune |>
#   rank_results(
#     rank_metric = "boyce_cont",
#     select_best = TRUE
#   )

# extract best glm model
best_model <- gliders_models_tune |> 
  extract_workflow(id = "default_glm")

# use best glm to fit complete `gliders_filtered` data
fitted <- 
  best_model |> 
  fit(data = gliders_filtered)

# see results
fitted |>
  broom::tidy()

# A tibble: 8 × 5
  term                 estimate std.error statistic   p.value
  <chr>                   <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)         20.3       1.35         15.0  5.27e- 51
2 burnt                0.286     0.0408        7.02 2.15e- 12
3 fire_period         -0.194     0.0362       -5.35 8.61e-  8
4 elevation           -0.00586   0.000418    -14.0  8.76e- 45
5 temp_bio1           -0.729     0.0727      -10.0  1.18e- 23
6 precip_bio12         0.000354  0.000306      1.16 2.48e-  1
7 treecover           -0.0836    0.00293     -28.5  1.68e-178
8 burnt_x_fire_period -0.115     0.0359       -3.21 1.32e-  3

When running a model on this many data points, it’s generally expected that most variables will be significant. It’s easier to look at the size of the effect (i.e. statistic) and the estimate & standard error to figure out the direction and certainty of that estimate. The estimates are a little confusing here due to the model using presence as the reference level—higher numbers are a stronger prediction of absence.

A quick first impression tells us that fire_period and burnt variables are significant but relatively weak predictors. You might also notice that some variables like elevation are very significant in the model but have tiny estimate coefficients. Let’s use DALEX to help us understand how this is possible.

Use DALEX

If we want to understand more about our more complex stacked model, we can use the DALEXtra package. DALEX¹⁰ is a model-agnostic tool for assessing complex models.

library(DALEXtra)

DALEX assumes the opposite reference level to tidymodels for our outcome variable class. So to make the results of DALEX make sense, we’ll need to recode the predicted values of our model to flip the 1’s and 0’s. We can do this by returning the results of our best model (according to the metric boyce_cont), then we can use extract_mold() to see the components of the model, extract the outcome values, and recode the 1’s and 0’s.

# get the results of the best fit model
best_fit <- fit_best(gliders_models_tune, metric = "boyce_cont")

# flip `class` variable coding
class_revised <- best_fit |>
  workflowsets::extract_mold() |> # components of fitted model results
  purrr::pluck("outcomes") |>     # pluck predicted 'class' from list
  dplyr::pull() |>                # pull values from tibble
  case_match("pseudoabs" ~ 1, "presence" ~ 0) # recode

class_revised |> head(30L)

 [1] 1 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1

The rest is very straightforward (thank goodness). We can feed our stacked model, data and revised class variable into explain_tidymodels() to do exactly that!

test <- explain_tidymodels(gliders_stacked,
                           data = gliders_train,
                           y = class_revised)

Preparation of a new explainer is initiated
  -> model label       :  list  (  default  )
  -> data              :  5886  rows  14  cols 
  -> data              :  tibble converted into a data.frame 
  -> target variable   :  5886  values 
  -> predict function  :  yhat.model_stack  will be used (  default  )
  -> predicted values  :  No value for predict function target column. (  default  )
  -> model_info        :  package stacks , ver. 1.0.5 , task classification (  default  ) 
  -> predicted values  :  numerical, min =  0.04100672 , mean =  0.76612 , max =  0.9738481  
  -> residual function :  difference between y and yhat (  default  )
  -> residuals         :  numerical, min =  -0.9528437 , mean =  0.01743423 , max =  0.9202559  
  A new explainer has been created!

Let’s see which parts of our model had the greatest importance for predicting the outcome. You’ll notice these results look similar to the results of our GLM.

test |>
  model_parts() |> 
  plot(show_boxplots = FALSE)

We can also see the partial effects of our model, which show how the model is fitting each individual variable effect. Here we can appreciate the model’s complexity, as each variable has its own non-linear fitted line! DALEX can return many other helpful summaries to check model performance and results. See more here.

model_profile(test, 
              type = "partial", 
              variables = c("burnt", "fire_period", "treecover", "temp_bio1", "precip_bio12", "elevation")
              ) |>
  plot()

Overall, our results seem to show that fire does have an effect, but it’s a relatively small effect on greater glider occurrences.

Here we can see a few things:

Tree cover and elevation are excellent predictors of greater gliders, which makes sense given they live in trees in alpine regions.
The negative estimate of fire_period suggests that there are more records of greater gliders after the fires than before¹¹. This is probably due to the growing number of records each year in the ALA (though the effect is relatively small).
The slight positive estimate of burnt suggests fewer records in burnt areas¹². However, the effect is pretty small.

Final thoughts

The 2019-2020 bushfires had a huge impact on the Australian environment. However, it wasn’t all negative; across taxonomic groups there was a wide range of negative and positive effects from the fire. It might not all be “doom-and-gloom” after all, thanks to the resilience of many Australian species to fires.

Our investigation finds that despite the general negative impact of severe fire on forests, greater gliders may have been less immediately affected in the aftermath of the 2019-2020 fires than other mammals and marsupials. In the short-term, glider distributions after megafires seem generally stable. However, other areas in NSW were more negatively impacted, and they may deteriorate over longer periods. Ultimately, factors like the density of hollowed den trees may determine how well greater gliders survive after fire, not only the burnt forest area.

To learn more on ALA Labs, check out these posts on using tidymodels and tidysdm and mapping multiple overlapping species distributions.

Expand for session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.3 (2025-02-28 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2025-04-04
 pandoc   3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 broom         * 1.0.7   2024-09-26 [1] CRAN (R 4.4.1)
 DALEX         * 2.4.3   2023-01-15 [1] CRAN (R 4.3.1)
 DALEXtra      * 2.3.0   2023-05-26 [1] CRAN (R 4.3.1)
 dials         * 1.3.0   2024-07-30 [1] CRAN (R 4.4.1)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 elevatr       * 0.99.0  2023-09-12 [1] CRAN (R 4.3.2)
 forcats       * 1.0.0   2023-01-29 [1] CRAN (R 4.3.2)
 galah         * 2.1.1   2025-02-07 [1] CRAN (R 4.4.2)
 geodata       * 0.6-2   2024-06-10 [1] CRAN (R 4.4.1)
 ggplot2       * 3.5.1   2024-04-23 [1] CRAN (R 4.4.3)
 glmnet        * 4.1-8   2023-08-22 [1] CRAN (R 4.3.1)
 here          * 1.0.1   2020-12-13 [1] CRAN (R 4.3.2)
 infer         * 1.0.7   2024-03-25 [1] CRAN (R 4.4.1)
 lubridate     * 1.9.4   2024-12-08 [1] CRAN (R 4.4.2)
 Matrix        * 1.7-0   2024-03-22 [1] CRAN (R 4.4.1)
 maxnet        * 0.1.4   2021-07-09 [1] CRAN (R 4.3.2)
 modeldata     * 1.4.0   2024-06-19 [1] CRAN (R 4.4.1)
 ozmaps        * 0.4.5   2021-08-03 [1] CRAN (R 4.3.2)
 parsnip       * 1.2.1   2024-03-22 [1] CRAN (R 4.4.1)
 purrr         * 1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
 ranger        * 0.16.0  2023-11-12 [1] CRAN (R 4.3.2)
 readr         * 2.1.5   2024-01-10 [1] CRAN (R 4.3.3)
 recipes       * 1.1.0   2024-07-04 [1] CRAN (R 4.4.1)
 rsample       * 1.2.1   2024-03-25 [1] CRAN (R 4.4.1)
 scales        * 1.3.0   2023-11-28 [1] CRAN (R 4.3.2)
 sessioninfo   * 1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 sf            * 1.0-19  2024-11-05 [1] CRAN (R 4.4.2)
 spatialsample * 0.6.0   2024-10-02 [1] CRAN (R 4.4.1)
 stacks        * 1.0.5   2024-07-22 [1] CRAN (R 4.4.1)
 stringr       * 1.5.1   2023-11-14 [1] CRAN (R 4.3.2)
 terra         * 1.8-29  2025-02-26 [1] CRAN (R 4.4.3)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
 tidymodels    * 1.2.0   2024-03-25 [1] CRAN (R 4.4.1)
 tidyr         * 1.3.1   2024-01-24 [1] CRAN (R 4.3.3)
 tidysdm       * 0.9.5   2024-06-23 [1] CRAN (R 4.4.1)
 tidyterra     * 0.6.1   2024-06-08 [1] CRAN (R 4.4.1)
 tidyverse     * 2.0.0   2023-02-22 [1] CRAN (R 4.3.2)
 tune          * 1.2.1   2024-04-18 [1] CRAN (R 4.4.1)
 workflows     * 1.1.4   2024-02-19 [1] CRAN (R 4.4.1)
 workflowsets  * 1.1.0   2024-03-21 [1] CRAN (R 4.4.1)
 xgboost       * 1.7.8.1 2024-07-24 [1] CRAN (R 4.4.1)
 yardstick     * 1.3.1   2024-03-21 [1] CRAN (R 4.4.1)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.4.3/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

Southern greater gliders grow up to 1m long from head to tail and can glide up to 100 metres through the canopy!↩︎
e.g. Smith et al. 2007; Ridley et al. 2024 ↩︎
For more info about what rasters are, see the “What’s a raster” section from this ALA Labs article.↩︎
Pseudo-absences, also called background points, are points that represent true absences in our data for modelling. They are there to help our model make predictions about what variables more strongly predict a presence vs an absence. Because we don’t have true presence data, this is one way to provide a model this information, though it is less informative for our interpretation.↩︎
They only provide information about the conditions of when a glider was observed but no information about when gliders aren’t observed↩︎
For this analysis, it doesn’t really matter whether we have a perfect split.↩︎
For more info on what this means, see this section from another ALA Labs article ↩︎
So well, in fact, that it’s plausible it might be overfitting, making our model poor for prediction outside of our small spatial bounding box in New South Wales. For our intended analysis, this doesn’t matter, but for a model intended for use to make broad future predictions this could pose a problem.↩︎
To the extent that the combination of an area being burnt and post_fire might multiply the effect when together.↩︎
DALEX stands for moDel Agnostic Language for Exploration and eXplanation. It may be one of the strangest acronyms of all time.↩︎
Based on how our variable is coded, a negative coefficient means that before fires, there is a higher likelihood of absence.↩︎
Based on how our variable is coded, a positive coefficient means that in burnt areas, there is a higher likelihood of absence.↩︎