library(galah)
library(tidyverse)
library(tidymodels)
library(tidysdm) # devtools::install_github("EvolEcolGroup/tidysdm")
library(terra)
library(tidyterra)
library(here)
library(sf)
library(ozmaps)
library(elevatr)
library(geodata)
library(stacks)
Bush fires are a frequent and natural part of Australia’s ecosystems. Australia’s flora and fauna have adapted alongside fire, with some plants needing fire to germinate and regenerate.
Aboriginal and Torres Strait Islander people have also expertly used fire for tens of thousands of years to care for Country, managing vegetation, reducing wildfire risk, and fostering biodiversity. Since European colonisation, however, the disruption of these practices has created a build-up of fuel loads and introduced invasive plant species, increasing Australia’s risk of larger, uncontrollable fires as temperatures rise.
In 2019-2020, Australia experienced one of the most catastrophic bushfire seasons on record. Fires burnt approximately 19 million hectares, of which 12.6 million were primarily forest and bushland. An estimated 900 species of plants and animals were severely impacted, and 3 billion animals were killed or displaced.
In this post we will explore the impact of the 2019-2020 bushfires on a population of greater gliders (Petauroides volans), a forest-dwelling marsupial species found along the east coast of Australia. We’ll determine greater gliders’ overall habitat range using tidymodels and tidysdm, then use the output of our model to explore whether the impact of fire on greater glider observations since 2019-2020.
To begin, we can load the following packages.
We are interested in testing whether observations of gliders changed in areas burnt by the 2019/2020 Black Summer bushfires compared to those that remained unburnt.
That means we’ll need a few components for our investigation:
- Data of greater glider observations made before and after the fires
- Environmental variables that determine greater gliders’ likely habitat range
- The extent of area burnt by the 2019-2020 bushfires
Our hypothesised effect is that there are fewer occurrences of greater gliders after the fire, and a driver of this effect is that an area was previously burnt during the 2019/2020 bushfires.
First, let’s establish our study area and download records of our species of interest.
Download data
Observational data
The Southern Greater Glider (Petauroides volans) is Australia’s largest gliding marsupial, found in tall eucalypt forests along the east coast, from Queensland to Victoria. They are an endagered nocturnal, tree-dwelling herbivore that primarily feed on eucalyptus leaves and den in hollow bear trees1.
Left: Petauroides volans (Josh Bowell | CC-BY-NC 3.0 (Au)), Right: Petauroides volans (David Sinnott | CC-BY-NC 4.0 (Int))
We’re going to focus on a region of South-East New South Wales and the Australian Captial Territory for our investigation. To start, let’s establish a bounding box around our area in a few different formats (tibble
, SpatExtent
and sf
). We’ll also pull in a map of Australia for later rendering and working with our rasters.
# define geographic region of bounding box
# tibble format is handy for plotting
<- tibble(
se_nsw_bbox ymin = -37.5,
ymax = -35,
xmin = 148.5,
xmax = 151
)# Create a terra extent
# SpatExtent objects are used for modifying raster layers later on
<- terra::ext(
bbox_ext c(se_nsw_bbox[["xmin"]],
"xmax"]],
se_nsw_bbox[["ymin"]],
se_nsw_bbox[["ymax"]]
se_nsw_bbox[[
))
# Create an sf object of our bounding box
# sf object will help specify elevation data later on
<- st_as_sf(as.polygons(bbox_ext, crs = "EPSG:4326"))
bbox_sf
# Get outline of Australia
<- ozmaps::ozmap_country |>
aus st_transform(crs = "EPSG:4326")
Now let’s use the galah package to download greater glider records from the Atlas of Living Australia over a 10-year time period from 2014 to 2024, which captures observations of greater gliders before and after the 2019-2020 bushfires. Passing our se_nsw_bbox
to galah_geolocate()
will return only the records in our bounding box. You’ll need to register your email address with the ALA, then pass it to galah using galah_config()
.
galah_config(email = "your-email-here") # Registered ALA email
# Collect all greater glider records between 2014 and 2024 for the region we defined
<- galah_call() |>
gliders identify("Petauroides volans") |>
filter(year >= 2014 & year <= 2024) |>
galah_apply_profile(ALA) |>
galah_geolocate(se_nsw_bbox, type = "bbox") |>
atlas_occurrences()
# Create an sf object for spatial analysis & mapping
<- gliders |>
gliders_sf st_as_sf(coords = c("decimalLongitude", "decimalLatitude")) |>
st_set_crs(4326)
Before or after fire
Let’s add a new column fire_period
to categorise whether an observation was recorded before or after the fires. Fires lasted over several months so it’s difficult to pinpoint an exact cut-off date. Let’s choose the 1st of December 2019 as a cut-off between pre_fire
and post_fire
, which marks the beginning of peak bush fire season. Many of the largest bush fires had moved into our study area by this point in time.
# Classify each glider record as pre or post fire
<- gliders_sf |>
gliders_sf mutate(
fire_period = if_else(eventDate < as.Date("2019-12-01"), "pre_fire", "post_fire"),
.before = eventDate # position the column so we can see it
)
Code
ggplot() +
geom_sf(data = aus, fill = "grey97", color = "grey40") +
geom_rect(data = se_nsw_bbox,
mapping = aes(xmin = xmin,
ymin = ymin,
xmax = xmax,
ymax = ymax),
colour = "grey50",
fill = NA) +
geom_sf(data = gliders_sf,
aes(color = fire_period),
size = 2,
alpha = 0.7) +
scale_colour_manual(values = c("#0F3F5C", "#CF5F37")) +
labs(color = "Fire Period") +
coord_sf(xlim = c(se_nsw_bbox$xmin, se_nsw_bbox$xmax),
ylim = c(se_nsw_bbox$ymin, se_nsw_bbox$ymax)) +
theme_void()
Now that we have greater glider observations, we can start pulling in our raster data for our model.
Spatial data
There are a few environmental factors that are useful for determining the suitable habitat area for greater gliders. With the help of previous studies on gliders2, we’ve chosen to download the following four raster layers3 to use as environmental predictors in our species distribution model:
- Elevation (from the elevatr package)
- Tree cover (from Global Land Analysis & Discovery)
- Mean annual temperature (BIO1) (from CHELSA)
- Annual precipitation (BIO12) (from CHELSA)
To assess bush fire impact, we will also need to download a fifth raster layer that maps the area burnt over the 2019-2020 bushfires:
- Fire extent and severity mapping (2019-2020) (from the NSW government)
Each raster layer will need to be wrangled so that they fit neatly together (same area, same projection, same resolution). We will perform a common series of modifications to each layer that generally fall into the following steps:
- Crop the layer to our study area
- Mask (or remove) the ocean from our layer so that it does not skew our model
- Rename the layer to something sensible
- Resample each layer to the same resolution (so that every layer’s grid aligns correctly)
These steps might not always be in the same order, but you will recognise them as we go through the next few sections to download our spatial data.
Elevation
First we’ll download elevation data for our defined region using the {elevatr}
package’s handy get_elev_raster()
function. By passing our bounding box bbox_sf
to the locations
argument we can return elevation data (in metres) for our specified area!
# Download elevation raster
<- get_elev_raster(locations = bbox_sf,
elevation_data z = 9,
prj = "EPSG:4326")
# Remove raster information outside of the aus land boundary
<- elevation_data |>
elevation_aligned ::rast() |> # convert to SpatRaster class
terra::mask(aus) |> # remove information outside of aus boundary
terra::crop(bbox_ext) # crop layer to bbox
terra
# Rename layer for simplicity
names(elevation_aligned) <- "elevation"
Code
ggplot() +
geom_spatraster(data = elevation_aligned, aes(fill = elevation)) +
scale_fill_terrain_c(na.value = NA) +
guides(fill = guide_colorbar(title = "Elevation (m)")) +
theme_minimal()
Tree cover
Next we’ll download tree cover data. This tree cover raster layer contains satellite data from Hansen et al (2010) where tree cover is recorded as a percentage (0
= no cover and 100
= complete tree cover).
Spatial data are held on the Global Land Analysis and Discovery website. Tree cover data files are divided in tiles of 10 x 10 latitude/longitude. Because our study area crosses over two tiles (30S_150
and 30S_140
), we will need to download and stitch together two rasters. We can then crop them down to our specified area.
To download the files, go to the global 2010 treecover dataset –> click on the url link under the Data Links subheading near the bottom of the page –> click on the relevant data links to download. The two file names we are interested in are: treecover2010_30S_140E.tif
and treecover2010_30S_150E.tif
. Save these files in your local working directory.
Use code to download
Alternatively, we can use the following code chunk to download files:
download.file("https://glad.umd.edu/Potapov/TCC_2010/treecover2010_30S_140E.tif", destfile="treecover2010_30S_140E.tif")
download.file("https://glad.umd.edu/Potapov/TCC_2010/treecover2010_30S_150E.tif", destfile="treecover2010_30S_150E.tif")
Let’s load our files into R as rasters.
# Load the raster
<- terra::rast(here("treecover2010_30S_150E.tif"))
tree_cover_150 <- terra::rast(here("treecover2010_30S_140E.tif")) tree_cover_140
Now we can crop them to our study area and merge them together.
# Crop to bbox
<- tree_cover_150 |> terra::crop(bbox_ext)
tree_cover_150_cropped <- tree_cover_140 |> terra::crop(bbox_ext)
tree_cover_140_cropped
# Merge
<- merge(tree_cover_150_cropped, tree_cover_140_cropped)
tree_cover
# Rename for simplicity
names(tree_cover) <- "treecover"
When working with spatial data we tend to use up a lot of memory! That’s a big reason why we are performing these cropping steps—so that we can use up the least amount of memory necessary to get our model to function. Because of this, after we have stitched our two rasters together, we should safely delete the other two as we will no longer require them to clear space.
# We can now safely delete the first two raster files to save memory
rm(tree_cover_140); rm(tree_cover_150)
Code
ggplot() +
geom_spatraster(data = tree_cover, aes(fill = treecover)) +
scale_fill_viridis_c(direction = -1, begin = 0.8, end = 0.1) +
guides(fill = guide_colorbar(title = "Tree Cover (%)")) +
theme_minimal()
Temperature & precipitation
Next we’ll download two bioclimatic raster layers from CHELSA. CHELSA hosts climate projections at high resolutions intended for ecological use (here’s the paper about it). Climate projections extend from 1981 to 2100. Vegetation that greater gliders inhabit is sensitive to temperature and precipitation changes, so we decided to use climate projections for 2011-2040 for the following two climate variables:
- BIO1: Mean Annual Temperature
- BIO12: Annual Precipitation
To download these layers on the CHELSA website, click on ‘Downloads’ in the top bar, under version 2.1 click the ‘Download’ button, then navigate to GLOBAL/*
–> climatologies/
–> UKESM1-0-LL
–> ssp370/
–> bio/
. Then select the files with names beginning with CHELSA_bio1_2011-2040
and CHELSA_bio12_2011-2040
. Save these files in your local directory.
Use code to download
Alternatively, we can use the following code chunk to download files:
# Download our two raster files
download.file("https://os.zhdk.cloud.switch.ch/chelsav2/GLOBAL/climatologies/2011-2040/UKESM1-0-LL/ssp370/bio/CHELSA_bio1_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif",
destfile="CHELSA_bio1_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif", mode="wb")
download.file("https://os.zhdk.cloud.switch.ch/chelsav2/GLOBAL/climatologies/2011-2040/UKESM1-0-LL/ssp370/bio/CHELSA_bio12_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif",
destfile="CHELSA_bio12_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif", mode="wb")
Let’s load our files into R as rasters.
<- rast("CHELSA_bio1_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif")
temp <- rast("CHELSA_bio12_2011-2040_ukesm1-0-ll_ssp370_V.2.1.tif") precip
Now we’ll mask and crop each layer, then rename them.
# mask and crop to study area
<- temp |>
temp ::mask(aus) |> # remove oceans
terra::crop(bbox_ext) # crop to bbox
terra
<- precip |>
precip ::mask(aus) |> # remove oceans
terra::crop(bbox_ext) # crop to bbox
terra
# Rename for simplicity
names(temp) <- "temp_bio1"
names(precip) <- "precip_bio12"
Burnt area
Finally, we will download a fire extent raster layer, documenting the area burnt by the 2019-2020 bushfires in New South Wales. The NSW Government’s Fire Extent and Severity Mapping (FESM) 2019/20 dataset contains information on the extent and severity of burnt areas across the state.
To download, navigate to the NSW Government website, click on ‘Dataset Packages’, then select the download icon next to ‘FESM v3-data in IMG and TIFF format’. Save this zip folder in your working directory and uncompress the folder.
While the download itself is only several hundred megabytes, when uncompressed the .tif file is very large (10.3 GB). Please keep this in mind when choosing a place to store the folder!
Let’s load the fire extent layer into R as a raster.
# Load raster
<- rast("cvmsre_NSW_20192020_ag1l0.tif") fire_extent
This file is pretty huge and it’s in the wrong projection, which we can see under coord. ref
when we view the object. Our desired CRS is WGS84
/EPSG:4326
but this raster is projected using CRS GDA_94_Lambert
, which affects the extent and coordinates.
fire_extent
class : SpatRaster
dimensions : 103386, 107910, 1 (nrow, ncol, nlyr)
resolution : 9.993738, 9.993738 (x, y)
extent : 8857447, 9935872, 4022290, 5055502 (xmin, xmax, ymin, ymax)
coord. ref. : GDA94_NSW_Lambert
source : cvmsre_NSW_20192020_ag1l0.tif
color table : 1
name : Layer_1
This poses an issue for us. Reprojecting the entire file to WGS84
(like we have for other rasters) will take up a lot of processing time and memory if we do this reprojection as a first step.
To save time and memory, let’s use an alternative method of cropping first and reprojecting second. We’ll first crop fire_extent
to our desired study area by taking our original bbox_ext
object and matching its projection to the CRS of fire_extent
. We’ll use bbox_ext_gda94
to crop fire_extent
to a much smaller area. Then, second, we can reproject our fire_extent_cropped
object to the correct projection (and crop again to be certain our layer matches with other layers).
First let’s convert our bounding box bbox_ext
, convert it to a polygon and reproject that polygon to use the CRS of fire_extent
.
# Convert bbox to polygon with new projection that matches fire_extent
<- bbox_ext |>
bbox_ext_gda94 ::as.polygons(
terracrs = gliders_sf # set crs to match glider data
|>
) ::project(fire_extent) # reproject crs to match fire_extent
terra
bbox_ext_gda94
class : SpatVector
geometry : polygons
dimensions : 1, 0 (geometries, attributes)
extent : 9432876, 9664886, 4021704, 4305054 (xmin, xmax, ymin, ymax)
coord. ref. : GDA94_NSW_Lambert
Now we can use our new bounding box bbox_ext_gda94
to crop fire_extent
to our study area, then reproject it to our desired CRS WGS84
. When complete, you’ll notice the extent
and coord. ref.
or fire_extent_cropped
have changed to match our expected bounding box and CRS.
The next few steps to crop and reproject fire_extent
takes ~5 minutes to run.
# Crop fire_extent using the reprojected bbox, then reproject to WGS84
<- fire_extent |>
fire_extent_cropped ::crop(bbox_ext_gda94) |>
terra::project(crs(gliders_sf)) # reproject to match glider data
terra
fire_extent_cropped
class : SpatRaster
dimensions : 26110, 26732, 1 (nrow, ncol, nlyr)
resolution : 9.962733e-05, 9.962733e-05 (x, y)
extent : 148.4563, 151.1196, -37.54717, -34.9459 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : spat_38b02a6426e6_14512_80Q6udnsxVen2Mq.tif
name : Layer_1
min value : 0
max value : 5
If there are a few tailing decimal places that appeared while converting the extent
of fire_extent_cropped
, we can crop it again using bbox_ext
to make sure it matches other rasters (the extent
will slightly but noticeably change).
<- fire_extent_cropped |>
fire_extent_cropped ::crop(bbox_ext) terra
We’ll now mask out the ocean like we did for other layers.
# Remove ocean
<- fire_extent_cropped |>
fire_extent_cropped ::mask(aus)
terra
fire_extent_cropped
class : SpatRaster
dimensions : 2053, 2054, 1 (nrow, ncol, nlyr)
resolution : 0.001217593, 0.001217593 (x, y)
extent : 148.4994, 151.0003, -37.5001, -35.00038 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
varname : cvmsre_NSW_20192020_ag1l0
name : fire_extent
min value : 0
max value : 5
Now would be a good time to save your new cropped raster layer locally. This cropped raster layer is much smaller than our original layer, and it will be easier (and faster) to use this smaller layer in a workflow. Just be sure to document where you downloaded the larger file and the process you used to crop it!
Code
ggplot() +
geom_spatraster(data = fire_extent_cropped, aes(fill = fire_extent)) +
scale_fill_princess_c(palette = "america") +
guides(fill = guide_colorbar(title = "Fire Severity")) +
theme_minimal()
The fire_extent
layer contains information of the fire’s severity on a scale from 0 (unburnt) to 5 (extreme). For our purposes, we are mainly interested in whether the area was affected by fire or not, rather than its severity.
If you are interested, the fire_extent
dataset categorises fire severity using the following values:
- 0 - Unburnt (0% canopy and understory burnt)
- 1 - Reserved (Experimental category, is not used in raster right now)
- 2 - Low (> 10% burnt upderstory, >90% green canopy)
- 3 - Moderate (20-90% canopy scorch)
- 4 - High (> 90% canopy scorched, <50% canopy consumed)
- 5 - Extreme (>50% canopy biomass consumed)
For simplicity, let’s recode fire_extent
to whether an area is burnt or unburnt and save this info in a new column burnt
. We’ll use a dummy variable to make our results easier to interpret where burnt is 1
and unburnt is -1
. Then we’ll rename the layer.
<- fire_extent_cropped |>
burnt_cropped mutate(
# make `burnt` column
burnt = case_when(
>= 1 ~ 1, # burnt
fire_extent == 0 ~ -1, # unburnt
fire_extent .default = NA_integer_
)|>
) select(-fire_extent) # remove fire_extent column
# Rename for simplicity
names(burnt_cropped) <- "burnt"
burnt_cropped
class : SpatRaster
dimensions : 2053, 2054, 1 (nrow, ncol, nlyr)
resolution : 0.001217593, 0.001217593 (x, y)
extent : 148.4994, 151.0003, -37.5001, -35.00038 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
name : burnt
min value : -1
max value : 1
Match resolution
A final important step is to make sure our layers are projected at the same resolution. Aligning rasters allows for more accurate and reliable results because grid cells won’t overlap in unexpected ways (which would affect our model). Typically, the resolution should match the layer with the lowest resolution. In our case, this is elevation_aligned
layer, which you can see if you print the object to the console and compare with other layers.
elevation_aligned
class : SpatRaster
dimensions : 2053, 2054, 1 (nrow, ncol, nlyr)
resolution : 0.001217593, 0.001217593 (x, y)
extent : 148.4994, 151.0003, -37.5001, -35.00038 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
source(s) : memory
varname : file6d6488e4000
name : elevation
min value : -728
max value : 1901
Let’s resample each layer to match the resolution of elevation_aligned
using terra::resample()
.
# Resample our tree cover to the resolution of the elevation data
<- terra::resample(tree_cover, elevation_aligned)
tree_cover <- terra::resample(temp, elevation_aligned)
temp <- terra::resample(precip, elevation_aligned)
precip <- terra::resample(burnt_cropped, elevation_aligned) burnt
Combine spatial layers
We can now combine all of our raster layers into one object containing:
- Elevation
- Tree cover
- Mean annual temperature (BIO1)
- Annual precipitation (BIO12)
- Burnt area from the 2019-2020 bushfires
<- c(elevation_aligned, tree_cover, temp, precip, burnt)
combined_rasters
combined_rasters
class : SpatRaster
dimensions : 2053, 2054, 5 (nrow, ncol, nlyr)
resolution : 0.001217593, 0.001217593 (x, y)
extent : 148.4994, 151.0003, -37.5001, -35.00038 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
source(s) : memory
varnames : file6d6488e4000
file6d6488e4000
file6d6488e4000
...
names : elevation, treecover, temp_bio1, precip_bio12, burnt
min values : -728, 0.00000, 6.293441, 468.7325, -1
max values : 1901, 94.93295, 18.281870, 1358.0767, 1
Phew! That was a lot of data preparation. But with our spatial data sorted, we are now ready to begin preparing our model!
Build our model
If you’re new to species distribution modeling (SDM) or want a deeper dive into the details of species distribution modelling using tidymodels, check out this ALA Labs post. We’ll use a similar workflow but will spend less time explaining the ins-and-outs of how species distribution modelling works.
Prepare data
First, we will thin our data so that there is only one glider observation in any individual grid cell, so that each grid cell contains a “presence” or “absence”.
<- tidysdm::thin_by_cell(gliders_sf,
gliders_thin raster = combined_rasters)
Next, we will add pseudo-absences4 to our data because our glider observations from the ALA are presence-only5.
# Generate pseudo-absences
<- tidysdm::sample_pseudoabs(
gliders_pseudoabs
gliders_thin,n = 3 * nrow(combined_rasters),
raster = combined_rasters,
method = c("dist_min", tidysdm::km2m(5))
)
# Extract environmental data for each pseudo-absence
<- gliders_pseudoabs |>
gliders_pseudoabs # Extract tree cover, elevation and climate values for pseudoabs points
bind_cols(
::extract(combined_rasters,
terra
gliders_pseudoabs,ID = FALSE)
)
gliders_pseudoabs
Simple feature collection with 7849 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.5 ymin: -37.49949 xmax: 150.8305 ymax: -35.00099
Geodetic CRS: WGS 84
# A tibble: 7,849 × 7
class geometry elevation treecover temp_bio1 precip_bio12
* <fct> <POINT [°]> <dbl> <dbl> <dbl> <dbl>
1 presence (149.5415 -35.71064) 1111 82.8 11.2 835.
2 presence (149.444 -36.59851) 920 87.1 11.7 861.
3 presence (149.5139 -35.60314) 1041 82.3 11.4 781.
4 presence (148.9655 -37.19917) 994 90 10.5 948.
5 presence (150.2896 -35.48085) 143 90.0 16.9 1010.
6 presence (149.5631 -35.93422) 970 71.7 11.5 884.
7 presence (149.4088 -36.74284) 947 85.4 11.3 863.
8 presence (148.8274 -37.13912) 979 90 10.5 925.
9 presence (150.2758 -35.48776) 128 86.4 17.0 970.
10 presence (149.5228 -35.86793) 1233 76.5 10.3 889.
# ℹ 7,839 more rows
# ℹ 1 more variable: burnt <dbl>
And we’ll join our pseudo-absences to our thinned presence data gliders_thin
to attach any missing columns.
<- gliders_pseudoabs |>
gliders_joined st_join(
gliders_thin, left = TRUE
)
gliders_joined
Simple feature collection with 7849 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.5 ymin: -37.49949 xmax: 150.8305 ymax: -35.00099
Geodetic CRS: WGS 84
# A tibble: 7,849 × 14
class geometry elevation treecover temp_bio1 precip_bio12
* <fct> <POINT [°]> <dbl> <dbl> <dbl> <dbl>
1 presence (149.5415 -35.71064) 1111 82.8 11.2 835.
2 presence (149.444 -36.59851) 920 87.1 11.7 861.
3 presence (149.5139 -35.60314) 1041 82.3 11.4 781.
4 presence (148.9655 -37.19917) 994 90 10.5 948.
5 presence (150.2896 -35.48085) 143 90.0 16.9 1010.
6 presence (149.5631 -35.93422) 970 71.7 11.5 884.
7 presence (149.4088 -36.74284) 947 85.4 11.3 863.
8 presence (148.8274 -37.13912) 979 90 10.5 925.
9 presence (150.2758 -35.48776) 128 86.4 17.0 970.
10 presence (149.5228 -35.86793) 1233 76.5 10.3 889.
# ℹ 7,839 more rows
# ℹ 8 more variables: burnt <dbl>, recordID <chr>, scientificName <chr>,
# taxonConceptID <chr>, fire_period <chr>, eventDate <dttm>,
# occurrenceStatus <chr>, dataResourceName <chr>
Earlier, we categorised our greater glider observations as pre_fire
or post_fire
. We will need to categorise our pseudo-absences into the same categories. Doing so allows our model to generate a probability of presence in both conditions.
As a simple way to categorise our pseudo-absence points, we’ll use sample()
to randomly assign them to pre_fire
or post-fire
which will assign a more-or-less equal number of randomly selected pseudo-absence points for both timeframes6. Doing this with case_when()
allows us to conditionally assign a new category only when there isn’t one already (i.e. only for pseudo-absence points).
# Assign `pre_fire` and `post_fire` categories to pseudoabsences
<- gliders_joined |>
gliders_categorised mutate(
fire_period = case_when(
# If there is no category already, sample one
is.na(fire_period) ~ sample(c("pre_fire", "post_fire"),
size = nrow(gliders_joined),
replace = TRUE),
.default = fire_period
)|>
) select(1:2, fire_period, everything()) # reorder
gliders_categorised
Simple feature collection with 7849 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.5 ymin: -37.49949 xmax: 150.8305 ymax: -35.00099
Geodetic CRS: WGS 84
# A tibble: 7,849 × 14
class geometry fire_period elevation treecover temp_bio1
<fct> <POINT [°]> <chr> <dbl> <dbl> <dbl>
1 presence (149.5415 -35.71064) post_fire 1111 82.8 11.2
2 presence (149.444 -36.59851) post_fire 920 87.1 11.7
3 presence (149.5139 -35.60314) pre_fire 1041 82.3 11.4
4 presence (148.9655 -37.19917) pre_fire 994 90 10.5
5 presence (150.2896 -35.48085) post_fire 143 90.0 16.9
6 presence (149.5631 -35.93422) post_fire 970 71.7 11.5
7 presence (149.4088 -36.74284) pre_fire 947 85.4 11.3
8 presence (148.8274 -37.13912) pre_fire 979 90 10.5
9 presence (150.2758 -35.48776) pre_fire 128 86.4 17.0
10 presence (149.5228 -35.86793) post_fire 1233 76.5 10.3
# ℹ 7,839 more rows
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
# scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
# occurrenceStatus <chr>, dataResourceName <chr>
pre-fire
or post-fire
To make the modelled results of our fire_period
column easier to interpret, let’s reformat our categories to a dummy variable format, where pre_fire
is -1
and post_fire
is 1
. Whichever number our coefficient is closer to will tell us which period had more greater glider observations.
# Change fire period to a dummy variable
<- gliders_categorised |>
gliders_filtered mutate(
fire_period = as.numeric(
if_else(fire_period == "post_fire", 1, -1) # post_fire = 1
)
)
|> head(5L) gliders_filtered
Simple feature collection with 5 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.9655 ymin: -37.19917 xmax: 150.2896 ymax: -35.48085
Geodetic CRS: WGS 84
# A tibble: 5 × 14
class geometry fire_period elevation treecover temp_bio1
<fct> <POINT [°]> <dbl> <dbl> <dbl> <dbl>
1 presence (149.5415 -35.71064) 1 1111 82.8 11.2
2 presence (149.444 -36.59851) 1 920 87.1 11.7
3 presence (149.5139 -35.60314) -1 1041 82.3 11.4
4 presence (148.9655 -37.19917) -1 994 90 10.5
5 presence (150.2896 -35.48085) 1 143 90.0 16.9
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
# scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
# occurrenceStatus <chr>, dataResourceName <chr>
Train our model
Our data is ready to be used for model training and testing. Let’s split our data into training and testing datasets, allocating ~75% of points to training and ~25% to testing.
set.seed(100)
# Allocate data into training or testing datasets
<-
gliders_split |>
gliders_filtered initial_split()
gliders_split
<Training/Testing/Total>
<5886/1963/7849>
# Create datasets
<- training(gliders_split)
gliders_train <- testing(gliders_split) gliders_test
|> head(5L) gliders_train
Simple feature collection with 5 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.858 ymin: -36.86635 xmax: 150.4548 ymax: -35.07283
Geodetic CRS: WGS 84
# A tibble: 5 × 14
class geometry fire_period elevation treecover temp_bio1
<fct> <POINT [°]> <dbl> <dbl> <dbl> <dbl>
1 pseudoabs (149.7846 -36.86635) 1 512 85.1 14.1
2 presence (150.4548 -35.35563) 1 86 70.9 17.5
3 pseudoabs (149.0406 -35.76564) -1 1353 67.0 9.68
4 pseudoabs (148.858 -35.07283) 1 643 0 13.6
5 pseudoabs (149.0455 -35.39184) -1 621 0.172 13.6
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
# scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
# occurrenceStatus <chr>, dataResourceName <chr>
|> head(5L) gliders_test
Simple feature collection with 5 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.617 ymin: -37.19054 xmax: 149.6419 ymax: -35.35184
Geodetic CRS: WGS 84
# A tibble: 5 × 14
class geometry fire_period elevation treecover temp_bio1
<fct> <POINT [°]> <dbl> <dbl> <dbl> <dbl>
1 presence (149.5139 -35.60314) -1 1041 82.3 11.4
2 presence (148.8274 -37.13912) -1 979 90 10.5
3 presence (148.617 -35.35184) 1 1175 69.8 10.3
4 presence (149.6419 -35.35771) 1 899 81.7 12.2
5 presence (148.8454 -37.19054) 1 919 78.0 10.9
# ℹ 8 more variables: precip_bio12 <dbl>, burnt <dbl>, recordID <chr>,
# scientificName <chr>, taxonConceptID <chr>, eventDate <dttm>,
# occurrenceStatus <chr>, dataResourceName <chr>
Now we’ll resample our data into folds (i.e. smaller resampled datasets) using block cross-validation, a type of resampling better-suited to spatial data7. We will use these folds to train and tune our model.
# Perform Cross validation
<- spatial_block_cv(gliders_train, v = 5)
gliders_cv
gliders_cv
# 5-fold spatial block cross-validation
# A tibble: 5 × 2
splits id
<list> <chr>
1 <split [4873/1013]> Fold1
2 <split [4613/1273]> Fold2
3 <split [4426/1460]> Fold3
4 <split [4890/996]> Fold4
5 <split [4742/1144]> Fold5
Next, we’ll define our model’s “recipe”. We wish to test how our response variable class
(presence or absence) is affected by each predictor variable (burnt
, fire_period
, elevation
, temp_bio1
, temp_bio12
, and treecover
). We’ll also add an interaction between burnt
and fire_period
because we expect greater glider occurrences to be more strongly affected in burnt areas and after the fires.
<- recipe(
gliders_recipe
gliders_train, formula = class ~ burnt + fire_period + elevation + temp_bio1 + precip_bio12 + treecover
|>
) step_interact(terms = ~burnt:fire_period)
gliders_recipe
Now we can set our workflow, which uses our gliders_recipe
and training data to train several types of models (see tidysdm for more information on these model specifications).
<-
gliders_models workflow_set(
preproc = list(default = gliders_recipe), # Use the same recipe for all
models = list(
glm = sdm_spec_glm(), # Generalised Linear Model
rf = sdm_spec_rf(), # Random Forest
gbm = sdm_spec_boost_tree(), # Gradient Boosting Machine
maxent = sdm_spec_maxent() # Maximum Entropy
),cross = TRUE
|>
) option_add(control = control_ensemble_grid())
gliders_models
# A workflow set/tibble: 4 × 4
wflow_id info option result
<chr> <list> <list> <list>
1 default_glm <tibble [1 × 4]> <opts[1]> <list [0]>
2 default_rf <tibble [1 × 4]> <opts[1]> <list [0]>
3 default_gbm <tibble [1 × 4]> <opts[1]> <list [0]>
4 default_maxent <tibble [1 × 4]> <opts[1]> <list [0]>
We’re ready to tune and optimise our models using our workflow above and the cross validation blocks we generated earlier. This step helps us find which model parameters and hyperparameters make reasonable predictions.
Using autoplot()
, we can quickly see which models performed best using three performance metrics for evaluating species distribution models (boyce_cont
, roc_auc
and tss_max
). In general, results are pretty varied depending on the metric, though Maxent models performed best according to the Boyce Continuous Index.
set.seed(9999)
# Tune the model using cross validation
<-
gliders_models_tune |>
gliders_models workflow_map("tune_grid",
resamples = gliders_cv, # Use our cross-validation blocks for tuning
grid = 6, # number of tuning iterations
metrics = sdm_metric_set(), # Evaluate model performance
verbose = TRUE,
control = stacks::control_stack_grid()
)
gliders_models_tune
# A workflow set/tibble: 4 × 4
wflow_id info option result
<chr> <list> <list> <list>
1 default_glm <tibble [1 × 4]> <opts[4]> <rsmp[+]>
2 default_rf <tibble [1 × 4]> <opts[4]> <tune[+]>
3 default_gbm <tibble [1 × 4]> <opts[4]> <tune[+]>
4 default_maxent <tibble [1 × 4]> <opts[4]> <tune[+]>
autoplot(gliders_models_tune)
To help improve accuracy and generalisation of our model predictions, let’s “stack” our models into an ensemble model to blend predictions. Again using autoplot()
, we can see the relative weighting each model adds to our final prediction.
set.seed(98765)
<-
gliders_stacked stacks() |> # Initialize the stack
add_candidates(gliders_models_tune) |> # Add models
blend_predictions() |> # Blend their predictions
fit_members() # Fit the final model
# See model contribution
autoplot(gliders_stacked,
type = "weights")
Now that we have our stacked ensemble model, it’s time to test how well it performs on new data. We do this by making predictions about the points in gliders_test
and comparing them to the “true” results.
# Predict probability of presence
<-
gliders_test_predictions %>%
gliders_test bind_cols(predict(gliders_stacked, .,
type = "prob",
save_pred = TRUE))
|>
gliders_test_predictions select(class, .pred_pseudoabs, .pred_presence, everything()) # reorder cols
Simple feature collection with 1963 features and 15 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 148.5 ymin: -37.49706 xmax: 150.8 ymax: -35.00221
Geodetic CRS: WGS 84
# A tibble: 1,963 × 16
class .pred_pseudoabs .pred_presence geometry fire_period
<fct> <dbl> <dbl> <POINT [°]> <dbl>
1 presence 0.153 0.847 (149.5139 -35.60314) -1
2 presence 0.0680 0.932 (148.8274 -37.13912) -1
3 presence 0.578 0.422 (148.617 -35.35184) 1
4 presence 0.244 0.756 (149.6419 -35.35771) 1
5 presence 0.176 0.824 (148.8454 -37.19054) 1
6 presence 0.403 0.597 (149.4186 -36.57799) -1
7 presence 0.0669 0.933 (148.923 -37.2277) -1
8 presence 0.261 0.739 (149.507 -35.80711) 1
9 presence 0.0761 0.924 (150.2911 -35.65586) 1
10 presence 0.288 0.712 (148.659 -35.25845) 1
# ℹ 1,953 more rows
# ℹ 11 more variables: elevation <dbl>, treecover <dbl>, temp_bio1 <dbl>,
# precip_bio12 <dbl>, burnt <dbl>, recordID <chr>, scientificName <chr>,
# taxonConceptID <chr>, eventDate <dttm>, occurrenceStatus <chr>,
# dataResourceName <chr>
Using tidysdm::sdm_metric_set()
, we can print several helpful metrics about our model’s performance. Overall, our model seems to have performed very well8.
# Evaluate performance
|>
gliders_test_predictions sdm_metric_set()(truth = class, .pred_presence)
# A tibble: 3 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 boyce_cont binary 0.869
2 roc_auc binary 0.969
3 tss_max binary 0.843
boyce_cont
: The Boyce Continuous Index is from -1 to 1, where values closer to 1 indicate the model’s predictions are consistent with the actual value.roc_auc
: The Relative Operating Characteristics Area Under the Curve (ROC AUC) is from 0.5 to 1, where a value of 0.5 indicates that the model’s predictions are no better than random chance, whereas a value of 1 indicates perfect prediction.tss_max
: The True Skill Statistic is from -1 to 1, where values closer to 1 indicate better model performance.
Greater glider distribution
We are nearly ready to map our predicted distribution. There’s just one final step before predicting a distribution surface.
Our model includes the variable fire_period
, a categorical variable without an associated spatial layer. However, in order for our model to predict a spatial surface with fire_period
included (and it must be included because our model uses it), we have to create a raster based on the point locations and values of fire_period
. We’ll follow the same general steps that we did earlier, building a raster, cropping and matching the resolution of the other layers.
# Add fire_period as a spatial layer
<- gliders_filtered |>
fire_period_rast select(fire_period) |>
::vect() |> # convert to vector
terra::rast() |> # rasterise
terra::crop(bbox_ext) |> # crop to bounding box
terra::resample(elevation_aligned) # match resolution
terra
# add fire_period data to raster
::values(fire_period_rast) <- gliders_filtered$fire_period
terranames(fire_period_rast) <- "fire_period" # Rename for simplicity
fire_period_rast
class : SpatRaster
dimensions : 2053, 2054, 1 (nrow, ncol, nlyr)
resolution : 0.001217593, 0.001217593 (x, y)
extent : 148.4994, 151.0003, -37.5001, -35.00038 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
varname : file6d6488e4000
name : fire_period
min value : -1
max value : 1
Now we can join fire_period_rast
with the other raster layers
<- c(combined_rasters, fire_period_rast)
combined_rasters_complete combined_rasters_complete
class : SpatRaster
dimensions : 2053, 2054, 6 (nrow, ncol, nlyr)
resolution : 0.001217593, 0.001217593 (x, y)
extent : 148.4994, 151.0003, -37.5001, -35.00038 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
source(s) : memory
varnames : file6d6488e4000
file6d6488e4000
file6d6488e4000
...
names : elevation, treecover, temp_bio1, precip_bio12, burnt, fire_period
min values : -728, 0.00000, 6.293441, 468.7325, -1, -1
max values : 1901, 94.93295, 18.281870, 1358.0767, 1, 1
Ok! We’re finally ready to make our final prediction of greater gliders’ distribution. We’ll use tidysdm::predict_raster()
, a very helpful function for spatial predictions, and maps our results!
# Predict
<- predict_raster(gliders_stacked,
prediction_present
combined_rasters_complete, type = "prob",
wopt = list(steps=32))
# Map
ggplot() +
geom_spatraster(data = prediction_present,
aes(fill = .pred_presence)) +
scale_fill_viridis_c(option = "E", na.value = NA) +
guides(fill = guide_colorbar(
title = "Relative\nHabitat\nSuitability")
+
) labs(title = "Predicted distribution of greater gliders",
subtitle = "Southeast New South Wales") +
::theme_pilot(grid="hv") +
pilottheme(
legend.text = element_text(hjust = 0.5)
)
As we used pseudo-absences rather than true absences, it’s best to interpret the map as showing us the relative habitat suitability for greater gliders across the area (as opposed to a true distribution of where greater gliders occur). The highest habitat suitability for greater gliders appears to be along the middle forest area where most of our greater glider observations were located (which makes sense), along with a few larger areas in the upper and lower west, and near the upper coastline.
Impact of fire
As the second part of our investigation, we wanted to know whether the 2019/2020 bushfires had an impact on greater gliders. Our hypothesis was that there are fewer greater glider occurrences after the fire, and a driver of this effect is that an area was burnt during the 2019/2020 bushfires.
To see the more traditional statistical test results (e.g., estimates, test statistics, p-values), we could extract them from our Generalised Linear Model (GLM) which we ran over our tuning process. These types of summary statistics are easier to interpret, but they might not be the best option for us to use.
Generalised Linear Models fit a straight line of best-fit to the data. As a result, it’s possible to return a coefficient that estimates the slope of the line and some test statistics about each variable’s importance in the model.
Other models we’ve run like Maxent and Random Forest models are a little different, though. They use weights to “penalize” a model to get a better fitted line; one that isn’t just straight, but shaped and curved to better fit the data. This shaped, squiggly line can be represented by a unique equation, not just a coefficient of slope. When asked to return results for models like these, tidymodels returns penalty values (lambda) used by the model, not a summary of statistics and p values.
We have to use other packages (like {vip} and {DALEX}) to assess more complex types of models like our complete gliders_stacked
model.
To see what we mean, let’s have a look at our GLM results. Before we see them, let’s remind ourselves what we are expecting to see given our hypothesis:
- Fewer occurrences after the fire. This should appear as a significant, positive
fire_period
coefficient. - Fewer occurrences in burnt areas. This should appear as a significant, positive
burnt
coefficient. - An interaction between fire period and burnt area, because burnt areas after fire will have very few records9.
Code
# See the models that performed best
# gliders_models_tune |>
# rank_results(
# rank_metric = "boyce_cont",
# select_best = TRUE
# )
# extract best glm model
<- gliders_models_tune |>
best_model extract_workflow(id = "default_glm")
# use best glm to fit complete `gliders_filtered` data
<-
fitted |>
best_model fit(data = gliders_filtered)
# see results
|>
fitted ::tidy() broom
# A tibble: 8 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 20.3 1.35 15.0 5.27e- 51
2 burnt 0.286 0.0408 7.02 2.15e- 12
3 fire_period -0.194 0.0362 -5.35 8.61e- 8
4 elevation -0.00586 0.000418 -14.0 8.76e- 45
5 temp_bio1 -0.729 0.0727 -10.0 1.18e- 23
6 precip_bio12 0.000354 0.000306 1.16 2.48e- 1
7 treecover -0.0836 0.00293 -28.5 1.68e-178
8 burnt_x_fire_period -0.115 0.0359 -3.21 1.32e- 3
When running a model on this many data points, it’s generally expected that most variables will be significant. It’s easier to look at the size of the effect (i.e. statistic
) and the estimate & standard error to figure out the direction and certainty of that estimate. The estimates are a little confusing here due to the model using presence
as the reference level—higher numbers are a stronger prediction of absence.
A quick first impression tells us that fire_period
and burnt
variables are significant but relatively weak predictors. You might also notice that some variables like elevation
are very significant in the model but have tiny estimate coefficients. Let’s use DALEX to help us understand how this is possible.
Use DALEX
If we want to understand more about our more complex stacked model, we can use the DALEXtra package. DALEX10 is a model-agnostic tool for assessing complex models.
library(DALEXtra)
DALEX assumes the opposite reference level to tidymodels for our outcome variable class
. So to make the results of DALEX make sense, we’ll need to recode the predicted values of our model to flip the 1’s and 0’s. We can do this by returning the results of our best model (according to the metric boyce_cont
), then we can use extract_mold()
to see the components of the model, extract the outcome values, and recode the 1’s and 0’s.
# get the results of the best fit model
<- fit_best(gliders_models_tune, metric = "boyce_cont")
best_fit
# flip `class` variable coding
<- best_fit |>
class_revised ::extract_mold() |> # components of fitted model results
workflowsets::pluck("outcomes") |> # pluck predicted 'class' from list
purrr::pull() |> # pull values from tibble
dplyrcase_match("pseudoabs" ~ 1, "presence" ~ 0) # recode
|> head(30L) class_revised
[1] 1 0 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1
The rest is very straightforward (thank goodness). We can feed our stacked model, data and revised class
variable into explain_tidymodels()
to do exactly that!
<- explain_tidymodels(gliders_stacked,
test data = gliders_train,
y = class_revised)
Preparation of a new explainer is initiated
-> model label : list ( default )
-> data : 5886 rows 14 cols
-> data : tibble converted into a data.frame
-> target variable : 5886 values
-> predict function : yhat.model_stack will be used ( default )
-> predicted values : No value for predict function target column. ( default )
-> model_info : package stacks , ver. 1.0.5 , task classification ( default )
-> predicted values : numerical, min = 0.04100672 , mean = 0.76612 , max = 0.9738481
-> residual function : difference between y and yhat ( default )
-> residuals : numerical, min = -0.9528437 , mean = 0.01743423 , max = 0.9202559
A new explainer has been created!
Let’s see which parts of our model had the greatest importance for predicting the outcome. You’ll notice these results look similar to the results of our GLM.
|>
test model_parts() |>
plot(show_boxplots = FALSE)
We can also see the partial effects of our model, which show how the model is fitting each individual variable effect. Here we can appreciate the model’s complexity, as each variable has its own non-linear fitted line! DALEX can return many other helpful summaries to check model performance and results. See more here.
model_profile(test,
type = "partial",
variables = c("burnt", "fire_period", "treecover", "temp_bio1", "precip_bio12", "elevation")
|>
) plot()
Overall, our results seem to show that fire does have an effect, but it’s a relatively small effect on greater glider occurrences.
Here we can see a few things:
- Tree cover and elevation are excellent predictors of greater gliders, which makes sense given they live in trees in alpine regions.
- The negative estimate of
fire_period
suggests that there are more records of greater gliders after the fires than before11. This is probably due to the growing number of records each year in the ALA (though the effect is relatively small). - The slight positive estimate of
burnt
suggests fewer records in burnt areas12. However, the effect is pretty small.
Final thoughts
The 2019-2020 bushfires had a huge impact on the Australian environment. However, it wasn’t all negative; across taxonomic groups there was a wide range of negative and positive effects from the fire. It might not all be “doom-and-gloom” after all, thanks to the resilience of many Australian species to fires.
Our investigation finds that despite the general negative impact of severe fire on forests, greater gliders may have been less immediately affected in the aftermath of the 2019-2020 fires than other mammals and marsupials. In the short-term, glider distributions after megafires seem generally stable. However, other areas in NSW were more negatively impacted, and they may deteriorate over longer periods. Ultimately, factors like the density of hollowed den trees may determine how well greater gliders survive after fire, not only the burnt forest area.
To learn more on ALA Labs, check out these posts on using tidymodels and tidysdm and mapping multiple overlapping species distributions.
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.3 (2025-02-28 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2025-04-04
pandoc 3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
broom * 1.0.7 2024-09-26 [1] CRAN (R 4.4.1)
DALEX * 2.4.3 2023-01-15 [1] CRAN (R 4.3.1)
DALEXtra * 2.3.0 2023-05-26 [1] CRAN (R 4.3.1)
dials * 1.3.0 2024-07-30 [1] CRAN (R 4.4.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
elevatr * 0.99.0 2023-09-12 [1] CRAN (R 4.3.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2)
galah * 2.1.1 2025-02-07 [1] CRAN (R 4.4.2)
geodata * 0.6-2 2024-06-10 [1] CRAN (R 4.4.1)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.3)
glmnet * 4.1-8 2023-08-22 [1] CRAN (R 4.3.1)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.3.2)
infer * 1.0.7 2024-03-25 [1] CRAN (R 4.4.1)
lubridate * 1.9.4 2024-12-08 [1] CRAN (R 4.4.2)
Matrix * 1.7-0 2024-03-22 [1] CRAN (R 4.4.1)
maxnet * 0.1.4 2021-07-09 [1] CRAN (R 4.3.2)
modeldata * 1.4.0 2024-06-19 [1] CRAN (R 4.4.1)
ozmaps * 0.4.5 2021-08-03 [1] CRAN (R 4.3.2)
parsnip * 1.2.1 2024-03-22 [1] CRAN (R 4.4.1)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
ranger * 0.16.0 2023-11-12 [1] CRAN (R 4.3.2)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.3)
recipes * 1.1.0 2024-07-04 [1] CRAN (R 4.4.1)
rsample * 1.2.1 2024-03-25 [1] CRAN (R 4.4.1)
scales * 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
sf * 1.0-19 2024-11-05 [1] CRAN (R 4.4.2)
spatialsample * 0.6.0 2024-10-02 [1] CRAN (R 4.4.1)
stacks * 1.0.5 2024-07-22 [1] CRAN (R 4.4.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
terra * 1.8-29 2025-02-26 [1] CRAN (R 4.4.3)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2)
tidymodels * 1.2.0 2024-03-25 [1] CRAN (R 4.4.1)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.3)
tidysdm * 0.9.5 2024-06-23 [1] CRAN (R 4.4.1)
tidyterra * 0.6.1 2024-06-08 [1] CRAN (R 4.4.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2)
tune * 1.2.1 2024-04-18 [1] CRAN (R 4.4.1)
workflows * 1.1.4 2024-02-19 [1] CRAN (R 4.4.1)
workflowsets * 1.1.0 2024-03-21 [1] CRAN (R 4.4.1)
xgboost * 1.7.8.1 2024-07-24 [1] CRAN (R 4.4.1)
yardstick * 1.3.1 2024-03-21 [1] CRAN (R 4.4.1)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.4.3/library
──────────────────────────────────────────────────────────────────────────────
Footnotes
Southern greater gliders grow up to 1m long from head to tail and can glide up to 100 metres through the canopy!↩︎
For more info about what rasters are, see the “What’s a raster” section from this ALA Labs article.↩︎
Pseudo-absences, also called background points, are points that represent true absences in our data for modelling. They are there to help our model make predictions about what variables more strongly predict a presence vs an absence. Because we don’t have true presence data, this is one way to provide a model this information, though it is less informative for our interpretation.↩︎
They only provide information about the conditions of when a glider was observed but no information about when gliders aren’t observed↩︎
For this analysis, it doesn’t really matter whether we have a perfect split.↩︎
For more info on what this means, see this section from another ALA Labs article↩︎
So well, in fact, that it’s plausible it might be overfitting, making our model poor for prediction outside of our small spatial bounding box in New South Wales. For our intended analysis, this doesn’t matter, but for a model intended for use to make broad future predictions this could pose a problem.↩︎
To the extent that the combination of an area being
burnt
andpost_fire
might multiply the effect when together.↩︎DALEX stands for moDel Agnostic Language for Exploration and eXplanation. It may be one of the strangest acronyms of all time.↩︎
Based on how our variable is coded, a negative coefficient means that before fires, there is a higher likelihood of absence.↩︎
Based on how our variable is coded, a positive coefficient means that in burnt areas, there is a higher likelihood of absence.↩︎