import galah
import pandas as pd
import geopandas
import numpy as np
from dateutil.parser import parse
import matplotlib.pyplot as plt
import matplotlib as mpl
import alphashape
from flexitext import flexitext
Humans’ movement across the globe has led to the accidental, and sometimes deliberate, transportation of species beyond their native habitats. In Australia since European colonisation, around 3,000 species have been introduced.
Within the last 200 years over 100 native species have gone extinct, with invasive species labelled as affecting 82% (1,257 of 1,533) of Australia’s threatened taxa in 2018. Since 1960, invasive species have cost the Australian economy at least $390 billion in damages, and are now considered a main driver of extinctions in native plants and animals.
However, species from outside of Australia aren’t the only ones that can encroach on other species’ habitats. Native Australian species can do it, too. Thanks in part to human activity, changing temperatures and more frequent extreme weather events, some Australian species have established themselves in new areas outside of their native range. Although not as popularly discussed, Australian species that have become pests in new habitats can disrupt ecosystems much like internationally invasive species.
In this post, we will use Python and the {galah} package to visualise how distributions of both international invasive species and native introduced pest species have shifted over time. To do this, we will use alpha shapes to visualise the distribution of Rhinella marina (Cane toads) since the 1930s and create a choropleth map to visualise the expanded habitat range of Pittosporum undulatum.
Invasive Species
Download data
To start, we will use the infamous example of the cane toad to illustrate how far an invasive species’ distribution can spread each decade.
First load the required Python packages.
Next, we will use the {galah} package to download occurrence records of cane toads in Australia from the Atlas of Living Australia (ALA). You will need to first provide a registered email with the ALA using galah.galah_config()
before retrieving records.
# Add registered email (register at ala.org.au)
= "your-email@email.com")
galah.galah_config(email ="ALA") galah.galah_config(data_profile
= galah.atlas_occurrences(taxa = "Rhinella marina", use_data_profile = True)
cane_toads 5) cane_toads.head(
decimalLatitude | decimalLongitude | eventDate | scientificName | taxonConceptID | recordID | dataResourceName | occurrenceStatus | |
---|---|---|---|---|---|---|---|---|
0 | -38.300000 | 145.000000 | 1990-04-23T00:00:00Z | Rhinella marina | https://biodiversity.org.au/afd/taxa/e79179f8-... | 58772bea-1c61-4716-a453-f201e89fcc8d | Museums Victoria provider for OZCAM | PRESENT |
1 | -38.300000 | 145.000000 | 1990-07-21T00:00:00Z | Rhinella marina | https://biodiversity.org.au/afd/taxa/e79179f8-... | 141f0b10-bdf2-4239-80e4-1b5167e1f76c | Museums Victoria provider for OZCAM | PRESENT |
2 | -37.820000 | 145.230000 | 1990-04-14T00:00:00Z | Rhinella marina | https://biodiversity.org.au/afd/taxa/e79179f8-... | f6d07eb2-4f8f-476e-a212-726b10a4a745 | Museums Victoria provider for OZCAM | PRESENT |
3 | -36.431411 | 148.329322 | 2017-03-07T00:00:00Z | Rhinella marina | https://biodiversity.org.au/afd/taxa/e79179f8-... | 549908bf-0b34-4227-9288-42a7fa52dabf | NSW BioNet Atlas | PRESENT |
4 | -36.250765 | 149.120224 | 2021-02-04T00:00:00Z | Rhinella marina | https://biodiversity.org.au/afd/taxa/e79179f8-... | ae36a792-f694-4a73-acfb-e56be564def6 | NSW BioNet Atlas | PRESENT |
Clean data
We’ll clean our data to ensure that there are no null or missing values in our coordinates and date fields. Because galah.atlas_occurrences()
returns a Pandas dataframe, we have plenty of functions we can use to clean our data.
= cane_toads.dropna(subset=["eventDate", "decimalLatitude", "decimalLongitude"]) cane_toads
We want to map cane toad’s distribution each decade in our final visualisation. However, the eventDate
value for each record is formaatted as a string value yyyy-mm-dd Thh:mm:ssZ
. Let’s write our own function convert_date_to_decade()
that extract the year from a date string
and return its corresponding decade by rounding down to the nearest decade.
def convert_date_to_decade(value):
= parse(value)
date return date.year - (date.year%10)
We’ll create our new decade
column by mapping each record’s date value in eventDate
to its corresponding decade value.
"decade"] = cane_toads["eventDate"].map(convert_date_to_decade) cane_toads[
Make Australia map
Next, let’s download a shapefile of Australia with state boundaries. The Australian Bureau of Statistics provides digital boundary files from which you can explore many other Australian shapefiles. Download the States and Territories - 2021 - Shapefile a zip folder. Save the zip folder inside your working folder and then unzip it to access the .shp
file inside.
{GeoPandas
} is a package that handles geospatial data in Python and can be used to load in shapefiles as GeoPandas
dataframes. Let’s test this out by plotting our Australian state boundary shapefile.
'figure.dpi'] = 1200 # generate a high resolution image
mpl.rcParams[= geopandas.read_file("Australia_state_boundaries/STE_2021_AUST_GDA2020.shp")
states = "#5A5A5A", linewidth = 0.5, facecolor = "white") states.plot(edgecolor
Generate alpha shapes
Alpha shapes can be used to define and visualise the shape of a set of species occurrence points in space. They are useful because they can be generated on data-deficient species with few available observations, and without using environmental data or complex algorithms. Let’s use alpha shapes to see how cane toads’ distribution has changed each decade since they were introduced.
First, we need to obtain a list of all decades with cane toad observations. We’ll use the decade
column from our cane_toads
dataframe to group our observations.
= list(set(cane_toads["decade"])) decades
We will be using the {alphashape
} package to create alpha shapes representing the cane toad distribution for each decade they have been observed. The alphashape.alphashape() function requires two things:
- A set of observation coordinates
- An alpha parameter, which sets how tightly the shape’s lines conform to our observations
Let’s make an alpha shape for each decade’s observations. We’ll also add a slight buffer to each alpha shape to smooth out some of its edges. Then we’ll group all the shapes into one large GeoPandas
dataframe.
We used alpha = 1
, but it’s good practice to change this parameter depending on how widely distributed the coordinates of your data are. Also note that alphashape.alphashape()
requires at least 3 data points to calculate an alpha shape.
= geopandas.GeoDataFrame() # GeoPandas data frame to contain all alpha shapes
alpha_shape_gdf for i, d in enumerate(decades):
= cane_toads[["decimalLongitude", "decimalLatitude"]] [cane_toads["decade"] == d]
decade_points if len(decade_points) <= 3:
continue
= alphashape.alphashape(decade_points, 1)
alpha_shape = {"decade": d, "geometry": [alpha_shape.buffer(0.2)]}
d = geopandas.GeoDataFrame(d, crs="EPSG:7844")
tmp_gdf = pd.concat([alpha_shape_gdf, tmp_gdf]) alpha_shape_gdf
WARNING:root:Singular matrix. Likely caused by all points lying in an N-1 space.
Next, let’s clean up our GeoPandas
dataframe so that it is ready for plotting! Sometimes the alphashape.alphashape()
algorithm will produce an empty shape that needs to be removed from the dataframe (this generally happens when the chosen alpha parameter is not appropriate for the supplied set of points). Let’s remove these shapes from our data.
= alpha_shape_gdf[~alpha_shape_gdf["geometry"].is_empty] alpha_shape_gdf
Now let’s format our decade string to display correctly on the figure legend by making sure it’s in YYYYs
format.
"decade_string"] = alpha_shape_gdf["decade"].map(lambda d: str(d) + "s") alpha_shape_gdf[
Finally, because we expect cane toad distributions in earlier decades to be smaller than in recent decades, we’ll need to plot earlier distributions on top of later distributions to avoid covering the earlier ones up. To achieve this, let’s order the alpha shapes in descending order by decade.
='decade', ascending=False, inplace=True) alpha_shape_gdf.sort_values(by
Map alpha shape distributions
Finally, we can plot our alpha shape distributions for each decade onto our map of Australia!
This figure showcases the incredible pace of the cane toad’s spread across northern Australia. Our map shows that cane toads have spread across most of Queensland, the top end of the Northern Territory (from the 1980s to 2010s) and more recently, into the Kimberley region of Western Australia.
= states.boundary.plot(edgecolor="#5A5A5A", linewidth=0.5, facecolor="white", zorder=-1)
ax
= ax, cmap="plasma", column = "decade_string", legend=True, categorical=True)
alpha_shape_gdf.plot(ax = ax.get_legend()
lgd False)
lgd.draw_frame(1.2, 0.8))
lgd.set_bbox_to_anchor((
= "<style: italic>Rhinella marina</> (cane toad) distributions per decade"
title_text 0.5, 1, title_text, va="bottom", ha="center");
flexitext(
= "<color:#5A5A5A, style:italic, size:7>Distributions calculated with alpha hulls of each decade's cane toad observations</>"
caption_text 0.05, 0, caption_text, va="top");
flexitext(
110, 161])
plt.xlim([-45, -8])
plt.ylim(["off")
plt.axis(=-0.15, right=1)
plt.subplots_adjust(left
plt.show()
Other invasive species
Let’s use the same code as above to visualise other invasive species Camelus dromedarius (Feral dromedary camels) and Echium plantagineum (Paterson’s curse).
Code
# Camel
= galah.atlas_occurrences("Camelus dromedarius", use_data_profile="ALA")
camels = camels.dropna(subset=["eventDate", "decimalLatitude", "decimalLongitude"])
camels "decade"] = camels["eventDate"].map(convert_date_to_decade)
camels[= list(set(camels["decade"]))
decades
= geopandas.GeoDataFrame() # GeoPandas data frame to contain all alpha shapes
alpha_shape_gdf
for i, d in enumerate(decades):
= camels[["decimalLongitude", "decimalLatitude"]] [camels["decade"] == d]
decade_points if len(decade_points) <= 3:
continue
= alphashape.alphashape(decade_points, 1)
alpha_shape = {"decade": d, "geometry": [alpha_shape.buffer(0.2)]}
d = geopandas.GeoDataFrame(d, crs="EPSG:4326")
tmp_gdf = pd.concat([alpha_shape_gdf, tmp_gdf])
alpha_shape_gdf
= alpha_shape_gdf[ ~alpha_shape_gdf["geometry"].is_empty]
alpha_shape_gdf "decade_string"] = alpha_shape_gdf["decade"].map(lambda d: str(d) + "s")
alpha_shape_gdf[='decade', ascending=False, inplace=True)
alpha_shape_gdf.sort_values(by
= states.boundary.plot(edgecolor="#5A5A5A", linewidth=0.5, facecolor="white", zorder=-1)
ax
= ax, cmap="plasma", column = "decade", legend=True, categorical=True)
alpha_shape_gdf.plot(ax = ax.get_legend()
lgd False)
lgd.draw_frame(1.2, 0.61))
lgd.set_bbox_to_anchor((
= "<style: italic>Camelus dromedarius</> (dromedary camel) distributions per decade"
title_text 0.5, 1, title_text, va="bottom", ha="center");
flexitext(
= "<color:#5A5A5A, style:italic, size:7>Distributions calculated with alpha hulls of each decade's dromedary camel observations</>"
caption_text 0.05, 0, caption_text, va="top");
flexitext(
110, 161])
plt.xlim([-45, -8])
plt.ylim(["off")
plt.axis(=-0.1, right=1)
plt.subplots_adjust(left
plt.show()
Code
# Paterson's Curse
= galah.atlas_occurrences("Echium plantagineum", use_data_profile="ALA")
opuntia = opuntia.dropna(subset=["eventDate", "decimalLatitude", "decimalLongitude"])
opuntia "decade"] = opuntia["eventDate"].map(convert_date_to_decade)
opuntia[= list(set(opuntia["decade"]))
decades
= geopandas.GeoDataFrame() # GeoPandas data frame to contain all alpha shapes
alpha_shape_gdf
for i, d in enumerate(decades):
= opuntia[["decimalLongitude", "decimalLatitude"]] [opuntia["decade"] == d]
decade_points if len(decade_points) <= 3:
continue
= alphashape.alphashape(decade_points, 1)
alpha_shape = {"decade": d, "geometry": [alpha_shape.buffer(0.2)]}
d = geopandas.GeoDataFrame(d, crs="EPSG:4326")
tmp_gdf = pd.concat([alpha_shape_gdf, tmp_gdf])
alpha_shape_gdf
= alpha_shape_gdf[ ~alpha_shape_gdf["geometry"].is_empty]
alpha_shape_gdf "decade_string"] = alpha_shape_gdf["decade"].map(lambda d: str(d) + "s")
alpha_shape_gdf[='decade', ascending=False, inplace=True)
alpha_shape_gdf.sort_values(by
= states.boundary.plot(edgecolor="#5A5A5A", linewidth=0.5, facecolor="white", zorder=-1)
ax
= ax, cmap="plasma", column = "decade", legend=True, categorical=True)
alpha_shape_gdf.plot(ax = ax.get_legend()
lgd False)
lgd.draw_frame(1.2, 0.85))
lgd.set_bbox_to_anchor((
= "<style: italic>Echium plantagineum</> (Paterson's curse) distributions per decade"
title_text 0.5, 1, title_text, va="bottom", ha="center");
flexitext(
= "<color:#5A5A5A, style:italic, size:7>Distributions calculated with alpha hulls of each decade's Paterson's curse observations</>"
caption_text 0.05, 0, caption_text, va="top");
flexitext(
110, 161])
plt.xlim([-45, -8])
plt.ylim(["off")
plt.axis(=-0.1, right=1)
plt.subplots_adjust(left
plt.show()
Native introduced pest species
When people think of invasive species, they generally think of species that have been introduced to Australia from other countries. However, even Australia’s native species can become pests when introduced to a new ecosystem.
One good example of native pests are the trees Pittosporum undulatum (sometimes called Sweet Pittosporum). These trees have been introduced as ornamental plants in gardens across Australia because of their sweet-scented flowers and bright berries. Although Pittosporum undulatum’s native range extends from southern Queensland to eastern Victoria, it is now considered an environmental weed in many regions where it has been introduced.
Let’s create a choropleth map to visualise the to visualise the bioregions where Pittosporum undulatum is native and introduced.
Download IBRA regions
First, let’s download a shapefile of Australia’s bioregions. The IBRA7 bioregions classify areas within Australia that are geographically and ecologically distinct. Download the zip folder, save it in your project directory and unzip it. We can again use the {GeoPandas} package to read in and handle these data.
= geopandas.read_file("IBRA7_regions/ibra7_regions.shp")
bioregions = "#5A5A5A", linewidth = 0.25, facecolor = "white") bioregions.plot(edgecolor
Within our bioregions
dataframe, the column REG_NAME_7
contains IBRA bioregion names.
5) bioregions.head(
REG_CODE_7 | REG_NAME_7 | HECTARES | SQ_KM | REC_ID | REG_CODE_6 | REG_NAME_6 | REG_NO_61 | FEAT_ID | Shape_Leng | Shape_Area | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ARC | Arnhem Coast | 3.335669e+06 | 33356.686 | 1 | ARC | Arnhem Coast | 81.0 | GA_100K_Islands | 52.135362 | 2.774143 | MULTIPOLYGON (((136.93863 -14.36548, 136.93790... |
1 | ARP | Arnhem Plateau | 2.306023e+06 | 23060.226 | 2 | ARP | Arnhem Plateau | 82.0 | GA_100K_Mainland | 8.206764 | 1.921833 | POLYGON ((134.08186 -12.32124, 134.09525 -12.3... |
2 | AUA | Australian Alps | 1.232981e+06 | 12329.805 | 3 | AA | Australian Alps | 6.0 | GA_100K_Mainland | 63.337626 | 1.242821 | MULTIPOLYGON (((145.75905 -37.67486, 145.76034... |
3 | AVW | Avon Wheatbelt | 9.517104e+06 | 95171.043 | 4 | AW | Avon Wheatbelt | 70.0 | GA_100K_Mainland | 35.275357 | 9.012298 | POLYGON ((115.48396 -28.30698, 115.48428 -28.3... |
4 | BBN | Brigalow Belt North | 1.367453e+07 | 136745.325 | 5 | BBN | Brigalow Belt North | 22.0 | GA_100K_Islands | 91.288203 | 11.986228 | MULTIPOLYGON (((151.14295 -23.71304, 151.14210... |
Find bioregions with observations
We’ll once again use {galah} to find numbers of Pittosporum undulatum in each bioregion. First, let’s find which field ID corresponds to bioregions in {galah}
= "IBRA") galah.search_all(fields
id | description | type | link | |
---|---|---|---|---|
0 | cl3 | Western Australian Biodiversity Science Resear... | layers | |
1 | cl20 | IBRA 6 Regions Interim Biogeographic Regionali... | layers | |
2 | cl1049 | IBRA 7 Subregions IBRA 7 Subregions | layers | |
3 | cl1048 | IBRA 7 Regions Interim Biogeographic Regionali... | layers |
It looks like field cl1048
contains IBRA 7 regions. Let’s check what values this field contains by using galah.show_values()
.
= "cl1048") galah.show_values(field
field | category | |
---|---|---|
0 | cl1048 | South Eastern Queensland |
1 | cl1048 | Sydney Basin |
2 | cl1048 | South Eastern Highlands |
3 | cl1048 | South East Coastal Plain |
4 | cl1048 | NSW North Coast |
... | ... | ... |
84 | cl1048 | Central Arnhem |
85 | cl1048 | Indian Tropical Islands |
86 | cl1048 | Little Sandy Desert |
87 | cl1048 | Gibson Desert |
88 | cl1048 | Coral Sea |
89 rows × 2 columns
Now we can use the group_by
argument in galah.atlas_counts()
to group observations of Pittosporum undulatum by bioregion, returning all bioregions where Pittosporum undulatum has been observed at least once. We’ll extract extract and save the bioregion names in a dataframe.
= galah.atlas_counts("Pittosporum undulatum",
found_bioregion_counts ="cl1048",
group_by= False)
expand
# extract bioregion names from Pandas dataframe into list
= list(found_bioregion_counts["cl1048"])
found_bioregions
print(found_bioregion_counts[0:10])
cl1048 count
0 Sydney Basin 10244
1 South East Coastal Plain 3300
2 South Eastern Highlands 2945
3 South East Corner 2931
4 NSW North Coast 2445
5 South Eastern Queensland 1190
6 Flinders Lofty Block 464
7 Southern Volcanic Plain 368
8 Victorian Midlands 200
9 Brigalow Belt South 173
Separate native & introduced regions
Next, let’s separate bioregions where Pittosporum undulatum is native from bioregions where it has been introduced. The Australia Native Plants Society estimates Pittosporum undulatum’s native range overlapping with South Eastern Queensland, NSW North Coast, Sydney Basin, South East Corner and South East Coastal Plain (see here). Let’s save these bioregion names in a separate dataframe and compare them to the overall list found_bioregions
.
= ["South Eastern Queensland", "NSW North Coast", "Sydney Basin", "South East Corner", "South East Coastal Plain"]
native_bioregions = [region for region in found_bioregions if region not in native_bioregions]
introduced_bioregions
print(introduced_bioregions[1:5]) # first 5 introduced regions
['Flinders Lofty Block', 'Southern Volcanic Plain', 'Victorian Midlands', 'Brigalow Belt South']
Next we can add a new column native
to our GeoPandas bioregion dataframe to identify native and introduced regions. We’ll use the .loc
method to assign a “Native”, “Introduced” or “No observations” label to each row depending on whether the region is in native_bioregions
or introduced_bioregions
.
"REG_NAME_7"].isin(native_bioregions), "native"] = "Native"
bioregions.loc[bioregions["REG_NAME_7"].isin(introduced_bioregions), "native"] = "Introduced"
bioregions.loc[bioregions["native"] = bioregions["native"].replace("nan", "No observations") bioregions[
Make choropleth map
When plotting this GeoPandas dataframe, we can specify that we want the map coloured according to its native
label so that native, introduced and not found bioregions are distinguishable colours. This is done by supplying the column
argument of the .plot()
function with the column of the dataframe that the colouring is based upon. However, matplotlib
would choose a default colourmap to colour the bioregions so we will need to specify the exact colours we wanted associated with each type of bioregion.
To identify our three categories of regions on our map, we’ll create a new column colour
containing colour hex codes for plotting our regions.
"native"] == "Native", "colour"] = "#8FBD4C" # Native
bioregions.loc[bioregions["native"] == "Introduced", "colour"] = "#F7872E" # Introduced
bioregions.loc[bioregions["native"] == "No observations", "colour"] = "#E4DFCF" # No observations bioregions.loc[bioregions[
We can use this colour
column as the input to our .plot()
function.
Our map shows that Pittosporum undulatum has been observed in Western Australia, Northern Territory, South Australia, and even Tasmania despite having a fairly narrow native range along the east coast of Australia.
="white", linewidth = 0.25, color = bioregions["colour"])
bioregions.plot(edgecolor
= "<style:italic>Pittosporum undulatum</> <color:#8FBD4C, weight:bold>native</> and <color:#F7872E, weight:bold>introduced</> Australian bioregions"
title_text 0.5, 1, title_text, va="bottom", ha="center");
flexitext(
110, 161])
plt.xlim([-45, -8])
plt.ylim(["off")
plt.axis( plt.show()
Final thoughts
Human activity—from constructing buildings to travelling overseas to gardening—plays a part in shaping modern ecosystems. Our maps showed how quickly well-known invasive species have established themselves across Australia, and how widely even native Australian plants can spread when introduced to non-native regions.
Humans are just one of many drivers of introducing species to new areas. Changes to the environment, for example, can shrink available resources and living space in a habitat, giving introduced species a chance to outcompete native species for what resources and space are left. As species inevitably enter and alter ecosystems, large weather events, extreme temperatures and habitat degradation can give invasives a big leg-up on the native competition, too.
Nonetheless, there is still hope. Research finds native species can still adapt to changing environments and simple tasks like pulling weeds can help native species survive after events like fires.
Expand for session info
import math
import natsort
import pandas
import session_info
session_info.show()
Click to view session information
----- alphashape 1.3.1 dateutil 2.8.2 flexitext 0.2.0 galah 0.8.2 geopandas 0.14.1 matplotlib 3.8.2 natsort 8.4.0 numpy 1.26.2 pandas 2.1.3 session_info 1.0.0 shapely 2.0.2 -----
Click to view modules imported as dependencies
PIL 10.1.0 asttokens NA attr 23.1.0 certifi 2023.11.17 cffi 1.15.1 charset_normalizer 3.3.2 colorama 0.4.6 comm 0.2.0 cycler 0.12.1 cython_runtime NA debugpy 1.8.0 decorator 5.1.1 defusedxml 0.7.1 executing 2.0.1 fiona 1.9.5 google NA idna 3.6 ipykernel 6.27.0 ipython_genutils 0.2.0 jedi 0.19.1 kiwisolver 1.4.5 matplotlib_inline 0.1.6 mpl_toolkits NA networkx 3.2.1 packaging 23.2 parso 0.8.3 pickleshare 0.7.5 pkg_resources NA platformdirs 4.1.0 prompt_toolkit 3.0.41 psutil 5.9.6 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.17.2 pyparsing 3.1.1 pyproj 3.6.1 pytz 2023.3.post1 pywin32_bootstrap NA pywin32_system32 NA requests 2.31.0 rtree 1.0.1 scipy 1.12.0 six 1.16.0 stack_data 0.6.3 tornado 6.3.3 traitlets 5.13.0 trimesh 3.21.5 urllib3 2.1.0 wcwidth 0.2.12 zmq 25.1.2
----- IPython 8.17.2 jupyter_client 8.6.0 jupyter_core 5.5.0 notebook 6.5.4 ----- Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] Windows-10-10.0.19045-SP0 ----- Session information updated at 2024-02-13 11:25