Plotting invasive species distributions with alpha shapes and choropleth maps in Python

Invasive and introduced species can expand quickly into new habitats, altering ecosystems. In this post we use Python’s {galah}, {alphashape} and {GeoPandas} packages to visualise the growing distribution of Rhinella marina (cane toads) and the expanding range of Pittisporum undulatum in Australia.

Eukaryota
Animalia
Plantae
Maps
Python
Authors

Caitlin Ramsay

Amanda Buyan

Dax Kellie

Published

April 28, 2023

Author

Caitlin Ramsay
Amanda Buyan
Dax Kellie

Date

28 April 2023

Intern Post

Humans’ movement across the globe has led to the accidental, and sometimes deliberate, transportation of species beyond their native habitats. In Australia since European colonisation, around 3,000 species have been introduced.

Within the last 200 years over 100 native species have gone extinct, with invasive species labelled as affecting 82% (1,257 of 1,533) of Australia’s threatened taxa in 2018. Since 1960, invasive species have cost the Australian economy at least $390 billion in damages, and are now considered a main driver of extinctions in native plants and animals.

However, species from outside of Australia aren’t the only ones that can encroach on other species’ habitats. Native Australian species can do it, too. Thanks in part to human activity, changing temperatures and more frequent extreme weather events, some Australian species have established themselves in new areas outside of their native range. Although not as popularly discussed, Australian species that have become pests in new habitats can disrupt ecosystems much like internationally invasive species.

In this post, we will use Python and the {galah} package to visualise how distributions of both international invasive species and native introduced pest species have shifted over time. To do this, we will use alpha shapes to visualise the distribution of Rhinella marina (Cane toads) since the 1930s and create a choropleth map to visualise the expanded habitat range of Pittosporum undulatum.

Invasive Species

Download data

To start, we will use the infamous example of the cane toad to illustrate how far an invasive species’ distribution can spread each decade.

 

 

First load the required Python packages.

import galah
import pandas as pd
import geopandas
import numpy as np
from dateutil.parser import parse
import matplotlib.pyplot as plt
import matplotlib as mpl
import alphashape
from flexitext import flexitext

Next, we will use the {galah} package to download occurrence records of cane toads in Australia from the Atlas of Living Australia (ALA). You will need to first provide a registered email with the ALA using galah.galah_config() before retrieving records.

# Add registered email (register at ala.org.au)
galah.galah_config(email = "your-email@email.com")
galah.galah_config(data_profile="ALA")
cane_toads = galah.atlas_occurrences(taxa = "Rhinella marina", use_data_profile = True)
cane_toads.head(5)
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus
0 -38.300000 145.000000 1990-04-23T00:00:00Z Rhinella marina https://biodiversity.org.au/afd/taxa/e79179f8-... 58772bea-1c61-4716-a453-f201e89fcc8d Museums Victoria provider for OZCAM PRESENT
1 -38.300000 145.000000 1990-07-21T00:00:00Z Rhinella marina https://biodiversity.org.au/afd/taxa/e79179f8-... 141f0b10-bdf2-4239-80e4-1b5167e1f76c Museums Victoria provider for OZCAM PRESENT
2 -37.820000 145.230000 1990-04-14T00:00:00Z Rhinella marina https://biodiversity.org.au/afd/taxa/e79179f8-... f6d07eb2-4f8f-476e-a212-726b10a4a745 Museums Victoria provider for OZCAM PRESENT
3 -36.431411 148.329322 2017-03-07T00:00:00Z Rhinella marina https://biodiversity.org.au/afd/taxa/e79179f8-... 549908bf-0b34-4227-9288-42a7fa52dabf NSW BioNet Atlas PRESENT
4 -36.250765 149.120224 2021-02-04T00:00:00Z Rhinella marina https://biodiversity.org.au/afd/taxa/e79179f8-... ae36a792-f694-4a73-acfb-e56be564def6 NSW BioNet Atlas PRESENT

Clean data

We’ll clean our data to ensure that there are no null or missing values in our coordinates and date fields. Because galah.atlas_occurrences() returns a Pandas dataframe, we have plenty of functions we can use to clean our data.

cane_toads = cane_toads.dropna(subset=["eventDate", "decimalLatitude", "decimalLongitude"])

We want to map cane toad’s distribution each decade in our final visualisation. However, the eventDate value for each record is formaatted as a string value yyyy-mm-dd Thh:mm:ssZ. Let’s write our own function convert_date_to_decade() that extract the year from a date string and return its corresponding decade by rounding down to the nearest decade.

def convert_date_to_decade(value):
    date = parse(value)
    return date.year - (date.year%10)

We’ll create our new decade column by mapping each record’s date value in eventDate to its corresponding decade value.

cane_toads["decade"] = cane_toads["eventDate"].map(convert_date_to_decade)

Make Australia map

Next, let’s download a shapefile of Australia with state boundaries. The Australian Bureau of Statistics provides digital boundary files from which you can explore many other Australian shapefiles. Download the States and Territories - 2021 - Shapefile a zip folder. Save the zip folder inside your working folder and then unzip it to access the .shp file inside.

{GeoPandas} is a package that handles geospatial data in Python and can be used to load in shapefiles as GeoPandas dataframes. Let’s test this out by plotting our Australian state boundary shapefile.

mpl.rcParams['figure.dpi'] = 1200 # generate a high resolution image
states = geopandas.read_file("Australia_state_boundaries/STE_2021_AUST_GDA2020.shp")
states.plot(edgecolor = "#5A5A5A", linewidth = 0.5, facecolor = "white")

Generate alpha shapes

Alpha shapes can be used to define and visualise the shape of a set of species occurrence points in space. They are useful because they can be generated on data-deficient species with few available observations, and without using environmental data or complex algorithms. Let’s use alpha shapes to see how cane toads’ distribution has changed each decade since they were introduced.

First, we need to obtain a list of all decades with cane toad observations. We’ll use the decade column from our cane_toads dataframe to group our observations.

decades = list(set(cane_toads["decade"]))

We will be using the {alphashape} package to create alpha shapes representing the cane toad distribution for each decade they have been observed. The alphashape.alphashape() function requires two things:

  1. A set of observation coordinates
  2. An alpha parameter, which sets how tightly the shape’s lines conform to our observations

Let’s make an alpha shape for each decade’s observations. We’ll also add a slight buffer to each alpha shape to smooth out some of its edges. Then we’ll group all the shapes into one large GeoPandas dataframe.

Note

We used alpha = 1, but it’s good practice to change this parameter depending on how widely distributed the coordinates of your data are. Also note that alphashape.alphashape() requires at least 3 data points to calculate an alpha shape.

alpha_shape_gdf = geopandas.GeoDataFrame() # GeoPandas data frame to contain all alpha shapes
for i, d in enumerate(decades):
    decade_points = cane_toads[["decimalLongitude", "decimalLatitude"]] [cane_toads["decade"] == d]
    if len(decade_points) <= 3: 
        continue
    alpha_shape = alphashape.alphashape(decade_points, 1)
    d = {"decade": d, "geometry": [alpha_shape.buffer(0.2)]}
    tmp_gdf = geopandas.GeoDataFrame(d, crs="EPSG:7844")
    alpha_shape_gdf = pd.concat([alpha_shape_gdf, tmp_gdf])
WARNING:root:Singular matrix. Likely caused by all points lying in an N-1 space.

Next, let’s clean up our GeoPandas dataframe so that it is ready for plotting! Sometimes the alphashape.alphashape() algorithm will produce an empty shape that needs to be removed from the dataframe (this generally happens when the chosen alpha parameter is not appropriate for the supplied set of points). Let’s remove these shapes from our data.

alpha_shape_gdf = alpha_shape_gdf[~alpha_shape_gdf["geometry"].is_empty]

Now let’s format our decade string to display correctly on the figure legend by making sure it’s in YYYYs format.

alpha_shape_gdf["decade_string"] = alpha_shape_gdf["decade"].map(lambda d: str(d) + "s")

Finally, because we expect cane toad distributions in earlier decades to be smaller than in recent decades, we’ll need to plot earlier distributions on top of later distributions to avoid covering the earlier ones up. To achieve this, let’s order the alpha shapes in descending order by decade.

alpha_shape_gdf.sort_values(by='decade', ascending=False, inplace=True)

Map alpha shape distributions

Finally, we can plot our alpha shape distributions for each decade onto our map of Australia!

This figure showcases the incredible pace of the cane toad’s spread across northern Australia. Our map shows that cane toads have spread across most of Queensland, the top end of the Northern Territory (from the 1980s to 2010s) and more recently, into the Kimberley region of Western Australia.

ax = states.boundary.plot(edgecolor="#5A5A5A", linewidth=0.5, facecolor="white", zorder=-1)

alpha_shape_gdf.plot(ax = ax, cmap="plasma", column = "decade_string", legend=True, categorical=True)
lgd = ax.get_legend()
lgd.draw_frame(False)
lgd.set_bbox_to_anchor((1.2, 0.8))

title_text = "<style: italic>Rhinella marina</> (cane toad) distributions per decade"
flexitext(0.5, 1, title_text, va="bottom", ha="center");

caption_text = "<color:#5A5A5A, style:italic, size:7>Distributions calculated with alpha hulls of each decade's cane toad observations</>"
flexitext(0.05, 0, caption_text, va="top");

plt.xlim([110, 161])
plt.ylim([-45, -8])
plt.axis("off")
plt.subplots_adjust(left=-0.15, right=1)

plt.show()

Other invasive species

Let’s use the same code as above to visualise other invasive species Camelus dromedarius (Feral dromedary camels) and Echium plantagineum (Paterson’s curse).

Native introduced pest species

When people think of invasive species, they generally think of species that have been introduced to Australia from other countries. However, even Australia’s native species can become pests when introduced to a new ecosystem.

One good example of native pests are the trees Pittosporum undulatum (sometimes called Sweet Pittosporum). These trees have been introduced as ornamental plants in gardens across Australia because of their sweet-scented flowers and bright berries. Although Pittosporum undulatum’s native range extends from southern Queensland to eastern Victoria, it is now considered an environmental weed in many regions where it has been introduced.

 

 

Let’s create a choropleth map to visualise the to visualise the bioregions where Pittosporum undulatum is native and introduced.

Download IBRA regions

First, let’s download a shapefile of Australia’s bioregions. The IBRA7 bioregions classify areas within Australia that are geographically and ecologically distinct. Download the zip folder, save it in your project directory and unzip it. We can again use the {GeoPandas} package to read in and handle these data.

bioregions = geopandas.read_file("IBRA7_regions/ibra7_regions.shp")
bioregions.plot(edgecolor = "#5A5A5A", linewidth = 0.25, facecolor = "white")

Within our bioregions dataframe, the column REG_NAME_7 contains IBRA bioregion names.

bioregions.head(5)
REG_CODE_7 REG_NAME_7 HECTARES SQ_KM REC_ID REG_CODE_6 REG_NAME_6 REG_NO_61 FEAT_ID Shape_Leng Shape_Area geometry
0 ARC Arnhem Coast 3.335669e+06 33356.686 1 ARC Arnhem Coast 81.0 GA_100K_Islands 52.135362 2.774143 MULTIPOLYGON (((136.93863 -14.36548, 136.93790...
1 ARP Arnhem Plateau 2.306023e+06 23060.226 2 ARP Arnhem Plateau 82.0 GA_100K_Mainland 8.206764 1.921833 POLYGON ((134.08186 -12.32124, 134.09525 -12.3...
2 AUA Australian Alps 1.232981e+06 12329.805 3 AA Australian Alps 6.0 GA_100K_Mainland 63.337626 1.242821 MULTIPOLYGON (((145.75905 -37.67486, 145.76034...
3 AVW Avon Wheatbelt 9.517104e+06 95171.043 4 AW Avon Wheatbelt 70.0 GA_100K_Mainland 35.275357 9.012298 POLYGON ((115.48396 -28.30698, 115.48428 -28.3...
4 BBN Brigalow Belt North 1.367453e+07 136745.325 5 BBN Brigalow Belt North 22.0 GA_100K_Islands 91.288203 11.986228 MULTIPOLYGON (((151.14295 -23.71304, 151.14210...

Find bioregions with observations

We’ll once again use {galah} to find numbers of Pittosporum undulatum in each bioregion. First, let’s find which field ID corresponds to bioregions in {galah}

galah.search_all(fields = "IBRA") 
id description type link
0 cl3 Western Australian Biodiversity Science Resear... layers
1 cl20 IBRA 6 Regions Interim Biogeographic Regionali... layers
2 cl1049 IBRA 7 Subregions IBRA 7 Subregions layers
3 cl1048 IBRA 7 Regions Interim Biogeographic Regionali... layers

It looks like field cl1048 contains IBRA 7 regions. Let’s check what values this field contains by using galah.show_values().

galah.show_values(field = "cl1048")
field category
0 cl1048 South Eastern Queensland
1 cl1048 Sydney Basin
2 cl1048 South Eastern Highlands
3 cl1048 South East Coastal Plain
4 cl1048 NSW North Coast
... ... ...
84 cl1048 Central Arnhem
85 cl1048 Indian Tropical Islands
86 cl1048 Little Sandy Desert
87 cl1048 Gibson Desert
88 cl1048 Coral Sea

89 rows × 2 columns

Now we can use the group_by argument in galah.atlas_counts() to group observations of Pittosporum undulatum by bioregion, returning all bioregions where Pittosporum undulatum has been observed at least once. We’ll extract extract and save the bioregion names in a dataframe.

found_bioregion_counts = galah.atlas_counts("Pittosporum undulatum",
                                           group_by="cl1048",
                                           expand = False)

# extract bioregion names from Pandas dataframe into list
found_bioregions = list(found_bioregion_counts["cl1048"])

print(found_bioregion_counts[0:10])
                     cl1048  count
0              Sydney Basin  10244
1  South East Coastal Plain   3300
2   South Eastern Highlands   2945
3         South East Corner   2931
4           NSW North Coast   2445
5  South Eastern Queensland   1190
6      Flinders Lofty Block    464
7   Southern Volcanic Plain    368
8        Victorian Midlands    200
9       Brigalow Belt South    173

Separate native & introduced regions

Next, let’s separate bioregions where Pittosporum undulatum is native from bioregions where it has been introduced. The Australia Native Plants Society estimates Pittosporum undulatum’s native range overlapping with South Eastern Queensland, NSW North Coast, Sydney Basin, South East Corner and South East Coastal Plain (see here). Let’s save these bioregion names in a separate dataframe and compare them to the overall list found_bioregions.

native_bioregions = ["South Eastern Queensland", "NSW North Coast", "Sydney Basin", "South East Corner", "South East Coastal Plain"]
introduced_bioregions = [region for region in found_bioregions if region not in native_bioregions]

print(introduced_bioregions[1:5]) # first 5 introduced regions
['Flinders Lofty Block', 'Southern Volcanic Plain', 'Victorian Midlands', 'Brigalow Belt South']

Next we can add a new column native to our GeoPandas bioregion dataframe to identify native and introduced regions. We’ll use the .loc method to assign a “Native”, “Introduced” or “No observations” label to each row depending on whether the region is in native_bioregions or introduced_bioregions.

bioregions.loc[bioregions["REG_NAME_7"].isin(native_bioregions), "native"] = "Native"
bioregions.loc[bioregions["REG_NAME_7"].isin(introduced_bioregions), "native"] = "Introduced"
bioregions["native"] = bioregions["native"].replace("nan", "No observations")

Make choropleth map

When plotting this GeoPandas dataframe, we can specify that we want the map coloured according to its native label so that native, introduced and not found bioregions are distinguishable colours. This is done by supplying the column argument of the .plot() function with the column of the dataframe that the colouring is based upon. However, matplotlib would choose a default colourmap to colour the bioregions so we will need to specify the exact colours we wanted associated with each type of bioregion.

To identify our three categories of regions on our map, we’ll create a new column colour containing colour hex codes for plotting our regions.

bioregions.loc[bioregions["native"] == "Native", "colour"] = "#8FBD4C" # Native
bioregions.loc[bioregions["native"] == "Introduced", "colour"] = "#F7872E" # Introduced
bioregions.loc[bioregions["native"] == "No observations", "colour"] = "#E4DFCF" # No observations

We can use this colour column as the input to our .plot() function.

Our map shows that Pittosporum undulatum has been observed in Western Australia, Northern Territory, South Australia, and even Tasmania despite having a fairly narrow native range along the east coast of Australia.

bioregions.plot(edgecolor="white", linewidth = 0.25, color = bioregions["colour"])

title_text = "<style:italic>Pittosporum undulatum</> <color:#8FBD4C, weight:bold>native</> and <color:#F7872E, weight:bold>introduced</> Australian bioregions"
flexitext(0.5, 1, title_text, va="bottom", ha="center");

plt.xlim([110, 161])
plt.ylim([-45, -8])
plt.axis("off")
plt.show()

Final thoughts

Human activity—from constructing buildings to travelling overseas to gardening—plays a part in shaping modern ecosystems. Our maps showed how quickly well-known invasive species have established themselves across Australia, and how widely even native Australian plants can spread when introduced to non-native regions.

Humans are just one of many drivers of introducing species to new areas. Changes to the environment, for example, can shrink available resources and living space in a habitat, giving introduced species a chance to outcompete native species for what resources and space are left. As species inevitably enter and alter ecosystems, large weather events, extreme temperatures and habitat degradation can give invasives a big leg-up on the native competition, too.

Nonetheless, there is still hope. Research finds native species can still adapt to changing environments and simple tasks like pulling weeds can help native species survive after events like fires.

Expand for session info
import math
import natsort
import pandas
import session_info

session_info.show()
Click to view session information
-----
alphashape          1.3.1
dateutil            2.8.2
flexitext           0.2.0
galah               0.8.2
geopandas           0.14.1
matplotlib          3.8.2
natsort             8.4.0
numpy               1.26.2
pandas              2.1.3
session_info        1.0.0
shapely             2.0.2
-----
Click to view modules imported as dependencies
PIL                 10.1.0
asttokens           NA
attr                23.1.0
certifi             2023.11.17
cffi                1.15.1
charset_normalizer  3.3.2
colorama            0.4.6
comm                0.2.0
cycler              0.12.1
cython_runtime      NA
debugpy             1.8.0
decorator           5.1.1
defusedxml          0.7.1
executing           2.0.1
fiona               1.9.5
google              NA
idna                3.6
ipykernel           6.27.0
ipython_genutils    0.2.0
jedi                0.19.1
kiwisolver          1.4.5
matplotlib_inline   0.1.6
mpl_toolkits        NA
networkx            3.2.1
packaging           23.2
parso               0.8.3
pickleshare         0.7.5
pkg_resources       NA
platformdirs        4.1.0
prompt_toolkit      3.0.41
psutil              5.9.6
pure_eval           0.2.2
pydev_ipython       NA
pydevconsole        NA
pydevd              2.9.5
pydevd_file_utils   NA
pydevd_plugins      NA
pydevd_tracing      NA
pygments            2.17.2
pyparsing           3.1.1
pyproj              3.6.1
pytz                2023.3.post1
pywin32_bootstrap   NA
pywin32_system32    NA
requests            2.31.0
rtree               1.0.1
scipy               1.12.0
six                 1.16.0
stack_data          0.6.3
tornado             6.3.3
traitlets           5.13.0
trimesh             3.21.5
urllib3             2.1.0
wcwidth             0.2.12
zmq                 25.1.2
-----
IPython             8.17.2
jupyter_client      8.6.0
jupyter_core        5.5.0
notebook            6.5.4
-----
Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]
Windows-10-10.0.19045-SP0
-----
Session information updated at 2024-02-13 11:25