Ecological data is often used to understand what species are found in a given location, especially for conservation monitoring and environmental impact assessment prior to land development. A common method for this task is to use a buffer, an outward boundary drawn around a given area. Adding a buffer helps to capture all the species in an area, including those that have been observed just outside the area and probably live there, too.
Choosing a buffer size, however, can be tougher than it seems. Individual organisms move, either over the course of a day or an entire season (e.g. migration, perennial growth), so species’ lifecycles and behaviours may determine the size of our final buffer. A more difficult challenge occurs when species are considered sensitive, vulnerable or endangered. These species’ exact point locations are often obfuscated (aka their location is made less precise) to keep these species safe. This added imprecision will again affect our final decision on buffer size.
In this post, we’ll show how to add a buffer around a shapefile with {geopandas}, {shapely} and {matplotlib}. Then we will use {galah-python} to download data of stuttering frogs (Mixophyes balbus) to demonstrate how the size of a buffer can affect the detection of threatened species in an area. Lastly, we will use {numpy} and {matplotlib} to show the effect of buffers on detecting threatened species in an area.
Draw a buffer
For this example, our area of interest is Mid-Western, a Local Government Area (LGA) in New South Wales. We’ll first need to download a shapefile of our area, which we can get by downloading a shapefile of all LGAs from the Australian Bureau of Statistics and filtering to our area. Download the zip file of “Local Government Areas - 2024”, then place the zip file in your local directory. We can then read in the shapefile and show what it looks like using {geopandas}.
For those unfamiliar with Australian geography, the LGA of Mid-Western is located here:
import geopandas as gpd
import shapely
from shapely.geometry import Polygon
lgas = gpd.read_file("LGA_2024_AUST_GDA2020.zip")
lgas.head()
LGA_CODE24 LGA_NAME24 STE_CODE21 STE_NAME21 AUS_CODE21 AUS_NAME21 AREASQKM LOCI_URI21 geometry
0 10050 Albury 1 New South Wales AUS Australia 305.6386 https://linked.data.gov.au/dataset/asgsed3/LGA... POLYGON ((146.86566 -36.07292, 146.86512 -36.0...
1 10180 Armidale 1 New South Wales AUS Australia 7809.4406 https://linked.data.gov.au/dataset/asgsed3/LGA... POLYGON ((152.38816 -30.52639, 152.38812 -30.5...
2 10250 Ballina 1 New South Wales AUS Australia 484.9692 https://linked.data.gov.au/dataset/asgsed3/LGA... MULTIPOLYGON (((153.57106 -28.87381, 153.57106...
3 10300 Balranald 1 New South Wales AUS Australia 21690.7493 https://linked.data.gov.au/dataset/asgsed3/LGA... POLYGON ((143.00433 -33.78164, 143.01538 -33.7...
4 10470 Bathurst 1 New South Wales AUS Australia 3817.8645 https://linked.data.gov.au/dataset/asgsed3/LGA... POLYGON ((149.84877 -33.52784, 149.84864 -33.5...
Then we’ll filter to Mid-Western.
midwestern = lgas[lgas['LGA_NAME24'] == 'Mid-Western']
Now, we will create a 5 km buffer around Mid-Western, as we are looking at a relatively small area. To do this, we’ll need to convert the shapefiles between different Coordinate Reference Systems (CRS)1 to allow us to draw our buffer.
First, we’ll reproject our polygon midwestern to a CRS measured in metres, like Australian Albers (EPSG:3577). Then we can create a buffer in metres around midwestern. Finally, we’ll reproject the buffered shape to match the CRS of the data we intend to use (EPSG:4326)2, and unify any intersecting shapes.
# reproject to Australian Albers CRS
midwestern_metres = midwestern.to_crs(3577)

# create buffer, reproject, unify any overlapping shapes
buffer_5km = midwestern_metres['geometry'].buffer(5000)
buffer_5km_degrees = buffer_5km.to_crs(4326)
union_buffer_5km_degrees = shapely.unary_union(buffer_5km_degrees)
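Why bother reprojecting before buffering? A degree is not a fixed distance on the ground, so a buffer measured in degrees would not be the same width in every direction. The sketch below is a rough illustration only. It assumes a spherical Earth with a mean radius of 6,371 km (a simplification), uses a latitude of about -32.6 as an approximate value for Mid-Western, and `km_per_degree` is a hypothetical helper written for this example rather than a function from any package used in this post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def km_per_degree(lat_deg):
    """Return (km per degree of latitude, km per degree of longitude) at a latitude."""
    km_lat = math.pi * EARTH_RADIUS_KM / 180.0
    # lines of longitude converge towards the poles, so scale by cos(latitude)
    km_lon = km_lat * math.cos(math.radians(lat_deg))
    return km_lat, km_lon

# approximate latitude of Mid-Western (assumption for illustration)
km_lat, km_lon = km_per_degree(-32.6)
print(f"1 degree of latitude  = {km_lat:.1f} km")
print(f"1 degree of longitude = {km_lon:.1f} km")
```

At this latitude a degree of latitude covers roughly 111 km while a degree of longitude covers roughly 94 km, so a buffer drawn directly in degrees would be stretched unevenly. Buffering in a metre-based CRS avoids this.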
Let’s plot our 5km buffer on a map.
# import matplotlib for plotting
import matplotlib.pyplot as plt
# set initial shapefile as axis
ax = midwestern.plot(edgecolor="#292C26", linewidth = 2.0, facecolor="white")
# plot buffer on same axis as original shapefile
plt.plot(*union_buffer_5km_degrees.exterior.xy, c='#358BA5', lw=2.0, label='5km')
ax.axis('off') # remove axis to make plot look prettier
Now that we’ve drawn our buffer around the Mid-Western LGA, let’s discuss how data obfuscation of sensitive/threatened species data might impact our decision about buffer size.
Obfuscation and why it’s important for sensitive species
When we talk about a record being obfuscated, we mean that the coordinate location of this record has been made less precise either by generalisation or randomisation. The Atlas of Living Australia generalises coordinate locations by reducing the number of decimals in the record’s lat/long coordinates, lowering the point’s precision3. The figure below illustrates this: as the decimal points are removed, the data appears more ‘grid-like’ as it loses resolution.
Code
# import plotting and animation packages
import matplotlib.animation as animation
import math
import numpy as np
import pandas as pd
from IPython.display import display, Javascript
# use below if you are running in jupyter notebook
%matplotlib ipympl

# create dataframe of points
points = pd.DataFrame({
    'orig_long': [149.9153,149.9181,149.9204,149.9233,149.9101,149.9258,149.9121,149.9163,
                  149.9295,149.9175,149.9287,149.9236,149.9109,149.9091,149.9113,149.9211,
                  149.9073,149.9087,149.9236,149.9241,149.9289],
    'orig_lat': [-33.3874,-33.3509,-33.3694,-33.3479,-33.3341,-33.3789,-33.3475,-33.3748,
                 -33.3554,-33.3723,-33.3808,-33.3798,-33.3475,-33.3607,-33.3521,-33.3871,
                 -33.3541,-33.3633,-33.3799,-33.3833,-33.3423]
})

# Round to 3 and 2 decimal places
# (floor moves longitude west, ceil moves negative latitude north)
for i in range(3,1,-1):
    factor = 10 ** i
    points['round{}_long'.format(i)] = points['orig_long'].apply(lambda x: math.floor(x * factor) / factor)
    points['round{}_lat'.format(i)] = points['orig_lat'].apply(lambda x: math.ceil(x * factor) / factor)

# create initial figure
fig,ax = plt.subplots(1,2)

# create dictionary for columns - makes it easy to select the columns for each animation frame
column_labels = {0: ['orig_long', 'orig_lat'],
                 1: ['round3_long', 'round3_lat'],
                 2: ['round2_long', 'round2_lat']}

# build one table + scatter plot per level of precision
artists = []
for i in range(3):
    ax[0].axis('off')
    # columns are ordered longitude, then latitude
    table = ax[0].table(cellText=points[column_labels[i]].values,colLabels=['Longitude','Latitude'],loc='center')
    ax[1].set_xticks(list(np.arange(149.90,149.93,0.01)))
    ax[1].set_yticks(list(np.arange(-33.39,-33.32,0.01)))
    ax[1].set_xticklabels([]) # Remove x-axis ticks
    ax[1].set_yticklabels([]) # Remove y-axis ticks
    ax[1].tick_params(which='major', bottom=False, left=False)
    ax[1].grid(True)
    scatter = ax[1].scatter(points[column_labels[i][0]],points[column_labels[i][1]],c='purple',alpha=0.5)
    artists.append([table,scatter])

# animate the frames and save as a gif
ani = animation.ArtistAnimation(fig=fig, artists=artists, interval=1000, repeat=True)
ani.save('obfuscation.gif')
plt.show()
Generalisation of species records depends on the state or territory in which the species is located, with coordinates typically rounded to distances of 1km, 10km or 50km from a species’ original location4.
How this affects our ability to know the true location of a species in an area is illustrated in the diagram below. The true location of the point is nearby5, but the process of generalisation has removed coordinate decimal points. The effect appears as points that seem to “snap” to the corner of a grid cell, the size of which depends on the distance of generalisation (see the tab below for more information). When these points move to their new generalised locations, any of the following three scenarios are possible:
- A point falls inside a specified area when its true location is outside the area (left),
- A point falls inside a specified area when its true location is inside the area (middle),
- A point falls outside a specified area when its true location is inside the area (right).
When points are generalised, all points within the same grid cell will snap to the same corner regardless of where they sit within the grid cell. In Australia, this is the north west corner. In Brazil it’s the north east corner, in China it’s the south west corner and in Europe it’s the south east corner. For more info on the methods of generalisation (and obfuscation more broadly), see this online book on current best practices by Arthur Chapman.
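The snapping behaviour for an Australian point can be sketched in a few lines. This is a minimal illustration, assuming generalisation works by dropping decimals in the same way as the floor/ceil rounding in the animation code earlier in this post. `generalise_nw` is a hypothetical helper, and the two points are made up.

```python
import math

def generalise_nw(longitude, latitude, decimals):
    """Drop decimals so a point snaps to the north-west corner of its grid cell.

    Assumes an Australian point (positive longitude, negative latitude).
    """
    factor = 10 ** decimals
    snapped_long = math.floor(longitude * factor) / factor  # moves west
    snapped_lat = math.ceil(latitude * factor) / factor     # moves north
    return snapped_long, snapped_lat

# two made-up points that sit in the same 0.1-degree grid cell
point_a = generalise_nw(149.9153, -33.3874, decimals=1)
point_b = generalise_nw(149.9789, -33.3012, decimals=1)
print(point_a, point_b)
```

Both points end up at (149.9, -33.3), even though their original locations were several kilometres apart.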
Thanks to generalisation, points can appear in places that they actually aren’t6! When dealing with many species’ point locations, this can make it difficult to keep track of which of these three scenarios might affect each generalised species record.
The main takeaway is that obfuscation makes it harder to know that you are accurately capturing all the species in a defined area. In ecological assessment, it’s generally better to capture more rather than less because species interact with their broader ecosystems (outside of our human-defined boundaries). Therefore, the goal we are trying to achieve by using a buffer is to realistically estimate how many species are influenced by the health of our defined area, not just what has been observed within a pre-defined boundary.
Things get even more complicated when you consider that the location of every observation has a degree of uncertainty around it. This information, held in coordinateUncertaintyInMetres, adds yet another layer of complexity to knowing the true location of a given species observation. For many species, this uncertainty reflects the opportunistic nature of an observation (e.g., an organism was observed at a distance before moving somewhere else). For others, this is due to inaccurate measurement or documentation. Whatever the reason, uncertainty is another important aspect, and difficult challenge, to consider when determining which species inhabit an area.
Example: The stuttering frog
Let’s see an example of how buffer size affects our ability to detect a species in our area of interest that is both threatened and sensitive.
The stuttering frog (Mixophyes balbus) is a large Australian species of frog that inhabits temperate, sub-tropical rainforest and wet sclerophyll forest. They have a brown back and a yellow underbelly, with a light blue iris that diffuses into gold above the pupil. Their call is a “kook kook kook kra-a-ak kruk kruk”, which lasts 1-2 seconds7.
Left: Mixophyes balbus (Darren Fielder CC-BY-NC 4.0 (Int)), Middle: Mixophyes balbus (liznoble CC-BY-NC 4.0 (Int)), Right: Mixophyes balbus (lachlan_harriman CC-BY-NC 4.0 (Int))
Download data
Let’s download occurrence records of stuttering frogs in a bounding box that encompasses an area slightly larger than the Mid-Western LGA8. We will also include a column with the distance each record’s location has been obfuscated, generalisationInMetres.
import galah
import shapely
galah.galah_config(email="<your-email-address>")
# xmin, ymin, xmax, ymax
bbox_midwestern = shapely.box(148.5, -33.6, 151.1, -31.6)
frogs = galah.atlas_occurrences(
    taxa='Mixophyes balbus',
    bbox=bbox_midwestern,
    fields=["basic","generalisationInMetres"]
)
frogs.head(10) # first 10 rows
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus generalisationInMetres
0 -33.6 150.3 2002-02-01T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... 4d9ab89c-00ff-436a-a318-f1a167c47679 NSW BioNet Atlas PRESENT 10000
1 -33.6 150.5 1977-01-25T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... bac77bf9-608a-42eb-96e8-52067feb8ebf Queensland Museum provider for OZCAM PRESENT 10000
2 -33.5 150.1 1978-12-30T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... de8c5a71-7beb-48d5-bab0-5e76268266d8 Queensland Museum provider for OZCAM PRESENT 10000
3 -33.4 150.2 2004-12-11T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... 24c1c5d8-63ec-4a61-8636-f5c78f61ba76 NSW BioNet Atlas PRESENT 10000
4 -33.1 150.2 2007-03-26T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... 8345db64-b2ab-485b-8195-7164d79d6ac0 NSW BioNet Atlas PRESENT 10000
5 -32.8 150.4 2005-12-08T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... ca6be186-c0d3-405c-b8f1-20a2e57faa03 NSW BioNet Atlas PRESENT 10000
6 -32.8 150.4 2005-12-08T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... 823c9ded-75ab-4664-94d6-1987ea68d879 NSW BioNet Atlas PRESENT 10000
7 -32.8 150.4 2005-12-07T00:00:00Z Mixophyes balbus https://biodiversity.org.au/afd/taxa/cfb20a07-... 1b7fba62-3114-432d-9d6d-e6320f932555 NSW BioNet Atlas PRESENT 10000
A good first step is to view the distance of generalisation applied to our stuttering frog data. The result shows us that the records in frogs have been generalised to 10,000 metres (10 km).
frogs['generalisationInMetres'].value_counts()
generalisationInMetres
10000 8
Name: count, dtype: int64
How buffer size affects species detection
Let’s consider how different buffer sizes impact our ability to detect threatened species. Below, we’ve created six buffers of increasing size around Mid-Western, then overlaid a grid of 10 km blocks on our map. You’ll notice the points have snapped to a corner in our grid. Stuttering frog observations sit just outside of Mid-Western; without a buffer this frog species would go undetected, but with a 30km buffer we would detect multiple observations. Which should we choose?
Code
# import numpy for arange (used to draw the grid)
import numpy as np

# drop duplicates and NAs
frogs_set = frogs.drop_duplicates(subset=['decimalLatitude','decimalLongitude']).dropna().reset_index(drop=True)

# convert to GeoDataFrame
frogs_set_gdf = gpd.GeoDataFrame(
    frogs_set,
    geometry=gpd.points_from_xy(frogs_set.decimalLongitude,frogs_set.decimalLatitude),
    crs="EPSG:4326"
)

# start plots
fig,ax = plt.subplots(2,3,figsize=(15,10))
extra = 0

# initialise buffer data for visualisation
buffer_shapes = {}
buffer_lengths = {"5km": 5000, "10km": 10000,"15km": 15000,"20km": 20000,"25km": 25000,"30km": 30000}
buffer_distances = list(buffer_lengths.keys())

# get shapefile into Australian Albers CRS
midwestern_metres = midwestern.to_crs(3577)

# loop over each buffer length to create the buffer
# then, convert it back to degrees for conformance with the ALA CRS and unify any overlapping shapes
for length in buffer_lengths:
    buffer = midwestern_metres['geometry'].buffer(buffer_lengths[length])
    gdf_buffers_degrees = buffer.to_crs(4326)
    union_buffers_degrees = shapely.unary_union(gdf_buffers_degrees)
    buffer_shapes[length] = union_buffers_degrees

# loop over all axes for ease of plotting
for i in range(len(ax)):
    for j in range(len(ax[i])):
        # if this is the first subplot, there is no buffer, so you only draw the original shape and species counts
        if i == 0 and j == 0:
            # there is no buffer; plot the original shape file
            a = ax[i][j].set_title("No Buffer",fontsize=14)
            midwestern.plot(ax=ax[i][j],edgecolor = "#292C26", linewidth = 2.0, facecolor = "None")

            # plot frogs as circles on map for reference
            a = frogs_set_gdf.plot(
                ax=ax[i][j],facecolor='#d4af37',edgecolor='#d4af37',label='label'
            )

            # get count of frogs in midwestern
            points = [(x,y) for x,y in zip(frogs_set_gdf["decimalLongitude"],frogs_set_gdf["decimalLatitude"]) if shapely.contains_xy(midwestern['geometry'][74],x,y)]
            count = len(points)

            # add number of counts to graph for reference
            a = ax[i][j].text(150.2,-31.85,'Count={}'.format(count),fontsize=12,color='#6E260E')

            # draw grid on plot
            ax[i][j].set_xticks(list(np.arange(148.7,150.8,0.1)))
            ax[i][j].set_yticks(list(np.arange(-33.7,-31.7,0.1)))
            ax[i][j].set_xticklabels([]) # Remove x-axis ticks
            ax[i][j].set_yticklabels([]) # Remove y-axis ticks
            ax[i][j].tick_params(which='major', bottom=False, left=False)
            ax[i][j].set_axisbelow(True)
            ax[i][j].grid()

        # else, draw the buffer around the original shape and include species counts
        else:
            # get buffer
            buffer = buffer_distances[i+j+extra-1]

            # draw buffer on plot
            a = ax[i][j].plot(*buffer_shapes[buffer].exterior.xy,c='#358BA5',lw=2.0,label=buffer)

            # set title and plot original shape
            a = ax[i][j].set_title("{} Buffer".format(buffer),fontsize=14)
            midwestern.plot(ax=ax[i][j],edgecolor = "#292C26", linewidth = 2.0, facecolor = "None", alpha = 1)

            # plot frogs as circles on map for reference
            a = frogs_set_gdf.plot(
                ax=ax[i][j],facecolor='#d4af37',edgecolor='#d4af37',label='label'
            )

            # get count of frogs within the buffered area
            points = [(x,y) for x,y in zip(frogs_set_gdf["decimalLongitude"],frogs_set_gdf["decimalLatitude"]) if shapely.contains_xy(buffer_shapes[buffer],x,y)]
            count = len(points)

            # add number of counts to graph for reference
            a = ax[i][j].text(150.2,-31.85,'Count={}'.format(count),fontsize=12,color='#6E260E')

            # draw grid on plot
            ax[i][j].set_xticks(list(np.arange(148.7,150.8,0.1)))
            ax[i][j].set_yticks(list(np.arange(-33.7,-31.7,0.1)))
            ax[i][j].set_xticklabels([]) # Remove x-axis ticks
            ax[i][j].set_yticklabels([]) # Remove y-axis ticks
            ax[i][j].tick_params(which='major', bottom=False, left=False)
            ax[i][j].set_axisbelow(True)
            ax[i][j].grid()

        # change limits of graph and set whitespace for better looking plot
        a = ax[i][j].set_ylim([-33.75,-31.7])
        a = ax[i][j].set_xlim([148.7,150.8])
        a = ax[i][j].set_aspect('equal')
        a = plt.subplots_adjust(wspace=0, hspace=0.15)

    # add offset to ensure that we get subplots on both lines of the overall plot
    # (with this offset, the second row shows the 20km, 25km and 30km buffers)
    extra += 3

plt.show()
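The logic behind the counts in each panel can be seen more easily with a toy example. The sketch below is a simplified illustration, not the code used for the figure: it assumes a made-up square area in a CRS measured in metres and four made-up record locations, then counts how many records fall inside the area as the buffer grows.

```python
from shapely.geometry import Point, box

# toy area of interest: a 100 km x 100 km square, in a CRS measured in metres
area = box(0, 0, 100_000, 100_000)

# made-up record locations (metres): one inside, then 4 km, 9 km and 25 km outside
records = [(50_000, 50_000), (104_000, 50_000), (50_000, -9_000), (125_000, 50_000)]

counts = {}
for distance in [0, 5_000, 10_000, 30_000]:
    # buffer(0) leaves the square unchanged, so 0 acts as "no buffer"
    region = area.buffer(distance)
    # count the records whose point falls inside the (buffered) area
    counts[distance] = int(sum(region.contains(Point(x, y)) for x, y in records))

print(counts)
```

Each increase in buffer distance picks up the next-nearest record, so the count rises from 1 with no buffer to 4 with a 30 km buffer. The same pattern drives the changing counts in the frog panels.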
If we are interested in just this species, the best thing to do is to consider the distance of generalisation applied to these data. As we saw above in the generalisationInMetres column, stuttering frog records are generalised to a distance of 10 km, so a 10 km buffer is probably the best option.
If we are interested in more than one species and there are multiple degrees of generalisation (1 km, 10 km, 50 km), then we might need to think about the goal of our species search. Using a large buffer risks capturing too many species—more than are realistically interacting with our area. Considering other factors like topography, river systems, and nutrient gradients can help us determine how big our buffer should practically be.
Threatened species in area
As a final step, let’s download and visualise where threatened species have been observed in our buffered area on a map, remembering that some threatened species are sensitive species and have been generalised. To start, let’s download occurrence records in our area with {galah-python}, with a 10 km buffer around our area (which seems reasonable given our example above). To download records, we’ll once again create a buffer around Mid-Western, then use the bounding box around this area to download occurrence records using atlas_occurrences()9. We’ll also add the generalisationInMetres column to our query again.
# make a 10km buffer
buffer = midwestern_metres["geometry"].buffer(10000)
gdf_buffers_degrees = buffer.to_crs(4326)
buffer_10km = shapely.unary_union(gdf_buffers_degrees)

# get bounding box around buffer for efficient querying
bds = buffer_10km.bounds
bbox_midwestern = shapely.box(bds[0], bds[1], bds[2], bds[3])

# get all occurrence records within bounding box from the ALA
galah.galah_config(email="<your-email@example.com>")
occs = galah.atlas_occurrences(
    bbox=bbox_midwestern,
    fields=["basic","generalisationInMetres"]
)
occs.head(10) # first 10 rows
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus generalisationInMetres
0 -33.239130 150.227790 2017-10-06T00:00:00Z Rhipidura (Rhipidura) albiscapa https://biodiversity.org.au/afd/taxa/97a59c84-... 6f14e70f-07d9-4af8-9556-9b3e2c5fbb7f BirdLife Australia, Birdata PRESENT NaN
1 -33.239130 150.227790 2017-10-06T00:00:00Z Menura (Menura) novaehollandiae https://biodiversity.org.au/afd/taxa/944960f7-... d7996683-ec57-4161-8a27-d3b46495c2b9 BirdLife Australia, Birdata PRESENT NaN
2 -33.239130 150.227790 2017-10-06T00:00:00Z Philemon (Tropidorhynchus) corniculatus https://biodiversity.org.au/afd/taxa/7822040e-... 430693e5-18ec-4f4a-ae13-4721219c2a94 BirdLife Australia, Birdata PRESENT NaN
3 -33.239130 150.227790 2017-10-06T00:00:00Z Pardalotus (Pardalotus) punctatus https://biodiversity.org.au/afd/taxa/5254fe03-... f81e962c-5d3a-435c-97ac-aec05c640178 BirdLife Australia, Birdata PRESENT NaN
4 -33.239130 150.227790 2017-10-06T00:00:00Z Sericornis (Sericornis) frontalis https://biodiversity.org.au/afd/taxa/031b2b69-... a44bbba4-cd71-485f-808d-3476d63842c2 BirdLife Australia, Birdata PRESENT NaN
5 -33.239130 150.227790 2017-10-06T00:00:00Z Pachycephala (Pachycephala) pectoralis https://biodiversity.org.au/afd/taxa/30edbd1a-... 37e273e7-a6b9-49bc-ab07-b912f1af660d BirdLife Australia, Birdata PRESENT NaN
6 -33.239130 150.227790 2017-10-06T00:00:00Z Phylidonyris (Meliornis) novaehollandiae https://biodiversity.org.au/afd/taxa/da002998-... 46fcb5f4-0b37-48f1-a5be-8b3d9ff96bf3 BirdLife Australia, Birdata PRESENT NaN
7 -33.239130 150.227790 2017-10-06T00:00:00Z Strepera (Strepera) graculina https://biodiversity.org.au/afd/taxa/eb315a61-... 5aa466a1-fcbd-4803-ae09-b1d332519bbe BirdLife Australia, Birdata PRESENT NaN
8 -33.239130 150.227790 2017-10-06T00:00:00Z Ptilonorhynchus violaceus https://biodiversity.org.au/afd/taxa/d6192a35-... c2d9e630-440b-4982-bfc7-7d8f839aa39c BirdLife Australia, Birdata PRESENT NaN
9 -33.239117 150.230476 2005-04-02T11:43:00Z Banksia spinulosa https://id.biodiversity.org.au/node/apni/7931274 40f20599-21ad-412b-a07b-f0ba60a0324a iNaturalist Australia PRESENT NaN
Let’s once again check the generalisationInMetres of our data as a starting point. Our results show that records in our area have been generalised to 1 km and 10 km distances.
occs['generalisationInMetres'].value_counts()
generalisationInMetres
1000.0 4723
10000.0 4023
Name: count, dtype: int64
Next we’ll use show_values() to download a list of species on the EPBC Act Threatened species list (see the tab below for more information on how to find species list IDs). By adding all_fields=True, we can append all original columns of the list, which for conservation lists like this includes status information.
Search for species lists available in galah using search_all(lists=True). The species_list_uid for the EPBC Act Threatened Species list is dr656. We can use this information to filter downloads.
galah.search_all(lists="epbc act")
species_list_uid listName description listType dateCreated lastUpdated lastUploaded lastMatched username fullName itemCount region category generalisation authority sdsType isAuthoritative isInvasive isThreatened looseSearch isBIE isSDS wkt
0 dr17756 GM_EPBC Act EPBC Act listed species (flora and fauna CONSERVATION_LIST 2021-08-04T14:00:00Z 2021-08-05T04:33:09Z 2021-08-05T04:33:09Z 2021-08-05T04:33:09Z gminatel@umwelt.com.au None 495 None None None None None False False False None False False None
1 dr656 EPBC Act Threatened Species Threatened species currently listed under the ... CONSERVATION_LIST 2015-04-04T13:00:00Z 2025-07-08T05:14:22Z 2025-07-08T05:13:30Z 2025-07-08T05:13:30Z amanda.buyan@csiro.au None 2160 Australia None None None None True False True None True False
# get all species on epbc list + status info
epbc_list = galah.show_values(field='dr656',lists=True,all_fields=True)
epbc_list.head(10)
id name commonName scientificName lsid dataResourceUid raw_scientificName vernacularName family status sourceStatus genus IUCN_equivalent_status rank
0 6802348 Abutilon julianae Norfolk Island Abutilon Abutilon julianae https://id.biodiversity.org.au/node/apni/2900707 dr656 Abutilon julianae Norfolk Island Abutilon Malvaceae Critically Endangered Critically Endangered Abutilon Critically Endangered NaN
1 6802412 Acacia ammophila None Acacia ammophila https://id.biodiversity.org.au/node/apni/2899480 dr656 Acacia ammophila - Fabaceae Vulnerable Vulnerable Acacia Vulnerable NaN
2 6801831 Acacia anomala Grass Wattle Acacia anomala https://id.biodiversity.org.au/node/apni/2914483 dr656 Acacia anomala Grass Wattle, Chittering Grass Wattle Fabaceae Vulnerable Vulnerable Acacia Vulnerable NaN
3 6801922 Acacia aphylla Leafless Rock Wattle Acacia aphylla https://id.biodiversity.org.au/node/apni/2913504 dr656 Acacia aphylla Leafless Rock Wattle Fabaceae Vulnerable Vulnerable Acacia Vulnerable NaN
4 6801331 Acacia aprica Blunt Wattle Acacia aprica https://id.biodiversity.org.au/node/apni/2903843 dr656 Acacia aprica Blunt Wattle Fabaceae Endangered Endangered Acacia Endangered NaN
5 6800876 Acacia araneosa Spidery Wattle Acacia araneosa https://id.biodiversity.org.au/node/apni/2919802 dr656 Acacia araneosa Spidery Wattle, Balcanoona Wattle Fabaceae Vulnerable Vulnerable Acacia Vulnerable NaN
6 6801415 Acacia aristulata Watheroo Wattle Acacia aristulata https://id.biodiversity.org.au/node/apni/2909621 dr656 Acacia aristulata Watheroo Wattle Fabaceae Endangered Endangered Acacia Endangered NaN
7 6800783 Acacia ataxiphylla subsp. magna Largefruited Tammin Wattle Acacia ataxiphylla subsp. magna https://id.biodiversity.org.au/node/apni/2905184 dr656 Acacia ataxiphylla subsp. magna Large-fruited Tammin Wattle Fabaceae Endangered Endangered Acacia Endangered NaN
8 6801897 Acacia attenuata None Acacia attenuata https://id.biodiversity.org.au/node/apni/2887463 dr656 Acacia attenuata - Fabaceae Vulnerable Vulnerable Acacia Vulnerable NaN
9 6800492 Acacia auratiflora Orangeflowered Wattle Acacia auratiflora https://id.biodiversity.org.au/node/apni/2913715 dr656 Acacia auratiflora Orange-flowered Wattle Fabaceae Endangered Endangered Acacia Endangered NaN
By merging our species list with our occurrence records occs, we filter our data to only the occurrence records of species on the EPBC list.
# merge epbc list with occurrences
threatened_species = pd.merge(occs,epbc_list[['scientificName','status']],on='scientificName')
threatened_species.head(10)
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus generalisationInMetres status
0 -33.238924 150.094060 2009-03-18T00:00:00Z Petauroides volans https://biodiversity.org.au/afd/taxa/5e2dc7c9-... 5e84c591-8442-4ba4-9b49-fbc647318ac9 NSW BioNet Atlas PRESENT NaN Endangered
1 -33.238842 150.155935 2011-01-14T00:00:00Z Stagonopleura (Stagonopleura) guttata https://biodiversity.org.au/afd/taxa/6e872b58-... 011cabe0-abb8-4313-b626-5a4fb01cdc4d NSW BioNet Atlas PRESENT NaN Vulnerable
2 -33.238753 149.825891 2000-02-22T00:00:00Z Notechis scutatus https://biodiversity.org.au/afd/taxa/0b67b63f-... 63f6958c-6376-4e08-b4ef-f31233809122 NSW BioNet Atlas PRESENT NaN Vulnerable
3 -33.237932 150.094441 2009-03-18T00:00:00Z Petauroides volans https://biodiversity.org.au/afd/taxa/5e2dc7c9-... 6c5f4849-4136-40b4-903b-ecb6327b534a NSW BioNet Atlas PRESENT NaN Endangered
4 -33.237884 149.158401 2025-02-16T00:00:00Z Petauroides volans https://biodiversity.org.au/afd/taxa/5e2dc7c9-... 31cfba82-3909-4101-99dd-890260f16484 NSW BioNet Atlas PRESENT NaN Endangered
5 -33.237554 150.181725 2011-01-14T00:00:00Z Climacteris (Climacteris) picumnus victoriae https://biodiversity.org.au/afd/taxa/fe69a214-... d7d1d8e4-76e5-4e5d-af7e-bd593b4e3428 NSW BioNet Atlas PRESENT NaN Vulnerable
6 -33.237539 150.232407 2009-02-19T00:00:00Z Hirundapus caudacutus https://biodiversity.org.au/afd/taxa/6485cd0c-... e109b46e-35c6-4201-acb6-bbae9b56a84e NSW BioNet Atlas PRESENT NaN Vulnerable
7 -33.237364 149.202657 2005-04-24T00:00:00Z Petauroides volans https://biodiversity.org.au/afd/taxa/5e2dc7c9-... fabf9319-4e59-4ed7-9d7d-eb848ae65827 NSW BioNet Atlas PRESENT NaN Endangered
8 -33.237257 150.276301 2013-01-23T12:11:00Z Persoonia hindii https://id.biodiversity.org.au/node/apni/2913791 092b8717-a9ae-4698-bdc3-e9fb806d4436 iNaturalist Australia PRESENT NaN Endangered
9 -33.237188 150.088124 2009-03-18T00:00:00Z Petauroides volans https://biodiversity.org.au/afd/taxa/5e2dc7c9-... ce7a7e40-1220-4cfa-9b31-39cc6927bc04 NSW BioNet Atlas PRESENT NaN Endangered
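If it isn’t obvious why a merge acts as a filter here, the toy example below may help. It is a minimal sketch using made-up records and a made-up one-row list (the record IDs and table contents are hypothetical). It relies on the fact that pd.merge() performs an inner join by default, so only rows whose scientificName appears in both tables are kept.

```python
import pandas as pd

# made-up occurrence records (hypothetical values for illustration only)
toy_occs = pd.DataFrame({
    "scientificName": ["Petauroides volans", "Banksia spinulosa", "Petauroides volans"],
    "recordID": ["rec-1", "rec-2", "rec-3"],
})

# made-up one-row conservation list
toy_list = pd.DataFrame({
    "scientificName": ["Petauroides volans"],
    "status": ["Endangered"],
})

# pd.merge defaults to how="inner": only names present in both tables survive
toy_threatened = pd.merge(toy_occs, toy_list[["scientificName", "status"]], on="scientificName")
print(toy_threatened)
```

The record for the species missing from the toy list is dropped, and the remaining two records gain a status column. Note that the match is on the exact text of scientificName, so names need to be formatted the same way in both tables.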
Finally, we’ll filter our observations to only those within the buffered area around Mid-Western.
# convert to GeoDataFrame
threatened_species_gdf = gpd.GeoDataFrame(
    threatened_species,
    geometry=gpd.points_from_xy(threatened_species.decimalLongitude, threatened_species.decimalLatitude),
    crs="EPSG:4326"
)
# filter to species within our 10km buffer
threatened_species_10km = threatened_species[threatened_species_gdf.geometry.within(buffer_10km)]
We can now plot our observations on a map using {matplotlib}, adjusting the opacity of our points (alpha) so we can see areas with many overlapping points more easily. To understand just how many additional species our 10 km buffer captures, we’ve added a bar plot comparing the number of species in Mid-Western vs Mid-Western + our 10 km buffer. Notice that the number of threatened species detected markedly increases with the addition of our buffer, because we captured a few noticeable hotspots of threatened species observations near the edge of the Mid-Western border. Remember, though, that these might not be real hotspots but rather groups of generalised points!
# map
fig, ax = plt.subplots()
ax.scatter(threatened_species_10km['decimalLongitude'],threatened_species_10km['decimalLatitude'], alpha=0.3, color='#5A2A57')
ax.plot(*midwestern['geometry'][74].exterior.xy,c='#292C26',lw=2)
ax.plot(*buffer_10km.exterior.xy,c='#358BA5',lw=2)
plt.axis('off')
ax.set_aspect(aspect='equal')
plt.show()
Code to make bar plot
# import numpy for arange
import numpy as np

# get number of species within midwestern
midwestern = midwestern.to_crs(4326)
species_mw = threatened_species[threatened_species_gdf.geometry.within(midwestern['geometry'][74])]

# set dictionary to get number of species in midwestern, by status
num_species_mid = {x:0 for x in list(set(species_mw['status']))}

# loop over each key, get the number of unique species for each status in Midwestern
for key in num_species_mid:
    temp = species_mw[species_mw['status'] == key]
    num_species_mid[key] = len(list(set(temp['scientificName'])))

# set dictionary to get number of species in buffer, by status
num_species_buff = {x:0 for x in list(set(threatened_species_10km['status']))}

# loop over each key, get the number of unique species for each status in buffered region
for key in num_species_buff:
    temp = threatened_species_10km[threatened_species_10km['status'] == key]
    num_species_buff[key] = len(list(set(temp['scientificName'])))

# sort buffered statuses from least to most species (barh draws the last item at the top)
num_species_buff_sorted = dict(sorted(num_species_buff.items(), key=lambda item: item[1]))

# use the buffered statuses as the shared order for both sets of bars so each bar matches its label;
# the buffer contains Mid-Western, so a status missing from Mid-Western counts as 0
statuses = list(num_species_buff_sorted.keys())
mid_counts = [num_species_mid.get(status, 0) for status in statuses]
buff_counts = list(num_species_buff_sorted.values())

# create plot
fig, ax = plt.subplots()
h, n = 0.4, np.arange(len(statuses))
bar = ax.barh(n + h/2, mid_counts, height=h, color='#292C26', label='Mid-Western')

# set ticks
bar = ax.set_yticks(range(len(statuses)), labels=statuses)
bar = ax.barh(n - h/2, buff_counts, height=h, color='#358BA5', label='Mid-Western + buffer')
bar = ax.legend()
bar = ax.set_xlabel('Number of Unique Species')
bar = ax.set_aspect(aspect=12)
In our example, record locations were generalised to 1 km or 10 km distances. However, it’s possible for a query to return species locations generalised to greater distances, such as 50 km. In these situations, a species that lives quite far from our specified area might appear in our query! To ensure species lists are accurate, it’s always important to use generalisationInMetres to identify these data points, determine whether they are realistic to include or not, and clean them accordingly!
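One way to start identifying these data points is to flag records whose generalisation distance is larger than the buffer itself. The sketch below uses made-up records, and both the needs_review column and the choice of threshold are assumptions for illustration rather than an ALA rule. It also assumes that a missing (NaN) generalisationInMetres value means no generalisation was applied.

```python
import pandas as pd

BUFFER_METRES = 10_000  # the 10 km buffer used in this post

# made-up records; NaN is assumed to mean the record was not generalised
toy_records = pd.DataFrame({
    "scientificName": ["Species A", "Species B", "Species C", "Species D"],
    "generalisationInMetres": [float("nan"), 1_000, 10_000, 50_000],
})

# flag records whose generalisation distance exceeds the buffer distance
# (this threshold is an assumption for illustration, not an ALA rule)
toy_records["needs_review"] = toy_records["generalisationInMetres"] > BUFFER_METRES
print(toy_records)
```

Only the record generalised to 50 km is flagged. Whether to keep or remove flagged records still depends on what is known about the species and the goal of the search.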
Final thoughts
We hope this post has helped you understand how to draw buffers around a shape, as well as the importance of considering buffer size when determining threatened species in an area. Ultimately, buffer size will depend on the question we are trying to answer, whether that’s for research, monitoring, conservation or environmental impact assessment prior to development.
For other Python posts, check out our beginner’s guide to map species observations or see how to cross reference a species list with a conservation list.
Expand for session info
-----
galah 0.12.1
geopandas 1.1.1
matplotlib 3.10.3
natsort 8.4.0
numpy 2.3.1
pandas 2.3.1
session_info v1.0.1
shapely 2.1.1
-----
Python 3.13.7 (tags/v3.13.7:bcee1c3, Aug 14 2025, 14:15:11) [MSC v.1944 64 bit (AMD64)]
Windows-11-10.0.22631-SP0
-----
Session information updated at 2025-08-26 12:28
Thanks to Cameron Slatyer and Tania Laity for their helpful comments and suggestions for this post.
Footnotes
Check out this section of a previous ALA Labs post for a more complete explanation of what a CRS is.
ALA data is projected using CRS EPSG:4326 (the same one used by Google Earth).
Alternatively, iNaturalist randomises their coordinate locations at a 30 km resolution prior to sharing data with the Atlas of Living Australia.
Some records, if the species is very sensitive, may be withheld altogether.
How nearby is relative to the degree of generalisation, so what we mean by “near” varies.
Generalising data points is good because it makes it very difficult to know the specific location of an observation when access to this information should be restricted for some reason (e.g., endangered, subject to poaching or misuse, privacy concerns). However, it can also make data more difficult to work with because data points are shifted to new locations. In some cases, it can even cause data points to appear in the ocean despite the true location being on land!
Downloading all occurrence records using a bounding box then filtering the records to fit a shapefile can be a quicker way to subset records (rather than waiting for the API to process a more complex shapefile polygon shape). You can find a more in-depth article on this here.
It’s faster to use a bounding box than a polygon shape to download records: a box has fewer points than a complicated polygon shape, making for a far simpler query to process!