Add a buffer to a shapefile and account for obfuscated species locations

Knowing what species have been observed in a local area is a regular task for ecosystem management. Here we show how to add a buffer to a shapefile using {shapely} and {geopandas}. We then demonstrate how data that have been obfuscated (i.e. their location has been made less precise) may affect the buffer size required to more confidently capture threatened species in the area using {galah-python} and {matplotlib}.

Eukaryota
Animalia
Amphibia
Summaries
Python
Authors

Amanda Buyan

Dax Kellie

Published

August 26, 2025


Ecological data is often used to understand what species are found in a given location, especially for conservation monitoring and environmental impact assessment prior to land development. A common method for this task is to use a buffer, an outward boundary drawn a set distance around a given area. Adding a buffer helps to capture all the species in an area, including those that have been observed just outside the area and probably live there, too.

Choosing a buffer size, however, can be tougher than it seems. Individual organisms move, either over the course of a day or an entire season (e.g. migration, perennial growth), so species’ lifecycles and behaviours may determine the size of our final buffer. A more difficult challenge occurs when species are considered sensitive, vulnerable or endangered. These species’ exact point locations are often obfuscated (i.e. their location is made less precise) to keep them safe. This added imprecision will again affect our final decision on buffer size.

In this post, we’ll show how to add a buffer around a shapefile with {geopandas}, {shapely} and {matplotlib}. Then we will use {galah-python} to download data of stuttering frogs (Mixophyes balbus) to demonstrate how the size of a buffer can affect the detection of threatened species in an area. Lastly, we will use {scipy} and {matplotlib} to visualise where threatened species have been observed within a buffered area.

Draw a buffer

For this example, our area of interest is Mid-Western, a Local Government Area (LGA) in New South Wales. We’ll first need to download a shapefile of our area, which we can get by downloading a shapefile of all LGAs from the Australian Bureau of Statistics and filtering to our area. Download the zip file of “Local Government Areas - 2024”, then place the zip file in your local directory. We can then read in the shapefile and show what it looks like using {geopandas}.

For those unfamiliar with Australian geography, the LGA of Mid-Western is located here:

import geopandas as gpd
import shapely
from shapely.geometry import Polygon
lgas = gpd.read_file("LGA_2024_AUST_GDA2020.zip")
lgas.head()
  LGA_CODE24 LGA_NAME24 STE_CODE21       STE_NAME21 AUS_CODE21 AUS_NAME21    AREASQKM                                         LOCI_URI21                                           geometry
0      10050     Albury          1  New South Wales        AUS  Australia    305.6386  https://linked.data.gov.au/dataset/asgsed3/LGA...  POLYGON ((146.86566 -36.07292, 146.86512 -36.0...
1      10180   Armidale          1  New South Wales        AUS  Australia   7809.4406  https://linked.data.gov.au/dataset/asgsed3/LGA...  POLYGON ((152.38816 -30.52639, 152.38812 -30.5...
2      10250    Ballina          1  New South Wales        AUS  Australia    484.9692  https://linked.data.gov.au/dataset/asgsed3/LGA...  MULTIPOLYGON (((153.57106 -28.87381, 153.57106...
3      10300  Balranald          1  New South Wales        AUS  Australia  21690.7493  https://linked.data.gov.au/dataset/asgsed3/LGA...  POLYGON ((143.00433 -33.78164, 143.01538 -33.7...
4      10470   Bathurst          1  New South Wales        AUS  Australia   3817.8645  https://linked.data.gov.au/dataset/asgsed3/LGA...  POLYGON ((149.84877 -33.52784, 149.84864 -33.5...

Then we’ll filter to Mid-Western.

midwestern = lgas[lgas['LGA_NAME24'] == 'Mid-Western']

Now, we will create a 5 km buffer around Mid-Western, as we are looking at a relatively small area. To do this, we’ll need to convert the shapefiles between different Coordinate Reference Systems (CRS)1 to allow us to draw our buffer.

First, we’ll reproject our polygon midwestern to a CRS measured in metres, like Australian Albers (EPSG:3577). Then we can create a buffer in metres around midwestern. Finally, we’ll reproject the buffer to match the CRS of the data we intend to use (EPSG:4326)2, and unify any intersecting shapes into a single shape.

# reproject to Australian Albers CRS
midwestern_metres = midwestern.to_crs(3577)

# create buffer, reproject, unify any overlapping shapes
buffer_5km = midwestern_metres['geometry'].buffer(5000)
buffer_5km_degrees = buffer_5km.to_crs(4326)
union_buffer_5km_degrees = shapely.unary_union(buffer_5km_degrees)
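Why reproject before buffering? Degrees are not a constant ground distance. A degree of longitude shrinks as you move away from the equator, so a buffer drawn directly in EPSG:4326 would be stretched north-south relative to east-west. The sketch below uses a spherical-Earth approximation, which is an assumption since real CRS transformations use an ellipsoid, to estimate how many metres one degree spans near Mid-Western. The latitude of roughly 32.5° S is an approximate value chosen for illustration.

```python
import math

# mean Earth radius in metres (a spherical approximation; real CRSs use an ellipsoid)
EARTH_RADIUS_M = 6_371_000

def metres_per_degree(lat_deg):
    """Approximate ground distance (in metres) spanned by one degree of
    latitude and by one degree of longitude at a given latitude."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    # meridians converge towards the poles, so a degree of longitude shrinks with cos(latitude)
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon

# Mid-Western sits at roughly 32.5 degrees south (an approximate value for illustration)
lat_m, lon_m = metres_per_degree(-32.5)
print(f"1 degree of latitude  ~ {lat_m / 1000:.1f} km")
print(f"1 degree of longitude ~ {lon_m / 1000:.1f} km")

# the same 5 km corresponds to a different number of degrees in each direction
print(f"5 km ~ {5000 / lat_m:.4f} degrees north-south, {5000 / lon_m:.4f} degrees east-west")
```

This is why the buffer is drawn in EPSG:3577, where 5000 means 5000 metres in every direction.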

Let’s plot our 5 km buffer on a map.

# import matplotlib for plotting
import matplotlib.pyplot as plt

# set initial shapefile as axis
ax = midwestern.plot(edgecolor="#292C26", linewidth = 2.0, facecolor="white")

# plot buffer on same axis as original shapefile
ax.plot(*union_buffer_5km_degrees.exterior.xy, c='#358BA5', lw=2.0, label='5 km buffer')
ax.axis('off') # remove axis to make plot look prettier

Now that we’ve drawn our buffer around the Mid-Western LGA, let’s discuss how data obfuscation of sensitive/threatened species data might impact our decision about buffer size.

Obfuscation and why it’s important for sensitive species

When we talk about a record being obfuscated, we mean that the coordinate location of this record has been made less precise, either by generalisation or randomisation. The Atlas of Living Australia generalises coordinate locations by reducing the number of decimal places in the record’s latitude/longitude coordinates, lowering the point’s precision3. The figure below illustrates this: as decimal places are removed, the data appear more ‘grid-like’ because they lose resolution.

# import plotting, animation and data packages
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import math
import numpy as np
import pandas as pd
# uncomment below if you are running in a Jupyter notebook
# %matplotlib ipympl

# create dataframe of points
points = pd.DataFrame({
    'orig_long': [149.9153,149.9181,149.9204,149.9233,149.9101,149.9258,149.9121,149.9163,
                  149.9295,149.9175,149.9287,149.9236,149.9109,149.9091,149.9113,149.9211,
                  149.9073,149.9087,149.9236,149.9241,149.9289], 
    'orig_lat': [-33.3874,-33.3509,-33.3694,-33.3479,-33.3341,-33.3789,-33.3475,-33.3748,
                 -33.3554,-33.3723,-33.3808,-33.3798,-33.3475,-33.3607,-33.3521,-33.3871,
                 -33.3541,-33.3633,-33.3799,-33.3833,-33.3423]
})

# Round to 3 and 2 decimal places
for i in range(3,1,-1):
    factor = 10 ** i
    points['round{}_long'.format(i)] = points['orig_long'].apply(lambda x: math.floor(x * factor) / factor)
    points['round{}_lat'.format(i)] = points['orig_lat'].apply(lambda x: math.ceil(x * factor) / factor)

# create initial figure
fig,ax = plt.subplots(1,2)

# create dictionary for columns - will be easier to use in update function
column_labels = {0: ['orig_long', 'orig_lat'], 
                 1: ['round3_long', 'round3_lat'], 
                 2: ['round2_long', 'round2_lat']}

artists = []
for i in range(3):
    
    ax[0].axis('off')
    table = ax[0].table(cellText=points[column_labels[i]].values, colLabels=['Longitude', 'Latitude'], loc='center')

    ax[1].set_xticks(list(np.arange(149.90,149.93,0.01)))
    ax[1].set_yticks(list(np.arange(-33.39,-33.32,0.01)))
    ax[1].set_xticklabels([])  # Remove x-axis ticks
    ax[1].set_yticklabels([])  # Remove y-axis ticks
    ax[1].tick_params(which='major', bottom=False, left=False)
    ax[1].grid(True)
    scatter = ax[1].scatter(points[column_labels[i][0]],points[column_labels[i][1]],c='purple',alpha=0.5)
    artists.append([table,scatter])

ani = animation.ArtistAnimation(fig=fig, artists=artists, interval=1000, repeat=True)
ani.save('obfuscation.gif', writer='pillow')  # the Pillow writer saves GIFs without needing ffmpeg
plt.show()
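Without the animation, the generalisation in the code above comes down to a floor on longitude and a ceiling on latitude. This moves each southern-hemisphere point to the north-west corner of its grid cell. The sketch below applies that same rule to six of the example points and counts how many distinct locations remain. The rule is assumed here for illustration and simplifies the ALA's actual process. As decimal places are dropped, nearby points collapse onto the same grid corner, which is what makes the data look grid-like.

```python
import math

def snap_north_west(lon, lat, decimals):
    """Generalise a southern-hemisphere coordinate to the north-west corner of
    its grid cell at the given number of decimal places."""
    factor = 10 ** decimals
    # floor moves longitude west; ceil moves a negative latitude north (towards 0)
    return math.floor(lon * factor) / factor, math.ceil(lat * factor) / factor

def distinct_locations(points, decimals=None):
    """Count distinct coordinates, optionally after generalising them."""
    if decimals is None:
        return len(set(points))
    return len({snap_north_west(lon, lat, decimals) for lon, lat in points})

# six of the example points from the animation above
points = [(149.9236, -33.3798), (149.9236, -33.3799), (149.9121, -33.3475),
          (149.9109, -33.3475), (149.9211, -33.3871), (149.9241, -33.3833)]

print("original:", distinct_locations(points))
for decimals in (3, 2):
    print(f"{decimals} decimal places:", distinct_locations(points, decimals))
```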

Generalisation of species records follows the rules of the state or territory in which the species is located, with locations typically generalised to distances of 1 km, 10 km or 50 km from a species’ original location4.

How this affects our ability to know the true location of a species in an area is illustrated in the diagram below. The true location of the point is nearby5, but the process of generalisation has removed decimal places from its coordinates. The effect appears as points that seem to “snap” to the corner of a grid cell, the size of which depends on the distance of generalisation (see the tab below for more information). When points move to their new generalised locations, any of the following three scenarios is possible:

  1. A point falls inside a specified area when its true location is outside the area (left);
  2. A point falls inside a specified area when its true location is inside the area (middle); or
  3. A point falls outside a specified area when its true location is inside the area (right).
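A small sketch makes the three scenarios concrete. The area here is a hypothetical rectangle rather than a real LGA polygon, which is a simplifying assumption. Each record pairs an invented true location with its generalised location, snapped to the north-west corner of a 0.01° cell.

```python
def in_box(lon, lat, box):
    """True if (lon, lat) lies inside or on the edge of box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return xmin <= lon <= xmax and ymin <= lat <= ymax

def classify(true_point, generalised_point, area):
    """Label a record by comparing its true and generalised positions against an area."""
    true_in = in_box(*true_point, area)
    generalised_in = in_box(*generalised_point, area)
    if generalised_in and not true_in:
        return "1: falsely included"
    if generalised_in and true_in:
        return "2: correctly included"
    if true_in and not generalised_in:
        return "3: missed"
    return "correctly excluded"  # both outside: not one of the three scenarios above

# hypothetical rectangular area and records generalised to 2 decimal places (north-west corner)
area = (149.905, -33.395, 149.955, -33.345)
records = {
    "left":   ((149.9571, -33.3617), (149.95, -33.36)),
    "middle": ((149.9312, -33.3744), (149.93, -33.37)),
    "right":  ((149.9068, -33.3462), (149.90, -33.34)),
}
for panel, (true_pt, gen_pt) in records.items():
    print(panel, "->", classify(true_pt, gen_pt, area))
```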

When points are generalised, all points within the same grid cell snap to the same corner regardless of where they sit within the cell. In Australia, this is the north-west corner. In Brazil it’s the north-east corner, in China it’s the south-west corner and in Europe it’s the south-east corner. For more information on the methods of generalisation (and obfuscation more broadly), see this online book on current best practices by Arthur Chapman.
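Every point in a cell snaps to the same corner. A generalised coordinate therefore tells us which cell the true location fell in, but not where in that cell. The minimal sketch below recovers that cell and asks whether the record is certainly inside the area, certainly outside it, or ambiguous. It assumes Australia's north-west convention, a hypothetical rectangular area and a 0.01° cell.

```python
def possible_cell(gen_lon, gen_lat, cell_deg):
    """Grid cell (xmin, ymin, xmax, ymax) that a record snapped to its
    north-west corner at (gen_lon, gen_lat) could have come from."""
    return (gen_lon, gen_lat - cell_deg, gen_lon + cell_deg, gen_lat)

def relation_to_area(cell, area):
    """Is the cell entirely inside, entirely outside, or straddling the area?"""
    cx0, cy0, cx1, cy1 = cell
    ax0, ay0, ax1, ay1 = area
    if cx1 < ax0 or cx0 > ax1 or cy1 < ay0 or cy0 > ay1:
        return "outside"
    if ax0 <= cx0 and cx1 <= ax1 and ay0 <= cy0 and cy1 <= ay1:
        return "inside"
    return "ambiguous"  # the true location could be on either side of the boundary

area = (149.905, -33.395, 149.955, -33.345)  # hypothetical rectangular area
for gen in [(149.93, -33.37), (149.95, -33.36), (150.00, -33.20)]:
    print(gen, relation_to_area(possible_cell(*gen, 0.01), area))
```

Records in the "ambiguous" group are exactly the ones that a buffer is meant to catch.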

Thanks to generalisation, points can appear in places that they actually aren’t6! When dealing with many species’ point locations, this can make it difficult to keep track of which of these three scenarios might affect each generalised species record.

The main takeaway is that obfuscation makes it harder to know that you are accurately capturing all the species in a defined area. In ecological assessment, it’s generally better to capture more rather than less because species interact with their broader ecosystems (outside of our human-defined boundaries). Therefore, the goal we are trying to achieve by using a buffer is to realistically estimate how many species are influenced by the health of our defined area, not just what has been observed within a pre-defined boundary.

Things get even more complicated when you consider that the location of every observation has a degree of uncertainty around it. This information, held in coordinateUncertaintyInMeters, adds yet another layer of complexity to knowing the true location of a given species observation. For many records, this uncertainty reflects the opportunistic nature of an observation (e.g., an organism was observed at a distance before moving somewhere else). For others, it is due to inaccurate measurement or documentation. Whatever the reason, uncertainty is another important aspect, and a difficult challenge, to consider when determining which species inhabit an area.
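One rough way to act on this is to compare each record's coordinateUncertaintyInMeters with the buffer distance. If the stated uncertainty is larger than the buffer, the buffer alone cannot absorb it, and the record may deserve a closer look. The records and the rule in the sketch below are hypothetical assumptions, not an ALA method.

```python
def split_by_uncertainty(records, buffer_m):
    """Group record IDs by whether their stated coordinate uncertainty fits within the buffer."""
    groups = {"within_buffer": [], "exceeds_buffer": [], "unknown": []}
    for rec in records:
        u = rec.get("coordinateUncertaintyInMeters")
        # u != u catches NaN, which pandas uses for missing numbers
        if u is None or u != u:
            groups["unknown"].append(rec["recordID"])
        elif u <= buffer_m:
            groups["within_buffer"].append(rec["recordID"])
        else:
            groups["exceeds_buffer"].append(rec["recordID"])
    return groups

# hypothetical records
records = [
    {"recordID": "r1", "coordinateUncertaintyInMeters": 30},
    {"recordID": "r2", "coordinateUncertaintyInMeters": 2500},
    {"recordID": "r3", "coordinateUncertaintyInMeters": 12000},
    {"recordID": "r4", "coordinateUncertaintyInMeters": None},
]
print(split_by_uncertainty(records, buffer_m=10000))
```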

Example: The stuttering frog

Let’s see an example of how buffer size affects our ability to detect a threatened species that is also a sensitive species in our area of interest.

The stuttering frog (Mixophyes balbus) is a large Australian species of frog that inhabits temperate and sub-tropical rainforest, as well as wet sclerophyll forest. They have a brown back and a yellow underbelly, with a light blue iris that diffuses into gold above the pupil. Their call is a “kook kook kook kra-a-ak kruk kruk”, which lasts 1–2 seconds7.

Download data

Let’s download occurrence records of stuttering frogs in a bounding box that encompasses an area slightly larger than the Mid-Western LGA8. We will also include a column, generalisationInMetres, with the distance by which each record’s location has been generalised.

import galah
import shapely
galah.galah_config(email="<your-email-address>")
#                              xmin,  ymin,  xmax,  ymax
bbox_midwestern = shapely.box(148.5, -33.6, 151.1, -31.6)
frogs = galah.atlas_occurrences(
    taxa='Mixophyes balbus',
    bbox=bbox_midwestern,
    fields=["basic","generalisationInMetres"]
)
frogs.head(10) # first 10 rows
   decimalLatitude  decimalLongitude             eventDate    scientificName                                     taxonConceptID                              recordID                      dataResourceName occurrenceStatus  generalisationInMetres
0            -33.6             150.3  2002-02-01T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  4d9ab89c-00ff-436a-a318-f1a167c47679                      NSW BioNet Atlas          PRESENT                   10000
1            -33.6             150.5  1977-01-25T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  bac77bf9-608a-42eb-96e8-52067feb8ebf  Queensland Museum provider for OZCAM          PRESENT                   10000
2            -33.5             150.1  1978-12-30T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  de8c5a71-7beb-48d5-bab0-5e76268266d8  Queensland Museum provider for OZCAM          PRESENT                   10000
3            -33.4             150.2  2004-12-11T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  24c1c5d8-63ec-4a61-8636-f5c78f61ba76                      NSW BioNet Atlas          PRESENT                   10000
4            -33.1             150.2  2007-03-26T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  8345db64-b2ab-485b-8195-7164d79d6ac0                      NSW BioNet Atlas          PRESENT                   10000
5            -32.8             150.4  2005-12-08T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  ca6be186-c0d3-405c-b8f1-20a2e57faa03                      NSW BioNet Atlas          PRESENT                   10000
6            -32.8             150.4  2005-12-08T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  823c9ded-75ab-4664-94d6-1987ea68d879                      NSW BioNet Atlas          PRESENT                   10000
7            -32.8             150.4  2005-12-07T00:00:00Z  Mixophyes balbus  https://biodiversity.org.au/afd/taxa/cfb20a07-...  1b7fba62-3114-432d-9d6d-e6320f932555                      NSW BioNet Atlas          PRESENT                   10000

A good first step is to check the distance of generalisation applied to our stuttering frog data. The result shows that all 8 records in frogs have been generalised to 10,000 metres (10 km).

frogs['generalisationInMetres'].value_counts()
generalisationInMetres
10000    8
Name: count, dtype: int64

How buffer size affects species detection

Let’s consider how different buffer sizes impact our ability to detect threatened species. Below, we’ve created buffers of increasing size around Mid-Western, then overlaid a grid of 0.1° (roughly 10 km) blocks on our map. You’ll notice the points have snapped to the corners of our grid. Stuttering frog observations lie just outside Mid-Western. Without a buffer this frog species would go undetected, but with a larger buffer we would detect multiple observations. Which should we choose?

# import numpy to build grid tick positions
import numpy as np

# drop duplicate locations and records without coordinates
frogs_set = (
    frogs.drop_duplicates(subset=['decimalLatitude', 'decimalLongitude'])
    .dropna(subset=['decimalLatitude', 'decimalLongitude'])
    .reset_index(drop=True)
)

# convert to GeoDataFrame
frogs_set_gdf = gpd.GeoDataFrame(
    frogs_set,
    geometry=gpd.points_from_xy(frogs_set.decimalLongitude, frogs_set.decimalLatitude),
    crs="EPSG:4326"
)

# initialise buffer data for visualisation (lengths in metres)
buffer_lengths = {"5km": 5000, "10km": 10000, "15km": 15000, "20km": 20000, "25km": 25000, "30km": 30000}

# get shapefile into Australian Albers CRS
midwestern_metres = midwestern.to_crs(3577)

# loop over each buffer length to create the buffer,
# then convert it back to degrees for conformance with the ALA CRS and unify any overlapping shapes
buffer_shapes = {}
for length, metres in buffer_lengths.items():
    buffer_degrees = midwestern_metres['geometry'].buffer(metres).to_crs(4326)
    buffer_shapes[length] = shapely.unary_union(buffer_degrees)

# first panel: the original shape with no buffer; remaining panels: buffers of increasing size
# (the 2 x 3 grid holds the unbuffered LGA plus the first five buffers, 5 km to 25 km)
panels = [("No Buffer", shapely.unary_union(midwestern.geometry))]
panels += [("{} Buffer".format(length), shape) for length, shape in buffer_shapes.items()]

# start plots
fig, ax = plt.subplots(2, 3, figsize=(15, 10))

for axis, (title, shape) in zip(ax.flat, panels):

    # draw the buffer outline (the unbuffered panel only shows the original shape)
    if title != "No Buffer":
        gpd.GeoSeries([shape], crs=4326).boundary.plot(ax=axis, color='#358BA5', linewidth=2.0)

    # set title and plot original shape
    axis.set_title(title, fontsize=14)
    midwestern.plot(ax=axis, edgecolor="#292C26", linewidth=2.0, facecolor="none")

    # plot frogs as circles on map for reference
    frogs_set_gdf.plot(ax=axis, facecolor='#d4af37', edgecolor='#d4af37')

    # count frog locations inside the original shape or buffer, and add the count to the graph
    count = int(shapely.contains_xy(shape, frogs_set_gdf["decimalLongitude"], frogs_set_gdf["decimalLatitude"]).sum())
    axis.text(150.2, -31.85, 'Count={}'.format(count), fontsize=12, color='#6E260E')

    # draw a 0.1 degree grid on the plot without tick labels
    axis.set_xticks(np.arange(148.7, 150.8, 0.1))
    axis.set_yticks(np.arange(-33.7, -31.7, 0.1))
    axis.set_xticklabels([])
    axis.set_yticklabels([])
    axis.tick_params(which='major', bottom=False, left=False)
    axis.set_axisbelow(True)
    axis.grid()

    # change limits of graph for a better looking plot
    axis.set_ylim([-33.75, -31.7])
    axis.set_xlim([148.7, 150.8])
    axis.set_aspect('equal')

# set whitespace between subplots
plt.subplots_adjust(wspace=0, hspace=0.15)

plt.show();
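A point is captured by a buffer of distance d when it lies inside the area or within d of its boundary. Shapely's default round-cornered buffer approximates this rule. The sketch below applies it to a hypothetical rectangular area in projected metres instead of the real Mid-Western polygon. It shows why detection counts can only stay the same or grow as the buffer widens.

```python
import math

def distance_to_box(x, y, box):
    """Shortest distance from a point to an axis-aligned rectangle
    box = (xmin, ymin, xmax, ymax); 0 if the point lies inside it."""
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)

def detections_by_buffer(points, box, buffers_m):
    """Count points inside the area or within each buffer distance of it."""
    return {b: sum(distance_to_box(x, y, box) <= b for x, y in points) for b in buffers_m}

# hypothetical 50 km x 40 km study area and observations, all in projected metres
area = (0, 0, 50_000, 40_000)
observations = [(25_000, 20_000), (52_000, 10_000), (-8_000, 30_000),
                (60_000, 47_000), (25_000, -21_000)]

counts = detections_by_buffer(observations, area, [0, 5_000, 10_000, 15_000, 20_000, 25_000, 30_000])
for b, n in counts.items():
    print(f"{b / 1000:.0f} km buffer: {n} observation(s)")
```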

If we are interested in just this species, the best thing to do is to consider the distance of generalisation applied to these data. As we saw above in the generalisationInMetres column, stuttering frog records are generalised to a distance of 10 km, so a 10 km buffer is probably the best option.

If we are interested in more than one species and there are multiple degrees of generalisation (1 km, 10 km, 50 km), then we might need to think about the goal of our species search. Using a large buffer risks capturing too many species—more than are realistically interacting with our area. Considering other factors like topography, river systems, and nutrient gradients can help us determine how big our buffer should practically be.
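One simple rule, which is our own assumption extending the single-species reasoning above, makes the trade-off explicit. It suggests a buffer for each species equal to the largest generalisation applied to its records. You can then decide whether to use the largest of these values or a per-species buffer. The records below are hypothetical.

```python
import math
from collections import defaultdict

def buffer_per_species(records, minimum_m=0):
    """Suggest a buffer (metres) per species: the largest generalisation
    distance applied to any of that species' records, or minimum_m if larger."""
    suggested = defaultdict(lambda: minimum_m)
    for rec in records:
        gen = rec.get("generalisationInMetres")
        # treat missing values (None or NaN) as "not generalised"
        if gen is None or (isinstance(gen, float) and math.isnan(gen)):
            gen = 0
        species = rec["scientificName"]
        suggested[species] = max(suggested[species], gen)
    return dict(suggested)

# hypothetical records for three species
records = [
    {"scientificName": "Species A", "generalisationInMetres": 10000},
    {"scientificName": "Species A", "generalisationInMetres": 10000},
    {"scientificName": "Species B", "generalisationInMetres": 1000},
    {"scientificName": "Species B", "generalisationInMetres": None},
    {"scientificName": "Species C", "generalisationInMetres": float("nan")},
]
per_species = buffer_per_species(records)
print(per_species)
print("largest:", max(per_species.values()))
```

The largest value alone can be an overly wide buffer for most species. A per-species buffer avoids that, but it is harder to map.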

Threatened species in area

As a final step, let’s download and map where threatened species have been observed in our buffered area, remembering that some threatened species are also sensitive species whose locations have been generalised. We’ll use a 10 km buffer around Mid-Western, which seems reasonable given our example above. To download records with {galah-python}, we’ll once again create a buffer around Mid-Western, then pass the bounding box around this buffer to atlas_occurrences()9. We’ll also add the generalisationInMetres column to our query again.

# make a 10km buffer
buffer = midwestern_metres["geometry"].buffer(10000)
gdf_buffers_degrees = buffer.to_crs(4326)
buffer_10km = shapely.unary_union(gdf_buffers_degrees)

# get bounding box around buffer for efficient querying
bds = buffer_10km.bounds
bbox_midwestern = shapely.box(bds[0], bds[1], bds[2], bds[3])

# get all occurrence records within bounding box from the ALA
galah.galah_config(email="<your-email@example.com>")
occs = galah.atlas_occurrences(
    bbox=bbox_midwestern, 
    fields=["basic","generalisationInMetres"]
)

occs.head(10)  # first 10 rows
   decimalLatitude  decimalLongitude             eventDate                            scientificName                                     taxonConceptID                              recordID             dataResourceName occurrenceStatus  generalisationInMetres
0       -33.239130        150.227790  2017-10-06T00:00:00Z           Rhipidura (Rhipidura) albiscapa  https://biodiversity.org.au/afd/taxa/97a59c84-...  6f14e70f-07d9-4af8-9556-9b3e2c5fbb7f  BirdLife Australia, Birdata          PRESENT                     NaN
1       -33.239130        150.227790  2017-10-06T00:00:00Z           Menura (Menura) novaehollandiae  https://biodiversity.org.au/afd/taxa/944960f7-...  d7996683-ec57-4161-8a27-d3b46495c2b9  BirdLife Australia, Birdata          PRESENT                     NaN
2       -33.239130        150.227790  2017-10-06T00:00:00Z   Philemon (Tropidorhynchus) corniculatus  https://biodiversity.org.au/afd/taxa/7822040e-...  430693e5-18ec-4f4a-ae13-4721219c2a94  BirdLife Australia, Birdata          PRESENT                     NaN
3       -33.239130        150.227790  2017-10-06T00:00:00Z         Pardalotus (Pardalotus) punctatus  https://biodiversity.org.au/afd/taxa/5254fe03-...  f81e962c-5d3a-435c-97ac-aec05c640178  BirdLife Australia, Birdata          PRESENT                     NaN
4       -33.239130        150.227790  2017-10-06T00:00:00Z         Sericornis (Sericornis) frontalis  https://biodiversity.org.au/afd/taxa/031b2b69-...  a44bbba4-cd71-485f-808d-3476d63842c2  BirdLife Australia, Birdata          PRESENT                     NaN
5       -33.239130        150.227790  2017-10-06T00:00:00Z    Pachycephala (Pachycephala) pectoralis  https://biodiversity.org.au/afd/taxa/30edbd1a-...  37e273e7-a6b9-49bc-ab07-b912f1af660d  BirdLife Australia, Birdata          PRESENT                     NaN
6       -33.239130        150.227790  2017-10-06T00:00:00Z  Phylidonyris (Meliornis) novaehollandiae  https://biodiversity.org.au/afd/taxa/da002998-...  46fcb5f4-0b37-48f1-a5be-8b3d9ff96bf3  BirdLife Australia, Birdata          PRESENT                     NaN
7       -33.239130        150.227790  2017-10-06T00:00:00Z             Strepera (Strepera) graculina  https://biodiversity.org.au/afd/taxa/eb315a61-...  5aa466a1-fcbd-4803-ae09-b1d332519bbe  BirdLife Australia, Birdata          PRESENT                     NaN
8       -33.239130        150.227790  2017-10-06T00:00:00Z                 Ptilonorhynchus violaceus  https://biodiversity.org.au/afd/taxa/d6192a35-...  c2d9e630-440b-4982-bfc7-7d8f839aa39c  BirdLife Australia, Birdata          PRESENT                     NaN
9       -33.239117        150.230476  2005-04-02T11:43:00Z                         Banksia spinulosa   https://id.biodiversity.org.au/node/apni/7931274  40f20599-21ad-412b-a07b-f0ba60a0324a        iNaturalist Australia          PRESENT                     NaN

Let’s once again check the generalisationInMetres of our data as a starting point. Our results show that some records in our area have been generalised to 1 km and some to 10 km.

occs['generalisationInMetres'].value_counts()
generalisationInMetres
1000.0     4723
10000.0    4023
Name: count, dtype: int64
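By default, value_counts() ignores missing values. The output above therefore only describes records that were generalised, and the many records with NaN (like those in occs.head() above) are not counted. Passing dropna=False shows them as well, as in this sketch on a small, made-up Series.

```python
import pandas as pd

# hypothetical generalisationInMetres values; NaN means a record was not generalised
gen = pd.Series([float("nan"), float("nan"), 1000.0, 10000.0, float("nan"), 1000.0],
                name="generalisationInMetres")

print(gen.value_counts())              # missing values are excluded by default
print(gen.value_counts(dropna=False))  # missing values are counted as well
print(f"{gen.notna().mean():.0%} of records are generalised")
```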

Next we’ll use show_values() to download a list of species on the EPBC Act Threatened species list (see the tab below for more information on how to find species list IDs). By adding all_fields=True, we can append all original columns of the list, which for conservation lists like this includes status information.

Search for species lists available in galah by passing a search term to search_all(lists=...). The species_list_uid for the EPBC Act Threatened Species list is dr656. We can use this information to filter downloads.

galah.search_all(lists="epbc act")
  species_list_uid                     listName                                        description           listType           dateCreated           lastUpdated          lastUploaded           lastMatched                username fullName  itemCount     region category generalisation authority sdsType  isAuthoritative  isInvasive  isThreatened looseSearch  isBIE  isSDS   wkt
0          dr17756                  GM_EPBC Act           EPBC Act listed species (flora and fauna  CONSERVATION_LIST  2021-08-04T14:00:00Z  2021-08-05T04:33:09Z  2021-08-05T04:33:09Z  2021-08-05T04:33:09Z  gminatel@umwelt.com.au     None        495       None     None           None      None    None            False       False         False        None  False  False  None
1            dr656  EPBC Act Threatened Species  Threatened species currently listed under the ...  CONSERVATION_LIST  2015-04-04T13:00:00Z  2025-07-08T05:14:22Z  2025-07-08T05:13:30Z  2025-07-08T05:13:30Z   amanda.buyan@csiro.au     None       2160  Australia     None           None      None    None             True       False          True        None   True  False      
# get all species on epbc list + status info
epbc_list = galah.show_values(field='dr656',lists=True,all_fields=True)
epbc_list.head(10)
        id                             name                  commonName                   scientificName                                              lsid dataResourceUid               raw_scientificName                         vernacularName     family                 status           sourceStatus     genus IUCN_equivalent_status rank
0  6802348                Abutilon julianae     Norfolk Island Abutilon                Abutilon julianae  https://id.biodiversity.org.au/node/apni/2900707           dr656                Abutilon julianae                Norfolk Island Abutilon  Malvaceae  Critically Endangered  Critically Endangered  Abutilon  Critically Endangered  NaN
1  6802412                 Acacia ammophila                        None                 Acacia ammophila  https://id.biodiversity.org.au/node/apni/2899480           dr656                 Acacia ammophila                                      -   Fabaceae             Vulnerable             Vulnerable    Acacia             Vulnerable  NaN
2  6801831                   Acacia anomala                Grass Wattle                   Acacia anomala  https://id.biodiversity.org.au/node/apni/2914483           dr656                   Acacia anomala  Grass Wattle, Chittering Grass Wattle   Fabaceae             Vulnerable             Vulnerable    Acacia             Vulnerable  NaN
3  6801922                   Acacia aphylla        Leafless Rock Wattle                   Acacia aphylla  https://id.biodiversity.org.au/node/apni/2913504           dr656                   Acacia aphylla                   Leafless Rock Wattle   Fabaceae             Vulnerable             Vulnerable    Acacia             Vulnerable  NaN
4  6801331                    Acacia aprica                Blunt Wattle                    Acacia aprica  https://id.biodiversity.org.au/node/apni/2903843           dr656                    Acacia aprica                           Blunt Wattle   Fabaceae             Endangered             Endangered    Acacia             Endangered  NaN
5  6800876                  Acacia araneosa              Spidery Wattle                  Acacia araneosa  https://id.biodiversity.org.au/node/apni/2919802           dr656                  Acacia araneosa      Spidery Wattle, Balcanoona Wattle   Fabaceae             Vulnerable             Vulnerable    Acacia             Vulnerable  NaN
6  6801415                Acacia aristulata             Watheroo Wattle                Acacia aristulata  https://id.biodiversity.org.au/node/apni/2909621           dr656                Acacia aristulata                        Watheroo Wattle   Fabaceae             Endangered             Endangered    Acacia             Endangered  NaN
7  6800783  Acacia ataxiphylla subsp. magna  Largefruited Tammin Wattle  Acacia ataxiphylla subsp. magna  https://id.biodiversity.org.au/node/apni/2905184           dr656  Acacia ataxiphylla subsp. magna            Large-fruited Tammin Wattle   Fabaceae             Endangered             Endangered    Acacia             Endangered  NaN
8  6801897                 Acacia attenuata                        None                 Acacia attenuata  https://id.biodiversity.org.au/node/apni/2887463           dr656                 Acacia attenuata                                      -   Fabaceae             Vulnerable             Vulnerable    Acacia             Vulnerable  NaN
9  6800492               Acacia auratiflora       Orangeflowered Wattle               Acacia auratiflora  https://id.biodiversity.org.au/node/apni/2913715           dr656               Acacia auratiflora                 Orange-flowered Wattle   Fabaceae             Endangered             Endangered    Acacia             Endangered  NaN

By merging our species list with our occurrence records occs (pd.merge() performs an inner join by default), we keep only occurrence records of species on the EPBC list.

# merge epbc list with occurrences
threatened_species = pd.merge(occs,epbc_list[['scientificName','status']],on='scientificName')
threatened_species.head(10)
   decimalLatitude  decimalLongitude             eventDate                                scientificName                                     taxonConceptID                              recordID       dataResourceName occurrenceStatus  generalisationInMetres      status
0       -33.238924        150.094060  2009-03-18T00:00:00Z                            Petauroides volans  https://biodiversity.org.au/afd/taxa/5e2dc7c9-...  5e84c591-8442-4ba4-9b49-fbc647318ac9       NSW BioNet Atlas          PRESENT                     NaN  Endangered
1       -33.238842        150.155935  2011-01-14T00:00:00Z         Stagonopleura (Stagonopleura) guttata  https://biodiversity.org.au/afd/taxa/6e872b58-...  011cabe0-abb8-4313-b626-5a4fb01cdc4d       NSW BioNet Atlas          PRESENT                     NaN  Vulnerable
2       -33.238753        149.825891  2000-02-22T00:00:00Z                             Notechis scutatus  https://biodiversity.org.au/afd/taxa/0b67b63f-...  63f6958c-6376-4e08-b4ef-f31233809122       NSW BioNet Atlas          PRESENT                     NaN  Vulnerable
3       -33.237932        150.094441  2009-03-18T00:00:00Z                            Petauroides volans  https://biodiversity.org.au/afd/taxa/5e2dc7c9-...  6c5f4849-4136-40b4-903b-ecb6327b534a       NSW BioNet Atlas          PRESENT                     NaN  Endangered
4       -33.237884        149.158401  2025-02-16T00:00:00Z                            Petauroides volans  https://biodiversity.org.au/afd/taxa/5e2dc7c9-...  31cfba82-3909-4101-99dd-890260f16484       NSW BioNet Atlas          PRESENT                     NaN  Endangered
5       -33.237554        150.181725  2011-01-14T00:00:00Z  Climacteris (Climacteris) picumnus victoriae  https://biodiversity.org.au/afd/taxa/fe69a214-...  d7d1d8e4-76e5-4e5d-af7e-bd593b4e3428       NSW BioNet Atlas          PRESENT                     NaN  Vulnerable
6       -33.237539        150.232407  2009-02-19T00:00:00Z                         Hirundapus caudacutus  https://biodiversity.org.au/afd/taxa/6485cd0c-...  e109b46e-35c6-4201-acb6-bbae9b56a84e       NSW BioNet Atlas          PRESENT                     NaN  Vulnerable
7       -33.237364        149.202657  2005-04-24T00:00:00Z                            Petauroides volans  https://biodiversity.org.au/afd/taxa/5e2dc7c9-...  fabf9319-4e59-4ed7-9d7d-eb848ae65827       NSW BioNet Atlas          PRESENT                     NaN  Endangered
8       -33.237257        150.276301  2013-01-23T12:11:00Z                              Persoonia hindii   https://id.biodiversity.org.au/node/apni/2913791  092b8717-a9ae-4698-bdc3-e9fb806d4436  iNaturalist Australia          PRESENT                     NaN  Endangered
9       -33.237188        150.088124  2009-03-18T00:00:00Z                            Petauroides volans  https://biodiversity.org.au/afd/taxa/5e2dc7c9-...  ce7a7e40-1220-4cfa-9b31-39cc6927bc04       NSW BioNet Atlas          PRESENT                     NaN  Endangered
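The merge works as a filter because pd.merge() defaults to an inner join (how='inner'): occurrence records whose scientificName has no match in the EPBC list are dropped. Here is a minimal sketch on hypothetical toy data (the species names and statuses are made up for illustration). It also shows how indicator=True can help list occurrence names that failed to match, which is worth checking because naming differences can silently drop records.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not real records or statuses)
occs_toy = pd.DataFrame({
    "scientificName": ["Species A", "Species B", "Species A", "Species C"],
    "decimalLatitude": [-33.1, -33.2, -33.3, -33.4],
})
epbc_toy = pd.DataFrame({
    "scientificName": ["Species A", "Species C", "Species D"],
    "status": ["Vulnerable", "Endangered", "Endangered"],
})

# how="inner" is the default: only names present in BOTH tables are kept,
# so "Species B" (not listed) and "Species D" (never observed) drop out
threatened_toy = pd.merge(
    occs_toy, epbc_toy[["scientificName", "status"]], on="scientificName", how="inner"
)
print(threatened_toy)

# a left join with indicator=True adds a _merge column, which shows
# which occurrence names did not match the list
check = pd.merge(occs_toy, epbc_toy, on="scientificName", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "scientificName"].unique()
print(unmatched)
```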

Finally, we’ll filter our observations to only those within the buffered area around Mid-Western.

# convert to GeoDataFrame
threatened_species_gdf = gpd.GeoDataFrame(
    threatened_species, 
    geometry=gpd.points_from_xy(threatened_species.decimalLongitude, threatened_species.decimalLatitude), 
    crs="EPSG:4326"
)

# filter to species within our 10km buffer
threatened_species_10km = threatened_species[threatened_species_gdf.geometry.within(buffer_10km)]
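The filter above relies on within(), which keeps only points that fall inside the polygon (points lying exactly on the boundary are not counted as within). Because buffer() takes its distance in the units of the coordinates, a 10 km buffer has to be built in a projected CRS with metre units. In EPSG:4326 the units are degrees, so the same call would buffer by 10,000 degrees. In {geopandas}, a layer can be reprojected with to_crs() to a metric CRS (for example EPSG:3577, GDA94 / Australian Albers) before buffering, then back to EPSG:4326 afterwards. The sketch below uses {shapely} on a hypothetical square study area and made-up points in metre coordinates to show how a buffer pulls in nearby observations.

```python
from shapely.geometry import Point, box

# Hypothetical 20 km x 20 km study area in a projected CRS with metre units
area = box(0, 0, 20_000, 20_000)

# buffer() uses the units of the coordinates, so 10_000 here means 10 km
buffer_10km_toy = area.buffer(10_000)

# Hypothetical observation points (x, y in metres)
points = [
    Point(5_000, 5_000),    # inside the study area
    Point(15_000, 18_000),  # inside the study area
    Point(25_000, 10_000),  # 5 km east of the area: inside the buffer only
    Point(10_000, -8_000),  # 8 km south of the area: inside the buffer only
    Point(40_000, 10_000),  # 20 km east of the area: outside both
]

in_area = [p for p in points if p.within(area)]
in_buffer = [p for p in points if p.within(buffer_10km_toy)]

print(f"{len(in_area)} points in the area, {len(in_buffer)} in area + buffer")
```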

We can now plot our observations on a map using {matplotlib}, lowering the opacity of our points (alpha) so that areas with many overlapping points are easier to see. To show how many more species our 10 km buffer captures, we've also made a bar plot comparing the number of threatened species in Mid-Western vs Mid-Western + our 10 km buffer. The number of threatened species detected increases markedly with the buffer because it captures a few noticeable hotspots of threatened species observations near the Mid-Western border. Remember, though, that these might not be real hotspots but rather clusters of generalised points!
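The bar plot counts unique species per conservation status, not observation records, and the Mid-Western and buffered counts need to line up on the same statuses. Here is a minimal pandas sketch of that counting step on hypothetical toy records. It uses groupby().nunique() so repeat sightings of a species count once, and reindex() so a status seen only in the buffer still gets a (zero) count for the smaller area.

```python
import pandas as pd

# Hypothetical toy records: several observations can belong to one species
records_area = pd.DataFrame({
    "scientificName": ["Species A", "Species A", "Species B"],
    "status": ["Vulnerable", "Vulnerable", "Endangered"],
})
records_buffer = pd.DataFrame({
    "scientificName": ["Species A", "Species A", "Species B", "Species C", "Species D"],
    "status": ["Vulnerable", "Vulnerable", "Endangered", "Vulnerable", "Critically Endangered"],
})

# nunique() counts distinct species per status, so repeat sightings count once
counts_buffer = records_buffer.groupby("status")["scientificName"].nunique().sort_values()

# align the area counts to the same status order; statuses only seen in the
# buffer get a count of 0 instead of going missing
counts_area = (
    records_area.groupby("status")["scientificName"].nunique()
    .reindex(counts_buffer.index, fill_value=0)
)

print(pd.DataFrame({"area": counts_area, "area + buffer": counts_buffer}))
```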

# map
fig, ax = plt.subplots()
ax.scatter(threatened_species_10km['decimalLongitude'],threatened_species_10km['decimalLatitude'], alpha=0.3, color='#5A2A57')
ax.plot(*midwestern['geometry'][74].exterior.xy,c='#292C26',lw=2)
ax.plot(*buffer_10km.exterior.xy,c='#358BA5',lw=2)
plt.axis('off')
ax.set_aspect(aspect='equal')
plt.show();
Code to make bar plot
# import numpy for arange
import numpy as np

# get observations of threatened species within Mid-Western
midwestern = midwestern.to_crs(4326)
species_mw = threatened_species[threatened_species_gdf.geometry.within(midwestern['geometry'][74])]

# count the number of unique species for each status in Mid-Western
num_species_mid = species_mw.groupby('status')['scientificName'].nunique()

# count the number of unique species for each status in the buffered region
num_species_buff = threatened_species_10km.groupby('status')['scientificName'].nunique()

# sort statuses from fewest to most species in the buffered region (barh draws the
# first bar at the bottom, so the largest count ends up on top), then align the
# Mid-Western counts to the same order, filling statuses absent from Mid-Western with 0
num_species_buff = num_species_buff.sort_values()
num_species_mid = num_species_mid.reindex(num_species_buff.index, fill_value=0)

# create plot: Mid-Western bars sit just above each tick, buffered bars just below
fig, ax = plt.subplots()
h, n = 0.4, np.arange(len(num_species_buff))
ax.barh(n + h/2, num_species_mid.values, height=h, color='#292C26', label='Mid-Western')
ax.barh(n - h/2, num_species_buff.values, height=h, color='#358BA5', label='Mid-Western + buffer')

# set ticks, legend, axis label and aspect
ax.set_yticks(n, labels=num_species_buff.index)
ax.legend()
ax.set_xlabel('Number of Unique Species')
ax.set_aspect(aspect=12)
plt.show()

Number of threatened species observed in Mid-Western only vs Mid-Western + a buffer of 10 km

Map of observations of threatened species within the buffered area around Mid-Western

In our example, record locations were generalised to 1 km or 10 km distances. However, some records may be generalised to much greater distances, such as 50 km. In these situations, a species that lives quite far from our specified area might appear in our query! To keep species lists accurate, use the generalisationInMetres column to identify these records, decide whether it is realistic to include them, and clean the data accordingly!
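Here is a minimal sketch of that clean-up. It assumes we treat generalisationInMetres as an upper bound on how far a recorded point may sit from its true location. That is an assumption: the actual displacement depends on how each data provider generalises. Records generalised beyond a chosen threshold are flagged for review, and each record is tested for whether its true location could plausibly fall inside the study area. The area, points, species names and 10 km threshold are all hypothetical, and coordinates are in metres (a projected CRS).

```python
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical study area in a projected CRS (metres)
area = box(0, 0, 20_000, 20_000)

# Hypothetical records; NaN means the location was not generalised
records = pd.DataFrame({
    "scientificName": ["Species A", "Species B", "Species C", "Species D"],
    "x": [5_000.0, 26_000.0, 45_000.0, 70_000.0],
    "y": [5_000.0, 10_000.0, 10_000.0, 10_000.0],
    "generalisationInMetres": [float("nan"), 10_000, 1_000, 50_000],
})

# treat non-generalised records as having 0 m of extra uncertainty
gen = records["generalisationInMetres"].fillna(0)

# flag heavily generalised records for manual review (threshold is a judgement call)
MAX_GENERALISATION_M = 10_000
records["needs_review"] = gen > MAX_GENERALISATION_M

# distance from each recorded point to the area (0 if the point is inside)
records["distance_m"] = [Point(x, y).distance(area) for x, y in zip(records["x"], records["y"])]

# under the upper-bound assumption, a record could truly lie in the area
# if it is no further away than its generalisation distance
records["could_be_inside"] = records["distance_m"] <= gen

print(records[["scientificName", "distance_m", "needs_review", "could_be_inside"]])
```

Here the hypothetical Species D sits 50 km from the area yet could still be inside it. That is exactly the kind of record worth checking by hand before adding it to a species list.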

Final thoughts

We hope this post has helped you understand how to draw buffers around a shape, as well as the importance of considering buffer size when identifying threatened species in an area. Ultimately, buffer size will depend on the question we are trying to answer, whether that's for research, monitoring, conservation or environmental impact assessment prior to development.

For other Python posts, check out our beginner’s guide to map species observations or see how to cross reference a species list with a conservation list.

Session info
-----
galah               0.12.1
geopandas           1.1.1
matplotlib          3.10.3
natsort             8.4.0
numpy               2.3.1
pandas              2.3.1
session_info        v1.0.1
shapely             2.1.1
-----
Python 3.13.7 (tags/v3.13.7:bcee1c3, Aug 14 2025, 14:15:11) [MSC v.1944 64 bit (AMD64)]
Windows-11-10.0.22631-SP0
-----
Session information updated at 2025-08-26 12:28

Thanks to Cameron Slatyer and Tania Laity for their helpful comments and suggestions for this post.

Footnotes

  1. Check out this section of a previous ALA Labs post for a more complete explanation of what a CRS is.↩︎

  2. ALA data use the geographic coordinate reference system EPSG:4326 (WGS 84, the same one used by Google Earth).↩︎

  3. By contrast, iNaturalist randomises their coordinate locations at a 30 km resolution prior to sharing data with the Atlas of Living Australia.↩︎

  4. Some records, if the species is very sensitive, may be withheld altogether.↩︎

  5. How “nearby” a point is depends on the degree of generalisation, so what we mean by “near” varies.↩︎

  6. Generalising data points is good because it makes it very difficult to know the specific location of an observation when access to this information should be restricted for some reason (e.g., endangered, subject to poaching or misuse, privacy concerns). However, it can also make data more difficult to work with because data points are shifted to new locations. In some cases, it can even cause data points to appear in the ocean despite the true location being on land!↩︎

  7. https://en.wikipedia.org/wiki/Stuttering_frog↩︎

  8. Downloading all occurrence records within a bounding box and then filtering them to fit a shapefile can be a quicker way to subset records than waiting for the API to process a more complex shapefile polygon. You can find a more in-depth article on this here.↩︎

  9. It's faster to use a bounding box than a polygon shape to download records: a box has fewer points than a complicated polygon shape, making for a far simpler query to process!↩︎