Show seasonal species trends using a ridgeline plot

Author

Date

10 May 2024

In ecology, it’s common to investigate trends across individuals, populations, species or taxonomic groups. Although it’s possible to use box plots and bar plots for this task, viewing many boxes or bars at once can become messy or crowded. These plots also display summary statistics, which can mask important variation in the data and exaggerate existing trends.

Ridgeline plots are one useful, fast type of visualisation for showing trends in ecological data like seasonality, diurnality and population growth or decline. They are especially useful for comparing a large number of individuals, species or groups because they display density curves rather than summary statistics. This means that variation in the data is visibly preserved.

Here, we use a ridgeline plot to quickly display the yearly seasonality of shorebirds (birds fond of tidal and estuary environments) in Pindanland, Western Australia, using the galah, ggplot2 and ggridges packages.

Let’s start by loading the R packages that we will need.

library(galah)
library(tidyverse)
library(ggridges)
library(pilot) #remotes::install_github("olihawkins/pilot")

We will use the galah package to download occurrence records from the Atlas of Living Australia (ALA). To do this, you’ll need to provide a registered email address and pass it to galah using galah_config().

galah_config(email = "your-email@email.com")

Download data

Shorebirds are a group with many highly migratory birds that travel large distances between breeding seasons. One species found in Pindanland, the Bar-tailed Godwit (below left), migrates once a year across the Pacific to Australia for food; that’s 11,000 kilometres non-stop!¹ As a result, shorebird abundance can fluctuate a lot in a given area depending on the time of year. Our goal is to display these changes in abundance by looking at how many shorebird observations there have been in total on each day of the year, grouped by species.

Left: Limosa lapponica (lozwoz88 CC-BY-NC 4.0 (Int)), Middle: Stiltia isabella (Steve Murray CC-BY-NC 4.0 (Int)), Right: Cladorhynchus leucocephalus (Blythe Nilson, iNaturalist CC-BY-NC 4.0 (Int))

Let’s download data of shorebirds in the order Charadriiformes.

We are interested in downloading data from Pindanland, a subregion of the IBRA bioregion Dampierland in Western Australia. To filter our data to only Pindanland, we’ll do a text search for any fields in galah that contain IBRA information.

search_all(fields, "ibra")

# A tibble: 3 × 3
  id     description       type  
  <chr>  <chr>             <chr> 
1 cl20   IBRA 6 Regions    fields
2 cl1048 IBRA 7 Regions    fields
3 cl1049 IBRA 7 Subregions fields

The field ID cl1049 appears to contain IBRA subregions. Let’s show what values are recorded in the IBRA 7 Subregions field to check.

search_all(fields, "cl1049") |> show_values()

• Showing values for 'cl1049'.

# A tibble: 419 × 1
   cl1049                            
   <chr>                             
 1 Gippsland Plain                   
 2 Murrumbateman                     
 3 Victorian Volcanic Plain          
 4 Burringbar-Conondale Ranges       
 5 Sunshine Coast-Gold Coast Lowlands
 6 Highlands-Southern Fall           
 7 Pittwater                         
 8 Otway Plain                       
 9 Cumberland                        
10 Moreton Basin                     
# ℹ 409 more rows

We can also search for “Pindanland” to double check for our subregion.

search_all(fields, "cl1049") |> 
  search_values("Pindanland")

• Showing values for 'cl1049'.

# A tibble: 1 × 1
  cl1049    
  <chr>     
1 Pindanland

Now we are able to download occurrence data of shorebirds in Pindanland by using field cl1049 in our query. We’ll further filter our query to return occurrences recorded after the year 2000 (i.e., from 2001 onwards), human observations (rather than museum specimens), and records identified to the species level. We’ll also use a set of ALA data cleaning filters (i.e. a data profile) by adding galah_apply_profile(ALA) to return fewer erroneous records. To shrink the amount of data we return, we’ll select only the eventDate and scientificName columns.

# download shorebird records
shorebirds <- galah_call() |>
  galah_identify("Charadriiformes") |> 
  galah_filter(cl1049 == "Pindanland",
               year > 2000,
               basisOfRecord == "HUMAN_OBSERVATION",
               taxonRank == "species") |>
  galah_apply_profile(ALA) |>
  galah_select(eventDate, scientificName) |>
  atlas_occurrences()
shorebirds

# A tibble: 67,280 × 2
   eventDate scientificName                      
   <dttm>    <chr>                               
 1 NA        Haematopus longirostris             
 2 NA        Calidris (Crocethia) alba           
 3 NA        Tringa (Glottis) nebularia          
 4 NA        Pluvialis squatarola                
 5 NA        Xenus cinereus                      
 6 NA        Charadrius (Eupoda) veredus         
 7 NA        Numenius (Numenius) madagascariensis
 8 NA        Arenaria interpres                  
 9 NA        Calidris (Calidris) falcinellus     
10 NA        Arenaria interpres                  
# ℹ 67,270 more rows

Prepare data

Now that we have our data, we need to prepare it for our plot. Remember that we are trying to visualise the total number of observations of each species on each day of the year.

To clean our data, we’ll remove records missing an eventDate. Then we can convert eventDate to a standard date class (yyyy-mm-dd), and extract the day of the year (Julian date)².

# format date
shorebirds_dates <- shorebirds |>
  drop_na(eventDate) |>
  mutate(
    eventDate = as_date(eventDate),
    date_julian = yday(eventDate)
  )

shorebirds_dates

# A tibble: 66,693 × 3
   eventDate  scientificName                      date_julian
   <date>     <chr>                                     <dbl>
 1 2001-01-01 Calidris (Calidris) canutus                   1
 2 2001-01-01 Calidris (Calidris) tenuirostris              1
 3 2001-01-03 Limosa lapponica                              3
 4 2001-01-03 Actitis hypoleucos                            3
 5 2001-01-06 Elseyornis melanops                           6
 6 2001-01-06 Chlidonias (Pelodes) hybrida                  6
 7 2001-01-06 Limosa limosa                                 6
 8 2001-01-06 Himantopus himantopus                         6
 9 2001-01-06 Chlidonias (Chlidonias) leucopterus           6
10 2001-01-06 Chlidonias (Chlidonias) leucopterus           6
# ℹ 66,683 more rows
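As a quick check of how leap years affect the day of the year, here is a minimal sketch using lubridate. The two dates are arbitrary examples chosen for illustration, not records from our download.

```r
library(lubridate)

# The same calendar date in a non-leap year and a leap year
d_2023 <- as_date("2023-03-01")
d_2024 <- as_date("2024-03-01")

yday(d_2023)  # 60
yday(d_2024)  # 61, because 2024 includes 29 February

# Dates up to 28 February are unaffected by leap years
same_before_march <- yday(as_date("2023-02-28")) == yday(as_date("2024-02-28"))
same_before_march  # TRUE
```

From March onwards, the same calendar date can differ by one day between years, which is a small shift relative to a whole-year seasonal trend.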

We then filter our data to only include species that were recorded on at least 10 distinct days of the year, which leaves us our final data frame ready for plotting.

shorebirds_filtered <- shorebirds_dates |>
  group_by(scientificName) |>
  filter(n_distinct(date_julian) >= 10)

shorebirds_filtered |> rmarkdown::paged_table()
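Note that thresholding by the number of records and thresholding by the number of distinct days of the year are not the same thing. The sketch below uses a small made-up data frame (`toy`, with hypothetical species "A" and "B") to show how the two approaches can keep different species.

```r
library(dplyr)

# Hypothetical toy data:
# species "A" has 12 records, but all on just 2 days of the year;
# species "B" has 10 records on 10 different days of the year
toy <- tibble(
  scientificName = c(rep("A", 12), rep("B", 10)),
  date_julian    = c(rep(c(5, 6), each = 6), 1:10)
)

# Keep species with more than 10 records in total
by_records <- toy |>
  group_by(scientificName) |>
  filter(n() > 10) |>
  ungroup()

# Keep species recorded on at least 10 distinct days of the year
by_days <- toy |>
  group_by(scientificName) |>
  filter(n_distinct(date_julian) >= 10) |>
  ungroup()

unique(by_records$scientificName)  # "A"
unique(by_days$scientificName)     # "B"
```

Filtering on distinct days tends to suit density curves better, because a species with many records on only a handful of days gives the density estimate very little to work with.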

Make ridgeline plot

We can now create a simple ridgeline plot for our data using geom_density_ridges().

ridge_plot <- ggplot(
  data = shorebirds_filtered,
  aes(x = date_julian,
      y = scientificName,
      fill = scientificName)) +  
  ggridges::geom_density_ridges(color = NA) +  
  theme_minimal() +
  theme(legend.position = "none")

ridge_plot

If we want to refine our plot, there are some extra things we can do to increase its readability.

For example, we can make the trends easier to interpret by ordering species by a summary statistic (e.g., mean). We ordered by the month with the highest proportion of observations³, which helps place birds with greater abundance at the end of the year towards the top, and birds with greater abundance at the beginning of the year at the bottom. We also adjusted the smoothness of our ridges to see more fine-scale variation in our data.

We can also adjust the colours and axis labels. We chose theme_pilot from the pilot package as it uses a colour-blind friendly palette.


# add month
shorebirds_filtered <- shorebirds_filtered |>
  mutate(
    month = month(eventDate, 
                  abbr = TRUE, 
                  label = TRUE),
    month_number = month(eventDate,
                         abbr = FALSE,
                         label = FALSE)
    )

# add month proportion column
shorebirds_filtered_prop <- shorebirds_filtered |>
  group_by(scientificName, month) |>
  summarise(n = n(), 
            .groups = "drop") |>
  group_by(scientificName) |>
  mutate(
    total = sum(n),
    prop = n/total * 100,
  ) |>
  left_join(shorebirds_filtered,
            join_by(scientificName == scientificName, 
                    month == month))

shorebirds_filtered_prop |>
  ggplot(
  aes(
    x = date_julian,
    y = fct_reorder(scientificName, prop*month_number),
    fill = fct_reorder(scientificName, prop*month_number), 
    colour = fct_reorder(scientificName, prop*month_number),
  )) +
  scale_x_continuous(
    breaks = c(1, 32, 60, 91, 121, 152, 
               182, 213, 244, 274, 305, 335),  # first day of each month (non-leap year)
    labels = c("Jan", "Feb", "Mar", "Apr", "May", "June", 
               "July", "Aug", "Sept", "Oct", "Nov", "Dec"), # set labels
    expand = c(0,0)) + 
  labs(x = "Month") +
  ggridges::geom_density_ridges(color = NA,
    bandwidth = 9,       # smoothness of the curve
    scale = 6,            # ridge height relative to spacing (values > 1 overlap)
    height = 0.05,        # ridge height
    alpha = .8,           # transparency
    rel_min_height = 0.02) +
  pilot::theme_pilot(grid = "v",  # grid lines 
                     axes = "") + # axis lines
  pilot::scale_fill_pilot() +
  theme(legend.position = "none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(size = 16),
        axis.text.x = element_text(size = 14))

Total daily observations of shorebirds from 2001–2024 in Pindanland, WA

Our plot shows that many birds arrive around September each year, and many species are observed most around November. We can also see that many species have a drop in observations from May to August (i.e., winter months).
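To see why ordering by proportion multiplied by month number places late-year species at the top, here is a minimal sketch on made-up data. The species names and the `score`, `n_month` and `n_total` columns are hypothetical and only for illustration. By default, `fct_reorder()` summarises each group with the median.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: one species recorded early in the year, one late
toy <- tibble(
  scientificName = c(rep("Species Z (early)", 4), rep("Species A (late)", 4)),
  month_number   = c(1, 1, 1, 2, 11, 12, 12, 12)
)

toy_scored <- toy |>
  add_count(scientificName, month_number, name = "n_month") |>  # records per species per month
  add_count(scientificName, name = "n_total") |>                # records per species
  mutate(
    prop  = n_month / n_total * 100,  # % of a species' records in that month
    score = prop * month_number       # weight the proportion by month number
  )

# Levels are sorted by the median score within each species (lowest first)
ordered_levels <- levels(fct_reorder(toy_scored$scientificName, toy_scored$score))
ordered_levels
```

The early species gets a low score and becomes the first factor level, which ggplot2 draws at the bottom of the y axis. The late species gets a high score and is drawn at the top.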

Final thoughts

And that’s it! Ridgeline plots are a simple and fast visualisation to use, and are a beautiful way to display ecological data.

Ridgeline plots do, however, have their limits. They don’t give an indication of how many observations there are of each bird species, or how they compare to each other, so they are mainly useful for displaying broader trends.

Keep in mind that ridgeline plots might need adjusting to visualise your data clearly. Compare our first quick plot and our second refined plot above, for example. The smoother ridges in the first plot mask variation in our data, only made clearer with less smooth ridges in the second plot. Balancing the shape of your ridges will help improve the transparency of your data visualisation.
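The effect of smoothing can be sketched with base R’s `density()` on simulated data. This assumes the `bandwidth` argument of `geom_density_ridges()` behaves like `bw` in `density()`. The simulated `days` values and the `density_at()` helper are hypothetical and only for illustration.

```r
set.seed(42)

# Hypothetical day-of-year values with two peaks, around days 250 and 310
days <- c(rnorm(300, mean = 250, sd = 8), rnorm(300, mean = 310, sd = 8))

# Helper (hypothetical): estimated density at chosen days for a given bandwidth
density_at <- function(x, bw, at) {
  d <- density(x, bw = bw, from = 1, to = 365, n = 512)
  approx(d$x, d$y, xout = at)$y
}

# Compare the two peaks (days 250 and 310) with the gap between them (day 280)
wide   <- density_at(days, bw = 40, at = c(250, 280, 310))
narrow <- density_at(days, bw = 5,  at = c(250, 280, 310))

round(wide / max(wide), 2)      # large bandwidth: the gap is filled in
round(narrow / max(narrow), 2)  # small bandwidth: the gap stays visible
```

With the large bandwidth, the two peaks merge into a single hump centred on the gap, whereas the small bandwidth keeps them separate. Very small bandwidths can instead produce noisy, spiky ridges, so it is worth trying a few values.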

If you want to make other transparent summary visualisations, check out this post on how to make beeswarm and raincloud plots.

Expand for session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2024-05-09
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.2)
 galah       * 2.0.2   2024-04-12 [1] CRAN (R 4.3.3)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 ggridges    * 0.5.4   2022-09-26 [1] CRAN (R 4.3.2)
 htmltools   * 0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
 lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.2)
 ozmaps      * 0.4.5   2021-08-03 [1] CRAN (R 4.3.2)
 pilot       * 4.0.0   2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
 purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
 readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.3.3)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.2)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
 tidyr       * 1.3.1   2024-01-24 [1] CRAN (R 4.3.3)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.2)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library

──────────────────────────────────────────────────────────────────────────────

Footnotes

¹ This is considered one of the longest continuous journeys by any bird in the world.
² Thanks to leap years, our Julian dates won’t be perfectly to the day, but they are good enough for a quick summary.
³ Specifically, we multiplied the proportion of records by month number (e.g., January = 1, December = 12) so that birds with lots of records at the end of the year return a high number, whereas birds with lots of records at the beginning of the year return a low number.