library(galah)
library(tidyverse)
library(ggridges)
library(pilot) #remotes::install_github("olihawkins/pilot")
In ecology, it’s common to investigate trends across individuals, populations, species or taxonomic groups. Although it’s possible to use box plots and bar plots for this task, viewing many boxes or bars at once can become messy or crowded. These plots also display summary statistics, which can mask important variation in the data and exaggerate existing trends.
Ridgeline plots are one useful, fast type of visualisation for showing trends in ecological data like seasonality, diurnality and population growth or decline. They are especially useful for comparing a large number of individuals, species or groups because they display density curves rather than summary statistics. This means that variation in the data is visibly preserved.
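To see how summary statistics can hide structure, here is a minimal sketch using made-up day-of-year values (both groups are invented for illustration and are not ALA data). The two groups share the same mean and median, so a box plot would centre them identically, yet their density curves look very different.

```r
# Two hypothetical groups of day-of-year observations (invented values)
single_peak <- 150:210             # observed steadily around mid-year
two_peaks   <- c(60:90, 270:300)   # observed in two separate pulses

# The summary statistics are identical...
c(mean = mean(single_peak), median = median(single_peak))
c(mean = mean(two_peaks),   median = median(two_peaks))

# ...but the kernel density estimates (the curves a ridgeline draws) are not
dens_single <- density(single_peak, bw = 9, from = 1, to = 366)
dens_two    <- density(two_peaks,   bw = 9, from = 1, to = 366)

# Estimated density at day 180, the shared mean and median
mid_single <- approx(dens_single$x, dens_single$y, xout = 180)$y
mid_two    <- approx(dens_two$x,    dens_two$y,    xout = 180)$y
c(single_peak = mid_single, two_peaks = mid_two)
```

The second group has almost no density at day 180, even though that is where its mean and median sit.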
Here, we use a ridgeline plot to quickly display the yearly seasonality of shorebirds (birds fond of tidal and estuary environments) in Pindanland, Western Australia, using the galah, ggplot2 and ggridges packages.
Let’s start by loading the R packages that we will need.
We will use the galah package to download occurrence records from the Atlas of Living Australia (ALA). To do this, you’ll need to provide a registered email address and pass it to galah using galah_config().
galah_config(email = "your-email@email.com")
Download data
Shorebirds are a group with many highly migratory birds that travel large distances between breeding seasons. One species found in Pindanland, the Bar-tailed Godwit, migrates once a year across the Pacific to Australia for food; that’s 11,000 kilometres non-stop!¹ As a result, shorebird abundance in a given area can fluctuate a lot depending on the time of year. Our goal is to display these changes in abundance by looking at the total number of shorebird observations on each day of the year, grouped by species.
Let’s download data of shorebirds in the order Charadriiformes.
We are interested in downloading data from Pindanland, a subregion of the IBRA bioregion Dampierland in Western Australia. To filter our data to only Pindanland, we’ll do a text search for any fields in galah that contain IBRA information.
search_all(fields, "ibra")
# A tibble: 3 × 3
id description type
<chr> <chr> <chr>
1 cl20 IBRA 6 Regions fields
2 cl1048 IBRA 7 Regions fields
3 cl1049 IBRA 7 Subregions fields
The field ID cl1049 appears to contain IBRA subregions. Let’s show what values are recorded in the IBRA 7 Subregions field to check.
search_all(fields, "cl1049") |> show_values()
• Showing values for 'cl1049'.
# A tibble: 419 × 1
cl1049
<chr>
1 Gippsland Plain
2 Murrumbateman
3 Victorian Volcanic Plain
4 Burringbar-Conondale Ranges
5 Sunshine Coast-Gold Coast Lowlands
6 Highlands-Southern Fall
7 Pittwater
8 Otway Plain
9 Cumberland
10 Moreton Basin
# ℹ 409 more rows
We can also search for “Pindanland” to double check for our subregion.
search_all(fields, "cl1049") |>
search_values("Pindanland")
• Showing values for 'cl1049'.
# A tibble: 1 × 1
cl1049
<chr>
1 Pindanland
Now we are able to download occurrence data of shorebirds in Pindanland by using field cl1049 in our query. We’ll further filter our query to return occurrences recorded after the year 2000, human observations (rather than museum specimens), and records identified to the species level. We’ll also use a set of ALA data cleaning filters (i.e. a data profile) by adding galah_apply_profile(ALA) to return fewer erroneous records. To shrink the amount of data we return, we’ll select only the columns eventDate and scientificName.
# download shorebird records
shorebirds <- galah_call() |>
  galah_identify("Charadriiformes") |>
  galah_filter(cl1049 == "Pindanland",
               year > 2000,
               basisOfRecord == "HUMAN_OBSERVATION",
               taxonRank == "species") |>
  galah_apply_profile(ALA) |>
  galah_select(eventDate, scientificName) |>
  atlas_occurrences()

shorebirds
# A tibble: 67,280 × 2
eventDate scientificName
<dttm> <chr>
1 NA Haematopus longirostris
2 NA Calidris (Crocethia) alba
3 NA Tringa (Glottis) nebularia
4 NA Pluvialis squatarola
5 NA Xenus cinereus
6 NA Charadrius (Eupoda) veredus
7 NA Numenius (Numenius) madagascariensis
8 NA Arenaria interpres
9 NA Calidris (Calidris) falcinellus
10 NA Arenaria interpres
# ℹ 67,270 more rows
Prepare data
Now that we have our data, we need to prepare it for our plot. Remember that we are trying to visualise total number of observations of each species each day of the year.
To clean our data, we’ll remove records missing an eventDate. Then we can convert eventDate to a standard date class (yyyy-mm-dd) and extract the day of the year (Julian date)².
# format date
shorebirds_dates <- shorebirds |>
  drop_na(eventDate) |>
  mutate(
    eventDate = as_date(eventDate),
    date_julian = yday(eventDate)
  )

shorebirds_dates
# A tibble: 66,693 × 3
eventDate scientificName date_julian
<date> <chr> <dbl>
1 2001-01-01 Calidris (Calidris) canutus 1
2 2001-01-01 Calidris (Calidris) tenuirostris 1
3 2001-01-03 Limosa lapponica 3
4 2001-01-03 Actitis hypoleucos 3
5 2001-01-06 Elseyornis melanops 6
6 2001-01-06 Chlidonias (Pelodes) hybrida 6
7 2001-01-06 Limosa limosa 6
8 2001-01-06 Himantopus himantopus 6
9 2001-01-06 Chlidonias (Chlidonias) leucopterus 6
10 2001-01-06 Chlidonias (Chlidonias) leucopterus 6
# ℹ 66,683 more rows
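As a small illustration of how yday() behaves, the sketch below uses a few dates chosen for this example rather than dates from the download. It shows the one-day shift that leap years introduce after 29 February. It also shows the day of the year on which each month starts in a non-leap year, which can be handy for placing month labels on an axis.

```r
library(lubridate)

# The same calendar date lands one day later in a leap year
yday(as_date("2023-03-01"))  # non-leap year: 60
yday(as_date("2024-03-01"))  # leap year: 61

# Day of the year on which each month starts (non-leap year)
month_starts <- yday(as_date("2023-01-01") + months(0:11))
month_starts
```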
We then filter our data to only include species observed on at least 10 distinct days of the year, which leaves us with our final data frame, ready for plotting.
shorebirds_filtered <- shorebirds_dates |>
  group_by(scientificName) |>
  filter(n_distinct(date_julian) >= 10)

shorebirds_filtered |> rmarkdown::paged_table()
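Because the filter uses n_distinct(date_julian), it counts distinct days of the year per species rather than raw records. The sketch below uses a tiny invented data set (the species names, values and the threshold of 3 are made up for illustration) to show how the two kinds of filter can differ.

```r
library(dplyr)

# Invented records: species name and day of the year observed
toy_records <- tibble(
  scientificName = c(rep("Species A", 4), rep("Species B", 4)),
  date_julian    = c(10, 10, 10, 10,    # four records, all on the same day
                     10, 45, 200, 310)  # four records on four different days
)

# Keep species seen on at least 3 distinct days of the year
by_distinct_days <- toy_records |>
  group_by(scientificName) |>
  filter(n_distinct(date_julian) >= 3) |>
  ungroup()

# For comparison: keep species with at least 3 records in total
by_record_count <- toy_records |>
  group_by(scientificName) |>
  filter(n() >= 3) |>
  ungroup()

unique(by_distinct_days$scientificName)
unique(by_record_count$scientificName)
```

Only Species B passes the distinct-days filter, whereas both species pass the record-count filter. A species recorded many times on one day contributes little to a seasonal density curve, which is one plausible reason to filter on distinct days.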
Make ridgeline plot
We can now create a simple ridgeline plot for our data using geom_density_ridges().
ridge_plot <- ggplot(
  data = shorebirds_filtered,
  aes(x = date_julian,
      y = scientificName,
      fill = scientificName)) +
  ggridges::geom_density_ridges(color = NA) +
  theme_minimal() +
  theme(legend.position = "none")

ridge_plot
If we want to refine our plot, there are some extra things we can do to increase its readability.
For example, we can make the trends easier to interpret by ordering species by a summary statistic (e.g., mean). We ordered species using the proportion of observations in each month³, which places birds with greater abundance at the end of the year towards the top and birds with greater abundance at the beginning of the year towards the bottom. We also adjusted the smoothness of our ridges to show more fine-scale variation in our data.
We can also adjust the colours and axis labels. We chose theme_pilot from the pilot package as it uses a colour-blind friendly palette.
# add month
shorebirds_filtered <- shorebirds_filtered |>
  mutate(
    month = month(eventDate,
                  abbr = TRUE,
                  label = TRUE),
    month_number = month(eventDate,
                         abbr = FALSE,
                         label = FALSE)
  )

# add month proportion column
shorebirds_filtered_prop <- shorebirds_filtered |>
  group_by(scientificName, month) |>
  summarise(n = n(),
            .groups = "drop") |>
  group_by(scientificName) |>
  mutate(
    total = sum(n),
    prop = n/total * 100
  ) |>
  left_join(shorebirds_filtered,
            join_by(scientificName == scientificName,
                    month == month))

shorebirds_filtered_prop |>
  ggplot(
    aes(
      x = date_julian,
      y = fct_reorder(scientificName, prop*month_number),
      fill = fct_reorder(scientificName, prop*month_number),
      colour = fct_reorder(scientificName, prop*month_number)
    )) +
  scale_x_continuous(
    breaks = c(1, 30, 60, 90, 120, 150,
               180, 210, 240, 270, 300, 330), # set numbers for labels
    labels = c("Jan", "Feb", "Mar", "Apr", "May", "June",
               "July", "Aug", "Sept", "Oct", "Nov", "Dec"), # set labels
    expand = c(0,0)) +
  labs(x = "Month") +
  ggridges::geom_density_ridges(color = NA,
                                bandwidth = 9,         # smoothness of the curve
                                scale = 6,             # ridge width
                                height = 0.05,         # ridge height
                                alpha = .8,            # transparency
                                rel_min_height = 0.02) +
  pilot::theme_pilot(grid = "v", # grid lines
                     axes = "") + # axis lines
  pilot::scale_fill_pilot() +
  theme(legend.position = "none",
        axis.title.y = element_blank(),
        axis.title.x = element_text(size = 16),
        axis.text.x = element_text(size = 14))
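The ordering used for the y axis, which multiplies each month's proportion of records by its month number, can be sketched on a toy example. The species names and records below are invented, and the calculation is simplified to one row per species and month rather than one row per record, so treat it as an illustration of the idea rather than a copy of the code above.

```r
library(dplyr)
library(forcats)

# Invented monthly records for two hypothetical species
toy <- tibble(
  species      = c(rep("A (late-year)", 4), rep("B (early-year)", 4)),
  month_number = c(10, 11, 11, 12,   1, 2, 2, 3)
)

# Percentage of each species' records per month, weighted by month number
toy_prop <- toy |>
  count(species, month_number) |>
  group_by(species) |>
  mutate(prop  = n / sum(n) * 100,
         score = prop * month_number) |>
  ungroup()

# fct_reorder() sorts factor levels by the median score within each species
ordered_species <- fct_reorder(toy_prop$species, toy_prop$score)
levels(ordered_species)
```

The early-year species becomes the first factor level, which ggplot2 draws at the bottom of a discrete y axis, and the late-year species ends up at the top.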
Our plot shows that many birds arrive around September each year, and many species are observed most around November. We can also see that many species have a drop in observations from May to August (i.e., winter months).
Final thoughts
And that’s it! Ridgeline plots are a simple and fast visualisation to use, and are a beautiful way to display ecological data.
Ridgeline plots do, however, have their limits. They don’t give an indication of how many observations there are of each bird species, or how they compare to each other, so they are mainly useful for displaying broader trends.
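One possible way to supply the missing information is to report record counts alongside the plot. The sketch below uses invented species and counts purely to show the shape of such a table. The column name n_records is a name chosen for this example.

```r
library(dplyr)

# Invented records for three hypothetical species
toy_records <- tibble(
  scientificName = c(rep("Species A", 50),
                     rep("Species B", 5),
                     rep("Species C", 500))
)

# Number of records behind each ridge, largest first
record_counts <- toy_records |>
  count(scientificName, sort = TRUE, name = "n_records")
record_counts
```

All three species would get a ridge of similar visual weight, even though one has a hundred times more records than another.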
Keep in mind that ridgeline plots might need adjusting to visualise your data clearly. Compare our first quick plot and our second refined plot above, for example. The smoother ridges in the first plot mask variation in our data, only made clearer with less smooth ridges in the second plot. Balancing the shape of your ridges will help improve the transparency of your data visualisation.
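The effect of smoothness can be sketched with base R's density(), assuming the ridges are Gaussian kernel density estimates of this kind. The observation days are invented, and gap_ratio() is a helper written for this example rather than part of any package.

```r
# Invented day-of-year values: two pulses of observations 30 days apart
obs_days <- c(rep(100, 50), rep(130, 50))

# Ratio of the density in the gap (day 115) to the density at a pulse (day 100).
# Values well below 1 mean the two pulses appear as separate peaks.
gap_ratio <- function(days, bandwidth) {
  dens <- density(days, bw = bandwidth, from = 1, to = 366)
  at <- approx(dens$x, dens$y, xout = c(100, 115))$y
  at[2] / at[1]
}

gap_ratio(obs_days, bandwidth = 5)   # narrow bandwidth: clear dip between pulses
gap_ratio(obs_days, bandwidth = 30)  # wide bandwidth: the dip is smoothed away
```

With the wide bandwidth, the curve is highest in the gap where no observations were made, so the two pulses merge into a single peak.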
If you want to make other transparent summary visualisations, check out this post on how to make beeswarm and raincloud plots.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2024-05-09
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2)
galah * 2.0.2 2024-04-12 [1] CRAN (R 4.3.3)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
ggridges * 0.5.4 2022-09-26 [1] CRAN (R 4.3.2)
htmltools * 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.2)
ozmaps * 0.4.5 2021-08-03 [1] CRAN (R 4.3.2)
pilot * 4.0.0 2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.3)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.3.3)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library
──────────────────────────────────────────────────────────────────────────────
Footnotes
¹ This is considered one of the longest continuous journeys by any bird in the world.
² Thanks to leap years, our Julian dates won’t align perfectly to the day, but they are good enough for a quick summary.
³ Specifically, we multiplied the proportion of records by month number (e.g., January = 1, December = 12) so that birds with lots of records at the end of the year return a high number, whereas birds with lots of records at the beginning of the year return a low number.