A majority of species observations in the Atlas of Living Australia are collected opportunistically, where people record observations incidentally rather than through an ongoing monitoring program.
However, whether an observation is recorded or not doesn’t just depend on the species. It might be rainy, it might be too hot, an area might be inaccessible; all of these factors can affect whether people make an observation.
The COVID-19 pandemic had a major impact on people’s health, behaviour and travel. In Australia, several lockdowns over 2020-2021 imposed restrictions on people’s movements, limiting them to certain activities near their homes. Melbourne experienced the longest continuous lockdown in the world.
To what extent did COVID-19 and lockdowns affect the number of species observations people made over that time? Here, we’ll use a highlighted time-series plot to investigate how lockdowns in Melbourne affected the observations of Anatidae (ducks, geese and swans), a taxonomic group frequently seen on walks and outdoor gatherings, compared to previous years.
Get data
We’ll start by downloading Anatidae records.
First, let’s load some packages:
# Load packages
library(galah)
library(tidyverse)
library(grid)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(ggtext)
library(showtext)
Let’s use the {galah} package to download Anatidae records in Melbourne from years before and during COVID-19.
Searching with search_all(fields)
shows us that {galah} contains Greater Capital City Statistical Areas, which we can use to filter our query.
search_all(fields, "city")
# A tibble: 1 × 3
id description type
<chr> <chr> <chr>
1 cl10929 PSMA ABS Greater Capital City Statistical Areas (2016) fields
search_all(fields, "city") |> search_values("melbourne")
• Showing values for 'cl10929'.
# A tibble: 1 × 1
cl10929
<chr>
1 GREATER MELBOURNE
Let’s build our query to return Anatidae records from GREATER MELBOURNE
and use galah_select()
to return only the eventDate
column.
You will need to first provide a registered email with the ALA using galah_config()
before retrieving records.
# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")
<-
birds galah_call() |>
galah_identify("Anatidae") |>
galah_filter(
== "GREATER MELBOURNE",
cl10929 >= 2017,
year <= 2021,
year == "HUMAN_OBSERVATION"
basisOfRecord |>
) galah_select(eventDate) |>
atlas_occurrences()
|> slice_sample(n = 10) birds
# A tibble: 10 × 2
recordID eventDate
<chr> <dttm>
1 1f2ecd45-a862-46c8-a3ac-629fd43ea95c 2018-02-17 13:54:00
2 d2752ab2-ec46-40f1-82f8-422969dc6e5c 2021-07-09 11:02:00
3 c2d69c66-24f3-4f35-9701-f1f26b514fec 2019-07-21 08:40:00
4 1a8b43ea-109f-453d-9432-b614b9c67424 2021-09-02 16:26:00
5 577740ce-701a-4389-aa02-97b62a485ca6 2020-09-18 10:20:00
6 6d122db3-1478-419e-9f6b-e31a41ab2488 2021-12-29 13:45:00
7 7749f3fb-db99-4339-ba45-e5a6eea9d1bf 2021-08-09 07:52:00
8 b45fdcb9-10d8-4aee-a528-8de6df94297d 2019-10-05 09:16:00
9 88a30420-0a28-4c99-a628-d06bd4d73e49 2019-11-16 08:31:00
10 e0e783fe-5183-4e07-b01a-1a7e373836ba 2018-06-09 12:24:00
We’ll then extract the week and year of each date and count the total observations for each week.
<- birds |>
birds_weekly mutate(date = as_date(eventDate),
year = year(eventDate),
week = week(eventDate)) |>
filter(year > 2016) |> # remove stray 2016 records
group_by(year, week) |>
summarise(week_obs = n())
birds_weekly
# A tibble: 265 × 3
# Groups: year [5]
year week week_obs
<dbl> <dbl> <int>
1 2017 1 831
2 2017 2 679
3 2017 3 725
4 2017 4 775
5 2017 5 767
6 2017 6 771
7 2017 7 651
8 2017 8 660
9 2017 9 703
10 2017 10 689
# ℹ 255 more rows
We want to compare observations recorded in 2020-2021 to previous years, but because we know that contributions to the ALA have increased each year, comparing raw numbers will be an unequal comparison and bias our results.
To avoid this, let’s scale our weekly record counts by the total number of Anatidae observations each year. Doing this let’s us compare proportions rather than raw numbers.
First let’s download the total Anatidae records for each year.
<-
birds_yearly galah_call() |>
galah_identify("Anatidae") |>
galah_filter(cl10929 == "GREATER MELBOURNE",
>= 2017, year <= 2021) |>
year galah_group_by(year) |>
atlas_counts() |>
rename(year_obs = count) |>
mutate(year = as.numeric(year)) |>
arrange(-desc(year))
birds_yearly
# A tibble: 5 × 2
year year_obs
<dbl> <int>
1 2017 39359
2 2018 50768
3 2019 48939
4 2020 49837
5 2021 58878
Now we’ll join birds_yearly
with birds_weekly
so we can calculate the proportion of records observed each week. We’ll do this by dividing each row’s weekly total by the yearly total.
<- birds_weekly |>
birds_prop left_join(birds_yearly) |>
rowwise() |>
mutate(prop = week_obs / year_obs,
.keep = "unused") |>
ungroup()
birds_prop
# A tibble: 265 × 3
year week prop
<dbl> <dbl> <dbl>
1 2017 1 0.0211
2 2017 2 0.0173
3 2017 3 0.0184
4 2017 4 0.0197
5 2017 5 0.0195
6 2017 6 0.0196
7 2017 7 0.0165
8 2017 8 0.0168
9 2017 9 0.0179
10 2017 10 0.0175
# ℹ 255 more rows
To compare observations in years prior to and during COVID-19, we’ll want to plot two lines:
- A baseline of average weekly observation counts in 2017-2019
- A line with weekly observation counts over 2020 and 2021
To create the average 2017-2019 baseline, let’s calculate the mean proportion of records each week from 2017-2019.
To do this, we’ll place our weekly proportions in separate columns using pivot_wider()
.
<- birds_prop |>
birds_wide pivot_wider(names_from = year,
values_from = prop,
names_sort = TRUE,
names_glue = "year_{year}")
birds_wide
# A tibble: 53 × 6
week year_2017 year_2018 year_2019 year_2020 year_2021
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.0211 0.0209 0.0338 0.0313 0.0225
2 2 0.0173 0.0178 0.0350 0.0228 0.0191
3 3 0.0184 0.0168 0.0258 0.0239 0.0201
4 4 0.0197 0.0158 0.0236 0.0244 0.0190
5 5 0.0195 0.0164 0.0201 0.0155 0.0184
6 6 0.0196 0.0137 0.0200 0.0180 0.0194
7 7 0.0165 0.0176 0.0190 0.0154 0.0168
8 8 0.0168 0.0138 0.0184 0.0153 0.0192
9 9 0.0179 0.0151 0.0144 0.0131 0.0186
10 10 0.0175 0.0132 0.0215 0.0115 0.0163
# ℹ 43 more rows
Then we’ll calculate the mean proportion of observations each week across year_2017
, year_2018
and year_2019
columns.
<- birds_wide |>
birds_mean_prop rowwise() |>
mutate(
mean_2017_19 = mean(c_across(year_2017:year_2019)),
.keep = "unused"
|>
) ungroup()
birds_mean_prop
# A tibble: 53 × 4
week year_2020 year_2021 mean_2017_19
<dbl> <dbl> <dbl> <dbl>
1 1 0.0313 0.0225 0.0253
2 2 0.0228 0.0191 0.0234
3 3 0.0239 0.0201 0.0203
4 4 0.0244 0.0190 0.0197
5 5 0.0155 0.0184 0.0187
6 6 0.0180 0.0194 0.0178
7 7 0.0154 0.0168 0.0177
8 8 0.0153 0.0192 0.0163
9 9 0.0131 0.0186 0.0158
10 10 0.0115 0.0163 0.0174
# ℹ 43 more rows
Now we have all the numbers we need for plotting! We just need to reorganise them so that they plot correctly.
Two columns in birds_mean_prop
contain proportional counts for 2020 and 2021. Although there are 52 weeks in a year, both years extend from weeks 1-53 because neither year started or ended exactly at the end of the week — 2020 ended on a Thursday and 2021 ended on a Friday.
wday(ymd("2020-12-31"), label = TRUE)
[1] Thu
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(ymd("2021-12-31"), label = TRUE)
[1] Fri
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
This means that the proportional counts in week 53 of 2020 and week 1 of 2021 are in the same week! We can combine them to better represent the full week’s observations and save the combined count in week 1 of 2021.
<- birds_mean_prop |>
birds_mean_prop rows_update(tibble(week = 1, year_2021 = sum(birds_mean_prop$year_2020[53] + birds_mean_prop$year_2021[1]))) |>
rows_update(tibble(week = 53, year_2020 = NA)) # remove 2020's week 53 count
|> slice_head(n = 3) birds_mean_prop
# A tibble: 3 × 4
week year_2020 year_2021 mean_2017_19
<dbl> <dbl> <dbl> <dbl>
1 1 0.0313 0.0290 0.0253
2 2 0.0228 0.0191 0.0234
3 3 0.0239 0.0201 0.0203
|> slice_tail(n = 3) birds_mean_prop
# A tibble: 3 × 4
week year_2020 year_2021 mean_2017_19
<dbl> <dbl> <dbl> <dbl>
1 51 0.0181 0.0145 0.0162
2 52 0.0216 0.0159 0.0181
3 53 NA 0.00124 0.00291
To allow us to plot proportional counts from Jan 2020 to Dec 2021 as one line (105 weeks total), we’ll separate our 2021 proportional counts, revise 2021 week numbers to start from 53, and place them under our 2020 proportions. That’ll let us plot from week 1 to week 105!
# 2021 record count proportions
<- birds_mean_prop |>
birds_2021 select(-year_2020) |>
rename(prop = year_2021) |>
mutate(week = week + 52)
glimpse(birds_2021)
Rows: 53
Columns: 3
$ week <dbl> 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 6…
$ prop <dbl> 0.02902850, 0.01912429, 0.02012636, 0.01897143, 0.0183769…
$ mean_2017_19 <dbl> 0.02528978, 0.02338032, 0.02031649, 0.01971618, 0.0186806…
# 2020 + 2021 record count proportions
<- birds_mean_prop |>
birds_final select(-year_2021) |>
drop_na() |>
rename(prop = year_2020) |>
bind_rows(birds_2021) # attach 2021 to the bottom
glimpse(birds_final)
Rows: 105
Columns: 3
$ week <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
$ prop <dbl> 0.031281979, 0.022754179, 0.023857776, 0.024439673, 0.015…
$ mean_2017_19 <dbl> 0.02528978, 0.02338032, 0.02031649, 0.01971618, 0.0186806…
Lockdowns
During the height of the pandemic, Melbourne had 6 distinct lockdowns. Let’s add their start and end dates to a tibble
.
Because we want to plot 2020 and 2021 on the same plot, we’ll use ifelse()
to make sure our week numbers in 2021 match our week numbers in birds_final
.
<- c(1:6)
n_lockdown <- c("2020-03-31", "2020-07-09",
start_date "2021-02-13", "2021-05-28",
"2021-07-16", "2021-08-05")
<- c("2020-05-12", "2020-10-27",
end_date "2021-02-17", "2021-06-10",
"2021-07-27", "2021-10-21")
<- tibble(n_lockdown, start_date, end_date) |>
lockdowns mutate(
n_days = as_date(ymd(end_date)) - as_date(ymd(start_date)),
week_start = ifelse(year(start_date) == 2020,
week(start_date), week(start_date) + 52),
week_end = ifelse(year(end_date) == 2020,
week(end_date), week(end_date) + 52),
) lockdowns
# A tibble: 6 × 6
n_lockdown start_date end_date n_days week_start week_end
<int> <chr> <chr> <drtn> <dbl> <dbl>
1 1 2020-03-31 2020-05-12 42 days 13 19
2 2 2020-07-09 2020-10-27 110 days 28 43
3 3 2021-02-13 2021-02-17 4 days 59 59
4 4 2021-05-28 2021-06-10 13 days 74 75
5 5 2021-07-16 2021-07-27 11 days 81 82
6 6 2021-08-05 2021-10-21 77 days 83 94
Make plot
To help us see the components of our final plot more clearly, let’s construct our visualisation step-by-step.
First, we’ll add our lockdown dates as highlighted rectangular blocks. To do this we can use geom_rect()
, setting the xmin
and xmax
values to our week_start
and week_end
columns in lockdowns
. We’ll make the rectangle spread across the entire plot by setting ymax = Inf
and ymin = 0
.
We’ll also set our fill
inside of aes()
and define its value within scale_fill_manual()
which will allow us to add the lockdown colour and label to its own legend.
<- ggplot() +
p1 geom_rect(data = lockdowns,
aes(xmin = week_start,
xmax = week_end,
fill = "Lockdown"),
ymin = 0,
ymax = Inf,
alpha=0.2) +
scale_fill_manual(
values = c("Lockdown" = pilot_color("yellow")))
p1
Next we’ll add our proportional species observation counts as lines. We can define their colours and edit the legend and axis labels, too.
<- p1 +
p2 # add lines
geom_line(data = birds_final,
aes(x = week, y = prop,
color = "2020-21 Records"),
linewidth = 1) +
geom_line(data = birds_final,
aes(x = week, y = mean_2017_19,
color = "2017-19 Average"),
linetype = "twodash",
linewidth = 0.8) +
# add fill
geom_area(data = birds_final,
aes(x = week, y = prop),
fill=pilot_color("blue"),
alpha=0.3) +
scale_color_manual(values = c(pilot_color("orange"),
pilot_color("blue")),
labels = c("2017-19 (average)",
"2020-21")) +
guides(fill = guide_legend(title = ""),
color = guide_legend(title = "Year")) +
labs(y = "Proportion of year's total observations",
x = "Month",
title = "Anatidae observations in Melbourne prior to & during COVID-19")
p2
The plot above is enough to see everything we need to see from our data. You could stop here if you wished, but we’ve gone ahead and made a more refined visualisation with nice fonts, axis scales, axis labels and titles!
Code
# add fonts
font_add_google("Montserrat", family = "mont")
font_add_google("Hind", family = "hind")
showtext_auto(enable = TRUE)
+
p2 # make axis scales understandable
scale_y_continuous(expand = c(0, 0),
limits = c(0, 0.035),
labels = scales::percent_format()) +
scale_x_continuous(expand = c(0, 0),
limits = c(1, 105),
breaks = c(1, 14, 27, 40, 52, 65, 78, 91),
labels = c("Jan", "Apr", "Jul", "Oct", "Jan", "Apr", "Jul", "Oct")) +
# add year x axis labels
coord_cartesian(clip = "off") +
annotate_pilot(label = "2020", x = 27, y = 0,
alpha = 0.7, vjust = 3.8,size = 10) +
annotate_pilot(label = "2021", x = 78, y = 0,
alpha = 0.7, vjust = 3.8, size = 10) +
labs(title = "*Anatidae* observations in Melbourne prior to & during COVID-19") +
theme_pilot(grid = "",
axes = "bl") +
theme(axis.title.x = element_text(size = 23, vjust = -1.3),
axis.title.y = element_text(size = 23),
axis.text.x = element_text(size = 20),
axis.text.y = element_text(size = 20),
axis.line = element_line(linewidth = 0.5),
legend.text = element_text(size = 23),
legend.title = element_text(size = 20),
plot.caption = element_text(size = 18),
text = element_text(family = "hind"),
plot.title = element_markdown(family = "mont", size = 31),
plot.subtitle = element_markdown(family = "mont", size = 28))
Now that we have our final plot, we can see a few interesting trends:
- In the first lockdown (soon after COVID-19 arrived in Australia), species observations were lower than the 2017-2019 average.
- In the 2 longest lockdowns, species observations were higher than the 2017-2019 average.
- In the last half of all lockdowns except the first lockdown, observations increased.
Are these trends that you expected to see?
It’s impossible to make any claims of why these trends emerged from our data visualisation alone, but we can speculate (for fun!)
Were people making fewer observations in the first lockdown because they were preoccupied with all the other priorities of settling into new working-from-home routines? Did people make more observations at the tail end of lockdowns because they were seeking relief from being inside by spending more time by natural ponds and lakes?
Some evidence from the US found more people were using natural spaces during COVID-19, and time in these spaces lowered depression and anxiety. A New Zealand study found similar results.
Final thoughts
This post has detailed how you can use ALA data to explore relationships between record counts and events. Although we can’t make any causal claims from what we see in our plot, making a time-series is a nice way to do some exploratory analysis on a lot of data!
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2024-02-12
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2)
galah * 2.0.1 2024-02-06 [1] CRAN (R 4.3.2)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
ggtext * 0.1.2 2022-09-16 [1] CRAN (R 4.3.2)
htmltools * 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.2)
pilot * 4.0.0 2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.2)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
showtext * 0.9-6 2023-05-03 [1] CRAN (R 4.3.2)
showtextdb * 3.0 2020-06-04 [1] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
sysfonts * 0.8.8 2022-03-13 [1] CRAN (R 4.3.2)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.2)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library
──────────────────────────────────────────────────────────────────────────────