Make a highlighted time-series plot

Time-series analyses can be handy for seeing trends over time, and exploring how trends relate to major events. Here, we show how to create an exploratory time-series plot comparing observations of waterbirds prior to and during the COVID-19 pandemic.

Eukaryota
Animalia
Aves
Summaries
Intern-post
Authors

Thai Rushbrook

Olivia Torresan

Dax Kellie

Published

April 3, 2023

Author

Thai Rushbrook
Olivia Torresan
Dax Kellie

Date

3 April 2023

Intern Post

A majority of species observations in the Atlas of Living Australia are collected opportunistically, where people record observations incidentally rather than through an ongoing monitoring program.

However, whether an observation is recorded or not doesn’t just depend on the species. It might be rainy, it might be too hot, an area might be inaccessible; all of these factors can affect whether people make an observation.

The COVID-19 pandemic had a major impact on people’s health, behaviour and travel. In Australia, several lockdowns over 2020-2021 imposed restrictions on people’s movements, limiting them to certain activities near their homes. Melbourne experienced the longest continuous lockdown in the world.

To what extent did COVID-19 and lockdowns affect the number of species observations people made over that time? Here, we’ll use a highlighted time-series plot to investigate how lockdowns in Melbourne affected the observations of Anatidae (ducks, geese and swans), a taxonomic group frequently seen on walks and outdoor gatherings, compared to previous years.

Get data

We’ll start by downloading Anatidae records.

First, let’s load some packages:

# Load packages
library(galah)
library(tidyverse)
library(grid)
library(pilot) # remotes::install_github("olihawkins/pilot")
library(ggtext)
library(showtext)

Let’s use the {galah} package to download Anatidae records in Melbourne from years before and during COVID-19.

Searching with search_all(fields) shows us that {galah} contains Greater Capital City Statistical Areas, which we can use to filter our query.

search_all(fields, "city")
# A tibble: 1 × 3
  id      description                                            type  
  <chr>   <chr>                                                  <chr> 
1 cl10929 PSMA ABS Greater Capital City Statistical Areas (2016) fields
search_all(fields, "city") |> search_values("melbourne")
• Showing values for 'cl10929'.
# A tibble: 1 × 1
  cl10929          
  <chr>            
1 GREATER MELBOURNE

Let’s build our query to return Anatidae records from GREATER MELBOURNE and use galah_select() to return only the eventDate column.

You will need to first provide a registered email with the ALA using galah_config() before retrieving records.

# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")
birds <-
  galah_call() |>
  galah_identify("Anatidae") |>
  galah_filter(
    cl10929 == "GREATER MELBOURNE",
    year >= 2017,
    year <= 2021,
    basisOfRecord == "HUMAN_OBSERVATION"
  ) |>
  galah_select(eventDate) |>
  atlas_occurrences()

birds |> slice_sample(n = 10)
# A tibble: 10 × 2
   recordID                             eventDate          
   <chr>                                <dttm>             
 1 1f2ecd45-a862-46c8-a3ac-629fd43ea95c 2018-02-17 13:54:00
 2 d2752ab2-ec46-40f1-82f8-422969dc6e5c 2021-07-09 11:02:00
 3 c2d69c66-24f3-4f35-9701-f1f26b514fec 2019-07-21 08:40:00
 4 1a8b43ea-109f-453d-9432-b614b9c67424 2021-09-02 16:26:00
 5 577740ce-701a-4389-aa02-97b62a485ca6 2020-09-18 10:20:00
 6 6d122db3-1478-419e-9f6b-e31a41ab2488 2021-12-29 13:45:00
 7 7749f3fb-db99-4339-ba45-e5a6eea9d1bf 2021-08-09 07:52:00
 8 b45fdcb9-10d8-4aee-a528-8de6df94297d 2019-10-05 09:16:00
 9 88a30420-0a28-4c99-a628-d06bd4d73e49 2019-11-16 08:31:00
10 e0e783fe-5183-4e07-b01a-1a7e373836ba 2018-06-09 12:24:00

We’ll then extract the week and year of each date and count the total observations for each week.

birds_weekly <- birds |> 
  mutate(date = as_date(eventDate),
         year = year(eventDate),
         week = week(eventDate)) |>
  filter(year > 2016) |> # remove stray 2016 records
  group_by(year, week) |>
  summarise(week_obs = n())

birds_weekly 
# A tibble: 265 × 3
# Groups:   year [5]
    year  week week_obs
   <dbl> <dbl>    <int>
 1  2017     1      831
 2  2017     2      679
 3  2017     3      725
 4  2017     4      775
 5  2017     5      767
 6  2017     6      771
 7  2017     7      651
 8  2017     8      660
 9  2017     9      703
10  2017    10      689
# ℹ 255 more rows

We want to compare observations recorded in 2020-2021 to previous years, but because we know that contributions to the ALA have increased each year, comparing raw numbers will be an unequal comparison and bias our results.

To avoid this, let’s scale our weekly record counts by the total number of Anatidae observations each year. Doing this let’s us compare proportions rather than raw numbers.

First let’s download the total Anatidae records for each year.

birds_yearly <- 
  galah_call() |>    
  galah_identify("Anatidae") |> 
  galah_filter(cl10929 == "GREATER MELBOURNE", 
               year >= 2017, year <= 2021) |> 
  galah_group_by(year) |>
  atlas_counts() |>
  rename(year_obs = count) |>
  mutate(year = as.numeric(year)) |>
  arrange(-desc(year))
  
birds_yearly
# A tibble: 5 × 2
   year year_obs
  <dbl>    <int>
1  2017    39359
2  2018    50768
3  2019    48939
4  2020    49837
5  2021    58878

Now we’ll join birds_yearly with birds_weekly so we can calculate the proportion of records observed each week. We’ll do this by dividing each row’s weekly total by the yearly total.

birds_prop <- birds_weekly |> 
  left_join(birds_yearly) |> 
  rowwise() |> 
  mutate(prop = week_obs / year_obs,
         .keep = "unused") |> 
  ungroup()

birds_prop
# A tibble: 265 × 3
    year  week   prop
   <dbl> <dbl>  <dbl>
 1  2017     1 0.0211
 2  2017     2 0.0173
 3  2017     3 0.0184
 4  2017     4 0.0197
 5  2017     5 0.0195
 6  2017     6 0.0196
 7  2017     7 0.0165
 8  2017     8 0.0168
 9  2017     9 0.0179
10  2017    10 0.0175
# ℹ 255 more rows

To compare observations in years prior to and during COVID-19, we’ll want to plot two lines:

  1. A baseline of average weekly observation counts in 2017-2019
  2. A line with weekly observation counts over 2020 and 2021

To create the average 2017-2019 baseline, let’s calculate the mean proportion of records each week from 2017-2019.

To do this, we’ll place our weekly proportions in separate columns using pivot_wider().

birds_wide <- birds_prop |>
  pivot_wider(names_from = year, 
              values_from = prop, 
              names_sort = TRUE,
              names_glue = "year_{year}")

birds_wide
# A tibble: 53 × 6
    week year_2017 year_2018 year_2019 year_2020 year_2021
   <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
 1     1    0.0211    0.0209    0.0338    0.0313    0.0225
 2     2    0.0173    0.0178    0.0350    0.0228    0.0191
 3     3    0.0184    0.0168    0.0258    0.0239    0.0201
 4     4    0.0197    0.0158    0.0236    0.0244    0.0190
 5     5    0.0195    0.0164    0.0201    0.0155    0.0184
 6     6    0.0196    0.0137    0.0200    0.0180    0.0194
 7     7    0.0165    0.0176    0.0190    0.0154    0.0168
 8     8    0.0168    0.0138    0.0184    0.0153    0.0192
 9     9    0.0179    0.0151    0.0144    0.0131    0.0186
10    10    0.0175    0.0132    0.0215    0.0115    0.0163
# ℹ 43 more rows

Then we’ll calculate the mean proportion of observations each week across year_2017, year_2018 and year_2019 columns.

birds_mean_prop <- birds_wide |>
  rowwise() |>
  mutate(
    mean_2017_19 = mean(c_across(year_2017:year_2019)),
    .keep = "unused"
    ) |>
  ungroup()

birds_mean_prop
# A tibble: 53 × 4
    week year_2020 year_2021 mean_2017_19
   <dbl>     <dbl>     <dbl>        <dbl>
 1     1    0.0313    0.0225       0.0253
 2     2    0.0228    0.0191       0.0234
 3     3    0.0239    0.0201       0.0203
 4     4    0.0244    0.0190       0.0197
 5     5    0.0155    0.0184       0.0187
 6     6    0.0180    0.0194       0.0178
 7     7    0.0154    0.0168       0.0177
 8     8    0.0153    0.0192       0.0163
 9     9    0.0131    0.0186       0.0158
10    10    0.0115    0.0163       0.0174
# ℹ 43 more rows

Now we have all the numbers we need for plotting! We just need to reorganise them so that they plot correctly.

Two columns in birds_mean_prop contain proportional counts for 2020 and 2021. Although there are 52 weeks in a year, both years extend from weeks 1-53 because neither year started or ended exactly at the end of the week — 2020 ended on a Thursday and 2021 ended on a Friday.

wday(ymd("2020-12-31"), label = TRUE)
[1] Thu
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(ymd("2021-12-31"), label = TRUE)
[1] Fri
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

This means that the proportional counts in week 53 of 2020 and week 1 of 2021 are in the same week! We can combine them to better represent the full week’s observations and save the combined count in week 1 of 2021.

birds_mean_prop <- birds_mean_prop |>
  rows_update(tibble(week = 1, year_2021 = sum(birds_mean_prop$year_2020[53] + birds_mean_prop$year_2021[1]))) |>
  rows_update(tibble(week = 53, year_2020 = NA)) # remove 2020's week 53 count

birds_mean_prop |> slice_head(n = 3)
# A tibble: 3 × 4
   week year_2020 year_2021 mean_2017_19
  <dbl>     <dbl>     <dbl>        <dbl>
1     1    0.0313    0.0290       0.0253
2     2    0.0228    0.0191       0.0234
3     3    0.0239    0.0201       0.0203
birds_mean_prop |> slice_tail(n = 3)
# A tibble: 3 × 4
   week year_2020 year_2021 mean_2017_19
  <dbl>     <dbl>     <dbl>        <dbl>
1    51    0.0181   0.0145       0.0162 
2    52    0.0216   0.0159       0.0181 
3    53   NA        0.00124      0.00291

To allow us to plot proportional counts from Jan 2020 to Dec 2021 as one line (105 weeks total), we’ll separate our 2021 proportional counts, revise 2021 week numbers to start from 53, and place them under our 2020 proportions. That’ll let us plot from week 1 to week 105!

# 2021 record count proportions
birds_2021 <- birds_mean_prop |>
  select(-year_2020) |>
  rename(prop = year_2021) |>
  mutate(week = week + 52)

glimpse(birds_2021)
Rows: 53
Columns: 3
$ week         <dbl> 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 6…
$ prop         <dbl> 0.02902850, 0.01912429, 0.02012636, 0.01897143, 0.0183769…
$ mean_2017_19 <dbl> 0.02528978, 0.02338032, 0.02031649, 0.01971618, 0.0186806…
# 2020 + 2021 record count proportions
birds_final <- birds_mean_prop |>
  select(-year_2021) |>
  drop_na() |>
  rename(prop = year_2020) |>
  bind_rows(birds_2021) # attach 2021 to the bottom

glimpse(birds_final)
Rows: 105
Columns: 3
$ week         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
$ prop         <dbl> 0.031281979, 0.022754179, 0.023857776, 0.024439673, 0.015…
$ mean_2017_19 <dbl> 0.02528978, 0.02338032, 0.02031649, 0.01971618, 0.0186806…

Lockdowns

During the height of the pandemic, Melbourne had 6 distinct lockdowns. Let’s add their start and end dates to a tibble.

Because we want to plot 2020 and 2021 on the same plot, we’ll use ifelse() to make sure our week numbers in 2021 match our week numbers in birds_final.

n_lockdown <- c(1:6)
start_date <- c("2020-03-31", "2020-07-09",
                "2021-02-13", "2021-05-28",
                "2021-07-16", "2021-08-05")
end_date <- c("2020-05-12", "2020-10-27",
              "2021-02-17", "2021-06-10",
              "2021-07-27", "2021-10-21")

lockdowns <- tibble(n_lockdown, start_date, end_date) |>
  mutate(
    n_days = as_date(ymd(end_date)) - as_date(ymd(start_date)),
    week_start = ifelse(year(start_date) == 2020, 
                        week(start_date), week(start_date) + 52),
    week_end = ifelse(year(end_date) == 2020, 
                      week(end_date), week(end_date) + 52),
    )
lockdowns 
# A tibble: 6 × 6
  n_lockdown start_date end_date   n_days   week_start week_end
       <int> <chr>      <chr>      <drtn>        <dbl>    <dbl>
1          1 2020-03-31 2020-05-12  42 days         13       19
2          2 2020-07-09 2020-10-27 110 days         28       43
3          3 2021-02-13 2021-02-17   4 days         59       59
4          4 2021-05-28 2021-06-10  13 days         74       75
5          5 2021-07-16 2021-07-27  11 days         81       82
6          6 2021-08-05 2021-10-21  77 days         83       94

Make plot

To help us see the components of our final plot more clearly, let’s construct our visualisation step-by-step.

First, we’ll add our lockdown dates as highlighted rectangular blocks. To do this we can use geom_rect(), setting the xmin and xmax values to our week_start and week_end columns in lockdowns. We’ll make the rectangle spread across the entire plot by setting ymax = Inf and ymin = 0.

We’ll also set our fill inside of aes() and define its value within scale_fill_manual() which will allow us to add the lockdown colour and label to its own legend.

p1 <- ggplot() +
  geom_rect(data = lockdowns,
            aes(xmin = week_start,
                xmax = week_end,
                fill = "Lockdown"),
            ymin = 0,
            ymax = Inf,
            alpha=0.2) +
  scale_fill_manual(
    values = c("Lockdown" = pilot_color("yellow")))
p1

Next we’ll add our proportional species observation counts as lines. We can define their colours and edit the legend and axis labels, too.

p2 <- p1 +
  # add lines
  geom_line(data = birds_final, 
            aes(x = week, y = prop, 
            color = "2020-21 Records"), 
            linewidth = 1) + 
  geom_line(data = birds_final, 
            aes(x = week, y = mean_2017_19, 
            color = "2017-19 Average"),
            linetype = "twodash", 
            linewidth = 0.8) + 
  # add fill
  geom_area(data = birds_final, 
            aes(x = week, y = prop),
            fill=pilot_color("blue"), 
            alpha=0.3) + 
  scale_color_manual(values = c(pilot_color("orange"),
                                pilot_color("blue")), 
                     labels = c("2017-19 (average)", 
                                "2020-21")) +
  guides(fill = guide_legend(title = ""), 
         color = guide_legend(title = "Year")) +
  labs(y = "Proportion of year's total observations",
       x = "Month",
       title = "Anatidae observations in Melbourne prior to & during COVID-19")
p2

The plot above is enough to see everything we need to see from our data. You could stop here if you wished, but we’ve gone ahead and made a more refined visualisation with nice fonts, axis scales, axis labels and titles!

Code
# add fonts
font_add_google("Montserrat", family = "mont")
font_add_google("Hind", family = "hind")  
showtext_auto(enable = TRUE)

p2 + 
  # make axis scales understandable
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 0.035),
                     labels = scales::percent_format()) +
  scale_x_continuous(expand = c(0, 0),
                     limits = c(1, 105),
                     breaks = c(1, 14, 27, 40, 52, 65, 78, 91),
                     labels = c("Jan", "Apr", "Jul", "Oct", "Jan", "Apr", "Jul", "Oct")) +
  # add year x axis labels
  coord_cartesian(clip = "off") +
  annotate_pilot(label = "2020", x = 27, y = 0, 
                 alpha = 0.7, vjust = 3.8,size = 10) +
  annotate_pilot(label = "2021", x = 78, y = 0, 
                 alpha = 0.7, vjust = 3.8, size = 10) +
  labs(title = "*Anatidae* observations in Melbourne prior to & during COVID-19") +
  theme_pilot(grid = "",
              axes = "bl") +
  theme(axis.title.x = element_text(size = 23, vjust = -1.3),
        axis.title.y = element_text(size = 23),
        axis.text.x = element_text(size = 20),
        axis.text.y = element_text(size = 20),
        axis.line = element_line(linewidth = 0.5),
        legend.text = element_text(size = 23),
        legend.title = element_text(size = 20),
        plot.caption = element_text(size = 18),
        text = element_text(family = "hind"),
        plot.title = element_markdown(family = "mont", size = 31),
        plot.subtitle = element_markdown(family = "mont", size = 28))

Now that we have our final plot, we can see a few interesting trends:

  1. In the first lockdown (soon after COVID-19 arrived in Australia), species observations were lower than the 2017-2019 average.
  2. In the 2 longest lockdowns, species observations were higher than the 2017-2019 average.
  3. In the last half of all lockdowns except the first lockdown, observations increased.

Are these trends that you expected to see?

It’s impossible to make any claims of why these trends emerged from our data visualisation alone, but we can speculate (for fun!)

Were people making fewer observations in the first lockdown because they were preoccupied with all the other priorities of settling into new working-from-home routines? Did people make more observations at the tail end of lockdowns because they were seeking relief from being inside by spending more time by natural ponds and lakes?

Some evidence from the US found more people were using natural spaces during COVID-19, and time in these spaces lowered depression and anxiety. A New Zealand study found similar results.

Final thoughts

This post has detailed how you can use ALA data to explore relationships between record counts and events. Although we can’t make any causal claims from what we see in our plot, making a time-series is a nice way to do some exploratory analysis on a lot of data!

Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2024-02-12
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.2)
 galah       * 2.0.1   2024-02-06 [1] CRAN (R 4.3.2)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 ggtext      * 0.1.2   2022-09-16 [1] CRAN (R 4.3.2)
 htmltools   * 0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
 lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.3.2)
 pilot       * 4.0.0   2022-07-13 [1] Github (olihawkins/pilot@f08cc16)
 purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.2)
 readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.2)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 showtext    * 0.9-6   2023-05-03 [1] CRAN (R 4.3.2)
 showtextdb  * 3.0     2020-06-04 [1] CRAN (R 4.3.2)
 stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.3.2)
 sysfonts    * 0.8.8   2022-03-13 [1] CRAN (R 4.3.2)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.2)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.2)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library

──────────────────────────────────────────────────────────────────────────────