Author

Matilda Stevenson
Dax Kellie
Martin Westgate

Date

March 2021

Note

Article updated 6 February, 2023. Updates streamline code, and provide more examples of output after each step. More in-text detail has also been added about what is happening at each step.

The Atlas of Living Australia (ALA) holds records of magpie sightings from a number data providers like iNaturalist, eBird and BirdLife Australia. Let’s make a visualisation of Australian Bird of the Year 2018 winner, Magpies, using records held in the ALA.

Getting species occurrences

As with any R project, a good first step is to load the required packages.

# packages
library(ggplot2)
library(tidyr)
library(dplyr)
library(ozmaps)
library(sf)
library(hexbin)

We will use the {galah} package to download records.

To download species occurrence records, the {galah} package requires you to add an email registered with the ALA to galah_config(). If running this code yourself, you will need to add an email using the code below, substituting your email with myemail@email.com. This email address should be registered with the ALA, which you can do here

library(galah)
galah_config(email = "myemail@email.com")

Now we can download magpie occurrence records by using atlas_occurrences(). Note that we also set our data ‘profile’ to ‘ALA’; this means we only download records that meet some basic data quality standards enforced by the atlas. This is optional, but tends to improve the quality of the data returned. (If you wish to see the data quality filters applied in the ALA profile, use search_all(profiles, "ALA") |> show_values())

magpie_occ <- galah_call() %>%
  galah_identify("Cracticus tibicen") %>%
  galah_apply_profile(ALA) %>%
  atlas_occurrences()

Retrying in 1 seconds.
Retrying in 2 seconds.
Retrying in 4 seconds.
Retrying in 8 seconds.
Retrying in 16 seconds.
Retrying in 32 seconds.
Retrying in 60 seconds.

Let’s have a look at the first few rows of the data we’ve just downloaded:

magpie_occ %>% head()

# A tibble: 6 × 8
  recordID        scientificName taxonConceptID decimalLatitude decimalLongitude
  <chr>           <chr>          <chr>                    <dbl>            <dbl>
1 0000032b-8690-… Gymnorhina ti… https://biodi…           -37.0             145.
2 0000150d-489d-… Gymnorhina ti… https://biodi…           -37.9             145.
3 00001736-1fd8-… Gymnorhina ti… https://biodi…           -34.8             151.
4 00001e12-29cf-… Gymnorhina ti… https://biodi…           -39.6             144.
5 00001f60-0371-… Gymnorhina ti… https://biodi…           -35.2             149.
6 000035cd-c3a1-… Gymnorhina ti… https://biodi…           -37.6             144.
# ℹ 3 more variables: eventDate <dttm>, occurrenceStatus <chr>,
#   dataResourceName <chr>

For the purpose of this exercise, we’re going to filter records not on the mainland or Tasmania.

filtered_occ <- magpie_occ %>% filter(decimalLongitude < 155,
                                      decimalLongitude > 110,
                                      decimalLatitude > -45,
                                      decimalLatitude < -10)

Plotting binned data

The easiest way to create a hex map is using the hexbin package. However, because there are some areas that have many more observations than other areas, without standardising our data the result is not very useful.

ggplot() +
  geom_hex(data = filtered_occ,
           mapping = aes(x = decimalLongitude, 
                         y = decimalLatitude), 
           bins = 47, 
           colour = "white") +
  coord_sf(ylim = c(-45, -10), 
           xlim = c(110, 155)) +
  scale_fill_gradientn(colours = c("#EEECEA", "#E06E53")) +
  theme_void()

To make a more informative hex map, in this case it might be useful to try to create our hexagons manually. We can do this by creating a grid of hexagons, filtering the grid to the outline of Australia, and adding our data of magpie counts to set the fill color of those hexagons.

To achieve this, we can first convert the map of Australia provided by ozmaps to the same coordinate system as ALA data.

aus <- st_transform(ozmaps::ozmap_country, 4326)

Next we’ll create a grid of hexagons.

grid_all <- st_make_grid(aus, 
                         cellsize = 1, 
                         what = "polygons", 
                         square = FALSE,
                         flat_topped = TRUE)

ggplot() +
  geom_sf(data = grid_all)

Now we’ll extract all the hexagons in our full grid that intersect our map of Australia, and filter our grid to only include those hexagons by only keeping the hexagon rows that are returned after running st_intersects().

# extract rows that are within AUS land
keep_hexes <- st_intersects(grid_all, aus) %>%
  as.data.frame(.) %>%
  pull(row.id)

# filter full grid to only hexagon IDs in AUS
oz_grid <- grid_all[keep_hexes]

ggplot() + geom_sf(data = oz_grid)

Now to figure out how many magpie observations are within each hexagon. To do this, first we’ll convert our magpie observation points to an sf spatial object and make sure the point projection is the same as our map of Australia. Then we can use st_intersects() again to return a list, where each data.frame within the list shows which hexagon ID each point is within.

magpie_points_sf <- filtered_occ %>% 
  st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), 
  crs = st_crs(4326))

intersect <- st_intersects(magpie_points_sf, oz_grid)

intersect[5:10]

[[1]]
[1] 69

[[2]]
[1] 43

[[3]]
[1] 414

[[4]]
[1] 31

[[5]]
[1] 185

[[6]]
[1] 67

With all points in their own separate data.frame, we can use the wicked-fast table() function from base R to count how many points match each hexagon ID, giving us our point counts! A little renaming and wrangling helps to get our counts in the right format.

# condense counts into tibble
counts <- as_tibble(table(unlist(intersect)), 
          .name_repair = "unique") %>%
  rename("hex_id" = 1,
         "count" = 2) %>%
  mutate(hex_id = as.integer(hex_id)) %>%
  replace_na(list(count = 0))

We’ll add our count column from complete_counts to our oz_grid, along with an id column containing the row number. This column will act as a reference column to join with complete_counts. Then we’ll also make sure that oz_grid is an sf object for plotting.

oz_grid <- oz_grid %>%
  as_tibble() %>%
  mutate(id = row_number()) %>%
  full_join(counts,
            by = join_by(id == hex_id)) %>%
  st_as_sf()

oz_grid |> head()

Simple feature collection with 6 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 144.8105 ymin: -44.13203 xmax: 148.5632 ymax: -41.63203
Geodetic CRS:  WGS 84
# A tibble: 6 × 3
                                                            geometry    id count
                                                       <POLYGON [°]> <int> <int>
1 ((146.5425 -43.63203, 146.8312 -44.13203, 147.4085 -44.13203, 147…     1   591
2 ((145.6765 -43.13203, 145.9652 -43.63203, 146.5425 -43.63203, 146…     2     9
3 ((147.4085 -43.13203, 147.6972 -43.63203, 148.2746 -43.63203, 148…     3  1602
4 ((144.8105 -42.63203, 145.0991 -43.13203, 145.6765 -43.13203, 145…     4     6
5 ((146.5425 -42.63203, 146.8312 -43.13203, 147.4085 -43.13203, 147…     5  9692
6 ((145.6765 -42.13203, 145.9652 -42.63203, 146.5425 -42.63203, 146…     6    71

Finally, let’s build our map! We’ll use scale_fill_gradientn() to add a nice legend, and standardise our data using a log-transformation so that the colours on our map are scaled to be more informative.

ggplot() +
  geom_sf(data = oz_grid, aes(fill = count), size = .01) +
  scale_fill_gradientn(colours = c("#EEECEA", "#E06E53"), 
                       na.value = "white", 
                       trans = "log10",
                       labels = scales::comma_format(),
                       n.breaks = 6,
                       guide = guide_colourbar(title = "Observations")) +
  coord_sf(ylim = c(-45, -10), 
           xlim = c(110, 155)) +
  theme_void()

That’s it! All the extra work does make a difference in this case, providing a better representation of the spread of Mapgies across Australia. Manually constructing hex maps can be useful in other circumstances, too. For example, if we wanted to compare the number of magpies to contextual information within each polygon (such as rainfall or human population data), then manually constructing our own hexagons could help us to combine data from different sources.

A final point is that we could have achieved the same result by creating polygons first, then querying the ALA for the number of magpie records in each polygon using galah_geolocate(). That’s a bit more challenging, and not worthwhile in this case; but it can be an efficient solution where you require information on more species than there are polygons, for example. You can learn how to do this in this ALA Labs article, if you are interested to learn how!

Expand for session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2024-02-12
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.3.2)
 galah       * 2.0.1   2024-02-06 [1] CRAN (R 4.3.2)
 ggplot2     * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 hexbin      * 1.28.3  2023-03-21 [1] CRAN (R 4.3.2)
 htmltools   * 0.5.7   2023-11-03 [1] CRAN (R 4.3.2)
 ozmaps      * 0.4.5   2021-08-03 [1] CRAN (R 4.3.2)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 sf          * 1.0-14  2023-07-11 [1] CRAN (R 4.3.2)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.2)

 [1] C:/Users/KEL329/R-packages
 [2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library

──────────────────────────────────────────────────────────────────────────────