Article updated 6 February, 2023. Updates streamline code, and provide more examples of output after each step. More in-text detail has also been added about what is happening at each step.
The Atlas of Living Australia (ALA) holds records of magpie sightings from a number data providers like iNaturalist, eBird and BirdLife Australia. Let’s make a visualisation of Australian Bird of the Year 2018 winner, Magpies, using records held in the ALA.
Getting species occurrences
As with any R project, a good first step is to load the required packages.
# packages
library(ggplot2)
library(tidyr)
library(dplyr)
library(ozmaps)
library(sf)
library(hexbin)
We will use the {galah} package to download records.
To download species occurrence records, the {galah} package requires you to add an email registered with the ALA to galah_config()
. If running this code yourself, you will need to add an email using the code below, substituting your email with myemail@email.com
. This email address should be registered with the ALA, which you can do here
library(galah)
galah_config(email = "myemail@email.com")
Now we can download magpie occurrence records by using atlas_occurrences()
. Note that we also set our data ‘profile’ to ‘ALA’; this means we only download records that meet some basic data quality standards enforced by the atlas. This is optional, but tends to improve the quality of the data returned. (If you wish to see the data quality filters applied in the ALA profile, use search_all(profiles, "ALA") |> show_values()
)
<- galah_call() %>%
magpie_occ galah_identify("Cracticus tibicen") %>%
galah_apply_profile(ALA) %>%
atlas_occurrences()
Retrying in 1 seconds.
Retrying in 2 seconds.
Retrying in 4 seconds.
Retrying in 8 seconds.
Retrying in 16 seconds.
Retrying in 32 seconds.
Retrying in 60 seconds.
Let’s have a look at the first few rows of the data we’ve just downloaded:
%>% head() magpie_occ
# A tibble: 6 × 8
recordID scientificName taxonConceptID decimalLatitude decimalLongitude
<chr> <chr> <chr> <dbl> <dbl>
1 0000032b-8690-… Gymnorhina ti… https://biodi… -37.0 145.
2 0000150d-489d-… Gymnorhina ti… https://biodi… -37.9 145.
3 00001736-1fd8-… Gymnorhina ti… https://biodi… -34.8 151.
4 00001e12-29cf-… Gymnorhina ti… https://biodi… -39.6 144.
5 00001f60-0371-… Gymnorhina ti… https://biodi… -35.2 149.
6 000035cd-c3a1-… Gymnorhina ti… https://biodi… -37.6 144.
# ℹ 3 more variables: eventDate <dttm>, occurrenceStatus <chr>,
# dataResourceName <chr>
For the purpose of this exercise, we’re going to filter records not on the mainland or Tasmania.
<- magpie_occ %>% filter(decimalLongitude < 155,
filtered_occ > 110,
decimalLongitude > -45,
decimalLatitude < -10) decimalLatitude
Plotting binned data
The easiest way to create a hex map is using the hexbin
package. However, because there are some areas that have many more observations than other areas, without standardising our data the result is not very useful.
ggplot() +
geom_hex(data = filtered_occ,
mapping = aes(x = decimalLongitude,
y = decimalLatitude),
bins = 47,
colour = "white") +
coord_sf(ylim = c(-45, -10),
xlim = c(110, 155)) +
scale_fill_gradientn(colours = c("#EEECEA", "#E06E53")) +
theme_void()
To make a more informative hex map, in this case it might be useful to try to create our hexagons manually. We can do this by creating a grid of hexagons, filtering the grid to the outline of Australia, and adding our data of magpie counts to set the fill
color of those hexagons.
To achieve this, we can first convert the map of Australia provided by ozmaps
to the same coordinate system as ALA data.
<- st_transform(ozmaps::ozmap_country, 4326) aus
Next we’ll create a grid of hexagons.
<- st_make_grid(aus,
grid_all cellsize = 1,
what = "polygons",
square = FALSE,
flat_topped = TRUE)
ggplot() +
geom_sf(data = grid_all)
Now we’ll extract all the hexagons in our full grid that intersect our map of Australia, and filter our grid to only include those hexagons by only keeping the hexagon rows that are returned after running st_intersects()
.
# extract rows that are within AUS land
<- st_intersects(grid_all, aus) %>%
keep_hexes as.data.frame(.) %>%
pull(row.id)
# filter full grid to only hexagon IDs in AUS
<- grid_all[keep_hexes]
oz_grid
ggplot() + geom_sf(data = oz_grid)
Now to figure out how many magpie observations are within each hexagon. To do this, first we’ll convert our magpie observation points to an sf
spatial object and make sure the point projection is the same as our map of Australia. Then we can use st_intersects()
again to return a list, where each data.frame
within the list shows which hexagon ID each point is within.
<- filtered_occ %>%
magpie_points_sf st_as_sf(coords = c("decimalLongitude", "decimalLatitude"),
crs = st_crs(4326))
<- st_intersects(magpie_points_sf, oz_grid)
intersect
5:10] intersect[
[[1]]
[1] 69
[[2]]
[1] 43
[[3]]
[1] 414
[[4]]
[1] 31
[[5]]
[1] 185
[[6]]
[1] 67
With all points in their own separate data.frame
, we can use the wicked-fast table()
function from base
R to count how many points match each hexagon ID, giving us our point counts! A little renaming and wrangling helps to get our counts in the right format.
# condense counts into tibble
<- as_tibble(table(unlist(intersect)),
counts .name_repair = "unique") %>%
rename("hex_id" = 1,
"count" = 2) %>%
mutate(hex_id = as.integer(hex_id)) %>%
replace_na(list(count = 0))
We’ll add our count
column from complete_counts
to our oz_grid
, along with an id
column containing the row number. This column will act as a reference column to join with complete_counts
. Then we’ll also make sure that oz_grid
is an sf
object for plotting.
<- oz_grid %>%
oz_grid as_tibble() %>%
mutate(id = row_number()) %>%
full_join(counts,
by = join_by(id == hex_id)) %>%
st_as_sf()
|> head() oz_grid
Simple feature collection with 6 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 144.8105 ymin: -44.13203 xmax: 148.5632 ymax: -41.63203
Geodetic CRS: WGS 84
# A tibble: 6 × 3
geometry id count
<POLYGON [°]> <int> <int>
1 ((146.5425 -43.63203, 146.8312 -44.13203, 147.4085 -44.13203, 147… 1 591
2 ((145.6765 -43.13203, 145.9652 -43.63203, 146.5425 -43.63203, 146… 2 9
3 ((147.4085 -43.13203, 147.6972 -43.63203, 148.2746 -43.63203, 148… 3 1602
4 ((144.8105 -42.63203, 145.0991 -43.13203, 145.6765 -43.13203, 145… 4 6
5 ((146.5425 -42.63203, 146.8312 -43.13203, 147.4085 -43.13203, 147… 5 9692
6 ((145.6765 -42.13203, 145.9652 -42.63203, 146.5425 -42.63203, 146… 6 71
Finally, let’s build our map! We’ll use scale_fill_gradientn()
to add a nice legend, and standardise our data using a log-transformation so that the colours on our map are scaled to be more informative.
ggplot() +
geom_sf(data = oz_grid, aes(fill = count), size = .01) +
scale_fill_gradientn(colours = c("#EEECEA", "#E06E53"),
na.value = "white",
trans = "log10",
labels = scales::comma_format(),
n.breaks = 6,
guide = guide_colourbar(title = "Observations")) +
coord_sf(ylim = c(-45, -10),
xlim = c(110, 155)) +
theme_void()
That’s it! All the extra work does make a difference in this case, providing a better representation of the spread of Mapgies across Australia. Manually constructing hex maps can be useful in other circumstances, too. For example, if we wanted to compare the number of magpies to contextual information within each polygon (such as rainfall or human population data), then manually constructing our own hexagons could help us to combine data from different sources.
A final point is that we could have achieved the same result by creating polygons first, then querying the ALA for the number of magpie records in each polygon using galah_geolocate()
. That’s a bit more challenging, and not worthwhile in this case; but it can be an efficient solution where you require information on more species than there are polygons, for example. You can learn how to do this in this ALA Labs article, if you are interested to learn how!
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2024-02-12
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
galah * 2.0.1 2024-02-06 [1] CRAN (R 4.3.2)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
hexbin * 1.28.3 2023-03-21 [1] CRAN (R 4.3.2)
htmltools * 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
ozmaps * 0.4.5 2021-08-03 [1] CRAN (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
sf * 1.0-14 2023-07-11 [1] CRAN (R 4.3.2)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library
──────────────────────────────────────────────────────────────────────────────