Colors in R
Color palettes are important to people, and the R ecosystem includes literally hundreds of possible palettes. If you want a “complete” list, go and check out Emil Hvitfeldt’s list of palettes; in practice, though, there are only a few that we use routinely. Our default at ALA labs is to use viridis for continuous scales because (to quote its CRAN page) it’s color-blind friendly, perceptually uniform, and pretty. The default purple-green-yellow color scheme is lovely, but I’m a big fan of ‘magma’, which has a black-purple-orange-yellow scheme.
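For a quick look at what the begin and end arguments of the viridis scales do, the sketch below uses viridisLite (the package that supplies the viridis color maps) to draw five colors from the full magma map and five from a trimmed version. It is a minimal illustration rather than part of the original analysis; the 0.10 and 0.95 cut-offs simply mirror the values used in the plot below.

```r
library(viridisLite)

# Five colors spanning the whole magma map, from near-black to pale yellow
full_magma <- magma(5)

# The same map sampled only between 10% and 95% of its range; this skips
# the darkest blacks and the palest yellows at either end
trimmed_magma <- magma(5, begin = 0.10, end = 0.95)

full_magma
trimmed_magma
```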
library(galah)
library(dplyr)
library(ggplot2)
library(viridis)
# Get field code for states/territories
search_fields("state") # layer: cl22 OR stateProvince
# A tibble: 14 × 3
id description type
<chr> <chr> <chr>
1 cl22 Australian States and Territories fields
2 cl927 States including coastal waters fields
3 cl938 Fruit Fly Exclusion Zone - Tri State fields
4 cl2013 ASGS Australian States and Territories fields
5 cl10900 Australia's Indigenous forest estate (2013) v2.0 fields
6 cl10922 PSMA State Electoral Boundaries (2018) fields
7 cl10925 PSMA States (2016) fields
8 cl110922 PSMA State Electoral Boundary Classes (2018) fields
9 cl110925 PSMA States - Abbreviated (2016) fields
10 stateInvasive <NA> fields
11 stateProvince State/Territory fields
12 raw_stateProvince State/Territory (unprocessed) fields
13 stateConservation State conservation fields
14 raw_stateConservation State conservation (unprocessed) fields
# Download record counts by state/territory
records <- galah_call() %>%
  galah_group_by(cl22) %>%
  atlas_counts()
# Add state information back to data frame
# (levels follow row order rather than alphabetical order)
records$State <- factor(seq_len(nrow(records)), labels = records$cl22)
# Plot
ggplot(records, aes(x = State, y = log10(count), fill = count)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_viridis(option = "magma", begin = 0.10, end = 0.95) +
  theme_bw() +
  theme(legend.position = "none")
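The factor() line in the code above does more than it looks. By default, factor() sorts character values alphabetically, which would put the bars in alphabetical order; building the factor from integer codes labelled with the state names keeps the levels in the order the rows arrive instead. The toy data frame below uses made-up counts (not ALA data) to show the difference in base R.

```r
# Hypothetical counts for illustration only, already sorted largest first
toy_records <- data.frame(
  cl22  = c("Queensland", "New South Wales", "Victoria"),
  count = c(300L, 200L, 100L)
)

# Default behaviour: levels are sorted alphabetically
levels(factor(toy_records$cl22))
# "New South Wales" "Queensland" "Victoria"

# Integer codes 1..n labelled with the names: levels follow row order
toy_records$State <- factor(seq_len(nrow(toy_records)), labels = toy_records$cl22)
levels(toy_records$State)
# "Queensland" "New South Wales" "Victoria"
```

An equivalent, arguably more readable idiom is factor(toy_records$cl22, levels = toy_records$cl22), which gives the same levels as long as every name appears only once.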
My default for categorical color schemes is the ‘Dark2’ palette from RColorBrewer; but given the subject matter of our work, it’s worth mentioning the wonderful feathers package by Shandiya Balasubramaniam, which provides palettes based on the plumage of Australian birds.
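For reference, this is how the Dark2 palette can be pulled out of RColorBrewer as a plain vector of hex codes, ready to hand to scale_fill_manual() or scale_color_manual(). Dark2 has eight colors, so this sketch assumes at most eight categories.

```r
library(RColorBrewer)

# Dark2 is a qualitative palette with a maximum of eight colors
brewer.pal.info["Dark2", ]

# Pull all eight colors out as hex codes
dark2 <- brewer.pal(n = 8, name = "Dark2")
dark2
```

Inside a ggplot2 call, scale_fill_brewer(palette = "Dark2") applies the same palette without creating the vector explicitly.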
# remotes::install_github(repo = "shandiya/feathers")
library(feathers)
rcfd <- galah_call() %>%
  galah_identify("Rose-crowned Fruit-Dove") %>%
  galah_group_by(cl22) %>%
  atlas_counts()
rcfd$State <- factor(seq_len(nrow(rcfd)), labels = rcfd$cl22)
ggplot(rcfd, aes(x = State, y = log10(count), fill = State)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = get_pal("rose_crowned_fruit_dove")) +
  theme_bw() +
  theme(legend.position = "none")
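One thing to watch with fixed-length palettes like these is that scale_fill_manual() needs at least as many colors as the factor has levels; if a species turns up in more states than the palette has colors, ggplot2 stops with an "insufficient values" error. A simple base R workaround is colorRampPalette(), which interpolates between the colors you give it. The four hex codes below are placeholders, not the actual fruit-dove palette, and interpolated shades can end up hard to tell apart, so treat this as a fallback rather than a recommendation.

```r
# Placeholder palette of four colors (hypothetical, not taken from feathers)
pal <- c("#6B8E23", "#D2691E", "#FFD700", "#8B008B")

# Suppose the plot needs one color for each of nine categories
n_levels <- 9

# colorRampPalette() returns a function; calling it with n gives n colors
# interpolated along the supplied ones, keeping the first and last colors
stretched <- colorRampPalette(pal)(n_levels)
stretched
```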
All of this is fine, but what if you have a specific image that you want to take colors from? The obvious choice is to pick the colors you want using an image-editing program, but if we want something automated, there are options in R as well.
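A common way to automate color extraction is to treat every pixel as a point in RGB space, cluster those points with k-means, and use the cluster centres as the palette. The sketch below does this in base R on a synthetic "image" built from three made-up colors, so it runs without an image file; with a real photo you could read the pixels with a package such as jpeg (jpeg::readJPEG() returns a height x width x channel array with values between 0 and 1). This is a generic technique, not a description of how paletter works internally.

```r
set.seed(42)

# Hypothetical 60 x 40 pixel "image" built from three base colors plus noise
base_cols <- rbind(c(0.55, 0.27, 0.07),  # brown
                   c(0.80, 0.60, 0.30),  # tan
                   c(0.20, 0.35, 0.15))  # green
which_col <- sample(1:3, 60 * 40, replace = TRUE)
pixels <- base_cols[which_col, ] + matrix(rnorm(60 * 40 * 3, sd = 0.03), ncol = 3)
pixels <- pmin(pmax(pixels, 0), 1)        # keep values inside [0, 1]
img <- array(pixels, dim = c(60, 40, 3))  # height x width x RGB

# Flatten to one row per pixel and one column per channel
px <- matrix(img, ncol = 3)

# Cluster the pixel colors; each cluster centre becomes a palette color
km <- kmeans(px, centers = 3, nstart = 10)
palette_hex <- rgb(km$centers[, 1], km$centers[, 2], km$centers[, 3])
palette_hex
```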
Extracting colors
National Eucalypt Day aims to raise awareness about eucalypts and celebrate their influence on the lives of Australians. In honour of National Eucalypt Day, we wanted to create a plot based on occurrence data held in the Atlas of Living Australia (ALA), themed using colours from actual eucalypts.
We used this image from a tweet by Dean Nicolle (@DeanNicolle1, March 22, 2021):
Happy 'National Eucalypt Day'! The Western Australian gimlet (Eucalyptus salubris) has just been announced as Eucalypt of the Year for 2021. Renowned for its fluted, smooth, shiny, and colourful trunk & branches. pic.twitter.com/pOsufQtxWS
First, we use the galah package to find out how many observations of the Eucalypt of the Year 2021 are held by the ALA. Specifically, we use atlas_counts() to count the records of Eucalyptus salubris:
n_records <- galah_call() %>%
  galah_identify("Eucalyptus salubris") %>%
  atlas_counts()
Here is what the data look like:
n_records %>% head()
# A tibble: 1 × 1
count
<int>
1 892
Then we get a color scheme from an image of the species using the paletter package, which needs to be installed from GitHub:
# remotes::install_github("AndreaCirilloAC/paletter")
library(paletter)
image_pal <- create_palette(
  image_path = "./data/Dean_Nicolle_Esalubris_image_small.jpeg",
  type_of_variable = "categorical",
  number_of_colors = 15)
Note that we downsized the image before running the paletter code, as large images take much longer to process.
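The post doesn't say how the image was downsized, and any image editor will do. If you would rather stay in R, one crude option is to keep every k-th pixel in each direction of the image array. The sketch below assumes the image has already been read into a height x width x 3 array (a random array stands in for a real photo), and subsample_image() is a hypothetical helper, not part of paletter. Keeping every 4th row and column cuts the number of pixels by a factor of 16, which should shrink the work of any per-pixel clustering step roughly in proportion.

```r
# Stand-in for a photo read from disk: 600 x 400 pixels, 3 color channels
img <- array(runif(600 * 400 * 3), dim = c(600, 400, 3))

# Hypothetical helper: keep every `step`-th row and column
# (nearest-neighbour subsampling); drop = FALSE keeps a 3-D array
subsample_image <- function(img, step) {
  rows <- seq(1, dim(img)[1], by = step)
  cols <- seq(1, dim(img)[2], by = step)
  img[rows, cols, , drop = FALSE]
}

small <- subsample_image(img, step = 4)
dim(small)   # 150 100 3
```

Subsampling like this can skip thin features entirely; resampling that averages neighbouring pixels is gentler, at the cost of extra code or another package.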
Creating a plot
Once we have this palette, the obvious question is what kind of plot to draw. We could have drawn a map, but those can be a bit boring. We wanted something that represented the number of observations of this species held by the ALA and used our colors, but was otherwise just a pretty picture that didn’t need to convey any further information. Rather than use traditional x and y axes, therefore, we decided to try out the igraph package to lay out the points in an interesting way.
First, we create a vector containing one entry for each point we want to display, and distribute our colors among those points as evenly as possible:
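# create a vector to index colours
# (n_records is a tibble, so use its count column; floor() drops the
# remainder, so a few records may not get a point)
rep_times <- floor(n_records$count / length(image_pal))
colour_index <- rep(seq_along(image_pal),
                    each = as.integer(rep_times))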
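To make the arithmetic concrete, here is the same idea with the numbers from this post (892 records, 15 colors) written out in base R. Because of floor(), each color gets 59 points, so 15 x 59 = 885 points are drawn and the remaining 7 records are left out. If you would rather plot every record, one alternative (our suggestion, not what the post does) is to give the first few colors one extra point each.

```r
n_points  <- 892L  # record count for Eucalyptus salubris shown above
n_colours <- 15L   # number of colors requested from create_palette()

# As in the post: equal blocks, remainder dropped
per_colour <- floor(n_points / n_colours)               # 59
idx_floor  <- rep(seq_len(n_colours), each = per_colour)

# Alternative: spread the remainder so every record gets a point
extra   <- n_points %% n_colours                        # 7
times   <- (n_points %/% n_colours) + (seq_len(n_colours) <= extra)
idx_all <- rep(seq_len(n_colours), times = times)

length(idx_floor)  # 885
length(idx_all)    # 892
table(idx_all)     # colors 1-7 get 60 points, colors 8-15 get 59
```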
Then we can create a network using igraph, and use it to create a layout for our points:
library(igraph)
# link consecutive points of each colour into a chain of edges
graph_list <- lapply(seq_along(image_pal), function(a){
  lookup <- which(colour_index == a)
  return(
    tibble(
      from = lookup[c(1:(length(lookup)-1))],
      to = lookup[c(2:length(lookup))])
  )
})
graph_df <- as_tibble(do.call(rbind, graph_list)) %>% # build matrix
  tidyr::drop_na() %>%
  as.matrix(.)
colour_graph <- graph_from_edgelist(graph_df) # create network graph

# convert to a set of point locations
test_layout <- as.data.frame(layout_nicely(colour_graph)) # convert to df
colnames(test_layout) <- c("x", "y") # change colnames
test_layout$colour_index <- factor(colour_index) # add colour_index col
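The lapply() step above turns each color into a chain: every point is joined to the next point of the same color, so the layout algorithm draws each color as its own connected strand. The base R sketch below builds the same kind of edge list for a hypothetical toy index of 12 points in three colors. It uses head() and tail() instead of explicit index ranges, which also behaves sensibly when a color has only one point (it produces no edges rather than rows containing NA).

```r
# Hypothetical toy index: 12 points split evenly among 3 colors
toy_index <- rep(1:3, each = 4)

# For each color, join consecutive point ids into a chain (path)
edge_list <- lapply(sort(unique(toy_index)), function(a) {
  ids <- which(toy_index == a)
  # head(ids, -1) and tail(ids, -1) are empty when a color has one point
  cbind(from = head(ids, -1), to = tail(ids, -1))
})
edge_mat <- do.call(rbind, edge_list)
edge_mat
```

Passing edge_mat to igraph's graph_from_edgelist() should then give one path-shaped component per color.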
Finally, we draw the plot with ggplot2, removing the axes with theme_void():
ggplot(test_layout, aes(x = x, y = y, colour = colour_index)) +
  geom_point(size = 3, alpha = 0.9) +
  scale_color_manual(values = image_pal) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "none")
That’s it! While I like the effect here, I think the paletter package is best suited to images with large areas of strongly contrasting colors; it’s less ideal for images with subtle color differences. It also doesn’t appear to have been updated lately, which may mean it’s no longer being maintained. But I’m happy with this plot, and would definitely consider using paletter again.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2024-02-12
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
feathers * 0.0.0.9000 2022-10-11 [1] Github (shandiya/feathers@4be766d)
galah * 2.0.1 2024-02-06 [1] CRAN (R 4.3.2)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
htmltools * 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
igraph * 1.5.1 2023-08-10 [1] CRAN (R 4.3.2)
paletter * 0.0.0.9000 2023-01-10 [1] Github (AndreaCirilloAC/paletter@c09605b)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.2)
viridis * 0.6.4 2023-07-22 [1] CRAN (R 4.3.2)
viridisLite * 0.4.2 2023-05-02 [1] CRAN (R 4.3.1)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.3.2/library
──────────────────────────────────────────────────────────────────────────────