Performance: vec_unique_count #5158

Merged
teunbrand merged 2 commits into tidyverse:main from teunbrand:vec_unique_count
Feb 21, 2023

@teunbrand
Collaborator

I've noticed that length(unique0(x)) is a common pattern in ggplot2 code. However, for vectors of up to about 1000 elements, vctrs::vec_unique_count() seems to be faster; beyond that, the two perform about equally. The memory footprint is likewise smaller for small vectors and about equal for larger ones. The improvement is small, but the pattern is common enough that I thought the change was warranted to save a few microseconds per plot.

A benchmark:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

lengths <- 2^(0:24)

bms <- lapply(lengths, function(n) {
  x <- sample(LETTERS, n, TRUE)
  bench::mark(
    length(unique0(x)),
    vec_unique_count(x), 
    iterations = 50
  )
})
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.

df <- do.call(rbind, bms)
df$n <- rep(lengths, each = 2)

ggplot(df, aes(n, as.numeric(median), colour = as.character(expression))) +
  geom_line() +
  scale_y_log10() +
  scale_x_log10() +
  labs(
    x = "Length of vector",
    y = "Seconds",
    colour = "Pattern"
  )

Created on 2023-01-24 with reprex v2.0.2
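
For reference, a minimal sketch of the two patterns side by side. It assumes the vctrs package is installed and uses base unique() as a stand-in for ggplot2's internal unique0() helper; the vector x is purely illustrative and not taken from the ggplot2 code base.

```r
# Sketch only: assumes vctrs is installed.
# unique0() is internal to ggplot2, so base unique() stands in for it here.
library(vctrs)

x <- c("a", "b", "a", "c", "b")  # illustrative input

# Old pattern: materialise the vector of unique values, then take its length.
n_old <- length(unique(x))

# New pattern: count the distinct values directly.
n_new <- vec_unique_count(x)

# Both approaches should agree on an atomic vector like this one.
stopifnot(n_old == n_new)
```

The second form avoids building the intermediate vector of unique values, which is one plausible reason for the lower memory footprint on small inputs.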

Member

@yutannihilation left a comment

Looks good.

@teunbrand
Collaborator Author

Thanks for the review!

@teunbrand merged commit 4daadf7 into tidyverse:main on Feb 21, 2023
@teunbrand deleted the vec_unique_count branch on February 21, 2023 17:19