Refactor: update superseded dplyr syntax to current best practices #243

@martinctc

Several parts of the codebase currently use superseded dplyr syntax, such as mutate_at(), summarise_at(), and other scoped variants. Since dplyr 1.0.0, these functions have been superseded in favor of using across() within verbs like mutate() and summarise(), which is now the recommended approach for applying functions across multiple columns.

Additionally, there are instances where select() and other tidyselect-based functions receive a character variable directly (e.g., select(df, varname), where varname holds column names). This is ambiguous: if df has a column literally named varname, that column is selected instead of the names stored in the variable. The current best practice is to wrap such variables in all_of(), which raises an error if any named column is missing, or any_of(), which silently ignores names that do not exist in the data frame.

Examples of superseded syntax:

  • mutate_at(vars(starts_with("x")), funs(mean))
  • summarise_at(vars(matches("score")), mean)

Recommended replacements:

  • mutate(across(starts_with("x"), mean))
  • summarise(across(matches("score"), mean))
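A minimal sketch of the before/after pattern, assuming a small hypothetical data frame (df and its columns are illustrative, not taken from the codebase). It uses a bare function instead of funs(), which is itself deprecated:

```r
library(dplyr)

# Hypothetical data, used only for illustration
df <- tibble(
  x1 = c(1, 2, 3),
  x2 = c(4, 5, 6),
  score_a = c(10, 20, 30)
)

# Superseded: scoped variant
old <- df %>% mutate_at(vars(starts_with("x")), mean)

# Current: across() inside mutate()
new <- df %>% mutate(across(starts_with("x"), mean))

# The two results are expected to match
all.equal(old, new)

# Extra arguments are passed through a lambda
score_means <- df %>%
  summarise(across(matches("score"), ~ mean(.x, na.rm = TRUE)))
```

The lambda form (~ mean(.x, na.rm = TRUE)) is one way to pass arguments that the scoped variants previously accepted through `...`.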

For variable selection:

  • Instead of select(df, varname), use select(df, all_of(varname)) when varname is a character vector of column names.
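As a rough illustration of the difference between the two helpers, assuming hypothetical column names (wanted and optional are made up for this sketch):

```r
library(dplyr)

# Hypothetical data and column-name vectors, for illustration only
df <- tibble(id = 1:3, score = c(10, 20, 30))
wanted   <- c("id", "score")
optional <- c("score", "not_a_column")

# all_of(): strict, every name must exist
strict <- df %>% select(all_of(wanted))

# any_of(): lenient, missing names are skipped without error
lenient <- df %>% select(any_of(optional))

# all_of() with a missing name raises an error, captured here with try()
strict_failed <- inherits(
  try(select(df, all_of(optional)), silent = TRUE),
  "try-error"
)
```

all_of() is probably the safer default when a missing column indicates a bug; any_of() suits cases where the column is genuinely optional.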

Benefits of updating:

  • Ensures compatibility with the latest dplyr releases
  • Improves code readability and maintainability
  • Reduces risk of deprecation warnings or errors in future dplyr versions

Action items:

  • Refactor all instances of *_at(), *_if(), and *_all() to use across() within the relevant verbs (for filter_*() variants, use if_any() or if_all(); for rename_*() variants, use rename_with()).
  • Update any direct string-based column selection to use all_of() or any_of() as appropriate.
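The *_if() and *_all() variants can be sketched in the same way. This example assumes a hypothetical data frame and a dplyr version of at least 1.0.4, where if_any() and if_all() are available for filter():

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(
  name   = c("a", "b"),
  height = c(1.2, 2.7),
  weight = c(10.4, 20.6)
)

# mutate_if(is.numeric, round)  ->  across(where(...))
rounded <- df %>% mutate(across(where(is.numeric), round))

# summarise_all(n_distinct)     ->  across(everything(), ...)
counts <- df %>% summarise(across(everything(), n_distinct))

# filter_if(is.numeric, all_vars(. > 2))  ->  if_all() inside filter()
tall_and_heavy <- df %>% filter(if_all(where(is.numeric), ~ .x > 2))
```

Note that filter() takes if_any() or if_all() rather than across(), so filter_*() calls need a slightly different translation from the mutate and summarise cases.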

These changes will help keep the codebase current with modern dplyr best practices.
