R Style Guide & Function Writing Best Practices
Consistent naming, spacing, structure, and function design for R code
Function Writing Best Practices
Structure and Style
Good function structure
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
Use type-stable outputs
purrr::map_dbl()  # returns a numeric (double) vector
purrr::map_chr()  # returns a character vector
purrr::map_lgl()  # returns a logical vector
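The `map_*()` functions come from purrr. Base R's `vapply()` gives a similar guarantee: you declare the output type up front, and it errors instead of silently changing type. The following sketch contrasts it with `sapply()`, which picks its return type from the results and can therefore return a list when the input is empty.

```r
inputs_empty <- list()
inputs_some  <- list(1:3, letters)

# sapply() infers its return type, so empty input yields a list, not a vector
class(sapply(inputs_empty, length))         # "list"

# vapply() fixes the type: always an integer vector, even for empty input
vapply(inputs_empty, length, integer(1))    # integer(0)
vapply(inputs_some, length, integer(1))     # 3 26

# A result of the wrong type is an error rather than a silent type change
type_error <- tryCatch(
  vapply(inputs_some[1], mean, integer(1)), # mean() returns a double
  error = function(e) "type mismatch"
)
type_error                                  # "type mismatch"
```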
Naming and Arguments
Good naming: snake_case for variables/functions
calculate_mean_score <- function(data, score_col) {
  # Function body
}
Prefix arguments with . in functions that also take ...
my_function <- function(.data, ...) {
  # Reduces argument conflicts
}
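The conflict the dot prefix avoids comes from R's argument matching: a named argument is matched to a formal argument of the same name before anything is collected into `...`. A minimal sketch with two hypothetical functions:

```r
# Hypothetical helpers: identical except for the name of the first argument
collect_unsafe <- function(x, ...)  list(first = x,  rest = list(...))
collect_safe   <- function(.x, ...) list(first = .x, rest = list(...))

# The caller intends 1 as the first argument and x = 2 as an extra in `...`
collect_unsafe(1, x = 2)$first  # 2: `x = 2` captured the formal argument
collect_safe(1, x = 2)$first    # 1: `x = 2` stays in `...`
```

Nothing warns about the swap in `collect_unsafe()`, which is why the prefix matters most in functions that forward `...` to user-supplied names.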
Style Guide Essentials
Object Names
- Use snake_case for all names
- Make variable names nouns and function names verbs
- Avoid dots in names except for S3 methods
Good
day_one
calculate_mean
user_data
Avoid
DayOne
calculate.mean
userData
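Dots are reserved for S3 methods because `UseMethod()` looks up methods by the `generic.class` naming pattern. A minimal sketch with a hypothetical `report` class shows how a dotted function name turns into a method:

```r
# A function named print.<class> is treated as the print() method for that class
print.report <- function(x, ...) {
  cat("<report>\n")
  invisible(x)
}

obj <- structure(list(), class = "report")
out <- capture.output(print(obj))
out  # "<report>"

# The same lookup applies to accidental names: a helper called, for example,
# summary.data would be dispatched by summary() for any object of class "data".
```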
Spacing and Layout
Good spacing
x[, 1]
mean(x, na.rm = TRUE)
if (condition) {
  action()
}
Pipe formatting
data |>
  filter(year >= 2020) |>
  group_by(category) |>
  summarise(
    mean_value = mean(value),
    count = n()
  )
Assignment
Good - Use <- for assignment
x <- 5
Avoid - Using = for assignment (reserve = for function arguments)
x = 5 # Less clear intent
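One concrete reason to reserve `=` for arguments: inside a function call, `=` names an argument and creates nothing, while `<-` performs a real assignment in the calling frame. A small sketch with two hypothetical functions:

```r
# Inside a call, `=` matches the argument `x` of median(); no variable is made
uses_equals <- function() {
  median(x = c(3, 1, 2))
  exists("x", envir = environment(), inherits = FALSE)
}

# `<-` inside a call assigns x in this function's frame as a side effect
uses_arrow <- function() {
  median(x <- c(3, 1, 2))
  exists("x", envir = environment(), inherits = FALSE)
}

uses_equals()  # FALSE
uses_arrow()   # TRUE
```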
Indentation and Line Length
- Use 2 spaces for indentation (never tabs)
- Keep lines under 80 characters when possible
- For long function calls, put each argument on its own line
Good - Long function call
do_something_complicated(
  data = my_data,
  arg_one = value_one,
  arg_two = value_two,
  arg_three = value_three
)
Good - Long pipe chain
result <- data |>
  filter(year >= 2020) |>
  mutate(
    new_var = old_var * 2,
    another_var = str_to_lower(text_var)
  ) |>
  summarise(
    mean_value = mean(value),
    .by = category
  )
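The line-length and tab rules in the list above are easy to check mechanically; the lintr package automates this kind of check for real projects. As a dependency-free illustration, here is a sketch of a hypothetical `check_layout()` helper in base R:

```r
# Hypothetical helper: flag lines longer than `limit` characters and lines
# indented with a tab. Pass readLines("path/to/file.R") to check a real file.
check_layout <- function(lines, limit = 80) {
  too_long <- which(nchar(lines) > limit)
  has_tab  <- grep("^\t", lines)
  list(too_long = too_long, tab_indented = has_tab)
}

code <- c(
  "x <- 1",
  paste0("y <- ", strrep("1", 80)),  # 85 characters
  "\tz <- 2"                          # tab-indented
)
res <- check_layout(code)
res$too_long      # 2
res$tab_indented  # 3
```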
Comments
Good - Comments explain WHY, not WHAT
# Calculate running average to smooth noise in sensor data
running_avg <- zoo::rollmean(values, k = 5)
Avoid - Comments that just repeat the code
# Add 1 to x
x <- x + 1
File Organization
1. Load packages at the top
library(dplyr)
library(ggplot2)
library(readr)  # provides read_csv() and write_csv(), used in main() below
2. Source any helper files
source("R/helpers.R")
3. Define constants
MAX_ITERATIONS <- 1000
DEFAULT_THRESHOLD <- 0.05
4. Define functions
process_data <- function(data) {
  # ...
}
5. Main script logic (if not a package)
main <- function() {
  data <- read_csv("data/input.csv")
  result <- process_data(data)
  write_csv(result, "data/output.csv")
}

main()
Function Design Guidelines
Single Responsibility
Good - Each function does one thing
read_and_validate <- function(path) {
  data <- read_csv(path)
  validate_columns(data)
  data
}
validate_columns <- function(data) {
  required <- c("id", "value", "date")
  missing_cols <- setdiff(required, names(data))  # avoid shadowing base::missing()
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(data)
}
Avoid - Function does too many things
do_everything <- function(path, output_path, ...) {
  # Reads, validates, transforms, models, plots, writes...
}
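One payoff of keeping validation in its own function is that it can be tested without touching the filesystem. The following is a base-R sketch of the validator; lifting `required` into an argument is an assumption for illustration, not part of the original.

```r
# Sketch: `required` becomes an argument (assumption) so the check is reusable
validate_columns <- function(data, required = c("id", "value", "date")) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(data)  # returning the input lets the check sit inside a pipeline
}

good <- data.frame(id = 1, value = 2, date = as.Date("2024-01-01"))
bad  <- data.frame(id = 1)

validate_columns(good)  # passes; returns `good` invisibly
err <- tryCatch(validate_columns(bad), error = conditionMessage)
err                     # "Missing columns: value, date"
```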
Return Values
Good - Explicit return for complex functions
calculate_metrics <- function(data) {
  metrics <- list(
    mean = mean(data$value),
    sd = sd(data$value),
    n = nrow(data)
  )
  return(metrics)
}
Good - Implicit return for simple functions
square <- function(x) { x^2 }
Avoid - return() in the middle of a function without good reason (an early exit is fine)
process <- function(x) {
  if (is.null(x)) return(NULL)  # OK - early exit
  # ... more code
  result  # Implicit return at end
}
Error Handling
Good - Informative error messages
validate_input <- function(x, name = "x") {
  if (!is.numeric(x)) {
    stop("`", name, "` must be numeric, not ", typeof(x), call. = FALSE)
  }
  if (length(x) == 0) {
    stop("`", name, "` cannot be empty", call. = FALSE)
  }
  invisible(x)
}
Good - Use cli for user-friendly messages
validate_input_cli <- function(x) {
  if (!is.numeric(x)) {
    cli::cli_abort(
      "{.arg x} must be numeric, not {.cls {class(x)}}."
    )
  }
}
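When callers need to react to one specific failure rather than read its message, the error can carry a class. Here is a base-R sketch using `errorCondition()` (available since R 3.6); the function name `check_numeric()` and the class `invalid_input_error` are hypothetical.

```r
# Hypothetical validator that signals a classed error condition
check_numeric <- function(x, name = "x") {
  if (!is.numeric(x)) {
    stop(errorCondition(
      paste0("`", name, "` must be numeric, not ", typeof(x)),
      class = "invalid_input_error",
      call = NULL
    ))
  }
  invisible(x)
}

# The caller handles only this class; any other error still propagates
result <- tryCatch(
  check_numeric("abc", name = "score"),
  invalid_input_error = function(cnd) conditionMessage(cnd)
)
result  # "`score` must be numeric, not character"
```

`cli::cli_abort()` also accepts a `class` argument, so the same pattern works with cli-formatted messages.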
Default Arguments
Good - Sensible defaults
summarise_data <- function(data, na.rm = TRUE, digits = 2) {
  # ...
}
Good - NULL default for optional arguments
filter_data <- function(data, min_value = NULL, max_value = NULL) {
  if (!is.null(min_value)) {
    data <- filter(data, value >= min_value)
  }
  if (!is.null(max_value)) {
    data <- filter(data, value <= max_value)
  }
  data
}
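A `NULL` default reads as "not supplied", so each bound applies only when the caller passes a value. Here is a dependency-free sketch of the same idea; `filter_range()` is a hypothetical base-R counterpart of `filter_data()`.

```r
# Hypothetical base-R counterpart of filter_data()
filter_range <- function(data, min_value = NULL, max_value = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(min_value)) keep <- keep & data$value >= min_value
  if (!is.null(max_value)) keep <- keep & data$value <= max_value
  data[which(keep), , drop = FALSE]  # which() drops NA rows, like dplyr::filter()
}

df <- data.frame(id = 1:5, value = c(2, 8, 5, 1, 9))
filter_range(df)$id                                # 1 2 3 4 5
filter_range(df, min_value = 3)$id                 # 2 3 5
filter_range(df, min_value = 3, max_value = 8)$id  # 2 3
```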
Tidyverse API Conventions
Data-First Argument
Good - Data as first argument for piping
my_transform <- function(data, var, threshold = 0.5) {
  data |>
    filter({{ var }} > threshold)
}
Usage
data |> my_transform(value, threshold = 0.8)
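Data comes first because the native pipe `|>` (R 4.1 and later) passes its left-hand side as the first argument of the next call. The sketch below uses two hypothetical helpers that take column names as strings instead of tidy evaluation, so it runs with base R alone.

```r
# Hypothetical data-first helpers; columns are named with strings
add_ratio <- function(data, num, den, name = "ratio") {
  data[[name]] <- data[[num]] / data[[den]]
  data
}

keep_above <- function(data, col, threshold = 0.5) {
  data[data[[col]] > threshold, , drop = FALSE]
}

df <- data.frame(a = c(1, 2, 3), b = c(4, 2, 1))
res <- df |>
  add_ratio("a", "b") |>             # ratio: 0.25, 1, 3
  keep_above("ratio", threshold = 0.5)
res$ratio  # 1 3
```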
Prefixed Non-Standard Arguments
Good - Prefix with . to avoid conflicts
group_summary <- function(.data, ..., .by = NULL) {
  .data |>
    summarise(..., .by = {{ .by }})
}
Consistent Return Types
Good - Always return tibble
my_function <- function(data) {
  result <- data |>
    # processing...
    filter(!is.na(value))
  tibble::as_tibble(result)
}
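Tibbles help with type stability because their `[` method never drops a single column to a vector. Base R data frames do drop by default, so a column-selecting helper needs `drop = FALSE` to return the same type every time. A small sketch with a hypothetical `select_cols()`:

```r
df <- data.frame(id = 1:3, value = c(10, 20, 30))

# Base R `[` simplifies a single selected column to a plain vector
class(df[, "value"])                       # "numeric"

# drop = FALSE keeps a data frame regardless of how many columns are selected
select_cols <- function(data, cols) data[, cols, drop = FALSE]
class(select_cols(df, "value"))            # "data.frame"
class(select_cols(df, c("id", "value")))   # "data.frame"
```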
Common Style Mistakes
Avoid These Patterns
Avoid - Inconsistent spacing
x<-1+2      # No spaces
x <- 1 + 2  # Correct
Avoid - Unnecessary parentheses
if ((x > 0)) {}  # Extra parens
if (x > 0) {}    # Correct
Avoid - Using T/F instead of TRUE/FALSE
if (x == T) {}     # T can be overwritten
if (x == TRUE) {}  # Correct
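The difference is that `T` is an ordinary variable that base R binds to `TRUE`, while `TRUE` is a reserved word. This sketch shows both sides:

```r
# T can be shadowed like any other name, silently changing the logic
shadow_T <- function() {
  T <- FALSE
  if (T) "took TRUE branch" else "took FALSE branch"
}
shadow_T()  # "took FALSE branch"

# TRUE is reserved: assigning to it is an error instead of a silent shadow
assign_result <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) "error"
)
assign_result  # "error"
```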
Avoid - Semicolons to separate statements
x <- 1; y <- 2  # Hard to read
x <- 1          # Correct
y <- 2
Avoid - attach() - creates ambiguity
attach(mtcars)
mean(mpg)  # Which mpg?
detach(mtcars)
Correct - Be explicit
mean(mtcars$mpg)
or
with(mtcars, mean(mpg))
or
mtcars |> pull(mpg) |> mean()
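To see the ambiguity concretely: once `mtcars` is attached, a bare name is resolved by searching the current environment first and the attached data only afterwards. An unrelated object with the same name therefore wins without any error. This sketch uses a made-up `mpg` vector:

```r
# An unrelated object that happens to share a column name with mtcars
mpg <- c(10, 20, 30)

attach(mtcars)          # R may print a message that mpg is masked
ambiguous <- mean(mpg)  # finds the mpg defined above (mean 20), not the column
detach(mtcars)

explicit <- mean(mtcars$mpg)  # always refers to the mtcars column
```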