Test-Driven Development Workflow for R

This skill ensures all R code development follows TDD principles with comprehensive test coverage using testthat.

When to Activate

  • Writing new functions or features

  • Fixing bugs or issues

  • Refactoring existing code

  • Adding new model types

  • Creating data processing pipelines

  • Building Shiny components

Getting Started

Initialize testing infrastructure for your package:

# Set up testthat (Edition 3)
usethis::use_testthat(3)

# Create a test file for an existing source file
usethis::use_test("function_name")

# Or create the source and test file together
usethis::use_r("function_name")
usethis::use_test("function_name")
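For reference, usethis::use_testthat() also generates the tests/testthat.R runner. It looks like this (the package name is a placeholder for your own):

# tests/testthat.R (generated scaffold; actual tests live in tests/testthat/)
library(testthat)
library(yourpackage)

test_check("yourpackage")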

Core Principles

  1. Tests BEFORE Code

ALWAYS write tests first, then implement code to make tests pass.

  2. Coverage Requirements
  • Minimum 80% coverage (unit + integration)

  • 100% coverage for statistical calculations

  • 100% coverage for data validation

  • All edge cases covered

  • Error scenarios tested

  3. Test Types

Tests follow a three-level hierarchy: File → Test → Expectation
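A minimal file makes the hierarchy concrete:

# File: tests/testthat/test-rescale01.R
test_that("rescale01 normalizes to [0, 1] range", {  # Test
  expect_equal(rescale01(c(0, 10)), c(0, 1))         # Expectation
})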

Unit Tests

Individual functions and utilities:

test_that("rescale01 normalizes to [0, 1] range", { expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1)) expect_equal(rescale01(c(-10, 0, 10)), c(0, 0.5, 1)) })

test_that("rescale01 handles edge cases", { expect_equal(rescale01(c(5, 5, 5)), c(NaN, NaN, NaN)) expect_equal(rescale01(numeric(0)), numeric(0)) expect_equal(rescale01(c(0, NA, 10)), c(0, NA, 1)) })

Integration Tests

Function interactions and workflows:

test_that("data pipeline produces expected output", { raw_data <- read_fixture("sample_input.csv")

result <- raw_data |> clean_data() |> transform_features() |> summarize_results()

expect_s3_class(result, "tbl_df") expect_named(result, c("group", "mean", "sd", "n")) expect_true(all(result$n > 0)) })

Snapshot Tests

For complex outputs that are hard to specify:

test_that("model summary format is stable", { model <- fit_model(test_data) expect_snapshot(print(summary(model))) })

test_that("error messages are informative", { expect_snapshot( validate_input(invalid_data), error = TRUE ) })

Snapshot workflow:

# Review snapshot changes
testthat::snapshot_review("test_name")

# Accept snapshot changes
testthat::snapshot_accept("test_name")

Snapshots are stored in the tests/testthat/_snaps/ directory.
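For orientation, each snapshot file is plain Markdown named after its test file; for test-model.R it would live at tests/testthat/_snaps/model.md and look roughly like this (contents illustrative):

# model summary format is stable

    Code
      print(summary(model))
    Output
      ...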

BDD Alternative (Optional)

For behavior-driven development, use describe() and it():

describe("matrix()", { it("can be multiplied by a scalar", { m1 <- matrix(1:4, 2, 2) m2 <- m1 * 2 expect_equal(matrix(c(2, 4, 6, 8), 2, 2), m2) })

it("can be transposed", { m <- matrix(1:4, 2, 2) expect_equal(t(m), matrix(c(1, 3, 2, 4), 2, 2)) }) })

Key distinction: "describe() verifies you implement the right things, test_that() ensures you do things right."

Test Design Principles

Self-Sufficient Tests

Each test should contain all setup, execution, and teardown code. Tests must be independent and runnable in isolation without relying on ambient state or prior test execution.

# GOOD: self-contained
test_that("function works with specific data", {
  data <- tibble(x = 1:10, y = rnorm(10))  # Setup
  result <- my_function(data)              # Execute
  expect_equal(nrow(result), 10)           # Assert
})

# BAD: depends on external state
setup_data <- tibble(...)  # Created outside the test

test_that("function works", {
  result <- my_function(setup_data)  # Relies on external data
  expect_equal(nrow(result), 10)
})

Duplication Over Factoring

Repetition is acceptable in tests—duplicate setup code rather than extracting it elsewhere. Clarity outweighs avoiding duplication.

# GOOD: duplicated but clear
test_that("clean_data handles missing values", {
  data <- tibble(x = c(1, NA, 3), y = c(4, 5, 6))
  result <- clean_data(data)
  expect_equal(nrow(result), 2)
})

test_that("clean_data handles invalid values", {
  data <- tibble(x = c(1, -999, 3), y = c(4, 5, 6))
  result <- clean_data(data, invalid = -999)
  expect_equal(nrow(result), 2)
})

# ACCEPTABLE: each test is self-contained and readable

Plan for Failure

Write tests assuming they will eventually fail and need debugging: make the logic explicit and obvious, and make sure each test passes when run on its own in a fresh R session.
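For example, prefer standalone expectations over clever loops, so a failure points at exactly one input:

# Harder to debug: the failure doesn't say which input broke
test_that("rescale01 starts at 0", {
  for (input in list(c(0, 10), c(-5, 5))) {
    expect_equal(rescale01(input)[1], 0)
  }
})

# Easier to debug: each expectation stands alone
test_that("rescale01 starts at 0", {
  expect_equal(rescale01(c(0, 10))[1], 0)
  expect_equal(rescale01(c(-5, 5))[1], 0)
})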

Use devtools::load_all()

During development, prefer devtools::load_all() over library() (see the sketch after this list). This:

  • Exposes unexported functions for testing

  • Automatically attaches testthat

  • Eliminates unnecessary library() calls in tests

  • Simulates package loading without installation
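A typical interactive loop (the validation helper named below is a placeholder for one of your package's unexported functions):

devtools::load_all()  # Simulates installing and loading the package

# Unexported functions are now callable directly
validate_ci_inputs(1:10, conf_level = 0.95)

# testthat is attached, so expectations work at the console too
expect_equal(rescale01(c(0, 10)), c(0, 1))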

testthat Edition 3

Edition 3 provides improved snapshot testing, better diffs via waldo, unified condition handling, parallel execution support, and byte-compiled code compatibility for mocking.
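You can preview the waldo diff that an expect_equal() failure would produce by calling waldo directly (a small illustration):

# Pinpoints the differing element instead of printing both whole objects
waldo::compare(c(a = 1, b = 2), c(a = 1, b = 3))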

Deprecated Patterns → Modern Alternatives

DEPRECATED: context() calls

context("Data validation") # Remove - filename serves this purpose

DEPRECATED: expect_equivalent()

expect_equivalent(x, y)

MODERN:

expect_equal(x, y, ignore_attr = TRUE)

DEPRECATED: with_mock()

with_mock(
  external_call = function() "mocked",
  {
    result <- my_function()
  }
)

MODERN:

local_mocked_bindings(
  external_call = function() "mocked"
)
result <- my_function()

DEPRECATED: expect_is()

expect_is(x, "data.frame")

MODERN:

expect_s3_class(x, "data.frame")

Initialize Edition 3

In DESCRIPTION, ensure:

Config/testthat/edition: 3

Or initialize with:

usethis::use_testthat(3)

Essential Expectations Reference

Equality & Identity

expect_equal(x, y)                      # Equality with numeric tolerance
expect_equal(x, y, tolerance = 0.001)
expect_equal(x, y, ignore_attr = TRUE)
expect_identical(x, y)                  # Exact match required
expect_all_equal(x)                     # Every element equal (v3.3.0+)

Conditions

expect_error(code)
expect_error(code, "pattern")
expect_error(code, class = "validation_error")
expect_warning(code)
expect_no_warning(code)
expect_message(code)
expect_no_message(code)

Collections & Sets

expect_setequal(x, y)          # Same elements, any order
expect_contains(set, element)  # Subset relationship (v3.2.0+)
expect_in(element, set)        # Membership check (v3.2.0+)
expect_disjoint(set1, set2)    # No overlap (v3.3.0+)
expect_named(x, c("a", "b"))   # Named vector/list

Type & Structure

expect_type(x, "double") expect_s3_class(x, "data.frame") expect_s4_class(x, "S4Class") expect_r6_class(x, "R6Class") expect_shape(matrix, c(2, 3)) # Matrix/array dimensions (v3.3.0+) expect_length(x, 10)

Logical

expect_true(x)
expect_false(x)
expect_all_true(x)   # Every element TRUE (v3.3.0+)
expect_all_false(x)  # Every element FALSE (v3.3.0+)

Other Useful Expectations

expect_null(x)
expect_invisible(result)
expect_output(print(x), "pattern")
expect_snapshot(complex_output)

File Organization

Tests mirror your package structure:

tests/
├── testthat/
│   ├── test-validation.R       # Tests for R/validation.R
│   ├── test-processing.R       # Tests for R/processing.R
│   ├── test-models.R           # Tests for R/models.R
│   ├── test-output.R           # Tests for R/output.R
│   ├── helper-fixtures.R       # Shared functions (sourced before tests)
│   ├── setup-database.R        # Setup code (run before tests, not by load_all())
│   ├── helper-expectations.R   # Custom expectations
│   └── fixtures/               # Static test data files
│       ├── sample_input.csv
│       └── expected_output.rds
└── testthat.R                  # Test runner

File Types

  • test-*.R: actual test files, paired with source files

  • helper-*.R: shared utility functions, sourced before tests run and loaded by devtools::load_all()

  • setup-*.R: setup code run before tests (e.g. during devtools::test() and R CMD check), but not loaded by devtools::load_all()

  • fixtures/: static test data, accessed via test_path("fixtures/file")

Access fixtures:

test_path("fixtures", "sample_data.csv")

TDD Workflow Steps

Step 1: Define Expected Behavior

Document what the function should do:

Function: calculate_ci
Purpose: calculate bootstrap confidence intervals

Inputs:
  - data: numeric vector
  - conf_level: confidence level (default 0.95)
  - n_boot: number of bootstrap samples (default 1000)

Outputs:
  - named numeric vector with lower and upper bounds

Edge cases:
  - handle NA values
  - error on non-numeric input
  - error on empty input

Step 2: Write Failing Tests

# tests/testthat/test-calculate_ci.R
library(testthat)

test_that("calculate_ci returns correct structure", {
  set.seed(123)
  result <- calculate_ci(1:100)

  expect_type(result, "double")
  expect_named(result, c("lower", "upper"))
  expect_true(result["lower"] < result["upper"])
})

test_that("calculate_ci respects confidence level", {
  set.seed(123)
  ci_95 <- calculate_ci(1:100, conf_level = 0.95)
  ci_99 <- calculate_ci(1:100, conf_level = 0.99)

  # A 99% CI should be wider than a 95% CI
  expect_true(ci_99["upper"] - ci_99["lower"] > ci_95["upper"] - ci_95["lower"])
})

test_that("calculate_ci handles NA values", {
  set.seed(123)
  result <- calculate_ci(c(1:100, NA, NA))

  expect_false(any(is.na(result)))
})

test_that("calculate_ci validates inputs", {
  expect_error(calculate_ci("not numeric"), class = "validation_error")
  expect_error(calculate_ci(numeric(0)), class = "validation_error")
  expect_error(calculate_ci(1:10, conf_level = 1.5), class = "validation_error")
})

Step 3: Run Tests (They Should Fail)

devtools::test()

✖ calculate_ci returns correct structure
✖ calculate_ci respects confidence level
✖ calculate_ci handles NA values
✖ calculate_ci validates inputs

Step 4: Implement Minimal Code

# R/calculate_ci.R

#' Calculate Bootstrap Confidence Interval
#'
#' @param x Numeric vector
#' @param conf_level Confidence level (default 0.95)
#' @param n_boot Number of bootstrap samples (default 1000)
#' @return Named numeric vector with lower and upper bounds
#' @export
calculate_ci <- function(x, conf_level = 0.95, n_boot = 1000) {
  # Validate inputs
  if (!is.numeric(x)) {
    cli::cli_abort("{.arg x} must be numeric", class = "validation_error")
  }
  if (length(x) == 0) {
    cli::cli_abort("{.arg x} cannot be empty", class = "validation_error")
  }
  if (conf_level <= 0 || conf_level >= 1) {
    cli::cli_abort("{.arg conf_level} must be between 0 and 1", class = "validation_error")
  }

  # Remove NA values
  x <- x[!is.na(x)]

  # Bootstrap the sample mean
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

  # Calculate quantiles
  alpha <- 1 - conf_level
  c(
    lower = unname(quantile(boot_means, alpha / 2)),
    upper = unname(quantile(boot_means, 1 - alpha / 2))
  )
}

Step 5: Run Tests Again

devtools::test()

✔ calculate_ci returns correct structure
✔ calculate_ci respects confidence level
✔ calculate_ci handles NA values
✔ calculate_ci validates inputs

Step 6: Refactor

Improve while keeping tests green:

# Extract validation to a helper
validate_ci_inputs <- function(x, conf_level) {
  if (!is.numeric(x)) {
    cli::cli_abort("{.arg x} must be numeric", class = "validation_error")
  }
  if (length(x) == 0) {
    cli::cli_abort("{.arg x} cannot be empty", class = "validation_error")
  }
  if (conf_level <= 0 || conf_level >= 1) {
    cli::cli_abort("{.arg conf_level} must be between 0 and 1", class = "validation_error")
  }
}

calculate_ci <- function(x, conf_level = 0.95, n_boot = 1000) {
  validate_ci_inputs(x, conf_level)

  x <- x[!is.na(x)]
  boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

  alpha <- 1 - conf_level
  c(
    lower = unname(quantile(boot_means, alpha / 2)),
    upper = unname(quantile(boot_means, 1 - alpha / 2))
  )
}

Step 7: Verify Coverage

covr::package_coverage()

calculate_ci.R: 100%

Testing Patterns

Testing Data Transformations

test_that("clean_data removes invalid rows", { input <- tibble( id = 1:4, value = c(1, NA, 3, -999) )

result <- clean_data(input, invalid_value = -999)

expect_equal(nrow(result), 2) expect_equal(result$id, c(1, 3)) expect_false(anyNA(result$value)) })

Testing Statistical Functions

test_that("weighted_mean matches manual calculation", { x <- c(1, 2, 3) w <- c(1, 2, 1)

result <- weighted_mean(x, w) expected <- sum(x * w) / sum(w) # (1 + 4 + 3) / 4 = 2

expect_equal(result, expected) })

Testing with Fixtures

# helper-fixtures.R
read_fixture <- function(name) {
  path <- testthat::test_path("fixtures", name)
  readr::read_csv(path, show_col_types = FALSE)
}

# test-pipeline.R
test_that("pipeline handles real data", {
  input <- read_fixture("sample_data.csv")
  result <- process_pipeline(input)

  expect_snapshot(result)
})

Mocking External Dependencies

test_that("fetch_data handles API errors", {

Mock the API call

local_mocked_bindings( httr2_request = function(...) { stop("API unavailable") } )

expect_error( fetch_data("endpoint"), "API unavailable" ) })

Using withr for Cleanup

Use withr functions to manage temporary state with automatic restoration:

test_that("function respects options", {

Temporarily set options

withr::local_options(list(digits = 2))

result <- format_number(3.14159) expect_equal(result, "3.14") })

test_that("function writes to temp file", {

Create temp file that's automatically cleaned up

tmp <- withr::local_tempfile(lines = c("line 1", "line 2"))

result <- process_file(tmp) expect_equal(result$n_lines, 2) })

test_that("function uses custom environment variable", {

Temporarily set env var

withr::local_envvar(MY_VAR = "test_value")

result <- get_config() expect_equal(result$my_var, "test_value") })

Test Data Strategies

Choose the appropriate approach for your testing needs:

  1. Constructor Functions

Create data on-demand with helper functions:

# helper-data.R
make_sample_data <- function(n = 100) {
  tibble(
    id = 1:n,
    group = sample(c("A", "B"), n, replace = TRUE),
    value = rnorm(n)
  )
}

# test-analysis.R
test_that("analysis handles grouped data", {
  data <- make_sample_data(n = 50)
  result <- analyze_groups(data)
  expect_s3_class(result, "tbl_df")
})

  2. Local Functions with Cleanup

Handle side effects using withr:

test_that("function reads CSV correctly", {

Create temp file with cleanup

tmp <- withr::local_tempfile(fileext = ".csv") write.csv(mtcars, tmp, row.names = FALSE)

result <- read_and_process(tmp) expect_equal(nrow(result), 32) })

  3. Static Fixtures

Store data files in fixtures/ directory:

# Store in: tests/testthat/fixtures/sample_data.csv
test_that("function handles real data format", {
  path <- test_path("fixtures", "sample_data.csv")
  data <- read_csv(path)
  result <- process_data(data)
  expect_true(all(result$valid))
})

Common Testing Mistakes to Avoid

WRONG: Testing Implementation Details

# Don't test internal state
expect_equal(obj$internal_cache, expected_cache)

CORRECT: Test Behavior

# Test observable behavior
expect_equal(get_result(obj), expected_result)

WRONG: Brittle Tests

# Breaks on any output change
expect_equal(as.character(result), "Mean: 5.234567890")

CORRECT: Flexible Assertions

# Robust to formatting changes
expect_equal(result$mean, 5.23, tolerance = 0.01)

WRONG: Dependent Tests

test_that("creates data", { global_data <<- create() }) test_that("uses data", { process(global_data) }) # Depends on previous!

CORRECT: Independent Tests

test_that("creates and uses data", { data <- create() result <- process(data) expect_true(is_valid(result)) })

WRONG: Modifying Tests to Pass

# When a test fails, don't change the test (unless the test itself is wrong)
test_that("function returns 42", {
  expect_equal(my_function(), 42)  # Test fails
})

# DON'T DO THIS:
test_that("function returns 41", {
  expect_equal(my_function(), 41)  # Changed to pass - WRONG!
})

CORRECT: Fix the Implementation

# Fix the code to match the expected behavior
test_that("function returns 42", {
  expect_equal(my_function(), 42)  # Test fails
})
# Fix the my_function() implementation instead

When Tests Fail

  • Do NOT modify tests to make them pass (unless the test is wrong)

  • Fix the implementation to match expected behavior

  • Add more tests if the failure reveals missing coverage

  • Update snapshots only if the change is intentional

# Review and accept snapshot changes
testthat::snapshot_review("test_name")
testthat::snapshot_accept("test_name")

Coverage Verification

# Run coverage report
covr::package_coverage()

# Interactive HTML report
covr::report()

# Check a specific threshold (e.g. as a CI step)
cov <- covr::package_coverage()
pct <- covr::percent_coverage(cov)
if (pct < 80) {
  stop("Coverage below 80%: ", round(pct, 1), "%")
}

# Include vignettes and examples as well as tests
covr::package_coverage(type = "all")

Debugging & Development

Running Tests at Different Scales

# Micro: interactive development
devtools::load_all()
expect_equal(my_function(1), 1)  # Direct expectation at the console

# Mezzo: single file
testthat::test_file("tests/testthat/test-validation.R")
devtools::test_active_file()  # Or test the file open in the editor

# Macro: full suite
devtools::test()   # RStudio: Ctrl/Cmd+Shift+T
devtools::check()  # Full package validation

Test Reporters

# Find slow tests
devtools::test(reporter = "slow")

# Progress reporter (verbose)
devtools::test(reporter = "progress")

# Check test execution-order independence
devtools::test(shuffle = TRUE)

Continuous Testing

# Watch mode: auto-run tests on file changes
testthat::auto_test_package()

Parallel Execution (Edition 3)

Edition 3 supports parallel test execution for faster runs on multi-core systems.
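To opt in, add the flag to DESCRIPTION alongside the edition setting:

Config/testthat/parallel: true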

Running Tests

# All tests
devtools::test()
# RStudio shortcut: Ctrl/Cmd+Shift+T

# With coverage
covr::package_coverage()

# Specific file
testthat::test_file("tests/testthat/test-validation.R")

# Watch mode
testthat::auto_test_package()

# Verbose output
devtools::test(reporter = "progress")

# Find slow tests
devtools::test(reporter = "slow")

# Test execution-order independence
devtools::test(shuffle = TRUE)

# Full package check
devtools::check()

Success Metrics

  • 80%+ code coverage achieved

  • All tests passing

  • No skipped tests

  • Fast execution (< 30s for unit tests)

  • Tests catch bugs before production

  • Confident refactoring enabled

  • Tests run independently in any order

  • Clear, descriptive test names

  • Each test validates one concept

Remember: Tests are not optional. They are the safety net that enables confident refactoring, rapid development, and production reliability. Write them FIRST.
