Testing
A disciplined approach to verifying that software behaves correctly, remains stable under change, and communicates intent to future developers. Good tests act as living documentation, a safety net for refactoring, and a design feedback mechanism.
This skill covers universal testing concepts that apply regardless of language, framework, or tooling.
When to Use
-
Designing a test strategy for a new project or feature
-
Deciding what level of testing (unit, integration, e2e) a piece of code needs
-
Evaluating whether existing tests are providing value or creating drag
-
Applying TDD to drive design decisions
-
Debugging a flaky or brittle test suite
-
Reviewing test code for quality and maintainability
Testing Pyramid
The testing pyramid describes the ideal distribution of tests across three levels. More tests at the base, fewer at the top.
/ E2E \ Few, slow, expensive
/----------\
/ Integration \ Moderate number, moderate speed
/----------------\
/ Unit Tests \ Many, fast, cheap
/____________________\
Unit Tests (Base)
-
Test a single unit of behavior in isolation (a function, a method, a small class)
-
No I/O, no database, no network, no file system
-
Execute in milliseconds
-
Should form the majority of your test suite (roughly 70%)
-
Fast feedback loop enables rapid iteration
Integration Tests (Middle)
-
Test how multiple units collaborate, or how code interacts with external systems
-
May involve a real database, message queue, or HTTP endpoint
-
Execute in seconds
-
Verify that wiring, configuration, and contracts between components work
-
Roughly 20% of your test suite
End-to-End Tests (Top)
-
Test complete user journeys through the full system
-
Interact with the application as a user would
-
Slowest, most brittle, most expensive to maintain
-
Reserve for critical business paths only
-
Roughly 10% of your test suite
The Ice Cream Cone Antipattern
The inverted pyramid: many e2e tests, few unit tests. Symptoms:
-
Test suite takes hours to run
-
Tests break constantly due to UI changes or timing issues
-
Developers stop running tests locally
-
Feedback loop is too slow to support continuous delivery
Fix: Identify what each e2e test is actually verifying. Push that verification down to the lowest possible level. Most business logic can be tested at the unit level.
Test Design Principles
Arrange-Act-Assert (AAA)
Every test should follow three distinct phases:
-
Arrange — set up the preconditions and inputs
-
Act — execute the behavior under test
-
Assert — verify the expected outcome
Keep each phase clearly separated. If Arrange dominates the test, extract a builder or factory. If Act requires multiple steps, you may be testing too much at once.
One Assertion per Concept
A test should verify one logical concept. This does not mean literally one assert call — asserting multiple properties of a single result is fine. What matters is that the test fails for exactly one reason.
// Good: one concept — "completed order has correct totals" assert order.subtotal == 100 assert order.tax == 21 assert order.total == 121
// Bad: two unrelated concepts in one test assert order.total == 121 assert emailService.wasCalled()
Test Naming
Test names should describe the behavior, not the implementation. A good test name answers: "What scenario is being tested, and what is the expected outcome?"
Patterns that work across languages:
-
should_return_zero_when_cart_is_empty
-
rejects_negative_quantities
-
applies_discount_for_premium_customers
Avoid names like testCalculate , test1 , or testGetterSetter .
Test Independence and Isolation
Each test must be completely independent of every other test:
-
No shared mutable state between tests
-
No required execution order
-
Each test sets up its own preconditions and cleans up after itself
-
A single failing test should not cascade into other failures
Deterministic Tests
A test must produce the same result every time it runs, regardless of:
-
The current time or date
-
The order of test execution
-
The machine it runs on
-
Network availability
-
Other tests running in parallel
Non-deterministic tests (flaky tests) destroy trust in the test suite and are worse than no tests at all.
FIRST Principles
Principle Meaning
Fast Tests should run in seconds, not minutes. Slow tests don't get run.
Independent No test relies on the output of another test.
Repeatable Same result in any environment — local, CI, staging.
Self-validating Pass or fail with no human interpretation required.
Timely Written at the right time — ideally before or alongside the production code.
Test-Driven Development (TDD)
TDD is a design discipline where tests are written before production code, following a tight feedback loop.
Red-Green-Refactor Cycle
-
Red — Write a failing test that describes the desired behavior
-
Green — Write the simplest production code that makes the test pass
-
Refactor — Improve the code structure while keeping all tests green
Rules:
-
Never write production code without a failing test
-
Write only enough test to fail (compilation failure counts)
-
Write only enough production code to pass the current failing test
Two Schools of TDD
Aspect Chicago (Classical) London (Mockist)
Verification State-based Interaction-based
Direction Inside-out Outside-in
Collaborators Real objects Mocks/stubs
Strength Refactoring-resilient tests Drives interface design
Risk Complex setup for deep graphs Tests coupled to implementation
See TDD Schools reference for detailed comparison and guidance.
When TDD Helps Most
-
Business logic with clear rules and edge cases
-
Algorithm design
-
API contract definition
-
Bug reproduction and fixing (write the failing test first)
When TDD May Not Apply
-
Exploratory prototyping (write tests after you understand the shape)
-
UI layout and styling
-
One-off scripts
Test Doubles
Test doubles replace real dependencies during testing. Each type serves a different purpose.
Double Purpose Verifies?
Dummy Fill parameter lists. Never actually used. No
Stub Provide canned responses to method calls. No
Spy Record interactions for later assertion. Yes (after the fact)
Mock Pre-programmed with expectations. Fails if not called correctly. Yes (inline)
Fake Simplified working implementation (e.g., in-memory repository). No
See Test Doubles reference for detailed guidance on when to use each type.
Key Principle: Mock at Boundaries
Use test doubles at architectural boundaries (ports, external services), not between internal collaborators. Mocking internal classes couples your tests to implementation details and makes refactoring painful.
What to Test / What Not to Test
High Value — Always Test
-
Business rules and domain logic
-
Edge cases, boundary conditions, error paths
-
State transitions and workflows
-
Input validation and sanitization
-
Security-critical paths (authentication, authorization)
-
Data transformations and calculations
Low Value — Usually Skip
-
Trivial getters/setters with no logic
-
Framework-generated code (ORM mappings, routing config)
-
Third-party library internals (test your integration, not their code)
-
Private methods (test through the public API)
-
Logging and telemetry (unless business-critical)
Testing Implementation vs Behavior
Test behavior, not implementation. A good test describes what the system does, not how it does it internally.
Signs you are testing implementation:
-
Test breaks when you refactor without changing behavior
-
Test asserts the order of internal method calls
-
Test verifies private state rather than public output
-
Renaming an internal class breaks tests for unrelated features
Signs you are testing behavior:
-
Test describes a user-meaningful scenario
-
Test remains green after internal refactoring
-
Test asserts on outputs, side effects, or state changes visible through the public API
Testing Strategies by Layer
Different architectural layers call for different testing approaches. See Testing Strategies reference for detailed guidance.
Layer Primary Test Type Key Technique
Domain/Business Logic Unit tests State-based verification, no I/O
Application Services Unit + Integration Test doubles for infrastructure ports
Data Access Integration Real database (test containers, in-memory)
API Endpoints Integration + Contract Request/response validation
UI Components Component tests Interaction simulation
Full System E2E (selective) Critical paths only
Common Antipatterns
Antipattern Symptoms Fix
Brittle tests Tests break on every refactor even when behavior is unchanged Test behavior through public API, not internal structure
Testing implementation Asserting on method call order, private state, internal wiring Assert on outputs and observable side effects
Slow test suite Test suite takes 10+ minutes; developers skip running tests Push tests down the pyramid; use test doubles for I/O
Flaky tests Tests pass/fail randomly without code changes Remove time dependencies, shared state, and ordering assumptions
Excessive mocking More mock setup than actual test logic; tests are unreadable Use real collaborators where possible; mock only at boundaries
Test data coupling Tests share fixtures and break when shared data changes Each test creates its own data; use builders/factories
Missing error paths Only happy path tested; failures discovered in production Explicitly test error cases, edge cases, and boundary conditions
Commented-out tests Failing tests are disabled rather than fixed or deleted Fix the test, or delete it if the behavior changed intentionally
Giant test methods Tests are 50+ lines with multiple acts and asserts Split into focused tests; extract setup into helpers
No assertion Test executes code but never asserts anything Every test must have at least one meaningful assertion
Quality Checklist
Use this checklist when writing or reviewing tests:
-
Behavior-focused: tests describe what the system does, not how
-
Independent: no test depends on another test's execution or state
-
Deterministic: same result every time, on every machine
-
Fast: unit tests in milliseconds, full suite in under 5 minutes
-
Readable: a new team member can understand the test without reading the implementation
-
Arranged clearly: AAA structure with obvious separation of phases
-
Named descriptively: test name explains the scenario and expected outcome
-
Error paths covered: not just happy path — edge cases and failures are tested
-
Minimal setup: no unnecessary dependencies or fixtures; builders/factories where needed
-
No flakiness: no time-dependent, order-dependent, or environment-dependent tests
-
Appropriate level: tested at the lowest pyramid level that provides confidence
-
Doubles at boundaries: mocks/stubs used at architectural ports, not internal classes