Testing Strategy Selection#
Choosing the right mix of tests determines whether your test suite catches real bugs or just consumes CI minutes. There is no single correct answer – the right strategy depends on your system architecture, team size, deployment cadence, and the cost of production failures.
The Testing Pyramid#
The classic testing pyramid, introduced by Mike Cohn, prescribes many unit tests at the base, fewer integration tests in the middle, and a small number of end-to-end tests at the top.
```
          /    E2E    \        Slow, expensive, brittle
         /-------------\       Few tests
        /  Integration  \      Medium speed, moderate cost
       /-----------------\     Moderate number
      /    Unit Tests     \    Fast, cheap, isolated
     /---------------------\   Many tests
```

The rationale is straightforward. Unit tests are fast (milliseconds), cheap to write, and pinpoint failures precisely. Integration tests verify that components work together but are slower and have more failure modes. E2E tests validate complete user flows but are slow, flaky, and expensive to maintain.
This model works well for systems with clear boundaries between units – libraries, data processing pipelines, algorithmic code, and well-structured backend services.
The Testing Trophy#
Kent C. Dodds proposed the testing trophy as an alternative, particularly for frontend and full-stack applications where integration tests deliver more confidence per test dollar than unit tests.
```
           /    E2E    \
          /-------------\
         /  Integration  \    <-- Most tests here
        /-----------------\
       /    Unit Tests     \
      /---------------------\
     /     Static Types      \
    /-------------------------\
```

The trophy places integration tests as the largest category, with static analysis (TypeScript, ESLint) at the base, unit tests below integration, and E2E tests at the top.
The argument: in a React application, unit testing a component in isolation (mocking all its dependencies) tells you little about whether the feature works. An integration test that renders the component with its real child components, wired to a mock API, catches far more real bugs.
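As a sketch of that style – assuming Jest with React Testing Library (plus @testing-library/jest-dom for the matcher) and a hypothetical UserList component that fetches /api/users on mount and renders its real children:

```tsx
// Trophy-style integration test: real child components, mocked network.
// UserList is a hypothetical component that fetches /api/users on mount.
import { render, screen } from '@testing-library/react';
import { UserList } from './UserList';

test('renders users returned by the API', async () => {
  // Stub only the network boundary; everything below UserList renders for real.
  globalThis.fetch = jest.fn().mockResolvedValue({
    ok: true,
    json: async () => [{ id: 1, name: 'Alice' }],
  }) as unknown as typeof fetch;

  render(<UserList />);

  // findByText waits for the fetch to resolve and the UI to update.
  expect(await screen.findByText('Alice')).toBeInTheDocument();
});
```

The only thing faked here is the network boundary, which is exactly the kind of test the trophy favors.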
When to Use Each Test Type#
Unit Tests#
Write unit tests when the code under test has clear inputs and outputs with minimal external dependencies.
Strong fit:
- Pure functions and algorithms (sorting, parsing, validation, calculations)
- Domain model logic (business rules, state machines)
- Utility libraries and shared modules
- Code with complex branching (many edge cases to cover)
Weak fit:
- Thin wrappers around external services
- UI components that primarily compose other components
- Glue code that just passes data between systems
Practical guidance: A unit test that requires more than two mocks is testing the wrong thing. If you find yourself mocking the database, the HTTP client, and the file system in a single test, you are writing a bad unit test – write an integration test instead.
```python
# Good unit test: clear input, clear output, no dependencies
# (Order and calculate_shipping come from the module under test)
from decimal import Decimal

def test_calculate_shipping_cost():
    order = Order(weight_kg=2.5, destination="US", items=3)
    cost = calculate_shipping(order)
    assert cost == Decimal("12.50")


# Bad unit test: mocking everything defeats the purpose
def test_process_order(mock_db, mock_payment, mock_shipping, mock_email):
    # This is an integration test wearing unit test clothing
    ...
```

Integration Tests#
Write integration tests when you need to verify that components work together correctly.
Strong fit:
- API endpoint handlers (HTTP request through to response)
- Database query logic (use a real database, not a mock)
- Service-to-service communication paths
- Message queue consumers and producers
- Frontend components interacting with APIs
Weak fit:
- Pure algorithms (unit tests are faster and more precise)
- Full user journeys spanning many services (use E2E)
Practical guidance: Testcontainers is the most widely used approach for integration tests that need real dependencies. Spin up a PostgreSQL container, run your migrations, execute your tests, tear it down. This is slower than a mock but catches real bugs: incorrect SQL, missing indexes, constraint violations.
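In a Node/TypeScript project that lifecycle might look like the following sketch (assuming Jest and the @testcontainers/postgresql package; runMigrations is a hypothetical helper). The Go example below wraps the same idea in a setupTestDB helper.

```ts
// Sketch: one real PostgreSQL container per test file.
// runMigrations is a hypothetical stand-in for your migration tool.
import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';

let container: StartedPostgreSqlContainer;

beforeAll(async () => {
  container = await new PostgreSqlContainer('postgres:16').start();
  await runMigrations(container.getConnectionUri());
}, 60_000); // the first run pulls the image, so allow a generous timeout

afterAll(async () => {
  await container.stop();
});
```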
```go
import (
    "net/http"
    "net/http/httptest"
    "strings"
    "testing"

    "github.com/stretchr/testify/require"
)

// Integration test: real HTTP server, real database
func TestCreateUser(t *testing.T) {
    db := setupTestDB(t) // real PostgreSQL via testcontainers
    srv := httptest.NewServer(NewRouter(db))
    defer srv.Close()

    resp, err := http.Post(srv.URL+"/users",
        "application/json",
        strings.NewReader(`{"name":"Alice","email":"alice@example.com"}`))
    require.NoError(t, err)
    defer resp.Body.Close()
    require.Equal(t, 201, resp.StatusCode)

    // Verify it actually landed in the database
    var count int
    err = db.QueryRow("SELECT count(*) FROM users WHERE email=$1",
        "alice@example.com").Scan(&count)
    require.NoError(t, err)
    require.Equal(t, 1, count)
}
```

End-to-End Tests#
Write E2E tests for critical user journeys that generate revenue or where failure causes significant damage.
Strong fit:
- Signup and login flows
- Payment and checkout processes
- Core business workflows (the 3-5 most important paths)
- Smoke tests after deployment
Weak fit:
- Edge cases and error paths (use unit/integration tests)
- Exhaustive feature coverage (too slow, too brittle)
- Anything that can be validated at a lower level
Practical guidance: Keep E2E test count low. A healthy ratio is 5-20 E2E tests covering your most critical paths. If you have 500 E2E tests, your CI pipeline takes an hour and your team stops trusting the tests when they fail. For new projects, prefer Playwright over Cypress – it covers more browser engines (including WebKit), tends to run faster, and has strong debugging tools.
```ts
// E2E test: the real signup flow
import { test, expect } from '@playwright/test';

test('new user can sign up and reach dashboard', async ({ page }) => {
  await page.goto('/signup');
  await page.fill('[name="email"]', 'test@example.com');
  await page.fill('[name="password"]', 'SecurePass123!');
  await page.click('button[type="submit"]');

  // Verify redirect to dashboard
  await expect(page).toHaveURL('/dashboard');
  await expect(page.locator('h1')).toContainText('Welcome');
});
```

Contract Testing#
Contract testing verifies that services agree on their API contracts without running full integration tests across services.
When to use contract testing:
- Multiple teams own different services that communicate via HTTP or messaging
- You cannot spin up the entire system for testing (too many services)
- Service APIs change frequently and you need early breakage detection
- You have a consumer-driven development workflow
When to skip it:
- Single team owns all services
- Services communicate only through a well-defined, versioned API (REST with OpenAPI spec)
- Fewer than 5 services total
Pact is the most widely used tool. The consumer writes a contract (expected request and response), the provider verifies it. Contracts are stored in a Pact Broker.
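The consumer-side interaction definition is shown below. On the provider side, verification is usually a short test that replays the stored contracts against the running service, roughly like this sketch using pact-js (the URLs, names, and versions are placeholders):

```ts
// Provider side: replay contracts from the broker against the real service.
// All values here are placeholders for illustration.
import { Verifier } from '@pact-foundation/pact';

describe('Pact verification', () => {
  it('honours all consumer contracts', () => {
    return new Verifier({
      provider: 'UserService',
      providerBaseUrl: 'http://localhost:8080',    // service under test
      pactBrokerUrl: 'https://broker.example.com', // where contracts are stored
      providerVersion: process.env.GIT_SHA ?? 'dev',
      publishVerificationResult: true,             // report results to the broker
    }).verifyProvider();
  });
});
```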
```ts
// Consumer side: define what you expect from the provider
import { Matchers } from '@pact-foundation/pact';
const { like, term } = Matchers;

const interaction = {
  state: 'a user with id 1 exists',
  uponReceiving: 'a request for user 1',
  withRequest: {
    method: 'GET',
    path: '/users/1',
  },
  willRespondWith: {
    status: 200,
    body: {
      id: 1,
      name: like('Alice'), // type matching, not exact value
      email: term({ matcher: '.*@.*', generate: 'alice@example.com' }),
    },
  },
};
```

Property-Based Testing#
Property-based testing generates random inputs and verifies that properties hold across all of them. Instead of writing specific test cases, you define invariants.
When to use property-based testing:
- Serialization/deserialization (encode then decode equals original)
- Parsers (all valid inputs parse without error, parse then serialize equals normalized input)
- Mathematical properties (commutativity, associativity, idempotency)
- Data transformations (sort is idempotent, filter preserves order)
When to skip it:
- Simple CRUD operations
- UI testing
- Code with many external side effects
```python
import json

from hypothesis import given
from hypothesis.strategies import text, integers


@given(text())
def test_json_roundtrip(s):
    """Encoding then decoding produces the original value."""
    assert json.loads(json.dumps(s)) == s


@given(integers(), integers())
def test_addition_commutative(a, b):
    assert add(a, b) == add(b, a)
```

Snapshot Testing#
Snapshot testing captures the output of a function or component and compares it against a stored snapshot. Changes to the output require explicit snapshot updates.
When to use snapshot testing:
- UI component rendering (HTML or component tree output)
- API response format stability
- Configuration generation (Terraform plans, Kubernetes manifests)
- Serialized data structures
When to avoid it:
- Frequently changing outputs (you will just keep updating snapshots without reviewing them)
- Large outputs where diffs are hard to review
- Non-deterministic outputs (timestamps, random IDs)
The danger of snapshot testing is snapshot fatigue: developers blindly update snapshots without reviewing changes. Mitigate this by keeping snapshots small and reviewing snapshot diffs in code review.
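A minimal sketch with Jest's toMatchSnapshot, assuming a hypothetical generateManifest function with deterministic output:

```ts
// Snapshot sketch: generateManifest is a hypothetical helper that
// deterministically renders a Kubernetes manifest from a config object.
import { generateManifest } from './manifest';

test('manifest output stays stable', () => {
  const manifest = generateManifest({ app: 'api', replicas: 3 });
  // The first run writes the snapshot file; later runs diff against it,
  // and any change fails until the snapshot is explicitly updated.
  expect(manifest).toMatchSnapshot();
});
```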
Decision Matrix by System Type#
| System Type | Primary Tests | Secondary Tests | Tertiary Tests |
|---|---|---|---|
| Library/SDK | Unit (80%) | Property-based (15%) | Integration (5%) |
| REST API service | Integration (60%) | Unit (30%) | E2E (10%) |
| Frontend SPA | Integration (50%) | Unit (20%) | E2E (20%), Snapshot (10%) |
| Data pipeline | Integration (50%) | Unit (30%) | Property-based (20%) |
| Microservice mesh | Contract (30%) | Integration (30%) | Unit (25%), E2E (15%) |
| CLI tool | Integration (50%) | Unit (40%) | Snapshot (10%) |
Practical Guidelines for Agents#
When an agent is setting up testing for a new project or evaluating an existing test suite, follow this decision process:
1. Identify the system type from the matrix above. This sets the default test distribution.
2. Identify the critical paths – the 3-5 workflows where a production bug causes the most damage. These get E2E tests.
3. Identify the complex logic – algorithms, business rules, validation. These get unit tests.
4. Everything else gets integration tests – API handlers, database queries, service interactions.
5. Add contract tests only when multiple teams are involved and service boundaries are crossed.
6. Add property-based tests only for code with mathematical properties or serialization logic.
For CI pipeline configuration, run unit tests first (fastest feedback), integration tests second, and E2E tests last (or in a separate pipeline stage that does not block merges).
```yaml
# Example CI pipeline ordering
stages:
  - lint-and-typecheck   # seconds
  - unit-tests           # seconds to low minutes
  - integration-tests    # minutes
  - e2e-tests            # minutes, can run in parallel
  - deploy-staging
  - smoke-tests          # post-deploy E2E on staging
```

A test suite that runs in under 10 minutes keeps developers in flow. If it exceeds 10 minutes, split it: run fast tests on every push, run slow tests on merge to main.