local-cal/.ruler/05-TESTING-DOCTRINE.md
Dmytro Stanchiev 206f028fdf init ruler
Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>
2026-04-06 20:48:07 -04:00


Production Testing Doctrine

Project-Agnostic Engineering Standard


1. Purpose of Testing

Testing exists to:

  • Prevent regressions
  • Protect critical business behavior
  • Enforce invariants
  • Guard boundaries
  • Enable safe refactoring
  • Reduce production incidents

Testing does not exist to:

  • Increase coverage numbers
  • Satisfy tooling requirements
  • Mirror implementation line by line
  • Create a false sense of security

If a test does not reduce real-world risk, it should not exist.


2. Core Principles


2.1 Determinism Is Non-Negotiable

A test must:

  • Produce the same result every run
  • Not depend on execution order
  • Not depend on global state
  • Not depend on wall-clock time
  • Not depend on external networks
  • Not depend on randomness (unless seeded)

A flaky test is worse than no test.

If a test fails intermittently:

  • Fix it immediately
  • Or delete it

There is no third option.
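
A minimal sketch of how the clock and randomness rules can be met: time and randomness are passed in rather than read from the environment. The names `is_expired` and `pick_sample` are hypothetical and exist only for illustration.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_expired(expires_at: datetime, now: Callable[[], datetime]) -> bool:
    """Return True once the injected clock reaches the expiry instant."""
    return now() >= expires_at


def pick_sample(items: list, k: int, rng: random.Random) -> list:
    """Sample k items using the caller-supplied random generator."""
    return rng.sample(items, k)


def test_is_expired_uses_injected_clock() -> None:
    # The test owns the clock, so the result never depends on wall-clock time.
    fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    clock = lambda: fixed_now
    assert is_expired(fixed_now - timedelta(seconds=1), clock)
    assert not is_expired(fixed_now + timedelta(seconds=1), clock)


def test_pick_sample_is_repeatable_with_seed() -> None:
    # Two generators with the same seed must produce the same sample.
    items = list(range(100))
    first = pick_sample(items, 5, random.Random(42))
    second = pick_sample(items, 5, random.Random(42))
    assert first == second
```

Production code would pass a real clock and an unseeded generator; only the tests pin them.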


2.2 Isolation of Behavior

Tests should verify behavior in isolation from unrelated systems.

The smaller a test's scope, the faster and more reliable it is.

We separate:

  • Pure logic
  • System interactions
  • External integrations
  • Full-system behavior

Confusing these layers results in slow, fragile suites.


2.3 Risk-Based Testing

Testing effort should scale with risk.

High-risk areas:

  • Financial logic
  • Security and access control
  • Data mutation
  • Distributed coordination
  • Concurrency
  • Migration and transformation logic

Low-risk areas:

  • Static rendering
  • Formatting helpers
  • Simple data mapping

Testing must prioritize business-critical systems.


2.4 Tests Are Part of the System

Tests must follow the same standards as production code:

  • Clean structure
  • Clear naming
  • Maintainable
  • Reviewed in PRs
  • Refactored when necessary

Test code quality reflects engineering quality.


3. Testing Layers (Architecture-Neutral)

These layers apply universally.


3.1 Unit Tests (Logic Layer)

Definition: Tests that validate pure behavior without system dependencies.

Must:

  • Run fast
  • Avoid I/O
  • Avoid network
  • Avoid persistent state
  • Avoid framework bootstrapping

Should test:

  • Business rules
  • Domain invariants
  • Edge cases
  • Validation
  • Transformation logic

Reasoning: If logic cannot be tested without infrastructure, it is coupled too tightly.
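
As an illustration, a business rule written as a pure function can be tested with no I/O, no framework, and no fixtures. `apply_discount` is a hypothetical rule, not part of any real codebase.

```python
from decimal import Decimal


def apply_discount(subtotal: Decimal, percent: Decimal) -> Decimal:
    """Apply a percentage discount and round to whole cents."""
    if subtotal < 0:
        raise ValueError("subtotal must be non-negative")
    if not (Decimal(0) <= percent <= Decimal(100)):
        raise ValueError("percent must be between 0 and 100")
    discounted = subtotal * (Decimal(100) - percent) / Decimal(100)
    return discounted.quantize(Decimal("0.01"))


def test_discount_is_applied() -> None:
    assert apply_discount(Decimal("200.00"), Decimal("15")) == Decimal("170.00")


def test_full_discount_never_goes_negative() -> None:
    # Domain invariant: a price is never below zero.
    assert apply_discount(Decimal("19.99"), Decimal("100")) == Decimal("0.00")


def test_out_of_range_percent_is_rejected() -> None:
    try:
        apply_discount(Decimal("10.00"), Decimal("101"))
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```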


3.2 Integration Tests (System Boundary Layer)

Definition: Tests that validate interactions between internal components.

May include:

  • Datastores
  • Filesystems
  • Queues
  • Caches
  • Framework wiring
  • Service boundaries

Must:

  • Use real internal components
  • Reset state between runs
  • Avoid real external services

Reasoning: Most production bugs occur at boundaries, not in pure functions.
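
A sketch of an integration test against a real datastore, assuming SQLite stands in for the project's database. `EventStore` is a hypothetical component. Each test builds its own in-memory database, so state is reset by construction.

```python
import sqlite3


class EventStore:
    """Hypothetical repository backed by a real SQLite connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
        )

    def add(self, title: str) -> int:
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            cur = self.conn.execute("INSERT INTO events (title) VALUES (?)", (title,))
        return cur.lastrowid

    def titles(self) -> list:
        rows = self.conn.execute("SELECT title FROM events ORDER BY id")
        return [row[0] for row in rows]


def make_store() -> EventStore:
    # A fresh in-memory database per test: nothing leaks between tests.
    return EventStore(sqlite3.connect(":memory:"))


def test_added_events_are_listed_in_order() -> None:
    store = make_store()
    store.add("standup")
    store.add("retro")
    assert store.titles() == ["standup", "retro"]


def test_duplicate_title_is_rejected_by_the_database() -> None:
    store = make_store()
    store.add("standup")
    try:
        store.add("standup")
    except sqlite3.IntegrityError:
        pass
    else:
        raise AssertionError("expected the UNIQUE constraint to fire")
    assert store.titles() == ["standup"]
```

The second test exercises a constraint that lives in the schema, not in Python. A mocked datastore would not catch it.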


3.3 External Integration Tests

Definition: Tests that validate interaction with third-party systems.

Policy:

  • Prefer mocking or simulation
  • Use sandbox environments only when necessary
  • Never depend on live production services

Reasoning: External systems are outside your control and introduce nondeterminism.


3.4 End-to-End Tests (System-Level)

Definition: Tests that validate complete workflows from entry to outcome.

Must:

  • Cover only critical flows
  • Be minimal in number
  • Run in isolated environments
  • Avoid unnecessary duplication of lower-level tests

End-to-end tests are expensive and fragile. Use them surgically.


4. State Management Policy


4.1 No Shared State Between Tests

Every test must assume a blank environment.

Options:

  • Fresh environment per test
  • Transaction rollback
  • Full reset between runs
  • Isolated test containers

No test may depend on side effects from another test.
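
The transaction-rollback option can be sketched as follows, assuming SQLite and Python's default `sqlite3` transaction handling. The helper `rolled_back` and the `users` table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager

# Shared connection with the schema committed once, up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL)")
conn.commit()


@contextmanager
def rolled_back(connection: sqlite3.Connection):
    """Give a test the connection, then discard everything it wrote."""
    try:
        yield connection
    finally:
        connection.rollback()


def count_users(connection: sqlite3.Connection) -> int:
    return connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_insert_is_visible_inside_the_test() -> None:
    with rolled_back(conn) as c:
        c.execute("INSERT INTO users (name) VALUES ('ada')")
        assert count_users(c) == 1


def test_next_test_starts_from_a_blank_table() -> None:
    with rolled_back(conn) as c:
        assert count_users(c) == 0
```

Both tests pass in either order, which is the property the policy demands.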


4.2 Reproducible Environments

Tests must run consistently:

  • Locally
  • In CI
  • In parallel
  • Across operating systems (if supported)

Environment drift is unacceptable.


5. Mocking Policy


5.1 Mock External Systems

Mock:

  • Third-party APIs
  • Payment providers
  • Email systems
  • External storage
  • Network services outside system boundary

Reasoning: You do not control them.
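
One way to mock an external system without touching core logic is a hand-written fake behind a small interface. In this sketch, `PaymentGateway`, `FakeGateway`, and `checkout` are all hypothetical names. The fake replaces only the third-party call, and the business rule in `checkout` runs for real.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PaymentGateway(Protocol):
    """The boundary: the only thing the system knows about the provider."""

    def charge(self, customer_id: str, cents: int) -> str: ...


@dataclass
class FakeGateway:
    """In-memory stand-in that records charges and can simulate an outage."""

    charges: list = field(default_factory=list)
    fail: bool = False

    def charge(self, customer_id: str, cents: int) -> str:
        if self.fail:
            raise ConnectionError("simulated provider outage")
        self.charges.append((customer_id, cents))
        return f"txn-{len(self.charges)}"


def checkout(gateway: PaymentGateway, customer_id: str, cents: int) -> str:
    # Core rule stays unmocked: never charge a non-positive amount.
    if cents <= 0:
        raise ValueError("amount must be positive")
    try:
        gateway.charge(customer_id, cents)
    except ConnectionError:
        return "retry_later"
    return "paid"


def test_successful_charge_is_recorded() -> None:
    gateway = FakeGateway()
    assert checkout(gateway, "cust-1", 1250) == "paid"
    assert gateway.charges == [("cust-1", 1250)]


def test_provider_outage_is_handled() -> None:
    gateway = FakeGateway(fail=True)
    assert checkout(gateway, "cust-1", 1250) == "retry_later"
    assert gateway.charges == []
```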


5.2 Do Not Mock Core Logic

Never mock:

  • Business rules
  • Authorization checks
  • Data validation
  • Domain logic

Mocking internal logic invalidates the test.


5.3 Avoid Over-Mocking

Over-mocking:

  • Couples tests to implementation
  • Breaks refactoring
  • Creates fragile tests

Mock only what crosses system boundaries.


6. Error & Edge Case Policy

Every public interface must have tests for:

  • Valid input
  • Invalid input
  • Unauthorized or restricted access (if applicable)
  • Boundary values
  • Failure paths
  • Concurrency conflicts (if applicable)

Most real-world failures happen outside happy paths.
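
As a small illustration of the checklist above, here is a hypothetical `parse_page_size` function with tests at both boundaries, just outside them, and on malformed input.

```python
def parse_page_size(raw: str, maximum: int = 100) -> int:
    """Parse a page-size parameter, accepting only 1..maximum."""
    try:
        value = int(raw)
    except ValueError as exc:
        raise ValueError(f"page size must be an integer, got {raw!r}") from exc
    if not 1 <= value <= maximum:
        raise ValueError(f"page size must be between 1 and {maximum}")
    return value


def rejects(raw: str) -> bool:
    """Helper: True when the parser refuses the input."""
    try:
        parse_page_size(raw)
    except ValueError:
        return True
    return False


def test_boundary_values() -> None:
    # Exactly on the limits: accepted.
    assert parse_page_size("1") == 1
    assert parse_page_size("100") == 100
    # One step outside the limits: rejected.
    assert rejects("0")
    assert rejects("101")


def test_invalid_input() -> None:
    for raw in ["", "abc", "1.5", "-3"]:
        assert rejects(raw), raw
```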


7. Security Testing Doctrine

All systems must test:

  • Access control enforcement
  • Privilege boundaries
  • Input validation
  • Injection resistance (where applicable)
  • Role escalation prevention

Security-sensitive logic must have near-complete coverage.
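
A sketch of an access-control test written as an explicit decision table. The role names and the `can_delete` rule are assumptions made for illustration. The point is that every denied case is asserted, not only the allowed ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    role: str  # hypothetical roles: "viewer", "editor", "admin"


@dataclass(frozen=True)
class Document:
    owner_id: int


def can_delete(user: User, doc: Document) -> bool:
    """Admins may delete anything; editors only their own documents."""
    if user.role == "admin":
        return True
    return user.role == "editor" and user.id == doc.owner_id


def test_delete_permission_matrix() -> None:
    doc = Document(owner_id=1)
    cases = [
        (User(1, "editor"), True),      # owner with edit rights
        (User(2, "editor"), False),     # editor, but not the owner
        (User(1, "viewer"), False),     # owner without edit rights
        (User(3, "admin"), True),       # admin overrides ownership
        (User(1, "ADMIN"), False),      # case variants do not escalate
        (User(1, "superuser"), False),  # unknown roles are denied by default
    ]
    for user, expected in cases:
        assert can_delete(user, doc) is expected, user
```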


8. Concurrency & Race Conditions

If the system involves:

  • Multi-threading
  • Distributed nodes
  • Async processing
  • Queues
  • Parallel writes

Then tests must include:

  • Concurrent execution scenarios
  • Conflict handling
  • Idempotency verification
  • Retry logic behavior

These bugs rarely appear in simple test cases.
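
A minimal sketch of an idempotency check under concurrent delivery, assuming a hypothetical in-memory `Ledger`. A test like this cannot prove the absence of races, but it does fail reliably if the deduplication guard is removed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Ledger:
    """Hypothetical ledger that applies each request ID at most once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen = set()
        self.balance = 0

    def apply(self, request_id: str, amount: int) -> bool:
        # Check-and-record happens under one lock, so duplicates cannot interleave.
        with self._lock:
            if request_id in self._seen:
                return False
            self._seen.add(request_id)
            self.balance += amount
            return True


def test_duplicate_deliveries_apply_once() -> None:
    ledger = Ledger()
    # Deliver the same request 100 times from 8 worker threads.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda _: ledger.apply("req-1", 50), range(100)))
    assert ledger.balance == 50
    assert results.count(True) == 1


def test_distinct_requests_all_apply() -> None:
    ledger = Ledger()
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(lambda i: ledger.apply(f"req-{i}", 1), range(100)))
    assert ledger.balance == 100
```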


9. Migration & Data Evolution

If the system stores data over time:

  • Schema migrations must be tested
  • Data transformation must be verified
  • Backward compatibility must be validated
  • Downgrade scenarios (if supported) must be considered

Silent data corruption is catastrophic.
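
A sketch of a migration test, assuming SQLite and a hypothetical change that replaces an hours column with minutes. The test seeds data in the old schema, runs the migration, and verifies that every row survived with the right values, including missing ones.

```python
import sqlite3


def create_v1_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, duration_hours INTEGER)"
    )


def migrate_v1_to_v2(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: add duration_minutes derived from duration_hours."""
    conn.execute("ALTER TABLE events ADD COLUMN duration_minutes INTEGER")
    conn.execute("UPDATE events SET duration_minutes = duration_hours * 60")
    conn.commit()


def test_migration_preserves_and_transforms_rows() -> None:
    conn = sqlite3.connect(":memory:")
    create_v1_schema(conn)
    conn.executemany(
        "INSERT INTO events (title, duration_hours) VALUES (?, ?)",
        [("workshop", 2), ("all-day", 8), ("unknown", None)],
    )
    conn.commit()

    migrate_v1_to_v2(conn)

    rows = conn.execute(
        "SELECT title, duration_minutes FROM events ORDER BY id"
    ).fetchall()
    # No rows lost, values converted, missing data stays missing.
    assert rows == [("workshop", 120), ("all-day", 480), ("unknown", None)]
```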


10. CI Enforcement

Tests must run automatically:

  • On every pull request
  • On main branch
  • Before release

CI must:

  • Fail fast
  • Prevent merges on failure
  • Run in clean environments
  • Be reproducible

If tests only run locally, they are not part of the system.


11. Coverage Philosophy

Coverage is a diagnostic tool, not a goal.

Required:

  • High coverage on business-critical modules
  • Full coverage on security boundaries
  • Full coverage on financial logic

Optional:

  • High coverage on trivial UI or formatting

100% coverage does not imply correctness. Low coverage in critical areas is unacceptable.


12. Performance of the Test Suite

The test suite must:

  • Run quickly enough to encourage frequent execution
  • Support parallelization
  • Avoid arbitrary sleeps
  • Avoid unnecessary bootstrapping

Slow tests reduce engineering velocity and discourage use.
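
One common way to avoid arbitrary sleeps is to wait on an explicit completion signal with a deadline. The sketch below uses `threading.Event`, and `start_worker` is a hypothetical background task. The timeout is an upper bound for failure, not a fixed delay, so the test returns as soon as the work is done.

```python
import threading


def start_worker(done: threading.Event, out: list) -> threading.Thread:
    """Hypothetical background task that signals when it has finished."""

    def run() -> None:
        out.append("processed")
        done.set()

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def test_worker_signals_completion() -> None:
    done = threading.Event()
    out = []
    thread = start_worker(done, out)
    # Returns immediately once set; only waits the full 5 s if the worker hangs.
    assert done.wait(timeout=5.0), "worker did not finish within the deadline"
    thread.join()
    assert out == ["processed"]
```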


13. Red Flags (Immediate Rejection)

  • Tests that sometimes fail
  • Tests that depend on execution order
  • Snapshot abuse
  • Arbitrary timeouts to “fix” flakiness
  • Global mutable state
  • Randomized data without seed
  • Testing implementation details instead of behavior
  • Excessive E2E replacing proper layering
  • Mocking core domain logic
  • Tests that assert only truthy values
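
To illustrate the last red flag, this sketch compares a truthy-only assertion with an exact one against a deliberately broken lookup. All names here are hypothetical.

```python
from typing import Optional

USERS = {
    1: {"id": 1, "name": "Ada"},
    2: {"id": 2, "name": "Grace"},
}


def find_user(users: dict, user_id: int) -> Optional[dict]:
    """Correct lookup."""
    return users.get(user_id)


def buggy_find_user(users: dict, user_id: int) -> Optional[dict]:
    """Deliberate bug: ignores user_id and returns the first user."""
    return next(iter(users.values()), None)


def weak_check(lookup) -> bool:
    # Red flag: passes for any non-empty result, right or wrong.
    return bool(lookup(USERS, 2))


def strong_check(lookup) -> bool:
    # Asserts the exact expected value.
    return lookup(USERS, 2) == {"id": 2, "name": "Grace"}


def test_weak_check_misses_the_bug() -> None:
    assert weak_check(buggy_find_user)


def test_strong_check_catches_the_bug() -> None:
    assert strong_check(find_user)
    assert not strong_check(buggy_find_user)
```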


14. Refactoring Policy

Tests must enable refactoring.

If changing internal structure breaks many tests without changing behavior:

  • The tests are coupled incorrectly.

Behavioral contracts should remain stable under refactor.
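
A sketch of a test that survives refactoring: one contract check, applied to two hypothetical implementations of the same behavior. The check knows nothing about how deduplication is done, so swapping the implementation does not break it.

```python
def dedupe_v1(items: list) -> list:
    """Original implementation: explicit loop with a seen-set."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_v2(items: list) -> list:
    """Refactored implementation: relies on dict preserving insertion order."""
    return list(dict.fromkeys(items))


def check_dedupe_contract(dedupe) -> None:
    # Contract: duplicates removed, first-occurrence order preserved.
    assert dedupe([]) == []
    assert dedupe([3, 1, 3, 2, 1]) == [3, 1, 2]
    assert dedupe(["a", "a", "a"]) == ["a"]


def test_contract_holds_before_and_after_refactor() -> None:
    check_dedupe_contract(dedupe_v1)
    check_dedupe_contract(dedupe_v2)
```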


15. Production Observability Complements Testing

Testing does not replace:

  • Logging
  • Monitoring
  • Alerting
  • Metrics
  • Tracing

Tests prevent known failures. Observability detects unknown ones.

Both are required.


16. The Engineering Mindset

Before writing any test, ask:

  1. What failure would hurt the business most?
  2. What invariant must never break?
  3. What boundary is being crossed?
  4. What assumptions are being made?
  5. Can this test fail nondeterministically?
  6. Is this testing behavior or implementation?

If the test does not meaningfully reduce risk, reconsider it.


17. Definition of Production-Grade Testing

A system with production-grade testing:

  • Can be refactored safely
  • Rarely ships regressions
  • Catches security violations before release
  • Detects data integrity failures early
  • Has a stable, trusted CI pipeline
  • Has a fast feedback loop
  • Is boringly reliable

Engineers trust the test suite. They do not ignore it. They do not fear it. They rely on it.

That is the standard.