Test Execution Expectations

This is a testing/completeness support page. Use it from the normal operational path when you need to make test evidence more legible and harder to overclaim.

This doc exists to answer a simple question:

When we execute a test, what exactly are we supposed to look for?

VibeGov already treats tests as proof of delivery claims. This checklist makes the execution side more explicit so test runs become reviewable, comparable, and harder to overclaim.

Core rule

A test run should not just say that something passed. It should say:

  • what claim was under test
  • which scenario classes were exercised
  • what kind of proof was produced
  • what remains unverified, blocked, or deferred
  • what follow-up work was created because of missing completeness

1. Claim under test

State the exact thing the test run is trying to prove:

  • requirement ID
  • acceptance criterion
  • behavior statement
  • contract statement

Weak signal:

  • “ran tests”

Strong signal:

  • “verified REQ-123 persistence behavior after save + reload”
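
In an automated suite, the claim can travel with the test itself. A minimal pytest sketch: the "req" marker is a hypothetical custom marker (register it in the pytest config to avoid warnings), and pytest's tmp_path fixture stands in for real persistence:

  import pytest

  @pytest.mark.req("REQ-123")  # hypothetical custom marker carrying the requirement ID
  def test_req_123_value_survives_save_and_reload(tmp_path):
      """Claim under test: REQ-123, a saved value survives save + reload."""
      path = tmp_path / "value.txt"
      path.write_text("hello")             # "save"
      assert path.read_text() == "hello"   # re-read the durable state, not a success flag

The requirement ID then appears in the test name, the docstring, and the marker, so a run report traces back to REQ-123 without extra narration.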

2. Scenario classes reviewed

Consider these scenario classes where relevant:

  • happy path
  • invalid input
  • empty state
  • loading state
  • error state
  • cancel/back-out path
  • persistence/refresh behavior
  • downstream side effects
  • role/permission variation
  • keyboard/accessibility path

Not every change needs every scenario class, but a meaningful test run should state which classes were:

  • exercised
  • not applicable
  • deferred
  • blocked
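
One way to make the exercised / deferred / blocked split visible in an automated suite is to encode scenario classes as parametrized cases and skip the rest with a stated reason. A pytest sketch; submit_form and its behavior are hypothetical stand-ins:

  import pytest
  from types import SimpleNamespace

  def submit_form(payload):
      # Hypothetical system under test: rejects empty names.
      return SimpleNamespace(accepted=bool(payload.get("name")))

  @pytest.mark.parametrize("scenario, payload, accepted", [
      ("happy path", {"name": "Ada"}, True),
      ("invalid input", {"name": ""}, False),
      pytest.param("empty state", {}, False,
                   marks=pytest.mark.skip(reason="deferred: tracked as follow-up work")),
      pytest.param("role/permission variation", {"name": "Ada"}, True,
                   marks=pytest.mark.skip(reason="blocked: roles service unavailable")),
  ])
  def test_scenario_classes(scenario, payload, accepted):
      assert submit_form(payload).accepted is accepted, scenario

Running pytest -rs then lists the skipped classes with their reasons alongside the passes, so the split is readable straight from the run output.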

3. Evidence type

State what kind of evidence was used:

  • unit
  • integration
  • end-to-end
  • regression
  • manual proof
  • smoke/release verification
  • contract verification

The point is not to maximize ceremony. The point is to make the proof legible.
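
Where the suite runs under pytest, evidence types can be declared as markers so each test states what kind of proof it is. A sketch; the marker name and apply_discount are assumptions, and custom markers need registering in the pytest config to avoid warnings:

  import pytest

  def apply_discount(price, rate):
      # Hypothetical function under test.
      return price * (1 - rate)

  @pytest.mark.unit  # evidence type: unit
  def test_discount_math():
      assert apply_discount(100, 0.1) == 90

A run filtered with pytest -m unit then reports one evidence class at a time, which keeps "what kind of proof was produced" answerable from the run output.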

4. Result classification

Use one of these classifications for each meaningful claim or scenario:

  • Verified
  • Invalidated
  • Blocked
  • Deferred
  • Not applicable

This prevents weak “green enough” reporting.
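
The five labels stay consistent if they live in one place. A minimal sketch of a per-scenario record; the field names are assumptions, not an existing VibeGov schema:

  from dataclasses import dataclass
  from enum import Enum

  class Result(Enum):
      VERIFIED = "verified"
      INVALIDATED = "invalidated"
      BLOCKED = "blocked"
      DEFERRED = "deferred"
      NOT_APPLICABLE = "not applicable"

  @dataclass
  class ScenarioOutcome:
      claim: str      # e.g. "REQ-123 persistence behavior after save + reload"
      scenario: str   # e.g. "role/permission variation"
      result: Result
      note: str = ""  # reason, expected whenever the result is blocked or deferred

  outcome = ScenarioOutcome(
      claim="REQ-123 persistence behavior after save + reload",
      scenario="role/permission variation",
      result=Result.BLOCKED,
      note="roles service unavailable in the test environment",
  )

Anything that cannot honestly be labeled Verified gets a reason instead of disappearing into a green total.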

5. Proof strength

Ask whether the evidence is:

  • Direct proof — clearly proves the claimed behavior
  • Partial proof — proves only part of the claim
  • Surrogate-only proof — proves a proxy, not the actual outcome

Examples of weak surrogate-only proof:

  • success toast shown
  • API returned 200
  • page rendered
  • build passed

Examples of stronger direct proof:

  • persisted state survives reload
  • downstream state/source-of-truth changed correctly
  • role restriction actually enforced
  • failure path behaves as specified
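
The difference is often a single assertion. A minimal sketch; save_note and STORE are hypothetical stand-ins for a real API and its source of truth:

  STORE = {}  # stand-in for the real source of truth

  def save_note(note_id, body):
      STORE[note_id] = body
      return {"status": 200}

  def test_save_surrogate_only():
      resp = save_note("n1", "hello")
      assert resp["status"] == 200       # surrogate: proves the call reported success

  def test_save_direct_proof():
      save_note("n1", "hello")
      assert STORE.get("n1") == "hello"  # direct: re-reads the stored state itself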

6. Persistence and post-action proof

When the work saves, syncs, deletes, or otherwise mutates durable state, testing should look beyond immediate UI feedback.

Check for things like:

  • state after refresh/reload
  • source-of-truth state changed
  • downstream consumer sees the change
  • deleted item is actually gone
  • sync behavior is correct

UI success alone is not enough proof for mutation-heavy work.
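
A sketch of one such post-action check for a delete, assuming a hypothetical file-backed store where constructing the store again simulates a refresh:

  import json
  import os

  class NoteStore:
      """Hypothetical file-backed store; constructing it again simulates a refresh."""
      def __init__(self, path):
          self.path = path
          if os.path.exists(path):
              with open(path) as f:
                  self.notes = json.load(f)
          else:
              self.notes = {}

      def save(self):
          with open(self.path, "w") as f:
              json.dump(self.notes, f)

      def delete(self, note_id):
          self.notes.pop(note_id, None)
          self.save()

  def test_deleted_note_is_gone_after_reload(tmp_path):
      path = str(tmp_path / "notes.json")
      store = NoteStore(path)
      store.notes["n1"] = "hello"
      store.save()
      store.delete("n1")
      reopened = NoteStore(path)         # simulate refresh: re-read durable state
      assert "n1" not in reopened.notes  # the delete reached the source of truth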

7. Residuals and follow-up

A strong test run should end by stating:

  • what remains unverified
  • what was blocked
  • what was deferred
  • what assumptions still exist
  • what follow-up artifact was created

Missing test completeness must become tracked work, not an invisible compromise.
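
Residuals are easiest to keep honest when captured as data at the end of the run rather than as prose in a thread. A minimal sketch; the shape and every entry are invented for illustration:

  residuals = {
      "unverified": ["keyboard/accessibility path"],
      "blocked": [{"item": "role/permission variation",
                   "reason": "roles service unavailable in the test environment"}],
      "deferred": ["sync behavior under an unreliable network"],
      "assumptions": ["single-user data only"],
      "follow_up": [],  # one tracked work item per blocked/deferred entry belongs here
  }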

Fast execution template

Use this template when reviewing a meaningful test run:

  • Claim under test:
  • Requirement / acceptance criterion:
  • Scenario classes exercised:
  • Scenario classes not applicable / deferred / blocked:
  • Evidence type:
  • Result classification:
  • Proof strength:
  • Persistence/post-action checks:
  • Residual risks / unverified items:
  • Follow-up artifact(s) created:
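
For illustration, a hypothetical filled-in run; every detail below is invented except REQ-123, the example requirement ID used earlier:

  • Claim under test: a saved note survives save + reload
  • Requirement / acceptance criterion: REQ-123
  • Scenario classes exercised: happy path, invalid input, persistence/refresh behavior
  • Scenario classes not applicable / deferred / blocked: keyboard path deferred; role/permission variation blocked (roles service unavailable)
  • Evidence type: integration
  • Result classification: Verified, except role/permission variation (Blocked)
  • Proof strength: direct (state re-read after reload)
  • Persistence/post-action checks: reload after save; delete re-checked after refresh
  • Residual risks / unverified items: sync behavior under an unreliable network
  • Follow-up artifact(s) created: one tracked item each for the deferred and blocked classes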