5 posts tagged with "execution"

From Vibe Coding to Governed Delivery

May 7, 2026 · 10 min read

Governance Foundation

AI coding agents are getting good enough that the old question, "Can they write code?", is becoming less interesting.

The harder question is whether they can participate in a real delivery system without turning the repo into a mess.

Once agents can read issues, modify files, run tests, create branches, and merge work, the risk changes. The problem is no longer capability. The problem is control.

More agents do not automatically create more delivery. Without an operating model, they create duplicated work, unclear ownership, long-lived branches, hidden feature flags, broken integration, and a growing gap between what the system appears to be doing and what is actually safe to ship.

That is the problem VibeGov is designed to address.

The mistake is treating agents like clever freelancers

A repo does not need a crowd of clever freelancers.

It needs a governed delivery system.

In many AI-assisted workflows, each agent is given a task, a prompt, and access to the repo. That can work for a small change. It does not scale into reliable delivery.

The moment multiple agents are involved, the system needs answers to basic governance questions:

Who decides what the issue means?
Who decides whether the issue is ready to build?
Who owns the architecture boundary?
Who owns delivery into the integration branch?
Who owns the user experience and design-system contract?
Who verifies the outcome independently?
Who watches for stale work, broken state, and follow-through?
Who is allowed to block unsafe change?

If those answers are not explicit, agents will fill the gaps with assumptions.

And assumptions are where delivery drift begins.

Prompts are not governance

Agent instructions matter, but prompts alone are not enough.

A prompt can say:

Do not expand scope.

But the delivery system still needs a place where scope is defined, reviewed, and enforced.

A prompt can say:

Keep the repo clean.

But the workflow still needs branch rules, validation gates, issue evidence, and a clear definition of done.

A prompt can say:

Follow the architecture.

But the project still needs someone or something accountable for defining that architecture, maintaining ADRs, and deciding when a change crosses a boundary.

VibeGov starts from a simple assumption:

Agents should be autonomous inside clear boundaries, not free outside accountability.

The issue is the work contract

In AI-assisted delivery, the issue becomes more important, not less.

A weak issue gives the agent room to guess. A strong issue gives the agent a contract to execute.

That contract should define:

the intended outcome
why it matters
scope and non-goals
OpenSpec binding or SPEC_GAP
acceptance criteria
verification expectations
risk level
any required research, exploration, design, security, or architecture input

This is why a one-line issue should not move straight into development.

Fast capture is fine. Fast execution from unclear intent is not.

The work can start as:

Fix login weirdness.

But it should not reach implementation until the issue explains what is weird, what correct behaviour looks like, how it binds to the spec, and how the result will be verified.

Intake can be loose. Execution should not be.
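
As a rough illustration, the contract fields listed above could be modelled as a small record with a readiness check. This is a minimal sketch, not a VibeGov schema: the class name, field names, and the rule that any non-empty value counts as "filled in" are all assumptions made for the example. The conditional research, design, security, or architecture input is left out because it is only required for some issues.

```python
from dataclasses import dataclass, field


@dataclass
class IssueContract:
    """Hypothetical record of the fields a build-ready issue should carry."""
    outcome: str = ""
    why_it_matters: str = ""
    scope: str = ""
    non_goals: str = ""
    spec_binding: str = ""  # an OpenSpec ID/path, or the literal "SPEC_GAP"
    acceptance_criteria: list[str] = field(default_factory=list)
    verification: str = ""
    risk_level: str = ""


def missing_fields(issue: IssueContract) -> list[str]:
    """Return the names of contract fields that are still empty."""
    return [name for name, value in vars(issue).items() if not value]


def is_build_ready(issue: IssueContract) -> bool:
    """An issue is treated as build-ready only when nothing is missing."""
    return not missing_fields(issue)


# A fast-capture issue: fine for intake, not for implementation.
raw = IssueContract(outcome="Fix login weirdness")
print(is_build_ready(raw), missing_fields(raw))
```

A check like this only proves the fields exist. Whether the acceptance criteria are any good is still a judgment made by whoever owns readiness.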

The board is the operating system

The project board is not just a reporting tool. It is the operational state machine.

A simple board is enough:

No status
Backlog
Ready
In Progress - In Dev
In Review - In Test
Done
Blocked
Parking Lot

The important part is not the labels. It is what they mean.

Ready means the issue is buildable and releasable.

In Progress - In Dev means the Developer agent is actively delivering it.

In Review - In Test means the change is being validated through automation, verifier activity, or release confidence checks.

Done means the work has landed cleanly and the integration branch is healthy.

Blocked means progress needs an explicit unblocker, not silent waiting.

Parking Lot means the idea is acknowledged but intentionally outside the current path.

This gives agents a shared operating surface. They do not need to invent side queues, hidden TODOs, or chat-based promises.

The board is where state lives.
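
To make the "state machine" framing concrete, here is a minimal sketch of the board as an explicit transition table. The status names come from the list above. The allowed transitions are an assumption for illustration only, since the post defines what each status means but not which moves are legal.

```python
from enum import Enum


class Status(Enum):
    NO_STATUS = "No status"
    BACKLOG = "Backlog"
    READY = "Ready"
    IN_DEV = "In Progress - In Dev"
    IN_TEST = "In Review - In Test"
    DONE = "Done"
    BLOCKED = "Blocked"
    PARKING_LOT = "Parking Lot"


# Assumed transition table: adjust to your own board policy.
ALLOWED = {
    Status.NO_STATUS: {Status.BACKLOG, Status.PARKING_LOT},
    Status.BACKLOG: {Status.READY, Status.PARKING_LOT, Status.BLOCKED},
    Status.READY: {Status.IN_DEV, Status.BACKLOG, Status.BLOCKED},
    Status.IN_DEV: {Status.IN_TEST, Status.READY, Status.BLOCKED},
    Status.IN_TEST: {Status.DONE, Status.IN_DEV, Status.BLOCKED},
    Status.BLOCKED: {Status.BACKLOG, Status.READY, Status.IN_DEV, Status.IN_TEST},
    Status.PARKING_LOT: {Status.BACKLOG},
    Status.DONE: set(),
}


def move(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the board does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


status = move(Status.READY, Status.IN_DEV)  # a Developer agent picks up work
```

With a table like this, an agent that tries to jump an issue from Backlog straight to Done gets an error instead of a quiet state change.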

Ready means releasable

One of the most important rules in an agent delivery system is this:

Ready means releasable.

An issue should not enter Ready unless the work can safely land on the integration branch and move toward release.

That does not mean every issue must deliver a large user-facing feature. It means the increment should be coherent, integrated, and safe.

Bad ready work looks like:

build half a feature and hide it
create a parallel implementation path
start a migration with no cutover plan
add a feature toggle with no owner or removal condition
implement speculative code for a future product decision

Good ready work looks like:

deliver a complete behaviour change
add a tested internal capability with a clear future use
implement a paid feature as an explicit entitlement
add an operational toggle with defined enabled and disabled behaviour
create a migration step that leaves the system stable

Agents move quickly. That makes issue slicing more important.

If the work is not safe to land, it is not ready for Dev.

Done means green integration state

Code written is not done.

Tests passing locally is not done.

A branch that looks good is not done.

Done means the work has made it to the integration branch and that integration state is still green.

This matters because agent delivery can create a false sense of progress. The agent can produce code, explain the change, and sound confident. But until the work is integrated, validated, and traceable to the issue, it has not improved the product.

The Developer agent should own the path from ready issue to green integration state:

start from a clean integration branch
implement the issue
update tests, docs, and config where required
validate locally
refresh from the current integration branch
integrate the change according to repo policy
watch automation
fix immediately if the pipeline fails
close the issue only when evidence is complete

This is not bureaucracy. It is delivery closure.

No wild forks

Branches are useful as temporary implementation workspaces.

They are not product states.

Long-lived branches, hidden futures, and parallel product lines create exactly the kind of ambiguity AI delivery should avoid.

The rule should be blunt:

All development must converge.

If a feature is worth building, it should be shaped into a releasable increment. If it is not ready to be released, it should remain in Backlog, Parking Lot, research, design, or architecture analysis.

Do not let the repo become a museum of abandoned futures.

Feature toggles are configuration, not hiding places

Feature toggles are not bad.

Undisciplined toggles are bad.

A feature toggle should be an explicit product, operational, or release control. It should not be a way to merge unfinished code and decide later what it means.

Good toggle use includes:

paid feature entitlement
tenant or customer-specific enablement
environment-specific behaviour
staged rollout
operational kill switch
time-bound experiment

For every toggle, define:

name
purpose
owner
configuration location
default state
enabled behaviour
disabled behaviour
tests for both states
removal condition if temporary

The key rule is simple:

No feature should require code edits to enable after development.

If a feature is optional, paid, staged, or tenant-specific, build it that way from the start.

Toggles are configuration and product controls, not hiding places for incomplete work.
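
One way to keep the per-toggle checklist honest is to record it as data and lint it. The sketch below assumes a hypothetical `ToggleDefinition` record. The field names, the example toggle, and the file paths in it are illustrative, not an existing VibeGov or feature-flag library API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToggleDefinition:
    """Hypothetical record mirroring the per-toggle checklist."""
    name: str
    purpose: str
    owner: str
    config_location: str
    default_enabled: bool
    enabled_behaviour: str
    disabled_behaviour: str
    enabled_test: str   # identifier of the test covering the enabled state
    disabled_test: str  # identifier of the test covering the disabled state
    temporary: bool = False
    removal_condition: Optional[str] = None


def toggle_problems(toggle: ToggleDefinition) -> list[str]:
    """List checklist items the toggle definition fails to satisfy."""
    required = (
        "name", "purpose", "owner", "config_location",
        "enabled_behaviour", "disabled_behaviour",
        "enabled_test", "disabled_test",
    )
    problems = [f"missing {f}" for f in required if not getattr(toggle, f).strip()]
    if toggle.temporary and not toggle.removal_condition:
        problems.append("temporary toggle has no removal condition")
    return problems


# Illustrative operational kill switch; every value here is made up.
kill_switch = ToggleDefinition(
    name="exports.kill_switch",
    purpose="Operational kill switch for report exports",
    owner="operator",
    config_location="config/toggles.yaml",
    default_enabled=True,
    enabled_behaviour="Exports run normally",
    disabled_behaviour="Export requests return a maintenance message",
    enabled_test="test_export_enabled",
    disabled_test="test_export_disabled",
)
print(toggle_problems(kill_switch))
```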

Separate roles are useful when they create real control

The goal is not to create an agent circus.

Separate roles are useful when they create clearer accountability.

A practical operating model can include:

planner for intake, prioritisation, backlog hygiene, and developer handoff
architect for system design, ADRs, boundaries, migrations, developer-experience architecture, and technical direction
designer for UI/UX intent, Design Language System stewardship, user flows, component states, and accessibility-by-design
developer for issue execution, coding, testing, git hygiene, and integration
researcher for external evidence gathering, source evaluation, and cited synthesis
explorer for repo, UI, and API exploration, evidence capture, finding triage, and spec gaps
verifier for independent QA, regression checks, acceptance evidence, and release confidence
security for threat modelling, secrets, auth, privacy, dependency, licensing, and exposure review
documenter for READMEs, install guides, changelogs, user docs, and public comms
maintainer for repo hygiene, branch closure, changelogs, versioning, and release readiness
operator for recurring sweeps, task/state orchestration, reminders, and follow-through

Not every issue should pass through every role.

That would kill delivery speed.

Instead, route work by need.

Researcher and Explorer feed evidence. Designer shapes experience intent. Security blocks unsafe change. Architect protects direction. Planner protects readiness. Developer ships. Verifier proves. Documenter keeps the written surface aligned. Maintainer keeps release and repo hygiene clean. Operator keeps the system moving.

The model is not many agents doing whatever they want.

It is governed autonomy.

Specialists should feed the spec, not bypass it

A clean pattern is:

Raw idea
 ↓
Planner triage
 ↓
Research / exploration / design / security input as needed
 ↓
Architect or Planner creates the build-ready issue
 ↓
Developer delivers
 ↓
Automation and Verifier validate
 ↓
Integration remains green

Specialist work can happen independently of code. A Researcher can answer a question. An Explorer can inspect the repo. A Designer can define the user flow. Security can identify controls.

But those outputs should flow back into the issue or OpenSpec before development starts.

Research and design should not bypass the accountable delivery contract.

Automation proves mechanics; governance preserves meaning

Automation is essential, but it cannot do the whole job.

Automation can prove:

tests pass
build succeeds
lint and type checks pass
secrets are not detected
dependency checks are clean
pipeline triggered
artifact was produced

But automation cannot fully decide:

whether the issue meant the right thing
whether the architecture direction is sound
whether the user experience is coherent
whether the trade-off is acceptable
whether the feature should exist
whether scope was silently expanded
whether the disabled state of a paid feature makes product sense

That is why governance still matters.

Automation is the proof layer. It does not replace accountability.

The real unlock is governed autonomy

The next phase of AI software delivery will not be won by giving agents unlimited freedom.

It will be won by teams that can give agents enough autonomy to move fast and enough governance to keep the system coherent.

That means:

issues are treated as execution contracts
OpenSpec captures requirement truth
the project board carries operational state
the integration branch remains the integration truth
the release branch remains release truth
agents act within role authority
automation validates the mechanics
security and verification provide independent confidence
operators keep the loop moving

Vibe coding showed how quickly software can be produced when humans and AI work fluidly together.

The next step is making that flow reliable enough for serious delivery.

That is the shift from vibe coding to governed delivery.

Execution Sharpness and Governed Closure

April 24, 2026 · 3 min read

VibeGov Team

Governance Foundation

A lot of agent systems now know how to move fast.

That part is getting easier.

The harder problem is keeping fast execution legible, governable, and closable.

The real upgrade teams need

The next upgrade is not more agent theater. It is not longer plans. It is not status spam.

It is a tighter operating shape:

direct execution on bounded work,
verification before completion claims,
concise checkpoints at meaningful state changes,
explicit handling of inherited state,
and closure that reaches the governed landing path.

That is what dependable execution looks like.

What strong execution should feel like

A healthy implementation loop should feel crisp.

When the task is clear, the agent should:

gather the needed context,
make the change,
run the right proof,
close the state honestly,
and stop pretending that "edited files" means finished work.

That is the productive part of high-agency execution.

What goes wrong when speed loses governance

Fast execution becomes dangerous when teams let it collapse into black-box momentum.

Common failure modes look like this:

inherited repo mess ignored in the name of progress,
silence mistaken for professionalism,
passing build output treated as completion,
risky decisions taken without visible boundary,
and residue pushed into the next work unit.

These are not small style issues. They are reliability problems.

The operating rule VibeGov should encode

The useful rule is simple:

Keep execution sharp, but make closure and legibility non-negotiable.

That means:

tool-first execution,
bounded work units,
truthful verifier and evaluator gates,
concise operator-visible checkpoints,
explicit inherited-state assessment,
and governed git/repo closure.

Legibility is not the same as chatter

Teams often get stuck between two bad options:

constant narration, or
total silence.

The better target is interrupt-efficient legibility.

Operators should be able to see:

when a slice started or resumed,
when the plan materially changed,
when a blocker or decision boundary appeared,
what validation actually passed or failed,
and how the slice closed.

That is enough for oversight without drowning the channel.

Closure is part of the work

A slice is not complete when the code exists.

A slice is complete when the governed path is closed:

issue/spec state is updated where required,
evidence exists,
git state is accounted for,
the merge or follow-up path is explicit,
and the repo returns to its expected base state.

If that part is missing, the execution loop is still open.

Practical takeaway

The goal is not to make agents slower.

The goal is to make fast execution dependable.

A strong system should feel like this:

less ceremony,
less ambiguity,
less hidden residue,
more direct proof,
more reliable closure.

That is what VibeGov should normalize.

Handling One-Liner Issues Without Losing Delivery Speed

March 10, 2026 · 2 min read

VibeGov Team

Governance Foundation

One-liner issues are common in fast-moving teams.

They are useful for capturing intent quickly, but dangerous if treated as execution-ready work.

A one-liner like:

"Fix login weirdness"

is not enough to implement safely.

The problem with one-liners

If one-liners go straight into implementation, teams usually get:

mismatched outcomes (different people infer different intent)
poor traceability (no spec binding)
low-quality verification (unclear acceptance)
rework and issue churn

In short: speed at intake, chaos at execution.

The VibeGov approach

Keep one-liners for capture speed, but require intake hardening before execution.

Rule

A one-liner issue must not move directly to implementation.

Before execution, convert it into implementation-ready intent by:

Binding it to existing OpenSpec requirement IDs, or creating/expanding spec coverage when it is missing (SPEC_GAP -> requirement), and
Upgrading the issue body to implementation-grade quality.

Only then does it enter active implementation.

Practical hardening checklist

For each one-liner, add:

clear outcome (what success looks like)
why it matters
in scope / out of scope
OpenSpec binding (ID/path or SPEC_GAP)
acceptance criteria
verification expectations

This preserves speed while restoring delivery clarity.

Why this works

intake stays fast (capture now, clarify before build)
implementation gets deterministic requirements
spec and backlog stay aligned
evidence quality improves
rework drops over time

Recommended workflow pattern

Use two backlog states:

Intake/Triage
- one-liners allowed
- not execution-ready
Ready for Execution
- hardened issue body
- spec-bound
- acceptance + verification defined

This simple split prevents governance bypass while keeping momentum.

Bottom line

One-liner issues are good for capture, not for execution.

Treat them as raw intake, harden them through spec binding and issue-quality upgrades, then build with confidence.

Getting Other Agents to Help Your Project (Without Losing Quality)

March 4, 2026 · 4 min read

VibeGov Team

Governance Foundation

A pattern that works well in real project delivery is splitting responsibilities across agents with clear contracts.

In current VibeGov terms, this is really a coordinated Development + Exploration operating model:

the builder primarily runs in Development mode
the validator primarily runs in Exploration mode
release verification stays inside the Development delivery path as a shipping gate

The pattern

Use two independent lanes:

Builder lane (shipping agent)
- implements features/fixes
- runs tests
- produces commits/artifacts
Validator lane (independent QA/spec agent)
- behaves like a normal user
- opens the app in browser and clicks real flows
- checks every clickable action (plus keyboard paths)
- compares behavior against OpenSpec/contracts
- creates focused backlog issues for each mismatch

This is exactly the setup where one agent is busy building and another agent/device is continuously validating outcomes against real UI behavior.

Why this works

Separation of concerns: builder optimizes for delivery, validator optimizes for correctness.
Reduced bias: independent validation catches assumptions the builder misses.
Faster backlog hardening: defects become concrete, reproducible issues quickly.
Spec quality improves: uncovered behaviors force explicit requirement IDs and test mappings.

Operating contract

For each discovered gap, enforce:

Issue
Spec update (append-only IDs)
Validation evidence
Commit linked to issue

No “done” without runnable proof.

Recommended cadence

Builder runs continuously through priority backlog.
Validator runs on a fixed schedule (for example, every 45–60 minutes) and after major merges.
Release-aware checks can skip full reruns if build/version hasn’t changed.

Minimum evidence bundle per validation cycle

audited screens list
action inventory (every clickable)
pass/fail per action
keyboard traversal evidence (Tab, Shift+Tab, Enter, Space)
persistence/mutation verification where actions claim to save, delete, sync, import, or reconfigure
issue files for failures with expected vs actual
spec coverage reconciliation notes
explicit completeness status for the validation scope

Required issue fields (for validator-created backlog items)

When the validator opens an issue, include these fields every time:

Screen/route: exact URL/route where failure occurred
Control type: button/link/icon/menu item/form field/dialog action
Expected intent: what should happen (route/state/data/error)
Actual result: what happened instead
Repro steps: shortest deterministic path
Evidence links: screenshot/video/report path
Spec link/ID: existing requirement ID or SPEC_GAP
Suggested fix path: likely file/module owner

This keeps backlog items implementation-ready and eliminates “cannot reproduce” churn.

UI layering checks you should always include

Agents often miss visual-layer defects that humans catch immediately. Make these first-class checks:

Dialog visibility: modal/drawer appears when triggered and remains visible while active
Focus trap: keyboard focus stays in dialog while open
Backdrop behavior: backdrop blocks underlying clicks while modal is active
Z-index correctness: dialogs/toasts/menus are not hidden behind headers/sidebars/dev overlays
Escape/Close behavior: Esc, close icon, and Cancel all behave consistently

If any layering issue is found, file a dedicated issue (don’t bury it under generic “UI bug”).

CI handoff pattern (dev bot → validator bot)

A robust release handoff for bot teams, with release verification treated as part of Development:

Dev bot pushes issue-linked commit.
Dev bot monitors pipeline trigger for up to 30 seconds.
- Poll CI by commit SHA every ~5s.
If CI run appears:
- post run URL + SHA in issue evidence comment,
- hand off to validator bot.
If CI run fails early:
- update same issue with failing job/step/log snippet,
- fix immediately on the same ticket,
- commit/push again with same issue prefix.
If no CI run appears within 30s:
- create/update P0 CI-trigger blocker issue,
- stop downstream handoff until trigger is restored.

This prevents false “done” states where code is pushed but release validation never actually started.
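
The trigger-watch step can be sketched as a small polling loop. This is a minimal illustration under stated assumptions: `find_run` is a hypothetical callable you would back with your CI provider's API, returning a run URL for a commit SHA or `None` if no run exists yet. The 30-second window and 5-second interval follow the steps above. The "CI run fails early" branch is left out.

```python
import time
from typing import Callable, Optional


def wait_for_ci_run(
    commit_sha: str,
    find_run: Callable[[str], Optional[str]],
    timeout_s: float = 30.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll for a CI run on commit_sha; return its URL, or None on timeout."""
    deadline = clock() + timeout_s
    while True:
        run_url = find_run(commit_sha)
        if run_url is not None:
            return run_url
        if clock() + interval_s > deadline:
            return None  # another wait would overshoot the window
        sleep(interval_s)


def handoff_action(commit_sha: str, run_url: Optional[str]) -> str:
    """Describe the next step the dev bot should take after the watch."""
    if run_url is None:
        return "create/update P0 CI-trigger blocker issue; stop downstream handoff"
    return f"post {run_url} + {commit_sha} in issue evidence; hand off to validator bot"
```

Passing `sleep` and `clock` as parameters is a design choice for testability: the loop can be exercised with a fake clock instead of waiting 30 real seconds.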

Practical tips

Keep one issue per failed behavior.
Keep commits scoped to one issue whenever possible.
Track unresolved blockers publicly in backlog (don’t hide them in chat).
Treat spec drift as a first-class defect.
For release workflows, always include commit SHA + CI run URL in handoff comments.

If you run this loop consistently, backlog quality improves while velocity stays high, because Development and Exploration happen in parallel, not serially.

Communication Rules That Make AI Agent Execution Clearer

February 25, 2026 · 2 min read

VibeGov Team

Governance Foundation

Most AI delivery teams don’t fail from lack of output. They fail from unclear status, hidden blockers, and weak handoffs.

GOV-03 is the communication layer that turns agent activity into decision-grade visibility.

The real problem

Without communication rules, teams get:

"working on it" updates with no evidence
"done" claims with no verification context
blocker messages with no owner or next step
handoffs that lose scope and intent

That creates management noise, not delivery clarity.

What GOV-03 changes

GOV-03 makes every update actionable.

A useful execution update should answer:

What changed?
What proof exists?
What is blocked (if anything)?
What happens next?

This is the minimum needed for reliable human oversight and multi-agent continuity.

Why this matters commercially

Clear communication rules improve:

throughput predictability
confidence in delivery reporting
escalation speed when risk appears
onboarding speed for new contributors

In short: better communication quality directly improves delivery quality.

Practical rollout in one day

standardize one checkpoint update format
require evidence links for completion claims
require explicit blocker owner + next action
reject vague status updates

Small discipline, big clarity gain.
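
A checkpoint format along these lines can be enforced mechanically. The sketch below is an assumption-heavy illustration, not the GOV-03 format itself: the `CheckpointUpdate` fields and the rejection rules are a hypothetical reading of the four rollout steps above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckpointUpdate:
    """Hypothetical single update format: change, proof, blocker, next step."""
    changed: str
    next_step: str
    proof_links: list[str] = field(default_factory=list)
    claims_done: bool = False
    blocker: Optional[str] = None
    blocker_owner: Optional[str] = None


def rejection_reasons(update: CheckpointUpdate) -> list[str]:
    """Return the reasons an update should be rejected; empty means accept."""
    reasons = []
    if not update.changed.strip():
        reasons.append("no statement of what changed")
    if not update.next_step.strip():
        reasons.append("no next step")
    if update.claims_done and not update.proof_links:
        reasons.append("completion claim without evidence links")
    if update.blocker and not update.blocker_owner:
        reasons.append("blocker without an owner")
    return reasons


# "Done" with nothing behind it is rejected rather than relayed.
vague = CheckpointUpdate(changed="", next_step="", claims_done=True)
print(rejection_reasons(vague))
```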

If your AI delivery feels busy but unclear, you don’t need more output. You need better communication contracts.

Read the canonical page:
