AI coding agents are getting good enough that the old question, "Can they write code?", is becoming less interesting.
The harder question is whether they can participate in a real delivery system without turning the repo into a mess.
Once agents can read issues, modify files, run tests, create branches, and merge work, the risk changes. The problem is no longer capability. The problem is control.
More agents do not automatically create more delivery. Without an operating model, they create duplicated work, unclear ownership, long-lived branches, hidden feature flags, broken integration, and a growing gap between what the system appears to be doing and what is actually safe to ship.
That is the problem VibeGov is designed to address.
The mistake is treating agents like clever freelancers
A repo does not need a crowd of clever freelancers.
It needs a governed delivery system.
In many AI-assisted workflows, each agent is given a task, a prompt, and access to the repo. That can work for a small change. It does not scale into reliable delivery.
The moment multiple agents are involved, the system needs answers to basic governance questions:
- Who decides what the issue means?
- Who decides whether the issue is ready to build?
- Who owns the architecture boundary?
- Who owns delivery into the integration branch?
- Who owns the user experience and design-system contract?
- Who verifies the outcome independently?
- Who watches for stale work, broken state, and follow-through?
- Who is allowed to block unsafe change?
If those answers are not explicit, agents will fill the gaps with assumptions.
And assumptions are where delivery drift begins.
Prompts are not governance
Agent instructions matter, but prompts alone are not enough.
A prompt can say:
Do not expand scope.
But the delivery system still needs a place where scope is defined, reviewed, and enforced.
A prompt can say:
Keep the repo clean.
But the workflow still needs branch rules, validation gates, issue evidence, and a clear definition of done.
A prompt can say:
Follow the architecture.
But the project still needs someone or something accountable for defining that architecture, maintaining ADRs, and deciding when a change crosses a boundary.
VibeGov starts from a simple assumption:
Agents should be autonomous inside clear boundaries, not free outside accountability.
The issue is the work contract
In AI-assisted delivery, the issue becomes more important, not less.
A weak issue gives the agent room to guess. A strong issue gives the agent a contract to execute.
That contract should define:
- the intended outcome
- why it matters
- scope and non-goals
- OpenSpec binding or SPEC_GAP
- acceptance criteria
- verification expectations
- risk level
- any required research, exploration, design, security, or architecture input
This is why a one-line issue should not move straight into development.
Fast capture is fine. Fast execution from unclear intent is not.
The work can start as:
Fix login weirdness.
But it should not reach implementation until the issue explains what is weird, what correct behaviour looks like, how it binds to the spec, and how the result will be verified.
Intake can be loose. Execution should not be.
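A minimal sketch of the issue-as-contract idea, with a readiness gate. All names here, `IssueContract` included, are illustrative assumptions, not any real tool's schema:

```python
from dataclasses import dataclass, field

# Illustrative sketch of an issue as an execution contract.
# Field names mirror the checklist above; they are assumptions,
# not a real VibeGov or issue-tracker schema.
@dataclass
class IssueContract:
    outcome: str = ""               # the intended outcome
    rationale: str = ""             # why it matters
    scope: list = field(default_factory=list)
    non_goals: list = field(default_factory=list)
    spec_binding: str = ""          # OpenSpec reference, or "SPEC_GAP"
    acceptance_criteria: list = field(default_factory=list)
    verification: str = ""          # how the result will be verified
    risk_level: str = ""            # e.g. "low", "medium", "high"

    def is_build_ready(self) -> bool:
        """A one-line issue fails this gate; a full contract passes."""
        return all([
            self.outcome,
            self.spec_binding,
            self.acceptance_criteria,
            self.verification,
            self.risk_level,
        ])

# Fast capture is fine to file, but not fine to build from:
raw = IssueContract(outcome="Fix login weirdness")
assert not raw.is_build_ready()
```

The gate encodes the rule above: loose intake is allowed in, but nothing leaves for development until the contract fields are filled.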
The board is the operating system
The project board is not just a reporting tool. It is the operational state machine.
A simple board is enough:
- No status
- Backlog
- Ready
- In Progress - In Dev
- In Review - In Test
- Done
- Blocked
- Parking Lot
The important part is not the labels. It is what they mean.
Ready means the issue is buildable and releasable.
In Progress - In Dev means the Developer agent is actively delivering it.
In Review - In Test means the change is being validated through automation, verifier activity, or release confidence checks.
Done means the work has landed cleanly and the integration branch is healthy.
Blocked means progress needs an explicit unblocker, not silent waiting.
Parking Lot means the idea is acknowledged but intentionally outside the current path.
This gives agents a shared operating surface. They do not need to invent side queues, hidden TODOs, or chat-based promises.
The board is where state lives.
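Treating the board as a state machine can be made literal with a small transition table. The allowed moves below are an illustrative policy sketch, not a prescribed VibeGov rule set:

```python
# Minimal sketch of the board as a state machine. The transition
# policy here is an assumption for illustration; teams will differ.
TRANSITIONS = {
    "No status": {"Backlog", "Parking Lot"},
    "Backlog": {"Ready", "Parking Lot"},
    "Ready": {"In Progress - In Dev"},
    "In Progress - In Dev": {"In Review - In Test", "Blocked"},
    "In Review - In Test": {"Done", "In Progress - In Dev", "Blocked"},
    "Blocked": {"In Progress - In Dev", "Backlog"},
    "Parking Lot": {"Backlog"},
    "Done": set(),  # terminal: reopening means filing a new issue
}

def move(current: str, target: str) -> str:
    """Refuse moves the operating model does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {target!r}")
    return target

state = move("Backlog", "Ready")
# move("Backlog", "Done") would raise: no skipping the delivery path.
```

Because every status change goes through one function, agents cannot invent side queues; an illegal move fails loudly instead of silently drifting.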
Ready means releasable
One of the most important rules in an agent delivery system is this:
Ready means releasable.
An issue should not enter Ready unless the work can safely land on the integration branch and move toward release.
That does not mean every issue must deliver a large user-facing feature. It means the increment should be coherent, integrated, and safe.
Bad ready work looks like:
- build half a feature and hide it
- create a parallel implementation path
- start a migration with no cutover plan
- add a feature toggle with no owner or removal condition
- implement speculative code for a future product decision
Good ready work looks like:
- deliver a complete behaviour change
- add a tested internal capability with a clear future use
- implement a paid feature as an explicit entitlement
- add an operational toggle with defined enabled and disabled behaviour
- create a migration step that leaves the system stable
Agents move quickly. That makes issue slicing more important.
If the work is not safe to land, it is not ready for Dev.
Done means green integration state
Code written is not done.
Tests passing locally is not done.
A branch that looks good is not done.
Done means the work has made it to the integration branch and that integration state is still green.
This matters because agent delivery can create a false sense of progress. The agent can produce code, explain the change, and sound confident. But until the work is integrated, validated, and traceable to the issue, it has not improved the product.
The Developer agent should own the path from ready issue to green integration state:
- start from a clean integration branch
- implement the issue
- update tests, docs, and config where required
- validate locally
- refresh from the current integration branch
- integrate the change according to repo policy
- watch automation
- fix immediately if the pipeline fails
- close the issue only when evidence is complete
This is not bureaucracy. It is delivery closure.
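One way to sketch delivery closure, assuming the step names above: done is an all-or-nothing checklist, not a feeling of progress. None of this is a real API:

```python
# Hypothetical sketch: delivery closure as an ordered, all-or-nothing
# checklist. Step names mirror the Developer agent's path above.
DELIVERY_STEPS = [
    "clean integration branch",
    "implement issue",
    "update tests, docs, config",
    "validate locally",
    "refresh from integration",
    "integrate per repo policy",
    "pipeline green",
    "evidence attached to issue",
]

def is_done(completed: set) -> bool:
    """Done means every step landed, not just 'code written'."""
    return all(step in completed for step in DELIVERY_STEPS)

# Looks busy, sounds confident, is not done:
assert not is_done({"implement issue", "validate locally"})
assert is_done(set(DELIVERY_STEPS))
```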
No wild forks
Branches are useful as temporary implementation workspaces.
They are not product states.
Long-lived branches, hidden futures, and parallel product lines create exactly the kind of ambiguity AI delivery should avoid.
The rule should be blunt:
All development must converge.
If a feature is worth building, it should be shaped into a releasable increment. If it is not ready to be released, it should remain in Backlog, Parking Lot, research, design, or architecture analysis.
Do not let the repo become a museum of abandoned futures.
Feature toggles are configuration, not hiding places
Feature toggles are not bad.
Undisciplined toggles are bad.
A feature toggle should be an explicit product, operational, or release control. It should not be a way to merge unfinished code and decide later what it means.
Good toggle use includes:
- paid feature entitlement
- tenant or customer-specific enablement
- environment-specific behaviour
- staged rollout
- operational kill switch
- time-bound experiment
For every toggle, define:
- name
- purpose
- owner
- configuration location
- default state
- enabled behaviour
- disabled behaviour
- tests for both states
- removal condition if temporary
The key rule is simple:
No feature should require code edits to enable after development.
If a feature is optional, paid, staged, or tenant-specific, build it that way from the start.
Toggles are configuration and product controls, not hiding places for incomplete work.
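As a sketch, the checklist above can become a registry record with a discipline check. Field names follow the list and are assumptions, not a real feature-flag library's schema:

```python
from dataclasses import dataclass

# Illustrative toggle record; fields mirror the checklist above and
# are assumptions, not any real feature-flag product's schema.
@dataclass
class FeatureToggle:
    name: str
    purpose: str
    owner: str
    config_location: str
    default_state: bool
    enabled_behaviour: str
    disabled_behaviour: str
    tested_both_states: bool
    temporary: bool = False
    removal_condition: str = ""  # required only if temporary

    def is_disciplined(self) -> bool:
        """No owner, untested states, or a temporary flag with no
        removal condition makes a hiding place, not configuration."""
        if not (self.owner and self.tested_both_states):
            return False
        return not self.temporary or bool(self.removal_condition)

kill_switch = FeatureToggle(
    name="payments.kill_switch",
    purpose="operational kill switch",
    owner="platform-team",
    config_location="config/runtime.yaml",
    default_state=True,
    enabled_behaviour="payments processed normally",
    disabled_behaviour="payments queued, user sees maintenance notice",
    tested_both_states=True,
)
assert kill_switch.is_disciplined()
```

The check makes the key rule operational: an enabled or disabled state is defined in configuration and tested, never decided later by a code edit.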
Separate roles are useful when they create real control
The goal is not to create an agent circus.
Separate roles are useful when they create clearer accountability.
A practical operating model can include:
- Planner for intake, prioritisation, backlog hygiene, and developer handoff
- Architect for system design, ADRs, boundaries, migrations, developer-experience architecture, and technical direction
- Designer for UI/UX intent, Design Language System stewardship, user flows, component states, and accessibility-by-design
- Developer for issue execution, coding, testing, git hygiene, and integration
- Researcher for external evidence gathering, source evaluation, and cited synthesis
- Explorer for repo, UI, and API exploration, evidence capture, finding triage, and spec gaps
- Verifier for independent QA, regression checks, acceptance evidence, and release confidence
- Security for threat modelling, secrets, auth, privacy, dependency, licensing, and exposure review
- Documenter for READMEs, install guides, changelogs, user docs, and public comms
- Maintainer for repo hygiene, branch closure, changelogs, versioning, and release readiness
- Operator for recurring sweeps, task/state orchestration, reminders, and follow-through
Not every issue should pass through every role.
That would kill delivery speed.
Instead, route work by need.
Researcher and Explorer feed evidence. Designer shapes experience intent. Security blocks unsafe change. Architect protects direction. Planner protects readiness. Developer ships. Verifier proves. Documenter keeps the written surface aligned. Maintainer keeps release and repo hygiene clean. Operator keeps the system moving.
The model is not many agents doing whatever they want.
It is governed autonomy.
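Routing by need can be sketched as a simple dispatch function. The trigger conditions below are illustrative assumptions, not part of VibeGov itself:

```python
# Sketch of routing work by need rather than running every issue
# through every role. Trigger field names are hypothetical.
def route(issue: dict) -> list:
    roles = ["planner"]                    # Planner always triages intake
    if issue.get("needs_external_evidence"):
        roles.append("researcher")
    if issue.get("touches_ui"):
        roles.append("designer")
    if issue.get("crosses_boundary"):
        roles.append("architect")
    if issue.get("touches_auth_or_secrets"):
        roles.append("security")
    roles += ["developer", "verifier"]     # every build gets delivery and proof
    return roles

# A small internal fix skips the specialist detour entirely:
assert route({}) == ["planner", "developer", "verifier"]
```

The point of the sketch is the shape, not the exact rules: specialists join when the issue needs them, while delivery and verification are never optional.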
Specialists should feed the spec, not bypass it
A clean pattern is:
Raw idea
↓
Planner triage
↓
Research / exploration / design / security input as needed
↓
Architect or Planner creates the build-ready issue
↓
Developer delivers
↓
Automation and Verifier validate
↓
Integration remains green
Specialist work is independent of code. A Researcher can answer a question. An Explorer can inspect the repo. A Designer can define the user flow. Security can identify controls.
But those outputs should flow back into the issue or OpenSpec before development starts.
Research and design should not bypass the accountable delivery contract.
Automation proves mechanics; governance preserves meaning
Automation is essential, but it cannot do the whole job.
Automation can prove:
- tests pass
- build succeeds
- lint and type checks pass
- secrets are not detected
- dependency checks are clean
- pipeline triggered
- artifact was produced
But automation cannot fully decide:
- whether the issue meant the right thing
- whether the architecture direction is sound
- whether the user experience is coherent
- whether the trade-off is acceptable
- whether the feature should exist
- whether scope was silently expanded
- whether the disabled state of a paid feature makes product sense
That is why governance still matters.
Automation is the proof layer. It does not replace accountability.
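One way to sketch that split: automation gates are booleans a pipeline can compute, while governance questions need an accountable sign-off. All check names below are hypothetical:

```python
# Sketch separating the mechanical proof layer from governance
# judgement. Check names are illustrative, not real CI job IDs.
MECHANICAL_CHECKS = [
    "tests_pass", "build_succeeds", "lint_and_types_pass",
    "no_secrets_detected", "dependencies_clean", "artifact_produced",
]
GOVERNANCE_QUESTIONS = [
    "issue_meant_the_right_thing", "architecture_direction_sound",
    "scope_not_silently_expanded", "trade_off_acceptable",
]

def can_ship(ci_results: dict, sign_offs: dict) -> bool:
    """Automation proves mechanics; governance answers the rest."""
    mechanics_green = all(ci_results.get(c) for c in MECHANICAL_CHECKS)
    judgement_done = all(sign_offs.get(q) for q in GOVERNANCE_QUESTIONS)
    return mechanics_green and judgement_done

green = {c: True for c in MECHANICAL_CHECKS}
assert not can_ship(green, {})   # green CI alone is not shippable
```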
The real unlock is governed autonomy
The next phase of AI software delivery will not be won by giving agents unlimited freedom.
It will be won by teams that can give agents enough autonomy to move fast and enough governance to keep the system coherent.
That means:
- issues are treated as execution contracts
- OpenSpec captures requirement truth
- the project board carries operational state
- the integration branch remains the integration truth
- the release branch remains release truth
- agents act within role authority
- automation validates the mechanics
- security and verification provide independent confidence
- operators keep the loop moving
Vibe coding showed how quickly software can be produced when humans and AI work fluidly together.
The next step is making that flow reliable enough for serious delivery.
That is the shift from vibe coding to governed delivery.

