
5 posts tagged with "execution"


· 10 min read
VibeGov Team

AI coding agents are getting good enough that the old question, "Can they write code?", is becoming less interesting.

The harder question is whether they can participate in a real delivery system without turning the repo into a mess.

Once agents can read issues, modify files, run tests, create branches, and merge work, the risk changes. The problem is no longer capability. The problem is control.

More agents do not automatically create more delivery. Without an operating model, they create duplicated work, unclear ownership, long-lived branches, hidden feature flags, broken integration, and a growing gap between what the system appears to be doing and what is actually safe to ship.

That is the problem VibeGov is designed to address.

The mistake is treating agents like clever freelancers

A repo does not need a crowd of clever freelancers.

It needs a governed delivery system.

In many AI-assisted workflows, each agent is given a task, a prompt, and access to the repo. That can work for a small change. It does not scale into reliable delivery.

The moment multiple agents are involved, the system needs answers to basic governance questions:

  • Who decides what the issue means?
  • Who decides whether the issue is ready to build?
  • Who owns the architecture boundary?
  • Who owns delivery into the integration branch?
  • Who owns the user experience and design-system contract?
  • Who verifies the outcome independently?
  • Who watches for stale work, broken state, and follow-through?
  • Who is allowed to block unsafe change?

If those answers are not explicit, agents will fill the gaps with assumptions.

And assumptions are where delivery drift begins.

Prompts are not governance

Agent instructions matter, but prompts alone are not enough.

A prompt can say:

Do not expand scope.

But the delivery system still needs a place where scope is defined, reviewed, and enforced.

A prompt can say:

Keep the repo clean.

But the workflow still needs branch rules, validation gates, issue evidence, and a clear definition of done.

A prompt can say:

Follow the architecture.

But the project still needs someone or something accountable for defining that architecture, maintaining ADRs, and deciding when a change crosses a boundary.

VibeGov starts from a simple assumption:

Agents should be autonomous inside clear boundaries, not free outside accountability.

The issue is the work contract

In AI-assisted delivery, the issue becomes more important, not less.

A weak issue gives the agent room to guess. A strong issue gives the agent a contract to execute.

That contract should define:

  • the intended outcome
  • why it matters
  • scope and non-goals
  • OpenSpec binding or SPEC_GAP
  • acceptance criteria
  • verification expectations
  • risk level
  • any required research, exploration, design, security, or architecture input

This is why a one-line issue should not move straight into development.

Fast capture is fine. Fast execution from unclear intent is not.

The work can start as:

Fix login weirdness.

But it should not reach implementation until the issue explains what is weird, what correct behaviour looks like, how it binds to the spec, and how the result will be verified.

Intake can be loose. Execution should not be.
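
As a rough illustration, the contract fields listed above could be modelled as a small data structure with a readiness check. This is a minimal sketch, not VibeGov's implementation: the `IssueContract` class, its field names, the assumed risk scale, and the `missing_fields` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IssueContract:
    """Hypothetical shape for a build-ready issue; field names are illustrative."""

    outcome: str = ""
    rationale: str = ""       # why it matters
    scope: str = ""
    non_goals: str = ""
    spec_binding: str = ""    # an OpenSpec requirement ID/path, or "SPEC_GAP"
    acceptance_criteria: list[str] = field(default_factory=list)
    verification: str = ""
    risk_level: str = ""      # assumed scale, e.g. "low" / "medium" / "high"
    required_inputs: list[str] = field(default_factory=list)  # may stay empty


def missing_fields(issue: IssueContract) -> list[str]:
    """Return the names of contract fields that are still empty."""
    optional = {"required_inputs"}  # not every issue needs specialist input
    return [
        name
        for name, value in vars(issue).items()
        if name not in optional and not value
    ]


# A one-line capture has an outcome of sorts, but nothing else.
one_liner = IssueContract(outcome="Fix login weirdness.")
print(missing_fields(one_liner))
```

Run against the one-liner, the check reports every required field except the outcome as missing, which is the signal that the issue belongs in intake rather than implementation.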

The board is the operating system

The project board is not just a reporting tool. It is the operational state machine.

A simple board is enough:

  • No status
  • Backlog
  • Ready
  • In Progress - In Dev
  • In Review - In Test
  • Done
  • Blocked
  • Parking Lot

The important part is not the labels. It is what they mean.

Ready means the issue is buildable and releasable.

In Progress - In Dev means the Developer agent is actively delivering it.

In Review - In Test means the change is being validated through automation, verifier activity, or release confidence checks.

Done means the work has landed cleanly and the integration branch is healthy.

Blocked means progress needs an explicit unblocker, not silent waiting.

Parking Lot means the idea is acknowledged but intentionally outside the current path.

This gives agents a shared operating surface. They do not need to invent side queues, hidden TODOs, or chat-based promises.

The board is where state lives.
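
One way to treat the board as a state machine is to make the allowed moves explicit. The sketch below uses the column names from the list above, but the transition table is an assumption: the post defines what each status means, not which moves are legal. The `Status`, `ALLOWED`, and `move` names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NO_STATUS = "No status"
    BACKLOG = "Backlog"
    READY = "Ready"
    IN_DEV = "In Progress - In Dev"
    IN_TEST = "In Review - In Test"
    DONE = "Done"
    BLOCKED = "Blocked"
    PARKING_LOT = "Parking Lot"


# Assumed transition table; adjust to the team's real policy.
ALLOWED = {
    Status.NO_STATUS: {Status.BACKLOG, Status.PARKING_LOT},
    Status.BACKLOG: {Status.READY, Status.PARKING_LOT, Status.BLOCKED},
    Status.READY: {Status.IN_DEV, Status.BACKLOG, Status.BLOCKED},
    Status.IN_DEV: {Status.IN_TEST, Status.BLOCKED},
    Status.IN_TEST: {Status.DONE, Status.IN_DEV, Status.BLOCKED},
    Status.BLOCKED: {Status.BACKLOG, Status.READY, Status.IN_DEV, Status.IN_TEST},
    Status.PARKING_LOT: {Status.BACKLOG},
    Status.DONE: set(),
}


def move(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move skips a gate."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not an allowed move")
    return target


status = move(Status.READY, Status.IN_DEV)  # allowed in this sketch
```

With a table like this, an attempt to jump from Backlog straight to Done raises an error instead of silently rewriting state.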

Ready means releasable

One of the most important rules in an agent delivery system is this:

Ready means releasable.

An issue should not enter Ready unless the work can safely land on the integration branch and move toward release.

That does not mean every issue must deliver a large user-facing feature. It means the increment should be coherent, integrated, and safe.

Bad ready work looks like:

  • build half a feature and hide it
  • create a parallel implementation path
  • start a migration with no cutover plan
  • add a feature toggle with no owner or removal condition
  • implement speculative code for a future product decision

Good ready work looks like:

  • deliver a complete behaviour change
  • add a tested internal capability with a clear future use
  • implement a paid feature as an explicit entitlement
  • add an operational toggle with defined enabled and disabled behaviour
  • create a migration step that leaves the system stable

Agents move quickly. That makes issue slicing more important.

If the work is not safe to land, it is not ready for Dev.

Done means green integration state

Code written is not done.

Tests passing locally is not done.

A branch that looks good is not done.

Done means the work has made it to the integration branch and that integration state is still green.

This matters because agent delivery can create a false sense of progress. The agent can produce code, explain the change, and sound confident. But until the work is integrated, validated, and traceable to the issue, it has not improved the product.

The Developer agent should own the path from ready issue to green integration state:

  1. start from a clean integration branch
  2. implement the issue
  3. update tests, docs, and config where required
  4. validate locally
  5. refresh from the current integration branch
  6. integrate the change according to repo policy
  7. watch automation
  8. fix immediately if the pipeline fails
  9. close the issue only when evidence is complete

This is not bureaucracy. It is delivery closure.

No wild forks

Branches are useful as temporary implementation workspaces.

They are not product states.

Long-lived branches, hidden futures, and parallel product lines create exactly the kind of ambiguity AI delivery should avoid.

The rule should be blunt:

All development must converge.

If a feature is worth building, it should be shaped into a releasable increment. If it is not ready to be released, it should remain in Backlog, Parking Lot, research, design, or architecture analysis.

Do not let the repo become a museum of abandoned futures.

Feature toggles are configuration, not hiding places

Feature toggles are not bad.

Undisciplined toggles are bad.

A feature toggle should be an explicit product, operational, or release control. It should not be a way to merge unfinished code and decide later what it means.

Good toggle use includes:

  • paid feature entitlement
  • tenant or customer-specific enablement
  • environment-specific behaviour
  • staged rollout
  • operational kill switch
  • time-bound experiment

For every toggle, define:

  • name
  • purpose
  • owner
  • configuration location
  • default state
  • enabled behaviour
  • disabled behaviour
  • tests for both states
  • removal condition if temporary

The key rule is simple:

No feature should require code edits to enable after development.

If a feature is optional, paid, staged, or tenant-specific, build it that way from the start.

Toggles are configuration and product controls, not hiding places for incomplete work.
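
The per-toggle checklist above could be captured as a record that is validated before a toggle is accepted. This is a sketch under stated assumptions: `ToggleDefinition`, its field names, and `toggle_problems` are hypothetical, and the `tests_cover_both_states` flag stands in for whatever evidence a real pipeline would check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToggleDefinition:
    """Hypothetical record of the fields every toggle should define."""

    name: str
    purpose: str
    owner: str
    config_location: str
    default_enabled: bool
    enabled_behaviour: str
    disabled_behaviour: str
    tests_cover_both_states: bool
    temporary: bool = False
    removal_condition: Optional[str] = None


def toggle_problems(toggle: ToggleDefinition) -> list[str]:
    """List what is still undefined; an empty list means the toggle is acceptable."""
    problems = []
    text_fields = (
        "name", "purpose", "owner", "config_location",
        "enabled_behaviour", "disabled_behaviour",
    )
    for field_name in text_fields:
        if not getattr(toggle, field_name).strip():
            problems.append(f"missing {field_name}")
    if not toggle.tests_cover_both_states:
        problems.append("tests must cover both enabled and disabled states")
    if toggle.temporary and not toggle.removal_condition:
        problems.append("temporary toggle needs a removal condition")
    return problems
```

A toggle with no owner, no disabled behaviour, or no removal condition for a temporary experiment produces a non-empty problem list, which is the cue to send it back rather than merge it.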

Separate roles are useful when they create real control

The goal is not to create an agent circus.

Separate roles are useful when they create clearer accountability.

A practical operating model can include:

  • planner for intake, prioritisation, backlog hygiene, and developer handoff
  • architect for system design, ADRs, boundaries, migrations, developer-experience architecture, and technical direction
  • designer for UI/UX intent, Design Language System stewardship, user flows, component states, and accessibility-by-design
  • developer for issue execution, coding, testing, git hygiene, and integration
  • researcher for external evidence gathering, source evaluation, and cited synthesis
  • explorer for repo, UI, and API exploration, evidence capture, finding triage, and spec gaps
  • verifier for independent QA, regression checks, acceptance evidence, and release confidence
  • security for threat modelling, secrets, auth, privacy, dependency, licensing, and exposure review
  • documenter for READMEs, install guides, changelogs, user docs, and public comms
  • maintainer for repo hygiene, branch closure, changelogs, versioning, and release readiness
  • operator for recurring sweeps, task/state orchestration, reminders, and follow-through

Not every issue should pass through every role.

That would kill delivery speed.

Instead, route work by need.

Researcher and Explorer feed evidence. Designer shapes experience intent. Security blocks unsafe change. Architect protects direction. Planner protects readiness. Developer ships. Verifier proves. Documenter keeps the written surface aligned. Maintainer keeps release and repo hygiene clean. Operator keeps the system moving.

The model is not many agents doing whatever they want.

It is governed autonomy.

Specialists should feed the spec, not bypass it

A clean pattern is:

Raw idea

Planner triage

Research / exploration / design / security input as needed

Architect or Planner creates the build-ready issue

Developer delivers

Automation and Verifier validate

Integration remains green

Specialist work can happen independently of code. A Researcher can answer a question. An Explorer can inspect the repo. A Designer can define the user flow. Security can identify controls.

But those outputs should flow back into the issue or OpenSpec before development starts.

Research and design should not bypass the accountable delivery contract.

Automation proves mechanics; governance preserves meaning

Automation is essential, but it cannot do the whole job.

Automation can prove:

  • tests pass
  • build succeeds
  • lint and type checks pass
  • secrets are not detected
  • dependency checks are clean
  • pipeline triggered
  • artifact was produced

But automation cannot fully decide:

  • whether the issue meant the right thing
  • whether the architecture direction is sound
  • whether the user experience is coherent
  • whether the trade-off is acceptable
  • whether the feature should exist
  • whether scope was silently expanded
  • whether the disabled state of a paid feature makes product sense

That is why governance still matters.

Automation is the proof layer. It does not replace accountability.

The real unlock is governed autonomy

The next phase of AI software delivery will not be won by giving agents unlimited freedom.

It will be won by teams that can give agents enough autonomy to move fast and enough governance to keep the system coherent.

That means:

  • issues are treated as execution contracts
  • OpenSpec captures requirement truth
  • the project board carries operational state
  • the integration branch remains the integration truth
  • the release branch remains release truth
  • agents act within role authority
  • automation validates the mechanics
  • security and verification provide independent confidence
  • operators keep the loop moving

Vibe coding showed how quickly software can be produced when humans and AI work fluidly together.

The next step is making that flow reliable enough for serious delivery.

That is the shift from vibe coding to governed delivery.

· 3 min read
VibeGov Team

A lot of agent systems now know how to move fast.

That part is getting easier.

The harder problem is keeping fast execution legible, governable, and closable.

The real upgrade teams need

The next upgrade is not more agent theater. It is not longer plans. It is not status spam.

It is a tighter operating shape:

  • direct execution on bounded work,
  • verification before completion claims,
  • concise checkpoints at meaningful state changes,
  • explicit handling of inherited state,
  • and closure that reaches the governed landing path.

That is what dependable execution looks like.

What strong execution should feel like

A healthy implementation loop should feel crisp.

When the task is clear, the agent should:

  • gather the needed context,
  • make the change,
  • run the right proof,
  • close the state honestly,
  • and stop pretending that "edited files" means finished work.

That is the productive part of high-agency execution.

What goes wrong when speed loses governance

Fast execution becomes dangerous when teams let it collapse into black-box momentum.

Common failure modes look like this:

  • inherited repo mess ignored in the name of progress,
  • silence mistaken for professionalism,
  • passing build output treated as completion,
  • risky decisions taken without a visible decision boundary,
  • and residue pushed into the next work unit.

These are not small style issues. They are reliability problems.

The operating rule VibeGov should encode

The useful rule is simple:

Keep execution sharp, but make closure and legibility non-negotiable.

That means:

  • tool-first execution,
  • bounded work units,
  • truthful verifier and evaluator gates,
  • concise operator-visible checkpoints,
  • explicit inherited-state assessment,
  • and governed git/repo closure.

Legibility is not the same as chatter

Teams often get stuck between two bad options:

  • constant narration, or
  • total silence.

The better target is interrupt-efficient legibility.

Operators should be able to see:

  • when a slice started or resumed,
  • when the plan materially changed,
  • when a blocker or decision boundary appeared,
  • what validation actually passed or failed,
  • and how the slice closed.

That is enough for oversight without drowning the channel.

Closure is part of the work

A slice is not complete when the code exists.

A slice is complete when the governed path is closed:

  • issue/spec state is updated where required,
  • evidence exists,
  • git state is accounted for,
  • the merge or follow-up path is explicit,
  • and the repo returns to its expected base state.

If that part is missing, the execution loop is still open.

Practical takeaway

The goal is not to make agents slower.

The goal is to make fast execution dependable.

A strong system should feel like this:

  • less ceremony,
  • less ambiguity,
  • less hidden residue,
  • more direct proof,
  • more reliable closure.

That is what VibeGov should normalize.

· 2 min read
VibeGov Team

One-liner issues are common in fast-moving teams.

They are useful for capturing intent quickly, but dangerous if treated as execution-ready work.

A one-liner like:

"Fix login weirdness"

is not enough to implement safely.

The problem with one-liners

If one-liners go straight into implementation, teams usually get:

  • mismatched outcomes (different people infer different intent)
  • poor traceability (no spec binding)
  • low-quality verification (unclear acceptance)
  • rework and issue churn

In short: speed at intake, chaos at execution.

The VibeGov approach

Keep one-liners for capture speed, but require intake hardening before execution.

Rule

A one-liner issue must not move directly to implementation.

Before execution, convert it into implementation-ready intent by:

  1. Binding it to existing OpenSpec requirement IDs, or creating/expanding spec coverage when it is missing (SPEC_GAP -> requirement), and
  2. Upgrading the issue body to implementation-grade quality.

Only then does it enter active implementation.

Practical hardening checklist

For each one-liner, add:

  • clear outcome (what success looks like)
  • why it matters
  • in scope / out of scope
  • OpenSpec binding (ID/path or SPEC_GAP)
  • acceptance criteria
  • verification expectations

This preserves speed while restoring delivery clarity.

Why this works

  • intake stays fast (capture now, clarify before build)
  • implementation gets deterministic requirements
  • spec and backlog stay aligned
  • evidence quality improves
  • rework drops over time

Use two backlog states:

  1. Intake/Triage
    • one-liners allowed
    • not execution-ready
  2. Ready for Execution
    • hardened issue body
    • spec-bound
    • acceptance + verification defined

This simple split prevents governance bypass while keeping momentum.

Bottom line

One-liner issues are good for capture, not for execution.

Treat them as raw intake, harden them through spec binding and issue-quality upgrades, then build with confidence.

· 4 min read
VibeGov Team

A pattern that works well in real project delivery is splitting responsibilities across agents with clear contracts.

In current VibeGov terms, this is really a coordinated Development + Exploration operating model:

  • the builder primarily runs in Development mode
  • the validator primarily runs in Exploration mode
  • release verification stays inside the Development delivery path as a shipping gate

The pattern

Use two independent lanes:

  1. Builder lane (shipping agent)
    • implements features/fixes
    • runs tests
    • produces commits/artifacts
  2. Validator lane (independent QA/spec agent)
    • behaves like a normal user
    • opens the app in browser and clicks real flows
    • checks every clickable action (plus keyboard paths)
    • compares behavior against OpenSpec/contracts
    • creates focused backlog issues for each mismatch

This suits a setup where one agent is busy building while another agent or device continuously validates outcomes against real UI behavior.

Why this works

  • Separation of concerns: builder optimizes for delivery, validator optimizes for correctness.
  • Reduced bias: independent validation catches assumptions the builder misses.
  • Faster backlog hardening: defects become concrete, reproducible issues quickly.
  • Spec quality improves: uncovered behaviors force explicit requirement IDs and test mappings.

Operating contract

For each discovered gap, enforce:

  1. Issue
  2. Spec update (append-only IDs)
  3. Validation evidence
  4. Commit linked to issue

No “done” without runnable proof.

  • Builder runs continuously through priority backlog.
  • Validator runs on a fixed schedule (for example, every 45–60 minutes) and after major merges.
  • Release-aware checks can skip full reruns if build/version hasn’t changed.

Minimum evidence bundle per validation cycle

  • audited screens list
  • action inventory (every clickable control)
  • pass/fail per action
  • keyboard traversal evidence (Tab, Shift+Tab, Enter, Space)
  • persistence/mutation verification where actions claim to save, delete, sync, import, or reconfigure
  • issue files for failures with expected vs actual
  • spec coverage reconciliation notes
  • explicit completeness status for the validation scope

Required issue fields (for validator-created backlog items)

When the validator opens an issue, include these fields every time:

  • Screen/route: exact URL/route where failure occurred
  • Control type: button/link/icon/menu item/form field/dialog action
  • Expected intent: what should happen (route/state/data/error)
  • Actual result: what happened instead
  • Repro steps: shortest deterministic path
  • Evidence links: screenshot/video/report path
  • Spec link/ID: existing requirement ID or SPEC_GAP
  • Suggested fix path: likely file/module owner

This keeps backlog items implementation-ready and eliminates “cannot reproduce” churn.
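
A validator could enforce those required fields mechanically before it files anything. The sketch below reuses the field labels from the list above; the `REQUIRED_FIELDS` constant, the `render_issue_body` helper, and the plain `label: value` output format are assumptions made for illustration.

```python
REQUIRED_FIELDS = [
    "Screen/route",
    "Control type",
    "Expected intent",
    "Actual result",
    "Repro steps",
    "Evidence links",
    "Spec link/ID",
    "Suggested fix path",
]


def render_issue_body(fields: dict[str, str]) -> str:
    """Build an issue body, refusing to render if any required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Emit fields in a fixed order so every validator issue reads the same way.
    return "\n".join(f"{name}: {fields[name].strip()}" for name in REQUIRED_FIELDS)
```

Raising on incomplete input, rather than filing a partial issue, keeps the failure on the validator side instead of pushing a "cannot reproduce" conversation onto the builder.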

UI layering checks you should always include

Agents often miss visual-layer defects that humans catch immediately. Make these first-class checks:

  • Dialog visibility: modal/drawer appears when triggered and remains visible while active
  • Focus trap: keyboard focus stays in dialog while open
  • Backdrop behavior: backdrop blocks underlying clicks while modal is active
  • Z-index correctness: dialogs/toasts/menus are not hidden behind headers/sidebars/dev overlays
  • Escape/Close behavior: Esc, close icon, and Cancel all behave consistently

If any layering issue is found, file a dedicated issue (don’t bury it under generic “UI bug”).

CI handoff pattern (dev bot → validator bot)

A robust release handoff for bot teams, with release verification treated as part of Development:

  1. Dev bot pushes issue-linked commit.
  2. Dev bot monitors pipeline trigger for up to 30 seconds.
    • Poll CI by commit SHA every ~5s.
  3. If CI run appears:
    • post run URL + SHA in issue evidence comment,
    • hand off to validator bot.
  4. If CI run fails early:
    • update same issue with failing job/step/log snippet,
    • fix immediately on the same ticket,
    • commit/push again with same issue prefix.
  5. If no CI run appears within 30s:
    • create/update P0 CI-trigger blocker issue,
    • stop downstream handoff until trigger is restored.

This prevents false “done” states where code is pushed but release validation never actually started.
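
The trigger-monitoring step (poll by commit SHA roughly every 5 seconds, give up after 30) can be sketched as a small polling loop. The post does not name a CI provider, so the lookup is passed in as a caller-supplied function; `wait_for_ci_run` and `find_run_url` are hypothetical names, and the injectable `sleep` and `clock` parameters exist only to make the loop testable.

```python
import time
from typing import Callable, Optional


def wait_for_ci_run(
    commit_sha: str,
    find_run_url: Callable[[str], Optional[str]],
    timeout_s: float = 30.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll for a CI run on commit_sha; return its URL, or None on timeout."""
    deadline = clock() + timeout_s
    while True:
        run_url = find_run_url(commit_sha)
        if run_url is not None:
            return run_url
        # Stop if the next poll would land after the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

A caller would post the returned URL and SHA as issue evidence and hand off to the validator. A `None` result corresponds to the "no CI run appears" branch: raise or update the CI-trigger blocker and stop the downstream handoff.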

Practical tips

  • Keep one issue per failed behavior.
  • Keep commits scoped to one issue whenever possible.
  • Track unresolved blockers publicly in backlog (don’t hide them in chat).
  • Treat spec drift as a first-class defect.
  • For release workflows, always include commit SHA + CI run URL in handoff comments.

If you run this loop consistently, backlog quality improves while velocity stays high, because Development and Exploration happen in parallel, not serially.

· 2 min read
VibeGov Team

Most AI delivery teams don’t fail from lack of output. They fail from unclear status, hidden blockers, and weak handoffs.

GOV-03 is the communication layer that turns agent activity into decision-grade visibility.

The real problem

Without communication rules, teams get:

  • "working on it" updates with no evidence
  • "done" claims with no verification context
  • blocker messages with no owner or next step
  • handoffs that lose scope and intent

That creates management noise, not delivery clarity.

What GOV-03 changes

GOV-03 makes every update actionable.

A useful execution update should answer:

  1. What changed?
  2. What proof exists?
  3. What is blocked (if anything)?
  4. What happens next?

This is the minimum needed for reliable human oversight and multi-agent continuity.

Why this matters commercially

Clear communication rules improve:

  • throughput predictability
  • confidence in delivery reporting
  • escalation speed when risk appears
  • onboarding speed for new contributors

In short: better communication quality directly improves delivery quality.

Practical rollout in one day

  • standardize one checkpoint update format
  • require evidence links for completion claims
  • require explicit blocker owner + next action
  • reject vague status updates

Small discipline, big clarity gain.
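
A single checkpoint format with a rejection rule might look like the sketch below. It encodes the four questions and the rollout rules from this post, but the `CheckpointUpdate` class, its field names, and the `rejection_reasons` helper are hypothetical and are not part of GOV-03 itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckpointUpdate:
    """Hypothetical checkpoint format answering the four questions."""

    changed: str                                     # what changed?
    next_step: str                                   # what happens next?
    proof: list[str] = field(default_factory=list)   # evidence links
    blocker: Optional[str] = None                    # what is blocked, if anything
    blocker_owner: Optional[str] = None


def rejection_reasons(update: CheckpointUpdate, claims_done: bool = False) -> list[str]:
    """Return why an update should be rejected; an empty list means it is usable."""
    reasons = []
    if not update.changed.strip():
        reasons.append("say what changed")
    if claims_done and not update.proof:
        reasons.append("completion claims need evidence links")
    if update.blocker and not update.blocker_owner:
        reasons.append("blocker needs an owner")
    if not update.next_step.strip():
        reasons.append("state what happens next")
    return reasons
```

An update that claims completion with no evidence, or names a blocker with no owner, comes back with reasons attached, which gives the sender something specific to fix.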

Social takeaway

If your AI delivery feels busy but unclear, you don’t need more output. You need better communication contracts.
