
3 posts tagged with "multi-agent"


· 3 min read
VibeGov Team

A multi-agent system can look healthy for exactly the wrong reason:

  • the worker spawned successfully
  • the session exists
  • the runtime says it is still alive

That is not the same thing as governed execution.

Recent project experience made this painfully clear. A parent thread can successfully launch a worker thread and still fail the real governance test by going quiet afterwards.

The hidden failure mode

People often focus on whether ACP setup works at all:

  • can the worker spawn?
  • can the runtime create a session?
  • can you read results back later?

Those are important setup questions. But they are not the whole question.

The deeper question is:

does the parent keep visible ownership of the delegated unit until completion, blocker, or explicit handoff?

If the answer is no, the system has a supervision problem even if the worker runtime is technically healthy.

Worker health is not governance health

A worker can be:

  • alive
  • executing
  • emitting some output

And governance can still be weak.

Why? Because a silent parent creates ambiguity:

  • who owns the unit right now?
  • how long has it been running?
  • has anyone checked progress recently?
  • is the latest state meaningful progress or a stale transcript?
  • when will the next supervisory action happen?

Without those answers, a parent thread is not orchestrating. It is just launching.

Delegation does not end accountability

This is the key lesson.

Delegation does not transfer orchestration accountability.

The parent may delegate execution. It does not delegate responsibility for visible supervision.

In governed systems, the parent should still:

  1. announce the delegated unit clearly
  2. report worker identity when available
  3. perform early follow-up checks
  4. continue periodic supervision for long-running work
  5. report completion, blocker, or recovery action explicitly

That is what turns delegation into governed execution instead of fire-and-forget behavior.
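Those five duties can be sketched as a minimal parent-side supervision loop. This is an illustration, not an ACP API: `spawn_worker`, `check_progress`, and the log format are all hypothetical placeholders.

```python
import time

def supervise(unit_id, spawn_worker, check_progress, log,
              early_check_s=1.0, cadence_s=2.0, max_cycles=3):
    """Delegate one unit and keep visible ownership until completion,
    blocker, or explicit escalation."""
    log(f"DELEGATING unit={unit_id}")                        # 1. announce the unit
    worker = spawn_worker(unit_id)
    log(f"WORKER id={worker['id']} unit={unit_id}")          # 2. report worker identity
    time.sleep(early_check_s)
    log(f"EARLY-CHECK unit={unit_id} state={check_progress(worker)}")  # 3. early follow-up
    for _ in range(max_cycles):                              # 4. periodic supervision
        state = check_progress(worker)
        log(f"CHECK unit={unit_id} state={state}")
        if state in ("done", "blocked"):
            log(f"RESOLVED unit={unit_id} outcome={state}")  # 5. explicit outcome
            return state
        time.sleep(cadence_s)
    log(f"ESCALATE unit={unit_id} reason=stale")
    return "escalated"

# Exercising the loop with in-memory fakes:
events = []
states = iter(["running", "running", "done"])
outcome = supervise("U-1",
                    spawn_worker=lambda u: {"id": "w-42"},
                    check_progress=lambda w: next(states),
                    log=events.append,
                    early_check_s=0, cadence_s=0)
```

The point of the sketch is the log trail: every supervisory action leaves a visible record, so "who owns this unit right now" is always answerable.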

Why cadence matters

A common failure pattern is vague follow-through:

  • one start message
  • maybe one worker id
  • then silence
  • then, much later, either a result or nothing

That pattern is operationally weak because it hides whether the parent is still on top of the unit.

Governance should not hardcode one universal timing rule for every environment. But it should require that a system define:

  • an early-follow-up checkpoint window
  • an ongoing supervision cadence for long-running work
  • an escalation expectation when progress is stale or ambiguous

The runtime or project docs can set the exact numbers. Governance should enforce the accountability shape.
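One way to express that accountability shape as data, with illustrative numbers only:

```python
from dataclasses import dataclass

@dataclass
class SupervisionPolicy:
    """The accountability shape; the exact numbers come from the runtime
    or project docs, not from governance itself."""
    early_followup_s: float   # early follow-up checkpoint window
    cadence_s: float          # ongoing supervision cadence for long work
    stale_after_s: float      # escalate when progress is older than this

    def next_action(self, elapsed_s, progress_age_s):
        if progress_age_s > self.stale_after_s:
            return "escalate"                 # stale or ambiguous progress
        if elapsed_s <= self.early_followup_s:
            return "early-check"              # still inside the early window
        return "periodic-check"               # normal supervision cadence

# Illustrative values; each environment sets its own.
policy = SupervisionPolicy(early_followup_s=60, cadence_s=300, stale_after_s=900)
```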

What this means for ACP setup docs

ACP setup docs should not stop at:

  • how to spawn sessions
  • how to configure backends
  • how to attach tools
  • how to read transcript output

They should also explain:

  • how the parent tracks ownership after delegation
  • how follow-up checks are scheduled or enforced
  • how elapsed runtime is surfaced
  • how stale or missing readback is escalated
  • how the parent proves it is still supervising the worker thread

That is where setup guidance meets governance.

The better practical test

Instead of asking only:

did the worker spawn successfully?

Ask:

if this worker runs for 20 minutes, can a human still see who owns it, how long it has been running, what its latest known state is, and what the next supervisory step will be?

If not, the setup may be functional but it is not yet governable.
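That test can be made mechanical. A sketch, assuming the setup keeps some kind of status snapshot per delegated unit (the field names are invented for illustration):

```python
def governable(snapshot):
    """The 20-minute test: a human must be able to answer who owns the unit,
    how long it has run, its latest known state, and the next supervisory step."""
    required = ("owner", "elapsed_s", "latest_state", "next_supervisory_step")
    return all(snapshot.get(key) is not None for key in required)

healthy = governable({
    "owner": "parent-thread-1",
    "elapsed_s": 1200,
    "latest_state": "writing integration tests",
    "next_supervisory_step": "progress check at t+1500s",
})
orphaned = governable({
    "owner": None,               # nobody visibly owns the unit
    "elapsed_s": 1200,
    "latest_state": None,        # only a stale transcript exists
    "next_supervisory_step": None,
})
```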

· 3 min read
VibeGov Team

A lot of multi-agent failure is not caused by weak models. It is caused by weak structure.

One agent quietly spawns another. That worker quietly turns into a coordinator. Soon the team has a small invisible management hierarchy inside the runtime, while the human only sees a vague status line and a missing result.

VibeGov should be stricter than that.

The governance principle

Governed execution should use explicit orchestration and bounded work units.

That means the parent orchestration context should:

  1. select one tracked unit of work
  2. announce that delegation clearly
  3. hand the unit to one bounded worker or lane
  4. receive a visible result bundle
  5. only then continue to the next unit by default

This is not an argument against capable workers. It is an argument against hidden coordination.
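The default shape above can be sketched as a plain loop; `run_worker` and the announce format are hypothetical placeholders.

```python
def orchestrate(units, run_worker, announce):
    """Sequential bounded stages: one tracked unit at a time, one bounded
    worker, one visible result bundle, then the next unit by default."""
    bundles = []
    for unit in units:                          # 1. select one tracked unit
        announce(f"DELEGATE {unit}")            # 2. announce the delegation
        bundle = run_worker(unit)               # 3. one bounded worker or lane
        announce(f"RESULT {unit} -> {bundle['status']}")  # 4. visible result bundle
        bundles.append((unit, bundle))          # 5. only then continue
    return bundles

log = []
done = orchestrate(["U-1", "U-2"],
                   run_worker=lambda u: {"unit": u, "status": "done"},
                   announce=log.append)
```

Note what is absent: `run_worker` has no way to spawn its own coordinator, so the hierarchy stays exactly one level deep and fully visible in the log.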

Why hidden agent pyramids are bad governance

When a worker turns into a silent coordinator, teams lose the things governance is supposed to protect:

  • Visibility — humans cannot tell what is actually running
  • Accountability — ownership gets blurred across layers
  • Recovery — failures become harder to isolate and restart
  • Evidence quality — outputs arrive detached from the unit that produced them
  • Scope control — sub-work expands without an explicit decision

A system can still look busy while becoming less governable. That is the trap.

Sequential bounded stages are usually the safer default

People sometimes overcorrect and say all work must be linear forever. That is too absolute.

The better rule is:

prefer sequential bounded stages when they improve observability, recoverability, or handoff clarity.

If a workflow is easier to inspect, interrupt, retry, or hand off when split into clear stages, that is the right default.

Parallelism is still allowed

VibeGov is not anti-parallel. It is anti-opaque.

Parallel lanes are fine when each lane still has:

  • an explicit owner
  • bounded scope
  • visible checkpoints
  • clear evidence outputs
  • recoverable failure handling

The issue is not "more than one worker." The issue is "more than one hidden coordinator."
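Those five lane requirements can be captured as a structural check (field names are assumptions for illustration, not a VibeGov schema):

```python
from dataclasses import dataclass, field

@dataclass
class Lane:
    """One parallel lane; every field is required for the lane to stay governable."""
    owner: str                                       # explicit owner
    scope: str                                       # bounded scope
    checkpoints: list = field(default_factory=list)  # visible checkpoints
    evidence: list = field(default_factory=list)     # clear evidence outputs
    on_failure: str = ""                             # recoverable failure handling

def lane_is_governable(lane):
    return bool(lane.owner and lane.scope and lane.on_failure)

ok = lane_is_governable(Lane(owner="agent-a", scope="migrate auth module",
                             on_failure="retry from last checkpoint"))
opaque = lane_is_governable(Lane(owner="", scope="everything", on_failure=""))
```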

What belongs in governance vs implementation docs

This principle belongs in governance because it defines the shape of accountable execution.

What does not belong in governance:

  • exact runtime settings
  • queue TTLs
  • model defaults
  • local file paths
  • wrapper commands
  • temporary transcript or recovery hacks
  • patch-specific engineering notes

Those are implementation details, runbook material, or architecture notes. Useful, yes. Governance, no.

The practical test

If a human asks, "what is running right now, on which tracked unit, with what evidence expected?" the system should answer that directly.

If the honest answer is, "well, one worker spawned another coordinator which then delegated a few things internally," governance has already weakened.

That is why explicit orchestration matters. Not because it is pretty, but because it keeps multi-agent delivery legible under pressure.

· 4 min read
VibeGov Team

A pattern that works well in real project delivery is splitting responsibilities across agents with clear contracts.

In current VibeGov terms, this is really a coordinated Development + Exploration operating model:

  • the builder primarily runs in Development mode
  • the validator primarily runs in Exploration mode
  • release verification stays inside the Development delivery path as a shipping gate

The pattern

Use two independent lanes:

  1. Builder lane (shipping agent)

    • implements features/fixes
    • runs tests
    • produces commits/artifacts
  2. Validator lane (independent QA/spec agent)

    • behaves like a normal user
    • opens the app in browser and clicks real flows
    • checks every clickable action (plus keyboard paths)
    • compares behavior against OpenSpec/contracts
    • creates focused backlog issues for each mismatch

This is exactly the setup where one agent is busy building and another agent/device is continuously validating outcomes against real UI behavior.

Why this works

  • Separation of concern: builder optimizes for delivery, validator optimizes for correctness.
  • Reduced bias: independent validation catches assumptions the builder misses.
  • Faster backlog hardening: defects become concrete, reproducible issues quickly.
  • Spec quality improves: uncovered behaviors force explicit requirement IDs and test mappings.

Operating contract

For each discovered gap, enforce:

  1. Issue
  2. Spec update (append-only IDs)
  3. Validation evidence
  4. Commit linked to issue

No “done” without runnable proof.

Cadence

  • Builder runs continuously through priority backlog.
  • Validator runs on a fixed schedule (for example, every 45–60 minutes) and after major merges.
  • Release-aware checks can skip full reruns if build/version hasn’t changed.
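A hedged sketch of that schedule with the release-aware skip; the ~45-minute default and the build identifiers are illustrative:

```python
def plan_cycle(now_s, last_run_s, last_build, current_build, interval_s=2700):
    """Fixed validator cadence (default ~45 min) with a release-aware skip:
    a new build validates immediately; an unchanged build skips the full rerun."""
    if current_build != last_build:
        return "full"        # e.g. right after a major merge
    if now_s - last_run_s >= interval_s:
        return "light"       # scheduled slot, but build unchanged: skip full rerun
    return "skip"            # not due and nothing changed
```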

Minimum evidence bundle per validation cycle

  • audited screens list
  • action inventory (every clickable)
  • pass/fail per action
  • keyboard traversal evidence (Tab, Shift+Tab, Enter, Space)
  • persistence/mutation verification where actions claim to save, delete, sync, import, or reconfigure
  • issue files for failures with expected vs actual
  • spec coverage reconciliation notes
  • explicit completeness status for the validation scope

Required issue fields (for validator-created backlog items)

When the validator opens an issue, include these fields every time:

  • Screen/route: exact URL/route where failure occurred
  • Control type: button/link/icon/menu item/form field/dialog action
  • Expected intent: what should happen (route/state/data/error)
  • Actual result: what happened instead
  • Repro steps: shortest deterministic path
  • Evidence links: screenshot/video/report path
  • Spec link/ID: existing requirement ID or SPEC_GAP
  • Suggested fix path: likely file/module owner

This keeps backlog items implementation-ready and eliminates “cannot reproduce” churn.
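Those fields can be enforced with a small check before an issue is filed (a sketch; the snake_case field names are assumptions, and SPEC_GAP is the source's own marker for a missing requirement ID):

```python
REQUIRED_FIELDS = (
    "screen_route", "control_type", "expected_intent", "actual_result",
    "repro_steps", "evidence_links", "spec_id", "suggested_fix_path",
)

def validate_issue(issue):
    """Reject validator-created backlog items that omit any required field."""
    missing = [f for f in REQUIRED_FIELDS if not issue.get(f)]
    if missing:
        raise ValueError(f"issue missing fields: {missing}")
    return issue

complete = validate_issue({
    "screen_route": "/settings/profile",
    "control_type": "dialog action",
    "expected_intent": "saves profile and closes dialog",
    "actual_result": "dialog closes, nothing persisted",
    "repro_steps": "open /settings/profile, edit name, click Save",
    "evidence_links": ["reports/profile-save-fail.png"],
    "spec_id": "SPEC_GAP",
    "suggested_fix_path": "settings/ProfileDialog",
})
```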

UI layering checks you should always include

Agents often miss visual-layer defects that humans catch immediately. Make these first-class checks:

  • Dialog visibility: modal/drawer appears when triggered and remains visible while active
  • Focus trap: keyboard focus stays in dialog while open
  • Backdrop behavior: backdrop blocks underlying clicks while modal is active
  • Z-index correctness: dialogs/toasts/menus are not hidden behind headers/sidebars/dev overlays
  • Escape/Close behavior: Esc, close icon, and Cancel all behave consistently

If any layering issue is found, file a dedicated issue (don’t bury it under generic “UI bug”).

CI handoff pattern (dev bot → validator bot)

A robust release handoff for bot teams, with release verification treated as part of Development:

  1. Dev bot pushes issue-linked commit.
  2. Dev bot monitors pipeline trigger for up to 30 seconds.
    • Poll CI by commit SHA every ~5s.
  3. If CI run appears:
    • post run URL + SHA in issue evidence comment,
    • hand off to validator bot.
  4. If CI run fails early:
    • update same issue with failing job/step/log snippet,
    • fix immediately on the same ticket,
    • commit/push again with same issue prefix.
  5. If no CI run appears within 30s:
    • create/update P0 CI-trigger blocker issue,
    • stop downstream handoff until trigger is restored.

This prevents false “done” states where code is pushed but release validation never actually started.
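Steps 2–5 can be sketched as a polling loop; `fetch_run_by_sha` is a hypothetical CI client call, not a real API:

```python
import time

def watch_ci_trigger(sha, fetch_run_by_sha, timeout_s=30, poll_s=5,
                     sleep=time.sleep):
    """Poll CI by commit SHA, then hand off, fix, or raise a P0 trigger blocker."""
    waited = 0
    while waited <= timeout_s:
        run = fetch_run_by_sha(sha)              # poll CI by commit SHA (~every 5s)
        if run is not None:
            if run["status"] == "failed":
                return ("fix-on-same-ticket", run)     # step 4: fix immediately
            return ("handoff-to-validator", run)       # step 3: post run URL + SHA
        sleep(poll_s)
        waited += poll_s
    return ("p0-ci-trigger-blocker", None)             # step 5: stop downstream handoff

# Second poll finds a queued run, so the handoff proceeds:
runs = iter([None, {"url": "ci/run/7", "status": "queued"}])
action, run = watch_ci_trigger("abc123", lambda sha: next(runs),
                               sleep=lambda s: None)
```

The three return values map directly onto the three outcomes above, so the dev bot always ends the step with an explicit, visible state rather than a silent push.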

Practical tips

  • Keep one issue per failed behavior.
  • Keep commits scoped to one issue whenever possible.
  • Track unresolved blockers publicly in backlog (don’t hide them in chat).
  • Treat spec drift as a first-class defect.
  • For release workflows, always include commit SHA + CI run URL in handoff comments.

If you run this loop consistently, backlog quality improves while velocity stays high—because Development and Exploration happen in parallel, not serially.