
3 posts tagged with "multi-agent"


· 3 min read
VibeGov Team

A multi-agent system can look healthy for exactly the wrong reason:

  • the worker spawned successfully
  • the session exists
  • the runtime says it is still alive

That is not the same thing as governed execution.

Recent project experience made this painfully clear. A parent thread can successfully launch a worker thread and still fail the real governance test by going quiet afterwards.

The hidden failure mode

People often focus on whether ACP setup works at all:

  • can the worker spawn?
  • can the runtime create a session?
  • can you read results back later?

Those are important setup questions. But they are not the whole question.

The deeper question is:

does the parent keep visible ownership of the delegated unit until completion, blocker, or explicit handoff?

If the answer is no, the system has a supervision problem even if the worker runtime is technically healthy.

Worker health is not governance health

A worker can be:

  • alive
  • executing
  • emitting some output

And governance can still be weak.

Why? Because a silent parent creates ambiguity:

  • who owns the unit right now?
  • how long has it been running?
  • has anyone checked progress recently?
  • is the latest state meaningful progress or a stale transcript?
  • when will the next supervisory action happen?

Without those answers, a parent thread is not orchestrating. It is just launching.

Delegation does not end accountability

This is the key lesson.

Delegation does not transfer orchestration accountability.

The parent may delegate execution. It does not delegate responsibility for visible supervision.

In governed systems, the parent should still:

  1. announce the delegated unit clearly
  2. report worker identity when available
  3. perform early follow-up checks
  4. continue periodic supervision for long-running work
  5. report completion, blocker, or recovery action explicitly

That is what turns delegation into governed execution instead of fire-and-forget behavior.
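Those five duties can be sketched as a minimal parent-side supervision loop. This is an illustration, not an ACP API: `spawn_worker`, `check_progress`, and the log format are all hypothetical placeholders.

```python
import time

def supervise(unit_id, spawn_worker, check_progress, log,
              early_check_s=1.0, cadence_s=2.0, max_cycles=3):
    """Delegate one unit and keep visible ownership until completion,
    blocker, or explicit escalation."""
    log(f"DELEGATING unit={unit_id}")                        # 1. announce the unit
    worker = spawn_worker(unit_id)
    log(f"WORKER id={worker['id']} unit={unit_id}")          # 2. report worker identity
    time.sleep(early_check_s)
    log(f"EARLY-CHECK unit={unit_id} state={check_progress(worker)}")  # 3. early follow-up
    for _ in range(max_cycles):                              # 4. periodic supervision
        state = check_progress(worker)
        log(f"CHECK unit={unit_id} state={state}")
        if state in ("done", "blocked"):
            log(f"RESOLVED unit={unit_id} outcome={state}")  # 5. explicit outcome
            return state
        time.sleep(cadence_s)
    log(f"ESCALATE unit={unit_id} reason=stale")
    return "escalated"

# Exercising the loop with in-memory fakes:
events = []
states = iter(["running", "running", "done"])
outcome = supervise("U-1",
                    spawn_worker=lambda u: {"id": "w-42"},
                    check_progress=lambda w: next(states),
                    log=events.append,
                    early_check_s=0, cadence_s=0)
```

The point of the sketch is the log trail: every supervisory action leaves a visible record, so "who owns this unit right now" is always answerable.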

Why cadence matters

A common failure pattern is vague follow-through:

  • one start message
  • maybe one worker id
  • then silence
  • then, much later, either a result or nothing

That pattern is operationally weak because it hides whether the parent is still on top of the unit.

Governance should not hardcode one universal timing rule for every environment. But it should require that a system define:

  • an early-follow-up checkpoint window
  • an ongoing supervision cadence for long-running work
  • an escalation expectation when progress is stale or ambiguous

The runtime or project docs can set the exact numbers. Governance should enforce the accountability shape.
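One way to express that accountability shape as data, with illustrative numbers only:

```python
from dataclasses import dataclass

@dataclass
class SupervisionPolicy:
    """The accountability shape; the exact numbers come from the runtime
    or project docs, not from governance itself."""
    early_followup_s: float   # early follow-up checkpoint window
    cadence_s: float          # ongoing supervision cadence for long work
    stale_after_s: float      # escalate when progress is older than this

    def next_action(self, elapsed_s, progress_age_s):
        if progress_age_s > self.stale_after_s:
            return "escalate"                 # stale or ambiguous progress
        if elapsed_s <= self.early_followup_s:
            return "early-check"              # still inside the early window
        return "periodic-check"               # normal supervision cadence

# Illustrative values; each environment sets its own.
policy = SupervisionPolicy(early_followup_s=60, cadence_s=300, stale_after_s=900)
```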

What this means for ACP setup docs

ACP setup docs should not stop at:

  • how to spawn sessions
  • how to configure backends
  • how to attach tools
  • how to read transcript output

They should also explain:

  • how the parent tracks ownership after delegation
  • how follow-up checks are scheduled or enforced
  • how elapsed runtime is surfaced
  • how stale or missing readback is escalated
  • how the parent proves it is still supervising the worker thread

That is where setup guidance meets governance.

The better practical test

Instead of asking only:

did the worker spawn successfully?

Ask:

if this worker runs for 20 minutes, can a human still see who owns it, how long it has been running, what its latest known state is, and what the next supervisory step will be?

If not, the setup may be functional but it is not yet governable.
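That test can be made mechanical. A sketch, assuming the setup keeps some kind of status snapshot per delegated unit (the field names are invented for illustration):

```python
def governable(snapshot):
    """The 20-minute test: a human must be able to answer who owns the unit,
    how long it has run, its latest known state, and the next supervisory step."""
    required = ("owner", "elapsed_s", "latest_state", "next_supervisory_step")
    return all(snapshot.get(key) is not None for key in required)

healthy = governable({
    "owner": "parent-thread-1",
    "elapsed_s": 1200,
    "latest_state": "writing integration tests",
    "next_supervisory_step": "progress check at t+1500s",
})
orphaned = governable({
    "owner": None,               # nobody visibly owns the unit
    "elapsed_s": 1200,
    "latest_state": None,        # only a stale transcript exists
    "next_supervisory_step": None,
})
```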

· 3 min read
VibeGov Team

A lot of multi-agent failure is not caused by weak models. It is caused by weak structure.

One agent quietly spawns another. That worker quietly turns into a coordinator. Soon the team has a small invisible management hierarchy inside the runtime, while the human only sees a vague status line and a missing result.

VibeGov should be stricter than that.

The governance principle

Governed execution should use explicit orchestration and bounded work units.

That means the parent orchestration context should:

  1. select one tracked unit of work
  2. announce that delegation clearly
  3. hand the unit to one bounded worker or lane
  4. receive a visible result bundle
  5. only then continue to the next unit by default

This is not an argument against capable workers. It is an argument against hidden coordination.
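The default shape above can be sketched as a plain loop; `run_worker` and the announce format are hypothetical placeholders.

```python
def orchestrate(units, run_worker, announce):
    """Sequential bounded stages: one tracked unit at a time, one bounded
    worker, one visible result bundle, then the next unit by default."""
    bundles = []
    for unit in units:                          # 1. select one tracked unit
        announce(f"DELEGATE {unit}")            # 2. announce the delegation
        bundle = run_worker(unit)               # 3. one bounded worker or lane
        announce(f"RESULT {unit} -> {bundle['status']}")  # 4. visible result bundle
        bundles.append((unit, bundle))          # 5. only then continue
    return bundles

log = []
done = orchestrate(["U-1", "U-2"],
                   run_worker=lambda u: {"unit": u, "status": "done"},
                   announce=log.append)
```

Note what is absent: `run_worker` has no way to spawn its own coordinator, so the hierarchy stays exactly one level deep and fully visible in the log.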

Why hidden agent pyramids are bad governance

When a worker turns into a silent coordinator, teams lose the things governance is supposed to protect:

  • Visibility — humans cannot tell what is actually running
  • Accountability — ownership gets blurred across layers
  • Recovery — failures become harder to isolate and restart
  • Evidence quality — outputs arrive detached from the unit that produced them
  • Scope control — sub-work expands without an explicit decision

A system can still look busy while becoming less governable. That is the trap.

Sequential bounded stages are usually the safer default

People sometimes overcorrect and say all work must be linear forever. That is too absolute.

The better rule is:

prefer sequential bounded stages when they improve observability, recoverability, or handoff clarity.

If a workflow is easier to inspect, interrupt, retry, or hand off when split into clear stages, that is the right default.

Parallelism is still allowed

VibeGov is not anti-parallel. It is anti-opaque.

Parallel lanes are fine when each lane still has:

  • an explicit owner
  • bounded scope
  • visible checkpoints
  • clear evidence outputs
  • recoverable failure handling

The issue is not "more than one worker." The issue is "more than one hidden coordinator."
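Those five lane requirements can be captured as a structural check (field names are assumptions for illustration, not a VibeGov schema):

```python
from dataclasses import dataclass, field

@dataclass
class Lane:
    """One parallel lane; every field is required for the lane to stay governable."""
    owner: str                                       # explicit owner
    scope: str                                       # bounded scope
    checkpoints: list = field(default_factory=list)  # visible checkpoints
    evidence: list = field(default_factory=list)     # clear evidence outputs
    on_failure: str = ""                             # recoverable failure handling

def lane_is_governable(lane):
    return bool(lane.owner and lane.scope and lane.on_failure)

ok = lane_is_governable(Lane(owner="agent-a", scope="migrate auth module",
                             on_failure="retry from last checkpoint"))
opaque = lane_is_governable(Lane(owner="", scope="everything", on_failure=""))
```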

What belongs in governance vs implementation docs

This principle belongs in governance because it defines the shape of accountable execution.

What does not belong in governance:

  • exact runtime settings
  • queue TTLs
  • model defaults
  • local file paths
  • wrapper commands
  • temporary transcript or recovery hacks
  • patch-specific engineering notes

Those are implementation details, runbook material, or architecture notes. Useful, yes. Governance, no.

The practical test

If a human asks, "what is running right now, on which tracked unit, with what evidence expected?" the system should answer that directly.

If the honest answer is, "well, one worker spawned another coordinator which then delegated a few things internally," governance has already weakened.

That is why explicit orchestration matters. Not because it is pretty, but because it keeps multi-agent delivery legible under pressure.

· 4 min read
VibeGov Team

A pattern that works well in real project delivery is splitting responsibilities across agents with clear contracts.

In current VibeGov terms, this is really a coordinated Development + Exploration operating model:

  • the builder primarily runs in Development mode
  • the validator primarily runs in Exploration mode
  • release verification stays inside the Development delivery path as a shipping gate

The pattern

Use two independent lanes:

  1. Builder lane (shipping agent)

    • implements features/fixes
    • runs tests
    • produces commits/artifacts
  2. Validator lane (independent QA/spec agent)

    • behaves like a normal user
    • opens the app in browser and clicks real flows
    • checks every clickable action (plus keyboard paths)
    • compares behavior against OpenSpec/contracts
    • creates focused backlog issues for each mismatch

This is exactly the setup where one agent is busy building and another agent/device is continuously validating outcomes against real UI behavior.

Why this works

  • Separation of concern: builder optimizes for delivery, validator optimizes for correctness.
  • Reduced bias: independent validation catches assumptions the builder misses.
  • Faster backlog hardening: defects become concrete, reproducible issues quickly.
  • Spec quality improves: uncovered behaviors force explicit requirement IDs and test mappings.

Operating contract

For each discovered gap, enforce:

  1. Issue
  2. Spec update (append-only IDs)
  3. Validation evidence
  4. Commit linked to issue

No “done” without runnable proof.

Cadence

  • Builder runs continuously through priority backlog.
  • Validator runs on a fixed schedule (for example, every 45–60 minutes) and after major merges.
  • Release-aware checks can skip full reruns if build/version hasn’t changed.
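A hedged sketch of that schedule with the release-aware skip; the ~45-minute default and the build identifiers are illustrative:

```python
def plan_cycle(now_s, last_run_s, last_build, current_build, interval_s=2700):
    """Fixed validator cadence (default ~45 min) with a release-aware skip:
    a new build validates immediately; an unchanged build skips the full rerun."""
    if current_build != last_build:
        return "full"        # e.g. right after a major merge
    if now_s - last_run_s >= interval_s:
        return "light"       # scheduled slot, but build unchanged: skip full rerun
    return "skip"            # not due and nothing changed
```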

Minimum evidence bundle per validation cycle

  • audited screens list
  • action inventory (every clickable)
  • pass/fail per action
  • keyboard traversal evidence (Tab, Shift+Tab, Enter, Space)
  • persistence/mutation verification where actions claim to save, delete, sync, import, or reconfigure
  • issue files for failures with expected vs actual
  • spec coverage reconciliation notes
  • explicit completeness status for the validation scope

Required issue fields (for validator-created backlog items)

When the validator opens an issue, include these fields every time:

  • Screen/route: exact URL/route where failure occurred
  • Control type: button/link/icon/menu item/form field/dialog action
  • Expected intent: what should happen (route/state/data/error)
  • Actual result: what happened instead
  • Repro steps: shortest deterministic path
  • Evidence links: screenshot/video/report path
  • Spec link/ID: existing requirement ID or SPEC_GAP
  • Suggested fix path: likely file/module owner

This keeps backlog items implementation-ready and eliminates “cannot reproduce” churn.
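Those fields can be enforced with a small check before an issue is filed (a sketch; the snake_case field names are assumptions, and SPEC_GAP is the source's own marker for a missing requirement ID):

```python
REQUIRED_FIELDS = (
    "screen_route", "control_type", "expected_intent", "actual_result",
    "repro_steps", "evidence_links", "spec_id", "suggested_fix_path",
)

def validate_issue(issue):
    """Reject validator-created backlog items that omit any required field."""
    missing = [f for f in REQUIRED_FIELDS if not issue.get(f)]
    if missing:
        raise ValueError(f"issue missing fields: {missing}")
    return issue

complete = validate_issue({
    "screen_route": "/settings/profile",
    "control_type": "dialog action",
    "expected_intent": "saves profile and closes dialog",
    "actual_result": "dialog closes, nothing persisted",
    "repro_steps": "open /settings/profile, edit name, click Save",
    "evidence_links": ["reports/profile-save-fail.png"],
    "spec_id": "SPEC_GAP",
    "suggested_fix_path": "settings/ProfileDialog",
})
```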

UI layering checks you should always include

Agents often miss visual-layer defects that humans catch immediately. Make these first-class checks:

  • Dialog visibility: modal/drawer appears when triggered and remains visible while active
  • Focus trap: keyboard focus stays in dialog while open
  • Backdrop behavior: backdrop blocks underlying clicks while modal is active
  • Z-index correctness: dialogs/toasts/menus are not hidden behind headers/sidebars/dev overlays
  • Escape/Close behavior: Esc, close icon, and Cancel all behave consistently

If any layering issue is found, file a dedicated issue (don’t bury it under generic “UI bug”).

CI handoff pattern (dev bot → validator bot)

A robust release handoff for bot teams, with release verification treated as part of Development:

  1. Dev bot pushes issue-linked commit.
  2. Dev bot monitors pipeline trigger for up to 30 seconds.
    • Poll CI by commit SHA every ~5s.
  3. If CI run appears:
    • post run URL + SHA in issue evidence comment,
    • hand off to validator bot.
  4. If CI run fails early:
    • update same issue with failing job/step/log snippet,
    • fix immediately on the same ticket,
    • commit/push again with same issue prefix.
  5. If no CI run appears within 30s:
    • create/update P0 CI-trigger blocker issue,
    • stop downstream handoff until trigger is restored.

This prevents false “done” states where code is pushed but release validation never actually started.
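Steps 2–5 can be sketched as a polling loop; `fetch_run_by_sha` is a hypothetical CI client call, not a real API:

```python
import time

def watch_ci_trigger(sha, fetch_run_by_sha, timeout_s=30, poll_s=5,
                     sleep=time.sleep):
    """Poll CI by commit SHA, then hand off, fix, or raise a P0 trigger blocker."""
    waited = 0
    while waited <= timeout_s:
        run = fetch_run_by_sha(sha)              # poll CI by commit SHA (~every 5s)
        if run is not None:
            if run["status"] == "failed":
                return ("fix-on-same-ticket", run)     # step 4: fix immediately
            return ("handoff-to-validator", run)       # step 3: post run URL + SHA
        sleep(poll_s)
        waited += poll_s
    return ("p0-ci-trigger-blocker", None)             # step 5: stop downstream handoff

# Second poll finds a queued run, so the handoff proceeds:
runs = iter([None, {"url": "ci/run/7", "status": "queued"}])
action, run = watch_ci_trigger("abc123", lambda sha: next(runs),
                               sleep=lambda s: None)
```

The three return values map directly onto the three outcomes above, so the dev bot always ends the step with an explicit, visible state rather than a silent push.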

Practical tips

  • Keep one issue per failed behavior.
  • Keep commits scoped to one issue whenever possible.
  • Track unresolved blockers publicly in backlog (don’t hide them in chat).
  • Treat spec drift as a first-class defect.
  • For release workflows, always include commit SHA + CI run URL in handoff comments.

If you run this loop consistently, backlog quality improves while velocity stays high—because Development and Exploration happen in parallel, not serially.