The auto-suspension safeguard

If an agent fails three calls in a row, AgentValet suspends it. Its next call comes back denied, not because a scope is missing, but because the agent itself has been suspended. To get it running again, you re-enable it in the dashboard.

This is the circuit breaker. Its job is to limit the blast radius of an agent that’s gone wrong, before you’ve had time to notice.

When it trips

A single counter per agent: circuit_breaker_failures. Every failed call increments it. Three failures and the agent’s status flips to suspended.
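To make the mechanics concrete, here is a minimal sketch of that bookkeeping in TypeScript. Only the circuit_breaker_failures counter and the suspended status are names from this page; the AgentRecord shape, the FAILURE_THRESHOLD constant, and the function names are illustrative, not the actual schema.

```ts
// Minimal sketch of the per-agent breaker bookkeeping.
// AgentRecord and FAILURE_THRESHOLD are illustrative names, not the real schema.
const FAILURE_THRESHOLD = 3;

interface AgentRecord {
  status: "active" | "suspended";
  circuit_breaker_failures: number;
}

function recordFailure(agent: AgentRecord): void {
  agent.circuit_breaker_failures += 1;
  if (agent.circuit_breaker_failures >= FAILURE_THRESHOLD) {
    agent.status = "suspended"; // the breaker trips; further calls are denied
  }
}

function recordSuccess(agent: AgentRecord): void {
  agent.circuit_breaker_failures = 0; // any success clears the slate
}
```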

What counts as a failure:

  • Signature verification failed (wrong key, expired JWT)
  • Permission check failed (scope not granted, agent revoked)
  • Proxy-side errors that prevent the call from reaching the platform
  • Upstream platform errors that survive Concierge’s retry logic

What does not count:

  • pending_approval returns (the call was queued, not failed)
  • Transient upstream errors that Concierge retried successfully
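As a rough sketch of that classification (the CallOutcome union and isBreakerFailure are hypothetical names; the real proxy presumably decides this from richer internal state):

```ts
// Hypothetical sketch of the failure classification described above.
type CallOutcome =
  | { kind: "signature_failed" }                           // wrong key, expired JWT
  | { kind: "permission_denied" }                          // scope not granted, agent revoked
  | { kind: "proxy_error" }                                // call never reached the platform
  | { kind: "upstream_error"; exhaustedRetries: boolean }  // after Concierge's retry logic
  | { kind: "pending_approval" }                           // queued, not failed
  | { kind: "ok" };

function isBreakerFailure(outcome: CallOutcome): boolean {
  switch (outcome.kind) {
    case "signature_failed":
    case "permission_denied":
    case "proxy_error":
      return true;
    case "upstream_error":
      // counts only if the error survived Concierge's retries
      return outcome.exhaustedRetries;
    default:
      return false; // pending_approval and successful calls never count
  }
}
```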

When it resets

Any successful call resets the counter to zero. There's no time-based decay: the agent stays at “2 failures in the bank” indefinitely until either a third failure trips the breaker or a success clears the slate.

This means a healthy agent can sit at up to two failures at any given time without consequence. Only consecutive trouble, three failures with no success between them, actually trips the breaker.
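Continuing the sketch above, the consecutive-failure semantics play out like this:

```ts
// Two failures, a success, then three failures: only the last run trips it.
const agent: AgentRecord = { status: "active", circuit_breaker_failures: 0 };

recordFailure(agent); // counter: 1
recordFailure(agent); // counter: 2
recordSuccess(agent); // counter: 0, slate cleared
recordFailure(agent); // counter: 1
recordFailure(agent); // counter: 2
recordFailure(agent); // counter: 3, status flips to "suspended"

console.log(agent.status); // "suspended"
```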

What the agent sees

When the breaker is open, the agent’s next call gets a denial. The reason in the response is agent_suspended or circuit_breaker_open, depending on the path. The error envelope includes a correlation_id the agent can include in a self-report, and the agent’s own MCP server may surface this to you naturally (“AgentValet says this agent is suspended — want me to lodge a diagnostic?”).
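The envelope might look roughly like this. Only reason and correlation_id are named above; treat every other field and value as an assumption:

```ts
// Illustrative shape of the denial an agent sees while the breaker is open.
// Only `reason` and `correlation_id` are documented; the rest is assumed.
interface DenialEnvelope {
  allowed: false;
  reason: "agent_suspended" | "circuit_breaker_open";
  correlation_id: string; // quote this in a self-report or diagnostic
  message?: string;
}

const denial: DenialEnvelope = {
  allowed: false,
  reason: "agent_suspended",
  correlation_id: "corr-example", // placeholder value
  message: "Agent suspended after 3 consecutive failed calls.",
};
```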

The agent cannot un-suspend itself. There’s no retry strategy that gets past it. That’s the design.

Bringing it back

Open Agents → [the agent] → Resume in the dashboard. The status flips back to active, the failure counter resets to zero, and the agent’s next call goes through normally.

Before you do that, it’s worth understanding why the breaker tripped. Open the agent’s audit log and look at the failed calls. Common patterns:

  • Same scope failing repeatedly — the agent was granted the scope but the upstream API is rejecting the call. Often an expired OAuth token; look for reauth_needed on the platform connection.
  • Bad signatures — the agent is using an old or wrong private key. Re-register, get a fresh key.
  • Scope-not-granted on a scope the agent expected — the agent’s permission was revoked between when it was registered and now.

Resuming an agent without understanding why it failed often just trips the breaker again within minutes.
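If the log is long, a small grouping pass can surface the dominant pattern faster than eyeballing it. This is a hypothetical sketch: the AuditEntry shape is assumed, and you'd adapt it to however you export audit entries.

```ts
// Hypothetical audit-log triage: group failed calls by reason and scope
// so the dominant pattern (expired token, bad signature, revoked scope) stands out.
interface AuditEntry {
  outcome: "ok" | "failed" | "pending_approval";
  reason?: string; // e.g. "signature_failed", "scope_not_granted"
  scope?: string;
}

function failuresByReason(entries: AuditEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    if (entry.outcome !== "failed") continue;
    const key = `${entry.reason ?? "unknown"} / ${entry.scope ?? "no scope"}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```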

Why three

It’s a tradeoff. One failure is too aggressive — transient blips happen and you don’t want every flaky network to suspend your fleet. Five or ten is too lenient — by the time you’ve noticed, a misbehaving agent has done meaningful damage. Three is the smallest number that filters noise without giving a runaway agent room to spiral.

The threshold isn’t currently configurable per agent. If your usage suggests it should be, that’s the kind of feedback worth sending.
