Shutdown & fleets
Processes get told to stop — deploys, autoscaling, a node going away. And in production there is rarely one process: the same app runs as N replicas behind a load balancer. The runtime treats both as first-class: shutdown drains instead of dropping work, and a declared fleet posture catches the mistakes that only N replicas can make.
Graceful drain¶
ExecutionRuntime.shutdown() — and therefore scope() exit and
runtime_lifespan — does not tear infrastructure down under in-flight work.
It drains first:
- The scope stops admitting new top-level invocations. They fail with a
retryable
throttled(code="draining") — a 429 at the FastAPI edge, a requeue-worthy nack for a queue consumer. - In-flight operations get a bounded window —
drain_timeout, default 10 seconds — to finish before lifecycle teardown closes the clients they depend on. - Lifecycle shutdown runs as usual, in reverse wave order.
runtime = build_runtime(..., drain_timeout=timedelta(seconds=20))
An operation that is already running keeps all of its machinery: nested dispatch deliberately rides the outer invocation's slot, so draining never starves an admitted operation of its own dispatch chains. Zero in-flight work exits immediately; a window that expires logs the leftover count and proceeds — shutdown is never blocked indefinitely.
Readiness¶
The load balancer should stop routing before the drain window starts. The
runtime exposes its state — runtime.ready and runtime.draining — and the
FastAPI integration turns it into a probe:
from forze_fastapi.routes import attach_readiness_route
attach_readiness_route(router, runtime) # GET /readyz
200 while a scope is active and not draining; 503 draining once shutdown
flips the gate, 503 unavailable before the scope exists. Point your
orchestrator's readiness check here and the rollout sequence takes care of
itself: routing stops, in-flight work drains, teardown runs.
Declare the fleet posture¶
Some startup work is safe in one process and a stampede in twenty — N replicas
all running CREATE INDEX or a data migration at the same moment. Declaring
the posture makes that a composition-time error instead of a 3 a.m. incident:
from forze.application.execution import DeploymentProfile
runtime = build_runtime(..., deployment=DeploymentProfile.FLEET)
Under FLEET, assembly fails for any lifecycle step marked
mutates_shared_state=True that is not also singleton_guarded. The markers
are declared by the step author — mutation can't be detected structurally, so
the validation is honest-by-declaration: mark the steps that touch shared
backends, and the profile enforces that each one is guarded.
Singleton lifecycle steps¶
The guard itself ships in forze_kits: wrap a step in a distributed lock so
one replica runs it and the rest skip —
from forze.application.contracts.dlock import DistributedLockSpec
from forze_kits.lifecycle import singleton_lifecycle_step
step = singleton_lifecycle_step(
ensure_indexes_step,
spec=DistributedLockSpec(name="ensure-indexes"), # resolved from the scope
owner=instance_id,
)
You pass the lock spec, not a live port: the guard resolves the command port
from the execution context (ctx.dlock.command(spec)) at startup, so it slots
into a lifecycle plan that's assembled before any scope exists.
The first replica to acquire the lock runs the startup and releases it; replicas that find the lock held skip — the holder is doing the work. Shutdown later runs only on the replica whose startup actually executed. The step must be idempotent ("ensure"-style): a replica that starts after the holder released will acquire and run it again. Size the lock's TTL to comfortably exceed the step's duration — no heartbeat extends it here.
Migrations are deploy steps
singleton_lifecycle_step is for ensure-style work: indexes, queue
declarations, seed data. One-shot work like a schema migration wants
run-exactly-once semantics, which a skip-if-held lock does not give —
run it as a deploy step in your pipeline, not as a runtime step.
What's shared, what's per-process¶
Most coordination state already lives in your backends (idempotency records, distributed locks, the outbox). The resilience layer's state is process-local by default, and each piece has a deliberate fleet answer:
| State | Default | In a fleet |
|---|---|---|
| Circuit breaker | per-process | share via redis_circuit_breaker_store — one replica's open circuit protects them all |
| Rate limits | per-process (fleet rate = permits × replicas) |
share via redis_rate_limit_store — the declared rate becomes the fleet's rate |
| Bulkheads | per-process | stays local by design — fleet capacity is max_concurrency × replicas, and adaptive bulkheads converge across uncoordinated replicas |
Wiring for the shared stores is on the resilience page. The framework's periodic loops — the outbox relay tick, consumer crash-restart backoff — are jittered out of the box, so N replicas don't synchronize into a thundering herd against the same claim query.