Observability
Internally, Forze is already well instrumented: a runtime tracer, a transaction tracer, structured logs. One call extends that to OpenTelemetry, giving every operation a span and metrics tagged with identity and correlation, so your operations show up in any tracing or metrics backend.
OpenTelemetry is a core dependency (the logging layer already uses it), so this support is built in. You only bring an exporter.
Instrument every operation
Wrap the registry once, before you freeze it:
from forze.application.execution import instrument_operations
registry = instrument_operations(build_orders_registry())
frozen = registry.freeze()
That's the whole integration. It emits through the global OTel providers by
default; pass tracer= / meter= to target your own.
What you get
Every operation produces a span, named by its operation key and nested under
whatever span is already active (an incoming HTTP request, say). A failure
records the exception on the span and sets the span status to ERROR, then
re-raises the exception unchanged. The span carries:
| Attribute | Value |
|---|---|
| forze.operation / forze.operation.kind | the operation key; its kind (query or command) |
| forze.execution_id / forze.correlation_id / forze.causation_id | the invocation metadata |
| forze.tenant_id / forze.principal_id | the bound tenant and principal |
Alongside each span, two metrics are recorded: forze.operations (a counter) and
forze.operation.duration (a histogram, in ms), both labelled by operation, kind,
and outcome.
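To make those semantics concrete, here is a stdlib-only sketch of what a per-operation wrapper does: one span per call, the exception recorded and the status set to ERROR on failure, the original exception re-raised, and one count plus one duration sample labelled by operation, kind, and outcome. FakeSpan, Recorder, run_instrumented, the operation keys, and the "success"/"error" outcome values are all hypothetical. This mimics the behaviour described above; it is not Forze's implementation, and it leaves out span nesting.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an OTel span; not a Forze or OTel class."""
    name: str
    attributes: dict
    status: str = "UNSET"
    exception: Optional[BaseException] = None


@dataclass
class Recorder:
    """Collects what the wrapper emits: spans, a call count, and durations in ms."""
    spans: list = field(default_factory=list)
    operations: Counter = field(default_factory=Counter)  # analogue of forze.operations
    durations_ms: list = field(default_factory=list)      # analogue of forze.operation.duration


def run_instrumented(rec: Recorder, op_key: str, kind: str, fn: Callable, *args, **kwargs):
    """Run fn with one span, one count and one duration sample; re-raise failures."""
    span = FakeSpan(op_key, {"forze.operation": op_key, "forze.operation.kind": kind})
    rec.spans.append(span)
    outcome = "success"  # hypothetical outcome label values
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        outcome = "error"
        span.exception = exc   # record the exception on the span
        span.status = "ERROR"  # mark the span as failed
        raise                  # propagate the original exception unchanged
    finally:
        # Runs on success and failure alike, so every call is counted and timed.
        labels = (op_key, kind, outcome)
        rec.operations[labels] += 1
        rec.durations_ms.append((labels, (time.perf_counter() - start) * 1000))


rec = Recorder()
result = run_instrumented(rec, "orders.get", "query", lambda: 42)
try:
    run_instrumented(rec, "orders.create", "command", lambda: 1 / 0)
except ZeroDivisionError:
    pass  # the caller still sees the original exception type
```

Putting the metric updates in `finally` is the design point: failed calls are counted and timed too, so an error-rate panel and a latency histogram both come from the same two metrics.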
Resilience metrics
The resilience layer makes decisions worth watching: retries, rejections,
breaker trips, bulkhead backoff. instrument_resilience exports them as always-on
metrics, independent of any tracing gate, so a production process with tracing
off still reports them:
from forze.application.execution import instrument_resilience
instrument_resilience(ctx.resilience()) # once, when the scope is up
| Metric | What it carries |
|---|---|
| forze.resilience.events (counter) | every event (retry attempts, timeouts, rate-limit and bulkhead rejections, breaker transitions), labelled by event, policy, and route |
| forze.resilience.breaker.state (gauge) | breaker phase per policy/route: 0 closed, 1 half-open, 2 open |
| forze.resilience.bulkhead.queue_depth (gauge) | calls queued behind each bulkhead, sampled at collection |
| forze.resilience.bulkhead.limit (gauge) | the current adaptive-bulkhead concurrency limit |
| forze.resilience.hedge.delay (gauge) | the effective adaptive hedge delay (P² quantile estimate), in seconds |
Two reading notes. First, breaker_open counts both the transition to open and
every admission shed while the breaker is open, so its rate tracks shed load.
Second, a breaker that has never tripped reports no state at all, so read a
missing series as closed.
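Both reading notes translate directly into dashboard logic. The sketch below assumes you have already pulled the latest breaker-state gauge values and two snapshots of the breaker_open count out of your metrics backend into plain dicts; that data shape, the helper names, and the policy/route values are hypothetical. It decodes the 0/1/2 phase from the table, treats a missing series as closed, and turns two counter snapshots into a per-second shed rate.

```python
from collections import Counter

# Phase encoding from the table: 0 closed, 1 half-open, 2 open.
BREAKER_PHASES = {0: "closed", 1: "half-open", 2: "open"}


def breaker_phase(gauge_samples: dict, policy: str, route: str) -> str:
    """Decode forze.resilience.breaker.state for one policy/route.

    gauge_samples maps (policy, route) -> latest gauge value (hypothetical shape).
    A missing key means the breaker never tripped: closed by absence.
    """
    return BREAKER_PHASES[gauge_samples.get((policy, route), 0)]


def shed_rate(before: Counter, after: Counter, seconds: float) -> float:
    """Per-second rate of breaker_open events between two cumulative snapshots.

    breaker_open counts the open transition *and* every admission shed while
    open, so this approximates shed load rather than the number of trips.
    """
    return (after["breaker_open"] - before["breaker_open"]) / seconds


samples = {("payments", "charge"): 2}  # hypothetical scrape result
before = Counter(breaker_open=10)
after = Counter(breaker_open=70)        # 60 seconds later
```

Here `breaker_phase(samples, "orders", "list")` yields "closed" even though no series exists for that pair, and `shed_rate(before, after, 60)` is 1.0 shed admission per second.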
Tenant pool metrics
Routed clients keep one connection pool per tenant in a
bounded LRU, and evicting a pool is expensive: the next request for that tenant
rebuilds the connection from scratch. instrument_tenant_pools exports the churn
counters:
from forze.application.execution import instrument_tenant_pools
instrument_tenant_pools({"postgres": pg, "redis": redis})
Per client (labelled forze.client), it exports forze.tenancy.pool.size and
….capacity gauges, plus cumulative ….created, ….disposed, and
….evicted_explicit counters. The alert worth setting: a sustained
creation rate while size sits at capacity means the LRU is thrashing, with
hot tenants' pools evicted by cold one-off traffic and each rebuild paying for
full connection establishment. The fix is usually a larger max_cached_tenants;
the metric tells you when.
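The thrashing pattern is easy to reproduce with a toy LRU. TenantPoolLRU and simulate are hypothetical: they model a bounded per-tenant pool cache, not Forze's routed client, and capacity here plays the role the text assigns to max_cached_tenants. Each round touches three hot tenants and then one cold tenant that is never seen again.

```python
from collections import OrderedDict


class TenantPoolLRU:
    """Toy bounded LRU of per-tenant 'pools' that counts creations and evictions."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pools: OrderedDict = OrderedDict()
        self.created = 0
        self.evicted = 0

    def get(self, tenant: str) -> object:
        if tenant in self.pools:
            self.pools.move_to_end(tenant)   # hit: mark as most recently used
            return self.pools[tenant]
        if len(self.pools) >= self.capacity:
            self.pools.popitem(last=False)   # full: evict the least recently used
            self.evicted += 1
        self.created += 1                    # miss: pay full connection setup
        self.pools[tenant] = object()        # placeholder for a real pool
        return self.pools[tenant]


def simulate(capacity: int, rounds: int = 100) -> TenantPoolLRU:
    lru = TenantPoolLRU(capacity)
    hot = ["t1", "t2", "t3"]
    for i in range(rounds):
        for tenant in hot:
            lru.get(tenant)
        lru.get(f"cold-{i}")                 # one-off tenant, never seen again
    return lru


thrashing = simulate(capacity=3)
roomy = simulate(capacity=4)
```

With capacity 3, every access misses: 400 creations over 100 rounds while size sits at capacity, which is exactly the alert condition. One extra slot (capacity 4) keeps the hot pools resident, and only the unavoidable cold tenants pay for a new pool (103 creations).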
Document L1 metrics
The in-process L1 document cache exports its counters the same way:
from forze.application.integrations.document import instrument_document_l1
instrument_document_l1()
Per document (labelled forze.document), it exports forze.cache.l1.size /
….capacity gauges and cumulative ….hits / ….misses / ….evictions
counters. The hit rate shows whether the L1 is earning its staleness budget.
Sustained evictions at full capacity together with a sagging hit rate are the
signature of scan pollution (one-off reads pushing hot entries out), and the
signal to switch the eviction policy to the bundled W-TinyLFU store or raise
capacity.
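Because the hits, misses, and evictions counters are cumulative, the hit rate has to be computed over a window between two readings. The sketch below assumes you can read one document's L1 metrics into a snapshot object; L1Snapshot, hit_rate, looks_like_scan_pollution, the baseline, and the drop threshold are all hypothetical, and a single window only approximates "sustained".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class L1Snapshot:
    """Hypothetical point-in-time reading of one document's L1 metrics."""
    size: int
    capacity: int
    hits: int       # cumulative
    misses: int     # cumulative
    evictions: int  # cumulative


def hit_rate(prev: L1Snapshot, cur: L1Snapshot) -> Optional[float]:
    """Hit rate over the window between two snapshots; None if there was no traffic."""
    hits = cur.hits - prev.hits
    lookups = hits + (cur.misses - prev.misses)
    return hits / lookups if lookups else None


def looks_like_scan_pollution(prev: L1Snapshot, cur: L1Snapshot,
                              baseline_hit_rate: float, drop: float = 0.2) -> bool:
    """Full cache, evictions during the window, and a hit rate well below baseline."""
    rate = hit_rate(prev, cur)
    return (
        cur.size >= cur.capacity
        and cur.evictions > prev.evictions
        and rate is not None
        and rate < baseline_hit_rate - drop
    )


prev = L1Snapshot(size=1000, capacity=1000, hits=9000, misses=1000, evictions=500)
scanned = L1Snapshot(size=1000, capacity=1000, hits=9400, misses=1600, evictions=1100)
healthy = L1Snapshot(size=1000, capacity=1000, hits=9900, misses=1100, evictions=600)
```

In the `scanned` window the hit rate falls to 0.4 while the cache churns at capacity, so the check fires; the `healthy` window is also full and evicting, but its 0.9 hit rate shows the L1 is still earning its keep.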
Logs correlate for free
configure_logging(otel_config=...) injects the active span's trace_id and
span_id into every log line. Because instrument_operations is what starts the
span, your structured logs line up with the operation trace automatically, with
no extra wiring.
Bring your own exporter
By default Forze emits to the global tracer and meter providers; your application owns the SDK and the choice of exporter (OTLP, Prometheus, console, or whatever your backend speaks). The OTel API and SDK ship with Forze, so you add only the exporter package and the few lines of standard OTel setup that point the providers at it.