Patterns
πŸ”„

Graph State Machines

Finite state machines implemented as graphs for workflow control

Complexity: mediumPattern

Core Mechanism

Graph State Machines represent control flow as states (nodes) and event-driven transitions (edges) with guard conditions and actions. Statecharts augment this with hierarchical states, orthogonal regions (parallelism), and history to capture complex real-world behavior. This gives deterministic, auditable flow control for agent/tool invocations, approvals, retries, and human-in-the-loop checkpoints.

Workflow / Steps

  1. Enumerate states (initial, intermediate, terminal) and define state/context schema.
  2. List events; define transitions with guards and actions (include error/timeouts as events).
  3. Design the graph: priorities/defaults per state; add hierarchy/parallel regions if using statecharts.
  4. Implement pure transition logic; run side effects in effect handlers with retries and idempotency.
  5. Persist state + minimal context; version the machine; enable visualization from source.
  6. Test transitions (table-driven), fuzz guards, and validate invariants; add observability.

Best Practices

Minimize state explosion: encode data checks in guards, not separate states.
Use statecharts for hierarchy/parallelism; leverage history for resumption.
Keep transition functions pure; isolate I/O and side effects with idempotency keys.
Model failures/timeouts explicitly; add compensations and rollback paths.
Source-of-truth diagrams from machine definition; review diffs in PRs.
Persist only necessary context; externalize large artifacts via URIs; protect sensitive data.
Comprehensive tests for guards/transitions; property tests for safety invariants.

When NOT to Use

Simple linear flows where a straight-line function or DAG is clearer and cheaper.

Heavy dataflow/ETL where stream/DAG engines fit better than event-guard machines.

Ultra low-latency paths sensitive to event loop/guard evaluation overhead.

Strict ACID cross-service transactions; prefer workflow engines with sagas/transactions.

Common Pitfalls

State explosion from encoding fine-grained data conditions as states rather than guards.

Ambiguous or overlapping guards; missing default transitions β†’ nondeterminism.

Embedding side effects inside transitions β†’ non-determinism and retry hazards.

Missing persistence/idempotency β†’ unrecoverable or duplicated effects on retry.

Parallel regions without proper isolation β†’ races and deadlocks.

Key Features

Deterministic event→transition semantics with guards and actions
Hierarchical and parallel states (statecharts); history/restoration
Explicit error/timeout and compensation transitions
Visual models and formal reasoning about reachability/invariants
Composable submachines; reuse via regions and invoked machines

KPIs / Success Metrics

Transition coverage and unreachable-state detection in tests.
Mean steps-to-completion; time-in-state distributions; p50/p95 per transition.
Failure/compensation transition rates; replay/resume success rates.
For LLM steps: tokens per state and per run; cost per run; cache hit rate.

Token / Resource Usage

Drivers: per-state LLM/tool calls, validation/self-check prompts, and context growth in state data.

Controls: per-state token budgets; summarize/trim context; cache deterministic steps; bound concurrency.

Storage/CPU: persist minimal state + references; avoid logging sensitive context; instrument per-transition cost.

Best Use Cases

Order/approval/fulfillment flows with retries, compensations, and human approvals.

Agent/dialog controllers gating tool calls and safety checks per step.

Compliance workflows requiring deterministic replay and audit trails.

Device/process controllers with hierarchical/parallel modes.

References & Further Reading

Patterns

closed

Loading...

Built by Kortexya