Agentic & Event-Driven AI Systems

Designing stateful AI systems that reason, decide, and act over time — reliably and at scale.

Many discussions around “agentic AI” focus on orchestrating LLM calls. In production systems, that framing is insufficient.

Agentic systems that work reliably are long-running, stateful system components. They react to signals, maintain durable state, coordinate with other components, and recover deterministically from failure.

This page describes how we design agentic systems as distributed systems, using event-driven and stateful architectures — with LLMs as reasoning components, not as controllers.

What Organizations Gain

Production-grade agentic systems require distributed-systems thinking, not just prompt engineering.

Fault Tolerance

Agents are designed as long-running processes with durable, externally managed state. Recovery from failure is deterministic and does not rely on ad hoc retry loops.

Replayability

Historical inputs can be replayed. State transitions are explicit. Behavior can be audited and reasoned about — enabling safe evolution and correction of logic.

Determinism

Agents react to signals over time. Decisions are emitted as outputs. State is checkpointed and recoverable — not reconstructed from chat history.

Observability

Agent behavior, state changes, and decisions are traceable. Debugging and introspection are first-class concerns, not afterthoughts.

Governance

LLMs are invoked selectively, not as controllers. Context is retrieved explicitly. Reasoning is constrained to valid options. Results are validated deterministically.

Cost Control

Memory is managed deliberately. LLMs are used when they add value, not as a default step. This leads to predictable cost and bounded failure modes.

What “Agentic Systems” Mean in Production

In production, an agent is not a prompt template, a polling loop around an LLM, or a stateless function.

An agent is a long-running process with durable, externally managed state that reacts to signals over time and produces decisions, actions, or new signals.

Agents live inside the system, not at its edges.

This introduces non-negotiable requirements: fault tolerance, replayability, determinism, observability, governance, and cost control.

These are distributed-systems concerns — not prompt-engineering problems.

Signal-Driven vs Prompt-Driven Agents

Many agent implementations are prompt-driven. Signal-driven agents work differently.

Prompt-Driven Approach

Many agent implementations today are prompt-driven: receive input, reconstruct context, call the model, return a response. This approach breaks down once decisions span multiple steps, context accumulates over time, workflows branch dynamically, or failures must be recovered cleanly.

Signal-Driven Agents

Agents react to signals (events, batches, replays, scheduled runs). Signals trigger explicit state transitions. Decisions are emitted as outputs. State is checkpointed and recoverable. Whether input arrives as a stream, batch, bulk upload, or replayed dataset is irrelevant.
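As a minimal sketch of the idea above, the transition below is a pure function from (state, signal) to (new state, decisions). The signal kinds, the status values, and the names `Signal`, `AgentState`, `transition`, and `run` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Signal:
    kind: str      # hypothetical kinds: "order_received", "payment_confirmed"
    payload: dict


@dataclass(frozen=True)
class AgentState:
    status: str = "new"
    seen: int = 0  # number of signals processed so far


def transition(state: AgentState, signal: Signal) -> Tuple[AgentState, List[str]]:
    """Pure function: the same state and signal always give the same result."""
    state = replace(state, seen=state.seen + 1)
    if state.status == "new" and signal.kind == "order_received":
        return replace(state, status="awaiting_payment"), ["request_payment"]
    if state.status == "awaiting_payment" and signal.kind == "payment_confirmed":
        return replace(state, status="done"), ["ship_order"]
    return state, []  # unknown or out-of-order signal: no decision emitted


def run(signals: Iterable[Signal], state: AgentState = AgentState()):
    """Feed signals one by one; the source (stream, batch, replay) does not matter."""
    decisions: List[str] = []
    for signal in signals:
        state, emitted = transition(state, signal)
        decisions.extend(emitted)
    return state, decisions
```

Because `run` only needs an iterable, the same logic applies whether the signals come from a live stream, a batch file, or a replayed log.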

Stateful Decision Logic

What matters is stateful decision logic over time, not ingestion mode. Agents maintain durable state, enabling deterministic recovery, replayability, and correct behavior under real-world conditions.

Stateful Agents, Determinism & Replay

Agent state is not chat history.

Agent State in Production

In production systems, agent state includes decisions already taken, partial workflow progress, accumulated facts or signals, validation outcomes, and coordination metadata. This state must be durable, replayable, inspectable, and versionable.
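One way to picture such state is a structured, versioned record that serializes cleanly. The sketch below is an assumption for illustration: the field names, `SCHEMA_VERSION`, and the JSON encoding are not taken from any particular system.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

SCHEMA_VERSION = 1  # hypothetical version tag used to detect incompatible state


@dataclass
class AgentState:
    schema_version: int = SCHEMA_VERSION
    decisions: List[str] = field(default_factory=list)          # decisions already taken
    workflow_step: str = "start"                                # partial workflow progress
    facts: Dict[str, str] = field(default_factory=dict)         # accumulated facts or signals
    validations: Dict[str, bool] = field(default_factory=dict)  # validation outcomes
    coordination: Dict[str, str] = field(default_factory=dict)  # coordination metadata


def to_json(state: AgentState) -> str:
    """Stable, inspectable encoding (sorted keys make diffs readable)."""
    return json.dumps(asdict(state), sort_keys=True)


def from_json(raw: str) -> AgentState:
    data = json.loads(raw)
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported state version: {data.get('schema_version')}")
    return AgentState(**data)
```

Rejecting unknown versions on load keeps state migrations an explicit step rather than a silent reinterpretation.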

Explicit State Transitions

We design agents so that state transitions are explicit, failures recover deterministically, and historical inputs can be replayed. Behavior can be audited and reasoned about at every step.
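Replay with an audit trail can be sketched in a few lines. This is an illustrative toy, not a production design: state and signals are plain integers, and `add_capped` is a hypothetical transition function.

```python
from typing import Callable, Iterable, List, Tuple

Transition = Callable[[int, int], int]


def add_capped(state: int, signal: int) -> int:
    """Hypothetical transition: accumulate signal values, capped at 100."""
    return min(state + signal, 100)


def replay(log: Iterable[int], transition: Transition, initial: int = 0):
    """Re-apply historical signals and record every (signal, before, after) step."""
    state = initial
    audit: List[Tuple[int, int, int]] = []
    for signal in log:
        new_state = transition(state, signal)
        audit.append((signal, state, new_state))
        state = new_state
    return state, audit
```

Since the transition is deterministic, replaying the same log always yields the same final state and the same audit trail, which is what makes each step inspectable after the fact.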

Safe Evolution & Correction

Explicit state transitions and replayable inputs enable safe evolution, backfills, and correction of logic. Systems can be improved without losing historical context or breaking existing workflows.

Event-Driven Agent Frameworks in Practice

We implement agentic systems using stateful, event-driven runtimes.

Apache Flink

Using Flink, agents are implemented as long-running, stateful operators driven by signals (streams, batches, replays) with exactly-once state guarantees. This enables durable agent state, deterministic recovery, controlled reprocessing, and coordination across multiple agents.
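The checkpoint-and-replay idea behind deterministic recovery can be illustrated in plain Python. The sketch below does not use Flink's API; `CheckpointedAgent` is a hypothetical class, and the checkpoint is kept in memory here purely for brevity, where a real runtime would persist it durably.

```python
from typing import List, Tuple


class CheckpointedAgent:
    """Toy agent: sums signals and tracks how many it has applied."""

    def __init__(self) -> None:
        self.total = 0
        self.offset = 0  # position in the input log after the last applied signal
        self._checkpoint: Tuple[int, int] = (0, 0)

    def process(self, signal: int) -> None:
        self.total += signal
        self.offset += 1

    def checkpoint(self) -> None:
        # State and input position are saved together, so they stay consistent.
        self._checkpoint = (self.total, self.offset)

    def recover(self, log: List[int]) -> None:
        # Restore the last consistent snapshot, then re-apply everything after it.
        self.total, self.offset = self._checkpoint
        for signal in log[self.offset:]:
            self.process(signal)
```

Saving the input position with the state is the key design choice: after a restore, each signal past the checkpoint is applied exactly once to the recovered state.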

Akka & Event-Sourced Agents

Akka provides a complementary model: isolated agents via actors, explicit supervision and lifecycle management, event sourcing as a first-class concept, and strong modeling of command/event flows. It is particularly effective for agent hierarchies and complex business logic.
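The command/event split at the heart of event sourcing can be sketched without any framework. The code below is plain Python, not Akka's API; `BudgetAgent`, `Approve`, `Approved`, and `Rejected` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Approve:   # command: a request that may be refused
    amount: int


@dataclass(frozen=True)
class Approved:  # event: a fact that has happened
    amount: int


@dataclass(frozen=True)
class Rejected:  # event: the refusal is recorded as a fact too
    amount: int
    reason: str


Event = Union[Approved, Rejected]


class BudgetAgent:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.spent = 0
        self.journal: List[Event] = []

    def handle(self, cmd: Approve) -> Event:
        """Command handler: decides which event to record, never mutates state itself."""
        if self.spent + cmd.amount > self.limit:
            event: Event = Rejected(cmd.amount, "limit exceeded")
        else:
            event = Approved(cmd.amount)
        self.journal.append(event)
        self.apply(event)
        return event

    def apply(self, event: Event) -> None:
        """Event handler: the only place where state changes."""
        if isinstance(event, Approved):
            self.spent += event.amount

    @classmethod
    def recover(cls, limit: int, journal: List[Event]) -> "BudgetAgent":
        """Rebuild state by re-applying journaled events, without re-deciding."""
        agent = cls(limit)
        for event in journal:
            agent.journal.append(event)
            agent.apply(event)
        return agent
```

Recovery replays events rather than commands, so past decisions are preserved even if the decision rules in `handle` change later.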

LLMs as Components

In both cases, LLMs are invoked by agents — not vice versa. Agents control execution flow, manage state, and decide when reasoning is required. This ensures reproducibility and bounded failure modes.

LLMs as Reasoning Components

In production agentic systems, LLMs do not own state or control execution flow.

Agent Decision Cycle

A typical agent decision cycle: receive a signal, evaluate current state, decide whether reasoning is required, retrieve constrained context, invoke the LLM, validate and structure the output, update state, and emit actions or new signals.
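The cycle above can be sketched as a single function with the LLM passed in as a plain callable. Everything specific here is an assumption for illustration: the `ALLOWED` actions, the amount threshold of 100, the shape of the signal and state dictionaries, and the fallback to "escalate".

```python
import json
from typing import Callable, Dict, List, Tuple

ALLOWED = {"approve", "escalate"}  # hypothetical set of valid actions


def decision_cycle(
    state: Dict, signal: Dict, llm: Callable[[str], str]
) -> Tuple[Dict, List[str]]:
    # Receive a signal and evaluate current state: skip signals already handled.
    if signal["id"] in state["handled"]:
        return state, []
    # Decide whether reasoning is required (hypothetical rule: small amounts do not need it).
    if signal["amount"] < 100:
        action = "approve"
    else:
        # Retrieve constrained context: only the fields and options the model may use.
        prompt = json.dumps({"amount": signal["amount"], "options": sorted(ALLOWED)})
        # Invoke the LLM (any callable that maps a prompt string to a reply string).
        raw = llm(prompt)
        # Validate and structure the output; fall back deterministically if invalid.
        action = raw.strip().lower()
        if action not in ALLOWED:
            action = "escalate"
    # Update state, then emit actions.
    new_state = {"handled": state["handled"] | {signal["id"]}}
    return new_state, [f"{action}:{signal['id']}"]
```

Passing the model in as a callable keeps the agent in control of execution flow and makes the cycle testable with a stub in place of a real model.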

Reproducibility & Auditability

Keeping the agent in control of the decision cycle ensures reproducibility and auditability. Every decision can be traced back to its inputs, state, and reasoning process. Behavior is inspectable and verifiable.

Bounded Failure Modes

The result is bounded failure modes and predictable cost. LLMs are invoked selectively, not as a default step. Context is constrained, and outputs are validated deterministically.

Memory, Context & Token Optimization

Agentic systems must manage memory deliberately.

Layered Memory Model

  • Agent state: durable, structured, replayable system state.
  • Interaction context: context scoped to a specific workflow or decision step.
  • Long-term memory: persisted knowledge such as historical decisions, user profiles, or domain facts.

Context Retrieval as Constraint

Agents often reason within boundaries defined by system or user context: schemas, allowed categories, validation rules, processing constraints. Context is retrieved explicitly, reasoning is constrained to valid options, and results are validated deterministically.
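Deterministic validation of a model reply might look like the sketch below. The expected JSON shape, the `ALLOWED_CATEGORIES` set, and the `validate` function are assumptions for illustration, not a prescribed schema.

```python
import json
from typing import Optional

ALLOWED_CATEGORIES = {"billing", "shipping", "other"}  # hypothetical allowed options


def validate(raw: str) -> Optional[dict]:
    """Return a normalized result, or None if the reply breaks any rule."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    category = data.get("category")
    confidence = data.get("confidence")
    # The category must be one of the options the model was offered.
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return None
    # The confidence must be a real number in [0, 1]; booleans are rejected explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0 <= confidence <= 1:
        return None
    return {"category": category, "confidence": float(confidence)}
```

A `None` result gives the agent a single, well-defined failure signal to act on, for example by retrying with tighter context or routing to a fallback.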

Token Optimization

Deliberate memory management and constrained context lead to higher reliability, lower token usage, and clearer failure modes. LLMs are used when they add value, not as a default step for every operation.
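One simple way to bound token usage is to select context under an explicit budget. The sketch below is an assumption for illustration: it approximates token count by whitespace-separated words, and the priority scheme and `select_context` name are hypothetical.

```python
from typing import List, Tuple


def select_context(items: List[Tuple[int, str]], budget: int) -> Tuple[List[str], int]:
    """Greedily keep the highest-priority snippets that fit the budget.

    items: (priority, text) pairs; a higher priority is considered first.
    Cost is approximated as the number of whitespace-separated words.
    """
    chosen: List[str] = []
    used = 0
    for _priority, text in sorted(items, key=lambda item: -item[0]):
        cost = len(text.split())
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen, used
```

A real system would use the model's own tokenizer for the cost estimate; the point of the sketch is that the upper bound on context size is set by the agent, not discovered at the model call.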

Multi-Agent Coordination & Dependencies

Real-world systems rarely involve a single agent.

Explicit Dependencies & Ordering

We design multi-agent systems with explicit dependencies, ordering guarantees, backpressure handling, and failure isolation. Coordination is deterministic, not emergent.

Common Patterns

Common patterns include staged pipelines, conditional branching, dependency-aware scheduling, and background rebuilds and reprocessing. These patterns borrow from distributed DAG execution and event-driven coordination.
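Dependency-aware scheduling can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The agent names in `DEPS` and the `run_stages` helper are hypothetical; agents within a stage run sequentially here for simplicity.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each agent maps to the set of agents it depends on.
DEPS: Dict[str, Set[str]] = {
    "extract": set(),
    "classify": {"extract"},
    "enrich": {"extract"},
    "decide": {"classify", "enrich"},
}


def run_stages(deps: Dict[str, Set[str]], run_agent: Callable[[str], None]) -> List[List[str]]:
    """Run agents stage by stage; an agent starts only after all its dependencies finish."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a deterministic order within a stage
        for name in ready:
            run_agent(name)
            sorter.done(name)
        stages.append(ready)
    return stages
```

Agents in the same stage have no dependency on each other, so a runtime could execute them in parallel while still honoring the ordering between stages.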

Coordinated Intelligence

The outcome is coordinated intelligence, not uncontrolled autonomy. Agents work together predictably, with clear boundaries and failure isolation.

Operating Agentic Systems in Production

Agentic systems are operational systems.

Observability & Debugging

We design for observability of agent behavior, tracing of decisions and state changes, and debugging and introspection. Every action is traceable to its cause.

Controlled Evolution

We support controlled rollouts and upgrades, rollback and replay strategies, and governance and access control. Agents can be versioned, deployed blue/green, shadow-executed, and migrated gradually.
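Shadow execution can be sketched as running a candidate version on the same inputs as the live version and recording only the disagreements. The `shadow_run` helper and the example decision rules in the usage note are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def shadow_run(
    signals: Iterable[int],
    live: Callable[[int], str],
    candidate: Callable[[int], str],
) -> Tuple[List[str], List[Tuple[int, str, str]]]:
    """Return live outputs plus (signal, live_output, candidate_output) for each mismatch."""
    outputs: List[str] = []
    diffs: List[Tuple[int, str, str]] = []
    for signal in signals:
        live_out = live(signal)
        outputs.append(live_out)  # only the live decision is ever emitted
        try:
            cand_out = candidate(signal)
        except Exception as exc:  # a failing candidate must not affect the live path
            cand_out = f"error: {exc}"
        if cand_out != live_out:
            diffs.append((signal, live_out, cand_out))
    return outputs, diffs
```

For example, with a live rule that returns "hi" for values above 10 and a candidate that returns "hi" for values of 10 or more, only the input 10 would show up in the list of differences, which can then be reviewed before the candidate is promoted.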

Long-Term Operation

Controlled evolution allows systems to change safely over long lifecycles. Agents are designed to run for years, not minutes, and to be improved continuously without disruption.

When Agentic Systems Make Sense (and When They Don’t)

Agentic systems are effective when processing involves multiple steps.

When Agents Make Sense

Agentic systems are effective when processing involves multiple steps, decisions depend on accumulated context, workflows branch dynamically, reasoning must be combined with deterministic validation, and behavior must be replayable and auditable.

Trigger Modes

Agents can be triggered by continuous streams, scheduled batch executions, bulk uploads, or historical reprocessing. The deciding factor is stateful decision logic, not ingestion mode.

When to Avoid Agents

Agentic systems are not the right choice when logic is purely stateless, processing is a single deterministic transformation, or no coordination or branching is required. We deliberately avoid agents where simpler architectures are sufficient.

Technologies & Frameworks

Production-grade, pluggable by design.

Apache Flink

Stateful operators with exactly-once guarantees for agent execution.

Akka

Event-sourced agents and supervision for hierarchical systems.

PostgreSQL

Structured data storage for agent state and operational data.

MongoDB

Document storage for flexible agent context and configuration.

Apache Iceberg

Historical context and analytical lookups for replayable decision inputs.

Apache Paimon

Streaming table storage for agent state and context.

Pinecone

Vector database for semantic search and long-term memory.

Milvus

Open-source vector database for embedding storage and retrieval.

Weaviate

Vector database with native AI integration for agent memory.

Qdrant

High-performance vector database for fast semantic retrieval.

Neo4j

Graph database for relationship modeling and knowledge graphs.

Apache JanusGraph

Distributed graph database for large-scale relationship tracking.

How This Expertise Is Applied

This expertise is applied to:

  • decision intelligence systems
  • multi-step processing pipelines
  • intelligent automation
  • AI-assisted operational platforms
  • coordinated agent-based systems

It integrates naturally with:

Frequently Asked Questions

How are agentic systems different from LLM orchestration?

Agentic systems are long-running, stateful processes with durable state and deterministic recovery. LLM orchestration libraries typically focus on chaining prompts without addressing fault tolerance, replay, or multi-step coordination.

Can agentic systems work with batch processing?

Yes. Agents are signal-driven, not stream-only. They can be triggered by scheduled batches, bulk uploads, historical reprocessing, or continuous streams. The architecture is ingestion-mode agnostic.

How do you handle agent failures in production?

Agents checkpoint state explicitly. Failures trigger deterministic recovery from the last consistent checkpoint. Historical inputs can be replayed. State transitions are auditable.

What role do LLMs play in agentic systems?

LLMs are reasoning components, not controllers. Agents decide when to invoke the LLM, retrieve constrained context, validate outputs, and update state. This ensures reproducibility and cost control.

How do you coordinate multiple agents?

Multi-agent systems use explicit dependencies, ordering guarantees, and backpressure handling. Patterns include staged pipelines, conditional branching, and dependency-aware scheduling — borrowed from distributed DAG execution.

When should organizations avoid agentic systems?

When logic is purely stateless, processing is a single deterministic transformation, or no coordination is required. We deliberately avoid agents where simpler architectures are sufficient.

Building agentic systems that must run reliably in production? Let’s talk about your stateful AI architecture.