Streaming & Event-Driven Systems

Architecting stateful, real-time systems that remain correct, evolvable, and operable over time.

Event-driven systems are easy to prototype — and notoriously hard to operate correctly at scale. As organizations move from batch-oriented processing to continuous, stateful streaming, architectural decisions around state, time, deployment, and evolution become critical.

Many systems fail not because of throughput limits, but because they were not designed to evolve safely under live traffic.

Acosom works with software architects and platform engineers to design and operate streaming and event-driven systems that remain correct under continuous load, evolvable over years, and operable by real teams.

This expertise is about systems, not individual pipelines.

What Architects Gain

When streaming systems are designed for state, failure, and change.

Event-First Architecture

Events represent business facts, not integration artifacts. Event schemas act as long-lived contracts, enabling decoupled evolution, replayability, and clear ownership boundaries.

Stateful Processing as a Core Primitive

State is modeled explicitly, supports deterministic rebuilds, and is decoupled from deployment mechanics. Essential when systems evolve under live traffic.

Controlled State Growth & Cost

State growth is managed through offloading, externalization, and table-based patterns. Predictable cost, faster rebuilds, and explicit lifecycle control.

Change Data Capture as Integration Boundary

CDC is applied deliberately as a bridge from legacy systems, with explicit semantics and controlled schema evolution. Observable, restartable, resilient.

Safe Evolution Under Live Traffic

Coordinated rebuilds, dependency-aware deployments, and blue-green patterns for stateful systems. Systems that can evolve without compromising correctness.

Schema Governance & Observability

Contracts enforced centrally, compatibility rules, and observability around state size, correctness, and recovery behavior. Long-lived platform assets.

Event-Driven Foundations & Domain Modeling

Streaming architectures start with how events are modeled, not with tools.

Events are explicit, durable, and meaningful across system boundaries. Domains, ownership, and bounded contexts are modeled deliberately. Event sourcing is applied where auditability and deterministic rebuilds matter — not as a default for every system.

Event schemas are treated as long-lived contracts. Compatibility, versioning, and ownership are architectural concerns, not afterthoughts.
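
As a minimal sketch of that principle (the event name, fields, and version constant are illustrative, not drawn from any real system), a domain event is named after a business fact and carries an explicit, versioned contract rather than mirroring a table row:

    // A domain event as a long-lived contract: named after a business fact,
    // not after the table it happens to be stored in. All names illustrative.
    public record OrderShipped(
            String eventId,            // stable identity; enables deduplication on replay
            String orderId,            // the aggregate this fact belongs to
            String carrier,
            java.time.Instant shippedAt,
            int schemaVersion          // evolves additively; compatibility enforced centrally
    ) {
        public static final int CURRENT_SCHEMA_VERSION = 2;
    }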

Stateful Stream Processing & Execution Models

Stateful computation is the defining characteristic of real streaming systems.

Stateful Stream Processing as a Core Primitive

State is modeled explicitly and treated as part of the system design — not hidden inside jobs. Time, ordering, and correctness are designed for event time, late data, and determinism under real-world conditions.
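
A minimal Flink sketch of what designing for event time looks like (the Reading type, its fields, and the window sizes are assumptions for illustration): watermarks bound out-of-orderness, and late data gets an explicit, bounded correction window instead of being silently dropped.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Timestamps come from the events themselves, not from arrival time.
    WatermarkStrategy<Reading> watermarks = WatermarkStrategy
            .<Reading>forBoundedOutOfOrderness(Duration.ofSeconds(30))
            .withTimestampAssigner((reading, recordTs) -> reading.eventTimeMillis());

    env.fromSource(readingSource, watermarks, "readings")
       .keyBy(Reading::deviceId)
       .window(TumblingEventTimeWindows.of(Time.minutes(5)))
       .allowedLateness(Time.minutes(2))    // bounded correction window for late events
       .reduce((a, b) -> a.merge(b));       // associative merge keeps replays deterministic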

Exactly-Once Processing & Deduplication

Exactly-once semantics are treated as an architectural property, supported by design decisions around identity, transactions, and replay — not just configuration flags.
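
In Kafka Streams terms, turning on the transactional guarantee is one line of configuration; the sketch below (application id and bootstrap address are placeholders) also notes the part configuration cannot provide.

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-projection");   // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");      // placeholder
    // Transactional reads, state updates, and writes on the Kafka-to-Kafka path.
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
    // What the flag does not cover: side effects against external systems still
    // need idempotency keyed on a stable event identity designed into the events.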

Cluster-Based Stream Processing

We design shared, platform-level stream processing systems using engines such as Apache Flink, focusing on state management, recovery semantics, and long-lived operation.
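
As one concrete shape this takes (checkpoint interval and storage path are illustrative), a Flink job with large state typically pairs the RocksDB state backend with incremental, exactly-once checkpoints on object storage:

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStateBackend(new EmbeddedRocksDBStateBackend(true));   // incremental checkpoints
    env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
    env.getCheckpointConfig().setCheckpointStorage("s3://streaming-checkpoints/orders");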

Managing State Growth & Cost at Scale

One of the most common failure modes in mature streaming systems is unbounded state growth.

The Problem of Growing State

Increasing cardinality, longer retention, and evolving business logic often lead to silent state explosions that drive cost and instability. For long-running systems, state must be treated as a first-class architectural concern.
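
One concrete mitigation, sketched here with Flink's state TTL (the profile type and the 30-day lifetime are illustrative assumptions), is to give every piece of keyed state an explicit lifecycle instead of letting it accumulate by default:

    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;

    StateTtlConfig ttl = StateTtlConfig
            .newBuilder(Time.days(30))                             // illustrative lifetime
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            .cleanupInRocksdbCompactFilter(1000)   // amortized cleanup during compaction
            .build();

    ValueStateDescriptor<CustomerProfile> profileState =
            new ValueStateDescriptor<>("customer-profile", CustomerProfile.class);
    profileState.enableTimeToLive(ttl);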

Offloading and Externalizing State

We design systems where large or historical state is offloaded from the streaming engine, rather than accumulating indefinitely. State is separated into “hot” operational and “cold” historical layers.

Table-Based Streaming & Object Storage

State is externalized into table-based storage on S3-compatible object stores, making it inspectable, queryable, and lifecycle-managed. Disaggregated state architectures decouple compute from state using Apache Paimon, Apache Fluss, and recent versions of Apache Flink.
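
A minimal sketch of that pattern with Flink SQL and Paimon (bucket name and table layout are illustrative): state lives in a table on object storage, outside the engine, where it can be inspected and queried directly.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
    // A Paimon catalog backed by S3-compatible object storage.
    tEnv.executeSql(
        "CREATE CATALOG lake WITH ("
      + "  'type' = 'paimon',"
      + "  'warehouse' = 's3://analytics-lake/warehouse'"      // illustrative bucket
      + ")");
    tEnv.executeSql("USE CATALOG lake");
    tEnv.executeSql(
        "CREATE TABLE IF NOT EXISTS order_state ("
      + "  order_id   STRING,"
      + "  status     STRING,"
      + "  updated_at TIMESTAMP(3),"
      + "  PRIMARY KEY (order_id) NOT ENFORCED"
      + ")");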

Change Data Capture & Data Ingestion Patterns

CDC is often the pragmatic entry point into event-driven architectures.

CDC as an Architectural Boundary

Database changes are captured intentionally and modeled as events with clear semantics — not raw table diffs. We avoid “changelog spaghetti” by mapping low-level changes to meaningful domain events.
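
A sketch of that mapping in Kafka Streams (the topic name follows Debezium's server.schema.table convention; the envelope parser and the CustomerMoved event are hypothetical): low-level change records are translated into a published domain contract right at the boundary.

    import org.apache.kafka.streams.StreamsBuilder;

    StreamsBuilder builder = new StreamsBuilder();
    builder.<String, String>stream("pg.public.customer_address") // Debezium-style topic name
        .mapValues(DebeziumEnvelope::parse)                      // hypothetical JSON helper
        .filter((key, change) -> "u".equals(change.op()))        // react to updates only
        .mapValues(change -> new CustomerMoved(                  // hypothetical domain event
                change.after("customer_id"),
                change.after("city")))
        .to("customer-moved-events");                            // the published contract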

Operational Characteristics of CDC

CDC pipelines are designed to be observable, restartable, and resilient under continuous load. Schema evolution is managed deliberately, with compatibility and replay in mind. We commonly work with Kafka Connect, Debezium, and Flink-based CDC.

Projections, Read Models & Real-Time Data Access

Event streams are rarely consumed directly.

Materialized Projections from Streams

We design projections derived exclusively from streams, reproducible from history and isolated per use case. Read models are tailored to individual services, avoiding shared databases and tight coupling.
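
In Kafka Streams form, such a projection is an aggregation into a named, materialized store (event types and the store name are illustrative; serde configuration is omitted for brevity); because it is derived only from the stream, it can be rebuilt from history at any time.

    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    StreamsBuilder builder = new StreamsBuilder();
    KTable<String, OrderTotals> totalsByCustomer = builder
        .<String, OrderPlaced>stream("order-placed-events")
        .groupBy((orderId, order) -> order.customerId())         // re-key by customer
        .aggregate(
            OrderTotals::empty,
            (customerId, order, totals) -> totals.add(order.amount()),
            Materialized.<String, OrderTotals, KeyValueStore<Bytes, byte[]>>as(
                "order-totals-store"));                          // named, queryable store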

Real-Time Analytical Data Stores

For low-latency queries, we use real-time analytical and time-series databases such as ClickHouse and QuestDB. Streaming-derived data is exposed through analytics and visualization layers like Apache Superset.

Service-Centric Streaming with Kafka Streams

Not all streaming workloads belong on a shared processing cluster.

Kafka Streams vs. Cluster-Based Processing

We deliberately choose between embedded and platform-level stream processing based on ownership, state size, and deployment needs. This work is built on deep experience with Kafka Streams in production.

Topology Design & Performance Optimization

Kafka Streams topologies are optimized by minimizing repartitioning, using GlobalKTables where appropriate, and managing joins and state stores carefully. Topology design is reviewed as part of architecture, not left to chance.
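
As a sketch of one such optimization (topic names and types are illustrative), enriching a stream against reference data through a GlobalKTable avoids the repartition step a co-partitioned KTable join would require, because the table is replicated in full to every instance:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;

    StreamsBuilder builder = new StreamsBuilder();
    GlobalKTable<String, Product> products = builder.globalTable("product-catalog");

    builder.<String, OrderLine>stream("order-lines")
        .join(products,
              (orderId, line) -> line.productId(),               // lookup key into the table
              (line, product) -> line.withProductName(product.name()))
        .to("enriched-order-lines");                             // no repartition topic created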

Internal Topics & State Store Management

Internal topics and state stores are explicitly accounted for, monitored, and lifecycle-managed. We design systems with correct transactional boundaries, deduplication strategies, and controlled reprocessing behavior.
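
One small but high-leverage practice, sketched below with illustrative names: give every stateful step an explicit name, so internal repartition and changelog topic names stay stable when the surrounding topology changes.

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Named;

    StreamsBuilder builder = new StreamsBuilder();
    builder.<String, Payment>stream("payments", Consumed.as("payments-source"))
        .groupByKey(Grouped.as("payments-by-id"))     // names any repartition topic created here
        .count(Materialized.as("payment-counts"))     // names the store and its changelog topic
        .toStream(Named.as("payment-counts-to-stream"))
        .to("payment-counts");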

Operating & Evolving Streaming Platforms

Correct architecture is meaningless without operability.

Coordinated Rebuilds & Dependency Graphs

We design rebuild mechanisms where upstream state is reconstructed first and downstream applications follow dependency-aware orderings. Rebuilds happen without interrupting live traffic.

Blue-Green Deployments for Stateful Systems

Stateful deployments are upgraded safely by ensuring both old and new versions reach consistent state before traffic is switched. This enables evolution under continuous load.

Schema Governance & Compatibility Rules

Contracts are enforced centrally to prevent accidental breakage across teams. Observability covers state size, correctness, and recovery behavior, not just consumer lag.

Technologies

Technologies support architecture — they do not define it.

Apache Flink

Stateful stream processing at scale. Cluster-based execution for complex streaming systems with large state, long-lived operation, and recovery semantics.

Kafka Streams

Embedded, service-centric streaming. Stream processing co-located with application logic for service-owned state and deployment.

Apache Kafka

Distributed event log. Foundation for event-driven architectures, providing durability, ordering, and replayability.

Kafka Connect

Connector framework for streaming data integration. Observable, restartable, and resilient pipelines.

Debezium

CDC platform capturing database changes as events. Intentional modeling of database changes with clear semantics.

Flink CDC

Flink-based change data capture. Direct integration of database changes into streaming pipelines.

Apache Paimon

Table format for streaming. State externalized to object storage, making it inspectable, queryable, and lifecycle-managed.

Apache Fluss

Streaming storage for disaggregated state. Decouple compute from state for predictable cost and faster rebuilds.

ClickHouse

Real-time analytical database. Low-latency queries on streaming-derived data with columnar storage and aggregation.

QuestDB

Time-series database for streaming data. Fast ingestion and queries for time-series workloads.

Apache Superset

Operational analytics and visualization. Expose streaming-derived data to humans through dashboards and exploration.

Who This Expertise Is For

This page is written for software architects, platform engineers, and senior engineers responsible for distributed systems.

If you are accountable for systems that must keep working while they evolve, this is where we typically engage.

Frequently Asked Questions

How do you handle state growth in long-running streaming systems?

State growth is one of the most common failure modes in mature streaming platforms.

Our approach:

  • Treat state as a first-class architectural concern from the start
  • Design for state offloading and externalization early
  • Use table-based storage on S3-compatible object stores
  • Separate “hot” operational state from “cold” historical state
  • Apply disaggregated state patterns with modern streaming engines

This enables predictable cost, faster rebuilds, and long-lived platforms that don’t collapse under their own state.

When should we use Kafka Streams vs. a cluster-based processor like Flink?

The choice depends on ownership, state size, deployment model, and operational maturity.

Kafka Streams makes sense when:

  • Stream processing is embedded directly in services
  • State size is manageable within service instances
  • Teams prefer service-centric deployment models
  • Ownership aligns with service boundaries

Cluster-based processing makes sense when:

  • State is large and needs disaggregation
  • Multiple teams share processing infrastructure
  • Complex windowing and late-data handling are required
  • Platform-level observability and governance are needed

We have deep experience with both and choose deliberately based on constraints.

How do you ensure exactly-once semantics in production?

Exactly-once is an architectural property, not a configuration flag.

Our approach:

  • Design around event identity and idempotency keys
  • Use correct transactional boundaries in Kafka Streams
  • Apply end-to-end exactly-once processing where required
  • Design deduplication based on business semantics
  • Control replay and reprocessing strategies explicitly

This ensures correctness under failure, not just under happy-path conditions.

How do you handle schema evolution in event-driven systems?

Event schemas are long-lived contracts that must evolve safely.

Our approach:

  • Treat schemas as architectural artifacts with explicit ownership
  • Enforce compatibility rules centrally (forward, backward, full)
  • Design for additive changes and avoid breaking modifications
  • Version schemas explicitly and document evolution
  • Test schema changes before deployment

This prevents accidental breakage across teams and enables safe evolution over years.

Can you help with existing streaming systems that have grown problematic?

Yes. Many of our engagements involve improving existing streaming platforms.

Common improvement areas:

  • Addressing unbounded state growth and cost explosions
  • Adding coordinated rebuild mechanisms
  • Improving observability beyond lag metrics
  • Refactoring fragile topologies and dependencies
  • Implementing schema governance retroactively
  • Enabling safe deployments for stateful systems

We assess current architecture, identify failure modes, and evolve systems incrementally without rewriting from scratch.

Do you work with specific streaming technologies only?

We’re technology-agnostic and choose based on constraints.

We commonly work with:

  • Apache Flink and Kafka Streams for stream processing
  • Apache Kafka as the event backbone
  • CDC tools like Debezium, Kafka Connect, and Flink CDC
  • Table formats like Apache Paimon and Apache Fluss
  • Real-time stores like ClickHouse and QuestDB

Technology choices follow operating model, lifecycle, and constraints — not trends or vendor preference.

Building streaming systems that must keep working while they evolve? Let’s talk about your architectural challenges.

Discuss Your Streaming Architecture