Apache Kafka is a distributed, log-based event-streaming system. Its architecture is organized around topics (logs), partitions (units of parallelism and ordering), brokers (servers that store and serve partitions), producers, consumers, and a metadata layer. Modern Kafka manages that metadata with KRaft (Kafka Raft) instead of ZooKeeper: KRaft has been production-ready since Kafka 3.3, and Kafka 4.0 removed ZooKeeper support entirely. A production Kafka architecture goes beyond a single cluster running default settings. It is designed for durability, scalability, multi-tenant operation, and regulated environments.
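To make "partitions are the unit of ordering" concrete, here is a toy in-memory model of a single topic. It is a sketch, not Kafka's implementation. The class name `PartitionedTopic` is hypothetical, and the sketch substitutes `zlib.crc32` for the murmur2 hash that the Java client's default partitioner applies to record keys. It does preserve the key property: records with the same key map to the same partition, and offsets increase within each partition. Ordering is therefore guaranteed per partition, not across the whole topic.

```python
import zlib
from collections import defaultdict


class PartitionedTopic:
    """Toy model of one topic: each partition is an append-only list of records."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: dict[int, list[tuple[bytes, bytes]]] = defaultdict(list)

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition. Kafka's Java client hashes keys with murmur2;
        # crc32 is a stand-in here to keep the sketch dependency-free.
        return zlib.crc32(key) % self.num_partitions

    def append(self, key: bytes, value: bytes) -> tuple[int, int]:
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1  # offsets are per partition and start at 0

    def read(self, partition: int, from_offset: int = 0) -> list[tuple[bytes, bytes]]:
        """Return records from one partition in offset order (a consumer's view)."""
        return self.partitions[partition][from_offset:]
```

For example, appending three events with key `b"order-42"` to `PartitionedTopic(6)` places all three in the same partition at offsets 0, 1, and 2, and `read()` returns them in that order. Events for other keys may land in other partitions, where they are consumed in parallel with no ordering relative to `order-42`.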
Core components of a production Kafka architecture:
- Brokers and KRaft controllers: Sized for throughput, durability, and replication rather than left at default configs
- Topics, partitions, and replication: Partition count, replication factor, and min.insync.replicas chosen per use case to balance throughput, ordering, and durability
- Producers and consumers: Idempotent producers, transactional writes where exactly-once is required, and carefully tuned consumer groups
- Schema Registry and data contracts: Avro, Protobuf, or JSON Schema definitions governed centrally, with compatibility rules that keep producers and consumers in sync
- Connectors and CDC: Kafka Connect for source/sink integration and Debezium for change data capture from operational databases
- Stream processing: Transformations and enrichment with Apache Flink (preferred for stateful, event-time-correct workloads) or Kafka Streams
- Multi-tenancy and isolation: Quotas, ACLs, topic-naming conventions, and often separate clusters for regulatory boundaries
- Multi-region / DR: MirrorMaker 2, stretched clusters, or active-active replication, chosen according to RTO/RPO requirements
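The trade-off between replication factor and the min.insync.replicas setting in the replication bullet comes down to simple arithmetic. The sketch below is a simplified model, not a substitute for Kafka's ISR mechanics (such as replica.lag.time.max.ms). The `ReplicationPolicy` class is hypothetical, and it assumes producers use acks=all, unclean leader election is disabled, and each failed broker held one replica of the partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Simplified durability model for one partition, assuming acks=all."""

    replication_factor: int
    min_insync_replicas: int

    def __post_init__(self) -> None:
        if not 1 <= self.min_insync_replicas <= self.replication_factor:
            raise ValueError("need 1 <= min.insync.replicas <= replication factor")

    def accepts_writes(self, failed_replicas: int) -> bool:
        # With acks=all, the leader rejects produce requests when the in-sync
        # replica set is smaller than min.insync.replicas.
        in_sync = self.replication_factor - failed_replicas
        return in_sync >= self.min_insync_replicas

    def failures_tolerated_for_writes(self) -> int:
        # How many replicas can be lost before producers start getting errors.
        return self.replication_factor - self.min_insync_replicas

    def failures_survived_by_acked_data(self) -> int:
        # An acknowledged write exists on at least min.insync.replicas replicas,
        # so it survives losing all but one of them.
        return self.min_insync_replicas - 1
```

For the common setup of replication factor 3 with min.insync.replicas=2, the model gives one tolerated failure for writes, and every acknowledged record exists on at least two replicas. Raising min.insync.replicas to 3 would make writes fail as soon as any replica drops out of sync, which is why it is commonly set one below the replication factor.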
Operational concerns that make Kafka architecture production-grade:
- Observability (metrics, lag monitoring, tracing)
- Upgrade and rebalance strategies that don’t lose data
- Disaster recovery, backups, and replay procedures
- Capacity planning, storage tiering, and cost controls
- Security: mTLS, SASL, OAuth, RBAC, audit logging
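Lag monitoring, listed under observability, reduces to one subtraction per partition. In Kafka, a committed offset is the position of the next record the group will read, so lag is the partition's end offset minus the committed offset. This sketch assumes both offset maps have already been fetched, for example with the Java AdminClient's listOffsets and listConsumerGroupOffsets, or from kafka-consumer-groups.sh --describe. The function names are hypothetical. Treating a partition with no committed offset as starting from 0 is a simplifying assumption, because the real starting point depends on auto.offset.reset and the log start offset.

```python
from collections.abc import Mapping


def consumer_lag(
    end_offsets: Mapping[int, int],
    committed_offsets: Mapping[int, int],
) -> dict[int, int]:
    """Per-partition lag: end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no commit is counted from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


def lagging_partitions(lag: Mapping[int, int], threshold: int) -> list[int]:
    """Partitions whose lag exceeds an alerting threshold, sorted by id."""
    return sorted(p for p, value in lag.items() if value > threshold)
```

Summing the per-partition values gives total group lag, but alerting per partition (or on lag growth over time) tends to be more useful, because one stuck partition can hide behind a healthy total.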
Acosom designs and operates Kafka architectures for regulated enterprises (on-prem, hybrid, or sovereign cloud) as the event-streaming backbone of modern streaming data platforms. No default configs, no vendor lock-in.