Apache Kafka

Kafka is an open source distributed streaming platform which gives high-throughput, fault-tolerance and reduces latency.

It follows pub/sub system, where pub is to generate event and it send to topic and sub is to collect the events from the topic. In topic our events are stored in the partition we have to explicitly create topic, if not by default one partition will be created. Why use partition? for parallel processing when multiple consumers are retrieving data it makes high throughput. each partition will have offset (index-based) it helps when a service failure then to maintains the order of processing data. offset will start from 0.

A Kafka broker is a Kafka server responsible for managing and handling data and partitions. When two or more brokers are present, they form a Kafka cluster, which provides fault-tolerance, scalability, and high availability. This cluster architecture allows for the efficient distribution, replication, and processing of data across partitions and topics in a distributed streaming platform.

replication means creating copies (replicas) of each partition it helps when data is lost system fails to get backup.

why this Kafka?

Have you ever noticed that after visiting website, the products or data you viewed there seem to appear as advertisements on another websites?

How your data will be sent to other websites? using message brokers why message brokers?

Consider a scenario where data you viewed on one website starts appearing across multiple websites. In such cases, it's plausible that various applications may communicate with each other to process and share user data. However, if one website will communicate with multiple websites, leading to increased system complexity and potentially resulting in system degradation. To avoid these failures, we use message brokers. It acts as a mediator.

Kafka is a message broker which stores events from the event sources.