Michael G. Noll (miguno) · Europe · https://www.michael-noll.com/
Ex Office of the CTO at @confluentinc (Apache Kafka), formerly product manager of ksqlDB & Kafka Streams. Open source committer. Writer.

miguno/kafka-storm-starter 725

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.

miguno/avro-cli-examples 129

Examples of how to use the command-line tools in Avro Tools to read and write Avro files.

miguno/avro-hadoop-starter 111

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

miguno/akka-mock-scheduler 86

A mock Akka scheduler to simplify testing scheduler-dependent code

miguno/puppet-kafka 42

Wirbelsturm-compatible Puppet module to deploy Kafka 0.8+ clusters

miguno/golang-docker-build-tutorial 39

A template project to create a minimal Docker image for a Go application

miguno/gradle-testng-mockito-bootstrap 33

A ready-to-use bootstrap Java project backed by Gradle, TestNG, Mockito, FEST-Assert 2, and Cobertura, for Eclipse and IntelliJ IDEA, with support for Jenkins CI.

miguno/kafka-avro-codec 19

Avro encoder/decoder for use as serializer.class in Apache Kafka 0.8

miguno/graphite-supervisord-rpm 17

How to install and configure Graphite 0.9.x via RPMs on RHEL 6 and run it under process supervision with supervisord.

miguno/java-docker-build-tutorial 13

A template project to create a minimal Docker image for a Java application

started Caldis/Mos

started 1 month ago

fork miguno/Sabaki

An elegant Go board and SGF editor for a more civilized age.

https://sabaki.yichuanshen.de/

forked 1 month ago

started Botspot/pi-apps

started 2 months ago

PR opened confluentinc/event-streaming-patterns

Use responseEvent to configure response

Fixes what was likely a copy-paste error introduced when the pattern description was written.

Original bug report at https://forum.confluent.io/t/how-do-correlation-identifiers-work/2864/2.

+2 -2

0 comments

1 changed file

PR created 2 months ago

push event confluentinc/event-streaming-patterns

Michael G. Noll

commit sha 476bc6000ad6a63f7169e4b949fe331a9d2f16e7

Use responseEvent to configure response

view details

pushed 2 months ago

create branch confluentinc/event-streaming-patterns

branch: correlation-identifier-fix

branch created 2 months ago

Pull request review comment on apache/kafka-site

Added select KS APAC & EU 2021 videos

 <h4 class="anchor-heading"> </h4>
 <ul>
+  <li>
+    <a target="_blank" href="https://www.confluent.io/events/kafka-summit-apac-2021/should-you-read-kafka-as-a-stream-or-in-batch-should-you-even-care/">Should You Read Kafka as a Stream or in Batch? Should You Even Care?</a>,
+    Ido Nadler (Nielsen) &amp; Opher Dubrovsky (Nielsen)
+  </li>
+  <li>
+    <a target="_blank" href="https://www.confluent.io/events/kafka-summit-apac-2021/scaling-a-core-banking-engine-using-apache-kafka/">Scaling a Core Banking Engine Using Apache Kafka</a>,
+    Peter Dudbridge (Thought Machine)

Lacks conference info (which summit?)

bellemare

comment created 3 months ago

Pull request review comment on apache/kafka-site

Added select KS APAC & EU 2021 videos

 <h4 class="anchor-heading"> </h4>
 <ul>
+  <li>
+    <a target="_blank" href="https://www.confluent.io/events/kafka-summit-europe-2021/making-kafka-cloud-native/">Making Kafka Cloud Native</a>,
+    Jay Kreps (Confluent), KS EU 2021
+  </li>
+  <li>
+    <a target="_blank" href="https://www.confluent.io/events/kafka-summit-europe-2021/how-ksqldb-works/">How ksqlDB works</a>,
+    Michael Drogalis (Confluent) , KS EU 2021

Suggested change:
          Michael Drogalis (Confluent), KS EU 2021
bellemare

comment created 3 months ago


started gawindx/WinNUT-Client

started 3 months ago

issue comment on gawindx/WinNUT-Client

How to set up WinNUT Client on Win 10 with Synology

The following WinNUT settings (v2.0.7722.30975) work on a Synology DS412+ NAS (running DSM 6.2.4-25556) and an Eaton Ellipse ECO 1600 UPS.

  • NUT host: <IP or hostname of the NAS>
  • NUT Port: 3493
  • UPS Name: ups
  • Polling Interval: 1 (as per https://github.com/gawindx/WinNUT-Client/issues/47#issuecomment-783938247; the default of 0 did not work)
  • Login: upsmon
  • Password: secret
  • Optional: Check the "[X] Re-establish connection" config at the bottom of the settings window

Thanks for maintaining this project!

[Screenshots: WinNUT settings windows]

witzker

comment created 3 months ago
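If WinNUT cannot connect with the settings above, it can help to first verify that the NAS's NUT service is reachable at all. A minimal, hypothetical Java sketch of such a check (any TCP client works equally well); the host value is a placeholder:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Quick reachability check for the NAS's NUT service, before debugging WinNUT itself.
public class NutPortCheck {
    public static void main(String[] args) {
        String nutHost = "192.168.1.50"; // placeholder: IP or hostname of the NAS
        int nutPort = 3493;              // default NUT port, as in the settings above
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(nutHost, nutPort), 2000);
            System.out.println("NUT service is reachable on " + nutHost + ":" + nutPort);
        } catch (IOException e) {
            System.out.println("Cannot reach NUT service: " + e.getMessage());
        }
    }
}
```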

Pull request review comment on confluentinc/event-streaming-patterns

Idempotent Reader

+---
+seo:
+  title: Idempotent Reader
+  description: An Idempotent Reader consumes the same event once or multiple times, with the same effect.
+---
+
+# Idempotent Reader
+
+In ideal circumstances, [Events](../event/event.md) are only written once into an [Event Stream](../event-stream/event-stream.md). Under normal operations, all consumers of the stream will also only read and process each event once. However, depending on the behavior and configuration of the [Event Source](../event-source/event-source.md), there may be failures that create duplicate events. When this happens, we need a strategy for dealing with the duplicates.
+
+An [idempotent](https://en.wikipedia.org/wiki/Idempotence) reader must take two causes of duplicate events into consideration:
+
+1. *Operational Failures*: Intermittent network and system failures are unavoidable in distributed systems. In the case of a machine failure or a brief network outage, an [Event Source](../event-source/event-source.md) may produce the same event multiple times due to retries. Similarly, an [Event Sink](../event-sink/event-sink.md) may consume and process the same event multiple times due to intermittent offset updating failures. The [Event Streaming Platform](../event-stream/event-streaming-platform.md) should automatically guard against these operational failures by providing strong delivery and processing guarantees, such as those found in Apache Kafka® transactions.
+
+2. *Incorrect Application Logic*: An [Event Source](../event-source/event-source.md) could mistakenly produce the same event multiple times, populating the [Event Stream](../event-stream/event-stream.md) with multiple distinct events from the perspective of the [Event Streaming Platform](../event-stream/event-streaming-platform.md). For example, imagine a bug in the Event Source that writes a customer payment event twice, instead of just once. The Event Streaming Platform knows nothing of the business logic, so it cannot differentiate between the two events and instead considers them as two distinct payment events.
+
+## Problem
+How can an application deal with duplicate Events when reading from an Event Stream?
+
+## Solution
+![idempotent-reader](../img/idempotent-reader.svg)
+
+This can be addressed using exactly-once semantics (EOS), including native support for transactions and support for idempotent clients.
+EOS allows [Event Streaming Applications](../event-processing/event-processing-application.md) to process data without loss or duplication, ensuring that computed results are always accurate.
+
+[Idempotent Writing](idempotent-writer.md) by the [Event Source](../event-source/event-source.md) is the first step in solving this problem. Idempotent Writing provides strong, exactly-once delivery guarantees of the producer's Events, and removes operational failures as a cause of written duplicate Events.
+
+On the reading side, in [Event Processors](../event-processing/event-processor.md) and [Event Sinks](../event-sink/event-sink.md), an Idempotent Reader can be configured to read only committed transactions. This prevents events within incomplete transactions from being read, providing the reader isolation from operational writer failures. Keep in mind that idempotence means that the reader's business logic must be able to process the same consumed event multiple times, where multiple reads have the same effect as a single read of the event.
+For example, if the reader manages a counter (i.e., a state) for the events it has read, then reading the same event multiple times should only increment the counter once.
+
+Duplicates caused by incorrect application logic from upstream sources are best resolved by fixing the application's logic (i.e., fixing the root cause). In cases where this is not possible, such as when events are generated outside of our control, the next best option is to embed a tracking ID into the event. A tracking ID should be a field that is unique to the logical event, such as an event key or request ID. The consumer can then read the tracking ID, cross-reference it against an internal state store of IDs it has already processed, and discard the event if necessary.
+
+
+## Implementation
+To configure an Idempotent Reader to read only committed transactions, set the following parameter:
+
+```
+isolation.level="read_committed"
+```
+
+In your Kafka Streams application, to handle operational failures, you can [enable EOS](https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#processing-guarantee). Within a single transaction, a Kafka Streams application using EOS will atomically update its consumer offsets, its state stores including their changelog topics, its repartition topics, and its output topics.

I don't see the code block with the setting for KStreams? Once added, feel free to merge this PR.

ybyzek

comment created 3 months ago
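For readers of the pattern quoted above, a minimal Java sketch of what an Idempotent Reader can look like with a plain Kafka consumer. The broker address, topic name, and the in-memory `seenIds` store are illustrative assumptions (a production reader would use a durable state store, as the pattern text notes):

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class IdempotentReaderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "idempotent-reader");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Read only events from committed transactions, per the pattern text.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        // Illustrative in-memory dedup store for tracking IDs already processed.
        Set<String> seenIds = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String trackingId = record.key(); // assumes the key doubles as the tracking ID
                    if (seenIds.add(trackingId)) {
                        process(record.value()); // first sighting: apply business logic
                    } // else: duplicate from a buggy upstream writer, discard it
                }
            }
        }
    }

    private static void process(String payment) { /* business logic goes here */ }
}
```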


Pull request review comment on confluentinc/event-streaming-patterns

Idempotent writer

+---
+seo:
+  title: Idempotent Writer
+  description: An Idempotent Writer produces an Event to an Event Streaming Platform exactly once.
+---
+
+# Idempotent Writer
+A writer produces [Events](../event/event.md) that are written into an [Event Stream](../event-stream/event-stream.md), and under stable conditions, each Event is recorded only once.
+However, in the case of an operational failure or a brief network outage, an [Event Source](../event-source/event-source.md) may need to retry writes. This may result in multiple copies of the same Event ending up in the Event Stream, as the first write may have actually succeeded even though the producer did not receive the acknowledgement from the [Event Streaming Platform](../event-stream/event-streaming-platform.md). This type of duplication is a common failure scenario in practice and one of the perils of distributed systems.
+
+## Problem
+How can an [Event Streaming Platform](../event-stream/event-streaming-platform.md) ensure that an Event Source does not write the same Event more than once?
+
+## Solution
+![idempotent-writer](../img/idempotent-writer.svg)
+
+Generally speaking, this can be addressed by native support for idempotent clients.
+This means that a writer may try to produce an Event more than once, but the Event Streaming Platform detects and discards duplicate write attempts for the same Event.
+
+## Implementation
+To make an Apache Kafka® producer idempotent, configure your producer with the following setting:
+
+```
+enable.idempotence=true
+```
+
+The Kafka producer tags each batch of Events that it sends to the Kafka cluster with a sequence number. Brokers in the cluster use this sequence number to enforce deduplication of Events sent from this specific producer. Each batch's sequence number is persisted so that even if the [leader broker](https://www.confluent.io/blog/apache-kafka-intro-how-kafka-works/#replication) fails, the new leader broker will also know if a given batch is a duplicate.
+
+To enable [exactly-once processing guarantees](https://docs.ksqldb.io/en/latest/operate-and-deploy/exactly-once-semantics/) in ksqlDB or Kafka Streams, configure the application with the following setting, which includes enabling idempotence in the embedded producer:
+
+```
+processing.guarantee=exactly_once

Yes, no change is required. I already approved it subsequently.

ybyzek

comment created 3 months ago
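As a companion to the excerpt above, a minimal Java sketch of the `enable.idempotence=true` setting in a plain Kafka producer; the broker address, topic, and record values are illustrative assumptions:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentWriterSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // The setting the pattern calls for: brokers deduplicate producer
        // retries using per-batch sequence numbers.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Even if this send is retried internally after a network hiccup,
            // the brokers will record the event only once for this producer.
            producer.send(new ProducerRecord<>("payments", "payment-123", "{\"amount\": 42}"));
            producer.flush();
        }
    }
}
```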


Pull request review comment on confluentinc/event-streaming-patterns

Idempotent writer

(Diff context omitted: identical idempotent-writer.md excerpt as quoted above.)

So, no change is needed for now. We can add the _v2 setting once it's available officially.

ybyzek

comment created 3 months ago


Pull request review comment on confluentinc/event-streaming-patterns

Idempotent writer

(Diff context omitted: identical idempotent-writer.md excerpt as quoted above.)

In this case, I would stick with the v1 version of EOS for now, because it doesn't have the (current) "beta" suffix like the v2 version of EOS.

TL;DR:

processing.guarantee=exactly_once

(this gives you EOS v1)

ybyzek

comment created 3 months ago
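To make the setting from this thread concrete, a minimal Kafka Streams sketch using the EOS v1 value recommended above; `StreamsConfig.EXACTLY_ONCE` is the constant for `processing.guarantee=exactly_once` (the `_beta` and `_v2` variants discussed in this thread have their own constants in later Kafka versions). The application ID, broker address, and topic names are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EosStreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-example");        // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // EOS v1, i.e. processing.guarantee=exactly_once, as recommended in the thread.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // trivial pass-through topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```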


Pull request review comment on confluentinc/event-streaming-patterns

Idempotent writer

(Diff context omitted: identical idempotent-writer.md excerpt as quoted above.)

So, there's a bug in our documentation. The correct setting for Kafka Streams (as of today) is:

processing.guarantee=exactly_once_beta

The *_v2 variant will be introduced in AK 3.0 through KIP-732.

ybyzek

comment created 3 months ago


Pull request review comment on confluentinc/event-streaming-patterns

Idempotent Reader

(Diff context omitted: identical idempotent-reader.md excerpt as quoted above.)

Btw, I am checking why https://kafka.apache.org/28/documentation/streams/developer-guide/config-streams.html#processing-guarantee says the value should be exactly_once_beta. IIRC the *_v2 name supersedes the *_beta name in the latest Kafka, but I have to double-check.

ybyzek

comment created 3 months ago


Pull request review comment on confluentinc/event-streaming-patterns

Idempotent writer

(Diff context omitted: identical idempotent-writer.md excerpt as quoted above.)

The docs link for ksqlDB above says that it is indeed this setting.

For Kafka Streams, however, the current docs at https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#processing-guarantee ask for a different setting:

processing.guarantee=exactly_once_v2
ybyzek

comment created 3 months ago

Pull request review comment on confluentinc/event-streaming-patterns

Idempotent Reader

(Diff context omitted: identical idempotent-reader.md excerpt as quoted above.)

Please add how to enable EOS in KStreams:

processing.guarantee=exactly_once_v2

Unfortunately, I don't know how to create a suggestion for this, as the backticks in the code above interfere with the syntax for the GH suggestion.

ybyzek

comment created 3 months ago
