Skip to content

Adding Regex based subscriptions to KafkaIo Dynamic Read.  #21338

@damccorm

Description

@damccorm

In https://issues.apache.org/jira/browse/BEAM-11325 the ability for Kafka to read from topics dynamically was added.

Along with this change, the ability to use regex to subscribe to topics in a dynamic way was discussed in the design for this change. 

https://docs.google.com/document/d/1FU3GxVRetHPLVizP3Mdv6mP5tpjZ3fd99qNjUI5DT5k/edit?disco=AAAALGbMoak

Pointing out the idea that subscribing to all topics in a cluster isn't particularly useful for most kafka users, where pattern based subscription is a very common pattern of use.

https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#subscribe-java.util.regex.Pattern-

https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/WatchKafkaTopicPartitionDoFn.java#L113

The current implementation does not utilize a pattern in anyway to subscribe to topics. The comments on the document mention piggy backing off the existing functionality in the KafkaConsumers subscribe method.

However, piggy backing on the existing consumer method is made difficult by the per partition subscription method used by beam.

But I believe a simple solution exists, 

public Read<K, V> withDynamicRead(Duration duration) {

As apart of the with dynamic read method, allow the option to pass a Pattern.

As the watchkafkatopicpartitiondofn does now, call listTopics, and then match against the list of topics using the supplied pattern. 

Imported from Jira BEAM-13987. Original Jira may contain additional context.
Reported by: njohnson223.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions