Skip to content

Conversation

@sandeep-mst
Copy link
Contributor

Fixes #24422

Motivation

The Apache Pulsar KinesisSink utilises the KinesisProducer from the KPL. The KPL provides a comprehensive set of configuration parameters through its KinesisProducerConfiguration class.

In the current KinesisSink implementation, most of these parameters such as collectionMaxCount, collectionMaxSize, connectTimeout, maxConnections, and minConnections—are not configurable. This restricts users' ability to optimise the sink's performance, or meet network and operational requirements potentially affecting throughput, latency, and resource utilisation.

Modifications

  • Added the following KPL configuration parameters directly into the KinesisSink configuration.
Parameter Description
collectionMaxCount Maximum number of items to pack into an PutRecords request.
collectionMaxSize Maximum amount of data to send with a PutRecords request.
connectTimeout Timeout (milliseconds) for establishing TLS connections.
credentialsRefreshDelay How often to refresh credentials (in milliseconds).
maxConnections Maximum number of connections to open to the backend.
minConnections Minimum number of connections to keep open to the backend
rateLimit Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.
recordTtl Set a time-to-live on records (milliseconds).
requestTimeout The maximum total time (milliseconds) allowed for a HTTP request
  • Introduce an Optional Hashmap for Additional Configuration (extraKinesisProducerConfiguration), which accepts a hashmap of key-value pairs corresponding to any other KPL configuration parameters not directly exposed. This ensures users retain full control over the KinesisProducer's behavior without overloading the primary sink configuration interface.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change added tests and can be verified as follows:

  • Will be updated

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:
cognitree#25

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Jul 7, 2025
Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BewareMyPower
Copy link
Contributor

Could you rebase to master so that the checkstyle check for tests will be applied?

@sandeep-mst
Copy link
Contributor Author

sandeep-mst commented Jul 11, 2025

Could you rebase to master so that the checkstyle check for tests will be applied?

@BewareMyPower
Done.

@sandeep-mst
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@BewareMyPower
Copy link
Contributor

It's weird for this failure:

Restoring tar from name pulsar-maven-repository-binaries.tar.zst in GitHub Actions Artifacts to /home/runner
  /home/runner/work/_temp/_github_home/.local/bin/gh-actions-artifact-client.js:79024
      throw new ArtifactNotFoundError(`No artifacts found with name: ${name}`);

@sandeep-mst maybe you have to try merging to master again to re-trigger the build.

BTW, @lhotari do you have any idea about the failure above?

@sandeep-mst
Copy link
Contributor Author

@BewareMyPower
Rebased to master again.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.33%. Comparing base (bbc6224) to head (f609773).
Report is 1208 commits behind head on master.

Files with missing lines Patch % Lines
...java/org/apache/pulsar/io/kinesis/KinesisSink.java 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24489      +/-   ##
============================================
+ Coverage     73.57%   74.33%   +0.76%     
- Complexity    32624    32933     +309     
============================================
  Files          1877     1869       -8     
  Lines        139502   146105    +6603     
  Branches      15299    16762    +1463     
============================================
+ Hits         102638   108610    +5972     
+ Misses        28908    28889      -19     
- Partials       7956     8606     +650     
Flag Coverage Δ
inttests 26.88% <ø> (+2.30%) ⬆️
systests 23.27% <0.00%> (-1.05%) ⬇️
unittests 73.84% <93.33%> (+0.99%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...rg/apache/pulsar/io/kinesis/KinesisSinkConfig.java 90.90% <100.00%> (+10.13%) ⬆️
...java/org/apache/pulsar/io/kinesis/KinesisSink.java 57.53% <92.30%> (+2.91%) ⬆️

... and 1100 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sandeep-mst
Copy link
Contributor Author

sandeep-mst commented Jul 21, 2025

@BewareMyPower, @dlg99, @lhotari

The OWASP dependency check is failing on master as well and I have verified that pulsar-io/kinesis has no new vulnerabilities.

Let me know if anything else is required on this PR.

@lhotari
Copy link
Member

lhotari commented Jul 21, 2025

It's weird for this failure:

Restoring tar from name pulsar-maven-repository-binaries.tar.zst in GitHub Actions Artifacts to /home/runner
  /home/runner/work/_temp/_github_home/.local/bin/gh-actions-artifact-client.js:79024
      throw new ArtifactNotFoundError(`No artifacts found with name: ${name}`);

@sandeep-mst maybe you have to try merging to master again to re-trigger the build.

BTW, @lhotari do you have any idea about the failure above?

@BewareMyPower yes. Retrying a failed build after 3 days won't work since the GitHub workflow artifacts retention is set to 3 days so that we don't overflow the size of storage. It's necessary to trigger a completely new build in that case. Usually it's recommended to merge master to the PR branch to trigger that.

@dlg99 dlg99 merged commit 80902f8 into apache:master Jul 22, 2025
96 of 98 checks passed
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
@lhotari lhotari added this to the 4.1.0 milestone Jul 25, 2025
Technoboy- pushed a commit that referenced this pull request Jul 31, 2025
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/branch-4.0 doc-required Your PR changes impact docs and you will update later. ready-to-test release/4.0.7

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Expose KinesisProducerConfiguration with the KinesisSinkConfig

6 participants