Skip to content

make the method setBackOffFunction work for the batch processing as well#4455

Closed
LeosBitto wants to merge 4 commits into
spring-projects:mainfrom
LeosBitto:main
Closed

make the method setBackOffFunction work for the batch processing as well#4455
LeosBitto wants to merge 4 commits into
spring-projects:mainfrom
LeosBitto:main

Conversation

@LeosBitto

Copy link
Copy Markdown
Contributor

The method setBackOffFunction of FailedRecordProcessor and its descendants (FailedBatchProcessor and DefaultErrorHandler) is useful for classifying the exceptions thrown from the application code (listeners). However, I find it very surprising that this method does not work when processing the messages in batches (the supplied backOffFunction is never called there). Please review my attempt to make it work for the batch processing as well.

LeosBitto added 2 commits May 30, 2026 10:19
Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
…cessing (in FallbackBatchErrorHandler) the same way as it does without batch processing (in FailedRecordTracker)

Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
@sobychacko

Copy link
Copy Markdown
Contributor

Hi @LeosBitto, thanks for the PR.

I'd like to scope this down to just the setBackOffFunction fix if possible. The resetStateOnExceptionChange part brings in a fair bit: new public retryBatch overload, deprecation of the old one, the override on FailedBatchProcessor that silently changes the existing setter's semantics, and the outer loop in handleBatch, etc. We want to backport this to 4.0.x and possibly 3.3.x before OSS support ends end of June, so a smaller diff helps. Also, as we are very close to the 4.1.0 GA release (< 1 week), I would like to reduce the API surface area changes as little as possible, but would love to merge the fundamental issues you uncovered via these changes.

How about a minimal version of this PR for the time being?

  • Keep your setFailureTracker wiring in the FailedBatchProcessor constructor
  • Keep the package-private widening of FailedRecordTracker.determineBackOff
  • In FallbackBatchErrorHandler.handleBatch, resolve the BackOff using determineBackOff and pass it to the existing retryBatch call.

That way we keep ErrorHandlingUtils untouched - no new overload, no deprecation, no outer loop, no new public setters, etc.

For resetStateOnExceptionChange on the batch path in the PR, please open a separate issue. Defaults differ between the two (true on FailedRecordTracker, false on FallbackBatchErrorHandler), and overloading the existing setter to drive both feels off. Worth its own design discussion, and a new issue and PR might be the right approach.

Also could you add a test that exercises setBackOffFunction in batch mode? I don't see such a test in the PR.

Couple of nits: Found some minor typos in javadoc comments, SuppressWarning tags, etc.

Please let us know what you think about these. As I am said, I am eager to make these changes into the releases next week, but a bit ambivalent to merge the full change set as it stands right now.

Thanks!

@LeosBitto

LeosBitto commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

I'd like to scope this down to just the setBackOffFunction fix if possible. The resetStateOnExceptionChange part brings in a fair bit: new public retryBatch overload, deprecation of the old one, the override on FailedBatchProcessor that silently changes the existing setter's semantics, and the outer loop in handleBatch, etc. We want to backport this to 4.0.x and possibly 3.3.x before OSS support ends end of June, so a smaller diff helps.

I understand your concerns, and I have started with just just the minimal setBackOffFunction fix as you suggest - see my first commit. But then I have tested it in my application and found a problem with resetStateOnExceptionChange being hardcoded to false for batch processing (even after DefaultErrorHandler#setResetStateOnExceptionChange(true) is called), which makes BackOffFunction work incorrectly - consider a first exception which leads to a BackOff with many retries (until this exception is resolved, because it is known to be transient like java.sql.SQLTransientException for example), then another exception which should lead to quickly skipping the message (to avoid blocking processing of the following messages, because that exception is known to be unresolvable by repeating) but this does not happen because resetStateOnExceptionChange is hardcoded to false in ErrorHandlingUtils.retryBatch (i.e. even after the first exception is resolved its BackOff always continues to be used for another exception). So I have added the second commit which makes resetStateOnExceptionChange work for the single message processing and batch processing consistently. I have preserved the backward compatibility as much as possible (even with the different default values for resetStateOnExceptionChange in FailedRecordTracker and FallbackBatchErrorHandler as you have correctly noticed) while enabling the application to configure the exception handling properly.

Of course I could revert my second commit and keep resetStateOnExceptionChange working incorrectly for the batch processing, but I would not like this because I see the current ignoring of resetStateOnExceptionChange for the batch processing to be an obvious bug, which makes using BackOffFunction unreliable.

Please let me know whether you really want to keep the current (mis)behaviour of resetStateOnExceptionChange for the batch processing even after I have explained the consequences.

Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
@LeosBitto

Copy link
Copy Markdown
Contributor Author

Also could you add a test that exercises setBackOffFunction in batch mode? I don't see such a test in the PR.

You are right, such test was missing. I have added it now.

Couple of nits: Found some minor typos in javadoc comments, SuppressWarning tags, etc.

Could you point me to those, please? I will fix them asap.

@sobychacko

Copy link
Copy Markdown
Contributor

@LeosBitto Yes, I see your points now. It makes sense to me after hearing your perspectives. I agree that without the reset working, the BackOffFunction is misleading in batch mode. However, let's merge the full change only on main branch. No backport to 4.0.x or 3.3.x - the propagation behavior change is fine for a feature release but I don't recommend it on a maintenance branch this late.

Thanks for adding the missing test.

Couple of things before merge:

  • Please add a what's-new entry for 4.1.0 calling out the propagation. Existing users setting setResetStateOnExceptionChange(true) on DefaultErrorHandler will see new batch behavior on the fallback path.
  • Typo in the @SuppressWarnings comment in FailedBatchProcessor: "geFailureTracker"
  • The new retryBatch overload javadoc reads off: "true to if a different exception thrown during retry should end this method".

Thanks!

Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
@LeosBitto

Copy link
Copy Markdown
Contributor Author

@LeosBitto Yes, I see your points now. It makes sense to me after hearing your perspectives. I agree that without the reset working, the BackOffFunction is misleading in batch mode. However, let's merge the full change only on main branch. No backport to 4.0.x or 3.3.x - the propagation behavior change is fine for a feature release but I don't recommend it on a maintenance branch this late.

I have fixed the typos and amended whats-new.adoc, so this PR should be ready for merge.

Do I understand it correctly that you are afraid of backporting because of changed behavior of setResetStateOnExceptionChange(true) for the batch processing? I cannot imagine that somebody would actually call this and rely on the bug that it does not actually work for the batch processing...

@sobychacko sobychacko added this to the 4.1.0 milestone Jun 3, 2026
sobychacko pushed a commit to sobychacko/spring-kafka that referenced this pull request Jun 3, 2026
Wire `FailedRecordTracker` into `FallbackBatchErrorHandler` so
that `BackOffFunction` is consulted for batch retries, matching
the behavior already in place for single-record processing.

Also propagate `setResetStateOnExceptionChange` from
`FailedBatchProcessor` to `FallbackBatchErrorHandler` so that a
changed exception can start a fresh back-off sequence. The
default remains false on the batch path to preserve existing
runtime behavior.

Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
@sobychacko

Copy link
Copy Markdown
Contributor

Merged upstream via this commit: 7c9f5c0

The main reason we don't want to backport this to maintenance branches is the new API changes. It wouldn't be fair to users on 3.3.x and 4.0.x suddenly finding the retryBatch method deprecated alongside other public API additions. We usually don't make those types of API changes in maintenance releases. Notwithstanding that, if you want to send a PR
to add these behavior changes to those maintenance versions in a non-API-changing manner, we'd be open to the idea.

Thanks a lot for the PR and for your patience throughout the review process.

@sobychacko sobychacko closed this Jun 3, 2026
artembilan pushed a commit that referenced this pull request Jun 8, 2026
Wire `FailedRecordTracker` into `FallbackBatchErrorHandler` so
that `BackOffFunction` is consulted for batch retries, matching
the behavior already in place for single-record processing.

Also propagate `setResetStateOnExceptionChange` from
`FailedBatchProcessor` to `FallbackBatchErrorHandler` so that a
changed exception can start a fresh back-off sequence. The
default remains false on the batch path to preserve existing
runtime behavior.

Fixes: #4455

Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
artembilan pushed a commit that referenced this pull request Jun 8, 2026
Wire `FailedRecordTracker` into `FallbackBatchErrorHandler` so
that `BackOffFunction` is consulted for batch retries, matching
the behavior already in place for single-record processing.

Also propagate `setResetStateOnExceptionChange` from
`FailedBatchProcessor` to `FallbackBatchErrorHandler` so that a
changed exception can start a fresh back-off sequence. The
default remains false on the batch path to preserve existing
runtime behavior.

Fixes: #4455

Signed-off-by: Leos Bitto <leos.bitto@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants