KAFKA-5886: Introduce delivery.timeout.ms producer config (KIP-91) by yuyang08 · Pull Request #5270 · apache/kafka

yuyang08 · 2018-06-21T23:09:21Z

This change is based on @sutambe 's change #3849 earlier.

primary changes in this pr:

In RecordAccumulator.java, use inFlightBatches to track the in-flight batches, instead of using soonToExpireInFlightsBatches to only track the soon-to-expire batches. With this change, in RecordAccumulator.expiredBatches, we check both inFlightBatches and batches to find the expired batches.
Fixed the test failures in SenderTest.java and RecordAccumulatorTest.java.

yuyang08 · 2018-06-21T23:14:12Z

@apurvam, @becketqin, @guozhangwang, @ijuma could you help to review this change?

guozhangwang · 2018-06-23T00:27:18Z

Thanks @yuyang08 , we will take a look at this PR asap.

yuyang08 · 2018-06-26T17:40:03Z

@guozhangwang , @ijuma , @apurvam , @becketqin friendly ping ... your feedback will help us to iterate faster

hachikuji

Left a few initial comments. Still making my way through the full PR.

hachikuji · 2018-06-26T20:16:27Z

This check is a little weird to me. It seems arbitrary to check only lingerMs in the case of overflow. Wouldn't it make more sense to use Integer.MAX_VALUE?

updated the sanity checking logic. changed lingerMs and deliveryTimeoutMs to integer type. with that, we convert them to long for addition and do not need to worry about the overflow.
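A minimal sketch of the widening idea described in this reply, not the code from the PR: two int timeouts are promoted to long before they are added, so the sum cannot wrap around. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Kafka: illustrates widening before addition.
final class TimeoutSum {
    private TimeoutSum() { }

    // Widen to long first; adding two ints directly could wrap to a negative value.
    static long sumMs(int firstMs, int secondMs) {
        return (long) firstMs + secondMs;
    }
}
```

For example, `TimeoutSum.sumMs(Integer.MAX_VALUE, 30_000)` stays positive, whereas the same addition in int arithmetic would overflow.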

hachikuji · 2018-06-26T20:18:34Z

I'm wondering if we should set this value automatically if the value is not overridden. It would be annoying to get this config error after an upgrade if I have overridden the request timeout. Maybe we could use the max of the default delivery timeout and the sum of requestTimeoutMs and lingerMs?

my concern is that setting the value automatically may hide some issues from the user, and it also requires more explanation. It might be better to keep it simple. The issue that users can run into during an upgrade is that they override requestTimeoutMs with a large value, and that violates the required invariant deliveryTimeout > requestTimeoutMs + lingerMs. In that case, the user will be notified of the error during producer initialization, and it should not take much effort to fix it.

Yeah, I appreciate the concern. However, there is already some precedent for redefining defaults based on provided configurations. For example, when idempotence is enabled, we override retries if it has not been explicitly provided by the user. We log an info message so that the user knows the default has been adjusted.

In general, I think we should try to avoid compatibility issues even in configuration as long as it is safe to do so. By using a larger request timeout, the user has already declared willingness to await for the request timeout to receive a produce acknowledgement, so redefining the default delivery timeout seems reasonable and saves an annoying config update.

A reason for the error is that a longer request timeout is not unlikely to have been done as a workaround for the issue that delivery timeout is fixing. So, it might provide the user with an opportunity to fix that. A warning might be a less heavy handed way to achieve this though.

updated the change to use requestTimeoutMs + lingerMs if it is larger than deliveryTimeoutMs setting, and log a warning message.

hachikuji · 2018-06-26T20:21:11Z

The latter half here may not be very clear to users since it's a bit low level. Is there actually a lower bound for the timeout or can it be arbitrarily small depending on how long the batch remains open?

updated the doc string on the lower bound explanation.

hachikuji · 2018-06-26T20:25:07Z

Can you mention this default change in the upgrade notes?

updated the upgrade doc on this.

hachikuji · 2018-06-26T20:27:16Z

Maybe helpful to mention the baseOffset in this message?

added baseOffset info in the log message

hachikuji · 2018-06-26T20:53:17Z

nit: this is unused

hachikuji · 2018-06-26T20:56:20Z

Can we explain in this comment why we use this order? Intuitively, I would expect that we'd expire the oldest stuff first so that the callbacks are invoked in the order of sending.

this is a comment that was in #3849. updated the comment.

hachikuji · 2018-06-26T20:57:59Z

Hmm.. Shouldn't we be expiring the batch in the else case? Otherwise it seems like we may lose track of the batch.

good catch. expire the batch in else branch.

hachikuji · 2018-06-26T21:03:43Z

Unless I'm missing something, we only remove from the inFlightBatches collection when we reenqueue and when we encounter a delivery timeout. Don't we need to remove on completion or failure as well?

Good catch. This would have to be called from Sender.failBatch and Sender.completeBatch as well, similar to how the transactionManager.removeInflightBatch is called in those methods.

good catch! updated the code to call maybeRemoveFromInflightBatches from Sender.failBatch and Sender.completeBatch.

hachikuji · 2018-06-26T21:04:00Z

nit: unintentional?

restored the space

apurvam

Thanks for the patch @yuyang08 .. I made a pass over the core logic and left some comments. The logic is looking good.

I still need to go over the tests and make a second pass over the core code.

apurvam · 2018-06-28T06:06:01Z

In what scenario will batch.createdMs + deliveryTimeoutMs be negative? batch.createdMs is the wall clock time when the batch is created, and should always be positive. The config def for deliveryTimeoutMs enforces the value is atleast(0). So then how could the sum of the two be negative? Are you accounting for wrap around?

yes, this is to guard us against potential overflow due to setting a large value for deliveryTimeoutMs.

Ok. It would be good to mention this in a comment. It is very non-obvious.

update the code to use if ... statement and add the comment
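The overflow guard discussed in this thread can be sketched as follows. This is an illustration under the assumption that both createdMs and deliveryTimeoutMs are non-negative; the class and method names are hypothetical, and the second method shows an alternative formulation that avoids the addition entirely.

```java
// Hypothetical sketch, assuming createdMs >= 0 and deliveryTimeoutMs >= 0.
final class DeliveryDeadline {
    private DeliveryDeadline() { }

    // Absolute expiry time, saturating at Long.MAX_VALUE if the sum overflows.
    static long expiryTimeMs(long createdMs, long deliveryTimeoutMs) {
        long expiry = createdMs + deliveryTimeoutMs;
        if (expiry < 0) {
            // The sum of two non-negative longs is negative only after wrap-around.
            return Long.MAX_VALUE;
        }
        return expiry;
    }

    // Overflow-free check: compare the elapsed time with the timeout instead of adding.
    static boolean hasReachedDeliveryTimeout(long createdMs, long deliveryTimeoutMs, long nowMs) {
        return deliveryTimeoutMs <= nowMs - createdMs;
    }
}
```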

apurvam · 2018-06-28T06:15:07Z

I am assuming this method and drainBatchesForOneNode is just a refactor? Was any logic changed here? It is hard to tell from the diff and the logic is super intricate so subtle changes are easy to miss. I assume nothing should have to change in this portion of the code since it is totally orthogonal to expiring batches.

yes, this is a refactoring. otherwise, the method RecordAccumulator.drain fails the style check due to too many possible paths in one method. compared with the previous code, the only change is to add the following lines that update inFlightBatches:

    // put this batch in the inflight list
    List<ProducerBatch> inflightBatchList = inFlightBatches.get(batch.topicPartition);
    if (inflightBatchList == null) {
        inflightBatchList = new LinkedList<>();
        inFlightBatches.put(tp, inflightBatchList);
    }
    inflightBatchList.add(batch);

apurvam · 2018-06-28T06:20:20Z

here and elsewhere, the code style for kafka requires that there be braces even around single line if statements like this.

restored the curly braces.

apurvam · 2018-06-28T06:21:20Z

The preceding comment needs to be updated to account for this new logic.

good point. updated the comment

apurvam · 2018-06-28T06:21:50Z

This seems not to be used.

good catch. removed the unused code.

apurvam · 2018-06-28T06:27:08Z

Good catch. This would have to be called from Sender.failBatch and Sender.completeBatch as well, similar to how the transactionManager.removeInflightBatch is called in those methods.

apurvam

Hi @yuyang08 , thanks for the updates.

It is looking good. I left some comments, but I think the larger point is that we should add some checks that the inflight batches tracked in the accumulator are actually cleared.

I added some suggestions for adding checks for when batches expire.

But we should also add checks that the accumulator has no inflight batches when batches complete successfully and when they fail.

apurvam · 2018-06-28T23:30:13Z

nit: move this to a helper named markBatchInflight or something similar.

updated the code to capture this in markBatchInflight method
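A helper of this kind might look like the sketch below. It is not the PR's implementation: the generic tracker class is a hypothetical stand-in for the accumulator, and `computeIfAbsent` replaces the explicit null check quoted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for per-partition in-flight tracking; P is the partition key, B the batch.
final class InflightTracker<P, B> {
    private final Map<P, List<B>> inFlightBatches = new HashMap<>();

    // Create the per-partition list on first use, then append the batch.
    void markBatchInflight(P partition, B batch) {
        inFlightBatches.computeIfAbsent(partition, p -> new ArrayList<>()).add(batch);
    }

    // Read-only view used by tests; empty when nothing is in flight for the partition.
    List<B> inFlightBatches(P partition) {
        List<B> batches = inFlightBatches.get(partition);
        return batches == null ? Collections.<B>emptyList() : Collections.unmodifiableList(batches);
    }
}
```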

apurvam · 2018-06-29T05:48:45Z

We should log a warning if we have overflowed and are hence not updating the next batch expiry time.

good point. added the logging here.

apurvam · 2018-06-29T06:00:58Z

We should add some sort of accumulator.hasInflightBatches method, and then check that it returns false here. This would check that expired batches are not reenqueued, which is logic added in this patch.

added the method public List<ProducerBatch> inFlightBatches(TopicPartition tp) in RecordAccumulator, and updated the test with two assertions:

line 1912:
    assertEquals("Expect one in-flight batch in accumulator", 1, accumulator.inFlightBatches(tp0).size());
.....
line 1920:
    assertEquals("Expect zero in-flight batch in accumulator", 0, accumulator.inFlightBatches(tp0).size());

apurvam · 2018-06-29T06:03:16Z

How is this different from the test testExpiryOfFirstBatchShouldNotCauseUnresolvedSequencesIfFutureBatchesSucceed?

testExpiryOfFirstBatchShouldNotCauseUnresolvedSequencesIfFutureBatchesSucceed initializes the sender with guaranteeMessageOrder = false, while testWhenFirstBatchExpireNoSendSecondBatchIfGuaranteeOrder initializes the sender with guaranteeMessageOrder = true. The inflightBatches size differs depending on whether the parameter is set to true or false.

apurvam · 2018-06-29T06:05:12Z

This should be dropped, or be log.debug.

good catch. removed this debugging line.

apurvam · 2018-06-29T06:05:43Z

If we had accumulator.hasInflightRequests we could assert false here.

apurvam · 2018-06-29T06:22:42Z

How is this testing that we don't double deallocate? Ideally, we would expire the inflight batch, then get a response, and then check that the batch is not deallocated twice. In this case, it would deallocate only once, unless I am missing something.

updated the test a bit to add another sender.run(time.milliseconds()); call after the batch expiry. Not sure if that fits "expire the inflight batch, then get a response".

my expectation was that if there is double deallocation, we would get an IllegalStateException exception from MatchingBufferPool.deallocate.

yuyang08 · 2018-06-29T23:37:03Z

@hachikuji, @apurvam I've updated SenderTest.java with checks to ensure that the inflight batches tracked in the accumulator are cleared properly, and addressed your comments. could you check again?

yuyang08 · 2018-07-02T18:57:38Z

@hachikuji , @apurvam friendly ping ... could you help to review the updated change to allow us to iterate faster? thanks!

hachikuji

Thanks for the updates. Left a few more comments.

hachikuji · 2018-07-02T20:33:42Z

To clarify, I think we should only override the default value. If the user has explicitly provided an inconsistent value, then we should throw an exception. We can check this using config.originals().
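The rule proposed here can be sketched roughly as follows. This is an assumption-laden illustration: the `userProvided` map stands in for what `config.originals()` would return, `IllegalArgumentException` stands in for whatever configuration exception the producer raises, and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of "override the default, reject an explicit inconsistent value".
final class DeliveryTimeoutResolver {
    static final String DELIVERY_TIMEOUT_MS = "delivery.timeout.ms";

    private DeliveryTimeoutResolver() { }

    static int resolve(Map<String, ?> userProvided, int configuredMs, int lingerMs, int requestTimeoutMs) {
        // Widen to long so the lower bound itself cannot overflow.
        long lowerBound = (long) lingerMs + requestTimeoutMs;
        if (configuredMs >= lowerBound) {
            return configuredMs;
        }
        if (userProvided.containsKey(DELIVERY_TIMEOUT_MS)) {
            // The user set the value explicitly, so fail fast instead of silently changing it.
            throw new IllegalArgumentException(DELIVERY_TIMEOUT_MS
                + " should be equal to or larger than linger.ms + request.timeout.ms");
        }
        // Only the default is in play: raise it to the lower bound (a warning would be logged here).
        return (int) Math.min(lowerBound, Integer.MAX_VALUE);
    }
}
```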

hachikuji · 2018-07-02T20:38:10Z

nit: createdMs for consistency?

By the way, we have a function below createdTimeMs which is currently unused and seems incorrect anyway. Can you remove it?

updated to createdMs, and removed the unused method.

hachikuji · 2018-07-02T20:42:43Z

nit: "Ignores" -> "Ignored"?

hachikuji · 2018-07-02T20:46:38Z

It seems tryFinalState can only be SUCCEEDED or FAILED, so one of these transitions is not possible anyway.

ProducerBatch.finalState can also be updated to FinalState.ABORTED through Sender.run() --> RecordAccumulator.abortIncompleteBatches() or abortUndrainedBatches() --> ... --> ProducerBatch.abort().

My point is that tryFinalState can only be SUCCEEDED or FAILED.

final FinalState tryFinalState = (exception == null) ? FinalState.SUCCEEDED : FinalState.FAILED;

So the only possibilities are FAILED -> FAILED and ABORTED -> FAILED.

i see. misunderstood your comment earlier. updated the change
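The state handling discussed in this thread can be illustrated with a small sketch. It assumes that a late completion arriving after FAILED or ABORTED is simply ignored, and that completing a batch that already SUCCEEDED counts as a violated invariant; the class is a hypothetical stand-in, not ProducerBatch itself.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a batch's final-state bookkeeping.
final class BatchState {
    enum FinalState { ABORTED, FAILED, SUCCEEDED }

    private final AtomicReference<FinalState> finalState = new AtomicReference<>(null);

    // Returns true if this call set the final state, false if the completion was ignored.
    boolean done(RuntimeException exception) {
        // The attempted state can only ever be SUCCEEDED or FAILED.
        FinalState tryFinalState = (exception == null) ? FinalState.SUCCEEDED : FinalState.FAILED;
        if (finalState.compareAndSet(null, tryFinalState)) {
            return true;
        }
        if (finalState.get() == FinalState.SUCCEEDED) {
            // Assumed invariant: a successful batch is never completed a second time.
            throw new IllegalStateException("Batch already SUCCEEDED, cannot transition to " + tryFinalState);
        }
        // Already FAILED or ABORTED (for example after expiry): ignore the late completion.
        return false;
    }

    void abort() {
        finalState.compareAndSet(null, FinalState.ABORTED);
    }

    FinalState finalState() {
        return finalState.get();
    }
}
```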

hachikuji · 2018-07-02T20:48:18Z

Are the order of the first two arguments backwards? I think tryFinalState is the state we're trying to transition to.

good catch! fixed the arguments order.

hachikuji · 2018-07-02T21:56:44Z

nit: add back the newline

added it back

hachikuji · 2018-07-02T21:57:02Z

nit: unneeded newline

removed the new line

hachikuji · 2018-07-02T21:58:19Z

nit: topic-partition as we did above?

hachikuji · 2018-07-02T22:00:16Z

nit: realign

hachikuji · 2018-07-02T22:06:55Z

Aren't both bounds inclusive? The inconsistency is a little annoying and this is a private constructor anyway. I think mentioning this in the javadoc below is good enough.

added java doc comments and changed the parameter name to max

hachikuji · 2018-07-03T16:13:03Z

Can you mention that this is for the producer? For example:

The default value for the producer's retries config was changed to Integer.MAX_VALUE ...

Might also be worthwhile including a link to KIP-91

updated to include KIP-91 link

yuyang08 · 2018-07-09T17:57:40Z

@hachikuji @apurvam @ijuma have updated the change to address your comments. could you take a look again? thanks!

yuyang08 · 2018-07-11T18:00:19Z

@hachikuji , @apurvam friendly ping... mind to take another look? thanks!

hachikuji

Thanks, left a few more comments.

hachikuji · 2018-07-11T17:58:30Z

I'm not sure this is sufficient to solve the issue mentioned above. If we do not reenqueue because the timeout has been reached, then who is responsible for completing the batch? I think I would probably suggest that we skip the check for delivery timeout and just reenqueue. We will detect the expiration the next time we iterate the deque.

I think the only issue with reenqueuing unconditionally is that the batch will then be drained even if it is expired, since accumulator.drain is called before accumulator.expiredBatches in the background thread. This could violate the contract.

That said, we need to complete the batch and deallocate it over here, otherwise it seems to be dropped on the floor.

updated to check whether a batch has reached deliveryTimeoutMs or not in Sender.canRetry. With this change, we do not need to check whether a batch has timed out or not in RecordAccumulator.reenqueue, and cleaning up of timed-out batches is handled by Sender.failBatch.

    private boolean canRetry(ProducerBatch batch, ProduceResponse.PartitionResponse response, long now) {
        return !batch.hasReachedDeliveryTimeout(accumulator.getDeliveryTimeoutMs(), now) && ...
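The shape of such a retry decision can be sketched as below. The snippet above is truncated, so the remaining conditions here (an attempt limit and a retriable-error flag) are assumptions made for illustration, and the class and parameter names are hypothetical.

```java
// Hypothetical sketch: a batch is retried only while its delivery timeout has not elapsed.
final class RetryPolicy {
    private RetryPolicy() { }

    static boolean canRetry(long createdMs, long deliveryTimeoutMs, long nowMs,
                            int attempts, int maxRetries, boolean retriableError) {
        // Compare elapsed time with the timeout; this form cannot overflow for sane clocks.
        boolean timedOut = deliveryTimeoutMs <= nowMs - createdMs;
        // Assumed extra conditions: attempts remain and the error is worth retrying.
        return !timedOut && attempts < maxRetries && retriableError;
    }
}
```

With the timeout check in the retry decision, a timed-out batch falls through to the failure path, where it can be completed and deallocated in one place.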

hachikuji · 2018-07-11T18:00:20Z

Is this call necessary for in-flight batches? Presumably we have already closed the append stream since the batch was already sent once.

good point. removed this line.

hachikuji · 2018-07-11T18:01:56Z

If it is an expected invariant, I would suggest we raise an exception if it does not hold. Otherwise, bugs will go undetected since we don't have a way to verify logging output when running tests.

+1. Better to fail with an exception if the invariant is violated.

updated the change to throw IllegalBatchFinalStateException when the invariant is violated.

hachikuji · 2018-07-11T18:05:49Z

nit: introduces -> introduced

fixed the typo.

tedyu · 2018-07-12T04:14:37Z

Why change linger to int?

we have request.timeout.ms and delivery.timeout.ms of int type. this is to make the type of linger.ms be consistent with other timeout related settings.

tedyu · 2018-07-12T04:15:18Z

Should this be of type long?
With long, there is no overflow on line 478

In ProducerBatch, deliveryTimeoutMs is long in hasReachedDeliveryTimeout

ProducerBatch.hasReachedDeliveryTimeout is called by RecordAccumulator. In RecordAccumulator's constructor, we have been using the long type for lingerMs and retryBackoffMs. It would be inconsistent to use int for deliveryTimeoutMs, and it would require changes in many places (especially in the test cases) if we used the int type for lingerMs and retryBackoffMs. I thought that it would be better to have another PR for data type related changes for lingerMs etc.

public RecordAccumulator(LogContext logContext, int batchSize, CompressionType compression, long lingerMs, long retryBackoffMs, ...

tedyu · 2018-07-12T04:17:42Z

Doesn't need to be warn.
Can be info since there is no action from user

This is to try to get the user's attention, as we are overriding the default delivery.timeout.ms setting. Previously the user may have set a long request.timeout.ms as a workaround. The user may want to explicitly set delivery.timeout.ms and give a smaller value for request.timeout.ms.
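As an illustration of the setup suggested here, the sketch below sets the three timeouts explicitly and checks that they are mutually consistent. The numeric values are purely illustrative, and the helper class is hypothetical; only the three configuration keys come from the discussion.

```java
import java.util.Properties;

// Hypothetical helper showing an explicit, consistent set of producer timeouts.
final class ProducerTimeoutSettings {
    private ProducerTimeoutSettings() { }

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("linger.ms", "5");               // illustrative value
        props.setProperty("request.timeout.ms", "30000");   // illustrative value
        props.setProperty("delivery.timeout.ms", "120000"); // illustrative value
        return props;
    }

    // True when delivery.timeout.ms covers linger.ms plus request.timeout.ms.
    static boolean isConsistent(Properties props) {
        long lingerMs = Long.parseLong(props.getProperty("linger.ms"));
        long requestTimeoutMs = Long.parseLong(props.getProperty("request.timeout.ms"));
        long deliveryTimeoutMs = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        return deliveryTimeoutMs >= lingerMs + requestTimeoutMs;
    }
}
```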

apurvam

Sorry for the delayed response @yuyang08 . The week of july 4 was extremely short, and last week we had a company event and multi-day off site, which took a lot of cycles.

I left a few more comments, the biggest one around synchronization of inFlightBatches.

apurvam · 2018-07-17T05:46:53Z

It seems to me that we need to synchronize on accesses to the contained List<ProducerBatch>. This list is updated in maybeRemoveFromInflightBatches, which is called both from the sender background thread and from the network client threads. This unsafe access can corrupt the list in the present patch.

I am not fully convinced that there is concurrent access to List<ProducerBatch> inFlightBatches. In KafkaProducer, there is only one background I/O thread (Sender) that turns the records into requests and transmits them to the cluster. inFlightBatches is only modified by RecordAccumulator methods that are called by the Sender thread. Sender.run(long now) --> NetworkClient.poll --> NetworkClient.handle... --> Sender.handleProduceResponse --> Sender.completeBatch --> Sender.reenqueueBatch --> RecordAccumulator.reenqueue --> RecordAccumulator.maybeRemoveFromInflightBatches is the call stack. NetworkClient uses Selector for non-blocking I/O multiplexing, but it is not a thread.

We need to synchronize access on Deque<ProducerBatch> deque = entry.getValue(); as both the producer thread and the sender thread can concurrently access deque at the same time.

I couldn't find the concurrent access either, so it may be ok. It actually makes me wonder why we are using a ConcurrentMap? To be honest, it would be more intuitive for the inflight batches to be tracked inside Sender. I'd suggest doing that if it's straightforward, but I'm fine leaving it here if it takes a lot of work.

Yea. I think I was mistaken. If the callbacks from the network client are happening in the context of the sender thread, then there is no concurrent access on inflightBatches.

good point on tracking the inflight batches in Sender. updated the change to move inFlightBatches to Sender, and changed its type from ConcurrentMap to Map.

apurvam · 2018-07-17T05:48:10Z

+1. Better to fail with an exception if the invariant is violated.

apurvam · 2018-07-17T05:53:29Z

typo: "created".

fixed the typo

apurvam · 2018-07-17T05:54:25Z

As mentioned above, we should synchronize on this list while iterating, and then synchronize again in maybeRemoveFromInflightBatches

As I mentioned earlier, I might be missing something, but it seems to me that there is no concurrent access to inFlightBatches, and we do not need the synchronized qualifier for it.

apurvam · 2018-07-17T06:00:07Z

I think the only issue with reenqueuing unconditionally is that the batch will then be drained even if it is expired, since accumulator.drain is called before accumulator.expiredBatches in the background thread. This could violate the contract.

That said, we need to complete the batch and deallocate it over here, otherwise it seems to be dropped on the floor.

apurvam · 2018-07-17T06:02:08Z

Why did those tests fail if you leave the isMuted in there?

asfgit · 2018-07-20T07:13:59Z

FAILURE
4401 tests run, 1 skipped, 1 failed.
--none--

yuyang08 · 2018-07-23T17:39:21Z

@hachikuji , @apurvam, @tedyu thanks for your review! I've addressed all of your comments. The change also passed the tests. could you take a look again?

hachikuji

@yuyang08 Thanks for the updates. I think we're pretty close. I had a few more small comments.

hachikuji · 2018-07-24T00:04:05Z

I'm not sure we need a new exception type for an error that should not happen. Maybe we can just use IllegalStateException?

updated to use IllegalStateException

hachikuji · 2018-07-24T01:07:37Z

nit: can we use ArrayList? https://twitter.com/joshbloch/status/583813919019573248

updated to use ArrayList

hachikuji · 2018-07-24T01:08:17Z

We're not using the keys, so can we just iterate the values?

update to iterate through batches.values()

hachikuji · 2018-07-25T06:01:33Z

nit: extra space before method name

removed the extra space

hachikuji · 2018-07-25T06:03:04Z

Can you clarify why use this for the iteration? I would have expected we would iterate over the entry set of inFlightBatches.

this is really not needed. updated to iterate through inFlightBatches directly.

hachikuji · 2018-07-25T06:04:56Z

nit: this is unused

removed this unused method

hachikuji · 2018-07-25T06:08:00Z

This message needs to be updated.

updated the comment:

    // expireBatches is called in Sender.sendProducerData, before client.poll.
    // The batch.finalState() == null invariant should always hold. An IllegalStateException
    // exception will be thrown if the invariant is violated.

hachikuji · 2018-07-25T06:10:12Z

If batches is empty, should we remove the list from inFlightBatches?

updated the change to remove the list from inFlightBatches when it is empty.
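One way to express this cleanup is sketched below. It is not the PR's code: the helper is hypothetical and uses `Map.computeIfPresent`, whose remapping function removes the entry when it returns null.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: remove a batch and drop the per-partition list once it is empty.
final class InflightRemoval {
    private InflightRemoval() { }

    static <P, B> void maybeRemoveFromInflightBatches(Map<P, List<B>> inFlightBatches, P partition, B batch) {
        inFlightBatches.computeIfPresent(partition, (p, batches) -> {
            batches.remove(batch);
            // Returning null removes the mapping, so empty lists do not accumulate.
            return batches.isEmpty() ? null : batches;
        });
    }
}
```

Calling the helper for a partition with no entry is a no-op, which keeps the completion and failure paths free of extra null checks.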

hachikuji · 2018-07-25T21:25:12Z

retest this please

apurvam

Thanks for the patch @yuyang08 and for the patience during the review!

I went over the core flow and the tests once more, and it has been simplified quite nicely and LGTM!

hachikuji

LGTM, merging to trunk. Thanks for the patch!

hachikuji · 2018-07-26T16:02:54Z

Note that I removed an unused method and made some tweaks to the upgrade notes.

ijuma · 2018-07-26T16:42:58Z

@yuyang08 Thanks for picking up this important improvement. And thanks @sutambe for submitting the original PR.

likan999 · 2019-01-12T01:46:44Z

+        int lingerMs = config.getInt(ProducerConfig.LINGER_MS_CONFIG);
+        int requestTimeoutMs = config.getInt(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
+
+        if (deliveryTimeoutMs < Integer.MAX_VALUE && deliveryTimeoutMs < lingerMs + requestTimeoutMs) {


(deliveryTimeoutMs < lingerMs + requestTimeoutMs) implies (deliveryTimeoutMs < Integer.MAX_VALUE). Why do we need to check (deliveryTimeoutMs < Integer.MAX_VALUE)? Logically it can be removed.

yuyang08 mentioned this pull request Jun 21, 2018

KAFKA-5886: Implement KIP-91 delivery.timeout.ms #3849

Closed

ijuma added this to the 2.1.0 milestone Jun 23, 2018

ijuma added the producer label Jun 23, 2018

hachikuji reviewed Jun 26, 2018

apurvam reviewed Jun 28, 2018

apurvam reviewed Jun 29, 2018

hachikuji reviewed Jul 2, 2018

hachikuji reviewed Jul 3, 2018

hachikuji reviewed Jul 11, 2018

tedyu reviewed Jul 12, 2018

apurvam reviewed Jul 17, 2018

hachikuji reviewed Jul 25, 2018

KAFKA-5886: Introduce delivery.timeout.ms producer config

ef5bf7e

apurvam approved these changes Jul 25, 2018

Yu Yang and others added 3 commits July 25, 2018 17:02

KAFKA-5886: update kafka pull request 3849, and fix SenderTest and RecordAccumulatorTest

89a35b8

Remove unused method

2c73e61

Additional detail in upgrade notes

b9aa6b1

hachikuji approved these changes Jul 26, 2018

hachikuji merged commit 7fc7136 into apache:trunk Jul 26, 2018

yuyang08 deleted the kip91 branch August 31, 2018 01:06

likan999 reviewed Jan 12, 2019

dnwe mentioned this pull request Jun 6, 2022

Seeing broken pipe and EOF errors IBM/sarama#2239

Closed
