KAFKA-8052; Ensure fetch session epoch is updated before new request by rajinisivaram · Pull Request #6582 · apache/kafka

rajinisivaram · 2019-04-14T10:36:40Z

When a fetch response is processed by the heartbeat thread, the polling thread may send a new fetch request with the same epoch as the previous fetch request if the heartbeat thread hasn't yet updated the epoch. This results in an INVALID_FETCH_SESSION_EPOCH error. Even though the request is retried without any disconnections, it would be good to avoid this error. This PR tracks the status of the previous request in the session handler and sends the next fetch request only after the response to the previous request has been processed.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

cmccabe · 2019-04-15T20:46:22Z

Thanks, @rajinisivaram. Good find!

If I understand correctly, the problem is that we could have invoked the callback for the fetch request, and be about to increment the epoch, when another prepareFetchRequests call happened, and ended up re-using the same epoch number. This is because the client.hasPendingRequests(node) condition will be false once the response is received, even if we haven't yet processed that response. (P.S. Thanks to @hachikuji for an explanation of how the heartbeat thread works)

I don't think the logic for closing this race condition belongs in the FetchSessionHandler since it doesn't really have anything to do with fetch sessions. It would also be better to not use volatiles-- they are scary! :)

I think we should just get rid of the client.hasPendingRequests(node) check, since it's not sufficient, and add a SynchronizedSet<Integer> pendingFetchRequests. Then we can just do something like

if (pendingFetchRequests.add(nodeId)) {
  // ... invoke session handler
} else {
  // log.trace("Skipping fetch for partition {} because previous request to {} has not been processed", partition, node);
}

Then I think the onSuccess and onFailure handlers in Fetcher.java can do pendingFetchRequests.remove(nodeId), once they're finished with all processing. It would be better to do it at the very end of those functions, since otherwise we have to think really hard about the state of data structures like completedFetches, etc. Maybe a try/finally block could help here?
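For readers following along, the suggestion above can be sketched roughly as below. This is only an illustration under stated assumptions: `PendingFetchGuard`, `tryBeginFetch` and `onResponse` are hypothetical names, not part of Fetcher.java, and `Collections.synchronizedSet` stands in for the `SynchronizedSet<Integer>` shorthand used in the comment.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the fetch path in Fetcher.java; not the real Kafka class.
class PendingFetchGuard {
    // Node ids with a fetch whose response has not been fully processed yet.
    private final Set<Integer> pendingFetchRequests =
            Collections.synchronizedSet(new HashSet<>());

    /** Returns true if the caller may build and send a fetch to this node. */
    boolean tryBeginFetch(int nodeId) {
        // Set.add returns false when the id is already present,
        // i.e. a previous request to this node is still being handled.
        return pendingFetchRequests.add(nodeId);
    }

    /** Runs the response handler and clears the marker at the very end, even on error. */
    void onResponse(int nodeId, Runnable handler) {
        try {
            handler.run(); // e.g. update the session epoch, queue completed fetches
        } finally {
            pendingFetchRequests.remove(nodeId);
        }
    }

    public static void main(String[] args) {
        PendingFetchGuard guard = new PendingFetchGuard();
        System.out.println(guard.tryBeginFetch(1)); // first fetch to node 1 is allowed
        System.out.println(guard.tryBeginFetch(1)); // skipped: previous one still pending
        guard.onResponse(1, () -> { });
        System.out.println(guard.tryBeginFetch(1)); // allowed again after processing
    }
}
```

Clearing the marker in `finally` means a handler that throws cannot leave a node permanently marked as pending.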

hachikuji · 2019-04-16T15:26:19Z

@rajinisivaram By the way, it seems this bug can also explain https://issues.apache.org/jira/browse/KAFKA-7565?

manderson23 · 2019-04-23T14:31:19Z

Can this be merged into 2.2.x when resolved? Our logs are flooded with these messages.

noslowerdna · 2019-05-17T18:56:21Z

Is warning level more appropriate than info?

@noslowerdna Thanks for the review. I am not sure we want to change the log level from info to warn in all the cases in Fetcher. @cmccabe is more familiar with this code, so will be interested to see what he thinks.

noslowerdna · 2019-05-17T18:56:57Z

Is warning level more appropriate than info?

noslowerdna · 2019-05-17T18:57:19Z

Is warning level more appropriate than info?

noslowerdna · 2019-05-17T18:58:43Z

Is warning level more appropriate than info? Especially since we no longer expect the INVALID_FETCH_SESSION_EPOCH error to commonly occur.

noslowerdna · 2019-05-17T19:05:26Z

The class-level javadoc should thoroughly document the thread-safety of this class. And the sessionPartitions variable should be a ConcurrentHashMap rather than a LinkedHashMap (as KAFKA-7280 shows).

Have included the changes in this PR under the thread safety section. To avoid any risky changes, only changes specific to this issue are included in this PR. For changes to other variables, it will be better to create a separate JIRA describing the issue.

ijuma · 2019-05-19T22:57:23Z

Do we want to get this merged before 2.3.0?

rajinisivaram · 2019-05-20T10:47:36Z

@ijuma @cmccabe @noslowerdna I have rewritten this PR based on the suggestion from @cmccabe. It is a small change specifically to address the race condition in fetch request creation. Since several users run into this frequently, it may be worth including this in 2.3.0 even though it is actually not a blocker.

rajinisivaram · 2019-05-20T10:57:35Z

retest this please

ijuma · 2019-05-20T14:42:44Z

@rajinisivaram To double check, https://issues.apache.org/jira/browse/KAFKA-7565 is not addressed by this change, correct?

noslowerdna · 2019-05-20T17:02:23Z

The rewrite looks good

rajinisivaram · 2019-05-20T18:04:42Z

@ijuma Yes, that is correct, this does not fix the issue in https://issues.apache.org/jira/browse/KAFKA-7565.

hachikuji

Thanks @rajinisivaram. Left one comment.

hachikuji · 2019-05-20T20:17:58Z

        Map<Node, FetchSessionHandler.FetchRequestData> reqs = new LinkedHashMap<>();
        for (Map.Entry<Node, FetchSessionHandler.Builder> entry : fetchable.entrySet()) {
            reqs.put(entry.getKey(), entry.getValue().build());
+            this.nodesWithPendingFetchRequests.add(entry.getKey().id());


I wonder if it would be better to make this call after client.send returns successfully in sendFetches?

Another safety we might add is to remove from nodesWithPendingFetchRequests above if client.isUnavailable returns true?

@hachikuji Thanks for the review. Moved the call after client.send.

I wasn't sure about removing for client.isUnavailable. I think we guarantee that the listener onSuccess or onFailure is invoked in all cases if send succeeds. So that should be sufficient? I was a bit concerned that an additional remove could in theory mean that a listener invoked later from the heartbeat thread could potentially remove the marker for a subsequent send. Don't think it can happen in practice though.

Yeah, I think that should be fine. I just wasn't sure how much I wanted to trust the request completion handling logic. I guess it would break a lot of expectations elsewhere if it were broken though, so probably no additional harm from relying on it here.
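A rough sketch of the ordering discussed in this thread, marking the node as pending only after the send call has returned normally. Everything here is an assumption for illustration: `SendThenMark`, `sendFetch`, `onCompletion` and `isPending` are hypothetical names, the `IntConsumer` callback stands in for client.send, and the use of synchronized methods is a guess at how the set might be guarded, not a statement about the merged code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntConsumer;

// Hypothetical sketch; the transport callback stands in for client.send.
class SendThenMark {
    private final Set<Integer> nodesWithPendingFetchRequests = new HashSet<>();

    /** Sends a fetch unless one is pending; marks the node only if the send returned normally. */
    synchronized boolean sendFetch(int nodeId, IntConsumer transport) {
        if (nodesWithPendingFetchRequests.contains(nodeId)) {
            return false; // previous response not processed yet
        }
        transport.accept(nodeId);                  // if this throws, nothing is marked
        nodesWithPendingFetchRequests.add(nodeId); // reached only after a successful send
        return true;
    }

    /** Called by the completion listener (success or failure) once processing is done. */
    synchronized void onCompletion(int nodeId) {
        nodesWithPendingFetchRequests.remove(nodeId);
    }

    synchronized boolean isPending(int nodeId) {
        return nodesWithPendingFetchRequests.contains(nodeId);
    }

    public static void main(String[] args) {
        SendThenMark fetcher = new SendThenMark();
        try {
            fetcher.sendFetch(1, id -> { throw new IllegalStateException("send failed"); });
        } catch (IllegalStateException e) {
            System.out.println("pending after failed send: " + fetcher.isPending(1));
        }
        System.out.println("retry sent: " + fetcher.sendFetch(1, id -> { }));
        System.out.println("pending after successful send: " + fetcher.isPending(1));
    }
}
```

With this ordering, the only code that removes a marker is the completion listener, which matches the concern above about an extra remove clearing the marker of a later send.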

hachikuji · 2019-05-20T20:46:29Z

Here is why I thought it might be the cause of KAFKA-7565. The bug as I understand it is that we are sending requests using stale session information. Basically the sequence is like this:

1) send request epoch=n
2) receive response, but don't handle it
3) send new request with epoch=n

Is that right? I was thinking that prior to step 3), we might add a new partition to the fetch session. Then we might hit the KAFKA-7565 case when we eventually handled the response from 2).
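The sequence above can be replayed with a toy, single-threaded model. This is a deliberately simplified sketch: `StaleEpochDemo` and its fields are hypothetical, the starting epoch value is arbitrary, and the real session handling (including how the client recovers after the error) is not modeled.

```java
// Toy model of a client-side session epoch and the broker-side check.
// All names are hypothetical; this is not Kafka's FetchSessionHandler.
class StaleEpochDemo {
    int clientEpoch = 1;         // epoch the client puts in its next request
    int brokerExpectedEpoch = 1; // epoch the broker expects to see next

    /** Sends a request with the current client epoch; returns whether the broker accepts it. */
    boolean send() {
        int requestEpoch = clientEpoch;
        if (requestEpoch != brokerExpectedEpoch) {
            return false; // corresponds to INVALID_FETCH_SESSION_EPOCH
        }
        brokerExpectedEpoch++;
        return true;
    }

    /** Client-side response handling; only this step advances the client epoch. */
    void handleResponse() {
        clientEpoch++;
    }

    public static void main(String[] args) {
        StaleEpochDemo demo = new StaleEpochDemo();
        boolean first = demo.send();  // 1) send request epoch=n
                                      // 2) response received, but not handled yet
        boolean second = demo.send(); // 3) new request reuses epoch=n and is rejected
        demo.handleResponse();        // the first response is finally handled
        boolean third = demo.send();  // epochs line up again
        System.out.println(first + " " + second + " " + third); // prints "true false true"
    }
}
```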

rajinisivaram · 2019-05-21T09:53:47Z

@hachikuji Yes that sequence is the issue in KAFKA-7565. So this should prevent that happening. Thank you!

noslowerdna · 2019-05-21T16:28:27Z

    private final AtomicReference<RuntimeException> cachedListOffsetsException = new AtomicReference<>();
    private final AtomicReference<RuntimeException> cachedOffsetForLeaderException = new AtomicReference<>();
    private final OffsetsForLeaderEpochClient offsetsForLeaderEpochClient;
+    private final Set<Integer> nodesWithPendingFetchRequests;


One idea that I had was to make this a Map<Integer, Long>, with the value being System.currentTimeMillis() at the time the fetch request is sent.

That would allow the "Skipping fetch for partition" log message to include the duration that the previous request has been pending for (possibly adjusting the log level based on how long ago that previous request was sent), and also enable a fetch request time metric to be easily collected if someone wishes to add that enhancement in the future.

FYI, there is already a metric for fetch request latency.
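For illustration only, the `Map<Integer, Long>` idea from this thread might look something like the sketch below. It was a suggestion in review, not necessarily what was merged; `PendingFetchTimes` and its methods are hypothetical names, and `ConcurrentHashMap` is an assumed choice for thread safety.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of tracking when each pending fetch was sent.
class PendingFetchTimes {
    // Node id -> wall-clock time (ms) at which the pending fetch was sent.
    private final Map<Integer, Long> sentAtMs = new ConcurrentHashMap<>();

    /** Records the send time; returns false if a request to the node is already pending. */
    boolean markSent(int nodeId, long nowMs) {
        return sentAtMs.putIfAbsent(nodeId, nowMs) == null;
    }

    /** Milliseconds the previous request has been pending, or -1 if none is pending. */
    long pendingForMs(int nodeId, long nowMs) {
        Long sentAt = sentAtMs.get(nodeId);
        return sentAt == null ? -1L : nowMs - sentAt;
    }

    /** Clears the entry once the response has been fully processed. */
    void complete(int nodeId) {
        sentAtMs.remove(nodeId);
    }

    public static void main(String[] args) {
        PendingFetchTimes times = new PendingFetchTimes();
        long now = System.currentTimeMillis();
        times.markSent(1, now);
        if (!times.markSent(1, now + 250)) {
            System.out.println("Skipping fetch; previous request to node 1 pending for "
                    + times.pendingForMs(1, now + 250) + " ms");
        }
    }
}
```

The pending duration is what would let the "Skipping fetch for partition" message report how long the previous request has been outstanding.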

hachikuji

Thanks @rajinisivaram . LGTM

hachikuji · 2019-05-21T19:44:29Z

    }

+    @Test
+    public void testFetcherSessionEpochUpdate() throws Exception {


I guess there was no choice but to carefully tailor this test in order to hit the bug. We have to do it, but the downside is that its scope is narrow and it may be difficult to keep relevant as the code evolves. Anyway, hopefully at some point we'll get the time to move all network IO to the background thread and then we can simplify a lot of this.

rajinisivaram · 2019-05-21T20:42:51Z

@hachikuji @cmccabe @noslowerdna @jsancio Thanks for the reviews. Merging to trunk and 2.3.

…6582) Reviewers: Jason Gustafson <jason@confluent.io>, Colin Patrick McCabe <cmccabe@confluent.io>, Andrew Olson <aolson1@cerner.com>, José Armando García Sancio <jsancio@users.noreply.github.com>

manderson23 · 2019-05-21T21:49:15Z

Would this ever be merged to a 2.2.x release?

* apache-github/trunk:
  MINOR: Set `replicaId` for OffsetsForLeaderEpoch from followers (apache#6775)
  MINOR: A few logging improvements in the broker (apache#6773)
  KAFKA-8052; Ensure fetch session epoch is updated before new request (apache#6582)
  KAFKA-8315: fix the JoinWindows retention deprecation doc (apache#6664)
  KAFKA-8265: Fix override config name to match KIP-458. (apache#6776)
  KAFKA-3143: Controller should transition offline replicas on startup
  MINOR: Work around OpenJDK 11 javadocs issue. (apache#6747)
  MINOR: Bump version to 2.4.0-SNAPSHOT (apache#6774)

cmccabe · 2019-05-22T17:15:06Z

Thanks, @rajinisivaram .

cmccabe · 2019-05-22T17:16:12Z

Would this ever be merged to a 2.2.x release?

If there is ever a 2.2.2 release, I don't see any reason why we couldn't merge this.

…pache#6582) Reviewers: Jason Gustafson <jason@confluent.io>, Colin Patrick McCabe <cmccabe@confluent.io>, Andrew Olson <aolson1@cerner.com>, José Armando García Sancio <jsancio@users.noreply.github.com>

…updated before new request (apache#6582) TICKET = KAFKA-8052 LI_DESCRIPTION = This will remove intermittent INVALID_FETCH_SESSION_EPOCH errors on fetch requests EXIT_CRITERIA = HASH [012880d] ORIGINAL_DESCRIPTION = Reviewers: Jason Gustafson <jason@confluent.io>, Colin Patrick McCabe <cmccabe@confluent.io>, Andrew Olson <aolson1@cerner.com>, José Armando García Sancio <jsancio@users.noreply.github.com (cherry picked from commit 012880d)

…updated before new request (apache#6582) (#31) TICKET = KAFKA-8052 LI_DESCRIPTION = This will remove intermittent INVALID_FETCH_SESSION_EPOCH errors on fetch requests EXIT_CRITERIA = HASH [012880d] ORIGINAL_DESCRIPTION = Reviewers: Jason Gustafson <jason@confluent.io>, Colin Patrick McCabe <cmccabe@confluent.io>, Andrew Olson <aolson1@cerner.com>, José Armando García Sancio <jsancio@users.noreply.github.com (cherry picked from commit 012880d)

rajinisivaram requested a review from hachikuji April 14, 2019 10:36

rajinisivaram force-pushed the KAFKA-8052-fetch-epoch branch from 4c2116e to 8b3666b Compare April 14, 2019 10:49

jsancio reviewed Apr 16, 2019


Comment thread clients/src/main/java/org/apache/kafka/clients/FetchSessionHandler.java Outdated

noslowerdna reviewed May 17, 2019


Comment thread clients/src/main/java/org/apache/kafka/clients/FetchSessionHandler.java Outdated

noslowerdna reviewed May 17, 2019


KAFKA-8052; Ensure fetch session epoch is updated before new request

0e05a90

rajinisivaram force-pushed the KAFKA-8052-fetch-epoch branch from 8b3666b to 0e05a90 Compare May 20, 2019 10:44

ijuma requested a review from cmccabe May 20, 2019 14:38

noslowerdna approved these changes May 20, 2019


hachikuji reviewed May 20, 2019


rajinisivaram force-pushed the KAFKA-8052-fetch-epoch branch from a1dbe3f to 58c05a4 Compare May 21, 2019 09:35

Address review comment

f6690a7

rajinisivaram force-pushed the KAFKA-8052-fetch-epoch branch from 58c05a4 to f6690a7 Compare May 21, 2019 09:50

noslowerdna reviewed May 21, 2019


hachikuji approved these changes May 21, 2019


rajinisivaram merged commit 012880d into apache:trunk May 21, 2019

jsvd mentioned this pull request Aug 1, 2019

INVALID_FETCH_SESSION_EPOCH - Sending LeaveGroup request to coordinator logstash-plugins/logstash-input-kafka#323

Open

alinazemian mentioned this pull request Aug 19, 2021

Kafka messages are being lost on INVALID_FETCH_SESSION_EPOCH spring-attic/spring-cloud-stream-binder-kafka#1118

Closed
