Skip to content

MINOR: Proactively update producer topic access time.#7672

Merged
rajinisivaram merged 2 commits into
apache:trunkfrom
bdbyrne:KAFKA-9164
Dec 3, 2019
Merged

MINOR: Proactively update producer topic access time.#7672
rajinisivaram merged 2 commits into
apache:trunkfrom
bdbyrne:KAFKA-9164

Conversation

@bdbyrne

@bdbyrne bdbyrne commented Nov 8, 2019

Copy link
Copy Markdown
Contributor

Changes the ProducerMetadata to longer record a sentinel TOPIC_EXPIRY_NEEDS_UPDATE upon topic map emplacement, and instead set the expiry time directly. Previously the expiry time was being updated for all touched topics after a metadata fetch was processed, which could be seconds/minutes in the future.

Additionally propagates the current time further in the Producer, which should reduce the total number of current-time calls.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's safe to reuse nowMs here since we may have blocked on free.allocate.

@bdbyrne bdbyrne Nov 12, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Common case should be that there's already an in-progress batch above, so shouldn't be a big deal to always call time.milliseconds() here even if there's no memory pressure.

@ijuma ijuma left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. This looks good generally, nice to see some unnecessary milliseconds() calls being removed. As noted in a separate comment, I think we have to get a fresh time after allocation since that can block if the buffer is full. @rajinisivaram, since you implemented the metadata expiry a while back, do the changes here look good to you?

@rajinisivaram rajinisivaram left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdbyrne Thanks for the PR. As @ijuma has said, we need to update time after the potentially blocking buffer allocation, but the rest of the changes look good.

We currently expire topics that are not being actively used. Topics are expired 5 minutes after the metadata response following the last time the topic was used. With the changes in this PR, we continue to expire unused topics, but with slight difference that we are expiring 5 minutes after the last use. I think the main reason for the sentinel value was to avoid getting time just to set expiry, while with your changes since the value is available anyway, it makes sense to use it. A slight difference now is that we rely on successful metadata response to be received within the hard-coded 5 minutes. Just wanted to confirm that my understanding is correct and we are expecting this to be a reasonable limit.

@rajinisivaram

Copy link
Copy Markdown
Contributor

@bdbyrne Thanks for the PR, LGTM.

Can we clarify the relationship between this PR and the issue described in https://issues.apache.org/jira/browse/KAFKA-9164? The JIRA says:

The producer's metadata currently marks a topic as "not needing to be retained" if it has been 5 minutes since it was first considered, regardless of whether records were being actively produced for the topic. This shouldn't happen and should be fixed.

We were updating the access time for the topic whenever a record was sent. And that seems to be the case even with this PR. So it is not obvious to me what the issue was and how this PR addresses the issue.

@bdbyrne bdbyrne changed the title KAFKA-9164: Improve producer topic expiry logic to account for last a… MINOR: Proactively update producer topic access time. Dec 2, 2019
@bdbyrne

bdbyrne commented Dec 2, 2019

Copy link
Copy Markdown
Contributor Author

We were updating the access time for the topic whenever a record was sent. And that seems to be the case even with this PR. So it is not obvious to me what the issue was and how this PR addresses the issue.

You're correct, I was mistaken. I've closed the issue and updated the description of this PR.

@rajinisivaram

Copy link
Copy Markdown
Contributor

retest this please

@rajinisivaram rajinisivaram left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdbyrne Thanks for the PR, LGTM. Rerunning tests since the results from the previous run are not available for checking, will merge after that completes.

@rajinisivaram

Copy link
Copy Markdown
Contributor

Test failure not related, SaslScramSslEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl is a known flaky test with a PR open to address it. Merging to trunk.

@rajinisivaram rajinisivaram merged commit 38fde81 into apache:trunk Dec 3, 2019
@bdbyrne bdbyrne deleted the KAFKA-9164 branch December 3, 2019 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants