* Added delays and waits at certain steps to ensure the Kafka configuration is running properly before starting the test.
* Add connection retries for the Kafka producer.

Assisted by Cursor.
Force-pushed from b7f8b79 to 722b13f.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Looks like it still fails:
I'm taking this back to draft and will investigate.
@faec, is it a requirement to have 3 partitions for Kafka? I cannot find any test that requires this:
I'm going to run the tests on this PR multiple times to make sure the change actually fixes the flaky behavior. Once that is confirmed, the PR can be merged.
Failed again with:
I'll investigate further, but the easiest fix would be to switch to a single partition instead of 3.
Create the test topic explicitly and wait for partition leaders before producing messages, preventing transient CI failures with “no leader for this partition” during topic auto-creation.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor CLI, Model: GPT-5.3 Codex
/test
Looks like 3 consecutive CI runs passed. Let's merge it and observe.
Yes, I share the confidence! Also, the test is already flaky: in the best case the flakiness is fixed; in the worst case it stays flaky and we learn that this was not the root cause. So merging it is a win-win.
@Mergifyio backport 8.19 9.2 9.3
✅ Backports have been created
* Added delays and waits at certain steps to ensure the Kafka configuration is running properly before starting the test.
* Add connection retries for the Kafka producer

Assisted by Cursor.

* Try to disable leader election
* Replace the attempts loop with the eventually call
* Make the helper more debuggable
* filebeat/input/kafka: stabilize TestSASLAuthentication topic setup

Create the test topic explicitly and wait for partition leaders before producing messages, preventing transient CI failures with “no leader for this partition” during topic auto-creation.

GenAI-Assisted: Yes
Human-Reviewed: Yes
Tool: Cursor CLI, Model: GPT-5.3 Codex

---------

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
(cherry picked from commit 0132a25)
* Added delays and waits at certain steps to ensure the Kafka configuration is running properly before starting the test.
* Add connection retries for the Kafka producer
* Create the test topic explicitly and wait for partition leaders before producing messages, preventing transient CI failures with “no leader for this partition” during topic auto-creation.

Assisted by Cursor.

---------

(cherry picked from commit 0132a25)
Co-authored-by: Denis <denis.rechkunov@elastic.co>
Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
It failed again in a backport PR to 9.2, even after the fix was merged.
Stack trace:
I'll create a flaky test issue for this.
@belimawr, thanks for creating the issue and investigating! It might be an environmental problem too. Perhaps some Buildkite runners are set up differently.
I looked into it further (well, I mostly delegated the analysis to AI). Even though it is the same error, the failure is in a different test, with the same root cause. We (the AI and I) will fix it in all tests for this input.
Proposed commit message
Assisted by Cursor.
The Kafka tests are flaky and sometimes fail with:
For example, in this build https://buildkite.com/elastic/filebeat/builds/28161#019c482f-481b-40db-85e6-34da7056207f
Relates to #48026