tests: fix flakiness of integration tests due to overly weak log checks #87719
Conversation
… way) CI found failure again [1]:

[1]: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=87584&sha=97e95fa30f48eb6414c18ffc59e2c70c38a437d1&name_0=PR&name_1=Integration%20tests%20%28amd_binary%2C%204%2F5%29

And after adding ClickHouse#86030, I can see that the logs contain rows starting from:

```
2025.09.26 16:33:49.268044 [ 965 ] {} <Trace> test_database.postgresql_replica_4 (938f7e82-5e0e-4f88-ad88-a5ac82b289c3): Trying to reserve 1.00 MiB using storage policy from min volume index 0
```

While the haystack was slightly earlier:

```
2025.09.26 16:33:49.267221 [ 965 ] {BgSchPool::2fe592b6-5cb1-45f2-becc-27ceea4a4e98} <Warning> PostgreSQLReplicaConsumer(postgres_database): Table postgresql_replica_1 is skipped from replication stream because its structure has changes. Please detach this table and reattach to resume the replication (relation id: 16619)
```

This is a typical problem with test logging, so let's simply increase `look_behind_lines`, not in this one test but globally, to avoid any further flakiness. 100 vs. 10000 lines is not a big deal for `grep -F`.
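The kind of check involved can be sketched as follows. This is a minimal illustration, not the actual ClickHouse integration-test helper; the function name and the `look_behind_lines` default below are assumptions for the sketch:

```python
# Minimal sketch of a fixed-string log check with a look-behind window.
# `log_contains` is an illustrative name, not the real test helper API.
from collections import deque

def log_contains(log_path, needle, look_behind_lines=10000):
    """Return True if `needle` occurs, as a fixed substring (like
    `grep -F`), within the last `look_behind_lines` lines of the log."""
    with open(log_path, errors="replace") as f:
        # deque with maxlen keeps only the last N lines of the file
        tail = deque(f, maxlen=look_behind_lines)
    return any(needle in line for line in tail)
```

With a window of 100 lines, a burst of `Trace`-level messages like the one above can push the expected `Warning` line out of the window before the check runs; a 10000-line window keeps it visible, and a fixed-substring scan over 10000 lines is still cheap.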
Workflow [PR], commit [6f64561] Summary: ❌
Force-pushed from 474b31a to 0db69a7
Is flaky as well.
The Kafka tests are very poor: Kafka2 produces "Stalled" multiple times, and the logs are unreliable.
@antaljanosbenjamin FYI, I have to fix all of the kafka tests (at least in …
In flaky checks there might be repeated log lines, which can cause issues in some tests, I think; that's why I was very conservative with the log lines. But I trust you and the CI.
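The concern about repeated lines can be illustrated: a test that counts occurrences of a message inside the look-behind window may pick up stale duplicates left over from a previous flaky-check iteration once the window grows. A hypothetical sketch, not the real test code:

```python
# Hypothetical illustration: a larger look-behind window can count
# stale log lines from a previous flaky-check iteration.
def count_matches(lines, needle, look_behind_lines):
    window = lines[-look_behind_lines:]
    return sum(needle in line for line in window)

log = ["iteration 1: table attached"] + ["noise"] * 50 + \
      ["iteration 2: table attached"]
# A small window sees only the current iteration's line:
assert count_matches(log, "table attached", 10) == 1
# A large window also counts the stale line from iteration 1:
assert count_matches(log, "table attached", 10000) == 2
```

This is why a conservative window is safer for count-based checks, while pure existence checks (like `grep -F`) tolerate a much larger window.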
The flaky check will not pass in this PR because I've touched too many kafka tests, and it is not able to run all changed tests within the timeout.
CI:
…dentifier generated

In one of the CI runs the generated name consisted entirely of digits [1], which is not a valid identifier, so the test failed.

[1]: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=87888&sha=edc7f7ddbe1ac2910244def7f77cf1858a9e5af7&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20old%20analyzer%2C%202%2F6%29

I've looked through all other usages of `gg choice.*string.digits` and this is mostly the only one (except for kafka, which I will fix in ClickHouse#87719)
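The failure mode is easy to reproduce: sampling every character from `string.ascii_lowercase + string.digits` occasionally yields a name made entirely of digits, which is not a valid unquoted identifier. A common fix, sketched below (this is an illustration, not necessarily the exact change in the PR), is to force the first character to be a letter:

```python
import random
import string

def random_identifier(length=8):
    # The first character must be a letter so the generated name is
    # always a valid unquoted identifier; the rest may be letters
    # or digits.
    first = random.choice(string.ascii_lowercase)
    rest = random.choices(string.ascii_lowercase + string.digits,
                          k=length - 1)
    return first + "".join(rest)
```

With this scheme the all-digits case is impossible by construction, instead of merely unlikely.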
In the flaky check, previous iterations may take more than 200 seconds, and in that case the DNS cache will be updated in the middle…
23bfe37
But after this, the tests for kafka failed... And I decided to fix them as well.
Changelog category (leave one):
Fixes: #86185