Fix rabbitmq test by starting RabbitMQ from scratch every test by pamarcos · Pull Request #78186 · ClickHouse/ClickHouse

pamarcos · 2025-03-24T17:16:35Z

Use rabbitmqctl to stop and start instead of killing the docker instance

Closes #71049

Changelog category (leave one):

CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)
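The description's approach is to restart RabbitMQ with `rabbitmqctl` instead of killing the Docker container. The sketch below shows roughly what that could look like, assuming the test can shell out to `docker exec` and that the container is called `rabbitmq1`. Both the container name and the helper names are hypothetical and are not the actual ClickHouse integration-test helpers. `rabbitmqctl stop_app` and `start_app` are real commands: they stop and start the RabbitMQ application while the Erlang node, and therefore the container, keep running.

```python
import subprocess
from typing import List

# Hypothetical container name; the real one depends on the test cluster setup.
RABBITMQ_CONTAINER = "rabbitmq1"


def rabbitmqctl_cmd(container: str, *args: str) -> List[str]:
    """Build a `docker exec` command line that runs rabbitmqctl inside `container`."""
    return ["docker", "exec", container, "rabbitmqctl", *args]


def restart_rabbitmq_app(container: str = RABBITMQ_CONTAINER, dry_run: bool = False) -> List[List[str]]:
    """Stop and start the RabbitMQ application without killing the container.

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = [
        rabbitmqctl_cmd(container, "stop_app"),   # stop the RabbitMQ app; the Erlang node keeps running
        rabbitmqctl_cmd(container, "start_app"),  # start the app again on the same node
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a command exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in restart_rabbitmq_app(dry_run=True):
        print(" ".join(cmd))
```

The container itself never stops, so its Docker networking stays in place across the restart.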

clickhouse-gh · 2025-03-24T17:17:03Z

Workflow [PR], commit [00ffc43]

nikitamikhaylov · 2025-03-25T00:06:21Z

Almost! Except test_storage_rabbitmq/test_failed_connection.py::test_rabbitmq_restore_failed_connection_without_losses_2 is failing.

pamarcos · 2025-03-25T12:30:17Z

> Almost! Except test_storage_rabbitmq/test_failed_connection.py::test_rabbitmq_restore_failed_connection_without_losses_2 is failing.

Yep, curious how the flaky check passed, but the single test execution failed 😏.
I've run all the tests thousands of times on my local dev machine without issues, for what it's worth.

At least the test where it failed shows something quite clear and not an obscure error due to some weird RabbitMQ server thing. I'll keep investigating 🧐

nikitamikhaylov · 2025-03-25T22:18:37Z

I found something interesting in logs:

2025-03-24 19:54:32.495582+00:00 [warning] <0.1422.0> memory resource limit alarm set on node rabbit@rabbitmq1.
2025-03-24 19:54:32.495582+00:00 [warning] <0.1422.0>
2025-03-24 19:54:32.495582+00:00 [warning] <0.1422.0> **********************************************************
2025-03-24 19:54:32.495582+00:00 [warning] <0.1422.0> *** Publishers will be blocked until this alarm clears ***
2025-03-24 19:54:32.495582+00:00 [warning] <0.1422.0> **********************************************************
2025-03-24 19:54:32.495582+00:00 [warning] <0.1422.0>

2025-03-24 19:54:48.436350+00:00 [info] <0.2194.0> vm_memory_high_watermark clear. Memory used:503021568 allowed:4000000000
2025-03-24 19:54:48.436560+00:00 [warning] <0.2192.0> memory resource limit alarm cleared on node rabbit@rabbitmq1
2025-03-24 19:54:48.436630+00:00 [warning] <0.2192.0> memory resource limit alarm cleared across the cluster
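To see how much headroom RabbitMQ had, you can pull the `Memory used:<n> allowed:<n>` figures out of the log. The helper below is hypothetical. It assumes the line format quoted above, and it also assumes that the `set` variant of the message uses the same format.

```python
import re
from typing import Optional, Tuple

# Matches lines like the one quoted above:
# "... vm_memory_high_watermark clear. Memory used:503021568 allowed:4000000000"
WATERMARK_RE = re.compile(r"vm_memory_high_watermark (set|clear)\. Memory used:(\d+) allowed:(\d+)")


def parse_watermark(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (state, used_bytes, allowed_bytes), or None if the line doesn't match."""
    m = WATERMARK_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def usage_ratio(used: int, allowed: int) -> float:
    """Fraction of the memory high watermark currently in use."""
    return used / allowed
```

For the `clear` line above, usage is about 12.6% of the 4000000000-byte limit.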

And Rabbit was also doing nothing for 2 minutes:

2025-03-24 19:55:42.488308+00:00 [debug] <0.2556.0> Will stop virtual host process reconciliation after 12 runs
2025-03-24 19:57:44.332872+00:00 [debug] <0.2622.0> Consistent hashing exchange: removing binding from exchange exchange 'consumer_reconnect_test_consumer_reconnect' in vhost '/' to destination queue '1_test_consumer_reconnect' in vhost '/' with routing key '1'
2025-03-24 19:57:44.333978+00:00 [warning] <0.2436.0> closing AMQP connection <0.2436.0> (172.16.1.5:50766 -> 172.16.1.2:5672, vhost: '/', user: 'root', duration: '2M, 57s'):
2025-03-24 19:57:44.333978+00:00 [warning] <0.2436.0> client unexpectedly closed TCP connection

And these two minutes were exactly the time during which we were trying to read the messages from Rabbit, before giving up at the end:

2025-03-24 19:57:43 [ 672 ] DEBUG : Result: 148591 / 150000 (test_failed_connection.py:252, test_rabbitmq_restore_failed_connection_without_losses_2)
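The check appears to poll until the expected count arrives or a deadline passes. In this run, 150000 - 148591 = 1409 messages never arrived in time. The sketch below shows such a loop. The name `wait_for_count` and its parameters are illustrative only, not the helper actually used in test_failed_connection.py.

```python
import time
from typing import Callable


def wait_for_count(get_count: Callable[[], int], expected: int,
                   timeout_s: float, poll_interval_s: float = 1.0) -> int:
    """Poll `get_count` until it reaches `expected` or `timeout_s` elapses.

    Returns the last observed count, so the caller can log e.g. "148591 / 150000".
    """
    deadline = time.monotonic() + timeout_s
    count = get_count()
    while count < expected and time.monotonic() < deadline:
        time.sleep(poll_interval_s)
        count = get_count()
    return count
```

Returning the last count instead of raising lets the caller log the partial result before asserting.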

nikitamikhaylov · 2025-03-25T22:29:15Z

Also:

rabbitmq1-1  | 2025-03-24 19:53:53.942113+00:00 [info] <0.291.0> Memory high watermark set to 3814 MiB (4000000000 bytes) of 63258 MiB (66330923008 bytes) total
rabbitmq1-1  | 2025-03-24 19:53:53.944444+00:00 [info] <0.293.0> Enabling free disk space monitoring (disk free space: 108880531456, total memory: 66330923008)
rabbitmq1-1  | 2025-03-24 19:53:53.944541+00:00 [info] <0.293.0> Disk free limit set to 50MB

Do we really have 64GB of RAM in the RabbitMQ container? Let's use more of it, then.
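The figures in these log lines can be cross-checked. RabbitMQ prints MiB (2^20 bytes), so the 4000000000-byte watermark is 3814 MiB, about 6% of the 63258 MiB total it detected. The sketch below only reproduces that arithmetic from the logged values. It says nothing about how the test configuration actually sets the limit.

```python
MIB = 2 ** 20  # the log reports these figures in MiB


def bytes_to_mib(n: int) -> int:
    """Whole MiB, truncated (consistent with the log showing 3814, not 3815)."""
    return n // MIB


watermark_bytes = 4_000_000_000   # from the log: 4000000000 bytes
total_bytes = 66_330_923_008      # from the log: total memory RabbitMQ detected

print(bytes_to_mib(watermark_bytes))           # 3814
print(bytes_to_mib(total_bytes))               # 63258
print(f"{watermark_bytes / total_bytes:.1%}")  # 6.0%
```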


pamarcos · 2025-03-26T08:35:38Z

> Do we really have 64GB of RAM in the RabbitMQ container?

Well, not exactly. My understanding is that we set 64GB for the outer Docker runner that orchestrates everything. Then we run the tests, along with the rest of the Docker containers such as RabbitMQ (DoD, or Docker on Docker), within those limits.

Thanks @nikitamikhaylov 🙏
I already increased the memory used by RabbitMQ from 2GB to 4GB in the prior PR. I was checking what could have changed, because there is a clear increase in how often this test fails. Before, it didn't fail that much even with 2GB 🤔

Anyhoo, let's merge this and I'll keep monitoring it

pamarcos · 2025-03-26T16:12:56Z

Still happening for test_storage_rabbitmq/test_failed_connection.py::test_rabbitmq_restore_failed_connection_without_losses_2 😭

https://s3.amazonaws.com/clickhouse-test-reports/REFs/master/8e682a936336fd64055217548b60fcfacbac588e//integration_tests_release_3_4/integration_run_test_storage_rabbitmq_test_failed_connection_py_0.log

clickhouse-gh bot added the pr-ci label Mar 24, 2025

pamarcos changed the title ~~Fix rabbitmq test by starting from scratch every test~~ Fix rabbitmq test by starting RabbitMQ from scratch every test Mar 25, 2025

nikitamikhaylov self-assigned this Mar 25, 2025

nikitamikhaylov approved these changes Mar 25, 2025


pamarcos and others added 2 commits March 26, 2025 01:56

Fix rabbitmq test by starting from scratch every test

1238e6c

Use rabbitmqctl to stop and start instead of killing the docker instance

More juice

00ffc43

nikitamikhaylov force-pushed the fix-rabbitmq-test-once-again branch from 68d2057 to 00ffc43 March 26, 2025 00:56

pamarcos marked this pull request as ready for review March 26, 2025 08:24

pamarcos enabled auto-merge March 26, 2025 08:38

pamarcos added this pull request to the merge queue Mar 26, 2025

Merged via the queue into master with commit 7cd5024 Mar 26, 2025
119 checks passed

pamarcos deleted the fix-rabbitmq-test-once-again branch March 26, 2025 08:47

robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 26, 2025

pamarcos merged 2 commits into master from fix-rabbitmq-test-once-again