Bug #70756
opentest_bn FAIL: test persistent topic configurations with max_retries and retry_sleep_duration
0%
Description
2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr:====================================================================== 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr:FAIL: test persistent topic configurations with max_retries and retry_sleep_duration 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr:---------------------------------------------------------------------- 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr:Traceback (most recent call last): 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/lib64/python3.9/site-packages/nose/case.py", line 198, in runTest 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: self.test(*self.arg) 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py", line 3476, in test_ps_s3_persistent_topic_configs_max_retries 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: ps_s3_persistent_topic_configs(persistency_time, config_dict) 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py", line 3440, in ps_s3_persistent_topic_configs 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: get_stats_persistent_topic(topic_name, number_of_objects) 2025-03-26T03:05:39.870 INFO:teuthology.orchestra.run.smithi017.stderr: File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py", line 557, in get_stats_persistent_topic 2025-03-26T03:05:39.871 INFO:teuthology.orchestra.run.smithi017.stderr: assert_equal(parsed_result['Topic Stats']['Entries'], assert_entries_number) 2025-03-26T03:05:39.871 INFO:teuthology.orchestra.run.smithi017.stderr:AssertionError: 14 != 10 2025-03-26T03:05:39.871 INFO:teuthology.orchestra.run.smithi017.stderr:-------------------- >> begin captured logging << --------------------
Updated by Casey Bodley 12 months ago
- Subject changed from Intermittent test failure in test_bn to test_bn FAIL: test persistent topic configurations with max_retries and retry_sleep_duration
Updated by Yuval Lifshitz 12 months ago
if the actual number was smaller than the expected number, then it would have been a test timing issue (some notifications already expired by the time we get the topic stats), then could be resolved by adding a safety time buffer.
but in the case above we have more notifications in the queue (14) then we expected (10).
need to verify this is not a product issue.
Updated by Yuval Lifshitz 12 months ago
does not seem like a test issue. or an issue with the stats counters.
in the log it looks like there are 15 expiry events (even though there are only 10 objects uploaded):
1039504:2025-03-26T02:56:07.110+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/24576 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-0', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:27.990896+0000 time_now: 2025-03-26T02:56:07.110506+0000 1039536:2025-03-26T02:56:07.114+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/25145 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-1', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.038895+0000 time_now: 2025-03-26T02:56:07.114505+0000 1039636:2025-03-26T02:56:07.122+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/25714 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-2', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.059895+0000 time_now: 2025-03-26T02:56:07.122505+0000 1039638:2025-03-26T02:56:07.123+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/26283 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-3', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.078895+0000 time_now: 2025-03-26T02:56:07.123505+0000 1039640:2025-03-26T02:56:07.123+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/26853 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-4', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.086895+0000 time_now: 2025-03-26T02:56:07.123505+0000 1039642:2025-03-26T02:56:07.123+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/27423 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-5', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.087895+0000 time_now: 2025-03-26T02:56:07.123505+0000 1039599:2025-03-26T02:56:07.119+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/28563 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-6', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.097895+0000 time_now: 2025-03-26T02:56:07.119505+0000 1341454:2025-03-26T02:56:17.138+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/28563 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-6', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.097895+0000 time_now: 2025-03-26T02:56:17.138314+0000 1039597:2025-03-26T02:56:07.119+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/27993 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-7', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.097895+0000 time_now: 2025-03-26T02:56:07.119505+0000 1341414:2025-03-26T02:56:17.135+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/27993 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-7', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.097895+0000 time_now: 2025-03-26T02:56:17.135314+0000 1039658:2025-03-26T02:56:07.124+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/29701 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-8', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.099895+0000 time_now: 2025-03-26T02:56:07.124505+0000 1341498:2025-03-26T02:56:17.141+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/29701 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-8', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.099895+0000 time_now: 2025-03-26T02:56:17.141314+0000 1617673:2025-03-26T02:56:27.157+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/29701 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-8', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.099895+0000 time_now: 2025-03-26T02:56:27.157121+0000 1039656:2025-03-26T02:56:07.124+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/29132 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-9', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.097895+0000 time_now: 2025-03-26T02:56:07.124505+0000 1341528:2025-03-26T02:56:17.144+0000 7fdaeadfd640 1 rgw notify: WARNING: Expiring entry marker: 0/29132 for event with notification id: 'hjrmli-17_notif', topic: 'hjrmli-17_topic', endpoint: '172.21.15.17:10924', endpoint_user: '', bucket_owner: 'foo.client.0', bucket: 'hjrmli-17', object: 'key-9', event type: 'ObjectCreated:Put' entry retry_number: 11 creation_time: 2025-03-26T02:55:28.097895+0000 time_now: 2025-03-26T02:56:17.144314+0000
several keys with the same creation time (e.g. key-9) expire multiple times.
only 10 keys are pushed into the topic:
34376:2025-03-26T02:55:27.889+0000 7f730491c640 20 req 2129677200105759270 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 34462:2025-03-26T02:55:27.916+0000 7f731493c640 20 req 6770237115898223031 0.000999990s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 34548:2025-03-26T02:55:27.957+0000 7f7376a00640 20 req 2735929488797507160 0.000999990s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 34658:2025-03-26T02:55:28.002+0000 7f72f38fa640 20 req 13134334590969294498 0.000999989s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 34744:2025-03-26T02:55:28.009+0000 7f73c2297640 20 req 15366714260770714089 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 34830:2025-03-26T02:55:28.015+0000 7f738fa32640 20 req 16179371182527194526 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 34916:2025-03-26T02:55:28.022+0000 7f73dd2cd640 20 req 5709256455311900473 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 35008:2025-03-26T02:55:28.026+0000 7f73539ba640 20 req 10183921372684822482 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 35094:2025-03-26T02:55:28.029+0000 7f73c1295640 20 req 8488431303204348171 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put' 35198:2025-03-26T02:55:28.041+0000 7f7301115640 20 req 9991337586528839934 0.000000000s INFO: notification: 'hjrmli-17_notif' on topic: 'hjrmli-17_topic' and bucket: 'hjrmli-17' (unique topic: 'hjrmli-17_topic') apply to event of type: 's3:ObjectCreated:Put'
Updated by J. Eric Ivancich 11 months ago
- Backport changed from reef squid to reef squid tentacle
Updated by Yuval Lifshitz 11 months ago
- Assignee changed from Yuval Lifshitz to Ali Masarwa
Updated by Ali Masarwa 11 months ago
it seems like we hit this issue because of slowness in RGW, if RGW takes some time to respond to aws, aws will retry sending the op, which creates duplicate entries in the queue.
I added a debug to check that and we can see duplicate IDs:
commit succeded for event_id 1746970321.178290.020997dcc7a6dc8f262578753c22cce1 creation_time2025-05-11T13:32:01.178038+0000
commit succeded for event_id 1746970321.188276.1ee6a13b023a7edde92a32986b4f64c8 creation_time2025-05-11T13:32:01.198038+0000
commit succeded for event_id 1746970321.192401.db27473a10a64a8b91d5a6f14d8089df creation_time2025-05-11T13:32:01.198038+0000
commit succeded for event_id 1746970321.198977.c297386b418d700f61e2ee2b96e3498f creation_time2025-05-11T13:32:01.199038+0000
commit succeded for event_id 1746970321.199930.a1223f74109fed7a4a9537fbffb15a81 creation_time2025-05-11T13:32:01.200038+0000
commit succeded for event_id 1746970321.200271.87cfc58d20b18108a09918b9a863731a creation_time2025-05-11T13:32:01.200038+0000
commit succeded for event_id 1746970361.505862.52672127b35bffaa80dd87cc0f7c2937 creation_time2025-05-11T13:32:41.505010+0000
commit succeded for event_id 1746970361.508808.324d3f3a6fc82aeee7bc2e1d379fde01 creation_time2025-05-11T13:32:41.508010+0000
commit succeded for event_id 1746970361.510251.b011a5d38e698e4f15388ad1e98bf485 creation_time2025-05-11T13:32:41.509010+0000
commit succeded for event_id 1746970361.510675.0f493dcbd94bb6ec1df3540401ea1db9 creation_time2025-05-11T13:32:41.510010+0000
commit succeded for event_id 1746970361.514450.020997dcc7a6dc8f262578753c22cce1 creation_time2025-05-11T13:32:41.513010+0000
commit succeded for event_id 1746970361.516282.db27473a10a64a8b91d5a6f14d8089df creation_time2025-05-11T13:32:41.515010+0000
commit succeded for event_id 1746970361.517283.1ee6a13b023a7edde92a32986b4f64c8 creation_time2025-05-11T13:32:41.516010+0000
commit succeded for event_id 1746970361.521317.c297386b418d700f61e2ee2b96e3498f creation_time2025-05-11T13:32:41.520010+0000
commit succeded for event_id 1746970361.522046.a1223f74109fed7a4a9537fbffb15a81 creation_time2025-05-11T13:32:41.521010+0000
Updated by Casey Bodley 10 months ago
looks like the same failure in https://qa-proxy.ceph.com/teuthology/cbodley-2025-05-19_22:28:25-rgw-wip-cbodley-testing-distro-default-smithi/8289688/teuthology.log
======================================================================
FAIL: bucket_notification.test_bn.test_ps_s3_persistent_topic_configs_max_retries
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/lib64/python3.9/site-packages/nose/case.py", line 189, in runTest
self.test(*self.arg)
File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py", line 3346, in test_ps_s3_persistent_topic_configs_max_retries
ps_s3_persistent_topic_configs(persistency_time, config_dict)
File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py", line 3310, in ps_s3_persistent_topic_configs
get_stats_persistent_topic(topic_name, number_of_objects)
File "/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py", line 575, in get_stats_persistent_topic
assert_equal(parsed_result['Topic Stats']['Entries'], assert_entries_number)
AssertionError: 9 != 10
-------------------- >> begin captured logging << --------------------
Updated by Yuval Lifshitz 9 months ago
- could not find the following log messages: "commit succeded for event_id ..." in the RGW log
- also, in the original log, you can see 10 events received in the fronend and 15 expiries in the notification thread
@Casey Bodley in the failure from note-9, the topic stats (9) are less than the number of events (10), which seems like the opposite problem
Updated by Yuval Lifshitz 9 months ago
@Adam Emerson is it possible to get a link to the teuthology run that failed with the original issue?
Updated by Adam Emerson 9 months ago
/a/yuriw-2025-03-25_22:40:18-rgw-wip-yuri-testing-2025-03-25-1039-squid-distro-default-smithi/8210270/teuthology.log.gz on any Ceph playground is the run!
Updated by Yuval Lifshitz 9 months ago
- Assignee changed from Ali Masarwa to Yuval Lifshitz
Updated by Yuval Lifshitz 9 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 64010
Updated by Yuval Lifshitz 9 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 9 months ago
- Copied to Backport #71900: reef: test_bn FAIL: test persistent topic configurations with max_retries and retry_sleep_duration added
Updated by Upkeep Bot 9 months ago
- Copied to Backport #71901: tentacle: test_bn FAIL: test persistent topic configurations with max_retries and retry_sleep_duration added
Updated by Upkeep Bot 9 months ago
- Copied to Backport #71902: squid: test_bn FAIL: test persistent topic configurations with max_retries and retry_sleep_duration added
Updated by Yuval Lifshitz 9 months ago
issue is till seen after the fix. opened: https://tracker.ceph.com/issues/71979
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 26b60e4595cb624dd11462231b282c4440a84137
- Fixed In set to v20.3.0-1274-g26b60e4595c
- Upkeep Timestamp set to 2025-07-08T14:46:59+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-1274-g26b60e4595c to v20.3.0-1274-g26b60e4595cb
- Upkeep Timestamp changed from 2025-07-08T14:46:59+00:00 to 2025-07-14T15:21:07+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-1274-g26b60e4595cb to v20.3.0-1274-g26b60e4595
- Upkeep Timestamp changed from 2025-07-14T15:21:07+00:00 to 2025-07-14T20:45:41+00:00