qa/tests: retry the api call after making the request#61744

Merged
nizamial09 merged 1 commit into ceph:main from
rhcs-dashboard:mgr-api-test-fixes
Mar 5, 2025

Conversation

@nizamial09
Member

@nizamial09 nizamial09 commented Feb 10, 2025

Based on the pointer from Bill in https://tracker.ceph.com/issues/62972#note-75

Fixes: https://tracker.ceph.com/issues/62972


@nizamial09 nizamial09 requested a review from a team as a code owner February 10, 2025 18:21
@nizamial09 nizamial09 requested review from dnyanee1997 and nmunet and removed request for a team February 10, 2025 18:21
@nizamial09
Member Author

Saw a similar kind of error in the API run.
Pasting it here for future reference:

Using guessed paths /home/jenkins-build/build/workspace/ceph-api/build/lib/ ['/home/jenkins-build/build/workspace/ceph-api/qa', '/home/jenkins-build/build/workspace/ceph-api/build/lib/cython_modules/lib.3', '/home/jenkins-build/build/workspace/ceph-api/src/pybind']
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 1616, in <module>
    exec_test()
  File "/home/jenkins-build/build/workspace/ceph-api/build/../qa/tasks/vstart_runner.py", line 1459, in exec_test
    LocalCephCluster(LocalContext()).mon_manager.wait_for_all_osds_up(timeout=30)
  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/ceph_manager.py", line 2955, in wait_for_all_osds_up
    assert time.time() - start < timeout, \
AssertionError: timeout expired in wait_for_all_osds_up

@nizamial09
Member Author

jenkins test api

Contributor

@bill-scales bill-scales left a comment

The change to test_list_disabled_module and test_list_enabled_module fixes the timing window. The change to the @Retry wrapper won't work as intended, but it is harmless.

Member

@epuertat epuertat left a comment

LGTM

@nizamial09 nizamial09 force-pushed the mgr-api-test-fixes branch 3 times, most recently from 4f7ccf2 to 28d9178 on February 12, 2025 09:19
@nizamial09
Member Author

jenkins retest this please

@nizamial09
Member Author

Before merging this PR, I'll probably trigger API runs multiple times over the course of this week, just to be sure there aren't any related errors.

@nizamial09
Member Author

Saw this error in one of the runs. Trying it out locally, but at first glance this shouldn't happen:

2025-02-24 10:41:43,077.077 INFO:__main__:test_get (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) ... ok
2025-02-24 10:41:43,077.077 INFO:__main__:test_list_disabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) ... ok
2025-02-24 10:41:43,078.078 INFO:__main__:test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) ... FAIL
2025-02-24 10:41:43,078.078 INFO:__main__:
2025-02-24 10:41:43,078.078 INFO:__main__:Stopped test: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest) in 7.410687s
2025-02-24 10:41:43,079.079 INFO:__main__:
2025-02-24 10:41:43,079.079 INFO:__main__:======================================================================
2025-02-24 10:41:43,079.079 INFO:__main__:FAIL: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
2025-02-24 10:41:43,079.079 INFO:__main__:----------------------------------------------------------------------
2025-02-24 10:41:43,079.079 INFO:__main__:Traceback (most recent call last):
2025-02-24 10:41:43,079.079 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/test_mgr_module.py", line 81, in test_list_enabled_module
2025-02-24 10:41:43,079.079 INFO:__main__:    self.assertTrue(module_info['enabled'])
2025-02-24 10:41:43,080.080 INFO:__main__:AssertionError: False is not true
2025-02-24 10:41:43,080.080 INFO:__main__:
2025-02-24 10:41:43,080.080 INFO:__main__:> ip netns list
2025-02-24 10:41:43,085.085 INFO:__main__:> sudo ip link delete ceph-brx

@epuertat
Member

Saw this error in one of the runs. Trying it out locally, but at first glance this shouldn't happen

I observed this behaviour in Teuthology where the qa/ code run is not (always?) the latest (not sure if this is intended behaviour).

@nizamial09
Member Author

I observed this behaviour in Teuthology where the qa/ code run is not (always?) the latest

Didn't know that. Rebased the branch onto main. Let's observe a little more, then.

@nizamial09 nizamial09 force-pushed the mgr-api-test-fixes branch 2 times, most recently from 22e799f to 8ad9fb3 on February 28, 2025 04:58
@nizamial09
Member Author

nizamial09 commented Feb 28, 2025

Something I observed about the above-mentioned issue when I checked the logs:

  • After enabling the iostat module, test_list_enabled_module requests /api/mgr/module at:
2025-02-26 12:15:45,653.653 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.3.234:7790/api/mgr/module

But the funny thing is that this request never gets logged in the mgr logs (even though the API call from test_list_disabled_module gets logged fine).

mgr.y.log at the same time:

2025-02-26T12:15:43.540+0000 7fd2c4d72640  0 log_channel(cluster) log [DBG] : pgmap v19: 226 pgs: 226 active+clean; 591 KiB data, 4.0 GiB used, 400 GiB / 404 GiB avail
2025-02-26T12:15:45.544+0000 7fd2c4d72640  0 log_channel(cluster) log [DBG] : pgmap v20: 226 pgs: 226 active+clean; 591 KiB data, 4.0 GiB used, 400 GiB / 404 GiB avail
2025-02-26T12:15:45.680+0000 7fd3a73d8640  1 mgr handle_mgr_map respawning because set of enabled modules changed!
2025-02-26T12:15:45.680+0000 7fd3a73d8640  1 mgr respawn  e: './bin/ceph-mgr'
2025-02-26T12:15:45.680+0000 7fd3a73d8640  1 mgr respawn  0: './bin/ceph-mgr'
2025-02-26T12:15:45.680+0000 7fd3a73d8640  1 mgr respawn  1: '-i'
2025-02-26T12:15:45.680+0000 7fd3a73d8640  1 mgr respawn  2: 'y'
2025-02-26T12:15:45.680+0000 7fd3a73d8640  1 mgr respawn  3: '-f'
2025-02-26T12:15:45.684+0000 7fd3a73d8640  1 mgr respawn respawning with exe /home/jenkins-build/build/workspace/ceph-api/build/bin/ceph-mgr
2025-02-26T12:15:45.684+0000 7fd3a73d8640  1 mgr respawn  exe_path /proc/self/exe
2025-02-26T12:15:45.900+0000 7fbd779c2240 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-26T12:15:45.920+0000 7fbd779c2240 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-26T12:15:45.920+0000 7fbd779c2240  0 ceph version Development (no_version) squid (dev), process ceph-mgr, pid 2146325
2025-02-26T12:15:45.920+0000 7fbd779c2240 -1 WARNING: all dangerous and experimental features are enabled.
2025-02-26T12:15:45.956+0000 7fbd779c2240  1 mgr[py] Loading python module 'devicehealth'
2025-02-26T12:15:46.052+0000 7fbd779c2240  1 mgr[py] Loading python module 'stats'

I assume that's because the mgr goes down after accepting the request but fails to log it? So the response comes back fine, the retries never happen, and the test just fails on the expected result.

I just went ahead and added a sleep after disabling and enabling the module to see if it's that.
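
The suspected window can be sketched with a toy model. Everything below is hypothetical: FakeMgr, its respawn delay, and wait_until are illustration names, not the code in qa/tasks/mgr/dashboard. It only shows the idea that a request landing before the respawn gets a valid but stale answer, so a retry that triggers on failed requests never kicks in, whereas waiting for the expected state closes the window.

```python
import time


class FakeMgr:
    """Toy stand-in for a mgr that applies a module change only after respawning."""

    def __init__(self, respawn_delay=0.2):
        self.respawn_delay = respawn_delay  # assumed delay, for illustration only
        self._enabled = False
        self._pending = None  # (new_state, time at which it becomes visible)

    def enable_module(self):
        # The command returns immediately; the new state shows up after the respawn.
        self._pending = (True, time.monotonic() + self.respawn_delay)

    def get_module(self):
        # A request served before the respawn succeeds but reports the old state.
        if self._pending and time.monotonic() >= self._pending[1]:
            self._enabled = self._pending[0]
            self._pending = None
        return {'name': 'iostat', 'enabled': self._enabled}


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


mgr = FakeMgr()
mgr.enable_module()

# Racy check: the request succeeds, so nothing retries, yet the state is stale.
stale = mgr.get_module()

# Polling for the expected state tolerates the respawn delay.
became_enabled = wait_until(lambda: mgr.get_module()['enabled'])
final = mgr.get_module()
```

A fixed sleep, as tried here, covers the same window as long as the respawn finishes within the sleep; polling on the expected state simply avoids having to guess that duration.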

@nizamial09
Member Author

nizamial09 commented Feb 28, 2025

Looks like it's gracefully retrying: https://jenkins.ceph.com/job/ceph-api/90617/consoleFull ✔️ . Anyway, I'll try one more run, and if that passes I'll merge it.

2025-02-28 05:36:07,062.062 INFO:__main__:> ./bin/ceph mgr module enable iostat
2025-02-28 05:36:07,943.943 DEBUG:tasks.mgr.dashboard.helper:command result: 
2025-02-28 05:36:10,946.946 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.5.33:7789/api/mgr/module
2025-02-28 05:36:10,949.949 INFO:tasks.mgr.dashboard.helper:Retrying the GET req. Total retries left is... 1
2025-02-28 05:36:10,949.949 INFO:tasks.mgr.dashboard.test_mgr_module:Trying to reach the REST API endpoint
2025-02-28 05:36:10,949.949 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.5.33:7789/api/mgr/module
2025-02-28 05:36:10,950.950 DEBUG:tasks.ceph_test_case:wait_until_true: waiting (timeout=30 retry_count=0)...
2025-02-28 05:36:15,956.956 INFO:tasks.mgr.dashboard.test_mgr_module:Trying to reach the REST API endpoint
2025-02-28 05:36:15,956.956 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.5.33:7789/api/mgr/module
2025-02-28 05:36:16,657.657 DEBUG:tasks.ceph_test_case:wait_until_true: success in 5s and 0 retries
2025-02-28 05:36:16,658.658 DEBUG:tasks.mgr.dashboard.helper:Request GET to https://172.21.5.33:7789/api/mgr/module
2025-02-28 05:36:16,794.794 INFO:__main__:> ./bin/ceph config reset 132
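
The retry visible in this log (a GET that fails and is attempted again) can be approximated as below. This is a minimal sketch under stated assumptions, not the helper in tasks.mgr.dashboard.helper: retry_on, flaky_get, and the choice of ConnectionError are all hypothetical.

```python
import functools
import time


def retry_on(exc_type, tries=3, delay=0.01):
    """Re-invoke the wrapped function when it raises exc_type, up to `tries` attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {'count': 0}


@retry_on(ConnectionError, tries=3)
def flaky_get():
    # Hypothetical request: fails twice (service restarting), then succeeds.
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('endpoint not reachable yet')
    return {'status': 200}


response = flaky_get()
```

A retry of this kind only helps when the request fails outright. A successful response that carries stale state passes straight through it, which is why the tests also need to wait after toggling the module.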

@nizamial09
Member Author

jenkins test api

@nizamial09
Member Author

Another pass: https://jenkins.ceph.com/job/ceph-api/90627/

I'll fix the make check lint issue and push again.

@nizamial09
Member Author

jenkins test make check

@nizamial09
Member Author

jenkins test make check arm64

@nizamial09 nizamial09 merged commit 224a0e7 into ceph:main Mar 5, 2025
11 of 13 checks passed
@nizamial09 nizamial09 deleted the mgr-api-test-fixes branch March 5, 2025 05:24
@github-project-automation github-project-automation bot moved this from Reviewer approved to Done in Ceph-Dashboard Mar 5, 2025
@nizamial09
Member Author

Thank you @bill-scales and @epuertat for your help here.
