Skip to content

rgw/notification: Fix segmentation fault in RGWPSListTopicsOp::execute() and correct topic listing to use get_topics_v2#60674

Merged
yuvalif merged 1 commit intoceph:mainfrom
oshrey16:bugfix-rgw-ps-list-topics-op
Nov 19, 2024
Merged

rgw/notification: Fix segmentation fault in RGWPSListTopicsOp::execute() and correct topic listing to use get_topics_v2#60674
yuvalif merged 1 commit intoceph:mainfrom
oshrey16:bugfix-rgw-ps-list-topics-op

Conversation

@oshrey16
Copy link
Copy Markdown
Contributor

@oshrey16 oshrey16 commented Nov 8, 2024

This PR resolves two issues in RGWPSListTopicsOp::execute():

  1. Segmentation Fault: Resolved a segmentation fault caused by dereferencing a null bucket pointer. Updated the code to use get_account_or_tenant(s->owner.id) to handle the bucket owner safely and prevent crashes.

  2. Fix for Topic Listing Logic: Fixed an issue where v2 notifications were not retrieved when v2 support was enabled. The logic now uses get_topics_v2 when available, and falls back to get_topics_v1 in cases v1 system or migration.

Fixes: https://tracker.ceph.com/issues/68756

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@oshrey16 oshrey16 requested a review from a team as a code owner November 8, 2024 12:08
@github-actions github-actions Bot added the rgw label Nov 8, 2024
@jmundack
Copy link
Copy Markdown
Contributor

jmundack commented Nov 8, 2024

@oshrey16 - thanks for fixing this!
can we add/update some test cases these situations?

@oshrey16
Copy link
Copy Markdown
Contributor Author

@jmundack Thank you for the feedback!
I appreciate your suggestion on adding more test cases. I'd be happy to enhance the PR by covering additional scenarios. I'll get started on this and follow up with an update soon!

Comment thread src/rgw/rgw_rest_pubsub.cc Outdated
driver->stat_topics_v1(s->bucket->get_tenant(), null_yield, this) == -ENOENT) {
op_ret = ps.get_topics_v1(this, result, y);
} else {
driver->stat_topics_v1(s->bucket_tenant, null_yield, this) == -ENOENT) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • why did you switch the if/else conditions?
  • this will fix the issue, but won't work in case of tenented topics (the "bucket_tenant" is probably empty in SNS requests). i think that the right things would be to use the get_account_or_tenant() call from line 495 to get the right value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!
I switched the conditions to fix the issue with V2 support. When both conditions are true:

  • Zonegroups support V2 notifications.
  • The V1 topic file is missing.

In this case, we’re fully in V2, so the system uses the get_topics_v2 function to retrieve topics from the V2 configuration.

If only one condition is met, it can indicate one of the following states:

  • Migration: The V1 topic file is exists, but all zonegroups support V2, indicating we’re migrating to V2. Here, the system should prioritize get_topics_v1.
  • V1: The V1 topic file exists, and not all zonegroups support V2. In this case, we fall back to get_topics_v1 to retrieve topics from the V1 configuration.

The updated code now correctly prioritizes get_topics_v2 when conditions allow and falls back to get_topics_v1 only when necessary.

I understand your concern regarding tenanted topics and will update the code to use get_account_or_tenant() to fetch the correct tenant information for SNS requests.

@github-actions github-actions Bot added the tests label Nov 11, 2024
http_server.close()


@attr('kafka_test')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to verify getting the topic list, you do not need to actually send them.
so, better to mark that as a basic_test.

topic_name2 = bucket_name + TOPIC_SUFFIX + '2'
topic_name3 = bucket_name + TOPIC_SUFFIX + '3'
# Start kafka receiver
task, receiver = create_kafka_receiver_thread(topic_name1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my previos comment. no need to start the kafka reciever - we are not sending any notifications, just configuring topics

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that you can still create a topic with a kafka endpoint, we don't check whether the endpoint is there until we send actual topics to it


# Create s3 topics
zonegroup = get_config_zonegroup()
topic_conf = PSTopicS3(conn, topic_name1, zonegroup, endpoint_args=endpoint_args)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add a case where we create a topic on a specific tenant, to make sure that we get the list per-tenant.

@oshrey16 oshrey16 force-pushed the bugfix-rgw-ps-list-topics-op branch from d46b919 to 96f04e0 Compare November 16, 2024 15:58
@yuvalif yuvalif self-requested a review November 18, 2024 07:31
@yuvalif
Copy link
Copy Markdown
Contributor

yuvalif commented Nov 18, 2024

jenkins test make check

3 similar comments
@yuvalif
Copy link
Copy Markdown
Contributor

yuvalif commented Nov 18, 2024

jenkins test make check

@oshrey16
Copy link
Copy Markdown
Contributor Author

jenkins test make check

@yuvalif
Copy link
Copy Markdown
Contributor

yuvalif commented Nov 18, 2024

jenkins test make check

@oshrey16 oshrey16 force-pushed the bugfix-rgw-ps-list-topics-op branch from 96f04e0 to 0ae8d76 Compare November 18, 2024 10:04
- Fixed a segmentation fault caused by a null bucket pointer in RGWPSListTopicsOp::execute()
- Corrected logic to use get_topics_v2 when supported, with fallback otherwise

Fixes: https://tracker.ceph.com/issues/68756
Signed-off-by: Oshrey Avraham <oshrey16@gmail.com>
@oshrey16 oshrey16 force-pushed the bugfix-rgw-ps-list-topics-op branch from 0ae8d76 to d27dab2 Compare November 18, 2024 10:08
@yuvalif
Copy link
Copy Markdown
Contributor

yuvalif commented Nov 18, 2024

jenkins test make check

@yuvalif
Copy link
Copy Markdown
Contributor

yuvalif commented Nov 18, 2024

jenkins test api

@yuvalif
Copy link
Copy Markdown
Contributor

yuvalif commented Nov 19, 2024

teuthology
notifications passing: https://pulpito.ceph.com/yuvalif-2024-11-18_13:49:41-rgw:notifications-wip-yuval-68756-distro-default-smithi/
rgw has 3 failures: https://pulpito.ceph.com/yuvalif-2024-11-18_13:46:55-rgw-wip-yuval-68756-distro-default-smithi/

  • crash is not seen
  • rgw_multi.tests.test_datalog_autotrim - does not seem related to the fix
  • ModuleNotFoundError: No module named 'cherrypy.wsgiserver' - python/test issue
  • Exception java.lang.NoClassDefFoundError: Could not initialize class org.codehaus.groovy.vmplugin.v7.Java7 - java/test issue

@yuvalif yuvalif merged commit 862104b into ceph:main Nov 19, 2024
@cbodley
Copy link
Copy Markdown
Contributor

cbodley commented Nov 19, 2024

thanks @oshrey16 @yuvalif!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants