Skip to content

ceph-dencoder: RBD - Add missing types#52481

Closed
NitzanMordhai wants to merge 2 commits intomainfrom
wip-nitzan-ceph-dencoder-extend-rbd-type-available
Closed

ceph-dencoder: RBD - Add missing types#52481
NitzanMordhai wants to merge 2 commits intomainfrom
wip-nitzan-ceph-dencoder-extend-rbd-type-available

Conversation

@NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented Jul 17, 2023

Currently, ceph-dencoder lacks certain rbd types, preventing us from accurately checking the ceph corpus for encode-decode mismatches. This pull request aims to address this issue by adding the missing types to ceph-dencoder.

To successfully incorporate these types into ceph-dencoder, we need to introduce the necessary dump and generate_test_instances functions that was missing in some types. These functions are essential for proper encode and decode of the added types.

This PR will enhance the functionality of ceph-dencoder by including the missing types, enabling a comprehensive analysis of encode-decode consistency. With the addition of these types, we can ensure the robustness and correctness of the ceph corpus.

This update will significantly contribute to improving the overall reliability and accuracy of ceph-dencoder. It allows for a more comprehensive assessment of the encode-decode behavior, leading to enhanced data integrity and stability within the ceph ecosystem.

Fixes: https://tracker.ceph.com/issues/61788

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@NitzanMordhai NitzanMordhai requested a review from a team as a code owner July 17, 2023 12:33
@github-actions github-actions bot added the rbd label Jul 17, 2023
@NitzanMordhai
Copy link
Contributor Author

jenkins test make check

using namespace librbd::watch_notify;
TYPE_NOCOPY(NotifyMessage)
TYPE(ResponseMessage)
TYPE(AsyncRequestId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here and below for using namespace cls::rbd;. I don't understand why these entries are being duplicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the corpuse repository contain types without namespaces. that causing ceph-dencoder to output the type is not recognize (wihtout full namespace), i didn't want to remove the duplicate exist type names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the corpuse repository contain types without namespaces.

Still not following. For example, RBD has several different NotifyMessage types in different namespaces. How does that work?

that causing ceph-dencoder to output the type is not recognize (wihtout full namespace)

Do you have an example of such output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, so we will need from now to change the WRITE_CLASS_ENCODER so we will dump the type with the namespace, currently we have WRITE_CLASS_ENCODER(NotifyMessage) so it will be WRITE_CLASS_ENCODER(librbd::watch_notify::NotifyMessage).

we will lose the archived version checks, but we will make it right for the next versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, so we will need from now to change the WRITE_CLASS_ENCODER so we will dump the type with the namespace, currently we have WRITE_CLASS_ENCODER(NotifyMessage) so it will be WRITE_CLASS_ENCODER(librbd::watch_notify::NotifyMessage).

My first instinct would be to change WRITE_CLASS_ENCODER macro to get the fully-qualified name of the type internally and use that for naming the dump file. But, if

we will lose the archived version checks, but we will make it right for the next versions.

is the case, that is probably too disruptive.

Hrm... I think NotifyMessage is unique in that we have five of them in different namespaces so as a (rather poor) workaround we might consider just giving each of them a unique name.

But fully-qualified names have been there in ceph-dencoder types.h and later rbd_types.h for years: see e.g. eb8d37a. Are you saying that it never worked?

Can you describe the breakage related to the object corpus in detail? Since three out of five NotifyMessage types happen to be registered with ceph-dencoder but their fully-qualified names get stripped, what actually made it to the corpus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov I added new column to your table if corpus has object:

WRITE_CLASS macro in the code TYPE macro in rbd_types.h or as noted STATUS NOTES corpus entries
src/cls/journal/cls_journal_types.h:WRITE_CLASS_ENCODER(ObjectPosition); TYPE(cls::journal::ObjectPosition)   in rgw_types.h -- why?  
src/cls/journal/cls_journal_types.h:WRITE_CLASS_ENCODER(ObjectSetPosition); TYPE(cls::journal::ObjectSetPosition)   in rgw_types.h -- why?  
src/cls/journal/cls_journal_types.h:WRITE_CLASS_ENCODER(Client); TYPE(cls::journal::Client)   in rgw_types.h -- why?  
src/cls/journal/cls_journal_types.h:WRITE_CLASS_ENCODER(Tag); TYPE(cls::journal::Tag)   in rgw_types.h -- why?  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_lock_op); TYPE(cls_lock_lock_op) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_unlock_op); TYPE(cls_lock_unlock_op) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_break_op); TYPE(cls_lock_break_op) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_get_info_op); TYPE(cls_lock_get_info_op) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER_FEATURES(cls_lock_get_info_reply); TYPE_FEATUREFUL(cls_lock_get_info_reply) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_list_locks_reply); TYPE(cls_lock_list_locks_reply) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_assert_op); TYPE(cls_lock_assert_op) MATCH in common_types.h  
src/cls/lock/cls_lock_ops.h:WRITE_CLASS_ENCODER(cls_lock_set_cookie_op); TYPE(cls_lock_set_cookie_op) MATCH in common_types.h  
src/cls/lock/cls_lock_types.h: WRITE_CLASS_ENCODER(locker_id_t); TYPE(rados::cls::lock::locker_id_t)   in common_types.h  
src/cls/lock/cls_lock_types.h: WRITE_CLASS_ENCODER_FEATURES(locker_info_t); TYPE_FEATUREFUL(rados::cls::lock::locker_info_t)   in common_types.h  
src/cls/lock/cls_lock_types.h: WRITE_CLASS_ENCODER_FEATURES(lock_info_t); TYPE_FEATUREFUL(rados::cls::lock::lock_info_t)   in common_types.h  
src/cls/rbd/cls_rbd.h:WRITE_CLASS_ENCODER_FEATURES(cls_rbd_parent); TYPE_FEATUREFUL(cls_rbd_parent) MATCH   ./objects/cls_rbd_parent
src/cls/rbd/cls_rbd.h:WRITE_CLASS_ENCODER_FEATURES(cls_rbd_snap); TYPE_FEATUREFUL(cls_rbd_snap) MATCH   ./objects/cls_rbd_snap
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(MirrorPeer); TYPE(cls::rbd::MirrorPeer)     ./objects/cls::rbd::MirrorPeer
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(MirrorImage); TYPE(cls::rbd::MirrorImage)     ./objects/cls::rbd::MirrorImage
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(MirrorImageSiteStatus); TYPE(cls::rbd::MirrorImageSiteStatus)     ./objects/cls::rbd::MirrorImageSiteStatus
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER_FEATURES(MirrorImageSiteStatusOnDisk); TYPE_FEATUREFUL(cls::rbd::MirrorImageSiteStatusOnDisk)     ./objects/cls::rbd::MirrorImageSiteStatusOnDisk
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(MirrorImageStatus); TYPE(cls::rbd::MirrorImageStatus)     ./objects/cls::rbd::MirrorImageStatus
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(ParentImageSpec); TYPE(cls::rbd::ParentImageSpec)     ./objects/cls::rbd::ParentImageSpec
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(ChildImageSpec); TYPE(cls::rbd::ChildImageSpec)     ./objects/cls::rbd::ChildImageSpec
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(GroupImageSpec); TYPE(cls::rbd::GroupImageSpec)     ./objects/cls::rbd::GroupImageSpec
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(GroupImageStatus); TYPE(cls::rbd::GroupImageStatus)     ./objects/cls::rbd::GroupImageStatus
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(GroupSpec); TYPE(cls::rbd::GroupSpec)     ./objects/cls::rbd::GroupSpec
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(SnapshotNamespace); TYPE(cls::rbd::SnapshotNamespace)     ./objects/cls::rbd::SnapshotNamespace
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(SnapshotInfo); TYPE(cls::rbd::SnapshotInfo)     ./objects/cls::rbd::SnapshotInfo
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(ImageSnapshotSpec); TYPE(cls::rbd::ImageSnapshotSpec)     ./objects/cls::rbd::ImageSnapshotSpec
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(GroupSnapshot); TYPE(cls::rbd::GroupSnapshot)     ./objects/cls::rbd::GroupSnapshot
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(TrashImageSpec);        
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(MirrorImageMap); TYPE(cls::rbd::MirrorImageMap)     ./objects/cls::rbd::MirrorImageMap
src/cls/rbd/cls_rbd_types.h:WRITE_CLASS_ENCODER(MigrationSpec); TYPE(cls::rbd::MigrationSpec)     ./objects/cls::rbd::MigrationSpec
src/journal/Entry.h:WRITE_CLASS_ENCODER(journal::Entry); TYPE(journal::Entry) MATCH in common_types.h  
src/librbd/WatchNotifyTypes.h:WRITE_CLASS_ENCODER(ClientId);     duplicate WRITE_CLASS_ENCODER for librbd::watcher::ClientId  
src/librbd/WatchNotifyTypes.h:WRITE_CLASS_ENCODER(AsyncRequestId);        
src/librbd/WatchNotifyTypes.h:WRITE_CLASS_ENCODER(NotifyMessage); TYPE_NOCOPY(librbd::watch_notify::NotifyMessage)     ./objects/librbd::watch_notify::NotifyMessage
src/librbd/WatchNotifyTypes.h:WRITE_CLASS_ENCODER(ResponseMessage); TYPE(librbd::watch_notify::ResponseMessage)     ./objects/librbd::watch_notify::ResponseMessage
src/librbd/cache/pwl/Types.h:WRITE_CLASS_DENC(librbd::cache::pwl::WriteLogCacheEntry); TYPE(librbd::cache::pwl::WriteLogCacheEntry) MATCH    
src/librbd/cache/pwl/Types.h:WRITE_CLASS_DENC(librbd::cache::pwl::WriteLogPoolRoot); TYPE(librbd::cache::pwl::WriteLogPoolRoot) MATCH    
src/librbd/cache/pwl/ssd/Types.h:WRITE_CLASS_DENC(librbd::cache::pwl::ssd::SuperBlock); TYPE(librbd::cache::pwl::ssd::SuperBlock) MATCH    
src/librbd/journal/Types.h:WRITE_CLASS_ENCODER(EventEntry); TYPE(librbd::journal::EventEntry)     ./objects/librbd::journal::EventEntry
src/librbd/journal/Types.h:WRITE_CLASS_ENCODER(ClientData); TYPE(librbd::journal::ClientData)     ./objects/librbd::journal::ClientData
src/librbd/journal/Types.h:WRITE_CLASS_ENCODER(TagData); TYPE(librbd::journal::TagData)     ./objects/librbd::journal::TagData
src/librbd/mirror/snapshot/Types.h:WRITE_CLASS_ENCODER(ImageStateHeader);        
src/librbd/mirror/snapshot/Types.h:WRITE_CLASS_ENCODER(SnapState);        
src/librbd/mirror/snapshot/Types.h:WRITE_CLASS_ENCODER(ImageState);        
src/librbd/mirroring_watcher/Types.h:WRITE_CLASS_ENCODER(NotifyMessage); TYPE(librbd::mirroring_watcher::NotifyMessage)     ./objects/librbd::mirroring_watcher::NotifyMessage
src/librbd/trash_watcher/Types.h:WRITE_CLASS_ENCODER(NotifyMessage); TYPE(librbd::trash_watcher::NotifyMessage)     ./objects/librbd::trash_watcher::NotifyMessage
src/librbd/watcher/Types.h:WRITE_CLASS_ENCODER(ClientId);        
src/librbd/watcher/Types.h:WRITE_CLASS_ENCODER(NotifyResponse);        
src/rbd_replay/ActionTypes.h:WRITE_CLASS_ENCODER(Dependency); TYPE(rbd_replay::action::Dependency)     ./objects/rbd_replay::action::Dependency
src/rbd_replay/ActionTypes.h:WRITE_CLASS_ENCODER(ActionEntry); TYPE(rbd_replay::action::ActionEntry)     ./objects/rbd_replay::action::ActionEntry
src/tools/rbd_mirror/image_map/Types.h:WRITE_CLASS_ENCODER(PolicyData); TYPE(rbd::mirror::image_map::PolicyData)     ./objects/rbd::mirror::image_map::PolicyData
src/tools/rbd_mirror/instance_watcher/Types.h:WRITE_CLASS_ENCODER(NotifyMessage);        
src/tools/rbd_mirror/instance_watcher/Types.h:WRITE_CLASS_ENCODER(NotifyAckPayload);        
src/tools/rbd_mirror/leader_watcher/Types.h:WRITE_CLASS_ENCODER(NotifyMessage);        

yes, most of the types doesn't have entries in corpus, I'll need to add the corpus script more test that will cause the corpus capture those types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov if we want to match the TYPE and WRITE_CLASS_ENCODER, that means we need to update all the corpus directories as well (if we updating the TYPE)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov if we want to match the TYPE and WRITE_CLASS_ENCODER, that means we need to update all the corpus directories as well (if we updating the TYPE)

It's been a while and the details are out of my head... IIRC the reason I put together this table is that you stated that the namespace part of the name doesn't make it to the corpus:

the corpuse repository contain types without namespaces. that causing ceph-dencoder to output the type is not recognize (wihtout full namespace), i didn't want to remove the duplicate exist type names.

But now, entries like ./objects/cls::rbd::MirrorPeer, ./objects/librbd::journal::TagData, etc in the column that you added today seem to contradict that -- the namespace part is clearly there.

So I'm not sure what you mean by "if we want to match the TYPE and WRITE_CLASS_ENCODER" (emphasis mine). Am I wrong in assuming that they need to match for the tests to work correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I wrong in assuming that they need to match for the tests to work correctly?

If I'm not wrong, then a further observation is that the intersection of MATCH status and an entry being present in the corpus produces only two entries: ./objects/cls_rbd_parent and ./objects/cls_rbd_snap.

  • Are these two types the only RBD-related types that are not only stored in the corpus and also checked?
  • What is the behavior for entries such as ./objects/cls::rbd::MirrorPeer which are stored in the corpus but their WRITE_CLASS_ENCODER and TYPE names don't match? Are they just skipped?

Copy link
Contributor

@idryomov idryomov Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they just skipped?

You may have answered this already in an earlier comment

[...] since the corpus script first check if the type is supported by ceph-dencoder, its ignore all the types that return unsupported.

but I want to get a confirmation to avoid misunderstanding.

Is there some message logged when this happens? If so, can you confirm that this message is produced for cls::rbd::MirrorPeer and other RBD-related types that have entries in the corpus except for cls_rbd_parent and cls_rbd_snap?

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-ceph-dencoder-extend-rbd-type-available branch from 39d63d8 to acc566e Compare July 24, 2023 09:37
@NitzanMordhai NitzanMordhai requested a review from idryomov July 24, 2023 10:40
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-ceph-dencoder-extend-rbd-type-available branch 2 times, most recently from 6f7684d to be5b64b Compare July 31, 2023 11:51
@NitzanMordhai NitzanMordhai changed the title ceph-dencoder: Add missing rbd types to ceph-dencoder for accurate en… ceph-dencoder: RBD - Add missing types Aug 6, 2023
@NitzanMordhai
Copy link
Contributor Author

jenkins retest this please

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-ceph-dencoder-extend-rbd-type-available branch from be5b64b to 9e957e7 Compare August 14, 2023 06:04
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-ceph-dencoder-extend-rbd-type-available branch 2 times, most recently from 574fe56 to f3d4c42 Compare September 7, 2023 11:29
…code-decode comparison

Currently, ceph-dencoder lacks certain rbd types, preventing us from accurately checking the ceph corpus for encode-decode mismatches.
This pull request aims to address this issue by adding the missing types to ceph-dencoder.

To successfully incorporate these types into ceph-dencoder, we need to introduce the necessary `dump` and `generate_test_instances`
functions that was missing in some types. These functions are essential for proper encode and decode of the added types.

This PR will enhance the functionality of ceph-dencoder by including the missing types, enabling a comprehensive analysis of encode-decode consistency.
With the addition of these types, we can ensure the robustness and correctness of the ceph corpus.

This update will significantly contribute to improving the overall reliability and accuracy of ceph-dencoder.
It allows for a more comprehensive assessment of the encode-decode behavior, leading to enhanced data integrity and stability within the ceph ecosystem.

Fixes: https://tracker.ceph.com/issues/61788
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-ceph-dencoder-extend-rbd-type-available branch from f3d4c42 to 892672c Compare September 7, 2023 12:00
@NitzanMordhai
Copy link
Contributor Author

@idryomov i completed the changes for ceph-objects-corpus and now we are good to go. is there any other changes needed for that PR?

@idryomov
Copy link
Contributor

i completed the changes for ceph-objects-corpus

What exactly do you mean here? I see "submodule: update ceph-object-corpus submodule" commit, but it doesn't seem to touch any submodules. In fact it appears to be completely empty.

TYPE(GroupSpec)
TYPE(ImageSnapshotSpec)
TYPE(SnapshotInfo)
TYPE(SnapshotNamespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to decide which way to go for types which are not matching or missing: introduce them either as fully qualified or not fully qualified and be consistent about it. This is actually an interesting question because I don't think we have ever intended for C++ namespaces to "leak" in this fashion. Regardless, I really doubt that both

TYPE(cls::rbd::SnapshotNamespace)

and

TYPE(SnapshotNamespace)

are needed and this goes for 15 other types above too.

The only type which has to be fully qualified is NotifyMessage because there are five of them in different namespaces.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jan 19, 2024
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Feb 18, 2024
@rzarzynski rzarzynski reopened this Aug 15, 2024
@rzarzynski
Copy link
Contributor

We talked about resurrecting this PR during the CDS RADOS session this Tuesday.
@NitzanMordhai, would you mind addressing the @idryomov's comment on namespaces?

@github-actions github-actions bot removed the stale label Aug 15, 2024
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Oct 18, 2024
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Nov 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants