
erasure-code: Increase SIMD_ALIGN from 32 to 64#60246

Merged
SrinivasaBharath merged 1 commit into ceph:main from jamiepryde:SIMD-align-64
Jan 29, 2025

Conversation

@jamiepryde (Contributor)

Fixes: https://tracker.ceph.com/issues/61573

We want the buffers used for erasure coding to be 64-byte aligned. This ensures that each 64-byte chunk fits within a single cache line for AVX512 instructions. If the buffers are misaligned, performance can suffer because the CPU has to perform extra loads from memory.

The testing in the tracker does show a performance gain using 64-byte alignment. I wasn't able to see any improvement in my own testing, but this could be down to the CPU I used (Xeon Gold 6336Y).

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the checklist below, you may click the boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also check an item by adding an x between the brackets: [x]. Spacing and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Fixes: https://tracker.ceph.com/issues/61573

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>
@jamiepryde jamiepryde self-assigned this Oct 10, 2024
@jamiepryde jamiepryde requested a review from a team as a code owner October 10, 2024 12:52
@jamiepryde (Contributor, Author)

ceph_erasure_code_benchmark does not show any noticeable difference on my system

32 byte-aligned encode [benchmark screenshot]
64 byte-aligned encode [benchmark screenshot]
32 byte-aligned decode [benchmark screenshot]
64 byte-aligned decode [benchmark screenshot]

@jamiepryde (Contributor, Author)

jenkins test make check arm64

@markhpc (Member) left a comment

LGTM assuming there are no weird corner cases it breaks!

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 30, 2024
@markhpc (Member)

markhpc commented Jan 9, 2025

@yuriw Any idea if this one passed testing? Thanks!

@github-actions github-actions bot removed the stale label Jan 9, 2025
@ljflores (Member)

Hey @jamiepryde and @markhpc, apologies for the delay on this one. There is an erasure code timeout that would be good to confirm isn't a regression before moving forward:
/a/yuriw-2024-11-20_16:11:55-rados-wip-yuri3-testing-2024-11-14-0857-distro-default-smithi/8001927

2024-11-20T23:15:48.704 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '3.f', 'version': "985'12460", 'reported_seq': 24949, 'reported_epoch': 1539, 'state': 'active+recovering+undersized+remapped', 'last_fresh': '2024-11-20T23:15:44.824441+0000', 'last_change': '2024-11-20T22:33:34.780387+0000', 'last_active': '2024-11-20T23:15:44.824441+0000', 'last_peered': '2024-11-20T23:15:44.824441+0000', 'last_clean': '2024-11-20T22:28:36.199758+0000', 'last_became_active': '2024-11-20T22:29:00.345905+0000', 'last_became_peered': '2024-11-20T22:29:00.345905+0000', 'last_unstale': '2024-11-20T23:15:44.824441+0000', 'last_undegraded': '2024-11-20T23:15:44.824441+0000', 'last_fullsized': '2024-11-20T22:28:38.200361+0000', 'mapping_epoch': 151, 'log_start': "79'2400", 'ondisk_log_start': "79'2400", 'created': 32, 'last_epoch_clean': 87, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "145'7910", 'last_scrub_stamp': '2024-11-20T22:28:32.158770+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2024-11-20T22:25:22.815227+0000', 'last_clean_scrub_stamp': '2024-11-20T22:28:32.158770+0000', 'objects_scrubbed': 4582, 'log_size': 10060, 'log_dups_size': 0, 'ondisk_log_size': 10060, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 2, 'scrub_schedule': 'no scrub is scheduled', 'scrub_duration': 1667, 'objects_trimmed': 0, 'snaptrim_duration': 0, 'stat_sum': {'num_bytes': 131072, 'num_objects': 2, 'num_object_clones': 0, 'num_object_copies': 6, 'num_objects_missing_on_primary': 1794, 'num_objects_missing': 1794, 'num_objects_degraded': 0, 'num_objects_misplaced': 6, 'num_objects_unfound': 0, 'num_objects_dirty': 2, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 12460, 'num_write_kb': 398784, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 
'num_deep_scrub_errors': 0, 'num_objects_recovered': 4893, 'num_bytes_recovered': 320667648, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [2, 3, 15], 'acting': [2, 3, 2147483647], 'avail_no_missing': ['4(2)', '12(1)', '15(0)'], 'object_location_counts': [{'shards': '4(2),12(1),15(0)', 'objects': 2}], 'blocked_by': [], 'up_primary': 2, 'acting_primary': 2, 'purged_snaps': []}
2024-11-20T23:15:48.706 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_aec760e0fabcca6ad826f66c77e5a4b1dfe7585f/qa/tasks/ceph_manager.py", line 192, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_aec760e0fabcca6ad826f66c77e5a4b1dfe7585f/qa/tasks/ceph_manager.py", line 1488, in _do_thrash
    self.ceph_manager.wait_for_recovery(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_aec760e0fabcca6ad826f66c77e5a4b1dfe7585f/qa/tasks/ceph_manager.py", line 3015, in wait_for_recovery
    assert now - start < timeout, \
AssertionError: wait_for_recovery: failed before timeout expired

Since the failure is a bit ambiguous, I am going to retest it along with #59679, for which I also identified a timeout on a test with the new isa profile. An isolated retest will give us a better idea of whether this is a regression.

Testing ref: https://tracker.ceph.com/issues/68795

@Naveenaidu (Contributor)

@SrinivasaBharath SrinivasaBharath merged commit a11c179 into ceph:main Jan 29, 2025
@jamiepryde jamiepryde deleted the SIMD-align-64 branch July 10, 2025 14:30
@jamiepryde jamiepryde restored the SIMD-align-64 branch July 10, 2025 14:30
