test/erasure-code: Get more accurate erasure code benchmark results by avoiding reallocating large buffers. by jamiepryde · Pull Request #60121 · ceph/ceph

jamiepryde · 2024-10-04T16:13:47Z

This is a follow-up PR to #59486 that makes further improvements to the erasure code benchmark. As stated in the earlier PR, we want to use large data buffers that don't fit in the CPU cache, so that we get a more accurate picture of the encoding and decoding performance of the erasure code plugins that Ceph supports. However, repeatedly calling the encode and decode functions with very large buffers can cause many reallocations and system calls to grow and shrink the heap. For example, with a 100MB buffer and 400 encode iterations we see the following results:

 Performance counter stats for './ceph_erasure_code_benchmark -s 102400000 -i 400 -p isa -w encode -P k=4 -P m=3 --erasure-code-dir .':

           5236545      dTLB-load-misses
          45997074      dTLB-store-misses
           4990679      iTLB-load-misses
        1401943352      cache-references
         277535788      cache-misses              #   19.797 % of all cache refs

      11.223069055 seconds time elapsed

       4.082841000 seconds user
       7.050570000 seconds sys

Most of the test time (about 7 of the 11 seconds, reported as sys time) is spent in system calls resizing the heap rather than in exercising the EC plugin.

This PR changes the encode and decode benchmarks so that the first iteration makes a single call to the encode or decode function, which allocates the buffers and ensures they are properly aligned. Every later iteration calls encode_chunks or decode_chunks directly, reusing the buffers from the first iteration. The buffers are therefore allocated once, and the rest of the run measures the plugin encoding or decoding data. For example:

 Performance counter stats for './ceph_erasure_code_benchmark -s 102400000 -i 400 -p isa -w encode -P k=4 -P m=3 --erasure-code-dir . -P technique=reed_sol_van':

           4289767      dTLB-load-misses
           9914327      dTLB-store-misses
            168432      iTLB-load-misses
        1030858008      cache-references
         234344122      cache-misses              #   22.733 % of all cache refs

       4.170905991 seconds time elapsed

       3.988915000 seconds user
       0.169565000 seconds sys

I have also made it clearer that there are three ways to run the decode test:
- Method 1: specify "-E exhaustive" to test all possible erasure combinations. This still calls decode on every iteration, because the buffers must be reallocated for each combination.
- Method 2: name the chunks to erase, e.g. "--erased 0 --erased 1". This calls decode on the first iteration and decode_chunks on all later iterations.
- Method 3: specify the number of erasures, e.g. "--erasures 3", and the chunks to erase are selected at random. This also calls decode on the first iteration and decode_chunks on all later iterations.

Note that when testing LRC we still call encode on every iteration, because LRC requires encode to be called before encode_chunks on each iteration. I haven't yet looked into whether there is a clean workaround.

jamiepryde · 2024-10-10T13:56:34Z

jenkins test make check arm64

jamiepryde · 2024-10-10T13:56:49Z

jenkins test api

jamiepryde · 2024-10-10T19:54:06Z

jenkins test make check arm64

jamiepryde · 2024-10-17T22:06:34Z

jenkins test make check arm64

github-actions · 2024-12-17T02:28:38Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

github-actions · 2025-01-16T04:01:49Z

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

test/erasure-code: More accurate bench results by avoiding reallocating large buffers (commit 0afbac7)

Signed-off-by: Jamie Pryde <jamiepry@uk.ibm.com>

jamiepryde added the performance label Oct 4, 2024

jamiepryde self-assigned this Oct 4, 2024

github-actions bot added the tests label Oct 4, 2024

jamiepryde mentioned this pull request Oct 14, 2024 in core: Change the default plugin for Ceph erasure coded pools from Jerasure to ISA-L #58052 (Merged)

markhpc requested a review from a team October 17, 2024 14:16

github-actions bot added the stale label Dec 17, 2024

github-actions bot closed this Jan 16, 2025

jamiepryde wants to merge 1 commit into ceph:main from jamiepryde:more-ceph-ec-benchmark-improvements-2
