[opt](rowset) Remote fetch rowsets to avoid -230 error when capturing rowsets (#52995) #52440
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
run buildall
run buildall
run buildall
run buildall
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
run buildall
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
run buildall
run buildall
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
run buildall
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Cloud UT Coverage Report: Increment line coverage / Increment coverage report
BE UT Coverage Report: Increment line coverage / Increment coverage report
PR approved by at least one committer and no changes requested.
dataroaring
left a comment
LGTM
…52582) ### What problem does this PR solve? Currently, `DeleteBitmap`'s incorrect move assignment operator can cause a correctness problem on a MoW table if SC's alter process produces delete bitmaps when calculating delete bitmaps for incremental rowsets, after the modification to `CloudSchemaChangeJob::_process_delete_bitmap` in #52440, because `origin_dbm` and `delete_bitmap` refer to the same `DeleteBitmap` object.
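To see why the aliasing matters here, consider a minimal, hypothetical C++ sketch (a toy stand-in, not the actual Doris `DeleteBitmap` implementation) of a move assignment operator that does not guard against self-move:

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <set>

// Hypothetical stand-in for a delete bitmap: version -> deleted row ids.
struct ToyDeleteBitmap {
    std::map<int64_t, std::set<uint32_t>> delete_bitmap;

    ToyDeleteBitmap() = default;

    // Buggy move assignment: clears the target before moving from the source.
    // If `other` and `*this` are the same object, the data is lost.
    ToyDeleteBitmap& operator=(ToyDeleteBitmap&& other) {
        delete_bitmap.clear();                            // wipes other's data on self-move
        delete_bitmap = std::move(other.delete_bitmap);   // moves an already-empty map
        return *this;
    }
};

int main() {
    ToyDeleteBitmap dbm;
    dbm.delete_bitmap[2].insert(7);      // pretend version 2 marks row 7 as deleted

    ToyDeleteBitmap& origin_dbm = dbm;   // two names, one object, as described above
    dbm = std::move(origin_dbm);         // self-move through the buggy operator

    std::cout << "entries after self-move: " << dbm.delete_bitmap.size() << "\n";  // prints 0
    return 0;
}
```

A self-assignment check (`if (this != &other)`) or a move-and-swap implementation avoids the loss when both names alias the same object.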
### What problem does this PR solve? Fix: 1. cloud_decommission: increase its robustness. 2. test_cloud_full_compaction_multi_segments, test_cloud_mow_retry_txn_interleave, test_cloud_concurrent_calc_dbm_task: fix their missing FE debug point setup. 3. Fix the CloudTablet.capture_rs_readers.return.e-230 debug point not working due to #52440.
…kends except for warmup jobs (#54131) ### What problem does this PR solve? Problem Summary: Fix the logic conflict between #52514 and #52440. ### Release note None
It is introduced by apache#40716 while it is dismissed by apache#52440.
Fix: 1. cloud_decommission: increase its robustness. 2. test_cloud_full_compaction_multi_segments, test_cloud_mow_retry_txn_interleave, test_cloud_concurrent_calc_dbm_task: fix their missing FE debug point setup. 3. Fix the CloudTablet.capture_rs_readers.return.e-230 debug point not working due to apache#52440. Picked from apache#54227.
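Item 3 above refers to an injected test failure that should force the rowset capture path to return E-230. As a rough illustration only (the registry, function names, and wiring below are hypothetical placeholders, not the actual Doris debug point framework), the sketch shows how such an injection is typically hooked into a code path, and why it silently stops firing when a refactor routes captures through a different entry point:

```cpp
#include <iostream>
#include <set>
#include <string>

// Hypothetical debug-point registry; real systems usually toggle these at runtime.
std::set<std::string> g_enabled_debug_points;

bool debug_point_enabled(const std::string& name) {
    return g_enabled_debug_points.count(name) > 0;
}

// Old capture path: the injection point lives here.
int capture_rs_readers_old() {
    if (debug_point_enabled("CloudTablet.capture_rs_readers.return.e-230")) {
        return -230;   // injected failure for tests
    }
    return 0;          // normal success
}

// After a refactor that routes captures through a new entry point, the old
// injection is never reached unless the debug point is re-attached there too.
int capture_rs_readers_new() {
    return 0;          // debug point missing: tests expecting -230 no longer see it
}

int main() {
    g_enabled_debug_points.insert("CloudTablet.capture_rs_readers.return.e-230");
    std::cout << "old path: " << capture_rs_readers_old() << "\n";  // -230, as the test expects
    std::cout << "new path: " << capture_rs_readers_new() << "\n";  // 0, the injection is bypassed
    return 0;
}
```

The fix in such a case is to re-attach the debug point to the new capture path so the test can again observe the injected -230.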
… rowsets (#52995) Related PR: #52440

In read-write splitting scenarios, some BE (Backend) nodes may have already merged certain rowset versions, while another BE still attempts to capture or access those rowsets. When this happens, the BE reports error E-230 (versions already merged), causing data access or synchronization to fail. This PR introduces a remote rowset fetching mechanism, allowing a BE that lacks the required rowset to fetch it from other BE nodes instead of failing with E-230.

- Added a remote fetch mechanism in the rowset management layer: when a BE detects that a rowset is missing locally but has already been merged, it will try to fetch the rowset from other BE nodes.
- Updated version and state checking logic to correctly identify the "merged but missing" condition.
- Adjusted the rowset access path to trigger remote fetch rather than throwing an immediate error.
- Added tests (unit/integration) to cover the new logic where applicable.
- Ensured backward compatibility: if the BE already has the rowset locally or read-write splitting is not enabled, the behavior remains unchanged.

### Release note

Introduce a remote rowset fetching mechanism to prevent E-230 ("versions already merged") errors in read-write splitting scenarios. This improves BE fault tolerance when some nodes have merged versions that others have not yet synchronized.
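To make the described fallback concrete, here is a minimal, self-contained C++ sketch. All types, function names, and the stubbed local/remote lookups are assumptions for illustration; they are not the actual Doris rowset management APIs.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Illustrative placeholders, not Doris types.
struct Rowset { int64_t start_version; int64_t end_version; };
using RowsetSharedPtr = std::shared_ptr<Rowset>;

struct Status {
    int code = 0;          // 0 = OK; -230 = requested versions already merged locally
    std::string msg;
    bool ok() const { return code == 0; }
};

// Stub: simulate a BE whose compaction already merged the requested versions away.
Status capture_local(int64_t start, int64_t end, std::vector<RowsetSharedPtr>* out) {
    (void)start; (void)end; (void)out;
    return {-230, "versions already merged"};
}

// Stub: simulate fetching the stale rowsets from a peer BE that still holds them.
Status fetch_from_peer(int64_t start, int64_t end, std::vector<RowsetSharedPtr>* out) {
    out->push_back(std::make_shared<Rowset>(Rowset{start, end}));
    return {};
}

// Sketch of the fallback described above: on E-230, retry the capture through a
// remote fetch from another BE instead of surfacing the error to the reader.
Status capture_rowsets(int64_t start, int64_t end, std::vector<RowsetSharedPtr>* out) {
    Status st = capture_local(start, end, out);
    if (st.ok()) return st;              // fast path: rowsets still present locally
    if (st.code == -230) {
        // Versions were merged on this BE but may survive on a peer that has not
        // synchronized yet; try the remote path before failing.
        return fetch_from_peer(start, end, out);
    }
    return st;                           // other errors propagate unchanged
}

int main() {
    std::vector<RowsetSharedPtr> rowsets;
    Status st = capture_rowsets(0, 5, &rowsets);
    if (st.ok()) {
        std::cout << "captured " << rowsets.size() << " rowset(s) via remote fetch\n";
    } else {
        std::cout << "failed: " << st.msg << "\n";
    }
    return 0;
}
```

The key design point is that E-230 is treated as a recoverable condition: only that specific error triggers the remote path, while every other error still propagates unchanged, so behavior is unaffected when the rowsets are available locally.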
What problem does this PR solve?
Problem Summary:
Make stale rowsets accessible across BEs to avoid E-230 (versions already merged) in the Read-Write Splitting scenario.
Release note
None