[Segment Replication] Delete integration test fix #4117
Poojita-Raj wants to merge 2 commits into opensearch-project:main
Signed-off-by: Poojita Raj <poojiraj@amazon.com>
 * Returns a diff between the two snapshots that can be used for getting list of files to copy over to a replica for segment replication. The given snapshot is treated as the
 * target and this snapshot as the source.
 */
public RecoveryDiff getFilesRecoveryDiff(MetadataSnapshot recoveryTargetSnapshot) {
Nit: this is intended as a diff for segment replication, so how about naming it segmentReplicationDiff?
final Map<String, List<StoreFileMetadata>> perSegment = (Map<String, List<StoreFileMetadata>>) groupedFiles[0];
final List<StoreFileMetadata> perCommitStoreFiles = (List<StoreFileMetadata>) groupedFiles[1];
final ArrayList<StoreFileMetadata> identicalFiles = new ArrayList<>();
for (List<StoreFileMetadata> segmentFiles : Iterables.concat(perSegment.values(), Collections.singleton(perCommitStoreFiles))) {
Both of these methods concatenate the groups and loop through the resulting iterable. What about making getGroupedFiles return an Iterable<List<StoreFileMetadata>>? Then you would not need to fetch the groups by index on the previous lines either.
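A minimal sketch of what this suggestion could look like, not the code in this PR. It uses plain file names in place of StoreFileMetadata, and the grouping rules (`isPerCommit`, `segmentOf`) are hypothetical simplifications of how files are assigned to a segment or to the commit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupedFilesSketch {

    // Assumption for illustration only: per-commit files are the ones named "segments*".
    static boolean isPerCommit(String fileName) {
        return fileName.startsWith("segments");
    }

    // Assumption for illustration only: the segment id is the part before the first dot.
    static String segmentOf(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    /**
     * Returns one list per segment followed by a single list of per-commit files,
     * so callers can iterate the groups directly instead of casting Object[] entries.
     */
    static Iterable<List<String>> groupedFiles(Collection<String> fileNames) {
        Map<String, List<String>> perSegment = new LinkedHashMap<>();
        List<String> perCommit = new ArrayList<>();
        for (String name : fileNames) {
            if (isPerCommit(name)) {
                perCommit.add(name);
            } else {
                perSegment.computeIfAbsent(segmentOf(name), k -> new ArrayList<>()).add(name);
            }
        }
        List<List<String>> groups = new ArrayList<>(perSegment.values());
        groups.add(perCommit); // per-commit files form the last group
        return groups;
    }
}
```

With a typed return value, both diff methods could loop with `for (List<String> group : groupedFiles(names))` and drop the unchecked casts.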
 * @see MetadataSnapshot#recoveryDiff(MetadataSnapshot)
 * @see MetadataSnapshot#getFilesRecoveryDiff(MetadataSnapshot)
 */
public Object[] getGroupedFiles() {
 * Returns a diff between the two snapshots that can be used for getting list of files to copy over to a replica for segment replication. The given snapshot is treated as the
 * target and this snapshot as the source.
 */
public RecoveryDiff getFilesRecoveryDiff(MetadataSnapshot recoveryTargetSnapshot) {
We'll need unit tests on this method. I think we also need more context here on why the existing recoveryDiff method does not work, mainly that the stronger checksum check based on the segment generation results in frequently copying identical files.
Yes, although the reason is different. It is not because of the checksum: recoveryDiff marks all identical files as different, even when they are not, if any files are missing between the snapshots.
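To illustrate the behavior being asked for, here is a rough sketch of a diff that classifies each file on its own, so a missing file does not change the status of identical ones. It is an assumption-laden simplification, not the method in this PR: snapshots are modeled as hypothetical maps from file name to checksum string, and the `Diff` class is an illustrative stand-in for RecoveryDiff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FileLevelDiffSketch {

    /** Illustrative result buckets: identical, different, and missing file names. */
    static final class Diff {
        final List<String> identical = new ArrayList<>();
        final List<String> different = new ArrayList<>();
        final List<String> missing = new ArrayList<>();
    }

    /**
     * Compares every source file against the target independently.
     * A file missing from the target only affects its own classification.
     */
    static Diff fileLevelDiff(Map<String, String> source, Map<String, String> target) {
        Diff diff = new Diff();
        for (Map.Entry<String, String> file : source.entrySet()) {
            String targetChecksum = target.get(file.getKey());
            if (targetChecksum == null) {
                diff.missing.add(file.getKey());   // absent on the target
            } else if (targetChecksum.equals(file.getValue())) {
                diff.identical.add(file.getKey()); // same name and checksum
            } else {
                diff.different.add(file.getKey()); // same name, different content
            }
        }
        return diff;
    }
}
```

In this sketch, a replica that lacks one file of a segment would be asked to copy only that file plus any whose checksums differ, while its other files stay in the identical bucket.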
Description
The delete operation integration test for segment replication was failing because of how recoveryDiff (the difference between two snapshots) was calculated. The getFiles method uses that diff to determine which files are needed, as specified by the checkpoints.
Issues Resolved
Resolves #3787
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.