Skip to content

GH-38618: [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories#38845

Merged
jorisvandenbossche merged 7 commits intoapache:mainfrom
jorisvandenbossche:gh-38618
Dec 5, 2023
Merged

GH-38618: [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories#38845
jorisvandenbossche merged 7 commits intoapache:mainfrom
jorisvandenbossche:gh-38618

Conversation

@jorisvandenbossche
Copy link
Copy Markdown
Member

@jorisvandenbossche jorisvandenbossche commented Nov 22, 2023

Rationale for this change

See #38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing /, so for deleting explicitly created directories we need to add the kSep here as well to properly delete the object.

Are these changes tested?

I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup.

Are there any user-facing changes?

Fixes the regression

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #38618 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating and fixing this!

if (file_info.IsDirectory()) {
file_path = file_path + kSep;
}
file_paths.push_back(file_path);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that file_path lives in a variable, you need to explicitly std::move it to avoid a copy when pushing:

file_paths.push_back(std::move(file_path));

Copy link
Copy Markdown
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I would like to merge this to be added to 14.0.2

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Dec 4, 2023
@raulcd raulcd added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Dec 4, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Dec 5, 2023
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 5, 2023
@jorisvandenbossche jorisvandenbossche merged commit cf80bd1 into apache:main Dec 5, 2023
@jorisvandenbossche jorisvandenbossche removed the awaiting change review Awaiting change review label Dec 5, 2023
@jorisvandenbossche jorisvandenbossche deleted the gh-38618 branch December 5, 2023 17:23
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit cf80bd1.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

raulcd pushed a commit that referenced this pull request Dec 6, 2023
…reated sub-directories (#38845)

### Rationale for this change

See #38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing `/`, so for deleting explicitly created directories we need to add the `kSep` here as well to properly delete the object.

### Are these changes tested?

I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup.

### Are there any user-facing changes?

Fixes the regression
* Closes: #38618

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…itly created sub-directories (apache#38845)

### Rationale for this change

See apache#38618 (comment) and below for the analysis. When deleting the dir contents, we use a GetFileInfo with recursive FileSelector to list all objects to delete, but when doing that the file paths for directories don't end in a trailing `/`, so for deleting explicitly created directories we need to add the `kSep` here as well to properly delete the object.

### Are these changes tested?

I tested them manually with an actual S3 bucket. The problem is that MinIO doesn't have the same problem, and so it's not actually tested with the test I added using our MinIO testing setup.

### Are there any user-facing changes?

Fixes the regression
* Closes: apache#38618

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python][C++] S3FileSystem delete_dir() regression in PyArrow 14

4 participants