Conversation

@felipecrv
Contributor

@felipecrv felipecrv commented Feb 14, 2024

Rationale for this change

It had not been implemented yet.

What changes are included in this PR?

  • An implementation of DeleteFile() specialized for storage accounts that don't have HNS (hierarchical namespace) support enabled
  • This fixes a semantic issue: deleting a file should not delete the parent directory when the deleted file was the last one in it
  • Increased test coverage
  • A fix for a bug in the version that deletes files in HNS-enabled accounts (DeleteFile should not delete directories, even empty ones)

Are these changes tested?

Yes. Tests were rewritten and moved to TestAzureFileSystemOnAllScenarios.

@github-actions

⚠️ GitHub issue #40074 has been automatically assigned in GitHub to PR creator.

@felipecrv
Contributor Author

@av8or1

@felipecrv
Contributor Author

@Tom-Newton

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Feb 15, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 16, 2024
@felipecrv
Contributor Author

felipecrv commented Feb 16, 2024

@kou I added a commit that makes the error messages a bit more detailed and removes WithHierarchicalNamespace(), which was needed because of differences between the HNS and flat namespace implementations.

@felipecrv felipecrv requested a review from kou February 16, 2024 02:06
DCHECK(!location.path.empty());
constexpr auto kFileBlobLeaseTime = std::chrono::seconds{15};
auto no_trailing_slash_location = location.RemoveTrailingSlash(
/*preserve_original=*/true);
Member


Hmm. I think that preserve_original isn't a good approach. It will confuse us. Is it really needed?

Contributor Author


I want the all field to hold the path exactly as the user wrote it, so that error messages make sense.

Contributor Author

@felipecrv felipecrv Feb 16, 2024


What about RemoveTrailingSlashFromPath()?

UPDATE: I added a commit renaming it, and the docstring now explains why the all field is always preserved. I don't think there will be any use case for changing all.

Member


It seems that no_trailing_slash_location.all never reaches the user:

      ARROW_ASSIGN_OR_RAISE(auto file_info,
                            GetFileInfo(container_client, no_trailing_slash_location));

GetFileInfo() in the above code may return an error to the user. But GetFileInfo() doesn't use no_trailing_slash_location.all for an error message:

return ExceptionToStatus(
exception, "GetProperties for '", file_client.GetUrl(),
"' failed. GetFileInfo is unable to determine whether the path exists.");

It uses file_client.GetUrl() and it's based on no_trailing_slash_location.path not .all.

And it seems that FileInfo::path() of file_info returned by GetFileInfo() isn't used in this lambda. (file_info.path() uses no_trailing_slash_location.all but it's not used.)

Contributor Author


Test failures when I remove this:

[ RUN      ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtContainerRoot
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:942: Failure
Failed
'fs()->DeleteFile(data.ObjectPath() + "/")' did not fail with errno=ENOTDIR: IOError: Path does not exist 'w3ay9l35qoxg0i0lqbfababvwbrrrhxq/test-object-name/'. Detail: [errno 2] No such file or directory

[  FAILED  ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtContainerRoot, where TypeParam = arrow::fs::TestingScenario<arrow::fs::AzureFlatNSEnv,true> (3185 ms)
[ RUN      ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtSubdirectory
/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1000: Failure
Value of: _st.ToStringWithoutContextLines()
Expected: has substring "Not a directory: 'sco3n8q67r3cd2mas6lx53nlc36uw7f9/dir/file0/'"
  Actual: "IOError: Path does not exist 'sco3n8q67r3cd2mas6lx53nlc36uw7f9/dir/file0/'. Detail: [errno 2] No such file or directory"

/Users/felipe/code/arrow/cpp/src/arrow/filesystem/azurefs_test.cc:1000: Failure
Value of: _st.ToStringWithoutContextLines()
Expected: has substring "Not a directory: '3nw7fyp8whjb3r8e6g7eoyg2zav8nr8y/dir/file0/'"
  Actual: "IOError: Path does not exist '3nw7fyp8whjb3r8e6g7eoyg2zav8nr8y/dir/file0/'. Detail: [errno 2] No such file or directory"

[  FAILED  ] TestAzureFileSystemOnAllScenarios/2.DeleteFileAtSubdirectory, where TypeParam = arrow::fs::TestingScenario<arrow::fs::AzureFlatNSEnv,true> (12108 ms)

I'm going to implement this by copying location and removing the trailing slashes manually.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 16, 2024
...instead of taking a bool parameter.
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 16, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 16, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 19, 2024
@felipecrv felipecrv requested a review from kou February 20, 2024 18:42
Member

@kou kou left a comment


+1

/// \pre location.path is not empty.
Status DeleteFileOnContainer(const Blobs::BlobContainerClient& container_client,
const AzureLocation& location, bool require_file_to_exist,
const char* operation) {
Member


It seems that we don't need to receive operation as an argument. How about defining it as a local variable instead of an argument?

Contributor Author


I might use Move() with it, so I will keep it.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Feb 20, 2024
@felipecrv felipecrv merged commit a2d0729 into apache:main Feb 21, 2024
@felipecrv felipecrv removed the awaiting merge Awaiting merge label Feb 21, 2024
@felipecrv felipecrv deleted the azure_delete_blob branch February 21, 2024 01:04
@github-actions github-actions bot added the awaiting review Awaiting review label Feb 21, 2024
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit a2d0729.

There were 2 benchmark results with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.



Development

Successfully merging this pull request may close these issues.

[C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts

2 participants