GH-38794: [C++][S3] Handle conventional content-type for directories #40147

Merged
pitrou merged 4 commits into apache:main from
pitrou:gh38794-s3-directory-content-type
Mar 7, 2024

Member

pitrou commented Feb 19, 2024

Rationale for this change

Some AWS-related tools write and expect the content-type "application/x-directory" for directory-like entries.

This PR does two things:

  1. set the object's content-type to "application/x-directory" when the user explicitly creates a directory
  2. when a 0-sized object with content-type starting with "application/x-directory" is encountered, consider it a directory
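
The two points above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the actual Arrow C++ code: `ObjectHead`, `make_directory_marker` and `is_directory_marker` are hypothetical names, and `ObjectHead` stands in for whatever metadata an S3 HEAD request returns.

```python
from dataclasses import dataclass
from typing import Optional

# Conventional content-type named in this PR.
DIRECTORY_CONTENT_TYPE = "application/x-directory"


@dataclass
class ObjectHead:
    """Hypothetical stand-in for the metadata of one S3 object."""
    key: str
    size: int
    content_type: Optional[str] = None


def make_directory_marker(key: str) -> ObjectHead:
    """Point 1: an explicitly created directory is an empty object
    tagged with the conventional content-type."""
    return ObjectHead(key=key, size=0, content_type=DIRECTORY_CONTENT_TYPE)


def is_directory_marker(head: ObjectHead) -> bool:
    """Point 2: a 0-sized object whose content-type starts with
    "application/x-directory" is considered a directory."""
    if head.size != 0 or head.content_type is None:
        return False
    # Prefix match, so a value with trailing parameters still qualifies.
    return head.content_type.startswith(DIRECTORY_CONTENT_TYPE)
```

Because the check is a prefix match, a value such as "application/x-directory; charset=UTF-8" would also be accepted, while a non-empty object carrying the same content-type would not.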

Are these changes tested?

Unfortunately, this cannot be tested with MinIO, as it seems to ignore the content-type set on directories (as opposed to regular files).

Are there any user-facing changes?

Hopefully better compatibility with existing S3 filesystem hierarchies.

@github-actions

⚠️ GitHub issue #38794 has been automatically assigned in GitHub to PR creator.

Member Author

pitrou commented Feb 19, 2024

@yf-yang This might solve your issue, though obviously I'm not able to test.

github-actions bot added the "awaiting review" label Feb 19, 2024

Member Author

pitrou commented Feb 19, 2024

Actually, probably not, since the FileInfo result in #38794 (comment) will not be changed.

Edit: fixed.

Member

kou left a comment

+1 if it works.

github-actions bot added the "awaiting merge" label and removed the "awaiting review" label Feb 19, 2024
pitrou force-pushed the gh38794-s3-directory-content-type branch from 989c18a to 7fb554c February 20, 2024 10:14

Member Author

pitrou commented Feb 20, 2024

@github-actions crossbow submit -g wheel

@github-actions

This comment was marked as outdated.

Member Author

pitrou commented Feb 20, 2024

@yf-yang Once they are marked "passing", the Crossbow badges above will give you access to binary wheels of PyArrow with this PR's changes (for example: https://github.com/ursacomputing/crossbow/actions/runs/7971839531). Can you try installing the corresponding wheel for your configuration and check if it fixes the issue for you?

Member Author

pitrou commented Feb 27, 2024

Ping @yf-yang

pitrou added 4 commits March 7, 2024 10:58
…ories

Some AWS-related tools write and expect the content-type "application/x-directory" for directory-like entries.

Unfortunately, this cannot be tested for MinIO, as it apparently ignores the content-type set on directories (as opposed to files).
pitrou force-pushed the gh38794-s3-directory-content-type branch from 7fb554c to 70563aa March 7, 2024 10:08
pitrou marked this pull request as ready for review March 7, 2024 10:08

Member Author

pitrou commented Mar 7, 2024

@github-actions crossbow submit -g wheel -g python

github-actions bot commented Mar 7, 2024

Revision: 70563aa

Submitted crossbow builds: ursacomputing/crossbow @ actions-fb005e5ec7

Task Status
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest GitHub Actions
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest GitHub Actions
test-cuda-python GitHub Actions
test-debian-11-python-3-amd64 Azure
test-debian-11-python-3-i386 GitHub Actions
test-fedora-39-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 GitHub Actions
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

pitrou merged commit e38583c into apache:main Mar 7, 2024
pitrou removed the "awaiting merge" label Mar 7, 2024
pitrou deleted the gh38794-s3-directory-content-type branch March 7, 2024 11:41

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit e38583c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 18 possible false positives for unstable benchmarks that are known to sometimes produce them.

Development

Successfully merging this pull request may close these issues.

[Python] [AWS] Fail to open partitioned parquet with s3fs + pyarrow due to s3 prefix

3 participants