
[C++] S3FileSystem: deleting directory with double slash in name crashes #38821

@jorisvandenbossche

Description


Found while investigating #38618 (so it might be caused by the same change in 14.0, possibly related to #35440).

Assuming MinIO is running as it is set up in our tests, the following script crashes:

import os
from pyarrow.fs import S3FileSystem

host, port, access_key, secret_key = ('localhost', 54383, 'arrow', 'apachearrow')

s3_bucket = 'pyarrow-filesystem/'

s3fs = S3FileSystem(
    access_key=access_key,
    secret_key=secret_key,
    endpoint_override='{}:{}'.format(host, port),
    scheme='http',
    allow_bucket_creation=True,
    allow_bucket_deletion=True
)

s3fs.create_dir(s3_bucket)

test_dir = "test_dir"
# s3_bucket already ends with "/", so this path is "pyarrow-filesystem//test_dir"
s3fs.create_dir(s3_bucket + "/" + test_dir)
print(s3fs.get_file_info(s3_bucket + "/" + test_dir))
print(s3fs.get_file_info(s3_bucket + test_dir))
s3fs.delete_dir(s3_bucket + "/" + test_dir)

output:

<FileInfo for 'pyarrow-filesystem//test_dir': type=FileType.Directory>
<FileInfo for 'pyarrow-filesystem/test_dir': type=FileType.Directory>
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 19) > this->size() (which is 18)
Aborted (core dumped)

So what happened here is that I accidentally constructed a directory path with a double slash using s3_bucket + "/" + test_dir, while s3_bucket already included a trailing /. This was not intentional, and it is easy to fix once discovered, but we should still not crash on something like that.
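On the user side, one way to avoid building such paths in the first place is to strip leading and trailing slashes from each component before joining. This is a minimal sketch; `join_s3_path` is a hypothetical helper, not a pyarrow API, and it only manipulates strings without touching S3.

```python
def join_s3_path(*parts: str) -> str:
    """Join path components with exactly one "/" between them.

    Hypothetical helper (not part of pyarrow): leading and trailing slashes
    are stripped from every component and empty components are dropped, so
    "pyarrow-filesystem/" + "test_dir" cannot produce a double slash.
    """
    cleaned = [p.strip("/") for p in parts]
    return "/".join(p for p in cleaned if p)


# The directory path from the reproducer, built without the double slash.
print(join_s3_path("pyarrow-filesystem/", "test_dir"))  # pyarrow-filesystem/test_dir
```

Note that this only normalizes the boundaries between components; a "//" already embedded inside a single component is left alone.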

For S3 (at least using MinIO), it seems we do allow creating the directory (the double slash is ignored and a directory named "test_dir" is created) and getting its file info (with either a single or a double slash, the info for the same directory is returned). But when trying to delete the directory using the name with the double slash, the process aborts with an uncaught std::out_of_range exception.
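If a path has already been built with a double slash, a possible stopgap until this is fixed is to normalize it before calling delete_dir. This is a hypothetical sketch: `normalize_s3_path` and `safe_delete_dir` are illustrative names, and collapsing repeated slashes is an assumption based on what MinIO appeared to do in this report, not documented S3 or pyarrow semantics (S3 object keys can legitimately contain "//").

```python
import re


def normalize_s3_path(path: str) -> str:
    """Collapse runs of "/" into a single "/" and drop any trailing "/".

    Hypothetical helper; mirrors the behaviour MinIO appeared to show for
    create_dir and get_file_info in this report (an assumption).
    """
    return re.sub(r"/{2,}", "/", path).rstrip("/")


def safe_delete_dir(filesystem, path: str) -> None:
    """Normalize `path`, then delegate to filesystem.delete_dir (workaround sketch)."""
    filesystem.delete_dir(normalize_s3_path(path))
```

With the filesystem from the reproducer, `safe_delete_dir(s3fs, s3_bucket + "/" + test_dir)` would call `delete_dir("pyarrow-filesystem/test_dir")`. It sidesteps the crash rather than fixing it in the C++ code.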

GDB backtrace:

#0  0x00007ffff7c7800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7c57859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff3822026 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff3820514 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007ffff3820566 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x00007ffff3820758 in __cxxabiv1::__cxa_throw (obj=0x7fffa00065d0, tinfo=0x7ffff39155e8 <typeinfo for std::out_of_range>, dest=0x7ffff382cd34 <std::out_of_range::~out_of_range()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#6  0x00007ffff383bbea in std::__throw_out_of_range_fmt (__fmt=__fmt@entry=0x7ffff4eba6c0 "%s: __pos (which is %zu) > this->size() (which is %zu)") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:101
#7  0x00007ffff4cedfee in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_check (__s=0x7ffff4eba770 "basic_string::substr", __pos=<optimized out>, this=0x7fffa000a088)
    at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/basic_string.h:321
#8  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::substr (__n=18446744073709551615, __pos=<optimized out>, this=0x7fffa000a088)
    at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/basic_string.h:2848
#9  arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1}::operator()(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&) const (
    __closure=__closure@entry=0x7fffa0004bc8, file_infos=...) at /home/joris/scipy/repos/arrow/cpp/src/arrow/filesystem/s3fs.cc:2411
#10 0x00007ffff4d07ff7 in arrow::LoopBody::Callback::operator() (next=..., this=0x7fffa0004bc8) at /home/joris/scipy/repos/arrow/cpp/src/arrow/util/async_generator.h:93
#11 arrow::detail::ContinueFuture::operator()<arrow::VisitAsyncGenerator<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> >, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1}>(std::function<arrow::Future<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > > ()>, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1})::LoopBody::Callback, std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&, arrow::Result<std::optional<arrow::internal::Empty> >, arrow::Future<std::optional<arrow::internal::Empty> > >(arrow::Future<std::optional<arrow::internal::Empty> >, arrow::VisitAsyncGenerator<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> >, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1}>(std::function<arrow::Future<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > > ()>, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1})::LoopBody::Callback&&, std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&) const (this=<optimized out>, f=..., next=...)
    at /home/joris/scipy/repos/arrow/cpp/src/arrow/util/future.h:150

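It seems to crash inside DoDeleteDirContentsAsync (s3fs.cc:2411), possibly while formatting the directory name for an error raised there: a std::string::substr call receives a start position (19) past the end of an 18-character string, and the resulting std::out_of_range is never caught, so std::terminate aborts the process.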
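The failing pattern can be modelled in Python to make the trace easier to read. `std::string::substr(pos)` throws `std::out_of_range` when `pos > size()`, whereas a Python slice silently returns an empty string, so the sketch adds that check explicitly. It assumes, without having confirmed it against s3fs.cc, that the code strips a `bucket + "/"` prefix from a listed path by slicing at `len(bucket) + 1`; 18 is exactly the length of "pyarrow-filesystem", which would be consistent with a path equal to the bare bucket name. `cpp_substr` and `relative_to_bucket` are hypothetical names.

```python
def cpp_substr(s: str, pos: int) -> str:
    """Mimic std::string::substr(pos): raise if pos > len(s), like out_of_range."""
    if pos > len(s):
        raise IndexError(f"basic_string::substr: __pos (which is {pos}) > "
                         f"this->size() (which is {len(s)})")
    return s[pos:]


def relative_to_bucket(path: str, bucket: str) -> str:
    """Guarded prefix strip: key below `bucket`, or "" for the bucket itself."""
    prefix = bucket + "/"
    if path in (bucket, prefix):
        return ""
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not inside bucket {bucket!r}")
    return path[len(prefix):]


bucket = "pyarrow-filesystem"
try:
    # Unguarded slice at len(bucket) + 1 on an 18-character string.
    cpp_substr(bucket, len(bucket) + 1)
except IndexError as exc:
    print(exc)  # basic_string::substr: __pos (which is 19) > this->size() (which is 18)
print(repr(relative_to_bucket(bucket, bucket)))  # ''
```

The guarded version treats the bucket itself as an empty relative key instead of slicing past the end of the string.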

This only crashes with pyarrow 14.0. Older versions "work": they ignore the double slash and actually delete the directory (although one could also argue they should raise a proper error instead).
