[C++] S3FileSystem: deleting directory with double slash in name crashes #38821
Description
Found while investigating #38618 (so might be caused by the same change in 14.0, possibly related to #35440).
Assuming MinIO is running with the same setup as in our tests, the following script crashes:
import os
from pyarrow.fs import S3FileSystem
host, port, access_key, secret_key = ('localhost', 54383, 'arrow', 'apachearrow')
s3_bucket = 'pyarrow-filesystem/'
s3fs = S3FileSystem(
access_key=access_key,
secret_key=secret_key,
endpoint_override='{}:{}'.format(host, port),
scheme='http',
allow_bucket_creation=True,
allow_bucket_deletion=True
)
s3fs.create_dir(s3_bucket)
test_dir = "test_dir"
s3fs.create_dir(s3_bucket + "/" + test_dir)
print(s3fs.get_file_info(s3_bucket + "/" + test_dir))
print(s3fs.get_file_info(s3_bucket + test_dir))
s3fs.delete_dir(s3_bucket + "/" + test_dir)
Output:
<FileInfo for 'pyarrow-filesystem//test_dir': type=FileType.Directory>
<FileInfo for 'pyarrow-filesystem/test_dir': type=FileType.Directory>
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 19) > this->size() (which is 18)
Aborted (core dumped)
What happened here is that I accidentally constructed a directory path with a double slash by using s3_bucket + "/" + test_dir while s3_bucket already included a trailing /. This was not intentional and, once discovered, was easy to fix, but we should still not crash for something like that.
For S3 (at least with MinIO), creating the directory is allowed (the double slash is ignored and a directory named "test_dir" is created), and so is getting the file info (with either a single or a double slash, it returns the info about the same directory). Deleting the directory using the name with the double slash, however, aborts the process with an uncaught std::out_of_range.
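As a caller-side workaround, the path can be built so that it never contains an empty segment. The sketch below is only an illustration: join_s3_path is a hypothetical helper, not part of pyarrow, and it assumes that empty path segments are never wanted.

```python
def join_s3_path(*parts: str) -> str:
    """Join path parts with single slashes, dropping empty segments.

    Hypothetical helper (not a pyarrow API). It assumes empty segments,
    such as the one produced by "bucket/" + "/" + "dir", are never intended.
    """
    segments = []
    for part in parts:
        # Splitting on "/" and skipping empty pieces removes leading,
        # trailing, and doubled slashes in each part.
        segments.extend(seg for seg in part.split("/") if seg)
    return "/".join(segments)


s3_bucket = "pyarrow-filesystem/"
test_dir = "test_dir"

# The concatenation from the script above yields a double slash ...
naive_path = s3_bucket + "/" + test_dir
# ... while the helper yields a path with single slashes only.
safe_path = join_s3_path(s3_bucket, test_dir)
```

With such a helper, the script would call s3fs.delete_dir(safe_path) instead of passing the concatenated string. This only avoids triggering the crash; it does not fix the underlying bug.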
GDB backtrace:
#0 0x00007ffff7c7800b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7c57859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff3822026 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007ffff3820514 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x00007ffff3820566 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x00007ffff3820758 in __cxxabiv1::__cxa_throw (obj=0x7fffa00065d0, tinfo=0x7ffff39155e8 <typeinfo for std::out_of_range>, dest=0x7ffff382cd34 <std::out_of_range::~out_of_range()>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#6 0x00007ffff383bbea in std::__throw_out_of_range_fmt (__fmt=__fmt@entry=0x7ffff4eba6c0 "%s: __pos (which is %zu) > this->size() (which is %zu)") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:101
#7 0x00007ffff4cedfee in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_check (__s=0x7ffff4eba770 "basic_string::substr", __pos=<optimized out>, this=0x7fffa000a088)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/basic_string.h:321
#8 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::substr (__n=18446744073709551615, __pos=<optimized out>, this=0x7fffa000a088)
at /home/joris/miniconda3/envs/arrow-dev/x86_64-conda-linux-gnu/include/c++/10.4.0/bits/basic_string.h:2848
#9 arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1}::operator()(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&) const (
__closure=__closure@entry=0x7fffa0004bc8, file_infos=...) at /home/joris/scipy/repos/arrow/cpp/src/arrow/filesystem/s3fs.cc:2411
#10 0x00007ffff4d07ff7 in arrow::LoopBody::Callback::operator() (next=..., this=0x7fffa0004bc8) at /home/joris/scipy/repos/arrow/cpp/src/arrow/util/async_generator.h:93
#11 arrow::detail::ContinueFuture::operator()<arrow::VisitAsyncGenerator<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> >, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1}>(std::function<arrow::Future<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > > ()>, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1})::LoopBody::Callback, std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&, arrow::Result<std::optional<arrow::internal::Empty> >, arrow::Future<std::optional<arrow::internal::Empty> > >(arrow::Future<std::optional<arrow::internal::Empty> >, arrow::VisitAsyncGenerator<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> >, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, 
arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1}>(std::function<arrow::Future<std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > > ()>, arrow::fs::S3FileSystem::Impl::DoDeleteDirContentsAsync(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*)#1}::operator()(arrow::util::AsyncTaskScheduler*, arrow::fs::S3FileSystem::Impl*) const::{lambda()#1}::operator()() const::{lambda(std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&)#1})::LoopBody::Callback&&, std::vector<arrow::fs::FileInfo, std::allocator<arrow::fs::FileInfo> > const&) const (this=<optimized out>, f=..., next=...)
at /home/joris/scipy/repos/arrow/cpp/src/arrow/util/future.h:150
It seems it is crashing while trying to format the dir name for an exception that is raised from DoDeleteDirContentsAsync.
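To make the error message easier to read, here is a small Python sketch of the bounds check that std::string::substr performs: it throws std::out_of_range when the start position is greater than the string size. The checked_substr function is a hypothetical stand-in written for this illustration. The only facts used are that the bucket name 'pyarrow-filesystem' is 18 characters long and that the message reports a position of 19 and a size of 18. That the offset is derived from the bucket length is an inference from those numbers, not something confirmed from the source.

```python
def checked_substr(s: str, pos: int) -> str:
    """Mimic the bounds check of std::string::substr(pos).

    Hypothetical stand-in for illustration only: C++ throws
    std::out_of_range when pos > size(); here we raise IndexError.
    """
    if pos > len(s):
        raise IndexError(
            f"basic_string::substr: __pos (which is {pos}) "
            f"> this->size() (which is {len(s)})"
        )
    return s[pos:]


bucket = "pyarrow-filesystem"  # 18 characters, matching the reported size

# Stripping "<bucket>/" from a well-formed path works as expected.
relative = checked_substr(bucket + "/test_dir", len(bucket) + 1)

# Applying the same offset to a string no longer than the bucket name
# reproduces the numbers seen in the crash message.
try:
    checked_substr(bucket, len(bucket) + 1)
    message = None
except IndexError as exc:
    message = str(exc)
```

In the C++ code the exception is not caught, so the process terminates instead of reporting an error status.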
This only crashes with pyarrow 14.0. Older versions "work": they ignore the double slash and actually delete the directory, although one could argue that a proper error should be raised instead.