Panic when reading from partitioned datasets with columns that have ' in them #9269

@alamb

Describe the bug

There is a bug when reading from partitioned tables whose partition column names contain single quotes (').

Here is the test
https://github.com/apache/arrow-datafusion/blob/b2a04519da97c2ff81789ef41dd652870794a73a/datafusion/sqllogictest/test_files/copy.slt#L109

To Reproduce

Run this script

-- create a table with quotes in the column names
create table test ("'test'" varchar, "'test2'" varchar, "'test3'" varchar);
insert into test VALUES ('a', 'x', 'aa'), ('b','y', 'bb'), ('c', 'z', 'cc');
copy test to '/tmp/escape_quote' (format csv, partition_by '''test2'',''test3''');

-- read back from the table
CREATE EXTERNAL TABLE validate_partitioned_escape_quote STORED AS CSV
LOCATION '/tmp/escape_quote/' PARTITIONED BY ("'test2'", "'test3'");

-- This panics
select * from validate_partitioned_escape_quote;

Here is the output from running the script:

-- create a table with quotes in the column names
create table test ("'test'" varchar, "'test2'" varchar, "'test3'" varchar);
insert into test VALUES ('a', 'x', 'aa'), ('b','y', 'bb'), ('c', 'z', 'cc');
copy test to '/tmp/escape_quote' (format csv, partition_by '''test2'',''test3''');

0 rows in set. Query took 0.008 seconds.

+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.009 seconds.

+-------+
| count |
+-------+
| 3     |
+-------+
1 row in set. Query took 0.029 seconds.

❯ -- read back from the table
CREATE EXTERNAL TABLE validate_partitioned_escape_quote STORED AS CSV
LOCATION '/tmp/escape_quote/' PARTITIONED BY ("'test2'", "'test3'");

0 rows in set. Query took 0.004 seconds.

❯ -- This panics
select * from validate_partitioned_escape_quote;

thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic_bounds_check
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:208:5
   3: datafusion::datasource::physical_plan::file_scan_config::PartitionColumnProjector::project
   4: <datafusion::datasource::physical_plan::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
   5: datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
   6: tokio::runtime::task::core::Core<T,S>::poll
   7: tokio::runtime::task::harness::Harness<T,S>::poll
   8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   9: tokio::runtime::scheduler::multi_thread::worker::Context::run
  10: tokio::runtime::context::runtime::enter_runtime
  11: tokio::runtime::scheduler::multi_thread::worker::run
  12: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  13: tokio::runtime::task::core::Core<T,S>::poll
  14: tokio::runtime::task::harness::Harness<T,S>::poll
  15: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
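
The message "the len is 0 but the index is 0" is what Rust prints when an empty slice is indexed directly. As a minimal sketch of that failure class (not the actual code in file_scan_config.rs; `partition_value` is a hypothetical helper), a bounds-checked lookup turns the panic into an error that can be reported to the user:

```rust
/// Hypothetical helper, not part of DataFusion's API: fetch the parsed
/// value for the partition column at position `idx`.
///
/// Writing `values[idx]` on an empty slice panics with
/// "index out of bounds: the len is 0 but the index is 0".
/// `slice::get` returns `None` instead, which is mapped to an error here.
fn partition_value(values: &[String], idx: usize) -> Result<&str, String> {
    values
        .get(idx)
        .map(|v| v.as_str())
        .ok_or_else(|| {
            format!(
                "partition column {idx} requested but only {} values were parsed",
                values.len()
            )
        })
}
```

With an empty `values` slice this returns `Err(..)` rather than aborting the worker thread; with `["x", "aa"]` and `idx = 1` it returns `Ok("aa")`.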

Expected behavior

The query should return the table's rows rather than panic.

Note that the data is written correctly:

andrewlamb@Andrews-MacBook-Pro:~/Software/influxdb_iox$ find /tmp/escape_quote
/tmp/escape_quote
/tmp/escape_quote/'test2'=x
/tmp/escape_quote/'test2'=x/'test3'=aa
/tmp/escape_quote/'test2'=x/'test3'=aa/3zMw255TXFQxId14.csv
/tmp/escape_quote/'test2'=y
/tmp/escape_quote/'test2'=y/'test3'=bb
/tmp/escape_quote/'test2'=y/'test3'=bb/3zMw255TXFQxId14.csv
/tmp/escape_quote/'test2'=z
/tmp/escape_quote/'test2'=z/'test3'=cc
/tmp/escape_quote/'test2'=z/'test3'=cc/3zMw255TXFQxId14.csv
andrewlamb@Andrews-MacBook-Pro:~/Software/influxdb_iox$ cat /tmp/escape_quote/\'test2\'\=x/\'test3\'\=aa/3zMw255TXFQxId14.csv
'test'
a
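
To illustrate how partition values can be recovered from a directory layout like the one above, here is a rough sketch of hive-style `name=value` path parsing. It is an assumption-laden illustration, not DataFusion's implementation, and `parse_partition_values` is a hypothetical name. It shows one way a reader could end up with no values: if the column names it looks for do not match the directory names exactly, quotes included. Whether that is the actual cause of this panic is not established here.

```rust
/// Hypothetical sketch, not DataFusion's implementation: extract the values
/// of `partition_cols` from hive-style `name=value` directory segments.
///
/// Returns `None` if any expected column name is missing from the path.
fn parse_partition_values<'a>(
    path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // Collect every `name=value` segment; segments without '=' (the root
    // directories and the file name) are skipped.
    let pairs: Vec<(&str, &str)> = path
        .split('/')
        .filter_map(|segment| segment.split_once('='))
        .collect();

    // Look up each expected column by exact name, in the requested order.
    partition_cols
        .iter()
        .map(|col| {
            pairs
                .iter()
                .find(|(name, _)| name == col)
                .map(|(_, value)| *value)
        })
        .collect()
}
```

For the path `/tmp/escape_quote/'test2'=x/'test3'=aa/3zMw255TXFQxId14.csv`, looking up `'test2'` and `'test3'` (with the quotes) yields `x` and `aa`, while looking up `test2` and `test3` (quotes stripped) yields `None`.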

Additional context

@devinjdangelo found this in #9240

Labels: bug
