Open
Labels
bug: Something isn't working
Description
Describe the bug
There is a bug when reading from partitioned tables whose partition column names contain quotes.
Here is the test:
https://github.com/apache/arrow-datafusion/blob/b2a04519da97c2ff81789ef41dd652870794a73a/datafusion/sqllogictest/test_files/copy.slt#L109
To Reproduce
Run this script
-- create a table with quotes in the column names
create table test ("'test'" varchar, "'test2'" varchar, "'test3'" varchar);
insert into test VALUES ('a', 'x', 'aa'), ('b','y', 'bb'), ('c', 'z', 'cc');
copy test to '/tmp/escape_quote' (format csv, partition_by '''test2'',''test3''');
-- read back from the table
CREATE EXTERNAL TABLE validate_partitioned_escape_quote STORED AS CSV
LOCATION '/tmp/escape_quote/' PARTITIONED BY ("'test2'", "'test3'");
-- This panics
select * from validate_partitioned_escape_quote;
Here is an example:
❯ -- create a table with quotes in the column names
create table test ("'test'" varchar, "'test2'" varchar, "'test3'" varchar);
insert into test VALUES ('a', 'x', 'aa'), ('b','y', 'bb'), ('c', 'z', 'cc');
copy test to '/tmp/escape_quote' (format csv, partition_by '''test2'',''test3''');
0 rows in set. Query took 0.008 seconds.
+-------+
| count |
+-------+
| 3 |
+-------+
1 row in set. Query took 0.009 seconds.
+-------+
| count |
+-------+
| 3 |
+-------+
1 row in set. Query took 0.029 seconds.
❯ -- read back from the table
CREATE EXTERNAL TABLE validate_partitioned_escape_quote STORED AS CSV
LOCATION '/tmp/escape_quote/' PARTITIONED BY ("'test2'", "'test3'");
0 rows in set. Query took 0.004 seconds.
❯ -- This panics
select * from validate_partitioned_escape_quote;
thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/src/datasource/physical_plan/file_scan_config.rs:248:54:
index out of bounds: the len is 0 but the index is 0
stack backtrace:
0: rust_begin_unwind
at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
2: core::panicking::panic_bounds_check
at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:208:5
3: datafusion::datasource::physical_plan::file_scan_config::PartitionColumnProjector::project
4: <datafusion::datasource::physical_plan::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
5: datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
6: tokio::runtime::task::core::Core<T,S>::poll
7: tokio::runtime::task::harness::Harness<T,S>::poll
8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
9: tokio::runtime::scheduler::multi_thread::worker::Context::run
10: tokio::runtime::context::runtime::enter_runtime
11: tokio::runtime::scheduler::multi_thread::worker::run
12: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
13: tokio::runtime::task::core::Core<T,S>::poll
14: tokio::runtime::task::harness::Harness<T,S>::poll
15: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Expected behavior
The query should return the table's rows instead of panicking. Note that the data is written correctly:
andrewlamb@Andrews-MacBook-Pro:~/Software/influxdb_iox$ find /tmp/escape_quote
/tmp/escape_quote
/tmp/escape_quote/'test2'=x
/tmp/escape_quote/'test2'=x/'test3'=aa
/tmp/escape_quote/'test2'=x/'test3'=aa/3zMw255TXFQxId14.csv
/tmp/escape_quote/'test2'=y
/tmp/escape_quote/'test2'=y/'test3'=bb
/tmp/escape_quote/'test2'=y/'test3'=bb/3zMw255TXFQxId14.csv
/tmp/escape_quote/'test2'=z
/tmp/escape_quote/'test2'=z/'test3'=cc
/tmp/escape_quote/'test2'=z/'test3'=cc/3zMw255TXFQxId14.csv
andrewlamb@Andrews-MacBook-Pro:~/Software/influxdb_iox$ cat /tmp/escape_quote/\'test2\'\=x/\'test3\'\=aa/3zMw255TXFQxId14.csv
'test'
a
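The root cause is not established in this report. As an illustration only, the sketch below shows one way a name mismatch could leave the list of partition values empty, and how checked access avoids the out-of-bounds panic. `parse_partition_values` is a hypothetical helper written for this example, not DataFusion's implementation; it assumes the hive-style `key=value` directory layout shown in the `find` output above.

```rust
/// Hypothetical helper (NOT DataFusion's API): pull partition values out of a
/// hive-style path, in the order given by `partition_cols`.
/// Returns `None` if any expected column name is absent from the path.
fn parse_partition_values<'a>(path: &'a str, partition_cols: &[&str]) -> Option<Vec<&'a str>> {
    // Collect every `key=value` directory segment of the path.
    let pairs: Vec<(&str, &str)> = path
        .split('/')
        .filter_map(|segment| segment.split_once('='))
        .collect();

    // Look up each expected column by exact name; any miss yields `None`.
    partition_cols
        .iter()
        .map(|col| pairs.iter().find(|(key, _)| key == col).map(|(_, value)| *value))
        .collect()
}

fn main() {
    let path = "/tmp/escape_quote/'test2'=x/'test3'=aa/3zMw255TXFQxId14.csv";

    // Names that keep their single quotes match the directory names.
    let quoted = parse_partition_values(path, &["'test2'", "'test3'"]);
    assert_eq!(quoted, Some(vec!["x", "aa"]));

    // Names with the quotes stripped (an assumed failure mode) do not match.
    let unquoted = parse_partition_values(path, &["test2", "test3"]);
    assert_eq!(unquoted, None);

    // Falling back to an empty list and then indexing it (`values[0]`) would
    // panic with "index out of bounds: the len is 0 but the index is 0".
    // Checked access returns `None` instead.
    let values: Vec<&str> = unquoted.unwrap_or_default();
    assert!(values.first().is_none());
}
```

Returning `Option` (or an error) from the lookup lets the caller report a mismatch between the declared partition columns and the directory names, rather than failing later on an index into an empty list.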
Additional context
@devinjdangelo found this in #9240