Cannot query some parquet files in S3, but they work locally #3633

@andygrove

Describe the bug
I am trying to query parquet files in S3 from the CLI. Some work, and some do not.

To Reproduce

DataFusion CLI v12.0.0
❯ create external table test stored as parquet location 's3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet';
ObjectStore(Generic { store: "S3", source: MissingLastModified })

However, if I download the file locally it works.

$ aws s3 cp "s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet" /tmp/yellow_tripdata_2022-06.parquet
download: s3://nyc-tlc/trip data/yellow_tripdata_2022-06.parquet to ../../../../../../tmp/yellow_tripdata_2022-06.parquet
DataFusion CLI v12.0.0
❯ create external table test stored as parquet location '/tmp/yellow_tripdata_2022-06.parquet';
0 rows in set. Query took 0.006 seconds.
❯ select * from test limit 10;
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
| 1        | 2022-06-01 00:25:41  | 2022-06-01 00:48:22   | 1               | 11            | 1          | N                  | 70           | 48           | 1            | 32          | 3     | 0.5     | 2          | 6.55         | 0.3                   | 44.35        | 2.5                  | 0           |
| 1        | 2022-06-01 00:44:40  | 2022-06-01 01:01:48   | 1               | 4.2           | 1          | N                  | 170          | 226          | 1            | 14          | 3     | 0.5     | 0          | 0            | 0.3                   | 17.8         | 2.5                  | 0           |
| 2        | 2022-06-01 00:23:07  | 2022-06-01 00:39:50   | 1               | 9.49          | 1          | N                  | 264          | 113          | 1            | 26          | 0.5   | 0.5     | 5          | 6.55         | 0.3                   | 42.6         | 2.5                  | 1.25        |
| 1        | 2022-06-01 00:25:53  | 2022-06-01 00:57:06   | 2               | 12.1          | 1          | N                  | 132          | 17           | 2            | 37          | 1.75  | 0.5     | 0          | 0            | 0.3                   | 39.55        | 0                    | 1.25        |
| 1        | 2022-06-01 00:23:58  | 2022-06-01 00:33:43   | 0               | 1.8           | 1          | N                  | 140          | 163          | 1            | 9           | 3     | 0.5     | 2.55       | 0            | 0.3                   | 15.35        | 2.5                  | 0           |
| 2        | 2022-06-01 00:01:27  | 2022-06-01 00:10:53   | 1               | 2.02          | 1          | N                  | 148          | 158          | 1            | 9           | 0.5   | 0.5     | 0.64       | 0            | 0.3                   | 13.44        | 2.5                  | 0           |
| 2        | 2022-06-01 00:16:25  | 2022-06-01 00:40:45   | 1               | 8.08          | 1          | N                  | 158          | 116          | 1            | 26.5        | 0.5   | 0.5     | 7.58       | 0            | 0.3                   | 37.88        | 2.5                  | 0           |
| 1        | 2022-06-01 00:11:08  | 2022-06-01 00:27:02   | 1               | 4.3           | 1          | N                  | 246          | 262          | 1            | 15          | 3     | 0.5     | 3.75       | 0            | 0.3                   | 22.55        | 2.5                  | 0           |
| 2        | 2022-06-01 00:21:42  | 2022-06-01 00:42:01   | 1               | 8.78          | 1          | N                  | 197          | 191          | 1            | 26.5        | 0.5   | 0.5     | 5.56       | 0            | 0.3                   | 33.36        | 0                    | 0           |
| 2        | 2022-06-01 00:23:05  | 2022-06-01 00:30:45   | 1               | 1.76          | 1          | N                  | 48           | 186          | 1            | 7.5         | 0.5   | 0.5     | 2.26       | 0            | 0.3                   | 13.56        | 2.5                  | 0           |
+----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
10 rows in set. Query took 1.792 seconds.

Expected behavior
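Creating and querying the external table should work for the S3 location, just as it does for the local copy of the same file.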

Additional context
None


Labels: bug
