ARROW-16204: [C++][Dataset] Default error existing_data_behaviour for writing dataset ignores a single file #12898

jorisvandenbossche · 2022-04-15T09:24:26Z

No description provided.

github-actions · 2022-04-15T09:24:55Z

https://issues.apache.org/jira/browse/ARROW-16204

github-actions · 2022-04-15T09:24:56Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

jorisvandenbossche · 2022-04-15T13:51:17Z

docs/source/python/dataset.rst

        pa.schema([("c", pa.int16())]), flavor="hive"
    )
-    ds.write_dataset(table, "sample_dataset", format="parquet", partitioning=part)
+    ds.write_dataset(table, "partitioned_dataset", format="parquet", partitioning=part)


We already wrote to "sample_dataset" a bit above, so reusing the same name here now (correctly) raises an error.

jorisvandenbossche · 2022-04-22T07:27:56Z

@westonpace I think this is a trivial bug fix, but it would be good to get a sanity check.

westonpace

Good catch. Thank you. I think I was allowing the base dir to exist, but the docs are pretty clear that the base dir is not returned by GetFileInfo(selector).
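To illustrate the point about the selector listing, here is a minimal sketch in plain Python. It is not the Arrow implementation: `list_under`, `has_existing_data_buggy`, and `has_existing_data_fixed` are hypothetical names, and the "more than one entry" comparison is only an assumption about what "allowing the base dir to exist" could have looked like. It shows how a listing that excludes the base directory makes such a check overlook a single existing file.

```python
import os
import tempfile


def list_under(base_dir):
    """Recursively list everything below base_dir, excluding base_dir itself.

    Hypothetical stand-in for a recursive selector listing.
    """
    entries = []
    for root, dirs, files in os.walk(base_dir):
        for name in dirs + files:
            entries.append(os.path.join(root, name))
    return entries


def has_existing_data_buggy(base_dir):
    # Assumed bug: one entry is tolerated on the belief that it is base_dir,
    # so a directory holding exactly one file is treated as empty.
    return len(list_under(base_dir)) > 1


def has_existing_data_fixed(base_dir):
    # base_dir is never part of the listing, so any entry is existing data.
    return len(list_under(base_dir)) > 0


with tempfile.TemporaryDirectory() as base_dir:
    # Empty directory: both checks agree there is nothing to protect.
    empty_result = (
        has_existing_data_buggy(base_dir),
        has_existing_data_fixed(base_dir),
    )

    # One file: only the fixed check notices it.
    open(os.path.join(base_dir, "part-0.parquet"), "w").close()
    single_file_result = (
        has_existing_data_buggy(base_dir),
        has_existing_data_fixed(base_dir),
    )

    # Two files: both checks report existing data.
    open(os.path.join(base_dir, "part-1.parquet"), "w").close()
    two_files_result = (
        has_existing_data_buggy(base_dir),
        has_existing_data_fixed(base_dir),
    )
```

In this sketch only the single-file case separates the two checks, which matches the symptom in the PR title: the default "error" behaviour ignored a directory containing a single file.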

ursabot · 2022-04-25T23:21:07Z

Benchmark runs are scheduled for baseline = 16638a4 and contender = 912e2bb. 912e2bb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Failed ⬇️1.13% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.13% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 912e2bb3 ec2-t3-xlarge-us-east-2: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/588
[Failed] 912e2bb3 test-mac-arm: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/576
[Failed] 912e2bb3 ursa-i9-9960x: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/574
[Finished] 912e2bb3 ursa-thinkcentre-m75q: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/586
[Finished] 16638a44 ec2-t3-xlarge-us-east-2: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/587
[Failed] 16638a44 test-mac-arm: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/575
[Failed] 16638a44 ursa-i9-9960x: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/573
[Finished] 16638a44 ursa-thinkcentre-m75q: https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/585
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ARROW-16204: [C++][Dataset] Default error existing_data_behaviour for…

f5c5764

… writing dataset ignores a single file

jorisvandenbossche requested a review from westonpace April 15, 2022 09:24

github-actions bot added the Component: C++ label Apr 15, 2022

update docs

36af489

jorisvandenbossche commented Apr 15, 2022

westonpace approved these changes Apr 22, 2022

jorisvandenbossche closed this in 912e2bb Apr 22, 2022

jorisvandenbossche deleted the ARROW-16204 branch April 22, 2022 16:12
