Skip to content

Conversation

@devinjdangelo
Copy link
Contributor

Which issue does this PR close?

Related to #8853 and #8851

Rationale for this change

A number of issues were recently brought up related to the parallel parquet writer. Most appear to be related to how nested types are being handled. It is not immediately obvious where the flaw is related to handling nested types.

What changes are included in this PR?

Disables parallel writing by default. Adds tests to copy.slt that cover the failure cases.

Are these changes tested?

Yes

Are there any user-facing changes?

Potentially fewer cores will be used by default when writing parquet files.

/// in each row group in each output file are serialized in parallel
/// leveraging a maximum possible core count of n_files*n_row_groups*n_columns.
pub allow_single_file_parallelism: bool, default = true
pub allow_single_file_parallelism: bool, default = false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As flagged in #8851 and #8853, setting this to true will result in the new copy.slt tests to fail.


query ??
COPY
(values (struct ('foo', (struct ('foo', make_array(1,2,3)))), make_array(timestamp '2023-01-01 01:00:01',timestamp '2023-01-01 01:00:01')),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test case covers:

  1. Struct types
  2. Array types
  3. Multiple levels of nesting (struct of struct with array value)
  4. Timestamp types

I attempted to add array of struct as specifically flagged in #8851, but was unable to construct the literal value due to #8867.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I left a comment there

Would it be possible to change the constant values of the two rows so it is clear they are different

e.g.

Suggested change
(values (struct ('foo', (struct ('foo', make_array(1,2,3)))), make_array(timestamp '2023-01-01 01:00:01',timestamp '2023-01-01 01:00:01')),
(values (struct ('foo', (struct ('foo', make_array(10,20,30)))), make_array(timestamp '2024-01-01 01:00:01',timestamp '2024-01-01 01:00:01')),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed up a change to make the rows distinct and add an array of struct as #8867 is fixed after pulling main.

@devinjdangelo devinjdangelo marked this pull request as ready for review January 15, 2024 01:03
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @devinjdangelo -- adding tests is great and very much appreciated.

I think we can merge this PR as is, but if we can improve the tests slightly before doing so that would be great.


query ??
COPY
(values (struct ('foo', (struct ('foo', make_array(1,2,3)))), make_array(timestamp '2023-01-01 01:00:01',timestamp '2023-01-01 01:00:01')),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I left a comment there

Would it be possible to change the constant values of the two rows so it is clear they are different

e.g.

Suggested change
(values (struct ('foo', (struct ('foo', make_array(1,2,3)))), make_array(timestamp '2023-01-01 01:00:01',timestamp '2023-01-01 01:00:01')),
(values (struct ('foo', (struct ('foo', make_array(10,20,30)))), make_array(timestamp '2024-01-01 01:00:01',timestamp '2024-01-01 01:00:01')),

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @devinjdangelo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants