Skip to content

Conversation

@kumarlokesh
Copy link
Contributor

@kumarlokesh kumarlokesh commented Apr 19, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • Removed explicit type casting from the CorrelationGroupsAccumulator.update_batch method

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 19, 2025
@kumarlokesh kumarlokesh force-pushed the avoid-explicit-cast-in-corr-aggregate-fn branch 2 times, most recently from 1933efe to d8a7641 Compare April 19, 2025 17:05
@github-actions
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Jun 20, 2025
@kumarlokesh
Copy link
Contributor Author

PR is active.

};

let physical_args =
// Create base physical expressions (without coercion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think typically coercsion is applied ealier in the planning proesses, not after physiscal expressions have been crated

Did you consider applying the coercsion along with the other coercion ?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @kumarlokesh 🙏

let array_x = downcast_array::<Float64Array>(array_x);
let array_y = &cast(&values[1], &DataType::Float64)?;
let array_y = downcast_array::<Float64Array>(array_y);
let array_x = downcast_array::<Float64Array>(&values[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like this code only handles Float64 but the signature of the function reports that it accepts any numeric type:

impl Correlation {
    /// Create a new COVAR_POP aggregate function
    pub fn new() -> Self {
        Self {
            signature: Signature::uniform(2, NUMERICS.to_vec(), Volatility::Immutable),
        }
    }
}

I wonder if you just changed the signature to say the function needs Float64 argument types, woudl that be enough?

DataFusion already has a bunch of coercion rules, see https://docs.rs/datafusion/latest/datafusion/logical_expr/type_coercion/index.html for example

@alamb
Copy link
Contributor

alamb commented Jun 22, 2025

I am sorry for the delayed review

@alamb
Copy link
Contributor

alamb commented Jun 23, 2025

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft June 23, 2025 20:15
@github-actions github-actions bot removed the Stale PR has not had any activity for some time label Jun 24, 2025
@github-actions github-actions bot removed logical-expr Logical plan and expressions core Core DataFusion crate labels Jul 5, 2025
@kumarlokesh kumarlokesh changed the title Perform type coercion for corr aggregate function during physical planning Perform type coercion for corr aggregate function Jul 5, 2025
@kumarlokesh kumarlokesh force-pushed the avoid-explicit-cast-in-corr-aggregate-fn branch from e8a9364 to 14e40a6 Compare July 5, 2025 05:26
@github-actions github-actions bot removed the physical-expr Changes to the physical-expr crates label Jul 5, 2025
@kumarlokesh kumarlokesh force-pushed the avoid-explicit-cast-in-corr-aggregate-fn branch from 14e40a6 to bcb761a Compare July 5, 2025 05:29
@kumarlokesh kumarlokesh force-pushed the avoid-explicit-cast-in-corr-aggregate-fn branch from bcb761a to bacff91 Compare July 5, 2025 05:42
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 5, 2025
@kumarlokesh kumarlokesh force-pushed the avoid-explicit-cast-in-corr-aggregate-fn branch from bacff91 to 1e4f2e9 Compare July 5, 2025 05:45
@github-actions github-actions bot removed the physical-expr Changes to the physical-expr crates label Jul 5, 2025
@kumarlokesh kumarlokesh marked this pull request as ready for review July 5, 2025 06:19
@kumarlokesh kumarlokesh requested a review from alamb July 5, 2025 06:20
@kumarlokesh
Copy link
Contributor Author

@alamb thanks for the review!
Updated the PR to address review comments.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kumarlokesh -- this looks good to me

Thanks for sticking with it

Maybe you can review the content of datafusion/sqllogictest/test_files/corr_type_coercion.slt one more time to make sure it is adding new coverage but otherwise it is ready to merge in my mind

Self {
signature: Signature::uniform(2, NUMERICS.to_vec(), Volatility::Immutable),
signature: Signature::exact(
vec![DataType::Float64, DataType::Float64],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

group1 1 1
group2 2 1

# Verify the benchmark query's physical plan to confirm type coercion is happening
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really see any verification in this plan -- I would expect to see casting happening in the datasource of something

@@ -0,0 +1,248 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how much coverage this file provides compared to what was added in aggregate.slt

However, I think it is ok to keep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb Test file datafusion/sqllogictest/test_files/corr_type_coercion.slt doesn't seem to add enough new coverage than what's already covered in aggregate.slt.

Removed redundant test file.

@alamb
Copy link
Contributor

alamb commented Jul 11, 2025

Thanks again @kumarlokesh

@alamb alamb merged commit 4dd7825 into apache:main Jul 11, 2025
27 checks passed
@kumarlokesh kumarlokesh deleted the avoid-explicit-cast-in-corr-aggregate-fn branch July 11, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid explicit cast during execution in corr aggregate function

2 participants