-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I found a case where a query fails when there is both a HAVING clause and an ORDER BY clause. The query will work if it has either the HAVING clause or the ORDER BY clause .. but not both.
Here is a failing query:
with revs
as (
SELECT
customer,
cast(round(CAST(substring(revenue,2) AS float)) as INT) AS visit_revenue
FROM visits
)
SELECT
customer,
SUM(visit_revenue) as total_revenue
FROM revs
group by customer
having SUM(visit_revenue) > 999
order by SUM(visit_revenue)The error I get is:
Error: SchemaError(FieldNotFound { field: Column { relation: None, name: "visit_revenue" }, valid_fields: [Column { relation: Some(Bare { table: "revs" }), name: "customer" }, Column { relation: None, name: "total_revenue" }, Column { relation: Some(Bare { table: "revs" }), name: "customer" }, Column { relation: None, name: "SUM(revs.visit_revenue)" }] }, Some(""))If I comment out either the line with having SUM(visit_revenue) > 999 OR order by SUM(visit_revenue) the the query will succeed.
To Reproduce
To reproduce, run this program with the attached CSV file as data.
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::prelude::{CsvReadOptions, SessionContext};
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
let file_name = "data/test.csv";
let schema = Schema::new(vec![
Field::new("date", DataType::Utf8, false),
Field::new("weekday", DataType::Utf8, false),
Field::new("time", DataType::Utf8, false),
Field::new("customer", DataType::Utf8, false),
Field::new("revenue", DataType::Utf8, false),
]);
let options = CsvReadOptions::new()
.schema(&schema)
.delimiter(b'\t')
.has_header(true)
;
let ctx = SessionContext::new();
ctx.register_csv("visits", file_name, options).await?;
let df = ctx.sql(r#"
with revs
as (
SELECT
customer,
cast(round(CAST(substring(revenue,2) AS float)) as INT) AS visit_revenue
FROM visits
)
SELECT
customer,
SUM(visit_revenue) as total_revenue
FROM revs
group by customer
having SUM(visit_revenue) > 999
order by SUM(visit_revenue)
"#).await?;
df.show().await?;
Ok(())
}The attached file contains the data:
test.csv
Expected behavior
The expectation is that the result set is both filtered for records where the sum of visit_revenue > 999 AND ordered by the same without crashing.
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working