Skip to content

Conversation

@ntjohnson1
Copy link
Contributor

@ntjohnson1 ntjohnson1 commented Dec 29, 2025

Which issue does this PR close?

Rationale for this change

Explanation in the issue. Motivation coming more concretely from datafusion-python apache/datafusion-python#1305 (comment)

What changes are included in this PR?

  • Adds the test from the issue to highlight expected behavior
  • Expands drop_columns to coerce things into a fully qualified column to support the range of column varieties
  • This further adds a helper to extract the table name associated with the dataframe to simplify use of qualified drop columns support
    • This is potentially the most controversial part. I could see a nicer api being df.col(<name>) to match the expr version but then we probably do repeated checks for the underlying table name unless there is some caching somewhere. Maybe that performance impact isn't significant.

Are these changes tested?

Yes some additional tests are provided.

Are there any user-facing changes?

I had to update the drop_columns(&[]) test since the type can no longer be inferred. I'm not sure if that is representative of any actual use cases though since I expect the more common is a vector that might be empty in which case the type would be specified.

It now requires specifying columns with dots in them similar to other places "\"f.col1\"" to disambiguate from "f.col1".

@github-actions github-actions bot added core Core DataFusion crate functions Changes to functions implementation labels Dec 29, 2025
@renato2099
Copy link
Contributor

now we are changing/improving the behavior of drop_columns, we should probably update the documentation (wherever the right place is?). I mean after this PR drop_columns now uses find_qualified_columns so now:

  • Qualified names are supported
  • Missing columns now error instead of silently failing

Copy link
Contributor

@renato2099 renato2099 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my only request/comment would be to find the place where we can update the documentation for the drop_column change, thanks!

.map(|name| {
schema
.qualified_field_from_column(&Column::from_name(*name))
.map_err(|_| plan_datafusion_err!("Column '{}' not found", name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better than silently ignoring the error 👍

@ntjohnson1
Copy link
Contributor Author

now we are changing/improving the behavior of drop_columns, we should probably update the documentation (wherever the right place is?). I mean after this PR drop_columns now uses find_qualified_columns so now:

  • Qualified names are supported
  • Missing columns now error instead of silently failing

So the drop_columns documentation actually doesn't specify anything about qualified or unqualified names at the moment

/// Returns a new DataFrame containing all columns except the specified columns.
which was my original issue. IMO this makes the implementation match the existing documentation.

Also just clarification that drop_columns doesn't actually call find_qualified_columns. Instead find_qualified_columns is a public helper that makes it easier to call drop_columns with qualified names if you don't remember the table name for a dataframe but have the variable for that dataframe lying around (I added a test for this use case).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

pub fn drop_columns(self, columns: &[&str]) -> Result<DataFrame> {
pub fn drop_columns<T>(self, columns: &[T]) -> Result<DataFrame>
where
T: Into<Column> + Clone,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why the Clone bound is needed. In case anyone else is curious -- it turns out it needed because what is passed in is &[T] (vs something like [T])

error[E0277]: the trait bound `datafusion_common::Column: std::convert::From<&T>` is not satisfied
   --> datafusion/core/src/dataframe/mod.rs:455:42
    |
455 |                 let column: Column = col.into();
    |                                          ^^^^ the trait `std::convert::From<&T>` is not implemented for `datafusion_common::Column`
    |
    = note: required for `&T` to implement `std::convert::Into<datafusion_common::Column>`
help: consider extending the `where` clause, but there might be an alternative better way to express this requirement
    |
450 |         T: Into<Column>, datafusion_common::Column: std::convert::From<&T>
    |                          +++++++++++++++++++++++++++++++++++++++++++++++++


ctx.register_batch("t", batch)?;

let df = ctx.table("t").await?.drop_columns(&["f.c1"])?;
let df = ctx.table("t").await?.drop_columns(&["\"f.c1\""])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while this is technically a breaking API change, I think it is reasonable to treat it as a bug fix

@alamb
Copy link
Contributor

alamb commented Jan 13, 2026

Thank you @ntjohnson1 and @renato2099

@github-actions github-actions bot removed the functions Changes to functions implementation label Jan 13, 2026
@alamb alamb added this pull request to the merge queue Jan 16, 2026
Merged via the queue into apache:main with commit 8afccc9 Jan 16, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drop columns only allows unqualified names and silently fails otherwise

3 participants