We already have one example of column selection pushdown: loading only the needed columns from Parquet.
- However, this selection fails for a more complex expression such as `df.col[(df.x > 0) & (df.y < 100) & (df.y > 0)]`, which should read only three of the source columns (`col`, `x`, and `y`), not all of them.
- We don't do this for other data sources where it may be useful (CSV somewhat, ORC definitely, SQL yes but hard; others?)
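
As a rough illustration of what finding the needed columns could involve, here is a minimal sketch that runs an expression against a tracing stand-in for the dataframe and records every column it touches. The names `Expr`, `FrameTracer`, and `required_columns` are hypothetical and exist only for this example; this is not how the existing Parquet optimization is implemented. Each comparison is parenthesized because Python's `&` binds more tightly than `>` and `<`.

```python
class Expr:
    """Hypothetical symbolic node that only remembers which columns it touches."""

    def __init__(self, columns):
        self.columns = frozenset(columns)

    def _combine(self, other):
        # Scalars such as 0 or 100 contribute no columns.
        other_cols = other.columns if isinstance(other, Expr) else frozenset()
        return Expr(self.columns | other_cols)

    # Comparisons, boolean operators, and indexing all merge column sets.
    __gt__ = __lt__ = __ge__ = __le__ = _combine
    __and__ = __or__ = __rand__ = __ror__ = _combine
    __getitem__ = _combine


class FrameTracer:
    """Stand-in for a dataframe: attribute access yields a one-column Expr."""

    def __getattr__(self, name):
        return Expr([name])


def required_columns(func):
    """Run func against a tracer and return the column names it referenced."""
    return set(func(FrameTracer()).columns)


needed = required_columns(
    lambda df: df.col[(df.x > 0) & (df.y < 100) & (df.y > 0)]
)
print(sorted(needed))  # ['col', 'x', 'y']
```

The resulting set could then be handed to a reader that accepts a column list, for example the `columns=` argument of `pandas.read_parquet` or the `usecols=` argument of `pandas.read_csv`. A real implementation would have to work on the actual task graph or expression tree rather than a lambda, which this sketch does not attempt.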