[Python] Extend project to accept a list of types + add DuckDBPyType class #6777

Merged
Mytherin merged 39 commits into duckdb:master from Tishj:python_project_on_column_types
Apr 13, 2023
@Tishj
Contributor

@Tishj Tishj commented Mar 18, 2023

This PR implements the feature request in #6706

project on list of column types

This allows you to provide a list of types to project on instead of column names, selecting all of the columns of the relation that match any of the provided types.

The implementation of this is likely temporary, as discussed in the linked request.
We aim to replace it once core supports this functionality.
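
To make the selection rule concrete, here is a minimal plain-Python model of it. It does not call DuckDB and is not the PR's implementation; `select_columns_by_type` and the `(name, type)` schema layout are hypothetical names chosen for this sketch.

```python
def select_columns_by_type(schema, types):
    """Return the names of columns whose type matches any of `types`.

    `schema` is a list of (column_name, type_name) pairs; column order is kept.
    Type names are compared case-insensitively (an assumption of this sketch).
    """
    wanted = {t.upper() for t in types}
    return [name for name, type_name in schema if type_name.upper() in wanted]


# Hypothetical relation schema, used only for illustration.
schema = [("id", "BIGINT"), ("name", "VARCHAR"), ("score", "DOUBLE"), ("tag", "VARCHAR")]
selected = select_columns_by_type(schema, ["VARCHAR", "DOUBLE"])
print(selected)  # ['name', 'score', 'tag']
```

Every column matching any of the listed types is kept, in its original position; columns of other types are dropped.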

DuckDBPyType

I initially thought it was overkill to add a python class to represent our LogicalType, but we have thought of another use for this, so I have opted to add it anyways.

Primitives (defined on duckdb.typing)

  • SQLNULL
  • BOOLEAN
  • TINYINT
  • UTINYINT
  • SMALLINT
  • USMALLINT
  • INTEGER
  • UINTEGER
  • BIGINT
  • UBIGINT
  • HUGEINT
  • UUID
  • FLOAT
  • DOUBLE
  • DATE
  • TIMESTAMP
  • TIMESTAMP_MS
  • TIMESTAMP_NS
  • TIMESTAMP_S
  • TIME
  • TIME_TZ
  • TIMESTAMP_TZ
  • VARCHAR
  • BLOB
  • BIT
  • INTERVAL

Creation methods

  • sqltype(type_str: str)
    alias: type, dtype
    create a type by parsing type_str; this can also be used for user- or extension-defined types
  • string_type(collation: str = "")
    create VARCHAR type with optional collation
  • decimal_type(width: int, scale: int)
    create a DECIMAL type of the given width + scale
  • enum_type(name: str, type: DuckDBPyType, values: list)
    create an ENUM type from the 'values' list, cast to 'type' as the underlying values
  • array_type(type: DuckDBPyType)
    alias: list_type
    create a LIST type of the 'type' as child type
  • struct_type(fields: List[DuckDBPyType] | Dict[str, DuckDBPyType])
    alias: row_type
    create a STRUCT type of the given field types (uses default names if given as List)
  • map_type(key: DuckDBPyType, value: DuckDBPyType)
    create a MAP type out of the 'key' and 'value' types
  • union_type(members: List[DuckDBPyType] | Dict[str, DuckDBPyType])
    create a UNION type of the given member types (similar to STRUCT)
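
The creation methods compose, so nested types are built from the inside out. The sketch below only models that composition by building the equivalent SQL type strings; the `*_sql` helpers are hypothetical stand-ins written for this example, not the functions added by the PR.

```python
def decimal_sql(width, scale):
    # DECIMAL with an explicit width and scale.
    return f"DECIMAL({width},{scale})"


def list_sql(child):
    # LIST types are written as the child type followed by [].
    return f"{child}[]"


def struct_sql(fields):
    # `fields` maps field name -> type string; dict order is preserved.
    inner = ", ".join(f"{name} {ftype}" for name, ftype in fields.items())
    return f"STRUCT({inner})"


def map_sql(key, value):
    return f"MAP({key}, {value})"


# A MAP from VARCHAR to a LIST of STRUCTs, built from the inside out.
line_item = struct_sql({"price": decimal_sql(18, 3), "qty": "INTEGER"})
nested = map_sql("VARCHAR", list_sql(line_item))
print(nested)  # MAP(VARCHAR, STRUCT(price DECIMAL(18,3), qty INTEGER)[])
```

A string like this is also the kind of input a parsing constructor such as sqltype(type_str) is described as accepting.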

Misc

Can be compared against strings.
Converts implicitly from: str, built-in types (str, bool, float, etc.), list, dict, a {'name': type, ...} dictionary (to STRUCT), and NumPy built-in types (int64, bool_, float32, etc.).
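
As a rough model of these two behaviours, the toy class below accepts either a type string or a Python built-in and compares equal to strings. `SketchType` is hypothetical, and its built-in mapping is an assumption made for illustration; the real conversion table may differ.

```python
class SketchType:
    """Toy stand-in for a type object; not the class added by this PR."""

    # Assumed mapping for illustration only.
    _FROM_PYTHON = {str: "VARCHAR", bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE"}

    def __init__(self, spec):
        if isinstance(spec, SketchType):
            self.name = spec.name
        elif isinstance(spec, str):
            # A string is treated as the type name itself.
            self.name = spec.upper()
        elif isinstance(spec, type) and spec in self._FROM_PYTHON:
            # A Python built-in is converted through the mapping above.
            self.name = self._FROM_PYTHON[spec]
        else:
            raise TypeError(f"cannot convert {spec!r} to a type")

    def __eq__(self, other):
        # Allow comparison against plain strings as well as other types.
        if isinstance(other, str):
            return self.name == other.upper()
        if isinstance(other, SketchType):
            return self.name == other.name
        return NotImplemented

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return self.name


print(SketchType(int) == "bigint")               # True
print(SketchType("varchar") == SketchType(str))  # True
```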

@Tishj
Contributor Author

Tishj commented Mar 18, 2023

The implementation of the EnumType creation method is left as an exercise for the reader ;)
No, but in all seriousness, I'll probably remove it for now.

Contributor Author

@Tishj Tishj left a comment


Some further ideas

Collaborator

@Mytherin Mytherin left a comment


Thanks for the PR! Looks great. I think adding support for types is a great idea - some comments below:

@Tishj Tishj requested a review from Mytherin March 27, 2023 12:34
@Mytherin
Collaborator

Could you have a look at fixing the merge conflicts?

@Tishj
Contributor Author

Tishj commented Mar 30, 2023

The failure seems unrelated?
Could we merge this, or first rerun the failing test?

@Tishj
Contributor Author

Tishj commented Apr 3, 2023

pandas provides this method as select_dtypes, which accepts both an include and an exclude list.

Do we maybe also want to add the exclude option and provide a select_dtypes alias for this method?

@Mytherin
Collaborator

Mytherin commented Apr 3, 2023

I think providing both options is sensible. Perhaps we should just rename it to be fully compatible? Or have both select_types and select_dtypes (considering our types aren't named dtypes).
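
For reference, the include/exclude semantics under discussion could look roughly like this. It is a plain-Python sketch modelled on how pandas' select_dtypes treats its two lists (at least one must be given, and they may not overlap); `select_types_sketch` and the schema layout are hypothetical, and nothing here reflects a decided DuckDB API.

```python
def select_types_sketch(schema, include=None, exclude=None):
    """Filter (column_name, type_name) pairs by included and excluded types."""
    include = {t.upper() for t in include or []}
    exclude = {t.upper() for t in exclude or []}
    if not include and not exclude:
        raise ValueError("at least one of include or exclude must be non-empty")
    if include & exclude:
        raise ValueError("include and exclude overlap")
    return [
        name
        for name, type_name in schema
        # An empty include list means "everything not excluded".
        if (not include or type_name.upper() in include)
        and type_name.upper() not in exclude
    ]


# Hypothetical relation schema, used only for illustration.
schema = [("id", "BIGINT"), ("name", "VARCHAR"), ("score", "DOUBLE")]
print(select_types_sketch(schema, exclude=["VARCHAR"]))  # ['id', 'score']
print(select_types_sketch(schema, include=["VARCHAR"]))  # ['name']
```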

@Tishj
Contributor Author

Tishj commented Apr 4, 2023

I think I'll add exclude later; I'm currently working on the scalar Python UDF, which depends on this PR.

@Mytherin Mytherin merged commit 236e580 into duckdb:master Apr 13, 2023
@Mytherin
Collaborator

Thanks! LGTM

@Tishj Tishj deleted the python_project_on_column_types branch November 7, 2025 16:15