Add QgsVectorLayerUtils.fieldToDataArray(), QgsVectorLayer.field_to_numpy()#63532

Merged
nyalldawson merged 4 commits into qgis:master from nyalldawson:field_as_array on Oct 29, 2025

@nyalldawson
Collaborator

QgsVectorLayerUtils.fieldToDataArray:

Converts field values from an iterator to a binary array of data. The conversion is heavily optimised to be as fast as possible.
Only numeric data types are supported; other types will raise a Python TypeError exception.
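
As a rough illustration of the behaviour described above (numeric values packed into contiguous binary data, anything else rejected with a TypeError), here is a pure-Python sketch. The helper name `values_to_binary` and the choice of 8-byte doubles are assumptions for the example only; this is not the QGIS implementation, which lives in C++.

```python
from array import array
from numbers import Real


def values_to_binary(values, typecode="d"):
    """Pack an iterable of numeric values into a contiguous bytes object.

    Hypothetical helper: it only mimics the described contract
    (numeric values only, TypeError otherwise).
    """
    out = array(typecode)
    for value in values:
        # bool is a Real subclass in Python; reject it to keep the sketch strictly numeric
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"unsupported value type: {type(value).__name__}")
        out.append(value)
    return out.tobytes()


data = values_to_binary(iter([1.5, 2.0, 3.25]))
print(len(data))  # three doubles of 8 bytes each
```

Because the result is a flat run of equally sized values, it can be handed to a consumer such as numpy without any per-value Python object overhead.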

QgsVectorLayer.field_to_numpy:

Returns the values from a field as a numpy masked array. Heavily optimised to provide fantastic performance. Supports numeric fields only; other types raise a TypeError.
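
For readers unfamiliar with masked arrays, the sketch below shows how binary numeric data plus a set of "is NULL" flags can be wrapped in a numpy masked array. The variables `raw` and `is_null` are made-up stand-ins; how the real method tracks NULL values is not described in this thread and is an assumption here.

```python
import numpy as np

# Stand-ins for the example (assumptions, not QGIS output):
raw = np.array([10.0, 0.0, 30.0]).tobytes()  # binary field data, 8-byte floats
is_null = [False, True, False]               # per-feature NULL flags

# View the bytes as float64 values without copying, then mask the NULL entry
values = np.frombuffer(raw, dtype=np.float64)
masked = np.ma.masked_array(values, mask=is_null)

print(masked.count())  # number of non-masked values
print(masked.mean())   # masked entries are ignored by reductions
```

The placeholder value stored at a masked position (0.0 here) never contributes to statistics such as the mean.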

@nyalldawson added the API (API improvement only, no visible user interface changes) and PyQGIS (Related to the PyQGIS API) labels on Oct 14, 2025
@github-actions bot added this to the 4.0.0 milestone on Oct 14, 2025
@nyalldawson
Collaborator Author

@merydian what do you think?

@github-actions
Contributor

github-actions bot commented Oct 14, 2025

🍎 MacOS Qt6 builds

Download MacOS Qt6 builds of this PR for testing.
This installer is not signed, control+click > open the app to avoid the warning
(Built from commit b731447)

🪟 Windows Qt6 builds

Download Windows Qt6 builds of this PR for testing.
(Built from commit b731447)

@merydian
Contributor

Great addition, everything looks good to me. I suppose this should be used in QgsVectorLayer.as_geopandas for better performance?

Do you think implementing this for strings etc. would be worth considering?

@nyalldawson
Collaborator Author

@merydian

> Great addition, everything looks good to me. I suppose this should be used in QgsVectorLayer.as_geopandas for better performance?

Eventually, yes, but there are a few more steps first (see below).

> Do you think implementing this for strings etc. would be worth considering?

It's not possible to use (variable-length) strings in numpy arrays, as all the objects in the array must have equal size. (By the same logic, I think it'd be possible to support boolean/date/time fields in addition to numeric ones, since their values are fixed-size, but I haven't looked into that.)
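
A small sketch of the equal-size constraint, using only standard numpy calls. It shows that numeric, boolean and datetime dtypes each have a fixed item size, and that numpy's own unicode dtype handles strings of different lengths only by padding every element to the widest one. The sample strings are made up for the example.

```python
import numpy as np

# Fixed-size element types: every value occupies the same number of bytes
sizes = {
    name: np.dtype(name).itemsize
    for name in ("int32", "float64", "bool", "datetime64[s]")
}
print(sizes)

# Strings of different lengths are padded to the longest element
# (numpy stores 4 bytes per character for its unicode dtype)
words = np.array(["a", "abcdef"])
print(words.itemsize)  # 6 characters * 4 bytes
```

This is why a raw binary buffer maps cleanly onto numeric, boolean and date/time columns, while free-form text does not fit the same scheme without padding.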

What I'm thinking for next steps are:

  1. Add a fields_as_numpy method, analogous to field_as_numpy (with an accompanying QgsVectorLayerUtils.fieldsToDataArray). This would accept a list of (numeric!) fields and first create an interleaved raw binary blob, which is then used as the source of a multi-dimensional numpy array. The advantage is that we iterate over the data only ONCE, building up the binary data in a single pass.
  2. Add an "as_dataframe" method. This would scan the layer's field types and, for all numeric fields, use fields_as_numpy to create a single multi-dimensional numpy array. Then we'd use the data frame constructor to take columns from that array. For non-numeric fields, we'd have to resort to an approach similar to the one you used for as_geopandas. We could further optimise this by adding a callback function to QgsVectorLayerUtils.fieldsToDataArray, so that the non-numeric column lists are built during the same layer iteration pass as the numeric ones (otherwise we'd incur two layer iterations, one for numeric and one for non-numeric).
  3. Rework as_geopandas to use as_dataframe (or similar code), with the addition of geometry handling.
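
The single-pass idea in steps 1 and 2 can be sketched in plain Python and numpy as follows. Everything here is hypothetical: `rows` stands in for layer features, the two numeric columns are packed as little-endian doubles, and the non-numeric column is collected in the same loop to mimic the proposed callback. None of these names are part of the QGIS API.

```python
import struct

import numpy as np

# Made-up (field_a, field_b, name) values standing in for layer features
rows = [(1.0, 10.0, "north"), (2.0, 20.0, "south"), (3.0, 30.0, "east")]

buffer = bytearray()  # interleaved numeric data: a0 b0 a1 b1 ...
names = []            # non-numeric column, built in the same pass

for a, b, name in rows:  # single iteration over the "layer"
    buffer += struct.pack("<dd", a, b)
    names.append(name)

# Interpret the interleaved blob as a (n_features, n_fields) array
table = np.frombuffer(bytes(buffer), dtype="<f8").reshape(len(rows), 2)
col_a = table[:, 0]
col_b = table[:, 1]

print(table.shape)
print(col_b.tolist())
```

Each column of `table` is a strided view onto the same buffer, so a data frame constructor could take its numeric columns from it without another pass over the features.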

@merydian
Contributor

Sounds good! Do you have any thoughts on how to split up this work, so that we don't do the same things simultaneously?

Also just noticed that the methods in this PR are called to_numpy, whereas the other implementations are called as_numpy...

@nyalldawson
Collaborator Author

> Sounds good! Do you have any thoughts on how to split up this work, so that we don't do the same things simultaneously?

How about I add the raw C++ API for fieldsToDataArray, and then you try building the Python side on top of that?

> Also just noticed that the methods in this PR are called to_numpy, whereas the other implementations are called as_numpy...

Thanks, fixed

@merydian
Contributor

On second thought, I'd like to try my hand at the whole implementation if that's fine with you? This PR would serve as a nice template I suppose.

@nyalldawson merged commit 49f2893 into qgis:master on Oct 29, 2025
36 of 42 checks passed
@nyalldawson deleted the field_as_array branch on October 29, 2025 04:37
@nyalldawson
Collaborator Author

@merydian

> On second thought, I'd like to try my hand at the whole implementation if that's fine with you? This PR would serve as a nice template I suppose.

Go for it! Just reach out if you get stuck!
