Feature description
After #63749 it is possible to iterate over QgsFeatures as Arrow batches! In the interest of keeping that PR reasonably scoped, several nice-to-haves on top of the core conversion logic were deferred.
- Implementing the Arrow PyCapsule Interface (i.e., __arrow_c_stream__(self, requested_schema=None) -> PyCapsule). This might require a small amount of Python C API code (or a dependency), since as far as I'm aware Python does not let you create capsules from pure Python.
- Allowing layer data providers to skip iterating over QgsFeatures and provide their own ArrowArrayStream. Notably, GDAL/OGR can do this, and it is much faster than per-feature iteration.
From #63749 (comment), some places where GDAL exposes layers in this way:
From #63749 (comment):
One way to do that would be to have a virtual bool QgsVectorDataProvider::getArrowStream( struct ArrowArrayStream* stream, const QgsFeatureRequest &request = QgsFeatureRequest() ) const method whose base implementation would use QgsArrowStream and that the OGR provider could delegate to OGR_L_GetArrowStream() if the request is simple enough to be forwarded to OGR and a similar method at the QgsVectorLayer level.
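The dispatch described in that comment can be sketched in plain Python. Every name here (Request, BaseProvider, NativeProvider, get_arrow_stream) is a hypothetical stand-in rather than QGIS API, and lists of dicts stand in for Arrow batches. The only point is the control flow: the base class builds batches feature by feature, while a provider with a native batch source uses it only when the request is simple enough to forward.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for QgsFeatureRequest."""
    predicate: Optional[Callable[[dict], bool]] = None

    def is_simple(self) -> bool:
        # Assumption: "simple" means there is nothing to evaluate per feature.
        return self.predicate is None


class BaseProvider:
    """Hypothetical stand-in for QgsVectorDataProvider."""

    def __init__(self, features, batch_size=2):
        self.features = list(features)
        self.batch_size = batch_size
        self.last_path = None  # records which code path served the last call

    def get_arrow_stream(self, request: Optional[Request] = None) -> Iterator[List[dict]]:
        request = request or Request()
        self.last_path = "per-feature"
        return self._batches_from_features(request)

    def _batches_from_features(self, request: Request) -> Iterator[List[dict]]:
        # Generic fallback: visit every feature and group matches into batches.
        batch = []
        for feature in self.features:
            if request.predicate is not None and not request.predicate(feature):
                continue
            batch.append(feature)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:
            yield batch


class NativeProvider(BaseProvider):
    """Hypothetical stand-in for a provider with its own batch source (e.g. OGR)."""

    def get_arrow_stream(self, request: Optional[Request] = None) -> Iterator[List[dict]]:
        request = request or Request()
        if not request.is_simple():
            # The request cannot be forwarded, so use the generic fallback.
            return super().get_arrow_stream(request)
        self.last_path = "native"
        return self._native_batches()

    def _native_batches(self) -> Iterator[List[dict]]:
        # Simulates a source that hands out whole batches at once.
        for start in range(0, len(self.features), self.batch_size):
            yield self.features[start:start + self.batch_size]


provider = NativeProvider([{"id": i} for i in range(5)])
native = list(provider.get_arrow_stream())
native_path = provider.last_path
filtered = list(provider.get_arrow_stream(Request(predicate=lambda f: f["id"] % 2 == 0)))
filtered_path = provider.last_path
```

In this sketch the caller sees the same return type from both paths, which is the property the proposed virtual method would need so that a QgsVectorLayer-level wrapper does not have to care which provider it is talking to.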
The final state after this improvement would be a compact way for Arrow Python consumers like GeoPandas to ergonomically consume a layer. Maybe:
geopandas.GeoDataFrame.from_arrow(qgis_layer_object)
Or maybe:
geopandas.GeoDataFrame.from_arrow(qgis_layer_object.getArrowStream())
Additional context
No response