Commit aea10c2

GH-41480: [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
### Rationale for this change

Follow-up on #41494 to update the Python development guide to reflect the change in how PyArrow is built: the defaults for the various `PYARROW_BUILD_<component>` options are now set based on the corresponding `ARROW_<component>` setting. The existing `PYARROW_WITH_<component>` environment variables keep working and can be used to override this default.

* GitHub Issue: #41480

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
1 parent 2ae6d11 commit aea10c2

1 file changed

Lines changed: 49 additions & 46 deletions

docs/source/developers/python.rst

@@ -397,18 +397,14 @@ Now, build pyarrow:
 .. code-block::
 
    $ pushd arrow/python
-   $ export PYARROW_WITH_PARQUET=1
-   $ export PYARROW_WITH_DATASET=1
    $ export PYARROW_PARALLEL=4
    $ python setup.py build_ext --inplace
    $ popd
 
-If you did build one of the optional components (in C++), you need to set the
-corresponding ``PYARROW_WITH_$COMPONENT`` environment variable to 1.
-
-Similarly, if you built with ``PARQUET_REQUIRE_ENCRYPTION`` (in C++), you
-need to set the corresponding ``PYARROW_WITH_PARQUET_ENCRYPTION`` environment
-variable to 1.
+If you did build one of the optional components in C++, the equivalent components
+will be enabled by default for building pyarrow. This default can be overridden
+by setting the corresponding ``PYARROW_WITH_$COMPONENT`` environment variable
+to 0 or 1, see :ref:`python-dev-env-variables` below.
 
 To set the number of threads used to compile PyArrow's C++/Cython components,
 set the ``PYARROW_PARALLEL`` environment variable.
@@ -551,7 +547,6 @@ Now, we can build pyarrow:
 .. code-block::
 
    $ pushd arrow\python
-   $ set PYARROW_WITH_PARQUET=1
    $ set CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1
    $ python setup.py build_ext --inplace
    $ popd
@@ -601,46 +596,12 @@ Then run the unit tests with:
 Caveats
 -------
 
+.. _python-dev-env-variables:
+
 Relevant components and environment variables
 =============================================
 
-List of relevant Arrow CMake flags and corresponding environment variables
-to be used when building PyArrow are:
-
-.. list-table::
-   :widths: 30 30
-   :header-rows: 1
-
-   * - Arrow flags/options
-     - Corresponding environment variables for PyArrow
-   * - ``CMAKE_BUILD_TYPE``
-     - ``PYARROW_BUILD_TYPE`` (release, debug or relwithdebinfo)
-   * - ``ARROW_GCS``
-     - ``PYARROW_WITH_GCS``
-   * - ``ARROW_S3``
-     - ``PYARROW_WITH_S3``
-   * - ``ARROW_HDFS``
-     - ``PYARROW_WITH_HDFS``
-   * - ``ARROW_CUDA``
-     - ``PYARROW_WITH_CUDA``
-   * - ``ARROW_SUBSTRAIT``
-     - ``PYARROW_WITH_SUBSTRAIT``
-   * - ``ARROW_FLIGHT``
-     - ``PYARROW_WITH_FLIGHT``
-   * - ``ARROW_DATASET``
-     - ``PYARROW_WITH_DATASET``
-   * - ``ARROW_PARQUET``
-     - ``PYARROW_WITH_PARQUET``
-   * - ``PARQUET_REQUIRE_ENCRYPTION``
-     - ``PYARROW_WITH_PARQUET_ENCRYPTION``
-   * - ``ARROW_TENSORFLOW``
-     - ``PYARROW_WITH_TENSORFLOW``
-   * - ``ARROW_ORC``
-     - ``PYARROW_WITH_ORC``
-   * - ``ARROW_GANDIVA``
-     - ``PYARROW_WITH_GANDIVA``
-
-List of relevant environment variables that can also be used to build
+List of relevant environment variables that can be used to build
 PyArrow are:
 
 .. list-table::
@@ -650,6 +611,9 @@ PyArrow are:
    * - PyArrow environment variable
      - Description
      - Default value
+   * - ``PYARROW_BUILD_TYPE``
+     - Build type for PyArrow (release, debug or relwithdebinfo), sets ``CMAKE_BUILD_TYPE``
+     - ``release``
    * - ``PYARROW_CMAKE_GENERATOR``
      - Example: ``'Visual Studio 15 2017 Win64'``
      - ``''``
@@ -678,6 +642,45 @@ PyArrow are:
      - Number of processes used to compile PyArrow’s C++/Cython components
      - ``''``
 
+The components being disabled or enabled when building PyArrow is by default
+based on how Arrow C++ is built (i.e. it follows the ``ARROW_$COMPONENT`` flags).
+However, the ``PYARROW_WITH_$COMPONENT`` environment variables can still be used
+to override this when building PyArrow (e.g. to disable components, or to enforce
+certain components to be built):
+
+.. list-table::
+   :widths: 30 30
+   :header-rows: 1
+
+   * - Arrow flags/options
+     - Corresponding environment variables for PyArrow
+   * - ``ARROW_GCS``
+     - ``PYARROW_WITH_GCS``
+   * - ``ARROW_S3``
+     - ``PYARROW_WITH_S3``
+   * - ``ARROW_AZURE``
+     - ``PYARROW_WITH_AZURE``
+   * - ``ARROW_HDFS``
+     - ``PYARROW_WITH_HDFS``
+   * - ``ARROW_CUDA``
+     - ``PYARROW_WITH_CUDA``
+   * - ``ARROW_SUBSTRAIT``
+     - ``PYARROW_WITH_SUBSTRAIT``
+   * - ``ARROW_FLIGHT``
+     - ``PYARROW_WITH_FLIGHT``
+   * - ``ARROW_ACERO``
+     - ``PYARROW_WITH_ACERO``
+   * - ``ARROW_DATASET``
+     - ``PYARROW_WITH_DATASET``
+   * - ``ARROW_PARQUET``
+     - ``PYARROW_WITH_PARQUET``
+   * - ``PARQUET_REQUIRE_ENCRYPTION``
+     - ``PYARROW_WITH_PARQUET_ENCRYPTION``
+   * - ``ARROW_ORC``
+     - ``PYARROW_WITH_ORC``
+   * - ``ARROW_GANDIVA``
+     - ``PYARROW_WITH_GANDIVA``
+
 Deleting stale build artifacts
 ==============================
 

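The default-and-override rule this change documents can be illustrated with a small model. This is only a sketch of the described behaviour, not the logic in PyArrow's build scripts: the function name `resolve_component`, the `_TRUTHY` set of accepted strings, and the handling of an empty variable are all assumptions made for illustration.

```python
import os

# Hypothetical set of strings treated as "enabled"; the real build scripts
# may parse the variable differently.
_TRUTHY = {"1", "on", "true", "yes"}


def resolve_component(component, arrow_enabled, environ=None):
    """Return whether a PyArrow component would be built in this model.

    component     -- component name, e.g. "PARQUET"
    arrow_enabled -- whether Arrow C++ was built with that component
    environ       -- mapping of environment variables (defaults to os.environ)
    """
    env = os.environ if environ is None else environ
    override = env.get(f"PYARROW_WITH_{component}")
    if override is None or override.strip() == "":
        # No override set: follow the Arrow C++ build.
        return arrow_enabled
    # Explicit override: a truthy value forces the component on,
    # anything else turns it off.
    return override.strip().lower() in _TRUTHY


if __name__ == "__main__":
    # Assumed state of a local Arrow C++ build, for illustration only.
    arrow_build = {"PARQUET": True, "DATASET": True, "GANDIVA": False}
    env = {"PYARROW_WITH_DATASET": "0"}
    for name, enabled in arrow_build.items():
        print(name, resolve_component(name, enabled, env))
```

In this model, `PARQUET` follows the C++ build and stays enabled, while `DATASET` is disabled by the explicit `PYARROW_WITH_DATASET=0` override. Forcing a component on with `PYARROW_WITH_$COMPONENT=1` when Arrow C++ was built without it is modelled as "enabled" here; whether such a build would actually succeed is outside what this sketch covers.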