Conversation

@AlenkaF
Member

AlenkaF commented May 19, 2022

A series of 3 PRs adds doctest functionality to ensure that the docstring examples are actually correct (and stay correct).

This PR can be tested with pytest --doctest-modules python/pyarrow.
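As a quick illustration of what such a doctest run checks (a hypothetical module, not part of this PR): pytest imports each collected .py file and executes the interactive `>>>` examples found in its docstrings, comparing the actual output with the expected one.

```python
# example_module.py -- illustration only, not a file from this PR.
# `pytest --doctest-modules` imports each collected .py module and runs the
# interactive examples in its docstrings as tests.

def int_list_roundtrip():
    """Convert a Python list to a pyarrow Array and back.

    Examples
    --------
    >>> import pyarrow as pa
    >>> pa.array([1, 2, 3]).to_pylist()
    [1, 2, 3]
    """
```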

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@AlenkaF
Member Author

AlenkaF commented May 20, 2022

There are some examples I removed from ds.dataset that I will add back in a follow-up when I work on the docstring examples for Filesystems. They include, for example, reading from an S3 bucket, so I need a bit more time to find a good solution.

AlenkaF marked this pull request as ready for review May 20, 2022 08:09
@AlenkaF
Member Author

AlenkaF commented May 20, 2022

@raulcd @jorisvandenbossche the code now works with --doctest-modules (not sure about the AppVeyor pyarrow test error ...). I would suggest creating a new workflow task that only runs the doctests in a separate PR, following what Raul suggested when we talked. There I would remove the addopts option for doctest from setup.cfg.

@raulcd
Member

raulcd commented May 20, 2022

Hi @AlenkaF, creating a new job sounds good to me. I have checked out your PR locally and there are a couple of things that are not clear to me. If I try to run the tests locally with the current setup, python -m pytest python/pyarrow/tests/, no tests are collected:

$ python -m pytest python/pyarrow/tests/
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/raulcd/open_source/arrow/python, configfile: setup.cfg
plugins: hypothesis-6.46.3, lazy-fixture-0.6.3
collected 0 items                                                                                                                                

============================================================= no tests ran in 0.02s ==============================================================

This is because the new conftest skips collecting tests if we are using --doctest-modules, which is the default setup.
This is solved if I run the tests using pytest -r s -v --pyargs pyarrow, but this is not how we have it documented in the developer guide.

I have also been able to reproduce the same AppVeyor failure if I remove the new conftest file; we might need some more investigation:

$ mv conftest.py no_conftest.py
$ pytest -r s -v  --pyargs pyarrow.tests
...
collected 4239 items / 2 errors / 6 skipped                                                                                                      

===================================================================== ERRORS =====================================================================
______________________________________________ ERROR collecting pyarrow/tests/deserialize_buffer.py ______________________________________________
tests/deserialize_buffer.py:24: in <module>
    with open(sys.argv[1], 'rb') as f:
E   FileNotFoundError: [Errno 2] No such file or directory: '-r'
______________________________________________ ERROR collecting pyarrow/tests/read_record_batch.py _______________________________________________
tests/read_record_batch.py:26: in <module>
    with open(sys.argv[1], 'rb') as f:
E   FileNotFoundError: [Errno 2] No such file or directory: '-r'
============================================================ short test summary info =============================================================
SKIPPED [2] tests/test_cuda.py:32: could not import 'pyarrow.cuda': No module named 'pyarrow._cuda'
SKIPPED [2] tests/test_cuda_numba_interop.py:23: could not import 'pyarrow.cuda': No module named 'pyarrow._cuda'
SKIPPED [2] tests/test_jvm.py:27: could not import 'jpype': No module named 'jpype'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================== 6 skipped, 2 errors in 2.69s ==========================================================

I would propose not adding --doctest-modules to the default pytest setup; this also solved the AppVeyor issue I was able to reproduce locally.

@AlenkaF
Member Author

AlenkaF commented May 20, 2022

Thanks @raulcd for reviewing and testing locally!

For the first issue: python -m pytest python/pyarrow/tests/ doesn't collect any tests, which is correct. The skipping of all the tests is added to the conftest file on purpose, as this feature should only check the docstring examples in the .py files and not the unit tests. Could you try to run python -m pytest python/pyarrow/ and let me know how it goes?

For the second issue: the errors that you get when removing the newly added conftest file are due to the fact that doctest collects all the .py files from pyarrow to run the doctests on. Because some modules were not built (in your case CUDA, for example), doctest complains. For this reason the new conftest file was added: it checks which modules are not installed and tells doctest to skip the files connected to these missing modules.

Removing --doctest-modules from the pytest setup only solved the issue because in that case the doctests didn't run at all. So we need the conftest file to skip the files connected to the missing modules, and then we should move --doctest-modules from the pytest setup to a new job where we would run something similar to pytest --doctest-modules python/pyarrow/.
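As an illustration of the mechanism described above (a sketch only, assuming pytest's collect_ignore hook; the module and file names are examples and not necessarily the exact list used in this PR):

```python
# conftest.py (sketch) -- tell doctest collection to skip files whose
# optional pyarrow modules are not built/installed in this environment.
collect_ignore = []

try:
    import pyarrow.cuda  # noqa: F401
except ImportError:
    # pyarrow was built without CUDA support, so importing cuda.py while
    # collecting doctests would fail.
    collect_ignore.append("cuda.py")

try:
    import pyarrow.flight  # noqa: F401
except ImportError:
    collect_ignore.append("flight.py")
```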

@AlenkaF
Member Author

AlenkaF commented May 20, 2022

Thanks for helping me understand your comment better, Raul, and sorry for not seeing the problem! Now I also get what Joris was trying to tell me yesterday :)

Yes, the tests are being skipped entirely due to this change, and that's not good at all.

I will remove --doctest-modules from the pytest setup and then this PR should be ready, I think. Then I will make a separate PR for --doctest-cython, similar to this one, and the last one will be the PR for the CI job. @jorisvandenbossche what do you think?

AlenkaF changed the title from "ARROW-16018: [Doc][Python] Run doctests on Python docstring examples" to "ARROW-16018: [Doc][Python] Run doctests on Python docstring examples (--doctest-modules)" May 20, 2022
Member

@jorisvandenbossche left a comment

We should maybe see if it would be possible to de-duplicate the groups/defaults definitions in both conftest.py files (I suppose that if all of them are defined in the top-level conftest.py, that should be fine for the tests as well)


try:
    from pyarrow.fs import S3FileSystem  # noqa
    defaults['fs'] = True
Member

Are there S3 examples in fs.py?

Member Author

I think there is just one (this line):

>>> copy_files("s3://your-bucket-name", "local-directory")

Member Author

Realised I will have to check these examples also (as they do not get checked locally for me at the moment).

@jorisvandenbossche
Member

I will remove --doctest-modules from the pytest setup and then this PR should be ready, I think. Then I will make a separate PR for --doctest-cython, similar to this one, and the last one will be the PR for the CI job. @jorisvandenbossche what do you think?

Yes, that sounds good. Although I think you can maybe do the CI job one before tackling the cython doctests; in that way we can directly test it in CI in the cython PR.

@AlenkaF
Member Author

AlenkaF commented May 23, 2022

I will do the CI one today and then we can decide which PR to close first =)

Member

@jorisvandenbossche left a comment

Did a pass through the doctest changes and added a few comments.

Further, I think we should still take a look at whether we can deduplicate some content of the conftest.py files:

We should maybe see if it would be possible to de-duplicate the groups/defaults definitions in both conftest.py files (I suppose that if all of them are defined in the top-level conftest.py, that should be fine for the tests as well)

@AlenkaF
Member Author

AlenkaF commented May 24, 2022

Further, I think we should still take a look at whether we can deduplicate some content of the conftest.py files:

We should maybe see if it would be possible to de-duplicate the groups/defaults definitions in both conftest.py files (I suppose that if all of them are defined in the top-level conftest.py, that should be fine for the tests as well)

Yes, totally agree; I had it in mind. There are some small differences, but I will try to put the code for the groups/defaults definitions together in the top-level (pyarrow) conftest.py file and leave the rest as is.
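One possible shape of that de-duplication, as a sketch only (assuming the definitions can simply be imported from the top-level conftest; not the actual change):

```python
# python/pyarrow/tests/conftest.py (sketch) -- reuse the groups/defaults
# definitions from the top-level pyarrow/conftest.py instead of keeping a
# second copy of them here.
from pyarrow.conftest import defaults, groups  # noqa: F401
```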

I will also check other comments and make changes. Thanks for reviewing!



# Save output files from doctest examples into temp dir
@pytest.fixture(autouse=True)
Member

For a follow-up, it might be possible to use a dynamic scope to ensure we don't have to run this for every test in /tests (in case that adds runtime to our tests).
Or alternatively, we could also override this fixture in /tests/conftest.py to be a no-op fixture, and then all tests in /tests should automatically use that version of the fixture.
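A rough sketch of that override idea (the fixture name here is hypothetical; it would have to match whatever the top-level autouse fixture is actually called):

```python
# python/pyarrow/tests/conftest.py (sketch) -- shadow the top-level autouse
# fixture with a no-op so the unit tests skip the temp-dir redirection.
import pytest


@pytest.fixture(autouse=True)
def doctest_output_dir():
    # no-op: regular unit tests don't need their output files redirected
    yield
```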

Member

I ran it with and without this fixture on test_arrays.py, and it doesn't give a noticeable difference, so probably not worth looking into making this "smarter" with a dynamic scope.

Member

@jorisvandenbossche left a comment

I added some minor comments, but maybe (since this is now green) we can also merge this and you can address those comments in #13216?

@AlenkaF
Member Author

AlenkaF commented May 25, 2022

Thanks! I will address the comments in this PR.
For the fixture scope follow-up, I will see if I manage to put it into #13216; otherwise I will create a JIRA for it.

@AlenkaF
Member Author

AlenkaF commented May 25, 2022

If the checks pass, this should be ready to merge @jorisvandenbossche.

@ursabot

ursabot commented May 25, 2022

Benchmark runs are scheduled for baseline = fe2ce20 and contender = 3b92f02. 3b92f02 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.19% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.37% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.08% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3b92f027 ec2-t3-xlarge-us-east-2
[Failed] 3b92f027 test-mac-arm
[Failed] 3b92f027 ursa-i9-9960x
[Finished] 3b92f027 ursa-thinkcentre-m75q
[Finished] fe2ce209 ec2-t3-xlarge-us-east-2
[Failed] fe2ce209 test-mac-arm
[Failed] fe2ce209 ursa-i9-9960x
[Finished] fe2ce209 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
