Conversation

@AlenkaF
Member

AlenkaF commented May 19, 2022

A series of 3 PRs adds doctest functionality to ensure that the docstring examples are actually correct (and stay correct).

This PR can be tested with pytest --doctest-modules python/pyarrow.
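As a quick illustration of what such a doctest run checks (a hypothetical module, not part of this PR): pytest imports each collected .py file and executes the interactive `>>>` examples found in its docstrings, comparing the actual output with the expected one.

```python
# example_module.py -- illustration only, not a file from this PR.
# `pytest --doctest-modules` imports each collected .py module and runs the
# interactive examples in its docstrings as tests.

def int_list_roundtrip():
    """Convert a Python list to a pyarrow Array and back.

    Examples
    --------
    >>> import pyarrow as pa
    >>> pa.array([1, 2, 3]).to_pylist()
    [1, 2, 3]
    """
```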

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@AlenkaF
Member Author

AlenkaF commented May 20, 2022

There are some examples I removed from ds.dataset that I will add back in a follow-up when I work on the docstring examples for Filesystems. They include, for example, reading from an S3 bucket, so I need a bit more time to find a good solution.

AlenkaF marked this pull request as ready for review May 20, 2022 08:09
@AlenkaF
Member Author

AlenkaF commented May 20, 2022

@raulcd @jorisvandenbossche the code now works with --doctest-modules (not sure about the AppVeyor pyarrow test error ...). I would suggest creating a new workflow task that only runs the doctests in a separate PR, following what Raul suggested when we talked. There I would remove the addopts option for doctest from setup.cfg.

@raulcd
Member

raulcd commented May 20, 2022

Hi @AlenkaF, creating a new job sounds good to me. I have checked out your PR locally and there are a couple of things that are not clear to me. If I try to run the tests locally with the current setup, python -m pytest python/pyarrow/tests/, no tests are collected:

$ python -m pytest python/pyarrow/tests/
============================================================== test session starts ===============================================================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/raulcd/open_source/arrow/python, configfile: setup.cfg
plugins: hypothesis-6.46.3, lazy-fixture-0.6.3
collected 0 items                                                                                                                                

============================================================= no tests ran in 0.02s ==============================================================

This is because the new conftest skips collecting tests if we are using --doctest-modules, which is the default setup.
This is solved if I run the tests using pytest -r s -v --pyargs pyarrow, but this is not how we have it documented in the developer guide.

I have also been able to reproduce the same AppVeyor failure if I remove the new conftest file; we might need some more investigation:

$ mv conftest.py no_conftest.py
$ pytest -r s -v  --pyargs pyarrow.tests
...
collected 4239 items / 2 errors / 6 skipped                                                                                                      

===================================================================== ERRORS =====================================================================
______________________________________________ ERROR collecting pyarrow/tests/deserialize_buffer.py ______________________________________________
tests/deserialize_buffer.py:24: in <module>
    with open(sys.argv[1], 'rb') as f:
E   FileNotFoundError: [Errno 2] No such file or directory: '-r'
______________________________________________ ERROR collecting pyarrow/tests/read_record_batch.py _______________________________________________
tests/read_record_batch.py:26: in <module>
    with open(sys.argv[1], 'rb') as f:
E   FileNotFoundError: [Errno 2] No such file or directory: '-r'
============================================================ short test summary info =============================================================
SKIPPED [2] tests/test_cuda.py:32: could not import 'pyarrow.cuda': No module named 'pyarrow._cuda'
SKIPPED [2] tests/test_cuda_numba_interop.py:23: could not import 'pyarrow.cuda': No module named 'pyarrow._cuda'
SKIPPED [2] tests/test_jvm.py:27: could not import 'jpype': No module named 'jpype'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================== 6 skipped, 2 errors in 2.69s ==========================================================

I would propose not adding --doctest-modules to the default pytest setup; this also solved the AppVeyor issue I was able to reproduce locally.

@AlenkaF
Member Author

AlenkaF commented May 20, 2022

Thanks @raulcd for reviewing and testing locally!

For the first issue: python -m pytest python/pyarrow/tests/ doesn't collect any tests, which is correct. The skipping of all the tests is added to the conftest file on purpose, as this feature should only check the docstring examples in the .py files and not the unit tests. Could you try to run python -m pytest python/pyarrow/ and let me know how it goes?

For the second issue: the errors that you get when removing the newly added conftest file are due to the fact that doctest collects all the .py files from pyarrow to run the doctests on. Because some modules were not built (in your case CUDA, for example), doctest complains. For this reason the new conftest file was added: it checks which modules are not installed and tells doctest to skip the files connected to these missing modules.

Removing --doctest-modules from the pytest setup only solved the issue because in that case the doctests didn't run at all. So we need the conftest file to skip the files connected to the missing modules, and then we should move --doctest-modules from the pytest setup to a new job where we would run something similar to pytest --doctest-modules python/pyarrow/.
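As an illustration of the mechanism described above (a sketch only, assuming pytest's collect_ignore hook; the module and file names are examples and not necessarily the exact list used in this PR):

```python
# conftest.py (sketch) -- tell doctest collection to skip files whose
# optional pyarrow modules are not built/installed in this environment.
collect_ignore = []

try:
    import pyarrow.cuda  # noqa: F401
except ImportError:
    # pyarrow was built without CUDA support, so importing cuda.py while
    # collecting doctests would fail.
    collect_ignore.append("cuda.py")

try:
    import pyarrow.flight  # noqa: F401
except ImportError:
    collect_ignore.append("flight.py")
```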

@AlenkaF
Member Author

AlenkaF commented May 20, 2022

Thanks for helping me understand your comment better, Raul, and sorry for not seeing the problem! Now I also get what Joris was trying to tell me yesterday :)

Yes, the tests are being skipped entirely due to this change, and that's not good at all.

I will remove --doctest-modules from the pytest setup and then this PR should be ready, I think. Then I will make a separate PR for --doctest-cython, similar to this one, and the last one will be the PR for the CI job. @jorisvandenbossche what do you think?

AlenkaF changed the title from "ARROW-16018: [Doc][Python] Run doctests on Python docstring examples" to "ARROW-16018: [Doc][Python] Run doctests on Python docstring examples (--doctest-modules)" May 20, 2022
Member

@jorisvandenbossche left a comment

We should maybe see if it would be possible to de-duplicate the groups/defaults definitions in both conftest.py files (I suppose that if all of them are defined in the top-level conftest.py, that should be fine for the tests as well)


try:
    from pyarrow.fs import S3FileSystem  # noqa
    defaults['fs'] = True
Member

Are there S3 examples in fs.py?

Member Author

I think there is just one (this line):

>>> copy_files("s3://your-bucket-name", "local-directory")

Member Author

Realised I will have to check these examples also (as they do not get checked locally for me at the moment).

@jorisvandenbossche
Member

I will remove --doctest-modules from the pytest setup and then this PR should be ready, I think. Then I will make a separate PR for --doctest-cython, similar to this one, and the last one will be the PR for the CI job. @jorisvandenbossche what do you think?

Yes, that sounds good. Although I think you can maybe do the CI job one before tackling the cython doctests; in that way we can directly test it in CI in the cython PR.

@AlenkaF
Member Author

AlenkaF commented May 23, 2022

I will do the CI one today and then we can decide which PR to close first =)

Member

@jorisvandenbossche left a comment

Did a pass through the doctest changes and added a few comments.

Further, I think we should still take a look at whether we can deduplicate some content of the conftest.py files:

We should maybe see if it would be possible to de-duplicate the groups/defaults definitions in both conftest.py files (I suppose that if all of them are defined in the top-level conftest.py, that should be fine for the tests as well)

@AlenkaF
Member Author

AlenkaF commented May 24, 2022

Further, I think we should still take a look at whether we can deduplicate some content of the conftest.py files:

We should maybe see if it would be possible to de-duplicate the groups/defaults definitions in both conftest.py files (I suppose that if all of them are defined in the top-level conftest.py, that should be fine for the tests as well)

Yes, totally agree; I had it in mind. There are some small differences, but I will try to put the code for the groups/defaults definitions together in the top-level (pyarrow) conftest.py file and leave the rest as is.
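One possible shape of that de-duplication, as a sketch only (assuming the definitions can simply be imported from the top-level conftest; not the actual change):

```python
# python/pyarrow/tests/conftest.py (sketch) -- reuse the groups/defaults
# definitions from the top-level pyarrow/conftest.py instead of keeping a
# second copy of them here.
from pyarrow.conftest import defaults, groups  # noqa: F401
```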

I will also check other comments and make changes. Thanks for reviewing!



# Save output files from doctest examples into temp dir
@pytest.fixture(autouse=True)
Member

For a follow-up, it might be possible to use a dynamic scope to ensure we don't have to run this for every test in /tests (in case that adds runtime to our tests).
Or alternatively, we could also override this fixture in /tests/conftest.py to be a no-op fixture, and then all tests in /tests should automatically use that version of the fixture.
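A rough sketch of that override idea (the fixture name here is hypothetical; it would have to match whatever the top-level autouse fixture is actually called):

```python
# python/pyarrow/tests/conftest.py (sketch) -- shadow the top-level autouse
# fixture with a no-op so the unit tests skip the temp-dir redirection.
import pytest


@pytest.fixture(autouse=True)
def doctest_output_dir():
    # no-op: regular unit tests don't need their output files redirected
    yield
```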

Member

I ran it with and without this fixture on test_arrays.py, and it doesn't give a noticeable difference, so probably not worth looking into making this "smarter" with a dynamic scope.

Member

@jorisvandenbossche left a comment

I added some minor comments, but maybe (since this is now green) we can also merge this and you can address those comments in #13216?

@AlenkaF
Member Author

AlenkaF commented May 25, 2022

Thanks! I will address the comments in this PR.
For the fixture scope follow-up, I will see if I manage to put it into #13216; otherwise I will create a JIRA for it.

@AlenkaF
Member Author

AlenkaF commented May 25, 2022

If the checks pass, this should be ready to merge @jorisvandenbossche.

@ursabot

ursabot commented May 25, 2022

Benchmark runs are scheduled for baseline = fe2ce20 and contender = 3b92f02. 3b92f02 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.19% ⬆️0.0%] test-mac-arm
[Failed ⬇️0.37% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.08% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3b92f027 ec2-t3-xlarge-us-east-2
[Failed] 3b92f027 test-mac-arm
[Failed] 3b92f027 ursa-i9-9960x
[Finished] 3b92f027 ursa-thinkcentre-m75q
[Finished] fe2ce209 ec2-t3-xlarge-us-east-2
[Failed] fe2ce209 test-mac-arm
[Failed] fe2ce209 ursa-i9-9960x
[Finished] fe2ce209 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
