Skip to content

dataorigin branch#17839

Merged
bsipocz merged 11 commits intoastropy:mainfrom
gilleslandais:dataorigin
Apr 25, 2025
Merged

dataorigin branch#17839
bsipocz merged 11 commits intoastropy:mainfrom
gilleslandais:dataorigin

Conversation

@gilleslandais
Copy link
Contributor

Hello,
The pull request includes a simple API to extract basic provenance information from VOTAble header.

Metadata are described in the IVOA Note https://ivoa.net/documents/DataOrigin/index.html (@gilleslandais, @augustfly, @msdemlei, @rsav)

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2025

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to Astropy 👋 and congratulations on your first pull request! 🎉

A project member will respond to you as soon as possible; in the meantime, please have a look over the Checklist for Contributed Code and make sure you've addressed as many of the questions there as possible.

If you feel that this pull request has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail feedback@astropy.org.

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello and thanks!

At first glance, the Sphinx docstring and user doc need some clean up (e.g., remove extra indentation, blank line after colon before listing, whitespace after >>>, and so on). Please use existing doc in the repo as reference.

Your new module also need an __all__ that only contains public API.

If you have a downstream PR ready to use this, please cross link here also.

@pllim pllim added this to the v7.1.0 milestone Mar 3, 2025
@pllim pllim requested review from bsipocz and tomdonaldson March 3, 2025 14:59
@gilleslandais gilleslandais force-pushed the dataorigin branch 4 times, most recently from ba07eaa to cda8522 Compare March 3, 2025 17:37
@pllim
Copy link
Member

pllim commented Mar 3, 2025

There are warnings in your doc build that must be addressed.

  • docs/io/votable/dataorigin.rst:61: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]
  • docs/io/votable/dataorigin.rst:71: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]
  • docs/io/votable/index.rst:9: WARNING: duplicate label astropy-io-votable, other instance in docs/io/votable/dataorigin.rst
  • docs/io/votable/dataorigin.rst: WARNING: document isn't included in any toctree [toc.not_included]
  • docs/io/votable/dataorigin.rst:61: WARNING: py:obj reference target not found: astropy.io.votable.DataOrigin [ref.obj]
  • docs/io/votable/dataorigin.rst:71: WARNING: py:obj reference target not found: astropy.io.votable.DataOrigin [ref.obj]
  • docs/io/votable/dataorigin.rst:83: WARNING: py:obj reference target not found: astropy.io.votable.DataOrigin [ref.obj]

@gilleslandais gilleslandais force-pushed the dataorigin branch 3 times, most recently from 9c07312 to d2011a7 Compare March 4, 2025 12:42
@gilleslandais
Copy link
Contributor Author

There are warnings in your doc build that must be addressed.

  • docs/io/votable/dataorigin.rst:61: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]
  • docs/io/votable/dataorigin.rst:71: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]
  • docs/io/votable/index.rst:9: WARNING: duplicate label astropy-io-votable, other instance in docs/io/votable/dataorigin.rst
  • docs/io/votable/dataorigin.rst: WARNING: document isn't included in any toctree [toc.not_included]
  • docs/io/votable/dataorigin.rst:61: WARNING: py:obj reference target not found: astropy.io.votable.DataOrigin [ref.obj]
  • docs/io/votable/dataorigin.rst:71: WARNING: py:obj reference target not found: astropy.io.votable.DataOrigin [ref.obj]
  • docs/io/votable/dataorigin.rst:83: WARNING: py:obj reference target not found: astropy.io.votable.DataOrigin [ref.obj]

I removed warnings except 2 alerts that I have not managed to delete :

/home/docs/checkouts/readthedocs.org/user_builds/astropy/checkouts/17839/docs/io/votable/dataorigin.rst:5: WARNING: duplicate label astropy-io-votable-dataorigin, other instance in /home/docs/checkouts/readthedocs.org/user_builds/astropy/checkouts/17839/docs/io/votable/dataorigin.rst
/home/docs/checkouts/readthedocs.org/user_builds/astropy/checkouts/17839/docs/io/votable/dataorigin.rst:4: WARNING: py:obj reference target not found: astropy.io.votable.dataorigin [ref.obj]

What is wrong?

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have to add dataorigin API doc to docs/io/votable/ref_api.rst.

These are mostly formatting comments. I will see if I can make sense of the remaining RTD warnings in a separate thread.

Copy link
Member

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: py:obj reference target not found: astropy.io.votable.dataorigin

I think this will be fixed after your addition to ref_api.rst.

@gilleslandais gilleslandais force-pushed the dataorigin branch 6 times, most recently from ff02637 to 10029d6 Compare March 5, 2025 17:59
@gilleslandais
Copy link
Contributor Author

Hi,
I updated the python doc (according to the review)
I think that it is ok.
What's the next step?
Regards
Gilles

Copy link
Contributor

@tomdonaldson tomdonaldson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few non-blocker comments, but overall this looks pretty good. I leave it to @bsipocz to add some more comments and make changes as needed to meet the code freeze.

Comment on lines +44 to +45
dores = dataorigin.extract_data_origin(vot.resources[0])
dot = dataorigin.extract_data_origin(vot.resources[0].tables[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these used anywhere?

Exception
input type not managed
"""
data_origin = DataOrigin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible refactor: Have each of these types implement extract_data_origin_info() (or similar name), and have them inherit from a common ancestor which specifies that abstract method.

from astropy.table import Column, Table


def __generate_votable_test():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the programmatic construction of a VOTable here, it would be nice to have at least one actual VOTable added to the data area to be parsed and tested here. This would do a deeper tests than what is currently happening in the doc tests.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, the two major ones that I think needs to be fixed prior merging are changing the type of exceptions being raised, and the other is to fix up the narrative docs with using more remote-data directives, etc.

astropy.io.votable.tree.Table,
),
):
raise Exception("Bad type of vot_element")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be a more specific exception, basically we must never raise a plain generic Exception


elif info_name in DATAORIGIN_QUERY_INFO:
if not isinstance(vot_element, astropy.io.votable.tree.VOTableFile):
raise Exception(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't raise Exception

vot_element.infos.extend([new_info])
return

raise Exception("Unknown DataOrigin info name")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here, too

Comment on lines +44 to +45
dores = dataorigin.extract_data_origin(vot.resources[0])
dot = dataorigin.extract_data_origin(vot.resources[0].tables[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering, how do unused variables pass codestyle, is that expected @pllim?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a few more asserts here anyway, e.g.

assert dores.origin[0].creator == do.origin[0].creator

Comment on lines +46 to +47
for dseto in do:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we just testing if this is iterable?

do = dataorigin.extract_data_origin(vot)

dores = dataorigin.extract_data_origin(vot.resources[0])
dot = dataorigin.extract_data_origin(vot.resources[0].tables[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to extract nothing?

In [27]: print(dot)




In [28]: print(dot.origin)
[]

In [29]: print(dot.query)


In [30]: 

assert len(do.origin) == 1 and len(do.origin[0].creator) == 2
assert do.origin[0].creator[0] == __TEST_CREATOR1_NAME
assert len(str(do)) > 1
assert (do.origin[0].get_votable_element()) != None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!= None should not pass codestyle again, right @pllim

>>> # get DataOrigin with the description of each INFO
>>> for dataset_origin in data_origin.origin: # doctest: +REMOTE_DATA
... for info in dataset_origin.infos: # doctest: +REMOTE_DATA
... print(f"{info.name}: {info.value} ({info.content})") # doctest: +REMOTE_DATA +IGNORE_OUTPUT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should try and drop +IGNORE_OUTPUT in some of these tests, so they can test more than for basic crashes. E.g. I feel this one would be a safe one to drop (and ultimately once more tests are added in the module itself (see Tom's comment), then we could tune back on doctesting if it turns out to be too flaky

Comment on lines +90 to +93
.. code-block:: python

# get the Title retrieved in Element
>> origin = data_origin.origin[0] # doctest: +REMOTE_DATA
>> vo_elt = origin.get_votable_element() # doctest: +REMOTE_DATA
>> title = vo_elt.description if vo_elt else "" # doctest: +REMOTE_DATA
>> print(f"APA: {','.join(origin.creator)} ({origin.publication_date[0]}). {title} [Dataset]. {data_origin.query.publisher}. {origin.citation[0]}") # doctest: +REMOTE_DATA +IGNORE_OUTPUT
APA: Hong K. (2024-11-06). Period variations of 32 contact binaries (Hong+, 2024) [Dataset]. CDS. doi:10.26093/cds/vizier.51670018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generic comment: drop the code-block directive and use remote-data instead, that would cleanup all the lines a little bit

@@ -0,0 +1,514 @@
# Licensed under a 3-clause BSD style license - see LICENSE.rst

"""Extract Data Origin in VOTable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't show up anywhere in the docs, so I would recommend ditching it altogether in favour of narrative documentation and docstrings in the classes/functions.

@bsipocz
Copy link
Member

bsipocz commented Apr 25, 2025

This also needs a rebase to fix up the docs and what's new, I'll start with that along with a patch for the docs addressing the comments above.

@bsipocz
Copy link
Member

bsipocz commented Apr 25, 2025

@pllim - could you have a look at the raised exception types here, do they make sense to you or shall change them to something else?

@pllim
Copy link
Member

pllim commented Apr 25, 2025

could you have a look at the raised exception types

At a glance, they look fine to me.

I restarted RTD. 🤞

@bsipocz
Copy link
Member

bsipocz commented Apr 25, 2025

RTD failure is related, I'm afraid something gone wrong during my rebase. I'm trying to fix it locally and then push a fix for it.

Copy link
Contributor

@tomdonaldson tomdonaldson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending possible last minute CI work from @bsipocz and @pllim this look good to me.

@bsipocz bsipocz merged commit 2fe1909 into astropy:main Apr 25, 2025
25 of 27 checks passed
@bsipocz
Copy link
Member

bsipocz commented Apr 25, 2025

Thank you so much @gilleslandais for this PR and also thank you for your patience. I'm sorry that we ended up hijacking the PR and addressing our own review comments, but I hope you don't find any of the ones I fixed up controversial.

I'll open follow-up issues for the two remaining items from Tom's review, neither of these are release blockers but would be nice to haves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants