Add the GMTSampleData class to simplify the load_sample_data and list_sample_data functions (#2342) by seisman · Pull Request #2342 · GenericMappingTools/pygmt

seisman · 2023-02-01T00:03:46Z

Description of proposed changes

It's easier to review this PR by looking at the changes in each commit:

22a8e48: Move load_sample_data and list_sample_data to the end. Required by the changes in commit 4641649.
4641649: Add a GMTSampleData class, similar to the GMTRemoteData class used in load_remote_dataset.py. This class has two attributes: the function that loads the dataset and the description of the dataset. With the GMTSampleData class, the load_sample_data and list_sample_data functions are greatly simplified, and we no longer need to maintain two dictionaries that have the same keys.
ea9132c: Add a fmt_dataset_list decorator that automatically inserts the list of available datasets into the docstring of the load_sample_data function, to address "Improve the documentation of load_sample_data and list_sample_data" (#1774).
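
To illustrate the idea behind commit 4641649, here is a minimal sketch, not the actual pygmt code. It assumes the class is a NamedTuple with fields named func and description; the two loader functions are hypothetical placeholders that return strings instead of real data.

```python
from typing import Callable, NamedTuple


class GMTSampleData(NamedTuple):
    """Pair the loader function of a dataset with its description."""

    func: Callable
    description: str


# Hypothetical loaders: placeholders standing in for the real loading code.
def _load_bathymetry():
    return "placeholder for the bathymetry table"


def _load_maunaloa_co2():
    return "placeholder for the Mauna Loa CO2 table"


# A single dictionary replaces the two dictionaries that shared the same keys.
datasets = {
    "bathymetry": GMTSampleData(
        func=_load_bathymetry,
        description="Table of ship bathymetric observations off Baja California",
    ),
    "maunaloa_co2": GMTSampleData(
        func=_load_maunaloa_co2,
        description="Table of CO2 readings from Mauna Loa",
    ),
}


def list_sample_data():
    """Return a mapping of dataset names to descriptions."""
    return {name: dataset.description for name, dataset in datasets.items()}


def load_sample_data(name):
    """Look up the dataset by name and call its loader."""
    if name not in datasets:
        raise ValueError(f"Invalid dataset name '{name}'.")
    return datasets[name].func()
```

With this layout, adding a dataset means adding one dictionary entry; both public functions pick it up without further changes.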

Preview: https://pygmt-dev--2342.org.readthedocs.build/en/2342/api/generated/pygmt.datasets.load_sample_data.html

Fixes #1774 and supersedes #1814.

Reminders

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst.
Write detailed docstrings for all functions/methods.
If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
If adding new functionality, add an example to docstrings or tutorials.
Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

/format: automatically format and lint the code
/test-gmt-dev: run full tests on the latest GMT development version

seisman · 2023-02-01T00:19:40Z

pygmt/datasets/samples.py

+    texts = "\n        ".join(
+        f'- ``"{name}"``: {dataset.description}.' for name, dataset in datasets.items()
+    )
+
+    docstrings = module_func.__doc__.format(dataset_list=texts)
+    module_func.__doc__ = textwrap.dedent(docstrings)
+    return module_func


This decorator inserts the list of available datasets into the docstring of the load_sample_data function by substituting the {dataset_list} placeholder with the following text:

- ``"bathymetry"``: Table of ship bathymetric observations off Baja California
- ``"earth_relief_holes"``: Regional 20 arc-minutes Earth relief grid with holes
- ...

However, because the placeholder {dataset_list} is indented by 8 spaces (see Line 341), substituting the placeholder with this text would give:

        - ``"bathymetry"``: Table of ship bathymetric observations off Baja California
- ``"earth_relief_holes"``: Regional 20 arc-minutes Earth relief grid with holes
- ...

So at line 302, I have to use "\n        " (a newline followed by 8 spaces) to join the list, which is not ideal, but I don't have a better solution for this.
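
The indentation issue described above can be reproduced in isolation. The sketch below is an illustration, not the PR's code: TEMPLATE is a hypothetical stand-in for the part of the docstring that holds the {dataset_list} placeholder, and only two descriptions are used.

```python
import textwrap

# Hypothetical stand-in for a docstring fragment; the placeholder is 8 spaces deep.
TEMPLATE = (
    "    name : str\n"
    "        Name of the dataset to load. Available datasets:\n"
    "\n"
    "        {dataset_list}\n"
)

descriptions = {
    "bathymetry": "Table of ship bathymetric observations off Baja California",
    "earth_relief_holes": "Regional 20 arc-minutes Earth relief grid with holes",
}
items = [f'- ``"{name}"``: {text}' for name, text in descriptions.items()]

# Joining with a bare newline indents only the first item (it inherits the
# indentation of the placeholder); every following item starts at column 0.
naive = TEMPLATE.format(dataset_list="\n".join(items))

# Joining with a newline plus 8 spaces keeps all items aligned with the placeholder.
padded = TEMPLATE.format(dataset_list=("\n" + " " * 8).join(items))

# textwrap.dedent then removes the 4 spaces common to all non-blank lines.
dedented = textwrap.dedent(padded)
```

In the naive variant the unindented items also leave textwrap.dedent with no common indentation to remove, which is why the join string has to carry the padding.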

willschlitzer · 2023-02-01T12:46:30Z

pygmt/datasets/samples.py

+    --------
+    load_sample_data : Load an example dataset from the GMT server.
+    """
+    return {name: dataset.description for name, dataset in datasets.items()}


Should there be a test_list_sample_data that has a dictionary of all of the functions/descriptions to test line 324?

Are you suggesting a test to check if the returned value is equal to:

{'bathymetry': 'Table of ship bathymetric observations off Baja California', 'earth_relief_holes': 'Regional 20 arc-minutes Earth relief grid with holes', 'fractures': 'Table of hypothetical fracture lengths and azimuths', 'hotspots': 'Table of locations, names, and symbol sizes of hotpots from Mueller et al., 1993', 'japan_quakes': 'Table of earthquakes around Japan from NOAA NGDC database', 'mars_shape': 'Table of topographic signature of the hemispheric dichotomy of Mars from Smith and Zuber (1996)', 'maunaloa_co2': 'Table of CO2 readings from Mauna Loa', 'notre_dame_topography': 'Table 5.11 in Davis: Statistics and Data Analysis in Geology', 'ocean_ridge_points': 'Table of ocean ridge points for the entire world', 'rock_compositions': 'Table of rock sample compositions', 'usgs_quakes': 'Table of global earthquakes from the USGS'}

In 518900d, I added a test to check that the returned value is a dict.

I think that's fine; I was just going off of the codecov alert saying that line 324 was untested. My thought had been to compare a list of the dataset names with dict.keys() to avoid a very lengthy comparison, but I think your way works better.
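
For reference, the two test styles discussed here could look like the sketch below. list_sample_data is a local stand-in trimmed to two entries so that the snippet runs without pygmt; the test names are illustrative and not necessarily those used in 518900d.

```python
def list_sample_data():
    """Stand-in for pygmt.datasets.list_sample_data, trimmed to two entries."""
    return {
        "bathymetry": "Table of ship bathymetric observations off Baja California",
        "usgs_quakes": "Table of global earthquakes from the USGS",
    }


def test_list_sample_data_type():
    """Check only the type of the returned value."""
    assert isinstance(list_sample_data(), dict)


def test_list_sample_data_names():
    """Check the dataset names without comparing the lengthy descriptions."""
    expected_names = ["bathymetry", "usgs_quakes"]
    assert sorted(list_sample_data().keys()) == expected_names
```

The type check never needs updating when a dataset is added; the name check does, but it also notices a dataset that is removed or renamed by accident.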

seisman · 2023-02-03T14:46:45Z

@GenericMappingTools/pygmt-maintainers It would be good if we can have one more reviewer since this PR brings some big changes.

pygmt/datasets/samples.py

maxrjones

Thanks for working on this, it'll be a big improvement!

My preference would be to document the list of datasets manually as the output of a docstring example for list_sample_data (in which case the doctest should catch whether it's out of date). I would prefer to avoid the decorator/docstring injection trick when possible because it makes the docstrings less interpretable for IDEs like VSCode.
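
A small, self-contained illustration of how a doctest catches an outdated listing. Both listing functions are hypothetical stand-ins, not pygmt code, and count_doctest_failures is a helper made up for this sketch on top of the standard-library doctest module.

```python
import doctest


def list_current():
    """
    Stand-in whose documented output matches what it returns.

    >>> list_current()
    {'maunaloa_co2': 'Table of CO2 readings from Mauna Loa'}
    """
    return {"maunaloa_co2": "Table of CO2 readings from Mauna Loa"}


def list_stale():
    """
    Stand-in whose documented output was not updated after a dataset was added.

    >>> list_stale()
    {'maunaloa_co2': 'Table of CO2 readings from Mauna Loa'}
    """
    return {
        "maunaloa_co2": "Table of CO2 readings from Mauna Loa",
        "usgs_quakes": "Table of global earthquakes from the USGS",
    }


def count_doctest_failures(func):
    """Run the examples in the docstring of func and return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        # Discard the failure report; only the count matters here.
        failed += runner.run(test, out=lambda text: None).failed
    return failed
```

Here count_doctest_failures(list_current) is 0, while count_doctest_failures(list_stale) is 1, which is the signal a doctest run in CI would raise.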

seisman · 2023-02-06T02:28:41Z

> I would prefer to avoid the decorator/docstring injection trick when possible because it makes the docstrings less interpretable for IDEs like VSCode.

That's a good point.

> My preference would be to document the list of datasets manually as the output of a docstring example for list_sample_data (in which case the doctest should catch whether it's out of date).

I'm OK with adding the output of list_sample_data as a docstring example, but instead of adding it to the list_sample_data function, I feel it's more useful to add it to the load_sample_data function, i.e., in the load_sample_data function:

>>> # use list_sample_data to see the available datasets
>>> list_sample_data()
{'bathymetry': 'Table of ship bathymetric observations off Baja California',
 'earth_relief_holes': 'Regional 20 arc-minutes Earth relief grid with holes',
 'fractures': 'Table of hypothetical fracture lengths and azimuths',
 'hotspots': 'Table of locations, names, and symbol sizes of hotpots from  Mueller et al., 1993',
 'japan_quakes': 'Table of earthquakes around Japan from NOAA NGDC database',
 'mars_shape': 'Table of topographic signature of the hemispheric dichotomy of  Mars from Smith and Zuber (1996)',
 'maunaloa_co2': 'Table of CO2 readings from Mauna Loa',
 'notre_dame_topography': 'Table 5.11 in Davis: Statistics and Data Analysis in Geology',
 'ocean_ridge_points': 'Table of ocean ridge points for the entire world',
 'rock_compositions': 'Table of rock sample compositions',
 'usgs_quakes': 'Table of global earthquakes from the USGS'}
>>> # load the sample bathymetry dataset
>>> data = load_sample_data("bathymetry")

maxrjones · 2023-02-06T16:07:40Z

sounds good to me

seisman · 2023-02-11T16:27:38Z

I tried to follow @maxrjones's suggestion and added a doctest for list_sample_data(), but things are becoming more complicated.

By default, list_sample_data returns a big unformatted dictionary, which is not readable and has to be written on a single line:

{'bathymetry': 'Table of ship bathymetric observations off Baja California', 'earth_relief_holes': 'Regional 20 arc-minutes Earth relief grid with holes', 'fractures': 'Table of hypothetical fracture lengths and azimuths', 'hotspots': 'Table of locations, names, and symbol sizes of hotpots from  Mueller et al., 1993', 'japan_quakes': 'Table of earthquakes around Japan from NOAA NGDC database', 'mars_shape': 'Table of topographic signature of the hemispheric dichotomy of  Mars from Smith and Zuber (1996)', 'maunaloa_co2': 'Table of CO2 readings from Mauna Loa', 'notre_dame_topography': 'Table 5.11 in Davis: Statistics and Data Analysis in Geology', 'ocean_ridge_points': 'Table of ocean ridge points for the entire world', 'rock_compositions': 'Table of rock sample compositions', 'usgs_quakes': 'Table of global earthquakes from the USGS'}

Then I tried pprint(list_sample_data()) instead. It prints:

>>> pprint(list_sample_data())
{'bathymetry': 'Table of ship bathymetric observations off Baja California',
 'earth_relief_holes': 'Regional 20 arc-minutes Earth relief grid with holes',
 'fractures': 'Table of hypothetical fracture lengths and azimuths',
 'hotspots': 'Table of locations, names, and symbol sizes of hotpots from  '
             'Mueller et al., 1993',
 'japan_quakes': 'Table of earthquakes around Japan from NOAA NGDC database',
 'mars_shape': 'Table of topographic signature of the hemispheric dichotomy '
               'of  Mars from Smith and Zuber (1996)',
 'maunaloa_co2': 'Table of CO2 readings from Mauna Loa',
 'notre_dame_topography': 'Table 5.11 in Davis: Statistics and Data Analysis '
                          'in Geology',
 'ocean_ridge_points': 'Table of ocean ridge points for the entire world',
 'rock_compositions': 'Table of rock sample compositions',
 'usgs_quakes': 'Table of global earthquakes from the USGS'}

Now it looks better, but pylint reports that some lines are longer than 79 characters. So I have only two options:

Option 1: use pprint(list_sample_data(), width=75) so that every line is shorter than 80 characters:

{'bathymetry': 'Table of ship bathymetric observations off Baja '
               'California',
 'earth_relief_holes': 'Regional 20 arc-minutes Earth relief grid with '
                       'holes',
 'fractures': 'Table of hypothetical fracture lengths and azimuths',
 'hotspots': 'Table of locations, names, and symbol sizes of hotpots from '
             'Müller et al. (1993)',
 'japan_quakes': 'Table of earthquakes around Japan from the NOAA NGDC '
                 'database',
 'mars_shape': 'Table of topographic signature of the hemispheric '
               'dichotomy of Mars from Smith and Zuber (1996)',
 'maunaloa_co2': 'Table of CO2 readings from Mauna Loa',
 'notre_dame_topography': 'Table 5.11 in Davis: Statistics and Data '
                          'Analysis in Geology',
 'ocean_ridge_points': 'Table of ocean ridge points for the entire world',
 'rock_compositions': 'Table of rock sample compositions',
 'usgs_quakes': 'Table of earthquakes from the USGS'}

Option 2: use pprint(list_sample_data(), width=120) so that each entry is printed on a single line:

{'bathymetry': 'Table of ship bathymetric observations off Baja California',
 'earth_relief_holes': 'Regional 20 arc-minutes Earth relief grid with holes',
 'fractures': 'Table of hypothetical fracture lengths and azimuths',
 'hotspots': 'Table of locations, names, and symbol sizes of hotpots from Müller et al. (1993)',
 'japan_quakes': 'Table of earthquakes around Japan from the NOAA NGDC database',
 'mars_shape': 'Table of topographic signature of the hemispheric dichotomy of Mars from Smith and Zuber (1996)',
 'maunaloa_co2': 'Table of CO2 readings from Mauna Loa',
 'notre_dame_topography': 'Table 5.11 in Davis: Statistics and Data Analysis in Geology',
 'ocean_ridge_points': 'Table of ocean ridge points for the entire world',
 'rock_compositions': 'Table of rock sample compositions',
 'usgs_quakes': 'Table of earthquakes from the USGS'}
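
The two settings can be compared with pprint.pformat, which returns the formatted text instead of printing it. This is a minimal sketch using three of the descriptions above; width is the pprint parameter that limits the line length, whereas indent only sets the indentation added per nesting level.

```python
from pprint import pformat

descriptions = {
    "bathymetry": "Table of ship bathymetric observations off Baja California",
    "mars_shape": (
        "Table of topographic signature of the hemispheric dichotomy of Mars "
        "from Smith and Zuber (1996)"
    ),
    "maunaloa_co2": "Table of CO2 readings from Mauna Loa",
}

# width=75: values that do not fit are wrapped onto continuation lines.
narrow = pformat(descriptions, width=75)

# width=120: every entry fits on its own line.
wide = pformat(descriptions, width=120)

longest_narrow = max(len(line) for line in narrow.splitlines())
longest_wide = max(len(line) for line in wide.splitlines())
```

The narrow form stays within the linter limit at the cost of split strings; the wide form keeps one entry per line but needs the line-length checks to be silenced.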

I chose option 2 because in my opinion it's more readable, but I have to add # noqa: W505 and # pylint: disable=line-too-long to disable the flakeheaven and pylint errors:

pygmt/datasets/samples.py

michaelgrund

Looks fine!

seisman added 3 commits February 1, 2023 07:53

Move load_sample_dataset and list_sample_dataset to the end

22a8e48

Add the GMTSampleData class to simplify the load_sample_data and list_sample_data functions

4641649

Add the fmt_dataset_list decorator to list available datasets in docstrings

ea9132c

seisman added the maintenance Boring but important stuff for the core devs label Feb 1, 2023

seisman added this to the 0.9.0 milestone Feb 1, 2023

seisman commented Feb 1, 2023

Fix a typo "rock_sample_compositions"->"rock_compositions"

96ddfd0

seisman commented Feb 1, 2023

seisman requested a review from maxrjones February 1, 2023 01:00

seisman changed the title ~~Simplify load_sample_data and list_sample_data functions~~ Add the GMTSampleData class to simplify the load_sample_data and list_sample_data functions Feb 1, 2023

seisman added the needs review This PR has higher priority and needs review. label Feb 1, 2023

Merge branch 'main' into simplify-load-samples

f768973

willschlitzer reviewed Feb 1, 2023

Update pygmt/datasets/samples.py

937ac6f

Co-authored-by: Will Schlitzer <schlitzer90@gmail.com>

maxrjones mentioned this pull request Feb 1, 2023

WIP: Add inline example for pygmt.datasets.list_sample_data #1814

Closed

6 tasks

Add a test for list_sample_data

518900d

willschlitzer approved these changes Feb 2, 2023

willschlitzer added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Feb 2, 2023

Merge branch 'main' into simplify-load-samples

c1390d1

seisman added 2 commits February 4, 2023 19:24

Merge branch 'main' into simplify-load-samples

d4ebcec

Merge branch 'main' into simplify-load-samples

a14553f

yvonnefroehlich reviewed Feb 4, 2023

maxrjones reviewed Feb 4, 2023

Apply suggestions from code review

3376d30

Co-authored-by: Yvonne Fröhlich <94163266+yvonnefroehlich@users.noreply.github.com>

michaelgrund approved these changes Feb 7, 2023

Merge branch 'main' into simplify-load-samples

01c9267

seisman added 2 commits February 10, 2023 21:25

Remove the fmt_dataset_list decorator

664c123

Add inline example

ef5ca21

seisman marked this pull request as draft February 11, 2023 03:04

seisman removed the final review call This PR requires final review and approval from a second reviewer label Feb 11, 2023

seisman added 4 commits February 11, 2023 18:18

Merge branch 'main' into simplify-load-samples

b6e622f

Ignore the 'W505 doc line too long' warning

47e413b

Print in a long line

838c33f

Disable the pylint line-too-long error

dfc3b8e

seisman marked this pull request as ready for review February 11, 2023 16:27

seisman added the needs review This PR has higher priority and needs review. label Feb 11, 2023

seisman requested review from maxrjones, michaelgrund, willschlitzer and yvonnefroehlich February 11, 2023 16:29

seisman commented Feb 12, 2023

Update pygmt/datasets/samples.py

c838644

seisman commented Feb 12, 2023

Update pygmt/datasets/samples.py

c88688c

michaelgrund approved these changes Feb 12, 2023

seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Feb 13, 2023

seisman merged commit 454380a into main Feb 14, 2023

seisman deleted the simplify-load-samples branch February 14, 2023 07:24

seisman removed the final review call This PR requires final review and approval from a second reviewer label Feb 14, 2023
