Increase number per items on CSV, JSON searches #255

vinisalazar · 2022-06-20T05:49:18Z

Hi,

this is in relation to #244. I wasn't sure of the best way to go about this, so I tried keeping it as simple as possible. Please let me know if there's an obvious way that I'm missing.

What I did was simply to set the number of items_per_page to a big number (initially it was 1e9, but I thought that was excessive?) if the response is either 'csv' or 'json'. This doesn't require actually making the request, it just builds the URL. I couldn't find a "number of records" variable in the search response. I suppose I could set the items_per_page variable to that, but that would require making the request, and I understand that we just want the URL. Removing the items_per_page or page variables from the URL didn't make any difference.
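As a rough illustration of the approach described above, the sketch below builds a search URL and swaps in a large page size for CSV and JSON responses, without making any request. The function name `build_search_url`, the two constants, and the exact query layout are assumptions made for illustration; they are not taken from erddapy's code.

```python
from urllib.parse import urlencode

# Hypothetical names: response types that should return every match at once,
# and the "big number" used in place of a real record count.
NON_PAGINATED_RESPONSES = {"csv", "json"}
LARGE_ITEMS_PER_PAGE = int(1e6)


def build_search_url(server, search_for, response="html", items_per_page=1000, page=1):
    """Build a search URL as a string; no HTTP request is made."""
    # Override the page size only for the machine-readable response types.
    if response in NON_PAGINATED_RESPONSES:
        items_per_page = LARGE_ITEMS_PER_PAGE
    query = urlencode(
        {"page": page, "itemsPerPage": items_per_page, "searchFor": search_for}
    )
    return f"{server.rstrip('/')}/search/index.{response}?{query}"


print(build_search_url("https://example.org/erddap", "glider", response="csv"))
```

The trade-off is the one raised above: the page size is a guess rather than the actual number of records, but the URL can be built without contacting the server.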

To test, I added a fixture for the gliders server, since the other two had fewer than 1000 records. I did not commit the cassettes because they turned out fairly large, at around 3MB each. Do you have any suggestions on what to do about this?

Summary of changes:

- Modify variable items_per_page if response is either 'csv' or 'json'
- Add new fixture gliders to test_to_objects.py
- Add tests test_csv_search and test_json_search

Please feel free to request any changes, and don't forget to add the gsoc-2022 label :)

Thank you,
Vini

…s#244) - Set items_per_page variable to 1e6 if response is either csv or json

- Add new fixture 'gliders' - has more than 1000 datasets - Adds tests 'test_csv_search' and 'test_json_search' that check if the response has more than 1000 records

vinisalazar · 2022-06-21T03:56:42Z

I think that the test failing here may be fixed by #253.

abkfenris · 2022-06-21T14:59:09Z

While you could write a mock response generator, you would then be testing the mock more than .get_search_url().

In this case, I think we can skip calling the endpoints, and instead inspect the URL returned from .get_search_url() directly to make sure that it is formatted differently based on the response type.
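A URL-only check along these lines might look like the sketch below. It assumes the page size appears as an `itemsPerPage` query parameter; the helper names and the hand-written example URLs are hypothetical, and a real test would obtain the URLs from .get_search_url() with different response types.

```python
from urllib.parse import parse_qs, urlparse


def items_per_page(url: str) -> int:
    """Return the itemsPerPage value encoded in a search URL's query string."""
    query = parse_qs(urlparse(url).query)
    return int(query["itemsPerPage"][0])


def check_page_size_differs(html_url: str, csv_url: str) -> None:
    """Assert that the CSV URL asks for more items than the HTML one."""
    assert items_per_page(csv_url) > items_per_page(html_url)


# Example URLs written by hand for illustration only.
html_url = "https://example.org/erddap/search/index.html?page=1&itemsPerPage=1000&searchFor=glider"
csv_url = "https://example.org/erddap/search/index.csv?page=1&itemsPerPage=1000000&searchFor=glider"
check_page_size_differs(html_url, csv_url)
```

Because nothing is fetched, a test like this needs no cassette and does not depend on any server being reachable.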

Additionally, we could keep these tests alongside the URL tests, mark them, and have pytest skip them by default. Then maybe we set up a weekly cron-driven Actions workflow to run those tests and update the VCR cassettes for a fuller test run?
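One way to get the "marked and skipped by default" behaviour is a small `conftest.py` using pytest's collection hooks, sketched below. The marker name `network` and the `--run-network` flag are hypothetical choices, not something this repository is known to define.

```python
# conftest.py (sketch): skip tests marked "network" unless --run-network is given.
# The marker name and the command-line flag are hypothetical.


def pytest_addoption(parser):
    # Add an opt-in flag for the slow tests that call live servers.
    parser.addoption(
        "--run-network",
        action="store_true",
        default=False,
        help="also run tests that call live servers",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "network: test calls a live server")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-network"):
        return  # flag given: run everything
    import pytest  # imported lazily; only needed when tests are skipped

    skip_network = pytest.mark.skip(reason="needs --run-network")
    for item in items:
        if "network" in item.keywords:
            item.add_marker(skip_network)
```

With this in place, a test decorated with `@pytest.mark.network` is skipped in a plain `pytest` run, and a scheduled workflow could invoke `pytest --run-network` to exercise it.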

vinisalazar · 2022-06-21T23:38:43Z

Hi @abkfenris, sounds good to me, I'll add new tests that check the URL. However, I'm still thinking we should drop the test on the CoastWatch server? That seems to be behaving inconsistently. Please let me know what you think works best.

Thanks,
Vini

abkfenris · 2022-06-22T00:17:36Z

Ya, avoiding the CoastWatch server might be best, given Bob's aggressive blocklist.

vinisalazar · 2022-06-22T01:21:37Z

Do you think you could merge #253 then? Please let me know if you'd like me to make any changes to that.

Thanks,
Vini

erddapy/erddapy.py

- On method ERDDAP.get_search_url, add all CSV, JSON and TSV-type responses to variable 'non_paginated_responses'

ocefpaf · 2022-07-28T13:58:08Z

@vinisalazar do you mind rebasing this one to fix the conflict?

Please let me know if you need help with that!

vinisalazar · 2022-07-28T14:14:28Z

@ocefpaf I had a go at doing it, not sure if I got it right. I can reset the branch and redo it if needed.

vinisalazar · 2022-07-28T14:30:42Z

I'll definitely accept the offer for helping out with this one!

ocefpaf · 2022-07-28T15:09:07Z

Looks like you got it. At least the original changes are in there, and only one test is failing.

tests/test_to_objects.py

erddapy/erddapy.py

- Improve comment in function 'get_search_url' - Move duplicated comment to fixture 'gliders'

vinisalazar · 2022-07-28T16:02:13Z

Thank you for the review @ocefpaf!

vinisalazar added 4 commits June 20, 2022 15:40

Modify 'get_search_url' to return all items if non-html response (ioo…

f0b40a2

…s#244) - Set items_per_page variable to 1e6 if response is either csv or json

Add test for CSV and JSON searches

05fb3d6

- Add new fixture 'gliders' - has more than 1000 datasets - Adds tests 'test_csv_search' and 'test_json_search' that check if the response has more than 1000 records

Remove VCR marking while cassettes are not committed

e07fb82

Update cassette for new csv search method

b0c7e86

ocefpaf reviewed Jul 8, 2022

erddapy/erddapy.py (outdated, resolved)

ocefpaf reviewed Jul 8, 2022

erddapy/erddapy.py (resolved)

Mark more response types as 'non_paginated'

9346051

- On method ERDDAP.get_search_url, add all CSV, JSON and TSV-type responses to variable 'non_paginated_responses'

vinisalazar added 5 commits July 28, 2022 11:09

Modify 'get_search_url' to return all items if non-html response (ioo…

84a2449

…s#244) - Set items_per_page variable to 1e6 if response is either csv or json

Add test for CSV and JSON searches

a400086

- Add new fixture 'gliders' - has more than 1000 datasets - Adds tests 'test_csv_search' and 'test_json_search' that check if the response has more than 1000 records

Remove VCR marking while cassettes are not committed

e5463ba

Mark more response types as 'non_paginated'

cfbce9d

- On method ERDDAP.get_search_url, add all CSV, JSON and TSV-type responses to variable 'non_paginated_responses'

Updating cassette with 'main' branch

808f942

vinisalazar closed this Jul 28, 2022

vinisalazar force-pushed the issue-244 branch from 808f942 to fc44c01 Compare July 28, 2022 14:26

vinisalazar reopened this Jul 28, 2022

update only this cassette

b38b652

ocefpaf reviewed Jul 28, 2022

tests/test_to_objects.py (outdated, resolved)

ocefpaf reviewed Jul 28, 2022

erddapy/erddapy.py (outdated, resolved)

Code review for ioos#255

ec79c31

- Improve comment in function 'get_search_url' - Move duplicated comment to fixture 'gliders'

ocefpaf merged commit b6a1f7e into ioos:main Jul 28, 2022

ocefpaf added the GSoC22 label Jul 28, 2022

vinisalazar deleted the issue-244 branch August 31, 2022 17:08


3 participants
