Skip to content

Conversation

@PGijsbers
Copy link
Collaborator

See #646. Reprocess ARFF file if outdated datatype was used in pickle.

No unit tests are currently added. To add unit tests we would also need to add more data files (of outdated data) to the repository. That just doesn't seem worth it. Changes were tested locally by updating my local cache :) I will add unit tests if requested. The code changed in this patch is so minimal that I feel confident enough if the current unit tests don't break.

@PGijsbers
Copy link
Collaborator Author

PGijsbers commented Mar 30, 2019

I get a consistent error, but this also happens on the current develop head (tested locally):

   def test_get_runs_list_by_tag(self):
        # TODO: comes from live, no such lists on test
        openml.config.server = self.production_server
        runs = openml.runs.list_runs(tag='curves')
>       self.assertGreaterEqual(len(runs), 1)
E       AssertionError: 0 not greater than or equal to 1

These seem to be server issues. is my best bet. If I try to filter runs with the tag through the website then over 6k results are returned quickly, but through the API I got the following response:
image
image
@janvanrijn any clue?

edit: requesting the response through my webbrowser is similarly slow (and has the same result).
image

@janvanrijn
Copy link
Member

Seems like joining to the run tag column is extremely slow (I guess due to the large size).

This could be solved by adding an index on the tag name, at the expense of additional disk cost. We currently have 23M run tag records, mostly indicating stuff that can be obtained through other ways as well. I think we should review the utility of (run) tags, and when a library automatically adds a tag to a run.

@PGijsbers
Copy link
Collaborator Author

PGijsbers commented Mar 30, 2019

  1. But the website loads them extremely quickly, e.g. runs with 'curves'?
  2. It seemed to have worked before? git blame tells me this code was last edited 2017-03-02 14:34:54. And we have definitely had this test passed lots of times (though I realize there are more runs over time, and there could have been a proverbial final drop).

@janvanrijn
Copy link
Member

website runs on ES, which is an index build upon the database.

I am surprised that the query returns no results. It should either crash or return 6000+ results.

@PGijsbers
Copy link
Collaborator Author

It does seem odd. Either way, I think we're best served by skipping this unit test for now as it is not an issue with the Python code. So I'll go ahead and do that (also @mfeurer so he knows what's up too).

@janvanrijn
Copy link
Member

@PGijsbers feel free to open an issue about this in the main tracker, with tags 'CoreSystem' and 'highest priority' assigned to me. I can't solve it this week (ECML deadline) but it will be high on my priority list

@codecov-io
Copy link

codecov-io commented Apr 1, 2019

Codecov Report

Merging #654 into develop will decrease coverage by 0.02%.
The diff coverage is 90.19%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #654      +/-   ##
===========================================
- Coverage    90.96%   90.94%   -0.03%     
===========================================
  Files           32       32              
  Lines         3388     3391       +3     
===========================================
+ Hits          3082     3084       +2     
- Misses         306      307       +1
Impacted Files Coverage Δ
openml/datasets/dataset.py 88.51% <90.19%> (+0.1%) ⬆️
openml/runs/functions.py 86.43% <0%> (-0.22%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b081c5...c9ee562. Read the comment docs.

@PGijsbers PGijsbers requested a review from mfeurer April 1, 2019 10:10
Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@mfeurer mfeurer merged commit 7ec429e into develop Apr 1, 2019
@mfeurer mfeurer deleted the Fix646 branch April 1, 2019 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants