Skip to content

Conversation

@desilinguist
Copy link
Collaborator

@desilinguist desilinguist commented Feb 3, 2021

This PR closes #656.

  • Right now, this method basically assumes that all featuresets will be sparse and, hence, uses the .data attribute to get a flattened version of the feature array. However, this does not work for featuresets with dense feature arrays (see the issue for a detailed description of the problem).
  • To handle dense feature arrays, we can use the .flat attribute that returns a 1-dimensional iterator over the dense array.
  • Added three new tests:
    • 2 tests that train SKLL models -- one regression & one classification -- with a non-sparse featureset (which would not have worked before this PR)
    • 1 to trigger the other check from this method -- regression with string labels -- which is not something we are testing for currently.

- Right now, this method basically assumes that all featuresets will be sparse and hence uses the `.data` attribute to get a flattened version of the feature array. However, this will not work for featuresets with dense feature arrays. To handle those, we need to use the `.flat` iterator that is available for dense arrays.
@codecov
Copy link

codecov bot commented Feb 3, 2021

Codecov Report

Merging #658 (a9321a4) into main (32823c0) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
+ Coverage   95.06%   95.09%   +0.03%     
==========================================
  Files          27       27              
  Lines        3098     3101       +3     
==========================================
+ Hits         2945     2949       +4     
+ Misses        153      152       -1     
Impacted Files Coverage Δ
skll/learner/__init__.py 96.47% <100.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32823c0...a9321a4. Read the comment docs.

@desilinguist desilinguist merged commit f1a1e7f into main Feb 3, 2021
@delete-merged-branch delete-merged-branch bot deleted the 656-fix-check-input-formatting-for-dense-featuresets branch February 3, 2021 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Learner._check_input_formatting() does not work for dense featuresets

4 participants