Restore ability to use vocab language different from project language #613

Merged
osma merged 3 commits into master from fix-vocab-language on Aug 24, 2022
osma commented Aug 24, 2022

The ability to use a vocabulary language different from the project language was implemented in PR #600 but was accidentally broken in PR #608. For example, it should be possible to use vocab=lcsh(en) in a project with language=fi, where all documents are in Finnish but English labels are used for LCSH concepts (which don't even have Finnish labels), both when reading corpora and when outputting results.
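
To make the vocab=lcsh(en) example concrete, here is a minimal sketch of how such a setting could be split into a vocabulary id and a vocabulary language, falling back to the project language when no language is given. The helper name `parse_vocab_spec` and the regular expression are assumptions made for illustration; they are not taken from the Annif code base.

```python
import re

# Hypothetical helper, not Annif's actual code: split a vocab setting such as
# "lcsh(en)" into a vocabulary id and an optional language override.
VOCAB_SPEC = re.compile(r"^(?P<vocab_id>[\w-]+)(?:\((?P<lang>[\w-]+)\))?$")


def parse_vocab_spec(vocab_spec, project_language):
    """Return (vocab_id, vocab_language) for a vocab setting."""
    match = VOCAB_SPEC.match(vocab_spec.strip())
    if match is None:
        raise ValueError(f"invalid vocabulary specification: {vocab_spec!r}")
    # Fall back to the project language when no language is given in parentheses.
    return match.group("vocab_id"), match.group("lang") or project_language


print(parse_vocab_spec("lcsh(en)", "fi"))  # ('lcsh', 'en')
print(parse_vocab_spec("yso", "fi"))       # ('yso', 'fi')
```

In this sketch the vocabulary language is resolved once, so the same value can be passed to every place that reads or writes subject labels.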

This PR aims to restore that functionality by making sure that:

  1. When reading corpora in the directory-based format, labels are compared to vocabulary labels in the vocabulary language;
  2. When performing suggest operations (CLI or REST), the labels of suggested subjects are in the vocabulary language;
  3. When writing an evaluation results file which contains subject labels, the labels will be in the vocabulary language.

Currently there are unit tests to verify item 2 above, but not items 1 or 3.
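
The three behaviours listed above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `LABELS` dictionary, the example.org URIs, and both function names are hypothetical stand-ins, not Annif's vocabulary or corpus API.

```python
# Toy stand-in for a vocabulary: subject URI -> {language: label}.
# The URIs and labels are illustrative data only; as with LCSH in the
# example, the subjects have English labels but no Finnish ones.
LABELS = {
    "http://example.org/subject/1": {"en": "Cats"},
    "http://example.org/subject/2": {"en": "Dogs"},
}


def label_to_uri(label, language):
    """Item 1: match a corpus label against vocabulary labels in one language."""
    for uri, labels in LABELS.items():
        if labels.get(language) == label:
            return uri
    return None


def uri_to_label(uri, language):
    """Items 2 and 3: look up the label to output for a subject."""
    return LABELS[uri].get(language)


project_language = "fi"
vocab_language = "en"

# Looking labels up in the project language finds nothing ...
print(label_to_uri("Cats", project_language))  # None
# ... while the vocabulary language gives the expected subject and label.
uri = label_to_uri("Cats", vocab_language)
print(uri)                                     # http://example.org/subject/1
print(uri_to_label(uri, vocab_language))       # Cats
```

A test for item 1 or 3 could follow the same pattern: set the vocabulary language to something other than the project language and check that labels still resolve.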

Also, some of the test vocabularies were renamed and repurposed to better match current needs.

osma added the bug label Aug 24, 2022
osma added this to the 0.59 milestone Aug 24, 2022
osma self-assigned this Aug 24, 2022
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: no coverage information
Duplication: 0.0%

codecov bot commented Aug 24, 2022

Codecov Report

Merging #613 (26c134d) into master (c291930) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #613   +/-   ##
=======================================
  Coverage   99.58%   99.58%           
=======================================
  Files          87       87           
  Lines        5840     5850   +10     
=======================================
+ Hits         5816     5826   +10     
  Misses         24       24           

Impacted Files | Coverage Δ
--- | ---
annif/cli.py | 99.63% <ø> (ø)
annif/rest.py | 100.00% <ø> (ø)
tests/test_cli.py | 100.00% <100.00%> (ø)
tests/test_config.py | 100.00% <100.00%> (ø)
tests/test_project.py | 100.00% <100.00%> (ø)
tests/test_rest.py | 100.00% <100.00%> (ø)

osma marked this pull request as ready for review August 24, 2022 07:48
osma commented Aug 24, 2022

I think this is good enough for now. Need to get moving with the load-vocabulary command (#602) which will likely be touching some of the same bits of code anyway.

osma merged commit 576c7b7 into master Aug 24, 2022
osma deleted the fix-vocab-language branch August 24, 2022 07:49