Make vocabularies multilingual by osma · Pull Request #600 · NatLibFi/Annif

osma · 2022-08-04T07:41:27Z

This PR implements #559: it makes vocabularies multilingual, so there is no longer any need for separate language-specific vocabulary IDs such as yso-fi, yso-sv and yso-en. Instead, the vocabulary ID yso can be used for all projects, and the loadvoc command needs to be run only once, because it detects which languages are available in the vocabulary and loads the labels in all of them.
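
As a rough illustration of the idea (not the code in this PR), the sketch below detects the available languages from a list of labels and groups the labels per language. The tuple layout, the sample data and the function names `available_languages` and `labels_by_language` are assumptions made for this example only; the actual implementation reads the labels from a SKOS file.

```python
from collections import defaultdict

# Hypothetical stand-in for labels read from a SKOS vocabulary:
# (concept URI, label text, language tag). Not real YSO data.
LABELS = [
    ("http://example.org/c1", "kissat", "fi"),
    ("http://example.org/c1", "katter", "sv"),
    ("http://example.org/c1", "cats", "en"),
    ("http://example.org/c2", "koirat", "fi"),
    ("http://example.org/c2", "dogs", "en"),
]


def available_languages(labels):
    """Return the set of language tags that occur in the labels."""
    return {lang for _uri, _label, lang in labels}


def labels_by_language(labels):
    """Group (uri, label) pairs by language tag, preserving input order."""
    grouped = defaultdict(list)
    for uri, label, lang in labels:
        grouped[lang].append((uri, label))
    return dict(grouped)


langs = available_languages(LABELS)
index = labels_by_language(LABELS)
print(sorted(langs))  # ['en', 'fi', 'sv']
```

With the labels grouped this way, one load of the vocabulary can serve projects in any of the detected languages, which is the motivation for dropping the language suffix from the vocabulary ID.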

It's also possible to define or override the language of the labels, for example by setting vocab=lcsh(en) in a Finnish-language project.
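
A minimal sketch of how a specifier such as lcsh(en) could be split into a vocabulary ID and a label language, falling back to the project language when no override is given. The regular expression, the function name `parse_vocab_spec` and the error message are assumptions for illustration; the parsing code in this PR may differ.

```python
import re

# Assumed syntax: a vocabulary ID, optionally followed by a language
# code in parentheses, e.g. "yso" or "lcsh(en)".
VOCAB_SPEC = re.compile(r"^(?P<vocab_id>[\w-]+)(\((?P<lang>[A-Za-z-]+)\))?$")


def parse_vocab_spec(spec, default_language):
    """Return (vocab_id, language) for a vocabulary specifier string.

    If the specifier carries no language override, default_language
    (for example the project language) is used instead.
    """
    match = VOCAB_SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"invalid vocabulary specifier: {spec!r}")
    return match.group("vocab_id"), match.group("lang") or default_language


print(parse_vocab_spec("lcsh(en)", "fi"))  # ('lcsh', 'en')
print(parse_vocab_spec("yso", "fi"))       # ('yso', 'fi')
```

In this sketch a malformed specifier such as `lcsh(en` raises `ValueError` rather than being silently accepted.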

The changes need to be tested carefully, as they are quite disruptive. Documentation (including the Annif tutorial) should be updated, mainly by stripping language suffixes from vocabulary IDs in examples. However, old examples (vocabulary IDs with a language suffix) should keep working.

codecov · 2022-08-04T07:45:47Z

Codecov Report

Merging #600 (b258873) into master (b6a1363) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #600      +/-   ##
==========================================
- Coverage   99.54%   99.52%   -0.02%     
==========================================
  Files          86       86              
  Lines        5653     5695      +42     
==========================================
+ Hits         5627     5668      +41     
- Misses         26       27       +1

Impacted Files	Coverage Δ
annif/cli.py	`99.63% <100.00%> (ø)`
annif/corpus/skos.py	`100.00% <100.00%> (ø)`
annif/corpus/subject.py	`100.00% <100.00%> (ø)`
annif/project.py	`99.39% <100.00%> (ø)`
annif/vocab.py	`95.50% <100.00%> (-0.15%)`	⬇️
tests/conftest.py	`100.00% <100.00%> (ø)`
tests/test_cli.py	`100.00% <100.00%> (ø)`
tests/test_project.py	`100.00% <100.00%> (ø)`
tests/test_vocab.py	`100.00% <100.00%> (ø)`
tests/test_vocab_skos.py	`100.00% <100.00%> (ø)`
... and 1 more


osma · 2022-08-05T13:12:53Z

Rebased on current master (which now contains PR #597 that was the starting point of this branch) and force-pushed.

sonarqubecloud · 2022-08-05T15:11:35Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
6 Code Smells

No Coverage information
0.0% Duplication

osma · 2022-08-05T15:25:02Z

This is more or less done now, pending review.

I tried to fix the issues reported by QA tools. Code Climate still complains about load_vocabulary, but I can't figure out how to make it better. Codecov says there's one more missed line than before in annif.vocab, but I can't find it in the detailed report.

lgtm-com · 2022-08-05T15:56:25Z

This pull request introduces 1 alert when merging b258873 into b6a1363 - view on LGTM.com

new alerts:

1 for Module is imported with 'import' and 'import from'

juhoinkinen

LGTM

osma · 2022-08-08T07:18:49Z

Thanks for reviewing, @juhoinkinen! I will merge this now, although we still need to do more testing before the next release. There are other related changes to the vocabulary functionality coming up (possibly e.g. #602), and it makes sense to test them all in one go.

osma added the enhancement label Aug 4, 2022

osma added this to the 0.59 milestone Aug 4, 2022

osma self-assigned this Aug 4, 2022

osma mentioned this pull request Aug 4, 2022

loadvoc command should take a vocabulary id, not project id #602

Closed

osma added 7 commits August 5, 2022 16:11

get available languages from SKOS vocabulary

8cbfffa

make SubjectFileSKOS language-agnostic

72dd66b

make subject index filenames language-specific, e.g. subjects.en.tsv

f38ac54

store subject index in all available (SKOS) vocab languages

a0ea2d3

use language-agnostic vocabulary id's in test configuration

b67bb1c

enable overriding of vocab language in configuration, e.g. lcsh(en)

cca7ca8

handle case where SKOS lacks language tags & fix comments

4d78bb5

osma force-pushed the issue559-multilingual-vocabularies branch from b1aa810 to 4d78bb5 Compare August 5, 2022 13:12

osma added 5 commits August 5, 2022 17:36

refactor languages property

7388f78

refactor vocabulary creation

8204368

refactor languages property (again), now using set comprehension

baa3b2f

refactor load_vocabulary slightly

4e62fc9

Add test for invalid vocabulary specifier

b258873

osma marked this pull request as ready for review August 5, 2022 15:23

osma requested a review from juhoinkinen August 5, 2022 15:25

juhoinkinen approved these changes Aug 8, 2022


osma merged commit 954a8d9 into master Aug 8, 2022

osma deleted the issue559-multilingual-vocabularies branch August 8, 2022 07:19

osma mentioned this pull request Aug 8, 2022

Share vocabulary objects between projects #603

Closed

osma mentioned this pull request Aug 15, 2022

multilingual SubjectIndex backed by CSV file #608

Merged

osma mentioned this pull request Aug 24, 2022

Restore ability to use vocab language different from project language #613

Merged

osma mentioned this pull request Sep 23, 2022

Multilingual vocabularies #559

Closed

osma mentioned this pull request Sep 22, 2023

optimization: load a vocabulary only once even if used in different languages #736

Merged

osma merged 12 commits into master from issue559-multilingual-vocabularies