Description
Hi!
I've noticed some potentially inconsistent behaviour in the annif loadvoc command when loading a vocabulary in SKOS (Turtle, .ttl) without language tags.
Our project recently switched from the simple TSV vocabulary format to SKOS. Since a TSV file carries no information about language, I assumed we wouldn't need to add language tags in SKOS either. It turned out, however, that loading a SKOS vocabulary without language tags prevents Annif from creating the subject index (although the original .ttl file is still copied and a gzipped dump is still written).
It seems that when Annif converts TSV to SKOS, it adds language tags (based on the language set in projects.cfg). When it loads a vocabulary directly from SKOS, however, it checks whether each label's language tag equals the language code in projects.cfg. If the tag differs, or if there is no tag at all, the label is discarded, and a concept left without any matching label is skipped entirely:
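```python
from rdflib.namespace import RDFS, SKOS

# Methods from Annif's SKOS vocabulary class, shown out of their class context.
# Subject is Annif's own subject record type (its import is omitted here).

def get_concept_labels(self, concept, label_types, language):
    # An untagged rdflib Literal has .language == None, so it never equals
    # the configured project language and is filtered out here.
    return [str(label)
            for label_type in label_types
            for label in self.graph.objects(concept, label_type)
            if label.language == language]

@property
def subjects(self):
    for concept in self.concepts:
        labels = self.get_concept_labels(
            concept, [SKOS.prefLabel, RDFS.label], self.language)
        notation = self.graph.value(concept, SKOS.notation, None, any=True)
        if not labels:
            # No label in the project language: the whole concept is skipped.
            continue
        label = labels[0]
        if notation is not None:
            notation = str(notation)
        yield Subject(uri=str(concept), label=label, notation=notation,
                      text=None)
```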
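To make the effect of the comparison concrete, here is a minimal, dependency-free model of the filter. It assumes a label can be represented as a (text, language) pair, where an untagged label has language None, mirroring how rdflib reports `Literal.language` for untagged literals. The function name `current_filter` is hypothetical and is not part of Annif.

```python
from typing import List, Optional, Tuple

# Simplified stand-in for an RDF literal: (lexical value, language tag or None).
Label = Tuple[str, Optional[str]]


def current_filter(labels: List[Label], language: str) -> List[str]:
    """Mimic the check in get_concept_labels: keep exact language matches only."""
    return [text for text, lang in labels if lang == language]


untagged = [("Kot", None)]
tagged = [("Kot", "pl")]

print(current_filter(untagged, "pl"))  # []
print(current_filter(tagged, "pl"))    # ['Kot']
```

Because `None == "pl"` is false, a vocabulary whose labels are all untagged yields no labels for any concept, which matches the empty subject index described above.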
I think it would be safer to assume that a label without a language tag (i.e. no explicit language information in the SKOS file) is in the language defined in projects.cfg, and to skip a concept (or a label from that concept) only when a language tag is actually present and differs from the language in projects.cfg.
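A minimal sketch of the proposed rule, using the same simplified (text, language) model of labels rather than real rdflib objects. The function name `proposed_filter` is hypothetical, and listing explicitly tagged labels before untagged ones (so that `labels[0]` prefers a tagged label) is an assumption of this sketch, not something the proposal requires.

```python
from typing import List, Optional, Tuple

# Simplified stand-in for an RDF literal: (lexical value, language tag or None).
Label = Tuple[str, Optional[str]]


def proposed_filter(labels: List[Label], language: str) -> List[str]:
    """Treat untagged labels as being in `language`.

    A label is dropped only when it carries an explicit tag that differs
    from the configured project language.
    """
    tagged = [text for text, lang in labels if lang == language]
    untagged = [text for text, lang in labels if lang is None]
    # Assumption: explicitly tagged labels come first, so they win labels[0].
    return tagged + untagged


print(proposed_filter([("Kot", None)], "pl"))                # ['Kot']
print(proposed_filter([("Cat", "en"), ("Kot", None)], "pl"))  # ['Kot']
print(proposed_filter([("Cat", "en")], "pl"))                 # []
```

In the Annif method quoted above, the equivalent change would presumably be to relax the condition `if label.language == language` so that it also accepts `label.language is None`.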
If this change isn't possible for some reason, it would be nice to document this behaviour in the Annif wiki (it took me some time to track down this "bug").
Maciej Sagata, National Library of Poland