vocabulary in SKOS (Turtle serialization) should be loaded even in case of lacking language tags #556

@macsag

Hi!
I've noticed some potentially inconsistent behaviour of the annif loadvoc command when loading a vocabulary in SKOS (ttl) without language tags.

We recently switched our project from the simple TSV vocabulary format to SKOS. I assumed that, since the TSV file carries no language information, language tags in the SKOS file would not be strictly necessary either. It turned out, however, that loading a SKOS vocabulary without language tags prevents Annif from creating the subject index (although the original ttl file is still copied and a gzipped file is still dumped).

It seems that when Annif converts TSV to SKOS, it adds language tags (based on the language configured in projects.cfg). When it loads a vocabulary directly from SKOS, however, it checks whether each label's language tag matches the language code in projects.cfg. If the tag differs or is missing, the label is discarded, and a concept left with no labels is skipped entirely:

def get_concept_labels(self, concept, label_types, language):
    return [str(label)
            for label_type in label_types
            for label in self.graph.objects(concept, label_type)
            if label.language == language]

@property
def subjects(self):
    for concept in self.concepts:
        labels = self.get_concept_labels(
            concept, [SKOS.prefLabel, RDFS.label], self.language)
        notation = self.graph.value(concept, SKOS.notation, None, any=True)
        if not labels:
            continue
        label = labels[0]
        if notation is not None:
            notation = str(notation)
        yield Subject(uri=str(concept), label=label, notation=notation,
                      text=None)

I think it would be safer to assume that a label without a language tag is in the language defined in projects.cfg, and to skip a concept (or one of its labels) only when a language tag is actually present and differs from the language in projects.cfg.
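
To make the suggestion concrete, here is a minimal sketch of the filtering rule I have in mind. It is not Annif code: `Label` is a hypothetical stand-in for an rdflib literal (whose `language` attribute is `None` when no tag is given), and `filter_labels` is an illustrative name, not an existing function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for an RDF literal: text plus optional language tag."""
    text: str
    language: Optional[str] = None


def filter_labels(labels, language):
    """Keep labels that are untagged or whose tag equals `language`.

    Labels with an explicit, different language tag are still dropped.
    """
    return [label.text
            for label in labels
            if label.language is None or label.language == language]


labels = [Label("kot", "pl"), Label("cat", "en"), Label("pies")]
print(filter_labels(labels, "pl"))  # ['kot', 'pies']
```

With this rule a concept would be skipped only if every one of its labels carries an explicit tag for some other language.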

If this change is not possible for some reason, it would be helpful to document this behaviour in the Annif wiki (it took me some time to track down this "bug").

Maciej Sagata, National Library of Poland
