Representing literal values in SSSOM #81

@matentzn

While SSSOM is clearly focused on mappings between controlled vocabulary terms (identifiers that correspond to some entity in the world), a large amount of ongoing work is concerned with "literal" mappings, that is, mapping a string to a concept in an ontology or controlled vocabulary. Such mappings are byproducts of, for example, named entity recognition pipelines, or of manual mapping efforts in which curators read papers and map the strings they find to ontology concepts. I believe that the same concerns we have for normal terminological mappings (provenance, curation rules metadata) also apply here, and I therefore propose to develop a scheme by which we can represent literals in the subject_id field. One option would be to use URL-encoded strings in a standard namespace:

SSSOM_LITERAL:Antinukle%C3%A4re%20Antik%C3%B6rper%20%28Kernantik%C3%B6rper%29

I believe this would take care of quite a few problems, and users could easily handle or ignore these kinds of subjects. Note that, strictly speaking, this is already permitted by the SSSOM spec, so we are not discussing here whether it should be allowed. The question is rather whether we, as a community, should recommend a "standard" way to handle literals.
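
As a rough sketch of how a tool could produce, recognise and decode such subjects: the helper names below are hypothetical and not part of sssom-py, and `SSSOM_LITERAL` is only the namespace floated above, not an agreed prefix. It uses Python's standard `urllib.parse` for the percent-encoding.

```python
from urllib.parse import quote, unquote

# Assumption: the namespace proposed in this issue, not an agreed standard.
LITERAL_PREFIX = "SSSOM_LITERAL"


def to_literal_curie(text: str) -> str:
    """Percent-encode a literal string and put it in the literal namespace."""
    # safe="" also encodes "/" so the local part never contains a raw slash.
    return f"{LITERAL_PREFIX}:{quote(text, safe='')}"


def is_literal_subject(subject_id: str) -> bool:
    """Return True if the subject_id uses the literal namespace."""
    return subject_id.startswith(LITERAL_PREFIX + ":")


def from_literal_curie(subject_id: str) -> str:
    """Recover the original string from a literal subject_id."""
    if not is_literal_subject(subject_id):
        raise ValueError(f"not a literal subject: {subject_id!r}")
    return unquote(subject_id[len(LITERAL_PREFIX) + 1:])


if __name__ == "__main__":
    curie = to_literal_curie("Antinukleäre Antikörper (Kernantikörper)")
    print(curie)
    print(from_literal_curie(curie))
```

A consumer that only cares about term-to-term mappings could drop every row for which `is_literal_subject(subject_id)` is true, which is the "easily handle or ignore" case.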

I invite everyone to voice their support, ideas or concerns here.

See related: mapping-commons/sssom-py#28 (@cmungall also has some big plans to use these for mapping reconciliation).
