Description
While SSSOM is clearly focused on mappings between controlled vocabulary terms (IDs that correspond to some entity in the world), a huge amount of ongoing work is concerned with "literal" mappings, that is, mapping a string to a concept in an ontology or controlled vocabulary. Such mappings are byproducts of, for example, named entity recognition pipelines, or of manual mapping efforts by curators who read papers and map the strings they encounter to ontology concepts. I believe the same concerns we have for normal terminological mappings (provenance, curation rules metadata) also apply here, and I therefore propose that we develop a scheme for representing literals in the subject_id field. One option would be to use URL-encoded strings in a standard namespace:
SSSOM_LITERAL:Antinukle%C3%A4re%20Antik%C3%B6rper%20%28Kernantik%C3%B6rper%29
I believe this would take care of quite a few problems, and users could easily handle or ignore these kinds of subjects. Note that, strictly speaking, this is already permitted by the SSSOM spec, so we are not discussing whether it should be allowed. The question is rather whether we should, as a community, recommend a "standard" way to handle literals.
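To make the proposal concrete, here is a minimal Python sketch of how such identifiers could be produced and read back using only the standard library. The `SSSOM_LITERAL` prefix is the one suggested above; the helper names (`literal_to_curie`, `curie_to_literal`, `is_literal`) are hypothetical and not part of SSSOM or sssom-py.

```python
from urllib.parse import quote, unquote

# Prefix proposed in this issue; not (yet) a registered or standard namespace.
LITERAL_PREFIX = "SSSOM_LITERAL"


def literal_to_curie(literal: str) -> str:
    """Percent-encode a literal (UTF-8) and wrap it as a CURIE-like ID."""
    # safe="" also encodes "/" and ":", so the local part has no delimiters.
    return f"{LITERAL_PREFIX}:{quote(literal, safe='')}"


def is_literal(identifier: str) -> bool:
    """Return True if the identifier uses the literal prefix."""
    return identifier.startswith(LITERAL_PREFIX + ":")


def curie_to_literal(identifier: str) -> str:
    """Recover the original string from a literal identifier."""
    if not is_literal(identifier):
        raise ValueError(f"not a literal identifier: {identifier!r}")
    # Split on the first ":" only; any ":" in the literal was encoded as %3A.
    _, _, local = identifier.partition(":")
    return unquote(local)


subject_id = literal_to_curie("Antinukleäre Antikörper (Kernantikörper)")
print(subject_id)
print(curie_to_literal(subject_id))
```

With a check like `is_literal`, tools that only care about term-to-term mappings could filter out literal subjects before processing, which is the "handle or ignore" behaviour described above.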
I invite everyone to voice their support, ideas or concerns here.
See related: mapping-commons/sssom-py#28 (@cmungall also has some big plans to use these for mapping reconciliation).