
DomainOntologiesDevelopment

Emanuele Ghedini edited this page Dec 1, 2025 · 1 revision

Development of EMMO-based domain ontologies

Based on lessons learned from a range of EU projects (OntoTrans, OntoCommons, OpenModel, DOME 4.0, CoBRAIN and MatCHMaker), it is recommended that the development of domain ontologies follow a general methodology consisting of the following steps:

Each of these steps is further elaborated on in the following sub-sections. Note that even though these steps follow a logical order, the development of a domain ontology is in reality an iterative process.

Use case

In order to work efficiently and avoid misunderstandings, it is essential to start by defining and agreeing on the purpose of the domain ontology and on what you want to express with it. It is very helpful to have one or a few use cases in mind before you start defining the concepts. The use case, purpose and scope of the ontology can be documented in the ontology itself using dcterms:abstract from Dublin Core. See the section Ontology metadata for additional recommendations on how to document a domain ontology.
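As a minimal sketch of recording the use case in the ontology itself, the Python function below writes an ontology header in Turtle that documents purpose and scope with dcterms:abstract (plus dcterms:title). The ontology IRI, title and abstract text are hypothetical placeholders, and plain string formatting is used only to keep the example self-contained; in practice an RDF library such as rdflib would typically be used instead.

```python
def turtle_string(text: str, lang: str = "en") -> str:
    """Format text as a single-line Turtle string literal with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')        # then double quotes
        .replace("\n", "\\n")       # raw newlines are not allowed in "..." literals
    )
    return f'"{escaped}"@{lang}'


def ontology_header(iri: str, title: str, abstract: str) -> str:
    """Return Turtle declaring an ontology whose purpose and scope are
    documented with the Dublin Core terms dcterms:title and dcterms:abstract."""
    return "\n".join([
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{iri}> a owl:Ontology ;",
        f"    dcterms:title {turtle_string(title)} ;",
        f"    dcterms:abstract {turtle_string(abstract)} .",
    ])


# Hypothetical ontology IRI and use-case description, for illustration only.
print(ontology_header(
    "https://example.org/cement-ontology",
    "Cement microstructure ontology",
    "Describes cement grains in a microstructure, in order to annotate "
    "microscopy datasets and simulation results.",
))
```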

Classes

Define the classes that are needed for the use case.

Figure 1. Basic types of classes.

A good starting point is to consider the basic set of EMMO classes shown in Figure 1. These classes are further elaborated on below:

  • Process: The different types of processes that are needed. Processes always have a temporal evolution. Common types of processes include Measurement, Simulation and TechnologicalProcess (e.g. manufacturing). Each of these may be a workflow consisting of a set of tasks.

  • Object: The complement of Process; hence, objects maintain their identity throughout their life. Objects may be computer systems, production systems, files on a hard drive, etc. Object individuals are typically also individuals of other classes. For instance, objects that are input and output of a process will typically also be Data or Material.

  • Data: In EMMO, data is contrast (variation of properties) that is encoded by an agent and that can be decoded by another agent according to a specific rule. Common Data subclasses include Datum (self-consistent encoded data entity), Dataset (encoded data made of more than one datum) and Software (encoded data).

    Software is encoded data residing on a memory storage. During a simulation process, a temporal part of the software individual is loaded into the CPU and executed. Hence, only a temporal part of the software takes part in the simulation (and may be assigned a role in the simulation). This distinction is important, since the same software can be used for different purposes and may therefore have different roles in different simulations.

  • Material: An amount of a matter substance. A material individual (e.g. nitrogen) can represent any state of matter or phase. It will typically also be an individual of other classes (e.g. Gas), revealing the actual form in which the material is found. Since EMMO is nominalistic, objects that have a physical manifestation can also be regarded as materials.

  • Property: A sign that stands for an object, obtained through a well-defined procedure (like a characterisation procedure or a modelling workflow). Quantities are likely the most relevant type of property in physics-based domain ontologies. For more background, please refer to the sections about semiotics and properties in the introduction.

    A property is atomic in the sense that it aims to deliver one and only one aspect of the object according to one code, such as a colour expressed by one sign (e.g. black) or a quantitative property (e.g. 1.4 kg). If more than one aspect of the object is described, it is a Description (which is a collection of properties).

  • Role: An entity that contributes to a Whole (and is related to it through a parthood relation). An important type of role is Agent, which is a participant in a process that drives the process. See the section on holism for more details.
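The distinction between an atomic Property and a Description can be illustrated with a small, hypothetical Python data model. The class and field names below (aspect, value, unit, describes_several_aspects) are illustrative assumptions and not part of EMMO; the sketch only shows that a property carries exactly one aspect encoded with one code, while a description collects several properties.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Property:
    """One aspect of an object expressed with one code (hypothetical model)."""
    aspect: str               # which aspect is described, e.g. "colour" or "mass"
    value: Union[str, float]  # the sign itself, e.g. "black" or 1.4
    unit: str = ""            # empty for non-quantitative properties


@dataclass
class Description:
    """A collection of properties (hypothetical model)."""
    properties: List[Property] = field(default_factory=list)

    def add(self, prop: Property) -> None:
        # Loosely mirrors hasPortionPart: each property becomes a part of the description.
        self.properties.append(prop)

    def aspects(self) -> set:
        """The distinct aspects covered by the collected properties."""
        return {p.aspect for p in self.properties}

    def describes_several_aspects(self) -> bool:
        # A single aspect is better modelled as a Property on its own.
        return len(self.aspects()) > 1


description = Description()
description.add(Property("colour", "black"))
description.add(Property("mass", 1.4, "kg"))
print(sorted(description.aspects()))            # ['colour', 'mass']
print(description.describes_several_aspects())  # True
```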

Classes are organised hierarchically, in a so-called taxonomy (categorisation according to generality). For example:

Object <-- Constituent <-- MicrostructureConstituent <-- Grain <-- CementGrain <-- UnhydratedCementGrain
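A taxonomy like the one above is a chain of subclass relations. As a minimal sketch, assuming single inheritance (real ontologies may also use multiple inheritance), it can be stored as a child-to-parent mapping and walked upwards to list all superclasses of a class. The function name ancestors is hypothetical.

```python
# Child -> parent (rdfs:subClassOf) for the example taxonomy above.
SUBCLASS_OF = {
    "Constituent": "Object",
    "MicrostructureConstituent": "Constituent",
    "Grain": "MicrostructureConstituent",
    "CementGrain": "Grain",
    "UnhydratedCementGrain": "CementGrain",
}


def ancestors(cls: str, subclass_of: dict) -> list:
    """Return the superclasses of cls, from the most specific to the most general."""
    chain = []
    seen = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:  # guard against accidental cycles in a hand-written taxonomy
            raise ValueError(f"cycle in taxonomy at {cls}")
        seen.add(cls)
        chain.append(cls)
    return chain


leaf = "UnhydratedCementGrain"
print(" <-- ".join(reversed([leaf] + ancestors(leaf, SUBCLASS_OF))))
# Object <-- Constituent <-- MicrostructureConstituent <-- Grain <-- CementGrain <-- UnhydratedCementGrain
```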

It may be helpful to use a simple graphical tool to visualise how the classes are related taxonomically.

All classes should at minimum have a unique IRI, a descriptive label and a description (elucidation).
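The minimum requirement of a unique IRI, a label and an elucidation lends itself to an automated check. The sketch below assumes each class is kept as a plain Python dictionary; the key names, the example namespace https://example.org/cement# and the elucidation texts are hypothetical. In an actual EMMO-based ontology the label would typically be given with skos:prefLabel and the elucidation with the corresponding EMMO annotation property, whose IRI should be taken from EMMO itself.

```python
from collections import Counter

REQUIRED = ("iri", "label", "elucidation")


def check_classes(classes: list) -> list:
    """Return a list of problems: missing or empty required fields, and duplicate IRIs."""
    problems = []
    for i, cls in enumerate(classes):
        for key in REQUIRED:
            if not cls.get(key, "").strip():
                problems.append(f"class #{i}: missing {key}")
    # IRIs must be unique across the whole ontology.
    counts = Counter(cls["iri"] for cls in classes if cls.get("iri"))
    for iri, n in counts.items():
        if n > 1:
            problems.append(f"duplicate IRI: {iri}")
    return problems


classes = [
    {
        "iri": "https://example.org/cement#CementGrain",
        "label": "CementGrain",
        "elucidation": "A grain in a cement microstructure.",
    },
    {
        "iri": "https://example.org/cement#UnhydratedCementGrain",
        "label": "UnhydratedCementGrain",
        "elucidation": "",  # left empty on purpose to trigger the check
    },
]
print(check_classes(classes))  # ['class #1: missing elucidation']
```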

Relations

EMMO is anchored in a novel mereocausal theory that provides a consistent and expressive language for expressing relations between objects and processes. Mereocausality is the combination of mereology (the theory of wholes and parts) and causality (the theory of cause and effect). Figure 3 and Figure 4 show 4D graphical representations of mereological relations between objects and processes.

Figure 3. Part-of relations. An entity that is entirely comprised in another. All these relations are subrelations of hasPart. Each case is labelled according to how the grey boxes relate to the white ones. Objects are labelled O and represented by a solid-line box, processes are labelled P and represented by a dashed-line box. The subject of the triple is grey, the object of the triple is white.

Figure 4. Proper overlap relations. Two entities that overlap, where neither of them is a part of the other. All these relations are subrelations of properOverlaps. Each case is labelled according to how the grey boxes relate to the white ones, as described in Figure 3. I can be either an object or a process (I stands for Item).

Figure 5 shows different causal relations between object or process entities. Whether these relations are directed or symmetric determines whether the relation has a temporal or spatial nature.

Figure 5. Causal relations. Can be temporal (unfold in time, directed) or spatial (unfold in space, symmetric) as well as direct or indirect.

Finally, Figure 6 shows a somewhat arbitrary selection of higher-level relations between some important types of objects and processes. Here is a brief description of these relations (object properties):

  • hasInput/hasOutput: Relates a process to its input and output.
  • isAfter: A causal relation that relates a process step to the previous process step (essentially specifying a workflow, where each process step is a task, i.e. a temporal part of the workflow). isAfter only implies that the subject of the relation comes after the object; it says nothing about whether the causal relation is direct or indirect. This makes sense in many workflows, which can be further detailed by inserting substeps between the specified process steps.
  • processedFrom: A causal relation that relates the output to the input of a process. This relation is inferred by the reasoner.
  • hasAgent: Relates a process caused by an agent (a so-called Agency) to the agent.
  • participatesTo: A general proper overlap relation (see Figure 4) that relates an object to a process. In Figure 6, it is used to relate the software to a simulation process.
  • hasProperty/hasDescription: Semiotic relation that relates the object (referent) in a semiotic process to a property/description assigned to the object by an interpreter.
  • hasPortionPart: A general parthood relation that relates an Item to one of its proper parts. In Figure 6, it is used to relate a description to one of its property parts. See the section on mereocausality for other parthood relations and more details.

Figure 6. Basic relationships between core concepts. The dual nature of Dataset/Description and Datum/Property is not necessary, but common when dealing with experimental and modelling datasets that have been determined through a semiotic process.
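Some of the relations above can be tried out on a toy knowledge base of (subject, predicate, object) triples. The sketch below is a simplification under stated assumptions, not EMMO's actual axiomatisation: isAfter is treated as transitive (so a coarse ordering still holds after substeps are inserted), and processedFrom is derived with a simple rule (if a process has input I and output O, then O processedFrom I) standing in for what the reasoner infers. All individual names and the function infer are hypothetical.

```python
from itertools import product

# Hypothetical workflow: mixing, then casting (an inserted substep), then curing.
triples = {
    ("mixing", "hasInput", "cement_powder"),
    ("mixing", "hasOutput", "fresh_paste"),
    ("casting", "hasInput", "fresh_paste"),
    ("casting", "hasOutput", "cast_paste"),
    ("curing", "hasInput", "cast_paste"),
    ("curing", "hasOutput", "hardened_paste"),
    ("casting", "isAfter", "mixing"),
    ("curing", "isAfter", "casting"),
}


def objects(s, p, kb):
    """All objects o such that (s, p, o) is in the knowledge base."""
    return {o for (s2, p2, o) in kb if s2 == s and p2 == p}


def infer(kb):
    """Return a copy of kb extended with processedFrom triples and the
    transitive closure of isAfter (simplified rules, assumed for illustration)."""
    kb = set(kb)
    # processedFrom: each output of a process is processed from each of its inputs.
    processes = {s for (s, p, _) in kb if p in ("hasInput", "hasOutput")}
    for proc in processes:
        for out, inp in product(objects(proc, "hasOutput", kb), objects(proc, "hasInput", kb)):
            kb.add((out, "processedFrom", inp))
    # isAfter treated as transitive: repeat until no new triple is added.
    changed = True
    while changed:
        changed = False
        after = [(s, o) for (s, p, o) in kb if p == "isAfter"]
        for (a, b), (c, d) in product(after, after):
            if b == c and (a, "isAfter", d) not in kb:
                kb.add((a, "isAfter", d))
                changed = True
    return kb


kb = infer(triples)
print(("curing", "isAfter", "mixing") in kb)                  # True
print(("fresh_paste", "processedFrom", "cement_powder") in kb)  # True
```

Note that processedFrom is deliberately not made transitive here, so hardened_paste is only related to its direct input cast_paste.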

Datatypes

In use cases where you want to store the actual data in the knowledge base, a datatype should be assigned to each datum or property.
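As a minimal sketch of assigning datatypes, the hypothetical function typed_literal below maps Python values to XML Schema (XSD) datatypes and renders them as typed Turtle literals. The mapping, in particular the choice of xsd:double for floating-point values, is an assumption for illustration; a domain ontology may prescribe other datatypes.

```python
import math

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value) -> str:
    """Render a Python value as a Turtle literal with an explicit XSD datatype."""
    # bool must be tested before int, since bool is a subclass of int in Python.
    if isinstance(value, bool):
        lexical, datatype = ("true" if value else "false"), "boolean"
    elif isinstance(value, int):
        lexical, datatype = str(value), "integer"
    elif isinstance(value, float):
        # xsd:double spells special values as NaN, INF and -INF.
        if math.isnan(value):
            lexical = "NaN"
        elif math.isinf(value):
            lexical = "INF" if value > 0 else "-INF"
        else:
            lexical = repr(value)
        datatype = "double"
    elif isinstance(value, str):
        lexical = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        datatype = "string"
    else:
        raise TypeError(f"no datatype mapping for {type(value).__name__}")
    return f'"{lexical}"^^<{XSD}{datatype}>'


print(typed_literal(1.4))  # "1.4"^^<http://www.w3.org/2001/XMLSchema#double>
print(typed_literal(3))    # "3"^^<http://www.w3.org/2001/XMLSchema#integer>
```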