The semantics for our schema is compatible with prevailing scientific practice.
The schemas we work with follow a relatively standard semantic pattern that may be called Entities, Properties, Values:
- Entities. Every table is associated with one specific type of entity/thing, with a documented definition. At a minimum, the definition should be sufficient for the reader to know how, in principle, to identify the set of entities of that type present in a given real-life context. Ideally each instance has a definite spatiotemporal extent, consistent with a strong principle of ontological realism. Every row of the table, or record, should correspond to exactly one element of this set. Such tables are generally assumed to be incomplete: they do not necessarily contain a record for every element of the set.
- Properties. Every column of a table corresponds to one kind of property that makes sense for the given entity type, with a documented definition. In functional terms, the definition should be sufficient for the reader to know how, in principle, to determine whether a real-life entity has a given value for this property.
- Values. Each value should have a definition that is sufficient for the reader to know how, in principle, to identify when a real-life entity has this specific value for the given property.
Thus the schema refers to three kinds of element (entities, properties, and values), and each has its own documentation in the library of definitional annotations.
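The pattern above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names (`EntityType`, `PropertyDef`, `ValueDef`), the helper `missing_definitions`, and the example `cell` schema with its definitions are all hypothetical and are not part of the actual schema or annotation library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDef:
    name: str
    definition: str  # how to recognise, in principle, that an entity has this value


@dataclass(frozen=True)
class PropertyDef:
    name: str
    definition: str  # how to recognise, in principle, which value applies
    values: tuple[ValueDef, ...] = ()


@dataclass(frozen=True)
class EntityType:
    name: str
    definition: str  # how to identify, in principle, the set of such entities
    properties: tuple[PropertyDef, ...] = ()


def missing_definitions(entity: EntityType) -> list[str]:
    """Return dotted paths of entities, properties, and values lacking a definition."""
    missing = []
    if not entity.definition.strip():
        missing.append(entity.name)
    for prop in entity.properties:
        if not prop.definition.strip():
            missing.append(f"{entity.name}.{prop.name}")
        for value in prop.values:
            if not value.definition.strip():
                missing.append(f"{entity.name}.{prop.name}.{value.name}")
    return missing


# Hypothetical example: one table (entity type) with one column (property).
cell = EntityType(
    name="cell",
    definition="A biological cell, bounded by its membrane (illustrative wording).",
    properties=(
        PropertyDef(
            name="marker_status",
            definition="Whether the cell expresses the marker (illustrative wording).",
            values=(
                ValueDef("positive", "The cell expresses the marker."),
                ValueDef("negative", ""),  # deliberately left undocumented
            ),
        ),
    ),
)

undocumented = missing_definitions(cell)
```

Here `undocumented` lists the one value that was left without a definition, which is the kind of gap the documentation requirement is meant to rule out.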
Note: The distinction between properties and values may be a matter of convenience. A property with a small, idiosyncratic value set may be split into distinct disjunctive properties, one per value, and the values deprecated. Conversely, many related disjunctive properties may be combined into a single property, with new values minted in place of the former properties.
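A minimal sketch of this interchange on a single record, assuming the values of the property are mutually exclusive and that records are plain dictionaries. The function names, the `_is_` column-naming convention, and the `stage` example are hypothetical, chosen only for illustration.

```python
def split_property(record, prop, values):
    """Replace one property with one boolean property per value."""
    out = {k: v for k, v in record.items() if k != prop}
    for value in values:
        out[f"{prop}_is_{value}"] = record.get(prop) == value
    return out


def merge_properties(record, prop, values):
    """Combine the per-value boolean properties back into a single property."""
    flag_names = {f"{prop}_is_{value}": value for value in values}
    chosen = [value for name, value in flag_names.items() if record.get(name, False)]
    if len(chosen) > 1:
        # Assumption: the former properties were mutually exclusive.
        raise ValueError(f"more than one value set for {prop!r}: {chosen}")
    out = {k: v for k, v in record.items() if k not in flag_names}
    out[prop] = chosen[0] if chosen else None
    return out


stages = ("egg", "larva", "adult")
original = {"id": 1, "stage": "larva"}

as_flags = split_property(original, "stage", stages)
round_trip = merge_properties(as_flags, "stage", stages)
```

Under the mutual-exclusivity assumption the two forms carry the same information, so `round_trip` equals `original`. If the former properties could hold simultaneously, merging them would need a different value set.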
Important: The phrase ‘in principle’ here is intended to provoke the reader to think about what is meant in reality under conditions of complete knowledge. We habitually limit our thinking to the epistemological constraints of our data, but this habit tends to allow data processing to proceed in an unstable ‘open loop’ manner. For example, a biological cell is not just a pattern of pixel values in an acquired image, even if the data pertaining to a given cell in a given study might consist solely of such pixel values. It is important, and not as difficult as it might seem, to specify what would count as a biological cell in complete generality, assuming perfect knowledge (e.g. the location of the membrane, thickness, shape, lineage, position in tissue, etc.). Such definitions do not have to be perfect to be useful.
Note: We are aiming for a more rigid schema than one backed by, for example, SKOS (Simple Knowledge Organization System), with its rather loose notion of ‘concepts’. The schema is, however, generally compatible with OWL (Web Ontology Language).