Overview of components
A complete implementation consists of:
- The syntax for a linked data table schema (formalized with SQL, OWL, and Frictionless Data).
- A library of definitional annotations meant to provide the semantics for all elements mentioned in the schema, backed by the Entities/Properties/Values pattern.
The semantic annotations may be organized into domains. Together, the schema syntax and the annotation library might be called the “data model” for a given domain. You can use existing domains or write your own for highly application-specific software.
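As a rough illustration of how the two components fit together, the sketch below pairs a table schema, written in the style of a Frictionless Data Table Schema descriptor, with a small annotation library. The table names (`patients`, `samples`), the field names, the layout of `ANNOTATIONS`, and the helper `unannotated_elements` are all hypothetical, invented for this example rather than taken from any existing domain.

```python
# Hypothetical schema for two linked tables, in the style of a
# Frictionless Data "Table Schema" descriptor (fields, primaryKey, foreignKeys).
SCHEMA = {
    "patients": {
        "fields": [
            {"name": "patient_id", "type": "string"},
            {"name": "age_years", "type": "integer"},
        ],
        "primaryKey": "patient_id",
    },
    "samples": {
        "fields": [
            {"name": "sample_id", "type": "string"},
            {"name": "patient_id", "type": "string"},
        ],
        "primaryKey": "sample_id",
        "foreignKeys": [
            {
                "fields": "patient_id",
                "reference": {"resource": "patients", "fields": "patient_id"},
            }
        ],
    },
}

# Hypothetical annotation library: one definitional entry per schema element.
# The "kind" key is an assumed way to record the entity/property distinction.
ANNOTATIONS = {
    "patients": {"kind": "entity", "definition": "A person enrolled in the study."},
    "patients.patient_id": {"kind": "property", "definition": "Study-wide identifier."},
    "patients.age_years": {"kind": "property", "definition": "Age in completed years."},
    "samples": {"kind": "entity", "definition": "A specimen taken from a patient."},
    "samples.sample_id": {"kind": "property", "definition": "Specimen identifier."},
}


def unannotated_elements(schema, annotations):
    """Return the schema elements (tables and fields) that lack an annotation."""
    missing = []
    for table, descriptor in schema.items():
        if table not in annotations:
            missing.append(table)
        for field in descriptor["fields"]:
            key = f"{table}.{field['name']}"
            if key not in annotations:
                missing.append(key)
    return missing
```

Here `unannotated_elements(SCHEMA, ANNOTATIONS)` reports `samples.patient_id`, the one schema element that the annotation library does not yet define.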
Such a data model can support formal annotations on source code functions, in a variety of programming languages, indicating that input/output tables or data elements conform to a part of the schema. These annotations enable automatic syntactic validation at the function-call level, which amounts to a data-based stack trace.
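One way such function-level annotations could look in Python is a decorator that checks each annotated argument when the function is called. This is a minimal sketch under stated assumptions: the decorator `conforms`, the exception `SchemaConformanceError`, and the `TABLE_COLUMNS` lookup are hypothetical names, and the check covers column names only.

```python
import functools

# Hypothetical schema fragment: the columns each table must have.
TABLE_COLUMNS = {
    "patients": {"patient_id", "age_years"},
}


class SchemaConformanceError(ValueError):
    """Raised when a table passed to an annotated function does not conform."""


def conforms(**table_by_argument):
    """Declare which schema table each keyword argument must conform to.

    Hypothetical decorator: it checks column names only, at call time.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            for argument, table_name in table_by_argument.items():
                expected = TABLE_COLUMNS[table_name]
                for index, row in enumerate(kwargs[argument]):
                    if set(row) != expected:
                        # The message names the function and argument, so the
                        # failure points at where the non-conforming data entered.
                        raise SchemaConformanceError(
                            f"{func.__name__}({argument}=...): row {index} has "
                            f"columns {sorted(row)}, expected {sorted(expected)} "
                            f"for table '{table_name}'"
                        )
            return func(**kwargs)
        return wrapper
    return decorate


@conforms(patients="patients")
def mean_age(patients):
    """Average age over a table of patients (a list of dictionaries)."""
    return sum(row["age_years"] for row in patients) / len(patients)
```

Calling `mean_age(patients=[{"patient_id": "p1"}])` raises `SchemaConformanceError` before the function body runs, and the message identifies the function, the argument, and the offending row.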
In addition to this semantic enrichment, the controlled syntax allows general-purpose tooling for:
- loading
- serialization
- transformation (e.g. reshaping)
- data creation
- validation, or data integrity checks
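To make the idea of general-purpose tooling concrete, the sketch below drives a loader, a serializer, and an integrity check from a single field list. It assumes CSV as the storage format and uses only the Python standard library; the field list `FIELDS` and the function names are hypothetical.

```python
import csv
import io

# Hypothetical field list for one table: (column name, Python type).
FIELDS = [("sample_id", str), ("patient_id", str), ("volume_ml", float)]


def load(text):
    """Loader: parse CSV text into a list of dictionaries, casting each value."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: cast(row[name]) for name, cast in FIELDS} for row in reader]


def serialize(rows):
    """Serializer: write rows back to CSV text with columns in schema order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[name for name, _ in FIELDS], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def dangling_references(rows, column, valid_keys):
    """Integrity check: values of a foreign-key column that have no target."""
    return sorted({row[column] for row in rows} - set(valid_keys))
```

Because none of these functions mention a specific table beyond the field list, the same code could serve any table whose schema is expressed the same way.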
Why tabular?
Linked/relational data tables are not the only possible choice. For example, GraphQL is associated with a different data model, which might be called the “object composition hierarchy” model, in which every data item is a recursive list of smaller subitems. As another example, the FHIR specification is associated with a closely related “document-oriented” data model.
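The difference between the two styles can be seen by taking one nested record and splitting it into rows for two linked tables. The record contents and the `flatten` helper below are hypothetical and only illustrate the general idea.

```python
# A nested, document-style record (hypothetical example data).
document = {
    "patient_id": "p1",
    "age_years": 42,
    "samples": [
        {"sample_id": "s1", "volume_ml": 1.5},
        {"sample_id": "s2", "volume_ml": 2.0},
    ],
}


def flatten(doc):
    """Split one nested record into rows for two linked tables."""
    patients = [{"patient_id": doc["patient_id"], "age_years": doc["age_years"]}]
    samples = [
        # Each child row carries a foreign key back to its parent row.
        {**sample, "patient_id": doc["patient_id"]}
        for sample in doc["samples"]
    ]
    return patients, samples


patients, samples = flatten(document)
```

In the tabular form, the nesting is replaced by the `patient_id` column that links each sample row to its patient row.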
Our choice is intended to make authoring accurate schemas as easy as possible, and to let a common semantic interpretation apply both to front-end APIs and to the concrete low-level back-end realizations that occur most often:
- SQL databases
- Pandas data frames
- R data frames
- CSV/TSV files
- Multi-sheet Excel files
- Lists of lists
- Lists of dictionaries
- HTML tables
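As a small illustration of why these realizations can share one interpretation, the sketch below moves the same table between three of them: lists of lists, lists of dictionaries, and a SQL table. It uses an in-memory SQLite database from the Python standard library; the table and column names are hypothetical.

```python
import sqlite3

HEADER = ["patient_id", "age_years"]
rows_as_lists = [["p1", 42], ["p2", 35]]

# Lists of lists -> lists of dictionaries.
rows_as_dicts = [dict(zip(HEADER, row)) for row in rows_as_lists]

# Lists of dictionaries -> SQL table (in-memory SQLite database).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE patients (patient_id TEXT PRIMARY KEY, age_years INTEGER)"
)
connection.executemany(
    "INSERT INTO patients VALUES (:patient_id, :age_years)", rows_as_dicts
)

# SQL table -> lists of lists again.
round_trip = [
    list(row)
    for row in connection.execute(
        "SELECT patient_id, age_years FROM patients ORDER BY patient_id"
    )
]
connection.close()
```

The column names and row contents survive each conversion unchanged, so an annotation attached to `patients.age_years` would mean the same thing in every form.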