Traditional APIs (Application Programming Interfaces) solve the problem of coordination between two modules which must programmatically interact. Usually one module is the “libary” and one module is the “application” which uses the library’s functions. By providing the “library user” with a documented promise of behavior for a given function, the “library developer” is free to make changes to the implementation without notifying the users, and the users can use the library without requiring the library developer to clarify the behavior. This pattern – a type of decoupling pattern – is useful even when the “libary users” and “library developers” are the same people, because it separates concerns, enabling independent modification of separate modules.

There are two main reasons that the API pattern is insufficient, on its own, for contemporary scientific software:

  • Complexity. The data inputs consumed and data outputs produced by our tools are increasingly complex. The normal practices for documentation in an API are strained by this complexity. In response, Python APIs, for example, tend to relegate types to dictionaries with undocumented key-value pairs – a dodge that forces users to inspect source code to infer behavior. This is a frequent source of leaky abstractions.
  • Accuracy and semantics. In doing science, and similarly rigorous work, it is necessary at the time of writing or modifying software to understand the actual meaning of the inputs and outputs. Each input data item stands for a truthful record about something (also, “data are individual facts”), and each output data item stands for a claim that you are making to the users in the next link of the chain. If you produce wrong claims, the error propagates to all subsequent users. If you produce unclear or incomplete claims, not quite the same as wrong, downstream users are blocked, unable to act.

The result is that much time is spent on error-prone and repetitious “data cleaning” due to the circulation of messy data. A primary mechanism of the ADI framework is to shun messy data at every turn, preventing its circulation.