Functional Curation

The Web Lab

Link to the workshop page

Introduction

[Figure: Virtual experiments schematic]

One of the key issues in computational and mathematical biology is the way we integrate models and data. Current approaches are typically ad hoc and disconnected, and so a completely different strategy is needed. Central to recent modelling efforts is the integration of existing biophysical mathematical models that represent specific physiological function, and the examination of how this function changes in the presence of novel pathologies or pharmacological agents. A quantitative model should provide an unambiguous and testable description of a proposed mechanism. However, models are today probably the least reproducible type of research output. Even different implementations of purportedly the same model may yield significantly different results. While some community standards for representing models themselves exist, tasks such as comparing different hypotheses against experimental data, determining a model's suitability or limitations for a particular study, or incrementally developing models, are still challenging and often performed inadequately.

To address these issues, we are building a system for performing “functional curation” of models. The key idea underpinning functional curation is that when mathematical and computational models are being developed and curated, the primary goal should be the continuous validation of those models against experimental data. To achieve this goal, it must be possible to simulate in the computational models precisely the same protocols used in generating the experimental data on which the models are based. The two data sets (experimental and simulated) should be curated together, and as new competing models of the same biological system are developed they can then be compared directly with existing models through use of the same (and any new) protocols. In this way a repository of knowledge is built up, with existing experimental data captured within the theoretical models in a precisely quantified manner, allowing models to be extended and re-used by other members of the community with confidence.

Functional curation therefore supports the specification of experiments through a protocol language, allowing an in silico version of a wet lab experiment to be run on a range of alternative models and the results compared. This allows the rational selection of an existing model for a new study based on the ability of the model to produce a set of expected outputs for a predefined set of protocols. Similarly, the inability of any existing model to replicate a set of protocols motivates further modelling work. Inspired by “test-driven” development methodologies from software engineering, functional curation can also facilitate producing more robust models. By defining collections of protocols and desired outputs, new or adapted models can continually be evaluated during the development process to ensure that central functionality is not lost. Publishing the set of protocols and corresponding desired outputs used to develop a model alongside the model equations provides a means of confirming that the model equations are correctly implemented.
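
The test-driven idea above can be illustrated with a minimal Python sketch. It is not the project's implementation: the two toy "models" (an exact and a coarse forward-Euler version of exponential decay), the half-life "protocol", and the names curate, half_life_protocol, model_exact and model_euler are all hypothetical, chosen only to show one protocol being run on several models and checked against a desired output.

```python
import math

# Two hypothetical toy "models" of the same system: decay of x(t) from x(0) = 1.
# They stand in for two implementations of purportedly the same model.
def model_exact(rate, t):
    return math.exp(-rate * t)

def model_euler(rate, t, dt=0.1):
    # Coarse forward-Euler discretisation of dx/dt = -rate * x.
    x = 1.0
    for _ in range(round(t / dt)):
        x += dt * (-rate * x)
    return x

def half_life_protocol(model, rate=1.0, dt=0.01, t_max=10.0):
    """Protocol: sample the model on a time grid and report the first time
    at which x has fallen to 0.5 or below (None if it never does)."""
    for i in range(round(t_max / dt) + 1):
        t = i * dt
        if model(rate, t) <= 0.5:
            return t
    return None

def curate(models, protocol, expected, tolerance):
    """Run one protocol on every model; record whether each result lies
    within tolerance of the desired output."""
    report = {}
    for name, model in models.items():
        result = protocol(model)
        report[name] = result is not None and abs(result - expected) <= tolerance
    return report

models = {"exact": model_exact, "euler": model_euler}
report = curate(models, half_life_protocol, expected=math.log(2), tolerance=0.02)
print(report)
```

Here the coarse discretisation fails the check while the exact solution passes, which is the kind of discrepancy between implementations that a published set of protocols and desired outputs is meant to expose.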

SED-ML already exists as an XML-based format for encoding simulation experiments, and is gaining significant traction within the systems biology developer community. However, the language in its current form is limited in the kinds of experiment that can be represented. For instance, protocols defined in SED-ML are tightly tied to a particular model, hence typically precluding the definition of a single protocol that can be applied to multiple models. It is not yet capable of expressing parameter sweeps, or any general form of repeated simulation (although such features are planned). The functionality for post-processing raw simulation results is also extremely basic, being inadequate for almost any electrophysiology protocol. Our language addresses these shortcomings. However, it must be emphasised that we are not aiming to develop a competing standard. Rather, this project is investigating what language features are required to perform experiments of interest to us. We are working to cast these features as extensions to SED-ML, for future inclusion within that standard.

In our approach, a protocol does not specify which model to use, but instead specifies interface rules allowing a range of models to be mapped to the mathematical form expected by the protocol. Simulations may comprise arbitrarily nested loops over arbitrary ranges, with simulation outputs therefore represented as n-dimensional arrays. Post-processing operations are defined using a small functional programming language built on MathML, which provides a good balance between expressivity (many complex operations, including interpolation and action potential duration calculation, may be defined) and ease of implementation (a prerequisite for widespread adoption). A protocol import feature allows libraries of common operations and simulation components to be built up.
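
As a rough illustration of these two ideas (model-independent interface rules, and nested loops yielding n-dimensional outputs), the following Python sketch may help. It does not use the actual protocol syntax; the toy models, the INTERFACES table and the names run_protocol and shape are assumptions made up for this example.

```python
import math

# Two hypothetical models that describe the same decay but name things differently.
def model_a(params, t):
    return {"V": params["V0"] * math.exp(-params["k"] * t)}

def model_b(params, t):
    return {"voltage": params["v_init"] * math.exp(-params["decay"] * t)}

MODELS = {"model_a": model_a, "model_b": model_b}

# Assumed interface rules: protocol-level names mapped to model-specific names.
INTERFACES = {
    "model_a": {"initial": "V0", "rate": "k", "output": "V"},
    "model_b": {"initial": "v_init", "rate": "decay", "output": "voltage"},
}

def run_protocol(model_name, initials, rates, times):
    """Loop over initial values, then rates, then time points.
    Three nested ranges give a 3-dimensional result (nested lists)."""
    model = MODELS[model_name]
    iface = INTERFACES[model_name]
    results = []
    for initial in initials:
        plane = []
        for rate in rates:
            params = {iface["initial"]: initial, iface["rate"]: rate}
            plane.append([model(params, t)[iface["output"]] for t in times])
        results.append(plane)
    return results

def shape(array):
    """Dimensions of a regular nested list, outermost first."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

times = [0.0, 0.5, 1.0, 1.5]
out_a = run_protocol("model_a", [1.0, 2.0], [0.5, 1.0, 2.0], times)
out_b = run_protocol("model_b", [1.0, 2.0], [0.5, 1.0, 2.0], times)
print(shape(out_a))  # (2, 3, 4)
```

The protocol function never refers to a model's own variable names; only the interface table does, so the same protocol runs unchanged on both models.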

Other sources of information

There is now an online prototype of a functional curation system for cardiac electrophysiology. This provides an intuitive user interface to the tools. Materials from a training workshop in September 2015 are available, as is a preprint publication about it.

Our initial publication on functional curation is available, and we provide a local pre-print for those without access to the journal version. There is in addition a paper tutorial corresponding to this paper, which provides a walk-through of the code. Slides from some talks are available, and others will be added to figshare.

The latest source code for our implementation of functional curation, as an add-on project to Chaste, is available to browse and check out. Released versions may be obtained from the main Chaste downloads page. See the project README for more information on installation and basic usage.

To follow development progress, the list of related tickets and some SimulationProtocolNotes may be of interest. (Some of these are currently only accessible by Chaste developers; see also #1989.) You may also view the continuous and nightly test results.

If working with the prototype Python implementation, see also FunctionalCuration/PythonImplementation.

Protocol language

[Figure: Protocol language overview]

The protocol language has several components, which can be considered somewhat independently. A major part is the post-processing language, which extends real number arithmetic (as in MathML) with support for n-dimensional arrays, and sequences of statements (assignment, function return, and assertions). This could stand independently of the rest of the protocol language, although it is used in various other parts of the protocol. Another independent section is the model interface, which is currently largely specific to models defined as systems of equations, especially ODEs. The main sections of a protocol are shown graphically in the protocol language overview figure, and are listed below in order of appearance within a protocol file.

  • Input specifications, giving default values for protocol inputs
  • Protocol imports, allowing functionality from other protocol files to be re-used
  • Library, defining functions and variables for use in the protocol
  • Units definitions, currently just used in the model interface
  • Model interface:
    • Define which variables are considered to be model inputs and outputs
    • Define units conversion rules
    • Change the mathematics of the model, e.g. to implement a stimulus protocol
  • Simulation definitions
  • Post-processing of raw simulation results
  • Output specifications, defining which variables should be saved to disk as the protocol outputs
  • Default plots, just 2D "y against x" graphs at present
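
One way to picture the sections listed above is as a record whose fields appear in the same order as in a protocol file. The Python sketch below is only a mental model: the class and field names, and the example values pacing_period and apd90, are hypothetical and do not reflect the real protocol syntax or any existing API.

```python
from dataclasses import dataclass, field, fields

@dataclass
class ModelInterface:
    inputs: list = field(default_factory=list)             # model input variables
    outputs: list = field(default_factory=list)            # model output variables
    units_conversions: list = field(default_factory=list)  # units conversion rules
    equation_changes: list = field(default_factory=list)   # e.g. a stimulus protocol

@dataclass
class Protocol:
    # Field order mirrors the order of sections within a protocol file.
    inputs: dict = field(default_factory=dict)           # defaults for protocol inputs
    imports: list = field(default_factory=list)          # other protocol files re-used
    library: dict = field(default_factory=dict)          # functions and variables
    units: dict = field(default_factory=dict)            # units definitions
    model_interface: ModelInterface = field(default_factory=ModelInterface)
    simulations: list = field(default_factory=list)      # simulation definitions
    post_processing: list = field(default_factory=list)  # operations on raw results
    outputs: list = field(default_factory=list)          # variables saved to disk
    plots: list = field(default_factory=list)            # default 2D graphs

# dataclasses.fields() returns fields in definition order.
section_order = [f.name for f in fields(Protocol)]
proto = Protocol(inputs={"pacing_period": 1000.0}, outputs=["apd90"])
print(section_order)
```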

For more details, see the syntax of the protocol language.
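
To give a flavour of the post-processing component described in this section (array-valued data handled with assertions, assignments and function returns), here is a small Python sketch. It is written in Python rather than the MathML-based language itself, and crossing_time and map_rows are hypothetical helpers; the threshold crossing is a simplified stand-in for a duration calculation, not the project's algorithm.

```python
def crossing_time(times, values, threshold):
    """First time at which values falls through threshold,
    located by linear interpolation between neighbouring samples."""
    assert len(times) == len(values), "times and values must have equal length"
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 > threshold >= v1:
            frac = (v0 - threshold) / (v0 - v1)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None  # the trace never crosses the threshold

def map_rows(func, array_2d):
    """Apply func to each row of a 2-dimensional array (list of lists)."""
    return [func(row) for row in array_2d]

times = [0.0, 1.0, 2.0, 3.0, 4.0]
traces = [
    [10.0, 8.0, 6.0, 4.0, 2.0],
    [10.0, 6.0, 2.0, 1.0, 0.5],
]
durations = map_rows(lambda row: crossing_time(times, row, 5.0), traces)
print(durations)  # [2.5, 1.25]
```

Mapping a scalar-returning function over the rows reduces a 2-dimensional array of traces to a 1-dimensional array of results, one per simulation run.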

Last modified on Oct 29, 2015, 11:44:38 AM
