wiki:FunctionalCuration/ProtocolSyntax

The Functional Curation Protocol Syntax

This page describes the textual syntax for protocol descriptions, and how it relates to the semantics of running a protocol. This textual syntax is the recommended format, replacing the old XML syntax, since it is much easier for humans to read and write. Unfortunately there is at present no dedicated editor, so unlike an XML editor (which could offer content completion based on the schema), a standard text editor provides no assistance. Any errors in the syntax will only be detected when the protocol is run.

Uploading a protocol to the Web Lab will attempt to parse it (as a quick background job) and add an 'errors.txt' file to the protocol listing if any parsing errors are found. If you have a Chaste install, running projects/FunctionalCuration/apps/CheckSyntax.py path_to_protocol will similarly parse the file and report any errors.

The definitive description of the textual syntax is given in the parser, although this will mainly be of use to those familiar with the pyparsing grammar syntax. The tests for the parser may also be of use.

Finally, you can get a good idea of what is possible by looking at our collection of example protocols, the standard library of post-processing functions, and the cardiac post-processing library. Many of the protocols in the Web Lab have extensive tutorial-style comments.

The overall structure of a protocol description appears below, but first we explain some general concepts that occur throughout protocols.

General concepts and constructs

The syntax borrows heavily from programming languages like C and Python. Major sections are contained within braces, and introduced by the section name; the brace may occur on the same line or the next. However, individual statements are terminated by newlines, rather than a specific character like the semicolon. Comments begin with a hash "#", and continue until the end of the line. Non-mathematical content is liberally scattered with English keywords to indicate the constituent parts.

In the following, we often use a style borrowed from command-line help messages to indicate the syntax for particular segments of the language. Optional items are enclosed in [square brackets]; <angle brackets> indicate a placeholder for something that needs to be substituted; and a vertical bar '|' indicates a choice between alternatives.

Identifiers and name resolution

[Figure: Name resolution schematic]

An important concept throughout a protocol is the names that may be given to variables, simulations, etc. and how references are resolved. Normal identifiers consist of alphanumeric characters and underscores, e.g. "var_name1". However, in many places variable references may also contain colons, which are used as a prefix separator, separating identifiers into distinct namespaces. This allows us, for example, to distinguish between results arising from different simulations of the same model, which would otherwise use the same name for the same model output. Each simulation is assigned a prefix, and this is used to refer to the results from that simulation.

This prefix concept is used throughout the protocol language for resolving name references. In the run-time implementation of a protocol many environments exist, mapping names to the corresponding values. Any environment may associate another with a particular prefix, and names starting with that prefix are looked up in that environment. As well as for simulation results, this mechanism is used to refer to names from imported protocols, and also to refer to model variables. In the latter case we utilise ontology terms, i.e. URIs, and the prefix to use is given by the namespace bindings in the protocol definition.

An environment may also have a default delegatee, which is used to look up any name not found within the environment. Understanding the delegations occurring between environments is thus important for understanding what variable names to use if you want to look up a particular entity. The delegation graph is shown in the schematic on the right.

The following environments exist.

  • An environment containing the protocol inputs.
  • An environment containing the definitions in the protocol library.
  • An environment for names defined in the post-processing section.
  • Each simulation has an associated environment as it runs, which is used for evaluating expressions in ranges and in set modifiers. The environment itself contains only the current value of the associated range(s). However it delegates to many useful environments. In a nested simulation, note that the environment for each nested level delegates by default to the next level out.
  • Environments containing the results of each simulation. These always exist, but only contain values once the simulation has run.
    • The environments delegate by (ontology) prefix to the environments containing the current values of variables in the model that will be simulated.
    • The names of results within these environments use the local names of variable annotations, i.e. with the base URI of the relevant ontology stripped off. This may change; see #2529.
  • An environment containing the declared outputs of the protocol. Its contents are copied from the post-processing environment once the protocol has run - it does not delegate to any environment. Any variables to be plotted must appear in this environment.
  • Function definitions define a local environment for the function body, which delegates to the statically enclosing scope (i.e. the environment where the function was defined). Note that function parameters get evaluated in the environment of the caller of the function, however.
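
As a concrete illustration of these lookups (the names here are assumptions, chosen to match examples later on this page):

    sim:membrane_voltage      # in post-processing: a recorded output of the simulation with prefix 'sim'
    oxmeta:membrane_voltage   # in the model interface or a modifier: the model variable carrying this ontology annotation
    std:Mean                  # a definition from a protocol imported with the prefix 'std'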

Mathematical computations

Mathematical computations can use most of the operations on real numbers defined by MathML, with some extensions to support working with n-dimensional arrays. Expressions are evaluated to yield values, and we also support explicit sequencing of operations through statements. The semantics of the language are pure (i.e. variables just represent bindings of values to names; they may not be updated once assigned) and functional (loosely, functions are first-class entities that may be passed as arguments to functions).

Values

Values yielded by expressions in the post-processing language may be of 6 kinds.

  1. Regular n-dimensional array of real numbers (IEEE double precision). These are written within square brackets.
  2. Single real number (which is equivalent to a 0d array). These may be written in decimal or scientific format, e.g. "1.5e3".
    • Note that there is no explicit boolean type. Instead, zero is considered to represent false, and any other value is true.
  3. Function objects (closures) - functions are first class entities.
  4. Tuple of values - created with brackets and commas, e.g. (1, 2, 3).
  5. Special "null" value (for use as a sentinel etc.) - the reserved word null.
  6. Special "default parameter" value, which allows us to say explicitly "use the default value" when passing parameters) - the reserved word default.

Expressions

The core part of the expression language, for working just with real numbers, uses syntax familiar in most programming languages. The basic arithmetic operators are: + - * /; ^ is used for exponentiation. The relational operators are: == != <= >= < >. Logical and/or are represented by && and || respectively, with negation using the keyword not (e.g. not (1 > 2)). Precedence can be overridden or clarified by using explicit parentheses, but the default rules shouldn't be surprising.

In addition to the operators with special syntax, many MathML operators are available for use as functions. Their names must be prefixed by MathML:, for instance MathML:exp(2.0). All the trigonometric functions are available, as well as: quotient rem max min root xor abs floor ceiling exp ln log.

Conditionals use a verbose if-then-else syntax, with the else clause being compulsory. For example, if a < 5 then 1 else 0.
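
Putting these together, a couple of illustrative expressions (V here stands for any previously defined value):

    if V >= -90 && V <= 40 then MathML:exp(V / 10) else 0
    not (MathML:abs(V) > 100 || V == 0)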

Defining and calling functions

While functions will often be defined by a statement, inline anonymous functions are also supported. These are particularly useful in defining the first argument for the map or fold functions. Anonymous functions are defined by the lambda keyword:

    lambda <param>[=<value>] [,<param>[=<value>]]*: <expr>
    lambda <param>[=<value>] [,<param>[=<value>]]* { <statements> }

Default values can be given for any parameter. The first form, where the function body is just an expression, is most common, but full 'list of statements' functions are also supported.
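
For example (a sketch; the names are arbitrary):

    # Expression-body lambda, with a default value for the second parameter
    scale = lambda x, factor=2: x * factor
    # Statement-body lambda
    safe_divide = lambda a, b {
        assert b != 0
        return a / b
    }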

Where a function is assigned to a name (or defined with the statement syntax below) it can be called using the syntax:

    <function_name>(<expr> [,<expr>]*)

If you want to use the default value for any parameter, you may pass the special value default. If not enough arguments are supplied, but defaults are defined for all remaining parameters, then the defaults will be used.
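
Continuing the sketch above:

    scale(3)                   # --> 6; the default factor is used
    scale(3, default)          # --> 6; explicitly requesting the default
    scale(3, 10)               # --> 30
    safe_divide(scale(3), 4)   # --> 1.5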

As a convenience for the common use case where you want to define an anonymous function that simply wraps a mathematical operator, there is a special syntax for this. For example: @2:+ is a short-hand for writing lambda a,b: a + b. You need to give the number of arguments after the @ symbol, and then the operator after a :. The operator may be one of those with special syntax as above, or MathML:<name> for others defined in MathML (e.g. @1:MathML:exp).

Creating arrays

Arrays may be created simply by listing their entries within square brackets, e.g. [1, 2, 3] or [[1,2], [3,4]] for a 2-d example. For larger or more complex arrays, the array comprehension syntax is much more powerful. This lets you define arrays in terms of a generating expression which is evaluated to yield each entry, and index variables which range over the dimensions of the array, determining its shape. The generating expression can use these index variables in order to give a different value for each entry.

The general syntax is

    [ <generating_expr> <for_spec> [<for_spec>]* ]
    # i.e. one or more instances of <for_spec>
    # where <for_spec> = for [<expr>$]<identifier> in <expr>[:<expr>]:<expr>

Each for specification defines one dimension of the array. The identifier gives the name of the index variable, and the optional expr preceding it says explicitly which dimension it varies over; by default each for specifies the next non-specified dimension. The in clause gives the start, step, and end values for the index variable, using half-open range semantics (as in Python): the start point is included, but not the end point.

The generating expression may evaluate to an array, in which case the array must be the same shape at every evaluation. This can be used, for instance, to replicate an array along a new dimension. The dimensions of the array at each entry fill in any 'gaps' in the for specifications, or go at the end.

Some examples may be clearer:

    [i for i in 0:N]                           # --> [0, 1, ..., N-1]
    [ i for 0$i in 0:10 ]                      # --> [0,1,2,3,4,5,6,7,8,9]
    [i*2 for i in 0:2:4]                       # --> [0, 2]

    [i+j*5 for i in 1:3 for j in 2:4]          # --> [ [11, 16], [12, 17] ]
    [ i*3 + j for 0$i in 1:3 for 1$j in 0:3 ]  # --> [[3, 4, 5], [6, 7, 8]]
    [ i*3 + j for 1$j in 0:3 for 0$i in 1:3 ]  # --> [[3, 4, 5], [6, 7, 8]]
    [i^j for 1$j in 4:-1:2 for i in 1:3]       # --> [ [1, 1], [16, 8] ]

    [ [i*3+j for j in 0:2] for i in 0:5]       # --> [ [0,1], [3,4], [6,7], [9,10], [12,13] ]
    [ [[-10+j,j],[10+j,20+j]] for 1$j in 0:2 ] # --> [ [[-10,0], [-9,1]] , [[10,20], [11,21]] ]

    # Dimension specifiers can be expressions too...
    [i for (2-2)$i in 2:(3+5)]           # --> [2, 3, 4, 5, 6, 7]
    [i for 2-2$i in 2:4]                 # --> [2, 3]

Extracting sub-arrays

Appending one or more 'view specifications' in square brackets to the end of an expression yielding an array gives you a portion of that array. A view specification looks like [<dim_expr>$][<start_expr>]:[<step_expr>:][<end_expr>], and determines the start, step, and end along one dimension. As for array comprehensions, view specifications use half-open range semantics: the start element is included, but the end is not. If start or end are omitted, the view goes from the beginning or end of the dimension, respectively (or vice versa, if step is negative). If the dimension specification is omitted, the view specification is considered to apply to the first unspecified dimension. The dimension specification may also be *, in which case the view will apply to all unspecified dimensions.

Examples:

    # Assume input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # And input2d = [ [1,2,3], [11,12,13] ]
    input[2]         # --> 3
    input[-3]        # --> 8

    input[1:4]       # --> [2, 3, 4]
    input[:2]        # --> [1, 2]
    input[8:]        # --> [9, 10]
    input[-2:]       # --> [9, 10]

    input[0$3]       # --> 4
    input2d[0]       # --> [1,2,3]
    input2d[0][1]    # --> 2
    input2d[0:1][1]  # --> [2]
    input2d[1$0]     # --> [1,11]
    input2d[1$0][0$1]# --> 11
    input2d[*$1]     # --> 12

    # Negative step views
    input[:-1:]      # --> [ 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 ]
    input[:-1:-3]    # --> [10, 9]
    input[3:-1:]     # --> [4, 3, 2, 1]
    input[4:-1:2]    # --> [5, 4]
    input[2:-2:0]    # --> [3]
    input[2:-2:-11]  # --> [3, 1]
    input[-1:-3:]    # --> [10, 7, 4, 1]

Special operations on arrays

map: This function applies another function (the first parameter) element-wise to one or more arrays.

    map(f, [0, 1, 2])                 # --> [f(0), f(1), f(2)]
    map(@2:*, [0, 1, 2], [3, 4, 5])   # --> [0, 4, 10]

    def Diff(a, dim): map(@2:-, a[dim$1:], a[dim$:-1])

fold: This is a 'reduce' style operation, which folds a 2-argument function along one dimension of an input array. It takes 4 arguments: the function to fold, the array to fold over, an initial value (defaulting to null, in which case the first element of the input array is used), and the dimension to fold along (defaulting to the last). Note that the folded function must return just a single real number, not an array, for simplicity of implementation. The result will have the same number of dimensions as the input, and the same shape except that the folded dimension will have length 1.

    fold(@2:+, [ [1, 2, 3], [4, 5, 6] ], 10, 1) # --> [ [16], [25] ]

    def Max(a, dim=default): fold(@2:MathML:max, a, default, dim)
    def Min(a, dim=default): fold(@2:MathML:min, a, default, dim)
    def Sum(a, dim=default): fold(@2:+, a, 0, dim)
    def Product(a, dim=default): fold(@2:*, a, 1, dim)

find: This single-argument function finds non-zero entries in the input array. It returns an array of indices suitable for passing to the index operator. The shape of this result is num_non_zeros x num_input_dimensions.

    find([ [0, 1, 2], [1, 0, 0] ])   # --> [ [0,1], [0,2], [1,0] ]

index: This creates a subarray of the first operand containing only those elements whose indices are in the second operand. The language data model requires regular arrays, so by default an assertion is tripped if the result would be irregular. However, further operands may specify either to pad short rows along one dimension with a given value, or shrink either left or right along one dimension to make the result regular. Due to its complexity, this operation has special syntax rather than being a built-in function:

    <input_expr>{<indices_expr> [, <dim_expr>] [, pad:<pad_direction>=<pad_value>] [, shrink:<shrink_direction>] }

Its behaviour is easiest to illustrate by example:

    # Create 3x5 input array
    input = [ i*5 + j for i in 0:3 for j in 0:5 ]
    input_max = 14
    
    # All but one entry
    all_bar_largest = map(lambda x: x < input_max, input)
    all_bar_largest_idxs = find(all_bar_largest)
    all_bar_largest_indexed = input{all_bar_largest_idxs, 0, pad:1=-1}
    assert ArrayEq(all_bar_largest_indexed, [ if i == 2 && j == 4 then -1 else input[i][j] for i in 0:3 for j in 0:5 ])
    
    # Whole array
    all_entries = map(lambda x: x < input_max + 1, input)
    all_entries_idxs = find(all_entries)
    all_entries_indexed = input{all_entries_idxs}
    assert ArrayEq(all_entries_indexed, input)
    
    # Odd entries
    odd_entries = map(lambda x: MathML:rem(x, 2), input)
    odd_indices = find(odd_entries)
    # Note that indexing using odd_indices as-is would fail as result is irregular
    shrink_right = input{odd_indices, 1, shrink:1}
    shrink_left = input{odd_indices, 1, shrink:-1}
    assert ArrayEq(shrink_right, [[1, 3], [5, 7], [11, 13]])
    assert ArrayEq(shrink_left, [[1, 3], [7, 9], [11, 13]])
    
    # Explicitly construct a regular index
    reg_indices = [ [i/2, [1,3,0,2,1,3][i]] for i in 0:6 ]
    some_odd_entries = input{reg_indices} # Dimension defaults to 1
    assert ArrayEq(some_odd_entries, shrink_right)
    
    # Now try with padding instead
    pad_right = input{odd_indices, 1, pad:1=55}
    pad_left = input{odd_indices, 1, pad:-1=-55}
    assert ArrayEq(pad_right, [ [1,3,55], [5,7,9], [11,13,55] ])
    assert ArrayEq(pad_left, [ [-55,1,3], [5,7,9], [-55,11,13] ])

Other expression types

The type of entity yielded by an expression can be examined using the accessor syntax, in which an expression is followed immediately by a period '.' and the name of the property to test. For example, [1, 2, 3].IS_ARRAY evaluates to 1 (true). The properties are:

  • Type tests: IS_SIMPLE_VALUE IS_ARRAY IS_STRING IS_TUPLE IS_FUNCTION IS_NULL IS_DEFAULT
  • Array information extraction: NUM_DIMS NUM_ELEMENTS SHAPE
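
A few more illustrative uses (a sketch; SHAPE is assumed to yield a 1-d array of dimension extents, as used by the Mean example later on this page):

    [1, 2, 3].NUM_ELEMENTS           # --> 3
    [[1, 2, 3], [4, 5, 6]].NUM_DIMS  # --> 2
    [[1, 2, 3], [4, 5, 6]].SHAPE     # --> [2, 3]
    (1, 2).IS_TUPLE                  # --> 1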

Finally, for debugging purposes any (sub)expression may be 'traced' by directly appending a question mark '?' to it. Whenever a traced expression is evaluated, its value is written to a trace.txt file within the experiment's output folder.

Statements

Four kinds of statement are supported: assignment, function return, assertion, and function definition (although the latter is really just friendly syntax for assignment of a lambda expression).

    [optional] a = <expr>             # Simplest form of assignment
    [optional] a, b = <expr>, <expr>  # Assignment of multiple names at once
    [optional] a, b = <expr>          # Here the expression must evaluate to a 2-tuple (or pair)

Note that since the language is pure, multiple assignments to the same name within the same context are not allowed. If the optional keyword is given then any errors in evaluating the defining expression will be ignored, and the variable(s) will simply not be defined.
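
For example, a sketch assuming a simulation prefixed sim and an ontology term that the underlying model may or may not provide (Max is the standard-library function shown under fold above):

    optional peak_INa = Max(sim:membrane_fast_sodium_current)[0]
    # if the lookup fails, peak_INa is simply left undefined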

Return statements are only allowed within the context of a function body.

    return <expr>          # Simple return
    return <expr>, <expr>  # Convenience syntax for returning a tuple of values

Assertions will terminate the protocol if the expression evaluates to zero - no error handling is currently supported.

    assert <expr>

Function definitions come in two flavours, depending on whether the body consists of a single expression or a sequence of statements. The former is just a more compact syntax for a function containing a single return statement.

    def <name>(<params>): <expr>
    def <name>(<params>) {
        <statements>
    }

Any parameter may be given a default value, as for lambda expressions, using the form <param_name>=<default_value>. Note that at present the default value is not allowed to be computed by an expression; it must be a simple value.

This example shows several advanced uses of the function syntax. A zero-argument function can be used to set up a nested scope. Name lookups within the function body delegate to the statically enclosing scope, so the lookup of outer_var works.

    def nested_scopes() {
        # Nested function
        def nested_fn(input): input + outer_var
        # Variable that should be accessible
        outer_var = 1
        # Call nested function & check result
        assert nested_fn(1) == 2
    }
    nested_scopes()

Loading data from file

There is a special built-in load function that can read 2d arrays from CSV files. The syntax is just like a normal function call, and here the data read is assigned to the variable 'array':

    array = load("path/to/data.csv")

The data layout is assumed to be the same as for 2d arrays output by a protocol to CSV: the first dimension of the loaded array indexes the columns of the file, and the second indexes the rows. This means that you can, for instance, access the first column using an expression like array[0], or the first row using array[1$0].

Relative file paths are resolved relative to the folder containing the protocol file itself, so you can use a filename with no path component to reference a data file in the same folder.

Note that if you are using a data file to define a voltage clamp protocol or similar, you may find the interpolation feature of the model interface section to be more appropriate.

Parts of a protocol file

[Figure: Protocol language overview]

A protocol description consists of several sections, shown graphically on the right, which must appear in the order given here:

  1. Documentation
  2. Namespace bindings
  3. Protocol Inputs
  4. Imports of other protocols/libraries
  5. Library statements
  6. Units definitions/conversions
  7. Model Interface (model inputs and outputs)
  8. Simulation Tasks
  9. Postprocessing
  10. Protocol Outputs
  11. Plots

For each section we describe the content, then give some examples of the syntax.

Execution of a protocol also essentially follows this sequence: the model is loaded, the library statements are executed, the simulations are run, post-processing takes place, protocol outputs may be saved to disk, and plots are created.

Documentation

Human-readable documentation for a protocol can be included, using the Markdown format. The functional curation web interface will parse the Markdown and render it nicely in the browser. markdownlivepreview.com provides a nice way to check that your Markdown is doing what you want, without having to upload loads of different versions of your protocol!

documentation {
# Protocol title

Arbitrary Markdown goes here.  Or you can just write plain text if you prefer.

* This is a list...
* ...with four items
* *This one is emphasised*
* You need to escape underscores: \_ (if you don't want them to indicate emphasis)
}

Namespace bindings

In order for a protocol to refer to model variables, we utilise ontological annotations of those variables. To fit with the identifier naming scheme, prefixes must be defined for the ontologies used for model annotation. Each namespace line associates a prefix with a base URI; an identifier prefix:name then refers to the term obtained by concatenating name to the base URI.

namespace oxmeta = "https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#"

# oxmeta:membrane_voltage then refers to https://chaste.comlab.ox.ac.uk/cellml/ns/oxford-metadata#membrane_voltage

(Note that the above is the only ontology currently supported by pycml.)

Protocol input declarations

Protocols may declare inputs, along with default values that can be computed using arbitrary expressions. The idea is that external software can supply alternative values, or the defaults can be overridden when the protocol is imported or nested.

inputs
{
    input1 = 1.0
    input2 = 5.0 / 3
    input3 = [ i * 2 for i in 0:2:11 ]
}

Imports of other protocols

Protocol imports are intended for use either in specialising existing protocols (by overriding their inputs) or in importing functionality from other protocols. Often this is a library section of definitions useful for post-processing, but all sections of an imported protocol may be merged with the importer, allowing units definitions, model interface specifications, etc. to be imported too. Note that imports are not placed within a named section, but like namespace bindings appear at the top level of the protocol.

# When specialising a protocol, simply specify the path and new values for any inputs.
import "generic_protocol.txt" {
    input1 = 42
}

# To import a library of routines useful in post-processing, it's advisable to give it a prefix
# to avoid potential name clashes, although this is not required.
import std = "BasicLibrary.txt"

# If no prefix is given then all parts of the imported protocol are merged with the importing protocol.
# Definitions in the imported protocol precede those in the importing one.
import "MyProtocolComponents.txt"

Common imports are of the standard library of post-processing functions, and the cardiac post-processing library.

Library

This consists of a sequence of assignment and/or assertion statements. It is typically used to define functions for use later in the protocol, e.g. in post-processing. Another common use case is to store the initial/default value of a model variable, since the library statements are evaluated before any simulations are run.

library {
    default_Cao = sim:oxmeta:extracellular_calcium_concentration
    step_calcium_values = map(lambda multiple: default_Cao * multiple, [0.5, 1.0, 1.5])
}

Physical unit definitions

We reuse the model of physical units defined in the CellML language. The SI units are built-in, and users may define their own units based on these: combining multiple units, perhaps with multipliers (which can often be expressed using standard SI prefixes) and/or exponents. Optionally, a description for the units may be given in quotes afterwards, which is used for labelling axes when plotting; if no description is given the units name is used instead.

These definitions are referenced within the model interface section, to ensure that conversions are applied where necessary. They are also referenced when defining simulation ranges and protocol outputs.

units {
    my_units = 50 deci metre^2
    mV = milli volt
    uA_per_cm2 = micro ampere . centi metre^-2 "{/Symbol m}A/cm^2"
    A_per_F = ampere . farad^-1
    mM = milli mole . litre^-1 "{/Symbol m}M"
}

Model interface

This section defines the interface between the model and protocol. It defines model inputs (variables which can be set from the protocol), outputs (the variables recorded while running a simulation, which hence generate n-d arrays), alterations to model mathematics, and special units conversion rules. Each directive within the section starts with its own keyword, and they may appear in any order, even interleaved.

The following snippet shows all the constructs that can be used here. Optional items are enclosed in [square brackets]; <angle brackets> indicate a placeholder for something that needs to be substituted; and a vertical bar '|' indicates a choice between alternatives. The term simple_expr denotes a simplified form of expression which contains only the mathematical constructs permitted in a CellML model, i.e. operations on real numbers; no arrays, function definitions, etc. Any constants appearing in such simple expressions must be annotated with their units, using the syntax "<number> :: <units_ref>". Here, <units_ref> may either be the name of some units defined in the protocol (i.e. like <uname>), or the construct units_of(<prefix:term>) to refer to the units of some existing variable within the model.

model interface {
    independent var units <uname>
    input <prefix:term> [units <uname>] [= <initial_value>]
    output <prefix:term> [units <uname>]
    optional <prefix:term> [default <simple_expr>]
    clamp <prefix:term> [to <simple_expr>]
    var <name> units <uname> [= <initial_value>]
    define diff(<prefix:term>;<prefix:term>) | <prefix:term> = <simple_expr>
    convert <uname> to <uname> by <simple_lambda_expr>
}

Note that the <initial_value> for an input declaration is a plain number with no units annotation, since the units are specified elsewhere in that declaration.

independent var units: This specifies the units of the independent variable, as expected by the protocol. Often the independent variable will be specified as an output, in which case the units can be given there and this clause is unnecessary.

    independent var units ms

input: This specifies a variable that can be altered in a simulation modifier, optionally with units specified so conversions can be applied if needed. If the variable doesn't exist, this is an error unless units are given - the variable is then created. Unless the variable is specified as optional (see below) it must also be given a value, either using an initial value here, or defining it by a define clause, or both (if defining by an ODE). The initial value must currently be a simple number, not an expression; it overrides any initial value present in the model.

    input oxmeta:membrane_voltage units mV

output: This specifies a variable to record while running a simulation, which will hence generate an n-d array in the simulation results. If units are given, conversions will be added if necessary; otherwise the output will use the units defined in the model. The model equations will be filtered so only equations required to obtain the specified outputs are computed.

The ontology term for an output variable may be used in the model to annotate multiple variables. In this case, a single output array is created containing values of all the variables; the last dimension of this array varies over the variables. Currently this facility is only available through the oxmeta:state_variable term, which "magically" annotates all state variables.
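
For example (assuming the model annotates variables with these oxmeta terms, and that mV and ms are defined as in the units examples above):

    output oxmeta:membrane_voltage units mV
    output oxmeta:time units ms
    output oxmeta:state_variable        # one output array covering every state variable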

optional: this is used to declare a model variable (referenced by ontology term) as optional. If an optional variable doesn't appear in the model, then it is not an error for it to be referenced by an input, output, clamp or define construct. If a default clause is included, then it will be used as the variable's definition; otherwise any construct referencing the variable will simply be ignored. (A warning will be emitted if optional variables are not found, to assist in debugging unexpected results.)

Note that this 'optional-ness' only applies directly to uses within this section of the protocol - if a simulation or post-processing tries to access or set an optional variable that doesn't exist, this is still an error. The exceptions are post-processing assignments and output specifications that are themselves guarded by an optional keyword.
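
For example (the terms and default value here are purely illustrative):

    optional oxmeta:membrane_fast_sodium_current
    optional oxmeta:extracellular_calcium_concentration default 2 :: mM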

clamp: This specifies that a model variable should be fixed, rather than allowed to vary according to the model equations. It can either be clamped to the value it takes initially in the model, or to a value defined in the protocol. A clamped variable can still be changed during a simulation, but only by using a 'set' simulation modifier. Note that clamp var is equivalent in behaviour to define var = var, and clamp var to val is equivalent to define var = val; the clamp syntax is merely clearer for users.
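
For example (again with illustrative terms and values):

    clamp oxmeta:extracellular_calcium_concentration      # hold at its value in the model
    clamp oxmeta:membrane_voltage to -80 :: mV             # a simple voltage clamp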

var: This declares a new variable to add to the model, optionally with an initial value. The variable can be referenced within units conversion rules (convert) or new model equations (define).

define: This construct is used to add new equations to the model, or replace existing ones. If the left-hand-side of the equation is a variable or derivative that appears in the model, its definition is replaced. Variable references within the equation must use ontology terms unless they are to new variables defined using var. The special variable name original_definition may be used in the right-hand side when replacing a model equation, to insert the original definition within the new one, allowing an equation to be augmented rather than simply replaced.
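
For example, a minimal sketch (the terms are illustrative, and dimensionless is assumed to be available as a built-in CellML unit):

    # Replace an existing definition with a constant (much like a clamp)
    define oxmeta:extracellular_calcium_concentration = 2 :: mM
    # Scale an existing definition rather than replacing it outright
    define oxmeta:membrane_fast_sodium_current = (0.5 :: dimensionless) * original_definition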

As a special case a variable may be defined as a linear interpolation on a supplied data file rather than computed directly by an equation. For example,

    define oxmeta:membrane_voltage = interpolate("voltage_data.csv", oxmeta:time, ms, mV)

to define a voltage clamp protocol which may include arbitrary linear ramps. The supplied data file must have 2 columns, the first of which gives values for the independent variable (oxmeta:time in this example) and the second the dependent variable whose value is to be calculated (here oxmeta:membrane_voltage). The final two arguments give the units of the independent and dependent variable, respectively, within the data file, in case conversions are necessary to match what the model and/or protocol are using for these variables.

Note that this will only be performant with large tables if the independent variable is regularly spaced, in which case the lookups can be optimised. Otherwise it is implemented as a piecewise expression with a case for each segment.

convert: This defines a rule for a units conversion which isn't a simple scaling within a dimension (such scalings are handled automatically). The rule utilises the values of model variables (referenced using ontology terms) or protocol variables (defined using var) to convert between quantities in different dimensions. It specifies a unary function for converting a value from one set of units (the 'actual' units) to another (the 'desired' units). If a variable is required to be in the desired units, but is defined in the actual units, then applying the conversion function to its definition yields the desired value (assuming the original assignment has consistent units). Note that conversion functions just need to get you to the right dimensions, rather than exactly the desired units: normal scaling gets applied afterwards and doesn't need any special definition.

If the conversion function references (using an ontology term) a model variable which doesn't exist, the rule can't be applied; a warning will be generated for each such rule, and you'll probably later get an error saying the model can't be units-converted. However, this does allow you to specify alternative rules for converting between the same units (e.g. for ionic currents, first try a rule that uses an explicit cell volume, falling back to the capacitance trick if the model has none).

Implementation note: the conversion rules don't necessarily require the model interface code to understand the full post-processing language, since the conversion function body must be a single expression using only the pure-MathML subset supported in a model (i.e. no arrays, no function definitions, etc.). Hence all you need to do is copy the lambda definition and replace any ci referring to the bvar with the original definition of the converted variable.

    var chaste_membrane_capacitance units uF_per_cm2 = 1 # For units conversion rules
    convert A_per_F to uA_per_cm2 by lambda rhs: rhs * chaste_membrane_capacitance
    convert microamps to uA_per_cm2 by lambda rhs: (rhs / oxmeta:membrane_capacitance) * chaste_membrane_capacitance

Simulation tasks

Three kinds of simulation tasks are currently supported: straightforward timecourse simulations of a model; nesting of another simulation task within a loop; and a single step of the default simulation method associated with a model. Syntactically, these are defined using the following forms.

tasks {
    simulation [<prefix> =] timecourse {
        <range>
        [<modifiers>]
    }[?]
    simulation [<prefix> =] nested {
        <range>
        [<modifiers>]
        nests <simulation>|<protocol>
    }[?]
    simulation [<prefix> =] oneStep [<expr>] [ { [<modifiers>] } ] [?]
}

Each simulation task may be associated with a prefix, which can then be used to access its results, whether within modifiers for that or a later simulation, in post-processing, or for protocol outputs. If no prefix is defined, the simulation's results are not stored; this is typically used in conjunction with a 'save state' modifier to save the final state of the simulation for subsequent use.

Ranges and modifiers are described below. The other element common to each simulation type is the optional question mark after the definition. This indicates that the simulation should be 'traced', i.e. that intermediate or debugging information that would normally be suppressed should instead be stored to disk within subfolders of the main results folder. This only has any effect, at present, with cell-based Chaste models or nested protocols.

A nested simulation encloses another simulation directly within its definition, rather than referring to a separate task by name (at least at present). The <simulation> placeholder should thus be replaced by a complete simulation definition.

Alternatively, an entire protocol may be nested within a simulation loop, using the nested protocol form. In this case, however, the protocol to nest must be defined in a separate file. Inputs of the nested protocol may be given new values computed by arbitrary expressions. Also, the outputs of interest from the nested protocol may be specified (by name), and only these results will be stored. If an output is marked as optional, it is not an error if the nested protocol does not provide that output. (If an optional output is present on some invocations of the protocol but not others, it is treated as though it is always missing.)

By default, any output files or graphs that would typically be written by the nested protocol are lost. This behaviour can be changed by appending a question mark '?' after the closing brace, in which case all results will be stored in subfolders.

        nests protocol "path/to/protocol.txt" {
            [ <name> = <expr> ]*
            [select [optional] output <name> ]*
        }[?]

Ranges

Ranges are used to define the loop extents for timecourse and nested simulations. Three kinds of looping are supported: uniform spacing over a closed interval; iteration over the elements of a vector; or continuing while some condition holds. For each kind, the range must be given a name (by which the loop variable can be referenced in modifiers) and physical units.

    range <name> units <uname> uniform <expr>[:<expr>]:<expr>
    range <name> units <uname> vector <expr>
    range <name> units <uname> while <expr>

For a uniform range the three expressions determine the start point, optional step size, and end point of the iteration. Note that both the start and end point are included in the range.

For a vector range, the single expression must evaluate to a 1-dimensional array, the entries of which will be iterated over.

A while range may only be used for outermost loops, not for simulations embedded within a nested simulation. The loop variable increments over the integers at each iteration, yielding 1, 2, 3, etc. The condition expression is evaluated at the start of each iteration, except for the first (since empty simulation results are not allowed). If it is zero, the loop will terminate. The condition expression may utilise the partial results from the current simulation, by using the simulation's own prefix in name references. The size of these partial results will reflect how many simulation steps have occurred thus far. All this means that the first time the condition is evaluated, the loop variable will be 2, the model's free variable (if this is a timecourse) will be 1, and there will be partial results of length 1 (in the first dimension) available.
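
For example (the units and expressions are illustrative; dimensionless is assumed to be available as a built-in CellML unit):

    range time units ms uniform 0:0.5:1000          # 0, 0.5, ..., 1000 (both ends included)
    range multiple units dimensionless vector [0.5, 1.0, 1.5]
    range pace units dimensionless while pace < 100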

Modifiers

Modifiers allow the protocol to change the model state as a simulation progresses. They are intended for modifications performed at defined points, rather than continuously during simulation - for the latter kind, the model interface section must be used to change the model equations. The point at which a modifier is actioned is determined by the <when> term, which can be "start", "each loop", or "end".

         modifiers {
             at <when> set <prefix:term> = <expr>
             at <when> save as <name>
             at <when> reset [ to <name> ]
         }

The set modifier changes the value of a single model variable. The expression can use the full expression language, but must evaluate to a single number.

The save and reset modifiers work in conjunction. The save modifier records the current model state and gives it a name. A reset modifier in this or a subsequent simulation may then reset the model to that state. If no name is given when resetting, the model is reset to its initial conditions.
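
Putting tasks, ranges and modifiers together, here is a sketch of a nested simulation; it assumes the units, ontology terms and the step_calcium_values library variable from earlier examples on this page:

tasks {
    simulation sim = nested {
        range Cao units mM vector step_calcium_values
        modifiers {
            # return to the original model state before each inner run
            at each loop reset
            at each loop set oxmeta:extracellular_calcium_concentration = Cao
        }
        nests simulation timecourse {
            range time units ms uniform 0:1:1000
        }
    }
}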

Post-processing

The post-processing section consists of a sequence of statements (except for return statements) which are executed in order once all simulations have been run. For example:

post-processing {
    assert sim:result.IS_ARRAY
    a, b = swap(1, 2)
    def Product(a, dim=default): fold(@2:*, a, 1, dim)
    def Mean(a, dim_=default) {
        dim = DefaultDim(a, dim_)
        dim_len = a.SHAPE[dim]
        f = lambda sum: sum/dim_len
        return map(f, Sum(a, dim))
    }
}

Protocol outputs

This section specifies what the recorded outputs of the protocol are, out of all the potential values arising from simulations and post-processing. It also allows protocol authors to describe these outputs in more detail than just their identifier, and associate physical units with them (for those computed by post-processing; the units of simulation results are given by the corresponding model variable).

For outputs arising from post-processing, the same name may be used for the protocol output as for the post-processing variable; names are looked up in the post-processing environment. However, outputs may be renamed using the second form shown below, and this form is also required for simulation results, or indeed any variable that needs a prefixed lookup, since the names of outputs may not contain colons.

Output specifications may be preceded by the optional keyword to suppress errors if the referenced variable does not exist.

outputs {
    [optional] <name> units <uname> ["<description>"]
    [optional] <name> = <var_reference> [units <uname>] ["<description>"]
}
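
For example, a sketch assuming the nested simulation above and a post-processed variable peak_voltage (the names and descriptions are illustrative):

outputs {
    peak_voltage units mV "Peak membrane voltage"
    V = sim:membrane_voltage "Transmembrane potential"
    t = sim:time "Time"
    optional APD90 units ms "Action potential duration at 90% repolarisation"
}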

Graphical plots

Two-dimensional line plots of protocol results may be specified. The syntax is somewhat more flexible than the current implementation, which only supports a single curve specification in each plot. Simple styling of the curves may be specified with a 'using' clause: either continuous lines, isolated points, or lines with points for each datum.

plots {
    plot "<title>" [ using lines|points|linespoints ] {
        <name> [, <name>]* against <name> [ key <name> ]
        ...
    }
}

If a multi-dimensional array is used as the 'y' variable, this results in multiple curves plotted. For instance, a 2d array of shape (3,40) would result in 3 curves each with 40 points. In such circumstances, it is particularly useful to specify a 'key' variable as well, the values of which are used to label the different curves.
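
For example, assuming the outputs sketched above:

plots {
    plot "Membrane voltage" using lines {
        V against t
    }
}

With the nested simulation sketched earlier, V would hold one row per extracellular calcium level, so we would expect this to draw one curve per level on a single plot.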
