EasyStoch User Manual 0V9x

From Facile
Jump to: navigation, search

EasyStoch User's Guide

Overview

When molecular copy counts are small, biological systems are intrinsically stochastic due to the random timing of molecular events. Modeling many biological systems therefore requires a stochastic approach. Here, the standard method is the Gillespie algorithm, an "exact" method for simulating the stochastic dynamics of a biochemical system.

Easystoch is an implementation of the Time-Variable Next Reaction Method [1,2], which is itself an extension of the Gibson-Bruck version of the Gillespie algorithm [3,4]. All methods assume fast diffusion and that all molecules are in the same well-stirred compartment.

The computational performance of the conventional "Direct" and "First Reaction" variations Gillespie's algorithm does not scale well with the number of reactions. The Gibson-Bruck "Next Reaction Method" (NRM) solves this problem -- it is an exact algorithm that is also efficient [3]. It uses a single random number per simulation event, and run times scale logarithmically with the number of reactions, rather than linearly. However, the NRM does not easily accomodate parameters whose value varies with time.

EasyStoch implements the new Time-Varying Next Reaction Method (TVNRM), which extends the NRM to allow the efficient modelling of systems with time-dependent parameters of arbitrary functional form [1,2]. This feature can be used to model extrinsic noise or changes in cellular volume, a significant differentiator with otherwise similar tools.

Easystoch is best used in combination with Facile. The system model is created as a .eqn text file that lists biochemical equations, biochemical parameters and configuration options. Facile reads the model and generates an input .seqn file for EasyStoch.

For best performance, Easystoch is written in C++ and uses the Mersenne Twister pseudorandom number generator, which has a period 2^19937-1 [5].


References

1. Ollivier JF. "Scalable Methods for Modelling Complex Biochemical Networks". PhD Thesis. McGill University, 2011.

2. Shahrezaei V, Ollivier JF, Swain PS. "Colored extrinsic fluctuations and stochastic gene expression". Mol. Sys. Bio. 4:196, 2008.

3. MA Gibson and J Bruck. Efficient Exact Stochastic Simulation of Chemical Systems with Many Species and Many Channels. J. Phys. Chem., 2000.

4. Gillespie DT. "Exact Stochastic Simulation of Coupled Chemical Reactions". J. Phys. Chem., 1977.

5. Matsumoto M., Nishimura T. Mersenne twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans. Model. Comput. Simul. 1998;8:3-30.

Installation

Supported Platforms (UNIX/Windows)

Easystoch was developed in a Linux environment, but should function correctly on any UNIX-like platform (including OS-X) which provides a suitable shell (e.g. bash) along with GNU Make and the GNU C compiler.

On Windows, the easiest solution (and the only one we have tested) is to install CYGWIN. CYGWIN provides a bash shell in Windows along with many standard UNIX commands and applications. When installing, be sure to include the make, gcc, tar and perl packages.

Install Facile

To use EasyStoch, it is best to first download and install Facile (http://facile.sourceforge.net).

Installation using tarball

The EasyStoch stochastic simulator is compiled and installed by running the following commands, adapting as appropriate:

cd ~/toolbox
tar -xvzf ~/Downloads/easystoch_RELEASE_0V90.tar.gz
cd easystoch_RELEASE_0V90
make EASYSTOCH

With easystoch compiled, simply add the installation directory to your PATH by adding the following line to your .bashrc file:

export PATH="$PATH:~/toolbox/easystoch_RELEASE_0V90"

Verifying the Installation

Check that the basic installation is correct by asking for the help page:

easystoch -h


A Simple Example

Create the .eqn File

EasyStoch requires an input file specifying the system to be simulated. The input file comprises chemical equations, reaction rate parameters, and initial numbers of molecules. It is easiest to generate the input file using Facile. Facile reads reactions in a simple text format and produces the input file for EasyStoch.

Let's illustrate the use of Facile and the format of the input file by a simple example. For simulating a binary reversible reaction, the following can be Facile's input file, which we name binary.eqn:

# File binary.eqn
EQN  
# the reversible binary reaction and its forward and backward rate
A + B <-> C;  f= 1e6; b= 2  
INIT                                      
# initial values are 1000 molecules of A, 500 molecules of B and no C molecules
A = 1000N 
B = 500N

To learn more about the format of Facile input see the Facile User Manual. We run Facile with -s command line option to produce EasyStoch input:

facile.pl -s binary.eqn

This command produces two files:

  • binary.seqn, a stochastic equation file that describes the model in a format understood by EasyStoch
  • binary_convert.m is a script used for analyzing the data generated by EasyStoch in Matlab.

Run EasyStoch

Having created the .seqn file read by EasyStoch, we can now run the stochastic simulation. The general form of the easystoch command is

%bash> easystoch <options> input_file > output_file

For our example, we have:

%bash> easystoch --iseed=33 --cout --sout --vol0=1e-15 --tmin=0.01 --tmax=0.2 binary.seqn > simdata.binary.dat

Here, the input file is binary.seqn and the output is captured into the file simdata.binary.dat using the shell's piping mechanism. Several command-line options are specified as well. The --iseed option specifies the seed for the random number generator (if not given, easystoch generates a seed automatically). The --cout option tells easystoch to report concentrations as well as copy counts. The initial volume for the simulation is specified by --vol0. Finally, --tmin gives the time when output and statistical accumulators begin (though the simulation itself always starts at t=0), --tmax specifies when the simulation ends, and --sout specifies that a statistics report should be generated at the end of the simulation.

Input/Output File Format Descriptions

Note on Units

It is possible to re-scale rates, volume and time to arbitrary units, given that the units for bi-molecular reaction rates, time and volume are determined by:

  • for volume, the units of --volume argument
  • for time, the unit of uni and bi-molecular rates
  • for concentration, the unit of Avogadro's number used internally (defaulting to 6.022e23 molecules/mole)

For example, to use units of uM^-1 s^-1 in bi-molecular rates, it is necessary to change Avogadro's number to:

6.022e17 molecules / micro-mole

through the --avo switch.

Input File Description

We specify here the format of the input file read by EasyStoch (which will usually be generated automatically by Facile). As an example, We show EasyStoch's input file for the binary reaction.

Section 0

Section 0 specifies the number of reactions, species, and promoters. Chemical species specified as promoters need to be identified in the version of EasyStoch that simulates cell cycle effects. Currently, all chemical species are treated as proteins.

 # Section 0 (header): no. reactions, no. species, no. promoters
 2 3 0

Section I

Section I specifies the properties of the chemical species to be simulated including their initial values, name, type (currently not used), time of replication for promoters (not used), and EasyStoch ID. The output file prints the chemical species simulated in the order of their ID number.

# Section I (substrates): iv, name, associated promoter (if complex), time (for promoter), EasyStoch ID
1000    A       -1      9999    0       
500     B       -1      9999    1       
0       C       -1      9999    2

Section II

Section II specifies reactions. Reactions with up to two reactants and three products are supported, as well as dimerization reactions, such as A+A -> A2. Reactions must follow the law of mass-action. For reactions with rates that change with time, the reaction ID must be specified in Section IV.

# Section II (reactions): ID 1st reactant, 2nd reactant,
# 1st product, 2nd product, 3rd product, reaction rate, reaction ID
0       1       2       -1      -1      1e6     0 # A + B -> C; f=1e6
2       -1      0       1       -1      2       1 # C -> A + B; b=2

Section III

Section III specifies the dependency matrix which is used to implement the Gibson-Bruck version of the Gillespie algorithm. The i 'th row shows the reaction IDs that are affected by the i 'th reaction. Each line ends with a P. Use of Facile is most convenient here as (for historical reasons) it computes the dependency matrix of the system for EasyStoch.

# Section III (dependency matrix): list ID of affected reactions
1       P
0       P

Section IV

Section IV lists the reaction rates or chemical species that change at some fixed time during the simulation. Each line includes the index of the reaction rate or species to be changed, the time of change, and the new value. Indexes to be changed could refer to either a species ID (an "s" followed by species ID number) or a reaction rate ID (an "r" followed by rate ID number). If no letter is used, the ID number is assumed to refer to a rate.

# Section IV (dynamic rates/species change): reaction/species index, time of rate change, new value
# (no sample times given)

For changes in chemical species, the index number is the initial value for the species given in the Facile eqn file. During the simulation, this numbers of the chemical species will change. It will also be reset to the new, specified value at the specified time of change. This resetting does not affect any other species numbers.

For changes in reaction rates, if reaction rate IDs are not specified by --rate_vary, rates will be kept constant until a rate change specified in section IV occurs or if "ichval, tchval, newval" are implemented in the command line (see below). If reaction rate IDs are specified by --rate_vary, these rates will vary linearly between each rate change. Details of the algorithm are in Shahrezaei et al. (Mol. Syst. Biol., 2008). Section IV can be added to the Facile output. An example, which can be appended to the binary reaction simulation input file, is:

r0 10 0
r1 10 0
s1 20 100
0 100 1e6
1 100 2

The rates corresponding to IDs 0 and 1 (A and B in this case) are set to zero at time 10, species with ID 1 (A in this case) is set to 100 at time 20, and rates with IDs 0 and 1 are returned to their original values (specified in the Facile equation file) at time 100.

To model extrinsic fluctuations in a particular parameter, we generate a time series with the desired sampling frequency using the appropriate model for the fluctuations, e.g. Gaussian white noise or log-normal coloured noise. Matlab or any other application can be used. This time series can be appended to the EasyStoch simulation input file provided it is generated in the format specified above (rate ID; time of change ; parameter value). For example, we use the following command in Linux to append the file time_series.txt to stoch_simulation_input:

cat time_series.txt >> stoch_simulation_input

Output File Description

The output file begins with the options used to call EasyStoch and the changes enforced in any parameters. The simulation output is displayed in columns. The first column is the time of each reaction. The others are the numbers of the species simulated in the order of their species ID (also concentrations are given if the -cout switch is used). With the -i switch, output is given at fixed time intervals. All the comments begin with the % sign to allow the output file to be directly loaded into Matlab. Running binarystoch_convert.m in Matlab defines the variables specified in the Facile file, such as t, A, and B, to allow easy plotting of EasyStoch's results.

The output form can change slightly if other options are specified.

Command Line Parameters

Easystoch has a sophisticated command-line interface which is described below. Some of these options can also be configured in the Facile input file.

OPTIONS (short form, long form, default value if any, description):
h,help       Print this help message.
e,iseed=0    Integer seed for random number generator; if not supplied,
             the seed is derived from current system time.
cout         Flag to output results in micromolar concentration units
             in addition to numbers of each species.
rtrigger=?   Trigger output on given reaction occurring (supply reaction no.) (NOT IMPLEMENTED)
strigger=?   Trigger output on given species changing (supply species no.) (NOT IMPLEMENTED)
ctrigger=?   Trigger output when species given by strigger reaches a count
             specified by ctrigger (NOT IMPLEMENTED)
ntrigger=?   Number of lines to output if triggered (defaults to 1)
             (NOT IMPLEMENTED)
filter=?     Only output data for the given species. This reduces the number
             of columns in the output file. (NOT IMPLEMENTED)
sout         Statistics out flag to print the mean and variances of molecules
             over time period specified; statistics are calculated to tmax 
             whether the tmaxstrict flag is used or not.
i,out_interval=0.0
             Output system state every out_interval seconds. If zero or negative,
             then output state after each reaction.
v,vol0=2.5e-15
             Volume of the cell. Used to rescale binary reaction rates and to
             calculate concentrations.
z,timescale=1.0
             All times are rescaled by this amount. E.g. timescale= 60 
             gives a time coordinate in minutes 
m,tmin=0.0   Time when output and averaging starts. Note that simulation always
             starts at time t=0.
n,tmax=-1    Time when simulation ends. Use -1 to indicate infinity. 
r,runs=1     Rumber of runs the simulation is repeated. Output for all the 
             runs is printed.
ichval=?     Comma-separated list of reaction or species IDs whose rate constant
             or copy count is to be changed during the simulation run. Reaction
             and species are distinguished by prefixing 'r' or 's' to the ID.
tchval=?     Comma-separated list of times when the rate constant or copy count of
             the reaction/species indexed by the corresponding ichval will be changed.
newval=?     Comma-separated list of new values for the rate constant or copy count of
             the reaction or species indexed by the corresponding ichval, which will
             replace the old value at the times indicated by the corresponding value
             in tchval.
pwlin=?      Specifies a comma separated list of reaction IDs whose reaction rates
             change in a piecewise-linear fashion (the default is piecewise-constant).
             The value and time of rate changes is entered using the chval commands
             or the input file.
choff        Flag to stop printing warnings when the parameters (rate constants)
             change during the simulation (NOT IMPLEMENTED)
tmaxstrict   Flag to cause simulation to end when t>=tmax, otherwise 
             one extra reaction is output (but not included in statistics). (NOT IMPLEMENTED????)
version      version info 

N.b. For options that specify a value, do not use the '=' sign with the short form. Eg. for the out_interval option:

easystoch -o 0.001

OR

easystoch --out_interval=0.001

are the same.