The Kappa Tutorial, v4.1¶
Kappa is a language to write computational models; it is intended for biochemical signaling cascades, but not constrained to that use-case. Currently, there are two main platforms to develop and run Kappa-based models:
- Command-line binaries & script files
- Browser-based proto-IDE
For the binaries, there are KaSim nightly builds and there is the KaSim source code. Power users will probably want to compile from source code. The procedure to achieve that will be detailled in a second part elsewhere. For developing Kappa scripts, there is a Notepad++ syntax highlighter avaialable at the development home page.
As this is a tutorial, we’ll write a simple Kappa model and do some basic analysis of it. The proto-IDE will suffice for these purposes, so go ahead an open it in your browser of choice.
The Kappa Script¶
Kappa script files customarily have the .ka
file extension, e.g.:
foo.ka
. Each line of code is treated independently, so the order of
instructions doesn’t matter (except for variables, who must be
declared prior to being used). While KaSim will not complain about
the structure of the code, your advisor might: understandable scripts
do have a structure. Moreover, they have comments and explanations. In
Kappa, there are 2 kinds of comments:
line comments: | start with the pound symbol # and end with the end
of the line |
---|---|
block comments: | start by /* and end by */ , and can span multiple lines |
You can split your model into different files, and tell the simulator to read them in order. You can also put everything on the same file. This last usage is the one will be following here.
Roughly speaking, a Kappa script has the following sections:
Agent Signatures: | |
---|---|
These define the agents and tokens in our
model. An agent (e.g. a protein) has sites it uses
to bind and/or to signal it’s state (e.g. site Y3
in state phosphorylated ). |
|
Variables: | These are symbols that represent a value used during the simulation, e.g. the binding rates, the volume, some complicated algebraic expression. |
Rules: | These are the interactions between agents. Due to the nature of the rules, there are effectively 4 operations in Kappa: change site’s bond state (i.e. bind or unbind), change site’s state (e.g. phosphorylate it, methylate it, ubiquitinate it), degrade an agent or set of agents, and create an agent or set of agents. These operations can be combined into a single rule, or split into multiple rules, depending on the behavior we want to simulate. |
Observables: | These define the entities for which the simulator will output time-course abundances. If we define no observables, the simulator will not ouput any abundance. |
Perturbations: | These modify the simulation, e.g. we add X amount of
an agent, alter the rate of a rule, change the state
of all Y17 sites of agent Smith() to some state gone . |
Agent signature¶
Files customarily begin with the agent signatures, i.e. how we define our agents and their sites of interaction.
To define an agent, we use the %agent:
keyword, followed by the agent’s name, and the agent’s sites (between parenthesis).
The syntax is:
%agent: [agent name]([agent's sites])
For example, let’s define an agent named Protein1
that has two
sites called bindingSiteForProtein2
and bindingSiteForProtein3
,
and a site called Serine12
, which can be in state Unmodified
,
Phosphorylated
, or Mutated
, with Unmodified
being the default
state. The agent would be defined as:
%agent: Protein1(bindingSiteForProtein2,bindingSiteForProtein3, Serine12~Unmodified~Phosphorylated~Mutated)
Or more succinctly as:
%agent: P1(P2, P3, S12~u~p~m)
Note
Single letter codes quickly become opaque: if
U
is for unmodified, what’s for ubiquitinated? If P
is for
phosphorylated, what’s for palmitoylated? If M
is for mutated,
what’s for methylated, di-methylated, or tri-methylated? Moreover, the
English alphabet contains 26 letters, but biochemistry knows of at
least 27 post-translational modifications. I therefore recommend using
at least two letters for state (e.g. ub
, ph
, py
, 1m
,
2m
, 3m
, xx
).
Let’s stick with:
%agent: Prot1(P2, P3,S12~un~ph~xx)
In fact, let’s define two more agents similar to
Prot1
, which we’ll call Prot2
and Prot3
. The sites of Prot2
will be called P1
and P3
, while
Prot3
’s are P1
and P2
. And while we’re at it, let’s add a text
pod in our script to mark where we’re at. So far in our script, we
should have:
############################################################
# Here are my agent signatures
############################################################
%agent: Prot1(P2, P3, S12~un~ph~xx)
%agent: Prot2(P1, P3, S12~un~ph~xx)
%agent: Prot3(P1, P2, S12~un~ph~xx)
Now, it’s time to write how these guys play together.
Rules¶
Here we will write the rules that dictate how can our agents interact. An irreversible rule’s syntax is:
'[rule name]' [left-hand side] -> [right-hand side] @ [forward rates]
A reversible rule’s syntax is quite similar:
'[rule name]' [left-hand side] <-> [right-hand side] @ [forward rates], [reverse rates]
The left-hand side (LHS) is the pattern of reactants, the
right-hand side (RHS) is the pattern of products, and the arrow marks
if it is a reversible (i.e. <->
) or irreversible (->
)
rule.
Note
The reversible rule syntax is purely sintactic sugar: the simulator is internally producing two irreversible rules that would read:
'[rule name]' [left-hand side] -> [right-hand side] @ [forward rates]
'[rule name_op]' [right-hand side] -> [left-hand side] @ [reverse rates]
If in some of KaSim’s output you see rules with _op
appended to their names,
rules that you did not write, they are the reverse of the reversible rules you wrote.
In terms of the guts of the simulator, what is doing is matching the LHS to whatever is in the reaction mixture, and replacing that with what we wrote in the RHS. In a more formal speech, left go the sufficient conditions to trigger a rule, and right goes the pattern injected by said rule’s application. The pace at which a rule is triggered, what would be the rule’s activity, is governed by mass action dynamics. In other words, the probability of rule \(i\) being triggered is given by:
Where \(A_i\) is the activity of rule \(i\), \(LHS_i\) is the left-hand side of rule \(i\), and \(K_i^f\) is the forward rate of rule \(i\) (for reverse reactions, it would be the RHS times the corresponding reverse rate).
Note
To make a rule trigger more often, one can increase the abundance of its LHS, and/or increase the rule’s rate. E.g. a rule with a large rate, but a rare LHS, may still be triggered rarely, depending on the overall activity of the system.
Rule Rates¶
A rule can technically have up to 4 rates:
'[name]' [LHS] <-> [RHS] @ [bimolecular forward rate](unimolecular forward rare), [bimolecular reverse rate] (unimolecular reverse rate)
In practice 3 is the most seen for reversible binding rules, 2 for irreversible binding rules, 1 for irreversible unbinding rules. The rates are used when:
bimolecular forward rate: | |
---|---|
if the LHS has ambiguous molecularity, this is the rate for bimolecular cases. Think of it as the diffusion of two independent entities in a large volume | |
unimolecular forward rate: | |
if the LHS has ambiguous molecularity, this is the rate for unimolecular cases. Think of it as the interaction of agents already connected, possibly through a third party. | |
bimolecular reverse rate: | |
if the RHS has ambiguous molecularity and the rule is reversible, this is the rate for bimolecular cases. | |
unimolecular reverse rate: | |
if the RHS has ambiguous molecularity and the rule is reversible, this is the rate for unimolecular cases. |
What do we mean by ambiguous molecularity? It means we specify two agents which may be already connected through a path not described in a rule; thus the pattern may be bimolecular (two separate things), or unimolecular (two things connected already). Let’s take a look at an example of this situation.
Tip
Ambiguous Molecularity
A LHS has ambiguous molecularity if it has at least two agents that may be connected through a path not stated in the LHS. To observe the proper kinetics, such rules require both a bimolecular and a unimolecular rate.
We want to express the reversible binding relation between Prot1
and Prot2
, who bind through their respective P2
and P1
sites. For the rates, a determinstic binding rate is on the order of
\(10^10\), an unbinding rate around \(10^2\) (this would mean a
disassociation constant \(K_D\) of \(10^{-8}\) molar, or 10
nanomolar). When accounting for volume, let’s use a mammalian cell volume
of \(10^{-12}\) liters, the binding rate becomes \(10^{-2}\); the
unbinding rate shouldn’t care about volume dependency, so the
deterministic rate is the same as the stochastic one. Thus we arrive
at our stochastic rates, a forward (i.e. bind) rate of \(10^{-2}\)
and a reverse (i.e. unbind) rate of \(10^{-2}\). Let’s call such
a rule 'P1.P2'
, it would be written as:
'P1.P2' Prot1(P2), Prot2(P1) <-> Prot1(P2!1), Prot2(P1!1) @ 1.0e-2,1.0e-2
The usage of !n
, where n
is an integer, identifies the binding
endpoints; we could have just as validly used !99
or !0
. Let’s keep
going and add the other two binding rules, one for Prot1
binding
Prot3
, and one for Prot2
binding Prot3
:
'P1.P3' Prot1(P3), Prot3(P1) <-> Prot1(P3!1), Prot3(P1!1) @ 1.0e-2, 1.0e-2
'P2.P3' Prot2(P3), Prot3(P2) <-> Prot2(P3!1), Prot3(P2!1) @ 1.0e-2, 1.0e-2
Warning
It is worth noting that the agents must be in the same order on both sides of the arrow signs. If not, the simulator would replace them with what we told it, thereby effectively degrading the original copies and injecting fresh ones.
For example, agents connected to the original copies would not be bound to the new ones. If the original agents had sites in states not mentioned in the rule, they would be replaced with agents whose sites would be in the default state.
Notice that our rules don’t specify every site of the agents, but just some
of the sites. In Kappa, we follow the don’t care, don’t write philosophy:
if a site is not important to the mechanism we want to represent, then we don’t
write that site. In this case, the binding of our agents depends exclusively
on the respective binding sites; it is independent of the state of the other
binding sites, and the state of their 12
site.
Tip
Don’t care, don’t write
If a site is not important to the mechanism we want to represent, then we don’t write that site. With this action, we make the rule independent of whatever is happening at unmentioned sites.
Having these three rules, we can render the contact map, which would look something like this:
Note
On the proto-IDE, you can drag around the agents to re-organize the contact map. Once you’re pleased with the layout, you can output the file as an image.
Notice there are no unimolecular rates in the above writing of the rules. This means that the simulator will always use the bimolecular rate to bind those agents. Consider however what would happen if we apply a binding rule to agents already bound through a third party!
For example, imagine we have a Prot1
bound to a Prot2
itself bound to a
Prot3
. In such a case, Prot1
’s S3
site is empty, as is Prot3
’s S1
site. Thus it is perfectly valid to apply rule P1.P3
to that Prot1
and that Prot3
. The simulator would use the only rate we gave it.
However, diffusion should play no role in things already bound together. Applying this
rule at that bimolecular rate would invalidate our physical interpretation of the model.
Thus we should refine the rules by adding a unimolecular forward (i.e. binding)
rate that’s much higher than the bimolecular one:
'P1.P2' Prot1(P2), Prot2(P1) <-> Prot1(P2!1), Prot2(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P1.P3' Prot1(P3), Prot3(P1) <-> Prot1(P3!1), Prot3(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P2.P3' Prot2(P3), Prot3(P2) <-> Prot2(P3!1), Prot3(P2!1) @ 1.0e-2 (1.0), 1.0e-2
Note
You can consider the unimolecular rate as being similar in spirit to the bimolecular rate, but representing diffusion in a much smaller volume.
Notice that the RHSes of our rules have to be unimolecular: we have the
!1
bond right there. The simulator is smart enough to recognize
this and will use 1.0e-2
as the sole unbinding rate; there is no
point in giving a bimolecular reverse rate as these RHSes can not be
bimolecular. For this reason, it is rare for reversible binding rules to have more than
3 rates: a bimolecular binding, a unimolecular binding, and the
unbinding rate.
Let’s add another rule. Now we want to add the production of Prot1
.
Since for this model we don’t care about gene regulation,
transcription, mRNA regulation, translation, protein folding, maturation, or
transport, but just want to have a steady production of the protein,
we can write a simple zeroth-order rule. In this case, said rule could
be written as:
'creation of Prot1' -> Prot1() @ 1.0
Or more succinctly:
'P1/' -> Prot1() @ 1.0
This rule would add one copy of Prot1()
, fully unbound, and with sites
in their default state, at around 1 per simulated second. At time 10,
we would have around 10 more copies of Prot1
, at time 100, we would have around 100
more copies. So far, our script should look something like this:
############################################################
# Here are my agent signatures
############################################################
%agent: Prot1(P2, P3, S12~un~ph~xx)
%agent: Prot2(P1, P3, S12~un~ph~xx)
%agent: Prot3(P1, P2, S12~un~ph~xx)
############################################################
# Here are my rules
############################################################
'P1.P2' Prot1(P2), Prot2(P1) <-> Prot1(P2!1), Prot2(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P1.P3' Prot1(P3), Prot3(P1) <-> Prot1(P3!1), Prot3(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P2.P3' Prot2(P3), Prot3(P2) <-> Prot2(P3!1), Prot3(P2!1) @ 1.0e-2 (1.0), 1.0e-2
'P1/' -> Prot1() @ 1.0
Now that we have defined our agents and how the interact, we must define initial conditions.
Initial Conditions¶
The syntax for initial conditions is quite simple:
%init [number or variable] [kappa expression]
Let’s say we want to start the simulation with five hundred
copies of Prot2
and Prot3
. We could write this as:
%init: 500 Prot2(), Prot3()
This would start the simulation with the above amounts of each agent, with all sites unbound, and sites in their default state. If we wanted to initialize with complexes, we could just as fairly write:
%init: 500 Prot2(P3!1), Prot3(P2!1)
This would add 500 dimers to the simulation. Let’s keep these two declarations of initial conditions. Adding the text pod declaring the initial condition stage, our script so far would look like this:
############################################################
# Here are my agent signatures
############################################################
%agent: Prot1(P2, P3, S12~un~ph~xx)
%agent: Prot2(P1, P3, S12~un~ph~xx)
%agent: Prot3(P1, P2, S12~un~ph~xx)
############################################################
# Here are my rules
############################################################
'P1.P2' Prot1(P2), Prot2(P1) <-> Prot1(P2!1), Prot2(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P1.P3' Prot1(P3), Prot3(P1) <-> Prot1(P3!1), Prot3(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P2.P3' Prot2(P3), Prot3(P2) <-> Prot2(P3!1), Prot3(P2!1) @ 1.0e-2 (1.0), 1.0e-2
'P1/' -> Prot1() @ 1.0
############################################################
# Here are my initial conditions
############################################################
%init: 500 Prot2(), Prot3()
%init: 500 Prot2(P3!1), Prot3(P2!1)
It’s now time to declare the observables.
Observables¶
This is one of the most important parts of the script as this dictate the program’s plotting output. If we specify the rules and initial mixture perfectly, but forget to observe for something, then we will see nothing.
The syntax is quite simple, we begin with %obs:
, then
assign a name to that tracking event with 'name'
, and finally the code
of what exactly is the program tracking flanked by pipe symbols |
. For example:
%obs: 'Amount of Protein 1' |Prot1()|
Or more succinctly:
%obs: '[P1]' |Prot1()|
This would report the total amount of agent Prot1
under label '[P1]'
,
in whatever state it is, bound, unbound, modified, etc.
This means that on the output file, one of the column headers will be
'[P1]'
, and for that column, each row will be the time-point indexed
abundance of the label’s definition; i.e. how much Prot1()
was there
at those times. Let’s define three more observables, in this case the
dimers of the system.
%obs: '[P1.P2]' |Prot1(P2!1,P3), Prot2(P1!1,P3)|
%obs: '[P1.P3]' |Prot1(P2,P3!1), Prot3(P1!1,P2)|
%obs: '[P2.P3]' |Prot2(P1,P3!1), Prot3(P1,P2!1)|
From the contact map, we see this the system has the capacity to
generate a cycle. Let’s add another observable to check how many of
these trimer cycles there are. We would be observing for a Prot1
bound to a Prot2
that’s bound to Prot3
itself bound to the
initial Prot1
.
%obs: '[P1.P2.P3]' Prot1(P2!1,P3!3), Prot2(P1!1,P3!2), Prot3(P1!3,P2!2)
So far, our script should look something like this:
############################################################
# Here are my agent signatures
############################################################
%agent: Prot1(P2, P3, S12~un~ph~xx)
%agent: Prot2(P1, P3, S12~un~ph~xx)
%agent: Prot3(P1, P2, S12~un~ph~xx)
############################################################
# Here are my rules
############################################################
'P1.P2' Prot1(P2), Prot2(P1) <-> Prot1(P2!1), Prot2(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P1.P3' Prot1(P3), Prot3(P1) <-> Prot1(P3!1), Prot3(P1!1) @ 1.0e-2 (1.0), 1.0e-2
'P2.P3' Prot2(P3), Prot3(P2) <-> Prot2(P3!1), Prot3(P2!1) @ 1.0e-2 (1.0), 1.0e-2
'P1/' -> Prot1() @ 1.0
############################################################
# Here are my initial conditions
############################################################
%init: 500 Prot2(), Prot3()
%init: 500 Prot2(P3!1), Prot3(P2!1)
############################################################
# Here are my observables
############################################################
%obs: '[P1]' |Prot1()|
%obs: '[P1.P2]' |Prot1(P2!1,P3), Prot2(P1!1,P3)|
%obs: '[P1.P3]' |Prot1(P2,P3!1), Prot3(P1!1,P2)|
%obs: '[P2.P3]' |Prot2(P1,P3!1), Prot3(P1,P2!1)|
%obs: '[P1.P2.P3]' |Prot1(P2!1,P3!3), Prot2(P1!1,P3!2), Prot3(P1!3,P2!2)|
Execution¶
Now let’s execute the simulation! If you’re using the proto-IDE specify a simulated time of 5000 seconds and 150 points to plot. If you’re running the command-line executable, save your file (e.g. “MyFile.ka”) and invoke KaSim with input-file “MyFile.ka”, to simulate 5000 seconds, and output 150 plot points to a file called “MyOutput.out”, i.e.:
$KaSim -i MyFile.ka -t 51 -p 51 -o MyOutput.svg
This should generate a plot like this:
Notice that, as expected, the amount of P1 steadily increases. Notice
also that the amount of trimer increases up to a point, and then
decreases. In early times, it makes sense the amount of Prot1
was
limiting the assembly of the trimer: there was not enough to go
around. However, what is happening at late times, when Prot1
is in excess?
Notice the amount of the dimers that contain Prot1
, i.e. P1.P2
and P1.P3
, steadily
increase. Thus, although both Prot2
and Prot3
are still binding
independently Prot1
, the likelihood that they bind the same Prot1
decreases as Prot1
accumulates. This inhibitory phenomenon is called a
prozone, and is very well known in immunology as the Hook effect. It
is a product of the concurrency between the binding for Prot1
of Prot2
vs.
Prot3
.
Let’s keep playing! Now let’s think of what would happen if we set the unimolecular binding rates to zero. That is, we disallow entities that are already bound, from further binding. If we set the rates to zero, and hit run with the same plotting parameters, we would get something like this:
The amount of trimer cycle is now zero, as we expected. Things that are bound, can not bind further.
However, the
system is not dominated by the dimers we defined. There are a thousand
copies of Prot2
and Prot3
, but the amount of dimers does not add up to
such a value. What is happening? We can take a look at the reaction mixture by
using snapshots.
Perturbations and Modifications¶
Let’s start by checking the state of the reaction mixture, in what is
called a snapshot
. We can tell the simulator to produce a snapshot with:
%mod: [trigger condition] do $SNAPSHOT [snapshot's name]
This will ouput a snapshot when the trigger conditions are met as a file
whose name we specified. Let’s define our snapshot to be triggered after the
simulation reaches second 4500 and dump that to a snapshot called T4500
:
%mod: [T]>50 do $SNAPSHOT "T51"
Go ahead and add that line to the script, and re-run the simulation with the same time parameters. In the IDE, such a snapshot would look like this:
If I open that in a separate program and manually blow it up, we can see more clearly what is happening in the mixture:
As we can see, the system has produced polymers! Instead of having
dimers, we have much bigger oligomers, like the heptamer at top. How did this happen? Well, when
we made the rules, we only mentioned some sites. For example, the
binding of Prot1
and Prot2
only mentions their respective P2
and P1
sites; it says nothing about their respective P3
sites. Thus, this binding event is independent of whatever is the state of those P3
sites. For example, if there are two dimers, say:
P1(P2,P3!1), P3(P1!1,P2)
P2(P1,P3!1), P3(P1,P2!1)
Can we apply rule P1.P2
to those agents? Yes, we can! Those P1
and P2
can bind through their respective P2
and P1
sites to generate a tetramer:
P3(P1!1,P2), P1(P2!2,P3!1), P2(P1!2,P3!3), P3(P1,P2!3)
By a similar process, any n mer can recruit an *m*mer if it has the right agent capping it. This leads to open-ended polymerization.
This illustrates a consequence of Kappa’s don’t care, don’t write philosophy. If the mechanism we are trying to express states only that those bonds depend on those sites, the system does indeed have the capacity to oligomerize, even if the modeler did not write that in.
If we wanted a system with geometric constrains, that means the sites would be constrained to each other’s bond-state. To make a 3 agent system where the biggest entity is the trimer, one would have to write the 3 possible collision events of the respective obligate monomers, in addition to the 3 collision events of the monomers with the compatible dimers. In effect, one ends up writting molecular species (i.e. where every site is declared) instead of patterns (i.e. where some things are omitted for independence), to include the geometric constrains.
Causal analysis¶
ToDo
Local installation¶
ToDo
Glossary of Symbols¶
#
- start single line comment
%agent:
- command to define an agent
%obs:
- command to define an observable
%var:
- command to define a variable
%mod:
- command to define a modification or perturbation
%def:
- command to define something, like a file name or the graphical format of a snapshot
''
- single quote for internal naming, e.g. for rule names (
'
vs."
) ""
- double quote for external naming, e.g. for file names (
'
vs."
) @
- specify the reaction’s rate
@ X,Y
- rates for the reversible reaction,
X
: forward,Y
: reverse @ X(Y)
- rates for the rule with a molecularly ambiguous LHS, bimolecular(unimolecular)
Smith(foo)
- Specifies site
foo
on agentSmith
Y!x
- Where
x
is a number, it indicates the bond’s identity ending on siteY
Y!_
- Indicates site
Y
bound to anything (useful in observables) Y?
- Indicates it doesn’t matter if site
Y
is bound, to what, or not (notice the absence of!
) Y~foo
- Specifies site
Y
in statefoo