Cellular Potts - Extracellular Matrix coupled simulation¶
The Cellular Potts Model (CPM) models one or more cells sitting on or in a substrate. In an organism, that substrate is the Extracellular Matrix (ECM). In the CPM, this matrix isn’t represented explicitly, which limits the amount of detail with which cell-ECM interactions can be modelled.
The Tissue Simulation Toolkit solves this by adding a separate simulation of the ECM using coarse-grained molecular dynamics (MD). CPM cells can then form adhesions with which they can grab the ECM, thus applying a force to it, which is resisted by the ECM resulting in a counterforce. As the simulations run, these forces are exchanged as boundary conditions between the models.
Simulation design¶
The CPM-ECM coupled simulation has two main components: the CPM and the ECM. The CPM models the cells and their adhesions on a grid of pixels, with each pixel containing the number of the cell it is a part of, or 0 if there is no cell present.
The ECM is represented by a large number of simulated collagen fibrils, each represented by a small number of particles connected by bonds, which resist stretching, compression and bending loads. The fibrils are connected by crosslinkers to create a coherent material.
Cells in the CPM can form adhesions, which are represented by a particle as well. This particle is connected to a CPM cell occupying the pixel it is inside of, and can be connected to the beads in the ECM fibrils by a bond. These bonds are modelled as linear springs, which stretch or compress as the cell tries to move.
The models influence each other via the forces created by the bonds between the adhesion beads. If a cell in the CPM wants to move an adhesion particle, then it needs to stretch the bond(s) it is attached to, and the energy this takes is taken into account in the copy attempt. Likewise, if the ECM wants to move a fibril particle that is bonded to an adhesion, then it will have to stretch or compress the bond spring accordingly.
Thus, the models occupy adjacent domains and exchange boundary conditions in the form of forces between MD particles. The ECM “sees” the adhesion particles of the CPM, while the CPM “sees” any fibril particles that share a bond or angle constraint with the ECM. This collection of adhesion and associated fibril particles we call the boundary.
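To make the energy bookkeeping concrete, the sketch below shows the kind of calculation involved: the extra work needed to move an adhesion particle that is attached by linear springs to fibril particles which stay fixed during the move. This is only an illustration; the spring constant, rest length and function names are made up here, and the actual calculation lives in the CPM code described under Implementation.

import numpy as np

def spring_energy(x_a, x_f, k=1.0, r0=0.0):
    # Energy of one linear (Hookean) bond between an adhesion particle at x_a
    # and a fibril particle at x_f; k and r0 are illustrative values.
    r = np.linalg.norm(np.asarray(x_a) - np.asarray(x_f))
    return 0.5 * k * (r - r0) ** 2

def work_to_move_adhesion(old_pos, new_pos, bonded_fibril_positions, k=1.0, r0=0.0):
    # Extra work for moving an adhesion particle while the fibril particles it
    # is bonded to stay fixed, as they do during a Monte Carlo step.
    before = sum(spring_energy(old_pos, x_f, k, r0) for x_f in bonded_fibril_positions)
    after = sum(spring_energy(new_pos, x_f, k, r0) for x_f in bonded_fibril_positions)
    return after - before

# Example: the net work of moving an adhesion one pixel to the right,
# with two bonded fibril particles.
print(work_to_move_adhesion((1.0, 0.0), (2.0, 0.0), [(0.0, 0.0), (2.0, 1.0)]))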
Coupling¶
A coupled simulation run starts with creating an initial ECM, which is done by a separate component (see below). The ECM is then equilibrated using the main simulation code, after which the CPM and the ECM run in parallel. After initialising (which is done internally for the CPM), the CPM sends a set of interaction attempts to the ECM model. These can be the creation of new adhesions, or movement of existing ones [1]. At the same time, the ECM sends to the CPM the current location of all the fibril particles in the boundary, as well as the bonds and angle constraints involving them.
The CPM stores the locations of the boundary particles and associated constraints in a data structure called the AdhesionIndex, which holds them in a form that allows efficient access when calculating forces on the adhesion particles. The ECM meanwhile updates its state based on the requested interactions.
Now both models run, the CPM performing one Monte Carlo step (MCS) and the ECM running a number of timesteps to re-equilibrate the forces between the particles. During the MCS, the CPM considers the fibrils to be fixed in space; it only (potentially) moves the adhesion particles. Vice versa, the adhesion particles are fixed in the ECM model, which only moves the fibrils.
When the MCS and equilibration are done, the models loop back and communicate again as described above, and this process is repeated for however many timesteps are set in the configuration.
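Schematically, one iteration of the coupled loop looks like the sketch below. This is a plain-Python outline with made-up method names, intended only to show what is exchanged and when; the real exchange goes through MUSCLE3, as described under Implementation.

def coupled_step(cpm, ecm):
    # Exchange boundary conditions (method names are illustrative only).
    interactions = cpm.get_interaction_attempts()  # new or moved adhesions
    boundary = ecm.get_boundary_state()            # fibril particles, bonds, angles
    ecm.apply_interactions(interactions)
    cpm.set_ecm_boundary(boundary)                 # rebuilds the AdhesionIndex

    # Both models advance, each treating the other's particles as fixed.
    cpm.monte_carlo_step()   # fibrils fixed; adhesions may move
    ecm.equilibrate()        # adhesions fixed; fibrils move

def run_coupled(cpm, ecm, n_steps):
    for _ in range(n_steps):
        coupled_step(cpm, ecm)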
Implementation¶
MUSCLE3 coupling¶
The CPM-ECM coupled simulation is implemented using the MUSCLE3 model coupling
framework. MUSCLE3 coupled simulations consist of separate, independent programs
that exchange messages as they run. The simulation consists of three main programs, plus an optional fourth one that produces output. The main programs are make_ecm, which makes the initial extracellular matrix; simulate_ecm, which is used to equilibrate the initial ECM and also to simulate its behaviour during the main part of the run; and cellular_potts, which simulates the cells.
MUSCLE3 calls these programs simulation components, or components for short. Components communicate by sending messages to each other. Each component has a set of named ports, which it can send messages to or receive messages from. The MUSCLE3 configuration file specifies which components there are, and how their ports are connected. By changing the configuration file, the connections between the components can be changed as needed to do different kinds of things.
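For readers new to MUSCLE3, a component written against its Python API looks roughly like the sketch below. The port names here are generic placeholders, not the ones used by the TST components described next.

from libmuscle import Instance, Message
from ymmsl import Operator

def example_component() -> None:
    # A minimal component with one input and one output port.
    instance = Instance({
        Operator.F_INIT: ['state_in'],   # receive the initial state here
        Operator.O_F: ['state_out']})    # send the final result here

    while instance.reuse_instance():
        # F_INIT: initialise from the received message
        msg = instance.receive('state_in')
        state = msg.data

        # ...do the actual work on state here...

        # O_F: send the final result
        instance.send('state_out', Message(msg.timestamp, data=state))

if __name__ == '__main__':
    example_component()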
The current simulation is defined in src/models/adhesions.ymmsl.in. It describes four components, make_ecm, equilibrate_ecm, simulate_ecm and cellular_potts (equilibrate_ecm and simulate_ecm both use the simulate_ecm program described above), and then connects them together.
The make_ecm
component has a port named ecm_out
on which it sends the
ECM it creates. This port is associated with the so-called o_f
operator,
which specifies that this port is used to send the Final Output of the
component. This port gets connected (via a conduit, which is a tube that the
messages pass through from one port to another) to the ecm_in
port on the
equilibrate_ecm
component. This ecm_in
is an f_init
port, meaning
that equilibrate_ecm
can receive messages on it that it uses to initialise
its state at the beginning of the run.
The equilibrate_ecm component then runs a number of MD steps to equilibrate the forces, during which it doesn't communicate, and then it sends the final result on its ecm_out port (also o_f, of course). This port is connected to the ecm_in port of simulate_ecm, which initialises itself using the received equilibrated ECM data (and so that ecm_in port is an f_init port). simulate_ecm then goes and runs, and while it does so it communicates with cellular_potts.
The cellular_potts model has its own built-in initialisation, so it does not have any f_init ports. It does have three other ports though, which are for communicating inside of its main loop (the one that does Monte Carlo steps). The first one of these is called cell_ecm_interactions_out, and it is used to send, at each MCS, the desired interactions with the ECM. This is not the final output but an intermediate one, so it's an o_i type port. Then there is state_out, which is used once in a while to send out the entire state (see below). Finally there is an ecm_boundary_state_in port, on which cellular_potts receives the current state of the boundary between the ECM and the CPM (more on that below). This is of type s, because the information received is used during the subsequent state update.
As you can imagine, simulate_ecm has the opposite ports: an ecm_boundary_state_out to send the state of the boundary particles (of type o_i, this being an intermediate output), and a cell_ecm_interactions_in to receive the interactions (of type s). Like cellular_potts, simulate_ecm has a state_out port on which it periodically sends its whole state.
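In terms of the Python API shown earlier, the ports of simulate_ecm map onto operators roughly as follows. This is a condensed sketch, not the actual code in src/ecm/simulate_ecm.py; in particular, the operator of state_out is an assumption here.

from libmuscle import Instance
from ymmsl import Operator

def declare_simulate_ecm_ports() -> Instance:
    # Sketch of the simulate_ecm ports described above.
    return Instance({
        Operator.F_INIT: ['ecm_in'],                # equilibrated ECM to start from
        Operator.O_I: ['ecm_boundary_state_out',    # boundary particles, every step
                       'state_out'],                # full state, once in a while (operator assumed)
        Operator.S: ['cell_ecm_interactions_in']})  # requested interactions from the CPM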
As a result of this coupling scheme, the simulation runs by first running make_ecm, then equilibrate_ecm, and then cellular_potts and simulate_ecm simultaneously and in lock-step.
The yMMSL file consists of three more sections: settings, resources and implementations. The settings are essentially a dictionary containing named values that can be used by the models. If an entry here starts with the name of a component, followed by a period and the name of the setting, then that setting applies specifically to that component. Otherwise, it applies to all of them. The TST C++ code has its own Parameters system (see src/util/parameters.hpp), into which these settings are loaded for the cellular_potts model. As a result, you only need to set them once in the yMMSL file, and they'll show up everywhere automatically.
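On the Python side, a component reads these settings through MUSCLE3 directly; the setting names below are hypothetical and only illustrate the mechanism. A component always asks for the plain name; whether the value comes from a global entry or a component-prefixed one is resolved by MUSCLE3.

from libmuscle import Instance

def read_settings(instance: Instance) -> None:
    # Hypothetical setting names, for illustration only. In the yMMSL file,
    # 'md_timesteps: 1000' would apply to every component, while
    # 'simulate_ecm.md_timesteps: 1000' would apply to simulate_ecm only.
    md_timesteps = instance.get_setting('md_timesteps', 'int')
    temperature = instance.get_setting('temperature', 'float')
    print(md_timesteps, temperature)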
The resources
section specifies how many cores to use for each component. By
default, the components all run as single-threaded programs, but for larger
simulations the simulate_ecm
program can be run with MPI, in which case it
can use multiple cores. See Going faster below.
Finally, the implementations
section specifies how to start the different
programs. For the Python programs, it activates the virtualenv at venv/
(which all the Python code gets installed into, see the documentation on the
build system) and runs a command (these are mapped to a specific Python function
in pyproject.toml
). For the C++ code, it needs to set the
LD_LIBRARY_PATH
to make MUSCLE3 available, and then it runs the executable
in bin/
. There are also some configurations for debugging and profiling,
which are explained below in Developing.
So this defines the overall structure. The rest of this section describes the components in more detail.
Cellular Potts Model¶
The CPM side of the simulation consists mostly of the CellularPotts class from TST, with an additional extension for computing the work required to move the adhesion particles. The source code for this can be found in the src/adhesions/ directory.
As described above, there are two things the CPM needs to do: 1) receive the locations of the particles in the boundary, and calculate the work required to move the adhesions, and 2) move the adhesions, and keep track of these moves so that they can be sent to the ECM.
Part 1) starts with receiving an ECMBoundaryState object with the particles and the bonds. This is passed to an AdhesionMover, which is the main class implementing the adhesions. It uses the received data to (re)build its AdhesionIndex, which stores that data in an efficient-to-access form and uses it to calculate the work required to move an adhesion particle.
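The AdhesionIndex itself is a C++ class, but the idea behind it can be sketched in a few lines of Python: index the received bonds by the pixel that contains the adhesion particle they attach to, so that a copy attempt at a given pixel immediately finds the springs it needs to evaluate. The data layout below is illustrative only, not the real implementation.

from collections import defaultdict

def build_adhesion_index(adhesion_positions, bonds):
    # adhesion_positions: adhesion particle id -> (x, y) position
    # bonds: list of (adhesion_id, fibril_position) pairs from the boundary state
    # Both are simplified stand-ins for the ECMBoundaryState contents.
    index = defaultdict(list)
    for adhesion_id, fibril_pos in bonds:
        x, y = adhesion_positions[adhesion_id]
        index[(int(x), int(y))].append((adhesion_id, fibril_pos))
    return index

# A copy attempt involving pixel (3, 4) can now look at index[(3, 4)] and sum
# the energy changes of just those bonds, instead of scanning the whole boundary.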
The CPM simulation can then ask the AdhesionMover to calculate the additional work required to copy from a pixel that has an adhesion particle in it, and/or to copy onto a pixel that has an adhesion particle in it. This entails choosing whether and where to move the adhesion, the logic for which is in a number of functions in adhesion_movement.cpp, and the AdhesionIndex is used to calculate the resulting energy requirement.
Part 2) starts when an adhesion is moved. As this move needs to be taken into account for subsequent copy attempts, the AdhesionIndex needs to be updated, which is done by the AdhesionMover. The AdhesionIndex has an ECMInteractionTracker which keeps track of the changes that were made. At the end of the Monte Carlo step, CellularPotts asks the AdhesionMover for these changes, which it gets from its AdhesionIndex, which in turn gets them from the ECMInteractionTracker. The tracker is then reset for the next MCS.
The main program for the CPM side of the coupled simulation is in src/models/adhesions.cpp. This is essentially the chemotaxis.cpp model, but with additional code for communicating with the ECM.
with MUSCLE3, no .par
file is used. Instead, parameter values are obtained
from MUSCLE3 via its settings mechanism (see MUSCLE3 coupling) and loaded
into the TST parameters object via a utility function in
src/util/muscle3/settings.hpp
.
On timestep zero, the CPM sends a special interaction request which asks the ECM to randomly convert a number of fibril particles into adhesion particles. Conceptually, this is a bit funny, but it’s what was done in the original TST-MD code as a way of getting started. This and moving adhesions are currently the only kinds of interactions implemented, so a bit of work needs to be done to be able to dynamically add and remove adhesion particles. This is left as an exercise for the reader :).
Note that the code in src/models/adhesions.cpp is a bit messy, and not a nice example of what a MUSCLE3 simulation component should look like. This is currently the way that TST models are written, so it is what it is, at least for now. MUSCLE3 was specifically designed to be flexible enough to cater to this kind of situation, though, so it all works.
Extracellular matrix¶
The ECM simulation is implemented in Python using the hoomd particle simulator.
The implementation can be found in src/ecm/. The Simulation class implements the simulation of the ECM. It is initialised with an initial ECM via an object of type MDState, and it receives a set of parameters governing its operation. apply_interactions() can be used to apply a received interaction request represented by a CellECMInteractions object, after which the run() method runs a number of timesteps set by the parameters, and then get_boundary_state() is used to extract the state of the adhesion particles and the fibril particles attached to them as an object of class ECMBoundaryState, for sending to the CPM.
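Put together, one coupling iteration on the ECM side uses these methods roughly as follows; the import path and the way the interactions object is obtained are assumptions here, see src/ecm/simulation.py and src/ecm/simulate_ecm.py for the real code.

# Sketch of one coupling iteration with the Simulation class; the import path
# is an assumption.
from ecm.simulation import Simulation

def ecm_iteration(sim: Simulation, interactions):
    # interactions: a CellECMInteractions object received from the CPM
    sim.apply_interactions(interactions)
    sim.run()                          # number of MD timesteps set by the parameters
    return sim.get_boundary_state()    # ECMBoundaryState, to be sent to the CPM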
The state of the ECM, i.e. the particles, bonds and angle constraints and their types, is kept by hoomd, possibly on the GPU. Simulation uses an instance of class Boundary (also in simulation.py) to keep track of which particles are in the boundary, so that it can extract only those as needed. Since the boundary is only a tiny fraction of the whole state, this saves a lot of time.
The easiest way of interacting with hoomd
is to use a so-called snapshot,
which is a copy of the simulation state. When running on multiple cores with
MPI, the snapshot conveniently collects all the data on the MPI process with
rank 0, so that it’s easy to process. Unfortunately, this is also rather slow
and it was holding up the simulation, so we use a hoomd
local snapshot and a
bunch of custom MPI communication to extract the data. For the ECM size I
tested, this brought the duration of one ECM run down from 280 ms to 25 ms, so
it’s worth the extra complexity.
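For reference, the two access patterns look roughly like this in recent hoomd versions; the global snapshot gathers everything on MPI rank 0, while the local snapshot only exposes the particles owned by the calling rank. This is a sketch of the general pattern, not the custom communication code used in simulation.py.

import hoomd
import numpy as np

def positions_via_global_snapshot(sim: hoomd.Simulation) -> np.ndarray:
    # Simple but slow: gather the whole state on MPI rank 0.
    snap = sim.state.get_snapshot()
    if snap.communicator.rank == 0:
        return np.array(snap.particles.position)
    return np.empty((0, 3))

def positions_via_local_snapshot(sim: hoomd.Simulation) -> np.ndarray:
    # Faster: read only the particles owned by this rank, without gathering.
    with sim.state.cpu_local_snapshot as data:
        return np.array(data.particles.position)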
The main program is in src/ecm/simulate_ecm.py
. It sets up the simulation,
gets parameters from MUSCLE3, receives an initial ECM configuration, and then
runs the main loop, on each iteration communicating with TST. It also sends the
final state of the ECM out at the end of the simulation.
Besides the main part of the code in simulation.py and simulate_ecm.py, there are various helpers in src/ecm/ that require some explanation:
parameters.py
  Contains two data classes with the parameters used when generating and evolving the ECM. These are obtained from MUSCLE3 and loaded into an object of these classes by the from_settings() function in muscle3.py.
muscle3.py
  Contains helper functions for encoding and decoding the data we're exchanging, converting between dictionaries (which MUSCLE3 knows how to send) and the classes we use to represent the data in the rest of the code. A sketch of this pattern is shown after this list.
muscle3_mpi_wrapper.py
  A helper to compensate for the fact that MUSCLE3 doesn't support MPI in Python (yet). It defines an Instance class that wraps MUSCLE3's Instance class and does the things that need to be done when running in parallel with MPI (see also Going faster below). This class also works if MPI is not installed at all; if you're on a machine with a single GPU then MPI doesn't really add anything, so it's nice for it to be optional. When MUSCLE3 gets MPI support for Python, this class can be removed.
Making the initial ECM¶
Before we can start the simulation, an initial ECM needs to be generated. This is done by a separate program at src/ecm/make_ecm.py. It generates the ECM using code taken from TST-MD, and then sends it out to the rest of the simulation. make_ecm.py is just a very simple MUSCLE3 component; the actual generation code is in src/ecm/network/.
Producing output¶
No science is complete without some pretty pictures, so the simulation needs to produce some output. The CPM part of the code will plot its state to the screen when running as usual, but it cannot show the ECM in the plot because it doesn’t have that information. Vice versa, the ECM component has its own state, but doesn’t know where the cells are.
So, in order to create a plot showing both, this information needs to be brought
together. This is optionally done every N timesteps, with N set by the
state_output_interval
setting. At these points, both ECM and CPM will send
an extra message containing their entire state. These messages are routed to a
separate component that either writes them to disk
(src/cpm_ecm/state_dumper.py
) or plots them on the screen
(src/cpm_ecm/state_viewer.py
). A little program in
src/scripts/plot_states.py
can be used to create a series of PNG files from
the dumped states, which can then be encoded into a movie file.
The extra simulation component is not in src/models/adhesions.ymmsl.in, but in src/models/dump_state.ymmsl.in or src/models/plot_state.ymmsl.in. To enable either (or both), you can add them to the muscle_manager command line. See Running locally below for examples.
Note that the plotting here is done using the Matplotlib-based plotting code from TST-MD. This is very slow. TST-MD-V2 has some faster Qt-based code, and making a qt_state_viewer.py based on that is probably a very nice little project to get familiar with MUSCLE3 and the whole setup. Meanwhile, this works.
Building the coupled simulation¶
(This is for running locally, see Running on Snellius below if you’re on Snellius or another HPC machine.)
The coupled simulation, and the components needed for it, aren't built by default if you use make; they need to be built explicitly. This is supported on Linux and MacOS, and it may require installing some extra dependencies, in particular Python and possibly MPI and CUDA.
The ECM simulation uses HOOMD, which can use a GPU (if you have one) and/or MPI (if you want to run on multiple CPUs or multiple GPUs). This will really speed things up, so it’s recommended to make the effort. Depending on your configuration, here’s what you need. (Note that we’re counting physical CPU cores, not hyperthreads, and that if you have a fairly recent midrange laptop or desktop, then you probably have more than two physical cores.)
Note that the instructions below assume you’ve already followed the instructions
from the README at least up to make
, so that you have Qt and the other
dependencies installed.
No GPU, two CPU cores or less¶
Neither MPI nor GPU support will help, so we'll build without either. Use
Tissue-Simulation-Toolkit$ make with_adhesions
No GPU, more than two CPU cores¶
Build with MPI support to use multiple CPU cores for the ECM simulation.
You’ll need to install MPI first if you don’t have it already. On Ubuntu:
$ sudo apt install libopenmpi-dev
On MacOS you can use Homebrew or MacPorts:
$ brew install open-mpi
$ sudo port install openmpi
Then, you can build using
Tissue-Simulation-Toolkit$ ENABLE_MPI=1 make with_adhesions
One GPU¶
Build with GPU support, MPI won’t help and is not needed.
On a local machine, you’ll need to install CUDA first following the instructions on the nVidia website. AMD GPUs can work in theory as long as they support HIP, but in practice this may be tricky.
Then, you can build using
Tissue-Simulation-Toolkit$ HOOMD_BUILD_OPTIONS=-DENABLE_GPU=ON make with_adhesions
Multiple GPUs¶
Build with both MPI and GPU support to use all of them.
You’ll need to install MPI first if you don’t have it already. On Ubuntu:
$ sudo apt install libopenmpi-dev
On MacOS you can use Homebrew or MacPorts:
$ brew install open-mpi
$ sudo port install openmpi
On a local machine, you’ll need to install CUDA first following the instructions on the nVidia website. AMD GPUs can work in theory as long as they support HIP, but in practice this may be tricky and we haven’t tried it.
Once CUDA is there, you can build using
Tissue-Simulation-Toolkit$ ENABLE_MPI=1 HOOMD_BUILD_OPTIONS=-DENABLE_GPU=ON make with_adhesions
Known issues¶
Building the coupled simulation will take some time; hoomd in particular is very slow to build. Not much to do about it other than to go have lunch, or go do something else for a bit. Note that almost all of that time is spent building hoomd and the other dependencies, so if you change something in TST and then rebuild, only TST itself gets recompiled, which is nice and quick. So this is a one-time wait.
If you get the following error
c++: fatal error: Killed signal terminated program cc1plus
then the hoomd build ran out of memory. Hoomd seems to need a large amount of memory to build, and we run multiple compilers in parallel to speed things up. If together they try to use more memory than you have, this rather unhelpful error will appear. To solve it, you need to reduce the amount of parallelism by using fewer cores to build hoomd. The TST build system uses every core you have by default, but this can be changed by setting HOOMD_CORES:
Tissue-Simulation-Toolkit$ HOOMD_CORES=4 ENABLE_MPI=1 make with_adhesions
or whichever combination of options you selected above.
Running locally¶
Once the build is done, it’s time to run the simulation. The Python parts of the
simulation will have been installed in a virtual environment called venv
, in
the top TST directory. MUSCLE3 is also in there, and we’re going to start the
simulation via MUSCLE3, so we need to activate it first. Then, we can ask the
MUSCLE3 manager to run the simulation for us:
Tissue-Simulation-Toolkit$ . venv/bin/activate
Tissue-Simulation-Toolkit$ muscle_manager --start-all ymmsl/adhesions.ymmsl ymmsl/plot_state.ymmsl
You will see two windows pop up: one is the graphics output from the Cellular Potts Model, the other is the combined display described above under Producing output. It takes a few seconds for data to appear, because the ECM needs to be built first, and that has no visible output. Once the ECM is ready, it is sent to the ECM simulation component, and that and the CPM can start running.
Once the run is done, you’ll find a directory named
run_adhesions_<date>_<time>/
, which we call the rundir. This directory
contains a file documenting the configuration of the model as run, a log file
for the manager, performance information (see the MUSCLE3 documentation), and
a subdirectory instances/
in which you can find a directory with output for
each of the components. It’s a good idea to have a look through these files and
see what you can find where.
One note: you may see something like this at the end of the
muscle3_manager.log
file:
muscle_manager 2023-11-14 21:11:47,489 INFO libmuscle.manager.instance_manager: Instance make_ecm finished with exit code 0
muscle_manager 2023-11-14 21:11:47,489 INFO libmuscle.manager.instance_manager: Instance cellular_potts finished with exit code 0
muscle_manager 2023-11-14 21:11:47,489 INFO libmuscle.manager.instance_manager: Instance state_dumper finished with exit code 0
muscle_manager 2023-11-14 21:11:47,489 INFO libmuscle.manager.instance_manager: Instance equilibrate_ecm finished with exit code 0
muscle_manager 2023-11-14 21:11:47,489 INFO libmuscle.manager.instance_manager: Instance simulate_ecm finished with exit code 0
muscle_manager 2023-11-14 21:11:47,489 ERROR libmuscle.manager.instance_manager: At this point, one or more instances crashed because they lost their connection to another instance, but no other crashing instance was found that could have caused this.
muscle_manager 2023-11-14 21:11:47,489 ERROR libmuscle.manager.instance_manager: This means that either another instance quit before it was supposed to, but with exit code 0, or there was an actual network problem that caused the connection to drop.
muscle_manager 2023-11-14 21:11:47,489 ERROR libmuscle.manager.instance_manager: Here is the output of the instances that lost connection:
This looks like there was a problem, and there is a problem, but it’s in MUSCLE3, not in TST. What happened is that in the latest release, I redid the code that analyses the outcome of the simulation, so as to give a better error message. The new version now gives better explanations of all sorts of weird corner cases, but I somehow managed to mess up the case where everything goes well. So it prints the final three ERROR lines instead of saying that the simulation ran successfully. Doh!
There’s already a fix for this in the MUSCLE3 git repository, which will be released with the next version, so when that’s out we’ll upgrade and this will disappear.
Dumping state¶
If you want to make a movie (and who doesn't like movies?), then you need to save the state of the simulation regularly. There's a separate component that does this, which can be added to the simulation by adding ymmsl/dump_state.ymmsl to the command line. You can remove the plotting, or run both. Now, in your rundir there will be an instances/state_dumper/ directory, which has in its workdir/ a collection of data files. To plot the data in these files to PNG files, there's a helper script in src/scripts/plot_states.py which gets installed in the virtual environment. So you can run
Tissue-Simulation-Toolkit$ plot_states run_adhesions_<date>_<time>
and it will write one PNG file next to each .pickle
file. On Ubuntu, if you
install ffmpeg
then you can convert that to a video using
state_dumper/workdir$ ffmpeg -f image2 -framerate 20 -i state_%04d0.png -c:v libx264 -strict -2 -preset slow -pix_fmt yuv420p -f mp4 video.mp4
if you saved every 10th state.
Creating the PNG images is currently rather slow, because we’re using Matplotlib. Making a faster Qt-based version is left as an exercise for the reader (who can then also update this documentation :)).
Developing¶
Debugging¶
The adhesion model included with TST should run out of the box, but scientifically it leaves something to be desired. A lot, actually, as it’s intended to be a reasonably representative (from a pure performance standpoint) simulation for benchmarking purposes. To make it scientifically accurate, you’re going to want to do some programming.
Of course, when you're programming, your code is broken most of the time (it's just the natural state of things), and you find yourself trying to fix it. It can be useful to run the model in a debugger, so that you can step through it.
If you browse to the bottom of the src/models/adhesions.ymmsl.in
file,
you’ll find that there’s a tst_adhesions_debug
implementation defined.
You’ll probably want to modify it a bit to run your favourite terminal
application, e.g. gnome-terminal
. Next, scroll up to components:
and
change the implementation for cellular_potts
to tst_adhesions_debug
to
use it.
Once you’ve done so, you can do make ymmsl/adhesions.ymmsl
in the top
directory to rebuild that file, and run again. Now when you start the model, a
terminal window will pop up with gdb
active, which will work normally. You
can type run
to start, and if it crashes bt
will produce a backtrace.
If your code throws a C++ exception, then it will crash at the place where the
exception is handled, rather than where it was thrown. You probably don’t want
that, because you won’t be able to see what the problem was. If you type catch
throw <exception_type>
before doing run
, then GDB will stop when the
exception is thrown, and you can do a backtrace there. We refer you to the GDB
manual for more, and to the MUSCLE3 documentation.
Profiling¶
MUSCLE3 has built-in profiling functionality. If you’ve had a look around a
run directory, you’ve probably seen a performance.sqlite
file. If you
activate the venv
virtualenv in the top directory, then you’ll have the
muscle3 profile
command available which will show you some statistics. See
the MUSCLE3 documentation for how to use it.
Going faster¶
ECM with MPI support¶
Simulating the extracellular matrix is quite expensive, and if your performance data show that it is the most expensive part of the simulation, then making it faster will get you your results sooner. One way to do this is to run on multiple CPU cores, or even on multiple GPUs.
Hoomd can do that, but only if it’s built with MPI support (see above), and if
we tell MUSCLE3 to start it with MPI. At the bottom of
src/models/adhesions.ymmsl.in
you’ll find a simulate_ecm_mpi
implementation. It’s just like simulate_ecm
, except that we tell MUSCLE3 to
start the submodel using OpenMPI. To use it, find simulate_ecm
at the top
under model/components
and set its implementation
to
simulate_ecm_mpi
.
Next, we need to specify the number of MPI processes we want. This is done in
the resources
section, where without MPI we specify threads: 1
to start
the simulation as an ordinary single-threaded program, but where we will now set
mpi_processes: <N>
. If you have M CPU cores in your machine, then you will
want to set N to at most M-1 cores, so that the Cellular Potts model has a core
to run on. You may also leave a core for the plotter or state dumper, and use
the remaining M-2 cores for the ECM simulation. If you are running on GPU
(see below), then you need to set N equal to the number of GPUs you are using,
as hoomd is designed to use one MPI process per GPU.
If you’re getting a message
RuntimeError: Error registering instance: An instance with name simulate_ecm was already registered. Did you start a non-MPI component using mpirun?
then you probably somehow ended up with a virtual environment without mpi4py
installed. You can use
$ make mpi4py
in the Tissue-Simulation-Toolkit
directory to resolve this.
ECM with GPU support¶
Molecular Dynamics simulations involve many particles, for each of which the
same calculation needs to be done. GPUs are very good at this, so for large
systems using one is a good idea. To run on a GPU, first hoomd needs to be
compiled with GPU support (see above), and then the md_use_gpu
setting needs
to be set to true
in the ymmsl file. And of course you need to have a GPU
:).
If you have multiple GPUs, as on some HPC machines, then you need to combine the MPI instructions with the GPU instructions, setting the number of MPI processes to the number of GPUs. Hoomd will then automatically distribute the work over the available GPUs.
Running on Snellius¶
If you have a large simulation (many pixels, many particles) then you may need more hardware to run it quickly. The national supercomputer Snellius has this hardware, in particular it has GPU nodes with four fast GPUs and 72 CPU cores. If your laptop or desktop isn’t quite keeping up, then it’s time to move to Snellius. Here’s how to do that.
First, you need to get a Snellius account (talk to Roeland). Once you have one, you can use ssh to connect to it (see the Snellius documentation), which will give you a Linux command line on a Snellius head node. This is where we will build TST.
Getting TST onto Snellius¶
First though, we need to get it onto the machine. If TST were open source, then
we could just git clone
it, but since it isn’t, things get a bit more
complicated. There are basically two options: 1) clone locally and then upload
the files to Snellius using scp
, or 2) set up SSH agent forwarding and git
clone
on Snellius itself. Option 1) doesn’t require any set-up, but is more
cumbersome every time you do it, while option 2) requires some set-up but is
then probably easier in use. Option 2) also has the advantage that you’ll be
able to commit and push changes you make on Snellius back to Github directly, as
opposed to having to download them again and push.
To copy files to Snellius, you can use scp
. For example, to copy the
Tissue-Simulation-Toolkit/
directory to the Snellius scratch file system,
you can type this:
$ scp -r Tissue-Simulation-Toolkit snellius.surf.nl:/home/<username>/
If you have any run directories from local runs in your
Tissue-Simulation-Toolkit
directory, then you’ll probably want to avoid
copying those. I’ve used tar
before to create a .tar.bz2
with only the
things I wanted (see its --exclude
option) and copied that, then extracted
it on the Snellius headnode. For example:
$ tar -c -j -f TST2.tar.bz2 --exclude 'Tissue-Simulation-Toolkit/run_*' --exclude 'Tissue-Simulation-Toolkit/venv' --exclude 'Tissue-Simulation-Toolkit/.tox' Tissue-Simulation-Toolkit
$ scp TST2.tar.bz2 snellius.surf.nl:/home/<username>/
$ ssh snellius.surf.nl
# on Snellius
$ tar xf TST2.tar.bz2
If you’re packing up a clean tree, which is definitely recommended, then you
don’t need the --exclude
options. It will work without them even if you
don’t have a clean tree, but those folders contain a lot of files and a lot of
bytes, so your upload may take a while…
So that’s a bit of a hassle, and if you make any local changes to the code
you’ll have to upload them again in the same way. Or if you make any changes on
Snellius then you’ll have to download them. Git is designed for this of course,
and you can use it if you go with option 2), which is explained in the Github
documentation on SSH agent forwarding.
Once you have that set up, you can git clone
and git pull
and git
push
on Snellius as usual. If you set up a work branch, then it becomes really
easy to commit a change locally, push it, then pull it on Snellius, or vice
versa, and keep everything nicely synchronised.
Compiling TST on Snellius¶
Since you’re on Snellius, you probably want to go fast, so we need MPI and GPU
support. For this we need OpenMPI and CUDA as well as the other dependencies,
and on Snellius (as on most HPC machines) these are already installed and made
available via the module
command. The module
command changes your
shell’s working environment in such a way that the program you ask for can be
found. If you log out, your shell quits and the settings disappear, so you’ll
have to run this every time you log in.
$ module load 2022
$ module load Qt5/5.15.5-GCCcore-11.3.0 CMake/3.23.1-GCCcore-11.3.0 OpenMPI/4.1.4-GCC-11.3.0 Python/3.10.4-GCCcore-11.3.0 CUDA/11.8.0
With that done, and assuming that you have got the source code onto Snellius, we can compile everything:
$ cd Tissue-Simulation-Toolkit
Tissue-Simulation-Toolkit$ make clean clean_hoomd
Tissue-Simulation-Toolkit$ HOOMD_BUILD_OPTIONS='-DENABLE_GPU=ON -DENABLE_MPI=ON' HOOMD_CORES=4 make -j 16 with_adhesions
Compiling will take quite a while (more than an hour), so you'll want to go do something else in the mean time. Do keep the connection open though! Otherwise your shell will be terminated and compilation will stop.
Once this is done the model is ready to run, except that starting programs works
a bit differently on an HPC machine, and we need to tell MUSCLE3 about this in
the src/models/adhesions.ymmsl.in
file. You can edit this file using
nano
(or vim
, if you are familiar with it), changing the implementations
for equilibrate_ecm
and simulate_ecm
to simulate_ecm_snellius
, and
the implementation for cellular_potts
to tst_adhesions_snellius
.
If you look at the bottom of the file, then you’ll see that these
implementations differ from the local ones in that they load the modules
required to run the corresponding model programs, and in that the
simulate_ecm_snellius
one uses MPI. Because of that, you need to go to
resources
and set the resources for simulate_ecm
and equilibrate_ecm
to mpi_processes: <N>
, where <N> is the number of GPUs you want to use.
Finally, we want to actually use the GPU, so set use_gpu: true
under
settings
.
In nano, you can use Ctrl-O to save your changes and Ctrl-X to quit, after which
you can use make ymmsl/adhesions.ymmsl
to rebuild that file.
Running TST on Snellius¶
Running programs on a supercomputer works a bit differently than it does locally. So far, you have logged in to a head node, which is a helper computer that is used for accessing the cluster and compiling software. The head node is not for running calculations however, and should never be used for that.
To run simulations, you need to submit a job to the scheduler. The scheduler on Snellius is called Slurm, and its job is to keep a queue of jobs that users want to run, and run them on the available worker nodes. These worker nodes are often busy (there are many people using Snellius at any one time), so your job will typically sit in the queue for a bit until there’s a worker node available, after which it will be started on the worker node and run. Any output must be written to disk, so that you can pick it up when the job is done.
Every job you run consumes a certain number of System Billing Units, or SBUs. Your account is associated with a budget, from which these SBUs are deducted. The accinfo command will, after a few seconds, print an overview of the budget you're using and how many SBUs are left. A full GPU node costs 512 SBUs per hour; a 1/4 node costs 1/4 of that, or 128 SBUs per hour.
Submitting a job is done using the sbatch
command, which expects to be given
a shell script with the commands to run. There’s one for running CPM-ECM on
Snellius in src/scripts/snellius_cpm_ecm.sh
. This file looks like this:
#!/bin/bash
#SBATCH -J cpm_ecm
#SBATCH --time=00:10:00
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=18
#SBATCH --cpus-per-task=1
#SBATCH --gpus=1
module load 2022
module load Python/3.10.4-GCCcore-11.3.0
cd ${HOME}/Tissue-Simulation-Toolkit
source venv/bin/activate
export QT_QPA_PLATFORM=offscreen
muscle_manager --start-all ymmsl/adhesions.ymmsl ymmsl/dump_state.ymmsl
Many of the commands here will look familiar if you’ve run locally and have built TST. There’s a new section at the top however which tells Slurm how many hardware resources we want. Here’s what these options mean:
- -J cpm_ecm
  Says that this job is called cpm_ecm. It will be listed as such in the queue.
- --time=00:10:00
  Specifies for how long we will calculate. The simulation will be shut down by the system after this time, even if it isn't done yet. The value here is 10 minutes. If your settings result in a simulation that takes longer to run, then you'll have to ask for more time to make sure that it has a chance to finish. Note that it's fine to ask for more time than you end up needing (you only pay for what you use), but the longer a period you ask for, the longer it may take for Slurm to find you a free worker node.
- --partition=gpu
  Snellius has different kinds of worker nodes, some with GPUs, some with extra memory, and so on. They are grouped by type into partitions, and this says that we want a node with GPUs.
- --nodes=1
  We're asking for one node here. One worker node in the GPU partition has 72 CPU cores and four GPUs, which is a huge amount of compute power. It's also the maximum we can run on at the moment, because of a limitation in MUSCLE3 (it doesn't know what a GPU is, and when running on multiple nodes it wouldn't distribute the hoomd MPI processes over the nodes correctly). This is not much of an issue in practice, because with four GPUs the CPM is already the limiting factor, so adding more wouldn't speed things up.
- --ntasks=18
  This tells Slurm that we want to have 18 CPU cores, or 1/4 of a node. Since a single GPU node has such a large amount of compute capability, they are actually split into four quarter nodes. For large-but-not-huge simulations, using more than one GPU doesn't help (it even slows things down, because there's more communication overhead) and it doesn't make sense to pay for a whole node. This option will ask for a quarter node (18 out of 72 CPU cores). Note that the ECM model uses as many CPU cores as there are GPUs, the CPM uses a single core, and so does the state dumper, so most of these cores will actually sit idle during the simulation. We could ask for fewer, but accounting is done by the quarter node so we're paying for them anyway.
- --cpus-per-task=1
  Actually, what we're doing here is telling Slurm that we're going to start 18 programs (ntasks=18) with one thread each (this option). That will get us 18 CPU cores, which MUSCLE3 will then use to start our actual simulation. Slurm doesn't understand coupled simulations, but it will just give us our CPUs and MUSCLE3 knows what to do with them, so it's all good.
- --gpus=1
  This asks for a single GPU, which is one quarter of the total available.
Depending on your settings, you will probably have to change the time requested, and if you need to go faster then you can go to 36 tasks and 2 GPUs, or even the full 72 and 4 (be sure to set the mpi_processes for simulate_ecm to the number of GPUs too!). It's good to experiment a little here with some short runs, to see how fast you're going and whether the additional resources help, or help enough to be worth it (paying four times as many SBUs per hour for a 10% increase in speed doesn't make much sense, for example, also because it costs more energy and we only have one planet), before launching a run with the full number of timesteps.
If you have put the TST source code somewhere else than in
Tissue-Simulation-Toolkit
in your home directory, then you’ll have to modify
the cd
command to point to the right place.
The line export QT_QPA_PLATFORM=offscreen
tells Qt to not create a window,
and to draw nothing. Supercomputers don’t have monitors, so it would crash if it
tried.
If you’re producing large amounts of output, then it may be better to put the
output on the scratch filesystem. The easiest way to do that is to put another
cd
command just before the muscle_manager
one to change to the directory
where you want your output. See the Snellius documentation, in particular the
page about file systems.
With the script in order, we can now submit a job using
Tissue-Simulation-Toolkit$ sbatch src/scripts/snellius_cpm_ecm.sh
This will print a message saying that the job has been submitted, with its job id. You can see it in the queue using
$ squeue
and get more information with
$ scontrol show job <job id>
The job will only show up in squeue while it is queued and waiting to run and while it is running. Once it's done, it will disappear from the list.
When the job is started, a slurm-<job id>.out file is created in the directory in which you ran the sbatch command; it contains the output that would otherwise have been printed to the screen. You can use less slurm-<job id>.out to view its contents while the job is running and after it's done.
Debugging on Snellius¶
If something goes awry, then the first thing to do is to check the log output.
The slurm-<job id>.out
file will contain the things that are normally
printed to the screen. If the simulation crashed, then there will be an error
message here that will point you to the log files in the run directory. Those
should contain more information, which will hopefully help you figure out what
went wrong in the same way as when running locally.
Note that because of the batch system, doing interactive debugging on an HPC machine isn’t so easy. It’s better to do that locally if the problem is with the code rather than the configuration, and then go back to the cluster with the fix in place.