OpenMM Tutorial: Molecular Dynamics of Na+/Cl- Association¶
by Joshua L. Adelman
Updated with WESTPA version 1.0 beta and OpenMM 5.1
Overview¶
Requirements: ~? hours wallclock time on ?; ~? GB disk space
In this tutorial we will use the standard weighted ensemble approach to simulate Na+/Cl- association in Generalized Born implicit solvent. The system consists of single Na+ and Cl- ions modeled with the Amber force field, using the distance between the two ions as the progress coordinate. OpenMM will be used to run the molecular dynamics, and familiarity with it is a prerequisite (see tutorials).
This tutorial uses the same starting files, generated using Amber, as the Introductory Amber Tutorial. Instead of using Amber (pmemd or sander) to run the dynamics, we will use the OpenMM framework along with its python wrapper to propagate the system.
While the Amber tutorial uses the executable propagator to call out to Amber and then a number of shell scripts to communicate the results back to WESTPA, OpenMM’s python wrapper allows us to integrate the molecular dynamics simulation directly into WESTPA. This tutorial will therefore require some understanding of python and the OpenMM framework. Additionally, the focus will be on integrating OpenMM into WESTPA and less on the analysis of the resulting simulations.
Since the Na+/Cl- system only contains two atoms, we will make some design decisions in setting up the simulation that may not be appropriate more generally. Most notably, we will avoid writing trajectory data and restart files for each trajectory segment to disk as separate files, as is generally done when using the executable propagator. Instead, we will store all of this data in the main hdf5 file that also contains the data related to the weighted ensemble run.
The first step is to set up a directory containing the necessary Amber and WESTPA files. A working example directory can be found at westpa/lib/examples/nacl_openmm.
Preparing the OpenMM files¶
Input File | Description |
---|---|
nacl.inpcrd | Amber coordinate file for the Na+/Cl- system |
nacl.prmtop | Amber topology file for the Na+/Cl- system |
build_system.py | script to build and serialize the OpenMM system and integrator |
We will begin with the coordinate and topology files (nacl.inpcrd and nacl.prmtop, respectively) produced by AmberTools. We will construct an OpenMM system and integrator and then serialize the resulting python objects using a short script:
python build_system.py
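A minimal sketch of what build_system.py might contain is shown below; the actual script ships with the example directory, and the implicit solvent model and integrator parameters here are assumptions based on this tutorial's setup.

# Sketch of build_system.py; solvent model and integrator settings are
# assumptions (see the example directory for the real script).
from simtk.openmm import LangevinIntegrator, XmlSerializer
from simtk.openmm import app
from simtk.unit import kelvin, picosecond, picoseconds

prmtop = app.AmberPrmtopFile('nacl.prmtop')

# Build the system with Generalized Born implicit solvent and no cutoff
system = prmtop.createSystem(implicitSolvent=app.OBC1,
                             nonbondedMethod=app.NoCutoff)

# Langevin integrator: 300 K, 2 fs time step
integrator = LangevinIntegrator(300*kelvin, 1.0/picosecond, 0.002*picoseconds)

# Serialize both objects so the propagator can deserialize them later
with open('system.xml', 'w') as fo:
    fo.write(XmlSerializer.serialize(system))
with open('integrator.xml', 'w') as fo:
    fo.write(XmlSerializer.serialize(integrator))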
Preparing the WESTPA files¶
Input File | Description |
---|---|
env.sh | set environment variables |
system.py | system implementation |
openmm_propagator.py | custom propagator to run the OpenMM calculation across multiple devices |
restart_plugin.py | plugin to allow restart information to be stored in the west.h5 file |
west.cfg | WESTPA configuration |
init.sh | initialize WESTPA |
system.py¶
This file contains information about the progress coordinate, binning, walkers per bin, and more. This file is nearly identical to the one defined in the Introductory Amber Tutorial. In this example we will be using the distance between the two ions as the progress coordinate, giving us a one dimensional coordinate:
self.pcoord_ndim = 1
The positions of the bins along this progress coordinate are the same as those used by Zwier, Kaus, and Chong:
binbounds = [0.0] + [2.8, 2.88, 3.0, 3.10, 3.29, 3.79, 3.94, 4.12, 4.39,
5.43] + [5.90+1.0*i for i in xrange(0,11)] + [30,float('inf')]
Since every walker must lie in a bin, the upper boundary of the last bin is set to infinity, i.e. [30, float('inf')]. The bin boundaries are left-inclusive, e.g. a walker with a value of 2.8 would end up in the second bin. The positions of your bins must be either monotonically increasing or decreasing; otherwise, you will get an error message indicating this requirement.
The number of walkers per bin is specified by the following:
bin.target_count = 48
Using a tau value of 0.5 ps, we will monitor the progress coordinate every 0.05 ps, writing coordinates 10 times. Including the initial configuration, this gives an expected progress coordinate length of 11:
self.pcoord_len = 11
Finally, we specify the format in which the coordinates are stored:
self.pcoord_dtype = numpy.float32
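Putting these settings together, the core of system.py looks roughly like the following sketch, assuming WESTPA 1.0's WESTSystem and RectilinearBinMapper interfaces; the attribute spellings in the shipped file may differ slightly.

# Rough sketch of system.py; see the example directory for the real file.
import numpy

from west import WESTSystem
from westpa.binning import RectilinearBinMapper

class System(WESTSystem):
    def initialize(self):
        # One-dimensional progress coordinate: the Na+/Cl- distance
        self.pcoord_ndim = 1
        # 10 pcoord samples per tau plus the initial configuration
        self.pcoord_len = 11
        self.pcoord_dtype = numpy.float32

        binbounds = [0.0] + [2.8, 2.88, 3.0, 3.10, 3.29, 3.79, 3.94, 4.12,
                             4.39, 5.43] + \
                    [5.90+1.0*i for i in xrange(0,11)] + [30, float('inf')]
        self.bin_mapper = RectilinearBinMapper([binbounds])

        # 48 walkers per bin
        self.bin_target_counts = numpy.empty((self.bin_mapper.nbins,),
                                             numpy.int_)
        self.bin_target_counts[...] = 48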
openmm_propagator.py¶
The OpenMMPropagator subclasses the WESTPropagator interface and implements all of the necessary methods to run a WESTPA simulation using OpenMM. The implementation presented in this example, while fairly generic, is still specific enough to the Na+/Cl- association example that changes will likely be necessary to adapt it to another system. Below is a brief description of each method in the class:
__init__ method¶
The __init__ method is primarily responsible for parsing the configuration parameters from west.cfg and building the OpenMM system, integrator, and platform objects. Since each OpenMM context must be tied to a unique integrator, the propagate method actually deserializes the integrator anew for each propagation step. In __init__, the deserialized integrator is primarily being used to retrieve the temperature of the system.
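The temperature can be recovered from the serialized integrator roughly as follows (a sketch; the propagator wraps this in its own configuration handling):

# Sketch: deserialize integrator.xml and read back the temperature
from simtk.openmm import XmlSerializer

with open('integrator.xml') as f:
    integrator = XmlSerializer.deserialize(f.read())

# LangevinIntegrator exposes the temperature it was constructed with
temperature = integrator.getTemperature()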
static methods¶
The OpenMMPropagator contains three methods that are tagged with Python's @staticmethod decorator. This designation allows the methods to be encapsulated within the class, although they do not have direct access to the class's internal data. The dist method calculates a simple Euclidean distance between two points and is used in calculating the pcoord of a conformation of the system. The makepath method assembles a path on the filesystem from a template and is used to tell the propagator where to grab initial state information from. The mkdir_p method augments the standard library's os module to allow unix mkdir -p-like behavior.
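Standalone equivalents of the two simpler helpers might look like the following sketch (in the propagator they are @staticmethod members of OpenMMPropagator):

import errno
import os

import numpy

def dist(x, y):
    # Euclidean distance between two points
    return numpy.sqrt(numpy.sum((x - y)**2))

def mkdir_p(path):
    # Behave like `mkdir -p`: create intermediate directories and
    # ignore the error raised when the directory already exists
    try:
        os.makedirs(path)
    except OSError as e:
        if not (e.errno == errno.EEXIST and os.path.isdir(path)):
            raise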
get_pcoord method¶
This method assigns a pcoord value to a given state. The state can either be a BasisState, in which case we use the basis state's coordinates, which are stored as a class variable, to calculate the pcoord using dist, or an InitialState (i.e. the result of perturbing the x-position of one of the ions by a random amount), in which case we construct the path to the file containing its coordinates and calculate the pcoord after reading the file from disk.
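In outline, the logic looks roughly like this (the attribute names and the import path below are assumptions, not the example's exact code):

# Sketch of get_pcoord; attribute and template names are assumptions
import numpy
from west.states import BasisState  # import path assumed for WESTPA 1.0

def get_pcoord(self, state):
    if isinstance(state, BasisState):
        # Basis state coordinates are cached on the class
        coords = self.basis_coordinates
    else:
        # InitialState: read the perturbed coordinates back from disk
        istate_path = self.makepath(self.istate_template, state)
        coords = numpy.loadtxt(istate_path)

    # The pcoord is the Na+/Cl- separation distance
    state.pcoord = self.dist(coords[0], coords[1])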
propagate method¶
The propagate method takes a set of segments and runs each for a length of time tau. Initially, the method attempts to assign the calculation to a device based on the WM_PROCESS_INDEX environment variable if it is available (both the zmq and processes work managers set it, but the other work managers do not). A context is then constructed before the method iterates over all segments.
For each segment, an initial set of coordinates and velocities is obtained either from the parent segment, if the segment is a continuation of previous dynamics, or from an initial state, if the segment is being initiated at the start of the WE calculation or is the result of a recycling event. Dynamics are then run using the OpenMM integrator. At a user-specified interval, the calculation is halted and the coordinates and velocities, along with the calculated pcoord, are saved to temporary arrays. Finally, this data is transferred to the segment's internal data structures.
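A condensed sketch of this flow, with assumed function and variable names, might look like:

# Sketch of the propagate() flow described above; names are assumptions
import os
from simtk.openmm import Context, Platform

def propagate_block(system, integrator, start_points,
                    steps_per_tau, steps_per_write):
    # Pick a device from the worker index set by the zmq/processes work
    # managers; serial runs fall back to device 0
    device = os.environ.get('WM_PROCESS_INDEX', '0')
    platform = Platform.getPlatformByName('CUDA')
    context = Context(system, integrator, platform,
                      {'CudaDeviceIndex': device})

    # start_points: one (coords, velocs) pair per segment, taken from the
    # parent segment or an initial state
    for coords, velocs in start_points:
        context.setPositions(coords)
        context.setVelocities(velocs)
        for _ in xrange(steps_per_tau // steps_per_write):
            integrator.step(steps_per_write)
            state = context.getState(getPositions=True, getVelocities=True)
            # ... compute the pcoord from state.getPositions() and stash
            # coordinates/velocities in temporary arrays here ...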
gen_istate method¶
This method takes a basis state and generates an initial state by randomly
perturbing the basis state and storing the results to disk using the naming
convention specified by the template given in the west.cfg
file.
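A sketch of the idea, with an assumed perturbation magnitude and attribute names:

# Sketch of gen_istate; magnitude and attribute names are assumptions
import numpy

def gen_istate(self, basis_state, initial_state):
    # Start from the basis state coordinates cached on the class
    coords = self.basis_coordinates.copy()

    # Randomly perturb the x-position of one ion
    coords[1, 0] += numpy.random.uniform(-2.0, 2.0)

    # Store the result using the istate template from west.cfg
    istate_path = self.makepath(self.istate_template, initial_state)
    numpy.savetxt(istate_path, coords)

    # Assign the new initial state's progress coordinate
    initial_state.pcoord = self.dist(coords[0], coords[1])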
restart_plugin.py¶
In order to restart a segment from its parent, we need access to the last set of coordinates and velocities recorded for the parent in the coord and veloc data sets. We use a custom plugin, run just before the propagation step, that temporarily loads the necessary coordinates and velocities into a segment's data dictionary as segment.data['restart_coord'] and segment.data['restart_veloc']. The propagator will then delete this data once it has been transferred to the OpenMM context.
This allows us to run the entire simulation from the main hdf5 file without writing any per-segment data to individual files. While convenient for a simple system like the one in this example, it may not be as desirable for systems with a large number of particles. In that case the propagator will need to be modified to load the restart data from individual files contained in the traj_segs directory on the file system, as is the case for the examples that use the executable propagator.
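The rough shape of such a plugin, assuming WESTPA's plugin callback interface and that the parent coordinates and velocities live in the per-iteration auxdata groups, is:

# Rough shape of restart_plugin.py; sim manager attribute names and the
# dataset layout are assumptions based on the description above.
class RestartDriver(object):
    def __init__(self, sim_manager, plugin_config):
        self.sim_manager = sim_manager
        self.data_manager = sim_manager.data_manager
        self.priority = plugin_config.get('priority', 0)

        # Run just before segments are handed to the propagator
        sim_manager.register_callback(sim_manager.pre_propagation,
                                      self.pre_propagation, self.priority)

    def pre_propagation(self):
        n_iter = self.sim_manager.n_iter
        iter_group = self.data_manager.get_iter_group(n_iter - 1)

        for segment in self.sim_manager.incomplete_segments.values():
            if segment.parent_id < 0:
                continue  # newly initiated segments start from an istate

            # Last stored frame of the parent's trajectory
            segment.data['restart_coord'] = \
                iter_group['auxdata/coord'][segment.parent_id, -1]
            segment.data['restart_veloc'] = \
                iter_group['auxdata/veloc'][segment.parent_id, -1]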
west.cfg¶
The actual WESTPA simulation is configured using the yaml-formatted west.cfg file. The custom propagator will extract a number of parameters from the openmm section shown below:
---
west:
  ...
  openmm:
    system:
      file: system.xml
    integrator:
      file: integrator.xml
      steps_per_tau: 250
      steps_per_write: 25
    platform:
      name: CUDA
      #properties: {'OpenCLPlatformIndex': '1', 'OpenCLDeviceIndex': '0'} # Platform specific properties
The xml files are the output of running the build_system.py script. Within the integrator section, steps_per_tau and steps_per_write specify the number of time steps that the integrator should advance the system per tau (so 250 x 2 fs = 0.5 ps) and the frequency, in number of steps, at which the pcoord and auxiliary data should be collected, respectively.
The platform section defines a platform name, which can be Reference, CUDA, or OpenCL, assuming the latter two are installed on your system. The CUDA platform requires a compatible GPU, while the OpenCL platform, in addition to running on GPUs, supports both the Intel and AMD CPU OpenCL SDKs.
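To check which platforms your local OpenMM installation actually provides, you can query the Platform class directly (a quick diagnostic, not part of the example files):

# List the platforms compiled into the local OpenMM installation
from simtk.openmm import Platform

for i in range(Platform.getNumPlatforms()):
    print(Platform.getPlatform(i).getName())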
Finally, the properties variable under the platform section defines a dictionary whose members override the defaults specified in the propagator's __init__ method. See the defaults there for all possible platform-specific settings. Importantly, the XXXDeviceIndex settings are ignored when running in parallel using either the zeromq or processes work managers, since they set that variable dynamically for each worker. However, when running in serial mode on a multi-device system, it can be useful to select a specific device to run the calculation on. When using the OpenCL platform, the oclutils library is useful for extracting information about the available devices and platforms (in the OpenCL meaning of platform, rather than the OpenMM one).
There are also some important settings under the propagation
section:
---
west:
  ...
  propagation:
    max_total_iterations: 2
    max_run_wallclock: 2:00:00
    propagator: openmm_propagator.OpenMMPropagator
    gen_istates: true
    block_size: 138
In addition to setting the location of the custom OpenMM propagator, this section allows you to set the total number of iterations to run using max_total_iterations. To collect meaningful data for this system, this should be increased to at least 100. The max_run_wallclock time should also be adjusted depending on the hardware being used to run the simulation. Using four GTX 680s, this system takes approximately 16 seconds per iteration.
A particularly important setting in terms of the performance of the calculation
is block_size
. This parameter determines how many segments are sent to the
propagator at a time during the run. Since setting up the OpenMM context is
quite expensive, one can get a large boost in performance by re-using the same
context and just pushing new coordinates and velocities to it. So if the
calculation is run using the serial work manager, block_size
should be set
to the maximum number of replicas possible for the system, which in this case
is 552. Likewise, if running the calculation over 4 devices, this number should
be 552 / 4 = 138.
Running the simulation¶
The simulation can then be initiated and ran using the shell scripts,
init.sh
and run.sh
.
From the simulation root directory ($WEST_SIM_ROOT
) directory, enter into
the command line:
./init.sh
The script should create a directory called istates, as well as an HDF5 file named west.h5. Because the gen_istates flag was set to true in the west.cfg file, the propagator's gen_istate method should prepare multiple different .txt input coordinate files, located in the istates/1 directory. The init.sh script should finish by printing "Simulation prepared." along with a short list (8 lines) of probabilities and statistics about the initial state of the Na+/Cl- simulation.
Now that your simulation has been initialized, it is ready to be run by the weighted ensemble code. Use the command:
./run.sh --work-manager=zmq --n-workers=4 &
to use the zmq
work manager and run using 4 workers.
The init.sh and run.sh scripts call w_init.py and w_run.py from the main weighted ensemble code, respectively. If either does not work, check that env.sh is set up properly and points to the right directory for your weighted ensemble code (the default settings assume you are running from within the westpa/lib/examples/nacl_openmm directory). Make sure that the WEST_ROOT variable is set to where the westpa directory exists and the WEST_SIM_ROOT variable is set to where your simulation directory exists.
Analyzing the data¶
Output¶
Output File | Remarks |
---|---|
west.h5 | WESTPA output in hdf5 database |
west.log | WESTPA log file |
Because of the way in which we set up the calculation, all output data is stored within the hdf5 file, west.h5. Since we specified 2 iterations in the west.cfg file, the simulation should have only run for a short period of time. This is not enough to generate any meaningful results, but is sufficient to ensure that the system was set up properly.
In the west.cfg file, change the max_total_iterations variable to 100. The WESTPA code will continue the simulation from where you left off, based on the data present in the west.h5 file. If you wanted to restart the simulation from scratch, you would need to run the init.sh script again, which would remove the existing west.h5 file and create a new one. Once you have changed the max_total_iterations flag to 100, execute the run.sh script again. Simulating 100 iterations may take some time, so be prepared to wait. Using 4 GTX 680s and running with the CUDA platform, this should take about 25 minutes. Note that for a small number of atoms, as is the case for this system, running on GPUs does not leverage the full capabilities of the hardware and is likely to be slower than using an optimized CPU-based code.
Computing the association rate¶
WESTPA includes several tools for analysis located in $WEST_ROOT/bin
. In
init.sh
we specified the bin containing an Na+/Cl-
distance of 1.8 Å as the bound state, and that containing a distance of 16.9 Å
as the unbound state. Using w_fluxanl
, we can calculate the flux into these
target states, and from that calculate the association rate of Na+/Cl-. w_fluxanl
may be run with the following commands:
source env.sh
$WEST_ROOT/bin/w_fluxanl
The script will output the flux into the target states including confidence intervals calculated using the block bootstrap method:
Calculating mean flux and confidence intervals for iterations [1,101)
target 'bound':
correlation length = w tau
mean flux and CI = x (y, z) tau^(-1)
More information on how to use w_fluxanl
can be viewed using the --help
flag. w_fluxanl
also stores this information in an hdf5 file,
fluxanl.h5
.
Presently, w_fluxanl has used the data from all 100 iterations (note the exclusive bracket after 101) to calculate the mean flux (x) and the 95% confidence interval (y, z) for reaching the bound state (target 'bound'), which we specified as less than 2.8 angstroms of separation in the system.py file and with the target state variable in init.sh. The value given for the flux also represents the association rate. Taking the inverse of the mean flux (1/x) gives the mean first passage time for Na+/Cl- association in units of tau. We can further analyze the output of w_fluxanl by investigating the fluxanl.h5 file. You can look at the data contained within the file using programs such as h5ls or hdfview, but here we will instead use h5py in python to analyze the data. Open ipython in interactive plotting mode:
ipython --pylab
and then enter the following commands:
import h5py
import numpy as np
fluxanl = h5py.File('fluxanl.h5')
fluxanl['target_flux']['index'][:]
We can see that the dataset named [‘index’] contains the output printed
above by w_fluxanl
. We can plot the flux using:
flux = np.array(fluxanl['target_flux']['target_0']['flux'])
plot(flux)
The x-axis represents the iteration number recorded after the occurrence of the first binding event. The y-axis represents the flux in units of tau^-1.
We can see that the instantaneous flux has settled after large fluctuations during the first part of the run; however, the plot is also relatively noisy. To reduce the noise, we can plot the time evolution of the flux. Run the w_fluxanl tool again, this time with the '--evol' flag at the end of the command. Running this command will add an HDF5 dataset named ['flux_evolution'] to the ['target_0'] group. To plot the time evolution of the flux, you can use the following python code, continuing from the above ipython session:
mean_flux = fluxanl['target_flux']['target_0']['flux_evolution']['expected']
ci_lb = fluxanl['target_flux']['target_0']['flux_evolution']['ci_lbound']
ci_ub = fluxanl['target_flux']['target_0']['flux_evolution']['ci_ubound']
plot(mean_flux, 'b', ci_lb, 'g', ci_ub, 'r')
Compared to the first plot of the instantaneous flux, the time evolution plot is much less noisy. We can see that the flux is leveling off and the confidence intervals have somewhat converged, meaning that the simulation is approaching steady-state conditions.
Visualizing a selected pathway¶
In order to visualize a binding event, you will need to stitch together the individual trajectory segments that start at the initial state and then reach the bound state. The Introductory Amber Tutorial provides directions on how to extract the sequence of segments in a set of successful binding events; however, the script to construct a visualization of the pathway will not work for this example, since we have stored all of the relevant data directly in the west.h5 file. For this example, we leave writing the necessary script as an exercise. To create a netcdf-formatted Amber trajectory file, you might want to take a look at netcdf4storage.py, or you might consider using the DCD writer built into OpenMM, which can be imported into python using:
import simtk.openmm.app.dcdfile
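As a starting point, the following sketch writes the stored frames of one traced pathway to a DCD file; the (iteration, segment) trace and the auxdata layout of west.h5 are assumptions you will need to adapt.

# Sketch: write stored coordinates for a traced pathway to a DCD file.
# The trace and the west.h5 dataset layout here are assumptions.
import h5py
from simtk.openmm.app import AmberPrmtopFile, DCDFile
from simtk.unit import angstrom, picoseconds

topology = AmberPrmtopFile('nacl.prmtop').topology
trace = [(1, 0), (2, 5), (3, 2)]  # hypothetical (iteration, seg_id) pairs

west = h5py.File('west.h5', 'r')
with open('pathway.dcd', 'wb') as out:
    dcd = DCDFile(out, topology, 0.05*picoseconds)
    for n_iter, seg_id in trace:
        coords = west['iterations/iter_%08d/auxdata/coord' % n_iter][seg_id]
        for frame in coords:
            dcd.writeModel(frame * angstrom)
west.close()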
Useful hints¶
- Make sure your paths are set correctly in env.sh.
- If the simulation doesn't stop properly with CTRL+C, use CTRL+Z.
- Another method to stop the simulation relatively cleanly is to rename runseg.sh; WESTPA will shut the simulation down and prevent the hdf5 file from becoming corrupted. Some extra steps may be necessary to ensure that the analysis scripts can be run successfully.