Welcome to molPX: The Molecular Projection Explorer¶
The Molecular Projection Explorer, molPX, is a python module that provides interactive visualization of projected coordinates of molecular dynamics (MD) trajectories inside a Jupyter notebook.
molPX is based on the incredibly useful nglview IPython/Jupyter widget. Other libraries heavily used are mdtraj and PyEMMA. At the moment, there is also an sklearn dependency that might disappear in the future.

At the moment the API consists of two subpackages:
TL;DR: see molPX in action through the
Find more about the people behind molPX here:
Download and Install¶
If you can’t wait to play around with molPX, and you have the Anaconda scientifc python distribution (which we strongly recommend), the easiest way to get molPX is to issue the conda command:
>>> conda install molpx -c conda-forge
and jump to the Quick Start section of this document. Otherwise, check out our more exhaustive
Quick Start¶
Start an IPython
console
>>> ipython
Import molpx
and let the example notebooks guide you
>>> import molpx
>>> molpx.example_notebooks()
Voilà: you should be looking at a list of jupyter notebooks explaining the basic functionality of molPX
Documentation¶
You can find the latest documentation online here You can build a local copy of the html documentation by navigating to the molPX installation directory and issuing:
>>> cd doc
>>> make html
This will generate molPX/docs/build/html/index.html with the html documentation. If you are missing some of the requirements for the documentation , issue:
>>> pip install -r ./source/doc_requirements.txt
If you don’t know where molPX is installed, you can find out this way:
>>> ipython
>>> import molpx
>>> molpx._molpxdir()
The output of the last command is one subdirectory of molPX’s installation directory, so just copy it and issue:
>>> cd the-output-of-the-molpx._molpxdir-command
>>> cd ..
and you are there !
Warnings¶
molPX is currently under heavy development and the API might change rapidly. Stay tuned.
Data Privacy Statement¶
When you import this Python package, some of your metadata is sent to our servers. These are:
- molPX version
- Python version
- Operating System
- Hostname/ mac address of the accessing computer
- Time of retrieval
How to disable this feature easily:¶
Even before you use molPX for the first time:
- Create a hidden folder .molpx in your home folder
- Create a file conf_molpx.py inside of .molpx with the following line: report_status = False
- Restart your ipython/jupyter sessions
Hints:
This is most easily realized from terminal by issuing:
>>> mkdir ~/.molpx >>> echo "report_status = False" >> ~/.molpx/conf_molpx.py
You can check your report status anytime by typing this line in a (i)python terminal
>>> import molpx >>> molpx._report_status()
If you don’t know where your home folder is (for whatever reason), you can find it out by typing in a (i)python terminal
>>> import os >>> os.path.expanduser('~/.molpx')
molpx.visualize¶
The core functionality is to link two interative figures, fig1 and fig2, inside an IPython/Jupyter notebook, so that an action in fig1 (e.g.a click of the mouse or a slide of a slidebar) will trigger an event in fig2 (e.g. a frame update or point moved) and vice versa. Usually, these two figures contain representations from:
- molecules: an NGLview widget showing one (or more) molecular structure(s) that a particular value of the coordinate(s) is associated with and
- projected coordinates: a matplotlib figure showing the projected coordinates (e.g. TICs or PCs or any other), \({Y_0, ..., Y_N}\), either as a 2D histogram, \(PDF(Y_i, Y_j)\) or as trajectory views \({Y_0(t), ...Y_N(t)}\)
You are strongly encouraged to check NGLview’s documentation, since its functionalities extend beyond the scope of this package and the molecular visualization universe is rich and complex (unlike this module).
The methods offered by this module are:
molpx.visualize.FES (MD_trajectories, MD_top, …) |
Return a molecular visualization widget connected with a free energy plot. |
molpx.visualize.sample (positions, geom, ax) |
Visualize the geometries in geom according to the data in positions on an existing matplotlib axes ax |
molpx.visualize.traj (MD_trajectories, …[, …]) |
Link one or many projected trajectories , [Y_0(t), Y_1(t)…], with the MD_trajectories that originated them. |
molpx.visualize.correlations (correlation_input) |
Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features. |
molpx.visualize.feature (feat, widget[, …]) |
Provide a visual representation of a PyEMMA feature. |
-
molpx.visualize.
FES
(MD_trajectories, MD_top, projected_trajectories, proj_idxs=[0, 1], nbins=100, n_sample=100, proj_stride=1, weights=None, proj_labels='proj', n_overlays=1, atom_selection=None, **sample_kwargs)¶ Return a molecular visualization widget connected with a free energy plot.
Parameters: - MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) – Any file extension that
mdtraj
(.xtc, .dcd, etc) can read is accepted. Alternatively, a singlemdtraj.Trajectory
object or a list of them can be given as input. - MD_top (str to topology filename or directly an
mdtraj.Topology
object) – - projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
- the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) – filenames (.dat, .txt etc)
- NOTE (molpx assumes that there is no time column.) –
- proj_idxs (int, list or ndarray) – Selection of projection idxs (zero-idxd) to visualize.
- nbins (int, default 100) – The number of bins per axis to used in the histogram (FES)
- n_sample (int, default is 100) – The number of geometries that will be used to represent the FES. The higher the number, the higher the spatial resolution of the “click”-action. (And the longer it will take to generate the plot)
- proj_stride (int, default is 1) – Stride value that was used in the
projected_trajectories
relative to theMD_trajectories
If the originalMD_trajectories
were stored every 5 ps but the projected trajectories were stored every 50 ps,proj_stride
= 10 has to be provided, otherwise an exception will be thrown informing the user that theMD_trajectories
and theprojected_trajectories
have different number of frames. - weights (iterable of floats (or list thereof) each of shape (n_frames, 1) or (n_frames)) – The sample weights can come from a metadynamics run or an MSM-object, e.g.
via the method
pyemma.msm.BayesianMSM.trajectory_weights
. Has to have the same length as theprojected_trajectories
- proj_labels (either string or list of strings or (experimental) PyEMMA featurizer) – The projection plots will get this parameter for labeling their yaxis. If a str is
provided, that will be the base name
proj_labels='%s_%u'%(proj_labels,ii)
for each projection. Ifproj_labels
is a list, the list will be used as is. If there are not enough labels, the module will complain - n_overlays (int, default is 1) – The number of structures that will be simultaneously displayed as overlays for every sampled point of the FES. This parameter can seriously slow down the method, it is currently limited to a maximum value of 50
- atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string
that
mdtraj.Topology.select
could understand or directly the iterable of integers. IfMD_trajectories
is already a (list of)mdtraj.Trajectory
objects, the atom-slicing can be done by the user place before calling this method. - sample_kwargs (dictionary of named arguments, optional) – named arguments for the function
molpx.visualize.sample
. Non-expert users can safely ignore this option. Examples aresuperpose
orproj_idxs
Returns: A
molpx._linkutils.MolPXHBox
-object.It contains the
NGLWidget
and theAxesWidget
(which is responsible for the interactive figure). It is child-class of theipywidgets.HBox
-class and has been monkey-patched to have the following extra attributes so that the user has access to all the information being displayed.- linked_axes :
list with all the
pylab.Axis
-objects contained in thewidgetbox
- linked_ax_wdgs :
list with all the
matplotlib.widgets.AxesWidget`objects contained in the :obj:`widgetbox
- linked_figs :
list with all the
pylab.Figure
-objects contained in thewidgetbox
- linked_ngl_wdgs :
list with all the
nglview.NGLWidget
-objects contained in thewidgetbox
- linked_data_arrays :
list with all the numpy ndarrays contained in the
widgetbox
- linked_mdgeoms:
list with all the
mdtraj.Trajectory
-objects contained in thewidgetbox
Return type: widgetbox
- MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) – Any file extension that
-
molpx.visualize.
correlations
(correlation_input, geoms=None, proj_idxs=None, feat_name=None, widget=None, proj_color_list=None, n_feats=1, verbose=False, featurizer=None)¶ Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features.
Parameters: - correlation_input (numpy ndarray or some PyEMMA objects) –
- if array :
- (m,m) correlation matrix, with a row for each feature and a column for each projection
- if PyEMMA-object :
TICA
,PCA
orMDFeaturizer
.
- geoms (None or
mdtraj.Trajectory
, default is None) – The values of the most correlated features will be returned for the geometries in this object. If widget is left to its default, None,correlations
will create a new widget and try to show the most correlated features on top of the widget. - widget (None or
nglview.NGLWidget
, default is None) –Provide an already existing widget to visualize the correlations on top of. This is only for expert use, because no checks are done to see if
correlation_input
and the geometry contained in the widget actually match. Use with caution.Note
- When objects
geoms
andwidget
are provided simultaneously, three things happen: - no new widget will be instantiated
- the display of features will be on top of whatever geometry
widget
contains - the value of the features is computed for the geometry of
geom
Use with caution and clean bookkeeping!
- When objects
- proj_color_list (list, default is None) – projection specific list of colors to provide the representations with. The default None yields blue.
In principle, the list can contain one color for each projection (= as many colors as len(proj_idxs)
but if your list is short it will just default to the last color. This way, proj_color_list=[‘black’] will paint
all black regardless len(proj_idxs). Input anything that yields
matplotlib.colors.is_color_like
== True - proj_idxs (None, or int, or iterable of integers, default is None) – The indices of the projections for which the most correlated feture will be returned. If none it will default to the dimension of the correlation_input object
- feat_name (None or str, default is None) – The prefix with which to prepend the labels of the most correlated features. If left to None, the feature
description found in
correlation_input
will be used, if available - n_feats (int, default is 1) – Number of most correlated features to return for each projection
- featurizer (None or
MDFeaturizer
, default is None) – Ifcorrelation_input
is not anMDFeaturizer
itself, or doesn’t have adata_producer
attribute, the user can input one here. Ifcorrelation_input
andfeaturizer
are bothMDFeaturizer
-objects,featurizer
will be ignored. - verbose (Bool, default is True) – print to standard output
Returns: corr_dict and ngl_wdg
corr_dict – Dictionary with items:
- idxs :
List of length len(proj_idxs) with lists of length n_feat with the idxs of the most correlated features
- vals :
List of length len(proj_idxs) with lists of length n_feat with the corelation values of the most correlated features
- labels :
List of length len(proj_idxs) with lists of length n_feat with the labels of the most correlated features
- feats :
If an
mdtraj.Trajectory
is passed as angeom
argument, the most correlated features will be evaluated and returned as list of length len(proj_idxs
). Each element in th elist is an arrays with shape (geom.n_frames
,n_feats
)- atom_idxs :
List of length len(proj_idxs) each with an nd.array of shape (nfeat, m), where m is the number of atoms needed to describe each feature (1 of cartesian, 2 for distances, 3 for angles, 4 for dihedrals)
- info :
List of length len(
proj_idxs
) with lists of lengthn_feat
with strings describing the correlations
ngl_wdg –
nglview.NGLWidget
with the most correlated features (distances, angles, dihedrals, positions) visualized on top of it.
- correlation_input (numpy ndarray or some PyEMMA objects) –
-
molpx.visualize.
feature
(feat, widget, idxs=0, color_list=None, **kwargs)¶ Provide a visual representation of a PyEMMA feature. PyEMMA’s features are found as a list of the MDFeaturizers’s active_features attribute
Parameters: - featurizer (py:obj:_MDFeautrizer) – A PyEMMA MDFeaturizer object (either a feature or a featurizer, works with both)
- widget (None or nglview widget) – Provide an already existing widget to visualize the correlations on top of. This is only for expert use,
because no checks are done to see if
correlation_input
and the geometry contained in the widget actually match. Use with caution. - idxs (int or iterable of integers, default is 0) – Features can have many contributions, e.g. a distance feature can include many distances. Use this parameter to control which one of them gets represented.
- color_list (list, default is None) – list of colors to represent each feature in feat_idxs. The default None yields blue for everything. In principle, the list can contain one color for each projection (= as many colors as len(feat_idxs) but if your list is short it will just default to the last color. This way, color_list=[‘black’] will paint all black regardless len(proj_idxs)
- **kwargs (optional keyword arguments for _bmutils.add_atom_idxs_widget) – currently, only “radius” is left for the user to determine
- Returns –
- -------- –
- widget – the input widget with the features in
idxs
represented as either distances (for distance features) or “spacefill” spheres (for angular features)
-
molpx.visualize.
sample
(positions, geom, ax, plot_path=False, clear_lines=True, n_smooth=0, ngl_wdg=None, superpose=True, projection=None, n_feats=1, sticky=False, list_of_repr_dicts=None, color_list=None, **link_ax2wdg_kwargs)¶ Visualize the geometries in
geom
according to the data inpositions
on an existing matplotlib axesax
Use this method when the array of positions, the geometries, the axes (and the ngl_wdg, optionally) have already been generated elsewhere.
Parameters: - positions (numpy nd.array of shape (n_frames, 2)) – Contains the position associated the geometries in
geom
. See below for more details - geom (
mdtraj.Trajectory
object or a list thereof.) – The geometries associated with the thepositions
. # TODO: WRITE HOW THE LISTS-LENGTHS CORRESPONDS FOR THE STICKY OPTION - ax (
matplotlib.pyplot.Axes
object) – The axes to be linked with theNGLWidget
- plot_path (bool, default is False) – whether to draw a line connecting the positions in
positions
- clear_lines (bool, default is True) – whether to clear all the 2D objects that were previously drawn in
ax
- n_smooth (int, default is 0,) – if
n_smooth
> 0, the shown geometries and paths will be smoothed out by 2*n_smooth
+1. Seemolpx._bmutils.smooth_geom
for more information - ngl_wdg (None or an existing
NGLWidget
, default is None) – you can provide an already instantiatedNGLWidget
here (avanced use) - superpose (boolean, default is True) – The geometries in
geom
may or may not be oriented, depending on how they were generated. Since this method is mostly for visualization purposes, the default behaviour is to orient them all to maximally overlap with the most compact frame available - projection (object that generated the
positions
, default is None) –The projected coordinates may come from a variety of sources. When working with PyEMMA, a number of objects might have generated this projection, like a
TICA
- or aPCA
- or anMDFeaturizer
-object.
Makes most sense when
positions
where generated withmolpx.generate.projection_paths
, otherwise might produce rubbish. Seen_feats
for more info. # TODO: delete from here below? The features most correlated with thepositions
will be shown in the widget # TODO CHECK THIS geometries ingeom
, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc). - n_feats (int, default is 1) – If a
projection
is passed along, the firstn_feats
features that most correlate the the projected trajectories will be represented, both in form of figures of feat(t) as well as in the on top of thengl_wdg
. Ifprojection == None
,nfeats
will be ignored. - sticky (boolean, default is False) – If set to True, the
ngl_wdg
will be sticky in that every click adds a new molecular structure without deleting the previous one. Left-clicks adds a structure, right-clicks deletes a structure. Particularly useful for representing more minima simultaneously. - color_list (None, list of len(
positions
), or “random”) – The colors with which the sticky frames will be plotted. A color is anything that yieldsmatplotlib.colors.is_color_like == True
“None” defaults to coloring by element. “random” randomizes the color choice - list_of_repr_dicts (None or list of dictionaries, default is None) – Has an effect only for
sticky == True
, s.t. these reps instead of the default ones are used. The dictionaries must have at least the keys ‘repr_type’ and ‘selection’. Other key-value pairs are currently ignored but, will be implemented in the future. Seenglview.NGLWidget.add_representation
for more info). - link_ax2wdg_kwargs (dictionary of named arguments, optional) – named arguments for the function
molpx._linkutils.link_ax_w_pos_2_nglwidget
, which is the one that internally provides the interactivity. Non-expert users can safely ignore this option.
Returns: - ngl_wdg (
NGLWidget
) - axes_wdg (
AxesWidget
)
- positions (numpy nd.array of shape (n_frames, 2)) – Contains the position associated the geometries in
-
molpx.visualize.
traj
(MD_trajectories, MD_top, projected_trajectories, max_frames=10000.0, stride=1, proj_stride=1, proj_idxs=[0, 1], proj_labels='proj', plot_FES=False, weights=None, panel_height=1, sharey_traj=True, dt=1.0, tunits='frames', traj_selection=None, projection=None, n_feats=1)¶ Link one or many
projected trajectories
, [Y_0(t), Y_1(t)…], with theMD_trajectories
that originated them. Optionally plot also the resulting FES.Parameters: - MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –
Any file extension that
mdtraj
(.xtc, .dcd etc) can read is accepted.Alternatively, a single
mdtraj.Trajectory
object or a list of them can be given as input. - MD_top (str to topology filename or directly
mdtraj.Topology
object) – - projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
- the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) –
- txt etc) ((dat,) –
- NOTE (molpx assumes that there is no time column.) –
- max_frames (int, default is 1000) – If the trajectoy is longer than this, stride to this length (in frames)
- stride (int, default is 1) – Stride value in case of large datasets. In case of having
MD_trajectories
andprojected_trajectories
in memory (and not on disk) the stride can take place also before calling this method. - proj_stride (int, default is 1) – Stride value that was used in the
projected_trajectories
relative to theMD_trajectories
If the originalMD_trajectories
were stored every 5 ps but the projected trajectories were stored every 50 ps,proj_stride
= 10 has to be provided, otherwise an exception will be thrown informing the user that theMD_trajectories
and theprojected_trajectories
have different number of frames. - proj_idxs (iterable of ints, default is [0,1]) – Indices of the projected coordinates to use in the various representations
- proj_labels (either string or list of strings) – The projection plots will get this paramter for labeling their yaxis. If a str is provided, that will be the base name proj_labels=’%s_%u’%(proj_labels,ii) for each projection. If a list, the list will be used. If not enough labels are there the module will complain
- plot_FES (bool, default is False) – Plot (and interactively link) the FES as well
- weights (ndarray(n_frames), default = None) – sample weights. By default all samples have the same weight (used for FES calculation only)
- panel_height (int, default 1) – Height, in inches, of each panel of each trajectory subplots
- sharey_traj (bool, default is True) – Force the panels of each projection to have the same yaxes across trajectories (Note: Not across coordinates)
- dt (float, default is 1.0) – Physical time-unit equivalent to one frame of the
projected_trajectories
- tunits (str, default is 'frames') – Name of the physical time unit provided in
dt
- traj_selection (None, int, iterable of ints, default is None) – Don’t plot all trajectories but only few of them. The default None implies that all trajs will be plotted. Note: the data used for the FES will always include all trajectories, regardless of this value
- projection (object that generated the projection, default is None) –
The projected coordinates may come from a variety of sources. When working with
pyemma
a number of objects might have generated this projection, like aPass this object along and observe and the features that are most correlated with the projections will be plotted for the active trajectory, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc)
- n_feats (int, default is 1) – If a
projection
is passed along, the first n_feats features that most correlate the the projected trajectories will be represented, both in form of trajectories feat vs t as well as in the ngl_wdg. Ifprojection
is None,nfeats
will be ignored.
Returns: - ax, ngl_wdg, data_sample, geoms – return _plt.gca(), _plt.gcf(), widget, geoms
- ax –
pylab.Axis
object - fig –
pylab.Figure
object - ngl_wdg –
nglview.NGLWidget
- geoms –
mdtraj.Trajectory
object with the geometries n_sample geometries shown by the ngl_wdg
- MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –
molpx.generate¶
This module contains methods that generate the needed objects for visualize
of the methods to work.
molpx.generate.projection_paths (…[, …]) |
Return a path along a given projection. |
molpx.generate.sample (MD_trajectories, …) |
Returns a sample of molecular geometries and their positions in the projected space |
-
molpx.generate.
projection_paths
(MD_trajectories, MD_top, projected_trajectories, n_projs=1, proj_dim=2, proj_idxs=None, n_points=100, n_geom_samples=100, proj_stride=1, history_aware=True, verbose=False, minRMSD_selection='backbone')¶ Return a path along a given projection. More info on what this means exactly will follow soon.
Parameters: - MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –
Any file extension that
mdtraj
(.xtc, .dcd etc) can read is accepted.Alternatively, a single
mdtraj.Trajectory
object or a list of them can be given as input. - MD_top (str to topology filename or directly
mdtraj.Topology
object) – - projected_trajectories (str to a filename or numpy ndarray of shape (n_frames, n_dims)) – Time-series with the projection(s) that want to be explored. If these have been computed externally, you can provide .npy-filenames or readable asciis (.dat, .txt etc). NOTE: molpx assumes that there is no time column.
- n_projs (int, default is 1) – Number of projection paths to generate. If the input
projected_trajectories
are n-dimensional, in principle up to n-paths can be generated - proj_dim (int, default is 2) – Dimensionality of the space in which distances will be computed
- proj_idxs (int, defaultis None) – Selection of projection idxs (zero-idxd) to visualize. The default behaviour is that proj_idxs = range(n_projs). However, if proj_idxs != None, then n_projs is ignored and proj_dim is set automatically
- n_points (int, default is 100) – Number of points along the projection path. The higher this number, the higher the projected coordinate is resolved, at the cost of more computational effort. It’s a trade-off parameter
- n_geom_samples (int, default is 100) – For each of the
n_points
along the projection path,n_geom_samples
will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path. Also, the longer it takes for the path to be computed - proj_stride (int, default is 1) – The stride of the
projected_trajectories
relative to theMD_trajectories
. This will play a role particularly ifprojected_trajectories
is already strided (because the user is holding it in memory) but the MD-data on disk has not been strided. - history_aware (bool, default is True) – The path-searching algorigthm the can minimize distances between adjacent points along the path or minimize the distance between each point and the mean value of all the other up to that point. Use this parameter to avoid a situation in which the path gets “derailed” because an outlier is chosen at a given point.
- verbose (bool, default is False) – The verbosity level
- minRMSD_selection (str, default is 'backbone') – When computing minRMSDs between a given point and adjacent candidates, use this string to select the atoms that will be considered. Check mdtraj’s selection language here http://mdtraj.org/latest/atom_selection.html
Returns: dictionary of dictionaries containing the projection paths.
paths_dict[idxs][type_of_path]
- idxs represent the index of the projected coordinate ([0], [1]…)
- types of paths “min_rmsd” or “min_disp”
What the dictionary actually contains
paths_dict[idxs][type_of_path]["proj"]
: ndarray of shape (n_points, proj_dim) with the coordinates of the projection along the pathpaths_dict[idxs][type_of_path]["geom"]
:mdtraj.Trajectory
geometries along the path
Return type: paths_dict
- idata :
- list of ndarrays with the the data in
projected_trajectories
- MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –
-
molpx.generate.
sample
(MD_trajectories, MD_top, projected_trajectories, atom_selection=None, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)¶ Returns a sample of molecular geometries and their positions in the projected space
Parameters: - MD_trajectories (list of strings) – Filenames (any extension that
mdtraj
can read is accepted) containing the trajectory data. There is an untested input mode where the user parses directlymdtraj.Trajectory
objects - MD_top (str to topology filename or directly
mdtraj.Topology
object) – - projected_trajectories ((list of) strings or (list of) numpy ndarrays of shape (n_frames, n_dims)) – Time-series with the projection(s) that want to be explored. You can provide .npy-filenames or readable asciis (.dat, .txt etc). Alternatively, you can feed in your own PyEMMA-clustering object NOTE: molpx assumes that there is no time column.
- atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string that mdtraj.Topology.select could understand or directly the iterable of integers. If :py:obj`MD_trajectories` is already a (list of) md.Trajectory objects, the atom-slicing can take place before calling this method.
- proj_idxs (int, default is None) – Selection of projection idxs (zero-idxd) to visualize. The default behaviour is that proj_idxs = range(n_projs). However, if proj_idxs != None, then n_projs is ignored and proj_dim is set automatically
- n_points (int, default is 100) – Number of points along the projection path. The higher this number, the higher the projected coordinate is resolved, at the cost of more computational effort. It’s a trade-off parameter
- n_geom_samples (int, default is 1) – For each of the
n_points
along the projection path,n_geom_samples
will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path. Also, the longer it takes for the path to be computed. This is a trade-off parameter between how smooth the transitons between geometries can be and how long it takes to generate the sample - keep_all_samples (boolean, default is False) – In principle, once the closest-to-ref geometry has been kept, the other geometries are discarded, and the
output sample contains only n_point geometries. There are, still, special cases where the user might
want to keep all sampled geometries. Typical use-case is when the n_points is low and many representatives
per clustercenters will be much more informative than the other way around.
This is an advanced feature that other methods of molPX use internally for generating overlays, be awere that
it changes the return type of
geom_smpl
from the default (anmdtraj.Trajectory
withn_points
-frames) to a list list of lengthn_geom_samples
, each element is anmdtraj.Trajectory
object ofn_points
-frames - proj_stride (int, default is 1) – Stride value that was used in the
projected_trajectories
relative to theMD_trajectories
If the originalMD_trajectories
were stored every 5 ps but the projected trajectories were stored every 50 ps,proj_stride
= 10 has to be provided, otherwise an exception will be thrown informing the user that theMD_trajectories
and theprojected_trajectories
have different number of frames.
Returns: - pos – ndarray with the positions of the sample
- geom_smpl – sampled geometries. Can be of two types:
- default:
mdtraj.Trajectory
withn_points
-frames - if keep_all_samples = True: list of length
n_geom_samples
. Each element is anmdtraj.Trajectory
object ofn_points
-frames.
- default:
- MD_trajectories (list of strings) – Filenames (any extension that
Example Jupyter Notebooks¶
There are several ways to see the example notebooks, which you can find in the molpx/notebooks/
installation directory.
- Play:
molpx
has a method that will launch a working, temporary copy of the example notebooks. From an IPython console, just type:>>>> import molpx >>>> molpx.example_notebooks()
A List of Juypter notebooks should automagically appear in front of you after a few seconds. This is the most interactive and usefull way to see
molpx
in action, but you’ll only have access to it after downloading and installingmolpx
- Read:
The links you see below are an html-rendered version of the notebook. Click on them to navigate the notebook. Unfortunately for this html documentation,
nglview
‘s output, i.e. the pictures of molecular structures, cannot be stored currently in the notebook file. In short: the html-notebook is lacking the most visually appealing part ofmolpx
.
Quick Intro with DiAla¶
molPX Di-Ala example¶
Guillermo Pérez-Hernández guille.perez@fu-berlin.de
In this notebook we will be using a trajectory of Di-Ala-peptide to easily identify conformations in the Ramachandran plot.
[1]:
import molpx
%matplotlib ipympl
[2]:
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.mini.xtc')] #short trajectory
rama_files = [molpx._molpxdir(join='notebooks/data/ala2.mini.phi.psi.dat')]
Execute the following cell and click either on the FES or on the slidebar. Once you get familiar with it, try uncommenting these options and see what happens:
proj_idxs=[1]
n_overlays=5
sticky=True
color_list='random'
With sticky option is True
, you can see the effect of a right-click on the dots of the FES
[3]:
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
top,
rama_files,
nbins=50,
proj_labels=['$\phi$',
'$\psi$'],
atom_selection="symbol != H",
#proj_idxs=[1],
#n_overlays=5,
#sticky=True,
#color_list='random'
)
mpx_wdg_box
[4]:
from molpx import visualize, _linkutils
from imp import reload
reload(visualize)
reload(_linkutils)
from matplotlib import pyplot as plt
plt.close('all')
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
rama_files,
plot_FES = True,
proj_labels=['$\phi$', '$\psi$']
)
mpx_wdg_box
Repeat from memory¶
You can load the trajectory data using mdtraj
and numpy
and use that directly as an input:
[5]:
import numpy as np
import mdtraj as md
MD_trajs = [md.load(fname, top=top) for fname in MD_trajfiles]
phi_psi = [np.loadtxt(fname) for fname in rama_files]
[6]:
mpx_wdg_box = molpx.visualize.FES(MD_trajs,
top,
phi_psi,
nbins=50,
proj_labels=['$\phi$',
'$\psi$'],
atom_selection="symbol != H",
#proj_idxs=[1],
#n_overlays=5,
#sticky=True,
#color_list='random'
)
mpx_wdg_box
The mpx_wdg_box
¶
It is a class derived from the ipython widgets HBox and VBox, with molPX’s extra information as attributes starting with “linked_*”
[12]:
for attr in dir(mpx_wdg_box):
if attr.startswith('linked_'):
print(attr)
linked_ax_wdgs
linked_axes
linked_data_arrays
linked_figs
linked_mdgeoms
linked_ngl_wdgs
molPX and PyEMMA¶
Visualize PyEMMA MDFeaturizer
-objects¶
Guillermo Perez-Hernandez guille.perez@fu-berlin.de
In this notebook, we introduce the molPX’s ability to display PyEMMA specific objects: features.If you are note familiar with PyEMMA (www.pyemma.org), you might not follow entirely, but it is pretty intuitive overall. Give it a try!
We will be using two datasets: * 1 millisecond trajectory of Bovine Pancreatic Trypsin Inhibitor (BPTI) generated by DE Shaw Research on the Anton Supercomputer and kindly made available by their lab. The original work is
* Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 330:341-346 (2010). doi: 10.1126/science.1187409.
The trajectory has been duplicated and shortened to provide a mock-trajectory set and be able to deal with lists of trajectories of different lenghts:
* `c-alpha_centered.stride.100.xtc`
* `c-alpha_centered.stride.100.reversed.xtc`
* `c-alpha_centered.stride.100.halved.xtc`
- A short, Di-Alanine-Peptide trajectory
[1]:
top = 'notebooks/data/bpti-c-alpha_centered.pdb'
MD_trajfiles = ['notebooks/data/c-alpha_centered.stride.1000.xtc',
'notebooks/data/c-alpha_centered.stride.1000.reversed.xtc',
'notebooks/data/c-alpha_centered.stride.1000.halved.xtc'
]
dt = 244 #saving interval in the .xtc files, in ns
import molpx
from matplotlib import pylab as plt
%matplotlib ipympl
import pyemma
import numpy as np
# This way the user does not have to care where the data are:
top = molpx._molpxdir(top)
MD_trajfiles = [molpx._molpxdir(ff) for ff in MD_trajfiles]
[2]:
# Create a memory representation of the trajectories
MD_list = [molpx.generate._md.load(itraj, top=top) for itraj in MD_trajfiles]
[3]:
feat = pyemma.coordinates.featurizer(top)
pairs = feat.pairs(range(feat.topology.n_atoms)[::2])
feat.add_distances(pairs)
src = pyemma.coordinates.source(MD_trajfiles, features=feat)
Y = src.get_output()
feat.describe()[108], feat.describe()[112], feat
[3]:
('DIST: PRO 9 CA 8 - LYS 15 CA 14',
'DIST: PRO 9 CA 8 - TYR 23 CA 22',
<pyemma.coordinates.data.featurization.featurizer.MDFeaturizer at 0x7efc407e7550>)
[4]:
proj_idxs = [108,112]
mpx_wdg_box = molpx.visualize.FES(MD_list,
top,
Y,
nbins=50,
proj_idxs=proj_idxs,
#proj_labels=feat,
proj_labels='feat',
#n_overlays=5,
#sticky=True,
)
__ = molpx.visualize.feature(feat.active_features[0],
mpx_wdg_box.linked_ngl_wdgs[0],
idxs=proj_idxs, radius=.5,
color_list=['red','green'])
mpx_wdg_box.display()
[5]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
Y,
dt = dt*1e-6, tunits='ms',
traj_selection = 0,
#sharey_traj=False,
proj_idxs=proj_idxs,
panel_height=1,
proj_labels='feat',
)
molpx.visualize.feature(feat.active_features[0], mpx_wdg_box.linked_ngl_wdgs[0], idxs=proj_idxs)
mpx_wdg_box
Cartesian Features¶
[6]:
feat = pyemma.coordinates.featurizer(top)
feat.add_all()
Y = [feat.transform(igeom.superpose(MD_list[0])) for igeom in MD_list]
feat.describe()[24], feat.describe()[42]
[6]:
('ATOM:PRO 9 CA 8 x', 'ATOM:LYS 15 CA 14 x')
[7]:
proj_idxs = [24,42]
mpx_wdg_box = molpx.visualize.FES(MD_list,
top,
Y,
nbins=50,
proj_idxs=proj_idxs,
proj_labels='feat',
#n_overlays=5,
)
molpx.visualize.feature(feat.active_features[0], mpx_wdg_box.linked_ngl_wdgs[0], idxs=proj_idxs, color_list=['red','green'])
mpx_wdg_box
[8]:
from os.path import exists
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
# What data do we have?
if exists('/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'):
MD_trajfiles = ['/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'] #long trajectory
elif exists('/home/guille/ala2a.dcd'):
MD_trajfiles = ['/home/guille/ala2.dcd'] # extra for Stralsund
else:
MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.mini.xtc')] #short trajectory
feat = pyemma.coordinates.featurizer(top)
feat.add_backbone_torsions(
# cossin=True
)
src = pyemma.coordinates.source(MD_trajfiles, features=feat)
Y = src.get_output()
[9]:
proj_idxs = [0,1]
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
top,
Y,
nbins=50,
proj_idxs=proj_idxs,
proj_labels=feat,
)
molpx.visualize.feature(feat.active_features[0], mpx_wdg_box.linked_ngl_wdgs[0], idxs=[0], radius=.5)
mpx_wdg_box
molPX and TICA with BPTI¶
molPX intro¶
Guillermo Perez-Hernandez guille.perez@fu-berlin.de
In this notebook we will be using the 1 millisecond trajectory of Bovine Pancreatic Trypsin Inhibitor (BPTI) generated by DE Shaw Research on the Anton Supercomputer and kindly made available by their lab. The original work is
- Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 330:341-346 (2010). doi: 10.1126/science.1187409.
The trajectory has been duplicated and shortened to provide a mock-trajectory set and be able to deal with lists of trajectories of different lenghts:
c-alpha_centered.stride.100.xtc
c-alpha_centered.stride.100.reversed.xtc
c-alpha_centered.stride.100.halved.xtc
The typical usecase is having molecular dynamics (MD) simulation data in form of trajectory files with extensions like .xtc, .dcd
etc and the associated molecular topology as a .pdb
or .gro
file.
These files are the most general starting point for any analysis dealing with MD, and molpx
’s API has been designed to be able to function without further input:
[ ]:
top = 'notebooks/data/bpti-c-alpha_centered.pdb'
MD_trajfiles = ['notebooks/data/c-alpha_centered.stride.1000.xtc',
'notebooks/data/c-alpha_centered.stride.1000.reversed.xtc',
'notebooks/data/c-alpha_centered.stride.1000.halved.xtc'
]
dt = 244 #saving interval in the .xtc files, in ns
import molpx
from matplotlib import pylab as plt
%matplotlib ipympl
import pyemma
import numpy as np
# This way the user does not have to care where the data are:
top = molpx._molpxdir(top)
MD_trajfiles = [molpx._molpxdir(ff) for ff in MD_trajfiles]
However, molpx
relies heavily on the awesome `mdtraj
<http://www.mdtraj.org>`__ module for dealing with molecular structures, and so most of molpx
’s functions accept also Trajectory
-type objects (native to mdtraj
) as alternative inputs.
[ ]:
# Create a memory representation of the trajectories
MD_list = [molpx.generate._md.load(itraj, top=top) for itraj in MD_trajfiles]
The same idea applies to the input of projected trajectories: molpx
can take the filenames as inputs (.npy
, .dat
, .txt
etc) or deal directly with numpy.ndarray
objects.
** These alternative, “from-memory” input modes (md.Trajectory
and np.ndarray
objects) avoid forcing the user to read from file everytime an API function is called, saving I/O overhead**
The following cell either reads or generates projected trajectory files for this demonstration. In a real usecase this step (done here using TICA) might not be needed, given that the user might have generated the projected trajectory elsewhere:
[ ]:
# Perform TICA or read from file directly if already .npy-files exist
Y_filenames = [ff.replace('.xtc','.Y.npy') for ff in MD_trajfiles]
try:
Y = [np.load(ff) for ff in Y_filenames]
except:
feat = pyemma.coordinates.featurizer(top)
pairs = feat.pairs(range(feat.topology.n_atoms)[::2])
feat.add_distances(pairs)
src = pyemma.coordinates.source(MD_trajfiles, features=feat)
tica = pyemma.coordinates.tica(src, lag=10, dim=3)
Y = tica.get_output()
[np.save(ff, iY) for ff, iY in zip(Y_filenames, Y)]
Execute the following cell and click either on the FES or on the slidebar. Some input parameters have been comented out for you to try out: * different modes of input (disk vs memory) * different projection indices * different number of overlaid structures
[ ]:
mpx_wdg_box = molpx.visualize.FES(MD_list,
#MD_trajfiles,
top,
Y_filenames,
#Y,
nbins=50,
#proj_idxs=[1,2],
proj_labels='TIC',
#n_overlays=5,
)
mpx_wdg_box
The user can sample structures as they occurr in sequence in the actual trajectory. Depending on the size of the dataset, this can be very time consuming, particularly if data is being read from disk.
In this example, try changing MD_trajfiles
to MD_list
and/or changing Y_filenames
to simply Y
and see if it helps.
Furthermore, the objects in memory can be strided down to fewer frames before being parsed to the method. To stride objects being read from the disk, use the stride
parameter.
visualize.traj
. Uncomment them and see what happens¶[ ]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
Y,
#Y_filenames,
plot_FES = True,
dt = dt*1e-6, tunits='ms',
#traj_selection = 1,
#sharey_traj=False,
#max_frames=100,
proj_idxs=[0, 1],
panel_height=2,
#proj_labels='TIC'
)
mpx_wdg_box
Intermediate steps: using molpx to generate a regspace sample of the data¶
See the documentation of molpx.generate.sample
to find out about all possible options:
molpx.generate.sample(MD_trajectories, MD_top, projected_trajectories, atom_selection=None, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)
[ ]:
data_sample, geoms = molpx.generate.sample(#MD_list,
MD_trajfiles,
top,
#Y,
Y_filenames,
n_points=200 ,
n_geom_samples=2,
)
data_sample.shape, geoms
Click either on the plot or on the widget slidebar: they’re connected!
[ ]:
# Replot the FES
plt.figure(figsize=(7,7))
h, (x,y) = np.histogramdd(np.vstack(Y)[:,:2], bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)
# Create the linked widget
linked_ngl_wdg, linked_ax_wdg = molpx.visualize.sample(data_sample,
geoms.superpose(geoms[0]),
plt.gca(),
clear_lines=True,
#plot_path=True
)
plt.plot(data_sample[:,0], data_sample[:,1],' ok', zorder=0)
# Show it
linked_ngl_wdg
Paths samples along the different projections (=axis)¶
[ ]:
paths_dict, idata = molpx.generate.projection_paths(#MD_list,
MD_trajfiles,
top,
Y_filenames,
#Y, # You can also directly give the data here
n_points=50,
proj_idxs=[0,1],
n_projs=3,
proj_dim = 3,
verbose=False,
)
Click either on the plot or on the widget slidebar: they’re connected! You can change the type of path between min_rmsd or min_disp and you can also change the coordinate sampled (0 or 1)
[ ]:
# Choose the coordinate and the tyep of path
coord = 1
#path_type = 'min_rmsd'
path_type = 'min_disp'
igeom = paths_dict[coord][path_type]["geom"]
ipath = paths_dict[coord][path_type]["proj"]
# Choose the proj_idxs for the path and the FES
# to be shown
proj_idxs = [0,1]
[ ]:
plt.figure(figsize=(7,7))
h, (x,y) = np.histogramdd(np.vstack(Y)[:,proj_idxs], bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)
linked_ngl_wdg, linked_ax_wdg = molpx.visualize.sample(ipath[:,proj_idxs],
igeom.superpose(igeom[0]),
plt.gca(),
clear_lines=True,
n_smooth = 5,
plot_path=True,
#radius=True,
)
linked_ngl_wdg
Interaction with PyEMMA
¶
molpx
is using many methods of the coordinates
submodule of PyEMMA
, and thus it also understands some of PyEMMA
’s classes as input (like clustering objects or streaming transformers). ## Using the TICA object to visualize the most correlated input features If the projected coordinates come from a TICA (or PCA) transformation, and the TICA object is available in memory molpx.visualize.traj
can make use of correlation information to display not only the projected coordinates
(i.e the TICs, in this case), but also the “original” input features behind it
[ ]:
# Re-do the TICA computation to make sure we have a tica object in memory
feat = pyemma.coordinates.featurizer(top)
pairs = feat.pairs(range(feat.topology.n_atoms)[::2])
feat.add_distances(pairs)
src = pyemma.coordinates.source(MD_trajfiles, features=feat)
tica = pyemma.coordinates.tica(src, lag=10, dim=3)
Y = tica.get_output()
The method molpx.visualize.correlations
tries to provide a visual representation of the projected coordinates by relating them to the input features, which carry more meaning, since they are (usually) familiar parameters such as atom distances, angles, contacts etc.
[ ]:
# Comment or uncomment the optinal parameters and see how the method reacts
# You can use a pre-instantiated the widget
#iwd = molpx.visualize._nglwidget_wrapper(MD_list[0][0])
# Or instantiate at the moment of calling visualize.correlations
iwd = None
corr, ngl_wdg = molpx.visualize.correlations(tica,
n_feats=3,
proj_idxs=[0,1,2],
geoms=MD_list[0][::100],
#verbose=True,
#proj_color_list=['red', 'blue', 'green'],
widget=iwd
)
ngl_wdg
Also, molpx.visualize.traj
can help in visualizing these correlations by parsing along the tica object itself as projection=tica
. In the next cell, can you spot the differences: * In the nglwidget? * In the trajectories?
[ ]:
# Reuse the visualize.traj method with the tica object as input
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
Y,
#Y_filenames,
plot_FES = True,
dt = dt*1e-6, tunits='ms',
#traj_selection = 0,
#sharey_traj=False,
#max_frames=100,
proj_idxs=[0,1],
panel_height=1,
projection=tica, ## this is what's new
n_feats=2
)
mpx_wdg_box
If the dataset has already been clustered, and it is that clustering that the user wants to explore, molpx.generate.sample
can take this clustering object as an input instead of the the projected trajectories:
[ ]:
# Do "some" clustering
clkmeans = pyemma.coordinates.cluster_kmeans([iY[:,:2] for iY in Y], 5)
[ ]:
data_sample, geoms = molpx.generate.sample(MD_trajfiles, top, clkmeans,
n_geom_samples=50,
#keep_all_samples=True # read the doc for this argument
)
[ ]:
# Plot clusters
plt.figure(figsize=(4,4))
plt.plot(clkmeans.clustercenters[:,0], clkmeans.clustercenters[:,1],' ok')
# FES as background is optional (change the bool to False)
if True:
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)
# Link the clusters positions with the molecular structures
linked_ngl_wdg = molpx.visualize.sample(data_sample,
geoms.superpose(geoms[0]),
plt.gca(),
clear_lines=False,
#plot_path=True
)
linked_ngl_wdg
Visually inspect the network behind an MSM
[ ]:
MSM = pyemma.msm.estimate_markov_model(clkmeans.dtrajs, 20)
[ ]:
plt.figure(figsize=(4,4))
ax, pos = pyemma.plots.plot_markov_model(MSM.P,
minflux=5e-4,
arrow_labels=None,
ax=plt.gca(),
arrow_curvature = 2, show_frame=True,
pos=clkmeans.clustercenters)
# Add a background if wanted
h, (x, y) = np.histogramdd(np.vstack(Y)[:,:2], weights=np.hstack(MSM.trajectory_weights()), bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), cmap="jet", alpha=.5, zorder=0)
plt.xlim(x[[0,-1]])
plt.xticks(np.unique(x.round()))
plt.yticks(np.unique(y.round()))
plt.ylim(y[[0,-1]])
linked_ngl_wd, linked_ax_wd = molpx.visualize.sample(pos, geoms, plt.gca(), dot_color='blue')
linked_ngl_wd
[ ]:
# Do an MSM with a realistic number of clustercenters
cl_many = pyemma.coordinates.cluster_regspace([iY[:,:2] for iY in Y], dmin=.25)
M = pyemma.msm.estimate_markov_model(cl_many.dtrajs, 20)
cl_many.n_clusters
[ ]:
# Use this object to sample geometries
pos, geom = molpx.generate.sample(MD_trajfiles, top, cl_many)
[ ]:
# Find the most representative microstate of each
# and least populated macrostate
M.pcca(3)
dens_max_i = [distro.argmax() for distro in M.metastable_distributions]
A = np.argmax([M.stationary_distribution[iset].sum() for iset in M.metastable_sets])
B = np.argmin([M.stationary_distribution[iset].sum() for iset in M.metastable_sets])
print(cl_many.clustercenters[dens_max_i[A]],
cl_many.clustercenters[dens_max_i[B]])
[ ]:
# Create a TPT object with most_pop, least_pop as source, sink respectively
tpt = pyemma.msm.tpt(M, [dens_max_i[A]], [dens_max_i[B]])
paths, flux = tpt.pathways(fraction=.5)
[ ]:
# Get a path with a decent number of intermediates
sample_path = paths[np.argmax([len(ipath) for ipath in paths])]
[ ]:
plt.figure()
plt.contourf(x[:-1], y[:-1], -np.log(h.T), cmap="jet", alpha=.5, zorder=0)
linked_ngl_wdg, linked_ax_wdg = molpx.visualize.sample(cl_many.clustercenters[sample_path],
geom[sample_path].superpose(geom[sample_path[0]]), plt.gca(),
plot_path=True,
)
plt.scatter(*cl_many.clustercenters.T, alpha=.25)
linked_ngl_wdg
[ ]:
molPX and TICA with DiAla¶
molPX Di-Ala example¶
Guillermo Pérez-Hernández guille.perez@fu-berlin.de
In this notebook we will be using a trajectory of Di-Ala-peptide to easily identify conformations in the Ramachandran plot.
[4]:
from os.path import exists
import molpx
from matplotlib import pylab as plt
%matplotlib ipympl
import pyemma
import numpy as np
[5]:
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
# What data do we have?
if exists('/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'):
MD_trajfiles = ['/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'] #long trajectory
elif exists('/home/guille/ala2.dcd'):
MD_trajfiles = ['/home/guille/ala2.dcd'] # extra for Stralsund
else:
MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.mini.xtc')] #short trajectory
PyEMMA
¶[6]:
feat = pyemma.coordinates.featurizer(top)
feat.add_backbone_torsions()
src = pyemma.coordinates.source(MD_trajfiles, features=feat)
Y = src.get_output()
Execute the following cell and click either on the FES or on the slidebar
[7]:
mpx_widget_box = molpx.visualize.FES(MD_trajfiles,
top,
Y,
#proj_idxs=[1],
nbins=50,
proj_labels=['$\phi$',
'$\psi$'],
atom_selection="symbol != H",
#n_overlays=5,
#sticky=True,
#color_list='random'
)
mpx_widget_box
[8]:
from molpx import visualize, _linkutils
from imp import reload
reload(visualize)
reload(_linkutils)
mpl_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
Y,
plot_FES = True,
#dt = dt*1e-6, tunits='ms',
max_frames=10000,
proj_idxs=[0, 1],
panel_height=2,
proj_labels=['$\phi$', '$\psi$']
)
mpl_wdg_box
Paths samples along the different projections (=axis)¶
[10]:
paths_dict, idata = molpx.generate.projection_paths(MD_trajfiles,
top,
Y,
n_points=50,
proj_idxs=[0,1],
n_projs=3,
proj_dim = 3,
verbose=False,
)
[11]:
# Choose the coordinate and the type of path
coord = 1
path_type = 'min_rmsd'
#path_type = 'min_disp'
igeom = paths_dict[coord][path_type]["geom"]
ipath = paths_dict[coord][path_type]["proj"]
# Choose the proj_idxs for the path and the FES
# to be shown
proj_idxs = [0,1]
[12]:
plt.ioff() # Turn of interactive plotting
plt.figure(figsize=(4,4))
h, (x,y) = np.histogramdd(np.vstack(Y)[:,proj_idxs], bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)
plt.ion()
linked_NGL_wdg, linked_ax_widget = molpx.visualize.sample(ipath[:,proj_idxs],
igeom,
plt.gca(),
clear_lines=True,
n_smooth = 2,
plot_path=True,
#radius=True,
)
linked_NGL_wdg._set_size('4in', '4in')
from ipywidgets import HBox
HBox([linked_NGL_wdg, linked_ax_widget.canvas])
/home/guille/miniconda3/lib/python3.6/site-packages/ipykernel_launcher.py:4: RuntimeWarning: divide by zero encountered in log
after removing the cwd from sys.path.
Let’s do TICA and try to look a the correlations in a TICA analysis¶
[13]:
feat = pyemma.coordinates.featurizer(top)
#feat.add_backbone_torsions(cossin=True)
feat.add_distances(feat.topology.select('symbol != H'))
src = pyemma.coordinates.source(MD_trajfiles, features=feat)
tica = pyemma.coordinates.tica(src, lag=np.int(src.trajectory_lengths()/3000))
Y_tica = tica.get_output()
01-11-17 21:01:44 pyemma.coordinates.data.featurization.featurizer.MDFeaturizer[7] WARNING The 1D arrays input for add_distances() have been sorted, and index duplicates have been eliminated.
Check the output of describe() to see the actual order of the features
[15]:
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
top,
Y_tica,
n_overlays=5,
atom_selection='backbone',
#sticky=True,
#color_list='rand'
)
mpx_wdg_box
[17]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
Y_tica,
plot_FES = True,
#dt = dt*1e-6, tunits='ms',
max_frames=10000,
#proj_idxs=[0,1],
panel_height=2,
projection=tica
)
mpx_wdg_box
[20]:
paths_dict, idata = molpx.generate.projection_paths(MD_trajfiles,
top,
Y_tica,
n_points=50,
proj_idxs=[0,1],
n_projs=2,
proj_dim = 2,
verbose=False,
)
[21]:
# Choose the coordinate and the type of path
coord = 0
path_type = 'min_rmsd'
#path_type = 'min_disp'
igeom = paths_dict[coord][path_type]["geom"]
ipath = paths_dict[coord][path_type]["proj"]
# Choose the proj_idxs for the path and the FES
# to be shown
proj_idxs = [0,1]
[22]:
plt.figure(figsize=(4,4))
h, (x,y) = np.histogramdd(np.vstack(Y_tica)[:,proj_idxs], bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)
linked_wdg, axes_widget = molpx.visualize.sample(ipath[:,proj_idxs],
igeom,
plt.gca(),
clear_lines=True,
n_smooth = 1,
plot_path=True,
)
# You can even choose to add the correlations a posteriori
molpx.visualize.correlations(tica, widget=linked_wdg, proj_idxs=0)
linked_wdg.center_view()
linked_wdg
/home/guille/miniconda3/lib/python3.6/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in log
This is separate from the ipykernel package so we can avoid doing imports until
WARNING:root:DEPRECATED: Please use 'center' method
[ ]:
molPX and Metadynamics with DiAla¶
FES from biased trajectories, too!¶
Vladas Oleinikovas uccavol@ucl.ac.uk Guillermo Pérez-Hernández guille.perez@fu-berlin.de
We do that by passing correct weights for each conformation in our trajectory.
We exemplify this with metadynamics trajectory of the ever popular test system - alanine dipeptide. The 2ns simulation used phi and psi angles as CVs, and the corresponding weight was derived using Tiwary and Parrinello reweighting algorithm (JPCB 2014, doi:10.1021/jp504920s).
Try commenting out the weights to notice the effect on the estimated depth of the basins.
[1]:
import molpx
import numpy as np
%matplotlib ipympl
# Topology
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
# Generated during the metadynamics run (extra reversed copy to have a list of files)
MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.meta.xtc'),
molpx._molpxdir(join='notebooks/data/ala2.meta.reversed.xtc')]
# COLVAR files containing CV projections and logarithm of the corresponding weight
colvar_files = [molpx._molpxdir(join='notebooks/data/ala2.meta.CV.txt'),
molpx._molpxdir(join='notebooks/data/ala2.meta.CV.reversed.txt')]
# Load logweights and turn them into proper weights
weights = [np.exp(np.loadtxt(ifile)[:,7]) for ifile in colvar_files]
[5]:
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
top,
colvar_files,
proj_idxs=[1,2],
nbins=50,
n_sample=200,
proj_labels='CV',
weights=weights, # this is important for biased trajectories!
)
mpx_wdg_box
We can also visualize the weights together with the CV-trajectories¶
[8]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
top,
colvar_files,
max_frames=10000,
proj_idxs=[1, 2, 7],
panel_height=1,
proj_labels=['$CV_1$','$CV_2$','log(weights)'],
)
mpx_wdg_box
[ ]:
- Watch:
- Our Youtube video or the gif animation show
molpx
in action.
About¶
molPX has been developed mostly by Dr. Guillermo Pérez-Hernández in the group of Prof. Dr. Frank Noé, with the priceless help of:
Beyond molPX’s own methods, this module connects two incredibly powerful and incredibly useful python modules:
- mdtraj for handling molecular structures inside python
- nglview IPython/Jupyter widget for in-notebook molecular visualization.
molPX is specially in debt to Dr. Alexander Rose, who, apart from developing the impressive nglview (among other projects) provided the very first proof-of-concept for molPX.
molPX was recently introduced to the community in a PyEMMA workshop in Berlin:
Installation¶
To install molPX , you need a few Python package dependencies. If these dependencies are not available in their required versions, the installation will fail. We recommend one particular way for the installation that is relatively safe, but you are welcome to try another approaches if you know what you are doing.
Anaconda Install (recommended)¶
We strongly recommend to use the Anaconda scientific python distribution in order to install python-based software. Python-based software is not trivial to distribute and this approach saves you many headaches and problems that frequently arise in other installation methods. You are free to use a different approach (see below) if you know how to sort out problems, but play at your own risk.
If you already have a conda installation, directly go to step 3:
Download and install miniconda for Python 3+, 32 or 64 bit depending on your system.
http://conda.pydata.org/miniconda.html
For Windows users, who do not know what to choose for 32 or 64 bit, it is strongly recommended to read the second question of this FAQ first:
http://windows.microsoft.com/en-us/windows/32-bit-and-64-bit-windows
Run the installer and select yes to add conda to the PATH variable.
If you have installed from a Linux shell, either open a new shell to have an updated PATH, or update your PATH variable by
source ~/.bashrc
(or .tcsh, .csh - whichever shell you are using).Install molPX using the conda-forge channel:
conda install molpx -c conda-forge
if the command
conda
is unknown, the PATH variable is probably not set correctly (see 1. and 2.)Check installation:
conda list
shows you the installed python packages. You should find a molpx 0.1.2 (or later). molPX requires all the following packages, s. t. they will be installed (and their dependencies) if they are not installed already.
nglview>=1 ipywidgets>=7 pyemma scikit-learn notebook mdtraj ipympl
Python Package Index (PyPI)¶
If you do not like Anaconda for some reason you should use the Python package manager pip to install. This is not recommended, because in the past, various problems have arisen with pip in compiling the packages that molPX depends upon, see this issue for more information.
If you do not have pip, please read the install guide: install guide.
Make sure pip is enabled to install so called wheel packages:
pip install wheel
Now you are able to install binaries if you use MacOSX or Windows.
Install molPX using
pip install molPX
Check your installation
python >>> import molpx >>> molpx.__version__
should print 0.1.2 or later
>>> import IPython >>> IPython.__version__
should print 3.1 or later. If ipython is not up to date, update it by
pip install ipython
Building from Source¶
Building all dependencies from molPX from source is sometimes (if not usually) tricky, takes a long time and is error prone. It is not recommended nor supported by us. If unsure, use the Anaconda installation.
What you can do is clone or download the source from github. After that, just cd to the download directory (and untar/unzip if necessary) and:
>>> cd molPX
>>> python setup.py install
but be aware that success is not guaranteed. See the “Known Issues” below.
Known Issues¶
- After a successfull installation and execution of
molpx.example_notebooks()
…no widgets are shown! Most probably, this has to do with nglview and its needed Jupyter notebook extensions.This is a known, frustrating behaviour:
We’are aware of this and molPX automatically checks for the needed extensions every time it gets imported, s.t. it will refuse to start-up if it finds any problems. If somehow it manages to start-up but no widget is shown, try issuing
molpx._auto_enable_extensions()restarting everything and trying again. Otherwise, check the above links.
- A
SandboxViolation
error might appear when installing from source. This is because thenglview
dependencyis trying to enable the needed extensions. Until we figure this out, try to install
nglview
externally issuing:>>> conda install nglview -c biocondaor, alternatively
>>> pip install nglview
- Note that molPX only works with
nglview
versions >=0.6.2.1.
Changelog¶
0.1.6
New features:
molpx.notebooks()
- interface changed to open the list of available notebooks
- molpx.notebook() is tagged as deprecated
molpx.visualize
:- new methods
correlations()
to visualize the feature_TIC_correlation attribute of PyEMMA TICA-objects in the widgetfeature()
accepts a PyEMMA MDFeaturizer object directly
traj()
- has new optional parameter
projection
to accept a PyEMMA TICA object to
visualize the linear correlations in the widget AND in as trajectories f(t)
FES
:- can show also 1D FESs
- accepts “weights” as optional argument for showing re-weighted FESs (as in metadynamics)
- argument “proj_labels” can be a PyEMMA TICA object to directly use the tica.describe() strings
molpx.bmutils
:- new methods:
most_corr_info()
to support the new methods inmolpx.visualize
labelize()
to deal with proj_label inputssuperpose_to_most_compact_in_list()
to superpose geoms with some intelligence in the input parsing
- New Notebooks:
- new notebooks added to the documentation
Fixes:
molpx.visualize
:
- all calls to
nglview.show_mdtraj()
has been wrapped in_initialize_nglwidget_if_safe
to avoid- erroring in tests