Welcome to molPX: The Molecular Projection Explorer

DOI Travis build status Appveyor build status Codecov Documentation Status

The Molecular Projection Explorer, molPX, is a python module that provides interactive visualization of projected coordinates of molecular dynamics (MD) trajectories inside a Jupyter notebook.

molPX is based on the incredibly useful nglview IPython/Jupyter widget. Other libraries heavily used are mdtraj and PyEMMA. At the moment, there is also an sklearn dependency that might disappear in the future.

_images/output.gif

At the moment the API consists of two subpackages:

TL;DR: see molPX in action through the

Find more about the people behind molPX here:

Download and Install

If you can’t wait to play around with molPX, and you have the Anaconda scientifc python distribution (which we strongly recommend), the easiest way to get molPX is to issue the conda command:

>>> conda install molpx -c conda-forge

and jump to the Quick Start section of this document. Otherwise, check out our more exhaustive

Quick Start

Start an IPython console

>>> ipython

Import molpx and let the example notebooks guide you

>>> import molpx
>>> molpx.example_notebooks()

Voilà: you should be looking at a list of jupyter notebooks explaining the basic functionality of molPX

Documentation

You can find the latest documentation online here You can build a local copy of the html documentation by navigating to the molPX installation directory and issuing:

>>> cd doc
>>> make html

This will generate molPX/docs/build/html/index.html with the html documentation. If you are missing some of the requirements for the documentation , issue:

>>> pip install -r ./source/doc_requirements.txt

If you don’t know where molPX is installed, you can find out this way:

>>> ipython
>>> import molpx
>>> molpx._molpxdir()

The output of the last command is one subdirectory of molPX’s installation directory, so just copy it and issue:

>>> cd the-output-of-the-molpx._molpxdir-command
>>> cd ..

and you are there !

Warnings

molPX is currently under heavy development and the API might change rapidly. Stay tuned.

Data Privacy Statement

When you import this Python package, some of your metadata is sent to our servers. These are:

  • molPX version
  • Python version
  • Operating System
  • Hostname/ mac address of the accessing computer
  • Time of retrieval

How to disable this feature easily:

Even before you use molPX for the first time:

  1. Create a hidden folder .molpx in your home folder
  2. Create a file conf_molpx.py inside of .molpx with the following line: report_status = False
  3. Restart your ipython/jupyter sessions

Hints:

  • This is most easily realized from terminal by issuing:

    >>> mkdir ~/.molpx
    >>> echo "report_status = False" >> ~/.molpx/conf_molpx.py
    
  • You can check your report status anytime by typing this line in a (i)python terminal

    >>> import molpx
    >>> molpx._report_status()
    
  • If you don’t know where your home folder is (for whatever reason), you can find it out by typing in a (i)python terminal

    >>> import os
    >>> os.path.expanduser('~/.molpx')
    

molpx.visualize

The core functionality is to link two interative figures, fig1 and fig2, inside an IPython/Jupyter notebook, so that an action in fig1 (e.g.a click of the mouse or a slide of a slidebar) will trigger an event in fig2 (e.g. a frame update or point moved) and vice versa. Usually, these two figures contain representations from:

  • molecules: an NGLview widget showing one (or more) molecular structure(s) that a particular value of the coordinate(s) is associated with and
  • projected coordinates: a matplotlib figure showing the projected coordinates (e.g. TICs or PCs or any other), \({Y_0, ..., Y_N}\), either as a 2D histogram, \(PDF(Y_i, Y_j)\) or as trajectory views \({Y_0(t), ...Y_N(t)}\)

You are strongly encouraged to check NGLview’s documentation, since its functionalities extend beyond the scope of this package and the molecular visualization universe is rich and complex (unlike this module).

The methods offered by this module are:

molpx.visualize.FES(MD_trajectories, MD_top, …) Return a molecular visualization widget connected with a free energy plot.
molpx.visualize.sample(positions, geom, ax) Visualize the geometries in geom according to the data in positions on an existing matplotlib axes ax
molpx.visualize.traj(MD_trajectories, …[, …]) Link one or many projected trajectories, [Y_0(t), Y_1(t)…], with the MD_trajectories that originated them.
molpx.visualize.correlations(correlation_input) Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features.
molpx.visualize.feature(feat, widget[, …]) Provide a visual representation of a PyEMMA feature.
molpx.visualize.FES(MD_trajectories, MD_top, projected_trajectories, proj_idxs=[0, 1], nbins=100, n_sample=100, proj_stride=1, weights=None, proj_labels='proj', n_overlays=1, atom_selection=None, **sample_kwargs)

Return a molecular visualization widget connected with a free energy plot.

Parameters:
  • MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) – Any file extension that mdtraj (.xtc, .dcd, etc) can read is accepted. Alternatively, a single mdtraj.Trajectory object or a list of them can be given as input.
  • MD_top (str to topology filename or directly an mdtraj.Topology object) –
  • projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
  • the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) – filenames (.dat, .txt etc)
  • NOTE (molpx assumes that there is no time column.) –
  • proj_idxs (int, list or ndarray) – Selection of projection idxs (zero-idxd) to visualize.
  • nbins (int, default 100) – The number of bins per axis to used in the histogram (FES)
  • n_sample (int, default is 100) – The number of geometries that will be used to represent the FES. The higher the number, the higher the spatial resolution of the “click”-action. (And the longer it will take to generate the plot)
  • proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided, otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different number of frames.
  • weights (iterable of floats (or list thereof) each of shape (n_frames, 1) or (n_frames)) – The sample weights can come from a metadynamics run or an MSM-object, e.g. via the method pyemma.msm.BayesianMSM.trajectory_weights. Has to have the same length as the projected_trajectories
  • proj_labels (either string or list of strings or (experimental) PyEMMA featurizer) – The projection plots will get this parameter for labeling their yaxis. If a str is provided, that will be the base name proj_labels='%s_%u'%(proj_labels,ii) for each projection. If proj_labels is a list, the list will be used as is. If there are not enough labels, the module will complain
  • n_overlays (int, default is 1) – The number of structures that will be simultaneously displayed as overlays for every sampled point of the FES. This parameter can seriously slow down the method, it is currently limited to a maximum value of 50
  • atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string that mdtraj.Topology.select could understand or directly the iterable of integers. If MD_trajectories is already a (list of) mdtraj.Trajectory objects, the atom-slicing can be done by the user place before calling this method.
  • sample_kwargs (dictionary of named arguments, optional) – named arguments for the function molpx.visualize.sample. Non-expert users can safely ignore this option. Examples are superpose or proj_idxs
Returns:

A molpx._linkutils.MolPXHBox-object.

It contains the NGLWidget and the AxesWidget (which is responsible for the interactive figure). It is child-class of the ipywidgets.HBox-class and has been monkey-patched to have the following extra attributes so that the user has access to all the information being displayed.

linked_axes :

list with all the pylab.Axis-objects contained in the widgetbox

linked_ax_wdgs :

list with all the matplotlib.widgets.AxesWidget`objects contained in the :obj:`widgetbox

linked_figs :

list with all the pylab.Figure-objects contained in the widgetbox

linked_ngl_wdgs :

list with all the nglview.NGLWidget-objects contained in the widgetbox

linked_data_arrays :

list with all the numpy ndarrays contained in the widgetbox

linked_mdgeoms:

list with all the mdtraj.Trajectory-objects contained in the widgetbox

Return type:

widgetbox

molpx.visualize.correlations(correlation_input, geoms=None, proj_idxs=None, feat_name=None, widget=None, proj_color_list=None, n_feats=1, verbose=False, featurizer=None)

Provide a visual and textual representation of the linear correlations between projected coordinates (PCA, TICA) and original features.

Parameters:
  • correlation_input (numpy ndarray or some PyEMMA objects) –
    if array :
    (m,m) correlation matrix, with a row for each feature and a column for each projection
    if PyEMMA-object :
    TICA, PCA or MDFeaturizer.
  • geoms (None or mdtraj.Trajectory, default is None) – The values of the most correlated features will be returned for the geometries in this object. If widget is left to its default, None, correlations will create a new widget and try to show the most correlated features on top of the widget.
  • widget (None or nglview.NGLWidget, default is None) –

    Provide an already existing widget to visualize the correlations on top of. This is only for expert use, because no checks are done to see if correlation_input and the geometry contained in the widget actually match. Use with caution.

    Note

    When objects geoms and widget are provided simultaneously, three things happen:
    • no new widget will be instantiated
    • the display of features will be on top of whatever geometry widget contains
    • the value of the features is computed for the geometry of geom

    Use with caution and clean bookkeeping!

  • proj_color_list (list, default is None) – projection specific list of colors to provide the representations with. The default None yields blue. In principle, the list can contain one color for each projection (= as many colors as len(proj_idxs) but if your list is short it will just default to the last color. This way, proj_color_list=[‘black’] will paint all black regardless len(proj_idxs). Input anything that yields matplotlib.colors.is_color_like == True
  • proj_idxs (None, or int, or iterable of integers, default is None) – The indices of the projections for which the most correlated feture will be returned. If none it will default to the dimension of the correlation_input object
  • feat_name (None or str, default is None) – The prefix with which to prepend the labels of the most correlated features. If left to None, the feature description found in correlation_input will be used, if available
  • n_feats (int, default is 1) – Number of most correlated features to return for each projection
  • featurizer (None or MDFeaturizer, default is None) – If correlation_input is not an MDFeaturizer itself, or doesn’t have a data_producer attribute, the user can input one here. If correlation_input and featurizer are both MDFeaturizer-objects, featurizer will be ignored.
  • verbose (Bool, default is True) – print to standard output
Returns:

  • corr_dict and ngl_wdg

  • corr_dict – Dictionary with items:

    idxs :

    List of length len(proj_idxs) with lists of length n_feat with the idxs of the most correlated features

    vals :

    List of length len(proj_idxs) with lists of length n_feat with the corelation values of the most correlated features

    labels :

    List of length len(proj_idxs) with lists of length n_feat with the labels of the most correlated features

    feats :

    If an mdtraj.Trajectory is passed as an geom argument, the most correlated features will be evaluated and returned as list of length len(proj_idxs). Each element in th elist is an arrays with shape (geom.n_frames, n_feats)

    atom_idxs :

    List of length len(proj_idxs) each with an nd.array of shape (nfeat, m), where m is the number of atoms needed to describe each feature (1 of cartesian, 2 for distances, 3 for angles, 4 for dihedrals)

    info :

    List of length len(proj_idxs) with lists of length n_feat with strings describing the correlations

  • ngl_wdgnglview.NGLWidget with the most correlated features (distances, angles, dihedrals, positions) visualized on top of it.

molpx.visualize.feature(feat, widget, idxs=0, color_list=None, **kwargs)

Provide a visual representation of a PyEMMA feature. PyEMMA’s features are found as a list of the MDFeaturizers’s active_features attribute

Parameters:
  • featurizer (py:obj:_MDFeautrizer) – A PyEMMA MDFeaturizer object (either a feature or a featurizer, works with both)
  • widget (None or nglview widget) – Provide an already existing widget to visualize the correlations on top of. This is only for expert use, because no checks are done to see if correlation_input and the geometry contained in the widget actually match. Use with caution.
  • idxs (int or iterable of integers, default is 0) – Features can have many contributions, e.g. a distance feature can include many distances. Use this parameter to control which one of them gets represented.
  • color_list (list, default is None) – list of colors to represent each feature in feat_idxs. The default None yields blue for everything. In principle, the list can contain one color for each projection (= as many colors as len(feat_idxs) but if your list is short it will just default to the last color. This way, color_list=[‘black’] will paint all black regardless len(proj_idxs)
  • **kwargs (optional keyword arguments for _bmutils.add_atom_idxs_widget) – currently, only “radius” is left for the user to determine
  • Returns
  • --------
  • widget – the input widget with the features in idxs represented as either distances (for distance features) or “spacefill” spheres (for angular features)
molpx.visualize.sample(positions, geom, ax, plot_path=False, clear_lines=True, n_smooth=0, ngl_wdg=None, superpose=True, projection=None, n_feats=1, sticky=False, list_of_repr_dicts=None, color_list=None, **link_ax2wdg_kwargs)

Visualize the geometries in geom according to the data in positions on an existing matplotlib axes ax

Use this method when the array of positions, the geometries, the axes (and the ngl_wdg, optionally) have already been generated elsewhere.

Parameters:
  • positions (numpy nd.array of shape (n_frames, 2)) – Contains the position associated the geometries in geom. See below for more details
  • geom (mdtraj.Trajectory object or a list thereof.) – The geometries associated with the the positions. # TODO: WRITE HOW THE LISTS-LENGTHS CORRESPONDS FOR THE STICKY OPTION
  • ax (matplotlib.pyplot.Axes object) – The axes to be linked with the NGLWidget
  • plot_path (bool, default is False) – whether to draw a line connecting the positions in positions
  • clear_lines (bool, default is True) – whether to clear all the 2D objects that were previously drawn in ax
  • n_smooth (int, default is 0,) – if n_smooth > 0, the shown geometries and paths will be smoothed out by 2* n_smooth +1. See molpx._bmutils.smooth_geom for more information
  • ngl_wdg (None or an existing NGLWidget, default is None) – you can provide an already instantiated NGLWidget here (avanced use)
  • superpose (boolean, default is True) – The geometries in geom may or may not be oriented, depending on how they were generated. Since this method is mostly for visualization purposes, the default behaviour is to orient them all to maximally overlap with the most compact frame available
  • projection (object that generated the positions, default is None) –

    The projected coordinates may come from a variety of sources. When working with PyEMMA, a number of objects might have generated this projection, like a

    Makes most sense when positions where generated with molpx.generate.projection_paths, otherwise might produce rubbish. See n_feats for more info. # TODO: delete from here below? The features most correlated with the positions will be shown in the widget # TODO CHECK THIS geometries in geom, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc).

  • n_feats (int, default is 1) – If a projection is passed along, the first n_feats features that most correlate the the projected trajectories will be represented, both in form of figures of feat(t) as well as in the on top of the ngl_wdg. If projection == None, nfeats will be ignored.
  • sticky (boolean, default is False) – If set to True, the ngl_wdg will be sticky in that every click adds a new molecular structure without deleting the previous one. Left-clicks adds a structure, right-clicks deletes a structure. Particularly useful for representing more minima simultaneously.
  • color_list (None, list of len(positions), or “random”) – The colors with which the sticky frames will be plotted. A color is anything that yields matplotlib.colors.is_color_like == True “None” defaults to coloring by element. “random” randomizes the color choice
  • list_of_repr_dicts (None or list of dictionaries, default is None) – Has an effect only for sticky == True, s.t. these reps instead of the default ones are used. The dictionaries must have at least the keys ‘repr_type’ and ‘selection’. Other key-value pairs are currently ignored but, will be implemented in the future. See nglview.NGLWidget.add_representation for more info).
  • link_ax2wdg_kwargs (dictionary of named arguments, optional) – named arguments for the function molpx._linkutils.link_ax_w_pos_2_nglwidget, which is the one that internally provides the interactivity. Non-expert users can safely ignore this option.
Returns:

molpx.visualize.traj(MD_trajectories, MD_top, projected_trajectories, max_frames=10000.0, stride=1, proj_stride=1, proj_idxs=[0, 1], proj_labels='proj', plot_FES=False, weights=None, panel_height=1, sharey_traj=True, dt=1.0, tunits='frames', traj_selection=None, projection=None, n_feats=1)

Link one or many projected trajectories, [Y_0(t), Y_1(t)…], with the MD_trajectories that originated them. Optionally plot also the resulting FES.

Parameters:
  • MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –

    Any file extension that mdtraj (.xtc, .dcd etc) can read is accepted.

    Alternatively, a single mdtraj.Trajectory object or a list of them can be given as input.

  • MD_top (str to topology filename or directly mdtraj.Topology object) –
  • projected_trajectories (numpy ndarray (or list thereof) of shape (n_frames, n_dims) with the time-series) –
  • the projection(s) that want to be explored. Alternatively, strings or list of string with .npy or ascii filenames (of) –
  • txt etc) ((dat,) –
  • NOTE (molpx assumes that there is no time column.) –
  • max_frames (int, default is 1000) – If the trajectoy is longer than this, stride to this length (in frames)
  • stride (int, default is 1) – Stride value in case of large datasets. In case of having MD_trajectories and projected_trajectories in memory (and not on disk) the stride can take place also before calling this method.
  • proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided, otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different number of frames.
  • proj_idxs (iterable of ints, default is [0,1]) – Indices of the projected coordinates to use in the various representations
  • proj_labels (either string or list of strings) – The projection plots will get this paramter for labeling their yaxis. If a str is provided, that will be the base name proj_labels=’%s_%u’%(proj_labels,ii) for each projection. If a list, the list will be used. If not enough labels are there the module will complain
  • plot_FES (bool, default is False) – Plot (and interactively link) the FES as well
  • weights (ndarray(n_frames), default = None) – sample weights. By default all samples have the same weight (used for FES calculation only)
  • panel_height (int, default 1) – Height, in inches, of each panel of each trajectory subplots
  • sharey_traj (bool, default is True) – Force the panels of each projection to have the same yaxes across trajectories (Note: Not across coordinates)
  • dt (float, default is 1.0) – Physical time-unit equivalent to one frame of the projected_trajectories
  • tunits (str, default is 'frames') – Name of the physical time unit provided in dt
  • traj_selection (None, int, iterable of ints, default is None) – Don’t plot all trajectories but only few of them. The default None implies that all trajs will be plotted. Note: the data used for the FES will always include all trajectories, regardless of this value
  • projection (object that generated the projection, default is None) –

    The projected coordinates may come from a variety of sources. When working with pyemma a number of objects might have generated this projection, like a

    Pass this object along and observe and the features that are most correlated with the projections will be plotted for the active trajectory, allowing the user to establish a visual connection between the projected coordinate and the original features (distances, angles, contacts etc)

  • n_feats (int, default is 1) – If a projection is passed along, the first n_feats features that most correlate the the projected trajectories will be represented, both in form of trajectories feat vs t as well as in the ngl_wdg. If projection is None, nfeats will be ignored.
Returns:

  • ax, ngl_wdg, data_sample, geoms – return _plt.gca(), _plt.gcf(), widget, geoms
  • axpylab.Axis object
  • figpylab.Figure object
  • ngl_wdgnglview.NGLWidget
  • geomsmdtraj.Trajectory object with the geometries n_sample geometries shown by the ngl_wdg

molpx.generate

This module contains methods that generate the needed objects for visualize of the methods to work.

molpx.generate.projection_paths(…[, …]) Return a path along a given projection.
molpx.generate.sample(MD_trajectories, …) Returns a sample of molecular geometries and their positions in the projected space
molpx.generate.projection_paths(MD_trajectories, MD_top, projected_trajectories, n_projs=1, proj_dim=2, proj_idxs=None, n_points=100, n_geom_samples=100, proj_stride=1, history_aware=True, verbose=False, minRMSD_selection='backbone')

Return a path along a given projection. More info on what this means exactly will follow soon.

Parameters:
  • MD_trajectories (str, or list of strings with the filename(s) the the molecular dynamics (MD) trajectories.) –

    Any file extension that mdtraj (.xtc, .dcd etc) can read is accepted.

    Alternatively, a single mdtraj.Trajectory object or a list of them can be given as input.

  • MD_top (str to topology filename or directly mdtraj.Topology object) –
  • projected_trajectories (str to a filename or numpy ndarray of shape (n_frames, n_dims)) – Time-series with the projection(s) that want to be explored. If these have been computed externally, you can provide .npy-filenames or readable asciis (.dat, .txt etc). NOTE: molpx assumes that there is no time column.
  • n_projs (int, default is 1) – Number of projection paths to generate. If the input projected_trajectories are n-dimensional, in principle up to n-paths can be generated
  • proj_dim (int, default is 2) – Dimensionality of the space in which distances will be computed
  • proj_idxs (int, defaultis None) – Selection of projection idxs (zero-idxd) to visualize. The default behaviour is that proj_idxs = range(n_projs). However, if proj_idxs != None, then n_projs is ignored and proj_dim is set automatically
  • n_points (int, default is 100) – Number of points along the projection path. The higher this number, the higher the projected coordinate is resolved, at the cost of more computational effort. It’s a trade-off parameter
  • n_geom_samples (int, default is 100) – For each of the n_points along the projection path, n_geom_samples will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path. Also, the longer it takes for the path to be computed
  • proj_stride (int, default is 1) – The stride of the projected_trajectories relative to the MD_trajectories. This will play a role particularly if projected_trajectories is already strided (because the user is holding it in memory) but the MD-data on disk has not been strided.
  • history_aware (bool, default is True) – The path-searching algorigthm the can minimize distances between adjacent points along the path or minimize the distance between each point and the mean value of all the other up to that point. Use this parameter to avoid a situation in which the path gets “derailed” because an outlier is chosen at a given point.
  • verbose (bool, default is False) – The verbosity level
  • minRMSD_selection (str, default is 'backbone') – When computing minRMSDs between a given point and adjacent candidates, use this string to select the atoms that will be considered. Check mdtraj’s selection language here http://mdtraj.org/latest/atom_selection.html
Returns:

dictionary of dictionaries containing the projection paths.

  • paths_dict[idxs][type_of_path]

    • idxs represent the index of the projected coordinate ([0], [1]…)
    • types of paths “min_rmsd” or “min_disp”
  • What the dictionary actually contains

    • paths_dict[idxs][type_of_path]["proj"] : ndarray of shape (n_points, proj_dim) with the coordinates of the projection along the path
    • paths_dict[idxs][type_of_path]["geom"] : mdtraj.Trajectory geometries along the path

Return type:

paths_dict

idata :
list of ndarrays with the the data in projected_trajectories
molpx.generate.sample(MD_trajectories, MD_top, projected_trajectories, atom_selection=None, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)

Returns a sample of molecular geometries and their positions in the projected space

Parameters:
  • MD_trajectories (list of strings) – Filenames (any extension that mdtraj can read is accepted) containing the trajectory data. There is an untested input mode where the user parses directly mdtraj.Trajectory objects
  • MD_top (str to topology filename or directly mdtraj.Topology object) –
  • projected_trajectories ((list of) strings or (list of) numpy ndarrays of shape (n_frames, n_dims)) – Time-series with the projection(s) that want to be explored. You can provide .npy-filenames or readable asciis (.dat, .txt etc). Alternatively, you can feed in your own PyEMMA-clustering object NOTE: molpx assumes that there is no time column.
  • atom_selection (string or iterable of integers, default is None) – The geometries of the original trajectory files will be filtered down to these atoms. It can be any DSL string that mdtraj.Topology.select could understand or directly the iterable of integers. If :py:obj`MD_trajectories` is already a (list of) md.Trajectory objects, the atom-slicing can take place before calling this method.
  • proj_idxs (int, default is None) – Selection of projection idxs (zero-idxd) to visualize. The default behaviour is that proj_idxs = range(n_projs). However, if proj_idxs != None, then n_projs is ignored and proj_dim is set automatically
  • n_points (int, default is 100) – Number of points along the projection path. The higher this number, the higher the projected coordinate is resolved, at the cost of more computational effort. It’s a trade-off parameter
  • n_geom_samples (int, default is 1) – For each of the n_points along the projection path, n_geom_samples will be retrieved from the trajectory files. The higher this number, the smoother the minRMSD projection path. Also, the longer it takes for the path to be computed. This is a trade-off parameter between how smooth the transitons between geometries can be and how long it takes to generate the sample
  • keep_all_samples (boolean, default is False) – In principle, once the closest-to-ref geometry has been kept, the other geometries are discarded, and the output sample contains only n_point geometries. There are, still, special cases where the user might want to keep all sampled geometries. Typical use-case is when the n_points is low and many representatives per clustercenters will be much more informative than the other way around. This is an advanced feature that other methods of molPX use internally for generating overlays, be awere that it changes the return type of geom_smpl from the default (an mdtraj.Trajectory with n_points-frames) to a list list of length n_geom_samples, each element is an mdtraj.Trajectory object of n_points-frames
  • proj_stride (int, default is 1) – Stride value that was used in the projected_trajectories relative to the MD_trajectories If the original MD_trajectories were stored every 5 ps but the projected trajectories were stored every 50 ps, proj_stride = 10 has to be provided, otherwise an exception will be thrown informing the user that the MD_trajectories and the projected_trajectories have different number of frames.
Returns:

  • pos – ndarray with the positions of the sample
  • geom_smpl – sampled geometries. Can be of two types:
    • default: mdtraj.Trajectory with n_points-frames
    • if keep_all_samples = True: list of length n_geom_samples. Each element is an mdtraj.Trajectory object of n_points-frames.

Example Jupyter Notebooks

There are several ways to see the example notebooks, which you can find in the molpx/notebooks/ installation directory.

  1. Play:

    molpx has a method that will launch a working, temporary copy of the example notebooks. From an IPython console, just type:

    >>>> import molpx
    >>>> molpx.example_notebooks()
    

    A List of Juypter notebooks should automagically appear in front of you after a few seconds. This is the most interactive and usefull way to see molpx in action, but you’ll only have access to it after downloading and installing molpx

  2. Read:

    The links you see below are an html-rendered version of the notebook. Click on them to navigate the notebook. Unfortunately for this html documentation, nglview‘s output, i.e. the pictures of molecular structures, cannot be stored currently in the notebook file. In short: the html-notebook is lacking the most visually appealing part of molpx.

Quick Intro with DiAla
molPX Di-Ala example
Guillermo Pérez-Hernández  guille.perez@fu-berlin.de

In this notebook we will be using a trajectory of Di-Ala-peptide to easily identify conformations in the Ramachandran plot.

[1]:
import molpx
%matplotlib ipympl
Start from files on disk
[2]:
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.mini.xtc')] #short trajectory
rama_files = [molpx._molpxdir(join='notebooks/data/ala2.mini.phi.psi.dat')]
Visualize a FES and the molecular structures behind it

Execute the following cell and click either on the FES or on the slidebar. Once you get familiar with it, try uncommenting these options and see what happens:

  • proj_idxs=[1]
  • n_overlays=5
  • sticky=True
  • color_list='random'

With sticky option is True, you can see the effect of a right-click on the dots of the FES

[3]:
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
                                  top,
                                  rama_files,
                                  nbins=50,
                                  proj_labels=['$\phi$',
                                               '$\psi$'],
                                  atom_selection="symbol != H",
                                  #proj_idxs=[1],
                                  #n_overlays=5,
                                  #sticky=True,
                                  #color_list='random'
                                 )
mpx_wdg_box
Visualize trajectories and molecular structures (and the FES, optionally)
[4]:
from molpx import visualize, _linkutils
from imp import reload
reload(visualize)
reload(_linkutils)
from matplotlib import pyplot as plt
plt.close('all')
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                   top,
                                   rama_files,
                                   plot_FES = True,
                                   proj_labels=['$\phi$', '$\psi$']
                          )

mpx_wdg_box
Repeat from memory

You can load the trajectory data using mdtraj and numpy and use that directly as an input:

[5]:
import numpy as np
import mdtraj as md
MD_trajs = [md.load(fname, top=top) for fname in MD_trajfiles]
phi_psi = [np.loadtxt(fname) for fname in rama_files]
[6]:
mpx_wdg_box = molpx.visualize.FES(MD_trajs,
                                  top,
                                  phi_psi,
                                  nbins=50,
                                  proj_labels=['$\phi$',
                                               '$\psi$'],
                                  atom_selection="symbol != H",
                                  #proj_idxs=[1],
                                  #n_overlays=5,
                                  #sticky=True,
                                  #color_list='random'
                                 )
mpx_wdg_box
The mpx_wdg_box

It is a class derived from the ipython widgets HBox and VBox, with molPX’s extra information as attributes starting with “linked_*”

[12]:
for attr in dir(mpx_wdg_box):
    if attr.startswith('linked_'):
        print(attr)
linked_ax_wdgs
linked_axes
linked_data_arrays
linked_figs
linked_mdgeoms
linked_ngl_wdgs
molPX and PyEMMA
Visualize PyEMMA MDFeaturizer-objects
Guillermo Perez-Hernandez  guille.perez@fu-berlin.de

In this notebook, we introduce the molPX’s ability to display PyEMMA specific objects: features.If you are note familiar with PyEMMA (www.pyemma.org), you might not follow entirely, but it is pretty intuitive overall. Give it a try!

We will be using two datasets: * 1 millisecond trajectory of Bovine Pancreatic Trypsin Inhibitor (BPTI) generated by DE Shaw Research on the Anton Supercomputer and kindly made available by their lab. The original work is

* Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 330:341-346 (2010).  doi: 10.1126/science.1187409.

The trajectory has been duplicated and shortened to provide a mock-trajectory set and be able to deal with lists of trajectories of different lenghts:

 * `c-alpha_centered.stride.100.xtc`
 * `c-alpha_centered.stride.100.reversed.xtc`
 * `c-alpha_centered.stride.100.halved.xtc`
  • A short, Di-Alanine-Peptide trajectory
[1]:
top = 'notebooks/data/bpti-c-alpha_centered.pdb'
MD_trajfiles = ['notebooks/data/c-alpha_centered.stride.1000.xtc',
               'notebooks/data/c-alpha_centered.stride.1000.reversed.xtc',
                'notebooks/data/c-alpha_centered.stride.1000.halved.xtc'
               ]

dt = 244 #saving interval in the .xtc files, in ns

import molpx
from matplotlib import pylab as plt
%matplotlib ipympl
import pyemma
import numpy as np

# This way the user does not have to care where the data are:
top = molpx._molpxdir(top)
MD_trajfiles = [molpx._molpxdir(ff) for ff in MD_trajfiles]
[2]:
# Create a memory representation of the trajectories
MD_list = [molpx.generate._md.load(itraj, top=top) for itraj in MD_trajfiles]
[3]:
feat = pyemma.coordinates.featurizer(top)
pairs = feat.pairs(range(feat.topology.n_atoms)[::2])
feat.add_distances(pairs)
src  = pyemma.coordinates.source(MD_trajfiles, features=feat)
Y = src.get_output()
feat.describe()[108], feat.describe()[112], feat
[3]:
('DIST: PRO 9 CA 8 - LYS 15 CA 14',
 'DIST: PRO 9 CA 8 - TYR 23 CA 22',
 <pyemma.coordinates.data.featurization.featurizer.MDFeaturizer at 0x7efc407e7550>)
Visualize a FES and the Features
[4]:
proj_idxs = [108,112]
mpx_wdg_box = molpx.visualize.FES(MD_list,
                                  top,
                                  Y,
                                  nbins=50,
                                  proj_idxs=proj_idxs,
                                  #proj_labels=feat,
                                  proj_labels='feat',
                                  #n_overlays=5,
                                  #sticky=True,
                                 )
__ = molpx.visualize.feature(feat.active_features[0],
                             mpx_wdg_box.linked_ngl_wdgs[0],
                             idxs=proj_idxs, radius=.5,
                             color_list=['red','green'])

mpx_wdg_box.display()
Visualize trajectories and the features
[5]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                  top,
                                  Y,
                                  dt = dt*1e-6, tunits='ms',
                                  traj_selection = 0,
                                  #sharey_traj=False,
                                  proj_idxs=proj_idxs,
                                  panel_height=1,
                                  proj_labels='feat',
                  )
molpx.visualize.feature(feat.active_features[0], mpx_wdg_box.linked_ngl_wdgs[0], idxs=proj_idxs)
mpx_wdg_box
Cartesian Features
[6]:
feat = pyemma.coordinates.featurizer(top)
feat.add_all()
Y = [feat.transform(igeom.superpose(MD_list[0])) for igeom in MD_list]
feat.describe()[24], feat.describe()[42]
[6]:
('ATOM:PRO 9 CA 8 x', 'ATOM:LYS 15 CA 14 x')
[7]:
proj_idxs = [24,42]
mpx_wdg_box = molpx.visualize.FES(MD_list,
                                  top,
                                  Y,
                                  nbins=50,
                                  proj_idxs=proj_idxs,
                                  proj_labels='feat',
                                  #n_overlays=5,
                                          )
molpx.visualize.feature(feat.active_features[0], mpx_wdg_box.linked_ngl_wdgs[0], idxs=proj_idxs, color_list=['red','green'])
mpx_wdg_box
Angular Features (Di-Ala)
[8]:
from os.path import exists
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
# What data do we  have?
if exists('/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'):
    MD_trajfiles = ['/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'] #long trajectory
elif exists('/home/guille/ala2a.dcd'):
    MD_trajfiles = ['/home/guille/ala2.dcd'] # extra for Stralsund
else:
    MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.mini.xtc')] #short trajectory

feat = pyemma.coordinates.featurizer(top)
feat.add_backbone_torsions(
#    cossin=True
)
src  = pyemma.coordinates.source(MD_trajfiles, features=feat)
Y = src.get_output()
[9]:
proj_idxs = [0,1]
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
                                  top,
                                  Y,
                                  nbins=50,
                                  proj_idxs=proj_idxs,
                                  proj_labels=feat,
                                 )
molpx.visualize.feature(feat.active_features[0], mpx_wdg_box.linked_ngl_wdgs[0], idxs=[0], radius=.5)
mpx_wdg_box
molPX and TICA with BPTI
molPX intro
Guillermo Perez-Hernandez  guille.perez@fu-berlin.de

In this notebook we will be using the 1 millisecond trajectory of Bovine Pancreatic Trypsin Inhibitor (BPTI) generated by DE Shaw Research on the Anton Supercomputer and kindly made available by their lab. The original work is

  • Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 330:341-346 (2010). doi: 10.1126/science.1187409.

The trajectory has been duplicated and shortened to provide a mock-trajectory set and be able to deal with lists of trajectories of different lenghts:

  • c-alpha_centered.stride.100.xtc
  • c-alpha_centered.stride.100.reversed.xtc
  • c-alpha_centered.stride.100.halved.xtc
Input types and typical usecase

The typical usecase is having molecular dynamics (MD) simulation data in form of trajectory files with extensions like .xtc, .dcd etc and the associated molecular topology as a .pdb or .gro file.

These files are the most general starting point for any analysis dealing with MD, and molpx’s API has been designed to be able to function without further input:

[ ]:
top = 'notebooks/data/bpti-c-alpha_centered.pdb'
MD_trajfiles = ['notebooks/data/c-alpha_centered.stride.1000.xtc',
               'notebooks/data/c-alpha_centered.stride.1000.reversed.xtc',
                'notebooks/data/c-alpha_centered.stride.1000.halved.xtc'
               ]

dt = 244 #saving interval in the .xtc files, in ns

import molpx
from matplotlib import pylab as plt
%matplotlib ipympl
import pyemma
import numpy as np

# This way the user does not have to care where the data are:
top = molpx._molpxdir(top)
MD_trajfiles = [molpx._molpxdir(ff) for ff in MD_trajfiles]

However, molpx relies heavily on the awesome `mdtraj <http://www.mdtraj.org>`__ module for dealing with molecular structures, and so most of molpx’s functions accept also Trajectory-type objects (native to mdtraj) as alternative inputs.

[ ]:
# Create a memory representation of the trajectories
MD_list = [molpx.generate._md.load(itraj, top=top) for itraj in MD_trajfiles]

The same idea applies to the input of projected trajectories: molpx can take the filenames as inputs (.npy, .dat, .txt etc) or deal directly with numpy.ndarray objects.

** These alternative, “from-memory” input modes (md.Trajectory and np.ndarray objects) avoid forcing the user to read from file everytime an API function is called, saving I/O overhead**

The following cell either reads or generates projected trajectory files for this demonstration. In a real usecase this step (done here using TICA) might not be needed, given that the user might have generated the projected trajectory elsewhere:

[ ]:
# Perform TICA or read from file directly if already .npy-files exist
Y_filenames = [ff.replace('.xtc','.Y.npy') for ff in MD_trajfiles]
try:
    Y = [np.load(ff) for ff in Y_filenames]
except:
    feat = pyemma.coordinates.featurizer(top)
    pairs = feat.pairs(range(feat.topology.n_atoms)[::2])
    feat.add_distances(pairs)
    src  = pyemma.coordinates.source(MD_trajfiles, features=feat)
    tica = pyemma.coordinates.tica(src, lag=10, dim=3)
    Y = tica.get_output()
    [np.save(ff, iY) for ff, iY in zip(Y_filenames, Y)]
Visualize a FES and the molecular structures behind it

Execute the following cell and click either on the FES or on the slidebar. Some input parameters have been comented out for you to try out: * different modes of input (disk vs memory) * different projection indices * different number of overlaid structures

[ ]:
mpx_wdg_box = molpx.visualize.FES(MD_list,
                                 #MD_trajfiles,
                                 top,
                                 Y_filenames,
                                 #Y,
                                 nbins=50,
                                 #proj_idxs=[1,2],
                                 proj_labels='TIC',
                                 #n_overlays=5,
                                )
mpx_wdg_box
Visualize trajectories, FES and molecular structures

The user can sample structures as they occurr in sequence in the actual trajectory. Depending on the size of the dataset, this can be very time consuming, particularly if data is being read from disk.

In this example, try changing MD_trajfiles to MD_list and/or changing Y_filenames to simply Y and see if it helps.

Furthermore, the objects in memory can be strided down to fewer frames before being parsed to the method. To stride objects being read from the disk, use the stride parameter.

Again, we have commented out some parameters that provide more control on the output of visualize.traj. Uncomment them and see what happens
[ ]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                   top,
                                   Y,
                                   #Y_filenames,
                                   plot_FES = True,
                                   dt = dt*1e-6, tunits='ms',
                                   #traj_selection = 1,
                                   #sharey_traj=False,
                                   #max_frames=100,
                                   proj_idxs=[0, 1],
                                   panel_height=2,
                                   #proj_labels='TIC'
                                  )
mpx_wdg_box
Intermediate steps: using molpx to generate a regspace sample of the data

See the documentation of molpx.generate.sample to find out about all possible options:

molpx.generate.sample(MD_trajectories, MD_top, projected_trajectories, atom_selection=None, proj_idxs=[0, 1], n_points=100, n_geom_samples=1, keep_all_samples=False, proj_stride=1, verbose=False, return_data=False)
[ ]:
data_sample, geoms = molpx.generate.sample(#MD_list,
                                           MD_trajfiles,
                                           top,
                                           #Y,
                                           Y_filenames,
                                           n_points=200   ,
                                            n_geom_samples=2,
                                    )
data_sample.shape, geoms

Paths samples along the different projections (=axis)
[ ]:
paths_dict, idata = molpx.generate.projection_paths(#MD_list,
                                                    MD_trajfiles,
                                                    top,
                                                    Y_filenames,
                                                    #Y, # You can also directly give the data here
                                                    n_points=50,
                                                    proj_idxs=[0,1],
                                                    n_projs=3,
                                                    proj_dim = 3,
                                                    verbose=False,
                                        )
Interaction with PyEMMA

molpx is using many methods of the coordinates submodule of PyEMMA, and thus it also understands some of PyEMMA’s classes as input (like clustering objects or streaming transformers). ## Using the TICA object to visualize the most correlated input features If the projected coordinates come from a TICA (or PCA) transformation, and the TICA object is available in memory molpx.visualize.traj can make use of correlation information to display not only the projected coordinates (i.e the TICs, in this case), but also the “original” input features behind it

[ ]:
# Re-do the TICA computation to make sure we have a tica object in memory
feat = pyemma.coordinates.featurizer(top)
pairs = feat.pairs(range(feat.topology.n_atoms)[::2])
feat.add_distances(pairs)
src  = pyemma.coordinates.source(MD_trajfiles, features=feat)
tica = pyemma.coordinates.tica(src, lag=10, dim=3)
Y = tica.get_output()

The method molpx.visualize.correlations tries to provide a visual representation of the projected coordinates by relating them to the input features, which carry more meaning, since they are (usually) familiar parameters such as atom distances, angles, contacts etc.

[ ]:
# Comment or uncomment the optinal parameters and see how the method reacts
# You can use a pre-instantiated the widget
#iwd = molpx.visualize._nglwidget_wrapper(MD_list[0][0])
# Or instantiate at the moment of calling visualize.correlations
iwd = None
corr, ngl_wdg = molpx.visualize.correlations(tica,
                                          n_feats=3,
                                          proj_idxs=[0,1,2],
                                          geoms=MD_list[0][::100],
                                          #verbose=True,
                                          #proj_color_list=['red', 'blue', 'green'],
                                          widget=iwd
                                         )
ngl_wdg

Also, molpx.visualize.traj can help in visualizing these correlations by parsing along the tica object itself as projection=tica. In the next cell, can you spot the differences: * In the nglwidget? * In the trajectories?

[ ]:
# Reuse the visualize.traj method with the tica object as input
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                  top,
                                  Y,
                                  #Y_filenames,
                                  plot_FES = True,
                                  dt = dt*1e-6, tunits='ms',
                                  #traj_selection = 0,
                                  #sharey_traj=False,
                                  #max_frames=100,
                                  proj_idxs=[0,1],
                                  panel_height=1,
                                  projection=tica, ## this is what's new
                                  n_feats=2
                                 )

mpx_wdg_box
Use a clustering object as input

If the dataset has already been clustered, and it is that clustering that the user wants to explore, molpx.generate.sample can take this clustering object as an input instead of the the projected trajectories:

[ ]:
# Do "some" clustering
clkmeans = pyemma.coordinates.cluster_kmeans([iY[:,:2] for iY in Y], 5)
[ ]:
data_sample, geoms = molpx.generate.sample(MD_trajfiles, top, clkmeans,
                                     n_geom_samples=50,
                                     #keep_all_samples=True # read the doc for this argument
                                    )
[ ]:
# Plot clusters
plt.figure(figsize=(4,4))
plt.plot(clkmeans.clustercenters[:,0], clkmeans.clustercenters[:,1],' ok')
# FES as background is optional (change the bool to False)
if True:
    plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)

# Link the clusters positions with the molecular structures
linked_ngl_wdg = molpx.visualize.sample(data_sample,
                              geoms.superpose(geoms[0]),
                              plt.gca(),
                              clear_lines=False,
                              #plot_path=True
                            )
linked_ngl_wdg
Visual representations for MSMs

Visually inspect the network behind an MSM

[ ]:
MSM = pyemma.msm.estimate_markov_model(clkmeans.dtrajs, 20)
[ ]:
plt.figure(figsize=(4,4))

ax, pos  = pyemma.plots.plot_markov_model(MSM.P,
                                          minflux=5e-4,
                                          arrow_labels=None,
                                          ax=plt.gca(),
                                          arrow_curvature = 2, show_frame=True,
                                          pos=clkmeans.clustercenters)
# Add a background if wanted
h, (x, y) = np.histogramdd(np.vstack(Y)[:,:2], weights=np.hstack(MSM.trajectory_weights()),  bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), cmap="jet", alpha=.5, zorder=0)
plt.xlim(x[[0,-1]])
plt.xticks(np.unique(x.round()))
plt.yticks(np.unique(y.round()))

plt.ylim(y[[0,-1]])

linked_ngl_wd, linked_ax_wd = molpx.visualize.sample(pos, geoms, plt.gca(), dot_color='blue')
linked_ngl_wd
TPT Reactive Pathway Representation
[ ]:
# Do an MSM with a realistic number of clustercenters
cl_many = pyemma.coordinates.cluster_regspace([iY[:,:2] for iY in Y], dmin=.25)
M = pyemma.msm.estimate_markov_model(cl_many.dtrajs, 20)
cl_many.n_clusters
[ ]:
# Use this object to sample geometries
pos, geom = molpx.generate.sample(MD_trajfiles, top, cl_many)
[ ]:
# Find the most representative microstate of each
# and least populated macrostate
M.pcca(3)
dens_max_i = [distro.argmax() for distro in M.metastable_distributions]
A = np.argmax([M.stationary_distribution[iset].sum() for iset in M.metastable_sets])
B = np.argmin([M.stationary_distribution[iset].sum() for iset in M.metastable_sets])
print(cl_many.clustercenters[dens_max_i[A]],
      cl_many.clustercenters[dens_max_i[B]])
[ ]:
# Create a TPT object with most_pop, least_pop as source, sink respectively
tpt = pyemma.msm.tpt(M, [dens_max_i[A]], [dens_max_i[B]])
paths, flux = tpt.pathways(fraction=.5)
[ ]:
# Get a path with a decent number of intermediates
sample_path = paths[np.argmax([len(ipath) for ipath in paths])]
[ ]:
plt.figure()
plt.contourf(x[:-1], y[:-1], -np.log(h.T), cmap="jet", alpha=.5, zorder=0)
linked_ngl_wdg, linked_ax_wdg = molpx.visualize.sample(cl_many.clustercenters[sample_path],
                       geom[sample_path].superpose(geom[sample_path[0]]), plt.gca(),
                       plot_path=True,
                      )
plt.scatter(*cl_many.clustercenters.T, alpha=.25)
linked_ngl_wdg
[ ]:

molPX and TICA with DiAla
molPX Di-Ala example
Guillermo Pérez-Hernández  guille.perez@fu-berlin.de

In this notebook we will be using a trajectory of Di-Ala-peptide to easily identify conformations in the Ramachandran plot.

[4]:
from os.path import exists
import molpx
from matplotlib import pylab as plt
%matplotlib ipympl

import pyemma
import numpy as np
Start from files on disk
[5]:
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')
# What data do we  have?
if exists('/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'):
    MD_trajfiles = ['/group/ag_cmb/scratch/gph82/Di-Ala-nbdata/ala2.dcd'] #long trajectory
elif exists('/home/guille/ala2.dcd'):
    MD_trajfiles = ['/home/guille/ala2.dcd'] # extra for Stralsund
else:
    MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.mini.xtc')] #short trajectory
Featurize to Ramachandran \((\phi,\psi)\)-pairs with PyEMMA
[6]:
feat = pyemma.coordinates.featurizer(top)
feat.add_backbone_torsions()
src  = pyemma.coordinates.source(MD_trajfiles, features=feat)
Y = src.get_output()
Visualize a FES and the molecular structures behind it

Execute the following cell and click either on the FES or on the slidebar

[7]:
mpx_widget_box = molpx.visualize.FES(MD_trajfiles,
                                     top,
                                     Y,
                                     #proj_idxs=[1],
                                     nbins=50,
                                     proj_labels=['$\phi$',
                                                  '$\psi$'],
                                     atom_selection="symbol != H",
                                     #n_overlays=5,
                                     #sticky=True,
                                     #color_list='random'
                                    )
mpx_widget_box
Visualize trajectories, FES and molecular structures
[8]:
from molpx import visualize, _linkutils
from imp import reload
reload(visualize)
reload(_linkutils)
mpl_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                   top,
                                   Y,
                                   plot_FES = True,
                                   #dt = dt*1e-6, tunits='ms',
                                   max_frames=10000,
                                   proj_idxs=[0, 1],
                                   panel_height=2,
                                   proj_labels=['$\phi$', '$\psi$']
                          )
mpl_wdg_box
Paths samples along the different projections (=axis)
[10]:
paths_dict, idata = molpx.generate.projection_paths(MD_trajfiles,
                                                    top,
                                                    Y,
                                                    n_points=50,
                                                    proj_idxs=[0,1],
                                                    n_projs=3,
                                                    proj_dim = 3,
                                                    verbose=False,
                                        )
[11]:
# Choose the coordinate and the type of path
coord = 1
path_type = 'min_rmsd'
#path_type = 'min_disp'
igeom = paths_dict[coord][path_type]["geom"]
ipath = paths_dict[coord][path_type]["proj"]

# Choose the proj_idxs for the path and the FES
# to be shown
proj_idxs = [0,1]
[12]:
plt.ioff() # Turn of interactive plotting
plt.figure(figsize=(4,4))
h, (x,y) = np.histogramdd(np.vstack(Y)[:,proj_idxs], bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)
plt.ion()

linked_NGL_wdg, linked_ax_widget = molpx.visualize.sample(ipath[:,proj_idxs],
                                                          igeom,
                                                          plt.gca(),
                                                          clear_lines=True,
                                                          n_smooth = 2,
                                                          plot_path=True,
                                                          #radius=True,
                                                         )
linked_NGL_wdg._set_size('4in', '4in')
from ipywidgets import HBox
HBox([linked_NGL_wdg, linked_ax_widget.canvas])

/home/guille/miniconda3/lib/python3.6/site-packages/ipykernel_launcher.py:4: RuntimeWarning: divide by zero encountered in log
  after removing the cwd from sys.path.
Let’s do TICA and try to look a the correlations in a TICA analysis
[13]:
feat = pyemma.coordinates.featurizer(top)
#feat.add_backbone_torsions(cossin=True)
feat.add_distances(feat.topology.select('symbol != H'))
src  = pyemma.coordinates.source(MD_trajfiles, features=feat)
tica = pyemma.coordinates.tica(src, lag=np.int(src.trajectory_lengths()/3000))
Y_tica = tica.get_output()
01-11-17 21:01:44 pyemma.coordinates.data.featurization.featurizer.MDFeaturizer[7] WARNING  The 1D arrays input for add_distances() have been sorted, and index duplicates have been eliminated.
Check the output of describe() to see the actual order of the features
[15]:
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
                                  top,
                                  Y_tica,
                                  n_overlays=5,
                                  atom_selection='backbone',
                                  #sticky=True,
                                  #color_list='rand'
          )
mpx_wdg_box
[17]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                  top,
                                  Y_tica,
                                  plot_FES = True,
                                  #dt = dt*1e-6, tunits='ms',
                                  max_frames=10000,
                                  #proj_idxs=[0,1],
                                  panel_height=2,
                                  projection=tica
                                  )
mpx_wdg_box
[20]:
paths_dict, idata = molpx.generate.projection_paths(MD_trajfiles,
                                                    top,
                                                    Y_tica,
                                                    n_points=50,
                                                    proj_idxs=[0,1],
                                                    n_projs=2,
                                                    proj_dim = 2,
                                                    verbose=False,
                                        )
[21]:
# Choose the coordinate and the type of path
coord = 0
path_type = 'min_rmsd'
#path_type = 'min_disp'
igeom = paths_dict[coord][path_type]["geom"]
ipath = paths_dict[coord][path_type]["proj"]

# Choose the proj_idxs for the path and the FES
# to be shown
proj_idxs = [0,1]
[22]:
plt.figure(figsize=(4,4))
h, (x,y) = np.histogramdd(np.vstack(Y_tica)[:,proj_idxs], bins=50)
plt.contourf(x[:-1], y[:-1], -np.log(h.T), alpha=.50)

linked_wdg, axes_widget = molpx.visualize.sample(ipath[:,proj_idxs],
                                    igeom,
                                    plt.gca(),
                                    clear_lines=True,
                                    n_smooth = 1,
                                    plot_path=True,
                                   )
# You can even choose to add the correlations a posteriori
molpx.visualize.correlations(tica, widget=linked_wdg, proj_idxs=0)
linked_wdg.center_view()
linked_wdg
/home/guille/miniconda3/lib/python3.6/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in log
  This is separate from the ipykernel package so we can avoid doing imports until
WARNING:root:DEPRECATED: Please use 'center' method
[ ]:

molPX and Metadynamics with DiAla
FES from biased trajectories, too!
Vladas Oleinikovas         uccavol@ucl.ac.uk
Guillermo Pérez-Hernández  guille.perez@fu-berlin.de

We do that by passing correct weights for each conformation in our trajectory.

We exemplify this with metadynamics trajectory of the ever popular test system - alanine dipeptide. The 2ns simulation used phi and psi angles as CVs, and the corresponding weight was derived using Tiwary and Parrinello reweighting algorithm (JPCB 2014, doi:10.1021/jp504920s).

Try commenting out the weights to notice the effect on the estimated depth of the basins.

[1]:
import molpx
import numpy as np
%matplotlib ipympl

# Topology
top = molpx._molpxdir(join='notebooks/data/ala2.pdb')

# Generated during the metadynamics run (extra reversed copy to have a list of files)
MD_trajfiles = [molpx._molpxdir(join='notebooks/data/ala2.meta.xtc'),
               molpx._molpxdir(join='notebooks/data/ala2.meta.reversed.xtc')]

# COLVAR files containing CV projections and logarithm of the corresponding weight
colvar_files = [molpx._molpxdir(join='notebooks/data/ala2.meta.CV.txt'),
                molpx._molpxdir(join='notebooks/data/ala2.meta.CV.reversed.txt')]

# Load logweights and turn them into proper weights
weights = [np.exp(np.loadtxt(ifile)[:,7]) for ifile in colvar_files]
[5]:
mpx_wdg_box = molpx.visualize.FES(MD_trajfiles,
                                  top,
                                  colvar_files,
                                  proj_idxs=[1,2],
                                  nbins=50,
                                  n_sample=200,
                                  proj_labels='CV',
                                  weights=weights, # this is important for biased trajectories!
                                 )
mpx_wdg_box
We can also visualize the weights together with the CV-trajectories
[8]:
mpx_wdg_box = molpx.visualize.traj(MD_trajfiles,
                                  top,
                                  colvar_files,
                                  max_frames=10000,
                                  proj_idxs=[1, 2, 7],
                                  panel_height=1,
                                  proj_labels=['$CV_1$','$CV_2$','log(weights)'],
                                  )
mpx_wdg_box
[ ]:

  1. Watch:
    Our Youtube video or the gif animation show molpx in action.

About

molPX has been developed mostly by Dr. Guillermo Pérez-Hernández in the group of Prof. Dr. Frank Noé, with the priceless help of:

Beyond molPX’s own methods, this module connects two incredibly powerful and incredibly useful python modules:

molPX is specially in debt to Dr. Alexander Rose, who, apart from developing the impressive nglview (among other projects) provided the very first proof-of-concept for molPX.

molPX was recently introduced to the community in a PyEMMA workshop in Berlin:

Installation

To install molPX , you need a few Python package dependencies. If these dependencies are not available in their required versions, the installation will fail. We recommend one particular way for the installation that is relatively safe, but you are welcome to try another approaches if you know what you are doing.

Python Package Index (PyPI)

If you do not like Anaconda for some reason you should use the Python package manager pip to install. This is not recommended, because in the past, various problems have arisen with pip in compiling the packages that molPX depends upon, see this issue for more information.

  1. If you do not have pip, please read the install guide: install guide.

  2. Make sure pip is enabled to install so called wheel packages:

    pip install wheel
    

    Now you are able to install binaries if you use MacOSX or Windows.

  3. Install molPX using

    pip install molPX
    
  4. Check your installation

    python
    >>> import molpx
    >>> molpx.__version__
    

    should print 0.1.2 or later

    >>> import IPython
    >>> IPython.__version__
    

    should print 3.1 or later. If ipython is not up to date, update it by pip install ipython

Building from Source

Building all dependencies from molPX from source is sometimes (if not usually) tricky, takes a long time and is error prone. It is not recommended nor supported by us. If unsure, use the Anaconda installation.

What you can do is clone or download the source from github. After that, just cd to the download directory (and untar/unzip if necessary) and:

>>> cd molPX
>>> python setup.py install

but be aware that success is not guaranteed. See the “Known Issues” below.

Known Issues
  • After a successfull installation and execution of molpx.example_notebooks()…no widgets are shown! Most probably, this has to do with nglview and its needed Jupyter notebook extensions.

This is a known, frustrating behaviour:

We’are aware of this and molPX automatically checks for the needed extensions every time it gets imported, s.t. it will refuse to start-up if it finds any problems. If somehow it manages to start-up but no widget is shown, try issuing

molpx._auto_enable_extensions()

restarting everything and trying again. Otherwise, check the above links.

  • A SandboxViolation error might appear when installing from source. This is because the nglview dependency

is trying to enable the needed extensions. Until we figure this out, try to install nglview externally issuing:

>>> conda install nglview -c bioconda

or, alternatively

>>> pip install nglview
  • Note that molPX only works with nglview versions >=0.6.2.1.

Changelog

0.1.6

New features:

  • molpx.notebooks()
    • interface changed to open the list of available notebooks
    • molpx.notebook() is tagged as deprecated
  • molpx.visualize:
    • new methods
      • correlations() to visualize the feature_TIC_correlation attribute of PyEMMA TICA-objects in the widget
      • feature() accepts a PyEMMA MDFeaturizer object directly
    • traj()
    • has new optional parameter projection to accept a PyEMMA TICA object to

    visualize the linear correlations in the widget AND in as trajectories f(t)

    • FES:
      • can show also 1D FESs
      • accepts “weights” as optional argument for showing re-weighted FESs (as in metadynamics)
      • argument “proj_labels” can be a PyEMMA TICA object to directly use the tica.describe() strings
  • molpx.bmutils:
    • new methods:
      • most_corr_info() to support the new methods in molpx.visualize
      • labelize() to deal with proj_label inputs
      • superpose_to_most_compact_in_list() to superpose geoms with some intelligence in the input parsing
New Notebooks:
  • new notebooks added to the documentation

Fixes:

  • molpx.visualize:
    • all calls to nglview.show_mdtraj() has been wrapped in _initialize_nglwidget_if_safe to avoid
      erroring in tests

Indices and tables