ExTASY Workflows 0.2¶
Repository
https://bitbucket.org/extasy-project/extasy-workflows
Mailing List
- Users : https://groups.google.com/forum/#!forum/extasy-project
- Developers : https://groups.google.com/forum/#!forum/extasy-devel
Introduction¶
What is ExTASY?¶
ExTASY, the Extensible Toolkit for Advanced Sampling and analYsis, is a flexible toolkit for the efficient sampling of complex macromolecules using molecular dynamics in combination with on-the-fly analysis tools that drive the sampling process towards regions of interest. In particular, compared with existing approaches like metadynamics, ExTASY requires no a priori assumptions about the behaviour of the system. ExTASY consists of several interoperable Python tools, which are coupled together into pre-defined patterns that may be executed on compute resources ranging from PCs and small clusters to large-scale HPC systems.

ExTASY provides a command line interface that, together with specific configuration files, keeps the user's job minimal and free of the resource-specific details of execution and data management. The ExTASY user interface runs on your local machine and handles data staging, job scheduling and execution on the target machine in a uniform manner, making it easy to test small systems locally before moving to larger HPC resources as needed.
The coupled simulation-analysis execution pattern (a.k.a. the ExTASY pattern) currently supports two use cases:
- Gromacs as the “Simulator” and LSDMap as the “Analyzer”
- AMBER as the “Simulator” and CoCo as the “Analyzer”
The ExTASY approach¶
ExTASY uses swarm/ensemble simulation strategies that map efficiently onto HPC services. It uses smart collective coordinate strategies to focus sampling in interesting regions, and relies on machine learning methods rather than user expertise to select and refine (on the fly) the collective coordinates. ExTASY is compatible with standard MD codes out of the box - without requiring software patches.

Background¶
Why do enhanced sampling?¶
To efficiently and accurately identify particular alternative conformations of a molecule.
- E.g., starting from an apo-conformation, identify alternative low-energy conformations of a protein relevant to ligand binding (induced fit/conformational selection).
To efficiently and accurately sample ALL conformational space for a molecule.
- E.g., calculation of thermodynamic and kinetic parameters.
How to do enhanced sampling?¶
Faster MD through hardware and software developments, e.g.:
- multicore architectures and domain composition.
- specialized hardware (ANTON, GRAPE,...).
Faster MD through manipulation of the effective potential energy surface, e.g.:
- meta-dynamics,
- accelerated dynamics.
Faster sampling through multiple simulation strategies, e.g.:
- Replica exchange.
- Swarm/ensemble simulations and Markov chain models.
Installation & Setup¶
Installation¶
This section describes the requirements and procedure to be followed to install the ExTASY package.
Note
Pre-requisites. The following are the minimal requirements to install the ExTASY module.
- python >= 2.7
- virtualenv >= 1.11
- pip >= 1.5
- Password-less SSH login to the Stampede and/or Archer machines (see the HPC Cluster Access section below for help)
The easiest way to install ExTASY is to create a virtualenv. This way, ExTASY and its dependencies can be installed in user-space without clashing with potentially incompatible system-wide packages.
Tip
If the virtualenv command is not available, try the following set of commands,
wget --no-check-certificate https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.11.tar.gz
tar xzf virtualenv-1.11.tar.gz
python virtualenv-1.11/virtualenv.py --system-site-packages $HOME/ExTASY-tools/
source $HOME/ExTASY-tools/bin/activate
Step 1 : Create the virtualenv,
virtualenv $HOME/ExTASY-tools/
If your shell is BASH,
source $HOME/ExTASY-tools/bin/activate
If your shell is CSH,
source $HOME/ExTASY-tools/bin/activate.csh
Note
Setuptools might not get installed with virtualenv and hence using pip would fail. Please look at https://pypi.python.org/pypi/setuptools for installation instructions.
Step 2 : To install the Ensemble MD Toolkit Python modules in the virtual environment, run:
pip install radical.ensemblemd
You can check the version of Ensemble MD Toolkit with the ensemblemd-version command-line tool:
ensemblemd-version
0.3.14
Tip
If your shell is CSH you will need to run:
rehash
This will reset the PATH variable to also point to the packages which were just installed.
Installation is complete!
Preparing the Environment¶
ExTASY is developed using Ensemble MD Toolkit which is a client-side library and relies on a set of external software packages. One of these packages is radical.pilot, an HPC cluster resource access and management library. It can access HPC clusters remotely via SSH and GSISSH, but it requires (a) a MongoDB server and (b) a properly set-up SSH environment.
Note
For the purposes of the examples in this guide, we provide access to a mongodb url (mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot). This is for trying out these examples only and is periodically purged. We recommend setting up your own mongodb instances for production simulations/experiments.
MongoDB Server¶
The MongoDB server is used to store and retrieve operational data during the execution of an Ensemble MD Toolkit application. The MongoDB server must be reachable on port 27017 from both the host that runs the Ensemble MD Toolkit application and the host that executes the MD tasks, i.e., the HPC cluster. In our experience, a small VM instance (e.g., on Amazon AWS) works exceptionally well for this.
Warning
If you want to run your application on your laptop or private workstation, but run your MD tasks on a remote HPC cluster, installing MongoDB on your laptop or workstation won't work. Your laptop or workstation usually does not have a public IP address and is hidden behind a firewalled (and typically NAT-ed) home or office network. This means that the components running on the HPC cluster will not be able to reach the MongoDB server.
A MongoDB server can support more than one user. In an environment where multiple users use Ensemble MD Toolkit applications, a single MongoDB server for all users / hosts is usually sufficient.
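If you want to verify that a MongoDB instance is reachable from a given host before launching any workloads, a small stand-alone check along the following lines can help. This is only a convenience sketch, not part of ExTASY; it assumes the pymongo package is installed (pip install pymongo) and that the URL is taken from the RADICAL_PILOT_DBURL environment variable.

import os
from pymongo import MongoClient

# Read the MongoDB URL from the environment (format: mongodb://hostname:port).
url = os.environ.get('RADICAL_PILOT_DBURL')
if not url:
    raise SystemExit('RADICAL_PILOT_DBURL is not set')

# Force a round-trip to the server; this fails fast if the host/port is unreachable.
client = MongoClient(url, serverSelectionTimeoutMS=5000)
try:
    client.server_info()
    print('MongoDB reachable at %s' % url)
except Exception as exc:
    print('MongoDB NOT reachable: %s' % exc)

Running this once on your workstation and once on the cluster's login node confirms that both ends can see the database.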
Install your own MongoDB¶
Once you have identified a host that can serve as the new home for MongoDB, installation is straightforward. You can either install the MongoDB server package that is provided by most Linux distributions, or follow the installation instructions on the MongoDB website.
MongoDB-as-a-Service¶
There are multiple commercial providers of hosted MongoDB services, some of them offering free usage tiers. We have had good experience with several of these.
HPC Cluster Access¶
In order to execute MD tasks on a remote HPC cluster, you need to set up password-less SSH login for that host. This can be achieved either via an ssh-agent that stores your SSH key's password (the default on OS X, for example) or by setting up password-less SSH keys.
Password-less SSH with ssh-agent¶
An ssh-agent asks you for your key's password the first time you use it and then stores it for you so that you don't have to enter it again. On OS X (>= 10.5) an ssh-agent is running by default; on Linux systems you might have to install or launch it manually.
You can test whether an ssh-agent is running by default on your system by logging in via SSH into the remote host twice: the first time the ssh-agent should ask you for a password, the second time it shouldn't. You can use the ssh-add command to list all keys that are currently managed by your ssh-agent:
%> ssh-add -l
4096 c3:d6:4b:fb:ce:45:b7:f0:2e:05:b1:81:87:24:7f:3f /Users/enmdtk/.ssh/rsa_work (RSA)
For more information on this topic, please refer to this article.
Password-less SSH keys¶
Warning
Using password-less SSH keys is strongly discouraged. Some sites may even have a policy in place prohibiting the use of password-less SSH keys. Use an ssh-agent if possible.
These instructions were taken from http://www.linuxproblem.org/art_9.html
Follow these instructions to create and set-up a public-private key pair that doesn’t have a password.
As user_a on host workstation, generate a pair of keys. Do not enter a passphrase:
user_a@workstation:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/a/.ssh/id_rsa):
Created directory '/home/a/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/a/.ssh/id_rsa.
Your public key has been saved in /home/a/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 a@A
Now use ssh to create the directory ~/.ssh as user_b on cluster (the directory may already exist, which is fine):
user_a@workstation:~> ssh user_b@cluster mkdir -p .ssh
user_b@cluster's password:
Finally, append user_a's new public key to user_b@cluster:.ssh/authorized_keys and enter user_b's password one last time:
user_a@workstation:~> cat .ssh/id_rsa.pub | ssh user_b@cluster 'cat >> .ssh/authorized_keys'
user_b@cluster's password:
From now on you can log into cluster as user_b from workstation as user_a without a password:
user_a@workstation:~> ssh user_b@cluster
Note
Depending on your version of SSH you might also have to make the following changes:
- Put the public key in .ssh/authorized_keys2 (note the 2)
- Change the permissions of .ssh to 700
- Change the permissions of .ssh/authorized_keys2 to 640
Getting Started¶
ExTASY 0.2 uses the Ensemble Toolkit API for composing the application. In this section we run you through the basic building blocks of the API. We introduce the SimulationAnalysisLoop pattern and then work through simple examples using the same pattern in the next section. Once you are comfortable with these examples, we move on to the two molecular dynamics applications created using this API.
SimulationAnalysisLoop (SAL): The Pattern¶
The SAL pattern supports multiple iterations of two chained bags of tasks (BoTs). The first bag consists of 'n' simulation instances, followed by a second bag of 'm' analysis instances. The analysis instances work on the output of the simulation instances, hence the chaining. There can be multiple iterations of these two BoTs and, depending on the application, the simulation instances of iteration 'i+1' can work on the output of the analysis instances of iteration 'i'. There are also two additional steps, pre_loop and post_loop, for any pre- or post-processing. A graphical representation of the pattern is given below:

There is also a set of data references that can be used to refer to the data of a particular step or instance (a short usage example follows the list):
- $PRE_LOOP - References the pre_loop step.
- $PREV_SIMULATION - References the previous simulation step with the same instance number.
- $PREV_SIMULATION_INSTANCE_Y - References instance Y of the previous simulation step.
- $SIMULATION_ITERATION_X_INSTANCE_Y - References instance Y of the simulation step of iteration number X.
- $PREV_ANALYSIS - References the previous analysis step with the same instance number.
- $PREV_ANALYSIS_INSTANCE_Y - References instance Y of the previous analysis step.
- $ANALYSIS_ITERATION_X_INSTANCE_Y - References instance Y of the analysis step of iteration number X.
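These references are plain strings used inside a kernel's data-staging attributes. A minimal illustration, condensed from the examples later in this section (the misc.ccount kernel and the file names are the ones used there):

from radical.ensemblemd import Kernel

# Analysis kernel that stages in the file produced by instance 3 of the
# previous simulation step and downloads its own output when done.
k = Kernel(name="misc.ccount")
k.arguments            = ["--inputfile=asciifile-3.dat", "--outputfile=cfreqs.dat"]
k.link_input_data      = ["$PREV_SIMULATION_INSTANCE_3/asciifile.dat > asciifile-3.dat"]
k.download_output_data = ["cfreqs.dat > cfreqs-iter1.dat"]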
Components of the API¶
There are three components that the user interacts with in order to implement an application (a short sketch combining them follows the list):
- Resource Handle: The resource handle can be seen as a container that acquires the resources on the remote machine and provides application-level control of these resources.
- Execution Pattern: A pattern can be seen as a parameterized template for an execution trajectory that implements a specific algorithm. A pattern provides placeholder methods for the individual steps or stages of an execution trajectory. These placeholders are populated with Kernels that get executed when it’s the step’s turn to be executed. In ExTASY, we will be using the SAL pattern.
- Application Kernel: A kernel is an object that abstracts a computational task in EnsembleMD. It represents an instantiation of a specific science tool along with its resource specific environment.
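As a rough sketch of how the three components fit together (condensed from the MSSA example in the next section; the ResourceHandle arguments, in particular the MongoDB URL, are placeholders you must adapt):

from radical.ensemblemd import Kernel, SimulationAnalysisLoop, ResourceHandle

class MyApp(SimulationAnalysisLoop):                      # execution pattern (SAL)

    def simulation_stage(self, iteration, instance):
        k = Kernel(name="misc.mkfile")                    # application kernel
        k.arguments = ["--size=1000", "--filename=asciifile.dat"]
        return [k]

    def analysis_stage(self, iteration, instance):
        k = Kernel(name="misc.ccount")                    # application kernel
        k.arguments = ["--inputfile=asciifile.dat", "--outputfile=cfreqs.dat"]
        k.link_input_data = ["$PREV_SIMULATION/asciifile.dat"]
        return [k]

cluster = ResourceHandle(resource="local.localhost",      # resource handle
                         cores=1, walltime=15,
                         database_url="mongodb://hostname:port")
cluster.allocate()
cluster.run(MyApp(iterations=1, simulation_instances=1, analysis_instances=1))
cluster.deallocate()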
Running Generic Examples¶
Multiple Simulations Single Analysis Application with SAL pattern¶
This example shows how to use the SAL pattern to execute 4 iterations of a simulation-analysis loop with multiple simulation instances and a single analysis instance. We skip the pre_loop step in this example. Each simulation_stage generates 16 new random ASCII files, one per instance. In the analysis_stage, the ASCII files from each of the simulation instances are analyzed and a character count is performed on each file using one analysis instance. The output is downloaded to the user machine.
[S]  [S]  [S]  [S]  [S]  [S]  [S]
 |    |    |    |    |    |    |
 \------------------------------/
                |
               [A]
                |
 /------------------------------\
 |    |    |    |    |    |    |
[S]  [S]  [S]  [S]  [S]  [S]  [S]
 |    |    |    |    |    |    |
 \------------------------------/
                |
               [A]
                .
                .
Warning
In order to run this example, you need access to a MongoDB server and you need to set RADICAL_PILOT_DBURL in your environment accordingly. The format is mongodb://hostname:port.
Run locally¶
- Step 1: View the example source below. You can download the generic examples using the following:
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/generic.tar
tar xf generic.tar
cd generic
Note
The files in the above link are configured to run for the tutorial. The source at the end of this page is generic and might require changes.
- Step 2: Run multiple_simulations_single_analysis.py:
python multiple_simulations_single_analysis.py
Once the script has finished running, you should see the character frequency files (cfreqs-1.dat, cfreqs-2.dat, ...) in the same directory you launched the script in, one file per iteration. Each analysis stage generates the character frequency file for all the files generated in the simulation stage of that iteration.
Note
Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.
Run remotely¶
By default, the simulation and analysis stages run on one core of your local machine:
SingleClusterEnvironment(
resource="local.localhost",
cores=1,
walltime=30,
username=None,
project=None
)
You can change the script to use a remote HPC cluster and increase the number of cores to see how this affects the runtime of the script, since the individual simulation instances can run in parallel. For example, execution on xsede.stampede using 16 cores would require:
SingleClusterEnvironment(
resource="xsede.stampede",
cores=16,
walltime=30, #minutes
username=None, # add your username here
project=None # add your allocation or project id here if required
)
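Note that the downloadable scripts in this release build the execution context with ResourceHandle rather than SingleClusterEnvironment; the equivalent remote set-up in that style would look roughly as follows (username, project and database_url are placeholders you must fill in):

cluster = ResourceHandle(
    resource='xsede.stampede',
    cores=16,
    walltime=30,                   # minutes
    username='your_username',      # your XSEDE username
    project='TG-XXXXXXXXX',        # your allocation/project id, if required
    queue='development',           # target queue, if needed
    database_url='mongodb://hostname:port'
)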
Example Script¶
Download multiple_simulations_single_analysis.py
#!/usr/bin/env python
__author__ = "Vivek <vivek.balasubramanian@rutgers.edu>"
__copyright__ = "Copyright 2014, http://radical.rutgers.edu"
__license__ = "MIT"
__example_name__ = "Multiple Simulations Instances, Single Analysis Instance Example (MSSA)"
import sys
import os
import json
from radical.ensemblemd import Kernel
from radical.ensemblemd import SimulationAnalysisLoop
from radical.ensemblemd import EnsemblemdError
from radical.ensemblemd import ResourceHandle
# ------------------------------------------------------------------------------
# Set default verbosity
if os.environ.get('RADICAL_ENTK_VERBOSE') == None:
os.environ['RADICAL_ENTK_VERBOSE'] = 'REPORT'
# ------------------------------------------------------------------------------
#
class MSSA(SimulationAnalysisLoop):
"""MSMA exemplifies how the MSMA (Multiple-Simulations / Multiple-Analsysis)
scheme can be implemented with the SimulationAnalysisLoop pattern.
"""
def __init__(self, iterations, simulation_instances, analysis_instances):
SimulationAnalysisLoop.__init__(self, iterations, simulation_instances, analysis_instances)
def simulation_stage(self, iteration, instance):
"""In the simulation step we
"""
k = Kernel(name="misc.mkfile")
k.arguments = ["--size=1000", "--filename=asciifile.dat"]
return [k]
def analysis_stage(self, iteration, instance):
"""In the analysis step we use the ``$PREV_SIMULATION`` data reference
to refer to the previous simulation. The same
instance is picked implicitly, i.e., if this is instance 5, the
previous simulation with instance 5 is referenced.
"""
link_input_data = []
for i in range(1, self.simulation_instances+1):
link_input_data.append("$PREV_SIMULATION_INSTANCE_{instance}/asciifile.dat > asciifile-{instance}.dat".format(instance=i))
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=asciifile-*.dat", "--outputfile=cfreqs.dat"]
k.link_input_data = link_input_data
k.download_output_data = "cfreqs.dat > cfreqs-{iteration}.dat".format(iteration=iteration)
return [k]
# ------------------------------------------------------------------------------
#
if __name__ == "__main__":
try:
# Create a new static execution context with one resource and a fixed
# number of cores and runtime.
cluster = ResourceHandle(
resource='local.localhost',
cores=1,
walltime=15,
#username=None,
#project=None,
#queue = None,
database_url='mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot',
#database_name=None,
#access_schema=None
)
# Allocate the resources.
cluster.allocate()
        # We set the simulation step 'instances' to 16 and the analysis step
        # 'instances' to 1.
mssa = MSSA(iterations=4, simulation_instances=16, analysis_instances=1)
cluster.run(mssa)
cluster.deallocate()
except EnsemblemdError, er:
print "Ensemble MD Toolkit Error: {0}".format(str(er))
        raise # Just raise the exception again to get the backtrace
A ResourceHandle (execution context) is created to reserve 1 core on localhost for 15 minutes, and allocate() is then called to request that allocation.
The MSSA class defines the application in terms of the SAL pattern. We skip the definition of the pre_loop step since we do not require it for this example. In simulation_stage, we define the kernel that needs to be executed during the simulation stage (misc.mkfile) as well as the arguments to the kernel. In analysis_stage, we define the kernel that needs to be executed during the analysis stage (misc.ccount) and create a list of references to the output data created in each of the simulation instances, in order to stage it in for the analysis instance.
Finally, we create an instance of this MSSA class to run 4 iterations with 16 simulation instances and 1 analysis instance, run this pattern in the execution context, and deallocate the acquired resources once it has completed.
Multiple Simulations Multiple Analysis Application with SAL pattern¶
This example shows how to use the SAL pattern to execute multiple iterations of a simulation-analysis loop with multiple simulation instances and multiple analysis instances. We skip the pre_loop step in this example. Each simulation_stage generates a set of new random ASCII files, one per instance. In the analysis_stage, the ASCII files from the simulation instances are analyzed and a character count is performed. Each analysis instance uses the file generated by the corresponding simulation instance, which is possible because we use the same number of instances for simulation and analysis. The output is downloaded to the user machine.
[S] [S] [S] [S] [S] [S] [S] [S]
| | | | | | | |
[A] [A] [A] [A] [A] [A] [A] [A]
| | | | | | | |
[S] [S] [S] [S] [S] [S] [S] [S]
| | | | | | | |
[A] [A] [A] [A] [A] [A] [A] [A]
Warning
In order to run this example, you need access to a MongoDB server and you need to set RADICAL_PILOT_DBURL in your environment accordingly. The format is mongodb://hostname:port.
Run locally¶
- Step 1: View the example sources below. You can download the generic examples using the following (same link as above):
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/generic.tar
tar xf generic.tar
cd generic
Note
The files in the above link are configured to run for the CECAM workshop. The source at the end of this page is generic and might require changes.
- Step 2: Run multiple_simulations_multiple_analysis.py:
python multiple_simulations_multiple_analysis.py
Once the script has finished running, you should see the character frequency files generated by the individual instances (cfreqs-1-1.dat, cfreqs-1-2.dat, ...) in the same directory you launched the script in. You should see as many such files as the number of iterations times the number of instances (i.e. the simulation/analysis width). Each analysis instance generates a character frequency file for the file generated by the corresponding simulation instance in every iteration.
Note
Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.
Run remotely¶
By default, the simulation and analysis stages run on one core of your local machine:
SingleClusterEnvironment(
resource="local.localhost",
cores=1,
walltime=30,
username=None,
project=None
)
You can change the script to use a remote HPC cluster and increase the number of cores to see how this affects the runtime of the script, since the individual simulation instances can run in parallel. For example, execution on xsede.stampede using 16 cores would require:
SingleClusterEnvironment(
resource="xsede.stampede",
cores=16,
walltime=30,
username=None, # add your username here
project=None # add your allocation or project id here if required
)
Example Script¶
Download multiple_simulations_multiple_analysis.py
#!/usr/bin/env python
__author__ = "Vivek <vivek.balasubramanian@rutgers.edu>"
__copyright__ = "Copyright 2014, http://radical.rutgers.edu"
__license__ = "MIT"
__example_name__ = "Multiple Simulations Instances, Multiple Analysis Instances Example (MSMA)"
import sys
import os
import json
from radical.ensemblemd import Kernel
from radical.ensemblemd import SimulationAnalysisLoop
from radical.ensemblemd import EnsemblemdError
from radical.ensemblemd import ResourceHandle
# ------------------------------------------------------------------------------
# Set default verbosity
if os.environ.get('RADICAL_ENTK_VERBOSE') == None:
os.environ['RADICAL_ENTK_VERBOSE'] = 'REPORT'
# ------------------------------------------------------------------------------
#
class MSMA(SimulationAnalysisLoop):
"""MSMA exemplifies how the MSMA (Multiple-Simulations / Multiple-Analsysis)
scheme can be implemented with the SimulationAnalysisLoop pattern.
"""
def __init__(self, iterations, simulation_instances, analysis_instances):
SimulationAnalysisLoop.__init__(self, iterations, simulation_instances, analysis_instances)
def simulation_stage(self, iteration, instance):
"""In the simulation step we
"""
k = Kernel(name="misc.mkfile")
k.arguments = ["--size=1000", "--filename=asciifile.dat"]
return k
def analysis_stage(self, iteration, instance):
"""In the analysis step we use the ``$PREV_SIMULATION`` data reference
to refer to the previous simulation. The same
instance is picked implicitly, i.e., if this is instance 5, the
previous simulation with instance 5 is referenced.
"""
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=asciifile.dat", "--outputfile=cfreqs.dat"]
k.link_input_data = "$PREV_SIMULATION/asciifile.dat".format(instance=instance)
k.download_output_data = "cfreqs.dat > cfreqs-{iteration}-{instance}.dat".format(instance=instance, iteration=iteration)
return k
# ------------------------------------------------------------------------------
#
if __name__ == "__main__":
try:
# Create a new static execution context with one resource and a fixed
# number of cores and runtime.
cluster = ResourceHandle(
resource='local.localhost',
cores=1,
walltime=15,
#username=None,
#project=None,
#queue = None,
database_url='mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot',
#database_name=,
#access_schema=None,
)
# Allocate the resources.
cluster.allocate()
        # We set both the simulation and the analysis step 'instances' to 8.
msma = MSMA(iterations=2, simulation_instances=8, analysis_instances=8)
cluster.run(msma)
cluster.deallocate()
except EnsemblemdError, er:
print "Ensemble MD Toolkit Error: {0}".format(str(er))
        raise # Just raise the exception again to get the backtrace
A ResourceHandle (execution context) is created to reserve 1 core on localhost for 15 minutes, and allocate() is then called to request that allocation.
The MSMA class defines the application in terms of the SAL pattern. We skip the definition of the pre_loop step since we do not require it for this example. In simulation_stage, we define the kernel that needs to be executed during the simulation stage (misc.mkfile) as well as the arguments to the kernel. In analysis_stage, we define the kernel that needs to be executed during the analysis stage (misc.ccount) and refer to the output data created in the previous simulation stage with the same instance number as the current analysis instance.
Finally, we create an instance of this MSMA class to run 2 iterations with 8 simulation instances and 8 analysis instances, run this pattern in the execution context, and deallocate the acquired resources once it has completed.
Running a Coco/Amber Workload¶
Introduction¶
CoCo (“Complementary Coordinates”) uses a PCA-based method to analyse trajectory data and identify potentially undersampled regions of conformational space. For an introduction to CoCo, including how it can be used as a stand-alone tool, see here.
The ExTASY workflow uses cycles of CoCo and MD simulation to rapidly generate a diverse ensemble of conformations of a chosen molecule. A typical use could be in exploring the conformational flexibility of a protein’s ligand binding site and the generation of diverse conformations for docking studies. The basic workflow is as follows:
- The input ensemble (typically the trajectory file from a short preliminary simulation, but potentially just a single structure) is analysed by CoCo and N possible, but so far unsampled, conformations identified.
- N independent short MD simulations are run, starting from each of these points.
- The resulting trajectory files are added to the input ensemble, and CoCo analysis performed on them all, identifying N new points.
- Steps 2 and 3 are repeated, building up an ensemble of an increasing number of short, but diverse, trajectory files, for as many cycles as the user chooses.
In common with the other ExTASY workflows, a user prepares the necessary input files and ExTASY configuration files on their local workstation, and launches the job from there, but the calculations are then performed on the execution host, which is typically an HPC resource.
This release of ExTASY has a few restrictions:
- The MD simulations can only be performed using AMBER or GROMACS.
- The system to be simulated cannot contain any non-standard residues (i.e., any not found in the default AMBER residue library).
Required Input files¶
The Amber/CoCo workflow requires the user to prepare four AMBER-style files, and two ExTASY configuration files. For more information about Amber-specific file formats see here.
- A topology file for the system (Amber .top format).
- An initial structure (Amber .crd format) or ensemble (any Amber trajectory format).
- A simple minimisation input script (.in format). This will be used to refine each structure produced by CoCo before it is used for MD.
- An MD input script (.in format).
- An ExTASY Resource configuration (.rcfg) file.
- An ExTASY Workload configuration (.wcfg) file.
Here is an example of a typical minimisation input script (min.in):
Basic minimisation, weakly restraining backbone so it does not drift too far
from CoCo-generated conformation
&cntrl
imin=1, maxcyc=500,
ntpr=50,
ntr=1,
ntb=0, cut=25.0, igb=2,
&end
Atoms to be restrained
0.1
FIND
CA * * *
N * * *
C * * *
O * * *
SEARCH
RES 1 999
END
END
Here is an example of a typical MD input script (mdshort.in):
0.1 ns GBSA sim
&cntrl
imin=0, ntx=1,
ntpr=1000, ntwr=1000, ntwx=500,
ioutfm=1,
nstlim=50000, dt=0.002,
ntt=3, ig=-1, gamma_ln=5.0,
ntc=2, ntf=2,
ntb=0, cut=25.0, igb=2,
&end
The resource and workload configuration files are specific to each resource and are discussed in the following sections: first execution on Stampede, then execution on Archer.
Running on Stampede¶
This section is to be done entirely on your laptop. The ExTASY tool expects two input files:
- The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Stampede.
- The workload configuration file defines the CoCo/Amber workload itself. The configuration file given in this example is strictly meant for the coco-amber usecase only.
Step 1: Create a new directory for the example,
mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/
Step 2: Download the config files and the input files directly using the following link.
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/coam-on-stampede.tar
tar xf coam-on-stampede.tar
cd coam-on-stampede
Step 3: In the coam-on-stampede folder, a resource configuration file stampede.rcfg exists. Details and modifications required are as follows:
Note
For the purposes of this example, you only need to change:
- UNAME
- ALLOCATION
The other parameters in the resource configuration are already set up to successfully execute the workload in this example.
REMOTE_HOST = 'xsede.stampede' # Label/Name of the Remote Machine
UNAME = 'username' # Username on the Remote Machine
ALLOCATION = 'TG-MCB090174' # Allocation to be charged
WALLTIME = 20 # Walltime to be requested for the pilot
PILOTSIZE = 16 # Number of cores to be reserved
WORKDIR = None # Working directory on the remote machine
QUEUE = 'development' # Name of the queue in the remote machine
DBURL = 'mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot' #MongoDB link to be used for coordination purposes
Step 4: In the coam-on-stampede folder, a workload configuration file cocoamber.wcfg exists. Details and modifications required are as follows:
#-------------------------Applications----------------------
simulator = 'Amber' # Simulator to be loaded
analyzer = 'CoCo' # Analyzer to be loaded
#-------------------------General---------------------------
num_iterations = 2 # Number of iterations of Simulation-Analysis
start_iter = 0 # Iteration number with which to start
num_CUs = 16 # Number of tasks or Compute Units
nsave = 2 # Iterations after which output is transferred to local machine
#-------------------------Simulation-----------------------
num_cores_per_sim_cu = 2 # Number of cores per Simulation Compute Units
md_input_file = './inp_files/mdshort.in' # Entire path to MD Input file - Do not use $HOME or the likes
minimization_input_file = './inp_files/min.in' # Entire path to Minimization file - Do not use $HOME or the likes
initial_crd_file = './inp_files/penta.crd' # Entire path to Coordinates file - Do not use $HOME or the likes
top_file = './inp_files/penta.top' # Entire path to Topology file - Do not use $HOME or the likes
ref_file = './inp_files/penta.pdb' # Path to file with reference coordinates that will be used as an auxiliary file to read the trajectory files
logfile = 'coco.log' # Name of the log file created by pyCoCo
atom_selection = 'protein'
#-------------------------Analysis--------------------------
grid = '5' # Number of points along each dimension of the CoCo histogram
dims = '3' # The number of projections to consider from the input pcz file
Note
All the parameters in the above example file are mandatory for amber-coco. There are no other parameters currently supported.
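Since the .rcfg and .wcfg files are plain Python assignments, you can sanity-check your edits before launching with a short stand-alone snippet like the one below. This is only a convenience check, not the mechanism ExTASY itself uses to read the files; the module name 'wcfg' is arbitrary.

import imp
import os

# Load the workload configuration as a Python module and verify that the
# input files it points to exist on the local machine.
cfg = imp.load_source('wcfg', 'cocoamber.wcfg')

for attr in ('md_input_file', 'minimization_input_file',
             'initial_crd_file', 'top_file', 'ref_file'):
    path = getattr(cfg, attr)
    status = 'OK' if os.path.isfile(path) else 'MISSING'
    print('%-25s %-35s %s' % (attr, path, status))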
Step 5: You can find the executable script `extasy_amber_coco.py` in the coam-on-stampede folder. Now you can run the workload using:
python extasy_amber_coco.py --RPconfig stampede.rcfg --Kconfig cocoamber.wcfg
Note
Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.
Note
Time to completion: ~240 seconds (from the time job goes through LRMS)
Running on Archer¶
This section is to be done entirely on your laptop. The ExTASY tool expects two input files:
- The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Archer.
- The workload configuration file defines the CoCo/Amber workload itself. The configuration file given in this example is strictly meant for the coco-amber usecase only.
Step 1: Create a new directory for the example,
mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/
Step 2: Download the config files and the input files directly using the following link.
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/coam-on-archer.tar
tar xf coam-on-archer.tar
cd coam-on-archer
Step 3: In the coam-on-archer folder, a resource configuration file archer.rcfg exists. Details and modifications required are as follows:
Note
For the purposes of this example, you only need to change:
- UNAME
- ALLOCATION
The other parameters in the resource configuration are already set up to successfully execute the workload in this example.
REMOTE_HOST = 'epsrc.archer' # Label/Name of the Remote Machine
UNAME = 'username' # Username on the Remote Machine
ALLOCATION = 'e290' # Allocation to be charged
WALLTIME = 20 # Walltime to be requested for the pilot
PILOTSIZE = 24 # Number of cores to be reserved
WORKDIR = None # Working directory on the remote machine
QUEUE = 'standard' # Name of the queue in the remote machine
DBURL = 'mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot' #MongoDB link to be used for coordination purposes
Step 4: In the coam-on-archer folder, a workload configuration file cocoamber.wcfg exists. Details and modifications required are as follows:
#-------------------------Applications----------------------
simulator = 'Amber' # Simulator to be loaded
analyzer = 'CoCo' # Analyzer to be loaded
#-------------------------General---------------------------
num_iterations = 2 # Number of iterations of Simulation-Analysis
start_iter = 0 # Iteration number with which to start
num_CUs = 16 # Number of tasks or Compute Units
nsave = 2 # Iterations after which output is transferred to local machine
#-------------------------Simulation-----------------------
num_cores_per_sim_cu = 2 # Number of cores per Simulation Compute Units
md_input_file = './inp_files/mdshort.in' # Entire path to MD Input file - Do not use $HOME or the likes
minimization_input_file = './inp_files/min.in' # Entire path to Minimization file - Do not use $HOME or the likes
initial_crd_file = './inp_files/penta.crd' # Entire path to Coordinates file - Do not use $HOME or the likes
top_file = './inp_files/penta.top' # Entire path to Topology file - Do not use $HOME or the likes
ref_file = './inp_files/penta.pdb' # Path to file with reference coordinates that will be used as an auxiliary file to read the trajectory files
logfile = 'coco.log' # Name of the log file created by pyCoCo
atom_selection = 'protein'
#-------------------------Analysis--------------------------
grid = '5' # Number of points along each dimension of the CoCo histogram
dims = '3' # The number of projections to consider from the input pcz file
Note
All the parameters in the above example file are mandatory for amber-coco. There are no other parameters currently supported.
Step 5: You can find the executable script `extasy_amber_coco.py` in the coam-on-archer folder. Now you can run the workload using:
python extasy_amber_coco.py --RPconfig archer.rcfg --Kconfig cocoamber.wcfg
Note
Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.
Note
Time to completion: ~600 seconds (from the time job goes through LRMS)
Running on localhost¶
The above two sections describe execution on XSEDE.Stampede and EPSRC.Archer, assuming you have access to these machines. This section describes the changes required to the EXISTING scripts in order to get CoCo-Amber running on your local machine (label to be used = local.localhost, as in the generic examples).
Step 1: You might have already guessed the first step. You need to create a SingleClusterEnvironment object targeting the localhost machine. You can either directly make changes to the extasy_amber_coco.py script or create a separate resource configuration file and provide it as an argument.
Step 2: The MD tools require some tool-specific environment variables to be set up (AMBERHOME, PYTHONPATH, GCC, GROMACS_DIR, etc.). Along with this, you need to set the PATH environment variable to point to the binaries (if any) of the MD tool. Once you have determined all the environment variables to be set, set them in your terminal and test them by executing the MD command (possibly for a sample case). For example, if you have Amber installed in $HOME as $HOME/amber14, you probably have to set AMBERHOME to $HOME/amber14 and append $HOME/amber14/bin to PATH. Please check the official documentation of the MD tool.
Step 3: There are three options to proceed.
- Once you have tested the environment setup, you need to add it to the particular kernel definition. First, locate the file to be modified: all the files related to Ensemble Toolkit are located within the virtualenv (say “myenv”), under the path myenv/lib/python-2.7/site-packages/radical/ensemblemd/kernel_plugins/md. This path contains all the kernels used for the MD examples. Open the amber.py file and add an entry for local.localhost (in "machine_configs") as follows:

  "machine_configs":
  {
      ...
      "local.localhost":
      {
          "pre_exec"   : ["export AMBERHOME=$HOME/amber14", "export PATH=$HOME/amber14/bin:$PATH"],
          "executable" : ["sander"],
          "uses_mpi"   : False     # Could be True or False
      },
      ...
  }

  This would have to be repeated for all the kernels.
- Another option is to perform the same steps as above, but leave the "pre_exec" value as an empty list and set all the environment variables in your bashrc ($HOME/.bashrc). Remember that you would still need to set the executable as above.
- The third option is to create your own kernel plugin as part of your user script. This avoids the entire procedure of locating the existing kernel plugin files, and also gets you comfortable with using kernels other than the ones currently available as part of the package. Creating your own kernel plugins is discussed here.
Understanding the Output of the Examples¶
On the local machine, an “output” folder is created and, at the end of every checkpoint interval (=nsave), an “iter*” folder is created which contains the necessary files to start the next iteration.
For example, in the case of CoCo-Amber on stampede, for 4 iterations with nsave=2:
coam-on-stampede$ ls
output/ cocoamber.wcfg mdshort.in min.in penta.crd penta.top stampede.rcfg
coam-on-stampede/output$ ls
iter1/ iter3/
The “iter*” folder will not contain any of the initial files such as the topology file, minimization file, etc since they already exist on the local machine. In coco-amber, the “iter*” folder contains the NetCDF files required to start the next iteration and a logfile of the CoCo stage of the current iteration.
coam-on-stampede/output/iter1$ ls
1_coco.log md_0_11.ncdf md_0_14.ncdf md_0_2.ncdf md_0_5.ncdf md_0_8.ncdf md_1_10.ncdf md_1_13.ncdf md_1_1.ncdf md_1_4.ncdf md_1_7.ncdf
md_0_0.ncdf md_0_12.ncdf md_0_15.ncdf md_0_3.ncdf md_0_6.ncdf md_0_9.ncdf md_1_11.ncdf md_1_14.ncdf md_1_2.ncdf md_1_5.ncdf md_1_8.ncdf
md_0_10.ncdf md_0_13.ncdf md_0_1.ncdf md_0_4.ncdf md_0_7.ncdf md_1_0.ncdf md_1_12.ncdf md_1_15.ncdf md_1_3.ncdf md_1_6.ncdf md_1_9.ncdf
It is important to note that, since in coco-amber all the NetCDF files of previous and current iterations are transferred at each checkpoint, it may be useful to use longer checkpoint intervals: smaller intervals lead to heavy transfer of redundant data.
On the remote machine, inside the pilot-* folder you can find a folder called “unit.00000”. This location is used to exchange/link/move intermediate data. The shared data is kept in “unit.00000/” and the iteration specific inputs/outputs can be found in their specific folders (=”unit.00000/iter*”).
$ cd unit.00000/
$ ls
iter0/ iter1/ iter2/ iter3/ mdshort.in min.in penta.crd penta.top postexec.py
Running a Gromacs/LSDMap Workload¶
This section discusses the execution phase. The input to the tool is given in terms of a resource configuration file and a workload configuration file, and the execution is started based on the parameters set in these files. The following sections discuss execution on Stampede and on Archer.
Introduction¶
DM-d-MD (Diffusion-Map directed Molecular Dynamics) is an adaptive sampling algorithm based on LSDMap (Locally Scaled Diffusion Map), a nonlinear dimensionality reduction technique which provides a set of collective variables associated with slow time scales of Molecular Dynamics simulations (MD).
For an introduction to DM-d-MD, including how it can be used as a stand-alone tool, see J. Preto and C. Clementi, Phys. Chem. Chem. Phys., 2014, 16, 19181-19191.
In a nutshell, DM-d-MD consists of periodically restarting multiple parallel GROMACS MD trajectories from a distribution of configurations uniformly sampled along the LSDMap coordinates. In this way, during each DM-d-MD cycle, it becomes possible to visit a wider area of configuration space without remaining trapped in local minima, as can be the case for plain MD simulations. As another feature, DM-d-MD includes a reweighting scheme that is used to keep track of the free energy landscape all along the procedure. A typical DM-d-MD cycle includes the following steps:
- Simulation: Short MD trajectories are run using GROMACS starting from the set of configurations selected in step 3 (one trajectory per configuration). For the first cycle, the trajectories start from configurations specified within an input file provided by the user (option md_input_file in the Workload configuration file).
- Analysis: LSDMap is computed from the endpoints of each trajectory. The LSDMap coordinates are stored in a file called lsdmap.ev.
- Select + Reweighting: New configurations are selected among the endpoints so that the distribution of new configurations is uniform along the LSDMap coordinates. The same endpoint can be selected as a new configuration more than once, or not at all. At the same time, a statistical weight is assigned to each new configuration in order to recover the free energy landscape associated with regular MD. The weights are stored in a file called weight.w.
In common with the other ExTASY workflows, a user prepares the necessary input files and ExTASY configuration files on their local workstation, and launches the job from there, but the calculations are then performed on the execution host, which is typically an HPC resource.
Required Input files¶
The GROMACS/LSDMap (DM-d-MD) workflow requires the user to prepare at least three GROMACS-style files, one configuration file used for LSDMap, and two ExTASY configuration files.
- A topology file (.top format) (specified via the option top_file in the Workload configuration file).
- An initial structure file (.gro format) (specified via the option md_input_file in the Workload configuration file).
- A parameter file (.mdp format) for MD simulations (specified via the option mdp_file in the Workload configuration file).
- A configuration file used for LSDMap (.ini format) (specified via the option lsdm_config_file in the Workload configuration file)
- An ExTASY Resource configuration (*.rcfg) file.
- An ExTASY Workload configuration (*.wcfg) file.
For more information about .top, .gro and .mdp formats, we refer the user to the following website http://manual.gromacs.org/current/online/files.html. Please note that the parameter “nsteps” specified in the .mdp file should correspond to the number of MD time steps of each DM-d-MD cycle. Documentation on GROMACS can be found on the official website: http://www.gromacs.org.
Here is an example of a typical LSDMap configuration file (config.ini):
[LSDMAP]
;metric used to compute the distance matrix (rmsd, cmd, dihedral)
metric=rmsd
;constant r0 used with cmd metric
r0=0.05
[LOCALSCALE]
;status (constant, kneighbor, kneighbor_mean)
status=constant
;constant epsilon used in case status is constant
epsilon=0.05
;value of k in case status is kneighbor or kneighbor_mean
k=30
Notes:
- See the paper W. Zheng, M. A. Rohrdanz, M. Maggioni and C. Clementi, J. Chem. Phys., 2011, 134, 144109 for more information on how LSDMap works.
- metric is the metric used with LSDMap (only the rmsd, cmd (contact map distance) and dihedral metrics are currently supported; see the paper P. Cossio, A. Laio and F. Pietrucci, Phys. Chem. Chem. Phys., 2011, 13, 10421–10425, http://pubs.rsc.org/en/Content/ArticleLanding/2011/CP/c0cp02675a, for more information).
- status in the section LOCALSCALE refers to the way the local scale is computed when performing LSDMap. constant means that the local scale is the same for all the configurations and is equal to the value specified via the parameter epsilon (in nm). kneighbor implies that the local scale of each MD configuration is given as the distance to its kth nearest neighbor, where k is given by the parameter k. kneighbor_mean means that the local scale is the same for all the configurations and is equal to the average kth-neighbor distance.
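If you want to double-check the values in config.ini before submitting a run, the file can be read with the standard library ConfigParser. This is only a convenience snippet, not something the workflow requires:

try:
    from configparser import ConfigParser   # Python 3
except ImportError:
    from ConfigParser import ConfigParser   # Python 2

cfg = ConfigParser()
cfg.read('config.ini')

# Print the metric and local-scale settings described in the notes above.
print('metric : %s' % cfg.get('LSDMAP', 'metric'))
print('status : %s' % cfg.get('LOCALSCALE', 'status'))
if cfg.get('LOCALSCALE', 'status') == 'constant':
    print('epsilon: %s nm' % cfg.get('LOCALSCALE', 'epsilon'))
else:
    print('k      : %s' % cfg.get('LOCALSCALE', 'k'))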
The resource and workload configuration files are specific to each resource and are discussed in the following sections: first execution on Stampede, then execution on Archer.
Running on Stampede¶
This section is to be done entirely on your laptop. The ExTASY tool expects two input files:
- The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Stampede.
- The workload configuration file defines the GROMACS/LSDMap workload itself. The configuration file given in this example is strictly meant for the gromacs-lsdmap usecase only.
Step 1: Create a new directory for the example,
mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/
Step 2: Download the config files and the input files directly using the following link.
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/grlsd-on-stampede.tar
tar xf grlsd-on-stampede.tar
cd grlsd-on-stampede
Step 3: In the grlsd-on-stampede folder, a resource configuration file stampede.rcfg exists. Details and modifications required are as follows:
Note
For the purposes of this example, you only need to change:
- UNAME
- ALLOCATION
The other parameters in the resource configuration are already set up to successfully execute the workload in this example.
Step 4: In the grlsd-on-stampede folder, a workload configuration file gromacslsdmap.wcfg exists. Details and modifications required are as follows:
Note
All the parameters in the above example file are mandatory for gromacs-lsdmap. If ndxfile, grompp_options, mdrun_options and itp_file_loc are not required, they should be set to None; but they still have to be mentioned in the configuration file. There are no other parameters currently supported for these examples.
Step 5: You can find the executable script `extasy_gromacs_lsdmap.py` in the grlsd-on-stampede folder. Now you can run the workload using:
python extasy_gromacs_lsdmap.py --RPconfig stampede.rcfg --Kconfig gromacslsdmap.wcfg
Note
Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.
Note
Time to completion: ~13 mins (from the time job goes through LRMS)
Running on Archer¶
This section is to be done entirely on your laptop. The ExTASY tool expects two input files:
- The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Archer.
- The workload configuration file defines the GROMACS/LSDMap workload itself. The configuration file given in this example is strictly meant for the gromacs-lsdmap usecase only.
Step 1: Create a new directory for the example,
mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/
Step 2: Download the config files and the input files directly using the following link.
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/grlsd-on-archer.tar
tar xf grlsd-on-archer.tar
cd grlsd-on-archer
Step 3: In the grlsd-on-archer folder, a resource configuration file archer.rcfg exists. Details and modifications required are as follows:
Note
For the purposes of this example, you only need to change:
- UNAME
- ALLOCATION
The other parameters in the resource configuration are already set up to successfully execute the workload in this example.
Step 4: In the grlsd-on-archer folder, a workload configuration file gromacslsdmap.wcfg exists. Details and modifications required are as follows:
Note
All the parameters in the above example file are mandatory for gromacs-lsdmap. If ndxfile, grompp_options, mdrun_options and itp_file_loc are not required, they should be set to None; but they still have to be mentioned in the configuration file. There are no other parameters currently supported.
Step 5: You can find the executable script `extasy_gromacs_lsdmap.py` in the grlsd-on-archer folder. Now you can run the workload using:
python extasy_gromacs_lsdmap.py --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg
Note
Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.
Note
Time to completion: ~15 mins (from the time job goes through LRMS)
Running on localhost¶
The above two sections describe execution on XSEDE.Stampede and EPSRC.Archer, assuming you have access to these machines. This section describes the changes required to the EXISTING scripts in order to get Gromacs-LSDMap running on your local machine (label to be used = local.localhost, as in the generic examples).
Step 1: You might have already guessed the first step. You need to create a SingleClusterEnvironment object targeting the localhost machine. You can either directly make changes to the extasy_gromacs_lsdmap.py script or create a separate resource configuration file and provide it as an argument.
Step 2: The MD tools require some tool-specific environment variables to be set up (AMBERHOME, PYTHONPATH, GCC, GROMACS_DIR, etc.). Along with this, you need to set the PATH environment variable to point to the binaries (if any) of the MD tool. Once you have determined all the environment variables to be set, set them in your terminal and test them by executing the MD command (possibly for a sample case). For example, if you have GROMACS installed in $HOME as $HOME/gromacs-5, you probably have to set GROMACS_DIR to $HOME/gromacs-5 and append $HOME/gromacs-5/bin to PATH. Please check the official documentation of the MD tool.
Step 3: There are three options to proceed.
- Once you have tested the environment setup, you need to add it to the particular kernel definition. First, locate the file to be modified: all the files related to Ensemble Toolkit are located within the virtualenv (say “myenv”), under the path myenv/lib/python-2.7/site-packages/radical/ensemblemd/kernel_plugins/md. This path contains all the kernels used for the MD examples. Open the gromacs.py file and add an entry for local.localhost (in "machine_configs") as follows:

  "machine_configs":
  {
      ...
      "local.localhost":
      {
          "pre_exec"   : ["export GROMACS_DIR=$HOME/gromacs-5", "export PATH=$HOME/gromacs-5/bin:$PATH"],
          "executable" : ["mdrun"],
          "uses_mpi"   : False     # Could be True or False
      },
      ...
  }

  This would have to be repeated for all the kernels.
- Another option is to perform the same steps as above, but leave the "pre_exec" value as an empty list and set all the environment variables in your bashrc ($HOME/.bashrc). Remember that you would still need to set the executable as above.
- The third option is to create your own kernel plugin as part of your user script. This avoids the entire procedure of locating the existing kernel plugin files, and also gets you comfortable with using kernels other than the ones currently available as part of the package. Creating your own kernel plugins is discussed here.
Understanding the Output of the Examples¶
On the local machine, an “output” folder is created and, at the end of every checkpoint interval (=nsave), an “iter*” folder is created which contains the necessary files to start the next iteration.
For example, in the case of gromacs-lsdmap on stampede, for 4 iterations with nsave=2:
grlsd-on-stampede$ ls
output/ config.ini gromacslsdmap.wcfg grompp.mdp input.gro stampede.rcfg topol.top
grlsd-on-stampede/output$ ls
iter1/ iter3/
The “iter*” folder will not contain any of the initial files such as the topology file, minimization file, etc since they already exist on the local machine. In gromacs-lsdmap, the “iter*” folder contains the coordinate file and weight file required in the next iteration. It also contains a logfile about the lsdmap stage of the current iteration.
grlsd-on-stampede/output/iter1$ ls
2_input.gro lsdmap.log weight.w
On the remote machine, inside the pilot-* folder you can find a folder called “unit.00000”. This location is used to exchange/link/move intermediate data. The shared data is kept in “unit.00000/” and the iteration specific inputs/outputs can be found in their specific folders (=”unit.00000/iter*”).
$ cd unit.00000/
$ ls
config.ini gro.py input.gro iter1/ iter3/ post_analyze.py reweighting.py run.py spliter.py
grompp.mdp gro.pyc iter0/ iter2/ lsdm.py pre_analyze.py run_analyzer.sh select.py topol.top
As specified above, the outputs of the DM-d-MD procedure can be used to recover the free energy landscape of the system. It is, however, the responsibility of the user to decide how many DM-d-MD cycles to perform, depending on the region of configuration space they want to explore. In general, the larger the number of DM-d-MD cycles, the better; however, different systems may require more or fewer cycles to achieve a complete exploration of their free energy landscape. The free energy landscape can be plotted every nsave cycles. The .gro files in the iter* output folders can be used to compute any specific collective variables to build the free energy plot. The weights contained in the .w file should be used to “reweight” each configuration when computing the free energy histogram.
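As a minimal sketch of that reweighting step (this is an illustration, not part of ExTASY): assume you have computed a one-dimensional collective variable for every configuration in the .gro files, by whatever means you prefer, and loaded the corresponding weights from the .w file. numpy is required, and kT is given in the energy units you want for the profile (about 2.494 kJ/mol at 300 K).

import numpy as np

def free_energy_profile(cv, weights, kT=2.494, nbins=50):
    """Weighted histogram of a 1-D collective variable -> F = -kT ln P."""
    hist, edges = np.histogram(cv, bins=nbins, weights=weights, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    with np.errstate(divide='ignore'):          # empty bins give log(0)
        F = -kT * np.log(hist)
    F -= F[np.isfinite(F)].min()                # shift the minimum to zero
    return centers, F

# Example usage (file names are placeholders):
# cv      = np.loadtxt('colvar.dat')            # one value per configuration
# weights = np.loadtxt('weight.w')
# centers, F = free_energy_profile(cv, weights)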
API Reference¶
Execution Context API¶
class radical.ensemblemd.SingleClusterEnvironment (resource, cores, walltime, database_url, queue = None, username = None, allocation = None, cleanup = False)¶
A static execution context provides a fixed set of computational resources.
- name: Returns the name of the execution context
- allocate(): Allocates the resources
- deallocate(): Deallocates the resources
- run(pattern, force_plugin = None): Runs the given execution pattern on the allocated resources
- get_name(): Returns the name of the execution pattern
Execution Pattern API¶
pre_loop()¶
The radical.ensemblemd.Kernel
returned by pre_loop is executed before the main simulation-analysis loop is started. It can be used, for example, to set up structures, initialize experimental environments, or perform one-time data transfers.
- Returns:
- Implementations of this method must return either a single or a list of
radical.ensemblemd.Kernel
object(s). An exception is thrown otherwise.
simulation_step(iteration, instance)¶
The radical.ensemblemd.Kernel returned by simulation_step is executed once per loop iteration before analysis_step.
- Arguments:
- iteration [int] - The iteration parameter is a positive integer and references the current iteration of the simulation-analysis loop.
- instance [int] - The instance parameter is a positive integer and references the instance of the simulation step, which is in the range [1 .. simulation_instances].
- Returns:
- Implementations of this method must return either a single or a list of radical.ensemblemd.Kernel object(s). An exception is thrown otherwise.
analysis_step(iteration, instance)¶
The radical.ensemblemd.Kernel returned by analysis_step is executed once per loop iteration after simulation_step.
- Arguments:
- iteration [int] - The iteration parameter is a positive integer and references the current iteration of the simulation-analysis loop.
- instance [int] - The instance parameter is a positive integer and references the instance of the analysis step, which is in the range [1 .. analysis_instances].
- Returns:
- Implementations of this method must return either a single or a list of radical.ensemblemd.Kernel object(s). An exception is thrown otherwise.
post_loop()¶
The radical.ensemblemd.Kernel returned by post_loop is executed after the main simulation-analysis loop has finished. It can be used, for example, for final post-processing or clean-up tasks.
- Returns:
- Implementations of this method must return a single radical.ensemblemd.Kernel object. An exception is thrown otherwise.
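To see how these four methods fit together, here is a minimal sketch of a pattern subclass. It assumes the simulation-analysis pattern class is named SimulationAnalysisLoop and reuses the illustrative misc.ccount kernel from the examples below; treat the class and constructor-argument names as assumptions rather than a definitive implementation.

from radical.ensemblemd import Kernel
from radical.ensemblemd import SimulationAnalysisLoop   # assumed pattern class name


class RandomSA(SimulationAnalysisLoop):

    def __init__(self, iterations, simulation_instances=1, analysis_instances=1):
        SimulationAnalysisLoop.__init__(self, iterations,
                                        simulation_instances, analysis_instances)

    # pre_loop() and post_loop() are omitted here for brevity.

    def simulation_step(self, iteration, instance):
        # One simulation task per (iteration, instance) pair.
        k = Kernel(name="misc.ccount")
        k.arguments = ["--inputfile=input.txt",
                       "--outputfile=sim-%d-%d.txt" % (iteration, instance)]
        return k

    def analysis_step(self, iteration, instance):
        # One analysis task per (iteration, instance) pair; in a real workflow
        # the simulation output would be staged in via link_input_data or
        # copy_input_data (see the Application Kernel API below).
        k = Kernel(name="misc.ccount")
        k.arguments = ["--inputfile=input.txt",
                       "--outputfile=ana-%d-%d.txt" % (iteration, instance)]
        return k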
Application Kernel API¶
class radical.ensemblemd.Kernel (name, args = None)¶
The Kernel provides functions to support file movement as required by the pattern.
- cores: number of cores the kernel is using.
- upload_input_data: Instructs the application to upload one or more files or directories from the host the script is running on into the kernel’s execution directory.
- Example:
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.upload_input_data = ["/location/on/HOST/RUNNING/THE/SCRIPT/data.txt > input.txt"]
- download_input_data: Instructs the kernel to download one or more files or directories from a remote HTTP server into the kernel’s execution directory.
- Example:
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.download_input_data = ["http://REMOTE.WEBSERVER/location/data.txt > input.txt"]
- copy_input_data: Instructs the kernel to copy one or more files or directories from the execution host’s filesystem into the kernel’s execution directory.
- Example:
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.copy_input_data = ["/location/on/EXECUTION/HOST/data.txt > input.txt"]
- link_input_data: Instructs the kernel to create a link to one or more files or directories on the execution host’s filesystem in the kernel’s execution directory.
- Example:
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.link_input_data = ["/location/on/EXECUTION/HOST/data.txt > input.txt"]
- download_output_data: Instructs the application to download one or more files or directories from the kernel’s execution directory back to the host the script is running on.
- Example:
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.download_output_data = ["output.txt > output-run-1.txt"]
- copy_output_data: Instructs the application to copy one or more files or directories from the kernel’s execution directory to a directory on the execution host’s filesystem.
- Example:
k = Kernel(name="misc.ccount")
k.arguments = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.copy_output_data = ["output.txt > /location/on/EXECUTION/HOST/output.txt"]
- get_raw_args(): Returns the arguments passed to the kernel.
- get_arg(arg_name): Returns the value of the kernel argument given by ‘arg_name’.
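Putting these pieces together, a single kernel typically combines arguments with one or more staging directives; the short sketch below uses the illustrative misc.ccount kernel and hypothetical file names.

from radical.ensemblemd import Kernel

k = Kernel(name="misc.ccount")                 # illustrative kernel name
k.arguments            = ["--inputfile=input.txt", "--outputfile=output.txt"]
k.upload_input_data    = ["./data.txt > input.txt"]          # stage in from the local machine
k.download_output_data = ["output.txt > output-run-1.txt"]   # stage out to the local machine

print k.get_raw_args()             # the full argument list passed to the kernel
print k.get_arg("--inputfile=")    # value of a single argument, here 'input.txt'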
Exceptions & Errors¶
This module defines and implements all ensemblemd exceptions.
- exception radical.ensemblemd.exceptions.EnsemblemdError(msg): EnsemblemdError is the base exception thrown by the ensemblemd library. [source]
Bases: exceptions.Exception
- exception radical.ensemblemd.exceptions.NotImplementedError(method_name, class_name): NotImplementedError is thrown if a class method or function is not implemented. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
- exception radical.ensemblemd.exceptions.TypeError(expected_type, actual_type): TypeError is thrown if a parameter of a wrong type is passed to a method or function. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
- exception radical.ensemblemd.exceptions.FileError(message): FileError is thrown if something goes wrong related to file operations, i.e., if a file doesn’t exist, cannot be copied and so on. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
- exception radical.ensemblemd.exceptions.ArgumentError(kernel_name, message, valid_arguments_set): ArgumentError is thrown if a wrong set of arguments is passed to a kernel. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
- exception radical.ensemblemd.exceptions.NoKernelPluginError(kernel_name): NoKernelPluginError is thrown if no kernel plug-in could be found for a given kernel name. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
- exception radical.ensemblemd.exceptions.NoKernelConfigurationError(kernel_name, resource_key): NoKernelConfigurationError is thrown if no kernel configuration could be found for the provided resource key. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
- exception radical.ensemblemd.exceptions.NoExecutionPluginError(pattern_name, context_name, plugin_name): NoExecutionPluginError is thrown if a pattern is passed to an execution context but no execution plugin for the pattern exists. [source]
Bases: radical.ensemblemd.exceptions.EnsemblemdError
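Because all of these exceptions derive from EnsemblemdError, user scripts can catch specific error types first and fall back to the base class. A small self-contained sketch follows; the raised error is only there to stand in for whatever the toolkit might throw.

from radical.ensemblemd.exceptions import EnsemblemdError, ArgumentError, NoKernelPluginError

def run_workflow():
    # Placeholder for the usual allocate / run / deallocate sequence; any
    # toolkit error raised in here derives from EnsemblemdError.
    raise NoKernelPluginError(kernel_name="md.does_not_exist")

try:
    run_workflow()
except ArgumentError as e:
    print "Bad kernel arguments: %s" % e
except EnsemblemdError as e:
    # Base class of all ensemblemd exceptions, so this also catches the
    # NoKernelPluginError raised above.
    print "Ensemble MD Toolkit error: %s" % e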
Customization¶
Writing New Application Kernels¶
While the current set of available application kernels might provide a good set of tools to start with, sooner or later you will probably want to use a tool for which no application kernel exists. This section describes how you can add your own custom kernels.
We use two files: user_script.py, which contains the user application that uses our custom kernel, and new_kernel.py, which contains the definition of the custom kernel. You can download them from the following links:
Let’s first take a look at new_kernel.py.

from radical.ensemblemd.kernel_plugins.kernel_base import KernelBase

# ------------------------------------------------------------------------------
#
_KERNEL_INFO = {
    "name":         "sleep",                  # Mandatory
    "description":  "sleeping kernel",        # Optional
    "arguments":    {                         # Mandatory
        "--interval=": {
            "mandatory":   True,              # Mandatory argument? True or False
            "description": "Number of seconds to do nothing."
        },
    },
    "machine_configs":                        # Use a dictionary with keys as resource
                                              # names and values specific to the resource
    {
        "local.localhost":
        {
            "environment": None,              # list or None, can be used to set environment variables
            "pre_exec":    None,              # list or None, can be used to load modules
            "executable":  ["/bin/sleep"],    # specify the executable to be used
            "uses_mpi":    False              # mpi-enabled? True or False
        },
    }
}

# ------------------------------------------------------------------------------
#
class MyUserDefinedKernel(KernelBase):

    def __init__(self):
        """Le constructor."""
        super(MyUserDefinedKernel, self).__init__(_KERNEL_INFO)

    # --------------------------------------------------------------------------
    #
    @staticmethod
    def get_name():
        return _KERNEL_INFO["name"]

    def _bind_to_resource(self, resource_key):
        """This function binds the Kernel to a specific resource defined in
        "resource_key".
        """
        arguments = ['{0}'.format(self.get_arg("--interval="))]

        self._executable  = _KERNEL_INFO["machine_configs"][resource_key]["executable"]
        self._arguments   = arguments
        self._environment = _KERNEL_INFO["machine_configs"][resource_key]["environment"]
        self._uses_mpi    = _KERNEL_INFO["machine_configs"][resource_key]["uses_mpi"]
        self._pre_exec    = _KERNEL_INFO["machine_configs"][resource_key]["pre_exec"]

# ------------------------------------------------------------------------------
The _KERNEL_INFO dictionary at the top of the file contains information about the kernel to be defined. The “name” and “arguments” keys are mandatory; the “arguments” key specifies the arguments the kernel expects, and you can mark each individual argument as mandatory or optional. “machine_configs” is not mandatory, but providing a dictionary with resource names (the same names as used in the SingleClusterEnvironment) as keys and resource-specific settings as values lets the same kernel be used on different machines.
Below that, we define a user-defined class (of “KernelBase” type) with three mandatory methods. First the constructor, which is self-explanatory. Second, a static get_name method that is used by EnsembleMD to differentiate kernels. Third, _bind_to_resource, which (as the name suggests) binds the kernel to its resource-specific values during execution. The assignments at the end of _bind_to_resource show how the “machine_configs” dictionary approach addresses tool-level heterogeneity across resources. There might be other ways to do this (if conditions, etc.), but we feel this is quite convenient.
Now, let’s take a look at user_script.py
from radical.ensemblemd import Kernel
from radical.ensemblemd import Pipeline
from radical.ensemblemd import EnsemblemdError
from radical.ensemblemd import SingleClusterEnvironment

# Used to register user-defined kernels
from radical.ensemblemd.engine import get_engine

# Import our new kernel
from new_kernel import MyUserDefinedKernel

# Register the user-defined kernel with Ensemble MD Toolkit.
get_engine().add_kernel_plugin(MyUserDefinedKernel)


# Now carry on with your application as usual!
class Sleep(Pipeline):

    def __init__(self, instances, steps):
        Pipeline.__init__(self, instances, steps)

    def step_1(self, instance):
        """This step sleeps for 10 seconds."""
        k = Kernel(name="sleep")
        k.arguments = ["--interval=10"]
        return k

# ------------------------------------------------------------------------------
#
if __name__ == "__main__":

    try:
        # Create a new static execution context with one resource and a fixed
        # number of cores and runtime.
        cluster = SingleClusterEnvironment(
            resource="local.localhost",
            cores=1,
            walltime=15,
            username=None,
            project=None
        )

        # Allocate the resources.
        cluster.allocate()

        # Set the 'instances' of the pipeline to 16. This means that 16 instances
        # of each pipeline step are executed.
        #
        # Execution of the 16 pipeline instances can happen concurrently or
        # sequentially, depending on the resources (cores) available in the
        # SingleClusterEnvironment.
        sleep = Sleep(steps=1, instances=16)

        cluster.run(sleep)

        cluster.deallocate()

    except EnsemblemdError, er:
        print "Ensemble MD Toolkit Error: {0}".format(str(er))
        raise  # Just raise the exception again to get the backtrace
There are three important additions in this script: we import the get_engine function (so that new kernels can be registered), we import our new kernel from new_kernel.py, and we register the kernel with the engine via add_kernel_plugin. THAT’S IT. We can continue with the application as in the previous examples.
Writing a Custom Resource Configuration File¶
A number of resources are already supported by RADICAL-Pilot; they are listed in List of Pre-Configured Resources. If you want to use RADICAL-Pilot with a resource that is not in any of the provided configuration files, you can write your own and drop it in $HOME/.radical/pilot/configs/<your_site>.json.
Note
Be advised that you may need system-admin-level knowledge of the target cluster to do so. Also, while RADICAL-Pilot can handle very different types of systems and batch systems, it may run into trouble on specific configurations or versions we have not encountered before. If you run into trouble using a cluster not in our list of officially supported ones, please drop us a note on the users mailing list.
A configuration file has to be valid JSON. The structure is as follows:
# filename: lrz.json
{
    "supermuc": {
        "description"             : "The SuperMUC petascale HPC cluster at LRZ.",
        "notes"                   : "Access only from registered IP addresses.",
        "schemas"                 : ["gsissh", "ssh"],
        "ssh"                     : {
            "job_manager_endpoint" : "loadl+ssh://supermuc.lrz.de/",
            "filesystem_endpoint"  : "sftp://supermuc.lrz.de/"
        },
        "gsissh"                  : {
            "job_manager_endpoint" : "loadl+gsissh://supermuc.lrz.de:2222/",
            "filesystem_endpoint"  : "gsisftp://supermuc.lrz.de:2222/"
        },
        "default_queue"           : "test",
        "lrms"                    : "LOADL",
        "task_launch_method"      : "SSH",
        "mpi_launch_method"       : "MPIEXEC",
        "forward_tunnel_endpoint" : "login03",
        "global_virtenv"          : "/home/hpc/pr87be/di29sut/pilotve",
        "pre_bootstrap"           : ["source /etc/profile",
                                     "source /etc/profile.d/modules.sh",
                                     "module load python/2.7.6",
                                     "module unload mpi.ibm",
                                     "module load mpi.intel",
                                     "source /home/hpc/pr87be/di29sut/pilotve/bin/activate"
                                    ],
        "valid_roots"             : ["/home", "/gpfs/work", "/gpfs/scratch"],
        "pilot_agent"             : "radical-pilot-agent-multicore.py"
    },
    "ANOTHER_KEY_NAME": {
        ...
    }
}
The name of your file (here lrz.json) together with the name of the resource (supermuc) form the resource key which is used in the ComputePilotDescription resource attribute (lrz.supermuc).
All fields are mandatory, unless indicated otherwise below.
- description: a human readable description of the resource
- notes: information needed to form valid pilot descriptions, such as which parameters are required, etc.
- schemas: allowed values for the access_schema parameter of the pilot description. The first schema in the list is used by default. For each schema, a subsection is needed which specifies job_manager_endpoint and filesystem_endpoint.
- job_manager_endpoint: access url for pilot submission (interpreted by SAGA)
- filesystem_endpoint: access url for file staging (interpreted by SAGA)
- default_queue: queue to use for pilot submission (optional)
- lrms: type of job management system (LOADL, LSF, PBSPRO, SGE, SLURM, TORQUE, FORK)
- task_launch_method: type of compute node access (required for non-MPI units: SSH, APRUN or LOCAL)
- mpi_launch_method: type of MPI support (required for MPI units: MPIRUN, MPIEXEC, APRUN, IBRUN or POE)
- python_interpreter: path to python (optional)
- pre_bootstrap: list of commands to execute for initialization (optional)
- valid_roots: list of shared file system roots (optional). Pilot sandboxes must lie under these roots.
- pilot_agent: type of pilot agent to use (radical-pilot-agent-multicore.py)
- forward_tunnel_endpoint: name of host which can be used to create ssh tunnels from the compute nodes to the outside world (optional)
Several configuration files are part of the RADICAL-Pilot installation, and live under radical/pilot/configs/.
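Once the file is in place, the new resource key can be used like any pre-configured resource, for example as follows (a sketch only; queue, core count, walltime, credentials and the MongoDB URL are placeholders):

from radical.ensemblemd import SingleClusterEnvironment

# 'lrz.supermuc' = <filename without .json> . <resource key inside the file>
cluster = SingleClusterEnvironment(
    resource="lrz.supermuc",
    cores=32,
    walltime=60,
    queue="test",
    username="<your_username>",
    allocation="<your_project>",
    database_url="mongodb://<host>:<port>/radicalpilot"
)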
List of Pre-Configured Resources¶
The following resources are supported by the underlying layers of ExTASY.
Note
To configure your applications to run on these machines, you need to add entries to your kernel definitions that specify, for each resource, the environment to be loaded for execution, the executable, arguments, etc., as sketched below.
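For example, to make the sleep kernel from the Customization section runnable on Stampede, one could add an entry keyed by the resource label to its machine_configs dictionary. This is a hedged sketch: the pre_exec module name is a placeholder and not taken from the actual Stampede software stack.

# Hypothetical machine_configs entry for the 'sleep' kernel defined in the
# Customization section; replace pre_exec and executable with what your tool needs.
_KERNEL_INFO["machine_configs"]["xsede.stampede"] = {
    "environment": None,                           # environment variables to set, if any
    "pre_exec":    ["module load <your_module>"],  # commands run before the executable
    "executable":  ["/bin/sleep"],                 # the tool to launch on Stampede
    "uses_mpi":    False                           # whether to launch with MPI
}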
RESOURCE_LOCAL¶
LOCALHOST_SPARK_ANA¶
Your local machine gets spark.
- Resource label :
local.localhost_spark_ana
- Raw config :
resource_local.json
- Note : To use the ssh schema, make sure that ssh access to localhost is enabled.
- Default values for ComputePilotDescription attributes:
queue : None
sandbox : $HOME
access_schema : local
- Available schemas :
local, ssh
LOCALHOST_YARN¶
Your local machine. Uses the YARN resource management system.
- Resource label :
local.localhost_yarn
- Raw config :
resource_local.json
- Note : To use the ssh schema, make sure that ssh access to localhost is enabled.
- Default values for ComputePilotDescription attributes:
queue : None
sandbox : $HOME
access_schema : local
- Available schemas :
local, ssh
LOCALHOST_ANACONDA¶
Your local machine. To be used when the Anaconda python interpreter is enabled.
- Resource label :
local.localhost_anaconda
- Raw config :
resource_local.json
- Note : To use the ssh schema, make sure that ssh access to localhost is enabled.
- Default values for ComputePilotDescription attributes:
queue : None
sandbox : $HOME
access_schema : local
- Available schemas :
local, ssh
LOCALHOST¶
Your local machine.
- Resource label :
local.localhost
- Raw config :
resource_local.json
- Note : To use the ssh schema, make sure that ssh access to localhost is enabled.
- Default values for ComputePilotDescription attributes:
queue : None
sandbox : $HOME
access_schema : local
- Available schemas :
local, ssh
LOCALHOST_SPARK¶
Your local machine gets spark.
- Resource label :
local.localhost_spark
- Raw config :
resource_local.json
- Note : To use the ssh schema, make sure that ssh access to localhost is enabled.
- Default values for ComputePilotDescription attributes:
queue : None
sandbox : $HOME
access_schema : local
- Available schemas :
local, ssh
RESOURCE_EPSRC¶
ARCHER¶
The EPSRC Archer Cray XC30 system (https://www.archer.ac.uk/)
- Resource label :
epsrc.archer
- Raw config :
resource_epsrc.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : standard
sandbox : /work/`id -gn`/`id -gn`/$USER
access_schema : ssh
- Available schemas :
ssh
ARCHER_ORTE¶
The EPSRC Archer Cray XC30 system (https://www.archer.ac.uk/)
- Resource label :
epsrc.archer_orte
- Raw config :
resource_epsrc.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : standard
sandbox : /work/`id -gn`/`id -gn`/$USER
access_schema : ssh
- Available schemas :
ssh
RESOURCE_NERSC¶
EDISON_CCM¶
The NERSC Edison Cray XC30 in Cluster Compatibility Mode (https://www.nersc.gov/users/computational-systems/edison/)
- Resource label :
nersc.edison_ccm
- Raw config :
resource_nersc.json
- Note : For CCM you need to use special ccm_ queues.
- Default values for ComputePilotDescription attributes:
queue : ccm_queue
sandbox : $SCRATCH
access_schema : ssh
- Available schemas :
ssh
EDISON¶
The NERSC Edison Cray XC30 (https://www.nersc.gov/users/computational-systems/edison/)
- Resource label :
nersc.edison
- Raw config :
resource_nersc.json
- Note :
- Default values for ComputePilotDescription attributes:
queue : regular
sandbox : $SCRATCH
access_schema : ssh
- Available schemas :
ssh, go
HOPPER¶
The NERSC Hopper Cray XE6 (https://www.nersc.gov/users/computational-systems/hopper/)
- Resource label :
nersc.hopper
- Raw config :
resource_nersc.json
- Note :
- Default values for ComputePilotDescription attributes:
queue : regular
sandbox : $SCRATCH
access_schema : ssh
- Available schemas :
ssh, go
HOPPER_APRUN¶
The NERSC Hopper Cray XE6 (https://www.nersc.gov/users/computational-systems/hopper/)
- Resource label :
nersc.hopper_aprun
- Raw config :
resource_nersc.json
- Note : Only one CU per node in APRUN mode
- Default values for ComputePilotDescription attributes:
queue : regular
sandbox : $SCRATCH
access_schema : ssh
- Available schemas :
ssh
HOPPER_CCM¶
The NERSC Hopper Cray XE6 in Cluster Compatibility Mode (https://www.nersc.gov/users/computational-systems/hopper/)
- Resource label :
nersc.hopper_ccm
- Raw config :
resource_nersc.json
- Note : For CCM you need to use special ccm_ queues.
- Default values for ComputePilotDescription attributes:
queue : ccm_queue
sandbox : $SCRATCH
access_schema : ssh
- Available schemas :
ssh
EDISON_APRUN¶
The NERSC Edison Cray XC30 (https://www.nersc.gov/users/computational-systems/edison/)
- Resource label :
nersc.edison_aprun
- Raw config :
resource_nersc.json
- Note : Only one CU per node in APRUN mode
- Default values for ComputePilotDescription attributes:
queue : regular
sandbox : $SCRATCH
access_schema : ssh
- Available schemas :
ssh, go
RESOURCE_STFC¶
JOULE¶
The STFC Joule IBM BG/Q system (http://community.hartree.stfc.ac.uk/wiki/site/admin/home.html)
- Resource label :
stfc.joule
- Raw config :
resource_stfc.json
- Note : This currently needs a centrally administered outbound ssh tunnel.
- Default values for ComputePilotDescription attributes:
queue : prod
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh
RESOURCE_RICE¶
DAVINCI¶
The DAVinCI Linux cluster at Rice University (https://docs.rice.edu/confluence/display/ITDIY/Getting+Started+on+DAVinCI).
- Resource label :
rice.davinci
- Raw config :
resource_rice.json
- Note : DAVinCI compute nodes have 12 or 16 processor cores per node.
- Default values for ComputePilotDescription attributes:
queue : parallel
sandbox : $SHARED_SCRATCH/$USER
access_schema : ssh
- Available schemas :
ssh
BIOU¶
The Blue BioU Linux cluster at Rice University (https://docs.rice.edu/confluence/display/ITDIY/Getting+Started+on+Blue+BioU).
- Resource label :
rice.biou
- Raw config :
resource_rice.json
- Note : Blue BioU compute nodes have 32 processor cores per node.
- Default values for ComputePilotDescription attributes:
queue : serial
sandbox : $SHARED_SCRATCH/$USER
access_schema : ssh
- Available schemas :
ssh
RESOURCE_LRZ¶
SUPERMUC¶
The SuperMUC petascale HPC cluster at LRZ, Munich (http://www.lrz.de/services/compute/supermuc/).
- Resource label :
lrz.supermuc
- Raw config :
resource_lrz.json
- Note : Default authentication to SuperMUC uses X509 and is firewalled, make sure you can gsissh into the machine from your registered IP address. Because of outgoing traffic restrictions your MongoDB needs to run on a port in the range 20000 to 25000.
- Default values for ComputePilotDescription attributes:
queue : test
sandbox : $HOME
access_schema : gsissh
- Available schemas :
gsissh, ssh
RESOURCE_NCSA¶
BW_CCM¶
The NCSA Blue Waters Cray XE6/XK7 system in CCM (https://bluewaters.ncsa.illinois.edu/)
- Resource label :
ncsa.bw_ccm
- Raw config :
resource_ncsa.json
- Note : Running ‘touch .hushlogin’ on the login node will reduce the likelihood of prompt detection issues.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : /scratch/sciteam/$USER
access_schema : gsissh
- Available schemas :
gsissh
BW¶
The NCSA Blue Waters Cray XE6/XK7 system (https://bluewaters.ncsa.illinois.edu/)
- Resource label :
ncsa.bw
- Raw config :
resource_ncsa.json
- Note : Running ‘touch .hushlogin’ on the login node will reduce the likelihood of prompt detection issues.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : /scratch/sciteam/$USER
access_schema : gsissh
- Available schemas :
gsissh
BW_LOCAL¶
The NCSA Blue Waters Cray XE6/XK7 system (https://bluewaters.ncsa.illinois.edu/)
- Resource label :
ncsa.bw_local
- Raw config :
resource_ncsa.json
- Note : Running ‘touch .hushlogin’ on the login node will reduce the likelihood of prompt detection issues.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : /scratch/training/$USER
access_schema : local
- Available schemas :
local
BW_APRUN¶
The NCSA Blue Waters Cray XE6/XK7 system (https://bluewaters.ncsa.illinois.edu/)
- Resource label :
ncsa.bw_aprun
- Raw config :
resource_ncsa.json
- Note : Running ‘touch .hushlogin’ on the login node will reduce the likelihood of prompt detection issues.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : /scratch/sciteam/$USER
access_schema : gsissh
- Available schemas :
gsissh
RESOURCE_RADICAL¶
TUTORIAL¶
Our private tutorial VM on EC2
- Resource label :
radical.tutorial
- Raw config :
resource_radical.json
- Default values for ComputePilotDescription attributes:
queue : batch
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, local
RESOURCE_XSEDE¶
LONESTAR¶
The XSEDE ‘Lonestar’ cluster at TACC (https://www.tacc.utexas.edu/resources/hpc/lonestar).
- Resource label :
xsede.lonestar
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
COMET_SPARK¶
The Comet HPC resource at SDSC ‘HPC for the 99%’ (http://www.sdsc.edu/services/hpc/hpc_systems.html#comet).
- Resource label :
xsede.comet_spark
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : compute
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
WRANGLER¶
The XSEDE ‘Wrangler’ cluster at TACC (https://www.tacc.utexas.edu/wrangler/).
- Resource label :
xsede.wrangler
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $WORK
access_schema : ssh
- Available schemas :
ssh, gsissh, go
STAMPEDE_YARN¶
The XSEDE ‘Stampede’ cluster at TACC (https://www.tacc.utexas.edu/stampede/).
- Resource label :
xsede.stampede_yarn
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $WORK
access_schema : gsissh
- Available schemas :
gsissh, ssh, go
STAMPEDE¶
The XSEDE ‘Stampede’ cluster at TACC (https://www.tacc.utexas.edu/stampede/).
- Resource label :
xsede.stampede
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $WORK
access_schema : gsissh
- Available schemas :
gsissh, ssh, go
COMET_SSH¶
The Comet HPC resource at SDSC ‘HPC for the 99%’ (http://www.sdsc.edu/services/hpc/hpc_systems.html#comet).
- Resource label :
xsede.comet_ssh
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : compute
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
WRANGLER_YARN¶
The XSEDE ‘Wrangler’ cluster at TACC (https://www.tacc.utexas.edu/wrangler/).
- Resource label :
xsede.wrangler_yarn
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : hadoop
sandbox : $WORK
access_schema : ssh
- Available schemas :
ssh, gsissh, go
GORDON¶
The XSEDE ‘Gordon’ cluster at SDSC (http://www.sdsc.edu/us/resources/gordon/).
- Resource label :
xsede.gordon
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
BLACKLIGHT¶
The XSEDE ‘Blacklight’ cluster at PSC (https://www.psc.edu/index.php/computing-resources/blacklight).
- Resource label :
xsede.blacklight
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : batch
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
WRANGLER_SPARK¶
The XSEDE ‘Wrangler’ cluster at TACC (https://www.tacc.utexas.edu/wrangler/).
- Resource label :
xsede.wrangler_spark
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $WORK
access_schema : ssh
- Available schemas :
ssh, gsissh, go
COMET¶
The Comet HPC resource at SDSC ‘HPC for the 99%’ (http://www.sdsc.edu/services/hpc/hpc_systems.html#comet).
- Resource label :
xsede.comet
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : compute
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
SUPERMIC¶
SuperMIC (pronounced ‘Super Mick’) is Louisiana State University’s (LSU) newest supercomputer funded by the National Science Foundation’s (NSF) Major Research Instrumentation (MRI) award to the Center for Computation & Technology. (https://portal.xsede.org/lsu-supermic)
- Resource label :
xsede.supermic
- Raw config :
resource_xsede.json
- Note : Partially allocated through XSEDE. Primary access through GSISSH. Allows SSH key authentication too.
- Default values for ComputePilotDescription attributes:
queue : workq
sandbox : /work/$USER
access_schema : gsissh
- Available schemas :
gsissh, ssh
STAMPEDE_SPARK¶
The XSEDE ‘Stampede’ cluster at TACC (https://www.tacc.utexas.edu/stampede/).
- Resource label :
xsede.stampede_spark
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $WORK
access_schema : gsissh
- Available schemas :
gsissh, ssh, go
TRESTLES¶
The XSEDE ‘Trestles’ cluster at SDSC (http://www.sdsc.edu/us/resources/trestles/).
- Resource label :
xsede.trestles
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : normal
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
GREENFIELD¶
The XSEDE ‘Greenfield’ cluster at PSC (https://www.psc.edu/index.php/computing-resources/greenfield).
- Resource label :
xsede.greenfield
- Raw config :
resource_xsede.json
- Note : Always set the project attribute in the ComputePilotDescription or the pilot will fail.
- Default values for ComputePilotDescription attributes:
queue : batch
sandbox : $HOME
access_schema : ssh
- Available schemas :
ssh, gsissh
RESOURCE_ORNL¶
TITAN_ORTE¶
The Cray XK7 supercomputer located at the Oak Ridge Leadership Computing Facility (OLCF), (https://www.olcf.ornl.gov/titan/)
- Resource label :
ornl.titan_orte
- Raw config :
resource_ornl.json
- Note : Requires the use of an RSA SecurID on every connection.
- Default values for ComputePilotDescription attributes:
queue : batch
sandbox : $MEMBERWORK/`groups | cut -d' ' -f2`
access_schema : ssh
- Available schemas :
ssh, local, go
TITAN_APRUN¶
The Cray XK7 supercomputer located at the Oak Ridge Leadership Computing Facility (OLCF), (https://www.olcf.ornl.gov/titan/)
- Resource label :
ornl.titan_aprun
- Raw config :
resource_ornl.json
- Note : Requires the use of an RSA SecurID on every connection.
- Default values for ComputePilotDescription attributes:
queue : batch
sandbox : $MEMBERWORK/`groups | cut -d' ' -f2`
access_schema : ssh
- Available schemas :
ssh, local, go
Troubleshooting¶
Some issues that you might face during the execution are discussed here.
Execution fails with “Couldn’t read packet: Connection reset by peer”¶
You may encounter the following error when running any of the ExTASY workflows:
...
#######################
## ERROR ##
#######################
Pilot 54808707f8cdba339a7204ce has FAILED. Can't recover.
Pilot log: [u'Pilot launching failed: Insufficient system resources: Insufficient system resources: read from process failed \'[Errno 5] Input/output error\' : (Shared connection to stampede.tacc.utexas.edu closed.\n)
...
To fix this, create a file ~/.saga/cfg in your home directory and add the following two lines:
[saga.utils.pty]
ssh_share_mode = no
This switches the SSH transfer layer into “compatibility” mode which should address the “Connection reset by peer” problem.
Configuring SSH Access¶
From a terminal from your local machine, setup a key pair with your email address.
$ ssh-keygen -t rsa -C "name@email.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa): [Enter]
Enter passphrase (empty for no passphrase): [Passphrase]
Enter same passphrase again: [Passphrase]
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
03:d4:c4:6d:58:0a:e2:4a:f8:73:9a:e8:e3:07:16:c8 your@email.ac.uk
The key's randomart image is:
+--[ RSA 2048]----+
| . ...+o++++.    |
|  . . . =o..     |
|+ . . .......o o |
|oE .  .          |
|o =  .   S       |
|. +.+  .         |
|.  oo            |
|.  .             |
| ..              |
+-----------------+
Next you need to transfer it to the remote machine.
To transfer to Stampede,
$ cat ~/.ssh/id_rsa.pub | ssh username@stampede.tacc.utexas.edu 'cat - >> ~/.ssh/authorized_keys'
To transfer to Archer,
$ cat ~/.ssh/id_rsa.pub | ssh username@login.archer.ac.uk 'cat - >> ~/.ssh/authorized_keys'
Error: Permission denied (publickey,keyboard-interactive) in AGENT.STDERR¶
The Pilot does not start running and goes to the ‘Done’ state directly from ‘PendingActive’. Please check the AGENT.STDERR file for “Permission denied (publickey,keyboard-interactive)”.
Permission denied (publickey,keyboard-interactive). kill: 19932: No such process
You need to set up passwordless, intra-node SSH access. Although this is the default on most HPC clusters, it might not always be the case.
On the head-node, run:
cd ~/.ssh/
ssh-keygen -t rsa
Do not enter a passphrase. The result should look like this:
Generating public/private rsa key pair.
Enter file in which to save the key (/home/e290/e290/oweidner/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/e290/e290/oweidner/.ssh/id_rsa.
Your public key has been saved in /home/e290/e290/oweidner/.ssh/id_rsa.pub.
The key fingerprint is:
73:b9:cf:45:3d:b6:a7:22:72:90:28:0a:2f:8a:86:fd oweidner@eslogin001
The key's randomart image is:
+--[ RSA 2048]----+
| . ...+o++++.    |
|  . . . =o..     |
|+ . . .......o o |
|oE .  .          |
|o =  .   S       |
|. +.+  .         |
|.  oo            |
|.  .             |
| ..              |
+-----------------+
Next, you need to add this key to the authorized_keys file.
cat id_rsa.pub >> ~/.ssh/authorized_keys
This should be all. Next time you run radical.pilot, you shouldn’t see that error message anymore.
Error: Couldn’t create new session¶
If you get an error similar to,
An error occurred: Couldn't create new session (database URL 'mongodb://extasy:extasyproject@extasy-db.epcc.ac.uk/radicalpilot' incorrect?): [Errno -2] Name or service not known
Exception triggered, no session created, exiting now...
This means no session was created, most likely due to an error in the MongoDB URL present in the resource configuration file. Please check the URL that you have used. If the URL is correct, check the system on which MongoDB is hosted.
Error: Prompted for unknown password¶
If you get an error similar to,
An error occurred: prompted for unknown password (username@stampede.tacc.utexas.edu's password: ) (/experiments/extasy/local/lib/python2.7/site-packages/saga/utils/pty_shell_factory.py +306 (_initialize_pty) : % match))
You should check the username present in the resource configuration file. If the username is correct, check whether you have passwordless login set up for the target machine. You can verify this by simply attempting to log in to the target machine; if this attempt requires a password, you need to set up passwordless login before using ExTASY.
Error: Pilot has FAILED. Can’t recover¶
If you get an error similar to,
ExTASY version : 0.1.3-beta-15-g9e16ce7
Session UID: 55102e9023769c19e7c8a84e
Pilot UID : 55102e9123769c19e7c8a850
[Callback]: ComputePilot '55102e9123769c19e7c8a850' state changed to Launching.
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/mmpbsa.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/coco.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/namd.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/lsdmap.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/amber.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/gromacs.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/sleep.json
Loading kernel configurations from /experiments/extasy/lib/python2.7/site-packages/radical/ensemblemd/mdkernels/configs/test.json
Preprocessing stage ....
[Callback]: ComputePilot '55102e9123769c19e7c8a850' state changed to Failed.
#######################
## ERROR ##
#######################
Pilot 55102e9123769c19e7c8a850 has FAILED. Can't recover.
Pilot log: [<radical.pilot.logentry.Logentry object at 0x7f41f8043a10>, <radical.pilot.logentry.Logentry object at 0x7f41f8043610>, <radical.pilot.logentry.Logentry object at 0x7f41f80433d0>, <radical.pilot.logentry.Logentry object at 0x7f41f8043750>, <radical.pilot.logentry.Logentry object at 0x7f41f8043710>, <radical.pilot.logentry.Logentry object at 0x7f41f8043690>]
Execution was interrupted
Closing session, exiting now ...
This generally means that either the allocation ID or the queue name present in the resource configuration file is incorrect. If this is not the case, please re-run the experiment with the environment variables EXTASY_DEBUG=True, SAGA_VERBOSE=DEBUG and RADICAL_PILOT_VERBOSE=DEBUG set. For example:
EXTASY_DEBUG=True SAGA_VERBOSE=DEBUG RADICAL_PILOT_VERBOSE=DEBUG extasy --RPconfig stampede.rcfg --Kconfig gromacslsdmap.wcfg 2> output.log
This should generate a more verbose output. You can look through this verbose output for errors, or create a ticket with this log here.
Couldn’t send packet: Broken pipe¶
If you get an error similar to,
2015:03:30 16:05:07 radical.pilot.MainProcess: [DEBUG ] read : [ 19] [ 159] ( ls /work/e290/e290/e290ib/radical.pilot.sandbox/pilot-55196431d7bf7579ecc ^H3f080/unit-551965f7d7bf7579ecc3f09b/lsdmap.log\nCouldn't send packet: Broken pipe\n)
2015:03:30 16:05:08 radical.pilot.MainProcess: [ERROR ] Output transfer failed: read from process failed '[Errno 5] Input/output error' : (s --:-- ETA/home/h012/ibethune/testlsdmap2/input.gro 100% 105KB 104.7KB/s 00:00
sftp> ls /work/e290/e290/e290ib/radical.pilot.sandbox/pilot-55196431d7bf7579ecc ^H3f080/unit-551965f7d7bf7579ecc3f09b/lsdmap.log
Couldn't send packet: Broken pipe
This is mostly because an older version of sftp/scp is being used. It can be fixed by setting the environment variable SAGA_PTY_SSH_SHAREMODE to no.
export SAGA_PTY_SSH_SHAREMODE=no