Welcome to VirAmp’s documentation!¶
VirAmp is a Galaxy-based system for fast virus genome assembly and variation discovery.
Quick Start Guide:
1) Launch the latest version of the “Szpara_Viramp” AMI from Amazon Web Services.
2) SSH into the server and start the run.sh script inside a screen session:
./run.sh
Introduction¶
The following graphic is an overview of how the VirAmp platform works:

Advances in next-generation sequencing make it possible to obtain high-coverage sequence data for large numbers of viral strains in a short time. However, since most bioinformatics tools are developed for command-line use, the selection and accessibility of computational tools for genome assembly and variation analysis limits the ability of individual labs to perform further bioinformatics analysis.
We have developed a multi-step viral genome assembly pipeline named VirAmp that combines existing tools and techniques and presents them to end users via a web-enabled Galaxy interface. Our pipeline allows users to assemble, analyze and interpret high-coverage viral sequencing data with an ease and efficiency that was not possible previously. Our software makes a large number of genome assembly and related tools available to life scientists and automates the currently recommended best practices into a single, easy-to-use interface. We tested our pipeline with three different datasets from human herpes simplex virus (HSV).
VirAmp provides a user-friendly interface and a complete pipeline for viral genome analysis. We make our software available via an Amazon Elastic Compute Cloud (EC2) disk image that can be easily launched by anyone with an Amazon Web Services account. A demonstration version of our system can be found at http://www.viramp.com. We also maintain detailed documentation on each tool and methodology at http://docs.viramp.com.
Usage¶
This is a general description of the usage and function of each tool found in the VirAmp pipeline. A more detailed description can be found at the webpage of each tool.
One-click pipeline¶
Two general pipelines are provided with a one-click option, one for paired-end data and the other for single-end data. Users are only required to submit read files and a reference file corresponding to their data. Instead of the default settings, users may use the “advanced setting” option to configure the pipeline with alternative parameters.

Quality Control¶
First, trim the low-quality bases from the reads in the input FASTQ files. This can be achieved either by removing bases whose quality falls below a threshold or by trimming a fixed length from each end of every read.
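The two trimming strategies described above can be sketched as follows. This is a minimal illustration, not the code VirAmp runs: the function names, the quality threshold of 20, and the Phred+33 quality encoding are all assumptions made for the example.

```python
def trim_low_quality_ends(seq, qual, min_q=20, offset=33):
    """Trim bases below min_q from both ends of one read.

    Assumes Phred+33 encoded quality strings (hypothetical default).
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    # Walk inwards from the 5' end until a good base is found.
    while start < end and scores[start] < min_q:
        start += 1
    # Walk inwards from the 3' end until a good base is found.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_fixed_length(seq, qual, left=0, right=0):
    """Remove a fixed number of bases from each end of one read."""
    end = max(left, len(seq) - right)
    return seq[left:end], qual[left:end]
```

For example, with the read "ACGTACGT" and qualities "##IIII##", the first function keeps only the four central bases, while trim_fixed_length(seq, qual, 1, 2) removes one base from the start and two from the end regardless of quality.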

Diginorm¶
Next, reduce coverage and bias using digital normalization. This step reduces both sample variation and sample bias by discarding redundant reads from regions that are already covered deeply.
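The idea behind digital normalization can be sketched in a few lines: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below a coverage cutoff. The sketch below is a simplification for illustration. It uses an exact in-memory counter and ignores reverse complements, and the function name and the default values of k and cutoff are assumptions, not VirAmp's settings.

```python
from collections import Counter
from statistics import median


def digital_normalization(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read is shorter than k, nothing to count
        # Counter returns 0 for unseen k-mers without storing them.
        if median(counts[kmer] for kmer in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)
    return kept
```

Reads from highly covered regions stop being added once their k-mers have been seen often enough, while reads from low-coverage regions are always retained.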

de novo Contig assembly¶
Now, the pipeline assembles the short reads into longer contigs. By default the one-click pipeline uses Velvet. Two alternatives, SPAdes and VICUNA, are provided and can be selected either as individual tools or through the advanced options in the one-click pipeline.

Reference-based scaffolding¶
The contigs are then assembled into even longer super-contigs, using the reference genome as a guide. This step is based on a modified version of AMOScmp.

Reference-independent scaffolding¶
The next step extends the super-contigs and connects them using SSPACE. The pipeline produces a draft genome as a multi-FASTA file, usually containing 5 to 15 contigs listed in the same order as in the reference.

Gap closing¶
This step connects all the contigs in the multi-FASTA file from the previous step into one linear genome for the convenience of downstream functional analysis. The step is optional, and we strongly recommend running it only after assessing the draft genome, because gaps between contigs can result from misassembly, sequencing problems, genome features, or other causes.
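This page does not describe how the contigs are joined. Purely as an illustration, one common convention is to concatenate the contigs in reference order with a run of N characters marking each unresolved gap. The function names and the spacer length below are hypothetical and are not taken from VirAmp.

```python
def read_fasta(text):
    """Parse multi-FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def join_contigs(records, gap_length=100):
    """Join contigs in file order, marking each gap with a run of Ns.

    gap_length is an arbitrary placeholder, not a measured gap size.
    """
    spacer = "N" * gap_length
    return spacer.join(seq for _, seq in records)
```

Because the N runs only mark where a gap is, not how long it really is, coordinates downstream of a gap should be treated as approximate.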

Post-Assembly Analysis¶
VirAmp not only provides all the processes related to assembly, but also integrates multiple tools for post-assembly processing including quality assessment and variation analysis.
QUAST report¶
It is important to evaluate how robust the new assembly is before it is fed into the downstream functional analysis. VirAmp constructs a report of common assembly evaluation metrics based on comparisons with the reference. A detailed QUAST report can be downloaded for further evaluation.
The inputs required are the reference genome and the newly created assembly.

The primary output of QUAST is a summary of common assembly evaluation metrics.
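N50 is one of the contiguity metrics that QUAST reports. As a rough illustration of what the number means, the sketch below computes it from a list of contig lengths. It is not QUAST's own code, and the function name is chosen for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

A draft genome made of a few long contigs has a high N50, while a fragmented assembly of the same total size has a low one.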

Alternatively, a more detailed QUAST report can also be downloaded.

Unzip and open the report.

A demonstration of a QUAST plot:

Assembly-Reference Alignment¶
VirAmp provides information about the difference between the reference and the new assembly based on a MUMmer alignment. Coordinates and percentage identities are displayed for each aligned region between these two sequences. This is useful in identifying large INDELs as well as other complex structural variations. Table 1 demonstrates an example of the comparison report generated by this tool.

Circos graph visualization¶
Circos projects the assembled draft genome to the aligned part of the reference genome, creating a straightforward visualization for the above alignment and providing insight into large structural variations.

SNP analysis¶
Using the alignment between the assembly and the reference, SNP information is displayed in VCF format.
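To show what the VCF output looks like, the sketch below walks a pairwise alignment and writes one VCF record per mismatching column. It is an illustration under simplifying assumptions: the two sequences are already aligned to equal length with "-" for gaps, indels are skipped rather than reported, and the function name is hypothetical. VirAmp derives its SNPs from the MUMmer alignment, not from this code.

```python
def snps_to_vcf(chrom, ref_aln, asm_aln):
    """Return VCF lines for the SNPs in a gapped pairwise alignment."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    pos = 0  # 1-based position on the ungapped reference
    for ref_base, asm_base in zip(ref_aln, asm_aln):
        if ref_base != "-":
            pos += 1
        if ref_base == "-" or asm_base == "-":
            continue  # insertion or deletion, not handled in this sketch
        if ref_base.upper() != asm_base.upper():
            lines.append(
                f"{chrom}\t{pos}\t.\t{ref_base.upper()}\t{asm_base.upper()}\t.\t.\t."
            )
    return lines
```

Each record gives the reference name, the 1-based reference position, the reference base and the assembly base. Fields that this sketch cannot fill in are written as ".".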

Repeat and Tandem repeat analysis¶
By aligning the assembly against itself, VirAmp additionally provides repeat and tandem repeat information. The starting coordinates and lengths of the repeats are derived from this alignment.
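The kind of information reported (start coordinates of sequence that occurs more than once) can be illustrated with a much simpler technique than self-alignment: indexing exact k-mers. The sketch below is a stand-in for illustration only. It finds exact repeats of a fixed length k, whereas an alignment-based approach can also report longer and inexact repeats. The function name and the default k are assumptions.

```python
from collections import defaultdict


def find_exact_repeats(seq, k=12):
    """Map each k-mer that occurs more than once to its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}
```

Start positions of the same k-mer that are exactly one repeat unit apart hint at a tandem repeat, while widely separated positions point to dispersed repeats.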

Custom installation of the VirAmp AMI¶
Open http://aws.amazon.com/ in a web browser.
Select ‘My Account/Console’ on the top right if you already have an account; otherwise sign up for a new account.
Go to the ‘AWS Management Console’ option and click ‘EC2’ at the upper left.
Before importing the AMI, make sure you are in the correct region. Amazon EC2 is hosted in multiple regions world-wide, each containing multiple Availability Zones, and resources such as AMIs are not replicated across regions unless you copy them explicitly. Our AMI is stored in the region “US East (N. Virginia)”. Check the upper right corner next to your account name and make sure it is set to this region. If not, click it and select the correct region from the dropdown menu.
Next, click the blue ‘Launch Instance’ button.
Step-1: Choosing the instance¶
Click the Community AMIs tab at mid-left and search for “Szpara_Viramp”.

Step-2: Review Instance type¶
Due to storage and computational requirements, free tier instances cannot be used with our AMI. For trial runs it is possible to choose smaller instance types, but for serious usage we advise selecting at least the m3.large (third option).

Step-3: Launch the Instance¶

Step-4: Create Key-pairs¶

You have now successfully launched your own version of the instance. For information on logging in and starting the VirAmp server, see the “Log in to the new instance” section.
Log in to the new instance¶
Instructions and an overview of the basic steps and parameters you need to log in to the instance are provided at the console.

Click the “Connect” button to view the information you need to log in to the backend of the system.

Open a terminal and restrict the permissions on your private key file (ssh refuses to use a key that other users can read):
chmod 400 myPemName.pem
Connect to your instance using your public IP:
ssh -i myPemName.pem ubuntu@public_IP
Change to the galaxy directory:
cd /mnt/galaxy/galaxy-dist/
Edit the VirAmp settings:
vi universe_wsgi.ini
Line 596: admin_users = dwr19@psu.edu should be changed to reflect the current administrator’s email address.
Line 662: ftp_upload_site = viramp.com should be changed from viramp.com to your public IP address.
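If you prefer not to edit the file by hand, the same two settings can be changed by key name instead of by line number, which is safer if the line numbers differ in your copy. The helper below is a hypothetical sketch, not part of VirAmp or Galaxy, and the email address and IP address in the usage comments are placeholders.

```python
import re


def update_galaxy_settings(ini_text, admin_email, public_ip):
    """Return ini_text with admin_users and ftp_upload_site replaced."""
    replacements = {
        "admin_users": admin_email,
        "ftp_upload_site": public_ip,
    }
    for key, value in replacements.items():
        # Match an uncommented "key = ..." line anywhere in the file.
        pattern = re.compile(rf"^[ \t]*{key}[ \t]*=.*$", re.MULTILINE)
        ini_text, count = pattern.subn(
            lambda match, k=key, v=value: f"{k} = {v}", ini_text
        )
        if count == 0:
            raise KeyError(f"{key} not found in settings text")
    return ini_text


# Example usage (back up the file first; values are placeholders):
# from pathlib import Path
# path = Path("/mnt/galaxy/galaxy-dist/universe_wsgi.ini")
# path.write_text(
#     update_galaxy_settings(path.read_text(), "you@example.org", "203.0.113.10")
# )
```

The function raises an error when a key is missing, so a silent no-op cannot leave the server pointing at the original administrator address or host name.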
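Start the VirAmp server inside a screen session, then detach from the session so the server keeps running after you log out: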
screen
./run.sh
CTRL-a-d
For further information on the individual tools VirAmp utilizes, please see the following websites: