Welcome to VirAmp’s documentation!

VirAmp is a Galaxy-based system for fast virus genome assembly and variation discovery.

Quick Start Guide:

  1. Launch the latest version of the “Szpara_Viramp” AMI from Amazon Web Services.

  2. SSH into the server and start the run.sh script inside a screen session: screen ./run.sh

Introduction

The following graphic is an overview of how the VirAmp platform works:

_images/pipeline_overview.png

Advances in next-generation sequencing make it possible to obtain high-coverage sequence data for large numbers of viral strains in a short time. However, since most bioinformatics tools are developed for command-line use, the selection and accessibility of computational tools for genome assembly and variation analysis limit the ability of individual labs to perform further bioinformatics analysis.

We have developed a multi-step viral genome assembly pipeline named VirAmp that combines existing tools and techniques and presents them to end users via a web-enabled Galaxy interface. Our pipeline allows users to assemble, analyze and interpret high-coverage viral sequencing data with an ease and efficiency that was not possible previously. Our software makes a large number of genome assembly and related tools available to life scientists and automates the currently recommended best practices into a single, easy-to-use interface. We tested our pipeline with three different datasets from human herpes simplex virus (HSV).

VirAmp provides a user-friendly interface and a complete pipeline for viral genome analysis. We make our software available via an Amazon Elastic Cloud disk image that can be easily launched by anyone with an Amazon Web Services account. A demonstration version of our system can be found at http://www.viramp.com. We also maintain detailed documentation on each tool and methodology at http://docs.viramp.com.

Usage

This is a general description of the usage and function of each tool found in the VirAmp pipeline. A more detailed description can be found at the webpage of each tool.

One-click pipeline

Two general pipelines are provided with a one-click option, one for paired-end data and the other for single-end data. Users are only required to submit read files and a reference file corresponding to their data. In addition to the default settings, users may use the “advanced setting” option to configure the pipeline with alternative parameters.

_images/paired_end_pipeline.png

Quality Control

First, the low-quality bases are trimmed from the input FASTQ files. This can be achieved either by removing low-quality bases or by trimming a fixed length from each end of the reads.

_images/trim_qual.png
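
To illustrate the first option, the following is a minimal sketch of 3' quality trimming written with awk. It is not the trimming tool that VirAmp wraps; the function name trim_low_quality_tail, the four-line FASTQ layout, and the Phred+33 quality encoding are assumptions made for this example.

```shell
# Hypothetical helper (not part of VirAmp): trim trailing bases whose
# Phred quality is below a threshold.
# Assumes 4-line FASTQ records with Phred+33 encoded qualities.
# Usage: trim_low_quality_tail MIN_QUALITY < reads.fastq
trim_low_quality_tail() {
  awk -v minq="$1" '
    # Build a character-to-code lookup table, since awk has no ord().
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
      qual = $0
      keep = length(qual)
      # Walk back from the end of the read while quality is below the threshold.
      while (keep > 0 && ord[substr(qual, keep, 1)] - 33 < minq) keep--
      # Reads with no remaining bases are dropped entirely.
      if (keep > 0) {
        print header
        print substr(seq, 1, keep)
        print "+"
        print substr(qual, 1, keep)
      }
    }
  '
}
```

For example, trim_low_quality_tail 20 < reads.fastq > trimmed.fastq would remove trailing bases with a quality below 20 (the file names are placeholders).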

Diginorm

Next, reduce coverage and bias using digital normalization. This step reduces sample variation as well as sample bias.

_images/diginorm.png
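
The idea behind digital normalization can be sketched as follows: a read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is still below a coverage cutoff. This is a simplified illustration with exact counting and one sequence per line, not the implementation used by VirAmp; the function name diginorm_sketch and its two arguments (k-mer size and cutoff) are hypothetical.

```shell
# Hypothetical sketch of digital normalization (not the VirAmp implementation).
# Input: one read sequence per line on stdin.
# Usage: diginorm_sketch K CUTOFF < reads.txt
diginorm_sketch() {
  awk -v k="$1" -v cutoff="$2" '
    {
      n = length($0) - k + 1
      if (n < 1) next
      # Collect the current abundance of every k-mer in this read.
      for (i = 1; i <= n; i++) c[i] = count[substr($0, i, k)] + 0
      # Insertion sort so the median can be read from the middle.
      for (i = 2; i <= n; i++) {
        v = c[i]
        for (j = i - 1; j >= 1 && c[j] > v; j--) c[j + 1] = c[j]
        c[j + 1] = v
      }
      median = c[int((n + 1) / 2)]
      # Keep the read only while its estimated coverage is below the cutoff,
      # and count its k-mers only when it is kept.
      if (median < cutoff) {
        print
        for (i = 1; i <= n; i++) count[substr($0, i, k)]++
      }
    }
  '
}
```

Because redundant reads stop being added once their k-mers are well covered, highly covered regions are thinned while reads from regions with little coverage are retained.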

de novo Contig assembly

Now, the pipeline assembles the short reads into longer contigs. By default, the one-click pipeline uses Velvet. Two alternatives, SPAdes and VICUNA, are provided and can be selected either as individual tools or through the advanced options in the one-click pipeline.

_images/de-novo.png

Reference-based scaffolding

The contigs are then assembled into even longer super-contigs. This step is a modification of AMOScmp.

_images/amoscmp.png

Reference-independent scaffolding

The next step extends the super-contigs and connects them using SSPACE. The pipeline produces a draft genome as a multi-fasta file, usually containing 5 to 15 contigs, which are listed in the same order as the reference.

_images/sspace.png
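
When inspecting the draft genome, it can help to list the contigs in the multi-fasta file together with their lengths. The helper below is a small sketch for doing this on the command line after downloading the file; contig_lengths is a hypothetical name and not part of VirAmp.

```shell
# Hypothetical helper: print each contig name and its length (tab-separated)
# for a multi-FASTA file read from stdin.
contig_lengths() {
  awk '
    # A header line starts a new contig; report the previous one first.
    /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
    # Sequence may be wrapped over several lines, so lengths are summed.
    { len += length($0) }
    END { if (name != "") print name "\t" len }
  '
}
```

For example, contig_lengths < draft_genome.fasta prints one line per contig in file order (the file name is a placeholder).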

Gap closing

This step connects all the contigs in the multi-fasta file from the previous step into one linear genome for the convenience of downstream functional analysis. However, this step is optional, and we highly recommend running it only after assessing the draft genome, because gaps between contigs can result from misassembly, sequencing problems, genome features, or other causes.

_images/linear.png

Post-Assembly Analysis

VirAmp not only provides all the processes related to assembly, but also integrates multiple tools for post-assembly processing including quality assessment and variation analysis.

QUAST report

It is important to evaluate how robust the new assembly is before it is fed into the downstream functional analysis. VirAmp constructs a report of common assembly evaluation metrics based on comparisons with the reference. A detailed QUAST report can be downloaded for further evaluation.

The inputs required are the reference genome and the newly created assembly.

_images/quast-input.png

The primary output of QUAST is a summary of common assembly evaluation metrics.

_images/quast-basic.png
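
One widely used assembly metric that QUAST reports is N50: the contig length at which contigs of that length or longer cover at least half of the total assembly. As a rough cross-check outside of Galaxy, it can be computed from a FASTA file with standard command-line tools. The function name n50_from_fasta is hypothetical, and the sketch does not cover the reference-based metrics that QUAST also provides.

```shell
# Hypothetical helper: print the N50 of a FASTA file read from stdin.
n50_from_fasta() {
  # Step 1: emit one length per contig (sequence lines may be wrapped).
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }
    { len += length($0) }
    END { if (len > 0) print len }
  ' |
  # Step 2: sort lengths from longest to shortest.
  sort -rn |
  # Step 3: walk down the sorted list until half of the total is covered.
  awk '
    { lens[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        running += lens[i]
        if (running * 2 >= total) { print lens[i]; exit }
      }
    }
  '
}
```

For contigs of lengths 5, 4 and 3 (total 12), the two longest contigs together cover 9 bases, which is at least half of the total, so the N50 is 4.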

Alternatively, a more detailed QUAST report can also be downloaded.

_images/quast-download.png

Unzip and open the report.

_images/quast-unzip.png

A demonstration of a QUAST plot:

_images/quast-demo.png

Assembly-Reference Alignment

VirAmp provides information about the difference between the reference and the new assembly based on a MUMmer alignment. Coordinates and percentage identities are displayed for each aligned region between these two sequences. This is useful in identifying large INDELs as well as other complex structural variations. Table 1 demonstrates an example of the comparison report generated by this tool.

_images/ref-assembly.png

Circos graph visualization

Circos projects the assembled draft genome onto the aligned part of the reference genome, creating a straightforward visualization of the above alignment and providing insight into large structural variations.

_images/comparison_circos.png

SNP analysis

Using the alignment between the assembly and the reference, SNP information is displayed in VCF format.

_images/snp.png
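
Since VCF is a tab-separated text format, the SNP output can be filtered with standard tools after download. The sketch below assumes the standard VCF column order (CHROM, POS, ID, REF, ALT, ...) and keeps only records where both REF and ALT are a single base, so indels and multi-allelic records are skipped. vcf_snps is a hypothetical helper name.

```shell
# Hypothetical helper: list single-base substitutions from a VCF on stdin
# as CHROM, POS and REF>ALT (tab-separated).
vcf_snps() {
  awk -F '\t' '
    # Skip meta-information and header lines.
    /^#/ { next }
    # Column 4 is REF and column 5 is ALT in the VCF specification.
    length($4) == 1 && length($5) == 1 && $5 != "." {
      print $1 "\t" $2 "\t" $4 ">" $5
    }
  '
}
```

For example, vcf_snps < assembly_vs_reference.vcf prints one line per SNP (the file name is a placeholder).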

Repeat and Tandem repeat analysis

By aligning the assembly against itself, VirAmp additionally provides repeat and tandem repeat information. The starting coordinates and lengths of the repeats are derived from this alignment.

_images/tandem-repeat.png

Custom installation of the VirAmp AMI

Open http://aws.amazon.com/ in a web browser.

Select ‘My Account/Console’ on the top right if you already have an account; otherwise sign up for a new account.

Go to the ‘AWS Management Console’ option, then click ‘EC2’ at the upper left.

Before importing the AMI, make sure you are in the correct region. Amazon EC2 is hosted in multiple regions worldwide, each with multiple Availability Zones, and resources are not replicated across regions unless you explicitly do so. Our AMI is stored in the “US East (N. Virginia)” region. Check the upper right corner next to your account name and make sure it is set to this region. If not, click it and select the correct region from the dropdown menu.

Next, click the blue ‘Launch Instance’ button.

Step-1: Choosing the instance

Click the Community AMIs tab at mid-left and search for “Szpara_Viramp”.

_images/search-ami.png

Step-2: Review Instance type

Due to storage and computational requirements, free tier instances cannot be used with our AMI. For trial runs it is possible to choose smaller instance types, but for serious usage we advise selecting at least the m3.large (third option).

_images/review-instance-type.png

Step-3: Launch the Instance

_images/review-and-launch.png

Step-4: Create Key-pairs

_images/key-pair.png

You have now successfully launched your own version of the instance. For information on logging in and starting your instance, see “Log in to the new instance”.

Log in to the new instance

Instructions and an overview of the basic steps and parameters you need to log in to the instance are provided in the console.

_images/connect-info.png

Click the “Connect” button to view the information you need to log in to the backend of the system.

_images/connect.png

Start your terminal and type the following command to make your key file readable only by you, which ssh requires:

chmod 400 myPemName.pem

Connect to your instance using your public IP:

ssh -i myPemName.pem ubuntu@public_IP

Change to the Galaxy directory:

cd /mnt/galaxy/galaxy-dist/

Edit the VirAmp settings:

vi universe_wsgi.ini

Line 596: admin_users = dwr19@psu.edu should be changed to the current administrator’s email address.

Line 662: ftp_upload_site = viramp.com should be changed from viramp.com to your public IP address.
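
The same two settings can also be changed non-interactively. The following sketch uses sed and assumes that both keys appear uncommented at the start of a line in universe_wsgi.ini; the function name set_viramp_config and the example values are placeholders. A backup copy of the file is written with a .bak suffix.

```shell
# Hypothetical helper: update the administrator email and FTP upload site.
# Assumes both keys are uncommented and start at the beginning of a line.
# Usage: set_viramp_config INI_FILE ADMIN_EMAIL PUBLIC_IP
set_viramp_config() {
  # -i.bak edits the file in place and keeps the original as INI_FILE.bak.
  sed -i.bak \
    -e "s|^admin_users *=.*|admin_users = $2|" \
    -e "s|^ftp_upload_site *=.*|ftp_upload_site = $3|" \
    "$1"
}
```

For example, set_viramp_config universe_wsgi.ini you@example.org 203.0.113.10 would rewrite both lines; substitute your own email address and public IP.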

Start the VirAmp server:

screen ./run.sh

Press CTRL-a followed by d to detach from the screen session; the server keeps running in the background.