Apollo

Apollo - a collaborative, real-time, web-based genome annotation editor.

The application’s technology stack includes a Grails-based Java web application with flexible database backends and a JavaScript client that runs in a web browser as a JBrowse plugin.

You can find the latest release here: https://github.com/GMOD/Apollo/releases/latest and our setup guide: http://genomearchitect.readthedocs.io/en/latest/Setup.html


Note: This documentation covers release versions 2.x of Apollo. For the 1.0.4 installation please refer to the installation guide found at http://genomearchitect.readthedocs.io/en/1.0.4/


Setup guide

The quick-start guide showed how to quickly launch a temporary instance of Apollo, but deploying the application to production normally involves some extra steps.

The general idea behind your deployment is to create an apollo-config.groovy file from one of the existing sample files, which contain sample settings for various database engines.

Pre-requisites

The server will minimally need Java 8 or greater, Grails, git, ant, and a servlet container (e.g. Tomcat 7+, Jetty, or Resin). An external database such as PostgreSQL (9 or 10 preferred) is generally used for production, but instructions for MySQL or the H2 Java database (which may also be run embedded) are also provided.

To build the system natively JDK8 is required (typically OpenJDK8). To run the war, Java 8 or greater should be fine.
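If you want to check the Java requirement from a provisioning script, the following minimal Python sketch parses the version string that `java -version` prints (on standard error) and compares its major version with 8. The helper names are hypothetical, and only the two common formats are handled: the legacy `1.8.0_292` style used up to Java 8 and the `11.0.2` style used from Java 9 on.

```python
import re
import subprocess

def java_major_version(version_string: str) -> int:
    """Extract the major Java version from text such as
    'openjdk version "1.8.0_292"' or 'openjdk version "11.0.2" 2019-01-15'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"no version number found in {version_string!r}")
    first = int(match.group(1))
    # Java 8 and earlier report "1.<major>"; Java 9+ report the major version first.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first

def meets_minimum(version_string: str, minimum: int = 8) -> bool:
    """True if the version string describes Java `minimum` or newer."""
    return java_major_version(version_string) >= minimum

def installed_java_version() -> str:
    """Return the first line printed by `java -version` (which writes to stderr)."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return result.stderr.splitlines()[0]
```

On a server you would call `meets_minimum(installed_java_version())`. This only checks the runtime; building natively still needs JDK 8 specifically, as noted above.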

Important note: The default memory for Tomcat and Jetty is insufficient to run Apollo (and most other web apps). You should increase the memory according to these instructions.

Other possible build settings for JBrowse:

Ubuntu / Debian

 sudo apt-get install zlib1g zlib1g-dev libexpat1-dev libpng-dev libgd2-noxpm-dev build-essential git python-software-properties python make

RedHat / CentOS

 sudo yum groupinstall "Development Tools"
 sudo yum install zlib zlib-devel expat-devel libpng-devel gd-devel git python make

It is recommended to use the default version of JBrowse or newer (though Apollo does not work with JBrowse 2 yet).

There are additional requirements if doing development with Apollo.

Install node and yarn

Node versions 6-12 have been tested and work. Using nvm (e.g. nvm install 8) is recommended.

npm install -g yarn

Install jdk

Build settings specific to Apollo. Recent versions of Tomcat 7 will work, though Tomcat 8 and 9 are preferred. If Tomcat does not install automatically, there are a number of ways to install it on Linux. To install ant and the JDK and set JAVA_HOME:

sudo apt-get install ant openjdk-8-jdk 
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/  # Ubuntu/Debian; or set in .bashrc / .profile / .zshrc
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`  # OR, on macOS

If you need to have multiple versions of Java installed (note #2222), you will need to specify the version for Tomcat. For tomcat8 on Ubuntu, set JAVA_HOME explicitly in the /etc/default/tomcat8 file:

JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Download Apollo from the latest release (under source code) and unzip it. Test the installation by running ./apollo run-local and check that the web server starts up on http://localhost:8080/apollo/. To set up for production, continue on to the configuration below.

Database configuration

Apollo supports several database backends, and sample configurations are provided for H2, PostgreSQL, and MySQL.

Each backend has a sample file, e.g. sample-h2-apollo-config.groovy or sample-postgres-apollo-config.groovy, that is designed to be renamed to apollo-config.groovy before running apollo deploy. Additionally, you can also run Apollo via Docker.

Furthermore, the apollo-config.groovy has different Groovy environments for test, development, and production modes. The environment is selected automatically depending on how Apollo is run, e.g.:

  • apollo deploy uses the production environment (i.e. when you copy the WAR file to your production server)
  • apollo run-local or apollo debug use the development environment (i.e. when you are running it locally)
  • apollo test uses the test environment (i.e. only when running unit tests)
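The command-to-environment rules above amount to a simple lookup. The sketch below restates them in Python so that a wrapper script could, for example, report which section of apollo-config.groovy a given command will read; the table and function name are illustrative only and are not part of Apollo.

```python
# Which Grails environment (section of apollo-config.groovy) each apollo
# command uses, restated from the list above. Illustrative only.
COMMAND_ENVIRONMENTS = {
    "deploy": "production",      # WAR file copied to the production server
    "run-local": "development",  # running locally
    "run-app": "development",
    "debug": "development",
    "test": "test",              # unit tests only
}

def environment_for(command: str) -> str:
    """Return the apollo-config.groovy environment used by `command`."""
    if command not in COMMAND_ENVIRONMENTS:
        raise ValueError(f"unknown apollo command: {command!r}")
    return COMMAND_ENVIRONMENTS[command]
```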
Configure for H2:
  • H2 is an embedded database engine, so no external setups are needed. Simply copy sample-h2-apollo-config.groovy to apollo-config.groovy.
    • The default dev environment (apollo run-local or apollo run-app) uses an in-memory database, so you will have to change its JDBC URL to a file-based one if you want the data to persist.
  • If you use H2 with Tomcat or Jetty in production, you have to set the permissions for the file path correctly (e.g. jdbc:h2:/mypath/prodDb, chown tomcat:tomcat /mypath/prodDb.*.db).
    • If you use the local relative path jdbc:h2:./prodDb and tomcat8 the path will likely be: /usr/share/tomcat8/prodDb*db
Configure for PostgreSQL:
  • Create a new database with postgres and add a user for production mode. Here are a few ways to do this in PostgreSQL.
  • Copy the sample-postgres-apollo-config.groovy to apollo-config.groovy.
Configure for MySQL:
  • Create a new MySQL database for production mode (i.e. run create database `apollo-production`; in the mysql console) and copy sample-mysql-apollo-config.groovy to apollo-config.groovy.

Apollo in Galaxy

Apollo can always be used externally from Galaxy, but there are a few integrations available as well.

Database schema

After you start up the application, the database schema (tables, etc.) is set up automatically. You don’t have to initialize any database schemas yourself.

Deploy the application

The apollo run-local command only launches a temporary server and should not be used in production. To deploy to production, build a new WAR file with the apollo deploy command. After you have set up your apollo-config.groovy file with the appropriate username, password, and JDBC URL, run:

./apollo deploy

This command downloads any missing prerequisites (JBrowse) and packages the application into a WAR file in the “target/” subfolder. After it completes, copy the WAR file (e.g. apollo-2.0.4.war) from the target folder to the webapps folder of your web container installation. If you name the file apollo.war in your webapps folder, you can access your app at “http://localhost:8080/apollo”.
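Copying and renaming the WAR can be scripted. This is a minimal Python sketch, assuming the build leaves exactly one file matching `apollo*.war` in `target/` and that you pass your container’s webapps directory (for example `/var/lib/tomcat8/webapps` for the Ubuntu tomcat8 package, though the path varies by installation). `install_war` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

def install_war(target_dir, webapps_dir, context: str = "apollo") -> Path:
    """Copy the WAR built by `./apollo deploy` into a servlet container's
    webapps folder as <context>.war, so it is served under /<context>."""
    wars = sorted(Path(target_dir).glob("apollo*.war"))
    if len(wars) != 1:
        raise RuntimeError(f"expected one apollo*.war in {target_dir}, found {len(wars)}")
    destination = Path(webapps_dir) / f"{context}.war"
    shutil.copy2(wars[0], destination)  # copies file contents and metadata
    return destination
```

For example, `install_war("target", "/var/lib/tomcat8/webapps")`, run as a user allowed to write to the webapps folder.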

We test primarily on Apache Tomcat (7.0.62+ and 8). Make sure to set your Tomcat memory to an appropriate size, or Apollo will run slowly or crash.

Alternatively, as mentioned previously, you can launch a temporary instance of the server, which is useful for testing:

./apollo run-local 8085

This temporary server will be accessible at “http://localhost:8085/apollo”

Tomcat configuration

If you have tracks with deeply nested features that will result in a feature JSON larger than 10 MB, or if you have a client that sends JSON requests larger than 10 MB to the Apollo server, then you will have to modify src/war/templates/web.xml.

Specifically the following block in web.xml:

    <context-param>
        <param-name>org.apache.tomcat.websocket.textBufferSize</param-name>
        <param-value>10000000</param-value>
    </context-param>
    <context-param>
        <param-name>org.apache.tomcat.websocket.binaryBufferSize</param-name>
        <param-value>10000000</param-value>
    </context-param>

Note: The <param-value> is in bytes.
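To decide whether you need to raise these limits, you can measure how large a payload becomes once serialized. This Python sketch compares the UTF-8 size of a compact JSON encoding against the 10000000-byte value above. Apollo’s own serialization may differ in whitespace, so treat the result as an estimate; the function names are hypothetical.

```python
import json

# Value of textBufferSize / binaryBufferSize in web.xml above, in bytes.
WEBSOCKET_BUFFER_BYTES = 10_000_000

def json_payload_bytes(payload) -> int:
    """Size in bytes of `payload` serialized as compact, UTF-8 encoded JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))

def fits_websocket_buffer(payload, buffer_bytes: int = WEBSOCKET_BUFFER_BYTES) -> bool:
    """True if the serialized payload is no larger than the buffer."""
    return json_payload_bytes(payload) <= buffer_bytes
```

You could, for instance, load a saved feature JSON with `json.load` and pass it to `fits_websocket_buffer`.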

Memory configuration

In production, the memory used by Apollo must be configured directly within Tomcat.

The default memory assigned to Apollo when running apollo commands is 2048 MB. This can be changed in your apollo-config.groovy by uncommenting the memory configuration block:

// Uncomment to change the default memory configurations
grails.project.fork = [
        test   : false,
        // configure settings for the run-app JVM
        run    : [maxMemory: 2048, minMemory: 64, debug: false, maxPerm: 1024, forkReserve: false],
        // configure settings for the run-war JVM
        war    : [maxMemory: 2048, minMemory: 64, debug: false, maxPerm: 1024, forkReserve: false],
        // configure settings for the Console UI JVM
        console: [maxMemory: 2048, minMemory: 64, debug: false, maxPerm: 1024]
]

Note on database settings

If you use the apollo run-local command, then the “development” section of the apollo-config.groovy is used (or a temporary in-memory H2 database is used if no apollo-config.groovy exists).

If you use the WAR file generated by the apollo deploy command on your own webserver, then the “production” section of the apollo-config.groovy is used.

Detailed build instructions

While the shortcut apollo deploy takes care of basic application deployment, understanding the full build process of Apollo can help you to optimize and improve your deployed instances.

To learn more about the architecture of webapollo, view the architecture guide.

Using Docker to Run Apollo

You can install Docker for your system if not previously done.

Running the Container

You can see Apollo straight away:

docker run -it -v /jbrowse/root/directory/:/data -p 8888:8080 gmod/apollo:stable

Open http://localhost:8888 in a web browser and login with admin@local.host / password to get started.

Note: data is not guaranteed to be saved when run this way, but data in /jbrowse/root/directory will not be written to either.

Production

To run in production against persistent JBrowse data and a persistent database you should:

  • Run docker pull gmod/apollo if you are running the latest build, to guarantee you are using the most recent image (not necessary for point releases).
  • Create an empty directory for database data, e.g. /postgres/data/directory if you want to save data if the image goes down.
  • If you want to upload tracks and genomes directories, create an empty directory for that, e.g., /jbrowse/root/apollo_data
  • Put JBrowse data in a directory, e.g. /jbrowse/root/directory/.
  • If the server is publicly visible, set an admin username and password:
    docker run -it  \
    -v /jbrowse/root/directory/:/data  \
    -v /postgres/data/directory:/var/lib/postgresql  \
    -v /jbrowse/root/apollo_data:/data/temporary/apollo_data \
    -e APOLLO_ADMIN_EMAIL=adminuser \
    -e APOLLO_ADMIN_PASSWORD=superdupersecretpassword \
    -p 8888:8080 gmod/apollo:latest
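If you launch the container from a script, building the argument list programmatically avoids quoting mistakes and keeps every option before the image name (anything placed after the image is passed to the container rather than to docker run). This is a minimal Python sketch using the mounts and variables from the command above; the function names and defaults are assumptions.

```python
import shlex
import subprocess

def apollo_docker_args(jbrowse_dir, postgres_dir, apollo_data_dir=None,
                       admin_email=None, admin_password=None,
                       host_port=8888, image="gmod/apollo:latest"):
    """Return the `docker run` argument list for a persistent Apollo container."""
    args = ["docker", "run", "-it",
            "-v", f"{jbrowse_dir}:/data",
            "-v", f"{postgres_dir}:/var/lib/postgresql"]
    if apollo_data_dir:
        args += ["-v", f"{apollo_data_dir}:/data/temporary/apollo_data"]
    if admin_email:
        args += ["-e", f"APOLLO_ADMIN_EMAIL={admin_email}"]
    if admin_password:
        args += ["-e", f"APOLLO_ADMIN_PASSWORD={admin_password}"]
    # Port mapping, then the image name last; options after the image go to the container.
    args += ["-p", f"{host_port}:8080", image]
    return args

def run_apollo(**kwargs):
    """Print the assembled command, then run it."""
    args = apollo_docker_args(**kwargs)
    print("Running:", shlex.join(args))
    subprocess.run(args, check=True)
```

Reading the password from an environment variable (e.g. via `os.environ`) rather than hard-coding it keeps it out of your scripts.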

Additional configuration

See docker run instructions to run as a daemon (-d) and with a fresh container each time (--rm) depending on your use-case.

Additional options include setting memory with --memory=4g (required for running production off a Mac), running as a Docker daemon with -d, or adding debugging to the server with -e "WEBAPOLLO_DEBUG=true". For example (after creating the local apollo_shared_dir):

docker run --memory=4g -d -it -p 8888:8080 -v `pwd`/apollo_shared_dir/:`pwd`/apollo_shared_dir/ -e "WEBAPOLLO_DEBUG=true"  -v /postgres/data/directory:/var/lib/postgresql gmod/apollo:latest 

You can configure additional options in the Docker image’s apollo-config.groovy by passing environment variables via multiple -e parameters.

For example:

NOTE: If you don’t use a locally mounted PostgreSQL database (e.g., by creating an empty directory and mounting it using -v postgres-data:/var/lib/postgresql) or set appropriate environment variables for a remote database (see variables defined here), your annotations and setup may not be persisted.

Notes on releases and availability

The image is available on docker hub.

On Docker Hub, to pull a specific release use gmod/apollo:release-<version>, where <version> is something like 2.6.2.

Versions

On Docker Hub the tags are:

  • stable: the master branch, i.e. the latest stable release (e.g., 2.6.0)
  • latest: the latest check-in, which has not necessarily been thoroughly tested
  • release-X.Y.Z: the release of the tag X.Y.Z (e.g., release-2.6.0)

quay.io mirrors tags and all branches directly, so master is master and X.Y.Z is X.Y.Z. For example, quay.io/<user or org name>/apollo:2.6.0 is the same as <user or org name>/apollo:release-2.6.0, but you’ll have to fork your own version.
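The naming rules above can be summarized as two small functions. This sketch only formats image references; `owner` stands for your own quay.io user or organization, and the function names are hypothetical.

```python
import re

_VERSION = re.compile(r"\d+\.\d+\.\d+")

def docker_hub_image(version: str) -> str:
    """Docker Hub reference for release X.Y.Z: gmod/apollo:release-X.Y.Z."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"expected a version like 2.6.0, got {version!r}")
    return f"gmod/apollo:release-{version}"

def quay_image(owner: str, version: str) -> str:
    """quay.io reference for the same release; quay.io uses the tag name
    directly, and `owner` is the user or org that holds your fork."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"expected a version like 2.6.0, got {version!r}")
    return f"quay.io/{owner}/apollo:{version}"
```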

See what is available for Docker Hub builds.

Logging In

The default credentials in this image are:

| Credential | Value            |
|------------|------------------|
| Username   | admin@local.host |
| Password   | password         |

Example Workflow

  1. Make the following directories somewhere with write permissions: postgres-data and jbrowse-data.
  2. Copy your jbrowse data into jbrowse-data. We provide working sample data.
  3. Run the docker-command: docker run -it -v /absolute/path/to/jbrowse-data:/data -v /absolute/path/to/postgres-data:/var/lib/postgresql -p 8888:8080 gmod/apollo:latest
  4. Login to the server at http://localhost:8888/
  5. Add an organism per the instructions under Figure 2. Using yeast as an example, if you copy the data into jbrowse-data/yeast then on the server you’ll add the directory: /data/yeast.

[Figure: adding an organism]
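Step 5 depends on translating a host directory into the path the container sees through the -v mount. A minimal Python sketch of that translation, assuming the jbrowse-data directory is mounted at /data as in step 3 (the helper name is hypothetical):

```python
from pathlib import PurePosixPath

def container_path(host_path: str, host_mount: str, container_mount: str = "/data") -> str:
    """Map a directory under the host side of `-v host_mount:container_mount`
    to the path Apollo sees inside the container."""
    # relative_to raises ValueError if host_path is not inside host_mount.
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_mount))
    return str(PurePosixPath(container_mount) / relative)
```

For the yeast example, `container_path("/absolute/path/to/jbrowse-data/yeast", "/absolute/path/to/jbrowse-data")` gives the /data/yeast directory you enter when adding the organism.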

Running your own preloaded data in a fork

Here is an example of running pre-loaded data from Apollo: https://github.com/alliance-genome/agr_apollo_container

Note that important changes here are in:

To create it, we loaded the original data, configured it as we wanted, and dumped out the SQL file.

Apollo Configuration

Apollo includes some basic configuration parameters that are specified in configuration files. The most important parameters are the database parameters in order to get Apollo up and running. Other options besides the database parameters can be configured via the config files, but note that many parameters can also be configured via the web interface.

Note: Configuration options may change over time, as more configuration items are integrated into the web interface.

Main configuration

The main configuration settings for Apollo are stored in grails-app/conf/Config.groovy, but you can override settings in your apollo-config.groovy file (i.e. the same file that contains your database parameters). Here are the defaults that are defined in the Config.groovy file:

// default apollo settings
apollo {
    gff3.source = "." // also for GPAD
    // other translation codes are of the form ncbi_KEY_translation_table.txt
    // under the web-app/translation_tables  directory
    // to add your own add them to that directory and over-ride the translation code here
    get_translation_code = 1
   
    proxies = [
            [
                    referenceUrl : 'http://golr.geneontology.org/select',
                    targetUrl    : 'http://golr.geneontology.org/solr/select',
                    active       : true,
                    fallbackOrder: 0,
                    replace      : true
            ]
            ,
            [
                    referenceUrl : 'http://golr.geneontology.org/select',
                    targetUrl    : 'http://golr.berkeleybop.org/solr/select',
                    active       : false,
                    fallbackOrder: 1,
                    replace      : false
            ]
    ]
    fa_to_twobit_exe = "/usr/local/bin/faToTwoBit" // get from https://genome.ucsc.edu/goldenPath/help/blatSpec.html
    sequence_search_tools = [
            blat_nuc : [
                    search_exe  : "/usr/local/bin/blat",
                    search_class: "org.bbop.apollo.sequence.search.blat.BlatCommandLineNucleotideToNucleotide",
                    name        : "Blat nucleotide",
                    params      : ""
            ],
            blat_prot: [
                    search_exe  : "/usr/local/bin/blat",
                    search_class: "org.bbop.apollo.sequence.search.blat.BlatCommandLineProteinToNucleotide",
                    name        : "Blat protein",
                    params      : ""
                    //tmp_dir: "/opt/apollo/tmp" optional param
            ]
    ]
    ...
}

These settings are essentially the same familiar parameters from the config.xml file of previous Apollo versions. The defaults are generally sufficient, but as noted above, you can override any particular parameter in your apollo-config.groovy file, e.g.:

grails {
  apollo.get_translation_code = 1
  apollo {
    use_cds_for_new_transcripts = true
    default_minimum_intron_size = 1
    get_translation_code = 1  // identical to the dot notation
  }
}

Suppress calculation of non-canonical splice sites

By default we calculate non-canonical splice sites. For some organisms this is undesirable.

apollo.calculate_non_canonical_splice_sites = false 

Count annotations

By default annotations are counted, but in some cases this can become prohibitive for performance when there are many annotations. Counting can be turned off by setting count_annotations to false in the apollo-config.groovy file (either form below works):

grails {
  apollo.count_annotations = false
  apollo {
    count_annotations = false
  }
}

Suppress add merged comments

By default, when you merge two isoforms, Apollo automatically adds a comment recording the name and unique ID of the isoform that was consumed by the merge. To suppress this, set add_merged_comment to false:

grails {
  apollo.add_merged_comment = false
  apollo {
    add_merged_comment = false
  }
}

JBrowse Plugins and Configuration

You can configure the installed Apollo JBrowse by modifying the jbrowse section of your apollo-config.groovy that overrides the JBrowse configuration file.

There are two sections: plugins, and git, which specifies the JBrowse version.

 git {
        url = "https://github.com/gmod/jbrowse"
        branch = "1.16.11-release"
 }

In a git block, either a tag or a branch can be specified.

In the plugins section, each plugin can be specified as included (part of the JBrowse release), url (requiring a url parameter), or git (which can include a tag or branch, as above).

The alwaysRecheck and alwaysRepull options, respectively, always re-check the branch and tag, and always re-pull.

See sample-*.groovy for example sections: https://github.com/GMOD/Apollo/blob/develop/sample-h2-apollo-config.groovy#L112-L146

Translation tables

The default translation table is 1

To use a different table from this list of NCBI translation tables set the number in the apollo-config.groovy file as:

apollo {
...
  get_translation_code = "11"
}

You may also add a custom translation table in the web-app/translation_tables directory as follows:

web-app/translation_tables/ncbi_customname_translation_table.txt

Specify the customname in apollo-config.groovy as follows:

apollo {
...
  get_translation_code = "customname"
}
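Because the file name must follow the ncbi_KEY_translation_table.txt pattern mentioned in the default configuration, a quick check that your custom table is where Apollo will look for it can save a redeploy. This Python sketch assumes it is run from the Apollo source directory; the function name is hypothetical.

```python
from pathlib import Path

def translation_table_path(code, tables_dir="web-app/translation_tables") -> Path:
    """Path of the table file for a get_translation_code value, following the
    ncbi_<KEY>_translation_table.txt naming pattern."""
    key = str(code).strip()
    if not key or "/" in key or "\\" in key:
        raise ValueError(f"invalid translation code: {code!r}")
    return Path(tables_dir) / f"ncbi_{key}_translation_table.txt"
```

Before setting get_translation_code = "customname", `translation_table_path("customname").is_file()` should return True.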

As well, translation tables can be set per organism using the ‘Details’ panel located in the ‘Organism’ tab of the Annotator panel in the Apollo window: to replace the translation table (default or set by admin) for any given organism, use the field labeled as ‘Non-default Translation Table’ to enter a different table identifier as needed.

Configuring Transcript Overlapper

Apollo, by default, uses a CDS overlapper which treats two overlapping transcripts as isoforms of each other if and only if they share the same in-frame CDS.

You can also configure Apollo to use an exon overlapper, which treats two overlapping transcripts as isoforms of each other if one or more of their exons overlap and they share the same splice acceptor and splice donor sites.

apollo {
    transcript_overlapper = "exon"
}
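Both overlappers start from whether two transcripts overlap at all. The Python sketch below shows only the plain interval test for whether any exon of one transcript overlaps any exon of another. It assumes both transcripts are on the same strand and reference sequence, and it does not model the CDS-frame or splice-site comparisons that Apollo applies, so it is not Apollo’s implementation.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) in inclusive genomic coordinates, start <= end

def intervals_overlap(a: Exon, b: Exon) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def any_exon_overlap(exons_a: List[Exon], exons_b: List[Exon]) -> bool:
    """True if any exon of transcript A overlaps any exon of transcript B."""
    return any(intervals_overlap(x, y) for x in exons_a for y in exons_b)
```

The pairwise comparison is quadratic in the number of exons, which is acceptable for the handful of exons in a typical transcript.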

Logging configuration

To over-ride the default logging, you can look at the logging configurations from Config.groovy and override or modify them in apollo-config.groovy.

log4j.main = {
    error 'org.codehaus.groovy.grails.web.servlet',  // controllers
          'org.codehaus.groovy.grails.web.pages',    // GSP
          'org.codehaus.groovy.grails.web.sitemesh', // layouts
           ...
    warn 'grails.app'
}

To add debug-level logging you would replace warn 'grails.app' with two lines debug 'grails.app' and debug 'org.bbop.apollo'. To see database-level logging you would also add: trace 'org.hibernate.type' and debug 'org.hibernate.SQL'.

Additional links for log4j:

  • Advanced log4j configuration: http://blog.andresteingress.com/2012/03/22/grails-adding-more-than-one-log4j-configurations/
  • Grails log4j guide: http://grails.github.io/grails-doc/2.4.x/guide/single.html#logging

Add attribute for the original id of the object

In the apollo configuration, store_orig_id is set to true by default. This stores an orig_id attribute on the top-level feature that represents the original ID from the genomic evidence. This is useful for re-merging code, because Apollo generates its own IDs since annotations may be based on multiple evidence sources. To turn this off, override it by setting store_orig_id = false.

Canned Elements

Canned comments, canned keys (tags), and canned values are configured using the Admin tab from the Annotator Panel on the web interface; these can no longer be created or edited using the configuration files. For more details on how to create and edit Canned Elements see Canned Elements.

View your instance’s pages for more details, for example:

  • http://localhost:8080/apollo/cannedComment/
  • http://localhost:8080/apollo/cannedKey/
  • http://localhost:8080/apollo/cannedValue/

Search tools

Apollo can be configured to work with various sequence search tools. UCSC’s BLAT tool is configured by default, and you can customize it by making modifications in the apollo-config.groovy file, as follows. Here we replace BLAT with BLAST (there is an existing wrapper for BLAST). The search database will be passed in via params (globally) or via the Blat database field in the organism tab. For BLAST, the database is the root name of the BLAST database files without the suffix. BLAT binaries can be retrieved from UCSC.

apollo{
    fa_to_twobit_exe = "/usr/local/bin/faToTwoBit" // get from https://genome.ucsc.edu/goldenPath/help/blatSpec.html
	sequence_search_tools {
        blat_nuc {
            search_exe = "/usr/local/bin/blastn"
            search_class = "org.bbop.apollo.sequence.search.blast.BlastCommandLine"
            name = "Blast nucleotide"
            params = ""
        }
        blat_prot {
            search_exe = "/usr/local/bin/tblastn"
            search_class = "org.bbop.apollo.sequence.search.blast.BlastCommandLine"
            name = "Blast protein to translated nucleotide"
            params = ""
            //tmp_dir = "/opt/apollo/tmp" optional param
        }
        your_custom_search_tool {
          search_exe = "/usr/local/customtool"
          search_class = "org.your.custom.Class"
          name = "Custom search"
        }
    }
}
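For BLAST, the value to enter is the database root name rather than any one file. A minimal Python sketch that derives it from a database file path; the suffix list covers common BLAST file extensions and is an assumption rather than an exhaustive list, and multi-volume names such as hg38.00.nsq are reduced to hg38. The function name is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Common BLAST database file extensions (nucleotide .n*, protein .p*); not exhaustive.
BLAST_SUFFIXES = {".nin", ".nhr", ".nsq", ".nal", ".ndb", ".pin", ".phr", ".psq", ".pal", ".pdb"}

def blast_db_root(db_file: str) -> str:
    """Return the BLAST database root name (path without the file suffix)."""
    path = PurePosixPath(db_file)
    if path.suffix not in BLAST_SUFFIXES:
        raise ValueError(f"{db_file!r} does not look like a BLAST database file")
    root = path.with_suffix("")
    # Multi-volume databases add a numeric volume suffix, e.g. hg38.00.nsq.
    if re.fullmatch(r"\.\d{2,}", root.suffix):
        root = root.with_suffix("")
    return str(root)
```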

When you set up your organism in the web interface, you can then enter the location of the sequence search database for BLAT.

If you set up fa_to_twobit_exe with the proper path, FASTA uploads for new genomes will automatically be indexed and populated.

Note: If the BLAT binaries reside elsewhere on your system, edit the search_exe location in the config to point to your BLAT executable.

Data adapters

Data adapters for Apollo provide the methods for exporting annotation data from the application. By default, GFF3 and FASTA adapters are supplied. They are configured to query your IOService URL, e.g. http://localhost:8080/apollo/IOService, with the customizable query:

data_adapters = [[
  permission: 1,
  key: "GFF3",
  data_adapters: [[
    permission: 1,
    key: "Only GFF3",
    options: "output=file&format=gzip&type=GFF3&exportGff3Fasta=false"
  ],
  [
    permission: 1,
    key: "GFF3 with FASTA",
    options: "output=file&format=gzip&type=GFF3&exportGff3Fasta=true"
  ]]
],
[
  permission: 1,
  key : "FASTA",
  data_adapters :[[
    permission : 1,
    key : "peptide",
    options : "output=file&format=gzip&type=FASTA&seqType=peptide"
  ],
  [
    permission : 1,
    key : "cDNA",
    options : "output=file&format=gzip&type=FASTA&seqType=cdna"
  ],
  [
    permission : 1,
    key : "CDS",
    options : "output=file&format=gzip&type=FASTA&seqType=cds"
  ]]
]]

Default data adapter options

The options available for the data adapters are configured as follows

  • type: GFF3 or FASTA
  • output: can be file or text. file exports to a file and provides a UUID link for downloads, text just outputs to stream.
  • format: can be gzip or plain. gzip offers gzip compression of the exports, which is the default.
  • exportSequence: true or false, which is used to include FASTA sequence at the bottom of a GFF3 export
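Each adapter’s options value is an ordinary URL query string, so it can be checked with standard tools before you add it to the configuration. This Python sketch parses one and validates the type, output and format values against the choices listed above; other keys (such as exportGff3Fasta or seqType) are passed through unchecked, and the function name is hypothetical.

```python
from urllib.parse import parse_qs

# Allowed values, taken from the option list above.
ALLOWED_VALUES = {
    "type": {"GFF3", "FASTA"},
    "output": {"file", "text"},
    "format": {"gzip", "plain"},
}

def parse_adapter_options(options: str) -> dict:
    """Parse a data adapter `options` string into a dict (last value wins)
    and reject values outside the documented choices."""
    parsed = {key: values[-1] for key, values in parse_qs(options, strict_parsing=True).items()}
    for key, allowed in ALLOWED_VALUES.items():
        if key in parsed and parsed[key] not in allowed:
            raise ValueError(f"{key}={parsed[key]!r} is not one of {sorted(allowed)}")
    return parsed
```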

Supported annotation types

Many configurations will require you to define which annotation types the configuration will apply to. Apollo supports the following “higher level” types (from the Sequence Ontology):

  • sequence:gene
  • sequence:pseudogene
  • sequence:transcript
  • sequence:mRNA
  • sequence:tRNA
  • sequence:snRNA
  • sequence:snoRNA
  • sequence:ncRNA
  • sequence:rRNA
  • sequence:miRNA
  • sequence:repeat_region
  • sequence:transposable_element

Modify CORS

Apollo uses the grails-cors plugin. To configure it or turn it off, override these options:

cors.url.pattern = '*'
cors.enable.logging = true
cors.enabled = true
cors.headers = ['Access-Control-Allow-Origin': '*']

Set the default biotype for dragging up evidence

By default dragged up evidence is treated as mRNA. However, you can specify the default biotype within trackList.json in order to specify default types for tracks.

For example, specifying ncRNA as the default type:

{
    "key" : "Official Gene Set v3.2 Canvas",
    "storeClass" : "JBrowse/Store/SeqFeature/NCList",
    "urlTemplate" : "tracks/Official Gene Set v3.2/{refseq}/trackData.json",
    "default_biotype" : "ncRNA"
}

If you specify auto instead then it will automatically try to infer based on a feature’s type.

Other non-transcript types repeat_region and transposable_element are also supported.
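If you have many trackList.json files to update, setting default_biotype can be scripted. This is a minimal Python sketch assuming a JBrowse 1 trackList.json whose tracks are listed under a top-level "tracks" array and identified by their "key"; it rewrites the file in place, so keep a backup. The function name is hypothetical.

```python
import json
from pathlib import Path

def set_default_biotype(track_list, track_key: str, biotype: str) -> int:
    """Set "default_biotype" (e.g. "ncRNA" or "auto") on every track whose
    "key" equals track_key; returns the number of tracks changed."""
    path = Path(track_list)
    data = json.loads(path.read_text(encoding="utf-8"))
    changed = 0
    for track in data.get("tracks", []):
        if track.get("key") == track_key:
            track["default_biotype"] = biotype
            changed += 1
    if changed == 0:
        raise KeyError(f"no track with key {track_key!r} in {path}")
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return changed
```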

Apache / Nginx configuration

Oftentimes, admins will use Apache or Nginx as a reverse proxy so that requests to a main server can be forwarded to the Tomcat server. This setup is not necessary, but it is a very standard configuration, as is modifying iptables.

Note that we use the SockJS library, which will downgrade to long-polling if websockets are not available, but since websockets are preferable, it helps to take some extra steps to ensure that the websocket calls are proxied or forwarded in some way too. Using Tomcat 8 or above is recommended.

If using a separate OAuth2 provider, a more detailed example of handling both the proxy and the authentication with OpenID Connect has also been provided.

Installing secure certificates.

Free certificates can be obtained using certbot.

Follow the instructions to install your appropriate certificate if users are going to potentially be sending passwords across.

Apache Proxy

Here is the most basic configuration for a reverse proxy with Apache 2.4 (will probably work for 2.2 as well).

Enable proxy_pass and proxy_wstunnel:

sudo a2enmod proxy proxy_wstunnel proxy_connect proxy_http
sudo service apache2 restart

In the apache conf directory edit proxy.conf

   <Proxy *>
      # if using Apache 2.2 use Order, Allow directives
      Order Deny,Allow
      Allow from all

      # if using Apache 2.4 use Require directive
      Require all granted

    </Proxy>
    
    ProxyPass /apollo/stomp/info http://localhost:8080/apollo/stomp/info
    ProxyPassReverse /apollo/stomp/info http://localhost:8080/apollo/stomp/info

    ProxyPass /apollo/stomp ws://localhost:8080/apollo/stomp
    ProxyPassReverse /apollo/stomp ws://localhost:8080/apollo/stomp

    ProxyPass           /apollo  http://localhost:8080/apollo
    ProxyPassReverse    /apollo  http://localhost:8080/apollo

If Tomcat is running SSL

If the secure certificate is installed on Tomcat and you’re proxying through Apache, use the https and wss protocols instead, and change the Tomcat server port explicitly:

    ProxyPass /apollo/stomp/info https://localhost:8443/apollo/stomp/info
    ProxyPassReverse /apollo/stomp/info https://localhost:8443/apollo/stomp/info

    ProxyPass /apollo/stomp wss://localhost:8443/apollo/stomp
    ProxyPassReverse /apollo/stomp wss://localhost:8443/apollo/stomp

    ProxyPass           /apollo  https://localhost:8443/apollo
    ProxyPassReverse    /apollo  https://localhost:8443/apollo

Note that a reverse proxy does not use ProxyRequests On (which turns on forward proxying and is dangerous).

Also note: This setup will downgrade (but will still function) to use AJAX long-polling without the websocket proxy being configured.

Debugging proxy issues

Note: if your webapp is accessible but you can’t seem to log in, you may need to customize the ProxyPassReverseCookiePath.

For example, if you proxied to a different path, you might have something like this:

ProxyPass  /testing http://localhost:8080
ProxyPassReverse  /testing http://localhost:8080
ProxyPassReverseCookiePath / /testing

Then your application might be accessible from http://localhost/testing/apollo

Nginx Proxy (from version 1.4 on)

Your setup may vary, but setting the upgrade headers can be used for the websocket configuration http://nginx.org/en/docs/http/websocket.html

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }
    
    server {
        # Main
        listen   80; server_name  myserver;
        
        # http://nginx.org/en/docs/http/websocket.html
        location /ApolloSever {
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_pass      http://127.0.0.1:8080;
        }
    }

Adding extra tabs

Extra tabs can be added to the side panel by over-riding the apollo configuration extraTabs:

    extraTabs = [
            ['title': 'extra1', 'url': 'http://localhost:8080/apollo/annotator/report/'],
            ['title': 'extra2', 'content': '<b>Apollo</b> documentation <a href="https://genomearchitect.readthedocs.io/" target="_blank">linked here</a>']
    ]

Upgrading existing instances

There are several scripts for migrating from older instances. See the migration guide for details. Particular notes:

Note: Apollo does not require using the add-webapollo-plugin.pl because the plugin is loaded implicitly by including the client/apollo/json/annot.json file at run time.

Upgrading existing JBrowse data stores

It is not necessary to change your existing JBrowse data directories to use Apollo 2.x; you can just point to existing data directories from your previous instances.

More information about JBrowse can also be found in their FAQ.

Adding custom CSS for track styling for JBrowse

There are a variety of different ways to include new CSS into the browser, but the easiest might be the following

Add the following statement to your trackList.json:

    "css" : "data/yourfile.css"

Then just place your CSS file in your organism’s data directory.

Adding custom CSS globally for JBrowse

If you want to add CSS that is used globally for JBrowse, you can edit the CSS in the client/apollo/css folder, but since you need to re-deploy the app every time for updates, it is easier to just edit the data directories for your organisms (you do not need to re-deploy the app when you are editing organism specific data, since this is outside of the webapp directory and is not deployed with the WAR file)

Adding custom CSS globally for the GWT app

If you want to style the GWT sidebar, generally the bootstrap theme is used but extra CSS is also included from web-app/annotator/theme.css which overrides the bootstrap theme

Adding / using proxies

If you are using HTTPS, or choose to use separate services rather than the default ones provided, you can set up a pass-through proxy or modify a particular URL.

This service is only available to logged-in users.

The internal proxy URL is:

<apollo url>/proxy/request/<encoded_proxy_url>/

For example, suppose the URL we want to proxy is:

http://golr.geneontology.org/solr/select

URL-encoded, this becomes:

http%3A%2F%2Fgolr.geneontology.org%2Fsolr%2Fselect

If your user is logged in and you pass in:

http://localhost/apollo/proxy/request/http%3A%2F%2Fgolr.geneontology.org%2Fsolr%2Fselect?testkey=asdf&anotherkey=zxcv

This will get proxied to:

http://golr.geneontology.org/solr/select?testkey=asdf&anotherkey=zxcv
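The encoding step above can be scripted. The following is a minimal Python sketch that assumes only the URL layout shown above. The function name apollo_proxy_url is hypothetical and is not part of Apollo. It percent-encodes the whole target URL, including ':' and '/', and appends the query string unchanged, which reproduces the example:

```python
from urllib.parse import quote


def apollo_proxy_url(apollo_url: str, target_url: str, query: str = "") -> str:
    """Build an Apollo internal-proxy URL (hypothetical helper).

    safe="" makes quote() encode ':' and '/' as well, so the whole target
    URL becomes a single path segment, as in the encoded example above.
    """
    encoded = quote(target_url, safe="")
    url = f"{apollo_url.rstrip('/')}/proxy/request/{encoded}"
    # Query parameters are appended as-is and passed through to the target.
    return f"{url}?{query}" if query else url


print(apollo_proxy_url(
    "http://localhost/apollo",
    "http://golr.geneontology.org/solr/select",
    "testkey=asdf&anotherkey=zxcv",
))
```

The proxy only works for logged-in users, so the resulting URL must be requested with an authenticated session.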

If you choose to use another proxy service, you can configure it on the “Proxy” page (as an administrator). Internally used proxies are provided by default. The final URL is chosen first by ‘active’ and then by ‘fallbackOrder’.

Register admin in configuration

If you want to register your admin user in the configuration, you can add a section to your apollo-config.groovy like:

apollo{
// other stuff
    admin{
        username = "super@duperadmin.com"
        password = System.getenv("APOLLO_ADMIN_PASSWORD")?:"demo"
        firstName = "Super"
        lastName = "Admin"
    }
}

Admin users are added on system startup, and only once; duplicate additions are ignored. User details can be given as literal text or read from the environment, as the password is in the example above.

Other authentication strategies

By default Apollo uses a username / password to authenticate users. However, additional strategies may be used.

To configure them, add them to the apollo-config.groovy and set active to true for the ones you want to use to authenticate.

apollo{
    // other stuff
    authentications = [
        ["name":"Username Password Authenticator",
         "className":"usernamePasswordAuthenticatorService",
         "active":true,
        ]
        ,
        ["name":"Remote User Authenticator",
         "className":"remoteUserAuthenticatorService",
         "active":false,
         "params":["default_group": "annotators"]
        ]
    ]
}

The Username Password Authenticator is the default method. It stores usernames and passwords securely within the database.

The Remote User Authenticator uses a separate Apache authorization proxy, as used by the Galaxy Community. Furthermore, users and groups can be inserted or updated via web services, which are wrapped by the Apollo python library. The default_group parameter adds a user to a default group on login so that the user has access to at least some genomes.

A more detailed guide using OpenIDConnect authorization explains usage of both the proxy and an authentication strategy.

URL modifications

You should be able to pass in most JBrowse URL modifications to the loadLink URL.

You should use tracklist=1 to force showing the native tracklist (or use the checkbox in the Track Tab in the Annotator Panel).

Use openAnnotatorPanel=0 to close the Annotator Panel explicitly on startup.

Linking to annotations

You can find a link to your current location by clicking the “chain link” icon in the upper, left-hand corner of the Annotator Panel.

It provides a popup with two URLs: a public URL for viewing while not logged in, and one to use while logged in.

Public URL

<apollo url>/<organism>/jbrowse/index.html?loc=<location>&tracks=<tracks>

  • location = <sequence name>:<start>..<end>
  • organism is the organism id or common name if unique.
  • tracks are url-encoded tracks separated by a comma

Example: http://demo.genomearchitect.io/Apollo2/3836/jbrowse/index.html?loc=Group1.31:287765..336436&tracks=Official%20Gene%20Set%20v3.2,GeneID

Logged-in URL

<apollo url>/annotator/loadLink?loc=<location>&organism=<organism>&tracks=<tracks>

  • location = <sequence name>:<start>..<end>. It can also be the name of an annotated feature if an organism is provided. Alternatively, it can be the feature's uniqueName (see the ID in the annotation detail window), which is typically a UUID and does not require an organism.
  • organism is the organism id or common name if unique.
  • tracks are url-encoded tracks separated by a comma

Examples:

  • http://demo.genomearchitect.io/Apollo2/annotator/loadLink?loc=Group1.31:287765..336436&organism=3836&tracks=Official%20Gene%20Set%20v3.2,GeneID
  • http://demo.genomearchitect.io/Apollo2/annotator/loadLink?loc=GB51936-RA&organism=3836&tracks=Official%20Gene%20Set%20v3.2,GeneID
  • http://demo.genomearchitect.io/Apollo2/annotator/loadLink?uuid=355617c7-f8c1-4105-bb11-755cee1855df&tracks=Official%20Gene%20Set%20v3.2,GeneID
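If you generate links programmatically, for example from a report or spreadsheet, the following Python sketch assembles a logged-in loadLink URL from the parts listed above. The function name load_link_url and its extra argument are hypothetical conveniences. extra can carry options such as tracklist=1 or openAnnotatorPanel=0 from the URL modifications section. The sketch does not cover the uuid= form.

```python
from typing import Dict, Iterable, Optional, Union
from urllib.parse import quote


def load_link_url(apollo_url: str, loc: str, tracks: Iterable[str],
                  organism: Optional[Union[int, str]] = None,
                  extra: Optional[Dict[str, str]] = None) -> str:
    """Build a logged-in loadLink URL (hypothetical helper)."""
    # ':' is left readable in locations such as Group1.31:287765..336436.
    params = [("loc", quote(loc, safe=":"))]
    if organism is not None:
        params.append(("organism", quote(str(organism), safe="")))
    # Each track label is URL-encoded on its own; the separating commas are not.
    params.append(("tracks", ",".join(quote(t, safe="") for t in tracks)))
    # Optional extra URL options, e.g. {"tracklist": "1"}.
    for key, value in (extra or {}).items():
        params.append((key, quote(str(value), safe="")))
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"{apollo_url.rstrip('/')}/annotator/loadLink?{query}"


print(load_link_url("http://demo.genomearchitect.io/Apollo2",
                    "Group1.31:287765..336436",
                    ["Official Gene Set v3.2", "GeneID"],
                    organism=3836))
```

With these arguments the sketch prints the first example URL above.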

Setting default track list behavior

By default the native tracklist is off, but it can be turned on. If you want it to be on by default for new users, add this to the apollo-config.groovy:

apollo{
   native_track_selector_default_on = true
}

Set Common Data Directory in the config

The common_data_directory is where uploaded and processed JBrowse tracks are stored.

This should be server-writable space on your system that is not deleted (note that /tmp is cleared periodically on most Unix systems).

common_data_directory = "/opt/temporary/apollo"

If you don’t plan to use these features, then /tmp might be fine.

If no directory is specified, Apollo will generally create one at $HOME/apollo_data, or let you set one from the command line.

Adding tracks via addStores

The JBrowse Configuration Guide describes in detail how to add tracks to JBrowse using addStores. That approach sends track config JSON through the URL, which can be problematic, especially with newer versions of Tomcat.

Instead we recommend using the dot notation to add track configuration through the URL.

Thus,

addStores={"uniqueStoreName":{"type":"JBrowse/Store/SeqFeature/GFF3","urlTemplate":"url/of/my/file.gff3"}}

becomes,

addStores.uniqueStoreName.type=JBrowse/Store/SeqFeature/GFF3&addStores.uniqueStoreName.urlTemplate=url/of/my/file.gff3
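Converting an addStores JSON object to dot notation is mechanical: each nested key adds one more dotted segment. The following is a minimal Python sketch that assumes the store configuration is a nested dict of strings. The helper names to_dot_notation and add_stores_query are hypothetical.

```python
from urllib.parse import quote


def to_dot_notation(prefix: str, value) -> list:
    """Flatten nested dicts into (prefix.key.subkey, value) pairs."""
    if isinstance(value, dict):
        pairs = []
        for key, sub in value.items():
            pairs.extend(to_dot_notation(f"{prefix}.{key}", sub))
        return pairs
    return [(prefix, str(value))]


def add_stores_query(stores: dict) -> str:
    """Build the addStores.* part of a URL query string (hypothetical helper)."""
    pairs = to_dot_notation("addStores", stores)
    # safe="/" keeps slashes readable; other reserved characters
    # (for example braces in {refseq}) are percent-encoded.
    return "&".join(f"{key}={quote(value, safe='/')}" for key, value in pairs)


stores = {"uniqueStoreName": {"type": "JBrowse/Store/SeqFeature/GFF3",
                              "urlTemplate": "url/of/my/file.gff3"}}
print(add_stores_query(stores))
```

For the store above, this prints the same dot-notation string as the example.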

Following are a few recommendations for adding tracks via dot notation in Apollo:

  • avoid {dataRoot} in your urlTemplate
  • avoid specifying data folder name in your urlTemplate
  • avoid specifying baseUrl

Since Apollo is aware of the organism data folder, specifying it explicitly in the urlTemplate can cause issues with URL redirects.

Setting Track Style by type

For the default track type (FeatureTrack), you can set the feature style by type. This is useful, for example, if you have multiple feature types on a single track and want to distinguish them. Set the track className to {type} in the style section of that track's entry in trackList.json:

 "style": {
     "className": "{type}"
 },

You then have to specify a custom CSS file for that type in the trackList.json:

"css":"data/custom.css"

And that file has to go at the same level as trackList.json.

An example CSS entry to specify the feature type lnc_RNA might be:

 .minus-lnc_RNA .neat-UTR,
 .plus-lnc_RNA .neat-UTR,
 .lnc_RNA .neat-UTR{
         height: 12px;
         margin-top: 2px;
         color: rgb(200,2,3);
         background-color: rgb(5,4,255) !important;
 }

For Canvas and HTML track configuration options, please see the JBrowse documentation for additional details.

Hiding JBrowse tracks from the public

To hide a track from the public on a public organism, add an apollo.permission.level of private to that JBrowse track:

      {
         "compress" : 0,
         "key" : "GeneData_hidden",
         "label" : "GeneData_hidden",
         "storeClass" : "JBrowse/Store/SeqFeature/NCList",
         ... 
         "apollo":{
             "permission":{
                 "level":"private"
             }
         },
         ... 
         "trackType" : null,
         "type" : "FeatureTrack",
         "urlTemplate" : "tracks/GeneData/{refseq}/trackData.json"
      },
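If you need to mark many tracks private, editing trackList.json by hand gets tedious. The following Python sketch sets the same apollo/permission/level keys on the track with a given label. The helper name make_track_private is hypothetical. The sketch assumes the usual trackList.json layout with a top-level "tracks" array. Back up trackList.json before rewriting it.

```python
import json


def make_track_private(track_list: dict, label: str) -> bool:
    """Add "apollo": {"permission": {"level": "private"}} to the labelled track.

    Returns True if a track with that label was found and updated.
    """
    for track in track_list.get("tracks", []):
        if track.get("label") == label:
            permission = track.setdefault("apollo", {}).setdefault("permission", {})
            permission["level"] = "private"
            return True
    return False


# A trimmed-down trackList.json. In practice you would json.load() the
# organism's trackList.json, call make_track_private, and json.dump() it back.
track_list = {"tracks": [{"label": "GeneData_hidden", "type": "FeatureTrack"}]}
make_track_private(track_list, "GeneData_hidden")
print(json.dumps(track_list["tracks"][0], indent=2))
```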

Only owners can edit

To restrict deletion and reverting to the original editor or an admin user, set:

apollo.only_owners_delete = true

Chado Export Configuration

Following are the steps for setting up a Chado data source that is compatible with Apollo Chado Export.

Create a Chado database

First create a database in PostgreSQL for Chado.

Note: Initial testing has only been done on PostgreSQL.

The default names are apollo-chado and apollo-production-chado for the development and production environments, respectively.

Create a Chado user

Now, create a database user that has all access privileges to the newly created Chado database.

Load Chado schema and ontologies

Apollo assumes that the Chado database has Chado schema v1.2 or greater and has the following ontologies loaded:

  1. Relations Ontology
  2. Sequence Ontology
  3. Gene Ontology

The quickest and easiest way to do this is to use prebuilt Chado schemas. Apollo provides a prebuilt Chado schema with the necessary ontologies. (thanks to Eric Rasche at Center for Phage Technology, TAMU)

Users can load this prebuilt Chado schema as follows:

scripts/load_chado_schema.sh -u <USER> -d <CHADO_DATABASE> -h <HOST> -p <PORT> -s <CHADO_SCHEMA_SQL>

If there is already an existing database with the same name and if you would like to dump and create a clean database:

scripts/load_chado_schema.sh -u <USER> -d <CHADO_DATABASE> -h <HOST> -p <PORT> -s <CHADO_SCHEMA_SQL> -r

The ‘-r’ flag tells the script to perform a pg_dump if <CHADO_DATABASE> exists.

e.g.,

scripts/load_chado_schema.sh -u postgres -d apollo-chado -h localhost -p 5432 -r -s chado-schema-with-ontologies.sql.gz

The file chado-schema-with-ontologies.sql.gz can be found in the Apollo/scripts/ directory.

The load_chado_schema.sh script creates log files which can be inspected to see if loading the schema was successful.

Note that you will also need to do this for your testing and production instances.

Configure data sources

In apollo-config.groovy, uncomment the configuration for datasource_chado and specify the proper database name, database user name and database user password.

Export via UI

Users can export existing annotations to the Chado database via the Annotator Panel -> Ref Sequence -> Export.

Export via web services

Users can also leverage the Apollo web services API to export annotations to Chado. As a demonstration, a sample script, export_annotations_to_chado.groovy is provided.

Usage for the script:

export_annotations_to_chado.groovy -organism ORGANISM_COMMON_NAME -username APOLLO_USERNAME -password APOLLO_PASSWORD -url http://localhost:8080/apollo

Data generation pipeline

Using the methods below, you can generate and update a trackList.json and then make any further manual changes required.

Canvas vs HTML in Apollo

Most of the JBrowse documentation for configuring tracks applies here. However, there are a few important points about Canvas vs HTML tracks.

If you need the benefits of both track types, you are free to duplicate a track and use an alternate styling or track type.

BigWig tracks are only shown as Canvas.

HTML Tracks

Pros: you can create evidence by dragging, clicking on an annotation or evidence shows the alignment, and CSS styling is available. Cons: renders more slowly.

Note that in most cases regular HTML rendering will be preferable. Exceptions would be dense BAM alignments and dense Variant tracks where you are attempting to display the density at a higher resolution.

HTML Track mapping with type=<?> :

  • Annotation / GFF3: FeatureTrack, NeatHTMLFeatures/View/Track/NeatFeature, JBrowse/View/Track/HTMLFeatures, WebApollo/View/Track/DraggableNeatHTMLFeatures
  • Alignment: JBrowse/View/Track/Alignments, WebApollo/View/DraggableAlignments
  • Variant: JBrowse/View/Track/HTMLVariants, WebApollo/View/Track/WebApolloHTMLVariants

Canvas Tracks

Pros: renders faster, and non-CSS style options are available. Cons: you cannot drag to create evidence, and clicking on an annotation or evidence does not show the alignment.

Canvas Track mapping with type=<?> :

  • Annotation / GFF3: NeatCanvasFeatures/View/Track/NeatFeature, JBrowse/View/Track/CanvasFeatures, WebApollo/View/Track/WebApolloNeatCanvasFeatures
  • Alignment: JBrowse/View/Track/Alignments2, WebApollo/View/Track/WebApolloAlignments2
  • Variant: JBrowse/View/Track/CanvasVariants, WebApollo/View/Track/WebApolloCanvasVariants

Apollo Automated Configuration and upload

Admin users may upload FASTA files to create new genomes and upload most track types in a similar manner if a default configuration is desirable.

_images/AddGenomeSmall.png

Admin users may also add most tracks in a similar fashion:

_images/AddTrackSmall.png

JBrowse Configuration

The manual data generation pipeline is based on the typical jbrowse commands such as prepare-refseqs.pl and flatfile-to-json.pl, and these scripts are automatically copied to a local bin/ directory when you run the setup scripts (e.g. apollo run-local or apollo deploy or install_jbrowse.sh).

If you don’t see a bin/ subdirectory containing these scripts after running the setup, check setup.log and check the troubleshooting guide for additional tips or feel free to post the error and setup.log on GitHub or the mailing list.

prepare-refseqs.pl

The first step to setup the genome browser is to load the reference genome data. We’ll use the prepare-refseqs.pl script to output to the data directory.

bin/prepare-refseqs.pl --fasta pyu_data/scf1117875582023.fa --out /opt/apollo/data

If you want to use an indexed FASTA genome then you can run prepare-refseqs.pl as follows:

bin/prepare-refseqs.pl --indexed_fasta pyu_data/scf1117875582023.fa --out /opt/apollo/data

The script will copy the genome FASTA and its FAI index into the output folder.

Note: the output directory is used later when we load the organism into the browser with the “Create organism” form.

flatfile-to-json.pl

The flatfile-to-json.pl script can be used to load GFF3 files, and you can customize the feature types. Here, we’ll start off by loading data from the MAKER GFF for the Pythium ultimum data. The simplest loading command specifies a --trackLabel, the --type of feature to load, the --gff file and the --out directory.

bin/flatfile-to-json.pl --gff pyu_data/scf1117875582023.gff --type mRNA \
        --trackLabel MAKER --out /opt/apollo/data

Note: you can also use the command bin/maker2jbrowse for loading the MAKER data.

Also see the Customizing features section for more information on customizing the CSS styles of the Apollo features.

Note: Apollo uses features that are loaded at the “transcript” level. If your GFF3 has “gene” features with “transcript”/“mRNA” child features, make sure that you use the argument --type mRNA or --type transcript.

generate-names.pl

Once data tracks have been created, you can generate a searchable index of names using the generate-names.pl script:

bin/generate-names.pl --verbose --out /opt/apollo/data

This is an optional but useful step that indexes feature names and reference sequence names. If some of your tracks have millions of features, consider indexing only selected tracks with the --tracks argument, or disabling autocomplete with --completionLimit 0.

add-bam-track.pl

Apollo natively supports BAM files and the file can be read (in chunks) directly from the server with no preprocessing.

To add a BAM track, copy the .bam and .bam.bai files to your data directory, and then use the add-bam-track.pl to add the file to the tracklist.

mkdir /opt/apollo/data/bam
cp pyu_data/simulated-sorted.bam /opt/apollo/data/bam
cp pyu_data/simulated-sorted.bam.bai /opt/apollo/data/bam
bin/add-bam-track.pl --bam_url bam/simulated-sorted.bam \
   --label simulated_bam --key "simulated BAM" -i /opt/apollo/data/trackList.json

Note: the bam_url parameter is a URL that is relative to the data directory. It is not a filepath! Also, the .bai will automatically be located if it is simply the .bam with .bai appended to it.

add-bw-track.pl

Apollo also has native support for BigWig files (.bw), so no extra processing of these files is required either.

To use this, copy the BigWig data into the jbrowse data directory and then use the add-bw-track.pl to add the file to the tracklist.

mkdir /opt/apollo/data/bigwig
cp pyu_data/*.bw /opt/apollo/data/bigwig
bin/add-bw-track.pl --bw_url bigwig/simulated-sorted.coverage.bw \
    --label simulated_bw --key "simulated BigWig" -i /opt/apollo/data/trackList.json

Note: the bw_url parameter is a URL that is relative to the data directory. It is not a filepath!

Customizing different annotation types (advanced)

To change how the different annotation types look in the “User-created annotation” track, you’ll need to update the mapping of the annotation type to the appropriate CSS class. This data resides in client/apollo/json/annot.json, which is a file containing Apollo tracks that is loaded by default. You’ll need to modify the JSON entry whose label is Annotations. Of particular interest is the alternateClasses element. Let’s look at that default element:

"alternateClasses": {
    "pseudogene" : {
       "className" : "light-purple-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "tRNA" : {
       "className" : "brightgreen-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "snRNA" : {
       "className" : "brightgreen-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "snoRNA" : {
       "className" : "brightgreen-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "ncRNA" : {
       "className" : "brightgreen-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "miRNA" : {
       "className" : "brightgreen-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "rRNA" : {
       "className" : "brightgreen-80pct",
       "renderClassName" : "gray-center-30pct"
    },
    "repeat_region" : {
       "className" : "magenta-80pct"
    },
    "transposable_element" : {
       "className" : "blue-ibeam",
       "renderClassName" : "blue-ibeam-render"
    }
}

For each annotation type, you can override the default class mapping for both className and renderClassName to use another CSS class. Check out the Customizing features section for more information on customizing the CSS classes.

Customizing features

The visual appearance of biological features in Apollo (and JBrowse) is handled by CSS stylesheets with HTMLFeatures tracks. Every feature and subfeature is given a default CSS “class” that matches a default CSS style in a CSS stylesheet. These styles are defined in client/apollo/css/track_styles.css and client/apollo/css/webapollo_track_styles.css. Additional styles are also defined in these files. You can use them by specifying them explicitly in the --className, --subfeatureClasses, --renderClassname, or --arrowheadClass parameters to flatfile-to-json.pl (see the data loading section).

Apollo differs from JBrowse in some of its styling, largely to help with feature selection, edge-matching, and dragging. For features that have children, Apollo by default uses invisible container elements (with style class names like “container-16px”), so that the children are fully contained within the parent feature. Each container is paired with another styled element that is rendered within the feature but underneath the subfeatures. That element is specified by the --renderClassname argument to flatfile-to-json.pl. Exons are also treated by default as special invisible containers, which hold styled elements for UTRs and CDS.

It is relatively easy to add other stylesheets that have custom style classes that can be used as parameters to flatfile-to-json.pl. For example, you can create /opt/apollo/data/custom_track_styles.css which contains two new styles:

    .gold-90pct, 
    .plus-gold-90pct, 
    .minus-gold-90pct  {
        background-color: gold;
        height: 90%;
        top: 5%;
        border: 1px solid gray;
    }

    .dimgold-60pct, 
    .plus-dimgold-60pct, 
    .minus-dimgold-60pct  {
        background-color: #B39700;
        height: 60%;
        top: 20%;
    }

In this example, two subfeature styles are defined. The top property is set to (100% - height)/2 so that the subfeatures are centered vertically within their parent feature. When defining new styles for features, it is important to specify rules for plus-stylename and minus-stylename in addition to stylename. Apollo adds “plus-” or “minus-” to the class of the feature if the feature has a strand orientation.
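The two rules in this paragraph can be captured in a small generator: every style needs a plus- and minus- variant, and top = (100% - height)/2. The following is a Python sketch. css_rule, strand_selectors and centered_top are hypothetical helpers, not Apollo code. Underscores in keyword arguments are turned into hyphens in the CSS property names.

```python
def strand_selectors(style: str) -> list:
    """Selectors a feature style needs: plain, plus-strand and minus-strand."""
    return [f".{style}", f".plus-{style}", f".minus-{style}"]


def centered_top(height_pct: float) -> float:
    """top = (100% - height) / 2, which centers a subfeature vertically."""
    return (100 - height_pct) / 2


def css_rule(style: str, height_pct: float, **declarations: str) -> str:
    """Render one rule for all three selectors, with height and top filled in."""
    props = dict(declarations, height=f"{height_pct:g}%",
                 top=f"{centered_top(height_pct):g}%")
    body = "\n".join(f"    {name.replace('_', '-')}: {value};"
                     for name, value in props.items())
    return ",\n".join(strand_selectors(style)) + " {\n" + body + "\n}"


print(css_rule("gold-90pct", 90, background_color="gold", border="1px solid gray"))
```

The printed rule matches the gold-90pct example above, apart from the order of the properties.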

You need to tell Apollo where to find these styles by modifying the JBrowse config or the plugin config, e.g. by adding this to the trackList.json:

    "css" : "data/custom_track_styles.css"

You can then apply these new styles with --subfeatureClasses, which uses the specified CSS classes for your features in the genome browser, for example:

    bin/flatfile-to-json.pl --gff MyFile.gff \
       --type mRNA --trackLabel MyTrack      \
       --subfeatureClasses '{"CDS":"gold-90pct","UTR": "dimgold-60pct"}'

Bulk loading annotations to the user annotation track

GFF3

You can use the tools/data/add_features_from_gff3_to_annotations.pl script to bulk load GFF3 files with transcripts to the user annotation track. Let’s say we want to load our maker.gff transcripts.

    tools/data/add_features_from_gff3_to_annotations.pl \
        -U localhost:8080/Apollo -u web_apollo_admin -p web_apollo_admin \
        -i scf1117875582023.gff -t mRNA -o "name of organism"

The default options should be able to handle most GFF3 files that contain genes, transcripts, and exons.

You can still use this script even if the GFF3 file that you are loading does not contain transcripts and exon types. Let’s say we want to load match and match_part features as transcripts and exons respectively. We’ll use the blastn.gff file as an example.

    tools/data/add_features_from_gff3_to_annotations.pl \
       -U localhost:8080/Apollo -u web_apollo_admin -p web_apollo_admin \
       -i blastn.gff -t match -e match_part -o "name of organism"

You can view the add_features_from_gff3_to_annotations.pl help (-h) option for all available options.

Note: Apollo makes a clear distinction between a transcript and an mRNA. Genes that have an mRNA as their child feature are treated as protein-coding annotations. Genes that have a transcript as their child feature are treated as non-coding annotations, specifically pseudogenes.

Note: In order to create meaningful names from your evidence when creating manual annotations, the GFF3 should provide the Name attribute in column 9 of the GFF3 spec as shown in this example:

NC_000001.11    BestRefSeq      gene    11874   14409   .       +       .       ID=gene1;Name=DDX11L1;Dbxref=GeneID:100287102,HGNC:37102;description=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;pseudo=true
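Before loading evidence, it can help to check which features lack a Name attribute. The Python sketch below parses column 9 as semicolon-separated tag=value pairs. It splits multiple values on commas and decodes percent-escapes such as %2F. The function names are hypothetical. This is not a full GFF3 validator: it does not check the other eight columns.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict:
    """Parse GFF3 column 9 into {tag: [values]}."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        tag, _, value = field.partition("=")
        # Values are comma-separated and may contain percent-escapes (%2F, %28, ...).
        attributes[tag.strip()] = [unquote(v) for v in value.split(",")]
    return attributes


def has_name(gff_line: str) -> bool:
    """True if a 9-column GFF3 feature line carries a Name attribute."""
    columns = gff_line.rstrip("\n").split("\t")
    return len(columns) == 9 and "Name" in parse_gff3_attributes(columns[8])


line = "\t".join([
    "NC_000001.11", "BestRefSeq", "gene", "11874", "14409", ".", "+", ".",
    "ID=gene1;Name=DDX11L1;Dbxref=GeneID:100287102,HGNC:37102;"
    "description=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;"
    "gbkey=Gene;gene=DDX11L1;pseudo=true",
])
print(has_name(line), parse_gff3_attributes(line.split("\t")[8])["Name"])
```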

If you would like to look at a compatible representative GFF3, export annotations from Apollo via GFF3 export.

Disable draggable

Apollo has a number of specific track config parameters:

overrideDraggable (boolean)
determines whether to transform alignment tracks into draggable alignments

overridePlugins (boolean)
determines whether to transform alignments and sequence tracks

These can be specified on a specific track or in a global config.

Troubleshooting guide

Tomcat memory

Typically, the default memory allowance for the Java Virtual Machine (JVM) is too low. The memory requirements for Web Apollo depend on many variables. In general, we recommend at least 1g for the maximum memory and 512m for the minimum, though a 2 GB maximum seems to be optimal for most server configurations.

Suggested Tomcat memory settings

export CATALINA_OPTS="-Xms512m -Xmx2g \
        -XX:+CMSClassUnloadingEnabled \
        -XX:+CMSPermGenSweepingEnabled \
        -XX:+UseConcMarkSweepGC"

Where the assembled genome is highly fragmented, additional tuning of memory and garbage collection is needed to keep the system stable. Below is an example from a research group that maintains over 40 Apollo instances. Their assemblies range from 1,000 to 150,000 scaffolds (reference sequences), and they serve over one hundred users:

export CATALINA_OPTS="-Xmx12288m -Xms8192m \
        -XX:ReservedCodeCacheSize=64m \
        -XX:+UseG1GC \
        -XX:+CMSClassUnloadingEnabled \
        -Xloggc:$CATALINA_HOME/logs/gc.log \
        -XX:+PrintHeapAtGC \
        -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps"

To change your settings, you can usually edit the setenv.sh script in $TOMCAT_BIN_DIR/setenv.sh, where $TOMCAT_BIN_DIR is the directory where the Tomcat binaries reside. This file may not exist by default. If you create it, it will be picked up when Tomcat restarts. Make sure that Tomcat can read the file.

In most cases, creating the setenv.sh should be sufficient but you may have to edit a catalina.sh or another file directly depending on your system and tomcat setup. For example, on Ubuntu, the file /etc/default/tomcat7 often contains these settings.

Confirm your settings

Your CATALINA_OPTS settings from setenv.sh can be confirmed with a tool like jvisualvm or via the command line with the ps tool. e.g. ps -ef | grep java should yield something like the following allowing you to confirm that your memory settings have been picked up.

root      9848     1  0 Oct22 ?        00:36:44 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/current/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms1g -Xmx2g -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseConcMarkSweepGC -Dj
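To check the heap flags programmatically rather than by eye, you can extract them from the ps output. The following is a minimal Python sketch. It assumes the flags appear as -Xms<size> and -Xmx<size> with an optional k, m or g suffix. heap_settings and to_mib are hypothetical helper names. If a flag is repeated, the sketch keeps its last occurrence.

```python
import re

# Multipliers that convert a JVM size suffix into MiB ("" means bytes).
UNIT_TO_MIB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def heap_settings(command_line: str) -> dict:
    """Return {"Xms": ..., "Xmx": ...} as found in a java command line."""
    settings = {}
    for flag, value in re.findall(r"-(Xms|Xmx)(\d+[kKmMgG]?)(?=\s|$)", command_line):
        settings[flag] = value  # a repeated flag overwrites the earlier value
    return settings


def to_mib(value: str) -> float:
    """Convert a size such as "512m" or "2g" to MiB."""
    number, unit = re.fullmatch(r"(\d+)([kKmMgG]?)", value).groups()
    return int(number) * UNIT_TO_MIB[unit.lower()]


ps_line = ("root 9848 1 0 Oct22 ? 00:36:44 /usr/lib/jvm/java-8-openjdk-amd64/bin/java "
           "-Xms1g -Xmx2g -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC")
for flag, value in heap_settings(ps_line).items():
    print(flag, value, f"({to_mib(value):g} MiB)")
```

On a live system you would pass in the real line from ps -ef | grep java instead of the sample string.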

Re-install after changing settings

If you start seeing memory leaks (java.lang.OutOfMemoryError: Java heap space) after doing an update, you might try re-installing, as the live re-deploy itself can cause memory leaks or an inconsistent software state.

If you have named your web application Apollo.war, you can remove all of its files from your webapps directory and re-deploy:

  • Run apollo deploy
  • Undeploy any existing Apollo instances
  • Stop tomcat
  • Copy the war file to the webapps folder
  • Start tomcat

Tomcat permissions

Preferably, when running Apollo or any webserver, you should not run Tomcat as root. Therefore, when deploying your war file to tomcat or another web application server, you may need to tune your file permissions to make sure Tomcat is able to access your files.

On many production systems, Tomcat will typically belong to a user and group called something like ‘tomcat’. Make sure that the ‘tomcat’ user can read your “webapps” directory (where you placed your war file). It must also be able to write into the annotations directory and any other relevant directory (e.g. tomcat/logs). It is sometimes helpful to add the user you are logged in as to the same group as your tomcat user, and to set group write permissions for both.

Consider using a package manager to install Tomcat so that proper security settings are installed, or using jsvc: http://tomcat.apache.org/tomcat-7.0-doc/security-howto.html#Non-Tomcat_settings

Errors with JBrowse

JBrowse tools don’t show up in bin directory (or install at all) after install or typing install_jbrowse.sh

If the bin directory with JBrowse tools doesn’t show up after calling install_jbrowse.sh, JBrowse is having trouble installing itself for one of a few possible reasons. If the solutions below do not work, please consult the JBrowse troubleshooting and JBrowse install pages, as well as the setup.log file created during the installation process.

cpanm or other components are not installed

Make sure the appropriate JBrowse libraries are installed on your system.

If you see chmod: cannot access `web-app/jbrowse/bin/cpanm': No such file or directory make sure to install cpanm.

Git tool is too old

Git expects to clone a single branch which is supported in git 1.7.10 and greater. The output when that fails looks something like this:

Buildfile: build.xml

copy.apollo.plugin.webapp:

setup-jbrowse:

git.clone:
[exec] Result: 129

The solution is to upgrade git to 1.7.10 or greater or remove the line with the --single-branch option in build.xml.

Accessing git behind a firewall

If you are behind a firewall, checking out code using the git:// protocol may not be allowed, but that is the default. The output will look something like this:

setup-jbrowse:

git.clone:
     [exec] Submodule 'src/FileSaver' (git://github.com/dkasenberg/FileSaver.js.git) registered for path 'src/FileSaver'
     [exec] Submodule 'src/dbind' (git://github.com/rbuels/dbind.git) registered for path 'src/dbind'
    . . . .
     [exec] Submodule 'src/xstyle' (git://github.com/kriszyp/xstyle.git) registered for path 'src/xstyle'
     [exec] Result: 1

with possibly more output below.

Type:

git config --global url."https://".insteadOf git://

on the command line, and then re-install using ./apollo clean-all followed by ./apollo run-local (or ./apollo deploy).

e.g. “Can’t locate Hash/Merge.pm in @INC” or “Can’t locate JBlibs.pm in @INC”

If you are trying to run the jbrowse binaries but get these sorts of errors, try running install_jbrowse.sh which will initialize as many pre-requisites as possible including JBLibs and other JBrowse dependencies.

Rebuilding JBrowse

You can manually clear jbrowse files from web-app/jbrowse and re-run apollo deploy to rebuild JBrowse.

RequestError: Unable to load … Apollo2/jbrowse/data/trackList.json status: 500

Apollo2 does fairly strict JSON validation, so make sure your trackList.json file is valid JSON.

If you still get this error after validating, please forward the issue to our GitHub issue tracker.
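To find where a trackList.json stops being valid JSON, a strict parser that reports the line and column of the first error is usually enough. The following Python sketch uses the standard json module, which rejects trailing commas and comments. json_error and check_files are hypothetical names.

```python
import json
import sys


def json_error(text: str, name: str = "trackList.json"):
    """Return None if text is valid JSON, otherwise a message locating the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"{name}: line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def check_files(paths) -> int:
    """Print OK or the first JSON error for each file; return the number of bad files."""
    errors = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            problem = json_error(handle.read(), path)
        print(problem or f"{path}: OK")
        errors += problem is not None
    return errors


if __name__ == "__main__":
    check_files(sys.argv[1:])
```

Save it under any name and run it with the path to your trackList.json as the argument. The standard python -m json.tool trackList.json command gives a similar check without a script.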

Complaints about 8080 being in use

Please check that you don’t already have Tomcat running, using netstat -tan | grep 8080. Sometimes Tomcat does not exit properly. In that case, find it with ps -ef | grep java and then kill -9 the offending process.

Note that you can also configure tomcat to run on different ports, or you can launch a temporary instance of apollo with apollo run-local 8085 for example to avoid the port conflict.
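The netstat check can also be done from Python, which is handy on systems without netstat. This sketch simply tries to open a TCP connection to the port. port_in_use is a hypothetical helper. A True result only tells you that something is listening, not that it is Tomcat.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


print("Port 8080 in use:", port_in_use(8080))
```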

Unable to open the h2 / default database for writing

If you receive an error similar to this:

SEVERE: Unable to create initial connections of pool.
org.h2.jdbc.JdbcSQLException: Error opening database: 
    "Could not save properties /var/lib/tomcat7/prodDb.lock.db" [8000-176]

This happens when the production server tries to write an H2 database in a location it doesn’t have permission to write to. H2 is great for testing or single-user use, but not for full-blown production. If you use it, make sure the database is written to a directory the server user can write to.

You can modify the specified data directory for the H2 database in the apollo-config.groovy. For example, using the /tmp/ directory, or some other directory:

url = "jdbc:h2:/tmp/prodDb;MVCC=TRUE;LOCK_TIMEOUT=10000;DB_CLOSE_ON_EXIT=FALSE"

This will write the H2 database file(s) to /tmp/ with the base name prodDb. If you don’t specify an absolute path, it will try to write to the directory Tomcat is running in, e.g., /var/lib/tomcat7/, which can cause permission issues.

More detail on database configuration when specifying the apollo-config.groovy file is available in the setup guide.

Grails cache errors

In some instances you can’t write to the default cache location on disk. Part of an example config log:

2015-07-03 14:37:39,675 [main] ERROR context.GrailsContextLoaderListener  - Error initializing the application: null
java.lang.NullPointerException
        at grails.plugin.cache.ehcache.GrailsEhCacheManagerFactoryBean$ReloadableCacheManager.rebuild(GrailsEhCacheManagerFactoryBean.java:171)
        at grails.plugin.cache.ehcache.EhcacheConfigLoader.reload(EhcacheConfigLoader.groovy:63)
        at grails.plugin.cache.ConfigLoader.reload(ConfigLoader.groovy:42)

There are several solutions to this, but all involve updating the apollo-config.groovy file to override the caching defined in the Config.groovy.

Disabling the cache:

    grails.cache.config = {
        cache {
            enabled = false
            name 'globalcache'
        }
    }

You can also disable the cache by removing the plugin. In grails-app/conf/BuildConfig.groovy, remove or comment out the following line and rebuild:

 compile ':cache-ehcache:1.0.5'

Disallow writing overflow to disk

This can be used for small instances.

    grails.cache.config = {
        // avoid ehcache naming conflict to run multiple WA instances
        provider {
            name "ehcache-apollo-"+(new Date().format("yyyyMMddHHmmss"))
        }
        cache {
            enabled = true
            name 'globalcache'
            eternal false
            overflowToDisk false   // THIS IS THE IMPORTANT LINE
            maxElementsInMemory 100000
        }
    }

Specify the overflow directory

This is best for high-load servers, which will need the cache. Make sure your Tomcat (or other web server) user can write to that directory:

    // copy from Config.groovy except where noted
    grails.cache.config = {
    ... 
        cache {
        ...  
            maxElementsOnDisk 10000000
            // this is the important part, below!
            diskStore{
                path '/opt/apollo/cache-directory'
            }
        }
        ...
    }
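The directory requirement above can be checked before restarting Tomcat. This is a hypothetical helper (not part of Apollo or Grails) that creates the directory if it is missing and reports whether the current user can write to it, so it is only meaningful when run as the account Tomcat runs under.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apollo or Grails): create a directory if it
# is missing and report whether the *current* user can write to it.
# Run it as the Tomcat user, since that is the account ehcache writes as.
ensure_writable_dir() {
  dir=$1
  mkdir -p "$dir" 2>/dev/null
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $dir"
  else
    echo "not writable: $dir" >&2
    return 1
  fi
}
```

This suits provisioning scripts that already run as the Tomcat user. For a one-off check of the example path, `sudo -u tomcat8 test -w /opt/apollo/cache-directory && echo writable` does the same job; the user name tomcat8 is an assumption and differs between distributions.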

Information on the cache configuration above is available in the documentation for the Grails ehcache plugin (see “Overriding values”) and for ehcache itself.

JSON in the URL with newer versions of Tomcat

When JSON is added to the URL string (e.g., addStores and addTracks), you may get this error with newer patched versions of Tomcat, such as 7.0.73, 8.0.39, and 8.5.7:

 java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986

The best solution we’ve come up with (there may be others) is to explicitly allow these characters, which is possible starting with Tomcat versions 7.0.76, 8.0.42, and 8.5.12. To do so, add the following line to $CATALINA_HOME/conf/catalina.properties:

tomcat.util.http.parser.HttpParser.requestTargetAllow=|{}
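If you maintain several Tomcat installations, the catalina.properties line can be added idempotently. The following is a hypothetical sketch (not part of Apollo or Tomcat): it appends the property only when no requestTargetAllow line exists yet, and leaves an existing value untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apollo or Tomcat): append the
# requestTargetAllow property to a catalina.properties file unless a
# requestTargetAllow line is already present (an existing value is left as-is).
allow_json_chars() {
  props=$1
  key='tomcat.util.http.parser.HttpParser.requestTargetAllow'
  if grep -q "^$key=" "$props" 2>/dev/null; then
    echo "already set in $props"
    return 0
  fi
  # Make sure we start on a fresh line if the file lacks a trailing newline.
  if [ -s "$props" ] && [ -n "$(tail -c 1 "$props")" ]; then
    echo >> "$props"
  fi
  printf '%s=|{}\n' "$key" >> "$props"
  echo "added to $props"
}
```

Call it with the path to your file (e.g. "$CATALINA_HOME/conf/catalina.properties") as a user allowed to write there, then restart Tomcat so the property takes effect.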

Java mismatch

If you get an Unsupported major.minor error or similar, please confirm that the version of Java that Tomcat is running (check with ps -ef | grep java) is the same as the one you used to build. Setting JAVA_HOME to the Java 8 JDK should fix most problems.
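Comparing the two Java installations by major version is usually enough to spot this mismatch. The helpers below are a hypothetical sketch (not part of Apollo): one extracts the quoted version string from `java -version`, which Java prints on stderr, and the other reduces both the legacy "1.8.0_292" style and the newer "11.0.12" style to a major version number.

```shell
#!/bin/sh
# Print the version string reported by a java binary, e.g. "1.8.0_292".
java_version_of() {
  "$1" -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Reduce a version string to its major version: "1.8.0_292" -> 8 (legacy
# "1.x" scheme), "11.0.12" -> 11, "17" -> 17.
java_major() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;
  esac
  echo "${v%%[._-]*}"
}
```

For example, compare `java_major "$(java_version_of "$JAVA_HOME/bin/java")"` with the same call on the java binary path that `ps -ef | grep java` shows for the Tomcat process; the two numbers should match.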

MySQL invalid Timestamp error

With certain versions of MySQL you might get errors of this nature:

SQLException occurred when processing request: [GET] /apollo/annotator/getAppState Value ‘0000-00-00 00:00:00’ can not be represented as java.sql.Timestamp. Stacktrace follows: java.sql.SQLException: Value ‘0000-00-00 00:00:00’ can not be represented as java.sql.Timestamp

The fix is to add zeroDateTimeBehavior=convertToNull to the JDBC connection URL (originally identified here). Here is an example URL:

jdbc:mysql://localhost/apollo_production?zeroDateTimeBehavior=convertToNull&autoReconnect=true&characterEncoding=UTF-8&characterSetResults=UTF-8
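When the existing URL already carries other parameters, the option must be joined with & rather than ?. A minimal, hypothetical helper (not part of Apollo) that adds the option only when it is missing:

```shell
#!/bin/sh
# Hypothetical helper (not part of Apollo): add zeroDateTimeBehavior to a
# MySQL JDBC URL unless it is already present, using "?" for the first
# parameter and "&" for any later one.
add_zero_date_param() {
  url=$1
  case $url in
    *zeroDateTimeBehavior=*) printf '%s\n' "$url" ;;
    *\?*) printf '%s\n' "${url}&zeroDateTimeBehavior=convertToNull" ;;
    *)    printf '%s\n' "${url}?zeroDateTimeBehavior=convertToNull" ;;
  esac
}
```

The result goes into the url setting of the dataSource block in apollo-config.groovy.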

Example Build Script on Unix with MySQL

This is an example build script. It may NOT be appropriate for your environment but does demonstrate what a typical build process might look like on a Unix system using MySQL.

Please consult our Setup and Configuration documentation for additional information.

# Install prereqs
apt-get install tomcat8 git ant openjdk-8-jdk nodejs npm
npm install -g yarn
# Increase Tomcat's memory per the Apollo developers' instructions:
echo 'export CATALINA_OPTS="-Xms512m -Xmx1g \
              -XX:+CMSClassUnloadingEnabled \
              -XX:+CMSPermGenSweepingEnabled \
              -XX:+UseConcMarkSweepGC"' >> /etc/default/tomcat8

# Download and extract the release tarball
mkdir -p ~/src && cd ~/src
wget https://github.com/GMOD/Apollo/archive/2.5.0.tar.gz
mv 2.5.0.tar.gz Apollo-2.5.0.tar.gz
tar xf Apollo-2.5.0.tar.gz

Set up the Apollo MySQL user and database

# Log in to MySQL, e.g.:
mysql -u root
# Create a user and a database, and grant the user access to it
CREATE USER 'apollo'@'localhost' IDENTIFIED BY 'THE_PASSWORD';
CREATE DATABASE `apollo-production`;
GRANT ALL PRIVILEGES ON `apollo-production`.* To 'apollo'@'localhost';

Configure Apollo for MySQL.

cd ~/src/Apollo-2.5.0
# Let's store the config file outside of the source tree.
mkdir ~/apollo.config
# Copy the template (then edit it to set your MySQL user name, password and database)
cp sample-mysql-apollo-config.groovy ~/apollo.config/apollo-config.groovy
ln -s ~/apollo.config/apollo-config.groovy
# For now, turn off tomcat8 so that we can see if the locally-run version works
service tomcat8 stop
# Run the local version, which verifies the install requirements and performs a number of setup steps
./apollo run-local

If there is a previously installed instance, remove it:

rm -rf /var/lib/tomcat8/webapps/apollo
rm -f /var/lib/tomcat8/webapps/apollo.war

Start Tomcat again:

service tomcat8 start

Create the file target/apollo-2.5.0.war by running:

./apollo deploy

and copy it into the Tomcat webapps directory, where it will be deployed automatically:

sudo cp target/apollo-2.5.0.war /var/lib/tomcat8/webapps/apollo.war

Prepare JBrowse data

Add the FASTA assembly:

~/src/Apollo-2.5.0/jbrowse/bin/prepare-refseqs.pl \
--fasta /research/dre/assembly/assembly1.fasta.gz \
--out ~/organisms/dre

Add annotations:

~/src/Apollo-2.5.0/jbrowse/bin/flatfile-to-json.pl \
--gff /research/dre/annotation/FINAL_annotations/ssc_v4.gff \
--type mRNA --trackLabel Annotations --out ~/organisms/dre

In the Apollo interface, point to the directory ~/organisms/dre when adding the organism.

Adding OpenID Connect Authentication to Apollo

Overview

OpenID Connect (https://openid.net/connect/), or OIDC, is an authentication layer that uses the OAuth 2 protocol.

It allows you to devolve authentication to an Identity Provider. It is the Identity Provider who registers users, provides the infrastructure (back end database, log-in page, “forgotten your password?” functions, etc.), and bears Data Protection responsibility – though of course it means that they ultimately have control over who can access your Apollo instance.

It is most likely to be useful if you have some relation to an Identity Provider that represents your organization or user community, or if you intend to provide public access and only require an arbitrary identifier for an end user (to ensure it is the same individual each time they start an Apollo session) without any need to know their real-world identity.

Architecture

The method described here uses the following components:

  1. An Apache httpd 2.4 web server (https://httpd.apache.org/)
  2. The mod_auth_openidc Apache module (https://www.mod-auth-openidc.org/)
  3. The RemoteUserAuthenticatorService class provided by Apollo

The Apache httpd is deployed to provide a reverse proxy as the sole point of access to Apollo for end users. Its primary role is to allow the use of mod_auth_openidc to add OIDC access control in front of Apollo, but of course it is also an efficient way to serve static content (e.g. user guides). It also makes it very easy to place multiple services or sources of content behind the same access control layer. Documentation for setting up a reverse proxy is available at https://httpd.apache.org/docs/2.4/howto/reverse_proxy.html.

mod_auth_openidc is the module that adds OIDC authentication to Apache. The usage described here is only the simplest case, but this module offers a lot of functionality, including the option of letting end users choose between multiple Identity Providers. The module will intercept requests for protected resources, and redirect the end user to the Identity Provider so they can log in; after a successful log in, the end user is redirected back to the httpd, which serves the protected content. OIDC data are made available in the Apache environment, so that applications run by the httpd (including the reverse proxy server) can access values such as the authenticated User Identifier.

The RemoteUserAuthenticatorService class in Apollo is part of the standard distribution. It is used to grant access to end users who present the REMOTE_USER HTTP header. The Apache reverse proxy can be configured to pass the User Identifier, retrieved during OIDC authentication, as a REMOTE_USER header; thus users who have successfully authenticated via OIDC will be granted access to Apollo.

Securing Tomcat

Because RemoteUserAuthenticatorService gives access to any end user who sends a REMOTE_USER header (and any reasonably savvy user can add whatever HTTP headers they wish to any request sent by their browser), the Tomcat that serves Apollo must not be directly accessible to untrusted end users. For example, the Tomcat port could be made accessible only on localhost, or on a corporate private network.
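Before relying on this, it is worth checking where Tomcat is actually listening. The hypothetical check below (not part of Apollo) reads the output of `ss -ltn` and prints any listening socket on the Tomcat port that is bound to a non-loopback address. The column layout it parses is an assumption based on common ss versions.

```shell
#!/bin/sh
# Hypothetical check (not part of Apollo): read `ss -ltn` output on stdin and
# print every listening socket on the given port (default 8080) whose local
# address is NOT a loopback address. Assumes the usual ss layout, where the
# 4th column is "Local Address:Port".
exposed_listeners() {
  awk -v p=":${1:-8080}" '
    NR == 1 { next }                               # skip the header line
    {
      addr = $4
      if (length(addr) <= length(p)) next
      if (substr(addr, length(addr) - length(p) + 1) != p) next
      host = substr(addr, 1, length(addr) - length(p))
      if (host ~ /^(127\.|\[::1\]|\[::ffff:127\.)/) next   # loopback only
      print addr
    }'
}
```

Running `ss -ltn | exposed_listeners 8080` should print nothing if Tomcat only listens on loopback. This inspects bind addresses only; it does not account for firewalls or a private network, the other option mentioned above. Binding Tomcat to localhost is typically done with the address attribute of the Connector in server.xml.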

Authorization

The end result of the process described above is an end user who is authenticated, i.e. you know the request comes from someone who was able to log in to an account associated with the User identifier you received. The user is not authorized at this stage, i.e. they have no permission to access any non-public resources within Apollo.

You can configure Apollo to add all Remote Users automatically to a default user group, which will authorize them according to the access permissions granted to that group. If you are willing to grant full access immediately to all Remote Users authenticated by your Identity Provider, this is all that is required.

If you want to have finer control over end users’ access rights, you can use normal admin processes (user interface or API) to grant access – but be aware that end users’ accounts are only created in Apollo when they log in for the first time. That means you cannot grant them access until after they first log in – so their first session in Apollo will consist of nothing but a “you are not authorized to view any organisms” message. It is probably better to have a default group for Remote Users that provides limited (read only?) access, and a process for adding additional access.

Configuration

Apache 2.4 reverse proxy configuration

Apache can be configured to add the reverse proxy server independently from adding the OIDC access control (it is probably a good idea to add reverse proxying first as it will make any configuration problems easier to find). Reverse proxying on its own should be completely transparent to end users.

The correct Apache proxy configuration is described in the Apollo documentation at https://genomearchitect.readthedocs.io/en/latest/Configure.html#apache-proxy

This should result in these four proxying modules being enabled in your httpd conf file(s), with directives like this:

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so

If you wish, you should be able to edit the Apache configuration manually to enable these modules, rather than use a2enmod. The directives above should already be in the distributed httpd.conf file, commented out.

It is a good idea to use a VirtualHost directive to control which requests are proxied. For instance, to proxy all requests on port 80 to a Tomcat running on the same machine on port 8080:

<Proxy *>
   Require all granted
</Proxy>

<VirtualHost *:80>

   ServerAdmin       <your admin email>
   ServerName        <your apollo host>
   ProxyPreserveHost On
   ProxyRequests     Off

   ProxyPass         /stomp/info    http://localhost:8080/stomp/info
   ProxyPassReverse  /stomp/info    http://localhost:8080/stomp/info

   ProxyPass         /stomp         ws://localhost:8080/stomp
   ProxyPassReverse  /stomp         ws://localhost:8080/stomp

   ProxyPass         /              http://localhost:8080/
   ProxyPassReverse  /              http://localhost:8080/

</VirtualHost>

(Substitute your admin/support email address and the host name where indicated.)

You can tweak the VirtualHost settings so that requests are proxied according to criteria such as port, host name, etc. (see the Apache 2.4 documentation for details). This can be useful to give you direct access to Apollo, bypassing the OIDC log-in, say for admin access or local user accounts (see below).

Dockerfile

This Dockerfile will provide an httpd container (including the install of mod_auth_openidc, used in the next section). Put the conf file(s) you wish the httpd to use in apache2-config (a subdirectory of the docker build directory).

FROM  httpd:2.4

ENV   DEBIAN_FRONTEND noninteractive

RUN   apt-get -qq update && \
      apt-get install --yes ca-certificates libapache2-mod-auth-openidc

COPY  apache2-config/  /usr/local/apache2/conf/

If you are using docker, the Apache reverse proxy configuration will need to refer to the host running the Apollo tomcat server (if you were to use localhost in the proxy configuration, it would refer to the docker container in which the httpd is running, not to the host machine on which you are running the container). It is good practice in docker for the tomcat to run in a separate container from the httpd. On a docker network, you use container names as host names; so if the tomcat container was named apollo-tomcat, then you would use http://apollo-tomcat:8080/ (etc.) in the proxy configuration directives.

mod_auth_openidc configuration

Before starting this part of the configuration, you will need to register with an OIDC provider. If you do not have one already, the developer tools provided by ORCID (https://orcid.org/developer-tools) allow a quick and easy set up for personal development use.

OIDC can be enabled with this addition to the httpd configuration:

LoadModule auth_openidc_module /usr/lib/apache2/modules/mod_auth_openidc.so

<Location />
   AuthType openid-connect
   Require  valid-user
</Location>
<Location /public>
   AuthType None
   Require  all granted
</Location>

OIDCPassClaimsAs        environment

OIDCProviderMetadataURL <URL provided by your identity provider>
OIDCClientID            <ID issued to you by your identity provider>
OIDCClientSecret        <secret issued to you by your identity provider>

OIDCScope               "openid email profile"
# vanity URL: must point to a protected path, but NOT to any actual content
OIDCRedirectURI         http://<your host name>/apollo/annotator/openid
OIDCCryptoPassphrase    <generate your own random string for this>

Your Identity Provider will give you the values for OIDCProviderMetadataURL (conventionally at /.well-known/openid-configuration), OIDCClientID and OIDCClientSecret when you register your client. Refer to their documentation to find supported scopes to include in OIDCScope; openid is required for authentication, but email and profile (requests for email address and profile information) will probably be supported as well if you need them.
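The conventional metadata location is simply the provider’s issuer URL followed by /.well-known/openid-configuration. A small hypothetical helper (not part of Apollo or mod_auth_openidc) that builds it, tolerating a trailing slash on the issuer:

```shell
#!/bin/sh
# Hypothetical helper: build the conventional OIDC discovery URL from an
# issuer URL, e.g. https://orcid.org -> https://orcid.org/.well-known/openid-configuration
oidc_metadata_url() {
  printf '%s/.well-known/openid-configuration\n' "${1%/}"
}
```

Fetching that URL (e.g. `curl -s "$(oidc_metadata_url https://orcid.org)"`) returns the provider’s metadata as JSON, which normally includes the scopes it supports; treat your provider’s own documentation as authoritative if the two differ.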

When you register, you will also need to provide the URL to which the Identity Provider should redirect end users after authentication. Add this to your Apache configuration as OIDCRedirectURI. This URL is used internally by mod_auth_openidc and it should not be a “real” URL that references actual content – but must be something covered by the OIDC access control (see below).

OIDCCryptoPassphrase is used internally; just create a random string.
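One way to create that random string is from /dev/urandom. A minimal sketch, assuming a Unix-like host with od and tr available:

```shell
#!/bin/sh
# Generate 32 random bytes from /dev/urandom and print them as 64 hex
# characters, suitable for OIDCCryptoPassphrase (hex avoids characters that
# would need quoting in the Apache configuration).
gen_passphrase() {
  od -A n -t x1 -N 32 /dev/urandom | tr -d ' \n'
  echo
}
```

If OpenSSL is installed, `openssl rand -hex 32` produces an equivalent string.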

Combining OIDC with the proxy

Once OIDC has been enabled as described above, access control is just standard Apache 2.4 configuration, with openid-connect as the AuthType. This is commonly done using Location directive(s) to define the path(s) of content to which access control is applied, but Apache provides many flexible methods; e.g. for complicated access control rules, regular expression matching directives (like LocationMatch) are worth a look.

The final bit of Apache configuration required is a RequestHeader directive. This passes the OIDC User Identifier (which mod_auth_openidc makes available as an Apache environment variable) downstream in proxied requests as a REMOTE_USER HTTP header.

The example below extends the reverse proxy example (above), adding two Location directives that place all content served by this Virtual Host behind the OIDC access control, except content in /public which remains freely accessible; and adding a RequestHeader directive to send a REMOTE_USER header downstream to Apollo:

<Proxy *>
   Require all granted
</Proxy>

<VirtualHost *:80>

   ServerAdmin       <your admin email>
   ServerName        <your apollo host>
   ProxyPreserveHost On
   ProxyRequests     Off

   # OIDC log in will be required for everything...
   <Location />
      AuthType openid-connect
      Require  valid-user
   </Location>

   # ...except for public access content here
   <Location /public>
      AuthType None
      Require  all granted
   </Location>

   RequestHeader     set   Remote_User    "expr=%{REMOTE_USER}"

   ProxyPass         /stomp/info    http://localhost:8080/stomp/info
   ProxyPassReverse  /stomp/info    http://localhost:8080/stomp/info

   ProxyPass         /stomp         ws://localhost:8080/stomp
   ProxyPassReverse  /stomp         ws://localhost:8080/stomp

   ProxyPass         /              http://localhost:8080/
   ProxyPassReverse  /              http://localhost:8080/

</VirtualHost>

Apollo configuration

When Apache is fully configured as described above, requests from all authenticated users will include the REMOTE_USER header. Apollo must be configured to use Remote User authentication, so that it grants access to all users who present this header.

Add the following to apollo-config.groovy:

apollo {
   authentications = [
        ["name":"Remote User Authenticator",
         "className":"remoteUserAuthenticatorService",
         "active":true,
         "params":["default_group": "remote_users"],
        ]
        ,
        ["name":"Username Password Authenticator",
         "className":"usernamePasswordAuthenticatorService",
         "active":true,
        ]
   ]
}

Note that params defines a default group. This is optional but recommended (see note above regarding authorization). The named group must have been created, with the user interface or API, and appropriate organism access rights defined. All Remote Users will be placed in this group when they first log in.
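The group can be created ahead of time through Apollo’s web services. The sketch below only builds the JSON request body. The endpoint path group/createGroup and the field names username, password and name are assumptions based on Apollo’s group web services; check them against the web services documentation for your version.

```shell
#!/bin/sh
# Hypothetical helper: escape a value for use inside a JSON string
# (backslashes and double quotes only; enough for typical names/passwords).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the body for a create-group request: admin user, admin password,
# and the new group name (e.g. "remote_users" from the config above).
create_group_payload() {
  printf '{"username":"%s","password":"%s","name":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

It could then be sent with something like `curl -s -X POST -H 'Content-Type: application/json' -d "$(create_group_payload admin@example.org 'admin-password' remote_users)" <apollo url>/group/createGroup`, after which organism permissions for the group still need to be set.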

Maintaining administrator access

Access to the administrator account, or to any other local user accounts you want to use, requires that OIDC authentication be bypassed, simply because OIDC authentication stops you from seeing the Apollo log-in dialog.

If you have direct access to Tomcat on (probably) port 8080 of the host machine, you can use that for direct access to Apollo. Because you have bypassed the OIDC authentication, your browser will not send a REMOTE_USER header, so you should see the Apollo local user log-in dialog.

If direct access to the Tomcat port is a problem, you can simply add another Virtual Host to the Apache configuration. This can provide access via a different host name or port. For example, to give access on port 8000:

Listen 8000
<VirtualHost *:8000>

   ServerAdmin       <your admin email>
   ServerName        <your apollo host>
   ProxyPreserveHost On
   ProxyRequests     Off

   <Location />
      AuthType None
      Require  all granted
   </Location>

   ProxyPass         /stomp/info    http://localhost:8080/stomp/info
   ProxyPassReverse  /stomp/info    http://localhost:8080/stomp/info

   ProxyPass         /stomp         ws://localhost:8080/stomp
   ProxyPassReverse  /stomp         ws://localhost:8080/stomp

   ProxyPass         /              http://localhost:8080/
   ProxyPassReverse  /              http://localhost:8080/

</VirtualHost>

To keep Apollo secure (REMOTE_USER headers are easily spoofed), if this port is openly accessible then some access control should be added, e.g. with Require ip <your.ip.address.here>.

Migration guide

This guide explains how to prepare your Apollo 2.x instance, and how to migrate data from previous Web Apollo versions into 2.0.

In all cases you will need to follow the guide for setting up your 2.x instance.

Migration from Evaluation to Production:

If you ran your evaluation/development version using ./apollo run-local, then when you set up your production instance any prior annotations will be in a separate database.

If you are using the same production instance you can use scripts to delete all annotations and preferences:

scripts/delete_all_features.sh

or just the annotations:

scripts/delete_only_features.sh

If you want to start from scratch (including reloading organisms and users), you can just drop the database (when the server is not running) and the proper tables will be recreated on startup.

Migration from 2.0.X to 2.0.Y on production:

Installation from a downloaded release

  • Download the desired Apollo release from the files listed at the bottom of its release page. Official releases will be tagged as “Release” and have a green label.
  • Expand the archive.
  • Copy your existing apollo-config.groovy file into the directory.
  • Always backup your database!
  • Create a new war file as below: ./apollo deploy.
  • Turn off tomcat and remove the old apollo directory and .war file in the webapps folder.
  • Copy in new .war file with the same name.
  • Restart tomcat and you are ready to go.
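The steps above can be collected into a script. The following is a hypothetical sketch, not an Apollo-provided tool. It assumes a PostgreSQL database named apollo-production, Tomcat managed as the tomcat8 service, and webapps under /var/lib/tomcat8/webapps; adjust all three, and the backup command for MySQL or H2, to your installation. With DRY_RUN=1 it only prints the commands.

```shell
#!/bin/sh
# Hypothetical upgrade sketch (not an Apollo-provided script).
# Assumptions: PostgreSQL database "apollo-production", Tomcat managed as the
# "tomcat8" service, webapps under /var/lib/tomcat8/webapps.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_apollo() {
  new_src=$1                                # expanded new release directory
  old_src=$2                                # directory holding your current apollo-config.groovy
  webapps=${3:-/var/lib/tomcat8/webapps}
  backup="apollo-backup-$(date +%Y%m%d%H%M%S).sql"

  run cp "$old_src/apollo-config.groovy" "$new_src/" || return 1
  run pg_dump -f "$backup" apollo-production || return 1   # always back up first
  (cd "$new_src" && run ./apollo deploy) || return 1
  run service tomcat8 stop || return 1
  run rm -rf "$webapps/apollo" "$webapps/apollo.war" || return 1
  run cp "$new_src"/target/apollo-*.war "$webapps/apollo.war" || return 1
  run service tomcat8 start
}
```

For example, `DRY_RUN=1 upgrade_apollo ~/src/Apollo-2.0.Y ~/src/Apollo-2.0.X` prints the seven commands in order, which is a cheap way to review them before running the script for real (as a user allowed to stop Tomcat and write to webapps).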

Note that if you choose to run two different versions of Apollo, they need to point to different database instances or you will experience problems.

Installation from a GitHub checkout

If you want bleeding-edge and only moderately tested code (not recommended unless you feel you know what you’re doing), you can clone Apollo directly from our source page https://github.com/GMOD/Apollo/

Upgrading can be taken care of with a git pull. Please note that, because we sometimes change the version of JBrowse, you should run:

./apollo clean-all

before building a target for production.

You can then follow the directions for deploying a downloaded release, above.

Migration from 1.0 to 2.0:

We provide examples in the form of migration scripts (https://github.com/gmod/apollo/tree/master/docs/web_services/examples) in the docs/web_services/examples directory. These tools are also described in the command line tools section.

We have written many of the command line tool examples using the Groovy language, but almost any language will work (Perl, shell/curl, Python, etc.).

Migrate Annotations

We provide a migration script (https://github.com/gmod/apollo/tree/master/docs/web_services/examples/groovy/migrate_annotations1to2.groovy) that connects to a single Web Apollo 1 instance and populates the annotations for an organism for a set of sequences (confusingly, also called tracks). It is best to develop your script against a development instance of Apollo 2, using a restricted set of sequences.

To get the scripts working properly, you’ll need to provide the list of sequences (or tracks) to migrate for each organism. You can get the list of tracks either from the database (select * from tracks ;) or by looking in the Web Apollo annotations directory:

ls -1 /opt/apollo/annotations/ | grep Annotations | grep -v history | paste -s -d"," -

Migrate Users

You have to add users de novo using something like the add_users.groovy script. In this case you create a CSV file with the email, name, password, and role (‘user’ or ‘admin’) of each user. This file is passed to the add_users.groovy script, which adds the users.
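Before passing the CSV file to add_users.groovy, a quick sanity check can catch malformed rows. This is a hypothetical sketch: it assumes one user per line, no header row, the column order listed above (email, name, password, role), and no commas inside fields. Check the script itself for the exact format it expects.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Apollo) for a users CSV file.
# Assumes: one user per line, no header row, comma-separated fields in the
# order email,name,password,role, and no commas inside fields.
check_users_csv() {
  awk -F',' '
    { sub(/\r$/, "") }                    # tolerate Windows line endings
    NF == 0 { next }                      # ignore blank lines
    NF != 4 {
      printf "line %d: expected 4 fields, got %d\n", NR, NF
      bad = 1
      next
    }
    $4 != "user" && $4 != "admin" {
      printf "line %d: role must be user or admin, got %s\n", NR, $4
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

It prints one message per problem line and exits non-zero if any were found, so it can gate the import step in a shell script.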

From Web Apollo 1, you should be able to pull user names out of the database (select * from users ;), but there is not much overlap between users in Web Apollo 1.x and Apollo 2.x.

If you have only a few users, however, just adding them manually on the Users tab will likely be easier.

Add Organisms

If you only have a handful of organisms, adding them on the Organism tab is the easiest option.

The add_organism.groovy script (https://github.com/gmod/apollo/tree/master/docs/web_services/examples/groovy/add_organism.groovy) can help automate this process if you have a large number of migrations to handle.

Demo

Please log in to our Demo Server with demo@demo.com / demo to play around with our features. Annotations are routinely removed, so go wild.

Then, click on the “Ref Sequence” tab in the panel on the right to choose a Group to display. You can choose one of the Groups visible in the list, or you may type the name of the Group in the “Search” box (for example: Group16.4 or Group1.37; these groups have many gene models to get you started). You may now start annotating! Once you have displayed the first group, you may also switch to a different Group from the drop-down menu in the navigation bar (e.g. Group1.10 or Group1.33).

If you would like to also experience the “administrator user” please send us a request by email to the Apollo Developers Team.

The “Ref Sequence” selection panel on the right allows you to view all available reference sequences (e.g. scaffolds, groups, chromosomes, etc) and conduct bulk operations on those sequences (for example: exporting data).

You can choose a different organism from the drop-down menu on the upper left corner of the annotator panel. Please be aware you have access to the following organisms: Honeybee, Human-hg38, Yeast, and Volvox Fictious (the JBrowse demonstration sample organism). Choosing any of the other available options will show an error warning alerting you that you do not have sufficient permissions to perform the operation. Should you encounter this error, simply return to one of the organisms listed above.

If you are new to Apollo, we recommend that you read through our User Guide to learn more about the software and its functionality.

Please Note: We have not tested the current version of Apollo on Internet Explorer. If you are not able to use Apollo on IE, you will need to use a different browser such as Firefox or Google Chrome (both available free of cost).

Happy Annotating!

User’s Guide

This guide allows users to:

  • Become familiar with the environment of the Apollo annotation tool.
  • Understand Apollo’s functionality for the process of manual annotation.
  • Learn to corroborate and modify computationally predicted gene models using all available gene predictions and biological evidence available in Apollo.
  • Navigate through this user guide using the ‘Table of Contents’ at the bottom of this page.

General Information

General Process of Manual Annotation

The major steps of manual annotation using Apollo can be summarized as follows:

  • Locate a chromosomal region of interest.
  • Determine whether a feature in an existing evidence track provides a reasonable gene model to start annotating.
  • Drag the selected feature to the ‘User Annotation’ area, creating an initial gene model.
  • Use editing functions to edit the gene model if necessary.
  • Check your edited gene model for consistency with existing homologs by exporting the FASTA formatted sequence and searching a protein sequence database, such as UniProt or the NCBI Non Redundant (NR) database, and by conducting preliminary functional assignments using the Gene Ontology (GO) database.

When annotating gene models using Apollo, remember that you are looking at a ‘frozen’ version of the genome assembly. This means that you will not be able to modify the assembled genome sequence itself, but you will be able to instruct Apollo to take into account modifications to the reference sequence and calculate their consequences. For instance, for any given protein coding gene, Apollo is able to predict the consequences that deleting a string of nucleotide residues will have on the coding sequence.

Annotation

Apollo allows annotators to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. Using Apollo, annotators may corroborate or modify the structures of coding genes, pseudogenes, repeat regions, transposable elements, and non-coding RNAs (i.e., snRNA, snoRNA, rRNA, tRNA, and miRNA).

Annotating a gene

Below are details about both biological principles and technical aspects to consider when editing a gene prediction.

Select the scaffold, chromosome or linkage group where you wish to conduct your annotations.

Search for a specific sequence

If you do not know the scaffold ID and have the sequence of a transcript or protein homolog related to your gene of interest, you might use the ‘Search Sequence’ feature to run a BLAT (BLAST-Like Alignment Tool) search. Querying the assembled genome using BLAT will determine the existence of a gene model prediction that is putatively homologous to your gene of interest. Click the ‘Tools’ item on the Apollo menu bar, and select ‘Sequence Search’ from the drop-down choices. Choose to run a Protein or Nucleotide BLAT search from the drop down menu as appropriate, and paste the string of residues to be used as query. Check the box labeled ‘Search all genomic sequences’ to search the entire genome.

The existence of paralogs may cause your query to match more than one scaffold or genomic range. Select the desired genomic range to be displayed in the Apollo Main Window. The result of your query will be displayed in the browser window behind the search box, highlighted in yellow. Close the window when you are satisfied with your results. You may read more about ‘Highlights’ below.

  • A word on Blat: Blat of DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more, and it may miss more divergent or shorter sequence alignments. On protein, Blat finds sequences of 80% and greater similarity to the query of length 20+ amino acids. Higher speed at the price of lesser homology depth makes Blat a commonly used tool to look up the location of a sequence in the genome or determine the exon structure of an mRNA. Learn more about Blat here.

Initiating an annotation

If you have not already performed a Blat search to identify your gene of interest, you may do so at this point using the ‘Sequence search’ feature from the ‘Tools’ tab on the menu bar. You may also navigate along the scaffold using the navigation arrows. Your gene of interest may appear on the forward (sense) or reverse (anti-sense) strand. Gene predictions are labeled with identifiers, and users may retrieve additional information by selecting the entire model and using the right-click menu to select the ‘View details’ item.

After locating your gene of interest, display as many gene prediction and evidence tracks as you consider necessary to inform your annotation by ticking them in the list of available ‘Tracks’ in the ‘Annotator Panel’. Scroll through the different tracks of gene predictions and choose the one that you consider most closely reflects the actual structure of the gene. You can also filter the tracks displayed in this list by typing in the ‘Search’ box. You may base your decision on prior knowledge of the reliability of each gene prediction track (e.g., select an evidence-based gene model instead of an ab initio gene prediction). Alternatively, you may compare the gene prediction tracks to a BLAST alignment or other aligned data (e.g., alignments of protein homologs, cDNAs, and RNAseq reads). Double-click on any exon or click on one of the introns of your preferred gene model to select the entire gene model. You may also choose exons from two or more separate tracks of evidence. Drag the selected model, or all pieces of evidence, into the ‘User-created Annotations’ area.

At this point you may download the protein sequence (see ‘Get Sequences’ below) to query a protein database and help you determine if the selected gene model is, biologically speaking, an accurate approximation to the gene. For example, you may perform a protein sequence search of UniProt or NCBI’s non-redundant peptide database (nr) using BLAST. If you have knowledge of protein domains in your gene of interest, you may perform a protein domain search of the InterPro databases to verify that your selected gene model contains the expected domains. If further investigation suggests that you have not selected the best gene model to start annotating, delete it by highlighting it (as described above) and using the ‘Delete’ function from the right-click menu.

Once a gene model is selected as the best starting point for annotation, the annotator must decide whether it needs further modification. Protein or domain database searches may have already informed this decision. Scroll down the evidence tracks to see if splice sites in transcript alignments agree with the selected gene model, or if evidence suggests addition or modification of an exon is necessary. Transcript alignments (e.g. cDNA/EST/RNASeq tracks) that are significantly longer than the gene model may indicate the presence of additional coding sequence or untranslated regions (UTRs). Keep in mind that transcript alignments may be shorter than the gene model due to the fragmented nature of current transcript sequencing technologies. Similarly, protein alignments may not reflect the entire length of the coding region because divergent regions may not align well, resulting in a short protein alignment or one with gaps. Protein and transcript alignments in regions with tandem, closely related genes might also be problematic, with partial alignments to one gene, then skipping over to align the rest to a second gene.

Simple Cases

In this guide, a ‘simple case’ is one in which the predicted gene model is correct or nearly correct and is supported by evidence that mostly or completely agrees with the prediction. Aligned evidence (experimental data) that extends beyond the predicted model is assumed to be non-coding sequence. The following sections describe simple modifications.

Add UTRs

Gene predictions may or may not include UTRs. If transcript alignment data are available and extend beyond your original annotation, you may add or extend UTRs. To do this, use the edge-matching options ‘Set as 5’ end’, ‘Set as 3’ end’, or ‘Set as both ends’ from the right-click menu: select the exon that needs to be extended, hold down the ‘Shift’ key as you select the exon in the evidence track that displays the expected UTR, then choose the appropriate option from the right-click menu to extend the exon to the desired UTR.
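Assuming these options follow the usual strand convention, the 5’ end of a forward-strand exon is its lower coordinate, while the 5’ end of a reverse-strand exon is its higher one. The Python sketch below illustrates that logic only; the `Exon` record (1-based, inclusive coordinates) and the function names are hypothetical and are not Apollo's internal code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Exon:
    start: int   # 1-based, inclusive genomic coordinate (lower bound)
    end: int     # 1-based, inclusive genomic coordinate (upper bound)
    strand: str  # "+" or "-"


def set_as_5prime_end(exon: Exon, evidence: Exon) -> Exon:
    """Copy the evidence feature's 5' boundary onto the exon."""
    if exon.strand == "+":
        # Forward strand: the 5' end is the lower coordinate.
        return replace(exon, start=evidence.start)
    # Reverse strand: the 5' end is the higher coordinate.
    return replace(exon, end=evidence.end)


def set_as_3prime_end(exon: Exon, evidence: Exon) -> Exon:
    """Copy the evidence feature's 3' boundary onto the exon."""
    if exon.strand == "+":
        return replace(exon, end=evidence.end)
    return replace(exon, start=evidence.start)


def set_as_both_ends(exon: Exon, evidence: Exon) -> Exon:
    """Copy both boundaries of the evidence feature onto the exon."""
    return replace(exon, start=evidence.start, end=evidence.end)


# Example: an EST extends 150 bp upstream of the first exon of a forward-strand model.
first_exon = Exon(1200, 1350, "+")
est = Exon(1050, 1350, "+")
print(set_as_5prime_end(first_exon, est))  # Exon(start=1050, end=1350, strand='+')
```

Because the exon record is immutable, each option returns a new exon rather than modifying the selected one in place.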

Alternatively, this operation can be performed manually: position the cursor at the edge of the exon that needs to be extended, right-click to display the menu, and choose the ‘Zoom to base level’ option. Place the cursor over the edge of the exon (5’ or 3’ end exon as needed) until it becomes a black arrow (see Fig. 2), then click and drag the edge of the exon to the new coordinate position that includes the UTR. To add a new, spliced UTR to an existing annotation, follow the procedure for adding an exon, as detailed in the section ‘Add an Exon’ below.

Exon Structure Integrity

Zoom in sufficiently to clearly resolve each exon as a distinct rectangle. When two exons from different tracks share the same start and/or end coordinates, a red bar appears at the edge of the exon. Visualize this edge-matching function by selecting either the whole annotation or one exon at a time. By scrolling along the length of the annotation, exon boundaries may be verified against the available EST data. Check whether there are any ESTs, transcript data contigs, or RNA-Seq reads showing evidence that the annotation is missing one or more exons or includes exons that are not supported.
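Edge-matching amounts to a coordinate comparison: an edge of a selected exon is matched when an evidence exon starts or ends at exactly the same position. A minimal Python sketch of that comparison, assuming exons are plain `(start, end)` tuples and evidence tracks are supplied as a dictionary keyed by hypothetical track names; it does not reproduce how Apollo draws the red bars.

```python
def edge_matches(selected_exons, evidence):
    """For each edge of the selected exons, list the evidence tracks sharing that coordinate.

    selected_exons: list of (start, end) tuples for the selected annotation.
    evidence: dict mapping a track name to that track's list of (start, end) exons.
    """
    matches = {}
    for start, end in selected_exons:
        start_tracks, end_tracks = set(), set()
        for track, exons in evidence.items():
            for ev_start, ev_end in exons:
                if ev_start == start:
                    start_tracks.add(track)
                if ev_end == end:
                    end_tracks.add(track)
        matches[(start, "start")] = sorted(start_tracks)
        matches[(end, "end")] = sorted(end_tracks)
    return matches


selected = [(100, 200), (300, 450)]
evidence = {
    "ESTs": [(100, 200), (300, 480)],
    "RNA-Seq": [(90, 200), (300, 480)],
}
matches = edge_matches(selected, evidence)
# Edges without any matching evidence are the ones worth inspecting at base level.
unsupported = [edge for edge, tracks in matches.items() if not tracks]
print(unsupported)  # [(450, 'end')]
```

In this example both evidence tracks end at 480, which would suggest extending the second exon rather than trusting its predicted end.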

You may use square bracket keys [ and ] to jump to the next exon splice junction or coding sequence (CDS). The curly bracket keys { and } allow users to jump to the next transcript.

To correct an exon boundary to match data in the evidence tracks, use the edge-matching options from the right-click menu as described in the ‘Add UTRs’ section above. Alternatively you may ‘Zoom to base level’, click on the exon to select it and place the cursor over the edge of the exon; when the cursor changes to an arrow, drag the edge of the exon to the desired new coordinates.

In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some support one or more alternative transcripts. Try to annotate as many alternative transcripts as the evidence supports.

Figure 2. Apollo view, zoomed to base level.

The DNA track and annotation track are visible. The DNA track includes the sense strand (top) and anti-sense strand (bottom). The six reading frames flank the DNA track, with the three forward frames above and the three reverse frames below. The ‘User-created Annotations’ track shows the terminal end of an annotation. The green rectangle highlights the location of the nucleotide residues in the ‘Stop’ signal.
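The six frames can be reproduced by translating the forward sequence from offsets 0, 1 and 2, and doing the same for its reverse complement. The following Python sketch assumes the standard genetic code (NCBI translation table 1); the frame labels +1 to +3 and -1 to -3 are our own convention, and Apollo's display may order or number them differently.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an ambiguous codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])  # forward frames
        frames[f"-{offset + 1}"] = translate(rc[offset:])   # reverse frames
    return frames


print(six_frames("ATGGCCTAA"))
```

Trailing bases that do not form a complete codon are simply dropped, which is why frames shifted by one or two bases come out shorter.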

Splice Sites

In most eukaryotes, the majority of splice sites at exon/intron boundaries appear as 5’-…exon]GT/AG[exon…-3’. All other splice sites are here called ‘non-canonical’ and are indicated in Apollo by an orange circle with a white exclamation point inside, placed over the edge of the offending exon. When alternative transcripts are added, be sure to inspect each splice site for any changes introduced by the edit.

If a non-canonical splice site is present, zoom to base level to review it. Not all non-canonical splice sites need to be corrected; those left in place should be flagged with the appropriate comment. (Adding a ‘Comment’ is addressed in the section that details the ‘Information Editor’.)

Prior knowledge about the organism of interest may help the user decide whether a predicted non-canonical splice site is likely to be real. For instance, GC splice donors have been observed in many organisms, but less frequently than the GT splice donors described above. As mentioned above, Apollo flags GC splice donors as non-canonical. To further complicate the problem, splice sites that are non-canonical but found in nature, such as GC donors, may not be recognized by some gene prediction algorithms. In such cases, a gene prediction algorithm that does not recognize GC splice donors may have ignored a true GC donor and selected another non-canonical splice site that is less frequently observed in nature. Therefore, if a non-canonical splice site that is rarely observed in nature is present, you may wish to search the region for a more frequent in-frame non-canonical splice site, such as a GC donor. If there is a nearby in-frame site that is more likely to be the correct splice donor, make this adjustment while zoomed to base level.
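Checking a splice site comes down to reading the first two and last two bases of the intron on the transcript's strand. This Python sketch assumes 1-based, inclusive intron coordinates on the forward strand of a hypothetical `genome` string; the GC-AG case is singled out only to mirror the discussion above, not because Apollo reports it differently from other non-canonical sites.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def intron_dinucleotides(genome: str, intron_start: int, intron_end: int, strand: str = "+"):
    """Return the (donor, acceptor) dinucleotides of an intron.

    Coordinates are 1-based and inclusive on the forward strand of `genome`;
    reverse-strand introns are reverse-complemented before reading them.
    """
    intron = genome[intron_start - 1:intron_end].upper()
    if strand == "-":
        intron = reverse_complement(intron)
    return intron[:2], intron[-2:]


def classify_splice_site(donor: str, acceptor: str) -> str:
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    if (donor, acceptor) == ("GC", "AG"):
        # Still non-canonical, but GC donors occur naturally in many organisms.
        return "non-canonical (GC donor)"
    return "non-canonical"


genome = "AAAGTAAGTTTTTCAGCCC"
donor, acceptor = intron_dinucleotides(genome, 4, 16)
print(donor, acceptor, classify_splice_site(donor, acceptor))  # GT AG canonical
```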

To assist in the decision to modify a splice site, download the translated sequences and use them to search well-curated protein databases, such as UniProt, to see if you can resolve the question using protein alignments. Incorrect splice sites would likely cause gaps in the alignments. If there does not appear to be any way to resolve the non-canonical splice, leave it as is and add a comment.

‘Start’ and ‘Stop’ Sites

By default, Apollo will calculate the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons. To check the accuracy of the ‘Start’ and ‘Stop’ signals, you may use the translated sequence to query a known protein database, such as UniProt, to determine whether the ends of the protein sequence correspond with those of known proteins.
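A simplified illustration of a longest-ORF search: in each of the three frames of the spliced transcript, the first ATG after a stop (or after the start of the sequence) opens an ORF that runs to the next in-frame stop, and the longest such ORF is kept. This Python sketch works on a hypothetical cDNA string already in transcript orientation; unlike a full implementation, it ignores ORFs that lack a stop codon and non-ATG starts, and it is not Apollo's own code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str):
    """Return (start, end) as 0-based, half-open offsets of the longest ATG...stop ORF, or None."""
    cdna = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(cdna) - 2, 3):
            codon = cdna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous in-frame stop
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCCATGTAA"))  # (2, 14)
```

Re-running such a search after every edit is what lets the ‘Start’ and ‘Stop’ signals move as exons are added, extended, or removed.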

If it appears that Apollo did not calculate the correct ‘Start’ signal, the user can modify it. To set the ‘Start’ codon manually, position the cursor over the first nucleotide of the candidate ‘Start’ codon and select the ‘Set translation start’ option from the right-click menu. Depending on evidence from a protein database search or additional evidence tracks, you may wish to select an in-frame ‘Start’ codon further upstream or downstream. An upstream ‘Start’ codon may be present outside the predicted gene model, within a region supported by another evidence track; see the section below on how to ‘Add an exon’. When necessary, it is also possible to ‘Set translation end’ from the right-click menu.

Note that the ‘Start’ codon may also be located in a non-predicted exon further upstream. If you cannot identify that exon, add the appropriate comment (using the transcript comment section in the ‘Comments’ table of the ‘Information Editor’ as described below).

In rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG). Check whether a non-canonical ‘Start’ codon is usually present in homologs of this gene, and/or check whether this is a likely occurrence in this organism. If appropriate, you may override the predicted ‘Start’ by manually setting it to a non-canonical ‘Start’ codon, choosing the one that most closely reflects what you know about the protein, and has the best support from the biological evidence tracks. Add the appropriate comment (using the transcript comment section in the ‘Comments’ table of the ‘Information Editor’ as described below).

In some cases, a ‘Stop’ codon may not be automatically identified. Check whether there are data supporting a 3’ extension of the terminal exon, or additional 3’ exons with valid splice sites. See the section below on how to ‘Add an exon’. Each time you add an exon region, whether by extending an existing exon or adding a new one, Apollo recalculates the longest ORF to identify ‘Start’ and ‘Stop’ signals, allowing you to determine whether a ‘Stop’ codon has been incorporated after each editing step.

Predicted Protein Products

If any of your manipulations have thrown an exon out of frame, or caused other drastic changes to the translated sequence, Apollo will warn you by changing the display of the model in the ‘User-created Annotations’ area from a light-blue protein-coding stretch to a truncated model shown as a darker blue, narrower rectangle.

If the annotation looks good, obtain the protein sequence (see the ‘Get Sequences’ section below) and use it to search a protein database, such as UniProt or NCBI nr. Keep in mind that the best BLAST hit may be the exact prediction from which you initiated your annotation; you should not consider the identical protein from your organism as external evidence supporting the annotation. Instead, look at alignments to proteins from other organisms.

Additional Functionality

Figure 3. Additional functionality.

This is the right-click menu.

Get Sequences

Select one or more exons, or an entire gene model of interest, open the right-click menu and select the ‘Get sequence’ function. Choose from the options to obtain protein, cDNA, CDS or genomic sequences.

Merge Exons, Merge Transcripts

Select each of the exons (or transcripts) to be joined while holding down the ‘Shift’ key, open the right-click menu and select the ‘Merge’ option.

Add an Exon

You may select and drag the putative new exon from a track in the ‘Evidence’ panel and add it directly to an annotated transcript in the ‘User-created Annotations’ area. Click the exon and, holding down the mouse button, drag it until it hovers over the receiving transcript. The receiving transcript will be highlighted in dark green when it is okay to release the mouse button. When the mouse button is released, the additional exon becomes attached to the receiving transcript. If the receiving transcript is on the opposite strand from the one where you selected the new exon, a warning dialog box will ask you to confirm the change.

Apollo dynamically recalculates the longest ORF for each model, so you must check whether adding one or more exons disrupts the reading frame, inserts premature ‘Stop’ signals, etc.

Make an Intron, Split an Exon

Selecting the ‘Make intron’ option from the right-click menu over an exon identifies the nearest canonical splice sites (5’-…exon]GT/AG[exon…-3’) and modifies the model accordingly; Apollo also recalculates the longest ORF. If Apollo cannot find a set of canonical splice sites within the selected exon, a dialog box will appear with a warning.

If everything you know about the model indicates that an exon should not be preserved in its current form, you may manually disrupt the exon using the ‘Split’ option from the right-click menu, which creates a 1-nucleotide intron without taking into account whether or not the surrounding splice sites are canonical.

Delete an Exon

Select the exon using a single click (double click selects the entire model), and select the ‘Delete’ option from the right-click menu. Check whether deleting one or more exons disrupts the reading frame, inserts premature ‘Stop’ signals, etc.

Flip the Strand of Annotation

At times, transcript alignments may appear on the strand opposite to the model’s coding strand, particularly when the transcript alignment does not include a splice junction, which makes it difficult to determine the coding direction. If aligned evidence is used to initiate an annotation, and it is later determined that the annotation is on the incorrect strand, the user may choose the ‘Flip strand’ option from the right-click menu to reverse the orientation of the annotation. As mentioned before, annotators should always reassess the integrity of the translation after modifying an annotation.

Complex Cases

Merge Two Gene Predictions on the Same Scaffold

Evidence may support merging two (or more) different gene models. To begin the annotation, select all the gene models that you would like to merge, then drag them from the ‘Evidence’ panel onto the ‘User-created Annotations’ area. Be aware that protein alignments may not be a useful starting point because they may have incorrect splice sites and may lack non-conserved regions.

You may select the supporting evidence tracks and drag their ‘ghost’ over the candidate models (without releasing them) to corroborate the overlap. Additionally, zoom in and carefully review edge-matching (Figure 4) and coverage across models.

Alternatively, you may select and drag each proposed gene model separately onto the ‘User-created Annotations’ area. Once you are certain that two models should be merged, after checking boundaries and all supporting evidence, bring them together by holding the ‘Shift’ key and clicking on an intron from each of the merging gene models; in this way you will select both models completely. Then select the ‘Merge’ option from the right-click menu. Get the resulting translation sequence and inspect it by querying a protein database, such as UniProt. Be sure to record the IDs of all starting gene models in the ‘Comments’ table, and use the appropriate canned comment to indicate that this annotation is the result of a merge.

Figure 4. Edge-matching in Apollo.

When a feature is selected, the exon edges are marked with a red box. All other features that share the same exon boundaries are marked with a red line on the matching edge. This feature allows annotators to confirm that evidence is in agreement without examining each exon at the base level.

Merge Two Gene Predictions from Different Scaffolds

It is not yet possible to merge two annotations across scaffolds; however, annotators should document the fact that the data support a merge in the ‘Comments’ table for both components. For standardization purposes, please use the following two prepared (canned) comments, adding the names of both models in every case:

  • “RESULT OF: merging two or more gene models across scaffolds”
  • “RESULT OF: merging two or more gene models. Gene models involved in merge:”

Split a Gene Prediction

When different segments of a predicted protein align to two or more different families of protein homologs, and when the predicted protein does not align to any known protein over its entire length, one or more splits may be recommended. Transcript data may show evidence in support of a split; be sure to verify that it is not a case of alternative transcripts!

A split can be created in one of two ways:

  • Select the flanking exons and use the right-click menu option ‘Split’, or
  • Annotate each resulting fragment independently.

You should obtain the resulting translation, and check it by searching a protein database, such as UniProt. Be sure to record the original ID for both annotations in the ‘Comments’ section.

Frameshifts, Single-Base Errors, and Selenocysteine-containing Products

Apollo allows annotators to make single base modifications and frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. Note that these manipulations do NOT change the underlying genomic sequence. Changes are made on the DNA track with the right-click menu.

If you determine that you need to make one of these changes, zoom in to the nucleotide level, and right-click over the genomic sequence to access the menu with options for introducing sequence changes such as insertions, deletions or substitutions. The selected nucleotide must be the starting point for each modification.

  • The ‘Create Genomic Insertion’ option requires a string of nucleotide residues that will be inserted to the right of the cursor’s current coordinate.
  • The ‘Create Genomic Deletion’ option requires the length of the deletion, starting with the nucleotide where the cursor is positioned.
  • When using the ‘Create Genomic Substitution’ option, enter the string of nucleotide residues that will replace the ones on the DNA track.
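The three alterations can be modeled as edits applied to a copy of the genomic sequence, so the reference itself is never changed. In the Python sketch below, the `Alteration` record and its field names are hypothetical; coordinates are 1-based, an insertion goes immediately to the right of its position, and alterations are applied from right to left so that the coordinates of edits further left do not shift. Overlapping alterations are not handled, and this is not how Apollo stores them internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alteration:
    kind: str            # "insertion", "deletion" or "substitution" (hypothetical names)
    position: int        # 1-based coordinate of the selected nucleotide
    residues: str = ""   # bases to insert or substitute
    length: int = 0      # number of bases to delete


def apply_alterations(genome: str, alterations: list[Alteration]) -> str:
    """Return an altered copy of `genome`; the input string itself is never modified."""
    seq = genome
    # Right-most edits first, so coordinates of edits further left remain valid.
    for alt in sorted(alterations, key=lambda a: a.position, reverse=True):
        i = alt.position - 1  # 0-based index of the selected nucleotide
        if alt.kind == "insertion":
            # Insert to the right of the selected nucleotide.
            seq = seq[: i + 1] + alt.residues + seq[i + 1 :]
        elif alt.kind == "deletion":
            # Delete `length` bases, starting with the selected nucleotide.
            seq = seq[:i] + seq[i + alt.length :]
        elif alt.kind == "substitution":
            # Replace as many bases as were supplied, starting at the selected nucleotide.
            seq = seq[:i] + alt.residues + seq[i + len(alt.residues) :]
        else:
            raise ValueError(f"unknown alteration kind: {alt.kind}")
    return seq


reference = "ACGTACGT"
edits = [Alteration("insertion", 2, residues="TT"), Alteration("deletion", 5, length=2)]
print(apply_alterations(reference, edits))  # ACTTGTGT
```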

Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which can be obtained by selecting the ‘Get Sequence’ option from the right-click menu. Since these modifications are reflected in all annotations that include the modified region, you should use the ‘Comments’ section to alert the curators of your organism’s database to these edits.

It is also possible to annotate special cases, such as selenocysteine-containing proteins or read-through ‘Stop’ signals, by selecting the ‘Set readthrough stop codon’ option from the right-click menu. The current TGA ‘Stop’ codon will be highlighted in purple, and the next in-frame ‘Stop’ signal will be used as the end of translation. Note that Apollo will automatically add the remaining amino acids to the resulting sequence. Add a comment in the ‘Comments’ section for this transcript to document this modification.
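Conceptually, marking a readthrough stop tells the translation to emit selenocysteine (U) for that one in-frame TGA and continue to the next stop codon. A minimal Python sketch, assuming the standard genetic code and a CDS string that begins at the start codon; the function name and the use of 0-based codon offsets to mark readthrough positions are illustrative choices, not Apollo's API.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def translate_with_readthrough(cds: str, readthrough: set[int]) -> str:
    """Translate a CDS, reading in-frame TGA codons whose 0-based offsets are in
    `readthrough` as selenocysteine (U) instead of stopping there."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon == "TGA" and i in readthrough:
            protein.append("U")
            continue
        aa = CODON_TABLE.get(codon, "X")
        if aa == "*":
            break  # the next in-frame stop ends translation
        protein.append(aa)
    return "".join(protein)


cds = "ATGTGAGGCTAA"
print(translate_with_readthrough(cds, set()))  # M
print(translate_with_readthrough(cds, {3}))    # MUG
```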

Annotating Repeat Regions, Transposable Elements, and Non-coding (nc) RNAs

Apollo allows users to annotate a variety of ncRNAs and other regulatory elements.

If you don’t know the location of the feature you wish to annotate, perform a BLAT search to identify the sequence of interest using the ‘Sequence search’ feature from the ‘Tools’ tab on the menu bar (see also the section on how to ‘Search for a specific sequence’). You may also navigate along the scaffold using the navigation arrows. All non-coding elements are labeled with identifiers, and users may retrieve additional information by selecting the feature and choosing the ‘View details’ item from the right-click menu.

Once the genomic element and track of interest are located in the ‘Evidence’ panel, right-click over the desired feature and choose the ‘Create New Annotation’ option to start an annotation. After the user chooses an element type from the menu, the new annotation appears in the ‘User-created Annotations’ track. Note that the type of an annotation already present in the ‘User-created Annotations’ track cannot be changed.

Modifications such as editing boundaries, duplicating, and deleting the annotation, as well as the ‘History’, ‘Redo’ and ‘Undo’ functions, are possible for all non-coding features. Additional modifications such as ‘Split’ and ‘Make intron’ are also possible for ncRNAs.

All metadata about the annotation should be added using the ‘Information Editor’, as described below.

The Information Editor

Information about the ‘Name’, ‘Symbol’, and ‘Description’ of a gene, transcript, repeat region, transposable element, or non-coding RNA can be modified in the ‘Information Editor’. Using the ‘Status’ buttons, annotators can also inform the lead curators whether a manual annotation ‘Needs review’ or has already been ‘Approved’.

Users can also record information about their annotations in fields that capture:

  • ‘Comments’ on the process of annotation.
  • Cross-references to other databases in ‘DBXRefs’.
  • Additional ‘Attributes’ in a ‘tag/value’ format that pertain to the annotation.
  • References to any published data in the PubMed database using ‘Pubmed IDs’.
  • Gene Ontology (GO) annotations, which can be added by typing text or GO identifiers; the auto-complete function will retrieve the matching terms. A drop-down menu at the top of the ‘Information Editor’ allows users to switch between isoforms while editing these metadata.

All the information captured in these tables will be incorporated into the exported files of the ‘User-created Annotations’, and will appear in Column 9 of the GFF3 that is generated.
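Column 9 of a GFF3 file holds `tag=value` pairs separated by semicolons, with multiple values for one tag separated by commas, and the GFF3 specification requires characters such as `;`, `=`, `&` and `,` to be percent-encoded inside values. The Python sketch below builds such a column from a dictionary. `Note`, `Dbxref` and `Ontology_term` are reserved GFF3 tags, but which tag Apollo actually writes for each ‘Information Editor’ field is an assumption here, and the identifiers in the example are placeholders.

```python
# Characters with special meaning in GFF3 column 9, and their percent-encodings.
_RESERVED = {
    ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
    "\t": "%09", "\n": "%0A", "\r": "%0D", "%": "%25",
}


def encode_gff3_value(value: str) -> str:
    """Percent-encode reserved characters one at a time (so '%' is never double-encoded)."""
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def column9(attributes: dict) -> str:
    """Build a GFF3 attributes column from a dict of tag -> value or list of values."""
    parts = []
    for tag, values in attributes.items():
        if isinstance(values, str):
            values = [values]
        encoded = ",".join(encode_gff3_value(v) for v in values)
        parts.append(f"{encode_gff3_value(tag)}={encoded}")
    return ";".join(parts)


print(column9({
    "ID": "gene1",
    "Name": "example-gene",
    "Note": "merged from model A, model B",   # the comma inside a value is escaped
    "Dbxref": ["ExampleDB:12345"],             # placeholder cross-reference
    "Ontology_term": ["GO:0008150"],
}))
```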

Add Comments

When you are satisfied with your annotation, you may provide additional information in the form of ‘Comments’. For example, the ID of the gene prediction that you used to initiate the annotation presents useful information for your database curators. Functional information obtained from homologs may also be useful, e.g. homolog ID, description, gene name, gene symbol. You should also indicate the type of changes made to the annotation, and whether a gene is split across scaffolds, as described in previous sections.

For each annotated element first click to select it, then use the right-click option to select ‘Information Editor’ from the menu. In the case of coding genes, pseudogenes, and ncRNAs the ‘Information Editor’ window displays information for both the gene and the transcript; users should determine whether the comment is more appropriate for the gene (e.g. a change in the gene symbol) or an individual transcript (e.g. type of alterations made). In the case of repetitive elements and transposable elements, the ‘Information Editor’ window has only one column.

In the ‘Information Editor’ window, click on the respective ‘Add’ button to start a new comment; a new row, labeled ‘Enter new comment’, will appear. Clicking on this row reveals a drop-down menu on the right that lists canned comments, if any are available for your organism of interest. Alternatively, you can type a custom comment. To edit an existing comment, click on the comment and begin typing, or replace it with a different canned comment. Comments that are no longer relevant or useful may be removed using the ‘Delete’ button at the bottom of the box.

Add Database Cross-references, PubMed IDs, and GO IDs

When available, users should also include cross-references to other databases by adding the name of the database and the corresponding accession number for each gene or transcript to the ‘DBXRefs’ table. Any published information in support of this annotation (e.g. whether the gene has already been part of a publication) should be included by adding a ‘PubMed ID’ in the provided field, and available functional information should be added using GO IDs as appropriate. The process for adding information to these tables is the same as described for the ‘Comments’ table.

Add Attributes

Any additional information about the gene model or transcript that can be expressed as a ‘tag/value’ entry and provides further evidence in support of the manual annotation can be captured in the ‘Attributes’ table. The process for adding information to this table is the same as described for the ‘Comments’ table.

(No need for) Saving your Annotations

Apollo immediately saves your work, automatically recording it on the database. Because of this, your work will not be lost in the event of network disruptions, and no further actions are required in order to save your work.

Exporting Data

The user-created annotations may be exported as GFF3 and FASTA formatted files. These operations may be done for either a single scaffold, or to include user-created annotations from the entire assembled genome. See the section on ‘Ref Sequence Tab’ under ‘Annotator Panel’ to learn more about how to export data.

Data from each of the evidence and prediction tracks can also be exported. GFF3 formatted files of the visible region on the Apollo screen, as well as files containing data from the entire scaffold/chromosome can be exported. The data will be formatted according to the original data used to display each track. For instance, RNA-Seq reads could be exported either as GFF3 or BED file formats.
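When the same features are exported as GFF3 or BED, the main pitfall is the coordinate convention: GFF3 is 1-based with inclusive ends, whereas BED is 0-based with half-open intervals, so a feature spanning bases 1001 to 2000 becomes start 1000, end 2000 in BED. A minimal Python sketch of that conversion for a single GFF3 feature line, assuming the BED name should come from the `Name` attribute (falling back to `ID`); it is not the converter Apollo uses.

```python
from urllib.parse import unquote


def gff3_line_to_bed6(line: str) -> str:
    """Convert one GFF3 feature line to a BED6 line.

    GFF3 coordinates are 1-based and inclusive; BED coordinates are 0-based and
    half-open, so only the start coordinate shifts by one.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    seqid, _source, _type, start, end, score, strand, _phase, attributes = cols
    attrs = dict(pair.split("=", 1) for pair in attributes.split(";") if "=" in pair)
    name = unquote(attrs.get("Name", attrs.get("ID", ".")))
    # Strict BED scores are integers from 0 to 1000; anything else becomes 0 here.
    bed_score = score if score.isdigit() and int(score) <= 1000 else "0"
    bed_strand = strand if strand in ("+", "-") else "."
    return "\t".join([seqid, str(int(start) - 1), end, name, bed_score, bed_strand])
```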

Public Demo

The Apollo Demo uses the genome of the honey bee (Apis mellifera). Below are details about the experimental data provided as supporting evidence.

Evidence in support of protein coding gene models

Consensus Gene Sets:
  • Official Gene Set v3.2
  • Official Gene Set v1.0
Consensus Gene Sets comparison:
  • OGSv3.2 genes that merge OGSv1.0 genes
  • OGSv3.2 genes that merge RefSeq genes
  • OGSv3.2 genes that split OGSv1.0 genes
  • OGSv3.2 genes that split RefSeq genes
Protein Coding Gene Predictions Supported by Biological Evidence:
  • NCBI Gnomon
  • Fgenesh++ with RNASeq training data
  • Fgenesh++ without RNASeq training data
  • NCBI RefSeq Protein Coding Genes
  • NCBI RefSeq Low Quality Protein Coding Genes
Ab initio protein coding gene predictions:
  • Augustus Set 12
  • Augustus Set 9
  • Fgenesh
  • GeneID
  • N-SCAN
  • SGP2
Transcript Sequence Alignment:
  • NCBI ESTs
  • Apis cerana reads (RNA-Seq)
  • Forager Bee Brain Illumina Contigs
  • Nurse Bee Brain Illumina Contigs
  • Forager RNA-Seq reads
  • Nurse RNA-Seq reads
  • Abdomen 454 Contigs
  • Brain and Ovary 454 Contigs
  • Embryo 454 Contigs
  • Larvae 454 Contigs
  • Mixed Antennae 454 Contigs
  • Ovary 454 Contigs
  • Testes 454 Contigs
  • Forager RNA-Seq HeatMap
  • Forager RNA-Seq XY Plot
  • Nurse RNA-Seq HeatMap
  • Nurse RNA-Seq XY Plot
Protein homolog alignment:
  • Acep_OGSv1.2
  • Aech_OGSv3.8
  • Cflo_OGSv3.3
  • Dmel_r5.42
  • Hsal_OGSv3.3
  • Lhum_OGSv1.2
  • Nvit_OGSv1.2
  • Nvit_OGSv2.0
  • Pbar_OGSv1.2
  • Sinv_OGSv2.2.3
  • Znev_OGSv2.1
  • Metazoa_Swissprot

Evidence in support of non protein coding gene models

Non-protein coding gene predictions:
  • NCBI RefSeq Noncoding RNA
  • NCBI RefSeq miRNA
Pseudogene predictions:
  • NCBI RefSeq Pseudogene

Additional Information About Apollo

Apollo is an open-source project and is under active development. If you have any questions, you may contact the Apollo development team or join the conversation on the Apollo mailing list by filling out this form. We provide additional documentation for installation and setup. Our demo page provides information on connecting to our demonstration site. Apollo is a member of the GMOD project.

Permissions guide

Global

  • admin: access to everything
  • user: only guarantees a login, with permissions configured on a per-organism basis

Organism

Organism-level permissions apply only to data related to that organism.

  • read: view / search only, no annotation
    Annotations: lock detail / coding
    RefSeq: hide export
    Organism: hide
    User: hide 
    Group: hide 
    Preferences: hide 
    JBrowse: disable UcA track 
  • export: same as read, but can use the export screen
    RefSeq: show export 
  • write: same as above, but can add / edit annotations
    Annotations: allow editing
    JBrowse: enable UcA track 
  • admin: access to everything for that organism
    Organism: show
    User: show 
    Group: show
    Preferences: (still hide)

Table of permissions:

| Permission | Annotator          | Users/groups  | Annotations         | Organism                  |
|------------|--------------------|---------------|---------------------|---------------------------|
| READ       | visible / locked   | hide          | visible / no export | visible                   |
| EXPORT     | visible / locked   | hide          | visible / export    | visible                   |
| WRITE      | visible + editable | hide          | visible / export    | visible                   |
| ADMIN      | visible + editable | visible       | visible / export    | visible + admin functions |
| NONE       | not available      | not available | not available       | not visible               |

The Preferences panel is available only to global admins.
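The organism-level permissions form a ladder in which each level includes the ones below it (READ < EXPORT < WRITE < ADMIN), while access to the Preferences panel depends only on global admin status. The following Python sketch encodes the table above; the enum and capability names are hypothetical and merely summarize the documented behavior, not Apollo's actual permission classes.

```python
from enum import IntEnum


class OrganismPermission(IntEnum):
    """Per-organism levels; each level includes everything below it."""
    NONE = 0
    READ = 1
    EXPORT = 2
    WRITE = 3
    ADMIN = 4


def capabilities(level: OrganismPermission, global_admin: bool = False) -> dict[str, bool]:
    """Summarize what a user can do for one organism, following the table above."""
    if global_admin:
        level = OrganismPermission.ADMIN  # global admins have access to everything
    return {
        "view_annotations": level >= OrganismPermission.READ,
        "export": level >= OrganismPermission.EXPORT,
        "edit_annotations": level >= OrganismPermission.WRITE,
        "manage_users_and_groups": level >= OrganismPermission.ADMIN,
        "preferences_panel": global_admin,
    }


print(capabilities(OrganismPermission.EXPORT))
```

Using an ordered enum keeps each check to a single comparison, which mirrors the "same as above, but…" wording of the permission list.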

Developer’s guide

Here we will introduce how to set up Apollo on your server. In general, there are two modes of deploying Apollo.

There is “development mode”, where the application is launched automatically in a temporary server, and there is “production mode”, which will typically require a separate external database and a Tomcat server where you can deploy the generated war file.

This guide covers the “development mode” scenario, which should be easy to start. To set up a production environment, please see the setup guide.

Java / JDK

You have to install Java and the Java Development Kit (JDK) 8 or higher to run Apollo. Both the Oracle and OpenJDK versions have been tested.

Node.js / NPM

You will need to install node.js, which includes NPM (the node package manager) to build Apollo.

nvm is highly recommended for installing and managing multiple versions of Node. Node v6 and up should work, but we recommend Node v8 or later.

Grails / Groovy / Gradle (optional)

Installing Grails (application framework), Groovy (development language), or Gradle (build environment) is not required (they will install themselves), but it is recommended for development.

This is most easily done using SDKMAN (formerly GVM), which can automatically set up Grails for you.

  1. curl -s http://get.sdkman.io | bash
  2. sdk install grails 2.5.5
  3. sdk install gradle 2.11
  4. sdk install groovy

Get the code

To set up Apollo, you can download our latest release from our official releases page (https://github.com/GMOD/Apollo/releases) as a compressed zip or tar.gz file.

Alternatively you can check it out from git directly as follows:

  1. git clone https://github.com/GMOD/Apollo.git Apollo
  2. cd Apollo
  3. git checkout <XYZ> - optional, where XYZ is the tagged version you want from here: https://github.com/GMOD/Apollo/releases

Verify install requirements

We can now perform a quick-start of the application in “development mode” with this command:

./apollo run-local

The JBrowse and Perl prerequisites will be installed during this step, and if it succeeds, a temporary server will be automatically launched at http://localhost:8080/apollo.

Note: You can also supply a port number, e.g. ./apollo run-local 8085, if there are conflicts on port 8080.

Also note: if there are any errors at this step, check the setup.log file. You can refer to the troubleshooting guide; often such errors simply mean that the prerequisites or Perl modules failed to install.

Finally, note that “development mode” uses an in-memory H2 database for storing data by default. The setup guide will show you how to configure custom database settings.

Running the code

There are several distinct parts of the code.

  1. Apollo client plugin (JS: dojo, jquery, etc.) in client directory
  2. Server (Grails 2.5.5: Groovy and Java) in grails-app, src, web components and tests.
  3. Side-panel code / wrapper code (GWT 2.8: Java). Code is java and/or XML in src/gwt.
  4. Tools / scripts in the examples and tools: Groovy, perl, bash
  5. JBrowse (JS: dojo, jquery, etc.)

In general, the command ./apollo run-local will build and run the client and the server code. Subsequent runs that do not change the GWT code can use ./apollo run-app. Changes to domain objects or adding controller methods may make stopping and restarting the server necessary, but most other changes will compile without having to restart the server.

./apollo test runs the grails unit and integration tests.

Updating the web-service doc can be done with ./apollo create-rest-doc

Running the code when making client plugin changes

After starting the server you can run ./gradlew installJBrowseWebOnly or ./apollo jbrowse to push changes from the JavaScript code in the client/apollo directory.

If for some reason this is not working, make sure that caching is disabled in the Network tab of your browser’s developer tools. You can also run the command ./gradlew copy-resources-dev manually each time if the files don’t seem to be getting copied.

Running the code for GWT changes

To use the GWT dev server run gradle devmode in a separate terminal. This will bring up a separate GWT dev-mode code server that will compile subsequent changes to the src/gwt code after reloading the page.

If errors from the dev compilation seem obscure, you might try running ./apollo compile to get more detail.

Running the code for JBrowse changes

If you are testing making changes directly to JBrowse within Apollo, the following steps should work:

  1. ./apollo clean-all
  2. Clone the version of JBrowse you want into a directory called jbrowse-download at the root level.
  3. ./apollo run-local to run the server
  4. In a separate terminal run gradle copy-resources-dev to copy over your changes to the server.
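The first three steps can be scripted roughly as follows. This is a sketch: prepare_jbrowse_dev and maybe_run are hypothetical helpers, the JBrowse source URL is assumed to be the GMOD/jbrowse GitHub repository, and you pass the branch or tag you want. Setting DRY_RUN=1 prints the commands instead of running them. Because ./apollo run-local keeps running, step 4 still happens in a second terminal.

```shell
# Hypothetical helpers for the JBrowse steps above; run from the Apollo root.
# With DRY_RUN set, commands are printed instead of executed.
maybe_run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: the JBrowse branch or tag to develop against (your choice).
prepare_jbrowse_dev() {
  local ref="${1:?usage: prepare_jbrowse_dev <jbrowse-branch-or-tag>}"
  maybe_run ./apollo clean-all || return 1
  maybe_run git clone --branch "$ref" https://github.com/GMOD/jbrowse jbrowse-download || return 1
  # Blocks while the server runs; do step 4 in a second terminal.
  maybe_run ./apollo run-local
}
```

Running DRY_RUN=1 prepare_jbrowse_dev <ref> first is a cheap way to check the commands before ./apollo clean-all removes anything.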

Adding sample data

If you want to test with pre-processed data instead of adding your own, you can load the following data into a directory to be added as an organism.

Using Apollo with IntelliJ

You can use IntelliJ, NetBeans, Eclipse, or just a simple text editor with Apollo to aid in development.

Here we discuss using IntelliJ Ultimate with Apollo:

  • Download IntelliJ Ultimate (you need the commercial version). Licensing options.
  • Clone or download Apollo if you haven’t already (git clone https://github.com/GMOD/Apollo) and follow the instructions on building Apollo in this doc.
  • If you’ve tried to use it before with IntelliJ, make sure that there is no .idea or *.ipr file present in the directory.
  • Open IntelliJ
  • Select Import Project
  • Select Create from Existing Sources
  • After it detects the web app, it will probably have classified it as Web. Select Grails instead.
  • Note that there is a Grails view in the project menu.
  • Open Terminal and run ./apollo run-local to take care of all the dependencies, including JBrowse. If you aren’t developing GWT, you can use ./apollo run-app instead. Most Java / Groovy files will automatically recompile in a few seconds after you make changes.
  • You can also run or debug directly from the IDE, with the output shown below.

Notes on Debugging:

  • In IntelliJ, run debug (works only for JVM files, debug JavaScript in the browser)
  • There is an error in IntelliJ 2017.3, so either downgrade to 2017.2 or disable the Instrumenting agent in File | Settings | Build, Execution, Deployment | Debugger | Async Stacktraces in the preferences menu.

Create server documentation

Using an IDE like IntelliJ, NetBeans, Eclipse etc. is highly recommended in conjunction with Grails 2.5.X documentation. Additionally, you can generate documentation using grails:

grails doc

Server documentation (for groovy) should be available at target/docs/all-docs.html.

Setting up the application

Setup a production server

To set up a production server (as opposed to the development server described above), please see the setup guide. You must properly configure a servlet container such as Tomcat or Jetty with sufficient memory.

Adding data to Apollo

After the server is set up, we will want to add a new organism to the panel. If you are a new user, you will want to set up this data with the JBrowse pre-processing scripts. You can see the data loading guide for more details, but essentially, you will want to load a reference genome and an annotations file at a minimum:

bin/prepare-refseqs.pl --fasta yourgenome.fasta --out /opt/apollo/data

bin/flatfile-to-json.pl --gff yourannotations.gff --type mRNA \
        --trackLabel AnnotationsGff --out /opt/apollo/data

Login to the web interface

When you access your application at http://localhost:8080/apollo/, you will be prompted for login information.

_images/1.png (Login first time)

Figure 1. “Register First Admin User” screen allows you to create a new admin user.

_images/2.png (Organism configuration)

Figure 2. Navigate to the “Organism tab” and select “Create new organism”. Then enter the new information for your organism. Importantly, the data directory refers to a directory that has been prepared with the JBrowse data loading scripts from the command line. See the data loading section for details.

_images/3.png (Open annotator)

Figure 3. Open up the new organism from the drop down tab on the annotator panel.

Conclusion

If you completed this setup, you can then begin adding new users and performing annotations. Please continue to the setup guide for deploying the webapp to production or visit the troubleshooting guide if you encounter problems during setup.

How to contribute code to Apollo

Audience

These guidelines are for developers of Apollo software, whether internal or in the broader community.

Basic principles of the Apollo-flavored GitHub Workflow

Principle 1: Work from a personal fork

  • Prior to adopting the workflow, a developer will perform a one-time setup to create a personal Fork of apollo and will subsequently perform their development and testing on a task-specific branch within their forked repo. This forked repo will be associated with that developer’s GitHub account, and is distinct from the shared repo managed by GMOD.

Principle 2: Commit to personal branches of that fork

  • Changes will never be committed directly to the master branch on the shared repo. Rather, they will be composed as branches within the developer’s forked repo, where the developer can iterate and refine their code prior to submitting it for review.

Principle 3: Propose changes via pull request of personal branches

  • Each set of changes will be developed as a task-specific branch in the developer’s forked repo, and then a pull request will be created to propose those changes to the shared repo. This mechanism provides a way for developers to discuss, revise and ultimately merge changes from the forked repo into the shared Apollo repo.

Principle 4: Delete or ignore stale branches, but don’t recycle merged ones

  • Once a pull request has been merged, the task-specific branch is no longer needed and may be deleted or ignored. It is bad practice to reuse an existing branch once it has been merged. Instead, a subsequent branch and pull-request cycle should begin when a developer switches to a different coding task.
  • You may create a pull request in order to get feedback, but if you wish to continue working on the branch, say so by marking it “DO NOT MERGE YET”.

One Time Setup - Forking a Shared Repo

The official shared Apollo repository is intended to be modified solely via pull requests that are reviewed and merged by a set of responsible ‘gatekeeper’ developers within the Apollo development team. These pull requests are initially created as task-specific named branches within a developer’s personal forked repo.

Typically, a developer will fork a shared repo once, which creates a personal copy of the repo that is associated with the developer’s GitHub account. Subsequent pull requests are developed as branches within this personal forked repo. The repo need never be forked again, although each pull request will be based upon a new named branch within this forked repo.

Step 1 - Backup your existing repo (optional)

The Apollo team has recently adopted the workflow described in this document. Many developers will have an existing clone of the shared repo that they have been using for development. This cloned local directory must be moved aside so that a proper clone of the forked repo can be used instead.

If you do not have an existing local copy of the shared repo, then skip to Step 2 below.

Step 2 - Fork apollo via the Web

The easiest way to fork the apollo repository is via the GitHub web interface:

  • Ensure you are logged into GitHub as your GitHub user.
  • Navigate to the apollo shared repo at https://github.com/GMOD/apollo.
  • Notice the ‘Fork’ button in the upper right corner. It has a number to the right of the button. images/githubForkButton.png
  • Click the Fork button. The resulting behavior will depend upon whether your GitHub user is a member of a GitHub organization. If not a member of an organization, then the fork operation will be performed and the forked repo will be created in the user’s account.
  • If your user is a member of an organization (e.g., GMOD or acme-incorporated), then GitHub will present a dialog for the user to choose where to place the forked repo. The user should click on the icon corresponding to their username. images/githubForkTarget.png
  • If you accidentally click the number, you will be on the Network Graphs page and should go back.

Step 3 - Clone the Fork Locally

At this point, you will have a fork of the shared repo (e.g., apollo) stored within GitHub, but it is not yet available on your local development machine. This is done as follows:

# Assumes that directory ~/MI/ will contain your Apollo repos.
# Assumes that your username is MarieCurie.
# Adapt these instructions to suit your environment
> cd ~/MI
> git clone git@github.com:MarieCurie/apollo.git
> cd apollo

Notice that we are using the SSH transport to clone this repo, rather than the HTTPS transport. The telltale indicator of this is the git@github.com:MarieCurie... rather than the alternative https://github.com/MarieCurie....

Note: If you encounter difficulties with the above git clone, you may need to associate your local public SSH key with your GitHub account. See Which remote URL should I use? for information.

Step 4 - Configure the local forked repo

The git clone above copied the forked repo locally, and configured the symbolic name ‘origin’ to point back to the remote GitHub fork. We will need to create an additional remote name to point back to the shared version of the repo (the one that we forked in Step 2). The following should work:

# Assumes that you are already in the local apollo directory
> git remote add upstream https://github.com/GMOD/apollo.git

Verify that remotes are configured correctly by using the command git remote -v. The output should resemble:

upstream    https://github.com/GMOD/apollo.git (fetch)
upstream    https://github.com/GMOD/apollo.git (push)
origin  git@github.com:MarieCurie/apollo.git (fetch)
origin  git@github.com:MarieCurie/apollo.git (push)
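If you run this setup more than once, or mistyped the URL, git remote add fails because the remote already exists. The sketch below (configure_upstream is a hypothetical name) adds the remote on the first run and repairs its URL on later runs, then prints what upstream points to. It uses only standard git remote subcommands; get-url requires Git 2.7 or later.

```shell
# Hypothetical helper: add or repair the 'upstream' remote, then print its URL.
# Defaults to the shared GMOD repo used in this guide; pass another URL to override.
configure_upstream() {
  local url="${1:-https://github.com/GMOD/apollo.git}"
  if git remote get-url upstream >/dev/null 2>&1; then
    git remote set-url upstream "$url"   # remote exists: just fix its URL
  else
    git remote add upstream "$url"       # first run: create the remote
  fi
  git remote get-url upstream
}
```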

Step 5 - Configure .bashrc to show current branch (optional)

One of the important things when using Git is to know what branch your working directory is tracking. This can be easily done with the git status command, but checking your branch periodically can get tedious. It is easy to configure your bash environment so that your current git branch is always displayed in your bash prompt.

If you want to try this out, add the following to your ~/.bashrc file:

function parse_git_branch()
{
  git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ \1/'
}
LIGHT_GRAYBG="\[\033[0;47m\]"
LIGHT_PURPLE="\[\033[0;35m\]"
NO_COLOR="\[\033[0m\]"
export PS1="$LIGHT_PURPLE\w$LIGHT_GRAYBG\$(parse_git_branch)$NO_COLOR \$ "

You will need to open up a new Terminal window (or re-login to your existing terminal) to see the effect of the above .bashrc changes.

If you cd to a git working directory, the branch will be displayed in the prompt. For example:

~ $
~ $ # This isn't a git directory, so no branch is shown
~ $
~ $ cd /tmp
/tmp $
/tmp $ # This isn't a git directory, so no branch is shown
/tmp $
/tmp $ cd ~/MI/apollo/
~/MI/apollo fix-feedback-button $
~/MI/apollo fix-feedback-button $ # The current branch is shown
~/MI/apollo fix-feedback-button $
~/MI/apollo fix-feedback-button $ git status
On branch fix-feedback-button
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
    ... remaining output of git status elided ...

Typical Development Cycle

Once you have completed the one-time setup above, you can create new branches and pull requests using the instructions below. The typical development cycle has the following phases:

  • Refresh and clean up local environment
  • Create a new task-specific branch
  • Perform ordinary development work, periodically committing to the branch
  • Prepare and submit a Pull Request (PR) that refers to the branch
  • Participate in PR Review, possibly making changes and pushing new commits to the branch
  • Celebrate when your PR is finally Merged into the shared repo.
  • Move onto the next task and repeat this cycle

Refresh and clean up local environment

Git will not automatically sync your Forked repo with the original shared repo, and will not automatically update your local copy of the Forked repo. These tasks are part of the developer’s normal cycle, and should be the first thing done prior to beginning a new development effort and creating a new branch. In addition, this

Step 1 - Fetch remotes

In the (likely) event that the upstream repo (the apollo shared repo) has changed since the developer last began a task, it is important to update the local copy of the upstream repo so that its changes can be incorporated into subsequent development.

> git fetch upstream        # Updates the local copy of the shared repo but does NOT affect the working directory; it simply makes the upstream code available locally for subsequent Git operations. See Step 2.

Step 2 - Ensure that ‘master’ is up to date

Assuming that new development begins with branch ‘master’ (a good practice), then we want to make sure our local ‘master’ has all the recent changes from ‘upstream’. This can be done as follows:

> git checkout master
> git reset --hard upstream/master

The above command is potentially dangerous if you are not paying attention, as it will remove any local commits to master (which you should not have) as well as any changes to local files that are also in the upstream/master version (which you should not have). In other words, the above command ensures a proper clean slate where your local master branch is identical to the upstream master branch.

Some people advocate the use of git merge upstream/master or git rebase upstream/master instead of the git reset --hard. One risk of these options is that unintended local changes accumulate in the branch and end up in an eventual pull request. Basically, it leaves open the possibility that a developer is not really branching from upstream/master, but is branching from some developer-specific branch point.
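To keep the clean-slate behaviour of git reset --hard while guarding against the mistakes described above, the refresh steps can be wrapped in a function that refuses to reset when local master has commits that upstream/master lacks, or when tracked files have uncommitted edits. refresh_master is a hypothetical helper and assumes the upstream remote configured in Step 4 of the one-time setup.

```shell
# Hypothetical helper: refresh local master from upstream, but refuse to run
# the destructive reset if it would throw away commits or uncommitted edits.
refresh_master() {
  # Uncommitted changes to tracked files would be lost by reset --hard.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    echo "uncommitted changes present; commit or stash them first" >&2
    return 1
  fi
  git fetch -q upstream || return 1
  git checkout -q master || return 1
  # Count commits on local master that upstream/master does not contain.
  local ahead
  ahead="$(git rev-list --count upstream/master..master)" || return 1
  if [ "$ahead" -ne 0 ]; then
    echo "local master has $ahead commit(s) not in upstream/master" >&2
    return 1
  fi
  git reset -q --hard upstream/master
}
```

If it refuses, move the stray commits onto a task branch first (for example git branch rescue-work master) and then rerun it.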

Create a new branch

Once you have updated the local copy of the master branch of your forked repo, you can create a named branch from this copy and begin to work on your code and pull-request. This is done with:

> git checkout -b fix-feedback-button   # This is an example name

This will create a local branch called ‘fix-feedback-button’ and will configure your working directory to track that branch instead of ‘master’.

You may now freely make modifications and improvements and these changes will be accumulated into the new branch when you commit.

If you followed the instructions in Step 5 - Configure .bashrc to show current branch (optional), your shell prompt should look something like this:

~/MI/apollo fix-feedback-button $

Changes, Commits and Pushes

Once you are in your working directory on a named branch, you make changes as normal. When you make a commit, you will be committing to the named branch by default, and not to master.

You may wish to periodically git push your code to GitHub. Note the use of an explicit branch name that matches the branch you are on; naming the branch explicitly works regardless of your Git push settings:

> git push origin fix-feedback-button   # This is an example name

Note that we are pushing to ‘origin’, which is our forked repo. We are definitely NOT pushing to the shared ‘upstream’ remote, for which we may not have permission to push.
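A small wrapper can enforce both rules: never push master, and push only to origin. In the sketch below (push_task_branch is a hypothetical name) the current branch is pushed with -u, which records origin/<branch> as its upstream so that later pushes from that branch can be a plain git push.

```shell
# Hypothetical helper: push the current task branch to your fork ('origin')
# and record it as the branch's upstream.
push_task_branch() {
  local branch
  branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
  # "HEAD" here means a detached HEAD, i.e. no branch is checked out.
  if [ "$branch" = "master" ] || [ "$branch" = "HEAD" ]; then
    echo "refusing to push '$branch'; switch to a task branch first" >&2
    return 1
  fi
  git push -q -u origin "$branch"
}
```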

Reconcile branch with upstream changes

If you have followed the instructions above at Refresh and clean up local environment, then your working directory and task-specific branch will be based on a starting point from the latest-and-greatest version of the shared repo’s master branch. Depending upon how long it takes you to develop your changes, and upon how much other developer activity there is, it is possible that changes to the upstream master will conflict with changes in your branch.

So it is a good practice to periodically pull down these upstream changes and reconcile your task branch with the upstream master branch. At the least, this should be performed prior to submitting a PR.

Fetching the upstream branch

The first step is to fetch the updated upstream master branch to your local development machine. Note that this command will NOT affect your working directory, but will simply make the upstream master branch available in your local Git environment.

> git fetch upstream

Rebasing to avoid Conflicts and Merge Commits

Now that you’ve fetched the upstream changes to your local Git environment, you will use the git rebase command to move your branch onto the latest upstream master:

> # Make sure that your changes are committed to your branch
> # before doing any rebase operations
> git status
    # ... Review the git status output to ensure your changes are committed
    # ... Also a good chance to double-check that you are on your
    # ... task branch and not accidentally on master
> git rebase upstream/master

The rebase command will have the effect of adjusting your commit history so that your task branch changes appear to be based upon the most recently fetched master branch, rather than the older version of master you may have used when you began your task branch.

By periodically rebasing in this way, you can ensure that your changes are in sync with the rest of Apollo development and you can avoid hassles with merge conflicts during the PR process.
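The pre-rebase checks and the rebase itself can be combined into one function. sync_task_branch is a hypothetical helper: it refuses to run on master or with uncommitted changes to tracked files, then fetches upstream and rebases onto upstream/master.

```shell
# Hypothetical helper: bring the current task branch up to date with
# upstream/master by rebasing, after a couple of safety checks.
sync_task_branch() {
  local branch
  branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
  if [ "$branch" = "master" ]; then
    echo "on master; switch to your task branch first" >&2
    return 1
  fi
  # Git refuses to rebase with unstaged edits anyway; checking first gives
  # a clearer message.
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    echo "commit your changes before rebasing" >&2
    return 1
  fi
  git fetch -q upstream || return 1
  git rebase -q upstream/master
}
```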

Dealing with merge conflicts during rebase

Sometimes conflicts happen where another developer has made changes and committed them to the upstream master (ideally via a successful PR) and some of those changes overlap with the code you are working on in your branch. The git rebase command will detect these conflicts and will give you an opportunity to fix them before continuing the rebase operation. The Git instructions during rebase should be sufficient to understand what to do, but a very verbose explanation can be found at Rebasing Step-by-Step.
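While a rebase is stopped on a conflict, git diff --diff-filter=U lists the files that are still unmerged. A minimal sketch (list_conflicts is a hypothetical name), followed by the usual resolution loop written as comments:

```shell
# Hypothetical helper: list files that still have unresolved conflicts
# ("unmerged" paths) while a rebase is stopped. Run it inside the repo.
list_conflicts() {
  git diff --name-only --diff-filter=U
}

# Typical loop while `git rebase upstream/master` is stopped on a conflict:
#   list_conflicts           # which files still need attention?
#   "$EDITOR" path/to/file   # resolve the <<<<<<< ======= >>>>>>> markers
#   git add path/to/file     # mark that file as resolved
#   git rebase --continue    # replay the remaining commits
# At any point, `git rebase --abort` returns the branch to its pre-rebase state.
```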

Advanced: Interactive rebase

As you gain more confidence in Git and this workflow, you may want to create PRs that are easier to review and best reflect the intent of your code changes. One technique that is helpful is to use the interactive rebase capability of Git to help you clean up your branch prior to submitting it as a PR. This is completely optional for novice Git users, but it does produce a nicer shared commit history.

See squashing commits with rebase for a good explanation.
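One non-interactive variant of this cleanup, assuming your fix-up commits were made with git commit --fixup=<commit>, is to let --autosquash reorder and fold them. Setting GIT_SEQUENCE_EDITOR to : accepts the generated todo list unchanged. autosquash_onto is a hypothetical wrapper; run git rebase -i yourself if you want to edit the list.

```shell
# Hypothetical wrapper: fold "fixup!" commits into the commits they amend.
# $1 is the base to rebase onto (defaults to upstream/master).
autosquash_onto() {
  local base="${1:-upstream/master}"
  # ':' is a no-op editor, so the autosquash todo list is used unchanged.
  GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash "$base"
}
```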

Submitting a PR (pull request)

Once you have developed code and are confident it is ready for review and final integration into the upstream version, you will want to do a final git push origin ... (see Changes, Commits and Pushes above). Then you will use the GitHub website to perform the operation of creating a Pull Request based upon the newly pushed branch.

See submitting a pull request.

Reviewing a pull request

The set of open PRs for Apollo can be viewed by first visiting the shared Apollo GitHub page at https://github.com/GMOD/apollo.

Click on the ‘Pull Requests’ link on the right-side of the page: images/githubPullRequest.png

Note that the Pull Request you created from your forked repo shows up in the shared repo’s Pull Request list. One way to avoid confusion is to think of the shared repo’s PR list as a queue of changes to be applied, pending their review and approval.

Respond to TravisCI tests

The GitHub Pull Request mechanism is designed to allow review and refinement of code prior to its final merge to the shared repo. After creating your Pull Request, the TravisCI tests for apollo will be executed automatically, ensuring that the code that ‘worked fine’ on your development machine also works in the production-like environment provided by TravisCI. The current status of the tests can be found near the bottom of the individual PR page, to the right of the Merge Request symbol: images/githubTestProgress.png images/githubTestStatus.png

TBD - Something should be written about developers running tests PRIOR to TravisCI and the PR. This may already be in the README.html, but should be cited.

Respond to peer review

Repushing to a PR branch

It’s likely that after creating a Pull Request, you will receive useful peer review or your TravisCI tests will fail. In either case, make the required changes on your development machine, retest them, and push the new commits to your task branch; the PR will be updated automatically. This allows a PR to evolve in response to feedback from peers. Once everyone is satisfied, the PR may be merged (see below).
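If you rebased the branch (see Reconcile branch with upstream changes), its history has been rewritten and a plain push to origin is rejected as a non-fast-forward. git push --force-with-lease overwrites the branch on your fork only if it still points where your last fetch or push left it. repush_task_branch is a hypothetical wrapper that also refuses to force-push master.

```shell
# Hypothetical helper: update your fork's copy of a rebased task branch.
# --force-with-lease fails if someone else moved the remote branch meanwhile.
repush_task_branch() {
  local branch
  branch="$(git rev-parse --abbrev-ref HEAD)" || return 1
  if [ "$branch" = "master" ]; then
    echo "refusing to force-push master" >&2
    return 1
  fi
  git push -q --force-with-lease origin "$branch"
}
```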

Merge a pull request

One of the goals behind the workflow described here is to enable a large group of developers to meaningfully contribute to the Apollo codebase. The Pull Request mechanism encourages review and refinement of the proposed code changes. As a matter of informal policy, Apollo expects that a PR will not be merged by its author and that a PR will not be merged without at least one reviewer approving it (via a comment such as +1 in the PR’s Comment section).

Celebrate and get back to work

You have successfully gotten your code improvements into the shared repository. Congratulations! The branch you created for this PR is no longer useful, and may be deleted from your forked repo or may be kept. But in no case should the branch be further developed or reused once it has been successfully merged. Subsequent development should be on a new branch. Prepare for your next work by returning to Refresh and clean up local environment.
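Deleting a merged branch from both your local clone and your fork can be sketched as below; delete_task_branch is a hypothetical helper, meant to run after refreshing master. git branch -d refuses to delete a branch whose commits are not merged into its upstream (or into the current HEAD when no upstream is set), which protects unmerged work. If a PR was squash- or rebase-merged on GitHub, Git may not recognise the branch as merged, and you would need -D after checking.

```shell
# Hypothetical helper: remove a merged task branch locally and on your fork.
delete_task_branch() {
  local branch="${1:?usage: delete_task_branch <branch>}"
  if [ "$branch" = "master" ]; then
    echo "refusing to delete master" >&2
    return 1
  fi
  git checkout -q master || return 1
  # -d (not -D) refuses if the branch's commits are not merged yet.
  git branch -q -d "$branch" || return 1
  git push -q origin --delete "$branch"
}
```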


GitHub Tricks and Tips

  • Add ?w=1 to a GitHub file compare URL to ignore whitespace differences.
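The same trick can be scripted. ignore_whitespace_url is a hypothetical helper that appends w=1 with ? or &, depending on whether the URL already has a query string, and keeps any #fragment at the end.

```shell
# Hypothetical helper: add GitHub's whitespace-ignoring flag (w=1) to a URL.
ignore_whitespace_url() {
  local url="$1" base frag=""
  base="${url%%#*}"                   # everything before the first '#'
  if [ "$base" != "$url" ]; then
    frag="#${url#*#}"                 # keep the fragment for the very end
  fi
  case "$base" in
    *\?*) printf '%s&w=1%s\n' "$base" "$frag" ;;   # already has a query
    *)    printf '%s?w=1%s\n' "$base" "$frag" ;;
  esac
}
```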

References and Documentation

  • The instructions presented here are derived from several sources. However, a very readable and complete article is Using the Fork-and-Branch Git Workflow. Note that the article doesn’t make clear that certain steps like Forking are one-time setup steps, after which Branch-PullRequest-Merge steps are used; the instructions in this document attempt to clarify this.
  • New to GitHub? The GitHub Guides are a great place to start.
  • Advanced GitHub users might want to check out the GitHub Cheat Sheet.

Automated testing architecture

The Apollo unit testing framework uses the grails testing guidelines extensively, which can be reviewed here: http://grails.github.io/grails-doc/2.4.3/guide/testing.html

Our basic methodology is to run the full test suite with the apollo command:

apollo test

More specific tests can also be run by passing arguments to grails test-app, for example:

grails test-app :unit-test

This runs ALL of the tests in “test/unit”. To test a specific class, write something like this:

grails test-app org.bbop.apollo.FeatureService :unit

Notes about the test suites:

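  1. @Mock lists the domain classes the test uses. Unit tests don’t use a real database; GORM calls on the mocked classes run against an in-memory implementation.
  2. The setup() method runs before each test.
  3. Each test is composed of labelled blocks such as given:, when: and then:. when: and then: always come as a pair, and a method without any block labels is not treated as a test.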

Example test:

package org.bbop.apollo

import grails.test.mixin.Mock
import grails.test.mixin.TestFor
import org.bbop.apollo.gwt.shared.FeatureStringEnum
import org.bbop.apollo.sequence.Strand
import org.codehaus.groovy.grails.web.json.JSONObject
import spock.lang.Specification

@TestFor(FeatureService)
@Mock([Sequence, FeatureLocation, Feature])
class FeatureServiceSpec extends Specification {

    void setup() {
        // runs before every test; nothing to prepare here
    }

    void "convert JSON to Feature Location"() {
        given: "a saved sequence and a JSON object describing a location on it"
        Sequence sequence = new Sequence(name: "Chr3",
            seqChunkSize: 20, start: 1, end: 100, length: 99).save(failOnError: true)
        JSONObject jsonObject = new JSONObject()
        jsonObject.put(FeatureStringEnum.FMIN.value, 73)
        jsonObject.put(FeatureStringEnum.FMAX.value, 113)
        jsonObject.put(FeatureStringEnum.STRAND.value, Strand.POSITIVE.value)

        when: "the JSON is converted"
        FeatureLocation featureLocation =
            service.convertJSONToFeatureLocation(jsonObject, sequence)

        then: "we get a valid FeatureLocation"
        featureLocation.sequence.name == "Chr3"
        featureLocation.fmin == 73
        featureLocation.fmax == 113
        featureLocation.strand == Strand.POSITIVE.value
    }
}

There are 3 “special” types of things to test, which are all important and reflect the grails special functions: Domains, Controllers, Services. They will all be in the “test” directory and all be suffixed with “Spec” for a Spock test.

Chado

If you test with the Chado export, you will need to load ontologies into your Chado database or the integration steps will fail. If you don’t specify Chado in your apollo-config.groovy, no further action is necessary. For example (substitute your own user name for nathandunn):

./scripts/load_chado_schema.sh -u nathandunn -d apollo-chado-test -s chado-schema-with-ontologies.sql.gz -r

Architecture notes

Overview and developer’s guide

See the build doc for the official developer’s guide.

Minimally, the Apollo application can be launched by running apollo run-local. This starts a temporary Tomcat server automatically, and uses an in-memory H2 database if a different database configuration hasn’t been set up yet.

For development purposes, you can also enable automatic code reloading which helps for fast iteration.

  • grails -reloading run-app will allow changes to the server side code to be auto-reloaded.
  • ant devmode will provide auto-reloading of GWT code changes
  • scripts/copy_client.sh will copy the plugin code to the web-app folder to update the plugin JavaScript

The apollo script automatically does several of these functions.

Note: Changes to domain/database objects require an application restart, but a very useful feature of the application is that the whole database does not need to be reloaded after such a change.

If you look at the apollo script, you’ll see that grails run-app and other commands are launched automatically during apollo run-local.

Also, as always during web development, you will want to clear the cache to see changes (“shift-reload” on most browsers).

Overview

_images/architecture2.png

PDF schema

The main components of the Apollo 2.x application are:

  • Grails 2 Server with the current version set in the application.properties
  • Datastore: configured via Hibernate / Grails, which can use almost any database supported by JDBC / Hibernate (primarily Postgres, MySQL, H2)
  • JBrowse / Apollo Plugin: JS / HTML5 JBrowse doc and main site
  • GWT client: provides the sidebar. Can be written in another front-end language, as well. GWT doc

Basic layout

  • Grails code is in normal grails directories under “grails-app”
  • GWT-only code is under “src/gwt”
    • Code shared between the client and the server is under “src/gwt/org/bbop/apollo/gwt/shared”
  • Client code is under “client” (still)
  • Tests are under “test”
  • Old (presumably inactive code) is under “src/main/webapp”
  • New source (loaded into VM) is under “src/java” or “src/groovy” except for grails specific code.
  • Web code (not much) is either under “web-app” (and where jbrowse is copied) or under “grails-app/assets” (these are compiled down).
  • GWT-specific CSS can also be found in “src/gwt/org/bbop/apollo/gwt/client/resources/”, but it inherits the CSS on its current page as well.

Main components

The main components of the Apollo 2.x application (the four most important are 1 through 4):

  1. The domain classes; these are the main objects
  2. Controllers, which provide URL routes to the domain objects and expose REST services
  3. Views: annotator and index are the only ones that matter for Apollo
  4. Services: very important, because controllers should typically only provide the routes, while the particular business logic goes into the services.
  5. Configuration files: The grails-app/conf folder contains central conf files, but the apollo-config.groovy file in your root directory can override these central configs (i.e. it is not necessary to edit DataSource.groovy)
  6. grails-app/assets: all of the JavaScript lives here, which is an efficient way to deliver it
  7. Resources: web-app directory: css, images, and the jbrowse directory + WA plugin are initialized here.
  8. Client directory: The WA plugin is copied or compiled along with jbrowse to the web-app directory

Schema/domain classes

Domain classes: the most important domain class everywhere is the Feature; it is the key to everything that we do. The way a domain class is built:

Each domain class represents a database table. “Feature” is extended by many other classes, and all features are stored in the same table; a class column records the concrete type of each row, so when rows are queried from the database they are converted back into the right subclass. There are a number of constraints you can set.

Very important: hasMany maps a one-to-many relationship in the database. For example, a feature can have many locations, and parentFeatureRelationships is where another such one-to-many relationship is mapped. You also have to declare the corresponding single-item relationship.

You can add extra methods to the domain objects, but this is generally not necessary.

Note: In the DataStore configuration, a setting called “auditable = true” means that a separate table, used by a feature auditing tool, keeps track of the history of the specified objects.

Feature class

All features inherit an ontologyId and specify a cvTerm, although CvTerms are being phased out.

Subclasses of “Feature” will specify the ontologyId, but “Feature” itself is too generic, for example, so it does not have an ontologyId.

Sequence class

Sequence objects are how WA retrieves sequences. They used to have a built-in caching mechanism, which was removed to avoid running into memory problems.

Feature locations

Features such as genes all have a feature location that belongs to a particular sequence. If you have a feature with subclasses, it can exist within many locations, and each location belongs to its own sequence.

Feature relationship

Feature relationships can define parent/child relationships as well as SO terms i.e. SO “part_of” relationships

Feature enums

The FeatureStringEnum maps names to concepts, so code can use these enum values without worrying about string mappings inside the application.

Running the application

When you run this Grails application and send a URL request, operations sent through the AnnotationEditorController (formerly called AnnotationEditorService) are dispatched dynamically to the corresponding method using handleOperation.

The AnnotatorController serves the page that the annotator is on. This doesn’t map to a particular domain object.

In most of these methods, the data that is sent is unwrapped from a JSON object into a set of variables, processed as Java objects, and then converted back to JSON to send back.

When an annotator creates a transcript, it is passed to requestHandlingService, which sends it as an annotation event over a WebSocket, and it is then broadcast to everyone.

Websockets and listeners

All clients subscribe to AnnotationNotifications for new transcripts and events.

If an add_transcript operation occurs, it is broadcast via the WebSocket. The server side broadcasts this event, and then does a JSON round trip to render the results and send the return object that belongs to an AnnotationEvent.

Procedure transcript is created –> goes to the server –> adds a transcript locally –> announces it to everyone.

We used to use a long-polling request model for “push notifications”, but now we use Spring with SockJS, which uses WebSockets but can fall back to long polling.

Another component of the broadcasting, brokerMessagingTemplate, is the converter used to broadcast the event.

Controllers

Grails controllers are a fairly easy concept for “routing” URLs and info to methods in the code.

Services

Grails services are classes that perform business logic. (In IntelliJ, these are indicated by green buttons on the definitions to show that these are Injected Spring Bean classes)

The @Transactional annotation means that every public method runs within a transaction. In the old model, many files were recreated each time even though they did the same thing; now we define a service class once and can reuse it again and again. Transactions can be nested within transactions, and services can call other services.

For example: addTranscript, generateTranscript.

The different services do exactly what their names imply. It may not always be clear which service a particular class or method belongs in, but this can be changed later, and the names are also easy to change.

Grails views
  • Most of the views are under grails-app
    • everything conforms to the MVC backend model of the Grails application.
  • Most of the Java, CSS, and HTML is under the web-app directory
    • Application logic for Groovy, GWT, Java, etc. lives here. We could put our old servlets there, but it is not recommended.

Main configuration

The central configuration files are defined in the grails-app/conf/ folder; however, the user normally edits only their personal config in apollo-config.groovy, because settings in the user config file override those in the central configuration. See Configure.html for details.
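Conceptually, the override behaves like a recursive merge in which keys from the user file win. The Python sketch below only illustrates that precedence rule with made-up settings; Grails performs the real merge itself when it loads apollo-config.groovy, and the keys and values shown are placeholders rather than a recommended configuration.

```python
def merge_config(central, override):
    """Return a new dict where values from `override` replace those in
    `central`; nested dicts are merged key by key instead of replaced whole."""
    merged = dict(central)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Placeholder values: a central H2 default overridden by a user's PostgreSQL settings.
central = {
    "dataSource": {"driverClassName": "org.h2.Driver", "url": "jdbc:h2:mem:devDb", "username": "sa"},
    "logging": {"level": "info"},
}
user = {
    "dataSource": {"driverClassName": "org.postgresql.Driver", "url": "jdbc:postgresql://localhost/apollo"},
}
effective = merge_config(central, user)
```

Only the keys the user sets are replaced; everything else (here `username` and the `logging` block) still comes from the central file, which is why users rarely need to touch grails-app/conf/.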

Database configuration

The “root” database configuration is specified by grails-app/conf/DataSource.groovy, but it is generally overridden by the user’s apollo-config.groovy.

It is recommended that the user takes sample-postgres-apollo-config.groovy or sample-mysql-apollo-config.groovy and copies it to apollo-config.groovy for their application.

The default database driver is H2, an “embedded” database that doesn’t require installing PostgreSQL or MySQL, although it is generally not considered as performant as PostgreSQL or MySQL.

Note: three environments can be set up: a development environment, a test environment, and a production environment. The environment is assigned automatically depending on how you deploy the app:

  • Development environment - “apollo run-local” or “apollo debug”
  • Test environment - “apollo test”
  • Production environment - “apollo deploy”

Note: if there are no users and no annotations, a bootstrap procedure can also automatically create some users and annotations when the app starts, so there is something in there to begin with.

UrlMapping configuration

The UrlMappings are stored in grails-app/conf/UrlMappings.groovy

UrlMappings sets up a mapping from routes to controllers.

Standard and customized mappings go in here. The way we route JBrowse requests to organism data directories is also controlled here. The organismJBrowseDirectory is set per user for a particular session; if none is specified, a default one is used.
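The per-session lookup with a default fallback can be sketched as follows. Everything except the organismJBrowseDirectory name is hypothetical (`resolve_jbrowse_path`, the session dict, the example directories); the real mapping is written in Groovy in UrlMappings.groovy and the controllers it routes to.

```python
import posixpath


def resolve_jbrowse_path(session, requested_file, default_directory):
    """Map a JBrowse data request (e.g. "seq/refSeqs.json") onto the organism
    data directory stored in the user's session, or onto a default one."""
    directory = session.get("organismJBrowseDirectory") or default_directory
    # Normalise the path and refuse anything that would escape the directory.
    relative = posixpath.normpath(requested_file.lstrip("/"))
    if relative == ".." or relative.startswith("../"):
        raise ValueError("path escapes the data directory")
    return posixpath.join(directory, relative)
```

Because the directory comes from the session rather than from the URL, two users can request the same JBrowse path and each receive data for the organism they currently have selected.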

Build configuration

The build configuration is stored in grails-app/conf/BuildConfig.groovy

If libraries are missing or need to be added, you can add them here.

Additionally, the build system uses the “apollo” script and the “build.xml” to control the compilation and resource steps.

Central config

The central configuration is stored in grails-app/conf/Config.groovy

The central Grails config contains logging and application configuration, and can also reference external configs. Using this method, an external config can override settings without touching the application code at all.

In our application, we use apollo-config.groovy, and everything in it supersedes this file.

The log4j section sets the logging levels. You can turn on “debug grails.app” to output all of the webapollo debug info, or also set the “grails.debug” environment variable for Java.

There is also some Apollo configuration here, and it is mostly covered by the configuration section.

GWT web-app

When GWT compiles, it writes its output files into the web-app directory. When the annotator is loaded, the annotator index page includes the annotator.nocache.js file, which pulls in all of the GWT code for the /annotator/index route. Most of the GWT code lives in src/gwt/org/bbop/apollo/gwt/, and src/gwt/org/bbop/apollo/gwt/Annotator.gwt.xml is the central config file for the GWT web-app.

User interface definitions

A Bootstrap/GWT interface handles the tabs on the right for the new UI. The annotator object is at the root of everything.

Example definition: MainPanel.ui.xml

Tests

Unit tests

Unit tests and some basic JavaScript tests run on Travis-CI (see .travis.yml for an example script).

You can also run “apollo test” to run the tests locally. It will use the “test” database configuration automatically.

Also see the testing notes for more details.

Command line tools

The command line tools offer a number of interesting features that can be used to help set up and retrieve data from the application.

Overview

The command line tools are located in docs/web_services/examples, and they are mostly small scripts that automate the usage of the web services API.

get_gff3.groovy

Example:

get_gff3.groovy -organism Amel_4.5 -username admin@webapollo.com \
    -password admin_password -url http://localhost:8080/apollo > my_output.gff3

This command can accept an -output argument to write to a file, or stdout can be redirected.

The -username and -password can be specified via the command line or if omitted, the user will be prompted.
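When calling the script from another program, passing the arguments as a list rather than a shell string avoids problems such as file names with spaces being split by the shell. This Python sketch only uses the options documented above (-organism, -username, -password, -url, -output); it assumes get_gff3.groovy is executable and on the PATH and that -output takes a file name, and it leaves out any credentials you do not supply so that the script prompts for them.

```python
import subprocess


def build_get_gff3_command(organism, url, output=None, username=None, password=None):
    """Return the argument list for get_gff3.groovy; omitted credentials are
    not passed, so the script will prompt for them."""
    command = ["get_gff3.groovy", "-organism", organism, "-url", url]
    if username is not None:
        command += ["-username", username]
    if password is not None:
        command += ["-password", password]
    if output is not None:
        command += ["-output", output]
    return command


def run_get_gff3(**kwargs):
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(build_get_gff3_command(**kwargs), check=True)
```

For example, `run_get_gff3(organism="Amel_4.5", url="http://localhost:8080/apollo", output="my output.gff3")` keeps the file name intact and lets the script prompt for the username and password.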

get_fasta.groovy

Example:

get_fasta.groovy -organism Amel_4.5 -username admin@webapollo.com \
    -password admin_password -seqtype cds -url http://localhost:8080/apollo > output.fa

The -seqtype can be cds, cdna, or peptide. This command can accept an -output argument to write to a file, or stdout can be redirected.

The -username and -password can be specified via the command line (similar to get_gff3.groovy) or if omitted, the user will be prompted.

add_users.groovy

Example:

add_users.groovy -username admin@webapollo.com -password admin_password \
    -newuser newuser@test.com -newpassword newuserpass \
    -destinationurl http://localhost:8080/apollo

The -username and -password refer to the admin user, and they can also be specified via stdin instead of the command line if they are omitted.

A list of users specified in a CSV file can also be used as input.

add_organism.groovy

Example:

add_organism.groovy -name yeast -url http://localhost:8080/apollo/ \
    -directory /opt/apollo/yeast -username admin@webapollo.com -password admin_password

The -directory refers to the JBrowse data directory containing the output from prepare-refseqs.pl, flatfile-to-json.pl, etc. The -blatdb, -genus, and -species arguments are optional.

The -username and -password refer to the admin user, and they can also be specified via stdin instead of the command line if they are omitted.

delete_annotations_from_organism.groovy

Example:

docs/web_services/examples/groovy/delete_annotations_from_organism.groovy -destinationurl http://localhost:8080/apollo \
     -organismname honeybee2

This script will delete any annotations associated with a given organism.

Web Service API

The Apollo Web Service API is a JSON-based REST API to interact with the annotations and other services of Apollo. Both the request and response JSON objects can contain feature information that is based on the Chado schema. We use the web services API in the scripting examples, and we also use it in the Apollo JBrowse plugin.

The most up to date Web Service API documentation is deployed from the source code rest-api-doc annotations.

See http://demo.genomearchitect.io/Apollo2/jbrowse/web_services/api for details

Warning

If you are sending a password you care about over the wire (even if not using web services), it is highly recommended that you use HTTPS (which adds SSL/TLS encryption) instead of HTTP.

Examples

We provide an examples directory (docs/web_services/examples). For example, to log in with curl:

curl -b cookies.txt -c cookies.txt -e "http://localhost:8080" \
    -H "Content-Type:application/json" \
    -d '{"username": "demo", "password": "demo"}' \
    "http://localhost:8080/apollo/Login?operation=login"

Login expects two parameters: username and password, and optionally rememberMe for a persistent cookie.

A successful login returns an empty JSON object.
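The same login request can be built from Python with only the standard library. This is a sketch under the assumption that the server accepts the JSON body from the curl example; `build_login_request` is a name chosen here for illustration, sending `rememberMe` as a JSON boolean is an assumption, and the Referer header mirrors curl's `-e` option.

```python
import json
import urllib.parse
import urllib.request


def build_login_request(base_url, username, password, remember_me=False):
    """Build (but do not send) the POST request used in the curl login example."""
    body = {"username": username, "password": password}
    if remember_me:
        body["rememberMe"] = True   # assumed format of the optional flag
    parts = urllib.parse.urlsplit(base_url)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/Login?operation=login",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Same value curl sends with -e "http://localhost:8080".
            "Referer": f"{parts.scheme}://{parts.netloc}",
        },
        method="POST",
    )
```

Opening the request through an opener that keeps cookies preserves the resulting session for later API calls, just as `-b`/`-c cookies.txt` does for curl.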

Python Client

A Python client that covers many of the Apollo web services is available and is easy to set up:

pip install apollo
arrow init # provide Apollo credentials
arrow -h
## have fun
arrow groups get_groups

Documentation on commands and some examples working with jq:

What is the Web Service API?

For a given Apollo server url (e.g., https://localhost:8080/apollo or any other Apollo site on the web), the Web Service API allows us to make requests to the various “controllers” of the application and perform operations.

The controllers that are available for Apollo include the AnnotationEditorController, the OrganismController, the IOServiceController for downloads of data, and the UserController for user management.

Most API requests will take:

  • The proper URL (e.g., to get features from the AnnotationEditorController, send requests to http://localhost/apollo/annotationEditor/getFeatures)
  • username - an authorized user (also uses session if none specified)
  • password - password (also uses session if none specified)
  • organism - (if applicable) the “common name” of the organism for the operation – will also pull from the “user preferences” if none is specified.
  • track/sequence - (if applicable) the reference sequence name (shown in the sequence panel / genome browser)
  • uniquename - (if applicable) the uniquename is a UUID used to guarantee a unique ID
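Putting those fields together, a request body for an operation such as getFeatures could be assembled as below. This is a hedged sketch: the exact fields each operation accepts are listed in the generated API documentation, and the helper name, the `sequence_field` switch (some operations may expect `track` instead of `sequence`) and the example values are assumptions for illustration.

```python
import json


def build_api_payload(username=None, password=None, organism=None,
                      sequence=None, uniquename=None,
                      sequence_field="sequence", **extra):
    """Assemble a JSON body from the common fields. Fields left as None are
    omitted so the server can fall back to the session / user preferences."""
    fields = {
        "username": username,
        "password": password,
        "organism": organism,
        sequence_field: sequence,   # "sequence" or "track", depending on the operation
        "uniquename": uniquename,
    }
    payload = {key: value for key, value in fields.items() if value is not None}
    payload.update(extra)           # operation-specific fields, e.g. "features"
    return json.dumps(payload)
```

Leaving out `username` and `password` relies on an existing session cookie, while including them authenticates the single request on its own.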

Errors

If an error has occurred, a proper HTTP error code (most likely 400 or 500) and an error message are returned, in JSON format:

{ "error": "error message" }
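A client can turn this convention into an exception so that failures are not silently treated as data. In the sketch below, `ApolloError` and `check_response` are hypothetical names; the function takes the HTTP status and raw body text, and, as a defensive assumption, also treats a body containing an "error" key as a failure even if the status code looks successful.

```python
import json


class ApolloError(Exception):
    """Raised when the server reports an error."""


def check_response(status, body):
    """Return the decoded JSON body of a successful response, or raise
    ApolloError carrying the server's "error" message."""
    try:
        data = json.loads(body) if body else {}
    except json.JSONDecodeError:
        data = {}
    has_error_key = isinstance(data, dict) and "error" in data
    if status >= 400 or has_error_key:
        message = data["error"] if has_error_key else "no error message returned"
        raise ApolloError(f"{status}: {message}")
    return data
```

A successful login, which returns an empty JSON object, passes through as `{}`.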

Cookies

The Apollo Login creates a JSESSIONID cookie and a rememberMe cookie (if applicable), and these can be used in downstream API requests (for example, passing -b cookies.txt to curl sends the stored cookies with the request).

You can also pass username/password to individual API requests and these will authenticate each individual request.
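In Python's standard library, the equivalent of curl's cookie file is an opener backed by a cookie jar. The sketch below is an assumption-level illustration (`make_session_opener` is a hypothetical helper): the opener stores cookies such as JSESSIONID from responses and sends them back on later requests, and an optional file name uses the Netscape cookie-file format that curl also reads and writes.

```python
import http.cookiejar
import urllib.request


def make_session_opener(cookie_file=None):
    """Return (opener, jar). Requests made through `opener` store cookies in
    `jar` and send them back on later calls to the same server."""
    if cookie_file:
        # Persistable jar; call jar.save(ignore_discard=True) to keep session cookies.
        jar = http.cookiejar.MozillaCookieJar(cookie_file)
    else:
        jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Send the login request with `opener.open(...)` first; later requests through the same opener then carry the session cookie, so they do not need a username and password.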

Representing features in JSON

Most requests and responses will contain an array of feature JSON objects named features. The feature object is based on the Chado feature, featureloc, cv, and cvterm tables.

{
    "residues": "$residues",
    "type": {
        "cv": {
            "name": "$cv_name"
        },
        "name": "$cv_term"
    },
    "location": {
        "fmax": $rightmost_intrabase_coordinate_of_feature,
        "fmin": $leftmost_intrabase_coordinate_of_feature,
        "strand": $strand
    },
    "uniquename": "$feature_unique_name",
    "children": [$array_of_child_features],
    "properties": [$array_of_properties]
}

where:

  • residues - A sequence of alphabetic characters representing biological residues (nucleic acids, amino acids) [string]
  • type.cv.name - The name of the ontology [string]
  • type.name - The name for the cvterm [string]
  • location.fmax - The rightmost/maximal intrabase boundary in the linear range [integer]
  • location.fmin - The leftmost/minimal intrabase boundary in the linear range [integer]
  • location.strand - The orientation/directionality of the location. Should be 0, -1 or +1 [integer]
  • uniquename - The unique name for a feature [string]
  • children - Array of child feature objects [array]
  • properties - Array of properties (including frameshifts for transcripts) [array]
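For client code, the structure above maps naturally onto a small typed model. The Python sketch below is a hypothetical helper, not part of Apollo: it builds a feature dictionary with the documented keys and checks the stated constraints (strand must be -1, 0 or +1; fmin should not exceed fmax). With interbase coordinates a feature's length is simply fmax - fmin. The default cv name "sequence" and the mRNA/exon terms in the example are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    type_name: str                 # cvterm name, e.g. "mRNA"
    fmin: int                      # leftmost interbase coordinate
    fmax: int                      # rightmost interbase coordinate
    strand: int                    # -1, 0 or +1
    uniquename: Optional[str] = None
    cv_name: str = "sequence"      # assumed ontology name
    residues: Optional[str] = None
    children: List["Feature"] = field(default_factory=list)
    properties: List[dict] = field(default_factory=list)

    def __post_init__(self):
        if self.strand not in (-1, 0, 1):
            raise ValueError("strand must be -1, 0 or +1")
        if self.fmin > self.fmax:
            raise ValueError("fmin must not be greater than fmax")

    def length(self):
        # Interbase coordinates: bases 1..10 (1-based) are fmin=0, fmax=10,
        # so the length is fmax - fmin = 10.
        return self.fmax - self.fmin

    def to_json(self):
        """Return a dict shaped like the feature JSON documented above."""
        data = {
            "type": {"cv": {"name": self.cv_name}, "name": self.type_name},
            "location": {"fmin": self.fmin, "fmax": self.fmax, "strand": self.strand},
        }
        if self.uniquename is not None:
            data["uniquename"] = self.uniquename
        if self.residues is not None:
            data["residues"] = self.residues
        if self.children:
            data["children"] = [child.to_json() for child in self.children]
        if self.properties:
            data["properties"] = list(self.properties)
        return data
```

Optional fields are only emitted when set, since different operations require different fields.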

Note that different operations will require different fields to be set (which will be elaborated upon in each operation section).
