Apollo¶
Apollo - A collaborative, real-time, genome annotation web-based editor.
The application’s technology stack includes a Grails-based Java web application with flexible database backends and a Javascript client that runs in a web browser as a JBrowse plugin.
You can find the latest release here: https://github.com/GMOD/Apollo/releases/latest and our setup guide: http://genomearchitect.readthedocs.io/en/latest/Setup.html
- Apollo general documentation: http://genomearchitect.github.io/
- JBrowse general documentation: http://jbrowse.org
- Citing Apollo: Dunn, N. A. et al. Apollo: Democratizing genome annotation. PLoS Comput. Biol. 15, e1006790 (2019)

Note: This documentation covers release versions 2.x of Apollo. For the 1.0.4 installation please refer to the installation guide found at http://genomearchitect.readthedocs.io/en/1.0.4/
Contents:
Setup guide¶
The quick-start guide showed how to quickly launch a temporary instance of Apollo, but deploying the application to production normally involves some extra steps.
The general idea behind your deployment is to create an apollo-config.groovy file from one of the existing sample files, which have sample settings for various database engines.
Pre-requisites¶
The server will minimally need to have Java 8 or greater, Grails, git, ant, a servlet container e.g. tomcat7+, jetty, or resin. An external database such as PostgreSQL (9 or 10 preferred) is generally used for production, but instructions for MySQL or the H2 Java database (which may also be run embedded) are also provided.
To build the system natively JDK8 is required (typically OpenJDK8). To run the war, Java 8 or greater should be fine.
Important note: The default memory for Tomcat and Jetty is insufficient to run Apollo (and most other web apps). You should increase the memory according to these instructions.
Other possible build settings for JBrowse:
Ubuntu / Debian
sudo apt-get install zlib1g zlib1g-dev libexpat1-dev libpng-dev libgd2-noxpm-dev build-essential git python-software-properties python make
RedHat / CentOS
sudo yum install zlib zlib-dev expat-dev libpng-dev libgd2-noxpm-dev build-essential git python-software-properties python make
It is recommended to use the default version of JBrowse or later (Apollo does not yet work with JBrowse 2).
There are additional requirements if doing development with Apollo.
Install node and yarn¶
Node versions 6-12 have been tested and work. Using nvm (e.g., nvm install 8) is recommended.
npm install -g yarn
Install jdk¶
Build settings specific to Apollo. Recent versions of Tomcat 7 will work, though Tomcat 8 and 9 are preferred. If Tomcat does not install automatically, there are a number of ways to install it on Linux. To install ant and the JDK and set JAVA_HOME:
sudo apt-get install ant openjdk-8-jdk
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ # or set in .bashrc / .profile / .zshrc
export JAVA_HOME=`/usr/libexec/java_home -v 1.8` # OR, on macOS
If you have multiple versions of Java installed (see issue #2222), you will need to specify the version Tomcat uses. For tomcat8 on Ubuntu, set JAVA_HOME explicitly in the /etc/default/tomcat8 file:
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Download Apollo from the latest release (under source code) and unzip it. Test the installation by running ./apollo run-local and check that the web server starts up on http://localhost:8080/apollo/. To set up for production, continue on to the configuration below.
Database configuration¶
Apollo supports several database backends, and sample configurations are provided for H2, PostgreSQL, and MySQL.
Each has a sample file (e.g., sample-h2-apollo-config.groovy or sample-postgres-apollo-config.groovy) that is designed to be renamed to apollo-config.groovy before running apollo deploy. Additionally, you can run Apollo via Docker.
Furthermore, apollo-config.groovy has separate Groovy environments for test, development, and production modes.
The environment is selected automatically depending on how Apollo is run, e.g.:
- apollo deploy uses the production environment (i.e., when you copy the WAR file to your production server)
- apollo run-local or apollo debug use the development environment (i.e., when you are running it locally)
- apollo test uses the test environment (i.e., only when running unit tests)
Configure for H2:¶
- H2 is an embedded database engine, so no external setup is needed. Simply copy sample-h2-apollo-config.groovy to apollo-config.groovy.
- The default dev environment (apollo run-local or apollo run-app) is in-memory, so you will have to change it to a file-based database.
- If you use H2 with Tomcat or Jetty in production, you have to set the permissions for the database file path correctly (e.g., for jdbc:h2:/mypath/prodDb, run chown tomcat:tomcat /mypath/prodDb.*.db).
- If you use the local relative path jdbc:h2:./prodDb with tomcat8, the path will likely be /usr/share/tomcat8/prodDb*db.
Configure for PostgreSQL:¶
- Create a new database with postgres and add a user for production mode. Here are a few ways to do this in PostgreSQL.
- Copy the sample-postgres-apollo-config.groovy to apollo-config.groovy.
Configure for MySQL:¶
- Create a new MySQL database for production mode (i.e., run create database `apollo-production`; in the mysql console) and copy sample-mysql-apollo-config.groovy to apollo-config.groovy.
Apollo in Galaxy¶
Apollo can always be used externally from Galaxy, but there are a few integrations available as well.
Database schema¶
After you start the application, the database schema (tables, etc.) is set up automatically. You don’t have to initialize any database schemas yourself.
Deploy the application¶
The apollo run-local command only launches a temporary server and should not be used in production. To deploy to production, build a new WAR file with the apollo deploy command. After you have set up your apollo-config.groovy file with the appropriate username, password, and JDBC URL, run the command:
./apollo deploy
This command packages the application into a WAR file in the “target/” subfolder, downloading any missing prerequisites (JBrowse). After it completes, copy the WAR file (e.g., apollo-2.0.4.war) from the target folder to the webapps folder of your web container installation.
If you name the file apollo.war in your webapps folder, then you can access your app at “http://localhost:8080/apollo”.
We test primarily on Apache Tomcat (7.0.62+ and 8). Make sure to set your Tomcat memory to an appropriate size, or Apollo may run slowly or crash.
Alternatively, as mentioned previously, you can launch a temporary instance of the server, which is useful for testing:
./apollo run-local 8085
This temporary server will be accessible at “http://localhost:8085/apollo”
Tomcat configuration¶
If you have tracks with deeply nested features that result in feature JSON larger than 10MB, or if you have a client that sends JSON requests larger than 10MB to the Apollo server, then you will have to modify src/war/templates/web.xml.
Specifically, the following block in web.xml:
<context-param>
<param-name>org.apache.tomcat.websocket.textBufferSize</param-name>
<param-value>10000000</param-value>
</context-param>
<context-param>
<param-name>org.apache.tomcat.websocket.binaryBufferSize</param-name>
<param-value>10000000</param-value>
</context-param>
Note: The <param-value>
is in bytes.
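Both values are byte counts, so 10000000 is 10 MB in decimal units. To judge whether you need to raise them, a rough pre-check like the Python sketch below could help. The helpers `payload_size_bytes` and `fits_in_buffer` are hypothetical and not part of Apollo, and the sketch assumes a WebSocket message is roughly the size of the feature serialized as compact UTF-8 JSON; Apollo's actual messages may be larger or smaller.

```python
import json

# Mirrors org.apache.tomcat.websocket.textBufferSize in web.xml (bytes).
TEXT_BUFFER_SIZE = 10_000_000


def payload_size_bytes(feature: dict) -> int:
    """Return the size of `feature` serialized as compact UTF-8 JSON."""
    text = json.dumps(feature, separators=(",", ":"))
    return len(text.encode("utf-8"))


def fits_in_buffer(feature: dict, buffer_size: int = TEXT_BUFFER_SIZE) -> bool:
    """Hypothetical pre-check: True if the approximate payload fits the buffer."""
    return payload_size_bytes(feature) <= buffer_size


if __name__ == "__main__":
    small = {"name": "gene1", "children": []}
    large = {"residues": "A" * 10_000_001}
    print(fits_in_buffer(small), fits_in_buffer(large))  # True False
```

If realistic features fail this check, raise both param-value settings in web.xml before building the WAR.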
Memory configuration¶
In production, the memory used by Apollo must be configured within Tomcat directly.
The default memory assigned to the JVMs used when running Apollo commands is 2048 MB. This can be changed in your apollo-config.groovy by uncommenting the memory configuration block:
// Uncomment to change the default memory configurations
grails.project.fork = [
test : false,
// configure settings for the run-app JVM
run : [maxMemory: 2048, minMemory: 64, debug: false, maxPerm: 1024, forkReserve: false],
// configure settings for the run-war JVM
war : [maxMemory: 2048, minMemory: 64, debug: false, maxPerm: 1024, forkReserve: false],
// configure settings for the Console UI JVM
console: [maxMemory: 2048, minMemory: 64, debug: false, maxPerm: 1024]
]
Note on database settings¶
If you use the apollo run-local command, then the “development” section of apollo-config.groovy is used (or a temporary in-memory H2 database if no apollo-config.groovy exists).
If you use the WAR file generated by the apollo deploy command on your own web server, then the “production” section of apollo-config.groovy is used.
Detailed build instructions¶
While the shortcut apollo deploy takes care of basic application deployment, understanding the full build process of Apollo can help you optimize and improve your deployed instances.
To learn more about the architecture of Apollo, view the architecture guide.
Using Docker to Run Apollo¶
You can install Docker for your system if not previously done.
Running the Container¶
You can see Apollo straight away:
docker run -it -p 8888:8080 -v /jbrowse/root/directory/:/data gmod/apollo:stable
Open http://localhost:8888 in a web browser and login with admin@local.host / password to get started.
Note: data is not guaranteed to be saved when run this way, but data in /jbrowse/root/directory will not be written to either.
Production¶
To run in production against persistent JBrowse data and a persistent database you should:
- Run docker pull gmod/apollo if running the latest build, to guarantee you are using the most recent image (not necessary for point releases).
- Create an empty directory for database data, e.g. /postgres/data/directory, if you want to keep data when the container goes down.
- If you want to upload tracks and genome directories, create an empty directory for them, e.g. /jbrowse/root/apollo_data.
- Put JBrowse data in a directory, e.g. /jbrowse/root/directory/.
- If the instance is publicly visible, set an admin username and password:
docker run -it \
-v /jbrowse/root/directory/:/data \
-v /postgres/data/directory:/var/lib/postgresql \
-v /jbrowse/root/apollo_data:/data/temporary/apollo_data \
-e APOLLO_ADMIN_EMAIL=adminuser \
-e APOLLO_ADMIN_PASSWORD=superdupersecretpassword \
-p 8888:8080 gmod/apollo:latest
- As above, open http://localhost:8888 in a browser to begin setting up Apollo.
Additional configuration¶
See the docker run instructions to run as a daemon (-d) and with a fresh container each time (--rm), depending on your use case.
Additional options include setting memory with --memory=4g (required for running production on a Mac), running as a daemon with -d, or enabling server debugging with -e "WEBAPOLLO_DEBUG=true". For example (after creating the local apollo_shared_dir):
docker run --memory=4g -d -it -p 8888:8080 -v `pwd`/apollo_shared_dir/:`pwd`/apollo_shared_dir/ -e "WEBAPOLLO_DEBUG=true" -v /postgres/data/directory:/var/lib/postgresql gmod/apollo:latest
You can configure additional options in the Docker image's apollo-config.groovy by setting environment variables, passed through via multiple -e parameters.
For example:
- Change the root path of the URL (e.g., http://localhost:8888/otherpath) by adding the argument -e APOLLO_PATH=otherpath when running.
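When scripting deployments, it can help to assemble the docker run arguments programmatically. The Python sketch below is a hypothetical helper (`docker_run_args` is not part of Apollo or Docker) that places every option before the image name, since anything after the image is passed to the container rather than to docker. It only builds the argument list; shlex.join requires Python 3.8 or later.

```python
import shlex


def docker_run_args(image, ports=None, volumes=None, env=None,
                    detach=False, memory=None):
    """Build a `docker run` argument list with all options before the image."""
    args = ["docker", "run", "-it"]
    if detach:
        args.append("-d")
    if memory:
        args.append(f"--memory={memory}")
    for host, container in (ports or {}).items():
        args += ["-p", f"{host}:{container}"]  # host port first
    for host, container in (volumes or {}).items():
        args += ["-v", f"{host}:{container}"]  # host path first
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]  # e.g. APOLLO_PATH, WEBAPOLLO_DEBUG
    # Arguments after the image name go to the container, not to docker.
    args.append(image)
    return args


if __name__ == "__main__":
    args = docker_run_args(
        "gmod/apollo:latest",
        ports={8888: 8080},
        env={"APOLLO_PATH": "otherpath"},
    )
    print(shlex.join(args))
```

The resulting list could be passed to subprocess.run, or printed as above and pasted into a shell.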
NOTE: If you don’t use a locally mounted PostgreSQL database (e.g., creating an empty directory and mounting it using -v /absolute/path/to/postgres-data:/var/lib/postgresql) or set appropriate environment variables for a remote database (see variables defined here), your annotations and setup may not be persisted.
Notes on releases and availability¶
The image is available on Docker Hub.
On Docker Hub you always pull from gmod/apollo:release-<version>, where version is something like 2.6.2.
Versions¶
On Docker Hub, stable is the master branch or the latest stable release (e.g., 2.6.0), latest is the latest check-in, which has not necessarily been thoroughly tested, and release-X.Y.Z represents the release of the tag X.Y.Z (e.g., release-2.6.0).
quay.io mirrors tags and all branches directly, so master is master and X.Y.Z is the same. So quay.io/<user or orgname>/apollo:2.6.0 is the same as <user or org name>/apollo:release-2.6.0, but you’ll have to fork your own version.
See what is available for Docker Hub builds.
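The two naming schemes differ only by the release- prefix. As a small illustration, the Python sketch below maps a version to each tag form; docker_hub_tag and quay_tag are hypothetical helpers, and "myorg" in the usage stands in for your own quay.io user or organization.

```python
import re

VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def docker_hub_tag(version: str, repo: str = "gmod/apollo") -> str:
    """Docker Hub release tags are prefixed with 'release-'."""
    if not VERSION_RE.match(version):
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    return f"{repo}:release-{version}"


def quay_tag(version: str, org: str) -> str:
    """quay.io mirrors git tags directly, without a prefix."""
    if not VERSION_RE.match(version):
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    return f"quay.io/{org}/apollo:{version}"


if __name__ == "__main__":
    print(docker_hub_tag("2.6.0"))     # gmod/apollo:release-2.6.0
    print(quay_tag("2.6.0", "myorg"))  # quay.io/myorg/apollo:2.6.0
```

Floating tags such as stable and latest are deliberately rejected here, since they do not follow the X.Y.Z pattern.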
Logging In¶
The default credentials in this image are:
| Credentials | |
| --- | --- |
| Username | admin@local.host |
| Password | password |
Example Workflow¶
- Make the following directories somewhere with write permissions: postgres-data and jbrowse-data.
- Copy your JBrowse data into jbrowse-data. We provide working sample data.
- Run the docker command:
docker run -it -v /absolute/path/to/jbrowse-data:/data -v /absolute/path/to/postgres-data:/var/lib/postgresql -p 8888:8080 gmod/apollo:latest
- Log in to the server at http://localhost:8888/
- Add an organism per the instructions under Figure 2. Using yeast as an example, if you copy the data into jbrowse-data/yeast, then on the server you’ll add the directory /data/yeast.
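The directory you enter on the server is the path inside the container, not on the host. The Python sketch below shows that translation for a bind mount; container_path is a hypothetical helper that assumes the JBrowse directory is mounted at /data as in the command above.

```python
from pathlib import PurePosixPath


def container_path(host_path: str, host_mount: str,
                   container_mount: str = "/data") -> str:
    """Translate a host path under a bind mount to the path seen in the container.

    Raises ValueError if host_path is not inside host_mount.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_mount))
    return str(PurePosixPath(container_mount) / relative)


if __name__ == "__main__":
    print(container_path("/absolute/path/to/jbrowse-data/yeast",
                         "/absolute/path/to/jbrowse-data"))  # /data/yeast
```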
Running your own preloaded data in a fork¶
Here is an example of running pre-loaded data from Apollo: https://github.com/alliance-genome/agr_apollo_container
Note that important changes here are in:
To create this, we loaded the original data, configured it as we wanted, and dumped out the SQL file.
Apollo Configuration¶
Apollo includes some basic configuration parameters that are specified in configuration files. The most important parameters are the database parameters in order to get Apollo up and running. Other options besides the database parameters can be configured via the config files, but note that many parameters can also be configured via the web interface.
Note: Configuration options may change over time, as more configuration items are integrated into the web interface.
Main configuration¶
The main configuration settings for Apollo are stored in grails-app/conf/Config.groovy, but you can override settings in your apollo-config.groovy file (i.e., the same file that contains your database parameters). Here are the defaults that are defined in the Config.groovy file:
// default apollo settings
apollo {
gff3.source = "." // also for GPAD
// other translation codes are of the form ncbi_KEY_translation_table.txt
// under the web-app/translation_tables directory
// to add your own add them to that directory and over-ride the translation code here
get_translation_code = 1
proxies = [
[
referenceUrl : 'http://golr.geneontology.org/select',
targetUrl : 'http://golr.geneontology.org/solr/select',
active : true,
fallbackOrder: 0,
replace : true
]
,
[
referenceUrl : 'http://golr.geneontology.org/select',
targetUrl : 'http://golr.berkeleybop.org/solr/select',
active : false,
fallbackOrder: 1,
replace : false
]
]
fa_to_twobit_exe = "/usr/local/bin/faToTwoBit" // get from https://genome.ucsc.edu/goldenPath/help/blatSpec.html
sequence_search_tools = [
blat_nuc : [
search_exe : "/usr/local/bin/blat",
search_class: "org.bbop.apollo.sequence.search.blat.BlatCommandLineNucleotideToNucleotide",
name : "Blat nucleotide",
params : ""
],
blat_prot: [
search_exe : "/usr/local/bin/blat",
search_class: "org.bbop.apollo.sequence.search.blat.BlatCommandLineProteinToNucleotide",
name : "Blat protein",
params : ""
//tmp_dir: "/opt/apollo/tmp" optional param
]
]
...
}
These settings are essentially the same familiar parameters from the config.xml file of previous Apollo versions. The defaults are generally sufficient, but as noted above, you can override any particular parameter in your apollo-config.groovy file, e.g.:
grails {
apollo.get_translation_code = 1
apollo {
use_cds_for_new_transcripts = true
default_minimum_intron_size = 1
get_translation_code = 1 // identical to the dot notation
}
}
Suppress calculation of non-canonical splice sites¶
By default, Apollo calculates non-canonical splice sites. For some organisms this is undesirable; to turn it off, set:
apollo.calculate_non_canonical_splice_sites = false
Count annotations¶
By default, annotations are counted, but this can become prohibitively slow when there are many annotations. To turn counting off, set count_annotations to false in the apollo-config.groovy file, as below:
grails {
apollo.count_annotations = false
apollo {
count_annotations = false
}
}
Suppress add merged comments¶
By default, when you merge two isoforms, Apollo automatically creates a comment recording the name and unique ID of the consumed isoform. To suppress this comment, set add_merged_comment to false:
grails {
apollo.add_merged_comment = false
apollo {
add_merged_comment = false
}
}
JBrowse Plugins and Configuration¶
You can configure the JBrowse installed with Apollo by modifying the jbrowse section of your apollo-config.groovy, which overrides the JBrowse configuration file.
There are two sections: plugins, and git, which specifies the JBrowse version.
git {
url = "https://github.com/gmod/jbrowse"
branch = "1.16.11-release"
}
In a git block, a tag or branch can be specified.
In the plugins section, each plugin can be specified as included (part of the JBrowse release), url (requiring a url parameter), or git, which can include a tag or branch as above.
The alwaysRecheck and alwaysRepull options always re-check the branch and tag, and always re-pull, respectively.
See the sample-*.groovy files for example sections: https://github.com/GMOD/Apollo/blob/develop/sample-h2-apollo-config.groovy#L112-L146
Translation tables¶
The default translation table is 1.
To use a different table from this list of NCBI translation tables, set the number in the apollo-config.groovy file as:
apollo {
...
get_translation_code = "11"
}
You may also add a custom translation table in the web-app/translation_tables directory as follows:
web-app/translation_tables/ncbi_customname_translation_table.txt
Specify the customname in apollo-config.groovy as follows:
apollo {
...
get_translation_code = "customname"
}
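The value of get_translation_code is the KEY in the file name pattern ncbi_KEY_translation_table.txt noted in Config.groovy. The Python sketch below shows that mapping and lists the keys present in a directory; translation_table_file and available_codes are hypothetical helpers, and the directory default assumes you run it from the Apollo source root.

```python
import re
from pathlib import Path

TABLE_RE = re.compile(r"^ncbi_(.+)_translation_table\.txt$")


def translation_table_file(code, tables_dir="web-app/translation_tables"):
    """Path of the table file for a translation code (e.g. 11 or 'customname')."""
    return Path(tables_dir) / f"ncbi_{code}_translation_table.txt"


def available_codes(tables_dir="web-app/translation_tables"):
    """Return the sorted keys of all ncbi_*_translation_table.txt files."""
    codes = []
    for entry in Path(tables_dir).iterdir():
        match = TABLE_RE.match(entry.name)
        if match:
            codes.append(match.group(1))
    return sorted(codes)
```

A key returned by available_codes is a candidate value for get_translation_code.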
As well, translation tables can be set per organism using the ‘Details’ panel located in the ‘Organism’ tab of the Annotator panel in the Apollo window: to replace the translation table (default or set by admin) for any given organism, use the field labeled as ‘Non-default Translation Table’ to enter a different table identifier as needed.
Configuring Transcript Overlapper¶
By default, Apollo uses a CDS overlapper, which treats two overlapping transcripts as isoforms of each other if and only if they share the same in-frame CDS.
You can also configure Apollo to use an exon overlapper, which treats two overlapping transcripts as isoforms of each other if one or more of their exons overlap and they share the same splice acceptor and splice donor sites.
apollo {
transcript_overlapper = "exon"
}
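To make the default CDS rule more concrete, here is a deliberately simplified Python sketch of one reading of "share the same in-frame CDS": two contiguous, plus-strand CDS intervals that overlap with their codons in the same frame. It is not Apollo's implementation; Interval and cds_in_frame_overlap are hypothetical, and spliced (multi-exon) CDSs, the minus strand, and the exon overlapper are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def cds_in_frame_overlap(cds_a: Interval, cds_b: Interval) -> bool:
    """Simplified CDS rule: contiguous plus-strand CDSs that overlap and
    whose codons line up (start positions differ by a multiple of 3)."""
    return overlaps(cds_a, cds_b) and (cds_a.start - cds_b.start) % 3 == 0


if __name__ == "__main__":
    a = Interval(100, 400)
    print(cds_in_frame_overlap(a, Interval(103, 250)))  # True: same frame
    print(cds_in_frame_overlap(a, Interval(101, 250)))  # False: shifted frame
```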
Logging configuration¶
To override the default logging, look at the logging configuration in Config.groovy and override or modify it in apollo-config.groovy.
log4j.main = {
error 'org.codehaus.groovy.grails.web.servlet', // controllers
'org.codehaus.groovy.grails.web.pages', // GSP
'org.codehaus.groovy.grails.web.sitemesh', // layouts
...
warn 'grails.app'
}
To add debug-level logging, replace warn 'grails.app' with two lines: debug 'grails.app' and debug 'org.bbop.apollo'. To see database-level logging, also add trace 'org.hibernate.type' and debug 'org.hibernate.SQL'.
Additional links for log4j:
- Advanced log4j configuration: http://blog.andresteingress.com/2012/03/22/grails-adding-more-than-one-log4j-configurations/
- Grails log4j guide: http://grails.github.io/grails-doc/2.4.x/guide/single.html#logging
Add attribute for the original id of the object¶
In the apollo configuration, store_orig_id is set to true by default. This stores an orig_id attribute on the top-level feature that represents the original ID from the genomic evidence, which is useful for re-merging code, because Apollo generates its own IDs for annotations based on multiple evidence sources. To turn this off, override it with store_orig_id = false.
Canned Elements¶
Canned comments, canned keys (tags), and canned values are configured using the Admin tab from the Annotator Panel on the web interface; these can no longer be created or edited using the configuration files. For more details on how to create and edit Canned Elements see Canned Elements.
See your instance’s pages for more details, for example:
- http://localhost:8080/apollo/cannedComment/
- http://localhost:8080/apollo/cannedKey/
- http://localhost:8080/apollo/cannedValue/
Search tools¶
Apollo can be configured to work with various sequence search tools. UCSC’s BLAT tool is configured by default, and you can customize it by making modifications in the apollo-config.groovy file. Here we replace BLAT with BLAST (there is an existing wrapper for BLAST). The database for each organism will be passed in via params (globally) or using the Blat database field in the organism tab. For BLAST, the database will be the root name of the BLAST database files without the suffix. Retrieve BLAT binaries from UCSC.
apollo{
fa_to_twobit_exe = "/usr/local/bin/faToTwoBit" // get from https://genome.ucsc.edu/goldenPath/help/blatSpec.html
sequence_search_tools {
blat_nuc {
search_exe = "/usr/local/bin/blastn"
search_class = "org.bbop.apollo.sequence.search.blast.BlastCommandLine"
name = "Blast nucleotide"
params = ""
}
blat_prot {
search_exe = "/usr/local/bin/tblastn"
search_class = "org.bbop.apollo.sequence.search.blast.BlastCommandLine"
name = "Blast protein to translated nucleotide"
params = ""
// tmp_dir = "/opt/apollo/tmp" // optional param
}
your_custom_search_tool {
search_exe = "/usr/local/customtool"
search_class = "org.your.custom.Class"
name = "Custom search"
}
}
}
When you set up your organism in the web interface, you can then enter the location of the sequence search database for BLAT.
If you set fa_to_twobit_exe to the proper path, FASTA uploads for new genomes will automatically be indexed and populated.
Note: If the BLAT binaries reside elsewhere on your system, edit the search_exe location in the config to point to your BLAT executable.
Data adapters¶
Data adapters for Apollo provide the methods for exporting annotation data from the application. By default, GFF3 and FASTA adapters are supplied. They are configured to query your IOService URL (e.g., http://localhost:8080/apollo/IOService) with the customizable query:
data_adapters = [[
permission: 1,
key: "GFF3",
data_adapters: [[
permission: 1,
key: "Only GFF3",
options: "output=file&format=gzip&type=GFF3&exportGff3Fasta=false"
],
[
permission: 1,
key: "GFF3 with FASTA",
options: "output=file&format=gzip&type=GFF3&exportGff3Fasta=true"
]]
],
[
permission: 1,
key : "FASTA",
data_adapters :[[
permission : 1,
key : "peptide",
options : "output=file&format=gzip&type=FASTA&seqType=peptide"
],
[
permission : 1,
key : "cDNA",
options : "output=file&format=gzip&type=FASTA&seqType=cdna"
],
[
permission : 1,
key : "CDS",
options : "output=file&format=gzip&type=FASTA&seqType=cds"
]]
]]
Default data adapter options¶
The options available for the data adapters are configured as follows:
- type: GFF3 or FASTA
- output: can be file or text. file exports to a file and provides a UUID link for downloads; text just outputs to the stream.
- format: can be gzip or plain. gzip offers gzip compression of the exports, which is the default.
- exportSequence: true or false, which is used to include FASTA sequence at the bottom of a GFF3 export
- seqType: peptide, cdna, or cds (FASTA exports, as in the samples above)
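Each options value is an ordinary URL query string. The Python sketch below assembles one with urllib.parse and validates the enumerated values listed above; adapter_options is a hypothetical helper that only reproduces strings of the same shape as the sample configuration, and it does not call the IOService.

```python
from urllib.parse import parse_qs, urlencode

ALLOWED = {
    "type": {"GFF3", "FASTA"},
    "output": {"file", "text"},
    "format": {"gzip", "plain"},
}


def adapter_options(type_, output="file", format_="gzip", **extra):
    """Build a data-adapter `options` string like those in the sample config."""
    params = {"output": output, "format": format_, "type": type_}
    for key, allowed in ALLOWED.items():
        if params[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    for key, value in extra.items():
        # Booleans are written in lower case, as in exportGff3Fasta=false.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(params)


if __name__ == "__main__":
    print(adapter_options("GFF3", exportGff3Fasta=False))
    # output=file&format=gzip&type=GFF3&exportGff3Fasta=false
    print(parse_qs(adapter_options("FASTA", seqType="peptide"))["seqType"])
    # ['peptide']
```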
Supported annotation types¶
Many configurations will require you to define which annotation types the configuration will apply to. Apollo supports the following “higher level” types (from the Sequence Ontology):
- sequence:gene
- sequence:pseudogene
- sequence:transcript
- sequence:mRNA
- sequence:tRNA
- sequence:snRNA
- sequence:snoRNA
- sequence:ncRNA
- sequence:rRNA
- sequence:miRNA
- sequence:repeat_region
- sequence:transposable_element
Modify CORS¶
Apollo uses the grails-cors plugin. To configure it or turn it off, override these options:
cors.url.pattern = '*'
cors.enable.logging = true
cors.enabled = true
cors.headers = ['Access-Control-Allow-Origin': '*']
Set the default biotype for dragging up evidence¶
By default, dragged-up evidence is treated as mRNA. However, you can specify a default biotype for a track within trackList.json.
For example, specifying ncRNA as the default type:
{
"key" : "Official Gene Set v3.2 Canvas",
"storeClass" : "JBrowse/Store/SeqFeature/NCList",
"urlTemplate" : "tracks/Official Gene Set v3.2/{refseq}/trackData.json",
"default_biotype" : "ncRNA"
}
If you specify auto instead, Apollo will try to infer the biotype from each feature’s type.
The non-transcript types repeat_region and transposable_element are also supported.
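For many organisms it may be easier to script this edit than to do it by hand. The Python sketch below is a hypothetical helper that assumes a JBrowse 1 trackList.json with a top-level "tracks" list and matches tracks by their "key", as in the example above; back up the file first, since it is rewritten in place.

```python
import json
from pathlib import Path


def set_default_biotype(track_list_path, track_key, biotype):
    """Set `default_biotype` on every track whose "key" equals track_key.

    Returns the number of tracks updated; the file is only rewritten
    when at least one track matched.
    """
    path = Path(track_list_path)
    data = json.loads(path.read_text())
    updated = 0
    for track in data.get("tracks", []):
        if track.get("key") == track_key:
            track["default_biotype"] = biotype
            updated += 1
    if updated:
        path.write_text(json.dumps(data, indent=2))
    return updated
```

For example, set_default_biotype("trackList.json", "Official Gene Set v3.2 Canvas", "ncRNA") would add the setting shown in the JSON above.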
Apache / Nginx configuration¶
Oftentimes, admins will use Apache or Nginx as a reverse proxy so that requests to a main server can be forwarded to the Tomcat server. This setup is not necessary, but it is a very standard configuration, as is modifying iptables.
Note that we use the SockJS library, which will downgrade to long-polling if websockets are not available, but since websockets are preferable, it helps to take some extra steps to ensure that the websocket calls are proxied or forwarded in some way too. Using Tomcat 8 or above is recommended.
If using a separate Oauth2 provider, a more detailed example of handling both the proxy and the authentication with OpenID Connect has also been provided.
Installing secure certificates.¶
Free certificates can be obtained using certbot.
Follow the instructions to install your appropriate certificate if users are going to potentially be sending passwords across.
Apache Proxy¶
Here is the most basic configuration for a reverse proxy with Apache 2.4 (will probably work for 2.2 as well).
Enable proxy_pass and proxy_wstunnel:
sudo a2enmod proxy proxy_wstunnel proxy_connect proxy_http
sudo service apache2 restart
In the apache conf directory edit proxy.conf
<Proxy *>
# if using Apache 2.2 use Order, Allow directives
Order Deny,Allow
Allow from all
# if using Apache 2.4 use Require directive
Require all granted
</Proxy>
ProxyPass /apollo/stomp/info http://localhost:8080/apollo/stomp/info
ProxyPassReverse /apollo/stomp/info http://localhost:8080/apollo/stomp/info
ProxyPass /apollo/stomp ws://localhost:8080/apollo/stomp
ProxyPassReverse /apollo/stomp ws://localhost:8080/apollo/stomp
ProxyPass /apollo http://localhost:8080/apollo
ProxyPassReverse /apollo http://localhost:8080/apollo
If Tomcat is running SSL¶
If the secure certificate is on Apollo (i.e., Tomcat is serving SSL) and you’re running via Apache, use the https and wss protocols instead, or just change the Tomcat server port explicitly:
ProxyPass /apollo/stomp/info https://localhost:8443/apollo/stomp/info
ProxyPassReverse /apollo/stomp/info https://localhost:8443/apollo/stomp/info
ProxyPass /apollo/stomp wss://localhost:8443/apollo/stomp
ProxyPassReverse /apollo/stomp wss://localhost:8443/apollo/stomp
ProxyPass /apollo https://localhost:8443/apollo
ProxyPassReverse /apollo https://localhost:8443/apollo
Note: a reverse proxy does not use ProxyRequests On (which turns on forward proxying, which is dangerous).
Also note: This setup will downgrade (but will still function) to use AJAX long-polling without the websocket proxy being configured.
Debugging proxy issues¶
Note: if your webapp is accessible but it doesn’t seem like you can log in, you may need to customize the ProxyPassReverseCookiePath.
For example, if you proxied to a different path, you might have something like this:
ProxyPass /testing http://localhost:8080
ProxyPassReverse /testing http://localhost:8080
ProxyPassReverseCookiePath / /testing
Then your application might be accessible from http://localhost/testing/apollo
Nginx Proxy (from version 1.4 on)¶
Your setup may vary, but setting the upgrade headers can be used for the websocket configuration http://nginx.org/en/docs/http/websocket.html
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
server {
# Main
listen 80; server_name myserver;
# http://nginx.org/en/docs/http/websocket.html
location /ApolloSever {
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_pass http://127.0.0.1:8080;
}
}
Adding extra tabs¶
Extra tabs can be added to the side panel by over-riding the apollo configuration extraTabs:
extraTabs = [
['title': 'extra1', 'url': 'http://localhost:8080/apollo/annotator/report/'],
['title': 'extra2', 'content': '<b>Apollo</b> documentation <a href="https://genomearchitect.readthedocs.io/" target="_blank">linked here</a>']
]
Upgrading existing instances¶
There are several scripts for migrating from older instances. See the migration guide for details. Particular notes:
Note: Apollo does not require using add-webapollo-plugin.pl, because the plugin is loaded implicitly by including the client/apollo/json/annot.json file at run time.
Upgrading existing JBrowse data stores¶
It is not necessary to change your existing JBrowse data directories to use Apollo 2.x; you can just point to existing data directories from your previous instances.
More information about JBrowse can also be found in their FAQ.
Adding custom CSS for track styling for JBrowse¶
There are a variety of ways to include new CSS in the browser, but the easiest might be the following:
Add the following statement to your trackList.json:
"css" : "data/yourfile.css"
Then just place your CSS file in your organism’s data directory.
Adding custom CSS globally for JBrowse¶
If you want to add CSS that is used globally for JBrowse, you can edit the CSS in the client/apollo/css folder, but since you need to re-deploy the app every time for updates, it is easier to just edit the data directories for your organisms (you do not need to re-deploy the app when you are editing organism specific data, since this is outside of the webapp directory and is not deployed with the WAR file)
Adding custom CSS globally for the GWT app¶
If you want to style the GWT sidebar, generally the bootstrap theme is used but extra CSS is also included from web-app/annotator/theme.css which overrides the bootstrap theme
Adding / using proxies¶
If you are using HTTPS, or choose to use separate services rather than the default ones provided, you can set up a pass-through proxy or modify a particular URL.
This service is only available to logged-in users.
The internal proxy URL is:
<apollo url>/proxy/request/<encoded_proxy_url>/
For example, if the URL we want to proxy is:
http://golr.geneontology.org/solr/select
then its URL-encoded form is:
http%3A%2F%2Fgolr.geneontology.org%2Fsolr%2Fselect
If your user is logged in and you pass in:
http://localhost/apollo/proxy/request/http%3A%2F%2Fgolr.geneontology.org%2Fsolr%2Fselect?testkey=asdf&anotherkey=zxcv
This will get proxied to:
http://golr.geneontology.org/solr/select?testkey=asdf&anotherkey=zxcv
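The encoded form must escape ":" and "/" as well, which urllib.parse.quote does when safe="" is passed. The Python sketch below builds the proxied request URL from the example above; proxy_request_url is a hypothetical helper, and it omits the trailing slash shown in the URL template to match the worked example.

```python
from urllib.parse import quote, urlencode


def proxy_request_url(apollo_url, target_url, params=None):
    """Wrap target_url in Apollo's /proxy/request/<encoded_proxy_url> path."""
    encoded = quote(target_url, safe="")  # also encodes ':' and '/'
    url = f"{apollo_url.rstrip('/')}/proxy/request/{encoded}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(proxy_request_url(
        "http://localhost/apollo",
        "http://golr.geneontology.org/solr/select",
        {"testkey": "asdf", "anotherkey": "zxcv"},
    ))
```

Remember that the proxy only answers requests from logged-in users, so a client also needs a valid session.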
If you choose to use another proxy service, you can go to the “Proxy” page (as administrator). Internally used proxies are provided by default. The final URL is chosen first by ‘active’ and then by ‘fallbackOrder’.
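One way to read this selection rule: among proxies with a matching referenceUrl, keep the active ones and take the lowest fallbackOrder. The Python sketch below applies that reading to the two default proxies from Config.groovy. choose_target is a hypothetical helper, the replace flag is ignored, and Apollo's actual selection logic may differ.

```python
# The two default proxies from Config.groovy.
PROXIES = [
    {"referenceUrl": "http://golr.geneontology.org/select",
     "targetUrl": "http://golr.geneontology.org/solr/select",
     "active": True, "fallbackOrder": 0, "replace": True},
    {"referenceUrl": "http://golr.geneontology.org/select",
     "targetUrl": "http://golr.berkeleybop.org/solr/select",
     "active": False, "fallbackOrder": 1, "replace": False},
]


def choose_target(proxies, reference_url):
    """Return the targetUrl of the active proxy with the lowest fallbackOrder
    for reference_url, or None if no active proxy matches."""
    candidates = [
        p for p in proxies
        if p["referenceUrl"] == reference_url and p["active"]
    ]
    candidates.sort(key=lambda p: p["fallbackOrder"])
    return candidates[0]["targetUrl"] if candidates else None


if __name__ == "__main__":
    print(choose_target(PROXIES, "http://golr.geneontology.org/select"))
```

Under this reading, deactivating the first proxy and activating the second would switch requests to the berkeleybop mirror.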
Register admin in configuration¶
If you want to register your admin user in the configuration, you can add a section to your apollo-config.groovy
like:
apollo{
// other stuff
admin{
username = "super@duperadmin.com"
password = System.getenv("APOLLO_ADMIN_PASSWORD")?:"demo"
firstName = "Super"
lastName = "Admin"
}
}
It should only add the user a single time. User details can be retrieved from passed in text or from the environment depending on user preference.
Admin users will be added on system startup. Duplicate additions will be ignored.
Other authentication strategies¶
By default Apollo uses a username / password to authenticate users. However, additional strategies may be used.
To configure them, add them to the apollo-config.groovy
and set active to true for the ones you want to use to
authenticate.
apollo {
    // other stuff
    authentications = [
        [
            "name"     : "Username Password Authenticator",
            "className": "usernamePasswordAuthenticatorService",
            "active"   : true,
        ],
        [
            "name"     : "Remote User Authenticator",
            "className": "remoteUserAuthenticatorService",
            "active"   : false,
            "params"   : ["default_group": "annotators"]
        ]
    ]
}
The Username Password Authenticator is the default method, which stores usernames and passwords securely within the database.
The Remote User Authenticator method uses a separate Apache authorization proxy, which is used by the Galaxy community. Furthermore, users and groups can be inserted / updated via web services, which are wrapped by the Apollo python library.
The default_group parameter adds a user to a default group on login so that the user has access to at least some genomes.
A more detailed guide using OpenIDConnect authorization explains usage of both the proxy and an authentication strategy.
URL modifications¶
You should be able to pass most JBrowse URL modifications to the loadLink URL.
Use tracklist=1 to force showing the native tracklist (or use the checkbox in the Track Tab in the Annotator Panel).
Use openAnnotatorPanel=0 to close the Annotator Panel explicitly on startup.
Linking to annotations¶
You can find a link to your current location by clicking the “chain link” icon in the upper, left-hand corner of the Annotator Panel.
It opens a popup that gives you a public URL to use while not logged in and one to use while logged in.
Public URL¶
- location: <sequence name>:<start>..<end>
- organism: the organism id, or the common name if it is unique
- tracks: URL-encoded track names separated by a comma
Example: http://demo.genomearchitect.io/Apollo2/3836/jbrowse/index.html?loc=Group1.31:287765..336436&tracks=Official%20Gene%20Set%20v3.2,GeneID
Logged in URL¶
- location: <sequence name>:<start>..<end>; it can also be the annotated feature name if an organism is provided, or the uniqueName (see the ID in the annotation detail window), which is typically a UUID and does not require an organism
- organism: the organism id, or the common name if it is unique
- tracks: URL-encoded track names separated by a comma
Examples:
- http://demo.genomearchitect.io/Apollo2/annotator/loadLink?loc=Group1.31:287765..336436&organism=3836&tracks=Official%20Gene%20Set%20v3.2,GeneID
- http://demo.genomearchitect.io/Apollo2/annotator/loadLink?loc=GB51936-RA&organism=3836&tracks=Official%20Gene%20Set%20v3.2,GeneID
- http://demo.genomearchitect.io/Apollo2/annotator/loadLink?uuid=355617c7-f8c1-4105-bb11-755cee1855df&tracks=Official%20Gene%20Set%20v3.2,GeneID
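If you generate these links from a script, the rules above (URL-encoded track names, comma separator, `loc` or `uuid`) can be captured in a small helper. This is a minimal sketch: `build_load_link` is hypothetical and not part of Apollo, and it leaves `:` and `.` unescaped in `loc` to match the examples.

```python
from urllib.parse import quote


def build_load_link(apollo_url, loc=None, organism=None, uuid=None, tracks=()):
    """Build a logged-in loadLink URL (hypothetical helper, not part of Apollo).

    Pass either `loc` (a "<sequence>:<start>..<end>" range, or a feature name
    together with `organism`) or `uuid` (a feature's uniqueName).
    """
    if uuid is not None:
        params = [("uuid", quote(uuid, safe=""))]
    elif loc is not None:
        # Keep ":" and "." unescaped so the range stays readable, as in the examples.
        params = [("loc", quote(loc, safe=":."))]
        if organism is not None:
            params.append(("organism", quote(str(organism), safe="")))
    else:
        raise ValueError("either loc or uuid is required")
    if tracks:
        # Each track name is URL-encoded; the names are joined with a plain comma.
        params.append(("tracks", ",".join(quote(t, safe="") for t in tracks)))
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"{apollo_url.rstrip('/')}/annotator/loadLink?{query}"


print(build_load_link(
    "http://demo.genomearchitect.io/Apollo2",
    loc="Group1.31:287765..336436",
    organism=3836,
    tracks=["Official Gene Set v3.2", "GeneID"],
))
```

With these arguments the helper reproduces the first example URL above.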
Setting default track list behavior¶
By default the native tracklist is off, but it can be turned on. If you want it to be on by default for new users, add this to the apollo-config.groovy:
apollo {
    native_track_selector_default_on = true
}
Set Common Data Directory in the config¶
The common_data_directory is where uploaded and processed JBrowse tracks are stored.
This should be server-writable space on your system that is not deleted (note that /tmp is cleaned periodically on most Unix systems).
common_data_directory = "/opt/temporary/apollo"
If you don’t plan to use these features, then /tmp might be fine.
In general, if no directory is otherwise specified, Apollo will create one for you at $HOME/apollo_data, or will allow you to set one from the command line.
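Before pointing common_data_directory at a location, you may want to check the two properties described above: writable, and not under /tmp. This is a minimal sketch and not something Apollo ships. `check_common_data_directory` is a hypothetical name, and `os.access` checks permissions for whoever runs the script, so run it as the servlet container's user (e.g. tomcat).

```python
from __future__ import annotations

import os
from pathlib import Path


def check_common_data_directory(path: str) -> list[str]:
    """Return problems with a candidate common_data_directory (hypothetical helper)."""
    problems = []
    candidate = Path(path).expanduser().resolve()
    tmp = Path("/tmp").resolve()
    # /tmp is cleaned periodically on most Unix systems.
    if candidate == tmp or tmp in candidate.parents:
        problems.append("is under /tmp, which many Unix systems clean periodically")
    if not candidate.is_dir():
        problems.append("does not exist")
    elif not os.access(candidate, os.W_OK):
        # Checked for the current user only.
        problems.append("is not writable by the current user")
    return problems


for problem in check_common_data_directory("/opt/temporary/apollo"):
    print("common_data_directory", problem)
```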
Adding tracks via addStores¶
The JBrowse Configuration Guide describes in detail how to add tracks to JBrowse using addStores. This configuration relies on sending track config JSON through the URL, which can be problematic, especially with newer versions of Tomcat.
Instead we recommend using the dot notation to add track configuration through the URL.
Thus,
addStores={"uniqueStoreName":{"type":"JBrowse/Store/SeqFeature/GFF3","urlTemplate":"url/of/my/file.gff3"}}
becomes,
addStores.uniqueStoreName.type=JBrowse/Store/SeqFeature/GFF3&addStores.uniqueStoreName.urlTemplate=url/of/my/file.gff3
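The conversion above is mechanical: every nested key becomes one more dot-separated segment. This is a minimal sketch of that flattening, assuming every leaf value is a string as in the example. `to_dot_notation` is a hypothetical helper, and values are not URL-encoded here, so encode them separately if they contain `&`, `=` or spaces.

```python
import json


def to_dot_notation(prefix: str, config: dict) -> str:
    """Flatten a nested addStores-style config into dot-notation URL parameters.

    {"a": {"b": "c"}} with prefix "addStores" becomes "addStores.a.b=c".
    Assumes string leaf values; they are emitted without URL-encoding.
    """
    pairs = []

    def walk(path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{path}.{key}", child)
        else:
            pairs.append(f"{path}={value}")

    walk(prefix, config)
    return "&".join(pairs)


stores = json.loads(
    '{"uniqueStoreName":{"type":"JBrowse/Store/SeqFeature/GFF3",'
    '"urlTemplate":"url/of/my/file.gff3"}}'
)
print(to_dot_notation("addStores", stores))
```

For the JSON shown above, this prints the dot-notation string given in the documentation.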
Following are a few recommendations for adding tracks via dot notation in Apollo:
- avoid {dataRoot} in your urlTemplate
- avoid specifying the data folder name in your urlTemplate
- avoid specifying baseUrl
Since Apollo is aware of the organism data folder, specifying it explicitly in the urlTemplate can cause issues with URL redirects.
Setting Track Style by type¶
For the default track type (FeatureTrack), to set the feature style by type (for example, if you have multiple feature types on a single track and want to distinguish them), set the track className to {type} in the style section of that track's entry in the trackList.json file:
"style": {
    "className": "{type}"
},
You then have to specify a custom CSS file for that type in the trackList.json:
"css": "data/custom.css"
That file has to be placed at the same level (in the same directory) as trackList.json.
An example CSS entry to specify the style for the feature type lnc_RNA might be:
.minus-lnc_RNA .neat-UTR,
.plus-lnc_RNA .neat-UTR,
.lnc_RNA .neat-UTR {
    height: 12px;
    margin-top: 2px;
    color: rgb(200,2,3);
    background-color: rgb(5,4,255) !important;
}
For Canvas and HTML track configuration options, please see the JBrowse documentation for additional details.
Hiding JBrowse tracks from the public¶
To hide a track of a public organism from the public, add the apollo.permission.level setting, with the value private, to your JBrowse track configuration:
{
    "compress" : 0,
    "key" : "GeneData_hidden",
    "label" : "GeneData_hidden",
    "storeClass" : "JBrowse/Store/SeqFeature/NCList",
    ...
    "apollo" : {
        "permission" : {
            "level" : "private"
        }
    },
    ...
    "trackType" : null,
    "type" : "FeatureTrack",
    "urlTemplate" : "tracks/GeneData/{refseq}/trackData.json"
},
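If you manage many tracks, the same setting can be applied programmatically to a parsed trackList.json. This is a minimal sketch that assumes the usual JBrowse layout, a top-level "tracks" array whose entries are matched by "label". `make_track_private` is a hypothetical helper.

```python
import json


def make_track_private(track_list: dict, label: str) -> bool:
    """Add "apollo": {"permission": {"level": "private"}} to the track with `label`.

    `track_list` is the parsed trackList.json. Returns True if a track matched.
    """
    for track in track_list.get("tracks", []):
        if track.get("label") == label:
            apollo = track.setdefault("apollo", {})
            apollo.setdefault("permission", {})["level"] = "private"
            return True
    return False


track_list = {"tracks": [{"label": "GeneData_hidden", "type": "FeatureTrack"}]}
make_track_private(track_list, "GeneData_hidden")
print(json.dumps(track_list["tracks"][0]["apollo"]))
```

To edit a real file, read it with `json.load`, call the helper, and write it back with `json.dump(track_list, fh, indent=2)`.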
Only owners can edit¶
To restrict deletion and reverting to the original editor or an admin user, set:
apollo.only_owners_delete = true
Chado Export Configuration¶
Following are the steps for setting up a Chado data source that is compatible with Apollo Chado Export.
Create a Chado database¶
First create a database in PostgreSQL for Chado.
Note: Initial testing has only been done on PostgreSQL.
The default names are apollo-chado and apollo-production-chado for the development and production environments, respectively.
Create a Chado user¶
Now, create a database user that has all access privileges to the newly created Chado database.
Load Chado schema and ontologies¶
Apollo assumes that the Chado database has Chado schema v1.2 or greater and has the following ontologies loaded:
- Relations Ontology
- Sequence Ontology
- Gene Ontology
The quickest and easiest way to do this is to use prebuilt Chado schemas. Apollo provides a prebuilt Chado schema with the necessary ontologies. (thanks to Eric Rasche at Center for Phage Technology, TAMU)
Users can load this prebuilt Chado schema as follows:
scripts/load_chado_schema.sh -u <USER> -d <CHADO_DATABASE> -h <HOST> -p <PORT> -s <CHADO_SCHEMA_SQL>
If there is already an existing database with the same name and if you would like to dump and create a clean database:
scripts/load_chado_schema.sh -u <USER> -d <CHADO_DATABASE> -h <HOST> -p <PORT> -s <CHADO_SCHEMA_SQL> -r
The ‘-r’ flag tells the script to perform a pg_dump if <CHADO_DATABASE> exists.
e.g.,
scripts/load_chado_schema.sh -u postgres -d apollo-chado -h localhost -p 5432 -r -s chado-schema-with-ontologies.sql.gz
The file chado-schema-with-ontologies.sql.gz can be found in the Apollo/scripts/ directory.
The load_chado_schema.sh script creates log files, which can be inspected to see if loading the schema was successful.
Note that you will need to do this for your testing and production instances as well.
Configure data sources¶
In apollo-config.groovy, uncomment the configuration for datasource_chado and specify the proper database name, database user name and database user password.
Export via UI¶
Users can export existing annotations to the Chado database via the Annotator Panel -> Ref Sequence -> Export.
Export via web services¶
Users can also leverage the Apollo web services API to export annotations to Chado.
As a demonstration, a sample script, export_annotations_to_chado.groovy, is provided.
Usage for the script:
export_annotations_to_chado.groovy -organism ORGANISM_COMMON_NAME -username APOLLO_USERNAME -password APOLLO_PASSWORD -url http://localhost:8080/apollo
Data generation pipeline¶
Using the methods below you can generate and update a trackList.json and then make any further manual changes required.
Canvas vs HTML in Apollo¶
Most of the JBrowse documentation for configuring tracks applies here. However, there are a few important points about Canvas vs HTML tracks.
It should be noted that if you need the benefits of both track types, you are free to duplicate the track and use an alternate styling or track type.
BigWig tracks are only shown as Canvas.
HTML Tracks¶
Pros: evidence can be created by dragging, and clicking on an annotation or evidence shows the alignment. CSS styling can be used. Cons: renders more slowly.
Note that in most cases regular HTML rendering will be preferable. Exceptions would be dense BAM alignments and dense Variant tracks where you are attempting to display the density at a higher resolution.
HTML track mapping with type=<?>:
- Annotation / GFF3: FeatureTrack, NeatHTMLFeatures/View/Track/NeatFeature, JBrowse/View/Track/HTMLFeatures, WebApollo/View/Track/DraggableNeatHTMLFeatures
- Alignment: JBrowse/View/Track/Alignments, WebApollo/View/DraggableAlignments
- Variant: JBrowse/View/Track/HTMLVariants, WebApollo/View/Track/WebApolloHTMLVariants
Canvas Tracks¶
Pros: renders faster; offers non-CSS style options. Cons: you cannot drag to create evidence, and clicking on an annotation or evidence does not show the alignment.
Canvas track mapping with type=<?>:
- Annotation / GFF3: NeatCanvasFeatures/View/Track/NeatFeature, JBrowse/View/Track/CanvasFeatures, WebApollo/View/Track/WebApolloNeatCanvasFeatures
- Alignment: JBrowse/View/Track/Alignments2, WebApollo/View/Track/WebApolloAlignments2
- Variant: JBrowse/View/Track/CanvasVariants, WebApollo/View/Track/WebApolloCanvasVariants
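To audit an existing trackList.json, the HTML and Canvas mappings can be turned into a lookup table. This is a minimal sketch: it only knows the type strings listed in this section, so BigWig and other types come back as "unknown". `rendering_mode` is a hypothetical name.

```python
# Track "type" values copied from the HTML and Canvas mappings in this section.
HTML_TRACK_TYPES = {
    "FeatureTrack",
    "NeatHTMLFeatures/View/Track/NeatFeature",
    "JBrowse/View/Track/HTMLFeatures",
    "WebApollo/View/Track/DraggableNeatHTMLFeatures",
    "JBrowse/View/Track/Alignments",
    "WebApollo/View/DraggableAlignments",
    "JBrowse/View/Track/HTMLVariants",
    "WebApollo/View/Track/WebApolloHTMLVariants",
}
CANVAS_TRACK_TYPES = {
    "NeatCanvasFeatures/View/Track/NeatFeature",
    "JBrowse/View/Track/CanvasFeatures",
    "WebApollo/View/Track/WebApolloNeatCanvasFeatures",
    "JBrowse/View/Track/Alignments2",
    "WebApollo/View/Track/WebApolloAlignments2",
    "JBrowse/View/Track/CanvasVariants",
    "WebApollo/View/Track/WebApolloCanvasVariants",
}


def rendering_mode(track: dict) -> str:
    """Classify a trackList.json track entry as "HTML", "Canvas" or "unknown"."""
    track_type = track.get("type")
    if track_type in HTML_TRACK_TYPES:
        return "HTML"
    if track_type in CANVAS_TRACK_TYPES:
        return "Canvas"
    return "unknown"


tracks = [
    {"label": "MAKER", "type": "FeatureTrack"},
    {"label": "simulated_bam", "type": "JBrowse/View/Track/Alignments2"},
]
for track in tracks:
    print(track["label"], rendering_mode(track))
```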
Apollo Automated Configuration and upload¶
Admin users may upload FASTA files to create new genomes, and may upload most track types in a similar manner if a default configuration is desirable.
JBrowse Configuration¶
The manual data generation pipeline is based on the typical JBrowse commands such as prepare-refseqs.pl and flatfile-to-json.pl. These scripts are automatically copied to a local bin/ directory when you run the setup scripts (e.g. apollo run-local, apollo deploy or install_jbrowse.sh).
If you don’t see a bin/ subdirectory containing these scripts after running the setup, check setup.log and the troubleshooting guide for additional tips, or feel free to post the error and setup.log on GitHub or the mailing list.
prepare-refseqs.pl¶
The first step to set up the genome browser is to load the reference genome data. We’ll use the prepare-refseqs.pl script to output to the data directory.
bin/prepare-refseqs.pl --fasta pyu_data/scf1117875582023.fa --out /opt/apollo/data
If you want to use an indexed FASTA genome then you can run prepare-refseqs.pl as follows:
bin/prepare-refseqs.pl --indexed_fasta pyu_data/scf1117875582023.fa --out /opt/apollo/data
The script will copy the genome FASTA and its FAI index into the output folder.
Note: the output directory is used later when we load the organism into the browser with the “Create organism” form.
flatfile-to-json.pl¶
The flatfile-to-json.pl script can be used to load GFF3 files, and you can customize the feature types. Here, we’ll start off by loading data from the MAKER GFF for the Pythium ultimum data. The simplest loading command specifies a --trackLabel, the --type of feature to load, the --gff file and the --out directory.
bin/flatfile-to-json.pl --gff pyu_data/scf1117875582023.gff --type mRNA \
--trackLabel MAKER --out /opt/apollo/data
Note: you can also use the command bin/maker2jbrowse for loading the MAKER data.
Also see the Customizing features section for more information on customizing the CSS styles of the Apollo features.
Note: Apollo uses features that are loaded at the “transcript” level. If your GFF3 has “gene” features with “transcript”/“mRNA” child features, make sure that you use the argument --type mRNA or --type transcript.
generate-names.pl¶
Once data tracks have been created, you can generate a searchable index of names using the generate-names.pl script:
bin/generate-names.pl --verbose --out /opt/apollo/data
This is an optional but useful step that indexes feature names and reference sequence names. If you have tracks with millions of features, consider indexing only selected tracks with the --tracks argument, or disabling autocomplete with --completionLimit 0.
add-bam-track.pl¶
Apollo natively supports BAM files and the file can be read (in chunks) directly from the server with no preprocessing.
To add a BAM track, copy the .bam and .bam.bai files to your data directory, and then use the add-bam-track.pl to add the file to the tracklist.
mkdir /opt/apollo/data/bam
cp pyu_data/simulated-sorted.bam /opt/apollo/data/bam
cp pyu_data/simulated-sorted.bam.bai /opt/apollo/data/bam
bin/add-bam-track.pl --bam_url bam/simulated-sorted.bam \
--label simulated_bam --key "simulated BAM" -i /opt/apollo/data/trackList.json
Note: the bam_url
parameter is a URL that is relative to the data directory. It is not a filepath! Also, the .bai will
automatically be located if it is simply the .bam with .bai appended to it.
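Because bam_url is relative to the data directory and the index is located by appending .bai, you can derive and sanity-check the value before running add-bam-track.pl. This is a minimal sketch: `bam_url_for` is a hypothetical helper, and it raises an error when the BAM lies outside the data directory or its index is missing.

```python
from pathlib import Path


def bam_url_for(data_dir: str, bam_path: str) -> str:
    """Return the --bam_url value for a BAM file inside the data directory.

    The result is a URL relative to the data directory, not a filesystem path.
    The index is expected next to the BAM, named "<file>.bam.bai".
    """
    data = Path(data_dir).resolve()
    bam = Path(bam_path).resolve()
    # relative_to raises ValueError if the BAM is not under the data directory.
    relative = bam.relative_to(data)
    bai = bam.with_name(bam.name + ".bai")
    if not bai.exists():
        raise FileNotFoundError(f"missing index {bai}")
    return relative.as_posix()
```

Once both files from the example above are in place, `bam_url_for("/opt/apollo/data", "/opt/apollo/data/bam/simulated-sorted.bam")` returns "bam/simulated-sorted.bam".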
add-bw-track.pl¶
Apollo also has native support for BigWig files (.bw), so no extra processing of these files is required either.
To use this, copy the BigWig data into the jbrowse data directory and then use the add-bw-track.pl to add the file to the tracklist.
mkdir /opt/apollo/data/bigwig
cp pyu_data/*.bw /opt/apollo/data/bigwig
bin/add-bw-track.pl --bw_url bigwig/simulated-sorted.coverage.bw \
--label simulated_bw --key "simulated BigWig"
Note: the bw_url parameter is a URL that is relative to the data directory. It is not a filepath!
Customizing different annotation types (advanced)¶
To change how the different annotation types look in the “User-created annotation” track, you’ll need to update the mapping of the annotation type to the appropriate CSS class. This data resides in client/apollo/json/annot.json, which is a file containing Apollo tracks that is loaded by default. You’ll need to modify the JSON entry whose label is Annotations. Of particular interest is the alternateClasses element. Let’s look at that default element:
"alternateClasses": {
    "pseudogene" : {
        "className" : "light-purple-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "tRNA" : {
        "className" : "brightgreen-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "snRNA" : {
        "className" : "brightgreen-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "snoRNA" : {
        "className" : "brightgreen-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "ncRNA" : {
        "className" : "brightgreen-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "miRNA" : {
        "className" : "brightgreen-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "rRNA" : {
        "className" : "brightgreen-80pct",
        "renderClassName" : "gray-center-30pct"
    },
    "repeat_region" : {
        "className" : "magenta-80pct"
    },
    "transposable_element" : {
        "className" : "blue-ibeam",
        "renderClassName" : "blue-ibeam-render"
    }
}
For each annotation type, you can override the default class mapping for both className and renderClassName to use another CSS class. Check out the Customizing features section for more information on customizing the CSS classes.
Customizing features¶
The visual appearance of biological features in Apollo (and JBrowse) is handled by CSS stylesheets with HTMLFeatures tracks. Every feature and subfeature is given a default CSS “class” that matches a default CSS style in a CSS stylesheet. These styles are defined in client/apollo/css/track_styles.css and client/apollo/css/webapollo_track_styles.css. Additional styles are also defined in these files, and can be used by explicitly specifying them in the --className, --subfeatureClasses, --renderClassname, or --arrowheadClass parameters to flatfile-to-json.pl (see data loading section).
Apollo differs from JBrowse in some of its styling, largely in order to help with feature selection, edge-matching, and dragging. Apollo by default uses invisible container elements (with style class names like “container-16px”) for features that have children, so that the children are fully contained within the parent feature. This is paired with another styled element that gets rendered within the feature but underneath the subfeatures, and is specified by the --renderClassname argument to flatfile-to-json.pl. Exons are also by default treated as special invisible containers, which hold styled elements for UTRs and CDS.
It is relatively easy to add other stylesheets that have custom style classes that can be used as parameters to flatfile-to-json.pl. For example, you can create /opt/apollo/data/custom_track_styles.css, which contains two new styles:
.gold-90pct,
.plus-gold-90pct,
.minus-gold-90pct {
    background-color: gold;
    height: 90%;
    top: 5%;
    border: 1px solid gray;
}
.dimgold-60pct,
.plus-dimgold-60pct,
.minus-dimgold-60pct {
    background-color: #B39700;
    height: 60%;
    top: 20%;
}
In this example, two subfeature styles are defined, and the top property is set to (100% - height)/2 to ensure that the subfeatures are centered vertically within their parent feature. When defining new styles for features, it is important to specify rules that apply to plus-stylename and minus-stylename in addition to stylename, as Apollo adds “plus-” or “minus-” to the class of the feature if the feature has a strand orientation.
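The two rules above (strand variants and centering) are easy to forget when writing many styles by hand. This is a minimal sketch that generates such a rule. `feature_style` is a hypothetical helper, and it assumes the height is a percentage of the parent, as in the examples.

```python
def feature_style(name: str, height_pct: float, declarations: dict) -> str:
    """Emit a CSS rule for a feature class plus its plus-/minus- strand variants.

    `top` is set to (100% - height) / 2 so the subfeature is vertically
    centered inside its parent feature.
    """
    if not 0 < height_pct <= 100:
        raise ValueError("height_pct must be in (0, 100]")
    selectors = f".{name},\n.plus-{name},\n.minus-{name}"
    body = dict(declarations)
    body["height"] = f"{height_pct:g}%"
    body["top"] = f"{(100 - height_pct) / 2:g}%"
    lines = "\n".join(f"    {prop}: {value};" for prop, value in body.items())
    return f"{selectors} {{\n{lines}\n}}"


print(feature_style("dimgold-60pct", 60, {"background-color": "#B39700"}))
```

This prints the dimgold-60pct rule from the example, including top: 20%.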
You need to tell Apollo where to find these styles by modifying the JBrowse config or the plugin config, e.g. by adding this to the trackList.json:
"css" : "data/custom_track_styles.css"
Then you may use these new styles with --subfeatureClasses, which applies the specified CSS classes to your features in the genome browser, for example:
bin/flatfile-to-json.pl --gff MyFile.gff \
--type mRNA --trackLabel MyTrack \
--subfeatureClasses '{"CDS":"gold-90pct","UTR": "dimgold-60pct"}'
Bulk loading annotations to the user annotation track¶
GFF3¶
You can use the tools/data/add_features_from_gff3_to_annotations.pl script to bulk load GFF3 files with transcripts to the user annotation track. Let’s say we want to load our maker.gff transcripts.
tools/data/add_features_from_gff3_to_annotations.pl \
-U localhost:8080/Apollo -u web_apollo_admin -p web_apollo_admin \
-i scf1117875582023.gff -t mRNA -o "name of organism"
The default options should be able to handle most GFF3 files that contain genes, transcripts, and exons.
You can still use this script even if the GFF3 file that you are loading does not contain transcripts and exon types.
Let’s say we want to load match and match_part features as transcripts and exons, respectively. We’ll use the blastn.gff file as an example.
tools/data/add_features_from_gff3_to_annotations.pl \
-U localhost:8080/Apollo -u web_apollo_admin -p web_apollo_admin \
-i blastn.gff -t match -e match_part -o "name of organism"
You can view the add_features_from_gff3_to_annotations.pl help (-h) option for all available options.
Note: Apollo makes a clear distinction between a transcript and an mRNA. Genes that have an mRNA as their child feature are treated as protein-coding annotations, and genes that have a transcript as their child feature are treated as non-coding annotations, specifically pseudogenes.
Note: In order to create meaningful names from your evidence when creating manual annotations, the GFF3 should provide the Name attribute in column 9 of the GFF3 spec, as shown in this example:
NC_000001.11 BestRefSeq gene 11874 14409 . + . ID=gene1;Name=DDX11L1;Dbxref=GeneID:100287102,HGNC:37102;description=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1;gbkey=Gene;gene=DDX11L1;pseudo=true
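To check that your evidence GFF3 supplies a Name attribute before loading, you can parse column 9 yourself. This is a minimal sketch: `gff3_attributes` is a hypothetical helper, it expects the nine tab-separated GFF3 columns, it percent-decodes values (e.g. %2F becomes "/"), and it leaves multi-valued attributes such as Dbxref as a single comma-joined string.

```python
from urllib.parse import unquote


def gff3_attributes(line: str) -> dict:
    """Parse column 9 of a tab-separated GFF3 line into a dict of decoded values."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    attributes = {}
    for field in columns[8].split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        attributes[unquote(key)] = unquote(value)
    return attributes


line = "\t".join([
    "NC_000001.11", "BestRefSeq", "gene", "11874", "14409", ".", "+", ".",
    "ID=gene1;Name=DDX11L1;description=DEAD%2FH %28Asp-Glu-Ala-Asp%2FHis%29 box helicase 11 like 1",
])
attrs = gff3_attributes(line)
if "Name" not in attrs:
    print("warning: no Name attribute; evidence-based names will not be meaningful")
print(attrs["Name"])
```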
If you would like to look at a compatible representative GFF3, export annotations from Apollo via GFF3 export.
Disable draggable¶
Apollo has a number of specific track config parameters:
- overrideDraggable (boolean): determines whether to transform the alignments tracks to draggable alignments
- overridePlugins (boolean): determines whether to transform alignments and sequence tracks
These can be specified on a specific track or in a global config.
Troubleshooting guide¶
Tomcat memory¶
Typically, the default memory allowance for the Java Virtual Machine (JVM) is too low. The memory requirements for Web Apollo will depend on many variables, but in general, we recommend at least 1g for the maximum memory and 512m for the minimum, though a 2 GB maximum seems to be optimal for most server configurations.
Suggested Tomcat memory settings¶
export CATALINA_OPTS="-Xms512m -Xmx2g \
-XX:+CMSClassUnloadingEnabled \
-XX:+CMSPermGenSweepingEnabled \
-XX:+UseConcMarkSweepGC"
In cases where the assembled genome is highly fragmented, additional tuning of memory requirements and garbage collection will be necessary to keep the system stable. Below is an example from a research group that maintains over 40 Apollo instances with assemblies ranging from 1,000 to 150,000 scaffolds (reference sequences) and over one hundred users:
export CATALINA_OPTS="-Xmx12288m -Xms8192m \
-XX:ReservedCodeCacheSize=64m \
-XX:+UseG1GC \
-XX:+CMSClassUnloadingEnabled \
-Xloggc:$CATALINA_HOME/logs/gc.log \
-XX:+PrintHeapAtGC \
-XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps"
To change your settings, you can usually edit the setenv.sh script at $TOMCAT_BIN_DIR/setenv.sh, where $TOMCAT_BIN_DIR is the directory where the Tomcat binaries reside. This file may not exist by default, but if you create it, it will be picked up when Tomcat restarts. Make sure that Tomcat can read the file.
In most cases, creating the setenv.sh should be sufficient, but you may have to edit catalina.sh or another file directly depending on your system and Tomcat setup. For example, on Ubuntu, the file /etc/default/tomcat7 often contains these settings.
Confirm your settings¶
Your CATALINA_OPTS settings from setenv.sh can be confirmed with a tool like jvisualvm or via the command line with the ps tool, e.g. ps -ef | grep java should yield something like the following, allowing you to confirm that your memory settings have been picked up.
root 9848 1 0 Oct22 ? 00:36:44 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/current/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms1g -Xmx2g -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseConcMarkSweepGC -Dj
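To check the heap flags from a script rather than by eye, you can extract -Xms and -Xmx from a ps line like the one above. This is a minimal sketch: `heap_settings` is a hypothetical helper that only understands the k/m/g suffixes and, like the JVM, keeps the last occurrence when a flag is repeated.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_settings(command_line: str) -> dict:
    """Return {"ms": bytes, "mx": bytes} for the -Xms/-Xmx flags found in a java command line."""
    settings = {}
    # -XX:... options do not match because "-X" must be followed by "ms" or "mx".
    for flag, number, unit in re.findall(r"-X(ms|mx)(\d+)([kKmMgG]?)(?=\s|$)", command_line):
        settings[flag] = int(number) * _UNITS[unit.lower()]
    return settings


ps_line = (
    "root 9848 1 0 Oct22 ? 00:36:44 /usr/lib/jvm/java-8-openjdk-amd64/bin/java "
    "-Xms1g -Xmx2g -XX:+CMSClassUnloadingEnabled"
)
heap = heap_settings(ps_line)
print(heap["mx"] // 1024 ** 2, "MiB max")  # prints: 2048 MiB max
```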
Re-install after changing settings¶
If you start seeing memory leaks (java.lang.OutOfMemoryError: Java heap space) after doing an update, you might try re-installing, as the live re-deploy itself can cause memory leaks or an inconsistent software state.
If you have named your web application Apollo.war, then you can remove Apollo.war and the Apollo directory that Tomcat unpacked from it from your webapps directory, and re-deploy:
- Run apollo deploy
- Undeploy any existing Apollo instances
- Stop tomcat
- Copy the war file to the webapps folder
- Start tomcat
Tomcat permissions¶
Preferably, when running Apollo or any webserver, you should not run Tomcat as root. Therefore, when deploying your war file to tomcat or another web application server, you may need to tune your file permissions to make sure Tomcat is able to access your files.
On many production systems, tomcat will typically belong to a user and group called something like ‘tomcat’. Make sure that the ‘tomcat’ user can read your “webapps” directory (where you placed your war file) and write into the annotations and any other relevant directory (e.g. tomcat/logs). As such, it is sometimes helpful to add the user you logged-in as to the same group as your tomcat user and set group write permissions for both.
Consider using a package manager to install Tomcat so that proper security settings are installed, or use jsvc: http://tomcat.apache.org/tomcat-7.0-doc/security-howto.html#Non-Tomcat_settings
Errors with JBrowse¶
JBrowse tools don’t show up in bin directory (or install at all) after install or typing install_jbrowse.sh¶
If the bin directory with JBrowse tools doesn’t show up after calling install_jbrowse.sh, JBrowse is having trouble installing itself, for one of a few possible reasons. If the fixes below do not work, please consult the JBrowse troubleshooting and JBrowse install pages, as well as the setup.log file created during the installation process.
cpanm or other components are not installed¶
Make sure the appropriate JBrowse libraries are installed on your system.
If you see chmod: cannot access `web-app/jbrowse/bin/cpanm': No such file or directory
make sure to install cpanm.
Git tool is too old¶
The build clones a single branch, which is supported in git 1.7.10 and greater. When that fails, the output looks something like this:
Buildfile: build.xml
copy.apollo.plugin.webapp:
setup-jbrowse:
git.clone:
[exec] Result: 129
The solution is to upgrade git to 1.7.10 or greater, or to remove the line with the --single-branch option in build.xml.
Accessing git behind a firewall.¶
If you are behind a firewall, checking out code using the git:// protocol (the default) may not be allowed. The output will look something like this:
setup-jbrowse:
git.clone:
[exec] Submodule 'src/FileSaver' (git://github.com/dkasenberg/FileSaver.js.git) registered for path 'src/FileSaver'
[exec] Submodule 'src/dbind' (git://github.com/rbuels/dbind.git) registered for path 'src/dbind'
. . . .
[exec] Submodule 'src/xstyle' (git://github.com/kriszyp/xstyle.git) registered for path 'src/xstyle'
[exec] Result: 1
with possibly more output below.
To fix this, type:
git config --global url."https://".insteadOf git://
in the command line, and then re-install using:
./apollo clean-all
./apollo run-local
(or ./apollo deploy).
e.g. “Can’t locate Hash/Merge.pm in @INC” or “Can’t locate JBlibs.pm in @INC”¶
If you are trying to run the JBrowse binaries but get these sorts of errors, try running install_jbrowse.sh, which will initialize as many prerequisites as possible, including JBLibs and other JBrowse dependencies.
Rebuilding JBrowse¶
You can manually clear the JBrowse files from web-app/jbrowse and re-run apollo deploy to rebuild JBrowse.
RequestError: Unable to load … Apollo2/jbrowse/data/trackList.json status: 500¶
Apollo2 does fairly strict JSON validation, so make sure your trackList.json file is valid JSON.
If you still get this error after validating please forward the issue to our github issue tracker.
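A quick local check can point at the offending line before you file an issue. This is a minimal sketch using Python's json module as a stand-in validator, which may differ from Apollo's parser in edge cases. It additionally rejects NaN/Infinity, which Python accepts by default but strict JSON does not. `validate_track_list` is a hypothetical name.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant {name}")


def validate_track_list(path: str):
    """Return None if the file at `path` is valid JSON, else an error message."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh, parse_constant=_reject_constant)
    except json.JSONDecodeError as err:
        return f"{path}: line {err.lineno}, column {err.colno}: {err.msg}"
    except ValueError as err:
        return f"{path}: {err}"
    return None


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(validate_track_list(path) or f"{path}: OK")
```

Run it as, for example, `python validate_track_list.py /opt/apollo/data/trackList.json`. A common culprit is a trailing comma, which this check reports with its line and column.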
Complaints about 8080 being in use¶
Please check that you don’t already have a Tomcat running: netstat -tan | grep 8080. Sometimes Tomcat does not exit properly; in that case, find it with ps -ef | grep java and then kill -9 the offending process.
Note that you can also configure Tomcat to run on different ports, or you can launch a temporary instance of Apollo with, for example, apollo run-local 8085 to avoid the port conflict.
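If netstat is not available, a connection attempt tells you whether something already listens on the port. This is a minimal sketch: `port_in_use` is a hypothetical helper, and it only probes the IPv4 loopback address, which also reaches servers bound to all interfaces.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if port_in_use(8080):
    print("Port 8080 is taken; stop the other Tomcat or try: apollo run-local 8085")
```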
Unable to open the h2 / default database for writing¶
If you receive an error similar to this:
SEVERE: Unable to create initial connections of pool.
org.h2.jdbc.JdbcSQLException: Error opening database:
"Could not save properties /var/lib/tomcat7/prodDb.lock.db" [8000-176]
Then this is due to the production server trying to write an H2 database in an area it doesn’t have permissions for. If you use H2 (which is great for testing or single-user use, but not for full-blown production), make sure that it writes to a directory the server can write to.
You can modify the specified data directory for the H2 database in the apollo-config.groovy, for example to use the /tmp/ directory or some other directory:
url = "jdbc:h2:/tmp/prodDb;MVCC=TRUE;LOCK_TIMEOUT=10000;DB_CLOSE_ON_EXIT=FALSE"
This will write the H2 database files, named after prodDb, to /tmp/. If you don’t specify an absolute path, it will try to write in the same directory that Tomcat is running in, e.g., /var/lib/tomcat7/, which can have permission issues.
More detail on database configuration when specifying the apollo-config.groovy file is available in the setup guide.
Grails cache errors¶
In some instances you can’t write to the default cache location on disk. Part of an example config log:
2015-07-03 14:37:39,675 [main] ERROR context.GrailsContextLoaderListener - Error initializing the application: null
java.lang.NullPointerException
at grails.plugin.cache.ehcache.GrailsEhCacheManagerFactoryBean$ReloadableCacheManager.rebuild(GrailsEhCacheManagerFactoryBean.java:171)
at grails.plugin.cache.ehcache.EhcacheConfigLoader.reload(EhcacheConfigLoader.groovy:63)
at grails.plugin.cache.ConfigLoader.reload(ConfigLoader.groovy:42)
There are several solutions to this, but all involve updating the apollo-config.groovy file to override the caching defined in Config.groovy.
Disabling the cache:¶
grails.cache.config = {
    cache {
        enabled = false
        name 'globalcache'
    }
}
This can also be done by removing the plugin: in grails-app/conf/BuildConfig, remove or comment out the following line and rebuild:
compile ':cache-ehcache:1.0.5'
Disallow writing overflow to disk¶
This can be used for small instances:
grails.cache.config = {
    // avoid ehcache naming conflict to run multiple WA instances
    provider {
        name "ehcache-apollo-" + (new Date().format("yyyyMMddHHmmss"))
    }
    cache {
        enabled = true
        name 'globalcache'
        eternal false
        overflowToDisk false // THIS IS THE IMPORTANT LINE
        maxElementsInMemory 100000
    }
}
Specify the overflow directory¶
Best for high load servers, which will need the cache. Make sure your tomcat / web-server user can write to that directory:
// copy from Config.groovy except where noted
grails.cache.config = {
    ...
    cache {
        ...
        maxElementsOnDisk 10000000
        // this is the important part, below!
        diskStore {
            path '/opt/apollo/cache-directory'
        }
    }
    ...
}
JSON in the URL with newer versions of Tomcat¶
When JSON is added to the URL string (e.g., addStores and addTracks), you may get this error with newer patched versions of Tomcat (7.0.73, 8.0.39, 8.5.7):
java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986
To fix this, the best solution we’ve come up with (and there may be others) is to explicitly allow these characters, which you can do starting with Tomcat versions 7.0.76, 8.0.42 and 8.5.12.
This is done by adding the following line to $CATALINA_HOME/conf/catalina.properties:
tomcat.util.http.parser.HttpParser.requestTargetAllow=|{}
For the Grails cache settings described earlier, see also the information on the grails ehcache plugin (see “Overriding values”) and on ehcache itself.
Java mismatch¶
If you get an Unsupported major.minor error or similar, please confirm that the version of Java that Tomcat is running (ps -ef | grep java) is the same as the one you used to build. Setting JAVA_HOME to the Java 8 JDK should fix most problems.
Mysql invalid TimeStamp error¶
With certain versions of MySQL, you might get errors of this nature:
SQLException occurred when processing request: [GET] /apollo/annotator/getAppState Value ‘0000-00-00 00:00:00’ can not be represented as java.sql.Timestamp. Stacktrace follows: java.sql.SQLException: Value ‘0000-00-00 00:00:00’ can not be represented as java.sql.Timestamp
The fix is to add zeroDateTimeBehavior=convertToNull to the JDBC connection URL (originally identified here). Here is an example URL:
jdbc:mysql://localhost/apollo_production?zeroDateTimeBehavior=convertToNull&autoReconnect=true&characterEncoding=UTF-8&characterSetResults=UTF-8
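If you template the connection URL for several instances, the query string can be assembled from named options so that none of them is dropped. This is a minimal sketch: `mysql_jdbc_url` is a hypothetical helper, and it relies on keyword arguments keeping their order (Python 3.7+).

```python
from urllib.parse import urlencode


def mysql_jdbc_url(host: str, database: str, **options: str) -> str:
    """Build a MySQL JDBC URL; options become the query string in the order given."""
    query = urlencode(options)
    return f"jdbc:mysql://{host}/{database}" + (f"?{query}" if query else "")


url = mysql_jdbc_url(
    "localhost",
    "apollo_production",
    zeroDateTimeBehavior="convertToNull",
    autoReconnect="true",
    characterEncoding="UTF-8",
    characterSetResults="UTF-8",
)
print(url)
```

With these options it prints the example URL above.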
Example Build Script on Unix with MySQL¶
This is an example build script. It may NOT be appropriate for your environment but does demonstrate what a typical build process might look like on a Unix system using MySQL.
Please consult our Setup and Configuration documentation for additional information.
# Install prereqs
apt-get install tomcat8 git ant openjdk-8-jdk nodejs
# Increase Tomcat memory per the Apollo developers' instructions
echo 'export CATALINA_OPTS="-Xms512m -Xmx1g -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:+UseConcMarkSweepGC"' >> /etc/default/tomcat8
# Download and extract their tarball
npm install -g yarn
wget https://github.com/GMOD/Apollo/archive/2.5.0.tar.gz
mv 2.5.0.tar.gz Apollo-2.5.0.tar.gz
tar xf Apollo-2.5.0.tar.gz
Set up the Apollo MySQL user and database¶
# Login to mysql e.g.,
mysql -u root
# Create a user
CREATE USER 'apollo'@'localhost' IDENTIFIED BY 'THE_PASSWORD';
CREATE DATABASE `apollo-production`;
GRANT ALL PRIVILEGES ON `apollo-production`.* To 'apollo'@'localhost';
Configure Apollo for MySQL¶
cd ~/src/Apollo-2.5.0
# Let's store the config file outside of the source tree.
mkdir ~/apollo.config
# Copy the template, then edit it so the database name, username and password match those created above
cp sample-mysql-apollo-config.groovy ~/apollo.config/apollo-config.groovy
ln -s ~/apollo.config/apollo-config.groovy
# For now, turn off tomcat8 so that we can see if the locally-run version works
service tomcat8 stop
# Run the local version, which verifies the install requirements, among other steps
./apollo run-local
If there is a previously installed instance¶
rm -rf /var/lib/tomcat/webapps/apollo
rm -f /var/lib/tomcat/webapps/apollo.war
Start Tomcat again:
service tomcat8 start
Create the file target/apollo-2.5.0.war by running:
./apollo deploy
and copy it into the webapps directory, where Tomcat deploys it automatically:
sudo cp target/apollo-2.5.0.war /var/lib/tomcat/webapps/apollo.war
Prepare JBrowse data¶
Add the FASTA assembly:
~/src/Apollo-2.5.0/jbrowse/bin/prepare-refseqs.pl \
--fasta /research/dre/assembly/assembly1.fasta.gz \
--out ~/organisms/dre
Add annotations:
~/src/Apollo-2.5.0/jbrowse/bin/flatfile-to-json.pl \
--gff /research/dre/annotation/FINAL_annotations/ssc_v4.gff \
--type mRNA --trackLabel Annotations --out ~/organisms/dre
In the Apollo interface, point to the directory ~/organisms/dre when loading the organism.
Adding OpenID Connect Authentication to Apollo¶
Overview¶
OpenID Connect (https://openid.net/connect/), or OIDC, is an authentication layer that uses the OAuth 2 protocol.
It allows you to devolve authentication to an Identity Provider. It is the Identity Provider who registers users, provides the infrastructure (back end database, log-in page, “forgotten your password?” functions, etc.), and bears Data Protection responsibility – though of course it means that they ultimately have control over who can access your Apollo instance.
It is most likely to be useful if you have some relation to an Identity Provider that represents your organization or user community, or if you intend to provide public access and only require an arbitrary identifier for an end user (to ensure it is the same individual each time they start an Apollo session) without any need to know their real-world identity.
Architecture¶
The method described here uses the following components:
- An Apache httpd 2.4 web server (https://httpd.apache.org/)
- The mod_auth_openidc Apache module (https://www.mod-auth-openidc.org/)
- The remoteUserAuthenticatorService class provided by Apollo
The Apache httpd is deployed to provide a reverse proxy as the sole point of access to Apollo for end users. Its primary role is to allow the use of mod_auth_openidc to add OIDC access control in front of Apollo, but of course it is also an efficient way to serve static content (e.g. user guides). It also makes it very easy to place multiple services or sources of content behind the same access control layer. Documentation for setting up a reverse proxy is available at https://httpd.apache.org/docs/2.4/howto/reverse_proxy.html.
mod_auth_openidc is the module that adds OIDC authentication to Apache. The usage described here is only the simplest case, but this module offers a lot of functionality, including the option of letting end users choose between multiple Identity Providers. The module will intercept requests for protected resources, and redirect the end user to the Identity Provider so they can log in; after a successful log in, the end user is redirected back to the httpd, which serves the protected content. OIDC data are made available in the Apache environment, so that applications run by the httpd (including the reverse proxy server) can access values such as the authenticated User Identifier.
The RemoteUserAuthenticatorService class in Apollo is part of the standard distribution. It is used to grant access to end users who present with the REMOTE_USER HTTP header. The Apache reverse proxy can be configured to pass the User Identifier, retrieved during OIDC authentication, as a REMOTE_USER header: thus users who have successfully authenticated via OIDC will be granted access to Apollo.
Securing Tomcat¶
Because RemoteUserAuthenticatorService gives access to any end user who sends a REMOTE_USER header (and any reasonably savvy user can add whatever HTTP headers they wish to any request sent by their browser), the Tomcat that serves Apollo must not be directly accessible to untrusted end users. For example, the Tomcat port could be made accessible only on localhost, or on a corporate private network.
Authorization¶
The end result of the process described above is an end user who is authenticated, i.e. you know the request comes from someone who was able to log in to an account associated with the User identifier you received. The user is not authorized at this stage, i.e. they have no permission to access any non-public resources within Apollo.
You can configure Apollo to add all Remote Users automatically to a default user group, which will authorize them according to the access permissions granted to that group. If you are willing to grant full access immediately to all Remote Users authenticated by your Identity Provider, this is all that is required.
If you want to have finer control over end users’ access rights, you can use normal admin processes (user interface or API) to grant access – but be aware that end users’ accounts are only created in Apollo when they log in for the first time. That means you cannot grant them access until after they first log in – so their first session in Apollo will consist of nothing but a “you are not authorized to view any organisms” message. It is probably better to have a default group for Remote Users that provides limited (read only?) access, and a process for adding additional access.
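The timing described above, where an account only appears at first login and picks up the default group at that moment, can be modelled in a few lines. This is a hypothetical Python sketch, not Apollo's RemoteUserAuthenticatorService: the class and method names are invented, and it only illustrates why the default group decides what a new Remote User can see in their first session.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class User:
    username: str
    groups: Set[str] = field(default_factory=set)


class RemoteUserDirectory:
    """Hypothetical model of remote-user login; not Apollo's implementation."""

    def __init__(self, default_group: Optional[str] = None):
        self.default_group = default_group
        self.users: Dict[str, User] = {}

    def login(self, headers: Dict[str, str]) -> Optional[User]:
        # HTTP header names are case-insensitive, so normalise them first.
        lowered = {name.lower(): value for name, value in headers.items()}
        username = lowered.get("remote_user")
        if not username:
            return None  # no header: the local log-in dialog would apply instead
        user = self.users.get(username)
        if user is None:
            # In this model the account only comes into existence here, at first
            # login, which is also the only point where the default group is applied.
            user = User(username)
            if self.default_group:
                user.groups.add(self.default_group)
            self.users[username] = user
        return user


if __name__ == "__main__":
    directory = RemoteUserDirectory(default_group="remote_users")
    first = directory.login({"Remote_User": "user-0001"})
    print(first.username, sorted(first.groups))  # user-0001 ['remote_users']
```

Without a default group, the model's new user has no groups at all, which corresponds to the “not authorized to view any organisms” first session described above.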
Configuration¶
Apache 2.4 reverse proxy configuration¶
Apache can be configured to add the reverse proxy server independently from adding the OIDC access control (it is probably a good idea to add reverse proxying first as it will make any configuration problems easier to find). Reverse proxying on its own should be completely transparent to end users.
The correct Apache proxy configuration is described in the Apollo documentation at https://genomearchitect.readthedocs.io/en/latest/Configure.html#apache-proxy
This should result in these four proxying modules being enabled in your httpd conf file(s), with directives like this:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_wstunnel_module modules/mod_proxy_wstunnel.so
If you wish, you can edit the Apache configuration manually to enable these modules, rather than using a2enmod. The directives above should already be present, commented out, in the distributed httpd.conf file.
It is a good idea to use a VirtualHost directive to control the requests which are proxied. For instance, to proxy all requests on port 80 to a Tomcat running on the same machine using port 8080:
<Proxy *>
Require all granted
</Proxy>
<VirtualHost *:80>
ServerAdmin <your admin email>
ServerName <your apollo host>
ProxyPreserveHost On
ProxyRequests Off
ProxyPass /stomp/info http://localhost:8080/stomp/info
ProxyPassReverse /stomp/info http://localhost:8080/stomp/info
ProxyPass /stomp ws://localhost:8080/stomp
ProxyPassReverse /stomp ws://localhost:8080/stomp
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
</VirtualHost>
(Substitute your admin/support email address and the host name where indicated.)
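Apache checks ProxyPass rules in the order they are configured and the first match wins, which is why the more specific /stomp paths come before /. The Python sketch below is a simplified model of that lookup using the rules from the example above; it ignores ProxyPassReverse, regular-expression rules and Apache's handling of repeated slashes, and is only meant to show which backend a given path ends up at.

```python
# (path prefix, backend URL) pairs, in the same order as the ProxyPass lines above.
PROXY_RULES = [
    ("/stomp/info", "http://localhost:8080/stomp/info"),
    ("/stomp", "ws://localhost:8080/stomp"),
    ("/", "http://localhost:8080/"),
]


def prefix_matches(path: str, prefix: str) -> bool:
    """In this model, prefixes match whole path segments (/stomp != /stompfoo)."""
    if not path.startswith(prefix):
        return False
    rest = path[len(prefix):]
    return prefix.endswith("/") or rest == "" or rest.startswith("/")


def route(path: str) -> str:
    """Return the backend URL for a request path; the first matching rule wins."""
    for prefix, backend in PROXY_RULES:
        if prefix_matches(path, prefix):
            return backend + path[len(prefix):]
    raise LookupError(f"no ProxyPass rule for {path}")


if __name__ == "__main__":
    for p in ("/stomp/info", "/stomp/123/websocket", "/apollo/annotator/index"):
        print(p, "->", route(p))
```

Moving the ("/", ...) entry to the top of the list would send every request, including the websocket traffic, to the plain HTTP backend, which is the mistake the ordering avoids.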
You can tweak the VirtualHost settings so that requests are proxied according to criteria such as port, host name etc. (see the Apache 2.4 documentation for details). This can be useful to give you direct access to Apollo, bypassing the OIDC log in, say for admin access or local user accounts (see below).
Dockerfile¶
This Dockerfile will provide an httpd container (including the install of mod_auth_openidc, used in the next section). Put the conf file(s) you wish the httpd to use in apache2-config (a subdirectory of the docker build directory).
FROM httpd:2.4
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -qq update && \
apt-get install --yes ca-certificates libapache2-mod-auth-openidc
COPY apache2-config/ /usr/local/apache2/conf/
If you are using Docker, the Apache reverse proxy configuration will need to refer to the host running the Apollo Tomcat server (if you were to use localhost in the proxy configuration, it would refer to the Docker container in which the httpd is running, not to the host machine on which you are running the container). It is good practice in Docker for the Tomcat to run in a separate container from the httpd. On a Docker network, you use container names as host names; so if the Tomcat container was named apollo-tomcat, then you would use http://apollo-tomcat:8080/ (etc.) in the proxy configuration directives.
mod_auth_openidc configuration¶
Before starting this part of the configuration, you will need to register with an OIDC provider. If you do not have one already, the developer tools provided by ORCID (https://orcid.org/developer-tools) allow a quick and easy set up for personal development use.
OIDC can be enabled with this addition to the httpd configuration:
LoadModule auth_openidc_module /usr/lib/apache2/modules/mod_auth_openidc.so
<Location />
AuthType openid-connect
Require valid-user
</Location>
<Location /public>
AuthType None
Require all granted
</Location>
OIDCPassClaimsAs environment
OIDCProviderMetadataURL <URL provided by your identity provider>
OIDCClientID <ID issued to you by your identity provider>
OIDCClientSecret <secret issued to you by your identity provider>
OIDCScope "openid email profile"
# vanity URL: points to a protected path, but NOT to any actual content
OIDCRedirectURI http://<your host name>/apollo/annotator/openid
OIDCCryptoPassphrase <generate your own random string for this>
Your Identity Provider will give you the values for OIDCProviderMetadataURL (conventionally at /.well-known/openid-configuration), OIDCClientID and OIDCClientSecret when you register your client. Refer to their documentation to find supported scopes to include in OIDCScope; openid is required for authentication, but email and profile (requests for email address and profile information) will probably be supported as well if you need them.
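Before filling in the directives, it can help to look at the provider's discovery document and confirm it advertises what you intend to use. The Python sketch below builds the conventional metadata URL from an issuer and checks a parsed document for a few of the fields the OpenID Connect Discovery specification requires, and for the scopes you plan to list in OIDCScope. The issuer and the sample document are placeholders; fetching the JSON over the network is left out so the check runs offline.

```python
import json

# A subset of the metadata fields OpenID Connect Discovery 1.0 marks as REQUIRED
# (token_endpoint is required unless only the implicit flow is used).
REQUIRED_FIELDS = ("issuer", "authorization_endpoint", "token_endpoint",
                   "jwks_uri", "response_types_supported")


def metadata_url(issuer: str) -> str:
    """Conventional location of an issuer's provider metadata document."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def check_metadata(doc: dict, oidc_scope: str) -> list:
    """Return a list of problems; an empty list means nothing obvious is missing."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    # scopes_supported is only RECOMMENDED, so skip the scope check if it is absent.
    advertised = set(doc.get("scopes_supported", []))
    if advertised:
        problems += [f"scope not advertised: {s}"
                     for s in oidc_scope.split() if s not in advertised]
    return problems


if __name__ == "__main__":
    print(metadata_url("https://idp.example.org/"))
    sample = json.loads("""{
        "issuer": "https://idp.example.org",
        "authorization_endpoint": "https://idp.example.org/authorize",
        "token_endpoint": "https://idp.example.org/token",
        "jwks_uri": "https://idp.example.org/jwks",
        "response_types_supported": ["code"],
        "scopes_supported": ["openid", "email"]
    }""")
    print(check_metadata(sample, "openid email profile"))  # ['scope not advertised: profile']
```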
When you register, you will also need to provide the URL to which the Identity Provider should redirect end users after authentication. Add this to your Apache configuration as OIDCRedirectURI. This URL is used internally by mod_auth_openidc and it should not be a “real” URL that references actual content, but it must be something covered by the OIDC access control (see below).
OIDCCryptoPassphrase is used internally; just create a random string.
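Any good source of randomness will do for OIDCCryptoPassphrase. One option, sketched here with Python's standard secrets module, is a URL-safe token built from 32 random bytes; the byte count is an arbitrary but reasonable choice.

```python
import secrets


def make_passphrase(nbytes: int = 32) -> str:
    """URL-safe random string suitable for pasting into the httpd configuration."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print("OIDCCryptoPassphrase", make_passphrase())
```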
Combining OIDC with the proxy¶
Once OIDC has been enabled as described above, access control is just standard Apache 2.4 configuration, with openid-connect as the AuthType. This is commonly done using Location directive(s) to define the path(s) of content to which access control is applied, but Apache provides many flexible methods; e.g. for complicated access control rules, regular expression matching directives (like LocationMatch) are worth a look.
The final bit of Apache configuration required is a RequestHeader directive. This passes the OIDC User Identifier (which mod_auth_openidc makes available as an Apache environment variable) downstream in proxied requests as a REMOTE_USER HTTP header.
The example below extends the reverse proxy example (above), adding two Location directives that place all content served by this Virtual Host behind the OIDC access control, except content in /public, which remains freely accessible; and adding a RequestHeader directive to send a REMOTE_USER header downstream to Apollo:
<Proxy *>
Require all granted
</Proxy>
<VirtualHost *:80>
ServerAdmin <your admin email>
ServerName <your apollo host>
ProxyPreserveHost On
ProxyRequests Off
# OIDC log in will be required for everything...
<Location />
AuthType openid-connect
Require valid-user
</Location>
# ...except for public access content here
<Location /public>
AuthType None
Require all granted
</Location>
RequestHeader set Remote_User "expr=%{REMOTE_USER}"
ProxyPass /stomp/info http://localhost:8080/stomp/info
ProxyPassReverse /stomp/info http://localhost:8080/stomp/info
ProxyPass /stomp ws://localhost:8080/stomp
ProxyPassReverse /stomp ws://localhost:8080/stomp
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
</VirtualHost>
Apollo configuration¶
When Apache is fully configured as described above, requests from all authenticated users will include the REMOTE_USER header. Apollo must be configured to use Remote User authentication, so that it grants access to all users who present with this header.
Add the following to apollo-config.groovy:
apollo {
    authentications = [
        ["name": "Remote User Authenticator",
         "className": "remoteUserAuthenticatorService",
         "active": true,
         "params": ["default_group": "remote_users"],
        ],
        ["name": "Username Password Authenticator",
         "className": "usernamePasswordAuthenticatorService",
         "active": true,
        ]
    ]
}
Note that params defines a default group. This is optional but recommended (see the note above regarding authorization). The named group must already have been created, using the user interface or API, with appropriate organism access rights defined. All Remote Users will be placed in this group when they first log in.
Maintaining administrator access¶
Access to the administrator account, or to any other local user accounts that you want to use, requires that OIDC authentication be bypassed, simply because OIDC authentication stops you from seeing the Apollo log-in dialog.
If you have direct access to Tomcat on (probably) port 8080 of the host machine, you can use that for direct access to Apollo. Because you have bypassed the OIDC authentication, your browser will not send a REMOTE_USER header, so you should see the Apollo local user log-in dialog.
If direct access to the Tomcat port is a problem, you can simply add another Virtual Host to the Apache configuration. This can provide access via a different host name or port. For example, to give access on port 8000:
Listen 8000
<VirtualHost *:8000>
ServerAdmin <your admin email>
ServerName <your apollo host>
ProxyPreserveHost On
ProxyRequests Off
<Location />
AuthType None
Require all granted
</Location>
ProxyPass /stomp/info http://localhost:8080/stomp/info
ProxyPassReverse /stomp/info http://localhost:8080/stomp/info
ProxyPass /stomp ws://localhost:8080/stomp
ProxyPassReverse /stomp ws://localhost:8080/stomp
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
</VirtualHost>
To keep Apollo secure (REMOTE_USER headers are easily spoofed), if this port is openly accessible then some access control should be added, e.g. by replacing Require all granted in the Location block above with Require ip <your.ip.address.here>.
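Require ip accepts full addresses, partial addresses and network/netmask or CIDR forms. As a rough model of the decision it makes, the Python sketch below checks a client address against an allow-list using the standard ipaddress module. The addresses are documentation placeholders, and the model does not handle Apache's partial-address syntax.

```python
import ipaddress

# Placeholder allow-list: one admin workstation and one private network,
# roughly equivalent to "Require ip 192.0.2.10 10.0.0.0/8".
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def is_allowed(client_ip: str) -> bool:
    """True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when IPv4 and IPv6 values are mixed.
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "10.20.30.40", "203.0.113.7"):
        print(ip, is_allowed(ip))
```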
Migration guide¶
This guide explains how to prepare your Apollo 2.x instance, and how to migrate data from previous Web Apollo versions into 2.0.
In all cases you will need to follow the guide for setting up your 2.x instance.
Migration from Evaluation to Production:¶
If you have been running an evaluation/development version using ./apollo run-local, note that when you set up your production instance it will use a separate database, so any prior annotations will not be carried over.
If you are using the same production instance you can use scripts to delete all annotations and preferences:
scripts/delete_all_features.sh
or just the annotations:
scripts/delete_only_features.sh
If you want to start from scratch (including reloading organisms and users), you can just drop the database (when the server is not running) and the proper tables will be recreated on startup.
Migration from 2.0.X to 2.0.Y on production:¶
Installation from a downloaded release¶
- Download the desired Apollo release; the download links are at the bottom of each release's entry. Official releases are tagged as “Release” and have a green label.
- Expand the archive.
- Copy your existing apollo-config.groovy file into the directory.
- Always backup your database!
- Create a new war file by running ./apollo deploy.
- Turn off Tomcat and remove the old apollo directory and .war file in the webapps folder.
- Copy in the new .war file with the same name.
- Restart Tomcat and you are ready to go.
Note: if you choose to run two different versions of Apollo, they need to point to different database instances or you will experience problems.
Installation from a checked out github¶
If you want bleeding-edge and only moderately tested code (not recommended unless you feel you know what you’re doing), you can clone Apollo directly from our source page https://github.com/GMOD/Apollo/
Upgrading can be taken care of during a pull. Because we sometimes change the version of JBrowse, you should run:
./apollo clean-all
before building a target for production.
You can then follow the directions for deploying a downloaded release, above.
Migration from 1.0 to 2.0:¶
We provide examples in the form of [migration scripts](https://github.com/gmod/apollo/tree/master/docs/web_services/examples) in the docs/web_services/examples directory. These tools are also described in the command line tools section.
We have written many of the command line tool examples in the Groovy language, but almost any language will work (Perl, shell/curl, Python, etc.).
Migrate Annotations¶
We provide a [migration script](https://github.com/gmod/apollo/tree/master/docs/web_services/examples/groovy/migrate_annotations1to2.groovy) that connects to a single Web Apollo 1 instance and populates the annotations for an organism for a set of sequences (confusingly, also called tracks). It is best to develop your script against a development instance of Apollo 2 using a restricted set of sequences.
To get the scripts working properly, you’ll need to provide the list of sequences (or tracks) to migrate for each organism. You can get the list of tracks either from the database (select * from tracks;) or by looking in the Web Apollo annotations directory:
ls -1 /opt/apollo/annotations/ | grep Annotations | grep -v history | paste -s -d"," -
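That pipeline lists the annotations directory, keeps names containing Annotations, drops the history entries, and joins the rest with commas. The same filter in Python may be easier to adapt inside a migration script. One difference to be aware of: ls sorts according to the locale, while Python's sorted uses plain character order. The entry names in the demo are made up.

```python
from pathlib import Path


def select_tracks(names) -> str:
    """Keep names containing 'Annotations' but not 'history'; join with commas."""
    kept = sorted(n for n in names if "Annotations" in n and "history" not in n)
    return ",".join(kept)


def tracks_from_directory(annotations_dir: str = "/opt/apollo/annotations") -> str:
    """Apply select_tracks to the entries of a Web Apollo 1 annotations directory."""
    return select_tracks(p.name for p in Path(annotations_dir).iterdir())


if __name__ == "__main__":
    demo = ["Annotations-scaffold1", "Annotations-scaffold1-history", "Annotations-scaffold2"]
    print(select_tracks(demo))  # Annotations-scaffold1,Annotations-scaffold2
```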
Migrate Users¶
You have to add users de novo using something like the add_users.groovy script. In this case you create a CSV file with the email, name, password, and role (‘user’ or ‘admin’) of each user. This file is passed to the add_users.groovy script and the users are added.
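A Python sketch for producing that CSV, with one row per user in the column order given above (email, name, password, role). Whether add_users.groovy expects a header row, or a different name layout, is an assumption to check against the script's usage message before relying on this format; the sample user is invented.

```python
import csv
import io

VALID_ROLES = {"user", "admin"}


def users_csv(users) -> str:
    """users: iterable of (email, name, password, role) tuples -> CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for email, name, password, role in users:
        if role not in VALID_ROLES:
            raise ValueError(f"unsupported role {role!r} for {email}")
        writer.writerow([email, name, password, role])  # quotes fields only when needed
    return buf.getvalue()


if __name__ == "__main__":
    rows = [("jane.doe@example.org", "Jane Doe", "change-me", "user")]  # invented user
    print(users_csv(rows), end="")
```

Because the file holds passwords in plain text, it is worth deleting it once the users have been added.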
From Web Apollo 1, you should be able to pull user names out of the database (select * from users;), but there is not much overlap between users in Web Apollo 1.x and Apollo 2.x.
If you have only a few users, however, just adding them manually on the Users tab will likely be easier.
Add Organisms¶
If you only have a handful of organisms, adding them on the Organisms tab is the easiest option.
The [add_organism.groovy script](https://github.com/gmod/apollo/tree/master/docs/web_services/examples/groovy/add_organism.groovy) can help automate this process if you have a large number of migrations to handle.
Demo¶
Please use our Demo Server with demo@demo.com / demo at login to play around with our features. Annotations are routinely removed so go wild.
Then, click on the “Ref Sequence” tab from the panel on the right to choose a Group to display. You can choose one of the Groups visible in the list, or you may type the name of the Group in the “Search” box (for example: Group16.4 or Group1.37; these groups have many gene models to get you started). You may now start annotating! Once you have displayed the first group, you may also choose to switch to a different Group from the drop-down menu in the navigation bar (e.g. Group1.10 or Group1.33).
If you would like to also experience the “administrator user” please send us a request by email to the Apollo Developers Team.
The “Ref Sequence” selection panel on the right allows you to view all available reference sequences (e.g. scaffolds, groups, chromosomes, etc) and conduct bulk operations on those sequences (for example: exporting data).
You can choose a different organism from the drop-down menu on the upper left corner of the annotator panel. Please be aware you have access to the following organisms: Honeybee, Human-hg38, Yeast, and Volvox Fictious (the JBrowse demonstration sample organism). Choosing any of the other available options will show an error warning alerting you that you do not have sufficient permissions to perform the operation. Should you encounter this error, simply return to one of the organisms listed above.
If you are new to Apollo, we recommend that you read through our User Guide to learn more about the software and its functionality.
Please Note: We have not tested the current version of Apollo on Internet Explorer. If you are not able to use Apollo on IE, you will need to use a different browser such as Firefox or Google Chrome (both available free of cost).
Happy Annotating!
User’s Guide¶
This guide allows users to:
- Become familiar with the environment of the Apollo annotation tool.
- Understand Apollo’s functionality for the process of manual annotation.
- Learn to corroborate and modify computationally predicted gene models using all available gene predictions and biological evidence available in Apollo.
- Navigate through this user guide using the ‘Table of Contents’ at the bottom of this page.
General Information¶
General Process of Manual Annotation¶
The major steps of manual annotation using Apollo can be summarized as follows:
- Locate a chromosomal region of interest.
- Determine whether a feature in an existing evidence track provides a reasonable gene model to start annotating.
- Drag the selected feature to the ‘User Annotation’ area, creating an initial gene model.
- Use editing functions to edit the gene model if necessary.
- Check your edited gene model for consistency with existing homologs by exporting the FASTA formatted sequence and searching a protein sequence database, such as UniProt or the NCBI Non Redundant (NR) database, and by conducting preliminary functional assignments using the Gene Ontology (GO) database.
When annotating gene models using Apollo, remember that you are looking at a ‘frozen’ version of the genome assembly. This means that you will not be able to modify the assembled genome sequence itself, but you will be able to instruct Apollo to take into account modifications to the reference sequence and calculate their consequences. For instance, for any given protein coding gene, Apollo is able to predict the consequences that deleting a string of nucleotide residues will have on the coding sequence.
Annotation¶
Apollo allows annotators to modify and refine the precise location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. Using Apollo, annotators may corroborate or modify the structures of coding genes, pseudogenes, repeat regions, transposable elements, and non-coding RNAs (i.e. snRNA, snoRNA, rRNA, tRNA, and miRNA).
Annotating a gene¶
Below are details about both the biological principles and the technical aspects to consider when editing a gene prediction.
Select the scaffold, chromosome or linkage group where you wish to conduct your annotations.¶
Search for a specific sequence¶
If you do not know the scaffold ID and have the sequence of a transcript or protein homolog related to your gene of interest, you might use the ‘Search Sequence’ feature to run a BLAT (BLAST-Like Alignment Tool) search. Querying the assembled genome using BLAT will determine the existence of a gene model prediction that is putatively homologous to your gene of interest. Click the ‘Tools’ item on the Apollo menu bar, and select ‘Sequence Search’ from the drop-down choices. Choose to run a Protein or Nucleotide BLAT search from the drop down menu as appropriate, and paste the string of residues to be used as query. Check the box labeled ‘Search all genomic sequences’ to search the entire genome.
The existence of paralogs may cause your query to match more than one scaffold or genomic range. Select the desired genomic range to be displayed in the Apollo Main Window. The result of your query will be displayed in the browser window behind the search box, highlighted in yellow. Close the window when you are satisfied with your results. You may read more about ‘Highlights’ below.
- A word on Blat: Blat of DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more, and it may miss more divergent or shorter sequence alignments. On protein, Blat finds sequences of 80% and greater similarity to the query of length 20+ amino acids. Its speed, gained at the price of sensitivity to more distant homologs, makes Blat a commonly used tool to look up the location of a sequence in the genome or to determine the exon structure of an mRNA. Learn more about Blat here.
Initiating an annotation¶
If you have not already performed a Blat search to identify your gene of interest, you may do so at this point using the ‘Sequence search’ feature from the ‘Tools’ tab on the menu bar. You may also navigate along the scaffold using the navigation arrows. Your gene of interest may appear on the forward (sense) or reverse (anti-sense) strand. Gene predictions are labeled with identifiers, and users may retrieve additional information by selecting the entire model and using the right-click menu to select the ‘View details’ item.
After locating your gene of interest, display as many gene prediction and evidence tracks as you consider necessary to inform your annotation by ticking them from the list of available ‘Tracks’ in the ‘Annotator Panel’. It is also possible to filter the tracks displayed in this list by typing in the ‘Search’ box. Scroll through the different tracks of gene predictions and choose the one that you consider most closely reflects the actual structure of the gene. You may base your decision on prior knowledge of the reliability of each gene prediction track (e.g., select an evidence-based gene model instead of an ab initio gene prediction). Alternatively, you may compare the gene prediction tracks to a BLAST alignment or other aligned data (e.g. alignments of protein homologs, cDNAs, and RNA-Seq reads). Double click on any exon or click on one of the introns of your preferred gene model to select the entire gene model. You may also choose exons from two or more separate tracks of evidence. Drag the selected model, or all pieces of evidence, into the ‘User-created Annotations’ area.
At this point you may download the protein sequence (see ‘Get Sequences’ below) to query a protein database and help you determine if the selected gene model is, biologically speaking, an accurate approximation to the gene. For example, you may perform a protein sequence search of UniProt or NCBI’s non-redundant peptide database (nr) using BLAST. If you have knowledge of protein domains in your gene of interest, you may perform a protein domain search of the InterPro databases to verify that your selected gene model contains the expected domains. If further investigation suggests that you have not selected the best gene model to start annotating, delete it by highlighting it (as described above) and using the ‘Delete’ function from the right-click menu.
Once a gene model is selected as the best starting point for annotation, the annotator must decide whether it needs further modification. Protein or domain database searches may have already informed this decision. Scroll down the evidence tracks to see if splice sites in transcript alignments agree with the selected gene model, or if evidence suggests addition or modification of an exon is necessary. Transcript alignments (e.g. cDNA/EST/RNASeq tracks) that are significantly longer than the gene model may indicate the presence of additional coding sequence or untranslated regions (UTRs). Keep in mind that transcript alignments may be shorter than the gene model due to the fragmented nature of current transcript sequencing technologies. Similarly, protein alignments may not reflect the entire length of the coding region because divergent regions may not align well, resulting in a short protein alignment or one with gaps. Protein and transcript alignments in regions with tandem, closely related genes might also be problematic, with partial alignments to one gene, then skipping over to align the rest to a second gene.
Simple Cases¶
In this guide, a ‘simple case’ is that when the predicted gene model is correct or nearly correct, and this model is supported by evidence that mostly agrees or completely agrees with the prediction. Aligned evidence (experimental data) that extends beyond the predicted model is assumed to be non-coding sequence. The following sections describe simple modifications.
Add UTRs¶
Gene predictions may or may not include UTRs. If transcript alignment data are available and extend beyond your original annotation, you may add or extend UTRs. To do this, you may use the edge-matching options ‘Set as 5’ end’, ‘Set as 3’ end’, or ‘Set as both ends’ from the right-click menu. To use these options, select the exon that needs to be extended, then keep the ‘Shift’ key down as you select the exon from the track of evidence displaying the expected UTR (given the evidence), then use the right-click menu to choose the appropriate option to extend to the desired UTR.
Alternatively this operation can be performed manually by positioning the cursor at the edge of the exon that needs to be extended, then using the right-click to display the menu and choosing the ‘Zoom to base level’ option. Place the cursor over the edge of the exon (5’ or 3’ end exon as needed) until it becomes a black arrow (see Fig. 2), then click and drag the edge of the exon to the new coordinate position that includes the UTR. To add a new, spliced UTR to an existing annotation follow the procedure for adding an exon, as detailed in the section ‘Add an Exon’ below.
Exon Structure Integrity¶
Zoom in sufficiently to clearly resolve each exon as a distinct rectangle. When two exons from different tracks share the same start and/or end coordinates, a red bar appears at the edge of the exon. Visualize this edge-matching function by selecting either the whole annotation or one exon at a time. By scrolling along the length of the annotation, exon boundaries may be verified against available EST data. Check whether there are any ESTs, transcript data contigs, or RNA-Seq reads showing evidence that the annotation is missing one or more exons, or that it includes additional exons.
You may use square bracket keys [ and ] to jump to the next exon splice junction or coding sequence (CDS). The curly bracket keys { and } allow users to jump to the next transcript.
To correct an exon boundary to match data in the evidence tracks, use the edge-matching options from the right-click menu as described in the ‘Add UTRs’ section above. Alternatively you may ‘Zoom to base level’, click on the exon to select it and place the cursor over the edge of the exon; when the cursor changes to an arrow, drag the edge of the exon to the desired new coordinates.
In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some support one or more alternative transcripts. Try to annotate as many alternative transcripts as the evidence data support.
Figure 2. Apollo view, zoomed to base level.¶
The DNA track and annotation track are visible. The DNA track includes the sense strand (top) and anti-sense strand (bottom). The six reading frames flank the DNA track, with the three forward frames above and the three reverse frames below. The ‘User-created Annotation’ track shows the terminal end of an annotation. The green rectangle highlights the location of the nucleotide residues in the ‘Stop’ signal.
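The six reading frames in Figure 2 are the three possible codon phases on each strand. The Python sketch below translates a DNA string in all six, using the standard genetic code with * for stop codons, which can help when checking by hand where a stop such as the highlighted one falls. The frame labels (+1 to +3, -1 to -3) are a common convention, not necessarily how Apollo numbers its frames.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons containing N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Map frame labels (+1..+3 forward, -1..-3 reverse) to translations."""
    fwd, rev = seq.upper(), reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(fwd[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGAAATAG").items():
        print(label, protein)
```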
Splice Sites¶
In most eukaryotes the majority of splice sites at the exon/intron boundaries appear as 5’-…exon]GT/AG[exon…-3’. All other splice sites are here called ‘non-canonical’ and are indicated in Apollo with an orange circle with a white exclamation point inside, placed over the edge of the offending exon. When alternative transcripts are added, be sure to inspect each splice site to check whether any of them have changed.
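To illustrate the GT/AG rule, the sketch below flags introns whose sequence does not start with GT and end with AG. It assumes a forward-strand genomic string and 0-based, half-open exon coordinates; the function names are hypothetical, and Apollo’s own check may differ in detail.

```python
def intron_is_canonical(genome, intron_start, intron_end):
    """True if genome[intron_start:intron_end] begins with GT and ends with AG.

    Coordinates are 0-based and half-open on the forward (sense) strand.
    """
    intron = genome[intron_start:intron_end].upper()
    return intron.startswith("GT") and intron.endswith("AG")


def non_canonical_introns(genome, exons):
    """Return (start, end) for each intron between consecutive exons
    that does not follow the GT...AG rule."""
    ordered = sorted(exons)
    flagged = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        if not intron_is_canonical(genome, left_end, right_start):
            flagged.append((left_end, right_start))
    return flagged


# exon 1 = CCATGAAA, intron = GTCCCCCCCCAG, exon 2 = TTTTAGCC
genome = "CCATGAAA" + "GTCCCCCCCCAG" + "TTTTAGCC"
print(non_canonical_introns(genome, [(0, 8), (20, 28)]))  # []
print(non_canonical_introns(genome, [(0, 8), (19, 28)]))  # [(8, 19)]
```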
If a non-canonical splice site is present, zoom to base level to review it. Not all non-canonical splice sites need to be corrected; those left in place should be flagged with the appropriate comment. (Adding a ‘Comment’ is addressed in the section that details the ‘Information Editor’.)
Prior knowledge about the organism of interest may help the user decide whether a predicted non-canonical splice site is likely to be real. For instance, GC splice donors have been observed in many organisms, but less frequently than the GT splice donors described above. As mentioned above Apollo flags GC splice donors as non-canonical. To further complicate the problem, splice sites that are non-canonical, but found in nature, such as GC donors, may not be recognized by some gene prediction algorithms. In such cases a gene prediction algorithm that does not recognize GC splice donors may have ignored a true GC donor and selected another non-canonical splice site that is less frequently observed in nature. Therefore, if a non-canonical splice site that is rarely observed in nature is present, you may wish to search the region for a more frequent in-frame non-canonical splice site, such as a GC donor. If there is a close in-frame site that is more likely to be the correct splice donor, make this adjustment while zoomed at base level.
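One way to picture this search: moving a splice donor by a multiple of three nucleotides keeps the downstream reading frame unchanged, so candidate donors can be collected at those offsets and ranked with GT ahead of GC. The function below is a hypothetical sketch under those assumptions (0-based forward-strand coordinates, a 30 nt search window chosen arbitrarily). It does not check whether a shift introduces a premature ‘Stop’, which you would still verify in Apollo.

```python
def in_frame_donor_candidates(genome, donor, window=30, motifs=("GT", "GC")):
    """List alternative donor sites near `donor` (0-based index of the first
    intron base) whose offset from it is a multiple of 3.

    Returns (position, dinucleotide) pairs ordered first by motif preference
    (the order of `motifs`) and then by distance from the current donor.
    """
    hits = []
    for step in range(-(window // 3), window // 3 + 1):
        shift = 3 * step  # multiples of 3 keep the downstream frame intact
        position = donor + shift
        if shift == 0 or position < 0 or position + 2 > len(genome):
            continue
        dinucleotide = genome[position:position + 2].upper()
        if dinucleotide in motifs:
            hits.append((motifs.index(dinucleotide), abs(shift), position, dinucleotide))
    hits.sort()
    return [(position, dinucleotide) for _, _, position, dinucleotide in hits]


# Current donor at index 9 reads "AA" (non-canonical); GT sits 6 nt upstream
# and GC sits 3 nt downstream.
genome = "AAA" + "GT" + "A" * 7 + "GC" + "A" * 10
print(in_frame_donor_candidates(genome, 9))  # [(3, 'GT'), (12, 'GC')]
```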
To assist in the decision to modify a splice site, download the translated sequences and use them to search well-curated protein databases, such as UniProt, to see if you can resolve the question using protein alignments. Incorrect splice sites would likely cause gaps in the alignments. If there does not appear to be any way to resolve the non-canonical splice, leave it as is and add a comment.
‘Start’ and ‘Stop’ Sites.¶
By default, Apollo will calculate the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons. To check the accuracy of ‘Start’ and ‘Stop’ signals, you may use the translated sequence to query a known protein database, such as UniProt, to determine whether the ends of the protein sequence correspond with those of known proteins.
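The longest-ORF rule can be sketched as a scan of the three forward frames of the spliced transcript for the longest stretch from an ATG to the next in-frame stop codon. This is an illustrative Python sketch, not Apollo’s code; it only considers complete ORFs on the forward strand of the transcript and returns 0-based, half-open coordinates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna):
    """Return (start, end) of the longest ATG...stop ORF in the three forward
    frames of `cdna`; `end` is exclusive and includes the stop codon.
    Returns None if no complete ORF exists."""
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))    # (2, 14)
print(longest_orf("ATGTAAATGAAACCCTGA"))  # (6, 18)
```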
If it appears that Apollo did not calculate the correct ‘Start’ signal, the user can modify it. To set the ‘Start’ codon manually, position the cursor over the first nucleotide of the candidate ‘Start’ codon and select the ‘Set translation start’ option from the right-click menu. Depending on evidence from a protein database search or additional evidence tracks, you may wish to select an in-frame ‘Start’ codon further upstream or downstream. An upstream ‘Start’ codon may be present outside the predicted gene model, within a region supported by another evidence track; see the section ‘Add an Exon’ below. When necessary, it is also possible to ‘Set translation end’ from the right-click menu.
Note that the ‘Start’ codon may also be located in a non-predicted exon further upstream. If you cannot identify that exon, add the appropriate comment (using the transcript comment section in the ‘Comments’ table of the ‘Information Editor’ as described below).
In rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG). Check whether a non-canonical ‘Start’ codon is usually present in homologs of this gene, and/or check whether this is a likely occurrence in this organism. If appropriate, you may override the predicted ‘Start’ by manually setting it to a non-canonical ‘Start’ codon, choosing the one that most closely reflects what you know about the protein, and has the best support from the biological evidence tracks. Add the appropriate comment (using the transcript comment section in the ‘Comments’ table of the ‘Information Editor’ as described below).
In some cases, a ‘Stop’ codon may not be automatically identified. Check whether there are data supporting a 3’ extension of the terminal exon or additional 3’ exons with valid splice sites; see the section ‘Add an Exon’ below. Each time you add an exon region, whether by extending an existing exon or adding a new one, Apollo recalculates the longest ORF to identify ‘Start’ and ‘Stop’ signals, allowing you to determine whether a ‘Stop’ codon has been incorporated after each editing step.
Predicted Protein Products.¶
If any of your manipulations have thrown an exon out of frame, or caused other drastic changes to the translated sequence, Apollo will warn you by changing the display of the model in the ‘User-created Annotations’ area from a light-blue protein-coding stretch to a truncated model shown as a darker blue, narrower rectangle.
If the annotation looks good, obtain the protein sequence (see the ‘Get Sequences’ section below) and use it to search a protein database, such as UniProt or NCBI NR. Keep in mind that the best BLAST hit may be the exact prediction from which you initiated your annotation; you should not consider the identical protein from your organism as external evidence supporting the annotation. Instead, look at alignments to proteins from other organisms.
Additional Functionality¶
Figure 3. Additional functionality.¶
This is the right-click menu.
Get Sequences¶
Select one or more exons, or an entire gene model of interest, and open the right-click menu to select the ‘Get sequence’ function. Choose from the options to obtain protein, cDNA, CDS or genomic sequences.
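The four sequence types relate to one another as shown in this hypothetical Python sketch: the genomic sequence spans the whole model, the cDNA joins the exons, the CDS is a sub-range of the cDNA, and the protein is the CDS translated with the standard genetic code. It assumes a forward-strand model, 0-based half-open exon coordinates, and CDS bounds given in transcript (cDNA) coordinates; stop codons appear as "*".

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def get_sequences(genome, exons, cds_start, cds_end):
    """Return genomic, cDNA, CDS and protein sequences for a forward-strand model."""
    ordered = sorted(exons)
    genomic = genome[ordered[0][0]:ordered[-1][1]]  # introns included
    cdna = "".join(genome[start:end] for start, end in ordered)  # introns removed
    cds = cdna[cds_start:cds_end]
    return {"genomic": genomic, "cdna": cdna, "cds": cds, "protein": translate(cds)}


genome = "CCATGAAA" + "GTCCCCCCCCAG" + "TTTTAGCC"
sequences = get_sequences(genome, [(0, 8), (20, 28)], 2, 14)
print(sequences["cdna"])     # CCATGAAATTTTAGCC
print(sequences["protein"])  # MKF*
```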
Merge Exons, Merge Transcripts¶
Select each of the exons to be joined while holding down the ‘Shift’ key, then open the right-click menu and select the ‘Merge’ option.
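In coordinate terms, merging neighboring exons absorbs the intron between them. This hypothetical Python helper models that with (start, end) tuples; it assumes the selected exons are adjacent, and it is not how Apollo stores features internally.

```python
def merge_exons(exons, selected):
    """Replace the exons at the `selected` indices with a single exon that
    spans from the smallest selected start to the largest selected end."""
    chosen = [exons[i] for i in selected]
    merged = (min(start for start, _ in chosen), max(end for _, end in chosen))
    selected_set = set(selected)
    kept = [exon for i, exon in enumerate(exons) if i not in selected_set]
    return sorted(kept + [merged])


print(merge_exons([(0, 8), (20, 28), (40, 50)], [0, 1]))  # [(0, 28), (40, 50)]
```

As with any edit, the merged model should then be checked for changes to the reading frame.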
Add an Exon¶
You may select and drag the putative new exon from a track in the ‘Evidence’ panel and add it directly to an annotated transcript in the ‘User-created Annotations’ area. Click the exon and, holding down the mouse button, drag the exon until it hovers over the receiving transcript. The receiving transcript will be highlighted in dark green when it is okay to release the mouse button. When the mouse button is released, the additional exon becomes attached to the receiving transcript. If the receiving transcript is on the opposite strand from the one where you selected the new exon, a warning dialog box will ask you to confirm the change.
Apollo dynamically recalculates the longest ORF for each model, so you must check whether adding one or more exons disrupts the reading frame, inserts premature ‘Stop’ signals, etc.
Make an Intron, Split an Exon¶
Selecting the ‘Make intron’ option from the right-click menu over an exon will identify the nearest canonical splice sites (5’-…exon]GT/AG[exon…-3’) to modify the model, and Apollo will also recalculate the longest ORF. If Apollo cannot find a set of canonical splice sites within the selected exon, a dialog box will appear with a warning.
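The behavior described above can be modeled as a search outward from the clicked position: the nearest GT at or upstream of the click becomes the donor and the nearest AG downstream becomes the acceptor. Apollo’s actual search rules are not spelled out here, so treat this Python function as a hypothetical sketch on a forward-strand exon with 0-based, half-open coordinates; returning None corresponds to the warning dialog.

```python
def make_intron(genome, exon, click):
    """Split `exon` into two exons around a GT...AG intron near `click`.

    Returns [(start, donor), (acceptor_end, end)], or None if no canonical
    donor/acceptor pair is found inside the exon.
    """
    start, end = exon
    # Nearest GT at or upstream of the click, leaving at least one exon base before it.
    donor = next((d for d in range(click, start, -1)
                  if genome[d:d + 2].upper() == "GT"), None)
    if donor is None:
        return None
    # Nearest AG downstream of both donor and click, leaving at least one exon base after it.
    acceptor = next((a for a in range(max(click, donor + 2), end - 2)
                     if genome[a:a + 2].upper() == "AG"), None)
    if acceptor is None:
        return None
    return [(start, donor), (acceptor + 2, end)]


genome = "CCATGAAA" + "GTCCCCCCCCAG" + "TTTTAGCC"
print(make_intron(genome, (0, 28), 13))  # [(0, 8), (20, 28)]
```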
If everything you know about the model indicates that an exon should not be preserved in its current form, you may manually disrupt the exon using the ‘Split’ option from the right-click menu, which creates a 1-nucleotide intron without taking into account whether or not the surrounding splice sites are canonical.
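In coordinate terms, a split just removes a single base from the exon, as in this hypothetical Python sketch (0-based, half-open coordinates; which base Apollo removes relative to the cursor is an assumption here). No splice-site check is made, matching the description above.

```python
def split_exon(exon, position):
    """Split a 0-based, half-open exon by turning the base at `position`
    into a 1-nucleotide intron; splice-site canonicity is not checked."""
    start, end = exon
    if not start < position < end - 1:
        raise ValueError("split position must leave at least one base on each side")
    return [(start, position), (position + 1, end)]


print(split_exon((100, 200), 150))  # [(100, 150), (151, 200)]
```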
Delete an Exon¶
Select the exon using a single click (double click selects the entire model), and select the ‘Delete’ option from the right-click menu. Check whether deleting one or more exons disrupts the reading frame, inserts premature ‘Stop’ signals, etc.
Flip the Strand of Annotation¶
At times, transcript alignments may appear on the strand opposite to the model’s coding strand, particularly when the transcript alignment does not include a splice junction, which makes it difficult to determine the coding direction. If aligned evidence is used to initiate an annotation, and it is later determined that the annotation is on the incorrect strand, the user may choose the ‘Flip strand’ option from the right-click menu to reverse the orientation of the annotation. As mentioned before, annotators should always reassess the integrity of the translation after modifying an annotation.
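Flipping the strand keeps the annotation’s genomic coordinates but reads them on the opposite strand, which is why the translation must be reassessed. A minimal Python sketch, with hypothetical helper names and 0-based, half-open forward-strand exon coordinates:

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (characters other than ACGTN are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def transcript_sequence(genome, exons, strand):
    """Spliced sequence of a transcript read in its coding direction.

    Flipping `strand` keeps the exon coordinates and only changes how
    the joined sequence is read.
    """
    cdna = "".join(genome[start:end] for start, end in sorted(exons))
    return cdna if strand == "+" else reverse_complement(cdna)


genome = "ATGAAATAG"
print(transcript_sequence(genome, [(0, 9)], "+"))  # ATGAAATAG
print(transcript_sequence(genome, [(0, 9)], "-"))  # CTATTTCAT
```

In this toy example the flipped model no longer begins with a ‘Start’ codon, so its ORF would have to be recalculated.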
Complex Cases¶
Merge Two Gene Predictions on the Same Scaffold¶
Evidence may support the merge of two (or more) different gene models. To begin the annotation select all the gene models that you would like to merge, then drag them from the ‘Evidence’ panel onto the ‘User-created Annotations’ area. Be aware that protein alignments may not be a useful starting point because these may have incorrect splice sites and may lack non-conserved regions.
You may select the supporting evidence tracks and drag their ‘ghost’ over the candidate models (without releasing them) to corroborate the overlap. Additionally, zoom in and carefully review edge-matching (Figure 4) and coverage across models.
Alternatively, you may select and drag each proposed gene model separately onto the ‘User-created Annotations’ area. Once you are certain that two models should be merged, after checking boundaries and all supporting evidence, bring them together by holding the ‘Shift’ key and clicking on an intron from each of the merging gene models; in this way you will select both models completely. Then select the ‘Merge’ option from the right-click menu. Get the resulting translation sequence and inspect it by querying a protein database, such as UniProt. Be sure to record the IDs of all starting gene models in the ‘Comments’ table, and use the appropriate canned comment to indicate that this annotation is the result of a merge.
Figure 4. Edge-matching in Apollo.¶
When a feature is selected, the exon edges are marked with a red box. All other features that share the same exon boundaries are marked with a red line on the matching edge. This feature allows annotators to confirm that evidence is in agreement without examining each exon at the base level.
Merge Two Gene Predictions from Different Scaffolds¶
It is not yet possible to merge two annotations across scaffolds; however, annotators should document in the ‘Comments’ table of both components that the data support a merge. For standardization purposes, please use the following two prepared (canned) comments, adding the names of both models in every case:
- “RESULT OF: merging two or more gene models across scaffolds”
- “RESULT OF: merging two or more gene models. Gene models involved in merge:”
Split a Gene Prediction¶
When different segments of a predicted protein align to two or more different families of protein homologs, and when the predicted protein does not align to any known protein over its entire length, one or more splits may be recommended. Transcript data may show evidence in support of a split; be sure to verify that it is not a case of alternative transcripts!
A split can be created in one of two ways:
- Select the flanking exons and use the right-click menu option ‘Split’, or
- Annotate each resulting fragment independently.
You should obtain the resulting translation, and check it by searching a protein database, such as UniProt. Be sure to record the original ID for both annotations in the ‘Comments’ section.
Frameshifts, Single base Errors, and Selenocysteine-containing Products¶
Apollo allows annotators to make single base modifications and frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. Note that these manipulations do NOT change the underlying genomic sequence. Changes are made on the DNA track with the right-click menu.
If you determine that you need to make one of these changes, zoom in to the nucleotide level, and right-click over the genomic sequence to access the menu with options for introducing sequence changes such as insertions, deletions or substitutions. The selected nucleotide must be the starting point for each modification.
- The ‘Create Genomic Insertion’ option requires a string of nucleotide residues that will be inserted to the right of the cursor’s current coordinate.
- The ‘Create Genomic Deletion’ option requires the length of the deletion, starting with the nucleotide where the cursor is positioned.
- When using the ‘Create Genomic Substitution’ option, enter the string of nucleotide residues that will replace the ones on the DNA track.
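The three operations can be modeled as edits to a copy of the genomic string, leaving the original untouched, much as Apollo leaves the underlying genome unchanged. In this hypothetical Python sketch, positions are 0-based, an insertion goes after the base at its position, and alterations are applied from right to left so that earlier positions are not shifted by later edits; overlapping alterations are not handled.

```python
def apply_alterations(genome, alterations):
    """Return an altered copy of `genome` (Python strings are immutable, so
    the original is never modified).

    `alterations` is a list of (kind, position, value) tuples:
      ("insertion", pos, "ACG")   insert after the base at pos
      ("deletion", pos, length)   delete `length` bases starting at pos
      ("substitution", pos, "T")  replace bases starting at pos
    """
    seq = genome
    for kind, pos, value in sorted(alterations, key=lambda a: a[1], reverse=True):
        if kind == "insertion":
            seq = seq[:pos + 1] + value + seq[pos + 1:]
        elif kind == "deletion":
            seq = seq[:pos] + seq[pos + value:]
        elif kind == "substitution":
            seq = seq[:pos] + value + seq[pos + len(value):]
        else:
            raise ValueError(f"unknown alteration type: {kind}")
    return seq


genome = "ATGTAACCTAG"  # toy sequence with a substitution and a 1-nt frameshift
fixed = apply_alterations(genome, [("substitution", 3, "A"), ("insertion", 7, "C")])
print(fixed)   # ATGAAACCCTAG
print(genome)  # ATGTAACCTAG (unchanged)
```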
Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which can be obtained by selecting the ‘Get Sequence’ option from the right-click menu. Because these modifications are reflected in all annotations that include the modified region, you should use the ‘Comments’ section to alert the curators of your organism’s database to these CDS edits.
It is also possible to annotate special cases, such as selenocysteine-containing proteins or read-through ‘Stop’ signals, by using the right-click menu and selecting the ‘Set readthrough stop codon’ option. The current TGA ‘Stop’ codon will be highlighted in purple, and the next in-frame ‘Stop’ signal will be used as the end of translation. Note that Apollo will automatically add the remaining amino acids to the resulting sequence. Add a comment in the ‘Comments’ section for this transcript to record this modification.
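In translation terms, a readthrough turns the marked TGA into selenocysteine (single-letter code U) and lets translation continue to the next in-frame stop. The following Python sketch uses a hypothetical function name, the standard genetic code, and a forward-strand CDS that starts at its first codon:

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI translation table 1); "*" marks stop codons.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_with_readthrough(cds, readthrough_at=None):
    """Translate `cds` up to the first in-frame stop codon.

    If `readthrough_at` is the 0-based offset of a TGA codon, that codon is
    read as selenocysteine (U) and translation continues to the next stop.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if i == readthrough_at:
            if codon != "TGA":
                raise ValueError("only a TGA codon can be read through here")
            protein.append("U")
            continue
        amino_acid = CODON_TABLE.get(codon, "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


cds = "ATGAAATGACCCTAAGG"
print(translate_with_readthrough(cds))     # MK
print(translate_with_readthrough(cds, 6))  # MKUP
```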
Annotating Repeat Regions, Transposable Elements, and Non-coding (nc) RNAs¶
Apollo allows users to annotate a variety of ncRNAs and other regulatory elements.
If you don’t know the location of the feature you wish to annotate, perform a BLAT search to identify the sequence of interest using the ‘Sequence search’ feature from the ‘Tools’ tab on the menu bar (see also the section on how to ‘Search for a specific sequence’). You may also navigate along the scaffold using the navigation arrows. All non-coding elements are labeled with identifiers, and users may retrieve additional information by selecting the feature and using the right-click menu to select the ‘View details’ item.
Once the genomic element and track of interest are located in the ‘Evidence’ panel, right-click the desired feature and choose the ‘Create New Annotation’ option to start an annotation. After the user chooses an element from the menu, the new annotation appears in the ‘User-created Annotations’ track. The type of an annotation already present in the ‘User-created Annotations’ track cannot be changed.
Modifications such as editing boundaries, duplicating, and deleting the annotation, as well as the ‘History’, ‘Redo’ and ‘Undo’ functions, are possible for all non-coding features. Additional modifications such as ‘Split’ and ‘Make intron’ are also possible for ncRNAs.
All metadata about the annotation should be added using the ‘Information Editor’, as described below.
The Information Editor¶
Information about the ‘Name’, ‘Symbol’, and ‘Description’ for a Gene, Transcript, repeat region, transposable element, or non-coding RNA can be modified in the ‘Information Editor’. There is also an option to report to the lead curators, using the ‘Status’ buttons, whether a manual annotation needs to be reviewed (‘Needs review’) or has already been ‘Approved’.
Users will also be able to input information about their annotations in fields that capture:
- ‘Comments’ on the process of annotation.
- Cross-references to other databases in ‘DBXRefs’.
- Additional ‘Attributes’ in a ‘tag/value’ format that pertain to the annotation.
- References to any published data in the PubMed database using ‘Pubmed IDs’.
- Gene Ontology (GO) annotations, which can be added by typing text or GO identifiers. The auto-complete function will retrieve the desired information. A drop-down menu at the top of the ‘Information Editor’ allows users to switch between isoforms while editing these metadata.
All the information captured in these tables will be incorporated into the exported files of the ‘User-created Annotations’, and will appear in Column 9 of the GFF3 that is generated.
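Column 9 of GFF3 holds tag=value pairs separated by semicolons, with multiple values separated by commas; the GFF3 specification requires the reserved characters ; = & , in values to be percent-encoded. The Python sketch below builds such a string. The tags used (Note, Dbxref, Ontology_term) are reserved GFF3 attribute names, but which tag Apollo actually writes for each ‘Information Editor’ field is an assumption here, and the name and accession in the example are placeholders.

```python
# Characters with reserved meaning in GFF3 column 9 (plus "%" itself and
# common control characters), mapped to their percent-encoded forms.
_GFF3_ESCAPES = {
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
    "\t": "%09", "\n": "%0A", "\r": "%0D",
}


def escape_gff3(text):
    return "".join(_GFF3_ESCAPES.get(char, char) for char in text)


def column9(attributes):
    """Build a GFF3 column-9 string from a tag -> list-of-values mapping,
    keeping the mapping's insertion order."""
    return ";".join(
        f"{escape_gff3(tag)}={','.join(escape_gff3(v) for v in values)}"
        for tag, values in attributes.items()
    )


print(column9({
    "Name": ["example-transcript"],                # placeholder name
    "Note": ["Merged two models; see comments"],   # a comment containing ";"
    "Dbxref": ["ExampleDB:0001"],                  # placeholder accession
    "Ontology_term": ["GO:0005515"],
}))
# Name=example-transcript;Note=Merged two models%3B see comments;Dbxref=ExampleDB:0001;Ontology_term=GO:0005515
```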
Add Comments¶
When you are satisfied with your annotation, you may provide additional information in the form of ‘Comments’. For example, the ID of the gene prediction that you used to initiate the annotation presents useful information for your database curators. Functional information obtained from homologs may also be useful, e.g. homolog ID, description, gene name, gene symbol. You should also indicate the type of changes made to the annotation, and whether a gene is split across scaffolds, as described in previous sections.
For each annotated element first click to select it, then use the right-click option to select ‘Information Editor’ from the menu. In the case of coding genes, pseudogenes, and ncRNAs the ‘Information Editor’ window displays information for both the gene and the transcript; users should determine whether the comment is more appropriate for the gene (e.g. a change in the gene symbol) or an individual transcript (e.g. type of alterations made). In the case of repetitive elements and transposable elements, the ‘Information Editor’ window has only one column.
In the ‘Information Editor’ window click on the respective ‘Add’ button to start a new comment; a new row, labeled as ‘Enter new comment’, will appear. One click on this row reveals a drop-down menu option on the right, which displays canned comments to choose if they are available for your organism of interest. Alternatively, it is also possible to type custom comments. To edit an existing comment, click over the comment and begin typing, or replace it with a different canned comment. Comments that are no longer relevant or useful may be removed using the ‘Delete’ button at the bottom of the box.
Add Database Cross-references, PubMed IDs, and GO IDs¶
When available, users should also include cross-references to other databases by adding the name of the database and the corresponding accession number for each gene or transcript to the respective ‘DBXRefs’ table. Any published information in support of this annotation (e.g. whether the gene has already been part of a publication) should be included by adding a ‘PubMed ID’ in the provided field, and available functional information should be added using GO IDs as appropriate. The process to add information to these tables is the same as described for the ‘Comments’ tables.
Add Attributes¶
Any additional information about the gene model or transcript that can be expressed as a ‘tag/value’ entry and that provides further evidence in support of the manual annotation can be captured in the ‘Attributes’ table. The process to add information to this table is the same as described for the ‘Comments’ tables.
(No need for) Saving your Annotations¶
Apollo immediately saves your work, automatically recording it in the database. Because of this, your work will not be lost in the event of network disruptions, and no further action is required to save your work.
Exporting Data¶
The user-created annotations may be exported as GFF3 and FASTA formatted files. These operations may be done for either a single scaffold, or to include user-created annotations from the entire assembled genome. See the section on ‘Ref Sequence Tab’ under ‘Annotator Panel’ to learn more about how to export data.
Data from each of the evidence and prediction tracks can also be exported. GFF3 formatted files of the visible region on the Apollo screen, as well as files containing data from the entire scaffold/chromosome can be exported. The data will be formatted according to the original data used to display each track. For instance, RNA-Seq reads could be exported either as GFF3 or BED file formats.
Public Demo¶
The Apollo Demo uses the genome of the honey bee (Apis mellifera). Below are details about the experimental data provided as supporting evidence.
Evidence in support of protein coding gene models¶
Consensus Gene Sets:¶
- Official Gene Set v3.2
- Official Gene Set v1.0
Consensus Gene Sets comparison:¶
- OGSv3.2 genes that merge OGSv1.0 genes
- OGSv3.2 genes that merge RefSeq genes
- OGSv3.2 genes that split OGSv1.0 genes
- OGSv3.2 genes that split RefSeq genes
Protein Coding Gene Predictions Supported by Biological Evidence:¶
- NCBI Gnomon
- Fgenesh++ with RNASeq training data
- Fgenesh++ without RNASeq training data
- NCBI RefSeq Protein Coding Genes
- NCBI RefSeq Low Quality Protein Coding Genes
Ab initio protein coding gene predictions:¶
- Augustus Set 12
- Augustus Set 9
- Fgenesh
- GeneID
- N-SCAN
- SGP2
Transcript Sequence Alignment:¶
- NCBI ESTs
- Apis cerana reads (RNA-Seq)
- Forager Bee Brain Illumina Contigs
- Nurse Bee Brain Illumina Contigs
- Forager RNA-Seq reads
- Nurse RNA-Seq reads
- Abdomen 454 Contigs
- Brain and Ovary 454 Contigs
- Embryo 454 Contigs
- Larvae 454 Contigs
- Mixed Antennae 454 Contigs
- Ovary 454 Contigs
- Testes 454 Contigs
- Forager RNA-Seq HeatMap
- Forager RNA-Seq XY Plot
- Nurse RNA-Seq HeatMap
- Nurse RNA-Seq XY Plot
Protein homolog alignment:¶
- Acep_OGSv1.2
- Aech_OGSv3.8
- Cflo_OGSv3.3
- Dmel_r5.42
- Hsal_OGSv3.3
- Lhum_OGSv1.2
- Nvit_OGSv1.2
- Nvit_OGSv2.0
- Pbar_OGSv1.2
- Sinv_OGSv2.2.3
- Znev_OGSv2.1
- Metazoa_Swissprot
Additional Information About Apollo¶
Apollo is an open-source project and is under active development. If you have any questions, you may contact the Apollo development team or join the conversation on the Apollo mailing list by filling out this form. We provide additional documentation for installation and setup. Our demo page provides information on connecting to our demonstration site. Apollo is a member of the GMOD project.
Permissions guide¶
Global¶
- admin: access to everything
- user: only guarantees a login, with permissions configured on a per-organism basis
Organism¶
Can only view things related to that organism.
- read: view / search only, no annotation
  - Annotations: lock detail / coding
  - RefSeq: hide export
  - Organism: hide
  - User: hide
  - Group: hide
  - Preferences: hide
  - JBrowse: disable the User-created Annotations (UcA) track
- export: same as read, but can use the export screen
  - RefSeq: show export
- write: same as above, but can add / edit annotations
  - Annotations: allow editing
  - JBrowse: enable UcA track
- admin: access to everything for that organism
  - Organism: show
  - User: show
  - Group: show
  - Preferences: (still hide)
Table of permissions:
| Permission | Annotator | Users/groups | Annotations | Organism |
|------------|--------------------|---------------|---------------------|---------------------------|
| READ | visible / locked | hide | visible / no export | visible |
| EXPORT | visible / locked | hide | visible / export | visible |
| WRITE | visible + editable | hide | visible / export | visible |
| ADMIN | visible + editable | visible | visible / export | visible + admin functions |
| NONE | not available | not available | not available | not visible |
The Preference panel is available only for GLOBAL admin.
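The organism-level rules above form a strict ordering: each level is “same as” the one before plus one extra capability. A hypothetical Python model of that ordering, derived only from the table above (the enum and action names are illustrative, not Apollo’s own), might look like this:

```python
from enum import IntEnum


class OrganismPermission(IntEnum):
    """Organism-level permissions, ordered so that each level includes
    everything granted by the levels below it (illustrative names)."""
    NONE = 0
    READ = 1
    EXPORT = 2
    WRITE = 3
    ADMIN = 4


# Minimum permission needed for each action, following the permissions table.
REQUIRED = {
    "view": OrganismPermission.READ,           # annotations visible
    "export": OrganismPermission.EXPORT,       # export screen available
    "edit": OrganismPermission.WRITE,          # annotations editable
    "manage_users": OrganismPermission.ADMIN,  # users / groups visible
}


def is_allowed(permission, action):
    """True if `permission` meets or exceeds the level required for `action`."""
    return permission >= REQUIRED[action]


print(is_allowed(OrganismPermission.EXPORT, "edit"))  # False
print(is_allowed(OrganismPermission.ADMIN, "edit"))   # True
```

Using an ordered enum keeps each check a single comparison; it does not model the separate global admin role that controls the Preference panel.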
Developer’s guide¶
Here we will introduce how to set up Apollo on your server. In general, there are two modes of deploying Apollo. There is “development mode”, where the application is launched automatically in a temporary server, and there is “production mode”, which typically requires a separate external database and a Tomcat server where you can deploy the generated war file.
This guide covers the “development mode” scenario, which should be easy to start. To set up a production environment, please see the setup guide.
Java / JDK¶
You have to install Java and the Java Development Kit (JDK) 8 or higher to run Apollo. Both the Oracle and OpenJDK versions have been tested.
Node.js / NPM¶
You will need to install node.js, which includes NPM (the node package manager) to build Apollo.
nvm is highly recommended for installing and managing multiple versions of Node. Node v6 and up should work, but we recommend Node v8 or later.
Grails / Groovy / Gradle (optional)¶
Installing Grails (application framework), Groovy (development language), or Gradle (build environment) is not required (they will install themselves), but it is suggested for doing development.
This is most easily done by using SDKMAN (formerly GVM), which can automatically set up Grails for you.
curl -s http://get.sdkman.io | bash
sdk install grails 2.5.5
sdk install gradle 2.11
sdk install groovy
Get the code¶
To set up Apollo, you can download our latest release from our official releases page as a compressed zip or tar.gz file.
Alternatively you can check it out from git directly as follows:
git clone https://github.com/GMOD/Apollo.git Apollo
cd Apollo
git checkout <XYZ>
The checkout step is optional; XYZ is the tagged version you want from here: https://github.com/GMOD/Apollo/releases
Verify install requirements¶
We can now perform a quick-start of the application in “development mode” with this command:
./apollo run-local
The JBrowse and perl pre-requisites will be installed during this step, and if it succeeds, a temporary server will be launched automatically at http://localhost:8080/apollo.
Note: You can also supply a port number, e.g. ./apollo run-local 8085, if there are conflicts on port 8080.
Also note: if there are any errors at this step, check the setup.log file. You can refer to the troubleshooting guide; often an error just means that the pre-requisites or perl modules failed to install.
Finally, note that “development mode” uses an in-memory H2 database for storing data by default. The setup guide will show you how to configure custom database settings.
Running the code¶
There are several distinct parts of the code.
- Apollo client plugin (JS: dojo, jquery, etc.) in client directory
- Server (Grails 2.5.5: Groovy and Java) in grails-app, src, web components and tests.
- Side-panel code / wrapper code (GWT 2.8: Java). Code is java and/or XML in src/gwt.
- Tools / scripts in the examples and tools: Groovy, perl, bash
- JBrowse (JS: dojo, jquery, etc.)
In general, the command ./apollo run-local will build and run the client and the server code. Subsequent runs that do not change the GWT code can use ./apollo run-app. Changes to domain objects or adding controller methods may require stopping and restarting the server, but most other changes will compile without a restart.
./apollo test runs the Grails unit and integration tests.
Updating the web-service doc can be done with ./apollo create-rest-doc.
Running the code for making client plugin changes¶
After starting the server, you can run ./gradlew installJBrowseWebOnly or ./apollo jbrowse to push changes from the JavaScript code in the client/apollo directory.
If for some reason this is not working, make sure that caching is disabled in the network tab of your browser’s developer tools. You can also run the command ./gradlew copy-resources-dev manually each time if the files don’t seem to be getting copied.
Running the code for GWT changes¶
To use the GWT dev server, run gradle devmode in a separate terminal. This will bring up a separate GWT dev-mode code server that will compile subsequent changes to the src/gwt code after reloading the page.
If errors seem a little obscure using the dev compilation, you might try running ./apollo compile to get more detail.
Running the code for JBrowse changes¶
If you are testing making changes directly to JBrowse within Apollo, the following steps should work:
- ./apollo clean-all
- Clone the version of jbrowse you want into a directory called jbrowse-download at the root level.
- ./apollo run-local to run the server.
- In a separate terminal, run gradle copy-resources-dev to copy over your changes to the server.
Adding sample data¶
If you want to test with pre-processed data instead of adding your own you can load the following data into a directory to be added as an organism.
- Yeast from JBrowse sample data 0.5 MB
- Volvox imaginary sample organism from JBrowse 2 MB
- Honeybee without BAM 500 MB
- Honeybee with BAM 17 GB
Using Apollo with IntelliJ¶
You can use IntelliJ, NetBeans, Eclipse, or just a simple text editor with Apollo to aid in development.
Here we discuss using IntelliJ Ultimate with Apollo:
- Download IntelliJ Ultimate (you need the commercial version; see the licensing options).
- Clone / download Apollo if you haven’t already: git clone https://github.com/GMOD/Apollo and follow the instructions on building Apollo in this doc.
- If you’ve tried to use it before with IntelliJ, make sure that there is no .idea or *.ipr file present in the directory.
- Open IntelliJ.
- Select Import Project.
- Select Create from Existing Sources.
- After it detects the web-app, it should have detected Web. Select Grails instead.
- Note that there is a Grails view in the project menu.
- Open Terminal and run ./apollo run-local to take care of all the dependencies, including JBrowse. If you aren’t developing GWT, you can use ./apollo run-app instead. Most Java / Groovy files will automatically recompile a few seconds after you make changes.
- You can also run or debug directly from the IDE, with output shown below.
Notes on Debugging:
- In IntelliJ, run debug (this works only for JVM files; debug JavaScript in the browser).
- There is an error in IntelliJ 2017.3, so either downgrade to 2017.2 or disable the Instrumenting agent in File | Settings | Build, Execution, Deployment | Debugger | Async Stacktraces in the preferences menu.
Create server documentation¶
Using an IDE like IntelliJ, NetBeans, Eclipse etc. is highly recommended in conjunction with Grails 2.5.X documentation. Additionally, you can generate documentation using grails:
grails doc
Server documentation (for Groovy) should be available at target/docs/all-docs.html.
Setting up the application¶
Setup a production server¶
To set up a production environment, please see the setup guide. Setting up a production server (as opposed to the development server described above) requires properly configuring a servlet container like Tomcat or Jetty with sufficient memory.
Adding data to Apollo¶
After we have set up a server, we will want to add a new organism to the panel. If you are a new user, you will want to set up this data with the JBrowse pre-processing scripts. You can see the data loading guide for more details, but essentially you will want to load a reference genome and an annotations file at a minimum:
bin/prepare-refseqs.pl --fasta yourgenome.fasta --out /opt/apollo/data
bin/flatfile-to-json.pl --gff yourannotations.gff --type mRNA \
--trackLabel AnnotationsGff --out /opt/apollo/data
Login to the web interface¶
When you access your application at http://localhost:8080/apollo/, you will be prompted for login information.
Figure 1. “Register First Admin User” screen allows you to create a new admin user.
Figure 2. Navigate to the “Organism tab” and select “Create new organism”. Then enter the new information for your organism. Importantly, the data directory refers to a directory that has been prepared with the JBrowse data loading scripts from the command line. See the data loading section for details.
Figure 3. Open up the new organism from the drop down tab on the annotator panel.
Conclusion¶
If you completed this setup, you can then begin adding new users and performing annotations. Please continue to the setup guide for deploying the webapp to production or visit the troubleshooting guide if you encounter problems during setup.
How to contribute code to Apollo¶
Audience¶
These guidelines are for developers of Apollo software, whether internal or in the broader community.
Basic principles of the Apollo-flavored GitHub Workflow¶
Principle 1: Work from a personal fork¶
- Prior to adopting the workflow, a developer will perform a one-time setup to create a personal fork of Apollo and will subsequently perform their development and testing on a task-specific branch within their forked repo. This forked repo will be associated with that developer’s GitHub account, and is distinct from the shared repo managed by GMOD.
Principle 2: Commit to personal branches of that fork¶
- Changes will never be committed directly to the master branch on the shared repo. Rather, they will be composed as branches within the developer’s forked repo, where the developer can iterate and refine their code prior to submitting it for review.
Principle 3: Propose changes via pull request of personal branches¶
- Each set of changes will be developed as a task-specific branch in the developer’s forked repo, and a pull request will then be created to propose those changes to the shared repo. This mechanism provides a way for developers to discuss, revise and ultimately merge changes from the forked repo into the shared Apollo repo.
Principle 4: Delete or ignore stale branches, but don’t recycle merged ones¶
- Once a pull request has been merged, the task-specific branch is no longer needed and may be deleted or ignored. It is bad practice to reuse an existing branch once it has been merged. Instead, a subsequent branch and pull-request cycle should begin when a developer switches to a different coding task.
- You may create a pull request in order to get feedback, but if you wish to continue working on the branch, say so by marking it “DO NOT MERGE YET”.
Table of contents¶
- One Time Setup - Forking a Shared Repo
- Typical Development Cycle
- Refresh and clean up local environment
- Create a new branch
- Changes, Commits and Pushes
- Reconcile branch with upstream changes
- Submitting a PR (pull request)
- Reviewing a pull request
- Respond to TravisCI tests
- Respond to peer review
- Repushing to a PR branch
- Merge a pull request
- Celebrate and get back to work
- GitHub Tricks and Tips
- References and Documentation
Typical Development Cycle¶
Once you have completed the One-time Setup above, then it will be possible to create new branches and pull requests using the instructions below. The typical development cycle will have the following phases:
- Refresh and clean up local environment
- Create a new task-specific branch
- Perform ordinary development work, periodically committing to the branch
- Prepare and submit a Pull Request (PR) that refers to the branch
- Participate in PR Review, possibly making changes and pushing new commits to the branch
- Celebrate when your PR is finally Merged into the shared repo.
- Move onto the next task and repeat this cycle
Refresh and clean up local environment¶
Git will not automatically sync your Forked repo with the original shared repo, and will not automatically update your local copy of the Forked repo. These tasks are part of the developer’s normal cycle, and should be the first thing done prior to beginning a new development effort and creating a new branch. In addition, this
Step 1 - Fetch remotes¶
In the (likely) event that the upstream repo (the apollo shared repo) has changed since the developer last began a task, it is important to update the local copy of the upstream repo so that its changes can be incorporated into subsequent development.
> git fetch upstream # Updates the local copy of shared repo BUT does not affect the working directory, it simply makes the upstream code available locally for subsequent Git operations. See step 2.
Step 2 - Ensure that ‘master’ is up to date¶
Assuming that new development begins with branch ‘master’ (a good practice), then we want to make sure our local ‘master’ has all the recent changes from ‘upstream’. This can be done as follows:
> git checkout master
> git reset --hard upstream/master
The above command is potentially dangerous if you are not paying attention, as it will remove any local commits to master (which you should not have) as well as any changes to local files that are also in the upstream/master version (which you should not have). In other words, the above command ensures a proper clean slate where your local master branch is identical to the upstream master branch.
Some people advocate the use of git merge upstream/master or git rebase upstream/master instead of git reset --hard. One risk of these options is that unintended local changes accumulate in the branch and end up in an eventual pull request. Basically, it leaves open the possibility that a developer is not really branching from upstream/master, but is branching from some developer-specific branch point.
Create a new branch¶
Once you have updated the local copy of the master branch of your forked repo, you can create a named branch from this copy and begin to work on your code and pull-request. This is done with:
> git checkout -b fix-feedback-button # This is an example name
This will create a local branch called ‘fix-feedback-button’ and will configure your working directory to track that branch instead of ‘master’.
You may now freely make modifications and improvements and these changes will be accumulated into the new branch when you commit.
If you followed the instructions in Step 5 - Configure .bashrc to show current branch (optional), your shell prompt should look something like this:
~/MI/apollo fix-feedback-button $
Changes, Commits and Pushes¶
Once you are in your working directory on a named branch, you make changes as normal. When you make a commit, you will be committing to the named branch by default, and not to master.
You may wish to periodically git push your code to GitHub. Note the use of an explicit branch name that matches the branch you are on (this is not strictly necessary if your branch already tracks a remote branch, but it makes the push target unambiguous):
> git push origin fix-feedback-button # This is an example name
Note that we are pushing to ‘origin’, which is our forked repo. We are definitely NOT pushing to the shared ‘upstream’ remote, for which we may not have permission to push.
Reconcile branch with upstream changes¶
If you have followed the instructions above at Refresh and clean up local environment, then your working directory and task-specific branch will be based on a starting point from the latest-and-greatest version of the shared repo’s master branch. Depending upon how long it takes you to develop your changes, and upon how much other developer activity there is, it is possible that changes to the upstream master will conflict with changes in your branch.
So it is a good practice to periodically pull down these upstream changes and reconcile your task branch with the upstream master branch. At the least, this should be performed prior to submitting a PR.
Fetching the upstream branch¶
The first step is to fetch the updated upstream master branch down to your local development machine. Note that this command will NOT affect your working directory, but will simply make the upstream master branch available in your local Git environment.
> git fetch upstream
Rebasing to avoid Conflicts and Merge Commits¶
Now that you’ve fetched the upstream changes to your local Git environment, you will use the git rebase command to adjust your branch:
> # Make sure that your changes are committed to your branch
> # before doing any rebase operations
> git status
# ... Review the git status output to ensure your changes are committed
# ... Also a good chance to double-check that you are on your
# ... task branch and not accidentally on master
> git rebase upstream/master
The rebase command will have the effect of adjusting your commit history so that your task branch changes appear to be based upon the most recently fetched master branch, rather than the older version of master you may have used when you began your task branch.
By periodically rebasing in this way, you can ensure that your changes are in sync with the rest of Apollo development and you can avoid hassles with merge conflicts during the PR process.
Dealing with merge conflicts during rebase¶
Sometimes conflicts happen where another developer has made changes and committed them to the upstream master (ideally via a successful PR) and some of those changes overlap with the code you are working on in your branch. The git rebase command will detect these conflicts and will give you an opportunity to fix them before continuing the rebase operation. The Git instructions during rebase should be sufficient to understand what to do, but a very verbose explanation can be found at Rebasing Step-by-Step.
Advanced: Interactive rebase¶
As you gain more confidence in Git and this workflow, you may want to create PRs that are easier to review and best reflect the intent of your code changes. One technique that is helpful is to use the interactive rebase capability of Git to help you clean up your branch prior to submitting it as a PR. This is completely optional for novice Git users, but it does produce a nicer shared commit history.
See squashing commits with rebase for a good explanation.
Submitting a PR (pull request)¶
Once you have developed code and are confident it is ready for review and final integration into the upstream version, you will want to do a final git push origin ... (see Changes, Commits and Pushes above). Then you will use the GitHub website to create a Pull Request based upon the newly pushed branch.
Reviewing a pull request¶
The set of open PRs for Apollo can be viewed by first visiting the shared Apollo GitHub page at https://github.com/GMOD/apollo.
Click on the ‘Pull Requests’ link on the right side of the page.
Note that the Pull Request you created from your forked repo shows up in the shared repo’s Pull Request list. One way to avoid confusion is to think of the shared repo’s PR list as a queue of changes to be applied, pending their review and approval.
Respond to TravisCI tests¶
The GitHub Pull Request mechanism is designed to allow review and refinement of code prior to its final merge to the shared repo. After creating your Pull Request, the TravisCI tests for apollo will be executed automatically, ensuring that the code that ‘worked fine’ on your development machine also works in the production-like environment provided by TravisCI. The current status of the tests can be found near the bottom of the individual PR page, to the right of the Merge Request symbol:
TBD - Something should be written about developers running tests PRIOR to TravisCI and the PR. This may already be in the README.html, but should be cited.
Respond to peer review¶
Other developers will review your Pull Request and may comment on it or request changes. Discuss the feedback in the PR’s Comment section and make any requested changes as described in Repushing to a PR branch.
Repushing to a PR branch¶
It’s likely that after creating a Pull Request, you will receive useful peer review or your TravisCI tests will fail. In either case, make the required changes on your development machine, retest them, and push the new commits to your task branch; the PR will be updated automatically. This allows a PR to evolve in response to feedback from peers. Once everyone is satisfied, the PR may be merged (see below).
Merge a pull request¶
One of the goals behind the workflow described here is to enable a large group of developers to meaningfully contribute to the Apollo codebase. The Pull Request mechanism encourages review and refinement of the proposed code changes. As a matter of informal policy, Apollo expects that a PR will not be merged by its author and that a PR will not be merged without at least one reviewer approving it (via a comment such as +1 in the PR’s Comment section).
Celebrate and get back to work¶
You have successfully gotten your code improvements into the shared repository. Congratulations! The branch you created for this PR is no longer useful, and may be deleted from your forked repo or may be kept. But in no case should the branch be further developed or reused once it has been successfully merged. Subsequent development should be on a new branch. Prepare for your next work by returning to Refresh and clean up local environment.
GitHub Tricks and Tips¶
- Add ?w=1 to a GitHub file compare URL to ignore whitespace differences.
References and Documentation¶
- The instructions presented here are derived from several sources. However, a very readable and complete article is Using the Fork-and-Branch Git Workflow. Note that the article doesn’t make clear that certain steps like Forking are one-time setup steps, after which Branch-PullRequest-Merge steps are used; the instructions above attempt to clarify this.
- New to GitHub? The GitHub Guides are a great place to start.
- Advanced GitHub users might want to check out the GitHub Cheat Sheet
Automated testing architecture¶
The Apollo unit testing framework uses the grails testing guidelines extensively, which can be reviewed here: http://grails.github.io/grails-doc/2.4.3/guide/testing.html
Our basic methodology is to run the full test suite with the apollo command:
apollo test
More specific tests can also be run by passing arguments to grails test-app, for example:
grails test-app :unit-test
This runs ALL of the tests in “test/unit”. To run only the tests for a specific class, use something like this:
grails test-app org.bbop.apollo.FeatureService :unit
Notes about the test suites:¶
- @Mock includes any domain objects you’ll use. Unit tests don’t use the database.
- The setup() function is run for each test
- The test is composed of blocks of code labeled when: and then:. You have to have both or it is not a test.
Example test:
package org.bbop.apollo

import grails.test.mixin.Mock
import grails.test.mixin.TestFor
import spock.lang.Specification
// JSONObject, FeatureStringEnum and Strand must also be imported from their packages

@TestFor(FeatureService)                      // injects the service under test as `service`
@Mock([Sequence, FeatureLocation, Feature])   // domain objects used (no real database)
class FeatureServiceSpec extends Specification {

    void setup() {}   // run before each test

    void "convert JSON to Feature Location"() {
        when: "We have a valid json object"
        JSONObject jsonObject = new JSONObject()
        Sequence sequence = new Sequence(name: "Chr3",
                seqChunkSize: 20, start: 1, end: 100, length: 99).save(failOnError: true)
        jsonObject.put(FeatureStringEnum.FMIN.value, 73)
        jsonObject.put(FeatureStringEnum.FMAX.value, 113)
        jsonObject.put(FeatureStringEnum.STRAND.value, Strand.POSITIVE.value)

        then: "We should return a valid FeatureLocation"
        FeatureLocation featureLocation =
                service.convertJSONToFeatureLocation(jsonObject, sequence)
        assert featureLocation.sequence.name == "Chr3"
        assert featureLocation.fmin == 73
        assert featureLocation.fmax == 113
        assert featureLocation.strand == Strand.POSITIVE.value
    }
}
There are 3 “special” types of things to test, which are all important and reflect the grails special functions: Domains, Controllers, Services. They will all be in the “test” directory and all be suffixed with “Spec” for a Spock test.
Chado¶
If you test with the Chado export, you will need to make sure you load ontologies into your Chado database or the integration steps will fail. If you don’t specify Chado in your apollo-config.groovy, no further action is necessary.
./scripts/load_chado_schema.sh -u nathandunn -d apollo-chado-test -s chado-schema-with-ontologies.sql.gz -r
Architecture notes¶
Overview and developer’s guide¶
See the build doc for the official developer’s guide.
Minimally, the Apollo application can be launched by running apollo run-local. This starts up a temporary Tomcat server automatically. It will also simply use an in-memory H2 database if a different database configuration isn’t set up yet.
For development purposes, you can also enable automatic code reloading which helps for fast iteration.
- grails -reloading run-app will allow changes to the server-side code to be auto-reloaded.
- ant devmode will provide auto-reloading of GWT code changes.
- scripts/copy_client.sh will copy the plugin code to the web-app folder to update the plugin JavaScript.
The apollo script automatically does several of these functions.
Note: Changes to domain/database objects will require an application restart, but a very cool feature of our application is that the whole database doesn’t need reloading after a database change.
If you look at the apollo binary, you’ll see that grails run-app and other commands are launched automatically during apollo run-local.
Also, as always during web development, you will want to clear the cache to see changes (“shift-reload” on most browsers).
Overview¶
The main components of the Apollo 2.x application are:
- Grails 2 Server with the current version set in the application.properties
- Datastore: configured via Hibernate / Grails, which can use almost anything supported by JDBC / Hibernate (primarily PostgreSQL, MySQL, H2)
- JBrowse / Apollo Plugin: JS / HTML5 JBrowse doc and main site
- GWT client: provides the sidebar. Can be written in another front-end language, as well. GWT doc
Basic layout¶
- Grails code is in normal grails directories under “grails-app”
- GWT-only code is under “src/gwt”
- Code shared between the client and the server is under “src/gwt/org/bbop/apollo/gwt/shared”
- Client code is under “client” (still)
- Tests are under “test”
- Old (presumably inactive code) is under “src/main/webapp”
- New source (loaded into VM) is under “src/java” or “src/groovy” except for grails specific code.
- Web code (not much) is either under “web-app” (and where jbrowse is copied) or under “grails-app/assets” (these are compiled down).
- GWT-specific CSS can also be found in “src/gwt/org/bbop/apollo/gwt/client/resources/”, but it inherits the CSS of its current page as well.
Main components¶
The main components of the Apollo 2.x application (the first four are the most important):
- The domain classes; these are the main objects
- Controllers, which route requests to those domains, provide URL routes, and provide REST services
- Views: annotator and index are the only ones that matter for Apollo
- Services: very important because controllers should typically only handle routes, while the particular business logic should go into services.
- Configuration files: The grails-app/conf folder contains central conf files, but the apollo-config.groovy file in your root directory can override these central configs (i.e. it is not necessary to edit DataSource.groovy)
- grails-app/assets: all of your JavaScript lives here; these assets are compiled down, which is an efficient way to deliver them.
- Resources: web-app directory: css, images, and the jbrowse directory + WA plugin are initialized here.
- Client directory: The WA plugin is copied or compiled along with jbrowse to the web-app directory
Schema/domain classes¶
Domain classes: the most important domain class everywhere is the Feature; it is the key to everything that we do. The way a domain class is built:
Each domain class represents a database table. “Feature” is inherited by many other classes, and all features are stored in the same table; a class column records which subclass each row belongs to, so when Grails queries the table it converts each row into the right class. There are a number of constraints you can set.
Very important: hasMany maps a one-to-many relationship in the database. For example, a feature can have many locations, and parentFeatureRelationships is where its one-to-many parent relationships are mapped. You also have to define the single-item side of the relationship.
You can add extra methods to the domain objects, but this is generally not necessary.
Note: In the DataStore configuration, setting “auditable = true” means that a feature auditing tool keeps track of history for the specified objects in a new table.
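To make the single-table idea concrete, here is a minimal Python sketch (not GORM or Apollo code) in which every feature row lives in one shared table and a discriminator column decides which subclass a row becomes when it is loaded. The registry, the "class" column values and the row data are hypothetical; in Apollo the real mapping is handled by GORM/Hibernate. It also mirrors the ontologyId rule described below: the generic base class has none, while subclasses set one.

```python
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class Feature:
    """Base class: too generic to carry an ontology id of its own."""
    id: int
    name: str
    ontology_id = None  # plain class attribute, not a dataclass field


@dataclass
class Gene(Feature):
    ontology_id = "SO:0000704"  # Sequence Ontology term for gene


@dataclass
class MRNA(Feature):
    ontology_id = "SO:0000234"  # Sequence Ontology term for mRNA


# Hypothetical discriminator values mapped to Python classes.
REGISTRY: Dict[str, Type[Feature]] = {"Gene": Gene, "MRNA": MRNA}

# One shared "feature" table; the "class" column says which subclass a row is.
FEATURE_TABLE: List[dict] = [
    {"id": 1, "class": "Gene", "name": "gene1"},
    {"id": 2, "class": "MRNA", "name": "gene1-00001"},
]


def load_features(rows: List[dict]) -> List[Feature]:
    """Query the single table, then convert each row into the right class."""
    return [REGISTRY[row["class"]](id=row["id"], name=row["name"]) for row in rows]


features = load_features(FEATURE_TABLE)
```

Keeping every subclass in one table means a single query can return genes and transcripts together; the cost is that subclass-specific columns must be nullable for rows of other types.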
Feature class¶
All features inherit an ontologyId and specify a cvTerm, although CvTerms are being phased out.
Subclasses of “Feature” will specify the ontologyId, but “Feature” itself is too generic, so it does not have an ontologyId.
Sequence class¶
Sequence is the class WA uses to retrieve sequences. It used to have a built-in cache mechanism, but that was removed to avoid running into memory problems.
Feature locations¶
Features such as genes all have a feature location that belongs to a particular sequence. If you have a feature with subclasses, it can exist within many locations, and each location belongs to its own sequence.
Feature relationship¶
Feature relationships define parent/child relationships between features, typed with SO terms (e.g. the SO “part_of” relationship).
Feature enums¶
The FeatureStringEnum maps names to concepts, so the application can use these enums without worrying about raw string mappings.
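The idea can be sketched in Python with the standard enum module. This is a hypothetical mirror, not Apollo's Java enum: the constant names follow the ones used in the FeatureServiceSpec test above, and the string values are assumed from the JSON keys listed under Representing features in JSON.

```python
from enum import Enum


class FeatureStringEnum(Enum):
    """Hypothetical Python mirror of a few FeatureStringEnum constants."""
    FMIN = "fmin"
    FMAX = "fmax"
    STRAND = "strand"


def make_location(fmin: int, fmax: int, strand: int) -> dict:
    # Every JSON key comes from the enum, so a misspelled constant fails
    # loudly (AttributeError) instead of silently producing a wrong key.
    return {
        FeatureStringEnum.FMIN.value: fmin,
        FeatureStringEnum.FMAX.value: fmax,
        FeatureStringEnum.STRAND.value: strand,
    }


location = make_location(73, 113, 1)
```

The reverse lookup also works: FeatureStringEnum("strand") returns the STRAND member, which is handy when parsing incoming JSON.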
Running the application¶
When the Grails application receives a URL request, methods sent through the AnnotationEditorController (formerly called AnnotationEditorService) are dispatched dynamically using handleOperation.
The AnnotatorController serves the page that the annotator is on. This doesn’t map to a particular domain object.
In most cases, these methods unwrap the incoming data into a JSON object as a set of variables. The data is then converted into Java objects, processed, and converted back to JSON to send back.
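The dispatch-by-name and JSON round trip described above can be sketched in Python. This is an illustration of the pattern, not Apollo's handleOperation: the class AnnotationEditorSketch, the shift_location operation and its offset parameter are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeatureLocation:
    fmin: int
    fmax: int
    strand: int


class AnnotationEditorSketch:
    """Hypothetical stand-in for a controller that dispatches on an operation name."""

    def handle_operation(self, operation: str, payload: str) -> str:
        handler = getattr(self, operation, None)
        # Only expose public, callable operations (never the dispatcher itself).
        if operation.startswith("_") or operation == "handle_operation" or not callable(handler):
            raise ValueError(f"unknown operation: {operation}")
        request = json.loads(payload)   # 1. unwrap the incoming JSON
        response = handler(request)     # 2. work with typed objects
        return json.dumps(response)     # 3. convert back to JSON for the reply

    def shift_location(self, request: dict) -> dict:
        """Hypothetical operation: move a feature location by a fixed offset."""
        loc = FeatureLocation(**request["location"])
        offset = request["offset"]
        moved = FeatureLocation(loc.fmin + offset, loc.fmax + offset, loc.strand)
        return {"location": asdict(moved)}


editor = AnnotationEditorSketch()
reply = editor.handle_operation(
    "shift_location",
    '{"location": {"fmin": 73, "fmax": 113, "strand": 1}, "offset": 10}',
)
```

Whitelisting what the dispatcher may call matters in any name-based dispatch, since otherwise a request could invoke internal methods.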
When the annotator creates a transcript, it is passed to requestHandlingService, which wraps it in an annotation event and sends it over a WebSocket, where it is broadcast to everyone.
Websockets and listeners¶
All clients subscribe to AnnotationNotifications for new transcripts and events.
If an add_transcript operation occurs, it is broadcast via the WebSocket: the server side broadcasts this event, does a JSON roundtrip to render the results, and sends the return object that belongs to an AnnotationEvent.
Procedure: transcript is created -> goes to the server -> adds a transcript locally -> announces it to everyone.
We used to use a long-polling request model for “push notifications”, but now we use Spring with SockJS, which uses WebSockets but can fall back to long-polling.
Another component of the broadcasting, brokerMessagingTemplate, converts the event and broadcasts it.
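The publish/subscribe flow above can be illustrated with a toy in-process hub. This sketch only shows the pattern (subscribe once, serialize the event once, deliver it to every subscriber); it does not use Spring, SockJS or WebSockets, and the topic name and event fields are hypothetical.

```python
import json
from typing import Callable, Dict, List


class AnnotationBroadcaster:
    """Toy publish/subscribe hub; not Spring's brokerMessagingTemplate."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def broadcast(self, topic: str, event: dict) -> int:
        # Serialize once (the "JSON roundtrip"), then deliver to every subscriber.
        message = json.dumps(event)
        subscribers = self._subscribers.get(topic, [])
        for callback in subscribers:
            callback(message)
        return len(subscribers)


hub = AnnotationBroadcaster()
inbox_a: List[str] = []
inbox_b: List[str] = []
hub.subscribe("AnnotationNotification", inbox_a.append)  # client A
hub.subscribe("AnnotationNotification", inbox_b.append)  # client B

# Server side: a transcript was added locally, now announce it to everyone.
delivered = hub.broadcast(
    "AnnotationNotification",
    {"operation": "ADD", "features": [{"name": "transcript-1"}]},
)
```

In the real application the callbacks are remote browser sessions, which is why a transport that can fall back to long-polling is useful.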
Controllers¶
Grails controllers are a fairly easy concept for “routing” URLs and info to methods in the code.
Services¶
Grails services are classes that perform business logic. (In IntelliJ, these are indicated by green buttons on the definitions to show that these are Injected Spring Bean classes)
The @Transactional annotation means that every operation that is not private is handled via a transaction. In the old model, many files were recreated each time even though they did the same thing; now we define a service class once and can use it again and again. Transactions can also be nested within other transactions, and services can call other services.
addTranscript generateTranscript
The different services do exactly what their names imply. It may not always be clear which service a particular method belongs in, but this can be changed later, and the names are easy to change as well.
Grails views¶
- Most views are under grails-app
- Everything conforms to the MVC backend model for the Grails application.
- Most of the Java, CSS, and HTML is under the web-app directory
- Application logic for Groovy, GWT, Java, etc. lives here. We could put our old servlets there, but it is not recommended.
Main configuration¶
The central configuration files are defined in grails-app/conf/ folder, however the user normally only edits their personal config in apollo-config.groovy. That is because the user config file will override those in the central configuration. See Configure.html for details.
Database configuration¶
The “root” database configuration is specified by grails-app/conf/DataSource.groovy, but it is generally overridden by the user’s apollo-config.groovy.
It is recommended that the user takes sample-postgres-apollo-config.groovy or sample-mysql-apollo-config.groovy and copies it to apollo-config.groovy for their application.
The default database driver is the H2 database, which is an “embedded” database that doesn’t require installing PostgreSQL or MySQL, but it is generally not considered as performant as PostgreSQL or MySQL.
Note: there are three environments that can be setup: a development environment, a test environment, and a production environment, and these are basically assigned automatically depending on how you deploy the app.
- Development environment - “apollo run-local” or “apollo debug”
- Test environment - “apollo test”
- Production environment - “apollo deploy”
Note: If there are no users and no annotations, a bootstrap procedure can also automatically create some annotations and users to start up the app so there is something in there to begin with.
UrlMapping configuration:¶
The UrlMappings are stored in grails-app/conf/UrlMappings.groovy
The UrlMappings sets up a mapping from routes to controllers
Standard and customized mappings go in here. The way we route JBrowse to organism data directories is also controlled here. The organismJBrowseDirectory is set for a particular session, per user. If none is specified, a default one is used.
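The per-session routing rule can be sketched as a small path resolver. This is not the UrlMappings.groovy logic: the /jbrowse/data/ prefix, the default directory and the session dictionary are assumptions made for illustration, and only the idea (use the session's organismJBrowseDirectory, else a default) comes from the text above.

```python
from typing import Dict

DEFAULT_ORGANISM_DIR = "/opt/apollo/data"  # hypothetical default directory
DATA_PREFIX = "/jbrowse/data/"             # hypothetical route prefix


def resolve_jbrowse_path(session: Dict[str, str], request_path: str,
                         default_dir: str = DEFAULT_ORGANISM_DIR) -> str:
    """Map a JBrowse data request onto the organism directory for this session."""
    if not request_path.startswith(DATA_PREFIX):
        raise ValueError(f"not a JBrowse data route: {request_path}")
    relative = request_path[len(DATA_PREFIX):]
    if ".." in relative.split("/"):
        raise ValueError("path traversal is not allowed")
    # Per-user, per-session choice; fall back to the default when none is set.
    base = session.get("organismJBrowseDirectory") or default_dir
    return base.rstrip("/") + "/" + relative
```

For example, a session pointing at /opt/apollo/yeast turns /jbrowse/data/trackList.json into /opt/apollo/yeast/trackList.json. Rejecting ".." segments is a sensible guard whenever request paths are mapped onto the filesystem.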
Build configuration¶
The build configuration is stored in grails-app/conf/BuildConfig.groovy
If there are libraries that are missing or are to be added, you can add them here.
Additionally, the build system uses the “apollo” script and the “build.xml” to control the compilation and resource steps.
Central config¶
The central configuration is stored in grails-app/conf/Config.groovy
The central Grails config contains logging and app config, and can also reference external configs. Using this method, an external config can override settings without even touching the application code.
In our application we use apollo-config.groovy, and everything in it supersedes this file.
The log4j area can enable logging levels. You can turn on the “debug grails.app” to output all the webapollo debug info, or also set the “grails.debug” environment variable for java too.
There is also some Apollo configuration here, and it is mostly covered by the configuration section.
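The "user config supersedes central config" rule behaves roughly like a recursive merge in which the later source wins. The Python sketch below shows that precedence only; it is not how Grails loads apollo-config.groovy, and the setting names and values are hypothetical (loosely modeled on a dataSource block).

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(central: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `central` in which values from `override` win, recursively."""
    merged = deepcopy(central)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested blocks
        else:
            merged[key] = deepcopy(value)                    # override wins outright
    return merged


# Hypothetical settings, loosely modeled on a dataSource block.
central = {
    "dataSource": {"driverClassName": "org.h2.Driver", "username": "sa", "pooled": True},
    "logging": {"level": "info"},
}
user_config = {
    "dataSource": {"driverClassName": "org.postgresql.Driver", "username": "apollo"},
}
effective = merge_config(central, user_config)
```

Keys the user file does not mention (here "pooled" and the logging block) keep their central values, which is why the user file only needs the settings that differ.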
GWT web-app¶
When GWT compiles, it writes its output into the web-app directory. When the annotator page loads (the /annotator/index route), it includes the annotator.nocache.js file, which pulls in all of the GWT code for that route. The src/gwt/org/bbop/apollo/gwt/ directory contains much of the code, and src/gwt/org/bbop/apollo/gwt/Annotator.gwt.xml is a central config file for the GWT web-app.
User interface definitions¶
A Bootstrap/GWT interface handles the tabs on the right for the new UI. The annotator object is at the root of everything.
Example definition: MainPanel.ui.xml
Tests¶
Unit tests¶
Unit tests and some basic javascript tests are running on Travis-CI (see .travis.yml for example script).
You can also run “apollo test” to run the tests locally. It will use the “test” database configuration automatically.
Also see the testing notes for more details.
Command line tools¶
The command line tools offer a number of interesting features that can be used to help setup and retrieve data from the application.
Overview¶
The command line tools are located in docs/web_services/examples, and they are mostly small scripts that automate the usage of the web services API.
get_gff3.groovy¶
Example:
get_gff3.groovy -organism Amel_4.5 -username admin@webapollo.com \
-password admin_password -url http://localhost:8080/apollo > my_output.gff3
This command can accept an -output argument to output to file, or the stdout can be redirected.
The -username and -password can be specified via the command line or if omitted, the user will be prompted.
get_fasta.groovy¶
Example:
get_fasta.groovy -organism Amel_4.5 -username admin@webapollo.com \
-password admin_password -seqtype cds/cdna/peptide -url http://localhost:8080/apollo > output.fa
This command can accept an -output argument to output to file, or the stdout can be redirected. The -seqtype argument takes one of cds, cdna, or peptide.
The -username and -password can be specified via the command line (similar to get_gff3.groovy) or, if omitted, the user will be prompted.
add_users.groovy¶
Example:
add_users.groovy -username admin@webapollo.com -password admin_password \
-newuser newuser@test.com -newpassword newuserpass \
-destinationurl http://localhost:8080/apollo
The -username and -password refer to the admin user, and they can also be specified via stdin instead of the command line if they are omitted.
A list of users specified in a csv file can also be used as input.
add_organism.groovy¶
Example:
add_organism.groovy -name yeast -url http://localhost:8080/apollo/ \
-directory /opt/apollo/yeast -username admin@webapollo.com -password admin_password
The -directory refers to the JBrowse data directory containing the output from prepare-refseqs.pl, flatfile-to-json.pl, etc. The -blatdb, -genus, and -species arguments are optional.
The -username and -password refer to the admin user, and they can also be specified via stdin instead of the command line if they are omitted.
delete_annotations_from_organism.groovy¶
Example:
docs/web_services/examples/groovy/delete_annotations_from_organism.groovy -destinationurl http://localhost:8080/apollo \
-organismname honeybee2
This script will delete any annotations associated with a given organism.
Web Service API¶
The Apollo Web Service API is a JSON-based REST API to interact with the annotations and other services of Apollo. Both the request and response JSON objects can contain feature information that is based on the Chado schema. We use the web services in the API scripting examples, and the Apollo JBrowse plugin uses them as well.
The most up to date Web Service API documentation is deployed from the source code rest-api-doc annotations.
See http://demo.genomearchitect.io/Apollo2/jbrowse/web_services/api for details
Warning¶
If you are sending passwords you care about over the wire (even if not using web services), it is highly recommended that you use HTTPS (which adds SSL/TLS encryption) instead of HTTP.
Examples¶
We provide an examples directory.
curl -b cookies.txt -c cookies.txt -e "http://localhost:8080" \
-H "Content-Type:application/json" \
-d "{'username': 'demo', 'password': 'demo'}" \
"http://localhost:8080/apollo/Login?operation=login"
Login expects two parameters, username and password, and optionally rememberMe for a persistent cookie.
A successful login returns an empty JSON object.
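The same login can be scripted from Python using only the standard library. This is a sketch under assumptions: it mirrors the curl example (a JSON body posted to /Login?operation=login) but sends strict double-quoted JSON, sending rememberMe as a JSON boolean is an assumption, and the Referer header set by curl's -e is omitted (add it to the headers if your server requires it). The request is only built here; opener.open(req) against a running server would perform the login.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(base_url: str, username: str, password: str,
                        remember_me: bool = False) -> urllib.request.Request:
    """Build (but do not send) the same login request as the curl example."""
    body = {"username": username, "password": password}
    if remember_me:
        body["rememberMe"] = True  # assumption: boolean flag for a persistent cookie
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/Login?operation=login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A cookie-aware opener plays the role of curl's -b/-c cookies.txt: the
# JSESSIONID cookie set by the login response is sent on later requests.
jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

req = build_login_request("http://localhost:8080/apollo", "demo", "demo")
# opener.open(req) would log in against a running Apollo server.
```

Reusing the same opener for subsequent API calls keeps the session, which is the Python equivalent of passing -b cookies.txt to curl.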
Python Client¶
A Python client (arrow) wraps many of the Apollo web services and is easy to set up:
pip install apollo
arrow init # provide Apollo credentials
arrow -h
## have fun
arrow groups get_groups
Documentation on commands and some examples working with jq:
What is the Web Service API?¶
For a given Apollo server URL (e.g., https://localhost:8080/apollo or any other Apollo site on the web), the Web Service API allows us to make requests to the various “controllers” of the application and perform operations.
The controllers that are available for Apollo include the AnnotationEditorController, the OrganismController, the IOServiceController for downloads of data, and the UserController for user management.
Most API requests will take:
- The proper URL (e.g., to get features from the AnnotationEditorController, we can send requests to http://localhost/apollo/annotationEditor/getFeatures)
- username - an authorized user (the session is used if none is specified)
- password - the user’s password (the session is used if none is specified)
- organism - (if applicable) the “common name” of the organism for the operation; if none is specified, it is pulled from the “user preferences”
- track/sequence - (if applicable) the reference sequence name (shown in the sequence panel / genomic browser)
- uniquename - (if applicable) a UUID used to guarantee a unique ID
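A request body combining these common fields might be assembled as below. This is a hedged sketch: the exact keys each endpoint accepts should be checked against the deployed API documentation, using "track" as the key for the sequence name is an assumption, and the organism and sequence values are hypothetical. Omitted fields are simply left out so that the session or user preferences can supply them.

```python
import json
from typing import Any, Dict, List, Optional


def build_api_body(username: Optional[str] = None,
                   password: Optional[str] = None,
                   organism: Optional[str] = None,
                   track: Optional[str] = None,
                   uniquenames: Optional[List[str]] = None) -> Dict[str, Any]:
    """Assemble common fields; anything omitted falls back to session/preferences."""
    body: Dict[str, Any] = {}
    if username is not None and password is not None:
        body["username"] = username
        body["password"] = password
    if organism is not None:
        body["organism"] = organism      # organism common name
    if track is not None:
        body["track"] = track            # assumed key for the reference sequence
    if uniquenames:
        body["features"] = [{"uniquename": u} for u in uniquenames]
    return body


body = build_api_body("admin@webapollo.com", "admin_password",
                      organism="Amel_4.5", track="Group1.1")  # hypothetical values
payload = json.dumps(body)  # e.g. POST to .../annotationEditor/getFeatures
```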
Errors¶
If an error has occurred, a proper HTTP error code (most likely 400 or 500) and an error message are returned, in JSON format:
{ "error": "error message" }
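A client can check both the HTTP status and the "error" key before trusting a response. The sketch below is one way to do that in Python; the ApolloApiError exception type is hypothetical, and treating a non-JSON body (such as an HTML error page) as a failure is a design choice, not documented Apollo behavior.

```python
import json
from typing import Any


class ApolloApiError(Exception):
    """Hypothetical exception type for failed Apollo API calls."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status: int, body: str) -> Any:
    """Decode an Apollo JSON response, raising ApolloApiError on failure."""
    try:
        data = json.loads(body) if body.strip() else {}
    except ValueError:
        data = None  # e.g. an HTML error page instead of JSON
    error = data.get("error") if isinstance(data, dict) else None
    if status >= 400 or error is not None:
        raise ApolloApiError(status, str(error or body or "unknown error"))
    if data is None:
        raise ApolloApiError(status, "response body was not JSON")
    return data
```

An empty body is treated as an empty JSON object, matching the empty object returned by a successful login.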
Cookies¶
The Apollo Login creates a JSESSIONID cookie and a rememberMe cookie (if applicable), and these can be used in downstream API requests (for example, passing -b cookies.txt to curl sends the saved cookies with the request).
You can also pass username/password to individual API requests and these will authenticate each individual request.
Representing features in JSON¶
Most requests and responses will contain an array of feature JSON objects named features. The feature object is based on the Chado feature, featureloc, cv, and cvterm tables.
{
  "residues": "$residues",
  "type": {
    "cv": {
      "name": "$cv_name"
    },
    "name": "$cv_term"
  },
  "location": {
    "fmax": $rightmost_interbase_coordinate_of_feature,
    "fmin": $leftmost_interbase_coordinate_of_feature,
    "strand": $strand
  },
  "uniquename": "$feature_unique_name",
  "children": [$array_of_child_features],
  "properties": [$array_of_properties]
}
where:
- residues - A sequence of alphabetic characters representing biological residues (nucleic acids, amino acids) [string]
- type.cv.name - The name of the ontology [string]
- type.name - The name of the cvterm [string]
- location.fmax - The rightmost/maximal interbase boundary in the linear range [integer]
- location.fmin - The leftmost/minimal interbase boundary in the linear range [integer]
- location.strand - The orientation/directionality of the location; should be 0, -1 or +1 [integer]
- uniquename - The unique name for a feature [string]
- children - Array of child feature objects [array]
- properties - Array of properties (including frameshifts for transcripts) [array]
Note that different operations will require different fields to be set (which will be elaborated upon in each operation section).
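As an illustration, the Python sketch below builds feature objects in this shape from 1-based, fully closed coordinates such as those in a GFF3 file. Chado-style fmin/fmax are 0-based interbase boundaries, so only the start shifts by one. Using "sequence" as the cv name is an assumption, and residues, uniquename and properties are left out because which fields are required depends on the operation.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def gff_to_interbase(start: int, end: int) -> Tuple[int, int]:
    """1-based closed [start, end] -> 0-based interbase (fmin, fmax)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def make_feature(cv_term: str, start: int, end: int, strand: int,
                 uniquename: Optional[str] = None,
                 children: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    if strand not in (-1, 0, 1):
        raise ValueError("strand must be -1, 0 or +1")
    fmin, fmax = gff_to_interbase(start, end)
    feature: Dict[str, Any] = {
        "type": {"cv": {"name": "sequence"}, "name": cv_term},  # cv name assumed
        "location": {"fmin": fmin, "fmax": fmax, "strand": strand},
    }
    if uniquename is not None:
        feature["uniquename"] = uniquename
    if children:
        feature["children"] = children
    return feature


exon = make_feature("exon", 101, 250, 1)
mrna = make_feature("mRNA", 101, 400, 1, children=[exon])
payload = json.dumps({"features": [mrna]})
```

With interbase coordinates, a feature's length is simply fmax - fmin (300 for the mRNA above), with no off-by-one correction.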