Welcome to Mineotaur’s documentation!

Mineotaur is a web application for sharing and visually analysing high-throughput/high-content microscopy screens. It was developed at the Carazo Salas lab of the University of Cambridge, Department of Genetics.

The project website can be found at http://www.mineotaur.org. Please cite the following paper when using Mineotaur: B. Antal, A. Chessel, R. E. Carazo Salas: Mineotaur: interactive visual analytics for high-content microscopy screens, under revision.

Contents:

Getting started

Motivation

Despite ground-breaking discoveries in genomics, the genomes of most organisms remain black boxes, with the function of the majority of genes and gene products still unknown. Moreover, many genes and proteins play roles in multiple biological processes. High-throughput/high-content microscopy-based screening (HT/HCS) provides an increasingly powerful tool to discover and functionally annotate genes and biological pathways, and it has already led to several important discoveries, such as the systematic identification of genes important for mitosis, endocytosis and other fundamental processes. However, producing phenotypic data requires specialised large-scale image and data analysis methods, which limits such functional genomic annotation techniques to research groups that possess this expertise. As a result, the wider community has limited access to the data and limited ability to mine it further after publication, which reduces the impact of expensive HT/HC screens. Overall, while technical advances have led to an explosion in the amount of data being acquired, suitable data handling, visualization and analysis techniques are still lagging behind.

What is Mineotaur?

Here we propose a novel data visualization tool called Mineotaur (http://www.mineotaur.org), which allows the community to further mine the raw multidimensional feature data and knowledge from published HT/HC screens, leading to better exploitation of experimental results. The user interface allows members of the community without any computational background to extract meaningful information from the data. The web interface is used to query the data, and the results are visualized as plots (e.g. scatter plots, histograms) in real time. The tool is based on a novel data model that allows the visualization and analysis of extremely large amounts of data.

About the documentation

Installation describes how to generate a new Mineotaur instance. To use an existing Mineotaur instance, see Using the web interface. Those who want to understand the technical aspects of Mineotaur better or would like to contribute to it should see Developing Mineotaur.

Installation

Requirements

Mineotaur requires Java 8 or higher, which can be downloaded here: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html

If you want to build Mineotaur from source, you will also need Maven: https://maven.apache.org/

Generating a Mineotaur instance from text files

To generate a Mineotaur instance, you have to provide three input files: a data file containing all the measurements you want to include in Mineotaur, a label file containing the annotations assigned to the objects in Mineotaur, and an options file setting several options of the instance. Sample input files can be downloaded here.

Data file

The input data file can be a ?SV (? Separated Values) file, where ? is the separator set in the options file (e.g. TSV - Tab Separated Values). Each line describes a set of measurements for a descriptive object, which is a unique object of interest in the experiment. Each descriptive object should be connected to a group object. Example: descriptive object - cell, group object - gene. The file consists of a header line, an object descriptor line, a type descriptor line and the data lines.

_images/data_file_example.png
Object descriptor

The second line of the data file. The object descriptor describes which kind of real-world object the respective column belongs to. The object descriptors can be any string; however, semantically meaningful names are advised for later use. Examples: Gene, Cell, Experiment.

Type descriptor

The third line of the data file. The type descriptor describes the data type of each column. The following types are accepted:

  • ID: identifier for a given object. An object type can have multiple ID columns.
  • NUMBER: numerical data. Each numerical column of the descriptive object can be queried.
  • TEXT: non-numerical data.

Data lines

Each line starting from the fourth contains the actual measurements for a descriptive object and other metadata connecting them to experimental conditions.
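To illustrate the layout described above, the following Python sketch reads a data file and checks its structure: header line, object descriptor line, type descriptor line and data lines. It is not Mineotaur's own importer; the function name read_data_file, the returned dictionary layout and the use of float for NUMBER columns are assumptions made for this example, and the sample rows are invented.

```python
import csv
import io

# Column types accepted in the type descriptor line (third line).
ACCEPTED_TYPES = {"ID", "NUMBER", "TEXT"}


def read_data_file(handle, separator="\t"):
    """Parse a Mineotaur-style data file (hypothetical helper, not Mineotaur's importer).

    Line 1: header (column names)
    Line 2: object descriptor (e.g. Gene, Cell)
    Line 3: type descriptor (ID, NUMBER or TEXT)
    Line 4+: one descriptive object (e.g. a cell) per line
    """
    reader = csv.reader(handle, delimiter=separator)
    header = next(reader)
    objects = next(reader)
    types = next(reader)
    if not (len(header) == len(objects) == len(types)):
        raise ValueError("header, object and type descriptor lines differ in length")
    unknown = set(types) - ACCEPTED_TYPES
    if unknown:
        raise ValueError(f"unknown column types: {sorted(unknown)}")

    rows = []
    for line_no, values in enumerate(reader, start=4):
        if len(values) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns")
        row = {}
        for name, col_type, value in zip(header, types, values):
            # NUMBER columns are converted to floats so they can be compared numerically.
            row[name] = float(value) if col_type == "NUMBER" else value
        rows.append(row)
    return {"columns": list(zip(header, objects, types)), "rows": rows}


# Invented tab-separated sample: one gene ID, one cell ID and one measurement.
sample = "geneID\tcellID\tarea\nGene\tCell\tCell\nID\tID\tNUMBER\nwee1\tc1\t12.5\n"
parsed = read_data_file(io.StringIO(sample))
```

Pass a different separator (e.g. ",") when the options file sets one.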

Label file

The label file contains the annotations for the group level objects, for example which genes were picked up as hits in a study. The label file consists of a header line and multiple label lines.

_images/label_file_example.png
Header

The first line of the label file. The first column contains the name of the group object ID property from the data file, while the rest of the columns contain the annotations.

Label lines

Each line starting from the second contains a group object ID followed, for each annotation column, by 1 if the annotation is assigned to the group object and 0 otherwise.
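The label file format can be read with a few lines of Python. This is a minimal sketch, not Mineotaur's code: the helper name read_label_file is hypothetical, and the annotation names and gene IDs in the sample are invented for illustration.

```python
import csv
import io


def read_label_file(handle, separator="\t"):
    """Map each group object ID to the set of annotations assigned to it (hypothetical helper).

    The header's first column is the group object ID property; the remaining
    columns are annotation names. Each label line holds 1 (assigned) or 0.
    """
    reader = csv.reader(handle, delimiter=separator)
    header = next(reader)
    id_column, annotations = header[0], header[1:]
    labels = {}
    for values in reader:
        group_id, flags = values[0], values[1:]
        if len(flags) != len(annotations):
            raise ValueError(f"{group_id}: expected {len(annotations)} annotation flags")
        if any(flag not in ("0", "1") for flag in flags):
            raise ValueError(f"{group_id}: annotation flags must be 0 or 1")
        labels[group_id] = {name for name, flag in zip(annotations, flags) if flag == "1"}
    return id_column, labels


# Invented sample with two annotation columns.
sample = "geneID\tpolarity_hit\tsize_hit\nwee1\t0\t1\ncdc25\t1\t1\ntea1\t0\t0\n"
id_column, labels = read_label_file(io.StringIO(sample))
```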

Metadata wizard

Mineotaur also provides a graphical wizard for supplying the metadata required for a standard data file. Start it from the command line:

java -jar <path_to_jar_file> -metadata <data_file> <separator_character>
_images/metadata_wizard.png

Options file

The options file describes metadata for the instance generation. All options are in the following format: option_name = option_value. The following options can be set:

  • (REQUIRED) name: name of the instance
  • group: name of the group object (same as described in the data file). Default: GENE
  • groupName: name of the group object ID property (same as described in the data file). Default: geneID
  • descriptive: name of the descriptive object (same as described in the data file). Default: CELL
  • total_memory: the amount of memory that can be used by Neo4j. Default: 4G
  • separator: character used to separate columns in the data and the label files. Default: \t
  • overwrite: whether to overwrite the current instance with the same name. Default: true

Please note that operating systems use different object caching methods, which can affect the performance of Neo4j, so it is advisable to find a suitable total_memory value by experimenting. Under OS X, it is also advisable to clean the memory from time to time, since many objects are kept in memory, leading to performance loss in the long run.
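Since the options file is a plain list of option_name = option_value lines, it can also be written and checked from a script. The sketch below does this in Python and fills in the defaults listed above. Only the option names and defaults come from this page; the helper names and the instance name my_screen are hypothetical, and values are kept as raw strings (for example the default separator is stored as the two characters \t), which is an assumption about how Mineotaur interprets them.

```python
import os
import tempfile

# Defaults as listed in the options table; "name" is required and has no default.
# Values are raw strings exactly as they would appear in the file (no escape processing).
DEFAULTS = {
    "group": "GENE",
    "groupName": "geneID",
    "descriptive": "CELL",
    "total_memory": "4G",
    "separator": "\\t",
    "overwrite": "true",
}


def write_options(path, **options):
    """Write option_name = option_value lines (hypothetical helper)."""
    if "name" not in options:
        raise ValueError("the 'name' option is required")
    with open(path, "w", encoding="utf-8") as out:
        for key, value in options.items():
            out.write(f"{key} = {value}\n")


def read_options(path):
    """Read an options file and fill in the documented defaults."""
    options = dict(DEFAULTS)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or "=" not in line:
                continue  # skip blank or malformed lines
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    if "name" not in options:
        raise ValueError("the 'name' option is required")
    return options


path = os.path.join(tempfile.mkdtemp(), "mineotaur.input")
write_options(path, name="my_screen", total_memory="8G")
opts = read_options(path)
```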

Generation from command line

  1. Download the latest jar file from http://www.mineotaur.org.
  2. Create an options file, a data file and a label file (see the documentation above and the example input data).
  3. Start the data import with the following command, passing the options file, the data file and the label file in this order:

     java -jar <path_to_jar_file> -import mineotaur.input chia_sample.tsv chia_labels.tsv

  4. After the database creation is completed, start your Mineotaur instance with the following command:

     java -jar <path_to_jar_file> -start <instance_name>

  5. Start querying at http://127.0.0.1:8080 in your browser.
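If you generate instances repeatedly, the import and start commands can be scripted. The sketch below only assembles and runs the -import and -start commands exactly as given above; the function names are hypothetical, and the jar path and instance name are placeholders. The instance name is presumably the value of the name option in the options file.

```python
import subprocess


def import_command(jar, options_file, data_file, label_file):
    """Build the -import command line shown above."""
    return ["java", "-jar", jar, "-import", options_file, data_file, label_file]


def start_command(jar, instance_name):
    """Build the -start command line shown above."""
    return ["java", "-jar", jar, "-start", instance_name]


def generate_and_start(jar, options_file, data_file, label_file, instance_name):
    # Run the import to completion; check=True raises CalledProcessError on failure.
    subprocess.run(import_command(jar, options_file, data_file, label_file), check=True)
    # Launch the server in the background and return its process handle.
    return subprocess.Popen(start_command(jar, instance_name))
```

For example, generate_and_start("mineotaur.jar", "mineotaur.input", "chia_sample.tsv", "chia_labels.tsv", "my_screen") runs both steps; call terminate() on the returned process to stop the server.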

Generation using the wizard

  1. Download the latest jar file from http://www.mineotaur.org.
  2. Create an options file, a data file and a label file (see the documentation above and the example input data).
  3. Start the data import wizard with the following command:

     java -jar <path_to_jar_file> -wizard

     _images/wizard.png

  4. After the database creation is completed, start your Mineotaur instance with the following command:

     java -jar <path_to_jar_file> -start <instance_name>

  5. Start querying at http://127.0.0.1:8080 in your browser.

Using the web interface

Layout

Each Mineotaur instance uses the same web interface layout.

Query panel

In each query panel, a variety of options can be set to customize the query.

_images/scatterplot_header.png

For more details, see the Query tools, Scatterplots and Distribution plots sections.

Plot area

The plot of the requested data is shown below the query panel.

_images/scatterplot_header.png

Once the query has run, the Tools menu is activated, allowing various actions on the plot and the queried data.

For more details, see The Tools menu, Scatterplots and Distribution plots sections.

Help

(Optional) If provided with the Mineotaur instance, clicking on the Help link shows information on the elements of the current page.

_images/help.png

Query tools

Variable selection

First, the two variables (properties) to be shown need to be selected.

_images/scatterplot_property.png

Clicking on the selection panel shows the available variables.

_images/propertylist.png

(Group level scatterplot only) The group level variables are aggregated from the descriptive level data. The aggregation mode selection panel determines how the group level values are calculated.

_images/aggregations.png

Finally, the descriptive data can be filtered via a filter property. In this example, only cells in the selected cell cycle stage are used in the query.

_images/descriptive_filters.png
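Conceptually, a group level value is obtained by first keeping the descriptive objects that pass the filter and then aggregating their values per group object. The Python sketch below illustrates this idea; it is not Mineotaur's implementation. The four aggregation modes (mean, median, min, max) and the column names are assumptions for illustration, so check the aggregation panel of your instance for the modes it actually offers.

```python
import statistics
from collections import defaultdict

# Hypothetical aggregation modes; the real list is shown in the aggregation panel.
AGGREGATIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def group_values(cells, group_key, variable, mode="mean", filter_key=None, filter_value=None):
    """Aggregate a descriptive level variable to the group level."""
    per_group = defaultdict(list)
    for cell in cells:
        # Descriptive filter: only keep cells with the selected filter value.
        if filter_key is not None and cell.get(filter_key) != filter_value:
            continue
        per_group[cell[group_key]].append(cell[variable])
    aggregate = AGGREGATIONS[mode]
    return {group: aggregate(values) for group, values in per_group.items()}


# Invented cell level data with a cell cycle stage used as the filter property.
cells = [
    {"geneID": "wee1", "stage": "G2", "area": 10.0},
    {"geneID": "wee1", "stage": "G2", "area": 14.0},
    {"geneID": "wee1", "stage": "M", "area": 30.0},
    {"geneID": "cdc25", "stage": "G2", "area": 20.0},
]
g2_means = group_values(cells, "geneID", "area", filter_key="stage", filter_value="G2")
```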

Filtering by annotation

The queried group objects can be filtered by their annotations. For example, only genes associated with a certain hit type are shown:

_images/filter.png

Group selection

Finally, select the group objects to show on the plot. For descriptive level scatterplots, this is done by selecting the object of interest from a list:

_images/cellwise_geneselect.png

For group level scatterplots, objects are selected from a selection menu:

_images/gene_checklist.png

The search box on the top of the menu enables quick lookup of the objects included in the screen:

_images/gene_checklist_filter.png

Alternatively, you can use the free text input by clicking the “Enter a list of genes” link below the selection box.

_images/gene_freetext.png

The entered gene names are validated, and the ones included in the screen are selected. Once all options are set, click the Submit button. To start over, click the Reset button, which restores every option to its default setting.

_images/buttons.png
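The validation of a free text gene list can be pictured as splitting the text into names and keeping those that occur in the screen. This Python sketch is only an illustration: the function name, the accepted separators (commas, semicolons, whitespace) and the case-insensitive matching are assumptions, not documented Mineotaur behaviour.

```python
import re


def validate_gene_list(text, screen_ids):
    """Split free text into names and keep those present in the screen.

    Returns (selected, unknown). Matching is case-insensitive here, which is an
    assumption; Mineotaur's own validation may differ.
    """
    lookup = {gene_id.lower(): gene_id for gene_id in screen_ids}
    selected, unknown = [], []
    # Accept names separated by commas, semicolons or whitespace (including newlines).
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        match = lookup.get(token.lower())
        if match is None:
            unknown.append(token)
        elif match not in selected:
            selected.append(match)  # keep the screen's spelling, skip duplicates
    return selected, unknown


selected, unknown = validate_gene_list("WEE1, cdc25\nfoo1 wee1", ["wee1", "cdc25", "tea1"])
```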

Scatterplots

A scatter plot shows two variables against each other in a 2D coordinate system. In a Mineotaur instance, there are two kinds of scatterplots: group level and descriptive level scatterplots. The query panel for a group level scatterplot looks like this:

_images/scatterplot_header.png

Using the scatterplot

Once you click the Submit button, the query is sent to the server and, if any data is returned, a plot like this is displayed:

_images/scatterplot.png

Coloring

Data points are colored according to the color associated with each annotation (hit) type, as shown in the legend in the top right corner of the plot:

_images/legend.png

The nodes are also transparent. This allows objects with multiple annotations to be represented visually (their color is the sum of the annotation colors) and keeps the distribution of the data points visible.

Exploring individual data points

Name and values

To see the name of the underlying data point and the respective values for the queried variables, hover the mouse over the data point.

_images/scatterplot_hover.png
External resource

(Optional) Left-clicking on a data point opens an external link associated with the object, e.g. the raw images used for the analysis. This option only works if the external resource was provided during instance generation.

Subqueries

Invoking the context menu on a node (e.g. right-click on Windows or CMD+click on OS X) creates a subquery for the selected node, showing either the distribution of one of the queried variables or a descriptive level scatterplot.

_images/context_menu.png

To go back to the original scatterplot, use the browser’s back button.

Plot tools

The plot tools provide several ways to transform or analyze plots:

_images/plot_tools.png

Logarithm

Clicking on the Logarithm checkbox transforms the axes of the plot to a logarithmic scale.

_images/logarithm.png

To go back to the original scale, untick the checkbox.
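On a logarithmic axis, each coordinate is displayed at its logarithm, so only positive values can appear. The Python sketch below shows the transformation with base-10 logarithms; dropping points with a non-positive coordinate is an assumption, as this page does not say how Mineotaur treats such points.

```python
import math


def log_transform(points):
    """Map (x, y) points to log10 space, as a log-scaled axis would display them.

    Points with a non-positive coordinate have no logarithm and are dropped here;
    how Mineotaur itself handles them is not specified on this page.
    """
    return [
        (math.log10(x), math.log10(y))
        for x, y in points
        if x > 0 and y > 0
    ]


logged = log_transform([(1, 10), (100, 1000), (0, 5), (10, -1)])
```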

Transpose

Clicking on the Transpose checkbox swaps the X-axis and the Y-axis.

_images/transpose.png

To go back to the original orientation, untick the checkbox.

Regression

Clicking on the Regression checkbox fits a regression line to the data shown in the current plot. The type of the regression line can be selected from the selection box next to the checkbox.

_images/regression.png

To see the correlation coefficient of the regression line, hover the mouse over the line:

_images/regression_hover.png

To remove the regression line, untick the checkbox.
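For the linear case, the fitted line and the correlation coefficient can be computed with ordinary least squares and Pearson's r. The Python sketch below shows these textbook formulas; it covers only linear regression, and whether Mineotaur reports Pearson's r for every regression type offered in the selection box is not stated here.

```python
import math


def linear_regression(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, plus Pearson's r.

    Only the linear case is shown; the plot's selection box may offer other
    regression types.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = linear_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```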

Select area

To analyze a specific area of the plot, use the Select area tool. Checking the box turns the cursor into an area selection tool, which you can use to draw a rectangle around the area to be selected:

_images/select_area.png

If you are satisfied with the selection, hover over the area and click on the Analyze button:

_images/select_area_analyze.png

Then, a plot showing the data points from the selected area is shown. To go back to the previous plot, use the browser's Back button.
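Selecting an area amounts to keeping the points whose coordinates fall inside the drawn rectangle. The Python sketch below illustrates this; representing points as a name-to-coordinates mapping and treating the rectangle's border as inside are assumptions made for the example, and the gene names are invented.

```python
def select_area(points, corner_a, corner_b):
    """Return the points inside the rectangle spanned by two corners (borders included).

    points maps an object name to its (x, y) plot coordinates. The corners can be
    given in any order, as when the rectangle is dragged in any direction.
    """
    (x1, y1), (x2, y2) = corner_a, corner_b
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    return {
        name: (x, y)
        for name, (x, y) in points.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    }


points = {"wee1": (1, 1), "cdc25": (5, 5), "tea1": (2, 3)}
selection = select_area(points, (3, 4), (0, 0))
```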

Visual filtering of nodes

Since scatterplots can be overcrowded, it might be hard to find individual objects on a plot. To help with this, objects of interest, such as genes, can be highlighted on a plot by selecting them from the provided list and clicking on the Filter link.

_images/gene_checklist_filter.png _images/visual_filtering.png

The highlighting can be reset by using the Reset link.

Setting the opacity

To enable visual inspection of crowded areas, you can use the opacity slider to set the right amount of transparency.

_images/opacity.png

Distribution plots

Distribution plots provide a graphical representation of the distribution of a variable. In Mineotaur, there are three distribution plot types, which can be selected from the Plot type selection box in the Distribution plot query menu.

_images/distribution_plot_type.png

Histogram

A histogram shows the frequency of the variable values in the selected dataset. The binning of the histogram is calculated automatically from the data.

_images/histogram.png
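The Python sketch below shows one way to compute an equal-width histogram with automatic binning. The exact binning rule used by Mineotaur is not documented here, so Sturges' rule (ceil(log2 n) + 1 bins) is used as a stand-in; the function name and the handling of constant data are also assumptions.

```python
import math


def histogram(values, bins=None):
    """Count values in equal-width bins; returns (edges, counts).

    If bins is not given, Sturges' rule (ceil(log2 n) + 1) is used as a stand-in
    for Mineotaur's automatic binning. len(edges) == len(counts) + 1.
    """
    if not values:
        raise ValueError("no values to bin")
    n = len(values)
    if bins is None:
        bins = math.ceil(math.log2(n)) + 1
    low, high = min(values), max(values)
    if low == high:
        # All values are equal: use a single bin of width 1 around them.
        return [low - 0.5, high + 0.5], [n]
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The maximum falls into the last bin rather than one past the end.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return edges, counts


edges, counts = histogram([1, 2, 2, 3, 3, 3, 4, 4, 5])
```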

Multihistogram

(Group level only) Multihistograms show the frequency of the variable values in the selected dataset, with the data split according to the annotations assigned to the data points. The histograms belonging to the different annotations are shown in different colors. The legend is provided in the top right corner.

_images/multihistogram.png

Kernel Density Estimation

Kernel Density Estimation plots show a continuous approximation of the distribution, obtained by placing a Gaussian kernel on each data point and summing the kernels. In group level plots, the different colors refer to the data point annotations. The legend is provided in the top right corner.

_images/kde.png
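A Gaussian kernel density estimate at a point x is the average of normal densities centred on the individual data values. The Python sketch below implements this standard estimator; the default bandwidth uses the normal reference rule of thumb (1.06 times the standard deviation times n to the power -1/5), which is an assumption and not necessarily Mineotaur's choice.

```python
import math
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return a function estimating the density of values with Gaussian kernels.

    If no bandwidth is given, the normal reference rule of thumb is used
    (an assumption, not necessarily what Mineotaur uses).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centred on each value, normalised to integrate to 1.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


density = gaussian_kde([2.0, 2.5, 3.1, 4.0, 4.2])
```

For a group level plot, the same estimator would be applied separately to the values of each annotation, giving one curve per color.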

The Tools menu

Once a plot has been loaded, the Tools menu is enabled, allowing various actions on the current plot.

_images/tools.png

Filtering by hit type

_images/filter.png

As with querying, the data points can also be filtered once the plot has been loaded. Unchecking a box removes all queried data points of the respective hit type; checking it again puts them back.

_images/filter.png

Other plot tools

There are also several tools to export, share or analyze the plots further.

_images/other_tools.png

Download

By clicking on the CSV or SVG link, you can download the raw data used to create the plot or the plot itself in a vector graphics format, respectively.

Statistics

The Statistics tool provides a summary of the data underlying the plot.

_images/statistics.png
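As an illustration of what such a summary involves, the Python sketch below computes common descriptive statistics for one plotted variable with the standard library. The set of statistics (count, mean, sample standard deviation, median, minimum and maximum) is an assumption; the Statistics tool may report a different set.

```python
import statistics


def summarize(values):
    """Summary statistics for one plotted variable (illustrative set)."""
    if not values:
        raise ValueError("no values to summarize")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation; undefined for a single value, reported as 0.0 here.
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize([4.0, 8.0, 6.0, 2.0])
```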

Comparing charts

Multiple (up to 4) plots can be shown alongside each other on a comparison sheet. By clicking on the Select for comparison link, the current plot is copied to the next available position in the comparison sheet.

_images/comparison.png

Once the plots are loaded, they can be removed or switched, and plots of the same kind can be synchronized, i.e. their axes are set to the same range, allowing straightforward visual comparison.
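Synchronizing axes means giving every plot the same axis ranges. The Python sketch below uses the union of the current ranges so that no data point is clipped; this choice, the dictionary representation of a plot and the function name are assumptions about how Mineotaur synchronizes comparison plots.

```python
def synchronize_axes(plots):
    """Give every plot the same x and y ranges: the union of their current ranges.

    Each plot is a dict with "x" and "y" ranges as (low, high) tuples. New dicts
    are returned; the input plots are left unchanged.
    """
    if not plots:
        return []
    x_range = (min(p["x"][0] for p in plots), max(p["x"][1] for p in plots))
    y_range = (min(p["y"][0] for p in plots), max(p["y"][1] for p in plots))
    return [dict(p, x=x_range, y=y_range) for p in plots]


plots = [
    {"title": "plot A", "x": (0, 10), "y": (-1, 1)},
    {"title": "plot B", "x": (2, 15), "y": (0, 3)},
]
synced = synchronize_axes(plots)
```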

Developing Mineotaur

Software used to create Mineotaur

Server side:

Client side:

Architecture of Mineotaur

The Mineotaur web server can be accessed both through a web interface and programmatically using REST. The web server handles the interaction with the graph database containing the HT/HCS data.

_images/mineotaur_architecture_new.png

Server side architecture

The web server is based on the Spring Model-View-Controller (MVC) framework, using Thymeleaf as its template engine. The data is stored in the Neo4j graph database. A web client accesses the content by making an HTTP request to the server, which queries the appropriate data from the database and renders a web page from a Thymeleaf template.

_images/mineotaur_server_new.png

Client side architecture

On the client side, all interaction is handled by a JavaScript application. The application is modular, with separate modules responsible for handling events (Controller), carrying data values (Context), manipulating web pages (UI), generating plots (Plot) and providing general functionality (Utilities).

_images/mineotaur_client_modules.png

Features planned to be included in further releases of Mineotaur

  1. Omero integration
  2. REST client libraries
  3. Time-lapse data handling
  4. Network data handling

If you have any other suggestions, please let us know at info@mineotaur.org!

Licence

Mineotaur: a visual analytics tool for high-throughput microscopy screens Copyright (C) 2014 Bálint Antal (University of Cambridge)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
