Welcome to Vizier DB - Web User Interface’s documentation!

Vizier is a powerful new tool that streamlines the data curation process. Data curation (also known as data preparation, wrangling, or cleaning) is a critical stage in data science in which raw data is structured, validated, and repaired. Data validation and repair establish trust in analytical results, while appropriate structuring streamlines analytics.

Vizier makes it easier and faster to explore and analyze raw data by combining a simple notebook interface with spreadsheet views of your data. Powerful back-end tools track changes, edits, and the effects of automation. These forms of provenance capture both parts of the exploratory curation process: how the cleaning workflows evolve, and how the data changes over time.

Vizier is a collaboration between the University at Buffalo, New York University, and the Illinois Institute of Technology.

Contents

Install and Run

Before installing the Vizier DB Web UI, install VizierDB - Web API. The Web API is the back end that the Web UI connects to.

Install VizierDB - Web API

Installation is still a bit labor intensive. The following steps seem to work for now (requires Anaconda). If you want to use Mimir modules within your curation workflows, a local installation of Mimir v0.2 is required. Refer to this guide for Mimir installation details.

Python Environment

To set up the Python environment, clone the repository and run the following commands:

>>> git clone https://github.com/VizierDB/web-api.git
>>> cd web-api
>>> conda env create -f environment.yml
>>> source activate vizier
>>> pip install git+https://github.com/VizierDB/Vistrails.git
>>> pip install -e .

As an alternative, the following sequence of steps might also work (e.g., on macOS):

>>> git clone https://github.com/VizierDB/web-api.git
>>> cd web-api
>>> conda create --name vizier pip
>>> source activate vizier
>>> pip install -r requirements.txt
>>> pip install -e .
>>> conda install pyqt=4.11.4=py27_4

Configuration

The web server is configured using a configuration file. The config directory contains two example configuration files: config-mimir.yaml for a setup that includes Mimir, and config-default.yaml for one that does not.

The configuration parameters are:

api

  • server_url: URL of the server (e.g., http://localhost)
  • server_port: Server port (e.g., 5000)
  • app_path: Application path for Web API (e.g., /vizier-db/api/v1)
  • app_base_url: Concatenation of server_url, server_port and app_path
  • doc_url: URL of the API documentation

fileserver

  • directory: Path to base directory for file server
  • max_file_size: Maximum size for file uploads

engines

  • identifier: Engine type (i.e., DEFAULT or MIMIR)
  • name: Engine printable name
  • description: Descriptive text for engine
  • datastore:
      • directory: Base directory for data store

viztrails

  • directory: Base directory for storing viztrail information and meta data

name: Web Service name

debug: Flag indicating whether server is started in debug mode

logs: Path to log directory
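
As a rough illustration of how these parameters fit together, the following Python sketch mirrors the list above as a plain dictionary and derives app_base_url from its parts. The paths, the engine name and description, the doc_url, and the size limit (including its unit) are placeholder assumptions, and build_app_base_url is a hypothetical helper rather than part of the Web API. The example files in the config directory remain the authoritative reference for the actual layout.

```python
# Illustrative only: the parameter list above expressed as a Python dictionary.
# All paths, names, and limits below are placeholder assumptions.
example_config = {
    "api": {
        "server_url": "http://localhost",
        "server_port": 5000,
        "app_path": "/vizier-db/api/v1",
        "doc_url": "http://localhost/doc",  # placeholder
    },
    "fileserver": {
        "directory": "./data/fs",  # placeholder path
        "max_file_size": 16 * 1024 * 1024,  # placeholder; unit assumed to be bytes
    },
    "engines": [
        {
            "identifier": "DEFAULT",
            "name": "Default engine",  # placeholder
            "description": "Engine without Mimir modules",  # placeholder
            "datastore": {"directory": "./data/ds"},  # placeholder path
        }
    ],
    "viztrails": {"directory": "./data/viztrails"},  # placeholder path
    "name": "Vizier Web API",  # placeholder
    "debug": True,
    "logs": "./logs",  # placeholder path
}


def build_app_base_url(api):
    """Concatenate server_url, server_port and app_path (hypothetical helper)."""
    return "{}:{}{}".format(api["server_url"], api["server_port"], api["app_path"])


app_base_url = build_app_base_url(example_config["api"])
print(app_base_url)  # prints http://localhost:5000/vizier-db/api/v1
```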

When the web server starts, it first looks for the configuration file referenced in the environment variable VIZIERSERVER_CONFIG. If the variable is not set, the server looks for a file config.yaml in the current working directory.

Note that there is a config.yaml file in the working directory of the server that can be used for development mode.
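
The lookup order for the configuration file can be sketched in a few lines of Python. This is a minimal sketch of the behaviour described above, not the server's actual code: resolve_config_path is a hypothetical helper, and treating an empty VIZIERSERVER_CONFIG value as "not set" is an assumption.

```python
import os


def resolve_config_path(environ=None, cwd=None):
    """Return the configuration file path using the lookup order described above.

    Hypothetical helper for illustration; not taken from the Vizier code base.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    # 1. Prefer the file referenced by VIZIERSERVER_CONFIG if the variable is set.
    #    Assumption: an empty value is treated the same as an unset variable.
    configured = environ.get("VIZIERSERVER_CONFIG")
    if configured:
        return configured
    # 2. Otherwise fall back to config.yaml in the current working directory.
    return os.path.join(cwd, "config.yaml")
```

Calling resolve_config_path() without arguments uses the real environment and working directory, which makes it easy to check which file a server started from the current shell would be expected to pick up.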

Run Server

After adjusting the server configuration, start the server using the following commands:

>>> cd vizier
>>> python server.py

Make sure that the conda environment has been activated using source activate vizier.
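
One way to verify the active environment from Python is to inspect the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated. The sketch below assumes the environment is named vizier, as in the commands above; the function names are hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def check_vizier_env(environ=None, expected="vizier"):
    """Return True if the expected environment is active (hypothetical helper)."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if check_vizier_env():
        print("conda environment 'vizier' is active")
    else:
        print("run 'source activate vizier' before starting the server")
```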

If using Mimir, the gateway server should be started before running the web server.

API Documentation

For development it can be helpful to have a local copy of the API documentation. The repository README contains information on how to install the UI locally.

Install VizierDB - Web UI

Start by cloning the repository and switching to the app directory.

>>> git clone https://github.com/VizierDB/web-ui.git
>>> cd web-ui

Inside the app directory, you can run several commands:

Install build dependencies

>>> npm install

Start the development server

>>> npm start

Bundle the app into static files for production

>>> npm run build

Additional Commands

Start the test runner

>>> npm test

Remove this tool and copy build dependencies, configuration files, and scripts into the app directory. If you do this, you can’t go back!

>>> npm run eject

Configuration

The UI app connects to the Web API server. The URL of the server is currently hard-coded in the file public/env.js. Before running npm start, adjust the URL to point to a running Web API server. By default, a local server running on port 5000 is used.
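
Before starting the UI, it can be useful to confirm that the Web API server is reachable at the configured URL. The following is a minimal Python 3 sketch using only the standard library. The default URL combines the local server on port 5000 with the example app_path from the Web API configuration, which is an assumption about your setup, and api_is_reachable is a hypothetical helper.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed default: local server on port 5000 with the example app_path.
DEFAULT_API_URL = "http://localhost:5000/vizier-db/api/v1"


def api_is_reachable(url=DEFAULT_API_URL, timeout=3.0):
    """Return True if an HTTP server answers at url (hypothetical helper)."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False


if __name__ == "__main__":
    print("Web API reachable:", api_is_reachable())
```

The check only tells you that something answers HTTP requests at that address; it does not validate that the response comes from the Vizier Web API.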

Getting Started

Vizier organizes data curation workflows into projects.

  • Start by selecting an existing project or creating a new one under the Projects tab.
  • If the data that you want to clean is currently stored in CSV files, these files have to be uploaded to the file server. You can upload your data files under the Files Tab.
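
Before uploading a CSV file, a quick local look at its header and row count can catch obvious problems such as a missing header line. The sketch below uses Python's standard csv module; preview_csv is a hypothetical helper, and the sample columns are invented for illustration and do not describe any particular dataset.

```python
import csv
import io


def preview_csv(text, max_rows=5):
    """Return (header, first rows, total data row count) for CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])  # empty list if the text has no lines
    rows = list(reader)
    return header, rows[:max_rows], len(rows)


# Hypothetical sample data; replace with the contents of your own file.
sample = "age,income\n34,52000\n45,61000\n"
header, first_rows, total = preview_csv(sample)
print(header, total)
```

To inspect a real file, read it with open(path, newline="") and pass its contents to preview_csv.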

Step 1

Create Project

[Figure: the Vizier initial page]

Begin by adding a project on the Vizier page (the initial page), shown in the figure above, by clicking on the New Projects … button.

[Figure: the New Project… dialog]

In the New Project… dialog shown in the figure above, enter the name of the project you would like to create, for example credit_card, and click on the Submit button. You should now see the new project in the list of projects, as shown below.

[Figure: the list of projects with the new project added]

Once the project is added, click on the project name in the list of projects to begin data curation.

Step 2

Load Dataset

Continuing with our example of the credit_card project, this step shows how to upload data.

First, select a project from the list of projects, for example the credit_card project, by clicking on its name.

Once you are inside the project, load the data by clicking on the + sign.

Then go to the DATASET column and click on Load Dataset.

Then upload the dataset. You have to provide the dataset name and the source file.

For example, we entered credit card data as the name of the dataset for that project, selected the ccard.csv file, and then clicked on the blue play icon.

After loading the credit card dataset, we can start to explore and curate our data.

Step 3

Spreadsheet Views

To access the spreadsheet view of our credit card project, go to the Credit Card Data tab.

Step 4

Chart Views

Vizier provides five types of chart: simple bar chart, grouped bar chart, line chart, area chart, and scatter plot. To create a chart, select the module Plot>>Simple Chart from the list of modules.

Then fill in the form and click on the blue play icon.

To access the chart view of our credit card project, go to the Age Income tab, which is named after the chart.
