Welcome to Vizier DB - Web User Interface’s documentation!
Vizier is a powerful new tool that streamlines the data curation process. Data curation (also known as data preparation, wrangling, or cleaning) is a critical stage in data science in which raw data is structured, validated, and repaired. Data validation and repair establish trust in analytical results, while appropriate structuring streamlines analytics.
Vizier makes it easier and faster to explore and analyze raw data by combining a simple notebook interface with spreadsheet views of your data. Powerful back-end tools track changes, edits, and the effects of automation. These forms of provenance capture both parts of the exploratory curation process: how the cleaning workflows evolve and how the data changes over time.
Vizier is a collaboration between the University at Buffalo, New York University, and the Illinois Institute of Technology.
Contents
Install and Run
Before installing the Vizier DB Web UI, install VizierDB - Web API. The Web API is the back end that provides the API used by the Vizier DB Web UI.
Install VizierDB - Web API
Installation is still somewhat labor-intensive. The following steps currently seem to work (Anaconda is required). If you want to use Mimir modules within your curation workflows, you need a local installation of Mimir v0.2. Refer to this guide for Mimir installation details.
Python Environment
To set up the Python environment, clone the repository and run the following commands:
>>> git clone https://github.com/VizierDB/web-api.git
>>> cd web-api
>>> conda env create -f environment.yml
>>> source activate vizier
>>> pip install git+https://github.com/VizierDB/Vistrails.git
>>> pip install -e .
Alternatively, the following sequence of steps might also work (e.g., on macOS):
>>> git clone https://github.com/VizierDB/web-api.git
>>> cd web-api
>>> conda create --name vizier pip
>>> source activate vizier
>>> pip install -r requirements.txt
>>> pip install -e .
>>> conda install pyqt=4.11.4=py27_4
Configuration
The web server is configured using a configuration file. The config directory contains two example configuration files: config-mimir.yaml (for use with Mimir) and config-default.yaml (without Mimir).
The configuration parameters are:
api
- server_url: URL of the server (e.g., http://localhost)
- server_port: Server port (e.g., 5000)
- app_path: Application path for Web API (e.g., /vizier-db/api/v1)
- app_base_url: Concatenation of server_url, server_port and app_path
- doc_url: URL of the API documentation
fileserver
- directory: Path to base directory for file server
- max_file_size: Maximum size for file uploads
engines
- identifier: Engine type (i.e., DEFAULT or MIMIR)
- name: Engine printable name
- description: Descriptive text for engine
- datastore:
  - directory: Base directory for data store
viztrails
- directory: Base directory for storing viztrail information and metadata
name: Web Service name
debug: Flag indicating whether server is started in debug mode
logs: Path to log directory
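The following sketch shows how the api parameters fit together. It models the api section as a Python dictionary with the example values listed above and derives app_base_url from the other three parameters. The helper build_app_base_url and the joining rule (server_url, a colon, server_port, then app_path) are assumptions made for illustration. The server's own code may assemble the URL differently.

```python
# Hypothetical model of the api configuration section, using the
# example values from the parameter list above.
api_config = {
    "server_url": "http://localhost",
    "server_port": 5000,
    "app_path": "/vizier-db/api/v1",
}


def build_app_base_url(api):
    """Concatenate server_url, server_port and app_path (assumed format)."""
    # Drop any trailing slash on the server URL so only one ':' separates host and port.
    server = api["server_url"].rstrip("/")
    # Make sure the application path starts with exactly one '/'.
    path = "/" + api["app_path"].lstrip("/")
    return "{}:{}{}".format(server, api["server_port"], path)


print(build_app_base_url(api_config))
# → http://localhost:5000/vizier-db/api/v1
```

With these values the derived base URL matches the local server on port 5000 that the Web UI expects by default.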
When the web server starts, it first looks for the configuration file referenced by the environment variable VIZIERSERVER_CONFIG. If the variable is not set, the server looks for a file named config.yaml in the current working directory.
Note that the working directory of the server contains a config.yaml file that can be used in development mode.
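The lookup order described above can be summarized in a short Python 3 sketch. The function resolve_config_path is hypothetical and only mirrors the documented behavior: the file named by VIZIERSERVER_CONFIG takes precedence, and config.yaml in the current working directory is the fallback. Treating an empty variable the same as an unset one is an assumption, and the example paths are illustrative POSIX paths.

```python
import os


def resolve_config_path(environ=None, cwd=None):
    """Return the configuration file the server would load (sketch of the documented order)."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    # Assumption: an empty VIZIERSERVER_CONFIG counts as "not set".
    configured = environ.get("VIZIERSERVER_CONFIG")
    if configured:
        return configured
    # Fallback: config.yaml in the current working directory.
    return os.path.join(cwd, "config.yaml")


print(resolve_config_path({"VIZIERSERVER_CONFIG": "/opt/vizier/config-mimir.yaml"}))
# → /opt/vizier/config-mimir.yaml
print(resolve_config_path({}, cwd="/home/user/web-api/vizier"))
# → /home/user/web-api/vizier/config.yaml
```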
Run Server
After adjusting the server configuration, run the server using the following commands:
>>> cd vizier
>>> python server.py
Make sure that the conda environment has been activated using source activate vizier.
If you are using Mimir, start the Mimir gateway server before running the web server.
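If you keep more than one configuration file (for example, config-default.yaml and config-mimir.yaml), you can choose one by setting VIZIERSERVER_CONFIG when you launch the server. The Python 3 sketch below does this with the standard subprocess module. The helper names server_launch_args and start_server are hypothetical. The layout, with server.py in a vizier subdirectory and the examples in a config subdirectory of the web-api checkout, is an assumption. The sketch should be run from the web-api checkout with the vizier conda environment active.

```python
import os
import subprocess
import sys


def server_launch_args(config_path, server_dir="vizier"):
    """Build the command, working directory and environment for server.py (hypothetical helper)."""
    # Start from the current environment so PATH and the active conda env are preserved.
    env = dict(os.environ)
    # Use an absolute path: the server runs from server_dir, not from the current directory.
    env["VIZIERSERVER_CONFIG"] = os.path.abspath(config_path)
    # Reuse the interpreter running this script, i.e. the activated environment.
    command = [sys.executable, "server.py"]
    return command, server_dir, env


def start_server(config_path):
    """Launch the web server and block until it exits (e.g. on Ctrl+C)."""
    command, cwd, env = server_launch_args(config_path)
    return subprocess.call(command, cwd=cwd, env=env)
```

For example, calling start_server("config/config-mimir.yaml") from the web-api directory would start the server with the Mimir configuration. The config path is resolved to an absolute path first because the relative path would otherwise be interpreted from inside the vizier directory.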
API Documentation
For development, it can be helpful to have a local copy of the API documentation. The repository README contains instructions for installing the UI locally.
Install VizierDB - Web UI
Start by cloning the repository and switching to the app directory.
>>> git clone https://github.com/VizierDB/web-ui.git
>>> cd web-ui
Inside the app directory, you can run several commands:
Install build dependencies
>>> npm install
Start the development server
>>> npm start
Bundle the app into static files for production
>>> npm run build
Additional Commands
Start the test runner.
>>> npm test
Remove the build tool and copy its build dependencies, configuration files, and scripts into the app directory. If you do this, you can’t go back!
>>> npm eject
Configuration
The UI app connects to the Web API server. The URL for the server is currently hard-coded in the file public/env.js. Before running npm start, adjust the URL to point to a running Web API server. By default, the app uses a local server running on port 5000.
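Before running npm start, it can help to confirm that the URL configured in public/env.js actually responds. The Python 3 sketch below uses only the standard library to send a GET request. The default URL assumes the example api settings from the Web API configuration (http://localhost, port 5000, /vizier-db/api/v1), and the name api_is_reachable is hypothetical. Any HTTP response, including an error status, is treated as "the server is up". Proxies are bypassed on the assumption that you are checking a server you run yourself.

```python
import urllib.error
import urllib.request

# Assumed default: local Web API on port 5000 with app_path /vizier-db/api/v1.
DEFAULT_API_URL = "http://localhost:5000/vizier-db/api/v1"


def api_is_reachable(url=DEFAULT_API_URL, timeout=2.0):
    """Return True if any HTTP server answers a GET request on url."""
    # An empty ProxyHandler disables proxy settings taken from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, unknown host, timeout, and similar failures.
        return False
```

With the default settings, api_is_reachable() should return False while the Web API server is stopped and True once python server.py is running.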
Getting Started
Vizier organizes data curation workflows into projects.
- Start by selecting or creating a new project under the Projects Tab.
- If the data that you want to clean is currently stored in CSV files, these files have to be uploaded to the file server. You can upload your data files under the Files Tab.
Step 1
Create Project

Begin by adding a project on the Vizier start page, shown in the figure above, by clicking the New Project… button.

In the New Project… dialog shown in the figure above, enter the name of the project you want to create (for example, credit_card) and click the Submit button. The new project now appears in the list of projects, as shown below.

Once the project is added, click its name in the list of projects to start curating data.
Step 2
Load Dataset
Continuing with the credit_card example project, this step shows how to upload data.
First, select a project from the list of projects (for example, credit_card) by clicking its name.

Once you are inside the project, click the + sign to start loading data.

Then, in the DATASET column, click Load Dataset.

Next, upload the dataset. You have to provide a dataset name and a source file.
For example, we entered credit card data as the name of the dataset, selected the file ccard.csv, and then clicked the blue play icon.
After loading the credit card dataset, we can start to explore and curate our data.
Step 3
Spreadsheet Views
To access the spreadsheet view of the credit card data, open the Credit Card Data tab.

Step 4
Chart Views
Vizier provides five chart types: simple bar chart, group bar chart, line chart, area chart, and scatter plot. To create a chart, select the Plot >> Simple Chart module from the list shown below.

Then, fill in the form and click the blue play icon.

To access the chart view of the credit card project, open the Age Income tab, which is named after the chart.
