Pontoon - Translate the Web. In Place.

Pontoon is a web interface for translating text into other languages. Pontoon specializes in translating websites in-place, but can handle any project that uses one of the file formats it supports:

  • Gettext PO
  • XLIFF
  • FTL (L20n)
  • Properties
  • DTD
  • INI
  • INC
  • .lang

Pontoon pulls strings it needs to translate from an external source, and writes them back periodically. Typically these external sources are version control repositories that store the strings for an application. Supported external sources include:

  • Git
  • Mercurial
  • Subversion

Developer Setup

The following describes how to set up an instance of the site on your computer for development.

Prerequisites

This guide assumes you have already installed and set up the following:

  1. Git
  2. Python 2.7, pip, and virtualenv
  3. Node.js and npm
  4. Postgres 9.4 or 9.5

These docs assume a Unix-like operating system, although the site should, in theory, run on Windows as well. All the example commands given below are intended to be run in a terminal. If you’re on Ubuntu 16.04, you can install all the prerequisites using the following command:

sudo apt install git python-pip nodejs-legacy npm postgresql postgresql-server-dev-9.5 postgresql-contrib-9.5 libxml2-dev libxslt1-dev python-dev libmemcached-dev virtualenv

Installation

  1. Clone this repository or your fork:

    git clone --recursive https://github.com/mozilla/pontoon.git
    cd pontoon
    
  2. Create a virtualenv for Pontoon and activate it:

    virtualenv venv
    source ./venv/bin/activate
    

    Note

    Whenever you want to work on Pontoon in a new terminal you’ll have to re-activate the virtualenv. Read the virtualenv documentation to learn more about how virtualenv works.

  3. Install the dependencies using the latest version of pip:

    pip install --require-hashes -r requirements-dev.txt
    
  4. Create your database, using the following set of commands:

    sudo -u postgres psql
    CREATE USER pontoon WITH PASSWORD 'asdf' SUPERUSER;
    CREATE DATABASE pontoon;
    GRANT ALL PRIVILEGES ON DATABASE pontoon to pontoon;
    \q
    
  5. Create a .env file at the root of the repository to configure the settings for your development instance. It should look something like this:

    SECRET_KEY=insert_random_key
    DJANGO_DEV=True
    DJANGO_DEBUG=True
    DATABASE_URL=postgres://pontoon:asdf@localhost/pontoon
    SESSION_COOKIE_SECURE=False
    SITE_URL=http://localhost:8000
    FXA_CLIENT_ID=2651b9211a44b7b2
    FXA_SECRET_KEY=a3cafccbafe39db54f2723f8a6f804c337e362950f197b5b33050d784129d570
    FXA_OAUTH_ENDPOINT=https://oauth-stable.dev.lcip.org/v1
    FXA_PROFILE_ENDPOINT=https://stable.dev.lcip.org/profile/v1
    

    Make sure to make the following modifications to the template above:

    • SECRET_KEY should be set to some random key you come up with, as it is used to secure the authentication data for your local instance.
    • DATABASE_URL should contain the connection data for connecting to your Postgres database. It takes the form postgres://username:password@server_addr/database_name.
    • SITE_URL should be set to the URL you will use to connect to your local development site. Some people prefer to use http://127.0.0.1:8000 instead of localhost. However, if you change SITE_URL, you also need to request a new FXA_CLIENT_ID and FXA_SECRET_KEY, and the base URL of the demo/intro site (http://localhost:8000/intro) will need to change accordingly.
  6. Initialize your database by running the migrations:

    python manage.py migrate
    
  7. Create a new superuser account:

    python manage.py createsuperuser
    

    Make sure that the email address you use for the superuser account matches the email that you will log in with via Firefox Accounts.

  8. Pull the latest strings from version control for the Pontoon Intro project (which is automatically created for you during the database migrations):

    python manage.py sync_projects --projects=pontoon-intro --no-commit
    
  9. After you’ve provided credentials for Firefox Accounts, you have to update them in the database, because django-allauth requires it. You will have to run this command after every change to your FXA settings (e.g. client key):

    python manage.py updatefxaprovider
    
  10. Install the required Node libraries using npm:

npm install

Once you’ve finished these steps, you should be able to start the site by running:

python manage.py runserver

The site should be available at http://localhost:8000.
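
The placeholder values in the .env template from step 5 can be produced with a few lines of Python. The following is a minimal sketch and not part of Pontoon: the helper names generate_secret_key and build_database_url are hypothetical, the 50-character key length is an arbitrary choice, and the URL helper assumes that none of its parts contain reserved URL characters.

```python
import random
import string


def generate_secret_key(length=50):
    """Return a random alphanumeric key drawn from the OS entropy source."""
    alphabet = string.ascii_letters + string.digits
    rng = random.SystemRandom()  # backed by os.urandom, unlike the default PRNG
    return "".join(rng.choice(alphabet) for _ in range(length))


def build_database_url(username, password, server_addr, database_name):
    """Assemble postgres://username:password@server_addr/database_name.

    Assumption: no part contains characters such as '@', ':' or '/'.
    """
    return "postgres://{0}:{1}@{2}/{3}".format(
        username, password, server_addr, database_name
    )


if __name__ == "__main__":
    # Prints two lines that can be pasted into the .env file.
    print("SECRET_KEY=" + generate_secret_key())
    print("DATABASE_URL=" + build_database_url("pontoon", "asdf", "localhost", "pontoon"))
```

With the credentials from step 4, the second line matches the DATABASE_URL shown in the template.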

Extra settings

The following extra settings can be added to your .env file.

MICROSOFT_TRANSLATOR_API_KEY
Set your Microsoft Translator API key to use machine translation.
GOOGLE_ANALYTICS_KEY
Set your Google Analytics key to use Google Analytics.
MANUAL_SYNC
Enable Sync button in project Admin.
DJANGO_LOGIN
Set to True if you want to use the default Django login instead of Firefox Accounts. This will allow you to log in via accounts created using manage.py shell.

Workflow

The following is a list of things you’ll probably need to do at some point while working on Pontoon.

Running Tests

You can run the automated test suite with the following command:

python manage.py test

Updating Your Local Instance

When changes are merged to the main Pontoon repository, you’ll want to update your local development instance to reflect the latest version of the site. You can use Git as normal to pull the latest changes, but if the changes add any new dependencies or alter the database, you’ll want to install any new libraries and run any new migrations.

If you’re unsure what needs to be run, it’s safe to just perform all of these steps, as they don’t affect your setup if nothing has changed:

# Pull latest code (assuming you've already checked out master).
git pull origin master

# Install new dependencies or update existing ones.
pip install -U --force --require-hashes -r requirements.txt

# Run database migrations.
python manage.py migrate

Building the Documentation

You can build the documentation with the following command:

# Enter the docs/ subdirectory
cd docs
make html

After running this command, the documentation should be available at docs/_build/html/index.html.

Note

Pontoon uses GraphViz as part of the documentation generation, so you’ll need to install it to generate graphs that use it. Most package managers, including Homebrew, have a package available for install.

Adding New Dependencies

Pontoon uses peep to install dependencies. Peep is a wrapper around pip that checks downloaded packages to ensure they haven’t been tampered with.

Because of this, adding a new library to requirements.txt is a bit more work as you need to add hashes for each library you want to install. To help make this easier, you can use the peepin tool to add new dependencies to the requirements file.

Translation Sync

At its core, Pontoon is a user interface for editing translations that are stored in a version control system. Because Pontoon does not directly edit the VCS files whenever a user submits a translation, it has to maintain a database of what it thinks the translated strings are. Periodically, it has to sync with version control to pull newly-submitted strings and translations committed directly, as well as to write its own changes back.

This document describes that sync process in detail.

Triggering a Sync

Pontoon is assumed to run a sync once an hour, although this is configurable. When a sync is triggered, Pontoon finds all projects that are not marked as disabled within the admin interface and schedules a sync task for each one. Sync tasks are executed in parallel, using Celery to manage the worker queue.

Syncing a Project

Syncing an individual project is split into two tasks. The first one is syncing source strings:

  • Pull latest changes of the source string repository from version control.
  • Check for changes in VCS and in Pontoon. If neither has changed and the project uses only one repository, skip syncing the project completely.
  • If source repository has changed since the last sync, reflect any added, changed or removed files in Pontoon.

The second task is syncing translations:

  • Pull latest changes of all project repositories from version control.
  • Check for changes in VCS and in Pontoon. If neither has changed, quit early.
  • If there are changes, identify which files have changed and find all their entities, searching both the Pontoon database and VCS.
  • For each entity found, compare the VCS version to the Pontoon version (also known as the database version) and decide how to sync it. These changes are collected in a “Changeset” object.
  • Once all the entities are compared, execute the changes in the Changeset.
  • Commit any changes made in the filesystem back to the VCS. If there are no changes, no commit is made.
  • Clean up leftover information in the database.

Comparing Entities

The heart of the syncing process is comparing an entity stored in Pontoon’s database with its matching entity in the resource file in VCS and modifying both the database and the VCS file so that the two are in sync. It’s this process that determines when to update VCS with a submitted translation from Pontoon vs. when to update Pontoon with a translation from VCS, as well as other possible actions.

The comparison takes into account:

  • Whether an entity exists in the Pontoon database or VCS. VCS may be missing an entity due to a developer removing the source string, or Pontoon may be missing an entity due to a new string being added to VCS.
  • Whether a specific locale in VCS has an entity that Pontoon can update.
  • Whether an entity has changed in the Pontoon database since the last sync. This tracks if translations for an entity have been updated or deleted since the last time Pontoon synced.

The actual comparison logic goes something like this:

digraph sync_decision_tree {
compare_start[label="Comparing VCS/Pontoon entities"]
is_vcs_missing[label="VCS entity\nmissing?" shape=diamond];
vcs_missing[label="Mark as\nobsolete"];

is_db_missing[label="Pontoon entity\nmissing?" shape=diamond];
db_missing[label="Add entity to\nPontoon"];

for_each_locale[label="For each locale\navailable in the project" shape=rectangle];

is_vcs_translation_missing[label="VCS entity missing\nin locale?" shape=diamond]
vcs_translation_missing[label="Skip entity,\ncannot update"];

has_db_changed[label="Has Pontoon entity\nchanged since\nlast sync?" shape=diamond];
db_changed[label="Update VCS with\nPontoon translation"];
db_unchanged[label="Update Pontoon with\nVCS translation"];

compare_start -> is_vcs_missing
is_vcs_missing -> vcs_missing[label="Yes"]

is_vcs_missing -> is_db_missing[label="No"]
is_db_missing -> db_missing[label="Yes"]

is_db_missing -> for_each_locale[label="No"]
for_each_locale -> is_vcs_translation_missing
is_vcs_translation_missing -> vcs_translation_missing[label="Yes"]

is_vcs_translation_missing -> has_db_changed[label="No"]
has_db_changed -> db_changed[label="Yes"]
has_db_changed -> db_unchanged[label="No"]
}
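
The decision tree above can also be read as a short function. This is an illustrative sketch, not Pontoon’s actual code: the function name, the shape of its arguments and the action labels are all hypothetical, chosen only to mirror the nodes of the graph.

```python
def decide_sync_actions(vcs_entity_exists, db_entity_exists, locales):
    """Return a list of (action, locale) pairs for one entity.

    `locales` maps a locale code to a pair of booleans:
    (entity present in that locale's VCS file, Pontoon entity changed since last sync).
    All names here are hypothetical and only mirror the graph above.
    """
    if not vcs_entity_exists:
        # "VCS entity missing?" -> Yes
        return [("obsolete", None)]
    if not db_entity_exists:
        # "Pontoon entity missing?" -> Yes
        return [("create", None)]

    actions = []
    for locale, (in_vcs_locale, db_changed) in sorted(locales.items()):
        if not in_vcs_locale:
            actions.append(("skip", locale))        # cannot update this locale
        elif db_changed:
            actions.append(("update_vcs", locale))  # Pontoon translation wins
        else:
            actions.append(("update_db", locale))   # VCS translation wins
    return actions
```

For example, an entity present on both sides with locales {"de": (True, True), "fr": (True, False)} yields update_vcs for de and update_db for fr.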

Executing Changes

Entity comparison produces a Changeset, which is used to make the necessary changes to the database and resource files.

Changesets can perform 4 different operations on an entity:

Update Pontoon from VCS
Add a translation from VCS to Pontoon if necessary. Existing translations that match the VCS translation are re-used, and all non-matching translations are marked as unapproved.
Update VCS from Pontoon
Add a translation from Pontoon to VCS, overwriting the existing translation if it exists.
Create New Entity in Pontoon
Create a new entity in the Pontoon database, including the VCS translation if it is present.
Obsolete Pontoon Entity
Mark an entity in the database as obsolete, due to it not existing in VCS. The entity will no longer appear on the website.

When possible, Changesets perform database operations in bulk in order to speed up the syncing process.
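
The idea of collecting operations first and applying them in bulk afterwards can be sketched as follows. This is a toy stand-in under stated assumptions, not Pontoon’s Changeset class: ToyChangeset, its operation labels and the handlers argument are hypothetical, and a real implementation would issue bulk database queries and file writes where this sketch calls a handler.

```python
class ToyChangeset(object):
    """Collects pending operations per type, then applies each type as one batch."""

    # Hypothetical labels for the four operations described above.
    OPERATIONS = ("update_db", "update_vcs", "create_db", "obsolete_db")

    def __init__(self):
        self.pending = dict((name, []) for name in self.OPERATIONS)

    def add(self, operation, entity_key):
        """Queue one operation; nothing is executed yet."""
        if operation not in self.pending:
            raise ValueError("unknown operation: %s" % operation)
        self.pending[operation].append(entity_key)

    def execute(self, handlers):
        """Call each handler once with the whole batch for its operation.

        Returns a dict mapping operation name to the number of entities handled.
        """
        applied = {}
        for name in self.OPERATIONS:
            batch = self.pending[name]
            if batch:
                handlers[name](batch)  # one call per operation type, not per entity
                applied[name] = len(batch)
            self.pending[name] = []
        return applied
```

Grouping by operation type is what makes bulk execution possible: each handler sees every affected entity at once instead of being invoked entity by entity.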

Deployment

Pontoon is designed to be deployed on Heroku. To deploy an instance of Pontoon on Heroku, you must first create an app on your Heroku dashboard. The steps below assume you’ve already created an app and have installed the Heroku Toolbelt.

Buildpack

Pontoon uses several buildpacks in a specific order. They are (in order):

  1. heroku-buildpack-submodules to fetch all related git submodules.
  2. heroku-buildpack-apt for installing Subversion.
  3. heroku-buildpack-ssh for setting up the SSH keys necessary for committing to version control.
  4. The official heroku/nodejs buildpack for installing Node.js programs for pre-processing frontend assets.
  5. The official heroku/python buildpack as our primary buildpack.

You can set these buildpacks on your app with the following toolbelt commands:

# Note that we use add and --index 1 to insert each buildpack at the top of the list.
heroku buildpacks:set heroku/python
heroku buildpacks:add --index 1 heroku/nodejs
heroku buildpacks:add --index 1 https://github.com/Osmose/heroku-buildpack-ssh.git#v0.1
heroku buildpacks:add --index 1 https://github.com/mozilla/heroku-buildpack-apt.git#v0.1
heroku buildpacks:add --index 1 https://github.com/dmathieu/heroku-buildpack-submodules#b37ffe4361bb9c975dd8e93068c9d296365d748c

Environment Variables

The following is a list of environment variables you’ll want to set on the app you create:

ADMIN_EMAIL
Optional. Email address for the ADMINS setting.
ADMIN_NAME
Optional. Name for the ADMINS setting.
CELERY_ALWAYS_EAGER
Controls whether asynchronous tasks (mainly used during sync) are sent to Celery or executed immediately and synchronously. Set this to False on production.
CELERYD_MAX_TASKS_PER_CHILD
Maximum number of tasks a Celery worker process can execute before it’s replaced with a new one. Defaults to 20 tasks.
DISABLE_COLLECTSTATIC

Disables running ./manage.py collectstatic during the build. Should be set to 1.

Heroku’s Python buildpack has a bug that causes issues when running node binaries during the compile step of the buildpack. To get around this, we run the command in our post-compile step (see bin/post_compile), where the issue doesn’t occur.

DJANGO_DEBUG
Controls DEBUG mode for the site. Should be set to False in production.
DJANGO_DEV
Signifies whether this is a development server or not. Should be False in production. When enabled, it adds some additional Django apps that can be helpful during day-to-day development.
ERROR_PAGE_URL
Optional. URL to the page displayed to your users when the application encounters a system error. See Heroku Reference for more information.
MAINTENANCE_PAGE_URL
Optional. URL to the page displayed to your users when the application is placed in the maintenance state. See Heroku Reference for more information.
NEW_RELIC_API_KEY
Optional. API key for accessing the New Relic REST API. Used to mark deploys on New Relic.
NEW_RELIC_APP_NAME
Optional. Name to give to this app on New Relic. Required if you’re using New Relic.
PROJECT_MANAGERS
Optional. A list of project manager email addresses to send project requests to.
SECRET_KEY
Required. Secret key used for sessions, cryptographic signing, etc.
SITE_URL
Controls the base URL for the site, including the protocol and port. Defaults to http://localhost:8000, but should always be set explicitly in production.
SSH_CONFIG

Contents of the ~/.ssh/config file used when Pontoon connects to VCS servers via SSH. Used for disabling strict key checking and setting the default user for SSH. For example:

StrictHostKeyChecking=no

Host hg.mozilla.org
User pontoon@mozilla.com

Host svn.mozilla.org
User pontoon@mozilla.com

SSH_KEY
SSH private key to use for authentication when Pontoon connects to VCS servers via SSH.
STATIC_HOST
Optional. Hostname to prepend to static resources paths. Useful for serving static files from a CDN. Example: //asdf.cloudfront.net.
SVN_LD_LIBRARY_PATH
Path to prepend to LD_LIBRARY_PATH when running SVN. This is necessary on Heroku because the Python buildpack alters the path in a way that breaks the built-in SVN command. Set this to /usr/lib/x86_64-linux-gnu/.

Note

Some environment variables, such as the SSH-related ones, may contain newlines. The easiest way to set these is using the heroku command-line tool to pass the contents of an existing file to them:

heroku config:set SSH_KEY="`cat /path/to/key_rsa`"

TZ
Timezone for the dynos that will run the app. Pontoon operates in UTC, so set this to UTC.

Provisioning Workers

Pontoon executes asynchronous jobs using Celery. These jobs are handled by the worker process type. You’ll need to manually provision workers based on how many projects you plan to support and how complex they are. At a minimum, provision one worker dyno:

heroku ps:scale worker=1

Add-ons

Pontoon is designed to run with the following add-ons enabled:

  • Database: Heroku Postgres
  • Performance Monitoring: New Relic APM
  • Log Management: Papertrail
  • Error Tracking: Raygun.io
  • Email: Sendgrid
  • Scheduled Jobs: Heroku Scheduler
  • Cache: Memcached Cloud
  • RabbitMQ: CloudAMQP

It’s possible to run with the free tiers of all of these add-ons, but it is recommended that, at a minimum, you run the “Standard 0” tier of Postgres.

Cache Add-ons

Pontoon uses django-pylibmc, which expects the following environment variables from the cache add-on:

MEMCACHE_SERVERS
Semicolon-separated list of memcache server addresses.
MEMCACHE_USERNAME
Username to use for authentication.
MEMCACHE_PASSWORD
Password to use for authentication.
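
To make the expected shape of these variables concrete, here is a small sketch that reads them into a dictionary. It only illustrates the format described above and is not how django-pylibmc is implemented; the function name memcache_settings and the example host names in the usage note are hypothetical.

```python
import os


def memcache_settings(environ=None):
    """Read the three MEMCACHE_* variables into a plain dictionary."""
    environ = os.environ if environ is None else environ
    raw_servers = environ.get("MEMCACHE_SERVERS", "")
    # Split the semicolon-separated list and drop empty or padded entries.
    servers = [s.strip() for s in raw_servers.split(";") if s.strip()]
    return {
        "servers": servers,
        "username": environ.get("MEMCACHE_USERNAME"),
        "password": environ.get("MEMCACHE_PASSWORD"),
    }
```

Passing a dictionary such as {"MEMCACHE_SERVERS": "a.example.com:11211;b.example.com:11211"} instead of the real environment returns both addresses under the "servers" key, which is a quick way to check a value before setting it on the app.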

Note

By default, the environment variables added by Memcached Cloud are prefixed with MEMCACHEDCLOUD instead of MEMCACHE. You can “attach” the configuration variables with the correct prefix using the addons:attach command:

heroku addons:attach resource_name --as MEMCACHE

Replace resource_name with the name of the resource provided by the cache addon you wish to use, such as memcachedcloud:30. Use the heroku addons command to see a list of resource names that are available.

RabbitMQ Add-ons

Similar to the cache add-ons, Pontoon expects environment variables from the RabbitMQ add-on:

RABBITMQ_URL
URL for connecting to the RabbitMQ server. This should be in the format for Celery’s BROKER_URL setting.

Note

Again, you must attach the resource for RabbitMQ as RABBITMQ. See the note in the Cache Add-ons section for details.
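
Celery broker URLs for RabbitMQ generally take the form amqp://user:password@host:port/vhost. The sketch below splits such a URL into its parts, which can help when checking that the attached RABBITMQ_URL looks sane. The function name describe_broker_url and the example URL in the usage note are hypothetical, and accepting the amqps scheme alongside amqp is an assumption of this sketch.

```python
try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2.7
    from urlparse import urlparse


def describe_broker_url(url):
    """Split an AMQP broker URL into its components, omitting the password."""
    parts = urlparse(url)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError("expected an amqp:// or amqps:// URL, got %r" % parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "vhost": parts.path.lstrip("/"),  # the path component names the virtual host
    }
```

For instance, describe_broker_url("amqp://user:secret@broker.example.com:5672/myvhost") reports the host broker.example.com, port 5672, user user and virtual host myvhost.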

Scheduled Jobs

Pontoon requires a single scheduled job that runs the following command:

./manage.py sync_projects

It’s recommended to run this job once an hour. It commits any string changes in the database to the remote VCS servers associated with each project, and pulls down the latest changes to keep the database in sync.

Sync Log Retention

You may also optionally run the clear_old_sync_logs management command on a schedule to remove sync logs from the database that are over 90 days old:

./manage.py clear_old_sync_logs

Database Migrations

After deploying Pontoon for the first time, you must run the database migrations. This can be done via the toolbelt:

heroku run ./manage.py migrate

Creating an Admin User

After deploying the site, you can create a superuser account using the createsuperuser management command:

heroku run ./manage.py createsuperuser --noinput --username=admin --email=your@email.com

If you’ve already logged into the site with the email that you want to use, you’ll have to use the Django shell to mark your user account as an admin:

heroku run ./manage.py shell
# Connection and Python info...
>>> from django.contrib.auth.models import User
>>> user = User.objects.get(email='your@email.com')
>>> user.is_staff = True
>>> user.is_superuser = True
>>> user.save()
>>> exit()

Gotchas

  • Changing the SSH_KEY or SSH_CONFIG environment variables requires a rebuild of the site, as these settings are only used at build time. Simply changing them will not actually update the site until the next build.

    The Heroku Repo plugin includes a rebuild command that is handy for triggering builds without making code changes.

Maintenance

The following describes tricks and tools useful for debugging and maintaining an instance of Pontoon deployed to Heroku.

Monitoring Celery

Flower is a web interface for monitoring a Celery task queue. It’s useful for seeing how the worker dynos are handling sync jobs.

After installation, you can run a local instance of Flower and connect it to a Heroku-hosted instance of RabbitMQ:

# Replace my-app-name with your Heroku app's name.
flower --broker=`heroku config:get RABBITMQ_URL --app=my-app-name`