CitoEngine’s documentation

CitoEngine is an alert management system that helps you manage chaos in a better way.

Contents:

Overview

The problem:

Configuring monitoring systems to alert properly is an art. It’s a fine art of configuring thresholds when your monitoring parameters vary widely or when the monitoring tools lack capability to monitor dynamic workloads. It also takes discipline in working with monitoring systems during release process or outages. Not all monitoring systems are configured or maintained properly. In the end you have alerts and lots of it!

What is CitoEngine ?

CitoEngine allows you to manage large volume of alerts and trigger actions. These actions could notify or act on the alert by executing a script (a plugin). It is ideal alert management service for teams who have multiple monitoring systems.

What can it do?

  • Accept alerts from any monitoring systems such as Nagios, Sensu, Cron-jobs, etc. and aggregate alerts.
  • Lookup such alerts (called Incidents) to user-defined Event ID’s and enable any action based on rules that meet a user-defined criteria
  • Plugins enable actions on Incidents. Plugins can be any script that run commands or make API calls.
  • Dashboards to give you an overview of all incoming alerts or grouped by Teams
  • It does not require any agents.
  • It plugins can be any executable script, no pesky DSL’s.

What it is not:

CitoEngine is not a monitoring system.

Architecture

_images/cito-flow-20140607.png

The entire system is divided in two groups: event_listener, queue, poller and engine fall in the CitoEngine group whereas plugin_server is a standalone service called CitoPluginServer.

All alerts enter the system via the event_listener and are sent over to the queue. A poller reading this queue fetches these events and begins to parse them. If a given event matches a definition in the system, it is accepted as an Incident. Each Event has one or more user-defined EventActions. The engine checks the threshold in real-time and fires the EventAction. Thresholds, at the moment, are limited to a conditional match of X events in Y seconds. The EventAction is simply telling the plugin_server to execute the user-defined plugin with the user-defined (customizable) parameters.

CitoEngine Terminology

CitoEngine’s web interface allows you to define Events, Teams, Categories, Users and PluginServers.

Events: An event definition includes a Summary, Description, owning Team, Severity and Category. Only members of the owning Team can act on Incidents generated upon this Event. No two Teams can share the same Event.

Incidents Any alert coming into the system (with a valid Event Code) is defined as an Incident.

Teams: Each team can have one or more Users and Events associated with them.

Category: This is a generic classifier for events. Example categories could be Network, Disk, CPU, etc. These categories do not affect the behavior of the EventActions.

Users: One user per installation. User can be part of multiple Teams. User permissions are as under:

  • SuperAdmin: Can do just about anything.
  • Admin: Can add teams.
  • User: Can add events and action incidents.
  • NOC: Can comment.
  • ReportsUser: Can only view reports.

Plugin Server Definition: Users can add links to the plugin server. Once added, the system will fetch the active plugins. These plugins can now be accessed by the users in Events -> EventActions.

EventActions: Users can define which plugin to execute based on a given threshold. The user can send any number of parameters to the remote plugin. CitoEngine comes with a few internal variables which can be use sent as parameters:

  • __ELEMENT__ Engine send the element name
  • __EVENTID__ Engine send the event ID
  • __INCIDENTID__ Engine send the incident ID
  • __MESSAGE__ Engine send the message which came in by the alerting system.

Installing CitoEngine

The following guide shows installation steps on Ubuntu 12.04 x86_64. Theoretically application dependencies can be fulfilled on any linux distribution viz. Redhat, ArchLinux, etc. In future, we will try to include installation steps for other distributions as well. Python module dependencies are installed using pip rather than system installer, this gives us more control towards using modules of specific versions. The following steps assume that you will be installing in /opt/cito directory.

Installation

Install dependencies

# Installing MySQL and Python development packages
sudo apt-get install libmysqlclient-dev python-dev python-pip
sudo pip install virtualenv

MySQL Installation and Configuration

# Install mysql server
sudo apt-get install mysql-server mysql-client

# Setup mysql root password
sudo dpkg-reconfigure mysql-server-5.5
# Create a new database 'cito'
sudo mysqladmin -uroot -p create cito
# Create a new mysql user
/usr/bin/mysql -uroot -p -e "GRANT ALL PRIVILEGES ON cito.* TO 'cito_user'@'localhost' IDENTIFIED BY 'MINISTRYOFSILLYWALKS' with GRANT OPTION"

Setup python virtualenv

We recommend you use virtualenv for running cito engine, this will help you keep manage the dependencies better. Download the latest build

cd /opt/
git clone https://github.com/CitoEngine/cito_engine /opt/cito
sudo mkdir -p /opt/virtualenvs && sudo chown $USER /opt/virtualenvs/ && cd /opt/virtualenvs
virtualenv --no-site-packages /opt/virtualenvs/citovenv
source /opt/virtualenvs/citovenv/bin/activate
pip install -q --upgrade setuptools
pip install -r /opt/cito/requirements.txt

Edit default settings: /opt/cito/cito/settings/production.py

Message Queue Configuration:

You can either use AWS:SQS or RabbitMQ as your message queue. Edit either of these configuration blocks and make sure you select QUEUE_TYPE to be either SQS or RABBITMQ

##################################
# AWS::SQS Configuration settings
##################################
AWS_CONF = dict()
AWS_CONF['region'] = 'us-east-1'
AWS_CONF['awskey'] = ''
AWS_CONF['awssecret'] = ''
AWS_CONF['sqsqueue'] = 'citoq'


##################################
# RabbitMQ Configuration settings
##################################
RABBITMQ_CONF = dict()
RABBITMQ_CONF['host'] = 'localhost'
RABBITMQ_CONF['port'] = 5672
RABBITMQ_CONF['username'] = 'cito_user'
RABBITMQ_CONF['password'] = 'CHANGEME!'
RABBITMQ_CONF['ssl'] = False
RABBITMQ_CONF['exchange'] = ''
RABBITMQ_CONF['vhost'] = '/cito_event_listener'
RABBITMQ_CONF['queue'] = 'cito_commonq'

##############################
# Queue type: SQS or RABBITMQ
##############################
QUEUE_TYPE = 'RABBITMQ'

Note

Avoid editing /opt/cito/cito/settings/base.py unless you know what you are doing.

Setting up RabbitMQ (Optional):

If you are planning to use RabbitMQ, the following three lines should get you started.

sudo rabbitmqctl add_user cito_user cito_pass
sudo rabbitmqctl add_vhost /cito_event_listener
sudo rabbitmqctl set_permissions -p /cito_event_listener cito_user ".*" ".*" ".*"

Database Configuration:

#Database config
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',   # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': 'cito',                         # Or path to database file if using sqlite3.
        'USER': '',                             # Not used with sqlite3.
        'PASSWORD': '',                         # Not used with sqlite3.
        'HOST': '',                             # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '',                             # Set to empty string for default. Not used with sqlite3.
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED'
        }
    }
}

Initializing the tables and creating an admin account.

cd /opt/cito

# Populate the database
python manage.py syncdb --noinput --migrate

# Update django secret (for csrf)
# If you are using the webapp on multiple nodes behind a load balancer,
# make sure th secret_key.py file is same on all nodes.
sudo sh -c '/opt/cito/bin/create-django-secret.py > /opt/cito/cito/settings/secret_key.py'

# Create your first superuser!
python manage.py createsuperuser

That’s it, you are done!

Note

You can always validate your installation using the command python manage.py validate

Starting the services

CitoEngine is divided in three parts, poller, listener and webapp. You can either run the helper scripts in the /opt/cito/bin directory, or you can run the using manage.py <command>

Start CitoEngine SQS Poller service

/opt/cito/bin/cito-poller.sh

Start CitoEngine Event Listener service

/opt/cito/bin/cito-listener.sh

Start CitoEngine Webapp

We would recommended that you execute it with lower privileges. Have a look at bin/cito-webapp.sh for more information.

/opt/cito/bin/cito-webapp.sh

Open your browser and access http://<host>:8000 to login to CitoEngine with the admin account you created earlier.

Installing Cito Plugin Server

The following guide shows installation steps on Ubuntu 12.04 x86_64. Theoretically application dependencies can be fulfilled on any linux distribution viz. Redhat, ArchLinux, etc. In future, we will try to include installation steps for other distributions as well. Python module dependencies are installed using pip rather than system installer, this gives us more control towards using modules of specific versions. The following steps assume that you will be installing in /opt/cito_plugin_server directory.

Installation

Install dependencies

# Installing MySQL and Python development packages
sudo apt-get install libmysqlclient-dev python-dev python-pip
sudo pip install virtualenv

MySQL Installation and Configuration

# Install mysql server
sudo apt-get install mysql-server mysql-client

# Setup mysql root password
sudo dpkg-reconfigure mysql-server-5.5
# Create a new database 'cito_plugin_server'
sudo mysqladmin -uroot -p create cito_plugin_server
# Create a new mysql user
/usr/bin/mysql -uroot -p -e "GRANT ALL PRIVILEGES ON cito_plugin_server.* TO 'cito_user'@'localhost' IDENTIFIED BY 'MINISTRYOFSILLYWALKS' with GRANT OPTION"

Setup python virtualenv

We recommend you use virtualenv for running cito engine, this will help you keep manage the dependencies better. Download the latest build

cd /opt/
git clone https://github.com/CitoEngine/cito_plugin_server /opt/cito_plugin_server
sudo mkdir -p /opt/virtualenvs && sudo chown $USER /opt/virtualenvs/ && cd /opt/virtualenvs
virtualenv --no-site-packages /opt/virtualenvs/citopluginvenv
source /opt/virtualenvs/citopluginvenv/bin/activate
pip install -q --upgrade setuptools
pip install -r /opt/cito_plugin_server/requirements.txt

Edit default settings: /opt/cito_plugin_server/cito_plugin_server/settings/production.py

#Database config
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',   # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': 'cito_plugin_server',                         # Or path to database file if using sqlite3.
        'USER': '',                             # Not used with sqlite3.
        'PASSWORD': '',                         # Not used with sqlite3.
        'HOST': '',                             # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '',                             # Set to empty string for default. Not used with sqlite3.
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED'
        }
    }
}

PLUGIN_RUNNER_CONFIG = {
    'timeout': 5
}

PLUGIN_DIR = '/opt/plugins/'

Note

Avoid editting /opt/cito_plugin_server/cito_plugin_server/settings/base.py unless you know what you are doing.

Initializing the tables and creating an admin account.

cd /opt/cito_plugin_server

# Populate the database
python manage.py syncdb --noinput --migrate

# Update django secret (for csrf)
# If you are using the webapp on multiple nodes behind a load balancer,
# make sure th secret_key.py file is same on all nodes.
sudo sh -c '/opt/cito_plugin_server/bin/create-django-secret.py > /opt/cito_plugin_server/cito_plugin_server/settings/secret_key.py'

# Create your first superuser!
python manage.py createsuperuser

That’s it, you are done!

Note

You can always validate your installation using the command python manage.py validate

Starting the services

You can either run the helper scripts in the /opt/cito_plugin_server/bin directory, or you can run the using manage.py <command>

Start CitoEngine Plugin Server

We would recommended that you execute it with lower privileges. Have a look at bin/cito-webapp.sh for more information.

/opt/cito_plugin_server/bin/cito-plugin-server.sh

Open your browser and access http://<host>:8000 to login to CitoEngine Plugin Server with the admin account you created earlier.

Getting Started

It is highly recommended that you glance over the Architecture and CitoEngine Terminology docs before proceeding further.

Assuming you have the CitoEngine and CitoPluginServer setup, lets configure an end-to-end setup where we:

  1. Setup the Event codes.
  2. Setup the a Plugin
  3. Configure EventActions

Setting up Event Codes

With a fresh installation, you should first define some teams and categories before creating events. Head over to CitoEngine->Settings->Teams and add a few teams there (e.g. Ops, QA, DBA-Ops, etc.). Next, head over to CitoEngine->Settings->Categories and add a few event categories (e.g. Disk, CPU, Memory, Application, etc.).

Note

It is possible to change the names of teams and categories anytime after their creation.

To define an event, go to CitoEngine->Event Codes->Define New Event Code. Fill in the summary as needed e.g.:

Summary: /var full
Description: Server's /var partition is full, it needs to be cleaned up.
Severity: S3
Team: Ops
Category: Disk
Status: <enabled>

As this is your first event definition, its event code would be 1. With this bare minimum setup, you are now ready to accept Incidents (alerts) for EventCode: 1. Lets test our newly created event code:

curl -i -H "Content-Type: application/json" -X POST -d '{"eventid: 1, "element": "foo.bar.com", "message": "It Works!"}' http://localhost:8080/addevent

or

event_publisher.py -e 1 -H "foo.bar.com" -m "It Works!" --cito-server localhost --cito-port 8080

Note

You can find event_publisher.py in integerations tools repository.

Setting up a Plugin

Login to the plugin server and create an API Key CitoPluginServer->API Keys->Add new key e.g. Ops-Key

Next, we define a Plugin. This can be done at CitoPluginServer->Plugins->Add new Plugin. The Name here is what gets displayed in CitoEngine, so make sure it is unique and non ambigious. Remember, Plugin path field is relative to the PLUGIN_DIR in your settings file i.e. if you have /opt/citoplugins/clear_tmp.sh plugin and your settings is PLUGIN_DIR='/opt/citoplugins' then you just need to give clear_tmp.sh in Plugin path. To summarize, for our example, a plugin definition would look like:

Name: ClearTmp
Description: Clears /var/tomcat/temp folder.
Plugin path: clear_tmp.sh
Status: <enabled>
Accessible by: Cyrus, Ops-Key

Now lets go back to the API section and copy the URL listed under our previously defined API key e.g. http://192.168.77.77:9000/api/13429401-3e5b-46d4-9762-b40ce689386e

Add this to CitoEngine->Plugins->Add a server, once added click on the Refresh link in the listings page. This would query the plugin server and fetch all active plugins.

Configuring an Event Action

With the newly created Plugin (ClearTmp) ready to be used, lets go back to our previously created event and add an action against it. Go to CitoEngine->Events->View Event Codes and click on our example event. In the details page, click on Add an action to this event, this should show you the event action creation form. Select the plugin ClearTmp, make sure enabled checkbox is ticked.

We need to configure when to invoke the plugin. This can be done by setting the Threshold count and Threshold timer values. Threshold count of 2 and Threshold timer of 60 indicates that execute the plugin if this event is called 2 times in 60 secs

If you are using a self signed SSL certificate, you may want to uncheck the SSL Verify box on this page. Hit save and you are done.

Use the curl or event_publisher.py to send a few sample events making sure that your plugin is executed as intended.

Integrating CitoEngine with 3rd party tools

CitoEngine can be easily integrated with existing monitoring systems. All integration scripts mentioned here can be found at integerations tools repository.

Nagios

QuickStart

  1. Define an event in CitoEngine UI -> Events -> Define an event
  2. Add a custom Nagios variable called _CITOEVENTID in the service definition as shown below:
define service {
            service_description                   Total Processes
            _CITOEVENTID                          666
            check_command                         check_local_procs!250!400!RSZDT
            contact_groups                        +admins
            use                                   local-service
}
  1. Add the citoengine user to your notification group, in this example we will add the contact to the group admins:
define contactgroup {
            contactgroup_name                     admins
            alias                                 Nagios Administrators
            members                               bofh,citoengine
}

Note

Same logic applies for host definitions.

  1. Copy citoengine.cfg to your nagios’ directory and include it in nagios.cfg.
  2. Edit the citoengine.cfg file and replace the server and port to their actual values.
  3. Copy event_publisher.py script to /usr/local/bin/ and make it executable.

Bulk update of service definitions

If you have a lot of service definitions then the above steps may prove very tedious. To help you around this we an use a helper script called cito_config_parser.py. This script runs in two modes, one where it parses an existing service definition file and other where it updates the service definition file with the relevant event_id’s exported from CitoEngine. Here is how you can do it:

  1. Parse the existing service definition file:
cito_config_parser.py --type nagios -c services.cfg --parse --out my-services.txt
  1. Copy the output of my-services.txt into CitoEngine -> Tools -> Add events in bulk
  2. Select your Team, Severity, Category, etc and hit submit.
  3. The next page shows you a list of forms for each service definition you pasted above. Go through it carefully, modify it and hit submit.

5. Go to CitoEngine UI -> Events, select your Team check the Export CSV checkbox and hit search. The UI will give a CSV file of all your team’s events. Save this locally and have a quick look at it to confirm everything is in order.

  1. Generate the new services config using the following command:
cito_config_parser.py --type nagios -c services.cfg --events-file events.csv --generate --out new_services.cfg

Note

Do not run the --generate command on a previously configured services.cfg which already has _CITOEVENTID added. Always use the original service definition file.

Note

Sensu support will be released shortly.

Release Notes

0.9.0

  • Added RabbitMQ support.
  • Added bulk event creation feature (Tools -> Add events in bulk).
  • Added ability to export events in CSV.
  • UI will not allow creating events with duplicate summaries in the same category within user’s team.
  • Updated Django==1.6.5, Twisted==13.2.0, zope.interface==4.1.0
  • Launched the integration_tools repository to help integrate with 3rd party tools.
  • Lots of unittests, minor bug fixes, removal of cruft, etc.

0.8.0

  • Initial release

Indices and tables