Welcome to the Graylog documentation

NOTE: There are multiple options for reading this documentation. See link to the lower left.

Architecture

Every Graylog system is composed of at least one instance of Graylog Server, MongoDB, and Elasticsearch. Each of these components is required and cannot be substituted with any other technology.

Minimum

In a minimum Graylog deployment, all three components are installed on a single host. This minimum setup is suitable for smaller, non-critical, or test environments.

None of the components is redundant, but the setup is quick and easy.

_images/architec_small_setup.png

Our virtual machine appliances use this design by default, deploying nginx as a frontend proxy.

Simple Multi-Node

In a Simple Multi-Node system, the Graylog and Elasticsearch components each reside on their own hosts. Most customers install MongoDB on the same host as the Graylog server. Since MongoDB is used primarily for application configuration information, its load is low enough that it does not typically need its own host.

Complex Multi-Node

For larger environments, or where high availability is required, Graylog may be deployed in a Complex Multi-Node configuration. Both Graylog and Elasticsearch may be clustered to provide resilience in case of a node failure. Multi-node systems are often required in order to support a high volume of events.

Complex Multi-Node designs are required for larger production environments. They consist of two or more Graylog nodes behind a load balancer that distributes the processing load.

The load balancer can ping the Graylog nodes via HTTP on the Graylog REST API to check if they are alive and take dead nodes out of the cluster.
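
For example, a health probe can be as simple as an HTTP GET against each node; Graylog exposes a load balancer status endpoint for exactly this purpose (the node address below is a placeholder):

$ curl http://graylog-node1.example.org:9000/api/system/lbstatus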

_images/architec_bigger_setup.png

How to plan and configure such a setup is covered in our Multi-node Setup guide.

Some guides on the Graylog Marketplace also offer ideas on how you can use RabbitMQ (AMQP) or Apache Kafka to add queueing to your setup.

Architectural considerations

There are a few rules of thumb when scaling resources for Graylog:

  • Graylog nodes should have a focus on CPU power. These also serve the user interface to the browser.
  • Elasticsearch nodes should have as much RAM as possible and the fastest disks you can get. Everything depends on I/O speed here.
  • MongoDB stores metadata and configuration data and doesn’t need many resources.

Also keep in mind that ingested messages are only stored in Elasticsearch. If you have data loss in the Elasticsearch cluster, the messages are gone - unless you have created backups or archives of the indices.

Getting Started

This guide is designed for first-time users and is intended to give enough key information to get Graylog installed and configured initially. Each section links to additional details on the topic.

Graylog is a very flexible solution. It can be deployed in many different ways. For those who would like to do an initial lab evaluation of Graylog, we recommend starting with the virtual machine appliances.

Virtual Appliances are the fastest way to get started. However, since the Virtual Appliances are generally not suitable for use in production, they should be used strictly for proof of concept, evaluations or lab environments. Users should plan to pick one of the other, more flexible installation methods for a production deployment.

If you need assistance planning and building your logging environment, we offer professional support that can work with you.

Planning Your Log Collection

We know you are eager to get Graylog installed and working, but we ask that you take a few moments to review this section and plan your deployment appropriately. Proper planning will make the difference between a useful solution that meets a variety of stakeholder needs and a complicated mess that drains resources and provides little value. There are many factors you must consider when designing a log management solution.

Strategies

Even in a small organization, modern environments produce a lot of log data. Not long ago, 500 MB per day was considered a normal volume of logs for a small shop. Today, 5 GB per day is not unusual for a small environment. A large environment can produce a thousand times more than that.

Assuming an average event size of 500 bytes, 5 GB per day equates to roughly 125 log events every second, some 10.8 million events per day; the arithmetic is spelled out below. With that much information being generated, you will need a strategy to manage it effectively. There are two major approaches.
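
Spelling out the arithmetic (using 1 GB = 1024^3 bytes):

5 GB/day ≈ 5.37 billion bytes per day
5.37 billion bytes ÷ 500 bytes per event ≈ 10.8 million events per day
10.8 million events ÷ 86,400 seconds per day ≈ 125 events per second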

Minimalist

“Doing the needful”

The Minimalist Strategy proceeds from a “Default No” position when deciding which events to collect. This means you don’t collect any log unless it is required for an identified business use case. This strategy has several advantages: it keeps licensing and storage costs down by reducing the volume of collected events. It also minimizes the “noise” produced by extraneous events, allowing analysts to focus on events that have maximum value. Finally, it improves system and query efficiency, improving performance overall.

Maximalist

“Collect it all, let Graylog sort it out.”

The Maximalist strategy is to collect all events that are produced by any source. The thinking goes, all log data is potentially valuable, especially for forensics. Collecting it all and keeping it forever guarantees you will have it if you need it. However, this strategy is often not practical, due to budgetary or other constraints. The cost of this strategy can be prohibitive, since many more technical and human resources must be devoted to collection, processing and storage of event data. There is a performance penalty associated with keeping extremely large data sets online that must be considered as well.

Use Cases

“What do you want to do with event data?”

Use cases should inform most decisions during the planning phase. Some of these decisions include determining the event sources from which you must collect, how you will collect from these sources, how much of each event type to store, how events should be enriched and how long to retain the data.

Use case, broadly defined, means the technical steps necessary to achieve a technical and/or business outcome. An easier way to think about it is that a use case is a description of what you want to do with an event log once you’ve collected it. Use cases are often categorized to group like activities. Some common use case categories are Security, Operations, and DevOps. An example of a Security use case might be monitoring user logins to critical resources. An Operations use case might monitor network or hardware performance, while DevOps use cases would focus on real-time application-layer monitoring or troubleshooting.

Event Log Sources

“What logs do you need to collect?”

In an environment where seemingly everything generates event logs, it can be difficult to know what to collect. In most cases, selection of event sources should be driven by the use cases you have identified. For example, if the use case is monitoring of user logins to critical resources, the event sources selected should be only those of the critical resources in question: perhaps the LDAP directory server, local servers, firewalls, network devices, and key applications.

Some other potential event sources, by category:

Security

  • Firewalls
  • Endpoint Security (EDR, AV, etc.)
  • Web Proxies/Gateways
  • LDAP/Active Directory
  • IDS
  • DNS
  • DHCP
  • Servers
  • Workstations
  • Netflow

Ops

  • Applications
  • Network Devices
  • Servers
  • Packet Capture/Network Recorder
  • DNS
  • DHCP
  • Email

DevOps

  • Application Logs
  • Load Balancer Logs
  • Automation System Logs
  • Business Logic

Collection method

“How will you collect it?”

After a list of event sources has been determined, the next step is to decide the method of collection for each source. Although many hardware and software products support common methods such as sending log data via syslog, many do not. It is critical to understand what method each event source uses and what resources that may require. For example, if logs must be read from a local file on all servers, a log shipper must be selected and tested prior to deployment. In other cases, proprietary APIs or software tools must be employed and integrated.

In some cases, changes to the event sources themselves (security devices, network hardware or applications) may be required. Additional planning is often required to deploy and maintain these collection methods over time.

Graylog supports many input types out of the box. More inputs are available in the Graylog Marketplace. At the time of writing, Graylog supports the following:

  • Syslog (TCP, UDP, AMQP, Kafka)
  • GELF (TCP, UDP, AMQP, Kafka, HTTP)
  • AWS (AWS Logs, FlowLogs, CloudTrail)
  • Beats/Logstash
  • CEF (TCP, UDP, AMQP, Kafka)
  • JSON Path from HTTP API
  • Netflow (UDP)
  • Plain/Raw Text (TCP, UDP, AMQP, Kafka)

The Graylog Marketplace is the central directory of add-ons for Graylog. It contains plugins, content packs, GELF libraries and more content built by Graylog developers and community members.

_images/marketplace.png

Users

“Who will use the solution?”

The most important user-related factor to consider is the number of users. If the number is large, or if many users will be querying the data simultaneously, you may want to take that into consideration when designing an architecture.

The users’ level of skill should be considered. Less technical users may require more pre-built content, such as dashboards. They may also require more training.

Consideration should also be given to which event sources each user group needs access to. As in all questions of access control, the principle of least privilege should apply.

Some typical user groups include:

  • Security Analysts
  • Engineers
  • Management
  • Help Desk

Retention

“How long will you keep the data?”

A key question when planning your log management system is log retention. There are two ways event log data may be retained: online or archived. Online data is stored in Elasticsearch and is searchable through the Graylog GUI. Archived data is stored in a compressed format, either on the Graylog server or on a network file share. It is still searchable, via grep for example, but must be reconstituted in Graylog in order to be searchable through the GUI again.

Some regulatory frameworks require retention of event log data for a prescribed period. In the absence of a clear requirement, the question becomes one of balancing the cost of retention (storage) versus the utility of having historical data. There is no single answer, as each situation is different.

Most Graylog customers retain 30-90 days online (searchable in Elasticsearch) and 6-13 months of archives.

Calculating Storage Requirements

Like most data stores, Elasticsearch reacts badly when it consumes all available storage. In order to prevent this from happening, proper planning and monitoring must be performed.

Many variables affect storage requirements, such as how much of each message is kept, whether the original message is retained once parsing is complete, and how much enrichment is done prior to storage.

A simple rule of thumb for planning storage is to take your average daily ingestion rate, multiply it by the number of days you need to retain the data online, and then multiply that number by 1.3 to account for metadata overhead. (GB/day x Ret. Days x 1.3 = storage req.).

Elasticsearch makes extensive use of slack storage space in the course of its operations. Users are strongly encouraged to exceed the minimum storage required for their calculated ingestion rate. When at maximum retention, Elasticsearch storage should not exceed 75% of total space.
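
For example, using the figures from the Strategies section above, 5 GB of daily ingestion retained online for 90 days works out to:

5 GB/day x 90 days x 1.3 = 585 GB of index storage
585 GB ÷ 0.75 = 780 GB of total disk space to provision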

Download & Install Graylog

Graylog can be deployed in many different ways; you should download whatever works best for you. For those who would like to do an initial lab evaluation of Graylog, we recommend starting with the virtual machine appliances.

Virtual Appliances are definitely the fastest way to get started. However, since the virtual appliances are generally not suitable for use in production, they should be used strictly for proof of concept, evaluations or lab environments.

The virtual appliances are also completely unsecured. No hardening has been done and all default services are enabled.

For production deployments users should select and deploy one of the other, more flexible, installation methods.

Operating System Packages

Graylog may be installed on the following operating systems.

  • Ubuntu
  • Debian
  • RHEL/CentOS
  • SLES

Most customers use package tools like DEB or RPM to install the Graylog software. Details are included in the section, Operating System Packages.

Configuration Management

Customers who prefer to deploy Graylog via configuration management tools may do so. Graylog currently supports Chef, Puppet, and Ansible.

Containers

Graylog supports Docker for deployment of Graylog, MongoDB and Elasticsearch. Installation and configuration instructions may be found on the Docker installation page.

Virtual Appliances

Virtual appliances may be downloaded from the virtual appliance download page. If you are unsure what the latest stable version number is, take a look at our release page.

_images/download.png

Supported Virtual Appliances

  • OVA
  • AWS-AMI

Deployment guide for Virtual Machine Appliances.

Deployment guide for AWS - AMIs.

Supported Virtual Appliance Configuration Platforms

Deployment guide for Vagrant.

Deployment guide for OpenStack.

Virtual Appliance Caveats

Virtual appliances are not suitable for production deployment. They are created for lab or evaluation purposes. They do not have sufficient storage, nor do they offer capabilities like index replication that meet high availability requirements.

Also, because they are intended for internal testing and evaluation only, the virtual appliances are not hardened or otherwise secured. Use at your own risk and apply all security measures required by your organization.

Initial Configuration

Once the application is installed, there are a few items that must be configured before Graylog may be started for the first time. Both the Graylog server.conf and Elasticsearch elasticsearch.yml configuration files contain key details needed for initial configuration.

This guide will provide you with the essential settings to get Graylog up and running. There are many other important settings in these files and we encourage you to review them once you are up and running. For more details, please see server.conf.

Note

If you are using the virtual appliance, please skip this section and go directly to Connect to the Web Console.

server.conf

The file server.conf is the Graylog configuration file. The default location for server.conf is: /etc/graylog/server/server.conf.

Note

All default file locations are listed here.

  • Entries are generally expected to be a single line, in one of the following forms:
    • propertyName=propertyValue
    • propertyName:propertyValue
  • White space that appears between the property name and property value is ignored, so the following are equivalent:
    • name=Stephen
    • name = Stephen
  • White space at the beginning of the line is also ignored.

  • Lines that start with the comment characters ! or # are ignored. Blank lines are also ignored.

  • The property value is generally terminated by the end of the line.

  • White space following the property value is not ignored, and is treated as part of the property value.

  • The characters newline, carriage return, and tab can be inserted with characters \n, \r, and \t, respectively.
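
Illustrating these rules, a few syntactically valid example entries (the property names here are illustrative, not actual Graylog settings):

# This line is a comment and will be ignored
name = Stephen
motd = Welcome\nto Graylog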

General Properties
  • is_master = true
    • If you are running more than one instance of Graylog server, you must designate (only) one graylog-server node as the master. This node will perform periodical maintenance actions that slave nodes won’t.
  • password_secret = <secret>
    • You MUST set a secret that is used for password encryption and salting. The server will refuse to start if this value is not set. Use at least 64 characters. If you run multiple graylog-server nodes, make sure you use the same password_secret for all of them!

    Note

    Generate a secret with, for example, pwgen -N 1 -s 96.

  • root_username = admin
    • The default root user is named admin.
  • root_password_sha2 = <SHA2>
    • A SHA2 hash of the password you will use for your initial login. Insert a SHA2 hash generated with echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1 and you will be able to log in to the web interface with username admin and password yourpassword.

    Caution

    You MUST specify a hash password for the root user (which you only need to initially set up the system and in case you lose connectivity to your authentication backend). This password cannot be changed using the API or via the web interface. If you need to change it, modify it in this file.

  • rest_listen_uri = http://127.0.0.1:9000/api/
    • REST API listen URI. Must be reachable by other Graylog server nodes if you run a cluster.
    • Typically, this will be the “internal” IP address of the Graylog server.
    • When using Graylog Collectors, this URI will be used to receive heartbeat messages and must be accessible for all collectors.
  • rest_transport_uri = http://192.168.1.1:9000/api/
    • REST API transport address. Defaults to the value of rest_listen_uri. Exception: If rest_listen_uri is set to a wildcard IP address (0.0.0.0) the first non-loopback IPv4 system address is used.
    • Typically, this will be the “external” or publicly accessible IP address of the Graylog server.
    • You will need to define this if your Graylog server is running behind an HTTP proxy that is rewriting the scheme, host name or URI.
    • If set, this will be promoted in the cluster discovery APIs, so other nodes may try to connect on this address and it is used to generate URLs addressing entities in the REST API. (see rest_listen_uri)
    • This must not contain a wildcard address (0.0.0.0).

Web Properties
  • web_listen_uri = http://127.0.0.1:9000/
    • Web interface listen URI.
    • Typically, this will be the “internal” IP address of the Graylog server.
    • Configuring a path for the URI here effectively prefixes all URIs in the web interface. This is a replacement for the application.context configuration parameter in pre-2.0 versions of the Graylog web interface.
  • web_endpoint_uri =
    • Web interface endpoint URI. This setting can be overridden on a per-request basis with the X-Graylog-Server-URL header.
    • It takes the value of rest_transport_uri by default. If the Graylog server is not behind a proxy or load balancer, changing the default setting is not necessary.

Elasticsearch Properties
  • elasticsearch_hosts = http://node1:9200,http://user:password@node2:19200
    • List of Elasticsearch hosts Graylog should connect to.
    • Needs to be specified as a comma-separated list of valid URIs for the HTTP ports of your Elasticsearch nodes.
    • If one or more of your Elasticsearch hosts require authentication, include the credentials in each node URI that requires authentication.
    • Default: http://127.0.0.1:9200 You may retain the default setting only if Elasticsearch is installed on the same host as the Graylog server.

MongoDB
  • mongodb_uri = mongodb://...
    • MongoDB connection string. Enter your MongoDB connection and authentication information here.

    • See https://docs.mongodb.com/manual/reference/connection-string/ for details.

    • Examples:
      • Simple: mongodb_uri = mongodb://localhost/graylog
      • Authenticate against the MongoDB server: mongodb_uri = mongodb://grayloguser:secret@localhost:27017/graylog
      • Use a replica set instead of a single host: mongodb_uri = mongodb://grayloguser:secret@localhost:27017,localhost:27018,localhost:27019/graylog

HTTP
  • http_proxy_uri =
    • HTTP proxy for outgoing HTTP connections
  • http_non_proxy_hosts =
    • A list of hosts that should be reached directly, bypassing the configured proxy server.
    • This is a list of patterns separated by “,”. The patterns may start or end with a “*” for wildcards.
    • Any host matching one of these patterns will be reached through a direct connection instead of through a proxy.
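
Taken together, a minimal server.conf for a single-node setup might look like the following sketch (the secret and hash are placeholders you must generate yourself, as described above):

is_master = true
password_secret = <96-character secret generated with pwgen>
root_password_sha2 = <SHA256 hash of your chosen admin password>
rest_listen_uri = http://127.0.0.1:9000/api/
web_listen_uri = http://127.0.0.1:9000/
elasticsearch_hosts = http://127.0.0.1:9200
mongodb_uri = mongodb://localhost/graylog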

elasticsearch.yml

elasticsearch.yml is the Elasticsearch configuration file. The default location for elasticsearch.yml is: /etc/elasticsearch/elasticsearch.yml.

Several values must be configured correctly in order for Elasticsearch to work properly.

  • cluster.name: graylog
    • This value may be set to anything the customer wishes, though we recommend using “graylog”.
    • This value must be the same for every Elasticsearch node in a cluster.
  • network.host: 172.30.4.105
    • By default, Elasticsearch binds to loopback addresses only (e.g. 127.0.0.1). This is sufficient to run a single development node on a server.
    • In order to communicate and to form a cluster with nodes on other servers, your node will need to bind to a non-loopback address.
  • http.port: 9200
    • Port Elasticsearch will listen on. We recommend using the default value.
  • discovery.zen.ping.unicast.hosts: ["es01.acme.org", "es02.acme.org"]

    • Elasticsearch uses a custom discovery implementation called “Zen Discovery” for node-to-node clustering and master election. To form a cluster with nodes on other servers, you have to provide a seed list of other nodes in the cluster that are likely to be live and contactable.
    • Hosts may be specified as IP addresses or FQDNs.
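
Taken together, a minimal elasticsearch.yml for a clustered node might look like this sketch (the bind address and host names are the example values from above):

cluster.name: graylog
network.host: 172.30.4.105
http.port: 9200
discovery.zen.ping.unicast.hosts: ["es01.acme.org", "es02.acme.org"]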

Connect to the Web Console

Open a browser and navigate to the URL http://xxx.xxx.xxx.xxx:9000, substituting the IP of your Graylog server. You should see a Graylog login page similar to the screenshot below.

If using the VM appliance, log in using admin for both the username and password. If using either container or OS versions of Graylog, log in as admin and use the password from which you derived the password secret when installing Graylog.

_images/graylog_login.png

Logging in will get you to a “Getting Started” screen. But, if you are reading this, then you’ve already found the “Getting Started Guide”, so just keep going.

Also, feel free to dismiss the guide or keep it for later.

_images/first_login.png

Explore Graylog

Once messages are being received, you may want to poke around and explore a bit. There are several pages available, though not all pages may be visible to all users, depending on individual permissions. The following is a brief description of each page’s purpose and function.

Streams

Streams are a core feature of Graylog and may be thought of as a form of tagging for incoming messages. Streams are a mechanism used to route messages into categories in real time. Stream rules instruct Graylog which messages to route into which streams.

Streams have many uses. First, they are used to route data for storage into an index. They are also used to control access to data, route messages for parsing, enrichment or other modification and determine which messages will be archived.

Streams may be used in conjunction with Alerts to notify users or otherwise respond when a message meets a set of conditions.

Messages may belong to one or to multiple streams. For additional detail, please see Streams.

Searches

The Graylog Search page is the interface used to search logs directly. Graylog uses a simplified syntax, very similar to Lucene. Relative or absolute time ranges are configurable from drop-down menus. Searches may be saved or visualized as dashboard widgets that may be added directly to dashboards from within the search screen.
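
For example, a query that combines a quoted phrase with a field match might look like this (the field values are illustrative):

"failed login" AND source:example.org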

Users may configure their own views and may choose to see either summary or complete data from event messages. Saved Searches may be imported and exported via content packs and search results may be exported to file.

For additional detail, please see Searching.

Dashboards

Graylog Dashboards are visualizations or summaries of information contained in log events. Each dashboard is populated by one or more widgets. Widgets visualize or summarize event log data with data derived from field values such as counts, averages, or totals. Trend indicators, charts, graphs, and maps are easily created to visualize the data.

Dashboard widgets and dashboard layouts are configurable. Dashboard access is controlled via Graylog’s role based access control. Dashboards may be imported and exported via content packs.

For additional detail, please see Dashboards.

Alerts

Alerts are composed of two related elements, alert conditions and alert notifications. Alert conditions are tied to streams and may be based on the content of a field, the aggregated value of a field, or message count thresholds. An alert notification triggers when a condition is met, typically sending an email or HTTP call back to an analyst or another system.

Additional output types may also be created via plugins. Alerts may be imported and exported via content packs.

For additional detail, please see Alerts.

System

Overview

The Overview page displays information relating to the administration of the Graylog instance. It contains information on system notifications, system job status, ingestion rates, Elasticsearch cluster health, indexer failures, time configuration, and system event messages.

Configuration

The Configuration page allows users to set options or variables related to searches, message processors and plugins.

Nodes

The Nodes page contains summary status information for each Graylog node. Detailed health information and metrics are available from buttons displayed on this page.

Inputs

Usually the first thing configured after initial system setup, inputs are used to tell Graylog on which port to listen or how to go and retrieve event logs. The Inputs page allows users to create and configure new inputs, manage extractors, start and stop inputs, get metrics for each input, and add static fields to incoming messages.

Outputs

Outputs are used to define methods of forwarding data to remote systems, including port, protocol and any other required information. Out of the box, Graylog supports STDOUT and GELF outputs, but users may write their own and more are available in the Graylog Marketplace.

Authentication

The Authentication page is used to configure Graylog’s authentication providers and manage the active users of this Graylog cluster. Graylog supports LDAP or Active Directory for both authentication and authorization.

Content Packs

Content packs accelerate the set-up process for a specific data source. A content pack can include inputs/extractors, streams, dashboards, alerts, saved searches and pipeline processors.

Any program element created within Graylog may be exported as a content pack for use on other systems. These may be kept private by the author, for use in quick deployment of new nodes internally, or may be shared with the community via the Graylog Marketplace. For example, users may create custom inputs, streams, dashboards, and alerts to support a security use case. These elements may be exported in a content pack and then imported on a newly installed Graylog instance to save configuration time and effort.

Users may download content packs created and shared by other users via the Graylog Marketplace. User created content packs are not supported by Graylog, but instead by their authors.

List of Elements Supported in Content Packs

  • Inputs
  • Grok Patterns
  • Outputs
  • Streams
  • Dashboards
  • Lookup Tables
  • Lookup Caches
  • Lookup Data Adapters

Indices

An Index is the basic unit of storage for data in Elasticsearch. Index sets provide configuration for retention, sharding, and replication of the stored data.

Values, like retention and rotation strategy, are set on a per index basis, so different data may be subjected to different handling rules.

For more details, please see Index model.

Collectors/Sidecars

Graylog created the Sidecar agent to manage fleets of log shippers like Beats or NXLog. These log shippers are used to collect OS logs from Windows servers as well as from *nix systems. Log shippers are often the simplest way to read logs written locally to a flat file and send them to a centralized log management solution. Graylog supports management of any log shipper as a back-end, but includes Beats and NXLog binaries in the agent package.

For more details, please see Graylog Collector Sidecar.

Pipelines

Graylog’s Processing Pipelines are a powerful feature that enables users to run a rule, or a series of rules, against a specific type of event. Tied to streams, pipelines allow for routing, blacklisting, modifying and enriching messages as they flow through Graylog. Basically, if you want to parse, change, convert, add to, delete from, or drop a message, Pipelines are the place to do it.
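
As a sketch of what a pipeline rule can look like, the following would drop messages whose syslog level marks them as debug (the field name and threshold are illustrative):

rule "drop debug messages"
when
    has_field("level") && to_long($message.level) >= 7
then
    drop_message();
end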

For more details, please see Processing Pipelines.

Collect Messages

Once Graylog and associated components are running, the next step is to begin collecting logs.

The first step is to create an input. Inputs define the method by which Graylog collects logs. Out of the box, Graylog supports multiple methods to collect logs, including:

  • Syslog (TCP, UDP, AMQP, Kafka)
  • GELF (TCP, UDP, AMQP, Kafka, HTTP)
  • AWS (AWS Logs, FlowLogs, CloudTrail)
  • Beats/Logstash
  • CEF (TCP, UDP, AMQP, Kafka)
  • JSON Path from HTTP API
  • Netflow (UDP)
  • Plain/Raw Text (TCP, UDP, AMQP, Kafka)

Content packs

Additional inputs may be installed via content packs. Content packs are bundles of Graylog input, extractor, stream, dashboard, and output configurations that can provide full support for a data source. Some content packs are shipped with Graylog by default and some are available from the website. Content packs that were downloaded from the Graylog Marketplace can be imported using the Graylog web interface.

You can load and even create your own content packs from the System / Content Packs page of the Graylog web interface.

Create an Input

To create an input, open the System / Inputs page in the top menu, click the arrow in the drop-down field, select your input type, and click the green button labeled Launch new input.

Usually, the default settings are correct, but you may change any that you wish. Some input types may require authentication or other information specific to that source.

Note

If Graylog is not running as root, you will not have the option of using ports lower than 1024 for inputs. Sending devices may need to be reconfigured. Since best practice dictates that applications should not be run as root, customers who cannot change the event source are encouraged to use a load balancer, or other external means, to perform port translation.
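
As an example of such external port translation, a NAT rule on the Graylog host can redirect the privileged syslog port to an unprivileged input port. This is a sketch assuming iptables and a Syslog UDP input listening on 5514:

$ sudo iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-port 5514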

Save the input. It will start automatically.

If your event source is already configured to send events to the port you selected (as is the case with push event sources like Syslog or CEF), you should start to receive messages within a few seconds.

Check out Sending in log data if you’d like to learn more about the supported options for ingesting messages into Graylog.

Verify Messages Are Being Collected

Once you have an input defined, you will want to verify that you are receiving messages on that input. Check the Throughput / Metrics section to the right of your input. You should see the NetworkIO values start to climb, showing the amount of data consumed on this input.

_images/input_page.png

Click on the Show received messages button next to the input you just created. You should now see the messages received on this input.

_images/gs_10-messages.png

If you click on Sources in the top menu, you will see a nice overview of all devices, servers, or applications that are sending data into Graylog and how many messages Graylog has received from each source. Initially, you may not see much on this page. However, once you start sending data from more systems, their hostnames or IP addresses will also be listed on this page.

_images/sources_page.png

Skip the following section if you are all good.

If You Don’t Have Messages
  1. Check to see that you made the proper entries in the input configuration described above.
  2. Check the configuration at the event source and make sure that it matches the ports and other options defined in the input. For example, if you changed the port for a Syslog UDP input to 5014, be sure the sending device has that same port defined.
  3. Check to see if traffic is coming to the defined port. You can use the tcpdump command to do this:

$ sudo tcpdump -i lo host 127.0.0.1 and udp port 5014

  4. Check to see if the server is listening on the host:

$ sudo netstat -peanut | grep ":5014"

If you still have issues, connect to our community support or get in touch with us via the professional support offering.

Installing Graylog

Modern server architectures and configurations are managed in many different ways. Some people still put new software somewhere in /opt manually for each server, while others have already jumped on the configuration management train and fully automated reproducible setups.

Graylog can be installed in many different ways, so you can pick whatever works best for you. We recommend starting with the virtual machine appliances for the fastest way to get started, and then picking one of the other, more flexible installation methods to build a setup that is easier to scale.

This chapter explains the many ways to install Graylog and aims to help you choose the one that fits your needs.

Virtual Machine Appliances

Pre-Considerations

The Graylog Virtual Machine Appliance was designed only as a showcase of Graylog and its cluster mode. This appliance is intended for proof of concept, testing, lab or other such applications. Please deploy this appliance in a network that is isolated from the internet. In most cases, Graylog does not recommend using this appliance in a production environment.

Please review the notes about production readiness!

Download

Download the OVA image. If you are unsure what the latest version number is, take a look at our release page.

Run the image

You can run the OVA on many systems like VMware or VirtualBox. In this example we will guide you through running the OVA in the free VirtualBox on OS X.

In VirtualBox, select File -> Import appliance:

_images/virtualbox1.png

Hit Continue and keep the suggested settings on the next page as they are. Make sure that you have enough RAM and CPUs on your local machine. You can lower the resources assigned to the virtual machine, but we recommend not doing so, to ensure a good Graylog experience. In fact, you might have to raise them if you plan to scale out later and send more messages into Graylog.

Press Import to finish loading the OVA into Virtualbox:

_images/virtualbox2.png

You can now start the VM and should see a login shell like this when the boot sequence is completed:

_images/virtualbox3.png

Note

If you don’t have a working DHCP server for your virtual machine, you will get the error message:

“Your appliance came up without a configured IP address. Graylog is probably not running correctly!”

In this case, you have to log in and edit /etc/network/interfaces in order to set up a fixed IP address. Then manually reconfigure Graylog as shown in the following paragraphs.

Logging in

You can log into the shell of the operating system of the appliance with the user ubuntu and the password ubuntu. You should of course change those credentials.

The web interface is reachable on port 80 at the IP address of your virtual machine. The login prompt of the shell shows you this IP address, too (see the screenshot above). DHCP should be enabled in your network; otherwise, take a look at the graylog-ctl command to apply a static IP address to the appliance.

The standard user for the web interface is admin with the password admin.

Basic configuration

We ship the graylog-ctl tool with the virtual machine appliances to get you started with a customized setup as quickly as possible. Run these (optional) commands to configure the most basic settings of Graylog in the appliance:

sudo graylog-ctl set-email-config <smtp server> [--port=<smtp port> --user=<username> --password=<password>]
sudo graylog-ctl set-admin-password <password>
sudo graylog-ctl set-timezone <zone acronym>
sudo graylog-ctl reconfigure

The graylog-ctl tool has much more functionality documented. We strongly recommend learning more about it to ensure smooth operation of your virtual appliance.

VMware tools

If you are using the appliance on a VMware host, you might want to install the hypervisor tools:

sudo apt-get install -y open-vm-tools

Update OVA to latest Version

You can update your Appliance to the newest release without deploying a new template.

Production readiness

The Graylog appliance is not designed to provide a production ready solution. It is built to offer a fast and easy way to try the software itself.

If you must use an appliance in production, please harden the security of the box before deployment.

Graylog recommends the following minimum steps be taken:

  • Set another password for the default ubuntu user
  • Disable remote password logins in /etc/ssh/sshd_config and deploy proper SSH keys
  • Deploy the appliance on a network that is properly isolated and secured against access from unauthorized sources. Under no circumstances should the appliance be reachable from the public internet.
  • Add additional RAM to the appliance and raise the Java heap!
  • Add an additional HDD to the appliance and extend disk space.
  • Add the appliance to your monitoring and metric systems.

If you need to create your own production ready setup take a look at our other installation methods.

Operating System Packages

Until configuration management systems made their way into broader markets and many datacenters, one of the most common ways to install software on Linux servers was to use operating system packages. Debian has DEB, Red Hat has RPM, and many other distributions are based on those or come with their own package formats. Online repositories of software packages and corresponding package managers make installing and configuring new software a matter of a single command and a few minutes of time.

Graylog offers official DEB and RPM package repositories. The packages have been tested on the following operating systems:

  • Ubuntu 12.04, 14.04, 16.04
  • Debian 7, 8, 9
  • RHEL/CentOS 6, 7

The repositories can be set up by installing a single package. Once that’s done, the Graylog packages can be installed via apt-get or yum. The packages can also be downloaded with a web browser at https://packages.graylog2.org/ if needed.

Prerequisites

Make sure to install and configure the following software before installing and starting any Graylog services:

  • Java (>= 8)
  • MongoDB (>= 2.4)
  • Elasticsearch (>= 2.x)

Caution

Graylog 2.4 does not work with Elasticsearch 6.x!

DEB / APT

Download and install graylog-2.4-repository_latest.deb via dpkg(1) and also make sure that the apt-transport-https package is installed:

$ sudo apt-get install apt-transport-https
$ wget https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.deb
$ sudo dpkg -i graylog-2.4-repository_latest.deb
$ sudo apt-get update
$ sudo apt-get install graylog-server

After the installation completed successfully, Graylog can be started with the following commands. Make sure to use the correct command for your operating system.

OS                       Init System   Command
Ubuntu 14.04, 12.04      upstart       sudo start graylog-server
Debian 7                 SysV          sudo service graylog-server start
Debian 8, Ubuntu 16.04   systemd       sudo systemctl start graylog-server

The packages are configured to not start any Graylog services during boot. You can use the following commands to start Graylog when the operating system is booting.

OS                       Init System   Command
Ubuntu 14.04, 12.04      upstart       sudo rm -f /etc/init/graylog-server.override
Debian 7                 SysV          sudo update-rc.d graylog-server defaults 95 10
Debian 8, Ubuntu 16.04   systemd       sudo systemctl enable graylog-server

Update to latest version

If you’ve been using the repository package to install Graylog before, it has to be updated first. The new package will replace the repository URL, without which you will only be able to get bugfix releases of your previously installed version of Graylog.

The update basically works like a fresh installation:

$ wget https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.deb
$ sudo dpkg -i graylog-2.4-repository_latest.deb
$ sudo apt-get update
$ sudo apt-get install graylog-server

Manual Repository Installation

If you prefer not to install the repository DEB to get the repository configuration onto your system, you can do so manually (although we don’t recommend that).

First, add the Graylog GPG keyring which is being used to sign the packages to your system.

Hint

We assume that you have placed the GPG key into /etc/apt/trusted.gpg.d/.

Now create a file /etc/apt/sources.list.d/graylog.list with the following content:

deb https://packages.graylog2.org/repo/debian/ stable 2.4

RPM / YUM / DNF

Download and install graylog-2.4-repository_latest.rpm via rpm(8):

$ sudo rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.rpm
$ sudo yum install graylog-server

After the installation completed successfully, Graylog can be started with the following commands. Make sure to use the correct command for your operating system.

OS         Init System   Command
CentOS 6   SysV          sudo service graylog-server start
CentOS 7   systemd       sudo systemctl start graylog-server

The packages are configured to not start any Graylog services during boot. You can use the following commands to start Graylog when the operating system is booting.

OS         Init System   Command
CentOS 6   SysV          sudo chkconfig --add graylog-server
CentOS 7   systemd       sudo systemctl enable graylog-server

Update to latest version

If you’ve been using the repository package to install Graylog before, it has to be updated first. The new package will replace the repository URL, without which you will only be able to get bugfix releases of your previously installed version of Graylog.

The update basically works like a fresh installation:

$ sudo rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.rpm
$ sudo yum clean all
$ sudo yum install graylog-server

Running yum clean all is required because YUM might use a stale cache and thus might be unable to find the latest version of the graylog-server package.

Manual Repository Installation

If you prefer not to install the repository RPM to get the repository configuration onto your system, you can do so manually (although we don’t recommend that).

First, add the Graylog GPG key which is being used to sign the packages to your system.

Hint

We assume that you have placed the GPG key into /etc/pki/rpm-gpg/RPM-GPG-KEY-graylog.

Now create a file named /etc/yum.repos.d/graylog.repo with the following content:

[graylog]
name=graylog
baseurl=https://packages.graylog2.org/repo/el/stable/2.4/$basearch/
gpgcheck=1
repo_gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-graylog

Step-by-step guides

Ubuntu installation

This guide describes the fastest way to install Graylog on Ubuntu 16.04 LTS. All links and packages are present at the time of writing but might need to be updated later on.

Warning

This setup should not be done on publicly exposed servers. This guide does not cover security settings!

Prerequisites

Starting from a minimal server setup, you will need to install these additional packages:

$ sudo apt-get update && sudo apt-get upgrade
$ sudo apt-get install apt-transport-https openjdk-8-jre-headless uuid-runtime pwgen

MongoDB

The official MongoDB repository provides the most up-to-date version and is the recommended way of installing MongoDB for Graylog:

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5
$ echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org

The last step is to enable MongoDB during the operating system’s startup:

$ sudo systemctl daemon-reload
$ sudo systemctl enable mongod.service
$ sudo systemctl restart mongod.service

Elasticsearch

Graylog 2.4.x should be used with Elasticsearch 5.x; please follow the installation instructions from the Elasticsearch installation guide:

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list
$ sudo apt-get update && sudo apt-get install elasticsearch

Make sure to modify the Elasticsearch configuration file (/etc/elasticsearch/elasticsearch.yml) and set the cluster name to graylog. Additionally, you need to uncomment (remove the # as first character) the line:

cluster.name: graylog

After you have modified the configuration, you can start Elasticsearch:

$ sudo systemctl daemon-reload
$ sudo systemctl enable elasticsearch.service
$ sudo systemctl restart elasticsearch.service

Graylog

Now install the Graylog repository configuration and Graylog itself with the following commands:

$ wget https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.deb
$ sudo dpkg -i graylog-2.4-repository_latest.deb
$ sudo apt-get update && sudo apt-get install graylog-server

Follow the instructions in your /etc/graylog/server/server.conf and add password_secret and root_password_sha2. These settings are mandatory and without them, Graylog will not start!

You need to use the following command to create your root_password_sha2:

echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1
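
Both values then go into /etc/graylog/server/server.conf as plain entries, for example (placeholders shown, not real values):

password_secret = <96-character output of pwgen -N 1 -s 96>
root_password_sha2 = <64-character output of the sha256sum command above>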

To be able to connect to Graylog you should set rest_listen_uri and web_listen_uri to the public host name or a public IP address of the machine you can connect to. More information about these settings can be found in Configuring the web interface.

Note

If you’re operating a single-node setup and would like to use HTTPS for the Graylog web interface and the Graylog REST API, it’s possible to use NGINX or Apache as a reverse proxy.

The last step is to enable Graylog during the operating system’s startup:

$ sudo systemctl daemon-reload
$ sudo systemctl enable graylog-server.service
$ sudo systemctl start graylog-server.service
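
To verify that the server came up, you can query the REST API on the configured listen address; it should answer with a small JSON document describing the node (a sketch, assuming the default rest_listen_uri):

$ curl http://127.0.0.1:9000/api/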

The next step is to ingest messages into your Graylog instance and extract them with extractors, or use pipelines to work with the messages.

Multiple Server Setup

If you plan to have multiple servers taking care of different roles in your cluster, like we have in this big production setup, you need to modify only a few settings. This is covered in our Multi-node Setup guide. The default file location guide will show you the files you need to modify in your setup.

Feedback

Please file a bug report in the GitHub repository for the operating system packages if you run into any packaging related issues.

If you found this documentation confusing or have more questions, please open an issue in the GitHub repository for the documentation.

Debian installation

This guide describes the fastest way to install Graylog on Debian Linux 9 (Stretch). All links and packages are present at the time of writing but might need to be updated later on.

Warning

This setup should not be done on publicly exposed servers. This guide does not cover security settings!

Prerequisites

If you’re starting from a minimal server setup, you will need to install these additional packages:

$ sudo apt update && sudo apt upgrade
$ sudo apt install apt-transport-https openjdk-8-jre-headless uuid-runtime pwgen

MongoDB

The official MongoDB repository provides the most up-to-date version and is the recommended way of installing MongoDB for Graylog:

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5
$ echo "deb http://repo.mongodb.org/apt/debian stretch/mongodb-org/3.6 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list
$ sudo apt-get update
$ sudo apt-get install -y mongodb-org

The last step is to enable MongoDB during the operating system’s startup:

$ sudo systemctl daemon-reload
$ sudo systemctl enable mongod.service
$ sudo systemctl restart mongod.service

Elasticsearch

Graylog 2.4.x should be used with Elasticsearch 5.x; please follow the installation instructions from the Elasticsearch installation guide:

$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list
$ sudo apt update && sudo apt install elasticsearch

Make sure to modify the Elasticsearch configuration file (/etc/elasticsearch/elasticsearch.yml) and set the cluster name to graylog. Additionally, you need to uncomment (remove the # as first character) the line:

cluster.name: graylog

After you have modified the configuration, you can start Elasticsearch:

$ sudo systemctl daemon-reload
$ sudo systemctl enable elasticsearch.service
$ sudo systemctl restart elasticsearch.service

Graylog

Now install the Graylog repository configuration and Graylog itself with the following commands:

$ wget https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.deb
$ sudo dpkg -i graylog-2.4-repository_latest.deb
$ sudo apt update && sudo apt install graylog-server

Follow the instructions in your /etc/graylog/server/server.conf and add password_secret and root_password_sha2. These settings are mandatory and without them, Graylog will not start!

You need to use the following command to create your root_password_sha2:

echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1

To be able to connect to Graylog you should set rest_listen_uri and web_listen_uri to the public host name or a public IP address of the machine you can connect to. More information about these settings can be found in Configuring the web interface.

Note

If you’re operating a single-node setup and would like to use HTTPS for the Graylog web interface and the Graylog REST API, it’s possible to use NGINX or Apache as a reverse proxy.

The last step is to enable Graylog during the operating system’s startup:

$ sudo systemctl daemon-reload
$ sudo systemctl enable graylog-server.service
$ sudo systemctl start graylog-server.service

The next step is to ingest messages into your Graylog instance and extract them with extractors, or use pipelines to work with the messages.

Multiple Server Setup

If you plan to have multiple servers taking care of different roles in your cluster, like we have in this big production setup, you need to modify only a few settings. This is covered in our Multi-node Setup guide. The default file location guide will show you the files you need to modify in your setup.

Feedback

Please file a bug report in the GitHub repository for the operating system packages if you run into any packaging related issues.

If you found this documentation confusing or have more questions, please open an issue in the GitHub repository for the documentation.

CentOS installation

This guide describes the fastest way to install Graylog on CentOS 7. All links and packages are present at the time of writing but might need to be updated later on.

Warning

This setup should not be done on publicly exposed servers. This guide does not cover security settings!

Prerequisites

Starting from a minimal server setup, you will need to install these additional packages:

$ sudo yum install java-1.8.0-openjdk-headless.x86_64

If you want to use pwgen later on, you need to set up EPEL on your system with sudo yum install epel-release and install the package with sudo yum install pwgen.

MongoDB

Installing MongoDB on CentOS should follow the tutorial for RHEL and CentOS from the MongoDB documentation. First add the repository file /etc/yum.repos.d/mongodb-org-3.6.repo with the following contents:

[mongodb-org-3.6]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc

After that, install the latest release of MongoDB with sudo yum install -y mongodb-org.

Additionally, run these last steps to start MongoDB during the operating system’s boot and start it right away:

$ sudo chkconfig --add mongod
$ sudo systemctl daemon-reload
$ sudo systemctl enable mongod.service
$ sudo systemctl start mongod.service

Elasticsearch

Graylog 2.4.x should be used with Elasticsearch 5.x; please follow the installation instructions from the Elasticsearch installation guide.

First install the Elastic GPG key with rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch, then add the repository file /etc/yum.repos.d/elasticsearch.repo with the following contents:

[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

followed by the installation of the latest release with sudo yum install elasticsearch.

Make sure to modify the Elasticsearch configuration file (/etc/elasticsearch/elasticsearch.yml) and set the cluster name to graylog. Additionally, you need to uncomment (remove the # as first character) the line:

cluster.name: graylog

After you have modified the configuration, you can start Elasticsearch:

$ sudo chkconfig --add elasticsearch
$ sudo systemctl daemon-reload
$ sudo systemctl enable elasticsearch.service
$ sudo systemctl restart elasticsearch.service

Graylog

Now install the Graylog repository configuration and Graylog itself with the following commands:

$ sudo rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-2.4-repository_latest.rpm
$ sudo yum install graylog-server

Follow the instructions in your /etc/graylog/server/server.conf and add password_secret and root_password_sha2. These settings are mandatory and without them, Graylog will not start!

You need to use the following command to create your root_password_sha2:

echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1

To be able to connect to Graylog you should set rest_listen_uri and web_listen_uri to the public host name or a public IP address of the machine you can connect to. More information about these settings can be found in Configuring the web interface.

Note

If you’re operating a single-node setup and would like to use HTTPS for the Graylog web interface and the Graylog REST API, it’s possible to use NGINX or Apache as a reverse proxy.

The last step is to enable Graylog during the operating system’s startup:

$ sudo chkconfig --add graylog-server
$ sudo systemctl daemon-reload
$ sudo systemctl enable graylog-server.service
$ sudo systemctl start graylog-server.service

The next step is to ingest messages into your Graylog instance and extract them with extractors, or use pipelines to work with the messages.

SELinux information

Hint

We assume that you have policycoreutils-python installed to manage SELinux.

If you’re using SELinux on your system, you need to take care of the following settings:

  • Allow the web server to access the network: sudo setsebool -P httpd_can_network_connect 1

  • If the policy above does not comply with your security policy, you can also allow access to each port individually:
    • Graylog REST API and web interface: sudo semanage port -a -t http_port_t -p tcp 9000
    • Elasticsearch (only if the HTTP API is being used): sudo semanage port -a -t http_port_t -p tcp 9200
  • Allow using MongoDB’s default port (27017/tcp): sudo semanage port -a -t mongod_port_t -p tcp 27017

If you run a single-server environment with an NGINX or Apache proxy, enabling the Graylog REST API is enough. All other rules are only required in a multi-node setup. Having SELinux disabled during installation and enabling it later requires you to manually check the policies for MongoDB, Elasticsearch and Graylog.

Hint

Depending on your actual setup and configuration, you might need to add more SELinux rules to get to a running setup.

Multiple Server Setup

If you plan to have multiple servers taking care of different roles in your cluster, like we have in this big production setup, you need to modify only a few settings. This is covered in our Multi-node Setup guide. The default file location guide will show you the files you need to modify in your setup.

Feedback

Please file a bug report in the GitHub repository for the operating system packages if you run into any packaging related issues.

If you found this documentation confusing or have more questions, please open an issue in the GitHub repository for the documentation.

SLES installation

This guide describes the fastest way to install Graylog on SLES 12 SP3. All links and packages are present at the time of writing but might need to be updated later on.

Warning

This setup should not be done on publicly exposed servers. This guide does not cover security settings!

Prerequisites

The following patterns are required for a minimal setup (see SLES 12 SP3 Deployment Guide):

- Base System
- Minimal System (Appliances)
- YaST configuration packages

Warning

This guide assumes that the firewall is disabled and that communication to the outside world is possible.

Assuming a minimal setup, you have to install the Java runtime environment:

$ sudo zypper install java-1_8_0-openjdk
MongoDB

To install MongoDB on SLES, follow the SLES tutorial in the MongoDB documentation. Add the GPG key and the repository before installing MongoDB:

$ sudo rpm --import https://www.mongodb.org/static/pgp/server-3.6.asc
$ sudo zypper addrepo --gpgcheck "https://repo.mongodb.org/zypper/suse/12/mongodb-org/3.6/x86_64/" mongodb
$ sudo zypper -n install mongodb-org

In order to automatically start MongoDB on system boot, you have to activate the MongoDB service by running the following commands:

$ sudo chkconfig mongod on
$ sudo systemctl daemon-reload
$ sudo systemctl restart mongod.service
Elasticsearch

Graylog 2.4.x should be used with Elasticsearch 5.x. Please follow the installation instructions in the Elasticsearch installation guide.

First install the Elastic GPG key with rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch then add the repository file /etc/zypp/repos.d/elasticsearch.repo with the following contents:

[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

followed by the installation of the latest release with sudo zypper install elasticsearch.

Make sure to modify the Elasticsearch configuration file (/etc/elasticsearch/elasticsearch.yml) and set the cluster name to graylog. Additionally, you need to uncomment (remove the # as first character) the line:

cluster.name: graylog

In order to automatically start Elasticsearch on system boot, you have to activate the Elasticsearch service by running the following commands:

$ sudo chkconfig elasticsearch on
$ sudo systemctl daemon-reload
$ sudo systemctl restart elasticsearch.service
Graylog

First install the Graylog GPG key with rpm --import https://packages.graylog2.org/repo/debian/keyring.gpg, then add the repository file /etc/zypp/repos.d/graylog.repo with the following content:

[graylog]
name=graylog
baseurl=https://packages.graylog2.org/repo/el/stable/2.4/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-graylog

After that, install the latest release with sudo zypper install graylog-server.

Make sure to follow the instructions in your /etc/graylog/server/server.conf and add password_secret and root_password_sha2. These settings are mandatory and without them, Graylog will not start!

You can use the following command to create your password_secret:

cat /dev/urandom | base64 | cut -c1-96 | head -1

You need to use the following command to create your root_password_sha2:

echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1

To be able to connect to Graylog, you should set rest_listen_uri and web_listen_uri to the public host name or a public IP address of the machine you can connect to; see the example in the CentOS section above. More information about these settings can be found in Configuring the web interface.

Note

If you’re operating a single-node setup and would like to use HTTPS for the Graylog web interface and the Graylog REST API, it’s possible to use NGINX or Apache as a reverse proxy.

The last step is to enable Graylog during the operating system’s startup:

$ sudo chkconfig graylog-server on
$ sudo systemctl daemon-reload
$ sudo systemctl start graylog-server.service

The next step is to ingest messages into your new Graylog cluster and extract them with extractors, or use the pipelines to work with the messages.

Cluster Setup

If you plan to have multiple servers assuming different roles in your cluster, like in a bigger production setup, you need to modify only a few settings. This is covered in our Multi-node Setup guide. The default file location guide lists the locations of the files you need to modify.

Feedback

Please file a bug report in the GitHub repository for the operating system packages if you run into any packaging related issues.

If you found this documentation confusing or have more questions, please open an issue in the GitHub repository for the documentation.


Chef, Puppet, & Ansible

The DevOps movement turbocharged market adoption of the newest generation of configuration management and orchestration tools like Chef, Puppet, and Ansible. Graylog offers official scripts for all three of them.

Docker

Requirements

You will need a fairly recent version of Docker.

We will use the following Docker images in this chapter:

  • Graylog: graylog/graylog:2.4
  • MongoDB: mongo:3
  • Elasticsearch: docker.elastic.co/elasticsearch/elasticsearch:5.6.12

Quick start

If you simply want to check out Graylog without any further customization, you can run the following three commands to create the necessary environment:

$ docker run --name mongo -d mongo:3
$ docker run --name elasticsearch \
    -e "http.host=0.0.0.0" -e "xpack.security.enabled=false" \
    -d docker.elastic.co/elasticsearch/elasticsearch:5.6.12
$ docker run --link mongo --link elasticsearch \
    -p 9000:9000 -p 12201:12201 -p 514:514 \
    -e GRAYLOG_WEB_ENDPOINT_URI="http://127.0.0.1:9000/api" \
    -d graylog/graylog:2.4
How to get log data in

You can create different kinds of inputs under System / Inputs; however, you can only use ports that have been properly mapped to your Docker container, otherwise data will not go through.

For example, to start a Raw/Plaintext TCP input on port 5555, stop your container and recreate it, whilst appending -p 5555:5555 to your docker run command:

$ docker run --link mongo --link elasticsearch \
    -p 9000:9000 -p 12201:12201 -p 514:514 -p 5555:5555 \
    -e GRAYLOG_WEB_ENDPOINT_URI="http://127.0.0.1:9000/api" \
    -d graylog/graylog:2.4

Similarly, the same can be done for UDP by appending -p 5555:5555/udp.

After that you can send a plaintext message to the Graylog Raw/Plaintext TCP input running on port 5555 using the following command:

$ echo 'First log message' | nc localhost 5555
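
If you mapped the port as UDP instead (-p 5555:5555/udp), a test message can be sent using netcat's UDP mode:

$ echo 'First log message' | nc -u -w 1 localhost 5555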
Settings

Graylog comes with a default configuration that works out of the box, but you have to set a password for the admin user, and the web interface needs to know how to connect from your browser to the Graylog REST API.

Both settings can be configured via environment variables (also see Configuration):

-e GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
-e GRAYLOG_WEB_ENDPOINT_URI="http://127.0.0.1:9000/api"

In this case you can log in to Graylog with both the username and password set to admin.

Generate your own admin password with the following command and put the SHA-256 hash into the GRAYLOG_ROOT_PASSWORD_SHA2 environment variable:

echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1

All these settings and command line parameters can be put in a docker-compose.yml file, so that they don’t have to be executed one after the other.

Example:

version: '2'
services:
  # MongoDB: https://hub.docker.com/_/mongo/
  mongodb:
    image: mongo:3
  # Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/docker.html
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.12
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      # Disable X-Pack security: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/security-settings.html#general-security-settings
      - xpack.security.enabled=false
      - xpack.watcher.enabled=false
      - xpack.monitoring.enabled=false
      - xpack.security.audit.enabled=false
      - xpack.ml.enabled=false
      - xpack.graph.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
  # Graylog: https://hub.docker.com/r/graylog/graylog/
  graylog:
    image: graylog/graylog:2.4
    environment:
      # CHANGE ME!
      - GRAYLOG_PASSWORD_SECRET=somepasswordpepper
      # Password: admin
      - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
      - GRAYLOG_WEB_ENDPOINT_URI=http://127.0.0.1:9000/api
    links:
      - mongodb:mongo
      - elasticsearch
    depends_on:
      - mongodb
      - elasticsearch
    ports:
      # Graylog web interface and REST API
      - 9000:9000
      # Syslog TCP
      - 514:514
      # Syslog UDP
      - 514:514/udp
      # GELF TCP
      - 12201:12201
      # GELF UDP
      - 12201:12201/udp

After starting all three Docker containers by running docker-compose up, you can open the URL http://127.0.0.1:9000 in a web browser and log in with username admin and password admin (make sure to change the password later).

Configuration

Every configuration option can be set via environment variables. Simply prefix the parameter name with GRAYLOG_ and put it all in upper case.

For example, to set up the SMTP configuration for sending Graylog alert notifications via email, the docker-compose.yml would look like this:

version: '2'
services:
  mongo:
    image: "mongo:3"
    # Other settings [...]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.12
    # Other settings [...]
  graylog:
    image: graylog/graylog:2.4
    # Other settings [...]
    environment:
      GRAYLOG_TRANSPORT_EMAIL_ENABLED: "true"
      GRAYLOG_TRANSPORT_EMAIL_HOSTNAME: smtp
      GRAYLOG_TRANSPORT_EMAIL_PORT: 25
      GRAYLOG_TRANSPORT_EMAIL_USE_AUTH: "false"
      GRAYLOG_TRANSPORT_EMAIL_USE_TLS: "false"
      GRAYLOG_TRANSPORT_EMAIL_USE_SSL: "false"

Another option would be to store the configuration file outside of the container and edit it directly.

Custom configuration files

Instead of using a long list of environment variables to configure Graylog (see Configuration), you can also overwrite the bundled Graylog configuration files.

The bundled configuration files are stored in /usr/share/graylog/data/config/ inside the Docker container.

Create the new configuration directory next to the docker-compose.yml file and copy the default files from GitHub:

$ mkdir -p ./graylog/config
$ cd ./graylog/config
$ wget https://raw.githubusercontent.com/Graylog2/graylog-docker/2.4/config/graylog.conf
$ wget https://raw.githubusercontent.com/Graylog2/graylog-docker/2.4/config/log4j2.xml

The newly created directory ./graylog/config/ with the custom configuration files now has to be mounted into the Graylog Docker container.

This can be done by adding an entry to the volumes section of the docker-compose.yml file:

version: '2'
services:
  mongodb:
    image: mongo:3
    # Other settings [...]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.12
    # Other settings [...]
  graylog:
    image: graylog/graylog:2.4
    # Other settings [...]
    volumes:
      # Mount local configuration directory into Docker container
      - ./graylog/config:/usr/share/graylog/data/config

Persisting data

In order to make the recorded data persistent, you can use external volumes to store all data.

In case of a container restart, this will simply re-use the existing data from the former instances.

Using Docker volumes for the data of MongoDB, Elasticsearch, and Graylog, the docker-compose.yml file looks as follows:

version: '2'
services:
  # MongoDB: https://hub.docker.com/_/mongo/
  mongodb:
    image: mongo:3
    volumes:
      - mongo_data:/data/db
  # Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/docker.html
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.12
    volumes:
      - es_data:/usr/share/elasticsearch/data
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      # Disable X-Pack security: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/security-settings.html#general-security-settings
      - xpack.security.enabled=false
      - xpack.watcher.enabled=false
      - xpack.monitoring.enabled=false
      - xpack.security.audit.enabled=false
      - xpack.ml.enabled=false
      - xpack.graph.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    mem_limit: 1g
  # Graylog: https://hub.docker.com/r/graylog/graylog/
  graylog:
    image: graylog/graylog:2.4
    volumes:
      - graylog_journal:/usr/share/graylog/data/journal
    environment:
      # CHANGE ME!
      - GRAYLOG_PASSWORD_SECRET=somepasswordpepper
      # Password: admin
      - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
      - GRAYLOG_WEB_ENDPOINT_URI=http://127.0.0.1:9000/api
    links:
      - mongodb:mongo
      - elasticsearch
    depends_on:
      - mongodb
      - elasticsearch
    ports:
      # Graylog web interface and REST API
      - 9000:9000
      # Syslog TCP
      - 514:514
      # Syslog UDP
      - 514:514/udp
      # GELF TCP
      - 12201:12201
      # GELF UDP
      - 12201:12201/udp
# Volumes for persisting data, see https://docs.docker.com/engine/admin/volumes/volumes/
volumes:
  mongo_data:
    driver: local
  es_data:
    driver: local
  graylog_journal:
    driver: local

Start all services with exposed data directories:

$ docker-compose up
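
To start the services in the background instead, use Docker Compose's detached mode:

$ docker-compose up -d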

Plugins

In order to add plugins, you can either build a new image based on the existing graylog/graylog Docker image with the needed plugin included, or you can add a volume that points to the locally downloaded plugin file.

New Docker image

Simply create a new Dockerfile in an empty directory with the following contents:

FROM graylog/graylog:2.4
RUN wget -O /usr/share/graylog/plugin/graylog-plugin-auth-sso-2.4.0.jar https://github.com/Graylog2/graylog-plugin-auth-sso/releases/download/2.4.0/graylog-plugin-auth-sso-2.4.0.jar

Build a new image from the new Dockerfile (also see docker build):

$ docker build -t graylog-with-sso-plugin .

In this example, we created a new image with the SSO plugin installed. From now on, reference the newly built image instead of graylog/graylog.

The docker-compose.yml file has to reference the new Docker image:

version: '2'
services:
  mongo:
    image: "mongo:3"
    # Other settings [...]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.12
    # Other settings [...]
  graylog:
    image: graylog-with-sso-plugin
    # Other settings [...]
Volume-mounted plugin

Instead of building a new Docker image, you can also add additional plugins by mounting them directly and individually into the plugin folder of the original Docker image. This way, you don't have to create a new Docker image every time you want to add a new plugin (or remove an old one).

Simply create a plugin folder, download the plugin(s) you want to install into it and mount each file as an additional volume into the docker container:

$ mkdir -p ./graylog/plugin
$ wget -O ./graylog/plugin/graylog-plugin-auth-sso-2.4.0.jar https://github.com/Graylog2/graylog-plugin-auth-sso/releases/download/2.4.0/graylog-plugin-auth-sso-2.4.0.jar

The docker-compose.yml file has to mount the plugin file into the Graylog container:

version: '2'
services:
  mongo:
    image: "mongo:3"
    # Other settings [...]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.12
    # Other settings [...]
  graylog:
    image: graylog/graylog:2.4
    # Other settings [...]
    volumes:
      # Mount local plugin file into Docker container
      - ./graylog/plugin/graylog-plugin-auth-sso-2.4.0.jar:/usr/share/graylog/plugin/graylog-plugin-auth-sso-2.4.0.jar

You can add as many of these volume mounts as you wish in your docker-compose.yml file. Note that a plain restart does not re-read docker-compose.yml, so recreate the container and Docker will bring up the Graylog container with the new volumes included:

$ docker-compose up -d

Troubleshooting

  • In case you see warnings regarding open file limit, try to set ulimit from the outside of the container:

    $ docker run --ulimit nofile=64000:64000 ...
    
  • The devicemapper storage driver can cause problems with Graylog's disk journal on some systems. In this case, please pick another driver like aufs or overlay.

Testing a beta version

Caution

We only recommend running pre-release versions if you are an experienced Graylog user and know what you are doing.

You can also run a pre-release (alpha, beta, or release candidate) version of Graylog using Docker.

The pre-releases are tagged in the graylog/graylog Docker image.

Follow the documentation for the Graylog image on Docker Hub and pick an alpha/beta/rc tag like this:

$ docker run --link mongo --link elasticsearch -p 9000:9000 -p 12201:12201 -p 514:514 \
    -e GRAYLOG_WEB_ENDPOINT_URI="http://127.0.0.1:9000/api" \
    -d graylog/graylog:2.4.0-beta.1-3

Vagrant

Requirements

You need a recent Vagrant version.

Installation

These steps will create a Vagrant virtual machine with all Graylog services running:

$ wget https://raw.githubusercontent.com/Graylog2/graylog2-images/2.4/vagrant/Vagrantfile
$ vagrant up

Usage

After starting the virtual machine, your Graylog instance is ready to use. You can reach the web interface by pointing your browser to http://localhost:8080.

The default login is Username: admin, Password: admin.

Configuration

We are shipping the graylog-ctl tool with the virtual machine appliances to get you started with a customised setup as quickly as possible. Run these (optional) commands to configure the most basic settings of Graylog in the appliance:

sudo graylog-ctl set-email-config <smtp server> [--port=<smtp port> --user=<username> --password=<password>]
sudo graylog-ctl set-admin-password <password>
sudo graylog-ctl set-timezone <zone acronym>
sudo graylog-ctl reconfigure
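
For example, with placeholder values filled in:

sudo graylog-ctl set-email-config mail.example.org --port=587 --user=graylog --password=secret
sudo graylog-ctl set-admin-password yourpassword
sudo graylog-ctl set-timezone UTC
sudo graylog-ctl reconfigure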

The graylog-ctl tool has much more functionality documented. We strongly recommend learning more about it to ensure smooth operation of your virtual appliance.

If you want to create your own customised setup take a look at our other installation methods.

OpenStack

Installation

Download the Graylog image from the package site, uncompress it and import it into the OpenStack image store:

$ wget https://packages.graylog2.org/releases/graylog-omnibus/qcow2/graylog-2.4.0-1.qcow2.gz
$ gunzip graylog-2.4.0-1.qcow2.gz
$ glance image-create --name='graylog' --is-public=true --container-format=bare --disk-format=qcow2 --file graylog-2.4.0-1.qcow2

You should now see an image called graylog in the OpenStack web interface under Images.

Usage

Launch a new instance of the image and make sure to reserve at least 4 GB of RAM for the instance. After spinning it up, log in with the username ubuntu and your selected SSH key. Run the reconfigure program in order to set up Graylog and start all services:

$ ssh ubuntu@<vm IP>
$ sudo graylog-ctl reconfigure

Open http://<vm ip> in your browser to access the Graylog web interface. The default username and password are both admin.

Basic configuration

We are shipping the graylog-ctl tool with the virtual machine appliances to get you started with a customised setup as quickly as possible. Run these (optional) commands to configure the most basic settings of Graylog in the appliance:

sudo graylog-ctl set-email-config <smtp server> [--port=<smtp port> --user=<username> --password=<password>]
sudo graylog-ctl set-admin-password <password>
sudo graylog-ctl set-timezone <zone acronym>
sudo graylog-ctl reconfigure

The graylog-ctl tool has much more functionality documented. We strongly recommend learning more about it to ensure smooth operation of your virtual appliance.

Production readiness

The Graylog appliance is not intended to be a production-ready solution. It is built to offer a fast and easy way to try the software itself, without wasting time installing Graylog and its components on any kind of server.

If you want to create your own production ready setup take a look at our other installation methods.

AWS - AMIs

AMIs

Select your AMI and AWS Region.

Usage

  • Click on Launch instance for your AWS region to start Graylog into.
  • Choose an instance type with at least 4GB memory.
  • Finish the wizard and spin up the VM.
  • Login to the instance via SSH as user ubuntu.
  • Run sudo graylog-ctl reconfigure.
  • Open port 80 and 9000 in the applied security group to access the web interface.
  • Additionally, open more ports for ingesting log data, like 514 for syslog or 12201 for the GELF protocol.

Open http://<private ip> in your browser to access the Graylog web interface. The default username and password are both admin.

Networking

Your browser needs access to port 80 or 443 to reach the web interface. The interface itself creates a connection back to the REST API of the Graylog server on port 9000. As long as you are in a private network like an Amazon VPC, this works out of the box. If you want to use the public IP address of your VM, this mechanism doesn't work automatically anymore. You have to tell Graylog how to reach the API from the user's browser perspective:

sudo graylog-ctl set-external-ip http://<public ip>:9000/api/
sudo graylog-ctl reconfigure

Also make sure that this port is open, even on the public IP.

HTTPS

In order to enable HTTPS for the web interface both ports need to be encrypted. Otherwise the web browser would show an error message. For this reason we created a proxy configuration on the appliance that can be enabled by running:

sudo graylog-ctl enforce-ssl
sudo graylog-ctl reconfigure

This command combines the Graylog web interface and the API on port 443. The API is accessible via the path /api. For this reason you have to set the external IP to an HTTPS address with the appended path /api:

sudo graylog-ctl set-external-ip https://<public ip>:443/api
sudo graylog-ctl reconfigure

Basic configuration

We are shipping the graylog-ctl tool with the virtual machine appliances to get you started with a customised setup as quickly as possible. Run these (optional) commands to configure the most basic settings of Graylog in the appliance:

sudo graylog-ctl set-email-config <smtp server> [--port=<smtp port> --user=<username> --password=<password>]
sudo graylog-ctl set-admin-password <password>
sudo graylog-ctl set-timezone <zone acronym>
sudo graylog-ctl reconfigure

The graylog-ctl tool has much more functionality documented. We strongly recommend learning more about it to ensure smooth operation of your virtual appliance.

Production readiness

The Graylog appliance is not intended to be a production-ready solution. It is built to offer a fast and easy way to try the software itself, without wasting time installing Graylog and its components on any kind of server.

If you want to create your own production ready setup take a look at our other installation methods.

Microsoft Windows

Unfortunately there is no supported way to run Graylog on Microsoft Windows operating systems, even though all parts run on the Java Virtual Machine. We recommend running the virtual machine appliances on a Windows host. It should be technically possible to run Graylog on Windows, but it is most probably not worth the effort.

Should you require running Graylog on Windows, you need to disable the message journal in graylog-server by changing the following setting in the graylog.conf:

message_journal_enabled = false

Due to restrictions in how Windows handles file locking, the journal will not work correctly.

Please note that this impacts Graylog’s ability to buffer messages, so we strongly recommend running Graylog on Linux. Consider a Linux virtual machine on a Windows host. Graylog setups on Windows are no fun and not officially supported.

Manual Setup

Graylog server on Linux

Prerequisites

Graylog depends on MongoDB and Elasticsearch to operate, please refer to the system requirements for details.

Downloading and extracting the server

Download the tar archive from the download pages and extract it on your system:

~$ tar xvfz graylog-VERSION.tgz
~$ cd graylog-VERSION
Configuration

Now copy the example configuration file:

~# cp graylog.conf.example /etc/graylog/server/server.conf

You can leave most variables as they are for a first start. All of them should be well documented.

Configure at least the following variables in /etc/graylog/server/server.conf:

  • is_master = true
    • Set only one graylog-server node as the master. This node will perform periodic maintenance actions that slave nodes won't. Every slave node will accept messages just as the master node does. Nodes will fall back to slave mode if there already is a master in the cluster.
  • password_secret
    • You must set a secret that is used for password encryption and salting here. The server will refuse to start if it’s not set. Generate a secret with for example pwgen -N 1 -s 96. If you run multiple graylog-server nodes, make sure you use the same password_secret for all of them!
  • root_password_sha2
    • A SHA2 hash of a password you will use for your initial login. Set this to a SHA2 hash generated with echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1 and you will be able to log in to the web interface with username admin and password yourpassword.
  • elasticsearch_shards = 4
    • The number of shards for your indices. A good setting here highly depends on the number of nodes in your Elasticsearch cluster. If you have one node, set it to 1.
  • elasticsearch_replicas = 0
    • The number of replicas for your indices. A good setting here highly depends on the number of nodes in your Elasticsearch cluster. If you have one node, set it to 0.
  • mongodb_uri
    • Enter your MongoDB connection and authentication information here.
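
Putting these together, a minimal single-node configuration (a sketch; the placeholder secrets must be replaced with your own values) might look like this:

is_master = true
password_secret = <output of pwgen -N 1 -s 96>
root_password_sha2 = <output of the sha256sum command above>
elasticsearch_shards = 1
elasticsearch_replicas = 0
mongodb_uri = mongodb://localhost/graylog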
Starting the server

You need to have Java installed. Running OpenJDK is totally fine and it should be available on all platforms. For example, on Debian:

~$ apt-get install openjdk-8-jre

Start the server:

~$ cd bin/
~$ ./graylogctl start

The server will try to write a node_id to the graylog-server-node-id file. It won't start if it can't write there, for example because of missing permissions.
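
The graylogctl script also supports further commands; for example, to check whether the server is running or to stop it again (assuming the standard script shipped in the tarball):

~$ ./graylogctl status
~$ ./graylogctl stop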

See the startup parameters description below to learn more about available startup parameters. Note that you might have to be root to bind to the popular port 514 for syslog inputs.

You should see a line like this in the debug output of Graylog once it has successfully connected to your Elasticsearch cluster:

2013-10-01 12:13:22,382 DEBUG: org.elasticsearch.transport.netty - [graylog-server] connected to node [[Unuscione, Angelo][thN_gIBkQDm2ab7k-2Zaaw][inet[/10.37.160.227:9300]]]

You can find the logs of Graylog in the directory logs/.

Important: All systems running Graylog must have synchronised system time. We strongly recommend using NTP or similar mechanisms on all machines of your Graylog infrastructure.

Supplying external logging configuration

Graylog is using Apache Log4j 2 for its internal logging and ships with a default log configuration file which is embedded within the shipped JAR.

In case you need to modify Graylog’s logging configuration, you can supply a Java system property specifying the path to the configuration file in your start script (e. g. graylogctl).

Append this before the -jar parameter:

-Dlog4j.configurationFile=file:///path/to/log4j2.xml

Substitute the actual path to the file for the /path/to/log4j2.xml in the example.
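
For example, a sketch of the resulting java invocation (the exact command line in your graylogctl script may differ):

~$ java -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -jar graylog.jar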

In case you do not have a log rotation system already in place, you can also configure Graylog to rotate logs based on their size, to prevent the log files from growing without bounds, using the RollingFileAppender.

One such example log4j2.xml configuration is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration packages="org.graylog2.log4j" shutdownHook="disable">
  <Appenders>
      <RollingFile name="RollingFile" fileName="/tmp/logs/graylog.log"
                   filePattern="/tmp/logs/graylog-%d{yyyy-MM-dd}.log.gz">
        <PatternLayout>
          <Pattern>%d %-5p: %c - %m%n</Pattern>
        </PatternLayout>
        <!-- Rotate logs every day or when the size exceeds 10 MB (whichever comes first) -->
        <Policies>
          <TimeBasedTriggeringPolicy modulate="true"/>
          <SizeBasedTriggeringPolicy size="10 MB"/>
        </Policies>
        <!-- Keep a maximum of 10 log files -->
        <DefaultRolloverStrategy max="10"/>
      </RollingFile>

      <Console name="STDOUT" target="SYSTEM_OUT">
          <PatternLayout pattern="%d %-5p: %c - %m%n"/>
      </Console>

      <!-- Internal Graylog log appender. Please do not disable. This makes internal log messages available via REST calls. -->
      <Memory name="graylog-internal-logs" bufferSize="500"/>
  </Appenders>
  <Loggers>
      <Logger name="org.graylog2" level="info"/>
      <Logger name="com.github.joschi.jadconfig" level="warn"/>
      <Logger name="org.apache.directory.api.ldap.model.message.BindRequestImpl" level="error"/>
      <Logger name="org.elasticsearch.script" level="warn"/>
      <Logger name="org.graylog2.periodical.VersionCheckThread" level="off"/>
      <Logger name="org.drools.compiler.kie.builder.impl.KieRepositoryImpl" level="warn"/>
      <Logger name="com.joestelmach.natty.Parser" level="warn"/>
      <Logger name="kafka.log.Log" level="warn"/>
      <Logger name="kafka.log.OffsetIndex" level="warn"/>
      <Logger name="org.apache.shiro.session.mgt.AbstractValidatingSessionManager" level="warn"/>
      <Root level="warn">
          <AppenderRef ref="STDOUT"/>
          <AppenderRef ref="RollingFile"/>
          <AppenderRef ref="graylog-internal-logs"/>
      </Root>
  </Loggers>
</Configuration>
Command line (CLI) parameters

There are a number of CLI parameters you can pass to the call in your graylogctl script:

  • -h, --help: Show help message
  • -f CONFIGFILE, --configfile CONFIGFILE: Use configuration file CONFIGFILE for Graylog; default: /etc/graylog/server/server.conf
  • -d, --debug: Run in debug mode
  • -l, --local: Run in local mode. Automatically invoked if in debug mode. Will not send system statistics, even if enabled and allowed. Only interesting for development and testing purposes.
  • -p PIDFILE, --pidfile PIDFILE: Set the file containing the PID of graylog to PIDFILE; default: /tmp/graylog.pid
  • -np, --no-pid-file: Do not write PID file (overrides -p/--pidfile)
  • --version: Show version of Graylog and exit
Problems with IPv6 vs. IPv4?

If your Graylog node refuses to listen on IPv4 addresses and always chooses, for example, a listen address like :::9000 for rest_listen_uri, you can tell the JVM to prefer the IPv4 stack.

Add the java.net.preferIPv4Stack flag in your graylogctl script or from wherever you are calling the graylog.jar:

~$ sudo -u graylog java -Djava.net.preferIPv4Stack=true -jar graylog.jar
Create a message input and send a first message

Log in to the web interface on port 9000 (e.g. http://127.0.0.1:9000) and navigate to System -> Inputs.

_images/create_input.png

Launch a new Raw/Plaintext UDP input, listening on 127.0.0.1 on port 9099. There’s no need to configure anything else for now.

The list of running inputs on that node should show you your new input right away.

Let’s send a message in:

echo "Hello Graylog, let's be friends." | nc -w 1 -u 127.0.0.1 9099

This has sent a short string to the raw UDP input you just opened. Now search for friends using the search bar on the top and you should already see the message you just sent in. Click on it in the table and see it in detail:

_images/setup_1.png

You have just sent your first message to Graylog! Why not spawn a syslog input and point some of your servers to it? You could also create some user accounts for your colleagues.

System requirements

The Graylog server application has the following prerequisites:

  • Some modern Linux distribution (Debian Linux, Ubuntu Linux, or CentOS recommended)
  • Elasticsearch 2.3.5 or later
  • MongoDB 2.4 or later (latest stable version is recommended)
  • Oracle Java SE 8 (OpenJDK 8 also works; latest stable update is recommended)

Caution

Graylog prior to 2.3 does not work with Elasticsearch 5.x!

Caution

Graylog 2.4 does not work with Elasticsearch 6.x yet!

Upgrading Graylog

When upgrading from a previous version of Graylog, follow the previously used installation method (e.g. from image or package) using the new version numbers.

The following Upgrade notes should be read carefully before you start the upgrade process. Breaking changes and dependency upgrades are documented in those upgrade notes.

You should always follow minor versions when updating across multiple versions to make sure necessary migrations are run correctly. The upgrade notes are always written assuming you are coming from the preceding stable release.

Upgrading to Graylog 2.0.x

Elasticsearch 2.x

The embedded Elasticsearch node being used by Graylog has been upgraded to Elasticsearch 2.x which includes some breaking changes. Graylog 2.x does not work with Elasticsearch 1.x anymore and cannot communicate with existing Elasticsearch 1.x clusters.

Please see Breaking changes in Elasticsearch 2.x for details.

The blog article Key points to be aware of when upgrading from Elasticsearch 1.x to 2.x also contains interesting information about the upgrade path from Elasticsearch 1.x to 2.x.

Multicast Discovery

Multicast discovery has been removed from Elasticsearch 2.x (although it is still provided as an Elasticsearch plugin for now).

To reflect this change, the elasticsearch_discovery_zen_ping_unicast_hosts setting now has to contain the address of at least one Elasticsearch node in the cluster which Graylog can connect to.
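
For example (the host names are placeholders):

elasticsearch_discovery_zen_ping_unicast_hosts = es-node-1.example.org:9300,es-node-2.example.org:9300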

Default network host

The network interface which Elasticsearch binds to (elasticsearch_network_host) has been changed to localhost (i. e. 127.0.0.1 or ::1); see Network changes/Bind to localhost.

If Elasticsearch is not running on the same machine, elasticsearch_network_host must be set to a host name or an IP address which can be accessed by the other Elasticsearch nodes in the cluster.

Index range types

Note

This step needs to be performed before the update to Elasticsearch 2.x!

Some Graylog versions stored meta information about indices in Elasticsearch, alongside the messages themselves. Since Elasticsearch 2.0, having multiple types with conflicting mappings is no longer possible, which means that the index_range type must be removed before upgrading to Elasticsearch 2.x.

Find out if your setup is affected by running the following (replace $elasticsearch with the address of one of your Elasticsearch nodes):

curl -XGET $elasticsearch:9200/_all/_mapping/index_range; echo

If the output is {} you are not affected and can skip this step.

Otherwise, you need to delete the index_range type; Graylog does not use it anymore.

As Graylog sets older indices to read-only, first we need to remove the write block on those indices. Since we’ll be working with Elasticsearch’s JSON output, we recommend installing the jq utility which should be available on all popular package managers or directly at GitHub.

for i in `curl -s -XGET $elasticsearch:9200/_all/_mapping/index_range | jq -r "keys[]"`; do
    echo -n "Updating index $i: "
    echo -n "curl -XPUT $elasticsearch:9200/$i/_settings -d '{\"index.blocks.read_only\":false, \"index.blocks.write\":false}' : "
    curl -XPUT $elasticsearch:9200/$i/_settings -d '{"index.blocks.read_only":false, "index.blocks.write":false}'
    echo
done

The output for each of the curl commands should be {"acknowledged":true}. Next we have to delete the index_range mapping, which can be done with the following command.

Note

We strongly recommend to perform this on a single index before running this bulk command. This operation can be expensive to perform if you have a lot of affected indices.

for i in `curl -s -XGET $elasticsearch:9200/_all/_mapping/index_range | jq -r "keys[]"`; do
    echo -n "Updating index $i: "
    curl -XDELETE $elasticsearch:9200/$i/index_range
    echo
done

It is not strictly necessary to set the indices back to read-only, but if you prefer to do that, note the index names and commands during the first step and change false into true.

Graylog Index Template

Graylog applies a custom index template to ensure that the indexed messages adhere to a specific schema.

Unfortunately the index template being used by Graylog 1.x is incompatible with Elasticsearch 2.x and has to be removed prior to upgrading.

In order to delete the index template, the following curl command has to be issued against one of the Elasticsearch nodes:

curl -X DELETE http://localhost:9200/_template/graylog-internal
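
To verify the removal, you can query the template again; the response should no longer contain the graylog-internal template:

curl -X GET http://localhost:9200/_template/graylog-internal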

Graylog will automatically create the new index template on the next startup.

Dots in field names

One of the most important breaking changes in Elasticsearch 2.x is that field names may not contain dots anymore.

Using the Elasticsearch Migration Plugin might help to highlight some potential pitfalls if an existing Elasticsearch 1.x cluster should be upgraded to Elasticsearch 2.x.

MongoDB

Graylog 2.x requires MongoDB 2.4 or newer. We recommend using MongoDB 3.x and the WiredTiger storage engine.

When upgrading from MongoDB 2.0 or 2.2 to a supported version, make sure to read the Release Notes for the particular version.

Log4j 2 migration

Graylog switched its logging backend from Log4j 1.2 to Log4j 2.

Please refer to the Log4j Migration Guide for information on how to update your existing logging configuration.

Dead Letters feature removed

The Dead Letters feature, which stored messages that couldn’t be indexed into Elasticsearch for various reasons, has been removed.

This feature was disabled by default. If you had enabled it in the configuration file, please check the dead_letters_enabled collection in MongoDB and remove it afterwards.

Removed configuration settings

Index Retention and Rotation Settings

In 2.0.0 the index rotation and retention settings have been moved from the Graylog server config file to the database and are now configurable via the web interface.

The old settings from the graylog.conf or /etc/graylog/server/server.conf will be migrated to the database.

Warning

When you upgrade from a 1.x version and you modified any rotation/retention settings, please make sure you KEEP your old settings in the config file so the migration process will add your old settings to the database! Otherwise the retention process will use the default settings and might remove a lot of indices.

Overview

Some settings, which have been deprecated in previous versions, have finally been removed from the Graylog configuration file.

Removed configuration settings
Setting name                          Replacement
mongodb_host                          mongodb_uri
mongodb_port                          mongodb_uri
mongodb_database                      mongodb_uri
mongodb_useauth                       mongodb_uri
mongodb_user                          mongodb_uri
mongodb_password                      mongodb_uri
elasticsearch_node_name               elasticsearch_node_name_prefix
collector_expiration_threshold        (moved to collector plugin)
collector_inactive_threshold          (moved to collector plugin)
rotation_strategy                     UI in web interface (System/Indices)
retention_strategy                    UI in web interface (System/Indices)
elasticsearch_max_docs_per_index      UI in web interface (System/Indices)
elasticsearch_max_size_per_index      UI in web interface (System/Indices)
elasticsearch_max_time_per_index      UI in web interface (System/Indices)
elasticsearch_max_number_of_indices   UI in web interface (System/Indices)
dead_letters_enabled                  None

Changed configuration defaults

For better consistency, the defaults of some configuration settings have been changed after the project has been renamed from Graylog2 to Graylog.

Configuration defaults
Setting name                                         Old default                    New default
elasticsearch_cluster_name                           graylog2                       graylog
elasticsearch_node_name                              graylog2-server                graylog-server
elasticsearch_index_prefix                           graylog2                       graylog
elasticsearch_discovery_zen_ping_unicast_hosts       empty                          127.0.0.1:9300
elasticsearch_discovery_zen_ping_multicast_enabled   true                           false
mongodb_uri                                          mongodb://127.0.0.1/graylog2   mongodb://localhost/graylog

Changed prefixes for configuration override

In the past it was possible to override configuration settings in Graylog using environment variables or Java system properties with a specific prefix.

For better consistency, these prefixes have been changed after the project has been renamed from Graylog2 to Graylog.

Configuration override prefixes
Override                Old prefix   New prefix   Example
Environment variables   GRAYLOG2_    GRAYLOG_     GRAYLOG_IS_MASTER
System properties       graylog2.    graylog.     graylog.is_master
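
For example, the is_master setting could be overridden either way using the new prefixes (a sketch; the exact java invocation depends on your start script):

$ export GRAYLOG_IS_MASTER=false                      # environment variable
$ java -Dgraylog.is_master=false -jar graylog.jar     # Java system property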

REST API Changes

The output ID key for the list of outputs in the /streams/* endpoints has been changed from _id to id.

 {
   "id": "564f47c41ec8fe7d920ef561",
   "creator_user_id": "admin",
   "outputs": [
     {
       "id": "56d6f2cce45e0e52d1e4b9cb", // ==> Changed from `_id` to `id`
       "title": "GELF Output",
       "type": "org.graylog2.outputs.GelfOutput",
       "creator_user_id": "admin",
       "created_at": "2016-03-02T14:03:56.686Z",
       "configuration": {
         "hostname": "127.0.0.1",
         "protocol": "TCP",
         "connect_timeout": 1000,
         "reconnect_delay": 500,
         "port": 12202,
         "tcp_no_delay": false,
         "tcp_keep_alive": false,
         "tls_trust_cert_chain": "",
         "tls_verification_enabled": false
       },
       "content_pack": null
     }
   ],
   "matching_type": "AND",
   "description": "All incoming messages",
   "created_at": "2015-11-20T16:18:12.416Z",
   "disabled": false,
   "rules": [],
   "alert_conditions": [],
   "title": "ALL",
   "content_pack": null
 }

Web Interface Config Changes

The web interface has been integrated into the Graylog server and was rewritten in React. Therefore configuring it has changed fundamentally since the last version(s). Please consult Web interface for details.

Please take note that the application.context configuration parameter present in Graylog 1.x (and earlier) no longer exists. The web interface can currently only be served without a path prefix.

Upgrading to Graylog 2.1.x

HTTPS Setup

Previous versions of Graylog automatically generated a private key/certificate pair for HTTPS if either the private key or the certificate (or both) for rest_tls_key_file, rest_tls_cert_file, web_tls_key_file, or web_tls_cert_file couldn't be read. While this feature was convenient for inexperienced users, it had serious drawbacks: very weak key sizes (only 1024 bits), certificates untrusted by the TLS libraries used by web browsers and other client software (because they are self-signed and not included in the system's CA/trust store), and problems with inter-node communication with other Graylog nodes.

Due to those shortcomings, the feature has been removed completely. Users need to use proper certificates or generate their own self-signed certificates and configure them with the appropriate settings, see Using HTTPS for reference.

Web Interface Listener

Graylog 2.0.x has been using separate listeners for the REST API and the web interface by default. The Graylog REST API on http://127.0.0.1:12900, the Graylog web interface on http://127.0.0.1:9000. Beginning with Graylog 2.1.0 it is possible to run both the REST API and the web interface on the same host/port-combination and this is now the default. This means that the REST API is now running on http://127.0.0.1:9000/api/ by default and the web interface is now running on http://127.0.0.1:9000/. Furthermore, all requests going to http://127.0.0.1:9000/api/ requesting a content-type of text/html or application/xhtml+xml are redirected to the web interface, therefore making it even easier to set up Graylog and use it behind proxies, expose it externally etc.

Please take note that you can still run the REST API and the web interface on two separate listeners. If you are running a Graylog 2.0.x configuration specifying web_listen_uri explicitly and you want to keep that, you do not have to change anything.

Please also take note that when you have configured rest_listen_uri and web_listen_uri to run on the same host/port-combination, the following configuration directives will have no effect:

  • web_enable_tls, web_tls_cert_file, web_tls_key_file, web_tls_key_password (These will depend on the TLS configuration of the REST listener).
  • web_enable_cors, web_enable_gzip, web_thread_pool_size, web_max_initial_line_length, web_max_header_size (Those will depend on the corresponding settings of the REST listener).

Internal Metrics to MongoDB

Previous versions of Graylog included a (long deprecated) metrics reporter for writing internal metrics into MongoDB in a fixed interval of 1 second.

This feature has been removed completely and can be optionally pulled in by using the Graylog Metrics Reporter Plugins.

Configuration file changes

Network settings

The network settings in the Graylog configuration file (rest_listen_uri, rest_transport_uri, and web_listen_uri) are now using the default ports for the HTTP (80) and HTTPS (443) if no custom port was given. Previously those settings were using the custom ports 12900 (Graylog REST API) and 9000 (Graylog web interface) if no explicit port was given.

Examples:

Configuration setting                       Old effective URI          New effective URI
rest_listen_uri = http://127.0.0.1:12900/   http://127.0.0.1:12900/    http://127.0.0.1:12900/
rest_listen_uri = http://127.0.0.1/         http://127.0.0.1:12900/    http://127.0.0.1:80/
rest_listen_uri = https://127.0.0.1/        https://127.0.0.1:12900/   https://127.0.0.1:443/
Collector Sidecar

The network changes are reflected in the Sidecar configuration as well and should be adapted accordingly. However, it's still possible to use the old API port by setting it explicitly. In case a mass deployment is too hard to change, just run the following to switch back to the old REST API port (OVA based installation):

sudo graylog-ctl set-listen-address --service rest --address http://0.0.0.0:12900
sudo graylog-ctl reconfigure

Graylog REST API

Removed resources
Original resource         Replacement
/system/buffers           /system/metrics/org.graylog2.buffers.input.size
                          /system/metrics/org.graylog2.buffers.input.usage
                          /system/metrics/org.graylog2.buffers.process.size
                          /system/metrics/org.graylog2.buffers.process.usage
                          /system/metrics/org.graylog2.buffers.output.size
                          /system/metrics/org.graylog2.buffers.output.usage
/system/buffers/classes   None
Removed index rotation/retention settings from “/system/configuration”

The index rotation and retention settings have been moved to MongoDB in Graylog 2.0.0 but the representation of the old configuration options was still present in the /system/configuration resource.

In order to stay in sync with the actual configuration file, the following values have been removed:

  • rotation_strategy
  • retention_strategy
  • elasticsearch_max_docs_per_index
  • elasticsearch_max_size_per_index
  • elasticsearch_max_time_per_index
  • elasticsearch_max_number_of_indices

The retention and rotation configuration settings can be retrieved using the following resources:

  • /system/indices/rotation/config
  • /system/indices/retention/config

For Plugin Authors

Between Graylog 2.0.x and 2.1.0 we also made changes to the Plugin API. These include:

  • Removing org.graylog2.plugin.streams.Stream#getAlertCondition, as it was faulty and not easily replaceable with a working version without breaking our separation of models and persistence services.

If you are maintaining a plugin that was originally written for Graylog 1.x or 2.0.x, you need to make sure that your plugin is still compiling and working under Graylog 2.1.x or adapt it if necessary.

UI Plugins

The new app prefix feature requires some changes in UI plugins to make them work with it.

  • import webpackEntry from 'webpack-entry'; needs to be added at the very top of the src/web/index.jsx file
  • The Routes.pluginRoute() function needs to be used instead of a literal string to build URLs for links and buttons

Please check the updated plugins documentation for details.

Changed Elasticsearch Cluster Status Behavior

In previous versions, Graylog stopped indexing into the current write index if the Elasticsearch cluster status turned RED. Since Graylog 2.1.0, Graylog only checks the status of the current write index when it tries to index messages.

If the current write index is GREEN or YELLOW, Graylog will continue to index messages even though the overall cluster status is RED. This avoids Graylog downtimes when doing Elasticsearch maintenance or when older indices have problems.

Changes in message field values trimming

Previous versions of Graylog were trimming message field values inconsistently, depending on the codec used. We have changed that behaviour in Graylog 2.1.0, so all message field values are trimmed by default. This means that leading or trailing whitespace of every field is removed during ingestion.

Important: This change will break your existing stream rules, extractors, and Drools rules if they expect leading or trailing whitespace. Please adapt them so they do not require those whitespace characters.

Upgrading to Graylog 2.2.x

Email Alarm Callback

Previous versions of Graylog created an implicit email alarm callback if no explicit callback existed for a stream.

Due to the extensive rework done in alerting, this behavior has been modified to be explicit, and more consistent with other entities within Graylog: from now on there will not be a default alarm callback.

To simplify the transition for people relying on this behavior, we have added a migration step that will create an email alarm callback for each stream that has alert conditions, has alert receivers, but has no associated alarm callbacks.

With the introduction of email templates in 0.21, the transport_email_subject_prefix config setting became unused. It has now been removed completely. In early versions it was used to add a prefix to the generated subject of alerting emails. Since 0.21 it is possible to define a complete template used for the generation of alert email subjects.

Alert Notifications (previously known as Alarm Callbacks)

Graylog 2.2.0 introduces some changes in alerting. Alerts now have states, making it easier to know whether something requires your attention.

These changes also affect the way we send notifications: starting with Graylog 2.2.0, alert notifications are only executed once, when a new alert is triggered. As long as the alert is unresolved or in its grace period, Graylog will not send further notifications. This helps reduce the noise and annoyance of being notified too often while a problem persists.

If you are using Graylog for alerting, please take a moment to ensure this change will not break any of your processes when an alert occurs.

Default stream/Index Sets

With the introduction of index sets, and the ability to change a stream’s write target, the default stream needs additional information, which is calculated when starting a new Graylog 2.2 master node.

It requires recalculation of the index ranges of the default stream's index set, which, when updating from pre-2.2 versions, is stored in the graylog_ index. This is potentially expensive, because it has to calculate three aggregations across every open index to detect which streams are stored in which index.

Please be advised that this necessary migration can put additional load on your cluster.

Warning

Make sure that all rotation and retention strategy plugins you had installed in 2.1 are updated to a version that is compatible with 2.2 before you start the Graylog 2.2 version for the first time. (e.g. Graylog Enterprise) This is needed so the required data migrations will run without problems.

Warning

The option to remove a message from the default stream is currently not available when using the pipeline function route_to_stream. This will be fixed in a subsequent bug fix release. Please see the corresponding Github issue.

RotationStrategy & RetentionStrategy Interfaces

The Java interfaces for RetentionStrategy and RotationStrategy changed in 2.2. The #rotate() and #retain() methods are now getting an IndexSet as first parameter.

This only affects you if you are using custom rotation or retention strategies.

Changes in Exposed Configuration

The /system/configuration resource of the Graylog REST API no longer contains the following (deprecated) Elasticsearch-related settings:

  • elasticsearch_shards
  • elasticsearch_replicas
  • index_optimization_max_num_segments
  • disable_index_optimization

Changes in Split & Count Converter

The behavior of the split & count converter has been changed so that it resembles typical split() functions.

Previously, the split & count converter returned 0 if the split pattern didn't occur in the string. Now it will return 1.

Examples:

String    Split Pattern   Old Result   New Result
<empty>   -               0            0
foo       -               0            1
foo-bar   -               2            2

Graylog REST API

Streams API

Due to the introduction of index sets, the payload for creating, updating and cloning of streams now requires the index_set_id field. The value for this needs to be the ID of an existing index set.

Affected endpoints:

  • POST /streams
  • PUT  /streams/{streamId}
  • POST /streams/{streamId}/clone

Upgrading to Graylog 2.3.x

Graylog switches to Elasticsearch HTTP client

In all prior versions, Graylog used the Elasticsearch node client to connect to an Elasticsearch cluster, acting as a client-only Elasticsearch node. Because of compatibility requirements of the binary transport protocol, the range of Elasticsearch versions Graylog could connect to was limited. For more information on the differences between the ways to connect to Elasticsearch, you can check the Elasticsearch documentation.

Starting with version 2.3.0, we are switching over to using a lightweight HTTP client, which is almost version-agnostic. The biggest change is that it does not connect to the Elasticsearch native protocol port (defaulting to 9300/tcp), but the Elasticsearch HTTP port (defaulting to 9200/tcp).

Due to the differences in connecting to the Elasticsearch cluster, configuring Graylog has changed. These configuration settings have been removed:

elasticsearch_cluster_discovery_timeout
elasticsearch_cluster_name
elasticsearch_config_file
elasticsearch_discovery_initial_state_timeout
elasticsearch_discovery_zen_ping_unicast_hosts
elasticsearch_http_enabled
elasticsearch_network_bind_host
elasticsearch_network_host
elasticsearch_network_publish_host
elasticsearch_node_data
elasticsearch_node_master
elasticsearch_node_name_prefix
elasticsearch_path_data
elasticsearch_path_home
elasticsearch_transport_tcp_port

The following configuration options are now being used to configure connectivity to Elasticsearch:

Config Setting                                  Type        Comments                                                       Default
elasticsearch_connect_timeout                   Duration    Timeout when connecting to individual Elasticsearch hosts     10s (10 Seconds)
elasticsearch_hosts                             List<URI>   Comma-separated list of URIs of Elasticsearch hosts            http://127.0.0.1:9200
elasticsearch_idle_timeout                      Duration    Timeout after which idle connections are terminated            -1s (Never)
elasticsearch_max_total_connections             int         Maximum number of total Elasticsearch connections              20
elasticsearch_max_total_connections_per_route   int         Maximum number of Elasticsearch connections per route/host     2
elasticsearch_max_retries                       int         Maximum number of retries for requests to Elasticsearch        2
elasticsearch_socket_timeout                    Duration    Timeout when sending/receiving from Elasticsearch connection   60s (60 Seconds)
elasticsearch_discovery_enabled                 boolean     Enable automatic Elasticsearch node discovery                  false
elasticsearch_discovery_filter                  String      Filter by node attributes for the discovered nodes             empty (use all nodes)
elasticsearch_discovery_frequency               Duration    Frequency of the Elasticsearch node discovery                  30s (30 Seconds)

In most cases, the only configuration setting that needs to be set explicitly is elasticsearch_hosts. All other configuration settings should be tweaked only in case of errors.
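
For example, pointing Graylog at two hypothetical Elasticsearch nodes:

elasticsearch_hosts = http://es-node-1.example.org:9200,http://es-node-2.example.org:9200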

Warning

The automatic node discovery does not work if Elasticsearch requires authentication, e. g. when using Shield (X-Pack).

Caution

Graylog does not react to externally triggered index changes (creating/closing/reopening/deleting an index) anymore. All of these actions need to be performed through the Graylog REST API in order to retain index consistency.

Special note for upgrading from an existing Graylog setup with a new Elasticsearch cluster

If you are upgrading the Elasticsearch cluster of an existing Graylog setup without migrating the indices, your Graylog setup contains stale index ranges, causing errors about nonexistent indices upon search/alerting. To remediate this, you need to manually trigger an index range recalculation for all index sets once. This can be done in the web interface under System->Indices, or via the REST API using the /system/indices/ranges/<index set id>/rebuild endpoint.
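
A sketch of the REST call, assuming admin credentials, the default API path, and that the endpoint is invoked with POST; <index set id> remains a placeholder for the ID of the affected index set:

$ curl -X POST -u admin:password "http://graylog.example.org:9000/api/system/indices/ranges/<index set id>/rebuild"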

Graylog REST API

Rotation and Retention strategies

The deprecated HTTP resources at /system/indices/rotation/config and /system/indices/retention/config, which didn’t work since Graylog 2.2.0, have been removed.

These settings are part of the index set configuration and can be configured under /system/indices/index_sets.

Stream List Response structure does not include in_grace field anymore

The response to GET /streams, GET /streams/<id> & PUT /streams/<id> does not contain the in_grace field for configured alert conditions anymore.

The value of this flag can be retrieved using the GET /alerts/conditions endpoint, or per stream using the GET /streams/<streamId>/alerts/conditions endpoint.
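
For example, fetching the conditions of a single stream could look like this (a sketch; host, credentials and <streamId> are placeholders):

$ curl -u admin:password "http://graylog.example.org:9000/api/streams/<streamId>/alerts/conditions"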

Upgrading to Graylog 2.4.x

You can upgrade from Graylog 2.3.x to Graylog 2.4.x without the need to change the configuration of your Graylog server.

More plugins shipped by default

The following Graylog plugins are now shipped as part of the Graylog server release.

Warning

Make sure you remove all previous versions of these plugins from your plugin/ folder before starting the new Graylog version!

Upgrading Graylog Originally Installed from Image

The Virtual Machine Appliance (OVA) and Amazon Web Services (AMI) installations of Graylog use the Omnibus package. The upgrade documentation using Omnibus is part of the graylog-ctl documentation.

Upgrading Graylog Originally Installed from Package

If the current installation was installed using a package manager (e. g. yum, apt), update the repository package to the target version, and use the system tools to upgrade the package. For .rpm based systems this update guide and for .deb based systems this update guide should help.

Upgrading Elasticsearch

Since Graylog 2.3, Elasticsearch 5.x is supported. This Graylog version supports Elasticsearch 2.x and 5.x. It is recommended to update Elasticsearch 2.x to the latest stable 5.x version after you have Graylog 2.3 or later running. This Elasticsearch upgrade does not need to be done during the Graylog update.

When upgrading from Elasticsearch 2.x to Elasticsearch 5.x, make sure to read the upgrade guide provided by Elastic. The Graylog Elasticsearch configuration documentation contains information about the compatible Elasticsearch version. After the upgrade you must rotate the indices once manually.

Configuring Graylog

server.conf

The file server.conf is the Graylog configuration file.

Note

Check Default file locations to locate it in your installation.

It has to use ISO 8859-1/Latin-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes as defined in the Java SE Specifications, using the \u prefix. For example, \u002c.

  • Entries are generally expected to be a single line in one of the following forms:
    • propertyName=propertyValue
    • propertyName:propertyValue
  • White space that appears between the property name and property value is ignored, so the following are equivalent:
    • name=Stephen
    • name = Stephen
  • White space at the beginning of the line is also ignored.

  • Lines that start with the comment characters ! or # are ignored. Blank lines are also ignored.

  • The property value is generally terminated by the end of the line. White space following the property value is not ignored, and is treated as part of the property value.

  • A property value can span several lines if each line is terminated by a backslash (\) character. For example:

    targetCities=\
           Detroit,\
           Chicago,\
           Los Angeles
    

    This is equivalent to targetCities=Detroit,Chicago,Los Angeles (white space at the beginning of lines is ignored).

  • The characters newline, carriage return, and tab can be inserted with characters \n, \r, and \t, respectively.

  • The backslash character must be escaped as a double backslash. For example:

    path=c:\\docs\\doc1
    

Properties

General
  • is_master = true
    • If you are running more than one instance of Graylog server you have to select only one graylog-server node as the master. This node will perform periodic and maintenance actions that slave nodes won’t.
    • Every slave node will accept messages just as the master node does. Nodes will fall back to slave mode if there already is a master in the cluster.
  • node_id_file = /etc/graylog/server/<node-id>
    • The auto-generated node ID will be stored in this file and read after restarts. It is a good idea to use an absolute file path here if you are starting Graylog server from init scripts or similar.
  • password_secret = <secret>
    • You MUST set a secret that is used for password encryption and salting. The server will refuse to start if it’s not set. Use at least 64 characters. If you run multiple graylog-server nodes, make sure you use the same password_secret for all of them!

    Note

    Generate a secret with, for example, pwgen -N 1 -s 96

  • root_username = admin
    • The default root user is named admin.
  • root_password_sha2 = <SHA2>
    • A SHA2 hash of a password you will use for your initial login. Set this to a SHA2 hash generated with echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1 and you will be able to log in to the web interface with username admin and password yourpassword.

    Caution

    You MUST specify a hashed password for the root user (which you only need to initially set up the system and in case you lose connectivity to your authentication backend). This password cannot be changed using the API or via the web interface. If you need to change it, modify it in this file.

  • root_email = ""
    • The email address of the root user. Default is empty.
  • root_timezone = UTC
  • plugin_dir = plugin
    • Set plugin directory here (relative or absolute)
  • rest_listen_uri = http://127.0.0.1:9000/api/
    • REST API listen URI. Must be reachable by other Graylog server nodes if you run a cluster.
    • When using Graylog Collectors, this URI will be used to receive heartbeat messages and must be accessible for all collectors.
  • rest_transport_uri = http://192.168.1.1:9000/api/
    • REST API transport address. Defaults to the value of rest_listen_uri. Exception: If rest_listen_uri is set to a wildcard IP address (0.0.0.0) the first non-loopback IPv4 system address is used.
    • If set, this will be promoted in the cluster discovery APIs, so other nodes may try to connect on this address and it is used to generate URLs addressing entities in the REST API. (see rest_listen_uri)
    • You will need to define this if your Graylog server is running behind an HTTP proxy that is rewriting the scheme, host name, or URI.
    • This must not contain a wildcard address (0.0.0.0).
  • rest_enable_cors = false
    • Enable CORS headers for REST API. This is necessary for JS-clients accessing the server directly.
    • If these are disabled, modern browsers will not be able to retrieve resources from the server. This is enabled by default.
  • rest_enable_gzip = false
    • Enable GZIP support for REST API. This compresses API responses and therefore helps to reduce overall round trip times. This is enabled by default.
  • rest_enable_tls = true
    • Enable HTTPS support for the REST API. This secures the communication with the REST API with TLS to prevent request forgery and eavesdropping. This is disabled by default.
  • rest_tls_cert_file = /path/to/graylog.crt
    • The X.509 certificate chain file in PEM format to use for securing the REST API.
  • rest_tls_key_file = /path/to/graylog.key
    • The PKCS#8 private key file in PEM format to use for securing the REST API.
  • rest_tls_key_password = secret
    • The password to unlock the private key used for securing the REST API.
  • rest_max_header_size = 8192
    • The maximum size of the HTTP request headers in bytes.
  • rest_max_initial_line_length = 4096
    • The maximum length of the initial HTTP/1.1 line in bytes.
  • rest_thread_pool_size = 16
    • The size of the thread pool used exclusively for serving the REST API.
  • trusted_proxies = 127.0.0.1/32, 0:0:0:0:0:0:0:1/128
    • Comma-separated list of trusted proxies that are allowed to set the client address with the X-Forwarded-For header. May be subnets or hosts.
Web
  • web_enable = true
    • Enable the embedded Graylog web interface. Enabled by default.
  • web_listen_uri = http://127.0.0.1:9000/
    • Web interface listen URI.
    • Configuring a path for the URI here effectively prefixes all URIs in the web interface. This is a replacement for the application.context configuration parameter in pre-2.0 versions of the Graylog web interface.
  • web_endpoint_uri =
    • Web interface endpoint URI. This setting can be overridden on a per-request basis with the X-Graylog-Server-URL header.
    • It takes the value of rest_transport_uri by default.
  • web_enable_cors = true
    • Enable CORS headers for the web interface. This is necessary for JS-clients accessing the server directly.
    • If these are disabled, modern browsers will not be able to retrieve resources from the server.
  • web_enable_gzip = true
    • Enable/disable GZIP support for the web interface. This compresses HTTP responses and therefore helps to reduce overall round trip times. This is enabled by default.
  • web_enable_tls = false
    • Enable HTTPS support for the web interface. This secures the communication of the web browser with the web interface using TLS to prevent request forgery and eavesdropping.
    • This is disabled by default. Set it to true to enable it and see the other related configuration settings.
  • web_tls_cert_file = /path/to/graylog-web.crt
    • The X.509 certificate chain file in PEM format to use for securing the web interface.
  • web_tls_key_file = /path/to/graylog-web.key
    • The PKCS#8 private key file in PEM format to use for securing the web interface.
  • web_tls_key_password = secret
    • The password to unlock the private key used for securing the web interface.
  • web_max_header_size = 8192
    • The maximum size of the HTTP request headers in bytes.
  • web_max_initial_line_length = 4096
    • The maximum length of the initial HTTP/1.1 line in bytes.
  • web_thread_pool_size = 16
    • The size of the thread pool used exclusively for serving the web interface.
Elasticsearch
  • elasticsearch_hosts = http://node1:9200,http://user:password@node2:19200
    • List of Elasticsearch hosts Graylog should connect to.
    • Needs to be specified as a comma-separated list of valid URIs for the HTTP ports of your Elasticsearch nodes.
    • If one or more of your Elasticsearch hosts require authentication, include the credentials in each node URI that requires authentication.
    • Default: http://127.0.0.1:9200
  • elasticsearch_connect_timeout = 10s
    • Maximum amount of time to wait for a successful connection to the Elasticsearch HTTP port.
    • Default: 10 seconds
  • elasticsearch_socket_timeout = 60s
    • Maximum amount of time to wait for reading back a response from an Elasticsearch server.
    • Default: 60 seconds
  • elasticsearch_idle_timeout = -1s
    • Maximum idle time for an Elasticsearch connection. If this is exceeded, the connection will be torn down.
    • Default: infinity
  • elasticsearch_max_total_connections = 20
    • Maximum number of total connections to Elasticsearch.
    • Default: 20
  • elasticsearch_max_total_connections_per_route = 2
    • Maximum number of total connections per Elasticsearch route (normally this means per Elasticsearch server).
    • Default: 2
  • elasticsearch_max_retries = 2
    • Maximum number of times Graylog will retry failed requests to Elasticsearch.
    • Default: 2
  • elasticsearch_discovery_enabled = false

    Warning

    Automatic node discovery does not work if Elasticsearch requires authentication, e. g. with Shield.

    Warning

    This setting must be false on AWS Elasticsearch Clusters (the hosted ones) and should be used carefully. In case of trouble with connections to ES this should be the first option to be disabled. See Automatic node discovery for more details.

  • elasticsearch_discovery_filter = rack:42
  • elasticsearch_discovery_frequency = 30s
    • Frequency of the Elasticsearch node discovery.
    • Default: 30 seconds
  • elasticsearch_compression_enabled = false
    • Enable payload compression for Elasticsearch requests.
    • Default: false
Rotation

Attention

The following settings identified with ! in this section have been moved to the database in Graylog 2.0. When you upgrade, make sure to set these to your previous 1.x settings so they will be migrated to the database!

  • rotation_strategy = count !
    • Graylog will use multiple indices to store documents in. You can configure the strategy it uses to determine when to rotate the currently active write index.
    • It supports multiple rotation strategies:
      • count of messages per index, use elasticsearch_max_docs_per_index
      • size per index, use elasticsearch_max_size_per_index
      • time per index, use elasticsearch_max_time_per_index
    • Valid values are count, size and time; the default is count.
  • elasticsearch_max_docs_per_index = 20000000 !
    • (Approximate) maximum number of documents in an Elasticsearch index before a new index is being created, also see no_retention and elasticsearch_max_number_of_indices.
    • Configure this if you used rotation_strategy = count above.
  • elasticsearch_max_size_per_index = 1073741824 !
    • (Approximate) maximum size in bytes per Elasticsearch index on disk before a new index is being created, also see no_retention and elasticsearch_max_number_of_indices. Default is 1GB.
    • Configure this if you used rotation_strategy = size above.
  • elasticsearch_max_time_per_index = 1d !
    • (Approximate) maximum time before a new Elasticsearch index is being created, also see no_retention and elasticsearch_max_number_of_indices. Default is 1 day.

    • Configure this if you used rotation_strategy = time above.

    • Please note that this rotation period does not look at the time specified in the received messages, but is using the real clock value to decide when to rotate the index!

    • Specify the time using a duration and a suffix indicating which unit you want:
      • 1w = 1 week
      • 1d = 1 day
      • 12h = 12 hours
    • Permitted suffixes are: d for day, h for hour, m for minute, s for second.

  • elasticsearch_max_number_of_indices = 20 !
    • How many indices do you want to keep?
  • retention_strategy = delete !
    • Decide what happens with the oldest indices when the maximum number of indices is reached.

    • The following strategies are available:
      • delete - Deletes the index completely (Default)
      • close - Closes the index and hides it from the system. Can be re-opened later.

  • elasticsearch_disable_version_check = true
    • Disable checking the version of Elasticsearch for being compatible with this Graylog release.

    Warning

    Using Graylog with unsupported and untested versions of Elasticsearch may lead to data loss!

  • no_retention = false
    • Disable message retention, i. e. disable Elasticsearch index rotation.

    Note

    If this is set to true Graylog will never rotate indices - whatever is selected in the web interface.
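
Taken together, a size-based rotation that keeps at most 20 indices of roughly 1 GB each and deletes the oldest could look like this in server.conf (a sketch combining the settings described above; on an upgrade to 2.0 these values are migrated to the database, per the attention note at the top of this section):

rotation_strategy = size
elasticsearch_max_size_per_index = 1073741824
elasticsearch_max_number_of_indices = 20
retention_strategy = delete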


Attention

The following settings identified with !! have been moved to the database in Graylog 2.2.0. When you upgrade, make sure to set these to your previous settings so they will be migrated to the database. These settings are read once at the very first startup to be the initial settings in the database.

  • elasticsearch_shards = 4 !!
    • The number of shards for your indices. A good setting here highly depends on the number of nodes in your Elasticsearch cluster. If you have one node, set it to 1.
  • elasticsearch_replicas = 0 !!
    • The number of replicas for your indices. A good setting here highly depends on the number of nodes in your Elasticsearch cluster. If you have one node, set it to 0.

    Note

    elasticsearch_shards and elasticsearch_replicas only apply to newly created indices.

  • elasticsearch_index_prefix = graylog !!
    • Prefix for all Elasticsearch indices and index aliases managed by Graylog.
  • elasticsearch_template_name = graylog-internal !!
    • Name of the Elasticsearch index template used by Graylog to apply the mandatory index mapping.
    • Default: graylog-internal
  • elasticsearch_analyzer = standard !!
    • Analyzer (tokenizer) to use for the message and full_message fields. The “standard” analyzer usually is a good idea.
    • All supported analyzers are: standard, simple, whitespace, stop, keyword, pattern, language, snowball, custom
    • Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/analysis.html
    • Note that this setting only takes effect on newly created indices.
  • disable_index_optimization = false !!
    • Disable the optimization of Elasticsearch indices after index cycling. This may take some load from Elasticsearch on heavily used systems with large indices, but it will decrease search performance. The default is to optimize cycled indices.
  • index_optimization_max_num_segments = 1 !!
    • Optimize the index down to <= index_optimization_max_num_segments. A higher number may take some load from Elasticsearch on heavily used systems with large indices, but it will decrease search performance. The default is 1.

  • allow_leading_wildcard_searches = false
    • Do you want to allow searches with leading wildcards? This can be extremely resource hungry and should only be enabled with care.
    • See also: Searching
  • allow_highlighting = false
    • Do you want to allow searches to be highlighted? Depending on the size of your messages this can be memory hungry and should only be enabled after making sure your Elasticsearch cluster has enough memory.
  • elasticsearch_request_timeout = 1m
    • Global request timeout for Elasticsearch requests (e. g. during search, index creation, or index time-range calculations) based on a best-effort to restrict the runtime of Elasticsearch operations.
    • Default: 1m
  • elasticsearch_index_optimization_timeout = 1h
    • Global timeout for index optimization (force merge) requests.
    • Default: 1h
  • elasticsearch_index_optimization_jobs = 20
    • Maximum number of concurrently running index optimization (force merge) jobs.
    • If you are using lots of different index sets, you might want to increase that number.
    • Default: 20
  • index_ranges_cleanup_interval = 1h
    • Time interval for index range information cleanups. This setting defines how often stale index range information is being purged from the database.
    • Default: 1h
  • output_batch_size = 500
    • Batch size for the Elasticsearch output. This is the maximum (!) number of messages the Elasticsearch output module will get at once and write to Elasticsearch in a batch call. If the configured batch size has not been reached within output_flush_interval seconds, everything that is available will be flushed at once. Remember that every output buffer processor manages its own batch and performs its own batch write calls. (outputbuffer_processors variable)
  • output_flush_interval = 1
    • Flush interval (in seconds) for the Elasticsearch output. This is the maximum amount of time between two batches of messages written to Elasticsearch. It is only effective at all if your minimum number of messages for this time period is less than output_batch_size * outputbuffer_processors.
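    • For example, with the defaults output_batch_size = 500, outputbuffer_processors = 3 and output_flush_interval = 1, the flush interval only comes into play when fewer than 500 * 3 = 1500 messages arrive per second; below that rate, each output buffer processor flushes whatever it has buffered after one second.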
  • output_fault_count_threshold = 5

  • output_fault_penalty_seconds = 30
    • As stream outputs are loaded only on demand, an output which is failing to initialize will be tried over and over again. To prevent this, these two configuration options define after how many faults an output will not be tried again, and for how many seconds.
  • processbuffer_processors = 5

  • outputbuffer_processors = 3
    • The number of parallel running processors.
    • Raise this number if your buffers are filling up.

    Note

    The following settings (outputbuffer_processor_*) configure the thread pools backing each output buffer processor. See ThreadPoolExecutor for technical details.

  • outputbuffer_processor_keep_alive_time = 5000
    • When the number of threads is greater than the core (see outputbuffer_processor_threads_core_pool_size), this is the maximum time in milliseconds that excess idle threads will wait for new tasks before terminating.
  • outputbuffer_processor_threads_core_pool_size = 3
    • The number of threads to keep in the pool, even if they are idle
  • outputbuffer_processor_threads_max_pool_size = 30
    • The maximum number of threads to allow in the pool
  • udp_recvbuffer_sizes = 1048576
    • UDP receive buffer size for all message inputs (e. g. SyslogUDPInput).
  • processor_wait_strategy = blocking
    • Wait strategy describing how buffer processors wait on a cursor sequence. (default: sleeping)

    • Possible types:
      • yielding - Compromise between performance and CPU usage.
      • sleeping - Compromise between performance and CPU usage. Latency spikes can occur after quiet periods.
      • blocking - High throughput, low latency, higher CPU usage.
      • busy_spinning - Avoids syscalls which could introduce latency jitter. Best when threads can be bound to specific CPU cores.
  • ring_size = 65536
    • Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.
    • For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.
    • Must be a power of 2. (512, 1024, 2048, ...)
  • inputbuffer_ring_size = 65536

  • inputbuffer_processors = 2

  • inputbuffer_wait_strategy = blocking

  • message_journal_enabled = true
    • Enable the disk based message journal.
  • message_journal_dir = data/journal
    • The directory which will be used to store the message journal. The directory must be exclusively used by Graylog and must not contain any files other than the ones created by Graylog itself.

    Attention

    If you create a separate partition for the journal files and use a file system creating directories like ‘lost+found’ in the root directory, you need to create a sub directory for your journal. Otherwise Graylog will log an error message that the journal is corrupt and will not start.

  • message_journal_max_age = 12h

  • message_journal_max_size = 5gb
    • The journal holds messages before they can be written to Elasticsearch.
    • Messages are kept for a maximum of 12 hours or 5 GB, whichever happens first.
    • During normal operation the journal will be smaller.
  • message_journal_flush_age = 1m
    • This setting allows specifying a time interval at which we will force an fsync of data written to the log. For example, if this was set to 1000 ms we would fsync after 1000 ms had passed.
  • message_journal_flush_interval = 1000000
    • This setting allows specifying an interval at which we will force an fsync of data written to the log. For example if this was set to 1 we would fsync after every message; if it were 5 we would fsync after every five messages.
  • message_journal_segment_age = 1h
    • This configuration controls the period of time after which Graylog will force the log to roll even if the segment file isn’t full to ensure that retention can delete or compact old data.
  • message_journal_segment_size = 100mb

Attention

When the journal is full and it keeps receiving messages, it will start dropping messages as a FIFO queue: the first message dropped will be the first one inserted, and so on (not a random one).

  • async_eventbus_processors = 2
    • Number of threads used exclusively for dispatching internal events. Default is 2.
  • lb_recognition_period_seconds = 3
    • How many seconds to wait between marking the node as DEAD for possible load balancers and starting the actual shutdown process. Set to 0 if you have no status-checking load balancers in front.
  • lb_throttle_threshold_percentage = 95
    • Journal usage percentage that triggers requesting throttling for this server node from load balancers. The feature is disabled if not set.
  • stream_processing_timeout = 2000

  • stream_processing_max_faults = 3
    • Every message is matched against the configured streams, and it can happen that a stream contains rules which take an unusual amount of time to run, for example if it is using regular expressions that perform excessive backtracking.
    • This will impact the processing of the entire server. To keep such misbehaving stream rules from impacting other streams, Graylog limits the execution time for each stream.
    • The default values are shown above; the timeout is in milliseconds.
    • If the stream matching for one stream took longer than the timeout value, and this happened more than stream_processing_max_faults times, that stream is disabled and a notification is shown in the web interface.
  • alert_check_interval = 60
    • Length of the interval in seconds in which the alert conditions for all streams should be checked and alarms are being sent.

Note

Since 0.21 the Graylog server supports pluggable output modules. This means a single message can be written to multiple outputs. The next setting defines the timeout for a single output module, including the default output module where all messages end up.

  • output_module_timeout = 10000
    • Time in milliseconds to wait for all message outputs to finish writing a single message.
  • stale_master_timeout = 2000
    • Time in milliseconds after which a detected stale master node is being rechecked on startup.
  • shutdown_timeout = 30000
    • Time in milliseconds which Graylog is waiting for all threads to stop on shutdown.
MongoDB
  • mongodb_uri = mongodb://...
    • MongoDB connection string. Enter your MongoDB connection and authentication information here.

    • See https://docs.mongodb.com/manual/reference/connection-string/ for details.

    • Examples:
      • Simple: mongodb://localhost/graylog
      • Authenticate against the MongoDB server: mongodb_uri = mongodb://grayloguser:secret@localhost:27017/graylog
      • Use a replica set instead of a single host: mongodb://grayloguser:secret@localhost:27017,localhost:27018,localhost:27019/graylog
      • Use a DNS seed list: mongodb+srv://server.example.org/graylog
  • mongodb_max_connections = 1000
    • Increase this value according to the maximum connections your MongoDB server can handle from a single client if you encounter MongoDB connection problems.
  • mongodb_threads_allowed_to_block_multiplier = 5
Email
  • transport_email_enabled = false

  • transport_email_hostname = mail.example.com

  • transport_email_port = 587

  • transport_email_use_auth = true

  • transport_email_use_tls = true
    • Enable SMTP with STARTTLS for encrypted connections.
  • transport_email_use_ssl = true
    • Enable SMTP over SSL (SMTPS) for encrypted connections.

Attention

Make sure to enable only one of these two settings, because most (or all) SMTP services only support one of the encryption mechanisms on the same port. Most SMTP services support SMTP with STARTTLS, while SMTPS is deprecated on most SMTP services. Setting both to false is needed when you want to send via an unencrypted connection.

  • transport_email_auth_username = you@example.com

  • transport_email_auth_password = secret

  • transport_email_subject_prefix = [graylog]

  • transport_email_from_email = graylog@example.com

  • transport_email_web_interface_url = https://graylog.example.com
    • Specify this to include links to the stream in your stream alert mails.
    • This should define the fully qualified base url to your web interface exactly the same way as it is accessed by your users.
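
A minimal STARTTLS sketch combining the settings above (values are the examples shown; note that transport_email_use_ssl is explicitly disabled, per the attention note above):

transport_email_enabled = true
transport_email_hostname = mail.example.com
transport_email_port = 587
transport_email_use_auth = true
transport_email_use_tls = true
transport_email_use_ssl = false
transport_email_auth_username = you@example.com
transport_email_auth_password = secret
transport_email_from_email = graylog@example.com
transport_email_web_interface_url = https://graylog.example.com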
HTTP
  • http_connect_timeout = 5s
    • The default connect timeout for outgoing HTTP connections.
    • Values must be a positive duration (and between 1 and 2147483647 when converted to milliseconds).
    • Default: 5s
  • http_read_timeout = 10s
    • The default read timeout for outgoing HTTP connections.
    • Values must be a positive duration (and between 1 and 2147483647 when converted to milliseconds).
    • Default: 10s
  • http_write_timeout = 10s
    • The default write timeout for outgoing HTTP connections.
    • Values must be a positive duration (and between 1 and 2147483647 when converted to milliseconds).
    • Default: 10s
  • http_proxy_uri =
    • HTTP proxy for outgoing HTTP connections

Attention

If you configure a proxy, make sure to also configure the “http_non_proxy_hosts” option so internal HTTP connections with other nodes do not go through the proxy.

  • http_non_proxy_hosts =
    • A list of hosts that should be reached directly, bypassing the configured proxy server.
    • This is a list of patterns separated by ”,”. The patterns may start or end with a “*” for wildcards.
    • Any host matching one of these patterns will be reached through a direct connection instead of through a proxy.
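
A sketch of a proxy setup where internal hosts bypass the proxy (proxy.example.org and *.example.org are placeholders):

http_proxy_uri = http://proxy.example.org:8080
http_non_proxy_hosts = 127.0.0.1,localhost,*.example.org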
Others
  • rules_file = /etc/graylog/server/rules.drl
  • gc_warning_threshold = 1s
    • The threshold of the garbage collection runs. If GC runs take longer than this threshold, a system notification will be generated to warn the administrator about possible problems with the system. Default is 1 second.
  • ldap_connection_timeout = 2000
    • Connection timeout for a configured LDAP server (e. g. ActiveDirectory) in milliseconds.
  • disable_sigar = false
    • Disable the use of SIGAR for collecting system stats.
  • dashboard_widget_default_cache_time = 10s
    • The default cache time for dashboard widgets. (Default: 10 seconds, minimum: 1 second)
  • content_packs_loader_enabled = true
    • Automatically load content packs in “content_packs_dir” on the first start of Graylog.
  • content_packs_dir = data/contentpacks
    • The directory which contains content packs which should be loaded on the first start of Graylog.
  • content_packs_auto_load = grok-patterns.json
    • A comma-separated list of content packs (files in “content_packs_dir”) which should be applied on the first start of Graylog.
    • Default: empty
  • proxied_requests_thread_pool_size = 32
    • For some cluster-related REST requests, the node must query all other nodes in the cluster. This is the maximum number of threads available for this. Increase it if /cluster/* requests take a long time to complete.
    • Should be rest_thread_pool_size * average_cluster_size if you have a high number of concurrent users.

The graylog-ctl script

Some packages of Graylog (for example the virtual machine appliances) ship with a pre-installed graylog-ctl script to allow easy configuration of certain settings.

Important

graylog-ctl is only available in the virtual machine appliances, but not in the tar-ball (for manual setup), operating system packages, or configuration management scripts (Puppet, Chef, Ansible).

Configuration commands

The following commands change the configuration of Graylog:

Command Description
sudo graylog-ctl set-admin-password <password>
Set a new admin password
sudo graylog-ctl set-admin-username <username>
Set a different username for the admin user
sudo graylog-ctl set-email-config
<smtp server> [--port=<smtp port>
--user=<username>
--password=<password>
--from-email=<sender-address>
--web-url=<graylog web-interface url>
--no-tls --no-ssl]
Configure SMTP settings to send alert mails
sudo graylog-ctl set-timezone <zone acronym>
Set Graylog’s time zone from a list of valid time zones. Make sure system time is also set correctly with sudo dpkg-reconfigure tzdata.
sudo graylog-ctl enforce-ssl
Enforce HTTPS for the web interface
sudo graylog-ctl set-node-id <id>
Override random server node id
sudo graylog-ctl set-server-secret <secret>
Override server secret used for encryption
sudo graylog-ctl disable-internal-logging
Disable sending internal logs (e. g. nginx) from the VM to Graylog. Reboot is needed for activation!
sudo graylog-ctl set-external-ip
http[s]://<public IP>:port/
Configure an external IP in the Nginx proxy. This is needed to connect the web interface to the REST API e.g. in NAT’d networks or on AWS.
sudo graylog-ctl set-listen-address
--service <web|rest|transport|endpoint>
--address http://<host>:port
Set the listen address for the web interface, REST API, and the transport URI, as well as the endpoint URI that is used by the web browser to connect to the API. Can be used to deal with additional network interfaces.
sudo graylog-ctl local-connect
Bind all services but the web interface to 127.0.0.1
sudo graylog-ctl set-mongodb-password [-a|-g]
-u <username> -p <password>
Activate MongoDB authentication and set a password for an admin or unprivileged service user
sudo graylog-ctl backup-etcd
Backup the cluster configuration stored in etcd. See also the restore notes.

Commands for multi node setups:

Command Description
sudo graylog-ctl set-cluster-master <IP of master node>
Set IP address of node where others can fetch cluster configuration from
sudo graylog-ctl reconfigure-as-backend
Run Graylog server and Elasticsearch on this node
sudo graylog-ctl reconfigure-as-datanode
Run Elasticsearch on this node only
sudo graylog-ctl reconfigure-as-server
Run Graylog server on this node only

General commands:

Command Description
sudo graylog-ctl cleanse
Delete all graylog data, and start from scratch
sudo graylog-ctl graceful-kill
Attempt a graceful stop, then SIGKILL the entire process group
sudo graylog-ctl hup
Send the services a HUP signal
sudo graylog-ctl int
Send the services an INT signal
sudo graylog-ctl term
Send the services a TERM signal
sudo graylog-ctl kill
Send the services a KILL signal
sudo graylog-ctl list-servers
List all Graylog servers in your cluster
sudo graylog-ctl status
Show the status of all the services
sudo graylog-ctl start
Start services if they are down, and restart them if they stop
sudo graylog-ctl stop
Stop the services, and do not restart them
sudo graylog-ctl restart
Stop the services if they are running, then start them again
sudo graylog-ctl once
Start the services if they are down. Do not restart them if they stop
sudo graylog-ctl uninstall
Kill all processes and uninstall the process supervisor (data will be preserved)
sudo graylog-ctl tail
Watch the service logs of all enabled services
sudo graylog-ctl tail <service name>
Watch the logs of just one service, name can be ‘server’, ‘elasticsearch’, ‘mongodb’, ‘nginx’, ‘etcd’
sudo graylog-ctl show-config
Show the service configuration
sudo graylog-ctl reconfigure
Reconfigure the application

Important

After using a command that changes the application configuration, re-run sudo graylog-ctl reconfigure to actually enable the changes.

Multi VM setup

At some point it makes sense to no longer run all services on a single VM. For performance reasons you might want to add more Elasticsearch nodes to the cluster, or even add a second Graylog server. This can be achieved by changing IP addresses in the Graylog configuration files by hand, or by using the canned configurations which come with the graylog-ctl command.

The idea is to have one VM which is a central point for other VMs to fetch all needed configuration settings to join the cluster. Typically the first VM you spin up is used for this task. An instance of etcd is started automatically and filled with the necessary settings for other hosts.

For example, to create a small cluster with a dedicated Graylog server node and another for Elasticsearch, spin up two VMs from the same Graylog image. On the first one start only Graylog and MongoDB:

vm1> sudo graylog-ctl set-admin-password sEcReT
vm1> sudo graylog-ctl reconfigure-as-server

On the second VM start only Elasticsearch. Before doing so, set the IP of the first VM so the configuration data can be fetched from there:

vm2> sudo graylog-ctl set-cluster-master <ip-of-vm1>
vm2> sudo graylog-ctl reconfigure-as-datanode

Afterwards, re-run the reconfiguration on the first VM so it picks up the new Elasticsearch node:

vm1> sudo graylog-ctl reconfigure-as-server

This results in a perfectly fine dual VM setup. However, if you want to scale this setup out by adding an additional Elasticsearch node, you can proceed in the same way:

vm3> sudo graylog-ctl set-cluster-master <ip-of-vm1>
vm3> sudo graylog-ctl reconfigure-as-datanode

Afterwards, re-run the reconfiguration on the existing nodes so they pick up the new cluster member:

vm1> sudo graylog-ctl reconfigure-as-server
vm2> sudo graylog-ctl reconfigure-as-datanode

Verify that all nodes are working as a cluster by opening the Kopf plugin on one of the Elasticsearch nodes, e. g. http://vm2:9200/_plugin/kopf/#!/nodes.

Important: In case you want to add a second Graylog server you have to set the same server secret on all machines. The secret is stored in the file /etc/graylog/graylog-secrets and can be applied to other hosts with the set-server-secret sub-command.

The following configuration modes do exist:

Command Services
sudo graylog-ctl reconfigure Regenerate configuration files based on /etc/graylog/graylog-services.json
sudo graylog-ctl reconfigure-as-server Run Graylog, web and MongoDB (no Elasticsearch)
sudo graylog-ctl reconfigure-as-backend Run Graylog, Elasticsearch and MongoDB (no nginx for web interface access)
sudo graylog-ctl reconfigure-as-datanode Run only Elasticsearch
sudo graylog-ctl enable-all-services Run all services on this box

A server with only the web interface running is not supported as of Graylog 2.0, since the web interface is now included in the server process. But you can create your own service combinations by editing the file /etc/graylog/graylog-services.json by hand and enabling or disabling single services. Just run graylog-ctl reconfigure afterwards.

Extend disk space

All data of the appliance setup is stored in /var/opt/graylog/data. In order to extend the disk space, mount a second (virtual) hard drive into this directory.

Important

Make sure to move old data to the new drive beforehand, and give the graylog user permissions to read and write there.

Example procedure for the Graylog virtual appliance

Note

These steps require basic knowledge in using Linux and the common shell programs.

  • Shutdown the virtual machine as preparation for creating a consistent snapshot.

  • Take a snapshot of the virtual machine in case something goes wrong.

  • Attach an additional hard drive to the virtual machine.

  • Start the virtual machine again.

  • Stop all services to prevent disk access:

    $ sudo graylog-ctl stop
    
  • Check for the logical name of the new hard drive. Usually this is /dev/sdb:

    $ sudo lshw -class disk
    
  • Partition and format new disk:

    $ sudo parted -a optimal /dev/sdb mklabel gpt
    # A reboot may be necessary at this point so that the updated GPT is being recognized by the operating system
    $ sudo parted -a optimal -- /dev/sdb unit compact mkpart primary ext3 "1" "-1"
    $ sudo mkfs.ext4 /dev/sdb1
    
  • Mount disk into temporary directory /mnt/tmp:

    $ sudo mkdir /mnt/tmp
    $ sudo mount /dev/sdb1 /mnt/tmp
    
  • Copy current data to new disk:

    $ sudo cp -ax /var/opt/graylog/data/* /mnt/tmp/
    
  • Compare both folders:

    # Output should be: Only in /mnt/tmp: lost+found
    $ sudo diff -qr --suppress-common-lines /var/opt/graylog/data /mnt/tmp
    
  • Delete old data:

    $ sudo rm -rf /var/opt/graylog/data/*
    
  • Mount new disk into /var/opt/graylog/data directory:

    $ sudo umount /mnt/tmp
    $ sudo mount /dev/sdb1 /var/opt/graylog/data
    
  • Make change permanent by adding an entry to /etc/fstab:

    $ echo '/dev/sdb1 /var/opt/graylog/data ext4 defaults 0 0' | sudo tee -a /etc/fstab
    
  • Reboot virtual machine:

    $ sudo shutdown -r now
    

Install Graylog plugins

The Graylog plugin directory is located in /opt/graylog/plugin/. Just drop a JAR file there and restart the server with sudo graylog-ctl restart graylog-server to load the plugin.
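
For example (the plugin file name is a placeholder):

$ sudo cp graylog-plugin-example-1.0.0.jar /opt/graylog/plugin/
$ sudo graylog-ctl restart graylog-server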

Install Elasticsearch plugins

Elasticsearch comes with a helper program to install additional plugins. You can call it like this: sudo JAVA_HOME=/opt/graylog/embedded/jre /opt/graylog/elasticsearch/bin/plugin

Install custom SSL certificates

During the first reconfigure run, self-signed SSL certificates are generated. You can replace this certificate with your own to prevent security warnings in your browser. Just drop the combined certificate file and the key into place: /opt/graylog/conf/nginx/ca/graylog.crt and /opt/graylog/conf/nginx/ca/graylog.key, respectively. Afterwards restart nginx with sudo graylog-ctl restart nginx.
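
A sketch, assuming your certificate and key files are in the current directory:

$ sudo cp your-cert.crt /opt/graylog/conf/nginx/ca/graylog.crt
$ sudo cp your-cert.key /opt/graylog/conf/nginx/ca/graylog.key
$ sudo graylog-ctl restart nginx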

Assign a static IP

By default the appliance makes use of DHCP to set up the network. If you want to access Graylog under a static IP, please follow these instructions:

$ sudo ifdown eth0

Edit the file /etc/network/interfaces like this (just the important lines):

auto eth0
  iface eth0 inet static
  address <static IP address>
  netmask <netmask>
  gateway <default gateway>
  pre-up sleep 2

Activate the new IP and reconfigure Graylog to make use of it:

$ sudo ifup eth0
$ sudo graylog-ctl reconfigure

Wait some time until all services are restarted and running again. Afterwards you should be able to access Graylog with the new IP.

Upgrade Graylog

Caution

The Graylog omnibus package does not support unattended upgrading from Graylog 1.x to Graylog 2.x!

Caution

The Graylog omnibus package 2.3.0 and later, which contains Elasticsearch 5.5.0, can not be used in environments which have been running the Graylog omnibus package 1.x before and which still have indices created by Elasticsearch before version 2.0.0!

Always perform a full backup or snapshot of the appliance before proceeding. Only upgrade if the release notes say the next version is a drop-in replacement. Choose the Graylog version you want to install from the list of Omnibus packages. graylog_latest.deb always links to the newest version:

$ wget https://packages.graylog2.org/releases/graylog-omnibus/ubuntu/graylog_latest.deb
$ sudo graylog-ctl stop
$ sudo dpkg -G -i graylog_latest.deb
$ sudo graylog-ctl backup-etcd
$ sudo graylog-ctl reconfigure
$ sudo reboot

Error

In case the etcd service won’t start after the upgrade, an error is shown like:

Errno::ECONNREFUSED
-------------------
Connection refused - connect(2) for "127.0.0.1" port 4001

Please flush and restore the etcd database as shown in the restore etcd notes.

Migrate manually from 1.x to 2.x

To update a 1.x appliance to 2.x, the administrator has to purge the Graylog installation, migrate the stored log data, and install the new version as an Omnibus package. Before upgrading, read the upgrade notes. This procedure can potentially delete log data or configuration settings, so it is absolutely necessary to perform a backup or a snapshot before proceeding!

Stop all services but Elasticsearch:

$ sudo -s
$ graylog-ctl stop graylog-web
$ graylog-ctl stop graylog-server
$ graylog-ctl stop mongodb
$ graylog-ctl stop nginx
$ graylog-ctl stop etcd

Check for index range types. The output of this command should be {}, if not read these notes for how to fix this:

$ curl -XGET <appliance_IP>:9200/_all/_mapping/index_range; echo
{}

Delete the Graylog index template:

$ curl -X DELETE <appliance_IP>:9200/_template/graylog-internal

Migrate appliance configuration:

$ cd /etc
$ mv graylog graylog2.2
$ vi graylog2.2/graylog-secrets.json

# Remove the graylog_web section
},  << don't forget the comma!
"graylog_web": {
  "secret_token": "3552c87f3e3..."
}

$ vi graylog2.2/graylog-services.json

# Remove the graylog_web section
}, << don't forget the comma!
"graylog_web": {
  "enabled": true
}

$ vi graylog2.2/graylog-settings.json

# Remove "rotation_size", "rotation_time", "indices"
"enforce_ssl": false,
"rotation_size": 1073741824,
"rotation_time": 0,
"indices": 10,
"journal_size": 1,

Migrate appliance data:

$ cd /var/opt
$ mv graylog graylog2.2
$ mv graylog2.2/data/elasticsearch/graylog2 graylog2.2/data/elasticsearch/graylog

Delete old Graylog version and install new Omnibus package:

$ wget http://packages.graylog2.org/releases/graylog-omnibus/ubuntu/graylog_2.2.1-1_amd64.deb
$ apt-get purge graylog
$ dpkg -i graylog_2.2.1-1_amd64.deb

Move directories back:

$ cd /etc
$ mv graylog2.2 graylog
$ cd /var/opt/
$ mv graylog2.2 graylog

Reconfigure and Reboot:

$ graylog-ctl reconfigure
$ reboot

Graylog should now be updated and old data still available.

Important

The index retention configuration moved from the Graylog configuration file to the web interface. After the first start go to ‘System -> Indices -> Update configuration’ to re-enable your settings.

Advanced Settings

To change certain parameters used by graylog-ctl during a reconfigure run, you can override all default parameters found in the attributes file.

If you want to change the username used by Graylog for example, edit the file /etc/graylog/graylog-settings.json like this:

"custom_attributes": {
  "user": {
    "username": "log-user"
  }
}

Afterwards run sudo graylog-ctl reconfigure and sudo graylog-ctl restart. The first command renders all changed configuration files and the latter makes sure that all services restart to activate the change.

There are a couple of other use cases for this, e.g. changing the default data directories used by Graylog to /data (make sure this is writable by the graylog user):

"custom_attributes": {
    "elasticsearch": {
      "data_directory": "/data/elasticsearch"
    },
    "mongodb": {
      "data_directory": "/data/mongodb"
    },
    "etcd": {
      "data_directory": "/data/etcd"
    },
    "graylog-server": {
      "journal_directory": "/data/journal"
    }
  }

Or change the default memory settings used by Graylog or Elasticsearch:

"custom_attributes": {
     "graylog-server": {
       "memory": "1700m"
     },
     "elasticsearch": {
       "memory": "2200m"
     }
   }

Again, run reconfigure and restart afterwards to activate the changes.

Securing an appliance

Even though the Graylog appliances are not meant for production use, there are still two commands you can use to increase the security of an installation. With graylog-ctl local-connect only the web interface is reachable from the outside; all other services are listening on the local loopback device. This is only useful when you run the appliance as a single node, since clustered setups are not possible anymore. But data stored in MongoDB or Elasticsearch is protected from direct external access.

The other one is graylog-ctl set-mongodb-password. This command enables authentication for MongoDB and creates or updates a database user. First an admin user should be created. This user is needed for database maintenance and future password changes. Afterwards an unprivileged service user can be created for Graylog. The procedure works like this:

$ graylog-ctl set-mongodb-password -a -u admin -p someAdminPassword123
$ graylog-ctl set-mongodb-password -g -u graylog -p someGraylogServicePassword
$ graylog-ctl reconfigure

MongoDB and the Graylog server will be restarted with authentication activated. The username and password need to be set on every Graylog node to make a cluster work. Log in to another Graylog server and set only the service user:

$ graylog-ctl set-cluster-master 1.1.1.2
$ graylog-ctl set-mongodb-password -g -u graylog -p someGraylogServicePassword
$ graylog-ctl reconfigure-as-server

Since the pre-built appliances are based on standard Ubuntu Linux, tools like iptables/SELinux/AppArmor can be used additionally. But explaining all available countermeasures would go beyond the scope of this documentation.

Restore cluster configuration

With graylog-ctl backup-etcd a backup of the cluster configuration of a multi-node setup can be created. In order to restore this backup, copy the wal file back to the data directory:

$ graylog-ctl stop etcd
$ rm -r /var/opt/graylog/data/etcd/member/*
$ cp /var/opt/graylog/backup/etcd/<timestamp>/member/wal /var/opt/graylog/data/etcd/member/
$ chown -R graylog.graylog /var/opt/graylog/data/etcd/member/wal
$ su -c '/opt/graylog/embedded/sbin/etcd -data-dir=/var/opt/graylog/data/etcd -force-new-cluster' graylog
<Ctrl-C>
$ graylog-ctl start etcd

Web interface

When your Graylog instance/cluster is up and running, the next thing you usually want to do is check out our web interface, which offers you great capabilities for searching and analyzing your indexed data and configuring your Graylog environment. By default you can access it using your browser on http://<graylog-server>:9000/.

Overview

The Graylog web interface was rewritten in JavaScript for 2.0 to be a client-side single-page browser application. This means its code is running solely in your browser, fetching all data via HTTP(S) from the REST API of your Graylog server.

Note

Both the web interface URI (see web_listen_uri) and the REST API (see rest_listen_uri and rest_transport_uri) must be accessible by everyone using the web interface. This means that Graylog must listen on a public network interface or be exposed to one using a proxy, NAT, or a load balancer!

Single or separate listeners for web interface and REST API?

Since Graylog 2.1 you have two options when it comes to exposing its web interface:

  • Running both on the same port, using different paths (defaulting to http://localhost:9000/api/ for the REST API and http://localhost:9000/ for the web interface). This is the default since 2.1 and is assumed for most parts of the documentation.
  • Running on two different ports (for example http://localhost:12900/ for the REST API and http://localhost:9000/ for the web interface)

Note

When you are using the first option and you want to run the REST API and the web interface on the same host and port, the path part of both URIs (rest_listen_uri & web_listen_uri) must be different and the path part of web_listen_uri must be non-empty and different than /.

Configuration Options

If our default settings do not work for you, there are a number of options in the Graylog server configuration file which you can change to influence its behavior:

Setting | Default | Explanation
web_enable | true | Determines if the web interface endpoint is started or not.
web_listen_uri | http://127.0.0.1:9000/ | Default address the web interface listener binds to.
web_endpoint_uri | (not set) | If not set, rest_transport_uri will be used. This is the external address of the REST API of the Graylog server. Web interface clients need to be able to connect to this for the web interface to work.
web_enable_cors | false | Support Cross-Origin Resource Sharing for the web interface assets. Not required, because no REST calls are made to this listener. This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.
web_enable_gzip | true | Serve web interface assets using compression. This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.
web_enable_tls | false | Should the web interface serve assets using encryption or not. This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.
web_tls_cert_file | (no default) | Path to TLS certificate file, if TLS is enabled. This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.
web_tls_key_file | (no default) | Path to private key for certificate, used if TLS is enabled. This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.
web_tls_key_password | (no default) | Password for TLS key (if it is encrypted). This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.
web_thread_pool_size | 16 | Number of threads used for the web interface listener. This setting is ignored, if the host and port parts of web_listen_uri and rest_listen_uri are identical.

How does the web interface connect to the Graylog server?

The web interface fetches all information it shows from the REST API of the Graylog server. Therefore it needs to connect to it using HTTP(S). There are several ways to define how the web interface connects to the Graylog server. The URI used by the web interface is determined in this exact order:

  • If the HTTP(S) client going to the web interface port sends a X-Graylog-Server-URL header which contains a valid URL, this overrides everything else.
  • If web_endpoint_uri is defined in the Graylog configuration file, this is used if the aforementioned header is not set.
  • If both are not defined, rest_transport_uri is used.

Browser Compatibility

Writing the web interface as a single-page application is a challenging task. We want to provide the best possible experience to everyone, which often means using modern web technology only available in recent browsers, while keeping a reasonable compatibility with old and less-capable browsers. These browsers are officially supported in Graylog 2.0:

Browser | OS | Minimum Version
Chrome | Windows, OS X, Linux | 50
Firefox | Windows, OS X, Linux | 45 / 38 ESR
Internet Explorer | Windows | 11
Microsoft Edge | Windows | 25
Safari | OS X | 9

Please take into account that you need to enable JavaScript in order to use the Graylog web interface.

Making the web interface work with load balancers/proxies

If you want to run a load balancer/reverse proxy in front of Graylog, you need to make sure that:

  • The REST API port is accessible for clients
  • The address for the Graylog server’s REST API is properly set (as explained in How does the web interface connect to the Graylog server?), so it is resolvable and accessible for any client of the web interface.
  • You are either using only HTTP or only HTTPS (no mixed content) for both the web interface endpoint and the REST API endpoint.
  • If you use SSL, your certificates must be valid and trusted by your clients.

Note

To help you with your specific environment, we have some example configurations. We make the following assumptions in all examples: your Graylog server.conf has the settings rest_listen_uri = http://127.0.0.1:9000/api/ and web_listen_uri = http://127.0.0.1:9000/. Your URL will be graylog.example.org with the IP 192.168.0.10.

Using a Layer 3 load balancer (forwarding TCP Ports)
  1. Configure your load balancer to forward connections going to 192.168.0.10:80 to 127.0.0.1:9000 (web_listen_uri) and 192.168.0.10:9000/api/ to 127.0.0.1:9000/api/ (rest_listen_uri).
  2. Set web_endpoint_uri in your Graylog server config to http://graylog.example.org:9000/api/.
  3. Start the Graylog server as usual.
  4. Access the web interface on http://graylog.example.org.
  5. Read up on Using HTTPS.
NGINX

REST API and Web Interface on one port (using HTTP):

server
{
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;
    server_name graylog.example.org;

    location / {
      proxy_set_header Host $http_host;
      proxy_set_header X-Forwarded-Host $host;
      proxy_set_header X-Forwarded-Server $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Graylog-Server-URL http://$server_name/api;
      proxy_pass       http://127.0.0.1:9000;
    }
}

NGINX can be used for SSL termination; you would only need to modify the server listen directive and add all information about your certificate.

If you are running multiple Graylog servers you might want to use HTTPS/SSL to connect to them (read Using HTTPS on how to set this up) and use HTTPS/SSL on NGINX. The configuration for TLS certificates, keys and ciphers is omitted from the sample config for brevity’s sake.

REST API and Web Interface on one port (using HTTPS/SSL):

server
{
    listen      443 ssl spdy;
    server_name graylog.example.org;
    # <- your SSL Settings here!

    location /
    {
      proxy_set_header Host $http_host;
      proxy_set_header X-Forwarded-Host $host;
      proxy_set_header X-Forwarded-Server $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Graylog-Server-URL https://$server_name/api;
      proxy_pass       http://127.0.0.1:9000;
    }
}
Apache httpd 2.x

REST API and Web Interface on one port (using HTTP):

<VirtualHost *:80>
    ServerName graylog.example.org
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    <Location />
        RequestHeader set X-Graylog-Server-URL "http://graylog.example.org/api/"
        ProxyPass http://127.0.0.1:9000/
        ProxyPassReverse http://127.0.0.1:9000/
    </Location>

</VirtualHost>

REST API and Web Interface on one port (using HTTPS/SSL):

<VirtualHost *:443>
    ServerName graylog.example.org
    ProxyRequests Off
    SSLEngine on
    # <- your SSL Settings here!

    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    <Location />
        RequestHeader set X-Graylog-Server-URL "https://graylog.example.org/api/"
        ProxyPass http://127.0.0.1:9000/
        ProxyPassReverse http://127.0.0.1:9000/
    </Location>

</VirtualHost>
HAProxy 1.6

REST API and Web Interface on one port (using HTTP):

frontend http
    bind 0.0.0.0:80

    option forwardfor
    http-request add-header X-Forwarded-Host %[req.hdr(host)]
    http-request add-header X-Forwarded-Server %[req.hdr(host)]
    http-request add-header X-Forwarded-Port %[dst_port]
    acl is_graylog hdr_dom(host) -i -m str graylog.example.org
    use_backend     graylog if is_graylog

backend graylog
    description     The Graylog Web backend.
    http-request set-header X-Graylog-Server-URL http://graylog.example.org/api
    use-server graylog_1
    server graylog_1 127.0.0.1:9000 maxconn 20 check

Multiple Backends (roundrobin) with Health-Check (using HTTP):

frontend graylog_http
    bind *:80
    option forwardfor
    http-request add-header X-Forwarded-Host %[req.hdr(host)]
    http-request add-header X-Forwarded-Server %[req.hdr(host)]
    http-request add-header X-Forwarded-Port %[dst_port]
    acl is_graylog hdr_dom(host) -i -m str graylog.example.org
    use_backend     graylog

backend graylog
    description     The Graylog Web backend.
    balance roundrobin
    option httpchk HEAD /api/system/lbstatus
    http-request set-header X-Graylog-Server-URL http://graylog.example.org/api
    server graylog1 192.168.0.10:9000 maxconn 20 check
    server graylog2 192.168.0.11:9000 maxconn 20 check
    server graylog3 192.168.0.12:9000 maxconn 20 check

Load balancer integration

When running multiple Graylog servers a common deployment scenario is to route the message traffic through an IP load balancer. By doing this we can achieve both a highly available setup, as well as increasing message processing throughput, by simply adding more servers that operate in parallel.

Load balancer state

However, load balancers usually need some way of determining whether a backend service is reachable and healthy or not. For this purpose Graylog exposes a load balancer state that is reachable via its REST API.

There are two ways the load balancer state can change:

  • due to a lifecycle change (e.g. the server is starting to accept messages, or shutting down)
  • due to manual intervention via the REST API

Note

In the following examples we assume that the Graylog REST API is available on the URI path /api/ (e. g. http://graylog.example.com/api/).

To query the current load balancer status of a Graylog instance, all you need to do is issue an HTTP call to its REST API:

GET /api/system/lbstatus

The status knows two different states, ALIVE and DEAD, which is also the text/plain response of the resource. Additionally, the same information is reflected in the HTTP status code: if the state is ALIVE, the return code will be 200 OK; for DEAD, it will be 503 Service Unavailable. This makes it easier to configure a wide range of load balancer types and vendors to react to the status.

The resource is accessible without authentication to make it easier for load balancers to access it.
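
For example, checking the state with curl could look like this (response abbreviated; hostname as assumed in the note above):

$ curl -i 'http://graylog.example.com/api/system/lbstatus'
HTTP/1.1 200 OK

ALIVE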

To programmatically change the load balancer status, an additional endpoint is exposed:

PUT /api/system/lbstatus/override/alive
PUT /api/system/lbstatus/override/dead

Only authenticated and authorized users are able to change the status; in the currently released Graylog version, this means only admin users can change it.
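
For example, to manually mark a node as dead before maintenance (credentials are placeholders for an admin user):

$ curl -u admin:password -X PUT 'http://graylog.example.com/api/system/lbstatus/override/dead'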

Graceful shutdown

Often, when running a service behind a load balancer, the goal is to be able to perform zero-downtime upgrades, by taking one of the servers offline, upgrading it, and then bringing it back online. During that time the remaining servers can take the load seamlessly.

By using the load balancer status API described above one can already perform such a task. However, it would still be guesswork when the Graylog server is done processing all the messages it already accepted.

For this purpose Graylog supports a graceful shutdown command, also accessible via the web interface and API. It will set the load balancer status to DEAD, stop all inputs, turn on message processing (should it have been disabled manually previously), and flush all messages in memory to Elasticsearch. After all buffers and caches are processed, it will shut itself down safely.

_images/nodes_more_action.png
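
Via the REST API, the graceful shutdown could be triggered like this (a sketch; the exact resource path is an assumption, please verify it in the API browser of your version, and the credentials are placeholders):

$ curl -u admin:password -X POST 'http://graylog.example.com/api/system/shutdown/shutdown'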

Web Interface

It is possible to use the Graylog web interface behind a load balancer for high availability purposes.

Note

Take care of the configuration you need with a proxy setup, as it will not work out of the box.

Using HTTPS

We highly recommend securing your Graylog installation using SSL/TLS to make sure that no sensitive data is sent over the wire in plain text. To make this work, you need to do two things:

  • Enable TLS for the Graylog REST API (rest_enable_tls)
  • Enable TLS for the web interface endpoint (web_enable_tls)

You also need to make sure that you have proper certificates in place, which are valid and trusted by the clients. Not enabling TLS for either one of them will result in a browser error about mixed content and the web interface will cease to work.

Note

If you’re operating a single-node setup and would like to use HTTPS for the Graylog web interface and the Graylog REST API, it’s possible to use NGINX or Apache as a reverse proxy.

Certificate/Key file format

When you are configuring TLS, you need to make sure that your certificate/key files are in the right format, which is X.509 for certificates and PKCS#8 for the private keys. Both must be stored in PEM format.

Creating a self-signed private key/certificate

Create a file named openssl-graylog.cnf with the following content (customized to your needs):

[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

# Details about the issuer of the certificate
[req_distinguished_name]
C = US
ST = Some-State
L = Some-City
O = My Company
OU = My Division
CN = graylog.example.com

[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

# IP addresses and DNS names the certificate should include
# Use IP.### for IP addresses and DNS.### for DNS names,
# with "###" being a consecutive number.
[alt_names]
IP.1 = 203.0.113.42
DNS.1 = graylog.example.com

Create a PKCS#5 private key and X.509 certificate:

$ openssl version
OpenSSL 0.9.8zh 14 Jan 2016
$ openssl req -x509 -days 365 -nodes -newkey rsa:2048 -config openssl-graylog.cnf -keyout pkcs5-plain.pem -out cert.pem
Generating a 2048 bit RSA private key
............................+++
.+++
writing new private key to 'pkcs5-plain.pem'
-----

Convert the PKCS#5 private key into an unencrypted PKCS#8 private key:

$ openssl pkcs8 -in pkcs5-plain.pem -topk8 -nocrypt -out pkcs8-plain.pem

Convert the PKCS#5 private key into an encrypted PKCS#8 private key (using the passphrase secret):

$ openssl pkcs8 -in pkcs5-plain.pem -topk8 -out pkcs8-encrypted.pem -passout pass:secret

Converting a PKCS #12 (PFX) file to private key and certificate pair

PKCS #12 key stores (PFX files) are commonly used on Microsoft Windows.

In this example, the PKCS #12 (PFX) file is named keystore.pfx:

$ openssl pkcs12 -in keystore.pfx -nokeys -out graylog-certificate.pem
$ openssl pkcs12 -in keystore.pfx -nocerts -out graylog-pkcs5.pem
$ openssl pkcs8 -in graylog-pkcs5.pem -topk8 -out graylog-key.pem

The resulting graylog-certificate.pem and graylog-key.pem can be used in the Graylog configuration file.

Converting an existing Java Keystore to private key/certificate pair

This section describes how to export a private key and certificate from an existing Java KeyStore in JKS format.

The starting point is an existing Java KeyStore in JKS format which contains a private key and certificate which should be used in Graylog:

$ keytool -list -v -keystore keystore.jks -alias graylog.example.com
Enter keystore password:
Alias name: graylog.example.com
Creation date: May 10, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=graylog.example.com, OU=Unknown, O="Graylog, Inc.", L=Hamburg, ST=Hamburg, C=DE
Issuer: CN=graylog.example.com, OU=Unknown, O="Graylog, Inc.", L=Hamburg, ST=Hamburg, C=DE
Serial number: 2b33832d
Valid from: Tue May 10 10:02:34 CEST 2016 until: Mon Aug 08 10:02:34 CEST 2016
Certificate fingerprints:
       MD5:  8A:3D:9F:ED:69:93:1B:6C:E3:29:66:EA:82:8D:42:BE
       SHA1: 5B:27:92:25:46:36:BC:F0:82:8F:9A:30:D8:50:D0:ED:32:4D:C6:A0
       SHA256: 11:11:77:F5:F6:6A:20:A8:E6:4A:5D:B5:20:21:4E:B8:FE:B6:38:1D:45:6B:ED:D0:7B:CE:B8:C8:BC:DD:B4:FB
       Signature algorithm name: SHA256withRSA
       Version: 3

Extensions:

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: AC 79 64 9F A1 60 14 B9   51 F4 F5 0B B3 B5 02 A5  .yd..`..Q.......
0010: B8 07 DC 7B                                        ....
]
]

The Java KeyStore in JKS format has to be converted to a PKCS#12 keystore, so that OpenSSL can work with it:

$ keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.p12 -deststoretype PKCS12
Enter destination keystore password:
Re-enter new password:
Enter source keystore password:
Entry for alias graylog.example.com successfully imported.
Import command completed:  1 entries successfully imported, 0 entries failed or cancelled

After the keystore has been successfully converted into PKCS#12 format, OpenSSL can export the X.509 certificate with PEM encoding:

$ openssl pkcs12 -in keystore.p12 -nokeys -out graylog-certificate.pem
Enter Import Password:
MAC verified OK

The private key can only be exported in PKCS#5 format with PEM encoding:

$ openssl pkcs12 -in keystore.p12 -nocerts -out graylog-pkcs5.pem
Enter Import Password:
MAC verified OK
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:

Graylog currently only supports PKCS#8 private keys with PEM encoding, so OpenSSL has to convert it into the correct format:

$ openssl pkcs8 -in graylog-pkcs5.pem -topk8 -out graylog-key.pem
Enter pass phrase for graylog-pkcs5.pem:
Enter Encryption Password:
Verifying - Enter Encryption Password:

The working directory should now contain the PKCS#8 private key (graylog-key.pem) and the X.509 certificate (graylog-certificate.pem) to be used with Graylog:

$ head graylog-key.pem graylog-certificate.pem
==> graylog-key.pem <==
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIE6TAbBgkqhkiG9w0BBQMwDgQIwMhLa5bw9vgCAggABIIEyN42AeYJJNBEiqhI
mWqJDot4Jokw2vB4abcIJ5Do4+7tjtMrecVRCDSvBZzjkXjnbumBHEoxexe5f0/z
wgq6f/UDyTM3uKYQTG91fcqTyMDUlo3Wc8OqSqsNehOAQzA7hMCehqgNJHO0Zfny
EFvrXHurJWi4eA9vLRup86dbm4Wp3o8pmjOLduXieHfcgVtm5jfd7XfL5cRFS8kS
bSFH4v8xDxLNaJmKkKl9gPCACMRbO9nGk/Z9q9N8zkj+xG9lxlNRMX51SRzg20E0
nyyKTb39tJF35zjroB2HfiFWyrPQ1uF6yGoroGvu0L3eWosjBLjdRs0eBgjJCm5P
ic9zSVqMH6/4CPKJqvB97vP4QhpYcr9jlYJsbn6Zg4OIELpM00VLvp0yU9tqTuRR
TDPYAlNMLZ2RrV52CEsh3zO21WHM7r187x4WHgprDFnjkXf02DrFhgCsGwkEQnb3
vj86q13RHhqoXT4W0zugvcv2/NBLMv0HNQBAfEK3X1YBmtQpEJhwSxeszA1i7CpU

==> graylog-certificate.pem <==
Bag Attributes
    friendlyName: graylog.example.com
    localKeyID: 54 69 6D 65 20 31 34 36 32 38 36 37 38 32 33 30 39 32
subject=/C=DE/ST=Hamburg/L=Hamburg/O=Graylog, Inc./OU=Unknown/CN=graylog.example.com
issuer=/C=DE/ST=Hamburg/L=Hamburg/O=Graylog, Inc./OU=Unknown/CN=graylog.example.com
-----BEGIN CERTIFICATE-----
MIIDkTCCAnmgAwIBAgIEKzODLTANBgkqhkiG9w0BAQsFADB5MQswCQYDVQQGEwJE
RTEQMA4GA1UECBMHSGFtYnVyZzEQMA4GA1UEBxMHSGFtYnVyZzEWMBQGA1UEChMN
R3JheWxvZywgSW5jLjEQMA4GA1UECxMHVW5rbm93bjEcMBoGA1UEAxMTZ3JheWxv
Zy5leGFtcGxlLmNvbTAeFw0xNjA1MTAwODAyMzRaFw0xNjA4MDgwODAyMzRaMHkx

The resulting PKCS#8 private key (graylog-key.pem) and the X.509 certificate (graylog-certificate.pem) can now be used to enable encrypted connections with Graylog by enabling TLS for the Graylog REST API and the web interface in the Graylog configuration file:

# Enable HTTPS support for the REST API. This secures the communication with the REST API
# using TLS to prevent request forgery and eavesdropping.
rest_enable_tls = true

# The X.509 certificate chain file in PEM format to use for securing the REST API.
rest_tls_cert_file = /path/to/graylog-certificate.pem

# The PKCS#8 private key file in PEM format to use for securing the REST API.
rest_tls_key_file = /path/to/graylog-key.pem

# The password to unlock the private key used for securing the REST API.
rest_tls_key_password = secret

# Enable HTTPS support for the web interface. This secures the communication the web interface
# using TLS to prevent request forgery and eavesdropping.
web_enable_tls = true

# The X.509 certificate chain file in PEM format to use for securing the web interface.
web_tls_cert_file = /path/to/graylog-certificate.pem

# The PKCS#8 private key file in PEM format to use for securing the web interface.
web_tls_key_file = /path/to/graylog-key.pem

# The password to unlock the private key used for securing the web interface.
web_tls_key_password = secret

Sample files

This section shows the differences between the following private key formats, with samples.

PKCS#5 plain private key:

-----BEGIN RSA PRIVATE KEY-----
MIIBOwIBAAJBANxtmQ1Kccdp7HBNt8zgTai48Vv617bj4SnhkcMN99sCQ2Naj/sp
[...]
NiCYNLiCawBbpZnYw/ztPVACK4EwOpUy+u19cMB0JA==
-----END RSA PRIVATE KEY-----

PKCS#8 plain private key:

-----BEGIN PRIVATE KEY-----
MIIBVAIBADANBgkqhkiG9w0BAQEFAASCAT4wggE6AgEAAkEA6GZN0rQFKRIVaPOz
[...]
LaLGdd9G63kLg85eldSy55uIAXsvqQIgfSYaliVtSbAgyx1Yfs3hJ+CTpNKzTNv/
Fx80EltYV6k=
-----END PRIVATE KEY-----

PKCS#5 encrypted private key:

-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,E83B4019057F55E9

iIPs59nQn4RSd7ppch9/vNE7PfRSHLoQFmaAjaF0DxjV9oucznUjJq2gphAB2E2H
[...]
y5IT1MZPgN3LNkVSsLPWKo08uFZQdfu0JTKcn7NPyRc=
-----END RSA PRIVATE KEY-----

PKCS#8 encrypted private key:

-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIBpjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQIU9Y9p2EfWucCAggA
[...]
IjsZNp6zmlqf/RXnETsJjGd0TXRWaEdu+XOOyVyPskX2177X9DUJoD31
-----END ENCRYPTED PRIVATE KEY-----

Adding a self-signed certificate to the JVM trust store

Graylog nodes inside a cluster need to communicate with each other using the Graylog REST API. When using HTTPS for the Graylog REST API, the X.509 certificate must be trusted by the JVM trust store (similar to the trusted CA bundle in an operating system), otherwise communication will fail.

Important

If you are using different X.509 certificates for each Graylog node, you have to add all of them into the JVM trust store of each Graylog node.

The default trust store of an installed Java runtime environment can be found at $JAVA_HOME/jre/lib/security/cacerts. In order not to “pollute” the official trust store, we make a copy of it which we will use with Graylog instead:

$ cp -a "${JAVA_HOME}/jre/lib/security/cacerts" /path/to/cacerts.jks

After the original key store file has been copied, we can add the self-signed certificate (cert.pem, see Creating a self-signed private key/certificate) to the key store (the default password is changeit):

$ keytool -importcert -keystore /path/to/cacerts.jks -storepass changeit -alias graylog-self-signed -file cert.pem
Owner: CN=graylog.example.com, O="Graylog, Inc.", L=Hamburg, ST=Hamburg, C=DE
Issuer: CN=graylog.example.com, O="Graylog, Inc.", L=Hamburg, ST=Hamburg, C=DE
Serial number: 8c80134cee556734
Valid from: Tue Jun 14 16:38:17 CEST 2016 until: Wed Jun 14 16:38:17 CEST 2017
Certificate fingerprints:
       MD5:  69:D1:B3:01:46:0D:E9:45:FB:C6:6C:69:EA:38:ED:3E
       SHA1: F0:64:D0:1B:3B:6B:C8:01:D5:4D:33:36:87:F0:FB:10:E1:36:21:9E
       SHA256: F7:F2:73:3D:86:DC:10:22:1D:14:B8:5D:66:B4:EB:48:FD:3D:74:89:EC:C4:DF:D0:D2:EC:F8:5D:78:49:E7:2F
       Signature algorithm name: SHA1withRSA
       Version: 3

Extensions:

[Other details about the certificate...]

Trust this certificate? [no]:  yes
Certificate was added to keystore

To verify that the self-signed certificate has indeed been added, it can be listed with the following command:

$ keytool -keystore /path/to/cacerts.jks -storepass changeit -list | grep graylog-self-signed -A1
graylog-self-signed, Jun 14, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): F0:64:D0:1B:3B:6B:C8:01:D5:4D:33:36:87:F0:FB:10:E1:36:21:9E

The printed certificate fingerprint (SHA1) should match the one printed when importing the self-signed certificate.

In order for the JVM to pick up the new trust store, it has to be started with the JVM parameter -Djavax.net.ssl.trustStore=/path/to/cacerts.jks. If you are using a password other than the default changeit to encrypt the JVM trust store, you additionally have to set the JVM parameter -Djavax.net.ssl.trustStorePassword=secret.

Most start and init scripts for Graylog provide a JAVA_OPTS variable which can be used to pass the javax.net.ssl.trustStore and (optionally) javax.net.ssl.trustStorePassword system properties.
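
On a DEB installation, this could look like the following sketch (variable name and file location are assumptions based on the default packages, see Default file locations):

# /etc/default/graylog-server
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Djavax.net.ssl.trustStore=/path/to/cacerts.jks -Djavax.net.ssl.trustStorePassword=secret"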

Note

The default location to change the JVM parameter depends on your installation type and is documented with all other default locations.

Warning

Without adding the previously created Java keystore to the JVM parameters, Graylog won’t be able to verify any self-signed certificates or custom CA certificates.

Disabling specific TLS ciphers and algorithms

Since Java 7u76 it is possible to disable specific TLS algorithms and ciphers for secure connections.

In order to disable specific TLS algorithms and ciphers, you need to provide a properties file with a list of disabled algorithms and ciphers. Take a look at the example security.properties in the Graylog source repository.

For example, if you want to disable all algorithms except for TLS 1.2, the properties file has to contain the following line:

jdk.tls.disabledAlgorithms=SSLv2Hello, SSLv3, TLSv1, TLSv1.1

If you additionally want to disable DSA/RSA key sizes lower than 2048 bits and EC key sizes lower than 160 bits, the properties file has to contain the following line:

jdk.tls.disabledAlgorithms=SSLv2Hello, SSLv3, TLSv1, TLSv1.1, EC keySize < 160, RSA keySize < 2048, DSA keySize < 2048

To load the properties file into a JVM, you have to pass it to Java using the java.security.properties system property:

java -Djava.security.properties=/path/to/security.properties -jar /path/to/graylog.jar server

Most start and init scripts for Graylog provide a JAVA_OPTS variable which can be used to pass the java.security.properties system property.
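
Analogous to the trust store example above, a sketch for a DEB installation (variable name and file location are assumptions):

# /etc/default/graylog-server
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Djava.security.properties=/path/to/security.properties"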

Multi-node Setup

This guide doesn’t provide a step-by-step tutorial for building a multi-node Graylog cluster but does simply give some advice for questions that might arise during the setup.

It’s important for such a project that you understand each step in the setup process and do some planning upfront. Without a proper roadmap of all the things you want to achieve with a Graylog cluster, you will be lost on the way.

Graylog should be the last component you install in this setup. Its dependencies, namely MongoDB and Elasticsearch, have to be up and running first.

Important

This guide doesn’t include instructions for running a multi-node Graylog cluster in an untrusted network. We assume that the connection between the hosts is trusted and doesn’t have to be secured individually.

Prerequisites

Every server which is part of this setup should have the software requirements installed to run the targeted software. All software requirements can be found in the installation manual.

We highly recommend that the system time on all systems is kept in sync via NTP or a similar mechanism. Needless to say, DNS resolution must be working too, because everything is a freaking DNS problem.

In order to simplify the installation process, the servers should have a working Internet connection.

MongoDB replica set

We recommend deploying a MongoDB replica set.

MongoDB doesn’t have to run on dedicated servers for the workload generated by Graylog, but you should follow the recommendations given in the MongoDB documentation about architecture. Most important is that you have an odd number of MongoDB servers in the replica set.

In most setups, each Graylog server will also host an instance of MongoDB which is part of the same replica set and shares the data with all other nodes in the cluster.

Note

To avoid unauthorized access to your MongoDB database, the MongoDB replica set should be set up with authentication.

The correct order of steps is as follows:

  1. Create the replica set (rs01)
  2. Create the database (graylog)
  3. Create a user account for accessing the database, which has the roles readWrite and dbAdmin.

If your MongoDB needs to be reachable over the network, you should set the IP with bind_ip in the configuration.
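
A minimal sketch of steps 2 and 3 in the mongo shell, assuming the replica set rs01 has already been initiated (user name and password are placeholders):

$ mongo
> use graylog
> db.createUser({
    user: "graylog",
    pwd: "secret",
    roles: [ "readWrite", "dbAdmin" ]
  })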

Elasticsearch cluster

The Elasticsearch setup documentation should help you to install Elasticsearch with a robust base configuration.

It is important not to name the Elasticsearch cluster simply elasticsearch, to avoid accidental conflicts with Elasticsearch nodes using the default configuration. Because elasticsearch is the default name, any Elasticsearch instance started in the same network with the default configuration will try to connect to this cluster. Just choose anything else (we recommend graylog).

The Elasticsearch servers need one IP that can be reached over the network set in network.host, and some participants of the cluster listed in discovery.zen.ping.unicast.hosts. That is enough to have a minimal cluster setup.
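
A minimal elasticsearch.yml for one node of such a cluster could look like this (cluster name as recommended above; IP addresses are examples):

cluster.name: graylog
network.host: 192.168.0.20
discovery.zen.ping.unicast.hosts: ["192.168.0.20", "192.168.0.21", "192.168.0.22"]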

If you secure your Elasticsearch cluster with user authentication, you need to add the credentials to the Graylog configuration to be able to use the secured Elasticsearch cluster with Graylog.

Graylog Multi-node

After the installation of Graylog, you should take care that only one Graylog node is configured to be master with the configuration setting is_master = true.

The URI configured in rest_listen_uri (or rest_transport_uri) must be accessible to all Graylog nodes of the cluster.
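
A sketch of the relevant server.conf settings (IP addresses are examples):

# On the master node
is_master = true
rest_listen_uri = http://192.168.0.10:9000/api/

# On every other node
is_master = false
rest_listen_uri = http://192.168.0.11:9000/api/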

Graylog to MongoDB connection

The mongodb_uri configuration setting must include all MongoDB nodes forming the replica set, the name of the replica set, as well as the previously configured user account with access to the replica set. The configuration setting is a normal MongoDB connection string.

Finally, the MongoDB connection string in the Graylog configuration file should look like this:

mongodb_uri = mongodb://USERNAME:PASSWORD@mongodb-node01:27017,mongodb-node02:27017,mongodb-node03:27017/graylog?replicaSet=rs01
Graylog to Elasticsearch connection

Graylog will connect to the Elasticsearch REST API.

To avoid issues with the connection to the Elasticsearch cluster, you should add some of the network addresses of the Elasticsearch nodes to elasticsearch_hosts.
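
For example (hostnames are placeholders):

elasticsearch_hosts = http://es-node01:9200,http://es-node02:9200,http://es-node03:9200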

Graylog web interface

By default, the web interface can be used on every instance of Graylog which hasn’t disabled it with the configuration setting web_enable = false.

It’s possible to use a loadbalancer in front of all Graylog servers, please refer to Making the web interface work with load balancers/proxies for more details.

Depending on your setup, you can use a hardware load balancer for TLS/HTTPS termination, a reverse proxy, or simply enable TLS in the Graylog node itself.

Scaling

Each component in this multi-node setup can be scaled according to your individual needs.

Depending on the amount of messages ingested and how long messages should be available for direct search, the Elasticsearch cluster will need most of the resources in your setup.

Keep an eye on the metrics of each part of the cluster. One option is to use telegraf to fetch important metrics and store them in your favorite metric system (e. g. Graphite, Prometheus or Influx).

Elasticsearch metrics and some administration tasks can be handled with Elastic HQ or Cerebro. Those will help you to understand the Elasticsearch cluster health and behavior.

Graylog Metrics can be monitored with the Graylog Metrics Reporter plugins which are able to send the internal Graylog metrics to your favorite metrics collector (e. g. Graphite or Prometheus).

Up until today, we have almost never faced the issue that the MongoDB replica set needed special attention. But of course you should still monitor it and store its metrics - just to be sure.

Troubleshooting

  • After every configuration change or service restart, watch the log file of the application you have worked on. Sometimes other log files can also give you hints about what went wrong. For example, if you're configuring Graylog and trying to find out why the connection to MongoDB isn't working, the MongoDB logs can help to identify the problem.
  • If HTTPS has been enabled for the Graylog REST API, it needs to be set up for the Graylog web interface, too.

Elasticsearch

We strongly recommend using a dedicated Elasticsearch cluster for your Graylog setup.

If you are using a shared Elasticsearch setup, a problem with indices unrelated to Graylog might turn the cluster status to YELLOW or RED and impact the availability and performance of your Graylog setup.

Elasticsearch versions

Starting with version 2.3, Graylog uses the HTTP protocol to connect to your Elasticsearch cluster, so it does not have a hard requirement for the Elasticsearch version anymore. We can safely assume that any version starting from 2.x is working.

Caution

Graylog 2.4 does not work with Elasticsearch 6.x yet!

Note

Graylog works fine with the Amazon Elasticsearch Service using Elasticsearch 5.3.x or later.

Configuration

Caution

As Graylog has switched from an embedded Elasticsearch node client to a lightweight HTTP client in version 2.3, please check the upgrade notes on how to migrate your configuration if you are switching from an earlier version.

Graylog

The most important setting to make a successful connection is a list of comma-separated URIs to one or more Elasticsearch nodes. Graylog needs to know the address of at least one other Elasticsearch node given in the elasticsearch_hosts setting. The specified value should at least contain the scheme (http:// for unencrypted, https:// for encrypted connections), the hostname or IP and the port of the HTTP listener (which is 9200 unless otherwise configured) of this node. Optionally, you can also specify an authentication section containing a user name and a password, if either your Elasticsearch node uses Shield/X-Pack or Search Guard, or you have an intermediate HTTP proxy requiring authentication in between the Graylog server and the Elasticsearch node. Additionally you can specify an optional path prefix at the end of the URI.

A sample specification of elasticsearch_hosts could look like this:

elasticsearch_hosts = http://es-node-1.example.org:9200/foo,https://someuser:somepassword@es-node-2.example.org:19200

Caution

Graylog assumes that all nodes in the cluster are running the same version of Elasticsearch. While it might work when patch levels differ, we highly encourage you to keep versions consistent.

Warning

Graylog does not react to externally triggered index changes (creating/closing/reopening/deleting an index) anymore. All of these actions need to be performed through the Graylog REST API in order to retain index consistency.

Available Elasticsearch configuration tunables

The following configuration options are used to configure connectivity to Elasticsearch:

Config Setting                                | Type      | Comments                                                         | Default
elasticsearch_connect_timeout                 | Duration  | Timeout when connecting to individual Elasticsearch hosts       | 10s (10 seconds)
elasticsearch_hosts                           | List<URI> | Comma-separated list of URIs of Elasticsearch hosts             | http://127.0.0.1:9200
elasticsearch_idle_timeout                    | Duration  | Timeout after which idle connections are terminated             | -1s (never)
elasticsearch_max_total_connections           | int       | Maximum number of total Elasticsearch connections               | 20
elasticsearch_max_total_connections_per_route | int       | Maximum number of Elasticsearch connections per route/host      | 2
elasticsearch_socket_timeout                  | Duration  | Timeout when sending/receiving from an Elasticsearch connection | 60s (60 seconds)
elasticsearch_discovery_enabled               | boolean   | Enable automatic Elasticsearch node discovery                   | false
elasticsearch_discovery_filter                | String    | Filter by node attributes for the discovered nodes              | empty (use all nodes)
elasticsearch_discovery_frequency             | Duration  | Frequency of the Elasticsearch node discovery                   | 30s (30 seconds)
elasticsearch_compression_enabled             | boolean   | Enable GZIP compression of Elasticsearch request payloads       | false

Automatic node discovery

Caution

Authentication with the Elasticsearch cluster will not work if the automatic node discovery is being used.

Caution

Automatic node discovery does not work when using the Amazon Elasticsearch Service because Amazon blocks certain Elasticsearch API endpoints.

Graylog uses automatic node discovery to gather a list of all available Elasticsearch nodes in the cluster at runtime and distribute requests among them to potentially increase performance and availability. To enable this feature, you need to set elasticsearch_discovery_enabled to true. Optionally, you can define a filter to selectively include/exclude discovered nodes (details on how to specify node filters are found in the Elasticsearch documentation) using the elasticsearch_discovery_filter setting, or tune the frequency of the node discovery using the elasticsearch_discovery_frequency configuration option.

Configuration of Elasticsearch nodes
Control access to Elasticsearch ports

If you are not using Shield/X-Pack or Search Guard to authenticate access to your Elasticsearch nodes, make sure to restrict access to the Elasticsearch ports (default: 9200/tcp and 9300/tcp). Otherwise the data is readable by anyone who has access to the machine over the network.

Open file limits

Because Elasticsearch has to keep a lot of files open simultaneously, it requires a higher open file limit than the usual operating system defaults allow. Set it to at least 64000 open file descriptors.

Graylog will show a notification in the web interface when there is a node in the Elasticsearch cluster whose open file limit is too low.

Read about how to raise the open file limit in the corresponding 2.x / 5.x documentation pages.

Heap size

It is strongly recommended to raise the standard size of heap memory allocated to Elasticsearch. Just set the ES_HEAP_SIZE environment variable to, for example, 24g to allocate 24 GB. We recommend using around 50% of the available system memory for Elasticsearch (when running on a dedicated host) to leave enough space for the system caches, which Elasticsearch uses heavily. But please take care that you don't cross 32 GB!
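
A sketch for Elasticsearch 2.x installed from the DEB package (Elasticsearch 5.x configures the heap in jvm.options instead; 24g assumes a dedicated host with roughly 48 GB of RAM):

# /etc/default/elasticsearch
ES_HEAP_SIZE=24g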

Merge throttling

Elasticsearch throttles the merging of Lucene segments to allow extremely fast searches. This throttling, however, has default values that are very conservative and can lead to slow ingestion rates when used with Graylog. You would see the message journal growing without a real indication of CPU or memory stress on the Elasticsearch nodes. It usually goes along with Elasticsearch INFO log messages like this:

now throttling indexing

When running on fast IO like SSDs or a SAN, we recommend increasing the value of indices.store.throttle.max_bytes_per_sec in your elasticsearch.yml to 150MB:

indices.store.throttle.max_bytes_per_sec: 150mb

Play around with this setting until you reach the best performance.

Tuning Elasticsearch

Graylog already sets a specific configuration for every index it manages. This is sufficient tuning for many use cases and setups.

More detailed information about the configuration of Elasticsearch can be found in the official documentation.

Avoiding split-brain and shard shuffling

Split-brain events

Elasticsearch sacrifices consistency in order to ensure availability and partition tolerance. The reasoning behind that is that short periods of misbehaviour are less problematic than short periods of unavailability. In other words, when Elasticsearch nodes in a cluster are unable to replicate changes to data, they will keep serving applications such as Graylog. When the nodes are able to replicate their data again, they will attempt to converge the replicas and to achieve eventual consistency.

Elasticsearch tackles this problem by electing master nodes, which are in charge of database operations such as creating new indices, moving shards around the cluster nodes, and so forth. Master nodes coordinate their actions actively with others, ensuring that the data can be converged by non-masters. The cluster nodes that are not master nodes are not allowed to make changes that would break the cluster.

The previous mechanism can in some circumstances fail, causing a split-brain event. When an Elasticsearch cluster is split into two sides, both thinking they are the master, data consistency is lost as the masters work independently on the data. As a result, the nodes will respond differently to the same queries. This is considered a catastrophic event, because the data from two masters cannot be rejoined automatically, and it takes quite a bit of manual work to remedy the situation.

Avoiding split-brain events

Elasticsearch nodes take a simple majority vote over who is master. If the majority agrees that they are the master, then most likely the disconnected minority has also come to the conclusion that they cannot be the master, and everything is just fine. However, this mechanism requires at least 3 nodes to work reliably, because one or two nodes cannot form a majority.

The minimum amount of master nodes required to elect a master must be configured manually in elasticsearch.yml:

# At least NODES/2+1 on clusters with NODES > 2, where NODES is the number of master nodes in the cluster
discovery.zen.minimum_master_nodes: 2

Typical configuration values would be, for example:

Master nodes | minimum_master_nodes | Comments
1            | 1                    |
2            | 1                    | With 2, the other node going down would stop the cluster from working!
3            | 2                    |
4            | 3                    |
5            | 3                    |
6            | 4                    |

Some of the master nodes may be dedicated master nodes, meaning they are configured just to handle lightweight operational (cluster management) responsibilities. They will not handle or store any of the cluster's data. The function of such nodes is similar to so-called witness servers on other database products, and setting them up on dedicated witness sites will greatly reduce the chance of Elasticsearch cluster instability.

A dedicated master node has the following configuration in elasticsearch.yml:

node.data: false
node.master: true
Shard shuffling

When the cluster status changes, for example because of node restarts or availability issues, Elasticsearch will automatically start rebalancing the data in the cluster. The cluster works on making sure that the amount of shards and replicas conforms to the cluster configuration. This is a problem if the status changes are just temporary. Moving shards and replicas around in the cluster takes a considerable amount of resources, and should only be done when necessary.

Avoiding unnecessary shuffling

Elasticsearch has a couple of configuration options which are designed to allow short periods of unavailability before starting the recovery process with shard shuffling. There are 3 settings that may be configured in elasticsearch.yml:

# Recover only after the given number of nodes have joined the cluster. Can be seen as "minimum number of nodes to attempt recovery at all".
gateway.recover_after_nodes: 8
# Time to wait for additional nodes after recover_after_nodes is met.
gateway.recover_after_time: 5m
# Inform Elasticsearch how many nodes form a full cluster. If this number is met, start up immediately.
gateway.expected_nodes: 10

The configuration options should be set up so that only minimal node unavailability is tolerated. For example, server restarts are common and should be done in a managed manner. The logic is that if you lose a large part of your cluster, you probably should start re-shuffling the shards and replicas without tolerating the situation.

Custom index mappings

Sometimes it’s useful to not rely on Elasticsearch’s dynamic mapping but to define a stricter schema for messages.

Note

If the index mapping conflicts with the actual message to be sent to Elasticsearch, indexing that message will fail.

Graylog itself uses a default mapping which includes settings for the timestamp, message, full_message, and source fields of indexed messages:

$ curl -X GET 'http://localhost:9200/_template/graylog-internal?pretty'
{
  "graylog-internal" : {
    "order" : -2147483648,
    "template" : "graylog_*",
    "settings" : { },
    "mappings" : {
      "message" : {
        "_ttl" : {
          "enabled" : true
        },
        "_source" : {
          "enabled" : true
        },
        "dynamic_templates" : [ {
          "internal_fields" : {
            "mapping" : {
              "index" : "not_analyzed",
              "type" : "string"
            },
            "match" : "gl2_*"
          }
        }, {
          "store_generic" : {
            "mapping" : {
              "index" : "not_analyzed"
            },
            "match" : "*"
          }
        } ],
        "properties" : {
          "full_message" : {
            "analyzer" : "standard",
            "index" : "analyzed",
            "type" : "string"
          },
          "streams" : {
            "index" : "not_analyzed",
            "type" : "string"
          },
          "source" : {
            "analyzer" : "analyzer_keyword",
            "index" : "analyzed",
            "type" : "string"
          },
          "message" : {
            "analyzer" : "standard",
            "index" : "analyzed",
            "type" : "string"
          },
          "timestamp" : {
            "format" : "yyyy-MM-dd HH:mm:ss.SSS",
            "type" : "date"
          }
        }
      }
    },
    "aliases" : { }
  }
}

In order to extend the default mapping of Elasticsearch and Graylog, you can create one or more custom index mappings and add them as index templates to Elasticsearch.

Let’s say we have a schema for our data like the following:

Field Name         | Field Type | Example
http_method        | string     | GET
http_response_code | long       | 200
ingest_time        | date       | 2016-06-13T15:00:51.927Z
took_ms            | long       | 56

This would translate to the following additional index mapping in Elasticsearch:

"mappings" : {
  "message" : {
    "properties" : {
      "http_method" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "http_response_code" : {
        "type" : "long"
      },
      "ingest_time" : {
        "type" : "date",
        "format": "strict_date_time"
      },
      "took_ms" : {
        "type" : "long"
      }
    }
  }
}

The format of the ingest_time field is described in the Elasticsearch documentation about the format mapping parameter. Also make sure to check the Elasticsearch documentation about Field datatypes.

In order to apply the additional index mapping when Graylog creates a new index in Elasticsearch, it has to be added to an index template. The Graylog default template (graylog-internal) has the lowest priority and will be merged with the custom index template by Elasticsearch.

Warning

If the default index mapping and the custom index mapping cannot be merged (e. g. because of conflicting field datatypes), Elasticsearch will throw an exception and won’t create the index. So be extremely cautious and conservative about the custom index mappings!

Creating a new index template

Save the following index template for the custom index mapping into a file named graylog-custom-mapping.json:

{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "http_method" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "http_response_code" : {
          "type" : "long"
        },
        "ingest_time" : {
          "type" : "date",
          "format": "strict_date_time"
        },
        "took_ms" : {
          "type" : "long"
        }
      }
    }
  }
}

Finally, load the index mapping into Elasticsearch with the following command:

$ curl -X PUT -d @'graylog-custom-mapping.json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

Every Elasticsearch index created from that time on will have an index mapping consisting of the original graylog-internal index template and the new graylog-custom-mapping template:

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_2" : {
    "mappings" : {
      "message" : {
        "_ttl" : {
          "enabled" : true
        },
        "dynamic_templates" : [ {
          "internal_fields" : {
            "mapping" : {
              "index" : "not_analyzed",
              "type" : "string"
            },
            "match" : "gl2_*"
          }
        }, {
          "store_generic" : {
            "mapping" : {
              "index" : "not_analyzed"
            },
            "match" : "*"
          }
        } ],
        "properties" : {
          "full_message" : {
            "type" : "string",
            "analyzer" : "standard"
          },
          "http_method" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "http_response_code" : {
            "type" : "long"
          },
          "ingest_time" : {
            "type" : "date",
            "format" : "strict_date_time"
          },
          "message" : {
            "type" : "string",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "string",
            "analyzer" : "analyzer_keyword"
          },
          "streams" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          },
          "took_ms" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

Note

When using different index sets, every index set can have its own mapping.

Deleting custom index templates

If you want to remove an existing index template from Elasticsearch, simply issue a DELETE request to Elasticsearch:

$ curl -X DELETE 'http://localhost:9200/_template/graylog-custom-mapping?pretty'
{
  "acknowledged" : true
}

After you’ve removed the index template, new indices will only have the original index mapping:

$ curl -X GET 'http://localhost:9200/graylog_deflector/_mapping?pretty'
{
  "graylog_3" : {
    "mappings" : {
      "message" : {
        "_ttl" : {
          "enabled" : true
        },
        "dynamic_templates" : [ {
          "internal_fields" : {
            "mapping" : {
              "index" : "not_analyzed",
              "type" : "string"
            },
            "match" : "gl2_*"
          }
        }, {
          "store_generic" : {
            "mapping" : {
              "index" : "not_analyzed"
            },
            "match" : "*"
          }
        } ],
        "properties" : {
          "full_message" : {
            "type" : "string",
            "analyzer" : "standard"
          },
          "message" : {
            "type" : "string",
            "analyzer" : "standard"
          },
          "source" : {
            "type" : "string",
            "analyzer" : "analyzer_keyword"
          },
          "streams" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}

Note

Settings and index mappings in templates are only applied to new indices. After adding, modifying, or deleting an index template, you have to manually rotate the write-active indices of your index sets for the changes to take effect.

Rotate indices manually

Select the desired index set on the System / Indices page in the Graylog web interface by clicking on the name of the index set, then select “Rotate active write index” from the “Maintenance” dropdown menu.

_images/rotate_index_1.png _images/rotate_index_2.png

Cluster Status explained

Elasticsearch provides a classification for the cluster health.

The cluster status applies to different levels:

  • Shard level - see status descriptions below
  • Index level - inherits the status of the worst shard status
  • Cluster level - inherits the status of the worst index status

That means that the Elasticsearch cluster status can turn red if a single index or shard has problems even though the rest of the indices/shards are okay.
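
The current cluster status can be queried with the Elasticsearch cluster health API (host is an example):

$ curl -X GET 'http://localhost:9200/_cluster/health?pretty'

The status field of the response will contain green, yellow, or red.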

Note

Graylog checks the status of the current write index while indexing messages. If that one is GREEN or YELLOW, Graylog will continue to write messages into Elasticsearch regardless of the overall cluster status.

Explanation of the different status levels:

RED

The RED status indicates that some or all of the primary shards are not available.

In this state, no searches can be performed until all primary shards have been restored.

YELLOW

The YELLOW status means that all of the primary shards are available but some or all shard replicas are not.

With only one Elasticsearch node, the cluster state cannot become green because shard replicas cannot be assigned.

In most cases, this can be solved by adding another Elasticsearch node to the cluster or by reducing the replication factor of the indices (which means less resiliency against node outages, though).

GREEN

The cluster is fully operational. All primary and replica shards are available.

Index model

Overview

Graylog is transparently managing one or more sets of Elasticsearch indices to optimize search and analysis operations for speed and low resource consumption.

To enable managing indices with different mappings, analyzers, and replication settings, Graylog uses so-called index sets, which are an abstraction of all these settings.

_images/index_set_overview.png

Each index set contains the necessary settings for Graylog to create, manage, and fill Elasticsearch indices and handle index rotation and data retention for specific requirements.

_images/index_set_details.png

Graylog maintains an index alias per index set which always points to the current write-active index of that index set. There is always exactly one index to which new messages are written, until the configured rotation criterion (number of documents, index size, or index age) has been met.

A background task continuously checks whether the rotation criterion of an index set has been met, and a new index is created and prepared when that happens. Once the index is ready, the index alias is atomically switched to it. That means that all Graylog nodes can write messages into the alias without even knowing what the currently write-active index of the index set is.
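
To see which index the alias currently points to, you can ask Elasticsearch directly (a sketch; the alias and index names follow the examples used elsewhere in this document):

$ curl -X GET 'http://localhost:9200/_alias/graylog_deflector?pretty'
{
  "graylog_96" : {
    "aliases" : {
      "graylog_deflector" : { }
    }
  }
}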

_images/index_model_write.png

Almost every read operation is performed with a given time range. Because Graylog writes messages sequentially into Elasticsearch, it can keep information about the time range each index covers. When a time range is provided, it selects a list of indices to query. If no time range was provided, it will search in all indices it knows.

_images/index_model_read.png
Eviction of indices and messages

There are configuration settings for the maximum number of indices Graylog is managing in a given index set.

Depending on the configured retention strategy, the oldest indices of an index set will automatically be closed, deleted, or exported when the configured maximum number of indices has been reached.

The deletion is performed by the Graylog master node in a background thread which continuously compares the number of indices with the configured maximum:

INFO : org.graylog2.indexer.rotation.strategies.AbstractRotationStrategy - Deflector index <graylog_95> should be rotated, Pointing deflector to new index now!
INFO : org.graylog2.indexer.MongoIndexSet - Cycling from <graylog_95> to <graylog_96>.
INFO : org.graylog2.indexer.MongoIndexSet - Creating target index <graylog_96>.
INFO : org.graylog2.indexer.indices.Indices - Created Graylog index template "graylog-internal" in Elasticsearch.
INFO : org.graylog2.indexer.MongoIndexSet - Waiting for allocation of index <graylog_96>.
INFO : org.graylog2.indexer.MongoIndexSet - Index <graylog_96> has been successfully allocated.
INFO : org.graylog2.indexer.MongoIndexSet - Pointing index alias <graylog_deflector> to new index <graylog_96>.
INFO : org.graylog2.system.jobs.SystemJobManager - Submitted SystemJob <f1018ae0-dcaa-11e6-97c3-6c4008b8fc28> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
INFO : org.graylog2.indexer.MongoIndexSet - Successfully pointed index alias <graylog_deflector> to index <graylog_96>.

Index Set Configuration

Index sets have a variety of different settings related to how Graylog will store messages into the Elasticsearch cluster.

_images/index_set_create.png
  • Title: A descriptive name of the index set.
  • Description: A description of the index set for human consumption.
  • Index prefix: A unique prefix used for Elasticsearch indices managed by the index set. The prefix must start with a letter or number, and can only contain letters, numbers, _, - and +. The index alias will be named accordingly, e. g. graylog_deflector if the index prefix was graylog.
  • Analyzer: (default: standard) The Elasticsearch analyzer for the index set.
  • Index shards: (default: 4) The number of Elasticsearch shards used per index.
  • Index replicas: (default: 0) The number of Elasticsearch replicas used per index.
  • Max. number of segments: (default: 1) The maximum number of segments per Elasticsearch index after index optimization (force merge), see Segment Merging for details.
  • Disable index optimization after rotation: Disable Elasticsearch index optimization (force merge) after index rotation. Only activate this if you have serious problems with the performance of your Elasticsearch cluster during the optimization process.
Index rotation
  • Message count: Rotates the index after a specific number of messages have been written.
  • Index size: Rotates the index after an approximate size on disk (before optimization) has been reached.
  • Index time: Rotates the index after a specific time (e. g. 1 hour or 1 week).
_images/index_set_create_rotation.png
Index retention
  • Delete: Delete indices in Elasticsearch to minimize resource consumption.
  • Close: Close indices in Elasticsearch to reduce resource consumption.
  • Do nothing
  • Archive: Commercial feature, see Archiving.
_images/index_set_create_retention.png

Maintenance

Keeping the index ranges in sync

Graylog will take care of calculating index ranges automatically as soon as a new index has been created.

In case the stored metadata about index time ranges has run out of sync, Graylog will notify you in the web interface. This can happen if an index was deleted manually or messages from already “closed” indices were removed.

The system will offer to simply re-generate all time range information. This may take a few seconds but is an easy task for Graylog.

You can easily re-build the information yourself after manually deleting indices or doing other changes that might cause synchronization problems:

$ curl -XPOST http://127.0.0.1:9000/api/system/indices/ranges/rebuild

This will trigger a system job:

INFO : org.graylog2.indexer.ranges.RebuildIndexRangesJob - Recalculating index ranges.
INFO : org.graylog2.system.jobs.SystemJobManager - Submitted SystemJob <9b64a9d0-dcac-11e6-97c3-6c4008b8fc28> [org.graylog2.indexer.ranges.RebuildIndexRangesJob]
INFO : org.graylog2.indexer.ranges.RebuildIndexRangesJob - Recalculating index ranges for index set Default index set (graylog2_*): 5 indices affected.
INFO : org.graylog2.indexer.ranges.MongoIndexRangeService - Calculated range of [graylog_96] in [7ms].
INFO : org.graylog2.indexer.ranges.RebuildIndexRangesJob - Created ranges for index graylog_96: MongoIndexRange{id=null, indexName=graylog_96, begin=2017-01-17T11:49:02.529Z, end=2017-01-17T12:00:01.492Z, calculatedAt=2017-01-17T12:00:58.097Z, calculationDuration=7, streamIds=[000000000000000000000001]}
[...]
INFO : org.graylog2.indexer.ranges.RebuildIndexRangesJob - Done calculating index ranges for 5 indices. Took 44ms.
INFO : org.graylog2.system.jobs.SystemJobManager - SystemJob <9b64a9d0-dcac-11e6-97c3-6c4008b8fc28> [org.graylog2.indexer.ranges.RebuildIndexRangesJob] finished in 46ms.
Manually rotating the active write index

Sometimes you might want to rotate the active write index manually and not wait until the configured rotation criterion for the latest index has been met, for example if you've changed the index mapping or the number of shards per index.

You can do this either via an HTTP request against the REST API of the Graylog master node or via the web interface:

$ curl -XPOST http://127.0.0.1:9000/api/system/deflector/cycle
_images/index_set_maintenance.png

Triggering this job produces log output similar to the following lines:

INFO : org.graylog2.rest.resources.system.DeflectorResource - Cycling deflector for index set <58501f0b4a133077ecd134d9>. Reason: REST request.
INFO : org.graylog2.indexer.MongoIndexSet - Cycling from <graylog_97> to <graylog_98>.
INFO : org.graylog2.indexer.MongoIndexSet - Creating target index <graylog_98>.
INFO : org.graylog2.indexer.indices.Indices - Created Graylog index template "graylog-internal" in Elasticsearch.
INFO : org.graylog2.indexer.MongoIndexSet - Waiting for allocation of index <graylog_98>.
INFO : org.graylog2.indexer.MongoIndexSet - Index <graylog_98> has been successfully allocated.
INFO : org.graylog2.indexer.MongoIndexSet - Pointing index alias <graylog_deflector> to new index <graylog_98>.
INFO : org.graylog2.system.jobs.SystemJobManager - Submitted SystemJob <024aac80-dcad-11e6-97c3-6c4008b8fc28> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
INFO : org.graylog2.indexer.MongoIndexSet - Successfully pointed index alias <graylog_deflector> to index <graylog_98>.
INFO : org.graylog2.indexer.retention.strategies.AbstractIndexCountBasedRetentionStrategy - Number of indices (5) higher than limit (4). Running retention for 1 index.
INFO : org.graylog2.indexer.retention.strategies.AbstractIndexCountBasedRetentionStrategy - Running retention strategy [org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy] for index <graylog_94>
INFO : org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy - Finished index retention strategy [delete] for index <graylog_94> in 23ms.

Backup

When it comes to backups in a Graylog setup, there is no single answer. You need to consider what type of backup will suit your needs.

Your Graylog server setup and settings are easy to back up with a MongoDB dump and a filesystem backup of all configuration files.

The data within your Elasticsearch cluster can take advantage of the Snapshot and Restore functionality offered by Elasticsearch.
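
A sketch of both parts (repository name, paths, and database name are assumptions; the filesystem snapshot repository requires path.repo to be set in elasticsearch.yml):

# Register a filesystem snapshot repository in Elasticsearch
$ curl -X PUT 'http://localhost:9200/_snapshot/graylog_backup' -d '{
  "type" : "fs",
  "settings" : { "location" : "/mnt/backups/elasticsearch" }
}'

# Create a snapshot of all indices in that repository
$ curl -X PUT 'http://localhost:9200/_snapshot/graylog_backup/snapshot_1?wait_for_completion=true'

# Dump the Graylog configuration database from MongoDB
$ mongodump --db graylog --out /mnt/backups/mongodb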

Default file locations

Each installation flavor of Graylog will place the configuration files into a specific location on the local file system. The goal of this section is to provide a short overview of the most common and most important default file locations.

DEB package

This paragraph covers Graylog installations on Ubuntu Linux, Debian Linux, and Debian derivatives installed with the DEB package.

Graylog
                      | File system path
Configuration         | /etc/graylog/server/server.conf
Logging configuration | /etc/graylog/server/log4j2.xml
Plugins               | /usr/share/graylog-server/plugin
JVM settings          | /etc/default/graylog-server
Message journal files | /var/lib/graylog-server/journal
Log files             | /var/log/graylog-server/

Elasticsearch

Note

These are only the most common file locations. Please refer to the Elasticsearch documentation for a comprehensive list of default file locations.

              | File system path
Configuration | /etc/elasticsearch
JVM settings  | /etc/default/elasticsearch
Data files    | /var/lib/elasticsearch/data
Log files     | /var/log/elasticsearch/

MongoDB

                  File system path
Configuration     /etc/mongod.conf
Data files        /var/lib/mongodb/
Log files         /var/log/mongodb/

RPM package

This paragraph covers Graylog installations on Fedora Linux, Red Hat Enterprise Linux, CentOS Linux, and other Red Hat Linux derivatives installed with the RPM package.

Graylog

                        File system path
Configuration           /etc/graylog/server/server.conf
Logging configuration   /etc/graylog/server/log4j2.xml
Plugins                 /usr/share/graylog-server/plugin
JVM settings            /etc/sysconfig/graylog-server
Message journal files   /var/lib/graylog-server/journal
Log files               /var/log/graylog-server/

Elasticsearch

Note

These are only the most common file locations. Please refer to the Elasticsearch documentation for a comprehensive list of default file locations.

                  File system path
Configuration     /etc/elasticsearch
JVM settings      /etc/sysconfig/elasticsearch
Data files        /var/lib/elasticsearch/
Log files         /var/log/elasticsearch/

MongoDB

                  File system path
Configuration     /etc/mongod.conf
Data files        /var/lib/mongodb/
Log files         /var/log/mongodb/

Omnibus package

This paragraph covers Graylog installations via OVA, on AWS (via AMI), and on OpenStack using the Graylog Omnibus package.

Graylog

                        File system path
Configuration           /opt/graylog/conf/graylog.conf
Logging configuration   /opt/graylog/conf/log4j2.xml
Plugins                 /opt/graylog/plugin
JVM settings            /etc/graylog/graylog-settings.json
Message journal files   /var/opt/graylog/data/journal
Log files               /var/log/graylog/server/

Elasticsearch

Note

These are only the most common file locations. Please refer to the Elasticsearch documentation for a comprehensive list of default file locations.

                  File system path
Configuration     /opt/graylog/conf/elasticsearch/
JVM settings      /etc/graylog/graylog-settings.json
Data files        /var/opt/graylog/data/elasticsearch
Log files         /var/log/graylog/elasticsearch/

MongoDB

                  File system path
Configuration     /etc/graylog/graylog-settings.json
Data files        /var/opt/graylog/data/mongodb
Log files         /var/log/graylog/mongodb/

Graylog REST API

The Graylog REST API is very comprehensive; even the Graylog web interface exclusively uses the Graylog REST API to interact with the Graylog cluster.

To connect to the Graylog REST API with a web browser, just add api-browser to your current rest_listen_uri setting or use the API browser button on the nodes overview page (System / Nodes in the web interface).

For example if your Graylog REST API is listening on http://192.168.178.26:9000/api/, the API browser will be available at http://192.168.178.26:9000/api/api-browser/.

_images/system_nodes_overview.png

Note

The customized version of Swagger UI used by Graylog currently only works in Google Chrome and Firefox.

Using the API browser

After providing the credentials (username and password), you can browse all available HTTP resources of the Graylog REST API.

_images/use_api_browser.png

Interacting with the Graylog REST API

While having a graphical UI for the Graylog REST API is perfect for interactive usage and exploratory learning, the real power unfolds when using the Graylog REST API for automation or integrating Graylog into another system, such as monitoring or ticket systems.

Naturally, the same operations the API browser offers can be used on the command line or in scripts. A very common HTTP client for this kind of interaction is curl.

Note

In the following examples, the username GM and password superpower will be used to demonstrate how to work with the Graylog REST API running at http://192.168.178.26:9000/api.

The following command displays Graylog cluster information as JSON, exactly the same information the web interface is displaying on the System / Nodes page:

curl -u GM:superpower -H 'Accept: application/json' -X GET 'http://192.168.178.26:9000/api/cluster?pretty=true'

The Graylog REST API will respond with the following information:

{
  "71ab6aaa-cb39-46be-9dac-4ba99fed3d66" : {
    "facility" : "graylog-server",
    "codename" : "Smuttynose",
    "node_id" : "71ab6aaa-cb39-46be-9dac-4ba99fed3d66",
    "cluster_id" : "3adaf799-1551-4239-84e5-6ed939b56f62",
    "version" : "2.1.1+01d50e5",
    "started_at" : "2016-09-23T10:39:00.179Z",
    "hostname" : "gm-01-c.fritz.box",
    "lifecycle" : "running",
    "lb_status" : "alive",
    "timezone" : "Europe/Berlin",
    "operating_system" : "Linux 3.10.0-327.28.3.el7.x86_64",
    "is_processing" : true
  },
  "ed0ad32d-8776-4d25-be2f-a8956ecebdcf" : {
    "facility" : "graylog-server",
    "codename" : "Smuttynose",
    "node_id" : "ed0ad32d-8776-4d25-be2f-a8956ecebdcf",
    "cluster_id" : "3adaf799-1551-4239-84e5-6ed939b56f62",
    "version" : "2.1.1+01d50e5",
    "started_at" : "2016-09-23T10:40:07.325Z",
    "hostname" : "gm-01-d.fritz.box",
    "lifecycle" : "running",
    "lb_status" : "alive",
    "timezone" : "Europe/Berlin",
    "operating_system" : "Linux 3.16.0-4-amd64",
    "is_processing" : true
  },
  "58c57924-808a-4fa7-be09-63ca551628cd" : {
    "facility" : "graylog-server",
    "codename" : "Smuttynose",
    "node_id" : "58c57924-808a-4fa7-be09-63ca551628cd",
    "cluster_id" : "3adaf799-1551-4239-84e5-6ed939b56f62",
    "version" : "2.1.1+01d50e5",
    "started_at" : "2016-09-30T13:31:39.051Z",
    "hostname" : "gm-01-u.fritz.box",
    "lifecycle" : "running",
    "lb_status" : "alive",
    "timezone" : "Europe/Berlin",
    "operating_system" : "Linux 4.4.0-36-generic",
    "is_processing" : true
  }
}

Creating and using Access Tokens

For security reasons, using the username and password directly on the command line or in some third party application is undesirable.

To prevent having to use the clear text credentials, Graylog allows you to create access tokens which can be used for authentication instead.

In order to create a new access token, you need to send a POST request to the Graylog REST API which includes the username and the name of the new access token.

Note

Users require the permissions users:tokenlist, users:tokencreate, and users:tokenremove to manage their access tokens. Please check the documentation on Permission system for more information. Also note that users, even administrators, may only manage their own tokens.

The following example will create an access token named icinga for the user GM:

curl -u GM:superpower -H 'Accept: application/json' -X POST 'http://192.168.178.26:9000/api/users/GM/tokens/icinga?pretty=true'

The response will include the access token in the token field:

{
   "name" : "icinga",
   "token" : "htgi84ut7jpivsrcldd6l4lmcigvfauldm99ofcb4hsfcvdgsru",
   "last_access" : "1970-01-01T00:00:00.000Z"
}

The received access token can now be used as username in a request to the Graylog REST API using Basic Auth together with the literal password token.

Now the first curl example would look as follows:

curl -u htgi84ut7jpivsrcldd6l4lmcigvfauldm99ofcb4hsfcvdgsru:token -H 'Accept: application/json' -X GET 'http://192.168.178.26:9000/api/cluster?pretty=true'

If you need to know which access tokens have already been created by a user, just use GET /users/{username}/tokens/ on the Graylog REST API to request a list of all access tokens that are present for this user.

The following example will request all access tokens of the user GM:

curl -u GM:superpower -H 'Accept: application/json' -X GET 'http://192.168.178.26:9000/api/users/GM/tokens/?pretty=true'

When an access token is no longer needed, it can be deleted via the Graylog REST API using DELETE /users/{username}/tokens/{token}.

The following example deletes the previously created access token htgi84ut7jpivsrcldd6l4lmcigvfauldm99ofcb4hsfcvdgsru of the user GM:

curl -u GM:superpower -H 'Accept: application/json' -X DELETE 'http://192.168.178.26:9000/api/users/GM/tokens/htgi84ut7jpivsrcldd6l4lmcigvfauldm99ofcb4hsfcvdgsru?pretty=true'

Creating and using Session Tokens

While access tokens can be used for permanent access, session tokens will expire after a certain time. The expiration time can be adjusted in the user’s profile.

A new session token can be obtained via a POST request to the Graylog REST API. Username and password are required to get a valid session ID. The following example will create a session token for the user GM:

curl -i -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' 'http://192.168.178.26:9000/api/system/sessions' -d '{"username":"GM", "password":"superpower", "host":""}'

The response will include the session token in the field session_id and the time of expiration:

{
    "valid_until" : "2016-10-24T16:08:57.854+0000",
    "session_id" : "cf1df45c-53ea-446c-8ed7-e1df64861de7"
}

The received token can now be used as username in a request to the Graylog REST API using Basic Auth together with the literal password session.

Now the first curl example would look as follows:

curl -u cf1df45c-53ea-446c-8ed7-e1df64861de7:session -H 'Accept: application/json' -X GET 'http://192.168.178.26:9000/api/cluster?pretty=true'

Securing Graylog

To secure your Graylog setup, you should not use one of our pre-configured images. Instead, create your own unique installation where you understand each component and secure the environment by design. Expose only the services that are needed and secure them whenever possible with TLS/SSL and some kind of authentication. Do not use the pre-created appliances for critical production environments.

On the Graylog appliances, MongoDB and Elasticsearch are listening on the external interface. This makes the creation of a cluster easier and demonstrates the way Graylog works. Never run this in an insecure network.
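
On a single-host setup where all components run on the same machine, you can instead bind MongoDB and Elasticsearch to the loopback interface. A minimal sketch, assuming a YAML-style mongod.conf and the default configuration file locations described above:

# /etc/mongod.conf
net:
  bindIp: 127.0.0.1

# /etc/elasticsearch/elasticsearch.yml
network.host: 127.0.0.1

Note that multi-node clusters need these services to be reachable over the network, so there you should restrict access with firewall rules instead.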

When using Amazon Web Services and our pre-configured AMI, never open all ports in the security group. Do not expose the server to the internet. Access Graylog only from within your VPC. Enable encryption for the communication.

Default ports

All parts of a Graylog installation communicate over network sockets. Depending on your setup and the number of nodes, these might be exposed externally or can be bound to localhost. The SELinux configuration is already covered in our step-by-step guide for CentOS Linux.

Default network communication ports

Component                         Port
Graylog (web interface / API)     9000 (tcp)
Graylog to Elasticsearch          9200 (tcp)
Elasticsearch node communication  9300 (tcp)
MongoDB                           27017 (tcp)

Each setup is unique in its requirements, and ports might be changed by configuration, but you should limit who is able to connect to which service. The architecture description shows which components need to be exposed and how they communicate with each other.
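
For example, on a CentOS system with firewalld you could restrict access to the web interface/API port to an administrative subnet (10.0.0.0/24 below is only an example):

$ sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/24" port port="9000" protocol="tcp" accept'
$ sudo firewall-cmd --reload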

Configuring TLS ciphers

When running Graylog in untrusted environments such as the Internet, we strongly recommend using SSL/TLS for all connections.

It’s possible to disable unsafe or deprecated TLS ciphers in Graylog. When using nginx or Apache httpd for SSL termination, the Mozilla SSL Configuration Generator will help to create a reasonably secure configuration for them.
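
For example, a generator-produced configuration for nginx restricts protocols and ciphers with directives like the following; treat the exact values as placeholders and generate a current set yourself:

ssl_protocols TLSv1.2;
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256';
ssl_prefer_server_ciphers on;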

Sending in log data

A Graylog setup is pretty worthless without any data in it. This page explains the basic principles of getting your data into the system and also points out common pitfalls.

What are Graylog message inputs?

Message inputs are the Graylog parts responsible for accepting log messages. They are launched and configured from the web interface (or the REST API) in the System -> Inputs section, without the need to restart any part of the system.

Inputs can collect either structured or unstructured data. The type of message input you use will be determined by the format of the event sent by the source/device.

For example, a device sending strictly RFC 5424 or RFC 3164 compliant syslog messages should use a syslog message input. On the other hand, many devices send non-compliant messages, despite sending data on the same port (514) and using the same transport method (UDP).

To be handled properly by Graylog, these non-compliant sources will require a plain-text message input. Please note that events arriving from plain-text sources will not be parsed by default. In order to parse these messages, please configure a Processing Pipeline.
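
As an illustration, a pipeline rule to pull a field out of such a plain-text message could look like the following sketch. The field name and the regular expression are made up for an Apache-style access log line:

rule "extract HTTP status code"
when
  has_field("message")
then
  // regex() returns the capture groups as a map keyed "0", "1", ...
  let matches = regex("\\s(\\d{3})\\s", to_string($message.message));
  set_field("http_status", matches["0"]);
end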

Content packs

Content packs are bundles of Graylog input, extractor, stream, dashboard, and output configurations that can provide full support for a data source. Some content packs are shipped with Graylog by default and some are available from the website. Content packs that were downloaded from the Graylog Marketplace can be imported using the Graylog web interface.

You can load and even create your own content packs from the System -> Content Packs section of your Graylog web interface.

List of Elements Supported in Content Packs

  • Inputs
  • Grok Patterns
  • Outputs
  • Streams
  • Dashboards
  • Lookup Tables
  • Lookup Caches
  • Lookup Data Adapters

Syslog

Graylog is able to accept and parse RFC 5424 and RFC 3164 compliant syslog messages and supports TCP transport with both the octet counting and termination character framing methods. UDP is also supported and is the recommended way to send log messages in most architectures.
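
To illustrate the two TCP framing methods (see RFC 6587), the same RFC 5424 message framed both ways looks like this:

# Octet counting: the message is prefixed with its length in bytes
47 <13>1 2017-01-17T15:30:00Z host app - - - hello

# Termination character: the same message, simply ended with a newline
<13>1 2017-01-17T15:30:00Z host app - - - hello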

Many devices, especially routers and firewalls, do not send RFC compliant syslog messages. This can result in incorrect parsing or complete parsing failures. In that case you might have to use a combination of raw/plaintext message inputs that do not attempt any parsing, and Extractors.

As a rule of thumb, messages forwarded by rsyslog or syslog-ng are usually parsed flawlessly.

Sending syslog from Linux hosts

Sending syslog data from Linux hosts is described on the Graylog Marketplace.

Sending syslog from MacOS X hosts

Sending log messages from MacOS X syslog daemons is easy. Just define a graylog-server instance as UDP log target by adding this line in your /etc/syslog.conf:

*.* @graylog.example.org:514

Now restart syslogd:

$ sudo launchctl unload /System/Library/LaunchDaemons/com.apple.syslogd.plist
$ sudo launchctl load /System/Library/LaunchDaemons/com.apple.syslogd.plist

Important: If syslogd was running as another user you might end up with multiple syslogd instances and strange behavior of the whole system. Please check that only one syslogd process is running:

$ ps aux | grep syslog
lennart         58775   0.0  0.0  2432768    592 s004  S+    6:10PM   0:00.00 grep syslog
root            58759   0.0  0.0  2478772   1020   ??  Ss    6:09PM   0:00.01 /usr/sbin/syslogd

That’s it! Your MacOS X syslog messages should now appear in your Graylog system.

GELF / Sending from applications

The Graylog Extended Log Format (GELF) is a log format that avoids the shortcomings of classic plain syslog and is perfect for logging from your application layer. It comes with optional compression, chunking, and, most importantly, a clearly defined structure. There are dozens of GELF libraries for many frameworks and programming languages to get you started.

Read more about GELF in the specification.
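
For illustration, a minimal GELF payload is a JSON document like the following. version, host, and short_message are required; any number of additional fields may be added with a leading underscore (_user_id here is just an example):

{
  "version": "1.1",
  "host": "example.org",
  "short_message": "A short message that helps you identify what is going on",
  "level": 5,
  "_user_id": 9001
}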

GELF via HTTP

You can send in all GELF types via HTTP, including uncompressed GELF that is just a plain JSON string.

After launching a GELF HTTP input you can use the following endpoints to send messages:

http://graylog.example.org:[port]/gelf (POST)

Try sending an example message using curl:

curl -XPOST http://graylog.example.org:12202/gelf -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'

Both keep-alive and compression are supported via the common HTTP headers. The server will return a 202 Accepted when the message was accepted for processing.
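
For example, a gzip-compressed payload can be announced with the standard Content-Encoding header (message.json below is assumed to contain a GELF document like the one above):

$ gzip -c message.json > message.json.gz
$ curl -XPOST -H 'Content-Encoding: gzip' --data-binary @message.json.gz http://graylog.example.org:12202/gelf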

Using Apache Kafka as transport queue

Graylog supports Apache Kafka as a transport for various inputs such as GELF, syslog, and Raw/Plaintext inputs. The Kafka topic can be filtered by a regular expression and depending on the input, various additional settings can be configured.

Learn how to use rsyslog and Apache Kafka in the Sending syslog via Kafka into Graylog guide.

Using RabbitMQ (AMQP) as transport queue

Graylog supports AMQP as a transport for various inputs such as GELF, syslog, and Raw/Plaintext inputs. It can connect to any AMQP broker supporting AMQP 0-9-1 such as RabbitMQ.

Learn how to use rsyslog and RabbitMQ in the Sending syslog via AMQP into Graylog guide.

Microsoft Windows

Sending syslog data from Windows is described on the Graylog Marketplace.

Heroku

Heroku allows you to forward the logs of your application to a custom syslog server by creating a so-called Syslog drain. The drain sends all logs to the configured server(s) via TCP. The following example shows you how to configure Graylog to receive the Heroku logs and extract the different fields into a structured log message.

Configuring Graylog to receive Heroku log messages

The Graylog Marketplace contains a content pack for Heroku logs, including extractors to parse the Heroku log format. You can download and use that content pack to configure Graylog to be able to receive Heroku logs.

Go to System -> Content packs, and click on Import content pack. Select the content pack downloaded from the Graylog Marketplace, and click Upload.

_images/heroku_1.png

On the same page, select Heroku in the SaaS category on the left column, and click on Apply.

_images/heroku_2.png

That’s it! You can verify that there is a new input for Heroku, containing a set of extractors to parse your log messages. Make sure your firewall setup allows incoming connections on the inputs port!

_images/heroku_3.png

Configuring Heroku to send data to your Graylog setup

Heroku has detailed documentation regarding the Syslog drains feature. The following example shows everything that is needed to set up the drain for your application:

$ cd path/to/your/heroku/app
$ heroku drains
No drains for this app
$ heroku drains:add syslog://graylog.example.com:5556
Successfully added drain syslog://graylog.example.com:5556
$ heroku drains
syslog://graylog.example.com:5556 (d.8cf52d32-7d79-4653-baad-8cb72bb23ee1)

The Heroku CLI tool needs to be installed for this to work.

Your Heroku application logs should now show up in the search results of your Graylog instance.

Ruby on Rails

This is easy: You just need to combine a few components.

Log all requests and logger calls into Graylog

Sending structured information about every request (i.e. HTTP return code, action, controller, ... in additional fields), as well as explicit Rails.logger calls, is easily accomplished using the GELF gem and lograge. Lograge builds one combined log entry for every request (instead of several lines like the standard Rails logger) and has had a Graylog output since version 0.2.0.

Start by adding Lograge and the GELF gem to your Gemfile:

gem "gelf"
gem "lograge"

Now configure both in your Rails application. Usually config/environments/production.rb is a good place for that:

config.lograge.enabled = true
config.lograge.formatter = Lograge::Formatters::Graylog2.new
config.logger = GELF::Logger.new("graylog.example.org", 12201, "WAN", { :host => "hostname-of-this-app", :facility => "heroku" })

This configuration will also send all explicit Rails.logger calls (e.g. Rails.logger.error "Something went wrong") to Graylog.

Log only explicit logger calls into Graylog

If you don’t want to log information about every request, but only explicit Rails.logger calls, it is enough to only configure the Rails logger.

Add the GELF gem to your Gemfile:

gem "gelf"

...and configure it in your Rails application. Usually config/environments/production.rb is a good place for that:

config.logger = GELF::Logger.new("graylog.example.org", 12201, "WAN", { :host => "hostname-of-this-app", :facility => "heroku" })

Heroku

You need to apply a workaround if you want custom logging on Heroku. The reason for this is that Heroku injects its own logger (rails_log_stdout), which overwrites your custom one. The workaround is to add a file that makes Heroku think that the logger is already in your application:

$ touch vendor/plugins/rails_log_stdout/heroku_fix

Raw/Plaintext inputs

The built-in raw/plaintext inputs allow you to parse any text that you can send via TCP or UDP. No parsing is applied by default until you build your own parser using custom Extractors. This is a good way to support any text-based logging format.
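
For a quick test, assuming a raw/plaintext TCP input listening on port 5555 (the port is just an example), you can push a single line with netcat:

$ echo "Hi, this is a raw message" | nc graylog.example.org 5555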

You can also write Plugins if you need extreme flexibility.

JSON path from HTTP API input

The JSON path from HTTP API input reads any JSON response of a REST resource and stores a field value of it as a Graylog message.

Example

Let’s try to read the download count of a release package stored on GitHub for analysis in Graylog. The call looks like this:

$ curl -XGET https://api.github.com/repos/YourAccount/YourRepo/releases/assets/12345
{
  "url": "https://api.github.com/repos/YourAccount/YourRepo/releases/assets/12345",
  "id": 12345,
  "name": "somerelease.tgz",
  "label": "somerelease.tgz",
  "content_type": "application/octet-stream",
  "state": "uploaded",
  "size": 38179285,
  "download_count": 9937,
  "created_at": "2013-09-30T20:05:01Z",
  "updated_at": "2013-09-30T20:05:46Z"
}

The attribute we want to extract is download_count so we set the JSON path to $.download_count.

This will result in a message in Graylog looking like this:

_images/jsonpath_1.png

You can use Graylog to analyze your download counts now.

JSONPath

JSONPath can do much more than selecting a simple known field value. For example, you can do this to select the first download_count from a list of releases where the field state has the value uploaded:

$.releases[?(@.state == 'uploaded')][0].download_count

...or only the first download count at all:

$.releases[0].download_count

You can learn more about JSONPath here.

Reading from files

Log files come in a lot of different flavors and formats, much more than any single program could handle.

To support this use case, we provide the Collector Sidecar, which acts as a supervisor process for other programs, such as NXLog and Filebeat, which have been built specifically to collect log messages from local files and ship them to remote systems like Graylog.

Of course you can still use any program supporting the GELF or syslog protocol (among others) to send your logs to Graylog.

Graylog Collector Sidecar

Graylog Collector Sidecar is a lightweight configuration management system for different log collectors, also called Backends. The Graylog node(s) act as a centralized hub containing the configurations of log collectors. On supported message-producing devices/hosts, Sidecar can run as a service (Windows host) or daemon (Linux host).

_images/sidecar_overview.png

These configurations are centrally managed through the Graylog web interface, in a graphical way. For specific needs, raw backend configurations, called Snippets, may optionally be directly stored into Graylog.

Periodically, the Sidecar daemon will fetch all relevant configurations for the target, using the REST API. Which configurations are actually fetched depends on ‘tags’ defined in the host’s Sidecar configuration file. For instance, a Web server host may include the linux and nginx tags.

On its first run, or when a configuration change has been detected, Sidecar will generate (render) relevant backend configuration files. Then it will start, or restart, those reconfigured log collectors.

Graylog Collector Sidecar (written in Go) and the backends (written in various languages, such as C and Go) are meant as a small-footprint replacement for the deprecated, Java-based Graylog Collector.

Backends

Currently the Sidecar supports NXLog, Filebeat and Winlogbeat. They all share the same web interface. Switch the tab on a configuration page to create resources for the collector in use. The supported features are almost the same. For all collectors a GELF output with SSL encryption is available. The most commonly used input options, like file tailing or Windows event logging, exist as well. On the server side you can share inputs with multiple collectors, e.g. all Filebeat and Winlogbeat instances can send logs into a single Graylog Beats input.

Installation

Currently we provide pre-compiled packages on the GitHub releases page of the project. Once the Sidecar project has settled and matured, we will add the packages to the DEB and YUM online repositories. To get the Sidecar working, download a package and install it on the target system.

Please follow the version matrix to pick the right package:

Sidecar version   Graylog server version
0.0.9             2.1.x
0.1.x             2.2.x, 2.3.x, 2.4.x

All of the following commands should be executed on the remote machine you want to collect log data from.

Beats backend

Ubuntu

The Beats binaries (Filebeat and Winlogbeat) are included in the Sidecar package, so installation is just one command:

$ sudo dpkg -i collector-sidecar_0.0.9-1_amd64.deb

Edit /etc/graylog/collector-sidecar/collector_sidecar.yml; you should set at least the correct URL to your Graylog server and proper tags. The tags are used to define which configurations the host should receive.

Create a system service and start it:

$ sudo graylog-collector-sidecar -service install

[Ubuntu 14.04 with Upstart]
$ sudo start collector-sidecar

[Ubuntu 16.04 with Systemd]
$ sudo systemctl start collector-sidecar

CentOS

Install the RPM package on Red Hat based systems:

$ sudo rpm -i collector-sidecar-0.0.9-1.x86_64.rpm

Activate the Sidecar as a system service:

$ sudo graylog-collector-sidecar -service install
$ sudo systemctl start collector-sidecar

Windows

Use the Windows installer; it can be run interactively:

$ collector_sidecar_installer.exe

or in silent mode with:

$ collector_sidecar_installer.exe /S -SERVERURL=http://10.0.2.2:9000/api -TAGS="windows,iis"

Edit C:\Program Files\graylog\collector-sidecar\collector_sidecar.yml and register the system service:

$ "C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe" -service install
$ "C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe" -service start

NXLog backend

Ubuntu

Install the NXLog package from the official download page. Because the Sidecar takes control of stopping and starting NXLog, it’s necessary to stop all running instances of NXLog and deconfigure the default system service. Afterwards we can install and set up the Sidecar:

$ sudo /etc/init.d/nxlog stop
$ sudo update-rc.d -f nxlog remove
$ sudo gpasswd -a nxlog adm
$ sudo chown -R nxlog.nxlog /var/spool/collector-sidecar/nxlog

$ sudo dpkg -i collector-sidecar_0.0.9-1_amd64.deb

Edit /etc/graylog/collector-sidecar/collector_sidecar.yml accordingly and register the Sidecar as a service:

$ sudo graylog-collector-sidecar -service install

[Ubuntu 14.04 with Upstart]
$ sudo start collector-sidecar

[Ubuntu 16.04 with Systemd]
$ sudo systemctl start collector-sidecar

CentOS

The same on a Red Hat based system:

$ sudo service nxlog stop
$ sudo chkconfig --del nxlog
$ sudo gpasswd -a nxlog root
$ sudo chown -R nxlog.nxlog /var/spool/collector-sidecar/nxlog

$ sudo rpm -i collector-sidecar-0.0.9-1.x86_64.rpm

Activate the Sidecar as a system service:

$ sudo graylog-collector-sidecar -service install
$ sudo systemctl start collector-sidecar

Windows

Install the NXLog package from the official download page and deactivate the system service. We just need the binaries installed on the system:

$ "C:\Program Files (x86)\nxlog\nxlog" -u

$ collector_sidecar_installer.exe

Edit C:\Program Files\graylog\collector-sidecar\collector_sidecar.yml, you should set at least the correct URL to your Graylog server and proper tags. Register the system service:

$ "C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe" -service install
$ "C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe" -service start

To perform an uninstall on Windows:

$ "C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe" -service stop
$ "C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe" -service uninstall

Notice that the NXLog file input is currently not able to do a SavePos for file tailing; this will be fixed in a future version.

Configuration

On the command line you can provide a path to the configuration file with the -c switch. If no path is specified, the Sidecar looks on Linux systems for:

/etc/graylog/collector-sidecar/collector_sidecar.yml

and on Windows machines under:

C:\Program Files\graylog\collector-sidecar\collector_sidecar.yml

The configuration file is separated into global options and backend specific options. Global options are:

Parameter           Description
server_url          URL to the Graylog API, e.g. http://127.0.0.1:9000/api/
update_interval     The interval in seconds in which the Sidecar fetches new configurations from the Graylog server
tls_skip_verify     Ignore errors when the REST API was started with a self-signed certificate
send_status         Send the status of each backend back to Graylog and display it on the status page for the host
list_log_files      Send a directory listing to Graylog and display it on the host status page, e.g. /var/log. This can also be a list of directories
node_id             Name of the Sidecar instance; it will also show up in the web interface. The hostname is used if not set
collector_id        Unique ID (UUID) of the instance. This can be a string or a path to an ID file
log_path            A path to a directory where the Sidecar can store the output of each running collector backend
log_rotation_time   Rotate the stdout and stderr logs of each collector after X seconds
log_max_age         Delete rotated log files older than Y seconds
tags                List of configuration tags. All configurations on the server side that match the tag list will be fetched and merged by this instance
backends            A list of collector backends the user wants to run on the target host

Currently NXLog and Beats are supported as collector backends. To make them work, the Sidecar needs to know where the binary is installed and where it can write a configuration file for it.

Parameter            Description
name                 Which backend to use (must be ‘nxlog’, ‘filebeat’ or ‘winlogbeat’)
enabled              Whether this backend should be started by the Sidecar or not
binary_path          Path to the actual collector binary
configuration_path   Path to the configuration file for this collector

An example configuration for NXLog looks like this:

server_url: http://10.0.2.2:9000/api/
update_interval: 30
tls_skip_verify: true
send_status: true
list_log_files:
  - /var/log
node_id: graylog-collector-sidecar
collector_id: file:/etc/graylog/collector-sidecar/collector-id
log_path: /var/log/graylog/collector-sidecar
log_rotation_time: 86400
log_max_age: 604800
tags:
  - linux
  - apache
  - redis
backends:
    - name: nxlog
      enabled: true
      binary_path: /usr/bin/nxlog
      configuration_path: /etc/graylog/collector-sidecar/generated/nxlog.conf

For the Beats platform you can enable each Beat individually. E.g. on a Windows host with Filebeat and Winlogbeat enabled, use a configuration like this:

server_url: http://10.0.2.2:9000/api/
update_interval: 30
tls_skip_verify: true
send_status: true
list_log_files:
node_id: graylog-collector-sidecar
collector_id: file:C:\Program Files\graylog\collector-sidecar\collector-id
cache_path: C:\Program Files\graylog\collector-sidecar\cache
log_path: C:\Program Files\graylog\collector-sidecar\logs
log_rotation_time: 86400
log_max_age: 604800
tags:
  - windows
  - apache
  - redis
backends:
    - name: winlogbeat
      enabled: true
      binary_path: C:\Program Files\graylog\collector-sidecar\winlogbeat.exe
      configuration_path: C:\Program Files\graylog\collector-sidecar\generated\winlogbeat.yml
    - name: filebeat
      enabled: true
      binary_path: C:\Program Files\graylog\collector-sidecar\filebeat.exe
      configuration_path: C:\Program Files\graylog\collector-sidecar\generated\filebeat.yml

On the server side the collector plugin caches the requested configurations in memory. By default, up to 100 entries are stored for 1 hour. If you wish to change that, add the following to your server configuration:

collector_sidecar_cache_time = 2h
collector_sidecar_cache_max_size = 500

First start

Once you have installed the Sidecar package, you are ready to start the service for the first time. Decide which backend you want to use, and enable or disable the individual backends by setting enabled: true or false, respectively. Now start the Sidecar; depending on your operating system you can do this with:

systemd   sudo systemctl start collector-sidecar
SysV      sudo start collector-sidecar
Windows   C:\Program Files\graylog\collector-sidecar\graylog-collector-sidecar.exe -service start

If you’re unsure which init system your Linux distribution is using, execute the following command to print the name of the used init system:

# ps -h -o comm -p 1

Otherwise please refer to the handbook of your Linux distribution and look up which init system is being used.

Afterwards you will most likely see an error like this in the log file:

INFO[0006] [RequestConfiguration] No configuration found for configured tags!

This simply means that there is no configuration with the same tag that the Sidecar was started with. So we have to create a new configuration, define outputs and inputs, and tag it in order to collect log files. Follow the Step-by-step guide to create your first configuration.

When the Sidecar finds a configuration that matches its own tags, it will write a configuration file for each collector backend into the generated directory. E.g. if you enabled the Filebeat collector, you will find a filebeat.yml file in that directory. All changes have to be made in the Graylog web interface. Every time the Sidecar detects an update to its configuration, it will rewrite the corresponding collector configuration file. So it doesn’t make sense to edit those files manually.
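
For example, with only the Filebeat backend enabled on a Linux host, a listing of that directory would look roughly like this:

$ ls /etc/graylog/collector-sidecar/generated/
filebeat.yml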

Every time a collector configuration file is changed, the collector process is restarted. The Sidecar takes care of the collector processes and reports their status back to the web interface.

Sidecar Status

Each Sidecar instance is able to send status information back to Graylog. By enabling the send_status option, metrics like the configured tags or the IP address of the host the Sidecar is running on are sent. Metrics that are relevant for stable operation, e.g. disk volumes over 75% utilization, are also included. Additionally, with the list_log_files option, a directory listing is displayed in the Graylog web interface, so that an administrator can see which files are available for collecting. The list is periodically updated, and files with write access are highlighted for easy identification. After enabling send_status, or send_status together with list_log_files, go to the collector overview and click on one of the collectors; a status page with the configured information will be displayed.

Step-by-step guide

We have prepared an example of how to configure the Sidecar using the Graylog web interface. The assumption is that we want to collect Apache log files and ship them with a Filebeat collector to a Beats input that is listening on port 5044 on your Graylog server.

  • The first step is to create a Beats input where collectors can send data to. Click on System / Inputs and start a global Beats input on the listening address 0.0.0.0 and port 5044.
_images/sidecar_sbs0.png
  • Navigate to the collector configurations. In your Graylog web interface click on System / Collectors / Manage configurations.
_images/sidecar_sbs1.png
  • Next we create a new configuration.
_images/sidecar_sbs2.png
  • Give the configuration a name.
_images/sidecar_sbs3.png
  • Click on the new configuration and create e.g. a Filebeat output. For a first test, just change the IP to that of your Graylog server.
_images/sidecar_sbs4.png
  • Create a Filebeat file input to collect the Apache access logs.
_images/sidecar_sbs5.png
  • Tag the configuration with the apache tag. Just write the tag name in the field, press Enter, and then click the Update tags button.
_images/sidecar_sbs6.png
  • When you now start the Sidecar with the apache tag, the output should look like this:
_images/sidecar_sbs7.png
  • Congratulations, your collector setup is now working!

Secure Sidecar Communication

The communication between Sidecar and Graylog will be secured if your API uses SSL.

To secure the communication between the collector and Graylog, you just need to mark Enable TLS in your Beats input. Without providing additional information, Graylog will create a self-signed certificate for this input. Now, in the Sidecar Beats output configuration, just mark Enable TLS Support and Insecure TLS connection. After this is saved, the communication between Beats and Graylog will use TLS.

If you prefer NXLog, you need to mark Allow untrusted certificate in the NXLog outputs configuration and Enable TLS for your GELF input.

Certificate based client authentication

If you want Graylog to accept data only from clients presenting valid certificates, you will need to build your own certificate authority and provide it to the input and the client output configuration.

Run Sidecar as non-root user

By default the Sidecar is started as the root user to allow access to all log files, but this is not mandatory. If you would like to start it as a daemon user, proceed as follows:

  • Create a daemon user, e.g. collector

The Sidecar itself accesses the following files and directories:

  • collector_sidecar.yml - /etc/graylog/collector-sidecar/collector_sidecar.yml
  • backend configuration_path - /etc/graylog/collector-sidecar/generated/
  • collector_id - /etc/graylog/collector-sidecar/collector-id
  • cache_path - /var/cache/graylog/collector-sidecar/
  • log_path - /var/log/graylog/collector-sidecar/

So to make these directories readable for the collector user, use:

  • chown -R collector /etc/graylog
  • chown -R collector /var/cache/graylog
  • chown -R collector /var/log/graylog

You can change all paths to different places in the filesystem. If you prefer to store all Sidecar data in the home directory of the collector user, just change the paths accordingly.

Now systemd needs to know that the Sidecar should be started as a non-root user. Open /etc/systemd/system/collector-sidecar.service with an editor, navigate to the [Service] section, and add:

User=collector
Group=collector

To make use of these settings reload systemd:

$ sudo systemctl daemon-reload
$ sudo systemctl restart collector-sidecar

Check the log files in /var/log/graylog/collector-sidecar for any errors. Understand that not only the Sidecar but also all backends, like Filebeat, will be started as the collector user after these changes. So all log files that the backends should observe also need to be readable by the collector user. Depending on the Linux distribution, there is usually an administrator group which has access to most log files. By adding the collector user to that group you can grant access fairly easily. For example, on Debian/Ubuntu systems this group is called adm (see System Groups in the Debian Wiki or Security/Privileges - Monitor system logs in the Ubuntu wiki).

Sidecar Glossary

The different parts of the Graylog Sidecar are explained in the following section.

Configuration

A collector configuration is an abstract representation of a collector configuration file. It contains one or more outputs, inputs and snippets. Based on the selected backend, the Sidecar will then render a working configuration file for the particular collector. To match a configuration to a Sidecar instance, both sides need to be started with the same tag. If the tags of a Sidecar instance match multiple configurations, all outputs, inputs and snippets are merged together into a single configuration.

Tags

Tags are used to match Sidecar instances with configurations on the Graylog server side. E.g. a user can create a configuration for Apache access log files and give it the tag apache. On all web servers running the Apache daemon, the Sidecar can then be started with the apache tag to fetch this configuration and collect the web access log files. There can be multiple tags on both the Sidecar and the Graylog server side, but to keep an overview, the administrator should use discrete tags on at least one side, so that the assignment is always 1:1 or 1:n.

Outputs

Outputs are used to send data from a collector back to the Graylog server. E.g. NXLog is able to send messages directly in the GELF format, so the natural fit is to create a GELF output in an NXLog configuration. Instructing NXLog to send GELF messages is of course only half of it; we also need a receiver for them. So an administrator needs to create a proper receiver under System / Inputs.

Inputs

Inputs are the way collectors ingest data. An input can be a log file that the collector should continuously read, or a connection to the Windows event system that emits log events. An input is connected to an output; otherwise there would be no way of sending the data to the next hop. So first create an output, then associate one or more inputs with it.

Snippets

Snippets are simply plain-text configuration fragments. Sometimes it’s not possible to represent the needed configuration through the provided system, e.g. a user would like to load a special collector module. She could put the directive into a snippet, which will be added to the final collector configuration without any modification. It’s also conceivable to put a full configuration file into a snippet and skip the entire input and output mechanism. Before the snippet is actually rendered into the configuration file, the Sidecar sends it through a template engine, using Go’s own text template engine. A usage of that can be seen in the nxlog-default snippet: it detects which operating system the Sidecar is running on and, depending on the result, changes the paths for some collector settings.

Actions

Resources like inputs, outputs or snippets all have the same actions: create, edit, and clone. Usually there are only small differences between certain configurations, so you can create a resource once, clone it, and modify only the fields you need. This way it’s possible to manage a fairly large number of configurations.

_images/sidecar_configuration.png

Debug

The Sidecar writes log files to the directory configured in log_path, one file for each backend; there you can check for general issues like file permissions or log transmission problems. The Sidecar itself writes to collector_sidecar.log; problems like a failed connection to the Graylog API can be found there.

You can also start the Sidecar in foreground and monitor the output of the process:

$ graylog-collector-sidecar -debug -c /etc/graylog/collector-sidecar/collector_sidecar.yml

Known Problems

Currently we know of two problems with NXLog:

  • Since version 2.9.17 timestamps are transmitted without millisecond precision
  • On Windows machines NXLog is not able to store its collector state, so features like file tailing don’t work correctly in combination with the Sidecar. Use Sidecar version 0.1.0-alpha.1 or newer.

Known issue if you use a load balancer or firewall in front of Graylog’s API:

  • The Sidecar uses a persistent connection for API requests. Therefore it logs 408 Request Time-out if the load balancer session or HTTP timeout is lower than the configured update_interval.

Graylog Collector (deprecated)

Warning

The Graylog Collector is deprecated and can be replaced with the Graylog Collector Sidecar.

Graylog Collector is a lightweight Java application that allows you to forward data from log files to a Graylog cluster. The collector can read local log files and also Windows events natively, and can then forward the log messages over the network using the GELF format.

Installation

Linux/Unix

You need to have Java >= 7 installed to run the collector.

Operating System Packages

We offer official package repositories for the following operating systems.

  • Ubuntu 12.04, 14.04
  • Debian 8
  • CentOS 7

Please open an issue in the GitHub repository if you run into any packaging-related issues. Thank you!

Ubuntu 14.04

Download and install graylog-collector-latest-repository-ubuntu14.04_latest.deb via dpkg(1) and also make sure that the apt-transport-https package is installed:

$ wget https://packages.graylog2.org/repo/packages/graylog-collector-latest-repository-ubuntu14.04_latest.deb
$ sudo dpkg -i graylog-collector-latest-repository-ubuntu14.04_latest.deb
$ sudo apt-get install apt-transport-https
$ sudo apt-get update
$ sudo apt-get install graylog-collector

Ubuntu 12.04

Download and install graylog-collector-latest-repository-ubuntu12.04_latest.deb via dpkg(1) and also make sure that the apt-transport-https package is installed:

$ wget https://packages.graylog2.org/repo/packages/graylog-collector-latest-repository-ubuntu12.04_latest.deb
$ sudo dpkg -i graylog-collector-latest-repository-ubuntu12.04_latest.deb
$ sudo apt-get install apt-transport-https
$ sudo apt-get update
$ sudo apt-get install graylog-collector

Debian 8

Download and install graylog-collector-latest-repository-debian8_latest.deb via dpkg(1) and also make sure that the apt-transport-https package is installed:

$ wget https://packages.graylog2.org/repo/packages/graylog-collector-latest-repository-debian8_latest.deb
$ sudo dpkg -i graylog-collector-latest-repository-debian8_latest.deb
$ sudo apt-get install apt-transport-https
$ sudo apt-get update
$ sudo apt-get install graylog-collector

CentOS 7

Download and install graylog-collector-latest-repository-el7_latest.rpm via rpm(8):

$ sudo rpm -Uvh https://packages.graylog2.org/repo/packages/graylog-collector-latest-repository-el7_latest.rpm
$ sudo yum install graylog-collector

Manual Setup

  1. Download the latest collector release (find download links in the collector repository README)
  2. Unzip collector tgz file to target location
  3. cp config/collector.conf.example to config/collector.conf
  4. Update server-url in collector.conf to correct Graylog server address (required for registration)
  5. Update file input configuration with the correct log files
  6. Update outputs->gelf-tcp with the correct Graylog server address (required for sending GELF messages)

Note: The collector will not start properly if you do not set the URL or the correct input log files and GELF output configuration.

Windows

You need to have Java >= 7 installed to run the collector.

Download a release zip file from the collector repository README. Unzip the collector zip file to target location.

_images/collector_win_install_1.png

Change into the extracted collector directory and create a collector configuration file in config\collector.conf.

_images/collector_win_install_2.png

The following configuration file shows a good starting point for Windows systems. It collects the Application, Security, and System event logs. Replace the <your-graylog-server-ip> with the IP address of your Graylog server.

Example:

server-url = "http://<your-graylog-server-ip>:9000/api/"

inputs {
  win-eventlog-application {
    type = "windows-eventlog"
    source-name = "Application"
    poll-interval = "1s"
  }
  win-eventlog-system {
    type = "windows-eventlog"
    source-name = "System"
    poll-interval = "1s"
  }
  win-eventlog-security {
    type = "windows-eventlog"
    source-name = "Security"
    poll-interval = "1s"
  }
}

outputs {
  gelf-tcp {
    type = "gelf"
    host = "<your-graylog-server-ip>"
    port = 12201
  }
}

Start a cmd.exe, change to the collector installation path, and execute the following commands to install the collector as a Windows service.

Commands:

C:\> cd graylog-collector-0.2.2
C:\graylog-collector-0.2.2> bin\graylog-collector-service.bat install GraylogCollector
C:\graylog-collector-0.2.2> bin\graylog-collector-service.bat start GraylogCollector
_images/collector_win_install_3.png

Configuration

You will need a configuration file before starting the collector. The configuration file is written in the HOCON format, which is a human-optimized version of JSON.

If you choose the operating system installation method, the configuration file defaults to /etc/graylog/collector/collector.conf. For the manual installation method you have to pass the path to the configuration file to the start script (see Running Graylog Collector).

Here is a minimal configuration example that collects logs from the /var/log/syslog file and sends them to a Graylog server:

server-url = "http://10.0.0.1:9000/api/"

inputs {
  syslog {
    type = "file"
    path = "/var/log/syslog"
  }
}

outputs {
  graylog-server {
    type = "gelf"
    host = "10.0.0.1"
    port = 12201
  }
}

There are a few global settings available as well as several sections which configure different subsystems of the collector.

Global Settings

server-url - The API URL of the Graylog server

Used to send a heartbeat to the Graylog server.

(default: "http://localhost:9000/api/")

enable-registration - Enable heartbeat registration

Enables the heartbeat registration with the Graylog server. The collector will not contact the Graylog server API for heartbeat registration if this is set to false.

(default: true)

collector-id - Unique collector ID setting

The ID used to identify this collector. Can be either a string which is used as ID, or the location of a file if prefixed with file:. If the file does not exist, an ID will be generated and written to that file. If it exists, it is expected to contain a single string without spaces which will be used for the ID.

(default: "file:config/collector-id")

Input Settings

The input settings need to be nested in an inputs { } block. Each input has an ID and a type:

inputs {
  syslog {         // => The input ID
    type = "file"  // => The input type
    ...
  }
}

An input ID needs to be unique among all configured inputs. If there are two inputs with the same ID, the last one wins.

The following input types are available.

File Input

The file input follows files in the file system and reads log data from them.

type
This needs to be set to "file".
path

The path to a file that should be followed.

Please make sure to escape the \ character in Windows paths: path = "C:\\Program Files\\Apache2\\logs\\www.example.com.access.log"

(default: none)

path-glob-root

The globbing root directory that should be monitored. See below for an explanation on globbing.

Please make sure to escape the \ character in Windows paths: path-glob-root = "C:\\Program Files\\Apache2\\logs"

(default: none)

path-glob-pattern

The globbing pattern. See below for an explanation on globbing.

(default: none)

content-splitter

The content splitter implementation that should be used to detect the end of a log message.

Available content splitters: NEWLINE, PATTERN

See below for an explanation on content splitters.

(default: "NEWLINE")

content-splitter-pattern

The pattern that should be used for the PATTERN content splitter.

(default: none)

charset

Charset of the content in the configured file(s).

Can be one of the Supported Charsets of the JVM.

(default: "UTF-8")

reader-interval

The interval in which the collector tries to read from every configured file. You might set this to a higher value like 1s if you have files which do not change very often to avoid unnecessary work.

(default: "100ms")

Globbing / Wildcards

You might want to configure the collector to read from lots of different files, or from files which have a different name each time they are rotated (i.e. a time/date in the filename). The file input supports this via the path-glob-root and path-glob-pattern settings.

A usual glob/wildcard string you know from other tools might be /var/log/apache2/**/*.{access,error}.log. This means you are interested in all log files whose names end with .access.log or .error.log and which are in a subdirectory of /var/log/apache2. Example: /var/log/apache2/example.com/www.example.com.access.log

For compatibility reasons you have to split this string into two parts: the root and the pattern.

Examples:

// /var/log/apache2/**/*.{access,error}.log
path-glob-root = "/var/log/apache2"
path-glob-pattern = "**/*.{access,error}.log"

// C:\Program Files\Apache2\logs\*.access.log
path-glob-root = "C:\\Program Files\\Apache2\\logs" // Make sure to escape the \ character in Windows paths!
path-glob-pattern = "*.access.log"

The file input will monitor the path-glob-root for new files and checks them against the path-glob-pattern to decide if they should be followed or not.

All available special characters for the glob pattern are documented in the Java docs for the getPathMatcher() method.

Content Splitter

One common problem when reading from plain text log files is deciding when a log message is complete. By default, the file input considers each line in a file to be a separate log message:

Jul 15 10:27:08 tumbler anacron[32426]: Job `cron.daily' terminated  # <-- Log message 1
Jul 15 10:27:08 tumbler anacron[32426]: Normal exit (1 job run)      # <-- Log message 2

But there are several cases where this is not correct. Java stack traces are a good example:

2015-07-10T11:16:34.486+01:00 WARN  [InputBufferImpl] Unable to process event RawMessageEvent{raw=null, uuid=bde580a0-26ec-11e5-9a46-005056b26ca9, encodedLength=350}, sequence 19847516
java.lang.NullPointerException
        at org.graylog2.shared.buffers.JournallingMessageHandler$Converter.apply(JournallingMessageHandler.java:89)
        at org.graylog2.shared.buffers.JournallingMessageHandler$Converter.apply(JournallingMessageHandler.java:72)
        at com.google.common.collect.Lists$TransformingRandomAccessList$1.transform(Lists.java:617)
        at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
        at java.util.ArrayList.<init>(ArrayList.java:177)
        at com.google.common.collect.Lists.newArrayList(Lists.java:144)
        at org.graylog2.shared.buffers.JournallingMessageHandler.onEvent(JournallingMessageHandler.java:61)
        at org.graylog2.shared.buffers.JournallingMessageHandler.onEvent(JournallingMessageHandler.java:36)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2015-07-10T11:18:18.000+01:00 WARN  [InputBufferImpl] Unable to process event RawMessageEvent{raw=null, uuid=bde580a0-26ec-11e5-9a46-005056b26ca9, encodedLength=350}, sequence 19847516
java.lang.NullPointerException
        ...
        ...

This should be one message but using a newline separator here will not work because it would generate one log message for each line.

To solve this problem, the file input can be configured to use a PATTERN content splitter. It creates separate log messages based on a regular expression instead of newline characters. A configuration for the stack trace example above could look like this:

inputs {
  graylog-server-logs {
    type = "file"
    path = "/var/log/graylog-server/server.log"
    content-splitter = "PATTERN"
    content-splitter-pattern = "^\\d{4}-\\d{2}-\\d{2}T" // Make sure to escape the \ character!
  }
}

This instructs the file input to split messages on a timestamp at the beginning of a line. So the first stack trace in the message above will be considered complete once a new timestamp is detected.

Windows Eventlog Input

The Windows eventlog input can read event logs from Windows systems.

type
This needs to be set to "windows-eventlog".
source-name

The Windows event log system has several different sources from which events can be read.

Common source names: Application, System, Security

(default: "Application")

poll-interval

This controls how often the Windows event log should be polled for new events.

(default: "1s")

Example:

inputs {
  win-eventlog-application {
    type = "windows-eventlog"
    source-name = "Application"
    poll-interval = "1s"
  }
}

Output Settings

The output settings need to be nested in an outputs { } block. Each output has an ID and a type:

outputs {
  graylog-server { // => The output ID
    type = "gelf"  // => The output type
    ...
  }
}

An output ID needs to be unique among all configured outputs. If there are two outputs with the same ID, the last one wins.

The following output types are available.

GELF Output

The GELF output sends log messages to a GELF TCP input on a Graylog server.

type
This needs to be set to "gelf".
host

Hostname or IP address of the Graylog server.

(default: none)

port

Port of the GELF TCP input on the Graylog server host.

(default: none)

client-tls
Enables TLS for the connection to the GELF TCP input. Requires a TLS-enabled GELF TCP input on the Graylog server. (default: false)
client-tls-cert-chain-file

Path to a TLS certificate chain file. If not set, the default certificate chain of the JVM will be used.

(default: none)

client-tls-verify-cert

Verify the TLS certificate of the GELF TCP input on the Graylog server.

You might have to disable this if you are using a self-signed certificate for the GELF input and do not have any certificate chain file.

(default: true)

client-queue-size

The GELF client library that is used for this output has an internal queue of messages. This option configures the size of this queue.

(default: 512)

client-connect-timeout

TCP connection timeout to the GELF input on the Graylog server, in milliseconds.

(default: 5000)

client-reconnect-delay

The delay, in milliseconds, before the output tries to reconnect to the GELF input on the Graylog server.

(default: 1000)

client-tcp-no-delay

Sets the TCP_NODELAY option on the TCP socket that connects to the GELF input.

(default: true)

client-send-buffer-size

Sets the TCP send buffer size for the connection to the GELF input.

It uses the JVM default for the operating system if set to -1.

(default: -1)
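
Putting these options together, a complete GELF output configuration could look like the following sketch (host, port, and certificate path are placeholder values):

outputs {
  graylog-server {
    type = "gelf"
    host = "graylog.example.org"
    port = 12201
    client-tls = true
    client-tls-cert-chain-file = "/etc/graylog-collector/ca-cert.pem" // placeholder path
    client-tls-verify-cert = true
    client-queue-size = 512
    client-connect-timeout = 5000
    client-reconnect-delay = 1000
    client-tcp-no-delay = true
    client-send-buffer-size = -1
  }
}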

STDOUT Output

The STDOUT output prints the string representation of each message to STDOUT. This can be useful for debugging purposes but should be disabled in production.

type
This needs to be set to "stdout".
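
For example, a minimal STDOUT output for debugging could look like this:

outputs {
  console {
    type = "stdout"
  }
}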

Static Message Fields

Sometimes it is useful to add a static field to a message. This can help with selecting extractors to run on the server, simplify stream routing, and make searching/filtering for those messages easier.

Every collector input can be configured with a message-fields option which takes key-value pairs. The key needs to be a string, the value can be a string or a number.

Example:

inputs {
  apache-logs {
    type = "file"
    path = "/var/log/apache2/access.log"
    message-fields = {
      "program" = "apache2"
      "priority" = 3
    }
  }
}

Each static message field will end up in the GELF message and show up in the web interface as a separate field.

An input might overwrite a message field defined in the input configuration. For example, the file input always sets a source_file field with the path of the file the message has been read from. If you configure a source_file message field, it will be overwritten by the input.

Input/Output Routing

Every message that gets read by the configured inputs will be routed to every configured output. If you have two file inputs and two GELF outputs, every message will be received by both outputs. You might want to send some logs to only one output or have one output only accept logs from a certain input, though.

The collector provides two options for inputs and outputs which can be used to influence the message routing.

Inputs have an outputs option and outputs have an inputs option. Both take a comma-separated list of input/output IDs.

Example:

inputs {
  apache-logs {
    type = "file"
    path-glob-root = "/var/log/apache2"
    path-glob-pattern = "*.{access,error}.log"
    outputs = "gelf-1,gelf-2"
  }
  auth-log {
    type = "file"
    path = "/var/log/auth.log"
  }
  syslog {
    type = "file"
    path = "/var/log/syslog"
  }
}

outputs {
  gelf-1 {
    type = "gelf"
    host = "10.0.0.1"
    port = 12201
  }
  gelf-2 {
    type = "gelf"
    host = "10.0.0.1"
    port = 12202
  }
  console {
    type = "stdout"
    inputs = "syslog"
  }
}

Routing for this config:

  • apache-logs messages will only go to gelf-1 and gelf-2 outputs.
  • auth-log messages will go to gelf-1 and gelf-2 outputs.
  • syslog messages will go to all outputs.
  • console output will only receive messages from syslog input.
inputs / outputs   gelf-1   gelf-2   console
apache-logs        ✓        ✓        -
auth-log           ✓        ✓        -
syslog             ✓        ✓        ✓

This is pretty powerful but might get confusing when both inputs and outputs have routing fields configured. This is how it is implemented in pseudo-code:

var message = Object(message)
var output = Object(gelf-output)

if empty(output.inputs) AND empty(message.outputs)

  // No output routing configured, write the message to the output.
  output.write(message)

else if output.inputs.contains(message.inputId) OR message.outputs.contains(output.id)

  // Either the input that generated the message has the output ID in its "outputs" field
  // or the output has the ID of the input that generated the message in its "inputs" field.
  output.write(message)

end

Running Graylog Collector

You will need a configuration file before starting the collector. See the configuration documentation above for detailed instructions on how to configure it.

Linux/Unix

The start method for the collector depends on the installation method you chose.

Operating System Package

We ship startup scripts in our OS packages that use the startup method of the particular operating system.

OS       Init System   Example
Ubuntu   upstart       sudo start graylog-collector
Debian   systemd       sudo systemctl start graylog-collector
CentOS   systemd       sudo systemctl start graylog-collector

Manual Setup

If you use the manual setup, the location of the start script depends on where you extracted the collector.

Example:

$ bin/graylog-collector run -f config/collector.conf

Windows

You probably want to run the collector as a Windows service as described in the Windows installation section above. If you want to run it from the command line, run the following commands.

Make sure you have a valid configuration file in config\collector.conf.

Commands:

C:\> cd graylog-collector-0.2.2
C:\graylog-collector-0.2.2> bin\graylog-collector.bat run -f config\collector.conf
_images/collector_win_run_1.png

Collector Status

Once the collector has been deployed successfully, you can check on the status from the Graylog UI.

_images/collector_status.png

You can reach the collector status overview page this way:

  1. Log into Graylog Web Interface
  2. Navigate to System / Collectors
  3. Click Collectors

Troubleshooting

Check the standard output of the collector process for any error messages or warnings. If messages are not arriving in your Graylog cluster, check for firewalls between the collector and the server and verify the network connection.

Command Line Options

Linux/Unix

The collector offers the following command line options:

usage: graylog-collector <command> [<args>]

The most commonly used graylog-collector commands are:

    help      Display help information

    run       Start the collector

    version   Show version information on STDOUT

 See 'graylog-collector help <command>' for more information on a specific command.

 NAME
        graylog-collector run - Start the collector

 SYNOPSIS
        graylog-collector run -f <configFile>

 OPTIONS
        -f <configFile>
            Path to configuration file.

Correctly Configured Collector Log Sample

This is the STDOUT output of a healthy collector starting:

2015-05-12T16:00:10.841+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Starting Collector v0.2.0-SNAPSHOT (commit a2ad8c8)
2015-05-12T16:00:11.489+0200 INFO  [main] o.g.collector.utils.CollectorId - Collector ID: cf4734f7-01d6-4974-a957-cb71bbd826b7
2015-05-12T16:00:11.505+0200 INFO  [GelfOutput] o.g.c.outputs.gelf.GelfOutput - Starting GELF transport: org.graylog2.gelfclient.GelfConfiguration@3952e37e
2015-05-12T16:00:11.512+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Service RUNNING: BufferProcessor [RUNNING]
2015-05-12T16:00:11.513+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Service RUNNING: MetricService [RUNNING]
2015-05-12T16:00:11.515+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Service RUNNING: FileInput{id='local-syslog', path='/var/log/syslog', charset='UTF-8', outputs='', content-splitter='NEWLINE'}
2015-05-12T16:00:11.516+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Service RUNNING: GelfOutput{port='12201', id='gelf-tcp', client-send-buffer-size='32768', host='127.0.0.1', inputs='', client-reconnect-delay='1000', client-connect-timeout='5000', client-tcp-no-delay='true', client-queue-size='512'}
2015-05-12T16:00:11.516+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Service RUNNING: HeartbeatService [RUNNING]
2015-05-12T16:00:11.516+0200 INFO  [main] o.graylog.collector.cli.commands.Run - Service RUNNING: StdoutOutput{id='console', inputs=''}

Troubleshooting

Unable to send heartbeat

The collector registers with your Graylog server on a regular basis to make sure it shows up on the Collectors page in the Graylog web interface. This registration can fail if the collector cannot connect to the server via HTTP on port 9000:

2015-06-06T10:45:14.964+0200 WARN  [HeartbeatService RUNNING] collector.heartbeat.HeartbeatService - Unable to send heartbeat to Graylog server: ConnectException: Connection refused

Possible solutions

  • Make sure the server REST API is configured to listen on a reachable IP address. Change the “rest_listen_uri” setting in the Graylog server config to this: rest_listen_uri = http://0.0.0.0:9000/api/
  • Correctly configure any firewalls between the collector and the server to allow HTTP traffic to port 9000.

Searching

Search query language

Syntax

The search syntax is very close to the Lucene syntax. By default all message fields are included in the search if you don’t specify a message field to search in.

Messages that include the term ssh:

ssh

Messages that include the term ssh or login:

ssh login

Messages that include the exact phrase ssh login:

"ssh login"

Messages where the field type includes ssh:

type:ssh

Messages where the field type includes ssh or login:

type:(ssh login)

Messages where the field type includes the exact phrase ssh login:

type:"ssh login"

Messages that have the field type:

_exists_:type

Messages that do not have the field type:

NOT _exists_:type

Messages that match regular expression ethernet[0-9]+ in field type:

type:/ethernet[0-9]+/

Note

Please refer to the Elasticsearch documentation about the Regular expression syntax for details about the supported regular expression dialect.

Note

Elasticsearch 2.x allows using _missing_:type instead of NOT _exists_:type. This query syntax has been removed in Elasticsearch 5.0.

By default all terms or phrases are OR-connected, so all messages that have at least one hit are returned. You can use Boolean operators and groups for control over this:

"ssh login" AND source:example.org
("ssh login" AND (source:example.org OR source:another.example.org)) OR _exists_:always_find_me

You can also use the NOT operator:

"ssh login" AND NOT source:example.org
NOT example.org

Note that AND, OR, and NOT are case sensitive and must be typed in all upper-case.

Wildcards: Use ? to replace a single character or * to replace zero or more characters:

source:*.org
source:exam?le.org
source:exam?le.*

Note that leading wildcards are disabled to avoid excessive memory consumption! You can enable them in your Graylog configuration file:

allow_leading_wildcard_searches = true

Also note that message, full_message, and source are the only fields that are analyzed by default. While wildcard searches (using * and ?) work on all indexed fields, analyzed fields will behave a little bit differently. See wildcard and regexp queries for details.

Fuzziness: You can search for similar terms:

ssh logni~
source:exmaple.org~

These examples use the Damerau–Levenshtein distance with a default distance of 2 and will match “ssh login” and “example.org” despite the intentional misspellings in the queries.

You can change the distance like this:

source:exmaple.org~1

You can also use the fuzziness operator to do a proximity search, where the terms in a phrase can have different/fuzzy distances from each other and don’t have to be in the defined order:

"foo bar"~5

Numeric fields support range queries. Ranges in square brackets are inclusive, ranges in curly brackets are exclusive, and both can even be combined:

http_response_code:[500 TO 504]
http_response_code:{400 TO 404}
bytes:{0 TO 64]
http_response_code:[0 TO 64}

You can also do searches with one side unbounded:

http_response_code:>400
http_response_code:<400
http_response_code:>=400
http_response_code:<=400

It is also possible to combine unbounded range operators:

http_response_code:(>=400 AND <500)

Escaping

The following characters must be escaped with a backslash:

&& || : \ / + - ! ( ) { } [ ] ^ " ~ * ?

Example:

resource:\/posts\/45326

Time frame selector

The time frame selector defines the time range to search in. It offers three different ways of selecting a time range and is vital for search speed: if you know you are only interested in messages of the last hour, only search in that time frame. This will make Graylog search in relevant indices only and greatly reduce system load and required resources.

_images/queries_time_range_selector.png

Relative time frame selector

The relative time frame selector lets you look for messages from the selected option to the time you hit the search button. The selector offers a wide set of relative time frames that fit most of your search needs.

Absolute time frame selector

When you know exactly the boundaries of your search, you want to use the absolute time frame selector. Simply enter the dates and times for the search manually, or click in the input field to open a calendar where you can choose the day with your mouse.

Keyword time frame selector

Graylog offers a keyword time frame selector that allows you to specify the time frame for the search in natural language like last hour or last 90 days. The web interface shows a preview of the two actual timestamps that will be used for the search.

_images/queries_keyword_time_selector.png

Here are a few examples for possible values.

  • “last month” searches between one month ago and now
  • “4 hours ago” searches between four hours ago and now
  • “1st of april to 2 days ago” searches between 1st of April and 2 days ago
  • “yesterday midnight +0200 to today midnight +0200” searches between yesterday midnight and today midnight in timezone +0200 - will be 22:00 in UTC

The time frame is parsed using the natty natural language parser. Please consult its documentation for details.

Saved searches

Sometimes you may want to save a specific search configuration to be used later. Graylog provides a saved search functionality to accomplish exactly that.

Once you have submitted your search, selected the fields you want to show from the search sidebar, and chosen a resolution for the histogram, click on the Save search criteria button on the sidebar.

_images/saved_search_create.png

Give a name to the current search and click on save. When you want to use the saved search later on, you only need to select it from the saved search selector.

_images/saved_search_selector.png

Of course, you can always update the selected fields or name of your saved search. To do so, select the saved search from the saved search selector, update the field selection or histogram resolution, and click on Saved search -> Update search criteria. It is also possible to delete the saved search by selecting Saved search -> Delete saved search.

_images/saved_search_update.png

Histogram

The search page includes a search result histogram, which concisely shows the number of messages received, grouped by a time period that Graylog adjusts for you.

The histogram also allows you to further narrow down the cause for an issue:

  • Delimit the search time range by brushing over the histogram. Just click and drag with your mouse over the chart to select the time range you want to use, and click on the search button to perform that search
  • See the time where alerts are triggered in the graph annotations. If you are searching in a stream, you will only see alerts related to that stream
_images/search_histogram.png

Analysis

Graylog provides several tools to analyze your search results. It is possible to save these analyses on dashboards, so you can check them over time in a more convenient way. To analyze a field from your search results, expand the field in the search sidebar and click on the button of the analysis you want to perform.

_images/search_analysis.png

Field statistics

Compute different statistics on your fields to help you better summarize and understand the data in them.

The statistical information consists of: total, mean, minimum, maximum, standard deviation, variance, sum, and cardinality. On non-numeric fields, you can only see the total number of messages containing that field and the cardinality of the field, i.e. the number of unique values it has.

_images/field_statistics.png

Quick values

Quick values helps you find out the distribution of values for a field. Alongside a graphic representation of the common values contained in a field, Graylog will display a table with all the different values, allowing you to see the number of times they appear. You can include any value in your search query by clicking on the magnifying glass icon located in the value row.

_images/quick_values.png

Field graphs

You can create field graphs for any numeric field, by clicking on the Generate chart button in the search sidebar. Using the options in the Customize menu on top of the field graph, you can change the statistical function used in the graph, the kind of graph to use to represent the values, the graph interpolation, as well as the time resolution.

_images/field_graph.png

Once you have customized some field graphs, you can also combine them by dragging them from the hamburger icon in the top corner of the graph and dropping them onto another field graph. You can see the location of the hamburger icon and the end result in the following screenshots:

_images/stacked_graph_1.png _images/stacked_graph_2.png

Field graphs appear every time you perform a search, allowing you to compare data, or combine graphs coming from different streams.

Decorators

Decorators allow you to alter message fields automatically at search time, while preserving the unmodified message on disk. Decorators are especially useful for making data in your fields more readable, combining data from several fields, or adding new fields with more information about the message. As decorators are configured per stream (including the default stream), you are also able to present a single message differently in different streams.

As changes made by decorators are not persisted, you cannot search for decorated values or use field analyzers on them. You can still use those features in the original non-decorated fields.

Decorators are applied at stream level and are shared among all users capable of accessing a stream, so all users can share the same results and benefit from the advantages decorators add.

Graylog includes some message decorators out of the box, but you can add new ones from pipelines or by writing your own as plugins.

In order to apply decorators to your search results, click on the Decorators tab in your search sidebar, select the decorator you want to apply from the dropdown, and click on Apply. Once you save your changes, the search results will already contain the decorated values.

_images/create_decorator.png

When you apply multiple decorators to the same search results, you can change the order in which they are applied at any time by using drag and drop in the decorator list.

Syslog severity mapper

The syslog severity mapper decorator lets you convert the numeric syslog level of syslog messages to a human-readable string. For example, applying the decorator to the level field in your logs would convert the syslog level 4 to Warning (4).

To apply a syslog severity mapper decorator, you need to provide the following data:

  • Source field: Field containing the numeric syslog level
  • Target field: Field to store the human-readable string. It can be the same as the source field, if you wish to replace the numeric value in your search results

Format string

The format string decorator provides a simple way of combining several fields into one. It can also be used to modify the content of a field without altering the stored result in Elasticsearch.

To apply a format string decorator you need to provide the following data:

  • Format string: Pattern used to format the resulting string. You can provide fields in the message by enclosing them in ${}. E.g. ${source} will add the contents of the source message field into the resulting string
  • Target field: Field to store the resulting value
  • Require all fields (optional): Check this box to only format the string when all other fields are present

For example, using the format string Request to ${controller}#${action} finished in ${took_ms}ms with code ${http_response_code} could produce the text Request to PostsController#show finished in 57ms with code 200, and make it visible in one of the message fields in your search results.

Pipeline Decorator

The pipeline decorator provides a way to decorate messages by processing them with an existing processing pipeline. In contrast to using a processing pipeline, changes done to the message by the pipeline are not persisted. Instead, the pipeline is used at search time to modify the presentation of the message.

Using the pipeline decorator requires an existing pipeline.

Note

Please note that the pipeline you use for decoration should not be connected to a stream. Otherwise it would run twice for each message (during indexing and at search time), effectively rendering the second run useless.

When you are done creating a pipeline, you can add a decorator using it on any number of streams. To create one, proceed just like for any other decorator type: click on the Decorators tab in the search sidebar, select the type (“Pipeline Processor Decorator” in this case), and click the Apply button next to it.

_images/pipeline_decorator_select_type.png

Upon clicking Apply, the pipeline to be used for decorating can be selected.

_images/pipeline_decorator_select_pipeline.png

After selecting a pipeline and clicking Save, you are done creating the new pipeline decorator.

Debugging decorators

When a message is not decorated as expected, or you need to know how it looked originally, you can see all changes that were made during decoration by clicking “Show changes” in the message details.

_images/pipeline_decorator_show_changes.png

In this view, deleted content is shown in red, while added content is shown in green. This means that added fields will have a single green entry, removed fields a single red entry and modified fields will have two entries, a red and a green one.

Further functionality

If the existing decorators are not sufficient for your needs, you can either search the Graylog marketplace, or write your own decorator.

Export results as CSV

It is also possible to export the results of your search as a CSV document. To do so, select all fields you want to export in the search sidebar, click on the More actions button, and select Export as CSV.

_images/export_as_csv.png

Hint: Some Graylog inputs keep the original message in the full_message field. If you need to export the original message, you can do so by clicking on the List all fields link at the bottom of the sidebar, and then selecting the full_message field.

Warning

Exporting results to a CSV will not preserve sorting because Graylog uses the virtual _doc field to “sort” documents for performance reasons. If you need the exported data to be ordered, you will need to either run a scroll query against Elasticsearch and process the results afterwards, or download the file and post-process it by other means.

Search result highlighting

Graylog supports search result highlighting since v0.20.2:

_images/search_result_highlighting.png

Enabling/Disabling search result highlighting

Using search result highlighting will result in slightly higher resource consumption for searches. You can enable and disable it using a configuration parameter in the graylog.conf of your Graylog nodes:

allow_highlighting = true

Search configuration

Graylog allows customizing the options available for search queries, like limiting the time range users can select or configuring the list of displayed relative time ranges.

_images/queries_search_configuration.png

All search configuration settings can be customized using the web interface on the System -> Configurations page in the Search configuration section.

Query time range limit

Sometimes the amount of data stored in Graylog is quite big and spans a wide time range (e.g. multiple years). In order to prevent normal users from accidentally running search queries that could use up lots of resources, it is possible to limit the time range that users are allowed to search in.

Using this feature, the time range of a search query exceeding the configured query time range limit will automatically be adapted to the given limit.

_images/queries_query_time_range_limit.png

The query time range limit is a duration formatted according to ISO 8601 following the basic format P<date>T<time> with the following rules:

Designator   Description
P            Duration designator (for period) placed at the start of the duration representation
Y            Year designator that follows the value for the number of years
M            Month designator that follows the value for the number of months
W            Week designator that follows the value for the number of weeks
D            Day designator that follows the value for the number of days
T            Time designator that precedes the time components of the representation
H            Hour designator that follows the value for the number of hours
M            Minute designator that follows the value for the number of minutes
S            Second designator that follows the value for the number of seconds

Examples:

ISO 8601 duration   Description
P30D                30 days
PT1H                1 hour
P1DT12H             1 day and 12 hours

More details about the format of ISO 8601 durations can be found on Wikipedia.

Relative time ranges

The list of time ranges displayed in the Relative time frame selector can be configured, too. It consists of a list of ISO 8601 durations which the users can select on the search page.

_images/queries_relative_timerange_options.png

Streams

What are streams?

Graylog streams are a mechanism to route messages into categories in realtime while they are processed. You define rules that instruct Graylog which messages to route into which streams. Imagine sending these three messages to Graylog:

message: INSERT failed (out of disk space)
level: 3 (error)
source: database-host-1

message: Added user 'foo'.
level: 6 (informational)
source: database-host-2

message: smtp ERR: remote closed the connection
level: 3 (error)
source: application-x

One of the many things you could do with streams is create a stream called Database errors that catches every error message from one of your database hosts.

Create a new stream with these rules, selecting the option to match all rules:

  • Field level must be smaller than 4
  • Field source must match regular expression ^database-host-\d+

This will route every new message with a level of ERROR or more severe (syslog levels are numerically lower the more severe they are) and a source that matches the database host regular expression into the stream.

A message will be routed into every stream that has all (or any) of its rules matching. This means that a message can be part of many streams and not just one.

The stream now appears in the streams list, and a click on its title will show you all database errors.

Streams can be used to get alerted when certain conditions happen. We cover more topics related to alerts in Alerts.

What’s the difference to saved searches?

The biggest difference is that streams are processed in realtime. This allows realtime alerting and forwarding to other systems. Imagine forwarding your database errors to another system or writing them to a file by regularly reading them from the message storage. Realtime streams do this much better.

Another difference is that searches for complex stream rule sets are always comparatively cheap to perform, because a message is tagged with stream IDs when processed. Internally, a Graylog search always looks like this, no matter how many stream rules you have configured:

streams:[STREAM_ID]

Building a query with all rules would cause significantly higher load on the message storage.

How do I create a stream?

  1. Navigate to the streams section from the top navigation bar.
  2. Click “Create stream”.
  3. Save the stream after entering a name and a description, for example All error messages and Catching all error messages from all sources. The stream is now saved but not yet activated.
  4. Click on “Edit rules” for the stream you just created. That will open a page where you can manage and test stream rules.
  5. Choose how you want to evaluate the stream rules to decide which messages go into the stream:
    • A message must match all of the following rules (logical AND): Messages will only be routed into the stream if all rules in the stream are fulfilled. This is the default behavior
    • A message must match at least one of the following rules (logical OR): Messages will be routed into the stream if one or more rules in the stream are fulfilled
  6. Add stream rules by indicating the field that you want to check and the condition it should satisfy. Try the rules against some messages by loading them from an input or manually providing a message ID. Once you are satisfied with the results, click on “I’m done”.
  7. The stream is still paused; click on the “Start stream” button to activate it.

Index Sets

For starters, you should read Index model for a comprehensive description of the index set functionality in Graylog.

Every stream is assigned to an index set which controls how messages routed into that stream are being stored into Elasticsearch. The stream overview in the web interface shows the assigned index set for each stream.

_images/stream_overview.png

Index sets can be assigned to a stream when creating the stream and changed later when editing the stream settings.

Important

Graylog will not automatically copy messages into new Elasticsearch indices if another index set is being assigned to a stream.

_images/stream_create.png

Graylog routes every message into the All messages stream by default, unless the message is removed from this stream with a pipeline rule (see Processing Pipelines) or it’s routed into a stream marked with Remove matches from ‘All messages’ stream.

The latter is useful if messages should be stored with different settings than those in the Default index set, for example if web server access logs should only be stored for 4 weeks while all other messages should be stored for 1 year.

Storage requirements

Graylog writes messages once for each index set into Elasticsearch. This means that if all streams are using the Default index set, each message will be written exactly once into Elasticsearch, no matter how many streams the message has been sent to. This can be thought of as a kind of de-duplication.

If some streams use other index sets and the Remove matches from ‘All messages’ stream setting is not enabled, messages will be written into Elasticsearch at least twice, once for the Default index set and once for the assigned index set. This means that the same message will be stored in two or more indices in Elasticsearch with different index settings.

Unless you explicitly want to store messages multiple times in different Elasticsearch indices, either assign the Default index set to the respective streams or enable the Remove matches from ‘All messages’ stream setting for the respective streams.

Outputs

The stream output system allows you to forward every message that is routed into a stream to other destinations.

Outputs are managed globally (like message inputs) and not for single streams. You can create new outputs and activate them for as many streams as you like. This way you can configure a forwarding destination once and select multiple streams to use it.

Graylog ships with default outputs and can be extended with Plugins.

Use cases

These are a few example use cases for streams:

  • Forward a subset of messages to other data analysis or BI systems to reduce their license costs.
  • Monitor exception or error rates in your whole environment, broken down per subsystem.
  • Get a list of all failed SSH logins and use quick values to analyze which user names were affected.
  • Catch all HTTP POST requests to /login that were answered with an HTTP 302 and route them into a stream called Successful user logins. Now get a chart of when users logged in and use quick values to get a list of users that performed the most logins in the search time frame.

How are streams processed internally?

Every message that comes in is matched against the rules of a stream. For messages satisfying all or at least one of the stream rules (as configured in the stream), the internal ID of that stream is stored in the streams array of the processed message.

All analysis methods and searches that are bound to streams can now easily narrow their operation by searching with a streams:[STREAM_ID] limit. This is done automatically by Graylog and does not have to be provided by the user.

_images/internal_stream_processing.png

Stream Processing Runtime Limits

An important step during the processing of a message is the stream classification. Every message is matched against the user-configured stream rules. The message is added to the stream if all or any rules of the stream match, depending on what the user chose. Applying stream rules is done during the indexing of a message only, so the amount of time spent on the classification of a message is crucial for the overall performance and message throughput the system can handle.

There are certain scenarios in which a stream rule takes very long to match. When this happens for a number of messages, message processing can stall, messages waiting for processing accumulate in memory, and the whole system can become unresponsive. Messages are lost and manual intervention becomes necessary. This is the worst case scenario.

To prevent this, the runtime of stream rule matching is limited. When it takes longer than the configured runtime limit, the process of matching this exact message against the rules of this specific stream is aborted. Message processing in general, and for this specific message, continues though. As the runtime limit needs to be configured fairly high (usually an order of magnitude higher than a regular stream rule match takes), any excess of it is considered a fault and is recorded for this stream. If the number of recorded faults for a single stream exceeds a configured threshold, the stream rule set of this stream is considered faulty and the stream is disabled. This is done to protect the overall stability and performance of message processing. Obviously, this is a tradeoff, based on the assumption that the total loss of one or more messages is worse than the loss of stream classification for them.

There are scenarios where this might not be applicable or even detrimental. If there is a high fluctuation of the message load, including situations where the message load is much higher than the system can handle, overall stream matching can take longer than the configured timeout. If this happens repeatedly, all streams get disabled. This is a clear indicator that your system is overutilized and not able to handle the peak message load.

How to configure the timeout values if the defaults do not match

There are two configuration variables in the server configuration file which influence the behavior of this functionality, as shown in the sample snippet after the list:

  • stream_processing_timeout defines the maximum amount of time the rules of a stream are allowed to take. When this is exceeded, stream rule matching for this stream is aborted and a fault is recorded. This setting is defined in milliseconds; the default is 2000 (2 seconds).
  • stream_processing_max_faults is the maximum number of times a single stream can exceed this runtime limit. When it happens more often, the stream is disabled until it is manually re-enabled. The default for this setting is 3.
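
For example, the documented defaults correspond to these entries in the Graylog server configuration file:

stream_processing_timeout = 2000
stream_processing_max_faults = 3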

What could cause it?

If a single stream has been disabled and all others are doing well, chances are high that one or more stream rules are performing badly under certain circumstances. In most cases, this is related to stream rules which use regular expressions. For most other stream rule types the runtime is roughly constant, while for regular expressions it varies greatly, influenced by the regular expression itself and the input matched against it. In some special cases, the difference between a match and a non-match of a regular expression can be on the order of 100 or even 1000. This is caused by a phenomenon called catastrophic backtracking. There are good write-ups about it on the web which will help you understand it.

Summary: How do I solve it?

  1. Check the rules of the stream that is disabled for rules that could take very long (especially regular expressions).
  2. Modify or delete those stream rules.
  3. Re-enable the stream.

Programmatic access via the REST API

Many organisations already run monitoring infrastructure that is able to alert operations staff when incidents are detected. These systems are often capable of either polling for information on a regular schedule or being pushed new alerts. This section describes how to use the Graylog Stream Alert API to poll for currently active alerts in order to further process them in third-party products.

Checking for currently active alert/triggered conditions

Graylog stream alerts can currently be configured to send emails when one or more of the associated alert conditions evaluate to true. While sending email solves many immediate problems when it comes to alerting, it can be helpful to gain programmatic access to the currently active alerts.

Each stream which has alerts configured also has a list of active alerts, which can potentially be empty if there were no alerts so far. Using the stream’s ID, one can check the current state of the alert conditions associated with the stream using the authenticated API call:

GET /streams/<streamid>/alerts/check

It returns a description of the configured conditions as well as a count of how many have triggered the alert. This data can be used to, for example, send SNMP traps in other parts of the monitoring system.

Sample JSON return value:

{
  "total_triggered": 0,
  "results": [
    {
      "condition": {
        "id": "984d04d5-1791-4500-a17e-cd9621cc2ea7",
        "in_grace": false,
        "created_at": "2014-06-11T12:42:50.312Z",
        "parameters": {
          "field": "one_minute_rate",
          "grace": 1,
          "time": 1,
          "backlog": 0,
          "threshold_type": "lower",
          "type": "mean",
          "threshold": 1
        },
        "creator_user_id": "admin",
        "type": "field_value"
      },
      "triggered": false
    }
  ],
  "calculated_at": "2014-06-12T13:44:20.704Z"
}

Note that the result is cached for 30 seconds.
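
Polling this endpoint from a script could look like the following minimal sketch in Python (the API base URL, credentials, and stream ID are placeholders; it assumes the third-party requests library):

import requests

GRAYLOG_API = "http://graylog.example.org:9000/api"  # placeholder API base URL
STREAM_ID = "53984d8630042acb39c79f84"               # placeholder stream ID

# The Graylog REST API uses HTTP Basic authentication.
response = requests.get(
    "{0}/streams/{1}/alerts/check".format(GRAYLOG_API, STREAM_ID),
    auth=("admin", "password"),                      # placeholder credentials
)
response.raise_for_status()

result = response.json()
# total_triggered is the number of currently triggered alert conditions.
if result["total_triggered"] > 0:
    print("Stream has triggered alert conditions!")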

List of already triggered stream alerts

Checking the current state of a stream’s alerts can be useful to trigger alarms in other monitoring systems, but if one wants to send more detailed messages to operations, it can be very helpful to get more information about the current state of the stream, for example the list of all triggered alerts since a certain timestamp.

This information is available per stream using the call:

GET /streams/<streamid>/alerts?since=1402460923

The since parameter is a unix timestamp value. Its return value could be:

{
  "total": 1,
  "alerts": [
    {
      "id": "539878473004e72240a5c829",
      "condition_id": "984d04d5-1791-4500-a17e-cd9621cc2ea7",
      "condition_parameters": {
        "field": "one_minute_rate",
        "grace": 1,
        "time": 1,
        "backlog": 0,
        "threshold_type": "lower",
        "type": "mean",
        "threshold": 1
      },
      "description": "Field one_minute_rate had a mean of 0.0 in the last 1 minutes with trigger condition lower than 1.0. (Current grace time: 1 minutes)",
      "triggered_at": "2014-06-11T15:39:51.780Z",
      "stream_id": "53984d8630042acb39c79f84"
    }
  ]
}

Using this information, more detailed messages can be produced, since the response contains detailed information about the nature of the alert, as well as the number of alerts triggered since the provided timestamp.

Note that currently a maximum of 300 alerts will be returned.
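
A similar sketch, again in Python with placeholder values, that lists all alerts triggered in the last hour:

import time

import requests

GRAYLOG_API = "http://graylog.example.org:9000/api"  # placeholder API base URL
STREAM_ID = "53984d8630042acb39c79f84"               # placeholder stream ID

# The since parameter is a unix timestamp; here: one hour ago.
since = int(time.time()) - 3600
response = requests.get(
    "{0}/streams/{1}/alerts".format(GRAYLOG_API, STREAM_ID),
    params={"since": since},
    auth=("admin", "password"),                      # placeholder credentials
)
response.raise_for_status()

for alert in response.json()["alerts"]:
    print(alert["triggered_at"], alert["description"])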

FAQs

Using regular expressions for stream matching

Stream rules support matching field values using regular expressions. Graylog uses the Java Pattern class to execute regular expressions.

For the individual elements of regular expression syntax, please refer to Oracle’s documentation; however, the syntax largely follows the familiar regular expression languages in widespread use today and will be familiar to most users.

One key question that is often raised is how to match a string in a case-insensitive manner. Java regular expressions are case sensitive by default. Certain flags, such as the one to ignore case sensitivity, can either be set in the code or as an inline flag in the regular expression.

For example, to route every message that matches the browser name in the following user agent string:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36

the regular expression .*applewebkit.* will not match because it is case sensitive. In order to match the expression using any combination of upper- and lowercase characters, use the (?i) flag like this:

(?i).*applewebkit.*

Most of the other flags supported by Java are rarely used in the context of matching stream rules or extractors, but if you need them their use is documented on the same Javadoc page by Oracle.

Can I add messages to a stream after they were processed and stored?

No. Currently there is no way to re-process or re-match messages into streams.

Only new messages are routed into the current set of streams.

Can I write my own outputs, alert conditions or notifications?

Yes. Please refer to the Plugins documentation page.

Alerts

Alerts are always based on streams. You can define conditions that trigger alerts, for example whenever the stream All production exceptions has more than 50 messages per minute, or when the field milliseconds had too high a standard deviation in the last five minutes.

Navigate to the alerts section from the top navigation bar to see already configured alerts, alerts that were fired in the past or to configure new alert conditions and notifications.

Graylog ships with default alert conditions and alert notifications, and both can be extended with Plugins.

Alert states

Graylog alerts are periodical searches that can trigger notifications when a defined condition is satisfied. Since Graylog 2.2.0, alerts can have two states:

Unresolved
Alerts have an unresolved state while the defined condition is satisfied. New alerts are triggered in this state, and they also execute the notifications attached to the stream. These alerts usually require an action on your side.
Resolved
Graylog automatically resolves alerts once their alert condition is no longer satisfied. This is the final state of an alert, as Graylog will create a new alert if the alert condition is satisfied again in the future. After an alert is resolved, Graylog will apply the grace period you defined in the alert condition, waiting a certain time before creating a new alert for this alert condition.

Alerts overview

The alerts overview page lets you easily find out which alerts currently require your attention, while also allowing you to check alerts that were triggered in the past and are now resolved.

_images/alerts_alerts_overview.png

You can click on an alert name at any time to see more details about it.

Alert details

From the alert details page you can quickly check the reason why an alert was triggered, the status and configuration of notifications sent by Graylog, and the search results in the time frame when the alert was unresolved.

_images/alerts_alert_details.png
Alert timeline

From within the alert details page, you can see a timeline of what occurred since Graylog detected an alert condition was satisfied. This includes the time when Graylog evaluated the condition that triggered the alert, the time when notifications were executed and the results of executing them, and the time when the alert was resolved (if that is the case).

Triggered notifications

Sometimes sending alert notifications may fail for some reason. Graylog includes details of the configured notifications at the time an alert was triggered and the result of executing those notifications, helping you to debug and fix any problems that may arise.

Search results

You can quickly look at messages received while the alert was unresolved from within the alert details page. It is also possible to open that search in the search page, allowing you to further analyse the problem at hand.

Conditions

The first step of managing alerts with Graylog is defining alert conditions.

Alert conditions specify searches that Graylog will execute periodically, and also indicate under which circumstances Graylog should consider those search results as exceptional, triggering an alert in that case.

Click on Manage conditions in the Alerts section to see your current conditions details, modify them, or add new ones. Clicking on an alert condition’s title will open a detail page where you can also see the notifications that will be executed when the condition is satisfied.

_images/alerts_alert_condition.png

Alert condition types explained

In this section we explain what the default alert conditions included in Graylog do, and how to configure them. Since Graylog 2.2.0, alert conditions can be extended via Plugins; you can find more types in the Graylog Marketplace or even create your own.

Message count condition

This condition triggers whenever the stream received more than X messages in the last Y minutes. Perfect, for example, to be alerted when there are many exceptions on your platform. Create a stream that catches every error message and be alerted when that stream exceeds normal throughput levels.

Field aggregation condition

Triggers whenever the result of a statistical computation on a numerical message field in the stream is higher or lower than a given threshold. Perfect for monitoring performance problems: be alerted whenever the standard deviation of the response time of your application was higher than X in the last Y minutes.

Field content condition

This condition triggers whenever the stream received at least one message since the last alert run that has a field set to a given value. Get an alert when a message with the field `type` set to `security` arrives in the stream.

Important

We do not recommend running this on analyzed fields like message or full_message because they are broken down into terms and you might get unexpected alerts. For example, a check for security would also alert if a message with the field set to no security is received, because it is broken down into the terms no and security. This only happens with the analyzed message and full_message fields in Graylog.

Please also note that only a single alert is raised for this condition during the alerting interval, even though multiple messages containing the given value may have been received since the last alert.

Notifications

Warning

Starting in Graylog 2.2.0, alert notifications are only triggered once, just when a new alert is created. As long as the alert is unresolved or in its grace period, Graylog will not send further notifications. This helps reduce the noise and annoyance of getting notified way too often while a problem persists. Should your setup require repeated notifications, you can enable this during the creation of the alert condition since Graylog 2.2.2.

Notifications (previously known as Alarm Callbacks) enable you to take actions on external systems when an alert is triggered. In this way, you can rely on Graylog to know when something is not right in your logs.

Click on Manage notifications in the Alerts section to see your current notification details, modify them, test them, or add new ones. Remember that notifications are associated with streams, so all conditions evaluated in a stream will share the same notifications.

_images/alerts_alert_notification.png

Alert notifications types explained

In this section we explain what the default alert notifications included in Graylog do, and how to configure them. Alert notifications are meant to be extensible through Plugins; you can find more types in the Graylog Marketplace or even create your own.

Important

In previous versions of Graylog (before 2.2.0), the email alarm notification was used when alert conditions existed for a stream but no alarm notification had been created. This has been changed, so that if no alarm notification exists for a stream, alerts are shown in the interface but no other action is performed. To help users coming from earlier versions, there is a migration job which is run once, explicitly creating the email alarm notification for qualifying streams, so the old behavior is preserved.

Email alert notification

The email alert notification can be used to send an email to the configured alert receivers when the conditions are triggered.

Make sure to check the email-related configuration settings in the Graylog configuration file.

Three configuration options are available for the alert notification to customize the email that will be sent. The email body and email subject are JMTE templates. JMTE is a minimal template engine that supports variables, loops and conditions. See the JMTE documentation for a language reference.

We expose the following objects to the templates.

stream

The stream this alert belongs to.

  • stream.id ID of the stream
  • stream.title title of the stream
  • stream.description stream description
stream_url
A string that contains the HTTP URL to the stream.
check_result

The check result object for this stream.

  • check_result.triggeredCondition string representation of the triggered alert condition
  • check_result.triggeredAt date when this condition was triggered
  • check_result.resultDescription text that describes the check result
backlog
A list of message objects. Can be used to iterate over the messages via foreach.
message (only available via iteration over the backlog object)

The message object has several fields with details about the message. When using the message object without accessing any fields, the toString() method of the underlying Java object is used to display it.

  • message.id autogenerated message id
  • message.message the actual message text
  • message.source the source of the message
  • message.timestamp the message timestamp
  • message.fields map of key value pairs for all the fields defined in the message

The message.fields fields can be useful to get access to arbitrary fields that are defined in the message. For example message.fields.full_message would return the full_message of a GELF message.
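
As an illustration, a minimal email body template using these objects might look like the following sketch (this is not the default template shipped with Graylog):

Alert for stream: ${stream.title}
Description: ${check_result.resultDescription}
Triggered at: ${check_result.triggeredAt}
Stream URL: ${stream_url}

Last messages accounting for this alert:
${foreach backlog message}
${message.timestamp} ${message.source}: ${message.message}
${end}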

_images/alerts_email_notification.png
HTTP alert notification

The HTTP alert notification lets you configure an endpoint that will be called when the alert is triggered.

Graylog will send a POST request to the notification URL including information about the alert. Here is an example of the payload included in a notification:

{
    "check_result": {
        "result_description": "Stream had 2 messages in the last 1 minutes with trigger condition more than 1 messages. (Current grace time: 1 minutes)",
        "triggered_condition": {
            "id": "5e7a9c8d-9bb1-47b6-b8db-4a3a83a25e0c",
            "type": "MESSAGE_COUNT",
            "created_at": "2015-09-10T09:44:10.552Z",
            "creator_user_id": "admin",
            "grace": 1,
            "parameters": {
                "grace": 1,
                "threshold": 1,
                "threshold_type": "more",
                "backlog": 5,
                "time": 1
            },
            "description": "time: 1, threshold_type: more, threshold: 1, grace: 1",
            "type_string": "MESSAGE_COUNT",
            "backlog": 5
        },
        "triggered_at": "2015-09-10T09:45:54.749Z",
        "triggered": true,
        "matching_messages": [
            {
                "index": "graylog2_7",
                "message": "WARN: System is failing",
                "fields": {
                    "gl2_remote_ip": "127.0.0.1",
                    "gl2_remote_port": 56498,
                    "gl2_source_node": "41283fec-36b4-4352-a859-7b3d79846b3c",
                    "gl2_source_input": "55f15092bee8e2841898eb53"
                },
                "id": "b7b08150-57a0-11e5-b2a2-d6b4cd83d1d5",
                "stream_ids": [
                    "55f1509dbee8e2841898eb64"
                ],
                "source": "127.0.0.1",
                "timestamp": "2015-09-10T09:45:49.284Z"
            },
            {
                "index": "graylog2_7",
                "message": "ERROR: This is an example error message",
                "fields": {
                    "gl2_remote_ip": "127.0.0.1",
                    "gl2_remote_port": 56481,
                    "gl2_source_node": "41283fec-36b4-4352-a859-7b3d79846b3c",
                    "gl2_source_input": "55f15092bee8e2841898eb53"
                },
                "id": "afd71342-57a0-11e5-b2a2-d6b4cd83d1d5",
                "stream_ids": [
                    "55f1509dbee8e2841898eb64"
                ],
                "source": "127.0.0.1",
                "timestamp": "2015-09-10T09:45:36.116Z"
            }
        ]
    },
    "stream": {
        "creator_user_id": "admin",
        "outputs": [],
        "matching_type": "AND",
        "description": "test stream",
        "created_at": "2015-09-10T09:42:53.833Z",
        "disabled": false,
        "rules": [
            {
                "field": "gl2_source_input",
                "stream_id": "55f1509dbee8e2841898eb64",
                "id": "55f150b5bee8e2841898eb7f",
                "type": 1,
                "inverted": false,
                "value": "55f15092bee8e2841898eb53"
            }
        ],
        "alert_conditions": [
            {
                "creator_user_id": "admin",
                "created_at": "2015-09-10T09:44:10.552Z",
                "id": "5e7a9c8d-9bb1-47b6-b8db-4a3a83a25e0c",
                "type": "message_count",
                "parameters": {
                    "grace": 1,
                    "threshold": 1,
                    "threshold_type": "more",
                    "backlog": 5,
                    "time": 1
                }
            }
        ],
        "id": "55f1509dbee8e2841898eb64",
        "title": "test",
        "content_pack": null
    }
}
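
As a rough illustration, a receiving endpoint for this payload could be sketched like this in Python (using the third-party Flask library; the route and port are arbitrary):

from flask import Flask, request

app = Flask(__name__)

@app.route("/graylog-alert", methods=["POST"])
def graylog_alert():
    payload = request.get_json()
    # The payload contains the check result and the stream, as shown above.
    description = payload["check_result"]["result_description"]
    stream_title = payload["stream"]["title"]
    print("Alert on stream {0}: {1}".format(stream_title, description))
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)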

Dashboards

Why dashboards matter

Using dashboards allows you to build pre-defined views on your data to always have everything important just one click away.

Sometimes it takes domain knowledge to be able to figure out the search queries to get the correct results for your specific applications. People with the required domain knowledge can define the search query once and then display the results on a dashboard to share them with co-workers, managers, or even sales and marketing departments.

This guide will take you through the process of creating dashboards and storing information on them. At the end you will have a dashboard with automatically updating information that you can share with anybody or just a subset of people based on permissions.

_images/dashboards_1.png

How to use dashboards

Creating an empty dashboard

Navigate to the Dashboards section using the link in the top menu bar of your Graylog web interface. The page lists all dashboards that you are allowed to view. (More on permissions later.) Hit the Create dashboard button to create a new empty dashboard.

The only required information is a title and a description of the new dashboard. Use a specific but not too long title so people can easily see what to expect on the dashboard. The description can be a bit longer and could contain more detailed information about the displayed data or how it is collected.

Hit the Create button to create the dashboard. You should now see your new dashboard on the dashboards overview page. Click on the title of your new dashboard to see it. Next, we will be adding widgets to the dashboard we have just created.

_images/dashboards_2.png

Adding widgets

You should have your empty dashboard in front of you. Let’s add some widgets! You can add search result information to dashboards with a couple of clicks. The following search result types can be added to dashboards:

  • Search result counts
  • Search result histogram charts
  • Statistical values
  • Field value charts
  • Stacked charts
  • Quick values results

You can learn more about the different widget types in Widget types explained.

Once you can see the results of your search, you will see buttons labeled “Add to dashboard” that allow you to select the dashboard where the widget will be displayed and to configure the widget.

_images/dashboards_3.png _images/dashboards_4.png

Examples

It is strongly recommended to read the getting started guide on basic searches and analysis first. This will make the following examples easier to follow.

  • Top log sources today
    • Example search: *, timeframe: Last 24 hours
    • Expand the source field in the sidebar and hit Quick values
    • Add quick values to dashboard
  • Number of exceptions in a given app today
    • Example search: source:myapp AND Exception, timeframe: Last 24 hours
    • Add search result count to dashboard
  • Response time chart of a given app
    • Example search: source:myapp2, any timeframe you want
    • Expand a field representing the response time of requests in the sidebar and hit Generate chart
    • Add chart to dashboard

Widgets from streams

You can of course also add widgets from stream search results. Every widget added this way will be bound to that stream. If you have a stream that contains every SSH login, you can just search for everything (*) in that stream and store the result count as SSH logins on a dashboard.

Result

You should now see widgets on your dashboard. You will learn how to modify the dashboard, and edit widgets in the next chapter.

_images/dashboards_1.png

Widget types explained

Graylog supports a wide variety of widgets that allow you to quickly visualize data from your logs. This section intends to give you some information to better understand each widget type, and how they can help you to see relevant details from the many logs you receive.

Search result counts

This kind of widget includes a count of the number of search results for a given search. It can help you to quickly visualize things like the number of exceptions an application logs, or the number of requests your site receives.

All search result counts created with a relative time frame can additionally display trend information. The trend is calculated by comparing the count for the given time frame, with the one resulting from going further back the same amount of time. For example, to calculate the trend in a search result count with a relative search of 5 minutes ago, Graylog will count the messages in the last 5 minutes, and compare that with the count of the previous 5 minutes.

Search result histogram charts

The search result histogram displays a chart using the time frame of your search, graphing the number of search result counts over time. It may help you to visualize how the number of requests to your site changes over time, or to see how many downloads a file has over time.

By changing the graph resolution, you can decide how much time each bar of the graph represents.

Statistical values

You can add to your dashboard any statistical value calculated for a field. This may help you to see the mean time response for your application, or how many unique servers are handling requests to your application, by using the cardinality value of that field. Please refer to Field statistics for more information on the available statistical functions and how to display them in your searches.

As with search result counts, you can also add trend information to statistical value widgets created with a relative time frame.

Field value charts

To draw a statistical value over time, you can use a field value chart. It could help you to see the evolution of the number of unique users visiting your site in the last week. In the Field graphs section we explain how to create these charts and ways you can customize them.

Stacked charts

Stacked charts group several field value charts under the same axes. They let you compare different values in a compact way, like the number of visits to two different websites. As explained in Field graphs, stacked charts are basically field value charts represented in the same axes.

Quick values results

In order to show a list of values a certain field contains and their distribution, you can use a quick value widget. This may help you to see the percentage of failed requests in your application, or which parts of your application experience more problems. Please refer to Quick values to see how to request this information in your search result page.

The quick values information can be represented as a pie chart and/or as a table, so you can choose whichever is the best fit for your needs.

Modifying dashboards

You need to unlock dashboards to make any changes to them. Hit the “Unlock/Edit” button in the top right corner of a dashboard to unlock it. You should now see different icons at the bottom of each widget that allow you to perform more actions.

Unlocked dashboard widgets explained

Unlocked dashboard widgets have a few controls that should be pretty self-explanatory:

  • Delete widget
  • Edit widget configuration
  • Change widget size (when you hover over the widget)
_images/dashboards_5.png

Widget cache times

Widget values are cached in graylog-server by default. This means that the cost of value computation does not grow with every new device or even browser tab displaying a dashboard. Some widgets might need to show real-time information (set cache time to 1 second) and some widgets might be updated way less often (like Top SSH users this month, cache time 10 minutes) to save expensive computation resources.

Repositioning widgets

Just grab a widget with your mouse in unlocked dashboard mode and move it around. Other widgets should adapt and re-position intelligently to make room for the widget you are moving. The positions are automatically saved when dropping a widget.

Resizing widgets

When hovering over a widget, you will see that a gray arrow appears in its bottom-right corner. You can use that icon to resize widgets. Their contents will adapt to the new size automatically!

_images/dashboards_7.png

Dashboard permissions

Graylog users in the Admin role are always allowed to view and edit all dashboards. Users in the Reader role are by default not allowed to view or edit any dashboard.

_images/dashboards_6.png

Navigate to System -> Roles and create a new role that grants the permissions you wish. You can then assign that new role to any users you wish to give dashboard permissions on the System -> Users page.

You can read more about user permissions and roles.

That’s it!

Congratulations, you have just gone through the basic principles of Graylog dashboards. Now think about which dashboards to create. We suggest:

  • Create dashboards for yourself and your team members
  • Create dashboards to share with your manager
  • Create dashboards to share with the CIO of your company

Think about which information you need access to frequently. What information could your manager or CIO be interested in? Maybe they want to see how the number of exceptions went down or how your team utilized existing hardware better. The sales team could be interested in seeing signup rates in real time, and the marketing team will love you for providing insights into low-level KPIs that are just a click away.

Extractors

The problem explained

Syslog (RFC3164, RFC5424) has been the de facto standard logging protocol since the 1980s and was originally developed as part of the sendmail project. It comes with some annoying shortcomings that we tried to improve in GELF for application logging.

Because syslog has a clear specification in its RFCs, it should be possible to parse it relatively easily. Unfortunately there are a lot of devices (especially routers and firewalls) out there that send logs looking like syslog while actually breaking several rules stated in the RFCs. We tried to write a parser that reads all of them as well as possible and failed. Such loosely defined text messages usually break compatibility in the very first date field. Some devices leave out hostnames completely, some use localized time zone names (e.g. “MESZ” instead of “CEST”), and some just omit the current year in the timestamp field.

Then there are devices out there that do not even claim to send syslog, but use a completely separate log format that needs to be parsed specifically.

We decided not to write custom message inputs and parsers for all those thousands of devices, formats, firmwares, and configuration parameters out there, but instead came up with the concept of Extractors, introduced in the v0.20.0 series of Graylog.

Graylog extractors explained

Extractors allow you to instruct Graylog nodes how to extract data from any text in the received message (no matter the format, and even from an already extracted field) into message fields. You may already know why structuring data into fields is important if you are using Graylog: there are a lot of analysis possibilities with full text searches, but the real power of log analytics unveils when you can run queries like http_response_code:>=500 AND user_id:9001 to get all internal server errors that were triggered by a specific user.

Wouldn’t it be nice to be able to search for all blocked packets from a given source IP, or to get a quick terms analysis of recently failed SSH login usernames? That is hard to do when all you have is a single long text message.

Attention

Graylog extractors only work on text fields; they won’t be executed for numeric fields or anything other than a string.

Creating extractors is possible via either Graylog REST API calls or from the web interface using a wizard. Select a message input on the System -> Inputs page and hit Manage extractors in the actions menu. The wizard allows you to load a message to test your extractor configuration against. You can extract data using for example regular expressions, Grok patterns, substrings, or even by splitting the message into tokens by separator characters. The wizard looks like this and should be pretty intuitive:

_images/extractors_1.png

You can also choose to apply so-called converters on the extracted value, for example to convert a string consisting of numbers to an integer or double value (important for range searches later), anonymize IP addresses, lower-/uppercase a string, build a hash value, and much more.

Import extractors

The recommended way of importing extractors in Graylog is using Content packs. The Graylog Marketplace provides access to many content packs that you can easily download and import into your Graylog setup.

You can still import extractors from JSON if you want to. Just copy the JSON extractor export into the import dialog of a message input of the fitting type (every extractor set entry in the directory tells you what type of input to spawn, e.g. syslog, GELF, or Raw/plaintext) and you are good to go. The next messages coming in should already include the extracted fields, with possibly converted values.

A message sent by Heroku and received by Graylog with the imported Heroku extractor set on a plaintext TCP input looks like this (note the extracted fields in the message detail view):

_images/extractors_2.png

Using regular expressions to extract data

Extractors support matching field values using regular expressions. Graylog uses the Java Pattern class to evaluate regular expressions.

For the individual elements of regular expression syntax, please refer to Oracle’s documentation. The syntax largely follows the regular expression languages in widespread use today and will be familiar to most.

However, one key question that is often raised is how to match a string in a case-insensitive manner. Java regular expressions are case sensitive by default. Certain flags, such as the one to ignore case sensitivity, can either be set in the code or as an inline flag in the regular expression.

For example, to create an extractor that matches the browser name in the following user agent string:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36

the regular expression (applewebkit) will not match because it is case sensitive. In order to match the expression using any combination of upper- and lowercase characters use the (?i) flag as such:

(?i)(applewebkit)

Most of the other flags supported by Java are rarely used in the context of matching stream rules or extractors, but if you need them, their use is documented on the same Javadoc page by Oracle. One common construct worth knowing about is the non-capturing group: parentheses which only group alternatives, but do not make Graylog extract the data they match, indicated by (?:).
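
For example (a sketch based on the user agent string above), the following expression matches either browser token but captures only the version digits that follow it, because the alternatives sit in a non-capturing group:

(?:Chrome|Safari)/(\d+)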

Using Grok patterns to extract data

Graylog also supports extracting data using the popular Grok language, allowing you to make use of your existing patterns.

Grok is a set of regular expressions that can be combined into more complex patterns, allowing you to name different parts of the matched groups.

By using Grok patterns, you can extract multiple fields from a message field in a single extractor, which often simplifies specifying extractors.

Simple regular expressions are often sufficient to extract a single word or number from a log line, but if you know the entire structure of a line beforehand, for example for an access log or the format of a firewall log, using Grok is advantageous.

For example a firewall log line could contain:

len=50824 src=172.17.22.108 sport=829 dst=192.168.70.66 dport=513

We can now create the following patterns on the System/Grok Patterns page in the web interface:

BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
DATA .*?

Then, in the extractor configuration, we can use these patterns to extract the relevant fields from the line:

len=%{NUMBER:length} src=%{IP:srcip} sport=%{NUMBER:srcport} dst=%{IP:dstip} dport=%{NUMBER:dstport}

This will add the relevant extracted fields to our log message, allowing Graylog to search on those individual fields. This can lead to more effective search queries, for example by letting you look specifically for packets that came from a given source IP instead of also matching destination IPs, as a search for the IP across all fields would.

If the Grok pattern creates many fields, which can happen if you make use of heavily nested patterns, you can tell Graylog to skip certain fields (and the output of their subpatterns) by naming a field with the special keyword UNWANTED.

Let’s say you want to parse a line like:

type:44 bytes:34 errors:122

but you are only interested in the second number bytes. You could use a pattern like:

type:%{BASE10NUM:type} bytes:%{BASE10NUM:bytes} errors:%{BASE10NUM:errors}

However, this would create three fields named type, bytes, and errors. Even not naming the first and last patterns would still create a field named BASE10NUM. In order to ignore fields, but still require matching them, use UNWANTED:

type:%{BASE10NUM:UNWANTED} bytes:%{BASE10NUM:bytes} errors:%{BASE10NUM:UNWANTED}

This now creates only a single field called bytes while making sure the entire pattern must match.

If you already know the data type of the extracted fields, you can make use of the type conversion feature built into the Graylog Grok library. Going back to the earlier example:

len=50824 src=172.17.22.108 sport=829 dst=192.168.70.66 dport=513

We know that the content of the field len is an integer and would like to make sure it is stored with that data type, so we can later create field graphs with it or access the field’s statistical values, like average etc.

Grok directly supports converting field values by adding ;datatype at the end of the pattern, like:

len=%{NUMBER:length;int} src=%{IP:srcip} sport=%{NUMBER:srcport} dst=%{IP:dstip} dport=%{NUMBER:dstport}

The currently supported data types, and their corresponding ranges and values, are:

Type Range Example
byte -128 ... 127 %{NUMBER:fieldname;byte}
short -32768 ... 32767 %{NUMBER:fieldname;short}
int -2^31 ... 2^31-1 %{NUMBER:fieldname;int}
long -2^63 ... 2^63-1 %{NUMBER:fieldname;long}
float 32-bit IEEE 754 %{NUMBER:fieldname;float}
double 64-bit IEEE 754 %{NUMBER:fieldname;double}
boolean true, false %{DATA:fieldname;boolean}
string Any UTF-8 string %{DATA:fieldname;string}
date See SimpleDateFormat %{DATA:timestamp;date;dd/MMM/yyyy:HH:mm:ss Z}
datetime Alias for date  

There are many resources on the web with useful patterns, and one very helpful tool is the Grok Debugger, which allows you to test your patterns while you develop them.

Graylog uses Java Grok to parse and run Grok patterns.

Using the JSON extractor

Since version 1.2, Graylog also supports extracting data from messages sent in JSON format.

Using the JSON extractor is easy: once a Graylog input receives messages in JSON format, you can create an extractor by going to System -> Inputs and clicking on the Manage extractors button for that input. Next, you need to load a message to extract data from, and select the field containing the JSON document. The following page lets you add some extra information to tell Graylog how it should extract the information. Let’s illustrate how a message would be extracted with an example message:

{"level": "ERROR", "details": {"message": "This is an example error message", "controller": "IndexController", "tags": ["one", "two", "three"]}}

Using the default settings, that message would be extracted into these fields:

details_tags: one, two, three
level: ERROR
details_controller: IndexController
details_message: This is an example error message

On the create extractor page, you can also customize how to separate lists of elements, keys, and key/values. It is also possible to flatten JSON structures or expand them into multiple fields, as shown in the example above.

Automatically extract all key=value pairs

Sometimes you will receive messages like this:

This is a test message with some key/value pairs. key1=value1 some_other_key=foo

You might want to extract all key=value pairs into Graylog message fields without having to specify all possible key names or even their order. This is how you can easily do this:

Create a new extractor of type “Copy Input” and select to read from the field message (or any other string field that contains key=value pairs). Configure the extractor to store the (copied) field value to the same field, in this case message. The trick is to add the “Key=Value pairs to fields” converter as the last step. Because we use the “Copy Input” extractor, the converter will run over the complete field you selected and convert all key=value pairs it can find.

This is a screenshot of the complete extractor configuration:

_images/keyvalue_converter_1.png

... and this is the resulting message:

_images/keyvalue_converter_2.png

Normalization

Many log formats are similar to each other, but not quite the same. In particular they often only differ in the names attached to pieces of information.

For example, consider different hardware firewall vendors, whose models log the destination IP in different fields of the message, some use dstip, some dst and yet others use destination-address:

2004-10-13 10:37:17 PDT Packet Length=50824, Source address=172.17.22.108, Source port=829, Destination address=192.168.70.66, Destination port=513
2004-10-13 10:37:17 PDT len=50824 src=172.17.22.108 sport=829 dst=192.168.70.66 dport=513
2004-10-13 10:37:17 PDT length="50824" srcip="172.17.22.108" srcport="829" dstip="192.168.70.66" dstport="513"

You can use one or more non-capturing groups to specify the alternatives of the field names, while still being able to extract the matching parentheses group in the regular expression. Remember that Graylog will extract data from the first matched group of the regular expression. An example of a regular expression matching the destination IP field of all those log messages from above is:

(?:dst|dstip|[dD]estination\saddress)="?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"?

This will only extract the IP address without caring about which of the three naming schemes was used in the original log message. This way you don’t have to set up three different extractors.

The standard date converter

Date parser converters for extractors allow you to convert extracted data into timestamps, usually to set the timestamp of a message based on some date it contains. Let’s assume we have this message from a network device:

<131>: foo-bar-dc3-org-de01: Mar 12 00:45:38: %LINK-3-UPDOWN: Interface GigabitEthernet0/31, changed state to down

Extracting most of the data is not a problem and can be done easily. Using the date in the message (Mar 12 00:45:38) as Graylog message timestamp however needs to be done with a date parser converter.

Use a copy input extractor rule to select the timestamp and apply the Date converter with a format string:

MMM dd HH:mm:ss

(format string table at the end of this page)

_images/dateparser_1.png _images/dateparser_2.png
Standard date converter format string table
Symbol Meaning Presentation Examples
G era text AD
C century of era (>=0) number 20
Y year of era (>=0) year 1996
x weekyear year 1996
w week of weekyear number 27
e day of week number 2
E day of week text Tuesday; Tue
y year year 1996
D day of year number 189
M month of year month July; Jul; 07
d day of month number 10
a halfday of day text PM
K hour of halfday (0~11) number 0
h clockhour of halfday (1~12) number 12
H hour of day (0~23) number 0
k clockhour of day (1~24) number 24
m minute of hour number 30
s second of minute number 55
S fraction of second millis 978
z time zone text Pacific Standard Time; PST
Z time zone offset/id zone -0800; -08:00; America/Los_Angeles
' escape for text delimiter
'' single quote literal '

The flexible date converter

Now imagine you had one of those devices that send messages that are not so easy to parse because they do not follow a strict timestamp format. Some network devices, for example, like to send days of the month without a padding 0 for the first 9 days. You’ll have dates like Mar 9 and Mar 10 and will run into problems defining a parser string for that. Or maybe you have something really exotic, like just last wednesday, as the timestamp. The flexible date converter accepts any text data and tries to build a date from it as well as it can.

Examples:

  • Mar 12, converted at 12:27:00 UTC in the year 2014: 2014-03-12T12:27:00.000
  • 2014-3-12 12:27: 2014-03-12T12:27:00.000
  • Mar 12 2pm: 2014-03-12T14:00:00.000

Note that the flexible date converter is using UTC as time zone by default unless you have time zone information in the parsed text or have configured another time zone when adding the flexible date converter to an extractor (see this comprehensive list of time zones available for the flexible date converter).

Processing Pipelines

Graylog’s new processing pipelines plugin allows greater flexibility in routing, blacklisting, modifying, and enriching messages as they flow through Graylog.

Pipelines and rules are not configuration for pre-built code, as extractors and stream rules are, but are instead represented as code, much like Drools rules. This gives them great flexibility and extensibility, and enables live changes to Graylog’s message processing behavior.

The language used for pipeline rules is very simple and can be extended by functions, which are fully pluggable.

The following pages introduce the concepts of pipelines, rules, stream connections, and the built-in functions.

Pipelines

Overview

Pipelines are the central concept tying together the processing steps applied to your messages.

Pipelines contain rules and can be connected to one or more streams, enabling fine-grained control over which processing steps are performed on a given type of message.

Processing rules are simply conditions followed by a list of actions, and do not have control flow by themselves. To control the order in which processing rules are applied, pipelines utilize stages.

Stages are groups of conditions and actions which need to run in order. This is done by assigning a priority value. All stages with the same priority run at the same time across all connected pipelines. Stages provide the necessary control flow to decide whether or not to run the remaining stages in a pipeline.

Pipeline structure

Internally pipelines are represented as code. Let’s have a look at a simple example and understand what each part does:

pipeline "My new pipeline"
stage 1 match all
  rule "has firewall fields";
  rule "from firewall subnet";
stage 2 match either
  rule "geocode IPs";
  rule "anonymize source IPs";
end

This code snippet declares a new pipeline named My new pipeline, which has two stages.

Stages

Stages are run in the order of their given priority, and aren’t otherwise named. Stage priorities can be any integer you prefer, positive or negative.

In our example the first stage has a priority of 1 and the second stage a priority of 2, however -99 and 42 could be used instead. Ordering based upon stage priority gives you the ability to run certain rules before or after others, which might exist in other connected pipelines, without modifying those other connected pipelines. This is particularly handy when dealing with changing data formats.

For example, if there was a second pipeline declared with a stage assigned priority 0, that stage’s rules would run before either of the ones from the example (priorities 1 and 2, respectively). Note that the order in which stages are declared is irrelevant, since they are sorted according to their priority.
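
As a sketch, such a second pipeline could be declared like this (the pipeline and rule names here are hypothetical); its stage 0 rule would run before both stages of My new pipeline:

pipeline "Firewall normalization"
stage 0 match all
  rule "normalize firewall field names";
end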

Stages then list the rule references they want to be executed, as well as whether any or all of the rules’ conditions need to be satisfied to continue running the pipeline.

In our example, imagine rule “has firewall fields” checks for the presence of message fields src_ip and dst_ip, but does not have any actions to run. For a message without both fields the rule’s condition would evaluate to false and the pipeline would abort after stage 1, as the stage requires all rules be satisfied (match all). With the pipeline aborted, stage 2 would not run.

match either acts as an OR operator, requiring only a single rule’s condition to evaluate to true in order to continue pipeline processing. Note that actions are still run for all matching rules in the stage, even if it is the final stage in the pipeline.

Rules are referenced by their names, and can therefore be shared among many different pipelines. The intention is to enable creation of reusable building blocks, making it easier to process the data specific to your organization or use case.

Read more about Rules in the next section.

Rules

Overview

Rules are the cornerstone of processing pipelines. They contain the logic about how to change, enrich, route, and drop messages.

To avoid the complexities of a complete programming language, Graylog supports a small rule language to express processing logic. The rule language is intentionally limited to allow for easier understanding, faster learning, and better runtime optimization.

The real work of rules is done in functions, which are completely pluggable. Graylog already ships with a great number of built-in functions, providing data conversion, string manipulation, data retrieval using lookup tables, JSON parsing, and much more.

We expect that special purpose functions will be written and shared by the community, enabling faster innovation and problem solving than previously possible.

Rule Structure

Building upon the previous example in the Pipelines section, let’s look at examples of some of the rules we’ve referenced:

rule "has firewall fields"
when
    has_field("src_ip") && has_field("dst_ip")
then
end

rule "from firewall subnet"
when
    cidr_match("10.10.10.0/24", to_ip($message.gl2_remote_ip))
then
end

Firstly, apart from its name, the rule structure follows a simple when, then pattern. In the when clause we specify a boolean expression which is evaluated in the context of the current message in the pipeline. These are the conditions used by the pipeline processor to determine whether to run a rule, and collectively (when evaluating the containing stage’s match all or match either requirement) whether to continue in a pipeline.

Note that the has firewall fields rule uses the built-in function has_field to check whether the message has the src_ip and dst_ip fields, as we want to use them in a later stage of the pipeline. This rule has no actions to run in its then clause, since we only want to use it to determine whether subsequent stages should run.

The second rule, from firewall subnet, uses the built-in function cidr_match, which takes a CIDR pattern and an IP address. In this case we reference a field from the currently-processed message using the message reference syntax $message.

Graylog always sets the gl2_remote_ip field on messages, so we don’t need to check whether that field exists. If we wanted to use a field that might not exist on all messages we’d first use the has_field function to ensure its presence.

Note the call to to_ip around the gl2_remote_ip field reference. This is necessary since the field is stored as a string internally, and cidr_match requires an IP address object for its ip parameter.

Requiring an explicit conversion to an IP address object demonstrates an important feature of Graylog’s rule language: enforcement of type safety to ensure that you end up with the data in the correct format. All too often everything is treated as a string, which wastes enormous amounts of cycles on data conversion and prevents proper analysis of the data.

We again have no actions to run, since we’re just using the rule to manage the pipeline’s flow, so the then block is empty.

You might be wondering why we didn’t just combine the has firewall fields and from firewall subnet rules, since they seem to serve the same purpose. While we could absolutely do so, recall that rules are intended to be reusable building blocks. Imagine you have another pipeline for a different firewall subnet. Rather than duplicating the logic to check for src_ip and dst_ip, and updating each rule if anything ever changes (e.g. additional fields), you can simply add the has firewall fields rule to your new stage. With this approach you only need to update a single rule, with the change immediately taking effect for all pipelines referencing it. Nice!

Data Types

As we have seen in the previous section, we need to make sure to use the proper data types when calling functions.

Graylog’s rule language parser rejects invalid use of types, making it safe to write rules.

The six built-in types in Graylog are string (a UTF-8 string), double (corresponds to Java’s Double), long (Java’s Long), boolean (Boolean), void (indicating a function has no return value, preventing it from being used in a condition), and ip (a subset of InetAddress), but plugins are free to add additional types as they see fit. The rule processor takes care of ensuring that values and functions agree on the types being used.

By convention, functions that convert types start with the prefix to_. Please refer to the Functions index for a list.

Conditions

In Graylog’s rules the when clause is a boolean expression, which is evaluated against the processed message.

Expressions support the common boolean operators AND (or &&), OR (||), NOT (!), and comparison operators (<, <=, >, >=, ==, !=).

Any function that returns a value can be called in the when clause, but it must eventually evaluate to a boolean. For example: we were able to use to_ip in the from firewall subnet rule since it was being passed to cidr_match, which returns a boolean, but we could not use route_to_stream since it doesn’t return a value.

The condition must not be empty, but can simply consist of the boolean literal true. This is useful when you always want to execute a rule’s actions.
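
A minimal sketch of such an always-running rule (the field name here is made up for illustration):

rule "mark as processed"
when
    true
then
    set_field("pipeline_processed", true);
end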

If a condition calls a function which is not present (perhaps due to a typo or missing plugin) the call evaluates to false.

Note

Comparing two fields can be done when you use the same data type, e.g. to_string($message.src_ip) == to_string($message.dst_ip) will compare the two strings and will become true on match. Comparing different data types evaluates to false.

Actions

A rule’s then clause contains a list of actions which are evaluated in the order they appear.

There are two different types of actions:

  • Function calls
  • Variable assignments

Function calls look exactly like they do in conditions. All functions, including those which do not return a value, may be used in the then clause.

Variable assignments have the following form:

let name = value;

Variables are useful to avoid recomputing expensive parsing of data, holding on to temporary values, or making rules more readable.

Variables need to be defined before they can be used. Their fields (if any) can be accessed using the name.field notation in any place where a value of the field’s type is required.
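
A sketch of using a variable to avoid parsing the same date twice (the transaction_date field is hypothetical; accessing monthOfYear assumes the usual Joda-Time DateTime property):

rule "parse once, reuse twice"
when
    has_field("transaction_date")
then
    // parse the date a single time and reuse the result in both assignments
    let new_date = parse_date(to_string($message.transaction_date), "yyyy-MM-dd HH:mm:ss");
    set_field("transaction_year", new_date.year);
    set_field("transaction_month", new_date.monthOfYear);
end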

The list of actions can be empty, in which case the rule is essentially a pluggable condition to help manage a pipeline’s processing flow.

Stream connections

Overview

Pipelines by themselves do not process any messages. For a pipeline to actually do any work it must first be connected to one or more streams, which enables fine-grained control of the messages processed by that pipeline.

Note that the built-in function route_to_stream causes a message to be routed to a particular stream. After the routing occurs, the pipeline engine will look up and start evaluating any pipelines connected to that stream.

Although pipelines can trigger other pipelines via message routing, incoming messages must be processed by an initial set of pipelines connected to one or more streams.

The All messages stream

All messages received by Graylog are initially routed into the All messages stream. You can use this stream as the entry point to pipeline processing, allowing incoming messages to be routed to more streams and processed subsequently.

However, if you prefer to use the original stream matching functionality (i.e. stream rules), you can configure the Pipeline Processor to run after the Message Filter Chain (in the Message Processors Configuration section of the System -> Configurations page) and connect pipelines to existing streams. This gives you fine-grained control over the extraction, conversion, and enrichment process.

The importance of message processor ordering

It’s important to note that the order of message processors may have a significant impact on how your messages get processed.

For example: Message Filter Chain is responsible for setting static fields and running extractors defined on inputs, as well as evaluation of stream rules. If you create a pipeline that expects the presence of a static field, but the Pipeline Processor runs before Message Filter Chain, that field will not be available for use in your pipeline.

When designing your streams and pipelines be aware of the message processor order, especially if you have dependencies on earlier message processing.

Functions

Overview

Functions are the means of interacting with the messages Graylog processes.

Functions are written in Java and are pluggable, allowing Graylog’s pipeline processing capabilities to be easily extended.

Conceptually a function receives parameters, the current message context, and (potentially) returns a value. The data types of its return value and parameters determine where it can be used in a rule. Graylog ensures the rules are sound from a data type perspective.

A function’s parameters can either be passed as named pairs or by position, as long as optional parameters are declared as coming last. The functions’ documentation below indicates which parameters are optional by wrapping them in square brackets.

Let’s look at a small example to illustrate these properties:

rule "function howto"
when
    has_field("transaction_date")
then
    // the following date format assumes there's no time zone in the string
    let new_date = parse_date(to_string($message.transaction_date), "yyyy-MM-dd HH:mm:ss");
    set_field("transaction_year", new_date.year);
end

In this example, we check if the current message contains the field transaction_date and then, after converting it to a string, try to parse it according to the format string yyyy-MM-dd HH:mm:ss, so for example the string 2016-03-05 14:45:02 would match. The parse_date function returns a DateTime object from the Java Joda-Time library, allowing easier access to the date’s components.

We then add the transaction’s year as a new field, transaction_year, to the message.

You’ll note that we didn’t specify a time zone for our date, but Graylog still had to pick one. Graylog never relies on the local time of your server, as that makes it nearly impossible to figure out why date handling came up with its result.

The reason Graylog knows which timezone to use is that parse_date actually takes four parameters, rather than the two we’ve given it in this example. The other two are a String called timezone (default value: "UTC") and a String called locale (default value: the default locale of the system running Graylog), both of which are optional.

Let’s assume we have another message field called transaction_timezone, which is sent by the application and contains the time zone ID the transaction was done in (hopefully no application in the world sends its data like this, though):

rule "function howto"
when
    has_field("transaction_date") && has_field("transaction_timezone")
then
    // the following date format assumes there's no time zone in the string
    let new_date = parse_date(
                        to_string($message.transaction_date),
                        "yyyy-MM-dd HH:mm:ss",
                        to_string($message.transaction_timezone)
                );
    set_field("transaction_year", new_date.year);
end

Now we’re passing the string value of the message’s transaction_timezone field to the parse_date function as its timezone parameter.

In this case we only have a single optional parameter, which makes it easy to simply omit it from the end of the function call. However, if there are multiple optional parameters, or if there are so many parameters that it gets difficult to keep track of which positions correspond to which parameters, you can also use the named parameter variant of function calls. In this mode the order of the parameters does not matter, but all required ones still need to be there.

In our case the alternative version of calling parse_date would look like this:

rule "function howto"
when
    has_field("transaction_date") && has_field("transaction_timezone")
then
    // the following date format assumes there's no time zone in the string
    let new_date = parse_date(
                        value: to_string($message.transaction_date),
                        pattern: "yyyy-MM-dd HH:mm:ss",
                        timezone: to_string($message.transaction_timezone)
                );
    set_field("transaction_year", new_date.year);
end

More examples of the usage of various functions in pipeline rules can be found in the graylog2-server repo.

All parameters in Graylog’s processing functions, listed below, are named.

Function Index

The following list describes the built-in functions that ship with Graylog. Additional third party functions are available via plugins in the marketplace.

Built-in Functions
Name Description
debug Print the passed value as string in the Graylog log.
to_bool Converts the single parameter to a boolean value using its string value.
to_double Converts the first parameter to a double floating point value.
to_long Converts the first parameter to a long integer value.
to_string Converts the first parameter to its string representation.
to_url Converts a value to a valid URL using its string representation.
is_null Checks whether a value is ‘null’.
is_not_null Checks whether a value is not ‘null’.
abbreviate Abbreviates a String using ellipses.
capitalize Capitalizes a String changing the first letter to title case.
uncapitalize Uncapitalizes a String changing the first letter to lower case.
uppercase Converts a String to upper case.
lowercase Converts a String to lower case.
swapcase Swaps the case of a String.
contains Checks if a string contains another string.
substring Returns a substring of value with the given start and end offsets.
concat Concatenates two strings.
split Split a string around matches of this pattern (Java syntax).
regex Match a regular expression against a string, with matcher groups.
grok Applies a Grok pattern to a string.
key_value Extracts key/value pairs from a string.
crc32 Returns the hex encoded CRC32 digest of the given string.
crc32c Returns the hex encoded CRC32C (RFC 3720, Section 12.1) digest of the given string.
md5 Returns the hex encoded MD5 digest of the given string.
murmur3_32 Returns the hex encoded MurmurHash3 (32-bit) digest of the given string.
murmur3_128 Returns the hex encoded MurmurHash3 (128-bit) digest of the given string.
sha1 Returns the hex encoded SHA1 digest of the given string.
sha256 Returns the hex encoded SHA256 digest of the given string.
sha512 Returns the hex encoded SHA512 digest of the given string.
parse_json Parse a string into a JSON tree.
select_jsonpath Selects one or more named JSON Path expressions from a JSON tree.
to_ip Converts the given string to an IP object.
cidr_match Checks whether the given IP matches a CIDR pattern.
from_input Checks whether the current message was received by the given input.
route_to_stream Assigns the current message to the specified stream.
remove_from_stream Removes the current message from the specified stream.
create_message (currently incomplete) Creates a new message which will be evaluated by the entire processing pipeline.
clone_message Clones a message.
drop_message Removes the currently processed message from the processing pipeline after the rule finishes.
has_field Checks whether the currently processed message contains the named field.
remove_field Removes the named field from the currently processed message.
set_field Sets the named field to the given value in the currently processed message.
set_fields Sets multiple fields to the given values in the currently processed message.
rename_field Rename a message field.
syslog_facility Converts a syslog facility number to its string representation.
syslog_level Converts a syslog level number to its string representation.
expand_syslog_priority Converts a syslog priority number to its level and facility.
expand_syslog_priority_as_string Converts a syslog priority number to its level and facility string representations.
now Returns the current date and time.
parse_date Parses a date and time from the given string, according to a strict pattern.
flex_parse_date Attempts to parse a date and time using the Natty date parser.
format_date Formats a date and time according to a given formatter pattern.
to_date Converts a type to a date.
years Create a period with a specified number of years.
months Create a period with a specified number of months.
weeks Create a period with a specified number of weeks.
days Create a period with a specified number of days.
hours Create a period with a specified number of hours.
minutes Create a period with a specified number of minutes.
seconds Create a period with a specified number of seconds.
millis Create a period with a specified number of millis.
period Parses an ISO 8601 period from the specified string.
lookup Looks up a multi value in the named lookup table.
lookup_value Looks up a single value in the named lookup table.
debug

debug(value: any)

Print any passed value as string in the Graylog log.

Note

The debug message will only appear in the log of the Graylog node that was processing the message you are trying to debug.

Example:

// Print: "INFO : org.graylog.plugins.pipelineprocessor.ast.functions.Function - PIPELINE DEBUG: Dropped message from <source>"
let debug_message = concat("Dropped message from ", to_string($message.source));
debug(debug_message);
to_bool

to_bool(value: any)

Converts the single parameter to a boolean value using its string value.

to_double

to_double(value: any, [default: double])

Converts the first parameter to a double floating point value.

to_long

to_long(value: any, [default: long])

Converts the first parameter to a long integer value.
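
For example, a sketch that converts a (hypothetical) bytes field to a long, falling back to 0 if the conversion fails:

// Store the byte count as a long so range searches and statistics work
let size = to_long($message.bytes, 0);
set_field("message_size", size);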

to_string

to_string(value: any, [default: string])

Converts the first parameter to its string representation.

to_url

to_url(url: any, [default: string])

Converts the given url to a valid URL.

is_null

is_null(value: any)

Checks if the given value is null.

Example:

// Check if the `src_addr` field is null (empty).
// If null, boolean true is returned. If not null, boolean false is returned.
is_null($message.src_addr)
is_not_null

is_not_null(value: any)

Checks if the given value is not null.

Example:

// Check if the `src_addr` field is not null.
// If not null, boolean true is returned. If null, boolean false is returned.
is_not_null($message.src_addr)
abbreviate

abbreviate(value: string, width: long)

Abbreviates a String using ellipses, the width defines the maximum length of the resulting string.

capitalize

capitalize(value: string)

Capitalizes a String changing the first letter to title case.

uncapitalize

uncapitalize(value: string)

Uncapitalizes a String changing the first letter to lower case.

uppercase

uppercase(value: string, [locale: string])

Converts a String to upper case. The locale (IETF BCP 47 language tag) defaults to “en”.

lowercase

lowercase(value: string, [locale: string])

Converts a String to lower case. The locale (IETF BCP 47 language tag) defaults to “en”.

swapcase

swapcase(value: string)

Swaps the case of a String changing upper and title case to lower case, and lower case to upper case.

contains

contains(value: string, search: string, [ignore_case: boolean])

Checks if value contains search, optionally ignoring the case of the search pattern.

Example:

// Check if `example.org` is in the `hostname` field. Ignore case.
contains(to_string($message.hostname), "example.org", true)
substring

substring(value: string, start: long, [end: long])

Returns a substring of value starting at the start offset (zero based indices), optionally ending at the end offset. Both offsets can be negative, indicating positions relative to the end of value.

Example:

// Extract the substring starting at offset 0 and stopping at offset 2
// Below example will return "ab"
substring("abc", 0, 2)
concat

concat(first: string, second: string)

Returns a new string combining the text of first and second.

Note

The concat() function only concatenates two strings. If you want to build a string from more than two sub-strings, you’ll have to use concat() multiple times, see the example below.

Example:

// Build a message like:
// 'TCP connect from 88.99.35.172 to 192.168.1.10 Port 443'
let build_message_0 = concat(to_string($message.protocol), " connect from ");
let build_message_1 = concat(build_message_0, to_string($message.src_ip));
let build_message_2 = concat(build_message_1, " to ");
let build_message_3 = concat(build_message_2, to_string($message.dst_ip));
let build_message_4 = concat(build_message_3, " Port ");
let build_message_5 = concat(build_message_4, to_string($message.dst_port));
set_field("message", build_message_5);
split

split(pattern: string, value: string, [limit: int])

Split a value around matches of pattern. Use limit to indicate the number of times the pattern should be applied.

Note

Patterns have to be valid Java String literals, please ensure you escape any backslashes in your regular expressions!
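
A short sketch (the host_list field is hypothetical), storing the parts of a comma-separated value as a new field:

// "a.example.org,b.example.org" becomes a list of two strings
set_field("hosts", split(",", to_string($message.host_list)));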

regex

regex(pattern: string, value: string, [group_names: array[string]])

Match the regular expression in pattern against value. Returns a match object, with the boolean property matches to indicate whether the regular expression matched and, if requested, the matching groups as groups. The groups can optionally be named using the group_names array. If not named, the group names are strings starting with "0".

Note

Patterns have to be valid Java String literals, please ensure you escape any backslashes in your regular expressions!
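
A sketch of naming a capture group and copying its value into a field (the pattern and field names are made up; accessing the group by name assumes the group_names variant of the signature above):

// Capture a three-digit HTTP status code into a group named "status"
let m = regex("\\s(\\d{3})\\s", to_string($message.message), ["status"]);
set_field("http_status", m["status"]);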

grok

grok(pattern: string, value: string, [only_named_captures: boolean])

Applies the grok pattern grok to value. Returns a match object, containing a Map of field names and values. You can set only_named_captures to true to only return matches using named captures.

Tip

The result of executing the grok function can be passed as argument for set_fields to set the extracted fields into a message.

Note

The Grok pattern used in this function must first be imported into the Graylog “Grok Patterns” page.

Example:

// Apply the Grok pattern NGINXACCESS to the string representation of the "message" field
// Only return named captures from GROK pattern
grok("%{NGINXACCESS}", to_string($message.message), true)

Example:

// Let "nginxaccessfields" hold the Map returned by the grok function
// Use the "set_fields" function to use the "nginxaccessfields" object to set individual field names and values.
let nginxaccessfields = grok("%{NGINXACCESS}", to_string($message.message), true);
set_fields(nginxaccessfields);
key_value

key_value(
  value: string,
  [delimiters: string],
  [kv_delimiters: string],
  [ignore_empty_values: boolean],
  [allow_dup_keys: boolean],
  [handle_dup_keys: string],
  [trim_key_chars: string],
  [trim_value_chars: string]
)

Extracts key-value pairs from the given value and returns them as a Map of field names and values. You can optionally specify:

delimiters
Characters used to separate pairs. We will use each character in the string, so you do not need to separate them. Default value: <whitespace>.
kv_delimiters
Characters used to separate keys from values. Again, there is no need to separate each character. Default value: =.
ignore_empty_values
Ignores keys containing empty values. Default value: true.
allow_dup_keys
Indicates if duplicated keys are allowed. Default value: true.
handle_dup_keys
How to handle duplicated keys (if allow_dup_keys is set). It can take the values take_first, which will only use the first value for the key; or take_last, which will only use the last value for the key. Setting this option to any other value will change the handling to concatenate, which will combine all values given to the key, separating them with the value set in this option. For example, setting handle_dup_keys: ",", would combine all values given to a key a, separating them with a comma, such as 1,2,foo. Default value: take_first.
trim_key_chars
Characters to trim (remove from the beginning and end) from keys. Default value: no trim.
trim_value_chars
Characters to trim (remove from the beginning and end) from values. Default value: no trim.

Tip

The result of executing the key_value function can be passed as argument for set_fields to set the extracted fields into a message.
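
A short sketch, assuming the message field contains text like the key1=value1 some_other_key=foo example from the extractors chapter:

// Extract all key=value pairs and add them as message fields
let kv = key_value(to_string($message.message));
set_fields(kv);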

crc32

crc32(value: string)

Creates the hex encoded CRC32 digest of the value.

crc32c

crc32c(value: string)

Creates the hex encoded CRC32C (RFC 3720, Section 12.1) digest of the value.

md5

md5(value: string)

Creates the hex encoded MD5 digest of the value.

murmur3_32

murmur3_32(value: string)

Creates the hex encoded MurmurHash3 (32-bit) digest of the value.

murmur3_128

murmur3_128(value: string)

Creates the hex encoded MurmurHash3 (128-bit) digest of the value.

sha1

sha1(value: string)

Creates the hex encoded SHA1 digest of the value.

sha256

sha256(value: string)

Creates the hex encoded SHA256 digest of the value.

sha512

sha512(value: string)

Creates the hex encoded SHA512 digest of the value.

parse_json

parse_json(value: string)

Parses the value string as JSON, returning the resulting JSON tree.

select_jsonpath

select_jsonpath(json: JsonNode, paths: Map<string, string>)

Evaluates the given paths against the json tree and returns the map of the resulting values.
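
A sketch combining parse_json and select_jsonpath (the payload field and the JSON paths are hypothetical):

// Pick two values out of a JSON document and store them as message fields
let json = parse_json(to_string($message.payload));
let fields = select_jsonpath(json, { user_name: "$.user.name", error_code: "$.error.code" });
set_fields(fields);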

to_ip

to_ip(ip: string)

Converts the given ip string to an IpAddress object.

cidr_match

cidr_match(cidr: string, ip: IpAddress)

Checks whether the given ip address object matches the cidr pattern.

Example:

// Check whether the value in the src_addr field is in the subnet "192.0.0.0/8"
// If it is, return boolean "True". Else, return boolean "False"
cidr_match("192.0.0.0/8", to_ip($message.src_addr))
from_input

from_input(id: string | name: string)

Checks whether the currently processed message was received on the given input. The input can be looked up by either specifying its name (the comparison ignores the case) or the id.
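
For example, a sketch that tags messages received on a particular input (the input name is hypothetical):

rule "tag firewall input"
when
    from_input(name: "firewall-syslog")
then
    set_field("from_firewall", true);
end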

route_to_stream

route_to_stream(id: string | name: string, [message: Message], [remove_from_default: boolean])

Routes the message to the given stream. The stream can be looked up by either specifying its name or the id.

If message is omitted, this function uses the currently processed message.

This causes the message to be evaluated on the pipelines connected to that stream, unless the stream has already been processed for this message.

If remove_from_default is true, the message is also removed from the default stream “All messages”.

Example:

// Route the currently processed message to the stream with ID `512bad1a535b43bd6f3f5e86` (preferred method)
route_to_stream("512bad1a535b43bd6f3f5e86");

// Route the currently processed message to a stream named `Custom Stream`
route_to_stream(name: "Custom Stream");
remove_from_stream

remove_from_stream(id: string | name: string, [message: Message])

Removes the message from the given stream. The stream can be looked up by either specifying its name or the id.

If message is omitted, this function uses the currently processed message.

If the message ends up being on no stream anymore, it is implicitly routed back to the default stream “All messages”. This ensures that the message is not accidentally lost due to complex stream routing rules. If you want to discard the message entirely, use the drop_message function.

create_message

create_message([message: string], [source: string], [timestamp: DateTime])

Creates a new message from the given parameters. If any of them is omitted, its value is taken from the corresponding field of the currently processed message. If timestamp is omitted, the timestamp of the created message will be the current time.

clone_message

clone_message([message: Message])

Clones a message. If message is omitted, this function uses the currently processed message.

drop_message

drop_message(message: Message)

The processing pipeline will remove the given message after the rule is finished executing.

If message is omitted, this function uses the currently processed message.

This can be used to implement flexible blacklisting based on various conditions.
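
A minimal blacklisting sketch (the level field is hypothetical):

rule "drop debug noise"
when
    has_field("level") && to_string($message.level) == "DEBUG"
then
    drop_message();
end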

has_field

has_field(field: string, [message: Message])

Checks whether the given message contains a field with the name field.

If message is omitted, this function uses the currently processed message.

remove_field

remove_field(field: string, [message: Message])

Removes the given field with the name field from the given message, unless the field is reserved.

If message is omitted, this function uses the currently processed message.

set_field

set_field(field: string, value: any, [prefix: string], [suffix: string], [message: Message])

Sets the given field named field to the new value. The field name must be valid, and specifically cannot include a . character. It is trimmed of leading and trailing whitespace. String values are trimmed of whitespace as well.

The optional prefix and suffix parameters specify which prefix or suffix should be added to the inserted field name.

If message is omitted, this function uses the currently processed message.
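
Example (field name and value are hypothetical):

// Write the value 504 into the field "response_code" of the currently processed message
set_field("response_code", 504);

// With the prefix parameter, the resulting field is named "http_response_code"
set_field(field: "response_code", value: 504, prefix: "http_");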

set_fields

set_fields(fields: Map<string, any>, [prefix: string], [suffix: string], [message: Message])

Sets all of the given name-value pairs on the given message. This is a convenience function that acts like set_field for each entry. It is helpful for writing the result of a function like select_jsonpath or regex into the currently processed message, especially when the key names are the result of a regular expression.

The optional prefix and suffix parameters specify which prefix or suffix should be added to the inserted field names.

If message is omitted, this function uses the currently processed message.
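
Example (field names and values are hypothetical):

// Set several fields on the currently processed message at once
set_fields({ environment: "production", team: "ops" });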

rename_field

rename_field(old_field: string, new_field: string, [message: Message])

Modifies the field name old_field to new_field in the given message, keeping the field value unchanged.
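
Example:

// Rename the hypothetical field "src" to "source_address", keeping its value
rename_field("src", "source_address");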

syslog_facility

syslog_facility(value: any)

Converts the syslog facility number in value to its string representation.

syslog_level

syslog_level(value: any)

Converts the syslog severity number in value to its string representation.
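
Example (assuming the message carries numeric syslog facility and level fields):

// Translate the numeric values into their string representations
set_field("facility_name", syslog_facility($message.facility));
set_field("level_name", syslog_level($message.level));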

expand_syslog_priority

expand_syslog_priority(value: any)

Converts the syslog priority number in value to its numeric severity and facility values.

expand_syslog_priority_as_string

expand_syslog_priority_as_string(value: any)

Converts the syslog priority number in value to its severity and facility string representations.

now

now([timezone: string])

Returns the current date and time. If no timezone is given, it defaults to UTC.

parse_date

parse_date(value: string, pattern: string, [locale: string], [timezone: string])

Parses the value into a date and time object, using the pattern. If no timezone is detected in the pattern, the optional timezone parameter is used as the assumed timezone. If omitted the timezone defaults to UTC.

The format used for the pattern parameter is identical to the pattern of the Joda-Time DateTimeFormat.

Symbol  Meaning                       Presentation  Examples
G       era                           text          AD
C       century of era (>=0)          number        20
Y       year of era (>=0)             year          1996
x       weekyear                      year          1996
w       week of weekyear              number        27
e       day of week                   number        2
E       day of week                   text          Tuesday; Tue
y       year                          year          1996
D       day of year                   number        189
M       month of year                 month         July; Jul; 07
d       day of month                  number        10
a       halfday of day                text          PM
K       hour of halfday (0~11)        number        0
h       clockhour of halfday (1~12)   number        12
H       hour of day (0~23)            number        0
k       clockhour of day (1~24)       number        24
m       minute of hour                number        30
s       second of minute              number        55
S       fraction of second            millis        978
z       time zone                     text          Pacific Standard Time; PST
Z       time zone offset/id           zone          -0800; -08:00; America/Los_Angeles
'       escape for text               delimiter
''      single quote                  literal

The format used for the locale parameter is a valid language tag according to IETF BCP 47 which can be parsed by the Locale#forLanguageTag(String) method.

Also see IANA Language Subtag Registry.

If no locale was specified, the locale of the system running Graylog (the default locale) is used.

Examples:

Language Tag  Description
en            English
en-US         English as used in the United States
de-CH         German for Switzerland
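
Example (the field name event_time and its format are hypothetical):

// Parse a string like "2017-03-05 14:45:02", assuming it is in the Europe/Berlin timezone
let ts = parse_date(value: to_string($message.event_time), pattern: "yyyy-MM-dd HH:mm:ss", timezone: "Europe/Berlin");
set_field("timestamp", ts);
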
flex_parse_date

flex_parse_date(value: string, [default: DateTime], [timezone: string])

Uses the Natty date parser to parse a date and time value. If no timezone is detected in the value, the optional timezone parameter is used as the assumed timezone. If omitted, the timezone defaults to UTC.

If the parser fails to detect a valid date and time, the default date and time is returned; if no default was provided, the expression fails to evaluate and the rule is aborted.

format_date

format_date(value: DateTime, format: string, [timezone: string])

Returns the given date and time value formatted according to the format string. If no timezone is given, it defaults to UTC.

to_date

to_date(value: any, [timezone: string])

Converts value to a date. If no timezone is given, it defaults to UTC.
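
Example:

// Convert the message timestamp to a date and store the day it was received
let d = to_date($message.timestamp);
set_field("received_day", format_date(d, "yyyy-MM-dd"));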

years

years(value: long)

Create a period with value number of years.

months

months(value: long)

Create a period with value number of months.

weeks

weeks(value: long)

Create a period with value number of weeks.

days

days(value: long)

Create a period with value number of days.

hours

hours(value: long)

Create a period with value number of hours.

minutes

minutes(value: long)

Create a period with value number of minutes.

seconds

seconds(value: long)

Create a period with value number of seconds.

millis

millis(value: long)

Create a period with value number of milliseconds.

period

period(value: string)

Parses an ISO 8601 period from value.
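
Example:

// A period of two hours
hours(2);

// A period of one day and twelve hours, parsed from an ISO 8601 string
period("P1DT12H");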

lookup

lookup(lookup_table: string, key: any, [default: any])

Looks up a multi value in the named lookup table.
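
Example:

// Copy all values of the multi value result into the currently processed message.
// The lookup table name "geoip_data" is hypothetical.
let result = lookup("geoip_data", to_string($message.src_addr));
set_fields(result);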

lookup_value

lookup_value(lookup_table: string, key: any, [default: any])

Looks up a single value in the named lookup table.

Example:

// Lookup a value in lookup table "ip_lookup" where the key is the string representation of the src_addr field.
lookup_value("ip_lookup", to_string($message.src_addr));

Usage

Overview

Once you understand the concepts explained in Pipelines, Rules, and Stream connections, you’re ready to start creating your own processing pipelines. This page gives you the information you need to get started with the user interface.

Configuration

Configure the message processor

Before you start using the processing pipelines, you need to ensure the Pipeline Processor message processor is enabled and correctly configured. You can do so by going to the System -> Configurations page, and checking the configuration in the Message Processors Configuration section.

_images/pipelines_message_processor.png

On the Configurations page, you need to enable the Pipeline Processor message processor and, if you want your pipelines to have access to static fields set on inputs and/or fields set by extractors, set the Pipeline Processor after the Message Filter Chain.

Manage rules

You can create, edit, and delete your pipeline rules in the Manage rules page, under System -> Pipelines.

_images/pipelines_manage_rules.png

Clicking on Create Rule or Edit in one of the rules will open a page where you can write your own rule. The page lists available functions and their details to make the task a bit more manageable.

_images/pipelines_edit_rule.png

Managing pipelines

Once there are some rules in Graylog, you can create pipelines that use them to modify and enrich your messages.

To manage your pipelines, access the Manage pipelines page under System -> Pipelines. This page is where you can create, edit, and delete pipelines.

_images/pipelines_manage_pipelines.png

In order to create or edit pipelines, and as explained in Pipelines, you need to add your rules to a stage, which has a certain priority. The Web interface will let you add rules to the default stage (priority 0), and to create new stages with potentially different priorities.

_images/pipelines_show_pipeline.png

A pipeline can have more than one stage, and when you create or edit a stage you need to select how to proceed to the next stage in the pipeline:

All rules on this stage match the message
This option will only consider further stages in the pipeline when all conditions in rules evaluated in this stage are true. This is equivalent to match all in the Pipelines section.
At least one of the rules on this stage matches the message
Selecting this option will continue to further stages in the pipeline when one or more of the conditions in rules evaluated in this stage are true. This is equivalent to match either in the Pipelines section.

Connect pipelines to streams

You can decide which streams are connected to a pipeline from the pipeline details page. Under System -> Pipelines, click on the title of the pipeline you want to connect to a stream, and then click on the Edit connections button.

_images/pipelines_manage_connections.png

You can assign many pipelines to the same stream, in which case all connected pipelines will process messages routed into that stream based upon the overall order of stage priorities.

_images/pipelines_edit_connections.png

Remember, as mentioned in the Stream connections documentation, the All messages stream is where all messages are initially routed, and is therefore a good place to apply pipelines applicable to all of your messages. Such pipelines might be responsible for stream routing, blacklisting, field manipulation, etc.

Simulate your changes

After performing some changes in a processing pipeline, you most likely want to see how they are applied to incoming messages. This is what the pipeline simulator is for.

Click the Simulate processing button under System -> Pipelines or in the pipeline details page to access the pipeline simulator.

_images/pipelines_simulation_1.png

In order to test the message processing you need to provide a raw message that will be routed into the stream you want to simulate. The raw message should be in the same format Graylog would receive it in. For example, you can type a GELF message, in the same format your GELF library would send it, into the Raw message field. Don’t forget to select the correct codec for the message you provide.

After specifying the message and codec, click Load message to start the simulation and display the results.

_images/pipelines_simulation_2.png

The simulation provides the following results:

Changes summary
Provides a summary of modified fields in the original message, as well as a list of added and dropped messages.
Results preview
Shows all fields in the processed message.
Simulation trace
Displays a trace of the processing, indicating which rules were evaluated and which were executed. It also includes a timeline, in microseconds, to allow you to see which rules and pipelines are taking up the most time during message processing.

Lookup Tables

Graylog 2.3 introduced the lookup tables feature. It allows you to lookup/map/translate message field values into new values and write them into new message fields or overwrite existing fields. A simple example is to use a static CSV file to map IP addresses to host names.

Components

The lookup table system consists of four components:

  • Data adapters
  • Caches
  • Lookup tables
  • Lookup results

Data Adapters

Data adapters are used to do the actual lookup for a value. They might read from a CSV file, connect to a database or execute HTTP requests to receive the lookup result.

Data adapter implementations are pluggable and new ones can be added through plugins.

Caches

The caches are responsible for caching the lookup results to improve the lookup performance and/or to avoid overloading databases and APIs. They are separate entities to make it possible to reuse a cache implementation for different data adapters. That way, the data adapters do not have to care about caching and do not have to implement it on their own.

Cache implementations are pluggable and new ones can be added through plugins.

Important

The CSV file adapter reads the entire contents of the file into HEAP memory. Ensure that you size the HEAP accordingly.

Note

The CSV file adapter refreshes its contents within each check interval if the file was changed. If the cache was purged but the check interval has not elapsed, lookups might return expired values.

Lookup Tables

The lookup table component ties together a data adapter instance and a cache instance. It is needed to actually enable the usage of the lookup table in extractors, converters, pipeline functions and decorators.

Lookup Results

The lookup result is returned by a lookup table through the data adapter and can contain two types of data: a single value and a multi value.

The single value can be a string, number or boolean and will be used in extractors, converters, decorators and pipeline rules. In our CSV example to lookup host names for IP addresses, this would be the host name string.

A multi value is a map/dictionary-like data structure and can contain several different values. This is useful if the data adapter can provide multiple values for a key. A good example of this is the geo-ip data adapter, which provides not only the latitude and longitude for an IP address, but also information about the city and country of the location. Currently, the multi value can only be used in a pipeline rule via the lookup() pipeline function.

Example 1: Output for a CSV data adapter including a single value and a multi value.

_images/example-single-value.png

Example 2: Output for the geo-ip data adapter including a single value and a multi value.

_images/example-multi-value.png

Setup

The lookup tables can be configured on the “System/Lookup Tables” page.

You need to create at least one data adapter and one cache before you can create your first lookup table. The following example setup creates a lookup table with a CSV file data adapter and an in-memory cache.

Create Data Adapter

Navigate to “System/Lookup Tables” and click the “Data Adapters” button in the top right corner. Then you first have to select a data adapter type.

Every data adapter form includes data adapter specific documentation that helps you to configure it correctly.

_images/setup-data-adapter.png

Create Cache

Navigate to “System/Lookup Tables” and click the “Caches” button in the top right corner. Then you first have to select a cache type.

Every cache form includes cache specific documentation that helps you to configure it correctly.

_images/setup-cache.png

Create Lookup Table

Now you can create a lookup table with the newly created data adapter and cache by navigating to “System/Lookup Tables” and clicking “Create lookup table”.

Make sure to select the data adapter and cache instances in the creation form.

_images/setup-table.png

Default Values

Every lookup table can optionally be configured with default values which will be used if a lookup operation does not return any result.

_images/setup-table-defaults.png

Usage

Lookup tables can be used with the following Graylog components.

  • Extractors
  • Converters
  • Decorators
  • Pipeline rules

Extractors

A lookup table extractor can be used to look up the value of a message field in a lookup table and write the result into a new field or overwrite an existing field.

_images/usage-extractor.png

Converters

When you use an extractor to get values out of a text message, you can use a lookup table converter to do a lookup on the extracted value.

_images/usage-converter.png

Decorators

A lookup table decorator can be used to enrich messages by looking up values at search time.

_images/usage-decorator.png

Pipeline Rules

There are two lookup functions that can be used in a pipeline rule, lookup() and lookup_value(). The first returns the multi value data of the lookup result, the second returns the single value.

_images/usage-pipeline-rule.png
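
A minimal rule combining both functions could look like this (the lookup table name ip_lookup is hypothetical):

rule "resolve host name for source address"
when
    has_field("src_addr")
then
    set_field("src_hostname", lookup_value("ip_lookup", to_string($message.src_addr)));
end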

Message rewriting with Drools

Note

Since Graylog 2.0 you can use the processing pipelines for more flexible message rewriting.

Graylog can optionally use Drools Expert to evaluate all incoming messages against a user defined rules file. Each message will be evaluated prior to being written to the outputs.

The rule file location is defined in the Graylog configuration file:

# Drools Rule File (Use to rewrite incoming log messages)
rules_file = /etc/graylog.d/rules/graylog.drl

The rules file resides on the file system and must have a .drl file extension. The rules file can contain multiple rules, queries and functions, as well as some resource declarations like imports, globals, and attributes that are assigned and used by your rules and queries.

For more information on the DRL rules syntax please read the Drools User Guide.

Getting Started

  1. Uncomment the rules_file line in the Graylog configuration file.
  2. Copy the sample rules file to the location specified in your Graylog configuration file.
  3. Modify the rules file to parse/rewrite/filter messages as needed.

Example rules file

This is an example rules file:

rule "Overwrite localhost"
    when
        m : Message( source == "localhost" )
    then
        m.addField("source", "localhost.example.com" );
        log.info("[Overwrite localhost] rule fired: {}", m);
end

rule "Drop UDP and ICMP Traffic from firewall"
    when
        m : Message( getField("full_message") matches "(?i).*(ICMP|UDP) Packet(.|\n|\r)*" && source == "firewall" )
    then
        m.setFilterOut(true);
        log.info("[Drop UDP and ICMP Traffic from firewall] rule fired: {}", m);
end

The log object being used to write log messages from within Drools rules is an instance of the SLF4J Logger interface.

Parsing Message and adding fields

In the following script we turn the PID and the source IP into additional fields:

import org.graylog2.plugin.Message
import java.util.regex.Matcher
import java.util.regex.Pattern

// Raw Syslog
// Example:
//   Apr 18 15:34:58 server01 smtp-glass[3371]: NEW (1/0) on=1.1.1.1:9100, src=2.2.2.2:38776, ident=, dst=3.3.3.3:25, id=1303151698.3371
rule "SMTP Glass Logging to GELF"
  when
      m : Message( message matches "^smtp-glass.*" )
  then
      Matcher matcher = Pattern.compile("smtp-glass\\[(\\d+)\\].* src=(\\d+\\.\\d+\\.\\d+\\.\\d+)").matcher(m.getMessage());
      if (matcher.find()) {
         m.addField("_pid", Long.valueOf(matcher.group(1)));
         m.addField("_src", matcher.group(2));
      }
end

Another example: Adding additional fields and changing the message itself

We send Squid access logs to Graylog using Syslog. The problem is that the host field of the message was set to the IP address of the Squid proxy, which is not very useful. This rule overwrites the source and adds other fields:

import java.util.regex.Matcher
import java.util.regex.Pattern
import java.net.InetAddress;

/*
Raw Syslog: squid[2099]: 1339551529.881  55647 1.2.3.4 TCP_MISS/200 22 GET http://www.google.com/

squid\[\d+\]: (\d+\.\d+) *(\d+) *(\d+.\d+.\d+.\d+) *(\w+\/\w+) (\d+) (\w+) (.*)
matched: 13:1339551529.881
matched: 29:55647
matched: 35:1.2.3.4
matched: 47:TCP_MISS/200
matched: 60:22
matched: 64:GET
matched: 68:http://www.google.com/
*/

rule "Squid Logging to GELF"
    when
        m : Message( getField("facility") == "local5" )
    then
        Matcher matcher = Pattern.compile("squid\\[\\d+\\]: (\\d+.\\d+) *(\\d+) *(\\d+.\\d+.\\d+.\\d+) *(\\w+\\/\\w+) (\\d+) (\\w+) (.*)").matcher(m.getMessage());

        if (matcher.find()) {
            m.addField("facility", "squid");
            InetAddress addr = InetAddress.getByName(matcher.group(3));
            String host = addr.getHostName();
            m.addField("source",host);
            m.addField("message",matcher.group(6) + " " + matcher.group(7));
            m.addField("_status",matcher.group(4));
            m.addField("_size",matcher.group(5));
        }
end

Blacklisting messages

You can also use Drools rules to blacklist messages.

Blacklisting

Note

Since Graylog 2.0 you can use the processing pipelines for blacklisting.

If you have messages coming into Graylog that should be discarded before being written to Elasticsearch or forwarded to another system you can use Drools rules to perform custom filtering.

The rule file location is defined in the Graylog configuration file:

# Drools Rule File (Use to rewrite incoming log messages)
rules_file = /etc/graylog.d/rules/graylog.drl

The rules file resides on the file system and must have a .drl file extension. The rules file can contain multiple rules, queries and functions, as well as some resource declarations like imports, globals, and attributes that are assigned and used by your rules and queries.

For more information on the DRL rules syntax please read the Drools User Guide.

How to

The general idea is simple: Any Message marked with setFilterOut(true) will be discarded when processed in the Graylog filter chain. You can either write and load your own filter plugin that can execute any Java code to mark messages or just use the Drools rules. The following example shows how to do this.

Based on regular expressions

Put this into your rules_file:

import org.graylog2.plugin.Message
import java.util.regex.Matcher
import java.util.regex.Pattern

rule "Blacklist all messages that start with 'firewall'"
  when
      m : Message( message matches "^firewall.*" )
  then
      System.out.println("DEBUG: Blacklisting message."); // Don't do this in production.
      m.setFilterOut(true);
end

This rule will blacklist any message that starts with the string “firewall” (matches "^firewall.*").

Geolocation

Graylog lets you extract and visualize geolocation information from IP addresses in your logs. Here we will explain how to install and configure the geolocation resolution, and how to create a map with the extracted geo-information.

Setup

The Graylog Map Widget is the plugin providing geolocation capabilities to Graylog. The plugin is compatible with Graylog 2.0.0 and higher, and it is installed by default, although some configuration is still required on your side. This section explains how to configure the plugin in detail.

In case you need to reinstall the plugin for some reason, you can find it inside the Graylog tarball on our downloads page. Follow the instructions in Installing and loading plugins to install it.

Configure the database

First, you need to download a geolocation database. We currently support MaxMind City databases in the MaxMind DB format, such as the GeoIP2 City Database or GeoLite2 City Database that MaxMind provides.

The next step is to store the geolocation database in all servers running Graylog. As an example, if you were using the Graylog OVA, you could save the database in the /var/opt/graylog/data folder, along with other data used by Graylog. Make sure you grant the right permissions so the user running Graylog can read the file.

Then you need to configure Graylog to start using the geolocation database to resolve IPs in your logs. To do that, open the Graylog web interface in your favourite browser, and go to System -> Configurations. You can find the geolocation configuration under the Plugins / Geo-Location Processor section, as seen in the screenshot.

_images/geolocation_1.png

In the configuration modal, you need to check the Enable geolocation processor checkbox, and enter the path to the geolocation database you use. Once you are all set, click on Save to store the configuration changes.

_images/geolocation_2.png

Configure the message processor

The last step before being able to resolve locations from IPs in your logs, is to activate the GeoIP Resolver processor. In the same System -> Configurations page, update the configuration in the Message Processors Configuration section.

_images/geolocation_3.png

In that screen, you need to enable the GeoIP Resolver, and you must also set the GeoIP Resolver as the last message processor to run, if you want to be able to resolve geolocation from fields coming from extractors.

_images/geolocation_4.png

That’s it. At this point Graylog will start looking for fields containing exclusively an IPv4 or IPv6 address, and extracting their geolocation into a <field>_geolocation field.

Note

In case you are not sending structured logs to Graylog, you can use extractors to store the IP addresses in your messages into their own fields. Check out the Extractors documentation for more information.

Important

The GeoIP Resolver processor will not process any internal message fields, i.e. any field starting with gl2_, such as gl2_remote_ip.

Verify the geolocation configuration (Optional)

To ensure the geolocation resolution is working as expected, you can do the following:

  1. Create a TCP Raw/Plaintext input:

_images/geolocation_5.png

  2. Send a message only containing an IP to the newly created input. As an example, we will be using the nc command: nc -w0 <graylog_host> 5555 <<< '8.8.8.8'

  3. Verify that the message contains a message_geolocation field:

_images/geolocation_6.png

  4. Delete the input if you don’t need it any more.

In case the message does not contain a message_geolocation field, please check your Graylog server logs, and ensure you followed the steps in the Configure the database section.

Visualize geolocations in a map

Graylog can display maps from geolocation stored in any field, as long as the geo-points are using the latitude,longitude format.

Display a map in the search results page

On any search result page, you can expand the field you want to use to draw a map in the search sidebar, and click on the World Map link. That will show a map with all different points stored in that field.

_images/geolocation_7.png

Add map to a dashboard

You can add the map visualization to any dashboard as you do with other widgets. Once you have displayed a map in the search result page, click on Add to dashboard, and select the dashboard you want to add the map to.

_images/geolocation_8.png _images/geolocation_9.png

FAQs

Will Graylog extract IPs from all fields?

Yes, as long as they contain exclusively an IP address.

What geo-information is extracted from IPs?

Since version 2.2.0, Graylog extracts the IP coordinates, country ISO code, and the city name if available.

Where is the extracted geo-information stored?

Extracted geo-information is stored in new message fields, named after the original field, with a suffix describing the stored information. That is, if the original field was called ip_address, the extracted geo-information will be stored as follows:

  • ip_address_geolocation will contain the geo-coordinates
  • ip_address_country_code will contain the country ISO code
  • ip_address_city_name will contain the city name (if available) or N/A otherwise

Which geo-points format does Graylog use to store geolocation information?

Graylog stores the geolocation information in the latitude,longitude format.

I have a field in my messages with geolocation information already, can I use it in Graylog?

Yes, as long as it contains geolocation information in the latitude,longitude format.

Not all fields containing IP addresses are resolved. Why does this happen?

Most likely it is a misconfiguration issue. Please ensure that the IPs you want to get geolocation information from are in their own fields, and also ensure that the GeoIP Resolver is enabled, and in the right order in the Message Processors Configuration, as explained in Configure the message processor.

Indexer failures

Every Graylog node constantly keeps track of every indexing operation it performs. This is important for making sure that you are not silently losing any messages. The web interface can show you the number of write operations that failed, as well as a list of the failed operations. Like any other information in the web interface, this is also available via the REST APIs so you can hook it into your own monitoring systems.

_images/indexerfailures_1.png

Information about indexing failures is stored in a capped MongoDB collection that is limited in size. A lot (many tens of thousands) of failure messages should fit in there, but it should not be considered a complete collection of all errors ever thrown.

Common indexer failure reasons

There are some common failures that can occur under certain circumstances. Those are explained here:

MapperParsingException

An error message would look like this:

MapperParsingException[failed to parse [failure]]; nested: NumberFormatException[For input string: "some string value"];

You tried to write a string into a numeric field of the index. The indexer tried to convert it to a number, but failed because the string contained characters that could not be converted.

This can be triggered by, for example, sending GELF messages with different field types, or by extractors trying to write strings without converting them to numeric values first. The recommended solution is to actively decide on field types: if you send a field like http_response_code with a numeric value, then you should never change that type in the future.

The same can happen with all other field types, such as booleans.

Note that index cycling is something to keep in mind here. The first type written to a field per index wins. If the Graylog index cycles, the field types start from scratch for that index. If the first message written to that index has the http_response_code set as a string, then it will be a string until the index cycles the next time. Take a look at Index model for more information.

Users and Roles

Graylog has a granular permission system which secures access to its features. Every interaction which can look at data or change configuration in Graylog must be performed as an authenticated user.

Each user can have varying levels of access to Graylog’s features, which can be controlled by assigning roles to users.

The following sections describe the capabilities of users and roles and also how to use LDAP for authentication.

Users

It is recommended to create an account for each individual user accessing Graylog.

User accounts have the usual properties such as a login name, email address, full name, password etc. In addition to these fields, you can also configure the session timeout, roles and timezone.

_images/create_user.png

Sessions

Each login for a user creates a session, which is bound to the browser the user is currently using. Whenever the user interacts with Graylog this session is extended.

For security reasons you will want to have Graylog expire sessions after a certain period of inactivity. Once the interval specified by timeout expires the user will be logged out of the system. Requests like displaying throughput statistics do not extend the session, which means that if the user keeps Graylog open in a browser tab, but does not interact with it, their session will expire as if the browser was closed.

Logging out explicitly terminates the session.

Timezone

Since Graylog internally processes and stores messages in the UTC timezone, it is important to set the correct timezone for each user.

Even though the system defaults are often enough to display correct times, in case your team is spread across different timezones, each user can be assigned a timezone setting and change it themselves. You can find the current timezone settings for the various components on the System / Overview page of your Graylog web interface.

Initial Roles

Each user needs to be assigned at least one role, which governs the basic set of permissions this user has in Graylog.

Normal users, who do not need to create inputs or outputs, or perform administrative tasks like managing access control, should be assigned the built-in Reader role in addition to the custom roles which grant access to streams and dashboards.

Roles

In Graylog, roles are named collections of individual permissions which can be assigned to users. Previous Graylog versions could only assign individual permissions to each user in the system, making it difficult to update stream or dashboard permissions for a large group of users.

Starting with Graylog 1.2 you can create roles which bundle permissions to streams and dashboards. These roles can then be assigned to any number of users and later be updated to include new streams and dashboards.

_images/roles.png

The two roles Admin and Reader are built in and cannot be changed. The Admin role grants all permissions and should only be assigned to users operating Graylog. The Reader role grants the basic permissions every user needs to be able to use Graylog. The interface will ensure that every user at least has the Reader role in addition to more business specific roles you create.

Roles cannot be deleted as long as users are still assigned to them to prevent accidentally locking users out.

Creating a role

In order to create a new role, choose the green Add new role button on the System / Authentication / Roles page.

This will display a dialog allowing you to describe the new role and select the permissions it grants.

_images/create_role_1.png

After naming the role, select the permissions you want to grant using the buttons to the right of the respective stream or dashboard names. For each stream or dashboard you can select whether to grant edit or read permissions, but note that edit permissions always imply read permissions as well.

In case you have many streams or dashboards you can use the filter to narrow the list down, and use the checkboxes on the left hand side of the table to select multiple items. You can then use the bulk action buttons on the right hand side to toggle the permissions for all of the selected items at once.

_images/create_role_2.png

Once you are done, be sure to save your changes. The save button is disabled until you select at least one permission.

Editing a role

Administrators can edit roles to add or remove access to new streams and dashboards in the system. The two built in Admin and Reader roles cannot be edited or deleted because they are vital for Graylog’s permission system.

Simply choose the Edit button on the System / Authentication / Roles page and change the settings of the role in the following page:

_images/edit_role.png

You can safely rename the role as well as update its description; the existing role assignments for users will be kept.

Deleting a role

A role can only be deleted when no users are still assigned to it, to avoid accidentally locking users out. If you want to remove a role, please remove it from all users first.

Permission system

The Graylog permission system is extremely flexible and allows you to create users that are only allowed to perform certain REST calls. The Roles UI allows you to create roles based on stream or dashboard access but does not expose permissions on a REST call level yet. This guide describes how to create those roles using the Graylog REST API.

Imagine we want to create a role that is only allowed to start or stop message processing on graylog-server nodes.

REST call permissions

Almost every REST call in Graylog has to be authenticated, or it will return an HTTP 403 (Forbidden). In addition to that, the requesting user also has to have the permissions to execute the REST call. A Graylog admin user can always execute all calls, and roles based on the standard stream or dashboard permissions can execute calls related to those entities.

If you want to create a user that can only execute calls to start or stop message processing you have to find the name of the required permission first.

You can learn about available permissions by querying the /system/permissions endpoint:

curl -XGET -u ADMIN:PASSWORD 'http://graylog.example.org:9000/api/system/permissions?pretty=true'

The server responds with a list such as this:

{
  "permissions" : {
    "outputs" : [ "create", "edit", "terminate", "read" ],
    "users" : [ "tokencreate", "rolesedit", "edit", "permissionsedit", "list", "tokenlist", "create", "passwordchange", "tokenremove" ],
    "processing" : [ "changestate" ],
    ...
  }
}

Starting and stopping message processing corresponds to the changestate permission in the processing category. We combine both pieces to the permission key processing:changestate.

Creating the role

You can create a new role using the REST API like this:

curl -v -XPOST -u ADMIN:PASSWORD -H 'Content-Type: application/json' 'http://graylog.example.org:9000/api/roles' -d '{"read_only": false,"permissions": ["processing:changestate"],"name": "Change processing state","description": "Permission to start or stop processing on Graylog nodes"}'

Notice the processing:changestate permission that we assigned. Every user with this role will be able to start and stop processing on graylog-server nodes.

This is the POST body in an easier to read formatting:

{
  "name": "Change processing state",
  "description": "Permission to start or stop processing on graylog-server nodes",
  "permissions": [
    "processing:changestate"
  ],
  "read_only": false
}

Assigning the role to a user

Create a new user in the Graylog web interface and assign the new role to it:

_images/sysuser.png
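
Assigning the role can also be scripted. The following sketch assumes the roles REST endpoint for adding a member; please verify the exact path in your server’s API browser:

curl -XPUT -u ADMIN:PASSWORD 'http://graylog.example.org:9000/api/roles/Change%20processing%20state/members/maintenanceuser'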

Every user needs at least the standard “Reader” permissions, but those do not provide any access to data or maintenance functionalities.

Now request the user information to see what permissions have been assigned:

$ curl -XGET -u ADMIN:PASSWORD 'http://graylog.example.org:9000/api/users/maintenanceuser?pretty=true'
{
  "id" : "563d1024d4c63709999c4ac2",
  "username" : "maintenanceuser",
  "email" : "it-ops@example.org",
  "full_name" : "Rock Solid",
  "permissions" : [
    "indexercluster:read",
    "messagecount:read",
    "journal:read",
    "inputs:read",
    "metrics:read",
    "processing:changestate",
    "savedsearches:edit",
    "fieldnames:read",
    "buffers:read",
    "system:read",
    "users:edit:maintenanceuser",
    "users:passwordchange:maintenanceuser",
    "savedsearches:create",
    "jvmstats:read",
    "throughput:read",
    "savedsearches:read",
    "messages:read"
  ],
  "preferences" : {
    "updateUnfocussed" : false,
    "enableSmartSearch" : true
  },
  "timezone" : "America/Chicago",
  "session_timeout_ms" : 300000,
  "read_only" : false,
  "external" : false,
  "startpage" : { },
  "roles" : [
    "Change processing state",
    "Reader"
  ]
}

Now you can use this user in your maintenance scripts or automated tasks.

External authentication

LDAP / Active Directory

It is possible to use an external LDAP or Active Directory server to perform user authentication in Graylog.

Since Graylog 1.2.0, you can also use LDAP groups to perform authorization by mapping them to Graylog roles.

Configuration

To set up your LDAP or Active Directory server, go to System / Authentication / LDAP/Active Directory.

Once LDAP is enabled, you need to provide some details about the directory server.

_images/ldap_settings.png

Please test the server connection before continuing to the next steps.

User mapping

In order to be able to look for users in the LDAP server you configured, Graylog needs to know some more details about it: the base tree to limit user search queries, the pattern used to look for users, and the field containing the full name of the user. You can test the configuration any time by using the login test form that you can find at the bottom of that page.

_images/login_test.png

The login test information will indicate if Graylog was able to load the given user (and perform authentication, if a password was provided), and it will display all LDAP attributes belonging to the user, as you can see in the screenshot.

That’s it for the basic LDAP configuration. Don’t forget to save your settings at this point!

Group mapping

You can additionally control the default permissions for users logging in with LDAP or Active Directory by mapping LDAP groups into Graylog roles. That is extremely helpful if you already use LDAP groups to authorize users in your organization, as you can control the default permissions members of LDAP groups will have.

Once you configure group mapping, Graylog will rely on your LDAP groups to assign roles to users. That means that each time an LDAP user logs into Graylog, their roles will be assigned based on the LDAP groups they belong to.

First, you need to fill in the details in the Group Mapping section under System / Authentication / LDAP/Active Directory, by giving the base tree to limit group searches, the pattern used to look for groups, and the group name attribute.

Then you need to select which default user role will be assigned to any user authenticated with the LDAP server. It is also possible to assign additional roles to any users logging in with LDAP. Please refer to Roles for more details about user roles.

Note: Graylog only synchronizes with LDAP when users log in. After changing the default and additional roles for LDAP users, you may need to modify existing users manually or delete them in order to force them to log in again.

You can test the group mapping information by using the login test form, as it will display LDAP groups that the test user belongs to. Save the LDAP settings once you are satisfied with the results.

_images/ldap_group_mapping.png

Finally, in order to map LDAP groups into roles, you need to go to System / Authentication / LDAP/Active Directory -> LDAP group mapping. This page will load all available LDAP groups using the configuration you previously provided, and will allow you to select a Graylog role which defines the permissions that group will have inside Graylog.

Note

Loading LDAP groups may take some time in certain configurations, especially if you have many groups. In those cases, creating a better filter for groups may help with the loading times.

Note

Remember that Graylog only synchronizes with LDAP when users log in, so you may need to modify existing users manually after changing the LDAP group mapping.

Troubleshooting

LDAP referrals for groups can be a problem during group mapping. Referral issues are most likely to come up with larger AD setups. The Active Directory servers literally refer to other servers in search results, and it is the client’s responsibility to follow all referrals. Support for that is currently not implemented in Graylog.

Referral issues can be detected by warnings in the server logs about group mapping failing, for example:

2016-04-11T15:52:06.045Z WARN  [LdapConnector] Unable to iterate over user's groups,
unable to perform group mapping. Graylog does not support LDAP referrals at the moment.
Please see http://docs.graylog.org/en/2.2/pages/users_and_roles/external_auth.html#troubleshooting

These issues may be resolved by either managing the groups manually, or by configuring the LDAP connection to work against the global catalog. The first solution simply means that the LDAP group settings must not be set, and the groups are managed locally. The global catalog solution requires using port 3268/TCP, or 3269/TCP (TLS), of an eligible Active Directory server. The downside is that using the global catalog service consumes slightly more server resources.

Single Sign-On

The SSO Authentication Plugin for Graylog allows the use of arbitrary HTTP request headers for authenticating Graylog users.

Once the plugin has been downloaded and installed on all Graylog nodes, it can be configured on the System / Authentication / Single Sign-On (SSO) page.

_images/sso_1.png

The HTTP request header containing the Graylog username can be configured in the Username Header field and should contain exactly one HTTP header name. Most HTTP request header based single sign-on solutions are using the Remote-User or X-Forwarded-User HTTP request header.

In order to only allow trusted proxy servers to provide the Graylog username, the Request must come from a trusted proxy checkbox must be checked. The list of trusted proxy servers can be edited on each Graylog node in the configuration file using the trusted_proxies configuration setting.

If user accounts that do not exist in the Graylog user database should be created automatically on the first login, the Automatically create user checkbox must be checked. The automatically created users can also be customized to retrieve their full name or email address from other HTTP request headers, otherwise the defaults are used.

Authentication providers

Graylog 2.1.0 and later supports pluggable authentication providers. This means that Graylog can not only use the built-in authentication mechanisms like its internal user database, LDAP/Active Directory, or access tokens, but can also be extended by plugins to support other authentication mechanisms, for example Single Sign-On or Two Factor Authentication.

Configuration

The order in which the authentication providers will be queried can be configured in the Graylog web interface on the System / Authentication / Configure Provider Order page.

_images/authentication_order_1.png

If a user tries to log into Graylog, the authentication providers will be queried in the configured order until a successful authentication attempt has been made (in which case the user will be logged in) or all authentication providers have denied authentication (in which case the user will not be logged in and get an error message).

By clicking on the Update button on the System / Authentication / Configure Provider Order page, the order of authentication providers can be customized.

_images/authentication_order_2.png

Plugins

About Plugins

Graylog offers various extension points to customize and extend its functionality through writing Java code.

The first step for writing a plugin is creating a skeleton that is the same for each type of plugin. The next chapter explains how to do this, followed by chapters explaining each plugin type in detail.

Plugin Types

Graylog comes with a stable plugin API for the following plugin types:

  • Inputs: Accept/write any messages into Graylog
  • Outputs: Forward ingested messages to other systems as they are processed
  • Services: Run at startup and able to implement any functionality
  • Alert Conditions: Decide whether an alert will be triggered depending on a condition
  • Alert Notifications: Called when a stream alert condition has been triggered
  • Processors: Transform/drop incoming messages (can create multiple new messages)
  • Filters: (Deprecated) Transform/drop incoming messages during processing
  • REST API Resources: An HTTP resource exposed as part of the Graylog REST API
  • Periodical: Called at periodical intervals during server runtime
  • Decorators: Used during search time to modify the presentation of messages
  • Authentication Realms: Allow implementing different authentication mechanisms (like Single Sign-On or 2FA)

API concepts

Graylog uses certain patterns in its code bases to make it easier to write extensions. It is important to know about these to be successful in writing custom extensions for it.

You can browse the Graylog Javadoc documentation for details on each class and method mentioned here.

Factory Class

Many newer Graylog extension points split the common aspects of custom code into three different classes:

  • instance creation - a (usually inner) interface commonly called Factory
  • configuration - the factory returns a ConfigurationRequest instance (or a wrapped instance of it), commonly called Config
  • descriptor - the factory returns a display descriptor instance, commonly called Descriptor

Say Graylog exposes an extension point interface called ExtensionPoint, which contains inner interfaces called Factory, Config and Descriptor. An implementation of ExtensionPoint then looks like the following:

public class AwesomeExtension implements ExtensionPoint {

        public interface Factory extends ExtensionPoint.Factory {
                @Override
                AwesomeExtension create(Decorator decorator);

                @Override
                AwesomeExtension.Config getConfig();

                @Override
                AwesomeExtension.Descriptor getDescriptor();
        }

        public static class Config implements ExtensionPoint.Config {
                @Override
                public ConfigurationRequest getRequestedConfiguration() {
                        return new ConfigurationRequest();
                }
        }

        public static class Descriptor extends ExtensionPoint.Descriptor {
                public Descriptor() {
                        super("awesome", "http://docs.graylog.org/", "Awesome Extension");
                }
        }
}

This pattern is used to prevent instantiation of extensions just to get their descriptor or configuration information, because some extensions might be expensive to set up or require some external service and configuration to work.

The factory itself is built using Guice’s assisted injection for auto-wired factories. This allows plugin authors (and Graylog’s internals as well) to cleanly describe their extension as well as taking advantage of dependency injection.

To register such an extension, Graylog typically offers a convenience method via its Guice modules (GraylogModule or PluginModule). For example alert conditions follow the same pattern and are registered as such:

public class SampleModule extends PluginModule {
        // other methods omitted for clarity
        @Override
        protected void configure() {
                addAlertCondition(SampleAlertCondition.class.getCanonicalName(),
                                SampleAlertCondition.class,
                                SampleAlertCondition.Factory.class);
        }
}

Alert Conditions

An alert condition determines whether an alert is triggered. The result of a condition is sent to an alert notification for sending to remote systems.

In Graylog alerting is based on searches and typically includes a list of messages that lead to the alert. However, nothing prevents user code from querying systems other than Elasticsearch to produce alerts.

Class Overview

The central interface is org.graylog2.plugin.alarms.AlertCondition which is also the type that a plugin module must register using org.graylog2.plugin.PluginModule#addAlertCondition.

Alert conditions are configurable at runtime and thus need a corresponding org.graylog2.plugin.configuration.ConfigurationRequest.

Like many other types they also require a org.graylog2.plugin.alarms.AlertCondition.Descriptor for displaying information about the alert condition.

Typically you will not implement AlertCondition directly, but instead use org.graylog2.alerts.AbstractAlertCondition which handles the configuration persistence for you automatically and implements two helpers that provide the result of a condition check.

Example

Please refer to the sample plugin implementation for the full code.

Bindings

Compare with the code in the sample plugin.

public class SampleModule extends PluginModule {

  @Override
  public Set<? extends PluginConfigBean> getConfigBeans() {
      return Collections.emptySet();
  }

  @Override
  protected void configure() {
      addAlertCondition(SampleAlertCondition.class.getCanonicalName(),
              SampleAlertCondition.class,
              SampleAlertCondition.Factory.class);
  }
}

User Interface

Alert conditions have no special user interface elements.

Alert Notifications

Alert Notifications are responsible for sending information about alerts to external systems, such as sending an email, push notifications, opening tickets, writing to chat systems etc.

They receive the stream they were bound to as well as the result of the configured Alert Conditions.

Note

Alert Notifications were called Alarm Callbacks in previous versions of Graylog.

The old name is still used in the code and REST API endpoints for backwards compatibility, so you will see it when implementing your plugins.

Class Overview

The interface to implement is org.graylog2.plugin.alarms.callbacks.AlarmCallback which is also the type that a plugin module must register using org.graylog2.plugin.PluginModule#addAlarmCallback.

Example Alert Notification

You can find a minimal implementation in the sample plugin.

To create an alert notification plugin implement the AlarmCallback interface:

public class SampleAlertNotification implements AlarmCallback

Your IDE should offer to create the methods you need to implement:

public void initialize(Configuration configuration) throws AlarmCallbackConfigurationException

This is called once at the very beginning of the lifecycle of this plugin. It is common practice to store the Configuration as a private member for later access.

public void call(Stream stream, AlertCondition.CheckResult checkResult) throws AlarmCallbackException

This is the actual alert notification being triggered. Implement your logic that interacts with a remote system here, for example sending a push notification, posting into a chat system etc.

public ConfigurationRequest getRequestedConfiguration()

Plugins can request configurations. The UI in the Graylog web interface is generated from this information and the filled out configuration values are passed back to the plugin in initialize(Configuration configuration).

The return value must not be null.

This is an example configuration request:

final ConfigurationRequest configurationRequest = new ConfigurationRequest();
configurationRequest.addField(new TextField(
        "service_key", "Service key", "", "JIRA API token. You can find this token in your account settings.",
        ConfigurationField.Optional.NOT_OPTIONAL)); // required, must be filled out
configurationRequest.addField(new BooleanField(
        "use_https", "HTTPS", true,
        "Use HTTPS for API communication?"));

public String getName()

Return a human readable name of this plugin.

public Map<String, Object> getAttributes()

Return attributes that might be interesting to be shown under the alert notification in the Graylog web interface. It is common practice to at least return the used configuration here.

public void checkConfiguration() throws ConfigurationException

Throw a ConfigurationException if the user should have entered missing or invalid configuration parameters.

Caution

The alert notification may be created multiple times, so be sure not to perform business logic in the constructor.

You should, however, inject custom dependencies, such as a specific client library or other objects, in the constructor.

Bindings

Compare with the code in the sample plugin.

public class SampleModule extends PluginModule {

        @Override
        public Set<? extends PluginConfigBean> getConfigBeans() {
                return Collections.emptySet();
        }

        @Override
        protected void configure() {
                addAlarmCallback(SampleAlertNotification.class);
        }
}

User Interface

Alert notifications have no custom user interface elements.

Decorators

Decorators can be used to transform a message field at display time. Multiple decorators can be applied at the same time, but you cannot make any assumptions about their order, as that is user defined. Stacked decorators receive the result of the previous decorator as their input.

They are typically used to map between the stored value and a human readable form of that value. For example, the Syslog severity mapper (compare its code) maps between numeric values and their textual representation.

Other uses include looking up user names based on a user’s ID in a remote database, triggering a whois request on a domain name etc.

Class Overview

You need to implement the org.graylog2.plugin.decorators.SearchResponseDecorator interface. This class must declare a Factory Class.

Beyond the factory, configuration and descriptor classes, the only thing that a decorator needs to implement is the apply function:

SearchResponse apply(SearchResponse searchResponse);

The org.graylog2.rest.resources.search.responses.SearchResponse class represents the result that is being returned to the web interface (or other callers of the REST API).

You are free to modify any field, create new fields or remove fields. However, the web interface makes certain assumptions regarding fields that start with gl2_ and requires at least the timestamp, source and message fields to be present.

Thrown exceptions are logged as errors and lead to the original search response being returned without any modifications.

Example

Please refer to the sample plugin implementation for the full code.

Bindings

Compare with the code in the sample plugin.

public class SampleModule extends PluginModule {

  @Override
  public Set<? extends PluginConfigBean> getConfigBeans() {
      return Collections.emptySet();
  }

  @Override
  protected void configure() {
    installSearchResponseDecorator(searchResponseDecoratorBinder(),
                    PipelineProcessorMessageDecorator.class,
                    PipelineProcessorMessageDecorator.Factory.class);
  }
}
User Interface

Decorators have no custom user interface elements.

Writing Plugins

What you need in your development environment before starting is:

  • A Java Development Kit (JDK)
  • Maven (at least version 3)
  • Git

There are lots of different ways to get those on your local machine; unfortunately we cannot list all of them, so please refer to your operating system-specific documentation.

Graylog uses a couple of conventions and techniques in its code, so be sure to read about the API concepts for an overview.

Sample Plugin

To go along with this documentation, there is a sample plugin on GitHub. This documentation will link to specific parts for your reference. It is fully functional, even though it does not implement any useful functionality. Its purpose is to provide a reference to help you implement your own plugins.

Creating a plugin skeleton

The easiest way to get started is to use our Graylog meta project, which will create a complete plugin project infrastructure with all required classes, build definitions, and configurations. Using the meta project allows you to have the Graylog server project and your own plugins (or 3rd party plugins) in the same project, which means that you can run and debug everything in your favorite IDE or navigate seamlessly in the code base.

Note

We are working on a replacement tool for the graylog-project meta project, but for the time being it still works.

Maven is a widely used build tool for Java that comes pre-installed on many operating systems or can be installed using most package managers. Make sure that you have at least version 3 before you go on.

Use it like this:

$ git clone git@github.com:Graylog2/graylog-project.git

This will create a checkout of the meta project in your current working directory. Now change to the graylog-project directory and run the bootstrap script to download the necessary base modules:

$ scripts/bootstrap

Now you can bootstrap the plugin you want to write from here, by doing:

$ scripts/bootstrap-plugin jira-alarmcallback

It will ask you a few questions about the plugin you are planning to build. Let’s say you work for a company called ACMECorp and want to build an alarm callback plugin that creates a JIRA ticket for each alarm that is triggered:

groupId: com.acmecorp
version: 1.0.0
package: com.acmecorp
pluginClassName: JiraAlarmCallback

Note that you do not have to tell the archetype wizard what kind of plugin you want to build, because it creates the generic plugin skeleton for you but nothing that is related to the actual implementation. More on this in the example plugin chapters later.

You now have a new folder called graylog-plugin-jira-alarmcallback that includes a complete plugin skeleton, including Maven build files. Every Java IDE out there can now import the project automatically without any further configuration.

In IntelliJ IDEA for example you can just use the File -> Open dialog to open the graylog-project directory as a fully configured Java project, which should include the Graylog server and your plugin as submodules.

Please pay close attention to the README file of the Graylog meta project and follow any further instructions listed there to set up your IDE properly.

If you want to continue working on the command line, you can do the following to compile the server and your plugin:

$ mvn package

The anatomy of a plugin

Each plugin contains information to describe itself and register the extensions it contains.

Note

A single plugin can contain multiple extensions to Graylog.

For example a hypothetical plugin might contribute an input, an output and alert notifications to communicate with systems. For convenience this would be bundled in a single plugin registering multiple extensions.

Required classes

At the very minimum you need to implement two interfaces: org.graylog2.plugin.Plugin and org.graylog2.plugin.PluginMetaData.

The bootstrap-plugin script generates these implementations for you, and you simply need to fill out the details.

Graylog uses Java’s ServiceLoader mechanism to find your plugin’s main class, so if you rename your Plugin implementation, you need to also adjust the service file. Please also see Google’s AutoService, which Graylog uses in conjunction with the plain ServiceLoader.

In addition to the service, Graylog needs an additional resource file called graylog-plugin.properties in a special location. This file contains information about the plugin, specifically which classloader the plugin needs to be in, so it needs to be read before the plugin is actually loaded. Typically you can simply take the default that has been generated for you.
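
For orientation, the generated Plugin implementation looks roughly like this sketch. The class names follow the JIRA example from earlier; the metadata and module class names are assumptions about what the skeleton generates:

package com.acmecorp;

import com.google.auto.service.AutoService;
import org.graylog2.plugin.Plugin;
import org.graylog2.plugin.PluginMetaData;
import org.graylog2.plugin.PluginModule;

import java.util.Collection;
import java.util.Collections;

// AutoService generates the META-INF/services entry used by Java's ServiceLoader.
@AutoService(Plugin.class)
public class JiraAlarmCallbackPlugin implements Plugin {

        @Override
        public PluginMetaData metadata() {
                return new JiraAlarmCallbackMetaData(); // generated metadata class (assumed name)
        }

        @Override
        public Collection<PluginModule> modules() {
                return Collections.singleton(new JiraAlarmCallbackModule()); // registers the extensions
        }
}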

Registering your extension

So far the plugin itself does not do anything, because it neither implements any of the available extensions, nor could Graylog know which ones are available from your code.

Graylog uses dependency injection to wire up its internal components as well as the plugins. Thus the extensions a plugin provides need to be exposed as a PluginModule, which provides you with a lot of helper methods to register the various available extensions and cut down on the boilerplate code you have to write.

An empty module is created for you.

Caution

The PluginModule exposes a lot of extension points, but not all of them are considered stable API for external use.

If in doubt, please reach out to us on our community support channels.

Please refer to the available Plugin Types for detailed information on what you can implement. The Sample Plugin contains stub implementations for each of the supported extensions.

Web Plugin creation

Sometimes your plugin is not only supposed to work under the hood inside a Graylog server as an input, output, alarm callback, etc., but you also want to contribute previously nonexistent functionality to Graylog’s web interface. Since version 2.0 this is possible. When using the most recent Graylog meta project to bootstrap the plugin skeleton, you are already good to go for this. Otherwise please see our chapter about Creating a plugin skeleton.

The Graylog web interface is written in JavaScript, based on React. It is built using webpack, which bundles all JavaScript code (and other files you use, like stylesheets, fonts, images, even audio or video files if you need them) into chunks digestible by your browser, and npm, which manages our external (and our own) dependencies. During the build process all of this will be bundled and included in the jar file of your plugin.

This might be overwhelming at first if you are not accustomed to JS development, but fortunately we have set up a lot to make writing plugins easier for you!

If you use our proposed way for Creating a plugin skeleton, and followed the Writing Plugins section, you are already good to go for building a plugin with a web part. All you need is a running Graylog server on your machine. Everything else is fetched at build time!

Getting up and running with a web development environment is as easy as this:

$ scripts/start-web-dev
[...]
$ open http://localhost:8080

This starts the development web server. It even tries to open a browser window going to it (probably working on Mac OS X only).

If your Graylog server is not running on http://localhost:9000/api/, then you need to edit graylog2-server/graylog2-web-interface/config.js (in your graylog-project directory) and adapt the gl2ServerUrl parameter.

Web Plugin structure

These are the relevant files and directories in your plugin directory for the web part of it:

webpack.config.js
This is the configuration file for the webpack module bundler. Most of it is already preconfigured by our PluginWebpackConfig class, so the file is very small. You can still override or extend every configuration option by passing in a webpack snippet, though.
build.config.js.sample
In this file you can customize some of the parameters of the build. There is one mandatory parameter named web_src_path which defines the absolute or relative location to a checkout of the Graylog source repository.
package.json
This is a standard npm JSON file describing the web part of your plugin, especially its dependencies. You can read more about its format.
src/web
This is where the actual code for the web part of your plugin goes. To start with, there is a simple index.jsx file, which shows you how to register your plugin and the parts it provides with the Graylog web interface. We will get to this in detail later.

Required conventions for web plugins

Plugin Entrypoint

There is a single file which is the entry point of your plugin, meaning that the execution of your plugin starts there. By convention this is src/web/index.jsx. You can rename or move this file as long as you adapt your webpack configuration to reflect the change, but doing so is not recommended.

In any case, this file needs to contain the following code at the very top:

// eslint-disable-next-line no-unused-vars
import webpackEntry from 'webpack-entry';

This part is responsible for including and executing the webpack-entry file, which sets up webpack to use the correct URL format when loading assets for this plugin. If you leave it out, erratic behavior will be the result.

Linking to other pages from your plugin

If you want to generate links from the web frontend to other pages of your plugin or the main web interface, you need to use the Routes.pluginRoute() helper method to generate the URLs properly.

See this file for more information.

Best practices for web plugin development

Using ESLint

ESLint is an awesome tool for linting JavaScript code. It makes sure that any written code is in line with general best practices and the project-specific coding style/guidelines. We at Graylog strive to make the best possible use of this tool to help our developers and you produce top-quality code with few bugs. Therefore we highly recommend enabling it for any Graylog plugin you are writing.

Code Splitting

Both the web interface and plugins for it depend on a number of libraries like React, RefluxJS and others. To prevent those from getting bundled into both the web interface and plugin assets, thereby wasting space or causing problems (React especially does not like to be present more than once), we extract them into a commons chunk which is reused by the web interface and plugins.

This has no consequences for you as a plugin author, because the configuration to make use of this is already generated for you when using the meta project or the maven archetype. But here are some details about it:

Common libraries are built into a separate vendor bundle using a dedicated configuration file named webpack.vendor.js. Using the DLLPlugin, a manifest is extracted which allows us to reuse the generated bundle. This is then imported in our main web interface webpack configuration file and in the corresponding generated webpack config file for plugins.

Building plugins

Building the plugin is easy because the meta project has created all necessary files and settings for you. Just run mvn package either from the meta project’s directory (to build the server and the plugin) or from the plugin directory (to build the plugin only):

$ mvn package

This will generate a .jar file in target/ that is the complete plugin file:

$ ls target/jira-alarmcallback-1.0.0-SNAPSHOT.jar
target/jira-alarmcallback-1.0.0-SNAPSHOT.jar

Installing and loading plugins

The only thing you need to do to run the plugin in Graylog is to copy the .jar file to the plugins folder that is configured in your graylog.conf. The default is just plugins/ relative to your graylog-server directory.

This is a list of default plugin locations for the different installation methods.

Plugin Installation Locations

  Installation Method          Directory
  --------------------------   -----------------------------------------
  Virtual Machine Appliances   /opt/graylog/plugins/
  Operating System Packages    /usr/share/graylog-server/plugin/
  Manual Setup                 /<extracted-graylog-tarball-path>/plugin/

Restart graylog-server and the plugin should be available to use from the web interface immediately.

External dashboards

There are other frontends that are connecting to the Graylog REST API and display data or information in a special way.

CLI stream dashboard

This official Graylog dashboard, developed by us, shows live information about a specific stream in your terminal. For example, it is the perfect companion during a deployment of your platform: run it next to the deployment output to show information from a stream that is catching all errors or exceptions on your systems.

_images/cli_dashboard.png

The CLI stream dashboard documentation is available on GitHub.

Graylog Marketplace

The Graylog Marketplace is the central directory of add-ons for Graylog. It contains plugins, content packs, GELF libraries and more content built by Graylog developers and community members.

_images/marketplace.png

GitHub integration

The Marketplace is deeply integrated with GitHub. You sign in with your GitHub account if you want to submit content, and only have to select an existing repository to list on the Marketplace.

From there on you manage your releases and code changes in GitHub. The Marketplace will automatically update your content.

There is no need to sign in if you only want to browse or download content.

General best practices

README content

We kindly ask you to provide a README file that is as descriptive as possible with your submission. This file will be displayed on the Marketplace detail page and should provide the following information:

  • What it is.
  • Why would you want to use it? (Use cases)
  • Do you have to register somewhere, for example to get an API token?
  • How to install and configure it.
  • How to use it in a Graylog context.

Take a look at the Splunk plug-in as an example.

The README supports Markdown for formatting. You cannot submit content that does not contain a README file.

License

You cannot submit content that does not contain a LICENSE or COPYING file. We recommend consulting ChooseALicense.com if you are unsure which license to use.

4 Types of Add-Ons

Plug-Ins: Code that extends Graylog to support a specific use case that it doesn’t support out of the box.

Content Pack: A file that can be uploaded into your Graylog system that sets up streams, inputs, extractors, dashboards, etc. to support a given log source or use case.

GELF Library: A library for a programming language or logging framework that supports sending log messages in GELF format for easy integration and pre-structured messages.

Other Solutions: Any other content or guide that helps you integrate Graylog with an external system or device. For example, how to configure a specific device to support a format Graylog understands out of the box.

Contributing plug-ins

You created a Graylog plugin and want to list it in the Marketplace? This is great. Here are the simple steps to follow:

  1. Create a GitHub repository for your plugin
  2. Include a README and a LICENSE file in the repository.
  3. Push all your code to the repository.
  4. Create a GitHub release and give it the name of the plugin version. For example 0.1. The Marketplace will always show and link the latest version. You can upload as many release artifacts as you want here. For example the .jar file together with DEB and RPM files. The Marketplace will link to the detail page of a release for downloads.
  5. Submit the repository to the Marketplace

Contributing content packs

Graylog content packs can be shared on the Marketplace by following these steps:

  1. Export a Graylog content pack from the Graylog Web Interface and save the generated JSON in a file called content_pack.json.
  2. Create a GitHub repository for your content pack
  3. Include a README and a LICENSE file in the repository.
  4. Include the content_pack.json file in the root of your GitHub repository.
  5. Submit the repository to the Marketplace

Contributing GELF libraries

A GELF library can be added like this:

  1. Create a GitHub repository for your GELF library.
  2. Include a README and a LICENSE file in the repository.
  3. Describe where to download and how to use the GELF library in the README.

Contributing other content

You want to contribute content that does not really fit into the other categories but describes how to integrate a certain system or make it send messages to Graylog?

This is how you can do it:

  1. Create a GitHub repository for your content
  2. Include a README and a LICENSE file in the repository.
  3. All content goes into the README.

Frequently asked questions

General

Do I need to buy a license to use Graylog?

We believe software should be open and accessible to all. You should not have to pay to analyze your own data, no matter how much you have.

Graylog is licensed under the GNU General Public License. We do not require license fees for production or non-production use.

How long do you support older versions of the Graylog product?

For our commercial support customers, we support older versions of Graylog up to 12 months after the next major release is available. So if you’re using 1.X, you will continue to receive 1.X support up to a full year after 2.0 has been released.

Architecture

What is MongoDB used for?

Graylog uses MongoDB to store your configuration data, not your log data. Only metadata is stored, such as user information or stream configurations. None of your log messages are ever stored in MongoDB. This is why MongoDB does not have a big system impact, and you won’t have to worry too much about scaling it. With our recommended setup architecture, MongoDB will simply run alongside your graylog-server processes and use almost no resources.

Can you guide me on how to replicate MongoDB for High Availability?

MongoDB actually supplies this information as part of its documentation; check out the MongoDB documentation on replica sets.

After you’ve done this, add all MongoDB nodes into the replica_set configuration in all graylog-server.conf files.

I have datacenters across the world and do not want logs forwarding from everywhere to a central location due to bandwidth, etc. How do I handle this?

You can have multiple graylog-server instances in a federated structure, and forward select messages to a centralized Graylog server.

Which load balancers do you recommend we use with Graylog?

You can use any. We have clients running AWS ELB, HAProxy, F5 BIG-IP, and KEMP.

Isn’t Java slow? Does it need a lot of memory?

This is a concern that we hear from time to time. We understand Java has a bad reputation from slow and laggy desktop/GUI applications that eat a lot of memory. However, we are usually able to prove this assumption wrong. Well written Java code for server systems is very efficient and does not need a lot of memory resources.

Give it a try, you might be surprised!

Does Graylog encrypt log data?

All log data is stored in Elasticsearch. Elastic recommends you use dm-crypt at the file system level.

Where are the log files Graylog produces?

You can find Graylog’s own log files, complete with timestamps, log levels, and exception messages, under the directory below. This is useful for debugging or when the server won’t start.

/var/log/graylog-server/server.log

If you use the pre-built appliances, take a look into

/var/log/graylog/<servicename>/current

Installation / Setup

Should I download the OVA appliances or the separate packages?

If you are downloading Graylog for the first time to evaluate it, go for the appliance. It is really easy and can be quickly set up so you can understand if Graylog is right for you. If you want to use Graylog at some scale in production, and do things like high availability (MongoDB replication), we recommend you go for the separate packages.

How do I find out if a specific log source is supported?

We support many log sources – and more are coming every day. For a complete list, check out the Graylog Marketplace, the central repository of Graylog extensions. There are 4 types of content on the Marketplace:

  • Plug-Ins: Code that extends Graylog to support a specific use case that it doesn’t support out of the box.
  • Content Pack: A file that can be uploaded into your Graylog system that sets up streams, inputs, extractors, dashboards, etc. to support a given log source or use case.
  • GELF Library: A library for a programming language or logging framework that supports sending log messages in GELF format for easy integration and pre-structured messages.
  • Other Solutions: Any other content or guide that helps you integrate Graylog with an external system or device. For example, how to configure a specific device to support a format Graylog understands out of the box.

Can I install the Graylog Server on Windows?

Even though our engineers say it is “technically possible”, don’t do it. The Graylog server is built using Java, so technically it can run anywhere. But we currently have it optimized to run better on other operating systems. If you don’t feel comfortable running your own Linux system, we recommend you use our Linux virtual appliance, which will run under VMware.

Can I run Graylog on Azure?

You can create a Linux VM and use our step-by-step guide to install your customized Graylog. As a second option you can use this guide to convert our appliance into an Azure-compatible virtual machine.

Functionality

Can Graylog automatically clean old data?

Absolutely, we have data retention features.

Does Graylog support LDAP / AD and its groups?

Yup, we’re all over this too with read/write roles and group permissions. To start, see this. If you want to get very granular, you can go through the Graylog REST API.

Do we have a user audit log for compliance?

Graylog Enterprise includes an audit log plugin. You can explore the documentation for more details.

It seems like Graylog has no reporting functionality?

That’s correct. We currently don’t have built-in reporting functionality that sends automated reports. However, you can use our REST API to generate and send your own reports. A cron job and the scripting language of your choice should do the trick.

Can I filter inbound messages before they are processed by the Graylog server?

Yes, check out our page on how to use blacklisting.

Dedicated Partition for the Journal

If you create a dedicated partition for your Kafka journal, you need to make sure that it is a clean directory. Even a lost+found directory can break it, for your reference.

Raise the Java Heap

If you need to raise the Java heap of the Graylog server or Elasticsearch on a system that runs as a virtual appliance, you can use the advanced settings.

On systems that are installed with DEB / APT, this setting can be made in /etc/default/graylog-server.

On systems that are installed with RPM / YUM / DNF, the file is found in /etc/sysconfig/graylog-server.

How can I start an input on a port below 1024?

If you try to start an input on one of the privileged ports (below 1024), it will only work for the “root” user. To be able to use a privileged port, you can use authbind on Debian-based systems, or you can redirect the traffic with an iptables rule like this:

iptables -t nat -A PREROUTING -p tcp --dport 514 -j REDIRECT --to 1514
iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to 1514

The input needs to be started on port 1514 in this case and will be made available on port 514 to the outside. The clients can then send data to port 514.

Graylog & Integrations

What is the best way to integrate my applications to Graylog?

We recommend that you use GELF. It’s easy for your application developers and eliminates the need to store the messages locally. Also, GELF lets the application send exactly the fields it wants, so you don’t have to build extractors or do any extra processing in Graylog.

I have a log source that creates dynamic syslog messages based on events and subtypes and grok patterns are difficult to use - what is the best way to handle this?

Not a problem! Use our key=value extractor.

I want to archive my log data. Can I write to another database, for example HDFS / Hadoop, from Graylog?

Yes, you can output data from Graylog to a different database. We currently have an HDFS output plug-in in the Marketplace - thank you sivasamyk!

It’s also easy and fun to write your own, which you can then add to Graylog Marketplace for others to use.

I don’t want to use Elasticsearch as my backend storage system – can I use another database, like MySQL, Oracle, etc?

You can, but we don’t suggest you do. You will not be able to use our query functionality or our analytic engine on the dataset outside the system. We only recommend another database if you want it for secondary storage.

How can I create a restricted user to check internal Graylog metrics in my monitoring system?

You can create a restricted user which only has access to the /system/metrics resource on the Graylog REST API. This way it will be possible to integrate the internal metrics of Graylog into your monitoring system. Giving the user only restricted access will minimize the impact of these credentials getting compromised.

Send a POST request via the Graylog API Browser or curl to the /roles resource of the Graylog REST API:

{
  "name": "Metrics Access",
  "description": "Provides read access to all system metrics",
  "permissions": ["metrics:*"],
  "read_only": false
}

The following curl command will create the required role (modify the URL of the Graylog REST API, here http://127.0.0.1:9000/api/, and the user credentials, here admin/admin, according to your setup):

$ curl -u admin:admin -H "Content-Type: application/json" -X POST -d '{"name": "Metrics Access", "description": "Provides read access to all system metrics", "permissions": ["metrics:*"], "read_only": false}' 'http://127.0.0.1:9000/api/roles'

Troubleshooting

I’m sending in messages, and I can see they are being accepted by Graylog, but I can’t see them in the search. What is going wrong?

A common reason for this issue is that the timestamp in the message is wrong. First, confirm that the message was received by selecting ‘all messages’ as the time range for your search. Then identify and fix the source that is sending the wrong timestamp.

I have configured an SMTP server or an output with TLS connection and receive handshake errors. What should I do?

Outbound TLS connections have CA (certification authority) certificate verification enabled by default. In case the target server’s certificate is not signed by a CA found in the trust store, the connection will fail. A typical symptom is the following error message in the server logs:

Caused by: javax.mail.MessagingException: Could not convert socket to TLS; nested exception is: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

This should be corrected by either adding the missing CA certificates to the Java default trust store (typically found at $JAVA_HOME/jre/lib/security/cacerts), or a custom store that is configured (by using -Djavax.net.ssl.trustStore) for the Graylog server process. The same procedure applies for both missing valid CAs and self-signed certificates.

For Debian/Ubuntu-based systems using the OpenJDK JRE, CA certificates may be added to the system-wide trust store. After installing the JRE (including the ca-certificates-java, ergo ca-certificates packages), place the certificate file (in PEM format, with a .crt extension) into /usr/local/share/ca-certificates/ and run /usr/sbin/update-ca-certificates. The hook script in /etc/ca-certificates/update.d/ should automatically generate /etc/ssl/certs/java/cacerts.

Fedora/RHEL-based systems may refer to Shared System Certificates in the Fedora Project Wiki.

Suddenly parts of Graylog did not work as expected

If you notice that multiple different parts of Graylog are not working as expected, and you find something like java.lang.OutOfMemoryError: unable to create new native thread in your Graylog server logfile, you need to raise the process/thread limit of the graylog user. The current limit can be checked with ulimit -u; consult your OS documentation on how to raise nproc.

I cannot go past page 66 in search results

Elasticsearch limits the number of messages per search result to 10000 by default. Graylog displays 150 messages per page, which means that the last full page with default settings will be page 66.

You can increase the maximum result window by adjusting the parameter index.max_result_window as described in the Elasticsearch index modules dynamic settings, but be careful as this requires more memory in your Elasticsearch nodes for deep pagination.

This setting can be dynamically updated in Elasticsearch, so that it does not require a cluster restart to be effective.

My field names contain dots and stream alerts do not match anymore

Due to restrictions in certain Elasticsearch versions, Graylog needs to replace the . character in field names with another character; by default the replacement character is _.

This replacement is done just prior to writing messages to Elasticsearch, which causes a mismatch between what stream rules and alert conditions see as field names when they are evaluated.

Stream rules, the conditions that determine whether or not a message is routed to a stream, are run as data is processed by Graylog. These see the field names as still containing the dots.

However, alert conditions, which are also attached to streams, are converted to searches and run in the background. They operate on stored data in Elasticsearch and thus see the replacement character for the dots. Alert conditions therefore need to use _ instead of . when referring to fields. There is currently no way to maintain backwards compatibility and transparently fix this issue, so you need to take action.

The best option, apart from not sending fields with dots, is to remember to write alert conditions using the replacement character, and never use . in the field names. In general Graylog will use the version with _ in searches etc.

For example, if an incoming message contains the field docker.container stream rules use that name, whereas alert conditions need to use docker_container. You will notice that the search results also use the latter name.

What does “Uncommited messages deleted from journal” mean?

Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit.

This can happen when Graylog is not able to connect to Elasticsearch or the Elasticsearch Cluster is not able to process the ingested messages in time. Add more resources to Elasticsearch or adjust the output settings from Graylog to Elasticsearch.

What does “Journal utilization is too high” mean?

Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit.

This can happen when Graylog is not able to connect to Elasticsearch or the Elasticsearch Cluster is not able to process the ingested messages in time. Add more resources to Elasticsearch or adjust the output settings from Graylog to Elasticsearch.

How do I fix the “Deflector exists as an index and is not an alias” error message?

Graylog uses an Elasticsearch index alias per index set, pointing to the active write index, to write messages into Elasticsearch. This alias is the so-called “deflector”, e.g. graylog_deflector in the default index set.

Please refer to Index model for a more in-depth explanation of the Elasticsearch index model used by Graylog.

In some rare situations, there might be an Elasticsearch index with a name which has been reserved for the deflector of an index set managed by Graylog, so that Graylog is unable to create the proper Elasticsearch index alias.

This error situation leads to the following system notification in Graylog:

> Deflector exists as an index and is not an alias.
> The deflector is meant to be an alias but exists as an index. Multiple failures of infrastructure can lead to this. Your messages are still indexed but searches and all maintenance tasks will fail or produce incorrect results. It is strongly recommend that you act as soon as possible.

The logs of the Graylog master node will contain a warning message similar to the following:

WARN  [IndexRotationThread] There is an index called [graylog_deflector]. Cannot fix this automatically and published a notification.

To fix this, perform the following steps:

  1. Stop all Graylog nodes.
  2. (OPTIONAL) If you want to keep the already ingested messages, reindex them into the Elasticsearch index with the greatest number, e.g. graylog_23 if you want to fix the deflector graylog_deflector, via the Elasticsearch Reindex API.
  3. Delete the graylog_deflector index via the Elasticsearch Delete Index API.
  4. Add action.auto_create_index: false to the configuration files of all Elasticsearch nodes in your cluster and restart these Elasticsearch nodes; see Elasticsearch Index API - Automatic Index Creation and Creating an Index for details.
  5. Start the Graylog master node.
  6. Manually rotate the active write index of the index set on the System / Indices / Index Set page in the Maintenance dropdown menu.
  7. (OPTIONAL) Start all remaining Graylog slave nodes.

Have another troubleshooting question?

See below for some additional support options where you can ask your question.

Support

I think I’ve found a bug, how do I report it?

Think you spotted a bug? Oh no! Please report it in our issue trackers so we can take a look at it. All issue trackers are hosted on GitHub, tightly coupled to our code and milestones. Don’t hesitate to open issues – we’ll just close them if there is nothing to do. Most issues will be in the Graylog server repository, but you should choose others if you have found a bug in one of the plugins.

I’m having issues installing or configuring Graylog, where can I go for support?

Check out the Graylog Community Forums – you can search for your problem which may already have an answer, or post a new question.

Another source is the Graylog channel on Matrix.org or the #graylog IRC chat channel on freenode (both are bridged, so you’ll see messages from either channel). Our developers and a lot of community members hang out there. Just join the channel and add any questions, suggestions or general topics you have.

If you’re looking for professional commercial support from the Graylog team, we do that too. Please get in touch here for more details.

GELF

Structured events from anywhere. Compressed and chunked.

The Graylog Extended Log Format (GELF) is a log format that avoids the shortcomings of classic plain syslog:

  • Limited to a length of 1024 bytes – not much space for payloads like backtraces
  • No data types in structured syslog. You don’t know what is a number and what is a string.
  • The RFCs are strict enough but there are so many syslog dialects out there that you cannot possibly parse all of them.
  • No compression

Syslog is okay for logging system messages of your machines or network gear. GELF is a great choice for logging from within applications. There are libraries and appenders for many programming languages and logging frameworks so it is easy to implement. You could use GELF to send every exception as a log message to your Graylog cluster. You don’t have to care about timeouts, connection problems or anything that might break your application from within your logging class because GELF can be sent via UDP.

GELF via UDP

Chunking

UDP datagrams are usually limited to a size of 8192 bytes. A lot of compressed information fits in there but you sometimes might just have more information to send. This is why Graylog supports chunked GELF.

You can define chunks of messages by prepending a byte header to a GELF message, including a message ID and sequence information, so that the message can be reassembled later.

Most GELF libraries support chunking transparently and will detect if a message is too big to be sent in one datagram.

Of course TCP would solve this problem on a transport layer but it brings other problems that are even harder to tackle: You would have to care about slow connections, timeouts and other nasty network problems.

With UDP you may just lose a message while with TCP it could bring your whole application down when not designed with care.

Of course TCP makes sense in some (especially high-volume) environments, so it is your decision. Many GELF libraries support both TCP and UDP as transport. Some even support HTTP.

Prepend the following structure to your GELF message to make it chunked:

  • Chunked GELF magic bytes - 2 bytes: 0x1e 0x0f
  • Message ID - 8 bytes: Must be the same for every chunk of this message. It identifies the whole message and is used to reassemble the chunks later. Generate it from a millisecond timestamp plus the hostname, for example.
  • Sequence number - 1 byte: The sequence number of this chunk. Starting at 0 and always less than the sequence count.
  • Sequence count - 1 byte: Total number of chunks this message has.

All chunks MUST arrive within 5 seconds or the server will discard all already arrived and still arriving chunks. A message MUST NOT consist of more than 128 chunks.
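
To make the byte layout concrete, here is a minimal Java sketch that splits an already serialized (and optionally compressed) GELF payload into chunks. The 8192-byte datagram budget and the class/method names are assumptions for illustration; GELF libraries normally do this for you transparently:

import java.io.ByteArrayOutputStream;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

public final class GelfChunker {
        private static final byte[] MAGIC = {0x1e, 0x0f};
        // 12 header bytes: 2 magic + 8 message ID + 1 sequence number + 1 sequence count
        private static final int MAX_CHUNK_PAYLOAD = 8192 - 12;

        public static List<byte[]> chunk(byte[] payload) {
                final byte[] messageId = new byte[8];
                new SecureRandom().nextBytes(messageId); // must be identical for all chunks of one message

                final int count = (payload.length + MAX_CHUNK_PAYLOAD - 1) / MAX_CHUNK_PAYLOAD;
                if (count > 128) {
                        throw new IllegalArgumentException("a message must not consist of more than 128 chunks");
                }

                final List<byte[]> chunks = new ArrayList<>(count);
                for (int seq = 0; seq < count; seq++) {
                        final int offset = seq * MAX_CHUNK_PAYLOAD;
                        final int length = Math.min(MAX_CHUNK_PAYLOAD, payload.length - offset);
                        final ByteArrayOutputStream out = new ByteArrayOutputStream(12 + length);
                        out.write(MAGIC, 0, 2);     // chunked GELF magic bytes
                        out.write(messageId, 0, 8); // message ID
                        out.write(seq);             // sequence number, starting at 0
                        out.write(count);           // sequence count
                        out.write(payload, offset, length);
                        chunks.add(out.toByteArray());
                }
                return chunks;
        }
}

Each resulting byte array is then sent as a single UDP datagram.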

Compression

When using UDP as transport layer, GELF messages can be sent uncompressed or compressed with either GZIP or ZLIB.

Graylog nodes detect the compression type in the GELF magic byte header automatically.

Decide if you want to trade a bit more CPU load for saving a lot of network bandwidth. GZIP is the protocol default.

GELF via TCP

At the current time, GELF TCP only supports uncompressed and non-chunked payloads. Each message needs to be delimited with a null byte (\0) when sent in the same TCP connection.

Attention

GELF TCP does not support compression due to the use of the null byte (\0) as frame delimiter.
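
A minimal Java sketch of this framing, assuming the host and port from the netcat examples below and an uncompressed JSON payload:

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class GelfTcpSend {
        public static void main(String[] args) throws Exception {
                final String gelf = "{ \"version\": \"1.1\", \"host\": \"example.org\", \"short_message\": \"A short message\" }";
                try (Socket socket = new Socket("graylog.example.com", 12201);
                     OutputStream out = socket.getOutputStream()) {
                        out.write(gelf.getBytes(StandardCharsets.UTF_8)); // uncompressed, non-chunked payload
                        out.write(0);                                     // null byte as frame delimiter
                        out.flush();
                }
        }
}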

GELF Payload Specification

Version 1.1 (11/2013)

A GELF message is a JSON string with the following fields:

  • version string (UTF-8)
    • GELF spec version – “1.1”; MUST be set by client library.
  • host string (UTF-8)
    • the name of the host, source or application that sent this message; MUST be set by client library.
  • short_message string (UTF-8)
    • a short descriptive message; MUST be set by client library.
  • full_message string (UTF-8)
    • a long message that can e.g. contain a backtrace; optional.
  • timestamp number
    • Seconds since UNIX epoch with optional decimal places for milliseconds; SHOULD be set by client library. Will be set to the current timestamp (now) by the server if absent.
  • level number
    • the level equal to the standard syslog levels; optional, default is 1 (ALERT).
  • facility string (UTF-8)
    • optional, deprecated. Send as additional field instead.
  • line number
    • the line in a file that caused the error (decimal); optional, deprecated. Send as additional field instead.
  • file string (UTF-8)
    • the file (with path if you want) that caused the error (string); optional, deprecated. Send as additional field instead.
  • _[additional field] string (UTF-8) or number
    • every field you send and prefix with an underscore (_) will be treated as an additional field. Allowed characters in field names are any word character (letter, number, underscore), dashes and dots. The verifying regular expression is: ^[\w\.\-]*$. Libraries SHOULD NOT allow sending id as an additional field (_id). Graylog server nodes omit this field automatically.

Example payload

This is an example GELF message payload. Any graylog-server node accepts and stores this as a message when GZIP/ZLIB compressed or even when sent uncompressed over a plain socket (without newlines).

Note

Newlines must be denoted with the \n escape sequence to ensure the payload is valid JSON as per RFC 7159.

{
  "version": "1.1",
  "host": "example.org",
  "short_message": "A short message that helps you identify what is going on",
  "full_message": "Backtrace here\n\nmore stuff",
  "timestamp": 1385053862.3072,
  "level": 1,
  "_user_id": 9001,
  "_some_info": "foo",
  "_some_env_var": "bar"
}

Sending GELF messages via UDP using netcat

Sending an example message to a GELF UDP input (running on host graylog.example.com on port 12201):

echo -n '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }' | nc -w0 -u graylog.example.com 12201

Sending GELF messages via TCP using netcat

Sending an example message to a GELF TCP input (running on host graylog.example.com on port 12201):

echo -n -e '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }'"\0" | nc -w0 graylog.example.com 12201

Sending GELF messages via HTTP using curl

Sending an example message to a GELF HTTP input (running on http://graylog.example.com:12201/gelf):

curl -X POST -H 'Content-Type: application/json' -d '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }' 'http://graylog.example.com:12201/gelf'

The thinking behind the Graylog architecture and why it matters to you

A short history of Graylog

The Graylog project was started by Lennart Koopmann some time around 2009. Back then the most prominent log management software vendor issued a quote for a one-year license of their product that was so expensive that he decided to write a log management system himself. Now you might call this a bit over-optimistic (“I’ll build this in two weeks”, end of quote) but the situation was hopeless: there was basically no other product on the market and especially no open source alternatives.

The log management market today

Things have changed a bit since 2009. Now there are viable open source projects with serious products and a growing list of SaaS offerings for log management.

Architectural considerations

Graylog has been successful in providing log management software because it was built for log management from the beginning. Software that stores and analyzes log data must have a very specific architecture to do it efficiently. It is more than just a database or a full text search engine because it has to deal with both text data and metrics data on a time axis. Searches are always bound to a time frame (relative or absolute) and only go back into the past because future log data has not been written yet. A general purpose database or full text search engine that could also store and index the private messages of your online platform for search will never be able to effectively manage your log data. Adding a specialized frontend on top of it makes it look like it could do the job in a good way but is basically just putting lipstick on the wrong stack.

A log management system has to be constructed of several services that take care of processing, indexing, and data access. The most important reason is that you need to scale parts of it horizontally with your changing use cases, and the different parts of the system usually have different hardware requirements. All services must be tightly integrated to allow efficient management and configuration of the system as a whole. A data ingestion or forwarder tool is tedious to manage if the configuration has to be stored on the client machines and cannot be managed via, for example, REST APIs controlled from a simple interface. A system administrator needs to be able to log into the web interface of a log management product and select log files of a remote host (that has a forwarder running) for ingestion into the tool.

You also want to be able to see the health and configuration of all forwarders, data processors and indexers in a central place, because the whole log management stack can easily involve thousands of machines if you include the log-emitting clients in this calculation. You need to be able to see which clients are forwarding log data and which are not, to make sure that you are not missing any important data.

Graylog comes closest to the Splunk architecture:

  • Graylog was solely built as a log management system from the first line of code. This makes it very efficient and easy to use.
  • The graylog-server component sits in the middle and works around shortcomings of Elasticsearch (a full text search engine, not a log management system) for log management. It also builds an abstraction layer on top of it to make data access as easy as possible without having to select indices and write tedious time range selection filters, etc. - Just submit the search query and Graylog will take care of the rest for you.
  • All parts of the system are tightly integrated and many parts speak to each other to make your job easier.
  • Like WordPress makes MySQL a good solution for blogging, Graylog makes Elasticsearch a good solution for logging. You should never have a system or frontend query Elasticsearch directly for log management so we are putting graylog-server in front of it.
_images/architecture_comparison.png

Blackboxes

Closed source systems tend to become black boxes that you cannot extend or adapt to fit the needs of your use case. This is an important thing to consider especially for log management software. The use cases can range from simple syslog centralization to ultra-flexible data bus requirements. A closed source system will always make you dependent on the vendor because there is no way to adapt it. As your setup reaches a certain point of flexibility you might hit a wall earlier than expected.

Consider spending part of the money you would otherwise spend on the wrong license model on developing your own plugins or integrations.

The future

Graylog is the only open source log management system that will be able to deliver functionality and scaling in a way that Splunk does. It will be possible to replace Elasticsearch with something that is really suited for log data analysis without even changing the public facing APIs.

Changelog

Graylog 2.4.7

Released: 2019-03-01

Core

Graylog 2.4.6

Released: 2018-07-16

Core

Graylog 2.4.5

Released: 2018-05-28

Core

Graylog 2.4.4

Released: 2018-05-02

Core

ThreatIntel Plugin

AWS Plugin

Graylog 2.4.3

Released: 2018-01-24

https://www.graylog.org/blog/108-announcing-graylog-v2-4-3

Core

Graylog 2.4.2

Released: 2018-01-24

Core

Threatintel Plugin

Graylog 2.4.1

Released: 2018-01-19

https://www.graylog.org/blog/107-announcing-graylog-v2-4-1

Core

Pipeline Processor Plugin

AWS Plugin

Threatintel Plugin

Graylog 2.4.0

Released: 2017-12-22

https://www.graylog.org/blog/106-announcing-graylog-v2-4-0

No changes since 2.4.0-rc.2.

Graylog 2.4.0-rc.2

Released: 2017-12-20

Core

Graylog 2.4.0-rc.1

Released: 2017-12-19

https://www.graylog.org/blog/105-announcing-graylog-v2-4-0-rc-1

Core

Threatintel Plugin

Graylog 2.4.0-beta.4

Released: 2017-12-15

Core

Pipeline Processor Plugin

Threatintel Plugin

Anonymous Usage-Stats Plugin

  • The plugin was removed.

Graylog 2.4.0-beta.3

Released: 2017-12-04

Core

AWS Plugin

CEF Plugin

Threatintel Plugin

Graylog 2.4.0-beta.2

Released: 2017-11-07

https://www.graylog.org/blog/104-announcing-graylog-v2-4-0-beta-2

Core

Graylog 2.4.0-beta.1

Released: 2017-10-20

https://www.graylog.org/blog/103-announcing-graylog-v2-4-0-beta-1

Core

Map Widget plugin

Pipeline Processor plugin

Collector plugin

AWS plugin

CEF plugin

  • Improve CEF parser and add proper testing infrastructure.
  • Fix problems with Kafka and AMQP inputs.

NetFlow plugin

Threat Intelligence plugin

Graylog 2.3.2

Released: 2017-10-19

https://www.graylog.org/blog/102-announcing-graylog-v2-3-2

Core

Graylog 2.3.1

Released: 2017-08-25

https://www.graylog.org/blog/100-announcing-graylog-v2-3-1

Core

Pipeline Processor Plugin

Graylog 2.3.0

Released: 2017-07-26

https://www.graylog.org/blog/98-announcing-graylog-v2-3-0

Core

Beats Plugin

Collector Plugin

Map Widget Plugin

Pipeline Processor Plugin

Graylog 2.2.3

Released: 2017-04-04

https://www.graylog.org/blog/92-announcing-graylog-v2-2-3

Core

Pipeline Processor

Graylog 2.2.2

Released: 2017-03-03

https://www.graylog.org/blog/90-announcing-graylog-v2-2-2

Core

Graylog 2.2.1

Released: 2017-02-20

https://www.graylog.org/blog/89-announcing-graylog-v2-2-1

Core

Graylog 2.2.0

Released: 2017-02-14

https://www.graylog.org/blog/88-announcing-graylog-v2-2-0

Core

Beats plugin

  • Add support for Metricbeat
  • Extract “fields” for every type of beat

Pipeline processor plugin

Collector sidecar plugin

Graylog 2.1.3

Released: 2017-01-26

https://www.graylog.org/blog/84-announcing-graylog-2-1-3

Core

Beats plugin

Graylog 2.1.2

Released: 2016-11-04

https://www.graylog.org/blog/75-announcing-graylog-v2-1-2

Core

Beats plugin

Pipeline processor plugin

Graylog 2.1.1

Released: 2016-09-14

https://www.graylog.org/blog/69-announcing-graylog-v2-1-1

Core

Map plugin

Pipeline processor plugin

Graylog 2.1.0

Released: 2016-09-01

https://www.graylog.org/blog/68-announcing-graylog-v-2-1-0-ga

Core

Collector sidecar plugin

  • Return updated configuration after changing configuration name
  • Prevent crashes when failed to propagate state to the server
  • Improve compatibility with old API
  • Display collector IP address. Graylog2/graylog-plugin-collector#9
  • Ability to clone collector configuration. Graylog2/graylog-plugin-collector#10
  • NXLog GELF/TLS input should work without cert files. Graylog2/graylog-plugin-collector#13
  • Add tail_files option
  • Expand verbatim text area if value is present
  • Validation improvements
  • Add buffer option to NXLog outputs
  • Make defaults compatible with Windows hosts
  • Add support for Beats. Filebeat, Winlogbeat.
  • Beats binaries are bundled with the Collector-Sidecar package
  • Improve server side validation. Graylog2/graylog2-server#2247 and Graylog2/graylog-plugin-collector#7.
  • Add NXlog GELF TCP and TCP/TLS output
  • Add support to clone input, outputs and snippets
  • Optionally display collector status information in web interface
  • Optionally display log directory listing on status page
  • If no node-id is given use the hostname as identification
  • Linux distribution is detected and can be used in Snippet template
  • Silent install on Windows works now
  • Collector log files are now auto-rotated
  • Collector processes are supervised and restarted on crashes
  • NXlog Inputs and Outputs support free text configuration
  • Fix web plugin loading on IE 11

Pipeline processor plugin

Graylog 2.0.3

Released: 2016-06-20

https://www.graylog.org/blog/58-graylog-v2-0-3-released

Improvements

Bug fixes

Graylog 2.0.2

Released: 2016-05-27

https://www.graylog.org/blog/57-graylog-v2-0-2-released

Improvements

Bug Fixes

Plugin: Pipeline Processor

Graylog 2.0.1

Released: 2016-05-11

https://www.graylog.org/blog/56-graylog-v2-0-1-released

Improvements

Bug Fixes

Plugin: Collector

  • Rotate nxlog logfiles once a day by default.
  • Add GELF TCP output for nxlog.

Graylog 2.0.0

Released: 2016-04-27

https://www.graylog.org/blog/55-announcing-graylog-v2-0-ga

Note

Please make sure to read the Upgrade Guide before upgrading to Graylog 2.0. There are breaking changes!

Feature Highlights

See the release announcement for details on the new features.

  • Web interface no longer a separate process
  • Support for Elasticsearch 2.x
  • Live tail support
  • Message Processing Pipeline
  • Map Widget Plugin
  • Collector Sidecar
  • Streams filter UI
  • Search for surrounding messages
  • Query range limit
  • Configurable query time ranges
  • Archiving (commercial feature)

Bug Fixes

There have been lots of bug fixes since the 1.3 releases. We only list the ones that we worked on since the 2.0 alpha phase.

Graylog 1.3.4

Released: 2016-03-16

https://www.graylog.org/blog/49-graylog-1-3-4-is-now-available

Graylog 1.3.3

Released: 2016-01-14

https://www.graylog.org/graylog-1-3-3-is-now-available/

Graylog 1.3.2

Released: 2015-12-18

https://www.graylog.org/graylog-1-3-2-is-now-available/

Graylog 1.3.1

Released: 2015-12-17

https://www.graylog.org/graylog-1-3-1-is-now-available/

Graylog 1.3.0

Released: 2015-12-09

https://www.graylog.org/graylog-1-3-ga-is-ready/

Graylog 1.2.2

Released: 2015-10-27

https://www.graylog.org/graylog-1-2-2-is-now-available/

Graylog 1.2.1

Released: 2015-09-22

https://www.graylog.org/graylog-1-2-1-is-now-available/

Graylog 1.2.0

Released: 2015-09-14

https://www.graylog.org/announcing-graylog-1-2-ga-release-includes-30-new-features/

Graylog 1.2.0-rc.4

Released: 2015-09-08

https://www.graylog.org/announcing-graylog-1-2-rc-4/

Graylog 1.2.0-rc.2

Released: 2015-08-31

https://www.graylog.org/announcing-graylog-1-2-rc/

Graylog 1.1.6

Released: 2015-08-06

https://www.graylog.org/graylog-1-1-6-released/

Graylog 1.1.5

Released: 2015-07-27

https://www.graylog.org/graylog-1-1-5-released/

Graylog 1.1.4

Released: 2015-06-30

https://www.graylog.org/graylog-v1-1-4-is-now-available/

Graylog 1.1.3

Released: 2015-06-19

https://www.graylog.org/graylog-v1-1-3-is-now-available/

Graylog 1.1.2

Released: 2015-06-10

https://www.graylog.org/graylog-v1-1-2-is-now-available/

Graylog 1.1.1

Released: 2015-06-05

https://www.graylog.org/graylog-v1-1-1-is-now-available/

Graylog 1.1.0

Released: 2015-06-04

https://www.graylog.org/graylog-1-1-is-now-generally-available/

  • Properly set node_id on message input Graylog2/graylog2-server#1210
  • Fixed handling of booleans in configuration forms in the web interface
  • Various design fixes in the web interface

Graylog 1.1.0-rc.3

Released: 2015-06-02

https://www.graylog.org/graylog-v1-1-rc3-is-now-available/

Graylog 1.1.0-rc.1

Released: 2015-05-27

https://www.graylog.org/graylog-v1-1-rc1-is-now-available/

Graylog 1.1.0-beta.3

Released: 2015-05-27

https://www.graylog.org/graylog-1-1-beta-3-is-now-available/

Graylog 1.1.0-beta.2

Released: 2015-05-20

https://www.graylog.org/graylog-1-1-beta-is-now-available/

  • CSV output streaming support including full text message
  • Simplified MongoDB configuration with URI support
  • Improved tokenizer for extractors
  • Configurable UDP buffer size for incoming messages
  • Enhanced Grok support with type conversions (integers, doubles and dates)
  • Elasticsearch 1.5.2 support
  • Added support for integrated Log Collector
  • Search auto-complete
  • Manual widget resize
  • Auto resize of widgets based on screen size
  • Faster search results
  • Moved search filter for usability
  • Updated several icons to text boxes for usability
  • Search highlight toggle
  • Pie charts (Stacked charts are coming too!)
  • Improved stream management
  • Output plugin and Alarm callback edit support
  • Dashboard widget search edit
  • Dashboard widget direct search button
  • Dashboard background update support for better performance
  • Log collector status UI

Graylog 1.0.2

Released: 2015-04-28

https://www.graylog.org/graylog-v1-0-2-has-been-released/

Graylog 1.0.1

Released: 2015-03-16

https://www.graylog.org/graylog-v1-0-1-has-been-released/

Graylog 1.0.0

Released: 2015-02-19

https://www.graylog.org/announcing-graylog-v1-0-ga/

  • No changes since Graylog 1.0.0-rc.4

Graylog 1.0.0-rc.4

Released: 2015-02-13

https://www.graylog.org/graylog-v1-0-rc-4-has-been-released/

Graylog 1.0.0-rc.3

Released: 2015-02-05

https://www.graylog.org/graylog-v1-0-rc-3-has-been-released/

Graylog 1.0.0-rc.2

Released: 2015-02-04

https://www.graylog.org/graylog-v1-0-rc-2-has-been-released/

Graylog 1.0.0-rc.1

Released: 2015-01-28

https://www.graylog.org/graylog-v1-0-rc-1-has-been-released/

Graylog 1.0.0-beta.3

Released: 2015-01-21

https://www.graylog.org/graylog-v1-0-beta-3-has-been-released/

Graylog 1.0.0-beta.2

Released: 2015-01-16

https://www.graylog.org/graylog-v1-0-0-beta2/

Graylog2 0.92.4

Released: 2015-01-14

https://www.graylog.org/graylog2-v0-92-4/

Graylog 1.0.0-beta.1

Released: 2015-01-12

https://www.graylog.org/graylog-v1-0-0-beta1/

  • Message Journaling
  • New Widgets
  • Grok Extractor Support
  • Overall stability and resource efficiency improvements
  • Single binary for graylog2-server and graylog2-radio
  • Inputs are now editable
  • Order of field charts rendered inside the search results page is now maintained.
  • Improvements in focus and keyboard behaviour on modal windows and forms.
  • You can now define whether to disable expensive, frequent real-time updates of the UI in the settings of each user. (For example the updating of total messages in the system)
  • Experimental search query auto-completion that can be enabled in the user preferences.
  • The API browser now documents server response payloads in a better way so you know what to expect as an answer to your call.
  • Now using the standard Java ServiceLoader for plugins.

Graylog2 0.92.3

Released: 2014-12-23

https://www.graylog.org/graylog2-v0-92-3/

Graylog2 0.92.1

Released: 2014-12-11

https://www.graylog.org/graylog2-v0-92-1/

  • [SERVER] Fixed name resolution and overriding sources for network inputs.
  • [SERVER] Fixed wrong delimiter in GELF TCP input.
  • [SERVER] Disabled the output cache by default. The output cache is the source of all sorts of interesting problems. If you want to keep using it, please read the upgrade notes.
  • [SERVER] Fixed message timestamps in GELF output.
  • [SERVER] Fixed connection counter for network inputs.
  • [SERVER] Added warning message if the receive buffer size (SO_RECV) couldn’t be set for network inputs.
  • [WEB] Improved keyboard shortcuts with most modal dialogs (e.g. hitting Enter submits the form instead of just closing the dialogs).
  • [WEB] Upgraded to play2-graylog2 1.2.1 (compatible with Play 2.3.x and Java 7).

Graylog2 0.92.0

Released: 2014-12-01

https://www.graylog.org/graylog2-v0-92/

  • [SERVER] IMPORTANT SECURITY FIX: It was possible to perform LDAP logins with crafted wildcards. (A big thank you to Jose Tozo who discovered this issue and disclosed it very responsibly.)
  • [SERVER] Generate a system notification if garbage collection takes longer than a configurable threshold.
  • [SERVER] Added several JVM-related metrics.
  • [SERVER] Added support for Elasticsearch 1.4.x which brings a lot of stability and resilience features to Elasticsearch clusters.
  • [SERVER] Made version check of Elasticsearch version optional. Disabling this check is not recommended.
  • [SERVER] Added an option to disable optimizing Elasticsearch indices on index cycling.
  • [SERVER] Added an option to disable time-range calculation for indices on index cycling.
  • [SERVER] Lots of other performance enhancements for large setups (i.e. involving several Radio nodes and multiple Graylog2 Servers).
  • [SERVER] Support for Syslog Octet Counting, as used by syslog-ng for syslog via TCP (#743)
  • [SERVER] Improved support for structured syslog messages (#744)
  • [SERVER] Bug fixes regarding IPv6 literals in mongodb_replica_set and elasticsearch_discovery_zen_ping_unicast_hosts
  • [WEB] Added additional details to system notification about Elasticsearch max. open file descriptors.
  • [WEB] Fixed several bugs and inconsistencies regarding time zones.
  • [WEB] Improved graphs and diagrams
  • [WEB] Allow to update dashboards when browser window is not on focus (#738)
  • [WEB] Bug fixes regarding timezone handling
  • Numerous internal bug fixes

Graylog2 0.92.0-rc.1

Released: 2014-11-21

https://www.graylog.org/graylog2-v0-92-rc-1/

  • [SERVER] Generate a system notification if garbage collection takes longer than a configurable threshold.
  • [SERVER] Added several JVM-related metrics.
  • [SERVER] Added support for Elasticsearch 1.4.x which brings a lot of stability and resilience features to Elasticsearch clusters.
  • [SERVER] Made version check of Elasticsearch version optional. Disabling this check is not recommended.
  • [SERVER] Added an option to disable optimizing Elasticsearch indices on index cycling.
  • [SERVER] Added an option to disable time-range calculation for indices on index cycling.
  • [SERVER] Lots of other performance enhancements for large setups (i.e. involving several Radio nodes and multiple Graylog2 Servers).
  • [WEB] Upgraded to Play 2.3.6.
  • [WEB] Added additional details to system notification about Elasticsearch max. open file descriptors.
  • [WEB] Fixed several bugs and inconsistencies regarding time zones.
  • Numerous internal bug fixes

Graylog2 0.91.3

Released: 2014-11-05

https://www.graylog.org/graylog2-v0-90-3-and-v0-91-3-has-been-released/

  • Fixed date and time issues related to DST changes
  • Requires Elasticsearch 1.3.4; Elasticsearch 1.3.2 had a bug that can cause index corruptions.
  • The mongodb_replica_set configuration variable now supports IPv6
  • Messages read from the on-disk caches could be stored with missing fields

Graylog2 0.90.3

Released: 2014-11-05

https://www.graylog.org/graylog2-v0-90-3-and-v0-91-3-has-been-released/

  • Fixed date and time issues related to DST changes
  • The mongodb_replica_set configuration variable now supports IPv6
  • Messages read from the on-disk caches could be stored with missing fields

Graylog2 0.92.0-beta.1

Released: 2014-11-05

https://www.graylog.org/graylog2-v0-92-beta-1/

  • Content packs
  • [SERVER] SSL/TLS support for Graylog2 REST API
  • [SERVER] Support for time based retention cleaning of your messages. The old message count based approach is still the default.
  • [SERVER] Support for Syslog Octet Counting, as used by syslog-ng for syslog via TCP (Graylog2/graylog2-server#743)
  • [SERVER] Improved support for structured syslog messages (Graylog2/graylog2-server#744)
  • [SERVER] Bug fixes regarding IPv6 literals in mongodb_replica_set and elasticsearch_discovery_zen_ping_unicast_hosts
  • [WEB] Revamped “Sources” page in the web interface
  • [WEB] Improved graphs and diagrams
  • [WEB] Allow to update dashboards when browser window is not on focus (Graylog2/graylog2-web-interface#738)
  • [WEB] Bug fixes regarding timezone handling
  • Numerous internal bug fixes

Graylog2 0.91.1

Released: 2014-10-17

https://www.graylog.org/two-new-graylog2-releases/

  • Messages written to the persisted master caches were written to the system with unreadable timestamps, leading to errors when trying to open the message.
  • Extractors were only being deleted from running inputs but not from all inputs
  • Output plugins were not always properly loaded
  • You can now configure the alert_check_interval in your graylog2.conf
  • Parsing of configured Elasticsearch unicast discovery addresses could break when including spaces

Graylog2 0.90.1

Released: 2014-10-17

https://www.graylog.org/two-new-graylog2-releases/

  • Messages written to the persisted master caches were written to the system with unreadable timestamps, leading to errors when trying to open the message.
  • Extractors were only being deleted from running inputs but not from all inputs
  • Output plugins were not always properly loaded
  • You can now configure the alert_check_interval in your graylog2.conf
  • Parsing of configured Elasticsearch unicast discovery addresses could break when including spaces

Graylog2 0.91.0-rc.1

Released: 2014-09-23

https://www.graylog.org/graylog2-v0-90-has-been-released/

  • Optional Elasticsearch v1.3.2 support

Graylog2 0.90.0

Released: 2014-09-23

https://www.graylog.org/graylog2-v0-90-has-been-released/

  • Real-time data forwarding to Splunk or other systems
  • Alert callbacks for greater flexibility
  • New disk-based architecture for buffering in load spike situations
  • Improved graphing
  • Plugin API
  • Huge performance and stability improvements across the whole stack
  • Small possibility of losing messages in certain scenarios has been fixed
  • Improvements to internal logging from threads to avoid swallowing Graylog2 error messages
  • Paused streams are no longer checked for alerts
  • Several improvements to timezone handling
  • JavaScript performance fixes in the web interface and especially a fixed memory leak of charts on dashboards
  • The GELF HTTP input now supports CORS
  • Stream matching now has a configurable timeout to avoid stalling message processing in case of too complex rules or erroneous regular expressions
  • Stability improvements for Kafka and AMQP inputs
  • Inputs can now be paused and resumed
  • Dozens of bug fixes and other improvements

Graylog2 0.20.3

Released: 2014-08-09

https://www.graylog.org/graylog2-v0-20-3-has-been-released/

  • Bugfix: Storing saved searches was not accounting for custom application contexts
  • Bugfix: Editing stream rules could have a wrong pre-filled value
  • Bugfix: The create dashboard link was shown even if the user had no permission to do so. This caused an ugly error page because of the missing permissions.
  • Bugfix: graylog2-radio could lose numeric fields when writing to the message broker
  • Better default batch size values for the Elasticsearch output
  • Improved rest_transport_uri default settings to avoid confusion with loopback interfaces
  • The deflector index is now also using the configured index prefix

Graylog2 0.20.2

Released: 2014-05-24

https://www.graylog.org/graylog2-v0-20-2-has-been-released/

  • Search result highlighting
  • Reintroduces AMQP support
  • Extractor improvements and sharing
  • Graceful shutdowns, Lifecycles, Load Balancer integration
  • Improved stream alert emails
  • Alert annotations
  • CSV exports via the REST API now support chunked transfers and avoid heap size problems with huge result sets
  • Login now redirects to page you visited before if there was one
  • More live updating information in node detail pages
  • Empty dashboards no longer show lock/unlock buttons
  • Global inputs now also show IO metrics
  • You can now easily copy message IDs into native clipboard with one click
  • Improved message field selection in the sidebar
  • Fixed display of floating point numbers in several places
  • Now supporting application contexts in the web interface like http://example.org/graylog2
  • Several fixes for LDAP configuration form
  • Message fields in the search result sidebar now survive pagination
  • Only admin users are allowed to change the session timeout for reader users
  • New extractor: Copy whole input
  • New converters: uppercase/lowercase, flexdate (tries to parse any string as date)
  • New stream rule to check for presence or absence of fields
  • Message processing now supports trace logging
  • Better error message for ES discovery problems
  • Fixes to GELF HTTP input and it holding open connections
  • Some timezone fixes
  • CSV exports now only contain selected fields
  • Improvements for bin/graylog* control scripts
  • UDP inputs now allow for custom receive buffer sizes
  • Numeric extractor converter now supports floating point values
  • Bugfix: Several small fixes to system notifications and closing them
  • Bugfix: Carriage returns were not escaped properly in CSV exports
  • Bugfix: Some AJAX calls redirected to the startpage when they failed
  • Bugfix: Wrong sorting in sources table
  • Bugfix: Quickvalues widget was broken with very long values
  • Bugfix: Quickvalues modal was positioned wrong in some cases
  • Bugfix: Indexer failures list could break when you had a lot of failures
  • Custom application prefix was not working for field chart analytics
  • Bugfix: Memory leaks in the dashboards
  • Bugfix: NullPointerException when Elasticsearch discovery failed and unicast discovery was disabled
  • Message backlog in alert emails did not always include the correct number of messages
  • Improvements for message outputs: No longer only waiting for filled buffers but also flushing them regularly. This avoids problems that make Graylog2 look like it misses messages in cheap benchmark scenarios combined with only little throughput.

Introduction

Graylog Enterprise, built on top of the Graylog open source platform, offers additional features that enable users to deploy Graylog at enterprise scale and apply Graylog to processes and workflows across the whole organization.

Please see the Graylog Enterprise Page for details.

Setup

Graylog Enterprise comes as a set of Graylog server plugins which need to be installed in addition to the Graylog open source setup.

Requirements

The following table shows the minimum required Graylog versions for the Graylog Enterprise plugins.

Enterprise Version Requirements
Enterprise Version Required Graylog Version
1.0.0 2.0.0, 2.0.1
1.0.1 2.0.2, 2.0.3
1.2.0 2.1.0, 2.1.1, 2.1.2
1.2.1 2.1.3
2.2.0 2.2.0
2.2.1 2.2.1
2.3.0 2.3.0
2.3.1 2.3.1
2.3.2 2.3.2
2.4.0 2.4.0
2.4.1 2.4.1
2.4.2 2.4.2
2.4.3 2.4.3
2.4.4 2.4.4
2.4.5 2.4.5
2.4.6 2.4.6

Installation

Since Graylog 2.4, the Graylog Enterprise plugins can be installed the same way Graylog itself is installed. In most setups this is done with the package tool provided by your distribution and the online repository.

Note

For previous versions of Graylog Enterprise please contact your Graylog account manager.

Once you have installed the Graylog Enterprise plugins you need to obtain a license from the Graylog Enterprise web page.

Should a simple apt-get install graylog-enterprise-plugins or yum install graylog-enterprise-plugins not work for you, the following information might help you.

Important

The Graylog Enterprise plugins need to be installed on all your Graylog nodes!

DEB / RPM Package

The default installation should be done with the system package tools. It includes the repository installation that is described in the Operating System Packages installation guides.

If using the online repositories is not possible in your environment, you can download the Graylog Enterprise plugins at https://packages.graylog2.org.

Note

These packages can only be used when you installed Graylog via the Operating System Packages!

DEB

On distributions like Debian or Ubuntu, the installation can be done with apt-get from the previously installed online repository.

$ sudo apt-get install graylog-enterprise-plugins

RPM

On distributions like CentOS or RedHat, the installation can be done with yum from the previously installed online repository.

$ sudo yum install graylog-enterprise-plugins

Tarball

If you have done a manual installation or want to include only parts of the enterprise plugins you can get the tarball from the download locations listed in the following table.

Enterprise Plugins download
Enterprise Version Download URL
2.4.0 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.0.tgz
2.4.1 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.1.tgz
2.4.2 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.2.tgz
2.4.3 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.3.tgz
2.4.4 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.4.tgz
2.4.5 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.5.tgz
2.4.6 https://downloads.graylog.org/releases/graylog-enterprise/plugin-bundle/tgz/graylog-enterprise-plugins-2.4.6.tgz

The tarball includes the enterprise plugin JAR files.

$ tar -tzf graylog-enterprise-plugins-2.4.0.tgz
  graylog-enterprise-plugins-2.4.0/LICENSE
  graylog-enterprise-plugins-2.4.0/plugin/graylog-plugin-archive-2.4.0.jar
  graylog-enterprise-plugins-2.4.0/plugin/graylog-plugin-auditlog-2.4.0.jar
  graylog-enterprise-plugins-2.4.0/plugin/graylog-plugin-license-2.4.0.jar

Depending on the Graylog setup method you have used, you have to install the plugins into different locations.

Plugin Installation Locations
Installation Method Directory
Virtual Machine Appliances /opt/graylog/plugins/
Operating System Packages /usr/share/graylog-server/plugin/
Manual Setup /<extracted-graylog-tarball-path>/plugin/

Also check the plugin_dir config option in your Graylog server configuration file. The default might have been changed.
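
As a sketch for an Operating System Packages installation, extracting the tarball and copying the plugin JAR files could look like this (the paths and version are examples; adjust them to your setup):

$ tar -xzf graylog-enterprise-plugins-2.4.0.tgz
$ sudo cp graylog-enterprise-plugins-2.4.0/plugin/*.jar /usr/share/graylog-server/plugin/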

Make sure to install the enterprise plugin JAR files alongside the other Graylog plugins. Your plugin directory should look similar to this after installing the enterprise plugins.

plugin/
├── graylog-plugin-auditlog-2.4.0.jar
├── graylog-plugin-threatintel-2.4.0.jar
├── graylog-plugin-archive-2.4.0.jar
├── graylog-plugin-beats-2.4.0.jar
├── graylog-plugin-netflow-2.4.0.jar
├── graylog-plugin-aws-2.4.0.jar
├── graylog-plugin-pipeline-processor-2.4.0.jar
├── graylog-plugin-enterprise-integration-2.4.0.jar
├── graylog-plugin-map-widget-2.4.0.jar
├── graylog-plugin-cef-2.4.0.jar
├── graylog-plugin-license-2.4.0.jar
└── graylog-plugin-collector-2.4.0.jar

Server Restart

After installing the Graylog Enterprise plugins you have to restart each of your Graylog servers to load them.
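
On systemd-based systems installed from the operating system packages, for example, the restart can be done with systemctl (a sketch; adjust to your init system and setup):

$ sudo systemctl restart graylog-server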

Note

We recommend restarting one server at a time!

You should see something like the following in your Graylog server logs. It indicates that the plugins have been successfully loaded.

2017-12-18T17:39:10.797+01:00 INFO  [CmdLineTool] Loaded plugin: AWS plugins 2.4.0 [org.graylog.aws.plugin.AWSPlugin]
2017-12-18T17:39:10.803+01:00 INFO  [CmdLineTool] Loaded plugin: Audit Log 2.4.0 [org.graylog.plugins.auditlog.AuditLogPlugin]
2017-12-18T17:39:10.805+01:00 INFO  [CmdLineTool] Loaded plugin: Elastic Beats Input 2.4.0 [org.graylog.plugins.beats.BeatsInputPlugin]
2017-12-18T17:39:10.807+01:00 INFO  [CmdLineTool] Loaded plugin: CEF Input 2.4.0 [org.graylog.plugins.cef.CEFInputPlugin]
2017-12-18T17:39:10.809+01:00 INFO  [CmdLineTool] Loaded plugin: Collector 2.4.0 [org.graylog.plugins.collector.CollectorPlugin]
2017-12-18T17:39:10.811+01:00 INFO  [CmdLineTool] Loaded plugin: Enterprise Integration Plugin 2.4.0 [org.graylog.plugins.enterprise_integration.EnterpriseIntegrationPlugin]
2017-12-18T17:39:10.812+01:00 INFO  [CmdLineTool] Loaded plugin: License Plugin 2.4.0 [org.graylog.plugins.license.LicensePlugin]
2017-12-18T17:39:10.814+01:00 INFO  [CmdLineTool] Loaded plugin: MapWidgetPlugin 2.4.0 [org.graylog.plugins.map.MapWidgetPlugin]
2017-12-18T17:39:10.815+01:00 INFO  [CmdLineTool] Loaded plugin: NetFlow Plugin 2.4.0 [org.graylog.plugins.netflow.NetFlowPlugin]
2017-12-18T17:39:10.826+01:00 INFO  [CmdLineTool] Loaded plugin: Pipeline Processor Plugin 2.4.0 [org.graylog.plugins.pipelineprocessor.ProcessorPlugin]
2017-12-18T17:39:10.827+01:00 INFO  [CmdLineTool] Loaded plugin: Threat Intelligence Plugin 2.4.0 [org.graylog.plugins.threatintel.ThreatIntelPlugin]

Cluster Setup

If you run a Graylog cluster you need to add the enterprise plugins to every Graylog node. Additionally your load-balancer must route /api/plugins/org.graylog.plugins.archive/ only to the Graylog master node. Future versions of Graylog will forward these requests automatically to the correct node.
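
As a rough sketch of such a routing rule, assuming an nginx load balancer with hypothetical upstream names graylog_master and graylog_cluster, the archive API path could be pinned to the master node like this:

# Hypothetical nginx configuration; upstream names are placeholders
location /api/plugins/org.graylog.plugins.archive/ {
    proxy_pass http://graylog_master;
}

location /api/ {
    proxy_pass http://graylog_cluster;
}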

License Installation

The Graylog Enterprise plugins require a valid license to use the additional features.

Once you have obtained a license you can import it into your Graylog setup by going through the following steps.

  1. As an admin user, open the System / License page from the menu in the web interface.
  2. Click the Import new license button in the top right hand corner.
  3. Copy the license text from the confirmation email and paste it into the text field.
  4. The license should be valid and a preview of your license details should appear below the text field.
  5. Click Import to activate the license.

The license automatically applies to all nodes in your cluster without the need to restart your server nodes.

Note

If there are errors, please check that you copied the entire license from the email without line breaks. The same license is also attached as a text file in case it is wrongly formatted in the email.

_images/enterprise-license-1.png

License Verification

Some Graylog licenses need to check their validity on a regular basis. This includes the free Graylog Enterprise license with a specific amount of traffic included.

If your network environment requires Graylog to use a proxy server in order to communicate with the external services via HTTPS, you’ll have to configure the proxy server in the Graylog configuration file.
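
A minimal sketch, assuming the http_proxy_uri option of recent Graylog versions (verify the option name against the configuration file reference for your version):

# Route outgoing HTTPS traffic through a proxy (example address)
http_proxy_uri = http://proxy.example.com:8080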

The Graylog web interface shows all details about the license, but if you are still unclear about the requirements, please contact our sales team with your questions.

Details on License Verification

Graylog Enterprise periodically sends the following information to ‘api.graylog.com’ via HTTPS on TCP port 443 for each installed license:

  • A nonce to avoid modified reports
  • The ID of the license
  • The ID of the Graylog cluster
  • A flag indicating if the license is violated
  • A flag indicating if the license has expired
  • A flag indicating if Graylog detected that the traffic measuring mechanisms have been modified
  • A list of how much traffic was received and written by Graylog in the recent days, in bytes

Archiving

Graylog enables you to configure a retention period to automatically delete older messages - this helps you control the cost of storage in Elasticsearch. But we know it's not ideal to have to choose between keeping fewer messages in Graylog and paying more for hardware. Additionally, many of you are required to store data for long periods of time due to compliance requirements like PCI or HIPAA.

The Archiving functionality allows you to archive log messages until you need to re-import them into Graylog for analysis. You can instruct Graylog to automatically archive log messages to compressed flat files on the local filesystem before retention cleaning kicks in and messages are deleted from Elasticsearch. Archiving also works through a REST call or the web interface if you don't want to wait for retention cleaning to happen. We chose flat files for this because they are vendor agnostic, so you will always be able to access your data.

You can then do whatever you want with the archived files: move them to cheap storage, write them on tape, or even print them out if you need to! If you need to search through archived data in the future, you can move any selection of archived messages back into the Graylog archive folder, and the web interface will enable you to temporarily import the archive so you can analyze the messages again in Graylog.

Note

The archive plugin is a commercial feature and part of Graylog Enterprise.

Setup

The archive plugin is a commercial Graylog feature that can be installed in addition to the Graylog open source server.

Installation

Please see the Graylog Enterprise setup page for details on how to install the Archive plugin.

Configuration

The archive plugin can be configured via the Graylog web interface and does not need any changes in the Graylog server configuration file.

In the web interface menu navigate to “System/Archives” and click “Configuration” to adjust the configuration.

_images/archiving-setup-config.png

Archive Options

There are several configuration options to configure the archive plugin.

Configuration Options
Name Description
Backend Backend on the master node where the archive files will be stored.
Max Segment Size Maximum size (in bytes) of archive segment files.
Compression Type Compression type that will be used to compress the archives.
Checksum Type Checksum algorithm that is used to calculate the checksum for archives.
Restore index batch size Elasticsearch batch size when restoring archive files.
Streams to archive Streams that should be included in the archive.

Backend

The archived indices will be stored in a backend. A backend that stores the data in /tmp/graylog-archive is created when the server starts for the first time, but you can create a new backend if you want to store the data in a different path.

Max Segment Size

When archiving an index, the archive job writes the data into segments. The Max Segment Size setting sets the size limit for each of these data segments.

This gives you control over the size of the segment files, making it possible to process them with tools that have a file size limit.

Once the size limit is reached, a new segment file will be started.

Example:

/path/to/archive/
  graylog_201/
    archive-metadata.json
    archive-segment-0.gz
    archive-segment-1.gz
    archive-segment-2.gz

Compression Type

Archives will be compressed with gzip by default. This option can be changed to use a different compression type.

The selected compression type has a big impact on the time it takes to archive an index. Gzip, for example, is pretty slow but has a great compression ratio. Snappy and LZ4 are way faster, but the archives will be bigger.

Here is a comparison between the available compression algorithms with test data.

Compression Type Comparison
Type Index Size Archive Size Duration
gzip 1 GB 134 MB 15 minutes, 23 seconds
Snappy 1 GB 291 MB 2 minutes, 31 seconds
LZ4 1 GB 266 MB 2 minutes, 25 seconds

Note

Results with your data may vary! Make sure to test the different compression types to find the one that is best for your data.

Warning

The current implementation of LZ4 is not compatible with the LZ4 CLI tools, thus decompressing the LZ4 archives outside of Graylog is currently not possible.

Checksum Type

When writing archives Graylog computes a CRC32 checksum over the files. This option can be changed to use a different checksum algorithm.

The type of checksum depends on the use case. CRC32 and MD5 are quick to compute and a reasonable choice to be able to detect damaged files, but neither is suitable as protection against malicious changes in the files. Graylog also supports using SHA-1 or SHA-256 checksums which can be used to make sure the files were not modified, as they are cryptographic hashes.

The best choice of checksum type depends on whether the necessary system tools are installed to compute it later (not all systems come with a SHA-256 utility, for example), on the speed of checksum calculation for larger files, and on your security considerations.
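
For example, if SHA-256 was the configured checksum type, a segment file could later be verified with standard system tools (a sketch, assuming sha256sum is available and the default /tmp/graylog-archive backend path):

$ sha256sum /tmp/graylog-archive/graylog_201/archive-segment-0.gz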

Restore Index Batch Size

This setting controls the batch size for re-indexing archive data into Elasticsearch. When set to 1000, the restore job will re-index the archived data in document batches of 1000.

You can use this setting to control the speed of the restore process and also how much load it will generate on the Elasticsearch cluster. The higher the batch size, the faster the restore will progress and the more load will be put on your Elasticsearch cluster in addition to the normal message processing.

Make sure to tune this carefully to avoid any negative impact on your message indexing throughput and search speed!

Streams To Archive

This option can be used to select which streams should be included in the archive. With this you are able to archive only your important data instead of archiving everything that arrives in Graylog.

Note

New streams will be archived automatically. If you create a new stream and don’t want it to be archived, you have to disable it in this configuration dialog.

Backends

A backend can be used to store the archived data. For now, we only support a single file system backend type.

File System

The archived indices will be stored in the Output base path directory. This directory needs to exist and be writable by the Graylog server process so the files can be stored.

Note

Only the master node needs access to the Output base path directory because the archiving process runs on the master node.

We recommend putting the Output base path directory on a separate disk or partition to avoid any negative impact on message processing should archiving fill up the disk.

_images/archiving-setup-backend-new.png

Configuration Options
Name Description
Title A simple title to identify the backend.
Description Longer description for the backend.
Output base path Directory path where the archive files should be stored.

Output base path

The output base path can either be a simple directory path string or a template string to build dynamic paths.

You could use a template string to store the archive data in a directory tree that is based on the archival date.

Example:

# Template
/data/graylog-archive/${year}/${month}/${day}

# Result
/data/graylog-archive/2017/04/01/graylog_0

Available Template Variables
Name Description
${year} Archival date year. (e.g. “2017”)
${month} Archival date month. (e.g. “04”)
${day} Archival date day. (e.g. “01”)
${hour} Archival date hour. (e.g. “23”)
${minute} Archival date minute. (e.g. “24”)
${second} Archival date second. (e.g. “59”)
${index-name} Name of the archived index. (e.g. “graylog_0”)

Index Retention

Graylog uses configurable index retention strategies to delete old indices. By default, indices can be closed or deleted once you have more than the configured limit.

The archive plugin offers a new index retention strategy that you can configure to automatically archive an index before closing or deleting it.

Index retention strategies can be configured in the system menu under “System/Indices”. Select an index set and click “Edit” to change the index rotation and retention strategies.

_images/archiving-setup-index-retention-config.png

As with the regular index retention strategies, you can configure a max number of Elasticsearch indices. Once there are more indices than the configured limit, the oldest ones will be archived into the backend and then closed or deleted. You can also decide to not do anything (NONE) after archiving an index. In that case no cleanup of old indices will happen and you have to take care of that yourself!

Usage

Creating Archives

There are three ways to create archives from the Graylog Elasticsearch indices.

Web Interface

You can manually create an archive on the “System/Archives” page in the web interface.

_images/archiving-usage-create-web.png

The “Create Archive for Index” section of the page contains a form where you can select an index and archive it by pressing “Archive Index”.

Using this will just archive the index to disk; it does not close or delete it. This is a great way to test the archiving feature without changing your index retention configuration.

Index Retention

The archive plugin ships with an index retention strategy that can be used to automatically create archives before closing or deleting Elasticsearch indices.

This is the easiest way to automatically create archives without custom scripting.

Please see the Index Retention Configuration on how to configure it.

REST API

The archive plugin also offers a REST API that you can use to automate archive creation if you have some special requirements and need a more flexible way to do this.

_images/archiving-usage-create-api.png

An index can be archived with a simple curl command:

$ curl -s -u admin -X POST http://127.0.0.1:9000/api/plugins/org.graylog.plugins.archive/archives/graylog_386
Enter host password for user 'admin': ***************
{
   "archive_job_config" : {
     "archive_path" : "/tmp/graylog-archive",
     "max_segment_size" : 524288000,
     "segment_filename_prefix" : "archive-segment",
     "metadata_filename" : "archive-metadata.json",
     "source_histogram_bucket_size" : 86400000,
     "restore_index_batch_size" : 1001,
     "segment_compression_type": "SNAPPY"
   },
   "system_job" : {
     "id" : "cd7ebfa0-079b-11e6-9e1b-fa163e6e9b8a",
     "description" : "Archives indices and deletes them",
     "name" : "org.graylog.plugins.archive.job.ArchiveCreateSystemJob",
     "info" : "Archiving documents in index: graylog_386",
     "node_id" : "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847",
     "started_at" : "2016-04-21T08:34:03.034Z",
     "percent_complete" : 0,
     "provides_progress" : true,
     "is_cancelable" : true
   }
 }

That command started a system job in the Graylog server to create an archive for index graylog_386. The system_job.id can be used to check the progress of the job.
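The progress can then be polled via the system jobs API. A sketch, using the job id from the example response above (the exact response format may vary between Graylog versions):

$ curl -s -u admin http://127.0.0.1:9000/api/system/jobs/cd7ebfa0-079b-11e6-9e1b-fa163e6e9b8a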

The REST API can be used to automate other archive related tasks as well, like restoring and deleting archives or updating the archive config. See the REST API browser on your Graylog server for details.

Restoring Archives

Note

The restore process adds load to your Elasticsearch cluster because all messages are basically re-indexed. Please make sure to keep this in mind and test with smaller archives to see how your cluster behaves. Also use the Restore Index Batch Size setting to control the Elasticsearch batch size on re-index.

The archive plugin offers two ways to restore archived indices.

The archive plugin restores all indices into the “Restored Archives” index set to avoid conflicts with the original indices (should those still exist).

_images/archiving-usage-restore-web-result.png

Restored indices are also marked as reopened, so they are ignored by index retention jobs and are not closed or deleted. That means you have to delete any restored indices manually once you do not need them anymore.

Web Interface

In the web interface you can restore an archive on the “System/Archives” page by selecting an archive from the list, opening the archive details, and clicking the “Restore Index” button.

_images/archiving-usage-restore-web.png

REST API

As with archive creation you can also use the REST API to restore an archived index into the Elasticsearch cluster:

$ curl -s -u admin -X POST http://127.0.0.1:9000/api/plugins/org.graylog.plugins.archive/archives/graylog_386/restore
Enter host password for user 'admin': ***************
{
   "archive_metadata": {
     "archive_id": "graylog_307",
     "index_name": "graylog_307",
     "document_count": 491906,
     "created_at": "2016-04-14T14:31:50.787Z",
     "creation_duration": 142663,
     "timestamp_min": "2016-04-14T14:00:01.008Z",
     "timestamp_max": "2016-04-14T14:29:27.639Z",
     "id_mappings": {
       "streams": {
         "56fbafe0fb121a5309cef297": "nginx requests"
       },
       "inputs": {
         "56fbafe0fb121a5309cef290": "nginx error_log",
         "56fbafe0fb121a5309cef28d": "nginx access_log"
       },
       "nodes": {
         "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847": "graylog.example.org"
       }
     },
     "histogram_bucket_size": 86400000,
     "source_histogram": {
       "2016-04-14T00:00:00.000Z": {
         "example.org": 227567
       }
     },
     "segments": [
       {
         "path": "archive-segment-0.gz",
         "size": 21653755,
         "raw_size": 2359745839,
         "compression_type": "SNAPPY"
         "checksum": "751e6e76",
         "checksum_type": "CRC32"
       }
     ],
     "index_size": 12509063,
     "index_shard_count": 4
   },
   "system_job": {
     "id": "e680dcc0-07a2-11e6-9e1b-fa163e6e9b8a",
     "description": "Restores an index from the archive",
     "name": "org.graylog.plugins.archive.job.ArchiveRestoreSystemJob",
     "info": "Restoring documents from archived index: graylog_307",
     "node_id": "c5df7bff-cafd-4546-ac0a-5ccd2ba4c847",
     "started_at": "2016-04-21T09:24:51.468Z",
     "percent_complete": 0,
     "provides_progress": true,
     "is_cancelable": true
   }
 }

The returned JSON payload contains the archive metadata and the system job description that runs the index restore process.

Restore into a separate cluster

As noted earlier, restoring archived indices slows down your indexing speed because of the added load. If you want to completely avoid adding more load to your Elasticsearch cluster, you can restore the archived indices on a different cluster.

To do that, you only have to transfer the archived indices to a different machine and put them into a configured Backend.

Each index archive is in a separate directory, so if you only want to transfer one index to a different machine, you only have to copy the corresponding directory into the backend.

Example:

$ tree /tmp/graylog-archive
  /tmp/graylog-archive
  ├── graylog_171
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_201
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_268
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_293
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_307
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  ├── graylog_386
  │   ├── archive-metadata.json
  │   └── archive-segment-0.gz
  └── graylog_81
      ├── archive-metadata.json
      └── archive-segment-0.gz
  7 directories, 14 files
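
As a sketch, a single archived index could be transferred to another machine with rsync, assuming the backend on the target machine uses the same base path:

$ rsync -av /tmp/graylog-archive/graylog_307/ other-host:/tmp/graylog-archive/graylog_307/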

Searching in Restored Indices

Once an index has been restored from an archive it will be used by search queries automatically.

Every message that gets restored into an Elasticsearch index gets a special gl2_archive_restored field with value true. This allows you to only search in restored messages by using a query like:

_exists_:gl2_archive_restored AND <your search query>

Example:

_images/archiving-usage-search.png

If you want to exclude all restored messages from your query you can use:

_missing_:gl2_archive_restored AND <your search query>

Audit Log

The Audit Log plugin keeps track of changes made by users to a Graylog system.

It records all state changes into the database and makes it possible to search, filter and export all audit log entries.

Note

The audit log plugin is a commercial feature and part of Graylog Enterprise.

Setup

The Audit Log plugin is a commercial Graylog feature that can be installed in addition to the Graylog open source server.

Installation

Please see the Graylog Enterprise setup page for details on how to install the Audit Log plugin.

Note

Make sure the Audit Log plugin is installed on every node in your Graylog cluster.

Configuration

The audit log plugin provides two ways of writing audit log entries:

  1. Database
  2. Log file via log4j2 appender

Logging to the database is always enabled and cannot be disabled.

Note

All configuration needs to be done in the Graylog server configuration file and, if the log4j2 appender is enabled, in the logging configuration. Check the default file locations page for details.

The web interface can show the current configuration.

_images/auditlog-setup-config.png

Database Configuration Options

The default MongoDB audit log has a few configuration options available.

Configuration Options
Name Description
auditlog_mongodb_keep_entries delete audit log entries older than the configured interval
auditlog_mongodb_cleanup_interval interval of the audit log entry cleanup job
auditlog_mongodb_collection the MongoDB collection to store the audit log entries in

auditlog_mongodb_keep_entries

This configures the interval after which old audit log entries in the MongoDB database will be deleted. You have to use values like 90d (90 days) to configure the interval.

Warning

Make sure to configure this to fit your needs. Deleted audit log entries are gone forever!

The default value for this is 365d.

Example:

auditlog_mongodb_keep_entries = 365d

auditlog_mongodb_cleanup_interval

This configures the interval of the background job that periodically deletes old audit log entries from the MongoDB database. You have to use values like 1h (1 hour) to configure the interval.

The default value for this is 1h.

Example:

auditlog_mongodb_cleanup_interval = 1h

auditlog_mongodb_collection

This configures the name of the MongoDB collection where the audit log plugin stores the audit log entries.

The default value for this is audit_log.

Example:

auditlog_mongodb_collection = audit_log

Log4j2 Configuration Options

The optional log4j2 audit log appender has a few configuration options available.

Note

To configure the log4j2 appender you have to edit the Graylog server configuration file and the log4j2.xml file for your setup!

Configuration Options
Name Description
auditlog_log4j_enabled whether the log4j2 appender is enabled or not
auditlog_log4j_logger_name log4j2 logger name
auditlog_log4j_marker_name log4j2 marker name

auditlog_log4j_enabled

The log4j2 audit log appender is disabled by default and can be enabled by setting this option to true.

The default value for this is false.

Example:

auditlog_log4j_enabled = true

auditlog_log4j_logger_name

This configures the log4j2 logger name of the audit log.

The default value for this is gl-org.graylog.plugins.auditlog.

Example:

auditlog_log4j_logger_name = graylog-auditlog

auditlog_log4j_marker_name

This configures the log4j2 marker name for the audit log.

The default value for this is AUDIT_LOG.

Example:

auditlog_log4j_marker_name = AUDIT_LOG

Log4j2 Appender Configuration

To write audit log entries into a file you have to enable the log4j2 appender in your Graylog configuration file and add some configuration to the log4j2.xml file that is used by your server process.

The log4j2.xml file location depends on your deployment method, so please check the default file locations page.

An existing log4j2.xml config file needs another <Logger/> statement in the <Loggers/> section and an additional appender in the <Appenders/> section of the file.

Warning

The file on your system might look different than the following example. Make sure to only add the audit log related snippets to your config and do not remove anything else!

Example log4j2.xml file with audit log enabled:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration packages="org.graylog2.log4j" shutdownHook="disable">
    <Appenders>
        <!-- Graylog server log file appender -->
        <RollingFile name="rolling-file" fileName="/var/log/graylog-server/server.log" filePattern="/var/log/graylog-server/server.log.%i.gz">
            <PatternLayout pattern="%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} %-5p [%c{1}] %m%n"/>
            <Policies>
                <SizeBasedTriggeringPolicy size="50MB"/>
            </Policies>
            <DefaultRolloverStrategy max="10" fileIndex="min"/>
        </RollingFile>

        <!-- ##################################################### -->
        <!-- Rotate audit logs daily -->
        <RollingFile name="AUDITLOG" fileName="/var/log/graylog-server/audit.log" filePattern="/var/log/graylog-server/audit-%d{yyyy-MM-dd}.log.gz">
            <PatternLayout>
                <Pattern>%d - %m - %X%n</Pattern>
            </PatternLayout>
            <Policies>
                <TimeBasedTriggeringPolicy />
            </Policies>
        </RollingFile>
        <!-- ##################################################### -->
    </Appenders>
    <Loggers>
        <Logger name="org.graylog2" level="info"/>

        <!-- ##################################################### -->
        <!-- Graylog Audit Log.  The logger name has to match the "auditlog_log4j_logger_name" setting in the Graylog configuration file -->
        <Logger name="graylog-auditlog" level="info" additivity="false">
            <AppenderRef ref="AUDITLOG"/>
        </Logger>
        <!-- ##################################################### -->

        <Root level="warn">
            <AppenderRef ref="rolling-file"/>
        </Root>
    </Loggers>
</Configuration>

The config snippets between the <!-- ######### --> tags have been added to the existing log4j2.xml file.

Make sure that the name in the <Logger /> tag matches the configured auditlog_log4j_logger_name in your Graylog server configuration. Otherwise you will not see any log entries in the log file.

Caveats

You have to make sure that the log4j2 related settings in the Graylog server config file and the log4j2.xml file are the same on every node in your cluster!

Since every Graylog server writes its own audit log entries when the plugin is installed, the log files configured in the log4j2.xml file are written on every node. But only the entries from the local node will show up in that file.

If you have more than one node, you have to search in all configured files on all nodes to get a complete view of the audit trail.

Usage

Once you have installed the Audit Log plugin, Graylog will automatically write audit log entries into the database.

View Audit Log Entries

The plugin adds a new page to the web interface which can be reached via “System/Audit Log”. There you can view and export the audit log entries stored in the database.

It also provides a simple search form to search and filter for audit events you are interested in.

_images/auditlog-view-entries-1.png

Expand Event Details

Every row in the audit event entry table is clickable. Once clicked it will reveal the details of the audit event.

All audit events have static fields like actor, object and others. In addition to that, every event has some event specific fields.

The fields on the left side in the details are the static fields every event has and the fields on the right side are the event specific fields.

_images/auditlog-view-entries-2.png

Search & Filter

To make it easier to get to the audit log entries you need, the audit log UI provides a simple query language to search and filter the audit log entries.

You can either enter one or more words into the search field or choose to look for some specific fields in the audit log entries.

Available Fields
Name Description
actor the user that triggered the audit event
namespace the namespace of the audit event; might be different in plugins
object the object of the audit event; what has been changed
action name of the action that has been executed on the object
success_status if the action failed or succeeded
message the actual audit event message

Search for text in the message

If you just want to find some text in the audit event message, you can enter the word you are looking for into the search bar.

_images/auditlog-search-entries-1.png

Search for specific fields

You can also filter the entries for specific fields like the actor.

If you want to filter for all events triggered by the user jane you can enter actor:jane into the search bar.

Maybe you want to filter for events for more than one actor. That can be done by using either actor:jane,john or actor:jane actor:john.

Or maybe you want to find all audit events that have not been triggered by a specific user. Add a - in front of the field name to negate the condition. To show all events except those created by user jane, you can add -actor:jane to the search field.

You can mix and match several field queries to find the entries you need. Here are some more examples.

  • actor:jane,john -namespace:server get all events by users jane and john which are not in the server namespace
  • index action:create get all events which have the word index in the event message and where the action is create
  • message:index action:create same as above, just with an explicit field selector for the message field

_images/auditlog-view-entries-3.png

Export Entries

If the simple entry viewer is not enough, you can also export the result of your query as JSON or CSV to further process it.

The “Export Results” button next to the search bar can be used to do that.

Note

The export from the UI is currently limited to the newest 10,000 entries. Use the REST API if you need a bigger export.

Export via REST API

If you want to back up the audit log entries or make them available to another system, you can use the REST API to export them.

Example:

# Export 20,000 audit log entries in JSON format
curl -u admin:<admin-password> "http://127.0.0.1:9000/api/plugins/org.graylog.plugins.auditlog/entries/export/json?limit=20000"

# Export 5,000 audit log entries with actor "jane" in CSV format
# (quoting the URL keeps the shell from interpreting the & and ? characters)
curl -u admin:<admin-password> "http://127.0.0.1:9000/api/plugins/org.graylog.plugins.auditlog/entries/export/csv?limit=5000&query=actor:jane"

Note

Make sure the query parameter is properly escaped if it contains whitespace.
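
One way to let curl handle the encoding is to pass the parameters via -G and --data-urlencode. A sketch, reusing a hypothetical actor query containing whitespace:

curl -u admin:<admin-password> -G "http://127.0.0.1:9000/api/plugins/org.graylog.plugins.auditlog/entries/export/csv" \
  --data-urlencode "limit=5000" --data-urlencode "query=actor:jane action:create"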

Changelog

Graylog Enterprise 2.4.7

Released: 2019-03-01

Plugin: License

  • Add missing authorization checks to license resources.

Graylog Enterprise 2.4.6

Released: 2018-07-16

No changes since 2.4.5.

Graylog Enterprise 2.4.5

Released: 2018-05-28

No changes since 2.4.4.

Graylog Enterprise 2.4.4

Released: 2018-05-02

No changes since 2.4.3.

Graylog Enterprise 2.4.3

Released: 2018-01-24

No changes since 2.4.2.

Graylog Enterprise 2.4.2

Released: 2018-01-24

No changes since 2.4.1.

Graylog Enterprise 2.4.1

Released: 2018-01-19

No changes since 2.4.0.

Graylog Enterprise 2.4.0

Released: 2017-12-22

No changes since 2.4.0-rc.2.

Graylog Enterprise 2.4.0-rc.2

Released: 2017-12-20

No changes since 2.4.0-rc.1.

Graylog Enterprise 2.4.0-rc.1

Released: 2017-12-19

No changes since 2.4.0-beta.4.

Graylog Enterprise 2.4.0-beta.4

Released: 2017-12-15

Plugin: License

  • The license page now shows more details about the installed licenses.

Graylog Enterprise 2.4.0-beta.3

Released: 2017-12-04

No changes since 2.4.0-beta.2.

Graylog Enterprise 2.4.0-beta.2

Released: 2017-11-07

No changes since 2.4.0-beta.1.

Graylog Enterprise 2.4.0-beta.1

Released: 2017-10-20

Plugin: Archive

  • Add support for Zstandard compression codec.

Graylog Enterprise 2.3.2

Released: 2017-10-19

Plugin: Archive

  • Fix archive creation for indices with lots of shards.

Graylog Enterprise 2.3.1

Released: 2017-08-25

Plugin: Archive

  • Lots of performance improvements (up to 7 times faster)
  • Do not delete an index if not all of its documents have been archived

Graylog Enterprise 2.3.0

Released: 2017-07-26

Plugin: Archive

  • Record checksums for archive segment files
  • Add two archive permission roles “admin” and “viewer”
  • Allow export of filenames from catalog search

Graylog Enterprise 2.2.3

Released: 2017-04-04

Plugin: Archive

  • Metadata is now stored in MongoDB
  • Preparation for storage backend support

Graylog Enterprise 2.2.2

Released: 2017-03-02

Plugin: Audit Log

  • Extend integration with the Archive plugin

Graylog Enterprise 2.2.1

Released: 2017-02-20

Plugin: Archive

  • Improve stability and smaller UI fixes

Graylog Enterprise 2.2.0

Released: 2017-02-09

Plugin: Archive

  • Improve index set support

Graylog Enterprise 1.2.1

Released: 2017-01-26

Plugin: Archive

  • Prepare the plugin to be compatible with the new default stream.

Plugin: Audit Log

  • Add support for index sets and fix potential NPEs.
  • Smaller UI improvements.

Graylog Enterprise 1.2.0

Released: 2016-09-14

https://www.graylog.org/blog/70-announcing-graylog-enterprise-v1-2

Plugin: Archive

  • Add support for selecting which streams should be included in your archives.

Plugin: Audit Log

New plugin to keep track of changes made by users to a Graylog system by automatically saving them in MongoDB.

Graylog Enterprise 1.1

Released: 2016-09-01

  • Added support for Graylog 2.1.0.

Graylog Enterprise 1.0.1

Released: 2016-06-08

Bugfix release for the archive plugin.

Plugin: Archive

Fixed a problem when writing multiple archive segments

There was a problem when the max segment size was exceeded and multiple archive segments were written. The problem has been fixed, and wrongly written segments can be read again.

Graylog Enterprise 1.0.0

Released: 2016-05-27

Initial Release including the Archive plugin.

Plugin: Archive

New features since the last beta plugin:

  • Support for multiple compression strategies. (Snappy, LZ4, Gzip, None)