OVH Guide

Using alerts with Logs Data Platform

With the alerting feature, you don't even need to watch your logs: our platform does it for you.

Alerting is one of the most powerful features of Logs Data Platform. It allows you to stop worrying about your logs and be prepared for many situations: when a piece of software has not sent any logs for an unexpectedly long time, when the number of completed tasks is too low, when the traffic on your website is too high, or when a specific keyword appears in any of your information feeds. All of these use cases can trigger an alert that sends you a message immediately.

This guide describes how to configure and use alerts on a particular field, with an example based on Apache logs. In order to understand this tutorial you should read the following tutorials:

Why configure an alert on Logs Data Platform?

Logs Data Platform provides many ways to watch your logs in real time:

  • The follow stream functionality in the OVH Manager.
  • The Live Tail functionality in Graylog.
  • The Graylog dashboards that refresh themselves in real time.
  • Any software that can query the Graylog or Elasticsearch APIs (Kibana or Grafana, for example).

The goal of the alerting feature is to free you from watching your logs: Logs Data Platform can automatically inform you when something happens. Because alert conditions are diverse, there are 3 types of alerts:

  • Counter alert: as its name suggests, it warns you when the number of logs is above or below a certain threshold.
  • Numeric value alert: it is triggered when a numeric field has an abnormal value. The value can be the mean, the sum, the minimum, the maximum, or even the standard deviation or the median.
  • Text content alert: it is triggered when a field has some exact value.

For all 3 types of alerts, you can configure a grace period: a period of time during which the alert won't be triggered again, so that you don't get spammed by the same alert over and over. You can also configure how many of the most recent messages to include in your alert, which helps you quickly identify the root cause.
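
The grace period described above can be illustrated with a few lines of Python. This is only a sketch of the idea, not how Logs Data Platform or Graylog implements it; the class and method names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class GracePeriod:
    """Suppress repeated notifications for a fixed time after an alert fires."""

    def __init__(self, minutes: int) -> None:
        self.period = timedelta(minutes=minutes)
        self.last_fired: Optional[datetime] = None

    def should_notify(self, now: datetime) -> bool:
        # Notify if nothing has fired yet or the grace period has elapsed.
        if self.last_fired is None or now - self.last_fired >= self.period:
            self.last_fired = now
            return True
        return False


grace = GracePeriod(minutes=5)
start = datetime(2024, 1, 1, 12, 0)
print(grace.should_notify(start))                         # True: first alert is sent
print(grace.should_notify(start + timedelta(minutes=2)))  # False: still in the grace period
print(grace.should_notify(start + timedelta(minutes=6)))  # True: grace period is over
```

With a 5-minute grace period, a condition that stays true for an hour would produce roughly one notification every 5 minutes instead of one per evaluation.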

Use case: Alerts for a website powered by an Apache Server

For this tutorial, we will configure the 3 types of alerts for a website. These alerts can help you react immediately in case of failure, detect unexpected problems, or verify that all your websites are working correctly. Before going into the alerting feature itself, we need to configure our Apache logging format to include all the information we need. We will also use Filebeat to send our logs to our dedicated Logstash collector on Logs Data Platform.

Apache Server Configuration

We will use the LTSV format to send logs; this format is simple enough to be parsed efficiently by the collector. Here is a sample configuration file:

<VirtualHost *:80>

    ServerAdmin webmaster@localhost
    DocumentRoot /var/www/html

    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log "domain:%V\thost:%h\tserver:%A\tident:%l\tuser:%u\ttime:%{%d/%b/%Y:%H:%M:%S %z}t\tmethod:%m\tpath:%U%q\tprotocol:%H\tstatus_int:%>s\tsize_int:%b\treferer:%{Referer}i\tagent:%{User-Agent}i\tresponse_time_int:%D\tcookie:%{cookie}i\tset_cookie:%{Set-Cookie}o\tmessage:%h %l %u %t \"%r\" %>s %b\n"

</VirtualHost>

The configuration is inspired by the one you can find in this tutorial.
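
To see what the LTSV format produced by this CustomLog directive looks like once parsed, here is a minimal Python sketch. It is independent of Logstash; the parse_ltsv helper and the sample line (domain, IP address, values) are made up for illustration, and only a subset of the fields is shown.

```python
def parse_ltsv(line: str) -> dict:
    """Split an LTSV line into a dict of label -> value.

    Fields are separated by tabs; only the first ':' in a field
    separates the label from the value, so values may contain colons.
    """
    record = {}
    for field in line.rstrip("\n").split("\t"):
        if not field:
            continue
        label, _, value = field.partition(":")
        record[label] = value
    return record


# Hypothetical access log line using some of the labels defined above.
sample = (
    "domain:example.com\thost:203.0.113.7\t"
    "time:10/Oct/2016:13:55:36 +0200\tmethod:GET\tpath:/index.html\t"
    "status_int:200\tresponse_time_int:1234"
)

record = parse_ltsv(sample)
print(record["time"])        # 10/Oct/2016:13:55:36 +0200
print(record["status_int"])  # 200
```

The _int suffix in labels such as status_int and response_time_int is what the later sections rely on to treat these fields as numbers.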

Logstash and Filebeat configuration

The Logstash collector configuration is kept simple for this tutorial. Here is the input section:

beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/private/server.crt"
    ssl_key => "/etc/ssl/private/server.key"
}

As in the Filebeat tutorial, we will use a simple beats input with SSL.

For the filter part we use this configuration:

kv {
    value_split => ":"
    field_split => "\t"
}
date {
    match => [ "time", "dd/MMM/YYYY:HH:mm:ss Z"]
}

This simple Logstash filter uses the key-value (kv) filter plugin to parse the LTSV format and the date filter plugin to parse the date, so that our log messages are delivered with the right timestamp (the time of the log message rather than the time of delivery).
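
The date pattern used by the date filter mirrors the strftime-style format given in the Apache CustomLog directive. As a rough Python equivalent (a sketch for checking sample values, not part of the Logstash pipeline; it assumes an English locale for the month abbreviation and uses a made-up timestamp):

```python
from datetime import datetime, timezone

# Same layout as %{%d/%b/%Y:%H:%M:%S %z}t in the Apache log format.
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_time(value: str) -> datetime:
    """Parse the 'time' field into a timezone-aware datetime."""
    return datetime.strptime(value, APACHE_TIME_FORMAT)


ts = parse_apache_time("10/Oct/2016:13:55:36 +0200")
print(ts.isoformat())                           # 2016-10-10T13:55:36+02:00
print(ts.astimezone(timezone.utc).isoformat())  # 2016-10-10T11:55:36+00:00
```

Keeping the UTC offset in the log line is what lets the collector place each message at the time it was written, regardless of the server's time zone.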

The Filebeat configuration will be similar to the one used in the Filebeat tutorial:

 ############################# Filebeat ######################################
 filebeat:
   # List of prospectors to fetch data.
   prospectors:
   # Each - is a prospector. Below are the prospector specific configurations
   # Paths that should be crawled and fetched. Glob based paths.
   # To fetch all ".log" files from a specific level of subdirectories
   # /var/log/*/*.log can be used.
   # For each file found under this path, a harvester is started.
   # Make sure no file is defined twice as this can lead to unexpected behaviour.
     -
         paths:
         - /var/log/apache2/access.log
         input_type: log
         document_type: apache
         fields:
             apache_version: 2.2.9
         fields_under_root: true

     -
         paths:
         - /var/log/apache2/error.log
         input_type: log
         document_type: apache-error
         fields:
             apache_version: 2.2.9
         fields_under_root: true

   # Name of the registry file. By default it is put in the current working
   # directory. If the working directory changes between two runs of
   # filebeat, indexing starts from the beginning again.
   registry_file: /var/lib/filebeat/registry
 ############################# Output ##########################################
 # Configure what outputs to use when sending the data collected by the beat.
 # Multiple outputs may be used.
 output:
   ### Logstash as output
   logstash:
     # The Logstash hosts
     hosts: ["<your_cluster>-XXXXXXXXXXXXXXXXXX.<your_cluster>.logs.ovh.com:5044"]
     worker: 1
     tls:
       # List of root certificates for HTTPS server verifications
       certificate_authorities:
       - /usr/local/etc/filebeat/ldp-ca.crt
 ############################# Logging #########################################
 # There are three options for the log output: syslog, file, stderr.
 # On Windows systems, the log files are sent to the file output by default,
 # on all other systems to syslog by default.
 logging:
   # Send all logging output to syslog. On Windows default is false, otherwise
   # default is true.
   to_syslog: false
   # Write all logging output to files. Beats automatically rotate files if rotateeverybytes
   # limit is reached.
   to_files: true
   # To enable logging to files, to_files option has to be set to true
   files:
     # The directory where the log files will be written to.
     path: /var/log/
     # The name of the files where the logs are written to.
     name: filebeat.log
     # Configure log file size limit. If limit is reached, log file will be
     # automatically rotated
     rotateeverybytes: 10485760 # = 10MB
     # Number of rotated log files to keep. Oldest files will be deleted first.
     keepfiles: 7
   # Sets log level. The default log level is error.
   # Available log levels are: critical, error, warning, info, debug
   level: info

Configuring a Counter alert

For this alert we will tackle the following question: how can I get alerted when my website stops working?

One sign of a non-working website on a dedicated server is a drop in the number of access logs from this website. Except for special cases such as maintenance, a website should have a steady number of visits during the day. If you want to be alerted when no traffic is detected, you can, for example, configure a counter alert on the number of logs.

To do this, go to the stream page and use the menu on the right to navigate to the Alerting menu.

Navigate to alert

On this interface, stay on the Counter tab. Configuring an alert is as easy as filling in the terms that describe its behavior. For example, you can tell Logs Data Platform to:

Trigger an alert named No Traffic when there are less than 3 messages in the last 5 minutes and then wait at least 5 minutes before triggering a new alert (grace period).

Alert Creation

The sentence above contains the terms that you have to use to create your alerts. Click on Add this condition and your alert will be up and running immediately.

Alert Created

You can remove the alert by clicking on the Remove button.

As soon as the alert is fired, you will receive an email detailing the condition that triggered it.

No traffic mail
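
The No Traffic condition above ("less than 3 messages in the last 5 minutes") boils down to counting messages in a sliding time window. A minimal Python sketch of that check, with a hypothetical function name and made-up timestamps (the platform evaluates the condition for you; this only illustrates the logic):

```python
from datetime import datetime, timedelta
from typing import Iterable


def fewer_than(timestamps: Iterable[datetime], now: datetime,
               window_minutes: int, threshold: int) -> bool:
    """Return True when fewer than `threshold` messages fall in the window."""
    window_start = now - timedelta(minutes=window_minutes)
    count = sum(1 for ts in timestamps if window_start <= ts <= now)
    return count < threshold


now = datetime(2024, 1, 1, 12, 0)
# Three access log timestamps: 1, 4 and 9 minutes old.
recent = [now - timedelta(minutes=m) for m in (1, 4, 9)]

# Only 2 messages are inside the last 5 minutes, so the condition holds.
print(fewer_than(recent, now, window_minutes=5, threshold=3))  # True
```

Choosing the threshold and window is a trade-off: a low-traffic site may legitimately have quiet 5-minute periods, so a longer window reduces false alarms.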

Configuring a Numeric Value alert

A slow website is a poor experience for your users and can make you lose customers. There are multiple possible causes for a slowdown: too many connections, a misbehaving web application, or a network problem. Fortunately, your Apache logs give you the response time of your server, which you can use to trigger an alert when your website is too slow.

To configure an alert based on the response time, go to the Numeric value alert tab on the Alerting page. As with the Counter alert, you have to fill in the different fields to create your alert:

Slow website alert

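Here, we have configured an alert to be sent when the minimum value of response_time_int is higher than 1500 in the last 5 minutes. Because the condition uses the minimum, the alert is triggered only when every web request in that window took longer than the threshold; choose the maximum instead if you want to be alerted as soon as a single request is too slow. Note that the %D directive used for response_time_int reports the time in microseconds, so with the log format above 1500 corresponds to 1.5 milliseconds, and a threshold of 1.5 seconds would be 1500000. The triggered alert will send you an email similar to the previous one, with a link to the last messages included so that you can see directly which pages are too slow.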

Slow website alert
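
The choice of aggregate (minimum, maximum, mean, and so on) changes when a numeric value alert fires. The following Python sketch compares them on a made-up list of response times; the numeric_alert helper is hypothetical, and treating an empty window as "no alert" is an assumption of this sketch rather than documented platform behavior.

```python
from statistics import mean


def numeric_alert(values, aggregate, threshold):
    """Return True when the aggregated value is higher than the threshold."""
    if not values:
        # Assumption: no data in the window means no numeric alert.
        return False
    return aggregate(values) > threshold


# Hypothetical response_time_int values seen in the last 5 minutes.
response_times = [900, 1200, 4800, 1600]

print(numeric_alert(response_times, min, 1500))   # False: the fastest request is below 1500
print(numeric_alert(response_times, max, 1500))   # True: at least one request is above 1500
print(numeric_alert(response_times, mean, 1500))  # True: the mean is 2125
```

The minimum is a conservative signal (everything is slow), the maximum is the most sensitive (anything is slow), and the mean sits in between but can be skewed by a single outlier.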

Configuring a Text Content alert

For this alert, we want to be notified whenever there is an HTTP 500 error on our website. The Text Content alert is the one to use when you want to detect a specific value in a field. This alert type is located under the Text Content tab in the alerting panel.

As with the previous alerts, you describe the alert to configure it. Here the sentence states that the alert must be triggered when the field status_int is set to 500.

Slow website alert

You will then receive an email with the messages included, from which you can navigate directly to your Graylog stream for further investigation:

Slow website alert
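
Since a text content alert is triggered on an exact value, the check can be pictured as a simple equality test on the field. A minimal Python sketch with made-up messages and a hypothetical helper name (it illustrates exact matching only, not the platform's implementation):

```python
def text_content_matches(messages, field, expected):
    """Return the messages whose field is exactly equal to the expected value."""
    return [
        m for m in messages
        if field in m and str(m[field]) == expected
    ]


# Hypothetical parsed access log messages.
messages = [
    {"path": "/", "status_int": 200},
    {"path": "/checkout", "status_int": 500},
    {"path": "/search", "status_int": 5000},  # not a match: the value must be exactly 500
]

hits = text_content_matches(messages, "status_int", "500")
print(hits)  # [{'path': '/checkout', 'status_int': 500}]
```

Because the match is exact, other server errors such as 502 or 503 would each need their own condition.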


Getting Help