Configure Telegraf for Metrics Data Platform
Last updated 23 August, 2019
In this guide, you will learn how to configure Telegraf for Metrics Data Platform.
Telegraf is an agent written in Go for collecting, processing, aggregating, and writing metrics.
Its design goals are a minimal memory footprint and a plugin system that lets community developers easily add support for collecting metrics from local or remote services.
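Telegraf is plugin-driven and has the concept of 4 distinct plugin types:
- Input plugins collect metrics from the system, services, or third-party APIs.
- Processor plugins transform, decorate, and/or filter metrics.
- Aggregator plugins create aggregate metrics (for example mean, min, max, quantiles).
- Output plugins write metrics to various destinations.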
New plugins are designed to be easy to contribute: the Telegraf maintainers welcome pull requests and manage the set of plugins that Telegraf supports.
To make sure you use the latest release, refer to the Telegraf GitHub project.
The Metrics platform lets you push data with Telegraf either directly through its InfluxDB endpoint, or through Warp 10 using a Warp 10 output plugin. To set up Telegraf correctly for Warp 10, refer to the plugin's GitHub repository.
Once you get started with Telegraf, you have to specify which data to record. Telegraf can generate a basic default configuration file, for example for CPU data:
telegraf --input-filter cpu --output-filter influxdb config
If Telegraf runs with this configuration and its output is written to a file, it produces the following data set (in InfluxDB line protocol):
cpu,cpu=cpu0,host=test usage_system=0.3688181056160914,usage_irq=0,usage_guest=0,usage_softirq=0.10058675607711719,usage_steal=0,usage_guest_nice=0,usage_user=1.0896898575021035,usage_idle=98.37384744342029,usage_nice=0,usage_iowait=0.06705783738474282 1534923420000000000
cpu,cpu=cpu1,host=test usage_user=0.8682584738687309,usage_system=0.4174319585907506,usage_guest=0,usage_guest_nice=0,usage_idle=98.61412589747883,usage_nice=0,usage_iowait=0.03339455668725934,usage_irq=0,usage_softirq=0.06678911337452015,usage_steal=0 1534923420000000000
cpu,cpu=cpu2,host=test usage_user=1.0514018691588776,usage_idle=98.41455273698294,usage_nice=0,usage_iowait=0.03337783711615437,usage_softirq=0.05006675567423304,usage_system=0.45060080106808703,usage_irq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0 1534923420000000000
cpu,cpu=cpu3,host=test usage_user=1.3538358682935037,usage_idle=98.29516964733415,usage_irq=0,usage_guest=0,usage_guest_nice=0,usage_system=0.3008524151763328,usage_nice=0,usage_iowait=0.033428046130702986,usage_softirq=0.016714023065351493,usage_steal=0 1534923420000000000
cpu,cpu=cpu-total,host=test usage_softirq=0.05433646812957265,usage_guest_nice=0,usage_user=1.0909090909090966,usage_idle=98.43260188087774,usage_nice=0,usage_steal=0,usage_guest=0,usage_system=0.38453500522465517,usage_iowait=0.037617554858935594,usage_irq=0 1534923420000000000
Let's only take the first line:
cpu,cpu=cpu0,host=test usage_system=0.3688181056160914,usage_irq=0,usage_guest=0,usage_softirq=0.10058675607711719,usage_steal=0,usage_guest_nice=0,usage_user=1.0896898575021035,usage_idle=98.37384744342029,usage_nice=0,usage_iowait=0.06705783738474282 1534923420000000000
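This line creates several metrics in a time-series database: one per field (usage_system, usage_irq, usage_guest, usage_softirq, usage_steal, usage_guest_nice, usage_user, usage_idle, usage_nice and usage_iowait), each carrying the tags cpu=cpu0 and host=test.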
In short, the first line alone creates 10 new metrics, and the complete example creates 50 in total (10 fields for each of the 5 cpu tag values). Keep this figure in mind when deploying a Telegraf agent and pushing its data to the Metrics platform.
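To see where these figures come from, the sketch below parses line protocol and counts distinct series, assuming each (measurement, tag set, field) combination becomes one series on the platform. The parser is deliberately simplified for illustration (hypothetical function names; no escaping, no quoted string fields), and the generated lines reuse the example's field names with zeroed values.

```python
# Simplified line-protocol parser for illustration only: no escaping,
# no quoted string fields, no spaces inside values.
def parse_line(line):
    """Split a line into (measurement, tags, fields, timestamp)."""
    series_key, field_set, timestamp = line.split(" ")
    measurement, *tag_pairs = series_key.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in field_set.split(","))
    return measurement, tags, fields, int(timestamp)


def count_series(lines):
    """Count distinct (measurement, tag set, field name) combinations."""
    series = set()
    for line in lines:
        measurement, tags, fields, _ = parse_line(line)
        tag_key = tuple(sorted(tags.items()))
        for field in fields:
            series.add((measurement, tag_key, field))
    return len(series)


FIELDS = ["usage_user", "usage_system", "usage_idle", "usage_nice", "usage_iowait",
          "usage_irq", "usage_softirq", "usage_steal", "usage_guest", "usage_guest_nice"]


def cpu_line(cpu):
    """Build a cpu line with the 10 fields of the example (values zeroed for brevity)."""
    field_set = ",".join(f"{name}=0" for name in FIELDS)
    return f"cpu,cpu={cpu},host=test {field_set} 1534923420000000000"


first_line = cpu_line("cpu0")
all_lines = [cpu_line(c) for c in ["cpu0", "cpu1", "cpu2", "cpu3", "cpu-total"]]
print(count_series([first_line]))  # 10
print(count_series(all_lines))     # 50
```

Every extra tag value (here, every extra CPU) multiplies the field count, which is why the total grows so quickly.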
To start pushing Telegraf data to the Metrics platform, add an InfluxDB output plugin as described below:
# OUTPUTS
[[outputs.influxdb]]
  urls = ["https://influxdb.REGION.metrics.ovh.net"]
  ## Timeout for HTTP messages.
  timeout = "15s" # Set at least 15s to avoid possible timeouts with our platform
  ## HTTP Basic Auth
  username = "t" # A random user name for basic auth (not checked)
  password = "WRITE_TOKEN"
Replace WRITE_TOKEN and REGION with your own information from the Metrics manager. Telegraf can now push to the Metrics platform!
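Under the hood, the InfluxDB output sends line protocol over HTTP with Basic authentication, using the user name and token from the configuration. The sketch below builds (but does not send) such a request with Python's standard library. The /write path is an assumption based on the InfluxDB 1.x write API, and both the helper name and the "example-region" value are hypothetical placeholders.

```python
import base64
import urllib.request


def build_write_request(region, write_token, lines):
    """Build (without sending) a POST request carrying line-protocol data.

    Assumption: the endpoint accepts the InfluxDB 1.x-style "/write" path.
    The user name "t" is arbitrary, as in the Telegraf configuration.
    """
    url = f"https://influxdb.{region}.metrics.ovh.net/write"
    credentials = base64.b64encode(f"t:{write_token}".encode()).decode()
    body = "\n".join(lines).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


# "example-region" and "MY_WRITE_TOKEN" are hypothetical placeholders.
req = build_write_request(
    "example-region",
    "MY_WRITE_TOKEN",
    ["cpu,host=test usage_idle=98.4 1534923420000000000"],
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=15)
```

The 15-second timeout in the comment mirrors the timeout recommended in the Telegraf configuration.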
To push through the Warp 10 output plugin instead, use a configuration like the following:

# OUTPUTS
[[outputs.warp10]]
  warpUrl = "https://warp10.REGION.metrics.ovh.net/api/v0/update"
  token = "WRITE_TOKEN"
  prefix = "telegraf." # Prefix added to all metric names; can be left empty
  debug = false
Replace WRITE_TOKEN and REGION with your own information from the Metrics manager.
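For comparison, Warp 10 ingests data in its GTS input format (TS// class{labels} value), posted to /api/v0/update with the write token in the X-Warp10-Token header. The sketch below converts one Telegraf metric into such lines. The mapping is an assumption, not taken from the plugin's source: class name = prefix + measurement + "." + field, tags become labels, and nanosecond timestamps are converted to microseconds. Only numeric fields are handled, and the function name is hypothetical.

```python
from urllib.parse import quote


def to_gts_lines(measurement, tags, fields, timestamp_ns, prefix="telegraf."):
    """Convert one metric into Warp 10 GTS input lines, one per numeric field.

    Assumptions: class = prefix + measurement + "." + field, labels = tags,
    timestamps in microseconds (Warp 10's default time unit).
    """
    ts_us = timestamp_ns // 1000
    labels = ",".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(tags.items())
    )
    return [
        f"{ts_us}// {quote(prefix + measurement + '.' + field, safe='.')}{{{labels}}} {value}"
        for field, value in fields.items()
    ]


lines = to_gts_lines(
    "cpu",
    {"cpu": "cpu0", "host": "test"},
    {"usage_idle": 98.37, "usage_irq": 0},
    1534923420000000000,
)
print("\n".join(lines))
# 1534923420000000// telegraf.cpu.usage_idle{cpu=cpu0,host=test} 98.37
# 1534923420000000// telegraf.cpu.usage_irq{cpu=cpu0,host=test} 0
```

Check the plugin repository for its exact naming and time-unit behaviour before relying on these class names in queries.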
To get the best experience with Telegraf on the Metrics platform, consider the following configuration changes.
First, you can add global tags to every series you record using the [global_tags] section:
[global_tags]
  dc = "new-12" # will tag all metrics with dc=new-12
  rack = "rack-0"
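The effect of [global_tags] can be pictured as merging a fixed set of tags into every metric. In the sketch below (hypothetical helper name), a metric's own value wins when a key collides; that precedence is our understanding of Telegraf's behaviour, not something this guide states. Because the global values are constant on a host, they add labels without multiplying the number of series.

```python
GLOBAL_TAGS = {"dc": "new-12", "rack": "rack-0"}


def apply_global_tags(tags, global_tags=GLOBAL_TAGS):
    """Return a new tag dict with global tags added.

    Assumption: on a key collision, the metric's own tag value is kept.
    """
    merged = dict(global_tags)
    merged.update(tags)
    return merged


print(apply_global_tags({"cpu": "cpu0", "host": "test"}))
# {'dc': 'new-12', 'rack': 'rack-0', 'cpu': 'cpu0', 'host': 'test'}
```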
To control the number of points recorded per metric, use the [agent] section:
[agent]
  ## Default data collection interval for all inputs
  interval = "30s" # Collect one point every 30s
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true
  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  metric_batch_size = 1000
  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write.
  metric_buffer_limit = 10000
  ## Default flushing interval for all outputs.
  flush_interval = "30s"
We recommend a 30s interval, which best matches the Metrics platform storage offers. You can adjust this parameter to your needs. This section also controls the number of points sent per batch (metric_batch_size).
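The interval directly drives the volume of stored points. As a back-of-the-envelope sketch (plain arithmetic, not a pricing formula of the Metrics platform), each series receives 86,400 / interval points per day. The batch estimate is a rough approximation of how many write requests one flush needs; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(series_count, interval_s):
    """Points stored per day if every series gets one point per interval."""
    return series_count * (SECONDS_PER_DAY // interval_s)


def batches_per_flush(series_count, interval_s, flush_interval_s, batch_size=1000):
    """Rough estimate: points gathered per flush, divided by batch size (rounded up)."""
    points = series_count * (flush_interval_s // interval_s)
    return -(-points // batch_size)  # ceiling division


print(points_per_day(50, 30))         # 144000
print(points_per_day(50, 10))         # 432000
print(batches_per_flush(50, 30, 30))  # 1
```

Dividing the interval by three triples the stored points, so the 30s recommendation is also a cost and storage decision.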
Telegraf is used to record many server indicators, such as mem, system, net, cpu, and diskio, through its input plugins.
However, some input indicators might require Telegraf to run as the root user (on Linux). For example, this can be the case for some network statistics on a grsec kernel: you can check it with the cat /proc/net/dev command, where a user without the required privileges always sees 0 values.
All input parameters can be configured in the main Telegraf configuration file. To control the cardinality of the data sent to the Metrics platform, you can update the CPU input of the previous example as below, setting percpu to false:
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = false
This records only one set of CPU metrics. In the previous example, you get only 10 metrics instead of 50. That may not look like much, but on hosts with 8 or 16 CPUs you still get only 10 series (instead of 90 or 170). When deploying Telegraf on many hosts, this can greatly reduce the number of metrics to process and keep in long-term storage. Above all, it makes the number of series created per host easier to predict.
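The series counts above follow from a simple formula: with percpu enabled, each core produces its own set of fields, plus one more set when totalcpu is enabled. A minimal sketch, assuming the 10 usage_* fields of the example (hypothetical function name):

```python
def cpu_series(cores, fields=10, percpu=True, totalcpu=True):
    """Number of series produced by the cpu input on one host."""
    tag_values = (cores if percpu else 0) + (1 if totalcpu else 0)
    return fields * tag_values


for cores in (4, 8, 16):
    print(cores, cpu_series(cores), cpu_series(cores, percpu=False))
# 4 50 10
# 8 90 10
# 16 170 10
```

With percpu = false the count no longer depends on the hardware, which is what makes it predictable across a heterogeneous fleet.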
The configuration parameters of every input are detailed in the Telegraf GitHub repository. The following parameters are especially useful to reduce the amount of recorded data.
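To record only specific metrics, Telegraf filters can be applied to any input. Field filters act on the Metrics platform class names, while tag filters act on labels. For example, fieldpass keeps only the fields matching one of the given glob patterns, while fielddrop discards the matching fields: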
[[inputs.something]]
  fieldpass = ["valid_prefix*"]
[[inputs.something]]
  fielddrop = ["*invalid_suffix"]
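To reason about which fields survive a filter, the sketch below mimics fieldpass and fielddrop with Python's fnmatch globbing. Telegraf uses its own glob implementation, so treat this as an approximation for simple patterns such as prefix* and *suffix; the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def filter_fields(fields, fieldpass=None, fielddrop=None):
    """Keep fields matching any fieldpass pattern, then drop any matching a fielddrop pattern."""
    kept = {}
    for name, value in fields.items():
        if fieldpass is not None and not any(fnmatchcase(name, p) for p in fieldpass):
            continue
        if fielddrop is not None and any(fnmatchcase(name, p) for p in fielddrop):
            continue
        kept[name] = value
    return kept


fields = {"usage_user": 1.09, "usage_idle": 98.37, "usage_guest": 0, "usage_guest_nice": 0}
print(filter_fields(fields, fieldpass=["usage_u*", "usage_i*"]))
# {'usage_user': 1.09, 'usage_idle': 98.37}
print(filter_fields(fields, fielddrop=["*_nice"]))
# {'usage_user': 1.09, 'usage_idle': 98.37, 'usage_guest': 0}
```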
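When using tagpass or tagdrop, you must declare all the other settings of the input first, because in TOML every key written after the [inputs.something.tagpass] header belongs to that sub-table.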
[[inputs.something]]
  some_setting = 'test'
  other_setting = '42'
  [inputs.something.tagpass]
    # tagpass conditions are OR.
    # If (label0 is 42 or test) OR (label1 is ...-42 or test-...)
    # then the metric passes
    label0 = [ "42", "test" ]
    # Patterns can also be used on the tag values
    label1 = [ "*-42", "test-*" ]
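The OR semantics described in the comments can be expressed in a few lines. In this sketch (hypothetical function name, with fnmatch standing in for Telegraf's own glob matching), a metric passes when at least one listed tag has a value matching one of its patterns:

```python
from fnmatch import fnmatchcase


def tagpass(tags, conditions):
    """Return True if any tag listed in `conditions` matches one of its patterns (OR logic)."""
    return any(
        key in tags and any(fnmatchcase(tags[key], p) for p in patterns)
        for key, patterns in conditions.items()
    )


conditions = {"label0": ["42", "test"], "label1": ["*-42", "test-*"]}
print(tagpass({"label0": "42"}, conditions))                   # True
print(tagpass({"label1": "prod-42"}, conditions))              # True
print(tagpass({"label0": "7", "label1": "prod"}, conditions))  # False
```

tagdrop uses the same matching, but drops the metric instead of keeping it.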
To remove specific tags from the metrics of an input, list their keys in tagexclude:

[[inputs.something]]
  tagexclude = [ "port", "server" ]
To rename the measurement produced by an input, use name_override:

[[inputs.something]]
  name_override = "something"
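Both tagexclude and name_override rewrite each metric before it is sent. A minimal sketch of their effect on a parsed metric, with a hypothetical helper name, a hypothetical "example" measurement, and fnmatch globbing standing in for Telegraf's pattern matching:

```python
from fnmatch import fnmatchcase


def apply_modifiers(measurement, tags, tagexclude=(), name_override=None):
    """Drop tags whose key matches a tagexclude pattern, then optionally rename the measurement."""
    kept_tags = {
        k: v for k, v in tags.items() if not any(fnmatchcase(k, p) for p in tagexclude)
    }
    return (name_override or measurement), kept_tags


print(apply_modifiers("example", {"server": "db1", "port": "3306", "host": "test"},
                      tagexclude=["port", "server"], name_override="something"))
# ('something', {'host': 'test'})
```

Dropping a high-cardinality tag such as port reduces the number of series, while renaming only changes the resulting class names.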
Check the Telegraf configuration guidelines to find many other useful parameters.
Selecting the metrics to record on your infrastructure might look like a waste of time at first, but it will save you a lot in the future, as you will have full control over them.
Once Telegraf is configured, start it with:
telegraf --config telegraf.conf
Once the first batch of data has been sent, you can retrieve it using any of the available query protocols.