↓ Archives ↓

Category → ganglia

Our #monitoringsucks rpm is repository available

Not only our Rubygems Builds have changed, but also my internal #monitoringsucks repository.

You might have noticed a variety of vagrant- projects on my github acount

http://github.com/KrisBuytaert/vagrant-ganglia
http://github.com/KrisBuytaert/vagrant-graphite
http://github.com/KrisBuytaert/vagrant-puppet-logstash,
Being the #monitoringsucks part of them. All of those Vagrant projects are basically my test setups to play with those new tools.

They contain a bunch of puppet modules that install and configure these tools. (Note that they mostly consist of
of git submodules to other puppet module repositories.

Given the fact that I also like to have my software cleanly installed from a package, that means that some of these tools had to be packaged, or I had to create a personal / internal repository which had packages from upstream that were hiding on the internet available.

I've forked of this repository off the internal Inuits epository so you all can also benefit from these efforts.
(You gotta love pulp :))

That means you can now install all of the above mentionned #monitoringsucks tool from our public repo on

  1. yumrepo { 'monitoringsucks':
  2. baseurl => 'http://pulp.inuits.eu/pulp/repos/monitoring',
  3. descr => 'MonitoringSuck at Inuits',
  4. gpgcheck => '0',
  5. }

Patches to both the Vagrant projects and the puppet modules are welcome ...

Graphite, JMXTrans, Ganglia, Logster, Collectd, say what ?

Given that @patrickdebois is working on improving data collection I thought it would be a good idea to describe the setup I currently have hacked together.

(Something which can be used as a starting point to improve stuff, and I have to write documentation anyhow)

I currently have 3 sources , and one target, which will eventually expand to at least another target and most probably more sources too.

The 3 sources are basically typical system data which I collect using collectd, However I`m using collectd-carbon from https://github.com/indygreg/collectd-carbon.git to send data to Graphite.

I`m parsing the Apache and Tomcat logfiles with logster , currently sending them only to Graphite, but logster has an option to send them to Ganglia too.

And I`m using JMXTrans to collect JMX data from Java apps that have this data exposed and send it to Graphite. (JMXTrans also comes with a Ganglia target option)

Rather than going in depth over the config it's probably easier to point to a Vagrant box I build https://github.com/KrisBuytaert/vagrant-graphite which brings up a machine that does pretty much all of this on localhost.

Obviously it's still a work in progress and lots of classes will need to be parametrized and cleaned up. But it's a working setup, and not just on my machine ..

#monitoringsucks and we’ll fix it !

If you are hacking on monitoring solutions, and want to talk to your peers solving the problem
Block the monday and tuesday after fosdem in your calendar !

That's right on february 6 and 7 a bunch of people interrested to fix the problem will be meeting , discussing and hacking stuff together in Antwerp

In short a #monitoringsucks hackathon

Inuits is opening up their offices for everybody who wants to join the effort Please let us (@KrisBuytaert and @patrickdebois) know if you want to join us in Antwerp

Obviously if you can't make it to Antwerp you can join the effort on ##monitoringsucks on Freenode or on Twitter.

The location will be Duboistraat 50 , Antwerp
It is about 10 minutes walk from the Antwerp Central Trainstation
Depending on Traffic Antwerp is about half an hour north of Brussels and there are hotels at walking distance from the venue.

Plenty of parking space is available on the other side of the Park

Monitoring Wonderland Survey – Metrics – API – Gateways

Update 4/01/2012: added ways to add metrics via logs, java pickle graphite feeder

One tool to rule them all? Not.

If you are working within an enterprise , chances are that you have different metric systems in place: You might have some Cacti, Ganglia, Collectd, etc... due to historical reasons, different departments,

This reminded me of the situation while I was working in Identity Management: you might have an LDAP, Active Directory, local HR database etc. There would be plans and discussions of using one over the other, and gateways would need to be written. I learned a few lessons there:

  1. have as few sources/stores of information as possible
  2. DON't try to chase the one tool to rule them all, aka don't use a tool for something it's not made for
  3. make it self-servicing to user and automate processes

1 to 1 gateways

Take the new Metrics hotness Graphite as an example, it has some nice graphing advantages over other tools . So people wonder , should I migrate my Ganglia, Collectd to Graphite? Graphite doesn't come with elaborate collection scripts for memory/disk/etc ... , so we have to rely on other tools like Cacti,Munin,Collectd,Ganglia to first collect the data.

So we start writing gateways to get data into Graphite:

But what happens if we also use Opentsdb for storing long term data ? We have to re-implement those gateways:

Issue 1 : Effort duplication

This just seems like a waste of energy implementing the protocol in every tool.This sure isn't the first time this happens in history: the same thing happened for Collectd -> Ganglia Plugin

If you look at the data that is transmitted it is actually pretty much the same:

a metric name, value, timestamp, optionally hostname, some metadata tags

So we could easily envision a 'universal' format that would be used to translate from and to.

Ganglia  <-> Intermediate format <-> Graphite
Collectd <-> Intermediate format <-> Opentsdb

With this intermedia format, we would only have to write one end of the equation once.

I started thinking of this like an ffmpeg for monitoring

Issue 2: Difficult to hook in additional listeners

Let's add another system that wants to listen into the metrics, something like Esper, Nagios alerting, some Dataware house tools etc... We could reuse the libraries from end to the other, but we'll have to add more gateways and put these in place everytime.

A better approach would be to use a message bus approach: every tools puts and listens on a bus and gets the data it needed. RI Pienaar has written about this approach extensively in his Series on Common Messaging Patterns. Aso John Bergmans has a great post on using AMQP and Websockets to get realtime graphics.

Some of the tools already have Message queue integrations, but there seems to be a common intermediate format missing

As a proof of concept I've created :

Building blocks

In this section I'll look for API's (ruby oriented) to get data in and out of the different metrics systems:

Graphite - IN

Sending metrics from ruby to Graphite:

These both implement the Simple Protocol, but for high performance we'd like to use the batching facility through the Pickle Format. I could not find a Pickle gem for ruby, but his could work through Ruby-Python gateway http://rubypython.rubyforge.org/.

Faster - a Java Netty based graphite relay takes the same approach https://github.com/markchadwick/graphite-relay

Another way to get your data into graphite is using Etsy's Logster https://github.com/etsy/logster

Mike Brittain greatly explains it's use in Take my logs... Please! - A velocity Online Conference SessionVideoPDF

Graphite - OUT

To get all the data out of Graphite is impossible through the standard API. You get a graph out as Raw data, but that hardly counts.

The best option seems to be to listen in to the graphite - udp receiver and duplicate the information onto a message bus.

An alternative might be to directly read from the Whisper storage, inspiration for that can be found in:

Opentsdb - IN

I could not find any ruby gem that implements the Opentsdb protocol for sending data, but creating one should be trivial. Opentsdb just use a plain TCP socket to get the data in

Opentsdb - OUT

Getting data out of Opentsdb suffers the same problem as Graphite: you can do queries on specific graph data

But you can't get it out, maybe if you directly interface with the Hbase/Java API. So again the best bet is to create a listener/proxy for the simple TCP protocol.

Ganglia - IN

Sending metrics to Ganglia is easy using the gmetric shell command. Early days code describing this can still be found at http://code.google.com/p/embeddedgmetric/

Igrigorik has written up nicely on how to use the Gmetric Ruby gem to send metrics

If you want to feed in log files into ganglia Logtailer might be your thing https://bitbucket.org/maplebed/ganglia-logtailer

Ganglia - OUT

Vladimir describes the options while he explains on how to get Ganglia data to graphite

Option 1 is to poll the Gmond over TCP and get the XML from it's current data:

Options 2 is to listen into the UDP protocol as a additional receiver.

I implemented both approaches in the https://github.com/jedi4ever/gmond-zmq

Note: As a side effect I found that the metrics send to the UDP are actualy more acurate then the values when you query the XML.

Collectd - IN

So send metrics to Collectd, you can use ruby gem from Astro that implements most of the UDP protocol

Collectd - OUT

I give Collectd for the price of best output.

It currently implements different writers:

  • Network plugin
  • UnixSock plugin
  • Carbon plugin
  • CSV
  • RRDCacheD
  • RRDtool
  • Write HTTP plugin

And the deactived ZeroMQ - https://github.com/deactivated/collectd-write-zmq

The Binary Protocol http://collectd.org/wiki/index.php/Binary_protocol is pretty simple to listen into.

Munin

If you happen to use Munin, here's some inspiration, but I haven't researched it much

Circonus

If you happen to use Circonus, here's some inspiration, but I haven't researched it much

RRD interaction from ruby

For those who want to read and write directly from RRD's in ruby, please have fun:

Alert on metrics:

With all the tools in and out, and a unified intermediate format, it will be trivial to rewrite the traditional alert check tools to listen into the bus for values. This means you can listen into for your Nagios, your ticket system, your pager system etc.. from the same source.

Graphite

Opentsdb

Ganglia

New Relic

https://github.com/kogent/check_newrelic

Conclusion

It should be feasible to create an intermediate format and reuse some of these libraries to implement both IN and OUT functionality. Why not create a Fog for monitoring information? Like implements metric receive, send,

Next stop Nagios because it deserves a blogpost on it's own ...

Puppet Report Processors Made Easy

This week I wanted to show people how easy it is to write Puppet report processors that do more that just store reports or log output. With that in mind I have written nine new report processors that I’ll be showing over this next week. We’re going to start with two new report processors – Puppet IRC and Puppet Ganglia!

Puppet IRC

The first, the Puppet IRC report processor, notifies an IRC channel of failed Puppet runs with the name of the host that failed and the date.

It requires the shout-bot gem to be installed on your Puppet master:

$ sudo gem install shout-bot

You can then install puppet-irc as a module in your Puppet master’s modulepath. Now update the irc_server and irc_channel variables in the /etc/puppet/irc.yaml file with your IRC connection details. An example file is included.

Then enable pluginsync and reports on your master and clients in puppet.conf including specifying the irc report processor.

[master]
report = true
reports = irc
pluginsync = true
[agent]
report = true
pluginsync = true

Finally, run the Puppet client and sync the report as a plugin and hey presto you’re logging failures to the IRC channel of your choice!

Puppet Ganglia

Our second report processor is called Puppet Ganglia and sends metrics to a Ganglia server via gmetric (so you need a running Ganglia server!).

Firstly, we install the gmetric gem:

$ sudo gem install gmetric

Then install puppet-ganglia as a module in your Puppet master’s module path. Next, update the `ganglia_server` and `ganglia_port` variables in the `ganglia.yaml` file with your Ganglia server IP and port and copy the file to `/etc/puppet/`. An example file is included in the repository.

Lastly, enable pluginsync and reports on your master and clients in `puppet.conf`

        [master]
        report = true
        reports = ganglia
        pluginsync = true
        [agent]
        report = true
        pluginsync = true

Now when you run the Puppet client metrics will be sent from Puppet straight to Ganglia including metrics like:

* Config retrieval time
* File time
* Filebucket time
* Total time
* Changed resources
* Out of sync resources
* Skipped resources
* Total resources
* Successful events
* Total events
* Total changes