Real time Puppet events and network wide callbacks
I’ve wanted to be notified the moment Puppet changes a resource for ages, but I’ve often been told this cannot be done without monkey patching nasty Puppet internals.
Those following me on Twitter have no doubt noticed my tweets complaining about the complexities in getting this done. I’ve now managed to get this done so am happy to share here.
The end result is that my previously mentioned event system is now getting messages sent the moment Puppet does anything. And I have callbacks at the event system layer so I can now instantly react to change anywhere on my network.
If a new webserver in a certain cluster comes up I can instantly notify the load balancer to add it – perhaps via a Puppet run of its own. This lets me build reactive cross node orchestration. Apart from the callbacks I also get run metrics in Graphite as seen in the image.
The best way I found to make Puppet do this is by creating your own transaction report processor that is the client side of a normal Puppet report. Here’s a shortened version of what I did. It’s overly complex and a pain to get working since Puppet Pluginsync still cannot sync out things like Applications.
The code overrides the finalize_report method to send the final run metrics and the add_resource_status method to publish events for changing resources. Puppet will use the add_resource_status method to add the status of a resource right after evaluating that resource. By tapping into this method we can send events to the network the moment the resource has changed.
require 'puppet/transaction/report'

class Puppet::Transaction::UnimatrixReport < Puppet::Transaction::Report
  def initialize(kind, configuration_version=nil)
    super(kind, configuration_version)

    @um_config = YAML.load_file("/etc/puppet/unimatrix.yaml")
  end

  def um_connection
    @um_connection ||= Stomp::Connection.new(@um_config[:user], @um_config[:password], @um_config[:server], @um_config[:port], true)
  end

  def um_newevent
    event = { ... } # skeleton event, removed for brevity
  end

  def finalize_report
    super

    metrics = um_newevent
    sum = raw_summary

    # add run metrics from raw_summary to metrics, removed for brevity

    Timeout::timeout(2) { um_connection.publish(@um_config[:portal], metrics.to_json) }
  end

  def add_resource_status(status)
    super(status)

    if status.changed?
      event = um_newevent

      # add event status to event hash, removed for brevity

      Timeout::timeout(2) { um_connection.publish(@um_config[:portal], event.to_json) }
    end
  end
end
Finally, as I really have no need for sending reports to my Puppet Masters, I created a small Application that replaces the standard agent. This application has access to the report even when reporting is disabled so it will never get saved to disk or copied to the masters.
This application sets up the report using the class above and creates a log destination that feeds the logs into it. This is more or less exactly the Puppet::Application::Agent so I retain all my normal CLI usage like --tags and --test etc.
require 'puppet/application'
require 'puppet/application/agent'
require 'rubygems'
require 'stomp'
require 'json'

class Puppet::Application::Unimatrix < Puppet::Application::Agent
  run_mode :unimatrix

  def onetime
    unless options[:client]
      $stderr.puts "onetime is specified but there is no client"
      exit(43)
      return
    end

    @daemon.set_signal_traps

    begin
      require 'puppet/transaction/unimatrixreport'
      report = Puppet::Transaction::UnimatrixReport.new("unimatrix")

      # Reports get logs, so we need to make a logging destination that understands our report.
      Puppet::Util::Log.newdesttype :unimatrixreport do
        attr_reader :report

        match "Puppet::Transaction::UnimatrixReport"

        def initialize(report)
          @report = report
        end

        def handle(msg)
          @report << msg
        end
      end

      @agent.run(:report => report)
    rescue => detail
      puts detail.backtrace if Puppet[:trace]
      Puppet.err detail.to_s
    end

    if not report
      exit(1)
    elsif options[:detailed_exitcodes] then
      exit(report.exit_status)
    else
      exit(0)
    end
  end
end
And finally I can now create a callback in my event system. This example is oversimplified, but the gist of it is that I am triggering a Puppet run on my machines with class roles::frontend_lb as a result of a web server starting on any machine with the class roles::frontend_web – effectively causing newly created machines to get into the pool immediately. The Puppet run is triggered via MCollective so I am using its discovery capabilities to run all the instances of load balancers.
add_callback(:name => "puppet_change", :type => ["archive"]) do |event|
  data = event["extended_data"]

  if data["resource_title"] && data["resource_type"]
    if event["name"] == "Service[httpd]"
      # only when its a new start, this is short for brevity you want to do some more checks
      if data["restarted"] == false && data["tags"].include?("roles::frontend_web")
        # does a puppet run via mcollective on the frontend load balancer
        UM::Util.run_puppet(:class_filter => "roles::frontend_lb")
      end
    end
  end
end
Doing this via a Puppet run demonstrates to me where the balance lies between Orchestration and CM. You still need to be able to build a new machine, and that new machine needs to be in the same state as those that were managed using the Orchestration tool. So by using MCollective to trigger Puppet runs I know I am not doing anything out of bounds of my CM system, I am simply causing it to work when I want it to work.
Facter facts from TXT, JSON, YAML and non ruby scripts
There have been many discussions about Facter 2. One of the things I looked forward to getting was the ability to read arbitrary files or run arbitrary scripts in a directory to create facts.
This is pretty important as a lot of the typical users just aren’t Ruby coders and really all they want is to trivially add some facts. All too often the answer to common questions in the Puppet user groups ends up being “add a fact” but when they look at adding facts it’s just way too daunting.
Sadly as of today Facter 2 is mostly vaporware so I created a quick – really quick – fact that reads the directory /etc/facts.d and parses Text, JSON or YAML files but can also run any executable in there.
To write a fact in a shell script that reads /etc/sysconfig/kernel on a RedHat machine and reports the default kernel package name simply do this:
#!/bin/sh

source /etc/sysconfig/kernel

echo "default_kernel=${DEFAULTKERNEL}"
Add it to /etc/facts.d and make it executable, now you can simply use your new fact:
% facter default_kernel
kernel-xen
Simple stuff. You can get the fact on my GitHub and deploy it using the standard method.
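As a rough idea of how a loader like this could work, here is a minimal sketch. It is not the actual fact from GitHub: the helper names (parse_txt_facts, load_fact_file, load_facts_dir) and the choice to pick a parser by file extension are assumptions made purely for illustration.

```ruby
require 'json'
require 'yaml'

# Hypothetical helper: turn "key=value" lines into a hash of facts.
def parse_txt_facts(text)
  facts = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    next if value.nil? # not a key=value line, skip it

    facts[key.strip] = value.strip
  end
  facts
end

# Hypothetical dispatcher: choose a parser by file extension (an assumption),
# and run anything else that is executable, parsing its output as key=value.
def load_fact_file(path)
  facts = case File.extname(path)
          when ".json"         then JSON.parse(File.read(path))
          when ".yaml", ".yml" then YAML.load_file(path)
          when ".txt"          then parse_txt_facts(File.read(path))
          else
            File.executable?(path) ? parse_txt_facts(IO.popen([path], &:read)) : {}
          end
  facts.is_a?(Hash) ? facts : {}
end

# Merge the facts from every regular file in the directory.
def load_facts_dir(dir = "/etc/facts.d")
  Dir.glob(File.join(dir, "*")).sort.inject({}) do |facts, path|
    File.file?(path) ? facts.merge(load_fact_file(path)) : facts
  end
end
```

The shell script above only has to print default_kernel=kernel-xen for parse_txt_facts to pick it up, which is what keeps fact writing open to people who do not write Ruby.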
Puppet backend for Hiera part 2
Note: This project is now being managed by Puppetlabs, its new home is http://projects.puppetlabs.com/projects/hiera
Last week I posted the first details about my new data source for Hiera that enables its use in Puppet.
In that post I mentioned I want to do merge or array searches in the future. I took a first stab at that for array data and wanted to show how that works. I also mentioned I wanted to write a Hiera External Node Classifier (ENC) but this work completely makes that redundant now in my mind.
A common pattern you see in ENCs is that they layer data – very similar to how extlookup / hiera has done it – but instead of just doing a first-match search they combine the results into a merged list. This merged list is then used to include the classes on the nodes.
For a node in the production environment located in dc1 you’ll want:
node default {
   include users::common
   include users::production
   include users::dc1
}
I’ve made this trivial in Hiera now, given the 3 files below:
common.json
{"classes":"users::common"}
production.json
{"classes":"users::production"}
dc1.json
{"classes":"users::dc1"}
With an appropriate Hiera hierarchy configuration you can achieve this using the node block below:
node default {
   hiera_include("classes")
}
Any parametrized classes that use Hiera as in my previous post will simply do the right thing. Individual classes variables can be arrays so you can include many classes at each tier. Now just add a role fact on your machines, add a role tier in Hiera and you’re all set.
The huge win here is that you do not need to do any stupid hacks like load the facts from the Puppet Masters vardir in your ENC to access the node facts or any of the other hacky things people do in ENCs. This is simply a manifest doing what manifests do – just better.
The hiera CLI tool has been updated with array support, here it is running on the data above:
$ hiera -a classes
["users::common"]
$ hiera -a classes environment=production location=dc1
["users::common", "users::production", "users::dc1"]
I’ve also added a hiera_array() function that takes the same parameters as the hiera() function but that returns an array of the found data. The array capability will be in Hiera version 0.2.0 which should be out later today.
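To make the difference from a first-match search concrete, here is a small sketch of an array search. It is not Hiera's code: the in-memory TIER_DATA hash stands in for the three JSON files, and the hierarchy order is an assumption chosen to match the CLI output above.

```ruby
# In-memory stand-in for common.json, production.json and dc1.json.
TIER_DATA = {
  "common"     => { "classes" => "users::common" },
  "production" => { "classes" => "users::production" },
  "dc1"        => { "classes" => "users::dc1" }
}

# Assumed hierarchy; the order is chosen to match the CLI output above.
ARRAY_HIERARCHY = ["common", "%{environment}", "%{location}"]

# Expand %{var} in each tier from the scope and drop tiers that expand to nothing.
def expand_tiers(hierarchy, scope)
  hierarchy.map { |tier| tier.gsub(/%\{(\w+)\}/) { scope[$1].to_s } }
           .reject(&:empty?)
end

# Array search: every tier that has the key contributes to the answer.
def array_lookup(key, scope, hierarchy = ARRAY_HIERARCHY, data = TIER_DATA)
  expand_tiers(hierarchy, scope).flat_map { |tier| Array(data.dig(tier, key)) }
end

p array_lookup("classes", {})
p array_lookup("classes", { "environment" => "production", "location" => "dc1" })
```

The two calls mirror the two CLI invocations: with an empty scope only the common tier resolves, and with environment and location set all three tiers contribute.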
I should also mention that Luke Kanies took a quick stab at integrating Hiera into Puppet and the result is pretty awesome. Given the example below Puppet will magically use Hiera if it’s available, else fall back to old behavior.
class ntp::config($ntpservers="1.pool.ntp.org") {
   .
   .
}

node default {
   include ntp::config
}
With Luke’s proposed changes this would be equivalent to:
class ntp::config($ntpservers=hiera("ntpservers", "1.pool.ntp.org")) {
   .
   .
}
This is pretty awesome. I wouldn’t hold my breath to see this kind of flexibility soon in Puppet core but it shows what’s possible.
Puppet backend for Hiera
Note: This project is now being managed by Puppetlabs, its new home is http://projects.puppetlabs.com/projects/hiera
Yesterday I released a Hierarchical datastore called Hiera, today as promised I’ll show how it integrates with Puppet.
Extlookup has solved the basic problem of loading data into Puppet. This was done 3 years ago at a time before Puppet supported complex data in Hashes or things like Parametrized classes. Now as Puppet has improved a new solution is needed. I believe the combination of Hiera and the Puppet plugin goes a very long way to solving this and making parametrized classes much more bearable.
I will highlight a sample use case where a module author places a module on the Puppet Forge and a module user downloads and uses the module. Both actors need to create data – the author needs default data to create a just-works experience and the module user wants to configure the module behavior either in YAML, JSON, Puppet files or anything else he can code.
Module Author
The most basic NTP module can be seen below. It has a ntp::config class that uses Hiera to read default data from ntp::data:
modules/ntp/manifests/config.pp
class ntp::config($ntpservers = hiera("ntpservers")) {
   file{"/tmp/ntp.conf":
      content => template("ntp/ntp.conf.erb")
   }
}
modules/ntp/manifests/data.pp
class ntp::data {
   $ntpservers = ["1.pool.ntp.org", "2.pool.ntp.org"]
}
This is your most basic NTP module. By using hiera("ntpservers") you load $ntpservers from these variables, the first one that exists gets used. In this case the last one.
- $ntp::config::data::ntpservers
- $ntp::data::ntpservers
This would be an extract from a forge module, anyone who uses it will be configured to use the ntp.org base NTP servers.
Module User
As a user I really want to use this NTP module from the Forge and not write my own. But what I also need is flexibility over what NTP servers I use. Generally that means forking the module and making local edits. Parametrized Classes are supposed to make this better but sadly the design decisions mean you need an ENC or a flexible data store. The data store was missing thus far and I really would not recommend their use without it.
Given that the NTP module above is using Hiera as a user I now have various options to override its use. I configure Hiera to use the (default) YAML backend for data but to also load in the Puppet backend should the YAML one not provide an answer. I also configure it to allow me to create per-location data that gives me the flexibility I need to pick NTP servers I need.
:backends:
  - yaml
  - puppet
:hierarchy:
  - %{location}
  - common
I now need to decide how best to override the data from the NTP module:
I want:
- Per datacenter values when needed. The NOC at the data center can change this data without change control.
- Company wide policy that should apply over the module defaults. This is company policy and should be subject to change control like my manifests.
Given these constraints I think the per-datacenter policy can go into data files that are controlled outside of my VCS, like with a web application or simple editor. The common data that should apply company wide needs to be controlled under my VCS and managed by the change control board.
Hiera makes this easy. By configuring it as above the new Puppet data search path – for a machine in dc1 – would be:
- $data::dc1::ntpservers – based on the Hiera configuration, user specific
- $data::common::ntpservers – based on the Hiera configuration, user specific
- $ntp::config::data::ntpservers – users do not touch this, it’s for module authors
- $ntp::data::ntpservers – users do not touch this, it’s for module authors
You can see this extends the list seen above, the module author data space remains in use but we now have a layer on top we can use.
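The way that search path is put together can be sketched in a few lines. This is only an illustration of the ordering described above, not the backend's real code; puppet_search_path and its parameters are hypothetical names, and taking the calling module to be the first segment of the class name is an assumption.

```ruby
# Hypothetical helper: list the Puppet classes searched for data, in order.
# User tiers come from the Hiera hierarchy, followed by the two in-module
# locations reserved for module authors.
def puppet_search_path(hierarchy, scope, calling_class, datasource = "data")
  # Assumption: the calling module is the first segment of the class name.
  calling_module = calling_class.split("::").first

  user_tiers = hierarchy.map { |tier| tier.gsub(/%\{(\w+)\}/) { scope[$1].to_s } }
                        .reject(&:empty?)
                        .map { |tier| "#{datasource}::#{tier}" }

  user_tiers + ["#{calling_class}::data", "#{calling_module}::data"]
end

path = puppet_search_path(["%{location}", "common"], { "location" => "dc1" }, "ntp::config")
puts path
```

For a machine in dc1 this lists data::dc1, data::common, ntp::config::data and ntp::data, the same four places as the list above. The first one that defines $ntpservers wins.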
First we create the company wide policy file in Puppet manifests:
modules/data/manifests/common.pp
class data::common {
   $ntpservers = ["ntp1.example.com", "ntp2.example.com"]
}
As Hiera will query this prior to querying any in-module data this will effectively prevent any downloaded module from supplying NTP servers other than ours. This is a company wide policy that applies to all machines unless specifically configured otherwise. This lives with your code in your SCM.
Next we create the data file for machines with fact $location=dc1. Note this data is created in a YAML file outside of the manifests. You can use JSON or any other Hiera backend so if you had this data in a CMDB in MySQL you could easily query the data from there:
hieradb/dc1.yaml
---
ntpservers:
  - ntp1.dc1.example.com
  - ntp2.dc1.example.com
And this is the really huge win. You can create Hiera plugins to get this data from anywhere you like that magically override your in-manifest data.
Finally here are a few Puppet node blocks:
node "web1.prod.example.com" {
   $location = "dc1"

   include ntp::config
}

node "web1.dev.example.com" {
   $location = "office"

   include ntp::config
}

node "oneoff.example.com" {
   class{"ntp::config":
      ntpservers => ["ntp1.isp.net"]
   }
}
These 3 nodes will have different NTP configurations based on their location – you should really make $location a fact:
- web1.prod will use ntp1.dc1.example.com and ntp2.dc1.example.com
- web1.dev will use the data from class data::common
- oneoff.example.com is a complete weird case and you can still use the normal parametrized syntax – in this case Hiera won’t be involved at all.
And so we have a very easy to use and natural blend: param classes driven from an ENC for full programmatic control, without sacrificing usability for beginner users who cannot or do not want to invest the time to create an ENC.
The plugin is available on GitHub as hiera-puppet and you can just install the hiera-puppet gem. You will still need to install the Parser Function into your master until the work I did to make Puppet extendable using RubyGems is merged.
The example above is on GitHub and you can just use that to test the concept and see how it works without the need to plug into your Puppet infrastructure first. See the README.
The Gem includes extlookup2hiera that can convert extlookup CSV files into JSON or YAML.
Hiera: a pluggable hierarchical data store
Note: This project is now being managed by Puppetlabs, its new home is http://projects.puppetlabs.com/projects/hiera
In my previous post I presented a new version of extlookup that is pluggable. This is fine but it’s kind of tightly integrated with Puppet and hastily coded. That code works – and people are using it – but I wanted a more mature and properly standalone model.
So I wrote a new standalone non-puppet related data store that takes the main ideas of using Hierarchical data present in extlookup and made it generally available.
I think the best model for representing key data items about infrastructure is using a Hierarchical structure.

The image above shows the data model visually, in this case we need to know the Systems Administrator contact as well as the NTP servers for all machines.
If we had production machines in dc1, dc2 and our dev/testing in our office this model will give the Production machines specific NTP servers while the rest would use the public NTP infrastructure. DC1 would additionally have a specific Systems Admin contact, perhaps it’s outsourced to your DR provider.
This is the model that extlookup exposed to Puppet and that a lot of people are using extensively.
Hiera extracts this into a standalone project and ships with a YAML backend by default, there are also JSON and Puppet ones available.
It extends the old extlookup model in a few key ways. It has configuration files of its own rather than relying on Puppet. You can chain multiple data sources together and the data directories are now subject to scope variable substitution.
The chaining of data sources is a fantastic ability that I will detail in a follow up blog post showing how you would use this to create reusable modules and make Puppet parametrized classes usable – even without an ENC.
It’s available as a gem using the simple gem install hiera and the code is on GitHub where there is an extensive README. There is also a companion project that lets you use JSON as a data store – gem install hiera-json. These are the first Gems I have made in years so no doubt they need some love, feedback appreciated in GitHub issues.
Given the diagram above and data set up to match you can query this data from the CLI, examples of the data are on GitHub:
$ hiera ntpserver location=dc1
ntp1.dc1.example.com
If you were on your Puppet Master or had your node Fact YAML files handy you can use those to provide the scope, here the yaml file has a location=dc2 fact:
$ hiera ntpserver --yaml /var/lib/puppet/yaml/facts/example.com
ntp1.dc2.example.com
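The priority search with chained data sources can be sketched roughly as follows. This is a simplification and not Hiera's implementation: each backend is modelled as a plain hash instead of files on disk, and the constant names, the hierarchy and the idea that each backend walks the whole hierarchy before the next one is tried are all assumptions for illustration.

```ruby
# Each backend is modelled as a hash of tier name => data; real backends
# would read YAML or JSON files instead. All values are illustrative.
FILE_BACKEND = {
  "dc1" => { "ntpserver" => "ntp1.dc1.example.com" },
  "dc2" => { "ntpserver" => "ntp1.dc2.example.com" }
}
DEFAULTS_BACKEND = {
  "common" => { "ntpserver" => "1.pool.ntp.org" }
}

# Priority search: try each backend in order, walk the hierarchy inside
# each one, and return the first answer found.
def priority_lookup(key, scope, backends, hierarchy)
  backends.each do |backend|
    hierarchy.each do |tier|
      # Scope variable substitution, e.g. %{location} => dc1
      name = tier.gsub(/%\{(\w+)\}/) { scope[$1].to_s }
      next if name.empty?

      answer = backend.dig(name, key)
      return answer unless answer.nil?
    end
  end
  nil
end

backends  = [FILE_BACKEND, DEFAULTS_BACKEND]
hierarchy = ["%{location}", "common"]
puts priority_lookup("ntpserver", { "location" => "dc1" }, backends, hierarchy)
puts priority_lookup("ntpserver", { "location" => "office" }, backends, hierarchy)
```

A machine in dc1 gets its datacenter specific server from the first backend, while an office machine falls through to the second backend and ends up on the public pool.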
I have a number of future plans for this:
- Currently you can only do priority based searches. It will also support merge searches where each tier will contribute to the answer. The answer will be a merged hash
- A Puppet ENC should be written based on this data. This will require the merge searches mentioned above.
- More backends
- A webservice that can expose the data to your infrastructure
- Tools to create the data – this should really be Foreman and/or Puppet Dashboard like tools but certainly CLI ones should exist too.
I have written a new Puppet backend and Puppet function that can query this data. This has turned out really great and I will make a blog post dedicated to that later, for now you can see the README for that project for details. This backend lets you override in-module data supplied inside your manifests using external data of your choice. Definitely check it out.
Pluggable Extlookup for Puppet
NOTE: This ended up being a proof of concept for a more complete system called Hiera please consider that instead.
Back in 2009 I wrote the first implementation of extlookup for Puppet. Later on it got merged – after a much needed rewrite – into Puppet mainstream. If you don’t know what extlookup does please go and read that post first.
The hope at the time was that someone would make it better and not just a hacky function that uses global variables for its config. I was exploring some ideas and showing how rich data would apply to the particular use case and language of Puppet but sadly nothing has come of these hopes.
The complaints about extlookup fall into various categories:
- CSV does not make a good data store
- I have a personal hate for the global variable abuse in extlookup, I was hoping Puppet config items will become pluggable at some point, alas.
- Using functions does not let you introspect the data usage inside your modules for UI creation
- Other complaints fall in the ‘Not Invented Here’ category and the ‘TL; DR’ category of people who simply didn’t bother understanding what extlookup does
The complaint about using functions to handle data not being visible to external sources is valid. Puppet has not made introspection of classes and their parameters easy for ENCs yet so this just seems to me like people who don’t understand that extlookup is simply a data model not a prescription for how to use the data. In a follow up post I will show an extlookup based ENC that supports parametrized classes and magical data resolution for those parametrized classes using the exact same extlookup data store and precedence rules.
Not much to be done for the last group of people but as @jordansissel said “haters gonna hate, coders gonna code!”.
I have addressed the first complaint now by making an extlookup that is pluggable so you can bring different backends.
First of course, in bold defiance of the Ruby Way, I made it backward compatible with older versions of extlookup and gave it a 1:1 compatible CSV backend.
I addressed my global variable hate by adding a config file that might live in /etc/puppet/extlookup.yaml.
Status
I wrote this code yesterday afternoon, so you should already guess that there might be bugs and some redesigns ahead and that it will most likely destroy your infrastructure. I will add unit tests to it etc so please keep an eye on it and it will become mature for sure.
I have currently done backends for CSV, YAML and Puppet manifests. A JSON one will follow and later perhaps one querying Foreman and other data stores like that.
The code lives at https://github.com/ripienaar/puppet-extlookup.
Basic Configuration
Configuration of precedence is a setting that applies equally to all backends, the config file should live in the same directory as your puppet.conf and should be called extlookup.yaml. Examples of it below.
CSV Backend
The CSV backend is backward compatible, it will also respect your old style global variables for configuration – but the other backends won’t. To configure it simply put something like this in your config file:
---
:parser: CSV
:precedence:
- environment_%{environment}
- common
:csv:
   :datadir: /etc/puppet/extdata
YAML Backend
The most common proposed alternatives to extlookup seem to be YAML based. The various implementations out there though are pretty weak and their authors seemed to get bored with the idea before reaching feature parity with extlookup. With a pluggable backend it was easy enough for me to create a YAML data store that has all the extlookup features.
In the case of simple strings being returned I have kept the extlookup feature that parses variables like %{country} in the result data out from the current scope – something mainline puppet extlookup actually broke recently in a botched commit – but if you put hash or array data in the YAML files I don’t touch the data.
Sample data:
---
country: uk
ntpservers:
- 1.uk.pool.ntp.org
- 2.uk.pool.ntp.org
foo.com:
  contact: webmaster@foo.com
  docroot: /var/www/foo.com
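The rule that only simple strings get %{variable} parsing while hashes and arrays pass through untouched can be sketched like this. The function name parse_result and the plain hash used as scope are assumptions for illustration, this is not the code from the backend.

```ruby
# Hypothetical helper: substitute %{var} from the scope in plain strings,
# and return any other kind of data (arrays, hashes) exactly as it was.
def parse_result(value, scope)
  return value unless value.is_a?(String)

  value.gsub(/%\{([^}]+)\}/) { scope.fetch($1, "").to_s }
end

scope = { "country" => "uk" }

puts parse_result("ntp.%{country}.example.com", scope)
p parse_result(["1.%{country}.pool.ntp.org"], scope)
```

The string has its variable replaced from the scope, while the array comes back unchanged even though it contains something that looks like a variable.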
All of this data is accessible using the exact same extlookup function. Configuration of the YAML backend:
---
:parser: YAML
:precedence:
- environment_%{environment}
- common
:yaml:
   :datadir: /etc/puppet/extdata
Puppet Backend
Nigel Kersten has been working on the proposal of a new data format called the PDL. I had pretty high hopes for the initial targeted feature list but now it seems to have been watered down to a minimal feature set extlookup with a different name and backend.
I implemented the proposed data lookup in classes and modules as an extlookup backend and made it as full featured as you’d expect from extlookup – fully configurable lookup orders and custom overrides. Just like we’ve had for years in the CSV version.
Personally I think if you’re going to spend hours creating data that describes your infrastructure you should:
- Not stick it in a language that’s already proven to be bad at dealing with data
- Not stick it in a place where nothing else can query the data
- Not stick it in code that requires code audits for simple data changes – as most change control boards really just won’t see the difference.
- Not artificially restrict what kind of data can go into the data store by prescribing an unmovable convention with no configuration.
When I show the extlookup based ENC I am making I will really show why putting data in the Puppet Language is like a graveyard for useful information and not actually making anything better.
You can configure this backend to behave exactly the way Nigel designed it using this config file:
---
:parser: Puppet
:precedence:
- %{calling_class}
- %{calling_module}
:puppet:
   :datasource: data
Which will lookup data in these classes:
- data::$calling_class
- data::$calling_module
- $calling_class::data
- $calling_module::data
Or you can do better and configure proper precedence which would replace the 1st 2 above with ones for datacenter, country, whatever. The last 2 will always be in the list. An alternative might be:
- data::$customer
- data::$environment
- $calling_class::data
- $calling_module::data
You could just configure this behavior with the extlookup precedence setting. Pretty nice for those of you feeling nostalgic for config.php files as hated by Sysadmins everywhere.
And as you can see you can also configure the initial namespace – data – in the config file.
Monitor Framework: Minimal Configuration
This is a follow-up post to other posts I’ve done regarding a new breed of monitoring that I hope to create.
I’ve had some time to think about configuration of monitoring. This is a big pain point in all monitoring systems. Many require you to configure all your resources, dependencies etc, often in text files. Others have APIs that you can automate against and the worst ones have only a GUI.
In the modern world where we have configuration management this ends up being a lot of duplication, your CM system already knows about inter dependencies etc. Your CM’s facts system could know about contacts for a machine and I am sure we could derive a lot of information from these catalogs. Today bigger sites tend to automate the building of monitor config files using their CM systems but it tends to be slow to adapt to network conditions and it’s quite a lot of work.
I spoke a bit about this in the CMDB session at Puppet Camp so thought I’d lay my thoughts down somewhere proper as I’ve been talking about this already.
I’ve previously blogged about using MCollective for monitoring based on discovery. In that post I pointed out that not all things are appropriate to be monitored using this method as you don’t know what is down. There is an approach to solving this problem though. MCollective supports building databases of what should be there – it’s called Registration. By correlating the discovered information with the registered information you can infer what is absent/unknown or down.
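The correlation itself is just set arithmetic. A minimal sketch, assuming the registered and discovered node lists have already been fetched as arrays of hostnames (the classify function and the hostnames are made up for illustration and do not use the MCollective API):

```ruby
require 'set'

# Compare what registration says should exist with what discovery found.
def classify(registered, discovered)
  registered = registered.to_set
  discovered = discovered.to_set

  {
    :present => (registered & discovered).sort, # registered and responding
    :absent  => (registered - discovered).sort, # registered but silent, possibly down
    :unknown => (discovered - registered).sort  # responding but never registered
  }
end

registered = ["mail1.example.com", "mail2.example.com", "mail3.example.com"]
discovered = ["mail1.example.com", "mail3.example.com", "mail4.example.com"]

p classify(registered, discovered)
```

Here mail2 would be reported as absent and mail4 as unknown, which is the information pure discovery based monitoring cannot give you.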
Ideally this is as much configuration as I want to have for monitoring mail queue sizes on all my smart hosts:
scheduler.every '1m' do
  nrpe("check_mailq", :cf_class => "exim::smarthost")
end
This still leaves a huge problem: I can ask for a specific service to be monitored on a subset of machines but I cannot infer parent child relationships or know who to notify.
Now as I am using Puppet to declare these services and using Puppet based discovery to select which machines to monitor I would like to declare parent child relationships in Puppet even cross-node ones.
The approach we are currently testing is around loading all my catalogs for all my machines into Neo4J – a purpose built graph database. I am declaring relationships in the manifests and post processing the graph to create the cross node links.
This means we have a huge graph of graphs containing all inter node dependencies. The image shows visually how a small part of this might look. Here we have an Exim service that depends on a database on a different machine because we use a MySQL based Greylisting service.
Using this graph we can answer many questions, among others:
- When doing notifications on a failure in MySQL do not notify about mailq on any of the mail servers
- What other services are affected by a failure on the MySQL Server, if you exposed this to your NOC in a good UI you’ll have to maintain a whole lot less documentation and they know who to call.
- If we are going to do maintenance on the MySQL server what related systems should we schedule downtime on
- What single points of failure exist in the infrastructure
- While planning maintenance on shared resources in big teams with many different groups using databases, find all stake holders
- Create action rule that will shut down all Exim cleanly after failure of the MySQL – mail will spool safely at senders
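To show what answering the first two questions involves, here is a sketch over a tiny in-memory graph. We are loading the real catalogs into Neo4J, so this plain Ruby hash is only a stand-in, and the resource names and the affected_by and should_notify? helpers are hypothetical.

```ruby
require 'set'

# Hypothetical cross node dependency graph: resource => what it depends on.
DEPENDS_ON = {
  "Service[exim]@mail1" => ["Service[mysql]@db1"],
  "Service[exim]@mail2" => ["Service[mysql]@db1"],
  "Check[mailq]@mail1"  => ["Service[exim]@mail1"],
  "Service[mysql]@db1"  => [],
  "Service[httpd]@web1" => []
}

# Everything that directly or indirectly depends on the failed resource.
def affected_by(failed, depends_on)
  affected = Set.new
  queue = [failed]

  until queue.empty?
    current = queue.shift
    depends_on.each do |resource, deps|
      next unless deps.include?(current)
      # Set#add? returns nil when already seen, which also guards against cycles
      queue << resource if affected.add?(resource)
    end
  end

  affected.to_a.sort
end

# Suppress notifications for anything downstream of a known failure.
def should_notify?(resource, failed, depends_on)
  !affected_by(failed, depends_on).include?(resource)
end

puts affected_by("Service[mysql]@db1", DEPENDS_ON)
```

With MySQL down, both Exim services and the mailq check are affected so their alerts can be suppressed, while httpd on web1 is unrelated and would still notify.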
If we combine this with a rich set of facts we can create a testing framework – perhaps something cucumber based – that lets us express infrastructure tests. Platform managers should be able to express baseline design principles the various teams should comply with. These tests are especially important in dynamic environments like ones managed by cloud auto scalers:
- Find all machines with no declared dependencies
- Write a test to check that all shards in a MongoDB cluster have more than 1 member
- Make sure all our MySQL databases are not in the same availability zone
- Find services that depend on each other but that co-habit in the same rack.
- If someone accidentally removes a class from Puppet that manages a DB machine, alert on all failed dependencies that are now unmanaged
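Two of these tests are simple enough to sketch once dependencies and facts sit side by side. The data below is invented, the rack fact is an assumption, and the helper names are hypothetical. A real version would query the graph database rather than Ruby hashes.

```ruby
# Hypothetical inputs: declared dependencies between nodes, and a rack fact per node.
NODE_DEPENDS_ON = {
  "mail1" => ["db1"],
  "web1"  => ["db2"],
  "db1"   => [],
  "db2"   => []
}
RACK_OF = { "mail1" => "r1", "db1" => "r1", "web1" => "r2", "db2" => "r3" }

# Find services that depend on each other but co-habit in the same rack.
def same_rack_pairs(depends_on, rack_of)
  depends_on.flat_map do |node, deps|
    deps.select { |dep| rack_of[node] && rack_of[node] == rack_of[dep] }
        .map { |dep| [node, dep] }
  end
end

# Find all machines with no declared dependencies.
def nodes_without_dependencies(depends_on)
  depends_on.select { |_node, deps| deps.empty? }.keys.sort
end

p same_rack_pairs(NODE_DEPENDS_ON, RACK_OF)
p nodes_without_dependencies(NODE_DEPENDS_ON)
```

In this made up data mail1 and db1 share rack r1, so that pair would fail the test, and the two database nodes show up as having no declared dependencies.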
And finally we can create automated queries into this database:
- When auto scaling make sure we never end up shutting down machines that would break a dependency
- For an outage on the MySQL server find all related nodes and their contact information, notify the right people
- When adding nodes using auto scalers make sure we start nodes in different availability zones. If we overlay latency information we can intelligently pick the fastest non-local zone to place a node
The possibilities of pulling in graphs from CM all into one huge queryable data source that understands structure and relationships are really endless. You can see how we have enough information here to derive all the parent child relationships we need for intelligent monitoring.
Ideally Puppet itself would support cross node dependencies but I think that’s some way off. So we have created a hacky solution to declare the relationships now. I think though we need a rich set of relationships. Hard relationships like we have in Puppet now where failure will cause other resources to fail. But we might also have soft relationships that just exist to declare relationships that other systems like monitoring will query.
This is a simple overview of what I have in mind, I expect in the next day or three a follow up post by a co-worker that will show some of the scripts we’ve been working on showing actual queries over this huge graph. We have it working, just polishing things up a bit still.
On a side note, I think one of the biggest design wins in Puppet is that it’s data based. It’s not just a bunch of top-down scripts being run like the old Bash scripts you used to build boxes. It’s a directed graph with relationships that’s queryable and can be used to build other systems. This is a big deal in next generation thinking about systems and I think the above post highlights just a small number of the possibilities this graph brings.
Monitoring Framework: Composable Architectures
I’ve been working on rewriting my proof of concept code into actual code I might not feel ashamed to show people, this is quite a slow process so hang in there.
If you’ve been reading my previous posts you’ll know I aim to write a framework for monitoring and event correlation. This is a surprisingly difficult problem space, mostly due to the fact that we all have our own ideas of how this should work. I will need to cater for literally all kinds of crazy to really be able to call this a framework.
In its most basic form it just takes events, archives them, processes metrics, processes status and raises alerts. Most people will recognize these big parts in their monitoring systems but will also agree there is a lot more to it than this. What describes the extra bits will almost never be answered in a single description as we all have unique needs and ideas.
The challenge I face is how to make an architecture that can be changed, be malleable to all needs and in effect be composable rather than a prescribed design. These are mostly solved problems in computer science however I do not wish to build a system only usable by comp sci majors. I want to build something that infrastructure developers (read: DevOps) can use to create solutions they need at a speed reaching proof-of-concept velocity while realizing a robust result. These are similar to the goals I had when designing MCollective.
In building the rewrite of the code I opted for this pattern and realized it using middleware and a simple routing system. Routing in middleware is very capable like this post about RabbitMQ describes but this is only part of the problem.
Given the diagram above and given that events can be as simple as a metric for load and as complex as a GitHub commit notify containing sub documents for 100s of commits and can be a mix of metrics, status and archive data we’d want to at least be able to configure these behaviors and 100s like them:
- Only archive a subset of messages
- Only route metrics for certain types of events into Graphite while routing others into OpenTSDB
- Only do status processing on events that have enough information to track state
- Dispatch alerts for server events like load average alerts to a different system than alerts for application level events like payments per hour below a certain threshold. These are often different teams with different escalation procedures.
- Dispatch certain types of metrics to a system that will do alerting based on real time analysis of past trends – this is CPU intensive and you should only subject a subset of events to this processing
- Route a random 10% of the firehose of events into a development environment
- Inject code – in any language – in between 2 existing parts of the event flow and alter the events or route them to a temporary debugging destination.
We really need a routing system that can plug into any part of the architecture and make decisions based on any part of the event.
I’ve created a very simple routing system in my code that plugs all the major components together. Here’s a simple route that sends metrics off to the metric processor. It transforms events that contain metrics into graphite data:
add_route(:name => "graphite", :type => ["metric", "status"]) do |event, routes|
  routes << "stomp:///queue/events.graphite" unless event.metrics.empty?
end
You can see from the code that we have access to the full event and can make decisions based on any part of it. A sample event is below:
{"name":"puppetmaster",
"eventid":"4d9a33eb2bce3479f50a86e0",
"text":"PROCS OK: 2 processes with command name puppetmasterd",
"metrics":{},"tags":{},
"subject":"monitor2.foo.net",
"origin":"nagios bridge",
"type":"status",
"eventtime":1301951466,
"severity":0}
This event is of type status and has no metrics so it would not have been routed to the graphite system, while the event below has no status, only metrics:
{"name":"um.bridge.nagios",
"created_at":1301940072,
"eventid":"4d9a34462bce3479f50a8839",
"text":"um internal stats for um.bridge.nagios",
"metrics":{"events":49},
"tags":{"version":"0.0.1"},
"subject":"monitor2.foo.net",
"origin":"um stat",
"type":"metric",
"extended_data":{"start_time":1301940072},
"eventtime":1301951558,"severity":0}
By simply supplying this route:
add_route(:name => "um_status", :type => ["status", "alert"]) do |event, routes|
  routes << "stomp:///queue/events.status"
end
I can be sure this non status bearing event of type metric wouldn’t reach the status system where it will just waste resources.
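To make the matching concrete, here is a minimal sketch of how a router might evaluate the two routes above against the two sample events. Only add_route and the route bodies come from this post; the Router class, the destinations_for method and the Event struct are hypothetical names made up for illustration, not the framework's actual internals.

```ruby
# Hypothetical stand-in for a parsed event; only the fields the routes use.
Event = Struct.new(:type, :metrics)

# Hypothetical minimal router, assumed for illustration only.
class Router
  Route = Struct.new(:name, :types, :block)

  def initialize
    @routes = []
  end

  # Register a route; :type may be one type, a list of types, or "*".
  def add_route(options, &block)
    @routes << Route.new(options[:name], Array(options[:type]), block)
  end

  # Run every route whose type filter matches and collect the destinations.
  def destinations_for(event)
    destinations = []
    @routes.each do |route|
      next unless route.types.include?("*") || route.types.include?(event.type)
      route.block.call(event, destinations)
    end
    destinations
  end
end

router = Router.new

router.add_route(:name => "graphite", :type => ["metric", "status"]) do |event, routes|
  routes << "stomp:///queue/events.graphite" unless event.metrics.empty?
end

router.add_route(:name => "um_status", :type => ["status", "alert"]) do |event, routes|
  routes << "stomp:///queue/events.status"
end

# The two sample events, reduced to type and metrics.
status_event = Event.new("status", {})
metric_event = Event.new("metric", { "events" => 49 })

status_destinations = router.destinations_for(status_event)
metric_destinations = router.destinations_for(metric_event)
```

In this sketch the status event only reaches the status queue because its metrics are empty, and the metric event only reaches the graphite queue because the um_status route never matches its type.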
You can see the routing system is very simple and sits at the emitting side of every part of the system. If you wanted to inject code between the main Processor and Graphite here, simply route the events to your code and then back into graphite when you’re done transforming the events. As long as you can speak to middleware and process JSON you can inject logic into the flow of events.
I hope this will give me a totally composable infrastructure. I think the routes are trivial enough that almost anyone can write and tweak them, and since I am using the simplest of technologies like JSON almost any language can be used to plug into this framework and consume the events. Routes can be put into the system without restarting anything, just drop the files down and touch a trigger file – the routes will immediately become active.
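As a rough sketch of what such injected code could look like, the function below takes the raw JSON of one event, alters it and hands back JSON ready to publish to the next destination. The middleware subscribe and publish calls are deliberately left out, and the function name and the "inspected_by" tag are assumptions invented for this example.

```ruby
require 'json'

# Hypothetical transformation step sitting between two parts of the flow.
# It parses one event, adds a tag and returns JSON for the next queue.
def transform_event(raw_json)
  event = JSON.parse(raw_json)
  event["tags"] ||= {}
  event["tags"]["inspected_by"] = "debug-filter" # assumed tag, not a framework field
  JSON.generate(event)
end

# A trimmed copy of the metric event shown earlier.
incoming = '{"name":"um.bridge.nagios","type":"metric",' \
           '"metrics":{"events":49},"tags":{"version":"0.0.1"}}'

outgoing = JSON.parse(transform_event(incoming))
```

Anything that can read a message, parse JSON and publish the result could play this role, which is why the language used for the injected step does not matter.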
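One way the trigger file behaviour could be implemented is sketched below. This is a guess at the mechanism, not the framework's code: the RouteLoader class, the reload.trigger file name and the use of instance_eval are all assumptions, and a temporary directory stands in for the real routes directory.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical loader that re-reads route files when a trigger file is touched.
class RouteLoader
  attr_reader :routes

  def initialize(dir)
    @dir = dir
    @trigger = File.join(dir, "reload.trigger") # assumed trigger file name
    @loaded_at = nil
    @routes = []
  end

  # Route files call this, exactly as in the examples in this post.
  def add_route(options, &block)
    @routes << [options, block]
  end

  # Reload all *.rb files if the trigger is newer than the last load.
  def reload_if_triggered
    return false unless File.exist?(@trigger)
    mtime = File.mtime(@trigger)
    return false if @loaded_at && mtime <= @loaded_at

    @routes = []
    Dir.glob(File.join(@dir, "*.rb")).sort.each do |file|
      instance_eval(File.read(file), file)
    end
    @loaded_at = mtime
    true
  end
end

dir = Dir.mktmpdir
loader = RouteLoader.new(dir)

# Drop a route file down; nothing happens until the trigger is touched.
File.write(File.join(dir, "development.rb"), <<~ROUTE)
  add_route(:name => "development", :type => "*") do |event, routes|
    routes << "stomp:///topic/development.portal" if rand(10) == 1
  end
ROUTE

before_touch = loader.reload_if_triggered
FileUtils.touch(File.join(dir, "reload.trigger"))
after_touch = loader.reload_if_triggered
route_names = loader.routes.map { |options, _block| options[:name] }
second_check = loader.reload_if_triggered

FileUtils.remove_entry(dir)
```

A running process would call reload_if_triggered periodically, so new routes become active without a restart.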
The last example I want to show is the development route I mentioned above that siphons off roughly 10% of the firehose into your dev systems:
add_route(:name => "development", :type => "*") do |event, routes|
  routes << "stomp:///topic/development.portal" if rand(10) == 1
end
Here I am picking all event types and dumping them into a topic called development.portal, but only in roughly 10% of cases. We’re using a topic since topics don’t buffer or store events or consume much memory when the development system is down – events will just be lost when the dev system is down.
I’d simply drop this into /etc/um/routes/portal/development.rb to configure my production portal to emit raw events to my development event portal.
That’s all for today, as mentioned this stuff is old technology and nothing here really solves new problems but I think the simplicity of the routing system and how it allows people without huge amounts of knowledge to re-compose code I wrote in new and interesting ways is quite novel in the sysadmin tool space that’s all too rigid and controlled.
Monitoring Framework: Event Correlation
Since my last post I’ve spoken to a lot of people all excited to see something fresh in the monitoring space. I’ve learned a lot – primarily that no one tool will please everyone. This is why monitoring systems are so hated – they try to impose their world view, they’re hard to hack on and it’s hard to get data out. This served only to reinforce my belief that rather than build a new monitoring system I should build a framework that can build monitoring systems.
DevOps shops who can cut code should be able to build the monitoring they want, not the monitoring their vendor thought they wanted.
Thus my focus has not been on how can I declare relationships between services, or how can I declare an escalation matrix. My focus has been on events and how events relate to each other.
Identifying an Event
Events can come from many places, in the recent video demo I did you saw events from Nagios and events from MCollective. I also have event bridges for my Apache Blackbox, SNMP Traps and it would be trivial to support events from GitHub commit hooks, Amazon SNS and really any conceivable source.
Events need to be identified then so that you can send information related to the same event from many sources. Your trap system might raise a trap about a port on a switch but your stats poller might emit regular packet counts – you need to know these 2 are for the same port.
You can identify events by subject and by name; together they make up the event identity. Subject might be the FQDN of a host and name might be load or cpu usage.
This way if you have many ways to input information related to some event you just need to identify them correctly.
Finally, as each event gets stored it is given a unique ID that you can use to pull out information about just a specific instance of an event.
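A small sketch of the identity idea, using the switch port example: two events from different sources are treated as the same event when subject and name match. The event_identity helper, the host name and the port name are invented for illustration.

```ruby
require 'json'

# Hypothetical helper: the identity of an event is its subject plus its name.
def event_identity(event)
  [event.fetch("subject"), event.fetch("name")]
end

# Made-up events: one from a trap system, one from a stats poller.
trap_event = JSON.parse(
  '{"subject":"switch1.foo.net","name":"port12","origin":"snmp trap",' \
  '"type":"status","severity":2}'
)
poll_event = JSON.parse(
  '{"subject":"switch1.foo.net","name":"port12","origin":"stats poller",' \
  '"type":"metric","metrics":{"packets":1024}}'
)

# Different origins and types, same identity.
same_event = event_identity(trap_event) == event_identity(poll_event)

# Grouping by identity collects everything known about that port.
grouped = [trap_event, poll_event].group_by { |event| event_identity(event) }
```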
Types Of Event
I have identified a couple of types of event in the first iteration:

- Metric – An event like the time it took to complete a Puppet run or the number of GET requests served by a vhost
- Status – An event associated with an up/down style state transition, can optionally embed a metrics event
- Archive – An event that you just wish to archive along with others for later correlation, like a callback from GitHub saying code was committed and by whom
The event you see on the right is a metric event – it doesn’t represent one specific status and it’s a time series event which in this case got fed into Graphite.
Status events get tracked automatically – a representation is built for each unique event based on its subject and name. This status representation can progress through states like OK, Warning, Critical etc. Events sent from many different sources get condensed and summarized into a single status representing how that status looks based on the most recently received data – regardless of the source of the data.
Each state transition and each non 0 severity event will raise an Alert and get routed to a – pluggable – notification framework or frameworks.
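The tracking and alerting rule can be sketched in a few lines. This is an in-memory illustration only: the StatusTracker class is hypothetical, it uses the numeric severity as the state, and it assumes the first sighting of an event does not count as a transition, which the description above leaves open.

```ruby
# Hypothetical in-memory status tracker keyed on subject and name.
class StatusTracker
  def initialize
    @states = {} # identity => last seen severity
  end

  # Record a status event and return the reasons it should raise an alert.
  # An empty list means no alert.
  def update(event)
    identity = [event["subject"], event["name"]]
    previous = @states[identity]
    current = event["severity"]
    @states[identity] = current

    reasons = []
    # Assumption: a first sighting is not treated as a state transition.
    reasons << :transition if !previous.nil? && previous != current
    reasons << :non_zero_severity if current != 0
    reasons
  end
end

tracker = StatusTracker.new
ok = { "subject" => "monitor2.foo.net", "name" => "puppetmaster", "severity" => 0 }
critical = ok.merge("severity" => 2)

first = tracker.update(ok)        # first sighting, severity 0
second = tracker.update(critical) # OK to critical
third = tracker.update(critical)  # still critical
fourth = tracker.update(ok)       # recovery
```

Because the tracker only looks at subject and name, it does not matter which source sent each of these events.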
Event Associations and Metadata
Events can have a lot of additional data past what the framework needs, this is one of the advantages of NoSQL based storage. A good example of this would be a GitHub commit hook. You might want to store this and retain the rich data present in this event.
My framework lets you store all this additional data in the event archive and later on you can pick it up based on event ID and get hold of all this rich data to build reactive alerting or correction based on callbacks.
Thanks to conversations with @unixdaemon I’ve now added the ability to tag events with some additional data. If you are emitting many events from many subsystems out of a certain server you might want to embed into the events the version of software currently deployed on your machine. This way you can easily identify and correlate events before and after an upgrade.
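As an illustration of what the tags make possible, the sketch below groups a handful of made-up load events by their version tag and averages a metric per version. The event data, the "load1" metric name and the version numbers are invented for the example.

```ruby
# Made-up metric events, two emitted before an upgrade and two after it.
events = [
  { "name" => "load", "tags" => { "version" => "0.0.1" }, "metrics" => { "load1" => 0.5 } },
  { "name" => "load", "tags" => { "version" => "0.0.1" }, "metrics" => { "load1" => 1.5 } },
  { "name" => "load", "tags" => { "version" => "0.0.2" }, "metrics" => { "load1" => 2.0 } },
  { "name" => "load", "tags" => { "version" => "0.0.2" }, "metrics" => { "load1" => 4.0 } }
]

# Group by the deployed version, then average the metric within each group.
by_version = events.group_by { |event| event["tags"]["version"] }
average_load = by_version.transform_values do |group|
  group.sum { |event| event["metrics"]["load1"] } / group.size
end
```

With the version embedded in every event, a before and after comparison is a simple grouping rather than a hunt through deploy logs.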
Event Routing
So this is all well and fine, I can haz data, but where am I delivering on the promise to be promiscuous with your data routing it to your own code?
- Metric data can be delivered to many metrics emitters. The Graphite one is about 50 lines of code, you can run many in parallel
- Status data is stored and state transitions result in Alert events. You can run many alert receivers that implement your own desired escalation logic
For each of these you can write routing rules that tell it what data to route to your code. You might only want data in your special metrics consumer where subject =~ /blackbox/.
I intend to sprinkle the whole thing with a rich set of callbacks where you can register code that declares an interest in metrics, alerts, status transitions etc in addition to the big consumers.
You’d use this code to correlate the number of web requests in a metric with the ones received 7 days ago. You can then decide to raise a new status event that will alert Ops about trend changes proactively. Or maybe you want to implement your own auto-scaler where you’d provision new servers on demand.
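A callback of that kind might look roughly like the sketch below. It assumes the history is available as a simple hash of eventtime to request count, looks up the value from exactly 7 days earlier and returns a new status event when the change exceeds a threshold. The function name, the "requests" metric, the 50% threshold and the shape of the raised event are all assumptions.

```ruby
WEEK = 7 * 24 * 60 * 60 # seconds

# Hypothetical callback: compare a metric with the value from 7 days ago and
# return a new status event when the relative change is large enough.
def trend_event(current, history, threshold = 0.5)
  past = history[current["eventtime"] - WEEK]
  return nil if past.nil? || past.zero?

  change = (current["metrics"]["requests"] - past).to_f / past
  return nil if change.abs < threshold

  {
    "name" => "#{current['name']}.trend",
    "subject" => current["subject"],
    "type" => "status",
    "severity" => 1,
    "text" => "requests changed by #{(change * 100).round}% against 7 days ago",
    "metrics" => {},
    "tags" => {}
  }
end

now = 1301951466
history = { now - WEEK => 100 } # made-up request count from a week ago

busy = { "name" => "requests", "subject" => "www.foo.net",
         "eventtime" => now, "metrics" => { "requests" => 300 } }
normal = busy.merge("metrics" => { "requests" => 110 })

raised = trend_event(busy, history)
quiet = trend_event(normal, history)
```

A real callback would need a more forgiving lookup than an exact timestamp match, but the shape stays the same: metrics in, optionally a new status event out.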
Scaling
How does it scale? Horizontally. My tests have shown that even on modest (virtual) hardware I am able to process and route in excess of 10 000 events a minute. If that isn’t enough you can scale out horizontally by spreading the metric, status and callback processing over multiple physical systems. Each of the metric, status and callback handlers can also scale horizontally over clusters of servers.
Bringing It All Together
So to show that this isn’t all just talk, here are 2 graphs.

This graph shows web requests for a vhost and the times when Puppet ran.

This graph shows Load Average for the server hosting the site and times when Puppet ran.
What you’re seeing here is a correlation of events from:
- Metric events from Apache Blackbox
- Status and Metric events for Load Averages from Nagios
- Metric events from Puppet pre and post commands, these are actually metrics of how long each Puppet run was but I am showing it as a vertical line
This is a seamless blend of time series data, status data and randomly occurring events like when Puppet runs, all correlated and presented in a simple manner.
Thinking about monitoring frameworks
I’ve been Tweeting a bit about some prototyping of a monitoring tool I’ve been doing and had a big response from people all agreeing something has to be done.
Monitoring is something I’ve been thinking about for ages but to truly realize my needs I needed mature discovery based network addressing and ways to initiate commands on large numbers of hosts in a parallel manner. I have this now in MCollective and I feel I can start exploring some ideas of how I might build a monitoring platform.
I won’t go into all my wishes, but I’ll list a few big ones as far as monitoring is concerned:
- Current tools represent a sliding scale, you cannot look at your monitoring tool and ever know current state. Reported status might be a window of 10 minutes and in some cases much longer.
- Monitoring tools are where data goes to die. Trying to get data out of Nagios and into tools like Graphite, OpenTSDB or really anywhere else is a royal pain. The problem gets much harder if you have many Nagios probes. NDO is an abomination as is storing this kind of data in MySQL. Commercial tools are orders of magnitude worse.
- Monitoring logic is not reusable. Today with approaches like continuous deployment you need your monitoring logic to be reusable by many different parties. Deployers should be able to run the same logic on demand as your scheduled monitoring does.
- Configuration is a nightmare of static text, or worse click driven databases. People mitigate this with CM tools but there is still a long turn around time from node creation to monitored. This is not workable in modern cloud based and dynamic systems.
- Shops with skilled staff are constantly battling decades old tools if they want to extend it to create metrics driven infrastructure. It’s all just too ’90s.
- It does not scale. My simple prototype can easily do 300+ checks a second, including processing replies, archiving, alert logic and feeding external tools like Graphite. On a GBP20/month virtual machine. This is inconceivable with most of the tools we have to deal with.
I am prototyping some ideas at the moment to build a framework to build monitoring systems with.
There’s a single input queue on a middleware system, I expect an event in this queue – mine is a queue distributed over 3 countries and many instances of ActiveMQ.
The event can come from many places maybe from a commit hook at GitHub, fed in from Nagios performance data or by MCollective or Pingdom, the source of data is not important at all. It’s just a JSON document that has some structure – you can send in any data in addition to a few required fields, it’ll happily store the lot.
From there it gets saved into a capped collection on MongoDB in its entirety and gets given an eventid. It gets broken into its status parts and its metric parts and sent to any number of recipient queues. In the case of Metrics for example I have something that feeds Graphite, you can have many of these all active concurrently. Just write a small consumer for a queue in any language and do with the events whatever you want.
In the case of statuses it builds a MongoDB collection that represents the status of an event in relation to past statuses etc. This will notice any state transition and create alert events. Alert events again can go to many destinations – right now I am sending them to Angelia, but there could be many destinations with different filtering and logic for how that happens. If you want to build something to alert based on trends of past metric data, no problem. Just write a small plugin, in any language, and plug it into the message flow.
At any point through this process the eventid is available and should you wish to get hold of the original full event it’s a simple lookup away – there you can find all the raw event data that you sent – stored for quick retrieval in a schemaless manner.
In effect this is a generic pluggable event handling system. I currently feed it from MCollective using a modified NRPE agent and I am pushing my Nagios performance data in real time. I have many Nagios servers distributed globally and they all just drop events into their nearest queue entry point.
Given that it’s all queued and persisted to disk I can create really vast amounts of alerts using MCollective – it’s trivial for me to create 1000 check results a second. The events have the timestamp attached of when the check was done and even if the consumers are not keeping up the time series databases will get the events in the right order and with the right timestamps. So far on a small VM that runs Puppetmaster, MongoDB, ActiveMQ, Redmine and a lot of other stuff I am very comfortably sending 300 events a second through this process without even tuning or trying to scale it.
When I look at a graph of load average for 50 servers I see the graph change at the same second for all nodes – because I have an exact single point in time view of my server estate, and which 50 servers I am monitoring in this manner is decided using discovery on MCollective. Discovery is obviously no good for monitoring in general – you don’t know the state of stuff you didn’t discover – but MCollective can build a database of truth using registration – correlate discovery against registration and you can easily identify missing things.
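Correlating discovery against registration boils down to a set difference, sketched here with made-up host names. How the two lists are obtained from MCollective is not shown and the variable names are only illustrative.

```ruby
require 'set'

# Hosts known from registration (the database of truth) and hosts that
# answered the latest discovery. Both lists are made up for illustration.
registered = Set.new(%w[web1.foo.net web2.foo.net db1.foo.net])
discovered = Set.new(%w[web1.foo.net db1.foo.net])

# Registered but not discovered: these nodes are missing and need attention.
missing = registered - discovered

# Discovered but never registered: worth checking too, though not the main case.
unregistered = discovered - registered
```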
A free side effect of using an async queue is that horizontal scaling comes more or less for free, all I need to do is start more processes consuming the same queue – maybe even on a different physical server – and more capacity becomes available.
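The competing consumers idea can be illustrated without any middleware. In the sketch below Ruby's in-process Queue stands in for the middleware queue and threads stand in for separate consumer processes; the worker count and the :done shutdown marker are assumptions of the example.

```ruby
require 'thread'

WORKERS = 3

queue = Queue.new   # stands in for the middleware queue
results = Queue.new # collects [worker id, eventid] pairs

# Fill the queue with made-up events, then one shutdown marker per worker.
100.times { |i| queue << { "eventid" => i } }
WORKERS.times { queue << :done }

# Every worker consumes from the same queue; each event goes to exactly one.
workers = Array.new(WORKERS) do |worker_id|
  Thread.new do
    while (event = queue.pop) != :done
      results << [worker_id, event["eventid"]]
    end
  end
end
workers.each(&:join)

handled = []
handled << results.pop until results.empty?
```

Adding capacity here means raising the worker count; with a real middleware queue it means starting another consumer process, possibly on another machine.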
So this is a prototype, it’s not open source – I am undecided what I will do with it, but I am likely to post some more about its design and principles here. Right now I am only working on the event handling and routing aspects as the point in time monitoring is already solved for me as is my configuration of Nagios, but those aspects will be mixed into this system in time.
There’s a video of the prototype receiving monitor events over mcollective and feeding Loggly for alerts here.