
Category → monitoringsucks

Assimilation Message Formats

I've recently been discussing some nitty-gritty details about how packets are laid out on the wire with a member of the Assimilation Project, so it seemed good to explain to everyone how our packets are constructed and deconstructed while I was at it. This blog post talks about how we lay out the bytes on the wire, with a bit of why and how.

How heartbeats fit into hierarchies of watchers – and pings don’t – or Who will watch the watchmen?

Somehow people seem to think that heartbeats and pings are the same thing. They're not at all the same thing. Heartbeats are typically semi-intelligent elements in a hierarchy of watchers. Pings can't play the same role. This blog post talks about why, and how hierarchies of watchers work both in general and, more specifically, in the case of the Assimilation Monitoring Project (AMP), which breaks with tradition in a useful and novel way.

How to implement "no news is good news" monitoring reliably

Sometimes when you've been doing something for a long time, it's easy to take for granted the things you know. In recent days, I've run into several people who think that a "no news is good news" methodology for monitoring can't possibly be reliable. So, this blog post is about exactly that: how the Assimilation Monitoring Project (AMP) follows a "no news is good news" methodology and still reports failures reliably. For monitoring, what you want is to know that if something fails, you'll know. In the worst case, you might not diagnose it correctly, but you'll know that something's wrong. So, let's look at how this is architected and how everything is monitored, and see if we can find a way for a failure to slip past our monitoring net and go unreported.
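To make the idea a bit more concrete, here is a minimal Python sketch of a watcher that stays silent while heartbeats keep arriving and speaks up only when one goes missing. It is an illustration, not the AMP code: the `HeartbeatWatcher` class, its `deadline` parameter, and the peer names are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List, Set


class HeartbeatWatcher:
    """Hypothetical watcher: silent while heartbeats arrive on time."""

    def __init__(self, deadline: float) -> None:
        self.deadline = deadline              # max seconds allowed between heartbeats
        self.last_seen: Dict[str, float] = {}  # peer -> time of last heartbeat
        self.reported: Set[str] = set()        # peers already reported as dead

    def heartbeat(self, peer: str, now: float) -> None:
        # Good news: record it and say nothing.
        self.last_seen[peer] = now
        self.reported.discard(peer)

    def check(self, now: float) -> List[str]:
        # Bad news: return each overdue peer exactly once.
        newly_dead = sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.deadline and peer not in self.reported
        )
        self.reported.update(newly_dead)
        return newly_dead


watcher = HeartbeatWatcher(deadline=3.0)
watcher.heartbeat("server-a", now=0.0)
watcher.heartbeat("server-b", now=0.0)
watcher.heartbeat("server-a", now=2.5)
print(watcher.check(now=4.0))  # only server-b is overdue at this point
```

The point of the sketch is that nothing is sent upstream in the normal case. The obvious gap is that the watcher itself can die, which is why the watchers have to be watched too.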

Love, MonitoringLove

Last year we were pretty negative about monitoring. We shouted out that MonitoringSucked ... A year has passed and a lot has changed ... most importantly our newfound love for monitoring, thanks to an inspirational Ignite talk by Ulf Mansson at devopsdays Rome.

Right after FOSDEM, about 20 people showed up at the #monitoringlove hacksessions hosted at the Inuits.eu offices to work on open source monitoring projects and exchange ideas. Some were completely new, others already had a lot of experience.

Among the projects worked on: Maciej was packaging Graphite for Debian, other people were fixing bugs in Puppet, and I spent some time with a Vagrant box to deploy Sensu using Puppet. The last time I played with Sensu was on the flight back from PuppetCon. I gave up the fight with RabbitMQ and SSL because I had no internet connection ... and now Ulf just pointed out that I could disable SSL altogether, which resulted in having a POC up and running in no time.

Patrick was hacking on the Chef counterpart of the vagrant-puppet Sensu setup, a part of #monigusto. Ulf Mansson was getting Dashing to display on a Raspberry Pi ... pretty cool stuff.
And Jelle Smet was working on Pyseps, a Python-based Simple Event Processing Server framework that consumes JSON docs from RabbitMQ and forwards them in real time to other queues using MongoDB query syntax.

One of the more interesting discussions was around alerting: modeling business rules and combining input from a lot of different sources in order to send the right alerts to the right people.

We explored different ideas, like using BPM tools such as Activiti or rules engines like Ruby Rools. There are some SaaS providers that try to solve this need, like PagerDuty and friends, but obviously there is still a lot of work to be done in order to create a viable alerting system based on different input sources.

The monitoring problem is not solved yet .. and it will stay around for a couple of years .. but with the advent of events such as Monitorama it's clear that an event like our #monitoringlove hacksessions is needed .. and is probably here to stay for a couple of years.

Reliable UDP: The last major Assimilation feature before the first release

I'm still on track for a first release of the Assimilation code by the end of the year. But there is one last interesting (meaning tricky) feature to write before this release. All communication is over UDP, which means the OS doesn't guarantee packet delivery. So we need to do that ourselves. From an availability perspective, we need to acknowledge packets at the application layer anyway, so nothing much is lost. (Why that's the case is worthy of its own post.) The most interesting part of this is that our protocol needs to be resilient to replay attacks. This post explains what a replay attack is, and how we plan on eliminating them.
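For readers who haven't met replay attacks before, here is a small Python sketch of one common defense: the receiver remembers the highest sequence number it has accepted from each sender and drops anything that isn't newer. This is a generic illustration and not necessarily the scheme the Assimilation code uses. The `ReplayFilter` class, the `session_id` field, and the sender names are assumptions made up for the example, and a real protocol would also have to authenticate packets so an attacker can't simply forge a higher number.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ReplayFilter:
    """Hypothetical receiver-side filter; not the Assimilation implementation."""

    # sender -> (session id, highest sequence number accepted so far)
    _seen: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def accept(self, sender: str, session_id: int, seq: int) -> bool:
        last = self._seen.get(sender)
        if last is not None:
            last_session, last_seq = last
            if session_id < last_session:
                return False  # packet from an older session: replay
            if session_id == last_session and seq <= last_seq:
                return False  # already seen this sequence number: replay
        # New session or newer packet: remember it and let it through.
        self._seen[sender] = (session_id, seq)
        return True


rf = ReplayFilter()
print(rf.accept("nanoprobe-1", session_id=1, seq=1))  # first packet from this sender
print(rf.accept("nanoprobe-1", session_id=1, seq=1))  # same packet captured and resent
```

The session id in this sketch exists so that a sender which restarts and begins counting from 1 again isn't mistaken for a replay, as long as it picks a larger session id than before. The sketch also drops packets that arrive out of order, which a reliable protocol would handle with retransmission.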

Assimilation Project Licensing

When I founded the Assimilation project, I chose a license in order to have chosen a license. I always assumed I would make a final license decision before the first release. With that time coming up in the foreseeable future, it seems like time to give thought to a more permanent license decision. This blog entry outlines my thoughts on choice of licenses and related issues.

Assimilation Monitoring LinuxCon Video

I mentioned a few weeks ago that my talk at LinuxCon in San Diego had been very well received. Thanks to some good friends, we also created a video of the event, and this week I want to point you to the final cut of that video. This talk is a great introduction to the Assimilation Monitoring Project.

I see dead servers – in O(1) time

The title for this blog post comes from a T-shirt I had made for the Assimilation Project. I wore a nicer version of it at my recent talk at LinuxCon 2012. The Assimilation project has some significant and unique claims to scalability. Some of these have been discussed before. This blog article will explain the different aspects of the project and how they measure up in terms of scalability.

Injecting Nanoprobes into Servers – What’s that about?

I've recently had some people ask how nanoprobes work: are they clients, or what exactly are they? They start out like clients, and behave in some ways like peers, and maybe a bit like servers. So what the heck are they? The simplest explanation is that they are autonomous delegates of the central management authority. Read on to find out more about how this unconventional model works and why this authority model is key to unprecedented scalability and stealth discovery™ in the discovery-driven Assimilation monitoring project.

An Assimilation type schema in Neo4j

This week I want to talk about an aspect of the Assimilation database schema which is somewhat controversial, an aspect of the schema for which the jury is still out. I chose to represent the Assimilation node type hierarchy with relationships which currently serve no purpose other than to represent the types of nodes in the database. This post will talk about why I put the type hierarchy in, and why it might be a good idea, or maybe not.