↓ Archives ↓

Category → metrics

Devops Areas – Codifying devops practices

While working on the Devops Cookbook with my fellow authors Gene Kim,John Willis,Mike Orzen we are gathering a lot of "devops" practices. For some time we struggled with structuring them in the book. I figured we were missing a mental model to relate the practices/stories to.

This blogpost is a first stab at providing a structure to codify devops practices. The wording, descriptions are pretty much work in progress, but I found them important enough to share to get your feedback.

Devops in the right perspective

As you probably know by now, there are many definitions of devops. One thing that occasionally pops up is that people want to change the name to extend it to other groups within the IT area: star-ops, dev-qa-ops, sec-ops, ... From the beginning I think people involved in the first devops thinking had the idea to expand the thought process beyond just dev and ops. (but a name bus-qa-sec-net-ops would be that catchy :).

I've started reffering to :

  • devops : collaboration,optimization across the whole organisation. Even beyond IT (HR, Finance...) and company borders (Suppliers)
  • devops 'lite' : when people zoom in on 'just' dev and ops collaboration.

As rightly pointed out by Damon Edwards , devops is not about a technology , devops is about a business problem. The theory of Contraints tells us to optimize the whole and not the individual 'silos'. For me that whole is the business to customer problem , or in lean speak, the whole value chain. Bottlenecks and improvements could be happen anywhere and have a local impact on the dev and ops part of the company.

So even if your problem exists in dev or ops, or somewhere between, the optimization might need to be done in another part of the company. As a result describing pre-scriptive steps to solve the 'devops' problem (if there is such a problem) are impossible. The problems you're facing within your company could be vastly different and the solutions to your problem might have different effects/needs.

If not pre-scriptive, we can gather practices people have been doing to overcome similar situations. I've always encouraged people to share their stories so other people could learn from them. (one of the core reasons devopsdays exists) This helps in capturing practices, I'd leave it in the middle to say that they are good or best practices.

Currently a lot of the stories/practices are zooming in on areas like deployment, dev and ops collaboration, metrics etc.. (Devops Lite) . This is a natural evolution of having dev and ops in the term's name and given the background of people currently discussing the approaches. I hope that in the future this discussion expands itself to other company silos too: f.i. synergize HR and Devops(Spike Morelli) or relate our metrics to financial reporting.

Another thing to be aware of is that a system/company is continously in flux: whenever something changes to the system it can have an impact; So you can't take for granted that problems,bottle-necks will not re-emerge after some time. It needs continuous attention. That will be easier if you get closer to a steady-state, but still, devops like security is a journey, not an end state.

Beyond just dev and ops

Let's zoom in on some of the practices that are commonly discussed: the direct field between 'dev' and 'ops'.

In most cases, 'dev' actually means 'project' and 'ops' presents 'production'. Within projects we have methodologies like (Scrum, Kanban, ...) and within operations (ITIL, Visble Ops, ...). Both parts have been extending their project methodology over the years: from the dev perspective this has lead to 'Continous Delivery' and from the Ops side ITIL was extended with Application Life Cycle (ALM). They both worked hard on optimize the individual part of the company and less on integration with other parts. Those methodologies had a hard time solving a bottleneck that outside their 'authority'. I think this where devops kicks in: it seeks the active collaboration between different silos so we can start seeing the complete system and optimize where needed, not just in individual silos.

Devops Areas

In my mental model of devops there are four 'key' areas:

  • Area 1 : Extend delivery to production (Think Jez Humble) : this is where dev and ops collaborate to improve anything on delivering the project to production
  • Area 2 : Extend Operation to project (Think John Allspaw) : all information from production is radiated back to the project
  • Area 3 : Embed Project(Dev) into Operations : when the project takes co-ownership of everything that happens in production
  • Area 4 : Embed Production(Ops) into Project : when operations are involved from the beginning of the project

In each of these areas there will be a bi-directonal interaction between dev and ops, resulting in knowledge exchange and feedback.

Depending on where your most pressing 'current' bottleneck manifests itself, you may want to address things in different areas. There is no need to first address things in area1 than area2. Think of them as pressure points that you can stress but requiring a balanced pressure.

Area 1 and Area2 tend to be heavier on the tools side , but not strictly tools focused. Area3 and Area4 will be more related to people and cultural changes as their 'reach' is further down the chain.

When visualized in a table this gives you:

As you can see:

  • the DEV and OPS part keep having their own internal processes specific to their job
  • the two processes are becoming aligned and the areas extend both DEV and OPS to production and projects
  • it's almost like a double loop with area1 and area2 as the first loop and area3 and area4 as the second loop

Note 1: these areas definitely need 'catchier' names to make them easier to remember. Note 2: Ben Rockwoods post on "The Three Aspects of Devops" lists already 3 aspects but I think the areas make it more specific

Area Layers

In each of these areas, we can interact at the traditional 'layers' tools, process, people:

So whenever I hear story , I try to relate it's practice to one of these areas as described above and the layer it's adressing. Practices can have an impact at different layers so I see them as 'tags' to quickly label stories. Another benefit is that whenever you look at an area, you can ask yourself what practices we can do to improve each of these layers. To have a maximum impact on each of the layers, it's clear that the approach needs to be layered in all three.

The ultimate devops tools would support the whole people and process in all of these areas, not just in Area1 (deployment) or Area2 (monitoring/metrics). Therefore a devops toolchain with different tools interacting in each of the areas makes more sense. Also the tool by itself doesn't make it a devops tool: configuration mangement systems like chef and puppet are great, but when applied in Ops only don't help our problem much. Of course Ops gets infrastructure agilitity, but it isn't until it is applied to the delivery (f.i. to create test and development environments) that it becomes 'devops'. This shows that the mindset of the person applying the tool makes it a devops tool, not the tool by itself.

Area Maturity Levels

Now that we have the areas and layers identified, we want to track progress as we start solving our problems and are improving things.

Adrian Cockroft suggested using CMMI levels for devops:

CMMI levels allow you to quantify the 'maturity' of your process. That addresses only one layer (although an equally important one). In a nutshell CMMI describes the different levels as:

  1. Initial : Unpredictable and poorly controlled process and reactive nature
  2. Managed : Focused on project and still reactive nature
  3. Defined : Focused on organization and proactive
  4. Quantively Managed : Measured and controller approach
  5. Optimizing : Focus on Improvement

All these levels could be applied to dev , ops or devops combined. It gives you an idea at what level process is in, while you are optimizing in an area.

An alternative way of expressing maturity levels is used by the Continuous Integration Maturity Model. It puts a set of practices in levels of maturity: (industry consensus)

  1. Intro : using source control ...
  2. Novice : builds trigger by commit ...
  3. Intermediate : Automated deployment to testing ..
  4. Advanced : Automated Functional testing ...
  5. Insane : Continuous Deployment to Production ...

Instead of focusing on the proces only , it could be applied to a set of tools, process or people practices. What people consider the most advanced would get the highest maturity level.

Practices, Patterns and principles

A practice could be anything from an anecdotal item to a systemic approach. Similar practices can be grouped into patterns to elevate them to another level. Similar to the Software Design Patterns we can start grouping devops practices in devops patterns.

Practices and patterns will rely on principles and it's these underlying principles that will guide you when and you to apply the pattern or practice. These principles can be 'borrowed' from other fields like Lean, Systems Theory etc, Human Psychology. The principles are what the agile manifesto is about for example.

Slowly we will turn the practices -> patterns -> principles .

Note: I'm wondering if there will be new principles that will emerge from from devops itself or it will be apply existing principle to a new perspective.

A few practical examples:

Below are a few example 'practices' codified in a standard template. The practices/patterns/principles are not yet very well described. The point is more that this can serve as a template to codify practices.

Area Indicators

The idea is to list metrics/indicators that can tracked. The numbers as such might be not be too relevant but the rate of change would be. This is similar to tracking the velocity of storypoints or the tracking of mean time to recovery.

Note: I'm scared of presenting these as metrics to track, therefore I call them indicators to soften that.

Examples would be :

  • Tools Layer : Deploys/Day
  • Process Layer : Number of Change Requests/Day
  • People Layer : People Involved per deploy

This is not yet fleshed out enough , I'm guessing it will be based on my research done for my Velocity 2011 Presentation (Devops Metrics)

Devops Scorecard

To present progress during your 'devops' journey you can put all these things in a nice matrix, to get an overview on where you are at optimizing at the different layers and areas.

Obviously this only makes sense if you don't lie to yourself, your boss, your customers.

Project Teams, Product Teams and NOOPS

Jez Humble often talks about project teams evolving to product teams: largere silos will split of not by skill, but for product functionality they are delivering. Splitting teams like that, has the potential danger of creating new silos. It's obvious these product teams need to collaborate again. You should treat other product teams are external dependencies, just like other Silos. The areas of interaction will be very similar.

Also you can see the term NOOPS as working with product teams outside your company, like you rely on SAAS for certain functions. It's important not only to integrate in each of the areas on the tools layer, but also on the people and process layer. Something that is often forgotten. Automation and abstraction allows you to go faster but when things fail or even changes occur, synchronisation needs to happen.

CAMS and areas

The CAMS acronym (Culture, Automation, Measurement, Sharing) could be loosely mapped onto the areas structure:

  • Automation seems to map to Area1: the delivery process
  • Measurement seems to map to Area2: the feedback process
  • Culture to Area3 : embedded devs in Production
  • Sharing to Area4: embedded ops in Projects

Of course automation, measurement, culture and sharing can happen in any of the areas, but some of the areas seem to have a stronger focus on each of these parts.

Conclusion

Devops areas, layers and maturity levels, give us a framework to capture new practices stories and it can be used to identify areas of improvements related to the devops field. I'd love feedback on this. If anyone wants to help, I'd like to bring up a website where people can enter their stories in this structure and make it easily available for anyone to learn. I don't have too much CPU cycles left currently , but I'm happy to get this going :)

P.S. @littleidea: I do want to avoid the FSOP Cycle

Monitoring Wonderland Survey – Visualization

A picture tells more than a ...

Now that you've collected all the metrics you wanted or even more , it's time to make them useful by visualizing them. Every respecting metrics tool provides a visualization of the data collected. Older tools tended to revolve around creating RRD graphics from the data. Newer application are leveraging javascript or flash frameworks to have the data updated in realtime and rendered by the browser. People are exploring new ways of visualizing large amounts of data efficiently. A good example is Visualizing Device Utilization by Brendan Gregg. or Multi User - Realtime heatmap using Nodejs

Several interesting books have been written about visualization:

Dashboard written for specific metric tools

Graphite

Graphs are Graphite's killer feature, but there's always room for improvement:

Grockets - Realtime streaming graphite data via socket.io and node.js

Opentsdb

Graphs in Opentsdb are based on Gnuplot

Ganglia

Collectd

Nagios

Nagios also has a way to visualize metrics in it's UI

Overall integration

With all these different systems creating graphs, the nice folks from Etsy have provided a way to navigate the different systems easily via their dashboard - https://github.com/etsy/dashboard

I also like the Idea of Embeddable Graphs as http://explainum.com implements it

Development frameworks for visualization

Generic data visualization

There are many javascript graphing libraries. Depending on your need on how to visualize things, they provide you with different options. The first list is more a generic graphic library list

Time related libraries

To plot things many people now use:

For timeseries/timelines these libraries are useful:

And why not have Javascript generate/read some RRD graphs :

Annotations of events in timeseries:

On your graphs you often want event annotated. This could range from plotting new puppet runs , tracking your releases to everything that you do in the proces of managing your servers. This is what John Allspaw calls Ops-Metametrics

These events are usually marked as vertical lines.

Dependencies graphs

One thing I was wondering is that with all the metrics we store in these tools, we store the relationships between them in our head. I researched for tools that would link metrics or describe a dependency graph between them for navigation.

We could use Depgraph - Ruby library to create dependencies - based n graphviz to draw a dependency tree, but we obviously first have to define it. Something similar to the Nagios dependency model (without the strict host/service relationship of course)

Conclusion

With all the libraries to get data in and out and the power of javascript graphing libraries we should be able to create awesome visualizations of our metrics. This inspired me and @lusis to start thinking about creating a book on Metrics/Monitoring graphing patterns. Who knows ...

Monitoring Wonderland Survey – Moving up the stack Application and User metrics

While all the previously described metric systems have easy protocols, they tend to stay in Sysadmin/Operations land. But you should not stop there. There is a lot more to track than CPU,Memory and Disk metrics. This blogpost is about metrics up the stack: at the Application Middleware, Application and the User Usage.

To the cloud

Application Metrics

Maybe grumpy sysadmins have scared the developers and business to the cloud. It seems that the space of Application metrics, whether it's Ruby, Java , PHP is being ruled today by New Relic In a blogpost New Relic describes serving about 20 Billion Metrics A day.

It allows for easy instrumentation of ruby apps, but they also have support for PHP, Java, .NET, and Python

Part of their secret of success is the easy at how developers can get metrics from their application by adding a few files, and a token.

Several other cloud monitoring vendors are stepping into arena, and I really hope to see them grow the space and give some competition:

Some other complementary services, popular amongst developers are:

Check this blogpost on Monitoring Reporting Signal, Pingdom, Proby, Graphite, Monit , Pagerduty, Airbrake to see how they make a powerful team.

User tracking Metrics - Cloud

Clicks, Page view etc ...

Besides the application metrics, there is one other major player in web metrics. Google Analytics

I found several tools to get data out of it using the Google Analytics API

With google Analytics there is always a delay on getting your data;

If you want to have realtime statistics/metrics checkout Gaug.es http://get.gaug.es :

A/B Testing

Haven't really gotten into this, but well worth exploring getting metrics out of A/B testing

Page render time

Another important to track is the page render time. This is well explained in the Real User Monitoring- Chapter 10 - Complete Web Monitoring - O'Reilly Media

Again Newrelic provides RUM : Real User Monitoring. See How we provide real user monitoring: A quick technical review for more technical info

Who needs a cloud anyway

Putting your metrics into the cloud can be very convenient , but it has downsides:

  • most tools don't have way to redirect/replicate the metrics they collect internally
  • that makes it hard to correlate with your internal metrics
  • it's easy to get metrics in, but hard to get the full/raw data out again
  • it depends on the internet , duh, and sometimes this fails :)
  • or privacy or the volume of metrics just isn't possible to put it out in the cloud

Application Metrics - Non - Cloud

In his epic Metrics Anywhere, Codahale explains the importance of instrumenting your code with metrics. This looks very promising as this is really driven from the developers world:

Java

Or you can always use JMX to monitor/metrics from your application

And with JMX-trans http://code.google.com/p/jmxtrans you can feed jmx information into Graphite, Ganglia, Cacti/Rrdtool,

Other

Esty style: StatsD

To collect various metrics, Etsy has created StatD https://github.com/etsy/statsd a network daemon for aggregating statistics (counters and timers), rolling them up, then sending them to graphite.

There have been written clients in many languages php, java, ruby etc..

Other companies have been raving about the benefits of StatsD and for example Shopify has completely integrated it in their environment

It's incredible to see the power and simplicity of this; I've created a simple Proof of Concept to extract the statsd metrics on ZeroMQ in this experimental fork

MetricsD https://github.com/tritonrc/metricsd tries to marry both Etsy's statsD and the Coda Hale / Yammer's Metrics Library for the JVM and puts the data into Graphite. It should be drop-in compatible with Etsy's statsd, although with added explicit support for meters (with the m type) and gauges (with the g type) and introduce the h (histogram) type as an alias for timers (ms).

User tracking - Non Cloud

Clicks, Page view etc ...

Here are some Open Source Web Analytics libraries. These are merely links, haven't investigated it enough, work in progress

Another tool worth mentioning for tracking endusers is HummingBird - http://hummingbirdstats.com/ . It is NodeJS based an allows for realtime web traffic visualization. To send metrics is has a very simple UDP protocol.

A/B Testing

At Arrrrcamp I saw a great presentation on A/B Testing by Andrew Nesbitt(@teabass. Do watch the video to get inspired!

He pointed out several A/B testing frameworks:

And presented his own A/B Testing framework: Split - http://github.com/andrew/split

It would be interesting to integrate this further into traditional Monitoring/Metrics tools. View metrics per new version/enabled flags etc... In a Nutshell food for thought.

Page render time

For checking the page render time, I could not really found Open Source Alternatives.

There is a page by Steve Sounders about Episodes http://stevesouders.com/episodes/paper.php. Or you can track your Apache logs with Mod Log I/O

Conclusion

It's exciting to see the cross over between both development, operations and business. Up until now only New Relic has a very well integrated suite for all metrics. Hope the internal solutions catch up.

Now that we have all that data, it's time to talk about dashboards and visualization. On to the next blogpost.

If you are using other tools, have ideas, feel free to add them in the comments.

Devops Metrics – Velocityconf 2011

Whenever you hear a new theory or idea (like devops), people ask for proof before they engage. This is only natural I guess. This was the reason why I wanted to explore ways to measure devops success. Or rephrased: "Measuring the devops gap" . The result of my findings were presented at VelocityConf in June. While I initially planned to present with Israel Gat due to circumstances he switched at the last responsible moment with Andrew Shafer.

In the talk we used the metaphor of monitoring to explore the metrics. This probably put a lot of people on the wrong track, expecting a lot of real/ganglia metrics. The main point is that the higher level your monitoring the more interesting: measure the end-user perspective in both technical and human monitoring gives you the most value. Monitoring a server is like monitoring an individual person: good to know ,but it doesn't tell you anything on the end-result. I know this is meta-stuff , so you need your head clear to understand.

One of the nicest findings during the research was that whenever people say collaboration(in general) will improve things, there is a high demand for proof. The phrase "Collaboration is like a black hole, you can only measure it effects by looking at it's effects".

Also it makes no sense to increase the number of interactions between different groups (dev, ops, qa, mgt) for the sake of increasing it: "More interaction doesn't mean a better party" . You should work on the quality of the interactions.

A third take-away I got from the research what that in the Design world, they too are exploring ideas outside the usual user-centered design and are going the way of Participatory design. This fits directly to the devops idea, you automate and do all the stuff you need to, to free up more time in design. And there you can collaborate with your collegues, your peers from other groups and even with your end-users to test out new ideas. In the past design used to be a collective and shared ability. Only in the recent years this craft has become an individual thing.

Enjoy the presentation :