
Rundeck and Automating Operations at Salesforce (Videos)

A few interesting videos have been posted over on Rundeck.org talking about Salesforce’s internal automation project, code-named Gigantor. I’ve embedded the videos below.

It’s a great example of using a toolchain philosophy to quickly build effective solutions at scale:

  • Rundeck is the workflow engine and system of record
  • SaltStack is the distributed execution engine (Salesforce’s Kim Ho wrote the SaltStack plugin for Rundeck)
  • Kingpin is a custom front-end that builds Salesforce-specific concepts and additional security constraints into the user experience


 

Kim Ho explains Gigantor’s architecture and gives a demo of the SaltStack Plugin for Rundeck:

 

Alan Caudill presents an overview of how Gigantor works and some of the design choices:

 

 


One minute hacks: the nautilus scripts folder

Master SDN hacker Flavio sent me some tunes. They were sitting on my desktop in a folder:

$ ls ~/Desktop/
uncopyrighted_tunes_from_flavio/

I wanted to listen to them while hacking, but what was the easiest way…? I wanted to use the nautilus file browser to select which folder to play, and the totem music/video player to do the playing.

Drop a file named totem into:

~/.local/share/nautilus/scripts/

with the contents:

#!/bin/bash
# o hai from purpleidea
exec totem -- "$@"

and make it executable with:

$ chmod u+x ~/.local/share/nautilus/scripts/totem

Now right-click on that music folder in nautilus, and you should see a Scripts menu. In it there will be a totem menu item. Clicking on it should load up all the contents in totem and you’ll be rocking out in no time. You can also run scripts with a selection of various files.
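
If you’re curious what a script actually receives, nautilus passes the selection as arguments and also exports a few NAUTILUS_SCRIPT_* environment variables. Here’s a tiny sketch you could drop into the same scripts directory (the notify-send call assumes libnotify is installed):

#!/bin/bash
# sketch: show what nautilus hands to a script
# the selection arrives as "$@", and also newline-separated in
# $NAUTILUS_SCRIPT_SELECTED_FILE_PATHS; the folder you right-clicked
# in is in $NAUTILUS_SCRIPT_CURRENT_URI
notify-send 'scripts demo' "got $# item(s) in: ${NAUTILUS_SCRIPT_CURRENT_URI}"
for f in "$@"; do
    printf '%s\n' "$f" >> ~/nautilus-selection.log
done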

Here’s a screenshot:

nautilus is pretty smart and even lets you know that this folder is special

I wrote this to demonstrate a cute nautilus hack. Hopefully you’ll use this idea to extend this feature for something even more useful.

Happy hacking,

James

 


Common Objections to DevOps from Enterprise Operations

I’ve been in many large enterprise companies, helping them learn about devops and understand how to improve their service delivery capability. These companies have heard about devops and are looking for help creating a strategy to adopt devops principles, because they need better time to market and higher quality. Not everyone in these companies believes in devops, for different reasons. To some, devops sounds like a free-for-all where devs make production changes. To others, it sounds like a bunch of nice-sounding high ideals, or something that can’t be adopted because the necessary automation tooling does not exist for their domain.


In the enterprise, the operations group is often centralized and supports many different application groups. When it comes to site availability, the buck stops with ops. If there is a performance problem, outage or issue, the ops team is the first line of defense, sometimes escalating issues back to the application team for bug fixes or for help diagnosing a problem.

Enterprises interested in devops are also usually practicing or adopting agile methodology, in which case demands on ops happen more often: during sprints (e.g., to set up a test environment) or after a sprint, when ops needs to release software to the production site. The quickened pace puts a lot more pressure on the centralized ops team, because they often get the work late in the project cycle (i.e., when it’s time to release to production). Because of time pressure, or because they are overworked, operations teams have difficulty turning requested work around and begin to hear that developers want to do things for themselves. Those users might want to rebuild servers, get shell access, install software, run commands and scripts, provision VMs, modify network ACLs, update load balancers, etc. These users essentially want to do things for themselves and might feel like the centralized ops team needs to get out of their way.

How does the ops team, historically the one responsible for uptime in the production environment, permit or expand access to environments they support? How can they avoid being the bottleneck at the tail end of every application team’s project cycle? How does the business remove the friction but not invite chaos, outages and lack of compliance?

If you’re in this kind of enterprise environment, how do you start approaching devops? If you are a centralized operations team facing pressure to adopt devops, here are some questions and concerns for the organization to ask or think about. The answers to these questions are important steps toward forming your devops strategy.

How does a centralized group handle the work that needs to be done to make applications run in production or across other environments?

Some enterprises begin by creating a specialized team called “devops” whose purpose is to solve “devops problems”. Generally, this means making things more operations-friendly. This kind of team might also be the group that takes the handoff from application development teams, wraps their software in automation tooling, deploys it, and hands it off to the Site Reliability team. Unfortunately, a centralized devops team can become a silo and suffer from the same “late in the cycle” handoff challenges the traditional ops group sees. Also, there are always more developers and development projects than there are devops engineers and devops team bandwidth. A centralized devops team can end up facing the same pressures a traditional QA department does when it tries “adding quality testing” as a separate process stage.

To make sure an application operates well in production and across other environments, the devops concerns must be baked into the application architecture. This means the work to make applications easy to configure, deploy and monitor is done inside the development stage. The centralized operations group must then learn to develop a shared software delivery process and tool chain. It’s inside the delivery tool chain that the work gets distributed across teams. The centralized ops group can support the tool chain as architects and service providers, giving the application development teams a framework and scaffolding to populate with the artifacts that drive their pipeline.

What about our compliance policies?

Most enterprises abide by a change policy that dictates who can make production changes. Many times this policy is interpreted to mean that nobody outside of ops is allowed to push changes: software must be handed off to an ops person to push the change. This handoff can introduce extra lead time and possibly errors due to lack of information.

These compliance rules are defined by the business, and many times people on the delivery end have never actually read the language of these policies, basing process on assumptions or on beliefs formed by tribal knowledge. Over time, tools and processes can morph in arcane ways, twisting into inefficient bureaucracy.

It’s common to find that different compliance rules apply depending on the application or customer type. When thinking about how to reduce delivery cycle time, these differences should be taken into account, because there might be alternative ways of deciding who can make a change and how.

Besides understanding the compliance rules, it should also be simple and fast to audit your compliance.

This means make it easy to find out:

  • who made the change and were they authorized
  • where the change was applied
  • what change was made and is it acceptable

This kind of query should be instantly accessible and not something done through manual evidence gathering long after the fact (e.g., when something went wrong). Knowing how change was made to an environment should be as visible as seeing a report that shows how busy your servers were in the last 24 hours. These audit views should contain infrastructure and artifact information, because both development and operations people want to know about their environments in software and server terms. A change ticket with a bunch of verbiage and bug links does not paint a complete enough picture.
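
As a sketch of what “instantly accessible” could look like: if every change event lands in a structured log, the who/where/what questions above become one-liners. The JSON-lines format and the log path here are hypothetical:

# hypothetical audit log: one JSON object per change event, e.g.
# {"time":"2013-11-05T14:02:11Z","user":"jsmith","authorized":true,
#  "host":"web-042","artifact":"app-1.4.2","change":"deploy"}

# who made changes to web-042, and were they authorized?
jq -r 'select(.host=="web-042") | [.time, .user, .authorized, .change] | @tsv' \
    /var/log/change-audit.jsonl

# where was artifact app-1.4.2 applied?
jq -r 'select(.artifact=="app-1.4.2") | .host' /var/log/change-audit.jsonl | sort -u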

How do you open access but not lose controls?

After walking through a software delivery process, it’s easy to see that the flow of work slows anytime the work must be done by a single team that is already past its capacity and losing effectiveness due to context switching between competing priorities. This is the situation an ops team often finds itself in. Ops teams balance work that comes from application development teams (e.g., participating in agile dev sprints), network operations (e.g., handling outages and production issues), business users (e.g., gathering info for compliance, asset info for finance) and, finally, their own project work to maintain or improve infrastructure.

To free this process bottleneck, the organization must figure out how the work can be redistributed or satisfied by some self-service function. Since deployment, configuration and monitoring are ops concerns that should be designed into the application, distribute this development to the developers. This can really be a collaboration where ops maintains a base set of automation modules and gives developers ways to extend it. Create a development environment and tooling that lets developers integrate their changes into this ops framework in their own project sandboxes. Provide developers access to create hosted environments easily through a self-service interface that spins up the VMs or containers and lets them test the ops management code.

Build the compliance audit logging into the ops management framework so you can track what resources are being created and used. This is important when resource conflicts occur, and it lets you learn where more sandboxing is needed or where more fine-grained configuration should be defined.

Moving faster leads to less quality, right?

To the business, moving fast is critical to staying competitive by increasing their velocity of innovation. This need to quicken the software delivery pace is almost always the chief motivation to adopt devops practices.

Devops success stories often begin with how many deployments are done a day. Ten deploys a day, 1000 deploys a day. To an enterprise these metrics can sound mythical. Some enterprises struggle to make one deploy a month, and I have seen some making major releases on an annual basis, with the rollout of the release to their customers taking over 30 days. That’s thirty days of lag time, which puts the production environment in an inconsistent state and makes it hard for everyone to cope with production issues. “Is it the new version or the old version causing this yet-unidentified issue?” A primary reason operations is reluctant to move faster is the problems that occur during or after a change has been made.

When change leads to problems these are typical outcomes:

  • More control process is added (more approval gates, shorter change windows)
  • Change batches get bigger (cram more work into the given change window)
  • Increase in “emergency fixes” (high priority features get fast tracked to avoid the normal change process)
  • High pressure to make application changes quickly results in patching systems directly rather than going through the normal software release cycle

Given these outcomes, the idea of moving faster sounds crazy: obviously it will just lead to breaking more stuff more often.

The question is: how do organizations learn to be good at making changes to their systems? First, it is helpful to think about what kinds of safety practices are important for making change. Moving fast means being able to safely change things fast. Here are some general strategies to consider:

Small batches

Large batches of change require more people on hand due to the volume of work, and the work can take longer to get done. The solution is to push less change through, so it’s easier to get done and there is less to check and verify when the change is completed.

Rehearsal

Here’s a good mantra: “Don’t practice until you get it right. Practice until you can’t get it wrong.” Don’t let the production change be the first time you have tried it this way. Your change should have been verified multiple times in non-production environments before you try it in production. Don’t rely on luck. Expect failure.

Verifiable process stages

Whether it is a site build-out or an update to an existing application, be sure you have well-defined checks for your preconditions. This means if you are deploying an application, you have a scripted test that confirms your external or environment dependencies before you do the deployment. If you are building a site, be sure you have confirmed the hardware and network environment before you install the operating platform. Building this kind of automated testing at process stage boundaries adds a great deal of safety by not letting problems slip downstream. You can use these verification checks to decide to “stop the line”.
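
As a minimal sketch of such a scripted precondition check (the hostnames, paths and thresholds are made up for illustration), a deploy wrapper could refuse to continue until the environment dependencies are confirmed:

#!/bin/bash
# verify preconditions before deploying; any failure "stops the line"
set -e

# external dependency: the database must answer on its port
nc -z -w 5 db.internal.example.com 5432 || { echo 'FAIL: db unreachable'; exit 1; }

# environment dependency: enough free disk for the new release (in KB)
avail=$(df -P /opt/app | awk 'NR==2 {print $4}')
[ "$avail" -gt 1048576 ] || { echo 'FAIL: less than 1GB free on /opt/app'; exit 1; }

echo 'preconditions OK, proceeding with deployment...'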

Process discipline

What leads to places full of snowflake environments, each full of idiosyncratic, specially customized servers and networks? Lack of discipline. If the organization does not manage change consistently together, everyone ends up doing things their own way. How do you know you have process discipline? Look for how much variation you see. If process differs between environments, that is a variation. Snowflake servers are the symptoms of process variation. Process variation means you don’t have the process under control. There are two simple metrics to understand how much control you have over your process: lead time and scrap rate. Lead time is how long it takes you to make the change. Scrap rate is how often the change must be reworked to make it right. Rehearsal and verifiable process stages will help you bring the process under control by reducing scrap rate and stabilizing lead time. The biggest benefit of process discipline is improving your ability to deliver change predictably. The business depends on predictability. With predictability the business can gauge how fast or slow it can move.

More access into ops managed environments?

The better everyone understands how things perform in production, the better the organization can design its systems to support operations. Making it hard for developers or testers to see how the service is running only delays improvements that would benefit the customer and reduce pressure on operations. It should be easy for anyone to know what versions of applications are deployed on which hosts, the host configuration, and the performance of the application.

Sometimes data privacy rules make accessing data less straightforward. Some logs contain customer data and regulations might restrict access to only limited users. Instead of saying no or making the data collection and scrubbing process manual, make this data available as an automated self service so developers or auditors can get it for themselves.

Visibility into the production environment is crucial for developers to make their environments production-like. Modeling the development and test environment so that it resembles production is another example of reducing variability and bringing process under control.

Does this mean shell access for devs?

This question is sometimes the worst one for a traditional enterprise ops team. Oftentimes the question is a symptom of another problem: why does a dev want shell access to an environment operations is supporting? In a development or early test environment, shell access might be needed to experiment with developing deployment and configuration code. This is a valid reason for shell access.

Is this a request for shell access in a staging or production environment? Requests for shell access there could be a sign of ad hoc change methods that undermine the stability of an environment. It’s important that change methods are encapsulated in the automation.

Fundamentally, shell access to live operational environments is a question about risk and trust.


The list doesn’t stop here, but these are the most common questions and concerns I hear. Feel free to share your experiences in the comments below.


Securely managing secrets for FreeIPA with Puppet

Configuration management is an essential part of securing your infrastructure because it can make sure that it is set up correctly. It is essential that configuration management only enhance security, and not weaken it. Unfortunately, the status-quo of secret management in puppet is pretty poor.

In the worst (and most common) case, plain text passwords are found in manifests. If the module author tried harder, sometimes these password strings are pre-hashed (and sometimes salted) and fed directly into the consumer. (This isn’t always possible without modifying the software you’re managing.)

On better days, these strings are kept separate from the code in unencrypted yaml files, and if the admin is smart enough to store their configurations in git, they hopefully separated out the secrets into a separate repository. Of course none of these solutions are very convincing to someone who puts security at the forefront.

This article describes how I use puppet to correctly and securely setup FreeIPA.

Background:

FreeIPA is an excellent piece of software that combines LDAP and Kerberos with an elegant web ui and command line interface. It can also glue in additional features like NTP. It is essential for any infrastructure that wants single sign on, and unified identity management and security. It is a key piece of infrastructure since you can use it as a cornerstone, and build out your infrastructures from that centrepiece. (I hope to make the puppet-ipa module at least half as good as what the authors have done with FreeIPA core.)

Mechanism:

Passing a secret into the FreeIPA server for installation is simply not possible without the secret passing through puppet. The way I work around this limitation is by generating the dm_password on the FreeIPA server itself, at install time! This typically looks like:

/usr/sbin/ipa-server-install --hostname='ipa.example.com' --domain='example.com' --realm='EXAMPLE.COM' --ds-password=`/usr/bin/pwgen 16 1 | /usr/bin/tee >( /usr/bin/gpg --homedir '/var/lib/puppet/tmp/ipa/gpg/' --encrypt --trust-model always --recipient '24090D66' > '/var/lib/puppet/tmp/ipa/gpg/dm_password.gpg' ) | /bin/cat | /bin/cat` --admin-password=`/usr/bin/pwgen 16 1 | /usr/bin/tee >( /usr/bin/gpg --homedir '/var/lib/puppet/tmp/ipa/gpg/' --encrypt --trust-model always --recipient '24090D66' > '/var/lib/puppet/tmp/ipa/gpg/admin_password.gpg' ) | /bin/cat | /bin/cat` --idstart=16777216 --no-ntp --selfsign --unattended

This command is approximately what puppet generates. The interesting part is:

--ds-password=`/usr/bin/pwgen 16 1 | /usr/bin/tee >( /usr/bin/gpg --homedir '/var/lib/puppet/tmp/ipa/gpg/' --encrypt --trust-model always --recipient '24090D66' > '/var/lib/puppet/tmp/ipa/gpg/dm_password.gpg' ) | /bin/cat | /bin/cat`

If this is hard to follow, here is the synopsis:

  1. The pwgen command is used to generate a password.
  2. The password is used for installation.
  3. The password is encrypted with the user’s GPG key and saved to a file for retrieval.
  4. The encrypted password is (optionally) sent out via email to the admin.

Note that the email portion wasn’t shown since it makes the command longer.
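
For readability, here is that same --ds-password pipeline unrolled (the paths and keyid are the examples from above; the trailing /bin/cat calls in the generated command are simple pass-throughs), along with how the admin gets the password back out later:

# 1. pwgen generates one 16-character password
# 2. tee passes it through to ipa-server-install on stdout...
# 3. ...while the >( ) process substitution encrypts a copy to the
#    admin's GPG key and saves it for later retrieval
/usr/bin/pwgen 16 1 | /usr/bin/tee >(
    /usr/bin/gpg --homedir '/var/lib/puppet/tmp/ipa/gpg/' \
        --encrypt --trust-model always --recipient '24090D66' \
        > '/var/lib/puppet/tmp/ipa/gpg/dm_password.gpg'
)

# later, on a machine holding the matching private key:
gpg --decrypt dm_password.gpg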

Where did my GPG key come from?

Any respectable FreeIPA admin should already have their own GPG key. If they don’t, they probably shouldn’t be managing a security appliance. You can either pass the public key to gpg_publickey or specify a keyserver with gpg_keyserver. In either case you must supply a valid recipient (-r) string to gpg_recipient. In my case, I use my keyid of 24090D66, which can be used to find my key on the public keyservers. Either way, puppet knows how to import it and use it correctly. A security audit is welcome!

You’ll be pleased to know that I deliberately included the options to use your own keyserver, or to specify your public key manually if you don’t want it stored on any key servers.

But, I want a different password!

It’s recommended that you use the secure password that has been generated for you. There are a few options if you don’t like this approach:

  • The puppet module allows you to specify the password as a string. This isn’t recommended, but it is useful for testing and compatibility with legacy puppet environments that don’t care about security.
  • You can use the secure password initially to authenticate with your FreeIPA server, and then change the password to the one you desire. Doing this is outside the scope of this article, and you should consult the FreeIPA documentation.
  • You can use puppet to regenerate a new password for you. This hasn’t been implemented yet, but will be coming eventually.
  • You can use the interactive password helper. This takes the place of the pwgen command. This will be implemented if there is enough demand. During installation, the admin will be able to connect to a secure console to specify the password.

Other suggestions will be considered.

What about the admin password?

The admin_password is generated following the same process that was used for the dm_password. The chance that the two independently generated passwords match is about:

1/(((26*2)+10)^16) = 1/(62^16) ≈ 2.1e-29

In other words, very unlikely.

Testing this easily:

Testing this out is quite straightforward. This process has been integrated with vagrant for easy testing. Start by setting up vagrant if you haven’t already:

Vagrant on Fedora with libvirt (reprise)

Once you are comfortable with vagrant, follow these steps for using Puppet-IPA:

git clone --recursive https://github.com/purpleidea/puppet-ipa
cd puppet-ipa/vagrant/
vagrant status
# edit the puppet-ipa.yaml file to add your keyid in the recipient field
# if you do not add a keyid, then a password of 'password' will be used
# this default is only used in the vagrant development environment
vagrant up puppet
vagrant up ipa

You should now have a working FreeIPA server. Log in as root with:

vscreen root@ipa

yay!

Hope you enjoyed this.

Happy hacking,

James

 


Jenkins, Puppet, Graphite, Logstash and YOU

This is a repost of an article I wrote for the Acquia Blog some time ago.

As mentioned before, devops can be summarized as being about culture, automation, measurement and sharing. Although devops is not about tooling, there are a number of open source tools out there that will be able to help you achieve your goals. Some of those tools will also enable better communication between your development and operations teams.

When we talk about Continuous Integration and Continuous Deployment we need a number of tools to help us there. We need to be able to build reproducible artifacts which we can test. And we need a reproducible infrastructure which we can manage in a fast and sane way. To do that we need a Continuous Integration framework like Jenkins.

Formerly known as Hudson, Jenkins has been around for a while. The open source project was initially very popular in the Java community but has since gained popularity in different environments. Jenkins allows you to create reproducible Build and Test scenarios and perform reporting on those. It provides you with a uniform and managed way to Build, Test, Release and Trigger the deployment of new Artifacts, both for traditional software and for infrastructure-as-code projects. Jenkins has a vibrant community that builds new plugins for the tool in different kinds of languages. People use it to build their deployment pipelines: automatically check out new versions of the source code, syntax test it and style test it. If needed, users can compile the software, trigger unit tests, and upload a tested artifact into a repository so it is ready to be deployed to a new platform level.
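
In practice, such a job often boils down to a shell build step. Here is a hedged sketch of the flow just described, for an infrastructure-as-code project; the tools, paths and server are illustrative, not a prescribed stack (BUILD_NUMBER is an environment variable Jenkins sets for each build):

#!/bin/bash
# sketch of a Jenkins shell step for an infrastructure-as-code project
set -e

# syntax test and style test the freshly checked out code
find manifests/ -name '*.pp' -print0 | xargs -0 -n1 puppet parser validate
puppet-lint manifests/

# run the test suite, then package a versioned, tested artifact
rake spec
tar czf "myapp-${BUILD_NUMBER}.tar.gz" manifests/ modules/

# upload it to a repository, ready for a downstream deploy job
scp "myapp-${BUILD_NUMBER}.tar.gz" repo.example.com:/srv/artifacts/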

Jenkins then can trigger an automated way to deploy the tested software on its new target platform. Whether that be development, testing, user acceptance or production is just a parameter. Deployment should not be something we try first in production, it should be done the same on all platforms. The deltas between these platforms should be managed using a configuration management tool such as Puppet, Chef or friends.

In a way this means that Infrastructure as code is a testing dependency, as you also want to be able to deploy a platform to exactly the same state as it was before you ran your tests, so that you can compare the test results of your test runs and make sure they are correct. This means you need to be able to control the starting point of your test and tools like Puppet and Chef can help you here. Which tool you use is the least important part of the discussion, as the important part is that you adopt one of the tools and start treating your infrastructure the same way as you treat your code base: as a tested, stable, reproducible piece of software that you can deploy over and over in a predictable fashion.

Configuration management tools such as Puppet, Chef and CFEngine are just a part of the ecosystem; integration with orchestration and monitoring tools is needed, as you want feedback on how your platform is behaving after the changes have been introduced. Lots of people measure the impact of a new deploy, and with that we obviously move to the M part of CAMS.

There, Graphite is one of the most popular tools for storing metrics. Plenty of other tools in the same area have tried to go where Graphite is going, but on flexibility, scalability and ease of use, few allow developers and operations people to build dashboards for any metric they can think of in a matter of seconds.

Just sending a keyword (the metric name), a timestamp and a value to the Graphite platform provides you with a large choice of actions that can be done with that metric. You can graph it, transform it, or even set an alert on it. Graphite takes away the complexity of similar tools and adds an easy-to-use API, so developers can integrate their own self-service metrics into dashboards to be used by everyone.
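
The interface really is that small. As a sketch, assuming a stock Carbon listener on the standard plaintext port 2003 (the host name here is made up), pushing a metric from the shell is a one-liner; the wire format is just “metric.path value unix-timestamp”:

echo "www.shop.checkout.response_time_ms 125 $(date +%s)" | nc graphite.example.com 2003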

One last tool that deserves our attention is Logstash. Initially just a tool to aggregate, index and search the log files of our platform, it taps what is often a hugely overlooked source of relevant information about how our applications behave. Logstash and its Kibana + ElasticSearch ecosystem are now quickly evolving into a real-time analytics platform, implementing the Collect, Ship+Transform, Store and Display pattern we see emerging a lot in the #monitoringlove community. Logstash allows us to turn boring old logfiles, which people only searched upon failure, into valuable information that product owners and business managers use to learn about the behavior of their users.

Together with the Graphite-based dashboards we mentioned above, these tools help people share their information and communicate better. When thinking about these tools, think about what you are doing, what goals you are trying to reach and where you need to improve. Because after all, devops is not about solving a technical problem; it’s about solving a business problem and bringing better value to the end user at a more sustainable pace. And in that way the biggest tool we need to use is YOU, as the person who enables communication.

Why Does DevOps Matter?

This is a repost of an article I wrote for the Acquia Blog some time ago.

People often ask, why does DevOps matter?

The honest answer to that question is...because having the development and operations team work together is the only way IT is successful.

Over the past few decades I’ve worked in different environments, including small web startups, big pharmaceutical companies, hardware engineering shops, and large software companies and banks. All were trying different approaches to deliver quality software to their end users and customers, but most of them were failing badly.

Operations people were being pulled in at the last minute. A marketing campaign needed to go live at 5 p.m. because that’s when the first radio commercial was scheduled to be broadcast. At 11 a.m., the operations people still didn’t know the campaign existed.

It was always the other person’s fault. Waterfall projects and large PID documents were the solution to all the problems. But people learned; they figured out that we can't expect humans to predict how long it would take to implement something they have never done before. Unfortunately, even today, only a small set of people understand the value of being agile and that we cannot break a project down to its granular details without factoring in the “unpredictable.” The key element here is the “uncertainty” of the many project pieces.

So on came the agile movement, and software development became much smoother. People agreed on time-boxing a reasonable set of work that would result in delivering useful functionality in frequent batches. Yet, on the day of deployment, all hell would break loose because someone forgot to loop in the Ops team.

This is where my personal experience differs from a lot of others’, because I was part of a development team building a product where the developers were sitting right next to the system administration team. Within sprints, our DevOps team was building both system features and application features; making the application highly available was a story on the board next to an actual end-user feature.

In the old days, a new feature that was scheduled for Friday couldn't be brought online for a couple of days because it couldn't be deployed to production. In the new setup, deploying to production was a no brainer as we had already tested the automated deployment to the acceptance platform.

This brings us to the first benefit: actually being able to go live.

The next problem came on a Wednesday evening. A major security issue had popped up in Drupal and an upgrade needed to be performed, but nobody dared to perform the upgrade, as they were afraid of breaking the site. Some people had made changes without putting their config back in the code base, and thus the site didn’t get updated. This is the typical state of the majority of websites of any type, where people build something, deploy it and never look back. This is the case until disaster strikes and it hits the evening news.

Teams then learn that not only do they need to implement features and put their config changes in code, but also do continuous integration testing on their sites.

From doing continuous integration, they go to continuous delivery and continuous deployment, where an upgrade isn’t a risk anymore but a normal event which happens automatically when all the tests are green. By implementing infrastructure as code and tests, they achieve two goals: the tests build confidence that the code is working, and they drive down the number of defects in the code base, so the number of times people need to dig back into old code to fix issues also comes down.

Delivering better software in a much more regular way enables security issues to be fixed faster, but also brings new features to market faster. By faster, we often mean a change from releasing software on a bi-yearly basis to a release each sprint, to a release whenever a commit has passed a number of test criteria.

Because they started to involve other stakeholders, the value of their application grew, as they had faster feedback and better usage statistics. The faster feedback meant that they weren’t spending as much time on features nobody used, but were focusing their efforts on things that mattered.

Having other stakeholders like the systems and security teams involved with early metrics, and taking the non-functional requirements into the backlog planning, meant that the stability of the platform was growing. Rather than people spending hours and nights fixing production problems, potential issues are now tackled upfront because of the communication between devs and ops. Also, scale and high availability are built into the application upfront, rather than afterwards, when it is too late.

So, in the end it comes down to the most important part: devops creates more happiness. It creates happier customers, developers, operations teams, managers and investors, and for a lot of people it improves not only application quality, but also their quality of life.

The Rise of the DevOps movement

This is a repost of an article I wrote for the Acquia Blog some time ago.

DevOps, DevOps, DevOps … the whole world is talking about DevOps, but what is DevOps?

Since Munich in 2012, DrupalCon has had a dedicated devops track. After talking to a lot of people in Prague last month, I realized that the concept of DevOps is still very unclear to a lot of developers. To a large part of the development community, devops still means folks working on ‘the infrastructure part’ of the development life cycle, and for some it simply means deploying Drupal and being concerned purely with keeping the site alive, etc.

Obviously that's not what DevOps is about, so let's take a step back and find out how it all started.

Like all good things, Drupal included, DevOps is a Belgian thing!

Back in 2009, DevopsDays Europe was created because a group of people met over and over again at different conferences throughout the world and didn’t have a common devops conference to go to. These individuals would talk about software delivery, deployment, build, scale, clustering, management, failure, monitoring and all the important things one needs to think about when running a modern web operation. These folks included Patrick Debois, Julian Simpson, Gildas Le Nadan, Jezz Humble, Chris Read, Matt Rechenburg, John Willis, Lindsay Holmswood and me, Kris Buytaert.

O’Reilly created a conference called, “Velocity,” and that sounded interesting to a bunch of us Europeans, but on our side of the ocean we had to resort to the existing Open Source, Unix, and Agile conferences. We didn't really have a common meeting ground yet. At CloudCamp Antwerp, in the Antwerp Zoo, I started talking to Patrick Debois about ways to fill this gap.

Many different events and activities, like John Allspaw and Paul Hammond’s talk at “Velocity” and multiple twitter discussions, influenced Patrick to create a DevOps-specific event in Gent, which became the very first ‘DevopsDays’. DevopsDays Gent was not your traditional conference; it was a mix of a couple of formal presentations in the morning and open spaces in the afternoon. And those open spaces were where people got the most value: the opportunity to talk to people with the same complex problems, with actual experience in solving them, with stories both about success and failure, etc. How do you deal with that oldskool system admin who doesn’t understand what configuration management can bring him? How do you do Kanban for operations while the developers are working in 2-week sprints? What tools do you use to monitor a highly volatile and expanding infrastructure?

From that very first DevopsDays in Gent, several people spread out to organize other events: John Willis and Damon Edwards started organizing DevopsDays Mountain View, and the European edition started touring Europe. It wasn’t until this year that different local communities started organizing their own local DevopsDays, e.g. in Atlanta, Portland, Austin, Berlin, Paris, Amsterdam, London, Barcelona and many more.

From this group of events a community has grown of people that care about bridging the gap between development and operations, a community of people that cares about delivering holistic business value to their organization.

As a community, we have realized that there needs to be more communication between the different stakeholders in an IT project lifecycle (business owners, developers, operations, network engineers, security engineers); everybody needs to be involved as soon as possible in the project, in order to help each other and talk about solving potential pitfalls ages before the application goes live. And when it goes live, the communication needs to stay alive too. We need to talk about maintaining the application, scaling it, keeping it secure. Just think about how many Drupal sites are out there vulnerable to attackers because the required security updates have never been implemented. Why does this happen? It could be because many developers don’t dare to touch the site anymore, because they are afraid of breaking it.

And this is where automation will help. If we can do automatic deployments and upgrades of a site, because it is automatically tested when developers push their code, upgrading won’t be that difficult a task. Typically, when people only update once every 6 months, it’s a painful and difficult process, but when it’s automated and done regularly, it makes life so much easier.

This ultimately comes down to the idea that the involvement of developers doesn’t end at their last commit. Collaboration is key, and it allows every developer to play a part in keeping the site up and running for more happy users. After all, software with no users has no value. The involvement of the developers in the ongoing operations of their software shouldn’t end before the last end user stops using their applications.

In order to keep users happy we need to get feedback and metrics, starting from the very first phases of development all the way up to production. This means we need to monitor both our application and our infrastructure and get metrics on all possible aspects; with that feedback we can learn about potential problems, but also about successes.

Finally, all of this can be summarized in an acronym coined by John Willis and Damon Edwards: CAMS. CAMS says devops is about Culture, Automation, Measurement and Sharing. Getting the discussion going on how to do all of that, more specifically in a Drupal environment, is the sharing part.

Hiera data in modules and OS independent puppet

Earlier this year, R.I. Pienaar released his brilliant data in modules hack. A few months ago I got the chance to start implementing it in Puppet-Gluster, and today I have found the time to blog about it.

What is it?

R.I.’s hack lets you store hiera data inside a puppet module. This can have many uses, including letting you throw out the nested mess that params.pp commonly is, and replace it with something file-based that is elegant and hierarchical. For my use case, I’m using it to build OS-independent puppet modules, without storing this data as code. The secondary win is that porting your module to a new GNU/Linux distribution or version could be as simple as adding a YAML file.

How does it work?

(For the specifics on the hack in general, please read R.I. Pienaar’s blog post. After you’re comfortable with that, please continue…)

In the hiera.yaml data/ hierarchy, I define an OS / version structure that should probably cover all use cases. It looks like this:

---
:hierarchy:
- tree/%{::osfamily}/%{::operatingsystem}/%{::operatingsystemrelease}
- tree/%{::osfamily}/%{::operatingsystem}
- tree/%{::osfamily}
- common

At the bottom, you can specify common data, which can be overridden by OS family specific data (think RedHat “like” vs. Debian “like”), which can be overridden with operating system specific data (think CentOS vs. Fedora), which can finally be overridden with operating system version specific data (think RHEL6 vs. RHEL7).

Grouping the commonalities near the bottom of the tree avoids duplication, and makes it possible to support new OS versions with fewer changes. It would be especially cool if someone could write a script to refactor commonalities downwards, and new uniqueness upwards.

This is an excerpt from the Fedora-specific YAML file:

gluster::params::package_glusterfs_server: 'glusterfs-server'
gluster::params::program_mkfs_xfs: '/usr/sbin/mkfs.xfs'
gluster::params::program_mkfs_ext4: '/usr/sbin/mkfs.ext4'
gluster::params::program_findmnt: '/usr/bin/findmnt'
gluster::params::service_glusterd: 'glusterd'
gluster::params::misc_gluster_reload: '/usr/bin/systemctl reload glusterd'

Since we use full paths in Puppet-Gluster, and since they are uniquely different in Fedora (no more /bin), it’s nice to specify them all here. The added advantage is that you can easily drop in different versions of these utilities if you want to test a patched release, without having to edit your system utilities. In addition, you’ll see that the OS-specific RPM package name and service names are in here too. On a Debian system, they are usually different, as sketched below.
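
For comparison, a Debian-family tree/Debian.yaml might look something like the following. These values are illustrative guesses meant only to show the shape of the override, so check the real package, path and service names before relying on them:

gluster::params::package_glusterfs_server: 'glusterfs-server'
gluster::params::program_mkfs_xfs: '/sbin/mkfs.xfs'
gluster::params::program_mkfs_ext4: '/sbin/mkfs.ext4'
gluster::params::program_findmnt: '/bin/findmnt'
gluster::params::service_glusterd: 'glusterfs-server'
gluster::params::misc_gluster_reload: '/usr/sbin/service glusterfs-server reload'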

Dependencies:

This depends on Puppet >= 3.x and having the puppet-module-data module included. I do so for integration with vagrant like so.

Should I still use params.pp?

I think the answer is yes. I use a params.pp file with a single class specifying all the defaults:

class gluster::params(
    # packages...
    $package_glusterfs_server = 'glusterfs-server',

    $program_mkfs_xfs = '/sbin/mkfs.xfs',
    $program_mkfs_ext4 = '/sbin/mkfs.ext4',

    # services...
    $service_glusterd = 'glusterd',

    # misc...
    $misc_gluster_reload = '/sbin/service glusterd reload',

    # comment...
    $comment = ''
) {
    if "${comment}" == '' {
        warning('Unable to load yaml data/ directory!')
    }

    # ...

}

In my data/common.yaml I include a bogus comment canary so that I can trigger a warning if the data in modules module isn’t working. This shouldn’t be a hard failure as long as you want to allow backwards compatibility; otherwise it should be! The defaults I use correspond to the primary OS I hack on and use this module with, which in this case is CentOS 6.x.
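
The canary itself is just a single key in data/common.yaml; as a sketch (the value here is arbitrary, it only needs to be non-empty so the warning in params.pp stays quiet):

gluster::params::comment: 'hiera data in modules is working'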

To use this data in your module, include the params.pp file, and start using it. Example:

include gluster::params
package { "${::gluster::params::package_glusterfs_server}":
    ensure => present,
}

Unfortunately the readability isn’t nearly as nice as it is without this; however, it’s a necessary evil, due to the puppet language’s limitations.

Common patterns:

There are a few common code patterns, which you might need for this technique. The first few, I’ve already mentioned above. These are the tree layout in hiera.yaml, the comment canary, and the params.pp defaults. There’s one more that you might find helpful…

The split package pattern:

Certain packages are split into multiple pieces on some operating systems, and grouped together on others. This means there isn’t always a one-to-one mapping between the data and the package type. For simple cases you can use a hiera array:

# this hiera value could be an array of strings...
package { $::some_module::params::package::some_package_list:
    ensure => present,
    alias => 'some_package',
}
service { 'foo':
    require => Package['some_package'],
}

For this to work you must always define at least one element in the array. For more complex cases you might need to test for the secondary package in the split:

if "${::some_module::params::package::some_package}" != '' {
    package { "${::some_module::params::package::some_package}":
        ensure => present,
        alias => 'some_package', # or use the $name and skip this
    }
}

service { 'foo':
    require => "${::some_module::params::package::some_package}" ? {
        '' => undef,
        default => Package['some_package'],
    },
}

This pattern is used in Puppet-Gluster in more than one place. It turns out that it’s also useful when optional python packages get pulled into the system python. (example)

Hopefully you found this useful. Please help increase the multi-OS aspect of Puppet-Gluster by submitting patches to the YAML files, and by testing it on your favourite GNU/Linux distro!

Happy hacking!

James