↓ Archives ↓

Hiera data in modules and OS independent puppet

Earlier this year, R.I.Pienaar released his brilliant data in modules hack, a few months ago, I got the chance to start implementing it in Puppet-Gluster, and today I have found the time to blog about it.

What is it?

R.I.’s hack lets you store hiera data inside a puppet module. This can have many uses including letting you throw out the nested mess that is commonly params.pp, and replace it with something file based that is elegant and hierarchical. For my use case, I’m using it to build OS independent puppet modules, without storing this data as code. The secondary win is that porting your module to a new GNU/Linux distribution or version could be as simple as adding a YAML file.

How does it work?

(For the specifics on the hack in general, please read R.I. Pienaar’s blog post. After you’re comfortable with that, please continue…)

In the hiera.yaml data/ hierarchy, I define an OS / version structure that should probably cover all use cases. It looks like this:

---
:hierarchy:
- tree/%{::osfamily}/%{::operatingsystem}/%{::operatingsystemrelease}
- tree/%{::osfamily}/%{::operatingsystem}
- tree/%{::osfamily}
- common

At the bottom, you can specify common data, which can be overridden by OS family specific data (think RedHat “like” vs. Debian “like”), which can be overridden with operating system specific data (think CentOS vs. Fedora), which can finally be overridden with operating system version specific data (think RHEL6 vs. RHEL7).

Grouping the commonalities near the bottom of the tree, avoids duplication, and makes it possible to support new OS versions with fewer changes. It would be especially cool if someone could write a script to refactor commonalities downwards, and to refactor new uniqueness upwards.

This is an except of the Fedora specific YAML file:

gluster::params::package_glusterfs_server: 'glusterfs-server'
gluster::params::program_mkfs_xfs: '/usr/sbin/mkfs.xfs'
gluster::params::program_mkfs_ext4: '/usr/sbin/mkfs.ext4'
gluster::params::program_findmnt: '/usr/bin/findmnt'
gluster::params::service_glusterd: 'glusterd'
gluster::params::misc_gluster_reload: '/usr/bin/systemctl reload glusterd'

Since we use full paths in Puppet-Gluster, and since they are uniquely different in Fedora (no more: /bin) it’s nice to specify them all here. The added advantage is that you can easily drop in different versions of these utilities if you want to test a patched release without having to edit your system utilities. In addition, you’ll see that the OS specific RPM package name and service names are in here too. On a Debian system, they are usually different.

Dependencies:

This depends on Puppet >= 3.x and having the puppet-module-data module included. I do so for integration with vagrant like so.

Should I still use params.pp?

I think that this answer is yes. I use a params.pp file with a single class specifying all the defaults:

class gluster::params(
    # packages...
    $package_glusterfs_server = 'glusterfs-server',

    $program_mkfs_xfs = '/sbin/mkfs.xfs',
    $program_mkfs_ext4 = '/sbin/mkfs.ext4',

    # services...
    $service_glusterd = 'glusterd',

    # misc...
    $misc_gluster_reload = '/sbin/service glusterd reload',

    # comment...
    $comment = ''
) {
    if "${comment}" == '' {
        warning('Unable to load yaml data/ directory!')
    }

    # ...

}

In my data/common.yaml I include a bogus comment canary so that I can trigger a warning if the data in modules module isn’t working. This shouldn’t be a fail as long as you want to allow backwards compatibility, otherwise it should be! The defaults I use correspond to the primary OS I hack and use this module with, which in this case is CentOS 6.x.

To use this data in your module, include the params.pp file, and start using it. Example:

include gluster::params
package { "${::gluster::params::package_glusterfs_server}":
    ensure => present,
}

Unfortunately the readability isn’t nearly as nice as it is without this, however it’s an essential evil, due to the puppet language limitations.

Common patterns:

There are a few common code patterns, which you might need for this technique. The first few, I’ve already mentioned above. These are the tree layout in hiera.yaml, the comment canary, and the params.pp defaults. There’s one more that you might find helpful…

The split package pattern:

Certain packages are split into multiple pieces on some operating systems, and grouped together on others. This means there isn’t always a one-to-one mapping between the data and the package type. For simple cases you can use a hiera array:

# this hiera value could be an array of strings...
package { $::some_module::params::package::some_package_list:
    ensure => present,
    alias => 'some_package',
}
service { 'foo':
    require => Package['some_package'],
}

For this to work you must always define at least one element in the array. For more complex cases you might need to test for the secondary package in the split:

if "${::some_module::params::package::some_package}" != '' {
    package { "${::some_module::params::package::some_package}":
        ensure => present,
        alias => 'some_package', # or use the $name and skip this
    }
}

service { 'foo':
    require => "${::some_module::params::package::some_package}" ? {
        '' => undef,
        default => Package['some_package'],
    },
}

This pattern is used in Puppet-Gluster in more than one place. It turns out that it’s also useful when optional python packages get pulled into the system python. (example)

Hopefully you found this useful. Please help increase the multi-os aspect of Puppet-Gluster by submitting patches to the YAML files, and by testing it on your favourite GNU/Linux distro!

Happy hacking!

James

 


Hiera data in modules and OS independent puppet

Earlier this year, R.I.Pienaar released his brilliant data in modules hack, a few months ago, I got the chance to start implementing it in Puppet-Gluster, and today I have found the time to blog about it.

What is it?

R.I.’s hack lets you store hiera data inside a puppet module. This can have many uses including letting you throw out the nested mess that is commonly params.pp, and replace it with something file based that is elegant and hierarchical. For my use case, I’m using it to build OS independent puppet modules, without storing this data as code. The secondary win is that porting your module to a new GNU/Linux distribution or version could be as simple as adding a YAML file.

How does it work?

(For the specifics on the hack in general, please read R.I. Pienaar’s blog post. After you’re comfortable with that, please continue…)

In the hiera.yaml data/ hierarchy, I define an OS / version structure that should probably cover all use cases. It looks like this:

---
:hierarchy:
- tree/%{::osfamily}/%{::operatingsystem}/%{::operatingsystemrelease}
- tree/%{::osfamily}/%{::operatingsystem}
- tree/%{::osfamily}
- common

At the bottom, you can specify common data, which can be overridden by OS family specific data (think RedHat “like” vs. Debian “like”), which can be overridden with operating system specific data (think CentOS vs. Fedora), which can finally be overridden with operating system version specific data (think RHEL6 vs. RHEL7).

Grouping the commonalities near the bottom of the tree, avoids duplication, and makes it possible to support new OS versions with fewer changes. It would be especially cool if someone could write a script to refactor commonalities downwards, and to refactor new uniqueness upwards.

This is an except of the Fedora specific YAML file:

gluster::params::package_glusterfs_server: 'glusterfs-server'
gluster::params::program_mkfs_xfs: '/usr/sbin/mkfs.xfs'
gluster::params::program_mkfs_ext4: '/usr/sbin/mkfs.ext4'
gluster::params::program_findmnt: '/usr/bin/findmnt'
gluster::params::service_glusterd: 'glusterd'
gluster::params::misc_gluster_reload: '/usr/bin/systemctl reload glusterd'

Since we use full paths in Puppet-Gluster, and since they are uniquely different in Fedora (no more: /bin) it’s nice to specify them all here. The added advantage is that you can easily drop in different versions of these utilities if you want to test a patched release without having to edit your system utilities. In addition, you’ll see that the OS specific RPM package name and service names are in here too. On a Debian system, they are usually different.

Dependencies:

This depends on Puppet >= 3.x and having the puppet-module-data module included. I do so for integration with vagrant like so.

Should I still use params.pp?

I think that this answer is yes. I use a params.pp file with a single class specifying all the defaults:

class gluster::params(
    # packages...
    $package_glusterfs_server = 'glusterfs-server',

    $program_mkfs_xfs = '/sbin/mkfs.xfs',
    $program_mkfs_ext4 = '/sbin/mkfs.ext4',

    # services...
    $service_glusterd = 'glusterd',

    # misc...
    $misc_gluster_reload = '/sbin/service glusterd reload',

    # comment...
    $comment = ''
) {
    if "${comment}" == '' {
        warning('Unable to load yaml data/ directory!')
    }

    # ...

}

In my data/common.yaml I include a bogus comment canary so that I can trigger a warning if the data in modules module isn’t working. This shouldn’t be a fail as long as you want to allow backwards compatibility, otherwise it should be! The defaults I use correspond to the primary OS I hack and use this module with, which in this case is CentOS 6.x.

To use this data in your module, include the params.pp file, and start using it. Example:

include gluster::params
package { "${::gluster::params::package_glusterfs_server}":
    ensure => present,
}

Unfortunately the readability isn’t nearly as nice as it is without this, however it’s an essential evil, due to the puppet language limitations.

Common patterns:

There are a few common code patterns, which you might need for this technique. The first few, I’ve already mentioned above. These are the tree layout in hiera.yaml, the comment canary, and the params.pp defaults. There’s one more that you might find helpful…

The split package pattern:

Certain packages are split into multiple pieces on some operating systems, and grouped together on others. This means there isn’t always a one-to-one mapping between the data and the package type. For simple cases you can use a hiera array:

# this hiera value could be an array of strings...
package { $::some_module::params::package::some_package_list:
    ensure => present,
    alias => 'some_package',
}
service { 'foo':
    require => Package['some_package'],
}

For this to work you must always define at least one element in the array. For more complex cases you might need to test for the secondary package in the split:

if "${::some_module::params::package::some_package}" != '' {
    package { "${::some_module::params::package::some_package}":
        ensure => present,
        alias => 'some_package', # or use the $name and skip this
    }
}

service { 'foo':
    require => "${::some_module::params::package::some_package}" ? {
        '' => undef,
        default => Package['some_package'],
    },
}

This pattern is used in Puppet-Gluster in more than one place. It turns out that it’s also useful when optional python packages get pulled into the system python. (example)

Hopefully you found this useful. Please help increase the multi-os aspect of Puppet-Gluster by submitting patches to the YAML files, and by testing it on your favourite GNU/Linux distro!

Happy hacking!

James

 


Monitorama PDX & my metrics 2.0 presentation

Earlier this month we had another iteration of the Monitorama conference, this time in Portland, Oregon.


(photo by obfuscurity)

I think the conference was better than the first one in Boston, much more to learn. Also this one was quite focused on telemetry (timeseries metrics processing), lots of talks on timeseries analytics, not so much about things like sensu or nagios. Adrian Cockroft's keynote brought some interesting ideas to the table, like building a feedback loop into the telemetry to drive infrastructure changes (something we do at Vimeo, I briefly give an example in the intro of my talk) or shortening the time from fault to alert (which I'm excited to start working on soon)
My other favorite was Noah Kantrowitz's talk about applying audio DSP techniques to timeseries, I always loved audio processing and production. Combining these two interests hadn't occurred to me so now I'm very excited about the applications.
The opposite, but just as interesting of an idea - conveying information about system state as an audio stream - came up in puppetlab's monitorama recap and that seems to make a lot of sense as well. There's a lot of information in a stream of sound, it is much denser than text, icons and perhaps even graph plots. Listening to an audio stream that's crafted to represent various information might be a better way to get insights into your system.


I'm happy to see the idea reinforced that telemetry is a key part of modern monitoring. For me personally, telemetry (the tech and the process) is the most fascinating part of modern technical operations, and I'm glad to be part of the movement pushing this forward. There's also a bunch of startups in the space (many stealthy ones), validating the market. I'm curious to see how this will play out.


I had the privilege to present metrics 2.0 and Graph-Explorer. As usual the slides are on slideshare and the footage on the better video sharing platform ;-) .
I'm happy with all the positive feedback, although I'm not aware yet of other tools and applications adopting metrics 2.0, and I'm looking forward to see some more of that, because ultimately that's what will show if my ideas are any good.

Restarting GNOME shell via SSH

When GNOME shell breaks, you get to keep both pieces. The nice thing about shell failures in GNOME 3, is that if you’re able to do a restart, the active windows are mostly not disturbed. The common way to do this is to type ALT-F2, r, <ENTER>.

Unfortunately, you can’t always type that in if your shell is very borked. If you are lucky enough to have SSH access, and another machine, you can log in remotely and run this script:

#!/bin/bash
export DISPLAY=:0.0
{ `gnome-shell --replace &> /dev/null`; } < /dev/stdin &

This will restart the shell, but also allow you to disconnect from the terminal without killing the shell. If you’re not sure what I mean, try running gnome-shell --replace by itself, and then disconnect.

The script is available as a gist. Download it, put it inside your ~/bin/ and chmod u+x it. Hopefully you don’t need to use it!

Happy hacking,

James

 


Restarting GNOME shell via SSH

When GNOME shell breaks, you get to keep both pieces. The nice thing about shell failures in GNOME 3, is that if you’re able to do a restart, the active windows are mostly not disturbed. The common way to do this is to type ALT-F2, r, <ENTER>.

Unfortunately, you can’t always type that in if your shell is very borked. If you are lucky enough to have SSH access, and another machine, you can log in remotely and run this script:

#!/bin/bash
export DISPLAY=:0.0
{ `gnome-shell --replace &> /dev/null`; } < /dev/stdin &

This will restart the shell, but also allow you to disconnect from the terminal without killing the shell. If you’re not sure what I mean, try running gnome-shell --replace by itself, and then disconnect.

The script is available as a gist. Download it, put it inside your ~/bin/ and chmod u+x it. Hopefully you don’t need to use it!

Happy hacking,

James

 


On Graphite, Whisper and InfluxDB

Graphite, and the storage Achilles heel

Graphite is a neat timeseries metrics storage system that comes with a powerful querying api, mainly due to the whole bunch of available processing functions.
For medium to large setups, the storage aspect quickly becomes a pain point. Whisper, the default graphite storage format, is a simple storage format, using one file per metric (timeseries).
  • It can't keep all file descriptors in memory so there's a lot of overhead in constantly opening, seeking, and closing files, especially since usually one write comes in for all metrics at the same time.
  • Using the rollups feature (different data resolutions based on age) causes a lot of extra IO.
  • The format is also simply not optimized for writes. Carbon, the storage agent that sits in front of whisper has a feature to batch up writes to files to make them more sequential but this doesn't seem to help much.
  • Worse, due to various implementation details the carbon agent is surprisingly inefficient and cpu-bound. People often run into cpu limitations before they hit the io bottleneck. Once the writeback queue hits a certain size, carbon will blow up.
Common recommendations are to run multiple carbon agents and running graphite on SSD drives.
If you want to scale out across multiple systems, you can get carbon to shard metrics across multiple nodes, but the complexity can get out of hand and manually maintaining a cluster where nodes get added, fail, get phased out, need recovery, etc involves a lot of manual labor even though carbonate makes this easier. This is a path I simply don't want to go down.

These might be reasonable solutions based on the circumstances (often based on short-term local gains), but I believe as a community we should solve the problem at its root, so that everyone can reap the long term benefits.


In particular, running Ceres instead of whisper, is only a slight improvement, that suffers from most of the same problems. I don't see any good reason to keep working on Ceres, other than perhaps that it's a fun exercise. This probably explains the slow pace of development.
However, many mistakenly believe Ceres is "the future".
Switching to LevelDB seems much more sensible but IMHO still doesn't cut it as a general purpose, scalable solution.

The ideal backend

I believe we can build a backend for graphite that
  • can easily scale from a few metrics on my laptop in power-save mode to millions of metrics on a highly loaded cluster
  • supports nodes joining and leaving at runtime and automatically balancing the load across them
  • assures high availability and heals itself in case of disk or node failures
  • is simple to deploy. think: just run an executable that knows which directories it can use for storage, elasticsearch-style automatic clustering, etc.
  • has the right read/write optimizations. I've never seen a graphite system that is not write-focused, so something like LSM trees seems to make a lot of sense.
  • can leverage cpu resources (e.g. for compression)
  • provides a more natural model for phasing out data. Optional, runtime-changeable rollups. And an age limit (possibly, but not necessarily round robin)
While we're at it. pub-sub for realtime analytics would be nice too. Especially when it allows to use the same functions as the query api.
And getting rid of the metric name restrictions such as inability to use dots or slashes.

InfluxDB

There's a lot of databases that you could hook up to graphite. riak, hdfs based (opentsdb), Cassandra based (kairosdb, blueflood, cyanite), etc. Some of these are solid and production ready, and would make sense depending on what you already have and have experience with. I'm personally very interested in playing with Riak, but decided to choose InfluxDB as my first victim.

InfluxDB is a young project that will need time to build maturity, but is on track to meet all my goals very well. In particular, installing it is a breeze (no dependencies), it's specifically built for timeseries (not based on a general purpose database), which allows them to do a bunch of simplifications and optimizations, is write-optimized, and should meet my goals for scalability, performance, and availability well. And they're in NYC so meeting up for lunch has proven to be pretty fruitful for both parties. I'm pretty confident that these guys can pull off something big.

Technically, InfluxDB is a "timeseries, metrics, and analytics" databases with use cases well beyond graphite and even technical operations. Like the alternative databases, graphite-like behaviors such as rollups management and automatically picking the series in the most appropriate resolutions, is something to be implemented on top of it. Although you never know, it might end up being natively supported.

Graphite + InfluxDB

InfluxDB developers plan to implement a whole bunch of processing functions (akin to graphite, except they can do locality optimizations) and add a dashboard that talks to InfluxDB natively (or use Grafana), which means at some point you could completely swap graphite for InfluxDB. However, I think for quite a while, the ability to use the Graphite api, combine backends, and use various graphite dashboards is still very useful. So here's how my setup currently works:
  • carbon-relay-ng is a carbon relay in Go. It's a pretty nifty program to partition and manage carbon metrics streams. I use it in front of our traditional graphite system, and have it stream - in realtime - a copy of a subset of our metrics into InfluxDB. This way I basically have our unaltered Graphite system, and in parallel to it, InfluxDB, containing a subset of the same data.
    With a bit more work it will be a high performance alternative to the python carbon relay, allowing you to manage your streams on the fly. It doesn't support consistent hashing, because CH should be part of a strategy of a highly available storage system (see requirements above), using CH in the relay still results in a poor storage system, so there's no need for it.
  • I contributed the code to InfluxDB to make it listen on the carbon protocol. So basically, for the purpose of ingestion, InfluxDB can look and act just like a graphite server. Anything that can write to graphite, can now write to InfluxDB. (assuming the plain-text protocol, it doesn't support the pickle protocol, which I think is a thing to avoid anyway because almost nothing supports it and you can't debug what's going on)
  • graphite-api is a fork/clone of graphite-web, stripped of needless dependencies, stripped of the composer. It's conceived for many of the same reasons behind graphite-ng (graphite technical debt, slow development pace, etc) though it doesn't go to such extreme lengths and for now focuses on being a robust alternative for the graphite server, api-compatible, trivial to install and with a faster pace of development.
  • That's where graphite-influxdb comes in. It hooks InfluxDB into graphite-api, so that you can query the graphite api, but using data in InfluxDB. It should also work with the regular graphite, though I've never tried. (I have no incentive to bother with that, because I don't use the composer. And I think it makes more sense to move the composer into a separate project anyway).
With all these parts in place, I can run our dashboards next to each other - one running on graphite with whisper, one on graphite-api with InfluxDB - and simply look whether the returned data matches up, and which dashboards loads graphs faster. Later i might do more extensive benchmarking and acceptance testing.

If all goes well, I can make carbon-relay-ng fully mirror all data, make graphite-api/InfluxDB the primary, and turn our old graphite box into a live "backup". We'll need to come up with something for rollups and deletions of old data (although it looks like by itself influx is already more storage efficient than whisper too), and I'm really looking forward to the InfluxDB team building out the function api, having the same function api available for historical querying as well as realtime pub-sub. (my goal used to be implementing this in graphite-ng and/or carbon-relay-ng, but if they do this well, I might just abandon graphite-ng)

To be continued..

Vagrant on Fedora with libvirt (reprise)

Vagrant has become the de facto tool for devops. Faster iterations, clean environments, and less overhead. This isn’t an article about why you should use Vagrant. This is an article about how to get up and running with Vagrant on Fedora with libvirt easily!

Background:

This article is an update of my original Vagrant on Fedora with libvirt article. There is still lots of good information in that article, but this one should be easier to follow and uses updated versions of Vagrant and vagrant-libvirt.

Why vagrant-libvirt?

Vagrant ships by default with support for virtualbox. This makes sense as a default since it is available on Windows, Mac, and GNU/Linux. Real hackers use GNU/Linux, and in my opinion the best tool for GNU/Linux is vagrant-libvirt. Proprietary, closed source platforms aren’t hackable and therefore aren’t cool!

Another advantage to using the vagrant-libvirt plugin is that it plays nicely with the existing ecosystem of libvirt tools. You can use virsh, virt-manager, and guestfish alongside Vagrant, and if your development work needs to go into production, you can be confident in knowing that it was already tested on the same awesome KVM virtualization platform that your servers run.

Prerequisites:

Let’s get going. What do you need?

  • A Fedora 20 machine

I recommend hardware that supports VT extensions. Most does these days. This should also work with other GNU/Linux distro’s, but I haven’t tested them.

Installation:

I’m going to go through this in a logical hacking order. This means you could group all the yum install commands into a single execution at the beginning, but you would learn much less by doing so.

First install some of your favourite hacking dependencies. I did this on a minimal, headless F20 installation. You might want to add some of these too:

# yum install -y wget tree vim screen mtr nmap telnet tar git

Update the system to make sure it’s fresh:

# yum update -y

Download Vagrant version 1.5.4. No, don’t use the latest version, it probably won’t work! Vagrant has new releases practically as often as there are sunsets, and they typically cause lots of breakages.

$ wget https://dl.bintray.com/mitchellh/vagrant/vagrant_1.5.4_x86_64.rpm

and install it:

# yum install -y vagrant_1.5.4_x86_64.rpm

RVM installation:

In order to get vagrant-libvirt working, you’ll need some ruby dependencies. It turns out that RVM seems to be the best way to get exactly what you need. Use the sketchy RVM installer:

# \curl -sSL https://get.rvm.io | bash -s stable

If you don’t know why that’s sketchy, then you probably shouldn’t be hacking! I did that as root, but it probably works when you run it as a normal user. At this point rvm should be installed. The last important thing you’ll need to do is to add yourself to the rvm group. This is only needed if you installed rvm as root:

# usermod -aG rvm <username>

You’ll probably need to logout and log back in for this to take effect. Run:

$ groups

to make sure you can see rvm in the list. If you ran rvm as root, you’ll want to source the rvm.sh file:

$ source /etc/profile.d/rvm.sh

or simply use a new terminal. If you ran it as a normal user, I think RVM adds something to your ~/.bashrc. You might want to reload it:

$ source ~/.bashrc

At this point RVM should be working. Let’s see which ruby’s it can install:

$ rvm list known

Ruby version ruby-2.0.0-p353 seems closest to what is available on my Fedora 20 machine, so I’ll use that:

$ rvm install ruby-2.0.0-p353

If the exact patch number isn’t available, choose what’s closest. Installing ruby requires a bunch of dependencies. The rvm install command will ask yum for a bunch of dependencies, but if you’d rather install them yourself, you can run:

# yum install -y patch libyaml-devel libffi-devel glibc-headers autoconf gcc-c++ glibc-devel patch readline-devel zlib-devel openssl-devel bzip2 automake libtool bison

GEM installation:

Now we need the GEM dependencies for the vagrant-libvirt plugin. These GEM’s happen to have their own build dependencies, but thankfully I’ve already figured those out for you:

# yum install -y libvirt-devel libxslt-devel libxml2-devel

Now, install the nokogiri gem that vagrant-libvirt needs:

$ gem install nokogiri -v '1.5.11'

and finally we can install the actual vagrant-libvirt plugin:

$ vagrant plugin install --plugin-version 0.0.16 vagrant-libvirt

You don’t have to specify the –plugin-version 0.0.16 part, but doing so will make sure that you get a version that I have tested to be compatible with Vagrant 1.5.4 should a newer vagrant-libvirt release not be compatible with the Vagrant version you’re using. If you’re feeling brave, please test newer versions, report bugs, and write patches!

Making Vagrant more useful:

Vagrant should basically work at this point, but it’s missing some awesome. I’m proud to say that I wrote this awesome. I recommend my bash function and alias additions. If you’d like to include them, you can run:

$ wget https://gist.githubusercontent.com/purpleidea/8071962/raw/ee27c56e66aafdcb9fd9760f123e7eda51a6a51e/.bashrc_vagrant.sh
$ echo '. ~/.bashrc_vagrant.sh' >> ~/.bashrc
$ . ~/.bashrc    # reload

to pull in my most used Vagrant aliases and functions. I’ve written about them before. If you’re interested, please read:

KVM/QEMU installation:

As I mentioned earlier, I’m assuming you have a minimal Fedora 20 installation, so you might not have all the libvirt pieces installed! Here’s how to install any potentially missing pieces:

# yum install -y libvirt{,-daemon-kvm}

This should pull in a whole bunch of dependencies too. You will need to start and (optionally) enable the libvirtd service:

# systemctl start libvirtd.service
# systemctl enable libvirtd.service

You’ll notice that I’m using the systemd commands instead of the deprecated service command. My biggest (only?) gripe with systemd is that the command line tools aren’t as friendly as they could be! The systemctl equivalent requires more typing, and make it harder to start or stop the same service in quick succession, because it buries the action in the middle of the command instead of leaving it at the end!

The libvirtd service should finally be running. On my machine, it comes with a default network which got in the way of my vagrant-libvirt networking. If you want to get rid of it, you can run:

# virsh net-destroy default
# virsh net-undefine default

and it shouldn’t bother you anymore. One last hiccup. If it’s your first time installing KVM, you might run into bz#950436. To workaround this issue, I had to run:

# rmmod kvm_intel
# rmmod kvm
# modprobe kvm
# modprobe kvm_intel

Without this “module re-loading” you might see this error:

Call to virDomainCreateWithFlags failed: internal error: Process exited while reading console log output: char device redirected to /dev/pts/2 (label charserial0)
Could not access KVM kernel module: Permission denied
failed to initialize KVM: Permission denied

Additional installations:

To make your machine somewhat more palatable, you might want to consider installing bash-completion:

# yum install -y bash-completion

You’ll also probably want to add the PolicyKit (polkit) .pkla file that I recommend in my earlier article. Typically that means adding something like:

[Allow james libvirt management permissions]
Identity=unix-user:james
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes

as root to somewhere like:

/etc/polkit-1/localauthority/50-local.d/vagrant.pkla

Your machine should now be setup perfectly! The last thing you’ll need to do is to make sure that you get a Vagrantfile that does things properly! Here are some recommendations.

Shared folders:

Shared folders are a mechanism that Vagrant uses to pass data into (and sometimes out of) the virtual machines that it is managing. Typically you can use NFS, rsync, and some provider specific folder sharing like 9p. Using rsync is the simplest to set up, and works exceptionally well. Make sure you include the following line in your Vagrantfile:

config.vm.synced_folder './', '/vagrant', type: 'rsync'

If you want to see an example of this in action, you can have a look at my puppet-gluster Vagrantfile. If you are using the puppet apply provisioner, you will have to set it to use rsync as well:

puppet.synced_folder_type = 'rsync'

KVM performance:

Due to a regression in vagrant-libvirt, the default driver used for virtual machines is qemu. If you want to use the accelerated KVM domain type, you’ll have to set it:

libvirt.driver = 'kvm'

This typically gives me a 5x performance increase over plain qemu. This fix is available in the latest vagrant-libvirt version. The default has been set to KVM in the latest git master.

Dear internets!

I think this was fairly straightforward. You could probably even put all of these commands in a shell script and just run it to get it all going. What we really need is proper RPM packaging. If you can help out, that would be excellent!

If we had a version of vagrant-libvirt alongside a matching Vagrant version in Fedora, then developers and hackers could target that, and we could easily exchange dev environments, hackers could distribute product demos as full vagrant-libvirt clusters, and I could stop having to write these types of articles ;)

I hope this was helpful to you. Please let me know in the comments.

Happy hacking,

James

 


Vagrant on Fedora with libvirt (reprise)

Vagrant has become the de facto tool for devops. Faster iterations, clean environments, and less overhead. This isn’t an article about why you should use Vagrant. This is an article about how to get up and running with Vagrant on Fedora with libvirt easily!

Background:

This article is an update of my original Vagrant on Fedora with libvirt article. There is still lots of good information in that article, but this one should be easier to follow and uses updated versions of Vagrant and vagrant-libvirt.

Why vagrant-libvirt?

Vagrant ships by default with support for virtualbox. This makes sense as a default since it is available on Windows, Mac, and GNU/Linux. Real hackers use GNU/Linux, and in my opinion the best tool for GNU/Linux is vagrant-libvirt. Proprietary, closed source platforms aren’t hackable and therefore aren’t cool!

Another advantage to using the vagrant-libvirt plugin is that it plays nicely with the existing ecosystem of libvirt tools. You can use virsh, virt-manager, and guestfish alongside Vagrant, and if your development work needs to go into production, you can be confident in knowing that it was already tested on the same awesome KVM virtualization platform that your servers run.

Prerequisites:

Let’s get going. What do you need?

  • A Fedora 20 machine

I recommend hardware that supports VT extensions. Most does these days. This should also work with other GNU/Linux distro’s, but I haven’t tested them.

Installation:

I’m going to go through this in a logical hacking order. This means you could group all the yum install commands into a single execution at the beginning, but you would learn much less by doing so.

First install some of your favourite hacking dependencies. I did this on a minimal, headless F20 installation. You might want to add some of these too:

# yum install -y wget tree vim screen mtr nmap telnet tar git

Update the system to make sure it’s fresh:

# yum update -y

Download Vagrant version 1.5.4. No, don’t use the latest version, it probably won’t work! Vagrant has new releases practically as often as there are sunsets, and they typically cause lots of breakages.

$ wget https://dl.bintray.com/mitchellh/vagrant/vagrant_1.5.4_x86_64.rpm

and install it:

# yum install -y vagrant_1.5.4_x86_64.rpm

RVM installation:

In order to get vagrant-libvirt working, you’ll need some ruby dependencies. It turns out that RVM seems to be the best way to get exactly what you need. Use the sketchy RVM installer:

# \curl -sSL https://get.rvm.io | bash -s stable

If you don’t know why that’s sketchy, then you probably shouldn’t be hacking! I did that as root, but it probably works when you run it as a normal user. At this point rvm should be installed. The last important thing you’ll need to do is to add yourself to the rvm group. This is only needed if you installed rvm as root:

# usermod -aG rvm <username>

You’ll probably need to logout and log back in for this to take effect. Run:

$ groups

to make sure you can see rvm in the list. If you ran rvm as root, you’ll want to source the rvm.sh file:

$ source /etc/profile.d/rvm.sh

or simply use a new terminal. If you ran it as a normal user, I think RVM adds something to your ~/.bashrc. You might want to reload it:

$ source ~/.bashrc

At this point RVM should be working. Let’s see which ruby’s it can install:

$ rvm list known

Ruby version ruby-2.0.0-p353 seems closest to what is available on my Fedora 20 machine, so I’ll use that:

$ rvm install ruby-2.0.0-p353

If the exact patch number isn’t available, choose what’s closest. Installing ruby requires a bunch of dependencies. The rvm install command will ask yum for a bunch of dependencies, but if you’d rather install them yourself, you can run:

# yum install -y patch libyaml-devel libffi-devel glibc-headers autoconf gcc-c++ glibc-devel patch readline-devel zlib-devel openssl-devel bzip2 automake libtool bison

GEM installation:

Now we need the GEM dependencies for the vagrant-libvirt plugin. These GEM’s happen to have their own build dependencies, but thankfully I’ve already figured those out for you:

# yum install -y libvirt-devel libxslt-devel libxml2-devel

Now, install the nokogiri gem that vagrant-libvirt needs:

$ gem install nokogiri -v '1.5.11'

and finally we can install the actual vagrant-libvirt plugin:

$ vagrant plugin install --plugin-version 0.0.16 vagrant-libvirt

You don’t have to specify the –plugin-version 0.0.16 part, but doing so will make sure that you get a version that I have tested to be compatible with Vagrant 1.5.4 should a newer vagrant-libvirt release not be compatible with the Vagrant version you’re using. If you’re feeling brave, please test newer versions, report bugs, and write patches!

Making Vagrant more useful:

Vagrant should basically work at this point, but it’s missing some awesome. I’m proud to say that I wrote this awesome. I recommend my bash function and alias additions. If you’d like to include them, you can run:

$ wget https://gist.githubusercontent.com/purpleidea/8071962/raw/ee27c56e66aafdcb9fd9760f123e7eda51a6a51e/.bashrc_vagrant.sh
$ echo '. ~/.bashrc_vagrant.sh' >> ~/.bashrc
$ . ~/.bashrc    # reload

to pull in my most used Vagrant aliases and functions. I’ve written about them before. If you’re interested, please read:

KVM/QEMU installation:

As I mentioned earlier, I’m assuming you have a minimal Fedora 20 installation, so you might not have all the libvirt pieces installed! Here’s how to install any potentially missing pieces:

# yum install -y libvirt{,-daemon-kvm}

This should pull in a whole bunch of dependencies too. You will need to start and (optionally) enable the libvirtd service:

# systemctl start libvirtd.service
# systemctl enable libvirtd.service

You’ll notice that I’m using the systemd commands instead of the deprecated service command. My biggest (only?) gripe with systemd is that the command line tools aren’t as friendly as they could be! The systemctl equivalent requires more typing, and make it harder to start or stop the same service in quick succession, because it buries the action in the middle of the command instead of leaving it at the end!

The libvirtd service should finally be running. On my machine, it comes with a default network which got in the way of my vagrant-libvirt networking. If you want to get rid of it, you can run:

# virsh net-destroy default
# virsh net-undefine default

and it shouldn’t bother you anymore. One last hiccup. If it’s your first time installing KVM, you might run into bz#950436. To workaround this issue, I had to run:

# rmmod kvm_intel
# rmmod kvm
# modprobe kvm
# modprobe kvm_intel

Without this “module re-loading” you might see this error:

Call to virDomainCreateWithFlags failed: internal error: Process exited while reading console log output: char device redirected to /dev/pts/2 (label charserial0)
Could not access KVM kernel module: Permission denied
failed to initialize KVM: Permission denied

Additional installations:

To make your machine somewhat more palatable, you might want to consider installing bash-completion:

# yum install -y bash-completion

You’ll also probably want to add the PolicyKit (polkit) .pkla file that I recommend in my earlier article. Typically that means adding something like:

[Allow james libvirt management permissions]
Identity=unix-user:james
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes

as root to somewhere like:

/etc/polkit-1/localauthority/50-local.d/vagrant.pkla

Your machine should now be setup perfectly! The last thing you’ll need to do is to make sure that you get a Vagrantfile that does things properly! Here are some recommendations.

Shared folders:

Shared folders are a mechanism that Vagrant uses to pass data into (and sometimes out of) the virtual machines that it is managing. Typically you can use NFS, rsync, and some provider specific folder sharing like 9p. Using rsync is the simplest to set up, and works exceptionally well. Make sure you include the following line in your Vagrantfile:

config.vm.synced_folder './', '/vagrant', type: 'rsync'

If you want to see an example of this in action, you can have a look at my puppet-gluster Vagrantfile. If you are using the puppet apply provisioner, you will have to set it to use rsync as well:

puppet.synced_folder_type = 'rsync'

KVM performance:

Due to a regression in vagrant-libvirt, the default driver used for virtual machines is qemu. If you want to use the accelerated KVM domain type, you’ll have to set it:

libvirt.driver = 'kvm'

This typically gives me a 5x performance increase over plain qemu. This fix is available in the latest vagrant-libvirt version. The default has been set to KVM in the latest git master.

Dear internets!

I think this was fairly straightforward. You could probably even put all of these commands in a shell script and just run it to get it all going. What we really need is proper RPM packaging. If you can help out, that would be excellent!

If we had a version of vagrant-libvirt alongside a matching Vagrant version in Fedora, then developers and hackers could target that, and we could easily exchange dev environments, hackers could distribute product demos as full vagrant-libvirt clusters, and I could stop having to write these types of articles ;)

I hope this was helpful to you. Please let me know in the comments.

Happy hacking,

James

 


Keeping git submodules in sync with your branches

This is a quick trick for making working with git submodules more magic.

One day you might find that using git submodules is needed for your project. It’s probably not necessary for everyday hacking, but if you’re glue-ing things together, it can be quite useful. Puppet-Gluster uses this technique to easily include all the dependencies needed for a Puppet-Gluster+Vagrant automatic deployment.

If you’re a good hacker, you develop things in separate feature branches. Example:

cd code/projectdir/
git checkout -b feat/my-cool-feature
# hack hack hack
git add -p
# add stuff
git commit -m 'my cool new feature'
git push
# yay!

The problem arises if you git pull inside of a git submodule to update it to a particular commit. When you switch branches, the git submodule‘s branch doesn’t move along with you! Personally, I think this is a bug, but perhaps it’s not. In any case, here’s the fix:

add:

#!/bin/bash
exec git submodule update

to your:

<projectdir>/.git/hooks/post-checkout

and then run:

chmod u+x <projectdir>/.git/hooks/post-checkout

and you’re good to go! Here’s an example:

james@computer:~/code/puppet/puppet-gluster$ git checkout feat/yamldata
M vagrant/gluster/puppet/modules/puppet
Switched to branch 'feat/yamldata'
Submodule path 'vagrant/gluster/puppet/modules/puppet': checked out 'f139d0b7cfe6d55c0848d0d338e19fe640a961f2'
james@computer:~/code/puppet/puppet-gluster (feat/yamldata)$ git checkout master
M vagrant/gluster/puppet/modules/puppet
Switched to branch 'master'
Your branch is up-to-date with 'glusterforge/master'.
Submodule path 'vagrant/gluster/puppet/modules/puppet': checked out '07ec49d1f67a498b31b4f164678a76c464e129c4'
james@computer:~/code/puppet/puppet-gluster$ cat .git/hooks/post-checkout
#!/bin/bash
exec git submodule update
james@computer:~/code/puppet/puppet-gluster$

Hope that helps you out too! If someone knows of a use-case when you don’t want this functionality, please let me know! Many thanks to #git for helping me solve this issue!

Happy hacking,

James

 


Keeping git submodules in sync with your branches

This is a quick trick for making working with git submodules more magic.

One day you might find that using git submodules is needed for your project. It’s probably not necessary for everyday hacking, but if you’re glue-ing things together, it can be quite useful. Puppet-Gluster uses this technique to easily include all the dependencies needed for a Puppet-Gluster+Vagrant automatic deployment.

If you’re a good hacker, you develop things in separate feature branches. Example:

cd code/projectdir/
git checkout -b feat/my-cool-feature
# hack hack hack
git add -p
# add stuff
git commit -m 'my cool new feature'
git push
# yay!

The problem arises if you git pull inside of a git submodule to update it to a particular commit. When you switch branches, the git submodule‘s branch doesn’t move along with you! Personally, I think this is a bug, but perhaps it’s not. In any case, here’s the fix:

add:

#!/bin/bash
exec git submodule update

to your:

<projectdir>/.git/hooks/post-checkout

and then run:

chmod u+x <projectdir>/.git/hooks/post-checkout

and you’re good to go! Here’s an example:

james@computer:~/code/puppet/puppet-gluster$ git checkout feat/yamldata
M vagrant/gluster/puppet/modules/puppet
Switched to branch 'feat/yamldata'
Submodule path 'vagrant/gluster/puppet/modules/puppet': checked out 'f139d0b7cfe6d55c0848d0d338e19fe640a961f2'
james@computer:~/code/puppet/puppet-gluster (feat/yamldata)$ git checkout master
M vagrant/gluster/puppet/modules/puppet
Switched to branch 'master'
Your branch is up-to-date with 'glusterforge/master'.
Submodule path 'vagrant/gluster/puppet/modules/puppet': checked out '07ec49d1f67a498b31b4f164678a76c464e129c4'
james@computer:~/code/puppet/puppet-gluster$ cat .git/hooks/post-checkout
#!/bin/bash
exec git submodule update
james@computer:~/code/puppet/puppet-gluster$

Hope that helps you out too! If someone knows of a use-case when you don’t want this functionality, please let me know! Many thanks to #git for helping me solve this issue!

Happy hacking,

James