Personal Workflow Blog

Saturday, 13 April 2013

Spinoffs in the munin ecosystem

KISS is the core design of Munin

Munin's greatest strength is its very KISS architecture. It therefore gets many things right, such as being highly modular.

Each component (master/node/plugin) has a simple API to communicate with the others.

Spin-offs ...

I admit that the master, and even the node, have convoluted code. In fact, some rewrites already exist.

... are welcomed ...

And they are a really good thing, as they enable rapid prototyping of things that the stock munin (currently) has trouble doing.

The stock munin is a piece of software that many depend upon, so it has to move at a much slower pace than one would want, myself included. As much as I really want to add many, many features to it, I still have to take extra care that it doesn't break anything, even the least known features.

So I take munin's offspring very seriously, and even offer as much help as I can for them to succeed.

... because they are very valuable in the long term

In my opinion, competition is only bad in the short term, and in the long term competitors usually add significant value to the whole ecosystem. That said, there's always a risk of slowly becoming irrelevant, but I think that's the real power of open-source's evolutionary paradigm : embrace them, or become obsolete and get replaced.

After all, if someone takes the time to author a competitor that has real threat potential, it mostly means that there's a real itch to scratch and that there is a lot to learn from it.

Different layers of spin-offs

The munin ecosystem is divided into 3 main categories, obviously related to the 3 main components of munin : master, node & plugin.

Plugins

That's the most obvious part as custom plugins are the real bread and butter of munin.

Stock plugins are mostly written in Perl or POSIX shell, as Perl is munin's own language and POSIX shell is ubiquitous. Core munin acknowledges this by providing 2 libraries (Perl & shell) to help plugin authoring.
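To give an idea of how little is needed, here is a minimal sketch of a shell plugin. The field name load, the graph title and the use of /proc/loadavg (Linux-only) are illustrative assumptions, and it deliberately does not use the helper libraries. The only contract it relies on is that the node calls the plugin with config to get the graph description, and without argument to get the values.

```shell
#!/bin/sh
# Minimal munin plugin sketch. The field name "load" and the graph
# title are illustrative choices, not copied from a stock plugin.

# Graph description, printed when the plugin is called with "config".
print_config() {
    echo "graph_title Load average"
    echo "graph_vlabel load"
    echo "load.label load"
}

# Current value, printed when the plugin is called without argument.
# Assumption: a Linux-style /proc/loadavg; otherwise report unknown.
print_values() {
    if [ -r /proc/loadavg ]; then
        read -r load1 _rest < /proc/loadavg
    else
        load1=U   # "U" means "unknown" for munin
    fi
    echo "load.value ${load1}"
}

plugin_main() {
    case "${1:-}" in
        config) print_config ;;
        *)      print_values ;;
    esac
}

plugin_main "$@"
```

Running it by hand with and without the config argument shows both halves of the conversation the node has with a plugin.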

So, it's quite natural that each mainstream language has grown its own plugin library. Some languages even have two of them.

C

Some plugins even got rewritten in plain C, as it was shown that shell plugins have a significant impact on very under-powered nodes, such as embedded routers.

Node

This component is very simple. Yet, it has to be run on all the nodes that one wants to monitor. It is currently written in Perl, and while that's not an issue on UNIX-like systems, it can be quite problematic on embedded ones.

Simple munin

The official package comes with a POSIX shell rewrite that has to be run from inetd. It is quite useful for embedded routers like OpenWRT, but still suffers from a hard dependency on POSIX shell and inetd.

SNMP

SNMP is another way to monitor nodes. While it works really well, it mostly suffers from the fact that its configuration is quite different from the usual way, so I guess some things will change on that side.

Win32 ports

Win32 has long been a very difficult OS to monitor, as it doesn't offer many of the UNIX-esque features. Yet the number of win32 nodes that one wants to monitor is quite high, and supporting them makes munin one of the few systems that can easily monitor heterogeneous systems.

Therefore, while you can install the stock munin-node, several dedicated projects emerged. We decided to adopt munin-node-win32.

Android

There's also a dedicated node for Android. It makes sense, given that Android, although Linux-derived, lacks Perl and is a mostly Java platform. This node also has some basic capabilities of pushing data to the master instead of the usual polling.

This is especially interesting given the fact that Android nodes are usually loosely connected, so the node spools values itself and pushes them when it recovers connectivity.

Note that this is specifically an aspect that is currently lacking in munin, and I'm planning to address it in the 2.1 series. So thanks to its author for showing a relevant use-case.

C

That's my latest experiment. It started with a simple question : how difficult would it be to code a fairly portable version of the node ?

It turned out that it wasn't that difficult. I'm even asking myself about eventually replacing the win32-specific port with this one, as the code is much simpler. The win32 node has several plugins built in, mostly due to platform specifics. I still have to find a way to work around that, but it's in quite good shape.

This post was originally meant to promote it, but while writing it I noticed that the ecosystem deserved a post of its own. So I'll write another one, specific to the C port of munin-node and plugins.

Master

The master is the most complex component. So rewrites of it won't happen as-is. They usually take the form of a bridge between the munin protocol and another graphing system, such as Graphite.

Clients

There are also client libraries that are able to directly query munin nodes, in order to reuse the vast plugin ecosystem. The languages vary, from the obvious Python to Ruby, along with a quite modern node.js one.
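Querying a node directly does not even require a library, since the node speaks a plain-text protocol over TCP (port 4949 by default). Here is a rough sketch in shell. The helper names munin_request and munin_clean are made up for this example, and it assumes that a netcat-like tool (nc) is available to do the actual transport.

```shell
#!/bin/sh
# Sketch of a tiny munin-node client. Both helper names are hypothetical.

# Build the command stream for one plugin : its config, its values, then quit.
munin_request() {
    printf 'config %s\nfetch %s\nquit\n' "$1" "$1"
}

# Drop the node's greeting banner (a line starting with "#") and the
# lone "." lines that terminate each answer.
munin_clean() {
    grep -v -e '^#' -e '^\.$'
}
```

A typical call would be munin_request load | nc localhost 4949 | munin_clean, assuming a node listening locally and a plugin named load.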

Thursday, 4 April 2013

Do not fear git rebase : make snapshots !

Git is a nice version control system, but some commands are destructive, such as rebase.

Here is a script to have a safety net, and free backups !

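#! /bin/sh
# Script to snapshot a git repo as a bundle stored next to it
set -e
# Timestamp (seconds since the epoch) used to name the snapshot
SNAP_VERSION=$(date +%s)
BUNDLE_NAME="$(basename "$(pwd)").${SNAP_VERSION}.git.bundle"
# Pack every ref into a single bundle file in the parent directory
git bundle create "../${BUNDLE_NAME}" --all
# Register the bundle as a remote and fetch it : its branches become snap-<timestamp>/<branch>
git remote add "snap-${SNAP_VERSION}" "../${BUNDLE_NAME}"
git fetch -p "snap-${SNAP_VERSION}"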

Usage is very easy. For example, to reset your current branch to the master branch as it was in a snapshot you made earlier :

git reset --hard snap-1365068411/master

Sunday, 24 February 2013

When having good relationships with package maintainers can also be a curse

I advise every user to only use the packaged version of munin. Here's a short article to explain the background of my reluctance to ask users to use the official tarball directly.

I became upstream of munin a while ago now. As such, I'm in contact with package maintainers. They take the official releases and cram them into their own distribution of choice[1].

I have to admit that the various epic war stories read throughout the web about upstream vs packagers are very far from the truth here. They are a charm to work with. Often challenging and demanding, but always because there's a real need. And that's quite a good thing, as I'm still a rookie in terms of open source software management. Therefore I'm quite grateful when they gently point out my mistakes[2].

Yet, this nice team comes with a price. Since we mostly hang out on IRC together, there is way more inter-distro communication than for other software. But I'm the sole owner of the tarball distro.

Yet, as I don't like to build everything from source, I obviously use a distro. There, since the packaging is very nicely done, I don't feel like taking the hassle of using my own "tarball" to test them. I just build a package for my distro out of the release code.

That's also a curse, as I admit that although I test the code, I only seldom test the packaging. This means that I cannot really advise someone on using the tarball, or the git code directly, as even I don't do it.

But, that said, I still think I'm the luckiest upstream around. Thanks guys !

Notes

[1] Be it Linux-based like Gentoo, Redhat..., BSD-based like FreeBSD, OpenBSD..., or even multi-kernel based like Debian

[2] Defaulting to CGI graphics was a move that was way too premature, end-user wise. So thanks to them, it defaults to cron again

Friday, 1 February 2013

Avoid those milli-hits in Munin

A recurring question on IRC is : why do I have 500 million hit/s in my graph ?

It turns out that they are really seeing 500m hit/s, and that the lower-case m means milli, as specified in the metric system, and not mega. So the graph actually shows 0.5 hit/s. This scaling is done automatically by RRD.

To avoid this, you should just specify graph_scale no in the plugin's config, as documented.
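For illustration, here is how the config part of a shell plugin could look. It is only a sketch : the plugin, its title and the hits field are hypothetical, and only the graph_scale line matters here.

```shell
#!/bin/sh
# Config output of a hypothetical "hits" plugin.
print_config() {
    echo "graph_title Hits per second"
    echo "graph_vlabel hit/s"
    echo "graph_scale no"    # turn off the automatic unit prefixes (m, k, M...)
    echo "hits.label hits"
    echo "hits.type DERIVE"
    echo "hits.min 0"
}
```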

Thursday, 12 July 2012

Waiting for Munin 2.0 - Break the 5 minutes barrier !

Every monitoring software has a polling rate. It is usually 5 min, because it's the sweet spot that enables frequent updates while still keeping the overhead low.

Munin is not different in that respect : its data fetching routines have to be launched every 5 min, otherwise you'll face data loss. And this 5 min period is deeply ingrained in the code. So changing it is possible, but very tedious and error prone.

But sometimes we need a very fine sampling rate. Sampling every 10 seconds enables us to track fast-changing metrics that would be averaged out otherwise. Changing the whole polling process to cope with a 10s period is very hard on hardware, since every update then has to finish within these 10 seconds.

This triggered an extension in the plugin protocol, commonly known as supersampling.

Supersampling

Overview

The basic idea is that fine precision should be for selected plugins only. It also cannot be triggered from the master, since the overhead would be way too big.

So, we just let the plugin sample the values itself, at a rate it deems adequate. Then, at each polling round, the master fetches all the samples since the last poll.

This enables various constructions, mostly around streaming plugins to achieve highly detailed sampling with a very small overhead.

Notes

This protocol is currently completely transparent to munin-node, and can therefore be used even on older (1.x) nodes. Only a 2.0 master is required.

Protocol details

The protocol itself is derived from the spoolfetch extension.

Config

A new directive is used, update_rate. It enables the master to create the rrd with an adequate step.

Omitting it would lead to rrd averaging the supersampled values onto the default 5 min rate. This means data loss.

Notes

The heartbeat is always 2 steps long, so failing to send all the samples will result in unknown values, as expected.

With the default config, the RRD file size is always the same, as all the RRA are configured proportionally to the update_rate. This means that, since you keep the same number of data points as with the default, you keep them for a shorter time.
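A bit of arithmetic makes that trade-off concrete. The sketch below computes the time span covered by one RRA as update_rate × steps per row × number of rows. The figure of 576 rows with 1 step per row is only an illustrative assumption, not necessarily what your install uses.

```shell
#!/bin/sh
# Time span (in seconds) covered by one RRA.
# $1 = update_rate in seconds, $2 = steps per row, $3 = number of rows
rra_span_seconds() {
    echo $(( $1 * $2 * $3 ))
}

# Illustrative assumption : an RRA of 576 rows, 1 step per row.
rra_span_seconds 300 1 576   # default 5 min rate : 172800 s, i.e. 2 days
rra_span_seconds 10 1 576    # update_rate 10     : 5760 s, i.e. 96 minutes
```

Same number of rows, hence the same file size, but the 10 second version only covers 1/30th of the time.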

Fetch

When spoolfetching, the epoch is also sent in front of the value. Supersampling is then just a matter of sending multiple epoch/value lines, with monotonically increasing epochs. Note that since the epoch is an integer value for rrdtool, the smallest granularity is 1 second. For the time being, the protocol itself also mandates integers. We can easily imagine that, with another database as backend, an extension could be hacked together.
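Put together, the config and fetch sides of a supersampled plugin could look like the sketch below. The cpu field, the 10 second rate and the way samples arrive (as "EPOCH VALUE" pairs on stdin, already sorted by epoch) are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of a supersampled plugin. Field name and rate are made up.

# Config : update_rate tells the master which step to use for the rrd.
print_config() {
    echo "graph_title CPU usage"
    echo "update_rate 10"
    echo "cpu.label cpu"
}

# Fetch : one "field.value EPOCH:VALUE" line per sample.
# Reads "EPOCH VALUE" pairs on stdin, assumed sorted by increasing epoch.
print_samples() {
    while read -r epoch value; do
        echo "cpu.value ${epoch}:${value}"
    done
}
```

For instance, printf '1365068400 12\n1365068410 17\n' | print_samples emits two cpu.value lines, one per sample, 10 seconds apart.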

Compatibility with 1.4

On older 1.4 masters, only the last sampled value gets into the rrd.

Sample implementation

The canonical sample implementation is multicpu1sec, a contrib plugin on GitHub. It is also a so-called streaming plugin.

Streaming plugins

When called, these plugins fork a background process that streams the output of a system tool into a spool file. In multicpu1sec, it is the mpstat tool with a period of 1 second.
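The general shape of such a plugin can be sketched as follows. This is not the multicpu1sec code : the spool path, the cpu field and the sample_value stand-in (which replaces a real tool such as mpstat) are all assumptions.

```shell
#!/bin/sh
# Streaming plugin sketch. Spool path and sampler are assumptions.
SPOOL=${SPOOL:-/tmp/munin-stream-demo.spool}

# Hypothetical stand-in for a real measurement tool.
sample_value() { echo 0; }

# Fork a background loop that appends "EPOCH VALUE" once per second.
# Prints the PID of the background process.
start_sampler() {
    ( while :; do
          echo "$(date +%s) $(sample_value)" >> "$SPOOL"
          sleep 1
      done ) >/dev/null 2>&1 &
    echo $!
}

# On fetch : emit everything spooled so far, then empty the spool.
fetch_spool() {
    [ -f "$SPOOL" ] || return 0
    while read -r epoch value; do
        echo "cpu.value ${epoch}:${value}"
    done < "$SPOOL"
    : > "$SPOOL"
}
```

Note that a sample written between the read and the truncation would be lost, so a real plugin needs some locking or file rotation there. It is left out to keep the sketch short.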

Undersampling

Some plugins are on the opposite side of the spectrum, as they only need a lower precision.

It makes sense when :

  • data should be kept for a very long time
  • data is very expensive to generate and it doesn't vary fast.
