Personal Workflow Blog


Monday, 20 June 2011

Enhance RRD I/O performance in Munin 1.4 and Scale

As with most RRD-based monitoring software (Cacti, Ganglia, ...), Munin is quite difficult to scale.

The bad part is that updating lots of small RRD files looks like pure random I/O to the OS, as the RRD documentation itself states.

The good part is that we are not alone, and the RRD developers tackled the issue with rrdcached. It spools the updates and flushes them to disk in batches, or when a file is needed by an RRD read command such as graphing. That's why it scales well with CGI graphing, which only reads the RRDs that are actually displayed. With cron-based munin-graph, on the other hand, every RRD is read on each run, which forces a flush of the whole cache.
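To make the batching argument concrete, here is a toy model in Perl of the write-back idea described above. It is not rrdcached's code or protocol: the ToyRRDCache class, its batch_size option and its read_rrd method are hypothetical, and "disk writes" are merely counted. It shows why on-demand graphing keeps most updates batched, while a full munin-graph run flushes everything.

```perl
use strict;
use warnings;

# Toy model of write-back caching for RRD updates (the idea behind
# rrdcached, not its code): updates are queued per file and written out in
# one batch, either once enough have piled up or when a reader needs the file.
package ToyRRDCache;

sub new {
    my ($class, %opt) = @_;
    return bless {
        batch_size => $opt{batch_size} // 100,
        pending    => {},   # file => [ queued values ]
        disk       => {},   # file => [ values already "on disk" ]
        writes     => 0,    # number of simulated disk writes
    }, $class;
}

sub update {
    my ($self, $file, $value) = @_;
    my $queue = $self->{pending}{$file} //= [];
    push @$queue, $value;
    $self->flush($file) if @$queue >= $self->{batch_size};
    return;
}

sub flush {
    my ($self, $file) = @_;
    my $queue = delete $self->{pending}{$file} or return;
    push @{ $self->{disk}{$file} }, @$queue;   # one write, however many values
    $self->{writes}++;
    return;
}

sub flush_all {
    my ($self) = @_;
    $self->flush($_) for sort keys %{ $self->{pending} };
    return;
}

# Reading (e.g. for graphing) forces a flush of that one file first.
sub read_rrd {
    my ($self, $file) = @_;
    $self->flush($file);
    return @{ $self->{disk}{$file} // [] };
}

package main;

my $cache = ToyRRDCache->new(batch_size => 100);
for my $i (1 .. 10) {
    $cache->update($_, $i) for qw(cpu load df);   # 30 updates, no write yet
}
my $writes_before_read = $cache->{writes};
my @cpu = $cache->read_rrd('cpu');                 # CGI-style: flush only 'cpu'
my $writes_after_read = $cache->{writes};
$cache->flush_all;                                 # cron munin-graph reads everything
printf "30 updates -> %d disk writes (instead of 30)\n", $cache->{writes};
```

With 3 files and 10 updates each, nothing reaches the disk until a read. Reading one file costs a single write, and flushing the rest costs one write per file: 3 writes in total instead of 30.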

And the icing on the cake is that, although it is only fully integrated into Munin 2.0, you can use it right away in the 1.4.x series.

You only need to define the RRDCACHED_ADDRESS environment variable when running the scripts that access the RRDs.

Then remove the munin-graph step from munin-cron and run it from its own cron line, usually only every hour or so, so that data can accumulate in rrdcached before it is all flushed to disk when graphing.
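A minimal sketch of the environment-variable trick, assuming the RRD library used by the Munin scripts picks RRDCACHED_ADDRESS up from the environment, as described above. The run_with_rrdcached helper and the socket address unix:/var/run/rrdcached.sock are hypothetical; use whatever address your rrdcached actually listens on.

```perl
use strict;
use warnings;

# Run one command with RRDCACHED_ADDRESS exported, so that the RRD library
# used by that command talks to rrdcached instead of writing files directly.
sub run_with_rrdcached {
    my ($address, @cmd) = @_;
    local $ENV{RRDCACHED_ADDRESS} = $address;   # previous value restored on return
    system(@cmd) == 0
        or die "'@cmd' failed with status $?\n";
    return;
}

# Demo: a child perl process sees the variable; the caller's %ENV is untouched.
my $before = $ENV{RRDCACHED_ADDRESS};
run_with_rrdcached('unix:/var/run/rrdcached.sock',
    $^X, '-e',
    'exit($ENV{RRDCACHED_ADDRESS} eq "unix:/var/run/rrdcached.sock" ? 0 : 1)');
my $after = $ENV{RRDCACHED_ADDRESS};
```

The same variable has to be visible to munin-graph as well, so that it can ask rrdcached to flush the files it reads before graphing them. In practice you would wrap munin-update (every 5 minutes) and munin-graph (on its own hourly cron line); the script locations depend on your distribution.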

Upgrading to 2.0 is also an option to get real CGI support (CGI exists in 1.4, but its performance is nowhere near decent).

Thursday, 16 June 2011

Autovivification in Perl : Great Idea but also Huge Trap - Another Leaky Abstraction...

Autovivification is one of Perl's really great design successes.

It all comes down to this : you don't need to check that something exists before dereferencing it.

For instance, to set a value deep inside a nested hash, you only need to write :

$h->{foo}{bar} = "value";

And that will work out of the box. Perl will happily create all the data-structure for you.

So now, a little quiz : what does the following code output ?

my $h;

if ($h->{foo}{bar}) {
   print "Found foo/bar\n";
}

if ($h->{foo}) {
   print "Found foo\n";
}

Naively, it shouldn’t output anything, right ?

Not so fast. Upon a careful reread of "Perl will happily create all the data-structure for you", one word deserves some emphasis : Perl will happily create all the data-structure for you.

That would be just perfect, except that Perl creates it whenever it needs it, even when it only needs it for reading.

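And now you understand the catch : a read operation can result in a write. To look up {bar}, Perl needs a hash in {foo}, so it creates an empty one there; a reference to an empty hash is still true, so the second test prints "Found foo".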
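The quiz can be checked with a few lines of core Perl. This sketch only uses plain hash references (the variable names are mine) and prints what the supposedly read-only test left behind.

```perl
use strict;
use warnings;

my $h;                                    # undefined, just like in the quiz
my $bar_found = $h->{foo}{bar} ? 1 : 0;   # a pure read... in theory

# ...yet the read left traces behind:
my $h_type     = ref $h;                  # $h itself became a hash reference
my $foo_type   = ref $h->{foo};           # and {foo} now holds an empty hash
my $bar_exists = exists $h->{foo}{bar} ? 1 : 0;

print "bar found: $bar_found\n";                      # bar found: 0
print "\$h is a $h_type, {foo} is a $foo_type\n";     # $h is a HASH, {foo} is a HASH
print "Found foo\n" if $h->{foo};                     # printed: an empty hash ref is true
```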

As Uncle Ben (from Spider-Man) said[1] : With Great Power Comes Great Responsibility.

Dagfinn Ilmari Mannsåker pointed me to the nice autovivification module on CPAN, which fixes this behavior and allows fine-tuning of the process.
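If installing the CPAN module is not an option, a small helper in core Perl avoids the trap by checking each level with exists before descending into it. The dig name is hypothetical; as far as I know, the module's lexical "no autovivification;" pragma gives a similar effect for plain lookups without changing the code.

```perl
use strict;
use warnings;

# Look up a path of keys in nested hashes without autovivifying anything:
# every level is checked with exists before it is entered.
sub dig {
    my ($ref, @keys) = @_;
    for my $key (@keys) {
        return unless ref $ref eq 'HASH' && exists $ref->{$key};
        $ref = $ref->{$key};
    }
    return $ref;
}

my $h = {};
print "Found foo/bar\n" if dig($h, qw(foo bar));
print "Found foo\n"     if dig($h, 'foo');    # still nothing: no {foo} was created
my $created = exists $h->{foo} ? 1 : 0;

$h->{foo}{bar} = "value";                     # writing still autovivifies, as intended
my $value = dig($h, qw(foo bar));
```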

I really think the fact that creation also happens when querying the value is a real bug in Perl itself, or at least a bug in the design of the feature.

Notes

[1] Voltaire, Franklin D. Roosevelt and others said something very similar, but they are not as geeky.

Monday, 23 August 2010

Waiting for Munin 2.0 - Keep more data with custom data retention plans

RRD is Munin's backbone.

Munin keeps its data in an RRD database. It's a wonderful piece of software, designed for this very purpose : keeping a history of numeric data.

All you need to do is tell RRD for how long, and at which precision, you want to keep your data. RRD then manages all the underlying work : pruning old data, averaging to decrease precision if needed, ...

Munin automatically creates the RRD databases it needs.

1.2 - Only one set

In 1.2, every database was created with the same temporal & precision parameters. Since the outputs were constant (day, week, month and year graphs), there was little need for different parameters.

1.4 - 2 sets : normal & huge

Various users expressed the need for different graphing outputs, and began to hack around Munin's fixed graphing. It quickly became obvious that the 1.2 preset didn't fit everyone.

Therefore a huge dataset was made available, extending the finest precision (5 min) to the whole Munin timeframe. This comes at a price though : more space is required, and graph generation is slower, especially for the yearly graph, since more data has to be read and analysed.

The switch is made for the whole Munin installation by changing the system-wide graph_data_size setting; already created RRD files aren't changed. It is therefore even possible for a user to pre-customize an RRD file : Munin will then happily use it transparently, thanks to the RRD layer.

Manual overriding

Altering an RRD file after it is created is possible, but not as simple. RRD's standard export & import carry the file structure along with the data, so the data has to be moved around with special tools. rrdmove is my attempt at such a tool : it copies data between 2 already existing RRD files, even asking RRD to interpolate the data when needed.

2.0 - Full control

Starting with 2.0, the graph_data_size parameter can be set per service. It also has a special mode : custom. Its format is very simple :

 
graph_data_size custom FULL_NB, MULTIPLIER_1 MULTIPLIER_1_NB, ... MULTIPLIER_N MULTIPLIER_N_NB
graph_data_size custom 300, 15 1600, 30 3000

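The first number is the number of data points kept at full resolution. Then usually come gradually decreasing resolutions : each pair gives how many full-resolution steps are aggregated into one point (the multiplier), and how many such points are kept.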
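To see what the example above actually keeps, here is a small Perl sketch that parses a custom graph_data_size value and translates each part into an rrdtool RRA definition. It assumes Munin's default 5-minute step and an xff of 0.5, and only shows AVERAGE archives (Munin may create MIN and MAX ones as well); the parse_custom_size helper is hypothetical, not Munin's own parser.

```perl
use strict;
use warnings;

my $BASE_STEP = 300;   # assumption: Munin's default 5-minute update interval

# Split "FULL_NB, MULTIPLIER_1 MULTIPLIER_1_NB, ..." into [multiplier, count]
# pairs; the leading full-resolution part gets a multiplier of 1.
sub parse_custom_size {
    my ($spec) = @_;
    my @archives;
    for my $part (split /\s*,\s*/, $spec) {
        my @n = split ' ', $part;
        die "cannot parse '$part'\n" if grep { !/^\d+$/ } @n;
        if    (@n == 1) { push @archives, [ 1, $n[0] ] }
        elsif (@n == 2) { push @archives, [@n] }
        else            { die "cannot parse '$part'\n" }
    }
    return @archives;
}

my @archives = parse_custom_size("300, 15 1600, 30 3000");
my ($rows, $covered) = (0, 0);
for my $rra (@archives) {
    my ($mult, $count) = @$rra;
    my $step = $mult * $BASE_STEP;    # seconds per stored point
    my $span = $step * $count;        # seconds covered by this archive
    printf "RRA:AVERAGE:0.5:%d:%d   one point per %d min, %.1f days\n",
        $mult, $count, $step / 60, $span / 86400;
    $rows += $count;
    $covered = $span if $span > $covered;
}
printf "%d rows in total, versus %d at full resolution for the same %.1f days\n",
    $rows, $covered / $BASE_STEP, $covered / 86400;
```

For "300, 15 1600, 30 3000" this gives 5-minute points for about a day, 75-minute points for about 83 days and 150-minute points for 312.5 days : 4900 rows, where full resolution over the same period would need 90000.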

A decreasing resolution serves 2 purposes :

  • Limit the space consumption : keeping full resolution for the whole period (default : 5min for 2 years) is sometimes more precision than needed.
  • Increase performance : RRD chooses the best-fitting resolution when generating its graphs, and already aggregated data is faster to process.

Monday, 12 July 2010

Waiting for Munin 2.0 - Native SSH transport

In the Munin architecture, the munin-master connects to each munin-node over plain TCP, using a very simple text protocol.

This has several advantages :

  1. Very simple to manage & install
  2. Optional SSL since 1.4, enabling secure communications
  3. Quite simple firewall rules.

It also has some disadvantages :

  1. A new listening service means a wider exposure
  2. The SSL option might add some administrative overhead (certificates management, ...)
  3. A custom protocol isn't supported by every firewall solution
  4. Some organisations only authorize a few protocols to simplify audits (ex: only SSH & HTTPS)

Native SSH

These drawbacks can be solved by tunneling over SSH, but that becomes tedious to maintain as the number of hosts grows.

Therefore 2.0 introduces the concept of a native SSH transport. Its usage is dead simple : replace the address with an ssh:// URL-like one.

The node still has to be adapted to communicate over stdin/stdout instead of a network socket. For now, only pmmn and munin-async are able to provide such a node.
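To illustrate what communicating over stdin/stdout means, here is a toy node in Perl that speaks a small subset of the munin text protocol (greeting banner, list, fetch, config, quit) on whatever handles it is given. It is not pmmn or munin-async : the demo plugin, its value and the serve function are made up, and only the commands shown are handled.

```perl
use strict;
use warnings;
use IO::Handle;
use Sys::Hostname qw(hostname);

# Hypothetical plugin table; a real node would run plugin scripts instead.
my %plugins = (
    demo => {
        config => [ 'graph_title Demo', 'demo.label demo' ],
        fetch  => sub { ('demo.value 42') },
    },
);

# Speak a subset of the munin node protocol on any pair of handles.
sub serve {
    my ($in, $out, $host) = @_;
    $out->autoflush(1);
    print {$out} "# munin node at ", $host // hostname(), "\n";
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;                       # drop newline / trailing blanks
        my ($cmd, $arg) = split ' ', $line, 2;
        next unless defined $cmd;
        if ($cmd eq 'quit') {
            last;
        } elsif ($cmd eq 'list') {
            print {$out} join(' ', sort keys %plugins), "\n";
        } elsif ($cmd eq 'fetch' || $cmd eq 'config') {
            my $plugin = defined $arg ? $plugins{$arg} : undef;
            my @lines = !$plugin        ? ('# Unknown service')
                      : $cmd eq 'fetch' ? $plugin->{fetch}->()
                      :                   @{ $plugin->{config} };
            print {$out} "$_\n" for @lines, '.';  # multi-line answers end with "."
        } else {
            print {$out} "# Unknown command. Try list, config, fetch or quit\n";
        }
    }
    return;
}

# Demo session driven from memory instead of a real ssh connection.
my $session = "list\nfetch demo\nfetch nope\nquit\n";
open my $in, '<', \$session or die "open: $!\n";
my $transcript = '';
open my $out, '>', \$transcript or die "open: $!\n";
serve($in, $out, 'node.example.com');
close $out;
print $transcript;
```

When sshd starts such a script as the remote command, it would simply call serve(\*STDIN, \*STDOUT).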

Configuration

The URL is quite self-explanatory as shown in the example below :

[old-style-host]
    address host.example.com

[new-style-host]
    address ssh://munin-node-user@host.example.com/path/to/stdio-enabled-node --params
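On the master side, an ssh:// address boils down to a command line. This hypothetical ssh_argv_for helper shows one way to split the example address; it ignores ports and other URL details, and Munin's real parsing may differ, so treat it as a sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an ssh:// node address into the argv the master
# could spawn. Everything after the path is passed on as the node's arguments.
sub ssh_argv_for {
    my ($address) = @_;
    my ($login, $path, $params) =
        $address =~ m{^ssh://([^/\s]+)(/\S*)(?:\s+(.*))?$}
        or die "not an ssh:// address: $address\n";
    return ('ssh', $login, $path, defined $params ? split(' ', $params) : ());
}

my @argv = ssh_argv_for(
    'ssh://munin-node-user@host.example.com/path/to/stdio-enabled-node --params');
print join(' ', @argv), "\n";
# ssh munin-node-user@host.example.com /path/to/stdio-enabled-node --params
```

The master would then start that command (for example with IPC::Open2) and speak the munin protocol over the child's stdin and stdout, just as it would over a TCP socket.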

Installation notes

Authentication should be done with SSH keys rather than passwords. The connection goes from munin-user@host-munin to munin-node-user@remote-node.

If you use munin-async, the user on the remote node can be a read-only one, since it only needs to read spooled data. This implies using --spoolfetch, and not --vectorfetch, which updates the spool repository.

Upcoming HTTP(S) transport in 3.0

And the sweetest part : since all the groundwork for adding another transport has been done, adding a CGI-based HTTP transport is now possible (and therefore done) for 3.0.

Saturday, 26 June 2010

Waiting for Munin 2.0 - Performance - Asynchronous updates

munin-update is the fragile link in the Munin architecture. A missed execution means that some data is lost.

The problem : updates are synchronous

In Munin 1.x, updates are synchronous : the value of each service[1] is the one that munin-update retrieves on each scheduled run.

The issue is that munin-update has to ask every service on every node for its values. Since the values are only computed when asked for, munin-update has to wait quite some time for every one of them.

This very simple design enables Munin to have the simplest plugins : they are completely stateless. While this is one of Munin's great strengths, it deals a severe blow to scalability : more plugins per node obviously means slower retrieval.

Evolving Solutions

1.4 : Parallel Fetching

1.4 addresses some of these scalability issues by implementing parallel fetching. It takes into account that most of munin-update's execution time is spent waiting for replies. In 1.4, munin-update can query up to max_processes nodes in parallel.
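The effect of max_processes can be sketched with fork and waitpid : never more than N fetches in flight, and a new one starts as soon as one finishes. This is a simplified model rather than munin-update's actual code; parallel_fetch and the simulated fetch callback (which sleeps, and fails for one node) are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Query @nodes with at most $max_processes children alive at once. $fetch
# stands in for talking to one munin-node; a child's exit code reports
# success (0) or failure (1) back to the parent.
sub parallel_fetch {
    my ($max_processes, $fetch, @nodes) = @_;
    my (%running, %status);            # pid => node, node => exit code
    my $peak = 0;
    my $reap = sub {
        my $pid = waitpid(-1, 0);
        $status{ delete $running{$pid} } = $? >> 8;
    };
    for my $node (@nodes) {
        $reap->() while keys %running >= $max_processes;
        my $pid = fork() // die "fork failed: $!\n";
        if ($pid == 0) {               # child: fetch one node, then leave
            my $ok = eval { $fetch->($node); 1 };
            POSIX::_exit($ok ? 0 : 1);
        }
        $running{$pid} = $node;
        $peak = keys %running if keys %running > $peak;
    }
    $reap->() while %running;
    return (\%status, $peak);
}

my ($status, $peak) = parallel_fetch(
    2,
    sub {
        select(undef, undef, undef, 0.1);        # pretend to wait for the node
        die "timeout\n" if $_[0] eq 'node-c';    # simulate one failing node
    },
    (map { 'node-' . $_ } 'a' .. 'e'),
);
printf "%s: %s\n", $_, $status->{$_} ? 'failed' : 'ok' for sort keys %$status;
print "at most $peak fetches ran at once\n";
```

Since the elapsed time is dominated by waiting, 5 nodes at 0.1 s each take roughly 0.3 s with 2 workers instead of 0.5 s sequentially, and one failing node doesn't prevent the others from being collected.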

Now, I/O is becoming the next limiting factor, since updating many RRDs in parallel amounts to random I/O for the munin-master's OS. Serializing & grouping the updates will be possible with the new RRDp interface from rrdtool 1.4 and on-demand graphing. Tomas Zvala even offered a patch bringing RRDp to Munin 1.4 on the mailing list. It is very promising, but doesn't address the root defect of this design : a hard dependence on regular munin-update runs.

2.0 : Stateful plugins

2.0 provides a way for plugins to be stateful. They can schedule their own polling and, when munin-update runs, only emit the values they have already collected. This way, a missed run isn't as dramatic as in the 1.x series, since no data is lost. Data collection is also much faster, because the real computing is done ahead of time.
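Here is a sketch of the stateful-plugin idea in Perl : samples are recorded on the plugin's own schedule, and a fetch only replays what is pending. It assumes that munin-update accepts several "field.value EPOCH:VALUE" lines for the same field in one answer, and a real plugin would keep its state in a file, since each plugin run is a new process. record_sample and emit_pending are hypothetical names.

```perl
use strict;
use warnings;

# Remember one sample taken at $epoch (the plugin's own polling schedule).
sub record_sample {
    my ($state, $field, $epoch, $value) = @_;
    push @{ $state->{$field} }, [ $epoch, $value ];
    return;
}

# On fetch, emit every pending sample with its timestamp, then forget them.
sub emit_pending {
    my ($state) = @_;
    my @lines;
    for my $field (sort keys %$state) {
        push @lines, "$field.value $_->[0]:$_->[1]" for @{ $state->{$field} };
    }
    %$state = ();    # handed over to munin-update: nothing left pending
    return @lines;
}

my %state;
record_sample(\%state, 'load', 1277500000, 0.42);   # polled at its own pace...
record_sample(\%state, 'load', 1277500060, 0.38);   # ...here every minute
my @first_fetch  = emit_pending(\%state);
my @second_fetch = emit_pending(\%state);           # nothing new since
print "$_\n" for @first_fetch;
```

The second fetch is empty. If munin-update misses a run, the samples simply wait in the state until the next one instead of being lost.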

2.0 : Asynchronous proxy node

But changing plugins to be stateful and self-polled is difficult and tedious. It even works against one of the real strengths of Munin : having simple & stateless plugins.

To address this concern, an experimental proxy node has been created. For 2.0 it takes the form of a couple of processes : munin-async-server and munin-async-client.

The proxy node in detail (munin-async)

Overview

These 2 processes form an asynchronous proxy between munin-update and munin-node. This avoids the need to change the plugins or upgrade munin-node on all nodes.

munin-async-server should be installed on the same host as the proxied munin-node, in order to avoid any network issue. It is the process that regularly polls munin-node. munin-update's I/O issue doesn't exist here, since munin-async simply appends all the values to a text file without any further processing. This file is later read by munin-update through the client, and processed there.
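The append-only spool can be sketched in a few lines of Perl. The line format, the file name and both helpers are hypothetical (munin-async has its own format); the point is that the polling side only appends text, while the reading side asks for everything newer than the last timestamp it received.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical spool format, one sample per line: "EPOCH SERVICE FIELD VALUE".
# Writing is a plain append: no RRD work happens on this side.
sub spool_append {
    my ($file, $epoch, $service, $field, $value) = @_;
    open my $fh, '>>', $file or die "open $file: $!\n";
    flock($fh, LOCK_EX) or die "flock $file: $!\n";   # several pollers may append
    print {$fh} "$epoch $service $field $value\n";
    close $fh or die "close $file: $!\n";
    return;
}

# The reading side asks for everything newer than the last sample it got.
sub spool_read_since {
    my ($file, $since) = @_;
    open my $fh, '<', $file or return;   # no spool yet: nothing to report
    my @samples;
    while (my $line = <$fh>) {
        chomp $line;
        my ($epoch, $service, $field, $value) = split ' ', $line, 4;
        push @samples, [ $epoch, $service, $field, $value ] if $epoch > $since;
    }
    return @samples;
}

my $spool = tempdir(CLEANUP => 1) . '/spool';
spool_append($spool, 1277500000, 'load', 'load', '0.42');
spool_append($spool, 1277500300, 'load', 'load', '0.38');
my @new = spool_read_since($spool, 1277500000);   # previous run already got the first
print "@$_\n" for @new;
```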

Specific update rates

Having one proxy per node enables polling each of its services at a specific update rate.

To achieve this, munin-async-server forks into multiple processes, one for each proxied service. This way each service is completely isolated from the others : it can have its own update rate, it is safe from other plugins' slowdowns, and information gathering is even fully parallelized.

SSH transport

munin-async-client uses 2.0's new native SSH transport, which makes installing the async proxy very simple.

Notes

[1] In 1.2 a service is the same as a plugin, but since 1.4 and the introduction of multigraph, one plugin can provide multiple services.
