Enhance RRD I/O Performance and Scaling in Munin 1.4
Like most RRD-based monitoring software (Cacti, Ganglia, ...), Munin is quite difficult to scale.
The bad part is that updating lots of small RRD files looks like pure random I/O to the OS, as the RRD documentation states.
The good part is that we are not alone, so the RRD developers tackled the issue with rrdcached. It spools the updates and flushes them to disk in batches, or when an RRD read command such as graphing needs them. That is why it scales well with CGI graphing, where only the RRDs behind a requested graph are read. Otherwise, munin-graph reads every RRD and therefore forces a flush of the whole cache.
And the icing on the cake is that, although it is only fully integrated into Munin 2.0, you can use it right away in the 1.4.x series.
You only need to define the environment variable RRDCACHED_ADDRESS while running the scripts accessing the RRDs.
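As a minimal sketch, the variable can be exported in the shell that starts the Munin scripts. The socket path below is an assumption, not a Munin or rrdcached default; use whatever address your rrdcached daemon actually listens on.

```shell
# Assumed socket path; replace it with the address your rrdcached listens on.
RRDCACHED_ADDRESS="unix:/var/run/munin/rrdcached.sock"
export RRDCACHED_ADDRESS

# Every script started from this shell now inherits the variable.
echo "RRD updates will be sent to: $RRDCACHED_ADDRESS"
```

Because the variable is inherited through the environment, it has to be set wherever the scripts are launched from (an interactive shell, a wrapper script, or cron), not only in your login session.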
Then remove the munin-graph part from munin-cron and run it from its own cron line, usually only every hour or so, so that data can accumulate in rrdcached before it is all flushed to disk when graphing.
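The split described above might look like the following cron file. This is only a sketch: the user name, the script paths, the socket path and the schedule are all assumptions and will differ between distributions. The snippet writes the candidate file to the current directory so it can be reviewed before anything is installed.

```shell
# Write a candidate cron.d file to the current directory for review.
# User name, paths and schedule are assumptions, not taken from a Munin package.
cat > munin.cron.example <<'EOF'
# Make rrdcached available to every job below (assumed socket path).
RRDCACHED_ADDRESS=unix:/var/run/munin/rrdcached.sock

# Every 5 minutes: munin-cron, with its munin-graph line removed.
*/5 * * * *  munin  /usr/bin/munin-cron

# Once an hour: graphing only. Reuse the arguments from the line you removed.
10 * * * *  munin  /usr/share/munin/munin-graph
EOF
```

Running the graphing job less often trades graph freshness for fewer forced flushes: the longer the interval, the more updates rrdcached can batch per write.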
Upgrading to 2.0 is also an option if you want real CGI support. (CGI graphing exists in 1.4, but its performance is nowhere near decent.)