Waiting for Munin 2.0 - Performance - Architecture
A little intro/refresh on munin's architecture on the master
Munin has a very simple architecture on the master: munin-cron is launched via cron every 5 minutes. Its only job is to launch the processes described below, in order.
The various processes
munin-update retrieves the values from the various nodes and updates the rrd files. It should never take more than 5 minutes to run; otherwise there will be gaps in the data, because runs are lockfile-protected and the next update will not be launched while the previous one is still running.
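The lockfile protection mentioned above can be pictured with a small Python sketch. This is only an illustration of the general idea, not Munin's actual code: the function name `run_exclusive` and the lock path are hypothetical, and the real implementation may handle stale locks differently.

```python
import os

def run_exclusive(lock_path, job):
    """Run job() only if no other run holds the lock. Return True if it ran."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # A previous run is still in progress: skip this cycle entirely.
        return False
    try:
        os.write(fd, str(os.getpid()).encode())
        job()
        return True
    finally:
        os.close(fd)
        os.remove(lock_path)
```

With a 5-minute schedule, a run that is skipped this way means one missing data point per skipped cycle, which is where the gaps come from.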
This process stresses the I/O on the master, and its duration depends on the plugins' execution time on the various nodes. In 1.4 the retrieval is multi-threaded, so a slow node doesn't hold up the whole process too much.
2.0 proposes asynchronous updates and vectorized updates.
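To illustrate why concurrent retrieval limits the impact of a slow node, here is a minimal Python sketch. It uses a thread pool for brevity; `fetch_node`, the node names and the returned values are made-up stand-ins, not Munin's protocol or code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_node(name, delay):
    """Stand-in for querying one node; delay simulates plugin execution time."""
    time.sleep(delay)
    return name, {"load": 0.0}  # placeholder value, not real data

def update_all(nodes, max_workers=4):
    """Fetch all nodes concurrently and return {node_name: values}.

    With enough workers, the total time is roughly that of the slowest
    node instead of the sum of all node times.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_node, name, delay)
                   for name, delay in nodes.items()]
        return dict(f.result() for f in futures)
```

For example, `update_all({"web1": 0.3, "web2": 0.3, "db1": 0.3})` should take about 0.3 seconds rather than 0.9.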
munin-graph generates all the image files from the rrd files.
It is usually quite CPU-bound, and it also generates a fair amount of I/O. Since 1.4, graph generation may also run in parallel in order to take advantage of multiple CPUs and multiple I/O paths.
A simple optimization is to generate only the graphs that are needed instead of all of them each time. This leads to CGI generation of graphs. 1.2 & 1.4 took a first step in this direction, but it's quite a hack, since it's only a very basic script that calls munin-graph with the correct parameters.
A FastCGI port of the wrapper (munin-cgi-graph) removes the overhead of starting the wrapper for each call, but in 1.4 the code is quite experimental and has some serious bugs that would need extensive patching.
2.0 completes the integration of CGI graphing by removing the overhead of calling munin-graph, and carries out that extensive bug fixing.
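The idea of generating only the needed graphs can be sketched as follows in Python. This is a hypothetical illustration: the freshness check based on file modification times and the `render` callback are assumptions made for the example, not a description of how munin-cgi-graph decides what to redraw.

```python
import os

def graph_is_stale(rrd_path, png_path):
    """True if the image is missing or older than the rrd file it is drawn from."""
    if not os.path.exists(png_path):
        return True
    return os.path.getmtime(png_path) < os.path.getmtime(rrd_path)

def serve_graph(rrd_path, png_path, render):
    """Return the image bytes, calling render(rrd_path, png_path) only when needed."""
    if graph_is_stale(rrd_path, png_path):
        render(rrd_path, png_path)
    with open(png_path, "rb") as f:
        return f.read()
```

A graph that nobody looks at is never rendered, and a graph requested several times between two updates is rendered only once.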
munin-html generates all the html files from the rrd files. It is quite fast for now.
munin-limits checks the limits to see whether there is a warning/alert to send via mail or Nagios. It is also quite fast for now.
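The limit check itself can be pictured as a simple range comparison. The Python sketch below is an assumption-laden illustration: the function name `check_limit` and the `(low, high)` tuple representation are invented for the example and do not mirror Munin's internal code.

```python
def check_limit(value, warning=None, critical=None):
    """Classify a value against optional (low, high) ranges.

    Either bound of a range may be None, meaning "no limit on that side".
    Returns "critical", "warning" or "ok".
    """
    def outside(rng):
        if rng is None:
            return False
        low, high = rng
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        return too_low or too_high

    # Critical takes precedence over warning.
    if outside(critical):
        return "critical"
    if outside(warning):
        return "warning"
    return "ok"
```

For instance, with `warning=(None, 80)` and `critical=(None, 95)`, a value of 85 is classified as a warning and a value of 99 as critical.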
Note: the retrieval in 1.4 is actually multi-process rather than multi-threaded.