Personal Workflow Blog


Monday, 21 June 2010

CGI on steroids with FastCGI, but on a CGI-only server - The FastCGI wrapper

FastCGI is really CGI on steroids

FastCGI is a very common way to increase the performance of a CGI installation. It relies on the fact that starting a CGI script is usually slow, whereas producing the response is quite fast.

So if you have a persistent process, you only have to take care of the startup once, and you then experience a real speedup.
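To put rough numbers on that argument, here is a back-of-the-envelope sketch. The function name and the 1500 ms / 150 ms figures are made up for illustration, they are not measurements.

```shell
# total_ms MODE REQUESTS STARTUP_MS RESPONSE_MS
# Plain CGI pays the startup cost on every request,
# a persistent (FastCGI) process pays it only once.
total_ms() {
    case "$1" in
        cgi)     echo $(( $2 * ($3 + $4) )) ;;
        fastcgi) echo $(( $3 + $2 * $4 )) ;;
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

total_ms cgi 10 1500 150      # prints 16500
total_ms fastcgi 10 1500 150  # prints 3000
```

The more requests are served, the less the one-time startup weighs in the total.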

FastCGI vs mod_perl (or mod_python, ...)

Once a big fan of mod_perl, I have since converted to FastCGI. mod_perl was for a long time the answer for speeding up Perl CGI scripts. It has a very good track record of stability and hooks deep into Apache's request processing.

FastCGI focuses on a different feature set, one that is more relevant today than mod_perl's[1] :

  • It is much simpler to install and configure, especially with multiple applications.
  • It can connect to a separate server process (running as a different UID, chrooted or even on a remote host).
  • It can mix scripting languages without compiling any additional Apache modules.
  • It can be used with several webservers, even closed-source ones : FastCGI is a protocol, not an API.

But steroids do have some side effects

CGI issues

One downside is that your CGI script has to be adapted to FastCGI, and in particular to the fact that the process no longer ends when the request does.

In the real world that's quite easy. Every language that is commonly used for CGI offers CGI-wrapper libraries that work in a FastCGI context as well as in a plain CGI one.

Webserver issues

Another issue can come from the webserver. Since CGI is dead simple to implement, even the micro-webserver thttpd implements it.

FastCGI on the other hand is a little more difficult to implement, since the webserver needs to create a container that monitors and calls the FastCGI-enabled script.

A standalone FastCGI container

Fortunately, the FastCGI team provided us with a ready-to-use container and a very simple client that acts as a plain CGI script, but proxies the request to a full-blown container.

Since the plain CGI part is a very small native executable, its overhead is negligible compared to the reply time, let alone the startup time of the whole script.

Its installation is also quite straightforward. I just installed the libfcgi package on Debian : it provides /usr/bin/cgi-fcgi.

I created a simple CGI wrapper for my previous munin benchmarking needs :

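#! /bin/sh

# Plain CGI entry point : cgi-fcgi forwards the request to the FastCGI
# application listening on the socket, and starts it if it cannot connect.
exec /usr/bin/cgi-fcgi -connect /tmp/munin-cgi.sock \
     /usr/lib/cgi-bin/munin-cgi-graph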
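If several applications need the same treatment, the wrapper can be generated rather than written by hand. This is only a minimal sketch : the function name write_fcgi_wrapper is made up for illustration, and it assumes that cgi-fcgi lives in /usr/bin and that neither path contains spaces.

```shell
# write_fcgi_wrapper SOCKET APP
# Prints a plain CGI wrapper script on stdout (hypothetical helper).
write_fcgi_wrapper() {
    cat <<EOF
#! /bin/sh

exec /usr/bin/cgi-fcgi -connect $1 $2
EOF
}

# Example : print the wrapper for munin-cgi-graph.
write_fcgi_wrapper /tmp/munin-cgi.sock /usr/lib/cgi-bin/munin-cgi-graph
```

Redirect the output to a file in the cgi-bin directory and make it executable.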

Notes

[1] Who really needs deep Apache hooks ?

Wednesday, 16 June 2010

Waiting for Munin 2.0 - Performance - FastCGI

Munin 1.2 has CGI graphing : it is slow and unsupported, but it does exist.

1.4 even has an experimental FastCGI install mode.

Quoting from this page :

This is more a proof of concept than a recommended - it's slow. Also we do not test it before every release

In 2.0 lots of work has been done to turn this experimental CGI mode into a supported one. It might even become the primary way of using munin since, once an install reaches a certain size, CGI becomes mandatory.

That's because munin-graph hasn't finished its job by the time the next run is launched, so the new one doesn't run. It is not as dramatic as a missed munin-update execution, since the graphs will still be generated in a later round, but there will be random graph lags and it will put quite some stress on the CPU & I/O subsystem. This slows munin-update down, since it also makes heavy use of the I/O subsystem, and that's to be avoided at all costs.

Making CGI mainstream has some consequences :

  1. Only the FastCGI wrapper remains : the plain CGI one is dropped.
    • The CPAN module CGI::Fast also works when the script is launched as a normal CGI.
    • Almost all HTTP servers support plain CGI, and with the cgi-fcgi wrapper from the FastCGI devkit (Debian package libfcgi), you can have the best of both worlds (a custom HTTP server & FastCGI). I even posted on how to get thttpd working with FastCGI.
  2. The old process limit mechanism is also dropped. The FastCGI server configuration is a much better way to control it. The old code was based on System V semaphores and was not 100% reliable.
  3. A caching system has to be implemented, so that each graph is generated only once during its lifetime.
  4. The CGI process runs as the HTTP server user. Since it no longer only reads, but also writes log files and image files, there is an extra step when installing it. But it's already described in the Munin CGI page given previously.
  5. Since the process is launched only once, for now it reads the config only once. So if some part of the config changes, the FastCGI container MUST be restarted.
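The last point lends itself to a small check. The sketch below only decides whether a restart is due, by comparing the config file against a stamp file touched at the last restart. The function name, the stamp file and the way the container actually gets restarted are all assumptions that depend on your setup. It also relies on the -nt test, which is not in POSIX but is supported by dash, bash and ksh.

```shell
# config_changed CONFIG STAMP
# Succeeds when CONFIG is newer than STAMP, or when STAMP does not exist yet.
config_changed() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Hypothetical usage ; how the FastCGI container is restarted is site-specific :
#   if config_changed /etc/munin/munin.conf /var/run/munin/fcgi.stamp; then
#       ... restart the container here ...
#       touch /var/run/munin/fcgi.stamp
#   fi
```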

Some benchmarks

Now, the sweet part : I'm putting up some micro-benchmarks.

They should be taken with caution, as every benchmark should be, but I think the general idea is conveyed. For the sake of simplicity I'm only issuing 1 request at a time and have disabled IMS (If-Modified-Since) caching.

Basic 1.2 CGI

$ httperf --num-conns 10  --add-header='Cache-Control: no-cache\n' \
    --uri  /cgi-bin/munin-cgi-graph/localdomain/localhost.localdomain/cpu-day.png

Total: connections 10 requests 10 replies 10 test-duration 27.939 s

Connection rate: 0.4 conn/s (2793.9 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 1653.9 avg 2793.9 max 5217.0 median 1912.5 stddev 1487.8
Connection time [ms]: connect 0.0
Connection length [replies/conn]: 1.000

Request rate: 0.4 req/s (2793.9 ms/req)
Request size [B]: 131.0
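Since the same kind of httperf run is repeated for each version, the average connection time can be pulled out of the report mechanically. A minimal sketch, assuming the report layout shown above ; the function name is made up for illustration.

```shell
# avg_conn_ms : read an httperf report on stdin and print the average
# connection time in milliseconds.
avg_conn_ms() {
    awk '/^Connection time \[ms\]: min / {
        for (i = 1; i < NF; i++)
            if ($i == "avg") print $(i + 1)
    }'
}

# Example with the line from the report above :
echo 'Connection time [ms]: min 1653.9 avg 2793.9 max 5217.0 median 1912.5 stddev 1487.8' | avg_conn_ms
```

In practice the httperf output would be piped straight into the function.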

1.4 FastCGI

munin-fastcgi-graph is loaded only once, but munin-graph is still reloaded for each request.

$ httperf --num-conns 10  --add-header='Cache-Control: no-cache\n' \
    --uri  /cgi-bin/munin-fastcgi-graph/localdomain/localhost.localdomain/cpu-day.png

Total: connections 10 requests 10 replies 10 test-duration 13.807 s

Connection rate: 0.7 conn/s (1380.7 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 1141.3 avg 1380.7 max 1636.1 median 1381.5 stddev 173.7
Connection time [ms]: connect 0.0
Connection length [replies/conn]: 1.000

Request rate: 0.7 req/s (1380.7 ms/req)

The response time is cut almost in half. That's expected, since only the top half of the processing isn't reloaded.

2.0 FastCGI

Here everything is loaded once.

$ httperf --num-conns 10  --add-header='Cache-Control: no-cache\n' \
    --uri  /cgi-bin/munin-cgi-graph-2.0/localdomain/localhost.localdomain/cpu-day.png

Total: connections 10 requests 10 replies 10 test-duration 1.668 s

Connection rate: 6.0 conn/s (166.8 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 123.0 avg 166.8 max 513.4 median 127.5 stddev 121.9
Connection time [ms]: connect 0.0
Connection length [replies/conn]: 1.000

Request rate: 6.0 req/s (166.8 ms/req)

Now the response time is cut by a factor of about eight (from 1380.7 ms to 166.8 ms on average) ! That's quite good news, since it is almost 17 times faster than the original CGI (2793.9 ms).
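The ratios can be worked out directly from the average times reported by httperf in the three runs. A small sketch, with a made-up function name :

```shell
# speedup OLD_MS NEW_MS : print how many times faster NEW is than OLD.
speedup() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", old / cur }'
}

speedup 2793.9 1380.7   # 1.2 CGI     -> 1.4 FastCGI
speedup 1380.7 166.8    # 1.4 FastCGI -> 2.0 FastCGI
speedup 2793.9 166.8    # 1.2 CGI     -> 2.0 FastCGI
```

With only 10 connections per run and a large standard deviation on the first one, these figures are orders of magnitude rather than precise measurements.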

Thursday, 10 June 2010

Waiting for Munin 2.0 - Performance - Architecture

A little intro/refresh on munin's architecture on the master

Munin has a very simple architecture on the master : munin-cron is launched via cron every 5 minutes. Its only job is to launch, in order, munin-update, munin-graph, munin-html & munin-limits.
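The idea can be sketched in a few lines of shell. This is not the actual munin-cron script : the function name is made up, and stopping at the first failing step is an assumption of this sketch.

```shell
# run_in_order STEP...
# Runs each step in turn and stops at the first one that fails
# (stopping on failure is an assumption of this sketch).
run_in_order() {
    for step in "$@"; do
        "$step" || return 1
    done
}

# On a real master this would be something like :
#   run_in_order munin-update munin-graph munin-html munin-limits
```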

The various processes

munin-update

This process retrieves the values from the various nodes and updates the rrd files. It should never take more than 5 minutes to run, otherwise there will be gaps, since the next update will not be launched (lockfile-protected runs).
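The effect of a lock-protected run can be illustrated with a generic shell idiom. This is not how Munin itself implements its lock ; it is a minimal sketch using an atomic mkdir, and the function name and lock path are made up.

```shell
# with_lock LOCKDIR COMMAND [ARG...]
# Runs COMMAND only if LOCKDIR can be created ; mkdir is atomic, so two
# overlapping runs cannot both get the lock.
with_lock() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    rc=$?
    rmdir "$lockdir"
    return "$rc"
}

# Hypothetical usage :
#   with_lock /tmp/munin-update.lock munin-update
```

If a run is still holding the lock when cron fires again, the new run is skipped, which is exactly what produces the gaps mentioned above.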

This process stresses the I/O on the master, and depends on the execution time of the plugins on the various nodes. In 1.4 the retrieval is multi-threaded[1], so a slow node doesn't impact the whole process too much.

2.0 proposes asynchronous updates and vectorized updates.

munin-graph

This process generates all the image files from the rrd files.

It is usually quite CPU-bound, and it also generates a fair amount of I/O. Since 1.4, graphs can also be generated in parallel in order to take advantage of multiple CPUs / multiple I/O paths.

A simple optimization is to generate only the needed graphs instead of all of them each time. This leads to CGI generation of graphs. 1.2 & 1.4 took a first step in this direction, but it's quite a hack, since it's only a very basic script that calls munin-graph with the correct parameters.

A FastCGI port of the wrapper (munin-cgi-graph) removes the overhead of starting the wrapper for each call, but in 1.4 the code is quite experimental and has some serious bugs that would need extensive patching to be fixed.

2.0 completes the integration of CGI graphing by removing the overhead of calling munin-graph, and carries out the extensive patching needed to fix those bugs.

munin-html

This process generates all the html files from the rrd files. This one is quite fast for now.

munin-limits

This process checks the limits to see if there is a warning/alert to send via mail or nagios. This one is also quite fast for now.

Notes

[1] More precisely, multi-process.

Tuesday, 8 June 2010

Waiting for Munin 2.0 - Introduction

This is the first article of a series about the coming version 2.0 of Munin.

The idea came from the series Waiting for 8.5 about PostgreSQL.

The ironic part is that their 8.5 release has become a 9.0, just like our 1.5 will be a 2.0.

I'll post several small articles about new or enhanced-enough features. They will all be tagged munin20.

Planned summary :

  1. Performance - Architecture context
  2. Performance - FastCGI
  3. Performance - Asynchronous updates
  4. Performance - Misc
  5. Native SSH transport
  6. Custom data retention plans (keep more data)
  7. Dynamic zooming

Thursday, 1 April 2010

Don't use Excerpt... At least with DotClear.

DotClear automatically generates a meta description tag from the blog entry, but it doesn't take the excerpt into account.

It just takes the beginning of the article content. Since the excerpt is also shown at the beginning of the article, I cannot just write the same content twice.

The meta description is quite interesting, since it is usually used for the little snippet shown under a search result in the usual search engines, so having the beginning of the post there is very nice.

This negates the benefit of having excerpts.

I'm now falling back to progressively removing all the excerpts from my posts...
