Personal Workflow Blog

Tag - design

Monday, 2 December 2013

Experimenting with a C munin node

Core plugins are designed for simplicity...

As I wrote earlier, Helmut rewrote some core plugins in C. It was mainly done with efficiency in mind.

As those plugins only parse one /proc file, there seemed to be no need to endure the many forks inherent in even trivial shell programming. It also acknowledges the fact that the measuring system should be as light as possible.

Munin plugins are strongly geared towards simplicity, so having shell plugins is quite logical. They serve as educational samples for users who want to write their own, while being quite easy to code and debug for the developers. Since their impact on current systems is very small, there is not much incentive to change.

... but efficiency is coming !

Nonetheless, monitored systems are now becoming quite small.

This is mostly thanks to embedded systems like the Raspberry Pi, which means that the available processing power is much lower than on normal nodes[1].

So the C approach for plugins now has a new rationale : embedded systems.

Notes

[1] Datacenter nodes usually sit at the high end of the spectrum rather than the low end.

Monday, 21 June 2010

CGI on steroids with FastCGI, but on a CGI-only server - The FastCGI wrapper

FastCGI is really CGI on steroids

FastCGI is a very common way to increase the performance of a CGI installation. It is based on the fact that the startup of a CGI script is usually slow, whereas the response itself is quite fast.

So if you have a persistent process, you only pay the startup cost once, and you then experience a real speedup.

FastCGI vs mod_perl (or mod_python, ...)

Once a big fan of mod_perl, I have since converted to FastCGI. mod_perl was for a long time the answer for speeding up Perl CGI scripts. It has a very good track record of stability and has real hooks deep into Apache's request processing.

FastCGI focuses on a different feature set, one that is more relevant today than mod_perl's[1] :

  • It is much simpler to install and configure, especially with multiple applications.
  • It can connect to a separate server process (running as a different UID, chrooted or even on a remote host).
  • It can mix scripting languages without compiling any additional Apache modules.
  • It can be used with several webservers, even closed-source ones : FastCGI is a protocol, not an API.

But steroids do have some side effects

CGI issues

One downside is that your CGI script must be adapted to FastCGI, and in particular to the fact that the script no longer ends when the request ends.

In the real world that's quite easy. Every language that is commonly used for CGI offers CGI-wrapper libraries that work in a FastCGI context as well as in a plain CGI one.

Webserver issues

Another issue can come from the webserver. Since CGI is dead simple to implement, even the micro-webserver thttpd implements it.

FastCGI on the other hand is a little more difficult to implement, since the webserver needs to create a container that monitors and calls the FastCGI-enabled script.

A standalone FastCGI container

Fortunately, the FastCGI team provided us with a ready-to-use container and a very simple client that acts as a plain CGI script but proxies each request to a full-blown container.

Since the plain CGI part is a very small native executable, its overhead is negligible compared to the reply time, let alone the startup time of the whole script.

Its installation is also quite straightforward. I just installed the libfcgi package on Debian : it provides /usr/bin/cgi-fcgi.

I created a simple CGI wrapper for my previous munin benchmarking needs :

#! /bin/sh

exec /usr/bin/cgi-fcgi -connect /tmp/munin-cgi.sock \
     /usr/lib/cgi-bin/munin-cgi-graph

Notes

[1] Who really needs deep Apache hooks ?

Wednesday, 31 March 2010

API Design: Avoid hidden costs of simple features

Programmers are usually like water : they always use the path of least resistance.

Let's see how to use this fact to predict the usage of an API when you design it.

Initial API

Consider a very simple DB API that consumes a connected ResultSet and presents a disconnected version of it.

class DisconnectedResultSet{
        public DisconnectedResultSet (ResultSet rs);
        public boolean next();
        public Object getObject(int col_idx);
}

Its usage is quite easy :

while (drs.next()) {
        int col_idx = 1;
        drs.getObject(col_idx++); // Do something w/ 1st col
        drs.getObject(col_idx++); // Do something w/ 2nd col
        //...
}
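
As an illustration only, here is a minimal sketch of how such a class might be implemented, assuming it simply copies every row into memory. The List-based constructor and the private drain() helper are hypothetical additions for this sketch, not part of the API above.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory implementation, for illustration only.
class DisconnectedResultSet {
    private final List<Object[]> rows;
    private int cursor = -1; // positioned before the first row, as in JDBC

    // Drains the connected ResultSet so that it can be closed afterwards.
    DisconnectedResultSet(ResultSet rs) throws SQLException {
        this(drain(rs));
    }

    // Assumption: convenience constructor for rows that are already in memory.
    DisconnectedResultSet(List<Object[]> rows) {
        this.rows = new ArrayList<>(rows);
    }

    private static List<Object[]> drain(ResultSet rs) throws SQLException {
        int colCount = rs.getMetaData().getColumnCount();
        List<Object[]> copy = new ArrayList<>();
        while (rs.next()) {
            Object[] row = new Object[colCount];
            for (int i = 0; i < colCount; i++) {
                row[i] = rs.getObject(i + 1); // JDBC columns are 1-based
            }
            copy.add(row);
        }
        return copy;
    }

    // Moves to the next row; returns false once the rows are exhausted.
    public boolean next() {
        if (cursor < rows.size()) {
            cursor++;
        }
        return cursor < rows.size();
    }

    // Returns the value of the given 1-based column in the current row.
    public Object getObject(int col_idx) {
        return rows.get(cursor)[col_idx - 1];
    }
}
```

The whole result is copied up front, which is what makes the object independent of the database connection.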

Just a little evolution...

Since the DisconnectedResultSet is disconnected, we can imagine that it should implement a rewind() method in order to use it several times without running the initial query again. We now have an updated class :

class DisconnectedResultSet{
        public DisconnectedResultSet (ResultSet rs);
        public boolean next();
        public Object getObject(int col_idx);   
        public void rewind(); // Be able to rewind it
}

And its classical usage :

while (drs.next()) {
        // do stuff...
}
// ...
drs.rewind();
while (drs.next()) {
        // do something else with the same data...
}
// ...
drs.rewind();
while (drs.next()) {
        // do something else with the same data...
}
// ...

A new need comes

A new need arises : checking whether the DisconnectedResultSet is empty, in order to avoid sending headers for an empty result.

The usual way is to send them once while iterating, like this :

boolean is_headers_sent = false;
while (drs.next()) {
        if (! is_headers_sent) { 
                send_headers(); 
                is_headers_sent = true;
        }
        // do something else with the same data...
}

But since there is a nice rewind() method, just waiting to be used, the code might become :

if (drs.next()) {
        send_headers(); 
}
drs.rewind();
while (drs.next()) {
        // do something else with the same data...
}

Now this code is no longer generic enough to accommodate a connected ResultSet, which typically can only be read forward.
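
The is_headers_sent loop, by contrast, needs nothing more than next() and getObject(). A minimal sketch, assuming a hypothetical ForwardCursor interface that stands for what connected and disconnected result sets have in common (ListCursor and Report are likewise invented for the illustration) :

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical common denominator of connected and disconnected result sets.
interface ForwardCursor {
    boolean next();
    Object getObject(int col_idx);
}

// Hypothetical forward-only cursor over in-memory rows; it cannot be rewound.
class ListCursor implements ForwardCursor {
    private final Iterator<Object[]> it;
    private Object[] current;

    ListCursor(List<Object[]> rows) {
        this.it = rows.iterator();
    }

    public boolean next() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    public Object getObject(int col_idx) {
        return current[col_idx - 1]; // 1-based, like JDBC
    }
}

class Report {
    // Emits the header lazily, on the first row only.
    // An empty cursor therefore produces no output at all.
    static List<String> render(ForwardCursor cursor) {
        List<String> out = new ArrayList<>();
        boolean is_headers_sent = false;
        while (cursor.next()) {
            if (!is_headers_sent) {
                out.add("HEADER");
                is_headers_sent = true;
            }
            out.add(String.valueOf(cursor.getObject(1)));
        }
        return out;
    }
}
```

Because render() only ever moves forward, it works with any source of rows, whether or not that source can be replayed.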

So, as John Carmack said :

The cost of adding a feature isn't just the time it takes to code it. The cost also includes the addition of an obstacle to future expansion.

That's really true when you design APIs since their purpose is to last long and to be extended.

So, think twice when you propose an extension "just in case".

The little evolution, revisited...

To solve this case, don't propose a rewind() method, but offer a duplicate() one. It offers the same functionality, just in a new object.

The usage will be almost the same as shown below, but since it feels more performance-sensitive, it won't be used as lightly : the boolean is_headers_sent pattern is now more likely to be used.

while (drs.next()) {
        // do stuff...
}
// ...
drs = drs.duplicate();
while (drs.next()) {
        // do something else with the same data...
}
// ...
drs = drs.duplicate();
while (drs.next()) {
        // do something else with the same data...
}
// ...
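
For illustration, a sketch of what duplicate() might look like, assuming the rows live in a list that is never modified after construction, so that copies can share it. The class name DuplicableResultSet and the of() factory are hypothetical. In such an implementation the copy is in fact cheap; it only looks costly from the caller's side, which is the point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: each instance is a one-shot, forward-only cursor.
class DuplicableResultSet {
    private final List<Object[]> rows; // never modified, safe to share
    private int cursor = -1;           // before the first row

    private DuplicableResultSet(List<Object[]> sharedRows) {
        this.rows = sharedRows;
    }

    // Assumption: factory taking rows that are already in memory.
    static DuplicableResultSet of(List<Object[]> rows) {
        return new DuplicableResultSet(
                Collections.unmodifiableList(new ArrayList<>(rows)));
    }

    public boolean next() {
        if (cursor < rows.size()) {
            cursor++;
        }
        return cursor < rows.size();
    }

    public Object getObject(int col_idx) {
        return rows.get(cursor)[col_idx - 1]; // 1-based, like JDBC
    }

    // Returns a fresh cursor positioned before the first row.
    // This instance keeps its own position and is left untouched.
    public DuplicableResultSet duplicate() {
        return new DuplicableResultSet(rows);
    }
}
```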

It is another example showing that immutable objects are the way to go, but for a different reason this time.

Note: I just finished my March 2010 article, and even on time... I'm still trying to keep up a rate of at least one article per month. So far so good for 2010, still 9 months to go !