Personal Workflow Blog

Tag - perl

Thursday, 16 June 2011

Autovivification in Perl: Great Idea but also Huge Trap - Another Leaky Abstraction...

Autovivification is one of Perl's great design successes.

It boils down to this: you don't need to check that something exists before dereferencing it.

That means that, to set a value in a nested hash, you only need to write:

$h->{foo}{bar} = "value";

And that will work out of the box. Perl will happily create all the data-structure for you.

So, now a little coding test: what does the following code output?

my $a;

if ($a->{foo}{bar}) {
   print "Found foo/bar\n";
}

if ($a->{foo}) {
   print "Found foo\n";
}

Naively, it shouldn’t output anything, right?

Not so fast. Read that earlier sentence again carefully, with the emphasis on one word: "Perl will HAPPILY create all the data-structure for you."

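That might be just perfect, except that Perl creates it whenever it needs it, even if it is only for reading.

And now you understand the catch: a read operation can result in a write. The first test silently creates $a->{foo} as an empty hash reference, and a reference is always true, so the code prints "Found foo".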

As Uncle Ben (from Spider-Man) said[1]: "With great power comes great responsibility."

Dagfinn Ilmari Mannsåker showed me a nice autovivification module on CPAN that fixes this behavior and allows fine-grained control over it.

I really think the fact that creation also happens when merely querying a value is a real bug in Perl itself, or at least a bug in the design of the feature.
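
A minimal sketch of that module in use, assuming the autovivification pragma from CPAN is installed (the helper name demo_no_autoviv is mine). With its default settings the pragma should stop reads from creating anything, while assignments keep building the structure as usual:

```shell
#!/bin/sh
# Assumption: the CPAN module "autovivification" is installed.
# demo_no_autoviv is a hypothetical helper name.
demo_no_autoviv() {
    perl <<'EOF'
use strict;
use warnings;
no autovivification;    # lexically scoped, like "no strict"

my $h;
print "foo/bar is false\n" unless $h->{foo}{bar};
print "h is still undefined\n" unless defined $h;

# Assignments still vivify, so building a structure works as before.
$h->{foo}{bar} = "value";
print "stored: $h->{foo}{bar}\n";
EOF
}

# Only run the demo when the module can actually be loaded.
if perl -Mautovivification -e 1 2>/dev/null; then
    demo_no_autoviv
else
    echo "autovivification module not installed, skipping" >&2
fi
```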

Notes

[1] Voltaire, Franklin D. Roosevelt and others said something very similar, but they are not as geeky.

Monday, 21 June 2010

CGI on steroids with FastCGI, but on a CGI-only server - The FastCGI wrapper

FastCGI is really CGI on steroids

FastCGI is a very common way to speed up a CGI installation. It relies on the fact that starting a CGI script is usually slow, whereas producing the response is quite fast.

So if you have a persistent process, you only pay the startup cost once, and you then get a real speedup.

FastCGI vs mod_perl (or mod_python, ...)

Once a big fan of mod_perl, I have since converted to FastCGI. For a long time mod_perl was the answer for speeding up Perl CGI scripts. It has a very good track record of stability and hooks deep into Apache's request processing.

FastCGI focuses on a different feature set, one that is more relevant today than mod_perl's[1]:

  • It is much simpler to install and configure, especially with multiple applications.
  • It can connect to a separate server process (running as a different UID, chrooted, or even on a remote host).
  • It can mix scripting languages without compiling additional Apache modules.
  • It works with several webservers, even closed-source ones: FastCGI is a protocol, not an API.

But steroids do have some side effects

CGI issues

One downside is that your CGI script has to be adapted to FastCGI, and in particular to the fact that the script no longer exits at the end of each request.

In the real world that's quite easy. Every language that is commonly used for CGI offers CGI wrapper libraries that work in a FastCGI context as well as in a plain CGI one.
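
For Perl, one such library is CGI::Fast. Here is a minimal sketch of the usual request loop, assuming CGI::Fast and the FCGI module it depends on are installed (the helper name demo_cgi_fast and the $served counter are mine). Run from a plain shell, as here, the loop body should execute exactly once, as it would under plain CGI:

```shell
#!/bin/sh
# Assumption: CGI::Fast (and the FCGI module it depends on) is installed.
# demo_cgi_fast is a hypothetical helper name.
demo_cgi_fast() {
    perl <<'EOF'
use strict;
use warnings;
use CGI::Fast;

my $served = 0;    # lives on between requests under FastCGI

# Under FastCGI this loop handles one request per iteration;
# run as plain CGI it executes once and the script then exits.
while (my $q = CGI::Fast->new) {
    $served++;
    print $q->header('text/plain');
    print "request number $served, process $$\n";
}
EOF
}

# Only run the demo when the module can actually be loaded.
if perl -MCGI::Fast -e 1 2>/dev/null; then
    demo_cgi_fast
else
    echo "CGI::Fast not installed, skipping" >&2
fi
```

Anything initialised before the loop is paid for once per process, which is where the speedup comes from. The flip side is that state such as $served leaks from one request to the next unless you reset it.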

Webserver issues

Another issue can come from the webserver. Since CGI is dead simple to implement, even the micro-webserver thttpd implements it.

FastCGI, on the other hand, is a little more difficult to implement, since the webserver needs to create a container that monitors and calls the FastCGI-enabled script.

A standalone FastCGI container

Fortunately, the FastCGI team provides a ready-to-use container and a very simple client that acts as a plain CGI script but proxies the request to a full-blown container.

Since the plain CGI part is a very small native executable, its overhead is negligible compared to the reply time, let alone the startup time of the whole script.

Its installation is also quite straightforward. I just installed the libfcgi package on Debian: it provides /usr/bin/cgi-fcgi.

I created a simple CGI wrapper for my previous munin benchmarking needs:

#! /bin/sh

exec /usr/bin/cgi-fcgi -connect /tmp/munin-cgi.sock \
     /usr/lib/cgi-bin/munin-cgi-graph
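
As far as I understand the cgi-fcgi manual page, the -connect form above starts the application by itself when nothing is listening on the socket yet. The same manual page documents -start and -bind, which split the two roles. A minimal sketch, assuming those options behave as documented; the function names, the CGI_FCGI, SOCK and APP variables and the choice of 2 server processes are mine:

```shell
#!/bin/sh
# All names below are illustrative; adjust the paths to your own setup.
CGI_FCGI=${CGI_FCGI:-/usr/bin/cgi-fcgi}
SOCK=${SOCK:-/tmp/munin-cgi.sock}
APP=${APP:-/usr/lib/cgi-bin/munin-cgi-graph}

# Run once, e.g. from an init script: start 2 copies of the application
# listening on the socket, without serving any request.
start_backend() {
    "$CGI_FCGI" -start -connect "$SOCK" "$APP" 2
}

# Body of the CGI wrapper: only connect to the already-running
# application, never start it.
serve_request() {
    exec "$CGI_FCGI" -bind -connect "$SOCK"
}
```

The boot script would call start_backend and the CGI wrapper would call serve_request. Splitting the roles means a request never pays for the application startup, at the price of having to start the backend yourself.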

Notes

[1] Who really needs deep Apache hooks?

Saturday, 14 November 2009

Sed is much slower than Perl, or not...

I wanted to do some text replacement on a huge file (think ~18 GiB), filled with huge lines (think ~2 MiB per line)[1].

I naïvely piped it through sed and was quite shocked to find it CPU bound rather than I/O bound. The average rate was about 5 MiB/s (measured with pv), and the CPU was at almost 100%. The text file was gzipped on the filesystem, but with a 1/100 ratio, so the gzip process took less than 2% CPU.

I then replaced the sed -e with the Perl one-liner perl -lnpe, and... tadaa, it was flying at a rate of 50 MiB/s!

While I'm a big fan of Perl, and know how effective it is at handling text streams, I was still astonished: being 10x faster than sed was something.

But, as the good old saying goes, "too good to be true" means suspect. I remembered something about character encoding and regular expressions. Since the system is entirely configured for UTF-8, I suspected the infamous overhead of UTF-8 over plain ASCII.

I was right: a little LANG=C in front of the sed command line brought the rate up to 50 MiB/s as well.

So, beware of the performance impact of UTF-8 strings, and avoid UTF-8 processing when you can.
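
A minimal sketch of the trick (the helper name replace_bytes, the table names and the file names are made up). It uses LC_ALL rather than LANG, because LANG=C only takes effect when neither LC_ALL nor LC_CTYPE is set in the environment:

```shell
#!/bin/sh
# replace_bytes is a hypothetical helper; the sed expression is passed as $1.
# LC_ALL overrides LANG and every LC_* variable, so sed is certain to run
# in the C locale and match on plain bytes.
replace_bytes() {
    LC_ALL=C sed -e "$1"
}

# Tiny stand-in for the real dump; the table names are made up.
printf 'INSERT INTO old_table VALUES (1);\n' |
    replace_bytes 's/old_table/new_table/'

# On the real file the pipeline would look something like:
#   zcat dump.sql.gz | pv | replace_bytes 's/old_table/new_table/' > new.sql
```

This is only safe when the pattern and the replacement are plain ASCII. In the C locale sed sees bytes, so a multi-byte character in the expression would no longer be matched as a single character.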

Notes

[1] For the record, it was a MySQL dump.