Personal Workflow Blog


Tuesday, 8 June 2010

Waiting for Munin 2.0 - Introduction

This is the first article of a series about the coming version 2.0 of Munin.

The idea came from the Waiting for 8.5 series about PostgreSQL.

The ironic part is that their 8.5 release has become a 9.0, just like our 1.5 will be a 2.0.

I'll post several small articles about new or enhanced-enough features. They will all be tagged munin20.

Planned outline:

  1. Performance - Architecture context
  2. Performance - FastCGI
  3. Performance - Asynchronous updates
  4. Performance - Misc
  5. Native SSH transport
  6. Custom data retention plans (keep more data)
  7. Dynamic zooming

Thursday, 1 April 2010

Don't use Excerpt... At least with DotClear.

DotClear automatically generates a meta description tag from the blog entry, but it doesn't take the excerpt into account.

It just takes the beginning of the article content. Since the excerpt is also shown at the beginning of the article, I cannot simply write the same content twice.

The meta description is quite useful, since search engines usually use it for the little snippet shown under a search result, so having the beginning of the post in there is very nice.

This cancels out the benefit of having excerpts.

I'm now falling back to progressively removing all the excerpts from my posts...

Wednesday, 31 March 2010

API Design: Avoid hidden costs of simple features

Programmers are usually like water: they always take the path of least resistance.

Let's see how to use this fact to predict the usage of an API when you design it.

Initial API

Consider the very simple DB API that consumes a connected ResultSet and presents a disconnected version of it.

class DisconnectedResultSet{
        public DisconnectedResultSet (ResultSet rs);
        public boolean next();
        public Object getObject(int col_idx);
}

Its usage is quite easy:

while (drs.next()) {
        int col_idx = 1;
        drs.getObject(col_idx++); // Do something w/ 1st col
        drs.getObject(col_idx++); // Do something w/ 2nd col
        //...
}

Just a little evolution...

Since the DisconnectedResultSet is disconnected, we can imagine that it should implement a rewind() method in order to use it several times without running the initial query again. We now have an updated class :

class DisconnectedResultSet{
        public DisconnectedResultSet (ResultSet rs);
        public boolean next();
        public Object getObject(int col_idx);   
        public void rewind(); // Be able to rewind it
}

And its classical usage :

while (drs.next()) {
        // do stuff...
}
// ...
drs.rewind();
while (drs.next()) {
        // do something else with the same data...
}
// ...
drs.rewind();
while (drs.next()) {
        // do something else with the same data...
}
// ...

A new need comes

A new need comes: check whether the DisconnectedResultSet is empty, in order to avoid sending headers when there is no data.

The usual way is to send them once while iterating, like this:

boolean is_headers_sent = false;
while (drs.next()) {
        if (! is_headers_sent) { 
                send_headers(); 
                is_headers_sent = true;
        }
        // do something else with the same data...
}

But since there is a nice rewind() method just waiting to be used, the code might become:

if (drs.next()) {
        send_headers(); 
}
drs.rewind();
while (drs.next()) {
        // do something else with the same data...
}

Now this code is no longer generic: it depends on rewind(), so it can no longer accommodate a connected ResultSet.

So, as John Carmack said :

The cost of adding a feature isn't just the time it takes to code it. The cost also includes the addition of an obstacle to future expansion.

That's especially true when you design APIs, since they are meant to last and to be extended.

So, think twice when you propose an extension "just in case".

The little evolution, revisited...

To solve this case, don't offer a rewind() method; offer a duplicate() one instead. It provides the same functionality, just through a new object.

The usage is almost the same, as shown below, but since duplicating feels more expensive, it won't be used as lightly: the boolean is_headers_sent pattern is now more likely to be used.

while (drs.next()) {
        // do stuff...
}
// ...
drs = drs.duplicate();
while (drs.next()) {
        // do something else with the same data...
}
// ...
drs = drs.duplicate();
while (drs.next()) {
        // do something else with the same data...
}
// ...

It's another example that immutable objects are the way to go, but for a different reason this time.
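To make the duplicate() idea concrete, here is a minimal sketch of how such a class might be built. It is only an illustration under stated assumptions: the class name DisconnectedResultSetSketch is hypothetical, and it is fed from a plain list of rows rather than from a real java.sql.ResultSet, so that it runs without a database. The rows are stored once in an unmodifiable list; duplicate() returns a new object with a fresh cursor that shares that list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the post only shows method signatures, not an implementation.
final class DisconnectedResultSetSketch {
    private final List<Object[]> rows; // shared between duplicates, never modified
    private int cursor = -1;           // starts before the first row

    // Assumption: rows come from a plain list instead of a java.sql.ResultSet.
    DisconnectedResultSetSketch(List<Object[]> source) {
        this.rows = Collections.unmodifiableList(new ArrayList<>(source));
    }

    // Used only by duplicate(): reuses the already-copied row list.
    private DisconnectedResultSetSketch(List<Object[]> sharedRows, boolean shared) {
        this.rows = sharedRows;
    }

    // Moves to the next row; returns false once the rows are exhausted.
    boolean next() {
        if (cursor + 1 >= rows.size()) {
            return false;
        }
        cursor++;
        return true;
    }

    // Column index is 1-based, as in JDBC. Must be called after a successful next().
    Object getObject(int colIdx) {
        return rows.get(cursor)[colIdx - 1];
    }

    // Same data, new object, cursor back before the first row.
    DisconnectedResultSetSketch duplicate() {
        return new DisconnectedResultSetSketch(rows, true);
    }
}
```

In this sketch duplicate() is actually cheap, since only the cursor is new. The point of the design is that it looks like a heavier operation than rewind(), which is what nudges callers toward the is_headers_sent pattern.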

Note: I just finished my March 2010 article, and even on time... I'm still trying to keep a blogging rate of at least one article per month. So far so good for 2010, still 9 months to go!

Saturday, 20 February 2010

Free Exception lunch : Use unchecked exceptions, but still announce which ones you might throw.

In a previous article I chose my side: unchecked exceptions are much simpler to use.

But on the other side of this great divide there is a very valid point: you usually declare checked exceptions. Sure, it's possible to only declare throws Exception, but that would defeat the whole purpose of using checked exceptions.

The nicest thing is that you can also have a custom exception hierarchy based on RuntimeException instead of plain Exception. This way it's like in C++: anything might be thrown, and you don't have to handle it.

Declaring them, on the other hand, is very interesting because you document your interface almost for free.

So, use unchecked exceptions to free yourself from checked catch-slavery, but still declare the custom ones you might throw.
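As an illustration, here is a small sketch of that approach. All the names (StorageException, RecordNotFoundException, RecordStore) are made up for the example. The throws clause on get() is legal Java even though the exception is unchecked: it documents the method, and the compiler does not force callers to catch anything.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical root of a custom unchecked hierarchy.
class StorageException extends RuntimeException {
    StorageException(String message) {
        super(message);
    }
}

// Hypothetical, more specific unchecked exception.
class RecordNotFoundException extends StorageException {
    RecordNotFoundException(String key) {
        super("No record for key: " + key);
    }
}

// Hypothetical store, only here to carry the method signature.
class RecordStore {
    private final Map<String, String> records = new HashMap<>();

    void put(String key, String value) {
        records.put(key, value);
    }

    // Declared for documentation only: callers may catch it, but are not required to.
    String get(String key) throws RecordNotFoundException {
        String value = records.get(key);
        if (value == null) {
            throw new RecordNotFoundException(key);
        }
        return value;
    }
}
```

A caller can simply write store.get("some-key") with no try/catch, or catch StorageException at whatever level actually knows how to handle it.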

Immutability of an URL

In the pure spirit of Data is King, I think that a URL should never change. Even the W3C agrees, with its Cool URIs don't change article.

But we all know that in IT, never only means not in the foreseeable future. So URLs do change, at least after a while, and usually for technical reasons[1].

You can update your own website to use the new URLs, but inbound links cannot be easily updated. To handle this need, the HTTP protocol specifies the 301 (Moved Permanently) response code.

The solution is for the site to remember all the URLs it has ever generated and to redirect accordingly. This way you'll never lose a potential reader to the infamous 404 (this page does not exist).
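The bookkeeping this requires can be sketched in a few lines. This is only an assumption of how a site might do it, not how any particular blog engine works; the class RedirectRegistry and its methods are hypothetical names. It remembers every path that was ever published and answers 200, 301 or 404 for a requested path.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: remembers old paths so inbound links keep working.
class RedirectRegistry {
    private final Set<String> live = new HashSet<>();         // paths currently served
    private final Map<String, String> movedTo = new HashMap<>(); // old path -> current path

    void publish(String path) {
        live.add(path);
        movedTo.remove(path); // a live path is no longer a redirect
    }

    void move(String oldPath, String newPath) {
        live.remove(oldPath);
        live.add(newPath);
        movedTo.remove(newPath);
        // Re-point earlier redirects so every old path leads to the current one in one hop.
        for (Map.Entry<String, String> entry : movedTo.entrySet()) {
            if (entry.getValue().equals(oldPath)) {
                entry.setValue(newPath);
            }
        }
        movedTo.put(oldPath, newPath);
    }

    // HTTP status the site would answer for this path.
    int statusFor(String path) {
        if (live.contains(path)) {
            return 200;
        }
        if (movedTo.containsKey(path)) {
            return 301;
        }
        return 404;
    }

    // Target of the 301 redirect, or null if the path was never moved.
    String locationFor(String path) {
        return movedTo.get(path);
    }
}
```

For example, after publish("/post/1") and then move("/post/1", "/post/one"), a request for "/post/1" gets a 301 with "/post/one" as its location, while a path that never existed still gets a 404.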

Some sites even try to guess the intended page on a custom 404 page. That's another reason to have user-friendly URLs: being able to point your reader to appropriate pages in case you can't find their initial destination.

Sadly, this redirect behavior isn't supported by my blog engine (DotClear)... So much for eating your own dog food, but I'm looking forward to doing it on my current blogging platform.

Notes

[1] upgrade to another blog engine...
