Personal Workflow Blog


java

Some random thoughts collected while working on an enterprise-grade J2EE stack.


Wednesday, 6 May 2009

Bringing C++ Const to Java

Constness is another C++ idiom, like the RAII I talked about earlier. With it we can write code that is free of side effects: when a function is called with a const argument, we are assured that this argument will not be modified under the hood without us knowing[1].

In Java, the commonly accepted substitute is the final keyword. But it has a major drawback: the reference cannot be reassigned, yet the object can still be modified by calling mutating members. To get real constness you have to convert the object to an immutable type. This is a simple task, but radically different ways of doing it exist.
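As a minimal sketch of that drawback (the class and variable names here are illustrative, not from the original post), final only pins the reference, while the object behind it stays mutable; wrapping it in an unmodifiable view is one way to get closer to const:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FinalDemo {
    public static void main(String[] args) {
        final List<String> names = new ArrayList<String>();
        // names = new ArrayList<String>(); // would not compile: the reference is final
        names.add("mutated");               // compiles fine: the object itself is mutable
        System.out.println(names);          // prints [mutated]

        // The usual escape hatch: hand out an immutable view instead
        List<String> frozen = Collections.unmodifiableList(names);
        try {
            frozen.add("nope");
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable view rejected the write");
        }
    }
}
```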

Notes

[1] Of course, C++ being what it is, there will always be ways around it. But let's say it's not as tempting as it would otherwise be.

Continue reading...

Sunday, 3 August 2008

Daisy Chain Setters and Handle Optional Parameters Effectively

A side effect of RAII in Java is that all the parameters have to be set at construction time, since construction time is when the resources are acquired. A quite common problem is the handling of optional parameters.
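A daisy-chain setter simply returns this, so optional parameters can keep sensible defaults while required ones stay in the constructor. A minimal sketch, with hypothetical names not taken from the original post:

```java
// Hypothetical resource with one required and two optional parameters.
public class DbConnection {
    private final String host;        // required: set at construction time
    private int timeoutMillis = 5000; // optional, with a default
    private boolean useSsl = false;   // optional, with a default

    public DbConnection(String host) { this.host = host; }

    // Each setter returns this, so calls can be daisy-chained
    public DbConnection timeout(int millis) { this.timeoutMillis = millis; return this; }
    public DbConnection ssl(boolean on) { this.useSsl = on; return this; }

    @Override
    public String toString() {
        return host + " timeout=" + timeoutMillis + " ssl=" + useSsl;
    }

    public static void main(String[] args) {
        // Only the options we care about are spelled out at the call site
        DbConnection c = new DbConnection("db.example.com").timeout(2000).ssl(true);
        System.out.println(c);
    }
}
```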

Continue reading...

Sunday, 27 July 2008

RAII in Java to clean your code

RAII is a very common idiom in C++ and some other languages that don't have integrated garbage collection.

Java has a GC, so this idiom is not as popular. But the main problem of Java is that although the GC has become quite efficient, it only handles memory management. For other resources (database connections, sockets or file descriptors, for example), it is not really adequate. The release of these resources always has to be explicit, and handling it via the finalize() method is not satisfactory.

In short, finalize() executes when the object is about to be garbage-collected. The main problem is that collection only takes memory limits into account, not resource limits (the maximum number of open file descriptors, for example). So you can run out of open file descriptors way before running out of free memory.

So, the usual construction is like this :

MyResource res = null;
try {
  res = new MyResource();
  res.setSomething(someValue);
  /* Use the resource */
  res.close();
} catch (Exception e) {
  // release the resource if needed
  if (res != null) { res.close(); }
}

But hey, that's a lot of code lines, and in case of a Throwable that isn't an Exception, you don't release the resource. Releasing the resource with a try { } finally { } construct is much better (it's actually one of the most common uses of finally).

The construction becomes :

MyResource res = null;
try {
  res = new MyResource();
  res.setSomething(someValue);
  /* Use the resource */
} finally {
  if (res != null) { res.close(); }
}

But here we can see that Java is not that different from C++ in this respect, so we can just adapt the C++-ism that is RAII and write a much cleaner version that acquires the resource in the constructor, so most failure conditions can be checked immediately.

The construction finally becomes:

MyResource res = new MyResource(someValue);
try {
  /* Use the resource */
} finally {
  res.close();
}

Since a constructor never returns null, there is no need to test. And if the constructor throws an exception, the general contract is that the object does not exist; therefore no resource should remain allocated, since it would be impossible for the caller to release it (remember, no object was created). So there is nothing to release.

The setter is also folded into the constructor, since the whole RAII concept is that the constructor returns a completely initialized object. It also makes for cleaner code: when calling close() there is no need for an if() to check the object's initialization state.
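The resource side of the pattern can be sketched like this; the LogFile class is a hypothetical example, not code from the original post, but it shows the contract: the constructor either acquires everything or throws, so close() never needs a null check:

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical RAII-style wrapper: the constructor acquires and fully
// initializes, so a reference is only ever handed out in a usable state.
public class LogFile {
    private final FileWriter writer; // acquired once, never null afterwards

    public LogFile(String path) throws IOException {
        // If this throws, the caller never receives an object to release
        this.writer = new FileWriter(path);
    }

    public void write(String line) throws IOException {
        writer.write(line + "\n");
    }

    public void close() throws IOException {
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        LogFile log = new LogFile("demo.log");
        try {
            log.write("hello");
        } finally {
            log.close(); // no null check: construction either succeeded or threw
        }
        System.out.println("written and closed");
    }
}
```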

Monday, 3 December 2007

Use Immutable Objects to Avoid Synchronisation

With the coming multi-core environments, as stated in a previous post about workflows, efficient locking will be more and more of an issue.

My previous ways to cut this Gordian knot were to:

  1. multiply the objects that can be locked, to reduce contention: have many elementary objects. These can be workcases in workflow theory.
  2. cheat to minimize the time spent locking: use something like software transactional memory, which only locks when acquiring the resource (here, that means taking a copy) and when committing it at the end of processing (remember those infamous access EJBs?). Concretely, at the beginning of a task, every piece of data from the workcase is copied into a new, non-shared worktask. All the task's work is then done on the privately copied data. This can surely be optimized by copying only the data that "might" be used (read and/or written). At the end of the task, the workcase is simply "committed" (updated) into the main data storage. The nice thing is that you only need to synchronize the beginning and the end, and to prevent concurrent modifications (usually done with an incrementing version counter).
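The copy-then-commit idea with an incrementing version counter can be sketched as follows; the Workcase and Snapshot names are illustrative, not from the original posts:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: synchronize only at begin (take a private copy plus its
// version) and at commit (apply it only if no one else committed meanwhile).
public class Workcase {
    private Map<String, String> data = new HashMap<String, String>();
    private long version = 0;

    public synchronized Snapshot begin() {
        return new Snapshot(new HashMap<String, String>(data), version);
    }

    public synchronized boolean commit(Snapshot s) {
        if (s.version != version) {
            return false; // concurrent modification detected, caller must retry
        }
        data = s.copy;
        version++;
        return true;
    }

    static class Snapshot {
        final Map<String, String> copy; // private, non-shared working data
        final long version;
        Snapshot(Map<String, String> copy, long version) {
            this.copy = copy;
            this.version = version;
        }
    }

    public static void main(String[] args) {
        Workcase wc = new Workcase();
        Snapshot s1 = wc.begin();
        Snapshot s2 = wc.begin();
        s1.copy.put("state", "done");      // all work happens on the private copy
        System.out.println(wc.commit(s1)); // true: first committer wins
        s2.copy.put("state", "other");
        System.out.println(wc.commit(s2)); // false: the version moved on
    }
}
```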

Now, if you cross this with another previous post about caching cleverly and sparingly, you also get a way of having, for example, a configuration that is at the same time:

  1. fast
  2. can be updated at runtime
  3. transactional (once you access it, all the properties you see are coherent together)

The idea is to use immutable objects (such as java.lang.String). They are usually despised as memory eaters, since you cannot modify them, only recreate them with the updated values, which means creating a whole bunch of objects. But they have a very nice property: they are completely thread-safe, since no one can modify them, so they are lock-free.

So, just imagine that the first time you ask for the configuration, you load the whole thing into an immutable config object held by something like a singleton. You hand a reference to it to the caller after storing that reference in the caller's context (which could be an HttpServletRequest). The second time the caller asks for the configuration, it's already in its HttpServletRequest, so you take it from there.

Meanwhile, if some other thread asks to refresh the configuration, a new immutable config object is created and swapped with the old one (only the reference is updated, not the object). The swap and the handing-out have to be synchronized together (it's not even always mandatory, since if several hand-outs return the old value it's often not that problematic: the whole old value is coherent). When all the old contexts go out of scope, so will the old config object.
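A minimal sketch of that swap, using an AtomicReference as the synchronization point (the ConfigHolder and Config names are illustrative, not from the original post):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class ConfigHolder {
    // An immutable snapshot of the whole configuration: readers need no lock
    public static final class Config {
        private final Map<String, String> props;
        public Config(Map<String, String> props) {
            this.props = Collections.unmodifiableMap(new HashMap<String, String>(props));
        }
        public String get(String key) { return props.get(key); }
    }

    private final AtomicReference<Config> current = new AtomicReference<Config>();

    public Config get() { return current.get(); }             // hand out the current snapshot
    public void refresh(Config fresh) { current.set(fresh); } // swap only the reference

    public static void main(String[] args) {
        ConfigHolder holder = new ConfigHolder();
        Map<String, String> v1 = new HashMap<String, String>();
        v1.put("mode", "old");
        holder.refresh(new Config(v1));

        Config seen = holder.get(); // a caller keeps this coherent snapshot

        Map<String, String> v2 = new HashMap<String, String>();
        v2.put("mode", "new");
        holder.refresh(new Config(v2)); // refresh happens in another thread in practice

        System.out.println(seen.get("mode"));         // the old snapshot stays coherent
        System.out.println(holder.get().get("mode")); // new readers see the update
    }
}
```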

The use of immutable objects has become much easier with GC, since we don't have to track the scoping anymore (it used to be done through a pseudo-immutable object that was mutable only in its reference count).

It's also one application of COR (Copy On Read) instead of the more usual COW (Copy On Write).

Wednesday, 19 September 2007

Convert your Log Files into Gold

Log files are a necessary evil on a live system. A few rules can transform your log files from a useless heap of textfiles to a gold mine.

I'll focus mostly on Log4J: it started on Java but has been ported to many other languages, as it set some kind of logging standard.

The logger is configured per class with:

private static Logger logger = Logger.getLogger(MyClass.class);

The logger is private since each class should have its own logger, especially derived ones (that way you can very nicely debug virtual function calls), and static since it's both thread-safe and available even in the <init> and <cinit> of your class (you did put it on the first line of the class, didn't you?).

This is very useful since you can configure a per-package or even a per-class level of logging. After all, each of your classes does only one thing, doesn't it?

I usually use only 5 levels of logging: ERROR, WARN, INFO, DEBUG and TRACE.

  • TRACE is used for dumps of many internal variables (a last-resort debug level, since it's usually very verbose).
  • DEBUG is used to debug this class, with key points inside the class, in order to see its internal flow of execution.
  • INFO is used to debug other classes with the help of this one. Usually it emits only one line per public call, with the incoming parameters and the result displayed in a synthesized form.
  • WARN is used when an exceptional situation happens (usually a caught Exception triggered by the caller's data) and there is a known path to recover.
  • ERROR is used when an exceptional situation happens but there is no known path to recover. Usually this line is sent by email to the administrator. If a problem occurs and an ERROR is logged, no other ERROR should be logged for the same problem: it will help you keep a high signal/noise ratio.

In a live production system, I log INFO for terminal business classes (the ones that represent actions) and WARN for technical ones (the ones the action classes use). I configure it to send an email on every ERROR.

It's also very important to include a synthesized form of the arguments in the INFO, WARN & ERROR log messages (such as an ORDER_ID for an ordering action, or a PRODUCT_ID for a product-deletion action). It's also a good thing to attach the exception that triggered the WARN/ERROR log. That way you can just grep through your logs to see if, when, and what happened to that famous product everyone is so excited about.

Always use a RollingFileAppender. Always. If you're scared about losing some logs, just configure an insane number of backup files, because nothing is worse than running out of space on the log filesystem: you won't have the logs anyway. Note that if you have a different kind of rolling mechanism you can use it; the point is that you should never leave a growing log file without control.
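Put together, the per-package levels and the rolling appender might look like this log4j.properties fragment (the package names and file sizes are illustrative assumptions, not from the original post):

```
# Hypothetical log4j.properties: default WARN, INFO for business actions,
# DEBUG for one technical class under scrutiny
log4j.rootLogger=WARN, FILE
log4j.logger.com.example.business=INFO
log4j.logger.com.example.technical.MyPreparedStatement=DEBUG

# A rolling file appender so the log can never grow without control
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=app.log
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=100
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d %-5p %c - %m%n
```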

So, here is an example of two classes (one business, one technical) that illustrate the logging system I talked about:

public class PublishProductAction extends Action {

    private static Logger logger = Logger.getLogger(PublishProductAction.class);
    private int product_id;

    public PublishProductAction(int product_id) { this.product_id = product_id; }

    public void execute() {
        logger.info("publishing product[product_id:"
                + product_id + "]");
        try {
            // Do things...
        } catch (Exception e) {
            logger.error("Cannot publish product["
                    + product_id + "] : ", e);
        }
        logger.debug("done publishing product[product_id:"
                + product_id + "]");
    }
}

public class MyPreparedStatement {

    private static Logger logger = Logger.getLogger(MyPreparedStatement.class);
    private String sql;

    public MyPreparedStatement(String sql) { this.sql = sql; }

    public void execute(Collection params) throws Exception {
        startTimer();
        try {
            if (sql == null) {
                // Log in warn, since it should not happen,
                // but we can handle it gracefully
                logger.warn("The SQL statement is null, we do nothing");
                return;
            }
            try {
                // Do things...
            } catch (Exception e) {
                // Log only in info since we don't swallow
                // the Exception: the caller will handle it.
                logger.info("Cannot execute SQL[" + sql + "] : ", e);
                throw e;
            }
        } finally {
            stopTimer();
            // help the outside class to see what happened
            if (logger.isInfoEnabled()) {
                logger.info("executed sql ["
                        + parseSQL(sql, params) + "] in "
                        + getTime() + " ms");
            }
        }
    }
}
