Databases: Version History
By Steve Schnepp on Tuesday, 31 March 2009, 18:19 - database - Permalink
One concept gaining huge momentum lately is file versionning (mostly Git and Subversion). It is quite interesting to track the evolution of the data contained in the files, and not only the last time the file was updated.
On the last post Databases: Meta-Data I was discussing about the merits of having a modification timestamp among with other various informations and more generally about the parallels between a database and a filesystem.
We can also adapt this model to our database by not only storing the last modification date, but also the old data.
Two main options are :
- replicate the whole row in a history table
- have a generic history table that only store the old values of the column that have changed
The approach is done in the way this ddj database article, but in a much less intrusive manner. We just have our trigger-based system evolve with every modification logging an insertion in the derived history table.
The history table and trigger become :
TABLE MD_ORDERS_HISTORY (
SERIAL MD_ORDERS_HISTORY_ID PRIMARY KEY,
TIMESTAMP CTIME,
CHAR TYPE DEFAULT 'M',
INT ORDERS_ID FOREIGN_KEY ON ORDERS(ORDERS_ID),
MONEY PRICE_ADJUSTMENT
)
CREATE INSERT TRIGGER ON ORDERS o
BEGIN
INSERT INTO MD_ORDERS (ORDERS_ID) VALUES (o.ORDERS_ID)
END
CREATE UPDATE TRIGGER ON ORDERS o
BEGIN
UPDATE MD_ORDERS SET MTIME=NOW(), MLOGIN=CURRENT_USER()
WHERE ORDERS_ID = o.ORDERS_ID
-- Log the change
INSERT INTO MD_ORDERS_HISTORY (ORDERS_ID, PRICE_ADJUSTMENT)
VALUES (o.ORDERS_ID, o.PRICE_ADJUSTMENT)
END
CREATE DELETE TRIGGER ON ORDERS o
BEGIN
-- Log the delete
INSERT INTO MD_ORDERS_HISTORY (ORDERS_ID, PRICE_ADJUSTMENT, TYPE)
VALUES (o.ORDERS_ID, o.PRICE_ADJUSTMENT, 'D')
END
The interesting point is that you can now travel back in time in order to see what happened to a row in case of debugging the application or its (mis)usage.
The main issue of this system is that your database usage will be larger by several orders of magnitude, depending of how often your application update its data.
Several possibilities to limit the size exists :
- Partitioning the historical data, and storing it on a slower (cheaper) array.
- Pruning the old data since depending on the application, traveling back 1 whole year may be overkill
- Live aggregating changes. If the data was modified less than 1 hour ago,
just update the last history line. We can even have a
history_starttimestamp andhistory_stoptimestamp to show that a aggregation has taken place. - Deferred aggregating changes. Like the live one, but on a scheduled basis. It can even have a dynamic granularity (a granularity of 1 day if the data is 1 year old, 1 hour if it is 1 month old).
Comments
Tom Kyte would say something along the lines of “if the answer is a trigger, then you didn’t understand the question”, and I hate things that happen magically under the hood too, but I can’t think of any other maintenable ways of adding stamp dates in an application. I had to deal with it in Oracle Applications, all tables had creation_date, update_login, with associated logins and so on (but no history): it was a pain to add all these columns in every bit of code.
Anyway, the ddj article is very good. I have this recurring problem in my current job, it’s nice to read something clean about the different options.