Managing Database Changes

Author: James Seigel, Christian Pekeler
Date: 2005/04/03

Abstract

We describe a method for encoding structural changes to a database as objects which can be used by WebObjects applications to automatically bring their databases up-to-date on startup. We provide example source code, usage patterns, and the lessons we have learned.

Contents

The Problem

WebObjects applications are most often developed with significant database dependance. When developing complex systems, complete database structures can not always be designed to support all future features. Therefore, while adding new features, one frequently needs to modify the structure of the database, or transform existing data in some way.

Structural database changes are often done manually and the required SQL is optionally stored in a file separate from the code base. The set of SQL operations most often has to be applied in sequence, and completely, for the system to operate properly. At deployment time, the production database needs to be updated to include the latest changes from the development database. With manual steps as part of the deployment process, there is significant opportunity for human error, resulting in a potential mismatch between the application and the database, or differences between the production and the development database. Unfortunately, these types of errors are not always easily identified, and sometimes lead to lengthly investigation and debug sessions.

Our Solution

We propose to alleviate the aforementioned problem by versioning the database. The current state of the database will be version x. When we need to change something, for example add a new table to support a new feature of our application, the database's version will become x+1. We also propose to create a new Java class for each new version. The responsibility of these classes will be to update the database from one version to the next by executing the required SQL, thus we call them the Update classes. Another class, the Updater will be able to determine the current version of the database and run the necessary Updates to bring it to the latest version. Finally, we suggest to run the Updater every time the application starts up. That way, we developers never need to worry about the state of the current development database (especially on big teams, where every programmer has her own database) or of the database we are deploying against. We call this approach the Database Updater, and our supplied implementation WODBUpdate.

LogoIdea.png

Will I really need to make many changes to the database? In fact 60-80% of changes to an application occur after the first release of the product [1]. Introducing the concept of a Database Updater early into the project provides a mechanism to support these changes through the lifetime of the project.

The Database Updater is not only useful for structural changes but also for data manipulations. Imagine your WebObjects application manages user adresses. You have an entity State with a corresponding table containing the names of all US American states. Now you want to allow Canadians to use your application, too. For that, you simply create a new Update class that inserts the names of the Canadian provinces and territories into your State table.

Database Updater

The Database Updater is simple enough to be incorporated directly into the WebObjects application, though it could be abstracted and the core contained in a framework.

Flow Chart for the Database Updater

The basic algorithm of the Database Updater

In order to prevent multiple application instances from trying to update the database at the same time (and thereby likely corrupting it), a mutual exclusive lock is used. The first instance to acquire the lock performs the update and unlocks afterwards. Other instances wait for the lock to be released before resuming their startup.

A database table, _DBUPDATER [*], is used by the Updater to lock and unlock the database, and to store the version number.

The process of updating the database is:

  1. getting the current version number from the database (for example 35)
  2. incrementing the version number by one (36)
  3. instantiating an Update class representing that version number (Update36.class)
  4. executing the update (Update36)
  5. writing the new version number to the database (36)
  6. goto step 1.

The very first time we run the updater is special:

  1. a database version will not be found and -1 is assumed instead
  2. incremented to 0
  3. Update0 will be instantiated - it is our bootstrapping update which creates our table _DBUPDATER

The update process is done when no more Update classes can be found. At this point the database is unlocked and control returns to the application which then continues starting up.

If an update fails, for instance because it contains invalid SQL for which the DBMS returns an error, the error is logged and the application terminates without unlocking the database. At this point the developer or a database administrator would need to investigate why the latest update failed, fix the update and revert the database to the state of the previous version. Then one would manually unlock the database with the following SQL before re-attempting the update:

update _DBUPDATER set UPDATE_LOCK = 0
[*]The name _DBUPDATER, with the leading underscore, can cause problems with some database tools. Choose a different table name if you depend on such a tool.

Implementation Details

The update algorithm relies on a simple code structure and reflection to determine which class contains the next update. The update classes are named Update0, Update1, Update2 and so on. They are instantiated and executed by the Updater as follows:

String updatesClassNamesPrefix = Update.class.getName();
String className = updatesClassNamesPrefix + (currentDBVersion() + 1);
Update update = (Update)Class.forName(className).newInstance();
nextUpdate.setDatabase(database);
nextUpdate.performUpdate();

Example update:

public class Update27 extends Update {
    // make AllowOverTime mandatory, default existing rows to 0
    protected void executeUpdate() throws SQLException {
        executeSQL("UPDATE USER SET ALLOWOVERTIME = 0 "
                 + "WHERE ALLOWOVERTIME IS NULL");
        executeSQL("ALTER TABLE USER MODIFY(ALLOWOVERTIME NOT NULL)");
    }
}

Note that WODBUpdate assumes that Update and its subclasses are in the same package. See Database Updater As Framework if that is an issue for you.

These kind of update classes are in our experience understandable by database administrators who have no Java knowledge (should they need to get involved in case of a failure).

As new features requiring database changes are added, a new Update class is created with the next sequential number appended. Convenience methods can be added to the Update super-class, making each update as simple to understand as possible.

The Database class uses JDBC to connect with the underlying database. Earlier implementations used as much EOF as possible instead of JDBC, but then we realized that it was just complicating the code without buying us anything. [todo: How so/Why?]

The SQL in the supplied implementation has been written for FrontBase. Different DBMS products might require different SQL statements.

The Class Diagram for WODBUpdate

Class Diagram

Source

The supplied source code for WODBUpdate is seedwork [2]. This means it is not a reusable framework, it is not supported by the authors and there likely will not be any further updates. If you like what you see, you are welcome to take it and grow it to suit your individual needs. It would not take much effort to write your own updater from scratch if you desire to do so.

Note

We allow you to basically do what you want with WODBUpdate, as long as you don't sue us. See the license.

Source:
Example Updates:
Unit Tests:
Usage:

Issues

Some of the technical and political issues that we have experienced are outlined below. In Extensions, we allude to some measures that can be implemented to reduce the impact of some of the issues.

Inadvertent Updates

Updating a database that should not be updated is an undesirable outcome with the updater. In projects with multiple databases supporting multiple branches or versions of an application, one has to be careful of shared or external configuration files or run scripts. If an external configuration file or run script's contents are forgotten to be changed when building a different branch of your application, inadvertent updates of reserved databases could happen. Ideally configuration and build scripts would be kept versioned with the source code and this issue a mute point. [See Sanity Check and Do not disturb under Extensions] [todo: need to clarify what are config files]

Too Much Power

The updater requires a database user powerful enough to run its update steps. Having a database user with strong permissions and its password stored in the source code or EOModel can raise alarm bells for some people. Obfuscation of the password by encryption or other methods can help alleviate this concern. However, in practice, security is already in place restricting access to the database machine to certain domains or machines. Likely you will have to negotiate the permissions structure for the update user each time it needs more. This can result in the Too Little Power issue.

Too Little Power

The updater requires a powerful database user to run its update steps. By not providing the correct permissions on the update user when it is performing an update can cause failures. This happens when creating new steps in the updater and realizing that you need to add a permission to the updater user to "ALTER SCHEMA" for instance. This information must be remembered and applied manually to all the databases where this application is going to be deployed. This can be alleviated by making the updater user have full permissions right from the start, see Too Much Power.

Politics

Boundaries are bound to be crossed in one's organization with this type of approach. Fears of the application doing too much magic may arrise. We have found that it takes time to get the different parties of the organization on-side with this approach. In our experience, the benefits of this approach far outweigh the risks. However, you might have to prove it and it might take a few successful deployments (and unsuccessful ones!) before acceptance is earned, and benefits appreciated by everyone.

Ensure DBA On-Side

Finally, if something goes wrong it could take substantial work to back out the part of the step that ran. Fixing permissions or whatever else killed the updater, and then running the application again is best done with a database administrator or knowledgeable database person who has been on board with the development of the updater from the beginning. This will expedite fixing the problem, as well as strengthening the quality of the solution in the future.

Extensions

WODBUpdate is an implementation presented as simply as possible. Depending on the type of project and the database being used, additional features can be useful, some of them to mitigate issues already described.

Different Database User Accounts

Database administrators do not always like WO applications to use a database account with admin privileges. However, while these privileges are not required for regular operations, they are necessary for WODBUpdate to do its work. In that scenario it can be useful to have two database user accounts. WODBUpdate can patch the connection information from the EOModel to change the user from the regular account to the administrative account before performing its updates.

Sanity check

On big projects with different branches and more than one database, you would not want to run the application from an old branch against a database that belongs to a newer branch. This mismatch can cause the application to not work as expected, not at all, or worse, cause damage. A simple way to prevent this from happening is to check whether the current database version is higher than the number of the highest Updater step available to the application. It allows the application to be stopped safely with an appropriate error message before it can proceed into unknown execution paths.

Schemas

If your database has several schema names, the updater steps need to properly specify them in the SQL statements. To increase readability and reduce errors, one possible solution implements a tableName() method in the Update superclass:

protected String tableName(String anUnformattedTableName) {
    return "\"" + schemaName + "\".\"" + anUnformattedTableName + "\"";
}

and used:

executeSQL("UPDATE " + tableName("PERSON") + " SET ADDRESSID = 1 "
         + "WHERE ADDRESSID IS NULL");

More Than One Database

A project with more than one database often has an EOModel per database, each with a different connection dictionary. WODBUpdate could be invoked sequentially with those different database connections. This solution requires a supportive implementation that allows for separate batches of Updates clearly marked as belonging to a certain database. An alternative would be to change WODBUpdate to work with multiple database connections at the same time and have different executeSQL()-like methods in Update clearly marked for a specific database. For example executeOracleSQL() and executeFrontBaseSQL().

Database Updater As Framework

Our implementation WODBUpdate assumes that Update and its subclasses are in the same package. If you want to wrap WODBUpdate into a reusable framework, you would probably have Updater, Update and Database in that framework's package and subclasses of Update in a different project/package. To support that, our implementation would need to be slightly changed. We recommend to pass in the value for Updater.updatesClassNamesPrefix as an additional argument to Updater's constructor and derive Update.INDEX_OF_UPDATE_NUMBER from the former. That way, WODBUpdate would not have any compile time knowledge of the particular Update subclasses it needs to instantiate at runtime.

Flag To Override Database Lock

Developers can make mistakes when writing new update steps and not find out until they actually run the step against their database. They have to fix the update, roll back the changes and unlock the database by setting the UPDATE_LOCK in the _DBUPDATER table to 0. If this happens a lot, it can make sense to have WODBUpdate override the lock of the database if a certain environment variable is set. That way, the developer can launch a new version of their application (containing the hopefully fixed update step) with a command line argument setting the environment variable to override. Caution must be observed that this variable does not become scripted, thereby negating any protective effect of the lock.

Do Not Disturb

Certain environments, such as Quality Assurance, require stability while regression testing is in progress for example. Interruptions from developers accidentally updating their databases are not allowed. The concept of a Do not disturb flag which prevents any updates to the database might be appropriate, as long as some very noisy log messages or warnings are printed indicating that the database has not been managed by the updater, use at own risk.

Conclusion

By providing an integrated, automated, and consistent way of maintaining database integrity with respect to the version of the application, the Database Updater reduces deployment misadventures, needless obscure bug hunts and trouble tickets. This results in a simpler deployment process, less required documentation, and happier developers!

There is not really anything WebObjects or even Java specific about the Database Updater, it just happened to be the context we are working in. Nothing should prevent you from using the same idea on any project, as long as it uses an object oriented programming language and an SQL database.

History

Peter Lyons came up with the original idea for a Database Updater in 2002 on a project at CodeFab where he and Ben Holt did the first implementation. Christian, working on the same project, implemented several update steps. Overall, close to 40 updates were implemented and it became clear that the team did not want to go back to the old-fashioned way of managing SQL scripts separate from the project's code.

We (James and Christian) later re-implemented the same idea on a different project. At the time of this writing, 100 update steps were implemented over a time frame of about 14 months. This was on an old project, where the updater was partly used for new features and partly for cleaning up old structures and data. Subsequent deployments of the application were noticeably faster and easier.

Peter Lyons reports [3] that he had used this method again on a .NET project using DotNetNuke [4], which grew to over 170 steps.

WODBUpdate is a new clean room implementation that we have written specifically for this article in February 2005.

Authors

The authors are software developers located in Calgary/Canada. James Seigel can be contacted at seigel@ieee.org and Christian Pekeler at christian@pekeler.org.

References

[1]Mary and Tom Poppendieck. Lean Software Development. Addison-Wesley, 2003.
[2]Michael Feathers. Stunting a Framework. http://www.artima.com/weblogs/viewpost.jsp?thread=8826
[3]private communication
[4]http://dotnetnuke.com