Database definition management
There is a lot of problems when you develop, deploy and support almost any larger applications. One of the most common problems is how to manage your (relational) database definition. By database definition I mean not only its structure (tables, constraints,..), but also certain amount of data needed for development, deployment or installation.
At least these questions usually need to be answered:
- Where to store a database definition? (e.g. in versioning system?)
- In what form should the database definition be stored? (e.g. sql scripts, xml)
- How to achieve easy development and change-management of the database definition?
- How to achieve quick and easy updates of production databases?
- How to fill initial data into the database?
You always must think about your goals. There are some ordinary ones:
- Every developer always needs to work with a database corresponding to his current code base. He needs to synchronize his developer-database definition whenever he gets the sources from the versioning system.
- When the application gets deployed, its database must be automatically created too.
- The deployed application will be repeatedly updated – including its database. The update process should be fully automated.
- Multi-tenant application must be able to create new tenant databases at anytime during runtime.
Let’s define some terminology we will use later in the article.
Any script transforming one state of the database to another. It may be a raw SQL script like
ALTER TABLE sample ADD COLUMN sample_col;
But it may be something more complicated as well. Let’s have a look at this XML:
<alter id="42" date="12.5.2012 20:47"> <forwardSQL> <!-- apply change --> ALTER TABLE sample ADD COLUMN sample_col; </forwardSQL> <backwardSQL> <!-- revert change --> ALTER TABLE sample REMOVE COLUMN sample_col; </backwardSQL> </alter>
It contains more information: script id, date of creation, forward script performing the database modification itself, backward script for reverting the modification if needed.
Even the raw SQL alter-script may hold some metadata (e.g. creation date) in its file name (alter_20120512.sql) or in a file path (scripts/module1/tag1/120512.sql).
Any script usable for a creation of a brand new application database or its distinct module. Typically it’s a some form of a database backup consisting of a sequence of CREATE TABLE’s and related commands such as adding foreign keys and similar.
You could say that if we took all available alter-scripts we’d get a create-script. Yes, that’s true, but such a script might not be effective enough. And what more, you can’t use such a script at all when you use the create-script for adding modules to an existing database (you simply can’t use a pile of alter-scripts as a create-script when any part of the database already exists). Yet it still can make sense to use alters when creating a developer’s database.
Development data script (devel-data)
Script used for filling the database with the data necessary for a comfortable development. Developers use it to prefill their database with a sample data.
Initial data script (init-data)
Fills the newly created database with the initial data required in production environment.
It is used whenever new database is created. The script is usually a part of the installation package or is included in the application itself. The data often come from different external sources (e.g. laws, government, your business department, etc.) and need to be somehow transformed. Requirements for its creation and management may significantly differ and will not be further discussed.
Use-cases in development environment
- create new database – needs: create-script, devel-data
- update existing database – needs: alter-scripts (data content stays the same)
Use-cases in production environment
- installation of the application – needs: create-script, init-data
- application update – needs: alter-scripts, init-data (and create-script in case of a multi-tenant app)
- add new tenant – needs: create-script, init-data
Database definition is a set of instructions on how to create new application database. From a logical point of view, the database definition is part of a source code-base.
Database definition versioning
Database definition is a part of the application’s source code and as such it needs to be versioned. But what exactly should be versioned? There are several apparent possibilites:
Alter-scripts – an execution of all alter-scripts creates the database. Versioning the alter-scripts means that every time a developer changes the database definition, he commits the alter-script representing all modifications he just did.
Create-script – by definition creates the database. When you version a create-script you must commit complete create-script with every change you do in the database schema. Database updates must be done by generating delta-scripts.
Data model – database definition is deduced from an appropriatelly enriched business model (see GORM from Grails). Data model is always versioned since it is indeed in the core of the application. When the model is changed the delta-script is automatically generated and the database is updated.
All of the approaches may be used. Every team must decide what fits its needs best. There is no simple answer on what to choose, but at least these aspects should be considered:
1) Required flexibility of the database update (do I need alters or can I go with deltas? – see next chapter for clarification)
2) Is it comfortable/difficult for a developer to handle the database definition? Consider attributes like: human-readability, ease of modification, ways of debugging and solving error conditions.
3) Database server (Oracle, MySQL, PostgreSQL, …). Feature sets and tool sets certainly are not the same. For example one database supports DDL statements (create table, alter table) in transactions (PostgreSQL) while another database does not (Oracle). Lack of such a feature may have strong impact on the design of your alter-script related tools.
4) Available third-party tools. Usually you hardly find some really useful tools, but it’s worth trying.
5) What home-made tools we need to implement. This is how it often ends – you decide to do it yourself. From my experience this is quite common solution on many projects.
I definitely prefer versioning alter-scripts over create-scripts. It is more concise and readable. Also it does not require as advanced tooling as create-scripts do – especially I mean that it is quite non-trivial to generate an alter-script from two create scripts and sometimes it is even impossible. It is closely related to the difference between delta-script and alter-script.
Delta-script vs. Alter-script
delta-script is a subtraction: delta = target-db – source-db
alter-script is a general function: target-db = alter(source-db)
Which means a delta-script is a subtype of an alter-script. It represents “the shortest path” of how to transform source-db into target-db. On the other hand, an alter-script is more general. It may contain for example custom (hand-written) data manipulations that can’t be obtained by any subtraction.
Here is a point where this article could come to an end. But it won’t. Further on I will shortly describe one concrete solution. Please do not copy it without thinking about it.
Multi-tenant application with a hybrid database definition
Environment: Java + PostgreSQL
Artifacts versioning: alter-scripts in SVN
Alter-script: Raw SQL file containing a timestamp in every file-name. Usually written manually. For bigger alters, the DbDiff tool may be used (see below).
Create-script for developers: Set of all alter-scripts.
Create-script for production: Raw SQL file containing all CREATE TABLE’s and so on. Generated by DbExport (see below).
Devel-data: Maintained completely by developers. It’s a set of raw SQL files.
Init-data: Maintained mostly by business/analytical department. It can be exported (dumped) into XML whenever needed. The XML’s then become part of the installation package.
DbSync – synchronizes the database by applying all available alter-scripts
- Looks for available alter-scripts on the pre-configured place (classpath, file-system, jar file).
- Manages history of already aplied alter-scripts, it is done by maintaining a service database table _alter_history containing a list of all successfuly applied scripts. It consists of the following columns: id, script_name, applied_timestamp.
- By comparing available and applied scripts, it is able to find and apply new alter-scripts.
- If used on an empty postgresql database (with no tables) all alters are run and so a brand new database is created.
- Developer’s database may be optionally created by applying the devel-data on a recently created database.
DbExport – takes a complete set of alter-scripts and transforms it into a create-script
- How it is done: Then the DbSync is used for a creation of a brand new application database. Now we have a source database which is in the following step dumped into a create-script. The dump is partially performed by a handy apgdiff tool, partially by standard pg_dump and partially by our own tools focusing mainly on an init-data gathering and export.
- It is obvious now that the init-data is exported as a part of the create-script. It is because it is always used together.
- The resulting create-script consists of a set of files and directories.
DbImport – takes a create-script and executes it
- Creates new production ready database populated with the init-data.
DbDiff – compares two databases and generates a delta-script
- tool helping developers to create more complicated alter-scripts
The description of tools is not completely accurate. Take it as a concept.
This article contains general and brief overview of the database definition management. The main goal here was to define some generally usable terminology and make the first steps towards more formal and more detailed description of the common issues we all have to tackle when we develop and operate database backed applications.
Posted in programming