
The scenario: I have a web application running in Kubernetes. The web application is managed and updated by Argo CD, which means a Git repo defines the state of the application.

Now the new requirement: users should also provide a phone number when they register. This requires two changes:

  • A change to the application itself, maybe some HTML and JavaScript. You can change the application, package it as a (Docker) image, and then roll out the change by updating the Argo CD repo.
  • A change to the database. The user table needs a column phone_number.

The database lives outside of the Kubernetes cluster. How do I apply these two changes in a sensible manner?

One could change the application so that it works with both the old and the new table layout (using a feature flag, or just by checking whether the column phone_number exists), deploy it, and then add the column "manually".
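For illustration, the "check whether the column exists" variant could look roughly like this. It is a minimal sketch assuming SQLite (where `PRAGMA table_info` lists a table's columns); the `users` table layout and the `has_column` / `register_user` helpers are hypothetical. Other databases would typically query `information_schema.columns` instead.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` currently has a column named `column` (SQLite only)."""
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    # The table name is interpolated, so it must come from our own code, never user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def register_user(conn: sqlite3.Connection, name: str, phone_number=None) -> None:
    """Hypothetical registration code that tolerates both table layouts."""
    if has_column(conn, "users", "phone_number"):
        conn.execute(
            "INSERT INTO users (name, phone_number) VALUES (?, ?)", (name, phone_number)
        )
    else:
        # Old layout: the phone number cannot be stored yet and is dropped.
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
register_user(conn, "alice", "555-0100")  # before the column exists
conn.execute("ALTER TABLE users ADD COLUMN phone_number TEXT")  # the "manual" step
register_user(conn, "bob", "555-0101")  # after the column exists
print(conn.execute("SELECT name, phone_number FROM users ORDER BY id").fetchall())
```

The downside is visible in the output: Alice's phone number is silently lost, and the schema check runs on every insert.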

But I am looking for a better, more automated solution in the spirit of GitOps.

1 Answer


The general term for this is a 'versioned schema' or 'evolutionary database design'. In a nutshell, you keep a set of scripts in source control that modify the database. You start with an initial creation script that builds the baseline schema from an empty 'scratch' database. Each subsequent change to the DB schema is a new script. Each script gets a version number so that the order of changes is unambiguous. Under normal circumstances you never change an existing script; you always add a new one.
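To make the ordering concrete, here is a small sketch that derives the execution order from script file names. The `V<version>__<description>.sql` pattern is an assumption (it resembles Flyway's naming convention), and the file names are hypothetical. The point it shows is that versions must be compared numerically and must be unique.

```python
import re

# Assumed naming pattern: V<version>__<description>.sql
SCRIPT_NAME = re.compile(r"^V(\d+)__(\w+)\.sql$")


def parse_version(filename: str) -> int:
    """Extract the numeric version from a migration script's file name."""
    match = SCRIPT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned migration script: {filename}")
    return int(match.group(1))


def ordered_scripts(filenames):
    """Sort scripts by numeric version and reject duplicate version numbers."""
    versions = [parse_version(name) for name in filenames]
    if len(set(versions)) != len(versions):
        raise ValueError("two scripts share a version number")
    return sorted(filenames, key=parse_version)


# Hypothetical contents of a migrations/ directory in the repo.
files = ["V10__add_orders.sql", "V1__baseline.sql", "V2__add_phone_number.sql"]
print(ordered_scripts(files))
```

A plain string sort would put `V10__add_orders.sql` first, which is why the version is parsed as an integer.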

In the initial version script, you create one or more tables that track all the scripts that have been run against the database.
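A minimal sketch of such a tracking table and a runner that applies only the scripts not yet recorded in it. Assumptions: SQLite as the database, a hypothetical `schema_history` table, and migrations held as in-memory `(version, SQL)` pairs instead of files. It relies on SQLite supporting DDL inside transactions; some databases (MySQL, for example) implicitly commit DDL statements, so the script and its history row would not be atomic there.

```python
import sqlite3

# Hypothetical migrations as (version, SQL) pairs; in a real repo each would be
# its own versioned file, appended to but never edited.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN phone_number TEXT"),
]


def ensure_history_table(conn: sqlite3.Connection) -> None:
    # Tracking table: one row per script that has been applied to this database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history ("
        " version INTEGER PRIMARY KEY,"
        " applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )


def applied_versions(conn: sqlite3.Connection) -> set:
    return {row[0] for row in conn.execute("SELECT version FROM schema_history")}


def migrate(conn: sqlite3.Connection, migrations) -> None:
    ensure_history_table(conn)
    done = applied_versions(conn)
    for version, sql in sorted(migrations):
        if version in done:
            continue  # applied on an earlier run; never re-run a script
        # Run the script and record it in one transaction, so both happen or neither does.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise


# isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
migrate(conn, MIGRATIONS)
migrate(conn, MIGRATIONS)  # second run finds nothing pending and changes nothing
print(sorted(applied_versions(conn)))
```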

The trick is to align the versions of the DB scripts with the release versions of your code in the source repo. That way, if you want to run a version of your application from a year ago, the automation can recreate the database from scratch by running exactly the scripts that version requires. Likewise, when you run an upgrade, the version table in the DB tells the tooling which scripts still need to be run to bring it to the desired version.
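One way to picture that alignment: the repo records which schema version each application release expects, and the tooling computes the pending scripts from the DB's current version. This is a sketch under assumptions; the `REQUIRED_SCHEMA` mapping, the release numbers, and `scripts_to_run` are all hypothetical, and it only handles moving forward.

```python
# Hypothetical mapping from application release to the schema version it expects,
# kept in the same repo as the code and the migration scripts.
REQUIRED_SCHEMA = {"1.0.0": 1, "1.1.0": 2, "2.0.0": 3}


def scripts_to_run(current_version: int, app_release: str, available: list[int]) -> list[int]:
    """Return the script versions needed to move the DB from `current_version`
    up to the schema version required by `app_release` (forward only).
    A current_version of 0 stands for an empty 'scratch' database."""
    target = REQUIRED_SCHEMA[app_release]
    if current_version > target:
        raise ValueError(
            f"DB is at v{current_version}, newer than v{target} required by {app_release}"
        )
    return [v for v in sorted(available) if current_version < v <= target]


available = [1, 2, 3]
print(scripts_to_run(0, "1.1.0", available))  # rebuild an old release from scratch
print(scripts_to_run(1, "2.0.0", available))  # upgrade an existing database
```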

There are various vendor and open-source tools for this, such as Liquibase or Flyway (not a recommendation, just for reference). It's also not terribly difficult to roll your own, but YMMV.

One note of warning: there's an idea around these kinds of tools that the first thing your application should do on startup is check the version of the DB and upgrade it as necessary. I strongly disagree with that idea for two main reasons: 1) your application almost surely shouldn't have the rights on the DB required to do that, and 2) I would never want to be the person who accidentally connected to the production DB from development and broke production, nor would I wish that on (mostly) anyone. Instead, I recommend checking the DB version at startup and, if it doesn't match the application's version, exiting with an error. I suppose that if you manage the permissions properly on prod and other important databases, that's what would happen anyway, but I don't see the point of setting a booby trap for yourself.
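A sketch of that "check, don't migrate" startup behaviour. It assumes SQLite and a `schema_history` table like the one described above; `EXPECTED_SCHEMA_VERSION`, `SchemaVersionMismatch`, and `main` are hypothetical names. The check needs only read access, and on mismatch the process exits with a non-zero status without touching the schema.

```python
import sqlite3
import sys

EXPECTED_SCHEMA_VERSION = 2  # hypothetical: the schema version this build was released against


class SchemaVersionMismatch(RuntimeError):
    pass


def current_schema_version(conn: sqlite3.Connection) -> int:
    # Read-only query: the application needs SELECT on the history table, not DDL rights.
    row = conn.execute("SELECT MAX(version) FROM schema_history").fetchone()
    return row[0] or 0  # MAX() of an empty table is NULL


def check_schema_version(conn: sqlite3.Connection, expected: int = EXPECTED_SCHEMA_VERSION) -> None:
    actual = current_schema_version(conn)
    if actual != expected:
        raise SchemaVersionMismatch(
            f"database schema is v{actual}, this build expects v{expected}; run the migrations first"
        )


def main(conn: sqlite3.Connection) -> None:
    try:
        check_schema_version(conn)
    except SchemaVersionMismatch as err:
        sys.exit(f"startup aborted: {err}")  # exit status 1; nothing is modified
    print("schema OK, starting application")


# Demo without exiting: a database that is one migration behind, then caught up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_history (version INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO schema_history (version) VALUES (1)")
try:
    check_schema_version(conn)
except SchemaVersionMismatch as err:
    print(err)
conn.execute("INSERT INTO schema_history (version) VALUES (2)")
check_schema_version(conn)  # passes silently
```

Running the migrations themselves then becomes a separate, deliberately privileged step in the deployment, rather than something any connecting process can trigger.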

  • It is also interesting in which order one applies the changes. If you run your application in several pods, updating means that the old and new versions are running at the same time. One way I can think of is to design the new version of the application to cope with both the old and the new database structure, and to apply the database changes after all pods have been restarted. Another way would be complete downtime. But maybe I am just missing something obvious. Commented Dec 20, 2024 at 11:15
  • Another term I've heard for this is evolutionary database design. Commented Dec 20, 2024 at 13:26
  • @JFabianMeier I would start with the database changes, but make them backwards compatible. In your example, make the phone_number column nullable or give it a default value (which can be an empty string in most databases) so that the old code can continue to insert new records. Commented Dec 20, 2024 at 14:02
  • @JFabianMeier Making database schema changes with zero downtime is tricky. I don't think it's possible in general, but you can probably work it out in many cases without too much effort. Sometimes you might want to make the DB changes first (e.g. adding things), sometimes the code first (e.g. deleting things). You would have to do some version-compatibility checking beyond what I recommend here, though. Commented Dec 20, 2024 at 16:57
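The "database first, but backwards compatible" ordering from the comments can be sketched like this, assuming SQLite and a hypothetical `users` table. The additive migration runs before the rollout; old pods keep inserting without knowing about the column while new pods already write it. SQLite only allows `ADD COLUMN ... NOT NULL` with a non-NULL default, which is why the empty-string default is there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Migration applied BEFORE the new pods roll out: purely additive.
# A nullable column would also work; here an empty-string default keeps it NOT NULL.
conn.execute("ALTER TABLE users ADD COLUMN phone_number TEXT NOT NULL DEFAULT ''")

# An old pod, still running during the rollout, doesn't know the column exists...
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
# ...while a new pod already writes it.
conn.execute("INSERT INTO users (name, phone_number) VALUES (?, ?)", ("bob", "555-0101"))

print(conn.execute("SELECT name, phone_number FROM users ORDER BY id").fetchall())
```

This only stays compatible if the old code names its columns explicitly; code that does `SELECT *` and unpacks rows by position could still break when a column appears.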
