By Brian
July 2, 2023
Importance of DDL version control
Have you ever wondered how database objects are managed and tracked when they change? If so, DDL version control could be the solution for you.
What is DDL?
Data Definition Language (DDL for short ) is a subset of SQL that is used to define and manage all objects in a database. This language is primarily used to CREATE, DROP, and ALTER tables within a database. But how does it look? Well these changes update the database schema. A schema is the layout and structure of a database, encompassing tables, views, indexes, and more. By modifying the scheme with DDL, you are making alteration this table structure. Here is an example of a Scheme.
CREATE TABLE Customers(
id INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
dob DATE
);
Here we created a table called Customers that holds first names, last names, and date of birth (dob). If you wanted to add an email field, you would execute a DDL command like the following:
ALTER TABLE Customers ADD COLUMN email VARCHAR(200);
This will update the Customers table to now include email:
CREATE TABLE Customers(
id INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
dob DATE,
email VARCHAR(200)
);
Importance of DDL Version Control
So what does this have todo with DDL (Data Definition Language) version control. This comes into play when you maintain a history of your changes to the database. You’re able to see a track of the changes made to each table and columns. This becomes immensely helpful when a project or feature extends beyond a month's work. This also serves as a good way to make sure that the staging and production database are sync with the same columns and data types. Another benefit is the ability to be able to roll back changes if something was to break. Also another would be the benefit of easier troubleshooting between items related to the database.
Version Control with Code
A typical workflow for developing a new feature in code involves branching out from the master branch and making commits as progress is made. Push that branch to the master branch, testing it thoroughly to make sure nothing breaks, then merging it to production. Most developers are more familiar with this practice in 'coding' languages like JavaScript, PHP, TypeScript, Python, etc., as the workflow often involves using Git. Git is the version control system that allows multiple people to work on the same code without stepping on each other's work. A person track are usually seen as commits made through out the feature that they work on. Commits can be seen as breadcrumbs or as a save point for someones work.
Commit as a Save Point
Those commits are pretty important because they act like a save point. For example, imagine playing an open world game back before Nintendo introduced the save game feature in Legend of Zelda in 1986? You would’ve needed to keep the NES on the entire play through and hope that there wasn’t a power outage or that your system ‘somehow’ got turned during dinner time. You would’ve lost all that progress you made in Hyrule and start all over. But thankfully, the save file feature allowed you to safely save your game, so you could take a break and return to it later. Or even save the game at a certain spot to come back to it later. Commits work in a Similar way.
How Does it look?
One approach is to have two different directories within the database directory. In this example, we’ll be working on the database named “EcommerceDB”. Originally we’ll have two tables, Customers and Orders.
CREATE TABLE Customers (
id INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
dob DATE
);
CREATE TABLE Orders (
id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);
This is our version 1.0
├── v1.0/
│ ├── create_customers.sql
│ ├── create_orders.sql
After some time, we realize that we need to register customers with their email. So, we can make a change like this:
ALTER TABLE Customers
ADD COLUMN email VARCHAR(200);
And record the change like so:
|── v1.1/
│ ├── alter_customers_add_email.sql
As the business grows, we decide to track products and associate them with their orders.
CREATE TABLE Products (
id INT PRIMARY KEY,
product_name VARCHAR(200),
product_price DECIMAL(10,2)
);
ALTER TABLE Orders
ADD COLUMN product_id INT;
and we’ll record it like this:
├── v1.1/
│ ├── alter_customers_add_email.sql
By the end all together it’ll look like this:
EcommerceDB_Version_Control/
├── v1.0/
│ ├── create_customers.sql
│ ├── create_orders.sql
├── v1.1/
│ ├── alter_customers_add_email.sql
├── v1.2/
│ ├── create_products.sql
│ ├── alter_orders_add_product_id.sql
├── v2.0/
│ ├── create_shipping.sql
│ ├── alter_customers_add_address.sql
Real World Example
Here’s a real world example. One time, there was a feature that I was working where there were changes needed to be made to different tables. When everything was finished and tested in staging we had to go and update the production ddl to match. The issue arose when we updated the production database and pushed the code changes to production, only for the feature to break. Why? How did it happen? We checked the updates that we had noted for the database, and everything seemed correct. We then looked through the code and started trying to debug the issue. After a while we realized our mistake, we forgot one column that was added. We weren’t able to notice this at first since we had created a backup of a table in prod and brought it over to staging for testing purposes but forgot to add the new column to the original prod table. You might dismiss it as a simple mistake, but in all seriousness, it's easy to make errors when things aren't documented properly.
So, what's the point?
The point is, we could have saved ourselves a huge headache if we had just updated or committed the changes made to the DDL as we worked. Waiting till the end just leads to errors. Hopefully this post has been informative. I'd love to hear your thoughts. Please leave a comment below or feel free to reach out to me directly. Let's continue the conversation!