Conducting a major version upgrade of self-managed software always feels like a shitshow.
(Though we are conducting a GitLab EE upgrade from version 13 to 14, the following steps should apply to GitLab CE and other major version upgrades as well.)
You can jump to the Major Upgrade section if you just want to learn the upgrade steps. Here is what we did:
- Read the GitLab upgrade guide
- Back up data
- Upgrade (1st try)
- Rollback
- Upgrade (2nd try)
We made the mistake of not reading the GitLab upgrade guide carefully, which is why we ended up with the extra steps 4 and 5. Please don't repeat our mistake.
Background
Bytebase is an open source database schema change and version control tool for teams. We have had native VCS integration with GitLab CE/EE since day 1. Users can visit our live demo site from the landing page, and we curated the demo data to help potential customers better understand our VCS integration feature. If a visitor clicks the commit hash, she is taken to the GitLab commit that triggered the Bytebase pipeline.
The Problem
Besides complementing our demo, our GitLab instance is also used by our team members to work on GitLab integration features. Last week, we received an issue report saying our VCS integration was broken, and it turned out to be related to the default OAuth token expiration change introduced in GitLab 14.3. This is a sane change from GitLab. However, when Bytebase developed this feature, the latest major version was GitLab 13. That version set no expiration time on the access token, and our implementation didn't properly handle the case where the access token expires. To make it easier to test this new GitLab behavior, we decided to upgrade our self-managed GitLab instance to the new version.
The Major Upgrade
Overall steps
- Carefully read the GitLab upgrade guide (read the fxxking manual).
- Back up your data! If you mess up the upgrade (and we did), you can still restore the original state and start over.
- Follow the upgrade guide to conduct the upgrade.
Step 1 - Read the Upgrade Guide
GitLab's upgrade guide is pretty well written, e.g. it clearly lists the upgrade path. On the other hand, GitLab itself is already a monster, and conducting a major upgrade is still a daunting task. In our particular case, we were upgrading from 13.12.2 to the latest version (14.5.2 at the time).
Step 2 - Backup Data
Our GitLab instance runs on AWS EC2 with EBS as the storage. So we stopped the instance, took a snapshot, waited for the snapshot to complete, and started the instance again.
[Screenshot: the created snapshot]
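The stop / snapshot / start sequence can also be scripted with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the right account and region. The function name `snapshot_gitlab_volume` and the instance and volume IDs are hypothetical placeholders, not values from our setup.

```shell
# Sketch: stop the instance, snapshot its EBS volume, then start it again.
# Usage: snapshot_gitlab_volume <instance-id> <volume-id>
# Both IDs are placeholders you must supply yourself.
snapshot_gitlab_volume() {
  instance_id="$1"
  volume_id="$2"

  # Stop the instance so the snapshot captures a consistent disk state.
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1

  # Create the snapshot and keep only its ID from the response.
  snapshot_id=$(aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "GitLab pre-upgrade backup" \
    --query SnapshotId --output text) || return 1

  # Block until the snapshot is complete before bringing GitLab back.
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1

  aws ec2 start-instances --instance-ids "$instance_id" >/dev/null || return 1
  echo "$snapshot_id"
}
```

Calling it as `snapshot_gitlab_volume i-EXAMPLE vol-EXAMPLE` prints the new snapshot ID, which is worth writing down for a possible rollback. The `wait` commands can time out on large volumes, in which case re-running the wait is usually enough.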
Step 3 - The Upgrade (1st try)
GitLab clearly describes the general steps required for a major version upgrade. We were already on 13.12.2, which is on the latest minor version of the preceding major version, so we could skip the 1st step. Our upgrade path was therefore:
13.12.2 -> 14.0.z (latest patch release of the first minor version of 14) -> 14 latest
Part 1 - 13.12.2 -> 14.0.z
Visit GitLab's Docker Hub page to find the latest 14.0.z version, which is 14.0.12.
Pull the 14.0.12 EE image
docker pull gitlab/gitlab-ee:14.0.12-ee.0
Stop the existing container and start the 14.0.12 version
docker stop gitlab
docker rm gitlab
sudo docker run --detach --hostname gitlab.bytebase.com --publish 8080:80 --publish 22:22 --name gitlab --restart always --log-driver json-file --log-opt max-size=10m --log-opt max-file=3 --volume /srv/gitlab/config:/etc/gitlab --volume /srv/gitlab/logs:/var/log/gitlab --volume /srv/gitlab/data:/var/opt/gitlab gitlab/gitlab-ee:14.0.12-ee.0
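Since the same pull / stop / rm / run sequence is repeated for every hop on the upgrade path, it can be wrapped in a small helper. This is only a sketch: the function names `upgrade_gitlab` and `running_image` are ours for illustration, the flags are copied from the command above, and you may need to prefix the `docker` calls with `sudo` depending on your setup.

```shell
# Sketch: replace the running "gitlab" container with one from the given tag.
# Usage: upgrade_gitlab 14.0.12-ee.0
upgrade_gitlab() {
  tag="$1"
  image="gitlab/gitlab-ee:$tag"

  # Pull first so the container is down for as short a time as possible.
  docker pull "$image" || return 1
  docker stop gitlab
  docker rm gitlab
  docker run --detach \
    --hostname gitlab.bytebase.com \
    --publish 8080:80 --publish 22:22 \
    --name gitlab \
    --restart always \
    --log-driver json-file --log-opt max-size=10m --log-opt max-file=3 \
    --volume /srv/gitlab/config:/etc/gitlab \
    --volume /srv/gitlab/logs:/var/log/gitlab \
    --volume /srv/gitlab/data:/var/opt/gitlab \
    "$image"
}

# Print the image (including tag) the "gitlab" container was created from.
running_image() {
  docker inspect --format '{{.Config.Image}}' gitlab
}
```

Passing an explicit tag such as `14.0.12-ee.0` instead of `latest` makes each hop on the upgrade path visible in the command itself. Note that `running_image` only reports which image the container was started from; it does not tell you whether GitLab has finished booting or migrating.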
Verify the running version.
Part 2 - 14.0.z -> 14 latest
We were in too much of a hurry to proceed to Part 2, and this is where we made our mistake.
Pull the latest image
docker pull gitlab/gitlab-ee:latest
Stop the existing container and start the latest version
docker stop gitlab
docker rm gitlab
sudo docker run --detach --hostname gitlab.bytebase.com --publish 8080:80 --publish 22:22 --name gitlab --restart always --log-driver json-file --log-opt max-size=10m --log-opt max-file=3 --volume /srv/gitlab/config:/etc/gitlab --volume /srv/gitlab/logs:/var/log/gitlab --volume /srv/gitlab/data:/var/opt/gitlab gitlab/gitlab-ee:latest
PANIC, GitLab failed to start and kept crash looping 😫
We used docker logs -f gitlab to view the log and quickly realized that we had missed a critical step that the upgrade guide already mentions.
It's also mentioned in the major upgrade overview.
And we still managed to miss both 😮‍💨
Step 4 - The Rollback
We guess we are not alone, since GitLab provides a dedicated troubleshooting guide.
Fortunately, we took a snapshot in Step 2, so it was easier to just restore that snapshot.
We followed the Amazon EBS restoring guide, verified that the restore worked, and got ready for a 2nd try.
Step 5 - The Upgrade (2nd try)
The steps are almost the same as in our 1st try, except that after upgrading to 14.0.z we need to pause and wait until all background migrations complete. Only after that do we proceed to upgrade from 14.0.z to the latest version. This time everything worked smoothly. And all this effort was just for getting this little checkbox. What a journey! BTW, our monitoring (provided by betteruptime.com) did respond promptly along the way, and you can see we experienced 5 downtimes 😵
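The "pause and wait" part is easy to skip when you are in a hurry, so it can help to script it. Below is a minimal sketch that polls the container until no background migrations remain. The Rails expression `Gitlab::BackgroundMigration.remaining` is the check as we recall it from GitLab's upgrade documentation for this version range, so treat it as an assumption and confirm it against the guide for your version (later releases also track batched migrations separately). The function names are hypothetical.

```shell
# Sketch: ask the running "gitlab" container how many background
# migrations are still pending. The Rails expression is an assumption
# taken from GitLab's upgrade docs; verify it for your version.
remaining_migrations() {
  docker exec gitlab gitlab-rails runner -e production \
    'puts Gitlab::BackgroundMigration.remaining'
}

# Poll until the count reaches 0.
# Usage: wait_for_background_migrations [seconds-between-checks]
wait_for_background_migrations() {
  interval="${1:-60}"
  while :; do
    remaining=$(remaining_migrations) || return 1
    if [ "$remaining" = "0" ]; then
      echo "background migrations complete"
      return 0
    fi
    echo "remaining background migrations: $remaining"
    sleep "$interval"
  done
}
```

Run `wait_for_background_migrations` after the 14.0.z container is up, and only start the next hop once it returns. Each check boots a Rails runner, which is slow, so a polling interval of a minute or more is reasonable.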
The takeaway
For GitLab
- GitLab is actually doing a pretty good job documenting the upgrade process. For such a complex product single-handedly supporting a $10 billion market cap, I am pretty amazed. On the other hand, I wish they would put more stress on asking users to take a backup first.
- Offering a UI-based upgrade wizard would still be desirable though 😋.
- GitLab's release notes are very well written 👍.
For folks doing the GitLab major upgrade or any other software major upgrade
- Always create a backup beforehand.
- Read the fxxking manual (RTFM) and follow the manual carefully, step by step when conducting the upgrade.
For Bytebase
As our GitLab upgrade story shows, upgrading software (especially an on-premises, self-managed deployment) is hard. That's mainly because of the database schema/state migration, and this is also what Bytebase is trying to tackle. It's good for our team to feel the pain and understand the challenge of providing a smooth upgrade experience and a safe rollback plan.
Let's turn the shitshow 💩 into a carnival 🎡