Opinionated Git - Part 2 - Embracing Rebase
The path to git history enlightenment

Embracing Rebase

Rebase (in my opinion) is one of the most misunderstood and underused features of git. There are many attempts at explaining rebase, some highlighting the difference between merge and rebase, and some issuing dire warnings against using it. Git, at its core, is surprisingly simple and elegant, and it is from this simplicity and elegance that it derives its power. Complexity only seems to emerge when that simplicity is abstracted away by tools that hide what is really going on, and that is when git begins to befuddle those who haven’t tried to understand it. As such, I will try to explain rebase by alluding to some of the simple and powerful concepts at its core, without delving too deeply into the inner workings of git.

The best place to start this explanation is the humble git commit. The main contents of a commit are a reference (SHA-1) to a tree object, references (SHA-1) to its parent commits (usually one, two for a merge commit, and none for the very first commit in a repository), the author and committer, and the commit message. These details are packed into an object, and a SHA-1 hash computed over that object's contents becomes the ID used to reference the commit. The tree object contains references to other tree objects (think directories) and blob objects (think files) that together represent every file in the snapshot for this commit. This snapshot can then be used to quickly find the differences between a commit and its immediate parent, or to quickly build a working directory when performing a checkout.
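
To make the description above concrete, here is a minimal Python sketch of how git derives an object ID: it hashes a short "<type> <size>" header plus the object's body with SHA-1. The commit body is simplified (the author identity and the fixed timestamps are hypothetical placeholders), but it shows that changing nothing except a commit's parent changes its ID.

```python
from __future__ import annotations

import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over a '<type> <size>' header, a NUL byte, then the raw object body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str, message: str) -> bytes:
    """Build a simplified commit object body.

    Assumption: the timestamp and timezone are fixed placeholders; real commits
    record the actual author and committer times.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, two for a merge
    lines.append(f"author {author} 1531000000 +1000")
    lines.append(f"committer {author} 1531000000 +1000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# The empty tree has the same well-known ID in every git repository.
empty_tree = git_object_id("tree", b"")

author = "Example Dev <dev@example.com>"  # hypothetical identity
root = git_object_id("commit", commit_body(empty_tree, [], author, "Initial commit"))
other = git_object_id("commit", commit_body(empty_tree, [], author, "Another root"))

# Same tree, author and message; only the parent differs.
on_root = git_object_id("commit", commit_body(empty_tree, [root], author, "Add FeatureX"))
on_other = git_object_id("commit", commit_body(empty_tree, [other], author, "Add FeatureX"))

print(empty_tree)            # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(on_root != on_other)   # True
```

This is exactly why rebased commits end up with new IDs, as discussed below.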

The Problem

Let’s say you create a branch from master called FeatureX. Work begins on this new local branch to implement the new feature, but in the meantime someone else has committed their changes to master before you have a chance to finish. So now master is ahead of where you branched from. Obviously you need to integrate your changes with the new changes in master.

Your branch may or may not merge cleanly. Even if it does merge cleanly, there is still the possibility that a change made by the other developer has broken your changes and will cause a build failure. Even if the code merges cleanly and builds, there is still a risk that the other developer has changed the code in such a way that your changes will introduce a bug. Hopefully you have sufficient unit/integration tests to validate your changes and guard against this last kind of problem, but it is still a risk. A disciplined developer will bring the changes down to their local repository first, review them, and ensure that their changes on the feature branch still build and are valid against the changes made in master since they branched. This is called a “reverse integration”. To do this, there are two options in git: merge or rebase.

Merge

When you merge two branches, git finds their common ancestor commit and creates a new commit on the branch you are merging into that combines the changes made on both branches since that ancestor. A 3-way merge is performed on each file changed on both sides, and this may result in conflicts that need to be resolved manually. This is fairly simple, and pretty much what we are used to from older centralized source control. This merge commit, with its two parents, can be very useful: it means you can easily roll back an entire feature by reverting that one merge commit. But what if you need the changes that have been integrated on master to develop your new feature, and you are not ready to merge to master? Even if you are ready to merge to master, as discussed, it is good practice to reverse integrate locally and validate your changes before blindly merging into the remote master. So first we merge the remote master into our local branch, validate that our changes still work, finish off our feature, and then merge our branch back into master.
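
The merge step above starts by finding the common ancestor. The sketch below is a toy model, not git's implementation: history is a hypothetical dictionary mapping commit names to their parents, and `merge_base` returns the best common ancestor, which is what `git merge-base` reports. It does not perform the 3-way file merge itself.

```python
from collections import deque

# Hypothetical toy history: each commit name maps to its parent names.
history = {
    "A": [],
    "B": ["A"],    # master when FeatureX was branched
    "C": ["B"],    # someone else's work lands on master...
    "D": ["C"],    # ...so master is now at D
    "X1": ["B"],   # your FeatureX commits
    "X2": ["X1"],
}


def ancestors(dag, commit):
    """Return the commit plus everything reachable through parent links."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in dag[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_base(dag, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(dag, a) & ancestors(dag, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(dag, other) for other in common)
    }


print(merge_base(history, "X2", "D"))  # {'B'}: the 3-way merge compares B, X2 and D

# Reverse integration by merging: a new commit on FeatureX with two parents.
merged = dict(history, M1=["X2", "D"])
print(merge_base(merged, "M1", "D"))   # {'D'}: master's tip is now part of FeatureX
```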

This becomes even worse the longer you stay branched off from master, as the probability of merge issues, and the time needed to integrate them, increases. In this case it is often best to do a number of reverse integrations at regular intervals (say daily) to keep the distance between your branch and master as small as possible and to avoid an ever-growing integration resolution time, not to mention the increased likelihood of introducing bugs.

This is the path to the merge hell I discussed in Part 1 of Opinionated Git. It leaves what are essentially “turds” in your feature branch history that add to the confusion about exactly what it is you are releasing, and make it harder to find and correct bugs down the track. It becomes even more problematic if you have multiple long-lived feature branches and are actively sharing code across them before committing to the branch you produce releases from. Very quickly, it becomes impossible to determine exactly what you are releasing.

Rebase

Rebase, on the other hand, takes a slightly different approach. Rather than producing a merge commit containing all of the changes on the FeatureX branch, rebase re-parents the commits you have made on top of the new changes in master. In essence, it finds all the commits on your branch since the last common ancestor of the two branches (similar to merge), then replays the changes in each of those commits, one at a time, on top of the target branch's HEAD commit. Each original commit's intent (the modifications, the commit message, the author, etc.) is preserved, but because the parent commit has changed, the SHA-1 of each copy is completely different from the original. More importantly, each new commit has only a single parent. As each commit is replayed, git performs a 3-way merge on any files that were changed both by that commit and by the commits added to master since you branched. N.B. the original commits are not lost (just orphaned), and you can return to them if the rebase goes wrong.
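
A toy Python model of the replay described above, under simplifying assumptions: a commit is just a parent ID, a message and a hypothetical `change` string standing in for its diff, and its ID is a SHA-1 over those fields. It does not apply changes to files or detect conflicts; it only shows why rebased commits get new IDs and a single parent while the originals remain untouched.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # rebased commits always get exactly one parent
    message: str
    change: str            # hypothetical stand-in for the commit's diff

    @property
    def id(self) -> str:
        # Simplification: real git hashes tree, parents, author, committer and message,
        # but the effect is the same: a different parent gives a different ID.
        data = f"{self.parent}\n{self.message}\n{self.change}".encode()
        return hashlib.sha1(data).hexdigest()


def rebase(commits, onto):
    """Replay commits (oldest first) on top of the commit ID `onto`."""
    replayed, parent = [], onto
    for old in commits:
        new = Commit(parent=parent, message=old.message, change=old.change)
        replayed.append(new)
        parent = new.id  # the next commit is replayed on top of this copy
    return replayed


# Hypothetical history: FeatureX branched from b, master has since moved to d.
b = Commit(None, "Initial commit", "add README")
d = Commit(b.id, "Someone else's change", "edit util.py")
x1 = Commit(b.id, "Start FeatureX", "add feature.py")
x2 = Commit(x1.id, "Finish FeatureX", "edit feature.py")

x1_new, x2_new = rebase([x1, x2], onto=d.id)
print(x1_new.message == x1.message, x1_new.id != x1.id)  # True True
print(x2_new.parent == x1_new.id)                        # True
print(x1.parent == b.id)  # True: the original commits are untouched
```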

When you finally decide to merge back into master, rather than the diamond-shaped history that shows the reverse integration followed by a forward integration, you see a fairly simple merge history.

In the extreme example of a long running feature where you need to perform multiple reverse integrations before finally integrating back into master, you are left with something much more readable.

If you really wanted to, given that performing the rebase has put your branch on top of the tip of the master branch, you could even do a fast-forward merge to flatten the history into a single linear history. I personally don’t do this because I like being able to revert an entire feature by simply reverting its merge commit, but the option is there. If practiced properly by the entire team, your master branch becomes a series of merge commits from feature branches that are self-contained, each based on the previous tip of the master branch.
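
Whether a fast-forward is possible comes down to one check: is the current tip of master an ancestor of your branch tip? The sketch below models that with a hypothetical dictionary-based history and a hypothetical `merge_into` helper. In real git, `git merge-base --is-ancestor` performs this check, and `git merge --no-ff` forces a merge commit even when a fast-forward would be possible, which matches the preference for a single revertible commit.

```python
def is_ancestor(dag, maybe_ancestor, commit):
    """True if maybe_ancestor is reachable from commit via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current])
    return False


def merge_into(dag, target_tip, branch_tip, no_ff=False):
    """Return the new tip of the target branch after merging branch_tip into it."""
    if not no_ff and is_ancestor(dag, target_tip, branch_tip):
        return branch_tip  # fast-forward: just move the ref, no new commit
    merge_commit = f"merge({branch_tip})"         # hypothetical name for the new commit
    dag[merge_commit] = [target_tip, branch_tip]  # first parent is the old target tip
    return merge_commit


# Hypothetical history after rebasing FeatureX onto master's tip D.
rebased = {"B": [], "D": ["B"], "X1'": ["D"], "X2'": ["X1'"]}
print(merge_into(dict(rebased), "D", "X2'"))              # X2'
print(merge_into(dict(rebased), "D", "X2'", no_ff=True))  # merge(X2')

# Without the rebase, FeatureX still starts at B, so a merge commit is required.
unrebased = {"B": [], "D": ["B"], "X1": ["B"], "X2": ["X1"]}
print(merge_into(dict(unrebased), "D", "X2"))             # merge(X2)
```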

I get it… this sounds a bit scary. It’s not as easy to understand what’s going on, and it feels like there is potential to really mess up your branch. To be honest, I have messed up my branch in the past… majorly. The thing to remember about git, though, is that if you’re “committed” you will never lose.

It was late one dark night when I was trying to rebase some changes from my branch onto master, and I kept getting very strange issues; code changes I had been working on seemed to be disappearing right before my eyes. I started to panic, and frantically turned to the most powerful tool in the developer's arsenal… Google. It was then that I discovered this thing known as git reflog. That, combined with the power of git reset ... --hard, and before I knew it, it was as if the past 30 minutes had never happened. I figured out what I was doing wrong, and happily reverse integrated my changes. I have also made a habit of using git rebase --abort whenever a rebase reports conflicts I wasn’t expecting, or am not equipped to deal with at that point in time.
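
Why does the reflog save the day? A branch is just a movable pointer, and git records each position it moves to. The toy `Branch` class below is a hypothetical model of that idea, not git's implementation: moving the ref never deletes commits, and `at(n)` mimics the `<ref>@{n}` reflog syntax. A real `git reset --hard` also rewrites the working directory, which this model ignores.

```python
class Branch:
    """Toy model of a branch ref plus its reflog (newest entry first)."""

    def __init__(self, name, tip):
        self.name = name
        self.tip = tip
        self.reflog = [(tip, "created")]

    def move(self, new_tip, reason):
        # The ref moves and the new position is logged; no commit is deleted.
        self.tip = new_tip
        self.reflog.insert(0, (new_tip, reason))

    def at(self, n):
        """Analogue of <name>@{n}: where the ref pointed n moves ago."""
        return self.reflog[n][0]

    def reset_hard(self, commit):
        # Moving the ref back is itself logged, so even the reset can be undone.
        self.move(commit, f"reset to {commit}")


feature = Branch("FeatureX", "X2")
feature.move("X2-broken", "bad rebase (hypothetical)")
print(feature.at(1))  # X2: the pre-rebase tip is still in the log
feature.reset_hard(feature.at(1))
print(feature.tip)    # X2
```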

Conflicts

Many people avoid rebase because they think it will result in more conflicts. For the most part, if a rebase will conflict, then the equivalent merge will also conflict in the same ways. The main difference is that with merge you get all the conflicts in a single operation, whereas with rebase you will generally get smaller conflicts on individual commits. This may be better or worse depending on the changes. The thing to remember is that with rebase you have to integrate every commit, which means you need to understand what you did in each commit. By extension, this means commit messages become very, very important.

Gotchas

There are two gotchas when rebasing, and it is because of them that there are so many dire warnings on the interwebs, as mentioned above.

The first is that if you are using a feature branch to share code between developers and one developer rebases their changes, they rewrite history the others have already based work on, and make life hell for them when they attempt to sync up. This is why Opinionated Git mandates that code sharing be done through a minimal set (ideally one) of remote branches. If you are working on your own small feature branches, this shouldn’t be a problem. My personal habit is to use feature branch names that explicitly call out the fact that I should be the only person actively adding to the branch, e.g. sbaldwin/SomeAwesomeFeature.

The second is explained in detail here: when you perform a reverse integration and find that you need to fix the build due to an integration issue, you end up with one or more commits in your history that break the build. This is a bad thing, and violates the rules of Opinionated Git. However, the solution is not to abandon rebase in favour of merge hell, but rather to use even more rebase power. To resolve this, an Opinionated Git developer will use interactive rebase to fix the issues in the commits where they occur, rather than tacking a rectifying commit onto the tip of the branch. Interactive rebase is insanely powerful, and is a topic for a future part of this series.

Conclusion

When you need to reverse integrate changes from a long-lived remote branch into your current working branch, use rebase instead of merge. In essence, you are being more honest about what you are doing. While you may have started development at an earlier point, the fact that someone else has committed changes into the remote long-lived branch means that their changes have met the quality standards (code review, passing tests, etc.) required by your team charter. You have yet to do that, so it is your responsibility to meet this standard after incorporating their changes from master. Merging from master and then merging back into master will work, but it starts you down the path to git branch hell, and makes any attempt at code archeology confusing. Performing a rebase accurately depicts the historical circumstances under which you were working at the time, and gives a much simpler commit structure. Merging into the long-lived remote branch after this is fine, and has the added advantage of giving you a single commit to revert in the event you need to roll back your change.

*****
Written by Scott Baldwin on 08 July 2018