Cleaning up the Mess

Software development can be a very messy process. You follow a path you think might work, only to realise that the class you are relying on doesn’t expose a method with the right parameters, or you suddenly see an opportunity to refactor a piece of code to make it more generic and get some code reuse out of it. You have to dive into a piece of the code that hasn’t been touched in years, realise that the coding style has changed, and find it looks so weird that you just have to correct it to make sense of it. A complex feature could require a number of changes to core libraries along with corresponding unit/integration/UI tests, not to mention database scripts. If you’re doing TDD, you may have a number of Red/Green/Refactor cycles. At the end of a feature, your code will (hopefully) be elegant and readable, but the route you took to get there is most likely a winding road that is anything but elegant. Don’t get me wrong, regular committing is a great thing that I strongly endorse… commit early, commit often is my motto. The great thing about git is that once you’ve committed something, it’s (almost) never lost.

A favourite feature of many people who use pull requests is the ability to “Squash on Merge”. This takes the 17 commits that you pushed up, including the 5 you needed to address code review feedback, and turns them into a single commit, as though the whole thing happened magically in a single outpouring of beautiful code. This means that anyone doing code archaeology is shielded from the meanderings of your thought process as you were coding, and from the irrelevant back-and-forth of code review. It also means that they may have a large surface area to search through to understand that particular commit. It may well include some refactoring, some file renames, and even some bug fixes in amongst the feature development you actually intended to do.

So it feels like our alternatives are the meandering, difficult-to-follow thought pattern of an over-caffeinated software engineer, or the one giant commit to rule them all: a fully implemented feature (or three) that contains multiple context switches, many different files, and potentially hundreds of lines of code changes. Take heart, there is a middle ground.

Introducing Interactive Rebase

In the previous post I talked about the git rebase command. Interactive rebase takes it one step further, and is probably my favourite git feature. While git rebase lets you rewrite history by re-parenting your series of commits onto a different base, interactive rebase lets you rewrite history in a myriad of different ways. Using interactive rebase you can:

reorder commits
reword a commit message
squash 2 or more commits into 1
drop commits
execute a command on 1 or more commits as they are replayed
even edit existing commits

When you perform an interactive rebase using a command like

git rebase master -i

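You are then presented with a “todo” list of the commits that are about to be replayed, one per line, oldest first. Each line starts with a command (pick by default), followed by the commit’s abbreviated SHA-1 and the first line of its commit message, and the list is followed by commented-out help text describing the available commands.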

You then use your configured text editor (vi or vim by default on most systems) to write the history your code should have had from the very beginning. Reorder and squash commits that relate to the same thing, and remove experiments that didn’t really work and shouldn’t be in the code base. Split apart commits that mix multiple discrete thoughts, and fix individual commits that break the build. Details on how to do all these things can be found in the git documentation, which is well worth a read.
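The todo list is just a text file that git hands to your editor, so the edits described above can also be scripted. The sketch below is illustrative only: it builds a throwaway repository (every file name and commit message is made up) and uses git’s GIT_SEQUENCE_EDITOR environment variable to run sed instead of an editor, turning the pick of an abandoned experiment into a drop. It assumes a POSIX shell with git and sed available.

```shell
#!/bin/sh
# Sketch: drop an abandoned experiment during an interactive rebase, letting
# sed (via GIT_SEQUENCE_EDITOR) make the todo-list edit instead of an editor.
# The repository, files and commit messages are hypothetical demo data.
set -e

# Throwaway repository: one commit on master, three on a feature branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "widget" > widget.txt
git add widget.txt
git commit -q -m "Add widget"

echo "cache" > cache.txt
git add cache.txt
git commit -q -m "Experiment: cache widgets"

echo "widget test" > widget_test.txt
git add widget_test.txt
git commit -q -m "Add widget tests"

# git writes the todo list (oldest commit first, one "pick" per line) and runs
# the sequence editor on it; sed changes the experiment's "pick" to "drop".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/Experiment:/s/^pick/drop/'" \
  git rebase -q -i master

# Remaining feature commits, newest first.
git log --format=%s master..HEAD
```

Interactively, you would just run git rebase -i master and change pick to drop (or delete the line) yourself; scripting it is mainly handy for demonstrating or repeating an edit.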

I won’t lie to you: doing this does take some practice, but here are a few tips to make it as easy as possible for you.

Write good commit messages

When you are performing an interactive rebase, all you have to identify each commit is its abbreviated SHA-1 and the first line of its commit message. Clearly the SHA-1 conveys nothing about the content of the commit, so all you really have to go on is your commit message. How to Write a Git Commit Message is the definitive guide on the subject; however, given that you can reword your commit messages during a rebase, it is sometimes helpful to include additional information in the first draft of your commit messages to aid you in this process. For example

pick d41ab79 Started adding the widget factory
pick a483e44 Continued adding the widget factory
pick 600180d Finished adding the widget factory

This would most likely be rebased like this

pick d41ab79 Started adding the widget factory
squash a483e44 Continued adding the widget factory
squash 600180d Finished adding the widget factory

and, when git prompts you for the message of the combined commit during the interactive rebase, you can replace the drafts with the sanctioned “Add widget factory”.
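A sketch of that squash, scripted so it can run unattended, again on a throwaway repository with hypothetical content. GIT_SEQUENCE_EDITOR applies the todo-list edit above (every pick after the first becomes squash), and GIT_EDITOR points at a tiny hypothetical stand-in script, playing the part of you typing, that writes the final message when git asks for the combined commit’s message.

```shell
#!/bin/sh
# Sketch: squash three work-in-progress commits into one and give the result
# its final message. All repository content here is hypothetical.
set -e

work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "factory: started" > widget_factory.txt
git add widget_factory.txt
git commit -q -m "Started adding the widget factory"
echo "factory: continued" >> widget_factory.txt
git commit -q -am "Continued adding the widget factory"
echo "factory: finished" >> widget_factory.txt
git commit -q -am "Finished adding the widget factory"

# Hypothetical stand-in for you editing the combined commit message:
# overwrite the message file git passes in as $1.
cat > "$work/final-message.sh" <<'EOF'
printf 'Add widget factory\n' > "$1"
EOF

# Todo edit: every line after the first that starts with "pick" becomes
# "squash" (the help text below the commits starts with "#", so it is untouched).
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" \
GIT_EDITOR="sh $work/final-message.sh" \
  git rebase -q -i master

git log --format=%s master..HEAD
```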

Commit before you change direction

Reordering and squashing commits together is far easier than pulling them apart. If you are in the middle of something and you realise that you need to refactor a method in another class so you can use it, make a commit before you start the refactoring and another one when you complete it, then finish the work you were doing before you decided to refactor. When it comes time to clean up, this is simply a matter of changing this

pick d41ab79 Started adding the widget factory
pick dfead8e Refactor repository to take a collection of widgets
pick 600180d Finished adding the widget factory

into this

pick dfead8e Refactor repository to take a collection of widgets
pick d41ab79 Started adding the widget factory
squash 600180d Finished adding the widget factory
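Reordering is scripted the same way. This hypothetical sketch recreates the three commits above, then uses a small awk script as the sequence editor to move the refactoring to the top and squash the last widget factory commit into the first. It assumes the commits touch different files, so the reordered replay applies cleanly; real reorderings can hit conflicts. GIT_EDITOR=true simply accepts git’s suggested message for the squashed commit, which is where you would normally tidy it up.

```shell
#!/bin/sh
# Sketch: reorder a refactoring commit ahead of the feature work and squash
# the feature commits together. All repository content here is hypothetical.
set -e

work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "repository: single widget" > repository.txt
git add repository.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "factory: started" > widget_factory.txt
git add widget_factory.txt
git commit -q -m "Started adding the widget factory"

echo "repository: collection of widgets" > repository.txt
git commit -q -am "Refactor repository to take a collection of widgets"

echo "factory: finished" >> widget_factory.txt
git commit -q -am "Finished adding the widget factory"

# Stand-in for the hand edit: print todo line 2 (the refactoring) before
# line 1, and change line 3's "pick" to "squash". Other lines pass through.
cat > "$work/reorder.sh" <<'EOF'
awk 'NR == 1 { first = $0; next }
     NR == 2 { print; print first; next }
     NR == 3 { sub(/^pick/, "squash"); print; next }
     { print }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/reorder.sh" GIT_EDITOR=true \
  git rebase -q -i master

git log --format=%s master..HEAD
```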

Don’t wait until the very end before beginning the tidy-up process

If you start to see your commit structure growing, it doesn’t hurt to do an initial tidy-up pass. This will actually make you start thinking about structure, and may point the way forward. At the very least, it will give you a mental break if you really need one, and won’t context switch you as much as surfing Stack Overflow to see if there are any easy questions you can answer to get your points up.

Don’t try to do everything in a single interactive rebase

It’s OK to have multiple bites at the cherry. I really like to keep things simple and give myself the least possible chance of screwing up. As such, I rarely attempt more than one set of changes per interactive rebase session. This means that I may have to perform a number of interactive rebases to get the commit structure into the state I want, but I’ve found it has a far higher chance of success.

Test the build on every commit

If your project has a single build script (e.g. build.sh or build.ps1) for local building and unit test execution, then you can use interactive rebase to execute this script on every single commit in your commit structure. For example

git rebase -i master --exec "powershell -NoProfile -ExecutionPolicy Bypass -File build.ps1"
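To see what happens when a commit breaks the build, here is a sketch with a hypothetical build.sh that “fails” whenever a .txt file contains the word BROKEN. --exec adds an exec line after every pick, GIT_SEQUENCE_EDITOR=true accepts that todo list unchanged, and the rebase stops at the first commit whose build fails. At that point you can fix the commit and run git rebase --continue, or back out with git rebase --abort, as this sketch does.

```shell
#!/bin/sh
# Sketch: run a (hypothetical) build script after every replayed commit with
# --exec, watch the rebase stop on a commit that breaks it, then abort.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Hypothetical stand-in for build.sh: "fails" if any .txt file says BROKEN.
cat > build.sh <<'EOF'
#!/bin/sh
if grep -q BROKEN ./*.txt; then
  echo "build failed" >&2
  exit 1
fi
echo "build ok"
EOF
echo "base" > README.txt
git add build.sh README.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "widget" > widget.txt
git add widget.txt
git commit -q -m "Add widget"

echo "BROKEN gadget" > gadget.txt
git add gadget.txt
git commit -q -m "Start gadget"

echo "gadget" > gadget.txt
git commit -q -am "Finish gadget"

before=$(git rev-parse HEAD)

# The todo list gets "exec sh build.sh" after each pick; the rebase exits
# with a non-zero status as soon as one of those commands fails.
if GIT_SEQUENCE_EDITOR=true git rebase -q -i master --exec "sh build.sh"; then
  status=0
else
  status=$?
fi
echo "rebase exit status: $status"

# Stopped on "Start gadget": give up and return the branch to where it was.
git rebase --abort
```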

If your build script takes a long time to run, you could create another clone of your repository and run this in the background while you continue to work.
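One way the background approach could look, as a sketch only: the repository, build.sh and branch names are hypothetical. The second clone is checked out on the feature branch, and because master only exists there as a remote-tracking branch, the rebase in the clone targets origin/master. The build output goes to a log file, and wait collects the result when you are ready.

```shell
#!/bin/sh
# Sketch: run the per-commit build in a second clone, in the background, so
# the main working copy stays free. Everything here is hypothetical.
set -e

work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Hypothetical quick "build": passes as long as every .txt file is non-empty.
cat > build.sh <<'EOF'
#!/bin/sh
for f in ./*.txt; do
  [ -s "$f" ] || { echo "empty file: $f" >&2; exit 1; }
done
echo "build ok"
EOF
echo "base" > README.txt
git add build.sh README.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "widget" > widget.txt
git add widget.txt
git commit -q -m "Add widget"
echo "gadget" > gadget.txt
git add gadget.txt
git commit -q -m "Add gadget"

# Second clone of the same repository, checked out on the feature branch.
git clone -q --branch feature "$repo" "$work/build-clone"
git -C "$work/build-clone" config user.name "Demo User"
git -C "$work/build-clone" config user.email "demo@example.com"

# Replay every feature commit in the clone, building after each one.
(
  cd "$work/build-clone"
  GIT_SEQUENCE_EDITOR=true git rebase -i origin/master --exec "sh build.sh"
) > "$work/build.log" 2>&1 &
build_pid=$!

# ...carry on working in "$repo" here...

if wait "$build_pid"; then build_status=0; else build_status=$?; fi
echo "background build exit status: $build_status"
```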

Gotchas

Interactive rebase comes with all the same gotchas as rebase does.

You may well get merge conflicts. If you aren’t expecting them, you can abort the mission with git rebase --abort, then reassess, regroup, and try again.
You will have to force push any branches you have already pushed to your remote.
You will completely break anyone who is sharing a remote branch that you rebase. This is why Opinionated Git advocates that code sharing be done only from a minimal number (preferably one) of remote branches (e.g. origin/master or origin/develop). For the same reason, I always prefix my branch names with sbaldwin/... so that people know it’s my branch, and that I’ll probably be rebasing it.
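A sketch of that force push against a throwaway local bare repository standing in for your remote (all names, including the sbaldwin/ prefixed branch, are hypothetical). It uses fixup, which works like squash but discards the folded commit’s message, and git push --force-with-lease, which refuses to overwrite the remote branch if it no longer points where your last fetch or push saw it. That limits, but does not remove, the damage to anyone sharing the branch.

```shell
#!/bin/sh
# Sketch: rebase a branch that has already been pushed, show that a plain
# push is rejected, then force push with --force-with-lease.
# The "remote", branch name and commits are hypothetical.
set -e

work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false
git remote add origin "$work/origin.git"

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git push -q origin master

branch=sbaldwin/widget-factory
git checkout -q -b "$branch"
echo "factory: started" > widget_factory.txt
git add widget_factory.txt
git commit -q -m "Started adding the widget factory"
echo "factory: finished" >> widget_factory.txt
git commit -q -am "Finished adding the widget factory"
git push -q -u origin "$branch"

# Rewrite the pushed history: fold the second commit into the first.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i master

# A plain push is now a non-fast-forward and gets rejected.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# Overwrites the remote branch only if it still matches origin/$branch.
git push -q --force-with-lease origin "$branch"
```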

Conclusion

Interactive rebase allows you to structure your commits in a way that hides the messiness of the creative act of software development, and lets you create a commit structure that is as elegant as the code you write.


Opinionated Git - Part 3 - Cleaning up the Mess: The path to git history enlightenment