Git: When is a tree not a tree?

Git repositories are usually described as trees with branches. That is true, but thinking of a repository as a graph of connected commits (nodes) is slightly more correct. Every node except the very first has at least one parent, and might have children. A node can be named with a tag or branch.

Sounds complicated, but let me explain.

Target Audience

This post is aimed at beginner to intermediate Git users. You should know how to make commits and check out a branch, but maybe you’re not entirely sure what a branch or tag is.

You’ll be able to live your whole Git life just fine without reading this post. But if you want to have a different way of thinking about how your repository looks, and see why you can sometimes fast forward, and what’s really happening during a merge, read on.

Branches In A Line?

To illustrate these concepts, I have a simple Git repository with three commits. Its “tree” looks like this:

[Figure: First Three Commits]

OK, that must be a bamboo tree; it’s pretty straight. Let’s see if we can create a tree by adding a branch (called other) with two commits.

[Figure: New Branch (other)]

It’s still straight even though it has a branch!

Thinking of master and other as separate branches is correct, but at this stage you can also think of them as just different nodes in the same line of commits (because that’s what they are).
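If you want to reproduce this straight line yourself, here is a minimal sketch. The temporary directory, the commit messages, the file contents and the "Example" identity are all assumptions made for illustration; only the branch names and first.txt come from this post.

```shell
set -e

# Work in a throwaway repository (the location is an assumption of this sketch).
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Three commits on master.
for n in 1 2 3; do
  echo "line $n" >> first.txt
  git add first.txt
  git commit -q -m "Commit $n on master"
done

# Two more commits on a new branch called other.
git checkout -q -b other
for n in 4 5; do
  echo "line $n" >> first.txt
  git add first.txt
  git commit -q -m "Commit $n on other"
done

# Draw the graph: it should be one straight line with two labels on it.
git log --oneline --graph --decorate --all

# master is simply an earlier node on the line that other sits at the end of.
if git merge-base --is-ancestor master other; then
  straight_line=yes
else
  straight_line=no
fi
echo "master is an ancestor of other: $straight_line"
```

The `git merge-base --is-ancestor` check is just one way of asking "are these two names on the same line?"; a GUI graph shows the same thing visually.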

Merging Straight Line Branches

Merging is when you incorporate the commits from one branch into another. In this case, we are merging the commits from other into master, with the git merge command:

$ git merge --ff-only other
Updating 6c41182..ea5a223
Fast-forward
 first.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Git will normally fast-forward a merge when possible. The --ff-only flag makes Git refuse to merge at all unless the merge can be a fast forward.
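To see what --ff-only protects you from, here is a small sketch of the opposite case, where the two branches have each gained a commit of their own. The repository location, file contents, commit messages and identity are assumptions for illustration.

```shell
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > first.txt
git add first.txt
git commit -q -m "First commit"

# other gets a commit of its own...
git checkout -q -b other
echo "two" > second.txt
git add second.txt
git commit -q -m "Work on other"

# ...and so does master, so the two branches are no longer in one line.
git checkout -q master
echo "three" >> first.txt
git commit -q -am "Branching out on master"

before=$(git rev-parse master)
# With --ff-only, Git should refuse rather than create a merge commit.
if git merge --ff-only other 2>/dev/null; then
  ff_result=merged
else
  ff_result=refused
fi
after=$(git rev-parse master)

echo "fast-forward-only merge was $ff_result"
```

Because the merge is refused, master should still point at the same commit afterwards, which is what comparing `before` and `after` checks.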

How does it look now?

[Figure: After the straight line merge]

You can see that master and other are both pointers to the same node in the Git tree.

So did any merging really take place? Well, yes. Sort of. We could also think of this type of merge as time travelling around. The reference master has now zoomed forward in time to be the same as other.
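You can check the "same node" claim without a GUI by asking Git which commit each name resolves to. This is a sketch under assumed names: the temporary repository, commit messages and identity are made up for the example.

```shell
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > first.txt
git add first.txt
git commit -q -m "First commit"

# other moves ahead of master in a straight line.
git checkout -q -b other
echo "two" >> first.txt
git commit -q -am "Work on other"

git checkout -q master
commits_before=$(git rev-list --count --all)
git merge -q --ff-only other
commits_after=$(git rev-list --count --all)

master_hash=$(git rev-parse master)
other_hash=$(git rev-parse other)
echo "master -> $master_hash"
echo "other  -> $other_hash"
```

Both names should print the same hash, and the number of commits in the repository should be unchanged: the fast forward only moved the master pointer.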

Merging Divergent Branches - Standard Merge

What about merging if the branches really have diverged, i.e. they look like this:

[Figure: Diverged branches]

There are two options here. First, you can just merge the other branch into master, like this:

$ git merge -m "Merging in other" other
Merge made by the 'recursive' strategy.
 second.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 second.txt
$

(The -m flag is optional; without it you will be prompted for a commit message.)

You’ll end up with this familiar-looking sight: the tree branches merging together again at the merge commit.

[Figure: After standard merge]
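What makes the merge commit special is that it has two parents, one from each branch. Here is a sketch that rebuilds a diverged history and counts the parents of the resulting commit; the repository location, file contents, most commit messages and the identity are assumptions for illustration.

```shell
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > first.txt
git add first.txt
git commit -q -m "First commit"

# Diverge: one commit on other, one on master, touching different files.
git checkout -q -b other
echo "two" > second.txt
git add second.txt
git commit -q -m "Work on other"

git checkout -q master
echo "three" >> first.txt
git commit -q -am "Branching out on master"

# A standard merge creates a new commit on master.
git merge -q -m "Merging in other" other

# rev-list --parents prints the commit hash followed by its parent hashes.
set -- $(git rev-list --parents -n 1 HEAD)
parent_count=$(( $# - 1 ))
echo "the merge commit has $parent_count parents"

git log --oneline --graph --decorate --all
```

The graph printed at the end should show the two lines joining at the merge commit, much like the picture above.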

Merging Divergent Branches - Rebase

The other option is to rebase first, then merge.

Rebasing is when you take the commits from some node in the tree and move them so that they are attached to a different parent (i.e. the base of those commits changes). Strictly speaking, Git re-creates the commits on top of the new base, so they get new hashes. It makes more sense if you see a before and after. Let’s remind ourselves what it looked like before the merge.

[Figure: Diverged branches (again)]

Now, rebase like this. We’re on the branch master and we want to rebase onto other:

$ git rebase other
First, rewinding head to replay your work on top of it...
Applying: Branching out on master
$

Now the tree looks like this:

[Figure: After rebase]
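The change of base can also be checked from the command line: after the rebase, the parent of the commit on master should be the tip of other. This is a sketch with an assumed throwaway repository, file contents and identity; only the branch names and the "Branching out on master" message come from the output above.

```shell
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > first.txt
git add first.txt
git commit -q -m "First commit"

git checkout -q -b other
echo "two" > second.txt
git add second.txt
git commit -q -m "Work on other"

git checkout -q master
echo "three" >> first.txt
git commit -q -am "Branching out on master"

tip_before=$(git rev-parse master)

# Replay master's own commit on top of other.
git rebase -q other

tip_after=$(git rev-parse master)
new_parent=$(git rev-parse 'master^')
other_tip=$(git rev-parse other)

echo "master before rebase: $tip_before"
echo "master after rebase:  $tip_after"
echo "parent of master:     $new_parent"
echo "tip of other:         $other_tip"
```

The hash of master changes because the replayed commit has a different parent, which is why rebasing commits that other people already have needs some care.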

So now we want to merge other and master together. In this case, since we have that straight line thing going on again, we can fast forward merge, a.k.a. do the “time travel zoomy” thing.

$ git checkout other
Switched to branch 'other'
$ git merge --ff-only master
Updating 8762b50..85fdd08
Fast-forward
 first.txt | 4 ++++
 1 file changed, 4 insertions(+)
$

And once again other and master are the same thing.

[Figure: Fast forward after rebase]

So is it better to do a standard merge, or rebase and merge? Each method of merging has its own advantages and disadvantages.

Standard Merge

Using this method you will end up with an extra commit in your tree, which may not add any extra information. If you are merging in a branch and there are no conflicts, you’re adding an unnecessary node to the tree.

It does, however, retain a record of that merge, which shows that changes came from different branches and can be useful when reconciling history up to that point.

Rebase and Fast Forward

This takes a few more steps, but it is very useful when you are working with other developers and want to integrate their changes incrementally. For example, if you have a long-running feature branch, you can find it useful to rebase it onto master first thing every morning.

You will incrementally receive their commits daily, which reduces the friction to integrate back into master when your feature branch is complete. If you tried to merge changes from master every day with a normal merge, you’d end up with a ton of merge commits.

The disadvantage is that if there are conflicts during a rebase that you fix, there won’t be a history of these fixes. However, if you rebase often, the conflicts tend to be smaller and easier to resolve.

Also there is no history of merges, since there are no merge commits.
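The morning routine described above can be simulated locally. In this sketch the branch name feature, the two simulated "days", the file names, commit messages and identity are all hypothetical; in a real project master would usually be updated by fetching from a shared remote first.

```shell
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "base" > first.txt
git add first.txt
git commit -q -m "Base commit"

# Start a long-running feature branch (hypothetical name).
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# Each "day", master gains a teammate's commit and feature is rebased onto it.
for day in 1 2; do
  git checkout -q master
  echo "day $day" >> first.txt
  git commit -q -am "Teammate change, day $day"

  git checkout -q feature
  git rebase -q master
done

# Count merge commits reachable from feature.
merge_commits=$(git rev-list --merges --count feature)
echo "merge commits on feature: $merge_commits"

# feature is still in a straight line ahead of master, so this fast-forwards.
git checkout -q master
git merge -q --ff-only feature
```

The count should be zero, which is both the appeal of this method (a straight history) and its cost (no record that any integration happened).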

Whichever way you prefer is up to you and your team to decide. My personal preference is the rebase and fast forward method, because when you’re getting the changes from master daily with a rebase, it also makes sense to bring your feature branch back into master with a clean fast forward merge.

You also get a beautiful straight commit tree.

Tags

In the opening paragraph, I wrote:

A node can be named with a tag or branch

The difference between a tag and a branch is that the tag always points to the same node, while the branch changes which node it points to. This is really obvious, but I thought I would just point it out. You see it happen whenever you commit on a branch: master (or whichever branch you committed on) now refers to the latest commit.

To illustrate quickly, I’ll create a tag called mytag on other.

[Figure: New tag (mytag)]

Here, mytag and other point to the same commit.

Now, I’ll fast-forward other to master (that is, with other checked out, I merge master into it).

[Figure: Merge after tag]

mytag still refers to the same commit, but other has moved on.
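Here is the same experiment as a sketch you can run. It assumes a lightweight tag and a throwaway repository; the file contents, commit messages and identity are made up, while mytag, other and master are the names used above.

```shell
set -e

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > first.txt
git add first.txt
git commit -q -m "First commit"

# other and mytag both name the first commit.
git branch other
git tag mytag other

# master moves ahead by one commit.
echo "two" >> first.txt
git commit -q -am "Second commit on master"

tag_before=$(git rev-parse mytag)
other_before=$(git rev-parse other)

# Fast-forward other up to master.
git checkout -q other
git merge -q --ff-only master

tag_after=$(git rev-parse mytag)
other_after=$(git rev-parse other)

echo "mytag: $tag_before -> $tag_after"
echo "other: $other_before -> $other_after"
```

The tag should print the same hash twice, while other should move to the commit that master points at.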

Conclusion

Manipulating the Git tree can seem daunting, but with the help of a Git GUI tool (here I’m using GitX; most IDEs will provide one too these days) to show the commit graph, it can be a lot simpler. Likewise, thinking of branches as separate is sometimes not quite right, for example when two branches are in the same line or even point at the same node.

By understanding which branch points to which node, and knowing when merging and rebasing are necessary (or not), Git becomes a lot easier to understand and your repositories can become a lot cleaner.
