Git repositories are often described as trees with branches. That is true, but thinking of them as graphs of connected commits is slightly more correct. Each node has at least one parent (except the very first commit, which has none) and might have children. A node can be named with a tag or branch.
Sounds complicated, but let me explain.
Target Audience
This post is aimed at beginner to intermediate Git users. You should know how to make commits and check out a branch, but maybe you’re not entirely sure what a branch or tag is.
You’ll be able to live your whole Git life just fine without reading this post. But if you want to have a different way of thinking about how your repository looks, and see why you can sometimes fast forward, and what’s really happening during a merge, read on.
Branches In A Line?
To illustrate these concepts, I have a simple Git repository with three commits. Its “tree” looks like this:
OK, that must be a bamboo tree, it’s pretty straight. Let’s see if we can create a tree by adding a branch (called other) with two commits.
It’s still straight even though it has a branch!
Thinking of master and other as separate branches is correct, but at this stage you can also think of them as just different nodes in the same line of commits (because that’s what they are).
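To see the straight line without a GUI, here is a minimal sketch that rebuilds a similar repository in a temporary directory. The file name first.txt comes from the output later in this post; the commit messages, the placeholder identity and the commit helper are assumptions made up for illustration.

```shell
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # placeholder identity, local to this repository
git config user.email "user@example.com"

# Hypothetical helper: append a line to first.txt and commit it.
commit() {
    echo "$1" >> first.txt
    git add first.txt
    git commit -q -m "$1"
}

commit "one"
commit "two"
commit "three"              # master now has three commits

git checkout -q -b other    # create other at the same node as master
commit "four"
commit "five"               # other is now two commits further along

# Draw the graph: a single straight line with two labels on it.
git --no-pager log --oneline --graph --decorate --all

# If master is an ancestor of other, both sit on one line of commits.
if git merge-base --is-ancestor master other; then
    same_line=yes
else
    same_line=no
fi
ahead=$(git rev-list --count master..other)
echo "same line: $same_line, other is $ahead commits ahead of master"
```

The ancestor check is the command-line equivalent of looking at the picture: no commit exists on master that is not also part of other's history.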
Merging Straight Line Branches
Merging is when you incorporate the commits from one branch into another. In this case, we are merging the commits from other into master, with the git merge command:
$ git merge --ff-only other
Updating 6c41182..ea5a223
Fast-forward
 first.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Git will normally try to fast-forward merge if possible. The --ff-only flag ensures a merge will not happen unless it is a fast forward.
How does it look now?
You can see master and other are both pointers to the same node in the Git tree.
So did any merging really take place? Well, yes. Sort of. We could also think of this type of merge as time travelling around. The reference master has now zoomed forward in time to be the same as other.
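The "time travel" can be checked from the command line. This is a sketch under the same assumptions as the rest of the post's setup (a throwaway repository, made-up commit messages, a hypothetical commit helper): it fast-forwards master and confirms that only the pointer moved and no new commit was created.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: append a line to first.txt and commit it.
commit() {
    echo "$1" >> first.txt
    git add first.txt
    git commit -q -m "$1"
}

commit "one"; commit "two"; commit "three"
git checkout -q -b other
commit "four"; commit "five"

git checkout -q master
master_before=$(git rev-parse master)

# Fast-forward: master simply moves along the line to where other is.
git merge -q --ff-only other

master_after=$(git rev-parse master)
other_tip=$(git rev-parse other)
merge_commits=$(git rev-list --merges --count master)

echo "master before: $master_before"
echo "master after:  $master_after"
echo "other:         $other_tip"
echo "merge commits in history: $merge_commits"
```

After the merge, master and other resolve to the same hash, and the count of merge commits stays at zero because nothing had to be combined.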
Merging Divergent Branches - Standard Merge
What about merging if the branches really have diverged, i.e. they look like this:
There are two options here. First, you can just merge the other branch into master, like this:
$ git merge -m "Merging in other" other
Merge made by the 'recursive' strategy.
 second.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 second.txt
$
(The -m flag is optional; without it you will be prompted for a commit message.)
You’ll end up with this familiar-looking sight: the tree branches merging together again at the merge commit.
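What makes a merge commit special is that it has two parents. The sketch below builds two diverged branches in a throwaway repository, shows that --ff-only refuses to merge them, and then inspects the parents of the resulting merge commit. The file names and the two quoted messages are borrowed from the output in this post; the other commit messages and the commit helper are assumptions for illustration.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: commit FILE MESSAGE appends a line to FILE and commits it.
commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit first.txt "base"
git checkout -q -b other
commit second.txt "Work on other"             # other moves one way...
git checkout -q master
commit first.txt "Branching out on master"    # ...and master moves another

master_before=$(git rev-parse master)

# A fast forward is impossible now, so --ff-only should refuse and change nothing.
if git merge --ff-only other >/dev/null 2>&1; then
    ff_result=merged
else
    ff_result=refused
fi

# A standard merge creates a new commit that joins the two lines.
git merge -q -m "Merging in other" other

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
first_parent=$(git rev-parse HEAD^1)     # where master was
second_parent=$(git rev-parse HEAD^2)    # where other is

echo "--ff-only: $ff_result, merge commit has $parent_count parents"
git --no-pager log --oneline --graph --decorate --all
```

The first parent is the old tip of master and the second is the tip of other, which is exactly the two lines coming back together in the graph.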
Merging Divergent Branches - Rebase
The other option is to rebase first, then merge.
Rebasing is when you take the commits from some node in the tree and move them so that the node is attached to a different parent (i.e. the base of the first node changes). Strictly speaking, Git replays the commits as new ones on top of the new parent, so their hashes change. It makes more sense if you see a before and after. Let’s remind ourselves what it looked like before the merge.
Now, rebase like this. We’re on the branch master and we want to rebase onto other:
$ git rebase other
First, rewinding head to replay your work on top of it...
Applying: Branching out on master
$
Now the tree looks like this:
So now we want to merge other and master together. In this case, since we have that straight line thing going on again, we can fast forward merge, a.k.a. do the “time travel zoomy” thing.
$ git checkout other
Switched to branch 'other'
$ git merge --ff-only master
Updating 8762b50..85fdd08
Fast-forward
 first.txt | 4 ++++
 1 file changed, 4 insertions(+)
$
And once again other and master are the same thing.
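The whole rebase-then-fast-forward sequence can be replayed in a throwaway repository. This is a sketch, not the exact repository from the screenshots: the commit helper and the "base" and "Work on other" messages are assumptions, while first.txt, second.txt and "Branching out on master" are taken from the output above.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: commit FILE MESSAGE appends a line to FILE and commits it.
commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit first.txt "base"
git checkout -q -b other
commit second.txt "Work on other"
git checkout -q master
commit first.txt "Branching out on master"    # the branches have now diverged

old_master=$(git rev-parse master)
other_before=$(git rev-parse other)

# Replay master's own commit on top of other.
git rebase -q other

new_master=$(git rev-parse master)     # a new commit with a new hash
new_parent=$(git rev-parse master^)    # its parent is now the tip of other

# The line is straight again, so other can fast-forward to master.
git checkout -q other
git merge -q --ff-only master

other_tip=$(git rev-parse other)
merge_commits=$(git rev-list --merges --count master)

echo "master before rebase: $old_master"
echo "master after rebase:  $new_master"
echo "merge commits in history: $merge_commits"
```

The changed hash of master is the visible sign that the commit was recreated rather than moved, and the history ends up without any merge commit.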
So is it better to do a standard merge, or rebase and merge? Each method of merging has its own advantages and disadvantages.
Standard Merge
Using this method you will end up with an extra commit in your tree, which may not add any extra information. If you are merging in a branch and there are no conflicts, you’re adding an unnecessary node to the tree.
It does however retain the log of that merge, which is an indication that things have come from different branches and can be useful in reconciling history up to that point.
Rebase and Fast Forward
This takes a few more steps, but it is very useful when you work with other developers and want to integrate their changes incrementally. For example, if you have a long-running feature branch, you may find it useful to rebase it onto master first thing every morning.
You will incrementally receive their commits daily, which reduces the friction to integrate back into master when your feature branch is complete. If you tried to merge changes from master every day with a normal merge, you’d end up with a ton of merge commits.
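To make the difference concrete, here is a small simulation of three "days" of work. Everything in it is hypothetical: the branch names feature-rebase and feature-merge, the file names, and the commit helper exist only for this sketch. One feature branch picks up master with a daily rebase, the other with a daily merge.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: commit FILE MESSAGE appends a line to FILE and commits it.
commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit app.txt "initial"
git branch feature-rebase
git branch feature-merge

for day in 1 2 3; do
    # Work happens on both feature branches (separate files, so no conflicts).
    git checkout -q feature-rebase
    commit feature_a.txt "feature work, day $day"
    git checkout -q feature-merge
    commit feature_b.txt "feature work, day $day"

    # Meanwhile a teammate lands a change on master.
    git checkout -q master
    commit app.txt "master work, day $day"

    # Next morning, option A: rebase the feature branch onto master.
    git checkout -q feature-rebase
    git rebase -q master

    # Next morning, option B: merge master into the feature branch.
    git checkout -q feature-merge
    git merge -q --no-edit master
done

rebase_merges=$(git rev-list --merges --count master..feature-rebase)
merge_merges=$(git rev-list --merges --count master..feature-merge)
echo "merge commits with daily rebase: $rebase_merges"
echo "merge commits with daily merge:  $merge_merges"
```

In this simulation the merged branch collects one merge commit per day, while the rebased branch stays a straight line on top of master.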
The disadvantage is that if there are conflicts during rebase that you fix, there won’t be a history of these fixes. However, if you rebase often, it will reduce the difficulty of these conflict resolutions.
Also there is no history of merges, since there are no merge commits.
Whichever way you prefer is up to you and your team to decide. My personal preference is the rebase and fast forward method, because when you’re getting the changes from master daily with a rebase, it also makes sense to bring your feature branch back into master with a clean fast forward merge.
You also get a beautiful straight commit tree.
Tags
In the opening paragraph, I wrote:
A node can be named with a tag or branch
The difference between a tag and a branch is that a tag keeps pointing to the same node, while a branch changes which node it points to. This is really obvious, but I thought I would just point it out. You see it happen whenever you commit on a branch: master (or wherever you committed) now refers to the latest commit.
To illustrate quickly, I’ll create a tag called mytag on other.
Here, mytag and other point to the same commit.
Now, I’ll fast-forward merge other to master.
mytag still refers to the same commit, but other has moved on.
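The same experiment as a command-line sketch. It assumes that master has gained one extra commit after the tag was created, so that other has somewhere to fast-forward to; the commit messages and the commit helper are made up for illustration.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: append a line to first.txt and commit it.
commit() {
    echo "$1" >> first.txt
    git add first.txt
    git commit -q -m "$1"
}

commit "one"
commit "two"

git checkout -q -b other    # other starts at the same node as master
git tag mytag               # lightweight tag on the commit other points to
tag_before=$(git rev-parse mytag)

git checkout -q master
commit "three"              # assumption: master moves ahead by one commit

git checkout -q other
git merge -q --ff-only master    # other zooms forward to master

tag_after=$(git rev-parse mytag)
other_tip=$(git rev-parse other)

echo "mytag before: $tag_before"
echo "mytag after:  $tag_after"
echo "other now:    $other_tip"
git --no-pager log --oneline --graph --decorate --all
```

The tag resolves to the same hash before and after, while other now resolves to the newer commit that master points to.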
Conclusion
Manipulating the Git tree can seem daunting, but with the help of a Git GUI tool to show the commit graph (here I’m using GitX; most IDEs provide one too these days), it can be a lot simpler. Likewise, thinking of two branches as separate is sometimes misleading, since they may sit on the same line of commits or even on the same node.
Once you understand which branch points to which node, and know when merging and rebasing are necessary (or not), Git becomes a lot easier to understand and your repositories can become a lot cleaner.