Git Pull Unable To Fast-Forward

git is a tool that’s likely a key part of any developer’s toolbox in 2024, but it’s also a very opaque tool. Many developers, myself included, know the basic commands they need to run for making commits, rebasing them, stashing changes, managing branches, etc. However, I know that I often have a poor grasp of how things actually work under the hood or why git may behave in a particular way. I’ve actually been meaning to read Pro Git recently, but in lieu of that I just try to take advantage of situations that force me to understand git a little bit better.

I ran into one such situation last week. At work we have an Infrastructure as Code repo containing code to manage our environment. Because it’s Puppet-based, that code doesn’t run directly from GitHub where it’s hosted, but from a server we run ourselves. As a result, the server executing the code needs to periodically run a git pull origin main to get the latest good stuff when something makes it into the main branch.

Our workflow is probably familiar to a lot of people. We have a repo in our company’s GitHub organization that is the source of truth. The people who need to work on that repo have their own forks of it where changes are made, and then pull requests are opened back to the source of truth. No changes are ever made directly on the server that actually runs Puppet; it just periodically runs a pull to keep up-to-date. As a result of this workflow, I was confused when I recently checked the git status on the server and saw that it was 40+ commits ahead of the main branch. How could that be if no one is ever making commits on the server itself?

The first clue was to simply look at the commit history on the server with git log. This showed commits that don’t exist in GitHub that all have a message like:

Merge branch ‘main’ of github.com/{org}/{repo} into main.

So a merge commit is getting created when git pull executes. Only having limited experience in using git repositories in the context of having multiple developers working on something, I didn’t really understand this. After doing some digging into git, though, I learned that this is the normal behavior for git! So why is it that when I would go back and run a git pull origin main on my fork after syncing it with the org’s repo I would never run into the same experience?

The answer lies directly within git’s documentation:

If the current branch is behind the remote, then by default it will fast-forward the current branch to match the remote. If the current branch and the remote have diverged, the user needs to specify how to reconcile the divergent branches with --rebase or --no-rebase (or the corresponding configuration option in pull.rebase).

I didn’t see this when locally updating my fork because my current branch was always strictly behind the remote. As a result, git would simply fast-forward my branch to make it match. On the Puppet server, though, git was unable to fast-forward, meaning a merge commit was needed to reconcile the two histories. This told me that the repo on the Puppet server had diverged from GitHub at some point.
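To make the difference concrete, here is a minimal sketch that reproduces the situation in a scratch directory. Everything in it is made up for the demo (the origin.git, dev and server directory names, the file names, and the demo identity), and it assumes a rewritten commit was force-pushed after the "server" clone had already pulled the original. It first tries a pull restricted to fast-forwards, then a merging pull.

```shell
#!/bin/sh
# Throwaway reproduction; every path and name below is hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"

# Identity for the demo commits, so this runs in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repo stands in for GitHub.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/main

# A developer clone makes two commits and pushes them.
git clone -q origin.git dev 2>/dev/null
cd dev
git symbolic-ref HEAD refs/heads/main
echo one > file.txt;  git add file.txt;  git commit -q -m "first commit"
echo two > other.txt; git add other.txt; git commit -q -m "add other"
git push -q origin main
cd ..

# The "server" clones while the original commit is still on the remote.
git clone -q origin.git server

# The developer rewrites the last commit and force-pushes it.
cd dev
echo tweak > tweak.txt; git add tweak.txt
git commit -q --amend --no-edit
git push -q --force origin main
cd ..

# On the server the histories have now diverged.
cd server
if git pull -q --ff-only origin main 2>/dev/null; then
    ff_result=ok
else
    ff_result=refused
fi
echo "ff-only pull: $ff_result"

# A merging pull succeeds, but only by creating a merge commit.
git pull -q --no-rebase --no-edit origin main
merges=$(git rev-list --merges --count HEAD)
echo "merge commits on server: $merges"
```

With a default git configuration I would expect the fast-forward-only pull to be refused and the second pull to leave one merge commit behind. For a machine that should never have commits of its own, running git pull --ff-only is one way to turn this kind of divergence into a loud failure instead of a quietly growing pile of merge commits.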

This got me to start looking through git log on the server to see where these merge commits started happening. After finding the first one, a quick look through the preceding commits showed me two commits that had identical commit messages and timestamps, but different SHA hashes. I went back to GitHub to find the corresponding commit(s) there. I found only one commit matching that log message, and its hash matched one of the two on the Puppet server.
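That kind of digging can be sketched as a few small shell helpers. The function names are hypothetical, and they assume you run them from inside the clone on the server, with the remote-tracking branch called origin/main and already fetched.

```shell
# Hypothetical helpers; run them from inside the repository in question.

# Oldest merge commit reachable from HEAD: roughly where pulls
# stopped being fast-forwards.
first_merge() {
    git rev-list --merges --reverse HEAD | head -n 1
}

# Commits on only one side of the divergence.
# Lines marked "<" exist only locally, ">" only on the other ref.
diverged_commits() {
    git log --left-right --format='%m %h %ad %s' --date=iso \
        "HEAD...${1:-origin/main}"
}

# Non-merge commit subjects that appear more than once in history:
# candidates for a commit that was rewritten after it was pulled.
duplicate_subjects() {
    git log --no-merges --format='%s' HEAD | sort | uniq -d
}
```

Note that once a merging pull has run, the rewritten commit is reachable from HEAD as well, so diverged_commits only lists the local-only side (the original commit and the merge commits), while duplicate_subjects is what surfaces the pair with matching messages. Feeding the hash from first_merge into git log then shows the commits leading up to it.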

At this point, the pieces started to come together. This repo was one of the first that people at our company who weren’t developers had really started using GitHub for, and during the initial setup we were doing something horrible: we were all simply committing directly to the main branch of the repo in the organization. We weren’t creating forks and we weren’t opening PRs. At the point of this issue, someone (name omitted to protect the guilty) had made a commit, decided to make a slight tweak, and rebased that commit with another one. However, the Puppet server had already synced the first commit, so from that point on its commit history diverged from GitHub’s, where history had effectively been rewritten.

In our particular instance, this is just a minor annoyance. The diverging histories don’t actually impact anything, and simply deleting the repo from the Puppet server and re-cloning it is a pain-free proposition. And since this happened before we were operating with forks like we should have been in the first place, no one’s fork would be out of alignment, either. However, these circumstances would’ve been horrible in basically any other context, so it’s quite a good thing we’ve matured as an organization and started doing things properly.
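Deleting and re-cloning is the approach described above. As an alternative, a clone that should never carry commits of its own can also be snapped back to the remote in place. This is only a sketch: resync_to_remote is a hypothetical helper name, and it assumes that throwing away every local commit and any uncommitted change to tracked files is acceptable.

```shell
# Hypothetical helper: make the checked-out branch match the remote exactly.
# WARNING: discards local commits and uncommitted changes to tracked files.
resync_to_remote() {
    remote=${1:-origin}
    branch=${2:-main}
    # Refresh the remote-tracking branches, then move the current branch there.
    git fetch --quiet "$remote" &&
        git reset --quiet --hard "$remote/$branch"
}
```

Run from inside the server's clone, resync_to_remote origin main leaves HEAD pointing at the same commit as origin/main, with the stray merge commits no longer part of the branch.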