Last updated: July 6, 2024
Git uses the underlying filesystem to organize and store internal structures such as commits, branches, and other refs (references). However, sometimes conflicts can arise between already existing objects and new ones.
In this tutorial, we explore how Git organizes its filesystem directory and ways to resolve potential conflicts. First, we briefly refresh our knowledge about the structure of a Git repository. After that, we look at ways that tampering with the main subdirectories of that structure can affect operations. Next, we create a sample repository and use it to show examples of ref (reference) issues in practice. Finally, we turn to potential solutions to unexpected refs.
We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15 and Git 2.39.2. Unless otherwise specified, it should work in most POSIX-compliant environments.
Since it always runs on top of one, Git leverages the underlying filesystem to store and organize its data structures.
To demonstrate, let’s first create an empty repository:
$ git init
Now, we can check the contents of this supposedly empty Git project:
$ tree -d .git/
.git/
├── branches
├── hooks
├── info
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
10 directories
$ tree .git/
.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── fsmonitor-watchman.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-merge-commit.sample
│   ├── prepare-commit-msg.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── push-to-checkout.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
10 directories, 17 files
Thus, we can already appreciate the different files and directories that make up an empty Git repository. This information is usually stored in a .git subdirectory, but bare repositories that don’t have a working tree employ the root directly.
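To see the bare layout in practice, a minimal sketch can initialize a bare repository and list its top level. The scratch directory and the demo.git name are assumptions made only for illustration:

```shell
# Work in a scratch directory so no real repository is touched.
scratch=$(mktemp -d)

# A bare repository has no working tree and no .git subdirectory.
git init --quiet --bare "$scratch/demo.git"

# HEAD, objects, refs, and the rest sit directly in the repository root.
ls "$scratch/demo.git"
```

Here, the same structure we saw under .git appears directly in demo.git.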
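First, let’s briefly explain the function of each subdirectory:
branches: deprecated storage for shorthand names of remote URLs
hooks: scripts that Git runs at certain points of its workflow, all of them inactive samples by default
info: additional repository information, such as the exclude file with local ignore patterns
objects: the object database, which holds commits, trees, blobs, and annotated tags
refs: references such as branches (heads) and tags
As we can see, there are also many files in the hierarchy:
config: repository-specific configuration
description: name of the repository for tools like GitWeb
HEAD: pointer to the current branch or commit
Further, some files and directories only exist for repositories with data:
COMMIT_EDITMSG: message of the last commit
index: contents of the staging area
logs: history of ref updates, as used by reflog
packed-refs: refs packed into a single file
Of course, this is a non-comprehensive list.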
Notably, data in the branches, hooks, and info subdirectories can usually only be modified manually. Further, although many of these files get created and modified with Git commands, they are still regular files.
However, creating, removing, and changing files under .git or any of its subdirectories without considering the proper structure and format can have consequences.
If we were to replace any of the main subdirectories under .git, it could result in different issues.
To begin with, if we lose the .git/logs/ directory, we won’t have a reflog available until the log files within are restored:
$ git reflog
ebebca8 (HEAD -> branch1) HEAD@{0}: checkout: moving from master to branch1
8a8a026 (tag: v0.1, master) HEAD@{1}: checkout: moving from branch1 to master
ebebca8 (HEAD -> branch1) HEAD@{2}: commit: file1
8a8a026 (tag: v0.1, master) HEAD@{3}: checkout: moving from master to branch1
8a8a026 (tag: v0.1, master) HEAD@{4}: commit (initial): file
$ mv .git/logs .git/logs.bak
$ git reflog
$
Still, any following activities would get logged without issues as long as Git can create and write to the logs directory.
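As a rough check of this behavior, the following sketch removes logs in a throwaway repository and then makes another commit. The commit helper, the demo identity, and the commit messages are hypothetical, and the sketch assumes the default reflog settings of a non-bare repository:

```shell
# Throwaway repository, so the experiment is safe to run.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Hypothetical helper: create an empty commit with a fixed demo identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "$1"
}

commit first
rm -rf .git/logs   # lose every reflog recorded so far
commit second      # Git recreates .git/logs while logging this update

# Only activity after the removal is listed.
git reflog
```

The older entries are gone, but logging itself continues.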
However, if we replace it with a file, Git can’t record reflog entries for as long as that file stays in place:
$ touch .git/logs
$ git checkout master
error: unable to append to '.git/logs/HEAD': Not a directory
Already on 'master'
Thus, we prevent log writes.
Since it’s deprecated, losing the .git/branches subdirectory shouldn’t have any impact on most repositories.
Although the hooks subdirectory can be important if we have custom hooks implemented, it doesn’t have much of a function otherwise.
If we delete or replace the directory, we only lose the default hook templates in most cases.
Similarly, if we don’t use the exclude or other special file features within, we can often freely dispose of the info subdirectory.
Effectively, objects holds the whole commit tree along with snapshots of every committed state of the working directory. This means that any file or directory lost within objects usually leads to general data loss.
Because of this, the objects subdirectory is critical.
Similar to objects, refs is very important. Unlike objects, there are ways to restore refs, since they don’t hold actual data, only metadata.
In fact, we can look at the refs subdirectory as a container for commit pointers, which enable Git to recognize certain commits by name rather than an identifier.
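To illustrate the pointer idea, here is a minimal sketch that compares the contents of a branch file with the commit that HEAD resolves to. It assumes a fresh repository with the default files-based ref storage and a loose (not yet packed) ref, and the demo identity is made up:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git -c user.name=demo -c user.email=demo@example.com \
  commit --quiet --allow-empty -m initial

# HEAD is a symbolic ref, so this prints the full name of the current branch.
ref=$(git symbolic-ref HEAD)

# The loose ref file stores nothing but the hash of the commit it points to.
cat ".git/$ref"

# Git resolves the name to the same identifier.
git rev-parse HEAD
```

Both commands should print the same hash, since the branch name is just a label for that commit.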
To begin with, we create several objects in a new repository:
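The exact commands aren’t reproduced here, so the following is only a sketch of one way to reach a comparable state. The file contents, the annotated tag, and the demo identity are assumptions, the default branch name depends on the local configuration, and the resulting hashes will differ from the ones in the listings:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# One commit with a single file named file.
echo > file
git add file
git -c user.name=demo -c user.email=demo@example.com commit --quiet -m file

# An annotated tag (assumed) and a second branch on the same commit.
git -c user.name=demo -c user.email=demo@example.com tag -a v0.1 -m v0.1
git branch branch1
```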
Let’s enumerate them via the log subcommand:
$ git log --all --decorate --oneline --graph
* 420061f (HEAD -> master, tag: v0.1, branch1) file
Now, we should also have the respective underlying files:
$ tree .git/
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
[...]
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── branch1
│           └── master
├── objects
│   ├── 2a
│   │   └── cf5a5f0c860dd25f42a4dc326febb6a942baad
│   ├── 42
│   │   └── 0061ffe3926e2137129144dc4e9d2b545ab9e3
│   ├── 47
│   │   └── d83249b05cf06491633be38ea8637c5b356acc
│   ├── 8b
│   │   └── 137891791fe96927ad78e64b0aad7bded08bdc
│   ├── info
│   └── pack
├── packed-refs
└── refs
    ├── heads
    │   ├── branch1
    │   └── master
    └── tags
        └── v0.1
17 directories, 30 files
As confirmed by the existence of an index file, the four objects under objects, and the heads subdirectory, the repository now has content.
Part of the functionality that Git offers includes synchronization between the local filesystem structure, local repository metadata, and remote information. Despite this, we have the ability to change any of the files that it stores.
Let’s see an example of how this might work within the sample repository we already created.
First, we create the empty filerefbranch file under .git/refs/heads:
$ touch .git/refs/heads/filerefbranch
Further, we make a directory named dirrefbranch in the same path:
$ mkdir .git/refs/heads/dirrefbranch
This fake metadata can cause issues for Git.
Now, we can try to actually create the filerefbranch branch:
$ git branch filerefbranch
fatal: cannot lock ref 'refs/heads/filerefbranch': unable to resolve reference 'refs/heads/filerefbranch': reference broken
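Since Git can’t lock or resolve the ref, it errors out. Although this is a fairly forced problem, we can end up with it in more mundane situations, such as an interrupted operation or a system crash that leaves an empty or truncated ref file behind.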
Now, let’s see how dirrefbranch behaves as the name of a new branch:
$ git branch dirrefbranch
$
In this case, we experience no issues, because Git can replace a directory under refs with a ref file, provided that the directory contains no files at any level and Git has permissions to modify it.
Yet, if the directory or any of its subdirectories contains a file, we would see an error:
$ git branch --delete dirrefbranch
Deleted branch dirrefbranch (was 1b859e5).
$ mkdir .git/refs/heads/dirrefbranch/ && touch .git/refs/heads/dirrefbranch/file
$ git branch dirrefbranch
fatal: cannot lock ref 'refs/heads/dirrefbranch': 'refs/heads/dirrefbranch/file' exists; cannot create 'refs/heads/dirrefbranch'
Due to the existence of file under dirrefbranch, Git can’t remove and recreate the directory as a reference.
Consequently, should a branch already exist with a given name, we can’t create a branch with an upper-level component that has that name:
$ git branch branch1/subbranch1
fatal: cannot lock ref 'refs/heads/branch1/subbranch1': 'refs/heads/branch1' exists; cannot create 'refs/heads/branch1/subbranch1'
In this case, Git can’t replace branch1 with a directory element in the path since it’s already the name of a branch. In other words, if a branch x exists, x/anyname can’t be created. Similarly, this blocks branches further down the path like x/anyname/orpath.
Even if we attempt a pull, Git can’t override a bad ref file:
$ git pull
[...]
fatal: bad object refs/heads/filerefbranch
error: ../remote/ did not send all necessary objects
Let’s check how the directory behaves:
$ git pull
From ../remote
 * [new branch]      dirrefbranch -> origin/dirrefbranch
 * [new branch]      manualbranch -> origin/manualbranch
 * [new branch]      master       -> origin/master
[...]
Again, a directory isn’t a problem as long as it’s empty.
In many cases, we can perform a comparison with a new repository to see which filesystem objects should be of what type. However, different approaches can correct the situation in case of issues.
Git can resolve many problems with the current .git structure on its own. From partial pulls and pushes to incomplete merges, Git is often able to see which references don’t lead to commits and which commits don’t need to remain.
Although it usually happens automatically, we can invoke the garbage collector manually as well:
$ git gc
After this operation, we can recheck whether the previously-failing operation is now successful. Further, we can run gc after other operations to ensure the consistent state of the Git repository as much as possible.
Similar to garbage collection, a prune removes stale data. For refs, this means the remote-tracking branches of a given remote:
$ git remote prune origin
In this case, we run a prune operation on the origin, which removes the data about remote branches that no longer exist.
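Before deleting anything, a dry run can preview what the prune would remove. The sketch below builds its own scratch remote and clone to show this, where the directory names, the stale branch, and the demo identity are all hypothetical:

```shell
work=$(mktemp -d)

# Scratch remote with one commit and an extra branch.
git init --quiet "$work/remote"
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet --allow-empty -m initial
git -C "$work/remote" branch stale

# The clone receives origin/stale as a remote-tracking branch.
git clone --quiet "$work/remote" "$work/local"

# Deleting the branch on the remote leaves origin/stale behind in the clone.
git -C "$work/remote" branch --quiet -D stale

# --dry-run only reports what a real prune would remove.
git -C "$work/local" remote prune --dry-run origin
```

Since nothing is deleted in a dry run, origin/stale is still present in the clone afterward.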
Similarly, we can perform a --prune during a fetch and repeat a failing pull, for instance:
$ git fetch --prune origin && git pull
This approach usually removes stale remote-tracking refs and brings local copies of remote branches in sync with their counterparts.
The update-ref subcommand is used to safely update the object names stored within refs. We can use it to repoint refs, create new ones, and delete those that no longer belong.
In this case, we can use the command to [-d]elete a ref after verification:
$ git update-ref --no-deref -d <REF_PATH>
Notably, we use --no-deref so that the command removes the named ref itself instead of following a symbolic ref to its target.
Of course, if the provided REF_PATH isn’t a ref, update-ref won’t be able to help.
To know whether a given filesystem object is a ref, we can use rev-parse with its path.
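A minimal sketch of that check follows, run in a throwaway repository that reproduces the empty filerefbranch file from earlier. The is_ref helper and the demo identity are hypothetical names introduced only for this example:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git -c user.name=demo -c user.email=demo@example.com \
  commit --quiet --allow-empty -m initial

# Empty file posing as a branch, as in the earlier example.
touch .git/refs/heads/filerefbranch

# Hypothetical helper: succeeds only if the name resolves to a valid object.
is_ref() {
  # --verify accepts exactly one name; --quiet suppresses the error message.
  git rev-parse --verify --quiet "$1" >/dev/null 2>&1
}

for name in HEAD refs/heads/filerefbranch; do
  if is_ref "$name"; then
    echo "$name: valid ref"
  else
    echo "$name: not a valid ref"
  fi
done
```

The exit status of rev-parse is what matters here, so the same test can guard a cleanup script before it calls update-ref.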
In this article, we discussed the basic skeletal structure of a Git repository and ways we can tamper with it.
In conclusion, manual edits of .git are possible and sometimes necessary, but they should be performed with extra caution, even though there are different ways to recover from issues.