Git course #1 – data model: blobs, trees, commits, refs

by Henryk Tews / Wednesday, 06 May 2026 / Published in Git

Most developers use Git like a magic box: you type something, something gets saved, and history somehow works. Until something goes wrong. To stop treating Git as a black box, start by understanding what actually lives inside the .git/ directory. There are four object types: blob, tree, commit and tag. That is it. Everything else follows from these.

Four object types

Git is a content-addressed object database. Every object is identified by the SHA-1 hash of its content, prefixed with a small header that records the object type and size. Change one byte and the SHA changes.
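The header Git prepends before hashing is the object type, a space, the content length in bytes and a NUL byte. Here is a minimal bash sketch that builds that header by hand and checks the result against `git hash-object`. It assumes Git is on the PATH, GNU coreutils `sha1sum` is available (macOS ships `shasum` instead) and the default SHA-1 object format is in use:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Blob content: the text plus the trailing newline that printf '%s\n' (or echo) adds
text='Hello, World!'
size=$(( ${#text} + 1 ))   # length in bytes: ASCII text, +1 for the newline

# Header "blob <size>" and a NUL byte, then the raw content, fed to plain SHA-1
manual=$(printf 'blob %d\0%s\n' "$size" "$text" | sha1sum | cut -d' ' -f1)

# Ask Git for the same hash (no -w, so nothing is written to any repository)
via_git=$(printf '%s\n' "$text" | git hash-object --stdin)

echo "by hand: $manual"
echo "by git:  $via_git"
```

Both lines should print the same 40-character hash. Change a single character of `text` and both change together.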

Blob

A blob stores raw file content. It has no name, no path – that is the tree’s job. Two identical files in different directories are one blob.

echo 'Hello, World!' | git hash-object -w --stdin
# 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# (-w writes the blob into .git/objects; without it Git only prints the hash)

git cat-file -p 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# Hello, World!
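The claim that identical files in different directories share one blob is easy to check. The following bash sketch assumes Git is on the PATH. It works in a throwaway repository created with `mktemp -d`, and the paths `docs/notes.txt` and `src/copy.txt` are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository, removed on exit
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical paths: the same bytes in two different directories
mkdir -p docs src
printf 'same content\n' > docs/notes.txt
printf 'same content\n' > src/copy.txt
git add docs/notes.txt src/copy.txt

# ":<path>" names the blob that the index recorded for that path
blob_docs=$(git rev-parse :docs/notes.txt)
blob_src=$(git rev-parse :src/copy.txt)
echo "docs/notes.txt -> $blob_docs"
echo "src/copy.txt   -> $blob_src"

# One loose object file backs both paths: objects/<2 chars>/<38 chars>
ls ".git/objects/${blob_docs:0:2}/${blob_docs:2}"
```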

Tree

A tree is a list of entries: file mode, type (blob or tree), SHA and name. The equivalent of a directory in the file system.

git cat-file -p HEAD^{tree}
# 100644 blob a8a940627d13...  README.md
# 100644 blob 1f7391f9274f...  composer.json
# 040000 tree 9c7e41b0d2f3...  src
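Trees can also be written straight from the index with plumbing commands, which makes the nesting visible. The root tree holds a blob entry for each top-level file and a tree entry for each subdirectory. The bash sketch below assumes Git is on the PATH and uses a throwaway repository; the file names are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical files: one at the root, one in a subdirectory
mkdir src
printf '# Demo\n' > README.md
printf '<?php\n' > src/App.php
git add README.md src/App.php

# write-tree turns the index into tree objects and prints the root tree SHA
root_tree=$(git write-tree)
git cat-file -p "$root_tree"     # a blob entry for README.md, a tree entry for src

# "<tree>:<path>" resolves to the object stored under that name
src_tree=$(git rev-parse "$root_tree:src")
git cat-file -p "$src_tree"      # a single blob entry for App.php
```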

Commit

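A commit points to exactly one tree (the snapshot of the repository root) and to zero or more parent commits: zero for the very first commit, two or more for a merge. It also records the author and the committer, each with a timestamp, plus the commit message.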

git cat-file -p HEAD
# tree 2a1bcad13f8e8c8d9d2b7d6c4f2a1e9b8c7d6e5f
# parent a3d5e2f1b4c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0
# author Henryk Tews <henryk@tews.pl> 1746531600 +0200
# committer Henryk Tews <henryk@tews.pl> 1746531600 +0200
#
# Add product export service
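You can reproduce this structure with `git commit-tree`, which writes a commit object for a given tree and optional parents without moving any branch. This sketch assumes Git is on the PATH and uses a throwaway repository. It also uses a hypothetical `Demo User` identity, supplied through the standard `GIT_AUTHOR_*` and `GIT_COMMITTER_*` environment variables:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical identity via environment variables, so no git config is needed
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'

printf 'v1\n' > file.txt
git add file.txt
tree1=$(git write-tree)
c1=$(git commit-tree -m 'First commit' "$tree1")            # no -p: a root commit

printf 'v2\n' > file.txt
git add file.txt
tree2=$(git write-tree)
c2=$(git commit-tree -p "$c1" -m 'Second commit' "$tree2")  # exactly one parent

git cat-file -p "$c2"   # tree, parent, author, committer, blank line, message
```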

Tag

An annotated tag is a separate object that points to another object (usually a commit) and carries its own tagger, date and message. A lightweight tag is just a ref: a file in .git/refs/tags/ containing a SHA.
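The difference between the two tag kinds shows up in `git cat-file -t`. A lightweight tag resolves straight to the commit, while an annotated tag resolves to a tag object. The bash sketch below assumes Git is on the PATH, uses a throwaway repository and sets a hypothetical identity through environment variables; the tag names are made up:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical identity (the tagger of an annotated tag uses the committer identity)
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
git commit -q --allow-empty -m 'Initial commit'

git tag v1.0-light                 # lightweight: only a ref pointing at the commit
git tag -a v1.0 -m 'Release 1.0'   # annotated: a tag object, plus a ref pointing at it

light_type=$(git cat-file -t v1.0-light)   # commit
annotated_type=$(git cat-file -t v1.0)     # tag
echo "v1.0-light -> $light_type"
echo "v1.0       -> $annotated_type"

git cat-file -p v1.0   # object, type, tag, tagger, blank line, message
```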

References – human names for SHAs

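A SHA like a3d5e2f1b4c6 is precise but impractical. References give SHAs human names. Each one is a small text file containing a SHA, or, for a symbolic ref such as HEAD, the name of another ref. After git gc, branch and tag refs may be moved into a single .git/packed-refs file, so git rev-parse main is the reliable way to read one.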

cat .git/refs/heads/main
# a3d5e2f1b4c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0

cat .git/HEAD
# ref: refs/heads/main   <- attached HEAD (points to branch)
# or:
# a3d5e2f1b4c6...        <- detached HEAD (points to commit)
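You can tell attached and detached HEAD apart without reading files by hand, because `git symbolic-ref` succeeds only when HEAD names a branch. This bash sketch makes a few assumptions:

- Git 2.28 or newer (for `git init -b`).
- The default files-based ref storage, so `.git/HEAD` is a plain text file.
- A throwaway repository and a hypothetical identity.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q -b main   # -b requires Git 2.28 or newer

# Hypothetical identity, supplied through environment variables
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
git commit -q --allow-empty -m 'Initial commit'

attached=$(git symbolic-ref HEAD)   # refs/heads/main: HEAD names a branch
tip=$(git rev-parse HEAD)           # the commit that branch points to

git checkout -q --detach            # HEAD now holds the commit SHA itself
if git symbolic-ref -q HEAD > /dev/null; then
  state=attached
else
  state=detached
fi
head_file=$(cat .git/HEAD)          # a bare 40-character SHA while detached

echo "before: $attached"
echo "after:  $state, .git/HEAD = $head_file"
```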

Practical implications

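A branch is one file containing a SHA. Creating a branch is as cheap as writing that one file: nothing else is copied. A merge of two diverged branches creates a new commit with two parents (a fast-forward merge just moves the branch ref).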
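Both claims can be checked in a few commands. Creating a branch adds one small ref, and merging two diverged branches produces a commit with two `parent` lines. This bash sketch makes a few assumptions:

- Git 2.28 or newer (for `git init -b`).
- The default files ref backend, where a freshly created branch is a loose file under `.git/refs/heads/`.
- A throwaway repository and a hypothetical identity.

File and branch names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q -b main   # -b requires Git 2.28 or newer

# Hypothetical identity, supplied through environment variables
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'

printf 'base\n' > base.txt
git add base.txt
git commit -q -m 'Base'
base=$(git rev-parse HEAD)

# Creating a branch writes a single ref; no files or objects are copied
git branch feature
feature_ref=$(cat .git/refs/heads/feature)   # loose ref file (files backend)
echo "refs/heads/feature: $feature_ref"

# Diverge: one commit on each branch
git checkout -q feature
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m 'Add feature file'

git checkout -q main
printf 'main\n' > main.txt
git add main.txt
git commit -q -m 'Add main file'

# No fast-forward is possible, so Git writes a merge commit
git merge -q --no-edit -m 'Merge feature' feature
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "parents of HEAD: $parent_count"
```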

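Git stores snapshots, not diffs. Every commit is a complete picture: a tree pointing, directly or through subtrees, to all blobs. Diffs are computed on the fly. Blobs and trees shared between commits are not duplicated; that is the effect of content addressing. Packfiles do use delta compression on disk, but that is a storage optimisation underneath the object model, not a change to it.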
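Sharing becomes visible when you resolve the same path in two commits. Each commit gets its own root tree, but an unchanged file points at the same blob in both. The bash sketch below assumes Git is on the PATH and uses a throwaway repository with a hypothetical identity:

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical identity, supplied through environment variables
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'

printf 'stable\n' > unchanged.txt
printf 'v1\n' > changed.txt
git add .
git commit -q -m 'First snapshot'

printf 'v2\n' > changed.txt
git commit -q -am 'Second snapshot'

# Each commit has its own complete root tree...
old_tree=$(git rev-parse 'HEAD~1^{tree}')
new_tree=$(git rev-parse 'HEAD^{tree}')
echo "root trees: $old_tree / $new_tree"

# ...but the unchanged file resolves to the same blob in both snapshots
old_blob=$(git rev-parse 'HEAD~1:unchanged.txt')
new_blob=$(git rev-parse 'HEAD:unchanged.txt')
echo "unchanged.txt: $old_blob / $new_blob"

# The diff is not stored anywhere; Git computes it from the two trees on demand
git diff --stat HEAD~1 HEAD
```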

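SHA never lies, at least for practical purposes. If two SHAs are identical, the content is identical; change anything and the SHA changes. A commit's SHA covers its tree and its parents' SHAs, so editing one commit changes its SHA and the SHA of every commit built on top of it. This is the foundation of Git integrity. SHA-1 collisions have been demonstrated in practice, so Git uses a hardened SHA-1 that detects the known attack, and it also supports repositories that use SHA-256.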
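This is also why rewriting history, as rebase or amend do, produces new SHAs. A commit's hash covers its parent's hash, so a changed ancestor forces new hashes downstream even when the trees are identical. This bash sketch uses `git commit-tree` and makes a few assumptions:

- Git is on the PATH.
- A throwaway repository and a hypothetical identity.
- Fixed author and committer dates, so that identical inputs are reproducible.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical identity and fixed dates (Git's "<unix-seconds> <offset>" format)
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
export GIT_AUTHOR_DATE='1746531600 +0200' GIT_COMMITTER_DATE='1746531600 +0200'

# Two snapshots: {a.txt}, then {a.txt, b.txt}
printf 'a\n' > a.txt
git add a.txt
tree_a=$(git write-tree)
printf 'b\n' > b.txt
git add b.txt
tree_b=$(git write-tree)

root=$(git commit-tree -m 'Add a' "$tree_a")
child=$(git commit-tree -p "$root" -m 'Add b' "$tree_b")

# Identical inputs give an identical commit SHA
root_again=$(git commit-tree -m 'Add a' "$tree_a")

# Reword the root only, then rebuild the child on top of it with the same tree
root_edited=$(git commit-tree -m 'Add file a' "$tree_a")
child_rebuilt=$(git commit-tree -p "$root_edited" -m 'Add b' "$tree_b")

echo "child:         $child"
echo "child rebuilt: $child_rebuilt"
```

The rebuilt child has the same tree and the same message as the original, yet a different SHA, purely because its parent changed.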

The .git/ structure

.git/
├── HEAD          # where you are right now
├── config        # repository configuration
├── index         # staging area (binary)
├── objects/      # object database
│   ├── aa/       # first 2 chars of SHA = subdirectory
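│   │   └── 3f... # remaining 38 chars = file name (zlib-compressed)
│   └── pack/     # packfiles (.pack) and their indexes (.idx)
├── packed-refs   # refs consolidated by git gc (may be absent)
└── refs/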
    ├── heads/    # local branches
    ├── remotes/  # remote references
    └── tags/
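You can observe this layout directly. A freshly committed blob appears as a loose file at `objects/<first 2 chars>/<remaining 38>`, and after `git gc` it moves into a packfile while staying readable through `git cat-file`. The bash sketch below assumes Git is on the PATH and uses a throwaway repository with a hypothetical identity. The exact set of files `gc` leaves in `objects/pack/` varies by Git version.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical identity, supplied through environment variables
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'

printf 'hello\n' > hello.txt
git add hello.txt
git commit -q -m 'Add hello'

# Loose object path: first 2 hex chars = directory, remaining 38 = file name
blob=$(git rev-parse HEAD:hello.txt)
loose=".git/objects/${blob:0:2}/${blob:2}"
if [ -f "$loose" ]; then before=loose; else before=missing; fi

# gc packs reachable objects into objects/pack/ and deletes the loose copies
git gc -q
if [ -f "$loose" ]; then after=loose; else after=missing; fi

echo "before gc: $before, after gc: $after"
ls .git/objects/pack/    # *.pack data and *.idx index (newer Git may add more)
git cat-file -p "$blob"  # still readable, now served from the pack
```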

Exploration commands

# Object type
git cat-file -t SHA

# Object content
git cat-file -p SHA

# Text-mode graph of the commit history across all refs
git log --oneline --graph --all

# All files (blobs) in HEAD's tree, recursing into subdirectories
git ls-tree -r HEAD
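These commands also combine well in pipelines. The sketch below lists every object reachable from any ref together with its type. It pipes `git rev-list --objects --all` into `git cat-file --batch-check`, using the `%(objecttype)`, `%(objectname)` and `%(rest)` format atoms from the `git cat-file` documentation. It assumes Git is on the PATH and uses a throwaway repository with a hypothetical identity; the repository content is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

# Hypothetical identity, supplied through environment variables
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'

mkdir src
printf '# Demo\n' > README.md
printf '<?php\n' > src/App.php
git add .
git commit -q -m 'Initial commit'
git tag -a v0.1 -m 'First tag'

# Every object reachable from any ref; %(rest) echoes the path rev-list printed
listing=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(rest)')
echo "$listing"

# Tally the object types
n_commit=$(grep -c '^commit ' <<< "$listing" || true)
n_tree=$(grep -c '^tree ' <<< "$listing" || true)
n_blob=$(grep -c '^blob ' <<< "$listing" || true)
n_tag=$(grep -c '^tag ' <<< "$listing" || true)
echo "commits=$n_commit trees=$n_tree blobs=$n_blob tags=$n_tag"
```

For this tiny repository the tally should be one commit, two trees (root and `src`), two blobs and one annotated tag object.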

Summary

Git is four object types plus references. Understanding this structure explains behaviours that look like magic: why branches are cheap, why rebase rewrites SHAs, why two identical files take no more space than one. Next post: commits and history - good commit messages, rebase -i, cherry-pick and bisect.