CHAPTER 02
Advanced
Understanding Git Objects and Internals
Updated: May 15, 2026
1. Introduction
To work on a car engine, a mechanic must understand pistons and spark plugs. To work on Git, a DevOps engineer must understand Objects. Git is fundamentally a content-addressable key-value data store. When you commit code, Git does not store files the way you see them in your working directory; it stores internal Objects, each keyed by a hash of its contents. In this chapter, we will open the hood of the Git engine. We will explore the four core object types (Blobs, Trees, Commits, and Tags) and learn how Git uses SHA-1 cryptographic hashes to connect these objects into the immutable history graph we rely on daily.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define the four core Git Object types.
- Understand how Git calculates a SHA-1 hash.
- Use low-level "plumbing" commands (`git cat-file`) to inspect raw objects.
- Explain the hierarchical relationship between Commits, Trees, and Blobs.
- Understand why Git history is cryptographically immutable.
3. Beginner-to-Advanced Explanations
When you commit a project with a folder and a file, Git generates at least three kinds of objects (plus an optional fourth):
- 1. The Blob (Binary Large Object): Git takes the *contents* of your file (e.g., "Hello World"), prepends a short header, and hashes the result. It then stores a zlib-compressed copy in the `.git/objects` folder, using the first two characters of the hash as a subdirectory name and the remaining characters as the filename. *Note: The blob ONLY stores the content, not the file's name!*
- 2. The Tree: Git needs to remember the file's name and folder structure. A Tree object is like a directory listing. It contains pointers (hashes) to Blobs and to other Trees (one per subfolder), mapping the raw content to the filename (e.g., `test.txt` points to blob 557db...).
- 3. The Commit: The Commit object sits at the top. It contains a pointer to the root Tree, the author's and committer's names, the timestamps, the commit message, and a pointer to the *parent commit* that came before it (the very first commit has no parent, and a merge commit has more than one).
- 4. The Tag: (Optional) An annotated tag object is a human-readable label (like `v1.0.0`) with its own tagger, date, and message that points to a specific object, usually a Commit. A lightweight tag, by contrast, is only a ref and creates no object.
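The hashing and object layout described above can be checked by hand. The following is a minimal sketch, assuming `git` and either `sha1sum` or `shasum` are installed; the helper name `sha1_hex`, the `docs/test.txt` path, and the user details are illustrative choices, not anything Git requires.

```bash
set -e

# Hypothetical helper: SHA-1 of stdin as hex, via sha1sum or (on macOS) shasum.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

# Git hashes a header plus the content: "blob <size-in-bytes>\0<content>".
# "Hello World\n" is 12 bytes long.
manual_id=$(printf 'blob 12\0Hello World\n' | sha1_hex)
git_id=$(printf 'Hello World\n' | git hash-object --stdin)
echo "by hand: $manual_id"
echo "by git:  $git_id"

# Commit one file inside one folder in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir docs
printf 'Hello World\n' > docs/test.txt
git add docs/test.txt
git commit -q -m "Add docs/test.txt"

# List every object reachable from the commit, with its type.
object_list=$(git rev-list --objects HEAD | cut -d' ' -f1 | while read -r id; do
  echo "$(git cat-file -t "$id") $id"
done)
echo "$object_list"

# The blob is stored under .git/objects/<first 2 hex chars>/<remaining 38>.
blob_path=".git/objects/$(echo "$git_id" | cut -c1-2)/$(echo "$git_id" | cut -c3-)"
ls "$blob_path"
```

The listing should show one commit, two trees (the root and `docs`), and one blob, and the two hash lines should be identical.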
4. Real-World Workflow Examples
Why does an architect need to know this? Imagine a junior developer accidentally commits a 5GB database dump file. Even if they delete the file in the next commit, the repository is still 5GB in size. Why? Because the 5GB Blob was written to the `.git/objects` database, and as long as any commit in the history references it, it stays there and slows down every `git clone`. Understanding objects allows a senior engineer to use tools like `git filter-repo` to rewrite the history so that no commit references the 5GB blob, after which Git can remove it from the object database.
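Before rewriting anything, it helps to confirm which blob is the culprit. The sketch below reproduces the scenario in a throwaway repository, with a 1 MiB file standing in for the 5GB dump (the name `dump.sql` is made up for illustration), and then lists the largest blob still reachable from history. It only locates the blob; removing it is left to a history-rewriting tool.

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit a "large" file by accident, then delete it in the next commit.
head -c 1048576 /dev/zero > dump.sql
git add dump.sql
git commit -q -m "Add dump by accident"
git rm -q dump.sql
git commit -q -m "Remove dump"

# Walk every object reachable from any ref, print type, size, ID and path,
# keep only blobs, and sort by size (largest first).
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k2,2nr \
  | head -n 1)
echo "$largest"
```

The file is gone from the working directory, yet the blob is still listed because the first commit references it.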
5. Git Command Walkthroughs
Git has high-level commands (porcelain) like `git commit`, and low-level database commands (plumbing) like `git cat-file`. We will use plumbing to look inside the database.
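As a minimal sketch of that kind of inspection, the commands below write a blob straight into the object database of a throwaway repository with `git hash-object -w` and then query it with three `git cat-file` modes. The variable names are illustrative.

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write a blob directly into the object database (no commit needed).
blob_id=$(printf 'Hello World\n' | git hash-object -w --stdin)

obj_type=$(git cat-file -t "$blob_id")   # -t: object type
obj_size=$(git cat-file -s "$blob_id")   # -s: content size in bytes
obj_body=$(git cat-file -p "$blob_id")   # -p: pretty-print the content
echo "$blob_id is a $obj_type of $obj_size bytes containing: $obj_body"
```

The same three flags work on trees, commits, and tags, which is what makes `git cat-file` a convenient general-purpose inspection tool.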
6. Best Practices
- Never Modify the `.git` Directory Manually: While it is educational to look at the files inside `.git/objects`, you should never open a file in there with a text editor and change it. Because each object's name is a cryptographic hash of its content, changing a single character means the content no longer matches its name. Git will treat that object as corrupt, and every commit that depends on it breaks with it.
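To see the effect safely, the sketch below makes a hand edit inside a throwaway repository created with `mktemp -d`, never a real one. It assumes the object is still stored as a loose file (true for a freshly written object) and uses illustrative variable names.

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
blob_id=$(printf 'Hello World\n' | git hash-object -w --stdin)

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38>.
obj_path=".git/objects/$(echo "$blob_id" | cut -c1-2)/$(echo "$blob_id" | cut -c3-)"

before=$(git cat-file -p "$blob_id")   # the object reads back normally

# Simulate a hand edit. Git marks object files read-only, so allow writes first.
chmod u+w "$obj_path"
printf 'Hello World!\n' > "$obj_path"

# The file is no longer a valid compressed object for this ID, so the read fails.
if git cat-file -p "$blob_id" >/dev/null 2>&1; then
  after="still readable"
else
  after="corrupt"
fi
echo "before: $before / after: $after"
```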
7. Common Mistakes
- Confusing Branches with Objects: A commit is an immutable object in the database, identified by its hash. A branch is NOT an object. A branch is just a text file in `.git/refs/heads/` that contains the hash of a commit (Git may later move it into the `.git/packed-refs` file, but the idea is the same). This is why creating and deleting branches in Git is nearly instantaneous: it's just creating or deleting a 41-byte text file (40 hash characters plus a newline).
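This is easy to verify in a throwaway repository. The sketch below assumes Git's default file-based ref storage and a fresh repository whose refs have not been packed; the branch name `feature` is an arbitrary example.

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# The current branch is "main" or "master", depending on local configuration.
branch=$(git symbolic-ref --short HEAD)
ref_file=".git/refs/heads/$branch"

file_hash=$(cat "$ref_file")                  # what the text file contains
head_hash=$(git rev-parse HEAD)               # what Git says the commit is
ref_bytes=$(wc -c < "$ref_file" | tr -d ' ')  # 40 hex characters + newline
echo "$ref_file contains $file_hash ($ref_bytes bytes)"

# Creating a branch just writes another small file holding the same hash.
git branch feature
feature_hash=$(cat .git/refs/heads/feature)
echo "feature -> $feature_hash"
```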
8. Troubleshooting Tips
- Corrupted Repositories: If your computer crashes while Git is writing to disk (for example, during a `git push`), your object database might become corrupted. The command `git fsck` (File System Check) will scan your entire `.git/objects` database, verify every single SHA-1 hash against its content, and report exactly which objects are missing or corrupt. It also lists "dangling" objects, which are intact but no longer referenced by anything.
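The difference between a dangling object and a genuinely broken repository shows up clearly in a small experiment. The sketch below, run in a throwaway repository with illustrative file names, first creates an unreferenced blob and then deletes a blob that a commit still needs, as a stand-in for an object lost in a crash.

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'tracked\n' > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# 1. A dangling object: written to the database but referenced by nothing.
dangling_id=$(printf 'orphan\n' | git hash-object -w --stdin)
healthy_report=$(git fsck 2>&1)   # succeeds: dangling objects are not errors
echo "$healthy_report"

# 2. A broken repository: remove a blob that the commit still references.
tracked_id=$(git rev-parse HEAD:file.txt)
rm -f ".git/objects/$(echo "$tracked_id" | cut -c1-2)/$(echo "$tracked_id" | cut -c3-)"
if broken_report=$(git fsck 2>&1); then fsck_status=0; else fsck_status=$?; fi
echo "$broken_report"
echo "fsck exit status: $fsck_status"
```

The first report mentions the dangling blob while `git fsck` still exits successfully; the second names the missing blob and exits with a non-zero status.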
9. Exercises
- 1. Explain the specific roles of a Blob, a Tree, and a Commit in representing a single file in a repository.
- 2. Why doesn't a Blob object store the name of the file it represents?
10. Mini Project: Inspect Git Objects Manually
Let's trace a commit down to its raw binary data.
Step-by-Step Walkthrough:
- 1. Open your terminal in a Git repository that has at least one commit.
- 2. Find the hash of your latest commit:
```bash
git log -1 --format="%H"
```
*(Assume the output is a1b2c3d... The hashes in these steps are placeholders; substitute the ones your own repository prints.)*
- 3. Inspect the Commit object:
```bash
git cat-file -p a1b2c3d
```
*(You will see the author, the message, and a line that says tree d4e5f6a...)*
- 4. Inspect the Tree object using that new hash:
```bash
git cat-file -p d4e5f6a
```
*(You will see a list of files. Next to your file index.html, you will see blob 7a8b9c0...)*
- 5. Inspect the Blob object using that final hash:
```bash
git cat-file -p 7a8b9c0
```
*(The terminal will print the exact source code of your file).*
You just manually traversed the Git graph database!
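The same walk can be scripted so that no hash has to be copied by hand. This is a minimal sketch that builds its own throwaway repository with a single `index.html` (the file content and user details are made up for the example) and extracts each hash from the output of the previous step.

```bash
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf '<h1>Hello</h1>\n' > index.html
git add index.html
git commit -q -m "Add index.html"

# Step 2: hash of the latest commit.
commit_id=$(git log -1 --format="%H")

# Step 3: the commit's first "tree" header line names the root tree.
tree_id=$(git cat-file -p "$commit_id" | awk '$1 == "tree" {print $2; exit}')

# Step 4: tree entries look like "<mode> <type> <hash><TAB><name>".
blob_id=$(git cat-file -p "$tree_id" | awk '$4 == "index.html" {print $3}')

# Step 5: print the file content stored in the blob.
content=$(git cat-file -p "$blob_id")
echo "commit $commit_id -> tree $tree_id -> blob $blob_id"
echo "$content"
```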
11. FAQs
Q: What is a SHA-1 hash?
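A: It is a cryptographic hash function. If you feed it the text "Hello", it produces a 40-character hexadecimal string (the hash). If you feed it "hello" (lowercase), it produces a radically different 40-character string. Two different inputs producing the same hash is possible in theory but extremely unlikely to happen by accident. Git uses this to ensure that if a file is changed by even one byte, it gets a completely new ID, making Git history tamper-evident: any change to past content changes the hashes of everything built on top of it.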
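The "Hello" versus "hello" comparison can be tried directly. This sketch assumes `sha1sum` or `shasum` is installed and hashes the raw text only, without the header Git adds for its own objects; `sha1_hex` is a hypothetical helper name.

```bash
set -e

# Hypothetical helper: SHA-1 of stdin as hex, via sha1sum or (on macOS) shasum.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

upper=$(printf 'Hello' | sha1_hex)
lower=$(printf 'hello' | sha1_hex)
echo "Hello -> $upper"
echo "hello -> $lower"
```

Both results are 40 hexadecimal characters long, and changing a single letter's case produces an unrelated-looking string.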
12. Summary
In Chapter 2, we dissected the atomic structure of Git. We bypassed the familiar interface of branches and commits to expose the underlying key-value data store. We learned that Git breaks our project down into Blobs (raw content), Trees (directory structures), and Commits (metadata and historical links), securing everything with immutable SHA-1 hashes. By utilizing plumbing commands like `git cat-file`, we manually traversed the object graph, proving that Git is not magic, but a beautifully engineered mathematical ledger.