Understanding Git Objects and Internals

Updated: May 15, 2026
20 min read

# CHAPTER 2

1. Introduction

To work on a car engine, a mechanic must understand pistons and spark plugs. To work on Git, a DevOps engineer must understand objects. Git is fundamentally a content-addressable key-value store: the key is a hash and the value is an object. When you commit code, Git does not store your files as files; it stores them as internal objects. In this chapter, we will open the hood of the Git engine. We will explore the four core object types (Blobs, Trees, Commits, and Tags) and learn how Git uses SHA-1 hashes to link these objects into the immutable history graph we rely on daily.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define the four core Git Object types.
  • Understand how Git calculates a SHA-1 hash.
  • Use low-level "plumbing" commands (git cat-file) to inspect raw objects.
  • Explain the hierarchical relationship between Commits, Trees, and Blobs.
  • Understand why Git history is cryptographically immutable.

3. Beginner-to-Advanced Explanations

When you commit a project with a folder and a file, Git writes several objects, drawn from the following types:
  1. The Blob (Binary Large Object): Git takes the *contents* of your file (e.g., "Hello World"), prepends a short header, hashes the result with SHA-1, and stores a zlib-compressed copy. The hash becomes the object's address in the .git/objects folder: the first two characters name a subdirectory and the remaining 38 name the file. *Note: The blob ONLY stores the content, not the file's name!*
  2. The Tree: Git needs to remember the file's name and folder structure. A Tree object is like a directory listing. It maps each filename and file mode to the hash of a Blob or, for a subfolder, of another Tree (e.g., test.txt points to blob 557db...).
  3. The Commit: The Commit object sits at the top. It contains a pointer to the root Tree, the author and committer names, the timestamps, the commit message, and pointers to its *parent commits* (none for the first commit, two or more for a merge).
  4. The Tag: (Optional) An annotated tag object attaches a human-readable label (like v1.0.0), a tagger, and a message to a specific object, usually a Commit. Lightweight tags are plain refs and create no object.
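The blob ID can be reproduced by hand. The following is a minimal sketch, assuming a SHA-1 repository format and that the `sha1sum` utility (GNU coreutils) is installed; the variable names are illustrative. It hashes a header of the form `blob <size>` plus a NUL byte, followed by the content, and compares the result with what Git reports.

```bash
set -eu

# Content only; no filename appears anywhere in the hashed input.
content='Hello World'

# Byte count of the content including the trailing newline that echo would add.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>", a NUL byte, then the content itself.
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the ID of the same bytes (nothing is written without -w).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $git_id"
```

Both lines should show the same 40-character ID. Because the filename is not part of the input, two files with identical contents share a single blob.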

4. Real-World Workflow Examples

Why does an architect need to know this? Imagine a junior developer accidentally commits a 5GB database dump file. Even if they delete the file in the next commit, the repository still carries the dump. Why? Because the blob was written to the .git/objects database and stays there for as long as any commit in the history references it, slowing down every git clone. Understanding objects allows a senior engineer to use history-rewriting tools like git filter-repo to hunt down the blob and remove it from every commit, after which Git can garbage-collect it from the raw database.
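A scaled-down version of this scenario can be reproduced in a throwaway repository. The sketch below is illustrative only: it uses a 1 MiB random file as a stand-in for the dump, a placeholder identity, and a hypothetical filename `dump.sql`. It then lists the largest blob still reachable from history after the file has been deleted.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"

# Commit a stand-in "dump" (1 MiB, not 5GB), then delete it in the next commit.
head -c 1048576 /dev/urandom > dump.sql
git add dump.sql
git commit -q -m "Add database dump by mistake"
git rm -q dump.sql
git commit -q -m "Remove database dump"

# List every object reachable from any ref, keep the blobs, and sort by size.
# Output columns: type, hash, size in bytes, path.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k3,3nr \
  | head -n 1)

echo "$largest"
```

The working tree no longer contains the file, yet the blob is still listed because the first commit references it. Removing it requires rewriting that commit, for example with the separately installed git filter-repo tool.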

5. Git Command Walkthroughs

Git has high-level commands (porcelain) like git commit, and low-level database commands (plumbing) like git cat-file. We will use plumbing to look inside the database.
```bash
# 1. Hash content from stdin and write it into the Git database as a blob, without committing!
echo "Advanced Git Internals" | git hash-object -w --stdin
# Output will be a hash, e.g., 9f257...

# 2. Inspect the object you just created (substitute the hash printed above)
git cat-file -t 9f257... # The -t flag tells you the TYPE (it will output 'blob')
git cat-file -p 9f257... # The -p flag prints the CONTENT ('Advanced Git Internals')
```
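The same plumbing layer can build a whole commit without `git add` or `git commit`. This is a minimal sketch in a throwaway repository, assuming a placeholder identity and a hypothetical filename `notes.txt`; it creates one blob, one tree, and one commit, then checks each object's type.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"

# 1. Blob: store the content and capture its hash.
blob=$(echo "Advanced Git Internals" | git hash-object -w --stdin)

# 2. Tree: register the blob in the index under a filename, then write the index as a tree.
git update-index --add --cacheinfo 100644,"$blob",notes.txt
tree=$(git write-tree)

# 3. Commit: wrap the tree with author, date, and message.
commit=$(echo "Commit built with plumbing" | git commit-tree "$tree")

git cat-file -t "$blob"      # blob
git cat-file -t "$tree"      # tree
git cat-file -t "$commit"    # commit
git cat-file -p "$tree"      # one entry: mode, type, blob hash, notes.txt
```

The commit created here is not attached to any branch; pointing a branch at it would take an extra step such as `git update-ref`.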

6. Best Practices

  • Never Modify the .git Directory Manually: While it is educational to look inside .git/objects, you should never open a file in there with a text editor and change it. The files are zlib-compressed, and each one is named after the hash of its content, so changing a single byte makes the object unreadable or invalidates its hash. Every tree or commit that references the damaged object is then broken as well.
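To look without touching, locate the object file and read it back through Git. The sketch below assumes a throwaway repository and loose (unpacked) object storage; the variable names are illustrative.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

hash=$(echo "Advanced Git Internals" | git hash-object -w --stdin)

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining chars>.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"
ls -l "$path"                 # compressed bytes, not readable text

# Read the content back through Git instead of opening the file directly.
git cat-file -p "$hash"
```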

7. Common Mistakes

  • Confusing Branches with Objects: A commit is an immutable object stored in the database under its hash. A branch is NOT an object. A branch is just a reference: typically a text file in .git/refs/heads/ that contains the hash of a commit (Git may also keep references in a packed-refs file). This is why creating and deleting branches in Git is nearly instantaneous: it only creates or deletes a 41-byte text file (40 hexadecimal characters plus a newline).
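This can be checked directly. The sketch below assumes a throwaway SHA-1 repository that stores references as loose files (the long-standing default), a placeholder identity, and a hypothetical branch name `demo`.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"
tip=$(git rev-parse HEAD)

# Creating a branch writes one small file containing the commit hash.
git branch demo
cat .git/refs/heads/demo
bytes=$(wc -c < .git/refs/heads/demo | tr -d ' ')
echo "ref file size: $bytes bytes"

# Deleting the branch removes the file; the commit object itself is untouched.
git branch -q -d demo
git cat-file -t "$tip"
```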

8. Troubleshooting Tips

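  • Corrupted Repositories: If your computer crashes while Git is writing to disk (for example, during a git commit or git fetch), your object database might become corrupted. The command git fsck (short for "file system check") scans your .git/objects database, verifies each object's hash against its content, and reports which objects are missing or corrupt. It also lists "dangling" objects, which are intact but not reachable from any branch or tag; these are harmless.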
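The difference between a dangling object and a corrupt one can be seen in a throwaway repository. This is a sketch under stated assumptions: a placeholder identity, loose object storage, and a deliberately overwritten object file, which should never be done in a real repository.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# A blob written with no tree or commit pointing at it is unreachable.
orphan=$(echo "never committed" | git hash-object -w --stdin)

# A healthy repository: fsck succeeds and lists the blob as dangling.
report=$(git fsck 2>/dev/null)
echo "$report"

# Simulate corruption by overwriting the loose object file.
path=".git/objects/$(printf '%s' "$orphan" | cut -c1-2)/$(printf '%s' "$orphan" | cut -c3-)"
chmod u+w "$path"
echo "garbage" > "$path"

# fsck now exits with a non-zero status because the object cannot be verified.
if git fsck >/dev/null 2>&1; then status=clean; else status=corrupt; fi
echo "after overwrite: $status"
```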

9. Exercises

  1. Explain the specific roles of a Blob, a Tree, and a Commit in representing a single file in a repository.
  2. Why doesn't a Blob object store the name of the file it represents?

10. Mini Project: Inspect Git Objects Manually

Let's trace a commit down to its raw binary data.

Step-by-Step Walkthrough:

  1. Open your terminal in a Git repository that has at least one commit.
  2. Find the hash of your latest commit:
```bash
git log -1 --format="%H"
```
*(Assume the output is a1b2c3d.... The hashes in this walkthrough are placeholders; substitute the ones your repository prints.)*
  3. Inspect the Commit object:
```bash
git cat-file -p a1b2c3d
```
*(You will see the author, the message, and a line that says tree d4e5f6g...)*
  4. Inspect the Tree object using that new hash:
```bash
git cat-file -p d4e5f6g
```
*(You will see a list of files. Next to your file index.html, you will see blob h7i8j9k...)*
  5. Inspect the Blob object using that final hash:
```bash
git cat-file -p h7i8j9k
```
*(The terminal will print the exact source code of your file).*

You just manually traversed the Git graph database!
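The same traversal can be scripted so that no hash has to be copied by hand. The sketch below is one possible version, run in a throwaway repository with a placeholder identity and a single hypothetical file, index.html; it extracts each hash from the output of the previous step.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
echo "<h1>Hello</h1>" > index.html
git add index.html
git commit -q -m "Add index.html"

# Commit -> Tree: the commit's first header line is "tree <hash>".
commit=$(git rev-parse HEAD)
tree=$(git cat-file -p "$commit" | awk '$1 == "tree" {print $2; exit}')

# Tree -> Blob: each tree entry is "<mode> <type> <hash><TAB><name>".
blob=$(git cat-file -p "$tree" | awk '$4 == "index.html" {print $3}')

# Blob -> content.
git cat-file -p "$blob"
```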

11. FAQs

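Q: What is a SHA-1 hash? A: SHA-1 is a cryptographic hash function. If you feed it the text "Hello", it produces a 160-bit digest, written as a 40-character hexadecimal string (the hash). If you feed it "hello" (lowercase), it produces a radically different 40-character string. Git uses this to ensure that if a file is changed by even one byte, it gets a completely new ID. This makes Git history tamper-evident: altering old content changes its hash and the hash of every commit built on top of it. The digest is not guaranteed to be unique, since practical SHA-1 collisions have been demonstrated, which is why newer Git versions also support SHA-256 repositories.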
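The one-character sensitivity is easy to observe with Git itself. Note that this sketch prints blob IDs, which hash a small header together with the text, so they differ from the raw SHA-1 of the text alone; the variable names are illustrative.

```bash
set -eu

# Same letters, different case: the two IDs share no obvious resemblance.
upper=$(printf 'Hello' | git hash-object --stdin)
lower=$(printf 'hello' | git hash-object --stdin)

echo "Hello -> $upper"
echo "hello -> $lower"
```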

12. Summary

In Chapter 2, we dissected the atomic structure of Git. We bypassed the familiar interface of branches and commits to expose the underlying key-value data store. We learned that Git breaks our project down into Blobs (raw content), Trees (directory structures), and Commits (metadata and historical links), linking everything together with SHA-1 hashes. By using plumbing commands like git cat-file, we manually traversed the object graph, proving that Git is not magic, but a beautifully engineered mathematical ledger.

13. Next Chapter Recommendation

Now that we understand how commits are mathematically structured, how do we organize thousands of them across a massive engineering team? Proceed to Chapter 3: Advanced Branching Strategies.
