GIT Internals 1/3

---First---

Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it. In the early days of Git (mostly pre-1.5), the user interface was much more complex because it emphasized this filesystem rather than a polished VCS. In the last few years, the UI has been refined until it is as clean and easy to use as any system out there.

What does "content-addressable filesystem" mean? It means that at the core of Git is a simple key-value data store: you can insert any kind of content into a Git repository, and Git will hand you back a unique key you can use later to retrieve that content.
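The key-value idea can be illustrated with a toy store in which the key is computed from the content itself. This is only a sketch: the class name ContentStore and its methods are hypothetical, and real Git hashes a small header together with the content rather than the bare content.

```python
import hashlib


class ContentStore:
    """Toy in-memory store whose keys are derived from the stored content."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The key is the SHA-1 hex digest of the content (simplified:
        # Git itself hashes a header plus the content).
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        # Look the content up again by the key returned from put().
        return self._objects[key]


store = ContentStore()
key = store.put(b"test content\n")
print(key, store.get(key))
```

Because the key depends only on the content, storing the same bytes twice yields the same key and a single stored object.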

-----Plumbing and Porcelain-----

Because Git was initially a toolkit for a version control system rather than a full user-friendly VCS, it has a number of subcommands that do low-level work and were designed to be chained together UNIX-style or called from scripts. These commands are generally referred to as Git's "plumbing" commands, while the more user-friendly commands are called "porcelain" commands.

-----.git directory-------

When you run "git init" in a new or existing directory, Git creates the .git directory, which is where almost everything that Git stores and manipulates is located. Here is what a newly initialized .git directory typically looks like:

(1)config -- this file contains your project-specific configuration options

(2)description -- this file is used only by the GitWeb program

(3)HEAD -- one of the core parts of Git, this file points to the branch you currently have checked out

(4)hooks/ -- this directory contains your client- and server-side hook scripts

(5)info/ -- this directory keeps a global exclude file for ignored patterns that you don't want to track in a .gitignore file

(6)objects/ -- one of the core parts of Git, it stores all the content for your database

(7)refs/ -- one of the core parts of Git, it stores pointers into commit objects in that data

(8) index (yet to be created) -- one of the core parts of Git, this file is where Git stores your staging area information.

---blob object----

> git init C:\gittest

>cd C:\gittest

The plumbing command git hash-object takes some data, stores it in your .git/objects directory (the object database), and gives you back the unique key that now refers to that data object. The -w option tells it to store the object rather than only print the key, and --stdin tells it to read the content from standard input:

C:\gittest>echo "test content" | git hash-object -w --stdin

575b08edcc0046f2cdffe2819a7d55472da88013

> git cat-file -p 575b08edcc0046f2cdffe2819a7d55472da88013

But remembering the SHA-1 key for each version of your file isn't practical; plus, you aren't storing the filename in your system, just the content.

--tree objects---

The tree object solves the problem of storing the filename and also allows you to store a group of files together. Git stores content in a manner similar to a UNIX filesystem, but a bit simplified. All the content is stored as tree and blob objects, with trees corresponding to UNIX directory entries and blobs corresponding more or less to inodes or file contents. A single tree object contains one or more entries, each of which is the SHA-1 hash of a blob or subtree with its associated mode, type, and filename.

Git normally creates a tree by taking the state of your staging area or index and writing a series of tree objects from it. So, to create a tree object, you first have to set up an index by staging some files.

Assume that you create a new file test.txt in your working directory. You can stage it like this:

> git update-index --add test.txt

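Alternatively, you can add an entry to the index from an object that is already in your Git database, rather than from a file in your working directory: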

>git update-index --add --cacheinfo 100644 \

In this case, you're specifying a mode of 100644, which means it's a normal file. Other options are 100755, which means it's an executable file, and 120000, which specifies a symbolic link.

Now use git write-tree to write the staging area out to a tree object. No -w option is needed: calling this command automatically creates a tree object from the state of the index if that tree doesn't yet exist.

>git write-tree

4b825dc642cb6eb9a060e54bf8d69288fbee4904

You can read trees into your staging area by calling git read-tree. For example, you can read an existing tree into your staging area as a subtree by using the --prefix option with this command:

>git read-tree --prefix=bak 4b825dc642cb6eb9a060e54bf8d69288fbee4904

>git write-tree
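The tree layout described in this section (entries made of a mode, a filename, and the SHA-1 of a blob or subtree) can be sketched in Python. The function name tree_id is hypothetical, and sorting entries by plain filename is a simplification of Git's real ordering rules for directories.

```python
import hashlib


def tree_id(entries):
    """Compute the ID of a tree from (mode, name, hex_sha) tuples."""
    body = b""
    # Simplification: sort entries by raw filename bytes.
    for mode, name, hex_sha in sorted(entries, key=lambda e: e[1].encode()):
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary SHA-1.
        body += mode.encode() + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(hex_sha)
    # Like every Git object, the tree is hashed together with its header.
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# A tree with no entries at all.
print(tree_id([]))
```

With no entries this yields 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's well-known ID for the empty tree, which is the same value shown in the transcript above.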

--commit objects------

If you've done all of the above, the earlier problem remains: you must remember all the SHA-1 values in order to recall the snapshots. You also don't have any information about who saved the snapshots, when they were saved, or why they were saved. This is the basic information that the commit object stores for you. To create a commit object, you call commit-tree and specify a single tree SHA-1 and which commit objects, if any, directly preceded it.

Assume the tree SHA-1 is 4b825dc642cb6eb9a060e54bf8d69288fbee4904,

> echo "first commit" | git commit-tree 4b825d

854314ac5257ede3d84064c6e1afafcd232a5c46

>git log --stat 854314
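As a rough sketch of what a commit object holds, the following Python builds the text of a commit and hashes it the way Git hashes any object. The function names, the author string, and the timestamp are hypothetical placeholders. Because the author and time are part of the content, running commit-tree yourself will produce a different SHA-1 from the one shown above.

```python
import hashlib


def commit_body(tree, message, author, timestamp, tz="+0000", parents=()):
    """Build the uncompressed content of a commit object."""
    lines = [f"tree {tree}"]
    # One "parent" line per commit that directly preceded this one.
    lines += [f"parent {p}" for p in parents]
    # Simplification: author and committer are the same person and time.
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")
    # A blank line separates the metadata from the commit message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def commit_id(body):
    """Hash the commit content together with its object header."""
    header = b"commit " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


body = commit_body(
    "4b825dc642cb6eb9a060e54bf8d69288fbee4904",
    "first commit",
    "Example Author <author@example.com>",  # hypothetical identity
    1700000000,                             # hypothetical timestamp
)
print(body.decode())
print(commit_id(body))
```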

--object storage-----

A header is stored with every object you commit to your Git object database. Git first constructs a header that starts by identifying the type of object (blob, tree, commit, or tag). To that first part of the header, Git adds a space followed by the size in bytes of the content, and then a final null byte. Git concatenates the header and the original content and calculates the SHA-1 checksum of the result, which becomes the object's key. Git then compresses this combined content with zlib and, finally, writes the zlib-deflated content to an object on disk.
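The storage steps can be sketched in a few lines of Python. The function names are hypothetical. The sketch also relies on two facts about Git that the text above may not spell out: the SHA-1 is computed over the header plus the content, and loose objects are stored under .git/objects using the first two hex characters of the key as a subdirectory name.

```python
import hashlib
import zlib


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Prepend the '<type> <size>\\0' header to the raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 of header plus content: the key Git hands back."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


def loose_object(obj_type: str, content: bytes):
    """Return (key, relative path, zlib-compressed bytes) for an object."""
    store = encode_object(obj_type, content)
    oid = hashlib.sha1(store).hexdigest()
    # First two hex characters name the directory, the remaining 38 the file.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    return oid, path, zlib.compress(store)


oid, path, data = loose_object("blob", b"test content\n")
print(oid)
print(path)
```

For the exact bytes "test content" followed by a single newline, the key is d670460b4b4aece5915caf5c68d12f560a9fe3e4. The key in the earlier Windows transcript differs, presumably because cmd's echo sends different bytes (for example the quotation marks and a CRLF line ending).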

---Git references-----

It would be easier if you had a file in which you could store that SHA-1 value under a simple name so you could use that simple name rather than the raw SHA-1 value. In Git, these simple names are called "references" or "refs". You can find the files that contain those SHA-1 values in the .git/refs directory.

To create a new reference that will help you remember where your latest commit is, you can technically do something as simple as this:

>echo 854314ac5257ede3d84064c6e1afafcd232a5c46 >.git/refs/heads/master

But you are not encouraged to edit the reference files directly; instead, Git provides the safer command git update-ref to do this.

> git update-ref refs/heads/master 854314ac5257ede3d84064c6e1afafcd232a5c46

That's basically what a branch in Git is: a simple pointer or reference to the head of a line of work.

> git update-ref refs/heads/test caca33

The test branch will then contain only the work from that commit down.
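Since a branch is just a file holding a SHA-1, reading and writing one can be sketched with ordinary file operations. The helper names write_ref and read_ref are hypothetical, and a temporary directory stands in for a real .git directory. The real git update-ref does more than this (for example locking and reflog updates), which is why it is the safer choice.

```python
import os
import tempfile


def write_ref(git_dir, ref, sha):
    """Store a SHA-1 under a simple name such as refs/heads/master."""
    path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(sha + "\n")


def read_ref(git_dir, ref):
    """Return the SHA-1 that the named reference points to."""
    path = os.path.join(git_dir, *ref.split("/"))
    with open(path, encoding="ascii") as f:
        return f.read().strip()


# A throwaway directory plays the role of .git in this sketch.
with tempfile.TemporaryDirectory() as git_dir:
    write_ref(git_dir, "refs/heads/master",
              "854314ac5257ede3d84064c6e1afafcd232a5c46")
    print(read_ref(git_dir, "refs/heads/master"))
```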

------Reference: The HEAD------

The question now is, when you run git branch <branch>, how does Git know the SHA-1 of the last commit? The answer is the HEAD file. Usually the HEAD file is a symbolic reference to the branch you're currently on. A symbolic reference, unlike a normal reference, contains a pointer to another reference rather than a SHA-1 value. In some cases, however, the HEAD file contains the SHA-1 value of a Git object directly. This happens when you check out a tag, a commit, or a remote branch, which puts your repository in "detached HEAD" state.

To read the value of your HEAD:

> git symbolic-ref HEAD

To set the value of HEAD:

>git symbolic-ref HEAD refs/heads/test
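The two forms the HEAD file can take are easy to tell apart in code. The sketch below assumes the usual on-disk format, in which a symbolic HEAD contains a line such as "ref: refs/heads/master" and a detached HEAD contains a bare SHA-1. The function name parse_head is hypothetical.

```python
def parse_head(text: str):
    """Classify the contents of a HEAD file.

    Returns ("symbolic", ref_name) when HEAD points at another reference,
    or ("detached", sha) when it holds an object SHA-1 directly.
    """
    text = text.strip()
    prefix = "ref: "
    if text.startswith(prefix):
        return ("symbolic", text[len(prefix):])
    return ("detached", text)


print(parse_head("ref: refs/heads/test\n"))
print(parse_head("854314ac5257ede3d84064c6e1afafcd232a5c46\n"))
```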

-------Tag object---

The tag object is very much like a commit object -- it contains a tagger, a date, a message, and a pointer. The main difference is that a tag object generally points to a commit rather than a tree. It's like a branch reference, but it never moves -- it always points to the same commit but gives it a friendlier name.

There are two types of tags: annotated and lightweight. You can make a lightweight tag by running something like this:

> git update-ref refs/tags/v1.0 854314ac5257ede3d84064c6e1afafcd232a5c46

If you create an annotated tag, Git creates a tag object and then writes a reference to point to it rather than directly to the commit.

>git tag -a v1.1 854314ac5257ede3d84064c6e1afafcd232a5c46
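To make the difference concrete: a lightweight tag is only a reference file, while an annotated tag adds an object whose content looks roughly like the text built below. This is a sketch with hypothetical function names, tagger, timestamp, and message. The reference refs/tags/v1.1 would then hold the SHA-1 of this tag object instead of the commit's SHA-1.

```python
import hashlib


def tag_body(target, name, tagger, timestamp, message,
             tz="+0000", target_type="commit"):
    """Build the uncompressed content of an annotated tag object."""
    return (
        f"object {target}\n"       # the object being tagged
        f"type {target_type}\n"    # usually a commit
        f"tag {name}\n"
        f"tagger {tagger} {timestamp} {tz}\n"
        f"\n{message}\n"
    ).encode()


def tag_id(body):
    """Hash the tag content together with its object header."""
    header = b"tag " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


body = tag_body(
    "854314ac5257ede3d84064c6e1afafcd232a5c46",
    "v1.1",
    "Example Tagger <tagger@example.com>",  # hypothetical identity
    1700000000,                             # hypothetical timestamp
    "test tag",                             # hypothetical message
)
print(body.decode())
print(tag_id(body))
```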

--Reference: Remotes--------

If you add a remote and push to it, Git stores the value you last pushed to that remote for each branch in the refs/remotes directory.

> git remote add origin git@github.com:yourname/repository.git

>git push origin master

posted @ 2025-07-25 16:25 Bo_Ren
