On the Git version control system
- 1 A curriculum for team members
- 2 Mental model
- 3 More information
- 4 Things to write?
- 5 Git hash collisions
- 6 Related tools
- 7 Speeding up Git
- 8 Git annoys perfectionists
- 9 Distributed does not mean no coordination
1 A curriculum for team members
2 Mental model
A commit is a snapshot of the working tree.
A reference names a commit so that you can write master instead of da39a3e.
If you are a visual person, you can think about how git commands change the picture shown by gitk (a tool for visualizing Git repositories).
In gitk, a blue circle is a commit, a green box is a reference, and the bold green box is the head.
The head points to the commit that will be the parent of the next commit.
When you git init, Git creates a .git directory.
When you git status, it prints On branch master. This means that the head points to the same commit that the reference named master points to.
When you git commit, you make a new commit (blue circle), and move the current branch (bold green box) to that new commit.
When you git reset Target, you move the current branch (bold green box), and the head with it, to Target.
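The commit, reference, and head vocabulary above can be sketched as a toy model. This is not how Git is implemented: the class ToyRepo, its methods, and the shortened ids are made up for illustration, and a snapshot is just a string standing in for the whole working tree.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    snapshot: str          # stands in for the whole working tree
    parent: Optional[str]  # id of the parent commit, None for the first commit


class ToyRepo:
    """Hypothetical model: commits, references, and a head attached to a branch."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.refs: Dict[str, Optional[str]] = {"master": None}
        self.head = "master"  # the head names the current branch

    def head_commit(self) -> Optional[str]:
        # The commit that will be the parent of the next commit.
        return self.refs[self.head]

    def commit(self, snapshot: str) -> str:
        parent = self.head_commit()
        # Toy id only; real Git hashes a differently structured commit object.
        commit_id = hashlib.sha1(f"{parent}:{snapshot}".encode()).hexdigest()[:7]
        self.commits[commit_id] = Commit(snapshot, parent)
        self.refs[self.head] = commit_id  # move the current branch
        return commit_id

    def reset(self, target: str) -> None:
        if target not in self.commits:
            raise KeyError(f"unknown commit: {target}")
        self.refs[self.head] = target  # move the current branch to target


repo = ToyRepo()
first = repo.commit("version 1")
second = repo.commit("version 2")
repo.reset(first)                          # master now names the first commit again
third = repo.commit("version 2, retried")  # its parent is first, not second
```

In this model, reset does not delete the second commit. It only stops master from naming it, which is why a backup of the .git directory or the commit id is usually enough to get it back.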
If you are not yet comfortable with Git, back up your data by copying the .git directory. It can get corrupted. Things will go wrong. You may accidentally do something and not know how to recover. Computers don't understand what you mean. They do what you say, not what you mean.
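A minimal sketch of such a backup, assuming no Git command is running while the copy is made. The function name backup_git_dir and the timestamped destination name are choices made for this example, not a Git convention. The demonstration at the end runs against a throwaway directory that only imitates a repository.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_git_dir(repo_root: Path, backup_root: Path) -> Path:
    """Copy repo_root/.git into a new timestamped directory under backup_root."""
    source = repo_root / ".git"
    if not source.is_dir():
        raise FileNotFoundError(f"no .git directory in {repo_root}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{repo_root.resolve().name}-git-{stamp}"
    # copytree creates missing parent directories and refuses to overwrite
    # an existing destination, so an older backup is never clobbered.
    shutil.copytree(source, destination, symlinks=True)
    return destination


# Demonstration on a fake repository in a temporary directory.
with tempfile.TemporaryDirectory() as scratch:
    project = Path(scratch) / "project"
    (project / ".git").mkdir(parents=True)
    (project / ".git" / "HEAD").write_text("ref: refs/heads/master\n")
    backup = backup_git_dir(project, Path(scratch) / "backups")
    copied_head = (backup / "HEAD").read_text()
```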
3 More information
A Git object:
- is identified by a SHA-1 hash;
- is either a blob, tree, or commit;
- is stored as a file somewhere in the .git folder.
A commit has zero or more parents. It also refers to a tree.
A tree is a list of named entries. Every entry points to either a tree or a blob.
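To make the "identified by a SHA-1 hash" part concrete, here is a small sketch of how an object id is derived: Git hashes a short header (the object type, a space, the body length, and a NUL byte) followed by the body. The helper names git_object_id and loose_object_path are made up for this example. The path shown applies only to loose objects; objects can also be stored inside packfiles.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 object id for an object of the given kind and body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose (unpacked) object with this id would be stored."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty_blob = git_object_id("blob", b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
empty_tree = git_object_id("tree", b"")  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(loose_object_path(empty_blob))
```

The id depends only on the type and the content, so two files with identical content share one blob no matter what they are named.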
For more information, read the Pro Git book or the manpages (man git).
4 Things to write?
Git fundamentals:
- Git stores things in the .git directory.
- Why do merge conflicts happen? How to resolve them? How to use meld? How to do a three-way merge?
- Avoid changing whitespace. Avoid using your IDE to reformat files that are already committed.
Workaround for bad user experience
Disable git-gui GC warning:
5 Git hash collisions
A Git hash collision may occur, although it is extremely unlikely. Git assumes that if two objects have the same hash, then they are the same object. This is false in general; only the converse is true: if two objects are the same, then they have the same hash. When a hash collision occurs, Git may silently lose data. Git is an example of software that is incorrect but works for the use cases it was designed for (source code versioning). Git is not meant to be used as an arbitrary database.
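How unlikely is "extremely unlikely"? A rough estimate, assuming SHA-1 behaves like a uniformly random 160-bit function. That assumption covers accidental collisions only and ignores deliberately constructed ones. The function name is made up for this sketch. The bound used is the union bound: the chance that any pair collides is at most the number of pairs divided by the number of possible hashes.

```python
from fractions import Fraction

HASH_BITS = 160  # SHA-1 output size in bits


def collision_probability_upper_bound(object_count: int,
                                      hash_bits: int = HASH_BITS) -> Fraction:
    """Union bound on the chance that any two of object_count objects collide."""
    pairs = object_count * (object_count - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# A repository with 1,000,000 objects.
bound = collision_probability_upper_bound(1_000_000)
print(float(bound))
```

For a million objects the bound is far below one in 10^36, so accidental collisions are not a practical concern, even though the assumption that equal hashes mean equal objects remains logically unsound.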
Other software is incorrect as well. We routinely make software that assumes that there will never be more than 2^64 rows in a database table.
Is it even possible to write correct software at all?
6 Related tools
- git-gui, for making commits
- gitk, for showing history
- meld, for three-way diff/merge
7 Speeding up Git
- 7.1 The problem
- 7.2 Plans
- 7.3 Non-plans
7.1 The problem
I have a repository with 100,000 files and 1,000,000 objects, but most of them are not mine, and I will never use most of them. I don't even think I have more than 1,000 files in that repository. The problem: Git interactive rebase is too slow in that repository.
<2018-12-05> I solved the problem by extracting my work into its own disjoint subtree, and pushing to a different branch of the same repository.
Hypothesis: How git rebase works.
I guess git rebase --onto TARGET BASE MOVE works like this:
git checkout --detach TARGET    # start from TARGET without moving any branch
git cherry-pick BASE..MOVE      # replay the commits after BASE, up to and including MOVE
git checkout -B MOVE            # point the branch MOVE at the result
Cherry-pick is also slow. I guess that speeding up cherry-pick will also speed up rebase.
Checkout is also slow. I guess that speeding up checkout will also speed up cherry-pick.
It seems that commit and write-tree are slow.
- Git - Environment Variables: GIT_TRACE_PERFORMANCE=true has no effect. Which git version is it for?
7.2 Plans
- Plan: Make a rebase that uses only trees and not indexes
- If a tree changes, all its ancestors have to be rewritten.
- Plan: Just use subtrees and keep the repository small
- I think this is the least-effort solution that solves (works around) the problem.
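The remark that changing a tree forces all its ancestors to be rewritten follows from how trees are hashed: a tree's id covers the ids of its entries. Below is a sketch of that, using nested dictionaries as a stand-in for a working tree. The function names are made up, file modes are fixed at 100644, and entries are sorted by plain name, which simplifies Git's real ordering rule for directories.

```python
import copy
import hashlib
from typing import Dict, Union

Node = Union[bytes, Dict[str, "Node"]]  # a file body, or a directory of nodes


def object_id(kind: str, body: bytes) -> bytes:
    header = f"{kind} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).digest()


def hash_node(node: Node) -> bytes:
    """Hash a file as a blob, or a directory as a tree of its hashed entries."""
    if isinstance(node, bytes):
        return object_id("blob", node)
    entries = []
    for name in sorted(node):  # simplified ordering, see the note above
        child = node[name]
        mode = b"40000" if isinstance(child, dict) else b"100644"
        entries.append(mode + b" " + name.encode("utf-8") + b"\x00" + hash_node(child))
    return object_id("tree", b"".join(entries))


before = {
    "README": b"hello\n",
    "docs": {"index.txt": b"docs\n"},
    "src": {"lib": {"a.py": b"x = 1\n"}, "main.py": b"print(1)\n"},
}
after = copy.deepcopy(before)
after["src"]["lib"]["a.py"] = b"x = 2\n"  # change one deeply nested file

root_changed = hash_node(before) != hash_node(after)
src_changed = hash_node(before["src"]) != hash_node(after["src"])
docs_unchanged = hash_node(before["docs"]) == hash_node(after["docs"])
```

One edited file yields a new blob plus a new tree for every directory on the path up to the root, while sibling directories such as docs keep their ids. A rebase that works on trees alone would still have to write those ancestor trees for every replayed commit.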
7.3 Non-plans
- atlassian.com: How to manage big Git repositories
- Try git sparse checkout? It seems that sparse-checkout and rebase don't mix.
- Not recommended: git gc --aggressive (doesn't do what we think it would do).
8 Git annoys perfectionists
Is my commit perfect yet? Who the hell cares. But my stupid mistakes will be recorded permanently and published for everyone to see? Who the hell cares.
What do I use Git for? For backup and distribution. Not for history. I don't care about my Git history. In general, I don't care about the past, and I care about the present and the future. I have never needed to look into my Git history to diagnose a programming error. I have never used git-bisect. I try to do what I understand and understand what I do, in order to avoid introducing errors. I only demand that Git preserve the things I think are important.
But Git is a version control system, not a backup solution. It may lose data although the probability is astronomically small (practically improbable).1
9 Distributed does not mean no coordination
Git compares lines of text. Git does not compare the meaning of code fragments.
A Git merge introduces a defect when the source commits are diff-compatible but not meaning-compatible. Example: adding name-clashing functions at different places.
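The name-clash example can be made concrete. In the sketch below, the merged text is assembled by hand to imitate what a line-based merge would plausibly produce when one branch adds lines at the top of a file and the other adds lines at the bottom. It is not computed by Git, and all function names are invented for the illustration.

```python
# The file both branches started from.
BASE = (
    "def area(w, h):\n"
    "    return w * h\n"
)

# Branch A adds a helper and a caller at the top of the file.
BRANCH_A_ADDITION = (
    "def normalize(x):\n"
    "    return x.strip()\n"
    "\n"
    "def clean_name(name):\n"
    "    return normalize(name)\n"
    "\n"
)

# Branch B adds a helper with the same name, but a different meaning,
# at the bottom of the file.
BRANCH_B_ADDITION = (
    "\n"
    "def normalize(x):\n"
    "    return x / 100.0\n"
    "\n"
    "def percent_to_ratio(p):\n"
    "    return normalize(p)\n"
)

# The additions touch different regions, so no conflict would be reported.
MERGED = BRANCH_A_ADDITION + BASE + BRANCH_B_ADDITION

namespace = {}
exec(MERGED, namespace)  # the later normalize silently replaces the earlier one

ratio = namespace["percent_to_ratio"](50)
try:
    namespace["clean_name"](" Ada ")
    outcome = "ok"
except TypeError:
    outcome = "broken"  # clean_name now divides a string by 100.0
```

Each branch works on its own, and the merge is textually clean, yet the merged file breaks clean_name. Only people (or tests) that look at meaning can catch this, which is why a distributed tool still needs coordination.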