Lecture 6: Version Control (Git)
Git's data model
Snapshots
- tree: directory, maps names to blobs or trees
- blob: file (bunch of bytes)
- snapshot / commit: top-level tree that is being tracked
Example:
- top-level tree has two elements: tree "foo", blob "baz.txt"
<root> (tree)
|
+- foo (tree)
| |
| + bar.txt (blob, contents = "hello world")
|
+- baz.txt (blob, contents = "git is wonderful")
Modeling history: relating snapshots
- history: directed acyclic graph (DAG) of snapshots
- each snapshot refers to a set of parents (the snapshots that preceded it)
- single parent: linear history
- multiple parents: merging two parallel branches
- commits are immutable: "edits" to the commit history actually create new commits
Example:
o <-- o <-- o <-- o <---- o
            ^            /
             \          v
              --- o <-- o
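To make the history graph above concrete, here is a minimal sketch that stores each commit's parents in a dictionary and walks the DAG. The commit names A to G and the helper `ancestors` are made up for this example (real Git identifies commits by hash); G plays the role of the merge commit with two parents.

```python
from collections import deque

# Hypothetical commit names; each commit maps to the list of its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["C"],        # parallel branch starts at C
    "F": ["E"],
    "G": ["D", "F"],   # merge commit: two parents
}

def ancestors(commit):
    """Return every commit reachable from `commit` (itself included), breadth-first."""
    seen = [commit]
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        for parent in parents[current]:
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen

print(ancestors("G"))   # history visible from the merge commit
print(ancestors("F"))   # history visible from the side branch only
```

From F, commit D is not reachable, which is why the side branch does not "see" work done on the other line until the merge.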
Data model, as pseudocode
// a file is a bunch of bytes
type blob = array<byte>
// a directory contains named files and directories
type tree = map<string, tree | blob>
// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}
Objects and content-addressing
Object: blob, tree, or commit
type object = blob | tree | commit
In Git's data store, all objects are content-addressed by their SHA-1 hash (160 bits, 40 hexadecimal characters)
objects = map<string, object>
def store(object):
    id = sha1(object)
    objects[id] = object
def load(id):
    return objects[id]
Example:
- When objects reference other objects, they don’t actually contain them in their on-disk representation; they refer to them by hash
- visualize the top-level tree:
git cat-file -p 698281bc680d1995c5f4caaf3359721a5a58d48d
100644 blob 4448adbf7ecd394f42ae135bbeed9676e894af85 baz.txt
040000 tree c68d233a33c5c06e0340e4c224f0afca87c8ce87 foo
- visualize baz.txt:
git cat-file -p 4448adbf7ecd394f42ae135bbeed9676e894af85
git is wonderful
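The store/load pseudocode can be turned into a small runnable sketch. It assumes Git's scheme of hashing a short header (kind, size, NUL byte) followed by the payload. The tree encoding here is a simplified text stand-in (real Git trees are binary), compression is not modeled, and the function signatures are illustrative.

```python
import hashlib

objects = {}  # id (40 hex characters) -> raw object bytes

def store(kind, payload):
    """Store an object under the SHA-1 of its header plus payload; return the id."""
    data = f"{kind} {len(payload)}".encode() + b"\x00" + payload
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = data
    return object_id

def load(object_id):
    """Return (kind, payload) for a stored object."""
    header, _, payload = objects[object_id].partition(b"\x00")
    kind = header.split()[0].decode()
    return kind, payload

blob_id = store("blob", b"git is wonderful\n")
# Simplified tree: a text line naming the child by its id, not containing it.
tree_id = store("tree", f"100644 blob {blob_id} baz.txt\n".encode())

print(tree_id)
print(load(tree_id)[1].decode())
```

Because the id is derived from the content, storing identical bytes twice yields the same id, and the tree only holds the blob's hash rather than a copy of the file.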
References
- human-readable names for SHA-1 hashes
- pointers to commits
- mutable (can be updated to point to a new commit)
  - e.g. the master reference points to the latest commit in the main branch
- HEAD: reference to the currently checked-out commit
  - e.g. HEAD -> main -> C1; git checkout main
- detached HEAD: when HEAD points to a commit rather than a branch
  - e.g. HEAD -> C1; git checkout C1
references = map<string, string>
def update_reference(name, id):
    references[name] = id
def read_reference(name):
    return references[name]
def load_reference(name_or_id):
    if name_or_id in references:
        return load(references[name_or_id])
    else:
        return load(name_or_id)
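The difference between an attached and a detached HEAD can be sketched with a toy reference store. The class `Refs` and the commit ids C1, C2, C3 are placeholders invented for this example, not Git's actual implementation.

```python
class Refs:
    """Toy reference store; 'C1' etc. are placeholder commit ids."""

    def __init__(self):
        self.branches = {"main": "C1"}
        self.head = "main"   # a branch name (attached) or a commit id (detached)

    def resolve_head(self):
        """Return (commit id, detached?) for the current HEAD."""
        if self.head in self.branches:
            return self.branches[self.head], False   # HEAD -> branch -> commit
        return self.head, True                       # HEAD -> commit

    def checkout(self, name_or_id):
        """Point HEAD at a branch name or directly at a commit id."""
        self.head = name_or_id

    def commit(self, new_id):
        """Attached: the current branch moves. Detached: only HEAD moves."""
        if self.head in self.branches:
            self.branches[self.head] = new_id
        else:
            self.head = new_id

refs = Refs()
print(refs.resolve_head())   # HEAD -> main -> C1
refs.checkout("C1")
print(refs.resolve_head())   # HEAD -> C1 (detached)
```

In the detached case a new commit updates no branch, so nothing names it once HEAD moves elsewhere.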
Repositories
- Git repository: the data, i.e. objects and references
- all git commands map to some manipulation of the commit DAG by adding objects and adding/updating references
Staging area
Lets you specify which modifications should be included in the next snapshot
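A rough way to picture the staging area is as a third copy of the files that sits between the working directory and the last commit. The sketch below uses plain dictionaries; the file names and the helpers `add` and `commit` are illustrative only.

```python
working_dir = {"a.txt": "new a", "b.txt": "new b"}   # files as currently edited
last_commit = {"a.txt": "old a", "b.txt": "old b"}   # previous snapshot
index = dict(last_commit)                            # staging area starts equal to it

def add(filename):
    """Stage the current working-directory version of one file."""
    index[filename] = working_dir[filename]

def commit():
    """The next snapshot is a copy of the staging area, not of the working directory."""
    return dict(index)

add("a.txt")
snapshot = commit()
print(snapshot)
```

Only a.txt was staged, so the new snapshot still carries the old version of b.txt even though b.txt has been edited.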
Git command-line interface
Basics
- git help <command>: get help for a git command
- git init: creates a new git repo, with data stored in the .git directory
  - git init --bare: create bare repositories (for central repositories, does not have working directory)
- git status: tells you what’s going on
- git add <filename>: adds files to staging area
- git commit: creates a new commit
- git log: shows a flattened log of history
- git log --all --graph --decorate --oneline: visualizes history as a DAG
- git diff <filename>: show what has changed but hasn't been added to the index yet via git add
- git diff --cached: show what has been added to the index via git add but not yet committed
- git diff <revision> <filename>: shows differences in a file between snapshots
- git checkout <revision>: updates HEAD and current branch
- git checkout HEAD^: check out to parent commit (move upwards 1 time); ^ (parent directly above); ^2 (2nd parent)
- git checkout HEAD~n: move upwards n times
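The ^ and ~ suffixes can be read as small navigation steps over the parent lists. The following sketch resolves such expressions on a made-up history; the commit names, the `parents` table and the `resolve` helper are assumptions for illustration and cover only a small subset of Git's revision syntax.

```python
import re

# Hypothetical history: M is a merge commit with first parent D and second parent F.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "F": ["C"], "M": ["D", "F"]}
head = "M"

def resolve(revision):
    """Resolve strings such as HEAD, HEAD^, HEAD^2, HEAD~3 or M^2~1 to a commit name."""
    match = re.match(r"[A-Za-z]+", revision)
    name = match.group(0)
    commit = head if name == "HEAD" else name
    # Apply each ^[k] or ~[n] suffix from left to right.
    for op, number in re.findall(r"([\^~])(\d*)", revision[match.end():]):
        count = int(number) if number else 1
        if op == "^":
            if count > 0:
                commit = parents[commit][count - 1]   # k-th parent
        else:
            for _ in range(count):                    # first parent, n times
                commit = parents[commit][0]
    return commit

for expression in ["HEAD^", "HEAD^2", "HEAD~3", "HEAD^2~1"]:
    print(expression, "->", resolve(expression))
```

Note the asymmetry: ^2 selects the second parent of a merge, while ~2 follows the first parent twice.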
Branching and merging
- git branch: shows branches
  - git branch -vv for verbose display
  - git branch -f main HEAD~3: moves (by force) the main branch 3 parents behind HEAD
- git branch <name>: creates a branch
- git checkout -b <name>: creates a branch and switches to it
  - same as git branch <name>; git checkout <name>
- git checkout -b foo origin/main: set local branch foo to track origin/main
- git merge <revision>: merges into current branch
  - need to stage the file to mark the conflict as resolved
  - git merge --continue instead of git commit to complete the merge
  - git merge --abort to abort the merge
- git mergetool: use a fancy tool to help resolve merge conflicts
- git rebase: rebase set of patches onto a new base
  - git rebase <basebranch> <topicbranch>: checks out the topic branch for you and replays it onto the base branch
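Since commits are immutable, "replaying" a topic branch onto a new base means creating new commits that carry the same changes. The sketch below models this for linear histories only; the `Commit` class, the string patches and the `rebase` helper are simplifications invented for this example (real rebasing re-applies diffs and can hit conflicts).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Commit:
    patch: str                    # stand-in for the change a commit introduces
    parent: Optional["Commit"]    # linear history only, for simplicity

def history(tip):
    """Return the commits from the root up to `tip`, oldest first."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = tip.parent
    return chain[::-1]

def rebase(base, topic):
    """Replay the commits unique to `topic` on top of `base`; return the new tip."""
    on_base = history(base)
    unique = [c for c in history(topic) if c not in on_base]
    new_tip = base
    for old in unique:
        new_tip = Commit(old.patch, new_tip)   # new commit: same patch, new parent
    return new_tip

root = Commit("init", None)
main_tip = Commit("main-1", root)
topic_tip = Commit("topic-2", Commit("topic-1", root))

rebased = rebase(main_tip, topic_tip)
print([c.patch for c in history(rebased)])
print([c.patch for c in history(topic_tip)])   # the old commits still exist unchanged
```

The original topic commits are untouched; a real rebase then moves the branch reference to the new tip.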
Remotes
- git remote: list remotes
- git remote add <name> <url>: add a remote
- git push <remote> <local branch>:<remote branch>: send objects to remote, and update remote reference
  - git push <remote> :<remote branch>: deletes remote branch
- git branch --set-upstream-to=<remote>/<remote branch>: set up correspondence between local and remote branch
  - git branch -u origin/main foo: set local branch foo to track origin/main
- git fetch: retrieve objects/references from a remote (does not update local branch)
  - git fetch <remote> <remote branch>:<local branch>: fetch remote branch from remote to local branch
  - git fetch <remote> :<local branch>: fetching nothing makes new local branch
- git pull: same as git fetch; git merge
  - git pull --rebase: fetch and rebase
  - git pull origin foo: git fetch origin foo; git merge origin/foo
  - git pull origin bar:bugFix: git fetch origin bar:bugFix; git merge bugFix
- git clone: download repository from remote
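The <source>:<destination> argument used by push can be modeled with two dictionaries of references. This is a deliberately simplified sketch: `parse_refspec` and `push` are hypothetical helpers, the optional leading + and wildcard patterns are ignored, a refspec without a colon is assumed to use the same name on both sides, and no objects are transferred.

```python
def parse_refspec(refspec):
    """Split '<source>:<destination>' into its two halves (simplified)."""
    source, separator, destination = refspec.partition(":")
    if not separator:
        destination = source      # assumption: same name on both sides
    return source, destination

def push(remote_refs, local_refs, refspec):
    """Model of pushing one refspec, using plain dictionaries of references."""
    source, destination = parse_refspec(refspec)
    if source == "":
        remote_refs.pop(destination, None)             # empty source: delete remote branch
    else:
        remote_refs[destination] = local_refs[source]  # update remote reference

local = {"main": "C3", "bugFix": "C5"}    # placeholder commit ids
remote = {"main": "C1", "old": "C0"}

push(remote, local, "bugFix:feature")     # local bugFix -> remote feature
push(remote, local, ":old")               # delete remote branch old
push(remote, local, "main")               # main -> main
print(remote)
```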
Undo
- git commit --amend: edit a commit’s contents/message
- git reset HEAD <file>: unstage a file
- git reset HEAD~1: move a branch backwards
- git checkout -- <file>: discard changes
- git revert HEAD: creates a new commit that effectively negates the changes introduced by the specified commit
Advanced Git
- git config: Git is highly customizable (settings live in ~/.gitconfig)
- git clone --depth=1: shallow clone, without entire version history
- git add -p: interactive staging
- git rebase -i: interactive rebasing
- git blame: show who last edited which line
- git stash: temporarily remove modifications to working directory
  - git stash pop to undo the stash
- git bisect: binary search history (e.g. for regressions)
- .gitignore: specify intentionally untracked files to ignore
- git show <commit>: show commit
- git cherry-pick <commit1> <commit2>: apply the changes introduced by specific commits from one branch to another branch
- git tag <tagname> <commit>: add tag at commit
- git describe <ref>: describe where you are relative to the closest tag
  - outputs <tag>-<numCommits>-g<hash>
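The idea behind git bisect, a binary search over history for the commit that introduced a regression, can be sketched in a few lines. The function `bisect`, the commit names c0 to c15 and the pretend regression at c11 are all invented for this example; it assumes a linear history where the first commit is good, the last is bad, and every commit after the first bad one is also bad.

```python
def bisect(commits, is_bad):
    """Return the first bad commit in a list ordered oldest to newest."""
    good, bad = 0, len(commits) - 1     # known-good and known-bad positions
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle
        else:
            good = middle
    return commits[bad]

history = [f"c{i}" for i in range(16)]  # hypothetical commit names
tested = []

def is_bad(commit):
    """Stand-in for running the test suite on a checked-out commit."""
    tested.append(commit)
    return int(commit[1:]) >= 11        # pretend the regression landed in c11

print(bisect(history, is_bad), "after", len(tested), "tests")
```

With 16 commits only 4 checks are needed, compared with up to 14 when testing commits one by one.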
Workflows
Types of workflows:
- Centralized workflow
  - centralized repository that individual developers will push and pull from
- Feature branch workflow
  - all feature development should take place in a dedicated branch instead of the main branch
- Gitflow workflow
  - develop, feature, release, hotfix branches
- Forking workflow
  - instead of a single server-side repository to act as the "central" codebase, it gives every developer a server-side repository
  - each contributor has two Git repositories: a private local one and a public server-side one
Gitflow:
The overall flow of Gitflow:
- A develop branch is created from main
- A release branch is created from develop
- Feature branches are created from develop
- When a feature is completed it is merged into the develop branch
- When the release branch is done it is merged into develop and main
- If an issue in main is detected a hotfix branch is created from main
- Once the hotfix is complete it is merged to both develop and main
References:
- https://www.endoflineblog.com/gitflow-considered-harmful
- https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow
- https://nvie.com/posts/a-successful-git-branching-model/
Write good commit messages
Example:
Summarize changes in around 50 characters or less

More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as the
subject of the commit and the rest of the text as the body. The
blank line separating the summary from the body is critical (unless
you omit the body entirely); various tools like `log`, `shortlog`
and `rebase` can get confused if you run the two together.

Explain the problem that this commit is solving. Focus on why you
are making this change as opposed to how (the code explains that).
Are there side effects or other unintuitive consequences of this
change? Here's the place to explain them.

Further paragraphs come after blank lines.

 - Bullet points are okay, too

 - Typically a hyphen or asterisk is used for the bullet, preceded
   by a single space, with blank lines in between, but conventions
   vary here

If you use an issue tracker, put references to them at the bottom,
like this:

Resolves: #123
See also: #456, #789
The seven rules of a great Git commit message
1. Separate subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how
- One more thing: atomic commits!
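Several of these rules are mechanical enough to check automatically, for instance from a commit hook. The sketch below is one possible checker; the function name and the messages it returns are invented here, and the rules about imperative mood and explaining what and why are left out because they need human judgment.

```python
def check_commit_message(message):
    """Return a list of rule violations (empty if none were found)."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if len(lines) > 1 and lines[1] != "":
        problems.append("separate subject from body with a blank line")
    if len(subject) > 50:
        problems.append("limit the subject line to 50 characters")
    if subject[:1].islower():
        problems.append("capitalize the subject line")
    if subject.endswith("."):
        problems.append("do not end the subject line with a period")
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("wrap the body at 72 characters")
    return problems

good = (
    "Fix off-by-one error in pagination\n"
    "\n"
    "The last page was dropped when the item count was a multiple\n"
    "of the page size."
)
bad = "fixed stuff.\nsee ticket"

print(check_commit_message(good))
print(check_commit_message(bad))
```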
Exercises
TO-DO