* dVCS by way of git

  Sharing complete repositories is a simple concept, but it involves a
  subtle paradigm shift. That shift imposes some new constraints and,
  in turn, opens up interesting new pastures.

  In this talk I will demonstrate some of these constraints
  and their solutions, as implemented in git.

* A look back at SVN

** Linear history is normal, all graphs are trees

  In other words, any given commit can have many children, but only
  one parent.

** Merging is painful and error prone

  Most solutions to this problem involve writing appropriate commit
  logs or writing out to files so merges can be traced. Screwing this
  up can be bad, and as a result merging is avoided as much as
  possible.

** Sharing changes consists of mailing patches

  Obviously this was all workable, but it didn't exactly endear
  itself to lazy people like myself. The existence and popularity of
  CVSup in spite of being written in Modula-3 shows the value of
  repository sharing. You can think of git as CVSup done right.

* Constraints

** Repositories are collections of interwoven histories

  So:

*** Offline operation means history is frequently not linear
*** Merging must be easy
*** Sharing changes must be easy

* How git satisfies dVCS constraints

** History is no longer linear

  Time is no longer a useful identifier when comparing the history of
  disparate repositories, and thus can't be used for commit
  identifiers. Something new must be found.

*** git uses SHA hashes to identify repository objects

  SHA-1 hashes are the basic identifier of every object in the git
  system, which yields a bunch of nice properties we'll get into
  later.
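
  To make this concrete, here is a minimal Python sketch of how an
  object id is derived. The helper name `git_object_id' is mine, not
  part of git. What git hashes is a small header ("<type> <length>"
  plus a NUL byte) followed by the object's contents.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id for an object of the given type and contents."""
    # The header carries the object type and the body length in bytes.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Same id that `echo 'hello world' | git hash-object --stdin` reports.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

  Note that the id depends only on the type and the bytes, never on a
  file name or a timestamp.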

** Merging is elevated to a first class operation

  Git makes merging easier. It will probably never be trivial, but
  git at least automates the grunt work of tracking down common
  ancestors to reduce conflicts and ease merging.

** Branching is trivial and encouraged

  Creating a branch is just creating another ref pointing to an
  existing commit. It's very fast and efficient. It's very easy to
  move things between branches, and they are encouraged for any
  non-trivial work. It doesn't even mess up your history graph a lot
  of the time, and when it does you can often alter it so it does not.

*** What is the object store?

**** blobs
  Blobs are opaque chunks of binary data: the contents of a file, with
  no name attached.

**** trees
  Trees list named entries, each pointing to a blob (a file) or to
  another tree (a subdirectory).
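
  A rough Python sketch of what a tree object looks like on the
  inside. The helper names are hypothetical, and the entry sort here
  is a plain sort by name, which is a simplification of git's own
  ordering rule for directories. Each entry is "<mode> <name>", a NUL
  byte, then the 20 raw bytes of the entry's id.

```python
import hashlib


def object_id(obj_type, body):
    """SHA-1 over the "<type> <length>\\0" header plus the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries into a tree object body."""
    body = b""
    # Simplification: sort by name only (git treats directory names specially).
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return body


blob = object_id("blob", b"hello world\n")
subdir = object_id("tree", tree_body([("100644", "hello.txt", blob)]))
# Mode 100644 is a regular file; mode 40000 marks an entry that is itself a tree.
root = object_id("tree", tree_body([("100644", "README", blob),
                                    ("40000", "docs", subdir)]))
print(root)
```

  The same blob id appears twice here, under two different names: the
  name lives in the tree, not in the blob.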

**** commits
  A git commit references a tree object and its parent commits,
  along with meta-data: message, author, committer, and so forth.

  In git, a commit can have many parents, as opposed to SVN where a
  commit can have only one parent. Every commit references a tree, so
  when you resolve conflicts during a merge, the resolutions are
  recorded in the merge commit's tree object.
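
  As a sketch, a commit object is just a short piece of text. The
  Python below builds one by hand; the helper names and the identity
  `A U Thor <author@example.com>' are made up for illustration, and
  the ids passed in are stand-ins rather than real commits.

```python
import hashlib


def object_id(obj_type, body):
    """SHA-1 over the "<type> <length>\\0" header plus the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message):
    """Serialize a commit: one tree line, zero or more parent lines, meta-data."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # order matters for ^1, ^2, ...
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


EMPTY_TREE = object_id("tree", b"")
WHO = "A U Thor <author@example.com> 1200000000 +0000"  # hypothetical identity

root = object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO, "initial\n"))
left = object_id("commit", commit_body(EMPTY_TREE, [root], WHO, WHO, "left\n"))
right = object_id("commit", commit_body(EMPTY_TREE, [root], WHO, WHO, "right\n"))
merge = commit_body(EMPTY_TREE, [left, right], WHO, WHO, "merge\n")
print(merge.decode("utf-8"))
```

  A root commit has no parent line, an ordinary commit has one, and a
  merge has two or more.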


**** tags
  Tag objects contain a commit id, an optional message, and an
  optional cryptographic signature. If neither is present, a tag is
  merely a ref that points straight at the commit.

*** All objects are identified by SHA hashes.

  Identifying objects by hash has a number of advantages:

    # since the type and length are part of the hashed object, all
      object types can share one namespace.

    # good entropy properties for building hash tables

    # finding tree changes is very simple, since a tree's hash covers
      its subtrees and files: equal hashes mean equal contents.

    # the system and its history are trivially verifiable. Each
      commit's hash covers its parents' hashes, so a commit
      effectively vouches for its entire history.
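
  The verifiability point can be sketched in a few lines of Python.
  This is a toy, under stated assumptions: the commits here omit the
  author and committer lines, every tree holds a single entry named
  `file', and `history_ids' is a hypothetical helper.

```python
import hashlib


def object_id(obj_type, body):
    """SHA-1 over the "<type> <length>\\0" header plus the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def history_ids(file_versions):
    """Commit each version of one file in turn; return the commit ids in order."""
    ids, parents = [], []
    for n, content in enumerate(file_versions):
        blob = object_id("blob", content)
        tree = object_id("tree", b"100644 file\0" + bytes.fromhex(blob))
        # Simplified commit: tree line, parent lines, blank line, message.
        text = f"tree {tree}\n"
        text += "".join(f"parent {p}\n" for p in parents)
        text += f"\ncommit {n}\n"
        commit = object_id("commit", text.encode("utf-8"))
        ids.append(commit)
        parents = [commit]
    return ids


original = history_ids([b"a\n", b"b\n", b"c\n"])
tampered = history_ids([b"X\n", b"b\n", b"c\n"])  # only the first version differs
# Every id changes, because each commit's hash covers its parent's hash.
print([o == t for o, t in zip(original, tampered)])
# [False, False, False]
```

  So if you trust the id of the newest commit, you can check
  everything reachable from it.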

*** Investigating the object store

**** TODO Show perl code and output of commit/tree/blob from .git/objects

**** There is no delta concept in the object store
   Deltas are generated by `git gc' when it creates pack files.
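
  Until they are packed, objects live as individual zlib-compressed
  files under .git/objects, named by their hash. Here is a minimal
  Python sketch of writing and reading one. It works on a scratch
  directory rather than a real repository, it only handles loose
  objects (not pack files), and both function names are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, obj_type, body):
    """Store an object the way git lays out loose objects; return its id."""
    store = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    oid = hashlib.sha1(store).hexdigest()
    # First two hex digits name the directory, the remaining 38 the file.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def read_loose_object(git_dir, oid):
    """Return (type, body) for a loose object, checking the recorded length."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    header, _, body = store.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("object length does not match its header")
    return obj_type, body


with tempfile.TemporaryDirectory() as scratch:
    oid = write_loose_object(scratch, "blob", b"hello world\n")
    print(oid, read_loose_object(scratch, oid))
```

  The whole object is stored, not a delta against some earlier
  version.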

** SHA hashes are a pain to type

  Git has a concept of `refs' which are typically symbolic references
  to commits. At the end of the day, every ref ends up as a SHA hash.

*** SHA hashes can typically be shortened to a few characters

*** tags are fixed refs

  Tags always refer to a commit, but can also contain a cryptographic
  signature and message, in which case the ref points to a tag object,
  which, in turn, points to a commit. For almost any use of tags, you
  don't need to care about this, since git is fairly smart about it.

*** branches and HEAD are symbolic refs

  Branches are moving refs and always reference their tips. HEAD
  normally names the current branch rather than a commit, so it
  resolves to the tip of that branch.
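
  As a sketch of how a ref ends up as a SHA hash, the Python below
  follows the files in a git directory. It assumes the loose-file
  layout, where `HEAD' holds a line like "ref: refs/heads/master" and
  the branch file holds the hash. It ignores packed refs, and
  `resolve_ref' is a hypothetical helper, not git's own code.

```python
import os
import tempfile


def resolve_ref(git_dir, name, max_depth=5):
    """Follow "ref: ..." indirections until a plain hash is found."""
    for _ in range(max_depth):
        with open(os.path.join(git_dir, *name.split("/"))) as f:
            value = f.read().strip()
        if value.startswith("ref: "):
            name = value[len("ref: "):]  # points at another ref: keep following
        else:
            return value  # a plain hash: done
    raise ValueError("too many levels of indirection")


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        # Stand-in hash for illustration, not a real commit.
        f.write("3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n")
    print(resolve_ref(git_dir, "HEAD"))
```

  Committing on a branch only rewrites the branch file; HEAD keeps
  pointing at the branch name.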

*** $ref^ and $ref~$n

  You can follow parents by using caret or tilde notation. For merge
  commits, parents are numbered in the order they are listed in the
  commit object.

  # ^ is the parent, ^^ is the parent's parent, and so on
    e.g: HEAD^ (The next most-recent commit on the current branch)

  # ^2 is the second parent of a merge commit
    e.g: HEAD^2 (The tip of the branch that was merged in)

  # ~2 is shorthand for ^^
    e.g: HEAD~2 (The third most-recent commit on the current branch)
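
  The notation can be sketched as a small walk over the parent lists.
  This Python toy uses a made-up five-commit history and a
  hypothetical `resolve' helper; it covers only the ^ and ~ suffixes,
  not the rest of git's revision syntax.

```python
import re

# Hypothetical history: commit name -> parents, in commit-object order.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "X": ["A"],
    "M": ["C", "X"],  # merge commit: first parent C, second parent X
}


def resolve(rev, refs, parents=PARENTS):
    """Resolve strings like HEAD^, HEAD~2 or HEAD^2^ to a commit name."""
    match = re.match(r"^([^~^]+)((?:[~^]\d*)*)$", rev)
    if match is None:
        raise ValueError(f"cannot parse {rev!r}")
    commit = refs.get(match.group(1), match.group(1))
    for op, digits in re.findall(r"([~^])(\d*)", match.group(2)):
        n = int(digits) if digits else 1
        if op == "~":
            for _ in range(n):  # ~n: follow the first parent n times
                commit = parents[commit][0]
        elif n > 0:  # ^n: take the n-th parent; ^0 is the commit itself
            commit = parents[commit][n - 1]
    return commit


refs = {"HEAD": "M"}
for rev in ["HEAD^", "HEAD^^", "HEAD~2", "HEAD^2", "HEAD^2^"]:
    print(rev, "->", resolve(rev, refs))
```

  Tilde always follows first parents, so on a merge it stays on the
  branch that was merged into.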

*** $ref@{N}

  Where $ref pointed N moves ago, as recorded in the reflog. You can
  also specify by date, e.g. $ref@{yesterday}.

** Sharing commits

*** Remotes

  Remotes are named repositories. They're useful when you push or pull
  from the same repository repeatedly. The `origin' remote is used as
  the default remote with many commands.

*** Implicit read-only "vendor" branches.

  When you fetch a remote you get all its objects, so you can always
  look at any point of its history. This duplicates `vendor branch'
  functionality.

*** Push and Pull

  You fetch changes via `fetch', but frequently use `pull' instead,
  which does a fetch and then merges the tracked remote branch into
  the current branch. To publish changes to a remote, use `push'.

*** Space efficient

  Because the object store identifies objects by the SHA of their
  contents, it won't store duplicates. In effect, this means the cost
  of an additional remote is only the cost of its differences.
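
  A toy Python model of why this holds. The `ObjectStore' class is
  hypothetical and stores blobs only; a real fetch negotiates over
  commits and transfers packs, which this sketch does not attempt.

```python
import hashlib


def blob_id(content):
    """SHA-1 over the "blob <length>\\0" header plus the content."""
    header = f"blob {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Content-addressed store: the key is derived from the value."""

    def __init__(self):
        self.objects = {}

    def add(self, content):
        oid = blob_id(content)
        self.objects.setdefault(oid, content)  # identical content is stored once
        return oid

    def fetch_from(self, other):
        """Copy only the objects we lack; return how many were copied."""
        missing = {oid: data for oid, data in other.objects.items()
                   if oid not in self.objects}
        self.objects.update(missing)
        return len(missing)


local, remote = ObjectStore(), ObjectStore()
for content in (b"README\n", b"main.c\n"):
    local.add(content)
    remote.add(content)
remote.add(b"patch from upstream\n")

print(local.fetch_from(remote))  # 1: only the object we did not already have
```

  Adding a second remote that shares most of its history costs only
  the objects that are new.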

*** Transports

  # HTTP
  # GIT
  # SSH
  # Rsync
  # Files
  # Others

*** Example

* Merge strategies

*** Fast forward

  When the merge target is an ancestor of the other branch, this just
  moves the target branch's ref to the tip of the other branch. No
  merge commit is needed.
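
  The check behind a fast forward can be sketched in Python over a toy
  history. The commit names, the `refs' dictionary, and both helpers
  are hypothetical; this is the idea, not git's implementation.

```python
def is_ancestor(ancestor, commit, parents):
    """True if `ancestor` is reachable from `commit` (or is `commit`)."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def try_fast_forward(refs, target, other, parents):
    """Move `target` to `other`'s tip if no real merge is needed."""
    if is_ancestor(refs[target], refs[other], parents):
        refs[target] = refs[other]
        return True
    return False  # histories diverged: a real merge is required


# Toy history: A <- B <- C on one line, plus X branching off A.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["A"]}

refs = {"master": "A", "topic": "C"}
print(try_fast_forward(refs, "master", "topic", parents), refs["master"])
# True C

refs = {"master": "X", "topic": "C"}
print(try_fast_forward(refs, "master", "topic", parents), refs["master"])
# False X
```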

*** Recursive

  Used when more than one common ancestor exists. Builds the merge
  base revision by recursively merging common ancestors.
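
  How more than one common ancestor comes about is easiest to see on a
  criss-cross history. The Python below only finds the candidate merge
  bases on a made-up graph; it does not perform the recursive merge
  itself, and the helper names are hypothetical.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}


# Criss-cross: A and B both start from O, then each side merges the other.
parents = {
    "O": [],
    "A": ["O"],
    "B": ["O"],
    "M1": ["A", "B"],
    "M2": ["B", "A"],
}

print(sorted(merge_bases("A", "B", parents)))    # ['O']
print(sorted(merge_bases("M1", "M2", parents)))  # ['A', 'B']
```

  With two candidates, A and B, the recursive strategy first merges
  those into a temporary base and then uses that for the real merge.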

*** And others

  See git-merge(1)

* A brief note on the index

  The index stores the tree object of the commit-to-be.

  # adding to the index cache: git add
  # removing: git rm --cached

** git reset

  Can be used to reset the index, or certain files in the index, to a
  given commit, which is HEAD by default.

* Problems git solves

** Mixed two patches together

  # git reset $filename
  # git add --patch
  # git commit [--amend]

** In combination with git rebase, entire histories can be manipulated

  # git rebase -i $ref
  # git reset HEAD^
  # git add --patch
  # git commit -c ORIG_HEAD
  # git add -u
  # git commit
  # git rebase --continue

* My seekrit agenda

  I like git. It's the first version control system that allows me to
  use it the way I want to. I commit like crazy, I like to use
  branches, and I make a lot of mistakes. Git encourages this, or at
  least gives you the tools to recover. It's a security blanket.

* Additional Resources

  # Git - SVN Crash Course
    <http://git.or.cz/course/svn.html>

  # GitWiki
    <http://git.or.cz/gitwiki/FrontPage>

  # Git User's Manual
    <http://www.kernel.org/pub/software/scm/git/docs/user-manual.html>

  # Extensive Man Pages