One of the first things you set up when starting a software startup is a
version control system. When we started Vakow! we decided to use SVN: we had
gained lots of experience with it in previous jobs, it is free, and more
importantly, the cheapest Dreamhost account allowed us to run a central SVN
repository without any problems. Another reason to use it could be
TortoiseSVN for Windows users. SVN just works, and the central repository
workflow is easy to fit in one's mind: there is a repository, you check out,
you work, you see what you have changed so far, you commit, and you update to
get changes made by others. There are tags and branches, but to SVN they are
nothing but folders. The weak part of SVN is merging. The basic merge offered
by SVN is doable if you keep track of the merged revision numbers in the
commit logs; this, though tedious, is usually manageable. I was pretty happy
with the setup. Bugzilla integration worked just fine, though I am yet to
publish my SVN-Bugzilla integration script that works on Dreamhost, thanks to
endless procrastination. Soon I will. Promise. :-)
Another reason to use SVN for Vakow! when we started was that, well, Git
wasn't there then.
So what is the problem with SVN? Like I said above, SVN gives you almost no
help with merging. There is no native concept of a merge in SVN. SVN is a
linear history of one big folder, which is organised into trunk and branches
by convention, and there is no native support for trunk, branches or tags
either in SVN (1.4). These were all cool decisions by the SVN developers:
they made SVN easy for developers to grasp, and its simplicity allowed a
robust implementation [without letting the Gods (= Linus et al) come into the
picture], all of which led to widespread adoption. But I digress.
Because of the lack of merge capability, working with branches is
difficult. The workflow for branches is this: you have some feature that will
take some time to develop, and you do not want your customers to learn about
it prematurely, or maybe the new feature will destabilize the main code for
some time before it is stable, so you branch off. In SVN you create a copy of
your trunk in a new folder, which by convention resides under a folder called
"branches". You work on trunk, your main stable code, mostly on bug fixes and
minor features, and you work in parallel on the new feature branch. SVN is
excellent at letting you do this. But after the work is done, you ultimately
have to combine the changes you have made in trunk and in the feature branch
and move the result to trunk. This is what SVN is not good at. SVN does not
know about branches; a branch is just a folder, so it can't merge. What it
can do is take the diff between two revisions of any folder and give you a
patch file, and then you can apply this patch file to some other code and get
a merge.
This is how it works: let's say you created the feature branch at revision
100, and have been developing trunk and branch till revision 200, when you
realize you want to create a build to give to testers, and you have to make
sure the changes from rev 100 to 200 on trunk also get into the branch. So
you create a diff from revision 100 to 200 on trunk, and apply it to the
branch. But merging is not trivial: while working on the feature branch you
may have changed the same files and the same lines that other developers
changed in trunk. You have to resolve these conflicts manually, and it is a
laborious process. But what happens if the testers say no go, and find 10
more bugs for you to fix? You could revert the changes merged from trunk, to
keep things clean, so that when you are on rev 300, let's say, you can again
take the changes on trunk from rev 100 to 300 and apply the patch on the
branch. Or you can let the changes merged at rev 200 stay, and keep working
separately on trunk and branch. Either way, when you have to merge the
changes from trunk again in future, you have to remember your decision, so
you must keep it logged in the SVN commit logs or somewhere. The biggest
issue is that when you merge in SVN, you lose history: you lose exactly how
each file changed over time, and the person who merged is logged as the
person who made all the changes. A terrible thing, in my opinion.
What happens if more than one branch is involved and merges are brought back
and forth between them? The method of merging I described above becomes too
difficult to keep track of; remember that in real life the revision numbers
are not as round as the 100 and 200 I used above. This leads to lots of
uncertainty, and programmers hate uncertainty. All this leads to a general
reluctance among programmers to use branches, to branches being considered a
necessary evil, and to a constant effort to keep the number of branches at a
minimum, with a clearly defined head of each branch who is responsible for
merging and for making sure nobody else is applying merges and messing with
the revision numbers. A small mistake, let's say you merged revisions
1946:2045 instead of 1945:2045, may lead to an important bug fix getting lost
in the process of merging. Headaches.
I managed with this at Vakow!: I almost never worked on any branch for any
significant time, and given that we were just two people, of which only one
can be considered a real programmer, it was not really a big issue. And after
all, until Git/Mercurial started to become fashionable about 6-8 months ago
[or this is when I started to learn about them], this was the state of the
art of version control for me.
So how does Git help? Well, the first difference between SVN and Git is that
Git is distributed whereas SVN is centralized. What does that mean, and how
does it make merging easier? I am not sure I am absolutely correct about it,
but this is what I understand so far. This will make most sense to SVN
veterans: in Git there is no central repository, and every "checkout" is
"complete". It not only contains the latest code, as a checked-out working
copy in SVN does, but also gets the complete revision history and all tags
and branches. This might sound astounding: what if you had 1000s of checkins
and tens of branches, how much space would it all take? But the Gods did step
in when Git came into existence, so they solved this issue, and a typical Git
clone in all its glory compares well with an SVN checkout when it comes to
disk space and even network transfer. These are things I don't usually bother
much about as long as they are manageable, so don't tell me if one of them is
some percent faster or smaller than the other for some operation. Since the
repository is with you in Git, lots of things become fast: checking the log
is blazing fast even for the oldest commits, and so is creating branches and
committing. But this is not why Git and other distributed version control
systems shine. I digress again.
Because you have the whole revision history for each branch and trunk, you
can do something cool when merging. In Git, a branch is not a branch of a
folder as in SVN; it is a branch of a commit, and Git remembers where each
commit came from: which branch, and what revision. Let's take our original
example: branch at 100, merge at 200. Of course Git does not use numbers like
this, because it is distributed: if revision numbers were auto-incremented,
both you and I could check in and get version number 101, and when merging
that number would serve no purpose. So Git relies on cryptographic hashes,
based on the commit's changes and author info, for revision ids. Anyway,
let's say those ids were 100 and 200, and that when we are merging the branch
is feature[trunk*100] (Git keeps track of the origin of a branch). This is
what, as far as I understand, Git does to merge: it goes back to revision
100, when both trunk and branch had the same content. Then it starts applying
changes in the order they happened: let's say the first change happened on
trunk, so it applies that, then the next change happened on the branch, so it
merges that, and so forth. This is possible because the entire change history
is available to Git. If there were no conflicts, by the end of it you have
all the changes on trunk applied to the branch, and Git commits by default.
The branch now becomes feature[trunk*200], because it is now effectively a
branch of revision 200 of trunk. You did not have to remember the revision
numbers. Branch-based coding heaven!
What happens if the 30th commit leads to a conflict? I am not sure about it.
If I were designing Git I would probably just skip that commit and go on, and
so on for each conflict-causing commit, and at the end apply all the
conflicting commits on top. I am just speculating; conflicts will still cause
problems, but because changes are applied in the sequence in which they
happened, there are fewer conflicts than when one big SVN-style patch is
applied to a branch that has moved far into the future. Incremental merging
will be less error-prone than such bulk merging. I just realized I was wrong:
Git does something even better (I am glad I did not design it :-). It stops
at the first conflict and lets you resolve it manually before proceeding. By
the end of it you will have all the changes merged cleanly, and at any time
you will only be trying to resolve one conflict, whereas in an SVN-style bulk
merge you would have to resolve conflicts caused by several conflicting
changes at once.
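Whatever the exact order in which Git replays things, the part that removes
the bookkeeping is that Git can find the point where two branches diverged by
itself. Here is a minimal sketch in a throwaway repository; the branch name
"feature", the file names and the commit messages are all made up for
illustration, and it uses plain Git with no SVN involved:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the main line "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "common starting point"
fork_point=$(git rev-parse HEAD)          # plays the role of "revision 100"

git checkout -q -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "work on the feature branch"

git checkout -q master
echo "bug fix" > fix.txt
git add fix.txt
git commit -q -m "bug fix on the main line"

# Git works out the common ancestor on its own; nobody had to write it down.
found=$(git merge-base master feature)
echo "branches diverged at: $found"

# Bring the main-line fix into the feature branch.
git checkout -q feature
git merge -q -m "merge master into feature" master
ls
```

The id printed by "git merge-base" is the commit that plays the role of
revision 100 in the example above.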
Enough of theory. All this still does not solve the problem for Vakow!. We
still have others who do not understand Git, who like the simplicity of SVN,
or who are just used to it and consider learning one revision control system
enough for a lifetime. I have not yet had time to rewrite and deploy my
SVN-Bugzilla integration scripts, or to get someone else's. And because I am
not sure it will just work with Dreamhost, and because of the lack of a
TortoiseSVN equivalent, etc., I am still not ready to switch to Git on the
server. Next month maybe, not yet. And this is from a sysadmin and CTO who is
completely convinced that the switch will be beneficial in the long run!
There are other poor souls who are stuck with SVN, either because their
startup/company is still using SVN and is going to for some time, or because
their favorite open source project is stuck with SVN, whether because
code.google.com/sf.net only support SVN or because of the excellent SVN-Trac
integration that so many open source projects are so fond of. Or for other
reasons, like wanting to switch but not being able to decide between Git,
Mercurial, Bazaar and a few others; I would advise them to just move to Git.
For one reason or another, people are going to be stuck with SVN for some
time, and for them there is Git-SVN.
Git SVN is a cool two-way bridge between Git and SVN, to be used when you
love Git but your company/upstream team is stuck with SVN. I learnt about it
from this blog post; I am writing these comments after about a month of
full-time Git SVN usage.
First thing is
getting SVN history into local Git:
git svn clone https://svn.foo.com/svn/proj --trunk=trunk --branches=branches --tags=tags
One of the peculiarities of my SVN repository was that I did not have a
trunk when I began coding. I just got the startup idea and was at the 80th
revision by the time I realized I had not followed the usual layout, and then
I restructured my SVN repository into the usual trunk, branches, tags
hierarchy. This led to some problems. When I first tried that command, I
skipped the parameters, as the man page told me those were the default values
anyway. Sure enough I got an error, and then remembered my SVN history. Then
I panicked a little. I tried cloning just the trunk, but that failed too, as
trunk was not there in the beginning. So as a last resort, without much hope,
I tried the full command, supplying the default values for --trunk etc. And
Git went to work. It skipped the first 80 or so commits, but I was happy as
it got the remaining 2000 of them. It kept stopping because my network was
flaky, but it was robust enough that simply restarting the process continued
from where it had stopped. I was already becoming a fan of its robustness.
:-)
The first thing I did after this was to move into the directory and run
gitk. This is a GUI log browser, and I was quite delighted to see all the
revisions from more than a year back, with search and color-coded diffs. It
is way better than my old solutions: a ViewSVN-based website for browsing
history, which was terribly slow, or TortoiseSVN's log feature, which again
was terribly slow and had no provision to search or highlight by author etc.
This alone was my justification for keeping a Git clone of my SVN repository
fresh for quite some time, just to see the logs.
One of the reasons I picked Git over Mercurial was the concept of the index
in Git. On more than one occasion I committed more than I intended when using
SVN, and Mercurial was going to be the same in this regard, but not Git. In
SVN, as in all other decent version control systems, a file has to be added
manually before SVN starts keeping track of it. The problem is that during
debugging I would often change more than what was minimally needed to fix the
issue, and then had to be really careful to pick only the files I intended to
commit. This is where TortoiseSVN shines: it made this process very robust,
at least if you follow the best practices. On the command line, this led to
errors. So I was quite interested in Git, in which you have to add the file
again after every change, as Git does not track files, it tracks content, and
it commits only the content that was there when you added the file using
"git add".
Anyway, if you prefer, you can get commit behavior very similar to SVN's,
but I like the Git default.
First things first.
By the end of "git svn clone" this is what will have happened: you will have
a folder named after your project, derived from the SVN path. This folder
will contain the latest trunk.
Note: Git repositories are not cluttered with .svn-like folders all over;
there is only one .git folder, in the top-level folder, which contains all
Git-related data.
Now the work begins.
Let's say you made some changes in trunk. You can view the changes with
"git diff". If you jump ahead and add a file that you have decided to commit
by calling "git add filename", "git diff" will stop showing the changes in
that file, or more strictly, the changes made to that file up to the moment
you added it. Those changes have gone into the "index". To see the changes in
the index you have to run "git diff --cached".
You can always see the status of the files you have modified or added to the
index for checkin by running "git status".
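To see the index at work, here is a small sketch you can run in a scratch
directory. The file name notes.txt and its contents are invented for the
example:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "first version"

echo "two" >> notes.txt            # change made before "git add"
git add notes.txt                  # this change is now in the index
echo "three" >> notes.txt          # change made after "git add"

unstaged=$(git diff)               # working copy vs index: only the line "three"
staged=$(git diff --cached)        # index vs last commit: only the line "two"

echo "$unstaged"
echo "$staged"
git status --short                 # "MM notes.txt": staged and unstaged changes
```

A "git commit" at this point would record the line "two" but not the line
"three", which is exactly the behavior that saves me from committing more
than I intended.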
The next thing we are going to do is commit. As you have seen already, just
changing a file is not enough; you have to add it again before you can commit
anything. You commit by running "git commit", obviously enough, but if you
are a command line warrior, you will miss/hate the fact that Git, unlike SVN,
does not think "git ci" is the same as "git commit". But if you are on a
decent shell and operating system, the excellent tab completion won't let you
miss it all that much. Anyway. And yes, if you are coming from SVN, do not be
surprised by the speed of git commit: it is nearly instantaneous because it
is committing to your local branch. Your fellow developers using SVN will not
see the commit yet. But you can go on committing while the net is not
available.
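If you do miss "git ci", Git lets you define such shortcuts yourself through
its alias setting. A small sketch, tried out in a throwaway repository first
(the file name and commit message are made up):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Define "git ci" as a short name for "git commit". Written like this the
# alias applies to this repository only; "git config --global alias.ci commit"
# would make it available in every repository.
git config alias.ci commit

echo "hello" > readme.txt
git add readme.txt
git ci -q -m "committed through the alias"

last_message=$(git log -1 --pretty=format:%s)
echo "$last_message"
```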
If you do not like the process of adding a file before committing, and
prefer the SVN way, you can do "git commit -a", which will pick up the
changes in all files that are already being tracked.
No point committing if nobody can see it. To push your changes upstream, to
the real SVN repository, you have to run "git svn dcommit". This will commit
all the changes on the current branch that have not been committed to SVN
yet.
A note about SVN pre-commit hooks: some places have SVN pre-commit hooks
that do not let a commit through unless the log message mentions a bug
number, or the files include a copyright notice at the top or conform to the
code formatting practice, etc. In those cases the previous step may cause
problems if you did not conform to those rules while committing to Git. The
obvious answer is to be careful, but that is not always enough. If possible
you should learn about Git commit hooks and create ones that mirror your SVN
repository's commit hooks, to ensure that errors do not take place. This may
mean checking whether the bug exists before each git commit happens, slowing
down the whole blazing git commit experience, but then this is how it is. If
you want everything, you have to be really smart to avoid those pesky hooks
altogether, but then if you don't use tools and you look like us, most
probably you are a chimp. For the purposes of this howto, just understand
that with Git it is trivial to undo your commits and redo them if you need to
fix some old commit, but spare yourself the trouble: write Git hooks, and get
the tools working for you [if you have upstream SVN pre-commit hooks. Which
BTW you should.].
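As a starting point, here is a rough sketch of a Git commit-msg hook. The
rule it enforces, that the log message must contain "Bug" followed by a
number, is only an assumed example of what an upstream SVN hook might check;
replace it with whatever your repository really requires:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Install a commit-msg hook. Git runs it with the path of the file holding the
# proposed log message as $1 and aborts the commit if the hook exits non-zero.
# The "Bug <number>" rule is a made-up stand-in for the real SVN hook's rule.
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
if ! grep -Eq 'Bug [0-9]+' "$1"; then
    echo "commit message must mention a bug number, e.g. 'Bug 123'" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

echo "fix" > fix.txt
git add fix.txt

# Rejected: no bug number in the message.
if git commit -q -m "fixed the login page" 2>/dev/null; then
    first_attempt=accepted
else
    first_attempt=rejected
fi

# Accepted: the message follows the rule.
git commit -q -m "Bug 123: fixed the login page"
commits=$(git rev-list --count HEAD)
echo "first attempt: $first_attempt, commits recorded: $commits"
```

This hook only looks at the text of the message, so it stays fast; checking
that the bug really exists in Bugzilla would need a network call on every
commit.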
The above step, "git svn dcommit", will also update your code with SVN
changes made by others. But that only happens if you have some changes to
commit, and probably only the changes required to merge your commit will be
brought in. So to robustly sync your trunk or branch with the one in the SVN
repository, you should run "git svn rebase" on the branch from time to time.
Q: What is the equivalent of "svn revert file"? A:
"git checkout file".
Q: What is the equivalent of "svn copy"? A: None. Git will detect the copy;
just copy the file and "git add" it before committing.
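The revert answer is easy to try out in a scratch repository. A minimal
sketch (config.txt is a made-up file name):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "committed text" > config.txt
git add config.txt
git commit -q -m "add config.txt"

echo "experiment I regret" >> config.txt
git checkout -- config.txt         # the "--" makes sure the name is read as a file, not a branch

restored=$(cat config.txt)
echo "$restored"
```

Note that "git checkout -- file" restores the file from the index, so if you
had already run "git add" on the change you want to throw away, use
"git checkout HEAD -- file" instead.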
The wonder of Git Stash:
One of the coolest things I find in Git is the "git stash" command. It takes
all your uncommitted changes, puts them in a hidden location, and reverts to
the last committed pristine state. Many operations, like "git svn dcommit",
"git svn rebase" etc., require that you have everything checked in and no
uncommitted changes lying around. You may have precious changes, like local
settings files, that you don't want to check in but don't want to lose
either. So you stash them before those operations. Think of a stash as a
named patch managed by Git for you. You can apply the latest changes you
stashed by running "git stash apply". Your typical workflow could be:
• hack hack
• git add
• git commit
• git stash
• git svn dcommit
• git stash apply
• go to hack hack
Remember that every time you run "git stash" a new patch is created and
stored for you, so you may want to run "git stash clear" from time to time to
get rid of old stashes. To list the stored stashes, run "git stash list". The
name of each stash is pretty arcane, something like stash@{0}, and you have
to type it in full to refer to a stored stash by name. If you are working
with branches, you may have many stashes that you want to keep around,
containing changes meaningful to you, so you can give them a meaningful
description by using "git stash save 'my description'" instead of
"git stash". To apply a stash that is not on top of the list, run
"git stash apply stash@{2}" or so, after getting the proper name from
"git stash list". Remember that the stash/patch is applied to the current
branch.
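Here is the stash cycle as a small sketch in a throwaway repository, with
"git svn dcommit" left out since it needs a real SVN server. The file
settings.txt and the description text are made up for the example:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "debug=false" > settings.txt
git add settings.txt
git commit -q -m "add settings.txt"

echo "debug=true" > settings.txt              # a local tweak we never want to commit
git stash save "local debug settings" >/dev/null
after_stash=$(cat settings.txt)               # working copy is back to the committed state

git stash list                                # shows stash@{0} with the description

git stash apply "stash@{0}" >/dev/null        # bring the tweak back; the stash itself is kept
after_apply=$(cat settings.txt)

git stash clear                               # throw away all stored stashes
remaining=$(git stash list | wc -l)
```

Because "git stash apply" keeps the stash in the list, the stashes pile up
over time, which is why the occasional "git stash clear" is worth it.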
Working with branches:
Now the true wonder of Git. It confused me quite a bit initially, so
hopefully this writeup will help a Git newbie.
Some basics: branches in Git are of two types, local and remote. You cannot
work on remote branches directly; only by branching them locally can you
commit any changes. The SVN trunk and the other branches, and tags for that
matter, are visible to Git as remote branches, and "git svn clone", the first
step in this howto, has created a local branch from trunk called master and
checked it out for you.
To stay on top of branches, get into the habit of running "git branch". This
shows all local branches and indicates the current one. If you have followed
this writeup, you should have a local Git branch called master, and
"git branch" will output just "* master", the * meaning that master is the
currently checked out branch and that you can see its content in the current
directory. "git branch -a" or "git branch -a --color" will show you all the
branches, local and remote.
If you want to explore any SVN branch or tag, which is a remote branch in
Git's world, you can check it out:
"git checkout b_web20"
This command will bring the content of the current directory to the state at
the HEAD of the b_web20 SVN branch. You can look but you cannot commit. If
you do a "git branch" now, it will show "* (no branch)", as you are viewing a
remote branch.
To start work on any of the branches or trunk, you have to create a local
branch first, and that is done using
"git checkout -b local_branch_name remote_branch_name". So you can say
"git checkout -b web20 b_web20" and it will create a branch for you and
select it, so that the content of the current folder reflects that branch.
Now if you do "git branch", it will show "* web20", and also "master", since
that was created by git svn clone and is still around, a copy of trunk.
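The local/remote split can be sketched without an SVN server. In the snippet
below the remote branch is faked by hand with "git update-ref", which is only
a stand-in I am assuming for what "git svn clone" would have set up; the
names b_web20 and web20 are the ones used above:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the main line "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "trunk code" > app.txt
git add app.txt
git commit -q -m "state of trunk"

# Stand-in for the remote branch that git svn would create for the SVN
# branch b_web20 (faked here; no SVN repository is involved).
git update-ref refs/remotes/b_web20 HEAD

git branch -a                              # lists master plus remotes/b_web20

# Create a local branch from the remote one and switch to it.
git checkout -q -b web20 b_web20
current=$(git rev-parse --abbrev-ref HEAD)
local_branches=$(git branch | tr -d '* ' | sort | tr '\n' ' ')
echo "now on: $current"
echo "local branches: $local_branches"
```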
Note: There is one more idiosyncrasy that you will have to learn. Sometimes
someone will create a new branch in SVN, and you will want to work on it, but
you won't find it when you do "git branch -a", and neither "git svn rebase"
nor "git svn dcommit" will help. You will have to execute "git svn fetch" to
get the new branch. Why? Beats me. [I guess rebase only rebases the current
branch, and dcommit only syncs new commits on the current branch; because
both work with the current branch, they don't care about other new branches.
Programmers may be smart but they are seldom nice.]
So you have created lots of local branches reflecting the remote SVN
branches. You can make changes and commit, and "git svn dcommit" will push
the commits to the appropriate remote branch for you: commits in master <=
trunk will go to trunk, and those in web20 <= b_web20 will go to b_web20.
Now comes the question of merging. The first use case is: you are working on
branch web20, which is the local branch for remote b_web20, but changes have
happened in trunk that you want to merge into web20. You have to run
"git merge master" while you have branch web20 checked out. (Strictly
speaking, I am assuming b_web20 was created from trunk.) It will merge the
changes and commit them for you to your local branch web20. You can run
"git merge --no-commit master" to avoid the commit.
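The first use case looks like this in a throwaway repository, using plain
local branches in place of the SVN-backed ones. The file names and commit
messages are invented for the sketch:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the main line "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "common starting point"

git checkout -q -b web20
echo "web 2.0 work" > web20.txt
git add web20.txt
git commit -q -m "work on web20"

git checkout -q master
echo "trunk fix" > fix.txt
git add fix.txt
git commit -q -m "fix on master"

# On web20, merge master but stop before the commit is made.
git checkout -q web20
git merge -q --no-commit master
before=$(git rev-list --count HEAD)       # 2 commits so far: base + web20 work

# Review the staged result, then commit with a log message of your choosing.
git diff --cached --stat
git commit -q -m "merge trunk fixes into web20"
after=$(git rev-list --count HEAD)        # 4 commits: base, web20 work, master fix, merge
```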
Note: "git commit --amend" can be used at any time to amend the log message
of the previous commit. I often find this useful for tailoring the commit log
when I accidentally "git merge" without the "--no-commit" flag.
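A quick sketch of amending a log message, again in a scratch repository with
made-up file names and messages:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "code" > app.txt
git add app.txt
git commit -q -m "wip"                    # a log message we are not proud of

# Rewrite the log message of the most recent commit.
git commit -q --amend -m "Bug 42: describe what the change really does"

subject=$(git log -1 --pretty=format:%s)
count=$(git rev-list --count HEAD)        # still 1: the commit was replaced, not added to
echo "$subject"
```

Amending replaces the commit with a new one, so do it before
"git svn dcommit" has sent that commit to SVN, not after.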
The second scenario is: you are satisfied with the branch and you want to
merge it into trunk. You can do so with "git pull . web20" while you have
branch master, which was created from trunk, checked out. Be careful: in my
experience, if you do a "git merge web20" instead, the local master branch
will get associated with remote b_web20, and nothing will be merged. If that
happens you can get another copy of trunk by doing
"git checkout -b master2 trunk" and running the proper "git pull" in it. This
too will commit the change, and you may want to amend the commit log. Also
remember that either of these merges will merge and commit in your local Git
repository only; you will have to run "git svn dcommit" to push the changes
to the SVN repository.
An unused branch can be deleted by running "git branch -d branchname". Note
that this will not delete the branch unless all local commits on it have been
pulled or merged into some other branch.
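The safety check is easy to see in a scratch repository. This sketch covers
the simple case where the branch is merged into the one currently checked
out; the branch name "experiment" and the file names are made up:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the main line "master" whatever the local default is
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "common starting point"

git checkout -q -b experiment
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "try out an idea"
git checkout -q master

# The commit on "experiment" is not merged anywhere yet, so -d refuses.
if git branch -d experiment 2>/dev/null; then
    first_try=deleted
else
    first_try=refused
fi

git merge -q experiment                   # master now contains the commit
git branch -d experiment >/dev/null       # this time the delete goes through
remaining=$(git branch | tr -d '* ')
echo "first try: $first_try, branches left: $remaining"
```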
PS: Vakow! is hiring, so if you want to work with
a really cool startup in Mumbai, get in touch!
PS: Read more about git on my git page.