Anything Else

Saturday, March 29, 2008

Git-SVN: Whys And Hows

One of the first things one sets up when starting a software startup is a version control system, and when we started Vakow! we decided to use SVN because we had gained lots of experience with it in previous work environments, it is free, and more importantly, the cheapest Dreamhost account allowed us to run a central SVN repository without any problems. Another reason to use it could be TortoiseSVN for Windows users. SVN just works, and the central repository and workflow are quite easy to fit in one's mind: there is a repository, you check out, you work, you see what you have changed so far, you commit, and you update to get changes done by others. There are tags and branches, but they are nothing but folders for SVN. The weak portion of SVN is merging: the basic merge offered by SVN is doable if you keep track of merge revision numbers in revision logs; this, tho tedious, is usually manageable. I was pretty happy with the setup, Bugzilla integration worked just fine, tho I am yet to publish my SVN-Bugzilla integration script that works on Dreamhost, thanks to endless procrastination. Soon I will. Promise. :-)

Another reason to use SVN for Vakow! when we started was that well, Git wasn't there then.

So what is the problem with SVN? Like I said above, SVN gives you almost no help in merging. There is no native concept of merge in SVN. SVN is a linear history of one big folder, which is organised into trunk and branches, and there is no native support for trunk and branches/tags either in SVN (1.4). These were all cool decisions taken by SVN developers that made it so easy for developers to grasp, and its simplicity means a robust implementation [without letting the Gods (= Linus et al) come into the picture], which all led to widespread adoption. But I digress.

Because of the lack of merge capability, working with branches is difficult. The workflow for branches is: you have some feature that will take some time to develop, and you do not want your customers to learn about the new feature prematurely, or maybe the new feature will destabilize the main code for some time before it's stable, so you branch off. In SVN you create a copy of your trunk in a new folder; by convention it resides under a folder called "branches". You work on trunk, mostly bug fixes and minor features, on your main stable code, and you work in parallel on the new feature branch. SVN is excellent at letting you do this. But after the work is done, you ultimately have to combine the changes you have made in trunk and in the feature branch and move them to trunk. This is what SVN is not good at. SVN does not know about branches, a branch is just a folder, so it can't really merge; what it can do is take the diff of two versions of any folder and give you a patch, and then you can apply this patch to some code and get a merge.

This is how it works: let's say you branched out the feature branch at revision 100, and have been developing trunk and branch till revision 200, when you realize you want to create a build to give to testers, and you have to make sure changes from rev 100 to 200 on trunk also get into the branch. So you create a diff from revision 100 to 200 on trunk, and apply it to the branch. But merging is not trivial: while working on the feature branch you may have changed the same files and the same lines that other developers changed in trunk. You have to resolve these manually and it's a laborious process. But what happens if testers say no go, and find 10 more bugs for you to fix? You could either revert the changes merged from trunk, to keep things clean, so that when you are on rev 300, let's say, you can again take the changes on trunk from rev 100 to 300 and apply the patch on the branch. Or you can let the changes after the merge at rev 200 stay, and keep working separately on trunk and branch. So in future when you have to merge the changes from trunk again, you have to remember your decision, and you must keep it logged in SVN commit logs or somewhere. The biggest issue is that in SVN when you merge, you lose history: you lose exactly how the file changed over time, and the person who merged is logged as the person who made all the changes. Terrible thing in my opinion.

What happens if more than one branch is involved and merges are brought back and forth between them? The method of merging I described above becomes too difficult to keep track of; remember, in real life the revision numbers are not as round as the 100 and 200 I used above. This leads to lots of uncertainty, and programmers hate uncertainty. This all leads to programmers' general reluctance to use branches, to considering branches a necessary evil, and to a constant effort to keep the number of branches at a minimum, with a clearly defined head of each branch who is responsible for merging and for making sure nobody else is applying merges and messing with the revision numbers. A small mistake, let's say you merged revisions 1946:2045 instead of 1945:2045, may lead to an important bug fix getting lost in the process of merging. Headaches.
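
The off by one is easy to see once you recall that "svn merge -r N:M" applies the difference between the tree at N and the tree at M, that is, the changes committed in revisions N+1 through M. Here is a minimal Python 3 sketch of that arithmetic; the function name is made up for illustration and nothing here talks to a real repository:

```python
def revisions_in_merge(start, end):
    """Changesets picked up by `svn merge -r start:end`.

    The merge applies the diff between the tree at `start` and the tree
    at `end`, so it covers revisions start+1 .. end (start is excluded).
    """
    return list(range(start + 1, end + 1))


intended = revisions_in_merge(1945, 2045)
mistaken = revisions_in_merge(1946, 2045)

# Revisions that the mistaken range silently leaves out.
missed = sorted(set(intended) - set(mistaken))
print(missed)
```

Only one revision goes missing, but if that revision was the bug fix, nothing in SVN 1.4 will tell you.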

I managed with this at Vakow!: we almost never worked on any branch for any significant time, and given that we were just two people, of whom only one can be considered a real programmer, it was not really a big issue. And after all, until Git/Mercurial started to become fashionable about 6-8 months ago [or this is when I started to learn about them], this was the state of the art of version control for me.

So how does Git help? Well, the first difference between SVN and Git is that Git is distributed whereas SVN is centralized. What does that mean, and how does it make merging easier? I am not sure I am absolutely correct about it, but this is what I understand so far. This will make most sense to SVN veterans: in Git there is no central repository, every "checkout" is "complete"; it not only contains the latest code, as checked out code in SVN does, but it also gets the complete revision history and all tags and branches. This might sound astounding: what if you had 1000s of checkins and tens of branches, how much space will it all take? But the Gods did step in when Git came into existence, so they solved this issue, and a typical Git clone in all its glory compares well with an SVN checkout when it comes to disk space, and even network transfer. These are things I don't usually bother much about as long as they are manageable, so don't tell me if one of them is some percent faster or smaller than the other for some operation. Since the repository is with you in Git, lots of things become fast: checking the log is blazing fast even for the oldest commits, and so is creating branches and doing commits. But this is not why Git or other distributed version control systems shine. I digress again.

Because you have the whole revision history for each branch and trunk, you can do something cool when merging. In Git, a branch is not a branch of a folder as is the case in SVN, it's a branch of a commit; Git remembers where each commit came from, that is, its parent commit(s). Let's take our original example: branch at 100, merge at 200. Of course Git does not use numbers like these, as it's distributed, and if it auto incremented, both you and I could check in and get version no 101, and then when merging this number would serve no purpose; so Git relies on cryptographic hashing based on the commit's content, parent and author info to get revision ids. Anyways, let's say those ids were 100 and 200, and when we are merging the branch is feature[trunk*100] (Git keeps track of the origin of a branch). This is what Git does to merge: it goes back to revision 100, when both trunk and branch had the same content. Then it starts applying changes in the order they happened: let's say the first change happened on trunk, so it applies it, then the next change on the branch, it merges, and so forth. This is possible because the entire change history is available to Git. In case there were no conflicts, by the end of it you have all changes on trunk applied on the branch, and Git commits by default. This makes the branch become feature[trunk*200] because now it's effectively a branch of revision 200 of trunk. You did not have to remember the revision numbers. Branch based coding heaven! What happens if the 30th commit leads to a conflict? I was not sure about it; if I were designing Git I would probably just skip that commit and go on, and so on for each conflict causing commit, and at the end I would apply all conflicting commits on top. I am just speculating; conflicts will still cause problems, but because changes are applied in the sequence in which they happened, there are fewer conflicts than when one big SVN style patch is applied to a branch that is really far into the future. Incremental merging will be less error prone than such bulk merging. I just realized I was wrong, Git does something even better (I am glad I did not design it :-): it stops at the first conflict and lets you manually resolve it before proceeding. [Strictly, this one commit at a time replay is what "git rebase" and "git svn rebase" do; a plain "git merge" finds the common ancestor by itself and does a single three-way merge between it and the two branch heads.] Now by the end of it you will have all changes merged cleanly, and at any time you will be resolving only one conflict, whereas in an SVN style bulk merge you would have to resolve conflicts due to more than one conflicting change at once.
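
To make the hash based ids a little less magical, here is a small Python 3 sketch. A real commit id is the SHA-1 of a commit object (which names the tree, the parent commits, the author and the message); the sketch shows the same scheme on the simplest Git object, a blob, which is just file content with a short header. The function name is mine, not part of Git:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Id Git gives to a file's content: SHA-1 over a header plus the bytes.

    The header is the object type, a space, the content length in decimal
    and a NUL byte. No counter and no server is involved, so two people
    computing the id of the same content always agree.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))
print(git_blob_id(b"print 'hello'\n"))
```

If you have Git installed, the ids printed here should match what "git hash-object" reports for files with the same content.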

Enough of theory. But it still does not solve the problem for Vakow!: we still have others who do not understand Git, who like the simplicity of SVN or are just used to it and consider learning one revision control system enough for their lifetime, and I have not yet had time to rewrite and deploy my SVN-Bugzilla integration scripts, or get someone else's. And because I am not sure if it will just work with Dreamhost, and because of the lack of a TortoiseSVN equivalent, etc, I am still not ready to switch to Git on the server. Next month maybe, not yet. And this is from a sysadmin and CTO who is completely convinced that the switch will be beneficial in the long run! There are other poor souls who are stuck with SVN, because either their startup/company is still using SVN and is going to for some time, or their favorite open source project is stuck with SVN because code.google.com/sf.net only support SVN, or because of the excellent SVN-Trac integration that so many open source projects are so fond of. Or for other reasons, like they want to switch but could not decide between Git, Mercurial, Bazaar and a few others; I would advise to just move to Git, but then. For one reason or another, people are going to be stuck with SVN for some time, and for them there is Git-SVN.

Git SVN is a cool two way bridge between Git and SVN, to be used when you love Git but your company/upstream team is stuck with SVN. I learnt about it from this blog post; I am writing up my comments after about a month of full time Git SVN usage.

First thing is getting SVN history into local Git:

git svn clone https://svn.foo.com/svn/proj --trunk=trunk --branches=branches --tags=tags

One of the peculiarities of my SVN repository was that I did not have a trunk when I began coding. I had just got the startup idea and was at the 80th revision by the time I realized I had not followed the usual layout, and then I restructured my SVN into the usual trunk, branches, tags hierarchy. This led to some problems. Initially when I tried that command, I skipped the parameters as the man page told me that those were the default values anyways. Obviously enough I got some error and then remembered my SVN history. Then panicked a little bit. I tried checking out just the trunk portion but that failed too, as trunk was not there in the beginning, so as a last resort, without hope, I tried the full command, explicitly supplying the default values for --trunk etc. And git went to work. It skipped the first 80 or so commits, but I was happy as it got the remaining 2000 of them. It kept on stopping because of network issues, my network was flaky, but it was robust enough that simply restarting the process continued from where it stopped. I was already becoming a fan of its robustness. :-)

The first thing I did after this was to move into the directory and run gitk. This is a GUI log browser, and I was quite delighted to see all the revisions going back more than a year, with search and color coded diffs, way better than my old solutions: a ViewSVN based website for browsing history, which was terribly slow, or TortoiseSVN's log feature, which again was terribly slow and had no provision to search or highlight by author etc. This alone was my justification for keeping a git clone of my SVN fresh for quite some time, just to see the logs.

One of the reasons I picked Git over Mercurial was the concept of the index in Git. On more than one occasion I committed more than I intended when using SVN, and Mercurial was going to be the same in this regard, but not Git. In SVN and all other decent version control systems, a file has to be manually added before the system starts keeping track of it. The problem is that many times during debugging I would change more than what is minimally needed to fix the issue, and would have to be really careful to pick only the files I intended to commit. This is where TortoiseSVN shines, it made this process very robust, at least if you follow the best practices. On the command line, this led to errors. So I was quite interested in Git, in which after every change you have to add the file again, as Git does not track files, it tracks content, and commits only the content that was there when you added the file using "git add".

Anyways, if you prefer, you can get a behavior of commit very similar to SVN, but I like the Git default.

First things first. By the end of "git svn clone" this is what will have happened: you will get a folder named after your project, derived from the SVN path. This folder will contain the latest trunk.

Note: git repositories are not cluttered with .svn like folders all over, there is only one .git folder in top level folder which contains all git related data.

Now the work begins.

Let's say you made some changes in trunk. You can view the changes with "git diff". If you jump ahead and add a file that you have decided to commit by calling "git add filename", "git diff" will stop showing the changes in that file, or more strictly the changes in that file up to the moment you added it. The changes have gone into the "index". To see the changes in the index you have to run "git diff --cached".

You can always see the status of files you have modified or added to index for checkin by running "git status".

Next thing we are going to do is committing. As you have seen already, just changing a file is not enough, you have to add it again before you can commit anything. You commit by running "git commit", obviously enough, but if you are a command line warrior, you will miss/hate the fact that Git does not treat "git ci" as the same as "git commit" the way SVN does. But if you are on a decent shell and operating system, the excellent tab completion won't let you miss it all that much. Anyways. And yes, if you are coming from SVN, don't be surprised by the speed of git commit: it's nearly instantaneous because it's committing to your local branch. Your fellow developers using SVN will not notice it yet. But you can go on committing while the net is not available.

If you do not like the process of adding a file before committing, and prefer the SVN way, you can do "git commit -a" which will detect changes in all files that are being kept track of.
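
If the index still feels abstract, the following toy model in Python 3 may help. It is only a sketch of the idea, not how Git stores anything: the class and method names are invented, and file contents are plain strings held in three dictionaries (last commit, index, work tree).

```python
class TinyRepo:
    """Toy model of the three places a file's content can live in Git."""

    def __init__(self):
        self.head = {}      # snapshot recorded by the last commit
        self.index = {}     # staged snapshot, filled by add()
        self.worktree = {}  # files as they are on disk right now

    def diff(self):
        """Like "git diff": tracked files whose disk content differs from the index."""
        return sorted(n for n in self.index if self.worktree.get(n) != self.index[n])

    def diff_cached(self):
        """Like "git diff --cached": staged content that differs from the last commit."""
        return sorted(n for n in self.index if self.index[n] != self.head.get(n))

    def add(self, name):
        """Like "git add name": copy the content as it is right now into the index."""
        self.index[name] = self.worktree[name]

    def commit(self, all_tracked=False):
        """Like "git commit", or "git commit -a" when all_tracked is True."""
        if all_tracked:
            for name in self.index:  # restage every file already tracked
                self.index[name] = self.worktree[name]
        self.head = dict(self.index)


repo = TinyRepo()
repo.worktree["a.py"] = "v1"
repo.add("a.py")
repo.commit()

repo.worktree["a.py"] = "v2"
repo.add("a.py")
repo.worktree["a.py"] = "v3"   # edited again after the add
repo.commit()
print(repo.head["a.py"], repo.diff())
```

The commit records "v2", the content at the time of the add, and the later edit still shows up in diff(). Calling commit(all_tracked=True) afterwards would record "v3", which is the SVN like behavior.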

No point committing if nobody can see it. To push your changes upstream, to the real SVN repository, you have to run "git svn dcommit". This will commit all the changes on the current branch that have not been committed to SVN yet.

A note about SVN pre-commit hooks: some places have SVN pre-commit hooks that do not let a commit go through unless the log message mentions the bug number, or the file includes a copyright notice at the top, or the code conforms to the formatting practice, etc. In those cases the previous step may cause problems if you did not conform to those rules while committing. The obvious answer is to be careful, but that is not always enough. If possible you should learn about Git commit hooks and create them to match your SVN repository's commit hooks, to ensure that errors do not take place. This will mean checking if the bug exists before each git commit happens, slowing down the whole blazing git commit experience, but then this is how it is; if you want everything, you have to be really smart to avoid those pesky hooks altogether, but then if you don't use tools and you look like us, most probably you are a chimp. For the purpose of this howto just understand that with Git it's trivial to undo your commits and redo them if you want to fix some old commit, but spare yourself the trouble, write Git hooks, and get the tools working for you [if you have upstream SVN pre-commit hooks. Which BTW you should.].

The above step, "git svn dcommit", will also update your code with SVN changes done by others. But it will only happen if you have some changes to commit, and probably only the changes that are required to merge that change will be brought in. So to robustly sync your trunk or branch with the one in the SVN repository, you should execute "git svn rebase" on the branch from time to time.

Q: What is the equivalent of "svn revert file"? A: "git checkout file".
Q: What is the equivalent of "svn copy"? A: None. Git will detect copy, just copy it and git add it before committing.

The wonder of Git Stash:

One of the coolest things I find in Git is the "git stash" command. This takes all your uncommitted changes, puts them in a hidden location, and reverts to the previous checked in pristine state. Many operations, like "git svn dcommit", "git svn rebase" etc, require that you have all the changes checked in and no uncommitted changes lying around. You may have precious changes, like local settings files, that you don't want to check in but don't want to lose either. So you stash them before those operations. Think of a stash as a named patch managed by Git for you. You can apply the latest changes that you stashed by running "git stash apply". Your typical workflow could be:
• hack hack
• git add
• git commit
• git stash
• git svn dcommit
• git stash apply
• go to hack hack

Remember, every time you run "git stash" a new patch is created and stored for you, so you may want to run "git stash clear" from time to time to get rid of old stashes. To list the stored stashes, run "git stash list". The name of each stash is pretty arcane, something like stash@{0}, and you have to type it in full to refer to a stored stash by name. If you are working with branches, you may have many stashes that you want to keep around, containing changes meaningful to you, so you can give them a meaningful description by using the command "git stash save 'my description'" instead of "git stash"; to apply one of the stashes not on top of the list, run "git stash apply stash@{2}" or so, after getting the proper name from "git stash list". Remember the stash/patch is applied to the current branch.
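
If you get tired of typing the stash, dcommit, stash apply dance, it can be wrapped in a few lines of Python 3. This is only a sketch under assumptions: the function and constant names are mine, it runs exactly the three commands from the workflow above, and it assumes you really do have uncommitted changes (with nothing to stash, the final "git stash apply" would either fail or bring back an older stash).

```python
import subprocess

# The middle of the workflow above, as argument lists for subprocess.
SYNC_STEPS = [
    ["git", "stash"],
    ["git", "svn", "dcommit"],
    ["git", "stash", "apply"],
]


def sync_with_svn(run=subprocess.check_call):
    """Run the steps in order inside the current Git working directory.

    check_call raises CalledProcessError as soon as a command fails, so a
    failed dcommit stops the sequence and leaves the stash untouched.
    `run` can be replaced, for example to print the commands as a dry run.
    """
    for command in SYNC_STEPS:
        run(command)
```

Calling sync_with_svn(print) shows what would be executed without touching the repository; calling sync_with_svn() runs it for real.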

Working with branches:

Now the true wonder of Git. It confused me initially quite some, so hopefully this writeup will help a git newbie.

Some basics: branches in Git are of two types, local and remote. You can not work on remote branches directly, only by branching them locally can you commit any changes. So the SVN trunk and other branches and tags for that matter are visible to Git as remote branches, and "git svn clone", the first step in this howto, has created a local branch from trunk called master and checked it out for you.

To be on top of branches, get into the habit of running "git branch". This shows all local branches and indicates the current one. If you have followed this writeup, you should have a local git branch called master, and "git branch" will output just "* master". * meaning master is the currently checked out branch, and you can see its content in the current directory. "git branch -a" or "git branch -a --color" will show you all the branches, local and remote.

If you want to explore any SVN branch or tag, which is remote branch in Git's world, you can check them out:

"git checkout b_web20"

This command will bring the content of the current directory to the state that is at the HEAD of the b_web20 SVN branch. You can look but you cannot commit. If you do a "git branch" now, it will show "* (no branch)" as you are viewing a remote branch.

To start work on any of the branches or trunk, you have to create a local branch first, and that is done using "git checkout -b local_branch_name remote_branch_name", so you can say "git checkout -b web20 b_web20" and it will create a branch for you and select it, so that the content of the current folder reflects that branch. Now if you do "git branch", it will show "* web20", and also "master", since it was created by git svn clone and is still around, a copy of trunk.

Note: There is one more idiosyncrasy that you will have to learn. Sometimes someone will create a new branch in SVN, and you will want to work on it, but you won't find it when you do "git branch -a", and neither "git svn rebase" nor "git svn dcommit" will help. You will have to execute "git svn fetch" to get the new branch. Why? Beats me. [I guess rebase only rebases the current branch, and dcommit only syncs new commits on the current branch; because both work with the current branch, they don't care about other new branches. Programmers may be smart but they are seldom nice.]

So you have created lots of local branches reflecting the remote SVN branches. You can make changes and commit, and "git svn dcommit" will push the commits to the appropriate remote branch for you: commits in master <= trunk will go to trunk and those in web20 <= b_web20 will go to b_web20.

Now comes the question of merging. The first use case is: you are working on branch web20, which is the local branch for remote b_web20, but changes have happened in trunk that you want to merge into web20. You have to run "git merge master" while you have branch web20 checked out. More strictly, I am assuming b_web20 was created from trunk. It will merge the changes and commit them for you to your local branch web20. You can run "git merge --no-commit master" to avoid the commit.
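
Why does "git merge master" not need any revision numbers from you? Because the commit graph lets it find the common ancestor on its own, and with the ancestor in hand it can tell which side changed what. The Python 3 sketch below is a toy version of that idea, with whole files treated as single values and all names (commit ids, file names, functions) made up for illustration; real Git merges within files too and handles trickier graphs.

```python
def ancestors(parents, commit):
    """Every commit reachable from `commit`, including itself."""
    seen, todo = set(), [commit]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents.get(current, []))
    return seen


def merge_base(parents, a, b):
    """Most recent common ancestor, assuming the graph has a single best one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return max(common, key=lambda c: len(ancestors(parents, c)))


def three_way_merge(base, ours, theirs):
    """Merge two snapshots (file name -> content) using their common base."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:      # both sides agree, or neither touched the file
            result = o
        elif o == b:    # only their side changed it
            result = t
        elif t == b:    # only our side changed it
            result = o
        else:           # both changed it differently: needs a human
            conflicts.append(name)
            result = o  # keep our version as a placeholder
        if result is not None:
            merged[name] = result
    return merged, conflicts


# trunk: r100 -> t1 -> t2, branch web20: r100 -> w1
parents = {"r100": [], "t1": ["r100"], "t2": ["t1"], "w1": ["r100"]}
snapshots = {
    "r100": {"app.py": "v1", "util.py": "v1"},
    "t2": {"app.py": "v1", "util.py": "bugfix"},
    "w1": {"app.py": "web20", "util.py": "v1"},
}

base = merge_base(parents, "w1", "t2")
merged, conflicts = three_way_merge(snapshots[base], snapshots["w1"], snapshots["t2"])
print(base, merged, conflicts)
```

The branch keeps its own change to app.py and picks up the trunk bug fix in util.py, and nobody had to write down that the branch started at r100.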

Note: "git commit --amend" can anytime be used to amend the change log for the previous commit. This often is useful for me to tailor the commit log when I accidentally "git merge" without "--no-commit" flag.

The second scenario is: you are satisfied with the branch and you want to merge it into trunk. You can do so with "git pull . web20" while you have branch master, which was created from trunk, checked out. Be careful: if you do a "git merge web20" instead, the local master branch will get associated with remote b_web20, and nothing will be merged. If that happens you can get another copy of trunk by doing "git checkout -b master2 trunk" and run the proper "git pull" in it. This too will commit the change, and you may want to amend the commit log. Also remember that either of these merges will merge and commit in your local Git repository only; you will have to run "git svn dcommit" to push these changes to the SVN repository.

An unused branch can be deleted by running "git branch -d branchname". Note this will not delete the branch unless all local commits on it have been pulled or merged into some other branch.

PS: Vakow! is hiring, so if you want to work with a really cool startup in Mumbai, get in touch!

Labels: Programming Invented Here


Thursday, December 20, 2007

Code's Worst Enemy

Bigger is just something you have to live with in Java. Growth is a fact of life. Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.

Interesting article on code size.

Label: Programming


Wednesday, November 21, 2007

RSI Tip: Swap Control And Capslock Keys

Stop bending your left thumb in funny ways, after all it's opposable thumbs that give us all the superiority. Here is a reg file to do it on Windows [remember to reboot after applying it]; other platforms should not be that difficult.

And mix both hands in all the key combos, Right Alt with Tab on left, Right Ctrl with C to copy etc. 

Labels: Programming Life Happens Tips n Tricks


Monday, November 19, 2007

Grok: Zope For Human Beings!

Grok tutorial

Labels: Python Programming


Monday, July 30, 2007

Thread Synchronization Mechanisms in Python

A must read article.

Labels: Python Programming


Sunday, July 29, 2007

dbviews in Django

Came across this comment:

Julian: what we did first was implementing just normal models in the models.py, but that always created the tables in the DB upon “syncdb” (which we deleted then and created views for). Actually I even wrote a patch for django where you can mark a model as “create_table=False” but the patch never made it in.

So we went another actually much better way, after discussing it a lot in the team. We simply did not create the models in models.py, which are used for syncdb but we create a file dbviews.py where we put the views’ models. This is very nice separation of code too.
The next step is writing the view itself, which we just did in pure SQL of course. I.e. if we have the model Forum and we want some specialized ForumActivity-view then we created a view “CREATE VIEW core_forumactivityview” (we are on mysql). We then fired that onto our DB and the model that matched it (make sure to use the same column names as the view does!!!) simply looks like this:

from django.db import models

class ForumActivity(models.Model):
    # ... all the fields
    class Meta:
        db_table = "core_forumactivityview"

now you can simply do

import project.core.dbviews

and they just look like models :-).
Depending on how you wrote the view you might even be able to update the data.

Smart! Learn about MySQL views here.

Labels: Python Programming Django


ThreadLocal Storage In Python

Python's threading.local has different semantics from ThreadLocal in Java. The thing to know is that threading.local is a class, and an instance of it can be used to store data as attributes, and the data will be unique per thread. Here is a demonstration of its use:

>>> import threading, time, random
>>> td = threading.local()
>>> class C: pass
...
>>> ntd = C()
>>> class T(threading.Thread):
...     def run(self):
...             td.x = random.randint(0, 100)
...             ntd.x = random.randint(0, 100)
...             time.sleep(random.randint(0, 3))
...             print "td", td.x, "ntd", ntd.x
...             time.sleep(random.randint(0, 3))
...             print "td", td.x, "ntd", ntd.x
...             time.sleep(random.randint(0, 3))
...             print "td", td.x, "ntd", ntd.x
...
>>> ts = [T(), T(), T()]
>>> for t in ts:
...     t.start()
...
td 41 ntd 27
>>> td 41 ntd 72
td 47 ntd 72
td 41 ntd 72
td 97 ntd 72
td 47 ntd 72
td 97 ntd 72
td 97 ntd 72
td 47 ntd 72

As you can see, ntd, an instance of a simple object, gets shared across all threads, and all threads see each other's changes made on it. Whereas td, which is an instance of threading.local, has a different set of data per thread.
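
The session above is Python 2 and its output depends on random sleeps. For the curious, here is a deterministic variant of the same experiment, written as a Python 3 sketch; the variable names are mine and the barrier is only there so that every thread has finished writing before any thread reads.

```python
import threading

local_data = threading.local()   # attributes set here are private to each thread


class Plain:
    pass


shared_data = Plain()            # ordinary object, visible to every thread
seen = {}                        # thread number -> (local value, shared value)


def worker(number, barrier):
    local_data.x = number
    shared_data.x = number
    barrier.wait()               # wait until all threads have written
    seen[number] = (local_data.x, shared_data.x)


barrier = threading.Barrier(3)
threads = [threading.Thread(target=worker, args=(n, barrier)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for number in sorted(seen):
    print(number, seen[number])
print(hasattr(local_data, "x"))  # the main thread never set x on its own copy
```

Each thread reads back its own number from local_data, while all of them see the same value in shared_data, namely whatever the last writer stored.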

Labels: Python Programming


Python Import Gotcha And An Advice

In Python a module can be imported multiple times, but only the first import really loads it; subsequent imports return the same module. This plays a very important role in Python, as all kinds of singleton class patterns, or module level initializations that are to be guaranteed to execute only once, rely on this behavior. But this breaks down when a module can be referenced via two different paths. Here is a demonstration:

>>> from path import path
>>> path(".").abspath()
path(u'C:\\Documents and Settings\\amitu')
>>> path("t").mkdir()
>>> import os, sys
>>> os.chdir("t")
>>> path(".").abspath()
path(u'C:\\Documents and Settings\\amitu\\t')
>>> sys.path += [path("..")]
>>> file("t2.py","w").write("""
... print "loading module t2.py"
... """)
>>>
>>> import t2
loading module t2.py
>>> file("__init__.py", "w").write("")
>>> import t.t2
loading module t2.py
>>> from t import t2
>>> sys.modules["t2"]
<module 't2' from 't2.py'>
>>> sys.modules["t.t2"]
<module 't.t2' from 'C:\Documents and Settings\amitu\t\t2.pyc'>
>>>

As you can see the module is loaded twice, with different names. I discovered this when my signal handler was getting called twice in a django project. Further discussion here.  

To avoid this situation, be consistent about the path when importing a module. Either always use package_name.module_name or always module_name. Or just make sure PYTHONPATH never contains both a folder and its descendant.
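
The session above needs the third party path module and Python 2. Here is a self-contained sketch of the same gotcha for Python 3, using only the standard library; the package and module names (gotcha_pkg, gotcha_mod) are invented for the demo and live in a temporary folder.

```python
import importlib
import os
import sys
import tempfile

# Build <root>/gotcha_pkg/__init__.py and <root>/gotcha_pkg/gotcha_mod.py
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "gotcha_pkg")
os.mkdir(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "gotcha_mod.py"), "w") as source:
    source.write("handlers = []\n")

# The mistake: both a folder and its descendant are on the import path.
sys.path[:0] = [root, package_dir]
importlib.invalidate_caches()

plain = importlib.import_module("gotcha_mod")
qualified = importlib.import_module("gotcha_pkg.gotcha_mod")

plain.handlers.append("on_save")
print(plain is qualified)      # same file, but two separate module objects
print(qualified.handlers)      # the other copy never saw the registration
```

Anything registered through one name is invisible through the other, which is exactly how a signal handler ends up connected twice.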

Labels: Python Programming


Friday, July 27, 2007

HowTo Create Thumbnails

Just dumping a script I just wrote for a friend.

import Image, sys # requires PIL: http://www.pythonware.com/products/pil/
from optparse import OptionParser

parser = OptionParser()
parser.add_option("--input", help="Input image path.")
parser.add_option("--output", help="Output file name for the image.")
parser.add_option(
    "--width", default=45, type="int",
    help="Maximum width, in pixels."
)
parser.add_option(
    "--height", default=45, type="int",
    help="Maximum height, in pixels."
)

def main():
    (options, args) = parser.parse_args()
    img = Image.open(options.input)
    img.thumbnail(
        (options.width, options.height), Image.ANTIALIAS
    )
    img.convert("RGB").save(options.output)

if __name__ == "__main__":
    status = main()
    sys.exit(status) 

Python makes it easy. Maintains aspect ratio.
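
What "maintains aspect ratio" means in numbers: the image is shrunk until it fits inside the width x height box, and never enlarged. The Python 3 sketch below does that arithmetic by hand with integers. It is my own illustration of the idea, not PIL's code, and PIL's rounding may differ by a pixel; the function name is made up, and the defaults mirror the 45 pixel defaults of the script.

```python
def thumbnail_size(width, height, max_width=45, max_height=45):
    """Largest size that fits in the box while keeping the aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height  # already small enough, never enlarge
    if width * max_height > height * max_width:
        # Relatively wider than the box, so the width is the limiting side.
        return max_width, max(1, height * max_width // width)
    # Otherwise the height is the limiting side.
    return max(1, width * max_height // height), max_height


print(thumbnail_size(900, 300))
print(thumbnail_size(300, 900))
print(thumbnail_size(30, 20))
```

Comparing cross products instead of dividing keeps everything in integers, so there are no floating point surprises at the edges.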

Labels: Python Programming


Thursday, July 26, 2007

Vim: Python Code Folding And My VIMRC

If you were curious about the "# {{{" and "# }}}" in my previous post about extending the Django user model, they are code folding markers. Getting code folding right in Vim took me some time and learning, so here I am documenting the takeaways. Code folding, hiding the parts of code that you are not working on, is really cool and helpful, and once you start using it, you cannot live without it. Here is how to do it in Vim.

Pick The Fold Method 

First of all, you have to decide the method for code folding. Vim offers many methods. First is manual, which is basic, you can fold any piece of text at your whim. This might be good for one off or free form texts. Then comes markers. This is my preferred approach for folding as it can be fairly arbitrary, and it can be preserved in version control and gets shared with multiple developers. Then there is folding by indentation, and language syntax. Both these will let you quickly fold things if you have not manually handpicked folding markers. 

Learn The Keys 

Once the method has been decided, then comes the commands, here is a short list:

  • zf create the fold, useful for manual and marker methods. Select any piece of text, [press v or shift-v, then use arrow keys], and then press zf. It will place the markers around the fold for you in marker mode; in case of manual, it will store fold location in memory. Remember f by saying this command "forms" the fold, or just remember fold :-)
  • zc close the fold at the cursor.
  • zo open the fold at the cursor.
  • zr increment the fold level by one, so if all classes are folded, they will be opened, but function definitions will be kept folded.
  • zm reverse of the above, if one or more function folds are open, they will be closed, but classes will be kept open.
  • zR open all folds.
  • zM close all folds.
  • zj and zk can be used to jump from one fold to another. 

That's it, that is all you need to know about folding in Vim. All the commands start with z, so they are easy to remember: z represents a piece of folded paper. From my experience you will only be using zf, zo, zc, zj and zk, and these key combinations have been selected quite wisely to make them easy to remember.

Configure Things

Here is my complete vimrc file, optimized for python source editing:

" enter spaces when tab is pressed:
set expandtab
" do not break lines when line length increases
set textwidth=0
" use 4 spaces to represent a tab
set tabstop=4
set softtabstop=4
" number of spaces to use for auto indent
" you can use >> or << keys to indent current line or selection
" in normal mode.
set shiftwidth=4
" Copy indent from current line when starting a new line.
set autoindent
" makes backspace key more powerful.
set backspace=indent,eol,start
" shows the match while typing
set incsearch
" case insensitive search
set ignorecase
" show line and column number
set ruler
" show some autocomplete options in status bar
set wildmenu

" automatically save and restore folds
au BufWinLeave * mkview
au BufWinEnter * silent loadview

" this lets us put the marker in the file so that
" it can be shared across and stored in version control.
set foldmethod=marker
" this is for python: put
" # name for the folded text # {{{
" to begin a fold and
" # }}}
" to end it.
set commentstring=\ #\ %s
" default fold level: 0 starts with all folds closed; set it to 200
" or something to start with them all open.
set foldlevel=0

" share clipboard with windows clipboard
set clipboard+=unnamed

" set viminfo='100,f1
" minibufexplorer settings:
let g:miniBufExplMapWindowNavArrows = 1
let g:miniBufExplMapCTabSwitchWindows = 1

syntax on

Enjoy! 

Labels: Python Programming Invented Here Tips n Tricks


Django: Extending User Model

One of the first issues one faces when starting to develop with django is extending User. There are three approaches that various people use.

 User Profile

Django has the concept of a UserProfile model. This is the recommended way to extend the User model in django, recommended by both the django book and django's official documentation. Let's say the name of your project is myproj, and you create an app to manage accounts, user registration and so on; let's call it accounts. After "startapp accounts", go to accounts/models.py and create a model:

# extending user # {{{
class UserProfile(models.Model):
    user = models.ForeignKey(User)
    openid = models.URLField(blank=True)

    def _get_comments(self):
        # comments belong to the User, not the profile; filter() returns
        # all of them, where get() would fail on more than one match
        return Comment.objects.filter(user=self.user, is_public=True)
    public_comments = property(_get_comments)
    del _get_comments

    class Admin: pass
# }}}

While this is enough to store more information about the user, django adds a shortcut to access this user profile model more conveniently, by calling user.get_profile(), if you add the following line to your settings.py file:

AUTH_PROFILE_MODULE = 'accounts.UserProfile'

While this is easy to get started with and is the "official" way of doing things, it has one major drawback: you put your relations in the user profile model. What I mean is, if for example you had a concept of friends, you would write:

# extending user # {{{
class UserProfile(models.Model):
    user = models.ForeignKey(User)
    openid = models.URLField(blank=True)
    friends = models.ManyToManyField('self', symmetrical=True)
    ...

# }}}

and to access it you will have to use something like:

# show_friends # {{{
@login_required
def show_friends(request, userid):
    user = get_object_or_404(User, id=userid)
    friends = user.get_profile().friends.all()
    # do other things with friends and return
    return HttpResponse(
        ",".join(
            [
                friend.user.username  # each friend is a UserProfile
                for friend in friends
            ]
        )
    )
# }}}

See the problem? You have to remember to call get_profile() at the appropriate times, and this feels a little less than right. Some variables are user.XYZ whereas some are user.get_profile().ABC, and you are friends with the "user", not with his profile! It keeps me wanting a neater solution. Then there is the hassle of making sure that the one-to-one relationship between User and UserProfile is maintained: on deleting a User you have to delete its UserProfile, and vice versa.

The replaces_module Method

This is an incredibly hacky method, described on django's wiki page on the same topic, that works but has issues.

Lost-Theories Solution

I came across another solution via the source code of Jeff Craft's lost-theories.com, which I am actually using for my current project. In this case he created a LostUser model that has a foreign key to django's User model, and uses LostUser throughout the code.

class LostUser(models.Model):
    user = models.ForeignKey(User, unique=True)
    # Location info
    city = models.CharField(maxlength=200, blank=True)
    state = models.CharField(maxlength=200, blank=True)
    country = models.CharField(maxlength=200, blank=True)
    ...

The primary problem I faced was the lack of niceties like request.user in views and the user object in templates for the extended user. I solved that by writing the following middleware:

from vakow.accounts.models import MyUser

class MyUserMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            request.vuser = MyUser.objects.get(duser=request.user)
        else: request.vuser = None

and the following context-processor:

# context_processor # {{{
from django.conf import settings

def context_processor(request):
    d = {
       'media_url': settings.MEDIA_URL,
    }
    d['duser'] = request.user
    if request.user.is_authenticated():
        d['vuser'] = request.vuser
    else: d['vuser'] = None
    return d
# }}}

This has the drawback of having to deal with two objects for a user, duser and vuser, where duser is an instance of django's User model and vuser is an instance of my project's user extension. These naming conventions helped in disambiguating which instance I was talking about, and since almost all of my code worked with my user class, this was not really a problem. I had added a few properties to my class, such as username and email, that read from self.duser.username when self.username is requested. Life was good, and the only time I had any issues was when dealing with django's comment framework, as comment objects contain a foreign key to duser and not vuser. I hacked around that by adding a vuser property to the Comment model.
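The delegating properties mentioned above might look roughly like this. It is a plain-Python sketch with no django involved: DjangoUser and MyUser are hypothetical stand-ins for django's User model and my extension class, and the sample values are made up.

```python
class DjangoUser(object):
    """Hypothetical stand-in for django's User model (the duser)."""

    def __init__(self, username, email):
        self.username = username
        self.email = email


class MyUser(object):
    """Hypothetical stand-in for the project's user extension (the vuser)."""

    def __init__(self, duser, city=""):
        self.duser = duser
        self.city = city

    @property
    def username(self):
        # read through to the wrapped django user
        return self.duser.username

    @property
    def email(self):
        return self.duser.email


vuser = MyUser(DjangoUser("alice", "alice@example.com"), city="Pune")
print(vuser.username, vuser.email, vuser.city)
```

With this in place most code can treat vuser as if it were the whole user, and only code that needs the django object itself has to reach for vuser.duser.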

The Final Solution 

This is partially derived from the cool tip I got on Pythoneer. Here is what you can do:

# Extending User # {{{
User.add_to_class("openid", models.URLField(blank=True))
User._meta.admin.fields += (
    ("AmitCom Extensions", { 'fields': ('openid', ) }),
)

class UserExtension(object):
    def _get_comments(self):
        return Comment.objects.filter(user=self, is_public=True)
    public_comments = property(_get_comments)
    del _get_comments

User.__bases__ = User.__bases__ + ( UserExtension, )
# }}}

This seems like the best solution to me. All fields, relations and custom methods are on the User model, and there are no multiple models to keep track of.
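The __bases__ part of the trick can be tried without django. Below is a small sketch in which Model and User are hypothetical stand-ins for django's classes and greeting is a made-up property; it only shows that appending a class to __bases__ makes its properties available on the patched class, including on instances created earlier.

```python
class Model(object):
    """Hypothetical stand-in for django.db.models.Model."""


class User(Model):
    """Hypothetical stand-in for django's User model."""

    def __init__(self, username):
        self.username = username


class UserExtension(object):
    @property
    def greeting(self):
        return "hello, %s" % self.username


existing = User("alice")  # created before the class is patched

# Append the extension to the base classes of the already-defined class.
User.__bases__ = User.__bases__ + (UserExtension,)

print(existing.greeting)
print([klass.__name__ for klass in User.__mro__])
```

Because the extension goes at the end of the bases, anything already defined on User or Model still wins over it in the method resolution order.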

PS: If you are curious about all the "# {{{" and "# }}}" in the code above, they are code folding markers; learn about code folding for python in vim here.

Labels: Python Programming Invented Here Django


How "super" Works In Python

This is from Guido's paper Unifying types and classes in Python 2.2.

Cooperative methods and "super"

One of the coolest, but perhaps also one of the most unusual features of the new classes is the possibility to write "cooperative" classes. Cooperative classes are written with multiple inheritance in mind, using a pattern that I call a "cooperative super call". This is known in some other multiple-inheritance languages as "call-next-method", and is more powerful than the super call found in single-inheritance languages like Java or Smalltalk. C++ has neither form of super call, relying instead on an explicit mechanism similar to that used in classic Python. (The term "cooperative method" comes from "Putting Metaclasses to Work".)

As a refresher, let's first review the traditional, non-cooperative super call. When a class C derives from a base class B, C often wants to override a method m defined in B. A "super call" occurs when C's definition of m calls B's definition of m to do some of its work. In Java, the body of m in C can write super(a, b, c) to call B's definition of m with argument list (a, b, c). In Python, C.m writes B.m(self, a, b, c) to accomplish the same effect. For example:

class B:
    def m(self):
        print "B here"

class C(B):
    def m(self):
        print "C here"
        B.m(self)

We say that C's method m "extends" B's method m. The pattern here works well as long as we're using single inheritance, but it breaks down with multiple inheritance. Let's look at four classes whose inheritance diagram forms a "diamond" (the same diagram was shown graphically in the previous section):

class A(object): ...
class B(A): ...
class C(A): ...
class D(B, C): ...

Suppose A defines a method m, which is extended by both B and C. Now what is D to do? It inherits two implementations of m, one from B and one from C. Traditionally, Python simply picks the first one found, in this case the definition from B. This is not ideal, because this completely ignores C's definition. To see what's wrong with ignoring C's m, assume that these classes represent some kind of persistent container hierarchy, and consider a method that implements the operation "save your data to disk". Presumably, a D instance has both B's data and C's data, as well as A's data (a single copy of the latter). Ignoring C's definition of the save method would mean that a D instance, when requested to save itself, only saves the A and B parts of its data, but not the part of its data defined by class C!

C++ notices that D inherits two conflicting definitions of method m, and issues an error message. The author of D is then supposed to override m to resolve the conflict. But what is D's definition of m supposed to do? It can call B's m followed by C's m, but because both definitions call the definition of m inherited from A, A's m ends up being called twice! Depending on the details of the operation, this is at best an inefficiency (when m is idempotent), at worst an error. Classic Python has the same problem, except it doesn't even consider it an error to inherit two conflicting definitions of a method: it simply picks the first one.
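Both failure modes are easy to reproduce. The sketch below uses new-style classes and a method named save that only records which class ran; the names InheritingD, CallingBothD and record are made up for this illustration.

```python
calls = []


class A(object):
    def save(self):
        calls.append("A")


class B(A):
    def save(self):
        calls.append("B")
        A.save(self)  # explicit, non-cooperative super call


class C(A):
    def save(self):
        calls.append("C")
        A.save(self)


class InheritingD(B, C):
    pass  # simply inherits save from B, the first base listed


class CallingBothD(B, C):
    def save(self):
        calls.append("D")
        B.save(self)
        C.save(self)  # A.save runs a second time through C


def record(instance):
    """Run save() on the instance and return the order of the calls made."""
    del calls[:]
    instance.save()
    return list(calls)


print(record(InheritingD()))   # C's part is never saved
print(record(CallingBothD()))  # A's part is saved twice
```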

The traditional solution to this dilemma is to split each derived definition of m into two parts: a partial implementation _m, which only saves the data that is unique to one class, and a full implementation m, which calls its own _m and that of the base class(es). For example:

class A(object):
    def m(self): "save A's data"
class B(A):
    def _m(self): "save B's data"
    def m(self): self._m(); A.m(self)
class C(A):
    def _m(self): "save C's data"
    def m(self): self._m(); A.m(self)
class D(B, C):
    def _m(self): "save D's data"
    def m(self): self._m(); B._m(self); C._m(self); A.m(self)

There are several problems with this pattern. First of all, there is the proliferation of extra methods and calls. But perhaps more importantly, it creates an undesirable dependency in the derived classes on details of the dependency graph of their base classes: the existence of A can no longer be considered an implementation detail of B and C, since class D needs to know about it. If, in a future version of the program, we want to remove the dependency on A from B and C, this will affect derived classes like D as well; likewise, if we want to add another base class AA to B and C, all their derived classes must be updated as well.

The "call-next-method" pattern solves this problem nicely, in combination with the new method resolution order. Here's how:

class A(object):
    def m(self): "save A's data"
class B(A):
    def m(self): "save B's data"; super(B, self).m()
class C(A):
    def m(self): "save C's data"; super(C, self).m()
class D(B, C):
    def m(self): "save D's data"; super(D, self).m()

Note that the first argument to super is always the class in which it occurs; the second argument is always self. Also note that self is not repeated in the argument list for m.
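Here is a runnable variant of the same four classes, where "saving" is replaced by appending the class name to a list so that the call order can be inspected. The saved list and the run helper are additions for this illustration only.

```python
saved = []


class A(object):
    def m(self):
        saved.append("A")


class B(A):
    def m(self):
        saved.append("B")
        super(B, self).m()


class C(A):
    def m(self):
        saved.append("C")
        super(C, self).m()


class D(B, C):
    def m(self):
        saved.append("D")
        super(D, self).m()


def run(instance):
    """Call m() on the instance and return the order in which classes ran."""
    del saved[:]
    instance.m()
    return list(saved)


print(run(D()))  # every class runs exactly once
print(run(B()))  # for a plain B instance, B's super call goes to A
```

Every part is saved exactly once, and D never has to mention A. In Python 3 the same calls can also be written as super().m() with no arguments.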

Now, in order to explain how super works, consider the MRO for each of these classes. The MRO is given by the __mro__ class attribute:

A.__mro__ == (A, object)
B.__mro__ == (B, A, object)
C.__mro__ == (C, A, object)
D.__mro__ == (D, B, C, A, object)

The expression super(C, self).m should only be used inside the implementation of method m in class C. Bear in mind that while self is an instance of C, self.__class__ may not be C: it may be a class derived from C (for example, D). The expression super(C, self).m, then, searches self.__class__.__mro__ (the MRO of the class that was used to create the instance in self) for the occurrence of C, and then starts looking for an implementation of method m following that point.

For example, if self is a C instance, super(C, self).m will find A's implementation of m, as will super(B, self).m if self is a B instance. But now consider a D instance. In D's m, super(D, self).m() will find and call B.m(self), since B is the first base class following D in D.__mro__ that defines m. Now in B.m, super(B, self).m() is called. Since self is a D instance, the MRO is (D, B, C, A, object) and the class following B is C. This is where the search for a definition of m continues. This finds C.m, which is called, and in turn calls super(C, self).m(). Still using the same MRO, we see that the class following C is A, and thus A.m is called. This is the original definition of m, so no super call is made at this point.

Note how the same super expression finds a different class implementing a method depending on the class of self! This is the crux of the cooperative super mechanism.
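The lookup described above can be approximated in a few lines of Python. The helper next_method below is a made-up name and only a simplified model of what super does (it ignores descriptors and binding); it is compared against the real super to show that both pick the same class.

```python
def next_method(cls, instance, name):
    """Find the class after cls in the instance's MRO that defines name."""
    mro = type(instance).__mro__
    for klass in mro[mro.index(cls) + 1:]:
        if name in vars(klass):
            return klass
    raise AttributeError(name)


class A(object):
    def m(self):
        return "A"


class B(A):
    def m(self):
        return "B"


class C(A):
    def m(self):
        return "C"


class D(B, C):
    def m(self):
        return "D"


# The same expression, super(B, self).m, resolves differently
# depending on the class of self.
print(next_method(B, B(), "m").__name__, super(B, B()).m())
print(next_method(B, D(), "m").__name__, super(B, D()).m())
```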

Quite cool indeed.

Labels: Python Programming

