Anything Else

Monday, November 19, 2007

Grok: Zope For Human Beings!

Grok tutorial

Labels: Python Programming


Monday, July 30, 2007

Thread Synchronization Mechanisms in Python

A must-read article.

Labels: Python Programming


Sunday, July 29, 2007

dbviews in Django

Came across this comment:

Julian: what we did first was implementing just normal models in the models.py, but that always created the tables in the DB upon “syncdb” (which we deleted then and created views for). Actually I even wrote a patch for django where you can mark a model as “create_table=False” but the patch never made it in.

So we went another actually much better way, after discussing it a lot in the team. We simply did not create the models in models.py, which are used for syncdb but we create a file dbviews.py where we put the views’ models. This is very nice separation of code too.
The next step is writing the view itself, which we just did in pure SQL of course. I.e. if we have the model Forum and we want some specialized ForumActivity-view then we created a view “CREATE VIEW core_forumactivityview” (we are on mysql). We then fired that onto our DB and the model that matched it (make sure to use the same column names as the view does!!!) simply looks like this:

class ForumActivity(models.Model):
    # ... all the fields
    class Meta:
        db_table = "core_forumactivityview"

now you can simply do

import project.core.dbviews

and they just look like models :-).
Depending on how you wrote the view you might even be able to update the data.

Smart! Learn about MySQL views here.

Labels: Python Programming Django


ThreadLocal Storage In Python

Python's threading.local has different semantics from ThreadLocal in Java. The thing to know is that threading.local is a class; attributes stored on an instance of it are unique per thread. Here is a demonstration of its use:

>>> import threading, time, random
>>> td = threading.local()
>>> class C: pass
...
>>> ntd = C()
>>> class T(threading.Thread):
...     def run(self):
...             td.x = random.randint(0, 100)
...             ntd.x = random.randint(0, 100)
...             time.sleep(random.randint(0, 3))
...             print "td", td.x, "ntd", ntd.x
...             time.sleep(random.randint(0, 3))
...             print "td", td.x, "ntd", ntd.x
...             time.sleep(random.randint(0, 3))
...             print "td", td.x, "ntd", ntd.x
...
>>> ts = [T(), T(), T()]
>>> for t in ts:
...     t.start()
...
td 41 ntd 27
>>> td 41 ntd 72
td 47 ntd 72
td 41 ntd 72
td 97 ntd 72
td 47 ntd 72
td 97 ntd 72
td 97 ntd 72
td 47 ntd 72

As you can see, ntd, an instance of a plain class, is shared across all threads, and every thread sees the changes the others make to it, whereas td, an instance of threading.local, holds a separate set of data per thread.
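The session above is Python 2 and depends on random sleeps. Here is a minimal deterministic sketch of the same idea for Python 3; the names local_data, shared, worker and results are made up for this illustration, and a Barrier replaces the sleeps so that every thread has written before any thread reads.

```python
import threading

local_data = threading.local()   # attributes set here are per-thread


class Shared:
    pass


shared = Shared()                # a plain object, visible to all threads
results = {}
barrier = threading.Barrier(3)   # makes all threads write before any reads


def worker(n):
    local_data.x = n             # private to this thread
    shared.x = n                 # overwritten by whichever thread writes last
    barrier.wait()
    results[n] = (local_data.x, shared.x)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for n in sorted(results):
    print("thread", n, "local:", results[n][0], "shared:", results[n][1])
```

Each thread reads back its own value from local_data, while all three read the same value from shared.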

Labels: Python Programming


Python Import Gotcha And An Advice

In Python a module can be imported multiple times, but it is really loaded only on the first import; subsequent imports return the same module object. This plays a very important role in Python, as all kinds of singleton class patterns and module-level initializations that must be guaranteed to execute only once rely on this behavior. But this breaks down when a module can be referenced via two different paths. Here is a demonstration:

>>> from path import path
>>> path(".").abspath()
path(u'C:\\Documents and Settings\\amitu')
>>> path("t").mkdir()
>>> import os, sys
>>> os.chdir("t")
>>> path(".").abspath()
path(u'C:\\Documents and Settings\\amitu\\t')
>>> sys.path += [path("..")]
>>> file("t2.py","w").write("""
... print "loading module t2.py"
... """)
>>>
>>> import t2
loading module t2.py
>>> file("__init__.py", "w").write("")
>>> import t.t2
loading module t2.py
>>> from t import t2
>>> sys.modules["t2"]
<module 't2' from 't2.py'>
>>> sys.modules["t.t2"]
<module 't.t2' from 'C:\Documents and Settings\amitu\t\t2.pyc'>
>>>

As you can see the module is loaded twice, with different names. I discovered this when my signal handler was getting called twice in a django project. Further discussion here.  

To avoid this situation, be consistent about the path you use when importing a module: either always use package_name.module_name or always use module_name. Or just make sure the Python path never contains both a folder and one of its descendants.
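The same gotcha can be reproduced in Python 3 without changing directories. This is only a sketch: gotcha_pkg, gotcha_mod and registry are hypothetical names invented for the demonstration, and the files are written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Build  <root>/gotcha_pkg/__init__.py  and  <root>/gotcha_pkg/gotcha_mod.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "gotcha_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "gotcha_mod.py"), "w") as f:
    f.write("registry = []\n")

# The mistake: both a folder and its descendant are on the import path.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

import gotcha_mod            # loaded under the name "gotcha_mod"
import gotcha_pkg.gotcha_mod  # same file, loaded again under another name

same_object = sys.modules["gotcha_mod"] is sys.modules["gotcha_pkg.gotcha_mod"]
gotcha_mod.registry.append("handler")

print("same module object:", same_object)
print("registry seen via package:", gotcha_pkg.gotcha_mod.registry)
```

The two names point at two separate module objects, so state registered through one (a signal handler, say) is invisible through the other.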

Labels: Python Programming


Friday, July 27, 2007

HowTo Create Thumbnails

Just dumping a script I just wrote for a friend.

import Image, sys  # requires PIL: http://www.pythonware.com/products/pil/
from optparse import OptionParser

parser = OptionParser()
parser.add_option("--input", help="Input image path.")
parser.add_option("--output", help="Output file name for the image.")
parser.add_option(
    "--width", default=45, type="int",
    help="Maximum width, in pixels."
)
parser.add_option(
    "--height", default=45, type="int",
    help="Maximum height, in pixels."
)

def main():
    (options, args) = parser.parse_args()
    img = Image.open(options.input)
    img.thumbnail(
        (options.width, options.height), Image.ANTIALIAS
    )
    img.convert("RGB").save(options.output)

if __name__ == "__main__":
    status = main()
    sys.exit(status) 

Python makes it easy, and thumbnail() maintains the aspect ratio.
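To see what "maintains the aspect ratio" means in numbers, here is a rough pure-Python sketch of the size calculation. It is not PIL's actual code (PIL's rounding may differ by a pixel); fit_within is a hypothetical helper written only for this illustration.

```python
def fit_within(size, box):
    """Shrink size=(w, h) to fit inside box=(max_w, max_h), keeping the
    aspect ratio. Images that already fit are returned unchanged."""
    width, height = size
    max_w, max_h = box
    if width <= max_w and height <= max_h:
        return (width, height)
    # Integer cross-multiplication decides which side is the limiting one.
    if width * max_h >= height * max_w:
        # Width is the limiting side.
        return (max_w, max(1, height * max_w // width))
    # Height is the limiting side.
    return (max(1, width * max_h // height), max_h)


print(fit_within((400, 200), (45, 45)))
print(fit_within((100, 300), (45, 45)))
```

With the script's default 45x45 box, a 400x200 image is limited by its width and a 100x300 image by its height.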

Labels: Python Programming


Thursday, July 26, 2007

Vim: Python Code Folding And My VIMRC

If you were curious about the "# {{{" and "# }}}" in my previous post about extending the django user model, they are code folding markers. Getting code folding right in vim took me some time and learning, so here I am documenting the takeaways. Code folding, hiding the parts of the code that you are not working on, is really cool and helpful, and once you start using it, you cannot live without it. Here is how to do it in vim.

Pick The Fold Method 

First of all, you have to decide on the fold method. Vim offers several. The first is manual, the most basic: you can fold any piece of text at your whim. This might be good for one-off or free-form text. Then come markers. This is my preferred approach, as the folds can be fairly arbitrary, and the markers are preserved in version control and shared among multiple developers. Then there is folding by indentation and by language syntax. Both of these let you quickly fold things if you have not placed fold markers by hand.

Learn The Keys 

Once the method has been decided, then come the commands. Here is a short list:

  • zf create the fold, useful for manual and marker methods. Select any piece of text, [press v or shift-v, then use arrow keys], and then press zf. It will place the markers around the fold for you in marker mode; in case of manual, it will store fold location in memory. Remember f by saying this command "forms" the fold, or just remember fold :-)
  • zc close the fold at the cursor.
  • zo open the fold at the cursor.
  • zr increment the fold level by one, so if all classes are folded, they will be opened, but function definitions will be kept folded.
  • zm the reverse of the above: if one or more function folds are open, they will be closed, but classes will be kept open.
  • zR open all folds.
  • zM close all folds.
  • zj and zk can be used to jump from one fold to another. 

That's it; that is all you need to know about folding in vim. All commands start with z, so they are easy to remember: Z looks like a piece of folded paper. From my experience you will mostly be using zf, zo, zc, zj and zk, and these key combinations have been chosen quite wisely to make them easy to remember.

Configure Things

Here is my complete vimrc file, optimized for python source editing:

" enter spaces when tab is pressed:
set expandtab
" do not break lines when line length increases
set textwidth=0
" use 4 spaces to represent a tab
set tabstop=4
set softtabstop=4
" number of spaces to use for auto indent
" you can use >> or << keys to indent current line or selection
" in normal mode.
set shiftwidth=4
" Copy indent from current line when starting a new line.
set autoindent
" makes backspace key more powerful.
set backspace=indent,eol,start
" shows the match while typing
set incsearch
" case insensitive search
set ignorecase
" show line and column number
set ruler
" show some autocomplete options in status bar
set wildmenu

" automatically save and restore folds
au BufWinLeave * mkview
au BufWinEnter * silent loadview

" this lets us put the marker in the file so that
" it can be shared across and stored in version control.
set foldmethod=marker
" this is for python, put
" # name for the folded text # {{{
" to begin the fold and
" # }}}
" to end it.
set commentstring=\ #\ %s
" default fold level: 0 starts with all folds closed; set it to
" 200 or something to start with them all open.
set foldlevel=0

" share clipboard with windows clipboard
set clipboard+=unnamed

" set viminfo='100,f1
" minibufexplorer settings:
let g:miniBufExplMapWindowNavArrows = 1
let g:miniBufExplMapCTabSwitchWindows = 1

syntax on

Enjoy! 

Labels: Python Programming Invented Here Tips n Tricks


Django: Extending User Model

One of the first issues one faces when starting to develop with django is extending User. There are three approaches that various people are using.

User Profile

Django has the concept of a UserProfile model. This is the way to extend the User model recommended by the django book and django's official documentation. Let's say the name of your project is myproj, and you create an app to manage accounts, user registration etc.; let's call it accounts. After "startapp accounts", go to accounts/models.py and create a model:

# extending user # {{{
class UserProfile(models.Model):
    user = models.ForeignKey(User)
    openid = models.URLField(blank=True)

    def _get_comments(self):
        # filter() returns all matching comments; get() expects exactly one
        return Comment.objects.filter(user=self, is_public=True)
    public_comments = property(_get_comments)
    del _get_comments

    class Admin: pass
# }}}

While this is enough to store more information about a user, django adds a shortcut to access the profile more conveniently: you can call user.get_profile() if you add the following line to your settings.py file:

AUTH_PROFILE_MODULE = 'accounts.UserProfile'

While this is easy to get started with, and is the "official" way of doing things, it has one major drawback: you put your relations in the user profile model. What I mean is, if for example you had a concept of friends, you would write:

# extending user # {{{
class UserProfile(models.Model):
    user = models.ForeignKey(User)
    openid = models.URLField(blank=True)
    friends = models.ManyToManyField('self', symmetrical=True)
    ...

# }}}

and to access it you will have to use something like:

# show_friends # {{{
@login_required
def show_friends(request, userid):
    user = get_object_or_404(User, id=userid)
    friends = user.get_profile().friends.all()
    # do other things with friends and return
    return HttpResponse(
        ",".join(
            [
                friend.user.username  # each friend is a UserProfile
                for friend in friends
            ]
        )
    )
# }}}

See the problem? You have to remember to call get_profile() at the appropriate times, and this feels a little less than right. Some variables are user.XYZ whereas others are user.get_profile().ABC, and you are friends with the "user", not with his profile! It keeps me wanting a neater solution. Then there is the hassle of making sure that the one-to-one relationship between User and UserProfile is maintained: on deleting a User you have to delete the UserProfile, and vice versa.

The replaces_module Method

This is an incredibly hacky method that works, with some issues; it is described on django's wiki page on the same topic.

Lost-Theories Solution

I came across another solution via Jeff Craft's lost-theories.com source code, which I am actually using for my current project. In this case he created a LostUser model that has a foreignkey to django's User model, and uses LostUser throughout the code.

class LostUser(models.Model):
    user = models.ForeignKey(User, unique=True)
    # Location info
    city = models.CharField(maxlength=200, blank=True)
    state = models.CharField(maxlength=200, blank=True)
    country = models.CharField(maxlength=200, blank=True)
    ...

The primary problem I faced was losing niceties like request.user in views and the user object in templates. I solved that by writing the following middleware:

from vakow.accounts.models import MyUser

class MyUserMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            request.vuser = MyUser.objects.get(duser=request.user)
        else: request.vuser = None

and the following context-processor:

# context_processor # {{{
def context_processor(request):
    d = {
       'media_url': settings.MEDIA_URL,
    }
    d['duser'] = request.user
    if request.user.is_authenticated():
        d['vuser'] = request.vuser
    else: d['vuser'] = None
    return d
# }}}

This has the drawback of having to deal with two objects per user, duser and vuser, where duser is an instance of django's User model, and vuser is my project's user extension. Such naming conventions helped in disambiguating which instance I am talking about, and since almost all aspects of my code worked with my user class, this was not really a problem. I had added a few properties to my class, such as username and email, that fetch the value from self.duser.username when self.username is requested. Life was good, and the only time I had any issues was when dealing with django's comment framework, as comment objects contained a foreignkey to duser, and not vuser. I hacked around that by adding a vuser property to the Comment model.

The Final Solution 

This is partially derived from the cool tip I got on Pythoneer. Here is what you can do:

# Extending User # {{{
User.add_to_class("openid", models.URLField(blank=True))
User._meta.admin.fields += (
    ("AmitCom Extensions", { 'fields': ('openid', ) }),
)

class UserExtension(object):
    def _get_comments(self):
        return Comment.objects.filter(user=self, is_public=True)
    public_comments = property(_get_comments)
    del _get_comments

User.__bases__ = User.__bases__ + ( UserExtension, )
# }}}

This seems like the best solution to me. All fields, relations and custom methods are on the User model, and there are no multiple models to keep track of.
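The __bases__ trick is plain Python and can be tried without django. The sketch below is only an analogue under stated assumptions: Base, User and display_name are stand-ins invented for the illustration, not django's classes. Note that Python refuses this assignment when the class's only base is object, which is why the stand-in User derives from Base.

```python
class Base:
    pass


class User(Base):              # stand-in for a framework-owned class
    def __init__(self, username):
        self.username = username


class UserExtension:
    # Extra behaviour we want on every User, existing instances included.
    def _get_display_name(self):
        return self.username.title()
    display_name = property(_get_display_name)
    del _get_display_name


existing = User("amitu")       # created before the class is extended

# Append the mixin to the class's bases at runtime.
User.__bases__ = User.__bases__ + (UserExtension,)

print(existing.display_name)
print([cls.__name__ for cls in User.__mro__])
```

Because the mixin is appended at the end of the bases, anything User or Base already defines still wins over the extension.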

PS: If you are curious about all the "# {{{" and "# }}}" in the code above, they are code folding markers; learn about code folding for python in vim here.

Labels: Python Programming Invented Here Django


How "super" Works In Python

This is from Guido's paper Unifying types and classes in Python 2.2.

Cooperative methods and "super"

One of the coolest, but perhaps also one of the most unusual features of the new classes is the possibility to write "cooperative" classes. Cooperative classes are written with multiple inheritance in mind, using a pattern that I call a "cooperative super call". This is known in some other multiple-inheritance languages as "call-next-method", and is more powerful than the super call found in single-inheritance languages like Java or Smalltalk. C++ has neither form of super call, relying instead on an explicit mechanism similar to that used in classic Python. (The term "cooperative method" comes from "Putting Metaclasses to Work".)

As a refresher, let's first review the traditional, non-cooperative super call. When a class C derives from a base class B, C often wants to override a method m defined in B. A "super call" occurs when C's definition of m calls B's definition of m to do some of its work. In Java, the body of m in C can write super(a, b, c) to call B's definition of m with argument list (a, b, c). In Python, C.m writes B.m(self, a, b, c) to accomplish the same effect. For example:

class B:
    def m(self):
        print "B here"

class C(B):
    def m(self):
        print "C here"
        B.m(self)

We say that C's method m "extends" B's method m. The pattern here works well as long as we're using single inheritance, but it breaks down with multiple inheritance. Let's look at four classes whose inheritance diagram forms a "diamond" (the same diagram was shown graphically in the previous section):

class A(object): ...
class B(A): ...
class C(A): ...
class D(B, C): ...

Suppose A defines a method m, which is extended by both B and C. Now what is D to do? It inherits two implementations of m, one from B and one from C. Traditionally, Python simply picks the first one found, in this case the definition from B. This is not ideal, because this completely ignores C's definition. To see what's wrong with ignoring C's m, assume that these classes represent some kind of persistent container hierarchy, and consider a method that implements the operation "save your data to disk". Presumably, a D instance has both B's data and C's data, as well as A's data (a single copy of the latter). Ignoring C's definition of the save method would mean that a D instance, when requested to save itself, only saves the A and B parts of its data, but not the part of its data defined by class C!

C++ notices that D inherits two conflicting definitions of method m, and issues an error message. The author of D is then supposed to override m to resolve the conflict. But what is D's definition of m supposed to do? It can call B's m followed by C's m, but because both definitions call the definition of m inherited from A, A's m ends up being called twice! Depending on the details of the operation, this is at best an inefficiency (when m is idempotent), at worst an error. Classic Python has the same problem, except it doesn't even consider it an error to inherit two conflicting definitions of a method: it simply picks the first one.

The traditional solution to this dilemma is to split each derived definition of m into two parts: a partial implementation _m, which only saves the data that is unique to one class, and a full implementation m, which calls its own _m and that of the base class(es). For example:

class A(object):
    def m(self): "save A's data"
class B(A):
    def _m(self): "save B's data"
    def m(self): self._m(); A.m(self)
class C(A):
    def _m(self): "save C's data"
    def m(self): self._m(); A.m(self)
class D(B, C):
    def _m(self): "save D's data"
    def m(self): self._m(); B._m(self); C._m(self); A.m(self)

There are several problems with this pattern. First of all, there is the proliferation of extra methods and calls. But perhaps more importantly, it creates an undesirable dependency in the derived classes on details of the dependency graph of their base classes: the existence of A can no longer be considered an implementation detail of B and C, since class D needs to know about it. If, in a future version of the program, we want to remove the dependency on A from B and C, this will affect derived classes like D as well; likewise, if we want to add another base class AA to B and C, all their derived classes must be updated as well.

The "call-next-method" pattern solves this problem nicely, in combination with the new method resolution order. Here's how:

class A(object):
    def m(self): "save A's data"
class B(A):
    def m(self): "save B's data"; super(B, self).m()
class C(A):
    def m(self): "save C's data"; super(C, self).m()
class D(B, C):
    def m(self): "save D's data"; super(D, self).m()

Note that the first argument to super is always the class in which it occurs; the second argument is always self. Also note that self is not repeated in the argument list for m.

Now, in order to explain how super works, consider the MRO for each of these classes. The MRO is given by the __mro__ class attribute:

A.__mro__ == (A, object)
B.__mro__ == (B, A, object)
C.__mro__ == (C, A, object)
D.__mro__ == (D, B, C, A, object)

The expression super(C, self).m should only be used inside the implementation of method m in class C. Bear in mind that while self is an instance of C, self.__class__ may not be C: it may be a class derived from C (for example, D). The expression super(C, self).m, then, searches self.__class__.__mro__ (the MRO of the class that was used to create the instance in self) for the occurrence of C, and then starts looking for an implementation of method m following that point.

For example, if self is a C instance, super(C, self).m will find A's implementation of m, as will super(B, self).m if self is a B instance. But now consider a D instance. In D's m, super(D, self).m() will find and call B.m(self), since B is the first base class following D in D.__mro__ that defines m. Now in B.m, super(B, self).m() is called. Since self is a D instance, the MRO is (D, B, C, A, object) and the class following B is C. This is where the search for a definition of m continues. This finds C.m, which is called, and in turn calls super(C, self).m(). Still using the same MRO, we see that the class following C is A, and thus A.m is called. This is the original definition of m, so no super call is made at this point.

Note how the same super expression finds a different class implementing a method depending on the class of self! This is the crux of the cooperative super mechanism.
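The walk through the MRO described above can be checked with a small runnable version of the diamond. This sketch uses Python 3, where super() with no arguments is equivalent to super(B, self) inside class B; the calls list is added only to record the order in which the methods run.

```python
calls = []  # records which class's m() ran, in order


class A:
    def m(self):
        calls.append("A")      # original definition, no super call


class B(A):
    def m(self):
        calls.append("B")
        super().m()            # next class after B in type(self).__mro__


class C(A):
    def m(self):
        calls.append("C")
        super().m()


class D(B, C):
    def m(self):
        calls.append("D")
        super().m()


D().m()
mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)
print(calls)
```

On a D instance the super call inside B.m reaches C.m rather than A.m, and A.m runs exactly once.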

Quite cool indeed.

Labels: Python Programming


Django Master Class

If I had to call one piece of documentation "mind blowing", this would be it. If you do any kind of web programming, you must read this. If you do django, you will learn what all you can do; if not, you will learn what all can be done in a well-thought-out framework, and start considering switching or implementing parts of it on your own. Read it now.

Labels: Python Programming Django Tips n Tricks


Monday, July 23, 2007

Reverse Pagination

You are all aware of object pagination: search results, your photos on flickr, stories on reddit all have a next page/previous page paradigm. Django makes it trivially easy to create such pages by providing the object_list generic view. There is a problem with the pagination implementations we see around: page no 0/1 is assigned to the latest objects in the list. While for search results this makes little difference, in other cases it has a few consequences.

On reddit for example, you are on the main page and see 25 stories there; you take 10 minutes to go through all of them, and click next. There is a good chance that 3-5 stories from the first page have by now moved to the second page, and you will see them again. Not a good experience, but acceptable. You spend the next 3-4 hours working, and click back to see the stories on the first page; you are taken to page 0, and there is a good chance you have missed 12 stories that moved from the first page to the second, as the page you were on, labelled second, became the third. This is all quite confusing if you think about it. [For simplicity I am leaving out of this discussion stories changing their relative rankings; you can take the example of flickr group photos instead: they also change rapidly, and their order does not change.]

Another problem: I am on page 26 of this flickr group, and I see some fellow has posted 8 nice photos to the group. I bookmark the page and come back later, or email it to a friend, and by the time the page is visited again, hundreds of new photos have been added, the content of page 26 is now on page 29 or so, and I don't find what I was looking for.

The last consequence of this is caching difficulty. If a group has 5 thousand pages of 30 photos each, and one more photo gets added, the set of photos on every one of those 5 thousand pages changes. This happens each time a photo is added, and therefore the pages can hardly ever be cached.

I propose a solution to this problem; I call it reverse pagination, and this blog is currently using a patched django to demonstrate it. In reverse pagination, page no 0/1 is assigned to the oldest page. When you are on the reddit home page and click next, you will not go to page 2; you will jump to page 20566 or something like that. The content of page 20566 will never change; only the content of the latest page changes as new items are added. This means all pages other than the main page can be cached for the rest of the life span of the website, and the user will not face the other two problems I listed above.

The only downside is that on the main page/latest page you will see up to 19 items if you paginate by 10 items per page.
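Here is a minimal sketch of the page arithmetic, independent of django and not taken from the patch itself. It assumes items are stored oldest first, pages are numbered from 1, and the latest page absorbs the leftover items (which is where the "up to 19" comes from); reverse_num_pages and reverse_page are hypothetical helper names.

```python
def reverse_num_pages(total, per_page=10):
    """Number of pages when leftover items are merged into the latest page."""
    return max(1, total // per_page)


def reverse_page(items, page, per_page=10):
    """Return the items on a 1-based page, where page 1 holds the OLDEST
    items. `items` must be ordered oldest first."""
    last = reverse_num_pages(len(items), per_page)
    if not 1 <= page <= last:
        raise ValueError("page out of range")
    start = (page - 1) * per_page
    # Every page but the latest is exactly per_page long and never changes.
    end = len(items) if page == last else start + per_page
    return items[start:end]


stories = list(range(29))            # 29 items, oldest first
print(reverse_page(stories, 1))      # stable: the ten oldest items
print(len(reverse_page(stories, 2))) # latest page: 19 items

stories.append(29)                   # one more item arrives
print(reverse_num_pages(len(stories)))
```

Adding an item only ever touches the latest page, so every older page keeps the same URL and the same content and can be cached indefinitely.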

Here is the patch for django. Enjoy!

Labels: Python Programming Invented Here Django


Friday, July 13, 2007

Featured Django Project: i18ndynamic

I am a huge fan of django. And after its recent unicode improvements, it has become a really good option for i18n-enabled sites, that is, sites that are available in multiple languages. So now the interface can be in more than one language; django will select the default website language based on browser settings, and allows users to select their language preference if they want. The only missing piece was translating data. While django's translation utilities support translating data in the database, something was missing: the capability to store, for example, the description of a book in different languages. i18ndynamic is exactly what was needed.

Labels: Python Programming Django


Monday, June 11, 2007

Django Image Bundle

One of the most important factors in a user's perceived performance of a website is the initial page load time. Initial load time depends on backend performance, the time it takes the backend to generate the HTML page, and on front end performance, the time it takes the browser to download and render the HTML and all its dependencies. And it turns out that browsers typically take much longer to download the dependencies than the original HTML page, due to browser pipelining. Look at the chart below:

As one can see, most of the time is spent downloading the images, which is especially problematic for image heavy sites. Read Yahoo! UI Blog's entries on this topic for further details.

Google recently released GWT 1.4, and one of the features introduced in this release is ImageBundle. The basic idea is to bundle all the images into one at the server side, and use CSS sprite technique to render them. Inspired by it, I just finished an implementation of "Image Bundling" for Django. Here is a demo of the same. The original template to generate the page is here. Look at the bundled image. And the image bundle template tag library can be downloaded from djangosnippets.com. As an added bonus the size of bundled image is about 70% of the total size of individual images, so one can save both total bandwidth, and number of http requests.

This is still a first cut solution. Both this and Google's ImageBundle face an issue when dealing with images with padding; Google suggests avoiding padding, or putting a wrapper div around the image and putting the padding on that. Another issue is images in CSS. They have to be handled slightly differently, but to still keep the output image count to one, some kind of bundle naming has to be done. Google's approach is to create bundles and use them as two separate steps; I have tried to combine the two for greater flexibility.
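The bookkeeping behind the CSS sprite technique is simple enough to sketch without any imaging library. The code below is not the template tag library mentioned above; layout_sprites, css_rule and the file name bundle.png are assumptions made for the illustration. It stacks images vertically in one bundle and emits, for each image, a rule that shifts the bundle so only that image shows.

```python
def layout_sprites(sizes):
    """sizes maps name -> (width, height). Images are stacked top to bottom.
    Returns the bundle size and each image's (x, y) offset in the bundle."""
    offsets = {}
    y = 0
    bundle_width = 0
    for name, (width, height) in sizes.items():
        offsets[name] = (0, y)
        y += height
        bundle_width = max(bundle_width, width)
    return (bundle_width, y), offsets


def css_rule(name, size, offset, bundle_url):
    """CSS that shows one image out of the bundle via background-position."""
    (width, height), (x, y) = size, offset
    return (".%s { width: %dpx; height: %dpx; "
            "background: url(%s) -%dpx -%dpx no-repeat; }"
            % (name, width, height, bundle_url, x, y))


sizes = {"logo": (100, 40), "icon": (16, 16)}
bundle_size, offsets = layout_sprites(sizes)
for name in sizes:
    print(css_rule(name, sizes[name], offsets[name], "bundle.png"))
```

The actual pixel copying into the bundle image would be done with PIL on the server; the browser then needs a single http request for all the images.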

Labels: Python Programming Google Invented Here Django Tips n Tricks


Thursday, April 6, 2006

Scapy: Programmable Networking Swissknife For Python

What is Scapy

Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more. It can easily handle most classical tasks like scanning, tracerouting, probing, unit tests, attacks or network discovery (it can replace hping, 85% of nmap, arpspoof, arp-sk, arping, tcpdump, tethereal, p0f, etc.). It also performs very well at a lot of other specific tasks that most other tools can't handle, like sending invalid frames, injecting your own 802.11 frames, combining technics (VLAN hopping+ARP cache poisoning, VOIP decoding on WEP encrypted channel, ...), etc.

Check out their demo. Very cool stuff for learning the nitty gritties of low level networking.

Labels: Python Programming Tips n Tricks

