Pyalot: April 2005

Thursday, April 28, 2005

dictionary speed

This entry got me thinking.

Here's my 5 cents. In all cases the construct


if key in dictionary:
    value = dictionary[key]
else:
    value = default

performed best.

update: I increased the amount of data to reduce the amount of white-noise in the test results, though things haven't changed. An explanation to the output. The programm compares nine data-sets against four test functions. The nine data sets are the combinations of size-distribution between search/dictionary and size-distribution between search and matching keysets ( thereof the small/even/big combinations ).

update2: Why, the version of python is 2.4 of course :)

Here's the output of the programm.


big dict, small search, big intersect
 test_has_key         0.9690
 test_get             1.0000
 test_try             1.2190
 test_in              0.5930
big dict, small search, small intersect
 test_has_key         0.8750
 test_get             1.0320
 test_try             6.6250
 test_in              0.5150
big dict, small search, half intersect
 test_has_key         0.9220
 test_get             1.0310
 test_try             3.9530
 test_in              0.5470
even, big intersect
 test_has_key         1.4370
 test_get             1.4840
 test_try             1.7030
 test_in              1.0780
even, small intersect
 test_has_key         1.4530
 test_get             1.6090
 test_try             7.3120
 test_in              1.0780
even, half intersect
 test_has_key         1.4690
 test_get             1.5620
 test_try             4.5630
 test_in              1.0780
small dict, big search, big intersect
 test_has_key         0.1090
 test_get             0.1250
 test_try             0.7970
 test_in              0.0630
small dict, big search, small intersect
 test_has_key         1.0790
 test_get             1.2180
 test_try             7.0160
 test_in              0.7340
small dict, big search, half intersect
 test_has_key         0.5790
 test_get             0.6560
 test_try             3.8440
 test_in              0.3910

and that's the programm


import sys, time

def random_key( keylen = 10 ):
    import random
    return ''.join(chr(random.randint(ord('a'),ord('z'))) for _ in range( keylen ))

def new_key( existing_keys ):
    key = random_key()
    while key in existing_keys:
        key = random_key()
    return key

def unique_keys( amount=1000, existing_keys=set() ):
    keys = set( existing_keys )
    for _ in xrange( amount ):
        key = new_key( keys )
        keys.add( key )
    return keys-existing_keys

def data( dictionary_size=1000, search_size=3000, intersection=0.5 ):
    dict_keys = unique_keys( dictionary_size )
    intersection_amount = int(search_size*intersection)
    unique_search_keys = unique_keys( search_size-intersection_amount, dict_keys )
    intersection_search_keys = set(list(dict_keys)[0:intersection_amount])
    return dict((key,None) for key in dict_keys), unique_search_keys | intersection_search_keys


def core_test( out_prefix, dictionary, searched, test_functions ):
    print out_prefix
    for function in test_functions:
        start = time.time()
        for _ in range( 20*(len(dictionary)/len(searched) or 1) ):
            function( dictionary, searched, None )
        end = time.time()
        print '\t%-20s %2.4f'%(function.__name__, end-start)

def inner_test( out_prefix, dictionary_size, search_size, test_functions ):
    #big intersection
    dictionary, search = data( dictionary_size, search_size, 0.9 )
    core_test( '%s, big intersect'%out_prefix, dictionary, search, test_functions )
    #small intersection
    dictionary, search = data( dictionary_size, search_size, 0.1 )
    core_test( '%s, small intersect'%out_prefix, dictionary, search, test_functions )
    #half intersection
    dictionary, search = data( dictionary_size, search_size, 0.5 )
    core_test( '%s, half intersect'%out_prefix, dictionary, search, test_functions )


#test functions
def test_has_key( dictionary, searched, default ):
    for key in searched:
        if dictionary.has_key( key ):
            value = dictionary[key]
        else:
            value = default

def test_get( dictionary, searched, default ):
    for key in searched:
        value = dictionary.get(key, default)

def test_try( dictionary, searched, default ):
    for key in searched:
        try:
            value = dictionary[key]
        except KeyError:
            value = default

def test_in( dictionary, searched, default ):
    for key in searched:
        if key in dictionary:
            value = dictionary[key]
        else:
            value = default

def test():
    factor = 100
    test_functions = [test_has_key, test_get, test_try, test_in]
    #big dict, small search, small
    inner_test( 'big dict, small search', 1000*factor, 1*factor, test_functions )

    #even dict/search
    inner_test( 'even', 1000*factor, 1000*factor, test_functions )

    #small dict, big search
    inner_test( 'small dict, big search', 1*factor, 1000*factor, test_functions )

if __name__ == '__main__':
    test()

Thursday, April 14, 2005

Live communications Server

The task was simple, "install a live-communications server". Me beeing a Microsoft-avoider I found out most things the hard way.

To install live communications server, you gotta have a windows 2004 and up server. Had to install one.

Do not try out live-communications server 2003, as it either fails, or lacks much of the configuration ease that 2005 offers.

If you want live communications server, you need a domain-controller that manages the active directory ( required to manage rights and users, for what it's worth ).

Here's where the trouble starts. In a demo-environment you'll never get rights to the produktive domain ( if any ). Since some of the stuff required for communications server requires you to make changes to what the Microsoft guys call schema, which is inter-domain stuff ( nono, no subdomain either ).

You gotta have a standalone domain, or any other domain admin of a produktive domain's very likely to kill you.

However, that means that you can't enjoy the benefit of having user of the other domain automatically beeing authenticated in yours, because trusts between domains include resources, not however the live-communications stuff.

Having that learned, I tried out connecting to the live-communications server with windows-messenger. That was easy enough, and you actually do not have to be part of the domain the live-communications server sits in in order for this to work ( as long as you can adress the machine somehow, that's enough )

Next up: Installing Sharepoint portal. That'll get a little difficult, as on our cached select site we've only got a very old sharepoint portal ( 2001 ), and this won't even install on a windows 2003 server. So I gotta have to have a Sharepoint 2005 sp1, just don't know from where.

Wednesday, April 13, 2005

Video Capture, weee!

vidcap + PIL + pygame =

import pygame, vidcap, Image

screen = pygame.display.set_mode( (320,240) )
dev = vidcap.new_Dev( 0, False )

while not pygame.event.get( pygame.QUIT ):
   buf, width, height = dev.getbuffer()
   img_buf = Image.frombuffer( 'RGB', (width,height), buf )
   vid_surface = pygame.image.fromstring( img_buf.tostring( 'raw', 'BGR' ), (width,height), 'RGB' )
   screen.blit( vid_surface, (0,0) )
   pygame.event.pump()
   pygame.display.flip()

pygame.quit()

vidcap + PIL + pygame + twisted + pymedia = ?

Now let me get that OS stream-compression and twisted, and let's cook our own video-streaming-game-fun thingy.
( I bet I've violated a dozen or so patents by now )

Update: Somebody pointed me to pymedia

Saturday, April 09, 2005

Bad movies

Today I had a lenghty talk about bad movies on IRC. I think this sequence of comments is worth remembering as the bottom of the pit of cynism we managed to find ourselfes in.

__doc__: I fear lucas will post-morten torture me with gruelsome badly done starwars licensed crap when I'm well in my fourties.
linkmastersab: The original star wars nerds will be dead by the time 9 comes out
linkmastersab: Good for them

Friday, April 08, 2005

Roll-Off

This means leaving one project and beeing on the bench for the next.

Yesterday was my roll-off in a project in Vienna. I wrote some scripts and small applications there in python, for the purpose of reporting. Usually at the end of a project you'll get a feedback. Then there are some good points and some bad ones.

Some of the good points for me where

Florian has excelent analysis skills and basically created the detailed designs independently. As he was later implementing the tools as well, it was not necessary to create detailed technical documents. Florian used an agile development approach. He is able to quickly understand complex requirements and shows areas for better designs.

Quite high praise. As you might guess, all I did was listening, hacking some python quickly and adapting and improoving it without hesitation. No magic at all.
But of course everything is measured in relation. The golden standard in this project in regards to data analysis has been cobol, excel, access, perl or java. An easy environment to be good with python :)

Thursday, April 07, 2005

Lotus Notes aka ( /dev/null )

Today morning. Clicking on the update records field in a Lotus Notes database. I got the error message popup "B-Tree structure Invalid". That's that, as a user am I Supposed to cheer now for the helpful error message?

Image to this later this week.

Pyalot out

Pyalot