Thursday, April 28, 2005

dictionary speed

This entry got me thinking.

Here's my 5 cents. In all cases the construct

    if key in dictionary:
        value = dictionary[key]
    else:
        value = default

performed best.
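For readers on a current Python 3 interpreter (an assumption; the numbers in this post come from Python 2.4), here is a minimal sketch of the lookup idioms being compared. `dict.has_key` no longer exists in Python 3, so only three variants remain. The helper names `lookup_in`, `lookup_get` and `lookup_try` are made up for illustration.

```python
def lookup_in(dictionary, key, default=None):
    # Membership test first, then a plain index.
    if key in dictionary:
        return dictionary[key]
    return default

def lookup_get(dictionary, key, default=None):
    # Let dict.get handle the missing-key case.
    return dictionary.get(key, default)

def lookup_try(dictionary, key, default=None):
    # Index optimistically and catch the miss.
    try:
        return dictionary[key]
    except KeyError:
        return default

sample = {"spam": 1, "eggs": 2}
for key in ("spam", "missing"):
    results = [f(sample, key, 0) for f in (lookup_in, lookup_get, lookup_try)]
    print(key, results)
```

All three return the same value for a given key; they only differ in how much work a hit or a miss costs.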

update: I increased the amount of data to reduce the noise in the test results, though the picture hasn't changed. A word on the output: the program runs four test functions against nine data sets. The nine data sets combine three size distributions between dictionary and search keys ( big dict/small search, even, small dict/big search ) with three sizes of overlap between the search keys and the dictionary keys ( big, small, half ).
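As a rough sketch of how those nine data sets come about, the snippet below enumerates the combinations. The sizes and intersection ratios are the ones the program passes in `test()` and `inner_test()`; the names `size_distributions`, `intersections` and `data_sets` are only illustrative, and the code is written for Python 3.

```python
from itertools import product

FACTOR = 100  # same scaling factor the program uses

# (label, dictionary size, search size)
size_distributions = [
    ("big dict, small search", 1000 * FACTOR, 1 * FACTOR),
    ("even", 1000 * FACTOR, 1000 * FACTOR),
    ("small dict, big search", 1 * FACTOR, 1000 * FACTOR),
]

# (label, fraction of the search keys that should also be dictionary keys)
intersections = [("big", 0.9), ("small", 0.1), ("half", 0.5)]

# Three size distributions times three intersections gives nine data sets.
data_sets = [
    ("%s, %s intersect" % (size_label, hit_label), dict_size, search_size, ratio)
    for (size_label, dict_size, search_size), (hit_label, ratio)
    in product(size_distributions, intersections)
]

for label, dict_size, search_size, ratio in data_sets:
    print("%-42s dict=%-7d search=%-7d intersection=%.1f"
          % (label, dict_size, search_size, ratio))
```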

update2: Why, the version of python is 2.4 of course :)

Here's the output of the program.

big dict, small search, big intersect
    test_has_key  0.9690
    test_get      1.0000
    test_try      1.2190
    test_in       0.5930
big dict, small search, small intersect
    test_has_key  0.8750
    test_get      1.0320
    test_try      6.6250
    test_in       0.5150
big dict, small search, half intersect
    test_has_key  0.9220
    test_get      1.0310
    test_try      3.9530
    test_in       0.5470
even, big intersect
    test_has_key  1.4370
    test_get      1.4840
    test_try      1.7030
    test_in       1.0780
even, small intersect
    test_has_key  1.4530
    test_get      1.6090
    test_try      7.3120
    test_in       1.0780
even, half intersect
    test_has_key  1.4690
    test_get      1.5620
    test_try      4.5630
    test_in       1.0780
small dict, big search, big intersect
    test_has_key  0.1090
    test_get      0.1250
    test_try      0.7970
    test_in       0.0630
small dict, big search, small intersect
    test_has_key  1.0790
    test_get      1.2180
    test_try      7.0160
    test_in       0.7340
small dict, big search, half intersect
    test_has_key  0.5790
    test_get      0.6560
    test_try      3.8440
    test_in       0.3910
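
One way to read these numbers is to look at how much slower `test_try` is than `test_in` as the share of missing keys grows. The sketch below does that arithmetic for a few rows copied from the output above; the `timings` dictionary and the `slowdown` helper are illustrative names, not part of the program.

```python
# Timings in seconds, copied from the output above (a subset of the rows).
timings = {
    "big dict, small search, big intersect": {"test_try": 1.2190, "test_in": 0.5930},
    "big dict, small search, small intersect": {"test_try": 6.6250, "test_in": 0.5150},
    "even, small intersect": {"test_try": 7.3120, "test_in": 1.0780},
    "small dict, big search, small intersect": {"test_try": 7.0160, "test_in": 0.7340},
}

def slowdown(row, slow="test_try", fast="test_in"):
    # Ratio of the slower variant's time to the faster one's.
    return row[slow] / row[fast]

for label, row in timings.items():
    print("%-42s try/in = %.1fx" % (label, slowdown(row)))
```

In these rows the try/except variant falls furthest behind when the intersection is small, that is, when most lookups miss and raise `KeyError`.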


And that's the program:


import sys, time

def random_key( keylen = 10 ):
    import random
    return ''.join(chr(random.randint(ord('a'),ord('z'))) for _ in range( keylen ))

def new_key( existing_keys ):
    key = random_key()
    while key in existing_keys:
        key = random_key()
    return key

def unique_keys( amount=1000, existing_keys=set() ):
    keys = set( existing_keys )
    for _ in xrange( amount ):
        key = new_key( keys )
        keys.add( key )
    return keys-existing_keys

def data( dictionary_size=1000, search_size=3000, intersection=0.5 ):
    dict_keys = unique_keys( dictionary_size )
    intersection_amount = int(search_size*intersection)
    unique_search_keys = unique_keys( search_size-intersection_amount, dict_keys )
    intersection_search_keys = set(list(dict_keys)[0:intersection_amount])
    return dict((key,None) for key in dict_keys), unique_search_keys | intersection_search_keys


def core_test( out_prefix, dictionary, searched, test_functions ):
    print out_prefix
    for function in test_functions:
        start = time.time()
        for _ in range( 20*(len(dictionary)/len(searched) or 1) ):
            function( dictionary, searched, None )
        end = time.time()
        print '\t%-20s %2.4f'%(function.__name__, end-start)

def inner_test( out_prefix, dictionary_size, search_size, test_functions ):
    #big intersection
    dictionary, search = data( dictionary_size, search_size, 0.9 )
    core_test( '%s, big intersect'%out_prefix, dictionary, search, test_functions )
    #small intersection
    dictionary, search = data( dictionary_size, search_size, 0.1 )
    core_test( '%s, small intersect'%out_prefix, dictionary, search, test_functions )
    #half intersection
    dictionary, search = data( dictionary_size, search_size, 0.5 )
    core_test( '%s, half intersect'%out_prefix, dictionary, search, test_functions )


#test functions
def test_has_key( dictionary, searched, default ):
    for key in searched:
        if dictionary.has_key( key ):
            value = dictionary[key]
        else:
            value = default

def test_get( dictionary, searched, default ):
    for key in searched:
        value = dictionary.get(key, default)

def test_try( dictionary, searched, default ):
    for key in searched:
        try:
            value = dictionary[key]
        except KeyError:
            value = default

def test_in( dictionary, searched, default ):
    for key in searched:
        if key in dictionary:
            value = dictionary[key]
        else:
            value = default

def test():
    factor = 100
    test_functions = [test_has_key, test_get, test_try, test_in]
    #big dict, small search, small
    inner_test( 'big dict, small search', 1000*factor, 1*factor, test_functions )

    #even dict/search
    inner_test( 'even', 1000*factor, 1000*factor, test_functions )

    #small dict, big search
    inner_test( 'small dict, big search', 1*factor, 1000*factor, test_functions )

if __name__ == '__main__':
    test()
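
The program above is Python 2.4 code ( `print` statements, `xrange`, `has_key` ). For anyone who wants to repeat the comparison on Python 3, here is a minimal sketch of the same idea built on the standard `timeit` module. It is not a port of the program: `make_data`, `run` and the `bench_*` functions are illustrative names, `has_key` is left out because Python 3 dropped it, and the sizes in the usage part are arbitrary.

```python
import random
import string
import timeit

def make_data(dictionary_size=1000, search_size=1000, intersection=0.5, seed=0):
    """Build a dictionary plus search keys, of which a given share are hits."""
    rng = random.Random(seed)

    def random_key(keylen=10):
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(keylen))

    dict_keys = set()
    while len(dict_keys) < dictionary_size:
        dict_keys.add(random_key())

    wanted_hits = int(search_size * intersection)
    # Fewer hits than wanted if the dictionary is smaller than that.
    hits = sorted(dict_keys)[:wanted_hits]
    misses = set()
    while len(misses) < search_size - wanted_hits:
        candidate = random_key()
        if candidate not in dict_keys:
            misses.add(candidate)
    return dict.fromkeys(dict_keys), hits + sorted(misses)

def bench_get(dictionary, searched, default):
    for key in searched:
        value = dictionary.get(key, default)

def bench_try(dictionary, searched, default):
    for key in searched:
        try:
            value = dictionary[key]
        except KeyError:
            value = default

def bench_in(dictionary, searched, default):
    for key in searched:
        if key in dictionary:
            value = dictionary[key]
        else:
            value = default

def run(dictionary, searched, repeat=3, number=5):
    """Return the best time in seconds per variant, keyed by function name."""
    results = {}
    for function in (bench_get, bench_try, bench_in):
        timer = timeit.Timer(lambda f=function: f(dictionary, searched, None))
        results[function.__name__] = min(timer.repeat(repeat=repeat, number=number))
    return results

if __name__ == "__main__":
    dictionary, searched = make_data(10000, 10000, 0.1)
    for name, seconds in run(dictionary, searched).items():
        print("%-10s %.4f" % (name, seconds))
```

Taking the minimum over several repeats is one common way to damp noise; whether the ranking matches the Python 2.4 results above is something to measure, not assume.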

Thursday, April 14, 2005

Live communications Server

The task was simple: "install a live-communications server". Being a Microsoft-avoider, I found out most things the hard way.

To install live communications server, you gotta have a Windows 2003 or later server. Had to install one.

Do not try out live-communications server 2003, as it either fails, or lacks much of the configuration ease that 2005 offers.

If you want live communications server, you need a domain-controller that manages the active directory ( required to manage rights and users, for what it's worth ).

Here's where the trouble starts. In a demo environment you'll never get rights to the production domain ( if there is one ), and some of the stuff live communications server needs requires changes to what the Microsoft guys call the schema, which is inter-domain stuff ( nono, a subdomain won't do either ).

You gotta have a standalone domain, or the admin of any production domain is very likely to kill you.

However, that means you can't enjoy the benefit of having users of the other domain automatically being authenticated in yours, because trusts between domains cover resources, but not the live-communications stuff.

Having learned that, I tried connecting to the live-communications server with Windows Messenger. That was easy enough, and you don't actually have to be part of the domain the live-communications server sits in for this to work ( as long as you can address the machine somehow, that's enough ).

Next up: installing SharePoint Portal. That'll get a little difficult, as on our cached select site we've only got a very old SharePoint Portal ( 2001 ), and this won't even install on a Windows 2003 server. So I need a SharePoint 2003 SP1, I just don't know where to get it from.

Wednesday, April 13, 2005

Video Capture, weee!

vidcap + PIL + pygame =
import pygame, vidcap, Image

screen = pygame.display.set_mode( (320,240) )
dev = vidcap.new_Dev( 0, False )

while not pygame.event.get( pygame.QUIT ):
    buf, width, height = dev.getbuffer()
    img_buf = Image.frombuffer( 'RGB', (width,height), buf )
    vid_surface = pygame.image.fromstring( img_buf.tostring( 'raw', 'BGR' ), (width,height), 'RGB' )
    screen.blit( vid_surface, (0,0) )
    pygame.event.pump()
    pygame.display.flip()

pygame.quit()


vidcap + PIL + pygame + twisted + pymedia = ?

Now let me get that OS stream-compression and twisted, and let's cook our own video-streaming-game-fun thingy.
( I bet I've violated a dozen or so patents by now )

Update: Somebody pointed me to pymedia

Saturday, April 09, 2005

Bad movies

Today I had a lengthy talk about bad movies on IRC. I think this sequence of comments is worth remembering as the bottom of the pit of cynicism we managed to find ourselves in.

__doc__: I fear lucas will post-morten torture me with gruelsome badly done starwars licensed crap when I'm well in my fourties.
linkmastersab: The original star wars nerds will be dead by the time 9 comes out
linkmastersab: Good for them

Friday, April 08, 2005

Roll-Off

This means leaving one project and being on the bench for the next.

Yesterday was my roll-off in a project in Vienna. I wrote some scripts and small applications there in Python, for the purpose of reporting. Usually at the end of a project you get feedback, with some good points and some bad ones.

Some of the good points for me were:

Florian has excellent analysis skills and basically created the detailed designs independently. As he was later implementing the tools as well, it was not necessary to create detailed technical documents. Florian used an agile development approach. He is able to quickly understand complex requirements and shows areas for better designs.
Quite high praise. As you might guess, all I did was listen, hack some Python quickly, and adapt and improve it without hesitation. No magic at all.
But of course everything is relative. The gold standard for data analysis in this project has been COBOL, Excel, Access, Perl or Java. An easy environment to look good in with Python :)

Thursday, April 07, 2005

Lotus Notes aka ( /dev/null )

This morning, clicking on the update records field in a Lotus Notes database, I got the error message popup "B-Tree structure Invalid". That's it. As a user, am I supposed to cheer now for the helpful error message?

An image of this will follow later this week.

Pyalot out