Python


Using TextMate, it is very easy to get snippets and commands to do things that you often do. However, the python bundle is a bit lacking, and this is a great opportunity to improve that.

I’ve created a Command that will enter a newline, and if not inside a list, function call, dictionary or multi-line string, automatically add a trailing \.

I’ve hooked it in to the Enter key, and the other settings can be seen in the screenshot below:

The actual code follows:

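#!/usr/bin/env ruby

# TextMate passes the scopes at the caret as a space-separated list in TM_SCOPE.
scope = ENV['TM_SCOPE'].split

# Scopes in which a line may continue without a trailing backslash.
no_trail = ['punctuation.definition.arguments.end.python',
            'meta.structure.list.python',
            'meta.structure.dictionary.python',
            'string.quoted.single.block.python',
            'string.quoted.double.block.python']

# No such scope at the caret: backslash, then newline. Otherwise a plain newline.
print((scope & no_trail).empty? ? "\\\n" : "\n")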

Note for future reference, especially since the data migration script I was running today kept running out of memory and crashing.

You can use slice notation, which limits the query:

>>> for p in db.query(Person)[:10]:
...   print p
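As far as I know, SQL Alchemy turns a slice on a query into LIMIT/OFFSET in the generated SQL, so only the requested rows are fetched. The sketch below shows the same idea using only the standard library's sqlite3 module. The person table and its contents are made up for illustration and are not from the migration script.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical "person" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("person %d" % n,) for n in range(100)],
)

# Rough SQL equivalent of db.query(Person)[:10]: the database returns
# ten rows instead of the whole table being loaded into memory.
first_ten = conn.execute(
    "SELECT id, name FROM person ORDER BY id LIMIT 10"
).fetchall()

for row in first_ten:
    print(row)
```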

This rocks.


I’ve had cause to transfer a whole stack of data from an old sqlite database to a PostgreSQL database, and since on the new database I am using SQL Alchemy, I used SA to do the transfer.

Because of some of the relations, I have had to keep IDs constant across some of the columns. This is fine, but because they have Sequence objects associated with them, these are not kept up to date with the ‘custom’ created IDs. Thus, when attempting to create a new row in the table, it often fails, since it is trying to add a primary key that already exists.

After a little bit of research, I discovered it is possible to force the sequence object to increment, using the command:

nextID = engine.execute(Sequence('sequence_name'))

Using a little bit of magic, we can find out the largest index currently in use:

maxID = db.query(Object).order_by('-id').first().id

And a simple while loop will keep incrementing until we have reached the correct id.
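A minimal sketch of that loop follows. To keep it runnable without a database, the sequence is replaced by a plain counter. In the real script, next_value would presumably wrap the engine.execute(Sequence('sequence_name')) call shown above. The names advance_sequence and fake_next_value are hypothetical.

```python
import itertools


def advance_sequence(next_value, max_id):
    """Call next_value() until it returns at least max_id; return that value."""
    current = next_value()
    while current < max_id:
        current = next_value()
    return current


# Stand-in for the database sequence: yields 1, 2, 3, ...
_counter = itertools.count(1)


def fake_next_value():
    return next(_counter)


# Pretend the largest id copied across was 5.
last = advance_sequence(fake_next_value, max_id=5)
print(last)
```

After the loop, the next value handed out by the sequence is larger than every id already in the table.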


Three languages. Three different Object Relational Mapping systems. One operating system.

Over the past couple of days, I’ve been madly learning how to create programs in Core Data, using Objective C and Cocoa. It’s given me plenty of food for thought, and made me perhaps think that it’s not that SQL Alchemy rocks my world, it’s just that J2EE/EJB is just ORM done wrong.

I actually got exposure to SQL Alchemy in detail before J2EE - I first came across them at the same time, but using SQL Alchemy was at work, and I basically had to learn in 2 days what a full semester worth of J2EE taught me. Perhaps that was just because learning the python stuff was so damn easy.

Now that I understand just how cool an Object Relational Mapping is, getting into Core Data was easy. Creating the same schema in SQL Alchemy (actually, using Elixir, so it was just object creation) and then in a Core Data xcdatamodel was basically the same process, but it was even simpler in Core Data, simply because it is a GUI tool and you can see the whole model in one go, instead of having to scroll through a text file examining classes.

But doing GUI programming using the ORM is where Core Data really shines. Using Cocoa Bindings, you can just plonk down an NSTableView, and tell the object where it gets its data from. If you have two GUI widgets using the same data model, and you change selection in one, it even changes the selection in the other!

Core Data also helps look after Undo, Saving and all other sorts of goodness I haven’t even come across yet.

The only thing that Core Data isn’t good for is a multi-user system - or more precisely, a system where multiple users are accessing the data at the same time. I’ve used SOAP as the messenger format, where I had a rich client accessing services provided by my server, but this is cumbersome. I’ll just use Pylons, or perhaps Django to do a web application - where the interface is largely a Web Browser. In this instance, it will probably be best to just stick to a python-based approach. I’m tempted by WebObjects, but that would still require me to use Java.

If only Apple would release a distributed Core Data. I might be able to do something kind of cool with Distributed Objects, for a rich client, at least, but for Web, I may as well do the whole thing in python.


One of the criticisms of many languages is that they are so complex that people do not use all of their features, meaning that one person might write a program in a language that another fluent reader/writer of that language may not be able to understand. I think this is just rubbish, at least in some respects. If you see some code in a language that doesn’t make sense, then hopefully you should be able to either figure it out, or look up in the documentation (what, your language doesn’t have complete online, searchable documentation?) what is going on.

A similar thing happened to me the other day - and I didn’t even need to look up in the Python documentation to see what the story was.

People often criticise python because it lacks the ability to mark a variable as private, or protected. All attributes of a class are automatically public, and it is only really a guideline that if you prefix a variable with one or two underscores, you are marking it as protected or private. Name mangling means that this becomes slightly more than just a guideline for the two-underscore case, IIRC, although I’m not really sure of how this all works. Basically, the theory is that if you prefix an attribute, that is a marker to users of your API that they really shouldn’t use it.

I’ve often been frustrated when writing Java programs that attributes aren’t accessible, but I can see the value in only allowing protected access, through getters and setters. You can do this to prevent user classes/objects from putting garbage data into your attributes. This becomes even more important in Python, as its dynamic (but still strong) typing means I could put a “GOO” in where you expected an integer.

I discovered yesterday, as I was implementing a database in SQL Alchemy, that python has the ability to only allow protected access, through getters and setters, using the property function.

So, for this project, I have a database, where Sheep objects are stored, along with a variety of attributes. Sheep can be either Rams or Ewes, but the data that is stored about a Ram is virtually identical to that stored about a Ewe. However, using polymorphic mapping, I can have Ram and Ewe objects both stored in the Sheep table, since they are both sub-classes. Furthermore, I can use the gender attribute to determine if the object retrieved from the database should be a Ram or a Ewe. This is much better than anything I was able to do with J2EE, at least as far as I could find.

Because I am using a composite primary key (again, not something that is easy to do with J2EE, without defining a Primary Key class, ugh), a Sheep can have a .sire and a .dam, each of which is stored internally as three columns in the table. A sire/dam must also be older than the sheep in question (and of a particular gender), so I need some way of restricting which data can be assigned to the sire and dam attributes (which, using SQL Alchemy, are references to other objects in the table).

Thus, I can have the following lines in the class definition for Sheep:

    sire = property(_getSire, _setSire)
    dam = property(_getDam, _setDam)

The crux of this post is that now, if you attempt to set a sire, using sheep.sire = otherSheep, python will pass otherSheep to the _setSire method of the Sheep class.
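Here is a minimal sketch of how such a setter can guard an attribute. It is a toy class with no database behind it. The constructor arguments and the exact checks are assumptions for illustration, and only the names _getSire, _setSire and sire come from the lines above.

```python
class Sheep(object):
    """Toy stand-in for the mapped class; nothing here touches a database."""

    def __init__(self, name, gender, year):
        self.name = name
        self.gender = gender  # "M" or "F"
        self.year = year      # year of birth
        self._sire = None

    def _getSire(self):
        return self._sire

    def _setSire(self, other):
        # Reject anything that cannot be the sire of this sheep.
        if not isinstance(other, Sheep):
            raise TypeError("sire must be a Sheep")
        if other.gender != "M":
            raise ValueError("sire must be male")
        if other.year >= self.year:
            raise ValueError("sire must be older than its offspring")
        self._sire = other

    # Plain attribute syntax now goes through the two methods above.
    sire = property(_getSire, _setSire)


ram = Sheep("ram", "M", 2004)
ewe = Sheep("ewe", "F", 2004)
lamb = Sheep("lamb", "F", 2007)

lamb.sire = ram        # accepted: calls lamb._setSire(ram)
try:
    lamb.sire = ewe    # rejected: wrong gender
except ValueError as error:
    print(error)
```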

This goes even further - it now allows me to either use a Ram object, or a Sheep object (and uses some checking to ensure that it is a Gender=M), or even just a string that is the string representation of a sheep - the value that I am using for the three parts of the primary key. It will then look up the sheep in question, or create a new one if it doesn’t exist in the database.

Thus, a sheep is identified to the real world by three things: its flock number, such as 160188, the year it was born, like 2004, and the tag number, such as 040001, which is not unique between flocks, or necessarily years. For the flock this system is designed for, the tag number is unique for sheep of a given century, which would generally suffice, but in a situation where there were more than 9999 sheep born every year in a flock, then you wouldn’t just be able to use the last two digits of the year for the start of the tag number.

Thus a sheep might be declared as 160188-2007-070001, which is a unique identifier of that sheep in the world (or at least Australia!), but having this as a primary key on its own means more jigglery would be required to get the flock number and year. By storing it as three separate parts it is possible to keep an easy reference to the flock, and the year. There is no chance of a data integrity error, as there might be if I stored the full id as well as the year and flock number, since it would then be possible to change one without the others being updated.
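As a rough illustration of the three-part identifier, the sketch below splits the string form into its parts and joins them back together. The names SheepId, parse_sheep_id and format_sheep_id are hypothetical. Keeping the flock and tag as strings is an assumption, made so that leading zeros survive.

```python
from collections import namedtuple

# The three parts that together identify one sheep.
SheepId = namedtuple("SheepId", ["flock", "year", "tag"])


def parse_sheep_id(text):
    """Split 'flock-year-tag' into its three parts."""
    flock, year, tag = text.split("-")
    return SheepId(flock, int(year), tag)


def format_sheep_id(sheep_id):
    """Build the string form from the three stored parts."""
    return "%s-%d-%s" % sheep_id


parsed = parse_sheep_id("160188-2007-070001")
print(parsed.flock, parsed.year, parsed.tag)
print(format_sheep_id(parsed))
```

Because the string is always derived from the parts, the two can never disagree.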

I’ve also done something quite clever with the mapping of the sire/dam relationship. By having different classes for Ram and Ewe, and mapping sire/dam to these, it is possible to use the same backreference term, offspring, to generate a list of sheep who are offspring of the Ram or Ewe in question. If you don’t use subclasses, then it isn’t nearly as neat.

If you can’t tell, I’m loving SQL Alchemy. I had some issues figuring out what the hell was going wrong with my relations between sheep and sire, but it turned out that you have to use the remote_side argument to the relation function to do what I wanted. Funnily, it was working with dam, but not with sire, but now it’s much more solid. No more circular reference constraints either.

By the way, this project is going to be an online database for livestock information, allowing potential purchasers to research pedigrees online. If anyone else is interested in purchasing it, send me an email. At this stage, it is sheep only, but will be easy to change to, say cattle if they are your thing…


electric_skateboard_double_comic.png

[From xkcd - A webcomic of romance, sarcasm, math, and language - By Randall Munroe]

Ah, this combines my two great loves… python and Calvin and Hobbes.


For a Uni project, I have to write the same algorithm different ways, and in different programming paradigms. I also need to collect data on the execution of said programs. Since some algorithms may take a very long time on large data sets, I need to stop execution after 30 minutes of run time. And, since there needs to be between 50 and 64 runs of each data size, I wanted to automate the process.

Using $ time other-command is the most obvious way to time the execution of any command line application on any decent operating system. However, the time that comes with OS X is somewhat limited - from the man page:

NAME

time — time command execution

SYNOPSIS

time [-lp] utility

It only has two options, and they aren’t that useful. However, it is possible to use this command to time execution.

Because it outputs its timings to stderr, you need to do some tricky python to capture them and save them to a file. I used os.popen3(), and then read from the stderr stream. That worked okay, but there isn’t really a way to stop execution after a certain time frame. You can try to use TimeoutFunctionException (I can’t remember where I got it from, but it’s cool), but it doesn’t work in this case, since the os.popenX calls run in a sub-process, which continues to run after the exception is raised. Fail.

About this time in my thought process, I came across a better time. GNU time allows you to select the output format (excellent, no need to parse the output quite so much!), and to output to a file instead of stderr/stdout (even better). It also allows appending to a file instead of overwriting, and some other cool stuff.

But it doesn’t solve the issue of commands running too long. This was a killer, since the process continues to run, but worse than this, new processes are started. So the machine clogs to a virtual halt.

Then I came across ulimit. This handy tool can limit the resources a user or process is able to use.

$ ulimit -t X; time -f %U -a -o {DATAFILE} {COMMAND} {ARGS}

This command will limit a command to X seconds of CPU time (or slightly less, since the time command itself uses some time). It will then execute COMMAND, with arguments ARGS, and time the execution, appending only the run time, on a separate line, to the file DATAFILE.
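For comparison, a similar effect can be sketched in pure python with the subprocess module, which kills the child process when the timeout expires. This is an alternative, not the approach used above: it measures wall-clock time rather than the user CPU time that %U reports, and the name timed_run is hypothetical.

```python
import subprocess
import sys
import time


def timed_run(command, limit_seconds):
    """Run command; return elapsed seconds, or None if it hit the limit."""
    start = time.perf_counter()
    try:
        # subprocess.run kills the child itself when the timeout expires.
        subprocess.run(
            command,
            timeout=limit_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


# A command that finishes quickly, and one that overruns its limit.
quick = timed_run([sys.executable, "-c", "pass"], limit_seconds=30)
slow = timed_run(
    [sys.executable, "-c", "import time; time.sleep(5)"], limit_seconds=0.5
)
print(quick, slow)
```

Each result could then be appended to the data file, one per line, with None marking a run that was cut off.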

Note that this is no excuse for not using code profiling. I have already run profile.run() on my code to work out where the slowdowns are in the python versions, and then optimised them. This is more like the last phase, actual comparisons.


Java almost handles arrays well.

Almost.

Maybe I’m spoilt by python, but having datatypes that are effectively a hybrid between lists and arrays is excellent. You get both of the advantages - being able to iterate easily, and access by index (attributes of arrays), and having dynamic sizes and non-sparse lists (the only decent attributes of lists).

In fact, the text I am reading now has a three-and-a-half page code fragment called “Partially-filled lists”, which is about 200 lines of code, which implements what I describe. Except the upper limit of the size, which must be determined at compile-time. And it requires a new class if you want it to be for anything other than doubles, or whatever you have written it for.

The other thing which was bugging me was looping over arrays. In python you can easily iterate over the elements of an array. Recent versions of Java can also do this.

Python:

for element in theList:
    print element

Java:

for (Object element : theList)
    System.out.println(element);

It gets pretty close. I think I still like the simplicity of the python notation - brackets only where they are really required to indicate function/method calls, and for expression ordering. Having a required bracket around if test-expressions and the like just makes me think if, switch and so on are functions. Which they can’t possibly be, since Java doesn’t have functions, only objects and methods.

And don’t get me started on braces…


For my day job, I am developing a SOAP server in python. I have been using ZSI as a framework, and it is very good. It will, with mod_python, allow you to build a complete application in python (and even without mod_python you can have it as a standalone process). One of the touted features is easy return of lists and dictionaries, without having to declare ComplexTypes classes.

However, it doesn’t quite work. And the not-working-bit is really odd.

If you return a dict, such as the following:

  return {"uid":23,"gid":993,"cid":333}

Then ZSI creates a SOAP response like:

<uid id="1234" xsi:type="xsd:int">23</uid>
<gid id="5678" xsi:type="xsd:int">993</gid>
<cid id="0987" xsi:type="xsd:int">333</cid>

But, if you return a dictionary with values that happen to be the same, as I did with my boilerplate code:

return {"uid":"xsd__string",
        "gid":"xsd__string",
        "cid":"xsd__string"}

Then it fails. The second and any other instance of any dict key where the value has already been used by another key is empty, and the wrong type:

<uid id="1234" xsi:type="xsd:string">xsd__string</uid>
<gid href="#5678"></gid>
<cid href="#0987"></cid>
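One possible explanation, and this is a guess rather than something confirmed from the ZSI source, is that the serializer tracks values by object identity and writes an href back-reference for any object it has already seen. The toy serializer below is not ZSI code. It only shows how that kind of bookkeeping produces the pattern above when several keys share one string object.

```python
def serialize(mapping):
    """Emit one element per key; repeated objects become href references."""
    seen = set()  # identities of the values already written out in full
    lines = []
    for key, value in mapping.items():
        ref = id(value)
        if ref in seen:
            # Same object as an earlier value: point at it instead of repeating it.
            lines.append('<%s href="#%d"></%s>' % (key, ref, key))
        else:
            seen.add(ref)
            lines.append('<%s id="%d">%s</%s>' % (key, ref, value, key))
    return lines


# One string object shared by all three keys, as with the boilerplate dict.
shared = "xsd__string"
doc = serialize({"uid": shared, "gid": shared, "cid": shared})
for line in doc:
    print(line)
```

In this toy version the back-references point at the id of the first element, which the output above does not do, so the real cause may well be different.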

This can be overcome with liberal use of classes (or subclasses, since most of the time I am returning dicts or lists). It is a bit of a pain in the arse, though. I’ve filed a bug report. And stopped using ZSI. If I found this bug this fast, then I don’t want to know how many more there are. It’s just easier to convert the XML to python objects and back again, and package it up to look like a SOAP request. Which is kind of what SOAP does anyway.


I never thought I’d manage to find a job coding in python for a living. But that is what my full-time job is, right now, anyway.

I can’t really talk about the work I’m doing, since it is a commercial enterprise, but I can talk about what sort of things I am coding. I have spent the last couple of weeks rebuilding a server in python that uses SOAP to communicate with the outside world (well, a client application, anyway) over SSL connections. Python is used so it is easily extensible, without having to recompile. Eventually it will be dynamic, where new modules can be added to a database, and depending on the userid of a request, a different function will be called. It’s really quite exciting.

I’ve come across a couple of new software programs - one of which is NX (nomachine) Client, which is a remote tunnel for X windowing. I can remote in via this to work from home, as well as ssh or sftp. Which is fairly cool. Speedier than VNC, since I think the local X-Windowing system is responsible for some of the drawing tasks. Feels about the same over ADSL as ARD does over WLAN.

In my “free” time I’ve been doing a couple of other things, both programming tasks. The first is a web application for an art gallery to create HTML and PDF invitations and newsletters. Originally I planned to use a web app so that it could eventually be rolled out as a blog-alike - in fact originally it was just going to be a WordPress installation with some minor modifications. It turned out to be easier to rewrite it from scratch. I have learned from this process that PHP is crap: it’s never clear about the way to do stuff, and many functions have weird names. count_chars, for instance, doesn’t really count the characters, unless you decide that things like #@! aren’t characters. In which case you want strlen. Which had me tricked for some time, since I stopped looking once I had found count_chars. Python and len(anything) is much better.

Speaking of python (again), I’ve also been working on a Regular Expression helper - similar to the one that comes with Komodo IDE. I started (and pretty much finished, in a matter of hours, to the extent it solved my first use problem) this after having to load up Komodo just to get a visual representation of which bits of a text block were being matched by a regex. Still some kinks to work out - I need to figure out how to put stuff into an outline view, so I can see not just matches, but match groups. Then it will be all good.
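The data such a helper needs is all available from python's re module. The sketch below is not the helper itself, just a hypothetical describe_matches function that collects, for each match, its position in the text, the matched text and the match groups, which is what an outline view would display.

```python
import re


def describe_matches(pattern, text):
    """Return (span, matched text, groups) for every match of pattern in text."""
    return [
        (match.span(), match.group(0), match.groups())
        for match in re.finditer(pattern, text)
    ]


report = describe_matches(r"(\d{4})-(\d{2})", "born 2004-07, sold 2007-01")
for span, matched, groups in report:
    print(span, matched, groups)
```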

In the process of my work job, I downloaded SOAP Client, a freeware tool for testing SOAP packets. It was all good until I tried sending HTTPS requests, which it fails unfathomably on (cannot connect to endpoint…). I emailed the author, and he then promptly released the source code. I’ve snaffled that from Google Code, and I’ll try to hack through it a bit to implement SSL connections. Not sure how to go about it at this stage - dunno if it is with WebKit or something else I need to do. I also plan to add in the ability to edit the SOAP request manually before it is sent off to the server.

I start Uni in a couple of weeks - I’m doing an introductory Java course in intensive mode, which I expect to be fairly easy. I’m really only doing it so I can do the meatier sounding subjects, like Programming Language Concepts and Systems Programming. I really think I’m going to enjoy this course. I will be interested when I come to the Internet Computing subject, since I’ve been doing a fair bit of that in various forms over the past few years. Be interesting to see what the academics think it means.

Well, that’s been my life over the time period since coming back from the beach for a 10 day holiday over New Year. Apart from squeezing in a few games of Touch Football here and there, I’ve pretty much been chained to my laptop.

And loving it.

