rrule to RFC-string

I’ve been playing around with Postgres a lot lately, and I had the idea to store RRULE data in a Postgres Composite Type.

It then occurred to me I didn’t need to reimplement all of the great stuff that is in the python dateutil module: I can just use PL/Python and import it.

The next step was realising I didn’t need a custom type at all: I could instead use a new domain over text that validates that the string is RFC 2445 compatible.
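For example, the domain could lean on dateutil for that validation, via a PL/Python check function. A minimal sketch, assuming the plpythonu language is installed (the names here are mine):

CREATE OR REPLACE FUNCTION is_rrule(value text)
RETURNS boolean
LANGUAGE plpythonu IMMUTABLE
AS $$
# Valid if dateutil can parse it; invalid if parsing raises.
from dateutil.rrule import rrulestr
try:
    rrulestr(value)
except (ValueError, TypeError, KeyError):
    return False
return True
$$;

CREATE DOMAIN rrule_text AS text CHECK (is_rrule(VALUE));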

That’s all well and good, but dateutil doesn’t come with a way to convert rrule objects back into a string.

There is a patch, but it wasn’t quite right. So here is my method (which can be monkey-patched, as seen at the end):

FREQNAMES = ['YEARLY','MONTHLY','WEEKLY','DAILY','HOURLY','MINUTELY','SECONDLY']

def rrule_to_string(rule):
    output = []
    h,m,s = [None] * 3
    if rule._dtstart:
        output.append(rule._dtstart.strftime('DTSTART:%Y%m%dT%H%M%S'))
        h,m,s = rule._dtstart.timetuple()[3:6]
    
    parts = ['FREQ='+FREQNAMES[rule._freq]]
    if rule._interval != 1:
        parts.append('INTERVAL='+str(rule._interval))
    if rule._wkst:
        parts.append('WKST='+str(rule._wkst))
    if rule._count:
        parts.append('COUNT='+str(rule._count))
    
    for name, value in [
            ('BYSETPOS', rule._bysetpos),
            ('BYMONTH', rule._bymonth),
            ('BYMONTHDAY', rule._bymonthday),
            ('BYYEARDAY', rule._byyearday),
            ('BYWEEKNO', rule._byweekno),
            ('BYWEEKDAY', rule._byweekday),
            ]:
        if value:
            parts.append(name+'='+','.join(str(v) for v in value))
    
    # Only include these if they didn't just come from rule._dtstart
    if rule._byhour and list(rule._byhour) != [h]:
        parts.append('BYHOUR=' + ','.join(str(v) for v in rule._byhour))
    if rule._byminute and list(rule._byminute) != [m]:
        parts.append('BYMINUTE=' + ','.join(str(v) for v in rule._byminute))
    if rule._bysecond and list(rule._bysecond) != [s]:
        parts.append('BYSECOND=' + ','.join(str(v) for v in rule._bysecond))

    output.append(';'.join(parts))
    return '\n'.join(output)

from dateutil.rrule import rrule
rrule.__str__ = rrule_to_string

I’ve only lightly tested it (in both regular python and PL/Python). I’ve also come up with a neat way of caching rrule objects between function calls, but that’s a topic for another day. Here it is in an SQL session:

SELECT rrulestr('DTSTART:19970902T090000 FREQ=DAILY;INTERVAL=10;COUNT=5');

            rrulestr            
--------------------------------
 DTSTART:19970902T090000       +
 FREQ=DAILY;INTERVAL=10;COUNT=5
(1 row)
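The rrulestr function in that session is a PL/Python wrapper. For completeness, here’s a minimal sketch of the shape it could take, assuming the code above is importable on the server’s python path (as a module I’ll call rrule_utils; the real function may differ):

CREATE OR REPLACE FUNCTION rrulestr(value text)
RETURNS text
LANGUAGE plpythonu IMMUTABLE
AS $$
# Parse the incoming string, then re-serialise it using rrule_to_string.
from dateutil.rrule import rrulestr
from rrule_utils import rrule_to_string
return rrule_to_string(rrulestr(value))
$$;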

Querying JSON in Postgres

Yesterday, I discovered how you can enable jsonb in postgres/psycopg2.

Today, I experimented around with how to query the data in json columns. There is documentation, but it wasn’t initially clear to me how the different operations worked.

CREATE TABLE json_test (
  id serial primary key,
  data jsonb
);

INSERT INTO json_test (data) VALUES 
  ('{}'),
  ('{"a": 1}'),
  ('{"a": 2, "b": ["c", "d"]}'),
  ('{"a": 1, "b": {"c": "d", "e": true}}'),
  ('{"b": 2}');
  

So far, so good. Let’s see what’s in there, to check:

SELECT * FROM json_test;
 id |                 data                 
----+--------------------------------------
  1 | {}
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(5 rows)  

Super. Let’s have a go at filtering those results. There are several operators that we can use (and we’ll soon see why we chose jsonb).

Equality

Only available for jsonb, we can test that two JSON objects are identical:

SELECT * FROM json_test WHERE data = '{"a":1}';
 id |   data   
----+----------
  2 | {"a": 1}
(1 row)

Containment

Again, jsonb only, we can see if one JSON object contains another. In this case, containment means “is a subset of”.

SELECT * FROM json_test WHERE data @> '{"a":1}';

Give me all objects that contain the key "a", with the value 1 associated with that key:

 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

Containment goes both ways:

SELECT * FROM json_test WHERE data <@ '{"a":1}';

In this case, we can see that the query object is a superset of the empty object, as well as matching exactly to object 2.

 id |   data   
----+----------
  1 | {}
  2 | {"a": 1}
(2 rows)

Key/element existence

The last batch of jsonb-only operators: we can test for the existence of a key (or an element of type string in an array, but we’ll get to those later).

SELECT * FROM json_test WHERE data ? 'a';

Give me all objects that have the key a.

 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(3 rows)

We can also test for objects that have any of a list of keys:

SELECT * FROM json_test WHERE data ?| array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(4 rows)

And, as you may expect, for objects that have all of the keys:

SELECT * FROM json_test WHERE data ?& array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

Key-path traversal

We can also filter records that have a matching key-path. In simple cases, using the containment operators might be simpler, but in more complex situations, we would need to use these. These operations can also be used to extract a value, although at this stage I’m only interested in using them as part of a WHERE clause.

SELECT * FROM json_test WHERE data ->> 'a' > '1';

Give me all the records where the value of the element associated with key a is greater than 1. Notice the need to use a text value, rather than a number. I’m still investigating how this will play out.

 id |           data            
----+---------------------------
  3 | {"a": 2, "b": ["c", "d"]}
(1 row)
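Alternatively, we could cast the extracted text, making the comparison numeric. A variation on the above (note it will error if any matched row has a non-numeric value for a):

SELECT * FROM json_test WHERE (data ->> 'a')::int > 1;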

We can also do comparisons between primitives, objects and arrays:

SELECT * FROM json_test WHERE data -> 'b' > '1';
 id |                 data                 
----+--------------------------------------
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(3 rows)

So, it seems that arrays and objects sort greater than numbers.

We can also look deeper down the path:

SELECT * FROM json_test WHERE data #> '{b,c}' = '"d"';

Give me objects where element b has a child object that has element c equal to the string "d". Neat.

 id |                 data                 
----+--------------------------------------
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(1 row)

There are also versions of these operators that return text, rather than a json object. In the case of the last query, that means we don’t need to compare to a JSON object (in the case where we actually want a string).

SELECT * FROM json_test WHERE data #>> '{b,c}' = 'd';
 id |                 data                 
----+--------------------------------------
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(1 row)

Don’t cross the streams…

So, all good so far. We can query stuff, and this same stuff can be used to index jsonb columns, too.
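For example, a GIN index covers the operators we’ve been using. The default operator class supports @>, ?, ?| and ?&, while the jsonb_path_ops class is smaller and faster, but only supports @> (the index names here are mine):

CREATE INDEX json_test_data ON json_test USING GIN (data);
CREATE INDEX json_test_data_paths ON json_test USING GIN (data jsonb_path_ops);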

However, the more astute reader may have noticed that I’ve been dealing with json data that has an object as its root. This needn’t be the case: arrays are also valid json, indeed so are any of the allowable atoms:

SELECT 
  'null'::json, 
  'true'::json, 
  'false'::json, 
  '2'::json,
  '1.0001'::json,
  '"abc"'::json, 
  '1E7'::jsonb;

Note the last one is a jsonb, which converts to canonical form:

 json | json | json  | json |  json  | json  |  jsonb   
------+------+-------+------+--------+-------+----------
 null | true | false | 2    | 1.0001 | "abc" | 10000000
(1 row)

Note also that a json null is different to an SQL NULL.

So, what happens when we start storing objects of mixed “type” in a json column?

I’m glad you asked.

INSERT INTO json_test (data) 
VALUES ('[]'), ('[1,2,"a"]'), ('null'), ('1E7'), ('"abc"');

SELECT * FROM json_test;
 id |                 data                 
----+--------------------------------------
  1 | {}
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
  6 | []
  7 | [1, 2, "a"]
  8 | null
  9 | 10000000
 10 | "abc"
(10 rows)

So far, so good. We can store those objects. And query?

Equality testing works fine:

SELECT * FROM json_test WHERE data = '{"a":1}';
SELECT * FROM json_test WHERE data = 'null';

Containment, too, works as expected.

SELECT * FROM json_test WHERE data @> '{"a":1}';
SELECT * FROM json_test WHERE data <@ '{"a":1}';

Key and element existence perform reliably: perhaps surprisingly, the same query will match elements in an array, as well as keys in an object.

SELECT * FROM json_test WHERE data ? 'a';
 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  7 | [1, 2, "a"]
(4 rows)
SELECT * FROM json_test WHERE data ?| array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
  7 | [1, 2, "a"]
(5 rows)
SELECT * FROM json_test WHERE data ?& array['a', 'b'];
 id |                 data                 
----+--------------------------------------
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(2 rows)

But, as soon as we start doing key or element ‘get’, we hit a problem:

SELECT * FROM json_test WHERE data ->> 'a' > '1';

ERROR: cannot call jsonb_object_field_text 
       (jsonb ->> text operator) on an array

We can still use the key-path traversal, though, unless we have scalar values:

SELECT * FROM json_test WHERE data #> '{b,c}' = '"d"';
ERROR:  cannot call extract path from a scalar
SELECT * FROM json_test WHERE data #> '{b,c}' = '"d"' AND id < 8;
 id |                 data                 
----+--------------------------------------
  4 | {"a": 1, "b": {"c": "d", "e": true}}
(1 row)

Note the syntax for a key path: it only allows for strings (which json keys must be), or integers (which array indices are).
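For instance, we can address element 1 of the array under b (with the same id guard to sidestep the scalar rows):

SELECT * FROM json_test WHERE data #> '{b,1}' = '"d"' AND id < 8;
 id |           data            
----+---------------------------
  3 | {"a": 2, "b": ["c", "d"]}
(1 row)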

This seems like a pretty severe limitation. I’m not sure how things like MongoDB handle this, but in hindsight, if you are storing both array-based and object-based json data in the one column, you are probably going to be in a world of hurt anyway.

…or, maybe, do cross the streams

All is not lost, however: it’s possible to get just the object-based rows:

SELECT * FROM json_test WHERE data @> '{}';
 id |                 data                 
----+--------------------------------------
  1 | {}
  2 | {"a": 1}
  3 | {"a": 2, "b": ["c", "d"]}
  4 | {"a": 1, "b": {"c": "d", "e": true}}
  5 | {"b": 2}
(5 rows)

You could then combine this with a previously-forbidden query:

SELECT * FROM json_test WHERE data @> '{}' AND data ->> 'a' > '1';
 id |           data            
----+---------------------------
  3 | {"a": 2, "b": ["c", "d"]}
(1 row)

Indeed, postgres is so awesome you don’t even need to ensure the data @> '{}' bit comes first!

But what about limiting to just array-typed data? Turns out we can use the same trick:

SELECT * FROM json_test WHERE data @> '[]';
 id |    data     
----+-------------
  6 | []
  7 | [1, 2, "a"]
(2 rows)

And, again, combine with the other required operator:

SELECT * FROM json_test WHERE data @> '[]' AND data ->> 1 = '2';
 id |    data     
----+-------------
  7 | [1, 2, "a"]
(1 row)

Worth noting is that the @> operator is only available on jsonb columns, so you won’t be able to query mixed-form data in a regular json column.

Wow! What’s next?

This foray into querying jsonb data in postgres was an aside to a project I’m working on to bring json(b) querying to django. With django 1.7’s new custom lookup features, it will be possible to write things like:

# Exact
MyModel.objects.filter(data={'a': 1})
MyModel.objects.exclude(data={})
# Key/element existence
MyModel.objects.filter(data__has='a')
MyModel.objects.filter(data__has_any=['a', 'b'])
MyModel.objects.filter(data__has_all=['a', 'b'])
# Sub/superset of key/value pair testing
MyModel.objects.filter(data__contains={'a': 1})
MyModel.objects.filter(data__in={'a': 1, 'b': 2})
# Get element/field (compare with json)
MyModel.objects.filter(data__get=(2, {'a': 1}))
# Get element/field (compare with scalar, including gt/lt comparisons)
MyModel.objects.filter(data__get=(2, 'a'))
MyModel.objects.filter(data__get__gt=('a', 1))
# key path traversal, compare with json or scalar.
MyModel.objects.filter(data__get=('{a,2}', {'foo': 'bar'}))
MyModel.objects.filter(data__get=('{a,2}', 2))
MyModel.objects.filter(data__get__lte=('{a,2}', 2))

I’m still not sure about the lookup names, especially the last set. The name “get” seems a little generic, and maybe we could use different lookup names for the input type, although only integer and string values are permitted.

Python, postgres and jsonb

I maintain a json field for django, and was working today on getting the new (1.7+) lookup code to play nicely: in order for this to happen, you basically need to be running Postgres 9.4, and using a jsonb column. Otherwise, querying kind-of sucks.

After a significant amount of work, drifting backwards and forwards between having old and new code working, I had an idea.

Some time ago I discovered that psycopg2 has really nice support for some custom types. Indeed, it’s super-easy to get it to handle UUID and json data. But it seems that it hasn’t yet been made to work with jsonb.

However, the registration process for handling the data makes it possible to do so, and trivial, since the serialised form will be essentially identical for both:

psycopg2.extras.register_json(
    conn_or_curs=None,
    globally=False,
    loads=None,
    oid=None,
    array_oid=None
)

Note the last two arguments. We can trick psycopg2 into using jsonb instead of json.

In your database, execute:

SELECT oid, typarray FROM pg_type WHERE typname = 'jsonb';
-- oid      --> 3802
-- typarray --> 3807

Your values may vary (I’m really not sure under what circumstances), but you’ll simply need to call register_json with those two values:

register_json(oid=3802, array_oid=3807)

Now, assuming you have a jsonb column, when you fetch data from it, it will already be turned into python objects.

Python 2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import psycopg2
>>> conn = psycopg2.connect("")
>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM jsonb_test;")
>>> data = cur.fetchone()
>>> data
(1, '{}')
>>> from psycopg2.extras import register_json
>>> register_json(oid=3802, array_oid=3807)
(<psycopg2._psycopg.type 'JSON' at 0x101713418>, <psycopg2._psycopg.type 'JSONARRAY' at 0x101721208>)
>>> cur.execute("SELECT * FROM jsonb_test;")
>>> data = cur.fetchone()
>>> data
(1, {})
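If hard-coding the oids bothers you, they can be looked up at connection time instead. A minimal sketch (register_jsonb is my name for the helper):

import psycopg2
from psycopg2.extras import register_json

def register_jsonb(conn):
    # Fetch the oids for this server, rather than assuming 3802/3807.
    cursor = conn.cursor()
    cursor.execute("SELECT oid, typarray FROM pg_type WHERE typname = 'jsonb'")
    oid, array_oid = cursor.fetchone()
    register_json(conn, oid=oid, array_oid=array_oid)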

ssh ProxyCommand

If you need to tunnel through a machine to get to another one, using ssh, you can easily do this with a ProxyCommand entry in your .ssh/config:

Host foo
ProxyCommand ssh <gateway-machine> -W <target-machine>:%p

If you have avahi-daemon installed on your gateway machine, you can do even better:

Host foo.local
ProxyCommand ssh <gateway-machine> -W %h:%p

I use this to get access to my office machine(s), without having to worry about firing up a VPN. I can rewrite the hostname so that I don’t need to have a different entry for each machine in the office:

Host *.office
ProxyCommand ssh <gateway-machine> -W $(echo %h | sed s/.office/.local/):%p

Bingo, now I can get really simple access to any machine in the office (as long as it has avahi installed, or is a Mac).

matt@alpaca:~ $ ssh karma.office
Last login: Fri May 23 19:52:29 2014 from 10.0.0.2
matt@karma:~ $

Transparent header on scroll

I saw a nice effect the other day, on SourceJS. Basically, when you scroll the page at all, the header becomes transparent, but when you hover over it, it becomes opaque again.

Pretty, yet surprisingly easy to do.

Assume your HTML looks something like:

<body>
  <div id="main-menu">
    <!-- this contains the header -->
  </div>
</body>

The trick to getting stuff really smooth, with limited scripting required, is to use CSS transitions. So, instead of manually changing the opacity of the element, we just set/unset a class on the body, and have some CSS rules to set the opacity.

window.addEventListener('scroll', function () {
  document.body.classList[
    window.scrollY > 20 ? 'add': 'remove'
  ]('scrolled');
});

This fires every time there is a scroll event. In my browser, adding/removing this class on the classList takes ~0.01ms.
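If that ever becomes a concern, the handler can be coalesced to at most one class change per frame. A variation (not required for this effect):

var scrollScheduled = false;
window.addEventListener('scroll', function () {
  if (scrollScheduled) return;
  scrollScheduled = true;
  requestAnimationFrame(function () {
    scrollScheduled = false;
    document.body.classList[
      window.scrollY > 20 ? 'add' : 'remove'
    ]('scrolled');
  });
});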

Finally, there is the required CSS.

body.scrolled #main-menu {
  opacity: 0.2;
  transition: opacity .2s;
}
body.scrolled #main-menu:hover {
  opacity: 1.0;
  transition: opacity .2s;
}

That’s all there is to it!

Here’s one I prepared earlier.

Thoughts on Mutation Testing in Python (part 1)

Writing code is fun.

Writing tests is a great way to have code that is likely to work.

Using a coverage tool will show you what percentage of your code is executed when you run your tests. But getting 100% coverage does not mean your code is 100% tested.

Take for example the following:

def product(a, b):
    return a * b

How might we go about testing this function?

>>> product(2, 2)
4

Okay, so technically, we now have 100% coverage of our function. Every line is executed when running the tests, but is it really tested?

What happens if we change our original function, and see if the tests pass:

def product(a, b):
    return a + b

Hmm. When we run that, with those arguments, we still pass our test.

What we have done here is mutate our code, and in this case, the mutant survived.

In order to test this code properly, we want all possible mutations to be killed; that is, for every mutant, at least one test should fail.
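For this function, a test whose arguments aren’t symmetric under the mutation does the job: with the mutant in place, product(2, 3) returns 5 rather than 6, and the test fails.

>>> product(2, 3)
6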

This is the first post in a series on mutation testing in python. Up next, we will investigate the types of mutants/mutations, and how they apply to python.

Get the class of a Django view function

I needed to be able to get the class of a view function, once it had been instantiated via MyView.as_view(). I’d done something similar in the past to get the base callable view, but this was slightly different.

from django.views.generic.base import View

def get_class(func):
    if not getattr(func, 'func_closure', None):
        return
        
    for closure in func.func_closure:
        contents = closure.cell_contents
        
        if not contents:
            continue
        
        if getattr(contents, '__bases__', None) and issubclass(contents, View):
            return contents
        
        result = get_class(contents)
        if result:
            return result

This is a recursive function that does a depth-first search on the function object, until it finds an object that is a class, and is a subclass of django.views.generic.base.View.

You can use it like:

from django.core.urlresolvers import resolve
view = resolve('/path/to/url')

view_class = get_class(view.func)
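A quick note for Python 3, where func_closure no longer exists: the same attribute is spelled __closure__ (which Python 2.6+ also exposes), so a portable sketch of the same search looks like:

def get_class(func):
    # __closure__ works on both Python 2.6+ and Python 3.
    closure = getattr(func, '__closure__', None)
    if not closure:
        return

    for cell in closure:
        contents = cell.cell_contents

        if not contents:
            continue

        if getattr(contents, '__bases__', None) and issubclass(contents, View):
            return contents

        result = get_class(contents)
        if result:
            return result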

KnockoutJS HTML binding

TL;DR: Don’t use KnockoutJS html binding lots of times in your page.

I’m in the middle of rewriting a large part of our application in HTML: for a lot of the interactivity stuff, anything more than just a simple behaviour, I’m turning to KnockoutJS.

Mostly, it’s been awesome. Being able to use two-way binding is the obvious big winner, but dependency tracking is also fantastic.

However, I have had some concerns with performance in the past, and this was always on my mind as I moved into quite a complicated part of the system.

Our approach is that we are not creating a single page application: different parts of the system are at different URLs, and visiting a page loads up the relevant javascript. This is a deliberate tradeoff, mostly because for the foreseeable future, our software will not work without a connection to our server: most of the logic related to shift selection is handled there. We aren’t about to change that.

While rewriting the rostering interface, I initially had Django render the HTML, and I added behaviours. This was possible, and quite fast, however as the behaviours became more complex, I was doing things like sending back scripts that caused other parts of the page to refresh themselves. It was all rather fragile.

So, I went back to KnockoutJS. After a while, I noticed significant slowdowns when dealing with pages that really shouldn’t have been that slow. I’d optimised the database access for the fetching of shifts (and indeed, it is much faster than before), but it felt like Knockout was very sluggish.

I do have quite a few ko.computed() objects, perhaps they were slowing it down? Notably, the function that filters which shifts should be shown where on the page.

So I put some console.time()/timeEnd() calls in place.

Nope: the initial parse of the data runs in less than half a millisecond: instantiating the objects took a while, but the filtering of shifts was taking much less than 100ms.

However, the initial call to ko.applyBindings() was taking several seconds.

The most annoying thing was that when the developer tools were open, it was taking far, far longer!

Eventually, through using the developer tools profiling, I discovered that the slowdown was because of repeated code like:

foo.innerHTML = bar;

Initially, I had thought this slowdown was in KnockoutJS itself, and played around with other ways of binding (such as using the knockout-repeat plugin). Still slow.

Eventually, however, I worked out that it was the act of interacting with the DOM in this manner that was slow. More specifically, the assignment to innerHTML was occurring in the html: binding.

Looking through my source code, I discovered code that looked like:

<span data-bind="html: icon"></span>

And, icon contained the HTML I wanted to put in there:

<i class="icon-ok"></i>

Which was a bad idea to begin with: it conflated UI with data. So, I replaced the code that looked like:

this.icon = '<i class="icon-ok"></i>';

With:

this.icon = {
  'icon-ok': true
};

And then, in the HTML:

<i data-bind="css: icon"></i>

Bingo. All of a sudden, a page that took several seconds to re-render does so in around a second.

It’s important to note that this pattern was repeated several times for each shift: and we have possibly dozens of shifts on a page. When you really need to use the html binding that’s fine, just don’t stick it inside a loop (or worse still, inside a nested loop).

Scheme line "values"

Years ago, when I first saw TextMate demonstrated, one of the ways it was used as a teaching tool, when teaching Ruby, was to have the current line executed, and the value it returned appended to the current line:

(2 + 3) * 4 / 5 # => 4

That is, pressing Cmd-Shift-Ctrl-E would execute the line, and update the marker.

Today, while playing around with Scheme, I came up with a neat way to do the same type of thing.

Initially, I made it so that it executed the current line, and added/updated the marker. Then, I realised I could load the file, and then execute the current line.

You can create a new bundle command, and bind it to whatever key you want, with a scope selector of source.scheme, Input of Line, Output of Replace Input.

#!/usr/bin/env bash

[[ -f "${TM_SUPPORT_PATH}/lib/bash_init.sh" ]] && . "${TM_SUPPORT_PATH}/lib/bash_init.sh"

# Evaluate the current line in our Scheme interpreter
#
# The interpreter you use should be set in the environment
# variable TM_SCHEME

# The whole file will be loaded, and the current line's value executed,
# and added to the line as a comment.

INTERPRET=${TM_SCHEME}
CMD=$(basename "$INTERPRET")

LINE=`cat /dev/stdin | sed 's/; =>.*//'`
VALUE=`echo $LINE | $INTERPRET --load $TM_FILEPATH | grep ';Value: ' | sed 's/;Value: //'`

echo -n $LINE "; =>" $VALUE

Unfortunately, trailing comments are handled as a separate line, so getting the ruby-like behaviour of updating all of the ; => comments will have to wait for another day.

django-boardinghouse

I wrote a heap of code last April, under the name Multi-tenanted Django. It was fairly complete, but not especially well documented, and not really that well tested.

Recently, I’ve been having to write some reporting code at work that dealt with objects that are generated by django-reversion. If I was using tenancy-based partitioning, it would be really easy for me to just fetch the changes that were made to data from a given company: instead I need to do heaps of queries, and lots of filtering.

Which got me enthused on django-multi-schema, which has since been renamed to django-boardinghouse. And, it now has its own documentation, and an example project.

I’m still a bit cagey about releasing it to pypi, as the example project is pretty simple, and I’d like to build that (or another project) up a bit to see if I’ve made any more bad decisions: I’ve already changed it so that separate schemas are opt-in rather than opt-out, and added in a configurable SCHEMA_MODEL.

It currently passes all tests under django 1.4 - 1.6, and has some functionality under django 1.7, but the migration handling code is not well tested just yet.