Using other Python versions with Codeship.

Codeship is pretty cool, other than their requirement to log in to view even public builds. They support Python to some extent, even going as far as creating and activating a virtualenv for your test environment.

However, I like to use tox to do matrix testing against packages, and try to cover as many cases as possible. For instance, for django-boardinghouse, I currently test against:

  • Python 2.7
  • Python 3.3
  • Python 3.4
  • Python 3.5
  • pypy
  • pypy3

…and Django 1.7 through 1.9. In most cases, each version of python should be tested with each version of django. In practice, there are some exceptions.
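
For reference, the relevant tox configuration looks something like this (a simplified sketch using tox’s generative envlist and factor-conditional deps; the real django-boardinghouse config differs):

[tox]
envlist = py{27,33,34,35,py,py3}-django{17,18,19}

[testenv]
deps =
    django17: Django>=1.7,<1.8
    django18: Django>=1.8,<1.9
    django19: Django>=1.9,<1.10
commands = python setup.py test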

However, Codeship only have Python 2.7.6 and 3.4.0 installed.

You can run arbitrary code as part of your test setup, but you can’t install stuff using sudo. Instead, I wrote a script that can be called from within the test setup to install other pythons:

# We already have some versions of python, but want some more...
cd ~/src

mkdir -p pypy
cd pypy
wget https://bitbucket.org/squeaky/portable-pypy/downloads/pypy-5.0.1-linux_x86_64-portable.tar.bz2
tar --strip-components 1 -xvf pypy-5.0.1-linux_x86_64-portable.tar.bz2
cd ..

mkdir -p pypy3
cd pypy3
wget https://bitbucket.org/squeaky/portable-pypy/downloads/pypy3-2.4-linux_x86_64-portable.tar.bz2
tar --strip-components 1 -xvf pypy3-2.4-linux_x86_64-portable.tar.bz2
cd ..

mkdir -p ~/.local
wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tar.xz
tar xvf Python-3.5.1.tar.xz
cd Python-3.5.1
./configure --prefix=/home/$USER/.local/
make
make install

# You actually need to put this line in the tests section. Not sure of a better solution.
export PATH=$PATH:~/src/pypy3/bin:~/src/pypy/bin:~/.local/bin/

I have this as a reusable snippet on BitBucket: codeship helper scripts. However, as mentioned, you need to grab the export PATH=... line and put it in the tests section yourself. Also notably, you get a different URL for the raw version of each revision, which is actually really good, because it means someone cannot change the code between you inspecting it and executing it.

In my case, I have a line in the test setup:

curl https://bitbucket.org/\!api/2.0/snippets/schinckel/oKXKy/c7cc02bcd96d4a8f444cd997d5c3bc0bb92106d6/files/install-python.sh | sh

Also of note is that both pypy variants have pre-built portable versions, which are much faster to install than building from source; however, there doesn’t seem to be a non-rpm pre-built version of Python 3.5, so that one must be compiled.

Thoughts on Mutation testing in Python

Writing code is fun.

Writing tests can be less fun, but is a great way to have code that is more likely to work as desired.

Using a coverage tool will show you what percentage of your program is executed when you run your tests, but getting 100% code coverage does not mean your code is 100% tested.

For instance, testing a function:

def product(a, b):
  return a * b

If your tests are not carefully written, you may end up with tests that pass when they really should not. For instance:

>>> product(2, 2)
4

This execution shows only that it works for that choice of values. But how many sets of values do we need to test in order to be satisfied that a function does what is expected?

I happened across a document about Mutation Testing some time ago. It was new to me (and to a couple of people I respect), and quite interesting. The idea behind it is that you change small parts of the program and re-run your tests: if the tests still succeed, then your tests are not sufficient.

Mutation Testing, then, is testing your tests.

For instance, mutating the * in the previous function to + results in a passing test (2 + 2 and 2 * 2 are both 4), even though the code has changed (and, to us at least, is clearly different). Thus, we need to improve our test(s).

In theory, we should be able to mutate each statement in our program in all the ways it could be mutated, and if any of these mutants are not killed by our tests, then our tests are insufficient.

In practice, this is untenable. Some mutants will be equivalent (for instance, a * b is the same as b * a), but more significantly, there will be many, many mutants created for all but the simplest of programs.


Aside: doctest

Python has a fantastic library called doctest. This enables you to write tests as part of the function definition. I’ll use that here, as it makes writing the tests simple.

def product(a, b):
  """Multiply two numbers together.

  >>> product(2, 3)
  6
  """
  return a * b

They are really useful for documentation that is an accurate representation of what the code does. They don’t replace unit tests (indeed, you’ll want your unit tests to run your doctests), but they are good fun.
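
Running the doctests in a module requires no framework at all. A minimal example (mymodule here is a stand-in for whatever module contains the doctests):

import doctest
import mymodule  # stand-in for the module containing the doctests

results = doctest.testmod(mymodule)
print(results)
# e.g. TestResults(failed=0, attempted=2)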

This investigation will use doctests, as then the tests for a module are always self-contained.


Selection of mutations

With a suitably complicated program, the number of possible mutations will be very, very large. We will, for now, consider applying only one mutation per mutant; doing otherwise would result in a huge number of possible mutants.

There is quite a bit of academic research into Mutation Testing (indeed, if I were an academic, I’d probably write a paper or two on it), and one part that is relatively well documented is the discussion of mutation selection. That is, how can we reduce the number of mutant programs (and make our tests run in a satisfactory time), without missing mutations that could show how our tests may be improved.

Offutt et al. (1996) discuss how the number of mutation operators can be reduced, and conclude that with just five operators, sufficient tests can be determined.

Using the classifications from Mothra, they are:

  • ABS: ABSolute value insertion. Force arithmetic expressions to take on (-, 0, +) values.
  • AOR: Arithmetic Operator Replacement. Replace every arithmetic operator with every syntactically legal other arithmetic operator.
  • LCR: Logical Connector Replacement. Replace every logical connector (and, or) with several kinds of logical connectors.
  • ROR: Relational Operator Replacement. Replace every relational operator with every other relational operator.
  • UOI: Unary Operator Insertion. Insert unary operators in front of expressions. In the case of python, the unary + operator will not be added, as that usually results in an equivalent mutant. I’m actually also going to include swapping unary operators as part of this operation.

These mutation operations make sense in python. For instance, in our toy example, we’d see the following (showing only the return statement from our product function):

# original
return a * b

# ABS
return -1
return 0
return 1
return a * -1
return a * 0
return a * 1
return -1 * b
return 0 * b
return 1 * b

# AOR
return a + b
return a - b
return a / b
return a // b
return a ** b
return a % b

# UOI
return -a * b
return a * -b
return -(a * b)
return (not a) * b
return a * (not b)
return not (a * b)
return ~a * b
return a * ~b
return ~(a * b)

We can see from our toy example that there are no LCR or ROR mutations possible, since there are no logical or relational operations or connectors. However, in the case of python, we could see code like:

# original
return a < b and a > 0

# LCR
return a < b or a > 0
return a < b and not a > 0
return a < b or not a > 0

# ROR
return a > b and a > 0
return a >= b and a > 0
return a <= b and a > 0
return a != b and a > 0
return a == b and a > 0

return a < b and a < 0
return a < b and a <= 0
return a < b and a == 0
return a < b and a >= 0
return a < b and a != 0

Limiting ourselves to just these five mutation operations is great: it simplifies our mutation code immensely.

Within the context of python, I actually think there should also be some other mutations. A common mistake in python code is to omit a call to super(), or to leave out the explicit self argument in a method definition. For now, we’ll stick with the five above though.

I’m guessing the extreme age of Mothra means it was never applied to object-oriented programming languages, let alone ones with multiple inheritance!

Okay, we are done with the generic stuff, time to get specific.


Aside: ast

Another great module included in python is ast. This allows you to build an Abstract Syntax Tree from (among other things) a string containing python code. Paired with astor, which can rewrite a tree back into python code, we can parse a module, perform our mutation operation(s), and write the result out again.

import ast
tree = ast.parse(open(filename).read())
ast.fix_missing_locations(tree)

The ast.fix_missing_locations call fills in any missing line numbers, which we may use for reporting later.
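
And to illustrate the round trip back to source (astor is a third-party package; the exact formatting of its output may differ slightly):

import ast
import astor

tree = ast.parse("def product(a, b):\n    return a * b\n")
print(astor.to_source(tree))
# def product(a, b):
#     return a * b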


Mutating the AST.

Bingo, now we have an abstract syntax tree containing our module. ast also contains classes for walking this tree, which we can subclass to do interesting things. For instance, to collect all statements that should be mutated, we can do something like:

SWAPS = {
  ast.Add: [ast.Mult, ast.Sub, ast.Div, ast.Mod, ast.Pow, ast.FloorDiv],
  # etc, etc
}

class Collector(ast.NodeVisitor):
  def __init__(self, *args, **kwargs):
    self.mutable_nodes = []

  def __visit_binary_operation(self, node):
    # Visit our left child node.
    self.visit(node.left)
    # Create all possible mutant nodes for this, according to AOR
    self.mutable_nodes.extend([
      (node, node.__class__(
        left=node.left, op=op(), right=node.right
      )) for op in SWAPS.get(node.op.__class__, [])
    ])
    # Visit our right child node.
    self.visit(node.right)

  # These three node types are handled identically in this sketch; in the
  # real ast module, BoolOp (op/values) and Compare (left/ops/comparators)
  # have different fields to BinOp, so a full implementation handles them
  # separately.
  visit_BinOp = __visit_binary_operation
  visit_BoolOp = __visit_binary_operation
  visit_Compare = __visit_binary_operation

  def visit_UnaryOp(self, node):
    # Create all possible mutant nodes, according to UOI.
    self.mutable_nodes.extend([
      (node, node.__class__(
        op=op(), operand=node.operand)
      ) for op in SWAPS.get(node.op.__class__, [])
    ])
    # Also, create a node without the operator.
    self.mutable_nodes.append((node, node.operand))
    self.visit(node.operand)

Our mutator code is actually much simpler. Since we keep a reference to the nodes we want to mutate (and the new node it should be replaced with), we can just swap them each time a given node is visited:

class Mutator(ast.NodeTransformer):
  def __init__(self, original, replacement):
    self.original = original
    self.replacement = replacement

  def __swap(self, node):
    # Handle child nodes.
    self.generic_visit(node)

    # Don't swap out the node if it wasn't our target node.
    if node not in [self.original, self.replacement]:
        return node

    # Swap the node back if we are being visited again.
    if node == self.replacement:
        return self.original

    # Otherwise, swap the node out for the replacement.
    return self.replacement

  # We can just use this same function for a whole stack of visits.
  visit_BinOp = __swap
  visit_BoolOp = __swap
  visit_Compare = __swap
  visit_UnaryOp = __swap

Note that calling mutator.visit(tree) on a mutated tree will revert the mutation.

To use these, assuming we have a special function test that runs the tests we want:

tree = ast.parse(open(filename).read())
ast.fix_missing_locations(tree)

results = test(tree)
if results.failed:
    raise Exception("Unable to run tests without mutations.")

# Collect all of the possible mutants.
collector = Collector()
collector.visit(tree)

survivors = []

for (node, replacement) in collector.mutable_nodes:
    mutator = Mutator(node, replacement)
    # Apply our mutation
    mutator.visit(tree)
    ast.fix_missing_locations(tree)

    try:
        results = test(tree)
        if not results.failed:
            # This mutant survived: our tests did not kill it.
            survivors.append((node, replacement, results))
    except Exception:
        # Looks like this mutant was DOA.
        pass
    finally:
        # Revert our mutation, even if the mutant failed to execute.
        mutator.visit(tree)

This is a bit of a simplification, but it’s pretty close to working code. We use multiple processes to run the tests in parallel (with a timeout based on the initial test run time, on the assumption that no mutant should take more than twice as long), and compile the tree into a module in order to test it.
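
For reference, here is a minimal sketch of what such a test function could look like: it compiles the (possibly mutated) tree into a throwaway module, and runs that module’s doctests. This illustrates the idea; it is not the actual pymutant implementation.

import doctest
import types

def test(tree):
    # Build a throwaway module, and execute the compiled tree inside it.
    module = types.ModuleType('mutant')
    code = compile(tree, '<mutant>', 'exec')
    exec(code, module.__dict__)
    # Run all of the doctests found in the module.
    return doctest.testmod(module, verbose=False)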

You may see the current version at pymutant. Keep in mind that this is little more than a proof of concept at this stage.


Testing our testing

So, let’s look at some toy examples, and see what the outcomes are.

def product(a, b):
    """
    >>> product(2, 2)
    4
    >>> product(2, 3)
    6
    """
    return a * b

def addition(a, b):
    """
    >>> addition(1, 0)
    1
    >>> addition(1, 1)
    2
    """
    return a + b

def negate(a):
    """
    >>> negate(1)
    -1
    >>> negate(-1)
    1
    """
    return -a

Running in a debug mode, we can view the mutants, and the results:

TestResults(failed=2, attempted=6) (a * b) -> (not (a * b))
TestResults(failed=1, attempted=6) (a * b) -> (a ** b)
TestResults(failed=2, attempted=6) (a * b) -> (a // b)
TestResults(failed=2, attempted=6) (a * b) -> 1
TestResults(failed=2, attempted=6) (a * b) -> (a % b)
TestResults(failed=2, attempted=6) (a * b) -> 0
TestResults(failed=2, attempted=6) (a * b) -> (- (a * b))
TestResults(failed=2, attempted=6) (a * b) -> (~ (a * b))
TestResults(failed=2, attempted=6) (a * b) -> True
TestResults(failed=2, attempted=6) (a * b) -> (-1)
TestResults(failed=2, attempted=6) (a * b) -> False
TestResults(failed=2, attempted=6) (a * b) -> (a / b)
TestResults(failed=1, attempted=6) (a * b) -> (a + b)
TestResults(failed=2, attempted=6) return (a * b) -> pass
TestResults(failed=2, attempted=6) (a * b) -> None
TestResults(failed=2, attempted=6) (a * b) -> (a - b)
TestResults(failed=2, attempted=6) (a + b) -> (not (a + b))
TestResults(failed=2, attempted=6) (a + b) -> (a % b)
TestResults(failed=1, attempted=6) (a + b) -> (a - b)
TestResults(failed=1, attempted=6) (a + b) -> (a ** b)
TestResults(failed=2, attempted=6) (a + b) -> (~ (a + b))
TestResults(failed=2, attempted=6) (a + b) -> (-1)
TestResults(failed=1, attempted=6) (a + b) -> 1
TestResults(failed=2, attempted=6) (a + b) -> (- (a + b))
TestResults(failed=2, attempted=6) (a + b) -> (a // b)
TestResults(failed=2, attempted=6) (a + b) -> None
TestResults(failed=2, attempted=6) (a + b) -> 0
TestResults(failed=1, attempted=6) (a + b) -> True
TestResults(failed=2, attempted=6) (a + b) -> False
TestResults(failed=2, attempted=6) return (a + b) -> pass
TestResults(failed=2, attempted=6) (a + b) -> (a / b)
TestResults(failed=2, attempted=6) (a + b) -> (a * b)
TestResults(failed=2, attempted=6) (- a) -> 0
TestResults(failed=2, attempted=6) (- a) -> (- (- a))
TestResults(failed=2, attempted=6) return (- a) -> pass
TestResults(failed=2, attempted=6) (- a) -> (not (- a))
TestResults(failed=1, attempted=6) (- a) -> True
TestResults(failed=2, attempted=6) (- a) -> False
TestResults(failed=1, attempted=6) (- a) -> (-1)
TestResults(failed=2, attempted=6) (- a) -> None
TestResults(failed=1, attempted=6) (- a) -> 1
TestResults(failed=2, attempted=6) (- a) -> (~ (- a))

===============
Mutation Report
===============

* Generated 42 mutants, and tested in 0.107773065567 seconds.

* 0 of these mutants were unable to execute correctly.

* 0 of these mutants were killed for taking too long to execute.

* Tests killed of 42 of the remaining mutants, leaving 0 survivors.

* Your innoculation rate is 100%.

Well, that’s all nice, but there’s more we can think about than this. What about tests that are “useless”? Tests that never fail, for instance?

Unfortunately, doctest.testmod() only returns the counts of failures and attempts (and looking into that module, I’m not sure that which tests passed or failed is actually stored anywhere). It would be really nice to be able to capture this, but perhaps that is a task for a unittest-based approach.
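
For what it’s worth, the standard library can already wrap each doctest in a unittest test case, and unittest results do record individual failures. A sketch (mymodule again being hypothetical):

import doctest
import unittest

import mymodule  # hypothetical module under test

suite = doctest.DocTestSuite(mymodule)
result = unittest.TestResult()
suite.run(result)
# result.failures holds a (test, traceback) pair for each failed test.
print(result.testsRun, len(result.failures))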

What about a slightly more complex example?

 1 def aggregate(items):
 2     """
 3     Aggregate a list of 2-tuples, which refer to start/finish values.
 4 
 5     Returns a list with overlaps merged.
 6 
 7     >>> aggregate([])
 8     []
 9     >>> aggregate([(1, 3)])
10     [(1, 3)]
11     >>> aggregate([(1, 3), (2, 6)])
12     [(1, 6)]
13     >>> aggregate([(1, 3), (4, 6)])
14     [(1, 3), (4, 6)]
15     >>> aggregate([(3, 4), (1, 9)])
16     [(1, 9)]
17     """
18 
19     # Sort our items first, by the first value in the tuple. This means we can
20     # iterate through them later.
21     sorted_items = sorted(items)
22 
23     i = 0
24     while i < len(sorted_items) - 1:
25         current = sorted_items[i]
26         next = sorted_items[i + 1]
27 
28         if current[1] >= next[1]:
29             # Skip over the next item totally.
30             sorted_items.remove(next)
31             continue
32 
33         if current[1] >= next[0]:
34             # Merge the two items.
35             sorted_items[i:i+2] = ((current[0], next[1]),)
36             continue
37 
38         i += 1
39 
40     return sorted_items

And when we test with mutants (no debug mode this time, as we have 169 mutants):

===============
Mutation Report
===============

* Generated 169 mutants, and tested in 0.905333995819 seconds.

* 0 of these mutants were unable to execute correctly.

* 61 of these mutants were killed for taking too long to execute.

* Tests killed of 99 of the remaining mutants, leaving 9 survivors.

* Your innoculation rate is 94%.

Survivor Report
===============

0 at line 23, changed to False
(i < (len(sorted_items) - 1)) at line 24, changed to (- (i < (len(sorted_items) - 1)))
(i + 1) at line 26, changed to (i - 1)
(i + 1) at line 26, changed to (- (i + 1))
(i + 1) at line 26, changed to 1
(i + 1) at line 26, changed to (-1)
(i + 1) at line 26, changed to True
(current[1] >= next[1]) at line 28, changed to (- (current[1] >= next[1]))
(current[1] >= next[1]) at line 28, changed to (current[1] > next[1])

Timeout Report
==============

0 at line 23, changed to (-1)
...

I’ve omitted the remainder of the timeout report.

But this does show us that our tests are incomplete, and gives us hints about how to fix them.

In particular, note the group of mutants at line 26 that all survived, indicating that this particular line of code is not being tested well at all.
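
For instance, a doctest that exercises more than two non-overlapping ranges (so that the loop body runs with i > 0) should kill most of that group:

>>> aggregate([(1, 2), (3, 4), (5, 6)])
[(1, 2), (3, 4), (5, 6)]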

Perhaps the biggest takeaway (and an indicator of how Mutation Testing may be really useful) is the last listed mutant. It’s showing that this particular comparison is clearly not being tested for off-by-one errors.

tox and coverage.py

Tox makes it really easy to run your tests in multiple environments: against different versions of python, or different versions of a related library.

It’s still lacking proper matrix testing (you need to manually define each environment), but apparently that is going to change.

However, that’s not what I’m writing about today.

Today is about coverage testing, using coverage.py.

It’s possible, using tox, to get coverage.py to run:

[testenv]
commands=
  coverage run setup.py test
  coverage report
deps=
  coverage

However, this will generate a coverage report for just that environment. It would be better if you generated a coverage report for the whole project (although you may want per-environment coverage testing too).

So, we can abuse the fact that the environments in the tox envlist are created and processed in the order they appear:

[tox]
envlist = clean,py27,py34,stats

[testenv]
commands=
  coverage run -a setup.py test
deps=
  coverage

[testenv:clean]
commands=
  coverage erase

[testenv:stats]
commands=
  coverage report
  coverage html

You’ll then get a nice html report in htmlcov/, and a printed coverage report in your console.

Thoughts on Mutation Testing in Python (part 1)

Writing code is fun.

Writing tests is a great way to have code that is likely to work.

Using a coverage tool will show you what percentage of your code is executed when you run your tests. But getting 100% coverage does not mean your code is 100% tested.

Take for example the following:

def product(a, b):
    return a * b

How might we go about testing this function?

>>> product(2, 2)
4

Okay, so technically, we now have 100% coverage of our function. Every line is executed when running the tests, but is it really tested?

What happens if we change our original function, and see if the tests pass:

def product(a, b):
    return a + b

Hmm. When we run that, with those arguments, we still pass our test.

What we have done here is mutate our code, and in this case, the mutant survived.

In order to test this code correctly, we want all possible mutations to be killed; that is, the tests should fail when run against each mutant.
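
In this case, a test where multiplication and addition give different answers kills the mutant: the mutated function returns 5 below, so the test fails.

>>> product(2, 3)
6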

This is the first post in a series on mutation testing in python. Up next, we will investigate the types of mutants/mutations, and how they apply to python.

Django Management.tmbundle

Did some work on my Django Management.tmbundle last night.

It now handles running tests when (a) the apps are not directly in the project root, but inside another folder; and (b) the app/tests.py file has been split into separate files.

The main reason I made this was so that I could run tests and have clickable links in the results window for the location of failing tests.

There is still much to do on this. I am considering re-writing it in python rather than ruby, so I can programmatically find the app name, rather than guess it. I also want to refactor the hell out of it and make it much nicer.

Anyway, if you are interested, you can find the most recent version at http://github.com/schinckel/Django-Management.tmbundle - and I think it also appears in TextMate’s getBundles bundle.

Run Django Tests from TextMate

It would be cool to be able to run my Django tests from within TextMate.

Update: this version will run just the tests from the active file, if there are any. Otherwise, it runs all of the tests in the whole project.

Here is a Command to do just that:

#! /usr/bin/env ruby

command = [ENV["TM_PYTHON"] || "python", "-u", "#{ENV['TM_PROJECT_DIRECTORY']}/manage.py", "test", "--noinput"]

File.open(ENV['TM_FILEPATH']) do |f|
  f.readlines.each do |line|
    if line =~ /class (.*)\(.*TestCase\):/
      test_case = $1
      app_name = ENV['TM_FILEPATH'].split(ENV['TM_PROJECT_DIRECTORY'])[1].split('/')[1]
      test_name = "#{app_name}.#{test_case}"
      command << test_name
    end
  end
end

require ENV["TM_SUPPORT_PATH"] + "/lib/tm/executor"

ENV["PYTHONPATH"] = ENV["TM_BUNDLE_SUPPORT"] + (ENV.has_key?("PYTHONPATH") ? ":" + ENV["PYTHONPATH"] : "")

TextMate::Executor.run(command) do |str, type|
  if type == :err
    if str =~ /\A[\.EF]*\Z/
      str.gsub!(/(\.)/, "<span class=\"test ok\">\\1</span>")
      str.gsub!(/(E|F)/, "<span class=\"test fail\">\\1</span>")
      str + "<br/>\n"
    elsif str =~ /\A(FAILED.*)\Z/
      "<div class=\"test fail\">#{htmlize $1}</div>\n"
    elsif str =~ /\A(OK.*)\Z/
      "<div class=\"test ok\">#{htmlize $1}</div>\n"
    elsif str =~ /^(\s+)File "(.+)", line (\d+), in (.*)/
      indent = $1
      file   = $2
      line   = $3
      method = $4
      indent += " " if file.sub!(/^\"(.*)\"/, "\\1")
      url = "&url=file://" + e_url(file)
      display_name = file.split('/').last
      "#{htmlize(indent)}<a class=\"near\" href=\"txmt://open?line=#{line + url}\">" +
        (method ? "method #{method}" : "<em>at top level</em>") +
        "</a> in <strong>#{display_name}</strong> at line #{line}<br/>\n"
    end
  end
end

Installing ClearSilver 10.5 on OS X

I struggled for a couple of hours tonight trying to get ClearSilver to build under OS X, so I could investigate Bitten, a continuous integration tool for Trac.

Here’s how I succeeded.

Download the source to ClearSilver and unpack it. Then change to that directory. You’ll need to enter the following command to configure:

$ ./configure --disable-apache --disable-java --disable-ruby --with-python=`which python` --disable-perl --disable-csharp --enable-gettext

Then do the make; make install dance (you may need to sudo the second one).

Finally, and this one took me a second go to remember, change to the python subdirectory, and use:

$ python setup.py build

$ sudo python setup.py install
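
If the install worked, ClearSilver’s python bindings (which, if I recall correctly, are exposed as the neo_cgi module) should now import cleanly:

$ python -c "import neo_cgi"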