Using the pipeline module ruffus with Python 3.2

The Python module ruffus is a great companion for setting up pipelines easily and quickly. Unfortunately, its development focused on Python 2.x until recently. Recent changes get it most of the way to running under Python 3.x, but a few modifications are still needed.

First, download the development version of ruffus:

$ svn checkout http://ruffus.googlecode.com/svn/trunk/ ruffus-read-only

Then use the 2to3 tool to convert the code:

$ 2to3 -w ruffus-read-only/

Let’s check if this works by running a small script:

$ cd ruffus-read-only/
$ emacs -nw test.py
$ cat test.py
from ruffus import *

def first_task():
    print("One")

@follows(first_task)
def second_task():
    print("Two")

pipeline_run(second_task)
$ python --version
Python 3.2
$ python test.py
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    pipeline_run(second_task)
  File "[...]/ruffus-read-only/ruffus/task.py", line 2675, in pipeline_run
    pool_func = imap
NameError: global name 'imap' is not defined

As you can see, 2to3 did not do the full job. The code needs only one small adaptation to be Python 3.x compatible: itertools.imap no longer exists in Python 3, where the built-in map returns an iterator itself, so imap can simply be replaced with map:

$ cp ruffus/task.py ruffus/task.py.org
$ emacs -nw ruffus/task.py
$ diff ruffus/task.py.org ruffus/task.py
2675c2675
<         pool_func = imap
---
>         pool_func = map

$ python test.py
One
    Job completed
Completed Task = first_task
Two
    Job completed
Completed Task = second_task 

Now it works!
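
If the same file should keep working under both Python 2.x and 3.x, the hard-coded replacement could instead be written as a small import fallback. The following is only a minimal sketch of that idea, not code taken from ruffus: the function run_jobs is a hypothetical stand-in for the place where task.py assigns pool_func, and the shim itself is an assumption about how one might structure it.

```python
# Hypothetical compatibility shim, not part of ruffus itself.
try:
    from itertools import imap  # Python 2: lazy variant of map
except ImportError:
    imap = map  # Python 3: the built-in map is already lazy


def run_jobs(job, params):
    """Stand-in for the pool_func call: apply job lazily to each parameter."""
    pool_func = imap
    return pool_func(job, params)


results = run_jobs(lambda n: n * 2, [1, 2, 3])
print(list(results))  # [2, 4, 6]
```

With a fallback like this the name imap stays defined on either interpreter, so the line pool_func = imap would not need to be touched at all.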