
Planet Python

Last update: September 02, 2014 05:48 AM

September 02, 2014


Graham Dumpleton

Debugging with pdb when using mod_wsgi.

In the early days of mod_wsgi I made a decision to impose a restriction on the use of stdin and stdout by Python WSGI web applications. My reasoning around this was that if you want to make a WSGI application portable to any WSGI deployment mechanism, then you should not be attempting to use stdin/stdout. This includes either reading or writing to these file objects, or even performing a check on

September 02, 2014 05:37 AM

What is the current version of mod_wsgi?

If you pick up any Linux distribution, you will most likely come to the conclusion that the newest version of mod_wsgi available is 3.3 or 3.4. Check when those versions were released and you will find: mod_wsgi version 3.3 - released 25th July 2010 mod_wsgi version 3.4 - released 22nd August 2012 Problem is that people look at that and seeing that there are only infrequent releases and nothing

September 02, 2014 03:20 AM

Using Python virtual environments with mod_wsgi.

You should be using Python virtual environments and if you don't know why you should, maybe you should find out. That said, the use of Python virtual environments was the next topic that came up in my hallway track discussions at DjangoCon US 2014. The pain point here is in part actually of my own creation. This is because although there are better ways of using Python virtual environments with

September 02, 2014 01:50 AM

September 01, 2014


PyPy Development

Python Software Foundation Matching Donations this Month

We're extremely excited to announce that for the month of September, any amount
you donate to PyPy will be matched (up to $10,000) by the Python Software
Foundation.

This includes any of our ongoing fundraisers: NumPyPy, STM, Python3, or our
general fundraising.

Here are some of the things your previous donations have helped accomplish:

You can see a preview of what's coming in our next 2.4 release in the draft
release notes.

Thank you to all the individuals and companies who have donated so far.

So please, donate today: http://pypy.org/

(Please be aware that the donation progress bars are not live updating, so
don't be afraid if your donation doesn't show up immediately).

September 01, 2014 05:41 PM


Carl Trachte

PDF - Removing Pages and Inserting Nested Bookmarks

I blogged before about PyPDF2 and some initial work I had done in response to a request to get a report from Microsoft SQL Server Reporting Services into PDF format.  Since then I've had better luck with PyPDF2 using it with Python 3.4.  Seldom do I need to make any adjustments to either the PDF file or my Python code to get things to work.

Presented below is the code that is working for me now.  The basic gist of it is to strip the blank pages (conveniently SSRS dumps the report with a blank page every other page) from the SSRS PDF dump and reinsert the bookmarks in the right places in a new final document.  The report I'm doing is about 30 pages, so having bookmarks is pretty critical for presentation and usability.

The approach I took was to get the bookmarks out of the PDF object model and into a nested dictionary that I could understand and work with easily.  To keep the bookmarks in the right order for presentation I used collections.OrderedDict instead of just a regular Python dictionary structure.  The code should work for any depth level of nested parent-child PDF bookmarks.  My report only goes three or four levels deep, but things can get fairly complex even at that level.
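
For illustration, here is a hypothetical miniature of the structure the code below builds (the page numbers and titles are made up; the real keys are the NAME and CHILDREN constants defined in the code):

from collections import OrderedDict

# {page number: {'name': title, 'children': OrderedDict of the same shape}}
bookmarkdict = OrderedDict()
bookmarkdict[0] = {'name': 'Summary', 'children': OrderedDict()}
bookmarkdict[0]['children'][2] = {'name': 'Method A', 'children': OrderedDict()}
bookmarkdict[0]['children'][5] = {'name': 'Method B', 'children': OrderedDict()}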

There are a couple of artifacts of the actual report I'm doing - the name "comparisonreader" refers to the subject of the report, a comparison of accounting methods' results.  I've tried to sanitize the code where appropriate, but missed a thing or two.

It may be a bit overwrought (too much code), but it gets the job done.  Thanks for having a look.

#!C:\python34\python

"""
Strip out blank pages and keep bookmarks for
SQL Server SSRS dump of model comparison report (pdf).
"""


import PyPDF2 as pdf
import math
from collections import OrderedDict

INPUTFILE = 'SSRSdump.pdf'
OUTPUTFILE = 'Finalreport.pdf'

OBJECTKEY = '/A'
LISTKEY = '/D'


# Adobe PDF document element keys.
FULLPAGE = '/Fit'
PAGE = '/Page'
PAGES = '/Pages'
ROOT = '/Root'
KIDS = '/Kids'
TITLE = '/Title'


# Python/PDF library types.
NODE = pdf.generic.Destination
CHILD = list


ADDPAGE = 'Adding page {0:d} from SSRS dump to page {1:d} of new document . . .'

# dictionary keys
NAME = 'name'
CHILDREN = 'children'


INDENT = 4 * ' '

ADDEDBOOKMARK = 'Added bookmark {0:s} to parent bookmark {1:s} at depthlevel {2:d}.'

TOPLEVEL = 'TOPLEVEL'

def getpages(comparisonreader):
    """
    From a PDF reader object, gets the
    page numbers of the odd numbered pages
    in the old document (SSRS dump) and
    the corresponding page in the final
    document.

    Returns a generator of two tuples.
    """
    # get number of pages then get odd numbered pages
    # (even numbered indices)
    numpages = comparisonreader.getNumPages()
    return ((x, int(x/2)) for x in range(numpages) if x % 2 == 0)


def fixbookmark(bookmark):
    """
    bookmark is a PyPDF2 bookmark object.

    Side effect function that changes bookmark
    page display mode to full page.
    """
    # getObject yields a dictionary; set the page display mode to full page.
    bookmark.getObject()[OBJECTKEY][LISTKEY][1] = pdf.generic.NameObject(FULLPAGE)
    return 0


def matchpage(page, pages):
    """
    Find index of page match.

    page is a PyPDF2 page object.
    pages is the list (PyPDF2 array) of page objects.
    Returns integer page index in new (smaller) doc.
    """
    originalpageidx = pages.index(page)
    return math.floor((originalpageidx + 1)/2)


def pagedict(bookmark, pages):
    """
    Creates page dictionary for PyPDF2 bookmark object.

    bookmark is a PDF object (dictionary).
    pages is a list of PDF page objects (dictionary).
    Returns two tuple of a dictionary and
    integer page number.
    """
    page = matchpage(bookmark[PAGE].getObject(), pages)
    title = bookmark[TITLE]
    # One bookmark per page per level.
    lookupdict = OrderedDict()
    lookupdict.update({page:{NAME:title,
                             CHILDREN:OrderedDict()}})
    return lookupdict, page


def recursivepopulater(bookmark, pages):
    """
    Fills in child nodes of bookmarks
    recursively and returns dictionary.
    """
    dictx = OrderedDict()
    for pagex in bookmark:
        if type(pagex) is NODE:
            # get page info and update dictionary with it
            lookupdict, page = pagedict(pagex, pages)
            dictx.update(lookupdict)
        elif type(pagex) is CHILD:
            newdict = OrderedDict()
            newdict.update(recursivepopulater(pagex, pages))
            dictx[page][CHILDREN].update(newdict)
    return dictx


def makenewbookmarks(pages, bookmarks):
    """
    Main function to generate bookmark dictionary:

    {page number: {name: <name>,
                   children: {<more bookmarks>}},
     and so on}

    Returns dictionary.
    """
    dictx = OrderedDict()
    # top level bookmarks
    # it's going to go bookmark, list, bookmark, list, etc.
    for bookmark in bookmarks:
        if type(bookmark) is NODE:
            # get page info and update dictionary with it
            lookupdict, page = pagedict(bookmark, pages)
            dictx.update(lookupdict)
        elif type(bookmark) is CHILD:
            dictx[page][CHILDREN] = recursivepopulater(bookmark, pages)
    return dictx


def printbookmarkaddition(name, parentname, depthlevel):
    """
    Print notification of bookmark addition.

    Indentation based on integer depthlevel.
    name is the string name of the bookmark.
    parentname is the string name of the parent
    bookmark.

    Side effect function.
    """
    args = name, parentname, depthlevel
    indent = depthlevel * INDENT
    print(indent + ADDEDBOOKMARK.format(*args))


def dealwithbookmarks(comparisonreader, output, bookmarkdict, depthlevel, levelparent=None, parentname=None):
    """
    Fix bookmarks so that they are properly
    placed in the new document with the blank
    pages removed. Recursive side effect function.

    comparisonreader is the PDF reader object
    for the original document.


    output is the PDF writer object for the
    final document.


    bookmarkdict is a dictionary of bookmarks.

    depthlevel is the depth inside the nested
    dictionary-list structure (0 is the top).


    levelparent is the parent bookmark.

    parentname is the name of the parent bookmark.
    """
    depthlevel += 1
    for pagekeylevel in bookmarkdict:
        namelevel = bookmarkdict[pagekeylevel][NAME]
        levelparentii = output.addBookmark(namelevel, pagekeylevel, levelparent)
        if depthlevel == 0:
            parentname = TOPLEVEL
        printbookmarkaddition(namelevel, parentname, depthlevel)
        fixbookmark(levelparentii)
        # dictionary
        secondlevel = bookmarkdict[pagekeylevel][CHILDREN]
        argsx = comparisonreader, output, secondlevel, depthlevel, levelparentii, namelevel
        # Recursive call.
        dealwithbookmarks(*argsx)


def cullpages():
    """
    Fix SSRS PDF dump by removing blank
    pages.
    """
    ssrsdump = open(INPUTFILE, 'rb')
    finalreport = open(OUTPUTFILE, 'wb')
    comparisonreader = pdf.PdfFileReader(ssrsdump)
    pageindices = getpages(comparisonreader)
    output = pdf.PdfFileWriter()
    # add pages from SSRS dump to new pdf doc
    for (old, new) in pageindices:
        print(ADDPAGE.format(old, new))
        pagex = comparisonreader.getPage(old)
        output.addPage(pagex)

    # Attempt to add bookmarks from original doc
    # getOutlines yields a list of nested dictionaries and lists:
    #    outermost list - starts with parent bookmark (dictionary)
    #        inner list - starts with child bookmark (dictionary)       
    #                     and so on
    # The SSRS dump and this list have bookmarks in correct order.
    bookmarks = comparisonreader.getOutlines()
    # Get page numbers using this methodology (indirect object references):
    #     http://stackoverflow.com/questions/1918420/split-a-pdf-based-on-outline
    # list of IndirectObject's of pages in order
    pages = [pagen.getObject() for pagen in
            comparisonreader.trailer[ROOT].getObject()[PAGES].getObject()[KIDS]]
    # Bookmarks.
    # Top level is list of bookmarks.
    # List goes parent bookmark (Destination object)
    #               child bookmarks (list)
    #                   and so on.
    bookmarkdict = makenewbookmarks(pages, bookmarks)
    # Initial level of -1 allows increment to 0 at start.
    dealwithbookmarks(comparisonreader, output, bookmarkdict, -1)

    print('\n\nWriting final report . . .')
    output.write(finalreport)
    finalreport.close()
    ssrsdump.close()
    print('\n\nFinished.\n\n')


if __name__ == '__main__':
    cullpages()

September 01, 2014 04:59 PM


Graham Dumpleton

Setting LANG and LC_ALL when using mod_wsgi.

So I am at DjangoCon US 2014 and one of the first pain points for using mod_wsgi that came up in discussion at DjangoCon US was the lang and locale settings. These settings influence what the default encoding is for Python when implicitly converting Unicode to byte strings. In other words, they dictate what is going on at the Unicode/bytes boundary. Now this should not really be an issue with

September 01, 2014 04:42 PM

Reporting on the DjangoCon US 2014 hallway track.

I have only been in Portland for a few hours for DjangoCon, and despite some lack of sleep, I already feel that being here is recharging my enthusiasm for working on Open Source, something that has still been sagging a bit lately. I don't wish to return to that dark abyss I was in, so definitely what I need. Now lots of people write up reports on conferences including live noting them, but I

September 01, 2014 03:34 PM


Leonardo Giordani

Python 3 OOP Part 5 - Metaclasses

Previous post

Python 3 OOP Part 4 - Polymorphism

The Type Brothers

The first step into the most intimate secrets of Python objects comes from two components we already met in the first post: class and object. These two things are the fundamental elements of the Python OOP system, so it is worth spending some time to understand how they work and relate to each other.

First of all recall that in Python everything is an object, that is everything inherits from object. Thus, object seems to be the deepest thing you can find digging into Python variables. Let's check this

``` python

>>> a = 5
>>> type(a)
<class 'int'>
>>> a.__class__
<class 'int'>
>>> a.__class__.__bases__
(<class 'object'>,)
>>> object.__bases__
()
```

The variable a is an instance of the int class, and this latter inherits from object, which inherits from nothing. This demonstrates that object is at the top of the class hierarchy. However, as you can see, both int and object are called classes (<class 'int'>, <class 'object'>). Indeed, while a is an instance of the int class, int itself is an instance of another class, a class that is instanced to build classes

``` python

>>> type(a)
<class 'int'>
>>> type(int)
<class 'type'>
>>> type(float)
<class 'type'>
>>> type(dict)
<class 'type'>
```

Since in Python everything is an object, everything is the instance of a class, even classes. Well, type is the class that is instanced to get classes. So remember this: object is the base of every object, type is the class of every type. Sounds puzzling? It is not your fault, don't worry. However, just to strike you with the finishing move, this is what Python is built on

``` python

>>> type(object)
<class 'type'>
>>> type.__bases__
(<class 'object'>,)
```

If you are not about to faint at this point, chances are that you are Guido van Rossum or one of his friends down at the Python core development team (in this case let me thank you for your beautiful creation). You may get a cup of tea, if you need it.

Jokes apart, at the very base of Python type system there are two things, object and type, which are inseparable. The previous code shows that object is an instance of type, and type inherits from object. Take your time to understand this subtle concept, as it is very important for the upcoming discussion about metaclasses.

When you think you grasped the type/object matter read this and start thinking again

``` python

>>> type(type)
<class 'type'>
```

The Metaclasses Take Python

You are now familiar with Python classes. You know that a class is used to create an instance, and that the structure of this latter is ruled by the source class and all its parent classes (until you reach object).

Since classes are objects too, you know that a class itself is an instance of a (super)class, and this class is type. That is, as already stated, type is the class that is used to build classes.

So for example you know that a class may be instanced, i.e. it can be called and by calling it you obtain another object that is linked with the class. What prepares the class for being called? What gives the class all its methods? In Python the class in charge of performing such tasks is called metaclass, and type is the default metaclass of all classes.

The point of exposing this structure of Python objects is that you may change the way classes are built. As you know, type is an object, so it can be subclassed just like any other class. Once you get a subclass of type you need to instruct your class to use it as the metaclass instead of type, and you can do this by passing it as the metaclass keyword argument in the class definition.

``` python

>>> class MyType(type):
...     pass
...
>>> class MySpecialClass(metaclass=MyType):
...     pass
...
>>> msp = MySpecialClass()
>>> type(msp)
<class '__main__.MySpecialClass'>
>>> type(MySpecialClass)
<class '__main__.MyType'>
>>> type(MyType)
<class 'type'>
```

Metaclasses 2: Singleton Day

Metaclasses are a very advanced topic in Python, but they have many practical uses. For example, by means of a custom metaclass you may log any time a class is instanced, which can be important for applications that shall keep a low memory usage or have to monitor it.
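
As a sketch of that logging idea (my example, not from the original article), a metaclass only needs to override __call__ to observe every instantiation:

``` python

import logging

logging.basicConfig(level=logging.INFO)

class LoggedMeta(type):
    def __call__(cls, *args, **kwargs):
        # Runs every time a class using this metaclass is instanced.
        logging.info("Instantiating %s", cls.__name__)
        return super().__call__(*args, **kwargs)

class Tracked(metaclass=LoggedMeta):
    pass

t = Tracked()  # logs: INFO:root:Instantiating Tracked
```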

I am going to show here a very simple example of metaclass, the Singleton. Singleton is a well known design pattern, and many description of it may be found on the Internet. It has also been heavily criticized mostly because its bad behaviour when subclassed, but here I do not want to introduce it for its technological value, but for its simplicity (so please do not question the choice, it is just an example).

Singleton has one purpose: to return the same instance every time it is instanced, like a sort of object-oriented global variable. So we need to build a class that does not work like standard classes, which return a new instance every time they are called.

"Build a class"? This is a task for metaclasses. The following implementation comes from Python 3 Patterns, Recipes and Idioms.

``` python

class Singleton(type):
    instance = None

    def __call__(cls, *args, **kw):
        if not cls.instance:
            cls.instance = super(Singleton, cls).__call__(*args, **kw)
        return cls.instance
```

We are defining a new type, which inherits from type to provide all the bells and whistles of Python classes. We override the __call__ method, which is a special method invoked when we call the class, i.e. when we instance it. The new method wraps the original method of type by calling it only when the instance attribute is not set, i.e. the first time the class is instanced; otherwise it just returns the recorded instance. As you can see this is a very basic cache class, the only trick being that it is applied to the creation of instances.

To test the new type we need to define a new class that uses it as its metaclass

``` python

>>> class ASingleton(metaclass=Singleton):
...     pass
...
>>> a = ASingleton()
>>> b = ASingleton()
>>> a is b
True
>>> hex(id(a))
'0xb68030ec'
>>> hex(id(b))
'0xb68030ec'
```

By using the is operator we test that the two objects are the very same structure in memory, that is their ids are the same, as explicitly shown. What actually happens is that when you issue a = ASingleton() the ASingleton class runs its __call__() method, which is taken from the Singleton type behind the class. That method recognizes that no instance has been created (Singleton.instance is None) and acts just like any standard class does. When you issue b = ASingleton() the very same things happen, but since Singleton.instance is now different from None its value (the previous instance) is directly returned.

Metaclasses are a very powerful programming tool and leveraging them you can achieve very complex behaviours with a small effort. Their use is a must every time you are actually metaprogramming, that is you are writing code that has to drive the way your code works. Good examples are creational patterns (injecting custom class attributes depending on some configuration), testing, debugging, and performance monitoring.

Coming to Instance

Before introducing you to a very smart use of metaclasses by talking about Abstract Base Classes (read: to save some topics for the next part of this series), I want to dive into the object creation procedure in Python, that is, what happens when you instance a class. In the first post this procedure was described only partially, by looking at the __init__() method.

In the first post I recalled the object-oriented concept of constructor, which is a special method of the class that is automatically called when the instance is created. The class may also define a destructor, which is called when the object is destroyed. In languages without a garbage collection mechanism, such as C++, the destructor must be carefully designed. In Python the destructor may be defined through the __del__() method, but it is hardly ever used.

The constructor mechanism in Python is on the contrary very important, and it is implemented by two methods, instead of just one: __new__() and __init__(). The tasks of the two methods are very clear and distinct: __new__() shall perform actions needed when creating a new instance while __init__ deals with object initialization.

Since in Python you do not need to declare attributes due to its dynamic nature, __new__() is rarely defined by programmers, who may rely on __init__() to perform the majority of the usual tasks. Typical uses of __new__() are very similar to those listed in the previous section, since it allows you to trigger some code whenever your class is instanced.

The standard way to override __new__() is

``` python

class MyClass:
    def __new__(cls, *args, **kwds):
        obj = super().__new__(cls, *args, **kwds)
        # [put your code here]
        return obj
```

just like you usually do with __init__(). When your class inherits from object you do not need to call the parent method (object.__init__()), because it is empty, but you need to do it when overriding __new__.

Remember that __new__() is not forced to return an instance of the class in which it is defined, although you should have very good reasons to break this behaviour. Anyway, __init__() will be called only if you return an instance of the container class. Please also note that __new__(), unlike __init__(), accepts the class as its first parameter. The name is not important in Python, and you can also call it self, but it is worth using cls to remember that it is not an instance.
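
A tiny sketch of that rule (mine, not the author's): when __new__() returns something that is not an instance of the class, __init__() is silently skipped.

``` python

class Weird:
    def __new__(cls):
        # Deliberately return something that is not a Weird instance.
        return 42

    def __init__(self):
        # Never runs, because __new__ did not return a Weird instance.
        print("initializing")

obj = Weird()
print(obj)  # prints 42; "initializing" never appears
```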

Movie Trivia

Section titles come from the following movies: The Blues Brothers (1980), The Muppets Take Manhattan (1984), Terminator 2: Judgement Day (1991), Coming to America (1988).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series come from those sources.

Feedback

Feel free to use the blog Google+ page to comment on the post. The GitHub issues page is the best place to submit corrections.

September 01, 2014 01:00 PM


Python Software Foundation

Matching Donations to PyPy in September!

We're thrilled to announce that we will be matching donations made to the PyPy project for the month of September. For every dollar donated this month, the PSF will also give a dollar, up to a $10,000 total contribution. Head to http://pypy.org/ and view the donation options on the right side of the page, including general funding or a donation targeted to their STM, Py3k, or NumPy efforts.

We've previously given a $10,000 donation to PyPy, and more recently seeded the STM efforts with $5,000. The PyPy project works with the Software Freedom Conservancy to manage fundraising efforts and the usage of the funds, and they'll be the ones notifying us of how you all made your donations. At the end of the month, we'll do our part and chip in to make PyPy even better.

The matching period runs today through the end of September.

September 01, 2014 01:33 PM


Machinalis

Decision tree classifier

Introduction

This post presents a simple but still fully functional Python implementation of a decision tree classifier. It is not aimed to be a tutorial on machine learning, classifications or even decision trees: there are a lot of resources on the web already. The main idea is to provide a Python example implementation for those who are familiar or comfortable with this language.

There are several decision tree algorithms that have been developed over time, each one improving or optimizing something over the predecessor. In this post, the implementation presented corresponds to the first well-known algorithm on the subject: the Iterative Dichotomiser 3 (ID3), developed in 1986 by Ross Quinlan.

For those familiar with the scikit-learn library, its documentation includes a specific section devoted to decision trees. This API provides a production-ready, fully parametric implementation of an optimized version of the CART algorithm.

Implementation

The code is here: https://gist.github.com/cmdelatorre/fd9ee43167f5cc1da130

Basically a tree is represented using Python dicts. A couple of very simple classes that extend dict were created to distinguish between tree and leaf nodes. Also, a namedtuple was defined to match each training sample with its corresponding class. The necessary information_gain and entropy functions were created; their implementations are really simple thanks to Python’s standard collections lib.
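
The gist has the exact definitions; as an illustration of why collections keeps them concise, a Shannon entropy function takes only a few lines with Counter (my sketch, not necessarily the gist's code):

import math
from collections import Counter

def entropy(klasses):
    """Shannon entropy of a sequence of class labels."""
    total = len(klasses)
    counts = Counter(klasses)
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values())

print(entropy(['a', 'a', 'b', 'b']))  # 1.0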

Finally, the main piece of code is the tree-creation method: this is where all the magic happens.

def create_decision_tree(self, training_samples, predicting_features):
    """Recursively, create a desition tree and return the parent node."""

    if not predicting_features:
        # No more predicting features
        default_klass = self.get_most_common_class(training_samples)
        root_node = DecisionTreeLeaf(default_klass)
    else:
        klasses = [sample.klass for sample in training_samples]
        if len(set(klasses)) == 1:
            target_klass = training_samples[0].klass
            root_node = DecisionTreeLeaf(target_klass)
        else:
            best_feature = self.select_best_feature(training_samples,
                                                    predicting_features,
                                                    klasses)
            # Create the node to return and create the sub-tree.
            root_node = DecisionTreeNode(best_feature)
            best_feature_values = {s.sample[best_feature]
                                   for s in training_samples}
            for value in best_feature_values:
                samples = [s for s in training_samples
                           if s.sample[best_feature] == value]
                # Recursively, create a child node.
                child = self.create_decision_tree(samples,
                                                  predicting_features)
                root_node[value] = child
    return root_node

Motivated by the already mentioned scikit-learn library, the algorithm is developed within a class with the following methods:

  • fit(training_samples, known_labels) : Creates the decision tree using the training data.
  • predict(samples) : given a fitted model, predict the label of a new set of data. It returns the learned label for each sample in the given array.
  • score(samples, known_labels) : predicts the labels for the given data samples and contrasts with the truth provided in the known_labels. Returns a score which is a number between 0 (no matches) and 1 (perfect match).
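
Usage would look roughly like this (the class and variable names here are illustrative; the real ones are in the gist):

# Hypothetical usage of the ID3 classifier described above.
classifier = ID3DecisionTreeClassifier()
classifier.fit(training_samples, known_labels)          # build the tree
predictions = classifier.predict(new_samples)           # one label per sample
accuracy = classifier.score(test_samples, test_labels)  # between 0 and 1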

Other than that, the code is pretty much self-explanatory. Using the standard Python module collections, the auxiliary methods (select_best_feature, information_gain, entropy) are very concise. The tree is easily implemented using dict:

  • Each node is either a leaf or a branch: If it is a leaf then it represents a class. If it is a branch, then it represents a feature.
  • Each branch has as many children as the represented feature has possible values.

Then, to classify a given vector X = [f0, ..., fn], starting with the root of the generated tree:

  1. Take the root node (usually a branch, unless X has only one feature, which is not really useful).
  2. That node has an associated feature, fi, so we check the value of X for that feature: v = X[fi].
  3. If v is not a key in the node, then we can’t classify X with the existing tree and we assign a default class (the most probable one).
  4. If node[v] is a leaf, then we assign the leaf’s related class to X.
  5. If node[v] is another branch, we repeat this procedure using the new node as root.
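
In code, the walk described in the numbered steps above might look like the following sketch (it assumes the DecisionTreeNode/DecisionTreeLeaf dict subclasses from the gist, with the branch's feature stored as node.feature and the leaf's class as node.klass; those attribute names are my guesses):

def classify(node, x, default_klass):
    """Walk the dict-based tree for sample x until a leaf or unseen value."""
    while isinstance(node, DecisionTreeNode):
        v = x[node.feature]  # step 2: value of X for this node's feature
        if v not in node:    # step 3: unseen value, fall back to the default
            return default_klass
        node = node[v]       # steps 4-5: descend; loop ends at a leaf
    return node.klass        # assign the leaf's class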

I’ll not dig further in the details as this is not supposed to be a tutorial or course on decision trees. Some minimal previous knowledge should be enough to understand the code. In any case, don’t hesitate to post your questions or comments.

To keep updated about Machine Learning, Data Processing and Complex Web Development follow us on @machinalis.

September 01, 2014 12:29 PM


Ian Ozsvald

Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students:


We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial, take a look at those details for the upcoming courses (including an intro to data science) we’re running in October.

I’ll add the video here once it is released; the slides are below.

I also got to do a book-signing for our High Performance Python book (co-authored with Micha Gorelick), O’Reilly sent us 20 galley copies to give away. The finished printed book will be available via O’Reilly and Amazon in the next few weeks.

Book signing at EuroSciPy 2014

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science, please fill that in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science, I’ll be using the results of this when I keynote at PyConIreland in October, your input will be very useful.

Here are the slides (License: CC By NonCommercial), there’s also source on github:


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

September 01, 2014 09:11 AM


Carl Trachte

Internet Explorer 9 Save Dialog - SendKeys Last Resort

At work we use Internet Explorer 9 on Windows 7 Enterprise.  SharePoint is the favored software for filesharing inside organizational groups.  Our mine planning office is in the States; the mine operation whose data I work is in a remote, poorly connected location of the world.

Recently SharePoint was updated to a new version at the mine.  The SharePoint server configuration there no longer allows Windows Explorer view or mapping of the site to a Windows drive letter.  I've put in a trouble ticket to regain this functionality, but that may take a while if it's possible at all.  Without it, it is difficult to automate file retrieval or get more than one file at a time.

In the meantime I've been able to get the text-based files over using win32com automation in Python to run Internet Explorer and grab the innerHTML object.  innerHTML is essentially the text of the files with tags around it.  I rip out the tags, write the text to a file on my hard drive, and I'm good to go.
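
For flavour, the innerHTML trick looks roughly like this with raw win32com (IEC wraps something similar; the URL and output file name are placeholders):

import re
import time
from win32com.client import Dispatch

ie = Dispatch('InternetExplorer.Application')
ie.Navigate('http://sharepoint.example/site/file.txt')  # placeholder URL
time.sleep(7)  # crude wait for the page to load
# innerHTML is the file text with markup around it - rip out the tags.
text = re.sub(r'<[^>]+>', '', ie.Document.body.innerHTML)
with open('file.txt', 'w') as fout:
    fout.write(text)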

Binary files proved to be more difficult to download.  Shown below is a screenshot of the Internet Explorer 9 dialog box that goes by the generic name Notification Bar:

 
I googled and could find nowhere how this thing fits into the Internet Explorer 9 Document object hierarchy.  Then I came upon this colorful exchange between Microsoft Certified MVPs from 2012 that made things a little more clear.
 
It turns out you can't access the Notification Bar programmatically per se.  What you can do is activate the specific Internet Explorer window and tab you're interested in, then send keystrokes to get where you want to, click, and download your file.
 
I'm not a web programmer, nor am I a dedicated Windows programmer (I'm actually a geologist).  IEC is a small module that wraps some useful functionality - in my case, identifying and clicking on the link on the SharePoint page by its text identifier:
 
# C Python 2.7
 
# Internet Explorer module.
import IEC as iec
 
import time
 
ie = iec.IEController()
 
ie.Navigate(<URL of SharePoint page>)
# Give the page time to load (7 seconds).
time.sleep(7)
# I want to download file 11.msr.
ie.ClickLink('11')
# Give 5 seconds for the Notification Bar to show up.
time.sleep(5)
 
I'm fortunate in that our mine planning vendor, MineSight, ships Python 2.7 and associated win32com packages along with their software (their API's are written for Python).  If you don't have win32com and friends installed, they are necessary for this solution.
 
At this point I've just got to deal with that pesky Internet Explorer 9 Notification Bar.  As it turns out, SendKeys makes it doable (although neither elegant nor robust :-(   ):
 
# Activate the SharePoint page.
from win32com.client import Dispatch as dispx
shell = dispx('WScript.Shell')
shell.AppActivate(<name of IE9 tab>)
# Little pause.
time.sleep(0.5)
# Keyboard combination for the Notification Bar selection
# is ALT-N or '%n'
shell.SendKeys('%n', True)
# The Notification Bar goes to "Open" by default.
# You need to tab over to the "Save" button.
shell.SendKeys('{TAB}')
# Another little pause.
time.sleep(0.1)
# Space bar clicks on this control.
shell.SendKeys(' ', True)
 
The key combinations for accessing the Notification Bar are in Microsoft's documentation here.
 
One link showing use of SendKeys is a German site (mostly English text) here.
 
And that's pretty much it.  There's another dialog that pops up in Internet Explorer 9 after the file is downloaded.  I've been able to blow that off so far and it hasn't gotten in the way as I move to the next download.  I give these files (about 300 kb) 15 seconds to download over a slow connection.  I may have to adjust that.
 
This solution is an abomination by any coding/architecture/durability standard.  Still, it's the abomination that is getting the job done for the time being.
 
Thanks for stopping by.
 
 

September 01, 2014 03:36 AM

August 31, 2014


Varun Nischal

code4Py | Style Context Differences

As per recently created page, the following diff command output representing context differences, needed to be styled; [vagrant@localhost python]$ diff -c A B *** A 2014-08-20 20:13:30.315009258 +0000 --- B 2014-08-20 20:13:39.021009349 +0000 *************** *** 1,6 **** --- 1,9 ---- + typeset -i sum=0 + while read num do printf "%d " ${num} + sum=sum+${num} done … Continue reading

August 31, 2014 06:29 PM


Europython

EuroPython 2014 Feedback Form

EuroPython 2014 was a great event and we’d like to learn from you how to make EuroPython 2015 even better. If you attended EuroPython 2014, please take a few moments and fill out our feedback form:

EuroPython 2014 Feedback Form

We will leave the feedback form online for another two weeks and then use the information as basis for the work on EuroPython 2015 and also post a summary of the multiple choice questions (not the comments to protect your privacy) on our website. Many thanks in advance.

Helping with EuroPython 2015

If you would like to help with EuroPython 2015, we invite you to join the EuroPython Society. Membership is free. Just go to our application page and enter your details.

In the coming months, we will start the discussions about the new work group model we’ve announced at the conference.

Enjoy,

EuroPython Society

August 31, 2014 10:48 AM




Ian Ozsvald

Python Training courses: Data Science and High Performance Python coming in October

I’m pleased to say that via ModelInsight we’ll be running two Python-focused training courses in October. The goal is to give you strong new research & development skills; they’re aimed at folks in companies but would suit folks in academia too. UPDATE: training courses ready to buy (1 Day Data Science, 2 Day High Performance).

UPDATE we have a <5min anonymous survey which helps us learn your needs for Data Science training in London, please click through and answer the few questions so we know what training you need.

“Highly recommended – I attended in Aalborg in May … upcoming Python DataSci/HighPerf training courses” @ThomasArildsen

These and future courses will be announced on our London Python Data Science Training mailing list, sign-up for occasional announces about our upcoming courses (no spam, just occasional updates, you can unsubscribe at any time).

Intro to Data science with Python (1 day) on Friday 24th October

Students: Basic to Intermediate Pythonistas (you can already write scripts and you have some basic matrix experience)

Goal: Solve a complete data science problem (building a working and deployable recommendation engine) by working through the entire process – using numpy and pandas, applying test driven development, visualising the problem, deploying a tiny web application that serves the results (great for when you’re back with your team!)

High Performance Python (2 day) on Thursday+Friday 30th+31st October

Students: Intermediate Pythonistas (you need higher performance for your Python code)

Goal: learn high performance techniques for performant computing, a mix of background theory and lots of hands-on pragmatic exercises

The High Performance course is built off of many years teaching and talking at conferences (including PyDataLondon 2013, PyCon 2013, EuroSciPy 2012) and in companies along with my High Performance Python book (O’Reilly). The data science course is built off of techniques we’ve used over the last few years to help clients solve data science problems. Both courses are very pragmatic, hands-on and will leave you with new skills that have been battle-tested by us (we use these approaches to quickly deliver correct and valuable data science solutions for our clients via ModelInsight). At PyCon 2012 my students rated me 4.64/5.0 for overall happiness with my High Performance teaching.

“@ianozsvald [..] Best tutorial of the 4 I attended was yours. Thanks for your time and preparation!” @cgoering

We’d also like to know which other courses you’d like to learn, we can partner with trainers as needed to deliver new courses in London. We’re focused around Python, data science, high performance and pragmatic engineering. Drop me an email (via ModelInsight) and let me know if we can help.

Do please join our London Python Data Science Training mailing list to be kept informed about upcoming training courses.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

August 31, 2014 10:33 AM


Graeme Cross

Notes from MPUG, August 2014

A strong line-up of speakers and the post-PyCon AU buzz meant that we had about 35 people at the August Melbourne Python Users Group meeting. There were five presentations for the night and here are my random notes.

1. Highlights from PyCon AU 2014

Various people contributed their highlights of PyCon AU 2014, which was in Brisbane in early August (and the reason why the MPUG meeting was moved back a week). The conference videos are now up on the YouTube channel.

2. Javier Candeira: “Don’t monkeypatch None!”

Javier was inspired by discussion at PyCon AU about PEP 336 (Make None Callable) and decided to implement it, with some inspiration from previous PyCon AU talks by Ryan Kelly, Richard Jones (“Don’t do this!”), Nick Coghlan (“Elegant and ugly hacks in CPython”) and Larry Hastings.

He has implemented quiet_None, an analog to quiet_NaN. He walked us through the prototype code, with a deviation into Haskell’s Maybe monad along the way.
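
I didn't capture Javier's prototype in my notes, but the flavour of quiet_None is roughly this (sketch mine, not his code): a None-like value that absorbs operations instead of raising.

class QuietNone:
    """A None that propagates quietly instead of raising (illustrative only)."""
    def __getattr__(self, name):
        return self
    def __call__(self, *args, **kwargs):
        return self
    def __add__(self, other):
        return self
    __radd__ = __sub__ = __rsub__ = __mul__ = __rmul__ = __add__
    def __repr__(self):
        return 'quiet_None'

quiet_None = QuietNone()
print(quiet_None.missing.attribute + 5)  # quiet_None, not an AttributeError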

A few random notes:

3. Juan Nunez-Iglesias: SciPy 2014, a summary

Juan works at VLSCI and is a scikit-image contributor.

SciPy 2014 highlights:

Recommended lightning talks:

On Juan’s “to watch” list:

4. Rory Hart: Microservices in Python

This was a repeat of Rory’s PyCon AU 2014 talk (here’s the video).

5. Nick Farrell: sux

Nick gave a repeat of his PyCon AU 2014 lightning talk on the sux library, which allows you to transparently use Python 2 packages in Python 3.

August 31, 2014 04:55 AM


Dave Behnke

Fun with SQLAlchemy

This is a little experiment I created with SQLAlchemy. In this notebook, I'm using sqlite to create a table, and doing some operations such as deleting all the rows in the table and inserting a list of items.
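
The excerpt starts with engine already defined; a setup cell along these lines is assumed (the sqlite URL is a guess, and echo=True is what produces the INFO log lines below):

In [1]:
# Assumed setup cell (not shown in the original excerpt).
from sqlalchemy import create_engine, MetaData, select

engine = create_engine('sqlite:///test.db', echo=True)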

In [2]:
# connection is a connection to the database from a pool of connections
connection = engine.connect()
# meta will be used to reflect the table later
meta = MetaData()
2014-08-10 21:10:16,410 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2014-08-10 21:10:16,411 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:16,413 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2014-08-10 21:10:16,414 INFO sqlalchemy.engine.base.Engine ()

In [3]:
# create the table if it doesn't exist already
connection.execute("create table if not exists test ( id integer primary key autoincrement, name text )")
2014-08-10 21:10:17,212 INFO sqlalchemy.engine.base.Engine create table if not exists test ( id integer primary key autoincrement, name text )
2014-08-10 21:10:17,214 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,217 INFO sqlalchemy.engine.base.Engine COMMIT

Out[3]:
<sqlalchemy.engine.result.ResultProxy at 0x105083e48>
In [4]:
#reflects all the tables in the current connection
meta.reflect(bind=engine)
2014-08-10 21:10:17,986 INFO sqlalchemy.engine.base.Engine SELECT name FROM  (SELECT * FROM sqlite_master UNION ALL   SELECT * FROM sqlite_temp_master) WHERE type='table' ORDER BY name
2014-08-10 21:10:17,987 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,989 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("sqlite_sequence")
2014-08-10 21:10:17,990 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,993 INFO sqlalchemy.engine.base.Engine PRAGMA foreign_key_list("sqlite_sequence")
2014-08-10 21:10:17,995 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,997 INFO sqlalchemy.engine.base.Engine PRAGMA index_list("sqlite_sequence")
2014-08-10 21:10:17,997 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,999 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("test")
2014-08-10 21:10:18,000 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:18,001 INFO sqlalchemy.engine.base.Engine PRAGMA foreign_key_list("test")
2014-08-10 21:10:18,002 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:18,004 INFO sqlalchemy.engine.base.Engine PRAGMA index_list("test")
2014-08-10 21:10:18,005 INFO sqlalchemy.engine.base.Engine ()

In [5]:
# grabs a Table object from meta
test = meta.tables['test']
test
Out[5]:
Table('test', MetaData(bind=None), Column('id', INTEGER(), table=<test>, primary_key=True, nullable=False), Column('name', TEXT(), table=<test>), schema=None)
In [6]:
# cleans out all the rows in the test table
result = connection.execute(test.delete())
print("Deleted %d row(s)" % result.rowcount)
2014-08-10 21:10:22,659 INFO sqlalchemy.engine.base.Engine DELETE FROM test
2014-08-10 21:10:22,661 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:22,662 INFO sqlalchemy.engine.base.Engine COMMIT

Deleted 11 row(s)

In [7]:
# create a list of names to be inserting into the test table
names = ['alpha', 'bravo', 'charlie', 'delta', 'epsilon', 'foxtrot', 'golf', 'hotel', 'india', 'juliet', 'lima']
In [8]:
# perform multiple inserts, the list is converted on the fly into a dictionary with the name field.
result = connection.execute(test.insert(), [{'name': name} for name in names])
print("Inserted %d row(s)" % result.rowcount)
2014-08-10 21:10:26,580 INFO sqlalchemy.engine.base.Engine INSERT INTO test (name) VALUES (?)
2014-08-10 21:10:26,582 INFO sqlalchemy.engine.base.Engine (('alpha',), ('bravo',), ('charlie',), ('delta',), ('epsilon',), ('foxtrot',), ('golf',), ('hotel',)  ... displaying 10 of 11 total bound parameter sets ...  ('juliet',), ('lima',))
2014-08-10 21:10:26,583 INFO sqlalchemy.engine.base.Engine COMMIT

Inserted 11 row(s)

In [9]:
# query the rows with select; the where clause is included for demonstration
# (it can be omitted)
result = connection.execute(select([test]).where(test.c.id > 0)) 
2014-08-10 21:10:28,528 INFO sqlalchemy.engine.base.Engine SELECT test.id, test.name
FROM test
WHERE test.id > ?
2014-08-10 21:10:28,529 INFO sqlalchemy.engine.base.Engine (0,)

In [10]:
# show the results
for row in result:
    print("id=%d, name=%s" % (row['id'], row['name']))
id=56, name=alpha
id=57, name=bravo
id=58, name=charlie
id=59, name=delta
id=60, name=epsilon
id=61, name=foxtrot
id=62, name=golf
id=63, name=hotel
id=64, name=india
id=65, name=juliet
id=66, name=lima


August 31, 2014 12:30 AM

August 30, 2014


Dave Behnke

Back to Python

After some "soul searching" and investigation between Go and Python over the last few months, I've decided to come back to Python.

My Experience

I spent a couple of months researching and developing with Go. I even bought a pre-release book (Go in Action - http://www.manning.com/ketelsen/). The concurrency chapter wasn't written quite yet so I ended up looking elsewhere. I eventually found an excellent book explaining the concurrency concepts of Go through my Safari account (https://www.safaribooksonline.com/). The book is entitled Mastering Concurrency in Go (https://www.packtpub.com/application-development/mastering-concurrency-go).

After going through a couple of programming exercises in Go, I started to ask myself: how would I do this in Python? It started to click in my brain that on a conceptual level a goroutine is similar to an async coroutine in Python. The main difference is that Go was designed from the beginning to be concurrent; in Python it requires a little more work.
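
For example, where Go gives you goroutines out of the box, in Python 3.4 you reach for asyncio explicitly (a minimal sketch):

import asyncio

@asyncio.coroutine
def double(n):
    # Pretend to do some I/O-bound work.
    yield from asyncio.sleep(0.1)
    return n * 2

loop = asyncio.get_event_loop()
results = loop.run_until_complete(
    asyncio.gather(*(double(i) for i in range(5))))
print(results)  # [0, 2, 4, 6, 8]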

I'll make a long story short. Go is a good language, I will probably use it for specific problems to solve. I'm more familiar with Python. The passionate Python community makes me proud to be a part of it. I like having access to many interesting modules and packages. Ipython, Flask, Djano, Sqlalchemy just to name a few.

I look forward to continuing to work with Python and share code examples where I can. Stay tuned!

August 30, 2014 08:38 PM

August 29, 2014


Alec Munro

It's yer data! - how Google secured its future, and everyone else's

Dear Google,

This is a love letter and a call to action.

I believe we stand at a place where there is a unique opportunity in managing personal data.

There is a limited range of data types in the universe, and practically speaking, the vast majority of software works with a particularly tiny fraction of them.

People, for example. We know things about them.

Names, pictures of, people known, statements made, etc.

Tons of web applications conceive of these objects. Maybe not all, but probably most have some crossover. For many of the most trafficked apps, this personal data represents a very central currency. But unfortunately, up until now we've more or less been content with each app having its own currency that is not recognized elsewhere.

You can change that. You can establish a central, independent bank of data, owned by users and lent to applications in exchange for functionality. The format of the data itself will be defined and evolved by an independent agency of some sort.

There are two core things this will accomplish.

1) It will open up a whole new world of application development free from ties to you, Facebook, Twitter, etc.

2) It will give people back ownership of their data. They will be able to establish and evolve an online identity that carries forward as they change what applications they use.

Both of these have a dramatic impact on Google, as they allow you to do what you do best, building applications that work with large datasets, while at the same time freeing you from concerns that you are monopolizing people's data.

A new application world

When developing a new application, you start with an idea, and then you spend a lot of time defining a data model and the logic required to implement that idea on that data model. If you have any success with your application, you will need to invest further in your data model, fleshing it out, and implementing search, caching, and other optimizations.

In this new world, all you would do is include a library and point it at an existing data model. For the small fraction of data that was unique to your application, you could extend the existing model. For example:
from new_world import Model, Field

BaseUser = Model("https://new_world.org/users/1.0")

class OurUser(BaseUser):
    our_field = Field("our_field", type=String)

That's it. No persistence (though you could set args somewhere to define how to synchronize), no search, no caching. Now you can get to actually building what makes your application great.

Conceivably, you can do it all in Javascript, other than identifying the application uniquely to the data store.

And you can be guaranteed data interoperability with Facebook, Google, etc. So if you make a photo editing app, you can edit photos uploaded with any of those, and they can display the photos that are edited.

Securing our future

People have good reason to be suspicious of Google, Facebook, or any other organization that is able to derive value through the "ownership" of their data. Regardless of the intent of the organization today, history has shown that profit is a very powerful motivator for bad behaviour, and these caches of personal data represent a store of potential profit that we all expect will at some point prove too tempting to avoid abusing.

Providing explicit ownership and license of said data via a third-party won't take away the temptation to abuse the data, but will make it more difficult in a number of ways:

A gooder, more-productive, Google

By putting people's data back in their hands, and merely borrowing it from them for specific applications, the opportunities for evil are dramatically reduced.

But what I think is even more compelling for Google here is that it will make you more productive. Internally, I believe you already operate similar to how I've described here, but you constantly bump up against limitations imposed by trying not to be evil. Without having to worry about the perceptions of how you are using people's data, what could you accomplish?

Conclusion

Google wants to do no evil. Facebook is perhaps less explicit, but from what I know of its culture, I believe it aspires to be competent enough that there's no need to exploit users data. The future will bring new leadership and changes in culture to both companies, but if they act soon, they can secure their moral aspirations and provide a great gift to the world.

(Interesting aside, Amazon's recently announced Cognito appears to be in some ways a relative of this idea, at least as a developer looking to build things. Check it out.)

August 29, 2014 05:31 PM

August 28, 2014


Yann Larrivée

ConFoo is looking for speakers

ConFoo is currently looking for web professionals with deep understanding of PHP, Java, Ruby, Python, DotNet, HTML5, Databases, Cloud Computing, Security and Mobile development to share their skills and experience at the next ConFoo. Submit your proposals between August 25th and September 22nd.

ConFoo is a conference for developers that has built a reputation as a prime destination for exploring new technologies, diving deeper into familiar topics, and experiencing the best of community and culture.

If you would simply prefer to attend the conference, we have a $290 discount until October 13th.

August 28, 2014 07:49 PM


Martijn Faassen

Morepath 0.5(.1) and friends released!

I've just released a whole slew of things; the most important is Morepath 0.5, your friendly neighborhood Python web framework with superpowers!

What's new?

There are a bunch of new things in the documentation, in particular:

Also available is @reg.classgeneric. This depends on a new feature in the Reg library.

There are a few bug fixes as well.

For more details, see the full changelog.

Morepath mailing list

I've documented how to get in touch with the Morepath community. In particular, there's a new Morepath mailing list!

Please do get in touch!

Other releases

I've also released:

  • Reg 0.8. This is the generic function library behind some of Morepath's flexibility and power.
  • BowerStatic 0.3. This is a WSGI framework for including static resources in HTML pages automatically, using components installed with Bower.
  • more.static 0.2. This is a little library integrating BowerStatic with Morepath.

Morepath videos!

You may have noticed I linked to Morepath 0.5.1 before, not Morepath 0.5. This is because I had to: I was using a new youtube extension that gave me a bit too much trouble on readthedocs. I replaced it with raw HTML, which works better. The Morepath docs now include two videos.

  • On the homepage is my talk about Morepath at EuroPython 2014 in July. It's a relatively short talk, and gives a good idea on what makes Morepath different.
  • If you're interested in the genesis and history behind Morepath, and general ideas on what it means to be a creative developer, you can find another, longer, video on the Morepath history page. This was taken last year at PyCon DE, where I had the privilege to be invited to give a keynote speech.

August 28, 2014 05:10 PM


Ian Ozsvald

High Performance Python Training at EuroSciPy this afternoon

I’m training on High Performance Python this afternoon at EuroSciPy, my github source is here (as a shortlink: http://bit.ly/euroscipy2014hpc). There are prerequisites for the course.

This training is actually a tiny part of what I’ll teach on my 2 day High Performance Python course in London in October (along with a Data Science course). If you’re at EuroSciPy, please say Hi :-)


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

August 28, 2014 10:18 AM


Catalin George Festila

Python book from O'Reilly Media - Save 50%.

Save 50% from O'Reilly Media.
Its main goal is to help you achieve the best possible performance in your Python applications.
See the book Python High Performance Programming.

August 28, 2014 10:45 AM


Richard Jones

When testing goes bad

I've recently started working on a large, mature code base (some 65,000 lines of Python code). It has 1048 unit tests implemented in the standard unittest.TestCase fashion using the mox framework for mocking support (I'm not surprised you've not heard of it).

Recently I fixed a bug which was causing a user interface panel to display when it shouldn't have been. The fix basically amounts to a couple of lines of code added to the panel in question:

+    def can_access(self, context):
+        # extend basic permission-based check with a check to see whether 
+        # the Aggregates extension is even enabled in nova 
+        if not nova.extension_supported('Aggregates', context['request']):
+            return False
+        return super(Aggregates, self).can_access(context)

When I ran the unit test suite I discovered to my horror that 498 of the 1048 tests now failed. The reason for this is that the can_access() method here is called as a side-effect of those 498 tests and the nova.extension_supported (which is a REST call under the hood) needed to be mocked correctly to support it being called.

I quickly discovered that given the size of the test suite, and the testing tools used, each of those 498 tests must be fixed by hand, one at a time (if I'm lucky, some of them can be knocked off two at a time).

The main cause is mox's mocking of callables like the one above which enforces the order that those callables are invoked. It also enforces that the calls are made at all (uncalled mocks are treated as test failures).

This means there is no way to provide a blanket mock for "nova.extension_supported". Tests with existing calls to that API need careful attention to ensure the ordering is correct. Tests which don't result in the side-effect call to the above method will raise an error, so even adding a mock setup in a TestCase.setUp() doesn't work in most cases.

It doesn't help that the codebase is so large, and has been developed by so many people over years. Mocking isn't consistently implemented; even the basic structure of tests in TestCases is inconsistent.

It's worth noting that the ordering check that mox provides is never used as far as I can tell in this codebase. I haven't sighted an example of multiple calls to the same mocked API without the additional use of the mox InAnyOrder() modifier. mox does not provide a mechanism to turn the ordering check off completely.

The pretend library (my go-to for stubbing) splits out the mocking step and the verification of calls so the ordering will only be enforced if you deem it absolutely necessary.
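
For contrast, a rough sketch of the same stub with pretend: stubbing and verification are separate steps, and call order is only checked if you choose to look.

import pretend

# A blanket stub: no ordering constraints, no must-be-called enforcement.
nova = pretend.stub(
    extension_supported=pretend.call_recorder(lambda name, request: True))

assert nova.extension_supported('Aggregates', {'request': None})
# Verification is separate and optional:
assert nova.extension_supported.calls == [
    pretend.call('Aggregates', {'request': None})]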

The choice to use unittest-style TestCase classes makes managing fixtures much more difficult (it becomes a nightmare of classes and mixins and setUp() super() calls or alternatively a nightmare of mixing classes and multiple explicit setup calls in test bodies). This is exacerbated by the test suite in question introducing its own mock-generating decorator which will generate a mock, but again leaves the implementation of the mocking to the test cases. py.test's fixtures are a far superior mechanism for managing mocking fixtures, allowing far simpler centralisation of the mocks and overriding of them through fixture dependencies.

The result is that I spent some time working through some of the test suite and discovered that in an afternoon I could fix about 10% of the failing tests. I have decided that spending a week fixing the tests for my 5 line bug fix is just not worth it, and I've withdrawn the patch.

August 28, 2014 08:07 AM