Planet Python

Last update: May 17, 2012 08:46 AM

May 17, 2012


PyLadies

Jessica McKellar to be Keynote speaker at DjangoCon Europe!

May 17, 2012 08:37 AM


Frank Wierzbicki

Jython 2.7 alpha1 released!

On behalf of the Jython development team, I'm pleased to announce that Jython 2.7 alpha1 is available for download. See the installation instructions.

I'd like to thank Adconion Media Group for sponsoring my work on Jython 2.7. I'd also like to thank the many contributors to Jython. Jython 2.7 alpha1 implements much of the functionality introduced by CPython 2.6 and 2.7. There are still some missing features; in particular, bytearray and the io system are currently incomplete.
Please report any bugs that you find. Thanks!

May 17, 2012 01:55 AM

May 16, 2012


Ashish Vidyarthi

Hadoop Map-Reduce with mrjob

With Hadoop, you have more flexibility in accessing files and running map-reduce jobs with Java. All other languages need to use Hadoop streaming, which makes them feel like second-class citizens in Hadoop programming.

For those who like to write map-reduce programs in Python, there are good toolkits available, such as mrjob and dumbo.
Internally, they still use Hadoop streaming to submit map-reduce jobs. These tools simplify the process of map-reduce job submission. My own experience with mrjob has been good so far. Installing and using mrjob is easy.
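
To make the streaming model concrete, here is a minimal word-count sketch in plain Python, written the way a Hadoop streaming mapper and reducer see their data: lines of text in, tab-separated key/value lines out. The function names `map_lines` and `reduce_lines` are illustrative assumptions, not part of mrjob or dumbo; toolkits like mrjob take care of this kind of plumbing for you.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word.

    Assumes the input is already sorted by key, which is what the
    reducer receives from Hadoop streaming after the map phase.
    """
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog"]
    # sorted() stands in for Hadoop's shuffle-and-sort step.
    for out in reduce_lines(sorted(map_lines(text))):
        print(out)
```

In a real streaming job the mapper and reducer would be separate scripts reading from standard input; here they are plain generators so the whole flow can be run locally.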

Installing mrjob

First, make sure that you have installed a newer version of Python than the default that comes with Linux (2.4.x, which is kept for supporting yum). Make sure that you don't replace the existing Python distribution, as that breaks "yum".

Install mrjob on one of the machines in your Hadoop cluster. It is nicer to use virtualenv to create an isolated environment.


wget -O virtualenv.py http://bit.ly/virtualenv
/usr/bin/python26 virtualenv.py hadoopenv
hadoopenv/bin/easy_install pip
hadoopenv/bin/pip install mrjob

May 16, 2012 09:59 PM


Invent with Python

A Modest Proposal: Please Don’t Learn to Code Because It Will Damage Your Tiny Brain

Jeff Atwood wrote a post on his Coding Horror blog entitled “Please Don’t Learn to Code” in which he rails against the idea that “everyone should learn programming”.

And I couldn’t agree more.

People, not everyone needs to learn programming. Only some gifted individuals (of which we professional software developers are included) need to learn programming. For the rest of you, unless you are srsly committed it will just be a meaningless chore that may damage your tiny brains.

Coding is just like surgery: if an amateur decides to code their own Angry Birds clone as a fun little project, people will literally die. Those are the stakes, folks. That’s why it should be left to those who are explicitly pursuing it as a professional career.

TL; DR link

You have my assurance that I find Bloomberg’s encouragement of people to learn a technical skill personally offensive. It filled me with a rage that was only subdued after discouraging a small child from learning to play the harmonica. (What’s the kid going to do with that skill anyway? There are better ways he could spend his valuable time.)

Meanwhile, Jeff doesn’t hold back when he deals his death blow:

“Can you explain to me how Michael Bloomberg would be better at his day to day job of leading the largest city in the USA if he woke up one morning as a crack Java coder?”

Gauntlet… THROWN. The especially insidious thing (which Jeff shows that he himself is well aware of) is that even if Bloomberg’s serpentine journey into forbidden coding knowledge ended up being useless (which, yeah, maybe the mayor doesn’t need to know programming), his tweet has terrible implications for those he is in communion with. The man has 252,000 followers on Twitter. It’s as though the point of his tweet wasn’t just a casual announcement of his own 2012 resolutions, but also an encouragement for citizens to educate themselves on a technical subject.

PURE EVIL.

Think of the anarchy that would result if a fraction of them took it upon themselves to learn a new skill that is ordinarily considered not-for-average-people. They might find out that programming wasn’t as unapproachable as they previously thought!

Those who tout the “everyone can learn to code” and “coding is increasingly becoming essential” line are unaware of how preposterous their claims are. To give an example, Jeff replaces “programming” with “plumbing” in a quote from Tim O’Reilly:

Exactly! Plumbing and programming in this context are completely comparable which is why this twisted quote proves Jeff’s point. The demand for programmers isn’t high at all, and we can only expect it to decline as the 21st century progresses. And even if that wasn’t true, you don’t begin the journey to learn programming from a website like Codecademy. When has anyone ever used the Internet to learn something?

Jeff’s bullet-pointed reasons that follow his courageous, speak-truth-to-power words are at once a soothing symphony to the nerves of our exclusive software guild and a salvo against those who dare think they could learn to fish for themselves. I iterate them here with my own praising commentary:

“It assumes that more code in the world is an inherently desirable thing. [...] You should be learning to write as little code as possible. Ideally none.”

What a lot of people don’t understand is that programmers never write throwaway code. I once had the problem of wanting to download a few hundred images off of a website that had URLs like http://example.com/1.jpg, http://example.com/2.jpg, http://example.com/3.jpg, etc. up to http://example.com/347.jpg. (A local copy of an online comic, something that plenty of non-coders would want to do.)

At first I thought, “No problem. I always forget the exact function names of the networking library, but wget (a command line tool for downloading files off the web) is easy enough to use. I’ll just write a small script that writes a simple batch file that calls wget on each of these images. Something like:”

fp = open('temp.bat', 'w')
for i in range(1, 348):
    fp.write('wget http://example.com/%s.jpg\n' % (i))

Then I’d run the temp.bat file and I wouldn’t have to type out all those wget commands myself. A couple minutes later and I have all 347 images on my hard drive.

WHOA SLOW DOWN, AL. Are you sure you want to commit to writing this code? Writing a script that itself writes a script to be executed? That doesn’t sound very elegant. Isn’t that overengineering a solution to this problem? And what about supporting this code? Will it scale to millions of users? Have you thought about localization? What if I need to translate this script to Bulgarian? Do you really want to introduce this low-quality throwaway code into the world? Is that an inherently desirable thing?

Just imagine if your average layperson had the knowledge to automate simple, repetitive tasks with a quick hack like this. Then they wouldn’t have to struggle through all that manual typing and mouse clicking like a good techno-plebian. Or “ideally”, they could just give up on the problem and think there’s no practical solution (no code at all!).

“It assumes that coding is the goal. [...] Before you go rushing out to learn to code, figure out what your problem actually is.”

The core problem of this learn-coding-for-the-sake-of-coding fad is that you should have all the details of how you are going to apply that skill laid out before you even write your first “Hello world” program. Having knowledge of new methods never broadens our perspectives or opens insights to novel solutions.

In my toolbox, there’s only a hammer and a saw because that’s all I need to build birdhouses. Sure, I might get ideas for new things to build if I had other tools. But I don’t know what those things are right now, so why do I need anything beyond what I already have? And hopefully if I stick to what I know, I’ll never find out.

“Software developers tend to be software addicts who think their job is to write code. But it’s not. Their job is to solve problems. Don’t celebrate the creation of code, celebrate the creation of solutions.”

Totally on the ball. It’s just like when one of my friends wanted to start an afterschool music program. “Hey now, the world doesn’t need another afterschool program. It needs an effective solution to lower drop-out rates, increase test scores, and prevent gang violence.” Thankfully, I was able to convince him to can the idea.

“It assumes that adding naive, novice, not-even-sure-they-like-this-whole-programming-thing coders to the workforce is a net positive for the world.”

I’ve seen some terrible code in my time, and the thought of these novice programmers being hired and putting their code on the market sends shivers down my spine. Maybe if there was some way that companies could judge which candidates were qualified or not before hiring them (like a piece of paper listing their previous experience and education, or some sort of one-on-one conversation about their domain knowledge) it would be okay for people with less-than-absolute-mastery skill in coding to exist. Sadly, I can think of nothing that would prevent crap programmers from instantly being put in charge of the software for nuclear power plants.

You don’t create an effective workforce of software developers by encouraging many people to try their hand at coding and see who sticks with it. Rather, you need to scare off anyone who isn’t “serious” about coding. That’s the attitude that has led to the arrogant, petty, sarcastic, satire-writing generation of computer programmers I proudly say we have today.

“It implies that there’s a thin, easily permeable membrane between learning to program and getting paid to program professionally.”

The cruelest joke is played on the ones who take on the task of learning to program. Jeff specifically cites the book “Teach Yourself Perl in 24 Hours” as contributing to this misleading atmosphere. How could anyone not realize that this claim was a gimmick based on the fact that the book had 24 chapters? When I was first starting to learn Perl, I literally thought this $35 book and a single day would be all I needed to become an employed Perl developer who knew every facet of the language. Surely I can’t be the only one who thought this. I hadn’t been so disappointed since the day I saw “The NeverEnding Story” and ended up leaving the movie theater a mere two hours later.

In Conclusion

Why would the average person need to learn programming? When would they ever use that knowledge if they weren’t going to become a software engineer? Just like mathematics, a musical instrument, a foreign language, all sports ever invented, cooking, dancing, knitting, sailing, and everything beyond a 3rd grade education, you can get by in life just fine without it. I mean, when have you ever heard of someone enjoying programming just as a hobby and creative pursuit? When has anyone said that programming is a great way to improve general analytical skills?

Remember, Jeff isn’t just saying that most professions and lifestyles wouldn’t be significantly enhanced by programming ability. He isn’t just saying that “coding is the new literacy that you have to have” is hyperbole. (Both are arguable points.) If he was, he would have titled his post, “You Don’t Actually Need to Learn to Code”. Rather, he wants to keep the unwashed masses from embarrassing themselves with their amateur code which he and the other elite coders will end up having to debug. That’s why the title is “Please Don’t Learn to Code”.

It’s his plea for you to not even try.

Folks, programming is a privilege and a responsibility and not everyone should attempt to have it. In that way, it’s just like writing. If an army of amateurs took up keyboards to articulate their thoughts, just imagine what kind of dopey, smug opinions would be posted to the Internet.

Signed,
A Concerned Rock Star Programmer

May 16, 2012 08:59 PM


Omaha Python Users Group

May Meeting Cancelled

No meeting on Wednesday, May 16, 2012.  See you next month.

May 16, 2012 08:29 PM


PyLadies

Building A Great Work Culture (Inspire, Empower And Collaborate)

May 16, 2012 05:37 PM


Roberto Alsina

Hack English Instead

Lots of noise recently about Jeff Atwood's post about why you should not learn to code. I am here now telling you you should learn to code. But only after you learn a few other things.

You should learn to speak. You should learn to write. You should learn to listen. You should learn to read. You should learn to express yourself.

Richard Feynman once described his problem solving algorithm as follows:

  1. Write down the problem
  2. Think real hard
  3. Write down the solution

Most of us cannot do that because we are not Richard Feynman and thus, sadly, cannot keep the whole solution in our heads in step 2, so we need to iterate a few times, thinking (not as hard as he could) and writing down a bit of the solution on each loop.

And while we who code are unusually proud of our ability to write down solutions in such a clear and unforgiving way that even a computer can follow them, it's ten, maybe a hundred times more useful to know how to write it down, or say it, in such a way that a human being can understand it.

Explanations fit for computers are bad for humans and vice versa. Humans accept much more compact, ambiguous, and expressive code. You can transfer high-level concepts or designs to humans much more easily than to computers, but algorithms to computers much more easily than to humans.

I have a distrust of people who are able to communicate with computers more easily than with fellow humans, a suspicion that they simply have a hole in their skillset, which they could easily fix if they saw it as essential.

And it is an essential skill. Programmers not only run on coffee and sugar and sushi and doritos, they run on happiness. They have a finite endowment of happiness and they spend it continuously, like drunken sailors. They perform an activity where jokingly they measure productivity on curses per hour, a lonely endeavour that isolates them (us) from other humans, from family and friends.

If a developer cannot communicate he isolates. When he isolates he can't cooperate, he cannot delegate. He can't give ideas to others, he can't receive them, he can't share.

And since lots of our communication is via email, and chat, and bug reports, and blogs, it's better if he can write. A developer who cannot write is at a serious disadvantage. A developer who cannot write to express an idea cannot explain, he doesn't make his fellows better. He's a knowledge black hole, where information goes to die behind the event horizon of his skull.

So, learn to write. Learn to speak. Learn to read and listen. Then learn to code.

May 16, 2012 05:23 PM


Menno's Musings

IMAPClient 0.9 released

I'm pleased to announce version 0.9 of IMAPClient, the easy-to-use and Pythonic IMAP client library.

Highlights for this release:

The NEWS file and manual have more details on all of the above.

As always, IMAPClient can be installed from PyPI (pip install imapclient) or downloaded from the IMAPClient site.

The main focus of the next release (0.10) will be Python 3 support as this is easily the most requested feature. Watch this space for more news on this.

May 16, 2012 03:02 PM


Python Diary

South is absolutely brilliant

South is absolutely brilliant. I never expected it to convert my data over as easily as it did. Basically, I originally had a monolithic table, which I wanted to convert into 3 separate tables linked together using ForeignKeys. The reason was that 2 fields contain mostly the same information, and re-typing it and organizing the data was becoming a tad tedious. Here is the first Python script I ran after creating the new tables. This script automatically separates the data into the required tables for me:

from tickets.models import Taxonomy, Category, Reason

print "Starting Taxonomy migration process..."
for tax in Taxonomy.objects.all():
	print "Processing %s..." % tax
	try:
		ci = Category.objects.get(title=tax.ci)
	except Category.DoesNotExist:
		ci = Category.objects.create(title=tax.ci)
		print "Created Category %s." % ci
	try:
		r = Reason.objects.get(title=tax.reason)
	except Reason.DoesNotExist:
		r = Reason.objects.create(title=tax.reason)
		print "Created Reason %s." % r
	tax.ci = ci.pk
	tax.reason = r.pk
	tax.save()
	print "Linked PKs, finished."

As you can see, it replaces the actual category name and reason with just the PK. At first I wasn't sure what South would do with the data if I just converted a CharField over to a ForeignKey. Here are the changes to the main model:

	ci = models.CharField(max_length=90, blank=True, verbose_name='CI Unique ID')
	reason = models.CharField(max_length=90)
# To:
	ci = models.ForeignKey(Category, blank=True, verbose_name='CI Unique ID')
	reason = models.ForeignKey(Reason)

Here is the migration data generated by South:

class Migration(SchemaMigration):

    def forwards(self, orm):
        
        # Renaming column for 'Taxonomy.ci' to match new field type.
        db.rename_column('tickets_taxonomy', 'ci', 'ci_id')
        # Changing field 'Taxonomy.ci'
        db.alter_column('tickets_taxonomy', 'ci_id', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['tickets.Category']))

        # Adding index on 'Taxonomy', fields ['ci']
        db.create_index('tickets_taxonomy', ['ci_id'])

        # Renaming column for 'Taxonomy.reason' to match new field type.
        db.rename_column('tickets_taxonomy', 'reason', 'reason_id')
        # Changing field 'Taxonomy.reason'
        db.alter_column('tickets_taxonomy', 'reason_id', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['tickets.Reason'], max_length=90))

        # Adding index on 'Taxonomy', fields ['reason']
        db.create_index('tickets_taxonomy', ['reason_id'])

All in all, there was almost no work required, besides writing up a quick Python script to move the actual data to a new table, and place the PK into it. According to the database, the fields are now INTs, as they should be for ForeignKeys.
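
The data-moving step can be sketched without Django to show the idea: each distinct string value becomes a row in a lookup table, and the original row keeps only that row's primary key. This is a simplified, hypothetical stand-in; plain dicts play the role of the Category and Reason tables, and the helper name `normalize` is an assumption for illustration.

```python
def normalize(rows, field):
    """Replace rows[i][field] strings with integer keys into a lookup table.

    Returns the lookup table as {title: pk}. Rows are modified in place,
    mirroring how the script above overwrites tax.ci and tax.reason.
    """
    lookup = {}
    for row in rows:
        title = row[field]
        if title not in lookup:
            # First time we see this title: "create" it with the next pk.
            lookup[title] = len(lookup) + 1
        row[field] = lookup[title]
    return lookup


if __name__ == "__main__":
    tickets = [
        {"ci": "router", "reason": "outage"},
        {"ci": "switch", "reason": "outage"},
        {"ci": "router", "reason": "upgrade"},
    ]
    categories = normalize(tickets, "ci")
    reasons = normalize(tickets, "reason")
    print(categories)
    print(reasons)
    print(tickets)
```

Once every row holds a key instead of a title, changing the column type from a string to an integer foreign key is a purely mechanical schema change.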

May 16, 2012 11:40 AM


Wingware

Wing IDE 4.1.6 released

Wingware has released version 4.1.6 of Wing IDE, an integrated development environment designed specifically for the Python programming language.

Wing IDE is a cross-platform Python IDE that provides a professional code editor with vi, emacs, and other key bindings, auto-completion, call tips, refactoring, context-aware auto-editing, a powerful graphical debugger, version control, unit testing, search, and many other features.

Highlights of this release include:

* Support for Django 1.4
* Syntax highlighting for Qt Style Sheet (.qss) files
* Command to show selected file in OS-provided file manager
* Per-project configuration of Debug Network Port for remote debugging
* Several auto-editing fixes
* Several turbo completion mode fixes
* Replace All preserves fold state when possible
* Git blame support
* Fixed debugging QThreads in older PyQt versions
* Shorter delay in restarting Python Shell or debug process
* About 15 other bug fixes and minor improvements

See the details in the change log.

Version 4 adds the following new major features:

* Refactoring -- Rename/move symbols, extract to function/method, and introduce variable
* Find Uses -- Find all points of use of a symbol
* Auto-Editing -- Reduce typing by auto-entering expected code
* Diff/Merge -- Graphical file and repository comparison and merge
* Django Support -- Debug Django templates, run Django unit tests, and more
* matplotlib Support -- Maintains live-updating plots in shell and debugger
* Simplified Licensing -- Includes all OSes and adds Support+Upgrades subscriptions

Learn more about Wing IDE   |   See what's new in Wing 4   |   Try a free trial   |   Purchase a License

May 16, 2012 08:38 AM


eGenix.com

eGenix pyOpenSSL Distribution 0.13.0-1.0.0j GA

eGenix is pleased to announce eGenix pyOpenSSL Distribution 0.13.0-1.0.0j for Python 2.4 - 2.7, with support for Windows, Linux and Mac OS X.

May 16, 2012 07:00 AM


PyCon

Apac PyCon 2012

Apac PyCon 2012 is here. Be sure to register for it at http://apac.pycon.org and come experience the wonderful talks from our keynotes and speakers in Singapore!


May 16, 2012 05:03 AM

May 15, 2012


Roberto Alsina

Nikola Plans


I have not stopped working on Nikola, my static site generator. Here are the plans:

  1. Finish the theme installer (so you can get a theme from the site easily)
  2. Implement a theme gallery on the site (same purpose)
  3. Fix a couple of bugs
  4. Update manual
  5. Polish a few theme bits
  6. Release version 3.x (new major number because it requires manual migration)

After that, I will push on projects Shoreham (hosted sites) and Smiljan (planet generator) and make them more public. Shoreham will become a real web app for those who don't want to have their own server. For free, hopefully!

Once I have that, I have no further feature ideas, really. So I need more people to start using it, and that means I have to start announcing it more.

So, stay tuned for version 3.x sometime next week.

Post-Nikola, I will do a rst2pdf release, and then will get back to work on a book.

May 15, 2012 10:05 PM


Tryton News

End of maintenance for series 1.4

It has now been more than 2.5 years since the first release of series 1.4.
This ends the maintenance on this series.
Anyone who is still using it is strongly advised to upgrade to a more recent series.

May 15, 2012 08:56 PM


Jeet Sukumaran

Building and Installing SciPy, NumPy and PyMC on OS X 10.7 Lion

SciPy and Numpy are great packages for scientific computing. Unfortunately, installation on Mac OS X 10.7 Lion is not a very smooth affair. It can be a pain for moderately-experienced developers, and a nightmare for novice end-users. The problem is compounded by the fact that both SciPy and Numpy, as well as other components of a scientific computing stack that depend on them, such as matplotlib or pymc, all need to be built using the same compiler suite, build flags and architecture. This would be fine if the pre-built distribution packages available through "easy_install" or "pip" for these various components were indeed all mutually compatible, or if "easy_install"/"pip" were smart enough to pick the correct packages for one's particular system. But, at least the last time I tried the "easy_install" route, this was not the case. Numpy installed fine via "easy_install". But SciPy failed miserably half-way through. Which meant that I had to build SciPy myself, which I finally did after some irritating false starts. Then I tried "easy_install"-ing PyMC. This apparently worked, in that the install procedure yielded no obvious errors. But this was misleading. Actually trying to import the module failed due to architectural incompatibilities.

So, after various false starts, I finally got three packages (SciPy, Numpy, and PyMC) installed and working (so far) on my system. The following are the steps that I took.

Prerequisites

Download and install the following (installing Apple Developer Tools should give you Git):

  1. Apple Developer Tools
  2. Git
  3. gfortran

Build Environment

Set up the build environment by setting the following environmental variables in your session shell:

    export CC=clang
    export CXX=clang
    export FFLAGS=-ff2c
    export LDFLAGS="-arch x86_64 -Wall -undefined dynamic_lookup -bundle"
    # Note: this assignment overrides the FFLAGS value (-ff2c) set above.
    export FFLAGS="-arch x86_64"

This is the key to successfully setting up the stack. If you add other packages that depend in some way on either SciPy or Numpy in the future, you need to make sure the above environmental variables are set in your build session shell when building/installing them.

Numpy

  1. Download the Numpy source from:

    http://sourceforge.net/projects/numpy/files/NumPy/

  2. Unpack, build and install it:

     tar xf numpy-1.6.1.tar.gz
     cd numpy-1.6.1
     python setup.py build
     python setup.py install
    

SciPy

  1. Clone the SciPy source repository:

    git clone https://github.com/scipy/scipy.git
    
  2. Build and install:

    cd scipy
    python setup.py build
    python setup.py install
    

PyMC

  1. Download the PyMC source from:

    http://pypi.python.org/pypi/pymc/

  2. Unpack, build, and install:

    tar xf pymc-2.2.tar.gz
    cd pymc-2.2
    python setup.py build
    python setup.py install
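
Since an install can appear to succeed and still fail at import time, it may be worth checking each package once the builds finish. The following is a small sketch using only the standard library; the helper name `check_imports` is hypothetical, and the package list is simply the three packages built above.

```python
import importlib
import platform


def check_imports(names):
    """Try to import each module; return {name: None or error message}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # A compiled extension built for the wrong architecture
            # usually shows up here as an ImportError.
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    print("Interpreter architecture: %s" % platform.machine())
    report = check_imports(["numpy", "scipy", "pymc"])
    for name in sorted(report):
        error = report[name]
        print("%-6s %s" % (name, "ok" if error is None else "FAILED: " + error))
```

Running this in the same shell session used for the builds gives a quick signal as to whether all three packages load under the interpreter you intend to use.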
    

May 15, 2012 02:37 PM


John Cook

Machine Learning in Action

A couple months ago I briefly reviewed Machine Learning for Hackers by Drew Conway and John Myles White. Today I’m looking at Machine Learning in Action by Peter Harrington and comparing the two books.

Both books are about the same size and cover many of the same topics. One difference between the two books is choice of programming language: ML for Hackers uses R for its examples, ML in Action uses Python.

ML in Action doesn’t lean heavily on Python libraries. It mostly implements its algorithms from scratch, with a little help from NumPy for linear algebra, but it does not use ML libraries such as scikit-learn. It sometimes uses Matplotlib for plotting and uses Tkinter for building a simple GUI in one chapter. The final chapter introduces Hadoop and Amazon Web Services.

ML for Hackers is a little more of a general introduction to machine learning. ML in Action contains a brief introduction to machine learning in general, but quickly moves on to specific algorithms. ML for Hackers spends a good number of pages discussing data cleaning. ML in Action starts with clean data in order to spend more time on algorithms.

ML in Action takes 8 of the top 10 algorithms in machine learning (as selected by this paper) and organizes around these algorithms. (The two algorithms out of the top 10 that didn’t make it into ML in Action were PageRank, because it has been covered well elsewhere, and EM, because its explanation requires too much mathematics.) The algorithms come first in ML in Action, illustrations second. ML for Hackers puts more emphasis on its examples and reads a bit more like a story. ML in Action reads a little more like a reference book.

May 15, 2012 01:23 PM


Grig Gheorghiu

The correct way of using DynamoDB BatchWriteItem with boto

In my previous post I wrote about the advantages of using the BatchWriteItem functionality in DynamoDB. As it turns out, I was overly optimistic when I wrote my initial code: I only called the batch_write_item method of the layer2 module in boto once.

The problem with this approach is that many of the batched inserts can fail, and in practice this happens quite frequently, probably because of transient network errors. The correct approach is to inspect the response object returned by batch_write_item -- here is an example of such an object:


{'Responses': {'mytable': {'ConsumedCapacityUnits': 5.0}},
 'UnprocessedItems': {'mytable': [
{'PutRequest': {'Item': {'mykey': 'key1', 'myvalue': 'value1'}}},
{'PutRequest': {'Item': {'mykey': 'key2', 'myvalue': 'value2'}}},
{'PutRequest': {'Item': {'mykey': 'key3', 'myvalue': 'value3'}}}]}}


You need to look for the value corresponding to the 'UnprocessedItems' key. This value is a dictionary keyed by the name of the table you're inserting items in. The value corresponding to that key gives you a list of other dictionaries with keys corresponding to the operations you applied to the table ('PutRequest' in my case). Going one level deeper allows you to finally obtain the attributes (keys + values) of the items that failed, which you can then try to re-insert.
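
Walking that nested structure can be sketched as a small helper. The function name `unprocessed_puts` is hypothetical; it relies only on the response shape shown above and does not call boto.

```python
def unprocessed_puts(response, table_name):
    """Return the item attribute dicts that a batch write failed to insert."""
    requests = response.get("UnprocessedItems", {}).get(table_name, [])
    return [req["PutRequest"]["Item"] for req in requests if "PutRequest" in req]


if __name__ == "__main__":
    response = {
        "Responses": {"mytable": {"ConsumedCapacityUnits": 5.0}},
        "UnprocessedItems": {"mytable": [
            {"PutRequest": {"Item": {"mykey": "key1", "myvalue": "value1"}}},
            {"PutRequest": {"Item": {"mykey": "key2", "myvalue": "value2"}}},
        ]},
    }
    for item in unprocessed_puts(response, "mytable"):
        print("%s %s" % (item["mykey"], item["myvalue"]))
```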

So basically you need to stay in a loop and keep calling batch_write_item until UnprocessedItems corresponds to an empty list. Here is a gist containing code that reads a log file in lzop format, looks for lines containing a key + white space + a value, then inserts items based on those key/value pairs into a DynamoDB table. I've been pretty happy with this approach.
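
The retry loop itself might look like the sketch below. To stay independent of any particular boto version, the actual write is passed in as a callable, `write_batch`, which is an assumption for illustration: it takes a list of items and returns a response dictionary in the shape shown earlier. The `max_attempts` cap and the helper name are likewise hypothetical additions, there to avoid looping forever.

```python
def write_until_done(write_batch, table_name, items, max_attempts=10):
    """Call write_batch repeatedly until no unprocessed items remain.

    Returns the list of items still unprocessed after max_attempts
    (empty on success).
    """
    pending = list(items)
    attempts = 0
    while pending and attempts < max_attempts:
        response = write_batch(pending)
        unprocessed = response.get("UnprocessedItems", {}).get(table_name, [])
        # Keep only the items the service reported as not written.
        pending = [req["PutRequest"]["Item"] for req in unprocessed]
        attempts += 1
    return pending


if __name__ == "__main__":
    def flaky_write(batch):
        """Fake writer for the demo: accepts the first item, rejects the rest."""
        rejected = [{"PutRequest": {"Item": item}} for item in batch[1:]]
        return {"UnprocessedItems": {"mytable": rejected}}

    items = [{"mykey": "key%d" % i} for i in range(3)]
    leftover = write_until_done(flaky_write, "mytable", items)
    print("items left unwritten: %d" % len(leftover))
```

In practice you would probably also pause between attempts, so that retries do not immediately run into the same throughput limit.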

Before I finish, I'd like to reiterate the gripe I have about the static nature of determining your Read and Write Throughput when dealing with DynamoDB. I understand that it makes life easier for AWS in terms of the capacity planning they have to do on their end to scale the table across multiple instances, but it's a black art when it comes to capacity planning you need to do as a user. You almost always end up overcommitting as a DynamoDB user, and it's hard to make sense sometimes of the capacity units you're consuming, especially when doing inserts of large volumes of data.


May 15, 2012 11:22 AM


Morten W Petersen

Python style & flexibility, tabs or spaces

I've been programming Python for a good while, and in the beginning I was very occupied with the style in which things are written.

I always advocated using_names_like_this for everything, I guess because it is easy to read and understand, and it also takes into account acronyms which should be uppercase, as these acronyms are also separated from other words, for_example_html_text.

However, over the years, mostly these last couple of years, naming conventions have become less important to me. Python is a very flexible language, and one can for example create a function that returns an instance of a class, the class being decided by the invoked function. With this flexibility, separating functions from classes is somewhat redundant IMO, and one might just as well do everything in lower-case separated by an underscore.
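
As a small illustration of that flexibility, here is a sketch of a function that picks the class to instantiate at call time. The class and function names are made up for the example.

```python
class html_text(object):
    """Text that renders itself wrapped in a paragraph tag."""

    def __init__(self, value):
        self.value = value

    def render(self):
        return "<p>%s</p>" % self.value


class plain_text(object):
    """Text that renders itself unchanged."""

    def __init__(self, value):
        self.value = value

    def render(self):
        return self.value


def make_text(value, as_html=False):
    """Return an instance whose class is chosen inside the function."""
    cls = html_text if as_html else plain_text
    return cls(value)


if __name__ == "__main__":
    print(make_text("hello").render())
    print(make_text("hello", as_html=True).render())
```

To the caller, `make_text(...)` looks just like instantiating a class, which is why a naming scheme that distinguishes functions from classes tells the reader less than it seems to.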

However, I'm not pushing these arguments anymore: there are a number of different Python systems, such as Zope, Django, Repoze, Archetypes, and Plone, that use a myriad of different ways of naming things.

I think the most important thing is, and has always been, the proper use of tabs and spaces in Python files, as this is the thing that can get you and has real impact on the "flow" of the program.

I think there are a couple of times in my Python career that I've encountered bugs due to mixed and erroneous spacing, and at least one of those bugs was pretty tough to get, but it's been a good while since that one. I don't think much about spacing any more, Emacs does the right thing whether I'm in text or Python mode.
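
For anyone who wants to check a file by hand, a rough sketch of a mixed-indentation detector is below. It is an illustrative helper (the name `mixed_indent_lines` is an assumption), and it only flags lines whose leading whitespace contains both tabs and spaces; it does not catch files that use tabs on some lines and spaces on others. The standard library's tabnanny module does a more thorough job.

```python
def mixed_indent_lines(source):
    """Return 1-based numbers of lines whose indentation mixes tabs and spaces."""
    flagged = []
    for number, line in enumerate(source.splitlines(), 1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = "def f():\n    x = 1\n \treturn x\n"
    print(mixed_indent_lines(sample))
```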

One snag that always seems to get me is the use of commas in complex schema definitions in Archetypes for example, but that's thankfully always an error that stops the entire system so it's easy to catch.

My argument? Naming conventions aren't that important because there will never be enough willpower and resources to make everything completely consistent in the Python ecosystem. And if one is very preoccupied with naming conventions, maybe it is time to drink a little less coffee, stress a little less and focus on the important parts of a program. [:)

May 15, 2012 09:16 AM


Yaco

How to integrate forms in the Django admin site? django-form-admin

It is very common to have to add some feature only for the superusers of our website, and the usual place to put such a feature is the admin site. Normally we try to hide from the user that he is (programmatically) outside of the Django administration, to provide a better user experience; so our view will have the same path prefix as the administration (/admin/), and the same styles.

If we use forms in these features, we have a layout problem, or a lot of work with CSS, because Django renders forms (as_p, as_ul, as_table) very differently from the admin site. The django-form-admin app is tailored to solve this problem.

To show the differences in layout between a normal form and one that uses django-form-admin, we’ll show an example of a form that updates the value of some cookies defined in the settings.

For this we will create the following view, for /admin/edit-cookies/:

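from django.conf import settings
from django.contrib import messages
from django.contrib.auth.decorators import permission_required
from django.core.urlresolvers import reverse
from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response
from django.template import RequestContext
from django.utils.translation import ugettext as _

@permission_required('my_perm')
def edit_cookies(request):
    initial = {}
    data = None
    if request.method == 'POST':
        data = request.POST
    # Pre-fill each field with the cookie's current value
    for cookie in settings.COOKIES_EDITABLES:
        initial[cookie] = request.COOKIES.get(cookie, None)
    form = ChangeCookie(settings.COOKIES_EDITABLES,
                        initial=initial, data=data)
    # An unbound form (GET request) is never valid, so this only runs on POST
    if form.is_valid():
        messages.info(request, _('Added the cookies'))
        response = HttpResponseRedirect(reverse('edit_cookies'))
        form.save(response)
        return response
    return render_to_response('foo/edit_cookies.html',
                              {'form': form},
                              context_instance=RequestContext(request))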

We now code a form, to create/modify/delete the cookies:

from django import forms
from django.utils.encoding import smart_str

class ChangeCookie(forms.Form):

    def __init__(self, cookies, *args, **kwargs):
        # One optional text field per editable cookie name
        for cookie in cookies:
            self.base_fields[cookie] = forms.CharField(required=False)
        super(ChangeCookie, self).__init__(*args, **kwargs)

    def save(self, response):
        # A non-empty value sets the cookie; an empty one deletes it
        for key, val in self.cleaned_data.items():
            if val:
                response.set_cookie(key, smart_str(val))
            else:
                response.delete_cookie(key)
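
The rule applied by save (a non-empty value sets the cookie, an empty value deletes it) can be tried out without Django. The following is only a sketch: FakeResponse and save_cookies are hypothetical names introduced for this illustration, the fake response merely records the calls it receives, and the smart_str conversion is left out.

```python
class FakeResponse(object):
    """Hypothetical stand-in that records cookie calls instead of sending headers."""

    def __init__(self):
        self.cookies_set = {}
        self.cookies_deleted = []

    def set_cookie(self, key, value):
        self.cookies_set[key] = value

    def delete_cookie(self, key):
        self.cookies_deleted.append(key)


def save_cookies(cleaned_data, response):
    """Same rule as ChangeCookie.save: non-empty values are set, empty ones deleted."""
    for key, val in cleaned_data.items():
        if val:
            response.set_cookie(key, val)
        else:
            response.delete_cookie(key)


response = FakeResponse()
save_cookies({'theme': 'dark', 'lang': ''}, response)
print(response.cookies_set)      # the cookie with a value
print(response.cookies_deleted)  # the cookie submitted empty
```

Submitting an empty field is therefore the way to delete a cookie through this form.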

And the template to be rendered by our view, edit_cookies.html:

{% extends "admin/base_site.html" %}
{% load i18n admin_static admin_modify %}
{% block extrahead %}
    {{ block.super }}
    {{ media }}
    {{ form.media }}
{% endblock %}
{% block extrastyle %}
    {{ block.super }}
    <link rel="stylesheet" type="text/css" href="{% static "admin/css/forms.css" %}" />
{% endblock %}
{% block coltype %}
    {% if ordered_objects %}colMS{% else %}colM{% endif %}
{% endblock %}
{% block breadcrumbs %}
{% endblock %}
{% block content %}
    <div id="content-main">
        <form action="." method="POST">
            {% csrf_token %}
            {{ form }}
            <div>
                <input type="submit" name="submit" value="{% trans "Update Cookies" %}"/>
            </div>
        </form>
    </div>
{% endblock %}

The end result is not very good, because Django renders the forms in a very different way in the admin site.

But if we install the django-form-admin application in our project and add something like this to our form (there are many ways to do it: inheritance, delegation, implicit, explicit), the difference is more than considerable:

    def __unicode__(self):
        from formadmin.forms import as_django_admin
        return as_django_admin(self)

This app has only had a minimal change in the last year, so this is a stable version. No changes were needed to adapt it to Django 1.3 or Django 1.4.

I hope you like it.

May 15, 2012 09:02 AM


Chris McDonough

Why I Like ZODB

It's not a state secret that I like ZODB. It's a very civilized way to store web application data. I'll try to enumerate some of the most important reasons I like ZODB here, and why I prefer it to other NoSQL systems and relational systems.

For the record, the stuff in this blog post is in the context of writing a web application. I don't mean it in the context of an OLAP system, or some data warehousing system. I mean it in the context of writing your typical web application, which needs to support fewer than a couple thousand requests per second, a small fraction of which are write requests.

Transactions

ZODB uses transactions. When you write to a ZODB database, you change a bunch of objects, then you commit the changes. Until you commit your changes, other threads and processes accessing the database won't see those changes.
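
The visibility rule can be pictured with a toy model. This is not ZODB code and not how ZODB is implemented; ToyStore and ToyConnection are names invented for the sketch, and it leaves out conflict detection and durability entirely. It only shows that one connection's uncommitted changes stay invisible to another.

```python
class ToyStore(object):
    """Committed state shared by every connection (a toy, not ZODB)."""

    def __init__(self):
        self.committed = {}


class ToyConnection(object):
    """Buffers writes privately until commit() publishes them to the store."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def get(self, key, default=None):
        # A connection sees its own uncommitted writes first.
        if key in self.pending:
            return self.pending[key]
        return self.store.committed.get(key, default)

    def set(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.store.committed.update(self.pending)
        self.pending = {}

    def abort(self):
        # Throw the buffered changes away; the store is untouched.
        self.pending = {}


store = ToyStore()
writer = ToyConnection(store)
reader = ToyConnection(store)

writer.set('balance', 100)
before = reader.get('balance')   # the reader cannot see the pending write
writer.commit()
after = reader.get('balance')    # now it can
print(before, after)
```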

I take transactions entirely for granted when writing an application. Wrapping a set of actions which mutate persistent data in a transaction makes a whole class of really hard problems disappear. Many existing NoSQL databases, like MongoDB, do not offer transactions.

Any speed or feature benefit in using a non-transactional data store would just be lost in the noise of needing to cope with the loss of transactionality for anything except the most immense, purpose-built application (e.g. you're writing Twitter, or GitHub). If an application is anticipated to serve fewer than, say, 50 write requests per second or so, it's pretty foolish to even momentarily consider using a system without transactions. Even much busier systems can be engineered to use databases with transactions, albeit with lots of fancy coding.

Truth be told, I'm really not clever enough to write a system without transactions where I needed to have any level of confidence that I have the storage of inter-related data right. I'm also pretty sure I don't want to be that clever, and that a job that required me to be that clever would be a very long job indeed. I certainly wouldn't willingly reach into my toolbox and pick out a data store without transactions for a run-of-the-mill, low-traffic web application.

ZODB gives me transactions, and I appreciate that.

Caching

The way ZODB operates is pretty simple. Any pickleable object created in memory can be persisted. It's persisted by attaching it (via __setattr__) to any other persistent object. Once persisted, every object has its own identifier that can be used as a cache key.

Because the organization of data in the database is a function of the same data as it might otherwise be organized in memory, ZODB has a natural caching system that is simple and robust. Each persistent object, once loaded, remains in a per-thread memory cache until evicted. It's evicted when it hasn't been accessed in a while and RAM is needed to load a more recently accessed object into the cache, or when another thread or process has changed the object. There's very little cost associated with asking for the same set of objects over and over once that set of objects has been asked for initially.

If your working set is smaller than your available RAM, the way ZODB caches its objects effectively means that accessing an object loaded from ZODB is almost costless after the object is first loaded. That object won't be evicted from the in-memory cache until it changes. In such cases, your application will work at "RAM speed" rather than at "disk speed". If your working set is larger than available RAM, accessing the same set of objects over and over won't be costless, but if you ask for that set of objects often enough, and they don't change very often, and you have an appropriate amount of memory in the machine, they will almost never be evicted from the cache.
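
As a rough picture of the eviction behaviour described above, here is a toy cache keyed by object identifier. It is a sketch of the general idea only, not ZODB's actual cache; ToyObjectCache, load_from_storage and the size limit counted in objects are all assumptions made for the example.

```python
from collections import OrderedDict


class ToyObjectCache(object):
    """Keeps at most `size` objects, dropping the least recently used first."""

    def __init__(self, size):
        self.size = size
        self.objects = OrderedDict()  # oid -> object, oldest access first

    def get(self, oid, load):
        if oid in self.objects:
            obj = self.objects.pop(oid)           # hit: re-insert as most recent
        else:
            obj = load(oid)                       # miss: pay the cost of loading
            if len(self.objects) >= self.size:
                self.objects.popitem(last=False)  # evict least recently used
        self.objects[oid] = obj
        return obj

    def invalidate(self, oid):
        """Called when another thread or process has changed the object."""
        self.objects.pop(oid, None)


loads = []

def load_from_storage(oid):
    # Stand-in for an expensive read from disk or from a storage server.
    loads.append(oid)
    return {'oid': oid}


cache = ToyObjectCache(size=2)
cache.get('a', load_from_storage)
cache.get('a', load_from_storage)   # served from memory, no second load
cache.get('b', load_from_storage)
cache.get('c', load_from_storage)   # cache is full, so 'a' is evicted
cache.invalidate('b')
cache.get('b', load_from_storage)   # reloaded after the invalidation
print(loads)
```

As long as the working set fits within the cache size, repeated reads never reach load_from_storage, which is the "RAM speed" case.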

Because most requests in a typical web application are read requests (they do not mutate persistent data), this built-in caching is extremely effective in a real-world sense. You usually don't have to employ a third-party caching system to make pages render quickly. They just render quickly in the first place without degrading too much as load increases. There are certainly times you have to think harder about it, such as when you begin to have too much data in your working set to effectively fit into the cache, but often applications never reach this level of exposure, and when they do, the answer is often to just increase the memory in the system and the cache size. In the worst case scenario, you do what everyone else who doesn't use ZODB does: you use a global external cache to make lookup speeds acceptable. In ten years of ZODB use, I can count the number of times I've had to do this on one finger. These days, instead of using an external cache, I might just try to change ZODB instead and improve its caching.

Testability

Of the things I really like about ZODB, the one I like the most is the ability to write easily unit-testable code.

I like the majority of the tests I write to be unit tests. Note: unit tests. Not functional tests. Not integration tests. I just want to test the code I'm trying to test, not its integration with the rest of the system. Functional and integration tests are useful and important too, and you need them for any serious application, but you can get better coverage and a much faster set of tests if you use unit testing in the majority. I don't think most people know what the difference between these styles of testing is. But if you do know, you really know, and you care.

I have a deep respect for the amount of effort put into making object-relational mappings work. SQLAlchemy and other such systems are well written and thoughtfully executed. However, I find it unappealing that every ORM system I've run across effectively requires that you write integration and/or functional tests to actually test the code in your application. These tests need the code to run against a real database. The test code actually causes data to be modified in that database. Between each test case, the database needs to be reset to a baseline state.

This, in practice, utterly blows. As a limitation, it seems to be, at least in part, a function of the query syntax exposed by the ORM. It's an awful hard syntax to mock out. To my knowledge, it's so hard that no one has even tried, and folks just cave and use a "real" database, and write all of their tests as functional tests, or at least they write all of the tests that come near the database as functional tests. Most real-world code comes near the database, so it's often diminishing returns to even attempt to discern which of the tests can be non-integration tests, and thus people tend to just use the same setup and teardown for all their tests, even the ones that don't come close to the database.

Unit tests run much, much, much faster than integration and functional tests. When I see blog posts where people are trying to parallelize their test runs across multiple machines because their test suite takes so long to run, I want to weep, because it's very likely that they'd get just as much of a sense of comfort from a set of unit tests that ran thousands of times faster with a few functional and integration tests thrown in for the sake of sanity. But they can't, because their toolchain doesn't really support it.

ZODB makes it very easy to write the majority of your tests as unit tests. Because the object graph is just a tree of Python objects, and because each of those objects can be instantiated without any particular root or parent, and because often the "query syntax" of ZODB is just plain old Python item or attribute access, you can often just write test code like this:

  import StringIO
  import unittest

  class TestFile(unittest.TestCase):
      def _makeOne(self, stream, mimetype):
          from .. import File
          return File(stream, mimetype)

      def test_ctor_no_stream(self):
          # Without a stream the mimetype falls back to the generic default
          inst = self._makeOne(None, None)
          self.assertEqual(inst.mimetype, 'application/octet-stream')

      def test_ctor_with_stream_mimetype_None(self):
          stream = StringIO.StringIO('abc')
          inst = self._makeOne(stream, None)
          self.assertEqual(inst.mimetype, 'application/octet-stream')
          # The stream's content must have been copied into the blob
          fp = inst.blob.open('r')
          fp.seek(0)
          self.assertEqual(fp.read(), 'abc')

The above test tests an implementation of a ZODB object representing a file (using the ZODB "blob" functionality). The file object it's testing would be considered a "model" by most people used to "MVC" terminology. Model code is actually pretty easy to test, and I suspect that with careful factoring and stubbing it's possible to test most ORM model code without actually using a database connection.

Let's try something harder. The hardest code to test in any web application is view code. View code is the code that responds to an invocation of a particular URL within your system. It's the code that ties everything together: the model objects that represent persistent storage, the various functional subsystems of the application like the HTTP request, sessions, mailers and filesystems, etc. It's hard to test because it's "where the rubber meets the road". Consider the following Pyramid view function, which is taken from an application that uses ZODB:

  from pyramid.httpexceptions import HTTPFound

  from myapp import File

  def add_file(context, request):
      appstruct = get_appstruct(request)
      name = appstruct['name']
      filedata = appstruct['file']
      stream = None
      filename = None
      if filedata:
          filename = filedata['filename']
          stream = filedata['fp']
          if stream:
              stream.seek(0)
          else:
              stream = None
      name = name or filename
      fileob = request.registry.content.create(File, stream)
      context[name] = fileob
      return HTTPFound(request.mgmt_path(fileob, '@@properties'))

The above view function accepts a context object (a ZODB object representing a "folder", in this case, which is just a container of other ZODB objects), and a request object. It returns a Response. Here's a test for the function in the case where there's no file data provided:

  import unittest
  from pyramid import testing

  class Test_add_file(unittest.TestCase):
      def test_no_filedata(self):
          from .. import add_file
          created = testing.DummyResource()
          context = testing.DummyResource()
          request = testing.DummyRequest()
          request.mgmt_path = lambda *arg: '/mgmt'
          request.registry.content = DummyContent(created)
          appstruct = {'name':'abc', 'file':None}
          request.appstruct = appstruct
          result = add_file(context, request)
          self.assertEqual(result.location, '/mgmt')
          self.assertEqual(context['abc'], created)

  class DummyContent(object):
      def __init__(self, result):
          self.result = result

      def create(self, *arg, **kw):
          return self.result

Note that the test creates "dummy" (aka stub) objects for the context, the request, and the object that is returned from the call to create. It asserts that the created object is added to the context under the name abc. And that's all it does. In the running application, when a new file object is created by the view, it is persisted. But in the test, no database setup is required because the "query API" is limited to a single call: __setitem__, which seats the object into its parent. We can mock this up without much of a problem. In this case, the DummyResource has a suitable __setitem__ already, so we didn't need to do any mocking (it was already done for us). This test will run in microseconds.

There are definitely far more complex cases, requiring more stubbing, such as code that uses an indexing and querying system like repoze.catalog to look up persistent objects efficiently by asking a centralized index for all objects with such-and-such attribute. In those cases, the "query API" is not nearly as simple. But it's still simple enough to mock up without ever requiring a "real" ZODB database connection. The tests get longer, and the mocking and stubbing code becomes more complex. But it's not hopeless, like it seems to be in ORM systems.
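
To give a feel for that kind of stubbing, here is a sketch of a view that asks a catalog for matching object ids, together with a test that stubs the catalog. Every name in it (list_files, DummyCatalog, the find_ids method, request.catalog) is hypothetical and chosen for the illustration; none of it is the repoze.catalog API.

```python
import unittest


def list_files(context, request):
    """Hypothetical view: ask a catalog for ids, then fetch them from the folder."""
    docids = request.catalog.find_ids(mimetype=request.params['mimetype'])
    return {'files': [context[docid] for docid in docids]}


class DummyCatalog(object):
    """Stub that returns canned ids and remembers what it was asked."""

    def __init__(self, docids):
        self.docids = docids
        self.queries = []

    def find_ids(self, **criteria):
        self.queries.append(criteria)
        return self.docids


class DummyRequest(object):
    def __init__(self, catalog, params):
        self.catalog = catalog
        self.params = params


class TestListFiles(unittest.TestCase):
    def test_returns_matching_objects(self):
        context = {'a': 'file-a', 'b': 'file-b'}   # a plain dict plays the folder
        catalog = DummyCatalog(['b'])
        request = DummyRequest(catalog, {'mimetype': 'text/plain'})
        result = list_files(context, request)
        self.assertEqual(result, {'files': ['file-b']})
        self.assertEqual(catalog.queries, [{'mimetype': 'text/plain'}])
```

The stub is longer than the one needed for add_file, but it still never opens a database connection.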

Comparable idiomatic ORM code would construct the file object the same way but would then tend to call e.g. add on a semimagical threadlocal "session" object. So you'd need to at least mock out the thread local session, or come up with some other context-sensitive way to get at the session without the thread-local. It would also need to ensure that a file object with the same name didn't already exist in the database before adding it blindly, which would imply some sort of exists query. Something like this (I realize my syntax is likely terrible):

  from myapp import Session
  from myapp import File

  def add_file(request):
      appstruct = get_appstruct(request)
      name = appstruct['name']
      filedata = appstruct['file']
      stream = None
      filename = None
      if filedata:
          filename = filedata['filename']
          stream = filedata['fp']
          if stream:
              stream.seek(0)
          else:
              stream = None
      name = name or filename
      exists = Session.query(File).filter_by(name=name).one()
      if exists:
         Session.query(File).delete(name=name)
      fileob = request.registry.content.create(File, stream)
      Session.add(fileob)
      return HTTPFound(request.mgmt_path(fileob, '@@properties'))

A test is nowhere near as straightforward to write when the view looks like this. You've obtained a global Session object from an import that needs to either be mocked or a stub of it passed in specially. This is purely convention, and you could construct a system that passed the session in like the "context" in a ZODB app, so that's not really a deep concern. But still, the contract of the Session object is complex. It needs to support a query method that accepts an argument, the result of which needs to support both a filter_by and delete method. The filter_by method needs to return an object that has a one method, and so on. The session is the source of truth in this view, and so mutations done through it need to be reflected in later queries.
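
For a sense of how much of that contract a stub has to cover, here is a sketch using unittest.mock from the Python 3 standard library. It rests on several assumptions: the session is passed in as an argument rather than imported, the hypothetical replace_file function stands in for the relevant part of the view, and the lookup is written with first() and the removal with session.delete(obj), which differs from the view above.

```python
from unittest import mock


def replace_file(session, model, name, fileob):
    """Hypothetical view body with the session passed in rather than imported."""
    existing = session.query(model).filter_by(name=name).first()
    if existing is not None:
        session.delete(existing)
    session.add(fileob)
    return fileob


session = mock.MagicMock()
old, new = object(), object()
# Each link in the call chain has to be configured separately.
session.query.return_value.filter_by.return_value.first.return_value = old

replace_file(session, 'File', 'abc', new)

# The assertions have to walk the same chain again.
session.query.assert_called_once_with('File')
session.query.return_value.filter_by.assert_called_once_with(name='abc')
session.delete.assert_called_once_with(old)
session.add.assert_called_once_with(new)
```

Even this much only verifies which calls were made. The mock does not behave like a source of truth: a second query after the delete would still return the old object unless the test reconfigures the chain by hand.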

This example doesn't even take into account more common cases where one query depends on the result of another data-mutating query, or methods of the session like flush which add attributes to recently add-ed objects representing automatically computed primary keys and so on.

It's no wonder that no one bothers to try to mock it out, and just punts back to always testing functionally. When you test functionally, your test runs in milliseconds rather than microseconds. And that difference adds up across lots of tests. You get a lot of power from the ORM, especially the power to do very ad-hoc queries in a sort of stream-of-consciousness way which is very useful in highly dynamic, ill-defined web applications. But you're paying a price. And often you don't need the ad-hocness supplied by the query syntax. You know exactly what you're looking for and where to find it. ZODB lends itself well to such applications, and the complexity curve seems more adjustable on a per-view basis.

For what it's worth I'd love to be wrong about needing to always write functional tests when an ORM is used. It would mean I could write applications that use an ORM in a style that suits my historical application writing patterns, and in a style that supports very fast test suites. Let me know if you've tried and succeeded. In the meantime, I much prefer to write ZODB applications for testing purposes. Note that it's not just ORMs that have this issue; database bindings for other NoSQL databases have similar issues (e.g. PyMongo); their APIs are very complicated and are difficult to mock. Often the features you gain from that complexity are not worth the price you pay.

May 15, 2012 07:10 AM


ShiningPanda

Python & Java: a unified build process (2/4)

In our previous blog post dedicated to Python and Java, we saw how Maven can orchestrate a unified build process for these two languages.

But most of the time, all artifacts within a build should take the same version. It was not the case in the sample project of our previous blog post, so let's find a way to unify this.

For Java it's easy: JARs and WARs are using the version located in their respective POMs, 0.1 in this case:


<groupId>com.shiningpanda</groupId>
<artifactId>jsample</artifactId>
<version>0.1</version>
<packaging>jar</packaging>


It would be great if Python projects could also get their versions from the POM. Let's see how to do that.

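The POM version has to be propagated when calling setup.py. The easiest way is to set an environment variable with exec-maven-plugin. Modify setuptools/pom.xml in the sample project as follows:

<build>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <configuration>
        <environmentVariables>
          <VERSION>${project.version}</VERSION>
        </environmentVariables>
      </configuration>
    </plugin>
  </plugins>
</build>
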
Now a VERSION environment variable containing the value of the POM's version tag is available for setup.py. It can be used to generate a __version__ module containing the project's version. The setup.py script of the pysample project would be modified like this:

import codecs
import os

from setuptools import setup, find_packages

# Folder containing setup.py
root = os.path.dirname(os.path.abspath(__file__))
# Path to __version__ module
version_file = os.path.join(root, 'pysample', '__version__.py')
# Check if this is a source distribution.
# If not create the __version__ module containing the version
if not os.path.exists(os.path.join(root, 'PKG-INFO')):
    fd = codecs.open(version_file, 'w', 'utf-8')
    fd.write('version = %r\n' % os.getenv('VERSION', '?'))
    fd.close()
# Load version
exec(open(version_file).read())
# Setup
setup(
    name = 'pysample',
    version = version,
    packages = find_packages(),
)

Note that we only generate the __version__ module if the PKG-INFO file does not exist. Indeed, an existing PKG-INFO file means that we're installing a source distribution previously generated by the setup.py sdist command.

Now all our artifacts are getting their version number from their POM. The versions in the POMs are easily handled thanks to the maven-release-plugin, but we will cover this in another blog post.

By Maven convention, version numbers carry a -SNAPSHOT suffix between two releases. Setuptools projects more commonly use a .dev suffix, so feel free to process your POM version in your setup.py, for instance with:
os.getenv('VERSION', '?').replace('-SNAPSHOT', '.dev')
It can also be useful to get the revision version from your source code management tool. To do so, use the buildnumber-maven-plugin. Following the same principle, export an environment variable containing the revision that can be used in the setup.py to compute an artifact version (with a 0.1.dev-r1989 or 0.1.dev-rb0c1c6 pattern for example).
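
Both ideas can be combined in a small helper, sketched here under a few assumptions: python_version is a name made up for the example, REVISION is a hypothetical environment variable that the Maven build would have to export in the same way as VERSION, and the choice to tag only development versions with the revision is ours.

```python
import os


def python_version(maven_version, revision=None):
    """Turn a Maven version into a setuptools-style one, optionally tagging the revision."""
    version = maven_version.replace('-SNAPSHOT', '.dev')
    # Only development versions get the revision suffix in this sketch.
    if revision and version.endswith('.dev'):
        version = '%s-r%s' % (version, revision)
    return version


# VERSION comes from the POM as described above; REVISION is a hypothetical
# variable name that the Maven build would have to export the same way.
print(python_version(os.getenv('VERSION', '?'), os.getenv('REVISION')))

# The two patterns mentioned above, plus a released version left untouched.
print(python_version('0.1-SNAPSHOT', '1989'))
print(python_version('0.1-SNAPSHOT', 'b0c1c6'))
print(python_version('0.1', '1989'))
```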

In addition to a Hosted Continuous Integration Service, ShiningPanda is also offering build and release management expertise, so if you have questions or if you are stuck with your internal build process do not hesitate to contact our service team!

May 15, 2012 05:48 AM


Montreal Python User Group

Second Distutils2 Sprint Wrap-up

The second event in our series of Distutils2 sprints was again a success. We’ve managed to fix some interesting issues and we’ve gained some experience points at dealing with the black magic of the packaging arcana.

We would like to thank TP1 for hosting the sprint at their nice Downtown Montreal offices and also for the pizza. Pierre Paul, our host, wrote a longer post relating the event.

Stay tuned for upcoming announcements on Distutils2 sprints.

Special thanks to the sprinters:

Stay tuned this week for the announcement of the next sprint.

Here’s a glimpse of patches being born:

May 12 Distutils2 sprinters

Divine inspiration:

Light falling on a computer

May 15, 2012 01:48 AM

New Room For The Django Workshop

Pythonistas

The last workshop in French for the winter 2012 season is this Wednesday. We wish to remind you that there are still plenty of places left and, especially, that the room we have for the workshop is not SH-R810 as announced, but SH-2420, also in the Sherbrooke Pavilion of UQÀM.

Here are all the details:

Django : monter une application web en Python

May 15, 2012 12:48 AM

May 14, 2012


Daniel Greenfeld

10 reasons to go to DjangoCon Europe

You should go to DjangoCon Europe in lovely Zurich, Switzerland. Here are 10 reasons why:

1. Chocolate

So much of what we like about chocolate comes from Switzerland. For example, Milk Chocolate was invented in Switzerland.

2. Keynote speaker: Jacob Kaplan-Moss

Always a great speaker and fun to be around, he's one of the BDFLs of Django.

3. Cheese

I grew up thinking that Swiss Cheese was just about holes. It's so much more. I can't wait to try fresh European cheese made by master craftsmen from the freshest ingredients.

4. Keynote speaker: Jessica McKellar

In a word, Jessica is incredible. She's a Twisted core developer, PSF board member, part of the trio responsible for the gigantic Boston Python User Group's massive size explosion, and a talented speaker. She's used her incredible talents and skills to increase diversity in the community and generally help other people.

5. Breakfast

Muesli was invented in Switzerland. I love Muesli. I was floored by how much better it was in New Zealand. I can't wait to try it in its homeland.

6. Web Site

The DjangoCon Europe site is crazy. I mean, look at all those animations!

7. Talks

This is a single track event with proven speakers like Zachary Voase and Andrew Godwin, yet balances that with bringing in new blood to spice things up. And dare I say I'm giving a technical talk with Audrey Roy? ;-)

8. Mountains

With all the incredible food, you would think you would gain umpteen kilograms. Fortunately there are mountains all around to climb and hike.

9. Sprints

Want to sprint on Django itself? Look no further because there will be Django core developers around! There will also be notable Python developers like Kenneth Reitz and others around working hard on a lot of different projects. It's going to be intense and fun!

10. Castles

Living in the USA, we just don't have anything like castles. DjangoCon Europe will be near a small horde of stone fortifications. Which means if the Zombie Apocalypse happens during the conference, we'll have many secure places to go. They also make lovely tourist destinations. :-)

What are you waiting for?

DjangoCon Europe has a cap on attendance. Tickets for Python events have been selling out, not just for PyCon US. Don't miss out!

It's all about me

Yup.

Call me selfish but I want you there because I haven't met all our European friends yet in person. Hope to see you next month in Zurich!

May 14, 2012 07:30 PM


Doug Hellmann

cliff -- Command Line Interface Formulation Framework -- version 0.7


cliff is a framework for building command line programs. It uses
setuptools entry points to provide subcommands, output formatters, and
other extensions.

What's New In This Release?

  • Clean up interactive mode flag setting.
  • Add support for Python 2.6, contributed by heavenshell.
  • Fix multi-word commands in interactive mode.

Documentation

Documentation for cliff is hosted on readthedocs.org.

Installation

Use pip:


$ pip install cliff

See the installation guide for more details.


May 14, 2012 06:50 PM