Planet Python
Last update: September 03, 2010 02:47 AM
September 03, 2010
John Cook
Bug in SciPy’s erf function
Last night I produced the plot below and was very surprised at the jagged spike. I knew the curve should be smooth and strictly increasing.

My first thought was that there must be a numerical accuracy problem in my code, but it turns out there’s a bug in SciPy version 0.8.0b1. I started to report it, but I saw there were similar bug reports and one such report was marked as closed, so presumably the fix will appear in the next release.
The problem is that SciPy’s erf function is inaccurate for arguments with imaginary part near 5.8. For example, Mathematica computes erf(1.0 + 5.7i) as -4.5717×10^12 + 1.04767×10^12 i. SciPy computes the same value as -4.4370×10^12 + 1.3652×10^12 i. The imaginary component is off by about 30%.
Here is the code that produced the plot.
from scipy.special import erf
from numpy import linspace, exp, sqrt   # sqrt is used in g() below
import matplotlib.pyplot as plt

def g(y):
    z = (1 + 1j*y) / sqrt(2)
    temp = exp(z*z)*(1 - erf(z))
    u, v = temp.real, temp.imag
    return -v / u

x = linspace(0, 10, 101)
plt.plot(x, g(x))
plt.show()   # display the plot when run as a script
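One way to check a value like erf(1.0 + 5.7i) without relying on SciPy is to sum the Maclaurin series for erf directly. Below is a minimal sketch. It assumes plain Python floats are accurate enough for |z| around 6: the alternating terms peak near 10^14 here, so a couple of digits are lost to cancellation. This is not how SciPy or Mathematica compute erf, and `erf_series` is a hypothetical helper name.

```python
import math


def erf_series(z, rel_tol=1e-17, max_terms=500):
    """Complex error function from its Maclaurin series.

    erf(z) = 2/sqrt(pi) * sum_{n>=0} (-1)^n z^(2n+1) / (n! (2n+1))

    Adequate for |z| up to roughly 6 in double precision; the alternating
    terms grow large before they shrink, so accuracy drops as |z| increases.
    """
    z = complex(z)
    minus_z2 = -z * z
    term = z        # (-1)^n z^(2n+1) / n!, starting at n = 0
    total = term    # running sum of term / (2n + 1)
    for n in range(1, max_terms):
        term *= minus_z2 / n
        contribution = term / (2 * n + 1)
        total += contribution
        if abs(contribution) <= rel_tol * abs(total):
            break
    return 2 / math.sqrt(math.pi) * total
```

erf_series(1.0 + 5.7j) lands close to the Mathematica value quoted above. For much larger |z|, an asymptotic expansion or an arbitrary-precision library is the better tool.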
September 02, 2010
Menno's Musings
IMAPClient 0.6.1 released
I've just released IMAPClient 0.6.1.
The only functional change in this release is that it now automatically patches imaplib's IMAP4_SSL class to fix Python Issue 5949. This bug has been fixed in later Python 2.6 versions and in 2.7, but still exists in Python versions that are in common use. Without this fix you may experience hangs when using SSL.
The patch is only applied if the running Python version is known to be one of the affected versions. It is applied when IMAPClient is imported.
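A version-gated patch like this can be sketched roughly as follows. This is not IMAPClient's code. The affected range, `is_affected` and `patch_if_affected` are hypothetical, and the actual Issue 5949 fix would live in the replacement method, which is not shown.

```python
import sys

# Hypothetical inclusive range of affected versions, for illustration only;
# the post does not say which versions IMAPClient treats as affected.
AFFECTED_RANGE = ((2, 6, 0), (2, 6, 2))


def is_affected(version_info=None):
    """Return True if the given (or running) interpreter version is in the affected range."""
    if version_info is None:
        version_info = sys.version_info
    version = tuple(version_info[:3])
    low, high = AFFECTED_RANGE
    return low <= version <= high


def patch_if_affected(cls, name, replacement, version_info=None):
    """Monkeypatch cls.<name> with replacement, but only on affected versions.

    Calling this at module level means the patch is applied on import.
    Returns True if the patch was applied.
    """
    if not is_affected(version_info):
        return False
    setattr(cls, name, replacement)
    return True
```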
The only other change in this release is that I've now marked IMAPClient as "production ready" on PyPI and have updated the README to match. This was prompted by a request to clarify the current status of the project. Since all current functionality is solid and I don't plan to change the existing APIs in backwards-incompatible ways, I've decided to mark the project as suitable for production use.
As always, IMAPClient can be installed from PyPI (pip install imapclient) or downloaded from the IMAPClient site. Feedback, bug reports and patches are most welcome.
Matthew Rollings
Find words with the most anagrams efficiently using python
Following my previous post about 9 letter anagrams, I am posting the final code I have created, taking into account suggestions and snippets from Michael, Toby and Martin. I added two variables (ag_len and ag_min) to make it nice and easy to change what to look for.
Code
# -*- coding: utf-8 -*-
from time import time
from collections import defaultdict

ag_len = 10  # Anagram word length
ag_min = 2   # Min # of anagrams
dictionary_path = '/usr/share/dict/british-english'

tic = time()
wd = defaultdict(set)
for l in open(dictionary_path, 'r'):
    l = l.strip()
    if ag_len == len(l):
        wd["".join(sorted(l))].add(l)

for ws, wl in wd.iteritems():
    if len(wl) >= ag_min:
        print " ".join(wl)

toc = time()
print toc - tic, 's'
Explanation
The dictionary file is filtered by length into a dictionary. The key for the dictionary is the letters of the word sorted in order, i.e.:
"".join(sorted('arranging')) = 'aagginnrr'
The value is a set of the unsorted words. Because words that are anagrams of each other are identical when sorted, adding each word to the set stored under its sorted key makes all anagrams share the same key. E.g.:
When the loop gets to megatons, it creates a new key in the dictionary like so:
{'aegmnost': set(['megatons'])}
Then magnetos:
{'aegmnost': set(['magnetos', 'megatons'])}
Then montages:
{'aegmnost': set(['magnetos', 'megatons', 'montages'])}
Then we loop over all the items in the dictionary we created and check whether each set contains at least the minimum number of words we are looking for (ag_min).
All done, a very elegant and simple method to find words with several anagrams for a given word length.
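For anyone on Python 3, here is a sketch of the same sorted-letters grouping, wrapped in a function so it can be tried on a small list. `anagram_sets` is a hypothetical name, and the sketch assumes lower-case words, one per line, as in /usr/share/dict files.

```python
from collections import defaultdict


def anagram_sets(words, length, min_size=2):
    """Group words of the given length by their sorted letters.

    Returns each group with at least min_size members as a sorted list.
    """
    groups = defaultdict(set)
    for word in words:
        word = word.strip()
        if len(word) == length:
            # Anagrams share the same sorted-letter key, e.g. 'aegmnost'
            groups["".join(sorted(word))].add(word)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]
```

Passing an open dictionary file works too, since iterating a file yields its lines, e.g. anagram_sets(open(dictionary_path), 8, min_size=4).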
Results
I was going to post the interesting 10 letter anagrams I found; however, I couldn’t find any with more than 2 anagrams in the dictionary I was using.
There is an 11 letter triple anagram:
anthologies anthologise theologians
and some 8 letter sets of 4 or more anagrams:
painters pertains pantries repaints
resident nerdiest inserted trendies
salesmen lameness nameless maleness
strainer restrain terrains retrains trainers
altering triangle relating integral alerting
rangiest ingrates angriest gantries
parroted predator teardrop prorated
iterates teariest treatise treaties
trounces counters recounts construe
Yaniv Aknin
Python’s Innards: Hello, ceval.c!
The “Python’s Innards” series owes its existence, at least in part, to hearing one of the Python-Fu masters in my previous workplace say something about a switch statement so large that it had to be broken up just so some compilers wouldn’t choke on it. I remember thinking then: “Choke the compiler with a switch? Hrmf, let me see that code.” It turns out that this switch can be found in ./Python/ceval.c: PyEval_EvalFrameEx and it switches over the current opcode, invoking its implementation. If I had to summarize all of CPython into one line, I’d probably choose that switch (actually I’d refuse, but humour me by assuming I was at gunpoint or something). This choice is rather subjective, as arguably there are more complex/interesting bits in Python’s object system (explored here and there) or parser/compiler related code. But I can’t help seeing that line, and its surrounding function and file, as the ‘do-work’ heart of CPython.
The reason I didn’t start the series from this heart is that I thought it would be too hard (mostly for the author…). Thanks to what we (well, at least I) learned in the previous posts, I think we can now understand it quite well. I’ll try to link backwards as necessary throughout the article, but if you haven’t followed the series so far, you’d probably do much better if you went back and read some of the previous articles before tackling this one. Also, for brevity’s sake in this post, I won’t qualify the file ./Python/ceval.c and the function PyEval_EvalFrameEx in it. Finally, remember that when I quote code in this series and note that I edited it, I often prefer clarity and brevity over accuracy; this is true for this post as well, only much more so: excerpts here might bear only slight resemblance to the real code.
So, where were we… Ah, yes, monstrous switch statement. Well, as I said, this switch can be found in the rather lengthy file ceval.c, in the rather lengthy function PyEval_EvalFrameEx, which takes more than half the file’s lines (it’s roughly 2,250 lines, the file is about 4,400). PyEval_EvalFrameEx implements CPython’s evaluation loop, which is to say that it’s a function that takes a frame object and iterates over each of the opcodes in its associated code object, evaluating (interpreting, executing) each opcode within the context of the given frame (this context is chiefly the associated namespaces and interpreter/thread states). There’s more to ceval.c than PyEval_EvalFrameEx, and we may discuss some of the other bits later in this post (or perhaps a follow-up post), but PyEval_EvalFrameEx is obviously the most important part of it.
Having described the evaluation loop in the previous paragraph, let’s see what it looks like in C (edited):
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    /* variable declaration and initialization stuff */
    for (;;) {
        /* do periodic housekeeping once in a few opcodes */
        opcode = NEXTOP();
        if (HAS_ARG(opcode)) oparg = NEXTARG();
        switch (opcode) {
        case NOP:
            goto fast_next_opcode;
        /* lots of more complex opcode implementations */
        default:
            /* become rather unhappy */
        }
        /* handle exceptions or runtime errors, if any */
    }
    /* we are finished, pop the frame stack */
    tstate->frame = f->f_back;
    return retval;
}
As you can see, iteration over opcodes is infinite (forever: fetch the next opcode, do stuff), so breaking out of the loop must be done explicitly. CPython (reasonably) assumes that evaluated bytecode is correct in the sense that it terminates itself by raising an exception, returning a value, etc. Indeed, if you were to synthesize a code object without a RETURN_VALUE at its end and execute it (exercise for the reader: how? [1]), you’re likely to execute rubbish, reach the default handler (which raises a SystemError) or maybe even segfault the interpreter (I didn’t check this thoroughly, but it looks plausible).
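To make the shape of that loop concrete, here is a toy Python model of it. Everything in it is hypothetical and heavily simplified: opcodes are plain strings, instructions are (opcode, oparg) pairs, and the value stack is a list. Unlike CPython, the toy checks for running off the end of the bytecode and raises SystemError instead of executing rubbish.

```python
def eval_frame(code, consts):
    """Evaluate a list of (opcode, oparg) pairs in a toy stack machine."""
    stack = []        # the value stack
    next_instr = 0    # index of the next instruction to fetch
    while True:       # like for (;;): only an opcode can end the loop
        if next_instr >= len(code):
            raise SystemError("reached end of code without RETURN_VALUE")
        opcode, oparg = code[next_instr]   # NEXTOP() / NEXTARG()
        next_instr += 1
        if opcode == "NOP":
            continue                       # shortcut: nothing to check
        elif opcode == "LOAD_CONST":
            stack.append(consts[oparg])    # PUSH
            continue
        elif opcode == "BINARY_SUBTRACT":
            w = stack.pop()                # POP
            v = stack[-1]                  # TOP
            stack[-1] = v - w              # SET_TOP
        elif opcode == "RETURN_VALUE":
            return stack.pop()             # leaving the loop explicitly
        else:
            raise SystemError("unknown opcode %r" % (opcode,))  # the 'default' case
        # In CPython, opcodes that break out of the switch reach the
        # exception/error handling at this point; the toy just lets
        # Python exceptions propagate.
```

For example, eval_frame([("LOAD_CONST", 0), ("LOAD_CONST", 1), ("BINARY_SUBTRACT", None), ("RETURN_VALUE", None)], (10, 5)) returns 5.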
The evaluation loop may look fairly simple so far, but I kept back an important piece: I snipped about 1,450 lines of opcode implementations from within that big switch, all of them presumably more complex than a NOP. In order for you to be able to get a feel for what more serious opcode implementations look like, here’s the (edited) implementation of three more opcodes, illustrating a few more principles:
case BINARY_SUBTRACT:
    w = *--stack_pointer; /* value stack POP */
    v = stack_pointer[-1];
    x = PyNumber_Subtract(v, w);
    stack_pointer[-1] = x; /* value stack SET_TOP */
    if (x != NULL) continue;
    break;
case LOAD_CONST:
    x = PyTuple_GetItem(f->f_code->co_consts, oparg);
    *stack_pointer++ = x; /* value stack PUSH */
    goto fast_next_opcode;
case SETUP_LOOP:
case SETUP_EXCEPT:
case SETUP_FINALLY:
    PyFrame_BlockSetup(f, opcode, INSTR_OFFSET() + oparg,
                       STACK_LEVEL());
    continue;
We see several things. First, we see a typical value manipulation opcode, BINARY_SUBTRACT. This opcode (and many others) works with values on the value stack as well as with a few temporary variables, using CPython’s C-API abstract object layer (in our case, a function from the number-like object abstraction) to replace the two top values on the value stack with the single value resulting from subtraction. As you can see, a small set of temporary variables, such as v, w and x, are used (and reused, and reused…) as the registers of the CPython VM. The variable stack_pointer represents the current top of the stack (it points at the next free slot in the stack). This variable is initialized at the beginning of the function like so: stack_pointer = f->f_stacktop;. In essence, together with the room reserved in the frame object for that purpose, the value stack is this pointer. To make things simpler and more readable, the real (unedited by me) code of ceval.c defines several value stack manipulation/observation macros, like PUSH, TOP or EMPTY. They do what you imagine from their names.
Next, we see a very simple opcode that loads values from somewhere into the value stack. I chose to quote LOAD_CONST because it’s very brief and simple, although it’s not really a namespace related opcode. “Real” namespace opcodes load values into the value stack from a namespace and store values from the value stack into a namespace; LOAD_CONST loads constants, but doesn’t fetch them from a namespace and has no STORE_CONST counterpart (we explored all this at length in the article about namespaces). The final opcode I chose to show is actually the single implementation of several different control-flow related opcodes (SETUP_LOOP, SETUP_EXCEPT and SETUP_FINALLY), which offload all details of their implementation to the block stack manipulation function PyFrame_BlockSetup; we discussed the block stack in our discussion of interpreter stacks.
Something we can observe looking at these implementations is that different opcodes exit the switch statement differently. Some simply break, and let the code after the switch resume. Some use continue to start the for loop from the beginning. Some goto various labels in the function. Each exit has different semantic meaning. If you break out of the switch (the ‘normal’ route), various checks will be made to see if some special behaviour should be performed – maybe a code block has ended, maybe an exception was raised, maybe we’re ready to return a value. Continuing the loop or going to a label lets certain opcodes take various shortcuts; no use checking for an exception after a NOP or a LOAD_CONST, for instance.
That’s pretty much it. I can’t really say we’re done (not at all), but this is pretty much the gist of PyEval_EvalFrameEx. Simple, eh? Well, yeah, simple, but I lied a bit with the editing to make it simpler. For example, if you look at the code itself, you will see that none of the case expressions for the big switch are really there. The code for the NOP opcode is actually (remember this series is about Python 3.x unless noted otherwise, so this snippet is from Python 3.1.2):
TARGET(NOP)
    FAST_DISPATCH();
TARGET? FAST_DISPATCH? What are these? Let me explain. Things may become clearer if we look for a moment at the implementation of the NOP opcode in ceval.c of Python 2.x. Over there the code for NOP looks more like the samples I’ve shown you so far, and it actually seems to me that the code of ceval.c gets simpler and simpler as we look backwards at older revisions of it. The reason is that although I think PyEval_EvalFrameEx was originally written as a really exceptionally straightforward piece of code, over the years some necessary complexity crept into it as various optimizations and improvements were implemented (I’ll collectively call them ‘additions’ from now on, for lack of a better term).
To further complicate matters, many of these additions are compiled conditionally with preprocessor directives, so several things are implemented in more than one way in the same source file. In the larger code samples I quoted above, I liberally expanded some preprocessor directives using their least complex expansion. However, depending on compilation flags, these and other preprocessor directives might expand to something else, possibly a more complicated something. I can understand trading simplicity to optimize a tight loop which is used very often, and the evaluation loop is probably one of the more used loops in CPython (and probably as tight as its contributors could make it). So while this is all very warranted, it doesn’t help the readability of the code.
Anyway, I’d like to enumerate these additions here explicitly (some in more depth than others); this should aid future discussion of ceval.c, as well as prevent me from feeling like I’m hiding too many important things with my free-spirited editing of quoted code. Fortunately, most if not all of these additions are very well commented; in fact, some of the explanations below will be just summaries of these comments or even taken verbatim from them, as I believe that they’re accurate (eek!). So, as you read PyEval_EvalFrameEx (and indeed ceval.c in general), you’re likely to run into any of these:
“Threaded Code” (Computed-GOTOs)
Let’s start with the addition that gave us TARGET, FAST_DISPATCH and a few other macros. The evaluation loop uses a “switch” statement, which decent compilers optimize as a single indirect branch instruction with a lookup table of addresses. Alas, since we’re switching over rapidly changing opcodes (it’s uncommon to have the same opcode repeat), this would have an adverse effect on the success rate of CPU branch prediction. Fortunately, gcc supports the use of C goto labels as values, which you can generally pass around and place in an array (restrictions apply!). Using an array of addresses in memory obtained from labels, as you can see in ./Python/opcode_targets.h, we create an explicit jump table and place an explicit indirect jump instruction at the end of each opcode. This improves the success rate of CPU prediction and can yield a performance boost of as much as 20%.
Thus, for example, the NOP opcode is implemented in the code like so:
TARGET(NOP)
    FAST_DISPATCH();
In the simpler scenario, this would expand to a plain case statement and a goto, like so:
case NOP:
    goto fast_next_opcode;
But when threaded code is in use, that snippet would expand to the following (the line where we actually move on to the next opcode, using the dispatch table of label-values, is goto *opcode_targets[*next_instr++];):
TARGET_NOP:
    opcode = NOP;
    if (HAS_ARG(NOP))
        oparg = NEXTARG();
case NOP:
    {
        if (!_Py_TracingPossible) {
            f->f_lasti = INSTR_OFFSET();
            goto *opcode_targets[*next_instr++];
        }
        goto fast_next_opcode;
    }
Same behaviour, somewhat more complicated implementation, up to 20% faster Python. Nifty.
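Python has no computed goto, so the branch-prediction benefit can't be reproduced in Python code. What does carry over is the shape of the idea: a table indexed by opcode, much like opcode_targets, instead of one big conditional. Here is a hypothetical toy sketch; the opcode numbers and handler names are made up for illustration.

```python
# Toy opcode numbers (hypothetical, not CPython's).
LOAD_CONST, BINARY_SUBTRACT, RETURN_VALUE = 0, 1, 2


def op_load_const(stack, consts, oparg):
    stack.append(consts[oparg])


def op_binary_subtract(stack, consts, oparg):
    w = stack.pop()
    stack[-1] = stack[-1] - w


def op_return_value(stack, consts, oparg):
    return True  # tell the loop to finish; the result is on top of the stack


# The jump table: position i holds the handler for opcode i,
# much as opcode_targets holds one label address per opcode.
DISPATCH = [op_load_const, op_binary_subtract, op_return_value]


def run(code, consts):
    """Run (opcode, oparg) pairs by indexing straight into DISPATCH."""
    stack = []
    for opcode, oparg in code:
        if DISPATCH[opcode](stack, consts, oparg):
            return stack.pop()
    raise SystemError("reached end of code without RETURN_VALUE")
```

Each dispatch here is a single list index rather than a chain of comparisons.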
Opcode Prediction
Some opcodes tend to come in pairs. For example, COMPARE_OP is often followed by JUMP_IF_FALSE or JUMP_IF_TRUE, themselves often followed by a POP_TOP. What’s more, there are situations where you can determine that a particular next-opcode can be run immediately after the execution of the current opcode, without going through the ‘outer’ (and expensive) parts of the evaluation loop. PREDICT (and a few others) are a set of macros that explicitly peek at the next opcode and jump to it if possible, shortcutting most of the loop in this fashion (i.e., if (*next_instr == op) goto PRED_##op). Note that there is no relation to real hardware here, these are simply hardcoded conditional jumps, not an exploitation of some mechanism in the underlying CPU (in particular, it has nothing to do with “Threaded Code” described above).
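The idea behind PREDICT can be modelled in a Python toy. The opcodes and names are hypothetical and the control flow is much simplified. After a comparison, the loop peeks at the next instruction; if it is the expected conditional jump, it runs it immediately instead of going back to the top of the loop. The `loop_tops` counter stands in for the per-iteration housekeeping that the shortcut skips.

```python
def pop_jump_if_false(stack, target, next_pc):
    """Shared body of POP_JUMP_IF_FALSE: pop the condition, return the next pc."""
    return next_pc if stack.pop() else target


def run(code, consts):
    """Toy interpreter for (opcode, oparg) pairs with one hardcoded prediction."""
    stats = {"loop_tops": 0, "predicted": 0}
    stack = []
    pc = 0
    while True:
        stats["loop_tops"] += 1          # stands in for the loop's housekeeping
        opcode, oparg = code[pc]
        pc += 1
        if opcode == "LOAD_CONST":
            stack.append(consts[oparg])
        elif opcode == "COMPARE_LT":
            w = stack.pop()
            v = stack.pop()
            stack.append(v < w)
            # PREDICT(POP_JUMP_IF_FALSE): peek at the next instruction and,
            # if it matches, execute it now without returning to the loop top.
            if pc < len(code) and code[pc][0] == "POP_JUMP_IF_FALSE":
                stats["predicted"] += 1
                pc = pop_jump_if_false(stack, code[pc][1], pc + 1)
        elif opcode == "POP_JUMP_IF_FALSE":
            pc = pop_jump_if_false(stack, oparg, pc)
        elif opcode == "RETURN_VALUE":
            return stack.pop(), stats
        else:
            raise SystemError("unknown opcode %r" % (opcode,))
```

With a comparison followed by its jump, six instructions execute but the loop top is reached only five times.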
Low Level Tracing
An addition primarily geared towards those developing CPython (or suffering from a horrible, horrible bug). Low Level Tracing is controlled by the LLTRACE preprocessor name, which is enabled by default on debug builds of CPython (see --with-pydebug). As explained in ./Misc/SpecialBuilds.txt: when this feature is compiled-in, PyEval_EvalFrameEx checks the frame’s global namespace for the variable __lltrace__. If such a variable is found, mounds of information about what the interpreter is doing are sprayed to stdout, such as every opcode and opcode argument and values pushed onto and popped off the value stack. Not useful very often, but very useful when needed.
This is what the low level trace output looks like (slightly edited):
>>> def f():
...     global a
...     return a - 5
...
>>> dis(f)
  3           0 LOAD_GLOBAL              0 (a)
              3 LOAD_CONST               1 (5)
              6 BINARY_SUBTRACT
              7 RETURN_VALUE
>>> exec(f.__code__, {'__lltrace__': 'foo', 'a': 10})
0: 116, 0
push 10
3: 100, 1
push 5
6: 24
pop 5
7: 83
pop 5
# trace of the end of exec() removed
>>>
As you can guess, you’re seeing a real-time disassembly of what’s going through the VM as well as stack operations. For example, the first line says: at offset 0, do opcode 116 (LOAD_GLOBAL) with the operand 0 (expands to the global variable a), and so on, and so forth. This is a bit like (well, little more than) adding a bunch of printf calls to the heart of the VM.
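The trace format above can be mimicked with a toy interpreter that reports pushes and pops when the globals contain `__lltrace__`. This is a hypothetical model, not CPython's code. It hardcodes the Python 3.1 opcode numbers shown in the trace (116, 100, 24, 83), which differ in other versions, and represents instructions as (offset, opcode, oparg) tuples.

```python
# Opcode numbers as they appear in the Python 3.1 trace above.
LOAD_GLOBAL, LOAD_CONST, BINARY_SUBTRACT, RETURN_VALUE = 116, 100, 24, 83


def eval_frame(code, names, consts, globals_, out=print):
    """Run (offset, opcode, oparg) tuples; trace like LLTRACE if __lltrace__ is set."""
    lltrace = "__lltrace__" in globals_
    stack = []

    def push(value):
        if lltrace:
            out("push %r" % (value,))
        stack.append(value)

    def pop():
        value = stack.pop()
        if lltrace:
            out("pop %r" % (value,))
        return value

    for offset, opcode, oparg in code:
        if lltrace:
            # 'offset: opcode, oparg' for opcodes with an argument, else 'offset: opcode'
            out("%d: %d, %d" % (offset, opcode, oparg) if oparg is not None
                else "%d: %d" % (offset, opcode))
        if opcode == LOAD_GLOBAL:
            push(globals_[names[oparg]])
        elif opcode == LOAD_CONST:
            push(consts[oparg])
        elif opcode == BINARY_SUBTRACT:
            w = pop()
            stack[-1] = stack[-1] - w   # SET_TOP is not traced
        elif opcode == RETURN_VALUE:
            return pop()
        else:
            raise SystemError("unknown opcode %d" % opcode)
    raise SystemError("reached end of code without RETURN_VALUE")
```

Fed the four instructions of f with {'__lltrace__': 'foo', 'a': 10}, it prints the same eight trace lines as the session above.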
Advanced Profiling
Under this heading I’d like to briefly discuss several profiling related additions. The first relies on the fact that some processors (notably Pentium descendants and at least some PowerPCs) have built-in wall time measurement capabilities which are cheap and precise (correct me if I’m wrong). As an aid in the development of a high-performance CPython implementation, Python 2.4’s ceval.c was instrumented with the ability to collect per-opcode profiling statistics using these counters. This instrumentation is controlled by the somewhat misnamed --with-tsc configuration flag (TSC is an Intel Pentium specific name, and this feature is more general than that). Calling sys.settscdump(True) on an instrumented interpreter will cause the function ./Python/ceval.c: dump_tsc to print these statistics every time the evaluation loop loops.
The second advanced profiling feature is Dynamic Execution Profiling. This is only available if Python was built with the DYNAMIC_EXECUTION_PROFILE preprocessor name. As ./Tools/scripts/analyze_dxp.py says, [this] will tell you which opcodes have been executed most frequently in the current process, and, if Python was also built with -DDXPAIRS, will tell you which instruction _pairs_ were executed most frequently, which may help in choosing new instructions.
One last thing to add here is that enabling Dynamic Execution Profiling implicitly disables the “Threaded Code” addition.
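DXPAIRS counts instruction pairs as they execute. A rough, static analogue is available from pure Python with the dis module: count adjacent opcode pairs in a function's bytecode. This is a hypothetical sketch, not analyze_dxp.py. It counts pairs as they appear in the code object rather than as they run (so a loop body is counted once), and opcode names vary between Python versions.

```python
import dis
from collections import Counter


def opcode_pair_counts(func):
    """Count adjacent (opcode, next opcode) name pairs in func's bytecode."""
    names = [instr.opname for instr in dis.get_instructions(func)]
    return Counter(zip(names, names[1:]))


def sample(a, b):
    """Example target with a branch and two subtractions."""
    if a < b:
        return b - a
    return a - b
```

opcode_pair_counts(sample).most_common(3) then lists the most frequent pairs for whichever interpreter version you run it on.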
The third and last addition in this category is function call profiling, controlled by the preprocessor name CALL_PROFILE. Quoting ./Misc/SpecialBuilds.txt again: When this name is defined, the ceval mainloop and helper functions count the number of function calls made. It keeps detailed statistics about what kind of object was called and whether the call hit any of the special fast paths in the code.
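CALL_PROFILE is a compile-time C feature. From Python, a rough analogue is a profile hook installed with sys.setprofile, which receives a 'call' event for Python functions and a 'c_call' event for C functions. The sketch below tallies calls by kind and name; count_calls and fib are hypothetical names used for illustration.

```python
import sys
from collections import Counter


def count_calls(func, *args, **kwargs):
    """Call func(*args, **kwargs) with a profile hook installed.

    Returns (result, counts), where counts maps (event, name) to a number of
    calls: 'call' events are Python functions, 'c_call' events are C functions.
    """
    counts = Counter()

    def hook(frame, event, arg):
        if event == "call":
            counts[("call", frame.f_code.co_name)] += 1
        elif event == "c_call":
            counts[("c_call", getattr(arg, "__name__", repr(arg)))] += 1

    sys.setprofile(hook)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return result, counts


def fib(n):
    """Example target: naive recursive Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

For instance, count_calls(fib, 5) returns 5 as the result, with counts[("call", "fib")] equal to 15, the number of calls the naive recursion makes.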
Extra Safety Valves
Two preprocessor names, USE_STACKCHECK and CHECKEXC, enable extra assertions. Testing an interpreter with these enabled may catch a subtle bug or regression, but they are usually disabled as they’re too expensive.
These are the additions I found, grepping ceval.c for #ifdef. I think we’ll call it a day here, although we’re by no means finished. For example, I’d like to devote a separate post to exceptions, which is where we can discuss the tail of the evaluation loop (everything after the big switch and before the end of the big for), which we merely skimmed today. I’d also like to devote a whole post to locking and synchronization (including the GIL), which we touched upon before but never covered properly. Last but really not least, there are about 2,000 other lines in ceval.c which we didn’t cover today; none of them are as important as PyEval_EvalFrameEx, but we need to talk at least about some of them.
All these things taken into account, I think we can say that today we finally conquered the evaluation loop. This isn’t the end of the series, far from it, but I do see it as a milestone. “Hooray”, I believe the saying goes. I hope you’re enjoying the show, thanks for the supportive comments (they keep me going), and I’ll see you in the next post.
I would like to thank Nick Coghlan for reviewing this article; any mistakes that slipped through are my own.
Tagged: bytecode, code objects, evaluation, evaluation loop, frame object, internals, python
Duncan McGreggor
HCI at Canonical
uTouch
Back in March, I blogged about future possibilities (in a blue-sky sense) of multi-touch, mentioning the project management I was doing for MT hardware kernel driver support in Lucid (and then proceeding to dive into the deep end of speculation). It's now an Ubuntu cycle later, and holy crap... I'm having a hard time finding the words. I think the blog title says it all. But I'll try to elaborate :-)
Unless you've been living under a rock, you've probably noticed the big announcements we made a few weeks ago:
- uTouch mail list announcement
- Mark Shuttleworth's blog post
- The Canonical uTouch announcement
- Chase's blogging extravaganza (uTouch, MT architecture, Magic Trackpad, Trackpad update)
There has been a lot of discussion in blog posts, mail lists, IRC (#ubuntu-touch on freenode.net), Launchpad bugs and merge proposals, etc., so much so that touchscreens now pursue me feverishly when I sleep at night. I'm really not interested in writing more of the same :-)
As such, I want to mix things up a bit...
HCI Remixed
I've been reading an amazing anthology of essays on human-computer interaction. I still haven't finished the book (yeah, I've got about 10 in-progress titles on my nightstand), but am relishing every word in this particular collection. The book is HCI Remixed: Reflections on Works That Have Influenced the HCI Community.
Due to the unusual nature of the book, describing it is surprisingly difficult. That being said, the MIT Press page gives you a great taste:
Over almost three decades, the field of human-computer interaction (HCI) has produced a rich and varied literature. Although the focus of attention today is naturally on new work, older contributions that played a role in shaping the trajectory and character of the field have much to tell us. The contributors to HCI Remixed were asked to reflect on a single work at least ten years old that influenced their approach to HCI. The result is this collection of fifty-one short, engaging, and idiosyncratic essays, reflections on a range of works in a variety of forms that chart the emergence of a new field.
If you're into HCI, learning from others, and discovering new sources of inspiration for your own work, this is simply a must-have book :-)
A Small Piece of History
By the time I checked the book out of the Golden public library, it was May and we had begun building the MT team. By July -- once it became clear how astounding the team's work was -- I realized that in 10 or 20 years I could very well be writing an article about Henrik, Chase, Stephen, Ikbel, and Rafi. Much like those in the book, I could be sharing the conversations I'd had with Stéphane Chatty, Mark Shuttleworth, Neil Patel, David Siegel, and John Lea. And that's only the crew with which I was collaborating or discussing directly. There are a lot of folks who've been working very hard on multi-touch infrastructure solutions and exploring ways of integrating these for several years (e.g., Peter Hutterer and Carlos Garnacho).
Python User Groups
pyCologne Python User Group Cologne - Meeting, September 08, 2010, 6.30pm
Matthew Rollings
9 letter words with several anagrams
While perusing the statistics of wordcube, I was wondering how many 9 letter words have multiple anagrams (using all the letters in a single word) and what the maximum number of anagrams was. So I wrote a quick and dirty Python program to find out. I will first show the results, as they are interesting, followed by my code and methods to improve its efficiency.
Results
Here are all the nine letter words with more than 2 anagrams:
- 1. auctioned cautioned education
- 2. beastlier bleariest liberates
- 3. cattiness scantiest tacitness
- 4. countries cretinous neurotics
- 5. cratering retracing terracing
- 6. dissenter residents tiredness
- 7. earthling haltering lathering
- 8. emigrants mastering streaming
- 9. estranges greatness sergeants
- 10. gnarliest integrals triangles
- 11. mutilates stimulate ultimates
- 12. reprising respiring springier
I only found 12 sets of 3; there may be more with a larger dictionary. I was also disappointed that there were no words with 4 anagrams, though not entirely surprised. My personal favourite is number 10.
Python
I recycled an anagram checking function that I have used before:
# -*- coding: utf-8 -*-
# Anagram checking function
def anagramchk(word, chkword):
    for letter in word:
        if letter in chkword:
            chkword = chkword.replace(letter, '', 1)
        else:
            return 0
    return 1
First program
Firstly I created a dirty program with a loop to cycle through the 9 letter word dictionary and another loop nested inside it to check against every word in the dictionary again. This is a terrible and inefficient method and will create duplicates; I will follow with a more efficient method.
g=open('eng-9-letter', 'r')
for l in g:
    wordin=l.strip()
    f=open('eng-9-letter', 'r')
    count=0
    w=""
    for line in f:
        line=line.strip()
        if anagramchk(line,wordin):
            count+=1
            w+=" "+line
    f.close()
    if count>2:
        print wordin, count, "(",w,")"
g.close()
This program took 80.42s to find the 12 solutions. On the path to better coding, I decided to load the dictionary into memory; this sped the code up by about 16s, to 63.88s.
# Load dictionary into memory
dic=[]
f=open('eng-9-letter', 'r')
for line in f:
    dic.append(line.strip())
f.close()
I then attempted to create a method that loops over the dictionary and removes words from it as it goes; however, I don’t know the correct way (if there is one?) of modifying the list I'm looping over from inside the loop without causing problems.
for word in dic:
    if ....:
        dic.remove(word)
If anyone knows a good method of doing this, please let me know! I did manage to hack together something by looping over slice copies ([:]) so that I could modify the dictionary each time; however, I imagine this is still quite inefficient.
for word in dic[:]:
    w=""
    count=0
    for word2 in dic[:]:
        if anagramchk(word,word2):
            count+=1
            dic.remove(word2)
            w+=word2+" "
    if count>2:
        print w
Even so, this method now avoids duplicate results and completes in 31.87s (machine running at 3.15GHz). Please let me know of any improvements you think can be made and I’ll happily benchmark them to see how much better they are.
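One common answer to the question about removing items while looping is not to modify the list being iterated at all. Take the first remaining word, split the rest into anagrams and non-anagrams with list comprehensions, and carry on with the non-anagrams. Below is a minimal Python 3 sketch of that idea. It uses collections.Counter for the anagram test in place of anagramchk, and `is_anagram` and `anagram_groups` are hypothetical names.

```python
from collections import Counter


def is_anagram(a, b):
    """True if a and b use exactly the same letters the same number of times."""
    return len(a) == len(b) and Counter(a) == Counter(b)


def anagram_groups(words, min_size=3):
    """Return groups of mutual anagrams with at least min_size words.

    The list being looped over is never modified; each pass builds new lists.
    min_size=3 matches the original count>2 test, which counts the word itself.
    """
    remaining = [w.strip() for w in words]
    groups = []
    while remaining:
        word, rest = remaining[0], remaining[1:]
        matches = [w for w in rest if is_anagram(word, w)]
        remaining = [w for w in rest if not is_anagram(word, w)]
        if len(matches) + 1 >= min_size:
            groups.append([word] + matches)
    return groups
```

This still compares words pairwise, so it grows quadratically with the dictionary size. Grouping words under their sorted letters avoids the pairwise comparisons entirely.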
September 01, 2010
Evan Fosmark
SSL support in asynchat.async_chat
A while back I needed to be able to use SSL connections in async_chat, but I found it to be horribly incompatible. After quite a bit of investigation I found a suitable solution.
import asynchat
import asyncore  # needed for the EWOULDBLOCK, ECONNRESET, ... constants used below
import socket
import ssl

class async_chat_ssl(asynchat.async_chat):
    """ Asynchronous connection with SSL support. """

    def connect(self, host, use_ssl=False):
        self.use_ssl = use_ssl
        if use_ssl:
            self.send = self._ssl_send
            self.recv = self._ssl_recv
        asynchat.async_chat.connect(self, host)

    def handle_connect(self):
        """ Initializes SSL support after the connection has been made. """
        if self.use_ssl:
            self.ssl = ssl.wrap_socket(self.socket)
            self.set_socket(self.ssl)

    def _ssl_send(self, data):
        """ Replacement for self.send() during SSL connections. """
        # self.write/self.read reach the SSL socket through asyncore.dispatcher's
        # forwarding of unknown attributes to the wrapped socket.
        try:
            result = self.write(data)
            return result
        except ssl.SSLError, why:
            if why[0] in (asyncore.EWOULDBLOCK, ssl.SSL_ERROR_WANT_WRITE):
                return 0
            else:
                raise  # re-raise, keeping the original traceback

    def _ssl_recv(self, buffer_size):
        """ Replacement for self.recv() during SSL connections. """
        try:
            data = self.read(buffer_size)
            if not data:
                self.handle_close()
                return ''
            return data
        except ssl.SSLError, why:
            if why[0] in (asyncore.ECONNRESET, asyncore.ENOTCONN,
                          asyncore.ESHUTDOWN):
                self.handle_close()
                return ''
            elif why[0] == ssl.SSL_ERROR_WANT_READ:
                # Required in order to keep it non-blocking
                return ''
            else:
                raise
It should fit in place of typical uses of asynchat.async_chat. To specify that you want to use SSL, just set the flag when connecting:
connect(host, use_ssl=True)
It would be nice if SSL support with asynchat.async_chat worked by default. Hopefully I'm not the only one who finds the above solution useful.
And as always, if you see any errors above, I encourage you to post a comment explaining them!
Mario Boikov
Python Koans - A Great Way to Learn Python!
I just found out about Python Koans by Greg Malcolm (thanks dude) after listening to the from python import podcast podcast (which I find amusing, thanks guys).
It's an awesome way to learn Python. Instead of just reading tutorials and/or books you learn Python by coding.
The interactive tutorial is built around unit tests: you advance and gain new skills by making failing tests pass, and it's really fun. You do learn a lot about the Python language when doing the Koans, so I recommend it even if you've been using Python for a while.
Another cool thing is that you learn how to do unit testing in Python, if you're not already familiar with it.
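For anyone who hasn't seen the format, a koan-style exercise is just a unit test whose expectation you have to work out. The class below is a hypothetical example written with the standard unittest module and is not taken from Python Koans. As I understand it, the project itself leaves a placeholder such as __ in each assertion for you to fill in.

```python
import unittest


class AboutLists(unittest.TestCase):
    """Koan-style tests: each one passes once its expectation is correct."""

    def test_append_adds_to_the_end(self):
        items = [1, 2]
        items.append(3)
        # In a koan, the expected value would start out as a blank to fill in.
        self.assertEqual([1, 2, 3], items)

    def test_slice_makes_a_shallow_copy(self):
        items = [1, 2, 3]
        copy = items[:]
        copy.append(4)
        self.assertEqual([1, 2, 3], items)
        self.assertEqual([1, 2, 3, 4], copy)
```

Saved as a module, it can be run with python -m unittest followed by the module name.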
Vern Ceder
Python for Linux at OLF
I’m in the final (but not as final as I would like) stages of preparing for my day-long tutorial at Ohio LinuxFest. OLF, as we call it, is a great event, with some good keynotes, interesting talks, and even maddog. Not to mention first rate tutorials, such as, oh… “Python for Linux System Administration”.
The morning session I’ll spend on basics – writing scripts that illustrate control flow, lists, dictionaries, strings, etc. from the point of view of some basic sysadmin scenarios. I’ll also introduce the basics of the subprocess module to call other Linux tools.
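As a taste of the subprocess topic, here is a minimal sketch of a helper that runs another tool and captures its output. It assumes Python 3.7+ (for capture_output). `run_tool` is a hypothetical name; on a Linux box you might call it as run_tool(["uname", "-r"]) or run_tool(["df", "-h"]).

```python
import subprocess
import sys


def run_tool(args):
    """Run a command (a list of arguments, no shell) and return its stdout as text.

    Returns None and reports on stderr if the command can't be started or
    exits with a non-zero status.
    """
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("command failed: %s" % exc, file=sys.stderr)
        return None
    return result.stdout
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.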
Then in the afternoon session, we’ll look at some more involved tasks, like traversing file systems, regular expressions, daemons, using the network, etc.
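Traversing a file system and regular expressions combine naturally. Here is a minimal sketch of something like a small find with a name pattern and an optional depth limit; `find_files` and its parameters are hypothetical.

```python
import os
import re


def find_files(top, pattern, max_depth=None):
    """Yield paths of files under top whose names match the regex pattern.

    max_depth=0 looks only at top itself; None means no limit.
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: os.walk won't descend any further
        for name in filenames:
            if regex.search(name):
                yield os.path.join(dirpath, name)
```

For example, list(find_files("/var/log", r"\.log$", max_depth=1)) would list log files in /var/log and its immediate subdirectories. Pruning dirnames in place works because os.walk is top-down by default.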
I’m looking forward to it – I think it will be a blast.
So if anyone has any cool intersections between Python and Linux sysadmin you wouldn’t mind me stealing, or any other suggestions or words of wisdom, by all means let me know.
Filed under: Python
Imaginary Landscape
Our Django Server Setup: How and Why
One of the most important decisions you make in the process of building a new Django application is what software stack you use to serve it to the world. You're not lacking for options: people run Django on Apache, lighty, nginx, and Cherokee. You also need to decide how ...
Roberto Alsina
Goodreads+webcam+python+zbar == hackfun!
I am a big fan of GoodReads, a social network for people who read books.
I read a lot, and I like that I can see what other people think before starting a book, and I can put my short reviews, and I can see what I have been reading, and lots more.
In fact, goodreads is going to be a big part of a project I am starting with some PyAr guys.
One thing I have been lazy about is adding my book list to goodreads, because it's a bit of a chore.
Well, chore no more!
Here's how to do it, the hacker way...
- Get zbar
- Get a cheap webcam
- Get a book
- Get a 7-line python program (included below)
Cute, isn't it?
Here's the code:
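import os

# zbarcam prints one "type:data" line per barcode it decodes
p = os.popen('/usr/bin/zbarcam', 'r')
for code in iter(p.readline, ''):  # stops cleanly when zbarcam exits
    code = code.strip()            # drop the trailing newline
    print('Got barcode: %s' % code)
    isbn = code.split(':', 1)[1]
    os.system('chromium http://www.goodreads.com/search/search?q=%s' % isbn)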
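One refinement worth considering: zbarcam prints each decoded symbol as type:data (for example EAN-13:978...), and not every barcode on a book is an ISBN. Here is a minimal Python 3 sketch, assuming that output format, that extracts the data part, verifies the ISBN-13 check digit, and only then builds the same Goodreads search URL. The helper names are hypothetical.

```python
from urllib.parse import quote

SEARCH_URL = "http://www.goodreads.com/search/search?q=%s"


def isbn13_is_valid(digits):
    """ISBN-13 check: weights alternate 1, 3, 1, 3, ...; the weighted
    sum of all 13 digits must be a multiple of 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def search_url_for(zbar_line):
    """Turn one zbarcam output line into a Goodreads search URL,
    or return None if the data part is not a valid ISBN-13."""
    _symbology, _, data = zbar_line.strip().partition(":")
    if not isbn13_is_valid(data):
        return None
    return SEARCH_URL % quote(data)
```

Skipping invalid codes means a misread or a price-tag barcode no longer opens a pointless browser tab.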
Brett Cannon
What will forever be exclusive to Python 3?
[2010-08-26: remove PEP 3109 and 3110 as they are both syntactically supported in Python 2.6
2010-09-01: remove mention of built-ins returning iterators]
A question on Stack Overflow about what is exclusive to Python 3 came up, and I realized that there is no clear list of big changes that you cannot access in Python 2.7 through a __future__ import. So I figured I would go through the What's New docs for Python 3.0, 3.1, and 3.2a1 (although the What's New doc has not been written yet) and see what of significance has (not) been backported.
If something is available in Python 2.6 without a __future__ import, I will not list it here (e.g., new octal literals, bytes literals, and str.format). I also don't touch the C API. Otherwise, items that carry a Python 2.6 or 2.7 note have been backported to Python 2.7 or are available in Python 2.6 with a __future__ import. For everything else, you have to make the switch to Python 3 to get the feature.
- Python 3.0
  - print as a function
    - from __future__ import print_function since Python 2.6
  - Views and iterators instead of lists
    - future_builtins since Python 2.6 contains versions of built-ins that match Python 3 semantics, including returning iterators instead of lists
    - Dicts gained equivalent view methods (e.g., dict.viewkeys()) in Python 2.7
  - Ordering comparisons
  - Int/long unification
  - Unicode/bytes clarification
    - from __future__ import unicode_literals helps since Python 2.6
    - ... but only if you bother to rid your code of all references to str and move to unicode and bytes
    - Standard library changes in regards to bytes/strings in Python 3 only
  - Dict comprehensions
    - Python 2.7
    - Link is to a withdrawn PEP, but the general idea holds
  - Set comprehensions
    - Python 2.7
  - Set literals
    - Python 2.7
  - Removal of old syntax
  - exec as a function
  - PEP 352: Required Superclass for Exceptions
  - PEP 3102: keyword-only arguments
  - PEP 3104: nonlocal
  - PEP 3107: function annotations
  - PEP 3108: Standard library reorganization
  - PEP 3115: Metaclasses
  - PEP 3132: extended iterable unpacking
  - PEP 3134: Exception Chaining and Embedded Tracebacks
  - PEP 3135: new super
  - PEP 3137: memoryview
    - Python 2.7
- Python 3.1
  - PEP 372: Adding an ordered dictionary to collections
    - Python 2.7
  - PEP 378: Format Specifier for Thousands Separator
    - Python 2.7
  - Fields in format() can be auto-numbered
    - E.g., "{} {}".format(2, 5) == "2 5"
    - Might seem minor, but damn does it make a difference!
    - Added in Python 2.7
  - contextlib.nested no longer needed
    - Another minor but handy feature
    - Python 2.7
  - Floats print their shortest representation
    - Python 2.7
  - tkinter.ttk
    - Added as ttk in Python 2.7
  - importlib
    - Single function, importlib.import_module, in Python 2.7
  - sysconfig
    - Python 2.7
  - io rewritten in C
    - Python 2.7
  - --with-computed-gotos
- Python 3.2a1
  - GIL heavily reworked to perform better
  - Bazillions of improvements to unittest
    - Available externally as unittest2
    - Updated in Python 2.7
  - argparse
    - Python 2.7
  - PEP 391: dictionary-based configuration for logging
    - Python 2.7
  - PEP 3147: PYC repository directories
  - ... bunch of stuff I probably overlooked from Misc/NEWS
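To make a few of the Python 3-only items concrete, here is a short sketch exercising keyword-only arguments (PEP 3102), nonlocal (PEP 3104), extended iterable unpacking (PEP 3132) and exception chaining (PEP 3134). Each of these is a syntax error under Python 2.7, so none can be reached through a __future__ import. The function and class names are made up for illustration.

```python
def scale(values, *, factor=2):
    # PEP 3102: factor can only be passed by keyword.
    return [v * factor for v in values]


def make_counter():
    count = 0

    def increment():
        # PEP 3104: rebind a variable in the enclosing function's scope.
        nonlocal count
        count += 1
        return count

    return increment


# PEP 3132: extended iterable unpacking.
first, *middle, last = [1, 2, 3, 4, 5]


class ConfigError(Exception):
    pass


def parse_port(text):
    try:
        return int(text)
    except ValueError as exc:
        # PEP 3134: the original ValueError is kept as __cause__.
        raise ConfigError("bad port: %r" % text) from exc
```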
Richard Tew
Roguelike MUD progress #2
Previous post: Roguelike MUD progress.
I wasn't feeling very motivated when the time I had set aside tonight to work on this came around, but once I got into it I made pretty good progress. The bugs I listed yesterday are now fixed, and additionally the field of view emphasis is working. It's annoying that the stupid mistakes are the ones that take the longest to track down. In this case, I had absently written "or" instead of "and".
if y in self.drawRangesNew:
    minX, maxX = self.drawRangesNew[y]
    if x >= minX or x <= maxX:  # the bug: "or" should be "and"
        return True
Maybe I should reconsider writing it as minX <= x <= maxX, although that never quite seems right.
The FoV changes are not as optimal as they could be. Every tile in the post-move FoV tile set is individually marked up with escape codes; I should instead simply mark each row of qualifying tiles in one go.
Current TODO list:
- Clean up the FoV emphasis to be row-based, rather than tile-based.
- Add some objects and entities to the world, that can be interacted with. Entities should move to add life to the world.
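The first TODO item, marking up rows rather than individual tiles, could look something like this minimal sketch. It assumes the FoV is a set of (x, y) tuples and that emphasis is ANSI bold (ESC[1m, reset with ESC[0m); render_row is a hypothetical name, not code from the game.

```python
BOLD = "\x1b[1m"   # ANSI SGR: bold on
RESET = "\x1b[0m"  # ANSI SGR: reset attributes


def render_row(row_chars, y, fov):
    """Render one map row, wrapping each contiguous run of in-FoV tiles
    in a single BOLD ... RESET pair rather than one pair per tile."""
    out = []
    in_run = False
    for x, ch in enumerate(row_chars):
        visible = (x, y) in fov
        if visible and not in_run:
            out.append(BOLD)
            in_run = True
        elif not visible and in_run:
            out.append(RESET)
            in_run = False
        out.append(ch)
    if in_run:
        out.append(RESET)
    return "".join(out)
```

A row with one long visible stretch then costs two escape sequences instead of two per tile, which also shrinks what goes over the telnet connection.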
Montreal Python User Group
Next Montréal
The technology scene of Montréal is a very vibrant one. With groups such as ourselves, OWASP, JS-Montreal, Montreal.rb, and PHP-Québec, and with events such as WordCamp, PodCamp, Startup Drinks, and Startup Camp, you end up with weeks during which all your evenings are booked before lunchtime on Monday. Yet it also happens that you have a guest in town and want to show them how active your city is without knowing exactly where to take them.
Fortunately, some members of the community decided to take the matter into their own hands and to expose for all to see what is going on with the tech and startup scene here in Montréal. Next Montréal is a blog featuring news and opinion from the Web, mobile, and gaming communities. The site is piloted by a handful of Montréal entrepreneurs, engaging us with interviews with the local players and giving us a good feel for who’s working on what and what’s the next big thing. Beyond interviews, Next Montréal brings together the community by posting job opportunities and a calendar of events.
Next Montréal is a great initiative and we hope to see more Python projects featured there.
Salman Haq
Sneak Peek: GAE Channel API
At Google IO 2010, the App Engine team announced that they had a Channel API in the works. This week I got invited by Moishe Lettvin of the Channel API team to join a handful of developers to try it out. The API is undocumented at the moment and can be considered to be in private alpha.
I haven’t had the chance to actually use the API yet, but I have studied the examples (there are only two) and browsed the private mailing list to better understand it. Here are a few things that I have understood about the API:
- Each client connection is called a “Channel”. There can only be one active channel per client and it is identified by a unique id.
- XMPP is used for transferring messages, not WebSockets. This is accomplished by embedding a hidden Google Talk iframe in the application. The JavaScript API does this automatically.
- The intent of the API is to allow multiple clients (browsers) to share messages instantaneously (e.g., chatting). It is not designed for server-side streaming, which really goes against the grain of App Engine.
- Websocket support may be implemented in the future.
The basic JavaScript client code is quite simple:
var channel = new goog.appengine.Channel(channel_id);
var socket = channel.open();
socket.onopen = function() {
  window.setTimeout(function() { sendMessage('connected'); }, 100);
};
socket.onmessage = function(evt) {
  var o = JSON.parse(evt.data);
  // ... app logic ...
};
and the server-side Python code is not too bad either:
from google.appengine.api import channel
from google.appengine.api import users

# creating a channel
user = users.get_current_user()
channel_id = channel.create_channel(user)

# sending a message to that channel later in the code
channel.send_message(user, "json formatted message")
As you can see, the API is quite simple. Hopefully it will prove to be a boon for developers trying to build distributed multi-user/multi-player applications using the app engine platform.
Grig Gheorghiu
MySQL InnoDB hot backups and restores with Percona XtraBackup
I blogged a while ago about MySQL fault-tolerance and disaster recovery techniques. At that time I was experimenting with the non-free InnoDB Hot Backup product. In the meantime I discovered Percona's XtraBackup (thanks Robin!). Here's how I tested XtraBackup for doing a hot backup and a restore of a MySQL database running Percona XtraDB (XtraBackup works with vanilla InnoDB too).
First of all, I use the following Percona .deb packages on a 64-bit Ubuntu Lucid EC2 instance:
# dpkg -l | grep percona
ii libpercona-xtradb-client-dev 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database development files
ii libpercona-xtradb-client16 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client library
ii percona-xtradb-client-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database client binaries
ii percona-xtradb-common 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database common files (e.g. /etc
ii percona-xtradb-server-5.1 5.1.43-xtradb-1.0.6-9.1-60.jaunty.11 Percona SQL database server binaries
2) Apply the transaction logs to the datafiles just created, so that the InnoDB logfiles are recreated in the target directory:
/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --password=xxxxxx --apply-log /xtrabackup/2010-09-01_05-21-36/
At this point, I tested a disaster recovery scenario by stopping MySQL and moving all files in DATADIR to a different location.
To bring the databases back to normal from the XtraBackup hot backup, I did the following:
1) Brought back up a functioning MySQL instance to be used by the XtraBackup restore operation:
i) Copied the contents of the default /var/lib/mysql/mysql database under /var/lib/mysql/m10/ (or you can recreate the mysql DB from scratch)
ii) Started mysqld_safe manually:
mysqld_safe --defaults-file=/etc/mysql10/my.cnf
This will create the data files and logs under DATADIR (/var/lib/mysql/m10) with the sizes specified in the configuration file. I had to wait until the messages in /var/log/syslog told me that the MySQL instance was ready and listening for connections.
2) Copied back the files from the hot backup directory into DATADIR
Note that the copy-back operation below initially errored out because it tried to copy the mysql directory too, and found the directory already there under DATADIR. So the second time I ran it, I first moved /var/lib/mysql/m10/mysql to mysql.bak. The copy-back command is:
/usr/bin/innobackupex-1.5.1 --defaults-file=/etc/mysql10/my.cnf --user=root --copy-back /xtrabackup/2010-09-01_05-21-36/
You can also copy the files from /xtrabackup/2010-09-01_05-21-36/ into DATADIR using vanilla cp.
3) If everything went well in step 2, restart the MySQL instance to make sure everything is OK.
At this point, your MySQL instance should have its databases restored to the point where you took the hot backup. If that instance is used in replication, you will most likely need to adjust MASTER_LOG_FILE and MASTER_LOG_POS (via CHANGE MASTER TO) so that it gets back in sync with its master.
Note that XtraBackup can also run in a 'stream' mode, useful for compressing the files generated by the backup operation. Details are in the documentation.
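If you restore often, the two innobackupex invocations from this walkthrough (--apply-log and --copy-back) can be scripted. Here is a minimal Python sketch, assuming the same binary path, options file and backup directory used above. It builds the argument lists and, by default, only prints them. The function names are hypothetical, and passing the password on the command line (as in the original command) exposes it in the process list.

```python
import subprocess

INNOBACKUPEX = "/usr/bin/innobackupex-1.5.1"


def apply_log_cmd(defaults_file, user, password, backup_dir):
    # Prepare the backup: apply the transaction logs to the copied datafiles.
    return [INNOBACKUPEX, "--defaults-file=" + defaults_file,
            "--user=" + user, "--password=" + password,
            "--apply-log", backup_dir]


def copy_back_cmd(defaults_file, user, backup_dir):
    # Restore: copy the prepared backup back into DATADIR.
    return [INNOBACKUPEX, "--defaults-file=" + defaults_file,
            "--user=" + user, "--copy-back", backup_dir]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False.
    Returns the exit status (0 for a dry run)."""
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__":
    backup = "/xtrabackup/2010-09-01_05-21-36/"
    run(apply_log_cmd("/etc/mysql10/my.cnf", "root", "xxxxxx", backup))
    run(copy_back_cmd("/etc/mysql10/my.cnf", "root", backup))
```

Keeping dry_run on by default is a deliberate choice: a copy-back into the wrong DATADIR is hard to undo.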
Isotoma
Annoying CSS3 Baseline Alignment Problem in Firefox
CSS3 transforms enable the rotation of elements, including HTML text. If you intend to use them, you should be aware that Firefox 3.6.8 and below have very poor baseline alignment for rotated text.
The heading is just about acceptable. Content text is not.
[Screenshots: rotated text rendered in Firefox 3.6.8 and in WebKit]
So be warned if you intend to use CSS transform on text.
S. Lott
Using SCons
While looking at Application Lifecycle Management (see "ALM Tools"), I found that SCons appears to be pretty popular. It's not as famous as all the make variants, Apache Ant or Apache Maven, but it seems to have a niche in the forest of Build Automation Software.
"SCons proved to be more accurate, mostly due to its stateful, content-based signature model. On the other hand, GNU Make proved to be more resource friendly, especially regarding the memory footprint. SCons needs to address this problem to be a viable alternative to Make when building large software projects."
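To illustrate what a content-based signature means compared with Make's timestamp checks, here is a toy Python sketch. It is not SCons's actual implementation (SCons records its signatures in a .sconsign database); it only shows the idea that a target is out of date when the MD5 digest of its source changes, so merely touching a file does not trigger a rebuild.

```python
import hashlib


def content_signature(path):
    """MD5 hex digest of the file's bytes, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_rebuild(path, stored):
    """stored maps path -> signature from the previous build and is
    updated in place; returns True if the content changed."""
    sig = content_signature(path)
    changed = stored.get(path) != sig
    stored[path] = sig
    return changed
```

The trade-off the quote mentions follows directly: every file has to be read and hashed, and the signatures kept around, which costs more time and memory than comparing two timestamps.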
James Polera
smtp_toolkit
Speaking SMTP to mail servers with Python!
In my daily work, I often find the need to test various mail servers: verify that they are responding, see if they support TLS, check what the max supported message size is, etc. This is usually an exercise in running a telnet session to port 25 of the mail server and inspecting from there.
Seeing as telnet isn’t installed by default on some operating systems these days (I’m looking at you, Windows 7), writing a Python class seemed to be the right thing to do. I can incorporate it into scripts, schedule checks, work it into mxutils.com… The list goes on.
It’s pretty straightforward to use, and I’ve made the code available under the BSD license at http://github.com/polera/smtp_toolkit.
Here are some basic examples of usage:
from smtp_toolkit import SMTPServerTest

# set up a list of servers to check
server_list = ['smtp.gmail.com']

for server in server_list:
    print(server)
    s = SMTPServerTest(server)
    # server connection results are returned as a dict
    print(s.results)
    # get the EHLO options (i.e. what would be returned after an EHLO command)
    print("EHLO options %s" % ", ".join(s.ehlo_options))
    # see if the server supports TLS (based on the EHLO response)
    print("TLS Supported? %s" % s.server_supports_tls)
    # what is the max message size that this server will handle (also from EHLO)
    print("Max message size: %d MB" % s.server_max_message_size)
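For comparison, the same EHLO-based checks can be sketched with only the standard library's smtplib. This is not how smtp_toolkit is implemented, just an illustration. smtplib stores the parsed EHLO response in esmtp_features, a dict keyed by lowercase extension name, and the SIZE parameter is a byte count (RFC 1870). The function names are hypothetical.

```python
import smtplib


def summarize_ehlo(features):
    """features: a dict shaped like smtplib.SMTP.esmtp_features
    (lowercase extension names mapped to parameter strings)."""
    size = features.get("size", "").strip()
    return {
        "tls": "starttls" in features,
        # SIZE is advertised in bytes; convert to megabytes.
        "max_size_mb": int(size) / (1024.0 * 1024.0) if size.isdigit() else None,
    }


def check_server(host, port=25, timeout=10):
    """Connect, send EHLO and summarize what the server advertises."""
    server = smtplib.SMTP(host, port, timeout=timeout)
    try:
        server.ehlo()
        return summarize_ehlo(server.esmtp_features)
    finally:
        server.quit()
```

Splitting the parsing out of check_server keeps the logic testable without a live mail server.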
I plan on building this out to support more features in the near future, so if you’re interested, keep an eye on the github repo.
Now, go test your servers!
Montreal Python User Group
ConFooBBQ
This year again, ConFoo is going to be a major conference on Web development bringing together many of the local communities. To celebrate this synergy, everyone is invited to ConFooBBQ, the BBQ for developers and other actors of the Web.
The BBQ will take place on 2010-09-11 starting at 1:00 PM.
On the menu: hot dogs, chips, salad, soft drinks, cookies, and lots of fun. In line with our beer-inspired events, Montréal-Python will bring a keg of Charmeuse de Serpents, a special batch of India Pale Ale with a very assertive character.
To help us plan adequate supplies, please send an email to board@confoo.ca if you plan to attend. Don’t forget to mention if you come with others. If you can’t find the group once you’re on the site, feel free to give the crew a call: 1-888-679-8466 option 0.
Details of the event:
- when: 2010-09-11 at 1:00 PM
- where: Mont-Royal park, near the Smith House
- who: developers, actors of the Web, and their family
- price: it’s free
- reservations: board@confoo.ca
Richard Tew
Roguelike MUD progress
I finally found some more time to work on my roguelike MUD project. Tonight I managed to get proper multi-player support in, so that concurrently logged in players have their view of the game updated as other game objects change position. Up to this point the players mostly shared the world representation, and had separate state defining what was in it.
While the changes required are more or less fine, some didn't fit well into the existing game framework. I need to put some thought into cleaning those up.
Achievable next steps should hopefully be:
- Cleaning up the bugs.
  - Fix the incorrect menu related message that appears in the top telnet window shown in the screenshot.
  - When a player quits the game, the display of other observing players does not update to reflect it.
  - When an object moves, observing players are defined as those who have visited the tile the object moved to and currently have it on their display. Instead only observing players who have the object in their field of vision should see the movement.
- Polish the field of vision support.
  - The tiles that are in the field of vision should be distinct from those that are not (Smart Kobold uses this approach to good effect).
- Widen the corridors so players can pass each other.
- Add entities controlled by AI that move around of their own volition.
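One way to approach the third bug item, notifying only players who can actually see a move, is sketched below. It assumes each player tracks the set of (x, y) tiles currently in their field of vision; Player and observers_of_move are hypothetical names, not part of the game's code.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    name: str
    # Tiles (x, y) currently inside this player's field of vision.
    fov: set = field(default_factory=set)


def observers_of_move(players, mover, old_pos, new_pos):
    """Players (other than the mover) who should be told about a move:
    those who can see either the tile it left or the tile it entered."""
    return [p for p in players
            if p is not mover and (old_pos in p.fov or new_pos in p.fov)]
```

Checking the old position as well as the new one matters: a player watching an object walk out of view still needs an update so the object disappears from their display.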
Mike Driscoll
Another GUI2Exe Tutorial – Build a Binary Series!
This is the last article of my “Build a Binary Series”. If you haven’t done so already, be sure to check out the others. For our finale, we are going to look at Andrea Gavana’s wxPython-based GUI2Exe, a nice graphical user interface to py2exe, bbfreeze, cx_Freeze, PyInstaller and py2app. The latest release of GUI2Exe is 0.5.0, although the source may be slightly newer. Feel free to run from the tip as well. We’ll be using the example scripts that we used for several of the previous articles: one console and one GUI script, neither of which does much of anything.
Getting Started with GUI2Exe
Quite some time ago, I wrote another article on this cool tool. However, the look-and-feel of the application has changed quite a bit, so I felt I should re-write that article in the context of this series. To follow along with this article, you’ll need to hit Google Code for the source. Let’s begin, shall we? Here are some step-by-step directions for making the console script using py2exe via GUI2Exe:
- Download the source and unzip it in a convenient location
- Run the “GUI2Exe.py” file (you can use your favorite editor, open it via the command line or whatever)
- Go to File, New Project. A dialog will appear asking you to name the project. Give it a good name! Then hit OK.
- Click in the “Exe Kind” column and change it to “Console”
- Click in the “Python Main Script” column and you’ll see a button appear.
- Press the button and use the file dialog to find your main script
- Fill out the other optional fields however you like
- Hit the compile button on the lower right
- Try the result to see if it worked!
If you followed the directions above, you should now have an executable file (and a few dependencies) in a “dist” folder at the location of the main script. As you can see in the screenshot above, there are all the typical options that you would set in your setup.py file. You can set your excludes list, the includes, the optimize and compressed settings, whether or not to include a zip, packages and much more! You can tweak to your heart’s content and hit the “Compile” button whenever you’re ready to see the result. If I’m experimenting, I usually change the output directory’s name so I can compare the results to see which is the most compact.
If you want to use bbfreeze, cx_Freeze, PyInstaller or py2app, just click the respective name in the column on the right. This will cause the middle part of the screen to change according to your choice and show the corresponding options. Let’s take a quick visual tour!
GUI2Exe in Pictures!
The following is a snapshot of the py2app options:
Next is a shot of the cx_Freeze options:
And here is PyInstaller’s settings:
Finally, we have bbfreeze’s options:
There’s also a VendorId screen, but I don’t know much about that one, so we’ll be skipping it.
GUI2Exe’s Menu Options
As you might guess, all these options work the same way as they do when you do it all yourself in code. If you ever need to check out the setup.py file that GUI2Exe is making for you, just go to the Builds menu and choose View Setup Script. If you want to see a handy listing of the files it output and where it output the files, go to Build, Explorer and you should see something like the screenshot below:
Other handy options in the Builds menu include the Missing Modules and the Binary Dependencies menu items. These show you what may be missing from the dist folder that you may need to include should you distribute your masterpiece.
The Options menu controls options for GUI2Exe itself and a few custom items for the build process, like setting the Python version, deleting the build and/or dist folders, setting the PyInstaller path and more. The other menus are pretty self-explanatory, and I leave them for adventurous readers.
Wrapping Up
If you’ve read my other tutorials in the “Build a Binary Series” then you should be able to take that knowledge and use it productively with GUI2Exe. I find GUI2Exe to be very helpful when it comes time for me to build an executable and I used it to help me figure out the options for some of the other binary builders in this series. I hope you enjoyed this series and found it helpful. See you next time!
Further Reading
- GUI2Exe Official website
- Andrea Gavana’s website and blog
- Build a Binary Series
- The other GUI2Exe tutorial
August 31, 2010
BioPython News
Biopython 1.55 released
The Biopython team is proud to announce Biopython 1.55, a new stable release, about three months after our last stable release (Biopython 1.54) and the beta release earlier in August.
A lot of work has gone into Python 3 support (via the 2to3 script), but unless we broke something, you shouldn’t notice any changes.
In terms of new features, the most noticeable highlight is that the command line tool application wrapper classes are now executable, which should make it much easier to call external tools. This is described in the updated documentation.
Additionally GenBank and EMBL parsing has been sped up, the BioSQL classes act more like Python dictionaries, and Bio.PDB should handle model numbers and a missing element column better.
Note that we are phasing out support for Python 2.4. We will continue to support it for at least one further release (i.e. Biopython 1.56). This could be delayed given feedback from our users (e.g. if this proves to be a problem in combination with other libraries or a popular Linux distribution).
(At least) 12 people have contributed to this release, including 6 new people – thank you all:
- Andres Colubri (first contribution)
- Carlos Rios Vera (first contribution)
- Claude Paroz (first contribution)
- Cymon Cox
- Eric Talevich
- Frank Kauff
- Joao Rodrigues (first contribution)
- Konstantin Okonechnikov (first contribution)
- Michiel de Hoon
- Nathan Edwards (first contribution)
- Peter Cock
- Tiago Antao
Source distributions and Windows installers are available from the downloads page on the Biopython website (biopython.org).
As usual, feedback is most welcome on the mailing lists (or bugzilla).
Mikko Ohtamaa
Testing if hostname is numeric IPv4
I had to resort to this hack when testing a hybrid web/mobile site which uses hostname-based device discrimination. In production mode we can have m.yoursite.com and www.yoursite.com hostnames. However, when running the site locally, on your development computer and in a LAN, this does not work very well: one cannot spoof hostnames for web browsers on devices like the iPhone/iPod/other mobile phones unless you install a DNS server. And installing a DNS server for a LAN is something you don’t want to do…
So, I figured out that I can use hostname spoofing on desktop computers (/etc/hosts file) and I always access the site via numeric IP (IPv4 over ethernet) when testing over WLAN on mobile devices.
- The site is rendered in web mode when it is being accessed via textual hostname (localhost, yourpcname)
- The site is rendered in mobile mode when it is being accessed via IPv4 numeric hostname (127.0.0.1, 192.168.200.1)
And,… dadaa,… here is my magical code to test whether a hostname is a numeric IPv4 address. I couldn’t find a ready-made function in the Python standard library.
import re

# Four dot-separated octets, each in the range 0-255 (raw string, so "\." reaches re intact).
ipv4_regex_source = r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
ipv4_regex = re.compile(ipv4_regex_source)

def is_numeric_ipv4(hostname):
    """
    http://answers.oreilly.com/topic/318-how-to-match-ipv4-addresses-with-regular-expressions/
    @param hostname: Hostname as a string.
    @return: True if the given string is a numeric IPv4 address
    """
    # match() returns a match object or None; convert to a real bool.
    return ipv4_regex.match(hostname) is not None
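On Python 3.3 and later, the standard library's ipaddress module can do this check without a hand-written regex. A minimal sketch follows; stripping a ":port" suffix is an assumption about where the hostname comes from (an HTTP Host header can carry one). Note that its handling of octets with leading zeros (such as 010) differs from the regex above and has changed between Python versions.

```python
import ipaddress


def is_numeric_ipv4(host):
    """True if host (optionally with a :port suffix) is a dotted-quad
    IPv4 address such as 127.0.0.1."""
    host = host.rsplit(":", 1)[0]  # assumption: drop a Host-header port
    try:
        ipaddress.IPv4Address(host)
    except ValueError:  # AddressValueError is a ValueError subclass
        return False
    return True
```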