Planet Python
Last update: February 09, 2010 09:43 AM
February 09, 2010
Richard Tew
Mailman-style mailing list archives
I have the posts made to several mailing lists in a variety of non-standard formats. Converting them to a standard mbox file is a matter of parsing and is a different process for each. Once I have each parsed, what I would like to do is generate Mailman-style list archives.
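As a rough sketch of the mbox step: assuming each custom parser can yield a plain dictionary per post, Python 3's standard mailbox module can do the writing. The `posts` structure and the `write_mbox` helper are hypothetical and not part of my actual code.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def write_mbox(posts, path):
    """Append parsed posts (dicts with sender/subject/date/body) to an mbox file."""
    box = mailbox.mbox(path)
    try:
        for post in posts:
            msg = EmailMessage()
            msg["From"] = post["sender"]
            msg["Subject"] = post["subject"]
            msg["Date"] = post["date"]
            msg.set_content(post["body"])
            box.add(msg)
        box.flush()
    finally:
        box.close()


# Hypothetical parsed posts standing in for the output of a custom parser.
posts = [
    {"sender": "alice@example.com", "subject": "[MUD-Dev] Hello",
     "date": "Mon, 08 Feb 2010 10:00:00 +0000", "body": "First post."},
    {"sender": "bob@example.com", "subject": "Re: [MUD-Dev] Hello",
     "date": "Mon, 08 Feb 2010 11:00:00 +0000", "body": "A reply."},
]
mbox_path = os.path.join(tempfile.mkdtemp(), "mud-dev.mbox")
write_mbox(posts, mbox_path)

# Read the file back to confirm both messages were stored.
subjects = [m["Subject"] for m in mailbox.mbox(mbox_path)]
```

Only the per-format parsers differ between lists; the writing side stays the same for all of them.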
I've downloaded Mailman and tried to get it to take my mbox file and output the list archives. But the process is partly tied to Unix-style platforms, relying on functionality that is not supported on Windows. To a larger degree, it assumes that the archive belongs to a proper Mailman-hosted mailing list. Changing the code to address or work around these things is not the cleanest of processes. There must be a better way.
Any suggestions?
Have some hacky code while I am at it:
import os
import cPickle

# Imports assumed from the Mailman 2.1 source tree, which must be on sys.path.
import Mailman.mm_cfg
import Mailman.MailList
from Mailman import Site
from Mailman.Archiver.HyperArch import HyperArchive

WORKING_PATH = r"D:\MailingList"
MAILMAN_PATH = os.path.join(WORKING_PATH, "mailman-2.1.13")

# Stand-in for a real Mailman list, with just enough surface for the archiver.
class MailList:
    def __init__(self, basePath, fileName):
        self.basePath = basePath
        self.fileName = fileName
        self.SetVars()
    def SetVars(self):
        self._internal_name = "mud-dev"
        self._fullpath = "/resource/MUD-Dev/"
        self.host_name = "localhost"
        self.subject_prefix = "[MUD-Dev] "
        self.real_name = "MUD-Dev"
    def fullpath(self):
        return self._fullpath
    def archive_dir(self):
        return self.basePath
    def internal_name(self):
        return self.fileName
    def ArchiveFileName(self):
        return os.path.join(self.basePath, self.internal_name() + ".mbox")
    def GetScriptURL(self, *args, **kwargs):
        return args[0]
    def GetListEmail(self):
        return "no-list-email"

# Archive locking is not needed for a one-off offline run.
class SuperDuperArchive(HyperArchive):
    def GetArchLock(self): return 1
    def DropArchLock(self): pass

# Windows has no symlinks or hard links here, so copy the file instead.
def fake_symlink(src, dst):
    if os.path.exists(src):
        open(dst, "w").write(open(src, "r").read())
os.symlink = fake_symlink
os.link = fake_symlink

Mailman.mm_cfg.TEMPLATE_DIR = os.path.join(MAILMAN_PATH, "templates")
Mailman.mm_cfg.LIST_DATA_DIR = WORKING_PATH
Mailman.mm_cfg.PUBLIC_ARCHIVE_FILE_DIR = WORKING_PATH
Mailman.mm_cfg.PRIVATE_ARCHIVE_FILE_DIR = WORKING_PATH

# filePath is defined elsewhere in the script.
mlist = MailList(os.path.join(filePath, "archives"), "mud-dev")
mlist.preferred_language = 'en'
listPath = os.path.join(Mailman.mm_cfg.LIST_DATA_DIR, mlist.internal_name())
if not os.path.exists(listPath):
    os.makedirs(listPath)

# Build a default list configuration by borrowing Mailman's own InitVars.
class DummyClass:
    def internal_name(self):
        return "mud-dev"
    def archive_dir(self):
        return Site.get_archpath(self.internal_name())
    def GetScriptURL(self, *args, **kwargs):
        return args[0]
    def GetListEmail(self):
        return "no-list-email"

instance = DummyClass()
Mailman.MailList.MailList.InitVars.im_func(instance, "mud-dev")
for baseclass in Mailman.MailList.MailList.__bases__:
    if hasattr(baseclass, 'InitVars'):
        baseclass.InitVars.im_func(instance)
MailList.SetVars.im_func(instance)
listConfigPath = os.path.join(listPath, "config.pck")
if not os.path.exists(listConfigPath):
    cPickle.dump(instance.__dict__, open(listConfigPath, "wb"))

archive = SuperDuperArchive(mlist)
archive.processListArch()
archive.close()
Heikki Toivonen
Pulling Android Market Sales Data Programmatically
Android Market handles sales through Google Checkout. I haven’t tried selling anything else online before, but what this setup provides for me as the seller leaves a lot to be desired. One issue you will have trouble with is getting the data needed to file taxes.
Google provides a Google Checkout Notification History API that lets you programmatically query sales data. For my purposes the API requests are really simple: just post a small XML document with the date range I am interested in, and get back XML documents that contain my data. If there is more data than fits in a single response, look for an element that specifies the token for the next page and keep pulling until you have all the data.
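The paging logic on its own can be sketched roughly as follows, in Python 3 and with the network call abstracted away. The `fetch` callable and the canned responses are hypothetical stand-ins for demonstration; only the `next-page-token` element name comes from the API described above.

```python
import re

NEXT_PAGE = re.compile(r"<next-page-token>(.*?)</next-page-token>", re.DOTALL)


def fetch_all_pages(fetch, first_query):
    """Call fetch(query) repeatedly, following next-page tokens until none remain.

    `fetch` is a hypothetical callable that posts one query and returns the
    response body as a string.
    """
    pages = []
    query = first_query
    while True:
        body = fetch(query)
        pages.append(body)
        match = NEXT_PAGE.search(body)
        if match is None:
            return pages
        query = "<next-page-token>%s</next-page-token>" % match.group(1)


# Fake transport for demonstration: two pages, the first pointing at the second.
canned = {
    "<start-time>2010-01-01</start-time>":
        "<r><next-page-token>abc</next-page-token></r>",
    "<next-page-token>abc</next-page-token>": "<r>last page</r>",
}
pages = fetch_all_pages(canned.__getitem__, "<start-time>2010-01-01</start-time>")
```

Keeping the transport behind a callable makes the loop easy to test without touching the real service.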
Below is a really simple Python script that uses M2Crypto to handle the SSL parts for the connection (needed since Python doesn’t do secure SSL out of the box). You will also need to grab certificates. You should save the script as gnotif.py, save the certificates as cacert.pem and create gnotif.ini as described in the script below all in the same directory. When you execute it, it will ask for start and end date (in YYYY-MM-DD format) and then fetch all the data, saving them in response-N.xml files, where N is a number.
#!/usr/bin/env python
# Script to query Google Checkout Notification History
# http://code.google.com/apis/checkout/developer/Google_Checkout_XML_API_Notification_History_API.html
# Supporting file gnotif.ini:
#[gnotif]
# merchant_id = YOUR_MERCHANT_ID_HERE
# merchant_key = YOUR_MERCHANT_KEY_HERE

import base64
import re
from ConfigParser import ConfigParser

from M2Crypto import SSL, httpslib

ENVIRONMENT = "https://checkout.google.com/api/checkout/v2/reports/Merchant/"
XML = """\
<notification-history-request xmlns="http://checkout.google.com/schema/2">
%(query)s
</notification-history-request>
"""

config = ConfigParser()
config.read('gnotif.ini')
MERCHANT_ID = config.get('gnotif', 'merchant_id')
MERCHANT_KEY = config.get('gnotif', 'merchant_key')

rawstr = r"""<next-page-token>(.*)</next-page-token>"""
compile_obj = re.compile(rawstr, re.MULTILINE)

auth = base64.encodestring('%s:%s' % (MERCHANT_ID, MERCHANT_KEY))[:-1]

ctx = SSL.Context('sslv3')
# If you comment out the next 2 lines, the connection won't be secure
ctx.set_verify(SSL.verify_peer | SSL.verify_fail_if_no_peer_cert, depth=9)
if ctx.load_verify_locations('cacert.pem') != 1: raise Exception('No CA certs')

start = raw_input('Start date: ')
end = raw_input('End date: ')

data = XML % {'query': """<start-time>%(start)s</start-time>
<end-time>%(end)s</end-time>""" % {'start': start, 'end': end}}

i = 0
while True:
    c = httpslib.HTTPSConnection(host='checkout.google.com', port=443,
                                 ssl_context=ctx)
    c.request('POST', ENVIRONMENT + MERCHANT_ID, data,
              {'content-type': 'application/xml; charset=UTF-8',
               'accept': 'application/xml; charset=UTF-8',
               'authorization': 'Basic ' + auth})
    r = c.getresponse()

    f=open('response-%d.xml' % i, 'w')
    result = r.read()
    f.write(result)
    f.close()

    print i, r.status

    c.close()

    match_obj = compile_obj.search(result)
    if match_obj:
        i += 1
        data = XML % {'query': """<next-page-token>%s</next-page-token>""" % match_obj.group(1)}
    else:
        break
As you take a look at the data you will probably notice that you are only getting the sale price information, but no information about the fees that Google is deducting. Officially the fee is a flat 30%, but I have found that a number of my sales were charged 5%. So we need to get this information somehow. Luckily you can toggle a checkbox in your Google Checkout Merchant Settings. Unfortunately there is a bug, and the transaction fee shows as $0 for Android Market sales. I have reported this to Google, and they acknowledged it, but there is no ETA on when this will be fixed.
I also haven’t found any way to programmatically query when and how much Google Checkout actually paid me. (I can get this info from my bank, but it would be nice to query for that with the Checkout API as well.)
Last but certainly not least, working with the monster XML files returned from Google Checkout API is a real pain. If someone has a script to turn those into a format that could be imported into a spreadsheet or database that would be nice…
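One possible starting point, sketched in Python 3 with only the standard library, is to flatten each top-level element of a response into one CSV row keyed by dotted element paths. The sample document and its element names below are made up for illustration and are not the real Checkout schema; `flatten` and `xml_to_csv` are hypothetical helpers.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Return {dotted.path: text} for every leaf element under `element`.

    Repeated leaf names under one parent overwrite each other in this sketch.
    """
    row = {}
    for child in element:
        path = prefix + local_name(child.tag)
        if len(child):
            row.update(flatten(child, path + "."))
        else:
            row[path] = (child.text or "").strip()
    return row


def xml_to_csv(xml_text):
    """Write one CSV row per top-level child of the document root."""
    rows = [flatten(item) for item in ET.fromstring(xml_text)]
    columns = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample; the element names are illustrative, not the real schema.
sample = """<response xmlns="http://example.com/ns">
  <sale><id>1</id><total><amount>0.99</amount><currency>USD</currency></total></sale>
  <sale><id>2</id><total><amount>1.99</amount><currency>USD</currency></total></sale>
</response>"""
csv_text = xml_to_csv(sample)
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet; list-like elements would need extra handling.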
Vern Ceder
Get the most out of PyCon – VOLUNTEER
PyCon Atlanta is now less than 2 weeks away, and things are coming together. My big concern, the poster session, is pretty much set to go. Transportation, check. Hotel, check. Conference registration, time off from work, talks I want to catch, tentative open space plans: check, check, check, and check.
Yesterday I added the final [...]
Calvin Spealman
DeferArgs on GitHub
A while ago I wrote a library called DeferArgs, and I used it when I was still in Twisted code every day. I no longer have that fun, but I was reminded of the code and decided to throw it onto GitHub for anyone who cares for it.
http://github.com/ironfroggy/DeferArgs
An example usage, where foo could take any deferreds and would be called when they all fire.
@deferargs
def foo():
    assert False

@catch(AssertionError)
def onAssert(error):
    print "OOPS"

@catch()
def onOthers(error):
    print "I WOULD BE REACHED FOR ANYTHING NOT CAUGHT ABOVE."

@cleanup
def _(r):
    print "The result was: ", r
February 08, 2010
Geert Vanderkelen
Don't forget the COMMIT in MySQL
Yes, MySQL has transactions if you use InnoDB or NDB Cluster for example. Using these transactional storage engines, you'll have to commit (or roll back) your inserts, deletes or updates.
I've seen it a few times now: people are surprised that no data is going into their tables. It's not such a silly problem after all. If you are used to the defaults in MySQL, you never have to commit anything, since it is done automatically for you.
Take the Python Database Interfaces for MySQL. PEP 249 says that, by default, auto-commit should be turned off. You could turn it back on, but it's good practice to be explicit and commit in your code. Remember the Zen of Python!
Here is a small example to show it. It uses MySQL Connector/Python, but it works with the other interfaces as well:
import mysql.connector
cnx = mysql.connector.connect(db='test')
cur = cnx.cursor()
cur.execute("""CREATE TABLE innodb_t1 (
    id INT UNSIGNED NOT NULL,
    c1 VARCHAR(128),
    PRIMARY KEY (id)
) ENGINE=InnoDB""")
ins = "INSERT INTO innodb_t1 (id,c1) VALUES (%s,%s)"
cur.execute(ins,
    (1,'MySQL Support Team _is_ already the best',))
cnx.commit()
cur.close()
cnx.close()
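The effect of a missing commit can also be observed without a MySQL server. The following Python 3 sketch swaps in the standard library's sqlite3 module, which is a different database but follows the same PEP 249 convention of not auto-committing inserts by default. The table name and the `count_rows` helper are made up for the demonstration.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")


def count_rows():
    """Open a fresh connection and count the rows that were actually stored."""
    cnx = sqlite3.connect(db_path)
    try:
        return cnx.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    finally:
        cnx.close()


cnx = sqlite3.connect(db_path)
cnx.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, c1 TEXT)")
cnx.commit()

# Insert without committing: closing the connection discards the row.
cnx.execute("INSERT INTO t1 (id, c1) VALUES (?, ?)", (1, "lost"))
cnx.close()
rows_without_commit = count_rows()

# Same insert, this time followed by an explicit commit.
cnx = sqlite3.connect(db_path)
cnx.execute("INSERT INTO t1 (id, c1) VALUES (?, ?)", (1, "kept"))
cnx.commit()
cnx.close()
rows_with_commit = count_rows()
```

The first count shows the "no data in the table" surprise; the second shows the row arriving once commit() is called.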
Eric Florenzano
How do we kick our synchronous addiction?
Asynchronous programming is superior both in memory usage and in overall throughput when compared to synchronous programming. We've known this fact for years. If we look at Django or Ruby on Rails, arguably the two most promising new web application frameworks to emerge in the past few years, both of them are written in such a way that synchronous programming is assumed. Why is it that even in 2010 we're still writing programs that rely on synchronous programming?
The reason that we're stuck on synchronous programming is twofold. Firstly, the programming model required for straightforward asynchronous implementations is inconvenient. Secondly, popular and/or mainstream languages lack the built-in language constructs that are needed to implement a less-straightforward approach to asynchronous programming.
Asynchronous programming is too hard
Let's first examine the straightforward implementation: an event loop. In this programming model, we have a single process with a single loop that runs continuously. Functionality is achieved by writing functions to execute small tasks quickly, and inserting those functions into that event loop. One of those functions might read some bytes from a socket, while another function might write a few bytes to a file, and yet another function might do something computational like calculating an XOR on the data that's been buffered from that first socket.
The most important part about this event loop is that only one thing is ever happening at a time. That means that you really have to break your logic up into small chunks that can be performed incrementally. If any one of our functions blocks, it hogs the event loop and nothing else can execute during that time.
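To make that concrete, here is a toy event loop in Python 3. It is only a sketch: the `EventLoop` class and the `count_down` task are invented for illustration, there is no real I/O, and real frameworks add readiness notification on top of this idea.

```python
from collections import deque


class EventLoop:
    """A toy single-threaded loop: run queued callables one at a time."""

    def __init__(self):
        self.queue = deque()

    def call_soon(self, func, *args):
        self.queue.append((func, args))

    def run(self):
        while self.queue:
            func, args = self.queue.popleft()
            func(*args)  # nothing else runs until this returns


log = []
loop = EventLoop()


def count_down(name, n):
    """Do one small step, then reschedule the remainder instead of looping."""
    log.append("%s:%d" % (name, n))
    if n > 1:
        loop.call_soon(count_down, name, n - 1)


loop.call_soon(count_down, "a", 2)
loop.call_soon(count_down, "b", 2)
loop.run()
```

Because each task does a single step and reschedules itself, the two countdowns interleave. If `count_down` looped internally instead, task "b" would not start until "a" had finished.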
We have some really great frameworks geared towards making this event loop model easier to work with. In Python, there's Twisted and, more recently, Tornado. In Ruby there's EventMachine. In Perl there's POE. What these frameworks do is twofold: provide constructs for more easily working with an event loop (e.g. Deferreds or Promises), and provide asynchronous implementations of common tasks (e.g. HTTP clients and DNS resolution).
But these frameworks stop very short of making asynchronous programming easy for two reasons. The first reason is that we really do have to completely change our coding style. Consider what it would take to render a simple blog web page with comments. Here's some JavaScript code to demonstrate how this might work in a synchronous framework:
function handleBlogPostRequest(request, response, postSlug) {
    var db = new DBClient();
    var post = db.getBlogPost(postSlug);
    var comments = db.getComments(post.id);
    var html = template.render('blog/post.html',
        {'post': post, 'comments': comments});
    response.write(html);
    response.close();
}
Now here's some JavaScript code to demonstrate how this might look in an asynchronous framework. Note several things here: We've specifically written this in such a way that it doesn't become nested four levels deep. We've also written these callback functions inside of the handleBlogPostRequest function to take advantage of closure so as to retain access to the request and response objects, the template context, and the database client. Both the desire to avoid nesting and the closure are things that we need to think about as we write this code, that were not even considerations in the synchronous version:
function handleBlogPostRequest(request, response, postSlug) {
    var context = {};
    var db = new DBClient();
    function pageRendered(html) {
        response.write(html);
        response.close();
    }
    function gotComments(comments) {
        context['comments'] = comments;
        template.render('blog/post.html', context).addCallback(pageRendered);
    }
    function gotBlogPost(post) {
        context['post'] = post;
        db.getComments(post.id).addCallback(gotComments);
    }
    db.getBlogPost(postSlug).addCallback(gotBlogPost);
}
I've chosen JavaScript here to prove a point, by the way. People are very excited about node.js right now, and it's a very cool framework, but it doesn't hide all of the complexities involved in doing things asynchronously. It only hides some of the implementation details of the event loop.
The second reason why these frameworks fall short is because not all IO can be handled properly by a framework, and in these cases we have to resort to bad hacks. For example, MySQL does not offer an asynchronous database driver, so most of the major frameworks end up using threads to ensure that this communication happens out of band.
The inconvenient API, the added complexity, and the simple fact that most developers haven't switched to this style of programming lead us to the conclusion that this type of framework is not a desirable final solution to the problem (even though I do concede that you can get Real Work done today using these techniques, and many people do). That being the case, what other options do we have for asynchronous programming? Coroutines and lightweight processes, which brings us to our next major problem.
Languages don't support easier asynchronous paradigms
There are a few language constructs that, if implemented properly in modern programming languages, could pave the way for alternative methods of doing asynchronous programming that don't have the drawbacks of the event loop. These constructs are coroutines and lightweight processes.
A coroutine is a function that can suspend and resume its execution at certain programmatically specified locations. This simple concept can serve to transform blocking-looking code to be non-blocking. At certain critical points in your IO library code, the low-level functions that are doing IO can choose to "cooperate". That is, they can choose to suspend execution so that another function can resume and continue on.
Here's an example (it's Python, but fairly understandable for all I hope):
def download_pages():
    google = urlopen('http://www.google.com/').read()
    yahoo = urlopen('http://www.yahoo.com/').read()
Normally the way this would work is that a socket would be opened, connected to Google, an HTTP request sent, and the full response would be read, buffered, and assigned to the google variable, and then in turn the same series of steps would be taken for the yahoo variable.
Ok, now imagine that the underlying socket implementation were built using coroutines that cooperated with each other. This time, just like before, the socket would be opened and a connection would be made to Google, and then a request would be fired off. This time, however, after sending the request, the socket implementation suspends its own execution.
Having suspended its execution (but not yet having returned a value), execution continues on to the next line. The same thing happens on the Yahoo line: once its request has been fired off, the Yahoo line suspends its execution. But now there's something else to cooperate with--there's actually some data ready to be read on the Google socket--so it resumes execution at that point. It reads some data from the Google socket, and then suspends its execution again.
It jumps back and forth between the two coroutines until one has finished. Let's say that the Yahoo socket has finished, but the Google one has not. In this case, the Google socket just continues to read from its socket until it has completed, because there are no other coroutines to cooperate with. Once the Google socket is finally finished, the function returns with all of the buffered data.
Then the Yahoo line returns with all of its buffered data.
We've preserved the style of our blocking code, but we've used asynchronous programming to do it. Best of all, we've preserved our original program flow--the google variable is assigned first, and then the yahoo variable is assigned. In truth, we've got a smart event loop going on underneath the covers to control who gets to execute, but it's hidden from us due to the fact that coroutines are in play.
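The back-and-forth described above can be imitated with Python 3 generators. This is only a sketch of the scheduling idea: the suspension points are explicit `yield` statements rather than hidden inside a socket library, the network is replaced by canned chunks, and `fake_download` and `run_all` are invented names.

```python
from collections import deque


def fake_download(name, chunks, log):
    """Generator-based coroutine: 'send' a request, then read chunk by chunk."""
    log.append("request " + name)
    yield  # suspend after sending, letting other coroutines run
    body = []
    for chunk in chunks:
        body.append(chunk)
        log.append("read " + name)
        yield  # suspend after each partial read
    return "".join(body)


def run_all(coroutines):
    """Round-robin scheduler: resume each coroutine in turn until all finish."""
    results = {}
    ready = deque(coroutines.items())
    while ready:
        name, coro = ready.popleft()
        try:
            next(coro)
        except StopIteration as done:
            results[name] = done.value  # the coroutine's return value
        else:
            ready.append((name, coro))
    return results


log = []
results = run_all({
    "google": fake_download("google", ["<html>", "g", "</html>"], log),
    "yahoo": fake_download("yahoo", ["<html>y</html>"], log),
})
```

The log shows both requests going out before any data is read, and the Google coroutine finishing its reads alone once Yahoo is done, which mirrors the sequence in the walkthrough.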
Languages like PHP, Python, Ruby, and Perl simply don't have built-in coroutines that are robust enough to implement this kind of behind-the-scenes transformation. So what about these lightweight processes?
Lightweight processes are what Erlang uses as its main concurrency primitive. These are processes that are mostly implemented in the Erlang VM itself. Each process has approximately 300 words of overhead and its execution is scheduled primarily by the Erlang VM, sharing no state at all amongst processes. We don't have to think twice about spawning a process, as it's practically free. The catch is that these processes can only communicate via message passing.
Implementing these lightweight processes at the VM level gets rid of the memory overhead, the context switching, and the relative sluggishness of interprocess communication provided by the operating system. Since the VM also has insight into the memory stack of each process, it can freely move or resize those processes and their stacks. That's something that the OS simply cannot do.
With this model of lightweight processes, it's possible to again revert back to the convenient model of using a separate process for all of our asynchronous programming needs. The question becomes this: can this notion of lightweight processes be implemented in languages other than Erlang? The answer to that is "I don't know." To my knowledge, Erlang takes advantage of some features of the language itself (such as having no mutable data structures) in its lightweight process implementation.
Where do we go from here?
The key to moving forward is to drop the notion that developers need to learn to think about all of their code in terms of callbacks and asynchrony, as the asynchronous event loop frameworks require them to do. Over the past ten years, we can see that most developers, when faced with that decision, simply choose to ignore it. They continue to use the inferior blocking methodologies of yesteryear.
We need to look at these alternative implementations like coroutines and lightweight processes, so that we can make asynchronous programming as easy as synchronous programming. Only then will we be able to kick this synchronous addiction.
Roberto Alsina
Marave 0.3 is out!
Version 0.3 of Marave, a distraction-free fullscreen editor, is out at http://marave.googlecode.com
This version includes several bug fixes and new features since 0.2:
- New 'Styles' support, you can change the look of Marave with CSS syntax
- Debugged themes support, a few themes included
- Fixed bug saving text color
- Fixed font changing bug
- Use the document name in window title
- "Now playing" notification
Marave is free software released under the GPL, and should work on all major desktop platforms.
I would love feedback on this release, as well as ideas for Marave's future, so if you want to help, please join the mailing list:
http://groups.google.com/group/marave-discuss
Of course, if you like Marave, feel free to give me money
John Cook
Twitter daily tip news
I have five Twitter accounts that send out one tip per day, including a new one I just added last week.
Regular expressions
@RegexTip started over today. It’s a cycle of tips for learning regular expressions. It sticks to the regular expression features common to Python, Perl, C#, and many other programming languages. This account posts Monday through Friday.
Keyboard shortcuts
@SansMouse gives one tip a day on using Windows without a mouse. By practicing one keyboard shortcut a day, you can get into the habit of using your mouse less and your keyboard more. This cycle of tips started over January 29 with the most common and most widely useful shortcuts. I’m also sprinkling in a few extra tips that are less well known. This account also posts Monday through Friday.
Math
I have three mathematical accounts. These post seven days a week.
@AlgebraFact, just started February 2. It will be a mixture of linear algebra, number theory, group theory, etc.
@ProbFact gives one fact per day from probability. Usually these facts are theorems, but sometimes they include a note on history or applications.
@AnalysisFact gives facts from real and complex analysis. The topics range from elementary to advanced.
What if I don’t use Twitter?
You can visit the page for a Twitter account just like any other web page. And every Twitter account has an RSS feed link allowing you to subscribe just as you would subscribe to a blog.
How do you write these?
I write up content for these accounts in bulk. I may sit down on a Saturday and come up with several weeks' worth of tips. Then I use HootSuite to schedule the tips weeks in advance. Sometimes I’ll post something spontaneously, such as a link to something relevant, but most of the work is done in advance. I use my personal Twitter account for live interaction.
Related links:
Regular expressions in
Chart of probability distribution relationships
Ned Batchelder
21st century life in transition
Sitting at the breakfast table, my wife Susan was reading the paper, and when she got to the end of a story, she dragged her finger down the page to try to scroll the newspaper.
I've sat in a movie theater watching trailers, and glanced at the bottom of the screen to try to see the progress bar to see how much time was left in the short clip.
Max said when he's writing on paper with a pencil, and makes a mistake, his left hand twitches as if to hit cmd-Z.
Jonathan Ellis
Distributed deletes in the Cassandra database
Handling deletes in a distributed, eventually consistent system is a little tricky, as demonstrated by the fairly frequent recurrence of the question, "Why doesn't disk usage immediately decrease when I remove data in Cassandra?"
As background, recall that a Cassandra cluster defines a ReplicationFactor that determines how many nodes each key and associated columns are written to. In Cassandra (as in Dynamo), the client controls how many replicas to block for on writes, which includes deletions. In particular, the client may (and typically will) specify a ConsistencyLevel of less than the cluster's ReplicationFactor, that is, the coordinating server node should report the write successful even if some replicas are down or otherwise not responsive to the write.
(Thus, the "eventual" in eventual consistency: if a client reads from a replica that did not get the update with a low enough ConsistencyLevel, it will potentially see old data. Cassandra uses Hinted Handoff, Read Repair, and Anti Entropy to reduce the inconsistency window, as well as offering higher consistency levels such as ConsistencyLevel.QUORUM, but it's still something we have to be aware of.)
Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
There's one more piece to the problem: how do we know when it's safe to remove tombstones? In a fully distributed system, we can't. We could add a coordinator like ZooKeeper, but that would pollute the simplicity of the design, as well as complicating ops -- then you'd essentially have two systems to monitor, instead of one. (This is not to say ZK is bad software -- I believe it is best in class at what it does -- only that it solves a problem that we do not wish to add to our system.)
So, Cassandra does what distributed systems designers frequently do when confronted with a problem we don't know how to solve: define some additional constraints that turn it into one that we do. Here, we defined a constant, GCGraceSeconds, and had each node track tombstone age locally. Once it has aged past the constant, it can be GC'd. This means that if you have a node down for longer than GCGraceSeconds, you should treat it as a failed node and replace it as described in Cassandra Operations. The default setting is very conservative, at 10 days; you can reduce that once you have Anti Entropy configured to your satisfaction. And of course if you are only running a single Cassandra node, you can reduce it to zero, and tombstones will be GC'd at the first compaction.
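The mechanism can be illustrated with a toy model in Python 3. This is not Cassandra's API or implementation: the `Replica` class, the `repair` function and the last-write-wins timestamps are simplifying assumptions, and only the idea of tombstones aged out after a GCGraceSeconds-style constant comes from the description above.

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # mirrors the 10-day default described above

TOMBSTONE = object()  # marker stored in place of deleted data


class Replica:
    """Toy single-node store: key -> (value, timestamp). Not Cassandra's API."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Last write wins: keep whichever entry has the newer timestamp.
        current = self.data.get(key)
        if current is None or ts > current[1]:
            self.data[key] = (value, ts)

    def delete(self, key, ts):
        self.write(key, TOMBSTONE, ts)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None or entry[0] is TOMBSTONE else entry[0]

    def compact(self, now):
        """Drop tombstones older than the grace period."""
        for key, (value, ts) in list(self.data.items()):
            if value is TOMBSTONE and now - ts > GC_GRACE_SECONDS:
                del self.data[key]


def repair(a, b):
    """Exchange entries so both replicas converge on the newest version."""
    for src, dst in ((a, b), (b, a)):
        for key, (value, ts) in list(src.data.items()):
            dst.write(key, value, ts)


up, down = Replica(), Replica()
for node in (up, down):
    node.write("k", "v", ts=1)
up.delete("k", ts=2)   # 'down' misses the delete
repair(up, down)       # the tombstone propagates instead of "v" coming back
deleted_everywhere = up.read("k") is None and down.read("k") is None
up.compact(now=2 + GC_GRACE_SECONDS + 1)
tombstone_collected = "k" not in up.data
```

If `delete` simply removed the key instead of writing a tombstone, the same `repair` call would copy "v" from the stale replica back onto the one that had processed the delete.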
Isotoma
Beginning development with Plone 4 & Dexterity
Over the past few days, I’ve been tinkering with the latest alphas of Plone 4, particularly with an eye to trying out Dexterity on the latest version.
I started out, as many people will, by downloading the unified installer which will install Python 2.6, Zope 2.12 and the Plone 4.0 alpha for you. After a few teething problems with multiple versions of Python on my Hardy host, I had my Plone install up and running.
The first impression my colleagues here at Isotoma and I had was that it is a heck of a lot faster than its predecessor. In fact, John Stahl recently blogged that Plone 4 is potentially three times faster than Drupal, Joomla and Wordpress. The other marked difference is the default theme, which is a lot slicker, though in my opinion, with its blocks of bright colours and rounded corners, a little too overtly “Web 2.0” (insert air-quotes here).
My next stop was Martin Aspeli’s Dexterity developer manual, which, whilst up to date for the current stable release of Plone, required some tweaking to get going with Plone 4.
The unified installer, by default, makes use of several config files for buildout, which keeps a lot of the core settings in separate files (base.cfg & versions.cfg). I hear that roadrunner is almost ready for Plone 4, but it’ll be a little while before we’re getting it without checking out the source so that had to be chopped. The extends entry for Dexterity also required updating to the latest alpha.
Otherwise, things went very straightforwardly. My buildout.cfg for use with the unified installer can be found below the fold.
[buildout]
extends-cache = extends
extends = http://good-py.appspot.com/release/dexterity/2.0-next
    base.cfg
    versions.cfg
http-address = 8080
eggs =
    Plone
    Products.PDBDebugMode
    Products.LinguaPlone
    plone.reload
zcml =
    plone.reload
develop =
    src/example.project
debug-mode = off
backups-dir=${buildout:directory}/var
user=admin:password
parts =
    productdistros
    instance
    zopepy
    zopeskel
    backup
    unifiedinstaller
    chown
    omelette
    test
extensions =
    mr.developer
    buildout.dumppickedversions

[versions]
Cheetah = 2.2.1
Paste = 1.7.2
PasteScript = 1.7.3
ZopeSkel = 2.15
collective.recipe.backup = 1.3
plone.recipe.command = 1.0
plone.recipe.distros = 1.5
plone.recipe.unifiedinstaller = 4.0a1
PasteDeploy = 1.3.3

[omelette]
recipe = collective.recipe.omelette
eggs = ${instance:eggs}
packages = ./

[test]
recipe = zc.recipe.testrunner
eggs = example.project
extra-paths =
defaults = ['--exit-with-status', '--auto-color', '--auto-progress']
Simon Willison
Integrate Tornado in Django
Integrate Tornado in Django. A handy ./manage.py runtornado management command for firing up a Tornado server that serves your Django application.
Geert Vanderkelen
Python, oursql and MacOS X 10.6 (Snow Leopard)
This post explains how to compile oursql and install it on MacOS 10.6. oursql is a Python database interface for MySQL, an alternative to MySQL for Python (i.e. MySQLdb) and MySQL Connector/Python.
First, find out which MySQL you installed. This can be either the 32-bit or the 64-bit version. To make sure, find the mysqld (e.g. in /usr/local/mysql/bin) and do the following in a Terminal window:
shell> file /usr/local/mysql/bin/mysqld
.../mysqld: Mach-O 64-bit executable x86_64
If you see x86_64, you have a 64-bit build; otherwise it is 32-bit. If you see both, then you have a universal build. This is important for specifying ARCHFLAGS when building.
Download oursql from Launchpad and unpack it into some directory. Using the information from above, do the following in a Terminal window for a 64-bit platform (or universal build):
shell> ARCHFLAGS="-arch x86_64" python setup.py build
shell> sudo python setup.py install
For 32-bit, you'll have to do:
shell> ARCHFLAGS="-arch i386" python setup.py build
shell> sudo python setup.py install
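The choice between the two can be written down as a small Python helper. This is only a sketch of the rule described above; `archflags_from_file_output` is an invented name, and it assumes you pass it the text printed by the `file` command.

```python
def archflags_from_file_output(output):
    """Pick an ARCHFLAGS value from the text printed by `file .../mysqld`."""
    if "x86_64" in output:
        # 64-bit and universal builds both use the x86_64 flag in this post.
        return "-arch x86_64"
    return "-arch i386"


flags = archflags_from_file_output(".../mysqld: Mach-O 64-bit executable x86_64")
```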
The following error is reported when you don't specify the correct ARCHFLAGS:
ld: warning: in .../lib/libmysqlclient.dylib,
file is not of required architecture
Tips:
- If the build fails, it is best to remove the oursql directory, unpack the archive again and retry.
- If you don't want to compile anything, or run into more troubles, give MySQL Connector/Python a try (alpha releases). It's a pure Python implementation of the MySQL Client/Server protocol and doesn't need compiling or a MySQL installation.
- You can download MySQL from either www.mysql.com or dev.mysql.com.
Geek Scrap
Integrate Tornado in Django
Tornado is a nice Python WSGI-compliant web server developed by the folks at FriendFeed. It’s primarily intended as an application server for Python web frameworks and, according to FriendFeed’s benchmarks, it’s blazing fast thanks to its non-blocking connections. There are already some how-tos on the web on plugging the Django web framework into the Tornado web server. A quick recap:
- A tutorial on Tornado, Django and nginx by Jeremy Bowers.
- How to import django framework inside a Tornado project by Lincoln Loop.
- A snippet by lawgon.
My approach is slightly different as I wanted to run Tornado using Django management command-line interface.
The 3 easy steps are:
- Add Tornado module to your django setup. If you use buildout, add Tornado git checkout to buildout.cfg using minitage.recipe.fetch recipe, like this:
[buildout]
...
parts =
    ...
    tornado
    django
    ...

[tornado]
recipe = minitage.recipe.fetch
urls = git://github.com/facebook/tornado.git | git | | ${buildout:parts-directory}/tornado

[django]
recipe = minitage.recipe.scripts
initialization =
    import os
    os.environ['DJANGO_SETTINGS_MODULE']='project.settings.development'
scripts = django
eggs =
    Django
    ...
entry-points=
    django=django.core.management:execute_from_command_line
extra-paths =
    ${buildout:directory}
    ${tornado:location}
    ...
- Next, create a command-line extension hierarchy in your project’s main app:
$ mkdir project/myapp/management
$ touch project/myapp/management/__init__.py
$ mkdir project/myapp/management/commands
$ touch project/myapp/management/commands/__init__.py
- Last, add a runtornado.py script in project/myapp/management/commands/ folder with the following content:
from django.core.management.base import BaseCommand, CommandError
from optparse import make_option
import os
import sys


class Command(BaseCommand):
    option_list = BaseCommand.option_list + ()
    help = "Starts a Tornado web server."
    args = '[optional port number, or ipaddr:port]'

    def handle(self, addrport='', *args, **options):
        import django
        from django.core.handlers.wsgi import WSGIHandler
        from tornado import httpserver, wsgi, ioloop

        # Reopen stdout and stderr unbuffered so output shows up immediately.
        sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
        sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 0)

        if args:
            raise CommandError('Usage is runtornado %s' % self.args)

        # Accept either "port" or "ipaddr:port", like runserver does.
        if not addrport:
            addr = ''
            port = '8000'
        else:
            try:
                addr, port = addrport.split(':')
            except ValueError:
                addr, port = '', addrport
        if not addr:
            addr = '127.0.0.1'

        if not port.isdigit():
            raise CommandError("%r is not a valid port number." % port)

        quit_command = (sys.platform == 'win32') and 'CTRL-BREAK' or 'CONTROL-C'

        def inner_run():
            from django.conf import settings
            print "Validating models..."
            self.validate(display_num_errors=True)
            print "\nDjango version %s, using settings %r" % (django.get_version(), settings.SETTINGS_MODULE)
            print "Server is running at http://%s:%s/" % (addr, port)
            print "Quit the server with %s." % quit_command

            # Wrap the Django WSGI application in Tornado's WSGI container.
            application = WSGIHandler()
            container = wsgi.WSGIContainer(application)
            http_server = httpserver.HTTPServer(container)
            http_server.listen(int(port), address=addr)
            ioloop.IOLoop.instance().start()

        inner_run()
To run your tornado webserver, you just need to call your usual management program like manage.py with runtornado command, with the same syntax as runserver. In my case, I just run production server using supervisord, with a command like this:
$ ./bin/django runtornado --settings=project.settings.production 8000
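The argument handling in the command mirrors runserver's "port or ipaddr:port" syntax. As a rough standalone sketch in Python 3 (parse_addrport is a hypothetical helper name, not part of Django or Tornado, and it raises ValueError where the command raises CommandError), the parsing amounts to this:

```python
def parse_addrport(addrport="", default_addr="127.0.0.1", default_port="8000"):
    """Split 'port' or 'ipaddr:port' into an (address, port) pair."""
    if not addrport:
        addr, port = "", default_port
    else:
        try:
            addr, port = addrport.split(":")
        except ValueError:
            # Not exactly one colon: treat the whole argument as the port.
            addr, port = "", addrport
    if not addr:
        addr = default_addr
    if not port.isdigit():
        raise ValueError("%r is not a valid port number." % port)
    return addr, int(port)


print(parse_addrport())                # ('127.0.0.1', 8000)
print(parse_addrport("8080"))          # ('127.0.0.1', 8080)
print(parse_addrport("0.0.0.0:8000"))  # ('0.0.0.0', 8000)
```

Binding to 0.0.0.0 instead of the default 127.0.0.1 is what you would pass if a front-end proxy on another host needs to reach the server.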
If you found this quick how-to useful, remember to follow me on Twitter or subscribe to my feed for more django tips.
Virgil Dupras
Embedded PyObjC
When people think of a PyObjC application, they usually think of a Python application that uses Objective-C libraries. However, it's also possible to do the opposite: An Objective-C application that embeds Python code through a plugin. Building an application this way has advantages (speed, integration and memory usage) and should be used more often. This article explains why and how to achieve this.
Eli Bendersky
Removing epsilon productions from context free grammars
Background
Epsilon productions are very useful to express many grammars in a compact way. For example, take these simple function call productions in some imaginary C-like language:
func_call:: identifier '(' arguments_opt ')'
arguments_opt:: arguments_list | eps
arguments_list:: argument | argument ',' arguments_list
When composing grammars by hand, simplicity matters. It’s very useful to be able to look at arguments_opt and know that it’s an optional list of arguments. The same non-terminal can be reused in several other productions.
However, epsilon productions pose a problem for several algorithms that act on grammars. Therefore, prior to running these algorithms, epsilon productions have to be removed. Fortunately, this can be done relatively effortlessly in an automatic way.
Here I want to present an algorithm and a simple implementation for epsilon production removal.
The algorithm
Intuitively, it’s quite simple to remove epsilon productions. Consider the grammar for function calls presented above. The arguments_opt nonterminal in func_call is just a short way of saying that there is either an argument list inside those parens or nothing. In other words, it can be rewritten as follows:
func_call:: identifier '(' arguments_opt ')'
| identifier '(' ')'
arguments_opt:: arguments_list
arguments_list:: argument | argument ',' arguments_list
This duplication of productions for func_call will have to be repeated for every other production that had arguments_opt in it. This grammar looks somewhat strange, as arguments_opt is now identical to arguments_list. It is correct, however.
A more interesting case occurs when the epsilon production is in a nonterminal that appears more than once in some other production [1]. Consider:
B:: A z A
A:: a | eps
When we remove the epsilon production from A, we have to duplicate the productions that have A in them, but the production for B has two occurrences of A. Since either of them can be empty, the only proper way to do this is to go over all the combinations:
B:: z | A z | z A | A z A
A:: a
In the general case, if A appears k times in some production, this production will be replicated 2^k times, each time with a different combination [2].
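To make the 2^k replication concrete, here is a minimal Python 3 sketch that expands a single production. The function name expand_production and the use of itertools.product are choices made for this illustration only; each occurrence of the nonterminal contributes a "keep it or drop it" choice, and the product of those choices gives all the combinations.

```python
from itertools import product


def expand_production(prod, nt):
    """Return every variant of `prod` with each occurrence of `nt` kept or dropped."""
    # Each symbol offers a tuple of choices: an occurrence of `nt` may be
    # kept ([s]) or omitted ([]); every other symbol is always kept.
    choices = [([s], []) if s == nt else ([s],) for s in prod]
    return [
        [sym for part in combo for sym in part]
        for combo in product(*choices)
    ]


variants = expand_production(["A", "z", "A"], "A")
for v in variants:
    print(" ".join(v))
```

For B:: A z A this yields the four alternatives A z A, A z, z A and z. A production consisting only of occurrences of A would also yield the empty list, which is a new epsilon production.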
This leads us to the algorithm:
- Pick a nonterminal A with an epsilon production
- Remove that epsilon production
- For each production containing A: Replicate it 2^k times where k is the number of A instances in the production, such that all combinations of A being there or not will be represented.
- If there are still epsilon productions in the grammar, go back to step 1.
A couple of points to pay attention to:
- It’s obvious that a step of the algorithm can create new epsilon productions [3]. This is handled correctly, as it works iteratively until all epsilon productions are removed.
- The only place where an epsilon production cannot be removed is at the start symbol. If the grammar can generate the empty string, removing that production would change the language, so the start symbol has to be handled as a special case.
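One way to find out whether the start symbol can derive the empty string at all is to compute the set of nullable nonterminals with a fixed-point iteration. This is a common textbook technique rather than part of the algorithm described here; the function name and the dict-of-lists grammar representation below are assumptions made for this Python 3 sketch.

```python
def nullable_nonterminals(prods):
    """Return the set of nonterminals that can derive the empty string.

    `prods` maps a nonterminal to a list of productions, each a list of
    symbols; an empty list stands for an eps-production.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in prods.items():
            if lhs in nullable:
                continue
            # all() is True for an empty production, so eps-productions
            # make their left-hand side nullable right away.
            if any(all(s in nullable for s in alt) for alt in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


grammar = {
    "S": [["B", "B"]],
    "B": [["b"], ["A"]],
    "A": [["a"], []],
}
print(sorted(nullable_nonterminals(grammar)))
```

Here A is nullable directly, B through A, and S through B B, so a grammar starting at S generates the empty string and its start symbol needs the special treatment.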
Implementation
Here’s an implementation of this algorithm in Python:
from collections import defaultdict


class CFG(object):
    def __init__(self):
        self.prod = defaultdict(list)
        self.start = None

    def set_start_symbol(self, start):
        """ Set the start symbol of the grammar.
        """
        self.start = start

    def add_prod(self, lhs, rhs):
        """ Add production to the grammar. 'rhs' can
            be several productions separated by '|'.
            Each production is a sequence of symbols
            separated by whitespace.
            Empty strings are interpreted as an eps-production.

            Usage:
                grammar.add_prod('NT', 'VP PP')
                grammar.add_prod('Digit', '1|2|3|4')

                # Optional Digit: digit or eps
                grammar.add_prod('Digit_opt', 'Digit |')
        """
        # The internal data-structure representing productions.
        # maps a nonterminal name to a list of productions, each
        # a list of symbols. An empty list [] specifies an
        # eps-production.
        #
        prods = rhs.split('|')
        for prod in prods:
            self.prod[lhs].append(prod.split())

    def remove_eps_productions(self):
        """ Removes epsilon productions from the grammar.

            The algorithm:

            1. Pick a nonterminal p_eps with an epsilon production
            2. Remove that epsilon production
            3. For each production containing p_eps, replace it
               with several productions such that all the
               combinations of p_eps being there or not will be
               represented.
            4. If there are still epsilon productions in the
               grammar, go back to step 1

            The replication can be demonstrated with an example.
            Suppose that A contains an epsilon production, and
            we've found a production B:: [A, k, A]

            Then this production of B will be replaced with these:
            [A, k], [k], [k, A], [A, k, A]
        """
        while True:
            # Find an epsilon production
            #
            p_eps, index = self._find_eps_production()

            # No epsilon productions? Then we're done...
            #
            if p_eps is None:
                break

            # Remove the epsilon production
            #
            del self.prod[p_eps][index]

            # Now find all the productions that contain the
            # nonterminal whose epsilon production was removed.
            # For each such production, replicate it with all
            # the combinations of that nonterminal.
            #
            for lhs in self.prod:
                prods = []

                for lhs_prod in self.prod[lhs]:
                    num_p_eps = lhs_prod.count(p_eps)
                    if num_p_eps == 0:
                        prods.append(lhs_prod)
                    else:
                        prods.extend(self._create_prod_combinations(
                            prod=lhs_prod,
                            nt=p_eps,
                            count=num_p_eps))

                # Remove duplicates
                #
                prods = sorted(prods)
                prods = [prods[i] for i in xrange(len(prods))
                                  if i == 0 or prods[i] != prods[i-1]]

                self.prod[lhs] = prods

    def _find_eps_production(self):
        """ Finds an epsilon production in the grammar. If such
            a production is found, returns the pair (lhs, index):
            the name of the non-terminal that has an epsilon
            production and its index in lhs's list of productions.
            If no epsilon productions were found, returns the
            pair (None, None).

            Note: eps productions in the start symbol will be
            ignored, because we don't want to remove them.
        """
        for lhs in self.prod:
            if self.start is not None and lhs == self.start:
                continue

            for i, p in enumerate(self.prod[lhs]):
                if len(p) == 0:
                    return lhs, i

        return None, None

    def _create_prod_combinations(self, prod, nt, count):
        """ prod:
                A production (list) that contains at least one
                instance of 'nt'
            nt:
                The non-terminal which should be replicated
            count:
                The number of times 'nt' appears in 'prod'.
                Assumed to be >= 1

            Returns the generated list of productions.
        """
        # The combinations are a kind of a powerset. Membership
        # in a powerset can be checked by using the binary
        # representation of a number.
        # There are 2^count possibilities in total.
        #
        numset = 1 << count
        new_prods = []

        for i in xrange(numset):
            nth_nt = 0
            new_prod = []

            for s in prod:
                if s == nt:
                    # Bit number nth_nt of i decides whether this
                    # occurrence of nt is kept.
                    if i & (1 << nth_nt):
                        new_prod.append(s)
                    nth_nt += 1
                else:
                    new_prod.append(s)

            new_prods.append(new_prod)

        return new_prods
And here are the results with some of the sample grammars presented earlier in the article:
cfg = CFG()
cfg.add_prod('func_call', 'identifier ( arguments_opt )')
cfg.add_prod('arguments_opt', 'arguments_list | ')
cfg.add_prod('arguments_list', 'argument | argument , arguments_list')
cfg.remove_eps_productions()
for p in cfg.prod:
    print p, ':: ', [' '.join(pr) for pr in cfg.prod[p]]
Produces:
func_call :: ['identifier ( )', 'identifier ( arguments_opt )']
arguments_list :: ['argument', 'argument , arguments_list']
arguments_opt :: ['arguments_list']
As expected. And:
cfg = CFG()
cfg.add_prod('B', 'A z A')
cfg.add_prod('A', 'a | ')
cfg.remove_eps_productions()
for p in cfg.prod:
    print p, ':: ', [' '.join(pr) for pr in cfg.prod[p]]
Produces:
A :: ['a']
B :: ['A z', 'A z A', 'z', 'z A']
The implementation isn’t tuned for efficiency, but for simplicity. Luckily, CFGs are usually small enough to make the runtime of this implementation manageable. Note that the preservation of epsilon productions in the start rule is implemented in the _find_eps_production method.

[1] From here on, lowercase letters early in the alphabet (a, b, c…) are terminals. Early uppercase letters (A, B, C…) are nonterminals, and letters late in the alphabet (z, y, x…) are arbitrary strings of terminals and nonterminals.
[2] If this sounds like generating a powerset, you’re right.
[3] Consider the productions:
    A:: a | eps
    B:: b | A
    After removing the epsilon production from A we’ll have:
    A:: a
    B:: b | A | eps
Carl Trachte
Handling UnicodeEncodeError in the Console (Python 3.1)
I've been working with a lot of different foreign scripts for the past six months or so. Ideally I like to work in the console where possible. An error that always comes up is the following:
[carl@pcbsd]/home/carl(139)% python3.1
Python 3.1.1 (r311:74480, Jan 17 2010, 23:15:26)
[GCC 4.2.1 20070719 [FreeBSD]] on freebsd7
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u0400')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u0400' in position 0: ordinal not in range(128)
>>>
After a while this can get pretty annoying. There are a number of ways to get around the problem. I don't know much about most of the languages I'm dealing with, so I prefer the Unicode code charts' capitalized ASCII descriptions to glyphs or empty boxes. Fortunately, the unicodedata module has all this information available.
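As a small Python 3 illustration of the lookup involved: unicodedata.name() returns the code chart name of a character, and its optional second argument is a fallback for characters that have no name. The helper name describe_char is made up for this sketch.

```python
import unicodedata


def describe_char(ch):
    """Return the Unicode name of ch, or its code point if it has no name."""
    return unicodedata.name(ch, "U+%04X" % ord(ch))


for ch in "\u0401\u65e5\n":
    print(describe_char(ch))
```

The newline in the sample string is a control character without a name, so it falls back to the U+000A form instead of raising ValueError.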
To get the output I wanted I came up with a little script:
# mockprint.py - wrapper around print
#                function to handle
#                UnicodeEncoding errors
# python 3.1

import unicodedata

ERRORSTR = "'ascii' codec can't encode character "
CHARIDX = 5
POSITIDX = 8
POSITIDX2 = 7

def mockprint(stringx):
    """
    Wrapper for print() function that
    replaces unprintable characters
    with their Unicode names.
    """
    try:
        print(stringx)
    except UnicodeEncodeError as e:
        # main cases:
        #     1) one character can't be printed
        #     2) multiple characters in a row can't be printed
        #     3) unicode character is first or last in string
        #     4) other ascii characters surround the unicode ones
        reasonx = str(e)
        reasonx = reasonx.split(' ')
        idx = reasonx[POSITIDX]
        # more than 1 char in a row can't be printed
        if idx == 'ordinal':
            idx = int(reasonx[POSITIDX2][0])
            if idx != 0:
                print(stringx[:idx])
            print(unicodedata.name(stringx[idx]))
            mockprint(stringx[(idx + 1):])
        # offending character shows up after ascii chars
        elif len(stringx) > 1:
            charx = int(reasonx[CHARIDX][3:-1], 16)
            charx = chr(charx)
            print(unicodedata.name(charx))
            mockprint(stringx[(int(idx[0]) + 1):])
        # end of the line
        elif len(stringx) == 1:
            charx = int(reasonx[CHARIDX][3:-1], 16)
            charx = chr(charx)
            print(unicodedata.name(charx))
A quick demo:
>>> import mockprint
>>> mockprint.mockprint('hello\u0401\u0402\u0403\u0404world')
hello
CYRILLIC CAPITAL LETTER IO
CYRILLIC CAPITAL LETTER DJE
CYRILLIC CAPITAL LETTER GJE
CYRILLIC CAPITAL LETTER UKRAINIAN IE
world
And something a bit more challenging:
>>> import mockprint
>>> for linex in fle.readlines():
...     mockprint.mockprint(linex)
...
CJK UNIFIED IDEOGRAPH-65E5
CJK UNIFIED IDEOGRAPH-672C
CJK UNIFIED IDEOGRAPH-8A9E
abcde
ETHIOPIC SYLLABLE GLOTTAL A
ETHIOPIC SYLLABLE MAA
ETHIOPIC SYLLABLE RE
ETHIOPIC SYLLABLE NYAA
ARMENIAN CAPITAL LETTER HO
ARMENIAN SMALL LETTER AYB
ARMENIAN SMALL LETTER YI
ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LETTER REH
ARMENIAN SMALL LETTER ECH
ARMENIAN SMALL LETTER NOW
ORIYA LETTER O
ORIYA LETTER DDA
ORIYA SIGN NUKTA
ORIYA VOWEL SIGN I
ORIYA LETTER AA
LAO LETTER PHO TAM
LAO VOWEL SIGN AA
LAO LETTER SO SUNG
LAO VOWEL SIGN AA
LAO LETTER LO LOOT
LAO VOWEL SIGN AA
LAO LETTER WO
CYRILLIC SMALL LETTER ER
CYRILLIC SMALL LETTER U
CYRILLIC SMALL LETTER ES
CYRILLIC SMALL LETTER ES
CYRILLIC SMALL LETTER KA
CYRILLIC SMALL LETTER I
CYRILLIC SMALL LETTER SHORT I
CYRILLIC SMALL LETTER YA
CYRILLIC SMALL LETTER ZE
CYRILLIC SMALL LETTER YERU
CYRILLIC SMALL LETTER KA
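The script above gets the failing position by splitting the text of the error message. UnicodeEncodeError also carries that information as the attributes start and end, so a variant could read them directly. What follows is only a sketch under a few assumptions: it is Python 3, the name describe is hypothetical, and it encodes to ASCII explicitly and returns the lines instead of printing to the console, so it behaves the same whatever the terminal encoding is.

```python
import unicodedata


def describe(text, encoding="ascii"):
    """Return output lines: encodable runs as-is, other characters by name."""
    lines = []
    while text:
        try:
            text.encode(encoding)
        except UnicodeEncodeError as e:
            # e.start and e.end delimit the run that could not be encoded.
            if e.start > 0:
                lines.append(text[:e.start])
            for ch in text[e.start:e.end]:
                lines.append(unicodedata.name(ch, "U+%04X" % ord(ch)))
            text = text[e.end:]
        else:
            # The remainder encodes cleanly.
            lines.append(text)
            break
    return lines


print("\n".join(describe("hello\u0401\u0402\u0403\u0404world")))
```

On the string from the quick demo this produces the same six lines, from hello through world.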
Ned Batchelder
Test classes, singular or plural?
A minor hiccup in writing unit tests is how to name the classes that contain them. The jUnit style of test class, which has been adopted by virtually everyone, including Python's unittest module, is that tests are methods of a class. The class is instantiated once for each test method, then three methods are called: setUp, the test method, and tearDown.
As a result, you end up with test classes that look like this:
# Tests for double_it, and no, no one would write them this way...
class DoubleItTests(unittest.TestCase):
    def test_ten(self):
        assert double_it(10) == 20

    def test_twenty(self):
        assert double_it(20) == 40
Here I've named the class DoubleItTests, plural. That's because I can see that it's a container for a number of tests. This feels right if you think about the class simply as a namespace for the test methods.
But what is instantiated from the class? Only single tests. In this case, the class will be instantiated twice, once to run test_ten, and once to run test_twenty. The class's name should really be the name of the objects. No one would name their user class Users under the theory that the class encompasses a number of users.
So the test class should really be called DoubleItTest, which I guess fits in with the unittest.TestCase base class it derives from. But somehow it just looks wrong.
This is reminiscent of the SQL table naming dilemma. Is it the CUSTOMER table, or the CUSTOMERS table? How you feel about it seems to come down to whether you think natively in SQL, or whether it's just a backing store for your ORM.
I'm getting used to the singular test class name, but it still doesn't come naturally; I have to remind myself to leave off those tempting plurals.
Michael Foord
A Little Bit of Python Episode 4: A Pre-PyCon Special
A Little Bit of Python is an occasional podcast on Python related topics with myself, Brett Cannon, Jesse Noller, Steve Holden and Andrew Kuchling. The website is in progress and apparently nearly ready, thanks to Jesse and various other people who we will thank as soon as it is done. ... [233 words]
February 07, 2010
Michael Foord
ConfigObj 4.7.1 (and how to test warnings)
I hate doing releases. I haven't managed to automate the whole process (I should probably work on that), although setup.py sdist upload certainly helps. ... [290 words]
Discover 0.3.2 and the load_tests protocol
discover is a test discovery module for the standard library unittest test framework. Test discovery is built into unittest in Python 2.7 and 3.2. ... [335 words]
Roberto Alsina
Marave 0.2 is out!
Version 0.2 of Marave, a distraction-free fullscreen editor is out at http://marave.googlecode.com
This version includes several bug fixes and new features since 0.1.1:
- A corrupted Right-click menu (Issue 20)
- Flickering on background changes
- More detailed licensing information
- More tested on Windows
- Added help (F1)
- Search & Replace (but replace all is not done)
- New artwork
- Status notifications
- Document Info (Ctrl+I)
- Better feedback in the UI elements (especially the buttons)
- Save font size correctly
- Fix "Starts in the background" problem (Issue 17)
Marave is free software released under the GPL, and should work on all major desktop platforms.
I would love feedback on this release, as well as ideas for Marave's future, so a mailing list for Marave has been opened:
http://groups.google.com/group/marave-discuss
Of course, if you like Marave, feel free to give me money
Python User Groups
pyCologne Python User Group, Cologne, Germany, January, 10th, Announcement
The next meeting of pyCologne will take place
Wednesday, February, 10th
starting about 6.30 pm - 6.45 pm
at Room 0.14, Benutzerrechenzentrum (RRZK-B)
University of Cologne, Berrenrather Str. 136, 50937 Köln, Germany
Agenda:
- editmoin (Reimar Bauer)
- Using MoinMoin-Templates (Reimar Bauer)
- Further discussion topics, news, book presentations etc. are welcome at each of our meetings!
Further information, including directions to the location, can be found at:
http://www.pycologne.de (Sorry, this page is in German only)
Geert Vanderkelen
FOSDEM: 'Connecting MySQL and Python', handout & wrap-up
Apparently, my talk at FOSDEM 2010 about Connecting MySQL and Python was the only one about Python? There should be more, shouldn't there?
I have a handout ready in PDF; it contains a few examples and links. The slides are not usable without my chatter. Any comments, corrections, or criticism are welcome!
The longer version of this talk will be given at the O'Reilly MySQL Conference & Expo 2010 in Santa Clara, California (USA).
Noah Gift
Funniest Quote About Python in 2009
http://www.simple-talk.com/opinion/geek-of-the-week/interview-with-the-scary-dba-–-grant-fritchey/
"What do you see as the future of automating database administration? Will Powershell come to rule all or do you think Python will still have its fans?
GF:
PowerShell is going to take over. Microsoft is positioning it across all of its platforms and its various products such as Exchange and Operations Manager.
While there's always going to be other languages used, like Python or Perl, they're going to be marginalized under a PowerShell juggernaut.
"
Powershell, seriously?

