A. Jesse Jiryu Davis

My PyCon Lightning Talk About Toro

The lightning talk I gave at PyCon is now online. I talked for 4½ minutes on Toro, the package I wrote to provide locks, events, conditions, semaphores, and queues for Tornado. Watch for a quick intro on advanced control flow with coroutines:

reStructuredText in PyCharm, Firefox, and Anger

I spend a lot of time writing Python package documentation in reST. Nevertheless, I find reST's markup permanently unlearnable, so I format docs by trial and error: I type a few backticks and colons and angle-brackets and random crap, sphinx-build the docs as HTML, and see if they look okay.

Here are some tools to support this expert workflow.

PyCharm: My favorite Python IDE has basic syntax-highlighting and auto-completion for reST. It's not much, but it far exceeds the amount of reStructuredText syntax that can fit in my tiny brain. It really shines when I'm embedding Python code examples in my docs: PyCharm gives me full IDE support, including automatically adding imports, auto-completing method names and parameters, and nearly all the help I get when editing normal Python files.

There's a file-watcher plugin for PyCharm that seems like a nice way to rebuild docs when the source files change, but it's not yet compatible with the latest version of PyCharm. So instead:

Watchdog: I install the watchdog Python package, which watches files and directories for changes. Watchdog gives me a command-line tool called watchmedo. (I find this fact unlearnable, too; why isn't the tool called watchdog, the same as the package?) I tell it to watch my package's files for changes and rebuild the docs whenever I save a file:

watchmedo shell-command --command="sphinx-build doc build" .
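For the curious, the loop that a watch-and-rebuild tool runs can be approximated in a few lines of standard-library Python. This is only a sketch under assumptions: it polls the directory tree instead of using the OS file notifications that watchdog relies on, and the names snapshot, changed_paths, and watch are made up for illustration.

```python
import os
import subprocess
import time


def snapshot(root):
    """Map every file path under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.stat(path).st_mtime_ns
            except OSError:
                pass  # File vanished between listing and stat; skip it.
    return mtimes


def changed_paths(before, after):
    """Return paths that were added, removed, or modified between snapshots."""
    return sorted(
        path
        for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )


def watch(root, command, interval=1.0):
    """Poll root forever, running command whenever something changes."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        if changed_paths(previous, current):
            subprocess.call(command)
        previous = current
```

Calling watch("doc", ["sphinx-build", "doc", "build"]) would rebuild whenever a file under doc changes. Watching only the source directory, rather than the directory that also holds the build output, keeps the build from retriggering itself.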

Now that I can regenerate HTML automatically, I need a way to reload the browser window automatically:

auto-reload is a Firefox extension that detects any tab with a file:// URL and reloads it when the file changes. In my testing it seems to detect changes in linked files (CSS and JavaScript) too. A nice little bar slides down to tell me when it's reloading. That way I know that the page is still a mess because my reST is still wrong, not because it hasn't reloaded:

Auto reload

This little suite of tools deals well with invoking Sphinx and reloading my web page, so I can focus on the task at hand: trying to write reStructuredText, which is a loathsome afterbirth expelled from the same womb as XML and TeX.

Begging

I periodically spend four days homeless, with a Zen teacher named Genro and a small group of fellow Buddhists. We live, sleep, and meditate on the streets together and eat at soup kitchens. I think the retreat has a triple purpose: First, briefly abandoning the comfort and certainty of my regular life helps me practice non-attachment, the same as it helped the first Buddhist monks. Second, it gives me a taste of what it's like to be homeless, so I can better understand the homeless people I meet in NYC. And finally, it's an opportunity to raise money for homeless services.

The rule is that I must raise $500 by May 2. The money will be distributed among the organizations that help us while we're on the street, and it will support the social service activities of the Hudson River Peacemaker Center. I have to beg for the money—I'm not allowed to just donate $500 of my own.

So I'm begging you: Will you please donate?

Update: I've now (April 16) exceeded my minimum, with $803. But donate anyway! Additional funds are divided the same as the first $500.

Shuso Hossen, Spring 2013

Two weeks ago the Village Zendo completed a week-long urban sesshin focused on our awareness of disabilities. We were blindfolded for part of one day, and wore earplugs for part of another. The retreat ended with the Shuso Hossen ceremony, in which R. Liam Oshin Jennings gave his first dharma talk.

Photos: Oshin's Shuso Hossen ceremony

Review of "MongoDB Applied Design Patterns" by Rick Copeland

There's a lot of bad advice out there regarding MongoDB. As I wrote in my last review, even smart sources can encourage risky methods. Soon, I hope, there will be as much good MongoDB instruction from experts outside 10gen as there is good third-party SQL instruction. For now, know that you can trust Rick Copeland.

Copeland's new O'Reilly book on MongoDB complements O'Reilly's other five: the majestic Definitive Guide (due for a second edition in June), Scaling MongoDB, 50 Tips and Tricks, and the MongoDB books for Python and PHP.

After you've read the Definitive Guide, a good candidate for your second MongoDB book is Applied Design Patterns. (Disclosure: I was paid to critique an early draft.) Copeland's intended audience has basic MongoDB competence and wants application examples that optimize either for scalability or maintainability, plus the principles to guide new designs. Copeland also assumes basic SQL knowledge, and presents most examples in contrast to conventional SQL solutions, a method I find distracting and irrelevant. He identifies some common application types (product catalog, CMS, analytics, etc.) and provides for each a schema and application logic. He goes far beyond prior works when he discusses performance, consistency guarantees, and sharding considerations for every application.

MongoDB Applied Design Patterns

In Part 1, Copeland discusses the basic questions about MongoDB schemas. Right away, he identifies what makes nonrelational design different:

There is no longer a "garden path" of normalized database design to go down, and the go-to answer when faced with general schema design problems in MongoDB is "it depends".

MongoDB requires optimization up front, more often than SQL schema design does. (Armin Ronacher noticed this too a few months ago.) Most often the question is whether to embed or to link, and what data should be normalized or denormalized. Copeland uses an extensive description of disk seek times to explain the motivations for embedding and denormalization, better than prior MongoDB schema-design materials have.

Many presentations, my own included, have claimed that you can migrate your schema lazily with MongoDB: your application can start writing data in a new format, and read data in both new and old formats, while a batch job slowly migrates old data. MongoDB Applied Design Patterns finally presents a complete example of lazy migration, including example code (in Python) for reading data in both formats while the migration is in progress.
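To make the idea concrete, here is a minimal sketch of lazy migration. It is not Copeland's code: the schema is hypothetical (version 1 stores a single name string, version 2 stores first_name and last_name plus a schema_version field), and a plain Python list stands in for a MongoDB collection.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return doc in the newest format, converting it if it is old."""
    if doc.get("schema_version", 1) >= CURRENT_VERSION:
        return doc
    first, _, last = doc["name"].partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update(
        first_name=first, last_name=last, schema_version=CURRENT_VERSION
    )
    return upgraded


def display_name(doc):
    """Application read path: works on documents in either format."""
    user = upgrade(doc)
    return (user["first_name"] + " " + user["last_name"]).strip()


def migrate_batch(collection, limit):
    """Background job: rewrite up to `limit` old documents in place."""
    migrated = 0
    for i, doc in enumerate(collection):
        if migrated == limit:
            break
        if doc.get("schema_version", 1) < CURRENT_VERSION:
            collection[i] = upgrade(doc)
            migrated += 1
    return migrated
```

The application calls display_name on whatever it reads, so it does not care how far the batch job has progressed. Once migrate_batch reports zero documents migrated, the old-format branch in upgrade can be deleted.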

Without general-purpose transactions, MongoDB requires new techniques to guarantee that a series of changes is atomic: that is, to guarantee that in the long run your data either reflects all the changes or none of them. The simple approach is to put all related data in one document and use update operators to modify all the data in one shot. If there's no way to restrict your atomic operation to one document, your next best bet is optimistic concurrency control: try to complete the operation, check if another process overwrote your changes, and if so retry them. There are a number of examples of this in the wild (the MongoDB Manual, Dan Crosta, Scott Hernandez); Copeland's contribution is unusually complete, with example code for handling every case that can arise.
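The try, check, retry loop of optimistic concurrency control is easier to see in code. The sketch below is my own illustration, not Copeland's: VersionedStore is a hypothetical in-memory stand-in for a collection whose documents carry a version number. In MongoDB the equivalent check is usually an update whose query matches both the _id and the version you read.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a collection whose documents carry a version."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # Makes each single call atomic.

    def insert(self, key, value):
        with self._lock:
            self._docs[key] = (1, value)

    def read(self, key):
        with self._lock:
            return self._docs[key]  # A (version, value) pair.

    def replace_if_version(self, key, expected_version, new_value):
        """Write only if nobody else has written since we read."""
        with self._lock:
            version, _ = self._docs[key]
            if version != expected_version:
                return False
            self._docs[key] = (version + 1, new_value)
            return True


def update_with_retry(store, key, change, max_attempts=10):
    """Apply change(value) optimistically, retrying if another writer won.

    Returns the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        version, value = store.read(key)
        if store.replace_if_version(key, version, change(value)):
            return attempt
    raise RuntimeError("gave up after %d conflicting writes" % max_attempts)
```

No lock is held between the read and the write, which is the point: a conflict costs one retry, and in the common case where nobody else is writing there is no waiting at all.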

Part 2 of the book is much longer, and covers six kinds of application in depth, both conventional (a social network) and unusual (a role-playing game). Here Copeland excels. Where he covers well-trodden ground his designs are more detailed and better thought out than prior authors', and where he innovates he chooses interesting problems to solve. In the Operational Intelligence chapter he explains compound indexes clearly and correctly. He presents a complete design for an analytics application using the MongoDB aggregation framework, and covers the interactions between aggregation, indexes, and sharding.

The final example of the book is an online Zork-style game. This is less widely applicable than E-Commerce or content management, but way more fun. Copeland chooses to radically denormalize his schema: when a player enters a room, the room's entire data structure is copied into the player's document so the game can display the player's state without querying for the room again. As with the other examples, this application is considered in depth: each query is carefully indexed, and when a player picks up an item, Copeland's code prevents another player from picking it up concurrently. Most of the game's intelligence is expressed in Python code rather than in MongoDB queries. Developers using Oracle or Microsoft SQL Server tend to push all the logic and complexity into their schema, their queries, and stored procedures. With MongoDB's simpler feature-set, coders have to move more logic out of the database and into their application. If a SQL refugee hasn't yet learned this lesson, the gaming chapter will drive it home.

Review of "Building Node Applications with MongoDB and Backbone"

Mike Wilson's O'Reilly book from December 2012 introduces some hip web development techniques by building a book-long example of a social networking app. Besides introducing MongoDB, Backbone, and Node, he shows the beauty and remarkable concision of Jade, Require.js, and Mongoose. He demonstrates good patterns for organizing your code in an application of substantial complexity, covers a lot of ground in few pages, and concludes with an unusually feature-complete chat-server example that weaves together all the layers of the stack. Wilson has some dangerous habits readers shouldn't emulate, but on balance his book teaches well.

Building node applications

By necessity, the book jumps frequently between Node and Backbone, models and views, HTML and JavaScript. It's the nature of web development that each new feature requires changes in many places, and it's hard to stay oriented. Wilson maintains a corrected version of each chapter's code on GitHub; use that instead of relying entirely on the examples in the book.

I've built one large front-end JavaScript application with Backbone, and I floundered at organizing it. Although Backbone is rigorous (hence the name) about separating models and views, higher-level questions are underspecified: how should the code be split among files? Whose responsibility is it to create the models and views? Wilson uses Require.js to neatly slice code into files and to declare the dependencies among them. In his example application, the Backbone router is responsible for instantiating all models and views. As the book progresses and his example application grows, the routes, models, and views remain focused and decoupled. It's a compelling design. I wish I'd known.

Wilson spends an early chapter building a login system for his example app, before implementing any features. He even salts his password hashes to defend against rainbow tables. An author less secure in his convictions would fear losing his reader's attention, but Wilson insists on doing the right thing. And rightly so: readers will paste his examples and put them into production, so the examples should be complete.
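For readers who haven't salted a hash before, here is roughly what the technique looks like. This is not Wilson's code (his is in Node); it is a Python sketch that swaps in PBKDF2 from the standard library, and the iteration count is an assumed value rather than a recommendation from the book.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # Assumed work factor; tune for your hardware.


def hash_password(password, salt=None):
    """Return (salt, digest), generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Because every account gets its own random salt, two users with the same password end up with different digests, so a precomputed rainbow table is useless against the stored values. The salt is not secret; it is stored next to the digest.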

On the other hand, Wilson's introduction to MongoDB misses some marks. It's only 12 pages, so why did he spend two of them on MapReduce? MapReduce has always been intended for big batch processes, not web applications. MongoDB books and talks have long over-emphasized MapReduce, which should be confined to a niche. The aggregation framework, on the other hand, is general-purpose and was released months before Wilson's book; it should have been covered instead.

Wilson also shows a MongoDB pattern that risks losing updates and is needlessly slow: When a user adds a contact in his social-networking site, Wilson's code fetches the whole user document, adds the contact, and saves the whole document back:

app.post('/accounts/:id/contact', function(req,res) {
  var accountId = req.params.id;
  var contactId = req.param('contactId', null);

  models.Account.findById(accountId, function(account) {
    models.Account.findById(contactId, function(contact) {
      models.Account.addContact(account, contact);
      account.save();
    });
  });

  // Note: Not in callback - this endpoint returns immediately and
  // processes in the background
  res.send(200);
});

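(I've edited for brevity; the whole code is on GitHub.) Note that if two requests update the same account concurrently, whichever saves first has its change overwritten by the other. MongoDB's $addToSet update operator would have solved this, and would be more efficient too, since only the new contact travels to the server instead of the whole document.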

Equally worrisome is Wilson's tendency to drop errors on the floor instead of reporting them to the user, as shown at the bottom of this function. He argues "we are accepting the small but rare inconvenience in order to serve the majority of requests at an accelerated speed." This is a terrible argument for silencing errors, especially since the front-end framework needn't block the user from interacting with the UI while it waits for the server response.

A book like this seems intended to show best practices, and patterns that encourage correctness. Some of the hardest patterns to learn are error-handling in Node and concurrency control in MongoDB. I wish Wilson had devoted half the attention he placed on security to these two topics.
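The lost update is easy to reproduce without a database. The sketch below is a simulation under assumptions: Accounts is a hypothetical in-memory class, not a MongoDB driver, and its add_to_set method only imitates what an update document like {"$addToSet": {"contacts": contactId}} asks the server to do.

```python
import copy


class Accounts:
    """Tiny in-memory stand-in for a collection of account documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        self._docs[doc["_id"]] = copy.deepcopy(doc)

    def find_by_id(self, _id):
        # Like a driver, hand back a private copy of the stored document.
        return copy.deepcopy(self._docs[_id])

    def save(self, doc):
        # Whole-document overwrite: the last writer wins.
        self._docs[doc["_id"]] = copy.deepcopy(doc)

    def add_to_set(self, _id, field, value):
        # Field-level change applied where the data lives:
        # append only if the value is not already present.
        values = self._docs[_id].setdefault(field, [])
        if value not in values:
            values.append(value)


def fetch_modify_save(accounts):
    """Two interleaved requests using the fetch-whole-document pattern."""
    first = accounts.find_by_id("alice")
    second = accounts.find_by_id("alice")  # Reads before `first` saves.
    first["contacts"].append("bob")
    second["contacts"].append("carol")
    accounts.save(first)
    accounts.save(second)  # Overwrites the document that contained "bob".
    return accounts.find_by_id("alice")["contacts"]


def atomic_add(accounts):
    """The same two requests, each sending only its own change."""
    accounts.add_to_set("alice", "contacts", "bob")
    accounts.add_to_set("alice", "contacts", "carol")
    return accounts.find_by_id("alice")["contacts"]
```

Starting from an account with no contacts, fetch_modify_save leaves only "carol" in the list, while atomic_add keeps both contacts no matter how the two requests interleave.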

But I'm only mad at these flaws because the book they mar is a good one. As Wilson builds up his architecture piece by piece, the patterns appear both usable and elegant, and capable of staying clean as the app grows. Wilson uses Backbone custom named events like "app:loggedin" or "chat:start" to coordinate his front-end code, instead of letting views directly call methods on other views. A novice Backbone user might not see the tremendous value of decoupling views this way, but take it from me—it's a great idea.
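The decoupling is language-independent, so here is the shape of it in Python rather than Backbone. Everything in this sketch is hypothetical (EventBus, ChatView, and ContactListView are illustrative names, not Wilson's classes); only the on/trigger vocabulary and the "chat:start" event name are borrowed from the book's approach.

```python
from collections import defaultdict


class EventBus:
    """Minimal named-event dispatcher."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, name, handler):
        self._handlers[name].append(handler)

    def trigger(self, name, *args):
        # Copy the list so handlers may subscribe others while we iterate.
        for handler in list(self._handlers.get(name, ())):
            handler(*args)


class ChatView:
    """Knows how to open a chat, but not who asks for one."""

    def __init__(self, bus):
        self.open_with = None
        bus.on("chat:start", self.start)

    def start(self, contact):
        self.open_with = contact


class ContactListView:
    """Announces that a contact was clicked, but not who listens."""

    def __init__(self, bus):
        self._bus = bus

    def click(self, contact):
        self._bus.trigger("chat:start", contact)
```

Neither view holds a reference to the other. A third view, say a notification badge, can subscribe to "chat:start" later without either existing class changing.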

The book concludes with a long chat example. Chat examples with Socket.io and Node are legion—indeed, obligatory—but the completeness of this one, including its integration with Backbone, is a tour de force. If you plan to use either Node or Backbone this book has excellent recommendations for structuring a large app, and even if you're not building with any of the frameworks Wilson covers, his examples can inspire you to write more concise and decoupled code.

Slides from my PyCon lightning talk on Toro

Here are the eight slides for my 4½-minute talk on Toro this morning. Toro is a package I wrote last year that provides objects something like locks, events, conditions, semaphores, and queues for Tornado coroutines.
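To give a flavor of what coordinating coroutines with such objects looks like, here is a small producer-consumer sketch. Note the assumption: Toro needs Tornado installed, so this example uses the standard library's asyncio instead, whose Queue and Event play a similar role. It illustrates the style of control flow, not Toro's API.

```python
import asyncio


async def producer(queue, items):
    for item in items:
        await queue.put(item)  # Pauses here while the queue is full.
    await queue.put(None)  # Sentinel: nothing more to come.


async def consumer(queue, results, done):
    while True:
        item = await queue.get()  # Pauses here while the queue is empty.
        if item is None:
            break
        results.append(item * 2)
    done.set()  # Wake anyone waiting for the work to finish.


async def main():
    queue = asyncio.Queue(maxsize=2)
    done = asyncio.Event()
    results = []
    tasks = [
        asyncio.create_task(producer(queue, range(5))),
        asyncio.create_task(consumer(queue, results, done)),
    ]
    await done.wait()
    await asyncio.gather(*tasks)
    return results
```

Running asyncio.run(main()) returns the doubled items in order. The bounded queue applies backpressure: the producer can never get more than two items ahead of the consumer.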

Plop: Python Profiler With Call Graphs

Tornado's maintainer Ben Darnell released a Python Low-Overhead Profiler or "Plop" last year, and I'm just now playing with it. Unlike cProfile, which records every function call at great cost to the running process, Plop promises that "profile collection can be turned on and off in a live process with minimal performance impact."

A Plop Collector samples the process's call stack periodically (every 10 milliseconds by default) until you call Collector.stop(). Plop's profile viewer is a web application built on Tornado and d3.js, which uses a fun force-directed layout to display your process's call graph. You can use the demo scripts from Plop's repo to make an example profile:

Call graph

Functions are shown as circles, sized according to the number of times they were executed and colored according to filename. Edges connect callers to callees. The visualization nearly freezes Firefox but runs well in Chrome.
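The sampling idea is simple enough to sketch with the standard library. To be clear, this is not Plop's implementation, and Sampler, stack_of, and busy are names I made up; the sketch just peeks at one thread's call stack at a fixed interval and counts what it sees, using CPython's sys._current_frames.

```python
import collections
import sys
import threading
import time


def stack_of(frame):
    """Return function names from outermost caller to innermost callee."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(names))


class Sampler:
    """Periodically record the call stack of the thread that created it."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.stacks = collections.Counter()
        self._target = threading.get_ident()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait doubles as an interruptible sleep.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target)
            if frame is not None:
                self.stacks[stack_of(frame)] += 1


def busy(seconds):
    """Burn CPU so the sampler has something to observe."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sum(range(1000))
```

After sampler.start(), some work, and sampler.stop(), the stacks counter holds one entry per distinct call path. Adjacent names in each tuple are the caller-to-callee edges of a call graph, and the counts say how often each path was caught running. The cost is one stack walk per interval, regardless of how many function calls the program makes.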

Plop isn't going to replace cProfile and RunSnakeRun, but that's not its intention. Better to think of it as a lightweight complement to the heavier machinery: Plop is nice for visualizing call graphs (which RunSnakeRun does badly) and for sampling a live process in a performance-critical environment.

Review of Roman Vishniac Rediscovered

Roman Vishniac, Salesmen

Today I saw the International Center of Photography's big retrospective, "Roman Vishniac Rediscovered". The show opens with Vishniac's Berlin street photography from the 1920s and 30s, in which he concentrates on form: shafts of light in a train station; a workman on a diagonal ladder amid diagonal shadows; four boys admiring a motorcycle, all dressed alike. The beauty and the visual coincidences he catches are delightful. The scene darkens as the Nazis rise to power, and the impact of the photos, unfortunately, wanes. Vishniac's photo of his daughter wearing a cute beret, standing in front of a Hitler poster, is ominous, but not particularly good.

Vishniac's most prominent achievement is his photographs of Eastern European Jews in the late 1930s. The project was commissioned by an American Jewish relief fund to highlight the poverty of Jews in Eastern Europe, much in the same way (and at the same time) as the FSA commissioned Dorothea Lange and Walker Evans to photograph the Dust Bowl. ICP displays the work in fine new inkjet prints from Vishniac's negatives, and sometimes shows images Vishniac had originally edited out: Jewish women in secular dress, for example, or a prosperous-looking Jewish shop. The exhibit demonstrates how Vishniac selected his photos to present a narrow view of Jewish life: poor, religious, medieval. When this world was wiped out by the Nazis a few years later, Vishniac's record of it became a twilit elegy, but the work as we've known it is not the whole scene Vishniac saw.

Roman Vishniac, Beggars

Propagandistic, too, are Vishniac's 1939 photographs of a Dutch "agrarian training camp" that prepared Zionist youth for emigration to Palestine. The images are posed, with clear inspiration from Socialist Realism. They're of their time: the age of statism, when individuals everywhere were subsumed in one ideology or another.

Roman Vishniac, Zionist Youth

It makes one nostalgic for the pictures made before all the polemics, when Vishniac was satisfied just to photograph stylish figures in slashing light. Unburdened by any message, these images are light, and the best in the show.

Roman Vishniac, Train Station

Motor 0.1 Migration Instructions

Motor (which is indeed my non-blocking driver for MongoDB and Tornado) had a 0.1 release to PyPI yesterday. It had an odd history prior, so there are various versions of the code that you, dear reader, may have installed on your system. All you need to do is:

$ pip uninstall pymongo motor
$ pip install motor

Motor will pull in the official PyMongo, plus Tornado and Greenlet, as dependencies. You should now have Motor 0.1 and PyMongo 2.4.2:

>>> import pymongo
>>> pymongo.version
'2.4.2'
>>> import motor
>>> motor.version
'0.1'

(The lore is: I started Motor last year in a branch of my fork of PyMongo, so you could've installed an experimental version of both PyMongo and Motor from there. Then we transferred Motor into its own repo within the MongoDB.org organization on January 15. And on February 1st a zealous fan actually grabbed the "Motor" package name on PyPI and uploaded my code to it, then transferred ownership to me, just to make sure I could use the name Motor.)