A Natural Environment

Sunset Park, (c) Google Maps

Sunset Park street view, (c) Google Maps

Darren Aronofsky’s 1998 movie “Pi” is very stupid, but it has an awesome techno soundtrack to which I frequently listen while coding. It interleaves electronica tracks with snippets of narration from the movie. Some of these, as I have said, are very stupid. Exhibit:

If you graph the numbers of any system, patterns emerge. Therefore, there are patterns everywhere in nature. So, what about the stock market?

Actually, if you graph the numbers of a system, sometimes patterns emerge and sometimes they do not. Some systems are deterministic, others stochastic, most are a mix. The stock market is noticeably resistant to merely numeric analysis. However, Aronofsky follows the silly lines above with a stroke of insight. He calls the global economy,

A vast network, screaming with life. A natural organism.

Right. We are natural organisms, after all, and so are our creations, the same way that both ants and their colonies are natural. When a particular creation—the Empire State Building, say, or the ad campaign for Huggies—is largely the deliberate outcome of one mind or a few, I’d call it “artifice.” But when a human-made system outgrows the comprehension of any human mind or the control of any human institution, then it is natural.

I’m reading E. B. White’s collection of essays from the mid-1950s, “The Points of My Compass.” Much of the collection is about his country home in Maine, where he weathers hurricanes and watches a raccoon descend and ascend its tree every night. White writes,

I would feel more optimistic about a bright future for man if he spent less time proving that he can outwit Nature and more time tasting her sweetness and respecting her seniority.

I agree, and at the same time, there are twice as many of us now as then, and no stopping the growth. We can’t all live in country homes—the country is too small. So how can we taste nature’s sweetness?

I’ve been thinking for years that New York City is a natural environment. It grows upward and outward, or collapses or changes, beyond our comprehension. It’s older than us and impossible to outwit. The city government and the big developers can cultivate patches of it, but as with any cultivation, as much depends on the times as their will. The tranquility one aims for on a walk in nature can be achieved in the city, too, if one walks with the same state of mind: this place grew by itself.

I asked a painter once, are there ugly colors in nature? He said there weren’t. So why are there ugly colors in hotels? He said he’d thought about this question a lot, and it was a mystery. It is still a mystery to me, but part of the answer must be that we know nature made no choices. There’s nothing to criticize in a sunset. It is the only way the sky can look tonight. Whereas a painting of a sunset on the wall of a cheap room is full of wrong choices.

Arguing the contrapositive: 18th-Century British with too much time on their hands made a science of critiquing landscapes, examining “the face of a country by the rules of picturesque beauty.” It must have ruined nature for them permanently. It’s when we let nature follow its own rules, not ours, that we taste its sweetness. When I abandon my judgment on an ugly block in New York, voilà: I’m in nature.

Of course we should try to limit population growth, and preserve what nature is left to us, and step into it (carefully) sometimes to remember what it was like. But my natural environment is the shops and sidewalks that grew here over the centuries. I like tasting their sweetness just as much.

Zen Street Retreat Photos

Street Retreat

Some photos from a four-day street retreat I did in September here in New York.

Street retreat is an American Zen practice, I believe invented by Bernie Glassman, of spending a short time living and sleeping on the streets. It brings us into contact with poor and homeless people and those who help them, and gives us a taste of takuhatsu, and of the homelessness of monks in the time of the Buddha.

This September was my second retreat. We had a group of eight people, led by Genro Roshi. We ate at churches and Christian soup kitchens, mainly the Bowery Mission. Circumstances were rougher than I expected; it rained most of the retreat, and major confrontations were beginning between Occupy Wall Street and the NYPD, so it was hard to find safe and dry places to sleep. We spent one night in Zuccotti with OWS, one at The Interdependence Project, and one going back and forth on the Staten Island Ferry.

I shot these on a Nikon FM2 35mm with Tri-X. Didn’t want to carry a digital camera with me, or anything that would mind too much getting wet.

Review of “Introductory Graph Theory” by Gary Chartrand

If you are a software developer, then from time to time you will have to solve an Interesting Problem in optimization, such as finding the best matches on a dating site, or the right sharding key for a database cluster. As often as not, such problems can be expressed as a network of interrelated nodes, and if so, the problem probably has a name and a known solution in graph theory. Your problem now is to know that name so you can Google for the solution.
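To make that concrete, the dating-site example is what graph theory calls maximum bipartite matching. Below is a minimal sketch of one textbook approach (repeatedly searching for an augmenting path). The function name and the sample data are invented for illustration and are not taken from the book.

```python
def max_bipartite_matching(compatible):
    """Pair up as many left-hand nodes as possible with right-hand nodes.

    `compatible` maps each left-hand node to the right-hand nodes it
    could be paired with. Returns a dict of left -> right pairings.
    """
    match_of_right = {}  # right-hand node -> left-hand node currently paired with it

    def try_assign(left, seen):
        # Look for a free partner, or bump an existing pairing if the
        # bumped node can be re-paired elsewhere (an augmenting path).
        for right in compatible[left]:
            if right in seen:
                continue
            seen.add(right)
            if right not in match_of_right or try_assign(match_of_right[right], seen):
                match_of_right[right] = left
                return True
        return False

    for left in compatible:
        try_assign(left, set())

    return dict((left, right) for right, left in match_of_right.items())
```

The point is less the code than the vocabulary: once you know the problem is called "bipartite matching," solutions like this one are easy to look up.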

This book is a charming, breezy intro to graph theory, covering basic terminology and some theorems related to planar graphs, topology, map-coloring, matching, and optimization. I was a course shy of minoring in math in college, and I’ve taken a handful of courses over the subsequent 10 years, so I’m far from a math whiz. But I found nearly every proof and problem accessible, sometimes requiring some effort, but never insurmountably hard. The book promises to be introductory, and I have now been introduced: the next time I run into a stumper while I’m building software, I feel I have a far better chance of recognizing it as a known problem with a standard solution. Highly recommended.

We’re All Gonna Die

Have I mentioned lately that Zen is my favorite religion? Among the best attributes of Zen is its forthright acknowledgement of death. You are going to die. There is some paperwork you should handle beforehand. My temple will offer a workshop this Saturday, May 12 to help you write a will, power of attorney, advance directive, and everything else you’ll need for a convenient demise. If you still have time, you can also compose your death poem.

Patience

Rocks

This is my second homework for Path of Practice. Last month I wrote about generosity, this month it’s about kshānti, or patience.

•••

Last Sunday I visited my grandmother. She’s in her 90s, and each month I see her, her mind is a little more gone than the last time. It’s remarkable how steadily she declines. Just when I think she’s gone so far she must be bottoming out, she loses another ounce of her faculties.

I brought some sushi and salad for lunch. As I maneuvered her into her seat at the dining table she looked at the salad and forgot she was in the process of sitting. She reached into the salad, grabbed a cherry tomato, and turned it one way and another, trying to remember what it was. I’d never seen her do anything like that before. She seemed never to have experienced sushi, either. When I wasn’t watching she picked up a lump of wasabi and tried to eat it. She spat it out and looked at me pleadingly, but also with humor. Her face seemed to say, “I was just sitting here and look what an awful thing happened to me. How funny!” Her eyes teared up and her nose ran. She didn’t know why she felt so strange. She pulled her shawl around her, thinking she was cold.

For my homework on kshānti I’m writing about Grandma, because it’s with her that I’m most patient. I don’t claim I’m a perfect grandson. Now that being with her is harder than it used to be, I visit less often, and there’s a buildup of impatience that I have to release in some way afterward. But for the most part, when I see her, I accept each new small duty that results from her descent, like preventing her from eating the wasabi.

It’s easy for me to offer patience to someone who’s a model of patience herself. Grandma’s endured more than most. She was an active artist from her 30s onward, but her career in each medium was cut short by arthritis. She began working in stone, and when the chisel was too painful to use she could still paint and weave, until using the brush or sitting at her loom’s bench hurt too much. Her best work was the stone. She’d see an animal, or a man, or a mother and child in a big smooth stone, and carve the minimal line to bring out what she saw, so others could see it. A few long grooves in the stone, and out popped a lamb or a frog.

Now she’s headed the other direction, losing details, smoothing out. She used to tell me the names of the flowers in her garden: gladiolus, rhododendron, hepatica. A few years ago she forgot them all. She could only point at her flowers and say, “What an extraordinary area!” Last spring she forgot she was the one who’d planted them. She was delighted to see them. “It just came by itself, I didn’t do anything.” And last Sunday as we walked slowly, slowly around the garden, all she had left to say was, “Let’s look at that,” and, “Beautiful!”

Grandma is so patient about her losses that I have no trouble being patient with her. Even though I had to sit with her for an hour eating lunch, periodically reminding her that she had food in front of her and should eat some. Even though it took us an hour to walk 50 feet into the garden. When we turned back toward the house Grandma couldn’t get up the stony path—she was strong enough to walk up the incline, but she couldn’t figure out how to place her feet among the rocks. I took her elbow and led her the long way around to the house. Her steps were so short we hardly moved at all, more rocking side-to-side together than walking forward. We slow-danced around the garden to the house. Grandma didn’t complain and neither did I.

For her, part of her patience is that she doesn’t know what’s wrong. She can’t decide if her experience is bad or just strange. A baby can feel without knowing, but not an adult, and although she must be watched like a child Grandma is very much a grownup. If she isn’t sure something terrible is happening, she usually tries to keep a good face on things. She’ll exclaim from time to time that a fork is too heavy, or a cup too cold, but the great losses—her independence, her ability to read, to talk—go mostly unmentioned. And if Grandma can live patiently through a day stripped of everything that makes it worth living, I can cheerfully walk her around the garden before I go home.

Grandma

Moving VirtualBox and Vagrant to an external drive

When I joined 10gen they gave me a MacBook Pro with an SSD drive. This is wonderful, mainly because it loads StarCraft II really fast. An SSD is like my studio apartment on the Lower East Side: low latency, but a bit cramped. (My apartment is low-latency because it’s a 10-minute walk from work. This is not a strained analogy.)

Lately I’ve needed to spin up a bunch of virtual machines with VirtualBox and Vagrant for testing our changes to PyMongo under every conceivable OS, and there’s no room for them on my SSD. Even if they run heinously slow on a USB drive, they can’t stay in my apartment. Here’s how I moved them to an external hard drive:

  • Move ~/.vagrant.d to the external drive. I renamed it vagrant_home so I’d be able to see it without ls -a.
  • Set VAGRANT_HOME to “/path/to/drive/vagrant_home” in ~/.bash_profile.
  • Open the VirtualBox app, open Preferences, and set its Default Machine Folder to “/path/to/drive/VirtualBox VMs”.
  • Close VirtualBox.
  • Move your “VirtualBox VMs” folder to the drive.
  • Reopen VirtualBox. You’ll see your VMs are listed as “inaccessible”. Remove them from the list.
  • For each VM in your “VirtualBox VMs” folder on the external drive, browse to its folder in Finder and double-click the .vbox file to restore it to the VirtualBox Manager. (Is there an easier method than this?)
  • Finally, move any existing Vagrant directories you’ve made with vagrant init (these are the directories with a Vagrantfile in each) to the external drive. Since these directories only store metadata you could leave them on your main drive, but it’s nice to keep everything together so you could fairly easily plug the whole drive into another machine and start your VMs from there.
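One possibly easier route for the re-registration step is VirtualBox's command-line tool, which has a registervm subcommand that takes the path to a .vbox file. The sketch below only builds those commands for every .vbox file under a folder. The function name is made up, and I'm assuming VBoxManage is on your PATH and that your VMs live one folder deep as VirtualBox lays them out by default.

```python
import os


def registervm_commands(vm_root):
    """Return one VBoxManage command (as an argv list) per .vbox file under vm_root."""
    commands = []
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(vm_root)):
        dirnames.sort()  # walk subfolders in a predictable order
        for name in sorted(filenames):
            # Skip backups such as "name.vbox-prev"; only real settings files.
            if name.endswith(".vbox"):
                commands.append(
                    ["VBoxManage", "registervm", os.path.join(dirpath, name)])
    return commands
```

To actually run them, pass each list to subprocess.check_call, for example over registervm_commands("/path/to/drive/VirtualBox VMs"). Printing the commands first and eyeballing them is the cautious way to start.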

Good to go. This has freed up a ton of space on my main drive, and the speed penalty has not been very bad.

Review of “The Little Book Of Semaphores” by Allen B. Downey

The Little Book of Semaphores is a free PDF.

Whenever I write code to synchronize multiple threads, I always think, “There must be some method to this.” I’ve been warned by the popular adage, “Any non-trivial multithreaded program has bugs,” which I believe first appeared in Poor Richard’s Almanac. But I have no systematic way to think about synchronization that assures me I’ve handled all the cases. This book does not provide that method. What it does provide is exercises, with solutions, that have developed my facility with thinking about synchronization, and have shown common synchronization patterns that should be applicable to almost any real-world problem.

Starting from the most basic cases, the book leads the reader step-by-step through a series of increasingly complex synchronization problems, each followed by hints and finally a solution written in a Python-like pseudocode. Appendices show how to “clean up” Python’s and C’s threading libraries to better suit the author’s tastes, and to better match the pseudocode solutions.

The classic synchronization problems included in most Computer Science curricula tend to use real-world objects to describe their constraints: E.g., philosophers are dining at a round table, and each needs two forks. Or, men and women form two lines and they must dance in pairs. In fact, synchronization problems don’t arise on dance floors but in operating systems and software applications, so the classic descriptions confuse more than clarify. The author promises to present both the classic description and the actual software system it arose from, but in fact only the first few problems are presented this way. The more advanced problems (such as the dining philosophers) are not tied to software applications at all. I can’t think of any use for the solutions so I skimmed the later sections.

If you thoroughly absorb the first 10 problems or so, thinking hard and working out your own solutions, you’ll gain some confidence and familiarity with synchronization which will serve you in nearly all software challenges you’ll actually face. In fact, a few weeks ago I had to implement a “rendezvous”, a pattern in which many threads all reach the same point at the same time before proceeding, and I was surprised to find I could implement it correctly in Python some years after reading the book. So invest your time in the first few chapters of the book and you’ll be rewarded. The book’s long tail of theoretical puzzles is best left to grad students.
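For a sense of what such a pattern looks like, here is a minimal one-shot sketch built from a lock and a semaphore. It is an illustration only: it is neither the book's solution nor the implementation mentioned above, and the class and method names are hypothetical.

```python
import threading


class OneShotBarrier(object):
    """Block every caller of wait() until n threads have arrived. Usable once."""

    def __init__(self, n):
        self.n = n
        self.count = 0
        self.mutex = threading.Lock()             # protects self.count
        self.turnstile = threading.Semaphore(0)   # closed until everyone arrives

    def wait(self):
        with self.mutex:
            self.count += 1
            last_to_arrive = (self.count == self.n)
        if last_to_arrive:
            # Open the turnstile once per thread, including this one.
            for _ in range(self.n):
                self.turnstile.release()
        self.turnstile.acquire()
```

This version cannot be reused for a second round without more machinery. Python 3.2 and later also ship threading.Barrier, which is the sensible choice when it is available.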

Against ResourceWarnings in Python 3

Allow me to grumble. Consider this function from Python 3.2.3’s socketmodule.c:

/* Deallocate a socket object in response to the last Py_DECREF().
   First close the file description. */

static void
sock_dealloc(PySocketSockObject *s)
{
    if (s->sock_fd != -1) {
        PyObject *exc, *val, *tb;
        Py_ssize_t old_refcount = Py_REFCNT(s);
        ++Py_REFCNT(s);
        PyErr_Fetch(&exc, &val, &tb);
        if (PyErr_WarnFormat(PyExc_ResourceWarning, 1,
                             "unclosed %R", s))
            /* Spurious errors can appear at shutdown */
            if (PyErr_ExceptionMatches(PyExc_Warning))
                PyErr_WriteUnraisable((PyObject *) s);
        PyErr_Restore(exc, val, tb);
        (void) SOCKETCLOSE(s->sock_fd);
        Py_REFCNT(s) = old_refcount;
    }
    Py_TYPE(s)->tp_free((PyObject *)s);
}

Let’s ignore that “file description” has persisted as a misspelling of “descriptor” in that comment since at least as far back as Python 2.4. There’s a new annoyance in this function: it now junks up my terminal window with a ResourceWarning about an unclosed socket, just before it closes the socket.
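If you want to see the warning for yourself without reading stderr, here is a small sketch that captures it with the standard warnings module. It assumes CPython 3.2 or later, where dropping the last reference finalizes the socket right away. On other interpreters the list may well come back empty. The function name is made up.

```python
import gc
import socket
import warnings


def warnings_from_unclosed_socket():
    """Drop an open socket and return any ResourceWarnings that resulted."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure the warning is not filtered out before it is recorded.
        warnings.simplefilter("always", ResourceWarning)
        sock = socket.socket()
        del sock      # last reference gone; CPython deallocates the socket here
        gc.collect()  # nudge for interpreters that do not reference-count
    return [w for w in caught if issubclass(w.category, ResourceWarning)]
```

Each returned item carries the warning text in its message attribute, which on CPython begins with "unclosed".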

Any sane, informed Python developer knows she doesn’t have to close sockets explicitly; she can let the garbage collector close them. There are two great reasons not to close sockets explicitly:

  1. In complex code it can be hard to know when the last reference to a socket is deleted. CPython knows precisely when the last reference goes away—that’s when it calls sock_dealloc.
  2. If you do know when a socket should be closed, it’s still pointless to close it explicitly, because CPython is about to do it for you.

“But Jython doesn’t reference-count!” you howl. Relax, you’re not using Jython and neither am I. (Let us not speak of IronPython, either.)

“But what about PyPy!” you cry, and there you have a point. PyPy doesn’t use a reference-counting GC, and PyPy is going to be increasingly popular. But PyPy is smart: it doesn’t raise the ResourceWarning.

PyMongo now supports Python 3, and Python 3 is now pooping ResourceWarnings to stderr, so at 10gen we’ve had to go through PyMongo’s socket-management code ensuring we know when a socket will be deleted, and closing it. Even though the interpreter closes it again immediately afterward.

Closing sockets is easy in PyMongo, but in other applications that use sockets (or files or whatever) less deterministically than PyMongo does, getting rid of ResourceWarnings is a total pain. Consider all the ways we’re used to dealing with resource deallocation, ordered from most to least convenient. (When I get mad I make ordered lists.)

  1. Implicit reference counting: This is what CPython does and it’s both automatic and predictable. I’ve loved this about CPython ever since I started using it.
  2. Resource acquisition is initialization: a reasonably elegant approach in C++ that wraps resources in objects allocated on the stack, so when the stack frame is destroyed the resource is freed, or at least its refcount is decremented. Library classes like auto_ptr make this style robust, even in the face of exceptions.
  3. Explicit reference counting: Objective-C programmers are familiar with this. It’s a pain, and error-prone, and mistakes lead to leaks and crashes. But with experience and frequent review of Apple’s coding guidelines you can stamp out most of the bugs.
  4. Malloc/free in C: You know what sucks? Manual memory management in C. In C, most of the bugs come from calling free() too early or not at all, and they are the worst bugs. In complex data structures it can be very difficult to determine when an object’s lifetime is over.

By adding the ResourceWarning, Python has gone from the top of the list to the bottom. We are no longer able to rely on the interpreter to clean up resources correctly, even though it still does, because then our terminals will be littered with warnings. Since Python has not developed any of the other languages’ resource management techniques (because it has no need for them whatsoever), we are left in the worst possible situation: C-style manual resource management.

“You can just use a Context Manager and the ‘with’ statement,” you whine. That has two problems:

  1. Python 2.4 doesn’t have the ‘with’ statement, so ResourceWarnings in Python 3 make it even harder to write code compatible both with Python 2.4 and Python 3.2, and Python 2.4 is far more widely installed and will be for some time. For 10gen, at least, ending Python 2.4 support is not an option. Thus ResourceWarnings are further discouraging adoption of Python 3.
  2. If you only use a resource within a single block of code, your resource management was always trivially easy. I’m talking about interesting code where it’s hard to tell when you’re done with a resource. This is what garbage collection was supposed to fix for us.

Besides manual management, there are two alternatives:

  1. Implement our own manual reference-counting, à la Objective-C, for cases where it’s hard to know when a resource should be closed. For example, we could wrap our sockets in objects that implement incref() and decref(), and which close the underlying socket when the refcount goes to zero.
  2. Rely on the garbage collector to clean up resources, just like we always have, and tolerate the useless warnings.

Either is less convenient and more error-prone than resource management was prior to ResourceWarnings.
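To show how much ceremony the first alternative involves, here is a rough sketch of such a wrapper. It is not PyMongo code. The class name, the error messages, and the choice to start the count at one are all assumptions made for illustration. It uses the 'with' statement for brevity, so a Python 2.4 version would need acquire() and try/finally instead.

```python
import threading


class RefCounted(object):
    """Close the wrapped resource when the last holder calls decref()."""

    def __init__(self, resource):
        self._resource = resource
        self._refcount = 1  # the creator holds the first reference
        self._lock = threading.Lock()

    def incref(self):
        """Register another holder and hand back the resource."""
        with self._lock:
            if self._refcount <= 0:
                raise ValueError("resource already closed")
            self._refcount += 1
        return self._resource

    def decref(self):
        """Release one holder; close the resource if it was the last."""
        with self._lock:
            if self._refcount <= 0:
                raise ValueError("decref() called too many times")
            self._refcount -= 1
            should_close = (self._refcount == 0)
        if should_close:
            self._resource.close()
```

Every caller now has to pair each incref() with exactly one decref(), which is precisely the bookkeeping the interpreter used to do for free.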

I think ResourceWarnings are a terrible idea. I propose that we remove them. Intelligent programmers can either clean up resources when they know they’re not needed, or rely on the garbage-collector to clean them up. Python should leave it up to us to make that decision, rather than forcing our hand. The alternative, which is to implement manual management in Python for all resources, is too horrible to contemplate.

Review of “The Corrections” by Jonathan Franzen

Patience, grasshopper—the book does not promise well, but rewards your effort in the end. We spend the first few chapters with Chip, a po-mo literature professor at some small college. Franzen wastes our time with tired tropes: Chip’s academic theories are pretentious and empty, departmental politics in a small college are fierce, and (huge surprise) Chip sleeps with a pretty young student but is not fulfilled. Almost as predictably, Chip’s relationship with his Puritanical father is strained because they are opposites and yet alike. Chip has a chip on his shoulder, but he’s a chip off the old block. So what. Things pick up when he gets fired for his impropriety and runs off to Lithuania to work for a crime boss, selling fake stock in the state of Lithuania. It’s a scheme straight out of “The Producers,” but Chip starts to get interesting as he descends into perdition.

The remainder of the novel follows the father Alfred, his wife, and their three grown children. All the family becomes richly characterized; while the reader may not like or sympathize with all of them, they’re well motivated and described. Alfred is by far the most haunting character—he’s a retired engineer who once inspected railroads and signals, but his wires get crossed as Parkinson’s corrodes him. As Franzen tells the story of Alfred’s past, it’s clear the traits that made him successful as a working man are the same traits that make him a neurotic retiree, and that have predestined his children to their various self-destructions.

The characters of The Corrections are compelling, but the best enjoyment comes from Franzen’s love of writing. He doesn’t describe an event just one way if he can think of five ways. In a scene late in Alfred’s deterioration, he hallucinates a turd that has invaded his bed and is smearing itself on the sheets and the walls. The scene is bizarre, and most writers would be expected to cut it short in a few paragraphs. Franzen goes on for pages, delighting in all the new ways he can describe how a turd would move and what it would say.

Python’s swap is not atomic

I rewrote PyMongo’s connection pool over the last few months. Among the concurrency issues I had to nail down was, if a thread is resetting the connection pool as another thread is using the pool, how do I keep them from stepping on each other?

I thought I nailed this, but of course I didn’t. There’s a race condition in here:

class Pool(object):
    def __init__(self):
        self.sockets = set()

    def reset(self):
        # Close sockets before deleting them
        sockets, self.sockets = self.sockets, set()
        for sock_info in sockets: sock_info.close()

I thought that the swap would be atomic: the first thread to enter reset() would replace self.sockets with an empty set, then close all the old sockets, and all subsequent threads would find that self.sockets was empty. That turns out not to be the case.

The race condition was occasionally revealed in runs of PyMongo’s huge test suite. One of the tests spins up 40 concurrent threads. Each thread queries MongoDB, calls reset(), and queries MongoDB again. Here’s how the test fails:

test_disconnect (test.test_pooling.TestPooling) ... Exception in thread Thread-45:
Traceback (most recent call last):
 < ... snip ... >
 File "pymongo/pool.py", line 159, in reset
   for sock_info in sockets: sock_info.close()
RuntimeError: Set changed size during iteration

As I said, I’d thought the swap was atomic, but in fact it takes eight bytecode instructions. That one swap line:

       sockets, self.sockets = self.sockets, set()

…disassembles to:

            0 LOAD_FAST                0 (self)
            3 LOAD_ATTR                0 (sockets)
            6 LOAD_GLOBAL              1 (set)
            9 CALL_FUNCTION            0
           12 ROT_TWO          <- this is the swap
           13 STORE_FAST               1 (sockets)
           16 LOAD_FAST                0 (self)
           19 STORE_ATTR               0 (sockets)

Say that Thread 1 is executing this function. Thread 1 loads self.sockets and the empty set onto its stack and swaps them, and before it gets to STORE_ATTR (where self.sockets is actually replaced), it gets interrupted by Thread 2. Thread 2 runs some other part of the connection pool’s code, e.g.:

    def return_socket(self, sock_info):
        self.sockets.add(sock_info)

This disassembles to:

           24 LOAD_FAST                0 (self)
           27 LOAD_ATTR                1 (sockets)
           30 LOAD_ATTR                3 (add)
           33 LOAD_FAST                1 (sock_info)
           36 CALL_FUNCTION            1

Let’s say Thread 2 reaches the LOAD_ATTR 1 bytecode. Now it has self.sockets on its stack, and it gets interrupted by Thread 1, which is still in reset(). Thread 1 replaces self.sockets with the empty set. But alas, Thread 1’s “old” set of sockets and Thread 2’s “self.sockets” are the same set. Thread 1 starts iterating over the old set of sockets, closing them:

        for sock_info in sockets: sock_info.close()

…but it gets interrupted again by Thread 2, which does self.sockets.add(sock_info), increasing the set’s size by one. When Thread 1 is next resumed, it tries to continue iterating, and raises the “Set changed size during iteration” exception.
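The interleaving is hard to trigger on demand with real threads, but it can be replayed in a single thread by doing each thread's steps by hand in the unlucky order. This is only a simulation under that assumption: the variable names and the string stand-ins for sockets are invented.

```python
def simulate_interleaving():
    """Replay the unlucky schedule step by step; return the error message."""
    pool_sockets = set(["sock-a", "sock-b"])  # stands in for self.sockets

    # Thread 2, in return_socket(): LOAD_ATTR has run, so it holds the set.
    thread2_ref = pool_sockets

    # Thread 1, in reset(): the swap completes and the pool gets a fresh set.
    old_sockets, pool_sockets = pool_sockets, set()

    try:
        for sock in old_sockets:
            # Thread 2 resumes mid-iteration and adds to what it thinks is
            # the pool's set, but which is really Thread 1's "old" set.
            thread2_ref.add("sock-c")
    except RuntimeError as exc:
        return str(exc)
    return None
```

Calling simulate_interleaving() returns the same "Set changed size during iteration" message that appears in the failing test's traceback.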

Let’s dive deeper for a minute. You may be thinking that in practice two Python threads wouldn’t interrupt each other this often. Indeed, the interpreter executes 100 bytecodes at a time before it even thinks of switching threads. But in our case, Thread 1 is repeatedly calling socket.close(), which is written in socketmodule.c like this:

static PyObject * sock_close(PySocketSockObject *s) {
    SOCKET_T fd;

    if ((fd = s->sock_fd) != -1) {
        s->sock_fd = -1;
        Py_BEGIN_ALLOW_THREADS
        (void) SOCKETCLOSE(fd);
        Py_END_ALLOW_THREADS
    }
    Py_INCREF(Py_None);
    return Py_None;
}

That Py_BEGIN_ALLOW_THREADS macro releases the Global Interpreter Lock and Py_END_ALLOW_THREADS waits to reacquire it. In a multithreaded Python program, releasing the GIL makes it very likely that another thread which is waiting for the GIL will immediately acquire it. (Notwithstanding David Beazley’s talk on the GIL—he demonstrates that CPU-bound and IO-bound threads competing for the GIL on a multicore system interrupt each other too rarely, but in this case I’m only dealing with IO-bound threads.)

So calling socket.close() in a loop ensures that this thread will be constantly interrupted. The probability that some thread in return_socket() gets a reference to the set, and modifies it, interleaved with some other thread in reset() getting a reference to the same set and iterating it, is high enough to break PyMongo’s unittest about 1% of the time.

The solution was obvious once I understood the problem:

import threading

class Pool(object):
    def __init__(self):
        self.sockets = set()
        self.lock = threading.Lock()

    def reset(self):
        self.lock.acquire()
        try:
            # Close sockets before deleting them
            sockets, self.sockets = self.sockets, set()
        finally:
            self.lock.release()

        # Now only this thread can have a reference to this set of sockets
        for sock_info in sockets: sock_info.close()
 
    def return_socket(self, sock_info):
        self.lock.acquire()
        try:
            self.sockets.add(sock_info)
        finally:
            self.lock.release()
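To gain some confidence in a fix like this, a small stress run in the spirit of the 40-thread test can help. The sketch below is not PyMongo's test: FakeSocket, hammer(), and the thread and round counts are invented stand-ins, and the pool is restated with the 'with' statement (Python 2.5 and later) to keep it short.

```python
import threading


class FakeSocket(object):
    """Stand-in for a real socket; only records that close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Pool(object):
    def __init__(self):
        self.sockets = set()
        self.lock = threading.Lock()

    def reset(self):
        with self.lock:
            sockets, self.sockets = self.sockets, set()
        # Only this thread can reach the old set now, so iterating is safe.
        for sock_info in sockets:
            sock_info.close()

    def return_socket(self, sock_info):
        with self.lock:
            self.sockets.add(sock_info)


def hammer(pool, n_threads=40, rounds=200):
    """Run return_socket() and reset() from many threads; collect exceptions."""
    errors = []

    def worker():
        try:
            for _ in range(rounds):
                pool.return_socket(FakeSocket())
                pool.reset()
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

An empty list from hammer(Pool()) does not prove the absence of races, but the unlocked version has a fair chance of producing a RuntimeError under the same load.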

Single-bytecode instructions in Python are atomic, and if you can use this atomicity to avoid mutexes then I believe you should—not only is your code faster and simpler, but you avoid the risk of deadlocks, which are the worst concurrency bugs. But not everything that looks atomic is. When in doubt, use the dis module to examine your bytecode and find out for sure.
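Here is a short sketch of that check. It assumes Python 3.4 or later for dis.get_instructions; on older versions dis.dis(Pool.reset) prints a similar listing. Opcode names and offsets vary between Python versions, so treat the exact output as version-specific. The helper name is made up.

```python
import dis


class Pool(object):
    def __init__(self):
        self.sockets = set()

    def reset(self):
        # The line under suspicion: it looks like one step, but is it?
        sockets, self.sockets = self.sockets, set()
        return sockets


def opnames(func):
    """Return the names of the bytecode instructions that make up func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]
```

If opnames(Pool.reset) shows a load of the attribute and a store to it as separate instructions, another thread can run between them, and the line is not atomic.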