A. Jesse Jiryu Davis

Motor Officially Released

Motor

It's happened. Motor 0.1 is in PyPI. You can now install it with a simple:

$ pip install motor

This is the first official, production release of Motor.

That said, there will be bugs: please file them and I'll respond as quickly as I can.

Links:

Motor's Future

Motor is now feature-complete and fully tested. I expect to put it on the back burner and concentrate on other projects.

Motor will keep up easily with PyMongo development, because I designed it to. I don't intend for it to lag more than a smidge. For example, PyMongo 2.5 will bring some new security and authentication features; in the following Motor release I'll support those, too.

I believe this is the coolest thing I've ever made. I hope you have fun with it. Tweet me and let me know what you build with it.

Miami Photos

I'm starting to use color; here are some shots from 10gen's annual meeting, which was in Miami this year.

Miami 1

Miami 2

This is on Kodak Portra 400, with a Norita 66. Mistakes were made: Photographing from my hotel balcony, I didn't notice that the railing's edge was in the frame. And I shot 7 rolls but my flash only synced on a handful of photos. I'll reread the manual before I use the Norita again.

Photography Is Burning!

I'm in a group photo show at the Village Zendo in lower Manhattan, Saturday March 9. The show is open 11am to 6pm and I plan to hang out there all day, except lunch time, so you're welcome to stop by and say hello. There's a panel discussion 7:30-8:30pm, in which the curator and the other photographers will say smart things and I would be wise to stay silent. The exhibitors are:


Here's my selection, unless I change my mind:

Photography is burning 1

Photography is burning 2

Photography is burning 4

Photography is burning 5

Photography is burning 3

A Curious Concurrency Case

Last month, the team in charge of 10gen's Ruby driver for MongoDB ran into a few concurrency bugs, reported by a customer running the driver in JRuby with a large number of threads and connections. I've barely written a line of Ruby in my life, but I jumped in to help for a week anyway.

I helped spot a very interesting performance bug in the driver's connection pool. The fix was easy, but thoroughly characterizing the bug turned out to be complex. Here's a record of my investigation.


The Ruby driver's pool assigns a socket to a thread when the thread first calls checkout, and that thread stays pinned to its socket for life. Until the pool reaches its configured max_size, each new thread has a bespoke socket created for it. Additional threads are assigned random existing sockets. When a thread next calls checkout, if its socket's in use (by another thread) the requesting thread waits in a queue.

Here's a simplified version of the pool:

class Pool
  def initialize(max_size)
    @max_size       = max_size
    @sockets        = []
    @checked_out    = []
    @thread_to_sock = {}
    @lock           = Mutex.new
    @queue          = ConditionVariable.new
  end

  # Check out an existing socket or create a
  # new socket if max_size not exceeded.
  # Otherwise, wait for the next socket.
  def checkout
    tid = Thread.current.object_id
    loop do
      @lock.synchronize do
        if sock = @thread_to_sock[tid]

          # Thread wants its prior socket
          if ! @checked_out.include?(sock)
            # Acquire the socket
            @checked_out << sock
            return sock
          end

        else

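          if @sockets.size < @max_size

            # Assign new socket to thread and record it
            # in the pool as checked out
            sock = create_connection
            @sockets << sock
            @checked_out << sock
            @thread_to_sock[tid] = sock
            return sock

          elsif @checked_out.size < @sockets.size

            # Assign random available socket to thread
            available = @sockets - @checked_out
            sock = available[rand(available.length)]
            @checked_out << sock
            @thread_to_sock[tid] = sock
            return sock

          end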

        end

        # Release lock, wait to try again
        @queue.wait(@lock)
      end
    end
  end

  # Return a socket to the pool.
  def checkin(socket)
    @lock.synchronize do
      @checked_out.delete(socket)
      @queue.signal
    end
  end
end

When a thread returns a socket, it signals the queue and wakes the next thread in line. That thread goes to the top of the loop and tries again to acquire its socket. The bug is in checkin: if the next thread in the queue is waiting for a different socket than the one just checked in, it may fail to acquire its socket, and it will sleep again.

When I first saw this I thought there must be the possibility of a deadlock. After all, if threads sometimes call checkin without really waking other threads, mustn't there come a time when everyone's waiting and no one has a socket?

I wrote a Python script to simulate the Ruby pool and ran it for a few thousand ticks, with various numbers of threads and sockets. It never deadlocked.

So I had to stop coding and start thinking.


Let's say there are N threads and S sockets. N can be greater than, less than, or equal to S. Doesn't matter. Assume the pool has already created all S sockets, and all N threads have sockets assigned. Each thread either:

  1. Has checked out its socket, and is going to return it and signal the queue, or
  2. Is waiting for its socket, or will ask for it in the future, or
  3. Has returned its socket and will never ask for it again.

To deadlock, all threads must be in state 2.

To reach that point, we need N - 1 threads in state 2 and have the Nth thread transition from 1 to 2. (By definition it doesn't go from state 3 to 2.) But when the Nth thread returns its socket and signals the queue, all sockets are now returned, so the next awakened thread won't wait again—its socket is available, so it goes to state 1. Thus, no deadlock.

The old code definitely wasn't efficient. It's easy to imagine cases where all of a socket's threads were waiting, even though one of them could have been running. Let's say there are 2 sockets and 4 threads:

  1. Thread 1 has Socket A checked out, Thread 2 has Socket B, Thread 3 is waiting for A, Thread 4 is waiting for B, and they're enqueued like [3, 4].
  2. Thread 2 returns B, signals the queue.
  3. Thread 3 wakes, can't get A, waits again.

At this point, Thread 4 should be running, since its Socket B is available, but it's waiting erroneously for Thread 1 to return A before it wakes.
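The three steps above can be replayed in a few lines of Python. This is only a sketch of the scenario, not the driver's code: `replay_signal` is a hypothetical helper, and it assumes a thread that wakes and fails to get its socket goes to the back of the queue.

```python
from collections import deque

def replay_signal(assigned, checked_out, queue, returned_by):
    """Replay one checkin followed by a single signal.

    assigned:    dict mapping thread -> its pinned socket
    checked_out: sockets currently in use
    queue:       waiting threads, oldest first
    """
    checked_out = set(checked_out)
    queue = deque(queue)

    # checkin: the returning thread releases its socket
    checked_out.discard(assigned[returned_by])

    # signal: wake only the thread at the head of the queue
    if queue:
        woken = queue.popleft()
        sock = assigned[woken]
        if sock in checked_out:
            # Its socket is still busy, so it waits again
            # (assumption: it rejoins at the back of the queue)
            queue.append(woken)
        else:
            checked_out.add(sock)

    return checked_out, list(queue)

assigned = {1: "A", 2: "B", 3: "A", 4: "B"}
in_use, waiting = replay_signal(
    assigned, checked_out={"A", "B"}, queue=[3, 4], returned_by=2)

# Threads still waiting even though their socket is free
stuck = [t for t in waiting if assigned[t] not in in_use]
print(in_use, waiting, stuck)
```

After the replay only Socket A is in use, both Thread 3 and Thread 4 are still queued, and Thread 4 is the one left waiting for a socket that is free.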

So we changed the code to do queue.broadcast instead of signal, so checkin wakes all the threads, and we released the fixed driver. In the future, even better code may prevent multiple threads from contending for the same socket at all.
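For comparison, here's a rough Python translation of the simplified pool with the broadcast fix applied, using `threading.Condition.notify_all()` where the Ruby code calls `broadcast`. It's a sketch under assumptions, not the Ruby driver's implementation: the `create_connection` argument is a stand-in for whatever opens a real socket, and threads are keyed by `threading.get_ident()`, whose values can be recycled after a thread exits.

```python
import random
import threading

class Pool(object):
    def __init__(self, max_size, create_connection):
        self._max_size = max_size
        self._create = create_connection   # hypothetical socket factory
        self._sockets = []
        self._checked_out = []
        self._thread_to_sock = {}
        self._cond = threading.Condition()

    def checkout(self):
        tid = threading.get_ident()
        with self._cond:
            while True:
                sock = self._thread_to_sock.get(tid)
                if sock is None:
                    if len(self._sockets) < self._max_size:
                        # Create a new socket for this thread
                        sock = self._create()
                        self._sockets.append(sock)
                        self._thread_to_sock[tid] = sock
                    else:
                        # Pin the thread to a random free socket, if any
                        free = [s for s in self._sockets
                                if s not in self._checked_out]
                        if free:
                            sock = random.choice(free)
                            self._thread_to_sock[tid] = sock
                if sock is not None and sock not in self._checked_out:
                    self._checked_out.append(sock)
                    return sock
                # Release the lock and wait to try again
                self._cond.wait()

    def checkin(self, sock):
        with self._cond:
            self._checked_out.remove(sock)
            # Wake every waiter so each one re-checks its own socket
            self._cond.notify_all()

pool = Pool(max_size=2, create_connection=object)
sock = pool.checkout()
pool.checkin(sock)
```

Waking every waiter costs some extra context switches, but each thread re-checks its own socket, so a free socket never sits idle while its thread sleeps in the queue.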

The bugfix was obvious. It's much harder to determine exactly how bad the bug was—how common is it for a socket to be unused?


In my simulated pool there are 10 sockets. Each thread uses its socket for 1‑20 seconds, sleeps one second, and asks for its socket again. I counted how many sockets were in use each second, and subtracted that from S * total_time to get an inefficiency factor:

Percentage unused sockets

If N=S=10, threads never wait but there's some fake "inefficiency" due to the 1-second sleep. For larger numbers of threads the sleep time becomes irrelevant (because there's always another thread ready to use the socket), but signal adds an inefficiency that declines very slowly from 8% as the number of threads increases. A pool that uses broadcast, in contrast, can saturate its sockets if it has more than 30 threads.
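The inefficiency factor can be written down in a few lines of Python. This is a sketch of the arithmetic only, not the simulation script: `unused_socket_percentage` is a hypothetical name, the sample counts are made up, and expressing the result as a percentage of S * total_time is an assumption based on the chart's title.

```python
def unused_socket_percentage(in_use_per_second, num_sockets):
    """Share of socket-seconds in which a socket sat unused."""
    total_time = len(in_use_per_second)
    capacity = num_sockets * total_time         # S * total_time
    unused = capacity - sum(in_use_per_second)  # socket-seconds wasted
    return 100.0 * unused / capacity

# Made-up counts of sockets in use during each of five seconds
samples = [10, 9, 10, 8, 10]
print(unused_socket_percentage(samples, num_sockets=10))
```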

I spent hours (mostly on planes) trying to determine why the inefficiency factor acts this way—why 8%? Shouldn't it be worse? And why does it fall, slowly, as N rises? But I'm calling it quits now. Leave a comment if you have any insights, but I'm satisfied that the old pool was wasteful and that the new one is a substantial improvement.

What It's Like To Work For 10gen

My colleague Kristina Chodorow wrote a post on working at 10gen with which I was a bit obsessed when I applied for a gig here in late 2011. Recently I've had an eventful couple weeks: I went to Miami, won a robot battle, and helped the Ruby driver team fix some bugs. Seems like a good time to add to the "Working at 10gen" genre.


10gen, the MongoDB company, is as distributed as our database. We're spread around four continents, partly because we hired interesting people wherever they were, and partly to support our customers in their regions and time zones. Once a year everyone comes together, this time in Miami. My room had an acceptable vista:

IMG 0333

There was a hot tub on my balcony but I was the only one so equipped. Please don't tell my colleagues.

At our annual meetings we have nerdy contests. Last year, rocket building. This year, Lego robots that we programmed to push each other off a table.

The basic robot kit comes programmed to turn in a circle, and when it detects something in front of it, it charges. The 10gen teams came up with designs a little more sophisticated. Most bots had color-sensors pointed at the table's surface in front of them, so they could turn away from the edge without falling off. By far the cleverest hack I saw was this:

Bot

Photo: Gary Murakami

The robot has a black stripe taped to its front and rear shovels, which slips under the opponent's color-sensor. The opponent sees the stripe, thinks it's about to fall off the table, and turns away—and falls off the table.

Despite its brilliance, this subterfuge lost to my team's strategy. (Disclosure: I was doing customer support when they built the bot, so I take no credit.) We had a slow, powerful machine with a high shovel in front. Because it was high, the shovel tilted our opponents upward off their treads and robbed them of traction. It wasn't a very smart robot, but mechanics and brute force won the prize.

The kits were quite expensive, I hear. We tried not to bang them up too badly, or lose any parts, so we could donate them to a local middle school.

It's this combination of enjoyment with care-taking that I loved about the Miami meeting. We play like kids but we are not children. We never forgot that we're spending other people's money when we meet: aside from a few hours of robot-fighting, we spent our two and a half days in Miami holed up in conference rooms planning our future. Our event planner Samantha made it clear that we were not to spend any extra time in Miami on the company's dime. If we weren't on a plane home within a few hours of its ending, our expenses were our own. It sounds harsh but it's a mature attitude: we must take the greatest care with the capital entrusted to us.

The final day of our meeting, the Ruby driver team had a crisis. A customer reported that the driver was leaving cursors unclosed on replica-set members, because it sent the killCursors message to the wrong member. The Ruby team is normally four coders: Tyler Brock, Gary Murakami, Emily Stolfo, and Brandon Black. But Bernie Hackett and I from the Python team, and Jeff Yemin from Java, joined to take a look. Ruby Team Plus dug into the customer's reports and I learned that the driver was in a novel operating environment, and it was not thriving there. It was running in JRuby with a big thread pool, which exposed threading issues that had lain dormant for months. Not only was it leaving cursors open, but the JRuby BSON extension had concurrency bugs, and there was a strange performance degradation in the connection pool. I spent my time looking for connection-pool bugs and found a neat puzzler: I'm going to write about that in my next post.

Ruby Team Plus formed in a conference room in Miami, and drifted apart after we nailed down the last of the bugs by video-chat, nine intense days later. (Much more intense for the core Rubyists than for me.) No manager said, "You guys should help the Ruby team." We thought we could be useful, so we helped. I admire this about 10gen. I also admire that we piled in to help a customer without checking whether they had some Special Diamond Premium contract. They'd found bugs we needed to fix, so we worked nights and weekends to fix them. (And once they were fixed, we made it up to ourselves with some time off.)

That's what it's like to work for 10gen. I'm proud of us, and I'm having the time of my life.

Miami beach

Photo: Gary Murakami again

An Event synchronization primitive for Ruby

I helped some Ruby friends implement a rendezvous (aka a barrier). I'm accustomed to using an Event to implement a rendezvous in Python but Ruby doesn't have Events, only Mutexes and ConditionVariables. That's fine, since Python's Event is implemented in terms of a mutex and a condition, so it's easy to make an Event in Ruby:

class Event
    def initialize
        @lock = Mutex.new
        @cond = ConditionVariable.new
        @flag = false
    end

    def set
        @lock.synchronize do
            @flag = true
            @cond.broadcast
        end
    end

    def wait
        @lock.synchronize do
            # Loop to guard against spurious wakeups
            until @flag
                @cond.wait(@lock)
            end
        end
    end
end

Ruby's cond.wait(lock) pattern is interesting—you enter a lock so you can call wait, then wait releases the lock so another thread can broadcast the condition, and finally wait reacquires the lock before continuing.

I didn't implement is_set since it's unreliable (another thread can change it between the time you check the value and the time you act upon the information) and I didn't do clear since you can just replace the Event with a fresh one.
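For comparison, here's roughly the same Event in Python, built from a `threading.Condition`, followed by a tiny rendezvous. It's a sketch that mirrors the Ruby class above rather than a copy of the standard library's `threading.Event`; the `worker` function and the `results` list are made up for the demonstration.

```python
import threading

class Event(object):
    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._flag = False

    def set(self):
        with self._cond:
            self._flag = True
            self._cond.notify_all()

    def wait(self):
        with self._cond:
            # wait() releases the lock while sleeping and reacquires
            # it before returning; loop in case of a spurious wakeup
            while not self._flag:
                self._cond.wait()

# Rendezvous: the main thread blocks until the worker has done its part
ready = Event()
results = []

def worker():
    results.append("worker done")
    ready.set()

t = threading.Thread(target=worker)
t.start()
ready.wait()
t.join()
```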

Author Photos

I did another round of headshots this weekend: Jennifer Keishin Armstrong, author of the upcoming books Sexy Feminism and Mary and Lou and Rhoda and Ted. Also my girlfriend.

Jennifer Keishin Armstrong

This one's probably not going to make the New York Times Book Review, but it's a great outtake:

Jennifer Keishin Armstrong

Knowing When A Python Thread Has Died

Young Woman Contemplating a Skull by Alessandro Casolani Statens Museum for Kunst DSC08131

A few months ago I had to solve a problem in PyMongo that is harder than it seems: how do you register for notifications when the current thread has died?

The circumstances are these: when you call start_request in PyMongo, it gets a socket from its pool and assigns the socket to the current thread. We need some way to know when the current thread dies so we can reclaim the socket and return it to the socket pool for future use, rather than wastefully allowing it to be closed.

PyMongo can assume nothing about what kind of thread this is: It could've been started from the threading module, or the more primitive thread module, or it could've been started outside Python entirely, in C, as when PyMongo is running under mod_wsgi.

Here's what I came up with:

import threading
import weakref
from functools import partial

class ThreadWatcher(object):
    class Vigil(object):
        pass

    def __init__(self):
        self._refs = {}
        self._local = threading.local()

    def _on_death(self, vigil_id, callback, ref):
        self._refs.pop(vigil_id)
        callback()

    def watch(self, callback):
        if not self.is_watching():
            self._local.vigil = v = ThreadWatcher.Vigil()
            on_death = partial(
                self._on_death, id(v), callback)

            ref = weakref.ref(v, on_death)
            self._refs[id(v)] = ref

    def is_watching(self):
        "Is the current thread being watched?"
        return hasattr(self._local, 'vigil')

    def unwatch(self):
        try:
            v = self._local.vigil
            del self._local.vigil
            self._refs.pop(id(v))
        except AttributeError:
            pass

The key lines are in watch(): the ones that create the vigil and the weakref. First, I make a weakref to a thread local. Weakrefs are permitted on subclasses of object but not object itself, so I use an inner class called Vigil. I initialize the weakref with a callback, which will be executed when the vigil is deleted.

The callback only fires if the weakref outlives the vigil, so I keep the weakref alive by storing it as a value in the _refs dict. The key into _refs can't be the vigil itself, since then the vigil would have a strong reference and wouldn't be deleted when the thread dies. I use id(v), the vigil's id, instead.
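Here's a small standalone sketch of those weakref rules, separate from ThreadWatcher. The `fired` list and the module-level `refs` dict are made up for the demonstration, and the behavior described is what CPython does; the gc.collect() calls are there for interpreters without reference counting.

```python
import gc
import weakref

class Vigil(object):
    pass

fired = []
refs = {}

# A bare object() can't be weakly referenced, hence the Vigil class
try:
    weakref.ref(object())
    plain_object_ok = True
except TypeError:
    plain_object_ok = False

# Case 1: the weakref is stored in a dict keyed by id(), so it
# outlives the vigil and its callback fires
kept = Vigil()
refs[id(kept)] = weakref.ref(kept, lambda ref: fired.append("kept"))
del kept
gc.collect()

# Case 2: the weakref is discarded right away, so nothing is left
# to call back when the vigil is deleted
dropped = Vigil()
weakref.ref(dropped, lambda ref: fired.append("dropped"))
gc.collect()
del dropped
gc.collect()

print(plain_object_ok, fired)
```

Only the stored weakref reports the death of its vigil, which is why ThreadWatcher keeps each weakref in _refs until its callback has run.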

Let's step through this. When a thread calls watch(), the only strong reference to the vigil is a thread-local. When a thread dies its locals are cleaned up, the vigil is dereferenced, and _on_death runs. _on_death cleans up _refs and then voilà, it runs the original callback.

When exactly is the vigil deleted? This is a subtle point, as the sages among you know. First, PyPy uses occasional mark and sweep garbage collection instead of reference-counting, so the vigil isn't deleted until some time after the thread dies. In unittests, I force the issue with gc.collect().

Second, there's a bug in CPython 2.6 and earlier, fixed by Antoine Pitrou in CPython 2.7.1, where thread locals aren't cleaned up until the thread dies and some other thread accesses the local. I wrote about this in detail last year when I was struggling with it. gc.collect() won't help in this case.

Third, when is the local cleaned up in Python 2.7.1 and later? It happens as soon as the interpreter deletes the underlying PyThreadState, but that can actually come after Thread.join() returns: join() is simply waiting for a Condition to be set at the end of the thread's run, which comes before the locals are cleared. So in Python 2.7.1 and later we need to sleep a few milliseconds after joining the thread to be certain it's truly gone.

Thus a reliable test for my ThreadWatcher class might look like:

import gc
import threading
import time
import unittest

class TestWatch(unittest.TestCase):
    def test_watch(self):
        watcher = ThreadWatcher()
        callback_ran = [False]

        def callback():
            callback_ran[0] = True

        def target():
            watcher.watch(callback)

        t = threading.Thread(target=target)
        t.start()
        t.join()

        # Trigger collection in Py 2.6
        # See http://bugs.python.org/issue1868
        watcher.is_watching()
        gc.collect()

        # Cleanup can take a few ms in
        # Python >= 2.7
        for _ in range(10):
            if callback_ran[0]:
                break
            else:
                time.sleep(.1)


        assert callback_ran[0]
        # id(v) removed from _refs?
        assert not watcher._refs

The is_watching() call accesses the local object from the main thread after the child has died, working around the Python 2.6 bug, and the gc.collect() call makes the test pass in PyPy. The sleep loop gives Python 2.7.1 a chance to finish tearing down the thread state, including locals.

Two final cautions. The first is, you can't predict which thread runs the callback. In Python 2.6 it's whichever thread accesses the local after the child dies. In later versions, with Pitrou's improved thread-local implementation, the callback is run on the dying child thread. In PyPy it's whichever thread is active when the garbage collector decides to run.

The second caution is, there's an unreported memory-leak bug in Python 2.6, which Pitrou fixed in Python 2.7.1 along with the other bug I linked to. If you access a thread-local from within the weakref callback, you're touching the local in an inconsistent state, and the next object stored in the local will never be dereferenced. So don't do that. Here's a demonstration:

class TestRefLeak(unittest.TestCase):
    def test_leak(self):
        watcher = ThreadWatcher()
        n_callbacks = [0]
        nthreads = 10

        def callback():
            # BAD, NO!:
            # Accessing thread-local in callback
            watcher.is_watching()
            n_callbacks[0] += 1

        def target():
            watcher.watch(callback)

        for _ in range(nthreads):
            t = threading.Thread(target=target)
            t.start()
            t.join()

        watcher.is_watching()
        gc.collect()
        for _ in range(10):
            if n_callbacks[0] == nthreads:
                break
            else:
                time.sleep(.1)

        self.assertEqual(nthreads, n_callbacks[0])

In Python 2.7.1 and later the test passes because all ten threads' locals are cleaned up, and the callback runs ten times. But in Python 2.6 only five locals are deleted.

I discovered this bug when I rewrote the connection pool in PyMongo 2.2 and a user reported that in Python 2.6 and mod_wsgi, every second request leaked one socket! I fixed PyMongo in version 2.2.1 by avoiding accessing thread locals while they're being torn down. (See bug PYTHON-353.)

Update: I've discovered that in Python 2.7.0 and earlier, you need to lock around the assignment to self._local.vigil, see "Another Thing About Threadlocals".

For further reading:


Post-script: The image up top is a memento mori, a "reminder you will die," by Alessandro Casolani from the 16th Century. The memento mori genre is intended to offset a portrait subject's vanity—you look good now, but your beauty won't make a difference when you face your final judgment.

This was painted circa 1502 by Andrea Previtali:

Andrea Previtali Memento Mori WGA18406

The inscription is "Hic decor hec forma manet, hec lex omnibus unam," which my Latin-nerd friends translate as, "This beauty endures only in this form, this law is the same for everyone." It was painted upside-down on the back of this handsome guy:

Andrea Previtali portrait of a man

The painting was mounted on an axle so the face and the skull could be rapidly alternated and compared. Think about that the next time you start a thread—it may be running now, but soon enough it will terminate and even its thread-id will be recycled.

Motor Is Growing Up

Motor

For a long time I've thought that Motor, my non-blocking Python driver for MongoDB and Tornado, ought to be included as a module within the standard PyMongo package. Everyone both inside and outside 10gen has told me they'd prefer Motor be a separate distribution. Last week, I was suddenly enlightened. I agree!

(My argument for keeping Motor and PyMongo together was that changes in PyMongo might require changes in Motor, so they should be versioned and released together. But as Motor nears completion and I see the exact extent of its coupling with PyMongo, the risk of incompatibilities arising seems lower to me than it had.)

We completed the first step of the separation yesterday: We released PyMongo 2.4.2, the first version of PyMongo that includes the hooks Motor needs to wrap it and make it non-blocking.

The next step is to make a standalone distribution of Motor, and that's almost done, too. Motor has left its parent's house. It has:

And now, installing Motor is finally normal:

$ git clone git://github.com/mongodb/motor.git
$ cd motor
$ python setup.py install

Motor's not done yet, but it's heading to a 0.1 release in PyPI, as a standalone package, real soon now.

PyMongo 2.4.2 Is Out

Yesterday we released PyMongo 2.4.2, the latest version of 10gen's Python driver for MongoDB. You can see the whole list of nine bugs fixed. Here are some highlights:

  • I made PyMongo's MongoReplicaSetClient smarter about reading from replica set members in failure scenarios. Since version 2.1, PyMongo has been able to detect when a secondary becomes primary or vice versa. But it wasn't very smart about members that are neither primary nor secondary because they're in recovery mode. Now, PyMongo reacts as soon as it notices such a member: it stops trying to use it, and it refreshes its view of all members' states immediately.

  • We got an excellent pull request from Craig Hobbs that lets you specify your read preference in the connection string, like:

    "mongodb://localhost/?readPreference=secondary"
    
  • If you want to try MongoDB's full-text search, PyMongo can now create a text index. (All versions let you run the text command to use a text index once you've created one.)

(Down here we have to speak very quietly, because the next part is top-secret: I snuck a feature into what's supposed to be a bugfix release. PyMongo 2.4.2 has the hooks Motor needs to wrap PyMongo and make it non-blocking. This lets Motor take a new direction, which I'll blog about shortly.)