A. Jesse Jiryu Davis

Category: Motor

The Green Matrix

For a year and a half I've been part of the team maintaining PyMongo, the Python MongoDB driver. It's one of the most widely used Python packages with 1.5 million lifetime downloads. The code itself is only moderately complex; about 8300 source lines. What makes it a tiny horror to work on is the range of environments we support. Here's our test matrix in Jenkins:

PyMongo test matrix

That's 72 test configurations. (It looks like more than that, but we don't test Jython and PyPy with C extensions compiled since that currently doesn't make sense.) The dimensions are:

  • Python version: We support CPython 2.4 through 3.3. On each commit we test just the highlight versions: 2.4, 2.7, and 3.3. We also support the latest Jython and PyPy. We test the intermediate versions like 2.5 and 2.6 before a release.

  • C extensions: we have a few key parts of PyMongo implemented in C for speed, with pure-Python versions as a fallback. We test both modes.

  • MongoDB Version: We test the latest development branch of MongoDB (2.5) plus the last two production versions.

  • MongoDB Configuration: We set up a single server, a master-slave pair, and a three-node replica set, and run mostly the same tests against all.
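The 72 falls out of multiplying the dimensions above and dropping the combinations that don't apply. Here's a small sketch that enumerates them. The MongoDB version labels are placeholders I made up for illustration, not the exact server builds in our Jenkins jobs.

```python
from itertools import product

# Per-commit dimensions from the list above. The MongoDB labels are
# placeholders, not the exact server builds we run.
PYTHONS = ["CPython 2.4", "CPython 2.7", "CPython 3.3", "Jython", "PyPy"]
C_EXTENSIONS = [True, False]
MONGODB_VERSIONS = ["2.5 (development)", "production N", "production N-1"]
TOPOLOGIES = ["single server", "master-slave", "replica set"]


def all_configurations():
    """Yield each (python, c_extensions, mongodb, topology) we test."""
    for python, c_ext, mongodb, topology in product(
            PYTHONS, C_EXTENSIONS, MONGODB_VERSIONS, TOPOLOGIES):
        # We don't compile the C extensions for Jython or PyPy.
        if c_ext and not python.startswith("CPython"):
            continue
        yield python, c_ext, mongodb, topology


configs = list(all_configurations())
print(len(configs))  # 72: (3 CPythons x 2 + 2 others) x 3 versions x 3 topologies
```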

In each test configuration, PyMongo's test suite has about 430 individual test functions.

This covers the main test matrix, but there are some auxiliary tests we run in Jenkins on every commit. We have a mod_wsgi test that runs a few thousand web requests (first serial, then parallel) against a web app using mod_wsgi in a range of configurations:

  • Python 2.4, 2.5, 2.6, and 2.7

  • mod_wsgi 2.8, 3.2, and 3.3

  • The latest production MongoDB as a single server or replica set

The mod_wsgi tests are there to ensure we never recreate a connection leak like the apocalyptic "unbounded connection growth with Apache mod_wsgi 2.x" bug to which I lost some of the best weeks of my life.

I've also set up some tests for Motor, my non-blocking MongoDB driver for Tornado: I run them in Python 2.6, 2.7, and 3.3 against a single MongoDB server and a replica set, using each of the three most recent versions of MongoDB. I have a separate Motor test that connects to MongoDB over SSL, and finally I have a test of "Synchro," which wraps Motor inside a resynchronization layer and checks it can pass all the same tests as PyMongo. In all, Jenkins runs 33 test configurations for each Motor commit.

Jenkins automatically tests our main configurations, but we periodically hand-test some additional configurations, like sharded clusters, beta releases of Jython and PyPy, and Windows. We'll put some of these in Jenkins too.

For a team of three people to build and maintain this volume of test infrastructure is a huge effort. It's clearly worth it, because the test matrix is so large. But it's not much fun.

Lessons learned:

  • Test code is a liability: Too much testing code is as bad as too much of any other kind of code. Write as few tests as possible to cover the cases you need to test. Over-testing comforts the novice but impedes agility. For example, when we renamed PyMongo's Connection class to MongoClient, I had to change over 1000 lines in 32 files in the test suite. A commit that huge is a barrier in the repository's history, across which no commit can be moved without conflicts. I hope to never do anything like it again. The test suite should be smaller and better factored.

  • Tests must be very reliable: The test suite must be not only minimal but also very reliable. Tests should fail if and only if the behavior they test breaks. When I joined the team, PyMongo's tests often failed "just cuz." Fixing them all took months: we'd observe an intermittent failure in Jenkins due to some race condition that we couldn't reproduce on our laptops (an EC2 "medium" instance runs a three-node MongoDB cluster slower than you could possibly imagine). We'd think real hard and finally understand and fix the failure. Then we'd do the same for some other test. It was a costly exercise but a necessary one: it wasn't until our tests always passed that we took them seriously when they failed.
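One common way to remove a class of these timing races is to replace fixed sleeps with a helper that polls for the condition and fails with a clear message after a generous timeout, since a slow EC2 instance can blow through any fixed sleep. This is a minimal sketch of that idea; wait_until is a hypothetical helper here, not a description of how PyMongo's suite was actually fixed.

```python
import time


def wait_until(predicate, description, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns something truthy or timeout expires.

    Returns the truthy value. Raises AssertionError naming what we waited
    for, so a failure in Jenkins says what didn't happen, not just "False".
    """
    # time.monotonic() is unaffected by wall-clock adjustments on the host.
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise AssertionError("Timed out waiting for %s" % description)
        time.sleep(interval)
```

A test would call it where it previously slept, for example while waiting for a replica set to elect a new primary after a step-down, and the timeout can be set high enough for the slowest machine without slowing down the fast ones.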

There are other dicta that I find negotiable: tests should be fast, sure, but I can live with a test suite that takes a few minutes to run per configuration. Perhaps test methods should include only one assert, but I can live with several asserts in some methods.

I'm implacably opposed to mocking when it comes to testing PyMongo: what our tests verify is primarily our understanding of how to talk to MongoDB. If we mocked any aspect whatsoever of the MongoDB server, our tests would be worse than useless. Virtually every test of PyMongo is an integration test, so we make no distinction between "unit tests" and "integration tests."

I'm curious what others have learned from maintaining a driver's test suite. It seems to be a lot of hard work no matter what.

Motor 0.1 Migration Instructions

Motor (which is indeed my non-blocking driver for MongoDB and Tornado) had a 0.1 release to PyPI yesterday. It had an odd history prior, so there are various versions of the code that you, dear reader, may have installed on your system. All you need to do is:

$ pip uninstall pymongo motor
$ pip install motor

Motor will pull in the official PyMongo, plus Tornado and Greenlet, as dependencies. You should now have Motor 0.1 and PyMongo 2.4.2:

>>> import pymongo
>>> pymongo.version
'2.4.2'
>>> import motor
>>> motor.version
'0.1'

(The lore is: I started Motor last year in a branch of my fork of PyMongo, so you could've installed an experimental version of both PyMongo and Motor from there. Then we transferred Motor into its own repo within the MongoDB.org organization on January 15. And on February 1st a zealous fan actually grabbed the "Motor" package name on PyPI and uploaded my code to it, then transferred ownership to me, just to make sure I could use the name Motor.)

Motor Officially Released

It's happened. Motor 0.1 is in PyPI. You can now install it with a simple:

$ pip install motor

This is the first official, production release of Motor.

That said, there will be bugs: please file them and I'll respond as quickly as I can.

Links:

Motor's Future

Motor is now feature-complete and fully tested. I expect to put it on the back burner and concentrate on other projects.

Motor will keep up easily with PyMongo development, because I designed it to. I don't intend for it to lag more than a smidge. For example, PyMongo 2.5 will bring some new security and authentication features; in the following Motor release I'll support those, too.

I believe this is the coolest thing I've ever made. I hope you have fun with it. Tweet me and let me know what you build with it.

Motor Is Growing Up

For a long time I've thought that Motor, my non-blocking Python driver for MongoDB and Tornado, ought to be included as a module within the standard PyMongo package. Everyone both inside and outside 10gen has told me they'd prefer Motor be a separate distribution. Last week, I was suddenly enlightened. I agree!

(My argument for keeping Motor and PyMongo together was that changes in PyMongo might require changes in Motor, so they should be versioned and released together. But as Motor nears completion and I see the exact extent of its coupling with PyMongo, the risk of incompatibilities arising seems lower to me than it once did.)

We completed the first step of the separation yesterday: We released PyMongo 2.4.2, the first version of PyMongo that includes the hooks Motor needs to wrap it and make it non-blocking.

The next step is to make a standalone distribution of Motor, and that's almost done, too. Motor has left its parent's house. It has:

And now, installing Motor is finally normal:

$ git clone git://github.com/mongodb/motor.git
$ cd motor
$ python setup.py install

Motor's not done yet, but it's heading to a 0.1 release in PyPI, as a standalone package, real soon now.

MotorConnection Has Been Renamed MotorClient

As it was foretold, so has it come to pass. The omens all are satisfied, the prophecy fulfilled.

Last month I wrote about PyMongo renaming its main classes from Connection to MongoClient and from ReplicaSetConnection to MongoReplicaSetClient. For consistency, I promised to rename Motor's main classes, too: from MotorConnection to MotorClient and from MotorReplicaSetConnection to MotorReplicaSetClient. Now I've done so.

Migration

  1. Obviously, anywhere you refer to MotorConnection or MotorReplicaSetConnection, replace it with MotorClient or MotorReplicaSetClient.
  2. More subtly, if you use the sync_connection method, that's changed to sync_client.
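If you have a lot of code to update, a throwaway script can do both steps. This is a minimal sketch, not something shipped with Motor; migrate_source and migrate_file are hypothetical helpers, and you should review the diff before committing.

```python
import re

# Old name -> new name, per the two migration steps above.
RENAMES = {
    "MotorReplicaSetConnection": "MotorReplicaSetClient",
    "MotorConnection": "MotorClient",
    "sync_connection": "sync_client",
}

# \b anchors keep us from rewriting longer identifiers that merely contain
# an old name, such as a hypothetical MyMotorConnectionWrapper.
_PATTERN = re.compile(r"\b(%s)\b" % "|".join(
    sorted((re.escape(name) for name in RENAMES), key=len, reverse=True)))


def migrate_source(text):
    """Return text with the old Motor class and method names replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)


def migrate_file(path):
    """Rewrite one file in place; return True if anything changed."""
    with open(path) as f:
        original = f.read()
    updated = migrate_source(original)
    if updated != original:
        with open(path, "w") as f:
            f.write(updated)
    return updated != original
```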

I've updated this blog to run on the latest version of Motor; you can see the commit here.

Yes, Every MongoDB Driver Supports Every Command

This post is in response to a persistent form of question I receive about MongoDB drivers: "Does driver X support feature Y?" The answer is nearly always "yes," but you can't know that unless you understand MongoDB commands.

There are only five kinds of operations a MongoDB driver can perform on the server: insert, update, remove, query, and commands.

Almost two years ago my colleague Kristina wrote about "Why Command Helpers Suck," and she is still right: if you only use the convenience methods without understanding the unifying concept of a "command," you're unnecessarily tied to a particular driver's API, and you don't know how MongoDB really works.

So let's do a pop quiz:

  1. Which MongoDB drivers support the Aggregation Framework?
  2. Which support the "group" operation?
  3. Which drivers are compatible with MongoDB's mapreduce feature?
  4. Which drivers let you run "count" or "distinct" on a collection?

If you answered, "all of them," you're right—every driver supports commands, and all the features I asked about are commands.

Let's consider three MongoDB drivers for Python and show examples of using the distinct command in each.

PyMongo

PyMongo has two convenience methods for distinct. One is on the Collection class, the other on Cursor:

>>> from pymongo import MongoClient
>>> db = MongoClient().test
>>> db.test_collection.distinct('my_key')
[1.0, 2.0, 3.0]
>>> db.test_collection.find().distinct('my_key')
[1.0, 2.0, 3.0]

But this all boils down to the same MongoDB command. We can look up its arguments in the MongoDB Command Reference and see that distinct takes the form:

{ distinct: collection, key: <field>, query: <query> }

So let's use PyMongo's generic command method to run distinct directly. We'll pass the collection and key arguments and omit query. We need to use PyMongo's SON class to ensure we pass the arguments in the right order:

>>> from bson import SON
>>> db.command(SON([('distinct', 'test_collection'), ('key', 'my_key')]))
{u'ok': 1.0,
 u'stats': {u'cursor': u'BasicCursor',
            u'n': 3,
            u'nscanned': 3,
            u'nscannedObjects': 3,
            u'timems': 0},
 u'values': [1.0, 2.0, 3.0]}

The answer is in values.
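To make the point concrete, here is a minimal sketch of how convenience methods like distinct and count can be layered on a generic command method. The helper names are hypothetical, not PyMongo's implementation: run_command stands in for any driver's command method (such as PyMongo's db.command), and collections.OrderedDict stands in for SON in this self-contained sketch, since what matters is that the command name is the first key. With PyMongo itself you'd use SON as shown above.

```python
from collections import OrderedDict


def distinct_via_command(run_command, collection, key, query=None):
    """Run {distinct: collection, key: key[, query: query]}; return values."""
    # The command name must come first, so build the document in order.
    command = OrderedDict([("distinct", collection), ("key", key)])
    if query is not None:
        command["query"] = query
    return run_command(command)["values"]


def count_via_command(run_command, collection, query=None):
    """Run {count: collection[, query: query]}; return the count as an int."""
    command = OrderedDict([("count", collection)])
    if query is not None:
        command["query"] = query
    return int(run_command(command)["n"])


# A fake command runner that records what it was sent (illustration only).
sent = []


def fake_command(command):
    sent.append(command)
    return {"ok": 1.0, "values": [1.0, 2.0, 3.0], "n": 3.0}


print(distinct_via_command(fake_command, "test_collection", "my_key"))
# [1.0, 2.0, 3.0]
print(list(sent[0].items()))
# [('distinct', 'test_collection'), ('key', 'my_key')]
```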

Motor

My async driver for Tornado and MongoDB, called Motor, supports similar conveniences for distinct. It has both the MotorCollection.distinct method:

>>> from tornado.ioloop import IOLoop
>>> from tornado import gen
>>> import motor
>>> from motor import MotorConnection
>>> db = MotorConnection().open_sync().test
>>> @gen.engine
... def f():
...     print (yield motor.Op(db.test_collection.distinct, 'my_key'))
...     IOLoop.instance().stop()
... 
>>> f()
>>> IOLoop.instance().start()
[1.0, 2.0, 3.0]

... and MotorCursor.distinct:

>>> @gen.engine
... def f():
...     print (yield motor.Op(db.test_collection.find().distinct, 'my_key'))
...     IOLoop.instance().stop()
... 
>>> f()
>>> IOLoop.instance().start()
[1.0, 2.0, 3.0]

Again, these are just convenient alternatives to using MotorDatabase.command:

>>> @gen.engine
... def f():
...     print (yield motor.Op(db.command,
...         SON([('distinct', 'test_collection'), ('key', 'my_key')])))
...     IOLoop.instance().stop()
... 
>>> f()
>>> IOLoop.instance().start()
{u'ok': 1.0,
 u'stats': {u'cursor': u'BasicCursor',
            u'n': 3,
            u'nscanned': 3,
            u'nscannedObjects': 3,
            u'timems': 0},
 u'values': [1.0, 2.0, 3.0]}

AsyncMongo

AsyncMongo is another driver for Tornado and MongoDB. Its interface isn't nearly so rich as Motor's, so I often hear questions like, "Does AsyncMongo support distinct? Does it support aggregate? What about group?" In fact, it's those questions that prompted this post. And of course the answer is yes, AsyncMongo supports all commands:

>>> from tornado.ioloop import IOLoop
>>> from tornado import gen
>>> from bson import SON
>>> import asyncmongo
>>> db = asyncmongo.Client(
...     pool_id='mydb', host='127.0.0.1', port=27017,
...     maxcached=10, maxconnections=50, dbname='test')
>>> @gen.engine
... def f():
...     results = yield gen.Task(db.command,
...         SON([('distinct', 'test_collection'), ('key', 'my_key')]))
...     print results.args[0]
...     IOLoop.instance().stop()
... 
>>> f()
>>> IOLoop.instance().start()
{u'ok': 1.0,
 u'stats': {u'cursor': u'BasicCursor',
            u'n': 3,
            u'nscanned': 3,
            u'nscannedObjects': 3,
            u'timems': 0},
 u'values': [1.0, 2.0, 3.0]}

Exceptions

There are some areas where drivers really differ, like Replica Set support, or Read Preferences. 10gen's drivers are much more consistent than third-party drivers. But if the underlying operation is a command, then all drivers are essentially the same.

So Go Learn How To Run Commands

So the next time you're about to ask, "Does driver X support feature Y," first check if Y is a command by looking for it in the command reference. Chances are it's there, and if so, you know how to run it.

Motor: Iterating Over Results, The Grand Conclusion

This is another post about Motor, my non-blocking driver for MongoDB and Tornado.

Last week I asked for your help improving Motor's iteration API, and I got invaluable responses here and on the Tornado mailing list. Today I'm pushing to GitHub some breaking changes to the API that'll greatly improve MotorCursor's ease of use.

(Note: I'm continuing to not make version numbers for Motor, since it's going to join PyMongo soon. Meanwhile, to protect yourself against API changes, pip install Motor with a specific git hash until you're ready to upgrade.)

next_object

After getting some inspiration from Ben Darnell on the Tornado list, I added a fetch_next attribute to MotorCursor. You yield fetch_next from a Tornado coroutine, and if it sends back True, then next_object is guaranteed to have a document for you. So iterating over a MotorCursor is now quite nice:

@gen.engine
def f():
    cursor = collection.find()
    while (yield cursor.fetch_next):
        document = cursor.next_object()
        print document

How does this work? Whenever you yield fetch_next, MotorCursor checks if it has another document already batched. If so, it doesn't need to contact the server; it just sends True back into your coroutine. Your coroutine then calls next_object, which simply pops a document off the list.

If there aren't any more documents in the current batch, but the cursor's still alive, fetch_next fetches another batch from the server and then sends True into the coroutine.

And finally, if the cursor is exhausted, fetch_next sends False and your coroutine exits the while-loop.

This new style of iteration handles all the edge cases the previous "while cursor.alive" style failed at: it's an especially big win for the case where find() found no documents at all. I like this new idiom a lot; let me know what you think.
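The fetch_next protocol can be modeled without any I/O. The sketch below is a hypothetical, synchronous ToyCursor, not Motor's code: fetch_next is a plain method returning a bool instead of something you yield, and a list of batches stands in for the server. It shows why the empty-result case falls out naturally.

```python
from collections import deque


class ToyCursor(object):
    """Synchronous model of the fetch_next / next_object protocol."""

    def __init__(self, batches):
        self._server_batches = deque(batches)  # what the "server" will send
        self._buffer = deque()                 # documents already fetched
        self.alive = True

    def _get_more(self):
        # Simulate one round trip; the cursor dies once the server is empty.
        if self._server_batches:
            self._buffer.extend(self._server_batches.popleft())
        if not self._server_batches:
            self.alive = False

    def fetch_next(self):
        """Return True iff next_object() is guaranteed to return a document."""
        # Only touch the "server" when the local batch is used up.
        while not self._buffer and self.alive:
            self._get_more()
        return bool(self._buffer)

    def next_object(self):
        """Pop a buffered document without any I/O, or None if none is left."""
        return self._buffer.popleft() if self._buffer else None


def iterate(cursor):
    docs = []
    while cursor.fetch_next():
        docs.append(cursor.next_object())
    return docs


print(iterate(ToyCursor([[1, 2], [3]])))  # [1, 2, 3]
print(iterate(ToyCursor([])))             # [], no special case needed
```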

Migration: If you have any loops using while cursor.alive, you'll need to rewrite them in the style shown above. I had some special hacks in place to make cursor.alive useful for loops like this, but I've now removed those hacks, and you shouldn't rely on cursor.alive to tell you whether a cursor has more documents or not. Only rely on fetch_next for that. Furthermore, next_object is now synchronous. It doesn't take a callback, so you can no longer do this:

# old syntax
document = yield motor.Op(cursor.next_object)

to_list

Shane Spencer on the Tornado list insisted I should add a length argument to MotorCursor.to_list so you could say, "Get me a certain number of documents from the result set." I finally saw he was right, so I've added the option.

@gen.engine
def f():
    cursor = collection.find()
    results = yield motor.Op(cursor.to_list, 10)
    while results:
        print results
        results = yield motor.Op(cursor.to_list, 10)

(Thanks to Andrew Downing for suggesting this loop style; apparently it's called a "Yourdon loop.")

This is a nice addition for chunking up your documents and not holding too much in memory. Note that the actual number of documents fetched per batch is controlled by batch_size, not by the length argument. But you can prevent your program from downloading all the batches at once if you pass a length. (I hope that makes sense.)
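To illustrate the difference between length and batch_size, here's a toy synchronous model. ToyBatchingCursor is hypothetical, not Motor's implementation: each round trip fetches batch_size documents, and to_list(length) only decides when to stop fetching and how many documents to hand back. It runs the same Yourdon loop as above.

```python
class ToyBatchingCursor(object):
    """batch_size sets how many docs each round trip fetches; to_list's
    length only decides when to stop fetching and how many to return."""

    def __init__(self, documents, batch_size):
        self._remaining = list(documents)  # what the "server" still holds
        self._buffer = []                  # fetched but not yet returned
        self.batch_size = batch_size
        self.round_trips = 0

    def _get_more(self):
        self.round_trips += 1
        batch = self._remaining[:self.batch_size]
        del self._remaining[:self.batch_size]
        self._buffer.extend(batch)

    def to_list(self, length):
        # Fetch whole batches until we can satisfy `length` or run out.
        while len(self._buffer) < length and self._remaining:
            self._get_more()
        result = self._buffer[:length]
        del self._buffer[:length]  # extras stay buffered for the next call
        return result


cursor = ToyBatchingCursor(range(25), batch_size=3)
chunks = []
results = cursor.to_list(10)
while results:
    chunks.append(results)
    results = cursor.to_list(10)

print([len(chunk) for chunk in chunks])  # [10, 10, 5]
print(cursor.round_trips)                # 9
```

Note that the first call fetches 12 documents (four batches of 3) to return 10; the 2 extras wait in the buffer, which is the sense in which length bounds what you hold rather than what each batch contains.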

Migration: If you ever called to_list with an explicit callback as a positional argument, like this:

cursor.to_list(my_callback)

... then my_callback will now be interpreted as the length argument and you'll get an exception:

TypeError: Wrong type for length, value must be an integer

Pass it as a keyword-argument instead:

cursor.to_list(callback=my_callback)

Most Motor methods require you to pass the callback as a keyword argument, anyway, so you might as well use this style for all methods.

each

MotorCursor.each hasn't changed. It continues to be a pretty useless method, in my opinion, but it keeps Motor close to the MongoDB Node.js Driver's API so I'm not going to remove it.

In Conclusion

I asked for your help and I got it; everyone's critiques helped me seriously improve Motor. I'm glad I did this before I had to freeze the API. The new API is so much better.

Motor: Iterating Over Results

Motor (yes, that's my non-blocking MongoDB driver for Tornado) has three methods for iterating a cursor: to_list, each, and next_object. I chose these three methods to match the Node.js driver's methods, but in Python they all have problems.

I'm writing to announce an improvement I made to next_object and to ask you for suggestions for further improvement.

Update: Here are the improvements I made to the API in response to your critique.

to_list

MotorCursor.to_list is clearly the most convenient: it buffers up all the results in memory and passes them to the callback:

@gen.engine
def f():
    results = yield motor.Op(collection.find().to_list)
    print results

But it's dangerous, because you don't know for certain how big the results will be unless you set an explicit limit. In the docs I exhort you to set a limit before calling to_list. Should I raise an exception if you don't, or just let the user beware?

each

MotorCursor's each takes a callback which is executed once for every document. This actually looks fairly elegant in Node.js, but because Python's anonymous functions (lambdas) are limited to a single expression, you must define a named function first, so it looks like ass in Python, with control jumping forward and backward in the code:

def each(document, error):
    if error:
        raise error
    elif document:
        print document
    else:
        # Iteration complete
        print 'done'

collection.find().each(callback=each)

Python's generators allow us to do inline callbacks with tornado.gen, which makes up for the lack of anonymous functions. each doesn't work with the generator style, though, so I don't think many people will use each.
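To see why generators make up for the missing multi-line anonymous functions, here is a toy version of the inline-callback idea behind tornado.gen. It is a deliberately simplified sketch with hypothetical names (run_inline, fake_async_op): there's no IOLoop and no error propagation, and the fake operation calls back immediately instead of later from the event loop.

```python
def fake_async_op(value, callback):
    """Stand-in for an async driver call that 'completes' with value."""
    # A real driver would invoke callback later, from the event loop.
    callback(value)


def run_inline(generator_function):
    """Drive a generator that yields (function, argument) pairs.

    Each yielded operation is started with a callback; the callback sends
    the result back into the generator, resuming it right where it paused.
    """
    gen = generator_function()

    def step(value):
        try:
            func, arg = gen.send(value)
        except StopIteration:
            return  # the generator finished
        func(arg, callback=step)

    # Because fake_async_op calls back immediately, this recurses once per
    # operation; a real event loop would unwind the stack between steps.
    step(None)


log = []


def handler():
    # Reads top to bottom, with no named callback functions in sight.
    doc = yield fake_async_op, {"_id": 1}
    log.append(doc)
    doc = yield fake_async_op, {"_id": 2}
    log.append(doc)


run_inline(handler)
print(log)  # [{'_id': 1}, {'_id': 2}]
```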

next_object

Since tornado.gen is the most straightforward way to write Tornado apps, I designed next_object for you to use with tornado.gen, like this:

@gen.engine
def f():
    cursor = collection.find()
    while cursor.alive:
        document = yield motor.Op(cursor.next_object)
        print document

    print 'done'

This loop looks pretty nice, right? The improvement I just committed is that next_object prefetches the next batch whenever needed to ensure that alive is correct—that is, alive is True if the cursor has more documents, False otherwise.

Problem is, just because cursor.alive is True doesn't truly guarantee that next_object will actually return a document. The first call returns None if find matched no documents at all, so a proper loop is more like:

@gen.engine
def f():
    cursor = collection.find()
    while cursor.alive:
        document = yield motor.Op(cursor.next_object)
        if document:
            print document
        else:
            # No results at all
            break

This is looking less nice. A blocking driver could have reasonable solutions like making cursor.alive actually fetch the first batch of results and return False if there are none. But a non-blocking driver needs to take a callback for every method that does I/O. I'm considering introducing a MotorCursor.has_next method that takes a callback:

cursor = collection.find()
while (yield motor.Op(cursor.has_next)):
    # Now we know for sure that document isn't None
    document = yield motor.Op(cursor.next_object)
    print document

This will be a core idiom for Motor applications; it should be as easy as possible to use.

What do you think?

Motor Installation Instructions

Update: Motor is in PyPI now, so this is all moot.

I've done a bad job with installation instructions for Motor, my non-blocking driver for MongoDB and Tornado. I've gotten a bunch of emails from people complaining about this:

Traceback (most recent call last):    
  File "myfile.py", line 2, in <module>
    connection = motor.MotorConnection().open_sync()
  File ".../motor/__init__.py", line 690, in open_sync
    raise outcome['error']
pymongo.errors.ConfigurationError: Unknown option _pool_class

You'll get this ConfigurationError if you installed Motor without uninstalling PyMongo first. But you couldn't know that, because I forgot to tell you.

Here's installation instructions, followed by an explanation of why installation is wonky right now and how it will improve, and what Motor's status is now.

Installation

I assume you have pip, and I recommend you use virtualenv—these are just best practices for all Python application development. You need regular CPython, 2.5 or better.

# if you have pymongo installed previously, you MUST uninstall it
pip uninstall pymongo

# install prerequisites
pip install tornado greenlet

# get motor
pip install git+https://github.com/ajdavis/mongo-python-driver.git@motor

Now you should have my versions of pymongo, bson, gridfs, and motor installed:

>>> import motor
>>>

Update: If you're testing against a particular version of Motor, you can freeze that requirement and install that version by git hash, like:

pip install git+https://github.com/ajdavis/mongo-python-driver.git@694436f

pip will say, "Could not find a tag or branch '694436f', assuming commit," which is what you want. You can put Motor and its dependencies in your requirements.txt:

greenlet==0.4.0
tornado==2.4
git+https://github.com/ajdavis/mongo-python-driver.git@694436f

And install:

pip install -r requirements.txt

Confusingly, the command to uninstall Motor is:

pip uninstall pymongo

Why Is Installation Wonky?

Why do you have to uninstall 10gen's official PyMongo before installing Motor? Why isn't Motor in PyPI? Why doesn't Motor automatically install the Tornado and Greenlet packages as dependencies? All will be revealed.

Implementing Motor requires a few extra hooks in the core PyMongo module. For example, I added a _pool_class option to PyMongo's Connection class. Thus Motor and PyMongo are coupled, and I want them to be versioned together. Motor is a feature of PyMongo that you can choose to use. In the future when Motor is an official 10gen product, Motor and PyMongo will be in the same git repository, and in the same package in PyPI, and when you pip install pymongo, you'll get the motor module installed in your site-packages, just like the pymongo, bson, gridfs modules now. There will never be a separate "Motor" package in PyPI.

Even once Motor is official, the whole PyMongo package shouldn't require Tornado and Greenlet as dependencies. So you'll still need to manually install them to make Motor work. PyMongo will still work without Tornado and Greenlet, of course—they won't be necessary until you import motor.

Since that's the goal—the Motor module as a feature of PyMongo, in the same repository and the same PyPI package—this beta period is awkward. I'm building Motor in my fork of the PyMongo repo, on a motor branch, and regularly merging the upstream repo's changes. Sometimes, upstream changes to PyMongo break Motor and need small fixes.

I don't want to make a PyPI package for Motor, since that package will be obsolete once Motor's merged upstream. And since the eventual version of the PyMongo package that includes Motor won't require Tornado or Greenlet as dependencies, neither does the version in my git repo.

Status

Motor is feature-complete, and it's compatible with all the Python versions that Tornado is. MotorConnection has been load-tested by the QA team at a large corporation, with good results. At least one small startup has put MotorReplicaSetConnection in production, with one bug reported and fixed—Motor threw the wrong kinds of exceptions during a replica-set failover. I'm now hunting a similar MotorReplicaSetConnection bug reported on the Tornado mailing list.

Besides that bug, Motor has 37 TODOs. All are reminders to myself to refactor Motor's interaction with PyMongo, and to ensure every corner of Motor is reviewed, tested, and documented. I need to:

  • Complete those refactoring, testing, and documentation TODOs
  • Ensure 100% code coverage by unittests
  • Complete my own load-testing to make sure Motor matches AsyncMongo's performance
  • Pass code reviews from PyMongo's maintainer Bernie Hackett

At that point, Bernie and I will decide if Motor is ready to go official, and I'll announce on this blog, and throw a party.

Eating Your Own Hamster Food

Hamster Food

If you aren't using your own libraries as you build them, you're skipping an essential test: not mainly for correctness or performance but for usability.

(Using your software as you develop it is normally called "eating your own dogfood", but I don't have any dogs. Only hamsters. This is my dwarf hamster Rhoda.)

I develop Motor, my asynchronous driver for Tornado and MongoDB, mainly with test-driven development: I think of an API Motor should implement, I write the test, and I make Motor pass the test. But I also use Motor in the blog platform that runs this site. By using Motor, I discovered a few features that are absolutely essential for building a real application with it, which I never would have thought of otherwise:

• Opening a MotorConnection. My initial API for opening a connection to MongoDB with Motor was asynchronous:

connection = motor.MotorConnection()
connection.open(my_callback)

That's fine for unittests. But as soon as I started building my blog, it was clear it was a pain in the ass. There's no place in a Tornado application's usual startup sequence to do this step. So I also provided the convenience method open_sync to start up the connection with one blocking call.

• Opening a synchronous connection. Here the problem is reversed: There are places in my code I really need a plain old synchronous PyMongo connection. Should I copy and paste all the code I use to get the MongoDB options from my application configuration? Instead I provided MotorConnection.sync_connection which creates a PyMongo connection using the same options the MotorConnection has.

• GridFSHandler. I recently completed Motor's methods for accessing GridFS, MongoDB's binary blob-storage system. Then I updated my blog to serve images from GridFS. And even though all the functionality I needed was complete, it was horribly inconvenient. So I wrote a stream_to_handler method to pipe a GridFS file into a Tornado RequestHandler. Once I started using it, I figured it was still too low-level, so I reimplemented Tornado's StaticFileHandler on top of GridFS. Now serving static files straight from MongoDB is as easy as serving them from the file system.
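Piping a stored file into a response boils down to copying it a chunk at a time rather than reading it all into memory. Here's a minimal synchronous sketch of that pattern. stream_in_chunks is a hypothetical helper, not Motor's stream_to_handler; in Motor each chunk read from GridFS is itself an asynchronous operation, which this sketch leaves out. In a Tornado handler, write would be the handler's write method.

```python
import io


def stream_in_chunks(reader, write, chunk_size=256 * 1024):
    """Copy a file-like reader to write() one chunk at a time.

    Only one chunk is held in memory at once. Returns total bytes written.
    """
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        write(chunk)
        total += len(chunk)
    return total


# Usage with in-memory stand-ins for a GridFS file and an HTTP response.
source = io.BytesIO(b"x" * 1000)
response = io.BytesIO()
print(stream_in_chunks(source, response.write, chunk_size=300))  # 1000
```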

I've sunk a lot of hours into building this site. I wondered if all the time was worth it. It's not like it has any special features I couldn't get from Nikola or Pelican. Building a capable blog platform with code syntax highlighting, drafts, media, Disqus, Google Analytics, and so on took longer than I expected, and I'm still tinkering with it. But the investment pays off marvelously. By using Motor in a real-world application, even a small one, I've discovered serious usability problems my testing wouldn't reveal.