Motor

This is another post about Motor, my non-blocking driver for MongoDB and Tornado.

Last week I asked for your help improving Motor's iteration API, and I got invaluable responses here and on the Tornado mailing list. Today I'm pushing some breaking API changes to GitHub that'll make MotorCursor much easier to use.

(Note: I'm still not assigning version numbers to Motor, since it's going to join PyMongo soon. Meanwhile, to protect yourself against API changes, pip install Motor at a specific git hash until you're ready to upgrade.)

next_object

After getting some inspiration from Ben Darnell on the Tornado list, I added a fetch_next attribute to MotorCursor. You yield fetch_next from a Tornado coroutine, and if it sends back True, then next_object is guaranteed to have a document for you. So iterating over a MotorCursor is now quite nice:

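from tornado import gen

@gen.engine
def f():
    # collection is assumed to be a MotorCollection you've already opened
    cursor = collection.find()
    # fetch_next sends True while another document is available
    while (yield cursor.fetch_next):
        document = cursor.next_object()
        print(document)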

How does this work? Whenever you yield fetch_next, MotorCursor checks whether it already has another document batched. If so, it doesn't need to contact the server; it just sends True back into your coroutine. Your coroutine then calls next_object, which simply pops a document off the list.

If there aren't any more documents in the current batch, but the cursor's still alive, fetch_next fetches another batch from the server and then sends True into the coroutine.

And finally, if the cursor is exhausted, fetch_next sends False and your coroutine exits the while-loop.
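
The three cases above can be sketched with a toy cursor. This is only an illustration, not Motor's actual code: ToyCursor, its round_trips counter, and the drain helper are all made-up names, the "server" is just a list of pre-built batches, and fetch_next is a plain method here (rather than something you yield) so the sketch runs without Tornado or MongoDB.

```python
from collections import deque


class ToyCursor(object):
    """Hypothetical stand-in for MotorCursor; not Motor's real code."""

    def __init__(self, batches):
        # Each inner list plays the role of one batch the server would return.
        self._pending = deque(batches)
        self._buffer = deque()
        self.round_trips = 0  # how many times we "contacted the server"

    def fetch_next(self):
        # Case 1: a document is already buffered, so no server contact.
        if self._buffer:
            return True
        # Case 2: buffer is empty but batches remain, so fetch until
        # we have a document or run out of batches.
        while self._pending and not self._buffer:
            self.round_trips += 1
            self._buffer.extend(self._pending.popleft())
        # Case 3: nothing buffered and nothing left means exhausted (False).
        return bool(self._buffer)

    def next_object(self):
        # Synchronous: just pop the next buffered document.
        return self._buffer.popleft()


def drain(cursor):
    """Collect every document using the fetch_next / next_object idiom."""
    documents = []
    while cursor.fetch_next():
        documents.append(cursor.next_object())
    return documents
```

With two batches, drain(ToyCursor([[1, 2], [3]])) returns [1, 2, 3] after two simulated round trips. With no batches at all, drain(ToyCursor([])) returns [] and the loop body never runs, which is the empty-result case.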

This new style of iteration handles all the edge cases the previous "while cursor.alive" style failed at: it's an especially big win for the case where find() found no documents at all. I like this new idiom a lot; let me know what you think.

Migration: If you have any loops using while cursor.alive, you'll need to rewrite them in the style shown above. I had some special hacks in place to make cursor.alive useful for loops like this, but I've now removed those hacks, and you shouldn't rely on cursor.alive to tell you whether a cursor has more documents or not. Only rely on fetch_next for that. Furthermore, next_object is now synchronous. It doesn't take a callback, so you can no longer do this:

# old syntax
document = yield motor.Op(cursor.next_object)

to_list

Shane Spencer on the Tornado list insisted I should add a length argument to MotorCursor.to_list so you could say, "Get me a certain number of documents from the result set." I finally saw he was right, so I've added the option.

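import motor
from tornado import gen

@gen.engine
def f():
    # collection is assumed to be a MotorCollection you've already opened
    cursor = collection.find()
    # Ask for at most 10 documents per call
    results = yield motor.Op(cursor.to_list, 10)
    while results:
        print(results)
        results = yield motor.Op(cursor.to_list, 10)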

(Thanks to Andrew Downing for suggesting this loop style; apparently it's called a "Yourdon loop.")

This is a nice addition for chunking up your documents and not holding too much in memory. Note that the number of documents fetched per batch is still controlled by batch_size, not by the length argument. What length does is cap how many documents each to_list call returns, so your program doesn't download all the batches at once.
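
Here's a rough model of how length and batch_size interact. It's a sketch under assumptions, not Motor's implementation: ToyBatchedCursor and its batches_fetched counter are hypothetical, to_list is synchronous here, and I'm assuming to_list stops requesting batches as soon as it has collected length documents.

```python
class ToyBatchedCursor(object):
    """Hypothetical model of a batched cursor; not Motor's real code."""

    def __init__(self, documents, batch_size):
        self._remaining = list(documents)  # documents still "on the server"
        self._buffer = []                  # documents already downloaded
        self.batch_size = batch_size
        self.batches_fetched = 0

    def _fetch_batch(self):
        # One simulated round trip: move up to batch_size documents locally.
        self.batches_fetched += 1
        self._buffer.extend(self._remaining[:self.batch_size])
        del self._remaining[:self.batch_size]

    def to_list(self, length):
        # Fetch only as many batches as needed to collect `length` documents.
        while len(self._buffer) < length and self._remaining:
            self._fetch_batch()
        chunk = self._buffer[:length]
        del self._buffer[:length]
        return chunk
```

With ten documents, batch_size=4 and length=3, the first to_list(3) call fetches a single batch of four and returns three documents. The other six documents stay on the "server" until a later call needs them.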

Migration: If you ever called to_list with an explicit callback as a positional argument, like this:

cursor.to_list(my_callback)

... then my_callback will now be interpreted as the length argument and you'll get an exception:

TypeError: Wrong type for length, value must be an integer

Pass it as a keyword argument instead:

cursor.to_list(callback=my_callback)

Most Motor methods require you to pass the callback as a keyword argument, anyway, so you might as well use this style for all methods.
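
To see why the positional style breaks, here's a minimal sketch. The to_list function below is a hypothetical stand-in with an assumed signature, not Motor's real method; it only mimics the argument order and the type check, and it pretends the query found no documents.

```python
def to_list(length=None, callback=None):
    """Hypothetical signature for illustration; Motor's real method may differ."""
    # length comes first, so the first positional argument lands here.
    if length is not None and not isinstance(length, int):
        raise TypeError("Wrong type for length, value must be an integer")
    if callback is not None:
        # Pretend the query returned no documents and no error.
        callback([], None)


def my_callback(result, error):
    print(result)


try:
    to_list(my_callback)       # my_callback is taken as length: TypeError
except TypeError as exc:
    print(exc)

to_list(callback=my_callback)  # keyword style reaches the callback
```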

each

MotorCursor.each hasn't changed. It continues to be a pretty useless method, in my opinion, but it keeps Motor close to the MongoDB Node.js driver's API, so I'm not going to remove it.

In Conclusion

I asked for your help and I got it; everyone's critiques helped me seriously improve Motor. I'm glad I did this before I had to freeze the API. The new API is so much better.