A. Jesse Jiryu Davis

Tag: pymongo

PyMongo 2.4.2 Is Out

Yesterday we released PyMongo 2.4.2, the latest version of 10gen's Python driver for MongoDB. You can see the whole list of nine bugs fixed. Here are some highlights:

  • I made PyMongo's MongoReplicaSetClient smarter about reading from replica set members in failure scenarios. Since version 2.1, PyMongo has been able to detect when a secondary becomes primary or vice versa. But it wasn't very smart about members that are neither primary nor secondary because they're in recovery mode. Now, PyMongo reacts as soon as it notices such a member: it stops trying to use it, and it refreshes its view of all members' states immediately.

  • We got an excellent pull request from Craig Hobbs that lets you specify your read preference in the connection string, like:

    "mongodb://localhost/?readPreference=secondary"
    
  • If you want to try MongoDB's full-text search, PyMongo can now create a text index. (All versions let you run the text command to use a text index once you've created it.)

(Down here we have to speak very quietly, because the next part is top-secret: I snuck a feature into what's supposed to be a bugfix release. PyMongo 2.4.2 has the hooks Motor needs to wrap PyMongo and make it non-blocking. This lets Motor take a new direction, which I'll blog about shortly.)

Reading from MongoDB Replica Sets with PyMongo

Read preferences are a new feature in MongoDB 2.2 that let you finely control how queries are routed to replica set members. With fine control comes complexity, but fear not: I'll explain how to use read preferences to route your queries with PyMongo.

(I helped write 10gen's spec for read preferences, and I did the implementation for PyMongo 2.3.)


The Problem

Which member of a replica set should PyMongo use for a find, or for a read-only command like count? Should it query the primary or a secondary? If it queries a secondary, which one should it use? How can you control this choice?

When your application queries a replica set, you have the opportunity to trade off consistency, availability, latency, and throughput for each kind of query. This is the problem that read preferences solve: how to specify your preferences among these four variables, so you read from the best member of the replica set for each query.

First I'll describe what a read preference is. Then I'll show PyMongo's algorithm for choosing a member. Finally, I'll discuss a list of use cases and recommend a read preference to use for each.

Read Preferences

A read preference has three parts:

Mode. This determines whether to read from the primary or secondaries. There are five modes:

  • PRIMARY: The default mode. Always read from the primary. If there's no primary, raise an exception: "AutoReconnect: No replica set primary available for query".
  • SECONDARY: read from a secondary if there is one, otherwise raise an exception: "AutoReconnect: No replica set secondary available for query". PyMongo prefers secondaries with short ping times.
  • PRIMARY_PREFERRED: read from the primary if there is one, otherwise a secondary.
  • SECONDARY_PREFERRED: read from a secondary if there is one, otherwise the primary. Again, low-latency secondaries are preferred.
  • NEAREST: read from any low-latency member.

Tag Sets. If you've tagged your replica set members, you can use tags to specify which members to read from. Let's say you've tagged your members according to which data centers they're in. Your replica-set config is like:

{
    _id : "someSet",
    members : [
        {_id : 0, host : "A", tags : {"dc": "ny"}},
        {_id : 1, host : "B", tags : {"dc": "ny"}},
        {_id : 2, host : "C", tags : {"dc": "sf"}},
        {_id : 3, host : "D", tags : {"dc": "sf"}},
        {_id : 4, host : "E", tags : {"dc": "uk"}}
    ]
}

You could configure PyMongo to use this array of tag sets:

[{'dc': 'ny'}, {'dc':'sf'}, {}]

The driver searches through the array, from first tag set to last, looking for a tag set that matches one or more members. So if any members are online that match {'dc': 'ny'}, the driver picks among them, preferring those with the shortest ping times. If no members match {'dc': 'ny'}, PyMongo looks for members matching {'dc':'sf'}, and so on down the list.

The final, empty tag set {} means, "read from any member regardless of tags." It's a fail-safe. If you would rather raise an exception than read from a member that doesn't match a tag set, omit the empty set from the end of the array:

[{'dc': 'ny'}, {'dc':'sf'}]

In this case, if all members in New York and San Francisco are down, PyMongo will raise an exception instead of trying the member in the UK.

You can have multiple tags in a set. A member has to match all the tags. E.g., if your array of tag sets is like:

[{'dc': 'ny', 'disk': 'ssd'}]

... then only a member tagged both with 'dc': 'ny' and with 'disk': 'ssd' is a match. A member's extra tags, like 'rack': 2, have no effect.
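
To make the matching rules concrete, here is a minimal sketch in plain Python. It is not PyMongo's own code: the names member_matches and apply_tag_sets are made up for illustration, and the replica set is modeled as a simple dict from host name to tags.

```python
def member_matches(tag_set, member_tags):
    """True if the member carries every tag in tag_set.

    Extra tags on the member (like 'rack': 2) are ignored.
    """
    return all(member_tags.get(key) == value
               for key, value in tag_set.items())


def apply_tag_sets(tag_sets, members):
    """Return hosts matching the first tag set that matches anyone.

    `members` is a hypothetical dict of host name -> tags.
    The empty tag set {} matches every member.
    """
    for tag_set in tag_sets:
        matching = [host for host, tags in sorted(members.items())
                    if member_matches(tag_set, tags)]
        if matching:
            return matching
    return []  # nothing matched; the driver would raise an exception here


members = {
    'A': {'dc': 'ny'}, 'B': {'dc': 'ny'},
    'C': {'dc': 'sf'}, 'D': {'dc': 'sf'},
    'E': {'dc': 'uk'},
}
only_uk_is_up = {'E': {'dc': 'uk'}}

print(apply_tag_sets([{'dc': 'ny'}, {'dc': 'sf'}, {}], members))
print(apply_tag_sets([{'dc': 'ny'}, {'dc': 'sf'}], only_uk_is_up))
print(apply_tag_sets([{'dc': 'ny'}, {'dc': 'sf'}, {}], only_uk_is_up))
```

The second call shows why the trailing {} matters: without it, a lone UK member is never selected.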

Each mode interacts a little differently with tag sets:

  • PRIMARY: You can't combine tag sets with PRIMARY. After all, there's only one primary, so it's senseless to ask for a primary with particular tags.
  • PRIMARY_PREFERRED: If the primary is up, read from it no matter how it's tagged. If the primary is down, read from a secondary matching the tags provided. If there is no such secondary, raise an error.
  • SECONDARY: Read from a secondary that matches the first tag set for which there are any matches.
  • SECONDARY_PREFERRED: Like SECONDARY, or if there are no matching secondaries, like PRIMARY.
  • NEAREST: Like SECONDARY, but treat the primary the same as the secondaries.

secondary_acceptable_latency_ms. PyMongo tracks each member's ping time (see monitoring below) and queries only the "nearest" member, or any random member no more than 15ms "farther" than it.

Say you have members who are 10, 20, and 30 milliseconds away:

PyMongo distributes queries evenly between the 10- and 20-millisecond members. It excludes the 30-millisecond member, because it's more than 15ms farther than the closest member. You can override the 15ms default by setting the snappily-named secondary_acceptable_latency_ms option.
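
The latency window is easy to sketch in a few lines. This is an illustration only, assuming ping times are already known and stored in a plain dict; within_latency_window is a hypothetical helper, not a PyMongo function.

```python
def within_latency_window(ping_ms, acceptable_latency_ms=15):
    """Keep hosts no more than acceptable_latency_ms slower than the fastest.

    `ping_ms` is a hypothetical dict of host name -> ping time in ms.
    """
    if not ping_ms:
        return []
    fastest = min(ping_ms.values())
    return sorted(host for host, ping in ping_ms.items()
                  if ping - fastest <= acceptable_latency_ms)


pings = {'A': 10, 'B': 20, 'C': 30}
print(within_latency_window(pings))      # default 15ms window
print(within_latency_window(pings, 50))  # wider window admits everyone
```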

The Algorithm

PyMongo chooses a member using the three parts of a read preference as a three-stage filter, removing ineligible members at each stage. For PRIMARY, the driver just picks the primary, or if there's no primary, raises an exception. For SECONDARY and NEAREST:

  1. Apply the mode. For SECONDARY, filter out the primary and continue. For NEAREST, keep all the members and continue.
  2. Apply the tag sets. If there are no tag sets configured, then pass all the members to the next stage. Otherwise, search through the array of tag sets looking for a tag set that matches some members, and pass those members to the next stage.
  3. Apply ping times. First, find the nearest member who's survived filtration so far. Then filter out any members more than 15ms farther.

If several members are left at the end of the final stage, the driver picks one at random and sends it your query.

PRIMARY_PREFERRED uses the primary if there is one, otherwise it runs the SECONDARY algorithm.

SECONDARY_PREFERRED first runs the SECONDARY algorithm, and if there's no member left at the end, it uses the primary.
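
The whole selection procedure can be sketched as one small function. This is a simplified model rather than PyMongo's implementation: the mode constants, the member dicts with 'host', 'is_primary', 'tags' and 'ping_ms' keys, and the choice to return None instead of raising AutoReconnect are all assumptions made for the example.

```python
import random

# Hypothetical stand-ins for the five modes.
PRIMARY, PRIMARY_PREFERRED, SECONDARY, SECONDARY_PREFERRED, NEAREST = range(5)


def select_member(members, mode, tag_sets=({},), latency_ms=15):
    """Return the chosen host name, or None if no member is eligible.

    `members` lists only the members that are currently up.
    """
    primary = next((m for m in members if m['is_primary']), None)

    if mode == PRIMARY:
        return primary['host'] if primary else None
    if mode == PRIMARY_PREFERRED:
        if primary:
            return primary['host']
        return select_member(members, SECONDARY, tag_sets, latency_ms)
    if mode == SECONDARY_PREFERRED:
        chosen = select_member(members, SECONDARY, tag_sets, latency_ms)
        if chosen is None and primary:
            return primary['host']
        return chosen

    # Stage 1: apply the mode.
    if mode == SECONDARY:
        candidates = [m for m in members if not m['is_primary']]
    else:  # NEAREST keeps everyone
        candidates = list(members)

    # Stage 2: apply the tag sets; the first set matching anyone wins.
    matched = []
    for tag_set in tag_sets or ({},):
        matched = [m for m in candidates
                   if all(m['tags'].get(k) == v for k, v in tag_set.items())]
        if matched:
            break
    if not matched:
        return None

    # Stage 3: apply ping times, then pick at random among the survivors.
    fastest = min(m['ping_ms'] for m in matched)
    near = [m for m in matched if m['ping_ms'] - fastest <= latency_ms]
    return random.choice(near)['host']


members = [
    {'host': 'A', 'is_primary': True, 'tags': {'dc': 'ny'}, 'ping_ms': 10},
    {'host': 'B', 'is_primary': False, 'tags': {'dc': 'ny'}, 'ping_ms': 12},
    {'host': 'C', 'is_primary': False, 'tags': {'dc': 'sf'}, 'ping_ms': 40},
]
print(select_member(members, SECONDARY))
print(select_member(members, SECONDARY, tag_sets=[{'dc': 'sf'}]))
```

Note that the function only looks at data it was handed, which mirrors the point that selection itself does no I/O.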

I can hear your objections: "It's complicated," you say. It is a bit complicated, but we chose this algorithm because we think it can be configured to work for any use-case you throw at it. (See use cases below.) "It's expensive," you object. The algorithm is cheaper than it sounds because it does no I/O at all. It just uses what it already knows about your replica set from periodic monitoring.

Finally, Some Code

Let's actually use read preferences with PyMongo. The simplest method is to configure a MongoReplicaSetClient. By default, the mode is PRIMARY, the tag sets are empty, and secondary_acceptable_latency_ms is 15ms:

from pymongo.mongo_replica_set_client import MongoReplicaSetClient

rsc = MongoReplicaSetClient('host1,host2,host3', replicaSet='foo')

You can override any of these options with keyword arguments.

from pymongo.mongo_replica_set_client import MongoReplicaSetClient
from pymongo.read_preferences import ReadPreference

rsc = MongoReplicaSetClient('host1,host2,host3', replicaSet='foo',
    read_preference=ReadPreference.SECONDARY_PREFERRED,
    tag_sets=[{'dc': 'ny'}, {}],
    secondary_acceptable_latency_ms=50)

(Note that what I'm calling the "mode" is configured with the read_preference option.)

If you initialize a MongoReplicaSetClient like this, then all reads use the mode, tag sets, and latency you've configured. You can also override any of these three options after the fact:

rsc = MongoReplicaSetClient('host1,host2,host3', replicaSet='foo')
rsc.read_preference = ReadPreference.NEAREST
rsc.tag_sets = [{'disk': 'ssd'}]
rsc.secondary_acceptable_latency_ms = 1000

You can do the same when accessing a database from a MongoReplicaSetClient:

db = rsc.my_database
db.read_preference = ReadPreference.SECONDARY

Or a collection:

collection = db.my_collection
collection.tag_sets = [{'dc': 'cloud'}]

You can even set your preference on individual method calls:

results = list(
    collection.find({}, secondary_acceptable_latency_ms=0))

document = collection.find_one(
    {'field': 'value'}, read_preference=ReadPreference.NEAREST)

Each of these four levels—connection, database, collection, method—inherits the options of the previous level and allows you to override them.
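
One way to picture the lookup order is a chain of dicts, most specific first. The sketch below uses collections.ChainMap with made-up option dicts; it models only which level wins, not how or when PyMongo actually copies options from one object to the next.

```python
from collections import ChainMap

# Hypothetical option dicts, one per level. Only overrides are listed
# below the client level.
client_opts = {
    'read_preference': 'PRIMARY',
    'tag_sets': [{}],
    'secondary_acceptable_latency_ms': 15,
}
database_opts = {'read_preference': 'SECONDARY'}
collection_opts = {'tag_sets': [{'dc': 'cloud'}]}
method_opts = {'secondary_acceptable_latency_ms': 0}

# Lookups try method, then collection, then database, then client.
effective = ChainMap(method_opts, collection_opts, database_opts, client_opts)

for option in sorted(client_opts):
    print(option, '=', effective[option])
```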

(Further reading: PyMongo read preferences, PyMongo's MongoReplicaSetClient.)

Remember slave_okay?

The old ReplicaSetConnection had a slave_okay option. That's deprecated now, but it still works. It's treated like SECONDARY_PREFERRED.

Commands

Some commands, like findAndModify, write data; others, like count, only read it. The read-only commands obey your read preference; the rest are always sent to the primary. Here are the commands that obey read preferences:

  • count
  • distinct
  • group
  • aggregate
  • inline mapreduce
  • collStats, dbStats
  • geoNear, geoSearch, geoWalk
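
As a rough sketch, the routing decision for a command could look like the function below. The name obeys_read_preference, the lowercase comparison, and the way an inline map-reduce is detected (an 'out' document containing 'inline') are assumptions for illustration, not a description of PyMongo's internals.

```python
# Lowercased names of the read-only commands listed above.
READ_ONLY_COMMANDS = frozenset([
    'count', 'distinct', 'group', 'aggregate',
    'collstats', 'dbstats', 'geonear', 'geosearch', 'geowalk',
])


def obeys_read_preference(name, spec=None):
    """Guess whether a command may be routed by read preference.

    `spec` is the rest of the command document, if any.
    """
    name = name.lower()
    if name in READ_ONLY_COMMANDS:
        return True
    if name == 'mapreduce':
        # Only inline map-reduce is read-only; other outputs write data.
        out = (spec or {}).get('out')
        return isinstance(out, dict) and 'inline' in out
    return False


print(obeys_read_preference('dbStats'))
print(obeys_read_preference('findAndModify'))
print(obeys_read_preference('mapreduce', {'out': {'inline': 1}}))
```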

If you want, you can override the read preference while executing an individual command:

stats = rsc.my_database.command(
    'dbStats', read_preference=ReadPreference.SECONDARY)

Sharding

When you run find on a sharded cluster of replica sets, PyMongo sends your read preference to mongos. E.g., if you do a query like:

collection.find(
    {'field': 'value'},
    read_preference=ReadPreference.SECONDARY_PREFERRED,
    tag_sets=[{'dc': 'ny'}, {}])

Then PyMongo sends a query to mongos like:

{
    $query: {field: 'value'},
    $readPreference: {
        mode: 'secondaryPreferred',
        tags: [{'dc': 'ny'}, {}],
    }
}

Mongos interprets this $readPreference field and applies the read-preference logic to each replica set in the sharded cluster.
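
Building that wrapped document is mechanical. Here is a small sketch using plain dicts; wrap_for_mongos and the MODE_NAMES table are hypothetical names, and real drivers may apply extra rules about when to include the field.

```python
# Maps the constant names used in this post to the camelCase mode
# strings that mongos expects.
MODE_NAMES = {
    'PRIMARY': 'primary',
    'PRIMARY_PREFERRED': 'primaryPreferred',
    'SECONDARY': 'secondary',
    'SECONDARY_PREFERRED': 'secondaryPreferred',
    'NEAREST': 'nearest',
}


def wrap_for_mongos(query, mode, tag_sets=None):
    """Wrap a query document with a $readPreference field."""
    read_preference = {'mode': MODE_NAMES[mode]}
    if tag_sets:
        read_preference['tags'] = tag_sets
    return {'$query': query, '$readPreference': read_preference}


wrapped = wrap_for_mongos(
    {'field': 'value'}, 'SECONDARY_PREFERRED', [{'dc': 'ny'}, {}])
print(wrapped)
```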

There are two limitations:

  1. Mongos sends all commands to the primaries; you'll have to wait for version 2.4 to route read-only commands to secondaries. (See SERVER-7423.)
  2. You can't override mongos's secondary_acceptable_latency_ms, only its mode and tag sets.

Use Cases

I want maximum consistency. By "consistency" you mean you don't want stale reads under any circumstances. As soon as you've modified some data, you want all your reads to reflect the change. In this case use PRIMARY, and be aware that when you have no primary (e.g. during an election, or if a majority of the replica set is offline), every query will raise an exception.

I want maximum availability. You want to be able to query if possible. Use PRIMARY_PREFERRED: when there's a primary you'll get consistent reads, but if there's no primary you can query secondaries. I like this option, because it lets your app stay online, read-only, during a failover. Be careful to test that your app behaves well under these circumstances, obviously.

I want minimum latency. Use NEAREST. The driver or mongos will read from the fastest member and those within 15ms of it. Be aware that you risk inconsistency: if the nearest member to your app server is a secondary with some replication lag, you could read stale data. Also note that NEAREST merely minimizes network lag, rather than reading from the member with the lowest IO or CPU load.

I use replica sets to distribute my data. If you have a replica set with members spread around the globe, you can tag them like in the tag sets example above. Then, configure your application servers to query the members nearby. For example, your New York app servers do:

rsc.read_preference = ReadPreference.NEAREST
rsc.tag_sets = [{'dc': 'ny'}, {}]

Although NEAREST favors nearby secondaries anyway, including the tag makes the choice more predictable.

I want maximum throughput. Use NEAREST and set secondary_acceptable_latency_ms very high, like 500ms. This will distribute the query load equally among all members, thus (under most circumstances) giving you maximum read throughput.

If you want to move read load off your primary, use mode SECONDARY. It's tempting to use SECONDARY_PREFERRED, but if your primary can't take your full read load, you'd probably rather have your queries fail than move all the load to the primary whenever your secondaries are unavailable.

Monitoring

PyMongo needs to know a lot about the state of your replica set to know which members to use for your read preference. If you create a MongoReplicaSetClient like:

rsc = MongoReplicaSetClient('host1,host2,host3', replicaSet='foo')

...then the MongoReplicaSetClient tries to connect to each server, in random order, until it finds one that's up. It runs the isMaster command on that server. The server's response tells the MongoReplicaSetClient which members are in the replica set now, how they're tagged, and who's primary. MongoReplicaSetClient then calls isMaster on each member currently in the set and records the latency. This is what I called "ping time" above.

Once all that's complete, the MongoReplicaSetClient launches a background thread called the Monitor. The Monitor wakes every 30 seconds and refreshes its view of the replica set: it runs isMaster on all the members again, marks as "down" any members it can't reach, and notes new members who've joined. It also updates its latency measurement for each member. It uses a 5-sample moving average to track each member's latency.
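
A 5-sample moving average can be kept with a bounded deque. The sketch below assumes a simple unweighted average over the most recent samples; PingTracker is a made-up class, and PyMongo's own bookkeeping may differ in detail.

```python
from collections import deque


class PingTracker(object):
    """Track the average of the most recent ping times for one member."""

    def __init__(self, window=5):
        # A deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, ping_ms):
        self.samples.append(ping_ms)

    def average(self):
        if not self.samples:
            return None
        return sum(self.samples) / float(len(self.samples))


tracker = PingTracker()
for ping in [10, 10, 10, 10, 10]:
    tracker.record(ping)
steady = tracker.average()

tracker.record(60)  # one slow ping pushes out the oldest sample
after_spike = tracker.average()
print(steady, after_spike)
```

Averaging smooths out a single slow ping, so one network hiccup doesn't immediately push a member outside the latency window.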

If a member goes down, MongoReplicaSetClient won't take 30 seconds to notice. As soon as it gets a network error attempting to query a member it thought was up, it wakes the Monitor to refresh ASAP.

In Conclusion

There are a lot of options and details here. If you just want to query the primary, then accept the default, and if you just want to move load to your secondaries, use SECONDARY. But if you're the kind of hotrodder who needs to optimize for consistency, availability, latency, or throughput with every query, read preferences give you total control.

PyMongo's New Default: Safe Writes!

I joyfully announce that we are changing all of 10gen's MongoDB drivers to do "safe writes" by default. In the process we're renaming all the connection classes to MongoClient, so all the drivers now use the same term for the central class.

PyMongo 2.4, released today, has new classes called MongoClient and MongoReplicaSetClient that have the new default setting, and a new API for configuring write-acknowledgement called "write concerns". PyMongo's old Connection and ReplicaSetConnection classes remain untouched for backward compatibility, but they are now considered deprecated and will disappear in some future release. The changes were implemented by PyMongo's maintainer (and my favorite colleague) Bernie Hackett.


Background

MongoDB's writes happen in two phases. First the driver sends the server an insert, update, or remove message. The MongoDB server executes the operation and notes the outcome: it records whether there was an error, how many documents were updated or removed, and whether an upsert resulted in an update or an insert.

In the next phase, the driver runs the getLastError command on the server and awaits the response.

This getLastError call can be omitted for speed, in which case the driver just sends all its write messages without awaiting acknowledgment. "Fire-and-forget" mode is obviously very high-performance, because it can take advantage of network throughput without being affected by network latency. But this mode doesn't report errors to your application, and it doesn't guarantee that a write has completed before you do a query. It's not the right mode to use by default, so we're changing it now.

In the past we haven't been particularly consistent in our terms for these modes, sometimes talking about "safe" and "unsafe" writes, at other times "blocking" and "non-blocking", etc. From now on we're trying to stick to "acknowledged" and "unacknowledged," since that goes to the heart of the difference. I'll stick to these terms here.

(In 10gen's ancient history, before my time, the plan was to make a full platform-as-a-service stack with MongoDB as the data layer. It made sense then for getLastError to be a separate operation that was run explicitly, and to not call getLastError automatically by default. But MongoDB is a standalone product and it's clear that the default needs to change.)

The New Defaults

In earlier versions of PyMongo you would create a connection like this:

from pymongo import Connection
connection = Connection('localhost', 27017)

By default, Connection did unacknowledged writes—it didn't call getLastError at all. You could change that with the safe option like:

connection = Connection('localhost', 27017, safe=True)

You could also configure arguments that were passed to every getLastError call to make it wait for specific events. For example, to wait for the primary and two secondaries to replicate the write, you could pass w=3, and to wait for the primary to commit the write to its journal, you could pass j=True:

connection = Connection('localhost', 27017, w=3, j=True)

(The "w" terminology comes from the Dynamo whitepaper that's foundational to the NoSQL movement.)

Connection hasn't changed in PyMongo 2.4, but we've added a MongoClient which does acknowledged writes by default:

from pymongo import MongoClient
client = MongoClient('localhost', 27017)

MongoClient lets you pass arguments to getLastError just like Connection did:

from pymongo import MongoClient
client = MongoClient('localhost', 27017, w=3, j=True)

Instead of an odd overlap between the safe and w options, we've now standardized on using w only. So you can get the old behavior of unacknowledged writes with the new classes using w=0:

client = MongoClient('localhost', 27017, w=0)

w=0 is the new way to say safe=False.

w=1 is the new safe=True and it's now the default. Other options like j=True or w=3 work the same as before. You can still set options per-operation:

client.db.collection.insert({'foo': 'bar'}, w=1)
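
The correspondence between the old options and the new ones can be summarized in a tiny helper. This is only an illustration of the mapping described above: legacy_to_write_concern is a hypothetical function, and it assumes that passing any getLastError option (such as w or j) to the old classes implied an acknowledged write.

```python
def legacy_to_write_concern(safe=False, **gle_options):
    """Translate old Connection-style options into a write concern dict."""
    if gle_options:
        # Options like w=3 or j=True already describe the acknowledgment.
        return dict(gle_options)
    # With no other options, safe=True means w=1 and safe=False means w=0.
    return {'w': 1} if safe else {'w': 0}


print(legacy_to_write_concern())              # old default: unacknowledged
print(legacy_to_write_concern(safe=True))
print(legacy_to_write_concern(w=3, j=True))
```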

ReplicaSetConnection is also obsolete, of course, and succeeded by MongoReplicaSetClient.

Write Concerns

The old Connection class let you set the safe attribute to True or False, or call set_lasterror_options() for more complex configuration. These are deprecated, and you should now use the MongoClient.write_concern attribute. write_concern is a dict whose keys may include w, wtimeout, j, and fsync:

>>> client = MongoClient()
>>> # default empty dict means "w=1"
>>> client.write_concern
{}
>>> client.write_concern = {'w': 2, 'wtimeout': 1000}
>>> client.write_concern
{'wtimeout': 1000, 'w': 2}
>>> client.write_concern['j'] = True
>>> client.write_concern
{'wtimeout': 1000, 'j': True, 'w': 2}
>>> client.write_concern['w'] = 0 # disable write acknowledgement

You can see that the default write_concern is an empty dictionary. It's equivalent to w=1, meaning "do regular acknowledged writes".
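
To connect this back to the two-phase write described earlier, here is a sketch of how a write concern dict could turn into the getLastError command that follows each write. The helper name is made up and edge cases are ignored; it only shows the general shape.

```python
def get_last_error_command(write_concern):
    """Return the command to send after a write, or None for w=0."""
    if write_concern.get('w') == 0:
        return None  # unacknowledged: skip getLastError entirely
    # The command name must be the first key; dicts keep insertion
    # order in Python 3.7+.
    command = {'getlasterror': 1}
    command.update(write_concern)
    return command


print(get_last_error_command({}))
print(get_last_error_command({'w': 2, 'wtimeout': 1000}))
print(get_last_error_command({'w': 0}))
```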

auto_start_request

This is very nerdy, but my personal favorite. The default value for auto_start_request is changing from True to False.

The short explanation is this: with the old Connection, you could write some data to the server without acknowledgment, and then read that data back immediately afterward, provided there wasn't an error and that you used the same socket for the write and the read. If you used a different socket for the two operations then there was no guarantee of "read your writes consistency," because the write could still be enqueued on one socket while you completed the read on the other.

You could pin the current thread to a single socket with Connection.start_request(), and in fact the default was for Connection to start a request for you with every operation. That's auto_start_request. It offers some consistency guarantees but requires the driver to open extra sockets.

Now that MongoClient waits for acknowledgment of every write, auto_start_request is no longer needed. If you do this:

>>> collection = MongoClient().db.collection
>>> collection.insert({'foo': 'bar'})
>>> print collection.find_one({'foo': 'bar'})

... then the find_one won't run until the insert is acknowledged, which means your document has definitely been inserted and you can query for it confidently on any socket. We turned off auto_start_request for improved performance and fewer sockets. If you're doing unacknowledged writes with w=0 followed by reads, you should consider whether to call MongoClient.start_request(). See the details (with charts!) in my blog post on requests from April.

Migration

Connection and ReplicaSetConnection will remain for a while (not forever), so your existing code will work the same and you have time to migrate. We are working to update all documentation and example code to use the new classes. In time we'll add deprecation warnings to the old classes and methods before removing them completely.

If you maintain a library built on PyMongo, you can check for the new classes with code like:

try:
    from pymongo import MongoClient
    has_mongo_client = True
except ImportError:
    has_mongo_client = False

What About Motor?

Motor's in beta, so I'll break backwards compatibility ruthlessly for the sake of cleanliness. In the next week or two I'll merge the official PyMongo changes into my fork, and I'll nuke MotorConnection and MotorReplicaSetConnection, to be replaced with MotorClient and MotorReplicaSetClient.

The Uplifting Conclusion

We've known for a while that unacknowledged writes were the wrong default. Now it's finally time to fix it. The new MongoClient class lets you migrate from the old default to the new one at your leisure, and brings a bonus: all the drivers agree on the name of the main entry-point. For programmers new to MongoDB, turning on write-acknowledgment by default is a huge win, and makes it much more intuitive to write applications on MongoDB.