
Working with MongoDB from Python: A Partial Translation

Posted: 2021-07-01 10:21:17

Working with MongoDB from Python

http://api.mongodb.org/python/current/index.html

This tutorial is intended as an introduction to working with MongoDB and PyMongo.

Prerequisites

Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:

>>> import pymongo

This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient to the running mongod instance. Doing so is easy:

>>> from pymongo import MongoClient
>>> client = MongoClient()

The above code will connect on the default host and port. We can also specify the host and port explicitly, as follows:

>>> client = MongoClient('localhost', 27017)

Or use the MongoDB URI format:

>>> client = MongoClient('mongodb://localhost:27017/')
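
When credentials are involved, the URI form is often the more convenient one. The following is a minimal sketch of assembling such a URI with only the standard library. The helper name build_mongo_uri and the example credentials are hypothetical; the point is that reserved characters in the username and password need to be percent-escaped before they are placed in the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Assemble a mongodb:// URI (hypothetical helper, not part of PyMongo)."""
    credentials = ""
    if username is not None and password is not None:
        # Reserved characters such as '@', ':' and '/' must be percent-escaped.
        credentials = "%s:%s@" % (quote_plus(username), quote_plus(password))
    return "mongodb://%s%s:%d/" % (credentials, host, port)


uri = build_mongo_uri()
print(uri)
print(build_mongo_uri(username="mike", password="p@ss/word"))
```

The resulting string can be passed to MongoClient in place of a literal URI.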

Getting a Database

A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances:

>>> db = client.test_database

If your database name is such that attribute style access won't work (like test-database), you can use dictionary style access instead:

>>> db = client['test-database']

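Addition: if the database requires a password, authenticate with:

>>> db.authenticate("username", "password")

Note: Database.authenticate() was removed in PyMongo 4.0; with newer versions, pass the credentials to MongoClient instead (for example, in the URI).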

Getting a Collection

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:

>>> collection = db.test_collection

or (using dictionary style access):

>>> collection = db['test-collection']

An important note about collections (and databases) in MongoDB is that they are created lazily - none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.

Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

>>> import datetime
>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (like datetime.datetime instances) which will be automatically converted to and from the appropriate BSON types.

[Translator's note: BSON is a binary storage format similar to JSON. It also has some data types that JSON lacks, such as Date and BinData.]
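
One practical consequence of this conversion is precision: the BSON date type stores a whole number of milliseconds since the Unix epoch, so the microsecond part of a Python datetime does not survive a round trip. The sketch below mimics that truncation in plain Python; truncate_to_millis is a hypothetical helper used for illustration, not a PyMongo function.

```python
import datetime


def truncate_to_millis(dt):
    """Drop sub-millisecond precision, as a BSON date would (illustrative helper)."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


original = datetime.datetime(2009, 11, 12, 11, 14, 30, 123456)
stored = truncate_to_millis(original)

print(original.microsecond)
print(stored.microsecond)
```

Comparing a freshly created datetime with the value read back from the database may therefore fail unless both sides are truncated in the same way.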

Inserting a Document

To insert a document into a collection we can use the insert_one() method:

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')

When a document is inserted, a special key, "_id", is automatically added if the document doesn't already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

>>> db.collection_names(include_system_collections=False)
[u'posts']

Note: collection_names() was removed in PyMongo 4.0; newer versions of the driver provide list_collection_names() instead.

Getting a Single Document With find_one()

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection:

>>> posts.find_one()
{u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

The result is a dictionary matching the one that we inserted previously.

Note: The returned document contains an "_id", which was automatically added on insert.

find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author “Mike” we do:

>>> posts.find_one({"author": "Mike"})
{u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

If we try with a different author, like “Eliot”, we'll get no result. find_one() returns None, which the interactive shell does not echo:

>>> posts.find_one({"author": "Eliot"})
>>>
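
The behaviour of find_one() with an equality filter can be pictured in plain Python. This is only an illustration over an in-memory list, not how PyMongo or the server implements the query; find_first is a hypothetical name, and the sample document mirrors the one used above.

```python
def find_first(documents, query=None):
    """Return the first document whose fields equal every key in query, else None."""
    query = query or {}
    for doc in documents:
        if all(doc.get(key) == value for key, value in query.items()):
            return doc
    return None


sample_posts = [{"author": "Mike", "text": "My first blog post!"}]

print(find_first(sample_posts, {"author": "Mike"}))
print(find_first(sample_posts, {"author": "Eliot"}))
```

Because a missing match is signalled by None rather than an exception, calling code should check the result before indexing into it.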

Querying By ObjectId

We can also find a post by its _id, which in our example is an ObjectId:

>>> post_id
ObjectId(...)
>>> posts.find_one({"_id": post_id})
{u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

Note that an ObjectId is not the same as its string representation:

>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str})  # No result
>>>

A common task in web applications is to get an ObjectId from the request URL and find the matching document. In this case the string must be converted to an ObjectId before it is passed to find_one():

from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document
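
Because the value comes from a URL, it may not be a well-formed id at all. An ObjectId is 12 bytes, so its string form is 24 hexadecimal characters; the sketch below checks that shape with the standard library before any conversion is attempted. The helper name looks_like_object_id is hypothetical.

```python
import string


def looks_like_object_id(value):
    """Return True if value is a 24-character hexadecimal string (illustrative check)."""
    return (
        isinstance(value, str)
        and len(value) == 24
        and all(ch in string.hexdigits for ch in value)
    )


print(looks_like_object_id("507f1f77bcf86cd799439011"))
print(looks_like_object_id("not-an-object-id"))
```

A request handler could answer with a "not found" response when the check fails, instead of letting the ObjectId constructor raise an exception. The bson package also offers ObjectId.is_valid() for the same purpose.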

A Note On Unicode Strings

You probably noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). A short explanation is in order.

MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (str) are validated and stored unaltered. Unicode strings (unicode) are encoded to UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
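
The str/unicode distinction above is specific to Python 2. In Python 3 every str is already a Unicode string, so retrieved values print without the u prefix. The following sketch uses only built-in types to show the UTF-8 encoding step that the text describes; the sample strings are arbitrary.

```python
name = "Mike"
encoded = name.encode("utf-8")    # the bytes a UTF-8 encoder produces
decoded = encoded.decode("utf-8")  # back to a Python (Unicode) string

print(repr(encoded))
print(repr(decoded))

# Non-ASCII text takes more than one byte per character in UTF-8.
city = "Z\u00fcrich"
print(len(city), len(city.encode("utf-8")))
```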

Bulk Inserts

In order to make querying a little more interesting, let's insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

  • The result from insert_many() now returns two ObjectId instances, one for each inserted document.
  • new_posts[1] has a different “shape” than the other posts: there is no "tags" field and we've added a new field, "title". This is what we mean when we say that MongoDB is schema-free.
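
Because documents in one collection may have different shapes, code that reads them usually cannot assume that a field exists. A minimal sketch, using plain dictionaries shaped like the posts above (the describe function is hypothetical):

```python
def describe(post):
    """Build a one-line summary that tolerates missing "title" and "tags" fields."""
    title = post.get("title", "(untitled)")
    tags = ", ".join(post.get("tags", [])) or "no tags"
    return "%s by %s [%s]" % (title, post["author"], tags)


posts_sample = [
    {"author": "Mike", "text": "Another post!", "tags": ["bulk", "insert"]},
    {"author": "Eliot", "title": "MongoDB is fun", "text": "and pretty easy too!"},
]

for post in posts_sample:
    print(describe(post))
```

Using dict.get() with a default keeps the reader working for both shapes, at the cost of deciding what a sensible default is for each optional field.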

Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection:

>>> for post in posts.find():
...     post
...
{u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
{u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}
{u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}

Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is “Mike”:

>>> for post in posts.find({"author": "Mike"}):
...     post
...
{u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
{u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Counting

If we just want to know how many documents match a query we can perform a count() operation instead of a full query. We can get a count of all of the documents in a collection:

>>> posts.count()
3

or just of those documents that match a specific query:

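>>> posts.find({"author": "Mike"}).count()
2

Note: count() was removed in PyMongo 4.0; newer versions of the driver provide count_documents() on the collection instead.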

Range Queries

MongoDB supports many different types of advanced queries. As an example, let's perform a query where we limit results to posts older than a certain date, but also sort the results by author:

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...     print(post)
...
{u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}
{u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.
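
For readers new to query operators, the effect of this query can be reproduced on an in-memory list. The sketch below is an analogy in plain Python, not PyMongo code: the list comprehension keeps documents whose "date" is strictly less than d, and sorted() plays the role of sort("author"). The third sample document is invented here so that the filter has something to exclude.

```python
import datetime

sample = [
    {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "date": datetime.datetime(2009, 11, 10, 10, 45)},
    # Hypothetical extra post, later than the cut-off below.
    {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 12, 30)},
]

d = datetime.datetime(2009, 11, 12, 12)

# Equivalent in spirit to find({"date": {"$lt": d}}).sort("author")
older = [doc for doc in sample if doc["date"] < d]
result = sorted(older, key=lambda doc: doc["author"])

for doc in result:
    print(doc["author"], doc["date"])
```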

Indexing

To make the above query fast we can add a compound index on "date" and "author". To start, let's use the explain() method to get some information about how the query is being performed without the index:

>>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
u'BasicCursor'
>>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
3

We can see that the query is using the BasicCursor and scanning over all 3 documents in the collection. Now let's add a compound index and look at the same information:

>>> from pymongo import ASCENDING, DESCENDING
>>> posts.create_index([("date", DESCENDING), ("author", ASCENDING)])
u'date_-1_author_1'
>>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
u'BtreeCursor date_-1_author_1'
>>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
2

Now the query is using a BtreeCursor (the index) and only scanning over the 2 matching documents.
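
The string returned by create_index() is the index name, which by default is built from each key and its direction. The sketch below reproduces that naming rule in plain Python. ASCENDING and DESCENDING are redefined locally as 1 and -1 (the values PyMongo uses) so that the example runs without the driver, and default_index_name is a hypothetical helper.

```python
ASCENDING = 1
DESCENDING = -1


def default_index_name(keys):
    """Join field names and directions the way the default index name is formed."""
    return "_".join("%s_%s" % (field, direction) for field, direction in keys)


name = default_index_name([("date", DESCENDING), ("author", ASCENDING)])
print(name)
```

Knowing the name is useful later, for example when an index has to be dropped by name.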
