If I need to insert a document into MongoDB only if it does not exist yet,

db_stock.update_one(document, {'$set': document}, upsert=True)

will do the job (feel free to correct me if I am wrong).

But if I have a list of documents and want to insert them all, what would be the best way of doing it?

There is a single-record version of this question, but I need an en masse version of it, so it's different.

Let me reword my question. I have millions of documents, a few of which may already be stored. How do I store the remaining ones in MongoDB in a matter of seconds, not minutes or hours?

  • Unfortunately, there is no other way than to iterate over the documents and use update_one. I was thinking about bulk operations, but to use bulk.find.upsert you need to have this document in the db first. Commented Mar 18, 2016 at 12:29
  • OK. Can I bulk delete the documents from the list and then bulk insert them? Commented Mar 18, 2016 at 13:22
  • If you get the full collection into a list, then you can just delete the collection and insert, or delete by known ids (the retrieved ones) and reinsert using insert_many. Commented Mar 18, 2016 at 13:25

2 Answers

You need to use the insert_many method and set the ordered option to False.

db_stock.insert_many(<list of documents>, ordered=False)

As mentioned in the ordered option documentation:

ordered (optional): If True (the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. If False, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted.

This means that insertion will continue even if there is a duplicate key error.

Demo:

>>> c.insert_many([{'_id': 2}, {'_id': 3}])
<pymongo.results.InsertManyResult object at 0x7f5ca669ef30>
>>> list(c.find())
[{'_id': 2}, {'_id': 3}]
>>> try:
...     c.insert_many([{'_id': 2}, {'_id': 3}, {'_id': 4}, {'_id': 5}], ordered=False)
... except pymongo.errors.BulkWriteError:
...     list(c.find())
... 
[{'_id': 2}, {'_id': 3}, {'_id': 4}, {'_id': 5}]

As you can see, the documents with _id 4 and 5 were inserted into the collection.
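The demo catches BulkWriteError without looking at what actually failed. Here is a minimal sketch of how one might let duplicates through while still surfacing other write errors. It assumes that the exception's details dict carries a writeErrors list whose entries have a code field, and that 11000 is the duplicate key code. The names split_write_errors and insert_missing are made up for illustration, not part of pymongo.

```python
DUPLICATE_KEY = 11000  # MongoDB error code for a duplicate key (assumed here)


def split_write_errors(details):
    """Split the 'writeErrors' of a BulkWriteError.details dict
    into (duplicate key errors, all other errors)."""
    duplicates, others = [], []
    for err in details.get("writeErrors", []):
        if err.get("code") == DUPLICATE_KEY:
            duplicates.append(err)
        else:
            others.append(err)
    return duplicates, others


def insert_missing(collection, documents):
    """Insert documents and ignore those that are already stored.

    Hypothetical helper: returns the number of documents actually inserted.
    """
    # Imported here so the pure helper above works without pymongo installed.
    from pymongo.errors import BulkWriteError

    try:
        result = collection.insert_many(documents, ordered=False)
        return len(result.inserted_ids)
    except BulkWriteError as exc:
        duplicates, others = split_write_errors(exc.details)
        if others:
            raise  # something other than "already stored" went wrong
        return exc.details.get("nInserted", 0)
```

Calling insert_missing(c, docs) on the collection from the demo should then skip the documents that already exist and report how many new ones were stored.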


It is worth noting that this is also possible in the shell using the insertMany method. All you need to do is set the ordered option to false.

db.collection.insertMany(
    [ 
        { '_id': 2 }, 
        { '_id': 3 },
        { '_id': 4 }, 
        { '_id': 5 }
    ],
    { 'ordered': false }
)
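Since the question mentions millions of documents, it may also help to feed insert_many in fixed-size batches, so that documents coming from a generator never have to sit in memory all at once. As far as I know pymongo already splits a large insert into several messages on its own, so treat this as optional. The helper name chunked and the batch size below are assumptions for illustration only.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Possible use with pymongo (needs a running server, so left as a comment):
#
# for batch in chunked(documents, 10000):  # 10000 is an arbitrary batch size
#     try:
#         db_stock.insert_many(batch, ordered=False)
#     except pymongo.errors.BulkWriteError:
#         pass  # duplicates are expected; inspect the error if unsure
```

Each batch is an independent unordered insert, so a duplicate in one batch does not stop the others.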

1 Comment

@ORA600 Just to make the distinction clear: using "upserts" (even with bulk operations, and still the old interface for current pymongo) means that you are still "looking up" the data prior to deciding whether to "insert" or not. The nature of the "lookup" means you never get duplicates, but it naturally comes with a "cost". The suggested "insert_many" with ordered=False (you can still do the same thing with "bulk") does not have that "lookup" overhead. Hence it's the "fastest", which is what you asked for.

With bulkWrite you can do this, though I'm not sure what the pymongo command for it is. Here's the straight MongoDB query:

db.products.insert([
  { _id: 11, item: "pencil", qty: 50, type: "no.2" },
  { item: "pen", qty: 20 },
  { item: "eraser", qty: 25 }
])
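For what it's worth, the pymongo counterpart of bulkWrite is Collection.bulk_write, which takes a list of request objects such as pymongo.UpdateOne. Below is a rough sketch of "insert if missing" as a bulk upsert. It assumes every document has an _id that identifies it and at least one other field, and it uses $setOnInsert so that documents already stored are left untouched. The names build_upsert_spec and bulk_insert_missing are hypothetical.

```python
def build_upsert_spec(document):
    """Return (filter, update) for an insert-if-missing upsert keyed on _id.

    Assumes the document has an '_id' and at least one other field;
    some server versions reject an empty $setOnInsert.
    """
    fields = {key: value for key, value in document.items() if key != "_id"}
    return {"_id": document["_id"]}, {"$setOnInsert": fields}


def bulk_insert_missing(collection, documents):
    """Upsert all documents in one unordered bulk_write call.

    Hypothetical helper: returns how many documents were newly inserted.
    """
    # Imported here so build_upsert_spec can be used without pymongo installed.
    from pymongo import UpdateOne

    requests = [
        UpdateOne(flt, update, upsert=True)
        for flt, update in map(build_upsert_spec, documents)
    ]
    result = collection.bulk_write(requests, ordered=False)
    return result.upserted_count
```

As the comment on the other answer points out, an upsert has to look each document up first, so this is likely to be slower than insert_many with ordered=False. It does avoid raising duplicate key errors, though.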

3 Comments

This is not an answer to this particular question, as it states "bulk UPSERT".
His question says insert in both title and question body.
"en masse" means bulk :-)
