20

Is it possible to merge array fields in while using MongoDB aggregation framework? Here is a summary problem I am trying to solve:

Sample input documents for aggregation:

{
  "Category" : 1,
  "Messages" : ["Msg1", "Msg2"],
  "Value" : 1
},
{
  "Category" : 1,
  "Messages" : [],
  "Value" : 10
},
{
  "Category" : 1,
  "Messages" : ["Msg1", "Msg3"],
  "Value" : 100
},
{
  "Category" : 2,
  "Messages" : ["Msg4"],
  "Value" : 1000
},
{
  "Category" : 2,
  "Messages" : ["Msg5"],
  "Value" : 10000
},
{
  "Category" : 3,
  "Messages" : [],
  "Value" : 100000
}

We want to group by 'Category' while summing up 'Value' and merging 'Messages'. I have tried this aggregation pipeline:

{group : {
        _id : "$Category",
        Value : { $sum : "$Value"},
        Messages : {$push : "$Messages"}
    }
}, 
{$unwind : "$Messages"}, 
{$unwind : "$Messages"}, 
{$group : {
        _id : "$_id",
        Value : {$first : "$Value"},
        Messages : {$addToSet : "$Messages"}
    }
}

The result is:

"result" : [{
        "_id" : 1,
        "Value" : 111,
        "Messages" : ["Msg3", "Msg2", "Msg1"]
    }, 
    {
        "_id" : 2,
        "Value" : 11000,
        "Messages" : ["Msg5", "Msg4"]
    }
]

However, this completely misses Category 3 since the documents where 'Category' is 3 do not have any 'Messages' and they are dropped by the second unwind. We would like the result to include the following as well:

{
    "_id" : 3,
    "Value" : 100000,
    "Messages" : []
}

Is there a neat way of achieving this by the aggregation framework?

4
  • is Messages guaranteed to be there as an array? Or is it possible it won't exist or will be there but as a different type? Commented Oct 18, 2013 at 1:11
  • yes Messages is guaranteed to exist as an array (which may be empty for some records). Commented Oct 29, 2013 at 11:13
  • 2
    have you tried the preserveNullAndEmptyArrays option to $unwind? Commented Dec 20, 2016 at 15:40
  • This question was raised when we were using v2.6. I believe using preserveNullAndEmptyArrays should do what we were looking for. Commented Mar 30, 2017 at 13:09

2 Answers 2

19

Here is a trick you can use if Messages is guaranteed to be an array:

> db.messages.find()
    { "Category" : 1, "Messages" : [  "Msg1",  "Msg2" ], "Value" : 1 }
    { "Category" : 1, "Messages" : [ ], "Value" : 10 }
    { "Category" : 1, "Messages" : [  "Msg1",  "Msg3" ], "Value" : 100 }
    { "Category" : 2, "Messages" : [  "Msg4" ], "Value" : 1000 }
    { "Category" : 2, "Messages" : [  "Msg5" ], "Value" : 10000 }
    { "Category" : 3, "Messages" : [ ], "Value" : 100000 }

> var group1 = {
    "$group":   {
        "_id":      "$Category",
        "Value":    {
            "$sum":     "$Value"
        },
        "Messages": {
            "$push":    "$Messages"
        }
    }
};

> var project1 = {
    "$project": {
        "Value":    1,
        "Messages": {
            "$cond":    [
                {
                    "$eq":  [
                        "$Messages",
                        [ [ ] ]
                    ]
                },
                [ [ null ] ],
                "$Messages"
            ]
        }
    }
};

> db.messages.aggregate( group1, project1 )
    { "_id" : 3, "Value" : 100000, "Messages" : [  [  null ] ] }
    { "_id" : 2, "Value" : 11000, "Messages" : [  [  "Msg4" ],  [  "Msg5" ] ] }
    { "_id" : 1, "Value" : 111, "Messages" : [  [  "Msg1",  "Msg2" ],  [ ],  [  "Msg1",  "Msg3" ] ] }

Now unwind twice and re-group to get a single Messages array.

> var unwind = {"$unwind":"$Messages"};

> var group2 = {
    $group: {
        "_id":      "$_id", 
        "Value":    {
            "$first":       "$Value"
        }, 
        "Messages": {
            "$addToSet":    "$Messages"
        }
    }
};

> var project2 = {
    "$project": {
        "Category": "$_id",
        "_id":      0,
        "Value":    1,
        "Messages": {
            "$cond":    [
                {
                    "$eq":  [
                        "$Messages",
                        [ null ]
                    ]
                },
                [ ],
                "$Messages"
            ]
        }
    }
};

> db.messages.aggregate(group1, project1, unwind, unwind, group2 ,project2 )
    { "Value" : 111, "Messages" : [  "Msg3",  "Msg2",  "Msg1" ], "Category" : 1 }
    { "Value" : 11000, "Messages" : [  "Msg5",  "Msg4" ], "Category" : 2 }
    { "Value" : 100000, "Messages" : [ ], "Category" : 3 }
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks for the tips. It almost does what I need. However, there is a case where it does not produce the desired result. Aggregated result for Category 1 (based on the documents in my original post) ends up having 4 messages: ["Msg1", "Msg2", "Msg3", "dummy"]. I am not sure how to easily get rid of "dummy" for this case.
Right - there is a way to get rid of it - I'll update the answer
ok, the complete answer now with all the steps - should be exactly what you want :)
@AsyaKamsky thanks this is great help could you help me for one more use case in it as I have two array field in my document as you can say messages and tags. And I need same behaviour for two fields
post it as a question with full details - comments aren't really for discussing new problems.
4

As already mentioned in one of the comments, the simplest answer to the original question is to add preserveNullAndEmptyArrays to the $unwind stage.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.