Emulating Neo4j Graph DB Nodes and Edges in MongoDB 3.6

By Chris Heino, posted December 14, 2017

Introduction

Multi-modal NoSQL database MongoDB added several useful new features since version 3.3, including the $lookup, $graphlookup, and $facet aggregation stages. In this article, I will explore how well the new $graphlookup feature compares to the graph querying capabilities of a specialized graph database such as Neo4j. In particular, we will explain a technique storing graph vertexes and edges in separate MongoDB collections to allow maximum flexibility for querying based on both vertex and edge properties. At the time of writing, I did not find any concrete examples of this technique online, although the MongoDB slide presentation here makes mention of edge collections. This example is based on my own experimentation.

Data Model

For our example scenario, we'll use a standard graph theory problem: a simple product recommendations engine as you might find on a typical e-commerce website.

Mobirise

Figure 1

The node data will be stored in two Mongo collections named customer and product, and we will use another collection for the edge data.

Collection: customer

/* 1 */
{
     "_id" : ObjectId("5a0bda87e7a83c9369d6f92a"),
     "name" : "Customer 1"
}

/* 2 */
{
     "_id" : ObjectId("5a0bda9fe7a83c9369d6f92c"),
     "name" : "Customer 2"
}

/* 3 */
{
     "_id" : ObjectId("5a0bdab3e7a83c9369d6f932"),
     "name" : "Customer 3"
}

/* 4 */
{
     "_id" : ObjectId("5a0bdac5e7a83c9369d6f938"),
     "name" : "Customer 4"
}

Collection: product

/* 1 */
{
     "_id" : ObjectId("5a0bd9f8e7a83c9369d6f90c"),
     "name" : "Product A"
}

/* 2 */
{
     "_id" : ObjectId("5a0bda22e7a83c9369d6f914"),
     "name" : "Product B"
}

/* 3 */
{
     "_id" : ObjectId("5a0bda3ee7a83c9369d6f91a"),
     "name" : "Product C"
}

/* 4 */
{
     "_id" : ObjectId("5a0bda68e7a83c9369d6f920"),
     "name" : "Product D"
}

Collection: edge

/* 1 */
{
     "_id" : ObjectId("5a0be126e7a83c9369d6fb40"),
     "from" : "Customer 1",
     "to" : "Product D",
     "refs" : [
          ObjectId("5a0be13ae7a83c9369d6fb48")
     ],
     "label" : "viewed"
}

/* 2 */
{
     "_id" : ObjectId("5a0be13ae7a83c9369d6fb48"),
     "from" : "Customer 1",
     "to" : "Product A",
     "refs" : [
          ObjectId("5a0be126e7a83c9369d6fb40"),
          ObjectId("5a0be14de7a83c9369d6fb52")
     ],
     "label" : "purchased"
}

/* 3 */
{
     "_id" : ObjectId("5a0be14de7a83c9369d6fb52"),
     "from" : "Customer 2",
     "to" : "Product A",
     "refs" : [
          ObjectId("5a0be13ae7a83c9369d6fb48"),
          ObjectId("5a0be15ee7a83c9369d6fb56")
     ],
     "label" : "purchased"
}

/* 4 */
{
     "_id" : ObjectId("5a0be15ee7a83c9369d6fb56"),
     "from" : "Customer 2",
     "to" : "Product B",
     "refs" : [
          ObjectId("5a0be14de7a83c9369d6fb52"),
          ObjectId("5a0bde97e7a83c9369d6fa60")
     ],
     "label" : "purchased"
}

/* 5 */
{
     "_id" : ObjectId("5a0be172e7a83c9369d6fb60"),
     "from" : "Customer 3",
     "to" : "Product B",
     "refs" : [
          ObjectId("5a0be15ee7a83c9369d6fb56"),
          ObjectId("5a0be1cae7a83c9369d6fb8e")
     ],
     "label" : "purchased"
}

/* 6 */
{
     "_id" : ObjectId("5a0be1cae7a83c9369d6fb8e"),
     "from" : "Customer 3",
     "to" : "Product C",
     "refs" : [
          ObjectId("5a0be172e7a83c9369d6fb60")
     ],
     "label" : "purchased"
}

/* 7 */
{
     "_id" : ObjectId("5a375c8594954dcf94f15794"),
     "from" : "Customer 4",
     "to" : "Product B",
     "refs" : [
          ObjectId("5a0be15ee7a83c9369d6fb56"),
          ObjectId("5a0be172e7a83c9369d6fb60")
     ],
     "label" : "viewed"
}

This creates "Customers also bought" recommendations functionality.
1. Start with current product being viewed by customer
2. Traverse "purchased" edges to customers and products.
3. Select the products along the traversal path.

This gives us the other products purchased by any customer who purchased the original product, and also products purchased by other customers who purchased those products. These types of "friend of a friend" queries are what graph databases excel at, while a traditional relational database is not ideal for this due to the arbitrary number of SQL joins which would be involved since this has performance implications and also typically requires programmatic SQL generation. The strength of each recommendation returned can be determined by its proximity to the starting product.  

MongoDB Query

The approach below is inspired by the way graph databases such as Neo4j or OrientDB store nodes (a.k.a. vertexes) and edges as separate objects in the database. For purposes of this example, we will create three collections in MongoDB: products, customers, and edges. The documents in the products collection will represent our e-commerce products and contain their attributes, and our customers will be stored as documents in the customers collection. The edge collection documents will represent the edges or relationships between nodes, including a relationship type, a from_id for the source node ID, a to_id for the target node ID, and any other custom properties you want to include. Note that one advantage of this approach is that we do not need to embed the edges within the nodes, in fact the nodes do not need to contain any relationship data whatsoever.

Although many of these features where added in MongoDB 3.3, version 3.6 is required for this example since 3.6 introduces the "let" and "pipeline" options we will use on the $lookup stage to fetch the node details from the resulting edges.

The example below starts by looking for inbound relationships from the target product (Product B). Both $graphLookup and $lookup are used. To traverse edges and filter by edge properties, use $graphLookup. To fetch node details or filter by node properties, use $lookup.

db.product.aggregate( [
     { $match: { "name": "Product B" } },
     { $lookup: {
          from: "edge",
          localField: "name",
          foreignField: "to",
          as: "edge_docs"
     }
},
{ $project: {
     "edge_docs": 1
     }
},
{ $graphLookup: {
     from: "edge",
     startWith: "$edge_docs._id",
     connectFromField: "refs",
     connectToField: "_id",
     as: "pathEdges",
     maxDepth: 2,
     depthField: 'steps',
     restrictSearchWithMatch: { "label" : "purchased" }
     }
},
{ $project: { "edges" : "$pathEdges" } },
{ $unwind: "$edges" },
{ $sort: { 'edges.steps': 1 } },
{ $lookup: {
     from: "customer",
     let: { from: "$edges.from", steps: "$edges.steps" },
     pipeline: [
          { $match:
               { $expr: { $eq: [ "$name", "$from" ] } }
          },
          { $project: { _id: "$_id", name: "$name", steps: "$steps" } }
     ],
     as: "customers"
     }
},
{ $unwind: "$customers" },
{ $sort: { 'customers.steps': 1 } },
{ $lookup: {
     from: "product",
     let: { to: "$edges.to", steps: "$edges.steps" },
     pipeline: [
          { $match:
               { $expr: { $eq: [ "$name", "$to" ] } }
          },
          { $project: { _id: "$_id", name: "$name", steps: "$steps" } }
     ],
     as: "products"
     }
},
{ $unwind: "$products" },
{ $sort: { 'products.steps': 1 } },
{ $group: { _id: null, edgs: { $push: "$edges" }, cstmrs: { $push: "$customers" }, prdcts: { $push: "$products" } } },
{ $project: { _id: 0, edges: "$edgs", customers: "$cstmrs", products: "$prdcts" } }
] ).map( function(u) {
     var customerIds = [], newCustomers = [];
     for (var i = 0; i < u.customers.length; i++) {
          var c = u.customers[i];
          if (customerIds.indexOf(c._id.toString()) == -1)
               newCustomers.push(c);
               customerIds.push(c._id.toString());
     }
     var productIds = [], newProducts = [];
     for (var i = 0; i < u.products.length; i++) {
          var p = u.products[i];
          if (productIds.indexOf(p._id.toString()) == -1)
          newProducts.push(p);
          productIds.push(p._id.toString());
      }
     u.customers = newCustomers;
     u.products = newProducts;
     return u;
} );

The query above performs the following steps in the aggregation pipeline:
- Use $match to select the target product node.
- Use $lookup to find all outbound edges from target node.
- Use $project to get the edge data we want to work with.
- Use $graphLookup to traverse the edges having label "purchased", up to 2 levels deep.
- Use $project to get an edges array.
- Use $unwind to get the documents in the edges array.
- Use $sort on the "steps" field to list edges in order of traversal.
- Use $lookup to get all customer nodes in the traversal path.
- Use $unwind and $sort to sort the customer nodes by order of traversal.
- Use $lookup to get all product nodes in the traversal path.
- Use $unwind and $sort to sort the product nodes by order of traversal.
- Use $group and $project to aggregate the documents from each traversal step into a single document with a list of edges and nodes.
- Use a $map function to get distinct customer and product nodes in results.

The refs property is necessary to allow bidirectional traversal, similar to "both" in Neo4j Cypher. It contains an array of ObjectIds for all other edges intersecting with the current edge at any node.

After running the query, we get the following output. This document gives us access to all edges and nodes that resulted from the graph traversal, in the correct sequence of the traversal path.

/* 1 */
[
     {
          "edges" : [
               {
                    "_id" : ObjectId("5a0be172e7a83c9369d6fb60"),
                    "from" : "Customer 3",
                    "to" : "Product B",
                    "refs" : [
                         ObjectId("5a0be15ee7a83c9369d6fb56"),
                         ObjectId("5a0be1cae7a83c9369d6fb8e")
                    ],
                    "label" : "purchased",
                    "steps" : NumberLong(0)
               },
               { 
                    "_id" : ObjectId("5a0be15ee7a83c9369d6fb56"),
                    "from" : "Customer 2",
                    "to" : "Product B",
                    "refs" : [
                         ObjectId("5a0be14de7a83c9369d6fb52"),
                         ObjectId("5a0bde97e7a83c9369d6fa60")
                    ],
                    "label" : "purchased",
                    "steps" : NumberLong(0)
               },
               {
                    "_id" : ObjectId("5a0be1cae7a83c9369d6fb8e"),
                    "from" : "Customer 3",
                    "to" : "Product C",
                    "refs" : [
                         ObjectId("5a0be172e7a83c9369d6fb60")
                    ],
                    "label" : "purchased",
                    "steps" : NumberLong(1)
               },
               {
                    "_id" : ObjectId("5a0be14de7a83c9369d6fb52"),
                    "from" : "Customer 2",
                    "to" : "Product A",
                    "refs" : [
                         ObjectId("5a0be13ae7a83c9369d6fb48"),
                         ObjectId("5a0be15ee7a83c9369d6fb56")
                    ],
                    "label" : "purchased",
                    "steps" : NumberLong(1)
               },
               {
                    "_id" : ObjectId("5a0be13ae7a83c9369d6fb48"),
                    "from" : "Customer 1",
                    "to" : "Product A",
                    "refs" : [
                         ObjectId("5a0be126e7a83c9369d6fb40"),
                         ObjectId("5a0be14de7a83c9369d6fb52")
                    ],
                    "label" : "purchased",
                    "steps" : NumberLong(2)
               }
          ],
          "customers" : [
               {
                    "_id" : ObjectId("5a0bdab3e7a83c9369d6f932"),
                    "name" : "Customer 3",
                    "steps" : NumberLong(0)
               },
               {
                    "_id" : ObjectId("5a0bda9fe7a83c9369d6f92c"),
                    "name" : "Customer 2",
                    "steps" : NumberLong(0)
               },
               {
                    "_id" : ObjectId("5a0bda87e7a83c9369d6f92a"),
                    "name" : "Customer 1",
                    "steps" : NumberLong(2)
               }
          ],
          "products" : [
               {
                    "_id" : ObjectId("5a0bda22e7a83c9369d6f914"),
                    "name" : "Product B",
                    "steps" : NumberLong(0)
               },
               {
                    "_id" : ObjectId("5a0bda3ee7a83c9369d6f91a"),
                    "name" : "Product C",
                    "steps" : NumberLong(1)
               },
               {
                    "_id" : ObjectId("5a0bd9f8e7a83c9369d6f90c"),
                    "name" : "Product A",
                    "steps" : NumberLong(1)
               }
          ]
     }
]

Conclusion

The results of this brief experiment are promising for MongoDB's graph query capabilities. We were able to successfully model a directed property graph including both node and edge properties and then query those properties while traversing the graph. The query syntax is more verbose than Neo4j's Cypher and it remains to be seen how well it handles certain edge cases, but for those already using MongoDB for their application, this may be a feasible alternative to deploying a dedicated graph database. Another advantage is that since the graph queries occur in Mongo's aggregation framework, it's easy to integrate with other Mongo features such as faceted search, whereas Neo4j does not support faceted search out of the box. Thanks for reading.

© Copyright 2021 Common Aspect, LLC. All Rights Reserved.

Built with Mobirise web page maker