As the property graph database space has matured over the last few years, it has been dominated by a handful of vendors who, on balance, are not major names in the broader software marketplace. Those who follow the property graph or NoSQL space have surely heard of Neo4j and DataStax, but they are not widely known outside our community. That has now changed with the addition of graph support in Microsoft's Cosmos DB.
Cosmos DB is a multi-model database, and while there are plenty of good use cases and needs for MongoDB or DocumentDB, we are the graph people, so this blog will focus on the graph functionality of Cosmos DB.
We were fortunate to have a .NET client that wanted us to explore the new Cosmos DB option. They were seasoned and savvy enough to know that early product releases tend to be just that, um, early. They are broadly committed to Microsoft technology and needed a graph engine to satisfy a specific graph use case. Frankly speaking, they needed to know whether Cosmos DB was ready, and if not, how far away it was.
Cosmos DB's support for Gremlin is not complete at this point; however, it is evolving quickly. Moreover, Microsoft has a "private preview" program through which we were able to access features that were in the later stages of the development pipeline but not yet released. From both a graph and product ideation standpoint, it was exciting to work with the Microsoft product management and development people to help them refine and perfect their offering.
For our project, we ran into only one Gremlin feature we needed (.hasNot()) that was available only in preview mode, but we were easily able to achieve the same functionality by rewriting the query logic a bit.
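To make the workaround concrete, here is a minimal sketch (not the client's actual query). In Gremlin, `g.V().hasNot('key')` keeps vertices that lack a property, and the same filter can usually be rewritten as `g.V().not(__.has('key'))`. The snippet below mimics both forms over plain Python dicts just to show the two predicates agree; the `retired` property name is purely illustrative.

```python
# Illustrative sketch: g.V().hasNot('retired')  vs  g.V().not(__.has('retired'))
# modeled over plain dicts standing in for graph vertices.

vertices = [
    {"id": "dev-1", "retired": True},
    {"id": "dev-2"},                # no 'retired' property at all
    {"id": "dev-3", "retired": False},
]

def has_not(vs, key):
    """Analog of .hasNot(key): keep vertices missing the property entirely."""
    return [v for v in vs if key not in v]

def not_has(vs, key):
    """Analog of .not(has(key)): the same predicate, written as a negated has()."""
    return [v for v in vs if not (key in v)]

print([v["id"] for v in has_not(vertices, "retired")])  # ['dev-2']
assert has_not(vertices, "retired") == not_has(vertices, "retired")
```

The point is simply that most `hasNot` filters can be expressed with steps that were already generally available, which is why the gap was easy to work around.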
We were able to work with all the basic CRUD features in Gremlin and satisfy simple traversals. Our client's project was a reasonably straightforward network analysis use case. Our most common query was of the sort "given a device, find all the devices connected to it" (a basic first-neighbor lookup): essentially walking the graph a known number of hops.
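The traversal logic is simple enough to sketch. In Gremlin it would look roughly like `g.V(device_id).repeat(out('connected')).times(hops).dedup()`; the edge label `connected` and the device names below are illustrative, not the client's schema. Here is the same "walk a known number of hops" logic over an in-memory adjacency map:

```python
# Sketch of a bounded-hop walk; names and topology are made up for illustration.

def neighbors_within(graph, start, hops):
    """Return every vertex reachable from `start` in at most `hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # expand one hop, skipping anything already visited
        frontier = {n for v in frontier for n in graph.get(v, ()) if n not in seen}
        seen |= frontier
    return seen - {start}

network = {
    "router":   ["switch-a", "switch-b"],
    "switch-a": ["device-1", "device-2"],
    "switch-b": ["device-3"],
}

print(sorted(neighbors_within(network, "router", 1)))  # ['switch-a', 'switch-b']
```

With `hops=1` this is exactly the first-neighbor case; raising `hops` gives the wider walks we occasionally needed.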
We discovered that the performance of the loader was not particularly good, but in our later work we realized that the mechanisms we had borrowed from the sample code were not suited to large bulk uploads. The loader essentially built a remote connection and tore it down for each statement, for example. In retrospect, for production work, it would make more sense to run the loader in Azure and supply it with Gremlin to execute.
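To show why the borrowed pattern hurt, here is a stub that simply counts connection setups; it is not the actual Cosmos DB or Gremlin client API, just an illustration of per-statement connections versus one reused connection for the whole batch:

```python
# Illustrative stub only: FakeGremlinConnection stands in for a real client
# whose constructor pays a network/handshake cost.

class FakeGremlinConnection:
    opens = 0
    def __init__(self):
        FakeGremlinConnection.opens += 1   # handshake cost paid here
    def submit(self, statement):
        pass                               # pretend to execute Gremlin
    def close(self):
        pass

statements = [f"g.addV('device').property('id','{i}')" for i in range(100)]

# Pattern borrowed from the sample code: connect and tear down per statement.
FakeGremlinConnection.opens = 0
for s in statements:
    conn = FakeGremlinConnection()
    conn.submit(s)
    conn.close()
per_statement_opens = FakeGremlinConnection.opens   # one handshake per row

# Bulk-loader pattern: one connection reused across the whole batch.
FakeGremlinConnection.opens = 0
conn = FakeGremlinConnection()
for s in statements:
    conn.submit(s)
conn.close()
reused_opens = FakeGremlinConnection.opens          # one handshake total

print(per_statement_opens, reused_opens)  # 100 1
```

With a real remote endpoint, each of those setups is a TLS handshake and round trip, which is why reusing the connection (and running the loader close to the database, inside Azure) matters for bulk loads.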
While the Gremlin support is quickly coming together, where Cosmos DB really shines is in its place within the larger Microsoft Azure/Visual Studio ecosystem.
It turns out that building a service (really a service plus website, as we will see) is ridiculously simple.
They give you a URL and a package of settings that you can import into Visual Studio to set up the publishing interface. Then all you do is build your application in the conventional way and hit "Publish". It just works.
Cosmos DB provides a tool to execute arbitrary Gremlin against a graph and view the results, either as JSON or as a live graphical view. Using this tool, you can both explore the graph and test Gremlin to be used in the API.
The parts of the environment that you’d expect for a mature development ecosystem are there and work flawlessly. This presents an interesting juxtaposition in the marketplace. Some graph vendors have more mature backends and are trying to mature the tools. Microsoft’s offering is the inverse at the present moment. While the core storage layer appears to be ahead of the game when it comes to tunable ACID and data distribution behavior, the graph layer is young.
There are some items to note around the .NET platform for development. All the sample code (including code generated by others, such as Swagger.io) targets the newer ASP.NET Core foundation. Traditionally, Expero has used the .NET Framework because there are many useful NuGet and other packages that work with .NET Framework but not ASP.NET Core. While we expect these issues to be resolved as the offering matures, special care should be taken with your broader toolset to ensure system-wide tool and framework compatibility.