OrientDB is one of several popular graph data stores on the market today. It provides a multi-model approach with the powerful nature of a graph database and the flexibility of a document data store. If you have decided to build out your multi-tenant application on top of OrientDB, you are in luck as it has several built-in, out-of-the-box methods for handling multi-tenancy. In this post, I’ll look at three specific approaches:
Note: For this blog series we will talk about a multi-tenant application as an application in which two or more tenants are served software by a set of resources. A "tenant" in this case is a group of users with common access to a dedicated share of the data, configuration and resources of the system. For an overview on reducing complexity in multi-tenant applications, read Part 1 of this series, Multi-Tenant Applications: Reduce the Complexity.
OrientDB has support for Record-Level Security that can be leveraged to create a Partitioned Graph per tenant. A Partitioned Graph is a subgraph of data that is only available to specifically authorized subsets of users. You accomplish this by leveraging the robust security model that OrientDB provides to logically isolate one tenant’s data from other tenants. In order to enable this functionality, there are three basic steps:
Since the partitioning of the graph occurs at the low level within the core of OrientDB, this approach also has the advantage of being enforced on other clients such as those using Gremlin or the Java API. However, there is a price to pay here as each record returned by a query has a low-level hook that is called to process the security model. In most cases this is not a problem, but if you have a complicated security model such as those containing multiple levels of inheritance, the cost to process these can become noticeable.
Note: An excellent walkthrough of how to create Partitioned Graphs is available here.
One of the unique features of OrientDB is Clusters, which allow you to specify how the data in each class (vertex) is grouped on the physical disc. By creating a separate cluster for each class per tenant, you can physically isolate one tenant’s data from another. One major drawback to using this methodology is that due to the class inheritance structure you will also need to logically isolate that tenant’s data. This means that you must explicitly include that tenant’s cluster identifier in each query. The added need for logical isolation increases the developmental complexity of this method but allows for additional flexibility.
For example, given a database with a class called Customer, you would create a custom cluster for each tenant Customer_Tenant1, Customer_Tenant2, etc., as they are added to your system. You would include the cluster identifier in each query to limit the results to just a single tenant's data, e.g., SELECT FROM cluster:Customer_Tenant1.
One advantage of using this methodology is that aggregating data across all tenants is as simple as only using the base class name in the query instead of the tenant cluster identifier, e.g., SELECT FROM Customer.
OrientDB has the capability to run multiple databases on a single OrientDB server. You can leverage this functionality to give each tenant in your system a unique database. This provides each tenant with complete physical isolation of their own from other tenants' data while allowing you to leverage a shared infrastructure. This simplifies the development of applications by removing the need to worry about handling any logical isolation but adds some additional complexity to the operational aspects of the system. With this method you will have to coordinate database upgrades/migrations, handle tenant resource concerns, as well as handle routing tenants to different databases. While all of these operational aspects are well understood and not unique to OrientDB, it does add to the overall workload required to make this method work efficiently.
OrientDB has robust support for both physical and logical isolation using a variety of different methods. Each of the options presented here have pros and cons but each are also the right fit for certain use cases. If you are looking to completely physically separate tenant data, then you should look into using Separate Databases. If you are fine with only a logical separation of data, then Graph Partitioning is probably the best option. If you end up needing something in between, then you can look at Clustering as the possible solution.
For other posts in this series, see: