Introduction
We get contacted frequently for JanusGraph support, tuning advice, and recommendations. The vast majority of those inquiries center on getting data loaded into JanusGraph in the first place. Some projects never get off the ground if they can't get the data loaded fast enough. By now, most everyone has already seen the JanusGraph blogs from Ted. In addition, we've built up a suite of remediation tactics based on several projects and experiences - consider this the (JanusGraph) tuneup when you take your graph to the mechanic.
Before we get started, a few caveats:
- We always try the “low hanging fruit” first. No need to remodel the database if a simple tweak to the JVM will suffice.
- Your mileage may vary. When you go to the auto shop, the standard 50k mile tuneup doesn’t catch everything...but this is a good place to start.
Low Hanging Fruit
JanusGraph version
Consider upgrading your JanusGraph version if you haven’t done so for a while. For instance, version 0.4.0 has some performance improvements related to parallelizing calls to the JanusGraph storage layer which could speed up your queries. The standard caveat applies here - this is open source software, you should always read the changelog because while you may get some performance benefits, you are also on the bleeding edge.
JVM Settings
There are several items that we like to look at here - none of which will be surprising to server side Java engineers.
- If your heap size is 8GB or larger (which is the case most of the time), you should consider using the G1 garbage collector. It has fewer knobs to twiddle and is specifically designed for large heap sizes. (This is a good primer for more reading on types of garbage collectors.)
- If you are using a containerized environment, make sure your heap size is consistent with your container’s settings. Oftentimes we see containers get killed and restarted when the JVM heap size is too big for its host container.
- Setting your heap size too low in general will cause excessive garbage collections (which cause latency) or out-of-memory (OOM) errors. In most cases, the default JanusGraph heap size is too small. A quick way to verify which collector and heap size your JVM actually picked up is sketched below.
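It is easy to set these flags and have them silently ignored, especially in containers. The following is a minimal, generic JMX sketch (not JanusGraph-specific code) that prints the heap ceiling and collectors the running JVM actually ended up with:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmCheck {
    public static void main(String[] args) {
        // Max heap actually granted to this JVM (bytes -> GB)
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.1f GB%n", maxHeap / (1024.0 * 1024 * 1024));

        // Collector names such as "G1 Young Generation" / "G1 Old Generation"
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("GC in use: " + gc.getName());
        }
    }
}
```

Run it with the same JVM options you give JanusGraph (for example -Xmx and -XX:+UseG1GC), inside the same container, to confirm the settings actually took effect.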
JanusGraph properties
There are several properties configured in the JanusGraph properties file upon which our gremlin-server.yaml relies. Note that some of these properties may have a GLOBAL_OFFLINE mutability level. Here are some instructions on how to change them and here is the full list of options - these are the key ones we tend to tweak.
- ids.block-size: this is the number of globally unique IDs reserved in each block that a JanusGraph instance acquires, and it defaults to 10,000. Each instance maintains a pool of IDs to hand out later. For instance, every time we add a vertex, JanusGraph assigns it an ID taken from this pool. When the pool drops below a given percentage (controlled by ids.renew-percentage), JanusGraph tries to reserve a new block of this size asynchronously. However, if the insertion rate is high, the pool can be exhausted very quickly and the asynchronous renewal will eventually block worker threads while they wait for the new block of IDs. Periodic spikes in your query latencies are one potential symptom of this behavior. With a cluster of JanusGraph instances this can be even worse, since reserving a new block requires acquiring a lock across multiple threads and multiple instances. In short, setting this value too low causes transactions to block while waiting for the reservation. Setting it too high can waste IDs if a JanusGraph instance is shut down (unused IDs are not returned to the pool upon restart), but this shouldn’t be an issue if your insertion rate is high and you don’t shut instances down too often. The recommendation is to set this value to the number of vertices expected to be inserted per hour per JanusGraph instance.
- storage.batch-loading - Caution! Use this one with care because it turns off constraint checks, thereby reducing reads before writes. It is most safely used when you have highly preprocessed input data or otherwise do not need uniqueness or other referential integrity constraints enforced at load time - either because the data coming in is clean or because you’ve already done the constraint checking programmatically.
- query.batch - This boolean enables parallelization of storage access. It doesn’t help write performance directly, but it can help if you otherwise have to do a lot of reads before your writes.
- cache.db-cache - For multi-node JanusGraph deployments, it is usually best to turn this off because the cache is not distributed, so each node would have a different view of the data. In some cases it makes sense to turn it on for nodes that serve very heavy reads after the load completes. You could also turn it off for the instances in charge of writes and turn it on for the other instances that handle reads.
- query.force-index - This setting may not apply in most cases, as it is really a protective measure against runaway queries: when enabled, JanusGraph will only run queries that use an index. Most of the time these scenarios have already been addressed, but it’s always good to verify. (A sketch of setting these options programmatically follows this list.)
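These options normally live in your JanusGraph properties file, but as a minimal illustration, here is a sketch of setting them programmatically when opening the graph. The backend, hostname, and every value shown are assumptions to replace with your own:

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenGraph {
    public static void main(String[] args) {
        // Illustrative values only - tune them to your own workload and storage backend.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")          // assumption: Cassandra/Scylla via CQL
                .set("storage.hostname", "127.0.0.1")   // assumption: local storage node
                .set("ids.block-size", 100_000)         // ~vertices expected per hour per instance
                .set("storage.batch-loading", true)     // only if your input data is already clean
                .set("query.batch", true)               // parallelize storage reads
                .set("cache.db-cache", false)           // off for multi-node, write-heavy instances
                .set("query.force-index", false)        // enable only as a guard against runaway scans
                .open();

        // ... load data ...
        graph.close();
    }
}
```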
Gremlin Server properties
Some highly pertinent performance settings also exist in the gremlin-server.yaml file. Again there are several settings available to experiment with - these are the key ones we focus on.
- threadPoolWorker - The default value for this is 1. This pool is essentially the traffic cop that disseminates incoming work. It should be set to at most 2 * the number of cores on the JanusGraph instance.
- gremlinPool (the number of traversals that can be run in parallel) - A worker from this pool runs a traversal until the query completes. Start at the number of cores, then increase incrementally; the right value is usually a function of what your storage layer can handle, since the storage layer will eventually become the point of contention. (A sketch of overriding both settings follows.)
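You would normally edit these values directly in gremlin-server.yaml. As a hedged illustration only, Gremlin Server's Settings object can also be loaded and overridden in code; the file path and the sizing choices below are assumptions based on the guidance above:

```java
import org.apache.tinkerpop.gremlin.server.GremlinServer;
import org.apache.tinkerpop.gremlin.server.Settings;

public class StartServer {
    public static void main(String[] args) throws Exception {
        // Load the stock config, then override the two pools discussed above.
        Settings settings = Settings.read("conf/gremlin-server.yaml"); // assumption: default file location

        int cores = Runtime.getRuntime().availableProcessors();
        settings.threadPoolWorker = 2 * cores;  // upper bound from the guidance above; start lower if unsure
        settings.gremlinPool = cores;           // start at core count, then increase incrementally

        new GremlinServer(settings).start().join();
    }
}
```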
Client driver configuration
This may seem like a “Captain Obvious” moment, but make sure the client knows about all the candidate servers. These lists are not maintained automagically like in some commercial offerings - you must update these settings explicitly. Additionally, if you have single client instances producing a large amount of traffic, it may also be worth increasing the client connection pool sizes.
Please see this blog for more on driver settings.
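With the TinkerPop Java driver, that looks roughly like the sketch below; the hostnames and pool sizes are placeholders, and 8182 is simply the default Gremlin Server port:

```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class DriverSetup {
    public static void main(String[] args) {
        // List every candidate Gremlin Server explicitly - the driver will not discover them for you.
        Cluster cluster = Cluster.build()
                .addContactPoints("janus-1.example.com", "janus-2.example.com", "janus-3.example.com") // placeholders
                .port(8182)
                .minConnectionPoolSize(4)   // bump these for chatty single-client instances
                .maxConnectionPoolSize(16)
                .create();

        Client client = cluster.connect();
        client.submit("g.V().limit(1).count()").all().join(); // simple smoke test
        cluster.close();
    }
}
```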
Intermediate Steps
If you’ve tried all the low hanging fruit and still haven’t gotten the performance you seek, don’t despair. There are still some reasonably painless areas that can be addressed.
Storage Metrics
JanusGraph requires you to choose a storage layer, and each one has its own performance characteristics. In every case, what you’re looking for is the latency of requests to the storage layer as seen from JanusGraph. Frequently that requires collecting metrics from a storage-specific toolset.
- BigTable - Prometheus or some other GCP metrics console. This could be a blog post on its own, but for a starting primer, read this.
- Scylla - The Scylla Monitoring Stack is based on their Prometheus API and is deployed as a set of Docker containers described here.
- Cassandra - C* can be the most difficult to tune because it has the most knobs to turn. From GC to replication to consistency and compaction, this is surely an area where you can make an entire career. Most clients use something like DataDog to collect the data.
- HBase - Depending on whose HBase you are using, you may have some vendor-specific tools at your disposal. Otherwise you can consider tools like hbtop.
The JanusGraph project does not come with an out-of-the-box Prometheus setup, but JanusGraph community member Ganesh Guttikonda has put together a well documented repo that can be used to stand up a JanusGraph specific dashboard: https://github.com/gguttikonda/janusgraph-prometheus.
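Whichever backend dashboard you settle on, it can also help to turn on JanusGraph’s own backend instrumentation so you see the storage calls from JanusGraph’s side. A minimal sketch, assuming a JanusGraph version that exposes the metrics.* options (the in-memory backend is used only so the snippet runs standalone):

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class MetricsEnabledGraph {
    public static void main(String[] args) {
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")  // placeholder backend; use your real one for meaningful numbers
                .set("metrics.enabled", true)        // basic timing and operation counts on backend calls
                .set("metrics.jmx.enabled", true)    // expose them over JMX (scrape with your tool of choice)
                .open();

        // ... run a representative workload, then inspect the backend call timers ...
        graph.close();
    }
}
```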
Schema Review
You can always cripple a properly tuned JanusGraph architecture with a bad data model. Your data model should always be optimized for your expected query patterns. In practice, no two client data models have ever been exactly the same, even within the same problem space, because the shapes and sizes of data sets vary, the frequency of events varies, and the key value drivers vary by customer. Having said that, there are a few things that you should look out for; unfortunately, they are likely to require some changes to your application:
- Uniqueness Constraints - Determine whether unique constraints are truly necessary, as they impose a read before every write. Don’t add them unless absolutely necessary; if you do need them, then pull in the locking wait time (see the next section, and the sketch below).
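For context, a uniqueness constraint in JanusGraph is a composite index marked unique, typically combined with a locking consistency modifier - which is exactly what introduces the read-before-write and the lock wait discussed next. A minimal sketch, where the property key and index names are made up for illustration:

```java
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.ConsistencyModifier;
import org.janusgraph.core.schema.JanusGraphIndex;
import org.janusgraph.core.schema.JanusGraphManagement;

public class UniqueConstraintExample {
    public static void defineUniqueUserId(JanusGraph graph) {
        JanusGraphManagement mgmt = graph.openManagement();

        // Hypothetical key: one "userId" value allowed per graph.
        PropertyKey userId = mgmt.makePropertyKey("userId").dataType(String.class).make();
        JanusGraphIndex byUserId = mgmt.buildIndex("byUserId", Vertex.class)
                .addKey(userId)
                .unique()                 // this is what forces a read (and a lock) before every write
                .buildCompositeIndex();
        mgmt.setConsistency(byUserId, ConsistencyModifier.LOCK);

        mgmt.commit();
    }
}
```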
Locking Wait Time
When writing to the graph, locks are required (for a single node) that a single thread will hold long enough to write the node and have the storage layer acknowledge it as committed (how long that takes is storage layer dependent). The wait time is controlled by storage.lock.wait-time and defaults to 100ms; if your storage layer completes most writes in, say, 50ms, then the graph is waiting for no reason. The good news with a longer wait is that you don’t get spurious lock failures; if you set it too short, you end up with a bunch of retries. The recommended value is a small multiple of the average write time. So how low should you set it? It depends on your write latency - try ratcheting it down slowly, 10ms at a time, from the default until you find your sweet spot. Alternatively, if you’ve already gathered storage metrics, take your average write time from there and adjust accordingly.
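If you haven’t gathered storage metrics yet, one crude way to estimate your average committed write time is a simple timing loop like the one below. This is only an illustration (in-memory backend so it runs standalone, arbitrary vertex label), not a proper benchmark - point it at your real backend to get numbers worth acting on:

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class WriteLatencyProbe {
    public static void main(String[] args) {
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")   // placeholder; use your real backend
                .open();
        GraphTraversalSource g = graph.traversal();

        int n = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            g.addV("probe").property("seq", i).iterate();
            g.tx().commit();                          // one small committed write per iteration
        }
        double avgMs = (System.nanoTime() - start) / 1_000_000.0 / n;
        System.out.printf("average committed write: %.2f ms%n", avgMs);

        graph.close();
    }
}
```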
Structural Changes… If You Must
Look for and Remediate Supernodes
Do you know what the maximum degree in your graph is? You should. If you don’t know the answer, a quick Spark Gremlin routine can give it to you (a sketch follows below). If you have queries that traverse through those nodes, you can be sure they’re not as performant as you’d like. Do your queries need to go this route? Can they originate elsewhere? Can you remodel your graph? Unfortunately there’s no succinct to-do here, but in short, removing supernodes from your graph and graph model will pay dividends when it comes to performance and operations.
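Here is one way to sketch that degree survey with the TinkerPop Java API, assuming a reasonably recent TinkerPop; the limit of 10 is arbitrary, and on a truly large graph you would back the traversal source with SparkGraphComputer rather than run it OLTP-style:

```java
import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.Order;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.T;

public class DegreeSurvey {
    // Top 10 highest-degree vertices; for big graphs, build g with .withComputer(SparkGraphComputer.class).
    public static List<Map<String, Object>> topHubs(GraphTraversalSource g) {
        return g.V()
                .project("id", "degree")
                .by(T.id)
                .by(__.bothE().count())
                .order().by(__.select("degree"), Order.desc)
                .limit(10)
                .toList();
    }
}
```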
Caching and Local Maps to Reduce Reads
Are you looking up attributes from the graph before deciding whether or what to write back? Any read in your write path ultimately steals time that could be spent writing. In some cases these reads are inevitable, but in many other cases they can be removed with some careful ingest design. That discussion is larger than the scope of this article, but tools like Kafka Streams can ease the burden of designing a data loading pipeline that uses techniques like caching to remove a large percentage of your ingest read traffic and, in turn, greatly improve your loading performance.
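As a trivial illustration of the idea (not a substitute for a real streaming pipeline), a local map that remembers which natural keys you have already resolved eliminates the repeated “does this vertex exist?” reads. The property name, vertex label, and key type here are made up for the sketch:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class VertexIdCache {
    // Natural key (hypothetical "userId") -> JanusGraph vertex id, filled in as we ingest.
    private final Map<String, Object> seen = new ConcurrentHashMap<>();
    private final GraphTraversalSource g;

    public VertexIdCache(GraphTraversalSource g) {
        this.g = g;
    }

    // Returns the vertex id for this userId, reading from the graph only on a cache miss.
    public Object resolve(String userId) {
        return seen.computeIfAbsent(userId, key -> {
            Optional<Object> existing = g.V().has("userId", key).id().tryNext();
            return existing.orElseGet(() ->
                    g.addV("user").property("userId", key).id().next());
        });
    }
}
```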