The intricacies of JanusGraph can often lead users down a rabbit hole of challenges and complexities. Whether you are starting a project or are already knee-deep in one, you have probably felt the need for a JanusGraph tuneup.
While navigating through the blogosphere, you might have chanced upon the invaluable advice of Ted's JanusGraph blogs. However, just like taking a vehicle to a mechanic, it's crucial to understand the very essence of a JanusGraph tuneup.
In this article, we shall embark on the journey of dissecting the importance of this tuneup and offer you a roadmap to optimize your JanusGraph operations. Let’s take a look.
At its core, a JanusGraph tuneup is the meticulous process of adjusting and optimizing various settings and configurations to enhance JanusGraph's performance.
A tuneup is not a one-size-fits-all solution. It begins with the preliminary adjustments that can be made without delving too deeply into database remodeling. You might need to tweak the JVM settings or upgrade your JanusGraph version to pick up performance enhancements.
However, a tuneup doesn't end there. It dives deep into JanusGraph properties, ensuring optimal performance while balancing the trade-offs. From modifying threadPoolWorker in the Gremlin Server properties to managing client driver configurations, every aspect is given due attention.
Consider a JanusGraph tuneup as an extensive health checkup for your graph project, examining every nook and cranny for possible optimizations.
Imagine embarking on a cross-country drive with a car that hasn't been serviced for years. The chances of breaking down or facing issues are incredibly high. Similarly, without a proper JanusGraph tuneup, your project might stumble upon unexpected roadblocks, severely affecting its performance.
The data load speed is a significant concern for many, and without an effective tuneup, projects can stutter right from the onset. But the importance of a tuneup isn't confined to just loading data swiftly. It encompasses optimizing JVM settings, ensuring smooth operations in a containerized environment, and even preventing the adverse effects of excessive garbage collections.
Additionally, every JanusGraph project is uniquely tailored. Hence, a tuneup aids in aligning the JanusGraph configurations with the specific project needs, ensuring the graph can tackle real-world challenges head-on.
In essence, tuning up JanusGraph isn't just an optimization activity; it's a prerequisite for ensuring the longevity, stability, and performance of your graph projects.
Consider upgrading your JanusGraph version if you haven’t done so for a while.
The standard caveat applies here: this is open-source software, so you should always read the changelog. You may get some performance benefits from a newer release, but you are also on the bleeding edge.
On the JVM side, there are several items that we like to look at, none of which will be surprising to server-side Java engineers.
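There are several properties configured in the JanusGraph properties file upon which our gremlin-server.yaml relies. Note that some of these properties may have a GLOBAL_OFFLINE mutability level, which means they cannot be changed by simply editing the local file: they are changed through the management API, and the change requires a cluster restart.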
The ids.block-size property is the number of globally unique IDs reserved by each JanusGraph instance's ID block thread, and it defaults to 10,000. Each instance maintains this block as a pool of IDs to be used later.
For instance, every time we add a vertex, JanusGraph assigns an ID to that vertex, taking it from the ID pool. When the pool drops below a given percentage (controlled by the ids.renew-percentage parameter), JanusGraph will try to reserve a new block of this size asynchronously.
However, if we have a high insertion rate, this pool can be exhausted very fast and the asynchronous renewal mechanism will eventually block our worker threads, waiting for the new pool of IDs. Periodic spikes in your query latencies are one potential symptom of this behavior.
If we have a cluster of JanusGraph instances, this could be even worse, since the retrieval of a new block of IDs needs to acquire a lock across multiple threads and multiple instances. So setting this value too low will make transactions block waiting for this reservation.
Setting it too high could cause unused IDs to be lost if the JanusGraph instance is shut down (unused IDs are not returned to the pool upon restart), but this shouldn’t be an issue if your insertion rate is high and you don’t turn off your instances too often.
The recommendation is to set this value to the number of vertices expected to be inserted per hour per JanusGraph instance.
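As a rough illustration of that rule of thumb, here is a minimal Python sketch that turns an expected insertion rate into a candidate properties line. The function names, the 500 inserts per second workload, and the choice never to go below the 10,000 default are our own assumptions for the example, not part of JanusGraph.

```python
def recommended_block_size(vertices_per_hour: int, default: int = 10_000) -> int:
    """Return a candidate ids.block-size for one JanusGraph instance.

    Applies the rule of thumb above (about one hour of inserts per block).
    Keeping the default as a lower bound is an assumption of this sketch.
    """
    if vertices_per_hour < 0:
        raise ValueError("vertices_per_hour must be non-negative")
    return max(default, vertices_per_hour)


def to_property_line(block_size: int) -> str:
    # Format the value the way it would appear in the properties file.
    return f"ids.block-size={block_size}"


# Hypothetical workload: 500 vertex inserts per second on this instance.
rate_per_second = 500
block_size = recommended_block_size(rate_per_second * 3600)
print(to_property_line(block_size))  # prints "ids.block-size=1800000"
```

Treat the result as a starting point and watch for the latency spikes described above before settling on a value.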
CAUTION! Use storage.batch-loading with care, because it turns off constraint checks, thereby reducing reads before writes.
This one is most safely used when you have highly preprocessed input data or otherwise do not need uniqueness or other referential integrity constraints enforced at input time, either because the data coming in is clean or because you’ve already done the constraint checking programmatically.
The query.batch boolean enables parallelization of storage access. This doesn’t help write performance directly, but it can help if you otherwise have to do a lot of reads before writes.
For multi-node JanusGraph deployments, it is usually best to turn the database cache (cache.db-cache) off, as the cache is not distributed and each node will therefore have a different view of the data.
In some cases, it makes sense to turn it on for nodes that have very high reads after load. You could also turn it off for the instances that are in charge of the writes, and turn it on for other instances that handle the reads.
The query.force-index setting may not apply in most cases, as it is really a protective measure against runaway queries. When enabled, JanusGraph only runs graph queries that can use an index and rejects the rest. Most of the time these scenarios have already been addressed, but it’s always good to verify.
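To make the settings discussed above concrete, here is a minimal Python sketch that renders two illustrative property profiles. The profile names and the particular combinations are assumptions for the example, and the property names are the ones we know from the JanusGraph configuration reference (newer releases rename query.batch to query.batch.enabled, so check the reference for your version).

```python
# Two hypothetical instance profiles; the values are illustrative only.
PROFILES = {
    "bulk_load": {
        "storage.batch-loading": "true",   # skips constraint checks: clean data only
        "cache.db-cache": "false",         # writers gain little from a local cache
    },
    "read_heavy": {
        "storage.batch-loading": "false",
        "query.batch": "true",             # parallelize storage access for reads
        "cache.db-cache": "true",          # local, non-distributed cache
        "query.force-index": "true",       # reject queries that cannot use an index
    },
}


def render_properties(profile: str) -> str:
    """Render one profile as lines for a JanusGraph properties file."""
    settings = PROFILES[profile]
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


print(render_properties("bulk_load"))
```

Keeping the profiles side by side makes it easier to see which instances deliberately trade safety or consistency for speed.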
Some highly pertinent performance settings also exist in the gremlin-server.yaml file. Again there are several settings available to experiment with (full list here) - these are the key ones we focus on.
This may seem like a “Captain Obvious” moment but make sure the client knows about all the candidate servers. These are not maintained automagically like some commercial offerings; you must update these settings explicitly.
Additionally, if you have single client instances producing a large amount of traffic, it may also be worth increasing client connection pool sizes.
Please see this blog for more on driver settings.
If you’ve tried all the previous steps and still haven’t gotten the performance you seek, don’t despair. There are still some reasonably painless areas that can be addressed.
JanusGraph requires you to choose a storage layer, each of which has its own performance profile characteristics. In each, what you’re looking for is the latency of the request to the storage layer from the JanusGraph perspective.
Frequently that requires collecting metrics from some storage-specific toolset.
The JanusGraph project does not come with an out-of-the-box Prometheus setup, but JanusGraph community member Ganesh Guttikonda has put together a well-documented repo that can be used to set up a JanusGraph specific dashboard: https://github.com/gguttikonda/janusgraph-prometheus.
You can always cripple a properly tuned JanusGraph architecture with a bad data model. Your data model should always be optimized for your expected query patterns.
In practice, no two client data models have ever been exactly the same, even within the same problem space, because the shapes and sizes of data sets vary, the frequency of events varies, and the key value drivers vary by customer.
Having said that, there are a few things that you should look out for. Unfortunately, they are likely to require some changes to your application:
When writing to the graph, there are locks required on the graph (for a single node) that a single thread will hold long enough to write the node and have the storage layer acknowledge it as committed (which is storage layer dependent).
The storage.lock.wait-time setting controls how long JanusGraph waits for that acknowledgment. By default it is 100ms. If your storage layer completes most writes in, say, 50ms, then the graph is waiting for no reason; the good news is that with a generous value you don’t get spurious lock failures. If you set it too short, you end up with a bunch of retries. The recommended value is a small multiple of the average write time.
So how low to set this? It depends on your write latency - try ratcheting it down slowly 10ms at a time from the default amount until you find your sweet spot. Alternatively, you can get your average write time if you’ve already gathered storage metrics, and adjust from there.
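Both approaches can be sketched in a few lines of Python. This is only an illustration: the function names, the multiple of 2, and the 22ms average write latency are assumptions we made up for the example, so substitute numbers from your own storage metrics.

```python
import math

DEFAULT_WAIT_MS = 100  # the default lock wait time mentioned above


def suggest_lock_wait_ms(avg_write_ms: float, multiple: float = 2.0, step_ms: int = 10) -> int:
    """Return a small multiple of the average write time, rounded up to a 10ms step."""
    if avg_write_ms <= 0:
        raise ValueError("avg_write_ms must be positive")
    return math.ceil(avg_write_ms * multiple / step_ms) * step_ms


def ratchet_schedule(target_ms: int, step_ms: int = 10) -> list:
    """List the values to try, stepping down from the default to the target.

    Returns an empty list if the target is above the default.
    """
    return list(range(DEFAULT_WAIT_MS, target_ms - 1, -step_ms))


# Hypothetical measurement: writes are acknowledged in about 22ms on average.
target = suggest_lock_wait_ms(22)
print(target)                    # prints 50
print(ratchet_schedule(target))  # prints [100, 90, 80, 70, 60, 50]
```

At each step, watch for lock retries or failures before moving to the next value, and back off one step if they start to appear.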
Do you know what the maximum degree size is for your graph? You should.
If you don’t know the answer, a quick Spark Gremlin routine can give you the answer. If you have queries that traverse through those nodes, you can be sure they’re not as performant as you’d like.
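To show what such a routine computes, here is a small stand-in in plain Python that counts degrees over an in-memory edge list. It is not the Spark Gremlin job itself, and the edge data and function names are hypothetical; on a real graph you would run the equivalent count through an OLAP traversal rather than pulling edges into memory.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each vertex (in plus out)."""
    degrees = Counter()
    for src, dst in edges:
        degrees[src] += 1
        degrees[dst] += 1
    return degrees


def top_degrees(edges, n=3):
    """Return the n vertices with the highest degree, largest first."""
    return degree_counts(edges).most_common(n)


# Hypothetical edge list: vertex "a" is the closest thing to a supernode here.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(top_degrees(edges, n=1))  # prints [('a', 3)]
```

Vertices whose degree is orders of magnitude above the rest are the supernode candidates worth a closer look.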
Do your queries need to go this route? Can they originate elsewhere? Can you remodel your graph? Unfortunately, there’s no succinct to-do here but in short, removing supernodes from your graph and graph model will pay dividends when it comes to performance and operations.
Are you looking up attributes from the graph before deciding to write back or what to write back? Any read you have in your write path ultimately steals time that could be spent writing.
In some cases, these reads are inevitable, but in many other cases, the reads can be removed with some careful ingest design. This discussion is larger than the scope of this article.
However, tools like Kafka Streams can ease the burden of designing a data-loading pipeline that can make use of things like caching to remove large percentages of your ingest read traffic. In turn, this will greatly improve your loading performance.
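The caching idea can be sketched independently of any particular streaming tool. The following Python example is a minimal illustration, with a dictionary standing in for the graph lookup; the class name, the cache size, and the sample events are all assumptions for the example.

```python
from functools import lru_cache


class VertexIdResolver:
    """Resolve business keys to vertex IDs, caching results to avoid repeat reads."""

    def __init__(self, lookup, max_entries=100_000):
        self.backend_reads = 0  # how many lookups actually reached the backend

        def counted_lookup(key):
            self.backend_reads += 1
            return lookup(key)

        # Least-recently-used cache wrapped around the (counted) backend lookup.
        self._cached = lru_cache(maxsize=max_entries)(counted_lookup)

    def resolve(self, key):
        return self._cached(key)


# A dictionary stands in for the graph; in practice this would be a graph query.
fake_graph = {"alice": 1, "bob": 2}
resolver = VertexIdResolver(fake_graph.get)

events = ["alice", "bob", "alice", "alice", "bob"]
ids = [resolver.resolve(key) for key in events]
print(ids)                     # prints [1, 2, 1, 1, 2]
print(resolver.backend_reads)  # prints 2
```

Five incoming events cost only two backend reads here. A cache like this is only safe when the cached values cannot change underneath the loader, so it suits lookups of stable identifiers better than lookups of mutable properties.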
Navigating through the intricacies of JanusGraph can be daunting. But with a JanusGraph tuneup, the challenges become manageable. Tuning isn't just about making a few adjustments here and there. It's about understanding the project's unique demands and tailoring the graph to meet them efficiently.
Just as you wouldn't neglect a vehicle's maintenance, overlooking the significance of a JanusGraph tuneup could be detrimental to your project's health. As you embark on your journey with JanusGraph, make tuneups a staple that guides your project to success and optimal performance.