“Jeez, I’m out of it for a little while, everyone gets delusions of grandeur!” That’s what Han Solo said after being frozen in carbonite. I’ve spent the last year and a half solving data problems for customers and am now getting back into graph DBMSs. We took a nice look at Titan last week, and I can’t wait to play with it some more. Now I’m going to give Neo4j a bit of the same treatment. All of this is prep for my talk at Graph Day 2016 in Austin, TX.
In a Word: Performance
The nice thing about stepping away from a software title for over a year is that you can see where the focus has really been in that time. You can see the forest for the trees, if you will.
Let’s summarize some of the changes from the last two point releases:
- New in-memory page cache: 10x read throughput
- Fast-write buffering / concurrent writes: 100x write throughput
- New cost-based query optimizer: up to 100x performance
- Fully off-heap cache
That’s not all of the changes, but they are the headline ones. See a theme? Performance. And, as Sebastian likes to say, “Speed is a Feature”.
So, the good news is that it is faster, possibly much, much faster. But the bad news is, apparently it wasn’t that fast before. (As an aside, it was plenty fast in early 2014 for some things that relational databases couldn’t do well: multi-hop traversals.)
One of the side benefits of speed improvements is that they should also result in scale improvements: the same hardware running the new version should be able to do more work. I’d like to call this found money, but as an associate of mine pointed out, upgrades aren’t free, and they aren’t always cheap either.
With that in mind, I should note that there have been some changes to the data store along the way. I probably can’t just dust off my 2.0 databases and start 2.3 up with them. I wasn’t planning to do that, mind you. (One of the things I’m going to evaluate for my Graph Day talk is import performance & ease of use.) But if I were running a production operation, I’d be particularly wary of an upgrade that changed how the data sat on disk. (Note: that seems to have been a concern for 2.1 or 2.2, but not for 2.3.)
One of the really slick things about Neo4j has been the web front end. It supports ad hoc querying and has a fair set of graph visualization capabilities, with happily bouncing nodes frolicking about the screen. It was pretty nice in 2014 and has received some attention since then. I’ll be looking at its performance (it would freeze up on me at times) and at capabilities that aid in graph discovery and ad hoc visualization.
DevOps
I know, you’re rolling your eyes because I said DevOps: obvious clickbait (except that it isn’t in the title or URL, and probably won’t make the summary; I know the SEO basics). I’m going to prototype a few different database engines, so I had better take advantage of the nifty tools that now fall under the DevOps heading. Back when I was first cutting my teeth on technology, DevOps just meant that you were a small shop and everyone had to wear multiple hats. Nowadays, roles are blurring and technical staff are doing a better job of working together.
So in the DevOps vein we get:
- An official Neo4j Docker container
- Neo4j Metrics – stream monitoring stats (Enterprise lic only)
- Faster backups, upgrades & bulk loading – aka, safe ways to bypass transaction management overhead
- Windows PowerShell Support ?!?
Yeah, that last one surprised me a little. I’m a glutton for an object-capable scripting environment (or would be, if I were still doing a lot of sysadmin work), so that seems like a pretty snazzy way to appeal to the Microsoft shops out there.
Developers Get Some Too
I’d be remiss if I didn’t note that there are new capabilities on the coding side as well:
- Query Plan Visualization – Your graph query in pictures!
- Spring Data Neo4j 4.0 – mainly better support for Server (i.e., not embedded) deployments
- Property Existence Constraints – you can’t add an Asset to the graph unless it has an assetId property (Enterprise lic only)
- New String Operators: STARTS WITH, CONTAINS, ENDS WITH
- DETACH DELETE – Remove a node and all of its edges with one command.
So now I can do snazzy things like:
MATCH (db:DatabaseEngine)
WHERE db.type STARTS WITH "graph" AND NOT db.type CONTAINS "RDF"
RETURN db.name
Returns: Neo4j, Titan, Microsoft Graph Engine
MATCH (db:DatabaseEngine)
WHERE db.name = "FoundationDB"
DETACH DELETE db
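Cypher needs a running server to try out. To make the semantics of these features concrete without one, here is a toy Python model. It is not Neo4j’s implementation or API. The hypothetical `ToyGraph` class shows a rough analogue of a property existence constraint, the string predicates from the first query, and the difference between a plain `DELETE` and `DETACH DELETE`. A plain `DELETE` refuses to remove a node that still has relationships, while `DETACH DELETE` removes the relationships first. The class, its method names, and the sample data (including the made-up `ExampleTripleStore` and the type strings) are all invented for illustration.

```python
class ConstraintError(Exception):
    """Raised when a node is missing a property its label requires."""


class ToyGraph:
    """Hypothetical in-memory property graph; illustrates semantics only."""

    def __init__(self):
        self.nodes = {}     # node id -> (label, properties dict)
        self.edges = set()  # (from_id, rel_type, to_id)
        self.required = {}  # label -> property names that must be present
        self._next_id = 0

    def require_property(self, label, prop):
        # Rough analogue of a property existence constraint on label.prop.
        self.required.setdefault(label, set()).add(prop)

    def create_node(self, label, **props):
        missing = self.required.get(label, set()) - set(props)
        if missing:
            raise ConstraintError(f"{label} node missing {sorted(missing)}")
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = (label, props)
        return node_id

    def relate(self, a, rel_type, b):
        self.edges.add((a, rel_type, b))

    def find(self, label, predicate):
        # Like MATCH (n:label) WHERE predicate(n): returns matching node ids.
        return [nid for nid, (lbl, props) in self.nodes.items()
                if lbl == label and predicate(props)]

    def delete(self, node_id):
        # Like plain DELETE: refuses while the node still has relationships.
        if any(node_id in (a, b) for a, _, b in self.edges):
            raise ValueError("node still has relationships")
        del self.nodes[node_id]

    def detach_delete(self, node_id):
        # Like DETACH DELETE: drop every incident relationship, then the node.
        self.edges = {e for e in self.edges if node_id not in (e[0], e[2])}
        del self.nodes[node_id]


g = ToyGraph()
g.require_property("Asset", "assetId")
try:
    g.create_node("Asset", name="laptop")  # no assetId, so it is rejected
except ConstraintError as err:
    print("rejected:", err)

# Hypothetical sample data; the type strings are made up for the example.
engines = {
    name: g.create_node("DatabaseEngine", name=name, type=kind)
    for name, kind in [
        ("Neo4j", "graph/property"),
        ("Titan", "graph/property"),
        ("Microsoft Graph Engine", "graph/in-memory"),
        ("ExampleTripleStore", "graph/RDF"),
        ("FoundationDB", "key-value"),
    ]
}

# Mirrors: type STARTS WITH "graph" AND NOT type CONTAINS "RDF"
hits = g.find("DatabaseEngine",
              lambda p: p["type"].startswith("graph") and "RDF" not in p["type"])
print(sorted(g.nodes[n][1]["name"] for n in hits))
# ['Microsoft Graph Engine', 'Neo4j', 'Titan']

g.relate(engines["FoundationDB"], "COMPARED_WITH", engines["Neo4j"])
try:
    g.delete(engines["FoundationDB"])  # fails: a relationship is attached
except ValueError as err:
    print("DELETE failed:", err)
g.detach_delete(engines["FoundationDB"])  # succeeds and removes the edge too
print(len(g.edges))  # 0
```

The point of the two delete methods is the one `DETACH DELETE` saves you from. Without it, you first have to match and delete a node’s relationships before the node itself can go.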
The performance work alone is a significant investment in the product. Building software in Java and then engineering all of the tricks needed to overcome the performance constraints of the JVM (and of the OS) is no simple task. It’s nice to see that some other features made it in as well. I’m looking forward to taking this new Neo4j for a test drive, as I’m doing with Titan. Join me in January in Austin at Graph Day to see what I learn.