Note: We were in a rush to get this out, so I convinced my editor to just let me publish the blog post “as is” and leave the editorial comments inline. Our readers are very smart, and I’m sure that this will be neither confusing nor distracting.
Wow, seems like the graph database world is teeming with new news. (Editor: “new news” is redundant.) DataStax released 5.1, Neo4j released 3.2, Microsoft announced CosmosDB; there’s a lot of stuff happening in the graph database world. Looks like a prime time for some Gremlin training to me.
“Gremlin training? Why would anyone need Gremlin training?” you ask. Did you think that graphs were dead simple, near-obvious in their use?
Well, let me recount an exchange I heard recently between someone who’s been steeped in the graph world for several years and an experienced technologist who is coming to graph databases from other interests. This is pretty much near-verbatim. (Editor: and I’m pretty sure that is sarcasm, which does not work well in print, so avoid it. Josh: Right, avoid sarcasm. Like that will work with devs. Editor: Stop it, you’re doing it again. Josh: I was aiming for irony. Editor: You missed.)
Let’s call our two characters Alice and Bob. (Editor: If you call them Alice & Bob, people will think this is a security blog post.)
Let’s call our two characters Dave and Ted. We’ll have Dave be an experienced software developer who is just starting to explore graph. Let’s make Ted a graph architect, Apache TinkerPop contributor, JanusGraph contributor, and all-around great guy. (Editor: Because Ted is. I’m sure Dave’s cool, too.)
The scene opens with Dave walking into the conference room and Ted writing the following code on the board:
Dave: Gee, what’s that?
Ted, barely able to contain his excitement that someone would ask him a question about graph:
Ted: Oh, the g, well that’s the start of a traversal process which is used to implement strategies to locate elements within the graph data store.
Dave: Actually, I said gee, not g.
Ted: Not g? Well, Gremlin does support the usual boolean logic operators like “or”, “and”, “not”. They are filter steps of course. But “not g”? I don’t know how taking the negation of a process would work. Perhaps you just want to use a hasNot() step?
Dave: Gremlin huh. Better not feed it after midnight. Or give it water.
Ted: Feed it after midnight? Not give it water? I don’t get it.
Dave: “It multiplies with water.”
Ted starts simultaneously waving his hands erratically in the air and drawing circles and lines on the board, with little green men marching on the circles.
Ted: Oh yeah, Gremlin processes do multiply, but you don’t need water for that. As for the Gremlins (that’s what we call the traversers), well, as they traverse the graph they divide, sort of cloning themselves when they reach branches. Then, if two of these clones meet up at another vertex, they’ll merge back together. They keep doing this until they reach the end of the traversal.
Dave: Friggin’ Millennials. They never get ’80s movie references.
Ted: But you know, graphs are awesome. Like totally cool. No impedance mismatch or anything.
Dave: What’s that “V parens” thing?
Ted: Oh, it’s a Graph Step.
Dave: “Graph Step”? But it starts with V?
Ted: Yeah, and the other one starts with E, though starting with edges doesn’t make a whole lot of sense to me. I’ve yet to find a good use case for starting with an edge. Seems to me if someone is doing that, then it is probably a modeling mistake.
Dave: Starting with edges? I think maybe you should back away from the edges.
Ted: I can’t back away from the edges. That’s a big part of what makes graph databases so awesome, and they are my life. Graphs are filled with edges.
Dave: Filled with edges? Someone might get hurt.
Ted: Oh, you can’t get hurt with graphs. You can only solve problems. Lots of different types of problems. Like my example here which returns the names of all of the people that I know. Or doing fraud analysis. Or running a recommendation engine. Or integrating all of your customer data from a variety of data sources.
Dave: Wait, I’ve just started building an application that does fraud analysis and drives a customer support portal. It will display all of the information that is connected to the customer. Maybe I should look into a graph database?
Ted: Sounds like ideal graph use cases. If you are new to graph, you should go to Graph Day SF and see all that is going on.
Dave: Well, the company I work for really likes open source software. I don’t suppose there’s any free software to try?
Ted: Sure there is! Gremlin is part of the Apache TinkerPop project, which specifies a graph computing framework. Marko’s giving the keynote at Graph Day. But you might want to look into JanusGraph, another open source project. It runs a graph engine with a Cassandra back-end, so it scales really well.
Dave: Cassandra? We use DataStax for our Cassandra clusters.
Ted: Then you should look at DSE Graph if you already have DataStax Cassandra clusters. DSE Graph is built by the guys who made Titan Graph, sort of the ancestor to JanusGraph, and a lot of the guys working on DSE Graph are also involved with the TinkerPop project.
Dave: But our DBAs prefer a declarative language. That doesn’t look declarative to me.
Ted: Well, you can go native with Neo4j. It has the Cypher language, which is declarative, and Neo4j also supports Gremlin. Using Gremlin helps you to avoid vendor lock-in.
Dave: Do any of them handle inheritance? Significant parts of our domain are modeled with inheritance.
Dave: I forgot to mention, we’re mostly a Microsoft .NET shop. I’m sure this is all Java stuff given the Apache Foundation involvement.
Ted: Actually, Microsoft just released a new cloud-based graph database engine called CosmosDB with a .NET SDK, and it uses Gremlin as well.
Dave: Microsoft’s got a Cosmo Kramer DB? I’ll have to look into that. So, sounds to me like my first step in learning graph is learning Gremlin. How long until one of these companies offers an “Intro to Gremlin” course?
Ted: Expero already has! Graph Day just tweeted that it is the only one in the world. My good friend, godfather to all my children, fellow Trinity U alum, renowned graph database comparer (Editor: Laying on a bit thick here, don’t you think? Josh: I’m almost done) and general cool frood who knows where his towel is, is running an “Intro to Gremlin” training workshop on Friday, June 16 for Graph Day in San Francisco.
Dave: Nice recovery with the Douglas Adams quote there. Maybe you aren’t so bad. In fact, you are one of the better consultants I’ve talked to. Sure, you say lots of technical jargon that barely makes any sense to normal people, but you also seem to know a lot about the different graph database vendors and how I can get started learning Gremlin. I guess I’ll make my way to Graph Day in June, get me some Gremlin training, and start looking at all of these graph database vendors.
Josh here again: yep, that was near verbatim, though the names have been changed to protect the innocent. Or at least they were supposed to be changed to Alice and Bob. Did we do that?
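P.S. For anyone who would rather see Ted's description of Gremlins dividing at branches and merging at shared vertices as something other than circles and little green men, here is a minimal Python sketch. It is not TinkerPop code and makes no claim about how any graph engine implements this. The toy graph, the vertex names, and the step_out function are all made up for illustration, and the merge rule is simplified to "same vertex means same traverser", which ignores details such as path history.

```python
from collections import Counter

# Hypothetical toy graph: each vertex maps to the vertices its outgoing edges reach.
GRAPH = {
    "ted": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": [],
}


def step_out(traversers, graph):
    """Advance every traverser along all of its vertex's outgoing edges.

    `traversers` maps a vertex to a count of how many merged
    traversers currently sit on that vertex.
    """
    moved = Counter()
    for vertex, count in traversers.items():
        for neighbor in graph[vertex]:
            # Divide: every outgoing edge gets its own copy of the traverser.
            # Merge: copies landing on the same vertex are summed into one entry.
            moved[neighbor] += count
    return moved


start = Counter({"ted": 1})            # one Gremlin starts on Ted
first_hop = step_out(start, GRAPH)     # it divides across Ted's two edges
second_hop = step_out(first_hop, GRAPH)  # the two copies meet again at carol

print(dict(first_hop))   # {'alice': 1, 'bob': 1}
print(dict(second_hop))  # {'carol': 2}
```

The count of 2 on carol is the point: two clones arrived at the same vertex, so the sketch tracks them as a single entry instead of two separate Gremlins. No water required.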