A recent client request had Expero working with a relative newcomer in the graph provider space, TigerGraph. We tend to fancy ourselves as graph experts and have worked with a variety of graph vendors, so we always welcome a chance to work with new technologies.

What made this experience different? All graph databases emphasize relationships, which help find associations that are not always immediately obvious. For example, answering the questions “How are two objects related?” or “How closely are two objects related?”. The answer to these questions provides new insights and patterns to explore based only on the results. What if there was a mechanism that could provide insights to how the results were derived that may provoke more questions to ask that are obscured by simply looking at the results?

**What is TigerGraph?**

TigerGraph is a native parallel graph database. Tiger defines this as storing data as a graph as opposed to an underlying RDBMS or NOSQL database engine. TigerGraph’s storage and processing engine are implemented in C++, and support distributed querying across the cluster to tackle tasks faster and parallelize processing.

There is no universally accepted query language in the land of graph, so TigerGraph has implemented their own:GSQL. GSQL is a schema-enforced SQL-like query language offering traditional control structures and parameterized queries.

**Accumulators a powerful feature of GSQL **

What sets GSQL apart from other graph query languages, such as Cypher or Gremlin? One key differentiator is the concept of Accumulators. Accumulators can be local or global in scope. Accumulators give a way to compute or check values as vertices and edges are traversed during a selection, or post traversal on the result set. This is a very powerful feature that allows for the straightforward implementation of various algorithms or data analysis.

Accumulators come in many different flavors. For example some are logical in nature like ** OrAccum** and

**, which are boolean and store the logical result of all conditionals along the way. Some provide ways to capture numerical results along the traversal, such as**

*AndAccum***,**

*MinAccum***, and**

*MaxAccum***, which record the minimum, maximum and average value respectively. Finally accumulators can also be used simply for storage of values or objects like**

*AvgAccum***,**

*SetAccum***, and**

*ListAccum***, which offer basic container functionality, such as contains, remove and clear.**

*MapAccum***The Path Best Traveled**

**Start With a Simple Schema **

Dijkstra’s algorithm is used for graphs to find the shortest path, if one exists, between a given vertex and all other vertices in the graph. Let’s take a look at an implementation of Dijkstra in GSQL, using some of these nifty accumulators. Before we begin we need a schema to operate on and for the purposes of this exercise a truly simple schema is all that is needed.

There is one vertex type, ** VSimple**, which has nothing except for a string id. There are two edge types, each with an attribute of

**. The edge type**

*Weight***is an example of an undirected edge, while**

*ESimpleUndirected***is an example of a directed edge. Notice the orange**

*ESimpleUndirected***has an arrow on one end on the reference to**

*ESimpleUndirected***.**

*VSimple***Create a Couple of Vertices and Edges**

Since there is now a schema we can begin to define a graph to traverse. The following graph will be weighted and directed and edges will have the following weight:

**Review of Dijkstra’s Algorithm**

First choose a vertex to be the be the root, or initial vertex. In our case this will be vertex S, and the distance from vertex S to S will simply be 0.

- Mark all vertices unvisited, except for the initial vertex S.
- Assign a tentative distance value to all vertices marked unvisited. The tentative value is generally set to infinity, but for our case we will assume all distances are positive and -1 will act as infinity. The vertex S will be set as the current vertex.
- For the current vertex consider all unvisited neighbor vertices and calculate the tentative distance through the current vertex to each neighbor. The newly calculated distance is compared to the current distance value for that vertex. If the newly calculated distance is less than the current distance, then the new path through the current vertex is the best path. Therefore update the distance value, otherwise if the newly calculated distance is greater than what has been recorded, then no changes need to be made.
- After all neighbors of the current vertex have been checked, mark the current vertex as visited. A visited vertex will never need to be checked again.
- If there are still unvisited vertices and there is a distance value that is not set to infinity then select the smallest distance value from the unvisited vertex set and go back to step 3. Otherwise if the unvisited set is empty, or if the only vertices in the unvisited set are all infinity, then the algorithm has finished.

**Overview of the GSQL Query**

Now let’s look at an implementation of Dijkstra’s algorithm that will be run on this particular graph. Don’t worry we will step through the code in piecewise fashion reviewing each section in detail.

*<p> CODE: https://gist.github.com/experoblogs/c9f036c079b64597ac2d3bb75c372e2d.js</p>*

**GSQL in Detail**

**Query Body**

First is the outer query body, which simply defines a query with the name DijkstraExample that takes a parameter of type String.

*<p> CODE: https://gist.github.com/experoblogs/186604ae22c94b6e32307739cc56e51e.js</p>*

One feature to note is that queries can also be defined to return values or objects. It is also possible to reference queries from inside a query allowing users to build easily build more complex queries, while keeping code segregated into logical constructs.

**Accumulators**

Let’s take a look at some of the accumulators that this query will be using.

*<p> CODE: https://gist.github.com/experoblogs/08722fdb5aea1716137bfa496a3e4300.js</p>*

Here we have declared a couple different types of accumulators. First let’s look at the two accumulators, ** @@minDistance** and

**, which have been declared as a**

*@@maxDistance***and**

*MinAccum***respectively. Note how both also have a type of**

*MaxAccum***declared and both are preceded by**

*INT***The use of**

*@@.***informs the typing of the two accumulators, while the**

*INT***and**

*@***dictates a local or global scope. A single**

*@@***acts as a local, or vertex attached accumulator. That is to say that each vertex has an instance of that accumulator, while**

*@***indicates a globally shared accumulator not attached to any one specific vertex.**

*@@*Next there is an ** OrAccum**,

**that has been declared as a vertex attached accumulator. Notice how no data type has been specified. This is due to the fact that both**

*@visited***and**

*AndAccum***are logical accumulators and will always results in a true or false value. It is important to realize in the example that we have also specified a default value of**

*OrAccum***for the accumulator.**

*FALSE*Then there are two ** SumAccum**,

**and**

*@distance***. The**

*@path***will allow for us to keep a running total for each vertex. Accumulator**

*SumAccum***is of type**

*@distance***is set to our default value of**

*INT***. However**

*-1***has been given no default value and is of type**

*@path***. For**

*STRING***the sum accumulation works as expected and for**

*@distance***acts as string concatenation.**

*@path*Finally the last type of accumulator is a container accumulator, ** SetAccum**, which will keep and unordered collection of objects, where no duplicates are allowed. Both

**and**

*@@visitedSet***will hold**

*@@unvisitedSet***objects.**

*VERTEX***Initializing and Simple Selects**

*<p> CODE: https://gist.github.com/experoblogs/f3a4762e70badb0d0ab66de3a68e0f6f.js</p>*

The first line to look at is that there is a vertex set, ** vertexList**, which is being instantiated with all vertices of type

**.**

*VSimple**<p> CODE: https://gist.github.com/experoblogs/753b81c7986f6d63eafc861657b355d7.js</p>*

Next we have the first ** SELECT** statement using the vertexList defined above.

*<p> CODE: https://gist.github.com/experoblogs/b73c5bfd26846c6698c5e6c9103c37f0.js</p>*

Looking closer at the ** FROM** statement we see a reference to the set

**followed by**

*vertexList***, where**

*:p***is vertex alias for the vertices from**

*p***. Next the**

*vertexList***is used to designate an edge connection. The**

*-()->***is indicating a direction, while the value in between the parentheses indicates the allowed edge types and edge alias. In our case we are limiting edges to only**

*->***and e is the alias that is used for edges in this context. The last little bit is defining the destination vertex for a given edge. Again in this simplified example we are limiting the vertice destinations to only be of type**

*ESimpleDirected***and these will have an alias of**

*VSimple***.**

*t*Lastly before moving into accumulators there is a defined ** WHERE** clause in the above query

*<p> CODE: https://gist.github.com/experoblogs/c372bff22006e28327c0371e35fefd21.js</p>*

The variable ** origin** is a parameter and is the

**of the vertex that will act as the root vertex. Since this value is a Primary Key, no two vertices will have the same**

*id***id**value, otherwise this would be a violation constraint.

**The Accum Clause**

Now for the first foray into using accumulators.

*<p> CODE: https://gist.github.com/experoblogs/325e964ef8c6bc3f2f1cb70657d79100.js</p>*

Notice that the keyword ** ACCUM** is used to denote an accum block. There are three operations, which are currently being done in the ACCUM block,

**, ‘**

*t.@distance = e.Weight***and**

*t.@path = p.id + ->***. Remember that**

*@@maxDistance += e.Weight***is the alias of the vertex destination of the edge and that**

*t***is the alias of the edge itself. The statement**

*e**<p> CODE: https://gist.github.com/experoblogs/429ab268d7ab9908d516ac92217398b8.js</p>*

is setting the value of the vertex attached accumulator ** @distance** to the edge weight that connects

**to**

*p***, for each**

*t***that**

*t***is connected. Similarly we are setting the**

*p***string value to the id of the starting vertex and concatenating that value with**

*@path***. Finally we are adding the weight value to the global accumulator**

*->***using the operator**

*@@maxDistance***. Because**

*+=***is a**

*@@maxDistance***the**

*MaxAccum***will only update the value if it is greater than value currently recorded.**

*+=***The Post-Accum Clause**

The last bit of the first ** SELECT** is the

**clause. The Post-Accum clause takes place after the result set has been established and is performed only on the result set. This allows a post check of values that were computed in the**

*POST-ACCUM***phase.**

*Accum**<p> CODE: https://gist.github.com/experoblogs/19db44cb104a2811bfbcdee063492337.js</p>*

Now this is setting the value of the origin vertex ** @visited** to

**, the**

*TRUE***value to 0 and notice that we are adding the vertex**

*@distance***p**to the set of visited vertices.

**What are we left with?**

The results of the above where stored in a temporary vertex set, ** currentVertex**. In our case because of the defined where clause the size of this set will either be

**or**

*1***depending on whether the value passed in when calling the query is legitimate**

*0***id or not.**

*VSimple***Two More Examples of Sets**

Now let’s take a look at the next two statements

*<p> CODE: https://gist.github.com/experoblogs/9e6bd999e2d2f0dbc8b66b41ac6e139e.js</p>*

The top statement should select all vertices where the vertex attached value ** @visited** is

**, which should be very vertex is not the initial vertex. These vertices are also added to the unvisited set. Finally we can demonstrate that vertex sets can be copied by declaring a new vertex set**

*FALSE***, which contains the same contents as**

*choiceSet***.**

*unvisitedSet***Control Structures **

Another nifty feature built into GSQL, is the traditional use of control structures. For example the support of a conditional while loop is as simple as follows

*<p> CODE: https://gist.github.com/experoblogs/32a0ee1fd21aee995964f259065920e1.js</p>*

The above code block also has an example of some of the built in functionality for vertex set, namely the use of ** size()** to determine how many vertices are in the set. As with traditional programming be aware that infinite loops can occur if the terminating condition is not guaranteed. Additionally these control structures can be used in

**and**

*Accum***blocks as well as DML blocks that may exist.**

*Post-Accum***Dijkstra in GSQL**

Now let’s review the actual implementation of the algorithm, which is the code that exists in that ** WHILE** block. Remember before entering the loop, steps 1 and 2 of Dijkstra have been satisfied. We have marked all vertices with a tentative distance and have marked each vertex as visited or not visited. The vertex set

**should always contain at most one vertex**

*currentVertex**<p> CODE: https://gist.github.com/experoblogs/6f129a55ad695322eb7a555b408422d1.js</p>*

For the current vertex all that happens is it is marked as visited and the ** @path** is concatenated with the it’s own

**to make the**

*id***value more aesthetically pleasing. Next using the current vertex we calculate a potential value for all of the current vertex’s neighbors, which corresponds to step 3 in the Dijkstra algorithm.**

*@path**<p> CODE: https://gist.github.com/experoblogs/dfa0cb1b80fc8e4f0f3274c64cf18a30.js</p>*

Notice in the above code block we have an example of an ** IF** and

**statement, which demonstrates using control structures in a query body statement and is also an example of a nested conditional. Walking through the logic for any vertices that have not been visited then perform a check on the distance value. If**

*ELSE IF***is less than 0, which in our case means that no distance has been calculated for the target vertex**

*@distance***. Notice the only subtle difference between the**

*n***and the**

*IF***block is how the**

*ELSE IF***value is being set. The results are stored in the**

*@path***vertex set. The following snippet also covers Step 4.**

*unvisitedNeighbor**<p> CODE: https://gist.github.com/experoblogs/5c1e74a6c7eb2737fee9f1970e5f6636.js</p>*

The following code block once again shows off some features of GSQL, mainly that two vertex sets have set operations defined, such as ** UNION** and

**.**

*INTERSECT**<p> CODE: https://gist.github.com/experoblogs/b6887caccfe163d685dbd130f332d82f.js</p>*

Recall that in our version of Dijkstra we are assuming a value of ** -1** is acting as infinity, so utilizing the where clause, we specify that we want to consider all non-negative distances.

*<p> CODE: https://gist.github.com/experoblogs/5f992c06c3b6462939190afa8d1115bc.js</p>*

The above is simply establishing an upper bound of greatest distance that exist within the vertex set. A sentinel value could have been used to simplify the above, but I wanted to highlight the use of a MaxAccum, as well as being able to use conditional ** IF, ELSE** block to manipulate global accumulators outside of a vertex select statement. After the above we are guaranteed that

**will be non-negative. Now to perform the final step of Dijkstra:**

*@@minDistnace**<p> CODE: https://gist.github.com/experoblogs/abc955bf137fab98ea46b61d6cb38a32.js</p>*

Notice that the above also makes use of a ** HAVING** clause, where we specify the result set should contain only those with minimum distance. Furthermore there is also use of

**, which guarantees that the currentVertex set contains at most one vertex. For good measure let’s review the terminating condition for the while loop**

*LIMIT**<p> CODE: https://gist.github.com/experoblogs/9a32a10a4f69c6e8e50154e81488d2c1.js</p>*

We are going to repeat the ** WHILE** loop until we have no more valid connected vertices to choose. This is the end of Dijkstra’s algorithm and the rest of the query is again demonstrating some concepts and functionality of GSQL.

**Getting There from Here**

**Creating Output **

Let’s take a look at the last bit of the query.

*<p> CODE: https://gist.github.com/experoblogs/40b10d89c23c8477508744dfde6eb5f0.js</p>*

The above demonstrates container accumulators as well as the ** FOREACH** control structure, which will iterate through the objects in the

**. Each vertex is then tested to see if it is in the**

*@@visitedSet***by using the**

*@@unvisitedSet***function. If the**

*contains***instance exists then it is removed from the**

*v***.**

*@@unvisitedSet**<p> CODE: https://gist.github.com/experoblogs/e7826829dd82aeb05b0f25781bcb869f.js</p>*

The only new functionality to review is the ** PRINT** command. This can be used to write output to the JSON response when the query is run. Both accumulators and vertex sets can be printed.

**Running the Query **

When running the query, we are first prompted to provide a value, so we can pick our initial vertex. For our purposes we will use ** S**.

**Understanding the Results **

Running the query produces the following results in JSON.

Notice how both the contents of the container accumulators, ** @@visitedSet** and

**, are in the response. In this particular example everything was reachable, so**

*@@unvisitedSet***contains all vertices in the graph, while**

*@@visitedSet***contains no elements.**

*@@unvisitedSet*Next notice the vertex set, ** shortestPath** that was output. For each vertex there are the attributes that exist on the Vertex. First are the system attributes,

**and**

*v_id***, where**

*v_type***is the primary id of the vertex and**

*v_id***denotes the vertex type. Next there is an attribute map of the vertex. There are is an actual attribute called**

*v_type***and then there is an output of all the vertex attached accumulators defined for the query. We can see that the vertex has been visited, the distance to the vertex from**

*id***to**

*S***has a distance of 8 and the actual path to take is**

*E***.**

*S->A->D->E***Making a Slight Change**

Let’s add one more vertex to the graph, vertex ** G**. Let’s make this vertex by itself with no edges connected it to the existing graph.

Rerunning the exact same query with the same parameter now produces a slightly different result.

**Taming the TigerGraph**

Accumulators provide a powerful tool to gather additional information during a graph traversal. The implementation provided for Dijkstra was done more to highlight the features present and provided for TigerGraph as opposed to making the most efficient implementation in GSQL. Indeed many decisions for the algorithm were made to simply highlight what can be done in GSQL.