Node Classification by Graph Convolutional Network
Graham Ganssle, Ph.D., P.G.

A graph is a natural, and often optimal, representation of information. In a graph, there are nodes (call them “entities”) which are connected by edges (call them “relationships”). For example, take a moment to imagine what your Facebook network looks like: there’s you in the middle, connected to all of your friends, who are, in turn, connected to each other in various permutations. Representing these friendships in a table is an arbitrary (and detrimental) way to express this information. Friendships don’t look like a bunch of rows in a table; they look like this:

Figure 1: Facebook friends. The gold connections are your friendships; the grey connections are friendships between your friends.

Predicting Node Properties

Now that we have your Facebook friends represented optimally, how do we go about the business of optimally predicting what your friends are interested in? We use graph convolutional networks (GCNs). GCNs are a deep learning architecture that leverages both the information contained in the data and the information contained in the relationships between data points.

By way of example, let’s say we want to predict which of your friends are Republicans and which are Democrats. We’d apply a GCN to your Facebook network above to assign the predicted node property, “political stance,” to the nodes with missing labels in the graph. The GCN takes two inputs. First, it takes a list of the “features” of each of your friends, which could include things like their alma mater, their zip code, and the groups they’re members of. This information is highly indicative of a person’s political leaning. Second, the GCN takes a condensed form of the graph’s structure. This helps the GCN learn how the friends’ friendships influence their political stances.
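As a concrete sketch, the two inputs are a node-feature matrix and an adjacency matrix. The people, features, and values below are made up purely for illustration:

```python
import numpy as np

# First input: node features. A toy network of 4 friends, each described
# by 3 hypothetical features (alma mater ID, zip-code region, number of
# Facebook group memberships). These values are illustrative, not real data.
X = np.array([
    [1, 0, 5],   # Alice
    [1, 2, 3],   # Bob
    [0, 2, 8],   # Carol
    [2, 1, 1],   # Dave
], dtype=float)

# Second input: the graph structure as an adjacency matrix.
# A[i, j] = 1 means person i and person j are Facebook friends.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Friendship is mutual, so the adjacency matrix must be symmetric.
assert (A == A.T).all()
```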

Arguably, either of these pieces of information alone is enough to make predictions about a person’s political leaning. Traditional deep learning systems don’t use the relationships between entities to make predictions at all; they only use the properties of individuals. By incorporating both the properties and the structure of the data, a GCN becomes an incredibly powerful predictive tool.

Figure 2: Inputs to a graph convolutional network: Jane is connected to many Republicans, but she also has unique characteristics including education, geographical location, and many Facebook group memberships. Is she a Republican or a Democrat?

Applying GCNs is pretty simple. They operate in layers, which you can stack as deep as you want. Inside each layer, three things happen: first, the structure of the graph is normalized. Second, the normalized graph structure is multiplied by the node properties and the layer’s weights. Finally, a nonlinearity function is applied to the result:

Figure 3: a graph convolutional layer. The discerning reader will note that this is an approximate convolutional operator, due to the stationarity of the kernel function. For a more detailed treatment, including the derivation of this equation, see Kipf and Welling (2017).
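The three steps inside a layer can be sketched in a few lines of NumPy. This follows the Kipf and Welling propagation rule; the self-loops, the symmetric degree normalization, and the weight matrix `W` are assumptions consistent with their formulation rather than a reproduction of the figure:

```python
import numpy as np

def gcn_layer(A, H, W, activation=lambda x: np.maximum(x, 0.0)):
    """One graph convolutional layer: H' = sigma(D^-1/2 (A + I) D^-1/2 H W)."""
    # Step 1: normalize the graph structure. Adding the identity gives every
    # node a self-loop so it retains its own features; D is the degree
    # matrix of the augmented graph.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 2: multiply the normalized structure by the node properties H
    # and the layer's learnable weights W.
    Z = A_norm @ H @ W
    # Step 3: apply a nonlinearity (ReLU here by default).
    return activation(Z)
```

Each node's output row is a degree-weighted average of its own features and its neighbors' features, pushed through the weights and the nonlinearity.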

In application, we stack in a dropout layer and use leaky ReLUs with a softmax output activation function to build something which looks like this:

Figure 4: a two-layer graph convolutional network with one intermediate dropout layer. I’ve found that a leaky ReLU activation function helps alleviate vanishing gradients.
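A minimal sketch of this two-layer architecture (forward pass only). The weight shapes, leaky-ReLU slope, and dropout rate are illustrative choices, not values from the figure:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # shift for stability
    return e / e.sum(axis=1, keepdims=True)

def normalize_adjacency(A):
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d[:, None] * d[None, :]        # D^-1/2 (A+I) D^-1/2

def two_layer_gcn(A, X, W1, W2, dropout=0.5, rng=None, training=True):
    """GCN layer -> dropout -> GCN layer -> softmax, as in Figure 4."""
    A_norm = normalize_adjacency(A)
    H = leaky_relu(A_norm @ X @ W1)               # first graph convolution
    if training and rng is not None:              # intermediate dropout
        mask = rng.random(H.shape) >= dropout
        H = H * mask / (1.0 - dropout)            # inverted dropout scaling
    return softmax(A_norm @ H @ W2)               # per-node class probabilities
```

The output is one probability distribution per node, so a row like `[0.91, 0.09]` would read as "91% likely Republican" for that friend.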

I want to highlight the simplicity of this neural network design by including some TensorFlow code which implements a GCN layer. I’ve annotated it for those readers unfamiliar with TensorFlow:

Figure 5: a graph convolutional network implemented in TensorFlow with an Adam optimizer and a softmax cross-entropy loss function. The green section is the implementation of the graph convolutional layer; note its simplicity.
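The loss function that figure names can also be sketched framework-free. In this semi-supervised setup only some nodes have known labels, so the cross-entropy is averaged over the labeled nodes only; this masking is an assumption consistent with that setup, not a transcription of the figure's code:

```python
import numpy as np

def masked_cross_entropy(probs, labels_onehot, labeled_mask):
    """Softmax cross-entropy averaged over the labeled nodes only.

    probs:         (n_nodes, n_classes) softmax outputs of the GCN
    labels_onehot: (n_nodes, n_classes) one-hot labels; rows for unlabeled
                   nodes can hold anything, since the mask ignores them
    labeled_mask:  (n_nodes,) boolean, True where a node's label is known
    """
    # Per-node cross-entropy; the small epsilon guards against log(0).
    per_node = -(labels_onehot * np.log(probs + 1e-12)).sum(axis=1)
    # Average only over nodes whose labels we actually know.
    return per_node[labeled_mask].mean()
```

During training, gradients of this loss would update the weight matrices of both GCN layers (the job the Adam optimizer does in the figure).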


The cartoon example above is too simple to demonstrate the complexity of this problem. Below is a real Facebook friendship network, colorized by political leaning.

Figure 6: a Facebook friendship network. Color indicates the political viewpoint of each friend; 10% of these political stances were predicted by the graph convolutional network, while the rest were input as part of the training set.

Remember that the GCN doesn’t care what the input data for each node is, or what the relationships between nodes look like. We could use anything! Using information about the financial transactions between businesses and the financial standing of the businesses themselves, we can use a GCN to tell us which transactions are fraudulent. Even more powerfully, by building a supernode graph of businesses, we can use GCNs to tell us where analysts should look for money laundering rings.

Figure 7: a graph of transactional supernodes. Which of these organizations is involved in a money laundering ring? We can use graph convolutional networks to predict the malevolent actors.

We can apply GCNs to banks’ customer data to predict which customers should not be approved for loans, based on customer characteristics and risk thresholds. Again, the power of using between-customer relationships is that we can leverage the structure of historical bank data.

Figure 8: a subgraph of a 100k-node bank customer dataset. Using the relationships between customers and each customer’s financial data, we predict which customers should be approved for which loans and verify against a risk threshold.

I’ve shown that leveraging the natural structure of data improves the predictive power of machine learning analysis, so it’s important to recognize the ubiquity of graphs. I’d be willing to bet your organization works every day on data which is perfect for a graph structure. Anywhere pieces of information are connected, there’s a graph problem. And anywhere there’s a graph, there’s a beautiful, powerful analytical insight waiting to be discovered.

Want to chat about your use case? Send Graham an email!