What Machine Learning Can Learn from Graph

Summary:
Graphs and graph datasets are rich data structures that offer distinctive ways to improve the accuracy and effectiveness of machine learning workflows. Some of the key interactions are graph analytics as features, semi-supervised learning, graph-based deep learning, and machine learning approaches to hard graph problems.

Some of the major challenges in practical application of Machine Learning (ML) come down to data. Supervised learning approaches such as Deep Learning, ensemble methods like Random Forests, and Support Vector Machines provide the most compelling examples of successful ML application.

When it comes to the data that we use to train these systems, typical factors are:

  • Do we have enough training data for generalisation and convergence?
  • Is that data balanced, or are some classes of input under-represented relative to their true distribution?
  • Do we have enough labelled data to use a supervised approach?

It is essential that we get the most from the data that we have available, and we can use Graph representations to help, whether our data is naturally a graph or not.

This intersection of ML and Graph is the topic of my talk “Graph Representations in Machine Learning” at the Minds Mastering Machines [M³] Conference in London on October 9, 2017.

Real-world Graph datasets inherently contain rich relational data between different entities. If we have Graph data, or can impose a Graph structure on our data, this gives us completely new dimensions across which to train ML algorithms. For example, one of the more compelling potential applications is using Graph data for semi-supervised learning in situations where labelled data is sparse.
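To make the semi-supervised idea concrete, here is a minimal sketch of one simple technique: iterative label propagation with clamped seed labels. It is an illustration only, not the method from the talk. The toy graph, the node names and the `propagate_labels` helper are all hypothetical. Two nodes carry known labels (+1 and -1), and every unlabelled node repeatedly takes the average score of its neighbours.

```python
# Hypothetical toy graph: two triangles joined by the bridge c-d.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def propagate_labels(adj, seeds, iterations=50):
    """Spread seed labels across the graph.

    seeds maps a labelled node to +1.0 or -1.0. Unlabelled nodes start
    at 0.0 and are repeatedly set to the mean score of their neighbours,
    while seed nodes are clamped to their known label.
    """
    scores = {node: seeds.get(node, 0.0) for node in adj}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adj.items():
            if node in seeds:
                updated[node] = seeds[node]  # keep known labels fixed
            elif neighbours:
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
            else:
                updated[node] = 0.0  # isolated node: no evidence either way
        scores = updated
    return scores


adjacency = build_adjacency(EDGES)
# Only two of the six nodes are labelled (an assumption for this example).
scores = propagate_labels(adjacency, seeds={"a": 1.0, "f": -1.0})
predicted = {node: (1 if score > 0 else -1) for node, score in scores.items()}
```

In this sketch the sign of each score gives the predicted class, so nodes in the same triangle as a seed inherit that seed's label even though they were never labelled themselves.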

Here are 6 ways that the Graph and Machine Learning worlds intersect:

  1. Graph Features - When we have a graph dataset, we can augment our traditional feature sets with additional features derived from the graph. Examples include neighbourhood similarity, distance from a particular node type, community labels, centrality and so on. The graphiness of our data determines which graph-derived measures we can add to our feature engineering toolkit.
  2. Graph Costs - In addition to more features, we can use the graph to compute distances, similarities and cost functions based on measures across the graph, providing a different way to constrain learning and optimise an algorithm.
  3. Graph Matrix Representations - Machine learning architectures essentially process vectors and matrices, yet graphs also have natural matrix formulations. We can use these representations directly in some traditional machine learning architectures and by doing so incorporate graph structure into
  4. Graph-Based Semi-Supervised Learning (SSL) - SSL is an approach to dealing with shortages in labelled data. Here, unlabelled data is used to improve training accuracy, increase balance in the training data and expose the algorithm to underlying priors of the input space. Graph data represents a unique opportunity to robustly apply SSL.
  5. Graph Deep Learning - Beyond using traditional architectures on vectorised representations of graphs, where these are treated as any other vector, we are seeing architectures being adapted to understand and operate on graphs. Essentially, they are Deep Neural Networks designed to use graph-based weight representations, operations and costs to operate on graphs directly.
  6. Solving Graph Measurement Problems - The points above describe how to use Graphs in ML pipelines. Independently of that, ML approaches can be used to make intractable graph analytics problems practical. Examples include entity resolution and community detection, where we can develop different methods for arriving at approximate graph isomorphisms.
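As a rough illustration of the first and third points above, the sketch below derives a few graph features per node (degree centrality, neighbourhood similarity, distance to a particular node) and builds an adjacency matrix that a traditional ML model could consume. The edge list, node names and helper functions are hypothetical, and the measures chosen are just simple representatives of each family.

```python
from collections import deque

# Hypothetical toy graph: undirected edges between named nodes.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Neighbourhood similarity: shared neighbours over all neighbours."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def hops_to(adj, source, targets):
    """Breadth-first search distance from source to the nearest target."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no target is reachable from source


def adjacency_matrix(adj):
    """Return (ordered node list, 0/1 matrix as a list of rows)."""
    nodes = sorted(adj)
    return nodes, [[1 if v in adj[u] else 0 for v in nodes] for u in nodes]


def graph_features(adj, targets):
    """Per-node feature dict to append to an existing feature set."""
    n = len(adj)
    return {
        node: {
            "degree_centrality": len(adj[node]) / (n - 1) if n > 1 else 0.0,
            "hops_to_target": hops_to(adj, node, targets),
        }
        for node in sorted(adj)
    }


adjacency = build_adjacency(EDGES)
features = graph_features(adjacency, targets={"e"})  # "e" stands in for a node type of interest
nodes, matrix = adjacency_matrix(adjacency)
```

Each row of `matrix` can be used as a plain feature vector for its node, and the entries of `features` can sit alongside whatever non-graph features are already available.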

In Summary

Using Graph representations and analytics enables organisations to understand their data in new and powerful ways. The same representation can enable Machine Learning to be applied in different ways and more effectively, by getting the most out of the dataset and the relationships inherent in it.
