Building a useful fraud detection system requires bringing many tools together and making them work in concert to bring undetected anomalies to light. In its simplest form, our fraud detection system combines the following tools:
- Graph Viewer: Visualize the connections within your data
- Investigations: Create, record and track changes in your data
- Fraud and Community Rings: Group your data based on detected patterns
- Graph and Graph Algorithms: Analyze the connections within your data
- Supervised and Unsupervised Learning: Predict behavior based on past or learned behavior
- GraphML: Supervised or unsupervised learning applied to graph data
along with a host of other useful components, all in one easy-to-use, intuitive dashboard that helps users find anomalies in their systems.
One of the most obvious needs of a fraud detection system is the ability to visualize results. It’s not enough to simply visualize the data itself: many applications can achieve that. The system should also display things within the data that the user cannot see, such as impactful or costly entities in a system. Our Connected Toolkit provides users with these capabilities, along with the ability to track how data changes in the system over time through its investigations module. It also lets users group entities, such as potential fraud rings.
Beyond visualization, a good fraud detection system needs powerful analytics to get the most out of its data. A common approach is to load the pertinent data into a graph and then run graph algorithms across it. This lets the system gather information about the arrangement of entities in the graph and how they relate to one another. Our Connected Toolkit, for example, allows users to automatically create communities that group the data, or to find similarities between different entities. From there, the system can compute KPIs that are normally difficult to measure, such as risk or churn scores. These analytics can also generate important features that feed into machine learning models.
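To give a rough sense of the kind of community grouping described above, here is a minimal label-propagation sketch in plain Python. The edge data and function are hypothetical illustrations, not the toolkit's actual implementation:

```python
from collections import Counter, defaultdict

def label_propagation(edges, rounds=10):
    """Group nodes into communities by repeatedly assigning each node the
    most common label among its neighbours (simple label propagation)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = {n: n for n in neighbours}  # start: every node is its own community
    for _ in range(rounds):
        changed = False
        for node in sorted(neighbours):  # deterministic update order
            counts = Counter(labels[n] for n in neighbours[node])
            top = max(counts.values())
            candidates = [lbl for lbl, c in counts.items() if c == top]
            # keep the current label on ties, otherwise break ties deterministically
            best = labels[node] if labels[node] in candidates else max(candidates)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())

# Hypothetical data: two tight clusters of accounts joined by a single bridge
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z"),
         ("d", "w")]
print(label_propagation(edges))  # two communities, one per cluster
```

Production systems typically use more robust algorithms (e.g. Louvain or Leiden), but the idea is the same: community membership emerges from the structure of the connections rather than from any predefined grouping.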
Our system can calculate essentially any measurement or metric from the graph to improve the accuracy of our fraud recommendations. However, choosing the right features is critical to actually accomplishing that goal. Although each context is different, understanding how the entities in your graph relate, and which relationships might matter more than others, can guide those decisions. For example, in an AML context, you might look at how many transactions a party is connected to, or how similar one group of financial transactors is to another.
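To make the AML example concrete, the sketch below derives two simple graph features from a hypothetical list of transaction records: how many transactions each party touches, and how much two parties' counterparty sets overlap. The schema and names are illustrative only:

```python
from collections import defaultdict

def build_features(transactions):
    """Derive per-party features from (sender, receiver, amount) records."""
    tx_count = defaultdict(int)        # transactions each party touches
    total_amount = defaultdict(float)  # total value flowing through each party
    counterparties = defaultdict(set)  # who each party transacts with
    for sender, receiver, amount in transactions:
        for party in (sender, receiver):
            tx_count[party] += 1
            total_amount[party] += amount
        counterparties[sender].add(receiver)
        counterparties[receiver].add(sender)
    return tx_count, total_amount, counterparties

def jaccard(counterparties, p1, p2):
    """Similarity of two parties as the overlap of their counterparty sets."""
    a, b = counterparties[p1], counterparties[p2]
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical transactions: alice and dave pay the exact same counterparties
txs = [("alice", "bob", 100.0), ("alice", "carol", 250.0),
       ("dave", "bob", 90.0), ("dave", "carol", 75.0)]
tx_count, total_amount, cp = build_features(txs)
print(tx_count["alice"], jaccard(cp, "alice", "dave"))
```

Features like these, connectivity counts, transaction volumes, and neighbourhood similarity, are exactly the kind of graph-derived signals that can feed downstream models.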
Of course, the path to accurate fraud detection models is not always straightforward. You are unlikely to have the perfect set of features or models right from the start; rather, your data scientists will go through an iterative process with potentially many experimental trials. The Connected Toolkit lets users easily view these experiments and their results, so they can deploy the best version of their models to production.
Finally, machine learning provides an extra, indispensable tool for finding fraud or anomalies in a system. Our Connected Toolkit uses machine learning algorithms to predict fraud in a given system. Through human-in-the-loop feedback, analysts can further improve the detection accuracy of the system, which automatically adapts to the changing behavior of fraudsters. We also apply unsupervised learning to entities without historical data, such as generated communities, to determine where fraud is likely to exist.
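As a toy illustration of unsupervised scoring on entities with no labeled history, the sketch below flags accounts whose activity sits far from the group mean. This is a deliberately simple z-score heuristic on made-up counts, not the toolkit's actual algorithm:

```python
import statistics

def anomaly_scores(values):
    """Score each observation by its distance from the mean,
    measured in standard deviations (a simple outlier heuristic)."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [abs(v - mean) / stdev for v in values]

# Hypothetical daily transaction counts per account;
# the last account behaves very differently from the rest
counts = [12, 15, 11, 14, 13, 95]
scores = anomaly_scores(counts)
flagged = [i for i, s in enumerate(scores) if s > 2.0]
print(flagged)
```

No labels are needed: the "normal" behavior is learned from the population itself, which is why this family of techniques works on freshly generated entities such as communities.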
One important point that has not been discussed yet is scalability. A fraud detection system is only as useful as the amount of data it can handle. If it collapses under large data sets due to poor performance and a lack of scalability, it is unlikely to produce results with a sufficient level of accuracy. We’ve addressed this on two fronts: visualization and analytics. On the UI side, we handle large data sets with filters and multiple resolutions that simplify the view. At the algorithm level, we handle large data sets so that metrics and topology features can be gathered quickly and efficiently.