There have been a few different approaches to fraud detection over the years:
- Blacklist
- Rules Engine
- Supervised Machine Learning
- Unsupervised Machine Learning
- ML + Social Network Analysis, or GraphML
Below, I’ll discuss the benefits and drawbacks of each approach, in increasing order of sophistication and effectiveness:
Blacklist: Usually a list of parties or devices known to be bad. Entity resolution is a big factor here, because the system has to determine whether a party in the system and an entry on the blacklist are the same entity. For example, what if a party’s name on the blacklist differs only slightly from the name recorded in the system? The system needs a way to match entities that are similar but not identical, and it needs to do so accurately. Blacklists are also generally static: they do not adapt automatically and have to be updated manually as more bad actors become known. Furthermore, once bad actors are listed, fraudsters can easily get around the blacklist by using identity theft or by employing actors or devices that are not on it.
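To make the entity resolution problem concrete, here is a minimal sketch of fuzzy matching against a blacklist using Python’s standard `difflib`. The names, the `normalize` helper, and the 0.85 threshold are all assumptions chosen for illustration; a production system would typically use more robust matching than plain string similarity.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lowercase, drop punctuation, and collapse repeated whitespace.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def best_blacklist_match(name, blacklist, threshold=0.85):
    """Return (entry, score) for the closest entry at or above threshold, else None."""
    target = normalize(name)
    best = None
    for entry in blacklist:
        score = SequenceMatcher(None, target, normalize(entry)).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (entry, score)
    return best


# Invented example entries, not real parties.
BLACKLIST = ["Jonathan Q. Smythe", "Acme Shell Holdings Ltd"]

print(best_blacklist_match("Jonathon Q Smythe", BLACKLIST))  # near match is caught
print(best_blacklist_match("Maria Gonzalez", BLACKLIST))     # prints None
```

The threshold is the accuracy trade-off mentioned above: set it too low and legitimate parties get flagged, too high and slight misspellings slip through.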
Rules Engine: A list of rules designed to catch risky patterns. Rules are difficult to keep up with: they change constantly and can come into conflict with one another, which makes them hard to manage. Rules are also very easy to get around once they are known. For example, say a rule flags transactions over $X as suspicious. A money launderer who knows this rule can split large transactions into smaller ones that go undetected.
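The threshold example can be sketched in a few lines. The threshold value, the transaction fields, and the sample data below are hypothetical. The second function shows the kind of follow-up rule a team might add once splitting is noticed, which is also how rule sets grow and become hard to manage.

```python
from collections import defaultdict

THRESHOLD = 10_000  # hypothetical stand-in for the "$X" in the text


def flag_large(transactions, threshold=THRESHOLD):
    """Naive rule: flag any single transaction above the threshold."""
    return [t for t in transactions if t["amount"] > threshold]


def flag_split_totals(transactions, threshold=THRESHOLD):
    """Follow-up rule: flag (sender, day) pairs whose combined total exceeds the threshold."""
    totals = defaultdict(float)
    for t in transactions:
        totals[(t["sender"], t["day"])] += t["amount"]
    return sorted(key for key, total in totals.items() if total > threshold)


# Invented sample data: sender B splits 12,000 into three smaller transfers.
transactions = [
    {"sender": "A", "day": "2024-01-05", "amount": 12_000},
    {"sender": "B", "day": "2024-01-05", "amount": 4_000},
    {"sender": "B", "day": "2024-01-05", "amount": 4_000},
    {"sender": "B", "day": "2024-01-05", "amount": 4_000},
]

print([t["sender"] for t in flag_large(transactions)])  # ['A'] only; B goes undetected
print(flag_split_totals(transactions))                  # both A and B are flagged
```

A fraudster who learns the second rule can simply spread transfers across days or accounts, so the cycle repeats.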
Supervised Machine Learning: This strategy uses existing data to make inferences about new data in the system. In this context, it is used to classify an entity into one of two groups: fraud or not fraud. Supervised models generally perform better on larger datasets than on small ones, and they require quality training data that is representative of the production environment in which the model will run. Another caveat is that you need a set of already labeled data, or the skills to generate labels for existing data. For example, to determine whether a set of transactions contains fraud, you would need a set of known fraudulent and known non-fraudulent transactions on which to train your machine learning model.
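As a rough illustration of learning from labeled examples, the sketch below classifies a transaction by majority vote among its nearest labeled neighbors (k-nearest neighbors). This is just one simple supervised technique, picked here because it fits in a few lines of standard-library Python. The two features and every data point are invented, and a real model would need far more data plus feature scaling.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest labeled examples."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented labeled data: (amount in thousands, transactions in the past hour).
# Features are left unscaled here for brevity.
labeled = [
    ((0.05, 1), "not_fraud"),
    ((0.12, 2), "not_fraud"),
    ((0.30, 1), "not_fraud"),
    ((0.08, 3), "not_fraud"),
    ((9.50, 14), "fraud"),
    ((7.20, 11), "fraud"),
    ((8.80, 9), "fraud"),
]

print(knn_predict(labeled, (8.0, 12)))   # fraud
print(knn_predict(labeled, (0.10, 2)))   # not_fraud
```

Without the labels in `labeled`, there is nothing for the vote to count, which is exactly the dependency on pre-labeled data described above.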
Unsupervised Machine Learning: In many ways similar to supervised machine learning, but it doesn’t need pre-labeled data. This makes it well suited to judging whether communities that algorithms detect on the fly are potentially fraudulent, since in these scenarios it’s unlikely that you will have a pre-labeled set of data to work with.
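A small example of working without labels is outlier detection. The sketch below is not community detection; it uses a simpler stand-in, the modified z-score based on the median absolute deviation, to flag values that sit far from the rest. The amounts are invented, and the 3.5 cutoff is a commonly cited default rather than a tuned value.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return indices of values whose modified z-score exceeds the cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > cutoff
    ]


# Invented transaction amounts; no labels are supplied anywhere.
amounts = [42.0, 38.5, 51.0, 47.2, 40.1, 44.9, 39.0, 980.0, 45.5, 43.3]

print(mad_outliers(amounts))  # [7], the 980.0 transaction
```

An outlier is only a candidate for review, not a confirmed case of fraud, so results like this usually feed an analyst queue or a downstream model.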
GraphML: Modern detection systems use a combination of machine learning and transaction network (or graph) analysis. What’s more, these two strategies complement each other. Information gathered from transaction networks can provide feature sets for improving machine learning models. Meanwhile, machine learning models can supply information about entities in the transaction network beyond what the network itself contains. Furthermore, transaction networks provide an intuitive visualization of a variety of scenarios, making it easier for users to see patterns of fraud or anomalies in the data, apart from what the analytics determine in the background.
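To show how a transaction network can supply features to a model, here is a minimal sketch that derives two graph features per account: its number of direct counterparties, and its distance in hops to the nearest already-flagged account. The account names, edge list, and choice of features are assumptions for illustration and do not describe any particular product.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (account, account) transaction pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hops_to_flagged(graph, start, flagged):
    """Breadth-first search: hops from start to the nearest flagged node, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in flagged:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def graph_features(graph, node, flagged):
    """Features that could be appended to a model's input row for this account."""
    return {
        "degree": len(graph[node]),
        "hops_to_flagged": hops_to_flagged(graph, node, flagged),
    }


# Invented transaction pairs; "mule9" plays the role of a known bad account.
edges = [("acct1", "acct2"), ("acct2", "acct3"), ("acct3", "mule9"), ("acct4", "acct5")]
graph = build_graph(edges)
flagged = {"mule9"}

print(graph_features(graph, "acct2", flagged))  # {'degree': 2, 'hops_to_flagged': 2}
print(graph_features(graph, "acct4", flagged))  # {'degree': 1, 'hops_to_flagged': None}
```

Features like these capture relationships that a row of per-transaction attributes cannot, which is one way the two strategies reinforce each other.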
At Expero, our fraud product combines machine learning technologies with graph data structures to find and fight fraudsters quickly and in real time. See the screenshots below for examples of how we use these technologies to guide users toward finding fraud and anomalies within their data or system: