
Opioid Fraud Detection Using Deep Embedding

Summary:
Using deep machine learning recommender system technology, Expero detects health care opioid fraud.

Opioid Prescription Fraud

The opioid crisis in the United States is fuelled by the illicit release of prescription drugs from clinical settings. There are two types of bad actors in this situation. The first is the external bad actor: an entity outside the clinical setting who funnels these drugs into the community by purchasing or stealing them from clinics and/or clinicians. The second is the internal bad actor: an entity who works inside the clinical setting and uses their access credentials to maliciously disburse these drugs into the greater population. One common form of this disbursement is the misdirection of opioid prescriptions or treatments within clinics. Unfortunately, this increasingly common fraudulent behavior is hard to detect, and it contributes to the yearly increase in overdose deaths in the United States.


Figure 1: Number of deaths in the United States per year, courtesy of the National Institutes of Health, 2017.


A common fraud scheme is for a medical professional working in a hospital to falsely assign opioid drugs to a patient’s chart, but never deliver those drugs to the patient, instead pocketing the opioids for use or disbursement. Using their clinic’s internal record system, it’s possible for an employee with the proper medical credentials to change patient records to account for missing opioids.

Federally Mandated Record Keeping

Fortunately, HIPAA’s 45 C.F.R. § 164.312(b) requires health care organizations covered by the rule to record and examine activity in the information systems that hold electronic patient records. One effect of this requirement is that patients can request a list of all the health care professionals who’ve ever changed their medical records. Another effect is that most medical facilities (in our experience) have some logging system streaming data in real time about access and changes to medical records. These logs include the ID of the medical professional, their position (lab tech, nurse, pharmacist, medical coder, etc.), a timestamp, the patient ID, and the log deltas.

Figure 2: A schematic of a graph database storing patient-clinician medical record interactions.


Traditionally, companies have employed standard anomaly detection techniques and temporal modeling to attempt to classify various types of fraudulent behavior from these logs. These techniques work with varying degrees of accuracy but, as with all fraud, criminal techniques are constantly evolving. Expero employs a bleeding-edge architecture for internal bad actor anomaly detection, which can be used in concert with these traditional techniques. The secret sauce is as follows.

Detecting Internal Bad Actors

The information known a priori in this analysis is:


  1. The medical professionals working in the clinic. The information we have on these individuals is fairly complete from an analysis standpoint. We have a timestamped log of all their transactions with patient records, their specific medical credentials and system access levels, and we also know some demographic information about each individual (which we will ignore in this article).
  2. The patients treated in the clinic. There exists exhaustive information about each of these individuals, but because data privacy is of paramount importance in the healthcare domain, we’ll only be working with the following: an obfuscated patient ID, a timestamp (potentially time-shifted for obfuscation) indicating that a change has occurred to the medical record, and some limited information about the patient’s medical history and/or their current ailment. We choose to use this medical condition information as “side information” in our analysis.


We extract the information on each of these patients and clinicians by selecting data for each clinician from a graph database. Because the patient-clinician relationships within one medical facility can be devilishly intertwined, storing this information in a graph database is ideal. When we perform the extraction, we simply issue the query, “return all the patients within X hops of each clinician.” In this article, we examine only the case where X equals 1, though as you can imagine, when you increase the X distance you can uncover extremely complex and subtle interactions in the medical facility.
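The "within X hops" extraction can be pictured as a bounded breadth-first search. The sketch below is only an illustration in plain Python, not the graph database query itself: the adjacency dictionary, the "clinician:" and "patient:" ID prefixes, and the function name patients_within_hops are all hypothetical.

```python
from collections import deque

def patients_within_hops(graph, clinician, max_hops):
    """Return the patient nodes reachable from `clinician` in at most `max_hops` edges.

    `graph` is a hypothetical adjacency dict: node ID -> list of neighbor IDs.
    """
    seen = {clinician}
    frontier = deque([(clinician, 0)])
    patients = set()
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor.startswith("patient:"):
                patients.add(neighbor)
            frontier.append((neighbor, dist + 1))
    return patients

# Toy clinician-patient graph (made-up IDs, edges stored in both directions).
toy_graph = {
    "clinician:A": ["patient:1", "patient:2"],
    "clinician:B": ["patient:2", "patient:3"],
    "patient:1": ["clinician:A"],
    "patient:2": ["clinician:A", "clinician:B"],
    "patient:3": ["clinician:B"],
}

one_hop = patients_within_hops(toy_graph, "clinician:A", 1)
three_hops = patients_within_hops(toy_graph, "clinician:A", 3)
```

With X equal to 1 the result is simply the clinician's own patients. In a graph that only links clinicians to patients, new patients appear at odd hop counts, so the next interesting value of X in this toy layout is 3 (patients of colleagues who share a patient).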


The type of analysis we perform in these cases is a derivative of the field of sequential recommender systems. In fact, one may call it a perversion of that field, because it has little to do with recommendations. The analysis starts in the same way, though: we use a deep learning library to instantiate a high-dimensional latent vector embedding layer for each of the patients and each of the medical professionals.
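To make the idea of a latent vector per entity concrete, here is a minimal sketch in plain Python rather than a deep learning library. It assumes a small embedding dimension, Gaussian initialization, and a dot product as the interaction score; none of these choices, nor the names make_embeddings and score, come from the original analysis.

```python
import random

def make_embeddings(ids, dim, seed=0):
    """Assign each ID a small random latent vector (a stand-in for an embedding layer)."""
    rng = random.Random(seed)
    return {i: [rng.gauss(0.0, 0.1) for _ in range(dim)] for i in ids}

def score(clinician_vec, patient_vec):
    """Dot product of two latent vectors, used here as the predicted interaction strength."""
    return sum(c * p for c, p in zip(clinician_vec, patient_vec))

# Hypothetical IDs; real ones would come from the audit logs.
clinician_vectors = make_embeddings(["nurse_1", "pharmacist_1"], dim=8, seed=1)
patient_vectors = make_embeddings(["patient_1", "patient_2", "patient_3"], dim=8, seed=2)

untrained = score(clinician_vectors["nurse_1"], patient_vectors["patient_1"])
```

Before training, these scores are close to zero and carry no meaning; training adjusts the vectors so that the scores track observed behavior.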


Figure 3: A sample from a medical record clinician-patient cross interaction matrix used in this fraud analysis.


We train our model against the cross interactions between medical professionals and patients. One (naive) way of building up the cross interaction information is by counting: every time medical professional X interacts with patient record Y, add one to the X/Y intersection point. This is directly analogous to the implicit ratings matrices of the recommender system world.
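The counting approach might look like the following sketch. It assumes each audit log entry has already been reduced to a (clinician ID, patient ID) pair; the sample entries and the function name build_interaction_counts are made up for illustration.

```python
from collections import Counter

def build_interaction_counts(log_entries):
    """Count how often each clinician touched each patient record.

    `log_entries` is an iterable of (clinician_id, patient_id) pairs,
    one pair per audit log event.
    """
    counts = Counter()
    for clinician_id, patient_id in log_entries:
        counts[(clinician_id, patient_id)] += 1
    return counts

# Made-up audit events, in arrival order.
sample_log = [
    ("nurse_1", "patient_1"),
    ("nurse_1", "patient_1"),
    ("nurse_1", "patient_2"),
    ("pharmacist_1", "patient_2"),
    ("nurse_1", "patient_1"),
]

interaction_counts = build_interaction_counts(sample_log)
```

Storing only the nonzero cells keeps the matrix sparse, which matters because most clinicians never touch most patient records.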


Once we’ve trained against the interaction matrix, we can perform several interesting analysis types. First, we can extract the latent vectors describing the medical professionals and use cosine similarity (or similar techniques) to sort them against their colleagues. This works in an identical way for patients. Using cosine similarity we can find the most similar patients/clinicians, and find the least similar patients/clinicians.
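As a rough illustration of the similarity ranking, the sketch below computes cosine similarity in plain Python and sorts colleagues from most to least similar. The two-dimensional vectors and IDs are invented; trained embeddings would have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(target_id, embeddings):
    """Return (other_id, similarity) pairs for every other entity, most similar first."""
    target = embeddings[target_id]
    pairs = [
        (other_id, cosine_similarity(target, vec))
        for other_id, vec in embeddings.items()
        if other_id != target_id
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Invented latent vectors for three clinicians.
toy_embeddings = {
    "nurse_a": [1.0, 0.0],
    "nurse_b": [0.9, 0.1],
    "pharmacist_c": [-1.0, 0.2],
}

ranking = rank_by_similarity("nurse_a", toy_embeddings)
```

The first entry of the ranking is the most similar colleague and the last entry is the least similar, which is the one an auditor may want to look at first.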


More importantly, we can fill in the rest of the interaction matrix.
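One way to picture the fill-in step is a toy matrix factorization trained with stochastic gradient descent. This is a deliberately simplified stand-in for the deep embedding model, written in plain Python; the learning rate, regularization, dimension, epoch count, and the observed counts are all assumptions chosen for illustration.

```python
import random

def factorize(observed, n_clinicians, n_patients, dim=2,
              lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Fit latent vectors so that dot(U[c], V[p]) approximates the observed counts.

    `observed` maps (clinician_index, patient_index) -> interaction count.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_clinicians)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_patients)]
    for _ in range(epochs):
        for (c, p), count in observed.items():
            pred = sum(u * v for u, v in zip(U[c], V[p]))
            err = count - pred
            for f in range(dim):
                u_f, v_f = U[c][f], V[p][f]
                # Gradient step on squared error with L2 regularization.
                U[c][f] += lr * (err * v_f - reg * u_f)
                V[p][f] += lr * (err * u_f - reg * v_f)
    return U, V

def fill_in(U, V):
    """Predict a count for every clinician-patient cell, observed or not."""
    return [[sum(u * v for u, v in zip(cu, pv)) for pv in V] for cu in U]

def squared_error(observed, matrix):
    """Sum of squared differences over the observed cells only."""
    return sum((count - matrix[c][p]) ** 2 for (c, p), count in observed.items())

# Made-up counts for 3 clinicians x 3 patients; 3 of the 9 cells are unobserved.
observed_counts = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}

U, V = factorize(observed_counts, n_clinicians=3, n_patients=3)
filled = fill_in(U, V)
```

After fitting, filled contains a predicted count for every cell, including the clinician-patient pairs that never appeared in the logs.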


Filling in the interaction matrix is equivalent to predicting the number of times each clinician will interact with each patient’s medical record. This is extremely powerful: once we have these predictions, we can cross-check them against observed data as it streams in in real time. When our comparison algorithm finds that a clinician’s interactions with a patient’s record exceed some threshold (which depends on the type of patient; this is where the side information comes in), we use our UI to notify an auditor or regulator that this clinician is behaving anomalously, maybe even fraudulently. The regulator can then decide whether to investigate further.
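The comparison step could be sketched as follows. This is an assumption-laden illustration, not the production algorithm: it treats "higher than some threshold" as the observed count exceeding the predicted count by more than a per-patient-type margin, and the patient types, margins, and counts are invented.

```python
def flag_anomalies(observed, predicted, threshold_for):
    """Return (clinician, patient, observed, expected) tuples worth an auditor's attention.

    `observed` and `predicted` map (clinician_id, patient_id) -> count.
    `threshold_for` maps a patient ID to the allowed excess for that patient's type.
    """
    flags = []
    for (clinician, patient), seen in observed.items():
        expected = predicted.get((clinician, patient), 0.0)
        if seen - expected > threshold_for(patient):
            flags.append((clinician, patient, seen, expected))
    return flags

# Invented side information: post-surgical patients legitimately need more record activity.
patient_type = {"patient_1": "post_surgical", "patient_2": "routine_checkup"}
allowed_excess = {"post_surgical": 5.0, "routine_checkup": 2.0}

def threshold_for(patient_id):
    return allowed_excess[patient_type[patient_id]]

predicted_counts = {("nurse_1", "patient_1"): 6.0, ("nurse_1", "patient_2"): 1.5}
observed_so_far = {("nurse_1", "patient_1"): 9, ("nurse_1", "patient_2"): 7}

alerts = flag_anomalies(observed_so_far, predicted_counts, threshold_for)
```

In this toy data only the routine-checkup record is flagged: an excess of 5.5 interactions is well past its margin of 2, while an excess of 3 on the post-surgical record stays within its margin of 5.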

Questions about this blog post? Send us an email - info@experoinc.com
