Using Anomaly Detection Techniques to Spot Credit Card Fraud

Aug 14, 2021



Table of Contents

  1. Executive Summary
  2. Introduction
  3. Methodology
  4. Results
  5. Discussion
  6. Conclusion

1 Executive Summary

This article illustrates the application of anomaly detection techniques to customer credit card behavior in order to identify potentially fraudulent transactions. We discuss the preprocessing steps, apply anomaly detection algorithms such as Local Outlier Factor and Elliptic Envelope, examine their hyper-parameters, and visualize the outcome.

2 Introduction

This project and the data were part of the first course, “Introduction to Predictive Analytics using Python,” in the University of Edinburgh Predictive Analytics using Python MicroMasters program.

The project aims to predict whether a transaction was fraudulent based on a small historical dataset.


To identify anomalous transactions, the historical dataset first needs to be preprocessed.

3 Methodology

3.1 Analytic Approach and Data Requirements

A predictive analytics approach is selected to identify outliers. Because fraudulent transactions are relatively rare in practice, we cannot assume that the positive examples are sufficiently similar to one another for a supervised learning approach to learn from them reliably. Therefore, this assignment uses the anomaly detection approach. (In this dataset, 57% of the cases had a positive label, so supervised classification would arguably work as well in this example.) Anomaly detection requires normalized numerical data to converge efficiently.

3.2 Data

The course authors provided the dataset in CSV format. It consists of 99 training examples with five features (CC, No, Money, City, and Phone) and a class label: 1 for a fraudulent transaction, 0 for the negative class.

Figure 3.1: Initial Dataset

Several preprocessing steps were necessary to bring the data into a format that the anomaly detection algorithms would accept (the step sequence was defined by the course):

  • Step 1: Format
  • Step 2: Feature Selection
  • Step 3: Outliers
  • Step 4: Transformation

Step 1: Format

First, we strip the dollar signs and thousands separators from the Money column and convert it to a numeric type.
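This cleaning step can be sketched with pandas string methods; the sample values below are hypothetical stand-ins for the actual Money column:

```python
import pandas as pd

# Hypothetical sample mirroring the format of the dataset's "Money" column
df = pd.DataFrame({"Money": ["$1,200.50", "$89.99", "$15,000.00"]})

# Strip dollar signs and thousands separators, then convert to numeric
df["Money"] = (
    df["Money"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)
```

After this step the column can be used by any algorithm expecting numeric input.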

Step 2: Feature Selection

Next, we count the number of unique values for the remaining features. The CC number has 99 unique values (stored as objects) and Phone has 95, out of 99 observations in total. Because these features behave almost like row identifiers, they are arguably of little predictive value and are therefore dropped from the data frame.
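A minimal sketch of this check, using a small hypothetical frame in place of the real data (in the actual dataset, CC and Phone are near-unique per row):

```python
import pandas as pd

# Hypothetical stand-in: CC and Phone act as near-unique identifiers
df = pd.DataFrame({
    "CC": ["4111-1", "4111-2", "4111-3"],
    "No": [5, 2, 7],
    "Money": [120.0, 89.9, 45.0],
    "City": ["Edinburgh", "Glasgow", "Edinburgh"],
    "Phone": ["555-0101", "555-0102", "555-0103"],
})

# Count distinct values per column to spot identifier-like features
print(df.nunique())

# Drop features whose values are (nearly) unique per row; they carry little signal
df = df.drop(columns=["CC", "Phone"])
```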

Step 3: Outliers

Outliers are then identified from the numerical features No and Money using LocalOutlierFactor with 20 neighbors and a contamination of 20%.
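A sketch of this step with scikit-learn, using randomly generated points in place of the actual No and Money values:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical 2-D data standing in for the "No" and "Money" columns
X = rng.normal(size=(99, 2))

# n_neighbors=20 and contamination=0.2, as in the assignment
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.2)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

print((labels == -1).sum())  # roughly 20% of the 99 points are flagged
```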


Step 4: Transformation

As the last preprocessing step, we standardize the numeric variables. Figure 4.2 shows the dataset after the transformation.

Figure 4.2 Normalized Data
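The standardization can be sketched with scikit-learn's StandardScaler; the values below are hypothetical placeholders for the numeric No and Money features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for the numeric "No" and "Money" features
X = np.array([[1.0, 120.0], [2.0, 90.0], [3.0, 45.0], [4.0, 300.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # zero mean, unit variance per column

print(X_std.mean(axis=0).round(6))  # approximately [0, 0]
print(X_std.std(axis=0).round(6))   # approximately [1, 1]
```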

Note: Within the programming assignment, City was not converted into dummy variables, but this could be an option to assess the relationship between cardholder location and behavior.
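If one wanted to pursue that option, a one-hot encoding sketch might look as follows; the city values are hypothetical and this step was not part of the assignment:

```python
import pandas as pd

# Hypothetical "City" values; dummy encoding was not part of the assignment
df = pd.DataFrame({"City": ["Edinburgh", "Glasgow", "Edinburgh"]})

# One indicator column per distinct city
dummies = pd.get_dummies(df["City"], prefix="City")
print(dummies.columns.tolist())  # ['City_Edinburgh', 'City_Glasgow']
```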


4 Results

Using a 30% contamination factor (set by the course), the results are unsurprising: with a higher number of neighbors, points located towards the edges, i.e. extremely low and high No values, were selected as outliers (red points), while at lower neighbor values points closer to the bulk of the data were also flagged.

Figure 4.3 LocalOutlierFactor

Applying EllipticEnvelope results in an elliptical decision boundary, with points outside the ellipse flagged as anomalies.

Figure 4.4 EllipticEnvelope
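The elliptic envelope step can be sketched as follows, again with random data standing in for the standardized No and Money features:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(1)
# Hypothetical standardized 2-D data standing in for "No" and "Money"
X = rng.normal(size=(99, 2))

# Fits a robust Gaussian; points outside the resulting ellipse are anomalies
ee = EllipticEnvelope(contamination=0.3, random_state=0)
labels = ee.fit_predict(X)  # -1 = outside the envelope, 1 = inside
```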

5 Discussion

We have seen that it is possible to identify anomalous credit card behavior using the widely used Elliptic Envelope or Local Outlier Factor algorithms. In this simple example, with only two features selected, the visualization provides some intuition into how these algorithms identify anomalies. With LOF, we would expect a lower value of the neighbors hyper-parameter to produce more complex decision boundaries that also flag examples closer to the "center of mass" (for lack of a better term), whereas models built with a higher number of neighbors identify anomalies located at the edges. This additional complexity may offer extra accuracy, but potentially at the price of overfitting the training set.

Elliptic Envelope is also a classic anomaly detection approach: values outside the elliptical decision boundary have lower probability and are identified as anomalies. The downside of this approach is that it assumes a Gaussian distribution, and more research and descriptive analysis would be needed to confirm that this holds. With either method, we can use the combination of the variables No and Money to predict anomalies, either based on an example's distance from other observations (as with LOF) or based on probabilities compared against a set threshold (as with EllipticEnvelope).

6 Conclusion

In this exercise, we have preprocessed the dataset on credit card behavior and built several anomaly-detection models. We have also plotted the anomalies in two-dimensional feature space to get a feel for how these algorithms would approach the detection problem in higher dimensions. In a higher-dimensional space, we can still visualize the anomalous examples to communicate the findings to stakeholders, for example by projecting onto principal components.

Future analysis

While not possible with the given dataset, we could potentially use phone numbers, or parts of them, for prediction if we could obtain information about the area codes of different regions or the network providers favored by fraudsters. Such additional features could add to the predictive power of our model.

Predicting anomalies on unseen data using the learned parameters, such as the mean and standard deviation, was not part of the scope of this exercise. A logical next step would be to split the data into training, validation, and test sets, with the training set composed of negative examples. We would use this set to learn the parameters, e.g., the mean and standard deviation (if we use an elliptic envelope), then select the optimal threshold value by comparing predicted versus actual labels on the validation set, and finally measure performance on the test set.
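That next step could be sketched as below. All data here is synthetic and the threshold search is one possible choice (maximizing F1 over score quantiles), not the course's prescribed procedure:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.metrics import f1_score

rng = np.random.default_rng(2)
# Synthetic data: negatives are Gaussian, positives are shifted outliers
X_neg = rng.normal(size=(200, 2))
X_pos = rng.normal(loc=4.0, size=(20, 2))

# Train only on negative examples; hold out labeled data for validation
X_train = X_neg[:150]
X_val = np.vstack([X_neg[150:], X_pos])
y_val = np.array([0] * 50 + [1] * 20)

# Learn the Gaussian parameters (mean, covariance) from negatives only
ee = EllipticEnvelope(support_fraction=1.0, random_state=0).fit(X_train)
scores = ee.decision_function(X_val)  # lower score = more anomalous

# Pick the score threshold that maximizes F1 on the validation set
best_t, best_f1 = None, -1.0
for t in np.quantile(scores, np.linspace(0.01, 0.99, 50)):
    f1 = f1_score(y_val, (scores < t).astype(int), zero_division=0)
    if f1 > best_f1:
        best_t, best_f1 = t, f1
```

With a threshold chosen this way, the final step would be to score a held-out test set with the same fitted model and report the test-set metrics.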

We can also compare the performance of different algorithms based on their real-world detection capability once deployed.


Copyright © 2021 Schwarzwald_AI



