Financial institutions such as banks and insurance firms are among the organizations that collect the most data, thanks to their large client bases and the number of transactions they process each day. By putting this data to work, they can tap into many of the use cases that data science offers, improving their operational efficiency and serving their customers better. In this piece, we’ll look at some applications of data science in finance and then work through a practical example. We’ll cover the following application areas:
- Credit Worthiness
- Real-time Analytics
- Algorithmic Trading
- Identification & Authentication
- Investment Advisory
- Fraud Detection
- Anomaly Detection
- Document Analysis
- Revenue Projection
Real-time Analytics
Financial markets are heavily affected by events across the world. These factors include government policy decisions and the sentiments of key economists, to mention just a couple. A real-time analytics dashboard would enable a financial institution to track the many factors that affect the market as they change. In doing so, the institution can make quick decisions that help it prevent losses and increase profit.
Algorithmic Trading
Apart from having real-time dashboards that display shifts in the market, a financial institution can have a system that makes trading decisions immediately after a key factor changes. This means it doesn’t have to wait for human traders to check the system and make decisions. Automated trading via algorithmic decision making can be very useful for split-second trading decisions. Another key advantage of these systems is that, unlike a human trader, they can work around the clock. Human traders can also keep improving these systems by encoding their own trading strategies in them.
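As a rough illustration of a system that reacts automatically when a market factor changes, here is a minimal sketch of a moving-average crossover rule. The window sizes, the price series, and the function name `crossover_signal` are assumptions made up for this example; this is not a strategy used or recommended by any particular institution.

```python
from statistics import mean


def crossover_signal(prices, short_window=3, long_window=5):
    """Return 'buy', 'sell' or 'hold' from a simple moving-average crossover.

    Hypothetical rule: buy when the short-term average is above the
    long-term average, sell when it is below, hold otherwise.
    """
    if len(prices) < long_window:
        return "hold"  # not enough history to decide
    short_avg = mean(prices[-short_window:])
    long_avg = mean(prices[-long_window:])
    if short_avg > long_avg:
        return "buy"
    if short_avg < long_avg:
        return "sell"
    return "hold"


# Made-up price series; in a live system the signal would be
# recomputed every time a new price arrives.
rising = [100, 101, 102, 104, 107]
falling = [107, 104, 102, 101, 100]
print(crossover_signal(rising))   # buy
print(crossover_signal(falling))  # sell
```

A real trading system would add risk limits, transaction costs, and backtesting on top of a rule like this.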
Financial institutions invest a lot of resources in risk mitigation. A wrong customer risk profile in lending or insurance leads to future losses. In insurance, for example, losses would come from unexpected claims resulting from accidents. It is, therefore, crucial to profile a customer accurately, as this determines the premium that they’ll pay. For example, a driver with less experience, or with multiple accidents or claims, will pay higher premiums. This benefits less risky clients, because they can then pay lower premiums.
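To make the link between risk profile and premium concrete, here is a toy calculation. The base premium, the surcharge percentages, and the function name `annual_premium` are all invented for illustration; real insurers price risk with far richer models.

```python
def annual_premium(base_premium, years_driving, prior_claims):
    """Toy premium calculation; every number here is a made-up assumption."""
    loading = 1.0
    if years_driving < 3:
        loading += 0.25               # surcharge for inexperienced drivers
    loading += 0.15 * prior_claims    # surcharge per previous claim
    return round(base_premium * loading, 2)


# An experienced driver with no claims versus a new driver with two claims.
print(annual_premium(1000, years_driving=10, prior_claims=0))
print(annual_premium(1000, years_driving=1, prior_claims=2))
```

The riskier profile ends up with the higher premium, which is what lets the insurer charge the low-risk client less.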
Identification & Authentication
Advancements in facial recognition and biometrics have enabled financial institutions to add extra layers of security to protect their clients’ money. Financial institutions are also using advanced image recognition techniques to check the validity and authenticity of passports and other identification documents uploaded to their online platforms. These techniques, among others, have boosted their online security and increased their clients’ confidence in using those channels.
Investment Advisory
Human financial advisors are expensive. Many financial institutions across the globe are deploying artificial intelligence-driven investment advisors, which have proven to be cheaper than human advisors. Many institutions use them to help clients achieve their financial goals. Because they are available online, clients can access them at any time and from any time zone.
Fraud Detection
With advances in e-commerce and online payments, fraudulent transactions are a major concern for financial institutions. These transactions result from stolen identities as well as stolen credit and debit cards. Financial institutions protect their clients’ money by requesting additional information whenever the system notices out-of-the-ordinary activity. For example, the system would flag a client transaction in a new country, especially if the client had not informed their bank of any planned travel. Other events, such as huge or irregular withdrawals, would also trigger the system to block the transaction.
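The kind of checks described above can be sketched as a few simple rules. The `Transaction` class, the `needs_verification` function, the country codes, and the "three times the typical maximum" threshold are all hypothetical; production fraud systems usually combine rules like these with learned models.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    country: str


def needs_verification(tx, home_country, travel_notices, typical_max_amount):
    """Return the reasons (if any) a transaction should be held for extra checks.

    The rules and thresholds are illustrative assumptions, not a real bank's policy.
    """
    reasons = []
    # A foreign transaction with no travel notice on file is suspicious.
    if tx.country != home_country and tx.country not in travel_notices:
        reasons.append("unexpected country")
    # So is an amount far above what this client normally spends.
    if tx.amount > 3 * typical_max_amount:
        reasons.append("unusually large amount")
    return reasons


tx = Transaction(amount=5000.0, country="BR")
print(needs_verification(tx, home_country="KE", travel_notices=set(),
                         typical_max_amount=800.0))
# ['unexpected country', 'unusually large amount']
```

An empty list means the transaction passes; any returned reason would prompt the request for additional information.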
Anomaly Detection
According to the UN, the amount of money laundered globally every year is between $800 billion and $2 trillion. Measured against the world’s GDP, if money laundering were a country, it would be the fifth-largest economy in the world. This is clearly a huge global problem, and financial institutions therefore have to comply with global regulations concerning money laundering. One of the techniques they use to tackle this challenge is anomaly detection: keeping a close eye on all transactions in order to detect individual transactions that fall outside a client’s ordinary transaction history.
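One very simple way to detect a transaction that falls outside a client's usual pattern is a z-score check against their past amounts. This is only a sketch: the function name `is_anomalous`, the threshold of three standard deviations, and the sample amounts are assumptions, and real anti-money-laundering systems use far more signals than the amount alone.

```python
from statistics import mean, stdev


def is_anomalous(history, new_amount, threshold=3.0):
    """Flag new_amount if it lies more than `threshold` standard deviations
    from the mean of the client's past amounts (a simple z-score rule)."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_amount != mu  # any deviation from a constant history stands out
    return abs(new_amount - mu) / sigma > threshold


past = [120, 95, 130, 110, 105, 125, 115]  # made-up transaction amounts
print(is_anomalous(past, 118))   # False
print(is_anomalous(past, 9000))  # True
```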
Document Analysis
Analyzing legal paperwork takes a lot of time. It also requires many lawyers to analyze and interpret legal jargon, and having several experienced lawyers do such analysis is not cheap. Financial institutions are using AI-driven document analysis systems to save both time and money. Institutions using these systems can also adopt new policies from regulators quickly, which helps them stay compliant and get ahead of their competition.
Revenue Projection
By collecting relevant client information, financial institutions are able to estimate how long a client will stay with them and, from that, predict the lifetime value of the customer. This type of information plays a key role in helping the organization plan for the future. It can also be useful in developing products for customers as they grow with the organization.
Credit Worthiness
When a financial institution lends money, it wants to maximize the chances that the borrower will not default. This is especially critical because these institutions lend to a very large number of people. Even if the amount lent to each individual is small, it adds up to a huge figure once the number of clients comes into play. Therefore, the ability to estimate the probability of a customer defaulting before they even take the loan would lead to huge savings.
We’ll use the “default of credit card clients” dataset to build a simple machine learning model that predicts whether a client will default on their credit card payments.
As usual, we’ll be working on Saturn Cloud in order to take advantage of the compute resources available there. We’ll start off by loading a couple of packages.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Next, let’s use pandas to load in our dataset.
df = pd.read_csv('default of credit card clients.csv')
Using the head function we can display a snapshot of the data.
By checking the dataset’s info, we notice that there are no null values that we’d need to deal with.
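The calls behind these two steps are short. The sketch below uses a tiny stand-in DataFrame so that it runs without the CSV file; the values are made up, and only the column names are borrowed from the ones used in this article. On the real data you would call the same methods on the `df` loaded above.

```python
import pandas as pd

# Tiny stand-in for the real CSV (made-up values, real column names).
df = pd.DataFrame({
    "SEX": [2, 2, 2, 1],
    "EDUCATION": [2, 2, 1, 3],
    "default payment next month": [1, 1, 0, 0],
})

print(df.head())          # first rows of the DataFrame (up to five)
df.info()                 # column dtypes and non-null counts
print(df.isnull().sum())  # explicit count of missing values per column
```

`isnull().sum()` is an optional extra: it gives the same "no nulls" answer as `info()`, but as one number per column.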
Some basic exploratory data analysis shows us that most people in the dataset have gone to university.
plt.figure(figsize=(8, 6))
sns.countplot(x='EDUCATION', data=df)
We can also see that the majority of the people in the dataset are female.
plt.figure(figsize=(8, 6))
sns.countplot(x='SEX', data=df)
Let’s now visualize our target column, that is, whether an individual will default on their next payment.
plt.figure(figsize=(8, 6))
sns.countplot(x='default payment next month', data=df)
With that, let’s prepare our dataset for training. We declare the features as X and the target variable as y.
X = df.drop('default payment next month', axis=1)
y = df['default payment next month']
The next step is to split the dataset into a training and a testing set. We’ll use 67% of the data for training and the rest for testing.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
Now we can define the model. We’ll use an LGBMClassifier with a binary objective, because it lets us declare the label-encoded columns as categorical features.
from lightgbm import LGBMClassifier
model = LGBMClassifier(objective='binary')
As we fit this model to the training set, we declare the columns that should be treated as categorical.
We can now proceed to make predictions using the model.
predictionsLGB = model.predict(X_test)
Let’s now evaluate the performance of the model.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
accuracy_score(y_test, predictionsLGB)
0.815
With highly imbalanced data, it is important to look at evaluation metrics other than accuracy, because a model can score well simply by predicting the class with more entries in the dataset. If the data is highly imbalanced, the model might also struggle in the learning process. In that case, consider balancing the data.
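One common alternative to resampling is to weight the classes so that the rarer class counts for more during training. The sketch below computes weights as n_samples / (n_classes * class_count), the heuristic that scikit-learn uses for `class_weight="balanced"`. The helper name `balanced_class_weights` and the 80/20 label split are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0] * 80 + [1] * 20  # made-up, imbalanced labels
print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
```

A dictionary like this can typically be supplied through a classifier's `class_weight` parameter, which `LGBMClassifier` also accepts.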
We can display the confusion matrix courtesy of Scikit-Learn.
# True labels go first: rows are actual classes, columns are predicted classes
confusion_matrix(y_test, predictionsLGB)
array([[7269,  406],
       [1416,  809]])
Finally, let’s display the classification report. In this kind of problem, we should focus on fine-tuning the model to improve the precision, recall, and f1-score of the individual classes. In particular, we should work on reducing the false negatives, that is, the cases where the model predicts that a customer will not default but they do.
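With scikit-learn, the report would come from `classification_report(y_test, predictionsLGB)`. Since its output is not reproduced here, the sketch below derives the same kind of metrics by hand for the "default" class, using the four counts from the confusion matrix above (809 defaulters caught, 1416 missed, 406 false alarms, 7269 correct non-defaults). The variable names are our own.

```python
# Counts taken from the confusion matrix above, with "default" as the positive class.
tn, fp, fn, tp = 7269, 406, 1416, 809

total = tn + fp + fn + tp
precision = tp / (tp + fp)    # of the predicted defaulters, how many really defaulted
recall = tp / (tp + fn)       # of the actual defaulters, how many the model caught
f1 = 2 * precision * recall / (precision + recall)
baseline = (tn + fp) / total  # accuracy of always predicting "no default"

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} baseline={baseline:.3f}")
```

Recall for the default class comes out at roughly 0.36, so the model misses almost two-thirds of the actual defaulters, while always predicting "no default" would already reach an accuracy of about 0.775. That gap is why the false negatives deserve the most attention.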
I trust that this piece has shed some light on how machine learning is being applied in the financial space. We’ve gone through a number of applications and walked through a quick example. I am confident that with this and further research, you’ll be in a position to apply some of these techniques and start reaping the fruits of machine learning in your organization.