11 Applications of Data Science in Retail

May 14, 2020

Data science and analytics have many applications in retail. They have enabled retailers to serve their customers better and to increase profit. In this piece, we’ll look at some of these application areas. At the end, we’ll work through a dataset to see one of them, sales forecasting, in practice.

Product Recommendation 

E-commerce has taken the world by storm, and online retailers now hold the purchasing history of their customers. Using this data, they increase sales by recommending new products to customers. That’s why you’ll see recommendations such as “customers who viewed this also viewed” or “customers who bought this also bought”. For example, by computing a similarity index between customers, a retailer can recommend certain items to one customer when a similar customer buys them.
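The similarity idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular retailer does it: the customers, items, and the choice of cosine similarity as the similarity index are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical purchase counts per customer; names and items are made up.
purchases = {
    "alice": {"tea": 2, "mug": 1, "kettle": 1},
    "bob": {"tea": 1, "mug": 1},
    "carol": {"laptop": 1, "mouse": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse item-count dictionaries."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(customer, purchases):
    """Suggest items bought by the most similar other customer but not by `customer`."""
    own = purchases[customer]
    others = [name for name in purchases if name != customer]
    nearest = max(others, key=lambda name: cosine_similarity(own, purchases[name]))
    return sorted(set(purchases[nearest]) - set(own))

print(recommend("bob", purchases))
```

Here bob’s basket overlaps with alice’s and not with carol’s, so the suggestion comes from alice’s purchases.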


Market Basket Analysis

In this analysis, retailers work to figure out the relationship between different items that are purchased by their customers. This is done using association rules. By determining which items are frequently bought together, the retailer can make informed decisions on merchandising. This informs the layout of the store by, for example, placing items that are frequently bought together close to each other.
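Association rules are usually judged by three numbers: support, confidence, and lift. The sketch below computes them by hand for one rule on a handful of made-up baskets; the data and function names are hypothetical, and a real analysis would typically use a dedicated algorithm such as Apriori on far more transactions.

```python
# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    joint = support(antecedent | consequent, baskets)
    confidence = joint / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return joint, confidence, lift

rule_support, rule_confidence, rule_lift = rule_metrics({"bread"}, {"butter"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently, which is the signal a retailer would use when deciding what to shelve side by side.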



Drone Delivery

Online retailers such as Amazon are using drones to make deliveries to their customers quicker. These drones run several machine learning models to keep the flight safe. They have autonomous flight systems that enable them to land at the customers’ locations.


Review Analysis

Negative sentiment spreads like wildfire, especially if it’s not addressed immediately. However, with the rise of social media, it has become extremely difficult to manually separate negative from positive sentiment in the sea of reviews left on these platforms. Retailers are using social media monitoring tools coupled with sentiment analysis to determine a review’s polarity. Negative reviews can then be discovered faster and dealt with immediately.
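To make “polarity” concrete, here is a deliberately tiny word-list scorer. It is only a toy under stated assumptions: the word lists are invented for the example, and production sentiment analysis tools generally rely on trained models rather than a fixed lexicon like this one.

```python
import re

# Hypothetical, hand-picked word lists; real tools use much richer models.
POSITIVE_WORDS = {"great", "love", "fast", "excellent"}
NEGATIVE_WORDS = {"bad", "late", "broken", "terrible"}

def polarity(review):
    """Label a review by counting positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"

for text in ["Great product, fast delivery!", "Arrived late and broken."]:
    print(text, "->", polarity(text))
```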


Stock Prediction

Many of the items in retail stores and pharmacies have near-term expiration dates. Overstocking them means losing money, because expired products can’t be sold. Understocking means customers visit the store and don’t find the items they need. An optimal stock level is therefore crucial. It can be estimated by combining state-of-the-art machine learning models with the retailer’s sales history.


Shopping Assistants 

Finding a shop or an item in a large shopping mall or a large retail store is not a walk in the park. Retail stores are using shopping assistants to help customers navigate their stores. They are also using chatbot applications to help customers get answers to frequently asked questions.


Augmented Reality

With augmented reality, online retailers can let customers visualize items before they purchase them. For example, using virtual fitting rooms, one can see how different clothes would fit. One can also visualize how a piece of furniture would look in their living room before making a purchase.


Warehouse Robots

Online retailers use warehouses to store their products. A warehouse can hold so many items that it becomes close to impossible for a person to move from one section to another looking for a particular one. Robots are being deployed in these warehouses because they can quickly determine an item’s location and then pick it up in preparation for shipping. This only works when every item and its location in the warehouse are properly recorded.



Theft Prevention

Computer vision can be used to detect the faces of known shoplifters when they enter a store. This can help physical stores reduce the losses that result from shoplifting. It also makes work easier for staff, who don’t have to constantly monitor the security cameras for well-known shoplifters.



Sales Projection

By estimating how long a customer will stay with a retailer, the retailer can determine the customer’s lifetime value, that is, how much profit they are likely to make from that customer over the whole relationship. These estimates can then be used for projecting sales, which helps the retailer in future planning.
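One common simplification of customer lifetime value multiplies average order value, purchase frequency, expected retention time, and profit margin. The sketch below uses that simplification; the function name, parameters, and numbers are illustrative assumptions, and real models usually also account for churn probability and discounting.

```python
def customer_lifetime_value(avg_order_value, orders_per_year, expected_years, profit_margin):
    """Simplified lifetime value: expected profit over the whole customer relationship."""
    yearly_revenue = avg_order_value * orders_per_year
    return yearly_revenue * expected_years * profit_margin

# Hypothetical customer: 40.0 per order, 6 orders a year, 3 years, 25% margin.
clv = customer_lifetime_value(40.0, 6, 3, 0.25)
print(clv)
```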

Store Location

Analyzing the population and income levels of an area can help in determining the best place to open a new store. It can also help with price optimization and with deciding which types of products to stock. This analysis is crucial because physical stores rely heavily on customer walk-ins.

Let’s now see how we can do sales forecasting for a retail store using Prophet. Prophet is a library built by Facebook for forecasting time series data. It works well with shifts in trend, outliers, and missing data.

The dataset we’ll use is available on the UCI Machine Learning Repository. It contains sales between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer.

If you don’t already have Prophet installed, you’ll start by installing it.

pip install fbprophet

Next, we import Prophet and Pandas.

from fbprophet import Prophet
import pandas as pd

Let’s now import the dataset and check its head.

df = pd.read_csv('Online Retail.csv')
df.head()

Prophet expects a data frame with two columns: y for the value to be forecast and ds for the date. We, therefore, have to transform the data frame to be in that format. We start by computing the y column, the sales value of each line item.

df['y'] = df['Quantity'] * df['UnitPrice']

After this, we compute the date column. We start by splitting the date column in order to remove the time. Let’s write a simple function to do that.

def getDate(date):
    # Keep only the date part, dropping the time after the space
    x = date.split(' ')
    return x[0]

The function splits the date column by the space and returns the first part of the split list. We then use it to create the new ds column.

df['ds'] = df['InvoiceDate'].apply(getDate)

We can now select the two columns that interest us and save them in a new data frame.

sales = df[['ds', 'y']]

However, since the dates are repeated, we’ll aggregate them in order to get the total sales per day.

sales = sales.groupby('ds')['y'].sum().reset_index()
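The date handling and daily aggregation can also be done with real datetime values instead of strings, which keeps the days in chronological order after grouping. The sketch below runs on a small made-up frame with the same column names; the month/day/year format string is an assumption about how the CSV was exported and should be checked against the actual file.

```python
import pandas as pd

# Hypothetical rows mimicking the dataset's columns.
toy = pd.DataFrame({
    "InvoiceDate": ["12/1/2010 8:26", "12/1/2010 9:02", "12/2/2010 10:15"],
    "Quantity": [6, 2, 3],
    "UnitPrice": [2.55, 3.39, 1.25],
})

# Parse the timestamp with an explicit (assumed) format, then drop the time of day.
toy["ds"] = pd.to_datetime(toy["InvoiceDate"], format="%m/%d/%Y %H:%M").dt.normalize()
toy["y"] = toy["Quantity"] * toy["UnitPrice"]

# Total sales per calendar day.
daily = toy.groupby("ds")["y"].sum().reset_index()
print(daily)
```

Passing an explicit format avoids pandas guessing between day-first and month-first dates, which matters for a dataset like this one where both readings are plausible for many rows.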

Now we are ready to fit Prophet to the dataset. The first step is to create an instance of Prophet. Since the dataset is from the United Kingdom, we also add the built-in UK holidays, so that holiday effects show up when we plot the model components later. Finally, we fit the model to our sales data frame.

model = Prophet()
model.add_country_holidays(country_name='UK')
model.fit(sales)

Next, let’s make a year’s worth of predictions in the future. In order to do this, we have to create a data frame with those future dates. Prophet provides a make_future_dataframe function to enable this.

future = model.make_future_dataframe(periods=365)
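To see what kind of frame this step produces, here is a rough pandas-only stand-in. It is not Prophet’s implementation, just a sketch assuming daily frequency and that the historical dates are kept alongside the new ones; the helper name future_dates is hypothetical.

```python
import pandas as pd

def future_dates(history, periods, freq="D"):
    """Return a frame with the historical dates followed by `periods` new ones."""
    last = history.max()
    # date_range includes `last` itself, so request one extra and drop it.
    extra = pd.date_range(start=last, periods=periods + 1, freq=freq)[1:]
    return pd.DataFrame({"ds": list(history) + list(extra)})

history = pd.Series(pd.to_datetime(["2011-12-08", "2011-12-09"]))
print(future_dates(history, periods=3))
```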

Once the new dates are ready, we can make our forecasts. As you can see below, Prophet’s API is similar to Scikit-Learn’s: a fitted model exposes a predict method.

forecast = model.predict(future)

Let’s check out these predictions. The forecast data frame contains many columns, but the ones of most interest to us are ds, yhat, yhat_lower, and yhat_upper. yhat represents the predicted sales, while yhat_lower and yhat_upper are the lower and upper bounds of its uncertainty interval.

forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Prophet also allows us to visualize our model. The black dots represent our dataset while the blue line represents the predicted sales.

plot1 = model.plot(forecast)

Prophet also enables us to quickly plot the time series seasonality. This includes the trend, weekly and yearly seasonality.

plot2 = model.plot_components(forecast)

We can now check the performance of the model by comparing predicted results with the historical data. This is done via cross-validation. Prophet’s cross_validation function requires us to pass in our model and the forecast horizon.

from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(model, horizon='50 days')

We can now check the performance metrics.

from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
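The metrics are built from the gap between actual values (y) and predictions (yhat) in the cross-validation output. As a minimal sketch of two of them, RMSE and MAPE, the snippet below computes both by hand on made-up numbers. It leaves out the rolling-window averaging over the horizon that Prophet applies, so it illustrates the formulas rather than reproducing the library’s output.

```python
from math import sqrt

# Hypothetical actual and predicted daily sales.
actual = [100.0, 120.0, 80.0, 150.0]
predicted = [110.0, 115.0, 70.0, 160.0]

def rmse(y, yhat):
    """Root mean squared error: penalizes large misses more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def mape(y, yhat):
    """Mean absolute percentage error: average miss relative to the actual value."""
    return sum(abs(a - p) / abs(a) for a, p in zip(y, yhat)) / len(y)

print(f"RMSE={rmse(actual, predicted):.2f} MAPE={mape(actual, predicted):.3f}")
```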

Prophet also allows us to visualize these metrics as shown below.

from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='rmse')

Check out this [VIDEO] to see how I completed the analysis.

Final Thoughts 

In this article, we’ve covered several application areas of data science in the retail sector. We’ve seen how it can be applied in sales projection, determining a store’s location, preventing theft, and executing drone delivery, just to mention a few. We have also worked through a sales forecasting case study that I hope has shed some light on how these applications look in practice.

Guest post: Derrick Mwiti
