Optimizing processes in the manufacturing industry can lead to significant savings. For a capital-intensive industry like this one, applying data science can streamline processes as well as increase profits. Let’s look at some practical applications in this article. Towards the end, we’ll also work with a real dataset to see one application in practice.

We will look at applications in:

  • Predicting Future Machine Failures
  • Detecting Faulty Products
  • Making Work Easier with Robots
  • Demand Forecasting

Let’s dive in. 

Predicting Future Machine Failures 

The failure of machines in the manufacturing sector can lead to huge losses, because diagnosing and fixing the problem takes time. However, sensors can be installed to collect data as the machines run. This not only makes diagnosis faster and easier but also makes it possible to predict when a machine is likely to fail.
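As a minimal sketch of what this could look like (the sensor file and column names here are hypothetical, purely for illustration), a classifier can be trained on historical sensor readings labeled with whether the machine failed shortly afterwards:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical sensor log with a label indicating imminent failure
sensors = pd.read_csv('sensor_log.csv')
X = sensors[['temperature', 'vibration', 'pressure']]
y = sensors['failed_within_24h']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = RandomForestClassifier().fit(X_train, y_train)

# Estimated probability that each machine fails soon
failure_risk = clf.predict_proba(X_test)[:, 1]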


Detecting Faulty Products

Quality is crucial in the manufacturing process, so reducing the number of faulty products is of utmost importance. Faulty products lead to losses as well as time spent making replacements. Collecting data on what makes certain products faulty can help a manufacturing firm reduce the number of defects and catch suspect units before they ship.
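One way to approach this (a sketch with made-up file and column names) is to treat defect detection as anomaly detection on per-unit measurements from the line:

import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical per-unit measurements from the production line
units = pd.read_csv('line_measurements.csv')
features = units[['weight', 'width', 'seal_temperature']]

# Flag roughly the 1% most unusual units as potentially faulty
detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(features)  # -1 = anomaly, 1 = normal
suspects = units[labels == -1]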


Making Work Easier with Robots

Having robots work alongside human beings can make the manufacturing process faster and easier. Robots can also handle dangerous processes that put human lives at risk. Equipped with computer vision capabilities, they can carry out complex tasks in a manufacturing firm.


Demand Forecasting 

It is crucial that a manufacturing firm produces only what the market needs, a practice referred to as Just-in-Time (JIT) production. Producing excess products leads to increased storage costs and capital tied up in non-moving inventory, while producing below market demand leads to lost revenue. Therefore, accurate forecasting of market demand is important.
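Even a very simple baseline illustrates the idea. The sketch below (with an assumed monthly_demand.csv) forecasts next month’s demand as the average of the last three months; in practice you’d reach for a proper time-series model:

import pandas as pd

# Hypothetical monthly demand history
demand = pd.read_csv('monthly_demand.csv', parse_dates=['month'], index_col='month')

# Naive baseline: next month ~ mean of the last three months
forecast = demand['units_sold'].rolling(window=3).mean().iloc[-1]
print(f'Forecast for next month: {forecast:.0f} units')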


Let’s now walk through a problem presented by Daimler on Kaggle. The goal is to use the provided data to reduce the time cars spend on the test bench. The dataset contains representations of different permutations of Mercedes-Benz car features. We’ll use it to predict the time it takes for a car to pass testing. Faster testing means lower carbon dioxide emissions for Daimler, all without reducing quality and standards.

The variables in the dataset have been anonymized. A feature could be something like whether a car has four-wheel drive, air suspension, etc. y is the time in seconds that each car took to pass testing, and y is what we’ll be predicting.

We kick it off with a couple of imports:

  • Pandas for data manipulation 
  • Seaborn and Matplotlib for visualization

We then use seaborn to set the default style. 

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()

With Pandas imported, let’s load in the training file.

df = pd.read_csv('train.csv')

Here’s what the head of the dataset looks like.

df.head()

Let’s see how many columns the dataset has.

df.info()

The dataset has 378 columns and 4209 rows. That’s a lot of columns; we’ll come back to this later.

Let’s visualize the distribution of the target variable.

plt.figure(figsize=(12,6))
sns.distplot(df['y'])

Checking for null values, we notice that there are none.

df.isnull().any().sum()

We are going to perform one-hot encoding for the categorical features before we fit the dataset to a machine learning model. We start by creating a variable with those features.

cat_features = ['X0', 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X8']

We then create a final data frame with the one-hot encoded features. drop_first=True prevents the dummy variable trap. For example, if we have three categories a, b, and c, the final data frame will only have two of the three dummy columns: if an entry doesn’t fall in either of those two, it must fall in the third. Dropping one of the categories removes this redundancy, which would otherwise introduce perfect multicollinearity among the dummy columns.
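To make this concrete, here’s a toy example with made-up data:

import pandas as pd

toy = pd.DataFrame({'color': ['a', 'b', 'c', 'a']})
print(pd.get_dummies(toy, drop_first=True))
# Only color_b and color_c remain; a row with zeros in both is category a

Applying the same idea to our dataset: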

final_data = pd.get_dummies(df, columns = cat_features, drop_first=True)

Let’s look at a snapshot of that data.

final_data.head()

Now we can split this dataset into training and testing sets using Scikit-Learn’s train_test_split function. We start by creating a variable X containing the features and a variable y containing the target, then split the data with 30% for testing and 70% for training.

from sklearn.model_selection import train_test_split
X = final_data.drop(['y', 'ID'], axis=1)
y = final_data['y']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=101)

Let’s start by trying a Random Forest Regressor. After importing the regressor, we instantiate it and fit it to the training set. 

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X_train,y_train)

Next, we use the fitted model to make predictions on the testing set. 

predictions = model.predict(X_test)

Let’s evaluate its performance. In order to do that, we import NumPy and Sklearn metrics. 

from sklearn import metrics
import numpy as np

We can now compare the predictions against the true values. Since this is a regression problem, we use the mean absolute error, mean squared error and the root mean squared error metrics. 

print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
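As an aside, RMSE is just the square root of MSE, and depending on your scikit-learn version you may be able to compute it directly:

# Available in recent scikit-learn versions
print('RMSE:', metrics.mean_squared_error(y_test, predictions, squared=False))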

Let’s visualize the differences between the true values and the predicted values. 

plt.figure(figsize=(12,6))
sns.distplot((y_test-predictions),bins=50);

Random Forest provides us with the feature importances of the model. Let’s create a data frame with these values and visualize them.

importance = model.feature_importances_
importances_rfc_df = pd.DataFrame(importance, index=X.columns, columns=['Importance'])
importances_rfc_df = importances_rfc_df.sort_values(by='Importance', ascending=False)
importances_rfc_df = importances_rfc_df[importances_rfc_df['Importance'] > 0]
importances_rfc_df = importances_rfc_df.head(10)

We’ll use Seaborn to plot a barplot of these values.

plt.figure(figsize=(8,8))
plt.xticks(rotation=60, fontsize = 20)
sns.barplot(y=importances_rfc_df.index, x=importances_rfc_df['Importance'])

Unfortunately, the variables have been anonymized so we can’t know what they represent. 

Let’s see whether we can get better performance with a different algorithm. We’ll try the Lasso algorithm, since it’s known to do well on data with many features, like ours.

We start by importing the algorithm and creating an instance of it. 

from sklearn import linear_model
clf = linear_model.Lasso(alpha=0.03)
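The alpha=0.03 here is a fixed regularization strength. As an alternative sketch, scikit-learn’s LassoCV can pick alpha by cross-validation instead; we’ll stick with the fixed value for this walkthrough:

from sklearn.linear_model import LassoCV

# Let 5-fold cross-validation choose the regularization strength
cv_model = LassoCV(cv=5).fit(X_train, y_train)
print('Chosen alpha:', cv_model.alpha_)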

With that in place, we can fit the model and make some predictions. 

clf.fit(X_train,y_train)
predictions = clf.predict(X_test)

Let’s check the same regression metrics. We notice a slight improvement. 

print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

The Lasso model also has a score function that gives us the coefficient of determination R² of the prediction. The best possible score is 1. The score can also be negative, because a model can be arbitrarily worse than one that simply predicts the mean of the target.

clf.score(X_test,y_test)
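This is equivalent to computing R² explicitly with the metrics module:

print('R2:', metrics.r2_score(y_test, clf.predict(X_test)))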

Finally, let’s see whether we’ll get better results by using a deep learning model. We start by making a couple of imports:

  • Sequential for initializing the deep learning layers
  • tensorflow_docs for visualizing the model’s training and performance
  • Dense for adding the network layers

from keras.models import Sequential
from keras.layers import Dense
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling

Let’s create the deep learning model using Keras. We add the first layer with 128 nodes and a relu activation function; the activation function introduces the non-linearity that lets the network learn complex patterns. input_shape is the number of features in our dataset. We then add the second layer that produces the output predictions, hence it has one unit (relu here also keeps predictions non-negative, which fits a time target). Finally, we compile the model, which configures the optimizer that applies gradient descent during training, letting the model iteratively reduce its error. Since it’s a regression problem, we use the mean_squared_error loss function.

model = Sequential()
model.add(Dense(units=128, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dense(units=1, activation='relu'))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae', 'mse'])

To ensure that we get the best results without wasting compute, we’ll stop training once the model stops improving. Here, patience=10 means training stops if the validation loss hasn’t improved for ten consecutive epochs (an epoch is one full pass over the training data). We’ll implement this check via a Keras callback.

import keras
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
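One optional refinement, not used in the rest of this walkthrough: EarlyStopping can also restore the weights from the best epoch seen, rather than keeping the weights from the final (worse) epoch:

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)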

Let’s now fit the deep learning model to the training data. We save it in a history variable so that we can use it to visualize the performance of the model. We are setting a validation set of 20%. This set will be used for evaluating the loss and model metrics at the end of each iteration.

history = model.fit(X_train,y_train,epochs=100,validation_split = 0.2,callbacks=[early_stop,tfdocs.modeling.EpochDots()])

Using Keras, we can print a summary of our model.

model.summary()

We can now visualize the training and validation mean absolute error.

plotter = tfdocs.plots.HistoryPlotter(smoothing_std=2)
plotter.plot({'Basic': history}, metric='mae')

Here’s a visual of the mean squared error. 

plotter.plot({'Basic': history}, metric='mse')

We can visualize the metrics over the epochs using the saved history. 

hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

Let’s now use the model to make some predictions and check the evaluation metrics.

predictions = model.predict(X_test)
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

Just like last time, we can also visualize the residuals.

plt.figure(figsize=(12,6))
sns.distplot((y_test - predictions.reshape(-1)), bins=50);  # reshape(-1) flattens the (n, 1) predictions to match y_test

Final Thoughts

In this piece, we’ve seen how data science can be applied to the manufacturing sector: predictive maintenance, quality assurance, robotics, and demand forecasting, just to mention a few. We’ve also walked through a practical application. I hope this has sparked enough interest for you to dive in deeper.

[VIDEO]

Guest post: Derrick Mwiti

Stay up to date with Saturn Cloud on LinkedIn and Twitter.

You may also be interested in: Data Science in the Energy Sector.