Applying data science in the energy sector can improve operational efficiency, saving the industry millions of dollars annually. It can also help the industry save energy, which in turn helps it serve the world’s increasing energy demands. In this article, we’ll examine some of these application areas. At the end, we’ll also walk through an analysis of a real-life dataset.

Some of the application areas we’ll look at are:

  • Power Consumption Prediction
  • Forecasting Energy Production
  • Predicting Power Outages
  • Equipment Maintenance
  • Prevent Power Theft
  • Digital Oil Fields

Power Consumption Prediction

The installation of smart meters in most homes has enabled energy companies to collect a vast amount of data. This means that the companies have power consumption information for every household and area. By coupling this information with population growth data, the companies can predict future energy demands. This can enable them to plan for the future adequately. The consumer benefit would be better pricing since future power demands can be determined. 
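As a minimal sketch of this idea, the snippet below scales an average per-household consumption figure by a projected number of households, used here as a stand-in for population growth. The function name `project_demand` and all the numbers are invented for illustration; a real forecast would be built from metered data and a proper time-series model.

```python
def project_demand(kwh_per_household, households, growth_rate, years):
    """Project yearly demand (kWh), assuming constant per-household use
    and a fixed annual growth rate in the number of households."""
    projections = []
    for year in range(1, years + 1):
        projected_households = households * (1 + growth_rate) ** year
        projections.append(kwh_per_household * projected_households)
    return projections


# Hypothetical inputs: 4,000 kWh per household per year, 10,000 households,
# 2% annual growth, projected three years ahead.
demand = project_demand(4000, 10_000, 0.02, 3)
print([round(d) for d in demand])
```

Even a rough projection like this gives planners a baseline to compare against forecast production.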

 

Forecasting Energy Production 

The ability to forecast power production is crucial, as it helps power companies determine whether the power produced will meet future demand. If forecasted production falls below future demand, energy companies can start implementing measures to avert a possible power crisis. The forecast can be derived from inputs such as projected wind speeds for wind power and projected water levels in dams for hydropower.

 

Predicting Power Outages 

Power outages are a major problem in most parts of the world, especially during the rainy season and heavy storms. Instead of always fixing broken power lines after the fact, energy companies can predict a possible outage before it happens and take preventive measures. This can be achieved by combining historical outage data with weather forecasts.

 

Equipment Maintenance 

Failure of equipment at power plants can lead to major power problems because of the time lost in fixing the equipment. Some equipment can take a couple of days to diagnose and repair. Installing sensors that collect data from this equipment can help in detecting an imminent failure. Dealing with the issue before it happens significantly reduces the time spent on repairs at power plants.
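One simple way to spot a possible problem in sensor data is to compare each new reading with the recent history. The sketch below flags readings that sit far from the mean of the previous few readings. The function name, the window size, the threshold, and the vibration values are all assumptions made for illustration, not a production-grade failure model.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose reading deviates from the mean of the previous
    `window` readings by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings with a sudden jump at index 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 2.5, 1.0, 1.1]
print(flag_anomalies(vibration))
```

A flagged index would be a prompt for an engineer to inspect the equipment, not proof of a fault.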

 

Prevent Power Theft

Theft of power is a major problem that can cost power companies millions in lost revenue. By using smart grids, power companies can monitor sudden jumps in consumption. For example, if power consumption in a certain area doubles all of a sudden, a company can investigate to determine the reason for the jump.
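The doubling check described above can be sketched in a few lines. The function name `flag_sudden_jumps`, the area names, and the readings are hypothetical; the rule simply compares the latest reading for each area with the average of its earlier readings.

```python
def flag_sudden_jumps(usage_by_area, factor=2.0):
    """Return areas whose latest reading is at least `factor` times the
    average of their earlier readings."""
    flagged = []
    for area, readings in usage_by_area.items():
        *history, latest = readings
        if not history:
            continue  # nothing to compare against
        baseline = sum(history) / len(history)
        if baseline > 0 and latest >= factor * baseline:
            flagged.append(area)
    return flagged


# Hypothetical weekly consumption (kWh) per area.
usage = {
    "area_a": [120, 118, 125, 122],
    "area_b": [200, 210, 205, 430],
    "area_c": [90, 95, 92, 100],
}
print(flag_sudden_jumps(usage))
```

A jump can have innocent causes, such as a new factory or a cold snap, so a flag like this is only a starting point for an investigation.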

 

Digital Oil Fields

Oil discovery and production is an expensive affair. Applying data science in this sector can lead to earlier oil discovery as well as a reduced cost of production. This ultimately puts more money in the coffers of the oil companies. In order to collect production and exploration data, companies are installing sensors that record data points such as temperature, vibration, and volume. These sensors make it possible to monitor every step of the oil exploration and production process.

 

We’ll now work through an analysis of a dataset that is available on Kaggle. The dataset contains meter readings for several buildings. The meter types are chilled water, electricity, hot water, and steam. The data was collected over a three-year timeframe. The competition host, ASHRAE, aims to advance the arts and sciences of heating, ventilation, air conditioning, refrigeration, and their allied fields.

As always, we start by importing our tools of the trade: Pandas, Matplotlib, and Seaborn.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Next, we’ll import the training data and check its head.

train = pd.read_csv('train.csv')
train.head()

Data Science in the Energy Sector

Using the building ID, we can merge the above data frame with the dataset that contains the buildings’ metadata. First, however, we have to load the file with that metadata.

# The weather file is loaded here but is not used in the analysis below.
weather = pd.read_csv('weather_test.csv')
buildings = pd.read_csv('building_metadata.csv')
buildings.head()

We are now ready to merge this dataset with the training data frame. We do that below, and then check its info. 

df = buildings.merge(train, on='building_id', how='left')
df.info()
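To see what the left merge does, here is a small sketch on two tiny stand-in data frames. The values are invented; only the column names `building_id`, `primary_use`, and `meter_reading` come from the walkthrough. The `indicator=True` option adds a `_merge` column that shows which rows found a match.

```python
import pandas as pd

# Tiny stand-ins for the real files; values are invented for illustration.
buildings = pd.DataFrame({
    "building_id": [0, 1, 2],
    "primary_use": ["Education", "Office", "Lodging/residential"],
})
train = pd.DataFrame({
    "building_id": [0, 0, 1],
    "meter_reading": [10.0, 12.5, 7.0],
})

# how='left' keeps every building, even one with no readings (building 2).
merged = buildings.merge(train, on="building_id", how="left", indicator=True)
print(merged)

# Buildings present in the metadata but absent from the training data.
unmatched = merged.loc[merged["_merge"] == "left_only", "building_id"].tolist()
print(unmatched)
```

Checking for unmatched rows like this is a quick sanity test before moving on to the aggregations.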

Here’s a snapshot of the merged file:

Indicator of the Primary Category of activities for the building

Visualizing the number of entries in the dataset, we notice that most of the rows are from the education sector. 

plt.figure(figsize=(8,6))
sns.countplot(y='primary_use', data=df)

However, looking at the power consumption by the primary use, we notice that services are leading. In order to visualize this, we start by grouping the dataset by the primary use and then find the mean usage per primary use. 

primary_use = df.groupby('primary_use')['meter_reading'].mean().sort_values(ascending=False).reset_index()
primary_use.head()

We can then visualize this using a barplot. 

plt.figure(figsize=(8,6))
sns.barplot(y='primary_use',x='meter_reading',data=primary_use)

Visualizing using the Timestamp

The data frame has a timestamp column. We can use this to create new columns such as the day of the week and month. We can then use these new columns to draw some visuals. 

First, we convert the timestamp column to a datetime type using Pandas’ to_datetime, then use it to create the month, day, hour, weekday, and day name columns.

df["timestamp"] = pd.to_datetime(df["timestamp"])
df["month"] = df["timestamp"].dt.month
df["day"] = df["timestamp"].dt.day
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.weekday
df["day_name"] = df["timestamp"].dt.day_name()
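To make the new columns concrete, here is the same conversion on two made-up timestamps. The data frame name `sample` and the dates are assumptions for illustration; note that `dt.weekday` counts from Monday as 0.

```python
import pandas as pd

# Two invented timestamps: a Friday at midnight and a Monday afternoon.
sample = pd.DataFrame(
    {"timestamp": ["2016-01-01 00:00:00", "2016-01-04 13:00:00"]}
)
sample["timestamp"] = pd.to_datetime(sample["timestamp"])
sample["hour"] = sample["timestamp"].dt.hour
sample["weekday"] = sample["timestamp"].dt.weekday  # Monday=0 ... Sunday=6
sample["day_name"] = sample["timestamp"].dt.day_name()
print(sample)
```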

Meter Reading by Day 

Let’s now visualize the meter reading by the day. We first create the data frame to visualize by grouping by the day and summing the meter reading. 

meter = df.groupby(['day_name'])['meter_reading'].sum().sort_values(ascending=False).reset_index()
meter.head()

We can now visualize the power consumption by day of the week.

plt.figure(figsize=(8,6))
sns.barplot(x='day_name',y='meter_reading', data=meter)
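Because the totals are sorted by value, the bars appear in order of consumption and not in calendar order. If calendar order is preferred, one option is an ordered categorical, sketched here on invented totals (the `totals` frame and its numbers are assumptions for illustration).

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Invented totals, deliberately out of calendar order.
totals = pd.DataFrame({
    "day_name": ["Friday", "Monday", "Sunday", "Wednesday",
                 "Tuesday", "Saturday", "Thursday"],
    "meter_reading": [70.0, 50.0, 20.0, 65.0, 60.0, 25.0, 68.0],
})

# An ordered categorical makes sort_values follow the calendar.
totals["day_name"] = pd.Categorical(totals["day_name"],
                                    categories=day_order, ordered=True)
totals = totals.sort_values("day_name").reset_index(drop=True)
print(totals)
```

Alternatively, `sns.barplot` accepts an `order` argument that takes the same list of day names.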

Let’s now visualize the meter reading against the day of the month, the hour, and the weekday on a single chart.

plt.figure(figsize=(12,6))
sns.lineplot(y='meter_reading',x='day', data=df,legend='full',label='Day')
sns.lineplot(y='meter_reading',x='hour', data=df,legend='full',label='Hour')
sns.lineplot(y='meter_reading',x='weekday', data=df,legend='full',label='Week Day')
plt.xlabel('Time')
plt.legend()
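When given raw rows, a Seaborn line plot averages the readings at each x value and, by default, also estimates a confidence band, which can be slow on a large data frame. One way to lighten the load is to compute the means first and plot the small result. A minimal sketch on invented data (the `sample` and `hourly` names are assumptions):

```python
import pandas as pd

# Invented hourly readings standing in for the merged data frame.
sample = pd.DataFrame({
    "hour": [0, 0, 1, 1, 2, 2],
    "meter_reading": [10.0, 14.0, 8.0, 12.0, 20.0, 30.0],
})

# One row per hour: the mean reading, which is what the line traces.
hourly = sample.groupby("hour")["meter_reading"].mean().reset_index()
print(hourly)
```

A frame like `hourly` can then be passed to `sns.lineplot` in place of the full data frame, at the cost of losing the confidence band.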

Site Power Consumption

We can proceed to check the power consumption by site. We start by creating the data frame for this, grouping our dataset by the site ID and summing the meter readings.

site = df.groupby('site_id')['meter_reading'].sum().sort_values(ascending=False).reset_index()
site.head()

Visualizing that, we see that site 13 has the highest power consumption. 

plt.figure(figsize=(8,6))
sns.barplot(x='site_id',y='meter_reading', data=site)

Building Power Consumption

Let’s now look at the building with the highest power consumption. 

building = df.groupby('building_id')['meter_reading'].sum().sort_values(ascending=False).reset_index()
building.head()

A visual of the same looks like this.

plt.figure(figsize=(8,6))
sns.barplot(x='building_id',y='meter_reading', data=building.head(10))

Checking the primary use for this building, we notice that it’s education. 

b1099 = df[df['building_id'] == 1099]
b1099['primary_use'].unique()

Meter Type Power Consumption

Which meter type consumes the most power? We’ll visualize that by grouping the data by the meter type and summing the meter readings. 

# 0: electricity, 1: chilledwater, 2: steam, 3: hotwater
meter = df.groupby('meter')['meter_reading'].sum().sort_values(ascending=False).reset_index()

We then turn off scientific notation in Pandas’ display settings so that we can see the full figures.

pd.options.display.float_format = '{:.2f}'.format
meter

plt.figure(figsize=(8,6))
sns.barplot(x='meter',y='meter_reading', data=meter)

As we’d expect, the steam meters have the highest total reading.
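The numeric meter codes are hard to read on a chart. A small sketch of how they could be swapped for names, using the code-to-name mapping from the comment in the code above; the `totals` frame and its numbers are invented for illustration.

```python
import pandas as pd

# Mapping taken from the comment in the walkthrough code.
meter_names = {0: "electricity", 1: "chilledwater", 2: "steam", 3: "hotwater"}

# Invented totals per meter code.
totals = pd.DataFrame({
    "meter": [2, 0, 1, 3],
    "meter_reading": [900.0, 400.0, 300.0, 100.0],
})
totals["meter_name"] = totals["meter"].map(meter_names)
print(totals)
```

Plotting `meter_name` on the x axis in place of `meter` would then label each bar with its meter type.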

Buildings and the Year they Were Built

Let’s now visualize the meter readings of the buildings against the year they were built. To make this possible, we group the dataset by the year built and sum the meter readings.

year = df.groupby('year_built')['meter_reading'].sum().sort_values(ascending=False).reset_index()
year.head()

Here’s a snapshot of the generated data frame. 


Here’s what it looks like when visualized on a line plot.

plt.figure(figsize=(12,6))
sns.lineplot(y='meter_reading',x='year_built', data=year)

Number of Buildings Per Year

We can take a look at the buildings and the years they were built. Notice the spike in the number of buildings at around 1976. 

Let’s look at the code snippet that computes the counts behind this visual.

# nunique() counts each building once; count() would count every meter-reading row
number = df.groupby('year_built')['building_id'].nunique().sort_values(ascending=False).reset_index()
number.columns = ['Year Built', 'Number of Buildings']
number.head()
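Because the merged data frame has one row per meter reading, `count()` on `building_id` tallies readings, whereas `nunique()` tallies distinct buildings. A minimal sketch of the difference on invented data (the `sample` frame is an assumption for illustration):

```python
import pandas as pd

# Invented rows: building 0 has three readings, building 1 has one.
sample = pd.DataFrame({
    "building_id": [0, 0, 0, 1],
    "year_built": [1976, 1976, 1976, 1976],
})

rows_per_year = sample.groupby("year_built")["building_id"].count()
buildings_per_year = sample.groupby("year_built")["building_id"].nunique()
print(rows_per_year.loc[1976], buildings_per_year.loc[1976])
```

Which of the two is appropriate depends on whether the question is about the volume of data or the number of buildings.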

Mean Power Consumption Per Year

Looking at the mean meter reading by construction year, we notice high power consumption for buildings built in the early 90s.

This is obtained by computing the mean of the meter readings for each construction year.

mean = df.groupby('year_built')['meter_reading'].mean().sort_values(ascending=False).reset_index()
mean.columns = ['Year', 'Mean']
mean.head()

Meter Distribution

Let’s look at the number of meters distributed over the years. 

The data behind this visual can be obtained as follows.

meter_dist = df.groupby('year_built')['meter'].count().sort_values(ascending=False).reset_index()
meter_dist.columns = ['Year', 'Meters']
meter_dist.head()

Final Thoughts 

That only scratches the surface of what data science can do in the energy sector. We’ve seen how it can help power companies cut costs, increase profit, and operate more efficiently. We’ve also walked through an example on a real-world dataset. I am certain that, using this information, you’ll find a couple of items you can implement in your organization for quick data science wins.


Guest post: Derrick Mwiti
