# Neural Network from Scratch

nn_from_scratch/NN_with_numpy.ipynb

# Introduction

Notebook by Lukas Frei

I am going to use the neural network project structure proposed by Andrew Ng in his deeplearning.ai Deep Learning Specialization and code one of the course's assignments myself in order to gain an even deeper understanding. I have changed certain parts to make sure that this code does not provide solutions to the assignment but purely serves a self-study purpose.

If you'd like to learn more about deep learning, I'd highly recommend Andrew Ng's courses on Coursera: https://www.deeplearning.ai/

1. Define the neural network structure (# of input units, # of hidden units, etc.).
2. Initialize the model's parameters
3. Loop:
• Implement forward propagation
• Compute the loss
• Implement backward propagation to get the gradients
• Update the parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function, here called neural_net(). Once you've built neural_net() and learnt the right parameters, you can make predictions on new data.
#importing the only package we are going to use
import numpy as np


### 1. Defining Network Structure

def layer_structure(x, y, hidden_size):
    """
    Input:
    x: predictors
    y: response variable
    hidden_size: number of neurons in the hidden layer

    Output:
    input_size: number of predictors in input dataset
    hidden_size: number of neurons in hidden layer
    output_size: number of possible prediction outputs
    """
    input_size = x.shape[0]
    output_size = y.shape[0]

    return (input_size, hidden_size, output_size)
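
A quick sanity check with made-up toy data (the arrays below are assumptions purely for illustration). Note the shape convention used throughout this notebook: rows are features, columns are observations.

#hypothetical toy data: 2 predictors, 400 observations, 1 binary output
x_toy = np.random.randn(2, 400)
y_toy = np.random.randint(0, 2, (1, 400))

layer_structure(x_toy, y_toy, hidden_size=4)   #expected: (2, 4, 1)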


### 2. Initializing Parameters

def initialize_parameters(input_size, hidden_size, output_size):
    """
    Input:
    Sizes of input, hidden, and output layer
    Output:
    parameters: dictionary holding the randomly initialized weights and zero biases
    """
    W_1 = np.random.randn(hidden_size, input_size) * 0.001
    b_1 = np.zeros((hidden_size, 1))
    W_2 = np.random.randn(output_size, hidden_size) * 0.001
    b_2 = np.zeros((output_size, 1))

    parameters = {'W_1': W_1,
                  'b_1': b_1,
                  'W_2': W_2,
                  'b_2': b_2}

    return parameters
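
A minimal shape check, using the same hypothetical toy sizes as above:

params = initialize_parameters(input_size=2, hidden_size=4, output_size=1)

print(params['W_1'].shape)   #(4, 2)
print(params['b_1'].shape)   #(4, 1)
print(params['W_2'].shape)   #(1, 4)
print(params['b_2'].shape)   #(1, 1)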


### 3. Loop

Forward Propagation

Defining the tanh and sigmoid activation functions:

def tanh(z):
    """
    Inputs z and outputs the tanh of z
    """

    t = (np.exp(z)-np.exp(-z))/(np.exp(z)+np.exp(-z))

    return t

def sigmoid(z):
    """
    Inputs z and outputs the sigmoid of z
    """

    s = 1/(1+np.exp(-z))

    return s
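
A quick sanity check of both activations: tanh(0) should be 0, sigmoid(0) should be 0.5, and the tanh defined above should match NumPy's built-in np.tanh.

print(tanh(0.0), sigmoid(0.0))   #0.0 0.5

z = np.array([-2.0, 0.0, 2.0])
print(np.allclose(tanh(z), np.tanh(z)))   #True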


Using the input and the initialized parameters to compute the output:

def forward_propagation(x, parameters):
    """
    Taking the input x along with the parameters and computing the output
    """

    #retrieving the initialized parameters from the 'parameters' dictionary
    W_1, b_1, W_2, b_2 = parameters['W_1'], parameters['b_1'], parameters['W_2'], parameters['b_2']

    #computing the linear and activation part of both the hidden and the output layer
    Z_1 = np.dot(W_1, x) + b_1
    A_1 = tanh(Z_1)
    Z_2 = np.dot(W_2, A_1) + b_2
    A_2 = sigmoid(Z_2)

    #storing the intermediate results in a new dictionary called 'cache'
    cache = {'Z1': Z_1,
             'A1': A_1,
             'Z2': Z_2,
             'A2': A_2}

    return A_2, cache
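
In equation form, the forward pass above computes

$$Z_1 = W_1 x + b_1, \qquad A_1 = \tanh(Z_1)$$

$$Z_2 = W_2 A_1 + b_2, \qquad A_2 = \sigma(Z_2) = \hat{y}$$

where $\sigma$ denotes the sigmoid function and each column of $x$ is one observation.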


Next step: Computing the loss of our output
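
For $m$ observations, the cross-entropy loss is

$$J = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log A_2^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - A_2^{(i)}\big) \Big]$$

The leading minus sign makes the loss non-negative, since the logarithms of probabilities are never positive.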

def compute_loss(A2, y, parameters):
    """
    Computing the cross-entropy loss
    """

    n_observations = y.shape[1]

    loss = -1/n_observations * np.sum(y*np.log(A2) + (1-y)*np.log(1-A2))

    loss = np.squeeze(loss)

    return loss


Taking the derivatives in backprop:

def backpropagation(parameters, cache, x, y):
    """
    Taking the derivatives in backprop
    """

    m = x.shape[1] #number of observations

    #retrieving parameters as well as calculated output from forward propagation
    W_2 = parameters["W_2"]

    A1 = cache["A1"]
    A2 = cache["A2"]

    #calculating the derivatives
    dZ2 = A2 - y
    dW2 = 1/m * np.dot(dZ2, A1.T)
    db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W_2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = 1/m * np.dot(dZ1, x.T)
    db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)

    #storing the gradients in a dictionary
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads



Updating parameters with our previously computed derivatives:

def update_parameters(parameters, grads, learning_rate = 1):
    """
    Updating the parameters after backprop
    """

    #retrieving the current parameters and the gradients
    W_1 = parameters["W_1"]
    b_1 = parameters["b_1"]
    W_2 = parameters["W_2"]
    b_2 = parameters["b_2"]

    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    #taking one gradient descent step
    W_1 = W_1 - learning_rate*dW1
    b_1 = b_1 - learning_rate*db1
    W_2 = W_2 - learning_rate*dW2
    b_2 = b_2 - learning_rate*db2

    parameters = {"W_1": W_1,
                  "b_1": b_1,
                  "W_2": W_2,
                  "b_2": b_2}

    return parameters
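
The update performed above is plain gradient descent with learning rate $\alpha$ (set to 1 by default here):

$$\theta \leftarrow \theta - \alpha \, d\theta \qquad \text{for } \theta \in \{W_1, b_1, W_2, b_2\}$$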


Putting everything together into a single function representing the neural net:

def neural_net(x, y, hidden_size, iterations = 1000):
    """
    Putting all functions together to form a neural network
    """
    input_size = layer_structure(x, y, hidden_size)[0]
    output_size = layer_structure(x, y, hidden_size)[2]

    parameters = initialize_parameters(input_size, hidden_size, output_size)

    for i in range(0, iterations): #gradient descent

        A2, cache = forward_propagation(x, parameters) #forward prop

        loss = compute_loss(A2, y, parameters) #computing the loss of our output

        grads = backpropagation(parameters, cache, x, y) #taking the derivatives

        parameters = update_parameters(parameters, grads) #updating our parameters

    return parameters
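
Finally, a minimal end-to-end run on made-up toy data; the dataset, the hidden layer size, and the 0.5 prediction threshold are assumptions chosen purely for illustration.

np.random.seed(1)

#hypothetical toy binary classification problem: 2 predictors, 400 observations
x_toy = np.random.randn(2, 400)
y_toy = (x_toy[0:1, :] * x_toy[1:2, :] > 0).astype(int)   #label is 1 if the two predictors share a sign

trained_parameters = neural_net(x_toy, y_toy, hidden_size=4, iterations=1000)

#predicting by thresholding the sigmoid output at 0.5
A2, _ = forward_propagation(x_toy, trained_parameters)
predictions = (A2 > 0.5).astype(int)
print("training accuracy:", np.mean(predictions == y_toy))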