Neural Network from Scratch

by lksfr



Notebook by Lukas Frei

I am going to utilize the neural network project structure proposed by Andrew Ng in his Deep Learning Specialization in order to try to code one of the assignments of the course myself to gain an even deeper understanding. I have changed certain parts to make sure that this code does not provide solutions to the assignment but purely serves its self-study purpose.

If you'd like to learn more about deep learning, I'd highly recommend Andrew Ng's courses on Coursera.

The general methodology for building a neural network is:

  1. Define the neural network structure (# of input units, # of hidden units, etc.).
  2. Initialize the model's parameters.
  3. Loop:
    • Implement forward propagation
    • Compute the loss
    • Implement backward propagation to get the gradients
    • Update the parameters (gradient descent)

You often build helper functions for each of these steps and then merge them into one function we call nn_model(). Once you've built nn_model() and learned the right parameters, you can make predictions on new data.
#importing the only package we are going to use
import numpy as np

1. Defining Network Structure

def layer_structure(x, y, hidden_size):
    """
    x: predictors, shape (number of features, number of examples)
    y: response variable, shape (number of outputs, number of examples)
    hidden_size: number of neurons in the hidden layer

    Returns (input_size, hidden_size, output_size).
    """
    input_size = x.shape[0]   # number of predictors in the input dataset
    output_size = y.shape[0]  # number of possible prediction outputs
    return (input_size, hidden_size, output_size)
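As a quick sanity check of the shape convention used here (rows are features, columns are examples), here is what the function would return on a small made-up dataset (the sizes below are illustrative assumptions, not from the notebook):

```python
import numpy as np

# toy dataset: 2 features, 400 examples
x = np.random.randn(2, 400)             # predictors: (features, examples)
y = np.random.randint(0, 2, (1, 400))   # binary response: (outputs, examples)

# mirroring layer_structure above, with a hidden layer of 4 neurons
input_size, hidden_size, output_size = x.shape[0], 4, y.shape[0]
print(input_size, hidden_size, output_size)  # 2 4 1
```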

2. Initializing Parameters

def initialize_parameters(input_size, hidden_size, output_size):
    """
    Takes the sizes of the input, hidden, and output layers and returns
    randomly initialized weights and zero biases.
    """
    W1 = np.random.randn(hidden_size, input_size) * 0.001
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.001
    b2 = np.zeros((output_size, 1))
    parameters = {'W1': W1,
                  'b1': b1,
                  'W2': W2,
                  'b2': b2}
    return parameters
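The weights are drawn small (scaled by 0.001) so the tanh units start in their near-linear regime, and the shapes must line up for the matrix products in forward propagation. A standalone check of those two properties (the seed and sizes are assumptions for reproducibility, not part of the notebook):

```python
import numpy as np

np.random.seed(0)  # assumed seed, for reproducibility only
hidden_size, input_size, output_size = 4, 2, 1

W1 = np.random.randn(hidden_size, input_size) * 0.001
b1 = np.zeros((hidden_size, 1))
W2 = np.random.randn(output_size, hidden_size) * 0.001
b2 = np.zeros((output_size, 1))

# W1 maps inputs to the hidden layer, W2 maps the hidden layer to the output
print(W1.shape, b1.shape, W2.shape, b2.shape)  # (4, 2) (4, 1) (1, 4) (1, 1)
print(np.abs(W1).max() < 0.01)                 # True: weights start small
```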

3. Loop

Forward Propagation

Defining a sigmoid and tanh function:

def tanh(z):
    """Takes z and returns the tanh of z."""
    t = (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
    return t

def sigmoid(z):
    """Takes z and returns the sigmoid of z."""
    s = 1 / (1 + np.exp(-z))
    return s
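The hand-rolled tanh can be checked against NumPy's built-in, and the two activations are closely related: sigmoid(z) = (1 + tanh(z/2)) / 2. A quick standalone verification:

```python
import numpy as np

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-3, 3, 7)
# the hand-rolled tanh matches NumPy's built-in
print(np.allclose(tanh(z), np.tanh(z)))                      # True
# sigmoid is a shifted, rescaled tanh
print(np.allclose(sigmoid(z), 0.5 * (1 + np.tanh(z / 2))))   # True
print(sigmoid(0.0))                                          # 0.5
```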

Using the input and the initialized parameters to compute the output:

def forward_propagation(x, parameters):
    """Takes the input x along with the parameters and computes the output."""
    # retrieving the initialized parameters from the 'parameters' dictionary
    W1, b1, W2, b2 = parameters['W1'], parameters['b1'], parameters['W2'], parameters['b2']
    # computing the linear and activation part of both the hidden and the output layer
    Z1 =, x) + b1
    A1 = tanh(Z1)
    Z2 =, A1) + b2
    A2 = sigmoid(Z2)
    # storing results in a new dictionary called 'cache'
    cache = {'Z1': Z1,
             'A1': A1,
             'Z2': Z2,
             'A2': A2}
    return A2, cache

Next step: Computing the loss of our output

def compute_loss(A2, y, parameters):
    """Computes the cross-entropy loss."""
    n_observations = y.shape[1]
    loss = -1/n_observations * np.sum(y*np.log(A2) + (1-y)*np.log(1-A2))
    loss = np.squeeze(loss)
    return loss
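The leading minus sign matters: the log of a probability is never positive, so negating makes the loss non-negative, small for confident correct predictions and large for confident wrong ones. A standalone sketch with made-up predictions:

```python
import numpy as np

def cross_entropy(A2, y):
    n = y.shape[1]
    # minus sign: log probabilities are <= 0, so the loss is >= 0
    return float(np.squeeze(-1/n * np.sum(y*np.log(A2) + (1-y)*np.log(1-A2))))

y    = np.array([[1.0, 0.0, 1.0, 0.0]])
good = np.array([[0.9, 0.1, 0.8, 0.2]])  # mostly confident and correct
bad  = np.array([[0.4, 0.6, 0.3, 0.7]])  # leaning the wrong way
print(cross_entropy(good, y) < cross_entropy(bad, y))  # True
```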

Taking the derivatives in backprop:

def backpropagation(parameters, cache, x, y):
    """Takes the derivatives in backprop."""
    m = x.shape[1]  # number of observations
    # retrieving parameters as well as the calculated output from forward propagation
    W2 = parameters["W2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    # calculating the derivatives
    dZ2 = A2 - y
    dW2 = 1/m *, A1.T)
    db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 =, dZ2) * (1 - np.power(A1, 2))
    dW1 = 1/m *, x.T)
    db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    return grads
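The `(1 - np.power(A1, 2))` factor in dZ1 is the derivative of tanh, since d/dz tanh(z) = 1 - tanh(z)². A quick finite-difference check (a standalone sketch, not part of the notebook) confirms the identity numerically:

```python
import numpy as np

z = np.array([-1.5, -0.3, 0.0, 0.7, 2.0])
eps = 1e-6
# central-difference approximation of d/dz tanh(z)
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
# the closed-form derivative used in backprop
analytic = 1 - np.power(np.tanh(z), 2)
print(np.allclose(numeric, analytic, atol=1e-8))  # True
```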

Updating parameters with our previously computed derivatives:

def update_parameters(parameters, grads, learning_rate=1):
    """Updates the parameters after backprop via gradient descent."""
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
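Each update moves every parameter a small step against its gradient. A one-parameter toy illustration of the same rule (my own example, not from the notebook) on f(w) = w², whose gradient is 2w, shows w being driven toward the minimum at 0:

```python
# gradient descent on f(w) = w**2 with the update w := w - learning_rate * dw
w, learning_rate = 4.0, 0.1
for _ in range(50):
    dw = 2 * w              # gradient of f at the current w
    w = w - learning_rate * dw
print(abs(w) < 1e-3)  # True: w has converged close to the minimizer 0
```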

Putting everything together into a single function representing the neural net:

def neural_net(x, y, hidden_size, iterations = 1000):
    """Puts all functions together to form a neural network."""
    input_size, hidden_size, output_size = layer_structure(x, y, hidden_size)
    parameters = initialize_parameters(input_size, hidden_size, output_size)
    for i in range(iterations): #gradient descent
        A2, cache = forward_propagation(x, parameters) #forward prop
        loss = compute_loss(A2, y, parameters) #computing the loss of our output
        grads = backpropagation(parameters, cache, x, y) #taking the derivatives
        parameters = update_parameters(parameters, grads) #updating our parameters
        if i % 100 == 0:
            print(f"Loss after iteration {i}: {loss}")
    return parameters