How to Build a Neural Network from Scratch

Neural networks form the foundation of many modern machine learning applications, from image and speech recognition to natural language processing and game playing. Understanding how to build a neural network from scratch can deepen your knowledge of machine learning fundamentals and provide insights into how these powerful models operate. This article will guide you through the process of creating a simple neural network from scratch using Python, focusing on key concepts, necessary steps, and coding practices.

1. Understanding Neural Networks

Basic Concepts

A neural network consists of layers of interconnected nodes, or neurons. Each node computes a weighted sum of its inputs and passes the result through an activation function. The simplest form of a neural network is a feedforward network, where data moves in one direction: from input nodes, through hidden nodes (if any), to output nodes.

Components of a Neural Network

Input Layer: Receives the input data.
Hidden Layers: Perform computations and feature extraction.
Output Layer: Produces the final output.

Each connection between nodes has a weight that gets adjusted during training, and each node applies an activation function to its input to introduce non-linearity into the model.
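
To make the roles of weights and activation functions concrete, here is a minimal sketch of a single neuron in NumPy. The input values, weights, and bias are made-up numbers chosen for illustration, and the sigmoid activation is only one common option.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (not taken from a real dataset)
x = np.array([0.5, -1.0, 2.0])   # three input features
w = np.array([0.4, 0.3, -0.2])   # one weight per input connection
b = 0.1                          # bias term

z = np.dot(w, x) + b             # weighted sum of the inputs plus the bias
a = sigmoid(z)                   # non-linear activation of the neuron

print("weighted sum:", z)
print("activation:", a)
```

A layer is this same computation repeated for several neurons at once, which is why the rest of the article works with weight matrices rather than weight vectors.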

2. Setting Up the Environment

To build a neural network from scratch, you’ll need Python and NumPy, which handles the numerical operations. Install NumPy if you haven’t already:

pip install numpy
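
A quick way to confirm the installation worked is to import NumPy and run a small matrix product. This is just a smoke test; the two matrices are arbitrary.

```python
import numpy as np

# Print the installed version to confirm the import works
print("NumPy version:", np.__version__)

# A tiny matrix product as a smoke test
a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])
product = a @ b
print(product)
```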

3. Initializing the Network

The first step in creating a neural network is to define its architecture and initialize weights and biases. Here’s how to set up a simple feedforward neural network with one hidden layer.

import numpy as np

# Define the architecture
input_size = 3   # Number of input neurons
hidden_size = 4  # Number of hidden neurons
output_size = 1  # Number of output neurons (one, matching the single-column targets used in training)

# Initialize weights and biases
weights_input_hidden = np.random.randn(input_size, hidden_size)
weights_hidden_output = np.random.randn(hidden_size, output_size)
bias_hidden = np.zeros((1, hidden_size))
bias_output = np.zeros((1, output_size))

# Print initialized values for verification
print("Weights Input to Hidden:", weights_input_hidden)
print("Weights Hidden to Output:", weights_hidden_output)
print("Bias Hidden:", bias_hidden)
print("Bias Output:", bias_output)
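
Because the weights are drawn at random, every run of the code above starts from different values. The sketch below wraps initialization in a hypothetical helper, `init_params`, that uses a seeded generator so runs are repeatable. The helper name, the dictionary layout, and the layer sizes in the call are illustrative assumptions, not part of the code above.

```python
import numpy as np

def init_params(input_size, hidden_size, output_size, seed=0):
    """Hypothetical helper: return weights and biases in a dict."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable values
    return {
        "weights_input_hidden": rng.standard_normal((input_size, hidden_size)),
        "weights_hidden_output": rng.standard_normal((hidden_size, output_size)),
        "bias_hidden": np.zeros((1, hidden_size)),
        "bias_output": np.zeros((1, output_size)),
    }

# Illustrative layer sizes
params = init_params(input_size=3, hidden_size=4, output_size=1)
for name, value in params.items():
    print(name, value.shape)

# The same seed gives the same starting weights
same = init_params(3, 4, 1)
identical = np.array_equal(params["weights_input_hidden"],
                           same["weights_input_hidden"])
print("Repeatable:", identical)
```

Each weight matrix has shape (inputs to the layer, outputs of the layer), which is what lets a batch of row vectors be multiplied straight through the network.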

4. Activation Functions

Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. Common activation functions include the sigmoid, tanh, and ReLU functions. Here, we use the sigmoid function:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

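def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output, i.e. x = sigmoid(z),
    # so the derivative sigmoid(z) * (1 - sigmoid(z)) reduces to x * (1 - x)
    return x * (1 - x)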
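
For comparison, here is a sketch of the tanh and ReLU functions mentioned above, together with their derivatives. Unlike the article's `sigmoid_derivative`, these derivative functions take the raw pre-activation input `z`. The function names and the finite-difference check are assumptions added for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # 1 for positive inputs, 0 otherwise (0 at z == 0 by convention)
    return (z > 0).astype(float)

def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Sample points, avoiding z == 0 where ReLU has a kink
z = np.array([-2.0, -0.5, 0.5, 2.0])

sigmoid_ok = np.allclose(sigmoid_grad(z), numerical_grad(sigmoid, z))
tanh_ok = np.allclose(tanh_grad(z), numerical_grad(np.tanh, z))
relu_ok = np.allclose(relu_grad(z), numerical_grad(relu, z))
print(sigmoid_ok, tanh_ok, relu_ok)
```

Comparing an analytic derivative against a finite-difference estimate is a cheap way to catch sign or algebra mistakes before they reach the training loop.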

5. Forward Propagation

Forward propagation involves passing the input data through the network to obtain the output. This step involves matrix multiplications, adding biases, and applying the activation function.

def forward_propagation(inputs):
    hidden_layer_input = np.dot(inputs, weights_input_hidden) + bias_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)

    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + bias_output
    output_layer_output = sigmoid(output_layer_input)

    return hidden_layer_output, output_layer_output
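
The following self-contained sketch traces two samples through a network of the same kind. It passes the parameters as function arguments instead of relying on globals; the `forward` name, the seed, and the short variable names are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass that takes the parameters as arguments."""
    hidden = sigmoid(X @ W1 + b1)       # (n, 3) @ (3, 4) -> (n, 4)
    output = sigmoid(hidden @ W2 + b2)  # (n, 4) @ (4, 1) -> (n, 1)
    return hidden, output

rng = np.random.default_rng(42)         # fixed seed for repeatable values
W1 = rng.standard_normal((3, 4))
b1 = np.zeros((1, 4))
W2 = rng.standard_normal((4, 1))
b2 = np.zeros((1, 1))

# Two samples with three features each
X = np.array([[0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

hidden, output = forward(X, W1, b1, W2, b2)
print(hidden.shape, output.shape)
print(output)
```

Each row of the input produces one row of the output, and because the last step is a sigmoid, every output value lies strictly between 0 and 1.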

6. Loss Function

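The loss function measures how well the neural network performs by comparing the predicted output with the actual output. Mean Squared Error (MSE) is most often used for regression, and cross-entropy is the more typical choice for classification, but MSE is simple to implement and works well enough for the small binary example in this article: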

def mean_squared_error(predicted, actual):
    return np.mean((predicted - actual) ** 2)
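
As a worked example, the sketch below evaluates MSE on four made-up predictions. Binary cross-entropy, a loss often used for classification, is shown alongside for comparison only; it is not used elsewhere in this article, and the `binary_cross_entropy` helper and its `eps` clipping value are assumptions.

```python
import numpy as np

def mean_squared_error(predicted, actual):
    return np.mean((predicted - actual) ** 2)

def binary_cross_entropy(predicted, actual, eps=1e-12):
    # Clip predictions away from 0 and 1 to avoid log(0)
    p = np.clip(predicted, eps, 1 - eps)
    return -np.mean(actual * np.log(p) + (1 - actual) * np.log(1 - p))

# Illustrative targets and predictions
actual = np.array([[0.0], [1.0], [1.0], [0.0]])
predicted = np.array([[0.1], [0.9], [0.8], [0.3]])

mse = mean_squared_error(predicted, actual)
bce = binary_cross_entropy(predicted, actual)
print("MSE:", mse)
print("BCE:", bce)
```

The squared errors here are 0.01, 0.01, 0.04, and 0.09, so the MSE is their mean, 0.0375. Both losses shrink toward zero as the predictions approach the targets.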

7. Backward Propagation

Backward propagation calculates the gradient of the loss function with respect to each weight and bias by applying the chain rule, working from the output layer back toward the input. The weights and biases are then adjusted in the direction that reduces the loss.
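
For a network with one sigmoid hidden layer and a sigmoid output, the chain rule can be written out explicitly. The derivation below is a sketch under one assumption: the loss is taken as half the sum of squared errors, $L = \tfrac{1}{2}\sum (y - \hat{y})^2$, which differs from the mean squared error only by a constant factor. The symbols are notation introduced here: $X$ is the input batch, $W^{(1)}, b^{(1)}$ and $W^{(2)}, b^{(2)}$ are the layer parameters, $h$ is the hidden activation, $\hat{y}$ is the prediction, and $\odot$ denotes element-wise multiplication.

```latex
\begin{aligned}
z^{(1)} &= X W^{(1)} + b^{(1)}, & h &= \sigma\!\left(z^{(1)}\right), \\
z^{(2)} &= h W^{(2)} + b^{(2)}, & \hat{y} &= \sigma\!\left(z^{(2)}\right), \\[4pt]
\delta^{(2)} &= -\frac{\partial L}{\partial z^{(2)}}
             = (y - \hat{y}) \odot \hat{y} \odot (1 - \hat{y}), \\
\delta^{(1)} &= -\frac{\partial L}{\partial z^{(1)}}
             = \left(\delta^{(2)} W^{(2)\top}\right) \odot h \odot (1 - h), \\[4pt]
\frac{\partial L}{\partial W^{(2)}} &= -h^{\top} \delta^{(2)}, &
\frac{\partial L}{\partial W^{(1)}} &= -X^{\top} \delta^{(1)}.
\end{aligned}
```

The two $\delta$ terms correspond to `output_delta` and `hidden_delta` in the code. Because each $\delta$ is the negative gradient with respect to a layer's pre-activation, adding a multiple of $h^{\top}\delta^{(2)}$ or $X^{\top}\delta^{(1)}$ to the weights moves them downhill on the loss.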

def backward_propagation(inputs, hidden_layer_output, output_layer_output, actual_output, learning_rate):
    global weights_input_hidden, weights_hidden_output, bias_hidden, bias_output

    # Output layer: error (target minus prediction) scaled by the sigmoid slope
    output_error = actual_output - output_layer_output
    output_delta = output_error * sigmoid_derivative(output_layer_output)

    # Hidden layer: propagate the output delta back through the weights
    hidden_error = output_delta.dot(weights_hidden_output.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_layer_output)

    # Update weights and biases, scaling each step by the learning rate.
    # The updates are added because the error is defined as actual minus predicted.
    weights_hidden_output += learning_rate * hidden_layer_output.T.dot(output_delta)
    weights_input_hidden += learning_rate * inputs.T.dot(hidden_delta)
    bias_output += learning_rate * np.sum(output_delta, axis=0, keepdims=True)
    bias_hidden += learning_rate * np.sum(hidden_delta, axis=0, keepdims=True)

8. Training the Network

Training involves iteratively performing forward and backward propagation to adjust the weights and minimize the loss. Here’s a simple training loop:

def train(inputs, actual_output, epochs, learning_rate):
    for epoch in range(epochs):
        hidden_layer_output, output_layer_output = forward_propagation(inputs)
        backward_propagation(inputs, hidden_layer_output, output_layer_output, actual_output, learning_rate)

        if epoch % 1000 == 0:
            loss = mean_squared_error(output_layer_output, actual_output)
            print(f"Epoch {epoch}, Loss: {loss}")

# Example usage
inputs = np.array([[0, 0, 1], 
                   [1, 1, 1], 
                   [1, 0, 1], 
                   [0, 1, 1]])
actual_output = np.array([[0], [1], [1], [0]])

train(inputs, actual_output, epochs=10000, learning_rate=0.1)
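
Before trusting a training run, it can help to verify the backward pass numerically. The sketch below compares the analytic weight gradients of a network like this one with finite-difference estimates. It is self-contained and makes a few assumptions: the loss is half the sum of squared errors (so its gradient matches the delta terms exactly), there is a single output neuron, and the helper names `loss_fn`, `analytic_grads`, and `numerical_grad` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(X, y, W1, b1, W2, b2):
    """Half the sum of squared errors (assumed loss for this check)."""
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    return 0.5 * np.sum((y - out) ** 2)

def analytic_grads(X, y, W1, b1, W2, b2):
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    output_delta = (y - out) * out * (1 - out)
    hidden_delta = (output_delta @ W2.T) * hidden * (1 - hidden)
    # The deltas are negative gradients, hence the minus signs
    grad_W2 = -hidden.T @ output_delta
    grad_W1 = -X.T @ hidden_delta
    return grad_W1, grad_W2

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences, perturbing one entry of W at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        plus = f()
        W[idx] = original - eps
        minus = f()
        W[idx] = original  # restore the entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.standard_normal((3, 4)), np.zeros((1, 4))
W2, b2 = rng.standard_normal((4, 1)), np.zeros((1, 1))

f = lambda: loss_fn(X, y, W1, b1, W2, b2)
grad_W1, grad_W2 = analytic_grads(X, y, W1, b1, W2, b2)
w1_ok = np.allclose(grad_W1, numerical_grad(f, W1), atol=1e-6)
w2_ok = np.allclose(grad_W2, numerical_grad(f, W2), atol=1e-6)
print("W1 gradient matches:", w1_ok)
print("W2 gradient matches:", w2_ok)
```

If either check fails after you modify the backward pass, the usual culprits are a flipped sign, a missing transpose, or a derivative applied to the wrong quantity.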

9. Testing the Network

After training, the network can be tested with new data to evaluate its performance. Forward propagate the new data and compare the predicted output with expected results.

def predict(inputs):
    _, output_layer_output = forward_propagation(inputs)
    return output_layer_output

# Example test
test_inputs = np.array([[1, 0, 0], [0, 1, 0]])
predictions = predict(test_inputs)
print("Predictions:", predictions)
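
The sigmoid output is a value between 0 and 1, so comparing it with expected results usually means converting it to a class label first. The sketch below thresholds the outputs at 0.5 and computes accuracy. The `to_labels` and `accuracy` helpers, the threshold, and the sample numbers are all illustrative assumptions.

```python
import numpy as np

def to_labels(probabilities, threshold=0.5):
    """Convert sigmoid outputs in (0, 1) to hard 0/1 class labels."""
    return (probabilities >= threshold).astype(int)

def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the expected labels."""
    return np.mean(predicted_labels == true_labels)

# Hypothetical network outputs and expected labels, for illustration only
predictions = np.array([[0.92], [0.08], [0.61], [0.35]])
expected = np.array([[1], [0], [1], [1]])

labels = to_labels(predictions)
acc = accuracy(labels, expected)
print("Labels:", labels.ravel())
print("Accuracy:", acc)
```

Here three of the four thresholded predictions match the expected labels, giving an accuracy of 0.75.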

10. Conclusion

Building a neural network from scratch involves understanding and implementing key concepts such as forward propagation, backward propagation, activation functions, and the training process. By creating a simple neural network using Python and NumPy, you can gain a deeper appreciation of how these models work and the principles underlying machine learning. This foundational knowledge can be applied to more complex models and frameworks, furthering your journey into the world of artificial intelligence and machine learning.

Mastering the basics of neural networks equips you with the tools to explore advanced topics, optimize models, and develop innovative solutions across various domains. Whether you’re pursuing research, developing applications, or simply expanding your knowledge, understanding how to build a neural network from scratch is a valuable and rewarding endeavor.