Backpropagation from First Principles
In this summary, we derive the backpropagation equations for a multi-layer feedforward neural network (multi-layer perceptron, MLP) using only the chain rule from standard calculus.
[!NOTE] To see a visual simulation of parameter fitting and gradient optimization on a live canvas, you can launch the 🔬Linear Regression Simulator from the digital outpost.
1. Network Architecture
Consider a feedforward network with $L$ layers. For a layer $l$:
- $w^l_{jk}$ is the weight connecting neuron $k$ in layer $l-1$ to neuron $j$ in layer $l$.
- $b^l_j$ is the bias of neuron $j$ in layer $l$.
- $z^l_j$ is the weighted input to the activation function of neuron $j$ in layer $l$.
- $a^l_j$ is the activation output of neuron $j$ in layer $l$.

The forward propagation relations are defined as:

$$z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$$

$$a^l_j = \sigma(z^l_j)$$

where $\sigma$ is an activation function (e.g., Sigmoid, ReLU).
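As a minimal sketch of the forward relations, the following TypeScript computes the weighted inputs and activations of a single layer. The name `forwardLayer`, the choice of the sigmoid as the activation function, and the example numbers are assumptions made for illustration; the `weights[j][k]` layout (output neuron `j`, input neuron `k`) follows the indexing convention above.

```typescript
// Logistic sigmoid, one common choice for the activation function.
function sigmoid(z: number): number {
  return 1.0 / (1.0 + Math.exp(-z));
}

// Forward pass through one layer.
// weights[j][k] connects input neuron k to output neuron j.
function forwardLayer(
  weights: number[][],
  biases: number[],
  inputs: number[]
): { zs: number[]; outputs: number[] } {
  const zs = weights.map((row, j) => {
    let z = biases[j]; // start from the bias of neuron j
    for (let k = 0; k < inputs.length; k++) {
      z += row[k] * inputs[k]; // add each weighted input
    }
    return z;
  });
  const outputs = zs.map(sigmoid); // apply the activation elementwise
  return { zs, outputs };
}

// Hypothetical layer with two inputs and one neuron.
const demoLayer = forwardLayer([[0.5, -0.5]], [0.0], [1.0, 1.0]);
console.log(demoLayer.zs, demoLayer.outputs); // z = 0, so the activation is sigmoid(0) = 0.5
```

Returning both `zs` and `outputs` is deliberate: the backward pass needs the weighted inputs to evaluate the derivative of the activation.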
2. The Loss Function
We define a cost function $C$ for a single training instance. For example, using Mean Squared Error (MSE):

$$C = \frac{1}{2} \sum_j (y_j - a^L_j)^2$$

where $y_j$ is the target value, and $a^L_j$ is the output layer activation.
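For a concrete feel, here is a small sketch of the cost for one training instance. It assumes the squared-error cost carries the conventional factor of one half, i.e. half the sum of squared differences between targets and output activations; the function name `mseCost` and the sample values are illustrative only.

```typescript
// Half squared-error cost for one training instance:
// C = 1/2 * sum_j (y_j - a_j)^2   (the 1/2 factor is an assumed convention)
function mseCost(targets: number[], activations: number[]): number {
  if (targets.length !== activations.length) {
    throw new Error("targets and activations must have the same length");
  }
  let sum = 0;
  for (let j = 0; j < targets.length; j++) {
    const diff = targets[j] - activations[j];
    sum += diff * diff;
  }
  return 0.5 * sum;
}

// Two output neurons, each off by 0.5 from its target.
console.log(mseCost([1, 0], [0.5, 0.5])); // 0.25
```

The factor of one half is a convenience: it cancels the 2 produced by differentiating the square, which keeps the output-layer error free of stray constants.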
3. Deriving the Backpropagation Equations
To update weights and biases using gradient descent, we need to compute the partial derivatives of the cost $C$ with respect to every weight $w^l_{jk}$ and bias $b^l_j$.
We define the error $\delta^l_j$ of neuron $j$ in layer $l$ as:

$$\delta^l_j \equiv \frac{\partial C}{\partial z^l_j}$$
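Equation 1: Error in Output Layer ($\delta^L$)
Applying the chain rule:

$$\delta^L_j = \frac{\partial C}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j} \frac{\partial a^L_j}{\partial z^L_j}$$

Since $\frac{\partial C}{\partial a^L_j} = a^L_j - y_j$ and $\frac{\partial a^L_j}{\partial z^L_j} = \sigma'(z^L_j)$:

Thus:

$$\delta^L_j = (a^L_j - y_j)\, \sigma'(z^L_j)$$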
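The output-layer error can be sketched in a few lines. This assumes a sigmoid activation and the half squared-error cost, so that the error of each output neuron is (activation minus target) times the sigmoid derivative at its weighted input; `outputLayerErrors` is a hypothetical helper name, not part of any library.

```typescript
// Derivative of the logistic sigmoid at z.
function sigmoidPrime(z: number): number {
  const s = 1.0 / (1.0 + Math.exp(-z));
  return s * (1.0 - s);
}

// Equation 1: delta_j = (a_j - y_j) * sigma'(z_j) for every output neuron j.
function outputLayerErrors(
  zs: number[],
  activations: number[],
  targets: number[]
): number[] {
  return zs.map((z, j) => (activations[j] - targets[j]) * sigmoidPrime(z));
}

// One output neuron with z = 0, so a = 0.5 and sigma'(0) = 0.25.
console.log(outputLayerErrors([0], [0.5], [1])); // [-0.125]
```

The negative sign says that increasing this neuron's weighted input would lower the cost, which is what we expect when the activation sits below its target.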
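Equation 2: Error in Hidden Layers ($\delta^l$)
We calculate the error $\delta^l_j$ in terms of the error of the subsequent layer $l+1$:

$$\delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum_k \frac{\partial C}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k \delta^{l+1}_k \frac{\partial z^{l+1}_k}{\partial z^l_j}$$

Since $z^{l+1}_k = \sum_j w^{l+1}_{kj} a^l_j + b^{l+1}_k$ and $a^l_j = \sigma(z^l_j)$, we have:

$$\frac{\partial z^{l+1}_k}{\partial z^l_j} = w^{l+1}_{kj}\, \sigma'(z^l_j)$$

Substituting this back gives:

$$\delta^l_j = \left( \sum_k w^{l+1}_{kj}\, \delta^{l+1}_k \right) \sigma'(z^l_j)$$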
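The hidden-layer error can also be written compactly in vector form. As a sketch, assume $W^{l+1}$ denotes the matrix whose entry in row $k$, column $j$ is the weight from neuron $j$ in layer $l$ to neuron $k$ in layer $l+1$, and $\odot$ denotes the elementwise (Hadamard) product; neither symbol is used elsewhere in this note.

```latex
\delta^{l} = \left( (W^{l+1})^{\mathsf{T}} \, \delta^{l+1} \right) \odot \sigma'(z^{l})
```

The transpose appears because the error travels backwards along the same weights used in the forward pass: neuron $j$ collects the errors of every neuron $k$ it feeds into, each scaled by the connecting weight.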
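Equation 3: Derivative with respect to Biases

$$\frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial z^l_j} \frac{\partial z^l_j}{\partial b^l_j} = \delta^l_j$$

Equation 4: Derivative with respect to Weights

$$\frac{\partial C}{\partial w^l_{jk}} = \frac{\partial C}{\partial z^l_j} \frac{\partial z^l_j}{\partial w^l_{jk}} = a^{l-1}_k\, \delta^l_j$$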
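One way to gain confidence in these derivatives is a numerical gradient check. The sketch below assumes a single sigmoid neuron with the half squared-error cost, computes the bias and weight gradients analytically (error times 1 for the bias, error times the input for each weight), and compares them with central finite differences. All names and numeric values are hypothetical and chosen only for illustration.

```typescript
function sigmoid(z: number): number {
  return 1.0 / (1.0 + Math.exp(-z));
}

// Cost of a single sigmoid neuron: C = 1/2 * (y - a)^2 (assumed form).
function cost(w: number[], b: number, x: number[], y: number): number {
  let z = b;
  for (let k = 0; k < x.length; k++) z += w[k] * x[k];
  const a = sigmoid(z);
  return 0.5 * (y - a) * (y - a);
}

// Analytic gradients from Equations 1, 3 and 4.
function analyticGradients(w: number[], b: number, x: number[], y: number) {
  let z = b;
  for (let k = 0; k < x.length; k++) z += w[k] * x[k];
  const a = sigmoid(z);
  const delta = (a - y) * a * (1 - a); // Equation 1, using sigma'(z) = a * (1 - a)
  return {
    dB: delta,                       // Equation 3
    dW: x.map((xk) => xk * delta),   // Equation 4
  };
}

// Central finite differences as an independent numerical estimate.
function numericGradients(
  w: number[],
  b: number,
  x: number[],
  y: number,
  eps: number = 1e-5
) {
  const dB = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps);
  const dW = w.map((_, k) => {
    const wPlus = w.slice();
    const wMinus = w.slice();
    wPlus[k] += eps;
    wMinus[k] -= eps;
    return (cost(wPlus, b, x, y) - cost(wMinus, b, x, y)) / (2 * eps);
  });
  return { dB, dW };
}

// Hypothetical parameters and training instance.
const demoWeights = [0.3, -0.2];
const demoBias = 0.1;
const demoInput = [1.0, 2.0];
const demoTarget = 1.0;

const analyticDemo = analyticGradients(demoWeights, demoBias, demoInput, demoTarget);
const numericDemo = numericGradients(demoWeights, demoBias, demoInput, demoTarget);
console.log(analyticDemo, numericDemo); // the two results should agree to several decimal places
```

If the two sets of numbers disagree by more than rounding noise, the usual suspects are a sign error in the output-layer error or a swapped index in the weight gradient.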
4. Backpropagation Implementation
Here is a simplified TypeScript snippet that computes the error of a hidden layer (Equation 2) and applies the gradient descent updates for its biases and weights (Equations 3 and 4):
```typescript
interface NetworkLayer {
  weights: number[][]; // [neurons_out][neurons_in]
  biases: number[];
  inputs: number[];  // activations received from the previous layer
  outputs: number[]; // activations produced by this layer
  zs: number[];      // weighted inputs, cached during the forward pass
}

// Derivative of the logistic sigmoid at z
function sigmoidPrime(z: number): number {
  const s = 1.0 / (1.0 + Math.exp(-z));
  return s * (1.0 - s);
}

// Single step weight and bias update for one hidden layer.
// nextLayerWeights must hold the next layer's weights from before its own
// update; otherwise the propagated error mixes old and new parameters.
function backwardPass(
  layer: NetworkLayer,
  nextLayerWeights: number[][],
  nextLayerErrors: number[],
  learningRate: number
): number[] {
  const numNeurons = layer.weights.length;
  const currentErrors: number[] = [];

  for (let j = 0; j < numNeurons; j++) {
    // Backpropagate error sum (Equation 2)
    let errorSum = 0;
    for (let k = 0; k < nextLayerErrors.length; k++) {
      errorSum += nextLayerWeights[k][j] * nextLayerErrors[k];
    }
    const delta = errorSum * sigmoidPrime(layer.zs[j]);
    currentErrors.push(delta);

    // Update weights and biases (Equations 3 and 4)
    layer.biases[j] -= learningRate * delta;
    for (let k = 0; k < layer.inputs.length; k++) {
      layer.weights[j][k] -= learningRate * delta * layer.inputs[k];
    }
  }

  // Errors of this layer, to be passed to the layer before it
  return currentErrors;
}
```
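To show how `backwardPass` might be driven, here is a hedged end-to-end sketch of one training step on a tiny network with two inputs, one hidden neuron and one output neuron. The `forward` helper, the layer values, the learning rate and the training instance are all hypothetical; `backwardPass` repeats the logic of the snippet above so that the example runs on its own, and the cost is assumed to be the half squared error.

```typescript
interface NetworkLayer {
  weights: number[][]; // [neurons_out][neurons_in]
  biases: number[];
  inputs: number[];
  outputs: number[];
  zs: number[];
}

function sigmoid(z: number): number {
  return 1.0 / (1.0 + Math.exp(-z));
}

function sigmoidPrime(z: number): number {
  const s = sigmoid(z);
  return s * (1.0 - s);
}

// Hypothetical helper: forward pass that caches inputs, zs and outputs on the layer.
function forward(layer: NetworkLayer, inputs: number[]): number[] {
  layer.inputs = inputs;
  layer.zs = layer.weights.map((row, j) =>
    row.reduce((sum, w, k) => sum + w * inputs[k], layer.biases[j])
  );
  layer.outputs = layer.zs.map(sigmoid);
  return layer.outputs;
}

// Same logic as backwardPass above, repeated so this sketch is self-contained.
function backwardPass(
  layer: NetworkLayer,
  nextLayerWeights: number[][],
  nextLayerErrors: number[],
  learningRate: number
): number[] {
  const currentErrors: number[] = [];
  for (let j = 0; j < layer.weights.length; j++) {
    let errorSum = 0;
    for (let k = 0; k < nextLayerErrors.length; k++) {
      errorSum += nextLayerWeights[k][j] * nextLayerErrors[k];
    }
    const delta = errorSum * sigmoidPrime(layer.zs[j]);
    currentErrors.push(delta);
    layer.biases[j] -= learningRate * delta;
    for (let k = 0; k < layer.inputs.length; k++) {
      layer.weights[j][k] -= learningRate * delta * layer.inputs[k];
    }
  }
  return currentErrors;
}

// Hypothetical 2-1-1 network and training instance.
const hiddenLayer: NetworkLayer = { weights: [[0.5, -0.3]], biases: [0.1], inputs: [], outputs: [], zs: [] };
const outputLayer: NetworkLayer = { weights: [[0.8]], biases: [-0.2], inputs: [], outputs: [], zs: [] };
const sampleInput = [1.0, 0.5];
const sampleTarget = 1.0;
const stepSize = 0.5;

// Runs a forward pass and returns the half squared-error cost.
function currentCost(): number {
  const a = forward(outputLayer, forward(hiddenLayer, sampleInput))[0];
  return 0.5 * (sampleTarget - a) * (sampleTarget - a);
}

const costBefore = currentCost();

// Output layer: error from Equation 1.
const outputErrors = outputLayer.outputs.map(
  (a, j) => (a - sampleTarget) * sigmoidPrime(outputLayer.zs[j])
);
// Snapshot the output weights before updating them; the hidden layer needs the old values.
const outputWeightsBefore = outputLayer.weights.map((row) => row.slice());
for (let j = 0; j < outputLayer.weights.length; j++) {
  outputLayer.biases[j] -= stepSize * outputErrors[j];
  for (let k = 0; k < outputLayer.inputs.length; k++) {
    outputLayer.weights[j][k] -= stepSize * outputErrors[j] * outputLayer.inputs[k];
  }
}

// Hidden layer: error from Equation 2 plus its own parameter update.
backwardPass(hiddenLayer, outputWeightsBefore, outputErrors, stepSize);

const costAfter = currentCost();
console.log(costBefore, costAfter); // the cost should drop after one step
```

Copying the output weights before updating them is the key design choice here: because the update happens in place, passing the live array would propagate the error through already-modified weights.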