June 6, 2025
Neural networks need nonlinearities when stacking linear layers one after another.
A linear layer represents a linear transformation, and a composition of linear transformations is itself a single linear transformation, so stacking linear layers adds no expressive power on its own. By introducing a nonlinearity between the two layers, we add expressiveness: the network can learn to do things that aren't possible with linear transformations alone.
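Here's a minimal sketch of that collapse in NumPy (the weights are arbitrary random examples): because B(Ax) = (BA)x, two stacked linear layers compute exactly the same function as one layer whose weight matrix is the product BA.

```python
# Minimal sketch: two stacked linear layers collapse into one linear layer.
# The weight matrices here are arbitrary random examples.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))  # first layer's weights
B = rng.standard_normal((2, 2))  # second layer's weights
x = rng.standard_normal(2)       # an arbitrary 2D input vector

two_layers = B @ (A @ x)   # apply layer A, then layer B
one_layer = (B @ A) @ x    # apply the single combined matrix B·A

print(np.allclose(two_layers, one_layer))  # True: no expressiveness gained
```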
Common nonlinearities include sigmoid (σ(x) = 1 / (1 + e^(-x))), ReLU (max(0, x)), and tanh ((e^x - e^(-x)) / (e^x + e^(-x))). So, if we have a two-layer network with per-layer weights A and B, we might have something like this:

output = B × ReLU(A × input)

where × is matrix multiplication, and ReLU is applied element-wise.
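As a concrete sketch in NumPy (the values of A, B, and the input are made up for illustration):

```python
# Minimal sketch of the two-layer forward pass: output = B × ReLU(A × input).
# The weights and input below are made-up example values.
import numpy as np

def relu(v):
    return np.maximum(0.0, v)   # ReLU applied element-wise

A = np.array([[1.0,  0.5],
              [-0.5, 1.0]])     # first layer's weights (example)
B = np.array([[0.8, -1.0],
              [1.2,  0.3]])     # second layer's weights (example)

x = np.array([0.7, -0.2])       # a 2D input vector
output = B @ relu(A @ x)        # linear, then nonlinearity, then linear
print(output)
```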
But what exactly does that look like, intuitively?
If you think about it, we're still just transforming one 2D input vector into another 2D vector. So let's visualize it the same way we might visualize a matrix multiplication: by showing how it affects a grid of points in the plane.
This visualization shows the pipeline used in neural networks: input data passes through a linear transformation (Matrix 1), then an activation function, then another linear transformation (Matrix 2). You can adjust each component to understand how they work together to transform data.
[Interactive visualization: Points → Matrix 1 → Activation Function → Matrix 2 → Result]
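If you'd like a static version of the same picture, here's a minimal sketch in NumPy + matplotlib (the two matrices are arbitrary examples, not the widget's defaults, and ReLU stands in for the adjustable activation):

```python
# Minimal sketch: push a grid of 2D points through Matrix 1 -> ReLU -> Matrix 2
# and plot the original grid next to the transformed one.
# The matrices are arbitrary example values.
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[1.0, 0.6],
              [0.0, 1.0]])      # "Matrix 1" (example shear)
B = np.array([[0.5, -1.0],
              [1.0,  0.5]])     # "Matrix 2" (example rotation + scale)

# Grid of points in the plane, stored as a 2 x N array of column vectors.
xs, ys = np.meshgrid(np.linspace(-2, 2, 21), np.linspace(-2, 2, 21))
points = np.vstack([xs.ravel(), ys.ravel()])

transformed = B @ np.maximum(0.0, A @ points)   # Matrix 1 -> ReLU -> Matrix 2

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.scatter(points[0], points[1], s=4)
ax1.set_title("Original grid")
ax2.scatter(transformed[0], transformed[1], s=4)
ax2.set_title("After Matrix 1 -> ReLU -> Matrix 2")
plt.tight_layout()
plt.show()
```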