17: Initialization/normalization
In this lesson, we discuss the importance of weight initialization in neural networks and explore various techniques to improve training. We start by introducing changes to the miniai library and demonstrate the use of HooksCallback and ActivationStats for visualizing layer statistics during training. We then dive into why layer activations should have zero mean and unit standard deviation, and introduce Glorot (Xavier) initialization.
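As a rough illustration of the Glorot rule (not the lesson's own code; the helper name `glorot_init_` is ours), the weights of a linear layer are drawn with standard deviation `sqrt(2 / (fan_in + fan_out))`. PyTorch's built-in `torch.nn.init.xavier_normal_` applies the same rule.

```python
import math
import torch
from torch import nn

def glorot_init_(lin: nn.Linear):
    # Glorot/Xavier normal: std = sqrt(2 / (fan_in + fan_out)), chosen so that
    # activation variance is roughly preserved in both the forward and
    # backward passes through the layer.
    std = math.sqrt(2.0 / (lin.in_features + lin.out_features))
    with torch.no_grad():
        lin.weight.normal_(0.0, std)
        if lin.bias is not None:
            lin.bias.zero_()

lin = nn.Linear(784, 50)
glorot_init_(lin)
x = torch.randn(64, 784)      # inputs already have zero mean, unit std
print(lin(x).std())           # stays near 1 instead of shrinking or exploding
```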
We also cover variance, standard deviation, and covariance, and their significance in understanding relationships between data points. We create a General ReLU activation function and discuss the Layer-wise Sequential Unit Variance (LSUV) technique for initializing any neural network. We then explore normalization techniques, such as Layer Normalization and Batch Normalization, and briefly mention other normalization methods like Instance Norm and Group Norm.
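To make the LSUV idea concrete, here is a minimal sketch (not the lesson's exact implementation), assuming a PyTorch model, a list of layers to initialize, and a representative batch `xb`: each layer's output statistics are measured with a forward hook, then its weights and bias are adjusted until that output has zero mean and unit standard deviation.

```python
import torch

def lsuv_(model, layers, xb, tol=1e-3, max_iters=10):
    # Layer-wise Sequential Unit Variance, roughly: for each layer in turn,
    # run a real batch through the model and rescale that layer's weights
    # (and shift its bias) until its output has zero mean and unit std.
    stats = {}
    def hook(module, inp, out):
        stats['mean'], stats['std'] = out.mean().item(), out.std().item()

    for layer in layers:
        handle = layer.register_forward_hook(hook)
        for _ in range(max_iters):
            with torch.no_grad():
                model(xb)
                if abs(stats['std'] - 1.0) < tol and abs(stats['mean']) < tol:
                    break
                if layer.bias is not None:
                    layer.bias -= stats['mean']
                layer.weight /= stats['std']
        handle.remove()
```

Because it measures actual activations on real data, this works regardless of the activation function used, which is what makes LSUV a "good init for any network".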
Finally, we experiment with different batch sizes, learning rates, and optimizers like Accelerated SGD, RMSProp, and Adam to improve performance.
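For reference, a hedged from-scratch sketch of the Adam-style update the lesson builds towards (this is not the lesson's or PyTorch's actual implementation; hyperparameter defaults are the commonly used ones):

```python
import torch

class Adam:
    # From-scratch Adam-style optimizer: momentum-smoothed gradients divided
    # by the square root of an exponential average of squared gradients.
    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = [p for p in params]
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.avgs = [torch.zeros_like(p) for p in self.params]   # running grad average
        self.sqrs = [torch.zeros_like(p) for p in self.params]   # running squared-grad average

    def step(self):
        self.t += 1
        with torch.no_grad():
            for p, avg, sqr in zip(self.params, self.avgs, self.sqrs):
                if p.grad is None:
                    continue
                avg.mul_(self.beta1).add_(p.grad, alpha=1 - self.beta1)
                sqr.mul_(self.beta2).addcmul_(p.grad, p.grad, value=1 - self.beta2)
                # debias: both averages start at zero, so early steps are rescaled
                avg_hat = avg / (1 - self.beta1 ** self.t)
                sqr_hat = sqr / (1 - self.beta2 ** self.t)
                p -= self.lr * avg_hat / (sqr_hat.sqrt() + self.eps)

    def zero_grad(self):
        for p in self.params:
            p.grad = None
```

Roughly speaking, dropping the squared-gradient term leaves SGD with momentum (accelerated SGD), while setting `beta1=0` leaves an RMSProp-style update.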
Concepts discussed
- Callback class and TrainLearner subclass
- HooksCallback and ActivationStats
- Glorot (Xavier) initialization
- Variance, standard deviation, and covariance
- General ReLU activation function
- Layer-wise Sequential Unit Variance (LSUV)
- Layer Normalization and Batch Normalization
- Instance Norm and Group Norm
- Accelerated SGD, RMSProp, and Adam optimizers
- Experimenting with batch sizes and learning rates
Papers from the lesson
- Understanding the difficulty of training deep feedforward neural networks - Xavier Glorot, Yoshua Bengio
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification - Kaiming He et al.
- All you need is a good init (LSUV) - Dmytro Mishkin, Jiri Matas
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift - Sergey Ioffe, Christian Szegedy
- Layer Normalization - Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton