17: Initialization/normalization

In this lesson, we discuss the importance of weight initialization in neural networks and explore various techniques to improve training. We start by introducing changes to the miniai library and demonstrating the use of HooksCallback and ActivationStats for visualizing activation statistics during training. We then dive into why activations should have zero mean and unit standard deviation, and introduce Glorot (Xavier) initialization.
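
To make the zero-mean/unit-variance point concrete, here is a rough sketch in plain PyTorch (not the notebook's code; the width, depth, and scales chosen are arbitrary). It pushes a unit-variance vector through a stack of random matrix multiplications: badly scaled weights make the activations vanish or explode, while the Glorot scale sqrt(2 / (fan_in + fan_out)) keeps them stable.

    import torch

    # Why the initialization scale matters: repeated linear layers shrink or
    # blow up unit-variance activations unless the weight std is chosen well.
    # Glorot (Xavier) init uses std = sqrt(2 / (fan_in + fan_out)).
    torch.manual_seed(42)
    nf = 512                                   # arbitrary layer width
    x = torch.randn(nf)                        # zero mean, unit std input

    for name, std in [("too small", 0.01),
                      ("too big", 0.1),
                      ("glorot", (2.0 / (nf + nf)) ** 0.5)]:
        a = x.clone()
        for _ in range(40):                    # 40 "layers" of plain matmul
            a = (torch.randn(nf, nf) * std) @ a
        print(name, a.std().item())            # tiny, huge, and roughly 1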

We also cover variance, standard deviation, and covariance, and how they describe the spread of data and the relationships between variables. We create a General ReLU activation function and discuss the Layer-wise Sequential Unit Variance (LSUV) technique for initializing any neural network. We explore normalization techniques such as Layer Normalization and Batch Normalization, and briefly mention other normalization methods like Instance Norm and Group Norm.
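
As a minimal sketch of the LSUV idea (an assumed helper, not miniai's actual API): a forward hook captures each layer's output on a real batch, and the layer's weights and bias are rescaled and shifted until that output has roughly unit standard deviation and zero mean. Because it measures actual activations, the same procedure applies to any architecture or activation function.

    import torch
    import torch.nn as nn

    def lsuv_init(model, layers, xb, tol=1e-3, max_iters=10):
        # Capture each target layer's output with a forward hook.
        acts = {}
        def hook(mod, inp, out): acts[mod] = out
        handles = [l.register_forward_hook(hook) for l in layers]
        with torch.no_grad():
            for l in layers:                   # one layer at a time, in order
                for _ in range(max_iters):
                    model(xb)                  # run a real batch through
                    mean, std = acts[l].mean(), acts[l].std()
                    if abs(mean) < tol and abs(std - 1) < tol: break
                    l.weight /= std            # push output std toward 1
                    if l.bias is not None: l.bias -= mean   # push mean toward 0
        for h in handles: h.remove()

    # Toy usage on an arbitrary model and a random batch.
    model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))
    linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
    lsuv_init(model, linears, torch.randn(256, 20))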

Finally, we experiment with different batch sizes, learning rates, and optimizers like Accelerated SGD, RMSProp, and Adam to improve performance.
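
The update rules behind those optimizers can be sketched with a single parameter and plain Python floats (illustrative hyperparameters only; in practice you would use torch.optim or the optimizer classes built in the lesson). Momentum-style accelerated SGD keeps a running average of gradients, RMSProp divides by the root of a running average of squared gradients, and Adam combines the two ideas.

    # Toy, single-parameter versions of the update rules; values are illustrative.
    lr, beta1, beta2, eps = 0.1, 0.9, 0.99, 1e-8

    def sgd_momentum_step(p, grad, state):
        # accelerated SGD: step along a running (momentum) average of gradients
        state["avg"] = beta1 * state.get("avg", 0.0) + grad
        return p - lr * state["avg"]

    def rmsprop_step(p, grad, state):
        # divide by the root of a running average of squared gradients
        state["sqr"] = beta2 * state.get("sqr", 0.0) + (1 - beta2) * grad ** 2
        return p - lr * grad / (state["sqr"] ** 0.5 + eps)

    def adam_step(p, grad, state):
        # Adam combines both averages (bias correction omitted for brevity)
        state["avg"] = beta1 * state.get("avg", 0.0) + (1 - beta1) * grad
        state["sqr"] = beta2 * state.get("sqr", 0.0) + (1 - beta2) * grad ** 2
        return p - lr * state["avg"] / (state["sqr"] ** 0.5 + eps)

    # Minimise f(p) = (p - 3)^2 with each rule; the gradient is 2 * (p - 3).
    for step in (sgd_momentum_step, rmsprop_step, adam_step):
        p, state = 0.0, {}
        for _ in range(100):
            p = step(p, 2 * (p - 3), state)
        print(step.__name__, round(p, 3))      # all three approach p = 3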

Concepts discussed

  • Callback class and TrainLearner subclass
  • HooksCallback and ActivationStats
  • Glorot (Xavier) initialization
  • Variance, standard deviation, and covariance
  • General ReLU activation function
  • Layer-wise Sequential Unit Variance (LSUV)
  • Layer Normalization and Batch Normalization
  • Instance Norm and Group Norm
  • Accelerated SGD, RMSProp, and Adam optimizers
  • Experimenting with batch sizes and learning rates

Video

Papers from the lesson