17: Initialization/normalization
In this lesson, we discuss the importance of weight initialization in neural networks and explore various techniques to improve training. We start by introducing changes to the miniai library and demonstrate the use of HooksCallback and ActivationStats for visualizing layer statistics during training. We then dive into why layer activations should have zero mean and unit standard deviation, and introduce Glorot (Xavier) initialization.
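As a rough illustration of the Glorot rule (not the lesson's own code; the helper name `glorot_init_` is ours), the weights of a linear layer are drawn with standard deviation `sqrt(2 / (fan_in + fan_out))`. PyTorch's built-in `torch.nn.init.xavier_normal_` applies the same rule.

```python
import math
import torch
from torch import nn

def glorot_init_(lin: nn.Linear):
    # Glorot/Xavier normal: std = sqrt(2 / (fan_in + fan_out)), chosen so that
    # activation variance is roughly preserved in both the forward and
    # backward passes through the layer.
    std = math.sqrt(2.0 / (lin.in_features + lin.out_features))
    with torch.no_grad():
        lin.weight.normal_(0.0, std)
        if lin.bias is not None:
            lin.bias.zero_()

lin = nn.Linear(784, 50)
glorot_init_(lin)
x = torch.randn(64, 784)      # inputs already have zero mean, unit std
print(lin(x).std())           # stays near 1 instead of shrinking or exploding
```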
We also cover variance, standard deviation, and covariance, and their significance in understanding relationships between data points. We create a General ReLU activation function and discuss the Layer-wise Sequential Unit Variance (LSUV) technique for initializing any neural network. We then explore normalization techniques, such as Layer Normalization and Batch Normalization, and briefly mention other normalization methods like Instance Norm and Group Norm.
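To make the LSUV idea concrete, here is a minimal sketch (not the lesson's exact implementation), assuming a PyTorch model, a list of layers to initialize, and a representative batch `xb`: each layer's output statistics are measured with a forward hook, then its weights and bias are adjusted until that output has zero mean and unit standard deviation.

```python
import torch

def lsuv_(model, layers, xb, tol=1e-3, max_iters=10):
    # Layer-wise Sequential Unit Variance, roughly: for each layer in turn,
    # run a real batch through the model and rescale that layer's weights
    # (and shift its bias) until its output has zero mean and unit std.
    stats = {}
    def hook(module, inp, out):
        stats['mean'], stats['std'] = out.mean().item(), out.std().item()

    for layer in layers:
        handle = layer.register_forward_hook(hook)
        for _ in range(max_iters):
            with torch.no_grad():
                model(xb)
                if abs(stats['std'] - 1.0) < tol and abs(stats['mean']) < tol:
                    break
                if layer.bias is not None:
                    layer.bias -= stats['mean']
                layer.weight /= stats['std']
        handle.remove()
```

Because it measures actual activations on real data, this works regardless of the activation function used, which is what makes LSUV a "good init for any network".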
Finally, we experiment with different batch sizes, learning rates, and optimizers like Accelerated SGD, RMSProp, and Adam to improve performance.
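For reference, a hedged from-scratch sketch of the Adam-style update the lesson builds towards (this is not the lesson's or PyTorch's actual implementation; hyperparameter defaults are the commonly used ones):

```python
import torch

class Adam:
    # From-scratch Adam-style optimizer: momentum-smoothed gradients divided
    # by the square root of an exponential average of squared gradients.
    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = [p for p in params]
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.avgs = [torch.zeros_like(p) for p in self.params]   # running grad average
        self.sqrs = [torch.zeros_like(p) for p in self.params]   # running squared-grad average

    def step(self):
        self.t += 1
        with torch.no_grad():
            for p, avg, sqr in zip(self.params, self.avgs, self.sqrs):
                if p.grad is None:
                    continue
                avg.mul_(self.beta1).add_(p.grad, alpha=1 - self.beta1)
                sqr.mul_(self.beta2).addcmul_(p.grad, p.grad, value=1 - self.beta2)
                # debias: both averages start at zero, so early steps are rescaled
                avg_hat = avg / (1 - self.beta1 ** self.t)
                sqr_hat = sqr / (1 - self.beta2 ** self.t)
                p -= self.lr * avg_hat / (sqr_hat.sqrt() + self.eps)

    def zero_grad(self):
        for p in self.params:
            p.grad = None
```

Roughly speaking, dropping the squared-gradient term leaves SGD with momentum (accelerated SGD), while setting `beta1=0` leaves an RMSProp-style update.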
Concepts discussed
- Callback class and TrainLearner subclass
- HooksCallback and ActivationStats
- Glorot (Xavier) initialization
- Variance, standard deviation, and covariance
- General ReLU activation function
- Layer-wise Sequential Unit Variance (LSUV)
- Layer Normalization and Batch Normalization
- Instance Norm and Group Norm
- Accelerated SGD, RMSProp, and Adam optimizers
- Experimenting with batch sizes and learning rates
Papers from the lesson
- Understanding the difficulty of training deep feedforward neural networks - Xavier Glorot, Yoshua Bengio
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification - Kaiming He et al.
- All you need is a good init (LSUV) - Dmytro Mishkin, Jiri Matas
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift - Sergey Ioffe, Christian Szegedy
- Layer Normalization - Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton