18: Accelerated SGD & ResNets

In this lesson, we dive into several approaches to accelerating stochastic gradient descent (SGD): momentum, RMSProp, and Adam. We start by experimenting with these techniques in Microsoft Excel, setting up a simple linear regression problem and solving it with each approach in turn. We also introduce learning rate annealing and show how to implement it in Excel. Next, we explore learning rate schedulers in PyTorch, focusing on Cosine Annealing and on how schedulers interact with PyTorch optimizers. To get an optimizer to inspect, we create a learner with a single batch callback and fit the model. We then look at the optimizer's attributes and explain the concept of parameter groups.
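To make the three update rules concrete, here is a minimal pure-Python sketch that fits a line with each of them. It is not a transcription of the lesson's spreadsheet: the toy data (`y = 2x + 3`), the function names, and the hyperparameters are all assumptions chosen for illustration. The momentum variant uses the exponentially weighted average form (some libraries, such as `torch.optim.SGD`, accumulate gradients without the `(1 - beta)` factor), and the Adam variant includes the usual bias correction.

```python
import math

# Toy, noise-free data for y = 2x + 3 (hypothetical values for illustration).
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [2.0 * x + 3.0 for x in XS]

def grads(a, b):
    """Gradients of the mean squared error for the model y_hat = a*x + b."""
    n = len(XS)
    errs = [a * x + b - y for x, y in zip(XS, YS)]
    return [2 * sum(e * x for e, x in zip(errs, XS)) / n, 2 * sum(errs) / n]

def fit_momentum(steps=300, lr=0.5, beta=0.9):
    """SGD with momentum: step along a moving average of past gradients."""
    p, avg = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        g = grads(*p)
        for i in range(2):
            avg[i] = beta * avg[i] + (1 - beta) * g[i]
            p[i] -= lr * avg[i]
    return p

def fit_rmsprop(steps=1000, lr=0.01, beta=0.9, eps=1e-8):
    """RMSProp: divide each gradient by a moving average of its magnitude."""
    p, sq = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        g = grads(*p)
        for i in range(2):
            sq[i] = beta * sq[i] + (1 - beta) * g[i] ** 2
            p[i] -= lr * g[i] / (math.sqrt(sq[i]) + eps)
    return p

def fit_adam(steps=1000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum and RMSProp combined, with bias-corrected averages."""
    p, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        g = grads(*p)
        for i in range(2):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # undo the bias toward zero
            v_hat = v[i] / (1 - beta2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p

for name, fit in [("momentum", fit_momentum), ("rmsprop", fit_rmsprop), ("adam", fit_adam)]:
    a, b = fit()
    print(f"{name}: a={a:.3f}, b={b:.3f}")
```

Each function should land close to `a = 2`, `b = 3`. With a constant learning rate, RMSProp tends to hover around the answer rather than settle on it exactly, which is one motivation for annealing the learning rate.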

We continue by implementing the OneCycleLR scheduler from PyTorch, which adjusts both the learning rate and the momentum during training. We also discuss how to improve a neural network's architecture by making it deeper and wider, which leads us to ResNets and the concept of residual connections. Finally, we explore various ResNet architectures from the PyTorch Image Models (timm) library and experiment with data augmentation techniques, such as random erasing and test time augmentation.
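The idea behind a residual connection can be sketched in a few lines: the block computes `x + f(x)` rather than `f(x)` alone, so the input always has a direct path to the output. The toy below is an assumption-laden illustration, not the lesson's code: real ResNets use convolutional layers, whereas this sketch uses small dense layers in pure Python, and the names `matvec`, `relu`, and `residual_block` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, vi) for vi in v]

def residual_block(x, W1, W2):
    """Compute relu(x + W2 @ relu(W1 @ x)): two layers plus an identity skip path."""
    h = relu(matvec(W1, x))
    fx = matvec(W2, h)
    return relu([xi + fi for xi, fi in zip(x, fx)])

x = [1.0, 2.0, 3.0]
W1 = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4], [0.2, 0.2, 0.2]]
W_zero = [[0.0] * 3 for _ in range(3)]

# With the second layer zeroed out, f(x) is 0 and the block passes x through.
out = residual_block(x, W1, W_zero)
print(out)
```

Because the block reduces to the identity when its second layer contributes nothing, stacking more such blocks need not make a network harder to train than its shallower counterpart, which is the usual motivation given for residual connections.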

Concepts discussed

  • Accelerated stochastic gradient descent (SGD) approaches
    • Momentum
    • RMSProp
    • Adam
  • Learning rate annealing
  • PyTorch learning rate schedulers
    • Cosine Annealing
    • OneCycleLR
  • Working with PyTorch optimizers
  • Neural network architecture improvements
    • Deeper and wider networks
    • ResNets
    • Residual connections
  • Data augmentation techniques
    • Random erasing
    • Test time augmentation
  • Creating custom schedulers and experimenting with model performance
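As a rough illustration of what the schedulers listed above compute, the following pure-Python sketch writes cosine annealing and a one-cycle-style schedule as plain functions of the step number. The helper names are hypothetical. The `one_cycle` defaults (`pct_start=0.3`, `div=25.0`, `final_div=1e4`) are loosely modeled on the defaults of PyTorch's `OneCycleLR`, but the sketch is not step-for-step identical to that class and only covers the learning rate.

```python
import math

def cos_interp(start, end, pct):
    """Move from `start` (pct=0) to `end` (pct=1) along half a cosine wave."""
    return end + 0.5 * (start - end) * (1 + math.cos(math.pi * pct))

def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min over total_steps."""
    return cos_interp(lr_max, lr_min, step / total_steps)

def one_cycle(step, total_steps, lr_max, pct_start=0.3, div=25.0, final_div=1e4):
    """Warm up from lr_max/div to lr_max, then anneal to a much smaller value."""
    lr_start = lr_max / div
    lr_end = lr_start / final_div
    warm = pct_start * total_steps  # number of warm-up steps
    if step < warm:
        return cos_interp(lr_start, lr_max, step / warm)
    return cos_interp(lr_max, lr_end, (step - warm) / (total_steps - warm))

# One learning rate per step for a 100-step run with a peak of 0.1.
cos_schedule = [cosine_annealing(s, 100, 0.1) for s in range(101)]
cycle_schedule = [one_cycle(s, 100, 0.1) for s in range(101)]
```

The same `cos_interp` helper could drive momentum in the opposite direction (high, then low, then high again), which is how a one-cycle policy typically pairs the two quantities.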


Lesson resources