16: The Learner framework

In Lesson 16, we build a flexible training framework called the Learner. We start with a basic callbacks Learner, an intermediate step towards the fully flexible version. We introduce callbacks, which are functions or classes called at specific points during training, and demonstrate how to create a simple one. We also introduce three exceptions, CancelFitException, CancelEpochException, and CancelBatchException, which a callback can raise to skip the rest of a fit, an epoch, or a batch.
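To make the callback and cancellation idea concrete, here is a minimal pure-Python sketch. It is not the lesson's actual Learner: `ToyLearner`, `StopAfterNBatchesCB`, and `SkipBatchCB` are hypothetical names, and the "training step" just records which batch was processed. Only the three exception names come from the lesson.

```python
class CancelFitException(Exception): pass
class CancelEpochException(Exception): pass
class CancelBatchException(Exception): pass


class StopAfterNBatchesCB:
    """Hypothetical callback: end the whole fit after n completed batches."""
    def __init__(self, n): self.n = n
    def before_fit(self, learn): self.count = 0
    def after_batch(self, learn):
        self.count += 1
        if self.count >= self.n: raise CancelFitException()


class SkipBatchCB:
    """Hypothetical callback: skip one particular batch."""
    def __init__(self, skip): self.skip = skip
    def before_batch(self, learn):
        if learn.batch == self.skip: raise CancelBatchException()


class ToyLearner:
    """Assumed stand-in for a Learner; it only tracks processed batches."""
    def __init__(self, batches, cbs):
        self.batches, self.cbs, self.seen = batches, cbs, []

    def callback(self, name):
        # Call the method called `name` on every callback that defines it.
        for cb in self.cbs:
            method = getattr(cb, name, None)
            if method is not None: method(self)

    def one_batch(self):
        try:
            self.callback("before_batch")
            self.seen.append(self.batch)  # stand-in for forward/backward/step
            self.callback("after_batch")
        except CancelBatchException: pass

    def one_epoch(self):
        try:
            self.callback("before_epoch")
            for self.batch in self.batches: self.one_batch()
            self.callback("after_epoch")
        except CancelEpochException: pass

    def fit(self, n_epochs):
        try:
            self.callback("before_fit")
            for self.epoch in range(n_epochs): self.one_epoch()
            self.callback("after_fit")
        except CancelFitException: pass


learn = ToyLearner(batches=[0, 1, 2, 3, 4],
                   cbs=[SkipBatchCB(skip=1), StopAfterNBatchesCB(3)])
learn.fit(n_epochs=2)
print(learn.seen)  # [0, 2, 3]
```

Each exception is caught at the level it names, so raising CancelBatchException abandons only the current batch, while CancelFitException propagates through the batch and epoch handlers and ends training.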

Next, we explore metrics and create a MetricsCB callback that prints metrics during training. We introduce the torcheval library and create a DeviceCB callback that moves the model and data to the appropriate device. We then refactor the Learner with a context manager, which simplifies the code and makes it easier to maintain and to add callbacks later.
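The context-manager refactor and a metrics callback can be sketched together in plain Python. This is an illustration under stated assumptions, not the lesson's code: `WeightedMean` is a hand-rolled stand-in that mimics the reset/update/compute interface of torcheval metrics, `cb_ctx` and `ToyLearner` are hypothetical names, and each "batch" is just a precomputed (loss, batch size) pair.

```python
from contextlib import contextmanager


class CancelBatchException(Exception): pass
class CancelEpochException(Exception): pass

# Map each training stage to the exception that cancels it.
CANCEL = {"batch": CancelBatchException, "epoch": CancelEpochException}


class WeightedMean:
    """Hand-rolled stand-in with a reset/update/compute interface."""
    def __init__(self): self.reset()
    def reset(self): self.total, self.count = 0.0, 0
    def update(self, value, weight=1):
        self.total += value * weight
        self.count += weight
    def compute(self): return self.total / self.count if self.count else 0.0


class MetricsCB:
    """Tracks the size-weighted mean loss of each epoch."""
    def __init__(self): self.loss, self.history = WeightedMean(), []
    def before_epoch(self, learn): self.loss.reset()
    def after_batch(self, learn): self.loss.update(learn.loss, weight=learn.batch_size)
    def after_epoch(self, learn):
        self.history.append(self.loss.compute())
        print(f"epoch {learn.epoch}: loss {self.history[-1]:.3f}")


class ToyLearner:
    def __init__(self, batches, cbs): self.batches, self.cbs = batches, cbs

    def callback(self, name):
        for cb in self.cbs:
            method = getattr(cb, name, None)
            if method is not None: method(self)

    @contextmanager
    def cb_ctx(self, name):
        # Run before_/after_ callbacks around the block and absorb
        # the matching cancel exception in one place.
        try:
            self.callback(f"before_{name}")
            yield
            self.callback(f"after_{name}")
        except CANCEL[name]: pass

    def fit(self, n_epochs):
        for self.epoch in range(n_epochs):
            with self.cb_ctx("epoch"):
                for self.loss, self.batch_size in self.batches:
                    with self.cb_ctx("batch"):
                        pass  # forward, loss, backward, step would go here


metrics = MetricsCB()
learn = ToyLearner(batches=[(1.0, 2), (0.5, 2), (0.25, 4)], cbs=[metrics])
learn.fit(n_epochs=2)
```

With the try/except logic living in a single context manager, adding a new stage or a new callback hook does not require repeating the error-handling boilerplate in every method.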

We then look inside models to diagnose and fix problems during training. We introduce a set_seed function for reproducibility and train a model with a high learning rate of 0.6 to test the stability of training. Finally, we analyze training by tracking the mean and standard deviation of each layer’s activations, captured with PyTorch hooks, and by building histograms of those activations.
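As a rough sketch of the activation analysis, the following registers a PyTorch forward hook on each layer of a small model and records the mean, standard deviation, and a histogram of its outputs. The `ActivationStats` class, the tiny model, the random inputs, and the histogram range are all assumptions for illustration, and this `set_seed` is a simplified version that only seeds PyTorch and Python's `random` module.

```python
import random

import torch
from torch import nn


def set_seed(seed):
    """Simplified seeding helper (assumption: the lesson's version may do more)."""
    torch.manual_seed(seed)
    random.seed(seed)


class ActivationStats:
    """Records per-call statistics of one module's output via a forward hook."""
    def __init__(self, module):
        self.means, self.stds, self.hists = [], [], []
        self.handle = module.register_forward_hook(self._record)

    def _record(self, module, inputs, output):
        acts = output.detach().float()
        self.means.append(acts.mean().item())
        self.stds.append(acts.std().item())
        # 40-bin histogram of absolute activations in [0, 10];
        # values outside that range are ignored by histc.
        self.hists.append(torch.histc(acts.abs(), bins=40, min=0, max=10))

    def remove(self):
        self.handle.remove()


set_seed(42)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
stats = [ActivationStats(layer) for layer in model]

for _ in range(3):
    model(torch.randn(8, 10))  # each forward pass triggers every hook once

for s in stats:
    s.remove()  # always remove hooks once the statistics are collected

for i, s in enumerate(stats):
    print(f"layer {i}: mean {s.means[-1]:.3f}, std {s.stds[-1]:.3f}")
```

Stacking each layer's histograms over training steps gives the kind of picture used to spot activations collapsing towards zero or blowing up.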

Concepts discussed

  • Building a flexible training framework
  • Basic Callbacks Learner
  • Callbacks and exceptions (CancelFitException, CancelEpochException, CancelBatchException)
  • Metrics and MetricsCB callback
  • torcheval library
  • DeviceCB callback
  • Refactoring code with context managers
  • set_seed function
  • Analyzing the training process
  • PyTorch hooks
  • Histograms of activations

Video

Lesson resources