ExponentialLR


class braintools.optim.ExponentialLR(base_lr=0.001, gamma=0.95, last_epoch=0)

Exponential learning rate scheduler: decays the learning rate by a constant factor every epoch.

ExponentialLR multiplies the learning rate by gamma at every epoch, creating a smooth exponential decay. This scheduler is useful when you want a continuous and predictable decrease in the learning rate throughout training.

Parameters:
  • base_lr (float | List[float]) – Initial learning rate(s). Can be a single float or a list of floats for multiple parameter groups. Default: 1e-3.

  • gamma (float) – Multiplicative factor of learning rate decay per epoch. Must be in range (0, 1). Typical values: 0.95-0.99 for slow decay, 0.9-0.95 for moderate decay.

  • last_epoch (int) – The index of the last epoch. Used for resuming training. Default: 0 (starts from the beginning), as shown in the class signature.

Notes

The learning rate at epoch \(t\) is computed as:

\[\eta_t = \eta_0 \cdot \gamma^t\]

where \(\eta_0\) is the initial learning rate (base_lr) and \(t\) is the current epoch number.
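
The closed form above is equivalent to multiplying the learning rate by gamma once per epoch. The following is a minimal plain-Python sketch of that equivalence; the helper names `exponential_lr` and `exponential_lr_iterative` are hypothetical and are not part of braintools:

```python
import math


def exponential_lr(base_lr: float, gamma: float, epoch: int) -> float:
    """Closed form: eta_t = eta_0 * gamma ** t."""
    return base_lr * gamma ** epoch


def exponential_lr_iterative(base_lr: float, gamma: float, epochs: int) -> list:
    """Recurrence: eta_{t+1} = eta_t * gamma, starting from eta_0."""
    lr = base_lr
    values = [lr]
    for _ in range(epochs):
        lr *= gamma
        values.append(lr)
    return values


# Illustrative values only: base_lr=0.1, gamma=0.95.
schedule = exponential_lr_iterative(0.1, 0.95, 15)
for t in (0, 5, 10, 15):
    # Both formulations agree up to floating-point rounding.
    assert math.isclose(schedule[t], exponential_lr(0.1, 0.95, t))
    print(f"epoch {t:2d}: lr = {exponential_lr(0.1, 0.95, t):.6f}")
```

Because the value at epoch t depends only on base_lr, gamma and t, the schedule can be evaluated at any epoch without replaying the earlier ones.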

Key characteristics:

  • Smooth exponential decay every epoch

  • Learning rate decreases monotonically, with no abrupt drops

  • Simple one-parameter control (gamma)

  • Decay rate is constant on a logarithmic scale: log(lr) falls linearly with the epoch number

Gamma selection guidelines:

  • gamma=0.95: Moderate decay, lr halves every ~14 epochs

  • gamma=0.96: Gentle decay, lr halves every ~17 epochs

  • gamma=0.98: Slow decay, lr halves every ~34 epochs

  • gamma=0.99: Very slow decay, lr halves every ~69 epochs
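
The halving times in this list follow from solving gamma^t = 1/2, which gives t = ln(0.5) / ln(gamma). A small sketch for checking other gamma values is shown here; `half_life_epochs` is a hypothetical helper, not a braintools function:

```python
import math


def half_life_epochs(gamma: float) -> float:
    """Number of epochs after which the learning rate has halved."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be in the open interval (0, 1)")
    # Solve gamma ** t == 0.5 for t.
    return math.log(0.5) / math.log(gamma)


for gamma in (0.95, 0.96, 0.98, 0.99):
    print(f"gamma={gamma}: lr halves every ~{half_life_epochs(gamma):.1f} epochs")
```

The same formula can be inverted to pick gamma from a target half-life: gamma = 0.5 ** (1 / t).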

When to use:

  • When you want smooth, continuous learning rate reduction

  • For fine-tuning with gradual decay

  • When step-based schedules are too abrupt

  • For long training runs with gradual convergence

Examples

Basic exponential decay:

>>> import braintools
>>> import brainstate
>>>
>>> model = brainstate.nn.Linear(10, 5)
>>> # Decay by 0.95 each epoch
>>> scheduler = braintools.optim.ExponentialLR(base_lr=0.1, gamma=0.95)
>>> optimizer = braintools.optim.SGD(lr=scheduler, momentum=0.9)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> for epoch in range(20):
...     # Training code
...     if epoch % 5 == 0:
...         print(f"Epoch {epoch}: lr = {optimizer.current_lr:.6f}")
...     scheduler.step()  # advance the schedule after reporting this epoch's lr
Epoch 0: lr = 0.100000
Epoch 5: lr = 0.077378  # 0.1 * 0.95^5
Epoch 10: lr = 0.059874  # 0.1 * 0.95^10
Epoch 15: lr = 0.046329  # 0.1 * 0.95^15

Slow decay for fine-tuning:

>>> # Very gentle decay with gamma=0.99
>>> scheduler = braintools.optim.ExponentialLR(base_lr=0.001, gamma=0.99)
>>> optimizer = braintools.optim.Adam(lr=scheduler)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> for epoch in range(100):
...     finetune_epoch(model, optimizer, finetune_loader)  # user-defined training routine and data loader
...     scheduler.step()
# After 100 epochs: lr ≈ 0.001 * 0.99^100 ≈ 0.000366

Moderate decay for standard training:

>>> # Moderate decay with gamma=0.96
>>> scheduler = braintools.optim.ExponentialLR(base_lr=0.1, gamma=0.96)
>>> optimizer = braintools.optim.SGD(lr=scheduler, momentum=0.9, weight_decay=1e-4)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> for epoch in range(50):
...     optimizer.step(grads)  # grads: gradients computed by your training step
...     scheduler.step()
# lr smoothly decreases from 0.1 to ~0.013

Combining with warmup:

>>> # Warmup followed by exponential decay
>>> warmup = braintools.optim.LinearLR(
...     start_factor=0.1,
...     end_factor=1.0,
...     total_iters=5
... )
>>> decay = braintools.optim.ExponentialLR(base_lr=0.01, gamma=0.95)
>>> scheduler = braintools.optim.ChainedScheduler([warmup, decay])
>>>
>>> optimizer = braintools.optim.Adam(lr=scheduler)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> for epoch in range(100):
...     optimizer.step(grads)  # grads: gradients computed by your training step
...     scheduler.step()

Using with different optimizers:

>>> # Works with any optimizer
>>> scheduler = braintools.optim.ExponentialLR(base_lr=0.001, gamma=0.98)
>>>
>>> # With Adam
>>> adam_opt = braintools.optim.Adam(lr=scheduler)
>>> adam_opt.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> # Or with RMSprop
>>> model2 = brainstate.nn.Linear(10, 5)
>>> scheduler2 = braintools.optim.ExponentialLR(base_lr=0.001, gamma=0.98)
>>> rmsprop_opt = braintools.optim.RMSprop(lr=scheduler2)
>>> rmsprop_opt.register_trainable_weights(model2.states(brainstate.ParamState))

Saving and loading state:

>>> scheduler = braintools.optim.ExponentialLR(base_lr=0.1, gamma=0.95)
>>> optimizer = braintools.optim.SGD(lr=scheduler, momentum=0.9)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> # Train for some epochs
>>> for epoch in range(50):
...     scheduler.step()
>>>
>>> # Save checkpoint
>>> checkpoint = {
...     'epoch': 50,
...     'model': model.state_dict(),
...     'scheduler': scheduler.state_dict(),
... }
>>>
>>> # Resume training
>>> new_scheduler = braintools.optim.ExponentialLR(base_lr=0.1, gamma=0.95)
>>> new_scheduler.load_state_dict(checkpoint['scheduler'])
>>> # lr will be correctly set to 0.1 * 0.95^50
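
Resuming works because the schedule is fully determined by base_lr, gamma and the epoch counter. The sketch below illustrates this with a hypothetical stand-in class, `ToyExponentialSchedule`; it is not the braintools implementation, and the contents of its state dictionary are an assumption made for illustration:

```python
import math


class ToyExponentialSchedule:
    """Hypothetical stand-in for illustration; not braintools.optim.ExponentialLR."""

    def __init__(self, base_lr: float, gamma: float, last_epoch: int = 0):
        self.base_lr = base_lr
        self.gamma = gamma
        self.last_epoch = last_epoch

    def step(self) -> None:
        # Advance the epoch counter by one.
        self.last_epoch += 1

    def get_lr(self) -> float:
        # Closed form: eta_t = eta_0 * gamma ** t.
        return self.base_lr * self.gamma ** self.last_epoch

    def state_dict(self) -> dict:
        # Assumption: the epoch counter is the only state that must be saved.
        return {"last_epoch": self.last_epoch}

    def load_state_dict(self, state: dict) -> None:
        self.last_epoch = state["last_epoch"]


# Run 50 epochs, then save the state.
original = ToyExponentialSchedule(base_lr=0.1, gamma=0.95)
for _ in range(50):
    original.step()
saved = original.state_dict()

# A fresh schedule with the same hyperparameters picks up where the first left off.
resumed = ToyExponentialSchedule(base_lr=0.1, gamma=0.95)
resumed.load_state_dict(saved)
assert math.isclose(resumed.get_lr(), 0.1 * 0.95 ** 50)
print(f"resumed lr = {resumed.get_lr():.6f}")
```

Note that the new schedule must be constructed with the same base_lr and gamma as the original in this sketch, since only the epoch counter is restored.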

Aggressive decay:

>>> # Fast decay with gamma=0.9
>>> scheduler = braintools.optim.ExponentialLR(base_lr=0.1, gamma=0.9)
>>> optimizer = braintools.optim.SGD(lr=scheduler, momentum=0.9)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
>>>
>>> for epoch in range(30):
...     optimizer.step(grads)  # grads: gradients computed by your training step
...     scheduler.step()
# After 30 epochs: lr ≈ 0.1 * 0.9^30 ≈ 0.00424

See also

  • StepLR – Step-wise learning rate decay

  • CosineAnnealingLR – Cosine annealing schedule

  • MultiStepLR – Multi-step learning rate decay


get_lr()

Calculate learning rate.