CyclicLR
- class braintools.optim.CyclicLR(base_lr=0.001, max_lr=0.01, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', last_epoch=0)
Cyclic learning rate scheduler - Oscillates learning rate between bounds.
CyclicLR implements a learning rate schedule that cyclically varies between a minimum (base_lr) and maximum (max_lr) learning rate. This helps the optimizer explore different regions of the loss landscape and can lead to better convergence and generalization. The policy was originally proposed for faster convergence without extensive hyperparameter tuning.
- Parameters:
base_lr (float | List[float]) – Lower learning rate boundary of each cycle, i.e. the minimum learning rate during the cycle. Can be a single float or a list for multiple parameter groups. Default: 1e-3.
max_lr (float | List[float]) – Upper learning rate boundary of each cycle. The learning rate oscillates between base_lr and max_lr. Can be a single float or a list. Default: 1e-2.
step_size_up (int) – Number of iterations in the increasing half of a cycle. Default: 2000.
step_size_down (int | None) – Number of iterations in the decreasing half of a cycle. If None, it is set equal to step_size_up. Default: None.
mode (str) – One of {'triangular', 'triangular2', 'exp_range'}. Values correspond to the policies detailed below:
'triangular': Basic triangular cycle without amplitude scaling.
'triangular2': Basic triangular cycle that scales the amplitude by half each cycle.
'exp_range': Triangular cycle that scales the amplitude by gamma^(cycle iterations).
Default: 'triangular'.
gamma (float) – Constant used in 'exp_range' mode for multiplicative scaling. gamma^(cycle iterations) gives the scaling factor. Default: 1.0.
scale_fn (Callable | None) – Custom scaling function y = scale_fn(x), where x is the cycle number or the cycle iteration count, as selected by scale_mode. Overrides the mode parameter. Default: None.
scale_mode (str) – One of {'cycle', 'iterations'}. Determines whether scale_fn receives the cycle number or the cycle iterations as input. Default: 'cycle'.
last_epoch (int) – The index of the last epoch. Used when resuming training. Default: 0.
Notes
Mathematical Formulation:
The learning rate oscillates according to:
\[\text{lr} = \text{base_lr} + (\text{max_lr} - \text{base_lr}) \times \max(0, 1 - |x - 1|) \times \text{scale}\]
where x cycles between 0 and 2, and scale depends on the mode.
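As a rough illustration of the formula above, the sketch below evaluates it in plain Python. The helper name triangular_lr and the mapping from the iteration count to x (0 to 1 over step_size_up iterations, then 1 to 2 over step_size_down iterations) are assumptions made for this example; they are not taken from the braintools implementation.

```python
def triangular_lr(iteration, base_lr, max_lr, step_size_up,
                  step_size_down=None, scale=1.0):
    """Evaluate lr = base_lr + (max_lr - base_lr) * max(0, 1 - |x - 1|) * scale.

    Hypothetical helper for illustration only.
    """
    if step_size_down is None:
        step_size_down = step_size_up  # symmetric cycle by default
    cycle_len = step_size_up + step_size_down
    pos = iteration % cycle_len  # position inside the current cycle

    # Assumed mapping of the position onto x in [0, 2):
    # the rising half covers [0, 1), the falling half covers [1, 2).
    if pos < step_size_up:
        x = pos / step_size_up
    else:
        x = 1.0 + (pos - step_size_up) / step_size_down

    triangle = max(0.0, 1.0 - abs(x - 1.0))  # 0 at the ends, 1 at the peak
    return base_lr + (max_lr - base_lr) * triangle * scale
```

With base_lr=0.001, max_lr=0.006 and step_size_up=2000, this returns base_lr at iteration 0, max_lr at iteration 2000, and the midpoint of the two halfway up or down the ramp.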
Modes Explained:
triangular: Constant amplitude oscillation - LR oscillates between base_lr and max_lr with fixed amplitude
triangular2: Decaying amplitude by half each cycle - Amplitude = (max_lr - base_lr) * 0.5^(cycle_number)
exp_range: Exponentially decaying amplitude - Amplitude = (max_lr - base_lr) * gamma^(iterations)
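The three amplitude rules above can be written as a single scale factor that multiplies (max_lr - base_lr). This is a minimal sketch: the function name amplitude_scale is hypothetical, and it assumes that both the cycle number and the iteration count start at 0, so the first cycle runs at full amplitude.

```python
def amplitude_scale(mode, cycle=0, iteration=0, gamma=1.0):
    """Return the factor applied to (max_lr - base_lr) for a given mode.

    Hypothetical helper; assumes cycle and iteration are counted from 0.
    """
    if mode == "triangular":
        return 1.0                 # constant amplitude
    if mode == "triangular2":
        return 0.5 ** cycle        # amplitude halves every cycle
    if mode == "exp_range":
        return gamma ** iteration  # amplitude decays with every iteration
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, the amplitude in the third cycle (cycle=2) of a 'triangular2' schedule is (max_lr - base_lr) * amplitude_scale('triangular2', cycle=2), i.e. a quarter of the original amplitude.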
Finding Optimal LR Range:
Use the LR range test to find optimal base_lr and max_lr:
1. Start with a very low LR (e.g., 1e-7).
2. Increase the LR exponentially each batch.
3. Plot loss vs. LR and find:
base_lr: LR where loss starts decreasing
max_lr: LR where loss stops decreasing or starts increasing
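The procedure above can be sketched without any framework. Both helpers below are hypothetical and independent of braintools: exponential_lr_sweep produces the geometrically increasing learning rates for step 2, and suggest_bounds applies one simple heuristic for step 3 (the 5% drop threshold is an arbitrary assumption, and losses are assumed to be positive). Inspecting the plot by eye remains the more reliable approach.

```python
def exponential_lr_sweep(start_lr, end_lr, num_iter):
    """Return num_iter learning rates spaced geometrically from start_lr to end_lr."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    ratio = (end_lr / start_lr) ** (1.0 / (num_iter - 1))
    return [start_lr * ratio ** i for i in range(num_iter)]


def suggest_bounds(lrs, losses, drop=0.05):
    """Suggest (base_lr, max_lr) from a recorded LR range test.

    Heuristic assumed for this sketch:
    - base_lr is the first LR whose loss is at least `drop` (relative)
      below the initial loss, i.e. where the loss starts decreasing;
    - max_lr is the LR with the lowest loss, i.e. the last point before
      the loss stops decreasing.
    """
    initial = losses[0]
    base_lr = next(
        (lr for lr, loss in zip(lrs, losses) if loss < initial * (1.0 - drop)),
        lrs[0],  # fall back to the smallest LR if the loss never drops
    )
    best = min(range(len(losses)), key=lambda i: losses[i])
    return base_lr, lrs[best]
```

In practice the recorded losses are noisy, so smoothing them (for instance with a moving average) before calling suggest_bounds is usually worthwhile.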
Benefits:
No manual schedule tuning: The cyclic policy replaces a hand-designed decay schedule; only the LR range and step sizes need to be chosen
Escapes saddle points: Periodically high LR can help the optimizer leave flat regions
Better generalization: Oscillation may discourage settling into sharp minima
Fast convergence: Can achieve super-convergence with a well-chosen range
Examples
Basic triangular schedule:
>>> import braintools
>>> import brainstate
>>>
>>> # Basic triangular oscillation
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.001,
...     max_lr=0.006,
...     step_size_up=2000,  # 2000 iterations to go from base to max
...     mode='triangular'
... )
>>> optimizer = braintools.optim.SGD(lr=scheduler, momentum=0.9)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))
Triangular2 with amplitude decay:
>>> # Amplitude halves each cycle for fine-tuning
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.0001,
...     max_lr=0.001,
...     step_size_up=1000,
...     step_size_down=1000,
...     mode='triangular2'  # Amplitude decay
... )
>>>
>>> # First cycle: LR oscillates between 0.0001 and 0.001
>>> # Second cycle: LR oscillates between 0.0001 and 0.00055
>>> # Third cycle: LR oscillates between 0.0001 and 0.000325
>>> # And so on...
Exponential range decay:
>>> # Exponentially decreasing amplitude
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.0001,
...     max_lr=0.01,
...     step_size_up=500,
...     mode='exp_range',
...     gamma=0.99994  # Gradual decay
... )
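To get a feel for how fast a given gamma shrinks the amplitude, it can help to compute the scale factor directly. The two helpers below are hypothetical and assume the amplitude is multiplied by gamma^(iterations), as described under Modes Explained. With gamma=0.99994 the amplitude takes roughly 11,500 iterations to halve.

```python
import math


def exp_range_scale(gamma, iterations):
    """Amplitude scale factor after a number of iterations in 'exp_range' mode."""
    return gamma ** iterations


def iterations_to_halve(gamma):
    """Number of iterations after which gamma ** n equals 0.5 (hypothetical helper)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be in (0, 1) for the amplitude to decay")
    return math.log(0.5) / math.log(gamma)
```

Comparing iterations_to_halve(gamma) with the total number of training iterations is one way to judge whether a candidate gamma decays too quickly or too slowly for a given run.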
Asymmetric cycles:
>>> # Spend more time at lower learning rates
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.0001,
...     max_lr=0.001,
...     step_size_up=500,    # Quick ramp up
...     step_size_down=1500  # Slow ramp down
... )
Custom scaling function:
>>> # Custom amplitude scaling
>>> def custom_scale(x):
...     '''Custom scaling: faster decay initially'''
...     return 1 / (1 + 0.0005 * x)
>>>
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.001,
...     max_lr=0.1,
...     step_size_up=1000,
...     scale_fn=custom_scale,
...     scale_mode='iterations'
... )
LR range test implementation:
>>> # Find optimal LR range
>>> def lr_range_test(model, data_loader, max_lr=10, num_iter=100):
...     scheduler = braintools.optim.CyclicLR(
...         base_lr=1e-7,
...         max_lr=max_lr,
...         step_size_up=num_iter,
...         mode='triangular'
...     )
...     optimizer = braintools.optim.SGD(lr=scheduler, momentum=0.9)
...
...     lrs, losses = [], []
...     for i, batch in enumerate(data_loader):
...         if i >= num_iter:
...             break
...         loss = train_step(model, batch, optimizer)
...         lrs.append(scheduler.get_lr()[0])
...         losses.append(loss)
...         scheduler.step()
...
...     # Plot and find optimal range
...     import matplotlib.pyplot as plt
...     plt.semilogx(lrs, losses)
...     plt.xlabel('Learning Rate')
...     plt.ylabel('Loss')
...     plt.show()
For super-convergence:
>>> # Super-convergence with one cycle
>>> # Use with large batch sizes and proper regularization
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.08,  # Relatively high base_lr
...     max_lr=0.8,    # Very high max_lr
...     step_size_up=epochs * len(train_loader) // 2,
...     step_size_down=epochs * len(train_loader) // 2,
...     mode='triangular'
... )
>>>
>>> # Combine with strong regularization
>>> optimizer = braintools.optim.SGD(
...     lr=scheduler,
...     momentum=0.95,
...     weight_decay=1e-4
... )
Multiple parameter groups:
>>> # Different LR ranges for different layers
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=[0.0001, 0.001],  # Lower for pretrained layers
...     max_lr=[0.001, 0.01],     # Higher for new layers
...     step_size_up=1000
... )
Monitoring cycles:
>>> scheduler = braintools.optim.CyclicLR(
...     base_lr=0.001,
...     max_lr=0.01,
...     step_size_up=100,
...     step_size_down=100
... )
>>>
>>> for iteration in range(1000):
...     train_step(...)
...     scheduler.step()
...
...     if iteration % 50 == 0:
...         cycle = iteration // (scheduler.step_size_up + scheduler.step_size_down)
...         lr = scheduler.get_lr()[0]
...         print(f"Iter {iteration}, Cycle {cycle}, LR: {lr:.6f}")
See also
OneCycleLR: One cycle learning rate policy
CosineAnnealingLR: Cosine annealing schedule
CosineAnnealingWarmRestarts: Cosine annealing with restarts
TriangularLR: Simplified triangular schedule
References