Adamax

class braintools.optim.Adamax(lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, grad_clip_norm=None, grad_clip_value=None)

Adamax optimizer - variant of Adam based on infinity norm.

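Adamax replaces Adam's running average of squared gradients with an exponentially weighted infinity norm (a decayed running maximum of gradient magnitudes), which makes the update less sensitive to occasional large gradients. On some problems it performs better than Adam.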

Parameters:
  • lr (float | LRScheduler) – Learning rate. Can be a float or LRScheduler instance.

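  • betas (Tuple[float, float]) – Coefficients (beta1, beta2). beta1 is the decay rate of the running average of the gradient; beta2 is the decay rate of the exponentially weighted infinity norm.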

  • eps (float) – Term added to the denominator for numerical stability.

  • weight_decay (float) – Weight decay (L2 penalty) coefficient.

  • grad_clip_norm (float | None) – Maximum gradient norm for clipping.

  • grad_clip_value (float | None) – Maximum gradient value for clipping.
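
The two clipping options limit gradients in different ways. The following is a minimal pure-Python sketch of the usual meaning of each, not the library's implementation. It assumes that grad_clip_norm rescales by the joint L2 norm of all gradients and that grad_clip_value clamps each element; neither detail is confirmed by this page. The helper names clip_by_global_norm and clip_by_value are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    # Every element shrinks by the same factor, so the direction is preserved.
    return [g * scale for g in grads]


def clip_by_value(grads, max_value):
    """Clamp each gradient element to [-max_value, max_value]."""
    # Elements are clamped independently, so the direction can change.
    return [max(-max_value, min(max_value, g)) for g in grads]


grads = [3.0, -4.0]  # joint L2 norm is 5.0
by_norm = clip_by_global_norm(grads, 1.0)
by_value = clip_by_value(grads, 1.0)
```

Norm clipping keeps the ratio between gradient components, while value clipping flattens any component that exceeds the threshold.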

Notes

The Adamax update uses the infinity norm:

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
u_t &= \max(\beta_2 u_{t-1}, |g_t|) \\
\theta_t &= \theta_{t-1} - \frac{\alpha}{1 - \beta_1^t} \frac{m_t}{u_t + \epsilon}
\end{aligned}
\]

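where \(g_t\) is the gradient at step \(t\), \(\alpha\) is the learning rate (lr), \(\beta_1, \beta_2\) are the betas, and \(\epsilon\) is eps. \(u_t\) is an exponentially weighted infinity norm: it is computed with an element-wise max instead of the running average of squared gradients used in Adam.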
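
The update rule above can be sketched in a few lines of plain Python. This is an illustrative transcription of the three equations for lists of floats, not the braintools implementation. The function name adamax_step is hypothetical, and weight decay and gradient clipping are left out.

```python
def adamax_step(theta, grad, m, u, t, lr=0.002, betas=(0.9, 0.999), eps=1e-8):
    """Apply one Adamax update to lists of floats; t is the 1-based step count."""
    beta1, beta2 = betas
    # m_t: exponential moving average of the gradient.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grad)]
    # u_t: decayed running maximum of gradient magnitudes (infinity norm).
    u = [max(beta2 * ui, abs(gi)) for ui, gi in zip(u, grad)]
    # Only the first moment is bias-corrected, through the 1 - beta1**t factor.
    step_size = lr / (1.0 - beta1 ** t)
    theta = [th - step_size * mi / (ui + eps) for th, mi, ui in zip(theta, m, u)]
    return theta, m, u


# Two parameters with gradients of very different magnitude.
theta, m, u = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
theta, m, u = adamax_step(theta, [0.5, -4.0], m, u, t=1)
```

On this first step both parameters move by about lr (0.002), even though one gradient is eight times larger than the other, because \(m_1 / (1 - \beta_1) = g_1\) and \(u_1 = |g_1|\).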

Examples

Basic Adamax usage:

>>> import brainstate
>>> import braintools
>>>
>>> model = brainstate.nn.Linear(10, 5)
>>> optimizer = braintools.optim.Adamax(lr=0.002)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))

Adamax with custom betas:

>>> optimizer = braintools.optim.Adamax(lr=0.002, betas=(0.9, 0.99))
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))

See also

  • Adam – Standard Adam optimizer.

  • Nadam – Adam with Nesterov momentum.

default_tx()

Create the default gradient transformation, including gradient clipping and weight decay.