RMSprop

class braintools.optim.RMSprop(lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0.0, momentum=0.0, centered=False, grad_clip_norm=None, grad_clip_value=None)

RMSprop (Root Mean Square Propagation) optimizer.

RMSprop divides the learning rate by the square root of an exponentially decaying average of squared gradients, so each parameter gets its own effective step size. This helps the optimizer navigate ravines, where the surface curves much more steeply in one dimension than in another.

Parameters:
  • lr (float | LRScheduler) – Learning rate. Can be a float or LRScheduler instance.

  • alpha (float) – Smoothing constant (decay rate for moving average of squared gradients).

  • eps (float) – Term added to the denominator to improve numerical stability.

  • weight_decay (float) – Weight decay (L2 penalty) coefficient.

  • momentum (float) – Momentum factor. If > 0, uses momentum-based RMSprop.

  • centered (bool) – If True, compute centered RMSprop (normalizes the gradient by an estimate of its variance).

  • grad_clip_norm (float | None) – Maximum gradient norm for clipping.

  • grad_clip_value (float | None) – Maximum gradient value for clipping.
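The two clipping parameters correspond to two standard techniques, which can be sketched as follows. This is an illustration only: the helper names clip_by_norm and clip_by_value are hypothetical, and whether braintools applies the norm globally across all parameters or per tensor is not stated here, so the sketch simply treats the gradient as one flat list.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the whole gradient so its L2 norm is at most max_norm (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]  # direction is preserved, length is shrunk

def clip_by_value(grads, max_value):
    """Clamp each gradient entry independently to [-max_value, max_value] (hypothetical helper)."""
    return [max(-max_value, min(max_value, g)) for g in grads]

grads = [3.0, -4.0]                     # L2 norm is 5
by_norm = clip_by_norm(grads, 1.0)      # roughly [0.6, -0.8]: same direction, norm 1
by_value = clip_by_value(grads, 1.0)    # [1.0, -1.0]: direction changes
```

Clipping by norm keeps the gradient direction, while clipping by value can change it because each entry is clamped separately.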

Notes

The RMSprop update is computed as:

\[ \begin{align}\begin{aligned}E[g^2]_t = \alpha E[g^2]_{t-1} + (1 - \alpha) g_t^2\\\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} g_t\end{aligned}\end{align} \]

where \(E[g^2]_t\) is the moving average of squared gradients, \(\alpha\) is its decay rate, \(g_t\) is the gradient at step \(t\), \(\eta\) is the learning rate, and \(\epsilon\) is a small constant added for numerical stability.
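The update rule can be traced by hand with a small pure-Python sketch. It is not the braintools implementation: the function name rmsprop_step is hypothetical, and initializing the squared-gradient average to zero is an assumption. Weight decay, momentum, and clipping are left out.

```python
import math

def rmsprop_step(theta, grad, sq_avg, lr=0.01, alpha=0.99, eps=1e-8):
    """One RMSprop step on lists of floats; returns (new_theta, new_sq_avg)."""
    # E[g^2]_t = alpha * E[g^2]_{t-1} + (1 - alpha) * g_t^2
    new_sq_avg = [alpha * v + (1.0 - alpha) * g * g for v, g in zip(sq_avg, grad)]
    # theta_t = theta_{t-1} - lr * g_t / sqrt(E[g^2]_t + eps)
    new_theta = [p - lr * g / math.sqrt(v + eps)
                 for p, g, v in zip(theta, grad, new_sq_avg)]
    return new_theta, new_sq_avg

theta = [1.0, -2.0]
sq_avg = [0.0, 0.0]      # assumed zero initialization
grad = [0.5, -50.0]      # second coordinate is 100x steeper
theta, sq_avg = rmsprop_step(theta, grad, sq_avg)
```

Although the two gradient entries differ by a factor of 100, both parameters move by roughly 0.1 on this first step, which is the per-coordinate rescaling that helps in ravines.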

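With centered=True, a moving average of the gradient is also maintained, and its square is subtracted inside the square root so that the denominator becomes a variance estimate: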

\[ \begin{align}\begin{aligned}E[g]_t = \alpha E[g]_{t-1} + (1 - \alpha) g_t\\\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t - E[g]_t^2 + \epsilon}} g_t\end{aligned}\end{align} \]
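A scalar sketch of the centered variant, under the same caveats: centered_rmsprop_step is a hypothetical name, both moving averages are assumed to start at zero, and this is not taken from the braintools source.

```python
import math

def centered_rmsprop_step(theta, grad, sq_avg, g_avg, lr=0.01, alpha=0.99, eps=1e-8):
    """One centered RMSprop step on scalars; returns (theta, sq_avg, g_avg)."""
    # Moving averages of g^2 and of g, both with decay rate alpha
    new_sq_avg = alpha * sq_avg + (1.0 - alpha) * grad * grad
    new_g_avg = alpha * g_avg + (1.0 - alpha) * grad
    # Variance estimate: E[g^2]_t - (E[g]_t)^2
    var = new_sq_avg - new_g_avg * new_g_avg
    new_theta = theta - lr * grad / math.sqrt(var + eps)
    return new_theta, new_sq_avg, new_g_avg

theta, sq_avg, g_avg = centered_rmsprop_step(1.0, 0.5, 0.0, 0.0)
# For comparison, the uncentered step from the same state
uncentered = 1.0 - 0.01 * 0.5 / math.sqrt(0.99 * 0.0 + 0.01 * 0.25 + 1e-8)
```

Because the variance estimate is never larger than the raw second moment, the centered step is at least as large as the uncentered one for the same gradient history.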

Examples

Basic RMSprop usage:

>>> import brainstate
>>> import braintools
>>>
>>> model = brainstate.nn.Linear(10, 5)
>>> optimizer = braintools.optim.RMSprop(lr=0.01)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))

RMSprop with momentum:

>>> optimizer = braintools.optim.RMSprop(lr=0.01, momentum=0.9)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))

Centered RMSprop:

>>> optimizer = braintools.optim.RMSprop(lr=0.01, centered=True)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))

RMSprop with custom alpha:

>>> optimizer = braintools.optim.RMSprop(lr=0.01, alpha=0.95)
>>> optimizer.register_trainable_weights(model.states(brainstate.ParamState))

See also

  • Adagrad – Adaptive gradient algorithm

  • Adadelta – Extension of Adagrad

  • Adam – Combines RMSprop with momentum

default_tx()

Create the RMSprop-specific gradient transformation.