brainstate.transform.fwd_grad

brainstate.transform.fwd_grad(func=<brainstate.typing.Missing object>, grad_states=None, argnums=None, return_value=False, has_aux=None, tangent_size=None, drct_der_clip=None, key=None, **kwargs)

Compute a forward-mode estimate of the first-order gradient of the function func.

As with grad(), jacrev(), and jacfwd(), the structure of the return value depends on the argument settings.

When grad_states is None:
- has_aux=False + return_value=False => arg_grads.
- has_aux=True + return_value=False => (arg_grads, aux_data).
- has_aux=False + return_value=True => (arg_grads, loss_value).
- has_aux=True + return_value=True => (arg_grads, loss_value, aux_data).
When grad_states is not None and argnums is None:
- has_aux=False + return_value=False => var_grads.
- has_aux=True + return_value=False => (var_grads, aux_data).
- has_aux=False + return_value=True => (var_grads, loss_value).
- has_aux=True + return_value=True => (var_grads, loss_value, aux_data).
When grad_states is not None and argnums is not None:
- has_aux=False + return_value=False => (var_grads, arg_grads).
- has_aux=True + return_value=False => ((var_grads, arg_grads), aux_data).
- has_aux=False + return_value=True => ((var_grads, arg_grads), loss_value).
- has_aux=True + return_value=True => ((var_grads, arg_grads), loss_value, aux_data).
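The return layouts listed above follow one pattern: the gradients come first, then the loss value if requested, then the auxiliary data if present. The sketch below restates that pattern in plain Python. The helper pack_result and its keyword arguments are hypothetical names used only for illustration; they are not part of brainstate.

```python
def pack_result(var_grads, arg_grads, loss_value, aux_data, *,
                has_states, has_argnums, return_value, has_aux):
    """Hypothetical helper that mirrors the documented return layout."""
    # Choose the gradient part from what is being differentiated.
    if has_states and has_argnums:
        grads = (var_grads, arg_grads)
    elif has_states:
        grads = var_grads
    else:
        grads = arg_grads

    # Append the optional parts in the documented order.
    out = (grads,)
    if return_value:
        out += (loss_value,)
    if has_aux:
        out += (aux_data,)

    # A lone gradient is returned bare, not wrapped in a 1-tuple.
    return out[0] if len(out) == 1 else out


# States and arguments, with loss value and auxiliary data:
result = pack_result("var_grads", "arg_grads", "loss_value", "aux_data",
                     has_states=True, has_argnums=True,
                     return_value=True, has_aux=True)
# result == (("var_grads", "arg_grads"), "loss_value", "aux_data")
```

Strings stand in for the real gradient pytrees here so the nesting is easy to read.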

Parameters:

func (Callable) – Function whose gradient is to be computed.
grad_states (State | Sequence[State] | Dict[str, State] | None) – The states used in func with respect to which gradients are computed.
argnums (int | Sequence[int] | None) – Specifies which positional argument(s) to differentiate with respect to.
return_value (bool) – Whether to return the loss value.
has_aux (bool | None) – Indicates whether func returns a pair where the first element is considered the output of the mathematical function to be differentiated and the second element is auxiliary data.
tangent_size (int | None) – Number of random tangent directions to average over. None (the default) uses a single random direction; a positive integer averages the estimator over that many directions.
drct_der_clip (float | None) – If given, clip each directional derivative to [-drct_der_clip, drct_der_clip] before forming the gradient estimate.
key (int | Array | ndarray | None) – Seed or PRNG key controlling the random tangent directions. When None, a key is drawn from the global RNG state.

Returns:

The forward-mode gradient function. The wrapped function must return a scalar (use vector_grad(), jacfwd(), or jacrev() for non-scalar outputs).

Return type:

GradientTransform | Callable[[Callable], GradientTransform]

Notes

fwd_grad is a stochastic forward-mode estimator: it draws random tangent directions and combines them with the directional derivative, so successive calls with different keys yield different estimates of the same gradient.
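To make the idea concrete, here is a minimal sketch of one common forward-gradient construction written directly in JAX. It samples standard-normal tangents v, evaluates the directional derivative with jax.jvp, and averages the products of directional derivative and tangent. This is an illustration under assumptions, not brainstate's implementation: the Gaussian sampling, the scaling, and the function name forward_grad_estimate are choices made for this sketch, and the arguments num_directions and clip only loosely mirror tangent_size and drct_der_clip.

```python
import jax
import jax.numpy as jnp


def forward_grad_estimate(f, x, key, num_directions=1, clip=None):
    """Sketch of a forward-gradient estimate of a scalar function f at x.

    Assumes Gaussian tangents; for v ~ N(0, I), E[(grad . v) v] = grad.
    """
    keys = jax.random.split(key, num_directions)

    def one_direction(k):
        # Random tangent with the same shape and dtype as x.
        v = jax.random.normal(k, x.shape, x.dtype)
        # Directional derivative of f at x along v (one forward pass).
        _, dir_deriv = jax.jvp(f, (x,), (v,))
        if clip is not None:
            # Optional clipping of the directional derivative.
            dir_deriv = jnp.clip(dir_deriv, -clip, clip)
        return dir_deriv * v

    # Average the single-direction estimates.
    return jnp.mean(jax.vmap(one_direction)(keys), axis=0)


def f(x):
    return jnp.sum(x ** 2)


x = jnp.array([2.0, 3.0])
# A noisy estimate of the exact gradient [4.0, 6.0]; more directions
# reduce the variance, and a different key gives a different estimate.
estimate = forward_grad_estimate(f, x, jax.random.PRNGKey(0), num_directions=64)
```

A single direction costs one forward pass and needs no backward pass, which is the main appeal of this kind of estimator; the price is variance that grows with the number of parameters.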

Examples

Basic forward-mode gradient estimation of a scalar function:

>>> import brainstate
>>> import jax.numpy as jnp
>>>
>>> # Scalar-valued function
>>> def f(x):
...     return jnp.sum(x ** 2)
>>>
>>> fwd_grad_f = brainstate.transform.fwd_grad(f, key=0)
>>> x = jnp.array([2.0, 3.0])
>>> gradients = fwd_grad_f(x)  # Estimate of [4.0, 6.0]

With states:

>>> params = brainstate.ParamState(jnp.array([1.0, 2.0]))
>>>
>>> def model():
...     return jnp.sum(params.value ** 2)
>>>
>>> fwd_grad_fn = brainstate.transform.fwd_grad(
...     model, grad_states=[params], key=0
... )
>>> param_grads = fwd_grad_fn()
