huber_loss

class braintools.metric.huber_loss(predictions, targets=None, delta=1.0)

Compute Huber loss combining L1 and L2 properties for robust regression.

The Huber loss is a compromise between the L1 and L2 losses: it is quadratic for small errors (like L2) and linear for large errors (like L1). This makes it less sensitive to outliers than squared error while keeping a smooth gradient near zero.

The Huber loss is defined as:

\[\begin{split}\ell_\delta(a) = \begin{cases} \frac{1}{2} a^2 & \text{if } |a| \leq \delta \\ \delta |a| - \frac{1}{2} \delta^2 & \text{if } |a| > \delta \end{cases}\end{split}\]

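where \(a = y - \hat{y}\) is the residual between the target \(y\) and the prediction \(\hat{y}\), and \(\delta\) is the threshold at which the loss switches from quadratic to linear.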
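
The piecewise definition can be written out directly in jax.numpy. The following is a minimal sketch for checking values by hand; `huber_reference` is a hypothetical helper introduced here for illustration and is not the braintools implementation.

```python
import jax.numpy as jnp


def huber_reference(predictions, targets=None, delta=1.0):
    """Hypothetical reference helper (illustration only, not braintools code)."""
    if targets is None:
        # Mirror the documented default: missing targets are treated as zeros.
        targets = jnp.zeros_like(predictions)
    a = targets - predictions                    # residual a = y - y_hat
    abs_a = jnp.abs(a)
    quadratic = 0.5 * a ** 2                     # branch for |a| <= delta
    linear = delta * abs_a - 0.5 * delta ** 2    # branch for |a| > delta
    return jnp.where(abs_a <= delta, quadratic, linear)


predictions = jnp.array([1.0, 2.0, 5.0])
targets = jnp.array([1.1, 1.9, 3.0])
reference_loss = huber_reference(predictions, targets)  # element-wise, shape (3,)
```

Because the loss depends only on \(|a|\), the sign convention of the residual does not affect the result.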

Parameters:

predictions (Array | ndarray | bool | number | int | float | complex | Quantity) – Predicted values with arbitrary shape. Must be floating-point type.
targets (Array | ndarray | bool | number | int | float | complex | Quantity | None) – Ground truth target values with shape broadcastable to predictions. If not provided, targets are assumed to be zeros.
delta (float) – Threshold at which the loss switches from quadratic to linear. Smaller values shrink the quadratic region and make the loss more L1-like (more robust, with a sharper change in gradient near zero), while larger values widen it and make the loss more L2-like (smoother but less robust). Default: 1.0.

Returns:

Element-wise Huber losses with the same shape as predictions.

Return type:

Notes

The Huber loss has several important properties:

Robustness: Linear growth for large errors reduces outlier sensitivity
Smoothness: Quadratic near zero ensures smooth gradients for optimization
Gradient clipping: The gradient with respect to the residual equals the L2 gradient clipped to [-delta, delta]
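
The gradient-clipping property follows from differentiating the two branches of the definition with respect to the residual \(a\), taking \(\frac{1}{2} a^2\) as the L2 loss:

\[\begin{split}\frac{d \ell_\delta}{d a} = \begin{cases} a & \text{if } |a| \leq \delta \\ \delta \, \operatorname{sign}(a) & \text{if } |a| > \delta \end{cases} = \operatorname{clip}(a, -\delta, \delta)\end{split}\]

Both branches give \(\pm\delta\) at \(|a| = \delta\), so the gradient is continuous across the transition.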

The choice of delta parameter affects the balance:

Small delta: More robust; the loss approaches a scaled L1 loss, delta * |a|
Large delta: Less robust; the loss matches the L2 form 0.5 * a^2 over a wider range of errors
delta = 1.0: Common default providing good balance
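
The two limiting cases can be checked numerically. This is a sketch under the piecewise definition above; `huber_reference` is a hypothetical helper applied directly to residuals, and the delta values 10.0 and 0.01 are arbitrary choices for illustration.

```python
import jax.numpy as jnp


def huber_reference(a, delta):
    # Hypothetical helper: applies the piecewise definition to residuals a.
    abs_a = jnp.abs(a)
    return jnp.where(abs_a <= delta, 0.5 * a ** 2, delta * abs_a - 0.5 * delta ** 2)


residuals = jnp.array([-3.0, -0.5, 0.2, 2.0])

# Large delta: every residual lies in the quadratic region,
# so the loss coincides with 0.5 * a**2.
large = huber_reference(residuals, delta=10.0)
half_squared = 0.5 * residuals ** 2

# Small delta: every residual lies in the linear region, so the loss is
# delta * |a| - 0.5 * delta**2, which is close to the scaled L1 loss delta * |a|.
small_delta = 0.01
small = huber_reference(residuals, delta=small_delta)
scaled_l1 = small_delta * jnp.abs(residuals)
```

Note that the small-delta limit is an L1 loss scaled by delta, so the loss values themselves shrink as delta decreases.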

This loss is particularly effective for regression with outliers and in reinforcement learning for value function approximation.

Examples

Basic Huber loss computation:

>>> import jax.numpy as jnp
>>> import braintools
>>> predictions = jnp.array([1.0, 2.0, 5.0])
>>> targets = jnp.array([1.1, 1.9, 3.0])  # The last prediction is an outlier
>>> loss = braintools.metric.huber_loss(predictions, targets)
>>> print(loss)
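
As a hand check from the definition (not captured output), the residuals have magnitudes 0.1, 0.1 and 2.0, so with the default delta = 1.0:

\[\ell_1(0.1) = \tfrac{1}{2} (0.1)^2 = 0.005, \qquad \ell_1(2.0) = 1 \cdot 2.0 - \tfrac{1}{2} \cdot 1^2 = 1.5\]

The element-wise losses should therefore be approximately 0.005, 0.005 and 1.5, up to floating-point rounding. For comparison, \(\tfrac{1}{2} a^2\) would assign the outlier a loss of 2.0.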

Compare different delta values:

>>> # Small delta (more L1-like, robust)
>>> loss_small = braintools.metric.huber_loss(predictions, targets, delta=0.5)
>>> # Large delta (more L2-like, smooth)
>>> loss_large = braintools.metric.huber_loss(predictions, targets, delta=2.0)
>>> print(f"Small delta: {loss_small}")
>>> print(f"Large delta: {loss_large}")

Compute the curves needed to visualize the transition regions:

>>> errors = jnp.linspace(-3, 3, 100)
>>> # Targets of zero to compute loss vs. error magnitude
>>> huber_vals = braintools.metric.huber_loss(errors, jnp.zeros_like(errors), delta=1.0)
>>> l1_vals = braintools.metric.absolute_error(errors, jnp.zeros_like(errors), reduction='none')
>>> l2_vals = braintools.metric.squared_error(errors, jnp.zeros_like(errors), reduction='none')

Gradient clipping interpretation:

>>> # For small errors, gradient is proportional to error (L2-like)
>>> small_error = jnp.array([0.5])
>>> # For large errors, gradient is constant (L1-like, clipped)
>>> large_error = jnp.array([2.0])
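
The two cases above can be checked with automatic differentiation. The sketch below uses a hypothetical `huber_reference` helper that applies the piecewise definition directly to the errors (it is not the braintools implementation) and compares its gradient against `jnp.clip`.

```python
import jax
import jax.numpy as jnp


def huber_reference(a, delta=1.0):
    # Hypothetical helper: applies the piecewise definition to errors a.
    abs_a = jnp.abs(a)
    return jnp.where(abs_a <= delta, 0.5 * a ** 2, delta * abs_a - 0.5 * delta ** 2)


delta = 1.0
errors = jnp.array([-2.0, -0.5, 0.5, 2.0])

# jax.grad needs a scalar output, so sum the element-wise losses;
# the resulting gradient is still one value per error.
grads = jax.grad(lambda e: huber_reference(e, delta).sum())(errors)

# Errors inside [-delta, delta] pass through unchanged (L2-like);
# errors outside are clipped to +/- delta (L1-like).
clipped = jnp.clip(errors, -delta, delta)
```

Here `grads` and `clipped` agree: the errors of magnitude 0.5 give gradients of magnitude 0.5, while the errors of magnitude 2.0 give gradients of magnitude delta.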
