huber_loss
- class braintools.metric.huber_loss(predictions, targets=None, delta=1.0)
Compute Huber loss combining L1 and L2 properties for robust regression.
The Huber loss is a compromise between the L1 and L2 losses: it is quadratic for small errors (like L2) and linear for large errors (like L1). This makes it robust to outliers while keeping smooth gradients near zero.
The Huber loss is defined as:
\[\begin{split}\ell_\delta(a) = \begin{cases} \frac{1}{2} a^2 & \text{if } |a| \leq \delta \\ \delta |a| - \frac{1}{2} \delta^2 & \text{if } |a| > \delta \end{cases}\end{split}\]
where \(a = y - \hat{y}\) is the residual and \(\delta\) is the threshold.
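As a quick check that uses only the definition above, the two branches agree in both value and slope at the threshold \(a = \delta\) (and, by symmetry, at \(a = -\delta\)), which is why the gradient has no jump there:
\[\begin{split}\tfrac{1}{2}\delta^2 &= \delta \cdot \delta - \tfrac{1}{2}\delta^2 \\ \left.\frac{d}{da}\,\tfrac{1}{2}a^2\right|_{a=\delta} &= \delta = \left.\frac{d}{da}\left(\delta |a| - \tfrac{1}{2}\delta^2\right)\right|_{a=\delta}\end{split}\]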
- Parameters:
predictions (Array|ndarray|bool|number|bool|int|float|complex|Quantity) – Predicted values with arbitrary shape. Must be floating-point type.
targets (Array|ndarray|bool|number|bool|int|float|complex|Quantity|None) – Ground truth target values with shape broadcastable to predictions. If not provided, targets are assumed to be zeros.
delta (float) – Threshold parameter that controls the transition between quadratic and linear regions. Smaller values make the loss more L1-like (robust but less smooth), while larger values make it more L2-like (smooth but less robust).
- Returns:
Element-wise Huber losses with the same shape as predictions.
- Return type:
Array|ndarray|bool|number|bool|int|float|complex|Quantity
Notes
The Huber loss has several important properties:
Robustness: Linear growth for large errors reduces outlier sensitivity
Smoothness: Quadratic near zero ensures smooth gradients for optimization
Gradient clipping: Equivalent to clipping L2 gradients to [-delta, delta]
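The gradient-clipping property can be checked numerically. The sketch below is a scalar reference written directly from the piecewise definition, not the library's implementation; `huber`, `huber_grad`, and `central_diff` are hypothetical helper names introduced only for illustration. It compares the clipped residual against a finite-difference derivative of the loss with respect to the residual.

```python
def huber(a, delta=1.0):
    """Scalar Huber loss, written from the piecewise definition."""
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * abs(a) - 0.5 * delta * delta


def huber_grad(a, delta=1.0):
    """Derivative w.r.t. the residual: the L2 gradient `a` clipped to [-delta, delta]."""
    return max(-delta, min(delta, a))


def central_diff(f, a, h=1e-6):
    """Central finite-difference estimate of f'(a)."""
    return (f(a + h) - f(a - h)) / (2.0 * h)


residuals = [-3.0, -0.5, 0.0, 0.5, 3.0]
analytic = [huber_grad(a) for a in residuals]
numeric = [central_diff(huber, a) for a in residuals]

for a, g, n in zip(residuals, analytic, numeric):
    print(f"a={a:+.1f}  clipped={g:+.3f}  finite-diff={n:+.3f}")
```

Inside the threshold the gradient tracks the residual; outside it saturates at plus or minus delta, so a single outlier cannot contribute an arbitrarily large update.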
The choice of the delta parameter affects the balance:
Small delta: More robust; behaves like an L1 loss scaled by delta
Large delta: Less robust, approaches L2 loss
delta = 1.0: Common default providing good balance
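The two limits can be illustrated with a small scalar sketch based only on the formula above (the helper `huber` is hypothetical and not the library function). For a large delta every residual in range falls on the quadratic branch, so the loss equals half the squared error. For a small delta the loss itself shrinks toward zero, and it is the loss divided by delta that approaches the absolute error.

```python
def huber(a, delta):
    """Scalar Huber loss, written from the piecewise definition."""
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * abs(a) - 0.5 * delta * delta


a = 3.0

# Large delta: quadratic branch, identical to 0.5 * a**2.
large_delta_loss = huber(a, delta=100.0)

# Small delta: linear branch; rescaling by delta recovers roughly |a|.
small_delta = 1e-3
small_delta_loss = huber(a, delta=small_delta)
rescaled = small_delta_loss / small_delta

print(large_delta_loss, 0.5 * a * a)
print(rescaled, abs(a))
```

Because the raw loss scales with delta in the linear regime, comparing losses across very different delta values usually requires accounting for that factor.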
This loss is particularly effective for regression with outliers and in reinforcement learning for value function approximation.
Examples
Basic Huber loss computation:
>>> import jax.numpy as jnp
>>> import braintools
>>> predictions = jnp.array([1.0, 2.0, 5.0])
>>> targets = jnp.array([1.1, 1.9, 3.0])  # Last prediction is outlier
>>> loss = braintools.metric.huber_loss(predictions, targets)
>>> print(loss)
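For reference, the values this example should produce can be worked out by hand from the definition. The sketch below uses plain Python and an equivalent branch-free form of the formula (split the absolute error into a part capped at delta and a remainder); it is an illustration under that assumption, not the library's code, and the library's floating-point output may differ in the last digits.

```python
predictions = [1.0, 2.0, 5.0]
targets = [1.1, 1.9, 3.0]
delta = 1.0

losses = []
for y_hat, y in zip(predictions, targets):
    abs_err = abs(y - y_hat)
    quad = min(abs_err, delta)  # portion of the error penalized quadratically
    lin = abs_err - quad        # remainder penalized linearly
    losses.append(0.5 * quad * quad + delta * lin)

print(losses)  # approximately [0.005, 0.005, 1.5]
```

The first two residuals (about 0.1 in magnitude) fall on the quadratic branch, while the outlier residual of 2.0 falls on the linear branch and contributes 1.5 rather than the 2.0 a pure half-squared error would give.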
Compare different delta values:
>>> # Small delta (more L1-like, robust)
>>> loss_small = braintools.metric.huber_loss(predictions, targets, delta=0.5)
>>> # Large delta (more L2-like, smooth)
>>> loss_large = braintools.metric.huber_loss(predictions, targets, delta=2.0)
>>> print(f"Small delta: {loss_small}")
>>> print(f"Large delta: {loss_large}")
Visualize the transition regions:
>>> errors = jnp.linspace(-3, 3, 100)
>>> # Targets of zero to compute loss vs. error magnitude
>>> huber_vals = braintools.metric.huber_loss(errors, jnp.zeros_like(errors), delta=1.0)
>>> l1_vals = braintools.metric.absolute_error(errors, jnp.zeros_like(errors), reduction='none')
>>> l2_vals = braintools.metric.squared_error(errors, jnp.zeros_like(errors), reduction='none')
Gradient clipping interpretation:
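>>> import jax
>>> grad_fn = jax.grad(lambda e: braintools.metric.huber_loss(e, delta=1.0).sum())
>>> # For small errors, gradient is proportional to error (L2-like)
>>> small_error = jnp.array([0.5])
>>> print(grad_fn(small_error))
>>> # For large errors, gradient is constant (L1-like, clipped)
>>> large_error = jnp.array([2.0])
>>> print(grad_fn(large_error))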
See also
braintools.metric.absolute_error: Pure L1 loss
braintools.metric.squared_error: Pure L2 loss
braintools.metric.log_cosh: Smooth alternative to Huber loss
References