ScaledWSConv3d

class brainstate.nn.ScaledWSConv3d(in_size, out_channels, kernel_size, stride=1, padding='SAME', lhs_dilation=1, rhs_dilation=1, groups=1, ws_gain=True, eps=0.0001, w_init=XavierNormal(scale=1.0, mode='fan_avg', in_axis=-2, out_axis=-1, distribution='truncated_normal', rng=RandomState([900 9244]), unit=Unit("1")), b_init=None, w_mask=None, channel_first=False, name=None, param_type=<class 'brainstate.ParamState'>)

Three-dimensional convolution with weight standardization.

This layer applies weight standardization to the convolutional kernel before performing the 3D convolution. Weight standardization normalizes the weights of each output channel to zero mean and unit variance. This tends to stabilize training, which is especially useful for 3D networks: a 3D kernel has K times as many weights as a 2D kernel of the same width K, so these models are parameter-heavy and often hard to optimize.

The input should be a 5D array of shape [B, H, W, D, C], where B is the batch size, H the height, W the width, D the depth, and C the number of input channels. This is the channels-last layout used by default (channel_first=False).
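If your data is stored channels-first ([B, C, H, W, D], as in many PyTorch-style pipelines), it has to be rearranged before it is passed to the layer in its default configuration. A minimal NumPy sketch with illustrative array sizes; jnp.moveaxis behaves the same way for JAX arrays.

```python
import numpy as np

# Illustrative channels-first batch: [B, C, H, W, D]
x_cf = np.zeros((4, 3, 16, 64, 64), dtype=np.float32)

# Move the channel axis (axis 1) to the end to obtain [B, H, W, D, C]
x_cl = np.moveaxis(x_cf, 1, -1)
print(x_cl.shape)  # (4, 16, 64, 64, 3)
```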

Parameters:
  • in_size (Sequence[int]) – The input shape without the batch dimension, given as (H, W, D, C), where H is height, W is width, D is depth, and C is the number of input channels. It is required because the layer uses it to compute the output shape.

  • out_channels (int) – The number of output channels (also called filters or feature maps). This sets the size of the channel dimension of the output, not the spatial depth D.

  • kernel_size (int | Tuple[int, ...]) –

    The shape of the convolutional kernel. Can be:

    • An integer (e.g., 3): creates a cubic kernel (3, 3, 3)

    • A tuple of three integers (e.g., (3, 5, 5)): creates a (height, width, depth) kernel

  • stride (int | Tuple[int, ...]) –

    The stride of the convolution, i.e., how many positions the kernel moves at each step along each spatial dimension. Can be:

    • An integer: same stride for all dimensions

    • A tuple of three integers: (stride_h, stride_w, stride_d)

    Default: 1.

  • padding (str | int | Tuple[int, int] | Sequence[Tuple[int, int]]) –

    The padding strategy. Options:

    • 'SAME': pads the input so that, with stride=1, the output spatial size equals the input size

    • 'VALID': no padding; with stride=1, each output spatial size is N - K + 1 for input size N and kernel size K

    • int: same symmetric padding for all dimensions

    • (pad_h, pad_w, pad_d): different padding for each dimension

    • [(pad_h_before, pad_h_after), (pad_w_before, pad_w_after), (pad_d_before, pad_d_after)]: explicit padding

    Default: 'SAME'.

  • lhs_dilation (int | Tuple[int, ...]) – The dilation factor for the input (left-hand side). A value d > 1 inserts d - 1 zeros between adjacent input elements, which is how transposed (fractionally strided) convolution is expressed. Default: 1.

  • rhs_dilation (int | Tuple[int, ...]) – The dilation factor for the kernel (right-hand side), also known as atrous or dilated convolution. A value d > 1 inserts d - 1 gaps between adjacent kernel taps, so a kernel of size K spans (K - 1) * d + 1 input positions without adding parameters. This is a cheap way for 3D models to capture multi-scale spatial and temporal context. Default: 1.

  • groups (int) –

    Number of groups for grouped convolution. Must divide both in_channels and out_channels.

    • groups=1: standard convolution (all-to-all connections)

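    • groups>1: grouped convolution; the channels are split into groups that are convolved independently, which divides the kernel's parameter count and FLOPs by groups (an effective way to cut the high cost of 3D convolutions)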

    • groups=in_channels: depthwise convolution (each input channel convolved separately)

    Default: 1.

  • w_init (Callable | Array | ndarray | bool | number | int | float | complex | Quantity) –

    Weight initializer for the convolutional kernel. Can be:

    • An initializer instance (e.g., braintools.init.XavierNormal())

    • A callable that returns an array given a shape

    • A direct array matching the kernel shape

    Default: XavierNormal().

  • b_init (Callable | Array | ndarray | bool | number | int | float | complex | Quantity | None) – Bias initializer. If None, no bias term is added to the output. Default: None.

  • ws_gain (bool) – Whether to include a learnable per-channel gain parameter in weight standardization. When True, the standardized kernel is multiplied by a gain g with one value per output channel, learned during training, so the layer can still adjust the scale of its weights. Particularly beneficial for deep 3D networks. Default: True.

  • eps (float) – Small constant for numerical stability in weight standardization. Prevents division by zero when dividing by the weight standard deviation. Typical values: 1e-4 to 1e-5. Default: 1e-4.

  • w_mask (Callable | Array | ndarray | bool | number | int | float | complex | Quantity | None) – Optional weight mask for structured sparsity or custom connectivity. The mask is element-wise multiplied with the standardized kernel weights during the forward pass. Default: None.

  • name (str | None) – Name identifier for this module instance. Default: None.

  • param_type (type) – The parameter state class to use for managing learnable parameters. Default: ParamState.
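Because each group only sees in_channels / groups input channels, grouping shrinks the kernel proportionally. The sketch below counts kernel weights assuming the kernel is stored as (kh, kw, kd, in_channels // groups, out_channels), consistent with the in_axis=-2, out_axis=-1 convention of the default initializer. The helper name conv3d_param_count is hypothetical, and the count ignores the ws_gain parameter.

```python
def conv3d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=False):
    """Count kernel weights, assuming a (kh, kw, kd, in // groups, out) layout."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both in_channels and out_channels")
    kh, kw, kd = kernel_size
    n = kh * kw * kd * (in_channels // groups) * out_channels
    if bias:
        n += out_channels  # one bias value per output channel
    return n


# 64 -> 64 channels with a 3x3x3 kernel: standard, grouped, depthwise
for g in (1, 8, 64):
    print(g, conv3d_param_count(64, 64, (3, 3, 3), groups=g))
# 1 110592
# 8 13824
# 64 1728
```

With 64 input and output channels, groups=8 keeps one eighth of the weights and the depthwise case (groups=64) keeps 1/64.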

in_size

The input shape (H, W, D, C) without batch dimension.

Type:

tuple of int

out_size

The output shape (H_out, W_out, D_out, out_channels) without batch dimension.

Type:

tuple of int
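The out_size values can be estimated by hand. The sketch below assumes the usual JAX/XLA conventions ('SAME' gives ceil(N / stride); 'VALID' and explicit (before, after) pairs use the dilated kernel extent (K - 1) * rhs_dilation + 1) and lhs_dilation=1; integer padding is not handled. The helper names are hypothetical, and this is not the layer's own shape inference.

```python
import math


def conv_out_len(n, k, stride=1, padding="SAME", rhs_dilation=1):
    """Output length along one spatial axis, assuming lhs_dilation=1."""
    k_eff = (k - 1) * rhs_dilation + 1  # positions covered by the dilated kernel
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - k_eff + 1) / stride)
    lo, hi = padding  # explicit (before, after) padding for this axis
    return (n + lo + hi - k_eff) // stride + 1


def to_triple(value):
    """Broadcast an int to a 3-tuple; pass sequences through as tuples."""
    return (value,) * 3 if isinstance(value, int) else tuple(value)


def conv3d_out_size(in_size, out_channels, kernel_size, stride=1,
                    padding="SAME", rhs_dilation=1):
    """Estimate (H_out, W_out, D_out, out_channels) from in_size = (H, W, D, C)."""
    spatial = in_size[:3]  # drop the input channel count
    pads = [padding] * 3 if isinstance(padding, str) else list(padding)
    dims = tuple(
        conv_out_len(n, k, s, p, d)
        for n, k, s, p, d in zip(spatial, to_triple(kernel_size), to_triple(stride),
                                 pads, to_triple(rhs_dilation))
    )
    return dims + (out_channels,)


print(conv3d_out_size((16, 64, 64, 3), 32, 3))                   # (16, 64, 64, 32)
print(conv3d_out_size((32, 32, 32, 1), 64, 3, stride=2))         # (16, 16, 16, 64)
print(conv3d_out_size((32, 32, 32, 1), 64, 3, padding="VALID"))  # (30, 30, 30, 64)
```

The first two calls use the sizes from the Examples section; the third shows how 'VALID' trims K - 1 positions per axis.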

in_channels

Number of input channels.

Type:

int

out_channels

Number of output channels.

Type:

int

kernel_size

Size of the convolving kernel (height, width, depth).

Type:

tuple of int
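kernel_size is stored as a tuple even when a single integer is passed. A minimal sketch of that int-or-triple normalization (the hypothetical to_triple helper, assumed to apply to stride and the dilation factors too), plus the extent covered by a dilated kernel, which shows how rhs_dilation widens the receptive field while the number of weights per axis stays K:

```python
def to_triple(value):
    """Broadcast an int to (v, v, v); validate and pass through 3-element sequences."""
    if isinstance(value, int):
        return (value,) * 3
    value = tuple(value)
    if len(value) != 3:
        raise ValueError(f"expected 3 values for a 3D convolution, got {len(value)}")
    return value


def dilated_extent(kernel_size, rhs_dilation=1):
    """Input positions spanned along each axis by a dilated kernel: (K - 1) * d + 1."""
    return tuple((k - 1) * d + 1
                 for k, d in zip(to_triple(kernel_size), to_triple(rhs_dilation)))


print(to_triple(3))                          # (3, 3, 3)
print(to_triple((3, 5, 5)))                  # (3, 5, 5)
print(dilated_extent(3, rhs_dilation=2))     # (5, 5, 5)
print(dilated_extent((3, 5, 5), (1, 2, 2)))  # (3, 9, 9)
```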

weight

The learnable weights (and bias if specified) of the module.

Type:

ParamState

eps

Small constant for numerical stability in weight standardization.

Type:

float

Examples

>>> import brainstate
>>> import braintools
>>> import jax.numpy as jnp
>>>
>>> # Create a 3D convolution with weight standardization for video
>>> conv = brainstate.nn.ScaledWSConv3d(
...     in_size=(16, 64, 64, 3),
...     out_channels=32,
...     kernel_size=3
... )
>>>
>>> # Apply to input
>>> x = jnp.ones((4, 16, 64, 64, 3))
>>> y = conv(x)
>>> print(y.shape)
(4, 16, 64, 64, 32)
>>>
>>> # For medical imaging with custom parameters
>>> conv = brainstate.nn.ScaledWSConv3d(
...     in_size=(32, 32, 32, 1),
...     out_channels=64,
...     kernel_size=(3, 3, 3),
...     stride=2,
...     ws_gain=True,
...     eps=1e-5,
...     b_init=braintools.init.Constant(0.01)
... )
>>>
>>> # 3D grouped convolution with weight standardization
>>> conv = brainstate.nn.ScaledWSConv3d(
...     in_size=(8, 16, 16, 64),
...     out_channels=64,
...     kernel_size=3,
...     groups=8,
...     ws_gain=False
... )

Notes

Weight standardization formula:

Weight standardization reparameterizes the convolutional weights as:

\[\hat{W} = g \cdot \frac{W - \mu_W}{\sigma_W + \epsilon}\]

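where \(\mu_W\) and \(\sigma_W\) are the mean and standard deviation of the weights, computed separately for each output channel over its fan-in (all kernel positions and input channels), \(g\) is a learnable per-channel gain (present if ws_gain=True), and \(\epsilon\) is a small constant for numerical stability.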
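A NumPy sketch of the formula above, assuming a (kh, kw, kd, in_channels, out_channels) kernel layout with statistics taken per output channel. It illustrates the math only; the layer's internal implementation may differ in details such as where eps enters the denominator.

```python
import numpy as np


def weight_standardize(w, gain=None, eps=1e-4):
    """Standardize a (kh, kw, kd, in, out) kernel per output channel."""
    axes = tuple(range(w.ndim - 1))  # every axis except the output-channel axis
    mu = w.mean(axis=axes, keepdims=True)
    sigma = w.std(axis=axes, keepdims=True)
    w_hat = (w - mu) / (sigma + eps)
    if gain is not None:
        w_hat = gain * w_hat  # gain of shape (out,) broadcasts over the last axis
    return w_hat


rng = np.random.default_rng(0)
w = rng.normal(loc=0.3, scale=2.0, size=(3, 3, 3, 8, 16))
w_hat = weight_standardize(w)
print(np.allclose(w_hat.mean(axis=(0, 1, 2, 3)), 0.0))             # True
print(np.allclose(w_hat.std(axis=(0, 1, 2, 3)), 1.0, atol=1e-3))  # True
```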

Why weight standardization for 3D:

For 3D convolutions, weight standardization is particularly beneficial because:

  • 3D networks have many parameters per layer and are often hard to train

  • Reduces sensitivity to weight initialization

  • Improves gradient flow through very deep networks

  • Does not depend on batch statistics, so it remains effective at the small batch sizes that memory-hungry 3D models often require

  • Compatible with Group Normalization for batch-independent normalization
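The reduced sensitivity to initialization follows from the formula: multiplying the raw weights by a positive constant or adding an offset leaves the standardized kernel unchanged, up to the eps term. A small NumPy check under the same per-output-channel assumption (the helper is a local sketch, not a library function):

```python
import numpy as np


def weight_standardize(w, eps=1e-4):
    """Per-output-channel standardization of a (kh, kw, kd, in, out) kernel."""
    axes = tuple(range(w.ndim - 1))
    mu = w.mean(axis=axes, keepdims=True)
    sigma = w.std(axis=axes, keepdims=True)
    return (w - mu) / (sigma + eps)


rng = np.random.default_rng(1)
w = rng.normal(size=(3, 3, 3, 4, 8))

# The same kernel under a different initialization scale and offset
w_rescaled = 10.0 * w + 0.5

print(np.allclose(weight_standardize(w), weight_standardize(w_rescaled), atol=1e-3))  # True
```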

Applications:

Video understanding, medical imaging (CT, MRI scans), 3D object recognition, and temporal sequence modeling.

References