ScaledWSConv2d

class brainstate.nn.ScaledWSConv2d(in_size, out_channels, kernel_size, stride=1, padding='SAME', lhs_dilation=1, rhs_dilation=1, groups=1, ws_gain=True, eps=0.0001, w_init=XavierNormal(scale=1.0, mode='fan_avg', in_axis=-2, out_axis=-1, distribution='truncated_normal', rng=RandomState([ 900 9244]), unit=Unit("1")), b_init=None, w_mask=None, channel_first=False, name=None, param_type=<class 'brainstate.ParamState'>)

Two-dimensional convolution with weight standardization.

This layer standardizes the convolutional kernel before performing the convolution. Weight standardization normalizes the weights of each output channel to zero mean and unit variance, which tends to improve training dynamics and generalization, particularly in combination with group normalization.

The input should be a 4D array with the shape of [B, H, W, C] where B is batch size, H is height, W is width, and C is the number of input channels (channels-last format).

Parameters:
  • in_size (Sequence[int]) – The input shape without the batch dimension. For Conv2d: (H, W, C) where H is height, W is width, and C is the number of input channels. This argument is important as it is used to evaluate the output shape.

  • out_channels (int) – The number of output channels (also called filters or feature maps). These determine the depth of the output feature map.

  • kernel_size (int | Tuple[int, ...]) –

    The shape of the convolutional kernel. Can be:

    • An integer (e.g., 3): creates a square kernel (3, 3)

    • A tuple of two integers (e.g., (3, 5)): creates a (height, width) kernel

  • stride (int | Tuple[int, ...]) –

    The stride of the convolution. Controls how much the kernel moves at each step. Can be:

    • An integer: same stride for both dimensions

    • A tuple of two integers: (stride_height, stride_width)

    Default: 1.

  • padding (str | int | Tuple[int, int] | Sequence[Tuple[int, int]]) –

    The padding strategy. Options:

    • 'SAME': output spatial size equals input size when stride=1

    • 'VALID': no padding; each spatial dimension shrinks by kernel_size - 1 when stride=1

    • int: same symmetric padding for all dimensions

    • (pad_h, pad_w): different padding for each dimension

    • [(pad_h_before, pad_h_after), (pad_w_before, pad_w_after)]: explicit padding

    Default: 'SAME'.

  • lhs_dilation (int | Tuple[int, ...]) – The dilation factor for the input (left-hand side). Controls spacing between input elements. A value > 1 inserts zeros between input elements, as in a transposed (fractionally strided) convolution. Default: 1.

  • rhs_dilation (int | Tuple[int, ...]) – The dilation factor for the kernel (right-hand side). Also known as atrous convolution or dilated convolution. Increases the receptive field without increasing parameters by inserting zeros between kernel elements. Useful for semantic segmentation and dense prediction tasks. Default: 1.

  • groups (int) –

    Number of groups for grouped convolution. Must divide both in_channels and out_channels.

    • groups=1: standard convolution (all-to-all connections)

    • groups>1: grouped convolution (reduces parameters by factor of groups)

    • groups=in_channels: depthwise convolution (each input channel convolved separately)

    Default: 1.

  • w_init (Callable | Array | ndarray | bool | number | int | float | complex | Quantity) –

    Weight initializer for the convolutional kernel. Can be:

    • An initializer instance (e.g., braintools.init.XavierNormal())

    • A callable that returns an array given a shape

    • A direct array matching the kernel shape

    Default: XavierNormal().

  • b_init (Callable | Array | ndarray | bool | number | int | float | complex | Quantity | None) – Bias initializer. If None, no bias term is added to the output. Default: None.

  • ws_gain (bool) – Whether to include a learnable per-channel gain parameter in weight standardization. When True, the standardized weights are multiplied by a scaling factor learned during training, which improves model expressiveness. Highly recommended when used with Group Normalization. Default: True.

  • eps (float) – Small constant for numerical stability in weight standardization. Prevents division by zero when computing the weight standard deviation. Typical values: 1e-5 to 1e-4. Default: 1e-4.

  • w_mask (Callable | Array | ndarray | bool | number | int | float | complex | Quantity | None) – Optional weight mask for structured sparsity or custom connectivity. The mask is element-wise multiplied with the standardized kernel weights during the forward pass. Default: None.

  • name (str) – Name identifier for this module instance. Default: None.

  • param_type (type) – The parameter state class to use for managing learnable parameters. Default: ParamState.
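As a rough illustration of the lhs_dilation and rhs_dilation parameters above, the pure-Python sketch below inserts zeros between the elements of a 1D sequence. Along each spatial axis, this is what lhs_dilation does to the input and rhs_dilation does to the kernel. The helper names dilate and dilated_extent are hypothetical and are not part of brainstate.

```python
def dilate(seq, factor):
    """Insert (factor - 1) zeros between consecutive elements of seq."""
    out = []
    for i, value in enumerate(seq):
        if i > 0:
            out.extend([0] * (factor - 1))
        out.append(value)
    return out


def dilated_extent(n, factor):
    """Length of an n-element axis after dilation by `factor`."""
    return (n - 1) * factor + 1


# A 3-tap kernel with rhs_dilation=2 spans 5 input positions.
print(dilate([1, 2, 3], 2))   # [1, 0, 2, 0, 3]
print(dilated_extent(3, 2))   # 5
```

The number of non-zero taps is unchanged, which is why kernel dilation widens the receptive field without adding parameters.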

in_size

The input shape (H, W, C) without batch dimension.

Type:

tuple of int

out_size

The output shape (H_out, W_out, out_channels) without batch dimension.

Type:

tuple of int
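The spatial part of out_size can be estimated by hand. The sketch below assumes the usual XLA/JAX padding conventions and no dilation; conv_out_size is a hypothetical helper for illustration, not a brainstate function.

```python
import math


def conv_out_size(in_size, kernel, stride=1, padding="SAME"):
    """Output length along one spatial dimension (dilation not considered)."""
    if padding == "SAME":
        # Padding is chosen so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        lo = hi = 0
    elif isinstance(padding, int):
        lo = hi = padding          # symmetric padding
    else:
        lo, hi = padding           # explicit (before, after) padding
    return (in_size + lo + hi - kernel) // stride + 1


print(conv_out_size(64, 3, 1, "SAME"))    # 64
print(conv_out_size(64, 3, 1, "VALID"))   # 62
print(conv_out_size(224, 7, 2, "SAME"))   # 112
print(conv_out_size(64, 3, 2, (1, 1)))    # 32
```

Applying the helper to H and W separately and appending out_channels gives the expected (H_out, W_out, out_channels) tuple under these assumptions.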

in_channels

Number of input channels.

Type:

int

out_channels

Number of output channels.

Type:

int

kernel_size

Size of the convolving kernel (height, width).

Type:

tuple of int

weight

The learnable weights (and bias if specified) of the module.

Type:

ParamState
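To get a feel for how groups affects the number of learnable kernel weights, the sketch below assumes an HWIO kernel layout, (kernel_h, kernel_w, in_channels // groups, out_channels). This layout is an assumption: it is the common JAX convention and is consistent with the in_axis=-2, out_axis=-1 defaults of the initializer shown in the signature. The helper conv2d_kernel_shape is hypothetical.

```python
import math


def conv2d_kernel_shape(kernel_size, in_channels, out_channels, groups=1):
    """Assumed HWIO kernel shape for a (possibly grouped) 2D convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both in_channels and out_channels")
    kh, kw = kernel_size
    return (kh, kw, in_channels // groups, out_channels)


# Standard, grouped, and depthwise variants of a 3x3 conv with 128 channels.
for groups in (1, 4, 128):
    shape = conv2d_kernel_shape((3, 3), 128, 128, groups)
    print(groups, shape, math.prod(shape))
# 1 (3, 3, 128, 128) 147456
# 4 (3, 3, 32, 128) 36864
# 128 (3, 3, 1, 128) 1152
```

The weight count falls by a factor of groups, matching the description of the groups parameter.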

eps

Small constant for numerical stability in weight standardization.

Type:

float

Examples

>>> import brainstate
>>> import braintools
>>> import jax.numpy as jnp
>>>
>>> # Create a 2D convolution with weight standardization
>>> conv = brainstate.nn.ScaledWSConv2d(
...     in_size=(64, 64, 3),
...     out_channels=32,
...     kernel_size=3
... )
>>>
>>> # Apply to input
>>> x = jnp.ones((8, 64, 64, 3))
>>> y = conv(x)
>>> print(y.shape)  # (8, 64, 64, 32)
>>>
>>> # Combine with custom settings for ResNet-style architecture
>>> conv = brainstate.nn.ScaledWSConv2d(
...     in_size=(224, 224, 3),
...     out_channels=64,
...     kernel_size=7,
...     stride=2,
...     padding='SAME',
...     ws_gain=True,
...     b_init=braintools.init.ZeroInit()
... )
>>>
>>> # Depthwise convolution with weight standardization
>>> conv = brainstate.nn.ScaledWSConv2d(
...     in_size=(32, 32, 128),
...     out_channels=128,
...     kernel_size=3,
...     groups=128,
...     ws_gain=False
... )

Notes

Weight standardization formula:

Weight standardization reparameterizes the convolutional weights as:

\[\hat{W} = g \cdot \frac{W - \mu_W}{\sigma_W + \epsilon}\]

where \(\mu_W\) and \(\sigma_W\) are the mean and standard deviation of the weights computed per output channel, \(g\) is a learnable gain parameter (if ws_gain=True), and \(\epsilon\) is a small constant.
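The formula can be transcribed directly into NumPy. This is a minimal sketch of the equation as documented here, not the library's internal implementation, which may differ in detail. It assumes the output-channel axis is the last axis of the kernel; standardize_weights and its mask argument are hypothetical names, with mask mirroring the w_mask parameter.

```python
import numpy as np


def standardize_weights(w, gain=None, eps=1e-4, mask=None):
    """Apply W_hat = g * (W - mean) / (std + eps) per output channel."""
    # Assumption: the last axis indexes output channels, so the statistics
    # are reduced over every other axis.
    axes = tuple(range(w.ndim - 1))
    mean = w.mean(axis=axes, keepdims=True)
    std = w.std(axis=axes, keepdims=True)
    w_hat = (w - mean) / (std + eps)
    if gain is not None:
        w_hat = gain * w_hat      # gain has shape (out_channels,)
    if mask is not None:
        w_hat = w_hat * mask      # element-wise mask on the standardized kernel
    return w_hat


rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 8, 16))   # (kh, kw, in_channels, out_channels)
w_hat = standardize_weights(w)

# Per-channel statistics after standardization: close to 0 and 1.
print(np.abs(w_hat.mean(axis=(0, 1, 2))).max())
print(w_hat.std(axis=(0, 1, 2)).min(), w_hat.std(axis=(0, 1, 2)).max())

# A constant kernel has zero variance; eps keeps the result finite (all zeros).
print(np.isfinite(standardize_weights(np.ones((3, 3, 8, 16)))).all())
```

Because eps is added to the standard deviation, the per-channel standard deviation of the result is slightly below 1 rather than exactly 1.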

Benefits:

  • Reduces internal covariate shift

  • Smooths the loss landscape

  • Works well with Group Normalization

  • Improves training stability with small batch sizes

  • Enables training deeper networks more easily

References