ScaledWSConv2d
- class brainstate.nn.ScaledWSConv2d(in_size, out_channels, kernel_size, stride=1, padding='SAME', lhs_dilation=1, rhs_dilation=1, groups=1, ws_gain=True, eps=0.0001, w_init=XavierNormal( scale=1.0, mode='fan_avg', in_axis=-2, out_axis=-1, distribution='truncated_normal', rng=RandomState([ 900 9244]), unit=Unit("1") ), b_init=None, w_mask=None, channel_first=False, name=None, param_type=<class 'brainstate.ParamState'>)
Two-dimensional convolution with weight standardization.
This layer applies weight standardization to the convolutional kernel before performing the convolution operation. Weight standardization normalizes the weights of each output channel to have zero mean and unit variance, which tends to improve training dynamics and model generalization, particularly in combination with group normalization.
The input should be a 4D array with the shape of [B, H, W, C], where B is the batch size, H is the height, W is the width, and C is the number of input channels (channels-last format).
- Parameters:
in_size (Sequence[int]) – The input shape without the batch dimension. For Conv2d: (H, W, C) where H is height, W is width, and C is the number of input channels. This argument is required because it is used to infer the output shape.
out_channels (int) – The number of output channels (also called filters or feature maps). This determines the depth of the output feature map.
kernel_size (int | Tuple[int, ...]) – The shape of the convolutional kernel. Can be:
An integer (e.g., 3): creates a square kernel (3, 3)
A tuple of two integers (e.g., (3, 5)): creates a (height, width) kernel
stride (int | Tuple[int, ...]) – The stride of the convolution. Controls how far the kernel moves at each step. Can be:
An integer: same stride for both dimensions
A tuple of two integers: (stride_height, stride_width)
Default: 1.
padding (str | int | Tuple[int, int] | Sequence[Tuple[int, int]]) – The padding strategy. Options:
'SAME': output spatial size equals input size when stride=1
'VALID': no padding; with stride=1 and rhs_dilation=1, each spatial dimension shrinks by kernel_size - 1
int: same symmetric padding for all dimensions
(pad_h, pad_w): different padding for each dimension
[(pad_h_before, pad_h_after), (pad_w_before, pad_w_after)]: explicit padding
Default: 'SAME'.
lhs_dilation (int | Tuple[int, ...]) – The dilation factor for the input (left-hand side). Controls spacing between input elements. A value > 1 inserts zeros between input elements, which is how transposed convolution is typically expressed. Default: 1.
rhs_dilation (int | Tuple[int, ...]) – The dilation factor for the kernel (right-hand side). Also known as atrous convolution or dilated convolution. Increases the receptive field without increasing the number of parameters by inserting zeros between kernel elements. Useful for semantic segmentation and dense prediction tasks. Default: 1.
groups (int) – Number of groups for grouped convolution. Must divide both in_channels and out_channels.
groups=1: standard convolution (all-to-all connections)
groups>1: grouped convolution (reduces parameters by a factor of groups)
groups=in_channels: depthwise convolution (each input channel convolved separately)
Default: 1.
w_init (Callable | Array | ndarray | bool | number | int | float | complex | Quantity) – Weight initializer for the convolutional kernel. Can be:
An initializer instance (e.g., braintools.init.XavierNormal())
A callable that returns an array given a shape
A direct array matching the kernel shape
Default: XavierNormal().
b_init (Callable | Array | ndarray | bool | number | int | float | complex | Quantity | None) – Bias initializer. If None, no bias term is added to the output. Default: None.
ws_gain (bool) – Whether to include a learnable per-channel gain parameter in weight standardization. When True, adds a scaling factor that can be learned during training, improving model expressiveness. Highly recommended when used with Group Normalization. Default: True.
eps (float) – Small constant for numerical stability in weight standardization. Prevents division by zero when computing the weight standard deviation. Typical values: 1e-4 to 1e-5. Default: 1e-4.
w_mask (Callable | Array | ndarray | bool | number | int | float | complex | Quantity | None) – Optional weight mask for structured sparsity or custom connectivity. The mask is multiplied element-wise with the standardized kernel weights during the forward pass. Default: None.
name (str) – Name identifier for this module instance. Default: None.
param_type (type) – The parameter state class to use for managing learnable parameters. Default: ParamState.
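To make the effect of groups on parameter count concrete, here is a minimal pure-Python sketch. It assumes a kernel laid out as (kernel_h, kernel_w, in_channels // groups, out_channels); that layout and the helper name conv2d_kernel_shape are illustrative assumptions, not part of the brainstate API, and the resulting count does not depend on the layout.

```python
import math


def conv2d_kernel_shape(kernel_size, in_channels, out_channels, groups=1):
    """Hypothetical helper: kernel shape for a grouped 2D convolution.

    Assumes an (H, W, in_channels // groups, out_channels) layout.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both in_channels and out_channels")
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    kernel_h, kernel_w = kernel_size
    return (kernel_h, kernel_w, in_channels // groups, out_channels)


# Standard convolution: every output channel sees all 128 input channels.
dense = conv2d_kernel_shape(3, 128, 128, groups=1)
# Depthwise convolution: each output channel sees a single input channel.
depthwise = conv2d_kernel_shape(3, 128, 128, groups=128)

print(dense, math.prod(dense))          # (3, 3, 128, 128) 147456
print(depthwise, math.prod(depthwise))  # (3, 3, 1, 128) 1152
```

The depthwise kernel has 128 times fewer weights than the dense one, matching the "factor of groups" reduction described for the groups parameter.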
- weight
The learnable weights (and bias if specified) of the module.
- Type:
Examples
>>> import brainstate
>>> import braintools
>>> import jax.numpy as jnp
>>>
>>> # Create a 2D convolution with weight standardization
>>> conv = brainstate.nn.ScaledWSConv2d(
...     in_size=(64, 64, 3),
...     out_channels=32,
...     kernel_size=3
... )
>>>
>>> # Apply to input
>>> x = jnp.ones((8, 64, 64, 3))
>>> y = conv(x)
>>> print(y.shape)  # (8, 64, 64, 32)
>>>
>>> # Combine with custom settings for ResNet-style architecture
>>> conv = brainstate.nn.ScaledWSConv2d(
...     in_size=(224, 224, 3),
...     out_channels=64,
...     kernel_size=7,
...     stride=2,
...     padding='SAME',
...     ws_gain=True,
...     b_init=braintools.init.ZeroInit()
... )
>>>
>>> # Depthwise convolution with weight standardization
>>> conv = brainstate.nn.ScaledWSConv2d(
...     in_size=(32, 32, 128),
...     out_channels=128,
...     kernel_size=3,
...     groups=128,
...     ws_gain=False
... )
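The output shapes in these examples follow from the padding and stride settings. The sketch below computes one spatial dimension of the output under the usual XLA-style conventions for 'SAME' and 'VALID' padding, assuming lhs_dilation=1. The function conv_output_size is a hypothetical helper written for illustration; it is not part of brainstate.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding="SAME", rhs_dilation=1):
    """Hypothetical helper: output length along one spatial dimension.

    Assumes lhs_dilation=1 and the common 'SAME'/'VALID' conventions.
    """
    # Dilation spreads the kernel taps apart without adding parameters.
    effective_kernel = (kernel_size - 1) * rhs_dilation + 1
    if padding == "SAME":
        # Padding is chosen so that only the stride shrinks the output.
        return -(-in_size // stride)  # integer ceil(in_size / stride)
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        return (in_size - effective_kernel) // stride + 1
    if isinstance(padding, int):
        # Symmetric explicit padding on both sides.
        return (in_size + 2 * padding - effective_kernel) // stride + 1
    raise ValueError(f"unsupported padding: {padding!r}")


print(conv_output_size(64, 3))                                    # 64
print(conv_output_size(224, 7, stride=2))                         # 112
print(conv_output_size(64, 3, padding="VALID"))                   # 62
print(conv_output_size(64, 3, padding="VALID", rhs_dilation=2))   # 60
```

Under these assumptions, the first example maps a 64x64 input to a 64x64 output, and the ResNet-style stem with stride=2 would map 224x224 to 112x112.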
Notes
Weight standardization formula:
Weight standardization reparameterizes the convolutional weights as:
\[\hat{W} = g \cdot \frac{W - \mu_W}{\sigma_W + \epsilon}\]
where \(\mu_W\) and \(\sigma_W\) are the mean and standard deviation of the weights computed per output channel, \(g\) is a learnable gain parameter (if ws_gain=True), and \(\epsilon\) is a small constant.
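The formula can be sketched in a few lines of NumPy. This is an illustration of the equation as written here, not the library's implementation: it assumes the output channel is the last kernel axis, that statistics are taken over all remaining axes, and the name standardize_weights is hypothetical. The actual layer may differ in details such as additional scaling.

```python
import numpy as np


def standardize_weights(w, gain=None, eps=1e-4):
    """Hypothetical helper: W_hat = g * (W - mean) / (std + eps).

    Statistics are computed per output channel, assumed to be the last axis.
    """
    reduce_axes = tuple(range(w.ndim - 1))  # every axis except the output channel
    mean = w.mean(axis=reduce_axes, keepdims=True)
    std = w.std(axis=reduce_axes, keepdims=True)
    w_hat = (w - mean) / (std + eps)
    if gain is not None:
        # One gain value per output channel, broadcast over the other axes.
        w_hat = gain * w_hat
    return w_hat


rng = np.random.default_rng(0)
# Kernel with an assumed (H, W, in_channels, out_channels) layout.
w = rng.normal(loc=0.5, scale=2.0, size=(3, 3, 16, 32))
w_hat = standardize_weights(w, gain=np.ones(32))

# Each output channel now has (approximately) zero mean and unit variance.
print(np.abs(w_hat.mean(axis=(0, 1, 2))).max())
print(w_hat.std(axis=(0, 1, 2)).min(), w_hat.std(axis=(0, 1, 2)).max())
```

With a unit gain the per-channel standard deviation is slightly below 1 because of the \(\epsilon\) term in the denominator; a learned gain rescales each output channel independently.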
Benefits:
Reduces internal covariate shift
Smooths the loss landscape
Works well with Group Normalization
Improves training stability with small batch sizes
Enables training deeper networks more easily
References