braintools.init module#
Parameter Initialization System for Neural Networks.
This module provides a comprehensive initialization system for neural network parameters, including weights, delays, and spatial connectivity patterns. The system is designed with composability and biological realism in mind.
Key Features:
Basic Distributions: Constant, Uniform, Normal, LogNormal, Gamma, Exponential, etc.
Variance Scaling: Xavier, Kaiming/He, LeCun initialization for deep learning
Orthogonal Methods: Orthogonal, DeltaOrthogonal, Identity initialization
Composite Patterns: Mixture, Conditional, Scaled, Clipped distributions
Distance-Based: Spatial connectivity with Gaussian, Exponential, Power-law profiles
Composable Design: Combine and transform initializations with arithmetic operations
Unit-Aware: Full integration with BrainUnit for physical quantities
Quick Start:
import brainunit as u
from braintools.init import Normal, Uniform, param
# Basic distributions
weight_init = Normal(mean=1.0 * u.nS, std=0.2 * u.nS)
weights = weight_init(1000)
# Uniform distribution for delays
delay_init = Uniform(low=1.0 * u.ms, high=3.0 * u.ms)
delays = delay_init(1000)
# Use param() helper for flexible initialization
w = param(Normal(0.5 * u.nS, 0.1 * u.nS), sizes=100)
Basic Distributions:
import numpy as np
import brainunit as u
from braintools.init import (
Constant, ZeroInit, Uniform, Normal, LogNormal,
Gamma, Exponential, TruncatedNormal, Beta, Weibull
)
# Constant values
const_init = Constant(0.5 * u.nS)
zero_init = ZeroInit(u.nS)
# Uniform distribution
uniform_init = Uniform(0.1 * u.nS, 1.0 * u.nS)
# Normal distribution
normal_init = Normal(0.5 * u.nS, 0.1 * u.nS)
# Log-normal for positive-only values
lognormal_init = LogNormal(0.5 * u.nS, 0.2 * u.nS)
# Truncated normal with bounds
truncated_init = TruncatedNormal(
mean=0.5 * u.nS,
std=0.2 * u.nS,
low=0.0 * u.nS,
high=1.0 * u.nS
)
Variance Scaling (Deep Learning):
import numpy as np
from braintools.init import (
KaimingUniform, KaimingNormal,
XavierUniform, XavierNormal,
LecunUniform, LecunNormal
)
# Kaiming/He initialization (for ReLU networks)
kaiming_init = KaimingNormal(mode='fan_in')
weights = kaiming_init((256, 128))
# Xavier/Glorot initialization (for tanh/sigmoid)
xavier_init = XavierUniform()
weights = xavier_init((256, 128))
# LeCun initialization (for SELU networks)
lecun_init = LecunNormal()
weights = lecun_init((256, 128))
Orthogonal Initialization:
import numpy as np
from braintools.init import Orthogonal, DeltaOrthogonal, Identity
# Orthogonal matrix (for recurrent networks)
ortho_init = Orthogonal(scale=np.sqrt(2))
weights = ortho_init((100, 100))
# Identity initialization
identity_init = Identity(scale=1.0)
weights = identity_init((100, 100))
Composite Patterns:
import numpy as np
import brainunit as u
from braintools.init import (
Mixture, Conditional, Scaled, Clipped,
Normal, Uniform
)
# Mixture of distributions
mixed_init = Mixture(
distributions=[
Normal(0.5 * u.nS, 0.1 * u.nS),
Uniform(0.8 * u.nS, 1.2 * u.nS)
],
weights=[0.7, 0.3]
)
# Conditional based on neuron properties
def is_excitatory(indices):
return indices < 800
conditional_init = Conditional(
condition_fn=is_excitatory,
true_dist=Normal(0.5 * u.nS, 0.1 * u.nS),
false_dist=Normal(-0.3 * u.nS, 0.05 * u.nS)
)
# Scaled and clipped distributions
scaled_init = Scaled(Normal(1.0 * u.nS, 0.2 * u.nS), scale_factor=0.5)
clipped_init = Clipped(
Normal(0.5 * u.nS, 0.3 * u.nS),
min_val=0.0 * u.nS,
max_val=1.0 * u.nS
)
Distance-Based Connectivity:
import numpy as np
import brainunit as u
from braintools.init import (
GaussianProfile, ExponentialProfile, DistanceModulated,
Normal
)
# Gaussian distance profile
gaussian_profile = GaussianProfile(
sigma=50.0 * u.um,
max_distance=200.0 * u.um
)
# Exponential decay
exp_profile = ExponentialProfile(
decay_constant=100.0 * u.um,
max_distance=500.0 * u.um
)
# Distance-modulated weights
positions_pre = np.random.uniform(0, 1000, (100, 2)) * u.um
positions_post = np.random.uniform(0, 1000, (100, 2)) * u.um
dist_init = DistanceModulated(
base_dist=Normal(1.0 * u.nS, 0.2 * u.nS),
distance_profile=gaussian_profile
)
weights = dist_init(
(100, 100),
pre_positions=positions_pre,
post_positions=positions_post
)
Composable Initialization:
import numpy as np
import brainunit as u
from braintools.init import Normal, Compose
# Arithmetic composition
init1 = Normal(0.5 * u.nS, 0.1 * u.nS) * 2.0 + 0.1 * u.nS
# Clipping
init2 = Normal(0.5 * u.nS, 0.2 * u.nS).clip(0.0 * u.nS, 1.0 * u.nS)
# Functional composition with pipe operator
init3 = (
Normal(1.0 * u.nS, 0.2 * u.nS) |
(lambda x: u.math.maximum(x, 0 * u.nS)) |
(lambda x: x * 0.5)
)
# Compose class for complex chains
init4 = Compose(
Normal(1.0 * u.nS, 0.2 * u.nS),
lambda x: u.math.maximum(x, 0 * u.nS),
lambda x: x * 0.5
)
Comprehensive parameter initialization toolkit for brain modeling, featuring statistical distributions, variance scaling strategies, orthogonal initializations, and distance-dependent connectivity patterns.
Overview#
The braintools.init module provides:
Statistical distributions for basic weight and parameter initialization
Variance scaling methods (Kaiming/He, Xavier/Glorot, LeCun) for deep neural networks
Orthogonal initializations for recurrent networks and deep architectures
Distance-dependent profiles for spatially-structured connectivity
Composite distributions for creating complex initialization patterns
Functional composition via arithmetic operations and transformations
Base Classes#
These classes provide the foundational architecture for all initialization strategies in the module.
The Initialization class defines the common interface for weight and parameter initializations,
while DistanceProfile serves as the base for distance-dependent connectivity patterns.
Base class for all parameter initialization strategies. |
|
Base class for distance-dependent connectivity profiles. |
Utility Functions#
Helper functions and type aliases that support initialization workflows.
Basic Distributions#
Fundamental statistical distributions for parameter initialization. These provide standard probability distributions commonly used in neural network initialization and brain modeling. All distributions support physical units via brainunit and accept an optional random number generator for reproducibility.
Constant and Uniform#
Simple initialization strategies for constant values and uniform random sampling.
Gaussian-like Distributions#
Normal and log-normal distributions for generating bell-curved parameter values. These are the most commonly used distributions for weight initialization.
Normal (Gaussian) distribution initialization. |
|
Log-normal distribution initialization. |
|
Truncated normal distribution initialization. |
Heavy-tailed and Skewed Distributions#
Distributions with non-Gaussian shapes, useful for modeling biological variability and creating diverse parameter patterns. These include exponential decay, gamma, beta, and Weibull distributions.
Gamma distribution initialization. |
|
Exponential distribution initialization. |
|
Beta distribution initialization (rescaled to desired range). |
|
Weibull distribution initialization. |
Variance Scaling Initializations#
Variance scaling methods automatically adjust the initialization scale based on layer dimensions to maintain stable gradient flow during training. These are essential for training deep neural networks and are widely used in modern deep learning frameworks.
All variance scaling methods support three modes:
fan_in: Scale by number of input units (default for most methods)
fan_out: Scale by number of output units
fan_avg: Scale by average of input and output units
Kaiming/He Initialization#
Recommended for ReLU and leaky ReLU activations. Maintains variance through layers with rectified linear units by accounting for the fact that ReLU zeros out half the activations.
Reference: He et al., “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, ICCV 2015.
Kaiming/He uniform initialization. |
|
Kaiming/He normal initialization. |
Xavier/Glorot Initialization#
Recommended for tanh and sigmoid activations. Uses fan_avg mode to balance gradient flow in both forward and backward passes. Particularly effective for symmetric activation functions.
Reference: Glorot & Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010.
Xavier/Glorot uniform initialization. |
|
Xavier/Glorot normal initialization. |
LeCun Initialization#
Similar to Xavier but uses fan_in only. Recommended for SELU (scaled exponential linear unit) activations, which have self-normalizing properties.
Reference: LeCun et al., “Efficient BackProp”, 1998.
LeCun uniform initialization. |
|
LeCun normal initialization. |
Orthogonal Initializations#
Orthogonal initialization methods create weight matrices with orthonormal rows or columns, which helps preserve gradient norms during backpropagation. These are particularly useful for recurrent neural networks and very deep architectures where gradient flow is critical.
Orthogonal matrix initialization. |
|
Delta-orthogonal initialization for deep CNNs. |
|
Identity matrix initialization. |
Orthogonal: Standard orthogonal matrix initialization using QR decomposition. Preserves gradient norms and is ideal for recurrent networks. Often scaled by sqrt(2) for ReLU networks.
DeltaOrthogonal: Specialized initialization for deep convolutional networks. Creates delta functions in spatial dimensions combined with orthogonal initialization in channel dimensions, enabling training of extremely deep CNNs (10,000+ layers).
Identity: Identity matrix initialization, optionally scaled. Creates matrices that initially perform identity transformations, useful for residual connections and gated recurrent architectures.
Composite Distributions#
Composite distributions combine or modify other distributions to create complex initialization patterns. These enable sophisticated initialization strategies by composing simple distributions in flexible ways.
Mixture of multiple weight distributions. |
|
Conditional weight distribution based on neuron properties. |
|
Scaled version of another distribution. |
|
Clipped version of another distribution. |
Mixture: Randomly selects from multiple distributions for each parameter according to specified probability weights. Useful for creating heterogeneous parameter populations.
Conditional: Uses different distributions based on neuron properties or indices. Enables excitatory/inhibitory distinction or layer-specific initialization strategies.
Scaled: Multiplies another distribution by a constant factor. Simple wrapper for scaling existing distributions.
Clipped: Clips distribution output to specified minimum and maximum values. Ensures parameters stay within desired bounds without changing the underlying distribution.
Distance-Modulated Initialization#
Distance-modulated initialization adjusts weights based on spatial distance between neurons, using a specified distance profile. This is essential for spatially-structured brain models where connection strength depends on neuronal proximity.
Initialization modulated by distance. |
Distance Profiles#
Distance profiles define how connection probability and weight strength vary with spatial distance between neurons. These are essential for modeling spatially-structured neural populations where connectivity follows anatomical principles.
All distance profiles implement two methods:
probability(distances): Connection probability as a function of distanceweight_scaling(distances): Weight strength scaling factor as a function of distance
Base Class#
Base class for distance-dependent connectivity profiles. |
Distance Profile Implementations#
Concrete distance profile implementations for various spatial connectivity patterns.
Import these from braintools.init:
Gaussian distance profile. |
|
Exponential distance profile. |
|
Linear distance profile. |
|
Step function distance profile. |
|
Power-law distance profile. |
|
Mexican Hat (Ricker wavelet) distance profile. |
|
Difference of Gaussians (DoG) distance profile. |
|
Sigmoid distance profile. |
|
Bimodal distance profile. |
|
Logistic distance profile. |
Composition Profiles#
These profiles enable functional composition and transformation of distance profiles, allowing complex spatial patterns to be built from simpler ones.
Binary operation composition of distance profiles. |
|
Clip a distance profile's output to a specified range. |
|
Apply an arbitrary function to a distance profile's output. |
|
Pipe/compose distance profiles or functions. |
Functional Composition#
Both Initialization and DistanceProfile classes support rich functional composition
through operator overloading and transformation methods:
Arithmetic Operations#
Addition (
+): Sum two initializations or add a constantSubtraction (
-): Subtract initializations or subtract a constantMultiplication (
*): Multiply initializations or scale by a constantDivision (
/): Divide initializations or scale by a reciprocal
Transformation Methods#
.clip(min_val, max_val): Clip output to specified range.add(value): Add a constant value.multiply(value): Multiply by a constant value.apply(func): Apply arbitrary function to output
Composition Operators#
Pipe (
|): Chain transformations functionallyCompose: Explicitly compose multiple transformations
Examples#
Basic Weight Initialization#
import numpy as np
import brainunit as u
from braintools.init import Normal, Uniform, Constant
# Gaussian weights with physical units
weight_init = Normal(0.5 * u.nS, 0.1 * u.nS)
rng = np.random.default_rng(0)
weights = weight_init(1000, rng=rng)
# Uniform delay distribution
delay_init = Uniform(1.0 * u.ms, 5.0 * u.ms)
delays = delay_init(1000, rng=rng)
# Constant bias
bias_init = Constant(-70.0 * u.mV)
biases = bias_init(100)
Variance Scaling for Deep Networks#
from braintools.init import KaimingNormal, XavierUniform
# Kaiming initialization for ReLU network
init_relu = KaimingNormal(mode='fan_in')
layer1_weights = init_relu((256, 784), rng=rng)
layer2_weights = init_relu((128, 256), rng=rng)
# Xavier initialization for tanh network
init_tanh = XavierUniform()
layer1_weights = init_tanh((256, 784), rng=rng)
layer2_weights = init_tanh((128, 256), rng=rng)
Orthogonal Initialization#
from braintools.init import Orthogonal, Identity
# Orthogonal recurrent weights
recurrent_init = Orthogonal(scale=np.sqrt(2))
recurrent_weights = recurrent_init((128, 128), rng=rng)
# Identity initialization for residual connections
residual_init = Identity(scale=1.0)
residual_weights = residual_init((256, 256))
Functional Composition#
from braintools.init import Normal, Uniform
# Compose with arithmetic operations
combined = Normal(0.5 * u.nS, 0.1 * u.nS) * 2.0 + 0.1 * u.nS
weights = combined(1000, rng=rng)
# Chain transformations with pipe operator
init = (Normal(1.0 * u.nS, 0.3 * u.nS) |
(lambda x: u.math.maximum(x, 0 * u.nS)) |
(lambda x: x * 0.5))
weights = init(1000, rng=rng)
# Clip to valid range
clipped_init = Normal(0.5 * u.nS, 0.2 * u.nS).clip(0.0 * u.nS, 1.0 * u.nS)
weights = clipped_init(1000, rng=rng)
Composite Distributions#
from braintools.init import Mixture, Conditional, Scaled
# Mixture of distributions (70% normal, 30% uniform)
mix_init = Mixture(
distributions=[
Normal(0.5 * u.nS, 0.1 * u.nS),
Uniform(0.8 * u.nS, 1.2 * u.nS)
],
weights=[0.7, 0.3]
)
weights = mix_init(1000, rng=rng)
# Conditional initialization (excitatory vs inhibitory)
def is_excitatory(indices):
return indices < 800
cond_init = Conditional(
condition_fn=is_excitatory,
true_dist=Normal(0.5 * u.nS, 0.1 * u.nS), # Excitatory
false_dist=Normal(-0.3 * u.nS, 0.05 * u.nS) # Inhibitory
)
weights = cond_init(1000, neuron_indices=np.arange(1000), rng=rng)
Distance-Dependent Connectivity#
from braintools.init import (GaussianProfile, ExponentialProfile,
DistanceModulated, Normal)
# Gaussian distance profile
gaussian_profile = GaussianProfile(sigma=100.0 * u.um)
# Exponential decay profile
exp_profile = ExponentialProfile(length_scale=200.0 * u.um)
# Distance-modulated weights
init = DistanceModulated(
base_dist=Normal(1.0 * u.nS, 0.2 * u.nS),
distance_profile=gaussian_profile,
min_weight=0.01 * u.nS
)
# Assuming distances is a distance matrix between neurons
distances = np.random.uniform(0, 500, size=1000) * u.um
weights = init(1000, distances=distances, rng=rng)
# Compose distance profiles
combined_profile = gaussian_profile * 0.7 + exp_profile * 0.3
# Clip profile values
clipped_profile = gaussian_profile.clip(0.1, 0.9)
# Apply custom transformation
transformed_profile = exp_profile.apply(lambda x: x ** 2)
Helper Function Usage#
from braintools.init import init_call, Normal, Compose
# init_call provides a unified interface
init1 = Normal(0.5 * u.nS, 0.1 * u.nS)
weights1 = init_call(init1, 100, rng=rng)
# Works with scalars
weights2 = init_call(0.5, 100)
# Works with arrays
weights3 = init_call(np.ones(100) * 0.5 * u.nS, 100)
# Compose multiple initializations
composed = Compose(
Normal(1.0 * u.nS, 0.2 * u.nS),
lambda x: u.math.maximum(x, 0 * u.nS),
lambda x: x * 0.5
)
weights = composed(1000, rng=rng)
Best Practices#
Choosing Initialization Methods#
For feedforward networks with ReLU: Use
KaimingNormalorKaimingUniformFor networks with tanh/sigmoid: Use
XavierNormalorXavierUniformFor recurrent networks: Use
Orthogonalwith appropriate scale (often sqrt(2))For very deep CNNs: Consider
DeltaOrthogonalFor biological models: Use
Normal,LogNormal, or distance-dependent profiles
Random Number Generators#
Always pass an RNG for reproducibility:
rng = np.random.default_rng(seed=42)
weights = init(1000, rng=rng)
Physical Units#
Use brainunit for physical quantities:
# Good: explicit units
weight_init = Normal(0.5 * u.nS, 0.1 * u.nS)
# Also valid: dimensionless
weight_init = Normal(0.5, 0.1)
Distance-Dependent Models#
For spatially-structured models, combine distance profiles with base distributions:
profile = GaussianProfile(sigma=100 * u.um)
init = DistanceModulated(
base_dist=Normal(1.0 * u.nS, 0.2 * u.nS),
distance_profile=profile
)
Composition Strategies#
Build complex patterns incrementally:
# Start simple
base = Normal(0.5 * u.nS, 0.1 * u.nS)
# Add constraints
base = base.clip(0.0 * u.nS, 1.0 * u.nS)
# Apply scaling
base = base * 2.0
# Add offset
final_init = base + 0.1 * u.nS