Batching Strategies

Online learning algorithms need to handle batched data efficiently. In braintrace, there are two main batching strategies:

Vmap-based batching (recommended): Compile the computation graph for a single sample, then use vmap to automatically vectorize across the batch dimension.
Single-sample mode: Process one sample at a time, without any batching.

The choice of strategy affects how model states are initialized and how the online learning algorithm is called.

This tutorial walks through each strategy with concrete examples and shows how to build a full training loop using vmap-based batching.

Vmap-Based Batching (Recommended)

The recommended approach is to compile the online learning graph for a single sample, then leverage JAX’s vmap to parallelize across the batch. braintrace.compile(..., batch_size=B, vmap=True) does this in one call:

It initializes one independent set of states per sample (batch_size sets in total).
It compiles the ETP graph from a single batched time step (shape (batch_size, n_in)); the batch axis (axis 0) is stripped internally to recover the per-sample example.
It wraps the algorithm with brainstate.nn.Vmap for parallel execution.
It returns the vmapped learner, ready to call on batched inputs.

This pattern keeps the model definition simple (single-sample logic) while gaining efficient batch parallelism automatically.

import jax
import jax.numpy as jnp
import brainstate
import braintools
import braintrace

class SimpleGRU(brainstate.nn.Module):
    def __init__(self, n_in, n_rec, n_out):
        super().__init__()
        self.rnn = braintrace.nn.GRUCell(n_in, n_rec)
        self.out = braintrace.nn.Linear(n_rec, n_out)

    def update(self, x):
        return self.out(self.rnn(x))

model = SimpleGRU(10, 64, 5)
batch_size = 16

# braintrace.compile with vmap=True:
#   - initializes per-sample hidden states (batch_size independent copies)
#   - compiles the ETP graph from a single batched time step: axis 0 is the batch
#     axis, which compile strips internally to recover the per-sample example
#   - wraps the result in brainstate.nn.Vmap for parallel execution
algo_vmapped = braintrace.compile(
    model, braintrace.D_RTRL, jnp.zeros((batch_size, 10)),
    batch_size=batch_size, vmap=True,
)

# Run on batched input — the returned learner handles the batch axis transparently
x_batch = jnp.ones((batch_size, 10))
out = algo_vmapped(x_batch)
print("Output shape:", out.shape)  # (16, 5)

How it works:

Internally, braintrace.compile(..., batch_size=B, vmap=True) does three things: it runs brainstate.transform.vmap_new_states to create B independent copies of all model states (tagged 'new'), compiles the ETP graph on a single-sample input, and wraps the learner in brainstate.nn.Vmap(algo, vmap_states='new') for parallel execution.
Each call to algo_vmapped(x_batch) automatically splits the batch input across the per-sample states, runs the forward pass independently for each sample, and stacks the outputs.
The model itself only ever sees single-sample inputs — all batch handling is transparent.
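
The idea of writing single-sample logic and vectorizing it afterwards can be illustrated with plain jax.vmap, independent of braintrace. The sketch below is an analogy only: rnn_step, the parameter names, and the shapes are hypothetical stand-ins, not braintrace internals.

```python
import jax
import jax.numpy as jnp

# Hypothetical single-sample recurrent step (not braintrace code):
# h has shape (n_rec,), x has shape (n_in,).
def rnn_step(params, h, x):
    return jnp.tanh(params["w_in"] @ x + params["w_rec"] @ h)

n_in, n_rec, batch_size = 10, 64, 16
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {
    "w_in": 0.1 * jax.random.normal(k1, (n_rec, n_in)),
    "w_rec": 0.1 * jax.random.normal(k2, (n_rec, n_rec)),
}

# One independent hidden state per sample, stacked along axis 0.
h_batch = jnp.zeros((batch_size, n_rec))
x_batch = jnp.ones((batch_size, n_in))

# Share params across the batch (None); map over axis 0 of h and x.
batched_step = jax.vmap(rnn_step, in_axes=(None, 0, 0))
h_next = batched_step(params, h_batch, x_batch)

# Reference result from an explicit per-sample Python loop.
h_loop = jnp.stack([rnn_step(params, h, x) for h, x in zip(h_batch, x_batch)])
print(h_next.shape)  # (16, 64)
print(bool(jnp.allclose(h_next, h_loop, atol=1e-5)))
```

The step function never sees a batch axis; only the in_axes specification decides which arguments are mapped and which are shared.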

Single-Sample Mode

For debugging or situations where batch processing is unnecessary, you can compile and run the algorithm on individual samples directly. No vmap or state replication is needed.

model2 = SimpleGRU(10, 64, 5)

# Single-sample mode: omit batch_size so states are created unbatched.
algo2 = braintrace.compile(model2, braintrace.D_RTRL, jnp.zeros(10))

# Process one sample at a time
x_single = jnp.ones(10)
out = algo2(x_single)
print("Single sample output shape:", out.shape)  # (5,)

This mode is straightforward: initialize the model, compile the graph, and call the algorithm. It is useful for step-by-step debugging or when processing a single stream of data.

Multi-Step Data

braintrace provides SingleStepData and MultiStepData wrappers to control how the algorithm processes input along the time dimension.

SingleStepData: Wraps data for a single time step. The algorithm processes it as one forward pass.
MultiStepData: Wraps a sequence of time steps. The algorithm internally scans over all steps in the sequence.

MultiStepData is useful when you want to pass an entire sequence to the algorithm and let it run the temporal loop internally, rather than iterating over time steps yourself.

# Single-step: process one time step at a time
x_single = braintrace.SingleStepData(jnp.ones(10))

# Multi-step: process a sequence
sequence = jnp.ones((20, 10))  # 20 time steps, 10 features
x_multi = braintrace.MultiStepData(sequence)

When a MultiStepData object is passed to the algorithm, it will iterate over the first axis (time steps) internally. When a SingleStepData object (or a plain array) is passed, the algorithm processes it as a single forward step.
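
The distinction between the two wrappers mirrors the difference between calling a step function once and scanning it over a leading time axis. The following plain-JAX sketch shows that distinction with jax.lax.scan. The step function and its decay constant are hypothetical, and this is not necessarily how braintrace implements MultiStepData.

```python
import jax
import jax.numpy as jnp

# Hypothetical leaky-integrator step: returns (new_state, output).
def step(h, x):
    h_new = 0.9 * h + 0.1 * x
    return h_new, h_new

h0 = jnp.zeros(10)

# "Single step": one forward pass on one time step of shape (10,).
h1, y1 = step(h0, jnp.ones(10))

# "Multi step": scan over axis 0 of a (20, 10) sequence,
# threading the state from one time step to the next.
sequence = jnp.ones((20, 10))
h_final, ys = jax.lax.scan(step, h0, sequence)

print(y1.shape)  # (10,)
print(ys.shape)  # (20, 10)
```

The first row of ys equals the single-step output y1, since both start from the same state and see the same input.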

Full Training Loop with Vmap Batching

Below is a complete example that combines vmap-based batching with a temporal training loop. The pattern is:

Initialize model states and compile the graph for a single sample.
Vmap the algorithm across the batch dimension.
Scan over time steps, accumulating gradients at each step.
Update parameters with the accumulated gradients.

@brainstate.transform.jit
def train_step(inputs, targets):
    """inputs: (n_steps, batch_size, n_in), targets: (batch_size, n_out)"""
    weights = model.states(brainstate.ParamState)

    # braintrace.compile with vmap=True replaces the manual init + compile_graph + Vmap pattern.
    # Pass the batched single time step inputs[0] (shape (batch_size, n_in)); compile strips
    # axis 0 internally to recover the per-sample example — do NOT pass inputs[0, 0].
    vmapped_algo = braintrace.compile(
        model, braintrace.D_RTRL, inputs[0],
        batch_size=inputs.shape[1], vmap=True,
    )

    def step_fn(prev_grads, inp):
        def loss_fn(inp):
            out = vmapped_algo(inp)
            return jnp.mean((out - targets) ** 2)
        cur_grads = brainstate.transform.grad(loss_fn, weights)(inp)
        return jax.tree.map(lambda a, b: a + b, prev_grads, cur_grads), None

    grads = jax.tree.map(jnp.zeros_like, weights.to_dict_values())
    grads, _ = brainstate.transform.scan(step_fn, grads, inputs)
    return grads

# Example usage
model = SimpleGRU(10, 64, 5)
inputs = jnp.ones((20, 16, 10))  # 20 steps, batch 16, 10 features
targets = jnp.zeros((16, 5))
grads = train_step(inputs, targets)
print("Gradient keys:", list(grads.keys()))

What happens in train_step:

weights collects all ParamState objects from the model.
braintrace.compile(model, braintrace.D_RTRL, inputs[0], batch_size=B, vmap=True) does three things in one call: it initializes per-sample hidden states, compiles the computation graph from a single batched time step (inputs[0], shape (batch_size, n_in)), and wraps the result in brainstate.nn.Vmap for batch-parallel execution.
brainstate.transform.scan iterates over the time dimension (inputs has shape (n_steps, batch_size, n_in)). At each step, step_fn computes the loss and its gradients with respect to weights, then accumulates them.
The returned grads dictionary can be passed to an optimizer (e.g., braintools.optim.Adam) for a parameter update.
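
How the accumulated gradients are applied depends on the optimizer. As a minimal sketch of that final update step, assuming the gradients and parameter values are available as dictionaries of arrays with matching structure, a plain SGD update can be written with jax.tree_util.tree_map. The dictionary keys, shapes, and the sgd_update helper are hypothetical; this stands in for an optimizer such as the one named above and does not reproduce its API.

```python
import jax
import jax.numpy as jnp

def sgd_update(params, grads, lr=1e-2):
    """Return new parameters after one plain SGD step (hypothetical helper)."""
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Stand-in parameter and gradient dictionaries with matching structure.
params = {"rnn/weight": jnp.ones((4, 3)), "out/bias": jnp.zeros(3)}
grads = {"rnn/weight": jnp.full((4, 3), 2.0), "out/bias": jnp.ones(3)}

new_params = sgd_update(params, grads, lr=0.1)
print(new_params["rnn/weight"][0, 0])  # 1 - 0.1 * 2 = 0.8
print(new_params["out/bias"][0])       # 0 - 0.1 * 1 = -0.1
```

Because tree_map walks both dictionaries in parallel, the same helper works for any nesting of parameter containers as long as params and grads share the same structure.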

Summary

braintrace.compile(..., batch_size=B, vmap=True) is the recommended way to set up batched online learning. It replaces the three-step manual pattern (init_all_states + compile_graph + Vmap) with a single call.
The workflow is: braintrace.compile (with vmap=True) → scan over time steps → accumulate gradients → update parameters.
SingleStepData and MultiStepData control whether the algorithm processes one time step or scans over an entire sequence internally.
For non-batched (single-sample) use, call braintrace.compile without batch_size and vmap=True; states are then initialized for a single sample.
