Graph Compilation & Visualization#

In braintrace, models are compiled into an ETraceGraph – an intermediate representation that captures the structural relationships between weight parameters, ETP primitives (the operations that connect inputs to hidden states), and hidden state groups. This compilation step is what enables efficient online learning: by analyzing the computation graph, braintrace can automatically determine which weights influence which hidden states, and how eligibility traces should propagate.

The show_graph() method visualizes these relationships, providing a human-readable summary of:

Hidden groups: clusters of hidden states that evolve together (e.g., the membrane potential and adaptation current of a neuron population)
Weight-primitive-hidden connections: which weight parameters are associated with which hidden groups through which ETP primitives
Non-ETP weights: parameters that exist in the model but do not participate in online learning

Understanding the compiled graph is essential for debugging model structure, verifying that the correct parameters are included in online learning, and optimizing model design.

Single-Layer RNN#

We start with the simplest case: a single recurrent layer followed by a linear readout. The ValinaRNNCell contains one hidden state and one recurrent weight, and the Linear readout has its own weight that feeds into the output.

import jax
import jax.numpy as jnp
import brainstate
import braintrace

class SingleLayerRNN(brainstate.nn.Module):
    def __init__(self, n_in, n_rec, n_out):
        super().__init__()
        self.rnn = braintrace.nn.ValinaRNNCell(n_in, n_rec)
        self.out = braintrace.nn.Linear(n_rec, n_out)

    def update(self, x):
        return self.out(self.rnn(x))


model = SingleLayerRNN(10, 32, 5)

# braintrace.compile initialises states, compiles the ETP graph, and returns a ready learner.
# We compile for a single unbatched sample (no batch_size), so the hidden state is (32,) and
# the recurrent op is the matrix-vector primitive etp_mv. verbose=2 prints full diagnostics.
learner = braintrace.compile(model, braintrace.D_RTRL, jnp.zeros(10), verbose=2)
learner.show_graph()

The output shows:

Hidden Group 0: the hidden state of the ValinaRNNCell (path ('rnn', 'h'))
Associated weight: the recurrent weight inside the RNN cell (('rnn', 'W', 'weight')), eligibility-traced because it feeds the hidden state through an ETP primitive
Non-etrace weight: the readout weight (('out', 'weight')). braintrace.nn.Linear does use an ETP primitive, but the readout’s output is the network’s final output and never flows back into a hidden state – so the compiler reports it as a non-temporal parameter (still trained, just not through an eligibility trace). A matching "has no connected hidden states" warning is emitted at compile time.

This tells us that D_RTRL will maintain an eligibility trace for the recurrent weight, tracking how it influences the hidden state over time.
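To make "eligibility trace" concrete, here is a minimal scalar sketch in plain Python. It is not braintrace's implementation: the one-unit recurrence h_t = tanh(w * h_{t-1} + x_t) and the helper names run_with_trace and run are assumptions chosen for illustration. The trace e_t = dh_t/dw is carried forward in time next to the state, then compared with a finite-difference estimate of the same derivative.

```python
import math

def run_with_trace(w, xs):
    """Run h_t = tanh(w * h_{t-1} + x_t) and carry e_t = dh_t/dw forward in time."""
    h, e = 0.0, 0.0
    for x in xs:
        h_new = math.tanh(w * h + x)
        # Chain rule: dh_t/dw = (1 - h_t^2) * (h_{t-1} + w * dh_{t-1}/dw)
        e = (1.0 - h_new ** 2) * (h + w * e)
        h = h_new
    return h, e

def run(w, xs):
    """Same recurrence without the trace, used only for the finite-difference check."""
    h = 0.0
    for x in xs:
        h = math.tanh(w * h + x)
    return h

xs = [0.5, -0.2, 0.1, 0.3]   # illustrative inputs
w = 0.8                      # illustrative recurrent weight
h_final, trace = run_with_trace(w, xs)

eps = 1e-6
fd = (run(w + eps, xs) - run(w - eps, xs)) / (2 * eps)
print(f"trace={trace:.6f}  finite-diff={fd:.6f}")
```

The two numbers should agree closely. The point of the sketch is that the trace needs only the previous state and the previous trace, so it can be updated online without storing the input history.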

Using `learner.report` — the CompilationReport#

Every learner returned by braintrace.compile carries a CompilationReport at learner.report. It aggregates the compiler’s findings into a single inspectable object so you can programmatically query what was included, what was excluded, and why — without parsing log output.

Key members:

report.counts — summary dict with keys hidden_groups, etrace_weights, excluded_weights, warnings, errors
report.etrace_weights — list of weight paths that have eligibility traces
report.excluded_weights — list of (weight path, reason) pairs excluded from online learning
report.dynamic_states — list of non-hidden dynamic-state paths discovered by the compiler
report.diagnostics — full list of CompilationRecord objects
report.show(level) — print a human-readable summary (1 = hidden groups + weight lists; 2 = also raw WARNING/ERROR diagnostics)

# report.show(level) prints a structured summary at the requested verbosity.
# level=1 shows hidden groups, etrace weights, and excluded weights.
learner.report.show(1)

# Programmatic access to the summary counts
print("Counts:", learner.report.counts)

# Which weights participate in online learning?
print("ETrace weights:", learner.report.etrace_weights)

# Which weights were excluded (e.g., non-temporal readouts)?
print("Excluded weights:", learner.report.excluded_weights)
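Because the report is plain data, it can back a structural check in a test suite. The sketch below is one possible shape for such a check, not part of braintrace: assert_traced is a hypothetical helper, it assumes that weight paths are hashable tuples, and it runs against a SimpleNamespace stand-in whose reason string is made up for illustration. Only the etrace_weights and excluded_weights members described above are used.

```python
from types import SimpleNamespace

def assert_traced(report, expected_paths):
    """Raise if any expected weight path is missing from report.etrace_weights."""
    traced = set(report.etrace_weights)
    excluded = dict(report.excluded_weights)  # {weight path: reason}
    missing = [p for p in expected_paths if p not in traced]
    if missing:
        details = ", ".join(
            f"{p} ({excluded.get(p, 'not reported')})" for p in missing
        )
        raise AssertionError(f"weights not eligibility-traced: {details}")

# Stand-in for learner.report; the reason text is illustrative, not real output.
fake_report = SimpleNamespace(
    etrace_weights=[("rnn", "W", "weight")],
    excluded_weights=[(("out", "weight"), "no connected hidden states")],
)

assert_traced(fake_report, [("rnn", "W", "weight")])  # passes silently
```

With a real learner, the same helper would be called as assert_traced(learner.report, [...]), provided the report members behave as described in the list above.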

Understanding ETraceGraph#

The compiled graph is an ETraceGraph named tuple with several key fields:

Field	Type	Description
`module_info`	`ModuleInfo`	Jaxpr and state mappings extracted from the model
`hidden_groups`	`Sequence[HiddenGroup]`	Discovered hidden state groups
`hid_path_to_group`	`Dict[Path, HiddenGroup]`	Mapping from hidden state path to its group
`hidden_param_op_relations`	`Sequence[HiddenParamOpRelation]`	Weight-primitive-hidden connections
`hidden_perturb`	`HiddenPerturbation` or `None`	Perturbation structure for Jacobian computation

Each HiddenGroup records a cluster of hidden states that are updated together in one recurrent step. Each HiddenParamOpRelation records the connection between a weight parameter and the hidden groups it feeds into through an ETP primitive.

Let’s inspect these programmatically:

graph = learner.graph

print("=== Hidden Groups ===")
for g in graph.hidden_groups:
    print(f"  Group {g.index}: {g.num_state} state(s), shape {g.varshape}")
    print(f"    Paths: {g.hidden_paths}")

print("\n=== Weight-Primitive-Hidden Relations ===")
for i, r in enumerate(graph.hidden_param_op_relations):
    print(f"  Relation {i}:")
    # ``trainable_paths`` is a dict {trainable key -> owning ParamState path};
    # a single primitive may own several (e.g. {weight, bias}).
    print(f"    Trainable paths: {r.trainable_paths}")
    print(f"    Primitive: {r.primitive}")
    print(f"    Hidden groups: {[g.index for g in r.hidden_groups]}")

print("\n=== Perturbation ===")
print(f"  Has perturbation: {graph.hidden_perturb is not None}")

The HiddenGroup.num_state property returns the total number of state variables in the group, and HiddenGroup.varshape returns the shape of each state variable. The HiddenParamOpRelation.primitive field identifies which ETP primitive connects the weight to the hidden state – here etp_mv (matrix-vector product); you will also see etp_mm (matrix-matrix, e.g. under batching) and etp_conv (convolution) in other models.
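The relations are stored weight-first, but when debugging it is often handier to ask the opposite question: which weights feed a given hidden group? The following sketch inverts the mapping. weights_by_group is a hypothetical helper, and the namedtuple stand-ins mimic only the attributes used in the loop above (hidden_param_op_relations, trainable_paths, hidden_groups, index); the "weight" key in the toy data is an assumption.

```python
from collections import defaultdict, namedtuple

def weights_by_group(graph):
    """Map each hidden-group index to the ParamState paths that feed it."""
    table = defaultdict(list)
    for rel in graph.hidden_param_op_relations:
        for group in rel.hidden_groups:
            for path in rel.trainable_paths.values():
                # Several trainable keys may share one owning path; list it once.
                if path not in table[group.index]:
                    table[group.index].append(path)
    return dict(table)

# Stand-ins that mimic only the attributes read by weights_by_group.
Group = namedtuple("Group", "index")
Relation = namedtuple("Relation", "trainable_paths hidden_groups")
Graph = namedtuple("Graph", "hidden_param_op_relations")

toy = Graph([Relation({"weight": ("rnn", "W", "weight")}, [Group(0)])])
print(weights_by_group(toy))  # {0: [('rnn', 'W', 'weight')]}
```

If the attribute names hold on a real compiled graph, weights_by_group(learner.graph) would give the same kind of table for the model above.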

Two-Layer RNN#

Stacking recurrent layers makes the compiled graph richer: there are more hidden states and more weights, and the compiler must decide how to group the hidden states. Below we stack two GRUCell layers and a linear readout, then read the structure straight off show_graph() – rather than assuming one group per layer, we let the compiler tell us how it grouped things.

class TwoLayerRNN(brainstate.nn.Module):
    def __init__(self, n_in, n_rec, n_out):
        super().__init__()
        self.rnn1 = braintrace.nn.GRUCell(n_in, n_rec)
        self.rnn2 = braintrace.nn.GRUCell(n_rec, n_rec)
        self.out = braintrace.nn.Linear(n_rec, n_out)

    def update(self, x):
        h1 = self.rnn1(x)
        h2 = self.rnn2(h1)
        return self.out(h2)


model2 = TwoLayerRNN(10, 32, 5)

# Compile for a single unbatched sample (no batch_size).
learner2 = braintrace.compile(model2, braintrace.D_RTRL, jnp.zeros(10))
learner2.show_graph()

Notice that:

Both GRU hidden states – ('rnn1', 'h') and ('rnn2', 'h') – are collected into a single hidden group. The compiler co-locates coupled hidden states that share the same shape, stacking their per-state Jacobians along one axis; because both layers here use the same recurrent width (n_rec = 32), they land in one group. Give the two layers different widths and the compiler reports two separate groups instead – the Hidden State Management tutorial shows exactly that case (a 16-unit layer feeding an 8-unit layer yields one group each). The practical habit is to read the grouping off show_graph() rather than assume it.
Four weight matrices are eligibility-traced into that group: the update-gate (Wz) and candidate (Wh) weights of each layer.
The reset-gate weights (Wr) and the readout (out) are listed as non-etrace parameters. Wr is excluded by the weight -> weight -> hidden rule – its only path to a hidden state runs through the candidate weight Wh, so it must be learned by BPTT (see the Limitations tutorial) – while out never feeds a hidden state at all.

Reading this off show_graph() tells you exactly which weights D_RTRL learns online (the four traced GRU weights) and which it leaves to ordinary backprop (Wr, out).
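The effect of layer width on grouping can be pictured with a deliberately simplified toy. It is not the compiler's algorithm: group_by_shape is a hypothetical function that assumes every hidden state is coupled to the others and groups on shape alone, whereas the real compiler also has to establish the coupling.

```python
from collections import defaultdict

def group_by_shape(hidden_shapes):
    """Toy grouping: hidden states with equal shape share one group."""
    buckets = defaultdict(list)
    for path, shape in hidden_shapes.items():
        buckets[shape].append(path)
    return list(buckets.values())

# Both layers 32 units wide, as in TwoLayerRNN(10, 32, 5).
same_width = {("rnn1", "h"): (32,), ("rnn2", "h"): (32,)}
# A 16-unit layer feeding an 8-unit layer.
diff_width = {("rnn1", "h"): (16,), ("rnn2", "h"): (8,)}

print(group_by_shape(same_width))  # one group holding both paths
print(group_by_shape(diff_width))  # two groups, one path each
```

The toy reproduces the two outcomes described above, but the authoritative answer for a real model is still whatever show_graph() prints.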

# Inspect the two-layer graph programmatically via learner.graph and learner.report
graph2 = learner2.graph

print(f"Number of hidden groups: {len(graph2.hidden_groups)}")
print(f"Number of weight-hidden relations: {len(graph2.hidden_param_op_relations)}")

print("\nHidden groups:")
for g in graph2.hidden_groups:
    print(f"  Group {g.index}: {g.hidden_paths}")

print("\nRelations:")
for i, r in enumerate(graph2.hidden_param_op_relations):
    groups = [g.index for g in r.hidden_groups]
    # ``trainable_paths`` maps each trainable key to its owning ParamState path.
    print(f"  Weight {i}: {r.trainable_paths} -> hidden group(s) {groups}")

# Quick summary via the report
print("\nReport counts:", learner2.report.counts)

Convolutional Network#

ETP primitives also support convolutional operations via braintrace.nn.Conv2d. When a convolution operates recurrently on a hidden feature map – reading the current map and writing the result back – the compiler discovers the connection between the convolution kernel and the hidden state it updates. This demonstrates the generality of the graph compilation: it works with any ETP primitive, not just matrix multiplication.

Note. braintrace.nn.Conv2d takes a channel-last in_size of the form (H, W, C) (the spatial dimensions plus the number of input channels) and a separate out_channels. The kernel is only traced as an ETP parameter if its output reaches a hidden state directly – an intervening shape-changing op (such as a reshape flattening the feature map before a Linear) breaks the connection, just like the slice in the Limitations tutorial.
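The shape bookkeeping behind this note can be checked in plain Python. The sketch assumes a stride of 1 (an assumption, not something stated here) together with 'SAME' padding, under which a 2-D convolution keeps the spatial size and only changes the channel count. same_conv_output_shape is a hypothetical helper, not a braintrace function.

```python
import math

def same_conv_output_shape(in_size, out_channels):
    """Output shape of a stride-1, 'SAME'-padded 2-D conv on channel-last input."""
    height, width, _in_channels = in_size
    return (height, width, out_channels)

in_size = (28, 28, 8)  # (H, W, C), channel-last
out_shape = same_conv_output_shape(in_size, out_channels=8)

# A recurrent conv must map the hidden feature map back onto its own shape.
assert out_shape == in_size

# Input dimension a readout Linear needs once the map is flattened.
print(out_shape, math.prod(out_shape))  # (28, 28, 8) 6272
```

This is why a recurrent convolution needs out_channels equal to the channel count of the hidden map, and why a readout on the flattened (28, 28, 8) map takes 8 * 28 * 28 inputs.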

class ConvRNN(brainstate.nn.Module):
    """A convolutional *recurrent* layer.

    The conv operates on its own hidden feature map ``self.h`` and writes the
    result back, so the conv kernel feeds a hidden state **directly** -- which is
    what lets the compiler trace it as an ETP parameter.
    """

    def __init__(self):
        super().__init__()
        # in_size is the channel-last spatial+channel shape (H, W, C).
        self.conv = braintrace.nn.Conv2d(in_size=(28, 28, 8), out_channels=8,
                                         kernel_size=3, padding='SAME')
        self.h = brainstate.HiddenState(jnp.zeros((28, 28, 8)))
        self.out = braintrace.nn.Linear(8 * 28 * 28, 10)

    def update(self, x):
        # x: (28, 28, 8) external drive in the same feature space as the hidden map.
        # The conv reads the recurrent hidden map and writes back into it, with no
        # shape-changing op in between, so the kernel -> hidden connection is traced.
        self.h.value = jax.nn.tanh(self.conv(self.h.value) + x)
        return self.out(self.h.value.reshape(-1))


model3 = ConvRNN()

learner3 = braintrace.compile(model3, braintrace.D_RTRL, jnp.zeros((28, 28, 8)), batch_size=1)
learner3.show_graph()

In this model:

The Conv2d kernel is discovered as an ETP parameter (the etp_conv primitive), because the convolution operates on the recurrent hidden feature map self.h and writes the result back into it – so the kernel feeds a hidden state directly, exactly like a recurrent matmul weight would.
The single hidden group is the (28, 28, 8) convolutional feature map.
The readout Linear is non-temporal – its output does not flow back into a hidden state – so the compiler excludes it from eligibility-trace tracking (it is still trained, just as a plain parameter learned through the loss, not an online trace).

The key requirement is that the conv output reach the hidden state without an intervening shape-changing op (a reshape/slice) or another trainable ETP weight. Here it does, so the compiler traces the conv kernel’s influence on the recurrent hidden state – showing that ETP generalises beyond matrix multiplication to convolutional recurrence.
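The requirement can be phrased as a reachability question on a one-step dataflow graph. The toy below is a sketch of that idea only, not the compiler's implementation: the node names, the kinds table and reaches_hidden are all hypothetical. A weight op counts as traceable if some path leads from it to a hidden node without crossing a reshape or another weight op.

```python
def reaches_hidden(graph, kinds, start):
    """True if `start` reaches a hidden node through pass-through ops only."""
    blockers = {"weight_op", "reshape"}
    stack, seen = list(graph.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if kinds[node] == "hidden":
            return True
        if kinds[node] in blockers:
            continue  # the path is cut here; do not look past this node
        stack.extend(graph.get(node, []))
    return False

# Toy GRU-like step: Wr -> gate -> Wh -> tanh -> h -> flat -> out
kinds = {
    "Wr": "weight_op", "gate": "elementwise", "Wh": "weight_op",
    "tanh": "elementwise", "h": "hidden", "flat": "reshape", "out": "weight_op",
}
graph = {
    "Wr": ["gate"], "gate": ["Wh"], "Wh": ["tanh"], "tanh": ["h"],
    "h": ["flat"], "flat": ["out"], "out": [],
}

weights = [n for n, k in kinds.items() if k == "weight_op"]
print({w: reaches_hidden(graph, kinds, w) for w in weights})
# {'Wr': False, 'Wh': True, 'out': False}
```

In this toy, Wh reaches the hidden state through an elementwise op only, Wr is cut off by Wh (the weight -> weight -> hidden case), and the readout has no path to a hidden state at all.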

Using `compile_etrace_graph` Directly#

Most users should call braintrace.compile(...) — it initialises states, compiles the graph, and returns a learner in one call (access the underlying graph via learner.graph). For advanced users who want to inspect the graph without wrapping the model in an algorithm, braintrace.compile_etrace_graph() is also available. This is useful for:

Debugging model structure before training
Verifying that ETP primitives are correctly placed
Building custom online learning algorithms on top of the graph

model_direct = SingleLayerRNN(10, 32, 5)
brainstate.nn.init_all_states(model_direct)

graph_direct = braintrace.compile_etrace_graph(model_direct, jnp.zeros(10))

print(f"Number of hidden groups: {len(graph_direct.hidden_groups)}")
print(f"Number of relations: {len(graph_direct.hidden_param_op_relations)}")
print(f"Has perturbation: {graph_direct.hidden_perturb is not None}")

print("\nGraph fields:")
for key in graph_direct.dict().keys():
    print(f"  {key}")

Number of hidden groups: 1
Number of relations: 1
Has perturbation: True

Graph fields:
  module_info
  hidden_groups
  hid_path_to_group
  hidden_param_op_relations
  hidden_perturb
  diagnostics

The compile_etrace_graph() function returns the same ETraceGraph named tuple that is stored internally by D_RTRL and other algorithms. You can use it to build custom training loops or to programmatically analyze model structure.

Summary#

In this tutorial, we covered the graph compilation and visualization tools in braintrace:

braintrace.compile() (which returns a ready learner) and compile_etrace_graph() (standalone function) analyze the model’s computation graph to discover the structural relationships between weights, ETP primitives, and hidden states
show_graph() provides a human-readable summary of the compiled graph, showing hidden groups, weight-hidden associations, and non-ETP parameters
The compiled graph reveals which weights participate in online learning – only weights whose ETP-primitive output feeds a hidden state are eligibility-traced; readouts and weights caught by the weight -> weight -> hidden rule (e.g. the GRU reset gate) are left to ordinary backprop
Multi-layer and convolutional models create richer graph structures – more hidden states and more weight relations – and the compiler decides how to group the hidden states, co-locating coupled same-shape states into a single group while keeping differently-shaped states apart
The ETraceGraph named tuple can be inspected programmatically for custom analysis or to build custom online learning algorithms

Understanding the compiled graph is a key step in verifying that your model is correctly structured for online learning with braintrace.
