BenchmarkResult


class brainevent.BenchmarkResult(records, primitive_name='')

Unified container for benchmark timing records across all (config × backend) pairs.

BenchmarkResult is returned by benchmark(). It stores every BenchmarkRecord collected during a benchmarking session and exposes methods for display, comparison, plotting, and serialisation.

Parameters:

records (List[BenchmarkRecord]) – All collected benchmark records. Each record represents one (config × backend) pair.
primitive_name (str) – Name of the primitive that produced these records. Used as the table heading when printing. Defaults to an empty string.

primitive_name

Name label for the benchmarked primitive.

Type: str

Methods — Accessors


records: Property. Returns a shallow copy of all stored records.

fastest(label=None): Return the successful BenchmarkRecord with the lowest mean_ms, optionally restricted to a specific config label.

Methods — Display


print(sort_by, group_by, compare_by, highlight_best, order_by, speedup_vs, vary_by): Print a formatted timing table to stdout. Supports flat, sorted, grouped, and hierarchical layouts, plus relative speedup columns.

Methods — Plotting


plot(ax, x, y, hue, style, kind, show, **kwargs): Produce a matplotlib figure visualising the results as a line, bar, or scatter chart.

Methods — Persistence


save(path, format='json'): Write the result to disk as JSON (default), CSV, or pickle.

load(path): Class method. Deserialise a previously saved result; the format is inferred from the file extension.

to_dict(): Return a JSON-serialisable dictionary representation of all records and metadata.

Notes

BenchmarkResult can also be constructed manually from a list of BenchmarkRecord objects, which is useful for offline analysis (merging results from different machines, aggregating saved runs, etc.).
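
As a rough illustration of the merging use case, the sketch below combines results at the dictionary level, using the two-key shape documented for to_dict() ('primitive_name' and 'records'). The helper name merge_result_dicts and the sample values are hypothetical and not part of the library.

```python
def merge_result_dicts(*docs, primitive_name=None):
    """Hypothetical helper: concatenate the 'records' of to_dict()-shaped dicts."""
    names = {d['primitive_name'] for d in docs}
    if primitive_name is None:
        # Refuse to guess a heading when the inputs disagree (or are empty).
        if len(names) != 1:
            raise ValueError(f'cannot infer primitive name from {sorted(names)}')
        primitive_name = names.pop()
    merged = [rec for d in docs for rec in d['records']]
    return {'primitive_name': primitive_name, 'records': merged}


# Made-up, abbreviated documents standing in for runs from two machines.
machine_a = {'primitive_name': 'binary_csrmv',
             'records': [{'platform': 'cpu', 'backend': 'numba', 'mean_ms': 3.2}]}
machine_b = {'primitive_name': 'binary_csrmv',
             'records': [{'platform': 'gpu', 'backend': 'pallas', 'mean_ms': 0.42},
                         {'platform': 'gpu', 'backend': 'warp', 'mean_ms': 0.60}]}

merged = merge_result_dicts(machine_a, machine_b)
```

The merged dictionary keeps the input order of the records, so rows from the first machine appear first.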

__str__ / __repr__ delegate to print() so a plain print(result) always shows a formatted table.

Examples

Typical usage — run and display:

import brainevent
result = brainevent.binary_csrmv_p.benchmark(
    platform='gpu',
    n_warmup=5,
    n_runs=20,
    verbose=True,
)
# __str__ / __repr__ renders a formatted table
print(result)

Hierarchical display with per-group speedup:

# Rows grouped by (transpose, label); best backend per group
# marked with *, plus a speedup column vs. the 'numba' baseline.
result.print(
    order_by=['transpose', 'label', 'backend'],
    highlight_best=True,
    speedup_vs='numba',
)

Flat table: sort, group, and baseline comparison:

# Sorted by mean execution time (fastest first)
result.print(sort_by='mean_ms')

# Best backend per config label marked with an asterisk
result.print(group_by='label', highlight_best=True)

# Speedup column relative to the numba baseline (string expression)
result.print(compare_by="backend == 'numba'")

# Callable baseline selector
result.print(compare_by=lambda row: row.get('backend') == 'numba')

Accessing records programmatically:

# Iterate over all records
for rec in result.records:
    status = 'OK' if rec.success else f'FAILED: {rec.error}'
    print(f"{rec.backend:10s} | {rec.label:20s} | {rec.mean_ms:.3f} ms | {status}")

# Overall fastest successful record
fastest = result.fastest()
if fastest:
    print(f"Best overall: {fastest.backend} ({fastest.label}) — {fastest.mean_ms:.3f} ms")

# Fastest backend per config label
labels = dict.fromkeys(r.label for r in result.records)
for label in labels:
    rec = result.fastest(label=label)
    if rec:
        print(f"[{label}] winner: {rec.backend} ({rec.mean_ms:.4f} ms)")

# Custom aggregation: average mean_ms per backend
from collections import defaultdict
backend_times = defaultdict(list)
for rec in result.records:
    if rec.success:
        backend_times[rec.backend].append(rec.mean_ms)
for be, times in sorted(backend_times.items()):
    avg = sum(times) / len(times)
    print(f"{be:10s}: avg={avg:.4f} ms over {len(times)} configs")

Save and reload:

# JSON (default) — human-readable, round-trips with full fidelity
result.save('bench.json')
result2 = brainevent.BenchmarkResult.load('bench.json')

# CSV — flat table, easy to open in a spreadsheet
result.save('bench.csv', format='csv')
result3 = brainevent.BenchmarkResult.load('bench.csv')

# Pickle — lossless, preserves all dict fields
result.save('bench.pkl', format='pkl')
result4 = brainevent.BenchmarkResult.load('bench.pkl')

Embedding in a larger JSON document:

import json

d = result.to_dict()
report = {
    'experiment': 'csrmv_gpu_sweep',
    'hardware': 'A100',
    'results': d,
}
with open('report.json', 'w') as f:
    json.dump(report, f, indent=2)

Building from scratch for offline / cross-platform analysis:

from brainevent._op.benchmark import BenchmarkRecord, BenchmarkResult

# Combine records collected on two separate machines
records = [
    BenchmarkRecord(
        platform='cpu', backend='numba', label='1k×1k',
        mean_ms=3.2, std_ms=0.1, min_ms=3.0,
        throughput=None, success=True, error=None,
        kernel_kwargs={'shape': (1000, 1000)},
        data_kwargs={'nnz': 100_000},
    ),
    BenchmarkRecord(
        platform='gpu', backend='pallas', label='1k×1k',
        mean_ms=0.42, std_ms=0.01, min_ms=0.40,
        throughput=None, success=True, error=None,
        kernel_kwargs={'shape': (1000, 1000)},
        data_kwargs={'nnz': 100_000},
    ),
    BenchmarkRecord(
        platform='gpu', backend='warp', label='1k×1k',
        mean_ms=0.60, std_ms=0.02, min_ms=0.58,
        throughput=None, success=True, error=None,
        kernel_kwargs={'shape': (1000, 1000)},
        data_kwargs={'nnz': 100_000},
    ),
]
combined = BenchmarkResult(records, primitive_name='binary_csrmv')
combined.print(group_by='label', highlight_best=True)
# Speedup vs. CPU numba baseline
combined.print(
    sort_by='mean_ms',
    compare_by="backend == 'numba' and platform == 'cpu'",
)

Plotting:

# Bar chart: one bar per (label, backend) pair
fig = result.plot(x='label', y='mean_ms', hue='backend', kind='bar')
fig.tight_layout()
fig.savefig('bench_bar.png', dpi=150)

# Line chart over config labels, one line per backend
fig2 = result.plot(x='label', y='min_ms', hue='backend', kind='line')
fig2.savefig('bench_line.png', dpi=150)

See also

BenchmarkConfig: Input specification for one benchmark configuration.
BenchmarkRecord: Individual timing record stored in this container.
XLACustomKernel.benchmark: Primary method that produces a BenchmarkResult.
benchmark_function: Low-level timing utility used internally.

fastest(label=None)

Return the fastest successful record.

Parameters: label (str | None) – If given, consider only records whose label matches label exactly. Pass None (default) to search across all records.
Returns: The BenchmarkRecord with the smallest mean_ms among all successful records (after optional label filtering), or None if no successful records exist.
Return type: BenchmarkRecord | None
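
The selection rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: Rec is a hypothetical stand-in that carries only the BenchmarkRecord fields the rule needs, and pick_fastest is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rec:
    """Hypothetical stand-in for BenchmarkRecord (only the fields used here)."""
    backend: str
    label: str
    mean_ms: float
    success: bool = True


def pick_fastest(records: List[Rec], label: Optional[str] = None) -> Optional[Rec]:
    """Return the successful record with the smallest mean_ms, or None."""
    candidates = [
        r for r in records
        if r.success and (label is None or r.label == label)
    ]
    # min() raises on an empty sequence, so handle "no candidates" explicitly.
    return min(candidates, key=lambda r: r.mean_ms) if candidates else None


recs = [
    Rec('numba', 'A', 3.2),
    Rec('pallas', 'A', 0.4),
    Rec('warp', 'B', 0.1, success=False),  # failed runs are never selected
    Rec('numba', 'B', 2.0),
]
best_overall = pick_fastest(recs)
best_b = pick_fastest(recs, label='B')
missing = pick_fastest(recs, label='no-such-label')
```

Because the result can be None, callers should check it before reading fields such as backend or mean_ms.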

Examples

from brainevent import binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')

# Overall fastest backend across all config labels
rec = result.fastest()
if rec:
    print(f"Best overall: {rec.backend} ({rec.label}) — {rec.mean_ms:.3f} ms")

# Fastest for a specific config label
rec = result.fastest(label='NT,homo,bool')
if rec:
    print(f"Best for NT,homo,bool: {rec.backend} — {rec.mean_ms:.3f} ms")

# Tabulate the winner for every label
labels = dict.fromkeys(r.label for r in result.records)
for label in labels:
    r = result.fastest(label=label)
    if r:
        print(f"[{label}] winner: {r.backend} ({r.mean_ms:.4f} ms)")

See also

records: Access all records for custom filtering and aggregation.
print: Display a formatted table with highlight_best=True to mark per-group winners visually.

classmethod load(path)

Deserialize a previously saved result.

The format is inferred from the file extension (.json, .csv, .pkl). Files without one of these suffixes are assumed to be JSON.
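
The suffix rule above can be sketched as a small lookup with a JSON fallback. The helper name infer_format is hypothetical, and lower-casing the suffix is an assumption; the source does not say whether the match is case-sensitive.

```python
from pathlib import Path

# Suffixes named in the paragraph above; anything else falls back to JSON.
_KNOWN_SUFFIXES = {'.json': 'json', '.csv': 'csv', '.pkl': 'pkl'}


def infer_format(path) -> str:
    """Hypothetical helper: map a file path to 'json', 'csv', or 'pkl'."""
    suffix = Path(path).suffix.lower()  # lower-casing is an assumption
    return _KNOWN_SUFFIXES.get(suffix, 'json')


formats = [infer_format(p) for p in ('bench.csv', 'run.pkl', 'results/bench', 'bench.txt')]
```

Under this rule a file saved as 'bench.txt' would be parsed as JSON, so using one of the three known suffixes keeps save and load in agreement.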

Parameters:

path (str | Path) – Path to the file written by save().

Returns:

A new BenchmarkResult populated from the file.

Return type:

BenchmarkResult

Raises:

FileNotFoundError – If path does not exist.
ValueError – If a .pkl file does not contain a BenchmarkResult.

Examples

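from brainevent import BenchmarkResult, binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')

# Round-trip with JSON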
result.save('bench.json')
reloaded = BenchmarkResult.load('bench.json')
print(reloaded)

# Round-trip with CSV
result.save('bench.csv', format='csv')
reloaded_csv = BenchmarkResult.load('bench.csv')

# Round-trip with pickle
result.save('bench.pkl', format='pkl')
reloaded_pkl = BenchmarkResult.load('bench.pkl')

See also

save: Serialise a result to disk.

plot(ax=None, x=None, y='mean_ms', hue=None, style=None, kind='line', show=False, **kwargs)

Produce a visualization of the benchmark results.

Parameters:

ax (matplotlib Axes or None, optional) – Axes to draw into. If None, a new figure and axes are created.
x (str | None) – Column name for the x-axis (e.g., 'label', 'n_pre').
y (str) – Column name for the y-axis. Defaults to 'mean_ms'.
hue (str | None) – Column name used to colour-code different series (e.g., 'backend').
style (str | None) – Column name used to set line/marker style (seaborn only).
kind (str) – Plot type: 'line', 'bar', or 'scatter'. Defaults to 'line'.
show (bool) – If True, call plt.show() after drawing. Defaults to False.
**kwargs – Additional keyword arguments forwarded to the underlying matplotlib / seaborn plotting function.

Returns:

The figure containing the plot.

Return type:

matplotlib.figure.Figure

Raises:

ImportError – If matplotlib or pandas is not installed.

Examples

from brainevent import binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')

# Bar chart: mean_ms per config, coloured by backend
fig = result.plot(x='label', y='mean_ms', hue='backend', kind='bar')
fig.tight_layout()
fig.savefig('bench_bar.png', dpi=150)

# Line chart: min_ms vs. config label
fig2 = result.plot(x='label', y='min_ms', hue='backend', kind='line')
fig2.savefig('bench_line.png', dpi=150)

# Scatter: draw into an existing axes
import matplotlib.pyplot as plt
fig3, ax = plt.subplots()
result.plot(ax=ax, x='label', y='mean_ms', kind='scatter')
plt.show()

See also

print: Display a formatted text table instead.
save: Persist results to disk for later offline plotting.

print(sort_by=None, group_by=None, compare_by=None, highlight_best=True, order_by=None, speedup_vs=None, vary_by=None)

Print the benchmark table with optional sorting, grouping, and comparison.

Parameters:

sort_by (str or list of str or None, optional) – Column name(s) to sort rows by. Numeric columns are sorted numerically; string columns lexicographically. Ignored when order_by is set.
group_by (str or list of str or None, optional) – Column name(s) to group rows by. Within each group the fastest backend is identified for highlighting and relative speedup computation. Ignored when order_by is set.
compare_by (str, callable, or None, optional) – Designate a baseline config for normalising performance. Pass a string expression (e.g., "label=='baseline'") evaluated against each row dict, or a callable (row_dict) -> bool. A speedup column is added showing baseline_mean / row_mean.
highlight_best (bool) – If True (default), visually mark the best-performing config per group with an asterisk (*).
order_by (list of str or None, optional) –
When provided, render the table in hierarchical mode. Rows are sorted and visually grouped by all columns in order_by except the last one. Repeated values in the group-key columns are suppressed after the first row of each group, and a separator line is drawn between groups. The fastest entry within each group (determined by the last column in order_by) is marked *. Overrides sort_by, group_by, and vary_by.

Example:
```
result.print(order_by=['transpose', 'shape', 'backend'])
```
speedup_vs (str or None, optional) –
Takes effect only together with order_by or vary_by. Name of the leaf-column value (typically a backend name) to use as the per-group baseline. Adds a vs_<name> column showing baseline_mean / row_mean for every row in that group. A value > 1 means the row is faster than the baseline.

Example:
```
result.print(
    order_by=['transpose', 'shape', 'backend'],
    speedup_vs='numba',
)
```
vary_by (str or list of str or None, optional) –
Shorthand grouping mode. Names the column(s) that vary within each group; everything else (excluding metrics) forms the fixed group boundary.
- Single string — one column varies, all others are the group key. A separator line is drawn between each group and the fastest leaf-column value is marked *:
```
result.print(vary_by='backend')
```
- Ordered list — multiple columns vary; the separator fires only when the fixed columns change; earlier vary-columns are suppressed when they repeat consecutively; the last element is the finest leaf:
```
result.print(vary_by=['transpose', 'backend'])
```
The * marker and speedup_vs are computed per (fixed_keys + outer_vary_keys) sub-group. order_by takes precedence if both order_by and vary_by are given.

Return type:

None
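
The arithmetic behind the * marker and the vs_<name> column can be sketched without the table renderer. The code below is an illustration under stated assumptions, not the library's implementation: rows are plain dictionaries with made-up timings, annotate is a hypothetical helper, and groups are formed from every column in the ordering except the last (leaf) one.

```python
from collections import defaultdict

rows = [
    {'transpose': False, 'label': 'small', 'backend': 'numba', 'mean_ms': 4.0},
    {'transpose': False, 'label': 'small', 'backend': 'pallas', 'mean_ms': 1.0},
    {'transpose': True, 'label': 'small', 'backend': 'numba', 'mean_ms': 6.0},
    {'transpose': True, 'label': 'small', 'backend': 'pallas', 'mean_ms': 3.0},
    {'transpose': True, 'label': 'small', 'backend': 'warp', 'mean_ms': 2.0},
]


def annotate(rows, group_keys, leaf, baseline):
    """Add 'best' and 'speedup' to each row, both computed per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row)
    out = []
    for members in groups.values():
        fastest = min(members, key=lambda r: r['mean_ms'])
        # The baseline is the group member whose leaf value matches.
        base = next((r for r in members if r[leaf] == baseline), None)
        for r in members:
            speedup = base['mean_ms'] / r['mean_ms'] if base else None
            out.append({**r, 'best': r is fastest, 'speedup': speedup})
    return out


table = annotate(rows, ['transpose', 'label'], 'backend', 'numba')
for r in table:
    mark = '*' if r['best'] else ' '
    print(f"{r['transpose']!s:5} {r['label']:6} {r['backend']:7}{mark} "
          f"{r['mean_ms']:6.2f} ms  vs_numba={r['speedup']:.2f}")
```

Each group gets its own baseline, so the numba row always shows 1.00 and the other rows are read relative to it within the same group only.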

Examples

from brainevent import binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')

# Default: plain table in insertion order
result.print()

# Sorted by mean execution time (fastest first)
result.print(sort_by='mean_ms')

# Group by config label; fastest backend per group marked *
result.print(group_by='label', highlight_best=True)

# Speedup column vs. the numba baseline (string expression)
result.print(
    sort_by='mean_ms',
    compare_by="backend == 'numba'",
)

# Callable baseline selector
result.print(compare_by=lambda row: row.get('backend') == 'numba')

# Hierarchical view: group by (transpose, label), mark best backend
result.print(
    order_by=['transpose', 'label', 'backend'],
    highlight_best=True,
)

# Hierarchical + per-group speedup vs. numba
result.print(
    order_by=['transpose', 'label', 'backend'],
    highlight_best=True,
    speedup_vs='numba',
)

# vary_by shorthand: backend varies, everything else is the group
result.print(vary_by='backend', speedup_vs='numba_cuda')

# vary_by with two levels: transpose is outer, backend is leaf
result.print(vary_by=['transpose', 'backend'], speedup_vs='numba_cuda')

See also

fastest: Return the fastest record programmatically.
plot: Produce a matplotlib visualisation.

property records: List[BenchmarkRecord]

Return a shallow copy of all benchmark records.

Returns: A new list containing every stored BenchmarkRecord. Each record represents one (config × backend) timing run. Modifying the returned list does not affect the internal state.
Return type: list of BenchmarkRecord
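
What "shallow copy" means in practice can be shown with a small stand-in. Holder is a hypothetical class that mimics the documented behaviour of the property, and plain dictionaries stand in for records; whether real BenchmarkRecord objects can be mutated at all is not stated in the source.

```python
class Holder:
    """Hypothetical container that mimics the documented `records` property."""

    def __init__(self, records):
        self._records = list(records)

    @property
    def records(self):
        # A new list on every access; the elements themselves are not copied.
        return list(self._records)


holder = Holder([{'backend': 'numba'}, {'backend': 'pallas'}])

snapshot = holder.records
snapshot.pop()                       # changes only the returned list
count_after_pop = len(holder.records)

snapshot[0]['backend'] = 'edited'    # the element is shared with the container
shared_value = holder.records[0]['backend']
```

Adding or removing items is therefore safe, while in-place changes to a mutable element would be visible through the container as well.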

Examples

from brainevent import binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')
print(f"Total records: {len(result.records)}")

# Filter to successful records only
ok = [r for r in result.records if r.success]

# Custom aggregation: geometric mean per backend
import math
from collections import defaultdict
backend_times = defaultdict(list)
for rec in result.records:
    if rec.success:
        backend_times[rec.backend].append(rec.mean_ms)
for be, times in sorted(backend_times.items()):
    gm = math.exp(sum(math.log(t) for t in times) / len(times))
    print(f"{be}: geomean={gm:.4f} ms over {len(times)} configs")

See also

fastest: Return the single fastest successful record.

save(path, format='json')

Serialize the result to disk.

Parameters:

path (str | Path) – Destination file path. Parent directories are created automatically if they do not exist.
format (Literal['json', 'csv', 'pkl']) –
Serialization format:
- 'json' (default): human-readable JSON; round-trips with full fidelity for all field types supported by to_dict().
- 'csv': flat CSV table; easily opened in spreadsheet tools. kernel_kwargs and data_kwargs are omitted from the flat rows, so they cannot be recovered from the file.
- 'pkl': binary pickle; lossless and preserves all dict fields, but is not portable across Python versions.

Raises:

ValueError – If format is not one of the supported values.

Return type:

None
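
To make the CSV trade-off concrete, the sketch below flattens one record with the standard csv module and reads it back. It is an assumption-laden illustration, not the library's writer: the column names are taken from the record keys listed under to_dict(), the column order is a guess, and the sample values are made up.

```python
import csv
import io

# Scalar record fields; the two nested dict fields (kernel_kwargs,
# data_kwargs) are deliberately left out of the flat table.
FLAT_FIELDS = ['platform', 'backend', 'label', 'mean_ms', 'std_ms',
               'min_ms', 'throughput', 'success', 'error']

records = [
    {'platform': 'gpu', 'backend': 'pallas', 'label': '1k', 'mean_ms': 0.42,
     'std_ms': 0.01, 'min_ms': 0.40, 'throughput': None, 'success': True,
     'error': None, 'kernel_kwargs': {'shape': (1000, 1000)},
     'data_kwargs': {'nnz': 100_000}},
]

buf = io.StringIO()
# extrasaction='ignore' silently drops keys that are not in FLAT_FIELDS.
writer = csv.DictWriter(buf, fieldnames=FLAT_FIELDS, extrasaction='ignore')
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# Reading back yields strings only, and the nested fields are gone.
reloaded = list(csv.DictReader(io.StringIO(csv_text)))
```

CSV stores every value as text, so a reader has to convert numbers and booleans back itself; how load() does this is not shown here.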

Examples

from brainevent import binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')

# Default JSON
result.save('results/bench.json')

# CSV for spreadsheet analysis
result.save('results/bench.csv', format='csv')

# Lossless pickle
result.save('results/bench.pkl', format='pkl')

See also

load: Deserialise a previously saved file.
to_dict: Access the JSON-serialisable dict directly.

to_dict()

Return a JSON-serialisable dictionary representation.

The returned dictionary contains the primitive name and the full list of records in the same format that save() writes as JSON. It can be passed directly to json.dump() or embedded in a larger document. Because load() takes a file path rather than a dictionary, write the dictionary to a .json file first if you want to reconstruct a BenchmarkResult from it.

Returns:

A dictionary with two top-level keys:

'primitive_name' (str): The benchmarked primitive's name.
'records' (list of dict): One dict per BenchmarkRecord. Each dict has keys: platform, backend, label, mean_ms, std_ms, min_ms, throughput, success, error, kernel_kwargs, data_kwargs.

Return type:

dict
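
One practical consequence of the JSON representation is worth seeing once: JSON has no tuple type. The sketch below round-trips a hand-built dictionary in the documented shape through the standard json module; the values are made up, and whether to_dict() itself already converts tuples is not stated in the source.

```python
import json

# Hand-built dictionary in the documented two-key shape (values are made up).
doc = {
    'primitive_name': 'binary_csrmv',
    'records': [{
        'platform': 'gpu', 'backend': 'pallas', 'label': '1k',
        'mean_ms': 0.42, 'std_ms': 0.01, 'min_ms': 0.40,
        'throughput': None, 'success': True, 'error': None,
        'kernel_kwargs': {'shape': (1000, 1000)},
        'data_kwargs': {'nnz': 100_000},
    }],
}

restored = json.loads(json.dumps(doc))

# The tuple comes back as a list; None and booleans survive unchanged.
shape_before = doc['records'][0]['kernel_kwargs']['shape']
shape_after = restored['records'][0]['kernel_kwargs']['shape']
```

Code that compares reloaded kernel_kwargs against in-memory values may need to normalise sequences (for example with tuple(...)) before testing equality.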

Examples

from brainevent import binary_csrmv_p

result = binary_csrmv_p.benchmark(platform='gpu')
d = result.to_dict()

# Pretty-print to console
import json
print(json.dumps(d, indent=2))

# Embed in a larger report document
report = {
    'experiment': 'csrmv_gpu_sweep',
    'hardware': 'A100-SXM4-80GB',
    'results': d,
}
with open('report.json', 'w') as f:
    json.dump(report, f, indent=2)

# Access individual record fields
for rec in d['records']:
    print(rec['backend'], rec['mean_ms'])

See also

save: Write directly to disk (JSON / CSV / pickle).
load: Reconstruct a BenchmarkResult from a file.
