Tutorial 1: NevergradOptimizer Tutorial


This tutorial shows how to use the NevergradOptimizer from BrainTools for black-box optimization problems. The NevergradOptimizer wraps the Nevergrad library, evaluates candidate solutions in batches, and works with JAX arrays and BrainUnit quantities.

Introduction and Setup

The NevergradOptimizer is designed for derivative-free optimization where gradients are unavailable or unreliable. It’s particularly useful for:

- Hyperparameter optimization
- Neural architecture search
- Complex loss landscapes
- Noisy objective functions
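
To make "derivative-free" and "batched" concrete, the sketch below runs a plain random search in NumPy: it samples a batch of candidates inside box bounds, evaluates the loss on the whole batch in one call, and keeps the best point. It only illustrates the evaluation pattern and is not how NevergradOptimizer or Nevergrad work internally; the helper `random_search` and its arguments are invented for this example.

```python
import numpy as np


def random_search(batched_loss, bounds, n_sample=10, n_iter=20, seed=0):
    """Toy derivative-free search (hypothetical helper, not part of BrainTools)."""
    rng = np.random.default_rng(seed)
    lows = np.array([lo for lo, _ in bounds])
    highs = np.array([hi for _, hi in bounds])
    best_x, best_loss = None, np.inf
    for _ in range(n_iter):
        # Sample a batch of candidates, shape (n_sample, n_params)
        batch = rng.uniform(lows, highs, size=(n_sample, len(bounds)))
        # One call evaluates every candidate; the result has shape (n_sample,)
        losses = batched_loss(*batch.T)
        i = int(np.argmin(losses))
        if losses[i] < best_loss:
            best_x, best_loss = batch[i], float(losses[i])
    return best_x, best_loss


best_x, best_loss = random_search(
    lambda x, y: (x - 2.0) ** 2 + (y + 1.0) ** 2,
    bounds=[(-5.0, 5.0), (-3.0, 3.0)],
)
print(best_x, best_loss)
```

No gradient is ever computed: the search only needs loss values, which is why this family of methods suits objectives that are noisy or not differentiable.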

Let’s start by importing the necessary libraries:

import brainunit as u
import braintools
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np

# Set up plotting
plt.style.use('default')
plt.rcParams['figure.figsize'] = (10, 6)

print("BrainTools version:", braintools.__version__)
print("JAX version:", jax.__version__)

BrainTools version: 0.0.12
JAX version: 0.7.1

Next, check that Nevergrad is installed (the optimizer requires it):

try:
    import nevergrad as ng

    print(f"Nevergrad version: {ng.__version__}")
    print("✓ Nevergrad is available")
except ImportError:
    print("❌ Nevergrad is not installed. Please install it with:")
    print("   pip install nevergrad")
    raise

Nevergrad version: 1.0.12
✓ Nevergrad is available

Basic Usage: Scalar Optimization

Let’s start with a simple example: optimizing a basic quadratic function with two scalar parameters.

# Define a simple quadratic loss function
def quadratic_loss(x, y):
    """
    Batched quadratic loss function.
    
    Args:
        x, y: JAX arrays of shape (n_sample,)
        
    Returns:
        Array of losses, shape (n_sample,)
    """
    return (x - 2.0) ** 2 + (y + 1.0) ** 2


# Define bounds for each parameter: (min, max)
bounds = [
    (-5.0, 5.0),  # bounds for x
    (-3.0, 3.0),  # bounds for y
]

# Create the optimizer
optimizer = braintools.optim.NevergradOptimizer(
    batched_loss_fun=quadratic_loss,
    bounds=bounds,
    n_sample=10,  # Number of candidates per iteration
    method='DE'  # Differential Evolution
)

print("Optimizer created successfully!")
print(f"Method: {optimizer.method}")
print(f"Population size: {optimizer.n_sample}")

Optimizer created successfully!
Method: DE
Population size: 10
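
Before handing a loss function to the optimizer, it can help to confirm that it is batched as its docstring describes: given arrays of shape `(n_sample,)`, it must return one loss per candidate. The quick check below uses NumPy arrays as stand-ins for the JAX arrays (an assumption made so the snippet has no JAX dependency; the element-wise arithmetic is the same).

```python
import numpy as np


def quadratic_loss(x, y):
    # Same element-wise expression as in the tutorial
    return (x - 2.0) ** 2 + (y + 1.0) ** 2


# Three candidates evaluated in a single call
x = np.array([2.0, 0.0, -5.0])
y = np.array([-1.0, 0.0, 3.0])
losses = quadratic_loss(x, y)

print(losses.shape)     # (3,)
print(losses.tolist())  # [0.0, 5.0, 65.0]
```

The first candidate sits exactly on the optimum (2, -1), so its loss is zero.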

# Run the optimization
best_params = optimizer.minimize(n_iter=20, verbose=True)

print(f"\nOptimization completed!")
print(f"Best parameters: x={best_params[0]:.4f}, y={best_params[1]:.4f}")
print(f"True optimum: x=2.0, y=-1.0")
print(f"Final loss: {quadratic_loss(best_params[0], best_params[1]):.6f}")

Iteration 0, best error: 0.25561, best parameters: [1.5779222385288518, -0.7216862168441729]
Iteration 1, best error: 0.22335, best parameters: [2.179454790119798, -0.5627973574184901]
Iteration 2, best error: 0.22335, best parameters: [2.179454790119798, -0.5627973574184901]
Iteration 3, best error: 0.01908, best parameters: [2.1325511093016987, -0.9610894268195804]
Iteration 4, best error: 0.01830, best parameters: [1.8810521585283122, -0.9355450228396044]
Iteration 5, best error: 0.01830, best parameters: [1.8810521585283122, -0.9355450228396044]
Iteration 6, best error: 0.01830, best parameters: [1.8810521585283122, -0.9355450228396044]
Iteration 7, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 8, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 9, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 10, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 11, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 12, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 13, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 14, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 15, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 16, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 17, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 18, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]
Iteration 19, best error: 0.00248, best parameters: [1.9688748933374687, -0.9610894268195804]

Optimization completed!
Best parameters: x=1.9689, y=-0.9611
True optimum: x=2.0, y=-1.0
Final loss: 0.002483

# Visualize the optimization progress
plt.figure(figsize=(12, 4))

# Plot error evolution
plt.subplot(1, 2, 1)
plt.plot(optimizer.errors, 'b-', alpha=0.6, label='All evaluations')
# Plot best so far
best_so_far = np.minimum.accumulate(optimizer.errors)
plt.plot(best_so_far, 'r-', linewidth=2, label='Best so far')
plt.yscale('log')
plt.xlabel('Evaluation')
plt.ylabel('Loss')
plt.title('Optimization Progress')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot parameter evolution
plt.subplot(1, 2, 2)
candidates = np.array(optimizer.candidates)
plt.scatter(candidates[:, 0], candidates[:, 1], c=optimizer.errors,
            cmap='viridis_r', alpha=0.6, s=20)
plt.colorbar(label='Loss')
plt.axvline(x=2.0, color='red', linestyle='--', alpha=0.7, label='True optimum')
plt.axhline(y=-1.0, color='red', linestyle='--', alpha=0.7)
plt.scatter(best_params[0], best_params[1], color='red', s=100,
            marker='*', label='Best found', zorder=5)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Parameter Space Exploration')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

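[Figure: left, loss per evaluation with the best-so-far curve (log scale); right, sampled candidates in the (x, y) plane colored by loss, with the true optimum and the best point found marked.]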
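
The "Best so far" curve relies on `np.minimum.accumulate`, which returns the running minimum of a sequence. A small example with made-up loss values:

```python
import numpy as np

errors = np.array([0.9, 0.4, 0.7, 0.1, 0.3])  # illustrative values only
best_so_far = np.minimum.accumulate(errors)

print(best_so_far.tolist())  # [0.9, 0.4, 0.4, 0.1, 0.1]
```

Because the running minimum never increases, flat stretches in the curve correspond to evaluations that did not improve on the current best, like iterations 8 to 19 in the log above.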

Multi-dimensional Parameter Optimization

The NevergradOptimizer can handle multi-dimensional parameters (arrays) as well as scalars. Let’s optimize a function with array parameters.

# Define a loss function with array parameters
def matrix_loss(W, b):
    """
    Loss function with matrix and vector parameters.
    
    Args:
        W: JAX array of shape (n_sample, 3, 2) - batch of 3x2 matrices
        b: JAX array of shape (n_sample, 2) - batch of 2D vectors
        
    Returns:
        Array of losses, shape (n_sample,)
    """
    # Target matrix and bias
    W_target = jnp.array([[1.0, -0.5], [0.3, 1.2], [-0.8, 0.9]])
    b_target = jnp.array([0.1, -0.3])

    # Compute Frobenius norm of differences
    W_diff = W - W_target[None, :, :]
    b_diff = b - b_target[None, :]

    return jnp.sum(W_diff ** 2, axis=(1, 2)) + jnp.sum(b_diff ** 2, axis=1)


# Define bounds for array parameters
bounds = [
    (jnp.full((3, 2), -2.0), jnp.full((3, 2), 2.0)),  # bounds for W (3x2 matrix)
    (jnp.full(2, -1.0), jnp.full(2, 1.0)),  # bounds for b (2D vector)
]

# Create optimizer with more sophisticated method
optimizer = braintools.optim.NevergradOptimizer(
    batched_loss_fun=matrix_loss,
    bounds=bounds,
    n_sample=20,
    method='CMA'  # Covariance Matrix Adaptation
)

print("Multi-dimensional optimizer created!")
print(f"Parameter shapes: W={bounds[0][0].shape}, b={bounds[1][0].shape}")

Multi-dimensional optimizer created!
Parameter shapes: W=(3, 2), b=(2,)
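
The loss above depends on two array idioms: `W_target[None, :, :]` adds a leading batch axis so the target broadcasts against every candidate, and `axis=(1, 2)` sums over the matrix axes while keeping the batch axis. The sketch below checks the shapes with NumPy (the same indexing and reductions are available in `jax.numpy`); the all-zero batch is chosen only to make the result easy to verify by hand.

```python
import numpy as np

n_sample = 4
W_target = np.array([[1.0, -0.5], [0.3, 1.2], [-0.8, 0.9]])
b_target = np.array([0.1, -0.3])

# A batch in which every candidate is all zeros
W = np.zeros((n_sample, 3, 2))
b = np.zeros((n_sample, 2))

W_diff = W - W_target[None, :, :]  # (4, 3, 2) - (1, 3, 2) -> (4, 3, 2)
b_diff = b - b_target[None, :]     # (4, 2) - (1, 2) -> (4, 2)
loss = np.sum(W_diff ** 2, axis=(1, 2)) + np.sum(b_diff ** 2, axis=1)

print(loss.shape)  # (4,)
print(loss[0])     # squared norm of the targets: 4.23 + 0.10, about 4.33
```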

# Run optimization
best_params = optimizer.minimize(n_iter=15, verbose=True)

W_best, b_best = best_params
print(f"\nBest W matrix:")
print(W_best)
print(f"\nBest b vector:")
print(b_best)

# Compare with targets
W_target = jnp.array([[1.0, -0.5], [0.3, 1.2], [-0.8, 0.9]])
b_target = jnp.array([0.1, -0.3])
print(f"\nTarget W matrix:")
print(W_target)
print(f"\nTarget b vector:")
print(b_target)

print(f"\nFinal loss: {matrix_loss(W_best[None, :, :], b_best[None, :])[0]:.6f}")

Iteration 0, best error: 3.06228, best parameters: [array([[ 0.1319265 ,  0.12554918],
       [ 0.1219798 ,  0.22701639],
       [-0.45197331,  0.07438969]]), array([-0.20415943, -0.09081564])]
Iteration 1, best error: 2.37360, best parameters: [array([[ 0.57418931, -0.3839883 ],
       [ 0.10332607,  0.44112678],
       [ 0.10670122,  0.08995476]]), array([ 0.04396808, -0.0121924 ])]
Iteration 2, best error: 1.66984, best parameters: [array([[ 0.45320926, -0.02988484],
       [ 0.12446372,  0.78202291],
       [-0.1857579 ,  0.16924231]]), array([-0.02571888, -0.16873823])]
Iteration 3, best error: 1.57220, best parameters: [array([[ 0.45811746,  0.1171277 ],
       [ 0.20338258,  0.71623132],
       [-0.20948488,  0.46675391]]), array([0.07615692, 0.04259765])]
Iteration 4, best error: 1.13585, best parameters: [array([[ 0.59234747, -0.1824433 ],
       [ 0.04802526,  0.7759153 ],
       [-0.16934964,  0.53639949]]), array([0.10434092, 0.00909982])]
Iteration 5, best error: 0.64211, best parameters: [array([[ 1.17984971, -0.2327991 ],
       [ 0.47433285,  1.0244372 ],
       [-0.27971846,  0.57972288]]), array([-0.00452593,  0.00489644])]
Iteration 6, best error: 0.28482, best parameters: [array([[ 1.01529137, -0.38012391],
       [ 0.58950817,  1.39789853],
       [-0.60435638,  0.72684862]]), array([ 0.03362839, -0.02691852])]
Iteration 7, best error: 0.28482, best parameters: [array([[ 1.01529137, -0.38012391],
       [ 0.58950817,  1.39789853],
       [-0.60435638,  0.72684862]]), array([ 0.03362839, -0.02691852])]
Iteration 8, best error: 0.27480, best parameters: [array([[ 1.1448838 , -0.44785598],
       [ 0.26842684,  1.31264979],
       [-0.7427719 ,  0.42019695]]), array([ 0.16201322, -0.2914579 ])]
Iteration 9, best error: 0.27480, best parameters: [array([[ 1.1448838 , -0.44785598],
       [ 0.26842684,  1.31264979],
       [-0.7427719 ,  0.42019695]]), array([ 0.16201322, -0.2914579 ])]
Iteration 10, best error: 0.22019, best parameters: [array([[ 0.80992882, -0.68508829],
       [ 0.59793163,  1.31998019],
       [-0.72473842,  0.9786355 ]]), array([ 0.16063863, -0.47642237])]
Iteration 11, best error: 0.22019, best parameters: [array([[ 0.80992882, -0.68508829],
       [ 0.59793163,  1.31998019],
       [-0.72473842,  0.9786355 ]]), array([ 0.16063863, -0.47642237])]
Iteration 12, best error: 0.22019, best parameters: [array([[ 0.80992882, -0.68508829],
       [ 0.59793163,  1.31998019],
       [-0.72473842,  0.9786355 ]]), array([ 0.16063863, -0.47642237])]
Iteration 13, best error: 0.10389, best parameters: [array([[ 1.05913023, -0.29764577],
       [ 0.28055037,  0.97868875],
       [-0.79499943,  0.88000241]]), array([ 0.19508606, -0.27504107])]
Iteration 14, best error: 0.10389, best parameters: [array([[ 1.05913023, -0.29764577],
       [ 0.28055037,  0.97868875],
       [-0.79499943,  0.88000241]]), array([ 0.19508606, -0.27504107])]

Best W matrix:
[[ 1.05913023 -0.29764577]
 [ 0.28055037  0.97868875]
 [-0.79499943  0.88000241]]

Best b vector:
[ 0.19508606 -0.27504107]

Target W matrix:
[[ 1.  -0.5]
 [ 0.3  1.2]
 [-0.8  0.9]]

Target b vector:
[ 0.1 -0.3]

Final loss: 0.103890

# Visualize optimization progress for multi-dimensional case
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(optimizer.errors, 'b-', alpha=0.6, label='All evaluations')
best_so_far = np.minimum.accumulate(optimizer.errors)
plt.plot(best_so_far, 'r-', linewidth=2, label='Best so far')
plt.yscale('log')
plt.xlabel('Evaluation')
plt.ylabel('Loss')
plt.title('Multi-dimensional Optimization Progress')
plt.legend()
plt.grid(True, alpha=0.3)

# Show parameter convergence for a few elements
plt.subplot(1, 2, 2)
candidates = optimizer.candidates
W_evolution = [c[0] for c in candidates]  # Extract W matrices
b_evolution = [c[1] for c in candidates]  # Extract b vectors

# Plot evolution of W[0,0], W[1,1], and b[0]
W_00 = [W[0, 0] for W in W_evolution]
W_11 = [W[1, 1] for W in W_evolution]
b_0 = [b[0] for b in b_evolution]

plt.plot(W_00, label='W[0,0]', alpha=0.8)
plt.plot(W_11, label='W[1,1]', alpha=0.8)
plt.plot(b_0, label='b[0]', alpha=0.8)

# Target values
plt.axhline(y=W_target[0, 0], color='blue', linestyle='--', alpha=0.5)
plt.axhline(y=W_target[1, 1], color='orange', linestyle='--', alpha=0.5)
plt.axhline(y=b_target[0], color='green', linestyle='--', alpha=0.5)

plt.xlabel('Evaluation')
plt.ylabel('Parameter Value')
plt.title('Parameter Convergence')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

[Figure: left, loss per evaluation with the best-so-far curve (log scale); right, evolution of W[0,0], W[1,1], and b[0] with dashed lines at their target values.]

Named Parameters with Dictionary Bounds

For better organization, especially with many parameters, you can use dictionary bounds to give meaningful names to your parameters.

# Define a loss function that accepts named parameters
def named_loss(**params):
    """
    Loss function with named parameters.
    
    Args:
        **params: Dictionary with keys 'learning_rate', 'momentum', 'weight_decay'
                 Each value is a JAX array of shape (n_sample,)
                 
    Returns:
        Array of losses, shape (n_sample,)
    """
    lr = params['learning_rate']
    momentum = params['momentum']
    weight_decay = params['weight_decay']

    # Simulate a validation loss based on hyperparameters
    # This is a synthetic example - in practice, you'd train a model
    optimal_lr = 0.001
    optimal_momentum = 0.9
    optimal_wd = 0.0001

    lr_penalty = (jnp.log(lr) - jnp.log(optimal_lr)) ** 2
    momentum_penalty = (momentum - optimal_momentum) ** 2
    wd_penalty = (jnp.log(weight_decay + 1e-8) - jnp.log(optimal_wd)) ** 2

    # Add noise to mimic a noisy validation loss. The PRNG key is fixed, so the
    # same noise vector is reused on every call, and the loss can dip below zero.
    noise = 0.1 * jax.random.normal(jax.random.PRNGKey(42), lr.shape)

    return lr_penalty + momentum_penalty + wd_penalty + noise


# Define named bounds
bounds = {
    'learning_rate': (1e-5, 1e-1),  # Log scale is often better for learning rates
    'momentum': (0.5, 0.99),
    'weight_decay': (1e-6, 1e-2),
}

# Create optimizer
optimizer = braintools.optim.NevergradOptimizer(
    batched_loss_fun=named_loss,
    bounds=bounds,
    n_sample=15,
    method='TwoPointsDE'  # Differential Evolution with two-point crossover
)

print("Named parameter optimizer created!")
print(f"Parameters: {list(bounds.keys())}")

Named parameter optimizer created!
Parameters: ['learning_rate', 'momentum', 'weight_decay']
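
The bounds for `learning_rate` and `weight_decay` span several orders of magnitude. One common way to cover such ranges more evenly is to optimize the base-10 exponent and convert back inside the loss. The wrapper below sketches that idea in NumPy. The names `penalty`, `log_space_loss`, `log10_lr`, and `log10_wd` are invented for this example, the penalty is a simplified noise-free version of the synthetic loss, and the dictionary keys are assumed to reach the loss as keyword arguments, as `named_loss(**params)` suggests.

```python
import numpy as np


def penalty(learning_rate, momentum, weight_decay):
    # Simplified, noise-free version of the synthetic loss above
    return ((np.log(learning_rate) - np.log(1e-3)) ** 2
            + (momentum - 0.9) ** 2
            + (np.log(weight_decay) - np.log(1e-4)) ** 2)


def log_space_loss(log10_lr, momentum, log10_wd):
    # Candidates arrive as exponents; convert back before evaluating
    return penalty(10.0 ** log10_lr, momentum, 10.0 ** log10_wd)


# Bounds expressed on the exponent: 1e-5..1e-1 becomes -5..-1, and so on
log_bounds = {
    'log10_lr': (-5.0, -1.0),
    'momentum': (0.5, 0.99),
    'log10_wd': (-6.0, -2.0),
}

batch = log_space_loss(
    np.array([-3.0, -1.0]),  # learning rates 1e-3 and 1e-1
    np.array([0.9, 0.9]),
    np.array([-4.0, -4.0]),  # weight decay 1e-4 for both
)
print(batch)  # first entry is ~0 (the optimum); second is (ln 100)^2, about 21.2
```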

# Run optimization
best_params = optimizer.minimize(n_iter=25, verbose=True)

print(f"\nOptimal hyperparameters found:")
for param_name, value in best_params.items():
    print(f"  {param_name}: {value:.6f}")

print(f"\nTrue optimal values:")
print(f"  learning_rate: 0.001000")
print(f"  momentum: 0.900000")
print(f"  weight_decay: 0.000100")

Iteration 0, best error: 5.47494, best parameters: {'learning_rate': 0.009383971149458234, 'momentum': 0.9691190314635103, 'weight_decay': 0.00018051704200176024}
Iteration 1, best error: 5.47494, best parameters: {'learning_rate': 0.009383971149458234, 'momentum': 0.9691190314635103, 'weight_decay': 0.00018051704200176024}
Iteration 2, best error: 5.47494, best parameters: {'learning_rate': 0.009383971149458234, 'momentum': 0.9691190314635103, 'weight_decay': 0.00018051704200176024}
Iteration 3, best error: 4.44929, best parameters: {'learning_rate': 0.0029384439812353415, 'momentum': 0.8719176183870795, 'weight_decay': 0.0006102275151344754}
Iteration 4, best error: 2.34815, best parameters: {'learning_rate': 0.000501531539502326, 'momentum': 0.9870790594480003, 'weight_decay': 0.000397248490393415}
Iteration 5, best error: 2.25207, best parameters: {'learning_rate': 0.0038051526275209723, 'momentum': 0.8041034810236244, 'weight_decay': 0.00018051704200176054}
Iteration 6, best error: 0.83839, best parameters: {'learning_rate': 0.0016395956825206277, 'momentum': 0.8748682021335097, 'weight_decay': 4.7188740697419946e-05}
Iteration 7, best error: 0.83839, best parameters: {'learning_rate': 0.0016395956825206277, 'momentum': 0.8748682021335097, 'weight_decay': 4.7188740697419946e-05}
Iteration 8, best error: 0.26655, best parameters: {'learning_rate': 0.0006329703212461424, 'momentum': 0.8333596447309115, 'weight_decay': 9.717363337197931e-05}
Iteration 9, best error: 0.26655, best parameters: {'learning_rate': 0.0006329703212461424, 'momentum': 0.8333596447309115, 'weight_decay': 9.717363337197931e-05}
Iteration 10, best error: 0.26655, best parameters: {'learning_rate': 0.0006329703212461424, 'momentum': 0.8333596447309115, 'weight_decay': 9.717363337197931e-05}
Iteration 11, best error: 0.26655, best parameters: {'learning_rate': 0.0006329703212461424, 'momentum': 0.8333596447309115, 'weight_decay': 9.717363337197931e-05}
Iteration 12, best error: 0.08936, best parameters: {'learning_rate': 0.000625331014102345, 'momentum': 0.8593500141294951, 'weight_decay': 8.986604766108071e-05}
Iteration 13, best error: 0.06288, best parameters: {'learning_rate': 0.0010962089332115371, 'momentum': 0.7654055682214886, 'weight_decay': 8.859737970218184e-05}
Iteration 14, best error: 0.06052, best parameters: {'learning_rate': 0.0012668109779111687, 'momentum': 0.9064873633629937, 'weight_decay': 9.039588241842516e-05}
Iteration 15, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 16, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 17, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 18, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 19, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 20, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 21, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 22, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 23, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}
Iteration 24, best error: -0.01869, best parameters: {'learning_rate': 0.0014086680103050566, 'momentum': 0.8593500141294951, 'weight_decay': 0.00010827672328188666}

Optimal hyperparameters found:
  learning_rate: 0.001409
  momentum: 0.859350
  weight_decay: 0.000108

True optimal values:
  learning_rate: 0.001000
  momentum: 0.900000
  weight_decay: 0.000100

# Visualize hyperparameter optimization
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Error evolution
axes[0, 0].plot(optimizer.errors, 'b-', alpha=0.6)
best_so_far = np.minimum.accumulate(optimizer.errors)
axes[0, 0].plot(best_so_far, 'r-', linewidth=2)
axes[0, 0].set_yscale('symlog', linthresh=1e-2)  # symlog: the noisy loss can be negative
axes[0, 0].set_xlabel('Evaluation')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Hyperparameter Optimization Progress')
axes[0, 0].grid(True, alpha=0.3)

# Extract parameter evolution
candidates = optimizer.candidates
lr_values = [c['learning_rate'] for c in candidates]
momentum_values = [c['momentum'] for c in candidates]
wd_values = [c['weight_decay'] for c in candidates]

# Learning rate evolution (log scale)
axes[0, 1].plot(lr_values, 'g-', alpha=0.8)
axes[0, 1].axhline(y=0.001, color='red', linestyle='--', alpha=0.7, label='Optimal')
axes[0, 1].set_yscale('log')
axes[0, 1].set_xlabel('Evaluation')
axes[0, 1].set_ylabel('Learning Rate')
axes[0, 1].set_title('Learning Rate Evolution')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Momentum evolution
axes[1, 0].plot(momentum_values, 'orange', alpha=0.8)
axes[1, 0].axhline(y=0.9, color='red', linestyle='--', alpha=0.7, label='Optimal')
axes[1, 0].set_xlabel('Evaluation')
axes[1, 0].set_ylabel('Momentum')
axes[1, 0].set_title('Momentum Evolution')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Weight decay evolution (log scale)
axes[1, 1].plot(wd_values, 'purple', alpha=0.8)
axes[1, 1].axhline(y=0.0001, color='red', linestyle='--', alpha=0.7, label='Optimal')
axes[1, 1].set_yscale('log')
axes[1, 1].set_xlabel('Evaluation')
axes[1, 1].set_ylabel('Weight Decay')
axes[1, 1].set_title('Weight Decay Evolution')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

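[Figure: 2x2 panels showing loss progress, learning rate (log scale), momentum, and weight decay (log scale) per evaluation, each parameter panel with its optimal value as a dashed line.]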

Working with BrainUnit Quantities

The NevergradOptimizer accepts BrainUnit quantities as bounds, so you can optimize parameters that carry physical units.

# Define a loss function with physical parameters
def neuron_loss(tau_m, V_th, I_ext):
    """
    Loss function for neuron model parameters.
    
    Args:
        tau_m: Membrane time constant (with time units)
        V_th: Threshold voltage (with voltage units)
        I_ext: External current (with current units)
        
    Returns:
        Array of losses
    """
    # Target parameters for a realistic neuron
    tau_target = 20.0 * u.ms
    V_th_target = -50.0 * u.mV
    I_target = 100.0 * u.pA

    # Compute normalized differences
    tau_diff = ((tau_m - tau_target) / (10.0 * u.ms)) ** 2
    V_diff = ((V_th - V_th_target) / (10.0 * u.mV)) ** 2
    I_diff = ((I_ext - I_target) / (50.0 * u.pA)) ** 2

    return u.get_mantissa(tau_diff + V_diff + I_diff)


# Define bounds with units
bounds = [
    (5.0 * u.ms, 50.0 * u.ms),  # tau_m: membrane time constant
    (-80.0 * u.mV, -30.0 * u.mV),  # V_th: threshold voltage
    (10.0 * u.pA, 200.0 * u.pA),  # I_ext: external current
]

# Create optimizer
optimizer = braintools.optim.NevergradOptimizer(
    batched_loss_fun=neuron_loss,
    bounds=bounds,
    n_sample=12,
    method='PSO'  # Particle Swarm Optimization
)

print("Neuron parameter optimizer with units created!")
print(f"tau_m bounds: {bounds[0][0]} to {bounds[0][1]}")
print(f"V_th bounds: {bounds[1][0]} to {bounds[1][1]}")
print(f"I_ext bounds: {bounds[2][0]} to {bounds[2][1]}")

Neuron parameter optimizer with units created!
tau_m bounds: 5.0 * msecond to 50.0 * msecond
V_th bounds: -80.0 * mvolt to -30.0 * mvolt
I_ext bounds: 10.0 * pamp to 200.0 * pamp

# Run optimization with units
best_params = optimizer.minimize(n_iter=20, verbose=True)

tau_best, V_th_best, I_best = best_params

print(f"\nOptimal neuron parameters:")
print(f"  tau_m: {tau_best}")
print(f"  V_th: {V_th_best}")
print(f"  I_ext: {I_best}")

print(f"\nTarget parameters:")
print(f"  tau_m: {20.0 * u.ms}")
print(f"  V_th: {-50.0 * u.mV}")
print(f"  I_ext: {100.0 * u.pA}")

# Evaluate final loss
final_loss = neuron_loss(tau_best, V_th_best, I_best)
print(f"\nFinal loss: {final_loss:.6f}")

Iteration 0, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 1, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 2, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 3, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 4, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 5, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 6, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 7, best error: 0.06290, best parameters: [20.869514 * msecond, -47.689476 * mvolt, 97.78947 * pamp]
Iteration 8, best error: 0.04136, best parameters: [20.60706 * msecond, -48.807583 * mvolt, 107.65714 * pamp]
Iteration 9, best error: 0.04136, best parameters: [20.60706 * msecond, -48.807583 * mvolt, 107.65714 * pamp]
Iteration 10, best error: 0.04136, best parameters: [20.60706 * msecond, -48.807583 * mvolt, 107.65714 * pamp]
Iteration 11, best error: 0.04136, best parameters: [20.60706 * msecond, -48.807583 * mvolt, 107.65714 * pamp]
Iteration 12, best error: 0.04136, best parameters: [20.60706 * msecond, -48.807583 * mvolt, 107.65714 * pamp]
Iteration 13, best error: 0.04136, best parameters: [20.60706 * msecond, -48.807583 * mvolt, 107.65714 * pamp]
Iteration 14, best error: 0.03424, best parameters: [20.393507 * msecond, -48.75344 * mvolt, 106.5493 * pamp]
Iteration 15, best error: 0.03424, best parameters: [20.393507 * msecond, -48.75344 * mvolt, 106.5493 * pamp]
Iteration 16, best error: 0.03126, best parameters: [20.209414 * msecond, -48.47601 * mvolt, 104.358955 * pamp]
Iteration 17, best error: 0.00627, best parameters: [20.173552 * msecond, -49.88805 * mvolt, 103.82278 * pamp]
Iteration 18, best error: 0.00627, best parameters: [20.173552 * msecond, -49.88805 * mvolt, 103.82278 * pamp]
Iteration 19, best error: 0.00505, best parameters: [20.155338 * msecond, -49.777374 * mvolt, 103.2845 * pamp]

Optimal neuron parameters:
  tau_m: 20.155338701712683 * msecond
  V_th: -49.77737511934559 * mvolt
  I_ext: 103.28450179436669 * pamp

Target parameters:
  tau_m: 20.0 * msecond
  V_th: -50.0 * mvolt
  I_ext: 100.0 * pamp

Final loss: 0.005052
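
Each term in `neuron_loss` divides a difference by a quantity with the same unit, so the units cancel and the loss is dimensionless. As a sanity check, the reported final loss can be reproduced from the printed optimum with plain floats (values in ms, mV, and pA copied from the output above):

```python
# Best parameters reported above, as plain numbers in ms, mV, pA
tau_best = 20.155338701712683
V_th_best = -49.77737511934559
I_best = 103.28450179436669

tau_term = ((tau_best - 20.0) / 10.0) ** 2    # (ms - ms) / ms -> dimensionless
V_term = ((V_th_best - (-50.0)) / 10.0) ** 2  # (mV - mV) / mV -> dimensionless
I_term = ((I_best - 100.0) / 50.0) ** 2       # (pA - pA) / pA -> dimensionless

loss = tau_term + V_term + I_term
print(f"{loss:.6f}")  # 0.005052, matching the final loss printed above
```

The current term contributes most of the remaining error (about 0.0043 of 0.0051), which matches the log: `I_ext` is still roughly 3 pA away from its target.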

# Visualize parameter evolution with units
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Error evolution
axes[0, 0].plot(optimizer.errors, 'b-', alpha=0.6)
best_so_far = np.minimum.accumulate(optimizer.errors)
axes[0, 0].plot(best_so_far, 'r-', linewidth=2)
axes[0, 0].set_yscale('log')
axes[0, 0].set_xlabel('Evaluation')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].set_title('Neuron Parameter Optimization')
axes[0, 0].grid(True, alpha=0.3)

# Extract parameter evolution for plotting
candidates = optimizer.candidates
tau_values = [c[0] for c in candidates]
V_values = [c[1] for c in candidates]
I_values = [c[2] for c in candidates]

# Tau evolution
axes[0, 1].plot(tau_values, 'g-', alpha=0.8)
axes[0, 1].axhline(y=20.0, color='red', linestyle='--', alpha=0.7, label='Target')
axes[0, 1].set_xlabel('Evaluation')
axes[0, 1].set_ylabel('tau_m (ms)')
axes[0, 1].set_title('Membrane Time Constant')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Voltage threshold evolution
axes[1, 0].plot(V_values, 'orange', alpha=0.8)
axes[1, 0].axhline(y=-50.0, color='red', linestyle='--', alpha=0.7, label='Target')
axes[1, 0].set_xlabel('Evaluation')
axes[1, 0].set_ylabel('V_th (mV)')
axes[1, 0].set_title('Threshold Voltage')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Current evolution
axes[1, 1].plot(I_values, 'purple', alpha=0.8)
axes[1, 1].axhline(y=100.0, color='red', linestyle='--', alpha=0.7, label='Target')
axes[1, 1].set_xlabel('Evaluation')
axes[1, 1].set_ylabel('I_ext (pA)')
axes[1, 1].set_title('External Current')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

[Figure: 2x2 panels showing loss progress (log scale) and the evolution of tau_m (ms), V_th (mV), and I_ext (pA) per evaluation, each with its target as a dashed line.]

Different Optimization Methods

The NevergradOptimizer supports many different optimization algorithms. Let’s compare a few of them on the same problem.

# Define a challenging test function (Rosenbrock)
def rosenbrock_loss(x, y):
    """
    The Rosenbrock function - a classic optimization benchmark.
    Global minimum at (1, 1) with value 0.
    """
    a = 1.0
    b = 100.0
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


bounds = [(-2.0, 2.0), (-1.0, 3.0)]

# Test different methods
methods = ['DE', 'CMA', 'PSO', 'OnePlusOne', 'TwoPointsDE']
results = {}
n_iter = 30

for method in methods:
    print(f"\nTesting {method}...")

    optimizer = braintools.optim.NevergradOptimizer(
        batched_loss_fun=rosenbrock_loss,
        bounds=bounds,
        n_sample=10,
        method=method
    )

    best_params = optimizer.minimize(n_iter=n_iter, verbose=False)
    final_loss = rosenbrock_loss(best_params[0], best_params[1])

    results[method] = {
        'best_params': best_params,
        'final_loss': final_loss,
        'errors': optimizer.errors.copy(),
        'candidates': [c for c in optimizer.candidates]
    }

    print(f"  Final loss: {final_loss:.6f}")
    print(f"  Best params: ({best_params[0]:.4f}, {best_params[1]:.4f})")

print("\nMethod comparison completed!")

Testing DE...
  Final loss: 0.018676
  Best params: (0.9207, 0.8365)

Testing CMA...
  Final loss: 1.354735
  Best params: (-0.1639, 0.0256)

Testing PSO...
  Final loss: 0.000568
  Best params: (0.9793, 0.9578)

Testing OnePlusOne...
  Final loss: 0.025147
  Best params: (0.8414, 0.7078)

Testing TwoPointsDE...
  Final loss: 0.004973
  Best params: (0.9440, 0.8869)

Method comparison completed!

# Visualize method comparison
fig, axes = plt.subplots(3, 3, figsize=(15, 8))
axes = axes.flatten()

# Plot convergence curves
ax = axes[0]
for method, result in results.items():
    best_so_far = np.minimum.accumulate(result['errors'])
    ax.plot(best_so_far, label=method, linewidth=2)

ax.set_yscale('log')
ax.set_xlabel('Evaluation')
ax.set_ylabel('Best Loss So Far')
ax.set_title('Method Comparison: Convergence')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot final results
ax = axes[1]
method_names = list(results.keys())
final_losses = [results[m]['final_loss'] for m in method_names]
bars = ax.bar(method_names, final_losses, alpha=0.7)
ax.set_yscale('log')
ax.set_ylabel('Final Loss')
ax.set_title('Final Performance')
ax.tick_params(axis='x', rotation=45)
ax.grid(True, alpha=0.3)

# Highlight best method
best_method_idx = np.argmin(final_losses)
bars[best_method_idx].set_color('red')
bars[best_method_idx].set_alpha(1.0)

# Plot search trajectories for each method
colors = plt.cm.tab10(np.linspace(0, 1, len(methods)))
for i, (method, result) in enumerate(results.items()):
    ax = axes[2 + i]
    candidates = np.array(result['candidates'])
    errors = result['errors']

    # Plot contour of Rosenbrock function
    x = np.linspace(-2, 2, 50)
    y = np.linspace(-1, 3, 50)
    X, Y = np.meshgrid(x, y)
    Z = (1 - X) ** 2 + 100 * (Y - X ** 2) ** 2
    ax.contour(X, Y, Z, levels=20, alpha=0.3, colors='gray')

    # Plot search points
    scatter = ax.scatter(candidates[:, 0], candidates[:, 1],
                         c=errors, cmap='viridis_r', alpha=0.7, s=20)

    # Mark true optimum and best found
    ax.plot(1, 1, 'r*', markersize=15, label='True optimum')
    best_params = result['best_params']
    ax.plot(best_params[0], best_params[1], 'bo', markersize=8,
            label='Best found')

    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title(f'{method}: Search Trajectory')
    ax.legend(fontsize=8)
    ax.grid(True, alpha=0.3)

# Hide unused subplots
for i in range(2 + len(methods), len(axes)):
    axes[i].set_visible(False)

plt.tight_layout()
plt.show()

# Print summary
print("\nMethod Performance Summary:")
for method, result in results.items():
    print(f"{method:12s}: Loss = {result['final_loss']:.6f}, "
          f"Params = ({result['best_params'][0]:.4f}, {result['best_params'][1]:.4f})")

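[Figure: best-so-far convergence curves for all methods, a bar chart of final losses with the best method highlighted, and one search-trajectory panel per method over Rosenbrock contours.]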

Method Performance Summary:
DE          : Loss = 0.018676, Params = (0.9207, 0.8365)
CMA         : Loss = 1.354735, Params = (-0.1639, 0.0256)
PSO         : Loss = 0.000568, Params = (0.9793, 0.9578)
OnePlusOne  : Loss = 0.025147, Params = (0.8414, 0.7078)
TwoPointsDE : Loss = 0.004973, Params = (0.9440, 0.8869)

Neural Network Hyperparameter Tuning

Let’s see a more realistic example: optimizing hyperparameters for a simple neural network on a classification task.

# Generate synthetic classification data
np.random.seed(42)
n_samples = 1000
n_features = 10

# Create synthetic data
X = np.random.randn(n_samples, n_features)
# Create a non-linear decision boundary
y = ((X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.3 * X[:, 2] * X[:, 3]) > 0).astype(int)

# Split into train/test
split = int(0.8 * n_samples)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print(f"Class distribution: {np.bincount(y_train)}")

Training set: 800 samples
Test set: 200 samples
Class distribution: [274 526]

# Define a simple neural network training function
def train_and_evaluate_network(**hyperparams):
    """
    Train one network per hyperparameter set and return its loss on the held-out
    test split, which serves as the validation set in this demo.
    
    Args:
        **hyperparams: Dictionary containing 'hidden_size', 'learning_rate', 'l2_reg'
                      Each value is a JAX array of shape (n_sample,)
    
    Returns:
        Array of validation losses, shape (n_sample,)
    """
    hidden_sizes = hyperparams['hidden_size'].astype(int)
    learning_rates = hyperparams['learning_rate']
    l2_regs = hyperparams['l2_reg']

    losses = []

    for hidden_size, lr, l2_reg in zip(hidden_sizes, learning_rates, l2_regs):
        # Initialize network parameters
        key = jax.random.PRNGKey(42)
        k1, k2 = jax.random.split(key)

        # Simple 2-layer network
        W1 = jax.random.normal(k1, (n_features, hidden_size)) * 0.1
        b1 = jnp.zeros(hidden_size)
        W2 = jax.random.normal(k2, (hidden_size, 1)) * 0.1
        b2 = jnp.zeros(1)

        params = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Loss function: logistic loss plus an L2 penalty on the weights
        def loss_fn(params, x, y):
            h = jax.nn.relu(x @ params['W1'] + params['b1'])
            logits = h @ params['W2'] + params['b2']
            # Logistic loss, i.e. binary cross-entropy for labels in {-1, +1}
            loss = jnp.mean(jnp.log(1 + jnp.exp(-y[:, None] * logits)))
            # L2 regularization
            l2_loss = l2_reg * (jnp.sum(params['W1'] ** 2) + jnp.sum(params['W2'] ** 2))
            return loss + l2_loss

        grad_fn = jax.grad(loss_fn)

        # Convert labels to {-1, 1}
        y_train_signed = 2 * y_train - 1
        y_test_signed = 2 * y_test - 1

        # Training loop (simplified)
        for epoch in range(50):
            grads = grad_fn(params, X_train, y_train_signed)
            # Simple SGD update
            params = {k: v - lr * grads[k] for k, v in params.items()}

        # Evaluate on test set
        test_loss = loss_fn(params, X_test, y_test_signed)
        losses.append(float(test_loss))

    return jnp.array(losses)


# Define hyperparameter bounds
hyperparameter_bounds = {
    'hidden_size': (10, 100),  # Number of hidden units
    'learning_rate': (1e-4, 1e-1),  # Learning rate
    'l2_reg': (1e-6, 1e-2),  # L2 regularization
}

print("Neural network hyperparameter optimization setup complete!")

Neural network hyperparameter optimization setup complete!
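
The loss inside `train_and_evaluate_network` computes `log(1 + exp(-y * logit))` directly. For large negative margins the exponential overflows, which can happen if a proposed learning rate makes training diverge. The following is a minimal sketch of a numerically stable form in plain Python; in JAX the same quantity can be written as `jnp.logaddexp(0.0, -margin)`. The function name `stable_logistic_loss` is illustrative and not part of BrainTools.

```python
import math


def stable_logistic_loss(margin):
    """Compute log(1 + exp(-margin)) without overflow.

    `margin` is y * logit with y in {-1, +1}. The identity
    log(1 + exp(-m)) = max(-m, 0) + log1p(exp(-|m|))
    keeps the argument of exp() non-positive.
    """
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


print(stable_logistic_loss(0.0))      # ln 2: an undecided prediction
print(stable_logistic_loss(-1000.0))  # about 1000; the naive form overflows here
```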

# Create and run hyperparameter optimizer
optimizer = braintools.optim.NevergradOptimizer(
    batched_loss_fun=train_and_evaluate_network,
    bounds=hyperparameter_bounds,
    n_sample=8,  # Smaller batch size since training is expensive
    method='CMA',
    use_nevergrad_recommendation=True  # Use Nevergrad's recommendation
)

print("Starting hyperparameter optimization...")
print("(This may take a moment as we're actually training networks)")

best_hyperparams = optimizer.minimize(n_iter=15, verbose=True)

print(f"\nOptimal hyperparameters:")
print(f"  Hidden size: {int(best_hyperparams['hidden_size'])}")
print(f"  Learning rate: {best_hyperparams['learning_rate']:.6f}")
print(f"  L2 regularization: {best_hyperparams['l2_reg']:.6f}")
print(f"\nBest validation loss: {np.min(optimizer.errors):.6f}")

Starting hyperparameter optimization...
(This may take a moment as we're actually training networks)
Iteration 0, best error: 0.62158, best parameters: {'hidden_size': 55.72615785312022, 'learning_rate': 0.049378885537207005, 'l2_reg': 0.004330238788185962}
Iteration 1, best error: 0.61371, best parameters: {'hidden_size': 56.35320997926537, 'learning_rate': 0.05471933175493087, 'l2_reg': 0.004505735949617119}
Iteration 2, best error: 0.61371, best parameters: {'hidden_size': 56.35320997926537, 'learning_rate': 0.05471933175493087, 'l2_reg': 0.004505735949617119}
Iteration 3, best error: 0.60838, best parameters: {'hidden_size': 57.407793541579316, 'learning_rate': 0.05653133878150727, 'l2_reg': 0.003631914774644863}
Iteration 4, best error: 0.60522, best parameters: {'hidden_size': 57.407793541579316, 'learning_rate': 0.05653133878150727, 'l2_reg': 0.003631914774644863}
Iteration 5, best error: 0.60331, best parameters: {'hidden_size': 56.11874863311255, 'learning_rate': 0.05390213434427211, 'l2_reg': 0.002881664666010194}
Iteration 6, best error: 0.59252, best parameters: {'hidden_size': 55.72899448382774, 'learning_rate': 0.06946911305265302, 'l2_reg': 0.0036988397670589075}
Iteration 7, best error: 0.56450, best parameters: {'hidden_size': 55.43694246066678, 'learning_rate': 0.08911746525813419, 'l2_reg': 0.0032387087272190895}
Iteration 8, best error: 0.56450, best parameters: {'hidden_size': 55.43694246066678, 'learning_rate': 0.08911746525813419, 'l2_reg': 0.0032387087272190895}
Iteration 9, best error: 0.56450, best parameters: {'hidden_size': 55.43694246066678, 'learning_rate': 0.08911746525813419, 'l2_reg': 0.0032387087272190895}
Iteration 10, best error: 0.55636, best parameters: {'hidden_size': 55.43694246066678, 'learning_rate': 0.08911746525813419, 'l2_reg': 0.0032387087272190895}
Iteration 11, best error: 0.55636, best parameters: {'hidden_size': 66.04618028889605, 'learning_rate': 0.09991923718937859, 'l2_reg': 0.004181101599859817}
Iteration 12, best error: 0.54768, best parameters: {'hidden_size': 66.04618028889605, 'learning_rate': 0.09991923718937859, 'l2_reg': 0.004181101599859817}
Iteration 13, best error: 0.54768, best parameters: {'hidden_size': 65.92585198235423, 'learning_rate': 0.09688479388181644, 'l2_reg': 0.0031455150348020777}
Iteration 14, best error: 0.53523, best parameters: {'hidden_size': 70.27477504112848, 'learning_rate': 0.09683993890523653, 'l2_reg': 0.0018961034871329823}

Optimal hyperparameters:
  Hidden size: 70
  Learning rate: 0.096840
  L2 regularization: 0.001896

Best validation loss: 0.535234
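
To put a loss of about 0.535 in context, it helps to compare it with two trivial predictors under the same logistic loss. The sketch below uses the training class counts printed earlier, `[274, 526]`, as a stand-in for the class balance of the held-out split; that substitution is an assumption. It also ignores the L2 term that the tutorial's loss includes.

```python
import math

# Class counts printed for the training split: 274 negatives, 526 positives
n_neg, n_pos = 274, 526
p_pos = n_pos / (n_neg + n_pos)

# Baseline 1: logits fixed at zero (an untrained, undecided model)
zero_logit_loss = math.log(2.0)

# Baseline 2: a constant prediction equal to the class prior; its
# cross-entropy is the entropy of the label distribution
prior_loss = -(p_pos * math.log(p_pos) + (1 - p_pos) * math.log(1 - p_pos))

print(f"zero-logit baseline:  {zero_logit_loss:.4f}")
print(f"class-prior baseline: {prior_loss:.4f}")
```

A tuned loss below both baselines suggests that the network has learned more than the class balance alone.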

# Visualize hyperparameter optimization results
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Optimization progress
axes[0, 0].plot(optimizer.errors, 'b-', alpha=0.6, label='All evaluations')
best_so_far = np.minimum.accumulate(optimizer.errors)
axes[0, 0].plot(best_so_far, 'r-', linewidth=2, label='Best so far')
axes[0, 0].set_xlabel('Evaluation')
axes[0, 0].set_ylabel('Validation Loss')
axes[0, 0].set_title('Hyperparameter Optimization Progress')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Extract hyperparameter evolution
candidates = optimizer.candidates
hidden_sizes = [int(c['hidden_size']) for c in candidates]
learning_rates = [c['learning_rate'] for c in candidates]
l2_regs = [c['l2_reg'] for c in candidates]

# Hidden size evolution
axes[0, 1].plot(hidden_sizes, 'g-', alpha=0.8, marker='o', markersize=4)
axes[0, 1].set_xlabel('Evaluation')
axes[0, 1].set_ylabel('Hidden Size')
axes[0, 1].set_title('Hidden Size Evolution')
axes[0, 1].grid(True, alpha=0.3)

# Learning rate evolution
axes[1, 0].plot(learning_rates, 'orange', alpha=0.8, marker='o', markersize=4)
axes[1, 0].set_yscale('log')
axes[1, 0].set_xlabel('Evaluation')
axes[1, 0].set_ylabel('Learning Rate')
axes[1, 0].set_title('Learning Rate Evolution')
axes[1, 0].grid(True, alpha=0.3)

# L2 regularization evolution
axes[1, 1].plot(l2_regs, 'purple', alpha=0.8, marker='o', markersize=4)
axes[1, 1].set_yscale('log')
axes[1, 1].set_xlabel('Evaluation')
axes[1, 1].set_ylabel('L2 Regularization')
axes[1, 1].set_title('L2 Regularization Evolution')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Report the hyperparameters of the single best evaluation
best_idx = int(np.argmin(optimizer.errors))
print("\nHyperparameter vs Performance Analysis:")
print(f"Best performing hidden size: {hidden_sizes[best_idx]}")
print(f"Best performing learning rate: {learning_rates[best_idx]:.6f}")
print(f"Best performing L2 reg: {l2_regs[best_idx]:.6f}")

../_images/c1939f7bf1a6490fb1372a0792417e5c4e348b52543e271f16b2bec82382324c.png

Hyperparameter vs Performance Analysis:
Best performing hidden size: 70
Best performing learning rate: 0.096840
Best performing L2 reg: 0.001896
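
The "Best so far" curve in the progress plot is a running minimum of the per-evaluation errors, which is what `np.minimum.accumulate` computes. A standard-library sketch of the same operation, on made-up error values:

```python
from itertools import accumulate

# Hypothetical per-evaluation losses, in the order they were evaluated
errors = [0.62, 0.70, 0.61, 0.65, 0.56, 0.58]

# Running minimum: entry k is the best loss among evaluations 0..k
best_so_far = list(accumulate(errors, min))
print(best_so_far)  # [0.62, 0.62, 0.61, 0.61, 0.56, 0.56]
```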

Advanced Features#

Let’s explore some advanced features of the NevergradOptimizer.

# Advanced example: Custom method parameters and budget constraints
def complex_loss(**params):
    """
    A more complex loss function with multiple local minima.
    """
    x = params['x']
    y = params['y']
    z = params['z']

    # Multi-modal function with several local minima
    loss1 = jnp.sin(5 * x) * jnp.cos(5 * y) + (x - 0.2) ** 2 + (y + 0.3) ** 2
    loss2 = 0.1 * (z - 1.5) ** 2

    return loss1 + loss2


bounds = {
    'x': (-1.0, 1.0),
    'y': (-1.0, 1.0),
    'z': (0.0, 3.0),
}

# Create optimizer with a budget constraint (method_params is left commented out)
optimizer = braintools.optim.NevergradOptimizer(
    batched_loss_fun=complex_loss,
    bounds=bounds,
    n_sample=20,
    method='CMA',
    budget=300,  # Limit total evaluations
    num_workers=1,
    # method_params={
    #     'sigma': 0.5,  # Initial step size for CMA-ES
    # },
    use_nevergrad_recommendation=True
)

print("Advanced optimizer with budget and custom parameters created!")
print(f"Budget: {optimizer.budget} evaluations")
print(f"Method parameters: {optimizer.method_params}")

Advanced optimizer with budget and custom parameters created!
Budget: 300 evaluations
Method parameters: {}
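
The budget caps the total number of loss evaluations. Assuming each call to the batched loss consumes `n_sample` evaluations (an assumption about the accounting, not something this tutorial states), the number of full iterations that fit in a budget is a simple integer division. The helper name below is hypothetical.

```python
def iterations_within_budget(budget, n_sample):
    """Number of full batches of size n_sample that fit into the budget."""
    if n_sample <= 0:
        raise ValueError("n_sample must be positive")
    return budget // n_sample


n_iter = iterations_within_budget(budget=300, n_sample=20)
print(n_iter)  # 15
```

With `budget=300` and `n_sample=20` this gives 15 iterations, which matches the `n_iter=15` used in the run that follows.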

# Run optimization with budget constraint
best_params = optimizer.minimize(n_iter=15, verbose=True)

print(f"\nOptimization completed with budget constraint")
print(f"Total evaluations: {len(optimizer.errors)}")
print(f"Best parameters: x={best_params['x']:.4f}, y={best_params['y']:.4f}, z={best_params['z']:.4f}")
print(f"Best loss: {np.min(optimizer.errors):.6f}")

Iteration 0, best error: -0.62136, best parameters: {'x': -0.311048244098807, 'y': 0.02788643240639415, 'z': 1.5372617684679157}
Iteration 1, best error: -0.62144, best parameters: {'x': -0.311048244098807, 'y': 0.02788643240639415, 'z': 1.5372617684679157}
Iteration 2, best error: -0.66273, best parameters: {'x': -0.27458126887851286, 'y': -0.01659419012402761, 'z': 1.821264714382406}
Iteration 3, best error: -0.66273, best parameters: {'x': -0.27458126887851286, 'y': -0.01659419012402761, 'z': 1.821264714382406}

C:\Users\adadu\miniconda3\envs\bdp\Lib\site-packages\cma\evolution_strategy.py:2936: InjectionWarning: orphanated injected solution {'iteration': 1, 'index': 0, 'counter': 0}
                            This could be a bug in the calling order/logics or due to
                            a too small popsize used in `ask()` or when only using
                            `ask(1)` repeatedly. Please check carefully.
                            In case this is desired, the warning can be surpressed with
                            ``warnings.simplefilter("ignore", cma.evolution_strategy.InjectionWarning)``
                            
  warnings.warn("""orphanated injected solution %s

Iteration 4, best error: -0.66273, best parameters: {'x': -0.27458126887851286, 'y': -0.01659419012402761, 'z': 1.821264714382406}
Iteration 5, best error: -0.66353, best parameters: {'x': -0.2909894718851514, 'y': -0.033045808952976305, 'z': 1.6975710509505424}
Iteration 6, best error: -0.66353, best parameters: {'x': -0.2909894718851514, 'y': -0.033045808952976305, 'z': 1.6975710509505424}
Iteration 7, best error: -0.67070, best parameters: {'x': -0.2909894718851514, 'y': -0.033045808952976305, 'z': 1.6975710509505424}
Iteration 8, best error: -0.67070, best parameters: {'x': -0.2744329472588286, 'y': -0.023087308440185417, 'z': 1.7503477157619711}

Iteration 9, best error: -0.67070, best parameters: {'x': -0.2744329472588286, 'y': -0.023087308440185417, 'z': 1.7503477157619711}
Iteration 10, best error: -0.67070, best parameters: {'x': -0.2744329472588286, 'y': -0.023087308440185417, 'z': 1.7503477157619711}
Iteration 11, best error: -0.67070, best parameters: {'x': -0.2748475989704158, 'y': -0.021836972402086707, 'z': 1.7384099131250397}
Iteration 12, best error: -0.67070, best parameters: {'x': -0.27842131476021775, 'y': -0.022420418064458564, 'z': 1.7276357553062276}

Iteration 13, best error: -0.67070, best parameters: {'x': -0.27842131476021775, 'y': -0.022420418064458564, 'z': 1.7276357553062276}
Iteration 14, best error: -0.67070, best parameters: {'x': -0.27704310792891923, 'y': -0.0221970706220294, 'z': 1.706428302258128}

Optimization completed with budget constraint
Total evaluations: 300
Best parameters: x=-0.2770, y=-0.0222, z=1.7064
Best loss: -0.670704
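
The `InjectionWarning` messages in the output above come from the `cma` package, and the warning text itself names the filter that silences them. A minimal sketch of applying that filter only around a chosen call, so that other warnings stay visible, might look like this. The helper names are illustrative, and the code assumes that `cma.evolution_strategy.InjectionWarning` is importable, as the message indicates. Whether the warning is harmless in this setup is not established here, so it is worth checking before silencing it.

```python
import warnings


def injection_warning_class():
    """Return cma's InjectionWarning class, or None if cma is unavailable."""
    try:
        import cma.evolution_strategy as es
    except ImportError:
        return None
    return getattr(es, "InjectionWarning", None)


def run_quietly(fn, category):
    """Call fn() with warnings of `category` ignored; other warnings still show."""
    with warnings.catch_warnings():
        if category is not None:
            warnings.simplefilter("ignore", category)
        return fn()


# Usage sketch: wrap the call that triggers the warnings, for example
# best_params = run_quietly(lambda: optimizer.minimize(n_iter=15),
#                           injection_warning_class())
```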

# Visualize the complex optimization landscape
fig = plt.figure(figsize=(15, 5))

# Plot 2D slice of the loss function (fixing z at the best value found)
ax1 = fig.add_subplot(131)
x_grid = np.linspace(-1, 1, 100)
y_grid = np.linspace(-1, 1, 100)
X_grid, Y_grid = np.meshgrid(x_grid, y_grid)
Z_optimal = best_params['z']

# Compute loss over the grid in one vectorized call (complex_loss is elementwise)
Z_grid = np.asarray(complex_loss(x=X_grid, y=Y_grid, z=Z_optimal))

contour = ax1.contourf(X_grid, Y_grid, Z_grid, levels=30, cmap='viridis')
plt.colorbar(contour, ax=ax1)

# Plot optimization trajectory
candidates = optimizer.candidates
x_traj = [c['x'] for c in candidates]
y_traj = [c['y'] for c in candidates]
ax1.plot(x_traj, y_traj, 'r-', alpha=0.7, linewidth=2, label='Optimization path')
ax1.scatter(x_traj, y_traj, c=optimizer.errors, cmap='viridis_r', s=30,
            edgecolors='black', alpha=0.8)
ax1.plot(best_params['x'], best_params['y'], 'r*', markersize=15, label='Best found')

ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_title(f'Loss Landscape (z={Z_optimal:.2f})')
ax1.legend()

# Plot convergence
ax2 = fig.add_subplot(132)
ax2.plot(optimizer.errors, 'b-', alpha=0.6, label='All evaluations')
best_so_far = np.minimum.accumulate(optimizer.errors)
ax2.plot(best_so_far, 'r-', linewidth=2, label='Best so far')
# Keep a linear y-axis: this loss is negative near the optimum, which a log scale cannot display
ax2.set_xlabel('Evaluation')
ax2.set_ylabel('Loss')
ax2.set_title('Complex Function Optimization')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot parameter evolution for z
ax3 = fig.add_subplot(133)
z_traj = [c['z'] for c in candidates]
ax3.plot(z_traj, 'g-', alpha=0.8, marker='o', markersize=4)
ax3.axhline(y=1.5, color='red', linestyle='--', alpha=0.7, label='Optimal z')
ax3.set_xlabel('Evaluation')
ax3.set_ylabel('z parameter')
ax3.set_title('z Parameter Evolution')
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

../_images/62e4fa71501e879f60dd79e96c5300fe0d6ed7a42110331b4ab4fafda6b914c6.png
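
The loss in this section separates into an oscillating term in `x` and `y` and a quadratic term in `z`, so the cost of a slightly off `z` can be read off directly. The sketch below re-implements `complex_loss` for scalar inputs with the standard `math` module and evaluates it at the rounded best parameters printed above. The total need not equal the reported best loss, which is the minimum over all evaluations and may belong to a different candidate than the returned parameters.

```python
import math


def complex_loss_scalar(x, y, z):
    """Scalar version of complex_loss, split into its two terms."""
    xy_term = math.sin(5 * x) * math.cos(5 * y) + (x - 0.2) ** 2 + (y + 0.3) ** 2
    z_term = 0.1 * (z - 1.5) ** 2
    return xy_term, z_term


# Rounded best parameters from the printed output
xy_term, z_term = complex_loss_scalar(x=-0.2770, y=-0.0222, z=1.7064)
print(f"x/y term: {xy_term:.4f}")
print(f"z term:   {z_term:.4f}")
print(f"total:    {xy_term + z_term:.4f}")
```

The `z` term is small compared with the total, so moving `z` exactly to 1.5 would improve the loss only slightly.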

Best Practices and Tips#

Here are some best practices for using the NevergradOptimizer effectively:

# Best Practice 1: Proper scaling and bounds
print("Best Practice 1: Proper parameter scaling")
print("=====================================\n")

# Good: Parameters on similar scales
good_bounds = {
    'param1': (-1.0, 1.0),
    'param2': (-2.0, 2.0),
    'param3': (-0.5, 0.5),
}

# Problematic: Very different scales
problematic_bounds = {
    'param1': (-1.0, 1.0),
    'param2': (-1000.0, 1000.0),  # Much larger scale
    'param3': (-0.001, 0.001),  # Much smaller scale
}

print("✓ Good bounds (similar scales):")
for name, (low, high) in good_bounds.items():
    print(f"  {name}: [{low}, {high}] (range: {high - low})")

print("\n❌ Problematic bounds (very different scales):")
for name, (low, high) in problematic_bounds.items():
    print(f"  {name}: [{low}, {high}] (range: {high - low})")

print("\nTip: Normalize parameters to similar ranges for better optimization.")

Best Practice 1: Proper parameter scaling
=====================================

✓ Good bounds (similar scales):
  param1: [-1.0, 1.0] (range: 2.0)
  param2: [-2.0, 2.0] (range: 4.0)
  param3: [-0.5, 0.5] (range: 1.0)

❌ Problematic bounds (very different scales):
  param1: [-1.0, 1.0] (range: 2.0)
  param2: [-1000.0, 1000.0] (range: 2000.0)
  param3: [-0.001, 0.001] (range: 0.002)

Tip: Normalize parameters to similar ranges for better optimization.
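
One way to follow this tip, sketched below under the assumption that you control the loss function, is to let the optimizer search a unit interval for every parameter and map the values to their physical ranges inside the loss. Parameters that span several orders of magnitude, such as a learning rate, can be mapped on a logarithmic scale. The helper names are illustrative and not part of BrainTools.

```python
import math


def from_unit(t, low, high):
    """Map t in [0, 1] linearly onto [low, high]."""
    return low + t * (high - low)


def from_unit_log(t, low, high):
    """Map t in [0, 1] onto [low, high] on a log scale (requires 0 < low < high)."""
    return math.exp(from_unit(t, math.log(low), math.log(high)))


# The optimizer would see bounds (0, 1) for both parameters
print(from_unit(0.5, -1000.0, 1000.0))  # 0.0
print(from_unit_log(0.5, 1e-4, 1e-2))   # approximately 1e-3
```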

# Best Practice 2: Choosing the right method and population size
print("\nBest Practice 2: Method and population size selection")
print("==================================================\n")

guidelines = {
    'DE': {
        'description': 'Differential Evolution - Good general purpose optimizer',
        'best_for': 'Most problems, robust and reliable',
        'population_size': '5-20 times the number of parameters',
        'pros': 'Robust, handles constraints well',
        'cons': 'Can be slow on high-dimensional problems'
    },
    'CMA': {
        'description': 'Covariance Matrix Adaptation Evolution Strategy',
        'best_for': 'Continuous optimization, high-dimensional problems',
        'population_size': 'Automatically determined, typically 4 + 3*ln(n_params)',
        'pros': 'Excellent for continuous problems, adapts to problem structure',
        'cons': 'Can be computationally expensive, sensitive to initialization'
    },
    'PSO': {
        'description': 'Particle Swarm Optimization',
        'best_for': 'Multi-modal problems, when you want exploration',
        'population_size': '20-50 particles',
        'pros': 'Good exploration, handles multiple local minima',
        'cons': 'Can be slow to converge precisely'
    },
    'OnePlusOne': {
        'description': 'Simple (1+1) evolution strategy',
        'best_for': 'Quick optimization, low-dimensional problems',
        'population_size': '1 (no population)',
        'pros': 'Fast, simple, low memory usage',
        'cons': 'Can get stuck in local minima, not suitable for noisy functions'
    }
}

for method, info in guidelines.items():
    print(f"{method}:")
    print(f"  Description: {info['description']}")
    print(f"  Best for: {info['best_for']}")
    print(f"  Population size: {info['population_size']}")
    print(f"  Pros: {info['pros']}")
    print(f"  Cons: {info['cons']}\n")

Best Practice 2: Method and population size selection
==================================================

DE:
  Description: Differential Evolution - Good general purpose optimizer
  Best for: Most problems, robust and reliable
  Population size: 5-20 times the number of parameters
  Pros: Robust, handles constraints well
  Cons: Can be slow on high-dimensional problems

CMA:
  Description: Covariance Matrix Adaptation Evolution Strategy
  Best for: Continuous optimization, high-dimensional problems
  Population size: Automatically determined, typically 4 + 3*ln(n_params)
  Pros: Excellent for continuous problems, adapts to problem structure
  Cons: Can be computationally expensive, sensitive to initialization

PSO:
  Description: Particle Swarm Optimization
  Best for: Multi-modal problems, when you want exploration
  Population size: 20-50 particles
  Pros: Good exploration, handles multiple local minima
  Cons: Can be slow to converge precisely

OnePlusOne:
  Description: Simple (1+1) evolution strategy
  Best for: Quick optimization, low-dimensional problems
  Population size: 1 (no population)
  Pros: Fast, simple, low memory usage
  Cons: Can get stuck in local minima, not suitable for noisy functions
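
The population-size guidelines above can be turned into rough numbers. The sketch below applies the `4 + 3*ln(n_params)` rule quoted for CMA, rounding the logarithmic term down (the rounding is an assumption; the guideline does not specify it), and the 5 to 20 times rule quoted for DE. These are starting points for choosing `n_sample`, not values read from BrainTools or Nevergrad.

```python
import math


def cma_popsize_guideline(n_params):
    """4 + floor(3 * ln(n_params)), per the guideline quoted above."""
    return 4 + int(3 * math.log(n_params))


def de_popsize_range(n_params):
    """5 to 20 times the number of parameters, per the guideline quoted above."""
    return 5 * n_params, 20 * n_params


for n in (2, 3, 10):
    print(n, cma_popsize_guideline(n), de_popsize_range(n))
```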

# Best Practice 3: Handling noisy objective functions
print("Best Practice 3: Dealing with noisy objectives")
print("=============================================\n")


def noisy_objective(x, y, noise_level=0.1):
    """Example of a noisy objective function."""
    true_loss = (x - 1.0) ** 2 + (y + 0.5) ** 2
    # Seed the noise from the first candidate so it differs from batch to batch
    noise = noise_level * jax.random.normal(
        jax.random.PRNGKey(int(1000 * (x[0] + y[0]))), x.shape
    )
    return true_loss + noise


bounds = [(-2.0, 2.0), (-2.0, 2.0)]

# Compare different approaches for noisy functions
methods_for_noise = ['DE', 'CMA', 'PSO']
noise_results = {}

for method in methods_for_noise:
    optimizer = braintools.optim.NevergradOptimizer(
        batched_loss_fun=noisy_objective,
        bounds=bounds,
        n_sample=15,  # Larger population for noisy functions
        method=method,
        use_nevergrad_recommendation=True  # Often better for noisy functions
    )

    best_params = optimizer.minimize(n_iter=20, verbose=False)
    # Evaluate true performance (without noise)
    true_loss = (best_params[0] - 1.0) ** 2 + (best_params[1] + 0.5) ** 2

    noise_results[method] = {
        'best_params': best_params,
        'true_loss': float(true_loss),
        'errors': optimizer.errors
    }

    print(f"{method}: True loss = {true_loss:.6f}, "
          f"Params = ({best_params[0]:.4f}, {best_params[1]:.4f})")

print("\nTips for noisy objectives:")
print("- Use larger population sizes (n_sample)")
print("- Set use_nevergrad_recommendation=True")
print("- Consider methods like CMA-ES that are robust to noise")
print("- Run for more iterations to allow convergence despite noise")

Best Practice 3: Dealing with noisy objectives
=============================================

DE: True loss = 0.011109, Params = (1.0971, -0.4590)
CMA: True loss = 0.048718, Params = (0.8519, -0.6637)
PSO: True loss = 0.059386, Params = (1.1363, -0.7020)

Tips for noisy objectives:
- Use larger population sizes (n_sample)
- Set use_nevergrad_recommendation=True
- Consider methods like CMA-ES that are robust to noise
- Run for more iterations to allow convergence despite noise
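
The loop above re-evaluates the noise-free loss at the returned parameters instead of trusting the smallest observed error. The reason is that the minimum over many noisy evaluations is biased low. A small standard-library sketch illustrates this, using Gaussian noise of the same level (0.1) at the true optimum, where the true loss is exactly zero. The use of `random.gauss` in place of the JAX noise in the tutorial is a simplification.

```python
import random

rng = random.Random(0)      # fixed seed so the sketch is repeatable
noise_level = 0.1
true_loss_at_optimum = 0.0  # (x - 1)^2 + (y + 0.5)^2 at x = 1, y = -0.5

# 300 noisy evaluations of the same point (15 candidates x 20 iterations)
observed = [true_loss_at_optimum + rng.gauss(0.0, noise_level) for _ in range(300)]

print(f"true loss:         {true_loss_at_optimum:.4f}")
print(f"smallest observed: {min(observed):.4f}")  # negative, although the true loss is 0
```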

# Best Practice 4: Parameter importance analysis
print("\nBest Practice 4: Understanding parameter importance")
print("================================================\n")


# Analyze which parameters matter most
def analyze_parameter_importance(optimizer_results):
    """Analyze which parameters have the most impact on the objective."""
    candidates = optimizer_results.candidates
    errors = optimizer_results.errors

    if isinstance(candidates[0], dict):
        # Dictionary-based parameters
        param_names = list(candidates[0].keys())
        param_values = {name: [c[name] for c in candidates] for name in param_names}

        print("Parameter sensitivity analysis:")
        for name in param_names:
            values = np.array(param_values[name])
            # Simple correlation with errors
            correlation = np.corrcoef(values, errors)[0, 1]
            print(f"  {name}: correlation with loss = {correlation:.4f}")

            # Range of values explored
            print(f"    Range explored: [{np.min(values):.4f}, {np.max(values):.4f}]")

        return param_values, errors
    else:
        print("Analysis for positional parameters not implemented in this demo")
        return None, None


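# Note: `optimizer` now refers to the last optimizer created above (the noisy-objective
# run with positional bounds), not the hyperparameter tuner, so the dictionary branch
# is skipped. Pass an optimizer built with dictionary bounds to get the full analysis.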
if 'optimizer' in locals() and hasattr(optimizer, 'candidates'):
    param_values, errors = analyze_parameter_importance(optimizer)

    if param_values is not None:
        # Visualize parameter importance
        fig, axes = plt.subplots(1, len(param_values), figsize=(15, 4))
        if len(param_values) == 1:
            axes = [axes]

        for i, (name, values) in enumerate(param_values.items()):
            axes[i].scatter(values, errors, alpha=0.6)
            axes[i].set_xlabel(name)
            axes[i].set_ylabel('Loss')
            axes[i].set_title(f'{name} vs Loss')
            if name in ['learning_rate', 'l2_reg']:
                axes[i].set_xscale('log')
            axes[i].grid(True, alpha=0.3)

        plt.tight_layout()
        plt.show()

print("\nTips for parameter analysis:")
print("- Look for strong correlations between parameters and loss")
print("- Parameters with low correlation might be less important")
print("- Consider fixing less important parameters to reduce search space")
print("- Use the insights to set better bounds for important parameters")

Best Practice 4: Understanding parameter importance
================================================

Analysis for positional parameters not implemented in this demo

Tips for parameter analysis:
- Look for strong correlations between parameters and loss
- Parameters with low correlation might be less important
- Consider fixing less important parameters to reduce search space
- Use the insights to set better bounds for important parameters
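
One caveat to the second tip: a correlation coefficient only measures linear association. A parameter whose loss is symmetric around its optimum can show a correlation near zero even when the loss depends on it entirely. The sketch below computes the Pearson coefficient by hand (the same quantity `np.corrcoef` returns off the diagonal) on synthetic values to show both cases.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


values = [-2.0, -1.0, 0.0, 1.0, 2.0]           # synthetic parameter samples
linear_loss = [2.0 * v + 1.0 for v in values]  # loss changes linearly with the parameter
quadratic_loss = [v ** 2 for v in values]      # loss is symmetric around 0

print(pearson(values, linear_loss))     # close to 1
print(pearson(values, quadratic_loss))  # 0: no linear trend despite full dependence
```

A scatter plot of each parameter against the loss, as drawn by the visualization code above, is a useful complement because it reveals such non-linear dependence.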

Summary#

In this tutorial, we’ve covered the key aspects of using BrainTools’ NevergradOptimizer:

Key Features:

Batched evaluation: Efficiently evaluate multiple parameter sets in parallel
Flexible bounds: Support for both positional and named parameters
Unit integration: Works seamlessly with BrainUnit quantities
Multiple algorithms: Access to various optimization methods (DE, CMA, PSO, etc.)
Advanced options: Budget constraints, custom method parameters, recommendations

Best Practices:

Scale parameters appropriately - Keep parameter ranges on similar scales
Choose the right method - Match the algorithm to your problem characteristics
Handle noise properly - Use larger populations and recommendations for noisy objectives
Analyze results - Understand which parameters matter most for future optimization

When to Use NevergradOptimizer:

Hyperparameter optimization for neural networks
Problems where gradients are unavailable or unreliable
Multi-modal optimization landscapes
Noisy objective functions
Mixed continuous/discrete parameter spaces

The NevergradOptimizer provides a powerful and flexible interface for derivative-free optimization in the BrainTools ecosystem, making it easy to optimize complex neural network models and computational neuroscience simulations.
