cosine_similarity

class braintools.metric.cosine_similarity(predictions, targets, epsilon=0.0)

Compute cosine similarity between predicted and target vectors.

Calculates the cosine of the angle between two vectors, a similarity measure that is independent of vector magnitude. It is particularly useful for comparing the direction of high-dimensional vectors and is widely used in natural language processing, computer vision, and recommendation systems.

The cosine similarity is defined as:

\[\text{cosine\_similarity}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|}\]

where \(\mathbf{u}\) and \(\mathbf{v}\) are vectors, and \(\|\cdot\|\) denotes the L2 (Euclidean) norm.
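
The definition above can be sketched directly with jax.numpy. This is a minimal reference sketch and not the library's implementation; the helper name cosine_similarity_ref is hypothetical, and it does no zero-vector handling.

```python
import jax.numpy as jnp


def cosine_similarity_ref(u, v):
    """Hypothetical reference helper: (u . v) / (||u|| * ||v||) over the last axis."""
    dot = jnp.sum(u * v, axis=-1)          # dot product along the vector dimension
    norm_u = jnp.linalg.norm(u, axis=-1)   # L2 norm of u
    norm_v = jnp.linalg.norm(v, axis=-1)   # L2 norm of v
    return dot / (norm_u * norm_v)


u = jnp.array([3.0, 4.0])
v = jnp.array([4.0, 3.0])
sim = cosine_similarity_ref(u, v)  # 24 / (5 * 5) = 0.96
```

Both vectors have norm 5 and their dot product is 24, so the result follows from the formula by hand.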

Parameters:
  • predictions (Array | ndarray | bool | number | int | float | complex | Quantity) – Predicted vectors with shape (..., dim), where dim is the vector dimension. Must be of floating-point type.

  • targets (Array | ndarray | bool | number | int | float | complex | Quantity) – Ground truth target vectors with shape (..., dim), matching the shape of predictions. Must be of floating-point type.

  • epsilon (float) – Small value added to denominators to prevent division by zero when computing norms. This provides numerical stability for zero or near-zero vectors. The default of 0.0 leaves the denominators unchanged.

Returns:

Cosine similarity values with shape (...,) where the last dimension has been reduced. Values range from -1 (opposite directions) to 1 (same direction), with 0 indicating orthogonal vectors.

Return type:

Array | ndarray | bool | number | int | float | complex | Quantity
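
To illustrate the shape contract described above, the following sketch reduces the last axis of a (2, 3, 4) input to a (2, 3) result. It uses a hypothetical stand-in helper, cosine_similarity_ref, built from the formula rather than the library function itself.

```python
import jax.numpy as jnp


def cosine_similarity_ref(u, v):
    """Hypothetical stand-in that applies the formula over the last axis."""
    dot = jnp.sum(u * v, axis=-1)
    return dot / (jnp.linalg.norm(u, axis=-1) * jnp.linalg.norm(v, axis=-1))


# Leading dimensions (2, 3) act as batch dimensions; dim = 4 is reduced.
pred = jnp.arange(24.0).reshape(2, 3, 4) + 1.0
target = -pred  # every target points in the opposite direction

sims = cosine_similarity_ref(pred, target)
# sims has shape (2, 3), and every entry is approximately -1
```

Because each target is the negated prediction, every pair sits at the lower end of the [-1, 1] range.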

Notes

Properties of cosine similarity:

  • Scale invariant: Depends only on vector direction, not magnitude; rescaling either vector by a positive factor leaves the value unchanged

  • Bounded: Values always lie in the range [-1, 1]

  • Symmetric: sim(u, v) = sim(v, u)

  • Geometric interpretation: Cosine of angle between vectors
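
The properties listed above can be checked numerically. The sketch below uses a hypothetical helper, cos_sim, written from the formula for 1-D vectors; it is an illustration and does not call the library.

```python
import jax.numpy as jnp


def cos_sim(u, v):
    """Hypothetical helper for two 1-D vectors."""
    return jnp.dot(u, v) / (jnp.linalg.norm(u) * jnp.linalg.norm(v))


u = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([-2.0, 0.5, 4.0])

base = cos_sim(u, v)                  # 11 / (sqrt(14) * 4.5), about 0.653
scaled = cos_sim(10.0 * u, 0.1 * v)   # positive rescaling: same value as base
swapped = cos_sim(v, u)               # symmetry: same value as base
flipped = cos_sim(-u, v)              # negating one vector flips the sign

# Geometric interpretation: recover the angle (in radians) from the cosine.
angle = jnp.arccos(jnp.clip(base, -1.0, 1.0))
```

Note that scale invariance holds for positive factors only: a negative factor reverses the direction and therefore the sign of the result.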

Common use cases:

  • Text similarity: Comparing document embeddings

  • Image features: Comparing visual feature vectors

  • Recommendation: Finding similar user/item profiles

  • Clustering: Measuring vector similarity in high dimensions
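
As a rough illustration of the recommendation use case, the sketch below ranks a few item vectors by their cosine similarity to a query vector. The item and query data are made-up toy values, and the similarity is computed directly with jax.numpy rather than through the library function.

```python
import jax.numpy as jnp

# Toy item profiles (one row per item) and a query profile; illustrative only.
items = jnp.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
])
query = jnp.array([2.0, 0.0, 0.0])

# Normalize to unit length so that a plain dot product equals cosine similarity.
unit_items = items / jnp.linalg.norm(items, axis=-1, keepdims=True)
unit_query = query / jnp.linalg.norm(query)

scores = unit_items @ unit_query   # one similarity score per item
ranking = jnp.argsort(-scores)     # item indices, most similar first
```

The query's magnitude does not affect the ranking; only its direction does.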

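The epsilon parameter lets the function handle zero vectors without a division by zero. The default epsilon=0.0 provides no such protection, so pass a small positive value (for example 1e-8) when zero or near-zero vectors may occur.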
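
The effect of epsilon can be sketched as follows. This assumes, following the parameter description, that epsilon is simply added to the denominator; the library's actual implementation may differ (for example, by clamping each norm), and the helper name cos_sim_eps is hypothetical.

```python
import jax.numpy as jnp


def cos_sim_eps(u, v, epsilon=0.0):
    """Hypothetical helper; adding epsilon to the denominator is an assumption."""
    dot = jnp.sum(u * v, axis=-1)
    denom = jnp.linalg.norm(u, axis=-1) * jnp.linalg.norm(v, axis=-1)
    return dot / (denom + epsilon)


zero_vec = jnp.zeros(3)
normal_vec = jnp.array([1.0, 2.0, 3.0])

unsafe = cos_sim_eps(zero_vec, normal_vec)              # 0 / 0 gives nan
safe = cos_sim_eps(zero_vec, normal_vec, epsilon=1e-8)  # 0 / 1e-8 gives 0.0
```

In this sketch a zero vector is treated as orthogonal to everything once epsilon is positive, while nonzero vectors are affected only negligibly.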

Examples

Basic cosine similarity:

>>> import jax.numpy as jnp
>>> import braintools
>>> # Two 3D vectors
>>> pred = jnp.array([1.0, 2.0, 3.0])
>>> target = jnp.array([2.0, 4.0, 6.0])  # Same direction, different magnitude
>>> similarity = braintools.metric.cosine_similarity(pred, target)
>>> print(f"Similarity: {similarity:.4f}")  # Should be close to 1.0

Batch computation:

>>> # Batch of vector pairs
>>> pred_batch = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
>>> target_batch = jnp.array([[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]])
>>> similarities = braintools.metric.cosine_similarity(pred_batch, target_batch)
>>> print(similarities)  # [0.0, 0.0, 0.0] (all orthogonal pairs)

Handling zero vectors:

>>> zero_vec = jnp.array([0.0, 0.0, 0.0])
>>> normal_vec = jnp.array([1.0, 2.0, 3.0])
>>> # With the default epsilon=0.0, a zero vector can lead to a 0/0 division,
>>> # so pass a small positive epsilon
>>> sim_safe = braintools.metric.cosine_similarity(zero_vec, normal_vec, epsilon=1e-8)

Measuring text similarity (conceptual):

>>> # Document embeddings (simplified)
>>> doc1_embedding = jnp.array([0.8, 0.1, 0.3, 0.2])
>>> doc2_embedding = jnp.array([0.7, 0.2, 0.4, 0.1])
>>> text_similarity = braintools.metric.cosine_similarity(doc1_embedding, doc2_embedding)
>>> print(f"Similarity: {text_similarity:.4f}")  # About 0.9744 by the formula above

See also

  • braintools.metric.cosine_distance – 1 - cosine_similarity

  • jax.numpy.dot – Dot product computation

  • jax.numpy.linalg.norm – Vector norm computation
