
Reinforcement Learning

State Representation

Semantic embeddings and state matching in Azcore RL.

State representation is how Azcore RL encodes queries and contexts into a format suitable for Q-learning. Using semantic embeddings enables generalization across similar queries, dramatically improving learning efficiency.

🎯 What is State Representation?

In RL, a state represents the current situation that determines action selection. In Azcore:

  • State = User query or task description
  • Action = Tool selection
  • Goal = Learn which tools work for which types of queries

Without Semantic Representation

# Treats each query as completely unique
Query 1: "What's the weather in Paris?"
Query 2: "Weather in London?"
Query 3: "Temperature in Tokyo?"

# Problem: Must learn each separately, no knowledge transfer!
# Requires 3x the training data
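To make the failure mode concrete, here is a minimal sketch (hypothetical code, not Azcore's actual implementation) of an exact-match Q-table: every paraphrase is a brand-new dictionary key, so nothing learned for one phrasing carries over to another.

```python
# Hypothetical sketch: an exact-match Q-table keyed on raw query strings.
q_table = {}

def update(state, tool, reward, lr=0.5):
    """Simple Q-value update toward the observed reward."""
    q = q_table.setdefault(state, {})
    q[tool] = q.get(tool, 0.0) + lr * (reward - q.get(tool, 0.0))

update("What's the weather in Paris?", "weather", 1.0)

# A near-identical paraphrase is a brand-new key with no learned Q-values:
print("Weather in London?" in q_table)  # False
```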

With Semantic Representation

# Understands semantic similarity
Query 1: "What's the weather in Paris?"      → Embedding: [0.12, 0.45, -0.23, ...]
Query 2: "Weather in London?"                → Embedding: [0.14, 0.43, -0.21, ...]
Query 3: "Temperature in Tokyo?"             → Embedding: [0.13, 0.44, -0.22, ...]

# Cosine similarities: 0.92, 0.89 → Knowledge transfers!
# Learn once, apply to similar queries

🏗️ Semantic Embeddings

Enable Embeddings

from azcore.rl.rl_manager import RLManager

rl_manager = RLManager(
    tool_names=["weather", "search", "calculate"],
    use_embeddings=True,                        # Enable semantic matching
    embedding_model_name="all-MiniLM-L6-v2",   # Model choice
    similarity_threshold=0.7                    # Match threshold
)

How It Works

User Query: "What's the temperature in NYC?"
Embedding Model (Sentence Transformer)
Vector: [0.15, 0.42, -0.19, 0.31, ..., 0.08]  # 384 dimensions
Similarity Search (Cosine Similarity)
Compare with existing state embeddings
If similarity ≥ 0.7: Use existing state
Otherwise: Create new state
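The reuse-or-create decision above can be sketched in plain NumPy. This is an illustrative sketch, not Azcore's internal code; `match_state` and the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_state(query_key, query_emb, state_embeddings, threshold=0.7):
    """Return the most similar existing state key, or register a new one."""
    best_key, best_sim = None, -1.0
    for key, emb in state_embeddings.items():
        sim = cosine(query_emb, emb)
        if sim > best_sim:
            best_key, best_sim = key, sim
    if best_key is not None and best_sim >= threshold:
        return best_key                      # reuse the existing state
    state_embeddings[query_key] = query_emb  # no close match: create a new state
    return query_key

states = {"weather in Paris": np.array([0.9, 0.1, 0.0])}
print(match_state("weather in London", np.array([0.85, 0.15, 0.05]), states))
# reuses "weather in Paris"
```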

Example Flow

# First query - creates new state
query1 = "What's the weather in Paris?"
selected1, state_key1 = rl_manager.select_tools(query1)
# state_key1 = "What's the weather in Paris?"
# Creates embedding and stores in Q-table

# Second query - finds similar state
query2 = "Weather forecast for London?"
selected2, state_key2 = rl_manager.select_tools(query2)
# Computes embedding for query2
# Finds query1 with similarity 0.85 > 0.7
# state_key2 = "What's the weather in Paris?" (reuses!)
# Uses Q-values learned from query1

📊 Embedding Models

Available Models

Azcore uses sentence-transformers library. Popular models:

| Model | Dimensions | Size | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 80MB | ⚡ Fast | ⭐⭐⭐ Good | Default, production |
| all-mpnet-base-v2 | 768 | 420MB | 🐌 Slow | ⭐⭐⭐⭐ Better | High accuracy |
| all-MiniLM-L12-v2 | 384 | 120MB | ⚡ Fast | ⭐⭐⭐⭐ Better | Balanced |
| paraphrase-multilingual-MiniLM-L12-v2 | 384 | 420MB | ⚡ Medium | ⭐⭐⭐ Good | Multilingual |

Choosing a Model

# Fast, production (default)
rl_manager = RLManager(
    tool_names=tools,
    embedding_model_name="all-MiniLM-L6-v2"
)

# High accuracy
rl_manager = RLManager(
    tool_names=tools,
    embedding_model_name="all-mpnet-base-v2"
)

# Multilingual support
rl_manager = RLManager(
    tool_names=tools,
    embedding_model_name="paraphrase-multilingual-MiniLM-L12-v2"
)

🔍 Similarity Matching

Cosine Similarity

Azcore uses cosine similarity to compare embeddings:

similarity = cos(θ) = (A · B) / (||A|| × ||B||)

Range: -1 to +1
- 1.0 = identical
- 0.0 = orthogonal (unrelated)
- -1.0 = opposite
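A quick NumPy check of the formula on toy 2-dimensional vectors (illustrative only; real sentence embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])   # orthogonal to a
c = np.array([-1.0, 0.0])  # opposite of a

print(cosine_similarity(a, a))  # 1.0  (identical)
print(cosine_similarity(a, b))  # 0.0  (unrelated)
print(cosine_similarity(a, c))  # -1.0 (opposite)
```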

Similarity Threshold

# Conservative matching (near-identical queries only)
rl_manager = RLManager(
    tool_names=tools,
    similarity_threshold=0.9  # 90% similar
)

# Balanced matching (default)
rl_manager = RLManager(
    tool_names=tools,
    similarity_threshold=0.7  # 70% similar
)

# Aggressive matching (broad generalization)
rl_manager = RLManager(
    tool_names=tools,
    similarity_threshold=0.5  # 50% similar
)

Threshold Guidelines

| Threshold | Behavior | Use Case |
|---|---|---|
| 0.9 - 1.0 | Very strict, near-identical | Sensitive domains |
| 0.7 - 0.9 | Balanced, similar queries | General purpose (recommended) |
| 0.5 - 0.7 | Loose, broad generalization | Rapid learning, diverse queries |
| < 0.5 | Too loose, false matches | Not recommended |

💡 Benefits of Semantic Representation

1. Data Efficiency

# Without embeddings: Need to learn each query separately
queries_needed_without = 1000  # 1000 unique queries

# With embeddings: Learn from similar queries
queries_needed_with = 200  # 200 queries cover 1000 variations

efficiency_gain = queries_needed_without / queries_needed_with
print(f"{efficiency_gain:.0f}x more data efficient!")  # 5x more data efficient!

2. Generalization

# Train on: "What's the weather in Paris?"
# Generalizes to:
- "Weather in London?"
- "Temperature in Tokyo?"
- "How's the weather in NYC?"
- "Is it raining in Seattle?"
# Even handles paraphrasing!

3. Robustness

# Handles variations automatically
"What is 2 + 2?"
"Calculate 2 plus 2"
"What's two plus two?"
"2+2 equals what?"

# All map to similar embeddings
# Uses same learned Q-values

4. Cold Start Performance

# First query ever
query = "Weather in Paris?"
# No prior knowledge, explores randomly

# Second query (similar)
query = "Temperature in London?"
# Uses knowledge from first query!
# Faster learning, better performance

🔧 Advanced Configuration

Embedding Caching

Embeddings are automatically cached to avoid recomputation:

import numpy as np

from azcore.utils.caching import get_embedding_cache

# Global embedding cache
cache = get_embedding_cache()

# First call: Computes embedding
embedding1 = rl_manager._get_embedding("What's the weather?")
# Cache MISS - computes embedding

# Second call: Uses cache
embedding2 = rl_manager._get_embedding("What's the weather?")
# Cache HIT - instant retrieval

# Same embedding, no recomputation
assert np.array_equal(embedding1, embedding2)

State Inspection

import numpy as np

# View all learned states
all_states = list(rl_manager.q_table.keys())
print(f"Total states: {len(all_states)}")

# View state embeddings
for state in all_states[:5]:
    embedding = rl_manager.state_embeddings.get(state)
    if embedding is not None:
        print(f"State: {state[:50]}...")
        print(f"Embedding shape: {embedding.shape}")
        print(f"Embedding norm: {np.linalg.norm(embedding):.3f}\n")

Find Similar States

# Find states similar to a query
def find_similar_states(rl_manager, query, top_k=5):
    """Find top-k most similar states to query."""
    from sentence_transformers import util

    query_embedding = rl_manager._get_embedding(query)
    if query_embedding is None:
        return []

    similarities = []
    for state, embedding in rl_manager.state_embeddings.items():
        sim = util.cos_sim(query_embedding, embedding).item()
        similarities.append((state, sim))

    # Sort by similarity
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:top_k]

# Usage
similar = find_similar_states(rl_manager, "Weather in Tokyo?", top_k=3)
for state, sim in similar:
    print(f"Similarity: {sim:.3f} - {state}")

📈 Embedding Analysis

Visualize Embeddings

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Collect embeddings
embeddings = []
labels = []
for state, emb in rl_manager.state_embeddings.items():
    embeddings.append(emb)
    labels.append(state[:30])  # Truncate for display

# Reduce to 2D using PCA
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(np.array(embeddings))

# Plot
plt.figure(figsize=(12, 8))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1])

for i, label in enumerate(labels):
    plt.annotate(label, (embeddings_2d[i, 0], embeddings_2d[i, 1]))

plt.title("State Embeddings (2D Projection)")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.tight_layout()
plt.savefig("embeddings_visualization.png")

Embedding Quality Metrics

def analyze_embedding_quality(rl_manager):
    """Analyze quality of learned embeddings."""
    import numpy as np
    from sentence_transformers import util

    states = list(rl_manager.state_embeddings.keys())
    embeddings = [rl_manager.state_embeddings[s] for s in states]

    if len(embeddings) < 2:
        return

    # Compute pairwise similarities
    similarities = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = util.cos_sim(embeddings[i], embeddings[j]).item()
            similarities.append(sim)

    similarities = np.array(similarities)

    print(f"Embedding Analysis:")
    print(f"  States: {len(states)}")
    print(f"  Avg Similarity: {similarities.mean():.3f}")
    print(f"  Std Similarity: {similarities.std():.3f}")
    print(f"  Min Similarity: {similarities.min():.3f}")
    print(f"  Max Similarity: {similarities.max():.3f}")
    print(f"  Matches (>0.7): {(similarities > 0.7).sum()}")

analyze_embedding_quality(rl_manager)

🎯 Best Practices

1. Enable Embeddings by Default

# ✅ GOOD: Use embeddings for better generalization
rl_manager = RLManager(
    tool_names=tools,
    use_embeddings=True
)

# ❌ BAD: Disabling loses generalization; do so only for specific reasons
rl_manager = RLManager(
    tool_names=tools,
    use_embeddings=False  # Treats each query as unique
)

2. Tune Similarity Threshold

# Start with default 0.7
rl_manager = RLManager(
    tool_names=tools,
    similarity_threshold=0.7
)

# If learning is too slow, decrease the threshold
similarity_threshold=0.6  # More generalization

# If you see false matches, increase the threshold
similarity_threshold=0.8  # More specificity

3. Monitor State Growth

# Track number of unique states
stats = rl_manager.get_statistics()
print(f"Total states: {stats['total_states']}")
print(f"Cached embeddings: {stats['cached_embeddings']}")

# Ideal: States grow sublinearly with queries
# If linear growth, threshold may be too high
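As a rough sanity check on sublinear growth, you can track queries-per-state over time. This is a hypothetical helper, assuming you log the number of processed queries yourself; it is not part of the Azcore API.

```python
def state_growth_ratio(num_queries, num_states):
    """Queries per unique state; values well above 1 mean similar queries
    are being merged into shared states (the desired sublinear growth)."""
    return num_queries / max(num_states, 1)

# e.g. 1000 queries collapsed into 220 unique states
ratio = state_growth_ratio(num_queries=1000, num_states=220)
print(f"{ratio:.1f} queries per state")  # ~4.5x compression
```

A ratio hovering near 1.0 suggests almost every query creates a new state, i.e. the similarity threshold may be too high.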

4. Use Appropriate Model

# Production: Fast, good enough
embedding_model_name="all-MiniLM-L6-v2"

# High stakes: Better quality
embedding_model_name="all-mpnet-base-v2"

# Multilingual: Support multiple languages
embedding_model_name="paraphrase-multilingual-MiniLM-L12-v2"

🚀 Complete Example

from azcore.rl.rl_manager import RLManager
from azcore.rl.rewards import HeuristicRewardCalculator
from sentence_transformers import util

# Setup with embeddings
rl_manager = RLManager(
    tool_names=["weather", "search", "calculate", "email"],
    use_embeddings=True,
    embedding_model_name="all-MiniLM-L6-v2",
    similarity_threshold=0.7,
    q_table_path="rl_data/semantic_agent.pkl"
)

reward_calc = HeuristicRewardCalculator()

# Training queries
training_data = [
    ("What's the weather in Paris?", "weather", 1.0),
    ("Temperature in NYC?", "weather", 1.0),
    ("Calculate 15 * 23", "calculate", 1.0),
    ("What's 50 plus 25?", "calculate", 1.0),
    ("Search for Python tutorials", "search", 1.0),
    ("Find information on AI", "search", 1.0),
]

# Train
for query, correct_tool, reward in training_data:
    selected, state_key = rl_manager.select_tools(query, top_n=2)

    # Reward correct tool selection
    for tool in selected:
        tool_reward = reward if tool == correct_tool else -0.5
        rl_manager.update(state_key, tool, tool_reward)

    print(f"Query: {query}")
    print(f"  State Key: {state_key[:50]}...")
    print(f"  Selected: {selected}\n")

# Test generalization
test_queries = [
    "How's the weather in London?",  # Similar to training weather queries
    "Calculate 100 divided by 5",     # Similar to training math queries
    "Search for machine learning",    # Similar to training search queries
]

print("\n=== Testing Generalization ===")
rl_manager.exploration_rate = 0.0  # Pure exploitation

for query in test_queries:
    selected, state_key = rl_manager.select_tools(query, top_n=1)
    print(f"Query: {query}")
    print(f"  Matched State: {state_key[:50]}...")
    print(f"  Selected Tool: {selected[0]}\n")

# Analyze embeddings
print("\n=== Embedding Analysis ===")
states = list(rl_manager.state_embeddings.keys())
print(f"Total unique states: {len(states)}")
print(f"Queries processed: {len(training_data) + len(test_queries)}")
print(f"State efficiency: {len(training_data) + len(test_queries)}/{len(states)} = "
      f"{(len(training_data) + len(test_queries))/len(states):.1f}x")

🎓 Summary

State representation with semantic embeddings provides:

  • Data Efficiency: Learn from fewer examples
  • Generalization: Knowledge transfers across similar queries
  • Robustness: Handles paraphrasing and variations
  • Fast Learning: Better cold-start performance
  • Scalability: Sublinear state growth

Enabling semantic embeddings is one of the most impactful features for efficient RL in Azcore.
