State representation is how Azcore RL encodes queries and contexts into a format suitable for Q-learning. Using semantic embeddings enables generalization across similar queries, dramatically improving learning efficiency.
🎯 What is State Representation?
In RL, a state represents the current situation that determines action selection. In Azcore:
- State = User query or task description
- Action = Tool selection
- Goal = Learn which tools work for which types of queries
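The state/action/goal mapping above can be sketched as a tiny tabular Q-learner. This is an illustrative toy, not Azcore's actual internals; the names `q_table` and `update` are assumptions for the sketch:

```python
# Minimal tabular sketch of the state/action/goal mapping:
# state = query string, action = tool name, Q-value = learned usefulness.
from collections import defaultdict

alpha = 0.1  # learning rate

# state -> tool -> Q-value, defaulting to 0.0 for unseen pairs
q_table = defaultdict(lambda: defaultdict(float))

def update(state: str, tool: str, reward: float) -> None:
    """Move the Q-value for (state, tool) toward the observed reward."""
    q = q_table[state][tool]
    q_table[state][tool] = q + alpha * (reward - q)

update("What's the weather in Paris?", "weather", 1.0)
update("What's the weather in Paris?", "search", -0.5)

# Greedy action selection: pick the tool with the highest Q-value
best = max(q_table["What's the weather in Paris?"],
           key=q_table["What's the weather in Paris?"].get)
print(best)  # weather
```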
Without Semantic Representation
# Treats each query as completely unique
Query 1: "What's the weather in Paris?"
Query 2: "Weather in London?"
Query 3: "Temperature in Tokyo?"
# Problem: Must learn each separately, no knowledge transfer!
# Requires 3x the training data
With Semantic Representation
# Understands semantic similarity
Query 1: "What's the weather in Paris?" → Embedding: [0.12, 0.45, -0.23, ...]
Query 2: "Weather in London?" → Embedding: [0.14, 0.43, -0.21, ...]
Query 3: "Temperature in Tokyo?" → Embedding: [0.13, 0.44, -0.22, ...]
# Cosine similarities: 0.92, 0.89 → Knowledge transfers!
# Learn once, apply to similar queries
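The transfer claim above can be checked directly. The 3-d vectors below are made up for illustration (real sentence embeddings have hundreds of dimensions), but the comparison works the same way:

```python
# Toy demonstration of why similar queries transfer knowledge:
# related queries have high cosine similarity, unrelated ones do not.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

paris  = np.array([0.12, 0.45, -0.23])   # "What's the weather in Paris?"
london = np.array([0.14, 0.43, -0.21])   # "Weather in London?"
recipe = np.array([-0.30, 0.05, 0.80])   # an unrelated query

print(cosine_similarity(paris, london))  # close to 1.0 -> knowledge transfers
print(cosine_similarity(paris, recipe))  # much lower -> no transfer
```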
🏗️ Semantic Embeddings
Enable Embeddings
from azcore.rl.rl_manager import RLManager
rl_manager = RLManager(
tool_names=["weather", "search", "calculate"],
use_embeddings=True, # Enable semantic matching
embedding_model_name="all-MiniLM-L6-v2", # Model choice
similarity_threshold=0.7 # Match threshold
)
How It Works
User Query: "What's the temperature in NYC?"
↓
Embedding Model (Sentence Transformer)
↓
Vector: [0.15, 0.42, -0.19, 0.31, ..., 0.08] # 384 dimensions
↓
Similarity Search (Cosine Similarity)
↓
Compare with existing state embeddings
↓
If similarity >= 0.7: Reuse the existing state
If similarity < 0.7: Create a new state
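The match-or-create step in the flow above can be sketched as follows. This assumes embeddings are already computed (the embed step itself would call the sentence transformer); `resolve_state` and `state_embeddings` are illustrative names, not Azcore's API:

```python
# Sketch of the match-or-create logic: reuse the first existing state
# whose embedding is similar enough, otherwise register a new state.
import numpy as np

SIMILARITY_THRESHOLD = 0.7

state_embeddings: dict = {}  # state key -> embedding vector

def resolve_state(query: str, embedding: np.ndarray) -> str:
    for state, stored in state_embeddings.items():
        sim = np.dot(embedding, stored) / (
            np.linalg.norm(embedding) * np.linalg.norm(stored))
        if sim >= SIMILARITY_THRESHOLD:
            return state              # reuse existing state (and its Q-values)
    state_embeddings[query] = embedding
    return query                      # no match: create a new state

key1 = resolve_state("Weather in Paris?", np.array([0.12, 0.45, -0.23]))
key2 = resolve_state("Weather in London?", np.array([0.14, 0.43, -0.21]))
print(key1 == key2)  # True: the second query reuses the first state
```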
Example Flow
# First query - creates new state
query1 = "What's the weather in Paris?"
selected1, state_key1 = rl_manager.select_tools(query1)
# state_key1 = "What's the weather in Paris?"
# Creates embedding and stores in Q-table
# Second query - finds similar state
query2 = "Weather forecast for London?"
selected2, state_key2 = rl_manager.select_tools(query2)
# Computes embedding for query2
# Finds query1 with similarity 0.85 > 0.7
# state_key2 = "What's the weather in Paris?" (reuses!)
# Uses Q-values learned from query1
📊 Embedding Models
Available Models
Azcore uses the sentence-transformers library. Popular models:
| Model | Dimensions | Size | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 80MB | ⚡ Fast | ⭐⭐⭐ Good | Default, production |
| all-mpnet-base-v2 | 768 | 420MB | 🐌 Slow | ⭐⭐⭐⭐ Better | High accuracy |
| all-MiniLM-L12-v2 | 384 | 120MB | ⚡ Fast | ⭐⭐⭐⭐ Better | Balanced |
| paraphrase-multilingual-MiniLM-L12-v2 | 384 | 420MB | 🐢 Medium | ⭐⭐⭐ Good | Multilingual |
Choosing a Model
# Fast, production (default)
rl_manager = RLManager(
tool_names=tools,
embedding_model_name="all-MiniLM-L6-v2"
)
# High accuracy
rl_manager = RLManager(
tool_names=tools,
embedding_model_name="all-mpnet-base-v2"
)
# Multilingual support
rl_manager = RLManager(
tool_names=tools,
embedding_model_name="paraphrase-multilingual-MiniLM-L12-v2"
)
🔍 Similarity Matching
Cosine Similarity
Azcore uses cosine similarity to compare embeddings:
similarity = cos(θ) = (A · B) / (||A|| × ||B||)
Range: -1 to +1
- 1.0 = identical
- 0.0 = orthogonal (unrelated)
- -1.0 = opposite
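The three reference points of the range can be verified with simple 2-d vectors:

```python
# Quick check of the cosine similarity range: identical direction,
# orthogonal, and opposite vectors.
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
print(cos_sim(a, np.array([2.0, 0.0])))   # 1.0  (same direction; scale is ignored)
print(cos_sim(a, np.array([0.0, 3.0])))   # 0.0  (orthogonal, unrelated)
print(cos_sim(a, np.array([-1.0, 0.0])))  # -1.0 (opposite)
```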
Similarity Threshold
# Conservative matching (near-identical queries required)
rl_manager = RLManager(
tool_names=tools,
similarity_threshold=0.9 # 90% similar
)
# Balanced matching (default)
rl_manager = RLManager(
tool_names=tools,
similarity_threshold=0.7 # 70% similar
)
# Aggressive matching (broad generalization)
rl_manager = RLManager(
tool_names=tools,
similarity_threshold=0.5 # 50% similar
)
Threshold Guidelines
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.9 - 1.0 | Very strict, near-identical | Sensitive domains |
| 0.7 - 0.9 | Balanced, similar queries | General purpose (recommended) |
| 0.5 - 0.7 | Loose, broad generalization | Rapid learning, diverse queries |
| < 0.5 | Too loose, false matches | Not recommended |
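The trade-off in the table can be seen in a toy simulation: the same queries produce more distinct states under a strict threshold and fewer under a loose one. The 2-d "embeddings" below are made up; real ones come from the sentence transformer:

```python
# Toy sketch: how the threshold trades state count against generalization,
# using the greedy match-or-create rule.
import numpy as np

def count_states(embeddings, threshold):
    states = []
    for emb in embeddings:
        matched = any(
            np.dot(emb, s) / (np.linalg.norm(emb) * np.linalg.norm(s))
            >= threshold
            for s in states
        )
        if not matched:
            states.append(emb)  # no similar state: create a new one
    return len(states)

# Two loose clusters of queries: "weather-like" and "search-like"
queries = [np.array(v) for v in
           [[1.0, 0.0], [0.9, 0.3], [0.0, 1.0], [0.3, 0.9]]]

print(count_states(queries, 0.99))  # 4 -> strict: every query is its own state
print(count_states(queries, 0.7))   # 2 -> loose: each cluster shares one state
```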
💡 Benefits of Semantic Representation
1. Data Efficiency
# Without embeddings: each query must be learned separately
queries_needed_without = 1000  # 1000 unique queries
# With embeddings: similar queries share learned Q-values
queries_needed_with = 200  # 200 queries cover 1000 variations
efficiency_gain = queries_needed_without / queries_needed_with
print(f"{efficiency_gain:.0f}x more data efficient!")
2. Generalization
# Train on: "What's the weather in Paris?"
# Generalizes to:
- "Weather in London?" ✓
- "Temperature in Tokyo?" ✓
- "How's the weather in NYC?" ✓
- "Is it raining in Seattle?" ✓
# Even handles paraphrasing!
3. Robustness
# Handles variations automatically
"What is 2 + 2?"
"Calculate 2 plus 2"
"What's two plus two?"
"2+2 equals what?"
# All map to similar embeddings
# Uses same learned Q-values
4. Cold Start Performance
# First query ever
query = "Weather in Paris?"
# No prior knowledge, explores randomly
# Second query (similar)
query = "Temperature in London?"
# Uses knowledge from first query!
# Faster learning, better performance
🔧 Advanced Configuration
Embedding Caching
Embeddings are automatically cached to avoid recomputation:
import numpy as np
from azcore.utils.caching import get_embedding_cache
# Global embedding cache
cache = get_embedding_cache()
# First call: Computes embedding
embedding1 = rl_manager._get_embedding("What's the weather?")
# Cache MISS - computes embedding
# Second call: Uses cache
embedding2 = rl_manager._get_embedding("What's the weather?")
# Cache HIT - instant retrieval
# Same embedding, no recomputation
assert np.array_equal(embedding1, embedding2)
State Inspection
import numpy as np
# View all learned states
all_states = list(rl_manager.q_table.keys())
print(f"Total states: {len(all_states)}")
# View state embeddings
for state in all_states[:5]:
embedding = rl_manager.state_embeddings.get(state)
if embedding is not None:
print(f"State: {state[:50]}...")
print(f"Embedding shape: {embedding.shape}")
print(f"Embedding norm: {np.linalg.norm(embedding):.3f}\n")
Find Similar States
# Find states similar to a query
def find_similar_states(rl_manager, query, top_k=5):
"""Find top-k most similar states to query."""
from sentence_transformers import util
query_embedding = rl_manager._get_embedding(query)
if query_embedding is None:
return []
similarities = []
for state, embedding in rl_manager.state_embeddings.items():
sim = util.cos_sim(query_embedding, embedding).item()
similarities.append((state, sim))
# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)
return similarities[:top_k]
# Usage
similar = find_similar_states(rl_manager, "Weather in Tokyo?", top_k=3)
for state, sim in similar:
print(f"Similarity: {sim:.3f} - {state}")
📈 Embedding Analysis
Visualize Embeddings
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
# Collect embeddings
embeddings = []
labels = []
for state, emb in rl_manager.state_embeddings.items():
embeddings.append(emb)
labels.append(state[:30]) # Truncate for display
# Reduce to 2D using PCA
pca = PCA(n_components=2)
embeddings_2d = pca.fit_transform(np.array(embeddings))
# Plot
plt.figure(figsize=(12, 8))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1])
for i, label in enumerate(labels):
plt.annotate(label, (embeddings_2d[i, 0], embeddings_2d[i, 1]))
plt.title("State Embeddings (2D Projection)")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.tight_layout()
plt.savefig("embeddings_visualization.png")
Embedding Quality Metrics
def analyze_embedding_quality(rl_manager):
"""Analyze quality of learned embeddings."""
from sentence_transformers import util
states = list(rl_manager.state_embeddings.keys())
embeddings = [rl_manager.state_embeddings[s] for s in states]
if len(embeddings) < 2:
return
# Compute pairwise similarities
similarities = []
for i in range(len(embeddings)):
for j in range(i + 1, len(embeddings)):
sim = util.cos_sim(embeddings[i], embeddings[j]).item()
similarities.append(sim)
similarities = np.array(similarities)
print(f"Embedding Analysis:")
print(f" States: {len(states)}")
print(f" Avg Similarity: {similarities.mean():.3f}")
print(f" Std Similarity: {similarities.std():.3f}")
print(f" Min Similarity: {similarities.min():.3f}")
print(f" Max Similarity: {similarities.max():.3f}")
print(f" Matches (>0.7): {(similarities > 0.7).sum()}")
analyze_embedding_quality(rl_manager)
🎯 Best Practices
1. Enable Embeddings by Default
# ✅ GOOD: Use embeddings for better generalization
rl_manager = RLManager(
tool_names=tools,
use_embeddings=True
)
# ❌ BAD: Disabling embeddings (do this only for a specific reason)
rl_manager = RLManager(
tool_names=tools,
use_embeddings=False # Treats each query as unique
)
2. Tune Similarity Threshold
# Start with default 0.7
rl_manager = RLManager(
tool_names=tools,
similarity_threshold=0.7
)
# If learning too slow, decrease threshold
similarity_threshold=0.6 # More generalization
# If false matches, increase threshold
similarity_threshold=0.8 # More specificity
3. Monitor State Growth
# Track number of unique states
stats = rl_manager.get_statistics()
print(f"Total states: {stats['total_states']}")
print(f"Cached embeddings: {stats['cached_embeddings']}")
# Ideal: States grow sublinearly with queries
# If linear growth, threshold may be too high
4. Use Appropriate Model
# Production: Fast, good enough
embedding_model_name="all-MiniLM-L6-v2"
# High stakes: Better quality
embedding_model_name="all-mpnet-base-v2"
# Multilingual: Support multiple languages
embedding_model_name="paraphrase-multilingual-MiniLM-L12-v2"
🚀 Complete Example
from azcore.rl.rl_manager import RLManager
from azcore.rl.rewards import HeuristicRewardCalculator
from sentence_transformers import util
# Setup with embeddings
rl_manager = RLManager(
tool_names=["weather", "search", "calculate", "email"],
use_embeddings=True,
embedding_model_name="all-MiniLM-L6-v2",
similarity_threshold=0.7,
q_table_path="rl_data/semantic_agent.pkl"
)
reward_calc = HeuristicRewardCalculator()
# Training queries
training_data = [
("What's the weather in Paris?", "weather", 1.0),
("Temperature in NYC?", "weather", 1.0),
("Calculate 15 * 23", "calculate", 1.0),
("What's 50 plus 25?", "calculate", 1.0),
("Search for Python tutorials", "search", 1.0),
("Find information on AI", "search", 1.0),
]
# Train
for query, correct_tool, reward in training_data:
selected, state_key = rl_manager.select_tools(query, top_n=2)
# Reward correct tool selection
for tool in selected:
tool_reward = reward if tool == correct_tool else -0.5
rl_manager.update(state_key, tool, tool_reward)
print(f"Query: {query}")
print(f" State Key: {state_key[:50]}...")
print(f" Selected: {selected}\n")
# Test generalization
test_queries = [
"How's the weather in London?", # Similar to training weather queries
"Calculate 100 divided by 5", # Similar to training math queries
"Search for machine learning", # Similar to training search queries
]
print("\n=== Testing Generalization ===")
rl_manager.exploration_rate = 0.0 # Pure exploitation
for query in test_queries:
selected, state_key = rl_manager.select_tools(query, top_n=1)
print(f"Query: {query}")
print(f" Matched State: {state_key[:50]}...")
print(f" Selected Tool: {selected[0]}\n")
# Analyze embeddings
print("\n=== Embedding Analysis ===")
states = list(rl_manager.state_embeddings.keys())
print(f"Total unique states: {len(states)}")
print(f"Queries processed: {len(training_data) + len(test_queries)}")
print(f"State efficiency: {len(training_data) + len(test_queries)}/{len(states)} = "
f"{(len(training_data) + len(test_queries))/len(states):.1f}x")
🎓 Summary
State representation with semantic embeddings provides:
- Data Efficiency: Learn from fewer examples
- Generalization: Knowledge transfers across similar queries
- Robustness: Handles paraphrasing and variations
- Fast Learning: Better cold-start performance
- Scalability: Sublinear state growth
Enabling semantic embeddings is one of the most impactful features for efficient RL in Azcore.