Beyond basic agent execution, Arc provides advanced patterns that extend agent capabilities through specialized techniques such as self-consistency, reflexion, structured critique, and automated evaluation.
Overview
Advanced agent patterns implement proven AI research techniques to improve agent performance, reliability, and reasoning quality. These patterns can be used individually or combined for powerful multi-stage reasoning systems.
Available Patterns
Self-Consistency
Generate multiple solutions and select the best through voting or consensus.
Key benefits:
- Improved accuracy through multiple attempts
- Reduced hallucinations
- Natural error detection
- Works well for problems with verifiable answers
When to use: Mathematical reasoning, fact verification, multiple-choice questions
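The core of self-consistency is the voting step. A minimal sketch of that step — the `majority_vote` helper is illustrative, not part of Arc's API:

```python
from collections import Counter

def majority_vote(candidates: list[str]) -> str:
    """Return the most common answer among independent generations."""
    return Counter(candidates).most_common(1)[0][0]

# Three independent generations of the same math question
answers = ["408", "408", "418"]
print(majority_vote(answers))  # → 408
```

In practice each candidate comes from a separate agent call with non-zero temperature, so the generations are genuinely independent before the vote.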
Reflexion
Iterative self-improvement through reflection on past attempts and failures.
Key benefits:
- Learn from mistakes
- Progressively improve outputs
- Self-correction without human intervention
- Memory of past attempts
When to use: Complex problem-solving, iterative refinement, tasks requiring multiple attempts
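A stripped-down reflexion loop might look like the following; `attempt_fn` and `critique_fn` are stand-ins for real agent calls:

```python
def reflexion_loop(task, attempt_fn, critique_fn, max_iterations=3):
    """Attempt the task, reflect on failures, and retry with that memory."""
    memory = []  # reflections on past attempts
    output = None
    for _ in range(max_iterations):
        output = attempt_fn(task, memory)
        ok, reflection = critique_fn(output)
        if ok:
            break
        memory.append(reflection)
    return output, memory

# Toy stubs: the second attempt succeeds once a reflection is available
attempt = lambda task, memory: "final" if memory else "draft"
critic = lambda output: (output == "final", "needs more detail")
output, memory = reflexion_loop("summarize the report", attempt, critic)
print(output)  # → final
```

The key design choice is that reflections accumulate: each new attempt sees every past critique, which is what lets the agent avoid repeating a mistake.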
Reasoning Duo
Two agents collaborate: one proposes solutions, another critiques and provides feedback.
Key benefits:
- Structured critique and improvement
- Separation of generation and evaluation
- Systematic quality improvement
- Debate-like reasoning
When to use: High-stakes decisions, quality-critical outputs, creative problem-solving
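Conceptually the pattern alternates between a proposer and a critic. This sketch uses plain functions in place of agents (names are illustrative only):

```python
def reasoning_duo(task, propose, critique, max_rounds=3):
    """Alternate proposal and critique, feeding feedback into the next round."""
    feedback = None
    solution = None
    for _ in range(max_rounds):
        solution = propose(task, feedback)
        feedback = critique(solution)
        if feedback is None:  # critic has no objections
            break
    return solution

# Toy stubs: the critic accepts only the revised draft
propose = lambda task, fb: "revised draft" if fb else "first draft"
critique = lambda sol: None if sol == "revised draft" else "tighten the argument"
print(reasoning_duo("write an abstract", propose, critique))  # → revised draft
```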
Agent Judge
Specialized agent evaluates and scores outputs from other agents.
Key benefits:
- Consistent evaluation criteria
- Objective quality assessment
- Automated quality gates
- Selection between alternatives
When to use: Output validation, quality control, A/B testing, selection tasks
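At its simplest, a judge scores each candidate against fixed criteria and keeps the winner. A sketch with a stand-in scoring function:

```python
def judge(candidates, score_fn):
    """Score every candidate and return the highest-scoring one."""
    return max(candidates, key=score_fn)

# Stand-in rubric: longer, more detailed answers score higher
best = judge(["42", "The answer is 42 because 6 * 7 = 42."], len)
print(best)  # → The answer is 42 because 6 * 7 = 42.
```

A real judge would replace `len` with an agent call that returns a rubric-based score; the selection logic stays the same.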
Pattern Comparison
| Pattern | Complexity | Cost | Quality Improvement | Use Case |
|---|---|---|---|---|
| Self-Consistency | Low | Medium | High | Factual accuracy |
| Reflexion | Medium | High | Very High | Iterative tasks |
| Reasoning Duo | Medium | Medium | High | Creative work |
| Agent Judge | Low | Low | Medium | Evaluation |
Quick Selection Guide
```
┌─────────────────────────────────────────────┐
│              What do you need?              │
└──────────────┬──────────────────────────────┘
               │
      ┌────────┴─────────┐
      │                  │
Multiple attempts   Single attempt
      │                  │
      │           ┌──────┴──────┐
      │           │             │
      │     Need feedback   Need evaluation
      │           │             │
      │     Reasoning Duo   Agent Judge
      │
 ┌────┴─────┐
 │          │
Known      Unknown
answer     answer
 │          │
Self-      Reflexion
Consistency
```
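The decision tree above can be captured as a small helper function (illustrative pseudologic, not part of Arc):

```python
def select_pattern(multiple_attempts: bool,
                   known_answer: bool = False,
                   need_feedback: bool = False) -> str:
    """Map the quick-selection decision tree onto a pattern name."""
    if multiple_attempts:
        return "Self-Consistency" if known_answer else "Reflexion"
    return "Reasoning Duo" if need_feedback else "Agent Judge"

print(select_pattern(True, known_answer=True))    # → Self-Consistency
print(select_pattern(False, need_feedback=True))  # → Reasoning Duo
```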
Performance Characteristics
Latency
```
Self-Consistency: ████████░░ (8/10) - Multiple parallel calls
Reflexion:        ██████████ (10/10) - Sequential iterations
Reasoning Duo:    ████░░░░░░ (4/10) - Two sequential calls
Agent Judge:      ██░░░░░░░░ (2/10) - Single evaluation
```
Cost
```
Self-Consistency: ███████░░░ (7/10) - N parallel generations
Reflexion:        ██████████ (10/10) - Multiple iterations
Reasoning Duo:    ████░░░░░░ (4/10) - Two agent calls
Agent Judge:      ██░░░░░░░░ (2/10) - One evaluation
```
Quality Improvement
```
Self-Consistency: ████████░░ (8/10) - Strong for verifiable tasks
Reflexion:        ██████████ (10/10) - Learns from mistakes
Reasoning Duo:    ████████░░ (8/10) - Structured improvement
Agent Judge:      █████░░░░░ (5/10) - Consistent evaluation
```
Combining Patterns
Patterns can be combined for enhanced capabilities:
Self-Consistency + Agent Judge
```python
# Generate multiple candidates with self-consistency,
# then use the judge to select the best one
candidates = self_consistency_agent.run(task)
best = judge_agent.evaluate(candidates)
```
Reflexion + Reasoning Duo
```python
# Use the reasoning duo within a reflexion loop:
# each iteration pairs a proposal with a critique
for iteration in range(max_iterations):
    proposal = proposer.run(task)
    critique = critic.run(proposal)
    if critique.is_acceptable():
        break
```
All Patterns Together
```python
# Ultimate quality pipeline
candidates = self_consistency_agent.run(task)
refined = []
for candidate in candidates:
    improved = reflexion_agent.run(candidate)
    polished = reasoning_duo_agent.run(improved)
    refined.append(polished)
best = judge_agent.select_best(refined)
```
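With pure-function stand-ins for the agents, the generate–refine–select shape of such a pipeline reduces to a short, runnable composition (illustrative only):

```python
def quality_pipeline(task, generate, refine, score, n=3):
    """Generate n candidates, refine each, and return the best by score."""
    candidates = [generate(task) for _ in range(n)]
    refined = [refine(c) for c in candidates]
    return max(refined, key=score)

# Toy stand-ins for the agents
drafts = iter(["ok", "better", "best answer so far"])
result = quality_pipeline(
    "explain X",
    generate=lambda task: next(drafts),
    refine=str.strip,
    score=len,
)
print(result)  # → best answer so far
```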
Common Use Cases
Mathematical Reasoning
Best pattern: Self-Consistency
- Generate multiple solutions
- Vote on final answer
- High accuracy on math problems
Creative Writing
Best pattern: Reasoning Duo or Reflexion
- Iterative improvement
- Structured feedback
- Progressive refinement
Code Generation
Best pattern: Reflexion
- Learn from compilation errors
- Fix bugs iteratively
- Improve until tests pass
Decision Making
Best pattern: Self-Consistency + Agent Judge
- Multiple perspectives
- Objective evaluation
- Consensus building
Research and Analysis
Best pattern: Reasoning Duo
- Proposal and critique
- Thorough analysis
- Balanced perspectives
Implementation Patterns
Sequential Pattern
```python
# One pattern after another
draft = agent.run(task)
reviewed = reasoning_duo.run(draft)
final = reflexion.improve(reviewed)
```
Parallel Pattern
```python
# Multiple approaches on the same task
results = []
results.append(self_consistency.run(task))
results.append(reasoning_duo.run(task))
results.append(reflexion.run(task))
best = judge.select_best(results)
```
Nested Pattern
```python
# Patterns within patterns
def enhanced_self_consistency(task):
    candidates = []
    for _ in range(n):
        # Each candidate is produced by a reasoning duo
        candidate = reasoning_duo.run(task)
        candidates.append(candidate)
    return judge.select_best(candidates)
```
Best Practices
1. Match Pattern to Task
- Use self-consistency for factual tasks
- Use reflexion for iterative tasks
- Use reasoning duo for creative tasks
- Use judge for evaluation tasks
2. Consider Cost vs. Quality
- Simple tasks: Single agent
- Medium tasks: Reasoning duo or judge
- Complex tasks: Self-consistency or reflexion
- Critical tasks: Combine patterns
3. Set Appropriate Limits
```python
# Don't overdo iterations
self_consistency_agent = SelfConsistencyAgent(
    num_generations=5,  # Not 50
)
reflexion_agent = ReflexionAgent(
    max_iterations=3,  # Not 20
)
```
4. Monitor Performance
```python
# Track metrics
result = pattern.run(task)
metrics = {
    "latency": result.duration,
    "cost": result.token_usage,
    "quality": evaluate_quality(result.output),
    "iterations": result.iterations_used,
}
```
5. Use Early Stopping
```python
# Stop when good enough
reflexion_agent = ReflexionAgent(
    max_iterations=10,
    early_stopping=True,
    quality_threshold=0.9,
)
```
Anti-Patterns
❌ Using Advanced Patterns for Simple Tasks
```python
# Bad: Overkill
result = reflexion_agent.run("What is 2+2?")

# Good: Simple task, simple solution
result = agent.run("What is 2+2?")
```
❌ Combining Too Many Patterns
```python
# Bad: Unnecessary complexity and cost
result = self_consistency(
    reflexion(
        reasoning_duo(
            agent.run(task)
        )
    )
)

# Good: One or two patterns max
result = reasoning_duo(agent.run(task))
```
❌ No Stopping Criteria
```python
# Bad: Runs forever
reflexion_agent = ReflexionAgent(max_iterations=999)

# Good: Reasonable limit
reflexion_agent = ReflexionAgent(max_iterations=5)
```
Performance Optimization
Caching
```python
from functools import lru_cache

# Cache repeated generations; cached arguments must be hashable
@lru_cache(maxsize=100)
def cached_generation(task: str):
    return agent.run(task)
```
Parallel Execution
```python
from concurrent.futures import ThreadPoolExecutor

# Self-consistency with parallel execution
with ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(agent.run, task)
        for _ in range(num_generations)
    ]
    results = [f.result() for f in futures]
```
Streaming
```python
# Stream reflexion iterations as they complete
reflexion_agent = ReflexionAgent(streaming=True)
for iteration_result in reflexion_agent.stream(task):
    print(f"Iteration {iteration_result.iteration}: {iteration_result.output}")
```
Debugging
Enable Verbose Logging
```python
pattern = SelfConsistencyAgent(verbose=True)
result = pattern.run(task)
# Logs all generations and the voting process
```
Access Intermediate Results
```python
result = reflexion_agent.run(task)
print(f"Iterations: {result.num_iterations}")
print(f"History: {result.iteration_history}")
print(f"Improvements: {result.improvement_scores}")
```
Visualize Pattern Execution
```python
def visualize_reflexion(result):
    for i, iteration in enumerate(result.iteration_history):
        print(f"\n=== Iteration {i + 1} ===")
        print(f"Output: {iteration.output[:100]}...")
        print(f"Reflection: {iteration.reflection[:100]}...")
        print(f"Score: {iteration.score}")
```
Pattern Selection Checklist
Before using an advanced pattern, ask:
- Is the task complex enough to warrant advanced patterns?
- What's the acceptable latency?
- What's the budget constraint?
- Is quality improvement measurable?
- Can the pattern's output be evaluated?
- Are multiple attempts beneficial?
- Is iterative improvement possible?
Getting Started
- Start simple: Try patterns individually
- Measure baseline: Compare against single agent
- Optimize: Tune parameters (iterations, generations)
- Combine carefully: Only if justified by results
- Monitor: Track cost and quality metrics
Research Background
These patterns are based on published research: