Iterative self-improvement through reflection on past attempts and failures. The agent learns from its mistakes and progressively improves its outputs.
Overview
Reflexion enables agents to iteratively improve their outputs by reflecting on previous attempts, identifying weaknesses, and incorporating learnings into subsequent iterations. This creates a self-improvement loop that can solve complex problems through progressive refinement.
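At a high level, each iteration generates an attempt, scores it, and produces a verbal reflection that feeds into the next attempt. A minimal sketch of that loop (illustrative only; generate, evaluate, and reflect are hypothetical helpers, not azcore APIs):

def reflexion_loop(task, max_iterations=5, target_score=1.0):
    reflections = []
    best_output, best_score = None, 0.0
    for _ in range(max_iterations):
        output = generate(task, reflections)        # attempt, informed by past critiques
        score = evaluate(output)                    # quality score in [0, 1]
        if score > best_score:
            best_output, best_score = output, score
        if score >= target_score:                   # early exit once good enough
            break
        reflections.append(reflect(task, output, score))  # verbal self-critique
    return best_output, best_score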
When to Use
- Complex problem-solving: Multi-step problems requiring iteration
- Code generation: Fixing bugs and improving code quality
- Iterative refinement: Tasks that benefit from multiple attempts
- Learning from failures: When initial attempts often need improvement
- Quality-critical outputs: When excellence requires iteration
- Tasks with feedback: When progress can be measured against clear criteria
Basic Usage
from azcore import ReflexionAgent, Agent

# Create base agent
base_agent = Agent(
    agent_name="Problem Solver",
    system_prompt="Solve problems step by step with careful reasoning.",
    llm=llm,
)

# Create reflexion agent
reflexion = ReflexionAgent(
    agent=base_agent,
    max_iterations=5,
    self_reflect=True,
)

# Run with iterative improvement
result = reflexion.run("Solve this complex problem...")

print(f"Final solution: {result.final_output}")
print(f"Iterations used: {result.iterations_used}")
print(f"Improvement: {result.improvement_score}")
Configuration Options
max_iterations
Maximum number of improvement attempts:
reflexion = ReflexionAgent(
    agent=base_agent,
    max_iterations=10,  # Up to 10 refinement rounds
)
reflection_prompt
Custom reflection instructions:
reflexion = ReflexionAgent(
    agent=base_agent,
    max_iterations=5,
    reflection_prompt="""Analyze your previous attempt:
    1. What went well?
    2. What could be improved?
    3. What mistakes did you make?
    4. How will you improve next time?""",
)
evaluator
Custom evaluation function:
def custom_evaluator(output, iteration):
    # Return a score in [0, 1]; evaluate_quality is a placeholder
    # for your own scoring logic
    score = evaluate_quality(output)
    return score

reflexion = ReflexionAgent(
    agent=base_agent,
    evaluator=custom_evaluator,
    target_score=0.9,  # Stop when score >= 0.9
)
early_stopping
Stop once the quality threshold is reached:
reflexion = ReflexionAgent(
    agent=base_agent,
    max_iterations=10,
    early_stopping=True,
    target_score=0.85,
)
memory_window
Number of past iterations to remember:
reflexion = ReflexionAgent(
    agent=base_agent,
    max_iterations=10,
    memory_window=3,  # Remember last 3 iterations
)
Advanced Examples
Code Generation with Bug Fixing
from azcore import Agent, ReflexionAgent

# Code generation agent
code_agent = Agent(
    agent_name="Code Generator",
    system_prompt="""Generate Python code to solve the problem.

    Your code will be tested. If it fails, you'll see the error
    and can fix it in the next iteration.

    Always include:
    - Error handling
    - Input validation
    - Clear variable names
    - Docstrings""",
    llm=llm,
    tools=[code_execution_tool],
)
# Reflection for code; run_tests is a placeholder test harness
# (see the sketch under Test-Based evaluators below)
def evaluate_code(code, iteration):
    test_results = run_tests(code)
    if test_results["all_passed"]:
        return 1.0
    return test_results["passed"] / test_results["total"]
code_reflexion = ReflexionAgent(
    agent=code_agent,
    max_iterations=5,
    evaluator=evaluate_code,
    reflection_prompt="""Review the test failures:

    Test results: {test_results}
    Error messages: {errors}

    Reflect on:
    1. Why did these tests fail?
    2. What assumptions were wrong?
    3. How should you fix the code?
    4. What edge cases did you miss?""",
)
# Generate and refine code
problem = """
Write a function to find all prime numbers up to n using the Sieve of Eratosthenes.

Requirements:
- Handle n <= 1 (return empty list)
- Optimize for large n
- Include comprehensive tests
"""

result = code_reflexion.run(problem)

print(f"Final code:\n{result.final_output}")
print(f"Tests passed: {result.final_score:.0%}")
print(f"Iterations needed: {result.iterations_used}")
Research and Writing
# Research writer agent
research_writer = Agent(
    agent_name="Research Writer",
    system_prompt="""Write comprehensive research content.

    Include:
    - Clear thesis
    - Well-structured arguments
    - Citations and sources
    - Evidence-based reasoning
    - Balanced perspective""",
    llm=llm,
    tools=[search_tool, citation_tool],
)
# Quality evaluator; the evaluate_* helpers are placeholders
# for your own scoring functions
def evaluate_writing(text, iteration):
    scores = {
        "clarity": evaluate_clarity(text),
        "depth": evaluate_depth(text),
        "sources": evaluate_citations(text),
        "structure": evaluate_structure(text),
    }
    return sum(scores.values()) / len(scores)
# Reflexion for research
research_reflexion = ReflexionAgent(
    agent=research_writer,
    max_iterations=4,
    evaluator=evaluate_writing,
    reflection_prompt="""Evaluate your draft:

    Clarity: {clarity_score}/10
    Depth: {depth_score}/10
    Sources: {source_score}/10
    Structure: {structure_score}/10

    Identify improvements needed:
    1. Which arguments need strengthening?
    2. What sources should you add?
    3. How can you improve clarity?
    4. What structural changes are needed?

    Rewrite to address these issues.""",
    target_score=0.85,
)
# Write and refine article
topic = "The impact of large language models on software development"
result = research_reflexion.run(f"Write a 1000-word research article on: {topic}")

print(f"Final article: {result.final_output}")
print(f"Quality score: {result.final_score:.1%}")

print("\nImprovement trajectory:")
for i, score in enumerate(result.score_history):
    print(f"  Iteration {i+1}: {score:.1%}")
Mathematical Problem Solving
# Math solver agent
math_solver = Agent(
    agent_name="Math Solver",
    system_prompt="""Solve mathematical problems step by step.

    If you make errors, you'll see where you went wrong
    and can correct your approach.

    Show all work:
    - State the problem
    - Identify the approach
    - Work through calculations
    - Verify the answer""",
    llm=llm,
)
# Math evaluator; extract_answer and evaluate_methodology are placeholders.
# The expected answer is captured via a closure so the inner function keeps
# the (output, iteration) evaluator signature.
def make_math_evaluator(correct_answer):
    def evaluate_math(solution, iteration):
        # Extract final answer and check correctness
        final_answer = extract_answer(solution)
        if abs(float(final_answer) - correct_answer) < 0.01:
            return 1.0
        # Partial credit for methodology
        return evaluate_methodology(solution) * 0.5
    return evaluate_math
# Reflexion for math
math_reflexion = ReflexionAgent(
    agent=math_solver,
    max_iterations=3,
    evaluator=make_math_evaluator(48.0),  # expected: 48 mph (see check below)
    reflection_prompt="""Your answer was incorrect.

    Your answer: {your_answer}
    Correct answer: {correct_answer}

    Reflect on:
    1. Where did your calculation go wrong?
    2. What formula or principle did you misapply?
    3. What should you check more carefully?

    Try solving again with more care.""",
)
# Solve problem
problem = """
A car travels from City A to City B at 60 mph and returns at 40 mph.
What is the average speed for the entire trip?
"""

result = math_reflexion.run(problem)

print(f"Solution:\n{result.final_output}")
print(f"Correct: {result.final_score == 1.0}")
Strategy and Planning
# Strategy agent
strategist = Agent(
    agent_name="Strategist",
    system_prompt="""Develop strategic plans for complex problems.

    Consider:
    - Multiple stakeholders
    - Resource constraints
    - Risks and mitigations
    - Short and long-term goals
    - Success metrics""",
    llm=llm,
)
# Strategy evaluator; the assess_* helpers are placeholders
def evaluate_strategy(plan, iteration):
    scores = {
        "feasibility": assess_feasibility(plan),
        "completeness": assess_completeness(plan),
        "risk_management": assess_risk_handling(plan),
        "stakeholder_alignment": assess_stakeholder_fit(plan),
    }
    return sum(scores.values()) / len(scores)
# Reflexion for strategy
strategy_reflexion = ReflexionAgent(
    agent=strategist,
    max_iterations=4,
    evaluator=evaluate_strategy,
    reflection_prompt="""Review your strategic plan:

    Scores:
    - Feasibility: {feasibility}/10
    - Completeness: {completeness}/10
    - Risk Management: {risk_management}/10
    - Stakeholder Alignment: {stakeholder_alignment}/10

    Gaps identified: {gaps}

    Improve the plan by:
    1. Addressing the identified gaps
    2. Adding missing details
    3. Strengthening risk mitigations
    4. Improving stakeholder alignment""",
)
# Develop strategy
challenge = "Launch a new AI product in a competitive market with limited resources"
result = strategy_reflexion.run(f"Develop a strategic plan for: {challenge}")

print(f"Final strategy:\n{result.final_output}")
print(f"Quality score: {result.final_score:.1%}")
Reflection Strategies
Self-Critique
Agent critiques its own output:
reflection_prompt = """Critically analyze your previous attempt:
Strengths:
- What worked well?
- What insights were valuable?
Weaknesses:
- What was missing?
- What was incorrect?
- What could be clearer?
Next steps:
- How will you improve?
- What will you focus on?"""
Error-Focused
Focus on specific errors:
reflection_prompt = """You made these errors: {errors}
For each error:
1. Why did it occur?
2. What was the root cause?
3. How will you prevent it?
Rewrite to fix all errors."""
Comparative
Compare with reference or benchmark:
reflection_prompt = """Compare your output with the reference:
Your output: {your_output}
Reference: {reference}
Differences:
- What's missing?
- What's different?
- What could be better?
Improve to match or exceed the reference."""
Incremental
Build on previous iteration:
reflection_prompt = """Your previous attempt: {previous_output}
Good aspects to keep:
- {good_aspects}
Areas to improve:
- {improvement_areas}
Build on your previous work, keeping what was good
and improving what wasn't."""
Evaluation Strategies
Test-Based
Use automated tests:
def test_based_evaluator(code, iteration):
    results = run_tests(code)
    return results["passed"] / results["total"]
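run_tests above is left abstract. A minimal stand-in, assuming the generated code defines a function named sieve_of_eratosthenes as in the earlier example (both the harness and the test cases are hypothetical; never exec untrusted code outside a sandbox):

# Hypothetical test harness: executes candidate code and assert-based
# test strings in one shared namespace. Illustrative only; sandbox
# untrusted code in production.
TEST_CASES = [
    "assert sieve_of_eratosthenes(10) == [2, 3, 5, 7]",
    "assert sieve_of_eratosthenes(1) == []",
]

def run_tests(code):
    namespace, passed = {}, 0
    try:
        exec(code, namespace)          # define the candidate function(s)
        for case in TEST_CASES:
            try:
                exec(case, namespace)  # run one assert
                passed += 1
            except Exception:
                pass
    except Exception:
        pass                           # candidate code failed to load
    return {
        "passed": passed,
        "total": len(TEST_CASES),
        "all_passed": passed == len(TEST_CASES),
    }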
Rubric-Based
Score against criteria:
def rubric_evaluator(output):
    scores = {
        "accuracy": score_accuracy(output),
        "completeness": score_completeness(output),
        "clarity": score_clarity(output),
    }
    return sum(scores.values()) / len(scores)
Comparative
Compare with reference:
def comparative_evaluator(output, reference):
    similarity = compute_similarity(output, reference)
    return similarity
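compute_similarity is another placeholder. A rough lexical stand-in from the standard library (for semantic similarity, swap in an embedding-based measure):

import difflib

def compute_similarity(output, reference):
    # Character-level similarity in [0, 1]; lexical, not semantic
    return difflib.SequenceMatcher(None, output, reference).ratio()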
Judge Agent
Use another agent to evaluate:
judge = Agent(
    agent_name="Judge",
    system_prompt="Evaluate the quality of the output on a scale of 0-10.",
    llm=llm,
)

def judge_evaluator(output, iteration):
    # extract_score is a placeholder that parses the judge's numeric rating
    response = judge.run(f"Evaluate: {output}")
    return float(extract_score(response)) / 10
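The judge evaluator then plugs into ReflexionAgent like any other evaluator:

reflexion = ReflexionAgent(
    agent=base_agent,
    max_iterations=5,
    evaluator=judge_evaluator,
    target_score=0.8,
)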
Best Practices
1. Clear Improvement Criteria
Define what "better" means:
reflexion = ReflexionAgent(
    agent=agent,
    evaluator=clear_evaluator,
    reflection_prompt="""Improve specifically on:
    1. Accuracy of facts
    2. Clarity of explanation
    3. Completeness of coverage""",
)
2. Reasonable max_iterations
Don't over-iterate:
# Simple tasks: 2-3 iterations
reflexion = ReflexionAgent(agent=agent, max_iterations=3)

# Complex tasks: 3-5 iterations
reflexion = ReflexionAgent(agent=agent, max_iterations=5)

# Rarely useful: 10+ iterations
3. Early Stopping
Stop when good enough:
reflexion = ReflexionAgent(
    agent=agent,
    max_iterations=10,
    early_stopping=True,
    target_score=0.9,  # Stop at 90% quality
)
4. Memory Management
Limit context growth:
reflexion = ReflexionAgent(
    agent=agent,
    max_iterations=10,
    memory_window=3,  # Only remember last 3 iterations
)
5. Monitor Progress
Track improvement:
result = reflexion.run(task)

if result.final_score <= result.initial_score:
    print("Warning: No improvement achieved")

print(f"Improvement: {result.final_score - result.initial_score:+.1%}")
Performance Considerations
Latency
# Worst-case latency = max_iterations × per-iteration agent time,
# so high max_iterations can be slow. Early stopping bounds the
# typical case without lowering the quality ceiling.
reflexion = ReflexionAgent(
    agent=agent,
    max_iterations=10,    # Maximum
    early_stopping=True,  # But stop early if possible
    target_score=0.85,
)
Cost
# Cost = iterations_used × per-iteration agent cost
# Monitor actual iterations
result = reflexion.run(task)

print(f"Iterations: {result.iterations_used}/{result.max_iterations}")
print(f"Cost: ${result.estimated_cost:.2f}")
Diminishing Returns
Quality vs. Iterations:

100% |          █████████
     |      ████
     |    ██
     |  ██
 50% | █
     +--------------------
      0   2   4   6   8   10
              Iterations
Most of the improvement typically arrives in the first 2-3 iterations.
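You can verify this on your own runs from the per-iteration gains in score_history:

# Per-iteration gains; expect these to shrink after the first few rounds
scores = result.score_history
for i, (prev, curr) in enumerate(zip(scores, scores[1:]), start=2):
    print(f"Iteration {i}: {curr - prev:+.1%} gain")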
Error Handling
Handle edge cases:
try:
    result = reflexion.run(task)
except MaxIterationsReachedError as e:
    print("Failed to reach target quality")
    # Use the best attempt so far (assuming the exception carries it)
    result = e.best_attempt
except NoImprovementError:
    print("Agent not improving")
    # Fall back to a different approach
Debugging
Inspect Iteration History
result = reflexion.run(task)

print("Iteration history:")
for i, iteration in enumerate(result.iteration_history):
    print(f"\n=== Iteration {i+1} ===")
    print(f"Output: {iteration.output[:100]}...")
    print(f"Reflection: {iteration.reflection[:100]}...")
    print(f"Score: {iteration.score:.2f}")
Visualize Improvement
import matplotlib.pyplot as plt

scores = result.score_history
iterations = list(range(1, len(scores) + 1))

plt.plot(iterations, scores, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Quality Score')
plt.title('Reflexion Improvement Trajectory')
plt.grid(True, alpha=0.3)
plt.show()
Analyze Reflections
print("\nAll reflections:")
for i, reflection in enumerate(result.reflections):
print(f"\nIteration {i+1} reflection:")
print(reflection)
Limitations
Not Suitable For:
- Simple tasks: Overkill and wasteful
- One-shot tasks: No opportunity for improvement
- Real-time systems: Too slow
- Undefined improvement criteria: Can't measure progress
Better Alternatives:
- Simple tasks → Single agent
- Real-time → Self-consistency or single agent
- Creative exploration → Mixture of Agents
- Undefined criteria → Reasoning Duo
Research Background
Based on "Reflexion: Language Agents with Verbal Reinforcement Learning":
- Paper: arxiv.org/abs/2303.11366
- Authors: Shinn et al., 2023
- Key insight: Self-reflection enables agents to learn from failures