Generate multiple independent solutions and select the most consistent answer through voting or aggregation. Improves accuracy and reliability for tasks with verifiable answers.
Overview
Self-Consistency is a powerful technique where an agent generates multiple independent solutions to the same problem, then selects the most consistent or frequent answer. This pattern is particularly effective for tasks with objective answers where multiple reasoning paths can lead to the same conclusion.
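Conceptually, the loop is simple: sample several independent completions, extract a canonical answer from each, and return the one that appears most often. A minimal, framework-agnostic sketch (the ask_llm callable is a placeholder for whatever model call you use, not part of azcore):
import re
from collections import Counter

def self_consistent_answer(question, ask_llm, num_generations=5):
    """Sample several independent solutions and return the majority answer."""
    answers = []
    for _ in range(num_generations):
        output = ask_llm(question)  # one independent reasoning path
        match = re.search(r"ANSWER:\s*(.+)", output)
        if match:
            answers.append(match.group(1).strip())
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)  # majority answer and agreement rate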
When to Use
- Mathematical reasoning: Math problems, calculations, logic puzzles
- Factual questions: Questions with definitive answers
- Classification: Multiple-choice or categorization tasks
- Fact verification: Checking claims against knowledge
- Code generation: When tests can validate correctness
- Decision making: Selecting between discrete options
Basic Usage
from azcore import SelfConsistencyAgent, Agent
# Create base agent
base_agent = Agent(
agent_name="Reasoner",
system_prompt="Solve the problem step by step. Show your reasoning.",
llm=llm,
)
# Create self-consistency agent
self_consistency = SelfConsistencyAgent(
agent=base_agent,
num_generations=5, # Generate 5 solutions
voting_strategy="majority", # Use majority voting
)
# Generate multiple solutions and select best
result = self_consistency.run("What is 15% of 240?")
print(f"Answer: {result.final_answer}")
print(f"Confidence: {result.confidence}")
print(f"All answers: {result.all_answers}")
Configuration Options
num_generations
Number of independent solutions to generate:
self_consistency = SelfConsistencyAgent(
agent=base_agent,
num_generations=10, # More generations = higher confidence
)
Recommendations:
- Simple tasks: 3-5 generations
- Medium complexity: 5-7 generations
- Complex/critical: 10+ generations
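If cost is a concern, the sample count does not have to be fixed up front: you can generate solutions one at a time and stop as soon as one answer holds a clear majority. The sketch below illustrates that pattern against a plain agent.run() call with a caller-supplied answer extractor; it is not a built-in azcore feature.
from collections import Counter

def sample_until_consensus(agent, task, extract,
                           min_samples=3, max_samples=10, threshold=0.6):
    """Generate solutions one at a time; stop once agreement passes the threshold."""
    answers = []
    for _ in range(max_samples):
        answer = extract(agent.run(task))  # one independent solution
        if answer is None:
            continue  # unparseable output; skip it
        answers.append(answer)
        if len(answers) >= min_samples:
            top, votes = Counter(answers).most_common(1)[0]
            if votes / len(answers) >= threshold:
                return top, votes / len(answers)
    if not answers:
        return None, 0.0
    top, votes = Counter(answers).most_common(1)[0]  # no early consensus: plain majority
    return top, votes / len(answers)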
voting_strategy
How to select the final answer:
# Majority voting (default)
self_consistency = SelfConsistencyAgent(
agent=base_agent,
voting_strategy="majority", # Most frequent answer wins
)
# Weighted voting by confidence
self_consistency = SelfConsistencyAgent(
agent=base_agent,
voting_strategy="weighted", # Consider agent confidence scores
)
# Unanimous consensus
self_consistency = SelfConsistencyAgent(
agent=base_agent,
voting_strategy="unanimous", # All must agree
)
# Custom voting function
def custom_vote(answers):
    # Your voting logic goes here; this placeholder simply picks the first answer
    selected_answer = answers[0]
    return selected_answer
self_consistency = SelfConsistencyAgent(
agent=base_agent,
voting_strategy="custom",
voting_function=custom_vote,
)
temperature
Control diversity of solutions:
self_consistency = SelfConsistencyAgent(
agent=base_agent,
num_generations=5,
temperature=0.8, # Higher = more diverse solutions
)
parallel_execution
Run generations in parallel:
self_consistency = SelfConsistencyAgent(
agent=base_agent,
num_generations=10,
parallel_execution=True, # Faster but more resource-intensive
)
Advanced Examples
Mathematical Reasoning
import re
from azcore import Agent, SelfConsistencyAgent
# Math-focused agent
math_agent = Agent(
agent_name="Math Solver",
system_prompt="""Solve math problems step by step.
Format your final answer as: ANSWER: [number]
Show all work:
1. Identify the problem type
2. Set up equations
3. Solve step by step
4. Verify the answer""",
llm=llm,
)
# Self-consistency for math
math_solver = SelfConsistencyAgent(
agent=math_agent,
num_generations=7,
voting_strategy="majority",
answer_extractor=lambda text: (m.group(1) if (m := re.search(r'ANSWER:\s*\$?(\d+\.?\d*)', text)) else None),
)
# Solve math problem
problem = """
A store is having a 25% off sale. If an item originally costs $80,
and you have a coupon for an additional $10 off the sale price,
how much will you pay?
"""
result = math_solver.run(problem)
print(f"Answer: ${result.final_answer}")
print(f"Agreement: {result.agreement_rate:.1%}")
print(f"\nAll solutions:")
for i, answer in enumerate(result.all_answers):
print(f" {i+1}. ${answer}")
Multiple Choice Questions
# Multiple choice agent
mc_agent = Agent(
agent_name="Multiple Choice",
system_prompt="""Answer multiple choice questions.
Think through each option carefully.
Eliminate incorrect answers.
Select the best answer.
Format: ANSWER: [A/B/C/D]""",
llm=llm,
)
# Self-consistency for MCQ
mc_solver = SelfConsistencyAgent(
agent=mc_agent,
num_generations=5,
voting_strategy="majority",
answer_extractor=lambda text: (m.group(1) if (m := re.search(r'ANSWER:\s*([A-D])', text)) else None),
)
# Answer question
question = """
Which of the following is NOT a benefit of using microservices architecture?
A) Independent deployment of services
B) Technology diversity across services
C) Reduced overall system complexity
D) Better fault isolation
Provide your reasoning and answer.
"""
result = mc_solver.run(question)
print(f"Selected answer: {result.final_answer}")
print(f"Confidence: {result.confidence:.1%}")
Code Generation with Verification
# Code generation agent
code_agent = Agent(
agent_name="Code Generator",
system_prompt="""Generate Python code to solve the problem.
Requirements:
- Write clean, readable code
- Include error handling
- Add docstrings
- Provide example usage""",
llm=llm,
tools=[code_execution_tool],
)
# Self-consistency for code
code_generator = SelfConsistencyAgent(
agent=code_agent,
num_generations=5,
voting_strategy="test_passing", # Select code that passes tests
test_suite=test_cases,
)
# Generate code
problem = """
Write a function that finds the longest common subsequence of two strings.
Example:
lcs("ABCDGH", "AEDFHR") should return "ADH"
"""
result = code_generator.run(problem)
print("Generated code:")
print(result.final_answer)
print(f"\nTests passed: {result.tests_passed}/{result.total_tests}")
Fact Verification
# Fact checker agent
fact_checker = Agent(
agent_name="Fact Checker",
system_prompt="""Verify if the claim is true or false.
Research the claim thoroughly.
Cite reliable sources.
Consider counter-evidence.
Format: VERDICT: [TRUE/FALSE/UNCERTAIN]
Confidence: [HIGH/MEDIUM/LOW]""",
llm=llm,
tools=[search_tool, wikipedia_tool],
)
# Self-consistency for facts
fact_verifier = SelfConsistencyAgent(
agent=fact_checker,
num_generations=5,
voting_strategy="weighted", # Weight by stated confidence
confidence_extractor=lambda text: (
"HIGH" if "Confidence: HIGH" in text else
"MEDIUM" if "Confidence: MEDIUM" in text else
"LOW"
),
)
# Verify claim
claim = "The Great Wall of China is visible from space with the naked eye."
result = fact_verifier.run(f"Verify this claim: {claim}")
print(f"Verdict: {result.final_answer}")
print(f"Consensus: {result.agreement_rate:.1%}")
print(f"\nIndividual verdicts:")
for i, verdict in enumerate(result.all_answers):
print(f" Attempt {i+1}: {verdict}")
Classification Task
# Sentiment classifier
sentiment_agent = Agent(
agent_name="Sentiment Analyzer",
system_prompt="""Classify the sentiment of the text.
Categories: POSITIVE, NEGATIVE, NEUTRAL
Consider:
- Overall tone
- Emotional words
- Context
- Sarcasm or irony
Format: SENTIMENT: [category]""",
llm=llm,
)
# Self-consistency for classification
classifier = SelfConsistencyAgent(
agent=sentiment_agent,
num_generations=5,
voting_strategy="majority",
answer_extractor=lambda text: (m.group(1) if (m := re.search(r'SENTIMENT:\s*(\w+)', text)) else None),
)
# Classify sentiment
text = """
The product arrived late and the packaging was damaged, but the customer
service team was incredibly helpful and immediately sent a replacement.
"""
result = classifier.run(text)
print(f"Sentiment: {result.final_answer}")
print(f"Agreement: {result.agreement_rate:.1%}")
print(f"\nVote distribution:")
for sentiment, count in result.vote_distribution.items():
print(f" {sentiment}: {count} votes")
Voting Strategies
Majority Voting
Simple and effective - most frequent answer wins:
def majority_vote(answers):
from collections import Counter
counts = Counter(answers)
return counts.most_common(1)[0][0]
Best for:
- Discrete answers (A/B/C, True/False)
- Classification tasks
- Multiple choice questions
Weighted Voting
Consider confidence scores:
def weighted_vote(answers_with_confidence):
scores = {}
for answer, confidence in answers_with_confidence:
scores[answer] = scores.get(answer, 0) + confidence
return max(scores, key=scores.get)
Best for:
- When agents provide confidence scores
- Nuanced decision making
- Combining outputs of varying quality
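For example, if the agent only reports HIGH / MEDIUM / LOW confidence (as in the fact-verification example above), map those labels to numeric weights before voting. The weight values below are illustrative assumptions, and weighted_vote is the helper defined above:
CONFIDENCE_WEIGHTS = {"HIGH": 1.0, "MEDIUM": 0.6, "LOW": 0.3}  # illustrative weights

answers_with_confidence = [
    ("FALSE", CONFIDENCE_WEIGHTS["HIGH"]),
    ("FALSE", CONFIDENCE_WEIGHTS["MEDIUM"]),
    ("TRUE", CONFIDENCE_WEIGHTS["LOW"]),
]

print(weighted_vote(answers_with_confidence))  # FALSE (1.6 vs. 0.3)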
Unanimous Consensus
All generations must agree:
def unanimous_vote(answers):
if len(set(answers)) == 1:
return answers[0]
else:
raise NoConsensusError("Generations do not agree")
Best for:
- Critical decisions
- Safety-critical applications
- High-stakes scenarios
Threshold-Based
Require minimum agreement:
def threshold_vote(answers, threshold=0.6):
from collections import Counter
counts = Counter(answers)
most_common, count = counts.most_common(1)[0]
if count / len(answers) >= threshold:
return most_common
else:
raise InsufficientConsensusError(f"Agreement below {threshold:.0%}")
Best for:
- Balancing confidence and diversity
- Quality control
- Adjustable strictness
Answer Extraction
Extract the final answer from agent outputs:
Pattern Matching
import re
def extract_answer(text):
# Look for "ANSWER: X" pattern
match = re.search(r'ANSWER:\s*(.+)', text, re.IGNORECASE)
if match:
return match.group(1).strip()
return None
Last Line
def extract_last_line(text):
lines = text.strip().split('\n')
return lines[-1] if lines else None
JSON Extraction
import json
import re
def extract_json_answer(text):
# Extract JSON object
match = re.search(r'\{.*\}', text, re.DOTALL)
if match:
data = json.loads(match.group(0))
return data.get('answer')
return None
Numerical Extraction
def extract_number(text):
# Find all numbers
numbers = re.findall(r'-?\d+\.?\d*', text)
# Return last number (usually the answer)
return float(numbers[-1]) if numbers else None
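In practice no single pattern covers every output, so it is common to chain extractors from most specific to most permissive and take the first hit. A sketch composed from the helpers above (it assumes the imports shown in those snippets):
def extract_any(text):
    """Try each extractor in order of specificity; return the first usable answer."""
    extractors = (extract_answer, extract_json_answer, extract_number, extract_last_line)
    for extractor in extractors:
        try:
            answer = extractor(text)
        except (ValueError, json.JSONDecodeError):
            continue  # malformed JSON or number; fall through to the next strategy
        if answer is not None:
            return answer
    return None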
Best Practices
1. Clear Answer Format
Instruct agents to format answers consistently:
system_prompt = """
Solve the problem and provide your answer in this format:
FINAL ANSWER: [your answer here]
This format is mandatory for answer extraction.
"""
2. Appropriate num_generations
Balance quality vs. cost:
# Simple task: fewer generations
easy_task = SelfConsistencyAgent(agent=agent, num_generations=3)
# Complex task: more generations
hard_task = SelfConsistencyAgent(agent=agent, num_generations=10)
# Critical task: many generations
critical_task = SelfConsistencyAgent(agent=agent, num_generations=20)
3. Temperature Settings
Higher temperature increases diversity:
# Low diversity (fast convergence)
agent_low = Agent(temperature=0.3)
sc_low = SelfConsistencyAgent(agent=agent_low, num_generations=5)
# High diversity (explore more paths)
agent_high = Agent(temperature=0.9)
sc_high = SelfConsistencyAgent(agent=agent_high, num_generations=5)
4. Monitor Agreement Rate
Low agreement indicates an uncertain answer:
result = self_consistency.run(task)
if result.agreement_rate < 0.5:
print("Warning: Low agreement. Answer may be unreliable.")
# Consider increasing num_generations or reformulating task
5. Use Parallel Execution
For speed when possible:
self_consistency = SelfConsistencyAgent(
agent=agent,
num_generations=10,
parallel_execution=True, # Faster
max_workers=5, # Control parallelism
)
Performance Considerations
Latency
# Sequential: latency = num_generations × agent_time
sc_sequential = SelfConsistencyAgent(
agent=agent,
num_generations=10,
parallel_execution=False,
)
# Latency: ~10x single agent
# Parallel: latency ≈ agent_time
sc_parallel = SelfConsistencyAgent(
agent=agent,
num_generations=10,
parallel_execution=True,
)
# Latency: ~1x single agent (with more resources)
Cost
# Cost = num_generations × per_agent_cost
# Monitor token usage
result = self_consistency.run(task)
print(f"Total tokens: {result.total_tokens}")
print(f"Cost: ${result.estimated_cost}")
Quality vs. Generations
Returns diminish after a certain point: agreement climbs steeply over the first few generations and then flattens. Going from 1 to 5 samples usually gives a large gain, 5 to 10 a modest one, and beyond roughly 10-15 the marginal improvement rarely justifies the extra cost.
Optimal: 5-10 generations for most tasks
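The diminishing returns are easy to see with a small standalone simulation of majority voting over independent samples (not part of azcore; the 70% per-sample accuracy is an arbitrary assumption):
import random
from collections import Counter

def simulate_majority_accuracy(per_sample_accuracy=0.7, num_generations=5, trials=2000):
    """Estimate how often majority voting lands on the correct answer."""
    wins = 0
    for _ in range(trials):
        # Each generation is either correct or one of two distinct wrong answers.
        samples = [
            "correct" if random.random() < per_sample_accuracy
            else random.choice(["wrong_a", "wrong_b"])
            for _ in range(num_generations)
        ]
        if Counter(samples).most_common(1)[0][0] == "correct":
            wins += 1
    return wins / trials

for n in (1, 3, 5, 10, 20):
    print(f"{n:2d} generations -> ~{simulate_majority_accuracy(num_generations=n):.0%} correct")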
Error Handling
Handle cases where consensus isn't reached:
try:
result = self_consistency.run(task)
except NoConsensusError:
# Fallback strategy
result = fallback_agent.run(task)
except InsufficientGenerationsError:
# Too few valid generations
result = retry_with_more_generations()
Debugging
Inspect All Generations
result = self_consistency.run(task)
print("All generations:")
for i, generation in enumerate(result.generations):
print(f"\n=== Generation {i+1} ===")
print(generation.output)
print(f"Extracted answer: {generation.extracted_answer}")
Analyze Vote Distribution
print("\nVote distribution:")
for answer, count in sorted(result.vote_distribution.items(),
key=lambda x: x[1], reverse=True):
percentage = (count / result.num_generations) * 100
print(f"{answer}: {count} votes ({percentage:.1f}%)")
Check Reasoning Paths
# Group generations by answer
from collections import defaultdict
by_answer = defaultdict(list)
for gen in result.generations:
by_answer[gen.extracted_answer].append(gen.reasoning)
print(f"\nAnswer: {result.final_answer}")
print("Supporting reasoning paths:")
for reasoning in by_answer[result.final_answer]:
print(f"\n- {reasoning[:200]}...")
Limitations
Not Suitable For:
- Open-ended generation: No "correct" answer to vote on
- Creative tasks: Diversity is the goal, not consensus
- Very simple tasks: Overkill and wasteful
- Tasks without extractable answers: Can't identify agreement
Better Alternatives:
- Open-ended → Use Reflexion or Reasoning Duo
- Creative → Use single agent or Mixture of Agents
- Simple → Use single agent call
- Complex reasoning → Combine with Reflexion
Research Background
Based on "Self-Consistency Improves Chain of Thought Reasoning in Language Models":
- Paper: arxiv.org/abs/2203.11171
- Authors: Wang et al., 2022 (published at ICLR 2023)
- Key finding: sampling a diverse set of reasoning paths and selecting the most consistent answer (marginalizing out the individual reasoning paths) significantly improves accuracy on arithmetic and commonsense reasoning benchmarks