
Prompt Optimization

One of the most powerful features of the PlanAI CLI is automatic prompt optimization. It uses a more capable LLM to iteratively refine the prompts your LLMTaskWorkers send, guided by real production data.

The prompt optimization tool:

  • Automates the process of iterating and improving prompts
  • Uses real production data from debug logs for optimization
  • Dynamically loads and uses production classes in the workflow
  • Employs an LLM-based scoring mechanism to evaluate prompt effectiveness
  • Adapts to various LLM tasks

First, enable debug logging in your LLMTaskWorker:

from typing import List, Type

from planai import LLMTaskWorker, Task

# DataTask and AnalysisResult are your application's Task models
class MyAnalyzer(LLMTaskWorker):
    prompt = "Analyze this data and provide insights"
    llm_input_type: Type[Task] = DataTask
    output_types: List[Type[Task]] = [AnalysisResult]
    # Enable debug mode to generate logs
    debug_mode = True

Run your workflow to collect real production data:

# Your normal workflow execution
graph = Graph(name="Analysis Pipeline")
analyzer = MyAnalyzer(llm=llm)
graph.add_worker(analyzer)
graph.run(initial_tasks=[...])
# Debug logs will be saved to debug/MyAnalyzer.json

Run the prompt optimization:

planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file my_app.py \
  --class-name MyAnalyzer \
  --search-path . \
  --debug-log debug/MyAnalyzer.json \
  --goal-prompt "Improve accuracy and reduce response length"

The command options are:

  • --python-file: Path to the Python file containing your LLMTaskWorker class
  • --class-name: Name of the LLMTaskWorker class to optimize
  • --search-path: Python path for module imports (usually . for current directory)
  • --debug-log: Path to the debug log file generated by your worker
  • --goal-prompt: Description of what you want to optimize for
  • --num-iterations: Number of optimization iterations (default: 3)
  • --output-dir: Directory to save optimized prompts (default: current directory)
  • --max-samples: Maximum number of debug samples to use (default: all)
  • --temperature: Temperature for prompt generation (default: 0.7)
  • --scoring-examples: Number of examples to use for scoring (default: 5)
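
For example, a run that caps the number of debug samples and writes results to a dedicated directory might look like this (the flag values are illustrative):

planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file my_app.py \
  --class-name MyAnalyzer \
  --search-path . \
  --debug-log debug/MyAnalyzer.json \
  --goal-prompt "Improve accuracy and reduce response length" \
  --num-iterations 5 \
  --max-samples 50 \
  --output-dir optimized/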

Craft goal prompts to match your objective:

  • Accuracy: --goal-prompt "Improve accuracy in extracting key information while maintaining the current format"
  • Token efficiency: --goal-prompt "Reduce token usage while preserving output quality and completeness"
  • Format consistency: --goal-prompt "Ensure consistent output format across all responses, following the Pydantic model structure"
  • Robustness: --goal-prompt "Minimize parsing errors and ensure all required fields are always populated"
  • Domain-specific: --goal-prompt "Improve medical terminology accuracy and ensure HIPAA-compliant responses"

Each optimization iteration produces two files: the optimized prompt itself and a metadata record.

optimized_prompt_MyAnalyzer_1.txt:

Analyze the provided data with focus on:
1. Key metrics and their trends
2. Anomalies or outliers
3. Actionable recommendations
Structure your response according to the required format.

optimized_prompt_MyAnalyzer_1.json:

{
  "class_name": "MyAnalyzer",
  "iteration": 1,
  "score": 8.5,
  "improvements": [
    "Added structured analysis points",
    "Clarified output format requirements",
    "Reduced ambiguous language"
  ],
  "token_reduction": "15%",
  "timestamp": "2024-01-15T10:30:00Z"
}
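
To adopt a winning prompt, feed it back to your worker. A minimal sketch, assuming the worker accepts a prompt override at construction time (as in the A/B testing example later in this page):

# Load the optimized prompt and apply it to the worker.
with open("optimized_prompt_MyAnalyzer_1.txt") as f:
    optimized_prompt = f.read()

# Assumes MyAnalyzer accepts a `prompt` keyword override.
analyzer = MyAnalyzer(llm=llm, prompt=optimized_prompt)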

Optimize in stages for complex improvements:

# Stage 1: Focus on accuracy
planai optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --goal-prompt "Maximize extraction accuracy" \
  --output-dir stage1/

# Stage 2: Optimize tokens using Stage 1 results
planai optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --goal-prompt "Reduce tokens while maintaining Stage 1 accuracy" \
  --output-dir stage2/

Provide specific scoring criteria in your goal:

--goal-prompt "Optimize for:
1. Factual accuracy (40%)
2. Response conciseness (30%)
3. Professional tone (20%)
4. Format compliance (10%)"

Use specialized models for optimization:

# Use GPT-4 for reasoning and optimization
planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt ...

# Use Claude for optimization
planai --llm-provider anthropic --llm-model claude-3-opus --llm-reason-model claude-3-opus \
  optimize-prompt ...

Ensure your debug logs contain diverse, representative examples:

class MyWorker(LLMTaskWorker):
    debug_mode = True
    # Collect more samples during peak usage
    debug_sample_rate = 1.0  # 100% sampling
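
How you curate samples depends on your data. As a rough sketch, assuming the debug log is a JSON array of records with a serialized input field (the actual schema may differ), you could drop near-duplicate inputs before optimizing:

import json

# Rough sketch: keep only one debug record per distinct input.
with open("debug/MyWorker.json") as f:
    records = json.load(f)

seen, diverse = set(), []
for record in records:
    key = json.dumps(record.get("input"), sort_keys=True)  # assumed field name
    if key not in seen:
        seen.add(key)
        diverse.append(record)

with open("debug/MyWorker_diverse.json", "w") as f:
    json.dump(diverse, f, indent=2)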

Run multiple optimization passes:

optimize.sh
#!/bin/bash
for i in {1..5}; do
  planai optimize-prompt \
    --python-file app.py \
    --class-name MyWorker \
    --goal-prompt "Iteration $i: Refine based on previous results" \
    --output-dir "iteration_$i/"
done

Test optimized prompts against originals:

import random

class ABTestWorker(TaskWorker):
    def __init__(self, original_prompt, optimized_prompt, **kwargs):
        super().__init__(**kwargs)
        self.original_worker = MyWorker(prompt=original_prompt)
        self.optimized_worker = MyWorker(prompt=optimized_prompt)

    def consume_work(self, task):
        # Randomly select a worker variant for the A/B test
        worker = random.choice([self.original_worker, self.optimized_worker])
        result = worker.process(task)
        # Log which variant was used (log_test_result is a user-defined helper)
        self.log_test_result(worker, result)

Track prompt evolution:

# Create prompt history
git init prompts
cd prompts
# Save each optimization
cp ../optimized_prompt_*.txt ./
git add .
git commit -m "Optimization run: improve accuracy"

Common issues and fixes:

  1. Import Errors

     # Ensure the correct Python path
     --search-path /path/to/your/project

  2. Empty Debug Logs

     • Verify debug_mode = True in your worker
     • Check debug directory permissions
     • Ensure the workflow actually processed tasks (a quick check follows this list)

  3. Poor Optimization Results

     • Provide more specific goal prompts
     • Include more diverse debug samples
     • Try a different reasoning model
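
For the empty-log case, a quick sanity check that the log is non-empty and parseable (assuming the JSON debug log format generated above):

import json

# Sanity-check the debug log before running optimize-prompt.
with open("debug/MyWorker.json") as f:
    records = json.load(f)

print(f"{len(records)} debug records found")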

Enable verbose output for troubleshooting:

planai -vv optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --debug-log debug/MyWorker.json \
  --goal-prompt "Improve accuracy"

Here’s a complete example of optimizing a sentiment analysis prompt:

sentiment_worker.py
from typing import List, Literal, Type

from planai import LLMTaskWorker, Task

class ReviewText(Task):
    # Minimal input task (assumed shape)
    text: str

class ReviewSentiment(Task):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    key_phrases: List[str]

class SentimentAnalyzer(LLMTaskWorker):
    prompt = "Analyze the sentiment of this review"
    llm_input_type: Type[Task] = ReviewText
    output_types: List[Type[Task]] = [ReviewSentiment]
    debug_mode = True

Collect data by running your normal workflow, then optimize the prompt:

planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file sentiment_worker.py \
  --class-name SentimentAnalyzer \
  --search-path . \
  --debug-log debug/SentimentAnalyzer.json \
  --goal-prompt "Improve sentiment classification accuracy, ensure confidence scores are well-calibrated, and extract more relevant key phrases" \
  --num-iterations 5 \
  --output-dir optimized_prompts/

Automate prompt optimization in your pipeline:

.github/workflows/optimize-prompts.yml
name: Optimize Prompts

on:
  schedule:
    - cron: '0 0 * * 0' # Weekly

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
      - name: Install dependencies
        run: pip install planai
      - name: Run optimization
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          planai --llm-provider openai --llm-model gpt-4o-mini \
            optimize-prompt \
            --python-file src/workers.py \
            --class-name AnalysisWorker \
            --debug-log data/debug_logs.json \
            --goal-prompt "Improve accuracy based on last week's data"
      - name: Create PR with optimized prompts
        uses: peter-evans/create-pull-request@v3
        with:
          title: "Automated prompt optimization"
          body: "Weekly prompt optimization based on production data"