
Prompt Optimization

One of the most powerful features of the PlanAI CLI is automatic prompt optimization. It uses a more capable LLM to iteratively refine the prompts your LLMTaskWorkers send, guided by real production data.

The prompt optimization tool:

  • Automates the process of iterating and improving prompts
  • Uses real production data from debug logs for optimization
  • Dynamically loads and uses production classes in the workflow
  • Employs an LLM-based scoring mechanism to evaluate prompt effectiveness
  • Adapts to various LLM tasks

First, enable debug logging in your LLMTaskWorker:

from typing import List, Type

from planai import LLMTaskWorker, Task

# DataTask and AnalysisResult are your application's Task models
class MyAnalyzer(LLMTaskWorker):
    prompt = "Analyze this data and provide insights"
    llm_input_type: Type[Task] = DataTask
    output_types: List[Type[Task]] = [AnalysisResult]
    # Enable debug mode to generate logs
    debug_mode = True

Run your workflow to collect real production data:

# Your normal workflow execution
graph = Graph(name="Analysis Pipeline")
analyzer = MyAnalyzer(llm=llm)
graph.add_worker(analyzer)
graph.run(initial_tasks=[...])
# Debug logs will be saved to debug/MyAnalyzer.json

Run the prompt optimization:

planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file my_app.py \
  --class-name MyAnalyzer \
  --search-path . \
  --debug-log debug/MyAnalyzer.json \
  --goal-prompt "Improve accuracy and reduce response length"

The command options are:

  • --python-file: Path to the Python file containing your LLMTaskWorker class
  • --class-name: Name of the LLMTaskWorker class to optimize
  • --search-path: Python path for module imports (usually . for current directory)
  • --debug-log: Path to the debug log file generated by your worker
  • --goal-prompt: Description of what you want to optimize for
  • --num-iterations: Number of optimization iterations (default: 3)
  • --output-dir: Directory to save optimized prompts (default: current directory)
  • --max-samples: Maximum number of debug samples to use (default: all)
  • --temperature: Temperature for prompt generation (default: 0.7)
  • --scoring-examples: Number of examples to use for scoring (default: 5)
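
For example, a run that caps the number of debug samples and writes results to a dedicated directory might look like this (the flag values are illustrative):

planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file my_app.py \
  --class-name MyAnalyzer \
  --search-path . \
  --debug-log debug/MyAnalyzer.json \
  --goal-prompt "Improve accuracy and reduce response length" \
  --num-iterations 5 \
  --max-samples 50 \
  --output-dir optimized/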

Craft goal prompts to match your objective:

  • Accuracy: --goal-prompt "Improve accuracy in extracting key information while maintaining the current format"
  • Token efficiency: --goal-prompt "Reduce token usage while preserving output quality and completeness"
  • Format consistency: --goal-prompt "Ensure consistent output format across all responses, following the Pydantic model structure"
  • Robustness: --goal-prompt "Minimize parsing errors and ensure all required fields are always populated"
  • Domain-specific: --goal-prompt "Improve medical terminology accuracy and ensure HIPAA-compliant responses"

Each optimization iteration produces two files: the optimized prompt itself and a metadata record.

optimized_prompt_MyAnalyzer_1.txt:

Analyze the provided data with focus on:
1. Key metrics and their trends
2. Anomalies or outliers
3. Actionable recommendations
Structure your response according to the required format.

optimized_prompt_MyAnalyzer_1.json:

{
  "class_name": "MyAnalyzer",
  "iteration": 1,
  "score": 8.5,
  "improvements": [
    "Added structured analysis points",
    "Clarified output format requirements",
    "Reduced ambiguous language"
  ],
  "token_reduction": "15%",
  "timestamp": "2024-01-15T10:30:00Z"
}
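
To adopt a winning prompt, feed it back to your worker. A minimal sketch, assuming the worker accepts a prompt override at construction time (as in the A/B testing example later in this page):

# Load the optimized prompt and apply it to the worker.
with open("optimized_prompt_MyAnalyzer_1.txt") as f:
    optimized_prompt = f.read()

# Assumes MyAnalyzer accepts a `prompt` keyword override.
analyzer = MyAnalyzer(llm=llm, prompt=optimized_prompt)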

Optimize in stages for complex improvements:

# Stage 1: Focus on accuracy
planai optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --goal-prompt "Maximize extraction accuracy" \
  --output-dir stage1/

# Stage 2: Optimize tokens using Stage 1 results
planai optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --goal-prompt "Reduce tokens while maintaining Stage 1 accuracy" \
  --output-dir stage2/

Provide specific scoring criteria in your goal:

--goal-prompt "Optimize for:
1. Factual accuracy (40%)
2. Response conciseness (30%)
3. Professional tone (20%)
4. Format compliance (10%)"

Use specialized models for optimization:

# Use GPT-4 for reasoning and optimization
planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt ...

# Use Claude for optimization
planai --llm-provider anthropic --llm-model claude-3-opus --llm-reason-model claude-3-opus \
  optimize-prompt ...

Ensure your debug logs contain diverse, representative examples:

class MyWorker(LLMTaskWorker):
    debug_mode = True
    # Collect more samples during peak usage
    debug_sample_rate = 1.0  # 100% sampling
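
How you curate samples depends on your data. As a rough sketch, assuming the debug log is a JSON array of records with a serialized input field (the actual schema may differ), you could drop near-duplicate inputs before optimizing:

import json

# Rough sketch: keep only one debug record per distinct input.
with open("debug/MyWorker.json") as f:
    records = json.load(f)

seen, diverse = set(), []
for record in records:
    key = json.dumps(record.get("input"), sort_keys=True)  # assumed field name
    if key not in seen:
        seen.add(key)
        diverse.append(record)

with open("debug/MyWorker_diverse.json", "w") as f:
    json.dump(diverse, f, indent=2)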

Run multiple optimization passes:

optimize.sh
#!/bin/bash
for i in {1..5}; do
  planai optimize-prompt \
    --python-file app.py \
    --class-name MyWorker \
    --goal-prompt "Iteration $i: Refine based on previous results" \
    --output-dir "iteration_$i/"
done

Test optimized prompts against originals:

import random

class ABTestWorker(TaskWorker):
    def __init__(self, original_prompt, optimized_prompt, **kwargs):
        super().__init__(**kwargs)
        self.original_worker = MyWorker(prompt=original_prompt)
        self.optimized_worker = MyWorker(prompt=optimized_prompt)

    def consume_work(self, task):
        # Randomly select a worker variant for the A/B test
        worker = random.choice([self.original_worker, self.optimized_worker])
        result = worker.process(task)
        # Log which variant was used (log_test_result is a user-defined helper)
        self.log_test_result(worker, result)

Track prompt evolution:

# Create prompt history
git init prompts
cd prompts
# Save each optimization
cp ../optimized_prompt_*.txt ./
git add .
git commit -m "Optimization run: improve accuracy"

Common issues and fixes:

  1. Import Errors

     # Ensure the correct Python path
     --search-path /path/to/your/project

  2. Empty Debug Logs

     • Verify debug_mode = True in your worker
     • Check debug directory permissions
     • Ensure the workflow actually processed tasks (a quick check follows this list)

  3. Poor Optimization Results

     • Provide more specific goal prompts
     • Include more diverse debug samples
     • Try a different reasoning model
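
For the empty-log case, a quick sanity check that the log is non-empty and parseable (assuming the JSON debug log format generated above):

import json

# Sanity-check the debug log before running optimize-prompt.
with open("debug/MyWorker.json") as f:
    records = json.load(f)

print(f"{len(records)} debug records found")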

Enable verbose output for troubleshooting:

planai -vv optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --debug-log debug/MyWorker.json \
  --goal-prompt "Improve accuracy"

Here’s a complete example of optimizing a sentiment analysis prompt:

sentiment_worker.py
from typing import List, Literal, Type

from planai import LLMTaskWorker, Task

class ReviewText(Task):
    # Minimal input task (assumed shape)
    text: str

class ReviewSentiment(Task):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    key_phrases: List[str]

class SentimentAnalyzer(LLMTaskWorker):
    prompt = "Analyze the sentiment of this review"
    llm_input_type: Type[Task] = ReviewText
    output_types: List[Type[Task]] = [ReviewSentiment]
    debug_mode = True

Collect data by running your normal workflow, then optimize the prompt:

planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file sentiment_worker.py \
  --class-name SentimentAnalyzer \
  --search-path . \
  --debug-log debug/SentimentAnalyzer.json \
  --goal-prompt "Improve sentiment classification accuracy, ensure confidence scores are well-calibrated, and extract more relevant key phrases" \
  --num-iterations 5 \
  --output-dir optimized_prompts/

Automate prompt optimization in your pipeline:

.github/workflows/optimize-prompts.yml
name: Optimize Prompts

on:
  schedule:
    - cron: '0 0 * * 0' # Weekly

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
      - name: Install dependencies
        run: pip install planai
      - name: Run optimization
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          planai --llm-provider openai --llm-model gpt-4o-mini \
            optimize-prompt \
            --python-file src/workers.py \
            --class-name AnalysisWorker \
            --debug-log data/debug_logs.json \
            --goal-prompt "Improve accuracy based on last week's data"
      - name: Create PR with optimized prompts
        uses: peter-evans/create-pull-request@v3
        with:
          title: "Automated prompt optimization"
          body: "Weekly prompt optimization based on production data"