New Planner tool to help you break down complex ideas, problems, and projects into multiple manageable steps. This is a self-prompt generation tool whose output can then be fed into another tool and model as required
This commit is contained in:
2
.gitignore
vendored
2
.gitignore
vendored
@@ -169,3 +169,5 @@ test-setup/
|
|||||||
|
|
||||||
# Scratch feature documentation files
|
# Scratch feature documentation files
|
||||||
FEATURE_*.md
|
FEATURE_*.md
|
||||||
|
# Temporary files
|
||||||
|
/tmp/
|
||||||
|
|||||||
94
README.md
94
README.md
@@ -13,19 +13,20 @@ problem-solving, and collaborative development.
|
|||||||
|
|
||||||
**Features true AI orchestration with conversations that continue across tasks** - Give Claude a complex
|
**Features true AI orchestration with conversations that continue across tasks** - Give Claude a complex
|
||||||
task and let it orchestrate between models automatically. Claude stays in control, performs the actual work,
|
task and let it orchestrate between models automatically. Claude stays in control, performs the actual work,
|
||||||
but gets perspectives from the best AI for each subtask. With tools like [`analyze`](#7-analyze---smart-file-analysis) for
|
but gets perspectives from the best AI for each subtask. With tools like [`planner`](#3-planner---interactive-sequential-planning) for
|
||||||
understanding codebases, [`codereview`](#4-codereview---professional-code-review) for audits, [`refactor`](#8-refactor---intelligent-code-refactoring) for
|
breaking down complex projects, [`analyze`](#8-analyze---smart-file-analysis) for understanding codebases,
|
||||||
improving code structure, [`debug`](#6-debug---expert-debugging-assistant) for solving complex problems, and [`precommit`](#5-precommit---pre-commit-validation) for
|
[`codereview`](#5-codereview---professional-code-review) for audits, [`refactor`](#9-refactor---intelligent-code-refactoring) for
|
||||||
|
improving code structure, [`debug`](#7-debug---expert-debugging-assistant) for solving complex problems, and [`precommit`](#6-precommit---pre-commit-validation) for
|
||||||
validating changes, Claude can switch between different tools _and_ models mid-conversation,
|
validating changes, Claude can switch between different tools _and_ models mid-conversation,
|
||||||
with context carrying forward seamlessly.
|
with context carrying forward seamlessly.
|
||||||
|
|
||||||
**Example Workflow - Claude Code:**
|
**Example Workflow - Claude Code:**
|
||||||
1. Performs its own reasoning
|
1. Performs its own reasoning
|
||||||
2. Uses Gemini Pro to deeply [`analyze`](#7-analyze---smart-file-analysis) the code in question for a second opinion
|
2. Uses Gemini Pro to deeply [`analyze`](#8-analyze---smart-file-analysis) the code in question for a second opinion
|
||||||
3. Switches to O3 to continue [`chatting`](#1-chat---general-development-chat--collaborative-thinking) about its findings
|
3. Switches to O3 to continue [`chatting`](#1-chat---general-development-chat--collaborative-thinking) about its findings
|
||||||
4. Uses Flash to evaluate formatting suggestions from O3
|
4. Uses Flash to evaluate formatting suggestions from O3
|
||||||
5. Performs the actual work after taking in feedback from all three
|
5. Performs the actual work after taking in feedback from all three
|
||||||
6. Returns to Pro for a [`precommit`](#5-precommit---pre-commit-validation) review
|
6. Returns to Pro for a [`precommit`](#6-precommit---pre-commit-validation) review
|
||||||
|
|
||||||
All within a single conversation thread! Gemini Pro in step 6 _knows_ what was recommended by O3 in step 3! Taking that context
|
All within a single conversation thread! Gemini Pro in step 6 _knows_ what was recommended by O3 in step 3! Taking that context
|
||||||
and review into consideration to aid with its pre-commit review.
|
and review into consideration to aid with its pre-commit review.
|
||||||
@@ -48,14 +49,15 @@ and review into consideration to aid with its pre-commit review.
|
|||||||
- **Tools Reference**
|
- **Tools Reference**
|
||||||
- [`chat`](#1-chat---general-development-chat--collaborative-thinking) - Collaborative thinking
|
- [`chat`](#1-chat---general-development-chat--collaborative-thinking) - Collaborative thinking
|
||||||
- [`thinkdeep`](#2-thinkdeep---extended-reasoning-partner) - Extended reasoning
|
- [`thinkdeep`](#2-thinkdeep---extended-reasoning-partner) - Extended reasoning
|
||||||
- [`consensus`](#3-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis
|
- [`planner`](#3-planner---interactive-sequential-planning) - Interactive sequential planning
|
||||||
- [`codereview`](#4-codereview---professional-code-review) - Code review
|
- [`consensus`](#4-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis
|
||||||
- [`precommit`](#5-precommit---pre-commit-validation) - Pre-commit validation
|
- [`codereview`](#5-codereview---professional-code-review) - Code review
|
||||||
- [`debug`](#6-debug---expert-debugging-assistant) - Debugging help
|
- [`precommit`](#6-precommit---pre-commit-validation) - Pre-commit validation
|
||||||
- [`analyze`](#7-analyze---smart-file-analysis) - File analysis
|
- [`debug`](#7-debug---expert-debugging-assistant) - Debugging help
|
||||||
- [`refactor`](#8-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
|
- [`analyze`](#8-analyze---smart-file-analysis) - File analysis
|
||||||
- [`tracer`](#9-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing
|
- [`refactor`](#9-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
|
||||||
- [`testgen`](#10-testgen---comprehensive-test-generation) - Test generation with edge cases
|
- [`tracer`](#10-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing
|
||||||
|
- [`testgen`](#11-testgen---comprehensive-test-generation) - Test generation with edge cases
|
||||||
- [`your custom tool`](#add-your-own-tools) - Create custom tools for specialized workflows
|
- [`your custom tool`](#add-your-own-tools) - Create custom tools for specialized workflows
|
||||||
|
|
||||||
- **Advanced Usage**
|
- **Advanced Usage**
|
||||||
@@ -263,6 +265,7 @@ Just ask Claude naturally:
|
|||||||
**Quick Tool Selection Guide:**
|
**Quick Tool Selection Guide:**
|
||||||
- **Need a thinking partner?** → `chat` (brainstorm ideas, get second opinions, validate approaches)
|
- **Need a thinking partner?** → `chat` (brainstorm ideas, get second opinions, validate approaches)
|
||||||
- **Need deeper thinking?** → `thinkdeep` (extends analysis, finds edge cases)
|
- **Need deeper thinking?** → `thinkdeep` (extends analysis, finds edge cases)
|
||||||
|
- **Need to break down complex projects?** → `planner` (step-by-step planning, project structure, breaking down complex ideas)
|
||||||
- **Need multiple perspectives?** → `consensus` (get diverse expert opinions on proposals and decisions)
|
- **Need multiple perspectives?** → `consensus` (get diverse expert opinions on proposals and decisions)
|
||||||
- **Code needs review?** → `codereview` (bugs, security, performance issues)
|
- **Code needs review?** → `codereview` (bugs, security, performance issues)
|
||||||
- **Pre-commit validation?** → `precommit` (validate git changes before committing)
|
- **Pre-commit validation?** → `precommit` (validate git changes before committing)
|
||||||
@@ -288,16 +291,17 @@ Just ask Claude naturally:
|
|||||||
**Tools Overview:**
|
**Tools Overview:**
|
||||||
1. [`chat`](docs/tools/chat.md) - Collaborative thinking and development conversations
|
1. [`chat`](docs/tools/chat.md) - Collaborative thinking and development conversations
|
||||||
2. [`thinkdeep`](docs/tools/thinkdeep.md) - Extended reasoning and problem-solving
|
2. [`thinkdeep`](docs/tools/thinkdeep.md) - Extended reasoning and problem-solving
|
||||||
3. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
|
3. [`planner`](docs/tools/planner.md) - Interactive sequential planning for complex projects
|
||||||
4. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
|
4. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
|
||||||
5. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
|
5. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
|
||||||
6. [`debug`](docs/tools/debug.md) - Root cause analysis and debugging
|
6. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
|
||||||
7. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
|
7. [`debug`](docs/tools/debug.md) - Root cause analysis and debugging
|
||||||
8. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
|
8. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
|
||||||
9. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
|
9. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
|
||||||
10. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
|
10. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
|
||||||
11. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
|
11. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
|
||||||
12. [`version`](docs/tools/version.md) - Get server version and configuration
|
12. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
|
||||||
|
13. [`version`](docs/tools/version.md) - Get server version and configuration
|
||||||
|
|
||||||
### 1. `chat` - General Development Chat & Collaborative Thinking
|
### 1. `chat` - General Development Chat & Collaborative Thinking
|
||||||
Your thinking partner for brainstorming, getting second opinions, and validating approaches. Perfect for technology comparisons, architecture discussions, and collaborative problem-solving.
|
Your thinking partner for brainstorming, getting second opinions, and validating approaches. Perfect for technology comparisons, architecture discussions, and collaborative problem-solving.
|
||||||
@@ -318,7 +322,27 @@ and find out what the root cause is
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/thinkdeep.md)** - Enhanced analysis capabilities and critical evaluation process
|
**[📖 Read More](docs/tools/thinkdeep.md)** - Enhanced analysis capabilities and critical evaluation process
|
||||||
|
|
||||||
### 3. `consensus` - Multi-Model Perspective Gathering
|
### 3. `planner` - Interactive Step-by-Step Planning
|
||||||
|
Break down complex projects or ideas into manageable, structured plans through step-by-step thinking.
|
||||||
|
Perfect for adding new features to an existing system, scaling up system design, migration strategies,
|
||||||
|
and architectural planning with branching and revision capabilities.
|
||||||
|
|
||||||
|
#### Pro Tip
|
||||||
|
Claude supports `sub-tasks` where it will spawn and run separate background tasks. You can ask Claude to
|
||||||
|
run Zen's planner with two separate ideas. Then when it's done, use Zen's `consensus` tool to pass the entire
|
||||||
|
plan and get expert perspective from two powerful AI models on which one to work on first! Like performing **AB** testing
|
||||||
|
in one-go without the wait!
|
||||||
|
|
||||||
|
```
|
||||||
|
Create two separate sub-tasks: in one, using planner tool show me how to add natural language support
|
||||||
|
to my cooking app. In the other sub-task, use planner to plan how to add support for voice notes to my cooking app.
|
||||||
|
Once done, start a consensus by sharing both plans to o3 and flash to give me the final verdict. Which one do
|
||||||
|
I implement first?
|
||||||
|
```
|
||||||
|
|
||||||
|
**[📖 Read More](docs/tools/planner.md)** - Step-by-step planning methodology and multi-session continuation
|
||||||
|
|
||||||
|
### 4. `consensus` - Multi-Model Perspective Gathering
|
||||||
Get diverse expert opinions from multiple AI models on technical proposals and decisions. Supports stance steering (for/against/neutral) and structured decision-making.
|
Get diverse expert opinions from multiple AI models on technical proposals and decisions. Supports stance steering (for/against/neutral) and structured decision-making.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -328,7 +352,7 @@ migrate from REST to GraphQL for our API. I need a definitive answer.
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/consensus.md)** - Multi-model orchestration and decision analysis
|
**[📖 Read More](docs/tools/consensus.md)** - Multi-model orchestration and decision analysis
|
||||||
|
|
||||||
### 4. `codereview` - Professional Code Review
|
### 5. `codereview` - Professional Code Review
|
||||||
Comprehensive code analysis with prioritized feedback and severity levels. Supports security reviews, performance analysis, and coding standards enforcement.
|
Comprehensive code analysis with prioritized feedback and severity levels. Supports security reviews, performance analysis, and coding standards enforcement.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -338,7 +362,7 @@ and there may be more potential vulnerabilities. Find and share related code."
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/codereview.md)** - Professional review capabilities and parallel analysis
|
**[📖 Read More](docs/tools/codereview.md)** - Professional review capabilities and parallel analysis
|
||||||
|
|
||||||
### 5. `precommit` - Pre-Commit Validation
|
### 6. `precommit` - Pre-Commit Validation
|
||||||
Comprehensive review of staged/unstaged git changes across multiple repositories. Validates changes against requirements and detects potential regressions.
|
Comprehensive review of staged/unstaged git changes across multiple repositories. Validates changes against requirements and detects potential regressions.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -348,7 +372,7 @@ Perform a thorough precommit with o3, we want to only highlight critical issues,
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/precommit.md)** - Multi-repository validation and change analysis
|
**[📖 Read More](docs/tools/precommit.md)** - Multi-repository validation and change analysis
|
||||||
|
|
||||||
### 6. `debug` - Expert Debugging Assistant
|
### 7. `debug` - Expert Debugging Assistant
|
||||||
Root cause analysis for complex problems with systematic hypothesis generation. Supports error context, stack traces, and structured debugging approaches.
|
Root cause analysis for complex problems with systematic hypothesis generation. Supports error context, stack traces, and structured debugging approaches.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -359,7 +383,7 @@ why this is happening and what the root cause is and its fix
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/debug.md)** - Advanced debugging methodologies and troubleshooting
|
**[📖 Read More](docs/tools/debug.md)** - Advanced debugging methodologies and troubleshooting
|
||||||
|
|
||||||
### 7. `analyze` - Smart File Analysis
|
### 8. `analyze` - Smart File Analysis
|
||||||
General-purpose code understanding and exploration. Supports architecture analysis, pattern detection, and comprehensive codebase exploration.
|
General-purpose code understanding and exploration. Supports architecture analysis, pattern detection, and comprehensive codebase exploration.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -368,7 +392,7 @@ Use gemini to analyze main.py to understand how it works
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/analyze.md)** - Code analysis types and exploration capabilities
|
**[📖 Read More](docs/tools/analyze.md)** - Code analysis types and exploration capabilities
|
||||||
|
|
||||||
### 8. `refactor` - Intelligent Code Refactoring
|
### 9. `refactor` - Intelligent Code Refactoring
|
||||||
Comprehensive refactoring analysis with top-down decomposition strategy. Prioritizes structural improvements and provides precise implementation guidance.
|
Comprehensive refactoring analysis with top-down decomposition strategy. Prioritizes structural improvements and provides precise implementation guidance.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -377,7 +401,7 @@ Use gemini pro to decompose my_crazy_big_class.m into smaller extensions
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/refactor.md)** - Refactoring strategy and progressive analysis approach
|
**[📖 Read More](docs/tools/refactor.md)** - Refactoring strategy and progressive analysis approach
|
||||||
|
|
||||||
### 9. `tracer` - Static Code Analysis Prompt Generator
|
### 10. `tracer` - Static Code Analysis Prompt Generator
|
||||||
Creates detailed analysis prompts for call-flow mapping and dependency tracing. Generates structured analysis requests for precision execution flow or dependency mapping.
|
Creates detailed analysis prompts for call-flow mapping and dependency tracing. Generates structured analysis requests for precision execution flow or dependency mapping.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -386,7 +410,7 @@ Use zen tracer to analyze how UserAuthManager.authenticate is used and why
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/tracer.md)** - Prompt generation and analysis modes
|
**[📖 Read More](docs/tools/tracer.md)** - Prompt generation and analysis modes
|
||||||
|
|
||||||
### 10. `testgen` - Comprehensive Test Generation
|
### 11. `testgen` - Comprehensive Test Generation
|
||||||
Generates thorough test suites with edge case coverage based on existing code and test framework. Uses multi-agent workflow for realistic failure mode analysis.
|
Generates thorough test suites with edge case coverage based on existing code and test framework. Uses multi-agent workflow for realistic failure mode analysis.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -395,7 +419,7 @@ Use zen to generate tests for User.login() method
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/testgen.md)** - Test generation strategy and framework support
|
**[📖 Read More](docs/tools/testgen.md)** - Test generation strategy and framework support
|
||||||
|
|
||||||
### 11. `listmodels` - List Available Models
|
### 12. `listmodels` - List Available Models
|
||||||
Display all available AI models organized by provider, showing capabilities, context windows, and configuration status.
|
Display all available AI models organized by provider, showing capabilities, context windows, and configuration status.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -404,7 +428,7 @@ Use zen to list available models
|
|||||||
|
|
||||||
**[📖 Read More](docs/tools/listmodels.md)** - Model capabilities and configuration details
|
**[📖 Read More](docs/tools/listmodels.md)** - Model capabilities and configuration details
|
||||||
|
|
||||||
### 12. `version` - Server Information
|
### 13. `version` - Server Information
|
||||||
Get server version, configuration details, and system status for debugging and troubleshooting.
|
Get server version, configuration details, and system status for debugging and troubleshooting.
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -422,6 +446,7 @@ Zen supports powerful structured prompts in Claude Code for quick access to tool
|
|||||||
#### Tool Prompts
|
#### Tool Prompts
|
||||||
- `/zen:chat ask local-llama what 2 + 2 is` - Use chat tool with auto-selected model
|
- `/zen:chat ask local-llama what 2 + 2 is` - Use chat tool with auto-selected model
|
||||||
- `/zen:thinkdeep use o3 and tell me why the code isn't working in sorting.swift` - Use thinkdeep tool with auto-selected model
|
- `/zen:thinkdeep use o3 and tell me why the code isn't working in sorting.swift` - Use thinkdeep tool with auto-selected model
|
||||||
|
- `/zen:planner break down the microservices migration project into manageable steps` - Use planner tool with auto-selected model
|
||||||
- `/zen:consensus use o3:for and flash:against and tell me if adding feature X is a good idea for the project. Pass them a summary of what it does.` - Use consensus tool with default configuration
|
- `/zen:consensus use o3:for and flash:against and tell me if adding feature X is a good idea for the project. Pass them a summary of what it does.` - Use consensus tool with default configuration
|
||||||
- `/zen:codereview review for security module ABC` - Use codereview tool with auto-selected model
|
- `/zen:codereview review for security module ABC` - Use codereview tool with auto-selected model
|
||||||
- `/zen:debug table view is not scrolling properly, very jittery, I suspect the code is in my_controller.m` - Use debug tool with auto-selected model
|
- `/zen:debug table view is not scrolling properly, very jittery, I suspect the code is in my_controller.m` - Use debug tool with auto-selected model
|
||||||
@@ -432,6 +457,7 @@ Zen supports powerful structured prompts in Claude Code for quick access to tool
|
|||||||
|
|
||||||
#### Advanced Examples
|
#### Advanced Examples
|
||||||
- `/zen:thinkdeeper check if the algorithm in @sort.py is performant and if there are alternatives we could explore`
|
- `/zen:thinkdeeper check if the algorithm in @sort.py is performant and if there are alternatives we could explore`
|
||||||
|
- `/zen:planner create a step-by-step plan for migrating our authentication system to OAuth2, including dependencies and rollback strategies`
|
||||||
- `/zen:consensus debate whether we should migrate to GraphQL for our API`
|
- `/zen:consensus debate whether we should migrate to GraphQL for our API`
|
||||||
- `/zen:precommit confirm these changes match our requirements in COOL_FEATURE.md`
|
- `/zen:precommit confirm these changes match our requirements in COOL_FEATURE.md`
|
||||||
- `/zen:testgen write me tests for class ABC`
|
- `/zen:testgen write me tests for class ABC`
|
||||||
@@ -440,7 +466,7 @@ Zen supports powerful structured prompts in Claude Code for quick access to tool
|
|||||||
#### Syntax Format
|
#### Syntax Format
|
||||||
The prompt format is: `/zen:[tool] [your_message]`
|
The prompt format is: `/zen:[tool] [your_message]`
|
||||||
|
|
||||||
- `[tool]` - Any available tool name (chat, thinkdeep, codereview, debug, analyze, consensus, etc.)
|
- `[tool]` - Any available tool name (chat, thinkdeep, planner, consensus, codereview, debug, analyze, etc.)
|
||||||
- `[your_message]` - Your request, question, or instructions for the tool
|
- `[your_message]` - Your request, question, or instructions for the tool
|
||||||
|
|
||||||
**Note:** All prompts will show as "(MCP) [tool]" in Claude Code to indicate they're provided by the MCP server.
|
**Note:** All prompts will show as "(MCP) [tool]" in Claude Code to indicate they're provided by the MCP server.
|
||||||
|
|||||||
@@ -14,7 +14,7 @@ import os
|
|||||||
# These values are used in server responses and for tracking releases
|
# These values are used in server responses and for tracking releases
|
||||||
# IMPORTANT: This is the single source of truth for version and author info
|
# IMPORTANT: This is the single source of truth for version and author info
|
||||||
# Semantic versioning: MAJOR.MINOR.PATCH
|
# Semantic versioning: MAJOR.MINOR.PATCH
|
||||||
__version__ = "4.9.3"
|
__version__ = "5.0.0"
|
||||||
# Last update date in ISO format
|
# Last update date in ISO format
|
||||||
__updated__ = "2025-06-17"
|
__updated__ = "2025-06-17"
|
||||||
# Primary maintainer
|
# Primary maintainer
|
||||||
|
|||||||
83
docs/tools/planner.md
Normal file
83
docs/tools/planner.md
Normal file
@@ -0,0 +1,83 @@
|
|||||||
|
# Planner Tool - Interactive Step-by-Step Planning
|
||||||
|
|
||||||
|
**Break down complex projects into manageable, structured plans through step-by-step thinking**
|
||||||
|
|
||||||
|
The `planner` tool helps you break down complex ideas, problems, or projects into multiple manageable steps. Perfect for system design, migration strategies,
|
||||||
|
architectural planning, and feature development with branching and revision capabilities.
|
||||||
|
|
||||||
|
## How It Works
|
||||||
|
|
||||||
|
The planner tool enables step-by-step thinking with incremental plan building:
|
||||||
|
|
||||||
|
1. **Start with step 1**: Describe the task or problem to plan
|
||||||
|
2. **Continue building**: Add subsequent steps, building the plan piece by piece
|
||||||
|
3. **Revise when needed**: Update earlier decisions as new insights emerge
|
||||||
|
4. **Branch alternatives**: Explore different approaches when multiple options exist
|
||||||
|
5. **Continue across sessions**: Resume planning later with full context
|
||||||
|
|
||||||
|
## Example Prompts
|
||||||
|
|
||||||
|
#### Pro Tip
|
||||||
|
Claude supports `sub-tasks` where it will spawn and run separate background tasks. You can ask Claude to
|
||||||
|
run Zen's planner with two separate ideas. Then when it's done, use Zen's `consensus` tool to pass the entire
|
||||||
|
plan and get expert perspective from two powerful AI models on which one to work on first! Like performing **AB** testing
|
||||||
|
in one-go without the wait!
|
||||||
|
|
||||||
|
```
|
||||||
|
Create two separate sub-tasks: in one, using planner tool show me how to add natural language support
|
||||||
|
to my cooking app. In the other sub-task, use planner to plan how to add support for voice notes to my cooking app.
|
||||||
|
Once done, start a consensus by sharing both plans to o3 and flash to give me the final verdict. Which one do
|
||||||
|
I implement first?
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Use zen's planner and show me how to add real-time notifications to our mobile app
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Using the planner tool, show me how to add CoreData sync to my app, include any sub-steps
|
||||||
|
```
|
||||||
|
|
||||||
|
## Key Features
|
||||||
|
|
||||||
|
- **Step-by-step breakdown**: Build plans incrementally with full context awareness
|
||||||
|
- **Branching support**: Explore alternative approaches when needed
|
||||||
|
- **Revision capabilities**: Update earlier decisions as new insights emerge
|
||||||
|
- **Multi-session continuation**: Resume planning across multiple sessions with context
|
||||||
|
- **Dynamic adjustment**: Modify step count and approach as planning progresses
|
||||||
|
- **Visual presentation**: ASCII charts, diagrams, and structured formatting
|
||||||
|
- **Professional output**: Clean, structured plans without emojis or time estimates
|
||||||
|
|
||||||
|
## More Examples
|
||||||
|
|
||||||
|
```
|
||||||
|
Using planner, plan the architecture for a new real-time chat system with 100k concurrent users
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Create a plan using zen for migrating our React app from JavaScript to TypeScript
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Develop a plan using zen for implementing CI/CD pipelines across our development teams
|
||||||
|
```
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
- **Start broad, then narrow**: Begin with high-level strategy, then add implementation details
|
||||||
|
- **Include constraints**: Consider technical, organizational, and resource limitations
|
||||||
|
- **Plan for validation**: Include testing and verification steps
|
||||||
|
- **Think about dependencies**: Identify what needs to happen before each step
|
||||||
|
- **Consider alternatives**: Note when multiple approaches are viable
|
||||||
|
- **Enable continuation**: Use continuation_id for multi-session planning
|
||||||
|
|
||||||
|
## Continue With a New Plan
|
||||||
|
|
||||||
|
Like all other tools in Zen, you can `continue` with a new plan using the output from a previous plan by simply saying
|
||||||
|
|
||||||
|
```
|
||||||
|
Continue with zen's consensus tool and find out what o3:for and flash:against think of the plan
|
||||||
|
```
|
||||||
|
|
||||||
|
You can mix and match and take one output and feed it into another, continuing from where you left off using a different
|
||||||
|
tool / model combination.
|
||||||
@@ -54,6 +54,7 @@ from tools import (
|
|||||||
ConsensusTool,
|
ConsensusTool,
|
||||||
DebugIssueTool,
|
DebugIssueTool,
|
||||||
ListModelsTool,
|
ListModelsTool,
|
||||||
|
PlannerTool,
|
||||||
Precommit,
|
Precommit,
|
||||||
RefactorTool,
|
RefactorTool,
|
||||||
TestGenerationTool,
|
TestGenerationTool,
|
||||||
@@ -161,6 +162,7 @@ TOOLS = {
|
|||||||
"chat": ChatTool(), # Interactive development chat and brainstorming
|
"chat": ChatTool(), # Interactive development chat and brainstorming
|
||||||
"consensus": ConsensusTool(), # Multi-model consensus for diverse perspectives on technical proposals
|
"consensus": ConsensusTool(), # Multi-model consensus for diverse perspectives on technical proposals
|
||||||
"listmodels": ListModelsTool(), # List all available AI models by provider
|
"listmodels": ListModelsTool(), # List all available AI models by provider
|
||||||
|
"planner": PlannerTool(), # A task or problem to plan out as several smaller steps
|
||||||
"precommit": Precommit(), # Pre-commit validation of git changes
|
"precommit": Precommit(), # Pre-commit validation of git changes
|
||||||
"testgen": TestGenerationTool(), # Comprehensive test generation with edge case coverage
|
"testgen": TestGenerationTool(), # Comprehensive test generation with edge case coverage
|
||||||
"refactor": RefactorTool(), # Intelligent code refactoring suggestions with precise line references
|
"refactor": RefactorTool(), # Intelligent code refactoring suggestions with precise line references
|
||||||
@@ -214,6 +216,11 @@ PROMPT_TEMPLATES = {
|
|||||||
"description": "Trace code execution paths",
|
"description": "Trace code execution paths",
|
||||||
"template": "Generate tracer analysis with {model}",
|
"template": "Generate tracer analysis with {model}",
|
||||||
},
|
},
|
||||||
|
"planner": {
|
||||||
|
"name": "planner",
|
||||||
|
"description": "Break down complex ideas, problems, or projects into multiple manageable steps",
|
||||||
|
"template": "Create a detailed plan with {model}",
|
||||||
|
},
|
||||||
"listmodels": {
|
"listmodels": {
|
||||||
"name": "listmodels",
|
"name": "listmodels",
|
||||||
"description": "List available AI models",
|
"description": "List available AI models",
|
||||||
|
|||||||
@@ -24,6 +24,8 @@ from .test_ollama_custom_url import OllamaCustomUrlTest
|
|||||||
from .test_openrouter_fallback import OpenRouterFallbackTest
|
from .test_openrouter_fallback import OpenRouterFallbackTest
|
||||||
from .test_openrouter_models import OpenRouterModelsTest
|
from .test_openrouter_models import OpenRouterModelsTest
|
||||||
from .test_per_tool_deduplication import PerToolDeduplicationTest
|
from .test_per_tool_deduplication import PerToolDeduplicationTest
|
||||||
|
from .test_planner_continuation_history import PlannerContinuationHistoryTest
|
||||||
|
from .test_planner_validation import PlannerValidationTest
|
||||||
from .test_redis_validation import RedisValidationTest
|
from .test_redis_validation import RedisValidationTest
|
||||||
from .test_refactor_validation import RefactorValidationTest
|
from .test_refactor_validation import RefactorValidationTest
|
||||||
from .test_testgen_validation import TestGenValidationTest
|
from .test_testgen_validation import TestGenValidationTest
|
||||||
@@ -46,6 +48,8 @@ TEST_REGISTRY = {
|
|||||||
"ollama_custom_url": OllamaCustomUrlTest,
|
"ollama_custom_url": OllamaCustomUrlTest,
|
||||||
"openrouter_fallback": OpenRouterFallbackTest,
|
"openrouter_fallback": OpenRouterFallbackTest,
|
||||||
"openrouter_models": OpenRouterModelsTest,
|
"openrouter_models": OpenRouterModelsTest,
|
||||||
|
"planner_validation": PlannerValidationTest,
|
||||||
|
"planner_continuation_history": PlannerContinuationHistoryTest,
|
||||||
"token_allocation_validation": TokenAllocationValidationTest,
|
"token_allocation_validation": TokenAllocationValidationTest,
|
||||||
"testgen_validation": TestGenValidationTest,
|
"testgen_validation": TestGenValidationTest,
|
||||||
"refactor_validation": RefactorValidationTest,
|
"refactor_validation": RefactorValidationTest,
|
||||||
@@ -75,6 +79,8 @@ __all__ = [
|
|||||||
"OllamaCustomUrlTest",
|
"OllamaCustomUrlTest",
|
||||||
"OpenRouterFallbackTest",
|
"OpenRouterFallbackTest",
|
||||||
"OpenRouterModelsTest",
|
"OpenRouterModelsTest",
|
||||||
|
"PlannerValidationTest",
|
||||||
|
"PlannerContinuationHistoryTest",
|
||||||
"TokenAllocationValidationTest",
|
"TokenAllocationValidationTest",
|
||||||
"TestGenValidationTest",
|
"TestGenValidationTest",
|
||||||
"RefactorValidationTest",
|
"RefactorValidationTest",
|
||||||
|
|||||||
361
simulator_tests/test_planner_continuation_history.py
Normal file
361
simulator_tests/test_planner_continuation_history.py
Normal file
@@ -0,0 +1,361 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Planner Continuation History Test
|
||||||
|
|
||||||
|
Tests the planner tool's continuation history building across multiple completed planning sessions:
|
||||||
|
- Multiple completed planning sessions in sequence
|
||||||
|
- History context loading for new planning sessions
|
||||||
|
- Proper context building with multiple completed plans
|
||||||
|
- Context accumulation and retrieval
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from .base_test import BaseSimulatorTest
|
||||||
|
|
||||||
|
|
||||||
|
class PlannerContinuationHistoryTest(BaseSimulatorTest):
|
||||||
|
"""Test planner tool's continuation history building across multiple completed sessions"""
|
||||||
|
|
||||||
|
@property
|
||||||
|
def test_name(self) -> str:
|
||||||
|
return "planner_continuation_history"
|
||||||
|
|
||||||
|
@property
|
||||||
|
def test_description(self) -> str:
|
||||||
|
return "Planner tool continuation history building across multiple completed planning sessions"
|
||||||
|
|
||||||
|
def run_test(self) -> bool:
|
||||||
|
"""Test planner continuation history building across multiple completed sessions"""
|
||||||
|
try:
|
||||||
|
self.logger.info("Test: Planner continuation history validation")
|
||||||
|
|
||||||
|
# Test 1: Complete first planning session (microservices migration)
|
||||||
|
if not self._test_first_planning_session():
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Test 2: Complete second planning session with context from first
|
||||||
|
if not self._test_second_planning_session():
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Test 3: Complete third planning session with context from both previous
|
||||||
|
if not self._test_third_planning_session():
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Test 4: Validate context accumulation across all sessions
|
||||||
|
if not self._test_context_accumulation():
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ All planner continuation history tests passed")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Planner continuation history test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_first_planning_session(self) -> bool:
|
||||||
|
"""Complete first planning session - microservices migration"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 2.1: First planning session - Microservices Migration")
|
||||||
|
|
||||||
|
# Step 1: Start migration planning
|
||||||
|
self.logger.info(" 2.1.1: Start migration planning")
|
||||||
|
response1, continuation_id = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "I need to plan a microservices migration for our monolithic e-commerce platform. Let me analyze the current monolith structure.",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 3,
|
||||||
|
"next_step_required": True,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1 or not continuation_id:
|
||||||
|
self.logger.error("Failed to start first planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Step 2: Domain identification
|
||||||
|
self.logger.info(" 2.1.2: Domain identification")
|
||||||
|
response2, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "I've identified key domains: User Management, Product Catalog, Order Processing, Payment, and Inventory. Each will become a separate microservice.",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 3,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response2:
|
||||||
|
self.logger.error("Failed step 2 of first planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Step 3: Complete migration plan
|
||||||
|
self.logger.info(" 2.1.3: Complete migration plan")
|
||||||
|
response3, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Migration strategy: Phase 1 - Extract User Management service, Phase 2 - Product Catalog and Inventory services, Phase 3 - Order Processing and Payment services. Use API Gateway for service coordination.",
|
||||||
|
"step_number": 3,
|
||||||
|
"total_steps": 3,
|
||||||
|
"next_step_required": False, # Complete the session
|
||||||
|
"continuation_id": continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response3:
|
||||||
|
self.logger.error("Failed to complete first planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate completion
|
||||||
|
response3_data = self._parse_planner_response(response3)
|
||||||
|
if not response3_data.get("planning_complete"):
|
||||||
|
self.logger.error("First planning session not marked as complete")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if not response3_data.get("plan_summary"):
|
||||||
|
self.logger.error("First planning session missing plan summary")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ First planning session completed successfully")
|
||||||
|
|
||||||
|
# Store for next test
|
||||||
|
self.first_continuation_id = continuation_id
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"First planning session test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_second_planning_session(self) -> bool:
|
||||||
|
"""Complete second planning session with context from first"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 2.2: Second planning session - Database Strategy")
|
||||||
|
|
||||||
|
# Step 1: Start database planning with previous context
|
||||||
|
self.logger.info(" 2.2.1: Start database strategy with microservices context")
|
||||||
|
response1, new_continuation_id = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Now I need to plan the database strategy for the microservices architecture. I'll design how each service will manage its data.",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 2,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": self.first_continuation_id, # Use first session's continuation_id
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1 or not new_continuation_id:
|
||||||
|
self.logger.error("Failed to start second planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate context loading
|
||||||
|
response1_data = self._parse_planner_response(response1)
|
||||||
|
if "previous_plan_context" not in response1_data:
|
||||||
|
self.logger.error("Second session should load context from first completed session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check context contains migration content
|
||||||
|
context = response1_data["previous_plan_context"].lower()
|
||||||
|
if "migration" not in context and "microservices" not in context:
|
||||||
|
self.logger.error("Context should contain migration/microservices content from first session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Second session loaded context from first completed session")
|
||||||
|
|
||||||
|
# Step 2: Complete database plan
|
||||||
|
self.logger.info(" 2.2.2: Complete database strategy")
|
||||||
|
response2, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Database strategy: Each microservice gets its own database (database-per-service pattern). Use event sourcing for cross-service communication and eventual consistency. Implement CQRS for read/write separation.",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 2,
|
||||||
|
"next_step_required": False, # Complete the session
|
||||||
|
"continuation_id": new_continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response2:
|
||||||
|
self.logger.error("Failed to complete second planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate completion
|
||||||
|
response2_data = self._parse_planner_response(response2)
|
||||||
|
if not response2_data.get("planning_complete"):
|
||||||
|
self.logger.error("Second planning session not marked as complete")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Second planning session completed successfully")
|
||||||
|
|
||||||
|
# Store for next test
|
||||||
|
self.second_continuation_id = new_continuation_id
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Second planning session test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_third_planning_session(self) -> bool:
|
||||||
|
"""Complete third planning session with context from both previous"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 2.3: Third planning session - Deployment Strategy")
|
||||||
|
|
||||||
|
# Step 1: Start deployment planning with accumulated context
|
||||||
|
self.logger.info(" 2.3.1: Start deployment strategy with accumulated context")
|
||||||
|
response1, new_continuation_id = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Now I need to plan the deployment strategy that supports both the microservices architecture and the database strategy. I'll design the infrastructure and deployment pipeline.",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 2,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": self.second_continuation_id, # Use second session's continuation_id
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1 or not new_continuation_id:
|
||||||
|
self.logger.error("Failed to start third planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate context loading
|
||||||
|
response1_data = self._parse_planner_response(response1)
|
||||||
|
if "previous_plan_context" not in response1_data:
|
||||||
|
self.logger.error("Third session should load context from previous completed sessions")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check context contains content from most recent completed session
|
||||||
|
context = response1_data["previous_plan_context"].lower()
|
||||||
|
expected_terms = ["database", "event sourcing", "cqrs"]
|
||||||
|
found_terms = [term for term in expected_terms if term in context]
|
||||||
|
|
||||||
|
if len(found_terms) == 0:
|
||||||
|
self.logger.error(
|
||||||
|
f"Context should contain database strategy content from second session. Context: {context[:200]}..."
|
||||||
|
)
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Third session loaded context from most recent completed session")
|
||||||
|
|
||||||
|
# Step 2: Complete deployment plan
|
||||||
|
self.logger.info(" 2.3.2: Complete deployment strategy")
|
||||||
|
response2, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Deployment strategy: Use Kubernetes for container orchestration with Helm charts. Implement CI/CD pipeline with GitOps. Use service mesh (Istio) for traffic management, monitoring, and security. Deploy databases in separate namespaces with backup automation.",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 2,
|
||||||
|
"next_step_required": False, # Complete the session
|
||||||
|
"continuation_id": new_continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response2:
|
||||||
|
self.logger.error("Failed to complete third planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate completion
|
||||||
|
response2_data = self._parse_planner_response(response2)
|
||||||
|
if not response2_data.get("planning_complete"):
|
||||||
|
self.logger.error("Third planning session not marked as complete")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Third planning session completed successfully")
|
||||||
|
|
||||||
|
# Store for final test
|
||||||
|
self.third_continuation_id = new_continuation_id
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Third planning session test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_context_accumulation(self) -> bool:
|
||||||
|
"""Test that context properly accumulates across multiple completed sessions"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 2.4: Testing context accumulation across all sessions")
|
||||||
|
|
||||||
|
# Start a new planning session that should load context from the most recent completed session
|
||||||
|
self.logger.info(" 2.4.1: Start monitoring planning with full context history")
|
||||||
|
response1, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Finally, I need to plan the monitoring and observability strategy that works with the microservices, database, and deployment architecture.",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 1,
|
||||||
|
"next_step_required": False,
|
||||||
|
"continuation_id": self.third_continuation_id, # Use third session's continuation_id
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1:
|
||||||
|
self.logger.error("Failed to start monitoring planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate context loading
|
||||||
|
response1_data = self._parse_planner_response(response1)
|
||||||
|
if "previous_plan_context" not in response1_data:
|
||||||
|
self.logger.error("Final session should load context from previous completed sessions")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate context contains most recent completed session content
|
||||||
|
context = response1_data["previous_plan_context"].lower()
|
||||||
|
|
||||||
|
# Should contain deployment strategy content (most recent)
|
||||||
|
deployment_terms = ["kubernetes", "deployment", "istio", "gitops"]
|
||||||
|
found_deployment_terms = [term for term in deployment_terms if term in context]
|
||||||
|
|
||||||
|
if len(found_deployment_terms) == 0:
|
||||||
|
self.logger.error(f"Context should contain deployment strategy content. Context: {context[:300]}...")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Context accumulation working correctly")
|
||||||
|
|
||||||
|
# Validate this creates a complete planning session
|
||||||
|
if not response1_data.get("planning_complete"):
|
||||||
|
self.logger.error("Final planning session should be marked as complete")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Context accumulation test completed successfully")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Context accumulation test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def call_mcp_tool(self, tool_name: str, params: dict) -> tuple[Optional[str], Optional[str]]:
|
||||||
|
"""Call an MCP tool via Claude CLI (docker exec) - override for planner-specific response handling"""
|
||||||
|
# Use parent implementation to get the raw response
|
||||||
|
response_text, _ = super().call_mcp_tool(tool_name, params)
|
||||||
|
|
||||||
|
if not response_text:
|
||||||
|
return None, None
|
||||||
|
|
||||||
|
# Extract continuation_id from planner response specifically
|
||||||
|
continuation_id = self._extract_planner_continuation_id(response_text)
|
||||||
|
|
||||||
|
return response_text, continuation_id
|
||||||
|
|
||||||
|
def _extract_planner_continuation_id(self, response_text: str) -> Optional[str]:
|
||||||
|
"""Extract continuation_id from planner response"""
|
||||||
|
try:
|
||||||
|
# Parse the response - it's now direct JSON, not wrapped
|
||||||
|
response_data = json.loads(response_text)
|
||||||
|
return response_data.get("continuation_id")
|
||||||
|
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
self.logger.debug(f"Failed to parse response for planner continuation_id: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _parse_planner_response(self, response_text: str) -> dict:
|
||||||
|
"""Parse planner tool JSON response"""
|
||||||
|
try:
|
||||||
|
# Parse the response - it's now direct JSON, not wrapped
|
||||||
|
return json.loads(response_text)
|
||||||
|
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
self.logger.error(f"Failed to parse planner response as JSON: {e}")
|
||||||
|
self.logger.error(f"Response text: {response_text[:500]}...")
|
||||||
|
return {}
|
||||||
436
simulator_tests/test_planner_validation.py
Normal file
436
simulator_tests/test_planner_validation.py
Normal file
@@ -0,0 +1,436 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Planner Tool Validation Test
|
||||||
|
|
||||||
|
Tests the planner tool's sequential planning capabilities including:
|
||||||
|
- Step-by-step planning with proper JSON responses
|
||||||
|
- Continuation logic across planning sessions
|
||||||
|
- Branching and revision capabilities
|
||||||
|
- Previous plan context loading
|
||||||
|
- Plan completion and summary storage
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from .base_test import BaseSimulatorTest
|
||||||
|
|
||||||
|
|
||||||
|
class PlannerValidationTest(BaseSimulatorTest):
|
||||||
|
"""Test planner tool's sequential planning and continuation features"""
|
||||||
|
|
||||||
|
@property
|
||||||
|
def test_name(self) -> str:
|
||||||
|
return "planner_validation"
|
||||||
|
|
||||||
|
@property
|
||||||
|
def test_description(self) -> str:
|
||||||
|
return "Planner tool sequential planning and continuation validation"
|
||||||
|
|
||||||
|
def run_test(self) -> bool:
|
||||||
|
"""Test planner tool sequential planning capabilities"""
|
||||||
|
try:
|
||||||
|
self.logger.info("Test: Planner tool validation")
|
||||||
|
|
||||||
|
# Test 1: Single planning session with multiple steps
|
||||||
|
if not self._test_single_planning_session():
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Test 2: Plan completion and continuation to new planning session
|
||||||
|
if not self._test_plan_continuation():
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Test 3: Branching and revision capabilities
|
||||||
|
if not self._test_branching_and_revision():
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ All planner validation tests passed")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Planner validation test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_single_planning_session(self) -> bool:
|
||||||
|
"""Test a complete planning session with multiple steps"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 1.1: Testing single planning session")
|
||||||
|
|
||||||
|
# Step 1: Start planning
|
||||||
|
self.logger.info(" 1.1.1: Step 1 - Initial planning step")
|
||||||
|
response1, continuation_id = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "I need to plan a microservices migration for our monolithic e-commerce platform. Let me start by understanding the current architecture and identifying the key business domains.",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 5,
|
||||||
|
"next_step_required": True,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1 or not continuation_id:
|
||||||
|
self.logger.error("Failed to get initial planning response")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Parse and validate JSON response
|
||||||
|
response1_data = self._parse_planner_response(response1)
|
||||||
|
if not response1_data:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate step 1 response structure
|
||||||
|
if not self._validate_step_response(response1_data, 1, 5, True, "planning_success"):
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(f" ✅ Step 1 successful, continuation_id: {continuation_id}")
|
||||||
|
|
||||||
|
# Step 2: Continue planning
|
||||||
|
self.logger.info(" 1.1.2: Step 2 - Domain identification")
|
||||||
|
response2, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Based on my analysis, I can identify the main business domains: User Management, Product Catalog, Order Processing, Payment, and Inventory. Let me plan how to extract these into separate services.",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 5,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response2:
|
||||||
|
self.logger.error("Failed to continue planning to step 2")
|
||||||
|
return False
|
||||||
|
|
||||||
|
response2_data = self._parse_planner_response(response2)
|
||||||
|
if not self._validate_step_response(response2_data, 2, 5, True, "planning_success"):
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Step 2 successful")
|
||||||
|
|
||||||
|
# Step 3: Final step
|
||||||
|
self.logger.info(" 1.1.3: Step 3 - Final planning step")
|
||||||
|
response3, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Now I'll create a phased migration strategy: Phase 1 - Extract User Management, Phase 2 - Product Catalog and Inventory, Phase 3 - Order Processing and Payment services. This completes the initial migration plan.",
|
||||||
|
"step_number": 3,
|
||||||
|
"total_steps": 3, # Adjusted total
|
||||||
|
"next_step_required": False, # Final step
|
||||||
|
"continuation_id": continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response3:
|
||||||
|
self.logger.error("Failed to complete planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
response3_data = self._parse_planner_response(response3)
|
||||||
|
if not self._validate_final_step_response(response3_data, 3, 3):
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Planning session completed successfully")
|
||||||
|
|
||||||
|
# Store continuation_id for next test
|
||||||
|
self.migration_continuation_id = continuation_id
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Single planning session test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_plan_continuation(self) -> bool:
|
||||||
|
"""Test continuing from a previous completed plan"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 1.2: Testing plan continuation with previous context")
|
||||||
|
|
||||||
|
# Start a new planning session using the continuation_id from previous completed plan
|
||||||
|
self.logger.info(" 1.2.1: New planning session with previous plan context")
|
||||||
|
response1, new_continuation_id = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Now that I have the microservices migration plan, let me plan the database strategy. I need to decide how to handle data consistency across the new services.",
|
||||||
|
"step_number": 1, # New planning session starts at step 1
|
||||||
|
"total_steps": 4,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": self.migration_continuation_id, # Use previous plan's continuation_id
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1 or not new_continuation_id:
|
||||||
|
self.logger.error("Failed to start new planning session with context")
|
||||||
|
return False
|
||||||
|
|
||||||
|
response1_data = self._parse_planner_response(response1)
|
||||||
|
if not response1_data:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Should have previous plan context
|
||||||
|
if "previous_plan_context" not in response1_data:
|
||||||
|
self.logger.error("Expected previous_plan_context in new planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check for key terms from the previous plan
|
||||||
|
context = response1_data["previous_plan_context"].lower()
|
||||||
|
if "migration" not in context and "plan" not in context:
|
||||||
|
self.logger.error("Previous plan context doesn't contain expected content")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ New planning session loaded previous plan context")
|
||||||
|
|
||||||
|
# Continue the new planning session (step 2+ should NOT load context)
|
||||||
|
self.logger.info(" 1.2.2: Continue new planning session (no context loading)")
|
||||||
|
response2, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "I'll implement a database-per-service pattern with eventual consistency using event sourcing for cross-service communication.",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 4,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": new_continuation_id, # Same continuation, step 2
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response2:
|
||||||
|
self.logger.error("Failed to continue new planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
response2_data = self._parse_planner_response(response2)
|
||||||
|
if not response2_data:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Step 2+ should NOT have previous_plan_context (only step 1 with continuation_id gets context)
|
||||||
|
if "previous_plan_context" in response2_data:
|
||||||
|
self.logger.error("Step 2 should NOT have previous_plan_context")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Step 2 correctly has no previous context (as expected)")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Plan continuation test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _test_branching_and_revision(self) -> bool:
|
||||||
|
"""Test branching and revision capabilities"""
|
||||||
|
try:
|
||||||
|
self.logger.info(" 1.3: Testing branching and revision capabilities")
|
||||||
|
|
||||||
|
# Start a new planning session for testing branching
|
||||||
|
self.logger.info(" 1.3.1: Start planning session for branching test")
|
||||||
|
response1, continuation_id = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Let me plan the deployment strategy for the microservices. I'll consider different deployment options.",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 4,
|
||||||
|
"next_step_required": True,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response1 or not continuation_id:
|
||||||
|
self.logger.error("Failed to start branching test planning session")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Test branching
|
||||||
|
self.logger.info(" 1.3.2: Create a branch from step 1")
|
||||||
|
response2, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Branch A: I'll explore Kubernetes deployment with service mesh (Istio) for advanced traffic management and observability.",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 4,
|
||||||
|
"next_step_required": True,
|
||||||
|
"is_branch_point": True,
|
||||||
|
"branch_from_step": 1,
|
||||||
|
"branch_id": "kubernetes-istio",
|
||||||
|
"continuation_id": continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response2:
|
||||||
|
self.logger.error("Failed to create branch")
|
||||||
|
return False
|
||||||
|
|
||||||
|
response2_data = self._parse_planner_response(response2)
|
||||||
|
if not response2_data:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate branching metadata
|
||||||
|
metadata = response2_data.get("metadata", {})
|
||||||
|
if not metadata.get("is_branch_point"):
|
||||||
|
self.logger.error("Branch point not properly recorded in metadata")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if metadata.get("branch_id") != "kubernetes-istio":
|
||||||
|
self.logger.error("Branch ID not properly recorded")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if "kubernetes-istio" not in metadata.get("branches", []):
|
||||||
|
self.logger.error("Branch not recorded in branches list")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Branching working correctly")
|
||||||
|
|
||||||
|
# Test revision
|
||||||
|
self.logger.info(" 1.3.3: Revise step 2")
|
||||||
|
response3, _ = self.call_mcp_tool(
|
||||||
|
"planner",
|
||||||
|
{
|
||||||
|
"step": "Revision: Actually, let me revise the Kubernetes approach. I'll use a simpler Docker Swarm deployment initially, then migrate to Kubernetes later.",
|
||||||
|
"step_number": 3,
|
||||||
|
"total_steps": 4,
|
||||||
|
"next_step_required": True,
|
||||||
|
"is_step_revision": True,
|
||||||
|
"revises_step_number": 2,
|
||||||
|
"continuation_id": continuation_id,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response3:
|
||||||
|
self.logger.error("Failed to create revision")
|
||||||
|
return False
|
||||||
|
|
||||||
|
response3_data = self._parse_planner_response(response3)
|
||||||
|
if not response3_data:
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Validate revision metadata
|
||||||
|
metadata = response3_data.get("metadata", {})
|
||||||
|
if not metadata.get("is_step_revision"):
|
||||||
|
self.logger.error("Step revision not properly recorded in metadata")
|
||||||
|
return False
|
||||||
|
|
||||||
|
if metadata.get("revises_step_number") != 2:
|
||||||
|
self.logger.error("Revised step number not properly recorded")
|
||||||
|
return False
|
||||||
|
|
||||||
|
self.logger.info(" ✅ Revision working correctly")
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Branching and revision test failed: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def call_mcp_tool(self, tool_name: str, params: dict) -> tuple[Optional[str], Optional[str]]:
|
||||||
|
"""Call an MCP tool via Claude CLI (docker exec) - override for planner-specific response handling"""
|
||||||
|
# Use parent implementation to get the raw response
|
||||||
|
response_text, _ = super().call_mcp_tool(tool_name, params)
|
||||||
|
|
||||||
|
if not response_text:
|
||||||
|
return None, None
|
||||||
|
|
||||||
|
# Extract continuation_id from planner response specifically
|
||||||
|
continuation_id = self._extract_planner_continuation_id(response_text)
|
||||||
|
|
||||||
|
return response_text, continuation_id
|
||||||
|
|
||||||
|
def _extract_planner_continuation_id(self, response_text: str) -> Optional[str]:
|
||||||
|
"""Extract continuation_id from planner response"""
|
||||||
|
try:
|
||||||
|
# Parse the response - it's now direct JSON, not wrapped
|
||||||
|
response_data = json.loads(response_text)
|
||||||
|
return response_data.get("continuation_id")
|
||||||
|
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
self.logger.debug(f"Failed to parse response for planner continuation_id: {e}")
|
||||||
|
return None
|
||||||
|
|
||||||
|
def _parse_planner_response(self, response_text: str) -> dict:
|
||||||
|
"""Parse planner tool JSON response"""
|
||||||
|
try:
|
||||||
|
# Parse the response - it's now direct JSON, not wrapped
|
||||||
|
return json.loads(response_text)
|
||||||
|
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
self.logger.error(f"Failed to parse planner response as JSON: {e}")
|
||||||
|
self.logger.error(f"Response text: {response_text[:500]}...")
|
||||||
|
return {}
|
||||||
|
|
||||||
|
def _validate_step_response(
|
||||||
|
self,
|
||||||
|
response_data: dict,
|
||||||
|
expected_step: int,
|
||||||
|
expected_total: int,
|
||||||
|
expected_next_required: bool,
|
||||||
|
expected_status: str,
|
||||||
|
) -> bool:
|
||||||
|
"""Validate a planning step response structure"""
|
||||||
|
try:
|
||||||
|
# Check status
|
||||||
|
if response_data.get("status") != expected_status:
|
||||||
|
self.logger.error(f"Expected status '{expected_status}', got '{response_data.get('status')}'")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check step number
|
||||||
|
if response_data.get("step_number") != expected_step:
|
||||||
|
self.logger.error(f"Expected step_number {expected_step}, got {response_data.get('step_number')}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check total steps
|
||||||
|
if response_data.get("total_steps") != expected_total:
|
||||||
|
self.logger.error(f"Expected total_steps {expected_total}, got {response_data.get('total_steps')}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check next_step_required
|
||||||
|
if response_data.get("next_step_required") != expected_next_required:
|
||||||
|
self.logger.error(
|
||||||
|
f"Expected next_step_required {expected_next_required}, got {response_data.get('next_step_required')}"
|
||||||
|
)
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check that step_content exists
|
||||||
|
if not response_data.get("step_content"):
|
||||||
|
self.logger.error("Missing step_content in response")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check metadata exists
|
||||||
|
if "metadata" not in response_data:
|
||||||
|
self.logger.error("Missing metadata in response")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check next_steps guidance
|
||||||
|
if not response_data.get("next_steps"):
|
||||||
|
self.logger.error("Missing next_steps guidance in response")
|
||||||
|
return False
|
||||||
|
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Error validating step response: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def _validate_final_step_response(self, response_data: dict, expected_step: int, expected_total: int) -> bool:
|
||||||
|
"""Validate a final planning step response"""
|
||||||
|
try:
|
||||||
|
# Basic step validation
|
||||||
|
if not self._validate_step_response(
|
||||||
|
response_data, expected_step, expected_total, False, "planning_success"
|
||||||
|
):
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check planning_complete flag
|
||||||
|
if not response_data.get("planning_complete"):
|
||||||
|
self.logger.error("Expected planning_complete=true for final step")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check plan_summary exists
|
||||||
|
if not response_data.get("plan_summary"):
|
||||||
|
self.logger.error("Missing plan_summary in final step")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check plan_summary contains expected content
|
||||||
|
plan_summary = response_data.get("plan_summary", "")
|
||||||
|
if "COMPLETE PLAN:" not in plan_summary:
|
||||||
|
self.logger.error("plan_summary doesn't contain 'COMPLETE PLAN:' marker")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Check next_steps mentions completion
|
||||||
|
next_steps = response_data.get("next_steps", "")
|
||||||
|
if "complete" not in next_steps.lower():
|
||||||
|
self.logger.error("next_steps doesn't indicate planning completion")
|
||||||
|
return False
|
||||||
|
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Error validating final step response: {e}")
|
||||||
|
return False
|
||||||
@@ -7,6 +7,7 @@ from .chat_prompt import CHAT_PROMPT
|
|||||||
from .codereview_prompt import CODEREVIEW_PROMPT
|
from .codereview_prompt import CODEREVIEW_PROMPT
|
||||||
from .consensus_prompt import CONSENSUS_PROMPT
|
from .consensus_prompt import CONSENSUS_PROMPT
|
||||||
from .debug_prompt import DEBUG_ISSUE_PROMPT
|
from .debug_prompt import DEBUG_ISSUE_PROMPT
|
||||||
|
from .planner_prompt import PLANNER_PROMPT
|
||||||
from .precommit_prompt import PRECOMMIT_PROMPT
|
from .precommit_prompt import PRECOMMIT_PROMPT
|
||||||
from .refactor_prompt import REFACTOR_PROMPT
|
from .refactor_prompt import REFACTOR_PROMPT
|
||||||
from .testgen_prompt import TESTGEN_PROMPT
|
from .testgen_prompt import TESTGEN_PROMPT
|
||||||
@@ -19,6 +20,7 @@ __all__ = [
|
|||||||
"ANALYZE_PROMPT",
|
"ANALYZE_PROMPT",
|
||||||
"CHAT_PROMPT",
|
"CHAT_PROMPT",
|
||||||
"CONSENSUS_PROMPT",
|
"CONSENSUS_PROMPT",
|
||||||
|
"PLANNER_PROMPT",
|
||||||
"PRECOMMIT_PROMPT",
|
"PRECOMMIT_PROMPT",
|
||||||
"REFACTOR_PROMPT",
|
"REFACTOR_PROMPT",
|
||||||
"TESTGEN_PROMPT",
|
"TESTGEN_PROMPT",
|
||||||
|
|||||||
124
systemprompts/planner_prompt.py
Normal file
124
systemprompts/planner_prompt.py
Normal file
@@ -0,0 +1,124 @@
|
|||||||
|
"""
|
||||||
|
Planner tool system prompts
|
||||||
|
"""
|
||||||
|
|
||||||
|
PLANNER_PROMPT = """
|
||||||
|
You are an expert, seasoned planning consultant and systems architect with deep expertise in plan structuring, risk assessment,
|
||||||
|
and software development strategy. You have extensive experience organizing complex projects, guiding technical implementations,
|
||||||
|
and maintaining a sharp understanding of both your own and competing products across the market. From microservices
|
||||||
|
to global-scale deployments, your technical insight and architectural knowledge are unmatched. There is nothing related
|
||||||
|
to software and software development that you're not aware of. All the latest frameworks, languages, trends, techniques
|
||||||
|
is something you have mastery in. Your role is to critically evaluate and refine plans to make them more robust,
|
||||||
|
efficient, and implementation-ready.
|
||||||
|
|
||||||
|
CRITICAL LINE NUMBER INSTRUCTIONS
|
||||||
|
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be
|
||||||
|
included in any code you generate. Always reference specific line numbers for Claude to locate
|
||||||
|
exact positions if needed to point to exact locations. Include a very short code excerpt alongside for clarity.
|
||||||
|
Include context_start_text and context_end_text as backup references. Never include "LINE│" markers in generated code
|
||||||
|
snippets.
|
||||||
|
|
||||||
|
IF MORE INFORMATION IS NEEDED
|
||||||
|
If Claude is discussing specific code, functions, or project components that was not given as part of the context,
|
||||||
|
and you need additional context (e.g., related files, configuration, dependencies, test files) to provide meaningful
|
||||||
|
collaboration, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the same file you've been
|
||||||
|
provided unless for some reason its content is missing or incomplete:
|
||||||
|
{"status": "clarification_required", "question": "<your brief question>",
|
||||||
|
"files_needed": ["[file name here]", "[or some folder/]"]}
|
||||||
|
|
||||||
|
PLANNING METHODOLOGY:
|
||||||
|
|
||||||
|
1. DECOMPOSITION: Break down the main objective into logical, sequential steps
|
||||||
|
2. DEPENDENCIES: Identify which steps depend on others and order them appropriately
|
||||||
|
3. BRANCHING: When multiple valid approaches exist, create branches to explore alternatives
|
||||||
|
4. ITERATION: Be willing to step back and refine earlier steps if new insights emerge
|
||||||
|
5. COMPLETENESS: Ensure all aspects of the task are covered without gaps
|
||||||
|
|
||||||
|
STEP STRUCTURE:
|
||||||
|
Each step in your plan MUST include:
|
||||||
|
- Step number and branch identifier (if branching)
|
||||||
|
- Clear, actionable description
|
||||||
|
- Prerequisites or dependencies
|
||||||
|
- Expected outcomes
|
||||||
|
- Potential challenges or considerations
|
||||||
|
- Alternative approaches (when applicable)
|
||||||
|
|
||||||
|
BRANCHING GUIDELINES:
|
||||||
|
- Use branches to explore different implementation strategies
|
||||||
|
- Label branches clearly (e.g., "Branch A: Microservices approach", "Branch B: Monolithic approach")
|
||||||
|
- Explain when and why to choose each branch
|
||||||
|
- Show how branches might reconverge
|
||||||
|
|
||||||
|
PLANNING PRINCIPLES:
|
||||||
|
- Start with high-level strategy, then add implementation details
|
||||||
|
- Consider technical, organizational, and resource constraints
|
||||||
|
- Include validation and testing steps
|
||||||
|
- Plan for error handling and rollback scenarios
|
||||||
|
- Think about maintenance and future extensibility
|
||||||
|
|
||||||
|
STRUCTURED JSON OUTPUT FORMAT:
|
||||||
|
You MUST respond with a properly formatted JSON object following this exact schema.
|
||||||
|
Do NOT include any text before or after the JSON. The response must be valid JSON only.
|
||||||
|
|
||||||
|
IF MORE INFORMATION IS NEEDED:
|
||||||
|
If you lack critical information to proceed with planning, you MUST only respond with:
|
||||||
|
{
|
||||||
|
"status": "clarification_required",
|
||||||
|
"question": "<your brief question>",
|
||||||
|
"files_needed": ["<file name here>", "<or some folder/>"]
|
||||||
|
}
|
||||||
|
|
||||||
|
FOR NORMAL PLANNING RESPONSES:
|
||||||
|
|
||||||
|
{
|
||||||
|
"status": "planning_success",
|
||||||
|
"step_number": <current step number>,
|
||||||
|
"total_steps": <estimated total steps>,
|
||||||
|
"next_step_required": <true/false>,
|
||||||
|
"step_content": "<detailed description of current planning step>",
|
||||||
|
"metadata": {
|
||||||
|
"branches": ["<list of branch IDs if any>"],
|
||||||
|
"step_history_length": <number of steps completed so far>,
|
||||||
|
"is_step_revision": <true/false>,
|
||||||
|
"revises_step_number": <number if this revises a previous step>,
|
||||||
|
"is_branch_point": <true/false>,
|
||||||
|
"branch_from_step": <step number if this branches from another step>,
|
||||||
|
"branch_id": "<unique branch identifier if creating/following a branch>",
|
||||||
|
"more_steps_needed": <true/false>
|
||||||
|
},
|
||||||
|
"continuation_id": "<thread_id for conversation continuity>",
|
||||||
|
"planning_complete": <true/false - set to true only on final step>,
|
||||||
|
"plan_summary": "<complete plan summary - only include when planning_complete is true>",
|
||||||
|
"next_steps": "<guidance for Claude on next actions>",
|
||||||
|
"previous_plan_context": "<context from previous completed plans - only on step 1 with continuation_id>"
|
||||||
|
}
|
||||||
|
|
||||||
|
PLANNING CONTENT GUIDELINES:
|
||||||
|
- step_content: Provide detailed planning analysis for the current step
|
||||||
|
- Include specific actions, prerequisites, outcomes, and considerations
|
||||||
|
- When branching, clearly explain the alternative approach and when to use it
|
||||||
|
- When completing planning, provide comprehensive plan_summary
|
||||||
|
- next_steps: Always guide Claude on what to do next (continue planning, implement, or branch)
|
||||||
|
|
||||||
|
PLAN PRESENTATION GUIDELINES:
|
||||||
|
When planning is complete (planning_complete: true), Claude should present the final plan with:
|
||||||
|
- Clear headings and numbered phases/sections
|
||||||
|
- Visual elements like ASCII charts for workflows, dependencies, or sequences
|
||||||
|
- Bullet points and sub-steps for detailed breakdowns
|
||||||
|
- Implementation guidance and next steps
|
||||||
|
- Visual organization (boxes, arrows, diagrams) for complex relationships
|
||||||
|
- Tables for comparisons or resource allocation
|
||||||
|
- Priority indicators and sequence information where relevant
|
||||||
|
|
||||||
|
IMPORTANT: Do NOT use emojis in plan presentations. Use clear text formatting, ASCII characters, and symbols only.
|
||||||
|
IMPORTANT: Do NOT mention time estimates, costs, or pricing unless explicitly requested by the user.
|
||||||
|
|
||||||
|
Example visual elements to use:
|
||||||
|
- Phase diagrams: Phase 1 → Phase 2 → Phase 3
|
||||||
|
- Dependency charts: A ← B ← C (C depends on B, B depends on A)
|
||||||
|
- Sequence boxes: [Phase 1: Setup] → [Phase 2: Development] → [Phase 3: Testing]
|
||||||
|
- Decision trees for branching strategies
|
||||||
|
- Resource allocation tables
|
||||||
|
|
||||||
|
Be thorough, practical, and consider edge cases. Your planning should be detailed enough that someone could follow it step-by-step to achieve the goal.
|
||||||
|
"""
|
||||||
413
tests/test_planner.py
Normal file
413
tests/test_planner.py
Normal file
@@ -0,0 +1,413 @@
|
|||||||
|
"""
|
||||||
|
Tests for the planner tool.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from unittest.mock import patch
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from tools.models import ToolModelCategory
|
||||||
|
from tools.planner import PlannerRequest, PlannerTool
|
||||||
|
|
||||||
|
|
||||||
|
class TestPlannerTool:
|
||||||
|
"""Test suite for PlannerTool."""
|
||||||
|
|
||||||
|
def test_tool_metadata(self):
|
||||||
|
"""Test basic tool metadata and configuration."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
|
||||||
|
assert tool.get_name() == "planner"
|
||||||
|
assert "SEQUENTIAL PLANNER" in tool.get_description()
|
||||||
|
assert tool.get_default_temperature() == 0.5 # TEMPERATURE_BALANCED
|
||||||
|
assert tool.get_model_category() == ToolModelCategory.EXTENDED_REASONING
|
||||||
|
assert tool.get_default_thinking_mode() == "high"
|
||||||
|
|
||||||
|
def test_request_validation(self):
|
||||||
|
"""Test Pydantic request model validation."""
|
||||||
|
# Valid interactive step request
|
||||||
|
step_request = PlannerRequest(
|
||||||
|
step="Create database migration scripts", step_number=3, total_steps=10, next_step_required=True
|
||||||
|
)
|
||||||
|
assert step_request.step == "Create database migration scripts"
|
||||||
|
assert step_request.step_number == 3
|
||||||
|
assert step_request.next_step_required is True
|
||||||
|
assert step_request.is_step_revision is False # default
|
||||||
|
|
||||||
|
# Missing required fields should fail
|
||||||
|
with pytest.raises(ValueError):
|
||||||
|
PlannerRequest() # Missing all required fields
|
||||||
|
|
||||||
|
with pytest.raises(ValueError):
|
||||||
|
PlannerRequest(step="test") # Missing other required fields
|
||||||
|
|
||||||
|
def test_input_schema_generation(self):
|
||||||
|
"""Test JSON schema generation for MCP client."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
schema = tool.get_input_schema()
|
||||||
|
|
||||||
|
assert schema["type"] == "object"
|
||||||
|
# Interactive planning fields
|
||||||
|
assert "step" in schema["properties"]
|
||||||
|
assert "step_number" in schema["properties"]
|
||||||
|
assert "total_steps" in schema["properties"]
|
||||||
|
assert "next_step_required" in schema["properties"]
|
||||||
|
assert "is_step_revision" in schema["properties"]
|
||||||
|
assert "is_branch_point" in schema["properties"]
|
||||||
|
assert "branch_id" in schema["properties"]
|
||||||
|
assert "continuation_id" in schema["properties"]
|
||||||
|
|
||||||
|
# Check excluded fields are NOT present
|
||||||
|
assert "model" not in schema["properties"]
|
||||||
|
assert "images" not in schema["properties"]
|
||||||
|
assert "files" not in schema["properties"]
|
||||||
|
assert "temperature" not in schema["properties"]
|
||||||
|
assert "thinking_mode" not in schema["properties"]
|
||||||
|
assert "use_websearch" not in schema["properties"]
|
||||||
|
|
||||||
|
# Check required fields
|
||||||
|
assert "step" in schema["required"]
|
||||||
|
assert "step_number" in schema["required"]
|
||||||
|
assert "total_steps" in schema["required"]
|
||||||
|
assert "next_step_required" in schema["required"]
|
||||||
|
|
||||||
|
def test_model_category_for_planning(self):
|
||||||
|
"""Test that planner uses extended reasoning category."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
category = tool.get_model_category()
|
||||||
|
|
||||||
|
# Planning needs deep thinking
|
||||||
|
assert category == ToolModelCategory.EXTENDED_REASONING
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_first_step(self):
|
||||||
|
"""Test execute method for first planning step."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Plan a microservices migration for our monolithic e-commerce platform",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 10,
|
||||||
|
"next_step_required": True,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.create_thread", return_value="test-uuid-123"):
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
assert result[0].type == "text"
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(result[0].text)
|
||||||
|
|
||||||
|
assert parsed_response["step_number"] == 1
|
||||||
|
assert parsed_response["total_steps"] == 10
|
||||||
|
assert parsed_response["next_step_required"] is True
|
||||||
|
assert parsed_response["continuation_id"] == "test-uuid-123"
|
||||||
|
assert parsed_response["status"] == "planning_success"
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_subsequent_step(self):
|
||||||
|
"""Test execute method for subsequent planning step."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Set up Docker containers for each microservice",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 8,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": "existing-uuid-456",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
assert result[0].type == "text"
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(result[0].text)
|
||||||
|
|
||||||
|
assert parsed_response["step_number"] == 2
|
||||||
|
assert parsed_response["total_steps"] == 8
|
||||||
|
assert parsed_response["next_step_required"] is True
|
||||||
|
assert parsed_response["continuation_id"] == "existing-uuid-456"
|
||||||
|
assert parsed_response["status"] == "planning_success"
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_with_continuation_context(self):
|
||||||
|
"""Test execute method with continuation that loads previous context."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Continue planning the deployment phase",
|
||||||
|
"step_number": 1, # Step 1 with continuation_id loads context
|
||||||
|
"total_steps": 8,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": "test-continuation-id",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock thread with completed plan
|
||||||
|
from utils.conversation_memory import ConversationTurn, ThreadContext
|
||||||
|
|
||||||
|
mock_turn = ConversationTurn(
|
||||||
|
role="assistant",
|
||||||
|
content='{"status": "planning_success", "planning_complete": true, "plan_summary": "COMPLETE PLAN: Authentication system with 3 steps completed"}',
|
||||||
|
tool_name="planner",
|
||||||
|
model_name="claude-planner",
|
||||||
|
timestamp="2024-01-01T00:00:00Z",
|
||||||
|
)
|
||||||
|
mock_thread = ThreadContext(
|
||||||
|
thread_id="test-id",
|
||||||
|
tool_name="planner",
|
||||||
|
turns=[mock_turn],
|
||||||
|
created_at="2024-01-01T00:00:00Z",
|
||||||
|
last_updated_at="2024-01-01T00:00:00Z",
|
||||||
|
initial_context={},
|
||||||
|
)
|
||||||
|
|
||||||
|
with patch("utils.conversation_memory.get_thread", return_value=mock_thread):
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Should include previous plan context in JSON
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
# Check for previous plan context in the structured response
|
||||||
|
assert "previous_plan_context" in parsed_response
|
||||||
|
assert "Authentication system" in parsed_response["previous_plan_context"]
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_final_step(self):
|
||||||
|
"""Test execute method for final planning step."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Deploy and monitor the new system",
|
||||||
|
"step_number": 10,
|
||||||
|
"total_steps": 10,
|
||||||
|
"next_step_required": False, # Final step
|
||||||
|
"continuation_id": "test-uuid-789",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Parse the structured JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
# Check final step structure
|
||||||
|
assert parsed_response["status"] == "planning_success"
|
||||||
|
assert parsed_response["step_number"] == 10
|
||||||
|
assert parsed_response["planning_complete"] is True
|
||||||
|
assert "plan_summary" in parsed_response
|
||||||
|
assert "COMPLETE PLAN:" in parsed_response["plan_summary"]
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_with_branching(self):
|
||||||
|
"""Test execute method with branching."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Use Kubernetes for orchestration",
|
||||||
|
"step_number": 4,
|
||||||
|
"total_steps": 10,
|
||||||
|
"next_step_required": True,
|
||||||
|
"is_branch_point": True,
|
||||||
|
"branch_from_step": 3,
|
||||||
|
"branch_id": "cloud-native-path",
|
||||||
|
"continuation_id": "test-uuid-branch",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
assert parsed_response["metadata"]["branches"] == ["cloud-native-path"]
|
||||||
|
assert "cloud-native-path" in str(tool.branches)
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_with_revision(self):
|
||||||
|
"""Test execute method with step revision."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Revise API design to use GraphQL instead of REST",
|
||||||
|
"step_number": 3,
|
||||||
|
"total_steps": 8,
|
||||||
|
"next_step_required": True,
|
||||||
|
"is_step_revision": True,
|
||||||
|
"revises_step_number": 2,
|
||||||
|
"continuation_id": "test-uuid-revision",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
assert parsed_response["step_number"] == 3
|
||||||
|
assert parsed_response["next_step_required"] is True
|
||||||
|
assert parsed_response["metadata"]["is_step_revision"] is True
|
||||||
|
assert parsed_response["metadata"]["revises_step_number"] == 2
|
||||||
|
|
||||||
|
# Check that step data was stored in history
|
||||||
|
assert len(tool.step_history) > 0
|
||||||
|
latest_step = tool.step_history[-1]
|
||||||
|
assert latest_step["is_step_revision"] is True
|
||||||
|
assert latest_step["revises_step_number"] == 2
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_adjusts_total_steps(self):
|
||||||
|
"""Test execute method adjusts total steps when current step exceeds estimate."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
arguments = {
|
||||||
|
"step": "Additional step discovered during planning",
|
||||||
|
"step_number": 8,
|
||||||
|
"total_steps": 5, # Current step exceeds total
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": "test-uuid-adjust",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return a list with TextContent
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
# Total steps should be adjusted to match current step
|
||||||
|
assert parsed_response["total_steps"] == 8
|
||||||
|
assert parsed_response["step_number"] == 8
|
||||||
|
assert parsed_response["status"] == "planning_success"
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_error_handling(self):
|
||||||
|
"""Test execute method error handling."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
# Invalid arguments - missing required fields
|
||||||
|
arguments = {
|
||||||
|
"step": "Invalid request"
|
||||||
|
# Missing required fields: step_number, total_steps, next_step_required
|
||||||
|
}
|
||||||
|
|
||||||
|
result = await tool.execute(arguments)
|
||||||
|
|
||||||
|
# Should return error response
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
assert parsed_response["status"] == "planning_failed"
|
||||||
|
assert "error" in parsed_response
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_execute_step_history_tracking(self):
|
||||||
|
"""Test that execute method properly tracks step history."""
|
||||||
|
tool = PlannerTool()
|
||||||
|
|
||||||
|
# Execute multiple steps
|
||||||
|
step1_args = {"step": "First step", "step_number": 1, "total_steps": 3, "next_step_required": True}
|
||||||
|
|
||||||
|
step2_args = {
|
||||||
|
"step": "Second step",
|
||||||
|
"step_number": 2,
|
||||||
|
"total_steps": 3,
|
||||||
|
"next_step_required": True,
|
||||||
|
"continuation_id": "test-uuid-history",
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.create_thread", return_value="test-uuid-history"):
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
await tool.execute(step1_args)
|
||||||
|
await tool.execute(step2_args)
|
||||||
|
|
||||||
|
# Should have tracked both steps
|
||||||
|
assert len(tool.step_history) == 2
|
||||||
|
assert tool.step_history[0]["step"] == "First step"
|
||||||
|
assert tool.step_history[1]["step"] == "Second step"
|
||||||
|
|
||||||
|
|
||||||
|
# Integration test
|
||||||
|
class TestPlannerToolIntegration:
|
||||||
|
"""Integration tests for planner tool."""
|
||||||
|
|
||||||
|
def setup_method(self):
|
||||||
|
"""Set up model context for integration tests."""
|
||||||
|
from utils.model_context import ModelContext
|
||||||
|
|
||||||
|
self.tool = PlannerTool()
|
||||||
|
self.tool._model_context = ModelContext("flash") # Test model
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_interactive_planning_flow(self):
|
||||||
|
"""Test complete interactive planning flow."""
|
||||||
|
arguments = {
|
||||||
|
"step": "Plan a complete system redesign",
|
||||||
|
"step_number": 1,
|
||||||
|
"total_steps": 5,
|
||||||
|
"next_step_required": True,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Mock conversation memory functions
|
||||||
|
with patch("utils.conversation_memory.create_thread", return_value="test-flow-uuid"):
|
||||||
|
with patch("utils.conversation_memory.add_turn"):
|
||||||
|
result = await self.tool.execute(arguments)
|
||||||
|
|
||||||
|
# Verify response structure
|
||||||
|
assert len(result) == 1
|
||||||
|
response_text = result[0].text
|
||||||
|
|
||||||
|
# Parse the JSON response
|
||||||
|
import json
|
||||||
|
|
||||||
|
parsed_response = json.loads(response_text)
|
||||||
|
|
||||||
|
assert parsed_response["step_number"] == 1
|
||||||
|
assert parsed_response["total_steps"] == 5
|
||||||
|
assert parsed_response["continuation_id"] == "test-flow-uuid"
|
||||||
|
assert parsed_response["status"] == "planning_success"
|
||||||
@@ -27,10 +27,11 @@ class TestServerTools:
|
|||||||
assert "testgen" in tool_names
|
assert "testgen" in tool_names
|
||||||
assert "refactor" in tool_names
|
assert "refactor" in tool_names
|
||||||
assert "tracer" in tool_names
|
assert "tracer" in tool_names
|
||||||
|
assert "planner" in tool_names
|
||||||
assert "version" in tool_names
|
assert "version" in tool_names
|
||||||
|
|
||||||
# Should have exactly 12 tools (including consensus, refactor, tracer, and listmodels)
|
# Should have exactly 13 tools (including consensus, refactor, tracer, listmodels, and planner)
|
||||||
assert len(tools) == 12
|
assert len(tools) == 13
|
||||||
|
|
||||||
# Check descriptions are verbose
|
# Check descriptions are verbose
|
||||||
for tool in tools:
|
for tool in tools:
|
||||||
|
|||||||
@@ -8,6 +8,7 @@ from .codereview import CodeReviewTool
|
|||||||
from .consensus import ConsensusTool
|
from .consensus import ConsensusTool
|
||||||
from .debug import DebugIssueTool
|
from .debug import DebugIssueTool
|
||||||
from .listmodels import ListModelsTool
|
from .listmodels import ListModelsTool
|
||||||
|
from .planner import PlannerTool
|
||||||
from .precommit import Precommit
|
from .precommit import Precommit
|
||||||
from .refactor import RefactorTool
|
from .refactor import RefactorTool
|
||||||
from .testgen import TestGenerationTool
|
from .testgen import TestGenerationTool
|
||||||
@@ -22,6 +23,7 @@ __all__ = [
|
|||||||
"ChatTool",
|
"ChatTool",
|
||||||
"ConsensusTool",
|
"ConsensusTool",
|
||||||
"ListModelsTool",
|
"ListModelsTool",
|
||||||
|
"PlannerTool",
|
||||||
"Precommit",
|
"Precommit",
|
||||||
"RefactorTool",
|
"RefactorTool",
|
||||||
"TestGenerationTool",
|
"TestGenerationTool",
|
||||||
|
|||||||
440
tools/planner.py
Normal file
440
tools/planner.py
Normal file
@@ -0,0 +1,440 @@
|
|||||||
|
"""
|
||||||
|
Planner tool
|
||||||
|
|
||||||
|
This tool helps you break down complex ideas, problems, or projects into multiple
|
||||||
|
manageable steps. It enables Claude to think through larger problems sequentially, creating
|
||||||
|
detailed action plans with clear dependencies and alternatives where applicable.
|
||||||
|
|
||||||
|
=== CONTINUATION FLOW LOGIC ===
|
||||||
|
|
||||||
|
The tool implements sophisticated continuation logic that enables multi-session planning:
|
||||||
|
|
||||||
|
RULE 1: No continuation_id + step_number=1
|
||||||
|
→ Creates NEW planning thread
|
||||||
|
→ NO previous context loaded
|
||||||
|
→ Returns continuation_id for future steps
|
||||||
|
|
||||||
|
RULE 2: continuation_id provided + step_number=1
|
||||||
|
→ Loads PREVIOUS COMPLETE PLAN as context
|
||||||
|
→ Starts NEW planning session with historical context
|
||||||
|
→ Claude sees summary of previous completed plan
|
||||||
|
|
||||||
|
RULE 3: continuation_id provided + step_number>1
|
||||||
|
→ NO previous context loaded (middle of current planning session)
|
||||||
|
→ Continues current planning without historical interference
|
||||||
|
|
||||||
|
RULE 4: next_step_required=false (final step)
|
||||||
|
→ Stores COMPLETE PLAN summary in conversation memory
|
||||||
|
→ Returns continuation_id for future planning sessions
|
||||||
|
|
||||||
|
=== CONCRETE EXAMPLE ===
|
||||||
|
|
||||||
|
FIRST PLANNING SESSION (Feature A):
|
||||||
|
Call 1: planner(step="Plan user authentication", step_number=1, total_steps=3, next_step_required=true)
|
||||||
|
→ NEW thread created: "uuid-abc123"
|
||||||
|
→ Response: {"step_number": 1, "continuation_id": "uuid-abc123"}
|
||||||
|
|
||||||
|
Call 2: planner(step="Design login flow", step_number=2, total_steps=3, next_step_required=true, continuation_id="uuid-abc123")
|
||||||
|
→ Middle of current plan - NO context loading
|
||||||
|
→ Response: {"step_number": 2, "continuation_id": "uuid-abc123"}
|
||||||
|
|
||||||
|
Call 3: planner(step="Security implementation", step_number=3, total_steps=3, next_step_required=FALSE, continuation_id="uuid-abc123")
|
||||||
|
→ FINAL STEP: Stores "COMPLETE PLAN: Security implementation (3 steps completed)"
|
||||||
|
→ Response: {"step_number": 3, "planning_complete": true, "continuation_id": "uuid-abc123"}
|
||||||
|
|
||||||
|
LATER PLANNING SESSION (Feature B):
|
||||||
|
Call 1: planner(step="Plan dashboard system", step_number=1, total_steps=2, next_step_required=true, continuation_id="uuid-abc123")
|
||||||
|
→ Loads previous complete plan as context
|
||||||
|
→ Response includes: "=== PREVIOUS COMPLETE PLAN CONTEXT === Security implementation..."
|
||||||
|
→ Claude sees previous work and can build upon it
|
||||||
|
|
||||||
|
Call 2: planner(step="Dashboard widgets", step_number=2, total_steps=2, next_step_required=FALSE, continuation_id="uuid-abc123")
|
||||||
|
→ FINAL STEP: Stores new complete plan summary
|
||||||
|
→ Both planning sessions now available for future continuations
|
||||||
|
|
||||||
|
This enables Claude to say: "Continue planning feature C using the authentication and dashboard work"
|
||||||
|
and the tool will provide context from both previous completed planning sessions.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
from typing import TYPE_CHECKING, Any, Optional
|
||||||
|
|
||||||
|
from pydantic import Field
|
||||||
|
|
||||||
|
if TYPE_CHECKING:
|
||||||
|
from tools.models import ToolModelCategory
|
||||||
|
|
||||||
|
from config import TEMPERATURE_BALANCED
|
||||||
|
from systemprompts import PLANNER_PROMPT
|
||||||
|
|
||||||
|
from .base import BaseTool, ToolRequest
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Field descriptions to avoid duplication between Pydantic and JSON schema
|
||||||
|
PLANNER_FIELD_DESCRIPTIONS = {
|
||||||
|
# Interactive planning fields for step-by-step planning
|
||||||
|
"step": (
|
||||||
|
"Your current planning step. For the first step, describe the task/problem to plan. "
|
||||||
|
"For subsequent steps, provide the actual planning step content. Can include: regular planning steps, "
|
||||||
|
"revisions of previous steps, questions about previous decisions, realizations about needing more analysis, "
|
||||||
|
"changes in approach, etc."
|
||||||
|
),
|
||||||
|
"step_number": "Current step number in the planning sequence (starts at 1)",
|
||||||
|
"total_steps": "Current estimate of total steps needed (can be adjusted up/down as planning progresses)",
|
||||||
|
"next_step_required": "Whether another planning step is required after this one",
|
||||||
|
"is_step_revision": "True if this step revises/replaces a previous step",
|
||||||
|
"revises_step_number": "If is_step_revision is true, which step number is being revised",
|
||||||
|
"is_branch_point": "True if this step branches from a previous step to explore alternatives",
|
||||||
|
"branch_from_step": "If is_branch_point is true, which step number is the branching point",
|
||||||
|
"branch_id": "Identifier for the current branch (e.g., 'approach-A', 'microservices-path')",
|
||||||
|
"more_steps_needed": "True if more steps are needed beyond the initial estimate",
|
||||||
|
"continuation_id": "Thread continuation ID for multi-turn planning sessions (useful for seeding new plans with prior context)",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class PlanStep:
|
||||||
|
"""Represents a single step in the planning process."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self, step_number: int, content: str, branch_id: Optional[str] = None, parent_step: Optional[int] = None
|
||||||
|
):
|
||||||
|
self.step_number = step_number
|
||||||
|
self.content = content
|
||||||
|
self.branch_id = branch_id or "main"
|
||||||
|
self.parent_step = parent_step
|
||||||
|
self.children = []
|
||||||
|
|
||||||
|
|
||||||
|
class PlannerRequest(ToolRequest):
|
||||||
|
"""Request model for the planner tool - interactive step-by-step planning."""
|
||||||
|
|
||||||
|
# Required fields for each planning step
|
||||||
|
step: str = Field(..., description=PLANNER_FIELD_DESCRIPTIONS["step"])
|
||||||
|
step_number: int = Field(..., description=PLANNER_FIELD_DESCRIPTIONS["step_number"])
|
||||||
|
total_steps: int = Field(..., description=PLANNER_FIELD_DESCRIPTIONS["total_steps"])
|
||||||
|
next_step_required: bool = Field(..., description=PLANNER_FIELD_DESCRIPTIONS["next_step_required"])
|
||||||
|
|
||||||
|
# Optional revision/branching fields
|
||||||
|
is_step_revision: Optional[bool] = Field(False, description=PLANNER_FIELD_DESCRIPTIONS["is_step_revision"])
|
||||||
|
revises_step_number: Optional[int] = Field(None, description=PLANNER_FIELD_DESCRIPTIONS["revises_step_number"])
|
||||||
|
is_branch_point: Optional[bool] = Field(False, description=PLANNER_FIELD_DESCRIPTIONS["is_branch_point"])
|
||||||
|
branch_from_step: Optional[int] = Field(None, description=PLANNER_FIELD_DESCRIPTIONS["branch_from_step"])
|
||||||
|
branch_id: Optional[str] = Field(None, description=PLANNER_FIELD_DESCRIPTIONS["branch_id"])
|
||||||
|
more_steps_needed: Optional[bool] = Field(False, description=PLANNER_FIELD_DESCRIPTIONS["more_steps_needed"])
|
||||||
|
|
||||||
|
# Optional continuation field
|
||||||
|
continuation_id: Optional[str] = Field(None, description=PLANNER_FIELD_DESCRIPTIONS["continuation_id"])
|
||||||
|
|
||||||
|
# Override inherited fields to exclude them from schema
|
||||||
|
model: Optional[str] = Field(default=None, exclude=True)
|
||||||
|
temperature: Optional[float] = Field(default=None, exclude=True)
|
||||||
|
thinking_mode: Optional[str] = Field(default=None, exclude=True)
|
||||||
|
use_websearch: Optional[bool] = Field(default=None, exclude=True)
|
||||||
|
images: Optional[list] = Field(default=None, exclude=True)
|
||||||
|
|
||||||
|
|
||||||
|
class PlannerTool(BaseTool):
|
||||||
|
"""Sequential planning tool with step-by-step breakdown and refinement."""
|
||||||
|
|
||||||
|
def __init__(self):
|
||||||
|
super().__init__()
|
||||||
|
self.step_history = []
|
||||||
|
self.branches = {}
|
||||||
|
|
||||||
|
def get_name(self) -> str:
|
||||||
|
return "planner"
|
||||||
|
|
||||||
|
def get_description(self) -> str:
|
||||||
|
return (
|
||||||
|
"INTERACTIVE SEQUENTIAL PLANNER - Break down complex tasks through step-by-step planning. "
|
||||||
|
"This tool enables you to think sequentially, building plans incrementally with the ability "
|
||||||
|
"to revise, branch, and adapt as understanding deepens.\n\n"
|
||||||
|
"How it works:\n"
|
||||||
|
"- Start with step 1: describe the task/problem to plan\n"
|
||||||
|
"- Continue with subsequent steps, building the plan piece by piece\n"
|
||||||
|
"- Adjust total_steps estimate as you progress\n"
|
||||||
|
"- Revise previous steps when new insights emerge\n"
|
||||||
|
"- Branch into alternative approaches when needed\n"
|
||||||
|
"- Add more steps even after reaching the initial estimate\n\n"
|
||||||
|
"Key features:\n"
|
||||||
|
"- Sequential thinking with full context awareness\n"
|
||||||
|
"- Branching for exploring alternative strategies\n"
|
||||||
|
"- Revision capabilities to update earlier decisions\n"
|
||||||
|
"- Dynamic step count adjustment\n\n"
|
||||||
|
"Perfect for: complex project planning, system design with unknowns, "
|
||||||
|
"migration strategies, architectural decisions, problem decomposition."
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_input_schema(self) -> dict[str, Any]:
|
||||||
|
schema = {
|
||||||
|
"type": "object",
|
||||||
|
"properties": {
|
||||||
|
# Interactive planning fields
|
||||||
|
"step": {
|
||||||
|
"type": "string",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["step"],
|
||||||
|
},
|
||||||
|
"step_number": {
|
||||||
|
"type": "integer",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["step_number"],
|
||||||
|
"minimum": 1,
|
||||||
|
},
|
||||||
|
"total_steps": {
|
||||||
|
"type": "integer",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["total_steps"],
|
||||||
|
"minimum": 1,
|
||||||
|
},
|
||||||
|
"next_step_required": {
|
||||||
|
"type": "boolean",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["next_step_required"],
|
||||||
|
},
|
||||||
|
"is_step_revision": {
|
||||||
|
"type": "boolean",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["is_step_revision"],
|
||||||
|
},
|
||||||
|
"revises_step_number": {
|
||||||
|
"type": "integer",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["revises_step_number"],
|
||||||
|
"minimum": 1,
|
||||||
|
},
|
||||||
|
"is_branch_point": {
|
||||||
|
"type": "boolean",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["is_branch_point"],
|
||||||
|
},
|
||||||
|
"branch_from_step": {
|
||||||
|
"type": "integer",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["branch_from_step"],
|
||||||
|
"minimum": 1,
|
||||||
|
},
|
||||||
|
"branch_id": {
|
||||||
|
"type": "string",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["branch_id"],
|
||||||
|
},
|
||||||
|
"more_steps_needed": {
|
||||||
|
"type": "boolean",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["more_steps_needed"],
|
||||||
|
},
|
||||||
|
"continuation_id": {
|
||||||
|
"type": "string",
|
||||||
|
"description": PLANNER_FIELD_DESCRIPTIONS["continuation_id"],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
# Required fields for interactive planning
|
||||||
|
"required": ["step", "step_number", "total_steps", "next_step_required"],
|
||||||
|
}
|
||||||
|
return schema
|
||||||
|
|
||||||
|
def get_system_prompt(self) -> str:
|
||||||
|
return PLANNER_PROMPT
|
||||||
|
|
||||||
|
def get_request_model(self):
|
||||||
|
return PlannerRequest
|
||||||
|
|
||||||
|
def get_default_temperature(self) -> float:
|
||||||
|
return TEMPERATURE_BALANCED
|
||||||
|
|
||||||
|
def get_model_category(self) -> "ToolModelCategory":
|
||||||
|
from tools.models import ToolModelCategory
|
||||||
|
|
||||||
|
return ToolModelCategory.EXTENDED_REASONING # Planning benefits from deep thinking
|
||||||
|
|
||||||
|
def get_default_thinking_mode(self) -> str:
|
||||||
|
return "high" # Default to high thinking for comprehensive planning
|
||||||
|
|
||||||
|
async def execute(self, arguments: dict[str, Any]) -> list:
|
||||||
|
"""
|
||||||
|
Override execute to work like original TypeScript tool - no AI calls, just data processing.
|
||||||
|
|
||||||
|
This method implements the core continuation logic that enables multi-session planning:
|
||||||
|
|
||||||
|
CONTINUATION LOGIC:
|
||||||
|
1. If no continuation_id + step_number=1: Create new planning thread
|
||||||
|
2. If continuation_id + step_number=1: Load previous complete plan as context for NEW planning
|
||||||
|
3. If continuation_id + step_number>1: Continue current plan (no context loading)
|
||||||
|
4. If next_step_required=false: Mark complete and store plan summary for future use
|
||||||
|
|
||||||
|
CONVERSATION MEMORY INTEGRATION:
|
||||||
|
- Each step is stored in conversation memory for cross-tool continuation
|
||||||
|
- Final steps store COMPLETE PLAN summaries that can be loaded as context
|
||||||
|
- Only step 1 with continuation_id loads previous context (new planning session)
|
||||||
|
- Steps 2+ with continuation_id continue current session without context interference
|
||||||
|
"""
|
||||||
|
from mcp.types import TextContent
|
||||||
|
|
||||||
|
from utils.conversation_memory import add_turn, create_thread, get_thread
|
||||||
|
|
||||||
|
try:
|
||||||
|
# Validate request like the original
|
||||||
|
request_model = self.get_request_model()
|
||||||
|
request = request_model(**arguments)
|
||||||
|
|
||||||
|
# Process step like original TypeScript tool
|
||||||
|
if request.step_number > request.total_steps:
|
||||||
|
request.total_steps = request.step_number
|
||||||
|
|
||||||
|
# === CONTINUATION LOGIC IMPLEMENTATION ===
|
||||||
|
# This implements the 4 rules documented in the module docstring
|
||||||
|
|
||||||
|
continuation_id = request.continuation_id
|
||||||
|
previous_plan_context = ""
|
||||||
|
|
||||||
|
# RULE 1: No continuation_id + step_number=1 → Create NEW planning thread
|
||||||
|
if not continuation_id and request.step_number == 1:
|
||||||
|
# Filter arguments to only include serializable data for conversation memory
|
||||||
|
serializable_args = {
|
||||||
|
k: v
|
||||||
|
for k, v in arguments.items()
|
||||||
|
if not hasattr(v, "__class__") or v.__class__.__module__ != "utils.model_context"
|
||||||
|
}
|
||||||
|
continuation_id = create_thread("planner", serializable_args)
|
||||||
|
# Result: New thread created, no previous context, returns continuation_id
|
||||||
|
|
||||||
|
# RULE 2: continuation_id + step_number=1 → Load PREVIOUS COMPLETE PLAN as context
|
||||||
|
elif continuation_id and request.step_number == 1:
|
||||||
|
thread = get_thread(continuation_id)
|
||||||
|
if thread:
|
||||||
|
# Search for most recent COMPLETE PLAN from previous planning sessions
|
||||||
|
for turn in reversed(thread.turns): # Newest first
|
||||||
|
if turn.tool_name == "planner" and turn.role == "assistant":
|
||||||
|
# Try to parse as JSON first (new format)
|
||||||
|
try:
|
||||||
|
turn_data = json.loads(turn.content)
|
||||||
|
if isinstance(turn_data, dict) and turn_data.get("planning_complete"):
|
||||||
|
# New JSON format
|
||||||
|
plan_summary = turn_data.get("plan_summary", "")
|
||||||
|
if plan_summary:
|
||||||
|
previous_plan_context = plan_summary[:500]
|
||||||
|
break
|
||||||
|
except (json.JSONDecodeError, ValueError):
|
||||||
|
# Fallback to old text format
|
||||||
|
if "planning_complete" in turn.content:
|
||||||
|
try:
|
||||||
|
if "COMPLETE PLAN:" in turn.content:
|
||||||
|
plan_start = turn.content.find("COMPLETE PLAN:")
|
||||||
|
previous_plan_context = turn.content[plan_start : plan_start + 500] + "..."
|
||||||
|
else:
|
||||||
|
previous_plan_context = turn.content[:300] + "..."
|
||||||
|
break
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if previous_plan_context:
|
||||||
|
previous_plan_context = f"\\n\\n=== PREVIOUS COMPLETE PLAN CONTEXT ===\\n{previous_plan_context}\\n=== END CONTEXT ===\\n"
|
||||||
|
# Result: NEW planning session with previous complete plan as context
|
||||||
|
|
||||||
|
# RULE 3: continuation_id + step_number>1 → Continue current plan (no context loading)
|
||||||
|
# This case is handled by doing nothing - we're in the middle of current planning
|
||||||
|
# Result: Current planning continues without historical interference
|
||||||
|
|
||||||
|
step_data = {
|
||||||
|
"step": request.step,
|
||||||
|
"step_number": request.step_number,
|
||||||
|
"total_steps": request.total_steps,
|
||||||
|
"next_step_required": request.next_step_required,
|
||||||
|
"is_step_revision": request.is_step_revision,
|
||||||
|
"revises_step_number": request.revises_step_number,
|
||||||
|
"is_branch_point": request.is_branch_point,
|
||||||
|
"branch_from_step": request.branch_from_step,
|
||||||
|
"branch_id": request.branch_id,
|
||||||
|
"more_steps_needed": request.more_steps_needed,
|
||||||
|
"continuation_id": request.continuation_id,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Store in local history like original
|
||||||
|
self.step_history.append(step_data)
|
||||||
|
|
||||||
|
# Handle branching like original
|
||||||
|
if request.is_branch_point and request.branch_from_step and request.branch_id:
|
||||||
|
if request.branch_id not in self.branches:
|
||||||
|
self.branches[request.branch_id] = []
|
||||||
|
self.branches[request.branch_id].append(step_data)
|
||||||
|
|
||||||
|
# Build structured JSON response like other tools (consensus, refactor)
|
||||||
|
response_data = {
|
||||||
|
"status": "planning_success",
|
||||||
|
"step_number": request.step_number,
|
||||||
|
"total_steps": request.total_steps,
|
||||||
|
"next_step_required": request.next_step_required,
|
||||||
|
"step_content": request.step,
|
||||||
|
"metadata": {
|
||||||
|
"branches": list(self.branches.keys()),
|
||||||
|
"step_history_length": len(self.step_history),
|
||||||
|
"is_step_revision": request.is_step_revision or False,
|
||||||
|
"revises_step_number": request.revises_step_number,
|
||||||
|
"is_branch_point": request.is_branch_point or False,
|
||||||
|
"branch_from_step": request.branch_from_step,
|
||||||
|
"branch_id": request.branch_id,
|
||||||
|
"more_steps_needed": request.more_steps_needed or False,
|
||||||
|
},
|
||||||
|
"output": {
|
||||||
|
"instructions": "This is a structured planning response. Present the step_content as the main planning analysis. If next_step_required is true, continue with the next step. If planning_complete is true, present the complete plan in a well-structured format with clear sections, headings, numbered steps, and visual elements like ASCII charts for phases/dependencies. Use bullet points, sub-steps, sequences, and visual organization to make complex plans easy to understand and follow. IMPORTANT: Do NOT use emojis - use clear text formatting and ASCII characters only. Do NOT mention time estimates or costs unless explicitly requested.",
|
||||||
|
"format": "step_by_step_planning",
|
||||||
|
"presentation_guidelines": {
|
||||||
|
"completed_plans": "Use clear headings, numbered phases, ASCII diagrams for workflows/dependencies, bullet points for sub-tasks, and visual sequences where helpful. No emojis. No time/cost estimates unless requested.",
|
||||||
|
"step_content": "Present as main analysis with clear structure and actionable insights. No emojis. No time/cost estimates unless requested.",
|
||||||
|
"continuation": "Use continuation_id for related planning sessions or implementation planning",
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
# Always include continuation_id if we have one (enables step chaining within session)
|
||||||
|
if continuation_id:
|
||||||
|
response_data["continuation_id"] = continuation_id
|
||||||
|
|
||||||
|
# Add previous plan context if available
|
||||||
|
if previous_plan_context:
|
||||||
|
response_data["previous_plan_context"] = previous_plan_context.strip()
|
||||||
|
|
||||||
|
# RULE 4: next_step_required=false → Mark complete and store plan summary
|
||||||
|
if not request.next_step_required:
|
||||||
|
response_data["planning_complete"] = True
|
||||||
|
response_data["plan_summary"] = (
|
||||||
|
f"COMPLETE PLAN: {request.step} (Total {request.total_steps} steps completed)"
|
||||||
|
)
|
||||||
|
response_data["next_steps"] = (
|
||||||
|
"Planning complete. Present the complete plan to the user in a well-structured format with clear sections, "
|
||||||
|
"numbered steps, visual elements (ASCII charts/diagrams where helpful), sub-step breakdowns, and implementation guidance. "
|
||||||
|
"Use headings, bullet points, and visual organization to make the plan easy to follow. "
|
||||||
|
"If there are phases, dependencies, or parallel tracks, show these relationships visually. "
|
||||||
|
"IMPORTANT: Do NOT use emojis - use clear text formatting and ASCII characters only. "
|
||||||
|
"Do NOT mention time estimates or costs unless explicitly requested. "
|
||||||
|
"After presenting the plan, offer to either help implement specific parts or use the continuation_id to start related planning sessions."
|
||||||
|
)
|
||||||
|
# Result: Planning marked complete, summary stored for future context loading
|
||||||
|
else:
|
||||||
|
response_data["planning_complete"] = False
|
||||||
|
remaining_steps = request.total_steps - request.step_number
|
||||||
|
response_data["next_steps"] = (
|
||||||
|
f"Continue with step {request.step_number + 1}. Approximately {remaining_steps} steps remaining."
|
||||||
|
)
|
||||||
|
# Result: Intermediate step, planning continues
|
||||||
|
|
||||||
|
# Convert to clean JSON response
|
||||||
|
response_content = json.dumps(response_data, indent=2)
|
||||||
|
|
||||||
|
# Store this step in conversation memory
|
||||||
|
if continuation_id:
|
||||||
|
add_turn(
|
||||||
|
thread_id=continuation_id,
|
||||||
|
role="assistant",
|
||||||
|
content=response_content,
|
||||||
|
tool_name="planner",
|
||||||
|
model_name="claude-planner",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Return the JSON response directly as text content, like consensus tool
|
||||||
|
return [TextContent(type="text", text=response_content)]
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
# Error handling - return JSON directly like consensus tool
|
||||||
|
error_data = {"error": str(e), "status": "planning_failed"}
|
||||||
|
return [TextContent(type="text", text=json.dumps(error_data, indent=2))]
|
||||||
|
|
||||||
|
# Stub implementations for abstract methods (not used since we override execute)
|
||||||
|
async def prepare_prompt(self, request: PlannerRequest) -> str:
|
||||||
|
return "" # Not used - execute() is overridden
|
||||||
|
|
||||||
|
def format_response(self, response: str, request: PlannerRequest, model_info: dict = None) -> str:
|
||||||
|
return response # Not used - execute() is overridden
|
||||||
Reference in New Issue
Block a user