🚀 Major Enhancement: Workflow-Based Tool Architecture v5.5.0 (#95)

* WIP: new workflow architecture * WIP: further improvements and cleanup * WIP: cleanup and docks, replace old tool with new * WIP: cleanup and docks, replace old tool with new * WIP: new planner implementation using workflow * WIP: precommit tool working as a workflow instead of a basic tool Support for passing False to use_assistant_model to skip external models completely and use Claude only * WIP: precommit workflow version swapped with old * WIP: codereview * WIP: replaced codereview * WIP: replaced codereview * WIP: replaced refactor * WIP: workflow for thinkdeep * WIP: ensure files get embedded correctly * WIP: thinkdeep replaced with workflow version * WIP: improved messaging when an external model's response is received * WIP: analyze tool swapped * WIP: updated tests * Extract only the content when building history * Use "relevant_files" for workflow tools only * WIP: updated tests * Extract only the content when building history * Use "relevant_files" for workflow tools only * WIP: fixed get_completion_next_steps_message missing param * Fixed tests Request for files consistently * Fixed tests Request for files consistently * Fixed tests * New testgen workflow tool Updated docs * Swap testgen workflow * Fix CI test failures by excluding API-dependent tests - Update GitHub Actions workflow to exclude simulation tests that require API keys - Fix collaboration tests to properly mock workflow tool expert analysis calls - Update test assertions to handle new workflow tool response format - Ensure unit tests run without external API dependencies in CI 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * WIP - Update tests to match new tools * WIP - Update tests to match new tools --------- Co-authored-by: Claude <noreply@anthropic.com>
2025-06-21 00:08:11 +04:00
parent 4dae6e457e
commit 69a3121452
76 changed files with 17111 additions and 7725 deletions
--- a/docs/tools/analyze.md
+++ b/docs/tools/analyze.md
@@ -1,13 +1,32 @@
 # Analyze Tool - Smart File Analysis

-**General-purpose code understanding and exploration**
+**General-purpose code understanding and exploration through workflow-driven investigation**

-The `analyze` tool provides comprehensive code analysis and understanding capabilities, helping you explore codebases, understand architecture, and identify patterns across files and directories.
+The `analyze` tool provides comprehensive code analysis and understanding capabilities, helping you explore codebases, understand architecture, and identify patterns across files and directories. This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis.

 ## Thinking Mode

 **Default is `medium` (8,192 tokens).** Use `high` for architecture analysis (comprehensive insights worth the cost) or `low` for quick file overviews (save ~6k tokens).

+## How the Workflow Works
+
+The analyze tool implements a **structured workflow** for thorough code understanding:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the analysis plan and begins examining code structure
+2. **Step 2+**: Claude investigates architecture, patterns, dependencies, and design decisions
+3. **Throughout**: Claude tracks findings, relevant files, insights, and confidence levels
+4. **Completion**: Once analysis is comprehensive, Claude signals completion
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete analysis summary with all findings
+- Architectural insights and pattern identification
+- Strategic improvement recommendations
+- Final expert assessment based on investigation
+
+This workflow ensures methodical analysis before expert insights, resulting in deeper understanding and more valuable recommendations.
+
 ## Example Prompts

 **Basic Usage:**
@@ -30,7 +49,21 @@ The `analyze` tool provides comprehensive code analysis and understanding capabi

 ## Tool Parameters

- `files`: Files or directories to analyze (required, absolute paths)
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in analysis sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and insights collected in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly relevant to the analysis (required in step 1)
+- `relevant_context`: Methods/functions/classes central to analysis findings
+- `issues_found`: Issues or concerns identified with severity levels
+- `confidence`: Confidence level in analysis completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `images`: Visual references for analysis context
+
+**Initial Configuration (used in step 1):**
 - `prompt`: What to analyze or look for (required)
 - `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `analysis_type`: architecture|performance|security|quality|general (default: general)
@@ -38,6 +71,7 @@ The `analyze` tool provides comprehensive code analysis and understanding capabi
 - `temperature`: Temperature for analysis (0-1, default 0.2)
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
 - `use_websearch`: Enable web search for documentation and best practices (default: true)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
 - `continuation_id`: Continue previous analysis sessions

 ## Analysis Types
--- a/docs/tools/codereview.md
+++ b/docs/tools/codereview.md
@@ -1,13 +1,32 @@
 # CodeReview Tool - Professional Code Review

-**Comprehensive code analysis with prioritized feedback**
+**Comprehensive code analysis with prioritized feedback through workflow-driven investigation**

-The `codereview` tool provides professional code review capabilities with actionable feedback, severity-based issue prioritization, and support for various review types from quick style checks to comprehensive security audits.
+The `codereview` tool provides professional code review capabilities with actionable feedback, severity-based issue prioritization, and support for various review types from quick style checks to comprehensive security audits. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis.

 ## Thinking Mode

 **Default is `medium` (8,192 tokens).** Use `high` for security-critical code (worth the extra tokens) or `low` for quick style checks (saves ~6k tokens).

+## How the Workflow Works
+
+The codereview tool implements a **structured workflow** that ensures thorough code examination:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the review plan and begins systematic analysis of code structure
+2. **Step 2+**: Claude examines code quality, security implications, performance concerns, and architectural patterns
+3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels
+4. **Completion**: Once review is comprehensive, Claude signals completion
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete review summary with all findings and evidence
+- Relevant files and code patterns identified
+- Issues categorized by severity levels
+- Final recommendations based on investigation
+
+**Special Note**: If you want Claude to perform the entire review without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
+
 ## Model Recommendation

 This tool particularly benefits from Gemini Pro or Flash models due to their 1M context window, which allows comprehensive analysis of large codebases. Claude's context limitations make it challenging to see the "big picture" in complex projects - this is a concrete example where utilizing a secondary model with larger context provides significant value beyond just experimenting with different AI capabilities.
@@ -45,7 +64,21 @@ The above prompt will simultaneously run two separate `codereview` tools with tw

 ## Tool Parameters

- `files`: List of file paths or directories to review (required)
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in review sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and evidence collected in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly relevant to the review (required in step 1)
+- `relevant_context`: Methods/functions/classes central to review findings
+- `issues_found`: Issues identified with severity levels
+- `confidence`: Confidence level in review completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `images`: Visual references for review context
+
+**Initial Review Configuration (used in step 1):**
 - `prompt`: User's summary of what the code does, expected behavior, constraints, and review objectives (required)
 - `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `review_type`: full|security|performance|quick (default: full)
@@ -55,6 +88,7 @@ The above prompt will simultaneously run two separate `codereview` tools with tw
 - `temperature`: Temperature for consistency (0-1, default 0.2)
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
 - `use_websearch`: Enable web search for best practices and documentation (default: true)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
 - `continuation_id`: Continue previous review discussions

 ## Review Types
--- a/docs/tools/debug.md
+++ b/docs/tools/debug.md
@@ -37,6 +37,8 @@ in which case expert analysis is bypassed):

 This structured approach ensures Claude performs methodical groundwork before expert analysis, resulting in significantly better debugging outcomes and more efficient token usage.

+**Special Note**: If you want Claude to perform the entire debugging investigation without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
+
 ## Key Features

 - **Multi-step investigation process** with evidence collection and hypothesis evolution
@@ -63,7 +65,7 @@ This structured approach ensures Claude performs methodical groundwork before ex
 - `relevant_files`: Files directly tied to the root cause or its effects
 - `relevant_methods`: Specific methods/functions involved in the issue
 - `hypothesis`: Current best guess about the underlying cause
- `confidence`: Confidence level in current hypothesis (low/medium/high)
+- `confidence`: Confidence level in current hypothesis (exploring/low/medium/high/certain)
 - `backtrack_from_step`: Step number to backtrack from (for revisions)
 - `continuation_id`: Thread ID for continuing investigations across sessions
 - `images`: Visual debugging materials (error screenshots, logs, etc.)
@@ -72,6 +74,7 @@ This structured approach ensures Claude performs methodical groundwork before ex
 - `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
 - `use_websearch`: Enable web search for documentation and solutions (default: true)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)

 ## Usage Examples

--- a/docs/tools/precommit.md
+++ b/docs/tools/precommit.md
@@ -1,13 +1,32 @@
 # PreCommit Tool - Pre-Commit Validation

-**Comprehensive review of staged/unstaged git changes across multiple repositories**
+**Comprehensive review of staged/unstaged git changes across multiple repositories through workflow-driven investigation**

-The `precommit` tool provides thorough validation of git changes before committing, ensuring code quality, requirement compliance, and preventing regressions across multiple repositories.
+The `precommit` tool provides thorough validation of git changes before committing, ensuring code quality, requirement compliance, and preventing regressions across multiple repositories. This workflow tool guides Claude through systematic investigation of git changes, repository status, and file modifications across multiple steps before providing expert validation.

 ## Thinking Mode

 **Default is `medium` (8,192 tokens).** Use `high` or `max` for critical releases when thorough validation justifies the token cost.

+## How the Workflow Works
+
+The precommit tool implements a **structured workflow** for comprehensive change validation:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the validation plan and begins analyzing git status across repositories
+2. **Step 2+**: Claude examines changes, diffs, dependencies, and potential impacts
+3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels
+4. **Completion**: Once investigation is thorough, Claude signals completion
+
+**Expert Validation Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete summary of all changes and their context
+- Potential issues and regressions identified
+- Requirement compliance assessment
+- Final recommendations for safe commit
+
+**Special Note**: If you want Claude to perform the entire pre-commit validation without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
+
 ## Model Recommendation

 Pre-commit validation benefits significantly from models with extended context windows like Gemini Pro, which can analyze extensive changesets across multiple files and repositories simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural inconsistencies, and integration issues that might be missed when reviewing changes in isolation due to context constraints.
@@ -47,21 +66,34 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio

 ## Tool Parameters

+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in validation sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and evidence collected in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly relevant to the changes
+- `relevant_context`: Methods/functions/classes affected by changes
+- `issues_found`: Issues identified with severity levels
+- `confidence`: Confidence level in validation completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `hypothesis`: Current assessment of change safety and completeness
+- `images`: Screenshots of requirements, design mockups for validation
+
+**Initial Configuration (used in step 1):**
 - `path`: Starting directory to search for repos (default: current directory, absolute path required)
 - `prompt`: The original user request description for the changes (required for context)
 - `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `compare_to`: Compare against a branch/tag instead of local changes (optional)
- `review_type`: full|security|performance|quick (default: full)
 - `severity_filter`: critical|high|medium|low|all (default: all)
- `max_depth`: How deep to search for nested repos (default: 5)
 - `include_staged`: Include staged changes in the review (default: true)
 - `include_unstaged`: Include uncommitted changes in the review (default: true)
- `images`: Screenshots of requirements, design mockups, or error states for validation context
- `files`: Optional files for additional context (not part of changes but provide context)
 - `focus_on`: Specific aspects to focus on
 - `temperature`: Temperature for response (default: 0.2)
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
 - `use_websearch`: Enable web search for best practices (default: true)
+- `use_assistant_model`: Whether to use expert validation phase (default: true, set to false to use Claude only)
 - `continuation_id`: Continue previous validation discussions

 ## Usage Examples
--- a/docs/tools/refactor.md
+++ b/docs/tools/refactor.md
@@ -1,13 +1,32 @@
 # Refactor Tool - Intelligent Code Refactoring

-**Comprehensive refactoring analysis with top-down decomposition strategy**
+**Comprehensive refactoring analysis with top-down decomposition strategy through workflow-driven investigation**

-The `refactor` tool provides intelligent code refactoring recommendations with a focus on top-down decomposition and systematic code improvement. It prioritizes structural improvements over cosmetic changes.
+The `refactor` tool provides intelligent code refactoring recommendations with a focus on top-down decomposition and systematic code improvement. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance.

 ## Thinking Mode

 **Default is `medium` (8,192 tokens).** Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis.

+## How the Workflow Works
+
+The refactor tool implements a **structured workflow** for systematic refactoring analysis:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the refactoring plan and begins analyzing code structure
+2. **Step 2+**: Claude examines code smells, decomposition opportunities, and modernization possibilities
+3. **Throughout**: Claude tracks findings, relevant files, refactoring opportunities, and confidence levels
+4. **Completion**: Once investigation is thorough, Claude signals completion
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **complete**):
+- Complete refactoring opportunity summary
+- Prioritized recommendations by impact
+- Precise implementation guidance with line numbers
+- Final expert assessment for refactoring strategy
+
+This workflow ensures methodical investigation before expert recommendations, resulting in more targeted and valuable refactoring plans.
+
 ## Model Recommendation

 The refactor tool excels with models that have large context windows like Gemini Pro (1M tokens), which can analyze entire files and complex codebases simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural patterns, and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context constraints.
@@ -67,13 +86,28 @@ This results in Claude first performing its own expert analysis, encouraging it

 ## Tool Parameters

- `files`: Code files or directories to analyze for refactoring opportunities (required, absolute paths)
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in refactoring sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and refactoring opportunities in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly needing refactoring (required in step 1)
+- `relevant_context`: Methods/functions/classes requiring refactoring
+- `issues_found`: Refactoring opportunities with severity and type
+- `confidence`: Confidence level in analysis completeness (exploring/incomplete/partial/complete)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `hypothesis`: Current assessment of refactoring priorities
+
+**Initial Configuration (used in step 1):**
 - `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
- `refactor_type`: codesmells|decompose|modernize|organization (required)
+- `refactor_type`: codesmells|decompose|modernize|organization (default: codesmells)
 - `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
 - `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths)
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
 - `continuation_id`: Thread continuation ID for multi-turn conversations

 ## Usage Examples
--- a/docs/tools/testgen.md
+++ b/docs/tools/testgen.md
@@ -1,13 +1,32 @@
 # TestGen Tool - Comprehensive Test Generation

-**Generates thorough test suites with edge case coverage based on existing code and test framework used**
+**Generates thorough test suites with edge case coverage through workflow-driven investigation**

-The `testgen` tool creates comprehensive test suites by analyzing your code paths, understanding intricate dependencies, and identifying realistic edge cases and failure scenarios that need test coverage.
+The `testgen` tool creates comprehensive test suites by analyzing your code paths, understanding intricate dependencies, and identifying realistic edge cases and failure scenarios that need test coverage. This workflow tool guides Claude through systematic investigation of code functionality, critical paths, edge cases, and integration points across multiple steps before generating comprehensive tests with realistic failure mode analysis.

 ## Thinking Mode

 **Default is `medium` (8,192 tokens) for extended thinking models.** Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage.

+## How the Workflow Works
+
+The testgen tool implements a **structured workflow** for comprehensive test generation:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the test generation plan and begins analyzing code functionality
+2. **Step 2+**: Claude examines critical paths, edge cases, error handling, and integration points
+3. **Throughout**: Claude tracks findings, test scenarios, and coverage gaps
+4. **Completion**: Once investigation is thorough, Claude signals completion
+
+**Test Generation Phase:**
+After Claude completes the investigation:
+- Complete test scenario catalog with all edge cases
+- Framework-specific test generation
+- Realistic failure mode coverage
+- Final test suite with comprehensive coverage
+
+This workflow ensures methodical analysis before test generation, resulting in more thorough and valuable test suites.
+
 ## Model Recommendation

 Test generation excels with extended reasoning models like Gemini Pro or O3, which can analyze complex code paths, understand intricate dependencies, and identify comprehensive edge cases. The combination of large context windows and advanced reasoning enables generation of thorough test suites that cover realistic failure scenarios and integration points that shorter-context models might overlook.
@@ -37,11 +56,24 @@ Test generation excels with extended reasoning models like Gemini Pro or O3, whi

 ## Tool Parameters

- `files`: Code files or directories to generate tests for (required, absolute paths)
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in test generation sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries about functionality and test scenarios (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly needing tests (required in step 1)
+- `relevant_context`: Methods/functions/classes requiring test coverage
+- `confidence`: Confidence level in test plan completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+
+**Initial Configuration (used in step 1):**
 - `prompt`: Description of what to test, testing objectives, and specific scope/focus areas (required)
 - `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `test_examples`: Optional existing test files or directories to use as style/pattern reference (absolute paths)
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_assistant_model`: Whether to use expert test generation phase (default: true, set to false to use Claude only)

 ## Usage Examples