From a0fbc078e8d36052a38f0e2c2a0db9787fac96df Mon Sep 17 00:00:00 2001 From: Sven Lito Date: Tue, 12 Aug 2025 14:36:14 +0700 Subject: [PATCH 1/6] docs: update model references and add missing GPT-5 and Gemini 2.0 models - Add missing models to all tool parameter documentation - Update model table in advanced-usage.md with GPT-5 series - Add Gemini 2.0 Flash and Flash Lite models - Include detailed capabilities for each model variant - Fix model parameter consistency across all tool docs Models added: - GPT-5 (gpt5): Advanced reasoning with 400K context - GPT-5 Mini (gpt5-mini): Efficient variant - GPT-5 Nano (gpt5-nano): Fast, low-cost variant - Gemini 2.0 Flash (flash-2.0): Audio/video support - Gemini 2.0 Flash Lite (flashlite): Text-only lightweight --- docs/advanced-usage.md | 29 ++-- docs/tools/analyze.md | 2 +- docs/tools/analyze.md.bak | 199 ++++++++++++++++++++++++++++ docs/tools/chat.md | 2 +- docs/tools/chat.md.bak | 84 ++++++++++++ docs/tools/codereview.md | 2 +- docs/tools/codereview.md.bak | 174 ++++++++++++++++++++++++ docs/tools/debug.md | 2 +- docs/tools/precommit.md | 2 +- docs/tools/precommit.md.bak | 247 +++++++++++++++++++++++++++++++++++ docs/tools/refactor.md | 2 +- docs/tools/refactor.md.bak | 201 ++++++++++++++++++++++++++++ docs/tools/secaudit.md | 2 +- docs/tools/secaudit.md.bak | 222 +++++++++++++++++++++++++++++++ docs/tools/testgen.md | 2 +- docs/tools/testgen.md.bak | 220 +++++++++++++++++++++++++++++++ docs/tools/thinkdeep.md | 2 +- docs/tools/thinkdeep.md.bak | 97 ++++++++++++++ 18 files changed, 1474 insertions(+), 17 deletions(-) create mode 100644 docs/tools/analyze.md.bak create mode 100644 docs/tools/chat.md.bak create mode 100644 docs/tools/codereview.md.bak create mode 100644 docs/tools/precommit.md.bak create mode 100644 docs/tools/refactor.md.bak create mode 100644 docs/tools/secaudit.md.bak create mode 100644 docs/tools/testgen.md.bak create mode 100644 docs/tools/thinkdeep.md.bak diff --git a/docs/advanced-usage.md 
b/docs/advanced-usage.md index 63856da..fcba473 100644 --- a/docs/advanced-usage.md +++ b/docs/advanced-usage.md @@ -34,11 +34,16 @@ Regardless of your default configuration, you can specify models per request: | Model | Provider | Context | Strengths | Auto Mode Usage | |-------|----------|---------|-----------|------------------| | **`pro`** (Gemini 2.5 Pro) | Google | 1M tokens | Extended thinking (up to 32K tokens), deep analysis | Complex architecture, security reviews, deep debugging | -| **`flash`** (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis | +| **`flash`** (Gemini 2.5 Flash) | Google | 1M tokens | Ultra-fast responses with thinking | Quick checks, formatting, simple analysis | +| **`flash-2.0`** (Gemini 2.0 Flash) | Google | 1M tokens | Fast model with audio/video support | Quick analysis with multimodal input | +| **`flashlite`** (Gemini 2.0 Flash Lite) | Google | 1M tokens | Lightweight text-only model | Fast text processing without vision | | **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis | | **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks | | **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts | | **`gpt4.1`** | OpenAI | 1M tokens | Latest GPT-4 with extended context | Large codebase analysis, comprehensive reviews | +| **`gpt5`** (GPT-5) | OpenAI | 400K tokens | Advanced model with reasoning support | Complex problems requiring advanced reasoning | +| **`gpt5-mini`** (GPT-5 Mini) | OpenAI | 400K tokens | Efficient variant with reasoning | Balanced performance and capability | +| **`gpt5-nano`** (GPT-5 Nano) | OpenAI | 400K tokens | Fastest, cheapest GPT-5 variant | Summarization and classification tasks | | **`grok-4-latest`** | X.AI | 256K tokens | Latest flagship model with reasoning, vision | Complex analysis, reasoning tasks | | 
**`grok-3`** | X.AI | 131K tokens | Advanced reasoning model | Deep analysis, complex problems | | **`grok-3-fast`** | X.AI | 131K tokens | Higher performance variant | Fast responses with reasoning | @@ -50,8 +55,16 @@ cloud models (expensive/powerful) AND local models (free/private) in the same co **Model Capabilities:** - **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context -- **O3 Models**: Excellent reasoning, systematic analysis, 200K context + - **Pro 2.5**: Deep analysis with max 32K thinking tokens + - **Flash 2.5**: Ultra-fast with thinking support (24K thinking tokens) + - **Flash 2.0**: Fast model with audio/video input (24K thinking tokens) + - **Flash Lite 2.0**: Text-only lightweight model (no thinking support) +- **O3/O4 Models**: Excellent reasoning, systematic analysis, 200K context - **GPT-4.1**: Extended context window (1M tokens), general capabilities +- **GPT-5 Series**: Advanced reasoning models, 400K context + - **GPT-5**: Full-featured with reasoning support and vision + - **GPT-5 Mini**: Balanced efficiency and capability + - **GPT-5 Nano**: Optimized for fast, low-cost tasks - **Grok-4**: Extended thinking support, vision capabilities, 256K context - **Grok-3 Models**: Advanced reasoning, 131K context @@ -148,7 +161,7 @@ All tools that work with files support **both individual files and entire direct **`analyze`** - Analyze files or directories - `files`: List of file paths or directories (required) - `question`: What to analyze (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `analysis_type`: architecture|performance|security|quality|general - `output_format`: summary|detailed|actionable - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) @@ -163,7 +176,7 @@ All tools that work with files support **both 
individual files and entire direct **`codereview`** - Review code files or directories - `files`: List of file paths or directories (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `review_type`: full|security|performance|quick - `focus_on`: Specific aspects to focus on - `standards`: Coding standards to enforce @@ -179,7 +192,7 @@ All tools that work with files support **both individual files and entire direct **`debug`** - Debug with file context - `error_description`: Description of the issue (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `error_context`: Stack trace or logs - `files`: Files or directories related to the issue - `runtime_info`: Environment details @@ -195,7 +208,7 @@ All tools that work with files support **both individual files and entire direct **`thinkdeep`** - Extended analysis with file context - `current_analysis`: Your current thinking (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `problem_context`: Additional context - `focus_areas`: Specific aspects to focus on - `files`: Files or directories for context @@ -211,7 +224,7 @@ All tools that work with files support **both individual files and entire direct **`testgen`** - Comprehensive test generation with edge case coverage - `files`: Code files or directories to generate tests for (required) - `prompt`: Description of what to test, testing objectives, and scope (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: 
auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `test_examples`: Optional existing test files as style/pattern reference - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) @@ -226,7 +239,7 @@ All tools that work with files support **both individual files and entire direct - `files`: Code files or directories to analyze for refactoring opportunities (required) - `prompt`: Description of refactoring goals, context, and specific areas of focus (required) - `refactor_type`: codesmells|decompose|modernize|organization (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security') - `style_guide_examples`: Optional existing code files to use as style/pattern reference - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) diff --git a/docs/tools/analyze.md b/docs/tools/analyze.md index 618a0be..af2c3a8 100644 --- a/docs/tools/analyze.md +++ b/docs/tools/analyze.md @@ -65,7 +65,7 @@ This workflow ensures methodical analysis before expert insights, resulting in d **Initial Configuration (used in step 1):** - `prompt`: What to analyze or look for (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `analysis_type`: architecture|performance|security|quality|general (default: general) - `output_format`: summary|detailed|actionable (default: detailed) - `temperature`: Temperature for analysis (0-1, default 0.2) diff --git a/docs/tools/analyze.md.bak b/docs/tools/analyze.md.bak new file mode 100644 index 0000000..618a0be --- /dev/null +++ 
b/docs/tools/analyze.md.bak @@ -0,0 +1,199 @@ +# Analyze Tool - Smart File Analysis + +**General-purpose code understanding and exploration through workflow-driven investigation** + +The `analyze` tool provides comprehensive code analysis and understanding capabilities, helping you explore codebases, understand architecture, and identify patterns across files and directories. This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis. + +## Thinking Mode + +**Default is `medium` (8,192 tokens).** Use `high` for architecture analysis (comprehensive insights worth the cost) or `low` for quick file overviews (save ~6k tokens). + +## How the Workflow Works + +The analyze tool implements a **structured workflow** for thorough code understanding: + +**Investigation Phase (Claude-Led):** +1. **Step 1**: Claude describes the analysis plan and begins examining code structure +2. **Step 2+**: Claude investigates architecture, patterns, dependencies, and design decisions +3. **Throughout**: Claude tracks findings, relevant files, insights, and confidence levels +4. **Completion**: Once analysis is comprehensive, Claude signals completion + +**Expert Analysis Phase:** +After Claude completes the investigation (unless confidence is **certain**): +- Complete analysis summary with all findings +- Architectural insights and pattern identification +- Strategic improvement recommendations +- Final expert assessment based on investigation + +This workflow ensures methodical analysis before expert insights, resulting in deeper understanding and more valuable recommendations. 
+ +## Example Prompts + +**Basic Usage:** +``` +"Use gemini to analyze main.py to understand how it works" +"Get gemini to do an architecture analysis of the src/ directory" +``` + +## Key Features + +- **Analyzes single files or entire directories** with intelligent file filtering +- **Supports specialized analysis types**: architecture, performance, security, quality, general +- **Uses file paths (not content) for clean terminal output** while processing full content +- **Can identify patterns, anti-patterns, and refactoring opportunities** +- **Large codebase support**: Handle massive codebases with 1M token context models +- **Cross-file relationship mapping**: Understand dependencies and interactions +- **Architecture visualization**: Describe system structure and component relationships +- **Image support**: Analyze architecture diagrams, UML charts, flowcharts: `"Analyze this system diagram with gemini to understand the data flow and identify bottlenecks"` +- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices + +## Tool Parameters + +**Workflow Investigation Parameters (used during step-by-step process):** +- `step`: Current investigation step description (required for each step) +- `step_number`: Current step number in analysis sequence (required) +- `total_steps`: Estimated total investigation steps (adjustable) +- `next_step_required`: Whether another investigation step is needed +- `findings`: Discoveries and insights collected in this step (required) +- `files_checked`: All files examined during investigation +- `relevant_files`: Files directly relevant to the analysis (required in step 1) +- `relevant_context`: Methods/functions/classes central to analysis findings +- `issues_found`: Issues or concerns identified with severity levels +- `confidence`: Confidence level 
in analysis completeness (exploring/low/medium/high/certain) +- `backtrack_from_step`: Step number to backtrack from (for revisions) +- `images`: Visual references for analysis context + +**Initial Configuration (used in step 1):** +- `prompt`: What to analyze or look for (required) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `analysis_type`: architecture|performance|security|quality|general (default: general) +- `output_format`: summary|detailed|actionable (default: detailed) +- `temperature`: Temperature for analysis (0-1, default 0.2) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_websearch`: Enable web search for documentation and best practices (default: true) +- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) +- `continuation_id`: Continue previous analysis sessions + +## Analysis Types + +**General Analysis (default):** +- Overall code structure and organization +- Key components and their responsibilities +- Data flow and control flow +- Design patterns and architectural decisions + +**Architecture Analysis:** +- System-level design and component relationships +- Module dependencies and coupling +- Separation of concerns and layering +- Scalability and maintainability considerations + +**Performance Analysis:** +- Potential bottlenecks and optimization opportunities +- Algorithmic complexity assessment +- Memory usage patterns +- I/O and database interaction efficiency + +**Security Analysis:** +- Security patterns and potential vulnerabilities +- Input validation and sanitization +- Authentication and authorization mechanisms +- Data protection and privacy considerations + +**Quality Analysis:** +- Code quality metrics and maintainability +- Testing coverage and patterns +- Documentation completeness +- Best practices adherence + +## Usage Examples + +**Single File Analysis:** +``` +"Analyze user_controller.py to 
understand the authentication flow with gemini" +``` + +**Directory Architecture Analysis:** +``` +"Use pro to analyze the src/ directory architecture and identify the main components" +``` + +**Performance-Focused Analysis:** +``` +"Analyze backend/api/ for performance bottlenecks with o3, focus on database queries" +``` + +**Security Assessment:** +``` +"Use gemini pro to analyze the authentication module for security patterns and potential issues" +``` + +**Visual + Code Analysis:** +``` +"Analyze this system architecture diagram along with the src/core/ implementation to understand the data flow" +``` + +**Large Codebase Analysis:** +``` +"Analyze the entire project structure with gemini pro to understand how all components work together" +``` + +## Output Formats + +**Summary Format:** +- High-level overview with key findings +- Main components and their purposes +- Critical insights and recommendations + +**Detailed Format (default):** +- Comprehensive analysis with specific examples +- Code snippets and file references +- Detailed explanations of patterns and structures + +**Actionable Format:** +- Specific recommendations and next steps +- Prioritized list of improvements +- Implementation guidance and examples + +## Best Practices + +- **Be specific about goals**: Clearly state what you want to understand or discover +- **Use appropriate analysis types**: Choose the type that matches your needs +- **Include related files**: Analyze modules together for better context understanding +- **Leverage large context models**: Use Gemini Pro for comprehensive codebase analysis +- **Combine with visual context**: Include architecture diagrams or documentation +- **Use continuation**: Build on previous analysis for deeper understanding + +## Advanced Features + +**Large Codebase Support:** +With models like Gemini Pro (1M context), you can analyze extensive codebases: +``` +"Analyze the entire microservices architecture across all service directories" +``` + 
+**Cross-File Relationship Mapping:** +Understand how components interact across multiple files: +``` +"Analyze the data processing pipeline across input/, processing/, and output/ directories" +``` + +**Pattern Recognition:** +Identify design patterns, anti-patterns, and architectural decisions: +``` +"Analyze src/ to identify all design patterns used and assess their implementation quality" +``` + +**Web Search Enhancement:** +The tool can recommend searches for current best practices and documentation: +``` +After analysis: "Recommended searches for Claude: 'FastAPI async best practices 2024', 'SQLAlchemy ORM performance optimization patterns'" +``` + +## When to Use Analyze vs Other Tools + +- **Use `analyze`** for: Understanding code structure, exploring unfamiliar codebases, architecture assessment +- **Use `codereview`** for: Finding bugs and security issues with actionable fixes +- **Use `debug`** for: Diagnosing specific runtime errors or performance problems +- **Use `refactor`** for: Getting specific refactoring recommendations and implementation plans +- **Use `chat`** for: Open-ended discussions about code without structured analysis \ No newline at end of file diff --git a/docs/tools/chat.md b/docs/tools/chat.md index b7557eb..31bd8ef 100644 --- a/docs/tools/chat.md +++ b/docs/tools/chat.md @@ -33,7 +33,7 @@ and then debate with the other models to give me a final verdict ## Tool Parameters - `prompt`: Your question or discussion topic (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `files`: Optional files for context (absolute paths) - `images`: Optional images for visual context (absolute paths) - `temperature`: Response creativity (0-1, default 0.5) diff --git a/docs/tools/chat.md.bak b/docs/tools/chat.md.bak new file mode 100644 index 0000000..b7557eb --- /dev/null +++ 
b/docs/tools/chat.md.bak @@ -0,0 +1,84 @@ +# Chat Tool - General Development Chat & Collaborative Thinking + +**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively** + +The `chat` tool is your collaborative thinking partner for development conversations. It's designed to help you brainstorm, validate ideas, get second opinions, and explore alternatives in a conversational format. + +## Thinking Mode + +**Default is `medium` (8,192 tokens).** Use `low` for quick questions to save tokens, or `high` for complex discussions when thoroughness matters. + +## Example Prompt + +``` +Chat with zen and pick the best model for this job. I need to pick between Redis and Memcached for session storage +and I need an expert opinion for the project I'm working on. Get a good idea of what the project does, pick one of the two options +and then debate with the other models to give me a final verdict +``` + +## Key Features + +- **Collaborative thinking partner** for your analysis and planning +- **Get second opinions** on your designs and approaches +- **Brainstorm solutions** and explore alternatives together +- **Validate your checklists** and implementation plans +- **General development questions** and explanations +- **Technology comparisons** and best practices +- **Architecture and design discussions** +- **File reference support**: `"Use gemini to explain this algorithm with context from algorithm.py"` +- **Image support**: Include screenshots, diagrams, UI mockups for visual analysis: `"Chat with gemini about this error dialog screenshot to understand the user experience issue"` +- **Dynamic collaboration**: Gemini can request additional files or context during the conversation if needed for a more thorough response +- **Web search capability**: Analyzes when web searches would be helpful and recommends specific searches for Claude to perform, ensuring access to current documentation and best practices + +## Tool Parameters + +- `prompt`: 
Your question or discussion topic (required) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `files`: Optional files for context (absolute paths) +- `images`: Optional images for visual context (absolute paths) +- `temperature`: Response creativity (0-1, default 0.5) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_websearch`: Enable web search for documentation and insights (default: true) +- `continuation_id`: Continue previous conversations + +## Usage Examples + +**Basic Development Chat:** +``` +"Chat with zen about the best approach for user authentication in my React app" +``` + +**Technology Comparison:** +``` +"Use flash to discuss whether PostgreSQL or MongoDB would be better for my e-commerce platform" +``` + +**Architecture Discussion:** +``` +"Chat with pro about microservices vs monolith architecture for my project, consider scalability and team size" +``` + +**File Context Analysis:** +``` +"Use gemini to chat about the current authentication implementation in auth.py and suggest improvements" +``` + +**Visual Analysis:** +``` +"Chat with gemini about this UI mockup screenshot - is the user flow intuitive?" 
+``` + +## Best Practices + +- **Be specific about context**: Include relevant files or describe your project scope +- **Ask for trade-offs**: Request pros/cons for better decision-making +- **Use conversation continuation**: Build on previous discussions with `continuation_id` +- **Leverage visual context**: Include diagrams, mockups, or screenshots when discussing UI/UX +- **Request web searches**: Ask for current best practices or recent developments in technologies + +## When to Use Chat vs Other Tools + +- **Use `chat`** for: Open-ended discussions, brainstorming, getting second opinions, technology comparisons +- **Use `thinkdeep`** for: Extending specific analysis, challenging assumptions, deeper reasoning +- **Use `analyze`** for: Understanding existing code structure and patterns +- **Use `debug`** for: Specific error diagnosis and troubleshooting \ No newline at end of file diff --git a/docs/tools/codereview.md b/docs/tools/codereview.md index 9037cc2..bb095c5 100644 --- a/docs/tools/codereview.md +++ b/docs/tools/codereview.md @@ -80,7 +80,7 @@ The above prompt will simultaneously run two separate `codereview` tools with tw **Initial Review Configuration (used in step 1):** - `prompt`: User's summary of what the code does, expected behavior, constraints, and review objectives (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `review_type`: full|security|performance|quick (default: full) - `focus_on`: Specific aspects to focus on (e.g., "security vulnerabilities", "performance bottlenecks") - `standards`: Coding standards to enforce (e.g., "PEP8", "ESLint", "Google Style Guide") diff --git a/docs/tools/codereview.md.bak b/docs/tools/codereview.md.bak new file mode 100644 index 0000000..9037cc2 --- /dev/null +++ b/docs/tools/codereview.md.bak @@ -0,0 +1,174 @@ +# CodeReview Tool - Professional 
Code Review + +**Comprehensive code analysis with prioritized feedback through workflow-driven investigation** + +The `codereview` tool provides professional code review capabilities with actionable feedback, severity-based issue prioritization, and support for various review types from quick style checks to comprehensive security audits. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis. + +## Thinking Mode + +**Default is `medium` (8,192 tokens).** Use `high` for security-critical code (worth the extra tokens) or `low` for quick style checks (saves ~6k tokens). + +## How the Workflow Works + +The codereview tool implements a **structured workflow** that ensures thorough code examination: + +**Investigation Phase (Claude-Led):** +1. **Step 1**: Claude describes the review plan and begins systematic analysis of code structure +2. **Step 2+**: Claude examines code quality, security implications, performance concerns, and architectural patterns +3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels +4. **Completion**: Once review is comprehensive, Claude signals completion + +**Expert Analysis Phase:** +After Claude completes the investigation (unless confidence is **certain**): +- Complete review summary with all findings and evidence +- Relevant files and code patterns identified +- Issues categorized by severity levels +- Final recommendations based on investigation + +**Special Note**: If you want Claude to perform the entire review without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently. + +## Model Recommendation + +This tool particularly benefits from Gemini Pro or Flash models due to their 1M context window, which allows comprehensive analysis of large codebases. 
Claude's context limitations make it challenging to see the "big picture" in complex projects - this is a concrete example where utilizing a secondary model with larger context provides significant value beyond just experimenting with different AI capabilities. + +## Example Prompts + +``` +Perform a codereview with gemini pro and review auth.py for security issues and potential vulnerabilities. +I need an actionable plan but break it down into smaller quick-wins that we can implement and test rapidly +``` + +## Pro Tip: Multiple Parallel Reviews + +**You can start more than one codereview session with Claude:** + +``` +Start separate sub-tasks for codereview one with o3 finding critical issues and one with flash finding low priority issues +and quick-wins and give me the final single combined review highlighting only the critical issues +``` + +The above prompt will simultaneously run two separate `codereview` tools with two separate models and combine the output into a single summary for you to consume. 
+ +## Key Features + +- **Issues prioritized by severity** (πŸ”΄ CRITICAL β†’ 🟒 LOW) +- **Supports specialized reviews**: security, performance, quick +- **Coding standards enforcement**: `"Use gemini to review src/ against PEP8 standards"` +- **Severity filtering**: `"Get gemini to review auth/ - only report critical vulnerabilities"` +- **Image support**: Review code from screenshots, error dialogs, or visual bug reports: `"Review this error screenshot and the related auth.py file for potential security issues"` +- **Multi-file analysis**: Comprehensive review of entire directories or codebases +- **Actionable feedback**: Specific recommendations with line numbers and code examples +- **Language-specific expertise**: Tailored analysis for Python, JavaScript, Java, C#, Swift, and more +- **Integration issue detection**: Identifies cross-file dependencies and architectural problems +- **Security vulnerability scanning**: Focused on common security patterns and anti-patterns + +## Tool Parameters + +**Workflow Investigation Parameters (used during step-by-step process):** +- `step`: Current investigation step description (required for each step) +- `step_number`: Current step number in review sequence (required) +- `total_steps`: Estimated total investigation steps (adjustable) +- `next_step_required`: Whether another investigation step is needed +- `findings`: Discoveries and evidence collected in this step (required) +- `files_checked`: All files examined during investigation +- `relevant_files`: Files directly relevant to the review (required in step 1) +- `relevant_context`: Methods/functions/classes central to review findings +- `issues_found`: Issues identified with severity levels +- `confidence`: Confidence level in review completeness (exploring/low/medium/high/certain) +- `backtrack_from_step`: Step number to backtrack from (for revisions) +- `images`: Visual references for review context + +**Initial Review Configuration (used in step 1):** +- `prompt`: 
User's summary of what the code does, expected behavior, constraints, and review objectives (required) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `review_type`: full|security|performance|quick (default: full) +- `focus_on`: Specific aspects to focus on (e.g., "security vulnerabilities", "performance bottlenecks") +- `standards`: Coding standards to enforce (e.g., "PEP8", "ESLint", "Google Style Guide") +- `severity_filter`: critical|high|medium|low|all (default: all) +- `temperature`: Temperature for consistency (0-1, default 0.2) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_websearch`: Enable web search for best practices and documentation (default: true) +- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) +- `continuation_id`: Continue previous review discussions + +## Review Types + +**Full Review (default):** +- Comprehensive analysis including bugs, security, performance, maintainability +- Best for new features or significant code changes + +**Security Review:** +- Focused on security vulnerabilities and attack vectors +- Checks for common security anti-patterns +- Best for authentication, authorization, data handling code + +**Performance Review:** +- Analyzes performance bottlenecks and optimization opportunities +- Memory usage, algorithmic complexity, resource management +- Best for performance-critical code paths + +**Quick Review:** +- Fast style and basic issue check +- Lower token usage for rapid feedback +- Best for code formatting and simple validation + +## Severity Levels + +Issues are categorized and prioritized: + +- **πŸ”΄ CRITICAL**: Security vulnerabilities, crashes, data corruption +- **🟠 HIGH**: Logic errors, performance issues, reliability problems +- **🟑 MEDIUM**: Code smells, maintainability issues, minor bugs +- **🟒 LOW**: Style issues, documentation, minor improvements + +## Usage Examples + 
+**Basic Security Review:** +``` +"Review the authentication module in auth/ for security vulnerabilities with gemini pro" +``` + +**Performance-Focused Review:** +``` +"Use o3 to review backend/api.py for performance issues, focus on database queries and caching" +``` + +**Quick Style Check:** +``` +"Quick review of utils.py with flash, only report critical and high severity issues" +``` + +**Standards Enforcement:** +``` +"Review src/ directory against PEP8 standards with gemini, focus on code formatting and structure" +``` + +**Visual Context Review:** +``` +"Review this authentication code along with the error dialog screenshot to understand the security implications" +``` + +## Best Practices + +- **Provide context**: Describe what the code is supposed to do and any constraints +- **Use appropriate review types**: Security for auth code, performance for critical paths +- **Set severity filters**: Focus on critical issues for quick wins +- **Include relevant files**: Review related modules together for better context +- **Use parallel reviews**: Run multiple reviews with different models for comprehensive coverage +- **Follow up on findings**: Use the continuation feature to discuss specific issues in detail + +## Output Format + +Reviews include: +- **Executive Summary**: Overview of code quality and main concerns +- **Detailed Findings**: Specific issues with severity levels, line numbers, and recommendations +- **Quick Wins**: Easy-to-implement improvements with high impact +- **Long-term Improvements**: Structural changes for better maintainability +- **Security Considerations**: Specific security recommendations when relevant + +## When to Use CodeReview vs Other Tools + +- **Use `codereview`** for: Finding bugs, security issues, performance problems, code quality assessment +- **Use `analyze`** for: Understanding code structure without finding issues +- **Use `debug`** for: Diagnosing specific runtime errors or exceptions +- **Use `refactor`** for: 
Identifying structural improvements and modernization opportunities \ No newline at end of file diff --git a/docs/tools/debug.md b/docs/tools/debug.md index 6e7f20d..985708b 100644 --- a/docs/tools/debug.md +++ b/docs/tools/debug.md @@ -73,7 +73,7 @@ This structured approach ensures Claude performs methodical groundwork before ex - `images`: Visual debugging materials (error screenshots, logs, etc.) **Model Selection:** -- `model`: auto|pro|flash|o3|o3-mini|o4-mini (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) - `use_websearch`: Enable web search for documentation and solutions (default: true) - `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) diff --git a/docs/tools/precommit.md b/docs/tools/precommit.md index d70c1ab..85c9422 100644 --- a/docs/tools/precommit.md +++ b/docs/tools/precommit.md @@ -135,7 +135,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio **Initial Configuration (used in step 1):** - `path`: Starting directory to search for repos (default: current directory, absolute path required) - `prompt`: The original user request description for the changes (required for context) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `compare_to`: Compare against a branch/tag instead of local changes (optional) - `severity_filter`: critical|high|medium|low|all (default: all) - `include_staged`: Include staged changes in the review (default: true) diff --git a/docs/tools/precommit.md.bak b/docs/tools/precommit.md.bak new file mode 100644 index 0000000..d70c1ab --- /dev/null +++ b/docs/tools/precommit.md.bak @@ -0,0 +1,247 @@ +# PreCommit 
Tool - Pre-Commit Validation + +**Comprehensive review of staged/unstaged git changes across multiple repositories through workflow-driven investigation** + +The `precommit` tool provides thorough validation of git changes before committing, ensuring code quality, requirement compliance, and preventing regressions across multiple repositories. This workflow tool guides Claude through systematic investigation of git changes, repository status, and file modifications across multiple steps before providing expert validation. + +## Thinking Mode + +**Default is `medium` (8,192 tokens).** Use `high` or `max` for critical releases when thorough validation justifies the token cost. + +## How the Workflow Works + +The precommit tool implements a **structured workflow** for comprehensive change validation: + +**Investigation Phase (Claude-Led):** +1. **Step 1**: Claude describes the validation plan and begins analyzing git status across repositories +2. **Step 2+**: Claude examines changes, diffs, dependencies, and potential impacts +3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels +4. **Completion**: Once investigation is thorough, Claude signals completion + +**Expert Validation Phase:** +After Claude completes the investigation (unless confidence is **certain**): +- Complete summary of all changes and their context +- Potential issues and regressions identified +- Requirement compliance assessment +- Final recommendations for safe commit + +**Special Note**: If you want Claude to perform the entire pre-commit validation without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently. + +## Model Recommendation + +Pre-commit validation benefits significantly from models with extended context windows like Gemini Pro, which can analyze extensive changesets across multiple files and repositories simultaneously. 
This comprehensive view enables detection of cross-file dependencies, architectural inconsistencies, and integration issues that might be missed when reviewing changes in isolation due to context constraints. + +## Visual Example + +
+ +
+ +**Prompt Used:** +``` +Now use gemini and perform a review and precommit and ensure original requirements are met, no duplication of code or +logic, everything should work as expected +``` + +How beautiful is that? Claude used `precommit` twice and `codereview` once and actually found and fixed two critical errors before commit! + +### Real-world Example + +Here's an example where Claude was made to go through changes *it* was asked to make. Its recommendation +to **Publish all Changes** was made after a rigorous set of prompts where it examined the small set of changes +from different angles (performance, bugs, anti-patterns etc). After confirming that unit tests were +passing, it went ahead with an initial review with O3; came back clean - all good to go. O3 was happy, everything looked great. + +Then it shared the changes and relevant code with Gemini 2.5 Pro - the following is the outcome: + +``` +⏺ zen:precommit (MCP)(step: "FINAL VALIDATION COMPLETE. All changes have been thoroughly analyzed and are ready for + commit. The modifications successfully fix 4 GitHub CI test failures with high-quality, defensive + implementations that maintain architectural integrity and coding standards. 
No issues or regressions + detected.", step_number: 4, total_steps: 4, next_step_required: false, findings: "COMMIT + RECOMMENDATION: βœ… APPROVED\n\nSUMMARY OF CHANGES:\nβœ… Fixed 4 GitHub CI test failures:\n - + test_openrouter_with_restrictions (Mock iteration issue) \n - test_execute_with_branching + (Missing metadata)\n - test_execute_with_revision (Missing metadata)\n - test_metadata_preserved + (Provider type AttributeError)\n\nβœ… IMPLEMENTATION QUALITY VERIFIED:\n - Defensive programming + with proper type checking\n - Error handling with meaningful fallbacks\n - Backward + compatibility maintained\n - Coding standards followed (no hasattr/getattr in changes)\n - + Architectural patterns preserved\n\nβœ… COMPREHENSIVE TESTING:\n - All originally failing tests + pass\n - Affected functionality: 37/37 tests pass\n - Provider systems: 16/16 tests pass \n - + Module imports work correctly\n - No regressions detected\n\nβœ… CHANGE IMPACT ASSESSMENT:\n - + Minimal, targeted changes\n - No security or performance concerns\n - Maintains API + compatibility\n - Changes are well-documented with clear comments\n\nRECOMMENDATION: These changes + are safe to commit and will resolve the GitHub CI failures without any negative impact.", + ⎿ Β { + "status": "calling_expert_analysis", + "step_number": 4, + … +169 lines (ctrl+r to expand) + +⏺ πŸ” Pre-Commit Validation Complete + + Based on comprehensive analysis using both O3 and Gemini 2.5 Pro models, I can provide the following assessment: + + βœ… APPROVED WITH CRITICAL FIX REQUIRED + + The changes successfully fix all 4 GitHub CI test failures and are well-implemented with defensive programming + patterns. 
However, the expert analysis identified one critical issue that must be addressed before commit:
+
+ 🚨 [HIGH PRIORITY] Critical Issue Found
+
+ Inconsistent Provider Handling in tools/base.py
+ - Location: tools/base.py line ~1833 in _create_continuation_offer_response method
+ - Problem: The defensive provider handling pattern was applied in 2 locations but missed a 3rd location
+ - Impact: Risk of AttributeError: 'str' object has no attribute 'get_provider_type' in continuation scenarios
+ - Fix Required: Apply the same defensive pattern to the missed location
+```
+
+Gemini spotted a subtle but critical issue that both Claude and O3 missed. This is the power of running these workflows with Zen.
+
+## Example Prompts
+
+```
+Use zen and perform a thorough precommit ensuring there aren't any new regressions or bugs introduced
+```
+
+## Key Features
+
+- **Recursive repository discovery** - finds all git repos including nested ones
+- **Validates changes against requirements** - ensures implementation matches intent
+- **Detects incomplete changes** - finds added functions never called, missing tests, etc.
+- **Multi-repo support** - reviews changes across multiple repositories in one go +- **Configurable scope** - review staged, unstaged, or compare against branches +- **Security focused** - catches exposed secrets, vulnerabilities in new code +- **Smart truncation** - handles large diffs without exceeding context limits +- **Cross-file dependency analysis** - identifies breaking changes across modules +- **Test coverage validation** - ensures new code has appropriate test coverage +- **Regression detection** - compares against requirements to prevent scope creep + +## Tool Parameters + +**Workflow Investigation Parameters (used during step-by-step process):** +- `step`: Current investigation step description (required for each step) +- `step_number`: Current step number in validation sequence (required) +- `total_steps`: Estimated total investigation steps (adjustable) +- `next_step_required`: Whether another investigation step is needed +- `findings`: Discoveries and evidence collected in this step (required) +- `files_checked`: All files examined during investigation +- `relevant_files`: Files directly relevant to the changes +- `relevant_context`: Methods/functions/classes affected by changes +- `issues_found`: Issues identified with severity levels +- `confidence`: Confidence level in validation completeness (exploring/low/medium/high/certain) +- `backtrack_from_step`: Step number to backtrack from (for revisions) +- `hypothesis`: Current assessment of change safety and completeness +- `images`: Screenshots of requirements, design mockups for validation + +**Initial Configuration (used in step 1):** +- `path`: Starting directory to search for repos (default: current directory, absolute path required) +- `prompt`: The original user request description for the changes (required for context) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `compare_to`: Compare against a branch/tag instead of local changes (optional) +- 
`severity_filter`: critical|high|medium|low|all (default: all) +- `include_staged`: Include staged changes in the review (default: true) +- `include_unstaged`: Include uncommitted changes in the review (default: true) +- `focus_on`: Specific aspects to focus on +- `temperature`: Temperature for response (default: 0.2) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_websearch`: Enable web search for best practices (default: true) +- `use_assistant_model`: Whether to use expert validation phase (default: true, set to false to use Claude only) +- `continuation_id`: Continue previous validation discussions + +## Usage Examples + +**Basic Pre-commit Validation:** +``` +"Use zen precommit to validate my changes before committing" +``` + +**Security-Focused Validation:** +``` +"Perform precommit security review with gemini pro on the authentication changes" +``` + +**Multi-Repository Validation:** +``` +"Validate changes across all repositories in this workspace with o3" +``` + +**Against Specific Branch:** +``` +"Compare current changes against main branch with precommit using gemini pro" +``` + +**With Requirements Context:** +``` +"Precommit validation ensuring the new payment feature meets requirements in FEATURE_SPEC.md" +``` + +## Validation Scope + +The tool automatically discovers and validates: + +**Repository Discovery:** +- Searches recursively for all `.git` directories +- Handles nested repositories and submodules +- Configurable search depth to prevent excessive recursion + +**Change Analysis:** +- Staged changes (`git diff --cached`) +- Unstaged changes (`git diff`) +- Untracked files that should be added +- Deleted files and their impact + +**Cross-Repository Impact:** +- Shared dependencies between repositories +- API contract changes that affect other repos +- Configuration changes with system-wide impact + +## Validation Categories + +**Completeness Checks:** +- New functions/classes have corresponding tests +- 
Documentation updated for API changes +- Configuration files updated as needed +- Migration scripts for database changes + +**Quality Assurance:** +- Code follows project standards +- No obvious bugs or logical errors +- Performance implications considered +- Security vulnerabilities addressed + +**Requirement Compliance:** +- Implementation matches original requirements +- No scope creep or unauthorized changes +- All acceptance criteria met +- Edge cases properly handled + +**Integration Safety:** +- Breaking changes properly documented +- Backward compatibility maintained where required +- Dependencies correctly updated +- Environment-specific changes validated + +## Best Practices + +- **Provide clear context**: Include the original requirements or feature description +- **Use for significant changes**: Most valuable for features, refactoring, or security updates +- **Review before final commit**: Catch issues before they enter the main branch +- **Include visual context**: Screenshots of requirements or expected behavior +- **Focus validation scope**: Use `focus_on` parameter for specific concerns +- **Multi-stage validation**: Use continuation for iterative improvement + +## Output Format + +Validation results include: +- **Change Summary**: Overview of what was modified across repositories +- **Requirement Compliance**: How well changes match original intent +- **Completeness Assessment**: Missing tests, documentation, or related changes +- **Security Review**: Potential vulnerabilities or exposed secrets +- **Integration Impact**: Cross-repository and cross-module effects +- **Recommendations**: Specific actions before committing + +## When to Use PreCommit vs Other Tools + +- **Use `precommit`** for: Validating changes before git commit, ensuring requirement compliance +- **Use `codereview`** for: General code quality assessment without git context +- **Use `debug`** for: Diagnosing specific runtime issues +- **Use `analyze`** for: Understanding existing 
code without validation context \ No newline at end of file diff --git a/docs/tools/refactor.md b/docs/tools/refactor.md index 6407a4a..d5fda94 100644 --- a/docs/tools/refactor.md +++ b/docs/tools/refactor.md @@ -103,7 +103,7 @@ This results in Claude first performing its own expert analysis, encouraging it **Initial Configuration (used in step 1):** - `prompt`: Description of refactoring goals, context, and specific areas of focus (required) - `refactor_type`: codesmells|decompose|modernize|organization (default: codesmells) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security') - `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths) - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) diff --git a/docs/tools/refactor.md.bak b/docs/tools/refactor.md.bak new file mode 100644 index 0000000..6407a4a --- /dev/null +++ b/docs/tools/refactor.md.bak @@ -0,0 +1,201 @@ +# Refactor Tool - Intelligent Code Refactoring + +**Comprehensive refactoring analysis with top-down decomposition strategy through workflow-driven investigation** + +The `refactor` tool provides intelligent code refactoring recommendations with a focus on top-down decomposition and systematic code improvement. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance. 
+ +## Thinking Mode + +**Default is `medium` (8,192 tokens).** Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis. + +## How the Workflow Works + +The refactor tool implements a **structured workflow** for systematic refactoring analysis: + +**Investigation Phase (Claude-Led):** +1. **Step 1**: Claude describes the refactoring plan and begins analyzing code structure +2. **Step 2+**: Claude examines code smells, decomposition opportunities, and modernization possibilities +3. **Throughout**: Claude tracks findings, relevant files, refactoring opportunities, and confidence levels +4. **Completion**: Once investigation is thorough, Claude signals completion + +**Expert Analysis Phase:** +After Claude completes the investigation (unless confidence is **complete**): +- Complete refactoring opportunity summary +- Prioritized recommendations by impact +- Precise implementation guidance with line numbers +- Final expert assessment for refactoring strategy + +This workflow ensures methodical investigation before expert recommendations, resulting in more targeted and valuable refactoring plans. + +## Model Recommendation + +The refactor tool excels with models that have large context windows like Gemini Pro (1M tokens), which can analyze entire files and complex codebases simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural patterns, and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context constraints. 
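The expert-analysis gate in the workflow above can be sketched as a small predicate. This is an illustrative sketch only, not the server's actual internals: the function name is invented, while the confidence scale and the `use_assistant_model` flag come from this tool's documented parameters.

```python
# Illustrative sketch (not real zen-mcp-server code): expert analysis runs
# after Claude's investigation unless confidence is "complete" or the caller
# opted out of the assistant model entirely.

def needs_expert_analysis(confidence: str, use_assistant_model: bool = True) -> bool:
    """Decide whether a second model should review the refactoring findings."""
    if not use_assistant_model:
        return False  # Claude completes the full workflow alone
    # refactor confidence scale: exploring / incomplete / partial / complete
    return confidence != "complete"
```

For example, a `partial` confidence still triggers the expert phase, while `complete` skips the extra model call.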
+
+## Example Prompts
+
+```
+"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions"
+"Using zen's refactor decompose the all_in_one_sync_code.swift into maintainable extensions"
+```
+
+💡**Example of a powerful prompt** to get the best out of both Claude + Flash's 1M Context:
+```
+"First, think about how the authentication module works, find related classes and find
+ any code smells, then using zen's refactor ask flash to confirm your findings but ask
+ it to find additional code smells and any other quick-wins and then fix these issues"
+```
+
+This results in Claude first performing its own expert analysis, encouraging it to think critically and identify links within the project code. It then prompts `flash` to review the same code with a hint, preventing it from duplicating Claude's findings and encouraging it to explore other areas that Claude did *not* discover.
+
+## Key Features
+
+- **Intelligent prioritization** - Will refuse to work on low-priority issues if the code is unwieldy and requires decomposition first; this helps identify poorly managed classes and files that need structural improvements before detail work
+- **Top-down decomposition strategy** - Analyzes file → class → function levels systematically
+- **Four refactor types**: `codesmells` (detect anti-patterns), `decompose` (break down large components), `modernize` (update language features), `organization` (improve structure)
+- **Precise line-number references** - Provides exact line numbers for Claude to implement changes
+- **Language-specific guidance** - Tailored suggestions for Python, JavaScript, Java, C#, Swift, and more
+- **Style guide integration** - Uses existing project files as pattern references
+- **Conservative approach** - Careful dependency analysis to prevent breaking changes
+- **Multi-file analysis** - Understands cross-file relationships and dependencies
+- **Priority sequencing** - Recommends implementation order for refactoring changes
+- **Image support**: Analyze code architecture diagrams, legacy system charts: `"Refactor this legacy module using gemini pro with the current architecture diagram"`
+
+## Refactor Types (Progressive Priority System)
+
+**1. `decompose` (CRITICAL PRIORITY)** - Context-aware decomposition with adaptive thresholds:
+
+**AUTOMATIC decomposition** (CRITICAL severity - blocks all other refactoring):
+- Files >15,000 LOC, Classes >3,000 LOC, Functions >500 LOC
+
+**EVALUATE decomposition** (contextual severity - intelligent assessment):
+- Files >5,000 LOC, Classes >1,000 LOC, Functions >150 LOC
+- Only recommends if genuinely improves maintainability
+- Respects legacy stability, domain complexity, performance constraints
+- Considers legitimate cases where size is justified (algorithms, state machines, generated code)
+
+**2. `codesmells`** - Applied only after decomposition is complete:
+- Detect long methods, complex conditionals, duplicate code, magic numbers, poor naming
+
+**3. `modernize`** - Applied only after decomposition is complete:
+- Update to modern language features (f-strings, async/await, etc.)
+
+**4. `organization`** - Applied only after decomposition is complete:
+- Improve logical grouping, separation of concerns, module structure
+
+**Progressive Analysis:** The tool performs a top-down check (worse → bad → better) and refuses to work on lower-priority issues if critical decomposition is needed first. It understands that massive files and classes create cognitive overload that must be addressed before detail work can be effective. Legacy code that cannot be safely decomposed is handled with higher tolerance thresholds and context-sensitive exemptions.
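The adaptive thresholds above can be expressed as a tiny classifier. The LOC limits mirror the documented AUTOMATIC/EVALUATE tiers; the function name and return labels are illustrative, not the tool's real API.

```python
# Sketch of the progressive decomposition thresholds documented above.
# Limits match the doc; names and labels are illustrative only.

AUTOMATIC = {"file": 15_000, "class": 3_000, "function": 500}  # CRITICAL severity
EVALUATE = {"file": 5_000, "class": 1_000, "function": 150}    # contextual severity

def decomposition_severity(kind: str, loc: int) -> str:
    """Classify a file/class/function size per the progressive priority system."""
    if loc > AUTOMATIC[kind]:
        return "critical"  # blocks codesmells/modernize/organization work
    if loc > EVALUATE[kind]:
        return "evaluate"  # decompose only if it genuinely improves maintainability
    return "ok"
```

For instance, a 3,500-line class classifies as `"critical"`, so the tool would refuse lower-priority refactoring until it is decomposed.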
+ +## Tool Parameters + +**Workflow Investigation Parameters (used during step-by-step process):** +- `step`: Current investigation step description (required for each step) +- `step_number`: Current step number in refactoring sequence (required) +- `total_steps`: Estimated total investigation steps (adjustable) +- `next_step_required`: Whether another investigation step is needed +- `findings`: Discoveries and refactoring opportunities in this step (required) +- `files_checked`: All files examined during investigation +- `relevant_files`: Files directly needing refactoring (required in step 1) +- `relevant_context`: Methods/functions/classes requiring refactoring +- `issues_found`: Refactoring opportunities with severity and type +- `confidence`: Confidence level in analysis completeness (exploring/incomplete/partial/complete) +- `backtrack_from_step`: Step number to backtrack from (for revisions) +- `hypothesis`: Current assessment of refactoring priorities + +**Initial Configuration (used in step 1):** +- `prompt`: Description of refactoring goals, context, and specific areas of focus (required) +- `refactor_type`: codesmells|decompose|modernize|organization (default: codesmells) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security') +- `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) +- `continuation_id`: Thread continuation ID for multi-turn conversations + +## Usage Examples + +**Decomposition Analysis:** +``` +"Analyze UserController.java for decomposition opportunities - it's becoming unwieldy" +``` + +**Code Smell Detection:** +``` +"Use gemini to identify code smells in the 
authentication module with high thinking mode" +``` + +**Modernization:** +``` +"Modernize legacy_parser.py to use modern Python features following examples/modern_patterns.py" +``` + +**Organization Improvement:** +``` +"Refactor src/utils/ for better organization, focus on maintainability and readability" +``` + +**Legacy System Refactoring:** +``` +"Use pro with max thinking to analyze this 10,000-line legacy file for decomposition strategy" +``` + +## Refactoring Strategy + +**Top-Down Analysis:** +1. **File Level**: Identify oversized files that need splitting +2. **Class Level**: Find classes with too many responsibilities +3. **Function Level**: Locate functions that are too complex or long +4. **Code Quality**: Address smells, modernization, and organization + +**Context-Aware Decisions:** +- **Domain Complexity**: Some domains legitimately require larger classes +- **Performance Constraints**: Critical path code may resist decomposition +- **Legacy Stability**: Old, working code may need gentler refactoring +- **Test Coverage**: Refactoring recommendations consider testability + +**Breaking Change Prevention:** +- Analyzes dependencies before suggesting splits +- Recommends gradual migration strategies +- Identifies public API impact +- Suggests backward compatibility approaches + +## Best Practices + +- **Start with decomposition**: Address structural issues before cosmetic improvements +- **Provide clear context**: Explain the codebase purpose and constraints +- **Use appropriate refactor types**: Match the type to your primary concern +- **Include style examples**: Reference existing well-structured code in your project +- **Focus on high-impact areas**: Target the most problematic or frequently modified code +- **Plan implementation order**: Follow the tool's sequencing recommendations +- **Consider test coverage**: Ensure adequate tests before major structural changes + +## Output Format + +Refactoring analysis includes: +- **Priority Assessment**: 
What needs attention first and why +- **Decomposition Strategy**: Specific file/class/function splitting recommendations +- **Implementation Plan**: Step-by-step refactoring sequence +- **Line-Number References**: Exact locations for changes +- **Dependency Analysis**: Impact assessment and migration strategies +- **Risk Assessment**: Potential breaking changes and mitigation strategies + +## Advanced Features + +**Adaptive Thresholds:** +The tool adjusts size thresholds based on context: +- **Generated Code**: Higher tolerance for large files +- **Algorithm Implementation**: Recognizes when size is justified +- **Legacy Systems**: More conservative recommendations +- **Test Files**: Different standards for test vs production code + +**Cross-File Refactoring:** +Analyzes multiple files together to understand: +- Shared responsibilities that could be extracted +- Dependencies that complicate refactoring +- Opportunities for new abstractions +- Impact of changes across the codebase + +## When to Use Refactor vs Other Tools + +- **Use `refactor`** for: Structural improvements, decomposition, modernization, code organization +- **Use `codereview`** for: Finding bugs and security issues with immediate fixes +- **Use `analyze`** for: Understanding code without making change recommendations +- **Use `debug`** for: Solving specific runtime issues rather than structural problems \ No newline at end of file diff --git a/docs/tools/secaudit.md b/docs/tools/secaudit.md index 280452f..31ddd6e 100644 --- a/docs/tools/secaudit.md +++ b/docs/tools/secaudit.md @@ -86,7 +86,7 @@ security remediation plan using planner - `images`: Architecture diagrams, security documentation, or visual references **Initial Security Configuration (used in step 1):** -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `security_scope`: Application 
context, technology stack, and security boundary definition (required) - `threat_level`: low|medium|high|critical (default: medium) - determines assessment depth and urgency - `compliance_requirements`: List of compliance frameworks to assess against (e.g., ["PCI DSS", "SOC2"]) diff --git a/docs/tools/secaudit.md.bak b/docs/tools/secaudit.md.bak new file mode 100644 index 0000000..280452f --- /dev/null +++ b/docs/tools/secaudit.md.bak @@ -0,0 +1,222 @@ +# Secaudit Tool - Comprehensive Security Audit + +**Systematic OWASP-based security assessment with compliance evaluation through workflow-driven investigation** + +The `secaudit` tool provides comprehensive security auditing capabilities with systematic OWASP Top 10 assessment, compliance framework evaluation, +and threat modeling. This workflow tool guides Claude through methodical security investigation steps with forced pauses between each step to ensure +thorough vulnerability assessment, security pattern analysis, and compliance verification before providing expert analysis. + +**Important**: AI models may not identify all security vulnerabilities. Always perform additional manual security reviews, +penetration testing, and verification. + +## How the Workflow Works + +The secaudit tool implements a **structured 6-step security workflow** that ensures comprehensive security assessment: + +**Investigation Phase (Claude-Led):** +1. **Step 1**: Security Scope Analysis - Claude identifies application type, tech stack, attack surface, and compliance requirements +2. **Step 2**: Authentication & Authorization Assessment - Analyzes auth mechanisms, session management, and access controls +3. **Step 3**: Input Validation & Data Security - Reviews input handling, data protection, and injection vulnerabilities +4. **Step 4**: OWASP Top 10 (2021) Review - Systematic assessment of all OWASP categories with specific findings +5. 
**Step 5**: Dependencies & Infrastructure - Security analysis of third-party components and deployment configurations +6. **Step 6**: Compliance & Risk Assessment - Evaluation against specified compliance frameworks and risk prioritization + +**Expert Analysis Phase:** +After Claude completes the investigation (unless confidence is **certain**): +- Complete security assessment summary with all vulnerabilities and evidence +- OWASP Top 10 systematic findings with severity classifications +- Compliance framework gap analysis and remediation recommendations +- Risk-prioritized remediation roadmap based on threat level and business impact + +**Special Note**: If you want Claude to perform the entire security audit without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently. + +## Model Recommendation + +This tool particularly benefits from Gemini Pro or O3 models due to their advanced reasoning capabilities and large context windows, which allow comprehensive security analysis across complex codebases. Security audits require understanding subtle attack vectors and cross-component interactions that benefit from deeper analytical capabilities. 
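The six-step workflow above combines per-step investigation parameters with the step-1 security configuration. A hedged illustration of what a first step might carry follows: the field names come from this document's parameter list, but the values, paths, and exact MCP call format are hypothetical.

```python
# Hypothetical shape of a first secaudit workflow step. Parameter names follow
# this document; values and the file path are invented for illustration.

step_one = {
    "step": "Define security scope: app type, tech stack, attack surface",
    "step_number": 1,
    "total_steps": 6,  # matches the six-step workflow above
    "next_step_required": True,
    "findings": "Public checkout API handles card data; webhook lacks auth",
    "relevant_files": ["/abs/path/src/payments/api.py"],  # hypothetical path
    "security_scope": "E-commerce web app, Python backend, public internet",
    "threat_level": "high",
    "compliance_requirements": ["PCI DSS"],
    "audit_focus": "comprehensive",
}
```

Subsequent steps reuse the investigation fields (`findings`, `files_checked`, `issues_found`, `confidence`) while the step-1 configuration stays fixed.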
+ +## Example Prompts + +``` +Perform a secaudit with o3 on this e-commerce web application focusing on payment processing security and PCI DSS compliance +``` + +``` +Use secaudit to conduct a comprehensive security audit of the authentication system, threat level high, focus on enterprise +security patterns and HIPAA compliance +``` + +## Pro Tip: Multi-Scope Security Assessment + +**You can run parallel security audits for different application components:** + +``` +Start separate sub-tasks, in one start a secaudit for critical payment processing components focusing on PCI DSS with gemini pro, +and in the other for user management focusing on OWASP authentication vulnerabilities with o4-mini, then combine into a unified +security remediation plan using planner +``` + +## Key Features + +- **OWASP Top 10 (2021) systematic assessment** with specific vulnerability identification +- **Multi-compliance framework support**: SOC2, PCI DSS, HIPAA, GDPR, FedRAMP +- **Threat-level aware analysis**: Critical, high, medium, low threat classifications +- **Technology-specific security patterns**: Web apps, APIs, mobile, cloud, enterprise systems +- **Risk-based prioritization**: Business impact and exploitability assessment +- **Audit focus customization**: Comprehensive, authentication, data protection, infrastructure +- **Image support**: Security analysis from architecture diagrams, network topology, or security findings +- **Multi-file security analysis**: Cross-component vulnerability identification +- **Compliance gap analysis**: Specific framework requirements with remediation guidance +- **Attack surface mapping**: Entry points, data flows, and privilege boundaries +- **Security control effectiveness**: Evaluation of existing security measures + +## Tool Parameters + +**Workflow Investigation Parameters (used during step-by-step process):** +- `step`: Current security investigation step description (required for each step) +- `step_number`: Current step number in 
audit sequence (required) +- `total_steps`: Estimated total investigation steps (typically 4-6, adjustable) +- `next_step_required`: Whether another investigation step is needed +- `findings`: Security discoveries and evidence collected in this step (required) +- `files_checked`: All files examined during security investigation +- `relevant_files`: Files directly relevant to security assessment (required in step 1) +- `relevant_context`: Methods/functions/classes central to security findings +- `issues_found`: Security issues identified with severity levels +- `confidence`: Confidence level in security assessment completeness (exploring/low/medium/high/certain) +- `backtrack_from_step`: Step number to backtrack from (for revisions) +- `images`: Architecture diagrams, security documentation, or visual references + +**Initial Security Configuration (used in step 1):** +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `security_scope`: Application context, technology stack, and security boundary definition (required) +- `threat_level`: low|medium|high|critical (default: medium) - determines assessment depth and urgency +- `compliance_requirements`: List of compliance frameworks to assess against (e.g., ["PCI DSS", "SOC2"]) +- `audit_focus`: comprehensive|authentication|data_protection|infrastructure|api_security (default: comprehensive) +- `severity_filter`: critical|high|medium|low|all (default: all) +- `temperature`: Temperature for analytical consistency (0-1, default 0.2) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_websearch`: Enable web search for security best practices and vulnerability databases (default: true) +- `use_assistant_model`: Whether to use expert security analysis phase (default: true) +- `continuation_id`: Continue previous security audit discussions + +## Audit Focus Areas + +**Comprehensive (default):** +- Full OWASP Top 10 assessment with all security domains +- 
Authentication, authorization, data protection, infrastructure +- Best for complete security posture evaluation + +**Authentication:** +- Focused on identity, access management, and session security +- Multi-factor authentication, password policies, privilege escalation +- Best for user management and access control systems + +**Data Protection:** +- Encryption, data handling, privacy controls, and compliance +- Input validation, output encoding, data classification +- Best for applications handling sensitive or regulated data + +**Infrastructure:** +- Deployment security, configuration management, dependency security +- Network security, container security, cloud security posture +- Best for DevOps and infrastructure security assessment + +**API Security:** +- REST/GraphQL security, rate limiting, API authentication +- Input validation, authorization patterns, API gateway security +- Best for API-first applications and microservices + +## Threat Levels + +Security assessment depth and urgency: + +- **🔴 CRITICAL**: Mission-critical systems, high-value targets, regulatory requirements +- **🟠 HIGH**: Business-critical applications, customer data handling, financial systems +- **🟡 MEDIUM**: Standard business applications, internal tools, moderate risk exposure +- **🟢 LOW**: Development environments, non-sensitive applications, proof-of-concepts + +## Compliance Frameworks + +Supported compliance assessments: + +- **SOC2**: Security, availability, processing integrity, confidentiality, privacy +- **PCI DSS**: Payment card industry data security standards +- **HIPAA**: Healthcare information privacy and security +- **GDPR**: General data protection regulation compliance +- **FedRAMP**: Federal risk and authorization management program +- **ISO27001**: Information security management systems +- **NIST**: Cybersecurity framework controls + +## OWASP Top 10 (2021) Coverage + +Systematic assessment includes: + +1.
**A01 Broken Access Control**: Authorization flaws and privilege escalation +2. **A02 Cryptographic Failures**: Encryption and data protection issues +3. **A03 Injection**: SQL, NoSQL, OS, and LDAP injection vulnerabilities +4. **A04 Insecure Design**: Security design flaws and threat modeling gaps +5. **A05 Security Misconfiguration**: Configuration and hardening issues +6. **A06 Vulnerable Components**: Third-party and dependency vulnerabilities +7. **A07 Identification & Authentication Failures**: Authentication bypass and session management +8. **A08 Software & Data Integrity Failures**: Supply chain and integrity violations +9. **A09 Security Logging & Monitoring Failures**: Detection and response capabilities +10. **A10 Server-Side Request Forgery**: SSRF and related vulnerabilities + +## Usage Examples + +**Comprehensive E-commerce Security Audit:** +``` +"Conduct a comprehensive secaudit with gemini pro for our Node.js e-commerce platform, threat level high, +compliance requirements PCI DSS and SOC2, focus on payment processing security" +``` + +**Authentication System Security Review:** +``` +"Use o3 to perform secaudit on authentication microservice, focus on authentication, +threat level critical, check for OWASP A07 and multi-factor authentication implementation" +``` + +**API Security Assessment:** +``` +"Secaudit our REST API gateway with gemini pro, audit focus api_security, +compliance requirements GDPR, threat level medium" +``` + +**Infrastructure Security Review:** +``` +"Perform secaudit on Kubernetes deployment manifests with o3, focus infrastructure, +threat level high, include container security and network policies" +``` + +**Quick Security Scan:** +``` +"Fast secaudit of user registration flow with flash, focus authentication, +severity filter critical and high only" +``` + +## Best Practices + +- **Define clear security scope**: Specify application type, tech stack, and security boundaries +- **Set appropriate threat levels**: Match 
assessment depth to risk exposure and criticality +- **Include compliance requirements**: Specify relevant frameworks for regulatory alignment +- **Use parallel audits**: Run separate assessments for different components or compliance frameworks +- **Provide architectural context**: Include system diagrams, data flow documentation, or deployment topology +- **Focus audit scope**: Use audit_focus for targeted assessments of specific security domains +- **Follow up on findings**: Use continuation feature to dive deeper into specific vulnerabilities + +## Output Format + +Security audits include: +- **Executive Security Summary**: Overall security posture and critical findings +- **OWASP Top 10 Assessment**: Systematic review of each category with specific findings +- **Compliance Gap Analysis**: Framework-specific requirements and current compliance status +- **Risk-Prioritized Findings**: Vulnerabilities ordered by exploitability and business impact +- **Remediation Roadmap**: Phased approach to security improvements with quick wins +- **Security Architecture Recommendations**: Structural improvements for long-term security posture + +## When to Use Secaudit vs Other Tools + +- **Use `secaudit`** for: Comprehensive security assessment, compliance evaluation, OWASP-based vulnerability analysis +- **Use `codereview`** for: General code quality with some security considerations +- **Use `analyze`** for: Understanding security architecture without vulnerability assessment +- **Use `debug`** for: Investigating specific security incidents or exploit attempts +- **Use `precommit`** for: Pre-deployment security validation and change impact assessment \ No newline at end of file diff --git a/docs/tools/testgen.md b/docs/tools/testgen.md index 0d74a98..7539b21 100644 --- a/docs/tools/testgen.md +++ b/docs/tools/testgen.md @@ -70,7 +70,7 @@ Test generation excels with extended reasoning models like Gemini Pro or O3, whi **Initial Configuration (used in step 1):** - `prompt`: 
Description of what to test, testing objectives, and specific scope/focus areas (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `test_examples`: Optional existing test files or directories to use as style/pattern reference (absolute paths) - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) - `use_assistant_model`: Whether to use expert test generation phase (default: true, set to false to use Claude only) diff --git a/docs/tools/testgen.md.bak b/docs/tools/testgen.md.bak new file mode 100644 index 0000000..0d74a98 --- /dev/null +++ b/docs/tools/testgen.md.bak @@ -0,0 +1,220 @@ +# TestGen Tool - Comprehensive Test Generation + +**Generates thorough test suites with edge case coverage through workflow-driven investigation** + +The `testgen` tool creates comprehensive test suites by analyzing your code paths, understanding intricate dependencies, and identifying realistic edge cases and failure scenarios that need test coverage. This workflow tool guides Claude through systematic investigation of code functionality, critical paths, edge cases, and integration points across multiple steps before generating comprehensive tests with realistic failure mode analysis. + +## Thinking Mode + +**Default is `medium` (8,192 tokens) for extended thinking models.** Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage. + +## How the Workflow Works + +The testgen tool implements a **structured workflow** for comprehensive test generation: + +**Investigation Phase (Claude-Led):** +1. **Step 1**: Claude describes the test generation plan and begins analyzing code functionality +2. **Step 2+**: Claude examines critical paths, edge cases, error handling, and integration points +3. 
**Throughout**: Claude tracks findings, test scenarios, and coverage gaps +4. **Completion**: Once investigation is thorough, Claude signals completion + +**Test Generation Phase:** +After Claude completes the investigation: +- Complete test scenario catalog with all edge cases +- Framework-specific test generation +- Realistic failure mode coverage +- Final test suite with comprehensive coverage + +This workflow ensures methodical analysis before test generation, resulting in more thorough and valuable test suites. + +## Model Recommendation + +Test generation excels with extended reasoning models like Gemini Pro or O3, which can analyze complex code paths, understand intricate dependencies, and identify comprehensive edge cases. The combination of large context windows and advanced reasoning enables generation of thorough test suites that cover realistic failure scenarios and integration points that shorter-context models might overlook. + +## Example Prompts + +**Basic Usage:** +``` +"Use zen to generate tests for User.login() method" +"Generate comprehensive tests for the sorting method in src/new_sort.py using o3" +"Create tests for edge cases not already covered in our tests using gemini pro" +``` + +## Key Features + +- **Multi-agent workflow** analyzing code paths and identifying realistic failure modes +- **Generates framework-specific tests** following project conventions +- **Supports test pattern following** when examples are provided +- **Dynamic token allocation** (25% for test examples, 75% for main code) +- **Prioritizes smallest test files** for pattern detection +- **Can reference existing test files**: `"Generate tests following patterns from tests/unit/"` +- **Specific code coverage** - target specific functions/classes rather than testing everything +- **Image support**: Test UI components, analyze visual requirements: `"Generate tests for this login form using the UI mockup screenshot"` +- **Edge case identification**: Systematic discovery of 
boundary conditions and error states +- **Realistic failure mode analysis**: Understanding what can actually go wrong in production +- **Integration test support**: Tests that cover component interactions and system boundaries + +## Tool Parameters + +**Workflow Investigation Parameters (used during step-by-step process):** +- `step`: Current investigation step description (required for each step) +- `step_number`: Current step number in test generation sequence (required) +- `total_steps`: Estimated total investigation steps (adjustable) +- `next_step_required`: Whether another investigation step is needed +- `findings`: Discoveries about functionality and test scenarios (required) +- `files_checked`: All files examined during investigation +- `relevant_files`: Files directly needing tests (required in step 1) +- `relevant_context`: Methods/functions/classes requiring test coverage +- `confidence`: Confidence level in test plan completeness (exploring/low/medium/high/certain) +- `backtrack_from_step`: Step number to backtrack from (for revisions) + +**Initial Configuration (used in step 1):** +- `prompt`: Description of what to test, testing objectives, and specific scope/focus areas (required) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `test_examples`: Optional existing test files or directories to use as style/pattern reference (absolute paths) +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `use_assistant_model`: Whether to use expert test generation phase (default: true, set to false to use Claude only) + +## Usage Examples + +**Method-Specific Tests:** +``` +"Generate tests for User.login() method covering authentication success, failure, and edge cases" +``` + +**Class Testing:** +``` +"Use pro to generate comprehensive tests for PaymentProcessor class with max thinking mode" +``` + +**Following Existing Patterns:** +``` +"Generate tests for new authentication module following patterns 
from tests/unit/auth/" +``` + +**UI Component Testing:** +``` +"Generate tests for this login form component using the UI mockup screenshot" +``` + +**Algorithm Testing:** +``` +"Create thorough tests for the sorting algorithm in utils/sort.py, focus on edge cases and performance" +``` + +**Integration Testing:** +``` +"Generate integration tests for the payment processing pipeline from order creation to completion" +``` + +## Test Generation Strategy + +**Code Path Analysis:** +- Identifies all execution paths through the code +- Maps conditional branches and loops +- Discovers error handling paths +- Analyzes state transitions + +**Edge Case Discovery:** +- Boundary value analysis (empty, null, max values) +- Invalid input scenarios +- Race conditions and timing issues +- Resource exhaustion cases + +**Failure Mode Analysis:** +- External dependency failures +- Network and I/O errors +- Authentication and authorization failures +- Data corruption scenarios + +**Framework Detection:** +The tool automatically detects and generates tests for: +- **Python**: pytest, unittest, nose2 +- **JavaScript**: Jest, Mocha, Jasmine, Vitest +- **Java**: JUnit 4/5, TestNG, Mockito +- **C#**: NUnit, MSTest, xUnit +- **Swift**: XCTest +- **Go**: testing package +- **And more**: Adapts to project conventions + +## Test Categories Generated + +**Unit Tests:** +- Function/method behavior validation +- Input/output verification +- Error condition handling +- State change verification + +**Integration Tests:** +- Component interaction testing +- API endpoint validation +- Database integration +- External service mocking + +**Edge Case Tests:** +- Boundary conditions +- Invalid inputs +- Resource limits +- Concurrent access + +**Performance Tests:** +- Response time validation +- Memory usage checks +- Load handling +- Scalability verification + +## Best Practices + +- **Be specific about scope**: Target specific functions/classes rather than requesting tests for everything +- **Provide 
test examples**: Include existing test files for pattern consistency +- **Focus on critical paths**: Prioritize testing of business-critical functionality +- **Include visual context**: Screenshots or mockups for UI component testing +- **Describe testing objectives**: Explain what aspects are most important to test +- **Consider test maintenance**: Request readable, maintainable test code + +## Test Quality Features + +**Realistic Test Data:** +- Generates meaningful test data that represents real-world scenarios +- Avoids trivial test cases that don't add value +- Creates data that exercises actual business logic + +**Comprehensive Coverage:** +- Happy path scenarios +- Error conditions and exceptions +- Edge cases and boundary conditions +- Integration points and dependencies + +**Maintainable Code:** +- Clear test names that describe what's being tested +- Well-organized test structure +- Appropriate use of setup/teardown +- Minimal test data and mocking + +## Advanced Features + +**Pattern Following:** +When test examples are provided, the tool analyzes: +- Naming conventions and structure +- Assertion patterns and style +- Mocking and setup approaches +- Test data organization + +**Large Context Analysis:** +With models like Gemini Pro, the tool can: +- Analyze extensive codebases for comprehensive test coverage +- Understand complex interactions across multiple modules +- Generate integration tests that span multiple components + +**Visual Testing:** +For UI components and visual elements: +- Generate tests based on visual requirements +- Create accessibility testing scenarios +- Test responsive design behaviors + +## When to Use TestGen vs Other Tools + +- **Use `testgen`** for: Creating comprehensive test suites, filling test coverage gaps, testing new features +- **Use `debug`** for: Diagnosing specific test failures or runtime issues +- **Use `codereview`** for: Reviewing existing test quality and coverage +- **Use `analyze`** for: Understanding existing 
test structure without generating new tests \ No newline at end of file diff --git a/docs/tools/thinkdeep.md b/docs/tools/thinkdeep.md index 26d5322..68a788d 100644 --- a/docs/tools/thinkdeep.md +++ b/docs/tools/thinkdeep.md @@ -30,7 +30,7 @@ with the best architecture for my project ## Tool Parameters - `prompt`: Your current thinking/analysis to extend and validate (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default) - `problem_context`: Additional context about the problem or goal - `focus_areas`: Specific aspects to focus on (architecture, performance, security, etc.) - `files`: Optional file paths or directories for additional context (absolute paths) diff --git a/docs/tools/thinkdeep.md.bak b/docs/tools/thinkdeep.md.bak new file mode 100644 index 0000000..26d5322 --- /dev/null +++ b/docs/tools/thinkdeep.md.bak @@ -0,0 +1,97 @@ +# ThinkDeep Tool - Extended Reasoning Partner + +**Get a second opinion to augment Claude's own extended thinking** + +The `thinkdeep` tool provides extended reasoning capabilities, offering a second perspective to augment Claude's analysis. It's designed to challenge assumptions, find edge cases, and provide alternative approaches to complex problems. + +## Thinking Mode + +**Default is `high` (16,384 tokens) for deep analysis.** Claude will automatically choose the best mode based on complexity - use `low` for quick validations, `medium` for standard problems, `high` for complex issues (default), or `max` for extremely complex challenges requiring deepest analysis. 
+ +## Example Prompt + +``` +Think deeper about my authentication design with pro using max thinking mode and brainstorm to come up +with the best architecture for my project +``` + +## Key Features + +- **Uses Gemini's specialized thinking models** for enhanced reasoning capabilities +- **Provides a second opinion** on Claude's analysis +- **Challenges assumptions** and identifies edge cases Claude might miss +- **Offers alternative perspectives** and approaches +- **Validates architectural decisions** and design patterns +- **File reference support**: `"Use gemini to think deeper about my API design with reference to api/routes.py"` +- **Image support**: Analyze architectural diagrams, flowcharts, design mockups: `"Think deeper about this system architecture diagram with gemini pro using max thinking mode"` +- **Enhanced Critical Evaluation (v2.10.0)**: After Gemini's analysis, Claude is prompted to critically evaluate the suggestions, consider context and constraints, identify risks, and synthesize a final recommendation - ensuring a balanced, well-considered solution +- **Web search capability**: When enabled (default: true), identifies areas where current documentation or community solutions would strengthen the analysis and suggests specific searches for Claude + +## Tool Parameters + +- `prompt`: Your current thinking/analysis to extend and validate (required) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) +- `problem_context`: Additional context about the problem or goal +- `focus_areas`: Specific aspects to focus on (architecture, performance, security, etc.) 
+- `files`: Optional file paths or directories for additional context (absolute paths) +- `images`: Optional images for visual analysis (absolute paths) +- `temperature`: Temperature for creative thinking (0-1, default 0.7) +- `thinking_mode`: minimal|low|medium|high|max (default: high, Gemini only) +- `use_websearch`: Enable web search for documentation and insights (default: true) +- `continuation_id`: Continue previous conversations + +## Usage Examples + +**Architecture Design:** +``` +"Think deeper about my microservices authentication strategy with pro using max thinking mode" +``` + +**With File Context:** +``` +"Use gemini to think deeper about my API design with reference to api/routes.py and models/user.py" +``` + +**Visual Analysis:** +``` +"Think deeper about this system architecture diagram with gemini pro - identify potential bottlenecks" +``` + +**Problem Solving:** +``` +"I'm considering using GraphQL vs REST for my API. Think deeper about the trade-offs with o3 using high thinking mode" +``` + +**Code Review Enhancement:** +``` +"Think deeper about the security implications of this authentication code with pro" +``` + +## Best Practices + +- **Provide detailed context**: Share your current thinking, constraints, and objectives +- **Be specific about focus areas**: Mention what aspects need deeper analysis +- **Include relevant files**: Reference code, documentation, or configuration files +- **Use appropriate thinking modes**: Higher modes for complex problems, lower for quick validations +- **Leverage visual context**: Include diagrams or mockups for architectural discussions +- **Build on discussions**: Use continuation to extend previous analyses + +## Enhanced Critical Evaluation Process + +The `thinkdeep` tool includes a unique two-stage process: + +1. **Gemini's Analysis**: Extended reasoning with specialized thinking capabilities +2. 
**Claude's Critical Evaluation**: Claude reviews Gemini's suggestions and considers: + - Context and constraints of your specific situation + - Potential risks and implementation challenges + - Trade-offs and alternatives + - Final synthesized recommendation + +This ensures you get both deep reasoning and practical, context-aware advice. + +## When to Use ThinkDeep vs Other Tools + +- **Use `thinkdeep`** for: Extending specific analysis, challenging assumptions, architectural decisions +- **Use `chat`** for: Open-ended brainstorming and general discussions +- **Use `analyze`** for: Understanding existing code without extending analysis +- **Use `codereview`** for: Finding specific bugs and security issues \ No newline at end of file From 80ac32038e0dfa7fb632b16fe356994ee7bfb716 Mon Sep 17 00:00:00 2001 From: Sven Lito Date: Tue, 12 Aug 2025 14:40:07 +0700 Subject: [PATCH 2/6] docs: major README refactor and comprehensive documentation improvements README improvements: - Reduce README from 725 to 169 lines (77% reduction) - Focus on quick start and essential information - Link to detailed docs instead of duplicating content - Improve scannability with clear sections and emojis - Add concise tool categorization and workflows Documentation structure: - Create comprehensive getting-started.md guide - Move detailed setup instructions from README - Include troubleshooting, configuration templates - Add step-by-step installation for all methods Benefits: - Faster onboarding for new users - Progressive disclosure of information - Better GitHub discovery experience - Maintainable documentation structure - Clear separation of concerns The README now serves as an effective landing page while the detailed documentation provides comprehensive guidance.
--- README.md | 747 ++++++---------------------------------- docs/getting-started.md | 359 +++++++++++++++++++ 2 files changed, 455 insertions(+), 651 deletions(-) create mode 100644 docs/getting-started.md diff --git a/README.md b/README.md index f2bc0ba..f955d34 100644 --- a/README.md +++ b/README.md @@ -8,716 +8,161 @@
-The ultimate development partners for your favorite Coding Agent ([Claude](https://www.anthropic.com/claude-code) OR [Gemini CLI](https://github.com/google-gemini/gemini-cli)) - a Model Context Protocol server that gives you access to multiple AI -models for enhanced code analysis, problem-solving, and collaborative development. +**AI orchestration for Claude Code** - A Model Context Protocol server that gives Claude access to multiple AI models for enhanced code analysis, problem-solving, and collaborative development. -**Features true AI orchestration with conversations that continue across workflows** - Give Claude a complex -_workflow_ and let it orchestrate between models automatically. Claude stays in control, performs the actual work, -but gets perspectives from the best AI for each subtask. With tools like [`planner`](#3-planner---interactive-step-by-step-planning) for -breaking down complex projects, [`analyze`](#8-analyze---smart-file-analysis) for understanding codebases, -[`codereview`](#5-codereview---professional-code-review) for audits, [`refactor`](#9-refactor---intelligent-code-refactoring) for -improving code structure, [`debug`](#7-debug---expert-debugging-assistant) for solving complex problems, and [`precommit`](#6-precommit---pre-commit-validation) for -validating changes, Claude can switch between different tools _and_ models mid-conversation, -with context carrying forward seamlessly. +**True AI collaboration with conversation continuity** - Claude stays in control but gets perspectives from the best AI for each subtask. Context carries forward seamlessly across tools and models, enabling complex workflows like code reviews with multiple models → automated planning → implementation → pre-commit validation. -**Example Workflow - Claude Code:** -1. `Perform a codereview using gemini pro and o3 and use planner to generate a detailed plan, implement the fixes and do a final precommit check by continuing from the previous codereview` -2.
This triggers a [`codereview`](#5-codereview---professional-code-review) workflow where Claude walks the code, looking for all kinds of issues -3. After multiple passes, collects relevant code and makes note of issues along the way -4. Maintains a `confidence` level between `exploring`, `low`, `medium`, `high` and `certain` to track how confidently it's been able to find and identify issues -5. Generates a detailed list of critical -> low issues -6. Shares the relevant files, findings, etc with **Gemini Pro** to perform a deep dive for a second [`codereview`](#5-codereview---professional-code-review) -7. Comes back with a response and next does the same with o3, adding to the prompt if a new discovery comes to light -8. When done, Claude takes in all the feedback and combines a single list of all critical -> low issues, including good patterns in your code. The final list includes new findings or revisions in case Claude misunderstood or missed something crucial and one of the other models pointed this out -9. It then uses the [`planner`](#3-planner---interactive-step-by-step-planning) workflow to break the work down into simpler steps if a major refactor is required -10. Claude then performs the actual work of fixing highlighted issues -11. When done, Claude returns to Gemini Pro for a [`precommit`](#6-precommit---pre-commit-validation) review +> **You're in control.** Claude orchestrates the AI team, but you decide the workflow. Craft powerful prompts that bring in Gemini, O3, Flash, or local models exactly when needed. -All within a single conversation thread! Gemini Pro in step 11 _knows_ what was recommended by O3 in step 7! Taking that context -and review into consideration to aid with its final pre-commit review. +## Quick Start (5 minutes) -**Think of it as Claude Code _for_ Claude Code.** This MCP isn't magic. It's just **super-glue**. 
+ +**Prerequisites:** Python 3.10+, Git, [uv installed](https://docs.astral.sh/uv/getting-started/installation/) -> **Remember:** Claude stays in full control — but **YOU** call the shots. -> Zen is designed to have Claude engage other models only when needed — and to follow through with meaningful back-and-forth. -> **You're** the one who crafts the powerful prompt that makes Claude bring in Gemini, Flash, O3 — or fly solo. -> You're the guide. The prompter. The puppeteer. -> ### You are the AI - **Actually Intelligent**. +**1. Get API Keys** (choose one or more): +- **[OpenRouter](https://openrouter.ai/)** - Access multiple models with one API +- **[Gemini](https://makersuite.google.com/app/apikey)** - Google's latest models +- **[OpenAI](https://platform.openai.com/api-keys)** - O3, GPT-5 series +- **[Ollama](https://ollama.ai/)** - Local models (free) -Because these AI models [clearly aren't when they get chatty →](docs/ai_banter.md) +**2. Install** (choose one): -## Quick Navigation - -- **Getting Started** - - [Quickstart](#quickstart-5-minutes) - Get running in 5 minutes - - [Available Tools](#available-tools) - Overview of all tools - - [AI-to-AI Conversations](#ai-to-ai-conversation-threading) - Multi-turn conversations - -- **Tools Reference** - - [`chat`](#1-chat---general-development-chat--collaborative-thinking) - Collaborative thinking - - [`thinkdeep`](#2-thinkdeep---extended-reasoning-partner) - Extended reasoning - - [`challenge`](#3-challenge---critical-challenge-prompt) - Prevents **You're absolutely right!** responses - - [`planner`](#4-planner---interactive-step-by-step-planning) - Interactive step-by-step planning - - [`consensus`](#5-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis - - [`codereview`](#6-codereview---professional-code-review) - Code review - - [`precommit`](#7-precommit---pre-commit-validation) - Pre-commit validation - - [`debug`](#8-debug---expert-debugging-assistant) - Debugging help - -
[`analyze`](#9-analyze---smart-file-analysis) - File analysis - - [`refactor`](#10-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus - - [`tracer`](#11-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing - - [`testgen`](#12-testgen---comprehensive-test-generation) - Test generation with edge cases - - [`secaudit`](#13-secaudit---comprehensive-security-audit) - Security audit with OWASP analysis - - [`docgen`](#14-docgen---comprehensive-documentation-generation) - Documentation generation with complexity analysis - -- **Advanced Usage** - - [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search - - [Complete Advanced Guide](docs/advanced-usage.md) - Model configuration, thinking modes, workflows, tool parameters - -- **Setup & Support** - - [WSL Setup Guide](docs/wsl-setup.md) - Windows Subsystem for Linux configuration - - [Troubleshooting Guide](docs/troubleshooting.md) - Common issues and debugging steps - - [License](#license) - Apache 2.0 - -## Why This Server? 
- -Claude is brilliant, but sometimes you need: -- **Guided workflows** - Developer-centric processes that enforce systematic investigation, preventing rushed analysis by ensuring Claude examines code thoroughly at each phase ([`debug`](#7-debug---expert-debugging-assistant), [`precommit`](#6-precommit---pre-commit-validation), [`refactor`](#9-refactor---intelligent-code-refactoring), [`analyze`](#8-analyze---smart-file-analysis), [`codereview`](#5-codereview---professional-code-review)) -- **Multiple AI perspectives** - Let Claude orchestrate between different models to get the best analysis -- **Automatic model selection** - Claude picks the right model for each task (or you can specify) -- **A senior developer partner** to validate and extend ideas ([`chat`](#1-chat---general-development-chat--collaborative-thinking)) -- **A second opinion** on complex architectural decisions - augment Claude's thinking with perspectives from Gemini Pro, O3, or [dozens of other models via custom endpoints](docs/custom_models.md) ([`thinkdeep`](#2-thinkdeep---extended-reasoning-partner)) -- **Get multiple expert opinions** - Have different AI models debate your ideas (some supporting, some critical) to help you make better decisions ([`consensus`](#3-consensus---multi-model-perspective-gathering)) -- **Professional code reviews** with actionable feedback across entire repositories ([`codereview`](#4-codereview---professional-code-review)) -- **Pre-commit validation** with deep analysis using the best model for the job ([`precommit`](#5-precommit---pre-commit-validation)) -- **Expert debugging** - O3 for logical issues, Gemini for architectural problems ([`debug`](#6-debug---expert-debugging-assistant)) -- **Extended context windows beyond Claude's limits** - Delegate analysis to Gemini (1M tokens) or O3 (200K tokens) for entire codebases, large datasets, or comprehensive documentation -- **Model-specific strengths** - Extended thinking with Gemini Pro, fast iteration with Flash, 
strong reasoning with O3, local privacy with Ollama -- **Local model support** - Run models like Llama 3.2 locally via Ollama, vLLM, or LM Studio for privacy and cost control -- **Dynamic collaboration** - Models can request additional context and follow-up replies from Claude mid-analysis -- **Smart file handling** - Automatically expands directories, manages token limits based on model capacity -- **Vision support** - Analyze images, diagrams, screenshots, and visual content with vision-capable models -- **[Bypass MCP's token limits](docs/advanced-usage.md#working-with-large-prompts)** - Work around MCP's 25K limit automatically -- **[Context revival across sessions](docs/context-revival.md)** - Continue conversations even after Claude's context resets, with other models maintaining full history - -## Pro Tip: Context Revival - -**This is an extremely powerful feature that cannot be highlighted enough**: - -> The most amazing side-effect of this _conversation continuation_ system is that even AFTER Claude's context resets or -> compacts, since the continuation info is kept within MCP's memory, you can ask it to _continue_ discussing -> the plan with `o3`, and it will suddenly revive Claude because O3 would know what was being talked about and -> relay this back in a way that re-ignites Claude's understanding. All this without wasting context on asking Claude to -> ingest lengthy documents / code again and re-prompting it to communicate with another model. Zen manages that internally. The model's response -> revives Claude with better context around the discussion than an automatic summary ever can. - -**[πŸ“– Read the complete technical deep-dive on how this revolutionary system works](docs/context-revival.md)** - -This server orchestrates multiple AI models as your development team, with Claude automatically selecting the best model for each task or allowing you to choose specific models for different strengths. - -
- -
- -**Prompt Used:** -``` -Study the code properly, think deeply about what this does and then see if there's any room for improvement in -terms of performance optimizations, brainstorm with gemini on this to get feedback and then confirm any change by -first adding a unit test with `measure` and measuring current code and then implementing the optimization and -measuring again to ensure it improved, then share results. Check with gemini in between as you make tweaks. -``` - -The final implementation resulted in a 26% improvement in JSON parsing performance for the selected library, reducing processing time through targeted, collaborative optimizations guided by Gemini’s analysis and Claude’s refinement. - -## Quickstart (5 minutes) - -### Prerequisites - -- Python 3.10+ (3.12 recommended) -- Git -- **Windows users**: WSL2 is required for Claude Code CLI - -### 1. Get API Keys (at least one required) - -**Option A: OpenRouter (Access multiple models with one API)** -- **OpenRouter**: Visit [OpenRouter](https://openrouter.ai/) for access to multiple models through one API. [Setup Guide](docs/custom_models.md) - - Control model access and spending limits directly in your OpenRouter dashboard - - Configure model aliases in [`conf/custom_models.json`](conf/custom_models.json) - -**Option B: Native APIs** -- **Gemini**: Visit [Google AI Studio](https://makersuite.google.com/app/apikey) and generate an API key. For best results with Gemini 2.5 Pro, use a paid API key as the free tier has limited access to the latest models. -- **OpenAI**: Visit [OpenAI Platform](https://platform.openai.com/api-keys) to get an API key for O3 model access. -- **X.AI**: Visit [X.AI Console](https://console.x.ai/) to get an API key for GROK model access. -- **DIAL**: Visit [DIAL Platform](https://dialx.ai/) to get an API key for accessing multiple models through their unified API. 
DIAL is an open-source AI orchestration platform that provides vendor-agnostic access to models from major providers, open-source community, and self-hosted deployments. [API Documentation](https://dialx.ai/dial_api) - -**Option C: Custom API Endpoints (Local models like Ollama, vLLM)** -[Please see the setup guide](docs/custom_models.md#option-2-custom-api-setup-ollama-vllm-etc). With a custom API you can use: -- **Ollama**: Run models like Llama 3.2 locally for free inference -- **vLLM**: Self-hosted inference server for high-throughput inference -- **LM Studio**: Local model hosting with OpenAI-compatible API interface -- **Text Generation WebUI**: Popular local interface for running models -- **Any OpenAI-compatible API**: Custom endpoints for your own infrastructure - -> **Note:** Using multiple provider options may create ambiguity about which provider / model to use if there is an overlap. -> If all APIs are configured, native APIs will take priority when there is a clash in model name, such as for `gemini` and `o3`. -> Configure your model aliases and give them unique names in [`conf/custom_models.json`](conf/custom_models.json) - -### 2. Choose Your Installation Method - -**Option A: Quick Install with uvx** - -**Prerequisites**: Install [uv](https://docs.astral.sh/uv/getting-started/installation/) first (required for uvx) - -
-Claude Desktop Configuration - -Add this to your `claude_desktop_config.json`: +**Option A: Instant Setup with uvx** (recommended) ```json +// Add to ~/.claude/settings.json or .mcp.json { "mcpServers": { "zen": { - "command": "sh", - "args": [ - "-c", - "exec $(which uvx || echo uvx) --from git+https://github.com/BeehiveInnovations/zen-mcp-server.git zen-mcp-server" - ], - "env": { - "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", - "OPENAI_API_KEY": "your_api_key_here" - } - } - } -} -``` -
- -
-Claude Code CLI Configuration - -Create a `.mcp.json` file in your project root for [project-scoped configuration](https://docs.anthropic.com/en/docs/claude-code/mcp#project-scope): -```json -{ - "mcpServers": { - "zen": { - "command": "sh", - "args": [ - "-c", - "exec $(which uvx || echo uvx) --from git+https://github.com/BeehiveInnovations/zen-mcp-server.git zen-mcp-server" - ], - "env": { - "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", - "OPENAI_API_KEY": "your_api_key_here" - } - } - } -} -``` -
- -
-Gemini CLI Configuration - -Edit `~/.gemini/settings.json` and add: -```json -{ - "mcpServers": { - "zen": { - "command": "sh", - "args": [ - "-c", - "exec $(which uvx || echo uvx) --from git+https://github.com/BeehiveInnovations/zen-mcp-server.git zen-mcp-server" - ], - "env": { - "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", - "OPENAI_API_KEY": "your_api_key_here" - } + "command": "uvx", + "args": ["--from", "git+https://github.com/BeehiveInnovations/zen-mcp-server.git", "zen-mcp-server"], + "env": { "GEMINI_API_KEY": "your-key-here" } } } } ``` -**Note**: While Zen MCP Server connects successfully to Gemini CLI, tool invocation is not working correctly yet. See [Gemini CLI Setup](docs/gemini-setup.md) for updates. -
- -**What this does:** -- **Zero setup required** - uvx handles everything automatically -- **Always up-to-date** - Pulls latest version on each run -- **No local dependencies** - Works without Python environment setup -- **Instant availability** - Ready to use immediately - - -**Option B: Traditional Clone and Set Up** - +**Option B: Clone and Setup** ```bash -# Clone to your preferred location git clone https://github.com/BeehiveInnovations/zen-mcp-server.git cd zen-mcp-server - -# One-command setup installs Zen in Claude -./run-server.sh - -# Or for Windows users using PowerShell: -./run-server.ps1 - -# To view MCP configuration for Claude -./run-server.sh -c - -# PowerShell: -./run-server.ps1 -Config - -# See help for more -./run-server.sh --help - -# PowerShell: -./run-server.ps1 -Help +./run-server.sh # Handles everything: setup, config, API keys ``` -**What this does:** -- **Sets up everything automatically** - Python environment, dependencies, configuration -- **Configures Claude integrations** - Adds to Claude Code CLI and guides Desktop setup -- **Ready to use immediately** - No manual configuration needed -- **Also works with Gemini CLI** - See [Gemini CLI Setup](docs/gemini-setup.md) for configuration - -**After updates:** Always run `./run-server.sh` again after `git pull` to ensure everything stays current. - -**Windows users:** Using WSL? See the [WSL Setup Guide](docs/wsl-setup.md) for detailed instructions. - -### 3. 
Add Your API Keys - -```bash -# Edit .env to add your API keys (if not already set in environment) -nano .env - -# The file will contain, at least one should be set: -# GEMINI_API_KEY=your-gemini-api-key-here # For Gemini models -# OPENAI_API_KEY=your-openai-api-key-here # For O3 model -# XAI_API_KEY=your-xai-api-key-here # For Grok models -# OPENROUTER_API_KEY=your-openrouter-key # For OpenRouter (see docs/custom_models.md) -# DIAL_API_KEY=your-dial-api-key-here # For DIAL platform - -# For DIAL (optional configuration): -# DIAL_API_HOST=https://core.dialx.ai # Default DIAL host (optional) -# DIAL_API_VERSION=2024-12-01-preview # API version (optional) -# DIAL_ALLOWED_MODELS=o3,gemini-2.5-pro # Restrict to specific models (optional) - -# For local models (Ollama, vLLM, etc.): -# CUSTOM_API_URL=http://localhost:11434/v1 # Ollama example -# CUSTOM_API_KEY= # Empty for Ollama -# CUSTOM_MODEL_NAME=llama3.2 # Default model - -# Note: At least one API key OR custom URL is required +**3. Start Using!** +``` +"Use zen to analyze this code for security issues with gemini pro" +"Debug this error with o3 and then get flash to suggest optimizations" +"Plan the migration strategy with zen, get consensus from multiple models" ``` -**No restart needed**: The server reads the .env file each time Claude calls a tool, so changes take effect immediately. +πŸ‘‰ **[Complete Setup Guide](docs/getting-started.md)** with detailed installation, configuration, and troubleshooting -**Next**: Now run `claude` from your project folder using the terminal for it to connect to the newly added mcp server. -If you were already running a `claude` code session, please exit and start a new session. 
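The "at least one API key OR custom URL" requirement above can be pictured as a small validation helper. This is a hypothetical sketch, not the server's actual implementation; the key names mirror the `.env` variables listed above.

```python
# Hypothetical helper mirroring the startup rule above: at least one provider
# API key or a custom endpoint URL must be configured.
API_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY",
            "OPENROUTER_API_KEY", "DIAL_API_KEY")

def has_usable_provider(env: dict[str, str]) -> bool:
    return any(env.get(k) for k in API_KEYS) or bool(env.get("CUSTOM_API_URL"))

# A local Ollama endpoint alone is enough; no cloud key required.
print(has_usable_provider({"CUSTOM_API_URL": "http://localhost:11434/v1"}))  # True
print(has_usable_provider({}))  # False
```

Because the server re-reads `.env` on every tool call, satisfying this check takes effect immediately, without a restart.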
+## Core Tools

-#### If Setting up for Claude Desktop

+**🤝 Collaboration & Planning**
+- **[`chat`](docs/tools/chat.md)** - Brainstorm ideas, get second opinions, validate approaches
+- **[`thinkdeep`](docs/tools/thinkdeep.md)** - Extended reasoning, edge case analysis, alternative perspectives
+- **[`planner`](docs/tools/planner.md)** - Break down complex projects into structured, actionable plans
+- **[`consensus`](docs/tools/consensus.md)** - Get expert opinions from multiple AI models with stance steering

-**Need the exact configuration?** Run `./run-server.sh -c` to display the platform-specific setup instructions with correct paths.

+**🔍 Code Analysis & Quality**
+- **[`analyze`](docs/tools/analyze.md)** - Understand architecture, patterns, dependencies across entire codebases
+- **[`codereview`](docs/tools/codereview.md)** - Professional reviews with severity levels and actionable feedback
+- **[`debug`](docs/tools/debug.md)** - Systematic investigation and root cause analysis
+- **[`precommit`](docs/tools/precommit.md)** - Validate changes before committing, prevent regressions

-1. **Open Claude Desktop config**: Settings → Developer → Edit Config
-2. **Copy the configuration** shown by `./run-server.sh -c` into your `claude_desktop_config.json`
-3. **Restart Claude Desktop** for changes to take effect

+**⚒️ Development Tools**
+- **[`refactor`](docs/tools/refactor.md)** - Intelligent code refactoring with decomposition focus
+- **[`testgen`](docs/tools/testgen.md)** - Comprehensive test generation with edge cases
+- **[`secaudit`](docs/tools/secaudit.md)** - Security audits with OWASP Top 10 analysis
+- **[`docgen`](docs/tools/docgen.md)** - Generate documentation with complexity analysis

-### 4. Start Using It!

+**🔧 Utilities**
+- **[`challenge`](docs/tools/challenge.md)** - Prevent "You're absolutely right!"
responses with critical analysis
+- **[`tracer`](docs/tools/tracer.md)** - Static analysis prompts for call-flow mapping

-Just ask Claude naturally:
-- "Think deeper about this architecture design with zen" → Claude picks best model + `thinkdeep`
-- "Using zen perform a code review of this code for security issues" → Claude might pick Gemini Pro + `codereview`
-- "Use zen and debug why this test is failing, the bug might be in my_class.swift" → Claude might pick O3 + `debug`
-- "With zen, analyze these files to understand the data flow" → Claude picks appropriate model + `analyze`
-- "Use flash to suggest how to format this code based on the specs mentioned in policy.md" → Uses Gemini Flash specifically
-- "Think deeply about this and get o3 to debug this logic error I found in the checkOrders() function" → Uses O3 specifically
-- "Brainstorm scaling strategies with pro. Study the code, pick your preferred strategy and debate with pro to settle on two best approaches" → Uses Gemini Pro specifically
-- "Use local-llama to localize and add missing translations to this project" → Uses local Llama 3.2 via custom URL
-- "First use local-llama for a quick local analysis, then use opus for a thorough security review" → Uses both providers in sequence

+👉 **[Complete Tools Reference](docs/tools/)** with examples, parameters, and workflows

-## Available Tools

+## Key Features

-These aren't just tools—they're how you get Claude to think like a real developer. Instead of rushing to reply with
-surface-level takes or shallow insight, these workflows make Claude pause, dig into your code, and reason through
-problems step by step.
+**🧠 AI Orchestration**
+- **Auto model selection** - Claude picks the right AI for each task
+- **Multi-model workflows** - Chain different models in single conversations
+- **Conversation continuity** - Context preserved across tools and models
+- **[Context revival](docs/context-revival.md)** - Continue conversations even after context resets

-It's the difference between a rushed guess and a focused second pair of eyes that actually understands your code. Try them
-and feel the difference.

+**⚡ Model Support**
+- **Multiple providers** - Gemini, OpenAI, X.AI, OpenRouter, DIAL, Ollama
+- **Latest models** - GPT-5, Gemini 2.5 Pro, O3, Grok-4, local Llama
+- **[Thinking modes](docs/advanced-usage.md#thinking-modes)** - Control reasoning depth vs cost
+- **Vision support** - Analyze images, diagrams, screenshots

-**Quick Tool Selection Guide:**
-- **Need a thinking partner?** → `chat` (brainstorm ideas, get second opinions, validate approaches)
-- **Need deeper thinking?** → `thinkdeep` (extends analysis, finds edge cases)
-- **Want to prevent "You're absolutely right!"
responses?** → `challenge` (challenges assumptions, encourages thoughtful re-evaluation)
-- **Need to break down complex projects?** → `planner` (step-by-step planning, project structure, breaking down complex ideas)
-- **Need multiple perspectives?** → `consensus` (get diverse expert opinions on proposals and decisions)
-- **Code needs review?** → `codereview` (bugs, security, performance issues)
-- **Pre-commit validation?** → `precommit` (validate git changes before committing)
-- **Something's broken?** → `debug` (systematic investigation, step-by-step root cause analysis)
-- **Want to understand code?** → `analyze` (architecture, patterns, dependencies)
-- **Code needs refactoring?** → `refactor` (intelligent refactoring with decomposition focus)
-- **Need call-flow analysis?** → `tracer` (generates prompts for execution tracing and dependency mapping)
-- **Need comprehensive tests?** → `testgen` (generates test suites with edge cases)
-- **Security concerns?** → `secaudit` (OWASP analysis, compliance evaluation, vulnerability assessment)
-- **Code needs documentation?** → `docgen` (generates comprehensive documentation with complexity analysis)
-- **Which models are available?** → `listmodels` (shows all configured providers and models)
-- **Server info?** → `version` (version and configuration details)

+**🛠 Developer Experience**
+- **Guided workflows** - Systematic investigation prevents rushed analysis
+- **Smart file handling** - Auto-expand directories, manage token limits
+- **Web search integration** - Access current documentation and best practices
+- **[Large prompt support](docs/advanced-usage.md#working-with-large-prompts)** - Bypass MCP's 25K token limit

-**Auto Mode:** When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".
-
-**Model Selection Examples:**
-- Complex architecture review → Claude picks Gemini Pro
-- Quick formatting check → Claude picks Flash
-- Logical debugging → Claude picks O3
-- General explanations → Claude picks Flash for speed
-- Local analysis → Claude picks your Ollama model
-
-**Pro Tip:** Thinking modes (for Gemini models) control depth vs token cost. Use "minimal" or "low" for quick tasks, "high" or "max" for complex problems. [Learn more](docs/advanced-usage.md#thinking-modes)
-
-**Tools Overview:**
-1. [`chat`](docs/tools/chat.md) - Collaborative thinking and development conversations
-2. [`thinkdeep`](docs/tools/thinkdeep.md) - Extended reasoning and problem-solving
-3. [`challenge`](docs/tools/challenge.md) - Critical challenge prompt, prevents **You're absolutely right!**
-4. [`planner`](docs/tools/planner.md) - Interactive sequential planning for complex projects
-5. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
-6. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
-7. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
-8. [`debug`](docs/tools/debug.md) - Systematic investigation and debugging
-9. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
-10. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
-11. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
-12. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
-13. [`secaudit`](docs/tools/secaudit.md) - Comprehensive security audit with OWASP Top 10 analysis
-14. [`docgen`](docs/tools/docgen.md) - Comprehensive documentation generation with complexity analysis
-15. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
-16. 
[`version`](docs/tools/version.md) - Get server version and configuration
-
-### 1. `chat` - General Development Chat & Collaborative Thinking
-Your thinking partner for brainstorming, getting second opinions, and validating approaches. Perfect for technology comparisons, architecture discussions, and collaborative problem-solving.

+## Example Workflows

+**Multi-model Code Review:**
```
-Chat with zen about the best approach for user authentication in my React app
+"Perform a codereview using gemini pro and o3, then use planner to create a fix strategy"
```
+→ Claude reviews code systematically → Consults Gemini Pro → Gets O3's perspective → Creates unified action plan

-**[📖 Read More](docs/tools/chat.md)** - Detailed features, examples, and best practices
-
-### 2. `thinkdeep` - Extended Reasoning Partner
-Get a second opinion to augment Claude's own extended thinking. Uses specialized thinking models to challenge assumptions, identify edge cases, and provide alternative perspectives.
-
+**Collaborative Debugging:**
```
-The button won't animate when clicked, it seems something else is intercepting the clicks. Use thinkdeep with gemini pro after gathering related code and handing it the files
-and find out what the root cause is
+"Debug this race condition with max thinking mode, then validate the fix with precommit"
```
+→ Deep investigation → Expert analysis → Solution implementation → Pre-commit validation

-**[📖 Read More](docs/tools/thinkdeep.md)** - Enhanced analysis capabilities and critical evaluation process
-
-### 3. `challenge` - Critical Challenge Prompt
-Encourages thoughtful reassessment of statements instead of automatic agreement, especially when you're wrong.
-Wraps your input with instructions for critical thinking and honest analysis.
-
+**Architecture Planning:**
```
-challenge isn't adding this function to the base class a bad idea?
+"Plan our microservices migration, get consensus from pro and o3 on the approach"
```
+→ Structured planning → Multiple expert opinions → Consensus building → Implementation roadmap

-Normally, your favorite coding agent will enthusiastically reply with **“You’re absolutely right!”**—then proceed
-to completely reverse the _correct_ strategy, without ever explaining why you're wrong.

+👉 **[Advanced Usage Guide](docs/advanced-usage.md)** for complex workflows, model configuration, and power-user features

-
-
- Example: Without vs With Zen

+## Why Zen MCP?

- **Without Zen:**
- ![without_zen@2x](https://github.com/user-attachments/assets/64f3c9fb-7ca9-4876-b687-25e847edfd87)

+**🎯 Problem:** Claude is brilliant but sometimes needs:
+- Multiple AI perspectives for complex decisions
+- Systematic workflows to prevent rushed analysis
+- Extended context beyond its limits
+- Access to specialized models (reasoning, speed, local)

- **With Zen:**
- ![with_zen@2x](https://github.com/user-attachments/assets/9d72f444-ba53-4ab1-83e5-250062c6ee70)

+**✅ Solution:** Zen orchestrates AI models as Claude's development team:
+- **Claude stays in control** - You give instructions to Claude
+- **Models provide expertise** - Each AI contributes their strengths
+- **Context flows seamlessly** - Full conversation history across tools
+- **You decide the workflow** - Simple requests or complex orchestration
+## Quick Links

-**[📖 Read More](docs/tools/challenge.md)** - Challenge an approach or validate ideas with confidence

+**📖 Documentation**
+- [Getting Started](docs/getting-started.md) - Complete setup guide
+- [Tools Reference](docs/tools/) - All tools with examples
+- [Advanced Usage](docs/advanced-usage.md) - Power user features
+- [Configuration](docs/configuration.md) - Environment variables, restrictions

-### 4. `planner` - Interactive Step-by-Step Planning
-Break down complex projects or ideas into manageable, structured plans through step-by-step thinking.
-Perfect for adding new features to an existing system, scaling up system design, migration strategies,
-and architectural planning with branching and revision capabilities.
-
-#### Pro Tip
-Claude supports `sub-tasks` where it will spawn and run separate background tasks. You can ask Claude to
-run Zen's planner with two separate ideas. Then when it's done, use Zen's `consensus` tool to pass the entire
-plan and get expert perspectives from two powerful AI models on which one to work on first! Like performing **A/B** testing
-in one go without the wait!
-
-```
-Create two separate sub-tasks: in one, using planner tool show me how to add natural language support
-to my cooking app. In the other sub-task, use planner to plan how to add support for voice notes to my cooking app.
-Once done, start a consensus by sharing both plans to o3 and flash to give me the final verdict. Which one do
-I implement first?
-```
-
-**[📖 Read More](docs/tools/planner.md)** - Step-by-step planning methodology and multi-session continuation
-
-### 5. `consensus` - Multi-Model Perspective Gathering
-Get diverse expert opinions from multiple AI models on technical proposals and decisions. Supports stance steering (for/against/neutral) and structured decision-making.
- -``` -Get a consensus with flash taking a supportive stance and gemini pro being critical to evaluate whether we should -migrate from REST to GraphQL for our API. I need a definitive answer. -``` - -**[πŸ“– Read More](docs/tools/consensus.md)** - Multi-model orchestration and decision analysis - -### 6. `codereview` - Professional Code Review -Comprehensive code analysis with prioritized feedback and severity levels. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis. - -``` -Perform a codereview with gemini pro especially the auth.py as I feel some of the code is bypassing security checks -and there may be more potential vulnerabilities. Find and share related code." -``` - -**Tips**: -* To avoid extra API costs, add `do not use another model` to run the entire codereview workflow locally. -* If you want to **always** consult an external model (recommended for code reviews) regardless of Claude’s confidence level in identifying issues, use `must codereview using `. This ensures the model is consulted even if Claude is _certain_ about identified issues. - - -**[πŸ“– Read More](docs/tools/codereview.md)** - Professional review workflow with step-by-step analysis - -### 7. `precommit` - Pre-Commit Validation -Comprehensive review of staged/unstaged git changes across multiple repositories. This workflow tool guides Claude through systematic investigation of git changes, repository status, and file modifications across multiple steps before providing expert validation to ensure changes meet requirements and prevent regressions. - -``` -Perform a thorough precommit with o3, we want to only highlight critical issues, no blockers, no regressions. 
I need -100% confidence this covers all the edge cases listed in FIX_ISSUE_ABC.md -``` - -#### Real Example: - -* I asked claude (sonnet-4) to refactor some code to fix a known bug; gave it exact instructions -* The fix looked correct β€” and at a glance, so did the changes. -* More importantly, all existing unit tests passed (540+ unit tests) without a hitch. Zero regressions. Claude seemed confident and eager to commit. Everything looked perfect. - -I then ran: - -```text -Run a precommit with o3 confirm our changes are sound and diffs are valid. Confirm this won't cause breakage or -regressions and codesmells are out -``` - -This passed the tracked and untracked changes / diffs and the related code to o3 for a thorough review via Zen's `precommit` tool. - -Output: - -```text -⏺ Precommit Analysis Results - CRITICAL ISSUE FOUND - - The O3 model has identified one HIGH-severity regression and several medium/low issues in our refactor. Here's the summary: - - 🚨 HIGH SEVERITY - MUST FIX BEFORE COMMIT - - ... -``` - -The reported issue was in fact a _very subtle bug_ that slipped through the quick glance β€” and a unit test for this exact case apparently -was missing (out of 540 existing tests!) - explains the zero reported regressions. The fix was ultimately simple, but the -fact Claude (and by extension, I) overlooked this, was a stark reminder: no number of eyeballs is ever enough. Fixed the -issue, ran `precommit` with o3 again and got: - - **RECOMMENDATION: PROCEED WITH COMMIT** - -Nice! This is just one instance - take a look at [another example here](docs/tools/precommit.md). - -**Tips**: -* To avoid extra API costs, add `do not use another model` to run the entire precommit workflow locally. -* If you want to **always** consult an external model (recommended for pre-commit analysis) regardless of Claude’s confidence level in identifying issues, use `must precommit using `. 
This ensures the model is consulted even if Claude is _certain_ about identified issues. - -**[πŸ“– Read More](docs/tools/precommit.md)** - Multi-repository validation and change analysis - -### 8. `debug` - Expert Debugging Assistant -Systematic investigation-guided debugging that walks Claude through step-by-step root cause analysis. This workflow -tool enforces a structured investigation process where Claude performs methodical code examination, evidence collection, -and hypothesis formation across multiple steps before receiving expert analysis from the selected AI model. When Claude's -confidence reaches **100% certainty** during the investigative workflow, expert analysis via another model is skipped to -save on tokens and cost, and Claude proceeds directly to fixing the issue. - -``` -See logs under /Users/me/project/diagnostics.log and related code under the sync folder. -Logs show that sync works but sometimes it gets stuck and there are no errors displayed to -the user. Using zen's debug tool with gemini pro, find out why this is happening and what the root -cause is and its fix -``` - -**Tips**: -* To avoid extra API costs, add `do not use another model` to run the entire debugging workflow locally. This is recommended in most cases, as Claude typically identifies the root cause with high confidence by the end. -* If you want to **always** consult an external model regardless of Claude’s confidence level, use `must debug using `. This ensures the model is consulted even if Claude is _certain_ about the issue. - -When in doubt, you can always follow up with a new prompt and ask Claude to share its findings with another model: - -```text -Use continuation with thinkdeep, share details with o4-mini to find out what the best fix is for this -``` - -**[πŸ“– Read More](docs/tools/debug.md)** - Step-by-step investigation methodology with workflow enforcement - -### 9. `analyze` - Smart File Analysis -General-purpose code understanding and exploration. 
This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis for architecture assessment, pattern detection, and strategic improvement recommendations. - -``` -Use gemini to analyze main.py to understand how it works -``` - -**[πŸ“– Read More](docs/tools/analyze.md)** - Comprehensive analysis workflow with step-by-step investigation - -### 10. `refactor` - Intelligent Code Refactoring -Comprehensive refactoring analysis with top-down decomposition strategy. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance. - -``` -Use gemini pro to decompose my_crazy_big_class.m into smaller extensions -``` - -**[πŸ“– Read More](docs/tools/refactor.md)** - Workflow-driven refactoring with progressive analysis - -### 11. `tracer` - Static Code Analysis Prompt Generator -Creates detailed analysis prompts for call-flow mapping and dependency tracing. Generates structured analysis requests for precision execution flow or dependency mapping. - -``` -Use zen tracer to analyze how UserAuthManager.authenticate is used and why -``` - -**[πŸ“– Read More](docs/tools/tracer.md)** - Prompt generation and analysis modes - -### 12. `testgen` - Comprehensive Test Generation -Generates thorough test suites with edge case coverage based on existing code and test framework. This workflow tool guides Claude through systematic investigation of code functionality, critical paths, edge cases, and integration points across multiple steps before generating comprehensive tests with realistic failure mode analysis. 
- -``` -Use zen to generate tests for User.login() method -``` - -**[πŸ“– Read More](docs/tools/testgen.md)** - Workflow-based test generation with comprehensive coverage - -### 13. `secaudit` - Comprehensive Security Audit -Systematic OWASP-based security assessment with compliance evaluation. This workflow tool guides Claude through methodical security investigation steps with forced pauses between each step to ensure thorough vulnerability assessment, security pattern analysis, and compliance verification before providing expert analysis. - -``` -Perform a secaudit with o3 on this e-commerce web application focusing on payment processing security and PCI DSS compliance -``` - -**[πŸ“– Read More](docs/tools/secaudit.md)** - OWASP Top 10 analysis with compliance framework support - -### 14. `docgen` - Comprehensive Documentation Generation -Generates thorough documentation with complexity analysis and gotcha identification. This workflow tool guides Claude through systematic investigation of code structure, function complexity, and documentation needs across multiple steps before generating comprehensive documentation that includes algorithmic complexity, call flow information, and unexpected behaviors that developers should know about. - -``` -# Includes complexity Big-O notation, documents dependencies / code-flow, fixes existing stale docs -Use docgen to document the UserManager class - -# Includes complexity Big-O notation, documents dependencies / code-flow -Use docgen to add complexity analysis to all the new Swift functions I added but don't update existing code -``` - -**[πŸ“– Read More](docs/tools/docgen.md)** - Workflow-based documentation generation with gotcha detection - -### 15. `listmodels` - List Available Models -Display all available AI models organized by provider, showing capabilities, context windows, and configuration status.
- -``` -Use zen to list available models -``` - -**[πŸ“– Read More](docs/tools/listmodels.md)** - Model capabilities and configuration details - -### 16. `version` - Server Information -Get server version, configuration details, and system status for debugging and troubleshooting. - -``` -What version of zen do I have -``` - -**[πŸ“– Read More](docs/tools/version.md)** - Server diagnostics and configuration verification - -For detailed tool parameters and configuration options, see the [Advanced Usage Guide](docs/advanced-usage.md). - -### Prompt Support - -Zen supports powerful structured prompts in Claude Code for quick access to tools and models: - -#### Tool Prompts -- `/zen:chat ask local-llama what 2 + 2 is` - Use chat tool with auto-selected model -- `/zen:thinkdeep use o3 and tell me why the code isn't working in sorting.swift` - Use thinkdeep tool with auto-selected model -- `/zen:planner break down the microservices migration project into manageable steps` - Use planner tool with auto-selected model -- `/zen:consensus use o3:for and flash:against and tell me if adding feature X is a good idea for the project. 
Pass them a summary of what it does.` - Use consensus tool with default configuration -- `/zen:codereview review for security module ABC` - Use codereview tool with auto-selected model -- `/zen:debug table view is not scrolling properly, very jittery, I suspect the code is in my_controller.m` - Use debug tool with auto-selected model -- `/zen:analyze examine these files and tell me whether I'm using the CoreAudio framework properly` - Use analyze tool with auto-selected model -- `/zen:docgen generate comprehensive documentation for the UserManager class with complexity analysis` - Use docgen tool with auto-selected model - -#### Continuation Prompts -- `/zen:chat continue and ask gemini pro if framework B is better` - Continue previous conversation using chat tool - -#### Advanced Examples -- `/zen:thinkdeep check if the algorithm in @sort.py is performant and if there are alternatives we could explore` -- `/zen:planner create a step-by-step plan for migrating our authentication system to OAuth2, including dependencies and rollback strategies` -- `/zen:consensus debate whether we should migrate to GraphQL for our API` -- `/zen:precommit confirm these changes match our requirements in COOL_FEATURE.md` -- `/zen:testgen write me tests for class ABC` -- `/zen:docgen document the payment processing module with gotchas and complexity analysis` -- `/zen:refactor propose a decomposition strategy, make a plan and save it in FIXES.md` - -#### Syntax Format -The prompt format is: `/zen:[tool] [your_message]` - -- `[tool]` - Any available tool name (chat, thinkdeep, planner, consensus, codereview, debug, analyze, docgen, etc.) -- `[your_message]` - Your request, question, or instructions for the tool - -**Note:** All prompts will show as "(MCP) [tool]" in Claude Code to indicate they're provided by the MCP server.
- -## Advanced Features - -### AI-to-AI Conversation Threading - -This server enables **true AI collaboration** between Claude and multiple AI models, where they can coordinate and build on each other's insights across tools and conversations. - -**[πŸ“– Read More](docs/ai-collaboration.md)** - Multi-model coordination, conversation threading, and collaborative workflows - - -## Configuration - -Configure the Zen MCP Server through environment variables in your `.env` file. Supports multiple AI providers, model restrictions, conversation settings, and advanced options. - -```env -# Quick start - Auto mode (recommended) -DEFAULT_MODEL=auto -GEMINI_API_KEY=your-gemini-key -OPENAI_API_KEY=your-openai-key -DIAL_API_KEY=your-dial-key # Optional: Access to multiple models via DIAL -``` - -**Key Configuration Options:** -- **API Keys**: Native APIs (Gemini, OpenAI, X.AI), OpenRouter, DIAL, or Custom endpoints (Ollama, vLLM) -- **Model Selection**: Auto mode or specific model defaults -- **Usage Restrictions**: Control which models can be used for cost control -- **Conversation Settings**: Timeout, turn limits, memory configuration -- **Thinking Modes**: Token allocation for extended reasoning -- **Logging**: Debug levels and operational visibility - -**[πŸ“– Read More](docs/configuration.md)** - Complete configuration reference with examples - -## Testing - -For information on running tests, see the [Testing Guide](docs/testing.md). - -## Contributing - -We welcome contributions! Please see our comprehensive guides: -- [Contributing Guide](docs/contributions.md) - Code standards, PR process, and requirements -- [Adding a New Provider](docs/adding_providers.md) - Step-by-step guide for adding AI providers +**πŸ”§ Setup & Support** +- [WSL Setup](docs/wsl-setup.md) - Windows users +- [Troubleshooting](docs/troubleshooting.md) - Common issues +- [Contributing](docs/contributions.md) - Code standards, PR process ## License -Apache 2.0 License - see LICENSE file for details. 
+Apache 2.0 License - see [LICENSE](LICENSE) file for details. ## Acknowledgments Built with the power of **Multi-Model AI** collaboration 🀝 - **A**ctual **I**ntelligence by real Humans - [MCP (Model Context Protocol)](https://modelcontextprotocol.com) by Anthropic -- [Claude Code](https://claude.ai/code) - Your AI coding assistant & orchestrator -- [Gemini 2.5 Pro & 2.0 Flash](https://ai.google.dev/) - Extended thinking & fast analysis -- [OpenAI O3](https://openai.com/) - Strong reasoning & general intelligence +- [Claude Code](https://claude.ai/code) - Your AI coding orchestrator +- [Gemini 2.5 Pro & Flash](https://ai.google.dev/) - Extended thinking & fast analysis +- [OpenAI O3 & GPT-5](https://openai.com/) - Strong reasoning & latest capabilities ### Star History diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..a08f857 --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,359 @@ +# Getting Started with Zen MCP Server + +This guide walks you through setting up the Zen MCP Server from scratch, including installation, configuration, and first usage. + +## Prerequisites + +- **Python 3.10+** (3.12 recommended) +- **Git** +- **[uv installed](https://docs.astral.sh/uv/getting-started/installation/)** (for uvx method) +- **Windows users**: WSL2 required for Claude Code CLI + +## Step 1: Get API Keys + +You need at least one API key. 
Choose based on your needs: + +### Option A: OpenRouter (Recommended for beginners) +**One API for multiple models** +- Visit [OpenRouter](https://openrouter.ai/) and sign up +- Generate an API key +- Control spending limits in your dashboard +- Access GPT-4, Claude, Gemini, and more through one API + +### Option B: Native Provider APIs + +**Gemini (Google):** +- Visit [Google AI Studio](https://makersuite.google.com/app/apikey) +- Generate an API key +- **Note**: For Gemini 2.5 Pro, use a paid API key (free tier has limited access) + +**OpenAI:** +- Visit [OpenAI Platform](https://platform.openai.com/api-keys) +- Generate an API key for O3, GPT-5 access + +**X.AI (Grok):** +- Visit [X.AI Console](https://console.x.ai/) +- Generate an API key for Grok models + +**DIAL Platform:** +- Visit [DIAL Platform](https://dialx.ai/) +- Generate API key for vendor-agnostic model access + +### Option C: Local Models (Free) + +**Ollama:** +```bash +# Install Ollama +curl -fsSL https://ollama.ai/install.sh | sh + +# Start Ollama service +ollama serve + +# Pull a model (e.g., Llama 3.2) +ollama pull llama3.2 +``` + +**Other local options:** +- **vLLM**: Self-hosted inference server +- **LM Studio**: Local model hosting with OpenAI-compatible API +- **Text Generation WebUI**: Popular local interface + +πŸ‘‰ **[Complete custom model setup guide](custom_models.md)** + +## Step 2: Installation + +Choose your preferred installation method: + +### Method A: Instant Setup with uvx (Recommended) + +**Prerequisites**: [Install uv first](https://docs.astral.sh/uv/getting-started/installation/) + +**For Claude Desktop:** +1. Open Claude Desktop β†’ Settings β†’ Developer β†’ Edit Config +2. 
Add this configuration: + +```json +{ + "mcpServers": { + "zen": { + "command": "sh", + "args": [ + "-c", + "exec $(which uvx || echo uvx) --from git+https://github.com/BeehiveInnovations/zen-mcp-server.git zen-mcp-server" + ], + "env": { + "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", + "GEMINI_API_KEY": "your_api_key_here" + } + } + } +} +``` + +**For Claude Code CLI:** +Create `.mcp.json` in your project root: + +```json +{ + "mcpServers": { + "zen": { + "command": "sh", + "args": [ + "-c", + "exec $(which uvx || echo uvx) --from git+https://github.com/BeehiveInnovations/zen-mcp-server.git zen-mcp-server" + ], + "env": { + "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", + "GEMINI_API_KEY": "your_api_key_here" + } + } + } +} +``` + +**For Gemini CLI:** +Edit `~/.gemini/settings.json`: + +```json +{ + "mcpServers": { + "zen": { + "command": "sh", + "args": [ + "-c", + "exec $(which uvx || echo uvx) --from git+https://github.com/BeehiveInnovations/zen-mcp-server.git zen-mcp-server" + ], + "env": { + "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", + "GEMINI_API_KEY": "your_api_key_here" + } + } + } +} +``` + +**Benefits of uvx method:** +- βœ… Zero manual setup required +- βœ… Always pulls latest version +- βœ… No local dependencies to manage +- βœ… Works without Python environment setup + +### Method B: Clone and Setup + +```bash +# Clone the repository +git clone https://github.com/BeehiveInnovations/zen-mcp-server.git +cd zen-mcp-server + +# One-command setup (handles everything) +./run-server.sh + +# Or for Windows PowerShell: +./run-server.ps1 + +# View configuration for Claude Desktop +./run-server.sh -c + +# See all options +./run-server.sh --help +``` + +**What the setup script does:** +- βœ… Creates Python virtual environment +- βœ… Installs all dependencies +- βœ… Creates .env file for API keys +- βœ… Configures Claude integrations +- βœ… Provides copy-paste configuration + +**After 
updates:** Always run `./run-server.sh` again after `git pull`. + +**Windows users**: See the [WSL Setup Guide](wsl-setup.md) for detailed WSL configuration. + +## Step 3: Configure API Keys + +### For uvx installation: +Add your API keys directly to the MCP configuration shown above. + +### For clone installation: +Edit the `.env` file: + +```bash +nano .env +``` + +Add your API keys (at least one required): +```env +# Choose your providers (at least one required) +GEMINI_API_KEY=your-gemini-api-key-here # For Gemini models +OPENAI_API_KEY=your-openai-api-key-here # For O3, GPT-5 +XAI_API_KEY=your-xai-api-key-here # For Grok models +OPENROUTER_API_KEY=your-openrouter-key # For multiple models + +# DIAL Platform (optional) +DIAL_API_KEY=your-dial-api-key-here +DIAL_API_HOST=https://core.dialx.ai # Default host (optional) +DIAL_API_VERSION=2024-12-01-preview # API version (optional) +DIAL_ALLOWED_MODELS=o3,gemini-2.5-pro # Restrict models (optional) + +# Custom/Local models (Ollama, vLLM, etc.) +CUSTOM_API_URL=http://localhost:11434/v1 # Ollama example +CUSTOM_API_KEY= # Empty for Ollama +CUSTOM_MODEL_NAME=llama3.2 # Default model name +``` + +**Important notes:** +- ⭐ **No restart needed** - Changes take effect immediately +- ⭐ If multiple APIs configured, native APIs take priority over OpenRouter +- ⭐ Configure model aliases in [`conf/custom_models.json`](../conf/custom_models.json) + +## Step 4: Test the Installation + +### For Claude Desktop: +1. Restart Claude Desktop +2. Open a new conversation +3. Try: `"Use zen to list available models"` + +### For Claude Code CLI: +1. Exit any existing Claude session +2. Run `claude` from your project directory +3. Try: `"Use zen to chat about Python best practices"` + +### For Gemini CLI: +**Note**: While Zen MCP connects to Gemini CLI, tool invocation isn't working correctly yet. See [Gemini CLI Setup](gemini-setup.md) for updates. 
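If the server doesn't seem to pick up your keys, you can sanity-check the environment directly. The sketch below is a convenience script, not part of the Zen MCP Server; it assumes your `.env` values have been exported into the shell environment (for example via `set -a; source .env; set +a`):

```python
import os

# Which environment variable each provider reads, per the .env example above.
# (The check itself is an illustrative sketch, not part of the server.)
PROVIDER_KEYS = {
    "Gemini": "GEMINI_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "X.AI": "XAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "DIAL": "DIAL_API_KEY",
}

def configured_providers(env=None):
    """Return provider names whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]

if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) or "none")
```

If this prints "none", the keys never reached the environment, so the server won't see them either.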
+ +### Test Commands: +``` +"Use zen to list available models" +"Chat with zen about the best approach for API design" +"Use zen thinkdeep with gemini pro about scaling strategies" +"Debug this error with o3: [paste error]" +``` + +## Step 5: Start Using Zen + +### Basic Usage Patterns: + +**Let Claude pick the model:** +``` +"Use zen to analyze this code for security issues" +"Debug this race condition with zen" +"Plan the database migration with zen" +``` + +**Specify the model:** +``` +"Use zen with gemini pro to review this complex algorithm" +"Debug with o3 using zen for logical analysis" +"Get flash to quickly format this code via zen" +``` + +**Multi-model workflows:** +``` +"Use zen to get consensus from pro and o3 on this architecture" +"Code review with gemini, then precommit validation with o3" +"Analyze with flash, then deep dive with pro if issues found" +``` + +### Quick Tool Reference: + +**🀝 Collaboration**: `chat`, `thinkdeep`, `planner`, `consensus` +**πŸ” Code Analysis**: `analyze`, `codereview`, `debug`, `precommit` +**βš’οΈ Development**: `refactor`, `testgen`, `secaudit`, `docgen` +**πŸ”§ Utilities**: `challenge`, `tracer`, `listmodels`, `version` + +πŸ‘‰ **[Complete Tools Reference](tools/)** with detailed examples and parameters + +## Common Issues and Solutions + +### "zen not found" or "command not found" + +**For uvx installations:** +- Ensure `uv` is installed and in PATH +- Try: `which uvx` to verify uvx is available +- Check PATH includes `/usr/local/bin` and `~/.local/bin` + +**For clone installations:** +- Run `./run-server.sh` again to verify setup +- Check virtual environment: `which python` should show `.zen_venv/bin/python` + +### API Key Issues + +**"Invalid API key" errors:** +- Verify API keys in `.env` file or MCP configuration +- Test API keys directly with provider's API +- Check for extra spaces or quotes around keys + +**"Model not available":** +- Run `"Use zen to list available models"` to see what's configured +- 
Check model restrictions in environment variables +- Verify API key has access to requested models + +### Performance Issues + +**Slow responses:** +- Use faster models: `flash` instead of `pro` +- Lower thinking modes: `minimal` or `low` instead of `high` +- Restrict model access to prevent expensive model selection + +**Token limit errors:** +- Use models with larger context windows +- Break large requests into smaller chunks +- See [Working with Large Prompts](advanced-usage.md#working-with-large-prompts) + +### More Help + +πŸ‘‰ **[Complete Troubleshooting Guide](troubleshooting.md)** with detailed solutions + +πŸ‘‰ **[Advanced Usage Guide](advanced-usage.md)** for power-user features + +πŸ‘‰ **[Configuration Reference](configuration.md)** for all options + +## What's Next? + +🎯 **Try the example workflows in the main README** + +πŸ“š **Explore the [Tools Reference](tools/)** to understand what each tool can do + +⚑ **Read the [Advanced Usage Guide](advanced-usage.md)** for complex workflows + +πŸ”§ **Check out [Configuration Options](configuration.md)** to customize behavior + +πŸ’‘ **Join discussions and get help** in the project issues or discussions + +## Quick Configuration Templates + +### Development Setup (Balanced) +```env +DEFAULT_MODEL=auto +GEMINI_API_KEY=your-key +OPENAI_API_KEY=your-key +GOOGLE_ALLOWED_MODELS=flash,pro +OPENAI_ALLOWED_MODELS=o4-mini,o3-mini +``` + +### Cost-Optimized Setup +```env +DEFAULT_MODEL=flash +GEMINI_API_KEY=your-key +GOOGLE_ALLOWED_MODELS=flash +``` + +### High-Performance Setup +```env +DEFAULT_MODEL=auto +GEMINI_API_KEY=your-key +OPENAI_API_KEY=your-key +GOOGLE_ALLOWED_MODELS=pro +OPENAI_ALLOWED_MODELS=o3 +``` + +### Local-First Setup +```env +DEFAULT_MODEL=auto +CUSTOM_API_URL=http://localhost:11434/v1 +CUSTOM_MODEL_NAME=llama3.2 +# Add cloud APIs as backup +GEMINI_API_KEY=your-key +``` + +Happy coding with your AI development team! 
πŸ€–βœ¨ \ No newline at end of file From a204bda25dceabef8059107e6210088026cf3dea Mon Sep 17 00:00:00 2001 From: Sven Lito Date: Tue, 12 Aug 2025 14:47:35 +0700 Subject: [PATCH 3/6] docs: remove backup files from cherry-pick --- docs/tools/analyze.md.bak | 199 ---------------------------- docs/tools/chat.md.bak | 84 ------------ docs/tools/codereview.md.bak | 174 ------------------------ docs/tools/precommit.md.bak | 247 ----------------------------------- docs/tools/refactor.md.bak | 201 ---------------------------- docs/tools/secaudit.md.bak | 222 ------------------------------- docs/tools/testgen.md.bak | 220 ------------------------------- docs/tools/thinkdeep.md.bak | 97 -------------- 8 files changed, 1444 deletions(-) delete mode 100644 docs/tools/analyze.md.bak delete mode 100644 docs/tools/chat.md.bak delete mode 100644 docs/tools/codereview.md.bak delete mode 100644 docs/tools/precommit.md.bak delete mode 100644 docs/tools/refactor.md.bak delete mode 100644 docs/tools/secaudit.md.bak delete mode 100644 docs/tools/testgen.md.bak delete mode 100644 docs/tools/thinkdeep.md.bak diff --git a/docs/tools/analyze.md.bak b/docs/tools/analyze.md.bak deleted file mode 100644 index 618a0be..0000000 --- a/docs/tools/analyze.md.bak +++ /dev/null @@ -1,199 +0,0 @@ -# Analyze Tool - Smart File Analysis - -**General-purpose code understanding and exploration through workflow-driven investigation** - -The `analyze` tool provides comprehensive code analysis and understanding capabilities, helping you explore codebases, understand architecture, and identify patterns across files and directories. This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis. 
- -## Thinking Mode - -**Default is `medium` (8,192 tokens).** Use `high` for architecture analysis (comprehensive insights worth the cost) or `low` for quick file overviews (save ~6k tokens). - -## How the Workflow Works - -The analyze tool implements a **structured workflow** for thorough code understanding: - -**Investigation Phase (Claude-Led):** -1. **Step 1**: Claude describes the analysis plan and begins examining code structure -2. **Step 2+**: Claude investigates architecture, patterns, dependencies, and design decisions -3. **Throughout**: Claude tracks findings, relevant files, insights, and confidence levels -4. **Completion**: Once analysis is comprehensive, Claude signals completion - -**Expert Analysis Phase:** -After Claude completes the investigation (unless confidence is **certain**): -- Complete analysis summary with all findings -- Architectural insights and pattern identification -- Strategic improvement recommendations -- Final expert assessment based on investigation - -This workflow ensures methodical analysis before expert insights, resulting in deeper understanding and more valuable recommendations. 
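To make the step-by-step shape concrete, here is a hypothetical first-step payload built from the parameter names documented under Tool Parameters; every field value is invented for illustration, and this is not a literal client API call:

```python
# Hypothetical analyze-tool step payload. The keys mirror the documented
# workflow parameters; the values are invented for demonstration only.
step_payload = {
    "step": "Survey src/ to map modules, entry points, and data flow",
    "step_number": 1,
    "total_steps": 3,
    "next_step_required": True,
    "findings": "src/ splits into api/, core/, and storage/ layers",
    "files_checked": ["/abs/path/src/api/routes.py"],
    "relevant_files": ["/abs/path/src/core/engine.py"],
    "confidence": "exploring",
}

# Basic invariants any step payload should satisfy.
assert 1 <= step_payload["step_number"] <= step_payload["total_steps"]
assert step_payload["confidence"] in {"exploring", "low", "medium", "high", "certain"}
```

Later steps raise `step_number`, accumulate `findings`, and set `next_step_required` to `False` once the investigation is complete.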
- -## Example Prompts - -**Basic Usage:** -``` -"Use gemini to analyze main.py to understand how it works" -"Get gemini to do an architecture analysis of the src/ directory" -``` - -## Key Features - -- **Analyzes single files or entire directories** with intelligent file filtering -- **Supports specialized analysis types**: architecture, performance, security, quality, general -- **Uses file paths (not content) for clean terminal output** while processing full content -- **Can identify patterns, anti-patterns, and refactoring opportunities** -- **Large codebase support**: Handle massive codebases with 1M token context models -- **Cross-file relationship mapping**: Understand dependencies and interactions -- **Architecture visualization**: Describe system structure and component relationships -- **Image support**: Analyze architecture diagrams, UML charts, flowcharts: `"Analyze this system diagram with gemini to understand the data flow and identify bottlenecks"` -- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices - -## Tool Parameters - -**Workflow Investigation Parameters (used during step-by-step process):** -- `step`: Current investigation step description (required for each step) -- `step_number`: Current step number in analysis sequence (required) -- `total_steps`: Estimated total investigation steps (adjustable) -- `next_step_required`: Whether another investigation step is needed -- `findings`: Discoveries and insights collected in this step (required) -- `files_checked`: All files examined during investigation -- `relevant_files`: Files directly relevant to the analysis (required in step 1) -- `relevant_context`: Methods/functions/classes central to analysis findings -- `issues_found`: Issues or concerns identified with severity levels -- `confidence`: Confidence level 
in analysis completeness (exploring/low/medium/high/certain) -- `backtrack_from_step`: Step number to backtrack from (for revisions) -- `images`: Visual references for analysis context - -**Initial Configuration (used in step 1):** -- `prompt`: What to analyze or look for (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `analysis_type`: architecture|performance|security|quality|general (default: general) -- `output_format`: summary|detailed|actionable (default: detailed) -- `temperature`: Temperature for analysis (0-1, default 0.2) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_websearch`: Enable web search for documentation and best practices (default: true) -- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) -- `continuation_id`: Continue previous analysis sessions - -## Analysis Types - -**General Analysis (default):** -- Overall code structure and organization -- Key components and their responsibilities -- Data flow and control flow -- Design patterns and architectural decisions - -**Architecture Analysis:** -- System-level design and component relationships -- Module dependencies and coupling -- Separation of concerns and layering -- Scalability and maintainability considerations - -**Performance Analysis:** -- Potential bottlenecks and optimization opportunities -- Algorithmic complexity assessment -- Memory usage patterns -- I/O and database interaction efficiency - -**Security Analysis:** -- Security patterns and potential vulnerabilities -- Input validation and sanitization -- Authentication and authorization mechanisms -- Data protection and privacy considerations - -**Quality Analysis:** -- Code quality metrics and maintainability -- Testing coverage and patterns -- Documentation completeness -- Best practices adherence - -## Usage Examples - -**Single File Analysis:** -``` -"Analyze user_controller.py to 
understand the authentication flow with gemini" -``` - -**Directory Architecture Analysis:** -``` -"Use pro to analyze the src/ directory architecture and identify the main components" -``` - -**Performance-Focused Analysis:** -``` -"Analyze backend/api/ for performance bottlenecks with o3, focus on database queries" -``` - -**Security Assessment:** -``` -"Use gemini pro to analyze the authentication module for security patterns and potential issues" -``` - -**Visual + Code Analysis:** -``` -"Analyze this system architecture diagram along with the src/core/ implementation to understand the data flow" -``` - -**Large Codebase Analysis:** -``` -"Analyze the entire project structure with gemini pro to understand how all components work together" -``` - -## Output Formats - -**Summary Format:** -- High-level overview with key findings -- Main components and their purposes -- Critical insights and recommendations - -**Detailed Format (default):** -- Comprehensive analysis with specific examples -- Code snippets and file references -- Detailed explanations of patterns and structures - -**Actionable Format:** -- Specific recommendations and next steps -- Prioritized list of improvements -- Implementation guidance and examples - -## Best Practices - -- **Be specific about goals**: Clearly state what you want to understand or discover -- **Use appropriate analysis types**: Choose the type that matches your needs -- **Include related files**: Analyze modules together for better context understanding -- **Leverage large context models**: Use Gemini Pro for comprehensive codebase analysis -- **Combine with visual context**: Include architecture diagrams or documentation -- **Use continuation**: Build on previous analysis for deeper understanding - -## Advanced Features - -**Large Codebase Support:** -With models like Gemini Pro (1M context), you can analyze extensive codebases: -``` -"Analyze the entire microservices architecture across all service directories" -``` - 
-**Cross-File Relationship Mapping:** -Understand how components interact across multiple files: -``` -"Analyze the data processing pipeline across input/, processing/, and output/ directories" -``` - -**Pattern Recognition:** -Identify design patterns, anti-patterns, and architectural decisions: -``` -"Analyze src/ to identify all design patterns used and assess their implementation quality" -``` - -**Web Search Enhancement:** -The tool can recommend searches for current best practices and documentation: -``` -After analysis: "Recommended searches for Claude: 'FastAPI async best practices 2024', 'SQLAlchemy ORM performance optimization patterns'" -``` - -## When to Use Analyze vs Other Tools - -- **Use `analyze`** for: Understanding code structure, exploring unfamiliar codebases, architecture assessment -- **Use `codereview`** for: Finding bugs and security issues with actionable fixes -- **Use `debug`** for: Diagnosing specific runtime errors or performance problems -- **Use `refactor`** for: Getting specific refactoring recommendations and implementation plans -- **Use `chat`** for: Open-ended discussions about code without structured analysis \ No newline at end of file diff --git a/docs/tools/chat.md.bak b/docs/tools/chat.md.bak deleted file mode 100644 index b7557eb..0000000 --- a/docs/tools/chat.md.bak +++ /dev/null @@ -1,84 +0,0 @@ -# Chat Tool - General Development Chat & Collaborative Thinking - -**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively** - -The `chat` tool is your collaborative thinking partner for development conversations. It's designed to help you brainstorm, validate ideas, get second opinions, and explore alternatives in a conversational format. - -## Thinking Mode - -**Default is `medium` (8,192 tokens).** Use `low` for quick questions to save tokens, or `high` for complex discussions when thoroughness matters. - -## Example Prompt - -``` -Chat with zen and pick the best model for this job. 
I need to pick between Redis and Memcached for session storage -and I need an expert opinion for the project I'm working on. Get a good idea of what the project does, pick one of the two options -and then debate with the other models to give me a final verdict -``` - -## Key Features - -- **Collaborative thinking partner** for your analysis and planning -- **Get second opinions** on your designs and approaches -- **Brainstorm solutions** and explore alternatives together -- **Validate your checklists** and implementation plans -- **General development questions** and explanations -- **Technology comparisons** and best practices -- **Architecture and design discussions** -- **File reference support**: `"Use gemini to explain this algorithm with context from algorithm.py"` -- **Image support**: Include screenshots, diagrams, UI mockups for visual analysis: `"Chat with gemini about this error dialog screenshot to understand the user experience issue"` -- **Dynamic collaboration**: Gemini can request additional files or context during the conversation if needed for a more thorough response -- **Web search capability**: Analyzes when web searches would be helpful and recommends specific searches for Claude to perform, ensuring access to current documentation and best practices - -## Tool Parameters - -- `prompt`: Your question or discussion topic (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `files`: Optional files for context (absolute paths) -- `images`: Optional images for visual context (absolute paths) -- `temperature`: Response creativity (0-1, default 0.5) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_websearch`: Enable web search for documentation and insights (default: true) -- `continuation_id`: Continue previous conversations - -## Usage Examples - -**Basic Development Chat:** -``` -"Chat with zen about the best approach for user authentication in my React app" -``` - 
-**Technology Comparison:** -``` -"Use flash to discuss whether PostgreSQL or MongoDB would be better for my e-commerce platform" -``` - -**Architecture Discussion:** -``` -"Chat with pro about microservices vs monolith architecture for my project, consider scalability and team size" -``` - -**File Context Analysis:** -``` -"Use gemini to chat about the current authentication implementation in auth.py and suggest improvements" -``` - -**Visual Analysis:** -``` -"Chat with gemini about this UI mockup screenshot - is the user flow intuitive?" -``` - -## Best Practices - -- **Be specific about context**: Include relevant files or describe your project scope -- **Ask for trade-offs**: Request pros/cons for better decision-making -- **Use conversation continuation**: Build on previous discussions with `continuation_id` -- **Leverage visual context**: Include diagrams, mockups, or screenshots when discussing UI/UX -- **Request web searches**: Ask for current best practices or recent developments in technologies - -## When to Use Chat vs Other Tools - -- **Use `chat`** for: Open-ended discussions, brainstorming, getting second opinions, technology comparisons -- **Use `thinkdeep`** for: Extending specific analysis, challenging assumptions, deeper reasoning -- **Use `analyze`** for: Understanding existing code structure and patterns -- **Use `debug`** for: Specific error diagnosis and troubleshooting \ No newline at end of file diff --git a/docs/tools/codereview.md.bak b/docs/tools/codereview.md.bak deleted file mode 100644 index 9037cc2..0000000 --- a/docs/tools/codereview.md.bak +++ /dev/null @@ -1,174 +0,0 @@ -# CodeReview Tool - Professional Code Review - -**Comprehensive code analysis with prioritized feedback through workflow-driven investigation** - -The `codereview` tool provides professional code review capabilities with actionable feedback, severity-based issue prioritization, and support for various review types from quick style checks to comprehensive security 
audits. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis. - -## Thinking Mode - -**Default is `medium` (8,192 tokens).** Use `high` for security-critical code (worth the extra tokens) or `low` for quick style checks (saves ~6k tokens). - -## How the Workflow Works - -The codereview tool implements a **structured workflow** that ensures thorough code examination: - -**Investigation Phase (Claude-Led):** -1. **Step 1**: Claude describes the review plan and begins systematic analysis of code structure -2. **Step 2+**: Claude examines code quality, security implications, performance concerns, and architectural patterns -3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels -4. **Completion**: Once review is comprehensive, Claude signals completion - -**Expert Analysis Phase:** -After Claude completes the investigation (unless confidence is **certain**): -- Complete review summary with all findings and evidence -- Relevant files and code patterns identified -- Issues categorized by severity levels -- Final recommendations based on investigation - -**Special Note**: If you want Claude to perform the entire review without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently. - -## Model Recommendation - -This tool particularly benefits from Gemini Pro or Flash models due to their 1M context window, which allows comprehensive analysis of large codebases. Claude's context limitations make it challenging to see the "big picture" in complex projects - this is a concrete example where utilizing a secondary model with larger context provides significant value beyond just experimenting with different AI capabilities. 
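
The two-phase flow described above can be pictured as simple bookkeeping: each investigation step carries findings and a confidence level, and the expert analysis phase is skipped only when the final step's confidence is `certain`. A minimal illustrative sketch, where the field names follow the parameter list in this document but the functions themselves are hypothetical, not the tool's actual implementation:

```python
# Hypothetical sketch of the codereview workflow's step bookkeeping.
# Field names mirror the documented parameters; behavior is illustrative only.

def make_step(step: str, step_number: int, total_steps: int,
              findings: str, confidence: str, next_step_required: bool) -> dict:
    """Build one investigation-step payload."""
    return {
        "step": step,
        "step_number": step_number,
        "total_steps": total_steps,
        "findings": findings,
        "confidence": confidence,
        "next_step_required": next_step_required,
    }

def needs_expert_analysis(final_step: dict) -> bool:
    """Expert analysis runs unless the investigation ended with certainty."""
    return final_step["confidence"] != "certain"

steps = [
    make_step("Plan review of auth.py", 1, 2,
              "Token validation lacks an expiry check", "low", True),
    make_step("Examine session handling", 2, 2,
              "Confirmed missing expiry; no other issues found", "high", False),
]
assert needs_expert_analysis(steps[-1])  # confidence is "high", so expert phase runs
```

This also makes the "don't use any other model" escape hatch concrete: it is equivalent to forcing the expert-analysis check to be skipped regardless of confidence.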
- -## Example Prompts - -``` -Perform a codereview with gemini pro and review auth.py for security issues and potential vulnerabilities. -I need an actionable plan but break it down into smaller quick-wins that we can implement and test rapidly -``` - -## Pro Tip: Multiple Parallel Reviews - -**You can start more than one codereview session with Claude:** - -``` -Start separate sub-tasks for codereview one with o3 finding critical issues and one with flash finding low priority issues -and quick-wins and give me the final single combined review highlighting only the critical issues -``` - -The above prompt will simultaneously run two separate `codereview` tools with two separate models and combine the output into a single summary for you to consume. - -## Key Features - -- **Issues prioritized by severity** (🔴 CRITICAL → 🟢 LOW) -- **Supports specialized reviews**: security, performance, quick -- **Coding standards enforcement**: `"Use gemini to review src/ against PEP8 standards"` -- **Severity filtering**: `"Get gemini to review auth/ - only report critical vulnerabilities"` -- **Image support**: Review code from screenshots, error dialogs, or visual bug reports: `"Review this error screenshot and the related auth.py file for potential security issues"` -- **Multi-file analysis**: Comprehensive review of entire directories or codebases -- **Actionable feedback**: Specific recommendations with line numbers and code examples -- **Language-specific expertise**: Tailored analysis for Python, JavaScript, Java, C#, Swift, and more -- **Integration issue detection**: Identifies cross-file dependencies and architectural problems -- **Security vulnerability scanning**: Focused on common security patterns and anti-patterns - -## Tool Parameters - -**Workflow Investigation Parameters (used during step-by-step process):** -- `step`: Current investigation step description (required for each step) -- `step_number`: Current step number in review sequence (required) -- 
`total_steps`: Estimated total investigation steps (adjustable) -- `next_step_required`: Whether another investigation step is needed -- `findings`: Discoveries and evidence collected in this step (required) -- `files_checked`: All files examined during investigation -- `relevant_files`: Files directly relevant to the review (required in step 1) -- `relevant_context`: Methods/functions/classes central to review findings -- `issues_found`: Issues identified with severity levels -- `confidence`: Confidence level in review completeness (exploring/low/medium/high/certain) -- `backtrack_from_step`: Step number to backtrack from (for revisions) -- `images`: Visual references for review context - -**Initial Review Configuration (used in step 1):** -- `prompt`: User's summary of what the code does, expected behavior, constraints, and review objectives (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `review_type`: full|security|performance|quick (default: full) -- `focus_on`: Specific aspects to focus on (e.g., "security vulnerabilities", "performance bottlenecks") -- `standards`: Coding standards to enforce (e.g., "PEP8", "ESLint", "Google Style Guide") -- `severity_filter`: critical|high|medium|low|all (default: all) -- `temperature`: Temperature for consistency (0-1, default 0.2) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_websearch`: Enable web search for best practices and documentation (default: true) -- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) -- `continuation_id`: Continue previous review discussions - -## Review Types - -**Full Review (default):** -- Comprehensive analysis including bugs, security, performance, maintainability -- Best for new features or significant code changes - -**Security Review:** -- Focused on security vulnerabilities and attack vectors -- Checks for common security anti-patterns -- Best 
for authentication, authorization, data handling code - -**Performance Review:** -- Analyzes performance bottlenecks and optimization opportunities -- Memory usage, algorithmic complexity, resource management -- Best for performance-critical code paths - -**Quick Review:** -- Fast style and basic issue check -- Lower token usage for rapid feedback -- Best for code formatting and simple validation - -## Severity Levels - -Issues are categorized and prioritized: - -- **🔴 CRITICAL**: Security vulnerabilities, crashes, data corruption -- **🟠 HIGH**: Logic errors, performance issues, reliability problems -- **🟡 MEDIUM**: Code smells, maintainability issues, minor bugs -- **🟢 LOW**: Style issues, documentation, minor improvements - -## Usage Examples - -**Basic Security Review:** -``` -"Review the authentication module in auth/ for security vulnerabilities with gemini pro" -``` - -**Performance-Focused Review:** -``` -"Use o3 to review backend/api.py for performance issues, focus on database queries and caching" -``` - -**Quick Style Check:** -``` -"Quick review of utils.py with flash, only report critical and high severity issues" -``` - -**Standards Enforcement:** -``` -"Review src/ directory against PEP8 standards with gemini, focus on code formatting and structure" -``` - -**Visual Context Review:** -``` -"Review this authentication code along with the error dialog screenshot to understand the security implications" -``` - -## Best Practices - -- **Provide context**: Describe what the code is supposed to do and any constraints -- **Use appropriate review types**: Security for auth code, performance for critical paths -- **Set severity filters**: Focus on critical issues for quick wins -- **Include relevant files**: Review related modules together for better context -- **Use parallel reviews**: Run multiple reviews with different models for comprehensive coverage -- **Follow up on findings**: Use the continuation feature to discuss specific issues in detail - -## 
Output Format - -Reviews include: -- **Executive Summary**: Overview of code quality and main concerns -- **Detailed Findings**: Specific issues with severity levels, line numbers, and recommendations -- **Quick Wins**: Easy-to-implement improvements with high impact -- **Long-term Improvements**: Structural changes for better maintainability -- **Security Considerations**: Specific security recommendations when relevant - -## When to Use CodeReview vs Other Tools - -- **Use `codereview`** for: Finding bugs, security issues, performance problems, code quality assessment -- **Use `analyze`** for: Understanding code structure without finding issues -- **Use `debug`** for: Diagnosing specific runtime errors or exceptions -- **Use `refactor`** for: Identifying structural improvements and modernization opportunities \ No newline at end of file diff --git a/docs/tools/precommit.md.bak b/docs/tools/precommit.md.bak deleted file mode 100644 index d70c1ab..0000000 --- a/docs/tools/precommit.md.bak +++ /dev/null @@ -1,247 +0,0 @@ -# PreCommit Tool - Pre-Commit Validation - -**Comprehensive review of staged/unstaged git changes across multiple repositories through workflow-driven investigation** - -The `precommit` tool provides thorough validation of git changes before committing, ensuring code quality, requirement compliance, and preventing regressions across multiple repositories. This workflow tool guides Claude through systematic investigation of git changes, repository status, and file modifications across multiple steps before providing expert validation. - -## Thinking Mode - -**Default is `medium` (8,192 tokens).** Use `high` or `max` for critical releases when thorough validation justifies the token cost. - -## How the Workflow Works - -The precommit tool implements a **structured workflow** for comprehensive change validation: - -**Investigation Phase (Claude-Led):** -1. 
**Step 1**: Claude describes the validation plan and begins analyzing git status across repositories -2. **Step 2+**: Claude examines changes, diffs, dependencies, and potential impacts -3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels -4. **Completion**: Once investigation is thorough, Claude signals completion - -**Expert Validation Phase:** -After Claude completes the investigation (unless confidence is **certain**): -- Complete summary of all changes and their context -- Potential issues and regressions identified -- Requirement compliance assessment -- Final recommendations for safe commit - -**Special Note**: If you want Claude to perform the entire pre-commit validation without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently. - -## Model Recommendation - -Pre-commit validation benefits significantly from models with extended context windows like Gemini Pro, which can analyze extensive changesets across multiple files and repositories simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural inconsistencies, and integration issues that might be missed when reviewing changes in isolation due to context constraints. - -## Visual Example - -
- -
- -**Prompt Used:** -``` -Now use gemini and perform a review and precommit and ensure original requirements are met, no duplication of code or -logic, everything should work as expected -``` - -How beautiful is that? Claude used `precommit` twice and `codereview` once and actually found and fixed two critical errors before commit! - -### Real-world Example - -Here's an example where Claude was made to go through changes *it* was asked to make. Its recommendation -to **Publish all Changes** was made after a rigorous set of prompts where it examined the small set of changes -from different angles (performance, bugs, anti-patterns etc). After confirming that unit tests were -passing, it went ahead with an initial review with O3; came back clean - all good to go. O3 was happy, everything looked great. - -Then it shared the changes and relevant code with Gemini 2.5 Pro - the following is the outcome: - -``` -⏺ zen:precommit (MCP)(step: "FINAL VALIDATION COMPLETE. All changes have been thoroughly analyzed and are ready for - commit. The modifications successfully fix 4 GitHub CI test failures with high-quality, defensive - implementations that maintain architectural integrity and coding standards. 
No issues or regressions - detected.", step_number: 4, total_steps: 4, next_step_required: false, findings: "COMMIT - RECOMMENDATION: ✅ APPROVED\n\nSUMMARY OF CHANGES:\n✅ Fixed 4 GitHub CI test failures:\n - - test_openrouter_with_restrictions (Mock iteration issue) \n - test_execute_with_branching - (Missing metadata)\n - test_execute_with_revision (Missing metadata)\n - test_metadata_preserved - (Provider type AttributeError)\n\n✅ IMPLEMENTATION QUALITY VERIFIED:\n - Defensive programming - with proper type checking\n - Error handling with meaningful fallbacks\n - Backward - compatibility maintained\n - Coding standards followed (no hasattr/getattr in changes)\n - - Architectural patterns preserved\n\n✅ COMPREHENSIVE TESTING:\n - All originally failing tests - pass\n - Affected functionality: 37/37 tests pass\n - Provider systems: 16/16 tests pass \n - - Module imports work correctly\n - No regressions detected\n\n✅ CHANGE IMPACT ASSESSMENT:\n - - Minimal, targeted changes\n - No security or performance concerns\n - Maintains API - compatibility\n - Changes are well-documented with clear comments\n\nRECOMMENDATION: These changes - are safe to commit and will resolve the GitHub CI failures without any negative impact.", - ⎿  { - "status": "calling_expert_analysis", - "step_number": 4, - … +169 lines (ctrl+r to expand) - -⏺ 🔍 Pre-Commit Validation Complete - - Based on comprehensive analysis using both O3 and Gemini 2.5 Pro models, I can provide the following assessment: - - ✅ APPROVED WITH CRITICAL FIX REQUIRED - - The changes successfully fix all 4 GitHub CI test failures and are well-implemented with defensive programming - patterns. 
However, the expert analysis identified one critical issue that must be addressed before commit: - - 🚨 [HIGH PRIORITY] Critical Issue Found - - Inconsistent Provider Handling in tools/base.py - - Location: tools/base.py line ~1833 in _create_continuation_offer_response method - - Problem: The defensive provider handling pattern was applied in 2 locations but missed a 3rd location - - Impact: Risk of AttributeError: 'str' object has no attribute 'get_provider_type' in continuation scenarios - - Fix Required: Apply the same defensive pattern to the missed location -``` - -A subtle but critical issue spotted by Gemini that both Claude + O3 missed. This is the power of running these workflows with Zen. - -## Example Prompts - -``` -Use zen and perform a thorough precommit ensuring there aren't any new regressions or bugs introduced -``` - -## Key Features - -- **Recursive repository discovery** - finds all git repos including nested ones -- **Validates changes against requirements** - ensures implementation matches intent -- **Detects incomplete changes** - finds added functions never called, missing tests, etc. 
-- **Multi-repo support** - reviews changes across multiple repositories in one go -- **Configurable scope** - review staged, unstaged, or compare against branches -- **Security focused** - catches exposed secrets, vulnerabilities in new code -- **Smart truncation** - handles large diffs without exceeding context limits -- **Cross-file dependency analysis** - identifies breaking changes across modules -- **Test coverage validation** - ensures new code has appropriate test coverage -- **Regression detection** - compares against requirements to prevent scope creep - -## Tool Parameters - -**Workflow Investigation Parameters (used during step-by-step process):** -- `step`: Current investigation step description (required for each step) -- `step_number`: Current step number in validation sequence (required) -- `total_steps`: Estimated total investigation steps (adjustable) -- `next_step_required`: Whether another investigation step is needed -- `findings`: Discoveries and evidence collected in this step (required) -- `files_checked`: All files examined during investigation -- `relevant_files`: Files directly relevant to the changes -- `relevant_context`: Methods/functions/classes affected by changes -- `issues_found`: Issues identified with severity levels -- `confidence`: Confidence level in validation completeness (exploring/low/medium/high/certain) -- `backtrack_from_step`: Step number to backtrack from (for revisions) -- `hypothesis`: Current assessment of change safety and completeness -- `images`: Screenshots of requirements, design mockups for validation - -**Initial Configuration (used in step 1):** -- `path`: Starting directory to search for repos (default: current directory, absolute path required) -- `prompt`: The original user request description for the changes (required for context) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `compare_to`: Compare against a branch/tag instead of local changes (optional) -- 
`severity_filter`: critical|high|medium|low|all (default: all) -- `include_staged`: Include staged changes in the review (default: true) -- `include_unstaged`: Include uncommitted changes in the review (default: true) -- `focus_on`: Specific aspects to focus on -- `temperature`: Temperature for response (default: 0.2) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_websearch`: Enable web search for best practices (default: true) -- `use_assistant_model`: Whether to use expert validation phase (default: true, set to false to use Claude only) -- `continuation_id`: Continue previous validation discussions - -## Usage Examples - -**Basic Pre-commit Validation:** -``` -"Use zen precommit to validate my changes before committing" -``` - -**Security-Focused Validation:** -``` -"Perform precommit security review with gemini pro on the authentication changes" -``` - -**Multi-Repository Validation:** -``` -"Validate changes across all repositories in this workspace with o3" -``` - -**Against Specific Branch:** -``` -"Compare current changes against main branch with precommit using gemini pro" -``` - -**With Requirements Context:** -``` -"Precommit validation ensuring the new payment feature meets requirements in FEATURE_SPEC.md" -``` - -## Validation Scope - -The tool automatically discovers and validates: - -**Repository Discovery:** -- Searches recursively for all `.git` directories -- Handles nested repositories and submodules -- Configurable search depth to prevent excessive recursion - -**Change Analysis:** -- Staged changes (`git diff --cached`) -- Unstaged changes (`git diff`) -- Untracked files that should be added -- Deleted files and their impact - -**Cross-Repository Impact:** -- Shared dependencies between repositories -- API contract changes that affect other repos -- Configuration changes with system-wide impact - -## Validation Categories - -**Completeness Checks:** -- New functions/classes have corresponding tests -- 
Documentation updated for API changes -- Configuration files updated as needed -- Migration scripts for database changes - -**Quality Assurance:** -- Code follows project standards -- No obvious bugs or logical errors -- Performance implications considered -- Security vulnerabilities addressed - -**Requirement Compliance:** -- Implementation matches original requirements -- No scope creep or unauthorized changes -- All acceptance criteria met -- Edge cases properly handled - -**Integration Safety:** -- Breaking changes properly documented -- Backward compatibility maintained where required -- Dependencies correctly updated -- Environment-specific changes validated - -## Best Practices - -- **Provide clear context**: Include the original requirements or feature description -- **Use for significant changes**: Most valuable for features, refactoring, or security updates -- **Review before final commit**: Catch issues before they enter the main branch -- **Include visual context**: Screenshots of requirements or expected behavior -- **Focus validation scope**: Use `focus_on` parameter for specific concerns -- **Multi-stage validation**: Use continuation for iterative improvement - -## Output Format - -Validation results include: -- **Change Summary**: Overview of what was modified across repositories -- **Requirement Compliance**: How well changes match original intent -- **Completeness Assessment**: Missing tests, documentation, or related changes -- **Security Review**: Potential vulnerabilities or exposed secrets -- **Integration Impact**: Cross-repository and cross-module effects -- **Recommendations**: Specific actions before committing - -## When to Use PreCommit vs Other Tools - -- **Use `precommit`** for: Validating changes before git commit, ensuring requirement compliance -- **Use `codereview`** for: General code quality assessment without git context -- **Use `debug`** for: Diagnosing specific runtime issues -- **Use `analyze`** for: Understanding existing 
code without validation context \ No newline at end of file diff --git a/docs/tools/refactor.md.bak b/docs/tools/refactor.md.bak deleted file mode 100644 index 6407a4a..0000000 --- a/docs/tools/refactor.md.bak +++ /dev/null @@ -1,201 +0,0 @@ -# Refactor Tool - Intelligent Code Refactoring - -**Comprehensive refactoring analysis with top-down decomposition strategy through workflow-driven investigation** - -The `refactor` tool provides intelligent code refactoring recommendations with a focus on top-down decomposition and systematic code improvement. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance. - -## Thinking Mode - -**Default is `medium` (8,192 tokens).** Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis. - -## How the Workflow Works - -The refactor tool implements a **structured workflow** for systematic refactoring analysis: - -**Investigation Phase (Claude-Led):** -1. **Step 1**: Claude describes the refactoring plan and begins analyzing code structure -2. **Step 2+**: Claude examines code smells, decomposition opportunities, and modernization possibilities -3. **Throughout**: Claude tracks findings, relevant files, refactoring opportunities, and confidence levels -4. 
**Completion**: Once investigation is thorough, Claude signals completion - -**Expert Analysis Phase:** -After Claude completes the investigation (unless confidence is **complete**): -- Complete refactoring opportunity summary -- Prioritized recommendations by impact -- Precise implementation guidance with line numbers -- Final expert assessment for refactoring strategy - -This workflow ensures methodical investigation before expert recommendations, resulting in more targeted and valuable refactoring plans. - -## Model Recommendation - -The refactor tool excels with models that have large context windows like Gemini Pro (1M tokens), which can analyze entire files and complex codebases simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural patterns, and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context constraints. - -## Example Prompts - -``` -"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions" -"Using zen's refactor decompose the all_in_one_sync_code.swift into maintainable extensions" -``` - -💡 **Example of a powerful prompt** to get the best out of both Claude + Flash's 1M Context: -``` -"First, think about how the authentication module works, find related classes and find - any code smells, then using zen's refactor ask flash to confirm your findings but ask - it to find additional code smells and any other quick-wins and then fix these issues" -``` - -This results in Claude first performing its own expert analysis, encouraging it to think critically and identify links within the project code. It then prompts `flash` to review the same code with a hint, preventing it from duplicating Claude's findings and encouraging it to explore other areas that Claude did *not* discover. 
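
The two-pass pattern above (one model reviews first, a second model is then steered toward *new* findings) amounts to merging two reviewers' findings while dropping duplicates. A minimal illustrative sketch, assuming plain strings as findings (the names and the dedup-by-text logic are hypothetical, not the tool's actual implementation):

```python
# Hypothetical sketch: combine a first reviewer's findings with a second
# reviewer's pass, keeping only the second pass's novel findings.

def merge_findings(first_pass: list[str], second_pass: list[str]) -> list[str]:
    """Keep first-pass findings, then append only novel second-pass findings."""
    seen = {f.lower() for f in first_pass}
    merged = list(first_pass)
    for finding in second_pass:
        if finding.lower() not in seen:
            merged.append(finding)
            seen.add(finding.lower())
    return merged

claude_findings = ["God class in AuthManager", "Magic numbers in retry logic"]
flash_findings = ["Magic numbers in retry logic", "Duplicated validation code"]
combined = merge_findings(claude_findings, flash_findings)
assert combined == [
    "God class in AuthManager",
    "Magic numbers in retry logic",
    "Duplicated validation code",
]
```

In practice the hint in the prompt does the deduplication conversationally; this sketch only shows why steering the second model away from known findings yields a combined report without repeats.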
- -## Key Features - -- **Intelligent prioritization** - Will refuse to work on low priority issues if the code is unwieldy and large, requiring decomposition first; this helps identify poorly managed classes and files that need structural improvements before detail work -- **Top-down decomposition strategy** - Analyzes file → class → function levels systematically -- **Four refactor types**: `codesmells` (detect anti-patterns), `decompose` (break down large components), `modernize` (update language features), `organization` (improve structure) -- **Precise line-number references** - Provides exact line numbers for Claude to implement changes -- **Language-specific guidance** - Tailored suggestions for Python, JavaScript, Java, C#, Swift, and more -- **Style guide integration** - Uses existing project files as pattern references -- **Conservative approach** - Careful dependency analysis to prevent breaking changes -- **Multi-file analysis** - Understands cross-file relationships and dependencies -- **Priority sequencing** - Recommends implementation order for refactoring changes -- **Image support**: Analyze code architecture diagrams, legacy system charts: `"Refactor this legacy module using gemini pro with the current architecture diagram"` - -## Refactor Types (Progressive Priority System) - -**1. `decompose` (CRITICAL PRIORITY)** - Context-aware decomposition with adaptive thresholds: - -**AUTOMATIC decomposition** (CRITICAL severity - blocks all other refactoring): -- Files >15,000 LOC, Classes >3,000 LOC, Functions >500 LOC - -**EVALUATE decomposition** (contextual severity - intelligent assessment): -- Files >5,000 LOC, Classes >1,000 LOC, Functions >150 LOC -- Only recommends if genuinely improves maintainability -- Respects legacy stability, domain complexity, performance constraints -- Considers legitimate cases where size is justified (algorithms, state machines, generated code) - -**2. 
`codesmells`** - Applied only after decomposition is complete: -- Detect long methods, complex conditionals, duplicate code, magic numbers, poor naming - -**3. `modernize`** - Applied only after decomposition is complete: -- Update to modern language features (f-strings, async/await, etc.) - -**4. `organization`** - Applied only after decomposition is complete: -- Improve logical grouping, separation of concerns, module structure - -**Progressive Analysis:** The tool performs a top-down check (worse → bad → better) and refuses to work on lower-priority issues if critical decomposition is needed first. It understands that massive files and classes create cognitive overload that must be addressed before detail work can be effective. Legacy code that cannot be safely decomposed is handled with higher tolerance thresholds and context-sensitive exemptions. - -## Tool Parameters - -**Workflow Investigation Parameters (used during step-by-step process):** -- `step`: Current investigation step description (required for each step) -- `step_number`: Current step number in refactoring sequence (required) -- `total_steps`: Estimated total investigation steps (adjustable) -- `next_step_required`: Whether another investigation step is needed -- `findings`: Discoveries and refactoring opportunities in this step (required) -- `files_checked`: All files examined during investigation -- `relevant_files`: Files directly needing refactoring (required in step 1) -- `relevant_context`: Methods/functions/classes requiring refactoring -- `issues_found`: Refactoring opportunities with severity and type -- `confidence`: Confidence level in analysis completeness (exploring/incomplete/partial/complete) -- `backtrack_from_step`: Step number to backtrack from (for revisions) -- `hypothesis`: Current assessment of refactoring priorities - -**Initial Configuration (used in step 1):** -- `prompt`: Description of refactoring goals, context, and specific areas of focus (required) -- 
`refactor_type`: codesmells|decompose|modernize|organization (default: codesmells) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security') -- `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only) -- `continuation_id`: Thread continuation ID for multi-turn conversations - -## Usage Examples - -**Decomposition Analysis:** -``` -"Analyze UserController.java for decomposition opportunities - it's becoming unwieldy" -``` - -**Code Smell Detection:** -``` -"Use gemini to identify code smells in the authentication module with high thinking mode" -``` - -**Modernization:** -``` -"Modernize legacy_parser.py to use modern Python features following examples/modern_patterns.py" -``` - -**Organization Improvement:** -``` -"Refactor src/utils/ for better organization, focus on maintainability and readability" -``` - -**Legacy System Refactoring:** -``` -"Use pro with max thinking to analyze this 10,000-line legacy file for decomposition strategy" -``` - -## Refactoring Strategy - -**Top-Down Analysis:** -1. **File Level**: Identify oversized files that need splitting -2. **Class Level**: Find classes with too many responsibilities -3. **Function Level**: Locate functions that are too complex or long -4. 
**Code Quality**: Address smells, modernization, and organization - -**Context-Aware Decisions:** -- **Domain Complexity**: Some domains legitimately require larger classes -- **Performance Constraints**: Critical path code may resist decomposition -- **Legacy Stability**: Old, working code may need gentler refactoring -- **Test Coverage**: Refactoring recommendations consider testability - -**Breaking Change Prevention:** -- Analyzes dependencies before suggesting splits -- Recommends gradual migration strategies -- Identifies public API impact -- Suggests backward compatibility approaches - -## Best Practices - -- **Start with decomposition**: Address structural issues before cosmetic improvements -- **Provide clear context**: Explain the codebase purpose and constraints -- **Use appropriate refactor types**: Match the type to your primary concern -- **Include style examples**: Reference existing well-structured code in your project -- **Focus on high-impact areas**: Target the most problematic or frequently modified code -- **Plan implementation order**: Follow the tool's sequencing recommendations -- **Consider test coverage**: Ensure adequate tests before major structural changes - -## Output Format - -Refactoring analysis includes: -- **Priority Assessment**: What needs attention first and why -- **Decomposition Strategy**: Specific file/class/function splitting recommendations -- **Implementation Plan**: Step-by-step refactoring sequence -- **Line-Number References**: Exact locations for changes -- **Dependency Analysis**: Impact assessment and migration strategies -- **Risk Assessment**: Potential breaking changes and mitigation strategies - -## Advanced Features - -**Adaptive Thresholds:** -The tool adjusts size thresholds based on context: -- **Generated Code**: Higher tolerance for large files -- **Algorithm Implementation**: Recognizes when size is justified -- **Legacy Systems**: More conservative recommendations -- **Test Files**: Different standards 
for test vs production code - -**Cross-File Refactoring:** -Analyzes multiple files together to understand: -- Shared responsibilities that could be extracted -- Dependencies that complicate refactoring -- Opportunities for new abstractions -- Impact of changes across the codebase - -## When to Use Refactor vs Other Tools - -- **Use `refactor`** for: Structural improvements, decomposition, modernization, code organization -- **Use `codereview`** for: Finding bugs and security issues with immediate fixes -- **Use `analyze`** for: Understanding code without making change recommendations -- **Use `debug`** for: Solving specific runtime issues rather than structural problems \ No newline at end of file diff --git a/docs/tools/secaudit.md.bak b/docs/tools/secaudit.md.bak deleted file mode 100644 index 280452f..0000000 --- a/docs/tools/secaudit.md.bak +++ /dev/null @@ -1,222 +0,0 @@ -# Secaudit Tool - Comprehensive Security Audit - -**Systematic OWASP-based security assessment with compliance evaluation through workflow-driven investigation** - -The `secaudit` tool provides comprehensive security auditing capabilities with systematic OWASP Top 10 assessment, compliance framework evaluation, -and threat modeling. This workflow tool guides Claude through methodical security investigation steps with forced pauses between each step to ensure -thorough vulnerability assessment, security pattern analysis, and compliance verification before providing expert analysis. - -**Important**: AI models may not identify all security vulnerabilities. Always perform additional manual security reviews, -penetration testing, and verification. - -## How the Workflow Works - -The secaudit tool implements a **structured 6-step security workflow** that ensures comprehensive security assessment: - -**Investigation Phase (Claude-Led):** -1. **Step 1**: Security Scope Analysis - Claude identifies application type, tech stack, attack surface, and compliance requirements -2. 
**Step 2**: Authentication & Authorization Assessment - Analyzes auth mechanisms, session management, and access controls -3. **Step 3**: Input Validation & Data Security - Reviews input handling, data protection, and injection vulnerabilities -4. **Step 4**: OWASP Top 10 (2021) Review - Systematic assessment of all OWASP categories with specific findings -5. **Step 5**: Dependencies & Infrastructure - Security analysis of third-party components and deployment configurations -6. **Step 6**: Compliance & Risk Assessment - Evaluation against specified compliance frameworks and risk prioritization - -**Expert Analysis Phase:** -After Claude completes the investigation (unless confidence is **certain**): -- Complete security assessment summary with all vulnerabilities and evidence -- OWASP Top 10 systematic findings with severity classifications -- Compliance framework gap analysis and remediation recommendations -- Risk-prioritized remediation roadmap based on threat level and business impact - -**Special Note**: If you want Claude to perform the entire security audit without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently. - -## Model Recommendation - -This tool particularly benefits from Gemini Pro or O3 models due to their advanced reasoning capabilities and large context windows, which allow comprehensive security analysis across complex codebases. Security audits require understanding subtle attack vectors and cross-component interactions that benefit from deeper analytical capabilities. 
- -## Example Prompts - -``` -Perform a secaudit with o3 on this e-commerce web application focusing on payment processing security and PCI DSS compliance -``` - -``` -Use secaudit to conduct a comprehensive security audit of the authentication system, threat level high, focus on enterprise -security patterns and HIPAA compliance -``` - -## Pro Tip: Multi-Scope Security Assessment - -**You can run parallel security audits for different application components:** - -``` -Start separate sub-tasks, in one start a secaudit for critical payment processing components focusing on PCI DSS with gemini pro, -and in the other for user management focusing on OWASP authentication vulnerabilities with o4-mini, then combine into a unified -security remediation plan using planner -``` - -## Key Features - -- **OWASP Top 10 (2021) systematic assessment** with specific vulnerability identification -- **Multi-compliance framework support**: SOC2, PCI DSS, HIPAA, GDPR, FedRAMP -- **Threat-level aware analysis**: Critical, high, medium, low threat classifications -- **Technology-specific security patterns**: Web apps, APIs, mobile, cloud, enterprise systems -- **Risk-based prioritization**: Business impact and exploitability assessment -- **Audit focus customization**: Comprehensive, authentication, data protection, infrastructure -- **Image support**: Security analysis from architecture diagrams, network topology, or security findings -- **Multi-file security analysis**: Cross-component vulnerability identification -- **Compliance gap analysis**: Specific framework requirements with remediation guidance -- **Attack surface mapping**: Entry points, data flows, and privilege boundaries -- **Security control effectiveness**: Evaluation of existing security measures - -## Tool Parameters - -**Workflow Investigation Parameters (used during step-by-step process):** -- `step`: Current security investigation step description (required for each step) -- `step_number`: Current step number in 
audit sequence (required) -- `total_steps`: Estimated total investigation steps (typically 4-6, adjustable) -- `next_step_required`: Whether another investigation step is needed -- `findings`: Security discoveries and evidence collected in this step (required) -- `files_checked`: All files examined during security investigation -- `relevant_files`: Files directly relevant to security assessment (required in step 1) -- `relevant_context`: Methods/functions/classes central to security findings -- `issues_found`: Security issues identified with severity levels -- `confidence`: Confidence level in security assessment completeness (exploring/low/medium/high/certain) -- `backtrack_from_step`: Step number to backtrack from (for revisions) -- `images`: Architecture diagrams, security documentation, or visual references - -**Initial Security Configuration (used in step 1):** -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `security_scope`: Application context, technology stack, and security boundary definition (required) -- `threat_level`: low|medium|high|critical (default: medium) - determines assessment depth and urgency -- `compliance_requirements`: List of compliance frameworks to assess against (e.g., ["PCI DSS", "SOC2"]) -- `audit_focus`: comprehensive|authentication|data_protection|infrastructure|api_security (default: comprehensive) -- `severity_filter`: critical|high|medium|low|all (default: all) -- `temperature`: Temperature for analytical consistency (0-1, default 0.2) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_websearch`: Enable web search for security best practices and vulnerability databases (default: true) -- `use_assistant_model`: Whether to use expert security analysis phase (default: true) -- `continuation_id`: Continue previous security audit discussions - -## Audit Focus Areas - -**Comprehensive (default):** -- Full OWASP Top 10 assessment with all security domains -- 
Authentication, authorization, data protection, infrastructure -- Best for complete security posture evaluation - -**Authentication:** -- Focused on identity, access management, and session security -- Multi-factor authentication, password policies, privilege escalation -- Best for user management and access control systems - -**Data Protection:** -- Encryption, data handling, privacy controls, and compliance -- Input validation, output encoding, data classification -- Best for applications handling sensitive or regulated data - -**Infrastructure:** -- Deployment security, configuration management, dependency security -- Network security, container security, cloud security posture -- Best for DevOps and infrastructure security assessment - -**API Security:** -- REST/GraphQL security, rate limiting, API authentication -- Input validation, authorization patterns, API gateway security -- Best for API-first applications and microservices - -## Threat Levels - -Security assessment depth and urgency: - -- **πŸ”΄ CRITICAL**: Mission-critical systems, high-value targets, regulatory requirements -- **🟠 HIGH**: Business-critical applications, customer data handling, financial systems -- **🟑 MEDIUM**: Standard business applications, internal tools, moderate risk exposure -- **🟒 LOW**: Development environments, non-sensitive applications, proof-of-concepts - -## Compliance Frameworks - -Supported compliance assessments: - -- **SOC2**: Security, availability, processing integrity, confidentiality, privacy -- **PCI DSS**: Payment card industry data security standards -- **HIPAA**: Healthcare information privacy and security -- **GDPR**: General data protection regulation compliance -- **FedRAMP**: Federal risk and authorization management program -- **ISO27001**: Information security management systems -- **NIST**: Cybersecurity framework controls - -## OWASP Top 10 (2021) Coverage - -Systematic assessment includes: - -1.
**A01 Broken Access Control**: Authorization flaws and privilege escalation -2. **A02 Cryptographic Failures**: Encryption and data protection issues -3. **A03 Injection**: SQL, NoSQL, OS, and LDAP injection vulnerabilities -4. **A04 Insecure Design**: Security design flaws and threat modeling gaps -5. **A05 Security Misconfiguration**: Configuration and hardening issues -6. **A06 Vulnerable Components**: Third-party and dependency vulnerabilities -7. **A07 Identification & Authentication Failures**: Authentication bypass and session management -8. **A08 Software & Data Integrity Failures**: Supply chain and integrity violations -9. **A09 Security Logging & Monitoring Failures**: Detection and response capabilities -10. **A10 Server-Side Request Forgery**: SSRF and related vulnerabilities - -## Usage Examples - -**Comprehensive E-commerce Security Audit:** -``` -"Conduct a comprehensive secaudit with gemini pro for our Node.js e-commerce platform, threat level high, -compliance requirements PCI DSS and SOC2, focus on payment processing security" -``` - -**Authentication System Security Review:** -``` -"Use o3 to perform secaudit on authentication microservice, focus on authentication, -threat level critical, check for OWASP A07 and multi-factor authentication implementation" -``` - -**API Security Assessment:** -``` -"Secaudit our REST API gateway with gemini pro, audit focus api_security, -compliance requirements GDPR, threat level medium" -``` - -**Infrastructure Security Review:** -``` -"Perform secaudit on Kubernetes deployment manifests with o3, focus infrastructure, -threat level high, include container security and network policies" -``` - -**Quick Security Scan:** -``` -"Fast secaudit of user registration flow with flash, focus authentication, -severity filter critical and high only" -``` - -## Best Practices - -- **Define clear security scope**: Specify application type, tech stack, and security boundaries -- **Set appropriate threat levels**: Match 
assessment depth to risk exposure and criticality -- **Include compliance requirements**: Specify relevant frameworks for regulatory alignment -- **Use parallel audits**: Run separate assessments for different components or compliance frameworks -- **Provide architectural context**: Include system diagrams, data flow documentation, or deployment topology -- **Focus audit scope**: Use audit_focus for targeted assessments of specific security domains -- **Follow up on findings**: Use continuation feature to dive deeper into specific vulnerabilities - -## Output Format - -Security audits include: -- **Executive Security Summary**: Overall security posture and critical findings -- **OWASP Top 10 Assessment**: Systematic review of each category with specific findings -- **Compliance Gap Analysis**: Framework-specific requirements and current compliance status -- **Risk-Prioritized Findings**: Vulnerabilities ordered by exploitability and business impact -- **Remediation Roadmap**: Phased approach to security improvements with quick wins -- **Security Architecture Recommendations**: Structural improvements for long-term security posture - -## When to Use Secaudit vs Other Tools - -- **Use `secaudit`** for: Comprehensive security assessment, compliance evaluation, OWASP-based vulnerability analysis -- **Use `codereview`** for: General code quality with some security considerations -- **Use `analyze`** for: Understanding security architecture without vulnerability assessment -- **Use `debug`** for: Investigating specific security incidents or exploit attempts -- **Use `precommit`** for: Pre-deployment security validation and change impact assessment \ No newline at end of file diff --git a/docs/tools/testgen.md.bak b/docs/tools/testgen.md.bak deleted file mode 100644 index 0d74a98..0000000 --- a/docs/tools/testgen.md.bak +++ /dev/null @@ -1,220 +0,0 @@ -# TestGen Tool - Comprehensive Test Generation - -**Generates thorough test suites with edge case coverage through 
workflow-driven investigation** - -The `testgen` tool creates comprehensive test suites by analyzing your code paths, understanding intricate dependencies, and identifying realistic edge cases and failure scenarios that need test coverage. This workflow tool guides Claude through systematic investigation of code functionality, critical paths, edge cases, and integration points across multiple steps before generating comprehensive tests with realistic failure mode analysis. - -## Thinking Mode - -**Default is `medium` (8,192 tokens) for extended thinking models.** Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage. - -## How the Workflow Works - -The testgen tool implements a **structured workflow** for comprehensive test generation: - -**Investigation Phase (Claude-Led):** -1. **Step 1**: Claude describes the test generation plan and begins analyzing code functionality -2. **Step 2+**: Claude examines critical paths, edge cases, error handling, and integration points -3. **Throughout**: Claude tracks findings, test scenarios, and coverage gaps -4. **Completion**: Once investigation is thorough, Claude signals completion - -**Test Generation Phase:** -After Claude completes the investigation: -- Complete test scenario catalog with all edge cases -- Framework-specific test generation -- Realistic failure mode coverage -- Final test suite with comprehensive coverage - -This workflow ensures methodical analysis before test generation, resulting in more thorough and valuable test suites. - -## Model Recommendation - -Test generation excels with extended reasoning models like Gemini Pro or O3, which can analyze complex code paths, understand intricate dependencies, and identify comprehensive edge cases. 
The combination of large context windows and advanced reasoning enables generation of thorough test suites that cover realistic failure scenarios and integration points that shorter-context models might overlook. - -## Example Prompts - -**Basic Usage:** -``` -"Use zen to generate tests for User.login() method" -"Generate comprehensive tests for the sorting method in src/new_sort.py using o3" -"Create tests for edge cases not already covered in our tests using gemini pro" -``` - -## Key Features - -- **Multi-agent workflow** analyzing code paths and identifying realistic failure modes -- **Generates framework-specific tests** following project conventions -- **Supports test pattern following** when examples are provided -- **Dynamic token allocation** (25% for test examples, 75% for main code) -- **Prioritizes smallest test files** for pattern detection -- **Can reference existing test files**: `"Generate tests following patterns from tests/unit/"` -- **Specific code coverage** - target specific functions/classes rather than testing everything -- **Image support**: Test UI components, analyze visual requirements: `"Generate tests for this login form using the UI mockup screenshot"` -- **Edge case identification**: Systematic discovery of boundary conditions and error states -- **Realistic failure mode analysis**: Understanding what can actually go wrong in production -- **Integration test support**: Tests that cover component interactions and system boundaries - -## Tool Parameters - -**Workflow Investigation Parameters (used during step-by-step process):** -- `step`: Current investigation step description (required for each step) -- `step_number`: Current step number in test generation sequence (required) -- `total_steps`: Estimated total investigation steps (adjustable) -- `next_step_required`: Whether another investigation step is needed -- `findings`: Discoveries about functionality and test scenarios (required) -- `files_checked`: All files examined during 
investigation -- `relevant_files`: Files directly needing tests (required in step 1) -- `relevant_context`: Methods/functions/classes requiring test coverage -- `confidence`: Confidence level in test plan completeness (exploring/low/medium/high/certain) -- `backtrack_from_step`: Step number to backtrack from (for revisions) - -**Initial Configuration (used in step 1):** -- `prompt`: Description of what to test, testing objectives, and specific scope/focus areas (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `test_examples`: Optional existing test files or directories to use as style/pattern reference (absolute paths) -- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) -- `use_assistant_model`: Whether to use expert test generation phase (default: true, set to false to use Claude only) - -## Usage Examples - -**Method-Specific Tests:** -``` -"Generate tests for User.login() method covering authentication success, failure, and edge cases" -``` - -**Class Testing:** -``` -"Use pro to generate comprehensive tests for PaymentProcessor class with max thinking mode" -``` - -**Following Existing Patterns:** -``` -"Generate tests for new authentication module following patterns from tests/unit/auth/" -``` - -**UI Component Testing:** -``` -"Generate tests for this login form component using the UI mockup screenshot" -``` - -**Algorithm Testing:** -``` -"Create thorough tests for the sorting algorithm in utils/sort.py, focus on edge cases and performance" -``` - -**Integration Testing:** -``` -"Generate integration tests for the payment processing pipeline from order creation to completion" -``` - -## Test Generation Strategy - -**Code Path Analysis:** -- Identifies all execution paths through the code -- Maps conditional branches and loops -- Discovers error handling paths -- Analyzes state transitions - -**Edge Case Discovery:** -- Boundary value analysis (empty, null, max values) -- Invalid input 
scenarios -- Race conditions and timing issues -- Resource exhaustion cases - -**Failure Mode Analysis:** -- External dependency failures -- Network and I/O errors -- Authentication and authorization failures -- Data corruption scenarios - -**Framework Detection:** -The tool automatically detects and generates tests for: -- **Python**: pytest, unittest, nose2 -- **JavaScript**: Jest, Mocha, Jasmine, Vitest -- **Java**: JUnit 4/5, TestNG, Mockito -- **C#**: NUnit, MSTest, xUnit -- **Swift**: XCTest -- **Go**: testing package -- **And more**: Adapts to project conventions - -## Test Categories Generated - -**Unit Tests:** -- Function/method behavior validation -- Input/output verification -- Error condition handling -- State change verification - -**Integration Tests:** -- Component interaction testing -- API endpoint validation -- Database integration -- External service mocking - -**Edge Case Tests:** -- Boundary conditions -- Invalid inputs -- Resource limits -- Concurrent access - -**Performance Tests:** -- Response time validation -- Memory usage checks -- Load handling -- Scalability verification - -## Best Practices - -- **Be specific about scope**: Target specific functions/classes rather than requesting tests for everything -- **Provide test examples**: Include existing test files for pattern consistency -- **Focus on critical paths**: Prioritize testing of business-critical functionality -- **Include visual context**: Screenshots or mockups for UI component testing -- **Describe testing objectives**: Explain what aspects are most important to test -- **Consider test maintenance**: Request readable, maintainable test code - -## Test Quality Features - -**Realistic Test Data:** -- Generates meaningful test data that represents real-world scenarios -- Avoids trivial test cases that don't add value -- Creates data that exercises actual business logic - -**Comprehensive Coverage:** -- Happy path scenarios -- Error conditions and exceptions -- Edge cases and 
boundary conditions -- Integration points and dependencies - -**Maintainable Code:** -- Clear test names that describe what's being tested -- Well-organized test structure -- Appropriate use of setup/teardown -- Minimal test data and mocking - -## Advanced Features - -**Pattern Following:** -When test examples are provided, the tool analyzes: -- Naming conventions and structure -- Assertion patterns and style -- Mocking and setup approaches -- Test data organization - -**Large Context Analysis:** -With models like Gemini Pro, the tool can: -- Analyze extensive codebases for comprehensive test coverage -- Understand complex interactions across multiple modules -- Generate integration tests that span multiple components - -**Visual Testing:** -For UI components and visual elements: -- Generate tests based on visual requirements -- Create accessibility testing scenarios -- Test responsive design behaviors - -## When to Use TestGen vs Other Tools - -- **Use `testgen`** for: Creating comprehensive test suites, filling test coverage gaps, testing new features -- **Use `debug`** for: Diagnosing specific test failures or runtime issues -- **Use `codereview`** for: Reviewing existing test quality and coverage -- **Use `analyze`** for: Understanding existing test structure without generating new tests \ No newline at end of file diff --git a/docs/tools/thinkdeep.md.bak b/docs/tools/thinkdeep.md.bak deleted file mode 100644 index 26d5322..0000000 --- a/docs/tools/thinkdeep.md.bak +++ /dev/null @@ -1,97 +0,0 @@ -# ThinkDeep Tool - Extended Reasoning Partner - -**Get a second opinion to augment Claude's own extended thinking** - -The `thinkdeep` tool provides extended reasoning capabilities, offering a second perspective to augment Claude's analysis. It's designed to challenge assumptions, find edge cases, and provide alternative approaches to complex problems. 
- -## Thinking Mode - -**Default is `high` (16,384 tokens) for deep analysis.** Claude will automatically choose the best mode based on complexity - use `low` for quick validations, `medium` for standard problems, `high` for complex issues (default), or `max` for extremely complex challenges requiring deepest analysis. - -## Example Prompt - -``` -Think deeper about my authentication design with pro using max thinking mode and brainstorm to come up -with the best architecture for my project -``` - -## Key Features - -- **Uses Gemini's specialized thinking models** for enhanced reasoning capabilities -- **Provides a second opinion** on Claude's analysis -- **Challenges assumptions** and identifies edge cases Claude might miss -- **Offers alternative perspectives** and approaches -- **Validates architectural decisions** and design patterns -- **File reference support**: `"Use gemini to think deeper about my API design with reference to api/routes.py"` -- **Image support**: Analyze architectural diagrams, flowcharts, design mockups: `"Think deeper about this system architecture diagram with gemini pro using max thinking mode"` -- **Enhanced Critical Evaluation (v2.10.0)**: After Gemini's analysis, Claude is prompted to critically evaluate the suggestions, consider context and constraints, identify risks, and synthesize a final recommendation - ensuring a balanced, well-considered solution -- **Web search capability**: When enabled (default: true), identifies areas where current documentation or community solutions would strengthen the analysis and suggests specific searches for Claude - -## Tool Parameters - -- `prompt`: Your current thinking/analysis to extend and validate (required) -- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default) -- `problem_context`: Additional context about the problem or goal -- `focus_areas`: Specific aspects to focus on (architecture, performance, security, etc.) 
-- `files`: Optional file paths or directories for additional context (absolute paths) -- `images`: Optional images for visual analysis (absolute paths) -- `temperature`: Temperature for creative thinking (0-1, default 0.7) -- `thinking_mode`: minimal|low|medium|high|max (default: high, Gemini only) -- `use_websearch`: Enable web search for documentation and insights (default: true) -- `continuation_id`: Continue previous conversations - -## Usage Examples - -**Architecture Design:** -``` -"Think deeper about my microservices authentication strategy with pro using max thinking mode" -``` - -**With File Context:** -``` -"Use gemini to think deeper about my API design with reference to api/routes.py and models/user.py" -``` - -**Visual Analysis:** -``` -"Think deeper about this system architecture diagram with gemini pro - identify potential bottlenecks" -``` - -**Problem Solving:** -``` -"I'm considering using GraphQL vs REST for my API. Think deeper about the trade-offs with o3 using high thinking mode" -``` - -**Code Review Enhancement:** -``` -"Think deeper about the security implications of this authentication code with pro" -``` - -## Best Practices - -- **Provide detailed context**: Share your current thinking, constraints, and objectives -- **Be specific about focus areas**: Mention what aspects need deeper analysis -- **Include relevant files**: Reference code, documentation, or configuration files -- **Use appropriate thinking modes**: Higher modes for complex problems, lower for quick validations -- **Leverage visual context**: Include diagrams or mockups for architectural discussions -- **Build on discussions**: Use continuation to extend previous analyses - -## Enhanced Critical Evaluation Process - -The `thinkdeep` tool includes a unique two-stage process: - -1. **Gemini's Analysis**: Extended reasoning with specialized thinking capabilities -2. 
**Claude's Critical Evaluation**: Claude reviews Gemini's suggestions, considers: - - Context and constraints of your specific situation - - Potential risks and implementation challenges - - Trade-offs and alternatives - - Final synthesized recommendation - -This ensures you get both deep reasoning and practical, context-aware advice. - -## When to Use ThinkDeep vs Other Tools - -- **Use `thinkdeep`** for: Extending specific analysis, challenging assumptions, architectural decisions -- **Use `chat`** for: Open-ended brainstorming and general discussions -- **Use `analyze`** for: Understanding existing code without extending analysis -- **Use `codereview`** for: Finding specific bugs and security issues \ No newline at end of file From c2c6d5aa4dd60c3ad3576d442db161d432a2fb80 Mon Sep 17 00:00:00 2001 From: Sven Lito Date: Tue, 12 Aug 2025 15:08:12 +0700 Subject: [PATCH 4/6] Update README.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index f955d34..e162d36 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,8 @@ - **[OpenRouter](https://openrouter.ai/)** - Access multiple models with one API - **[Gemini](https://makersuite.google.com/app/apikey)** - Google's latest models - **[OpenAI](https://platform.openai.com/api-keys)** - O3, GPT-5 series +- **[X.AI](https://console.x.ai/)** - Grok models +- **[DIAL](https://dialx.ai/)** - Vendor-agnostic model access - **[Ollama](https://ollama.ai/)** - Local models (free) **2. 
Install** (choose one): From b9a4db1352c20eb7df4735e36c044728b2c46d7b Mon Sep 17 00:00:00 2001 From: Sven Lito Date: Tue, 12 Aug 2025 15:08:21 +0700 Subject: [PATCH 5/6] Update docs/advanced-usage.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- docs/advanced-usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/advanced-usage.md b/docs/advanced-usage.md index fcba473..f5644db 100644 --- a/docs/advanced-usage.md +++ b/docs/advanced-usage.md @@ -57,7 +57,7 @@ cloud models (expensive/powerful) AND local models (free/private) in the same co - **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context - **Pro 2.5**: Deep analysis with max 32K thinking tokens - **Flash 2.5**: Ultra-fast with thinking support (24K thinking tokens) - - **Flash 2.0**: Latest fast model with audio/video input (24K thinking tokens) + - **Flash 2.0**: Latest fast model with audio/video input (24K thinking tokens) - **Flash Lite 2.0**: Text-only lightweight model (no thinking support) - **O3/O4 Models**: Excellent reasoning, systematic analysis, 200K context - **GPT-4.1**: Extended context window (1M tokens), general capabilities From 392c22ef06bfc16a3b9756e035f90bc01232a851 Mon Sep 17 00:00:00 2001 From: Sven Lito Date: Tue, 12 Aug 2025 15:12:16 +0700 Subject: [PATCH 6/6] docs: add PATH environment variable to uvx configuration Address gemini-code-assist bot feedback by adding explicit PATH environment variable to ensure uvx-executed scripts can find necessary executables regardless of inherited environment. Maintains simplified 'command: uvx' while adding execution robustness. 
--- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e162d36..6d37ad0 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,10 @@ "zen": { "command": "uvx", "args": ["--from", "git+https://github.com/BeehiveInnovations/zen-mcp-server.git", "zen-mcp-server"], - "env": { "GEMINI_API_KEY": "your-key-here" } + "env": { + "PATH": "/usr/local/bin:/usr/bin:/bin:/opt/homebrew/bin:~/.local/bin", + "GEMINI_API_KEY": "your-key-here" + } } } }