diff --git a/docs/advanced-usage.md b/docs/advanced-usage.md
index 63856da..fcba473 100644
--- a/docs/advanced-usage.md
+++ b/docs/advanced-usage.md
@@ -34,11 +34,16 @@ Regardless of your default configuration, you can specify models per request:
| Model | Provider | Context | Strengths | Auto Mode Usage |
|-------|----------|---------|-----------|------------------|
| **`pro`** (Gemini 2.5 Pro) | Google | 1M tokens | Extended thinking (up to 32K tokens), deep analysis | Complex architecture, security reviews, deep debugging |
-| **`flash`** (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis |
+| **`flash`** (Gemini 2.5 Flash) | Google | 1M tokens | Ultra-fast responses with thinking | Quick checks, formatting, simple analysis |
+| **`flash-2.0`** (Gemini 2.0 Flash) | Google | 1M tokens | Fast model with audio/video support | Quick analysis with multimodal input |
+| **`flashlite`** (Gemini 2.0 Flash Lite) | Google | 1M tokens | Lightweight text-only model | Fast text processing without vision |
| **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
| **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
| **`gpt4.1`** | OpenAI | 1M tokens | Latest GPT-4 with extended context | Large codebase analysis, comprehensive reviews |
+| **`gpt5`** (GPT-5) | OpenAI | 400K tokens | Advanced model with reasoning support | Complex problems requiring advanced reasoning |
+| **`gpt5-mini`** (GPT-5 Mini) | OpenAI | 400K tokens | Efficient variant with reasoning | Balanced performance and capability |
+| **`gpt5-nano`** (GPT-5 Nano) | OpenAI | 400K tokens | Fastest, cheapest GPT-5 variant | Summarization and classification tasks |
| **`grok-4-latest`** | X.AI | 256K tokens | Latest flagship model with reasoning, vision | Complex analysis, reasoning tasks |
| **`grok-3`** | X.AI | 131K tokens | Advanced reasoning model | Deep analysis, complex problems |
| **`grok-3-fast`** | X.AI | 131K tokens | Higher performance variant | Fast responses with reasoning |
@@ -50,8 +55,16 @@ cloud models (expensive/powerful) AND local models (free/private) in the same co
**Model Capabilities:**
- **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context
-- **O3 Models**: Excellent reasoning, systematic analysis, 200K context
+ - **Pro 2.5**: Deep analysis with max 32K thinking tokens
+ - **Flash 2.5**: Ultra-fast with thinking support (24K thinking tokens)
+ - **Flash 2.0**: Fast model with audio/video input (24K thinking tokens)
+ - **Flash Lite 2.0**: Text-only lightweight model (no thinking support)
+- **O3/O4 Models**: Excellent reasoning, systematic analysis, 200K context
- **GPT-4.1**: Extended context window (1M tokens), general capabilities
+- **GPT-5 Series**: Advanced reasoning models, 400K context
+ - **GPT-5**: Full-featured with reasoning support and vision
+ - **GPT-5 Mini**: Balanced efficiency and capability
+ - **GPT-5 Nano**: Optimized for fast, low-cost tasks
- **Grok-4**: Extended thinking support, vision capabilities, 256K context
- **Grok-3 Models**: Advanced reasoning, 131K context
@@ -148,7 +161,7 @@ All tools that work with files support **both individual files and entire direct
**`analyze`** - Analyze files or directories
- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -163,7 +176,7 @@ All tools that work with files support **both individual files and entire direct
**`codereview`** - Review code files or directories
- `files`: List of file paths or directories (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
@@ -179,7 +192,7 @@ All tools that work with files support **both individual files and entire direct
**`debug`** - Debug with file context
- `error_description`: Description of the issue (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `error_context`: Stack trace or logs
- `files`: Files or directories related to the issue
- `runtime_info`: Environment details
@@ -195,7 +208,7 @@ All tools that work with files support **both individual files and entire direct
**`thinkdeep`** - Extended analysis with file context
- `current_analysis`: Your current thinking (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `files`: Files or directories for context
@@ -211,7 +224,7 @@ All tools that work with files support **both individual files and entire direct
**`testgen`** - Comprehensive test generation with edge case coverage
- `files`: Code files or directories to generate tests for (required)
- `prompt`: Description of what to test, testing objectives, and scope (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `test_examples`: Optional existing test files as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -226,7 +239,7 @@ All tools that work with files support **both individual files and entire direct
- `files`: Code files or directories to analyze for refactoring opportunities (required)
- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
- `refactor_type`: codesmells|decompose|modernize|organization (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
- `style_guide_examples`: Optional existing code files to use as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
diff --git a/docs/tools/analyze.md b/docs/tools/analyze.md
index 618a0be..af2c3a8 100644
--- a/docs/tools/analyze.md
+++ b/docs/tools/analyze.md
@@ -65,7 +65,7 @@ This workflow ensures methodical analysis before expert insights, resulting in d
**Initial Configuration (used in step 1):**
- `prompt`: What to analyze or look for (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `analysis_type`: architecture|performance|security|quality|general (default: general)
- `output_format`: summary|detailed|actionable (default: detailed)
- `temperature`: Temperature for analysis (0-1, default 0.2)
diff --git a/docs/tools/analyze.md.bak b/docs/tools/analyze.md.bak
new file mode 100644
index 0000000..618a0be
--- /dev/null
+++ b/docs/tools/analyze.md.bak
@@ -0,0 +1,199 @@
+# Analyze Tool - Smart File Analysis
+
+**General-purpose code understanding and exploration through workflow-driven investigation**
+
+The `analyze` tool provides comprehensive code analysis and understanding capabilities, helping you explore codebases, understand architecture, and identify patterns across files and directories. This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis.
+
+## Thinking Mode
+
+**Default is `medium` (8,192 tokens).** Use `high` for architecture analysis (comprehensive insights worth the cost) or `low` for quick file overviews (save ~6k tokens).
+
+## How the Workflow Works
+
+The analyze tool implements a **structured workflow** for thorough code understanding:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the analysis plan and begins examining code structure
+2. **Step 2+**: Claude investigates architecture, patterns, dependencies, and design decisions
+3. **Throughout**: Claude tracks findings, relevant files, insights, and confidence levels
+4. **Completion**: Once analysis is comprehensive, Claude signals completion
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete analysis summary with all findings
+- Architectural insights and pattern identification
+- Strategic improvement recommendations
+- Final expert assessment based on investigation
+
+This workflow ensures methodical analysis before expert insights, resulting in deeper understanding and more valuable recommendations.
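
As a rough sketch, the decision of whether the expert analysis phase runs can be modeled as a simple predicate over the confidence level (the value names follow the `confidence` parameter documented below; the function name is illustrative, not part of the tool's actual API):

```python
# Confidence levels in ascending order, matching the `confidence` parameter.
CONFIDENCE_LEVELS = ["exploring", "low", "medium", "high", "certain"]

def needs_expert_analysis(confidence: str) -> bool:
    """Illustrative only: the expert analysis phase runs unless confidence is 'certain'."""
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence level: {confidence!r}")
    return confidence != "certain"
```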
+
+## Example Prompts
+
+**Basic Usage:**
+```
+"Use gemini to analyze main.py to understand how it works"
+"Get gemini to do an architecture analysis of the src/ directory"
+```
+
+## Key Features
+
+- **Analyzes single files or entire directories** with intelligent file filtering
+- **Supports specialized analysis types**: architecture, performance, security, quality, general
+- **Uses file paths (not content) for clean terminal output** while processing full content
+- **Can identify patterns, anti-patterns, and refactoring opportunities**
+- **Large codebase support**: Handle massive codebases with 1M token context models
+- **Cross-file relationship mapping**: Understand dependencies and interactions
+- **Architecture visualization**: Describe system structure and component relationships
+- **Image support**: Analyze architecture diagrams, UML charts, flowcharts: `"Analyze this system diagram with gemini to understand the data flow and identify bottlenecks"`
+- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices
+
+## Tool Parameters
+
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in analysis sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and insights collected in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly relevant to the analysis (required in step 1)
+- `relevant_context`: Methods/functions/classes central to analysis findings
+- `issues_found`: Issues or concerns identified with severity levels
+- `confidence`: Confidence level in analysis completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `images`: Visual references for analysis context
+
+**Initial Configuration (used in step 1):**
+- `prompt`: What to analyze or look for (required)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `analysis_type`: architecture|performance|security|quality|general (default: general)
+- `output_format`: summary|detailed|actionable (default: detailed)
+- `temperature`: Temperature for analysis (0-1, default 0.2)
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_websearch`: Enable web search for documentation and best practices (default: true)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
+- `continuation_id`: Continue previous analysis sessions
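
A hypothetical step-1 argument payload combining the workflow and configuration parameters above might look like the following (the concrete paths and values are invented for illustration, not taken from a real session):

```python
# Hypothetical step-1 arguments for the analyze tool; every key mirrors a
# parameter documented above, but the values are made-up examples.
analyze_step1 = {
    "step": "Survey src/ to map modules and entry points",
    "step_number": 1,
    "total_steps": 3,
    "next_step_required": True,
    "findings": "Initial pass: no findings recorded yet",
    "relevant_files": ["/abs/path/to/src/auth.py"],
    "prompt": "Understand the authentication flow",
    "model": "pro",
    "analysis_type": "architecture",
    "output_format": "detailed",
    "temperature": 0.2,
    "thinking_mode": "high",
    "use_websearch": True,
}
```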
+
+## Analysis Types
+
+**General Analysis (default):**
+- Overall code structure and organization
+- Key components and their responsibilities
+- Data flow and control flow
+- Design patterns and architectural decisions
+
+**Architecture Analysis:**
+- System-level design and component relationships
+- Module dependencies and coupling
+- Separation of concerns and layering
+- Scalability and maintainability considerations
+
+**Performance Analysis:**
+- Potential bottlenecks and optimization opportunities
+- Algorithmic complexity assessment
+- Memory usage patterns
+- I/O and database interaction efficiency
+
+**Security Analysis:**
+- Security patterns and potential vulnerabilities
+- Input validation and sanitization
+- Authentication and authorization mechanisms
+- Data protection and privacy considerations
+
+**Quality Analysis:**
+- Code quality metrics and maintainability
+- Testing coverage and patterns
+- Documentation completeness
+- Best practices adherence
+
+## Usage Examples
+
+**Single File Analysis:**
+```
+"Analyze user_controller.py to understand the authentication flow with gemini"
+```
+
+**Directory Architecture Analysis:**
+```
+"Use pro to analyze the src/ directory architecture and identify the main components"
+```
+
+**Performance-Focused Analysis:**
+```
+"Analyze backend/api/ for performance bottlenecks with o3, focus on database queries"
+```
+
+**Security Assessment:**
+```
+"Use gemini pro to analyze the authentication module for security patterns and potential issues"
+```
+
+**Visual + Code Analysis:**
+```
+"Analyze this system architecture diagram along with the src/core/ implementation to understand the data flow"
+```
+
+**Large Codebase Analysis:**
+```
+"Analyze the entire project structure with gemini pro to understand how all components work together"
+```
+
+## Output Formats
+
+**Summary Format:**
+- High-level overview with key findings
+- Main components and their purposes
+- Critical insights and recommendations
+
+**Detailed Format (default):**
+- Comprehensive analysis with specific examples
+- Code snippets and file references
+- Detailed explanations of patterns and structures
+
+**Actionable Format:**
+- Specific recommendations and next steps
+- Prioritized list of improvements
+- Implementation guidance and examples
+
+## Best Practices
+
+- **Be specific about goals**: Clearly state what you want to understand or discover
+- **Use appropriate analysis types**: Choose the type that matches your needs
+- **Include related files**: Analyze modules together for better context understanding
+- **Leverage large context models**: Use Gemini Pro for comprehensive codebase analysis
+- **Combine with visual context**: Include architecture diagrams or documentation
+- **Use continuation**: Build on previous analysis for deeper understanding
+
+## Advanced Features
+
+**Large Codebase Support:**
+With models like Gemini Pro (1M context), you can analyze extensive codebases:
+```
+"Analyze the entire microservices architecture across all service directories"
+```
+
+**Cross-File Relationship Mapping:**
+Understand how components interact across multiple files:
+```
+"Analyze the data processing pipeline across input/, processing/, and output/ directories"
+```
+
+**Pattern Recognition:**
+Identify design patterns, anti-patterns, and architectural decisions:
+```
+"Analyze src/ to identify all design patterns used and assess their implementation quality"
+```
+
+**Web Search Enhancement:**
+The tool can recommend searches for current best practices and documentation:
+```
+After analysis: "Recommended searches for Claude: 'FastAPI async best practices 2024', 'SQLAlchemy ORM performance optimization patterns'"
+```
+
+## When to Use Analyze vs Other Tools
+
+- **Use `analyze`** for: Understanding code structure, exploring unfamiliar codebases, architecture assessment
+- **Use `codereview`** for: Finding bugs and security issues with actionable fixes
+- **Use `debug`** for: Diagnosing specific runtime errors or performance problems
+- **Use `refactor`** for: Getting specific refactoring recommendations and implementation plans
+- **Use `chat`** for: Open-ended discussions about code without structured analysis
\ No newline at end of file
diff --git a/docs/tools/chat.md b/docs/tools/chat.md
index b7557eb..31bd8ef 100644
--- a/docs/tools/chat.md
+++ b/docs/tools/chat.md
@@ -33,7 +33,7 @@ and then debate with the other models to give me a final verdict
## Tool Parameters
- `prompt`: Your question or discussion topic (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `files`: Optional files for context (absolute paths)
- `images`: Optional images for visual context (absolute paths)
- `temperature`: Response creativity (0-1, default 0.5)
diff --git a/docs/tools/chat.md.bak b/docs/tools/chat.md.bak
new file mode 100644
index 0000000..b7557eb
--- /dev/null
+++ b/docs/tools/chat.md.bak
@@ -0,0 +1,84 @@
+# Chat Tool - General Development Chat & Collaborative Thinking
+
+**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively**
+
+The `chat` tool is your collaborative thinking partner for development conversations. It's designed to help you brainstorm, validate ideas, get second opinions, and explore alternatives in a conversational format.
+
+## Thinking Mode
+
+**Default is `medium` (8,192 tokens).** Use `low` for quick questions to save tokens, or `high` for complex discussions when thoroughness matters.
+
+## Example Prompt
+
+```
+Chat with zen and pick the best model for this job. I need to pick between Redis and Memcached for session storage
+and I need an expert opinion for the project I'm working on. Get a good idea of what the project does, pick one of the two options
+and then debate with the other models to give me a final verdict
+```
+
+## Key Features
+
+- **Collaborative thinking partner** for your analysis and planning
+- **Get second opinions** on your designs and approaches
+- **Brainstorm solutions** and explore alternatives together
+- **Validate your checklists** and implementation plans
+- **General development questions** and explanations
+- **Technology comparisons** and best practices
+- **Architecture and design discussions**
+- **File reference support**: `"Use gemini to explain this algorithm with context from algorithm.py"`
+- **Image support**: Include screenshots, diagrams, UI mockups for visual analysis: `"Chat with gemini about this error dialog screenshot to understand the user experience issue"`
+- **Dynamic collaboration**: Gemini can request additional files or context during the conversation if needed for a more thorough response
+- **Web search capability**: Analyzes when web searches would be helpful and recommends specific searches for Claude to perform, ensuring access to current documentation and best practices
+
+## Tool Parameters
+
+- `prompt`: Your question or discussion topic (required)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `files`: Optional files for context (absolute paths)
+- `images`: Optional images for visual context (absolute paths)
+- `temperature`: Response creativity (0-1, default 0.5)
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_websearch`: Enable web search for documentation and insights (default: true)
+- `continuation_id`: Continue previous conversations
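
To sketch how conversation continuation works, a follow-up request reuses the `continuation_id` returned with an earlier response (the payload shape and the ID value here are hypothetical):

```python
# First request: no continuation_id, so a new conversation thread is started.
first_request = {
    "prompt": "Compare Redis and Memcached for session storage",
    "model": "pro",
    "temperature": 0.5,
}

# Hypothetical ID returned alongside the first response.
returned_id = "conv-1234"

# Follow-up request: passing continuation_id continues the earlier discussion.
follow_up = {
    "prompt": "Given that, which one handles failover better?",
    "model": "pro",
    "continuation_id": returned_id,
}
```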
+
+## Usage Examples
+
+**Basic Development Chat:**
+```
+"Chat with zen about the best approach for user authentication in my React app"
+```
+
+**Technology Comparison:**
+```
+"Use flash to discuss whether PostgreSQL or MongoDB would be better for my e-commerce platform"
+```
+
+**Architecture Discussion:**
+```
+"Chat with pro about microservices vs monolith architecture for my project, consider scalability and team size"
+```
+
+**File Context Analysis:**
+```
+"Use gemini to chat about the current authentication implementation in auth.py and suggest improvements"
+```
+
+**Visual Analysis:**
+```
+"Chat with gemini about this UI mockup screenshot - is the user flow intuitive?"
+```
+
+## Best Practices
+
+- **Be specific about context**: Include relevant files or describe your project scope
+- **Ask for trade-offs**: Request pros/cons for better decision-making
+- **Use conversation continuation**: Build on previous discussions with `continuation_id`
+- **Leverage visual context**: Include diagrams, mockups, or screenshots when discussing UI/UX
+- **Request web searches**: Ask for current best practices or recent developments in technologies
+
+## When to Use Chat vs Other Tools
+
+- **Use `chat`** for: Open-ended discussions, brainstorming, getting second opinions, technology comparisons
+- **Use `thinkdeep`** for: Extending specific analysis, challenging assumptions, deeper reasoning
+- **Use `analyze`** for: Understanding existing code structure and patterns
+- **Use `debug`** for: Specific error diagnosis and troubleshooting
\ No newline at end of file
diff --git a/docs/tools/codereview.md b/docs/tools/codereview.md
index 9037cc2..bb095c5 100644
--- a/docs/tools/codereview.md
+++ b/docs/tools/codereview.md
@@ -80,7 +80,7 @@ The above prompt will simultaneously run two separate `codereview` tools with tw
**Initial Review Configuration (used in step 1):**
- `prompt`: User's summary of what the code does, expected behavior, constraints, and review objectives (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `review_type`: full|security|performance|quick (default: full)
- `focus_on`: Specific aspects to focus on (e.g., "security vulnerabilities", "performance bottlenecks")
- `standards`: Coding standards to enforce (e.g., "PEP8", "ESLint", "Google Style Guide")
diff --git a/docs/tools/codereview.md.bak b/docs/tools/codereview.md.bak
new file mode 100644
index 0000000..9037cc2
--- /dev/null
+++ b/docs/tools/codereview.md.bak
@@ -0,0 +1,174 @@
+# CodeReview Tool - Professional Code Review
+
+**Comprehensive code analysis with prioritized feedback through workflow-driven investigation**
+
+The `codereview` tool provides professional code review capabilities with actionable feedback, severity-based issue prioritization, and support for various review types from quick style checks to comprehensive security audits. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis.
+
+## Thinking Mode
+
+**Default is `medium` (8,192 tokens).** Use `high` for security-critical code (worth the extra tokens) or `low` for quick style checks (saves ~6k tokens).
+
+## How the Workflow Works
+
+The codereview tool implements a **structured workflow** that ensures thorough code examination:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the review plan and begins systematic analysis of code structure
+2. **Step 2+**: Claude examines code quality, security implications, performance concerns, and architectural patterns
+3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels
+4. **Completion**: Once review is comprehensive, Claude signals completion
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete review summary with all findings and evidence
+- Relevant files and code patterns identified
+- Issues categorized by severity levels
+- Final recommendations based on investigation
+
+**Special Note**: If you want Claude to perform the entire review without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
+
+## Model Recommendation
+
+This tool particularly benefits from Gemini Pro or Flash models due to their 1M context window, which allows comprehensive analysis of large codebases. Claude's context limitations make it challenging to see the "big picture" in complex projects - this is a concrete example where utilizing a secondary model with larger context provides significant value beyond just experimenting with different AI capabilities.
+
+## Example Prompts
+
+```
+Perform a codereview with gemini pro and review auth.py for security issues and potential vulnerabilities.
+I need an actionable plan but break it down into smaller quick-wins that we can implement and test rapidly
+```
+
+## Pro Tip: Multiple Parallel Reviews
+
+**You can start more than one codereview session with Claude:**
+
+```
+Start separate sub-tasks for codereview one with o3 finding critical issues and one with flash finding low priority issues
+and quick-wins and give me the final single combined review highlighting only the critical issues
+```
+
+The above prompt will simultaneously run two separate `codereview` tools with two separate models and combine the output into a single summary for you to consume.
+
+## Key Features
+
+- **Issues prioritized by severity** (🔴 CRITICAL → 🟢 LOW)
+- **Supports specialized reviews**: security, performance, quick
+- **Coding standards enforcement**: `"Use gemini to review src/ against PEP8 standards"`
+- **Severity filtering**: `"Get gemini to review auth/ - only report critical vulnerabilities"`
+- **Image support**: Review code from screenshots, error dialogs, or visual bug reports: `"Review this error screenshot and the related auth.py file for potential security issues"`
+- **Multi-file analysis**: Comprehensive review of entire directories or codebases
+- **Actionable feedback**: Specific recommendations with line numbers and code examples
+- **Language-specific expertise**: Tailored analysis for Python, JavaScript, Java, C#, Swift, and more
+- **Integration issue detection**: Identifies cross-file dependencies and architectural problems
+- **Security vulnerability scanning**: Focused on common security patterns and anti-patterns
+
+## Tool Parameters
+
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in review sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and evidence collected in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly relevant to the review (required in step 1)
+- `relevant_context`: Methods/functions/classes central to review findings
+- `issues_found`: Issues identified with severity levels
+- `confidence`: Confidence level in review completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `images`: Visual references for review context
+
+**Initial Review Configuration (used in step 1):**
+- `prompt`: User's summary of what the code does, expected behavior, constraints, and review objectives (required)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `review_type`: full|security|performance|quick (default: full)
+- `focus_on`: Specific aspects to focus on (e.g., "security vulnerabilities", "performance bottlenecks")
+- `standards`: Coding standards to enforce (e.g., "PEP8", "ESLint", "Google Style Guide")
+- `severity_filter`: critical|high|medium|low|all (default: all)
+- `temperature`: Temperature for consistency (0-1, default 0.2)
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_websearch`: Enable web search for best practices and documentation (default: true)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
+- `continuation_id`: Continue previous review discussions
+
+## Review Types
+
+**Full Review (default):**
+- Comprehensive analysis including bugs, security, performance, maintainability
+- Best for new features or significant code changes
+
+**Security Review:**
+- Focused on security vulnerabilities and attack vectors
+- Checks for common security anti-patterns
+- Best for authentication, authorization, data handling code
+
+**Performance Review:**
+- Analyzes performance bottlenecks and optimization opportunities
+- Memory usage, algorithmic complexity, resource management
+- Best for performance-critical code paths
+
+**Quick Review:**
+- Fast style and basic issue check
+- Lower token usage for rapid feedback
+- Best for code formatting and simple validation
+
+## Severity Levels
+
+Issues are categorized and prioritized:
+
+- **🔴 CRITICAL**: Security vulnerabilities, crashes, data corruption
+- **🟠 HIGH**: Logic errors, performance issues, reliability problems
+- **🟡 MEDIUM**: Code smells, maintainability issues, minor bugs
+- **🟢 LOW**: Style issues, documentation, minor improvements
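
The `severity_filter` parameter can be thought of as a threshold over this ordering. A minimal sketch of that idea (the function and variable names are illustrative, not the tool's actual implementation):

```python
# Severities from most to least urgent, matching the list above.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]

def passes_filter(issue_severity: str, severity_filter: str) -> bool:
    """Keep an issue if it is at least as severe as the filter ('all' keeps everything)."""
    if severity_filter == "all":
        return True
    return SEVERITY_ORDER.index(issue_severity) <= SEVERITY_ORDER.index(severity_filter)

issues = [
    {"title": "SQL injection in login", "severity": "critical"},
    {"title": "Inconsistent naming", "severity": "low"},
]
critical_only = [i for i in issues if passes_filter(i["severity"], "critical")]
```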
+
+## Usage Examples
+
+**Basic Security Review:**
+```
+"Review the authentication module in auth/ for security vulnerabilities with gemini pro"
+```
+
+**Performance-Focused Review:**
+```
+"Use o3 to review backend/api.py for performance issues, focus on database queries and caching"
+```
+
+**Quick Style Check:**
+```
+"Quick review of utils.py with flash, only report critical and high severity issues"
+```
+
+**Standards Enforcement:**
+```
+"Review src/ directory against PEP8 standards with gemini, focus on code formatting and structure"
+```
+
+**Visual Context Review:**
+```
+"Review this authentication code along with the error dialog screenshot to understand the security implications"
+```
+
+## Best Practices
+
+- **Provide context**: Describe what the code is supposed to do and any constraints
+- **Use appropriate review types**: Security for auth code, performance for critical paths
+- **Set severity filters**: Focus on critical issues for quick wins
+- **Include relevant files**: Review related modules together for better context
+- **Use parallel reviews**: Run multiple reviews with different models for comprehensive coverage
+- **Follow up on findings**: Use the continuation feature to discuss specific issues in detail
+
+## Output Format
+
+Reviews include:
+- **Executive Summary**: Overview of code quality and main concerns
+- **Detailed Findings**: Specific issues with severity levels, line numbers, and recommendations
+- **Quick Wins**: Easy-to-implement improvements with high impact
+- **Long-term Improvements**: Structural changes for better maintainability
+- **Security Considerations**: Specific security recommendations when relevant
+
+## When to Use CodeReview vs Other Tools
+
+- **Use `codereview`** for: Finding bugs, security issues, performance problems, code quality assessment
+- **Use `analyze`** for: Understanding code structure without finding issues
+- **Use `debug`** for: Diagnosing specific runtime errors or exceptions
+- **Use `refactor`** for: Identifying structural improvements and modernization opportunities
\ No newline at end of file
diff --git a/docs/tools/debug.md b/docs/tools/debug.md
index 6e7f20d..985708b 100644
--- a/docs/tools/debug.md
+++ b/docs/tools/debug.md
@@ -73,7 +73,7 @@ This structured approach ensures Claude performs methodical groundwork before ex
- `images`: Visual debugging materials (error screenshots, logs, etc.)
**Model Selection:**
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
- `use_websearch`: Enable web search for documentation and solutions (default: true)
- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
diff --git a/docs/tools/precommit.md b/docs/tools/precommit.md
index d70c1ab..85c9422 100644
--- a/docs/tools/precommit.md
+++ b/docs/tools/precommit.md
@@ -135,7 +135,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
**Initial Configuration (used in step 1):**
- `path`: Starting directory to search for repos (default: current directory, absolute path required)
- `prompt`: The original user request description for the changes (required for context)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `compare_to`: Compare against a branch/tag instead of local changes (optional)
- `severity_filter`: critical|high|medium|low|all (default: all)
- `include_staged`: Include staged changes in the review (default: true)
diff --git a/docs/tools/precommit.md.bak b/docs/tools/precommit.md.bak
new file mode 100644
index 0000000..d70c1ab
--- /dev/null
+++ b/docs/tools/precommit.md.bak
@@ -0,0 +1,247 @@
+# PreCommit Tool - Pre-Commit Validation
+
+**Comprehensive review of staged/unstaged git changes across multiple repositories through workflow-driven investigation**
+
+The `precommit` tool provides thorough validation of git changes before committing, ensuring code quality and requirement compliance while preventing regressions. This workflow tool guides Claude through systematic investigation of git status, diffs, and file modifications across multiple repositories and multiple steps before providing expert validation.
+
+## Thinking Mode
+
+**Default is `medium` (8,192 tokens).** Use `high` or `max` for critical releases when thorough validation justifies the token cost.
+
+## How the Workflow Works
+
+The precommit tool implements a **structured workflow** for comprehensive change validation:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the validation plan and begins analyzing git status across repositories
+2. **Step 2+**: Claude examines changes, diffs, dependencies, and potential impacts
+3. **Throughout**: Claude tracks findings, relevant files, issues, and confidence levels
+4. **Completion**: Once investigation is thorough, Claude signals completion
+
+**Expert Validation Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete summary of all changes and their context
+- Potential issues and regressions identified
+- Requirement compliance assessment
+- Final recommendations for safe commit
+
+**Special Note**: If you want Claude to perform the entire pre-commit validation without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
+
+## Model Recommendation
+
+Pre-commit validation benefits significantly from models with extended context windows like Gemini Pro, which can analyze extensive changesets across multiple files and repositories simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural inconsistencies, and integration issues that might be missed when reviewing changes in isolation due to context constraints.
+
+## Visual Example
+
+
+

+
+
+**Prompt Used:**
+```
+Now use gemini and perform a review and precommit and ensure original requirements are met, no duplication of code or
+logic, everything should work as expected
+```
+
+How beautiful is that? Claude used `precommit` twice and `codereview` once and actually found and fixed two critical errors before commit!
+
+### Real-world Example
+
+Here's an example where Claude was made to go through changes *it* was asked to make. Its recommendation
+to **Publish all Changes** came after a rigorous set of prompts where it examined the small set of changes
+from different angles (performance, bugs, anti-patterns, etc.). After confirming that unit tests were
+passing, it ran an initial review with O3, which came back clean: all good to go. O3 was happy, everything looked great.
+
+Then it shared the changes and relevant code with Gemini 2.5 Pro - the following is the outcome:
+
+```
+⏺ zen:precommit (MCP)(step: "FINAL VALIDATION COMPLETE. All changes have been thoroughly analyzed and are ready for
+                     commit. The modifications successfully fix 4 GitHub CI test failures with high-quality, defensive
+                     implementations that maintain architectural integrity and coding standards. No issues or regressions
+                     detected.", step_number: 4, total_steps: 4, next_step_required: false, findings: "COMMIT
+                     RECOMMENDATION: ✅ APPROVED\n\nSUMMARY OF CHANGES:\n✅ Fixed 4 GitHub CI test failures:\n -
+                     test_openrouter_with_restrictions (Mock iteration issue) \n - test_execute_with_branching
+                     (Missing metadata)\n - test_execute_with_revision (Missing metadata)\n - test_metadata_preserved
+                     (Provider type AttributeError)\n\n✅ IMPLEMENTATION QUALITY VERIFIED:\n - Defensive programming
+                     with proper type checking\n - Error handling with meaningful fallbacks\n - Backward
+                     compatibility maintained\n - Coding standards followed (no hasattr/getattr in changes)\n -
+                     Architectural patterns preserved\n\n✅ COMPREHENSIVE TESTING:\n - All originally failing tests
+                     pass\n - Affected functionality: 37/37 tests pass\n - Provider systems: 16/16 tests pass \n -
+                     Module imports work correctly\n - No regressions detected\n\n✅ CHANGE IMPACT ASSESSMENT:\n -
+                     Minimal, targeted changes\n - No security or performance concerns\n - Maintains API
+                     compatibility\n - Changes are well-documented with clear comments\n\nRECOMMENDATION: These changes
+                     are safe to commit and will resolve the GitHub CI failures without any negative impact.",
+  ⎿  {
+       "status": "calling_expert_analysis",
+       "step_number": 4,
+     … +169 lines (ctrl+r to expand)
+
+⏺ Pre-Commit Validation Complete
+
+  Based on comprehensive analysis using both O3 and Gemini 2.5 Pro models, I can provide the following assessment:
+
+  ✅ APPROVED WITH CRITICAL FIX REQUIRED
+
+  The changes successfully fix all 4 GitHub CI test failures and are well-implemented with defensive programming
+  patterns. However, the expert analysis identified one critical issue that must be addressed before commit:
+
+  🚨 [HIGH PRIORITY] Critical Issue Found
+
+  Inconsistent Provider Handling in tools/base.py
+  - Location: tools/base.py line ~1833 in _create_continuation_offer_response method
+  - Problem: The defensive provider handling pattern was applied in 2 locations but missed a 3rd location
+  - Impact: Risk of AttributeError: 'str' object has no attribute 'get_provider_type' in continuation scenarios
+  - Fix Required: Apply the same defensive pattern to the missed location
+```
+
+A subtle but critical issue spotted by Gemini that both Claude + O3 missed. This is the power of running these workflows with Zen.
+
+## Example Prompts
+
+```
+Use zen and perform a thorough precommit ensuring there aren't any new regressions or bugs introduced
+```
+
+## Key Features
+
+- **Recursive repository discovery** - finds all git repos including nested ones
+- **Validates changes against requirements** - ensures implementation matches intent
+- **Detects incomplete changes** - finds added functions never called, missing tests, etc.
+- **Multi-repo support** - reviews changes across multiple repositories in one go
+- **Configurable scope** - review staged, unstaged, or compare against branches
+- **Security focused** - catches exposed secrets, vulnerabilities in new code
+- **Smart truncation** - handles large diffs without exceeding context limits
+- **Cross-file dependency analysis** - identifies breaking changes across modules
+- **Test coverage validation** - ensures new code has appropriate test coverage
+- **Regression detection** - compares against requirements to prevent scope creep
+
+## Tool Parameters
+
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in validation sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and evidence collected in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly relevant to the changes
+- `relevant_context`: Methods/functions/classes affected by changes
+- `issues_found`: Issues identified with severity levels
+- `confidence`: Confidence level in validation completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `hypothesis`: Current assessment of change safety and completeness
+- `images`: Screenshots of requirements, design mockups for validation
+
+**Initial Configuration (used in step 1):**
+- `path`: Starting directory to search for repos (default: current directory, absolute path required)
+- `prompt`: The original user request description for the changes (required for context)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `compare_to`: Compare against a branch/tag instead of local changes (optional)
+- `severity_filter`: critical|high|medium|low|all (default: all)
+- `include_staged`: Include staged changes in the review (default: true)
+- `include_unstaged`: Include uncommitted changes in the review (default: true)
+- `focus_on`: Specific aspects to focus on
+- `temperature`: Temperature for response (default: 0.2)
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_websearch`: Enable web search for best practices (default: true)
+- `use_assistant_model`: Whether to use expert validation phase (default: true, set to false to use Claude only)
+- `continuation_id`: Continue previous validation discussions
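+
+For illustration, a step-1 `precommit` call combining the workflow and configuration parameters might look like the following (all field values here are hypothetical):
+
+```
+{
+  "step": "Describe the validation plan and inspect git status across discovered repositories",
+  "step_number": 1,
+  "total_steps": 4,
+  "next_step_required": true,
+  "findings": "Two repositories discovered; staged changes touch the auth and config modules",
+  "path": "/absolute/path/to/workspace",
+  "prompt": "Add rate limiting to the login endpoint",
+  "model": "pro",
+  "severity_filter": "all",
+  "include_staged": true,
+  "include_unstaged": true
+}
+```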
+
+## Usage Examples
+
+**Basic Pre-commit Validation:**
+```
+"Use zen precommit to validate my changes before committing"
+```
+
+**Security-Focused Validation:**
+```
+"Perform precommit security review with gemini pro on the authentication changes"
+```
+
+**Multi-Repository Validation:**
+```
+"Validate changes across all repositories in this workspace with o3"
+```
+
+**Against Specific Branch:**
+```
+"Compare current changes against main branch with precommit using gemini pro"
+```
+
+**With Requirements Context:**
+```
+"Precommit validation ensuring the new payment feature meets requirements in FEATURE_SPEC.md"
+```
+
+## Validation Scope
+
+The tool automatically discovers and validates:
+
+**Repository Discovery:**
+- Searches recursively for all `.git` directories
+- Handles nested repositories and submodules
+- Configurable search depth to prevent excessive recursion
+
+**Change Analysis:**
+- Staged changes (`git diff --cached`)
+- Unstaged changes (`git diff`)
+- Untracked files that should be added
+- Deleted files and their impact
+
+**Cross-Repository Impact:**
+- Shared dependencies between repositories
+- API contract changes that affect other repos
+- Configuration changes with system-wide impact
+
+## Validation Categories
+
+**Completeness Checks:**
+- New functions/classes have corresponding tests
+- Documentation updated for API changes
+- Configuration files updated as needed
+- Migration scripts for database changes
+
+**Quality Assurance:**
+- Code follows project standards
+- No obvious bugs or logical errors
+- Performance implications considered
+- Security vulnerabilities addressed
+
+**Requirement Compliance:**
+- Implementation matches original requirements
+- No scope creep or unauthorized changes
+- All acceptance criteria met
+- Edge cases properly handled
+
+**Integration Safety:**
+- Breaking changes properly documented
+- Backward compatibility maintained where required
+- Dependencies correctly updated
+- Environment-specific changes validated
+
+## Best Practices
+
+- **Provide clear context**: Include the original requirements or feature description
+- **Use for significant changes**: Most valuable for features, refactoring, or security updates
+- **Review before final commit**: Catch issues before they enter the main branch
+- **Include visual context**: Screenshots of requirements or expected behavior
+- **Focus validation scope**: Use `focus_on` parameter for specific concerns
+- **Multi-stage validation**: Use continuation for iterative improvement
+
+## Output Format
+
+Validation results include:
+- **Change Summary**: Overview of what was modified across repositories
+- **Requirement Compliance**: How well changes match original intent
+- **Completeness Assessment**: Missing tests, documentation, or related changes
+- **Security Review**: Potential vulnerabilities or exposed secrets
+- **Integration Impact**: Cross-repository and cross-module effects
+- **Recommendations**: Specific actions before committing
+
+## When to Use PreCommit vs Other Tools
+
+- **Use `precommit`** for: Validating changes before git commit, ensuring requirement compliance
+- **Use `codereview`** for: General code quality assessment without git context
+- **Use `debug`** for: Diagnosing specific runtime issues
+- **Use `analyze`** for: Understanding existing code without validation context
\ No newline at end of file
diff --git a/docs/tools/refactor.md b/docs/tools/refactor.md
index 6407a4a..d5fda94 100644
--- a/docs/tools/refactor.md
+++ b/docs/tools/refactor.md
@@ -103,7 +103,7 @@ This results in Claude first performing its own expert analysis, encouraging it
**Initial Configuration (used in step 1):**
- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
- `refactor_type`: codesmells|decompose|modernize|organization (default: codesmells)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
- `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths)
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
diff --git a/docs/tools/refactor.md.bak b/docs/tools/refactor.md.bak
new file mode 100644
index 0000000..6407a4a
--- /dev/null
+++ b/docs/tools/refactor.md.bak
@@ -0,0 +1,201 @@
+# Refactor Tool - Intelligent Code Refactoring
+
+**Comprehensive refactoring analysis with top-down decomposition strategy through workflow-driven investigation**
+
+The `refactor` tool provides intelligent code refactoring recommendations with a focus on top-down decomposition and systematic code improvement. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance.
+
+## Thinking Mode
+
+**Default is `medium` (8,192 tokens).** Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis.
+
+## How the Workflow Works
+
+The refactor tool implements a **structured workflow** for systematic refactoring analysis:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Claude describes the refactoring plan and begins analyzing code structure
+2. **Step 2+**: Claude examines code smells, decomposition opportunities, and modernization possibilities
+3. **Throughout**: Claude tracks findings, relevant files, refactoring opportunities, and confidence levels
+4. **Completion**: Once investigation is thorough, Claude signals completion
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **complete**):
+- Complete refactoring opportunity summary
+- Prioritized recommendations by impact
+- Precise implementation guidance with line numbers
+- Final expert assessment for refactoring strategy
+
+This workflow ensures methodical investigation before expert recommendations, resulting in more targeted and valuable refactoring plans.
+
+## Model Recommendation
+
+The refactor tool excels with models that have large context windows like Gemini Pro (1M tokens), which can analyze entire files and complex codebases simultaneously. This comprehensive view enables detection of cross-file dependencies, architectural patterns, and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context constraints.
+
+## Example Prompts
+
+```
+"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions"
+"Using zen's refactor decompose the all_in_one_sync_code.swift into maintainable extensions"
+```
+
+💡 **Example of a powerful prompt** to get the best out of both Claude + Flash's 1M context:
+```
+"First, think about how the authentication module works, find related classes and find
+ any code smells, then using zen's refactor ask flash to confirm your findings but ask
+ it to find additional code smells and any other quick-wins and then fix these issues"
+```
+
+This results in Claude first performing its own expert analysis, encouraging it to think critically and identify links within the project code. It then prompts `flash` to review the same code with a hint, preventing it from duplicating Claude's findings and encouraging it to explore other areas that Claude did *not* discover.
+
+## Key Features
+
+- **Intelligent prioritization** - Refuses to work on low-priority issues when code is unwieldy and requires decomposition first; helps identify poorly structured classes and files that need structural improvements before detail work
+- **Top-down decomposition strategy** - Analyzes file → class → function levels systematically
+- **Four refactor types**: `codesmells` (detect anti-patterns), `decompose` (break down large components), `modernize` (update language features), `organization` (improve structure)
+- **Precise line-number references** - Provides exact line numbers for Claude to implement changes
+- **Language-specific guidance** - Tailored suggestions for Python, JavaScript, Java, C#, Swift, and more
+- **Style guide integration** - Uses existing project files as pattern references
+- **Conservative approach** - Careful dependency analysis to prevent breaking changes
+- **Multi-file analysis** - Understands cross-file relationships and dependencies
+- **Priority sequencing** - Recommends implementation order for refactoring changes
+- **Image support**: Analyze code architecture diagrams, legacy system charts: `"Refactor this legacy module using gemini pro with the current architecture diagram"`
+
+## Refactor Types (Progressive Priority System)
+
+**1. `decompose` (CRITICAL PRIORITY)** - Context-aware decomposition with adaptive thresholds:
+
+**AUTOMATIC decomposition** (CRITICAL severity - blocks all other refactoring):
+- Files >15,000 LOC, Classes >3,000 LOC, Functions >500 LOC
+
+**EVALUATE decomposition** (contextual severity - intelligent assessment):
+- Files >5,000 LOC, Classes >1,000 LOC, Functions >150 LOC
+- Only recommends if genuinely improves maintainability
+- Respects legacy stability, domain complexity, performance constraints
+- Considers legitimate cases where size is justified (algorithms, state machines, generated code)
+
+**2. `codesmells`** - Applied only after decomposition is complete:
+- Detect long methods, complex conditionals, duplicate code, magic numbers, poor naming
+
+**3. `modernize`** - Applied only after decomposition is complete:
+- Update to modern language features (f-strings, async/await, etc.)
+
+**4. `organization`** - Applied only after decomposition is complete:
+- Improve logical grouping, separation of concerns, module structure
+
+**Progressive Analysis:** The tool performs a top-down check (worse → bad → better) and refuses to work on lower-priority issues if critical decomposition is needed first. It understands that massive files and classes create cognitive overload that must be addressed before detail work can be effective. Legacy code that cannot be safely decomposed is handled with higher tolerance thresholds and context-sensitive exemptions.
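+
+The progressive priority check described above can be sketched in pseudocode (the thresholds are the ones listed in this section; the tool's actual implementation may differ):
+
+```
+# Illustrative sketch only - not the tool's actual implementation
+def decomposition_severity(file_loc, class_loc, function_loc):
+    if file_loc > 15000 or class_loc > 3000 or function_loc > 500:
+        return "CRITICAL"   # automatic: blocks all other refactor types
+    if file_loc > 5000 or class_loc > 1000 or function_loc > 150:
+        return "EVALUATE"   # contextual: recommend only if it genuinely improves maintainability
+    return "OK"             # proceed to codesmells / modernize / organization
+```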
+
+## Tool Parameters
+
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current investigation step description (required for each step)
+- `step_number`: Current step number in refactoring sequence (required)
+- `total_steps`: Estimated total investigation steps (adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Discoveries and refactoring opportunities in this step (required)
+- `files_checked`: All files examined during investigation
+- `relevant_files`: Files directly needing refactoring (required in step 1)
+- `relevant_context`: Methods/functions/classes requiring refactoring
+- `issues_found`: Refactoring opportunities with severity and type
+- `confidence`: Confidence level in analysis completeness (exploring/incomplete/partial/complete)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `hypothesis`: Current assessment of refactoring priorities
+
+**Initial Configuration (used in step 1):**
+- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
+- `refactor_type`: codesmells|decompose|modernize|organization (default: codesmells)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
+- `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths)
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)
+- `continuation_id`: Thread continuation ID for multi-turn conversations
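+
+For illustration, a step-1 `refactor` call might combine these parameters as follows (all field values here are hypothetical):
+
+```
+{
+  "step": "Map the structure of the target module and identify oversized components",
+  "step_number": 1,
+  "total_steps": 3,
+  "next_step_required": true,
+  "findings": "UserController.java is ~4,800 LOC with mixed responsibilities",
+  "relevant_files": ["/absolute/path/to/UserController.java"],
+  "prompt": "Decompose UserController into focused, testable units",
+  "refactor_type": "decompose",
+  "model": "pro"
+}
+```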
+
+## Usage Examples
+
+**Decomposition Analysis:**
+```
+"Analyze UserController.java for decomposition opportunities - it's becoming unwieldy"
+```
+
+**Code Smell Detection:**
+```
+"Use gemini to identify code smells in the authentication module with high thinking mode"
+```
+
+**Modernization:**
+```
+"Modernize legacy_parser.py to use modern Python features following examples/modern_patterns.py"
+```
+
+**Organization Improvement:**
+```
+"Refactor src/utils/ for better organization, focus on maintainability and readability"
+```
+
+**Legacy System Refactoring:**
+```
+"Use pro with max thinking to analyze this 10,000-line legacy file for decomposition strategy"
+```
+
+## Refactoring Strategy
+
+**Top-Down Analysis:**
+1. **File Level**: Identify oversized files that need splitting
+2. **Class Level**: Find classes with too many responsibilities
+3. **Function Level**: Locate functions that are too complex or long
+4. **Code Quality**: Address smells, modernization, and organization
+
+**Context-Aware Decisions:**
+- **Domain Complexity**: Some domains legitimately require larger classes
+- **Performance Constraints**: Critical path code may resist decomposition
+- **Legacy Stability**: Old, working code may need gentler refactoring
+- **Test Coverage**: Refactoring recommendations consider testability
+
+**Breaking Change Prevention:**
+- Analyzes dependencies before suggesting splits
+- Recommends gradual migration strategies
+- Identifies public API impact
+- Suggests backward compatibility approaches
+
+## Best Practices
+
+- **Start with decomposition**: Address structural issues before cosmetic improvements
+- **Provide clear context**: Explain the codebase purpose and constraints
+- **Use appropriate refactor types**: Match the type to your primary concern
+- **Include style examples**: Reference existing well-structured code in your project
+- **Focus on high-impact areas**: Target the most problematic or frequently modified code
+- **Plan implementation order**: Follow the tool's sequencing recommendations
+- **Consider test coverage**: Ensure adequate tests before major structural changes
+
+## Output Format
+
+Refactoring analysis includes:
+- **Priority Assessment**: What needs attention first and why
+- **Decomposition Strategy**: Specific file/class/function splitting recommendations
+- **Implementation Plan**: Step-by-step refactoring sequence
+- **Line-Number References**: Exact locations for changes
+- **Dependency Analysis**: Impact assessment and migration strategies
+- **Risk Assessment**: Potential breaking changes and mitigation strategies
+
+## Advanced Features
+
+**Adaptive Thresholds:**
+The tool adjusts size thresholds based on context:
+- **Generated Code**: Higher tolerance for large files
+- **Algorithm Implementation**: Recognizes when size is justified
+- **Legacy Systems**: More conservative recommendations
+- **Test Files**: Different standards for test vs production code
+
+**Cross-File Refactoring:**
+Analyzes multiple files together to understand:
+- Shared responsibilities that could be extracted
+- Dependencies that complicate refactoring
+- Opportunities for new abstractions
+- Impact of changes across the codebase
+
+## When to Use Refactor vs Other Tools
+
+- **Use `refactor`** for: Structural improvements, decomposition, modernization, code organization
+- **Use `codereview`** for: Finding bugs and security issues with immediate fixes
+- **Use `analyze`** for: Understanding code without making change recommendations
+- **Use `debug`** for: Solving specific runtime issues rather than structural problems
\ No newline at end of file
diff --git a/docs/tools/secaudit.md b/docs/tools/secaudit.md
index 280452f..31ddd6e 100644
--- a/docs/tools/secaudit.md
+++ b/docs/tools/secaudit.md
@@ -86,7 +86,7 @@ security remediation plan using planner
- `images`: Architecture diagrams, security documentation, or visual references
**Initial Security Configuration (used in step 1):**
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `security_scope`: Application context, technology stack, and security boundary definition (required)
- `threat_level`: low|medium|high|critical (default: medium) - determines assessment depth and urgency
- `compliance_requirements`: List of compliance frameworks to assess against (e.g., ["PCI DSS", "SOC2"])
diff --git a/docs/tools/secaudit.md.bak b/docs/tools/secaudit.md.bak
new file mode 100644
index 0000000..280452f
--- /dev/null
+++ b/docs/tools/secaudit.md.bak
@@ -0,0 +1,222 @@
+# Secaudit Tool - Comprehensive Security Audit
+
+**Systematic OWASP-based security assessment with compliance evaluation through workflow-driven investigation**
+
+The `secaudit` tool provides comprehensive security auditing capabilities with systematic OWASP Top 10 assessment, compliance framework evaluation,
+and threat modeling. This workflow tool guides Claude through methodical security investigation steps with forced pauses between each step to ensure
+thorough vulnerability assessment, security pattern analysis, and compliance verification before providing expert analysis.
+
+**Important**: AI models may not identify all security vulnerabilities. Always perform additional manual security reviews,
+penetration testing, and verification.
+
+## How the Workflow Works
+
+The secaudit tool implements a **structured 6-step security workflow** that ensures comprehensive security assessment:
+
+**Investigation Phase (Claude-Led):**
+1. **Step 1**: Security Scope Analysis - Claude identifies application type, tech stack, attack surface, and compliance requirements
+2. **Step 2**: Authentication & Authorization Assessment - Analyzes auth mechanisms, session management, and access controls
+3. **Step 3**: Input Validation & Data Security - Reviews input handling, data protection, and injection vulnerabilities
+4. **Step 4**: OWASP Top 10 (2021) Review - Systematic assessment of all OWASP categories with specific findings
+5. **Step 5**: Dependencies & Infrastructure - Security analysis of third-party components and deployment configurations
+6. **Step 6**: Compliance & Risk Assessment - Evaluation against specified compliance frameworks and risk prioritization
+
+**Expert Analysis Phase:**
+After Claude completes the investigation (unless confidence is **certain**):
+- Complete security assessment summary with all vulnerabilities and evidence
+- OWASP Top 10 systematic findings with severity classifications
+- Compliance framework gap analysis and remediation recommendations
+- Risk-prioritized remediation roadmap based on threat level and business impact
+
+**Special Note**: If you want Claude to perform the entire security audit without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
+
+## Model Recommendation
+
+This tool particularly benefits from Gemini Pro or O3 models due to their advanced reasoning capabilities and large context windows, which allow comprehensive security analysis across complex codebases. Security audits require understanding subtle attack vectors and cross-component interactions that benefit from deeper analytical capabilities.
+
+## Example Prompts
+
+```
+Perform a secaudit with o3 on this e-commerce web application focusing on payment processing security and PCI DSS compliance
+```
+
+```
+Use secaudit to conduct a comprehensive security audit of the authentication system, threat level high, focus on enterprise
+security patterns and HIPAA compliance
+```
+
+## Pro Tip: Multi-Scope Security Assessment
+
+**You can run parallel security audits for different application components:**
+
+```
+Start separate sub-tasks, in one start a secaudit for critical payment processing components focusing on PCI DSS with gemini pro,
+and in the other for user management focusing on OWASP authentication vulnerabilities with o4-mini, then combine into a unified
+security remediation plan using planner
+```
+
+## Key Features
+
+- **OWASP Top 10 (2021) systematic assessment** with specific vulnerability identification
+- **Multi-compliance framework support**: SOC2, PCI DSS, HIPAA, GDPR, FedRAMP
+- **Threat-level aware analysis**: Critical, high, medium, low threat classifications
+- **Technology-specific security patterns**: Web apps, APIs, mobile, cloud, enterprise systems
+- **Risk-based prioritization**: Business impact and exploitability assessment
+- **Audit focus customization**: Comprehensive, authentication, data protection, infrastructure
+- **Image support**: Security analysis from architecture diagrams, network topology, or security findings
+- **Multi-file security analysis**: Cross-component vulnerability identification
+- **Compliance gap analysis**: Specific framework requirements with remediation guidance
+- **Attack surface mapping**: Entry points, data flows, and privilege boundaries
+- **Security control effectiveness**: Evaluation of existing security measures
+
+## Tool Parameters
+
+**Workflow Investigation Parameters (used during step-by-step process):**
+- `step`: Current security investigation step description (required for each step)
+- `step_number`: Current step number in audit sequence (required)
+- `total_steps`: Estimated total investigation steps (typically 4-6, adjustable)
+- `next_step_required`: Whether another investigation step is needed
+- `findings`: Security discoveries and evidence collected in this step (required)
+- `files_checked`: All files examined during security investigation
+- `relevant_files`: Files directly relevant to security assessment (required in step 1)
+- `relevant_context`: Methods/functions/classes central to security findings
+- `issues_found`: Security issues identified with severity levels
+- `confidence`: Confidence level in security assessment completeness (exploring/low/medium/high/certain)
+- `backtrack_from_step`: Step number to backtrack from (for revisions)
+- `images`: Architecture diagrams, security documentation, or visual references
+
+**Initial Security Configuration (used in step 1):**
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
+- `security_scope`: Application context, technology stack, and security boundary definition (required)
+- `threat_level`: low|medium|high|critical (default: medium) - determines assessment depth and urgency
+- `compliance_requirements`: List of compliance frameworks to assess against (e.g., ["PCI DSS", "SOC2"])
+- `audit_focus`: comprehensive|authentication|data_protection|infrastructure|api_security (default: comprehensive)
+- `severity_filter`: critical|high|medium|low|all (default: all)
+- `temperature`: Temperature for analytical consistency (0-1, default 0.2)
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `use_websearch`: Enable web search for security best practices and vulnerability databases (default: true)
+- `use_assistant_model`: Whether to use expert security analysis phase (default: true)
+- `continuation_id`: Continue previous security audit discussions
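+
+The two parameter groups above come together in the first step of the workflow. As an illustrative sketch only (field names follow the lists above; all values are hypothetical and the exact wire format depends on your MCP client), a step-1 call might look like:
+
+```json
+{
+  "step": "Define audit scope and begin OWASP Top 10 review of the payment service",
+  "step_number": 1,
+  "total_steps": 5,
+  "next_step_required": true,
+  "findings": "Initial survey of authentication and payment entry points",
+  "relevant_files": ["/app/src/payments/processor.js"],
+  "model": "pro",
+  "security_scope": "Node.js e-commerce platform, internet-facing, handles cardholder data",
+  "threat_level": "high",
+  "compliance_requirements": ["PCI DSS", "SOC2"],
+  "audit_focus": "comprehensive"
+}
+```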
+
+## Audit Focus Areas
+
+**Comprehensive (default):**
+- Full OWASP Top 10 assessment with all security domains
+- Authentication, authorization, data protection, infrastructure
+- Best for complete security posture evaluation
+
+**Authentication:**
+- Focused on identity, access management, and session security
+- Multi-factor authentication, password policies, privilege escalation
+- Best for user management and access control systems
+
+**Data Protection:**
+- Encryption, data handling, privacy controls, and compliance
+- Input validation, output encoding, data classification
+- Best for applications handling sensitive or regulated data
+
+**Infrastructure:**
+- Deployment security, configuration management, dependency security
+- Network security, container security, cloud security posture
+- Best for DevOps and infrastructure security assessment
+
+**API Security:**
+- REST/GraphQL security, rate limiting, API authentication
+- Input validation, authorization patterns, API gateway security
+- Best for API-first applications and microservices
+
+## Threat Levels
+
+Security assessment depth and urgency:
+
+- **🔴 CRITICAL**: Mission-critical systems, high-value targets, regulatory requirements
+- **🟠 HIGH**: Business-critical applications, customer data handling, financial systems
+- **🟡 MEDIUM**: Standard business applications, internal tools, moderate risk exposure
+- **🟢 LOW**: Development environments, non-sensitive applications, proof-of-concepts
+
+## Compliance Frameworks
+
+Supported compliance assessments:
+
+- **SOC2**: Security, availability, processing integrity, confidentiality, privacy
+- **PCI DSS**: Payment card industry data security standards
+- **HIPAA**: Healthcare information privacy and security
+- **GDPR**: General data protection regulation compliance
+- **FedRAMP**: Federal risk and authorization management program
+- **ISO27001**: Information security management systems
+- **NIST**: Cybersecurity framework controls
+
+## OWASP Top 10 (2021) Coverage
+
+Systematic assessment includes:
+
+1. **A01 Broken Access Control**: Authorization flaws and privilege escalation
+2. **A02 Cryptographic Failures**: Encryption and data protection issues
+3. **A03 Injection**: SQL, NoSQL, OS, and LDAP injection vulnerabilities
+4. **A04 Insecure Design**: Security design flaws and threat modeling gaps
+5. **A05 Security Misconfiguration**: Configuration and hardening issues
+6. **A06 Vulnerable Components**: Third-party and dependency vulnerabilities
+7. **A07 Identification & Authentication Failures**: Authentication bypass and session management
+8. **A08 Software & Data Integrity Failures**: Supply chain and integrity violations
+9. **A09 Security Logging & Monitoring Failures**: Detection and response capabilities
+10. **A10 Server-Side Request Forgery**: SSRF and related vulnerabilities
+
+## Usage Examples
+
+**Comprehensive E-commerce Security Audit:**
+```
+"Conduct a comprehensive secaudit with gemini pro for our Node.js e-commerce platform, threat level high,
+compliance requirements PCI DSS and SOC2, focus on payment processing security"
+```
+
+**Authentication System Security Review:**
+```
+"Use o3 to perform secaudit on authentication microservice, focus on authentication,
+threat level critical, check for OWASP A07 and multi-factor authentication implementation"
+```
+
+**API Security Assessment:**
+```
+"Secaudit our REST API gateway with gemini pro, audit focus api_security,
+compliance requirements GDPR, threat level medium"
+```
+
+**Infrastructure Security Review:**
+```
+"Perform secaudit on Kubernetes deployment manifests with o3, focus infrastructure,
+threat level high, include container security and network policies"
+```
+
+**Quick Security Scan:**
+```
+"Fast secaudit of user registration flow with flash, focus authentication,
+severity filter critical and high only"
+```
+
+## Best Practices
+
+- **Define clear security scope**: Specify application type, tech stack, and security boundaries
+- **Set appropriate threat levels**: Match assessment depth to risk exposure and criticality
+- **Include compliance requirements**: Specify relevant frameworks for regulatory alignment
+- **Use parallel audits**: Run separate assessments for different components or compliance frameworks
+- **Provide architectural context**: Include system diagrams, data flow documentation, or deployment topology
+- **Focus audit scope**: Use audit_focus for targeted assessments of specific security domains
+- **Follow up on findings**: Use continuation feature to dive deeper into specific vulnerabilities
+
+## Output Format
+
+Security audits include:
+- **Executive Security Summary**: Overall security posture and critical findings
+- **OWASP Top 10 Assessment**: Systematic review of each category with specific findings
+- **Compliance Gap Analysis**: Framework-specific requirements and current compliance status
+- **Risk-Prioritized Findings**: Vulnerabilities ordered by exploitability and business impact
+- **Remediation Roadmap**: Phased approach to security improvements with quick wins
+- **Security Architecture Recommendations**: Structural improvements for long-term security posture
+
+## When to Use Secaudit vs Other Tools
+
+- **Use `secaudit`** for: Comprehensive security assessment, compliance evaluation, OWASP-based vulnerability analysis
+- **Use `codereview`** for: General code quality with some security considerations
+- **Use `analyze`** for: Understanding security architecture without vulnerability assessment
+- **Use `debug`** for: Investigating specific security incidents or exploit attempts
+- **Use `precommit`** for: Pre-deployment security validation and change impact assessment
\ No newline at end of file
diff --git a/docs/tools/testgen.md b/docs/tools/testgen.md
index 0d74a98..7539b21 100644
--- a/docs/tools/testgen.md
+++ b/docs/tools/testgen.md
@@ -70,7 +70,7 @@ Test generation excels with extended reasoning models like Gemini Pro or O3, whi
**Initial Configuration (used in step 1):**
- `prompt`: Description of what to test, testing objectives, and specific scope/focus areas (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `test_examples`: Optional existing test files or directories to use as style/pattern reference (absolute paths)
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
- `use_assistant_model`: Whether to use expert test generation phase (default: true, set to false to use Claude only)
diff --git a/docs/tools/thinkdeep.md b/docs/tools/thinkdeep.md
index 26d5322..68a788d 100644
--- a/docs/tools/thinkdeep.md
+++ b/docs/tools/thinkdeep.md
@@ -30,7 +30,7 @@ with the best architecture for my project
## Tool Parameters
- `prompt`: Your current thinking/analysis to extend and validate (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|flash-2.0|flashlite|o3|o3-mini|o4-mini|gpt4.1|gpt5|gpt5-mini|gpt5-nano (default: server default)
- `problem_context`: Additional context about the problem or goal
- `focus_areas`: Specific aspects to focus on (architecture, performance, security, etc.)
- `files`: Optional file paths or directories for additional context (absolute paths)