New tool! "challenge" with confidence and stop Claude from agreeing with you blindly and undoing the _correct_ strategy because you were wrong

Fixed run script to ensure pip is installed
Fahad
2025-06-29 15:50:45 +04:00
parent 4972e7c281
commit 6b495cea0b
8 changed files with 509 additions and 44 deletions

View File

@@ -56,17 +56,18 @@ Because these AI models [clearly aren't when they get chatty →](docs/ai_banter
- **Tools Reference**
- [`chat`](#1-chat---general-development-chat--collaborative-thinking) - Collaborative thinking
- [`thinkdeep`](#2-thinkdeep---extended-reasoning-partner) - Extended reasoning
- [`planner`](#3-planner---interactive-step-by-step-planning) - Interactive step-by-step planning
- [`consensus`](#4-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis
- [`codereview`](#5-codereview---professional-code-review) - Code review
- [`precommit`](#6-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#7-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#8-analyze---smart-file-analysis) - File analysis
- [`refactor`](#9-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
- [`tracer`](#10-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing
- [`testgen`](#11-testgen---comprehensive-test-generation) - Test generation with edge cases
- [`secaudit`](#12-secaudit---comprehensive-security-audit) - Security audit with OWASP analysis
- [`docgen`](#13-docgen---comprehensive-documentation-generation) - Documentation generation with complexity analysis
- [`challenge`](#3-challenge---critical-challenge-prompt) - Prevents **You're absolutely right!** responses
- [`planner`](#4-planner---interactive-step-by-step-planning) - Interactive step-by-step planning
- [`consensus`](#5-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis
- [`codereview`](#6-codereview---professional-code-review) - Code review
- [`precommit`](#7-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#8-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#9-analyze---smart-file-analysis) - File analysis
- [`refactor`](#10-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
- [`tracer`](#11-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing
- [`testgen`](#12-testgen---comprehensive-test-generation) - Test generation with edge cases
- [`secaudit`](#13-secaudit---comprehensive-security-audit) - Security audit with OWASP analysis
- [`docgen`](#14-docgen---comprehensive-documentation-generation) - Documentation generation with complexity analysis
- **Advanced Usage**
- [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search
@@ -343,6 +344,7 @@ and feel the difference.
**Quick Tool Selection Guide:**
- **Need a thinking partner?** → `chat` (brainstorm ideas, get second opinions, validate approaches)
- **Need deeper thinking?** → `thinkdeep` (extends analysis, finds edge cases)
- **Want to prevent "You're absolutely right!" responses?** → `challenge` (challenges assumptions, encourages thoughtful re-evaluation)
- **Need to break down complex projects?** → `planner` (step-by-step planning, project structure, breaking down complex ideas)
- **Need multiple perspectives?** → `consensus` (get diverse expert opinions on proposals and decisions)
- **Code needs review?** → `codereview` (bugs, security, performance issues)
@@ -371,19 +373,20 @@ and feel the difference.
**Tools Overview:**
1. [`chat`](docs/tools/chat.md) - Collaborative thinking and development conversations
2. [`thinkdeep`](docs/tools/thinkdeep.md) - Extended reasoning and problem-solving
3. [`planner`](docs/tools/planner.md) - Interactive sequential planning for complex projects
4. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
5. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
6. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
7. [`debug`](docs/tools/debug.md) - Systematic investigation and debugging
8. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
9. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
10. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
11. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
12. [`secaudit`](docs/tools/secaudit.md) - Comprehensive security audit with OWASP Top 10 analysis
13. [`docgen`](docs/tools/docgen.md) - Comprehensive documentation generation with complexity analysis
14. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
15. [`version`](docs/tools/version.md) - Get server version and configuration
3. [`challenge`](docs/tools/challenge.md) - Critical challenge prompt, prevents **You're absolutely right!**
4. [`planner`](docs/tools/planner.md) - Interactive sequential planning for complex projects
5. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
6. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
7. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
8. [`debug`](docs/tools/debug.md) - Systematic investigation and debugging
9. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
10. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
11. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
12. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
13. [`secaudit`](docs/tools/secaudit.md) - Comprehensive security audit with OWASP Top 10 analysis
14. [`docgen`](docs/tools/docgen.md) - Comprehensive documentation generation with complexity analysis
15. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
16. [`version`](docs/tools/version.md) - Get server version and configuration
### 1. `chat` - General Development Chat & Collaborative Thinking
Your thinking partner for brainstorming, getting second opinions, and validating approaches. Perfect for technology comparisons, architecture discussions, and collaborative problem-solving.
@@ -404,7 +407,20 @@ and find out what the root cause is
**[📖 Read More](docs/tools/thinkdeep.md)** - Enhanced analysis capabilities and critical evaluation process
### 3. `planner` - Interactive Step-by-Step Planning
### 3. `challenge` - Critical Challenge Prompt
Encourages thoughtful reassessment of statements instead of automatic agreement, especially when you're wrong.
Wraps your input with instructions for critical thinking and honest analysis.
```
challenge isn't adding this function to the base class a bad idea?
```
Normally, your favorite coding agent will enthusiastically reply with **"You're absolutely right!"** and then proceed
to completely reverse the _correct_ strategy, without ever explaining why you're wrong.
**[📖 Read More](docs/tools/challenge.md)** - Critical thinking tool for validating ideas
### 4. `planner` - Interactive Step-by-Step Planning
Break down complex projects or ideas into manageable, structured plans through step-by-step thinking.
Perfect for adding new features to an existing system, scaling up system design, migration strategies,
and architectural planning with branching and revision capabilities.
@@ -424,7 +440,7 @@ I implement first?
**[📖 Read More](docs/tools/planner.md)** - Step-by-step planning methodology and multi-session continuation
### 4. `consensus` - Multi-Model Perspective Gathering
### 5. `consensus` - Multi-Model Perspective Gathering
Get diverse expert opinions from multiple AI models on technical proposals and decisions. Supports stance steering (for/against/neutral) and structured decision-making.
```
@@ -434,7 +450,7 @@ migrate from REST to GraphQL for our API. I need a definitive answer.
**[📖 Read More](docs/tools/consensus.md)** - Multi-model orchestration and decision analysis
### 5. `codereview` - Professional Code Review
### 6. `codereview` - Professional Code Review
Comprehensive code analysis with prioritized feedback and severity levels. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis.
```
@@ -449,7 +465,7 @@ and there may be more potential vulnerabilities. Find and share related code."
**[📖 Read More](docs/tools/codereview.md)** - Professional review workflow with step-by-step analysis
### 6. `precommit` - Pre-Commit Validation
### 7. `precommit` - Pre-Commit Validation
Comprehensive review of staged/unstaged git changes across multiple repositories. This workflow tool guides Claude through systematic investigation of git changes, repository status, and file modifications across multiple steps before providing expert validation to ensure changes meet requirements and prevent regressions.
```
@@ -499,7 +515,7 @@ Nice! This is just one instance - take a look at [another example here](docs/too
**[📖 Read More](docs/tools/precommit.md)** - Multi-repository validation and change analysis
### 7. `debug` - Expert Debugging Assistant
### 8. `debug` - Expert Debugging Assistant
Systematic investigation-guided debugging that walks Claude through step-by-step root cause analysis. This workflow
tool enforces a structured investigation process where Claude performs methodical code examination, evidence collection,
and hypothesis formation across multiple steps before receiving expert analysis from the selected AI model. When Claude's
@@ -525,7 +541,7 @@ Use continuation with thinkdeep, share details with o4-mini to find out what the
**[📖 Read More](docs/tools/debug.md)** - Step-by-step investigation methodology with workflow enforcement
### 8. `analyze` - Smart File Analysis
### 9. `analyze` - Smart File Analysis
General-purpose code understanding and exploration. This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis for architecture assessment, pattern detection, and strategic improvement recommendations.
```
@@ -534,7 +550,7 @@ Use gemini to analyze main.py to understand how it works
**[📖 Read More](docs/tools/analyze.md)** - Comprehensive analysis workflow with step-by-step investigation
### 9. `refactor` - Intelligent Code Refactoring
### 10. `refactor` - Intelligent Code Refactoring
Comprehensive refactoring analysis with top-down decomposition strategy. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance.
```
@@ -543,7 +559,7 @@ Use gemini pro to decompose my_crazy_big_class.m into smaller extensions
**[📖 Read More](docs/tools/refactor.md)** - Workflow-driven refactoring with progressive analysis
### 10. `tracer` - Static Code Analysis Prompt Generator
### 11. `tracer` - Static Code Analysis Prompt Generator
Creates detailed analysis prompts for call-flow mapping and dependency tracing. Generates structured analysis requests for precision execution flow or dependency mapping.
```
@@ -552,7 +568,7 @@ Use zen tracer to analyze how UserAuthManager.authenticate is used and why
**[📖 Read More](docs/tools/tracer.md)** - Prompt generation and analysis modes
### 11. `testgen` - Comprehensive Test Generation
### 12. `testgen` - Comprehensive Test Generation
Generates thorough test suites with edge case coverage based on existing code and test framework. This workflow tool guides Claude through systematic investigation of code functionality, critical paths, edge cases, and integration points across multiple steps before generating comprehensive tests with realistic failure mode analysis.
```
@@ -561,7 +577,7 @@ Use zen to generate tests for User.login() method
**[📖 Read More](docs/tools/testgen.md)** - Workflow-based test generation with comprehensive coverage
### 12. `secaudit` - Comprehensive Security Audit
### 13. `secaudit` - Comprehensive Security Audit
Systematic OWASP-based security assessment with compliance evaluation. This workflow tool guides Claude through methodical security investigation steps with forced pauses between each step to ensure thorough vulnerability assessment, security pattern analysis, and compliance verification before providing expert analysis.
```
@@ -570,7 +586,7 @@ Perform a secaudit with o3 on this e-commerce web application focusing on paymen
**[📖 Read More](docs/tools/secaudit.md)** - OWASP Top 10 analysis with compliance framework support
### 13. `docgen` - Comprehensive Documentation Generation
### 14. `docgen` - Comprehensive Documentation Generation
Generates thorough documentation with complexity analysis and gotcha identification. This workflow tool guides Claude through systematic investigation of code structure, function complexity, and documentation needs across multiple steps before generating comprehensive documentation that includes algorithmic complexity, call flow information, and unexpected behaviors that developers should know about.
```
@@ -583,7 +599,7 @@ Use docgen to add complexity analysis to all the new swift functions I added but
**[📖 Read More](docs/tools/docgen.md)** - Workflow-based documentation generation with gotcha detection
### 14. `listmodels` - List Available Models
### 15. `listmodels` - List Available Models
Display all available AI models organized by provider, showing capabilities, context windows, and configuration status.
```
@@ -592,7 +608,7 @@ Use zen to list available models
**[📖 Read More](docs/tools/listmodels.md)** - Model capabilities and configuration details
### 15. `version` - Server Information
### 16. `version` - Server Information
Get server version, configuration details, and system status for debugging and troubleshooting.
```

View File

@@ -14,9 +14,9 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
__version__ = "5.7.6"
__version__ = "5.8.0"
# Last update date in ISO format
__updated__ = "2025-06-29"
__updated__ = "2025-06-30"
# Primary maintainer
__author__ = "Fahad Gilani"

docs/tools/challenge.md (new file, +20 lines)
View File

@@ -0,0 +1,20 @@
# challenge - Critical Challenge Tool
The `challenge` tool encourages thoughtful re-evaluation instead of the dreaded **You're absolutely right!** response - especially
when you're not. It wraps your comment with instructions that prompt critical thinking and honest analysis instead of blind agreement.
## Quick Example
```
challenge but the new function you added duplicates the hashing method, no?
```
The tool wraps your statement with instructions that explicitly tell Claude to think critically and disagree if warranted, rather than automatically agreeing.
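For illustration, the quick example above gets wrapped roughly like this (wording taken from the wrapper in `tools/challenge.py` in this commit, so it may change over time):
```
CHALLENGE THIS STATEMENT - Do not automatically agree:

"but the new function you added duplicates the hashing method, no?"

Is this actually correct? Check carefully. If it's wrong, incomplete, misleading or incorrect,
you must say so. Provide your honest assessment, not automatic agreement. If you
feel there is merit in what the user is saying, explain WHY you agree.
```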
## Why Use Challenge?
AI assistants tend to agree too readily. The challenge tool helps you:
- Get genuine critical evaluation of your ideas
- Challenge assumptions constructively
- Receive honest feedback on proposals
- Validate approaches with thoughtful analysis

View File

@@ -512,7 +512,7 @@ bootstrap_pip() {
print_info "Bootstrapping pip in virtual environment..."
# Try ensurepip first
if $venv_python -m ensurepip --default-pip 2>/dev/null; then
if $venv_python -m ensurepip --default-pip >/dev/null 2>&1; then
print_success "Successfully bootstrapped pip using ensurepip"
return 0
fi
@@ -579,6 +579,17 @@ setup_environment() {
if venv_python=$(get_venv_python_path "$VENV_PATH"); then
touch "$VENV_PATH/uv_created" # Mark as uv-created
print_success "Created environment with uv using Python 3.12"
# Ensure pip is installed in uv environment
if ! $venv_python -m pip --version &>/dev/null 2>&1; then
print_info "Installing pip in uv environment..."
# uv doesn't install pip by default, use bootstrap method
if bootstrap_pip "$venv_python" "python3"; then
print_success "pip installed in uv environment"
else
print_warning "Failed to install pip in uv environment"
fi
fi
else
print_warning "uv succeeded but Python executable not found in venv"
fi
@@ -589,6 +600,17 @@ setup_environment() {
touch "$VENV_PATH/uv_created" # Mark as uv-created
local python_version=$($venv_python --version 2>&1)
print_success "Created environment with uv using $python_version"
# Ensure pip is installed in uv environment
if ! $venv_python -m pip --version &>/dev/null 2>&1; then
print_info "Installing pip in uv environment..."
# uv doesn't install pip by default, use bootstrap method
if bootstrap_pip "$venv_python" "python3"; then
print_success "pip installed in uv environment"
else
print_warning "Failed to install pip in uv environment"
fi
fi
else
print_warning "uv succeeded but Python executable not found in venv"
fi
@@ -755,8 +777,10 @@ setup_venv() {
exit 1
fi
# Check if pip exists in the virtual environment (skip check if using uv-created environment)
if [[ ! -f "$VENV_PATH/uv_created" ]] && [[ ! -f "$venv_pip" ]] && ! $venv_python -m pip --version &>/dev/null 2>&1; then
# Always check if pip exists in the virtual environment (regardless of how it was created)
if [[ ! -f "$venv_pip" ]] && ! $venv_python -m pip --version &>/dev/null 2>&1; then
print_warning "pip not found in virtual environment, installing..."
# On Linux, try to install system packages if pip is missing
local os_type=$(detect_os)
if [[ "$os_type" == "linux" || "$os_type" == "wsl" ]]; then
@@ -838,8 +862,8 @@ install_dependencies() {
local python_cmd="$1"
local deps_needed=false
# First verify pip is available (skip check if using uv)
if [[ ! -f "$VENV_PATH/uv_created" ]] && ! $python_cmd -m pip --version &>/dev/null 2>&1; then
# First verify pip is available (always check, even for uv environments)
if ! $python_cmd -m pip --version &>/dev/null 2>&1; then
print_error "pip is not available in the Python environment"
echo ""
echo "This indicates an incomplete Python installation."

View File

@@ -64,6 +64,7 @@ from config import ( # noqa: E402
)
from tools import ( # noqa: E402
AnalyzeTool,
ChallengeTool,
ChatTool,
CodeReviewTool,
ConsensusTool,
@@ -274,6 +275,7 @@ TOOLS = {
"refactor": RefactorTool(), # Step-by-step refactoring analysis workflow with expert validation
"tracer": TracerTool(), # Static call path prediction and control flow analysis
"testgen": TestGenTool(), # Step-by-step test generation workflow with expert validation
"challenge": ChallengeTool(), # Critical challenge prompt wrapper to avoid automatic agreement
"listmodels": ListModelsTool(), # List all available AI models by provider
"version": VersionTool(), # Display server version and system information
}
@@ -346,6 +348,11 @@ PROMPT_TEMPLATES = {
"description": "Generate comprehensive tests",
"template": "Generate comprehensive tests with {model}",
},
"challenge": {
"name": "challenge",
"description": "Challenge a statement critically without automatic agreement",
"template": "Challenge this statement critically",
},
"listmodels": {
"name": "listmodels",
"description": "List available AI models",

tests/test_challenge.py (new file, +200 lines)
View File

@@ -0,0 +1,200 @@
"""
Tests for Challenge tool - validating critical challenge prompt wrapper
This module contains unit tests to ensure that the Challenge tool
properly wraps statements to encourage critical thinking and avoid
automatic agreement patterns.
"""
import json
from unittest.mock import patch
import pytest
from tools.challenge import ChallengeRequest, ChallengeTool
class TestChallengeTool:
"""Test suite for Challenge tool"""
def setup_method(self):
"""Set up test fixtures"""
self.tool = ChallengeTool()
def test_tool_metadata(self):
"""Test that tool metadata matches requirements"""
assert self.tool.get_name() == "challenge"
assert "CRITICAL CHALLENGE PROMPT" in self.tool.get_description()
assert "challenge it thoughtfully" in self.tool.get_description()
assert "agreeing by default" in self.tool.get_description()
assert self.tool.get_default_temperature() == 0.2 # TEMPERATURE_ANALYTICAL
def test_requires_model(self):
"""Test that challenge tool doesn't require a model"""
assert self.tool.requires_model() is False
def test_schema_structure(self):
"""Test that schema has correct structure and excludes model fields"""
schema = self.tool.get_input_schema()
# Basic schema structure
assert schema["type"] == "object"
assert "properties" in schema
assert "required" in schema
# Required fields
assert "prompt" in schema["required"]
assert len(schema["required"]) == 1 # Only prompt is required
# Properties
properties = schema["properties"]
assert "prompt" in properties
# Should NOT have model-related fields since it doesn't require a model
assert "model" not in properties
assert "temperature" not in properties
assert "thinking_mode" not in properties
assert "use_websearch" not in properties
assert "continuation_id" not in properties
def test_request_model_validation(self):
"""Test that the request model validates correctly"""
# Test valid request
request = ChallengeRequest(prompt="The sky is green")
assert request.prompt == "The sky is green"
# Test with longer prompt
long_prompt = (
"Machine learning models always produce accurate results and should be trusted without verification"
)
request = ChallengeRequest(prompt=long_prompt)
assert request.prompt == long_prompt
def test_required_fields(self):
"""Test that required fields are enforced"""
from pydantic import ValidationError
# Missing prompt should raise validation error
with pytest.raises(ValidationError):
ChallengeRequest()
@pytest.mark.asyncio
async def test_execute_success(self):
"""Test successful execution of challenge tool"""
arguments = {"prompt": "All software bugs are caused by syntax errors"}
result = await self.tool.execute(arguments)
# Should return a list with TextContent
assert len(result) == 1
assert result[0].type == "text"
# Parse the JSON response
response_data = json.loads(result[0].text)
# Check response structure
assert response_data["status"] == "challenge_created"
assert response_data["original_statement"] == "All software bugs are caused by syntax errors"
assert "challenge_prompt" in response_data
assert "instructions" in response_data
# Check that the challenge prompt contains critical thinking instructions
challenge_prompt = response_data["challenge_prompt"]
assert "CHALLENGE THIS STATEMENT - Do not automatically agree" in challenge_prompt
assert "Is this actually correct? Check carefully" in challenge_prompt
assert response_data["original_statement"] in challenge_prompt
assert "you must say so" in challenge_prompt
assert "Provide your honest assessment, not automatic agreement" in challenge_prompt
@pytest.mark.asyncio
async def test_execute_error_handling(self):
"""Test error handling in execute method"""
# Test with invalid arguments (non-dict)
with patch.object(self.tool, "get_request_model", side_effect=Exception("Test error")):
result = await self.tool.execute({"prompt": "test"})
assert len(result) == 1
response_data = json.loads(result[0].text)
assert response_data["status"] == "error"
assert "Test error" in response_data["error"]
def test_wrap_prompt_for_challenge(self):
"""Test the prompt wrapping functionality"""
original_prompt = "Python is the best programming language"
wrapped = self.tool._wrap_prompt_for_challenge(original_prompt)
# Check structure
assert "CHALLENGE THIS STATEMENT - Do not automatically agree" in wrapped
assert "Is this actually correct? Check carefully" in wrapped
assert f'"{original_prompt}"' in wrapped
assert "you must say so" in wrapped
assert "Provide your honest assessment, not automatic agreement" in wrapped
def test_multiple_prompts(self):
"""Test that tool handles various types of prompts correctly"""
test_prompts = [
"All code should be written in assembly for maximum performance",
"Comments are unnecessary if code is self-documenting",
"Testing is a waste of time for experienced developers",
"Global variables make code easier to understand",
"The more design patterns used, the better the code",
]
for prompt in test_prompts:
request = ChallengeRequest(prompt=prompt)
wrapped = self.tool._wrap_prompt_for_challenge(request.prompt)
# Each wrapped prompt should contain the original
assert prompt in wrapped
assert "CHALLENGE THIS STATEMENT" in wrapped
def test_tool_fields(self):
"""Test tool-specific field definitions"""
fields = self.tool.get_tool_fields()
assert "prompt" in fields
assert fields["prompt"]["type"] == "string"
assert "statement" in fields["prompt"]["description"]
assert "challenge" in fields["prompt"]["description"]
def test_required_fields_list(self):
"""Test required fields list"""
required = self.tool.get_required_fields()
assert required == ["prompt"]
@pytest.mark.asyncio
async def test_not_used_methods(self):
"""Test that methods not used by challenge tool work correctly"""
request = ChallengeRequest(prompt="test")
# These methods aren't used since challenge doesn't call AI
prompt = await self.tool.prepare_prompt(request)
assert prompt == ""
response = self.tool.format_response("test response", request)
assert response == "test response"
def test_special_characters_in_prompt(self):
"""Test handling of special characters in prompts"""
special_prompt = 'The "best" way to handle errors is to use try/except: pass'
request = ChallengeRequest(prompt=special_prompt)
wrapped = self.tool._wrap_prompt_for_challenge(request.prompt)
# Should handle quotes properly
assert special_prompt in wrapped
@pytest.mark.asyncio
async def test_unicode_support(self):
"""Test that tool handles unicode characters correctly"""
unicode_prompt = "软件开发中最重要的是写代码,测试不重要 🚀"
arguments = {"prompt": unicode_prompt}
result = await self.tool.execute(arguments)
response_data = json.loads(result[0].text)
assert response_data["original_statement"] == unicode_prompt
assert unicode_prompt in response_data["challenge_prompt"]
if __name__ == "__main__":
pytest.main([__file__])

View File

@@ -3,6 +3,7 @@ Tool implementations for Zen MCP Server
"""
from .analyze import AnalyzeTool
from .challenge import ChallengeTool
from .chat import ChatTool
from .codereview import CodeReviewTool
from .consensus import ConsensusTool
@@ -29,6 +30,7 @@ __all__ = [
"ListModelsTool",
"PlannerTool",
"PrecommitTool",
"ChallengeTool",
"RefactorTool",
"SecauditTool",
"TestGenTool",

tools/challenge.py (new file, +196 lines)
View File

@@ -0,0 +1,196 @@
"""
Challenge tool - Encourages critical thinking and thoughtful disagreement
This tool takes a user's statement and returns it wrapped in instructions that
encourage the CLI agent to challenge ideas and think critically before agreeing. It helps
avoid reflexive agreement by prompting deeper analysis and genuine evaluation.
This is a simple, self-contained tool that doesn't require AI model access.
"""
from typing import TYPE_CHECKING, Any, Optional
from pydantic import Field
if TYPE_CHECKING:
from tools.models import ToolModelCategory
from config import TEMPERATURE_ANALYTICAL
from tools.shared.base_models import ToolRequest
from .simple.base import SimpleTool
# Field descriptions for the Challenge tool
CHALLENGE_FIELD_DESCRIPTIONS = {
"prompt": (
"The statement, question, or assertion the user wants to challenge critically. "
"This may be a claim, suggestion, or idea that requires thoughtful reconsideration, not automatic agreement."
),
}
class ChallengeRequest(ToolRequest):
"""Request model for Challenge tool"""
prompt: str = Field(..., description=CHALLENGE_FIELD_DESCRIPTIONS["prompt"])
class ChallengeTool(SimpleTool):
"""
Challenge tool for encouraging critical thinking and avoiding automatic agreement.
This tool wraps user statements in instructions that encourage the CLI agent to:
- Challenge ideas and think critically before responding
- Evaluate whether they actually agree or disagree
- Provide thoughtful analysis rather than reflexive agreement
The tool is self-contained and doesn't require AI model access - it simply
transforms the input prompt into a structured critical thinking challenge.
"""
def get_name(self) -> str:
return "challenge"
def get_description(self) -> str:
return (
"CRITICAL CHALLENGE PROMPT Use this to frame your statement in a way that prompts "
"the CLI agent to challenge it thoughtfully instead of agreeing by default. Ideal for "
"challenging assumptions, validating ideas, and seeking honest, analytical feedback as part of an ongoing "
"task. The tool wraps your input with instructions explicitly telling the agent to think critically "
"and disagree if warranted."
)
def get_system_prompt(self) -> str:
# Challenge tool doesn't need a system prompt since it doesn't call AI
return ""
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
def get_model_category(self) -> "ToolModelCategory":
"""Challenge doesn't need a model category since it doesn't use AI"""
from tools.models import ToolModelCategory
return ToolModelCategory.FAST_RESPONSE # Default, but not used
def requires_model(self) -> bool:
"""
Challenge tool doesn't require model resolution at the MCP boundary.
Like the planner tool, this is a pure data processing tool that transforms
the input without calling external AI models.
Returns:
bool: False - challenge doesn't need AI model access
"""
return False
def get_request_model(self):
"""Return the Challenge-specific request model"""
return ChallengeRequest
def get_input_schema(self) -> dict[str, Any]:
"""
Generate input schema for the challenge tool.
Since this tool doesn't require a model, we exclude model-related fields.
"""
schema = {
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": CHALLENGE_FIELD_DESCRIPTIONS["prompt"],
},
},
"required": ["prompt"],
}
return schema
async def execute(self, arguments: dict[str, Any]) -> list:
"""
Execute the challenge tool by wrapping the prompt in critical thinking instructions.
This is the main execution method that transforms the user's statement into
a structured challenge that encourages thoughtful re-evaluation.
"""
import json
from mcp.types import TextContent
try:
# Validate request
request = self.get_request_model()(**arguments)
# Wrap the prompt in challenge instructions
wrapped_prompt = self._wrap_prompt_for_challenge(request.prompt)
# Return the wrapped prompt as the response
response_data = {
"status": "challenge_created",
"original_statement": request.prompt,
"challenge_prompt": wrapped_prompt,
"instructions": (
"Present the challenge_prompt to yourself and follow its instructions. "
"Challenge the statement critically before forming your response. "
"If you disagree after careful reconsideration, explain why."
),
}
return [TextContent(type="text", text=json.dumps(response_data, indent=2, ensure_ascii=False))]
except Exception as e:
import logging
logger = logging.getLogger(__name__)
logger.error(f"Error in challenge tool execution: {e}", exc_info=True)
error_data = {
"status": "error",
"error": str(e),
"content": f"Failed to create challenge prompt: {str(e)}",
}
return [TextContent(type="text", text=json.dumps(error_data, ensure_ascii=False))]
def _wrap_prompt_for_challenge(self, prompt: str) -> str:
"""
Wrap the user's statement in instructions that encourage critical challenge.
Args:
prompt: The original user statement to wrap
Returns:
The statement wrapped in challenge instructions
"""
return (
f"CHALLENGE THIS STATEMENT - Do not automatically agree:\n\n"
f'"{prompt}"\n\n'
f"Is this actually correct? Check carefully. If it's wrong, incomplete, misleading or incorrect, "
f"you must say so. Provide your honest assessment, not automatic agreement. If you "
f"feel there is merit in what the user is saying, explain WHY you agree."
)
# Required method implementations from SimpleTool
async def prepare_prompt(self, request: ChallengeRequest) -> str:
"""Not used since challenge doesn't call AI models"""
return ""
def format_response(self, response: str, request: ChallengeRequest, model_info: Optional[dict] = None) -> str:
"""Not used since challenge doesn't call AI models"""
return response
def get_tool_fields(self) -> dict[str, dict[str, Any]]:
"""Tool-specific field definitions for Challenge"""
return {
"prompt": {
"type": "string",
"description": CHALLENGE_FIELD_DESCRIPTIONS["prompt"],
},
}
def get_required_fields(self) -> list[str]:
"""Required fields for Challenge tool"""
return ["prompt"]