New tool! "challenge" with confidence and stop Claude from agreeing with you blindly and undoing the _correct_ strategy because you were wrong

Fixed run script to ensure pip is installed
Fahad
2025-06-29 15:50:45 +04:00
parent 4972e7c281
commit 6b495cea0b
8 changed files with 509 additions and 44 deletions

View File

@@ -56,17 +56,18 @@ Because these AI models [clearly aren't when they get chatty →](docs/ai_banter
- **Tools Reference**
- [`chat`](#1-chat---general-development-chat--collaborative-thinking) - Collaborative thinking
- [`thinkdeep`](#2-thinkdeep---extended-reasoning-partner) - Extended reasoning
- [`planner`](#3-planner---interactive-step-by-step-planning) - Interactive step-by-step planning
- [`consensus`](#4-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis
- [`codereview`](#5-codereview---professional-code-review) - Code review
- [`precommit`](#6-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#7-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#8-analyze---smart-file-analysis) - File analysis
- [`refactor`](#9-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
- [`tracer`](#10-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing
- [`testgen`](#11-testgen---comprehensive-test-generation) - Test generation with edge cases
- [`secaudit`](#12-secaudit---comprehensive-security-audit) - Security audit with OWASP analysis
- [`docgen`](#13-docgen---comprehensive-documentation-generation) - Documentation generation with complexity analysis
- [`challenge`](#3-challenge---critical-challenge-prompt) - Prevents **You're absolutely right!** responses
- [`planner`](#4-planner---interactive-step-by-step-planning) - Interactive step-by-step planning
- [`consensus`](#5-consensus---multi-model-perspective-gathering) - Multi-model consensus analysis
- [`codereview`](#6-codereview---professional-code-review) - Code review
- [`precommit`](#7-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#8-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#9-analyze---smart-file-analysis) - File analysis
- [`refactor`](#10-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
- [`tracer`](#11-tracer---static-code-analysis-prompt-generator) - Call-flow mapping and dependency tracing
- [`testgen`](#12-testgen---comprehensive-test-generation) - Test generation with edge cases
- [`secaudit`](#13-secaudit---comprehensive-security-audit) - Security audit with OWASP analysis
- [`docgen`](#14-docgen---comprehensive-documentation-generation) - Documentation generation with complexity analysis
- **Advanced Usage**
- [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search
@@ -343,6 +344,7 @@ and feel the difference.
**Quick Tool Selection Guide:**
- **Need a thinking partner?** → `chat` (brainstorm ideas, get second opinions, validate approaches)
- **Need deeper thinking?** → `thinkdeep` (extends analysis, finds edge cases)
- **Want to prevent "You're absolutely right!" responses?** → `challenge` (challenges assumptions, encourages thoughtful re-evaluation)
- **Need to break down complex projects?** → `planner` (step-by-step planning, project structure, breaking down complex ideas)
- **Need multiple perspectives?** → `consensus` (get diverse expert opinions on proposals and decisions)
- **Code needs review?** → `codereview` (bugs, security, performance issues)
@@ -371,19 +373,20 @@ and feel the difference.
**Tools Overview:**
1. [`chat`](docs/tools/chat.md) - Collaborative thinking and development conversations
2. [`thinkdeep`](docs/tools/thinkdeep.md) - Extended reasoning and problem-solving
3. [`planner`](docs/tools/planner.md) - Interactive sequential planning for complex projects
4. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
5. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
6. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
7. [`debug`](docs/tools/debug.md) - Systematic investigation and debugging
8. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
9. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
10. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
11. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
12. [`secaudit`](docs/tools/secaudit.md) - Comprehensive security audit with OWASP Top 10 analysis
13. [`docgen`](docs/tools/docgen.md) - Comprehensive documentation generation with complexity analysis
14. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
15. [`version`](docs/tools/version.md) - Get server version and configuration
3. [`challenge`](docs/tools/challenge.md) - Critical challenge prompt, prevents **You're absolutely right!**
4. [`planner`](docs/tools/planner.md) - Interactive sequential planning for complex projects
5. [`consensus`](docs/tools/consensus.md) - Multi-model consensus analysis with stance steering
6. [`codereview`](docs/tools/codereview.md) - Professional code review with severity levels
7. [`precommit`](docs/tools/precommit.md) - Validate git changes before committing
8. [`debug`](docs/tools/debug.md) - Systematic investigation and debugging
9. [`analyze`](docs/tools/analyze.md) - General-purpose file and code analysis
10. [`refactor`](docs/tools/refactor.md) - Code refactoring with decomposition focus
11. [`tracer`](docs/tools/tracer.md) - Static code analysis prompt generator for call-flow mapping
12. [`testgen`](docs/tools/testgen.md) - Comprehensive test generation with edge case coverage
13. [`secaudit`](docs/tools/secaudit.md) - Comprehensive security audit with OWASP Top 10 analysis
14. [`docgen`](docs/tools/docgen.md) - Comprehensive documentation generation with complexity analysis
15. [`listmodels`](docs/tools/listmodels.md) - Display all available AI models organized by provider
16. [`version`](docs/tools/version.md) - Get server version and configuration
### 1. `chat` - General Development Chat & Collaborative Thinking
Your thinking partner for brainstorming, getting second opinions, and validating approaches. Perfect for technology comparisons, architecture discussions, and collaborative problem-solving.
@@ -404,7 +407,20 @@ and find out what the root cause is
**[📖 Read More](docs/tools/thinkdeep.md)** - Enhanced analysis capabilities and critical evaluation process
### 3. `planner` - Interactive Step-by-Step Planning
### 3. `challenge` - Critical Challenge Prompt
Encourages thoughtful reassessment of statements instead of automatic agreement, especially when you're wrong.
Wraps your input with instructions for critical thinking and honest analysis.
```
challenge isn't adding this function to the base class a bad idea?
```
Normally, your favorite coding agent will enthusiastically reply with **"You're absolutely right!"** and then proceed
to completely reverse the _correct_ strategy, without ever explaining why you're wrong.
**[📖 Read More](docs/tools/challenge.md)** - Critical thinking tool for validating ideas
### 4. `planner` - Interactive Step-by-Step Planning
Break down complex projects or ideas into manageable, structured plans through step-by-step thinking.
Perfect for adding new features to an existing system, scaling up system design, migration strategies,
and architectural planning with branching and revision capabilities.
@@ -424,7 +440,7 @@ I implement first?
**[📖 Read More](docs/tools/planner.md)** - Step-by-step planning methodology and multi-session continuation
### 4. `consensus` - Multi-Model Perspective Gathering
### 5. `consensus` - Multi-Model Perspective Gathering
Get diverse expert opinions from multiple AI models on technical proposals and decisions. Supports stance steering (for/against/neutral) and structured decision-making.
```
@@ -434,7 +450,7 @@ migrate from REST to GraphQL for our API. I need a definitive answer.
**[📖 Read More](docs/tools/consensus.md)** - Multi-model orchestration and decision analysis
### 5. `codereview` - Professional Code Review
### 6. `codereview` - Professional Code Review
Comprehensive code analysis with prioritized feedback and severity levels. This workflow tool guides Claude through systematic investigation steps with forced pauses between each step to ensure thorough code examination, issue identification, and quality assessment before providing expert analysis.
```
@@ -449,7 +465,7 @@ and there may be more potential vulnerabilities. Find and share related code."
**[📖 Read More](docs/tools/codereview.md)** - Professional review workflow with step-by-step analysis
### 6. `precommit` - Pre-Commit Validation
### 7. `precommit` - Pre-Commit Validation
Comprehensive review of staged/unstaged git changes across multiple repositories. This workflow tool guides Claude through systematic investigation of git changes, repository status, and file modifications across multiple steps before providing expert validation to ensure changes meet requirements and prevent regressions.
```
@@ -499,7 +515,7 @@ Nice! This is just one instance - take a look at [another example here](docs/too
**[📖 Read More](docs/tools/precommit.md)** - Multi-repository validation and change analysis
### 7. `debug` - Expert Debugging Assistant
### 8. `debug` - Expert Debugging Assistant
Systematic investigation-guided debugging that walks Claude through step-by-step root cause analysis. This workflow
tool enforces a structured investigation process where Claude performs methodical code examination, evidence collection,
and hypothesis formation across multiple steps before receiving expert analysis from the selected AI model. When Claude's
@@ -525,7 +541,7 @@ Use continuation with thinkdeep, share details with o4-mini to find out what the
**[📖 Read More](docs/tools/debug.md)** - Step-by-step investigation methodology with workflow enforcement
### 8. `analyze` - Smart File Analysis
### 9. `analyze` - Smart File Analysis
General-purpose code understanding and exploration. This workflow tool guides Claude through systematic investigation of code structure, patterns, and architectural decisions across multiple steps, gathering comprehensive insights before providing expert analysis for architecture assessment, pattern detection, and strategic improvement recommendations.
```
@@ -534,7 +550,7 @@ Use gemini to analyze main.py to understand how it works
**[📖 Read More](docs/tools/analyze.md)** - Comprehensive analysis workflow with step-by-step investigation
### 9. `refactor` - Intelligent Code Refactoring
### 10. `refactor` - Intelligent Code Refactoring
Comprehensive refactoring analysis with top-down decomposition strategy. This workflow tool enforces systematic investigation of code smells, decomposition opportunities, and modernization possibilities across multiple steps, ensuring thorough analysis before providing expert refactoring recommendations with precise implementation guidance.
```
@@ -543,7 +559,7 @@ Use gemini pro to decompose my_crazy_big_class.m into smaller extensions
**[📖 Read More](docs/tools/refactor.md)** - Workflow-driven refactoring with progressive analysis
### 10. `tracer` - Static Code Analysis Prompt Generator
### 11. `tracer` - Static Code Analysis Prompt Generator
Creates detailed analysis prompts for call-flow mapping and dependency tracing. Generates structured analysis requests for precision execution flow or dependency mapping.
```
@@ -552,7 +568,7 @@ Use zen tracer to analyze how UserAuthManager.authenticate is used and why
**[📖 Read More](docs/tools/tracer.md)** - Prompt generation and analysis modes
### 11. `testgen` - Comprehensive Test Generation
### 12. `testgen` - Comprehensive Test Generation
Generates thorough test suites with edge case coverage based on existing code and test framework. This workflow tool guides Claude through systematic investigation of code functionality, critical paths, edge cases, and integration points across multiple steps before generating comprehensive tests with realistic failure mode analysis.
```
@@ -561,7 +577,7 @@ Use zen to generate tests for User.login() method
**[📖 Read More](docs/tools/testgen.md)** - Workflow-based test generation with comprehensive coverage
### 12. `secaudit` - Comprehensive Security Audit
### 13. `secaudit` - Comprehensive Security Audit
Systematic OWASP-based security assessment with compliance evaluation. This workflow tool guides Claude through methodical security investigation steps with forced pauses between each step to ensure thorough vulnerability assessment, security pattern analysis, and compliance verification before providing expert analysis.
```
@@ -570,7 +586,7 @@ Perform a secaudit with o3 on this e-commerce web application focusing on paymen
**[📖 Read More](docs/tools/secaudit.md)** - OWASP Top 10 analysis with compliance framework support
### 13. `docgen` - Comprehensive Documentation Generation
### 14. `docgen` - Comprehensive Documentation Generation
Generates thorough documentation with complexity analysis and gotcha identification. This workflow tool guides Claude through systematic investigation of code structure, function complexity, and documentation needs across multiple steps before generating comprehensive documentation that includes algorithmic complexity, call flow information, and unexpected behaviors that developers should know about.
```
@@ -583,7 +599,7 @@ Use docgen to add complexity analysis to all the new swift functions I added but
**[📖 Read More](docs/tools/docgen.md)** - Workflow-based documentation generation with gotcha detection
### 14. `listmodels` - List Available Models
### 15. `listmodels` - List Available Models
Display all available AI models organized by provider, showing capabilities, context windows, and configuration status.
```
@@ -592,7 +608,7 @@ Use zen to list available models
**[📖 Read More](docs/tools/listmodels.md)** - Model capabilities and configuration details
### 15. `version` - Server Information
### 16. `version` - Server Information
Get server version, configuration details, and system status for debugging and troubleshooting.
```

View File

@@ -14,9 +14,9 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
__version__ = "5.7.6"
__version__ = "5.8.0"
# Last update date in ISO format
__updated__ = "2025-06-29"
__updated__ = "2025-06-30"
# Primary maintainer
__author__ = "Fahad Gilani"

docs/tools/challenge.md (new file, +20 lines)
View File

@@ -0,0 +1,20 @@
# challenge - Critical Challenge Tool
The `challenge` tool encourages thoughtful re-evaluation instead of the dreaded **You're absolutely right!** response - especially
when you're not. It wraps your comment with instructions that prompt critical thinking and honest analysis instead of blind agreement.
## Quick Example
```
challenge but the new function you added duplicates the hashing method, no?
```
The tool wraps your statement with instructions that explicitly tell Claude to think critically and disagree if warranted, rather than automatically agreeing.
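For illustration, the quick example above gets wrapped roughly like this (wording taken from the wrapper in `tools/challenge.py` in this commit, so it may change over time):
```
CHALLENGE THIS STATEMENT - Do not automatically agree:

"but the new function you added duplicates the hashing method, no?"

Is this actually correct? Check carefully. If it's wrong, incomplete, misleading or incorrect,
you must say so. Provide your honest assessment, not automatic agreement. If you
feel there is merit in what the user is saying, explain WHY you agree.
```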
## Why Use Challenge?
AI assistants tend to agree too readily. The challenge tool helps you:
- Get genuine critical evaluation of your ideas
- Challenge assumptions constructively
- Receive honest feedback on proposals
- Validate approaches with thoughtful analysis

View File

@@ -512,7 +512,7 @@ bootstrap_pip() {
print_info "Bootstrapping pip in virtual environment..."
# Try ensurepip first
if $venv_python -m ensurepip --default-pip 2>/dev/null; then
if $venv_python -m ensurepip --default-pip >/dev/null 2>&1; then
print_success "Successfully bootstrapped pip using ensurepip"
return 0
fi
@@ -579,6 +579,17 @@ setup_environment() {
if venv_python=$(get_venv_python_path "$VENV_PATH"); then
touch "$VENV_PATH/uv_created" # Mark as uv-created
print_success "Created environment with uv using Python 3.12"
# Ensure pip is installed in uv environment
if ! $venv_python -m pip --version &>/dev/null 2>&1; then
print_info "Installing pip in uv environment..."
# uv doesn't install pip by default, use bootstrap method
if bootstrap_pip "$venv_python" "python3"; then
print_success "pip installed in uv environment"
else
print_warning "Failed to install pip in uv environment"
fi
fi
else
print_warning "uv succeeded but Python executable not found in venv"
fi
@@ -589,6 +600,17 @@ setup_environment() {
touch "$VENV_PATH/uv_created" # Mark as uv-created
local python_version=$($venv_python --version 2>&1)
print_success "Created environment with uv using $python_version"
# Ensure pip is installed in uv environment
if ! $venv_python -m pip --version &>/dev/null 2>&1; then
print_info "Installing pip in uv environment..."
# uv doesn't install pip by default, use bootstrap method
if bootstrap_pip "$venv_python" "python3"; then
print_success "pip installed in uv environment"
else
print_warning "Failed to install pip in uv environment"
fi
fi
else
print_warning "uv succeeded but Python executable not found in venv"
fi
@@ -755,8 +777,10 @@ setup_venv() {
exit 1
fi
# Check if pip exists in the virtual environment (skip check if using uv-created environment)
if [[ ! -f "$VENV_PATH/uv_created" ]] && [[ ! -f "$venv_pip" ]] && ! $venv_python -m pip --version &>/dev/null 2>&1; then
# Always check if pip exists in the virtual environment (regardless of how it was created)
if [[ ! -f "$venv_pip" ]] && ! $venv_python -m pip --version &>/dev/null 2>&1; then
print_warning "pip not found in virtual environment, installing..."
# On Linux, try to install system packages if pip is missing
local os_type=$(detect_os)
if [[ "$os_type" == "linux" || "$os_type" == "wsl" ]]; then
@@ -838,8 +862,8 @@ install_dependencies() {
local python_cmd="$1"
local deps_needed=false
# First verify pip is available (skip check if using uv)
if [[ ! -f "$VENV_PATH/uv_created" ]] && ! $python_cmd -m pip --version &>/dev/null 2>&1; then
# First verify pip is available (always check, even for uv environments)
if ! $python_cmd -m pip --version &>/dev/null 2>&1; then
print_error "pip is not available in the Python environment"
echo ""
echo "This indicates an incomplete Python installation."

View File

@@ -64,6 +64,7 @@ from config import ( # noqa: E402
)
from tools import ( # noqa: E402
AnalyzeTool,
ChallengeTool,
ChatTool,
CodeReviewTool,
ConsensusTool,
@@ -274,6 +275,7 @@ TOOLS = {
"refactor": RefactorTool(), # Step-by-step refactoring analysis workflow with expert validation
"tracer": TracerTool(), # Static call path prediction and control flow analysis
"testgen": TestGenTool(), # Step-by-step test generation workflow with expert validation
"challenge": ChallengeTool(), # Critical challenge prompt wrapper to avoid automatic agreement
"listmodels": ListModelsTool(), # List all available AI models by provider
"version": VersionTool(), # Display server version and system information
}
@@ -346,6 +348,11 @@ PROMPT_TEMPLATES = {
"description": "Generate comprehensive tests",
"template": "Generate comprehensive tests with {model}",
},
"challenge": {
"name": "challenge",
"description": "Challenge a statement critically without automatic agreement",
"template": "Challenge this statement critically",
},
"listmodels": {
"name": "listmodels",
"description": "List available AI models",

tests/test_challenge.py (new file, +200 lines)
View File

@@ -0,0 +1,200 @@
"""
Tests for Challenge tool - validating critical challenge prompt wrapper
This module contains unit tests to ensure that the Challenge tool
properly wraps statements to encourage critical thinking and avoid
automatic agreement patterns.
"""
import json
from unittest.mock import patch
import pytest
from tools.challenge import ChallengeRequest, ChallengeTool
class TestChallengeTool:
"""Test suite for Challenge tool"""
def setup_method(self):
"""Set up test fixtures"""
self.tool = ChallengeTool()
def test_tool_metadata(self):
"""Test that tool metadata matches requirements"""
assert self.tool.get_name() == "challenge"
assert "CRITICAL CHALLENGE PROMPT" in self.tool.get_description()
assert "challenge it thoughtfully" in self.tool.get_description()
assert "agreeing by default" in self.tool.get_description()
assert self.tool.get_default_temperature() == 0.2 # TEMPERATURE_ANALYTICAL
def test_requires_model(self):
"""Test that challenge tool doesn't require a model"""
assert self.tool.requires_model() is False
def test_schema_structure(self):
"""Test that schema has correct structure and excludes model fields"""
schema = self.tool.get_input_schema()
# Basic schema structure
assert schema["type"] == "object"
assert "properties" in schema
assert "required" in schema
# Required fields
assert "prompt" in schema["required"]
assert len(schema["required"]) == 1 # Only prompt is required
# Properties
properties = schema["properties"]
assert "prompt" in properties
# Should NOT have model-related fields since it doesn't require a model
assert "model" not in properties
assert "temperature" not in properties
assert "thinking_mode" not in properties
assert "use_websearch" not in properties
assert "continuation_id" not in properties
def test_request_model_validation(self):
"""Test that the request model validates correctly"""
# Test valid request
request = ChallengeRequest(prompt="The sky is green")
assert request.prompt == "The sky is green"
# Test with longer prompt
long_prompt = (
"Machine learning models always produce accurate results and should be trusted without verification"
)
request = ChallengeRequest(prompt=long_prompt)
assert request.prompt == long_prompt
def test_required_fields(self):
"""Test that required fields are enforced"""
from pydantic import ValidationError
# Missing prompt should raise validation error
with pytest.raises(ValidationError):
ChallengeRequest()
@pytest.mark.asyncio
async def test_execute_success(self):
"""Test successful execution of challenge tool"""
arguments = {"prompt": "All software bugs are caused by syntax errors"}
result = await self.tool.execute(arguments)
# Should return a list with TextContent
assert len(result) == 1
assert result[0].type == "text"
# Parse the JSON response
response_data = json.loads(result[0].text)
# Check response structure
assert response_data["status"] == "challenge_created"
assert response_data["original_statement"] == "All software bugs are caused by syntax errors"
assert "challenge_prompt" in response_data
assert "instructions" in response_data
# Check that the challenge prompt contains critical thinking instructions
challenge_prompt = response_data["challenge_prompt"]
assert "CHALLENGE THIS STATEMENT - Do not automatically agree" in challenge_prompt
assert "Is this actually correct? Check carefully" in challenge_prompt
assert response_data["original_statement"] in challenge_prompt
assert "you must say so" in challenge_prompt
assert "Provide your honest assessment, not automatic agreement" in challenge_prompt
@pytest.mark.asyncio
async def test_execute_error_handling(self):
"""Test error handling in execute method"""
# Test with invalid arguments (non-dict)
with patch.object(self.tool, "get_request_model", side_effect=Exception("Test error")):
result = await self.tool.execute({"prompt": "test"})
assert len(result) == 1
response_data = json.loads(result[0].text)
assert response_data["status"] == "error"
assert "Test error" in response_data["error"]
def test_wrap_prompt_for_challenge(self):
"""Test the prompt wrapping functionality"""
original_prompt = "Python is the best programming language"
wrapped = self.tool._wrap_prompt_for_challenge(original_prompt)
# Check structure
assert "CHALLENGE THIS STATEMENT - Do not automatically agree" in wrapped
assert "Is this actually correct? Check carefully" in wrapped
assert f'"{original_prompt}"' in wrapped
assert "you must say so" in wrapped
assert "Provide your honest assessment, not automatic agreement" in wrapped
def test_multiple_prompts(self):
"""Test that tool handles various types of prompts correctly"""
test_prompts = [
"All code should be written in assembly for maximum performance",
"Comments are unnecessary if code is self-documenting",
"Testing is a waste of time for experienced developers",
"Global variables make code easier to understand",
"The more design patterns used, the better the code",
]
for prompt in test_prompts:
request = ChallengeRequest(prompt=prompt)
wrapped = self.tool._wrap_prompt_for_challenge(request.prompt)
# Each wrapped prompt should contain the original
assert prompt in wrapped
assert "CHALLENGE THIS STATEMENT" in wrapped
def test_tool_fields(self):
"""Test tool-specific field definitions"""
fields = self.tool.get_tool_fields()
assert "prompt" in fields
assert fields["prompt"]["type"] == "string"
assert "statement" in fields["prompt"]["description"]
assert "challenge" in fields["prompt"]["description"]
def test_required_fields_list(self):
"""Test required fields list"""
required = self.tool.get_required_fields()
assert required == ["prompt"]
@pytest.mark.asyncio
async def test_not_used_methods(self):
"""Test that methods not used by challenge tool work correctly"""
request = ChallengeRequest(prompt="test")
# These methods aren't used since challenge doesn't call AI
prompt = await self.tool.prepare_prompt(request)
assert prompt == ""
response = self.tool.format_response("test response", request)
assert response == "test response"
def test_special_characters_in_prompt(self):
"""Test handling of special characters in prompts"""
special_prompt = 'The "best" way to handle errors is to use try/except: pass'
request = ChallengeRequest(prompt=special_prompt)
wrapped = self.tool._wrap_prompt_for_challenge(request.prompt)
# Should handle quotes properly
assert special_prompt in wrapped
@pytest.mark.asyncio
async def test_unicode_support(self):
"""Test that tool handles unicode characters correctly"""
unicode_prompt = "软件开发中最重要的是写代码,测试不重要 🚀"
arguments = {"prompt": unicode_prompt}
result = await self.tool.execute(arguments)
response_data = json.loads(result[0].text)
assert response_data["original_statement"] == unicode_prompt
assert unicode_prompt in response_data["challenge_prompt"]
if __name__ == "__main__":
pytest.main([__file__])

View File

@@ -3,6 +3,7 @@ Tool implementations for Zen MCP Server
"""
from .analyze import AnalyzeTool
from .challenge import ChallengeTool
from .chat import ChatTool
from .codereview import CodeReviewTool
from .consensus import ConsensusTool
@@ -29,6 +30,7 @@ __all__ = [
"ListModelsTool",
"PlannerTool",
"PrecommitTool",
"ChallengeTool",
"RefactorTool",
"SecauditTool",
"TestGenTool",

tools/challenge.py (new file, +196 lines)
View File

@@ -0,0 +1,196 @@
"""
Challenge tool - Encourages critical thinking and thoughtful disagreement
This tool takes a user's statement and returns it wrapped in instructions that
encourage the CLI agent to challenge ideas and think critically before agreeing. It helps
avoid reflexive agreement by prompting deeper analysis and genuine evaluation.
This is a simple, self-contained tool that doesn't require AI model access.
"""
from typing import TYPE_CHECKING, Any, Optional
from pydantic import Field
if TYPE_CHECKING:
from tools.models import ToolModelCategory
from config import TEMPERATURE_ANALYTICAL
from tools.shared.base_models import ToolRequest
from .simple.base import SimpleTool
# Field descriptions for the Challenge tool
CHALLENGE_FIELD_DESCRIPTIONS = {
"prompt": (
"The statement, question, or assertion the user wants to challenge critically. "
"This may be a claim, suggestion, or idea that requires thoughtful reconsideration, not automatic agreement."
),
}
class ChallengeRequest(ToolRequest):
"""Request model for Challenge tool"""
prompt: str = Field(..., description=CHALLENGE_FIELD_DESCRIPTIONS["prompt"])
class ChallengeTool(SimpleTool):
"""
Challenge tool for encouraging critical thinking and avoiding automatic agreement.
This tool wraps user statements in instructions that encourage the CLI agent to:
- Challenge ideas and think critically before responding
- Evaluate whether they actually agree or disagree
- Provide thoughtful analysis rather than reflexive agreement
The tool is self-contained and doesn't require AI model access - it simply
transforms the input prompt into a structured critical thinking challenge.
"""
def get_name(self) -> str:
return "challenge"
def get_description(self) -> str:
return (
"CRITICAL CHALLENGE PROMPT Use this to frame your statement in a way that prompts "
"the CLI agent to challenge it thoughtfully instead of agreeing by default. Ideal for "
"challenging assumptions, validating ideas, and seeking honest, analytical feedback as part of an ongoing "
"task. The tool wraps your input with instructions explicitly telling the agent to think critically "
"and disagree if warranted."
)
def get_system_prompt(self) -> str:
# Challenge tool doesn't need a system prompt since it doesn't call AI
return ""
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
def get_model_category(self) -> "ToolModelCategory":
"""Challenge doesn't need a model category since it doesn't use AI"""
from tools.models import ToolModelCategory
return ToolModelCategory.FAST_RESPONSE # Default, but not used
def requires_model(self) -> bool:
"""
Challenge tool doesn't require model resolution at the MCP boundary.
Like the planner tool, this is a pure data processing tool that transforms
the input without calling external AI models.
Returns:
bool: False - challenge doesn't need AI model access
"""
return False
def get_request_model(self):
"""Return the Challenge-specific request model"""
return ChallengeRequest
def get_input_schema(self) -> dict[str, Any]:
"""
Generate input schema for the challenge tool.
Since this tool doesn't require a model, we exclude model-related fields.
"""
schema = {
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": CHALLENGE_FIELD_DESCRIPTIONS["prompt"],
},
},
"required": ["prompt"],
}
return schema
async def execute(self, arguments: dict[str, Any]) -> list:
"""
Execute the challenge tool by wrapping the prompt in critical thinking instructions.
This is the main execution method that transforms the user's statement into
a structured challenge that encourages thoughtful re-evaluation.
"""
import json
from mcp.types import TextContent
try:
# Validate request
request = self.get_request_model()(**arguments)
# Wrap the prompt in challenge instructions
wrapped_prompt = self._wrap_prompt_for_challenge(request.prompt)
# Return the wrapped prompt as the response
response_data = {
"status": "challenge_created",
"original_statement": request.prompt,
"challenge_prompt": wrapped_prompt,
"instructions": (
"Present the challenge_prompt to yourself and follow its instructions. "
"Challenge the statement critically before forming your response. "
"If you disagree after careful reconsideration, explain why."
),
}
return [TextContent(type="text", text=json.dumps(response_data, indent=2, ensure_ascii=False))]
except Exception as e:
import logging
logger = logging.getLogger(__name__)
logger.error(f"Error in challenge tool execution: {e}", exc_info=True)
error_data = {
"status": "error",
"error": str(e),
"content": f"Failed to create challenge prompt: {str(e)}",
}
return [TextContent(type="text", text=json.dumps(error_data, ensure_ascii=False))]
def _wrap_prompt_for_challenge(self, prompt: str) -> str:
"""
Wrap the user's statement in instructions that encourage critical challenge.
Args:
prompt: The original user statement to wrap
Returns:
The statement wrapped in challenge instructions
"""
return (
f"CHALLENGE THIS STATEMENT - Do not automatically agree:\n\n"
f'"{prompt}"\n\n'
f"Is this actually correct? Check carefully. If it's wrong, incomplete, misleading or incorrect, "
f"you must say so. Provide your honest assessment, not automatic agreement. If you "
f"feel there is merit in what the user is saying, explain WHY you agree."
)
# Required method implementations from SimpleTool
async def prepare_prompt(self, request: ChallengeRequest) -> str:
"""Not used since challenge doesn't call AI models"""
return ""
def format_response(self, response: str, request: ChallengeRequest, model_info: Optional[dict] = None) -> str:
"""Not used since challenge doesn't call AI models"""
return response
def get_tool_fields(self) -> dict[str, dict[str, Any]]:
"""Tool-specific field definitions for Challenge"""
return {
"prompt": {
"type": "string",
"description": CHALLENGE_FIELD_DESCRIPTIONS["prompt"],
},
}
def get_required_fields(self) -> list[str]:
"""Required fields for Challenge tool"""
return ["prompt"]