diff --git a/README.md b/README.md index b1509fe..9dcfa8a 100644 --- a/README.md +++ b/README.md @@ -383,6 +383,10 @@ security implications, flash on implementation speed, and pro stay neutral for o Use consensus tool with gemini being "for" the proposal and grok being "against" to debate whether we should adopt microservices architecture ``` +``` +I want to work on module X and Y, unsure which is going to be more popular with users of my app. +Get a consensus from gemini supporting the idea for implementing X, grok opposing it, and flash staying neutral +``` **Key Features:** - **Stance steering**: Assign specific perspectives (for/against/neutral) to each model with intelligent synonym handling @@ -408,20 +412,6 @@ whether we should adopt microservices architecture - `use_websearch`: Enable research for enhanced analysis (default: true) - `continuation_id`: Continue previous consensus discussions -**Example Natural Language Model Specifications:** -```json -[ - {"model": "o3", "stance": "for", "stance_prompt": "Focus on technical benefits and implementation feasibility"}, - {"model": "flash", "stance": "against", "stance_prompt": "Identify risks, costs, and potential downsides"}, - {"model": "pro", "stance": "neutral"} -] -``` - -**Or simply use natural language:** -``` -"Have gemini support the idea, grok oppose it, and flash stay neutral" -``` - ### 4. `codereview` - Professional Code Review **Comprehensive code analysis with prioritized feedback** diff --git a/docs/adding_tools.md b/docs/adding_tools.md index 307000c..161a57d 100644 --- a/docs/adding_tools.md +++ b/docs/adding_tools.md @@ -1,355 +1,333 @@ -# Adding a New Tool +# Adding a New Tool to Zen MCP Server -This guide explains how to add a new tool to the Zen MCP Server. Tools are the primary way Claude interacts with the AI models, providing specialized capabilities like code review, debugging, test generation, and more. +This guide provides step-by-step instructions for adding new tools to the Zen MCP Server. Tools are specialized interfaces that let Claude interact with AI models for specific tasks like code review, debugging, consensus gathering, and more. -## Overview +## Quick Overview -The tool system in Zen MCP Server is designed to be extensible. Each tool: -- Inherits from the `BaseTool` class -- Implements required abstract methods -- Defines a request model for parameter validation -- Is registered in the server's tool registry -- Can leverage different AI models based on task requirements +Every tool must: +- Inherit from `BaseTool` and implement 6 abstract methods +- Define a Pydantic request model for validation +- Create a system prompt in `systemprompts/` +- Register in `server.py` +- Handle file/image inputs and conversation threading -## Architecture Overview +**Key Features**: Automatic conversation threading, file deduplication, token management, model-specific capabilities, web search integration, and comprehensive error handling. -### Key Components +## Core Architecture -1. **BaseTool** (`tools/base.py`): Abstract base class providing common functionality -2. **Request Models**: Pydantic models for input validation -3. **System Prompts**: Specialized prompts that configure AI behavior -4. **Tool Registry**: Registration system in `server.py` +### Components +1. **BaseTool** (`tools/base.py`): Abstract base with conversation memory, file handling, and model management +2. 
**Request Models**: Pydantic validation with common fields (model, temperature, thinking_mode, continuation_id, images, use_websearch) +3. **System Prompts**: AI behavior configuration with placeholders for dynamic content +4. **Model Context**: Automatic provider resolution and token allocation -### Tool Lifecycle - -1. Claude calls the tool with parameters -2. Parameters are validated using Pydantic -3. File paths are security-checked -4. Prompt is prepared with system instructions -5. AI model generates response -6. Response is formatted and returned +### Execution Flow +1. **MCP Boundary**: Parameter validation, file security checks, image validation +2. **Model Resolution**: Automatic provider selection and capability checking +3. **Conversation Context**: History reconstruction and file deduplication +4. **Prompt Preparation**: System prompt + user content + file content + conversation history +5. **AI Generation**: Provider-agnostic model calls with retry logic +6. **Response Processing**: Format output, offer continuation, store in conversation memory ## Step-by-Step Implementation Guide ### 1. Create the Tool File -Create a new file in the `tools/` directory (e.g., `tools/example.py`): +Create `tools/example.py` with proper imports and structure: ```python """ -Example tool - Brief description of what your tool does +Example tool - Intelligent code analysis and recommendations -This tool provides [specific functionality] to help developers [achieve goal]. -Key features: -- Feature 1 -- Feature 2 -- Feature 3 +This tool provides comprehensive code analysis including style, performance, +and maintainability recommendations for development teams. """ -import logging -from typing import Any, Optional - -from mcp.types import TextContent +from typing import TYPE_CHECKING, Any, Optional from pydantic import Field +if TYPE_CHECKING: + from tools.models import ToolModelCategory + from config import TEMPERATURE_BALANCED from systemprompts import EXAMPLE_PROMPT # You'll create this from .base import BaseTool, ToolRequest -from .models import ToolOutput -logger = logging.getLogger(__name__) +# No need to import ToolOutput or logging - handled by base class ``` +**Key Points:** +- Use `TYPE_CHECKING` import for ToolModelCategory to avoid circular imports +- Import temperature constants from `config.py` +- System prompt imported from `systemprompts/` +- Base class handles all common functionality + ### 2. Define the Request Model -Create a Pydantic model that inherits from `ToolRequest`: +Create a Pydantic model inheriting from `ToolRequest`: ```python class ExampleRequest(ToolRequest): - """Request model for the example tool.""" + """Request model for example tool.""" - # Required parameters + # Required field - main user input prompt: str = Field( ..., - description="The main input/question for the tool" + description=( + "Detailed description of the code analysis needed. Include specific areas " + "of concern, goals, and any constraints. The more context provided, " + "the more targeted and valuable the analysis will be." 
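+            # Detailed descriptions like this are what Claude reads when deciding
+            # whether and how to fill the parameter, so keep them specific.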
+ ) ) - # Optional parameters with defaults + # Optional file input with proper default files: Optional[list[str]] = Field( - default=None, - description="Files to analyze (must be absolute paths)" + default_factory=list, # Use factory for mutable defaults + description="Code files to analyze (must be absolute paths)" ) - focus_area: Optional[str] = Field( - default=None, - description="Specific aspect to focus on" + # Tool-specific parameters + analysis_depth: Optional[str] = Field( + default="standard", + description="Analysis depth: 'quick', 'standard', or 'comprehensive'" ) - # You can add tool-specific parameters - output_format: Optional[str] = Field( - default="detailed", - description="Output format: 'summary', 'detailed', or 'actionable'" + focus_areas: Optional[list[str]] = Field( + default_factory=list, + description="Specific areas to focus on (e.g., 'performance', 'security', 'maintainability')" ) - # New features - images and web search support - images: Optional[list[str]] = Field( - default=None, - description="Optional images for visual context (file paths or base64 data URLs)" - ) - - use_websearch: Optional[bool] = Field( - default=True, - description="Enable web search for documentation and current information" - ) + # Images field inherited from ToolRequest - no need to redefine + # use_websearch field inherited from ToolRequest - no need to redefine + # continuation_id field inherited from ToolRequest - no need to redefine ``` +**Key Points:** +- Use `default_factory=list` for mutable defaults (not `default=None`) +- Common fields (images, use_websearch, continuation_id, model, temperature) are inherited +- Detailed descriptions help Claude understand when/how to use parameters +- Focus on tool-specific parameters only + ### 3. Implement the Tool Class +Implement the 6 required abstract methods: + ```python class ExampleTool(BaseTool): - """Implementation of the example tool.""" + """Intelligent code analysis and recommendations tool.""" def get_name(self) -> str: - """Return the tool's unique identifier.""" + """Return unique tool identifier (used by MCP clients).""" return "example" def get_description(self) -> str: - """Return detailed description for Claude.""" + """Return detailed description to help Claude understand when to use this tool.""" return ( - "EXAMPLE TOOL - Brief tagline describing the tool's purpose. " - "Use this tool when you need to [specific use cases]. " - "Perfect for: [scenario 1], [scenario 2], [scenario 3]. " - "Supports [key features]. Choose thinking_mode based on " - "[guidance for mode selection]. " - "Note: If you're not currently using a top-tier model such as " - "Opus 4 or above, these tools can provide enhanced capabilities." + "CODE ANALYSIS & RECOMMENDATIONS - Provides comprehensive code analysis including " + "style improvements, performance optimizations, and maintainability suggestions. " + "Perfect for: code reviews, refactoring planning, performance analysis, best practices " + "validation. Supports multi-file analysis with focus areas. Use 'comprehensive' analysis " + "for complex codebases, 'standard' for regular reviews, 'quick' for simple checks." ) def get_input_schema(self) -> dict[str, Any]: - """Define the JSON schema for tool parameters.""" + """Generate JSON schema - inherit common fields from base class.""" schema = { "type": "object", "properties": { "prompt": { "type": "string", - "description": "The main input/question for the tool", + "description": ( + "Detailed description of the code analysis needed. 
Include specific areas " + "of concern, goals, and any constraints." + ), }, "files": { "type": "array", "items": {"type": "string"}, - "description": "Files to analyze (must be absolute paths)", + "description": "Code files to analyze (must be absolute paths)", }, - "focus_area": { + "analysis_depth": { "type": "string", - "description": "Specific aspect to focus on", + "enum": ["quick", "standard", "comprehensive"], + "description": "Analysis depth level", + "default": "standard", }, - "output_format": { - "type": "string", - "enum": ["summary", "detailed", "actionable"], - "description": "Output format type", - "default": "detailed", + "focus_areas": { + "type": "array", + "items": {"type": "string"}, + "description": "Specific areas to focus on (e.g., 'performance', 'security')", }, + # Common fields added automatically by base class "model": self.get_model_field_schema(), "temperature": { "type": "number", - "description": "Temperature (0-1, default varies by tool)", + "description": "Response creativity (0-1, default varies by tool)", "minimum": 0, "maximum": 1, }, "thinking_mode": { "type": "string", "enum": ["minimal", "low", "medium", "high", "max"], - "description": "Thinking depth: minimal (0.5% of model max), " - "low (8%), medium (33%), high (67%), max (100%)", + "description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100%)", }, "use_websearch": { "type": "boolean", - "description": "Enable web search for documentation and current information", + "description": "Enable web search for current best practices and documentation", "default": True, }, "images": { "type": "array", "items": {"type": "string"}, - "description": "Optional images for visual context", + "description": "Optional screenshots or diagrams for visual context", }, "continuation_id": { "type": "string", "description": "Thread continuation ID for multi-turn conversations", }, }, - "required": ["prompt"] + ( - ["model"] if self.is_effective_auto_mode() else [] - ), + "required": ["prompt"] + (["model"] if self.is_effective_auto_mode() else []), } return schema def get_system_prompt(self) -> str: - """Return the system prompt for this tool.""" - return EXAMPLE_PROMPT # Defined in systemprompts/ - - def get_default_temperature(self) -> float: - """Return default temperature for this tool.""" - # Use predefined constants from config.py: - # TEMPERATURE_CREATIVE (0.7) - For creative tasks - # TEMPERATURE_BALANCED (0.5) - For balanced tasks - # TEMPERATURE_ANALYTICAL (0.2) - For analytical tasks - return TEMPERATURE_BALANCED - - def get_model_category(self): - """Specify which type of model this tool needs.""" - from tools.models import ToolModelCategory - - # Choose based on your tool's needs: - # FAST_RESPONSE - Quick responses, cost-efficient (chat, simple queries) - # BALANCED - Standard analysis and generation - # EXTENDED_REASONING - Complex analysis, deep thinking (debug, review) - return ToolModelCategory.BALANCED + """Return system prompt that configures AI behavior.""" + return EXAMPLE_PROMPT def get_request_model(self): - """Return the request model class.""" + """Return Pydantic request model class for validation.""" return ExampleRequest - def wants_line_numbers_by_default(self) -> bool: - """Whether to add line numbers to code files.""" - # Return True if your tool benefits from precise line references - # (e.g., code review, debugging, refactoring) - # Return False for general analysis or token-sensitive operations - return False - async def 
prepare_prompt(self, request: ExampleRequest) -> str: - """ - Prepare the complete prompt for the AI model. - - This method combines: - - System prompt (behavior configuration) - - User request - - File contents (if provided) - - Additional context - """ - # Check for prompt.txt in files (handles large prompts) + """Prepare complete prompt with user request + file content + context.""" + # Handle large prompts via prompt.txt file mechanism prompt_content, updated_files = self.handle_prompt_file(request.files) - if prompt_content: - request.prompt = prompt_content + user_content = prompt_content if prompt_content else request.prompt + + # Check MCP transport size limits on user input + size_check = self.check_prompt_size(user_content) + if size_check: + from tools.models import ToolOutput + raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}") + + # Update files list if prompt.txt was found if updated_files is not None: request.files = updated_files - # Build the prompt parts - prompt_parts = [] + # Add focus areas to user content + if request.focus_areas: + focus_text = "\n\nFocus areas: " + ", ".join(request.focus_areas) + user_content += focus_text - # Add main request - prompt_parts.append(f"=== USER REQUEST ===") - prompt_parts.append(f"Focus Area: {request.focus_area}" if request.focus_area else "") - prompt_parts.append(f"Output Format: {request.output_format}") - prompt_parts.append(request.prompt) - prompt_parts.append("=== END REQUEST ===") - - # Add file contents if provided + # Add file content using centralized handler (handles deduplication & token limits) if request.files: - # Use the centralized file handling (respects continuation) - file_content = self._prepare_file_content_for_prompt( - request.files, - request.continuation_id, - "Files to analyze" + file_content, processed_files = self._prepare_file_content_for_prompt( + request.files, request.continuation_id, "Code files" ) + self._actually_processed_files = processed_files # For conversation memory if file_content: - prompt_parts.append("\n=== FILES ===") - prompt_parts.append(file_content) - prompt_parts.append("=== END FILES ===") + user_content = f"{user_content}\n\n=== CODE FILES ===\n{file_content}\n=== END FILES ===" - # Validate token limits - full_prompt = "\n".join(filter(None, prompt_parts)) - self._validate_token_limit(full_prompt, "Prompt") + # Validate final prompt doesn't exceed model context window + self._validate_token_limit(user_content, "Prompt content") - return full_prompt + # Add web search instruction if enabled + websearch_instruction = self.get_websearch_instruction( + request.use_websearch, + """Consider searching for: +- Current best practices for the technologies used +- Recent security advisories or performance improvements +- Community solutions to similar code patterns""" + ) + + return f"""{self.get_system_prompt()}{websearch_instruction} + +=== ANALYSIS REQUEST === +Analysis Depth: {request.analysis_depth} + +{user_content} +=== END REQUEST === + +Provide comprehensive code analysis with specific, actionable recommendations:""" + + # Optional: Override these methods for customization + def get_default_temperature(self) -> float: + return TEMPERATURE_BALANCED # 0.5 - good for analytical tasks - def format_response(self, response: str, request: ExampleRequest, - model_info: Optional[dict] = None) -> str: - """ - Format the AI's response for display. - - Override this to add custom formatting, headers, or structure. 
- The base class handles special status parsing automatically. - """ - # Example: Add a footer with next steps - return f"{response}\n\n---\n\n**Next Steps:** Review the analysis above and proceed with implementation." + def get_model_category(self) -> "ToolModelCategory": + from tools.models import ToolModelCategory + return ToolModelCategory.BALANCED # Standard analysis capabilities + + def wants_line_numbers_by_default(self) -> bool: + return True # Essential for precise code feedback + + def format_response(self, response: str, request: ExampleRequest, model_info: Optional[dict] = None) -> str: + """Add custom formatting - base class handles continuation offers automatically.""" + return f"{response}\n\n---\n\n**Next Steps:** Review recommendations and prioritize implementation based on impact." ``` -### 4. Handle Large Prompts (Optional) +**Key Changes from Documentation:** +- **Schema Inheritance**: Common fields handled by base class automatically +- **MCP Size Checking**: Required for large prompt handling +- **File Processing**: Use `_prepare_file_content_for_prompt()` for conversation-aware deduplication +- **Error Handling**: `check_prompt_size()` and `_validate_token_limit()` prevent crashes +- **Web Search**: Use `get_websearch_instruction()` for consistent implementation -If your tool might receive large text inputs, override the `execute` method: +### 4. Create the System Prompt + +Create `systemprompts/example_prompt.py`: ```python -async def execute(self, arguments: dict[str, Any]) -> list[TextContent]: - """Override to check prompt size before processing.""" - # Validate request first - request_model = self.get_request_model() - request = request_model(**arguments) - - # Check if prompt is too large for MCP limits - size_check = self.check_prompt_size(request.prompt) - if size_check: - return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())] - - # Continue with normal execution - return await super().execute(arguments) +"""System prompt for the example code analysis tool.""" + +EXAMPLE_PROMPT = """You are an expert code analyst and software engineering consultant specializing in comprehensive code review and optimization recommendations. + +Your analysis should cover: + +TECHNICAL ANALYSIS: +- Code structure, organization, and architectural patterns +- Performance implications and optimization opportunities +- Security vulnerabilities and defensive programming practices +- Maintainability factors and technical debt assessment +- Best practices adherence and industry standards compliance + +RECOMMENDATIONS FORMAT: +1. **Critical Issues** - Security, bugs, or breaking problems (fix immediately) +2. **Performance Optimizations** - Specific improvements with expected impact +3. **Code Quality Improvements** - Maintainability, readability, and structure +4. **Best Practices** - Industry standards and modern patterns +5. **Future Considerations** - Scalability and extensibility suggestions + +ANALYSIS GUIDELINES: +- Reference specific line numbers when discussing code (file:line format) +- Provide concrete, actionable recommendations with examples +- Explain the "why" behind each suggestion +- Consider the broader system context and trade-offs +- Prioritize suggestions by impact and implementation difficulty + +Be precise, practical, and constructive in your analysis. Focus on improvements that provide tangible value to the development team.""" ``` -### 5. 
Create the System Prompt - -Create a new file in `systemprompts/` (e.g., `systemprompts/example_prompt.py`): - -```python -"""System prompt for the example tool.""" - -EXAMPLE_PROMPT = """You are an AI assistant specialized in [tool purpose]. - -Your role is to [primary responsibility] by [approach/methodology]. - -Key principles: -1. [Principle 1] -2. [Principle 2] -3. [Principle 3] - -When analyzing content: -- [Guideline 1] -- [Guideline 2] -- [Guideline 3] - -Output format: -- Start with a brief summary -- Provide detailed analysis organized by [structure] -- Include specific examples and recommendations -- End with actionable next steps - -Remember to: -- Be specific and reference exact locations (file:line) when discussing code -- Provide practical, implementable suggestions -- Consider the broader context and implications -- Maintain a helpful, constructive tone -""" -``` - -Add the import to `systemprompts/__init__.py`: - +**Add to `systemprompts/__init__.py`:** ```python from .example_prompt import EXAMPLE_PROMPT ``` -### 6. Register the Tool +**Key Elements:** +- Clear role definition and expertise area +- Structured output format that's useful for developers +- Specific guidelines for code references and explanations +- Focus on actionable, prioritized recommendations -#### 6.1. Import in server.py - -Add the import at the top of `server.py`: +### 5. Register the Tool +**Step 5.1: Import in `server.py`** ```python from tools.example import ExampleTool ``` -#### 6.2. Add to TOOLS Dictionary - -Find the `TOOLS` dictionary in `server.py` and add your tool: - +**Step 5.2: Add to TOOLS dictionary in `server.py`** ```python TOOLS = { "thinkdeep": ThinkDeepTool(), @@ -357,18 +335,20 @@ TOOLS = { "debug": DebugIssueTool(), "analyze": AnalyzeTool(), "chat": ChatTool(), - "listmodels": ListModelsTool(), - "precommit": Precommit(), - "testgen": TestGenerationTool(), - "refactor": RefactorTool(), - "tracer": TracerTool(), "example": ExampleTool(), # Add your tool here + # ... other tools } ``` -### 7. Write Tests +**That's it!** The server automatically: +- Exposes the tool via MCP protocol +- Handles request validation and routing +- Manages model resolution and provider selection +- Implements conversation threading and file deduplication -Create unit tests in `tests/test_example.py`: +### 6. 
Write Tests + +Create `tests/test_example.py`: ```python """Tests for the example tool.""" @@ -384,466 +364,294 @@ class TestExampleTool: """Test suite for ExampleTool.""" def test_tool_metadata(self): - """Test tool metadata methods.""" + """Test basic tool metadata and configuration.""" tool = ExampleTool() assert tool.get_name() == "example" - assert "EXAMPLE TOOL" in tool.get_description() - assert tool.get_default_temperature() == 0.5 + assert "CODE ANALYSIS" in tool.get_description() + assert tool.get_default_temperature() == 0.5 # TEMPERATURE_BALANCED assert tool.get_model_category() == ToolModelCategory.BALANCED + assert tool.wants_line_numbers_by_default() is True def test_request_validation(self): - """Test request model validation.""" + """Test Pydantic request model validation.""" # Valid request - request = ExampleRequest(prompt="Test prompt") - assert request.prompt == "Test prompt" - assert request.output_format == "detailed" # default + request = ExampleRequest(prompt="Analyze this code for performance issues") + assert request.prompt == "Analyze this code for performance issues" + assert request.analysis_depth == "standard" # default + assert request.focus_areas == [] # default_factory # Invalid request (missing required field) with pytest.raises(ValueError): - ExampleRequest() + ExampleRequest() # Missing prompt - def test_input_schema(self): - """Test input schema generation.""" + def test_input_schema_generation(self): + """Test JSON schema generation for MCP client.""" tool = ExampleTool() schema = tool.get_input_schema() assert schema["type"] == "object" assert "prompt" in schema["properties"] assert "prompt" in schema["required"] + assert "analysis_depth" in schema["properties"] + + # Common fields should be present assert "model" in schema["properties"] + assert "continuation_id" in schema["properties"] + assert "images" in schema["properties"] + + def test_model_category_for_auto_mode(self): + """Test model category affects auto mode selection.""" + tool = ExampleTool() + category = tool.get_model_category() + + # Should match expected category for provider selection + assert category == ToolModelCategory.BALANCED @pytest.mark.asyncio - async def test_prepare_prompt(self): - """Test prompt preparation.""" + async def test_prepare_prompt_basic(self): + """Test prompt preparation with basic input.""" tool = ExampleTool() request = ExampleRequest( - prompt="Analyze this code", - focus_area="performance", - output_format="summary" + prompt="Review this code", + analysis_depth="comprehensive", + focus_areas=["performance", "security"] ) - with patch.object(tool, '_validate_token_limit'): - prompt = await tool.prepare_prompt(request) - - assert "USER REQUEST" in prompt - assert "Analyze this code" in prompt - assert "Focus Area: performance" in prompt - assert "Output Format: summary" in prompt - - @pytest.mark.asyncio - async def test_file_handling(self): - """Test file content handling.""" - tool = ExampleTool() - request = ExampleRequest( - prompt="Analyze", - files=["/path/to/file.py"] - ) - - # Mock file reading - with patch.object(tool, '_prepare_file_content_for_prompt') as mock_prep: - mock_prep.return_value = "file contents" + # Mock validation methods + with patch.object(tool, 'check_prompt_size', return_value=None): with patch.object(tool, '_validate_token_limit'): - prompt = await tool.prepare_prompt(request) + with patch.object(tool, 'get_websearch_instruction', return_value=""): + prompt = await tool.prepare_prompt(request) - assert "FILES" in prompt - 
assert "file contents" in prompt -``` - -### 8. Add Simulator Tests (Optional) - -For tools that interact with external systems, create simulator tests in `simulator_tests/test_example_basic.py`: - -```python -"""Basic simulator test for example tool.""" - -from simulator_tests.base_test import SimulatorTest - - -class TestExampleBasic(SimulatorTest): - """Test basic example tool functionality.""" + assert "Review this code" in prompt + assert "performance, security" in prompt + assert "comprehensive" in prompt + assert "ANALYSIS REQUEST" in prompt - def test_example_analysis(self): - """Test basic analysis with example tool.""" - result = self.call_tool( - "example", - { - "prompt": "Analyze the architecture of this codebase", - "model": "flash", - "output_format": "summary" - } + @pytest.mark.asyncio + async def test_file_handling_with_deduplication(self): + """Test file processing with conversation-aware deduplication.""" + tool = ExampleTool() + request = ExampleRequest( + prompt="Analyze these files", + files=["/path/to/file1.py", "/path/to/file2.py"], + continuation_id="test-thread-123" ) - self.assert_tool_success(result) - self.assert_content_contains(result, ["architecture", "summary"]) -``` - -### 9. Update Documentation - -Add your tool to the README.md in the tools section: - -```markdown -### Available Tools - -- **thinkdeep** - Extended thinking and reasoning for complex problems -- **codereview** - Professional code review with bug and security analysis -- **debug** - Debug and root cause analysis for complex issues -- **analyze** - General-purpose file and code analysis -- **chat** - General chat and collaborative thinking -- **listmodels** - List all available AI models and their capabilities -- **precommit** - Pre-commit validation for git changes -- **testgen** - Comprehensive test generation with edge cases -- **refactor** - Intelligent code refactoring suggestions -- **tracer** - Static analysis for tracing code execution paths -- **example** - Brief description of what the tool does - - Use cases: [scenario 1], [scenario 2] - - Supports: [key features] - - Best model: `balanced` category for standard analysis -``` - -## Advanced Features - -### Token Budget Management - -The server provides a `_remaining_tokens` parameter that tools can use for dynamic content allocation: - -```python -# In execute method, you receive remaining tokens: -async def execute(self, arguments: dict[str, Any]) -> list[TextContent]: - # Access remaining tokens if provided - remaining_tokens = arguments.get('_remaining_tokens') + # Mock file processing + with patch.object(tool, 'check_prompt_size', return_value=None): + with patch.object(tool, '_validate_token_limit'): + with patch.object(tool, 'get_websearch_instruction', return_value=""): + with patch.object(tool, '_prepare_file_content_for_prompt') as mock_prep: + mock_prep.return_value = ("file content", ["/path/to/file1.py"]) + + prompt = await tool.prepare_prompt(request) + + # Should call centralized file handler with continuation_id + mock_prep.assert_called_once_with( + ["/path/to/file1.py", "/path/to/file2.py"], + "test-thread-123", + "Code files" + ) + + assert "CODE FILES" in prompt + assert "file content" in prompt - # Use for file content preparation - file_content = self._prepare_file_content_for_prompt( - files, - continuation_id, - "Analysis files", - max_tokens=remaining_tokens - 5000 # Reserve for response - ) -``` - -### Understanding Conversation Memory - -The `continuation_id` feature enables multi-turn conversations 
using the conversation memory system (`utils/conversation_memory.py`). Here's how it works: - -1. **Thread Creation**: When a tool wants to enable follow-up conversations, it creates a thread -2. **Turn Storage**: Each exchange (user/assistant) is stored as a turn with metadata -3. **Cross-Tool Continuation**: Any tool can continue a conversation started by another tool -4. **Automatic History**: When `continuation_id` is provided, the full conversation history is reconstructed - -Key concepts: -- **ThreadContext**: Contains all conversation turns, files, and metadata -- **ConversationTurn**: Single exchange with role, content, timestamp, files, tool attribution -- **Thread Chains**: Conversations can have parent threads for extended discussions -- **Turn Limits**: Default 20 turns (configurable via MAX_CONVERSATION_TURNS) - -Example flow: -```python -# Tool A creates thread -thread_id = create_thread("analyze", request_data) - -# Tool A adds its response -add_turn(thread_id, "assistant", response, files=[...], tool_name="analyze") - -# Tool B continues the same conversation -context = get_thread(thread_id) # Gets full history -# Tool B sees all previous turns and files -``` - -### Supporting Special Response Types - -Tools can return special status responses for complex interactions. These are defined in `tools/models.py`: - -```python -# Currently supported special statuses: -SPECIAL_STATUS_MODELS = { - "clarification_required": ClarificationRequest, - "full_codereview_required": FullCodereviewRequired, - "focused_review_required": FocusedReviewRequired, - "test_sample_needed": TestSampleNeeded, - "more_tests_required": MoreTestsRequired, - "refactor_analysis_complete": RefactorAnalysisComplete, - "trace_complete": TraceComplete, - "resend_prompt": ResendPromptRequest, - "code_too_large": CodeTooLargeRequest, -} -``` - -Example implementation: -```python -# In your tool's format_response or within the AI response: -if need_clarification: - return json.dumps({ - "status": "need_clarification", - "questions": ["What specific aspect should I focus on?"], - "context": "I need more information to proceed" - }) - -# For custom review status: -if more_analysis_needed: - return json.dumps({ - "status": "focused_review_required", - "files": ["/path/to/file1.py", "/path/to/file2.py"], - "focus": "security", - "reason": "Found potential SQL injection vulnerabilities" - }) -``` - -To add a new custom response type: - -1. Define the model in `tools/models.py`: -```python -class CustomStatusModel(BaseModel): - """Model for custom status responses""" - status: Literal["custom_status"] - custom_field: str - details: dict[str, Any] -``` - -2. Register it in `SPECIAL_STATUS_MODELS`: -```python -SPECIAL_STATUS_MODELS = { - # ... existing statuses ... - "custom_status": CustomStatusModel, -} -``` - -3. 
The base tool will automatically handle parsing and validation - -### Token Management - -For tools processing large amounts of data: - -```python -# Calculate available tokens dynamically -def prepare_large_content(self, files: list[str], remaining_budget: int): - # Reserve tokens for response - reserve_tokens = 5000 + @pytest.mark.asyncio + async def test_prompt_file_handling(self): + """Test prompt.txt file handling for large inputs.""" + tool = ExampleTool() + request = ExampleRequest( + prompt="small prompt", # Will be replaced + files=["/path/to/prompt.txt", "/path/to/other.py"] + ) + + # Mock prompt.txt handling + with patch.object(tool, 'handle_prompt_file') as mock_handle: + mock_handle.return_value = ("Large prompt content from file", ["/path/to/other.py"]) + with patch.object(tool, 'check_prompt_size', return_value=None): + with patch.object(tool, '_validate_token_limit'): + with patch.object(tool, 'get_websearch_instruction', return_value=""): + with patch.object(tool, '_prepare_file_content_for_prompt', return_value=("", [])): + prompt = await tool.prepare_prompt(request) + + assert "Large prompt content from file" in prompt + mock_handle.assert_called_once() - # Use model-specific limits - effective_max = remaining_budget - reserve_tokens + def test_format_response_customization(self): + """Test custom response formatting.""" + tool = ExampleTool() + request = ExampleRequest(prompt="test") + + formatted = tool.format_response("Analysis complete", request) + + assert "Analysis complete" in formatted + assert "Next Steps:" in formatted + assert "prioritize implementation" in formatted + + +# Integration test (requires actual model context) +class TestExampleToolIntegration: + """Integration tests that require full tool setup.""" - # Process files with budget - content = self._prepare_file_content_for_prompt( - files, - continuation_id, - "Analysis files", - max_tokens=effective_max, - reserve_tokens=reserve_tokens - ) + def setup_method(self): + """Set up model context for integration tests.""" + # Initialize model context for file processing + from utils.model_context import ModelContext + self.tool = ExampleTool() + self.tool._model_context = ModelContext("flash") # Test model + + @pytest.mark.asyncio + async def test_full_prompt_preparation(self): + """Test complete prompt preparation flow.""" + request = ExampleRequest( + prompt="Analyze this codebase for security issues", + analysis_depth="comprehensive", + focus_areas=["security", "performance"] + ) + + # Mock file system and validation + with patch.object(self.tool, 'check_prompt_size', return_value=None): + with patch.object(self.tool, '_validate_token_limit'): + with patch.object(self.tool, 'get_websearch_instruction', return_value="\nWEB_SEARCH_ENABLED"): + prompt = await self.tool.prepare_prompt(request) + + # Verify complete prompt structure + assert self.tool.get_system_prompt() in prompt + assert "WEB_SEARCH_ENABLED" in prompt + assert "security, performance" in prompt + assert "comprehensive" in prompt + assert "ANALYSIS REQUEST" in prompt ``` -### Web Search Integration +**Key Testing Patterns:** +- **Metadata Tests**: Verify tool configuration and schema generation +- **Validation Tests**: Test Pydantic request models and edge cases +- **Prompt Tests**: Mock external dependencies, test prompt composition +- **Integration Tests**: Test full flow with model context +- **File Handling**: Test conversation-aware deduplication +- **Error Cases**: Test size limits, validation failures -Enable web search for tools 
that benefit from current information: +## Essential Gotchas & Best Practices + +### Critical Requirements + +**🚨 MUST DO:** +1. **Inherit from ToolRequest**: Request models MUST inherit from `ToolRequest` to get common fields +2. **Use `default_factory=list`**: For mutable defaults, never use `default=[]` - causes shared state bugs +3. **Implement all 6 abstract methods**: `get_name()`, `get_description()`, `get_input_schema()`, `get_system_prompt()`, `get_request_model()`, `prepare_prompt()` +4. **Handle MCP size limits**: Call `check_prompt_size()` on user input in `prepare_prompt()` +5. **Use centralized file processing**: Call `_prepare_file_content_for_prompt()` for conversation-aware deduplication +6. **Register in server.py**: Import tool and add to `TOOLS` dictionary + +**🚨 COMMON MISTAKES:** +- **Forgetting TYPE_CHECKING**: Import `ToolModelCategory` under `TYPE_CHECKING` to avoid circular imports +- **Hardcoding models**: Use `get_model_category()` instead of hardcoding model selection +- **Ignoring continuation_id**: File processing should pass `continuation_id` for deduplication +- **Missing error handling**: Always validate token limits with `_validate_token_limit()` +- **Wrong default patterns**: Use `default_factory=list` not `default=None` for file lists + +### File Handling Patterns ```python -# In prepare_prompt: -websearch_instruction = self.get_websearch_instruction( - request.use_websearch, - """Consider searching for: - - Current best practices for [topic] - - Recent updates to [technology] - - Community solutions for [problem]""" +# ✅ CORRECT: Conversation-aware file processing +file_content, processed_files = self._prepare_file_content_for_prompt( + request.files, request.continuation_id, "Context files" ) +self._actually_processed_files = processed_files # For conversation memory -full_prompt = f"{system_prompt}{websearch_instruction}\n\n{user_content}" +# ❌ WRONG: Direct file reading (no deduplication) +file_content = read_files(request.files) ``` -### Image Support - -Tools can now accept images for visual context: +### Request Model Patterns ```python -# In your request model: -images: Optional[list[str]] = Field( - None, - description="Optional images for visual context" -) - -# In prepare_prompt: -if request.images: - # Images are automatically validated and processed by base class - # They will be included in the prompt sent to the model - pass +# ✅ CORRECT: Proper defaults and inheritance +class MyToolRequest(ToolRequest): + files: Optional[list[str]] = Field(default_factory=list, ...) + options: Optional[list[str]] = Field(default_factory=list, ...) + +# ❌ WRONG: Shared mutable defaults +class MyToolRequest(ToolRequest): + files: Optional[list[str]] = Field(default=[], ...) # BUG! ``` -Image validation includes: -- Size limits based on model capabilities -- Format validation (PNG, JPEG, GIF, WebP) -- Automatic base64 encoding for file paths -- Model-specific image count limits +### Testing Requirements -## Best Practices +**Required Tests:** +- Tool metadata (name, description, category) +- Request validation (valid/invalid cases) +- Schema generation for MCP +- Prompt preparation with mocks +- File handling with conversation IDs +- Error cases (size limits, validation failures) -1. **Clear Tool Descriptions**: Write descriptive text that helps Claude understand when to use your tool -2. **Proper Validation**: Use Pydantic models for robust input validation -3. **Security First**: Always validate file paths are absolute -4. 
**Token Awareness**: Handle large inputs gracefully with prompt.txt mechanism -5. **Model Selection**: Choose appropriate model category for your tool's complexity -6. **Line Numbers**: Enable for tools needing precise code references -7. **Error Handling**: Provide helpful error messages for common issues -8. **Testing**: Write comprehensive unit tests and simulator tests -9. **Documentation**: Include examples and use cases in your description +### Model Categories Guide -## Common Pitfalls to Avoid +- **FAST_RESPONSE**: Chat, simple queries, quick tasks (→ o4-mini, flash) +- **BALANCED**: Standard analysis, code review, general tasks (→ o3-mini, pro) +- **EXTENDED_REASONING**: Complex debugging, deep analysis (→ o3, pro with high thinking) -1. **Don't Skip Validation**: Always validate inputs, especially file paths -2. **Don't Ignore Token Limits**: Use `_validate_token_limit` and handle large prompts -3. **Don't Hardcode Models**: Use model categories for flexibility -4. **Don't Forget Tests**: Every tool needs tests for reliability -5. **Don't Break Conventions**: Follow existing patterns from other tools -6. **Don't Overlook Images**: Validate image limits based on model capabilities -7. **Don't Waste Tokens**: Use remaining_tokens budget for efficient allocation +### Advanced Features -## Testing Your Tool +**Conversation Threading**: Automatic if `continuation_id` provided +**File Deduplication**: Automatic via `_prepare_file_content_for_prompt()` +**Web Search**: Use `get_websearch_instruction()` for consistent implementation +**Image Support**: Inherited from ToolRequest, validated automatically +**Large Prompts**: Handle via `check_prompt_size()` → prompt.txt mechanism -### Manual Testing +## Quick Checklist -1. Start the server with your tool registered -2. Use Claude Desktop to call your tool -3. Test various parameter combinations -4. 
Verify error handling - -### Automated Testing +**Before Submitting PR:** +- [ ] Tool inherits from `BaseTool`, request from `ToolRequest` +- [ ] All 6 abstract methods implemented +- [ ] System prompt created in `systemprompts/` +- [ ] Tool registered in `server.py` TOOLS dict +- [ ] Comprehensive unit tests written +- [ ] File handling uses `_prepare_file_content_for_prompt()` +- [ ] MCP size checking with `check_prompt_size()` +- [ ] Token validation with `_validate_token_limit()` +- [ ] Proper model category selected +- [ ] No hardcoded model names +**Run Before Commit:** ```bash -# Run unit tests +# Test your tool pytest tests/test_example.py -xvs -# Run all tests to ensure no regressions -pytest -xvs - -# Run simulator tests if applicable -python communication_simulator_test.py +# Run all tests +./code_quality_checks.sh ``` -## Checklist +## Complete Example -Before submitting your PR: +The example tool we built provides: +- **Comprehensive code analysis** with configurable depth +- **Multi-file support** with conversation-aware deduplication +- **Focus areas** for targeted analysis +- **Web search integration** for current best practices +- **Image support** for screenshots/diagrams +- **Conversation threading** for follow-up discussions +- **Automatic model selection** based on task complexity -- [ ] Tool class created inheriting from `BaseTool` -- [ ] All abstract methods implemented -- [ ] Request model defined with proper validation -- [ ] System prompt created in `systemprompts/` -- [ ] Tool registered in `server.py` -- [ ] Unit tests written and passing -- [ ] Simulator tests added (if applicable) -- [ ] Documentation updated -- [ ] Code follows project style (ruff, black, isort) -- [ ] Large prompt handling implemented (if needed) -- [ ] Security validation for file paths -- [ ] Appropriate model category selected -- [ ] Tool description is clear and helpful - -## Model Providers and Configuration - -The Zen MCP Server supports multiple AI providers: - -### Built-in Providers -- **Anthropic** (Claude models) -- **Google** (Gemini models) -- **OpenAI** (GPT and O-series models) -- **X.AI** (Grok models) -- **Mistral** (Mistral models) -- **Meta** (Llama models via various providers) -- **Groq** (Fast inference) -- **Fireworks** (Open models) -- **OpenRouter** (Multi-provider gateway) -- **Deepseek** (Deepseek models) -- **Together** (Open models) - -### Custom Endpoints -- **Ollama** - Local models via `http://host.docker.internal:11434/v1` -- **vLLM** - Custom inference endpoints - -### Prompt Templates - -The server supports prompt templates for quick tool invocation: - -```python -PROMPT_TEMPLATES = { - "thinkdeep": { - "name": "thinkdeeper", - "description": "Think deeply about the current context", - "template": "Think deeper about this with {model} using {thinking_mode} thinking mode", - }, - # Add your own templates in server.py +**Usage by Claude:** +```json +{ + "tool": "example", + "arguments": { + "prompt": "Analyze this codebase for security vulnerabilities and performance issues", + "files": ["/path/to/src/", "/path/to/config.py"], + "analysis_depth": "comprehensive", + "focus_areas": ["security", "performance"], + "model": "o3" + } } ``` -## Example: Complete Simple Tool +The tool automatically handles file deduplication, validates inputs, manages token limits, and offers continuation opportunities for deeper analysis. -Here's a minimal but complete example tool: +--- -```python -""" -Simple calculator tool for mathematical operations. 
-""" - -from typing import Any, Optional -from mcp.types import TextContent -from pydantic import Field - -from config import TEMPERATURE_ANALYTICAL -from .base import BaseTool, ToolRequest -from .models import ToolOutput - - -class CalculateRequest(ToolRequest): - """Request model for calculator tool.""" - - expression: str = Field( - ..., - description="Mathematical expression to evaluate" - ) - - -class CalculatorTool(BaseTool): - """Simple calculator tool.""" - - def get_name(self) -> str: - return "calculate" - - def get_description(self) -> str: - return ( - "CALCULATOR - Evaluates mathematical expressions. " - "Use this for calculations, conversions, and math problems." - ) - - def get_input_schema(self) -> dict[str, Any]: - schema = { - "type": "object", - "properties": { - "expression": { - "type": "string", - "description": "Mathematical expression to evaluate", - }, - "model": self.get_model_field_schema(), - }, - "required": ["expression"] + ( - ["model"] if self.is_effective_auto_mode() else [] - ), - } - return schema - - def get_system_prompt(self) -> str: - return """You are a mathematical assistant. Evaluate the expression - and explain the calculation steps clearly.""" - - def get_default_temperature(self) -> float: - return TEMPERATURE_ANALYTICAL - - def get_request_model(self): - return CalculateRequest - - async def prepare_prompt(self, request: CalculateRequest) -> str: - return f"Calculate: {request.expression}\n\nShow your work step by step." -``` - -## Need Help? - -- Look at existing tools (`chat.py`, `refactor.py`) for examples -- Check `base.py` for available helper methods -- Review test files for testing patterns -- Ask questions in GitHub issues if stuck \ No newline at end of file +**Need Help?** Look at existing tools like `chat.py` and `consensus.py` for reference implementations, or check GitHub issues for support. \ No newline at end of file