# Adding a New Tool
This guide explains how to add a new tool to the Zen MCP Server. Tools are the primary way Claude interacts with the AI models, providing specialized capabilities like code review, debugging, test generation, and more.
## Overview
The tool system in Zen MCP Server is designed to be extensible. Each tool:
- Inherits from the `BaseTool` class
- Implements required abstract methods
- Defines a request model for parameter validation
- Is registered in the server's tool registry
- Can leverage different AI models based on task requirements
## Architecture Overview
### Key Components
- BaseTool (`tools/base.py`): Abstract base class providing common functionality
- Request Models: Pydantic models for input validation
- System Prompts: Specialized prompts that configure AI behavior
- Tool Registry: Registration system in `server.py`
### Tool Lifecycle
- Claude calls the tool with parameters
- Parameters are validated using Pydantic
- File paths are security-checked
- Prompt is prepared with system instructions
- AI model generates response
- Response is formatted and returned
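The sketch below is a simplified, conceptual rendering of this lifecycle; the real orchestration lives in `BaseTool.execute` in `tools/base.py`, and the `fake_model_call` helper here is purely a stand-in for the provider call.

```python
from typing import Any


async def fake_model_call(prompt: str, system: str) -> str:
    """Stand-in for the provider call; the real server dispatches to the selected model."""
    return f"(model output for a {len(prompt)}-character prompt)"


async def handle_tool_call(tool: Any, arguments: dict[str, Any]) -> str:
    """Conceptual walk-through of the lifecycle above, not the actual implementation."""
    # Validate Claude's arguments against the tool's Pydantic request model
    request = tool.get_request_model()(**arguments)

    # Security check (simplified): file paths must be absolute
    for path in request.files or []:
        if not path.startswith("/"):
            raise ValueError(f"File path must be absolute: {path}")

    # Combine the system prompt and user content into the final prompt
    prompt = await tool.prepare_prompt(request)

    # The selected AI model generates a response (stubbed out here)
    raw_response = await fake_model_call(prompt, system=tool.get_system_prompt())

    # The tool formats the response before it is returned to Claude
    return tool.format_response(raw_response, request)
```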
## Step-by-Step Implementation Guide
### 1. Create the Tool File
Create a new file in the `tools/` directory (e.g., `tools/example.py`):

```python
"""
Example tool - Brief description of what your tool does
This tool provides [specific functionality] to help developers [achieve goal].
Key features:
- Feature 1
- Feature 2
- Feature 3
"""
import logging
from typing import Any, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_BALANCED
from systemprompts import EXAMPLE_PROMPT # You'll create this
from .base import BaseTool, ToolRequest
from .models import ToolOutput
logger = logging.getLogger(__name__)
```

### 2. Define the Request Model
Create a Pydantic model that inherits from `ToolRequest`:

```python
class ExampleRequest(ToolRequest):
"""Request model for the example tool."""
# Required parameters
prompt: str = Field(
...,
description="The main input/question for the tool"
)
# Optional parameters with defaults
files: Optional[list[str]] = Field(
default=None,
description="Files to analyze (must be absolute paths)"
)
focus_area: Optional[str] = Field(
default=None,
description="Specific aspect to focus on"
)
# You can add tool-specific parameters
output_format: Optional[str] = Field(
default="detailed",
description="Output format: 'summary', 'detailed', or 'actionable'"
)
```

### 3. Implement the Tool Class

```python
class ExampleTool(BaseTool):
"""Implementation of the example tool."""
def get_name(self) -> str:
"""Return the tool's unique identifier."""
return "example"
def get_description(self) -> str:
"""Return detailed description for Claude."""
return (
"EXAMPLE TOOL - Brief tagline describing the tool's purpose. "
"Use this tool when you need to [specific use cases]. "
"Perfect for: [scenario 1], [scenario 2], [scenario 3]. "
"Supports [key features]. Choose thinking_mode based on "
"[guidance for mode selection]. "
"Note: If you're not currently using a top-tier model such as "
"Opus 4 or above, these tools can provide enhanced capabilities."
)
def get_input_schema(self) -> dict[str, Any]:
"""Define the JSON schema for tool parameters."""
schema = {
"type": "object",
"properties": {
"prompt": {
"type": "string",
"description": "The main input/question for the tool",
},
"files": {
"type": "array",
"items": {"type": "string"},
"description": "Files to analyze (must be absolute paths)",
},
"focus_area": {
"type": "string",
"description": "Specific aspect to focus on",
},
"output_format": {
"type": "string",
"enum": ["summary", "detailed", "actionable"],
"description": "Output format type",
"default": "detailed",
},
"model": self.get_model_field_schema(),
"temperature": {
"type": "number",
"description": "Temperature (0-1, default varies by tool)",
"minimum": 0,
"maximum": 1,
},
"thinking_mode": {
"type": "string",
"enum": ["minimal", "low", "medium", "high", "max"],
"description": "Thinking depth: minimal (0.5% of model max), "
"low (8%), medium (33%), high (67%), max (100%)",
},
"continuation_id": {
"type": "string",
"description": "Thread continuation ID for multi-turn conversations",
},
},
"required": ["prompt"] + (
["model"] if self.is_effective_auto_mode() else []
),
}
return schema
def get_system_prompt(self) -> str:
"""Return the system prompt for this tool."""
return EXAMPLE_PROMPT # Defined in systemprompts/
def get_default_temperature(self) -> float:
"""Return default temperature for this tool."""
# Use predefined constants from config.py:
# TEMPERATURE_CREATIVE (0.7) - For creative tasks
# TEMPERATURE_BALANCED (0.5) - For balanced tasks
# TEMPERATURE_ANALYTICAL (0.2) - For analytical tasks
return TEMPERATURE_BALANCED
def get_model_category(self):
"""Specify which type of model this tool needs."""
from tools.models import ToolModelCategory
# Choose based on your tool's needs:
# FAST_RESPONSE - Quick responses, cost-efficient (chat, simple queries)
# BALANCED - Standard analysis and generation
# EXTENDED_REASONING - Complex analysis, deep thinking (debug, review)
return ToolModelCategory.BALANCED
def get_request_model(self):
"""Return the request model class."""
return ExampleRequest
def wants_line_numbers_by_default(self) -> bool:
"""Whether to add line numbers to code files."""
# Return True if your tool benefits from precise line references
# (e.g., code review, debugging, refactoring)
# Return False for general analysis or token-sensitive operations
return False
async def prepare_prompt(self, request: ExampleRequest) -> str:
"""
Prepare the complete prompt for the AI model.
This method combines:
- System prompt (behavior configuration)
- User request
- File contents (if provided)
- Additional context
"""
# Check for prompt.txt in files (handles large prompts)
prompt_content, updated_files = self.handle_prompt_file(request.files)
if prompt_content:
request.prompt = prompt_content
if updated_files is not None:
request.files = updated_files
# Build the prompt parts
prompt_parts = []
# Add main request
prompt_parts.append(f"=== USER REQUEST ===")
prompt_parts.append(f"Focus Area: {request.focus_area}" if request.focus_area else "")
prompt_parts.append(f"Output Format: {request.output_format}")
prompt_parts.append(request.prompt)
prompt_parts.append("=== END REQUEST ===")
# Add file contents if provided
if request.files:
# Use the centralized file handling (respects continuation)
file_content = self._prepare_file_content_for_prompt(
request.files,
request.continuation_id,
"Files to analyze"
)
if file_content:
prompt_parts.append("\n=== FILES ===")
prompt_parts.append(file_content)
prompt_parts.append("=== END FILES ===")
# Validate token limits
full_prompt = "\n".join(filter(None, prompt_parts))
self._validate_token_limit(full_prompt, "Prompt")
return full_prompt
def format_response(self, response: str, request: ExampleRequest,
model_info: Optional[dict] = None) -> str:
"""
Format the AI's response for display.
Override this to add custom formatting, headers, or structure.
The base class handles special status parsing automatically.
"""
# Example: Add a footer with next steps
return f"{response}\n\n---\n\n**Next Steps:** Review the analysis above and proceed with implementation."
```

### 4. Handle Large Prompts (Optional)
If your tool might receive large text inputs, override the `execute` method:

```python
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override to check prompt size before processing."""
# Validate request first
request_model = self.get_request_model()
request = request_model(**arguments)
# Check if prompt is too large for MCP limits
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
```

### 5. Create the System Prompt
Create a new file in `systemprompts/` (e.g., `systemprompts/example_prompt.py`):

```python
"""System prompt for the example tool."""
EXAMPLE_PROMPT = """You are an AI assistant specialized in [tool purpose].
Your role is to [primary responsibility] by [approach/methodology].
Key principles:
1. [Principle 1]
2. [Principle 2]
3. [Principle 3]
When analyzing content:
- [Guideline 1]
- [Guideline 2]
- [Guideline 3]
Output format:
- Start with a brief summary
- Provide detailed analysis organized by [structure]
- Include specific examples and recommendations
- End with actionable next steps
Remember to:
- Be specific and reference exact locations (file:line) when discussing code
- Provide practical, implementable suggestions
- Consider the broader context and implications
- Maintain a helpful, constructive tone
"""
```

Add the import to `systemprompts/__init__.py`:

```python
from .example_prompt import EXAMPLE_PROMPT
```

### 6. Register the Tool
#### 6.1. Import in `server.py`
Add the import at the top of `server.py`:

```python
from tools.example import ExampleTool
```

#### 6.2. Add to the `TOOLS` Dictionary
Find the `TOOLS` dictionary in `server.py` and add your tool:

```python
TOOLS = {
"analyze": AnalyzeTool(),
"chat": ChatTool(),
"review_code": CodeReviewTool(),
"debug": DebugTool(),
"review_changes": PreCommitTool(),
"generate_tests": TestGenTool(),
"thinkdeep": ThinkDeepTool(),
"refactor": RefactorTool(),
"example": ExampleTool(), # Add your tool here
}
```

### 7. Write Tests
Create unit tests in `tests/test_example.py`:

```python
"""Tests for the example tool."""
import pytest
from unittest.mock import Mock, patch
from tools.example import ExampleTool, ExampleRequest
from tools.models import ToolModelCategory
class TestExampleTool:
"""Test suite for ExampleTool."""
def test_tool_metadata(self):
"""Test tool metadata methods."""
tool = ExampleTool()
assert tool.get_name() == "example"
assert "EXAMPLE TOOL" in tool.get_description()
assert tool.get_default_temperature() == 0.5
assert tool.get_model_category() == ToolModelCategory.BALANCED
def test_request_validation(self):
"""Test request model validation."""
# Valid request
request = ExampleRequest(prompt="Test prompt")
assert request.prompt == "Test prompt"
assert request.output_format == "detailed" # default
# Invalid request (missing required field)
with pytest.raises(ValueError):
ExampleRequest()
def test_input_schema(self):
"""Test input schema generation."""
tool = ExampleTool()
schema = tool.get_input_schema()
assert schema["type"] == "object"
assert "prompt" in schema["properties"]
assert "prompt" in schema["required"]
assert "model" in schema["properties"]
@pytest.mark.asyncio
async def test_prepare_prompt(self):
"""Test prompt preparation."""
tool = ExampleTool()
request = ExampleRequest(
prompt="Analyze this code",
focus_area="performance",
output_format="summary"
)
with patch.object(tool, '_validate_token_limit'):
prompt = await tool.prepare_prompt(request)
assert "USER REQUEST" in prompt
assert "Analyze this code" in prompt
assert "Focus Area: performance" in prompt
assert "Output Format: summary" in prompt
@pytest.mark.asyncio
async def test_file_handling(self):
"""Test file content handling."""
tool = ExampleTool()
request = ExampleRequest(
prompt="Analyze",
files=["/path/to/file.py"]
)
# Mock file reading
with patch.object(tool, '_prepare_file_content_for_prompt') as mock_prep:
mock_prep.return_value = "file contents"
with patch.object(tool, '_validate_token_limit'):
prompt = await tool.prepare_prompt(request)
assert "FILES" in prompt
assert "file contents" in prompt
```

### 8. Add Simulator Tests (Optional)
For tools that interact with external systems, create simulator tests in `simulator_tests/test_example_basic.py`:

```python
"""Basic simulator test for example tool."""
from simulator_tests.base_test import SimulatorTest
class TestExampleBasic(SimulatorTest):
"""Test basic example tool functionality."""
def test_example_analysis(self):
"""Test basic analysis with example tool."""
result = self.call_tool(
"example",
{
"prompt": "Analyze the architecture of this codebase",
"model": "flash",
"output_format": "summary"
}
)
self.assert_tool_success(result)
self.assert_content_contains(result, ["architecture", "summary"])
```

### 9. Update Documentation
Add your tool to the `README.md` in the tools section:

```markdown
### Available Tools
- **example** - Brief description of what the tool does
- Use cases: [scenario 1], [scenario 2]
- Supports: [key features]
- Best model: `balanced` category for standard analysis
```

## Advanced Features
### Understanding Conversation Memory
The `continuation_id` feature enables multi-turn conversations using the conversation memory system (`utils/conversation_memory.py`). Here's how it works:
- Thread Creation: When a tool wants to enable follow-up conversations, it creates a thread
- Turn Storage: Each exchange (user/assistant) is stored as a turn with metadata
- Cross-Tool Continuation: Any tool can continue a conversation started by another tool
- Automatic History: When `continuation_id` is provided, the full conversation history is reconstructed
Key concepts:
- `ThreadContext`: Contains all conversation turns, files, and metadata
- `ConversationTurn`: Single exchange with role, content, timestamp, files, and tool attribution
- Thread Chains: Conversations can have parent threads for extended discussions
- Turn Limits: Default of 20 turns (configurable via `MAX_CONVERSATION_TURNS`)
Example flow:

```python
# Tool A creates thread
thread_id = create_thread("analyze", request_data)
# Tool A adds its response
add_turn(thread_id, "assistant", response, files=[...], tool_name="analyze")
# Tool B continues the same conversation
context = get_thread(thread_id) # Gets full history
# Tool B sees all previous turns and files
```
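If a tool needs to inspect the reconstructed history itself, the sketch below shows roughly what that looks like. The attribute names (`turns`, `role`, `content`, `tool_name`, `files`) are assumptions based on the descriptions above, so check `utils/conversation_memory.py` for the actual fields.

```python
from utils.conversation_memory import get_thread


def summarize_thread(continuation_id: str) -> None:
    """Sketch only: walk the stored turns of an existing conversation thread."""
    context = get_thread(continuation_id)  # ThreadContext with all prior turns
    for turn in context.turns:  # attribute names assumed, not verified
        print(f"[{turn.tool_name}] {turn.role}: {turn.content[:80]}")
        if turn.files:
            print(f"  referenced files: {turn.files}")
```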
### Supporting Special Response Types
Tools can return special status responses for complex interactions. These are defined in `tools/models.py`:

```python
# Currently supported special statuses:
SPECIAL_STATUS_MODELS = {
"need_clarification": NeedClarificationModel,
"focused_review_required": FocusedReviewRequiredModel,
"more_review_required": MoreReviewRequiredModel,
"more_testgen_required": MoreTestGenRequiredModel,
"more_refactor_required": MoreRefactorRequiredModel,
"resend_prompt": ResendPromptModel,
}
```

Example implementation:

```python
# In your tool's format_response or within the AI response:
if need_clarification:
return json.dumps({
"status": "need_clarification",
"questions": ["What specific aspect should I focus on?"],
"context": "I need more information to proceed"
})
# For custom review status:
if more_analysis_needed:
return json.dumps({
"status": "focused_review_required",
"files": ["/path/to/file1.py", "/path/to/file2.py"],
"focus": "security",
"reason": "Found potential SQL injection vulnerabilities"
})
```

To add a new custom response type:
First, define the model in `tools/models.py`:

```python
class CustomStatusModel(BaseModel):
"""Model for custom status responses"""
status: Literal["custom_status"]
custom_field: str
details: dict[str, Any]
```

Then register it in `SPECIAL_STATUS_MODELS`:

```python
SPECIAL_STATUS_MODELS = {
# ... existing statuses ...
"custom_status": CustomStatusModel,
}
```

The base tool will then automatically handle parsing and validation of the new status.
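As a rough illustration (field values invented for the example), a tool could then emit the newly registered status the same way the built-in statuses are emitted above:

```python
import json
from typing import Any


def custom_status_response(summary: str, details: dict[str, Any]) -> str:
    """Illustrative helper: return the new status as JSON so the base tool's
    special-status parsing can pick it up."""
    return json.dumps({
        "status": "custom_status",
        "custom_field": summary,
        "details": details,
    })
```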
### Token Management
For tools processing large amounts of data:

```python
# Calculate the available token budget dynamically
def prepare_large_content(
    self, files: list[str], continuation_id: Optional[str], remaining_budget: int
) -> str:
    # Reserve tokens for the model's response
    reserve_tokens = 5000
    # Use model-specific limits
    effective_max = remaining_budget - reserve_tokens
    # Process files within the budget
    content = self._prepare_file_content_for_prompt(
        files,
        continuation_id,
        "Analysis files",
        max_tokens=effective_max,
        reserve_tokens=reserve_tokens,
    )
    return content
```

### Web Search Integration
Enable web search for tools that benefit from current information:

```python
# In prepare_prompt:
websearch_instruction = self.get_websearch_instruction(
request.use_websearch,
"""Consider searching for:
- Current best practices for [topic]
- Recent updates to [technology]
- Community solutions for [problem]"""
)
full_prompt = f"{system_prompt}{websearch_instruction}\n\n{user_content}"
```

## Best Practices
- Clear Tool Descriptions: Write descriptive text that helps Claude understand when to use your tool
- Proper Validation: Use Pydantic models for robust input validation
- Security First: Always validate file paths are absolute
- Token Awareness: Handle large inputs gracefully with prompt.txt mechanism
- Model Selection: Choose appropriate model category for your tool's complexity
- Line Numbers: Enable for tools needing precise code references
- Error Handling: Provide helpful error messages for common issues
- Testing: Write comprehensive unit tests and simulator tests
- Documentation: Include examples and use cases in your description
## Common Pitfalls to Avoid
- Don't Skip Validation: Always validate inputs, especially file paths
- Don't Ignore Token Limits: Use `_validate_token_limit` and handle large prompts
- Don't Hardcode Models: Use model categories for flexibility
- Don't Forget Tests: Every tool needs tests for reliability
- Don't Break Conventions: Follow existing patterns from other tools
## Testing Your Tool
### Manual Testing
- Start the server with your tool registered
- Use Claude Desktop to call your tool
- Test various parameter combinations
- Verify error handling
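For a quick smoke test outside Claude Desktop, you can also drive the tool directly from a short Python script. This is only a sketch: it assumes your provider API keys and environment are already configured, and the file path and model name are placeholders.

```python
import asyncio

from tools.example import ExampleTool


async def smoke_test() -> None:
    tool = ExampleTool()
    result = await tool.execute({
        "prompt": "Summarize the purpose of this file",
        "files": ["/absolute/path/to/some_file.py"],  # placeholder path
        "output_format": "summary",
        "model": "flash",  # use any model configured in your setup
    })
    # execute() returns a list of TextContent items
    print(result[0].text)


asyncio.run(smoke_test())
```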
### Automated Testing

```bash
# Run unit tests
pytest tests/test_example.py -xvs
# Run all tests to ensure no regressions
pytest -xvs
# Run simulator tests if applicable
python communication_simulator_test.py
```

## Checklist
Before submitting your PR:
- Tool class created inheriting from `BaseTool`
- All abstract methods implemented
- Request model defined with proper validation
- System prompt created in `systemprompts/`
- Tool registered in `server.py`
- Unit tests written and passing
- Simulator tests added (if applicable)
- Documentation updated
- Code follows project style (ruff, black, isort)
- Large prompt handling implemented (if needed)
- Security validation for file paths
- Appropriate model category selected
- Tool description is clear and helpful
## Example: Complete Simple Tool
Here's a minimal but complete example tool:

```python
"""
Simple calculator tool for mathematical operations.
"""
from typing import Any, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_ANALYTICAL
from .base import BaseTool, ToolRequest
from .models import ToolOutput
class CalculateRequest(ToolRequest):
"""Request model for calculator tool."""
expression: str = Field(
...,
description="Mathematical expression to evaluate"
)
class CalculatorTool(BaseTool):
"""Simple calculator tool."""
def get_name(self) -> str:
return "calculate"
def get_description(self) -> str:
return (
"CALCULATOR - Evaluates mathematical expressions. "
"Use this for calculations, conversions, and math problems."
)
def get_input_schema(self) -> dict[str, Any]:
schema = {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Mathematical expression to evaluate",
},
"model": self.get_model_field_schema(),
},
"required": ["expression"] + (
["model"] if self.is_effective_auto_mode() else []
),
}
return schema
def get_system_prompt(self) -> str:
return """You are a mathematical assistant. Evaluate the expression
and explain the calculation steps clearly."""
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
def get_request_model(self):
return CalculateRequest
async def prepare_prompt(self, request: CalculateRequest) -> str:
return f"Calculate: {request.expression}\n\nShow your work step by step."
```

## Need Help?
- Look at existing tools (`chat.py`, `refactor.py`) for examples
- Check `base.py` for available helper methods
- Review test files for testing patterns
- Ask questions in GitHub issues if stuck