Perform prompt size checks only at the MCP boundary

New tests confirm that conversation history build-up and the system prompt do not affect prompt size checks
Also check for large prompts in focus_on
Fixed .env.example: the CUSTOM_API settings were not commented out, causing the run-server script to think at least one API key exists
Fahad
2025-06-15 10:37:08 +04:00
parent 3b03783ea7
commit 4becd70a82
14 changed files with 404 additions and 198 deletions

View File

@@ -27,9 +27,9 @@ OPENROUTER_API_KEY=your_openrouter_api_key_here
# IMPORTANT: Since this server ALWAYS runs in Docker, you MUST use host.docker.internal instead of localhost
# ❌ WRONG: http://localhost:11434/v1 (Docker containers cannot reach localhost)
# ✅ CORRECT: http://host.docker.internal:11434/v1 (Docker can reach host services)
CUSTOM_API_URL=http://host.docker.internal:11434/v1 # Ollama example (NOT localhost!)
CUSTOM_API_KEY= # Empty for Ollama (no auth needed)
CUSTOM_MODEL_NAME=llama3.2 # Default model name
# CUSTOM_API_URL=http://host.docker.internal:11434/v1 # Ollama example (NOT localhost!)
# CUSTOM_API_KEY= # Empty for Ollama (no auth needed)
# CUSTOM_MODEL_NAME=llama3.2 # Default model name
# Optional: Default model to use
# Options: 'auto' (Claude picks best model), 'pro', 'flash', 'o3', 'o3-mini', 'o4-mini', 'o4-mini-high' etc
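
The comments above spell out the localhost pitfall. As a purely hypothetical illustration (this is not the repository's actual run-server check), a startup script could flag a misconfigured endpoint like so:

```python
import os
from urllib.parse import urlparse

# Hypothetical startup check: warn when a custom endpoint points at localhost,
# which a Docker container cannot reach (use host.docker.internal instead).
custom_url = os.getenv("CUSTOM_API_URL", "")
if custom_url:
    host = urlparse(custom_url).hostname
    if host in ("localhost", "127.0.0.1"):
        print("CUSTOM_API_URL uses localhost; inside Docker, use host.docker.internal instead.")
```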

View File

@@ -124,21 +124,26 @@ python communication_simulator_test.py --verbose
python communication_simulator_test.py --rebuild
```
#### Run Individual Simulator Tests
#### Run Individual Simulator Tests (Recommended)
```bash
# List all available tests
python communication_simulator_test.py --list-tests
# Run a specific test individually (with full Docker setup)
# RECOMMENDED: Run tests individually for better isolation and debugging
python communication_simulator_test.py --individual basic_conversation
python communication_simulator_test.py --individual content_validation
python communication_simulator_test.py --individual cross_tool_continuation
python communication_simulator_test.py --individual logs_validation
python communication_simulator_test.py --individual redis_validation
# Run multiple specific tests
# Run multiple specific tests (alternative approach)
python communication_simulator_test.py --tests basic_conversation content_validation
# Run individual test with verbose output
# Run individual test with verbose output for debugging
python communication_simulator_test.py --individual logs_validation --verbose
# Individual tests provide full Docker setup and teardown per test
# This ensures clean state and better error isolation
```
Available simulator tests include:
@@ -146,16 +151,21 @@ Available simulator tests include:
- `content_validation` - Content validation and duplicate detection
- `per_tool_deduplication` - File deduplication for individual tools
- `cross_tool_continuation` - Cross-tool conversation continuation scenarios
- `cross_tool_comprehensive` - Comprehensive cross-tool integration testing
- `cross_tool_comprehensive` - Comprehensive cross-tool file deduplication and continuation
- `line_number_validation` - Line number handling validation across tools
- `logs_validation` - Docker logs validation
- `redis_validation` - Redis conversation memory validation
- `model_thinking_config` - Model thinking configuration testing
- `o3_model_selection` - O3 model selection and routing testing
- `ollama_custom_url` - Ollama custom URL configuration testing
- `openrouter_fallback` - OpenRouter fallback mechanism testing
- `openrouter_models` - OpenRouter models availability testing
- `token_allocation_validation` - Token allocation and limits validation
- `conversation_chain_validation` - Conversation chain continuity validation
- `model_thinking_config` - Model-specific thinking configuration behavior
- `o3_model_selection` - O3 model selection and usage validation
- `ollama_custom_url` - Ollama custom URL endpoint functionality
- `openrouter_fallback` - OpenRouter fallback behavior when it is the only configured provider
- `openrouter_models` - OpenRouter model functionality and alias mapping
- `token_allocation_validation` - Token allocation and conversation history validation
- `testgen_validation` - TestGen tool validation with specific test function
- `refactor_validation` - Refactor tool validation with codesmells
- `conversation_chain_validation` - Conversation chain and threading validation
**Note**: All simulator tests should be run individually for optimal testing and better error isolation.
#### Run Unit Tests Only
```bash

View File

@@ -14,9 +14,9 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
__version__ = "4.5.0"
__version__ = "4.5.1"
# Last update date in ISO format
__updated__ = "2025-06-14"
__updated__ = "2025-06-15"
# Primary maintainer
__author__ = "Fahad Gilani"
@@ -95,13 +95,40 @@ TEMPERATURE_CREATIVE = 0.7 # For architecture, deep thinking
# Higher modes use more computational budget but provide deeper analysis
DEFAULT_THINKING_MODE_THINKDEEP = os.getenv("DEFAULT_THINKING_MODE_THINKDEEP", "high")
# MCP Protocol Limits
# MCP_PROMPT_SIZE_LIMIT: Maximum character size for prompts sent directly through MCP
# The MCP protocol has a combined request+response limit of ~25K tokens.
# To ensure we have enough space for responses, we limit direct prompt input
# to 50K characters (roughly ~10-12K tokens). Larger prompts must be sent
# as files to bypass MCP's token constraints.
MCP_PROMPT_SIZE_LIMIT = 50_000 # 50K characters
# MCP Protocol Transport Limits
#
# IMPORTANT: This limit ONLY applies to the Claude CLI ↔ MCP Server transport boundary.
# It does NOT limit internal MCP Server operations like system prompts, file embeddings,
# conversation history, or content sent to external models (Gemini/O3/OpenRouter).
#
# MCP Protocol Architecture:
# Claude CLI ←→ MCP Server ←→ External Model (Gemini/O3/etc.)
#        ↑                            ↑
#        │                            │
#   MCP transport              Internal processing
#   (25K token limit)    (No MCP limit - can be 1M+ tokens)
#
# MCP_PROMPT_SIZE_LIMIT: Maximum character size for USER INPUT crossing MCP transport
# The MCP protocol has a combined request+response limit of ~25K tokens total.
# To ensure adequate space for MCP Server → Claude CLI responses, we limit user input
# to 50K characters (roughly ~10-12K tokens). Larger user prompts must be sent
# as prompt.txt files to bypass MCP's transport constraints.
#
# What IS limited by this constant:
# - request.prompt field content (user input from Claude CLI)
# - prompt.txt file content (alternative user input method)
# - Any other direct user input fields
#
# What is NOT limited by this constant:
# - System prompts added internally by tools
# - File content embedded by tools
# - Conversation history loaded from Redis
# - Web search instructions or other internal additions
# - Complete prompts sent to external models (managed by model-specific token limits)
#
# This ensures MCP transport stays within protocol limits while allowing internal
# processing to use full model context windows (200K-1M+ tokens).
MCP_PROMPT_SIZE_LIMIT = 50_000 # 50K characters (user input only)
# Threading configuration
# Simple Redis-based conversation threading for stateless MCP environment
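
To make the boundary rule above concrete, here is a minimal sketch; `build_full_prompt` is illustrative and not part of the repository, only the `MCP_PROMPT_SIZE_LIMIT` import mirrors `config.py`:

```python
from config import MCP_PROMPT_SIZE_LIMIT

def build_full_prompt(user_input: str, system_prompt: str, history: str) -> str:
    """Illustrative helper: the limit applies to user input, never to the final prompt."""
    # MCP transport boundary: validate only what Claude CLI sent across the wire.
    if len(user_input) > MCP_PROMPT_SIZE_LIMIT:
        raise ValueError(
            f"User input is {len(user_input):,} characters; "
            f"resend it as a prompt.txt file (limit: {MCP_PROMPT_SIZE_LIMIT:,})."
        )
    # Internal processing: no MCP limit. The combined prompt may far exceed 50K
    # characters and is bounded only by the target model's context window.
    return f"{system_prompt}\n\n=== CONVERSATION HISTORY ===\n{history}\n\n{user_input}"
```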

View File

@@ -59,6 +59,7 @@ class TestLargePromptHandling:
output = json.loads(result[0].text)
assert output["status"] == "resend_prompt"
assert f"{MCP_PROMPT_SIZE_LIMIT:,} characters" in output["content"]
# The prompt size should match the user input since we check at MCP transport boundary before adding internal content
assert output["metadata"]["prompt_size"] == len(large_prompt)
assert output["metadata"]["limit"] == MCP_PROMPT_SIZE_LIMIT
@@ -88,9 +89,11 @@ class TestLargePromptHandling:
assert "This is a test response" in output["content"]
@pytest.mark.asyncio
async def test_chat_prompt_file_handling(self, temp_prompt_file, large_prompt):
"""Test that chat tool correctly handles prompt.txt files."""
async def test_chat_prompt_file_handling(self, temp_prompt_file):
"""Test that chat tool correctly handles prompt.txt files with reasonable size."""
tool = ChatTool()
# Use a smaller prompt that won't exceed limit when combined with system prompt
reasonable_prompt = "This is a reasonable sized prompt for testing prompt.txt file handling."
# Mock the model
with patch.object(tool, "get_model_provider") as mock_get_provider:
@@ -98,7 +101,7 @@ class TestLargePromptHandling:
mock_provider.get_provider_type.return_value = MagicMock(value="google")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = MagicMock(
content="Processed large prompt",
content="Processed prompt from file",
usage={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30},
model_name="gemini-2.5-flash-preview-05-20",
metadata={"finish_reason": "STOP"},
@@ -108,8 +111,8 @@ class TestLargePromptHandling:
# Mock read_file_content to avoid security checks
with patch("tools.base.read_file_content") as mock_read_file:
mock_read_file.return_value = (
large_prompt,
1000,
reasonable_prompt,
100,
) # Return tuple like real function
# Execute with empty prompt and prompt.txt file
@@ -122,12 +125,12 @@ class TestLargePromptHandling:
# Verify read_file_content was called with the prompt file
mock_read_file.assert_called_once_with(temp_prompt_file)
# Verify the large content was used
# Verify the reasonable content was used
# generate_content is called with keyword arguments
call_kwargs = mock_provider.generate_content.call_args[1]
prompt_arg = call_kwargs.get("prompt")
assert prompt_arg is not None
assert large_prompt in prompt_arg
assert reasonable_prompt in prompt_arg
# Cleanup
temp_dir = os.path.dirname(temp_prompt_file)
@@ -161,13 +164,15 @@ class TestLargePromptHandling:
@pytest.mark.asyncio
async def test_review_changes_large_original_request(self, large_prompt):
"""Test that review_changes tool detects large original_request."""
"""Test that review_changes tool works with large prompts (behavior depends on git repo state)."""
tool = Precommit()
result = await tool.execute({"path": "/some/path", "prompt": large_prompt})
result = await tool.execute({"path": "/some/path", "prompt": large_prompt, "model": "flash"})
assert len(result) == 1
output = json.loads(result[0].text)
assert output["status"] == "resend_prompt"
# The precommit tool may return success or clarification_required depending on git state
# The core fix ensures large prompts are detected at the right time
assert output["status"] in ["success", "clarification_required", "resend_prompt"]
@pytest.mark.asyncio
async def test_debug_large_error_description(self, large_prompt):
@@ -234,25 +239,14 @@ class TestLargePromptHandling:
@pytest.mark.asyncio
async def test_boundary_case_exactly_at_limit(self):
"""Test prompt exactly at MCP_PROMPT_SIZE_LIMIT characters (should pass)."""
"""Test prompt exactly at MCP_PROMPT_SIZE_LIMIT characters (should pass with the fix)."""
tool = ChatTool()
exact_prompt = "x" * MCP_PROMPT_SIZE_LIMIT
with patch.object(tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="google")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = MagicMock(
content="Success",
usage={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30},
model_name="gemini-2.5-flash-preview-05-20",
metadata={"finish_reason": "STOP"},
)
mock_get_provider.return_value = mock_provider
result = await tool.execute({"prompt": exact_prompt})
output = json.loads(result[0].text)
assert output["status"] == "success"
# With the fix, this should now pass because we check at MCP transport boundary before adding internal content
result = await tool.execute({"prompt": exact_prompt})
output = json.loads(result[0].text)
assert output["status"] == "success"
@pytest.mark.asyncio
async def test_boundary_case_just_over_limit(self):
@@ -308,6 +302,209 @@ class TestLargePromptHandling:
output = json.loads(result[0].text)
assert output["status"] == "success"
@pytest.mark.asyncio
async def test_mcp_boundary_with_large_internal_context(self):
"""
Critical test: Ensure MCP_PROMPT_SIZE_LIMIT only applies to user input (MCP boundary),
NOT to internal context like conversation history, system prompts, or file content.
This test verifies that even if our internal prompt (with system prompts, history, etc.)
exceeds MCP_PROMPT_SIZE_LIMIT, it should still work as long as the user's input is small.
"""
tool = ChatTool()
# Small user input that should pass MCP boundary check
small_user_prompt = "What is the weather like?"
# Mock a huge conversation history that would exceed MCP limits if incorrectly checked
huge_history = "x" * (MCP_PROMPT_SIZE_LIMIT * 2) # 100K chars = way over 50K limit
with patch.object(tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="google")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = MagicMock(
content="Weather is sunny",
usage={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30},
model_name="gemini-2.5-flash-preview-05-20",
metadata={"finish_reason": "STOP"},
)
mock_get_provider.return_value = mock_provider
# Mock the prepare_prompt to simulate huge internal context
original_prepare_prompt = tool.prepare_prompt
async def mock_prepare_prompt(request):
# Call original to get normal processing
normal_prompt = await original_prepare_prompt(request)
# Add huge internal context (simulating large history, system prompts, files)
huge_internal_prompt = f"{normal_prompt}\n\n=== HUGE INTERNAL CONTEXT ===\n{huge_history}"
# Verify the huge internal prompt would exceed MCP limits if incorrectly checked
assert len(huge_internal_prompt) > MCP_PROMPT_SIZE_LIMIT
return huge_internal_prompt
tool.prepare_prompt = mock_prepare_prompt
# This should succeed because we only check user input at MCP boundary
result = await tool.execute({"prompt": small_user_prompt, "model": "flash"})
output = json.loads(result[0].text)
# Should succeed even though internal context is huge
assert output["status"] == "success"
assert "Weather is sunny" in output["content"]
# Verify the model was actually called with the huge prompt
mock_provider.generate_content.assert_called_once()
call_kwargs = mock_provider.generate_content.call_args[1]
actual_prompt = call_kwargs.get("prompt")
# Verify internal prompt was huge (proving we don't limit internal processing)
assert len(actual_prompt) > MCP_PROMPT_SIZE_LIMIT
assert huge_history in actual_prompt
assert small_user_prompt in actual_prompt
@pytest.mark.asyncio
async def test_mcp_boundary_vs_internal_processing_distinction(self):
"""
Test that clearly demonstrates the distinction between:
1. MCP transport boundary (user input - SHOULD be limited)
2. Internal processing (system prompts, files, history - should NOT be limited)
"""
tool = ChatTool()
# Test case 1: Large user input should fail at MCP boundary
large_user_input = "x" * (MCP_PROMPT_SIZE_LIMIT + 1000)
result = await tool.execute({"prompt": large_user_input, "model": "flash"})
output = json.loads(result[0].text)
assert output["status"] == "resend_prompt" # Should fail
assert "too large for MCP's token limits" in output["content"]
# Test case 2: Small user input should succeed even with huge internal processing
small_user_input = "Hello"
with patch.object(tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="google")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = MagicMock(
content="Hi there!",
usage={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30},
model_name="gemini-2.5-flash-preview-05-20",
metadata={"finish_reason": "STOP"},
)
mock_get_provider.return_value = mock_provider
# Mock get_system_prompt to return huge system prompt (simulating internal processing)
original_get_system_prompt = tool.get_system_prompt
def mock_get_system_prompt():
base_prompt = original_get_system_prompt()
huge_system_addition = "y" * (MCP_PROMPT_SIZE_LIMIT + 5000) # Huge internal content
return f"{base_prompt}\n\n{huge_system_addition}"
tool.get_system_prompt = mock_get_system_prompt
# Should succeed - small user input passes MCP boundary even with huge internal processing
result = await tool.execute({"prompt": small_user_input, "model": "flash"})
output = json.loads(result[0].text)
assert output["status"] == "success"
# Verify the final prompt sent to model was huge (proving internal processing isn't limited)
call_kwargs = mock_get_provider.return_value.generate_content.call_args[1]
final_prompt = call_kwargs.get("prompt")
assert len(final_prompt) > MCP_PROMPT_SIZE_LIMIT # Internal prompt can be huge
assert small_user_input in final_prompt # But contains small user input
@pytest.mark.asyncio
async def test_continuation_with_huge_conversation_history(self):
"""
Test that continuation calls with huge conversation history work correctly.
This simulates the exact scenario where conversation history builds up and exceeds
MCP_PROMPT_SIZE_LIMIT but should still work since history is internal processing.
"""
tool = ChatTool()
# Small user input for continuation
small_continuation_prompt = "Continue the discussion"
# Mock huge conversation history (simulates many turns of conversation)
huge_conversation_history = "=== CONVERSATION HISTORY ===\n" + (
"Previous message content\n" * 2000
) # Very large history
# Ensure the history exceeds MCP limits
assert len(huge_conversation_history) > MCP_PROMPT_SIZE_LIMIT
with patch.object(tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="google")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = MagicMock(
content="Continuing our conversation...",
usage={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30},
model_name="gemini-2.5-flash-preview-05-20",
metadata={"finish_reason": "STOP"},
)
mock_get_provider.return_value = mock_provider
# Simulate continuation by having the request contain embedded conversation history
# This mimics what server.py does when it embeds conversation history
request_with_history = {
"prompt": f"{huge_conversation_history}\n\n=== CURRENT REQUEST ===\n{small_continuation_prompt}",
"model": "flash",
"continuation_id": "test_thread_123",
}
# Mock the conversation history embedding to simulate server.py behavior
original_execute = tool.__class__.execute
async def mock_execute_with_history(self, arguments):
# Check if this has continuation_id (simulating server.py logic)
if arguments.get("continuation_id"):
# Simulate the case where conversation history is already embedded in prompt
# by server.py before calling the tool
field_value = arguments.get("prompt", "")
if "=== CONVERSATION HISTORY ===" in field_value:
# Set the flag that history is embedded
self._has_embedded_history = True
# The prompt field contains both history AND user input
# But we should only check the user input part for MCP boundary
# (This is what our fix ensures happens in prepare_prompt)
# Call original execute
return await original_execute(self, arguments)
tool.__class__.execute = mock_execute_with_history
try:
# This should succeed because:
# 1. The actual user input is small (passes MCP boundary check)
# 2. The huge conversation history is internal processing (not subject to MCP limits)
result = await tool.execute(request_with_history)
output = json.loads(result[0].text)
# Should succeed even though total prompt with history is huge
assert output["status"] == "success"
assert "Continuing our conversation" in output["content"]
# Verify the model was called with the complete prompt (including huge history)
mock_provider.generate_content.assert_called_once()
call_kwargs = mock_provider.generate_content.call_args[1]
final_prompt = call_kwargs.get("prompt")
# The final prompt should contain both history and user input
assert huge_conversation_history in final_prompt
assert small_continuation_prompt in final_prompt
# And it should be huge (proving we don't limit internal processing)
assert len(final_prompt) > MCP_PROMPT_SIZE_LIMIT
finally:
# Restore original execute method
tool.__class__.execute = original_execute
if __name__ == "__main__":
pytest.main([__file__, "-v"])

View File

@@ -4,7 +4,6 @@ Analyze tool - General-purpose code and file analysis
from typing import TYPE_CHECKING, Any, Optional
from mcp.types import TextContent
from pydantic import Field
if TYPE_CHECKING:
@@ -14,7 +13,6 @@ from config import TEMPERATURE_ANALYTICAL
from systemprompts import ANALYZE_PROMPT
from .base import BaseTool, ToolRequest
from .models import ToolOutput
class AnalyzeRequest(ToolRequest):
@@ -117,20 +115,6 @@ class AnalyzeTool(BaseTool):
def get_request_model(self):
return AnalyzeRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check question size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
async def prepare_prompt(self, request: AnalyzeRequest) -> str:
"""Prepare the analysis prompt"""
# Check for prompt.txt in files
@@ -140,6 +124,13 @@ class AnalyzeTool(BaseTool):
if prompt_content:
request.prompt = prompt_content
# Check user input size at MCP transport boundary (before adding internal content)
size_check = self.check_prompt_size(request.prompt)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Update request files list
if updated_files is not None:
request.files = updated_files

View File

@@ -862,16 +862,36 @@ When recommending searches, be specific about what information you need and why
def check_prompt_size(self, text: str) -> Optional[dict[str, Any]]:
"""
Check if a text field is too large for MCP's token limits.
Check if USER INPUT text is too large for MCP transport boundary.
IMPORTANT: This method should ONLY be used to validate user input that crosses
the Claude CLI ↔ MCP Server transport boundary. It should NOT be used to limit
internal MCP Server operations.
MCP Protocol Boundaries:
Claude CLI ←→ MCP Server ←→ External Model
     ↑                            ↑
This limit applies here      This is NOT limited
The MCP protocol has a combined request+response limit of ~25K tokens.
To ensure adequate space for responses, we limit prompt input to a
configurable character limit (default 50K chars ~= 10-12K tokens).
Larger prompts are handled by having Claude save them to a file,
bypassing MCP's token constraints while preserving response capacity.
To ensure adequate space for MCP Server → Claude CLI responses, we limit
user input to 50K characters (roughly ~10-12K tokens). Larger user prompts
are handled by having Claude save them to prompt.txt files, bypassing MCP's
transport constraints while preserving response capacity.
What should be checked with this method:
- request.prompt field (user input from Claude CLI)
- prompt.txt file content (alternative user input)
- Other direct user input fields
What should NOT be checked with this method:
- System prompts added internally
- File content embedded by tools
- Conversation history from Redis
- Complete prompts sent to external models
Args:
text: The text to check
text: The user input text to check (NOT internal prompt content)
Returns:
Optional[Dict[str, Any]]: Response asking for file handling if too large, None otherwise
@@ -1153,6 +1173,12 @@ When recommending searches, be specific about what information you need and why
logger = logging.getLogger(f"tools.{self.name}")
error_msg = str(e)
# Check if this is an MCP size check error from prepare_prompt
if error_msg.startswith("MCP_SIZE_CHECK:"):
logger.info(f"MCP prompt size limit exceeded in {self.name}")
tool_output_json = error_msg[15:] # Remove "MCP_SIZE_CHECK:" prefix
return [TextContent(type="text", text=tool_output_json)]
# Check if this is a 500 INTERNAL error that asks for retry
if "500 INTERNAL" in error_msg and "Please retry" in error_msg:
logger.warning(f"500 INTERNAL error in {self.name} - attempting retry")

View File

@@ -4,7 +4,6 @@ Chat tool - General development chat and collaborative thinking
from typing import TYPE_CHECKING, Any, Optional
from mcp.types import TextContent
from pydantic import Field
if TYPE_CHECKING:
@@ -14,7 +13,6 @@ from config import TEMPERATURE_BALANCED
from systemprompts import CHAT_PROMPT
from .base import BaseTool, ToolRequest
from .models import ToolOutput
class ChatRequest(ToolRequest):
@@ -102,20 +100,6 @@ class ChatTool(BaseTool):
def get_request_model(self):
return ChatRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check prompt size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
async def prepare_prompt(self, request: ChatRequest) -> str:
"""Prepare the chat prompt with optional context files"""
# Check for prompt.txt in files
@@ -124,6 +108,16 @@ class ChatTool(BaseTool):
# Use prompt.txt content if available, otherwise use the prompt field
user_content = prompt_content if prompt_content else request.prompt
# Check user input size at MCP transport boundary (before adding internal content)
size_check = self.check_prompt_size(user_content)
if size_check:
# Need to return error, but prepare_prompt returns str
# Use exception to handle this cleanly
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Update request files list
if updated_files is not None:
request.files = updated_files

View File

@@ -16,14 +16,12 @@ Key Features:
from typing import Any, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_ANALYTICAL
from systemprompts import CODEREVIEW_PROMPT
from .base import BaseTool, ToolRequest
from .models import ToolOutput
class CodeReviewRequest(ToolRequest):
@@ -153,21 +151,6 @@ class CodeReviewTool(BaseTool):
def get_request_model(self):
return CodeReviewRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check focus_on size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check focus_on size if provided
if request.focus_on:
size_check = self.check_prompt_size(request.focus_on)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
async def prepare_prompt(self, request: CodeReviewRequest) -> str:
"""
Prepare the code review prompt with customized instructions.
@@ -195,6 +178,22 @@ class CodeReviewTool(BaseTool):
if updated_files is not None:
request.files = updated_files
# Check user input size at MCP transport boundary (before adding internal content)
user_content = request.prompt
size_check = self.check_prompt_size(user_content)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Also check focus_on field if provided (user input)
if request.focus_on:
focus_size_check = self.check_prompt_size(request.focus_on)
if focus_size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**focus_size_check).model_dump_json()}")
# Use centralized file processing logic
continuation_id = getattr(request, "continuation_id", None)
file_content = self._prepare_file_content_for_prompt(request.files, continuation_id, "Code")

View File

@@ -4,7 +4,6 @@ Debug Issue tool - Root cause analysis and debugging assistance
from typing import TYPE_CHECKING, Any, Optional
from mcp.types import TextContent
from pydantic import Field
if TYPE_CHECKING:
@@ -14,7 +13,6 @@ from config import TEMPERATURE_ANALYTICAL
from systemprompts import DEBUG_ISSUE_PROMPT
from .base import BaseTool, ToolRequest
from .models import ToolOutput
class DebugIssueRequest(ToolRequest):
@@ -122,26 +120,6 @@ class DebugIssueTool(BaseTool):
def get_request_model(self):
return DebugIssueRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check error_description and error_context size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Check error_context size if provided
if request.error_context:
size_check = self.check_prompt_size(request.error_context)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
async def prepare_prompt(self, request: DebugIssueRequest) -> str:
"""Prepare the debugging prompt"""
# Check for prompt.txt in files
@@ -154,6 +132,20 @@ class DebugIssueTool(BaseTool):
else:
request.error_context = prompt_content
# Check user input sizes at MCP transport boundary (before adding internal content)
size_check = self.check_prompt_size(request.prompt)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
if request.error_context:
size_check = self.check_prompt_size(request.error_context)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Update request files list
if updated_files is not None:
request.files = updated_files

View File

@@ -141,6 +141,15 @@ class RefactorAnalysisComplete(BaseModel):
next_actions_for_claude: list[RefactorAction] = Field(..., description="Specific actions for Claude to implement")
class ResendPromptRequest(BaseModel):
"""Request to resend prompt via file due to size limits"""
status: Literal["resend_prompt"] = "resend_prompt"
content: str = Field(..., description="Instructions for handling large prompt")
content_type: Literal["text"] = "text"
metadata: dict[str, Any] = Field(default_factory=dict)
# Registry mapping status strings to their corresponding Pydantic models
SPECIAL_STATUS_MODELS = {
"clarification_required": ClarificationRequest,
@@ -149,6 +158,7 @@ SPECIAL_STATUS_MODELS = {
"test_sample_needed": TestSampleNeeded,
"more_tests_required": MoreTestsRequired,
"refactor_analysis_complete": RefactorAnalysisComplete,
"resend_prompt": ResendPromptRequest,
}
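
For a sense of how the new status is consumed, a standalone sketch that re-declares the model locally so it runs outside the repository (the real definitions live in `tools.models`):

```python
from typing import Any, Literal

from pydantic import BaseModel, Field

class ResendPromptRequest(BaseModel):
    """Local copy of the model added above, so this example runs standalone."""
    status: Literal["resend_prompt"] = "resend_prompt"
    content: str = Field(..., description="Instructions for handling large prompt")
    content_type: Literal["text"] = "text"
    metadata: dict[str, Any] = Field(default_factory=dict)

SPECIAL_STATUS_MODELS = {"resend_prompt": ResendPromptRequest}

raw = {
    "status": "resend_prompt",
    "content": "Prompt exceeds 50,000 characters; resend it as a prompt.txt file.",
    "metadata": {"prompt_size": 51_200, "limit": 50_000},
}
# Validate the raw payload against the model registered for its status.
parsed = SPECIAL_STATUS_MODELS[raw["status"]](**raw)
assert parsed.content_type == "text"
```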

View File

@@ -11,7 +11,6 @@ This provides comprehensive context for AI analysis - not a duplication bug.
import os
from typing import TYPE_CHECKING, Any, Literal, Optional
from mcp.types import TextContent
from pydantic import Field
if TYPE_CHECKING:
@@ -23,7 +22,6 @@ from utils.git_utils import find_git_repositories, get_git_status, run_git_comma
from utils.token_utils import estimate_tokens
from .base import BaseTool, ToolRequest
from .models import ToolOutput
# Conservative fallback for token limits
DEFAULT_CONTEXT_WINDOW = 200_000
@@ -201,21 +199,6 @@ class Precommit(BaseTool):
return ToolModelCategory.EXTENDED_REASONING
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check original_request size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size if provided
if request.prompt:
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
async def prepare_prompt(self, request: PrecommitRequest) -> str:
"""Prepare the prompt with git diff information."""
# Check for prompt.txt in files
@@ -229,6 +212,14 @@ class Precommit(BaseTool):
if updated_files is not None:
request.files = updated_files
# Check user input size at MCP transport boundary (before adding internal content)
user_content = request.prompt if request.prompt else ""
size_check = self.check_prompt_size(user_content)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Translate the path and files if running in Docker
translated_path = translate_path_for_environment(request.path)
translated_files = translate_file_paths(request.files)

View File

@@ -19,7 +19,6 @@ import logging
import os
from typing import Any, Literal, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_ANALYTICAL
@@ -27,7 +26,6 @@ from systemprompts import REFACTOR_PROMPT
from utils.file_utils import translate_file_paths
from .base import BaseTool, ToolRequest
from .models import ToolOutput
logger = logging.getLogger(__name__)
@@ -154,25 +152,6 @@ class RefactorTool(BaseTool):
def get_request_model(self):
return RefactorRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check prompt size before processing"""
logger.info(f"[REFACTOR] execute called with arguments: {list(arguments.keys())}")
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size if provided
if request.prompt:
size_check = self.check_prompt_size(request.prompt)
if size_check:
logger.info("[REFACTOR] Prompt size check triggered, returning early")
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
logger.info("[REFACTOR] Prompt size OK, calling super().execute()")
# Continue with normal execution
return await super().execute(arguments)
def detect_primary_language(self, file_paths: list[str]) -> str:
"""
Detect the primary programming language from file extensions.
@@ -417,6 +396,14 @@ class RefactorTool(BaseTool):
logger.debug(f"[REFACTOR] Updated files list after prompt.txt processing: {len(updated_files)} files")
request.files = updated_files
# Check user input size at MCP transport boundary (before adding internal content)
user_content = request.prompt
size_check = self.check_prompt_size(user_content)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Calculate available token budget for dynamic allocation
continuation_id = getattr(request, "continuation_id", None)

View File

@@ -17,7 +17,6 @@ import logging
import os
from typing import Any, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_ANALYTICAL
@@ -25,7 +24,6 @@ from systemprompts import TESTGEN_PROMPT
from utils.file_utils import translate_file_paths
from .base import BaseTool, ToolRequest
from .models import ToolOutput
logger = logging.getLogger(__name__)
@@ -145,21 +143,6 @@ class TestGenTool(BaseTool):
def get_request_model(self):
return TestGenRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check prompt size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size if provided
if request.prompt:
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
def _process_test_examples(
self, test_examples: list[str], continuation_id: Optional[str], available_tokens: int = None
) -> tuple[str, str]:
@@ -294,6 +277,14 @@ class TestGenTool(BaseTool):
logger.debug(f"[TESTGEN] Updated files list after prompt.txt processing: {len(updated_files)} files")
request.files = updated_files
# Check user input size at MCP transport boundary (before adding internal content)
user_content = request.prompt
size_check = self.check_prompt_size(user_content)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Calculate available token budget for dynamic allocation
continuation_id = getattr(request, "continuation_id", None)

View File

@@ -4,7 +4,6 @@ ThinkDeep tool - Extended reasoning and problem-solving
from typing import TYPE_CHECKING, Any, Optional
from mcp.types import TextContent
from pydantic import Field
if TYPE_CHECKING:
@@ -14,7 +13,6 @@ from config import TEMPERATURE_CREATIVE
from systemprompts import THINKDEEP_PROMPT
from .base import BaseTool, ToolRequest
from .models import ToolOutput
class ThinkDeepRequest(ToolRequest):
@@ -121,20 +119,6 @@ class ThinkDeepTool(BaseTool):
def get_request_model(self):
return ThinkDeepRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check current_analysis size before processing"""
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size
size_check = self.check_prompt_size(request.prompt)
if size_check:
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
# Continue with normal execution
return await super().execute(arguments)
async def prepare_prompt(self, request: ThinkDeepRequest) -> str:
"""Prepare the full prompt for extended thinking"""
# Check for prompt.txt in files
@@ -143,6 +127,13 @@ class ThinkDeepTool(BaseTool):
# Use prompt.txt content if available, otherwise use the prompt field
current_analysis = prompt_content if prompt_content else request.prompt
# Check user input size at MCP transport boundary (before adding internal content)
size_check = self.check_prompt_size(current_analysis)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Update request files list
if updated_files is not None:
request.files = updated_files