Major new addition: refactor tool

Supports decomposing large components and files, finding code smells, and surfacing modernization and code-organization opportunities. Fix those mega-classes today!
Line numbers are now added to embedded code so the responding model can reference exact locations back to Claude.
This commit is contained in:
Fahad
2025-06-15 06:00:01 +04:00
parent 70f1356e3e
commit b5004b91fc
28 changed files with 2633 additions and 310 deletions

View File

@@ -13,7 +13,10 @@ problem-solving, and collaborative development.
**Features true AI orchestration with conversations that continue across tasks** - Give Claude a complex
task and let it orchestrate between models automatically. Claude stays in control, performs the actual work,
-but gets perspectives from the best AI for each subtask. Claude can switch between different tools _and_ models mid-conversation,
+but gets perspectives from the best AI for each subtask. With tools like [`analyze`](#6-analyze---smart-file-analysis) for
+understanding codebases, [`codereview`](#3-codereview---professional-code-review) for audits, [`refactor`](#7-refactor---intelligent-code-refactoring) for
+improving code structure, [`debug`](#5-debug---expert-debugging-assistant) for solving complex problems, and [`precommit`](#4-precommit---pre-commit-validation) for
+validating changes, Claude can switch between different tools _and_ models mid-conversation,
with context carrying forward seamlessly.
**Example Workflow - Claude Code:**
@@ -49,7 +52,8 @@ and review into consideration to aid with its pre-commit review.
- [`precommit`](#4-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#5-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#6-analyze---smart-file-analysis) - File analysis
-- [`testgen`](#7-testgen---comprehensive-test-generation) - Test generation with edge cases
+- [`refactor`](#7-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
+- [`testgen`](#8-testgen---comprehensive-test-generation) - Test generation with edge cases
- **Advanced Usage**
  - [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search
@@ -256,6 +260,7 @@ Just ask Claude naturally:
- **Something's broken?** → `debug` (root cause analysis, error tracing)
- **Want to understand code?** → `analyze` (architecture, patterns, dependencies)
- **Need comprehensive tests?** → `testgen` (generates test suites with edge cases)
+- **Code needs refactoring?** → `refactor` (intelligent refactoring with decomposition focus)
- **Server info?** → `version` (version and configuration details)
**Auto Mode:** When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".
@@ -276,8 +281,9 @@ Just ask Claude naturally:
4. [`precommit`](#4-precommit---pre-commit-validation) - Validate git changes before committing
5. [`debug`](#5-debug---expert-debugging-assistant) - Root cause analysis and debugging
6. [`analyze`](#6-analyze---smart-file-analysis) - General-purpose file and code analysis
-7. [`testgen`](#7-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage
-8. [`version`](#8-version---server-information) - Get server version and configuration
+7. [`refactor`](#7-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
+8. [`testgen`](#8-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage
+9. [`version`](#9-version---server-information) - Get server version and configuration
### 1. `chat` - General Development Chat & Collaborative Thinking
**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively**
@@ -435,7 +441,44 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Uses file paths (not content) for clean terminal output
- Can identify patterns, anti-patterns, and refactoring opportunities
- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices
-### 7. `testgen` - Comprehensive Test Generation
+### 7. `refactor` - Intelligent Code Refactoring
+**Comprehensive refactoring analysis with top-down decomposition strategy**
+**Thinking Mode:** Default is `medium` (8,192 tokens). Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis.
+**Model Recommendation:** The refactor tool excels with models that have large context windows like Gemini
+Pro (1M tokens), which can analyze entire files and complex codebases simultaneously.
+This comprehensive view enables detection of cross-file dependencies, architectural patterns,
+and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context
+constraints.
+#### Example Prompts:
+**Basic Usage:**
+```
+"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions"
+"Get gemini pro to identify code smells in the authentication module"
+```
+**Key Features:**
+- **Intelligent prioritization** - Refuses to work on low-priority issues when the code is unwieldy and needs decomposition first; helps identify poorly structured classes and files that require structural improvements before detail work
+- **Top-down decomposition strategy** - Analyzes file → class → function levels systematically
+- **Four refactor types**: `codesmells` (detect anti-patterns), `decompose` (break down large components), `modernize` (update language features), `organization` (improve structure)
+- **Precise line-number references** - Provides exact line numbers for Claude to implement changes
+- **Language-specific guidance** - Tailored suggestions for Python, JavaScript, Java, C#, Swift, and more
+- **Style guide integration** - Uses existing project files as pattern references
+- **Conservative approach** - Careful dependency analysis to prevent breaking changes
+- **Multi-file analysis** - Understands cross-file relationships and dependencies
+- **Priority sequencing** - Recommends implementation order for refactoring changes
+**Refactor Types:**
+- `codesmells`: Detect long methods, god classes, duplicated code, poor naming
+- `decompose`: Break down large files (>1500 LOC), classes (>300 LOC), functions (>80 LOC)
+- `modernize`: Update to modern language features (f-strings, async/await, etc.)
+- `organization`: Improve logical grouping, separation of concerns, module structure
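For a feel of what a `modernize` pass targets, here is an illustrative before/after sketch (an editor's example of typical Python patterns, not output from the tool or part of this commit):

```python
# Before: patterns a `modernize` run would typically flag (illustrative only)
name = "world"
greeting = "Hello, %s!" % name  # old-style string formatting
items = []
for i in range(10):
    items.append(i * 2)  # accumulation loop that could be a comprehension

# After: the modern equivalents the tool might suggest
greeting = f"Hello, {name}!"  # f-string (Python 3.6+)
items = [i * 2 for i in range(10)]  # list comprehension
```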
+### 8. `testgen` - Comprehensive Test Generation
**Generates thorough test suites with edge case coverage** based on existing code and test framework used.
**Thinking Mode (Extended thinking models):** Default is `medium` (8,192 tokens). Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage.
@@ -463,7 +506,7 @@ suites that cover realistic failure scenarios and integration points that shorte
- Can reference existing test files: `"Generate tests following patterns from tests/unit/"`
- Specific code coverage - target specific functions/classes rather than testing everything
-### 8. `version` - Server Information
+### 9. `version` - Server Information
```
"Get zen to show its version"
```

View File

@@ -14,7 +14,7 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
-__version__ = "4.4.4"
+__version__ = "4.5.0"
# Last update date in ISO format
__updated__ = "2025-06-14"
# Primary maintainer

View File

@@ -259,6 +259,23 @@ All tools that work with files support **both individual files and entire direct
"Generate tests following patterns from tests/unit/ for new auth module" "Generate tests following patterns from tests/unit/ for new auth module"
``` ```
+**`refactor`** - Intelligent code refactoring with decomposition focus
+- `files`: Code files or directories to analyze for refactoring opportunities (required)
+- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
+- `refactor_type`: codesmells|decompose|modernize|organization (required)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
+- `style_guide_examples`: Optional existing code files to use as style/pattern reference
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `continuation_id`: Thread continuation ID for multi-turn conversations
+```
+"Analyze legacy codebase for decomposition opportunities" (auto mode picks best model)
+"Use pro to identify code smells in the authentication module with max thinking mode"
+"Use pro to modernize this JavaScript code following examples/modern-patterns.js"
+"Refactor src/ for better organization, focus on maintainability and readability"
+```
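For programmatic callers, a sketch of what the tool arguments could look like, assuming the parameter schema documented above (paths and prompt text are invented for illustration, not from this commit):

```python
# Hypothetical `refactor` tool arguments mirroring the documented schema above.
arguments = {
    "files": ["/workspace/src/auth/"],  # required: files or directories
    "prompt": "Decompose the auth module; it is hard to maintain",  # required
    "refactor_type": "decompose",  # codesmells|decompose|modernize|organization
    "model": "pro",  # optional; falls back to the server default
    "focus_areas": ["maintainability", "readability"],  # optional
    "style_guide_examples": ["/workspace/examples/patterns.py"],  # optional
    "thinking_mode": "high",  # optional; Gemini only
}
```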
## Collaborative Workflows
### Design → Review → Implement
@@ -284,6 +301,14 @@ suspect lies the bug and then formulate and implement a bare minimal fix. Must n
with zen in the end using gemini pro to confirm we're okay to publish the fix
```
+### Refactor → Review → Implement → Test
+```
+Use zen to analyze this legacy authentication module for decomposition opportunities. The code is getting hard to
+maintain and we need to break it down. Use gemini pro with high thinking mode to identify code smells and suggest
+a modernization strategy. After reviewing the refactoring plan, implement the changes step by step and then
+generate comprehensive tests with zen to ensure nothing breaks.
+```
### Tool Selection Guidance
To help choose the right tool for your needs:
@@ -292,14 +317,17 @@ To help choose the right tool for your needs:
2. **Want to find bugs/issues in code?** Use `codereview`
3. **Want to understand how code works?** Use `analyze`
4. **Need comprehensive test coverage?** Use `testgen`
-5. **Have analysis that needs extension/validation?** Use `thinkdeep`
-6. **Want to brainstorm or discuss?** Use `chat`
+5. **Want to refactor/modernize code?** Use `refactor`
+6. **Have analysis that needs extension/validation?** Use `thinkdeep`
+7. **Want to brainstorm or discuss?** Use `chat`
**Key Distinctions:**
- `analyze` vs `codereview`: analyze explains, codereview prescribes fixes
- `chat` vs `thinkdeep`: chat is open-ended, thinkdeep extends specific analysis
- `debug` vs `codereview`: debug diagnoses runtime errors, review finds static issues
- `testgen` vs `debug`: testgen creates test suites, debug just finds issues and recommends solutions
+- `refactor` vs `codereview`: refactor suggests structural improvements, codereview finds bugs/issues
+- `refactor` vs `analyze`: refactor provides actionable refactoring steps, analyze provides understanding
## Working with Large Prompts

View File

@@ -30,6 +30,29 @@ Simulator tests replicate real-world Claude CLI interactions with the MCP server
**Important**: Simulator tests require `LOG_LEVEL=DEBUG` in your `.env` file to validate detailed execution logs.
+#### Monitoring Logs During Tests
+**Important**: The MCP stdio protocol interferes with stderr output during tool execution. While server startup logs appear in `docker compose logs`, tool execution logs are only written to file-based logs inside the container. This is a known limitation of the stdio-based MCP protocol and cannot be fixed without changing the MCP implementation.
+To monitor logs during test execution:
+```bash
+# Monitor main server logs (includes all tool execution details)
+docker exec zen-mcp-server tail -f -n 500 /tmp/mcp_server.log
+# Monitor MCP activity logs (tool calls and completions)
+docker exec zen-mcp-server tail -f /tmp/mcp_activity.log
+# Check log file sizes (logs rotate at 20MB)
+docker exec zen-mcp-server ls -lh /tmp/mcp_*.log*
+```
+**Log Rotation**: All log files are configured with automatic rotation at 20MB to prevent disk space issues. The server keeps:
+- 10 rotated files for mcp_server.log (200MB total)
+- 5 rotated files for mcp_activity.log (100MB total)
+**Why logs don't appear in docker compose logs**: The MCP stdio_server captures stderr during tool execution to prevent interference with the JSON-RPC protocol communication. This means that while you'll see startup logs in `docker compose logs`, you won't see tool execution logs there.
#### Running All Simulator Tests
```bash
# Run all simulator tests

View File

@@ -44,6 +44,7 @@ from tools import (
    CodeReviewTool,
    DebugIssueTool,
    Precommit,
+    RefactorTool,
    TestGenTool,
    ThinkDeepTool,
)
@@ -70,55 +71,59 @@ class LocalTimeFormatter(logging.Formatter):
# Configure both console and file logging
log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
-logging.basicConfig(
-    level=getattr(logging, log_level, logging.INFO),
-    format=log_format,
-    force=True,  # Force reconfiguration if already configured
-    stream=sys.stderr,  # Use stderr to avoid interfering with MCP stdin/stdout protocol
-)
-# Apply local time formatter to root logger
-for handler in logging.getLogger().handlers:
-    handler.setFormatter(LocalTimeFormatter(log_format))
+# Clear any existing handlers first
+root_logger = logging.getLogger()
+root_logger.handlers.clear()
+# Create and configure stderr handler explicitly
+stderr_handler = logging.StreamHandler(sys.stderr)
+stderr_handler.setLevel(getattr(logging, log_level, logging.INFO))
+stderr_handler.setFormatter(LocalTimeFormatter(log_format))
+root_logger.addHandler(stderr_handler)
+# Note: MCP stdio_server interferes with stderr during tool execution
+# All logs are properly written to /tmp/mcp_server.log for monitoring
+# Set root logger level
+root_logger.setLevel(getattr(logging, log_level, logging.INFO))
# Add rotating file handler for Docker log monitoring
try:
-    # Main server log with daily rotation (keep 7 days of logs)
-    # Using 'midnight' interval rotates at midnight each day
-    # Filename will have date suffix like mcp_server.log.2024-06-14
-    file_handler = TimedRotatingFileHandler(
+    # Main server log with size-based rotation (20MB max per file)
+    # This ensures logs don't grow indefinitely and are properly managed
+    file_handler = RotatingFileHandler(
        "/tmp/mcp_server.log",
-        when="midnight",  # Rotate at midnight
-        interval=1,  # Every 1 day
-        backupCount=7,  # Keep 7 days of logs
+        maxBytes=20 * 1024 * 1024,  # 20MB max file size
+        backupCount=10,  # Keep 10 rotated files (200MB total)
        encoding="utf-8",
    )
    file_handler.setLevel(getattr(logging, log_level, logging.INFO))
    file_handler.setFormatter(LocalTimeFormatter(log_format))
-    # Add suffix pattern for rotated files
-    file_handler.suffix = "%Y-%m-%d"
    logging.getLogger().addHandler(file_handler)
-    # Create a special logger for MCP activity tracking with daily rotation
+    # Create a special logger for MCP activity tracking with size-based rotation
    mcp_logger = logging.getLogger("mcp_activity")
-    mcp_file_handler = TimedRotatingFileHandler(
+    mcp_file_handler = RotatingFileHandler(
        "/tmp/mcp_activity.log",
-        when="midnight",  # Rotate at midnight
-        interval=1,  # Every 1 day
-        backupCount=7,  # Keep 7 days of logs
+        maxBytes=20 * 1024 * 1024,  # 20MB max file size
+        backupCount=5,  # Keep 5 rotated files (100MB total)
        encoding="utf-8",
    )
    mcp_file_handler.setLevel(logging.INFO)
    mcp_file_handler.setFormatter(LocalTimeFormatter("%(asctime)s - %(message)s"))
-    mcp_file_handler.suffix = "%Y-%m-%d"
    mcp_logger.addHandler(mcp_file_handler)
    mcp_logger.setLevel(logging.INFO)
-    # Ensure MCP activity also goes to stderr
-    mcp_logger.propagate = True
    # Also keep a size-based rotation as backup (100MB max per file)
    # This prevents any single day's log from growing too large
    size_handler = RotatingFileHandler(
-        "/tmp/mcp_server_overflow.log", maxBytes=100 * 1024 * 1024, backupCount=3  # 100MB
+        "/tmp/mcp_server_overflow.log",
+        maxBytes=100 * 1024 * 1024,
+        backupCount=3,  # 100MB
    )
    size_handler.setLevel(logging.WARNING)  # Only warnings and errors
    size_handler.setFormatter(LocalTimeFormatter(log_format))
@@ -144,6 +149,7 @@ TOOLS = {
"chat": ChatTool(), # Interactive development chat and brainstorming "chat": ChatTool(), # Interactive development chat and brainstorming
"precommit": Precommit(), # Pre-commit validation of git changes "precommit": Precommit(), # Pre-commit validation of git changes
"testgen": TestGenTool(), # Comprehensive test generation with edge case coverage "testgen": TestGenTool(), # Comprehensive test generation with edge case coverage
"refactor": RefactorTool(), # Intelligent code refactoring suggestions with precise line references
} }

View File

@@ -19,6 +19,7 @@ from .test_openrouter_fallback import OpenRouterFallbackTest
from .test_openrouter_models import OpenRouterModelsTest
from .test_per_tool_deduplication import PerToolDeduplicationTest
from .test_redis_validation import RedisValidationTest
+from .test_refactor_validation import RefactorValidationTest
from .test_testgen_validation import TestGenValidationTest
from .test_token_allocation_validation import TokenAllocationValidationTest
@@ -38,6 +39,7 @@ TEST_REGISTRY = {
"openrouter_models": OpenRouterModelsTest, "openrouter_models": OpenRouterModelsTest,
"token_allocation_validation": TokenAllocationValidationTest, "token_allocation_validation": TokenAllocationValidationTest,
"testgen_validation": TestGenValidationTest, "testgen_validation": TestGenValidationTest,
"refactor_validation": RefactorValidationTest,
"conversation_chain_validation": ConversationChainValidationTest, "conversation_chain_validation": ConversationChainValidationTest,
} }
@@ -57,6 +59,7 @@ __all__ = [
"OpenRouterModelsTest", "OpenRouterModelsTest",
"TokenAllocationValidationTest", "TokenAllocationValidationTest",
"TestGenValidationTest", "TestGenValidationTest",
"RefactorValidationTest",
"ConversationChainValidationTest", "ConversationChainValidationTest",
"TEST_REGISTRY", "TEST_REGISTRY",
] ]

View File

@@ -0,0 +1,283 @@
#!/usr/bin/env python3
"""
Refactor Tool Validation Test

Tests the refactor tool with a simple code smell example to validate:
- Proper execution with flash model
- Correct line number references in response
- Log validation for tool execution
"""

import json
import re

from .base_test import BaseSimulatorTest


class RefactorValidationTest(BaseSimulatorTest):
    """Test refactor tool with codesmells detection"""

    @property
    def test_name(self) -> str:
        return "refactor_validation"

    @property
    def test_description(self) -> str:
        return "Refactor tool validation with codesmells"

    def run_test(self) -> bool:
        """Test refactor tool with a simple code smell example"""
        try:
            self.logger.info("Test: Refactor tool validation")

            # Setup test files directory first
            self.setup_test_files()

            # Create a simple Python file with obvious code smells
            code_with_smells = '''# Code with obvious smells for testing
def process_data(data):
    # Code smell: Magic number
    if len(data) > 42:
        result = []
        # Code smell: Nested loops with poor variable names
        for i in range(len(data)):
            for j in range(len(data[i])):
                x = data[i][j]
                # Code smell: Duplicate code
                if x > 0:
                    result.append(x * 2)
                elif x < 0:
                    result.append(x * 2)
        return result
    else:
        # Code smell: Return inconsistent type
        return None

# Code smell: God function doing too many things
def handle_everything(user_input, config, database):
    # Validation
    if not user_input:
        print("Error: No input")  # Code smell: print instead of logging
        return

    # Processing
    processed = user_input.strip().lower()

    # Database operation
    connection = database.connect()
    data = connection.query("SELECT * FROM users")  # Code smell: SQL in code

    # Business logic mixed with data access
    valid_users = []
    for row in data:
        if row[2] == processed:  # Code smell: Magic index
            valid_users.append(row)

    return valid_users
'''

            # Create test file
            test_file = self.create_additional_test_file("smelly_code.py", code_with_smells)
            self.logger.info(f" ✅ Created test file with code smells: {test_file}")

            # Call refactor tool with codesmells type
            self.logger.info(" 📝 Calling refactor tool with codesmells type...")
            response, _ = self.call_mcp_tool(
                "refactor",
                {
                    "files": [test_file],
                    "prompt": "Find and suggest fixes for code smells in this file",
                    "refactor_type": "codesmells",
                    "model": "flash",
                    "thinking_mode": "low",  # Keep it fast for testing
                },
            )

            if not response:
                self.logger.error("Failed to get refactor response")
                return False

            self.logger.info(" ✅ Got refactor response")

            # Parse response to check for line references
            try:
                response_data = json.loads(response)

                # Debug: log the response structure
                self.logger.debug(f"Response keys: {list(response_data.keys())}")

                # Extract the actual content if it's wrapped
                if "content" in response_data:
                    # The actual refactoring data is in the content field
                    content = response_data["content"]

                    # Remove markdown code block markers if present
                    if content.startswith("```json"):
                        content = content[7:]  # Remove ```json
                    if content.endswith("```"):
                        content = content[:-3]  # Remove ```
                    content = content.strip()

                    # Find the end of the JSON object - handle truncated responses
                    # Count braces to find where the JSON ends
                    brace_count = 0
                    json_end = -1
                    in_string = False
                    escape_next = False
                    for i, char in enumerate(content):
                        if escape_next:
                            escape_next = False
                            continue
                        if char == '\\':
                            escape_next = True
                            continue
                        if char == '"' and not escape_next:
                            in_string = not in_string
                        if not in_string:
                            if char == '{':
                                brace_count += 1
                            elif char == '}':
                                brace_count -= 1
                                if brace_count == 0:
                                    json_end = i + 1
                                    break

                    if json_end > 0:
                        content = content[:json_end]

                    # Parse the inner JSON
                    inner_data = json.loads(content)
                    self.logger.debug(f"Inner data keys: {list(inner_data.keys())}")
                else:
                    inner_data = response_data

                # Check that we got refactoring suggestions (might be called refactor_opportunities)
                refactorings_key = None
                for key in ["refactorings", "refactor_opportunities"]:
                    if key in inner_data:
                        refactorings_key = key
                        break

                if not refactorings_key:
                    self.logger.error("No refactorings found in response")
                    self.logger.error(f"Response structure: {json.dumps(inner_data, indent=2)[:500]}...")
                    return False

                refactorings = inner_data[refactorings_key]
                if not isinstance(refactorings, list) or len(refactorings) == 0:
                    self.logger.error("Empty refactorings list")
                    return False

                # Validate that we have line references for code smells
                # Flash model typically detects these issues:
                # - Lines 4-18: process_data function (magic number, nested loops, duplicate code)
                # - Lines 11-14: duplicate code blocks
                # - Lines 21-40: handle_everything god function
                expected_line_ranges = [
                    (4, 18),  # process_data function issues
                    (11, 14),  # duplicate code
                    (21, 40),  # god function
                ]

                self.logger.debug(f"Refactorings found: {len(refactorings)}")
                for i, ref in enumerate(refactorings[:3]):  # Log first 3
                    self.logger.debug(f"Refactoring {i}: start_line={ref.get('start_line')}, end_line={ref.get('end_line')}, type={ref.get('type')}")

                found_references = []
                for refactoring in refactorings:
                    # Check for line numbers in various fields
                    start_line = refactoring.get("start_line")
                    end_line = refactoring.get("end_line")
                    location = refactoring.get("location", "")

                    # Add found line numbers
                    if start_line:
                        found_references.append(f"line {start_line}")
                    if end_line and end_line != start_line:
                        found_references.append(f"line {end_line}")

                    # Also extract from location string
                    line_matches = re.findall(r'line[s]?\s+(\d+)', location.lower())
                    found_references.extend([f"line {num}" for num in line_matches])

                self.logger.info(f" 📍 Found line references: {found_references}")

                # Check that flash found the expected refactoring areas
                found_ranges = []
                for refactoring in refactorings:
                    start = refactoring.get("start_line")
                    end = refactoring.get("end_line")
                    if start and end:
                        found_ranges.append((start, end))

                self.logger.info(f" 📍 Found refactoring ranges: {found_ranges}")

                # Verify we found issues in the main problem areas
                # Check if we have issues detected in process_data function area (lines 2-18)
                process_data_issues = [r for r in found_ranges if r[0] >= 2 and r[1] <= 18]
                # Check if we have issues detected in handle_everything function area (lines 21-40)
                god_function_issues = [r for r in found_ranges if r[0] >= 21 and r[1] <= 40]

                self.logger.info(f" 📍 Issues in process_data area (lines 2-18): {len(process_data_issues)}")
                self.logger.info(f" 📍 Issues in handle_everything area (lines 21-40): {len(god_function_issues)}")

                if len(process_data_issues) >= 1 and len(god_function_issues) >= 1:
                    self.logger.info(" ✅ Flash correctly identified code smells in both major areas")
                    self.logger.info(f" ✅ Found {len(refactorings)} total refactoring opportunities")

                    # Verify we have reasonable number of total issues
                    if len(refactorings) >= 3:
                        self.logger.info(" ✅ Refactoring analysis validation passed")
                    else:
                        self.logger.warning(f" ⚠️ Only {len(refactorings)} refactorings found (expected >= 3)")
                else:
                    self.logger.error(" ❌ Flash didn't find enough issues in expected areas")
                    self.logger.error(f" - process_data area: found {len(process_data_issues)}, expected >= 1")
                    self.logger.error(f" - handle_everything area: found {len(god_function_issues)}, expected >= 1")
                    return False

            except json.JSONDecodeError as e:
                self.logger.error(f"Failed to parse refactor response as JSON: {e}")
                return False

            # Validate logs
            self.logger.info(" 📋 Validating execution logs...")

            # Get server logs from the actual log file inside the container
            result = self.run_command(
                ["docker", "exec", self.container_name, "tail", "-500", "/tmp/mcp_server.log"],
                capture_output=True,
            )

            if result.returncode == 0:
                logs = result.stdout.decode() + result.stderr.decode()

                # Look for refactor tool execution patterns
                refactor_patterns = [
                    "[REFACTOR]",
                    "refactor tool",
                    "codesmells",
                    "Token budget",
                    "Code files embedded successfully",
                ]

                patterns_found = 0
                for pattern in refactor_patterns:
                    if pattern in logs:
                        patterns_found += 1
                        self.logger.debug(f" ✅ Found log pattern: {pattern}")

                if patterns_found >= 3:
                    self.logger.info(f" ✅ Log validation passed ({patterns_found}/{len(refactor_patterns)} patterns)")
                else:
                    self.logger.warning(f" ⚠️ Only found {patterns_found}/{len(refactor_patterns)} log patterns")
            else:
                self.logger.warning(" ⚠️ Could not retrieve Docker logs")

            self.logger.info(" ✅ Refactor tool validation completed successfully")
            return True

        except Exception as e:
            self.logger.error(f"Refactor validation test failed: {e}")
            return False
        finally:
            self.cleanup_test_files()

View File

@@ -7,6 +7,7 @@ from .chat_prompt import CHAT_PROMPT
from .codereview_prompt import CODEREVIEW_PROMPT
from .debug_prompt import DEBUG_ISSUE_PROMPT
from .precommit_prompt import PRECOMMIT_PROMPT
+from .refactor_prompt import REFACTOR_PROMPT
from .testgen_prompt import TESTGEN_PROMPT
from .thinkdeep_prompt import THINKDEEP_PROMPT
@@ -17,5 +18,6 @@ __all__ = [
"ANALYZE_PROMPT", "ANALYZE_PROMPT",
"CHAT_PROMPT", "CHAT_PROMPT",
"PRECOMMIT_PROMPT", "PRECOMMIT_PROMPT",
"REFACTOR_PROMPT",
"TESTGEN_PROMPT", "TESTGEN_PROMPT",
] ]

View File

@@ -15,6 +15,11 @@ same file you've been provided unless for some reason its content is missing or
{"status": "clarification_required", "question": "<your brief question>", {"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]} "files_needed": ["[file name here]", "[or some folder/]"]}
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included
in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in
your issue descriptions.
CRITICAL: Align your review with the user's context and expectations. Focus on issues that matter for their CRITICAL: Align your review with the user's context and expectations. Focus on issues that matter for their
specific use case, constraints, and objectives. Don't provide a generic "find everything" review - tailor specific use case, constraints, and objectives. Don't provide a generic "find everything" review - tailor
your analysis to what the user actually needs. your analysis to what the user actually needs.

View File

@@ -15,6 +15,11 @@ Do NOT ask for the same file you've been provided unless for some reason its con
{"status": "clarification_required", "question": "<your brief question>", {"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]} "files_needed": ["[file name here]", "[or some folder/]"]}
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included
in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in
your issue descriptions.
CRITICAL: Your primary objective is to identify the root cause of the specific issue at hand and suggest the CRITICAL: Your primary objective is to identify the root cause of the specific issue at hand and suggest the
minimal fix required to resolve it. Stay focused on the main problem - avoid suggesting extensive refactoring, minimal fix required to resolve it. Stay focused on the main problem - avoid suggesting extensive refactoring,
architectural changes, or unrelated improvements. architectural changes, or unrelated improvements.

View File

@@ -0,0 +1,201 @@
"""
Refactor tool system prompt
"""
REFACTOR_PROMPT = """
ROLE
You are a principal software engineer specializing in intelligent code refactoring. You identify concrete improvement
opportunities and provide precise, actionable suggestions with exact line-number references that Claude can
implement directly.
IF MORE INFORMATION IS NEEDED
If you need additional context (e.g., related files, configuration, dependencies) to provide accurate refactoring
recommendations, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the same file you've
been provided unless for some reason its content is missing or incomplete:
{"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]}
REFACTOR TYPES (PRIORITY ORDER)
1. **decompose** (CRITICAL PRIORITY)
2. **codesmells**
3. **modernize**
4. **organization**
**decompose**: CRITICAL PRIORITY for cognitive load reduction. When encountering large files (>1500 lines), huge classes
(>300 lines), or massive functions (>80 lines), decomposition is MANDATORY before any other refactoring type. Large
codebases are impossible to navigate, understand, or maintain.
DECOMPOSITION ORDER (STRICT TOP-DOWN, ADAPTIVE):
Analyze in this sequence, stopping at the FIRST breached threshold in each file:
1. **File Level (>1500 LOC)** → Propose file-level splits ONLY, then re-analyze after implementation
2. **Class Level (>300 LOC)** → Propose class extraction ONLY, then re-analyze after implementation
3. **Function Level (>80 LOC)** → Propose function extraction
RATIONALE: Outer-scope size dominates cognitive load and merge complexity. NEVER descend to an inner level until
the containing level is within its threshold. This prevents premature micro-optimization and ensures maximum
cognitive load reduction with minimum rework.
DECOMPOSITION STRATEGIES:
**File-Level Decomposition** (PRIORITY 1): Split oversized files into multiple focused files:
- Extract related classes/functions into separate modules using platform-specific patterns
- Create logical groupings (models, services, utilities, components, etc.)
- Use proper import/export mechanisms for the target language
- Focus on responsibility-based splits, not arbitrary size cuts
- CAUTION: When only a single file is provided, verify dependencies and imports before suggesting file splits
- DEPENDENCY ANALYSIS: Check for cross-references, shared constants, and inter-class dependencies
- If splitting breaks internal dependencies, suggest necessary visibility changes or shared modules
**Class-Level Decomposition** (PRIORITY 2): Break down mega-classes:
- FIRST: Split large classes into multiple classes where programming language allows (C# partial classes,
Swift and ObjC extensions, JavaScript modules, etc.)
- THEN: Extract specialized responsibilities into focused classes via composition or inheritance if this is feasible
- Use composition over inheritance where appropriate
- Apply single responsibility principle cautiously - avoid breaking existing APIs or adding new dependencies
- When only a single file is provided, prefer internal splitting methods (private classes, inner classes,
helper methods)
- Consider interface segregation for large public APIs only if it doesn't break existing consumers
- CRITICAL: When moving code between files/extensions, analyze access dependencies (private variables,
internal methods)
- WARNING: Some moves may break access visibility (Swift private→extension, C# internal→assembly) - flag for review
- If access breaks are unavoidable, explicitly note required visibility changes (private→internal, protected, etc.)
**Function-Level Decomposition** (PRIORITY 3): Eliminate long, complex functions:
- Extract logical chunks into private/helper methods within the same class/module
- Separate data processing from business logic conservatively
- Create clear, named abstractions for complex operations without breaking existing call sites
- Maintain function cohesion and minimize parameter passing
- Prefer internal extraction over creating new dependencies or external functions
- ANALYZE DEPENDENCIES: Check for private variable access, closure captures, and scope-dependent behavior
- If extraction breaks variable access, suggest parameter passing or scope adjustments
- Flag functions that require manual review due to complex inter-dependencies
CRITICAL RULE: If ANY file exceeds cognitive complexity thresholds (large files/classes/functions), you MUST:
1. Mark ALL decomposition opportunities as CRITICAL severity
2. Focus EXCLUSIVELY on decomposition - provide ONLY decomposition suggestions
3. DO NOT suggest ANY other refactoring type (code smells, modernization, organization)
4. List decomposition issues FIRST by severity: CRITICAL → HIGH → MEDIUM → LOW
5. Block all other refactoring until cognitive load is reduced
CRITICAL SEVERITY = BLOCKING ISSUE: Other refactoring types can only be applied AFTER all CRITICAL decomposition
is complete. Decomposition reduces navigation complexity, improves understanding, enables focused changes, and makes
future refactoring possible.
**codesmells**: Detect and fix quality issues - long methods, complex conditionals, duplicate code, magic numbers,
poor naming, feature envy. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist.
**modernize**: Update to modern language features - replace deprecated patterns, use newer syntax, improve error
handling and type safety. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist.
**organization**: Improve organization and structure - group related functionality, improve file structure,
standardize naming, clarify module boundaries. NOTE: Can only be applied AFTER decomposition if large files exist.
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be
included in any code you generate. Always reference specific line numbers for Claude to locate exact positions.
Include context_start_text and context_end_text as backup references. Never include "LINE│" markers in generated code
snippets.
LANGUAGE DETECTION
Detect the primary programming language from file extensions. Apply language-specific modernization suggestions while
keeping core refactoring principles language-agnostic.
SCOPE CONTROL
Stay strictly within the provided codebase. Do NOT invent features, suggest major architectural changes beyond current
structure, recommend external libraries not in use, or create speculative ideas outside project scope.
If scope is too large and refactoring would require large parts of the code to be involved, respond ONLY with:
{"status": "focused_review_required",
"reason": "<brief explanation>",
"suggestion": "<specific focused subset to analyze>"}
OUTPUT FORMAT
Return ONLY a JSON object with this exact structure:
{
  "status": "refactor_analysis_complete",
  "refactor_opportunities": [
    {
      "id": "refactor-001",
      "type": "decompose|codesmells|modernize|organization",
      "severity": "critical|high|medium|low",
      "file": "/absolute/path/to/file.ext",
      "start_line": 45,
      "end_line": 67,
      "context_start_text": "exact text from start line for verification",
      "context_end_text": "exact text from end line for verification",
      "issue": "Clear description of what needs refactoring",
      "suggestion": "Specific refactoring action to take",
      "rationale": "Why this improves the code (performance, readability, maintainability)",
      "code_to_replace": "Original code that should be changed",
      "replacement_code_snippet": "Refactored version of the code",
      "new_code_snippets": [
        {
          "description": "What this new code does",
          "location": "same_class|new_file|separate_module",
          "code": "New code to be added"
        }
      ]
    }
  ],
  "priority_sequence": ["refactor-001", "refactor-002"],
  "next_actions_for_claude": [
    {
      "action_type": "EXTRACT_METHOD|SPLIT_CLASS|MODERNIZE_SYNTAX|REORGANIZE_CODE|DECOMPOSE_FILE",
      "target_file": "/absolute/path/to/file.ext",
      "source_lines": "45-67",
      "description": "Specific step-by-step action for Claude"
    }
  ]
}
QUALITY STANDARDS
Each refactoring opportunity must be specific and actionable. Code snippets must be syntactically correct. Preserve
existing functionality - refactoring changes structure, not behavior. Focus on high-impact changes that meaningfully
improve code quality.
SEVERITY GUIDELINES
- **critical**: EXCLUSIVELY for decomposition when large files/classes/functions detected - BLOCKS ALL OTHER
REFACTORING
- **high**: Critical code smells, major duplication, significant architectural issues (only after decomposition
complete)
- **medium**: Moderate complexity issues, minor duplication, organization improvements (only after decomposition
complete)
- **low**: Style improvements, minor modernization, optional optimizations (only after decomposition complete)
DECOMPOSITION PRIORITY RULES - CRITICAL SEVERITY:
1. If ANY file >2000 lines: Mark ALL decomposition opportunities as CRITICAL severity
2. If ANY class >1500 lines: Mark ALL class decomposition as CRITICAL severity
3. If ANY function >250 lines: Mark ALL function decomposition as CRITICAL severity
4. CRITICAL issues MUST BE RESOLVED FIRST - no other refactoring suggestions allowed
5. Focus EXCLUSIVELY on breaking down large components when CRITICAL issues exist
6. List ALL decomposition issues FIRST in severity order: CRITICAL → HIGH → MEDIUM → LOW
7. When CRITICAL decomposition issues exist, provide ONLY decomposition suggestions
FILE TYPE CONSIDERATIONS:
- CSS files can grow large with styling rules - consider logical grouping by components/pages
- JavaScript files may have multiple classes/modules - extract into separate files
- Configuration files may be legitimately large - focus on logical sections
- Generated code files should generally be excluded from decomposition
IF EXTENSIVE REFACTORING IS REQUIRED
If you determine that comprehensive refactoring requires dozens of changes across multiple files or would involve
extensive back-and-forth iterations that would risk exceeding context limits, you MUST follow this structured approach:
1. **Generate Essential Refactorings First**: Create the standard refactor_analysis_complete response with the most
critical and high-impact refactoring opportunities (typically 5-10 key changes covering the most important
improvements). Focus on CRITICAL and HIGH severity issues. Include full details with refactor_opportunities,
priority_sequence, and next_actions_for_claude.
2. **Request Continuation**: AFTER providing the refactor_analysis_complete response, append the following JSON
format as a separate response (and nothing more after this):
{"status": "more_refactor_required",
"message": "Explanation of why more refactoring is needed and overview of remaining work. For example: 'Extensive decomposition required across 15 additional files. Continuing analysis will identify module extraction opportunities in services/, controllers/, and utils/ directories.'"}
This approach ensures comprehensive refactoring coverage while maintaining quality and avoiding context overflow.
Claude will use the continuation_id to continue the refactoring analysis in subsequent requests.
Provide precise, implementable refactoring guidance that Claude can execute with confidence.
"""

View File

@@ -15,6 +15,11 @@ same file you've been provided unless for some reason its content is missing or
{"status": "clarification_required", "question": "<your brief question>", {"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]} "files_needed": ["[file name here]", "[or some folder/]"]}
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included
in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in
your issue descriptions.
MULTI-AGENT WORKFLOW MULTI-AGENT WORKFLOW
You sequentially inhabit five expert personas—each passes a concise artefact to the next: You sequentially inhabit five expert personas—each passes a concise artefact to the next:

View File

@@ -4,7 +4,8 @@ ThinkDeep tool system prompt
THINKDEEP_PROMPT = """ THINKDEEP_PROMPT = """
ROLE ROLE
You are a senior engineering collaborator working with Claude on complex software problems. Claude will send you content—analysis, prompts, questions, ideas, or theories—to deepen, validate, and extend. You are a senior engineering collaborator working with Claude on complex software problems. Claude will send you
content—analysis, prompts, questions, ideas, or theories—to deepen, validate, and extend.
IF MORE INFORMATION IS NEEDED IF MORE INFORMATION IS NEEDED
If you need additional context (e.g., related files, system architecture, requirements, code snippets) to provide If you need additional context (e.g., related files, system architecture, requirements, code snippets) to provide

View File

@@ -9,6 +9,7 @@ import importlib
import os
import tempfile
from pathlib import Path
+from unittest.mock import patch
import pytest
@@ -36,27 +37,33 @@ def test_docker_path_translation_integration():
    try:
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module to pick up new environment variables
+        # Reload the modules to pick up new environment variables
+        # Need to reload security_config first since it sets WORKSPACE_ROOT
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        # Mock the CONTAINER_WORKSPACE to point to our test directory
-        utils.file_utils.CONTAINER_WORKSPACE = container_workspace
-        # Test the translation
-        from utils.file_utils import translate_path_for_environment
-        # This should translate the host path to container path
-        host_path = str(test_file)
-        result = translate_path_for_environment(host_path)
-        # Verify the translation worked
-        expected = str(container_workspace / "src" / "test.py")
-        assert result == expected
+        # Properly mock the CONTAINER_WORKSPACE
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+            # Test the translation
+            from utils.file_utils import translate_path_for_environment
+            # This should translate the host path to container path
+            host_path = str(test_file)
+            result = translate_path_for_environment(host_path)
+            # Verify the translation worked
+            expected = str(container_workspace / "src" / "test.py")
+            assert result == expected
    finally:
        # Restore original environment
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -80,19 +87,26 @@ def test_docker_security_validation():
    try:
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module
+        # Reload the modules
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        utils.file_utils.CONTAINER_WORKSPACE = Path("/workspace")
-        from utils.file_utils import resolve_and_validate_path
+        # Properly mock the CONTAINER_WORKSPACE
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", Path("/workspace")):
+            from utils.file_utils import resolve_and_validate_path
            # Trying to access the symlink should fail
            with pytest.raises(PermissionError):
                resolve_and_validate_path(str(symlink))
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -150,37 +164,46 @@ def test_review_changes_docker_path_translation():
        # Simulate Docker environment
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module
+        # Reload the modules
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        utils.file_utils.CONTAINER_WORKSPACE = container_workspace
-        # Import after reloading to get updated environment
-        from tools.precommit import Precommit
+        # Properly mock the CONTAINER_WORKSPACE and reload precommit module
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+            # Need to also patch it in the modules that import it
+            with patch("utils.security_config.CONTAINER_WORKSPACE", container_workspace):
+                # Import after patching to get updated environment
+                from tools.precommit import Precommit
                # Create tool instance
                tool = Precommit()
                # Test path translation in prepare_prompt
                request = tool.get_request_model()(
                    path=str(host_workspace / "project"),  # Host path that needs translation
                    review_type="quick",
                    severity_filter="all",
                )
                # This should translate the path and find the git repository
                import asyncio
                result = asyncio.run(tool.prepare_prompt(request))
                # Should find the repository (not raise an error about inaccessible path)
                # If we get here without exception, the path was successfully translated
                assert isinstance(result, str)
                # The result should contain git diff information or indicate no changes
                assert "No git repositories found" not in result or "changes" in result.lower()
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -203,36 +226,44 @@ def test_review_changes_docker_path_error():
        # Simulate Docker environment
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module
+        # Reload the modules
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        utils.file_utils.CONTAINER_WORKSPACE = container_workspace
-        # Import after reloading to get updated environment
-        from tools.precommit import Precommit
+        # Properly mock the CONTAINER_WORKSPACE
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+            with patch("utils.security_config.CONTAINER_WORKSPACE", container_workspace):
+                # Import after patching to get updated environment
+                from tools.precommit import Precommit
                # Create tool instance
                tool = Precommit()
                # Test path translation with an inaccessible path
                request = tool.get_request_model()(
                    path=str(outside_path),  # Path outside the mounted workspace
                    review_type="quick",
                    severity_filter="all",
                )
                # This should raise a ValueError
                import asyncio
                with pytest.raises(ValueError) as exc_info:
                    asyncio.run(tool.prepare_prompt(request))
                # Check the error message
                assert "not accessible from within the Docker container" in str(exc_info.value)
                assert "mounted workspace" in str(exc_info.value)
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -250,31 +281,38 @@ def test_double_translation_prevention():
    try:
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)

-       # Reload the module
+       # Reload the modules
+       import utils.security_config
+
+       importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-       utils.file_utils.CONTAINER_WORKSPACE = container_workspace

-       from utils.file_utils import translate_path_for_environment
+       # Properly mock the CONTAINER_WORKSPACE
+       with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+           from utils.file_utils import translate_path_for_environment

            # Test 1: Normal translation
            host_path = str(host_workspace / "src" / "main.py")
            translated_once = translate_path_for_environment(host_path)
            expected = str(container_workspace / "src" / "main.py")
            assert translated_once == expected

            # Test 2: Double translation should return the same path
            translated_twice = translate_path_for_environment(translated_once)
            assert translated_twice == translated_once
            assert translated_twice == expected

            # Test 3: Container workspace root should not be double-translated
            root_path = str(container_workspace)
            translated_root = translate_path_for_environment(root_path)
            assert translated_root == root_path
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+       import utils.security_config
+
+       importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
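The rule these two tests pin down is small enough to state directly. Below is a minimal sketch of the translation logic, assuming `CONTAINER_WORKSPACE` is the mount point inside the container; the real implementation lives in `utils/file_utils.py` and may differ in detail.

```python
import os

CONTAINER_WORKSPACE = "/workspace"  # assumption for illustration only


def translate_path_for_environment(path: str) -> str:
    # Already-translated paths pass through unchanged (tests 2 and 3: idempotent)
    if path.startswith(CONTAINER_WORKSPACE):
        return path
    workspace_root = os.environ.get("WORKSPACE_ROOT", "")
    # Host paths under WORKSPACE_ROOT map into the container mount (test 1)
    if workspace_root and path.startswith(workspace_root):
        relative = os.path.relpath(path, workspace_root)
        return os.path.join(CONTAINER_WORKSPACE, relative)
    # Anything else is outside the mounted workspace; callers can then reject it
    return path
```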
tests/test_refactor.py (new file)
@@ -0,0 +1,556 @@
"""
Tests for the refactor tool functionality
"""
import json
from unittest.mock import MagicMock, patch
import pytest
from tools.refactor import RefactorTool
from utils.file_utils import read_file_content
class TestRefactorTool:
"""Test suite for the refactor tool"""
@pytest.fixture
def refactor_tool(self):
"""Create a refactor tool instance for testing"""
return RefactorTool()
@pytest.fixture
def mock_model_response(self):
"""Create a mock model response with valid JSON"""
def _create_response(content=None):
if content is None:
content = json.dumps(
{
"refactor_opportunities": [
{
"id": "refactor-001",
"type": "codesmells",
"severity": "high",
"file": "/test/file.py",
"start_line": 10,
"end_line": 25,
"context_start_text": "def long_method():",
"context_end_text": " return result",
"issue": "Method too long with multiple responsibilities",
"suggestion": "Extract helper methods",
"rationale": "Improves readability and maintainability",
"code_to_replace": "# original code",
"replacement_code_snippet": "# refactored code",
"new_code_snippets": [],
}
],
"priority_sequence": ["refactor-001"],
"next_actions_for_claude": [],
}
)
from unittest.mock import Mock
return Mock(
content=content,
usage={"input_tokens": 100, "output_tokens": 200, "total_tokens": 300},
model_name="test-model",
metadata={"finish_reason": "STOP"},
)
return _create_response
def test_get_name(self, refactor_tool):
"""Test that the tool returns the correct name"""
assert refactor_tool.get_name() == "refactor"
def test_get_description(self, refactor_tool):
"""Test that the tool returns a comprehensive description"""
description = refactor_tool.get_description()
assert "INTELLIGENT CODE REFACTORING" in description
assert "codesmells" in description
assert "decompose" in description
assert "modernize" in description
assert "organization" in description
def test_get_input_schema(self, refactor_tool):
"""Test that the input schema includes all required fields"""
schema = refactor_tool.get_input_schema()
assert schema["type"] == "object"
assert "files" in schema["properties"]
assert "prompt" in schema["properties"]
assert "refactor_type" in schema["properties"]
# Check refactor_type enum values
refactor_enum = schema["properties"]["refactor_type"]["enum"]
expected_types = ["codesmells", "decompose", "modernize", "organization"]
assert all(rt in refactor_enum for rt in expected_types)
def test_language_detection_python(self, refactor_tool):
"""Test language detection for Python files"""
files = ["/test/file1.py", "/test/file2.py", "/test/utils.py"]
language = refactor_tool.detect_primary_language(files)
assert language == "python"
def test_language_detection_javascript(self, refactor_tool):
"""Test language detection for JavaScript files"""
files = ["/test/app.js", "/test/component.jsx", "/test/utils.js"]
language = refactor_tool.detect_primary_language(files)
assert language == "javascript"
def test_language_detection_mixed(self, refactor_tool):
"""Test language detection for mixed language files"""
files = ["/test/app.py", "/test/script.js", "/test/main.java"]
language = refactor_tool.detect_primary_language(files)
assert language == "mixed"
def test_language_detection_unknown(self, refactor_tool):
"""Test language detection for unknown file types"""
files = ["/test/data.txt", "/test/config.json"]
language = refactor_tool.detect_primary_language(files)
assert language == "unknown"
def test_language_specific_guidance_python(self, refactor_tool):
"""Test language-specific guidance for Python modernization"""
guidance = refactor_tool.get_language_specific_guidance("python", "modernize")
assert "f-strings" in guidance
assert "dataclasses" in guidance
assert "type hints" in guidance
def test_language_specific_guidance_javascript(self, refactor_tool):
"""Test language-specific guidance for JavaScript modernization"""
guidance = refactor_tool.get_language_specific_guidance("javascript", "modernize")
assert "async/await" in guidance
assert "destructuring" in guidance
assert "arrow functions" in guidance
def test_language_specific_guidance_unknown(self, refactor_tool):
"""Test language-specific guidance for unknown languages"""
guidance = refactor_tool.get_language_specific_guidance("unknown", "modernize")
assert guidance == ""
@pytest.mark.asyncio
async def test_execute_basic_refactor(self, refactor_tool, mock_model_response):
"""Test basic refactor tool execution"""
with patch.object(refactor_tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="test")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = mock_model_response()
mock_get_provider.return_value = mock_provider
# Mock file processing
with patch.object(refactor_tool, "_prepare_file_content_for_prompt") as mock_prepare:
mock_prepare.return_value = "def test(): pass"
result = await refactor_tool.execute(
{
"files": ["/test/file.py"],
"prompt": "Find code smells in this Python code",
"refactor_type": "codesmells",
}
)
assert len(result) == 1
output = json.loads(result[0].text)
assert output["status"] == "success"
# The format_response method adds markdown instructions, so content_type should be "markdown"
# It could also be "json" or "text" depending on the response format
assert output["content_type"] in ["json", "text", "markdown"]
@pytest.mark.asyncio
async def test_execute_with_style_guide(self, refactor_tool, mock_model_response):
"""Test refactor tool execution with style guide examples"""
with patch.object(refactor_tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="test")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = mock_model_response()
mock_get_provider.return_value = mock_provider
# Mock file processing
with patch.object(refactor_tool, "_prepare_file_content_for_prompt") as mock_prepare:
mock_prepare.return_value = "def example(): pass"
with patch.object(refactor_tool, "_process_style_guide_examples") as mock_style:
mock_style.return_value = ("# style guide content", "")
result = await refactor_tool.execute(
{
"files": ["/test/file.py"],
"prompt": "Modernize this code following our style guide",
"refactor_type": "modernize",
"style_guide_examples": ["/test/style_example.py"],
}
)
assert len(result) == 1
output = json.loads(result[0].text)
assert output["status"] == "success"
def test_format_response_valid_json(self, refactor_tool):
"""Test response formatting with valid structured JSON"""
valid_json_response = json.dumps(
{
"status": "refactor_analysis_complete",
"refactor_opportunities": [
{
"id": "test-001",
"type": "codesmells",
"severity": "medium",
"file": "/test.py",
"start_line": 1,
"end_line": 5,
"context_start_text": "def test():",
"context_end_text": " pass",
"issue": "Test issue",
"suggestion": "Test suggestion",
"rationale": "Test rationale",
"code_to_replace": "old code",
"replacement_code_snippet": "new code",
}
],
"priority_sequence": ["test-001"],
"next_actions_for_claude": [],
}
)
# Create a mock request
request = MagicMock()
request.refactor_type = "codesmells"
formatted = refactor_tool.format_response(valid_json_response, request)
# Should contain the original response plus implementation instructions
assert valid_json_response in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
assert "Step 4: COMPLETE REFACTORING" in formatted # Not more_required, so should be COMPLETE
def test_format_response_invalid_json(self, refactor_tool):
"""Test response formatting with invalid JSON - now handled by base tool"""
invalid_response = "This is not JSON content"
# Create a mock request
request = MagicMock()
request.refactor_type = "codesmells"
formatted = refactor_tool.format_response(invalid_response, request)
# Should contain the original response plus implementation instructions
assert invalid_response in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
def test_model_category(self, refactor_tool):
"""Test that the refactor tool uses EXTENDED_REASONING category"""
from tools.models import ToolModelCategory
category = refactor_tool.get_model_category()
assert category == ToolModelCategory.EXTENDED_REASONING
def test_default_temperature(self, refactor_tool):
"""Test that the refactor tool uses analytical temperature"""
from config import TEMPERATURE_ANALYTICAL
temp = refactor_tool.get_default_temperature()
assert temp == TEMPERATURE_ANALYTICAL
def test_format_response_more_refactor_required(self, refactor_tool):
"""Test that format_response handles more_refactor_required status"""
more_refactor_response = json.dumps(
{
"status": "more_refactor_required",
"message": "Large codebase requires extensive refactoring across multiple files",
}
)
# Create a mock request
request = MagicMock()
request.refactor_type = "decompose"
formatted = refactor_tool.format_response(more_refactor_response, request)
# Should contain the original response plus continuation instructions
assert more_refactor_response in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
assert "VERIFY CHANGES WORK" in formatted
assert "Step 4: CONTINUE WITH MORE REFACTORING" in formatted # more_required, so should be CONTINUE
assert "continuation_id" in formatted
assert "immediately continue with more refactoring analysis" in formatted
class TestFileUtilsLineNumbers:
"""Test suite for line numbering functionality in file_utils"""
def test_read_file_content_with_line_numbers(self, project_path):
"""Test reading file content with line numbers enabled"""
# Create a test file within the workspace
temp_path = project_path / "test_file.py"
with open(temp_path, "w") as f:
f.write("def hello():\n print('Hello')\n return True")
# Read with line numbers explicitly enabled
content, tokens = read_file_content(str(temp_path), include_line_numbers=True)
# Check that line numbers are present
assert "1│ def hello():" in content
assert "2│ print('Hello')" in content
assert "3│ return True" in content
assert "--- BEGIN FILE:" in content
assert "--- END FILE:" in content
def test_read_file_content_without_line_numbers(self, project_path):
"""Test reading file content with line numbers disabled"""
# Create a test file within the workspace
temp_path = project_path / "test_file.txt"
with open(temp_path, "w") as f:
f.write("Line 1\nLine 2\nLine 3")
# Read with line numbers explicitly disabled
content, tokens = read_file_content(str(temp_path), include_line_numbers=False)
# Check that line numbers are NOT present
assert "1│" not in content
assert "Line 1" in content
assert "Line 2" in content
assert "--- BEGIN FILE:" in content
def test_read_file_content_auto_detect_programming(self, project_path):
"""Test that auto-detection is OFF by default (backwards compatibility)"""
# Create a test file within the workspace
temp_path = project_path / "test_auto.py"
with open(temp_path, "w") as f:
f.write("import os\nprint('test')")
# Read without specifying line numbers (should NOT auto-detect for backwards compatibility)
content, tokens = read_file_content(str(temp_path))
# Should NOT automatically add line numbers for .py files (default behavior)
assert "1│" not in content
assert "import os" in content
assert "print('test')" in content
def test_read_file_content_auto_detect_text(self, project_path):
"""Test auto-detection of line numbers for text files"""
# Create a test file within the workspace
temp_path = project_path / "test_auto.txt"
with open(temp_path, "w") as f:
f.write("This is a text file\nWith multiple lines")
# Read without specifying line numbers (should auto-detect)
content, tokens = read_file_content(str(temp_path))
# Should NOT automatically add line numbers for .txt files
assert "1│" not in content
assert "This is a text file" in content
def test_line_ending_normalization(self):
"""Test that different line endings are normalized consistently"""
from utils.file_utils import _add_line_numbers, _normalize_line_endings
# Test different line ending formats
content_crlf = "Line 1\r\nLine 2\r\nLine 3"
content_cr = "Line 1\rLine 2\rLine 3"
content_lf = "Line 1\nLine 2\nLine 3"
# All should normalize to the same result
normalized_crlf = _normalize_line_endings(content_crlf)
normalized_cr = _normalize_line_endings(content_cr)
normalized_lf = _normalize_line_endings(content_lf)
assert normalized_crlf == normalized_cr == normalized_lf
assert normalized_lf == "Line 1\nLine 2\nLine 3"
# Line numbering should work consistently
numbered = _add_line_numbers(content_crlf)
assert " 1│ Line 1" in numbered
assert " 2│ Line 2" in numbered
assert " 3│ Line 3" in numbered
def test_detect_file_type(self):
"""Test file type detection"""
from utils.file_utils import detect_file_type
# Test programming language files
assert detect_file_type("test.py") == "text"
assert detect_file_type("test.js") == "text"
assert detect_file_type("test.java") == "text"
# Test image files
assert detect_file_type("image.png") == "image"
assert detect_file_type("photo.jpg") == "image"
# Test binary files
assert detect_file_type("program.exe") == "binary"
assert detect_file_type("library.dll") == "binary"
def test_should_add_line_numbers(self):
"""Test line number detection logic"""
from utils.file_utils import should_add_line_numbers
# NO files should get line numbers by default (backwards compatibility)
assert not should_add_line_numbers("test.py")
assert not should_add_line_numbers("app.js")
assert not should_add_line_numbers("Main.java")
assert not should_add_line_numbers("readme.txt")
assert not should_add_line_numbers("data.csv")
# Explicit override should work
assert should_add_line_numbers("readme.txt", True)
assert not should_add_line_numbers("test.py", False)
def test_line_numbers_double_triple_digits(self, project_path):
"""Test line numbering with double and triple digit line numbers"""
from utils.file_utils import _add_line_numbers
# Create content with many lines to test double and triple digit formatting
lines = []
for i in range(1, 125): # Lines 1-124 for testing up to triple digits
if i < 10:
lines.append(f"# Single digit line {i}")
elif i < 100:
lines.append(f"# Double digit line {i}")
else:
lines.append(f"# Triple digit line {i}")
content = "\n".join(lines)
numbered_content = _add_line_numbers(content)
# Test single digit formatting (should be right-aligned with spaces)
assert " 1│ # Single digit line 1" in numbered_content
assert " 9│ # Single digit line 9" in numbered_content
# Test double digit formatting (should be right-aligned)
assert " 10│ # Double digit line 10" in numbered_content # Line 10 has "double digit" content
assert " 50│ # Double digit line 50" in numbered_content
assert " 99│ # Double digit line 99" in numbered_content
# Test triple digit formatting (should be right-aligned)
assert " 100│ # Triple digit line 100" in numbered_content
assert " 124│ # Triple digit line 124" in numbered_content
# Verify consistent alignment - all line numbers should end with "│ "
lines_with_numbers = numbered_content.split("\n")
for line in lines_with_numbers:
if "" in line:
# Find the pipe character position
pipe_pos = line.find("")
# Ensure the character before pipe is a digit
assert line[pipe_pos - 1].isdigit(), f"Line format issue: {line}"
# Ensure the character after pipe is a space
assert line[pipe_pos + 1] == " ", f"Line format issue: {line}"
def test_line_numbers_with_file_reading(self, project_path):
"""Test line numbering through file reading with large file"""
# Create a test file with 150 functions (600 total lines: 4 lines per function)
temp_path = project_path / "large_test_file.py"
with open(temp_path, "w") as f:
for i in range(1, 151): # Functions 1-150
f.write(f"def function_{i}():\n")
f.write(f" # This is function number {i}\n")
f.write(f" return {i}\n")
f.write("\n")
# Read with line numbers enabled
content, tokens = read_file_content(str(temp_path), include_line_numbers=True)
# Calculate actual line numbers based on file structure (4 lines per function)
# Function 1: lines 1-4, Function 2: lines 5-8, etc.
# Line 1: def function_1():
# Line 2: # This is function number 1
# Line 3: return 1
# Line 4: (empty)
# Test various line number formats in the actual file content
assert " 1│ def function_1():" in content
# Function 13 starts at line 49 (12*4 + 1), so line 50 is " # This is function number 13"
assert " 50│ # This is function number 13" in content
# Line 100 is actually an empty line after function 25 (line 99 was "return 25")
assert " 100│ " in content # Empty line
# Line 99 is "return 25" from function 25
assert " 99│ return 25" in content
# Test more line numbers - line 147 is "return 37" from function 37
assert " 147│ return 37" in content
# Test that we have the final lines (600 total lines)
assert " 599│ return 150" in content
assert " 600│ " in content # Final empty line
# Verify the file structure is preserved
assert "--- BEGIN FILE:" in content
assert "--- END FILE:" in content
assert str(temp_path) in content
def test_line_numbers_large_files_22k_lines(self, project_path):
"""Test line numbering for very large files (22,500+ lines)"""
from utils.file_utils import _add_line_numbers
# Create content simulating a very large file with 25,000 lines
lines = []
for i in range(1, 25001): # Lines 1-25000
lines.append(f"// Large file line {i}")
content = "\n".join(lines)
numbered_content = _add_line_numbers(content)
# Test that width dynamically adjusts to 5 digits for large files
# Small line numbers should now have 5-digit width
assert " 1│ // Large file line 1" in numbered_content
assert " 9│ // Large file line 9" in numbered_content
assert " 10│ // Large file line 10" in numbered_content
assert " 99│ // Large file line 99" in numbered_content
assert " 100│ // Large file line 100" in numbered_content
assert " 999│ // Large file line 999" in numbered_content
assert " 1000│ // Large file line 1000" in numbered_content
assert " 9999│ // Large file line 9999" in numbered_content
assert "10000│ // Large file line 10000" in numbered_content
assert "22500│ // Large file line 22500" in numbered_content
assert "25000│ // Large file line 25000" in numbered_content
# Verify consistent alignment - all line numbers should end with "│ "
lines_with_numbers = numbered_content.split("\n")
for i, line in enumerate(lines_with_numbers[:100]): # Check first 100 lines
if "" in line:
pipe_pos = line.find("")
# For large files, should be 5-character width plus pipe
assert line[pipe_pos - 1].isdigit(), f"Line {i+1} format issue: {line}"
assert line[pipe_pos + 1] == " ", f"Line {i+1} format issue: {line}"
def test_line_numbers_boundary_conditions(self):
"""Test line numbering at boundary conditions (9999 vs 10000 lines)"""
from utils.file_utils import _add_line_numbers
# Test exactly 9999 lines (should use 4-digit width)
lines_9999 = [f"Line {i}" for i in range(1, 10000)] # 9999 lines
content_9999 = "\n".join(lines_9999)
numbered_9999 = _add_line_numbers(content_9999)
# Should use 4-digit format
assert " 1│ Line 1" in numbered_9999
assert "9999│ Line 9999" in numbered_9999
# Test exactly 10000 lines (should use 5-digit width)
lines_10000 = [f"Line {i}" for i in range(1, 10001)] # 10000 lines
content_10000 = "\n".join(lines_10000)
numbered_10000 = _add_line_numbers(content_10000)
# Should use 5-digit format
assert " 1│ Line 1" in numbered_10000
assert "10000│ Line 10000" in numbered_10000
if __name__ == "__main__":
pytest.main([__file__, "-v"])
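The gutter format that `TestFileUtilsLineNumbers` asserts throughout (right-aligned number, `│` separator, 4-digit width up to 9,999 lines, widening beyond that) reduces to a few lines. A hedged sketch of what `_add_line_numbers` is assumed to do, not the actual implementation:

```python
def _add_line_numbers(content: str) -> str:
    lines = content.split("\n")
    # 4-digit gutter up to 9,999 lines; grows to 5+ digits for larger files,
    # matching the boundary-condition tests above
    width = max(4, len(str(len(lines))))
    return "\n".join(f"{i:>{width}}│ {line}" for i, line in enumerate(lines, start=1))
```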
@@ -27,10 +27,11 @@ class TestServerTools:
        assert "chat" in tool_names
        assert "precommit" in tool_names
        assert "testgen" in tool_names
+       assert "refactor" in tool_names
        assert "version" in tool_names

-       # Should have exactly 8 tools (including testgen)
-       assert len(tools) == 8
+       # Should have exactly 9 tools (including refactor)
+       assert len(tools) == 9

        # Check descriptions are verbose
        for tool in tools:
@@ -261,3 +261,95 @@ class TestSpecialStatusParsing:
        # Should fall back to normal response since validation failed
        assert result.status == "success"
        assert result.content_type == "text"
def test_refactor_analysis_complete_parsing(self):
"""Test that RefactorAnalysisComplete status is properly parsed"""
import json
json_response = {
"status": "refactor_analysis_complete",
"refactor_opportunities": [
{
"id": "refactor-001",
"type": "decompose",
"severity": "critical",
"file": "/test.py",
"start_line": 1,
"end_line": 5,
"context_start_text": "def test():",
"context_end_text": " pass",
"issue": "Large function needs decomposition",
"suggestion": "Extract helper methods",
"rationale": "Improves readability",
"code_to_replace": "old code",
"replacement_code_snippet": "new code",
}
],
"priority_sequence": ["refactor-001"],
"next_actions_for_claude": [
{
"action_type": "EXTRACT_METHOD",
"target_file": "/test.py",
"source_lines": "1-5",
"description": "Extract helper method",
}
],
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
assert result.status == "refactor_analysis_complete"
assert result.content_type == "json"
parsed_content = json.loads(result.content)
assert "refactor_opportunities" in parsed_content
assert len(parsed_content["refactor_opportunities"]) == 1
assert parsed_content["refactor_opportunities"][0]["id"] == "refactor-001"
def test_refactor_analysis_complete_validation_error(self):
"""Test that RefactorAnalysisComplete validation catches missing required fields"""
import json
json_response = {
"status": "refactor_analysis_complete",
"refactor_opportunities": [
{
"id": "refactor-001",
# Missing required fields like type, severity, etc.
}
],
"priority_sequence": ["refactor-001"],
"next_actions_for_claude": [],
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
# Should fall back to normal response since validation failed
assert result.status == "success"
assert result.content_type == "text"
def test_more_refactor_required_parsing(self):
"""Test that more_refactor_required status is parsed correctly"""
import json
json_response = {
"status": "more_refactor_required",
"message": "Large codebase requires extensive decomposition across 15 files. Continuing analysis for remaining modules.",
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
assert result.status == "more_refactor_required"
assert result.content_type == "json"
parsed_content = json.loads(result.content)
assert parsed_content["status"] == "more_refactor_required"
assert "Large codebase requires extensive decomposition" in parsed_content["message"]
def test_more_refactor_required_missing_message(self):
"""Test that more_refactor_required without required message field fails validation"""
response_json = '{"status": "more_refactor_required"}'
result = self.tool._parse_response(response_json, self.request)
# Should fall back to normal processing since validation failed (missing required field)
assert result.status == "success"
assert result.content_type == "text"
@@ -7,6 +7,7 @@ from .chat import ChatTool
from .codereview import CodeReviewTool
from .debug import DebugIssueTool
from .precommit import Precommit
+from .refactor import RefactorTool
from .testgen import TestGenTool
from .thinkdeep import ThinkDeepTool
@@ -17,5 +18,6 @@ __all__ = [
    "AnalyzeTool",
    "ChatTool",
    "Precommit",
+   "RefactorTool",
    "TestGenTool",
]
@@ -207,9 +207,7 @@ class BaseTool(ABC):
        provider = ModelProviderRegistry.get_provider_for_model(model_name)
        if not provider:
            logger = logging.getLogger(f"tools.{self.name}")
-           logger.warning(
-               f"Model '{model_name}' is not available with current API keys. " f"Requiring model selection."
-           )
+           logger.warning(f"Model '{model_name}' is not available with current API keys. Requiring model selection.")
            return True
        return False
@@ -397,6 +395,25 @@
        """
        return 0.5
def wants_line_numbers_by_default(self) -> bool:
"""
Return whether this tool wants line numbers added to code files by default.
Tools that benefit from precise line references (refactor, codereview, debug)
should return True. Tools that prioritize token efficiency or don't need
precise references can return False.
Line numbers add ~8-10% token overhead but provide precise targeting for:
- Code review feedback ("SQL injection on line 45")
- Debug error locations ("Memory leak in loop at lines 123-156")
- Test generation targets ("Generate tests for method at lines 78-95")
- Refactoring guidance ("Extract method from lines 67-89")
Returns:
bool: True if line numbers should be added by default for this tool
"""
return False # Conservative default - tools opt-in as needed
    def get_default_thinking_mode(self) -> str:
        """
        Return the default thinking mode for this tool.
@@ -694,7 +711,10 @@
            )
        try:
            file_content = read_files(
-               files_to_embed, max_tokens=effective_max_tokens + reserve_tokens, reserve_tokens=reserve_tokens
+               files_to_embed,
+               max_tokens=effective_max_tokens + reserve_tokens,
+               reserve_tokens=reserve_tokens,
+               include_line_numbers=self.wants_line_numbers_by_default(),
            )
            self._validate_token_limit(file_content, context_description)
            content_parts.append(file_content)
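With the hook threaded through `read_files`, a tool's choice is visible to callers only via the flag. A small usage sketch against `read_file_content` (the path below is a placeholder):

```python
from utils.file_utils import read_file_content

path = "/workspace/src/example.py"  # hypothetical absolute path

plain, _ = read_file_content(path, include_line_numbers=False)
numbered, _ = read_file_content(path, include_line_numbers=True)
# `numbered` carries the "  42│ ..." gutter that refactor, codereview, and
# debug rely on for line-precise findings; `plain` avoids the ~8-10% overhead
```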
@@ -148,6 +148,10 @@ class CodeReviewTool(BaseTool):
    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

+   def wants_line_numbers_by_default(self) -> bool:
+       """Code review tool needs line numbers for precise feedback"""
+       return True
+
    def get_request_model(self):
        return CodeReviewRequest
@@ -111,6 +111,10 @@ class DebugIssueTool(BaseTool):
    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

+   def wants_line_numbers_by_default(self) -> bool:
+       """Debug tool needs line numbers for precise error location"""
+       return True
+
    def get_model_category(self) -> "ToolModelCategory":
        """Debug requires deep analysis and reasoning"""
        from tools.models import ToolModelCategory
@@ -40,6 +40,8 @@ class ToolOutput(BaseModel):
        "focused_review_required",
        "test_sample_needed",
        "more_tests_required",
+       "more_refactor_required",
+       "refactor_analysis_complete",
        "resend_prompt",
        "continuation_available",
    ] = "success"
@@ -97,6 +99,56 @@ class MoreTestsRequired(BaseModel):
    pending_tests: str = Field(..., description="List of pending tests to be generated")
class MoreRefactorRequired(BaseModel):
"""Request for continuation when refactoring requires extensive changes"""
status: Literal["more_refactor_required"] = "more_refactor_required"
message: str = Field(..., description="Explanation of why more refactoring is needed and what remains to be done")
class RefactorOpportunity(BaseModel):
"""A single refactoring opportunity with precise targeting information"""
id: str = Field(..., description="Unique identifier for this refactoring opportunity")
type: Literal["decompose", "codesmells", "modernize", "organization"] = Field(
..., description="Type of refactoring"
)
severity: Literal["critical", "high", "medium", "low"] = Field(..., description="Severity level")
file: str = Field(..., description="Absolute path to the file")
start_line: int = Field(..., description="Starting line number")
end_line: int = Field(..., description="Ending line number")
context_start_text: str = Field(..., description="Exact text from start line for verification")
context_end_text: str = Field(..., description="Exact text from end line for verification")
issue: str = Field(..., description="Clear description of what needs refactoring")
suggestion: str = Field(..., description="Specific refactoring action to take")
rationale: str = Field(..., description="Why this improves the code")
code_to_replace: str = Field(..., description="Original code that should be changed")
replacement_code_snippet: str = Field(..., description="Refactored version of the code")
new_code_snippets: Optional[list[dict]] = Field(
default_factory=list, description="Additional code snippets to be added"
)
class RefactorAction(BaseModel):
"""Next action for Claude to implement refactoring"""
action_type: Literal["EXTRACT_METHOD", "SPLIT_CLASS", "MODERNIZE_SYNTAX", "REORGANIZE_CODE", "DECOMPOSE_FILE"] = (
Field(..., description="Type of action to perform")
)
target_file: str = Field(..., description="Absolute path to target file")
source_lines: str = Field(..., description="Line range (e.g., '45-67')")
description: str = Field(..., description="Step-by-step action description for Claude")
class RefactorAnalysisComplete(BaseModel):
"""Complete refactor analysis with prioritized opportunities"""
status: Literal["refactor_analysis_complete"] = "refactor_analysis_complete"
refactor_opportunities: list[RefactorOpportunity] = Field(..., description="List of refactoring opportunities")
priority_sequence: list[str] = Field(..., description="Recommended order of refactoring IDs")
next_actions_for_claude: list[RefactorAction] = Field(..., description="Specific actions for Claude to implement")
# Registry mapping status strings to their corresponding Pydantic models
SPECIAL_STATUS_MODELS = {
    "clarification_required": ClarificationRequest,
@@ -104,6 +156,8 @@ SPECIAL_STATUS_MODELS = {
    "focused_review_required": FocusedReviewRequired,
    "test_sample_needed": TestSampleNeeded,
    "more_tests_required": MoreTestsRequired,
+   "more_refactor_required": MoreRefactorRequired,
+   "refactor_analysis_complete": RefactorAnalysisComplete,
}
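A minimal sketch of how this registry is consumed (the real dispatch lives in `BaseTool._parse_response`), showing why the validation-failure tests above fall back to a plain success/text response:

```python
import json

from pydantic import ValidationError

from tools.models import SPECIAL_STATUS_MODELS


def try_parse_special_status(raw: str):
    """Return a validated status model, or None to fall back to normal handling."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON: ordinary text response
    model = SPECIAL_STATUS_MODELS.get(data.get("status"))
    if model is None:
        return None  # unknown status: normal response path
    try:
        return model(**data)  # e.g. RefactorAnalysisComplete, MoreRefactorRequired
    except ValidationError:
        return None  # missing required fields: fall back, as the tests assert
```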
tools/refactor.py (new file)
@@ -0,0 +1,653 @@
"""
Refactor tool - Intelligent code refactoring suggestions with precise line-number references
This tool analyzes code for refactoring opportunities across four main categories:
- codesmells: Detect and suggest fixes for common code smells
- decompose: Break down large functions, classes, and modules into smaller, focused components
- modernize: Update code to use modern language features and patterns
- organization: Suggest better organization and logical grouping of related functionality
Key Features:
- Cross-language support with language-specific guidance
- Precise line-number references for Claude
- Large context handling with token budgeting
- Structured JSON responses for easy parsing
- Style guide integration for project-specific patterns
"""
import logging
import os
from typing import Any, Literal, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_ANALYTICAL
from systemprompts import REFACTOR_PROMPT
from utils.file_utils import translate_file_paths
from .base import BaseTool, ToolRequest
from .models import ToolOutput
logger = logging.getLogger(__name__)
class RefactorRequest(ToolRequest):
"""
Request model for the refactor tool.
This model defines all parameters that can be used to customize
the refactoring analysis process.
"""
files: list[str] = Field(
...,
description="Code files or directories to analyze for refactoring opportunities (must be absolute paths)",
)
prompt: str = Field(
...,
description="Description of refactoring goals, context, and specific areas of focus",
)
refactor_type: Literal["codesmells", "decompose", "modernize", "organization"] = Field(
..., description="Type of refactoring analysis to perform"
)
focus_areas: Optional[list[str]] = Field(
None,
description="Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')",
)
style_guide_examples: Optional[list[str]] = Field(
None,
description=(
"Optional existing code files to use as style/pattern reference (must be absolute paths). "
"These files represent the target coding style and patterns for the project. "
"Particularly useful for 'modernize' and 'organization' refactor types."
),
)
class RefactorTool(BaseTool):
"""
Refactor tool implementation.
This tool analyzes code to provide intelligent refactoring suggestions
with precise line-number references for Claude to implement.
"""
def get_name(self) -> str:
return "refactor"
def get_description(self) -> str:
return (
"INTELLIGENT CODE REFACTORING - Analyzes code for refactoring opportunities with precise line-number guidance. "
"Supports four refactor types: 'codesmells' (detect anti-patterns), 'decompose' (break down large functions/classes/modules into smaller components), "
"'modernize' (update to modern language features), and 'organization' (improve organization and grouping of related functionality). "
"Provides specific, actionable refactoring steps that Claude can implement directly. "
"Choose thinking_mode based on codebase complexity: 'medium' for standard modules (default), "
"'high' for complex systems, 'max' for legacy codebases requiring deep analysis. "
"Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
)
def get_input_schema(self) -> dict[str, Any]:
schema = {
"type": "object",
"properties": {
"files": {
"type": "array",
"items": {"type": "string"},
"description": "Code files or directories to analyze for refactoring opportunities (must be absolute paths)",
},
"model": self.get_model_field_schema(),
"prompt": {
"type": "string",
"description": "Description of refactoring goals, context, and specific areas of focus",
},
"refactor_type": {
"type": "string",
"enum": ["codesmells", "decompose", "modernize", "organization"],
"description": "Type of refactoring analysis to perform",
},
"focus_areas": {
"type": "array",
"items": {"type": "string"},
"description": "Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')",
},
"style_guide_examples": {
"type": "array",
"items": {"type": "string"},
"description": (
"Optional existing code files to use as style/pattern reference (must be absolute paths). "
"These files represent the target coding style and patterns for the project."
),
},
"thinking_mode": {
"type": "string",
"enum": ["minimal", "low", "medium", "high", "max"],
"description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
},
"continuation_id": {
"type": "string",
"description": (
"Thread continuation ID for multi-turn conversations. Can be used to continue conversations "
"across different tools. Only provide this if continuing a previous conversation thread."
),
},
},
"required": ["files", "prompt", "refactor_type"] + (["model"] if self.is_effective_auto_mode() else []),
}
return schema
def get_system_prompt(self) -> str:
return REFACTOR_PROMPT
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
def wants_line_numbers_by_default(self) -> bool:
"""Refactor tool needs line numbers for precise targeting"""
return True
def get_model_category(self):
"""Refactor tool requires extended reasoning for comprehensive analysis"""
from tools.models import ToolModelCategory
return ToolModelCategory.EXTENDED_REASONING
def get_request_model(self):
return RefactorRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check prompt size before processing"""
logger.info(f"[REFACTOR] execute called with arguments: {list(arguments.keys())}")
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size if provided
if request.prompt:
size_check = self.check_prompt_size(request.prompt)
if size_check:
logger.info(f"[REFACTOR] Prompt size check triggered, returning early")
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
logger.info(f"[REFACTOR] Prompt size OK, calling super().execute()")
# Continue with normal execution
return await super().execute(arguments)
def detect_primary_language(self, file_paths: list[str]) -> str:
"""
Detect the primary programming language from file extensions.
Args:
file_paths: List of file paths to analyze
Returns:
str: Detected language or "mixed" if multiple languages found
"""
# Language detection based on file extensions
language_extensions = {
"python": {".py"},
"javascript": {".js", ".jsx", ".mjs"},
"typescript": {".ts", ".tsx"},
"java": {".java"},
"csharp": {".cs"},
"cpp": {".cpp", ".cc", ".cxx", ".c", ".h", ".hpp"},
"go": {".go"},
"rust": {".rs"},
"swift": {".swift"},
"kotlin": {".kt"},
"ruby": {".rb"},
"php": {".php"},
"scala": {".scala"},
}
# Count files by language
language_counts = {}
for file_path in file_paths:
extension = os.path.splitext(file_path.lower())[1]
for lang, exts in language_extensions.items():
if extension in exts:
language_counts[lang] = language_counts.get(lang, 0) + 1
break
if not language_counts:
return "unknown"
# Return most common language, or "mixed" if multiple languages
max_count = max(language_counts.values())
dominant_languages = [lang for lang, count in language_counts.items() if count == max_count]
if len(dominant_languages) == 1:
return dominant_languages[0]
else:
return "mixed"
def get_language_specific_guidance(self, language: str, refactor_type: str) -> str:
"""
Generate language-specific guidance for the refactoring prompt.
Args:
language: Detected programming language
refactor_type: Type of refactoring being performed
Returns:
str: Language-specific guidance to inject into the prompt
"""
if language == "unknown" or language == "mixed":
return ""
# Language-specific modernization features
modernization_features = {
"python": "f-strings, dataclasses, type hints, pathlib, async/await, context managers, list/dict comprehensions, walrus operator",
"javascript": "async/await, destructuring, arrow functions, template literals, optional chaining, nullish coalescing, modules (import/export)",
"typescript": "strict type checking, utility types, const assertions, template literal types, mapped types",
"java": "streams API, lambda expressions, optional, records, pattern matching, var declarations, text blocks",
"csharp": "LINQ, nullable reference types, pattern matching, records, async streams, using declarations",
"swift": "value types, protocol-oriented programming, property wrappers, result builders, async/await",
"go": "modules, error wrapping, context package, generics (Go 1.18+)",
"rust": "ownership patterns, iterator adapters, error handling with Result, async/await",
}
# Language-specific code splitting patterns
splitting_patterns = {
"python": "modules, classes, functions, decorators for cross-cutting concerns",
"javascript": "modules (ES6), classes, functions, higher-order functions",
"java": "packages, classes, interfaces, abstract classes, composition over inheritance",
"csharp": "namespaces, classes, interfaces, extension methods, dependency injection",
"swift": "extensions, protocols, structs, enums with associated values",
"go": "packages, interfaces, struct composition, function types",
}
guidance_parts = []
if refactor_type == "modernize" and language in modernization_features:
guidance_parts.append(
f"LANGUAGE-SPECIFIC MODERNIZATION ({language.upper()}): Focus on {modernization_features[language]}"
)
if refactor_type == "decompose" and language in splitting_patterns:
guidance_parts.append(
f"LANGUAGE-SPECIFIC DECOMPOSITION ({language.upper()}): Use {splitting_patterns[language]} to break down large components"
)
# General language guidance
general_guidance = {
"python": "Follow PEP 8, use descriptive names, prefer composition over inheritance",
"javascript": "Use consistent naming conventions, avoid global variables, prefer functional patterns",
"java": "Follow Java naming conventions, use interfaces for abstraction, consider immutability",
"csharp": "Follow C# naming conventions, use nullable reference types, prefer async methods",
}
if language in general_guidance:
guidance_parts.append(f"GENERAL GUIDANCE ({language.upper()}): {general_guidance[language]}")
return "\n".join(guidance_parts) if guidance_parts else ""
    def _process_style_guide_examples(
        self, style_examples: list[str], continuation_id: Optional[str], available_tokens: Optional[int] = None
    ) -> tuple[str, str]:
"""
Process style guide example files using available token budget.
Args:
style_examples: List of style guide file paths
continuation_id: Continuation ID for filtering already embedded files
available_tokens: Available token budget for examples
Returns:
tuple: (formatted_content, summary_note)
"""
logger.debug(f"[REFACTOR] Processing {len(style_examples)} style guide examples")
if not style_examples:
logger.debug("[REFACTOR] No style guide examples provided")
return "", ""
# Use existing file filtering to avoid duplicates in continuation
examples_to_process = self.filter_new_files(style_examples, continuation_id)
logger.debug(f"[REFACTOR] After filtering: {len(examples_to_process)} new style examples to process")
if not examples_to_process:
logger.info(f"[REFACTOR] All {len(style_examples)} style examples already in conversation history")
return "", ""
# Translate file paths for Docker environment before accessing files
translated_examples = translate_file_paths(examples_to_process)
logger.debug(f"[REFACTOR] Translated {len(examples_to_process)} file paths for container access")
# Calculate token budget for style examples (20% of available tokens, or fallback)
if available_tokens:
style_examples_budget = int(available_tokens * 0.20) # 20% for style examples
logger.debug(
f"[REFACTOR] Allocating {style_examples_budget:,} tokens (20% of {available_tokens:,}) for style examples"
)
else:
style_examples_budget = 25000 # Fallback if no budget provided
logger.debug(f"[REFACTOR] Using fallback budget of {style_examples_budget:,} tokens for style examples")
original_count = len(examples_to_process)
logger.debug(
f"[REFACTOR] Processing {original_count} style example files with {style_examples_budget:,} token budget"
)
# Sort by file size (smallest first) for pattern-focused selection
file_sizes = []
for i, file_path in enumerate(examples_to_process):
translated_path = translated_examples[i]
try:
size = os.path.getsize(translated_path)
file_sizes.append((file_path, size))
logger.debug(f"[REFACTOR] Style example {os.path.basename(file_path)}: {size:,} bytes")
except (OSError, FileNotFoundError) as e:
logger.warning(f"[REFACTOR] Could not get size for {file_path}: {e}")
file_sizes.append((file_path, float("inf")))
# Sort by size and take smallest files for pattern reference
file_sizes.sort(key=lambda x: x[1])
examples_to_process = [f[0] for f in file_sizes]
logger.debug(
f"[REFACTOR] Sorted style examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}"
)
# Use standard file content preparation with dynamic token budget and line numbers
try:
logger.debug(f"[REFACTOR] Preparing file content for {len(examples_to_process)} style examples")
content = self._prepare_file_content_for_prompt(
examples_to_process,
continuation_id,
"Style guide examples",
max_tokens=style_examples_budget,
reserve_tokens=1000,
)
# Determine how many files were actually included
if content:
from utils.token_utils import estimate_tokens
used_tokens = estimate_tokens(content)
logger.info(
f"[REFACTOR] Successfully embedded style examples: {used_tokens:,} tokens used ({style_examples_budget:,} available)"
)
if original_count > 1:
truncation_note = f"Note: Used {used_tokens:,} tokens ({style_examples_budget:,} available) for style guide examples from {original_count} files to determine coding patterns."
else:
truncation_note = ""
else:
logger.warning("[REFACTOR] No content generated for style examples")
truncation_note = ""
return content, truncation_note
except Exception as e:
# If style example processing fails, continue without examples rather than failing
logger.error(f"[REFACTOR] Failed to process style examples: {type(e).__name__}: {e}")
return "", f"Warning: Could not process style guide examples: {str(e)}"
async def prepare_prompt(self, request: RefactorRequest) -> str:
"""
Prepare the refactoring prompt with code analysis and optional style examples.
This method reads the requested files, processes any style guide examples,
and constructs a detailed prompt for comprehensive refactoring analysis.
Args:
request: The validated refactor request
Returns:
str: Complete prompt for the model
Raises:
ValueError: If the code exceeds token limits
"""
logger.info(f"[REFACTOR] prepare_prompt called with {len(request.files)} files, type={request.refactor_type}")
logger.debug(f"[REFACTOR] Preparing prompt for {len(request.files)} code files")
logger.debug(f"[REFACTOR] Refactor type: {request.refactor_type}")
if request.style_guide_examples:
logger.debug(f"[REFACTOR] Including {len(request.style_guide_examples)} style guide examples")
# Check for prompt.txt in files
prompt_content, updated_files = self.handle_prompt_file(request.files)
# If prompt.txt was found, incorporate it into the prompt
if prompt_content:
logger.debug("[REFACTOR] Found prompt.txt file, incorporating content")
request.prompt = prompt_content + "\n\n" + request.prompt
# Update request files list
if updated_files is not None:
logger.debug(f"[REFACTOR] Updated files list after prompt.txt processing: {len(updated_files)} files")
request.files = updated_files
# Calculate available token budget for dynamic allocation
continuation_id = getattr(request, "continuation_id", None)
# Get model context for token budget calculation
model_name = getattr(self, "_current_model_name", None)
available_tokens = None
if model_name:
try:
provider = self.get_model_provider(model_name)
capabilities = provider.get_capabilities(model_name)
# Use 75% of context for content (code + style examples), 25% for response
available_tokens = int(capabilities.context_window * 0.75)
logger.debug(
f"[REFACTOR] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {model_name}"
)
except Exception as e:
# Fallback to conservative estimate
logger.warning(f"[REFACTOR] Could not get model capabilities for {model_name}: {e}")
available_tokens = 120000 # Conservative fallback
logger.debug(f"[REFACTOR] Using fallback token budget: {available_tokens:,} tokens")
# Process style guide examples first to determine token allocation
style_examples_content = ""
style_examples_note = ""
if request.style_guide_examples:
logger.debug(f"[REFACTOR] Processing {len(request.style_guide_examples)} style guide examples")
style_examples_content, style_examples_note = self._process_style_guide_examples(
request.style_guide_examples, continuation_id, available_tokens
)
if style_examples_content:
logger.info("[REFACTOR] Style guide examples processed successfully for pattern reference")
else:
logger.info("[REFACTOR] No style guide examples content after processing")
# Remove files that appear in both 'files' and 'style_guide_examples' to avoid duplicate embedding
code_files_to_process = request.files.copy()
if request.style_guide_examples:
# Normalize paths for comparison
style_example_set = {os.path.normpath(os.path.abspath(f)) for f in request.style_guide_examples}
original_count = len(code_files_to_process)
code_files_to_process = [
f for f in code_files_to_process if os.path.normpath(os.path.abspath(f)) not in style_example_set
]
duplicates_removed = original_count - len(code_files_to_process)
if duplicates_removed > 0:
logger.info(
f"[REFACTOR] Removed {duplicates_removed} duplicate files from code files list "
f"(already included in style guide examples for pattern reference)"
)
# Calculate remaining tokens for main code after style examples
if style_examples_content and available_tokens:
from utils.token_utils import estimate_tokens
style_tokens = estimate_tokens(style_examples_content)
remaining_tokens = available_tokens - style_tokens - 5000 # Reserve for prompt structure
logger.debug(
f"[REFACTOR] Token allocation: {style_tokens:,} for examples, {remaining_tokens:,} remaining for code files"
)
else:
if available_tokens:
remaining_tokens = available_tokens - 10000
else:
remaining_tokens = 110000 # Conservative fallback (120000 - 10000)
logger.debug(
f"[REFACTOR] Token allocation: {remaining_tokens:,} tokens available for code files (no style examples)"
)
# Use centralized file processing logic for main code files (with line numbers enabled)
logger.debug(f"[REFACTOR] Preparing {len(code_files_to_process)} code files for analysis")
code_content = self._prepare_file_content_for_prompt(
code_files_to_process, continuation_id, "Code to analyze", max_tokens=remaining_tokens, reserve_tokens=2000
)
if code_content:
from utils.token_utils import estimate_tokens
code_tokens = estimate_tokens(code_content)
logger.info(f"[REFACTOR] Code files embedded successfully: {code_tokens:,} tokens")
else:
logger.warning("[REFACTOR] No code content after file processing")
# Detect primary language for language-specific guidance
primary_language = self.detect_primary_language(request.files)
logger.debug(f"[REFACTOR] Detected primary language: {primary_language}")
# Get language-specific guidance
language_guidance = self.get_language_specific_guidance(primary_language, request.refactor_type)
# Build the complete prompt
prompt_parts = []
# Add system prompt with dynamic language guidance
base_system_prompt = self.get_system_prompt()
if language_guidance:
enhanced_system_prompt = f"{base_system_prompt}\n\n{language_guidance}"
else:
enhanced_system_prompt = base_system_prompt
prompt_parts.append(enhanced_system_prompt)
# Add user context
prompt_parts.append("=== USER CONTEXT ===")
prompt_parts.append(f"Refactor Type: {request.refactor_type}")
if request.focus_areas:
prompt_parts.append(f"Focus Areas: {', '.join(request.focus_areas)}")
prompt_parts.append(f"User Goals: {request.prompt}")
prompt_parts.append("=== END CONTEXT ===")
# Add style guide examples if provided
if style_examples_content:
prompt_parts.append("\n=== STYLE GUIDE EXAMPLES ===")
if style_examples_note:
prompt_parts.append(f"// {style_examples_note}")
prompt_parts.append(style_examples_content)
prompt_parts.append("=== END STYLE GUIDE EXAMPLES ===")
# Add main code to analyze
prompt_parts.append("\n=== CODE TO ANALYZE ===")
prompt_parts.append(code_content)
prompt_parts.append("=== END CODE ===")
# Add generation instructions
prompt_parts.append(
f"\nPlease analyze the code for {request.refactor_type} refactoring opportunities following the multi-expert workflow specified in the system prompt."
)
if style_examples_content:
prompt_parts.append(
"Use the provided style guide examples as a reference for target coding patterns and style."
)
full_prompt = "\n".join(prompt_parts)
# Log final prompt statistics
from utils.token_utils import estimate_tokens
total_tokens = estimate_tokens(full_prompt)
logger.info(f"[REFACTOR] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters")
return full_prompt
def format_response(self, response: str, request: RefactorRequest, model_info: Optional[dict] = None) -> str:
"""
Format the refactoring response.
The base tool handles structured response validation via SPECIAL_STATUS_MODELS,
so this method focuses on presentation formatting.
Args:
response: The raw refactoring analysis from the model
request: The original request for context
model_info: Optional dict with model metadata
Returns:
str: The response (base tool will handle structured parsing)
"""
logger.debug(f"[REFACTOR] Formatting response for {request.refactor_type} refactoring")
# Check if this is a more_refactor_required response
is_more_required = False
try:
import json
parsed = json.loads(response)
if isinstance(parsed, dict) and parsed.get("status") == "more_refactor_required":
is_more_required = True
except (json.JSONDecodeError, ValueError):
# Not JSON or not more_refactor_required
pass
# Always add implementation instructions for any refactoring response
step4_title = (
"## Step 4: CONTINUE WITH MORE REFACTORING" if is_more_required else "## Step 4: COMPLETE REFACTORING"
)
step4_intro = (
"Once all refactorings above are implemented and verified working, IMMEDIATELY continue the analysis:"
if is_more_required
else "Once all refactorings above are implemented and verified working:"
)
step4_action = (
"Use the refactor tool again with the SAME parameters but include the continuation_id from this response to get additional refactoring opportunities. The model will provide more refactoring suggestions building on what was already completed."
if is_more_required
else "Review all changes made and ensure the codebase is cleaner, more maintainable, and follows best practices."
)
critical_msg = (
"apply all refactorings, validate they work, then immediately continue with more refactoring analysis. Take full ownership of the refactoring implementation and continue until all opportunities are addressed."
if is_more_required
else "create, modify, and reorganize files as needed. Take full ownership of the refactoring implementation and ensure all changes work correctly."
)
return f"""{response}
---
# IMMEDIATE NEXT ACTION
Claude, you are now in EXECUTION MODE. Take immediate action:
## Step 1: ULTRATHINK & IMPLEMENT REFACTORINGS
ULTRATHINK while implementing these refactorings. Verify EVERY code reference, import, dependency, and access modifier is 100% accurate before making changes.
- **IMPLEMENT** all refactoring opportunities listed above in priority order
- **CREATE** any new files needed for decomposition or reorganization
- **MOVE** code to appropriate locations following the refactoring plan
- **UPDATE** all imports and references to maintain functionality
- **VALIDATE** that no functionality is broken by the changes
## Step 2: VERIFY CHANGES WORK
**MANDATORY**: After each refactoring step:
- Ensure all imports are updated correctly
- Verify access modifiers (private/public/internal) still work
- Check that all references to moved code are updated
- Run any existing tests to confirm nothing is broken
- Fix any issues that arise from the refactoring
## Step 3: DISPLAY RESULTS TO USER
After implementing each refactoring, show the user:
```
✅ Refactored: [refactor-id] - Brief description
- Files modified: [list of files]
- [Key change summary]
```
{step4_title}
{step4_intro}
{step4_action}
**CRITICAL**: Do NOT stop after generating the analysis - you MUST {critical_msg}"""
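For a sense of the request shape end to end, here is a hypothetical direct invocation of the tool; in practice Claude drives it through the MCP server, and the file path below is a placeholder:

```python
import asyncio
import json

from tools.refactor import RefactorTool

tool = RefactorTool()
result = asyncio.run(
    tool.execute(
        {
            "files": ["/abs/path/to/big_module.py"],  # placeholder path
            "prompt": "Break down the god class and flag code smells",
            "refactor_type": "decompose",
            "focus_areas": ["maintainability"],
        }
    )
)
output = json.loads(result[0].text)  # ToolOutput JSON
print(output["status"])  # e.g. "success" or "refactor_analysis_complete"
```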
@@ -134,6 +134,10 @@ class TestGenTool(BaseTool):
    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

    def wants_line_numbers_by_default(self) -> bool:
        """Test generation tool needs line numbers for precise targeting"""
        return True

    def get_model_category(self):
        """TestGen requires extended reasoning for comprehensive test analysis"""
        from tools.models import ToolModelCategory
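This override is how a tool opts into the new line-numbered file embedding. A plausible sketch of the wiring (the base tool's actual plumbing is not shown in this hunk, so treat the class and method names below as illustrative):

```python
# Illustrative sketch only - assumes the base tool forwards the preference
# to the read_files() helper extended in utils/file_utils.py below.
from utils import read_files

class ToolSketch:
    def wants_line_numbers_by_default(self) -> bool:
        return False  # TestGen overrides this to return True

    def prepare_files(self, paths: list[str]) -> str:
        return read_files(paths, include_line_numbers=self.wants_line_numbers_by_default())
```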

View File

@@ -2,7 +2,9 @@
Utility functions for Zen MCP Server
"""

from .file_types import CODE_EXTENSIONS, FILE_CATEGORIES, PROGRAMMING_EXTENSIONS, TEXT_EXTENSIONS
from .file_utils import expand_paths, read_file_content, read_files
from .security_config import EXCLUDED_DIRS, SECURITY_ROOT
from .token_utils import check_token_limit, estimate_tokens

__all__ = [
@@ -10,6 +12,11 @@ __all__ = [
    "read_file_content",
    "expand_paths",
    "CODE_EXTENSIONS",
    "PROGRAMMING_EXTENSIONS",
    "TEXT_EXTENSIONS",
    "FILE_CATEGORIES",
    "SECURITY_ROOT",
    "EXCLUDED_DIRS",
    "estimate_tokens",
    "check_token_limit",
]

utils/file_types.py Normal file
View File

@@ -0,0 +1,180 @@
"""
File type definitions and constants for file processing
This module centralizes all file type and extension definitions used
throughout the MCP server for consistent file handling.
"""
# Programming language file extensions - core code files
PROGRAMMING_LANGUAGES = {
".py", # Python
".js", # JavaScript
".ts", # TypeScript
".jsx", # React JavaScript
".tsx", # React TypeScript
".java", # Java
".cpp", # C++
".c", # C
".h", # C/C++ Header
".hpp", # C++ Header
".cs", # C#
".go", # Go
".rs", # Rust
".rb", # Ruby
".php", # PHP
".swift", # Swift
".kt", # Kotlin
".scala", # Scala
".r", # R
".m", # Objective-C
".mm", # Objective-C++
}
# Script and shell file extensions
SCRIPTS = {
".sql", # SQL
".sh", # Shell
".bash", # Bash
".zsh", # Zsh
".fish", # Fish shell
".ps1", # PowerShell
".bat", # Batch
".cmd", # Command
}
# Configuration and data file extensions
CONFIGS = {
".yml", # YAML
".yaml", # YAML
".json", # JSON
".xml", # XML
".toml", # TOML
".ini", # INI
".cfg", # Config
".conf", # Config
".properties", # Properties
".env", # Environment
}
# Documentation and markup file extensions
DOCS = {
".txt", # Text
".md", # Markdown
".rst", # reStructuredText
".tex", # LaTeX
}
# Web development file extensions
WEB = {
".html", # HTML
".css", # CSS
".scss", # Sass
".sass", # Sass
".less", # Less
}
# Additional text file extensions for logs and data
TEXT_DATA = {
".log", # Log files
".csv", # CSV
".tsv", # TSV
".gitignore", # Git ignore
".dockerfile", # Docker
".makefile", # Make
".cmake", # CMake
".gradle", # Gradle
".sbt", # SBT
".pom", # Maven POM
".lock", # Lock files
}
# Image file extensions
IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico", ".tiff", ".tif"}
# Binary executable and library extensions
BINARIES = {
".exe", # Windows executable
".dll", # Windows library
".so", # Linux shared object
".dylib", # macOS dynamic library
".bin", # Binary
".class", # Java class
}
# Archive and package file extensions
ARCHIVES = {
".jar",
".war",
".ear", # Java archives
".zip",
".tar",
".gz", # General archives
".7z",
".rar", # Compression
".deb",
".rpm", # Linux packages
".dmg",
".pkg", # macOS packages
}
# Derived sets for different use cases
CODE_EXTENSIONS = PROGRAMMING_LANGUAGES | SCRIPTS | CONFIGS | DOCS | WEB
PROGRAMMING_EXTENSIONS = PROGRAMMING_LANGUAGES # For line numbering
TEXT_EXTENSIONS = CODE_EXTENSIONS | TEXT_DATA
IMAGE_EXTENSIONS = IMAGES
BINARY_EXTENSIONS = BINARIES | ARCHIVES
# All extensions by category for easy access
FILE_CATEGORIES = {
"programming": PROGRAMMING_LANGUAGES,
"scripts": SCRIPTS,
"configs": CONFIGS,
"docs": DOCS,
"web": WEB,
"text_data": TEXT_DATA,
"images": IMAGES,
"binaries": BINARIES,
"archives": ARCHIVES,
}
def get_file_category(file_path: str) -> str:
"""
Determine the category of a file based on its extension.
Args:
file_path: Path to the file
Returns:
Category name or "unknown" if not recognized
"""
from pathlib import Path
extension = Path(file_path).suffix.lower()
for category, extensions in FILE_CATEGORIES.items():
if extension in extensions:
return category
return "unknown"
def is_code_file(file_path: str) -> bool:
"""Check if a file is a code file (programming language)."""
from pathlib import Path
return Path(file_path).suffix.lower() in PROGRAMMING_LANGUAGES
def is_text_file(file_path: str) -> bool:
"""Check if a file is a text file."""
from pathlib import Path
return Path(file_path).suffix.lower() in TEXT_EXTENSIONS
def is_binary_file(file_path: str) -> bool:
"""Check if a file is a binary file."""
from pathlib import Path
return Path(file_path).suffix.lower() in BINARY_EXTENSIONS
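A short usage sketch for the helpers above (the paths are made up; the outputs follow directly from the sets defined in this module):

```python
from utils.file_types import get_file_category, is_binary_file, is_code_file, is_text_file

print(get_file_category("src/server.py"))   # "programming"
print(get_file_category("deploy/app.jar"))  # "archives"
print(get_file_category("notes.unknown"))   # "unknown"
print(is_code_file("src/server.py"))        # True  - checks PROGRAMMING_LANGUAGES only
print(is_text_file("logs/run.log"))         # True  - .log is in TEXT_DATA
print(is_binary_file("lib/native.so"))      # True  - .so is in BINARIES
```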

View File

@@ -23,148 +23,12 @@ import os
from pathlib import Path
from typing import Optional

from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TEXT_EXTENSIONS
from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens

logger = logging.getLogger(__name__)
# (Removed here: the Docker path-translation comments, WORKSPACE_ROOT validation,
# DANGEROUS_WORKSPACE_PATHS, the SECURITY_ROOT selection logic, EXCLUDED_DIRS, and
# MCP_SIGNATURE_FILES - all relocated to utils/security_config.py, shown in full below.)
def is_mcp_directory(path: Path) -> bool:
    """
@@ -242,7 +106,7 @@ def is_home_directory_root(path: Path) -> bool:
    # Check if this is exactly the home directory
    if resolved_path == resolved_home:
        logger.warning(
            f"Attempted to scan user home directory root: {path}. Please specify a subdirectory instead."
        )
        return True
@@ -277,56 +141,105 @@ def is_home_directory_root(path: Path) -> bool:
    return False

# (Removed here: the old module-level CODE_EXTENSIONS set, now defined in utils/file_types.py)


def detect_file_type(file_path: str) -> str:
    """
    Detect file type for appropriate processing strategy.

    NOTE: This function is currently not used for line number auto-detection
    due to backward compatibility requirements. It is intended for future
    features requiring specific file type handling (e.g., image processing,
    binary file analysis, or enhanced file filtering).

    Args:
        file_path: Path to the file to analyze

    Returns:
        str: "text", "binary", or "image"
    """
    path = Path(file_path)

    # Check extension first (fast)
    extension = path.suffix.lower()
    if extension in TEXT_EXTENSIONS:
        return "text"
    elif extension in IMAGE_EXTENSIONS:
        return "image"
    elif extension in BINARY_EXTENSIONS:
        return "binary"

    # Fallback: check magic bytes for text vs binary
    # This is helpful for files without extensions or unknown extensions
    try:
        with open(path, "rb") as f:
            chunk = f.read(1024)
        # Simple heuristic: if we can decode as UTF-8, likely text
        chunk.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "binary"
    except (FileNotFoundError, PermissionError) as e:
        logger.warning(f"Could not access file {file_path} for type detection: {e}")
        return "unknown"


def should_add_line_numbers(file_path: str, include_line_numbers: Optional[bool] = None) -> bool:
    """
    Determine if line numbers should be added to a file.

    Args:
        file_path: Path to the file
        include_line_numbers: Explicit preference, or None for auto-detection

    Returns:
        bool: True if line numbers should be added
    """
    if include_line_numbers is not None:
        return include_line_numbers

    # Default: DO NOT add line numbers (backwards compatibility)
    # Tools that want line numbers must explicitly request them
    return False


def _normalize_line_endings(content: str) -> str:
    """
    Normalize line endings for consistent line numbering.

    Args:
        content: File content with potentially mixed line endings

    Returns:
        str: Content with normalized LF line endings
    """
    # Normalize all line endings to LF for consistent counting
    return content.replace("\r\n", "\n").replace("\r", "\n")


def _add_line_numbers(content: str) -> str:
    """
    Add line numbers to text content for precise referencing.

    Args:
        content: Text content to number

    Returns:
        str: Content with line numbers in format " 45│ actual code line"

    Supports files up to 99,999 lines with dynamic width allocation
    """
    # Normalize line endings first
    normalized_content = _normalize_line_endings(content)
    lines = normalized_content.split("\n")

    # Dynamic width allocation based on total line count
    # This supports files of any size by computing required width
    total_lines = len(lines)
    width = len(str(total_lines))
    width = max(width, 4)  # Minimum padding for readability

    # Format with dynamic width and clear separator
    numbered_lines = [f"{i + 1:{width}d}│ {line}" for i, line in enumerate(lines)]
    return "\n".join(numbered_lines)
def translate_path_for_environment(path_str: str) -> str:
@@ -515,15 +428,13 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis
        # Check 2: Prevent scanning user's home directory root
        if is_home_directory_root(path_obj):
            logger.warning(f"Skipping home directory root: {path}. Please specify a project subdirectory instead.")
            continue

        # Check 3: Skip if this is the MCP's own directory
        if is_mcp_directory(path_obj):
            logger.info(
                f"Skipping MCP server directory: {path}. The MCP server code is excluded from project scans."
            )
            continue
@@ -575,7 +486,9 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis
    return expanded_files


def read_file_content(
    file_path: str, max_size: int = 1_000_000, *, include_line_numbers: Optional[bool] = None
) -> tuple[str, int]:
    """
    Read a single file and format it for inclusion in AI prompts.
@@ -586,6 +499,7 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i
    Args:
        file_path: Path to file (must be absolute)
        max_size: Maximum file size to read (default 1MB to prevent memory issues)
        include_line_numbers: Whether to add line numbers. If None, auto-detects based on file type

    Returns:
        Tuple of (formatted_content, estimated_tokens)
@@ -634,6 +548,10 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i
        content = f"\n--- FILE TOO LARGE: {file_path} ---\nFile size: {file_size:,} bytes (max: {max_size:,})\n--- END FILE ---\n"
        return content, estimate_tokens(content)

    # Determine if we should add line numbers
    add_line_numbers = should_add_line_numbers(file_path, include_line_numbers)
    logger.debug(f"[FILES] Line numbers for {file_path}: {'enabled' if add_line_numbers else 'disabled'}")

    # Read the file with UTF-8 encoding, replacing invalid characters
    # This ensures we can handle files with mixed encodings
    logger.debug(f"[FILES] Reading file content for {file_path}")
@@ -642,6 +560,14 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i
        logger.debug(f"[FILES] Successfully read {len(file_content)} characters from {file_path}")

        # Add line numbers if requested or auto-detected
        if add_line_numbers:
            file_content = _add_line_numbers(file_content)
            logger.debug(f"[FILES] Added line numbers to {file_path}")
        else:
            # Still normalize line endings for consistency
            file_content = _normalize_line_endings(file_content)

        # Format with clear delimiters that help the AI understand file boundaries
        # Using consistent markers makes it easier for the model to parse
        # NOTE: These markers ("--- BEGIN FILE: ... ---") are distinct from git diff markers
@@ -665,6 +591,8 @@ def read_files(
    code: Optional[str] = None,
    max_tokens: Optional[int] = None,
    reserve_tokens: int = 50_000,
    *,
    include_line_numbers: bool = False,
) -> str:
    """
    Read multiple files and optional direct code with smart token management.
@@ -679,6 +607,7 @@ def read_files(
        code: Optional direct code to include (prioritized over files)
        max_tokens: Maximum tokens to use (defaults to DEFAULT_CONTEXT_WINDOW)
        reserve_tokens: Tokens to reserve for prompt and response (default 50K)
        include_line_numbers: Whether to add line numbers to file content

    Returns:
        str: All file contents formatted for AI consumption
@@ -728,7 +657,7 @@ def read_files(
                files_skipped.extend(all_files[i:])
                break

            file_content, file_tokens = read_file_content(file_path, include_line_numbers=include_line_numbers)
            logger.debug(f"[FILES] File {file_path}: {file_tokens:,} tokens")

            # Check if adding this file would exceed limit
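To make the new opt-in behavior concrete, a small sketch of the numbering path (the sample string and path are hypothetical; the output format comes from `_add_line_numbers` above):

```python
from utils.file_utils import _add_line_numbers, should_add_line_numbers

# Backwards compatible: without an explicit preference, no numbering happens.
assert should_add_line_numbers("src/app.py") is False
assert should_add_line_numbers("src/app.py", include_line_numbers=True) is True

# Mixed line endings are normalized before numbering.
print(_add_line_numbers("def f():\r\n    return 1\n"))
#    1│ def f():
#    2│     return 1
#    3│
```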

utils/security_config.py Normal file
View File

@@ -0,0 +1,174 @@
"""
Security configuration and path validation constants
This module contains security-related constants and configurations
for file access control and workspace management.
"""
import os
from pathlib import Path
# Dangerous paths that should never be used as WORKSPACE_ROOT
# These would give overly broad access and pose security risks
DANGEROUS_WORKSPACE_PATHS = {
"/",
"/etc",
"/usr",
"/bin",
"/var",
"/root",
"/home",
"/workspace", # Container path - WORKSPACE_ROOT should be host path
"C:\\",
"C:\\Windows",
"C:\\Program Files",
"C:\\Users",
}
# Directories to exclude from recursive file search
# These typically contain generated code, dependencies, or build artifacts
EXCLUDED_DIRS = {
# Python
"__pycache__",
".venv",
"venv",
"env",
".env",
"*.egg-info",
".eggs",
"wheels",
".Python",
".mypy_cache",
".pytest_cache",
".tox",
"htmlcov",
".coverage",
"coverage",
# Node.js / JavaScript
"node_modules",
".next",
".nuxt",
"bower_components",
".sass-cache",
# Version Control
".git",
".svn",
".hg",
# Build Output
"build",
"dist",
"target",
"out",
# IDEs
".idea",
".vscode",
".sublime",
".atom",
".brackets",
# Temporary / Cache
".cache",
".temp",
".tmp",
"*.swp",
"*.swo",
"*~",
# OS-specific
".DS_Store",
"Thumbs.db",
# Java / JVM
".gradle",
".m2",
# Documentation build
"_build",
"site",
# Mobile development
".expo",
".flutter",
# Package managers
"vendor",
}
# MCP signature files - presence of these indicates the MCP's own directory
# Used to prevent the MCP from scanning its own codebase
MCP_SIGNATURE_FILES = {
"zen_server.py",
"server.py",
"tools/precommit.py",
"utils/file_utils.py",
"prompts/tool_prompts.py",
}
# Workspace configuration
WORKSPACE_ROOT = os.environ.get("WORKSPACE_ROOT")
CONTAINER_WORKSPACE = Path("/workspace")
def validate_workspace_security(workspace_root: str) -> None:
"""
Validate that WORKSPACE_ROOT is set to a safe directory.
Args:
workspace_root: The workspace root path to validate
Raises:
RuntimeError: If the workspace root is unsafe
"""
if not workspace_root:
return
# Resolve to canonical path for comparison
resolved_workspace = Path(workspace_root).resolve()
# Special check for /workspace - common configuration mistake
if str(resolved_workspace) == "/workspace":
raise RuntimeError(
f"Configuration Error: WORKSPACE_ROOT should be set to the HOST path, not the container path. "
f"Found: WORKSPACE_ROOT={workspace_root} "
f"Expected: WORKSPACE_ROOT should be set to your host directory path (e.g., $HOME) "
f"that contains all files Claude might reference. "
f"This path gets mounted to /workspace inside the Docker container."
)
# Check against other dangerous paths
if str(resolved_workspace) in DANGEROUS_WORKSPACE_PATHS:
raise RuntimeError(
f"Security Error: WORKSPACE_ROOT '{workspace_root}' is set to a dangerous system directory. "
f"This would give access to critical system files. "
f"Please set WORKSPACE_ROOT to a specific project directory."
)
# Additional check: prevent filesystem root
if resolved_workspace.parent == resolved_workspace:
raise RuntimeError(
f"Security Error: WORKSPACE_ROOT '{workspace_root}' cannot be the filesystem root. "
f"This would give access to the entire filesystem. "
f"Please set WORKSPACE_ROOT to a specific project directory."
)
def get_security_root() -> Path:
"""
Determine the security boundary for file access.
Returns:
Path object representing the security root directory
"""
# In Docker: use /workspace (container directory)
# In tests/direct mode: use WORKSPACE_ROOT (host directory)
if CONTAINER_WORKSPACE.exists():
# Running in Docker container
return CONTAINER_WORKSPACE
elif WORKSPACE_ROOT:
# Running in tests or direct mode with WORKSPACE_ROOT set
return Path(WORKSPACE_ROOT).resolve()
else:
# Fallback for backward compatibility (should not happen in normal usage)
return Path.home()
# Validate security on import if WORKSPACE_ROOT is set
if WORKSPACE_ROOT:
validate_workspace_security(WORKSPACE_ROOT)
# Export the computed security root
SECURITY_ROOT = get_security_root()
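A behavior sketch for this module (the env value and paths are illustrative; note that validation runs at import time, so WORKSPACE_ROOT must be set before the first import):

```python
import os

os.environ["WORKSPACE_ROOT"] = "/Users/jane/projects/acme"  # host path, never /workspace

from utils.security_config import SECURITY_ROOT, validate_workspace_security

validate_workspace_security("/Users/jane/projects/acme")  # passes silently

try:
    validate_workspace_security("/etc")  # dangerous system directory
except RuntimeError as err:
    print(err)

# Outside Docker (no /workspace directory), SECURITY_ROOT resolves to WORKSPACE_ROOT.
print(SECURITY_ROOT)
```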