diff --git a/README.md b/README.md index 815bbf1..a67be15 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,10 @@ problem-solving, and collaborative development. **Features true AI orchestration with conversations that continue across tasks** - Give Claude a complex task and let it orchestrate between models automatically. Claude stays in control, performs the actual work, -but gets perspectives from the best AI for each subtask. Claude can switch between different tools _and_ models mid-conversation, +but gets perspectives from the best AI for each subtask. With tools like [`analyze`](#6-analyze---smart-file-analysis) for +understanding codebases, [`codereview`](#3-codereview---professional-code-review) for audits, [`refactor`](#7-refactor---intelligent-code-refactoring) for +improving code structure, [`debug`](#5-debug---expert-debugging-assistant) for solving complex problems, and [`precommit`](#4-precommit---pre-commit-validation) for +validating changes, Claude can switch between different tools _and_ models mid-conversation, with context carrying forward seamlessly. **Example Workflow - Claude Code:** @@ -49,7 +52,8 @@ and review into consideration to aid with its pre-commit review. - [`precommit`](#4-precommit---pre-commit-validation) - Pre-commit validation - [`debug`](#5-debug---expert-debugging-assistant) - Debugging help - [`analyze`](#6-analyze---smart-file-analysis) - File analysis - - [`testgen`](#7-testgen---comprehensive-test-generation) - Test generation with edge cases + - [`refactor`](#7-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus + - [`testgen`](#8-testgen---comprehensive-test-generation) - Test generation with edge cases - **Advanced Usage** - [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search @@ -256,6 +260,7 @@ Just ask Claude naturally: - **Something's broken?** → `debug` (root cause analysis, error tracing) - **Want to understand code?** → `analyze` (architecture, patterns, dependencies) - **Need comprehensive tests?** → `testgen` (generates test suites with edge cases) +- **Code needs refactoring?** → `refactor` (intelligent refactoring with decomposition focus) - **Server info?** → `version` (version and configuration details) **Auto Mode:** When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this". @@ -276,8 +281,9 @@ Just ask Claude naturally: 4. [`precommit`](#4-precommit---pre-commit-validation) - Validate git changes before committing 5. [`debug`](#5-debug---expert-debugging-assistant) - Root cause analysis and debugging 6. [`analyze`](#6-analyze---smart-file-analysis) - General-purpose file and code analysis -7. [`testgen`](#7-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage -8. [`version`](#8-version---server-information) - Get server version and configuration +7. [`refactor`](#7-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus +8. [`testgen`](#8-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage +9. [`version`](#9-version---server-information) - Get server version and configuration ### 1. 
`chat` - General Development Chat & Collaborative Thinking
**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively**
@@ -435,7 +441,44 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Uses file paths (not content) for clean terminal output
- Can identify patterns, anti-patterns, and refactoring opportunities
- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices
-### 7. `testgen` - Comprehensive Test Generation
+
+### 7. `refactor` - Intelligent Code Refactoring
+**Comprehensive refactoring analysis with top-down decomposition strategy**
+
+**Thinking Mode:** Default is `medium` (8,192 tokens). Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis.
+
+**Model Recommendation:** The refactor tool excels with models that have large context windows like Gemini
+Pro (1M tokens), which can analyze entire files and complex codebases simultaneously.
+This comprehensive view enables detection of cross-file dependencies, architectural patterns,
+and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context
+constraints.
+
+#### Example Prompts:
+
+**Basic Usage:**
+```
+"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions"
+"Get gemini pro to identify code smells in the authentication module"
+```
+
+**Key Features:**
+- **Intelligent prioritization** - Refuses to work on low-priority issues when the code is too large and unwieldy and needs decomposition first; identifies poorly structured classes and files that require structural improvements before detail work
+- **Top-down decomposition strategy** - Analyzes file → class → function levels systematically
+- **Four refactor types**: `codesmells` (detect anti-patterns), `decompose` (break down large components), `modernize` (update language features), `organization` (improve structure)
+- **Precise line-number references** - Provides exact line numbers for Claude to implement changes
+- **Language-specific guidance** - Tailored suggestions for Python, JavaScript, Java, C#, Swift, and more
+- **Style guide integration** - Uses existing project files as pattern references
+- **Conservative approach** - Careful dependency analysis to prevent breaking changes
+- **Multi-file analysis** - Understands cross-file relationships and dependencies
+- **Priority sequencing** - Recommends implementation order for refactoring changes
+
+**Refactor Types:**
+- `codesmells`: Detect long methods, god classes, duplicated code, poor naming
+- `decompose`: Break down large files (>1500 LOC), classes (>300 LOC), functions (>80 LOC)
+- `modernize`: Update to modern language features (f-strings, async/await, etc.)
+- `organization`: Improve logical grouping, separation of concerns, module structure
+
+### 8. `testgen` - Comprehensive Test Generation
**Generates thorough test suites with edge case coverage** based on existing code and test framework used.

**Thinking Mode (Extended thinking models):** Default is `medium` (8,192 tokens). Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage.
@@ -463,7 +506,7 @@ suites that cover realistic failure scenarios and integration points that shorte - Can reference existing test files: `"Generate tests following patterns from tests/unit/"` - Specific code coverage - target specific functions/classes rather than testing everything -### 8. `version` - Server Information +### 9. `version` - Server Information ``` "Get zen to show its version" ``` diff --git a/config.py b/config.py index d8cd42f..e99d2fd 100644 --- a/config.py +++ b/config.py @@ -14,7 +14,7 @@ import os # These values are used in server responses and for tracking releases # IMPORTANT: This is the single source of truth for version and author info # Semantic versioning: MAJOR.MINOR.PATCH -__version__ = "4.4.4" +__version__ = "4.5.0" # Last update date in ISO format __updated__ = "2025-06-14" # Primary maintainer diff --git a/docs/advanced-usage.md b/docs/advanced-usage.md index bf6fc42..1a0a97d 100644 --- a/docs/advanced-usage.md +++ b/docs/advanced-usage.md @@ -259,6 +259,23 @@ All tools that work with files support **both individual files and entire direct "Generate tests following patterns from tests/unit/ for new auth module" ``` +**`refactor`** - Intelligent code refactoring with decomposition focus +- `files`: Code files or directories to analyze for refactoring opportunities (required) +- `prompt`: Description of refactoring goals, context, and specific areas of focus (required) +- `refactor_type`: codesmells|decompose|modernize|organization (required) +- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default) +- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security') +- `style_guide_examples`: Optional existing code files to use as style/pattern reference +- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only) +- `continuation_id`: Thread continuation ID for multi-turn conversations + +``` +"Analyze legacy codebase for decomposition opportunities" (auto mode picks best model) +"Use pro to identify code smells in the authentication module with max thinking mode" +"Use pro to modernize this JavaScript code following examples/modern-patterns.js" +"Refactor src/ for better organization, focus on maintainability and readability" +``` + ## Collaborative Workflows ### Design → Review → Implement @@ -284,6 +301,14 @@ suspect lies the bug and then formulate and implement a bare minimal fix. Must n with zen in the end using gemini pro to confirm we're okay to publish the fix ``` +### Refactor → Review → Implement → Test +``` +Use zen to analyze this legacy authentication module for decomposition opportunities. The code is getting hard to +maintain and we need to break it down. Use gemini pro with high thinking mode to identify code smells and suggest +a modernization strategy. After reviewing the refactoring plan, implement the changes step by step and then +generate comprehensive tests with zen to ensure nothing breaks. +``` + ### Tool Selection Guidance To help choose the right tool for your needs: @@ -292,14 +317,17 @@ To help choose the right tool for your needs: 2. **Want to find bugs/issues in code?** → Use `codereview` 3. **Want to understand how code works?** → Use `analyze` 4. **Need comprehensive test coverage?** → Use `testgen` -5. **Have analysis that needs extension/validation?** → Use `thinkdeep` -6. **Want to brainstorm or discuss?** → Use `chat` +5. **Want to refactor/modernize code?** → Use `refactor` +6. 
**Have analysis that needs extension/validation?** → Use `thinkdeep` +7. **Want to brainstorm or discuss?** → Use `chat` **Key Distinctions:** - `analyze` vs `codereview`: analyze explains, codereview prescribes fixes - `chat` vs `thinkdeep`: chat is open-ended, thinkdeep extends specific analysis - `debug` vs `codereview`: debug diagnoses runtime errors, review finds static issues - `testgen` vs `debug`: testgen creates test suites, debug just finds issues and recommends solutions +- `refactor` vs `codereview`: refactor suggests structural improvements, codereview finds bugs/issues +- `refactor` vs `analyze`: refactor provides actionable refactoring steps, analyze provides understanding ## Working with Large Prompts diff --git a/docs/testing.md b/docs/testing.md index 1b7aa87..76104e1 100644 --- a/docs/testing.md +++ b/docs/testing.md @@ -30,6 +30,29 @@ Simulator tests replicate real-world Claude CLI interactions with the MCP server **Important**: Simulator tests require `LOG_LEVEL=DEBUG` in your `.env` file to validate detailed execution logs. +#### Monitoring Logs During Tests + +**Important**: The MCP stdio protocol interferes with stderr output during tool execution. While server startup logs appear in `docker compose logs`, tool execution logs are only written to file-based logs inside the container. This is a known limitation of the stdio-based MCP protocol and cannot be fixed without changing the MCP implementation. + +To monitor logs during test execution: + +```bash +# Monitor main server logs (includes all tool execution details) +docker exec zen-mcp-server tail -f -n 500 /tmp/mcp_server.log + +# Monitor MCP activity logs (tool calls and completions) +docker exec zen-mcp-server tail -f /tmp/mcp_activity.log + +# Check log file sizes (logs rotate at 20MB) +docker exec zen-mcp-server ls -lh /tmp/mcp_*.log* +``` + +**Log Rotation**: All log files are configured with automatic rotation at 20MB to prevent disk space issues. The server keeps: +- 10 rotated files for mcp_server.log (200MB total) +- 5 rotated files for mcp_activity.log (100MB total) + +**Why logs don't appear in docker compose logs**: The MCP stdio_server captures stderr during tool execution to prevent interference with the JSON-RPC protocol communication. This means that while you'll see startup logs in `docker compose logs`, you won't see tool execution logs there. 
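+
+For reference, the rotation limits above correspond to the `RotatingFileHandler` setup added in `server.py`. The snippet below is a minimal sketch of that configuration (same path, size limit, and backup count), not the server's exact code:
+
+```python
+# Size-based rotation for the main server log: rotate at 20MB, keep 10 backups (~200MB total)
+import logging
+from logging.handlers import RotatingFileHandler
+
+file_handler = RotatingFileHandler(
+    "/tmp/mcp_server.log",
+    maxBytes=20 * 1024 * 1024,  # rotate once the file reaches 20MB
+    backupCount=10,             # keep mcp_server.log.1 ... mcp_server.log.10
+    encoding="utf-8",
+)
+file_handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
+logging.getLogger().addHandler(file_handler)
+```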
+ #### Running All Simulator Tests ```bash # Run all simulator tests diff --git a/server.py b/server.py index 5dd06cd..ae8d038 100644 --- a/server.py +++ b/server.py @@ -44,6 +44,7 @@ from tools import ( CodeReviewTool, DebugIssueTool, Precommit, + RefactorTool, TestGenTool, ThinkDeepTool, ) @@ -70,55 +71,59 @@ class LocalTimeFormatter(logging.Formatter): # Configure both console and file logging log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s" -logging.basicConfig( - level=getattr(logging, log_level, logging.INFO), - format=log_format, - force=True, # Force reconfiguration if already configured - stream=sys.stderr, # Use stderr to avoid interfering with MCP stdin/stdout protocol -) -# Apply local time formatter to root logger -for handler in logging.getLogger().handlers: - handler.setFormatter(LocalTimeFormatter(log_format)) +# Clear any existing handlers first +root_logger = logging.getLogger() +root_logger.handlers.clear() + +# Create and configure stderr handler explicitly +stderr_handler = logging.StreamHandler(sys.stderr) +stderr_handler.setLevel(getattr(logging, log_level, logging.INFO)) +stderr_handler.setFormatter(LocalTimeFormatter(log_format)) +root_logger.addHandler(stderr_handler) + +# Note: MCP stdio_server interferes with stderr during tool execution +# All logs are properly written to /tmp/mcp_server.log for monitoring + +# Set root logger level +root_logger.setLevel(getattr(logging, log_level, logging.INFO)) # Add rotating file handler for Docker log monitoring try: - # Main server log with daily rotation (keep 7 days of logs) - # Using 'midnight' interval rotates at midnight each day - # Filename will have date suffix like mcp_server.log.2024-06-14 - file_handler = TimedRotatingFileHandler( + # Main server log with size-based rotation (20MB max per file) + # This ensures logs don't grow indefinitely and are properly managed + file_handler = RotatingFileHandler( "/tmp/mcp_server.log", - when="midnight", # Rotate at midnight - interval=1, # Every 1 day - backupCount=7, # Keep 7 days of logs + maxBytes=20 * 1024 * 1024, # 20MB max file size + backupCount=10, # Keep 10 rotated files (200MB total) encoding="utf-8", ) file_handler.setLevel(getattr(logging, log_level, logging.INFO)) file_handler.setFormatter(LocalTimeFormatter(log_format)) - # Add suffix pattern for rotated files - file_handler.suffix = "%Y-%m-%d" logging.getLogger().addHandler(file_handler) - # Create a special logger for MCP activity tracking with daily rotation + # Create a special logger for MCP activity tracking with size-based rotation mcp_logger = logging.getLogger("mcp_activity") - mcp_file_handler = TimedRotatingFileHandler( + mcp_file_handler = RotatingFileHandler( "/tmp/mcp_activity.log", - when="midnight", # Rotate at midnight - interval=1, # Every 1 day - backupCount=7, # Keep 7 days of logs + maxBytes=20 * 1024 * 1024, # 20MB max file size + backupCount=5, # Keep 5 rotated files (100MB total) encoding="utf-8", ) mcp_file_handler.setLevel(logging.INFO) mcp_file_handler.setFormatter(LocalTimeFormatter("%(asctime)s - %(message)s")) - mcp_file_handler.suffix = "%Y-%m-%d" mcp_logger.addHandler(mcp_file_handler) mcp_logger.setLevel(logging.INFO) + # Ensure MCP activity also goes to stderr + mcp_logger.propagate = True # Also keep a size-based rotation as backup (100MB max per file) # This prevents any single day's log from growing too large size_handler = RotatingFileHandler( - "/tmp/mcp_server_overflow.log", maxBytes=100 * 1024 * 1024, backupCount=3 # 100MB + 
"/tmp/mcp_server_overflow.log", + maxBytes=100 * 1024 * 1024, + backupCount=3, # 100MB ) size_handler.setLevel(logging.WARNING) # Only warnings and errors size_handler.setFormatter(LocalTimeFormatter(log_format)) @@ -144,6 +149,7 @@ TOOLS = { "chat": ChatTool(), # Interactive development chat and brainstorming "precommit": Precommit(), # Pre-commit validation of git changes "testgen": TestGenTool(), # Comprehensive test generation with edge case coverage + "refactor": RefactorTool(), # Intelligent code refactoring suggestions with precise line references } diff --git a/simulator_tests/__init__.py b/simulator_tests/__init__.py index 48379e6..2ec4f74 100644 --- a/simulator_tests/__init__.py +++ b/simulator_tests/__init__.py @@ -19,6 +19,7 @@ from .test_openrouter_fallback import OpenRouterFallbackTest from .test_openrouter_models import OpenRouterModelsTest from .test_per_tool_deduplication import PerToolDeduplicationTest from .test_redis_validation import RedisValidationTest +from .test_refactor_validation import RefactorValidationTest from .test_testgen_validation import TestGenValidationTest from .test_token_allocation_validation import TokenAllocationValidationTest @@ -38,6 +39,7 @@ TEST_REGISTRY = { "openrouter_models": OpenRouterModelsTest, "token_allocation_validation": TokenAllocationValidationTest, "testgen_validation": TestGenValidationTest, + "refactor_validation": RefactorValidationTest, "conversation_chain_validation": ConversationChainValidationTest, } @@ -57,6 +59,7 @@ __all__ = [ "OpenRouterModelsTest", "TokenAllocationValidationTest", "TestGenValidationTest", + "RefactorValidationTest", "ConversationChainValidationTest", "TEST_REGISTRY", ] diff --git a/simulator_tests/test_refactor_validation.py b/simulator_tests/test_refactor_validation.py new file mode 100644 index 0000000..8ee17e5 --- /dev/null +++ b/simulator_tests/test_refactor_validation.py @@ -0,0 +1,283 @@ +#!/usr/bin/env python3 +""" +Refactor Tool Validation Test + +Tests the refactor tool with a simple code smell example to validate: +- Proper execution with flash model +- Correct line number references in response +- Log validation for tool execution +""" + +import json +from .base_test import BaseSimulatorTest + + +class RefactorValidationTest(BaseSimulatorTest): + """Test refactor tool with codesmells detection""" + + @property + def test_name(self) -> str: + return "refactor_validation" + + @property + def test_description(self) -> str: + return "Refactor tool validation with codesmells" + + def run_test(self) -> bool: + """Test refactor tool with a simple code smell example""" + try: + self.logger.info("Test: Refactor tool validation") + + # Setup test files directory first + self.setup_test_files() + + # Create a simple Python file with obvious code smells + code_with_smells = '''# Code with obvious smells for testing +def process_data(data): + # Code smell: Magic number + if len(data) > 42: + result = [] + # Code smell: Nested loops with poor variable names + for i in range(len(data)): + for j in range(len(data[i])): + x = data[i][j] + # Code smell: Duplicate code + if x > 0: + result.append(x * 2) + elif x < 0: + result.append(x * 2) + return result + else: + # Code smell: Return inconsistent type + return None + +# Code smell: God function doing too many things +def handle_everything(user_input, config, database): + # Validation + if not user_input: + print("Error: No input") # Code smell: print instead of logging + return + + # Processing + processed = user_input.strip().lower() + + # Database operation 
+ connection = database.connect() + data = connection.query("SELECT * FROM users") # Code smell: SQL in code + + # Business logic mixed with data access + valid_users = [] + for row in data: + if row[2] == processed: # Code smell: Magic index + valid_users.append(row) + + return valid_users +''' + + # Create test file + test_file = self.create_additional_test_file("smelly_code.py", code_with_smells) + self.logger.info(f" ✅ Created test file with code smells: {test_file}") + + # Call refactor tool with codesmells type + self.logger.info(" 📝 Calling refactor tool with codesmells type...") + response, _ = self.call_mcp_tool( + "refactor", + { + "files": [test_file], + "prompt": "Find and suggest fixes for code smells in this file", + "refactor_type": "codesmells", + "model": "flash", + "thinking_mode": "low", # Keep it fast for testing + } + ) + + if not response: + self.logger.error("Failed to get refactor response") + return False + + self.logger.info(" ✅ Got refactor response") + + # Parse response to check for line references + try: + response_data = json.loads(response) + + # Debug: log the response structure + self.logger.debug(f"Response keys: {list(response_data.keys())}") + + # Extract the actual content if it's wrapped + if "content" in response_data: + # The actual refactoring data is in the content field + content = response_data["content"] + # Remove markdown code block markers if present + if content.startswith("```json"): + content = content[7:] # Remove ```json + if content.endswith("```"): + content = content[:-3] # Remove ``` + content = content.strip() + + # Find the end of the JSON object - handle truncated responses + # Count braces to find where the JSON ends + brace_count = 0 + json_end = -1 + in_string = False + escape_next = False + + for i, char in enumerate(content): + if escape_next: + escape_next = False + continue + if char == '\\': + escape_next = True + continue + if char == '"' and not escape_next: + in_string = not in_string + if not in_string: + if char == '{': + brace_count += 1 + elif char == '}': + brace_count -= 1 + if brace_count == 0: + json_end = i + 1 + break + + if json_end > 0: + content = content[:json_end] + + # Parse the inner JSON + inner_data = json.loads(content) + self.logger.debug(f"Inner data keys: {list(inner_data.keys())}") + else: + inner_data = response_data + + # Check that we got refactoring suggestions (might be called refactor_opportunities) + refactorings_key = None + for key in ["refactorings", "refactor_opportunities"]: + if key in inner_data: + refactorings_key = key + break + + if not refactorings_key: + self.logger.error("No refactorings found in response") + self.logger.error(f"Response structure: {json.dumps(inner_data, indent=2)[:500]}...") + return False + + refactorings = inner_data[refactorings_key] + if not isinstance(refactorings, list) or len(refactorings) == 0: + self.logger.error("Empty refactorings list") + return False + + # Validate that we have line references for code smells + # Flash model typically detects these issues: + # - Lines 4-18: process_data function (magic number, nested loops, duplicate code) + # - Lines 11-14: duplicate code blocks + # - Lines 21-40: handle_everything god function + expected_line_ranges = [ + (4, 18), # process_data function issues + (11, 14), # duplicate code + (21, 40), # god function + ] + + self.logger.debug(f"Refactorings found: {len(refactorings)}") + for i, ref in enumerate(refactorings[:3]): # Log first 3 + self.logger.debug(f"Refactoring {i}: 
start_line={ref.get('start_line')}, end_line={ref.get('end_line')}, type={ref.get('type')}") + + found_references = [] + for refactoring in refactorings: + # Check for line numbers in various fields + start_line = refactoring.get("start_line") + end_line = refactoring.get("end_line") + location = refactoring.get("location", "") + + # Add found line numbers + if start_line: + found_references.append(f"line {start_line}") + if end_line and end_line != start_line: + found_references.append(f"line {end_line}") + + # Also extract from location string + import re + line_matches = re.findall(r'line[s]?\s+(\d+)', location.lower()) + found_references.extend([f"line {num}" for num in line_matches]) + + self.logger.info(f" 📍 Found line references: {found_references}") + + # Check that flash found the expected refactoring areas + found_ranges = [] + for refactoring in refactorings: + start = refactoring.get("start_line") + end = refactoring.get("end_line") + if start and end: + found_ranges.append((start, end)) + + self.logger.info(f" 📍 Found refactoring ranges: {found_ranges}") + + # Verify we found issues in the main problem areas + # Check if we have issues detected in process_data function area (lines 2-18) + process_data_issues = [r for r in found_ranges if r[0] >= 2 and r[1] <= 18] + # Check if we have issues detected in handle_everything function area (lines 21-40) + god_function_issues = [r for r in found_ranges if r[0] >= 21 and r[1] <= 40] + + self.logger.info(f" 📍 Issues in process_data area (lines 2-18): {len(process_data_issues)}") + self.logger.info(f" 📍 Issues in handle_everything area (lines 21-40): {len(god_function_issues)}") + + if len(process_data_issues) >= 1 and len(god_function_issues) >= 1: + self.logger.info(f" ✅ Flash correctly identified code smells in both major areas") + self.logger.info(f" ✅ Found {len(refactorings)} total refactoring opportunities") + + # Verify we have reasonable number of total issues + if len(refactorings) >= 3: + self.logger.info(f" ✅ Refactoring analysis validation passed") + else: + self.logger.warning(f" ⚠️ Only {len(refactorings)} refactorings found (expected >= 3)") + else: + self.logger.error(f" ❌ Flash didn't find enough issues in expected areas") + self.logger.error(f" - process_data area: found {len(process_data_issues)}, expected >= 1") + self.logger.error(f" - handle_everything area: found {len(god_function_issues)}, expected >= 1") + return False + + except json.JSONDecodeError as e: + self.logger.error(f"Failed to parse refactor response as JSON: {e}") + return False + + # Validate logs + self.logger.info(" 📋 Validating execution logs...") + + # Get server logs from the actual log file inside the container + result = self.run_command( + ["docker", "exec", self.container_name, "tail", "-500", "/tmp/mcp_server.log"], + capture_output=True + ) + + if result.returncode == 0: + logs = result.stdout.decode() + result.stderr.decode() + + # Look for refactor tool execution patterns + refactor_patterns = [ + "[REFACTOR]", + "refactor tool", + "codesmells", + "Token budget", + "Code files embedded successfully" + ] + + patterns_found = 0 + for pattern in refactor_patterns: + if pattern in logs: + patterns_found += 1 + self.logger.debug(f" ✅ Found log pattern: {pattern}") + + if patterns_found >= 3: + self.logger.info(f" ✅ Log validation passed ({patterns_found}/{len(refactor_patterns)} patterns)") + else: + self.logger.warning(f" ⚠️ Only found {patterns_found}/{len(refactor_patterns)} log patterns") + else: + self.logger.warning(" ⚠️ Could not 
retrieve Docker logs") + + self.logger.info(" ✅ Refactor tool validation completed successfully") + return True + + except Exception as e: + self.logger.error(f"Refactor validation test failed: {e}") + return False + finally: + self.cleanup_test_files() \ No newline at end of file diff --git a/systemprompts/__init__.py b/systemprompts/__init__.py index 5a0156d..f9ca4e1 100644 --- a/systemprompts/__init__.py +++ b/systemprompts/__init__.py @@ -7,6 +7,7 @@ from .chat_prompt import CHAT_PROMPT from .codereview_prompt import CODEREVIEW_PROMPT from .debug_prompt import DEBUG_ISSUE_PROMPT from .precommit_prompt import PRECOMMIT_PROMPT +from .refactor_prompt import REFACTOR_PROMPT from .testgen_prompt import TESTGEN_PROMPT from .thinkdeep_prompt import THINKDEEP_PROMPT @@ -17,5 +18,6 @@ __all__ = [ "ANALYZE_PROMPT", "CHAT_PROMPT", "PRECOMMIT_PROMPT", + "REFACTOR_PROMPT", "TESTGEN_PROMPT", ] diff --git a/systemprompts/codereview_prompt.py b/systemprompts/codereview_prompt.py index 8dc87cd..5a5cd22 100644 --- a/systemprompts/codereview_prompt.py +++ b/systemprompts/codereview_prompt.py @@ -15,6 +15,11 @@ same file you've been provided unless for some reason its content is missing or {"status": "clarification_required", "question": "", "files_needed": ["[file name here]", "[or some folder/]"]} +CRITICAL LINE NUMBER INSTRUCTIONS +Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included +in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in +your issue descriptions. + CRITICAL: Align your review with the user's context and expectations. Focus on issues that matter for their specific use case, constraints, and objectives. Don't provide a generic "find everything" review - tailor your analysis to what the user actually needs. diff --git a/systemprompts/debug_prompt.py b/systemprompts/debug_prompt.py index 09563c3..a92e6b2 100644 --- a/systemprompts/debug_prompt.py +++ b/systemprompts/debug_prompt.py @@ -15,6 +15,11 @@ Do NOT ask for the same file you've been provided unless for some reason its con {"status": "clarification_required", "question": "", "files_needed": ["[file name here]", "[or some folder/]"]} +CRITICAL LINE NUMBER INSTRUCTIONS +Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included +in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in +your issue descriptions. + CRITICAL: Your primary objective is to identify the root cause of the specific issue at hand and suggest the minimal fix required to resolve it. Stay focused on the main problem - avoid suggesting extensive refactoring, architectural changes, or unrelated improvements. diff --git a/systemprompts/refactor_prompt.py b/systemprompts/refactor_prompt.py new file mode 100644 index 0000000..d560c2b --- /dev/null +++ b/systemprompts/refactor_prompt.py @@ -0,0 +1,201 @@ +""" +Refactor tool system prompt +""" + +REFACTOR_PROMPT = """ +ROLE +You are a principal software engineer specializing in intelligent code refactoring. You identify concrete improvement +opportunities and provide precise, actionable suggestions with exact line-number references that Claude can +implement directly. 
+ +IF MORE INFORMATION IS NEEDED +If you need additional context (e.g., related files, configuration, dependencies) to provide accurate refactoring +recommendations, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the same file you've +been provided unless for some reason its content is missing or incomplete: +{"status": "clarification_required", "question": "", + "files_needed": ["[file name here]", "[or some folder/]"]} + +REFACTOR TYPES (PRIORITY ORDER) + +1. **decompose** (CRITICAL PRIORITY) +2. **codesmells** +3. **modernize** +4. **organization** + +**decompose**: CRITICAL PRIORITY for cognitive load reduction. When encountering large files (>1500 lines), huge classes +(>300 lines), or massive functions (>80 lines), decomposition is MANDATORY before any other refactoring type. Large +codebases are impossible to navigate, understand, or maintain. + +DECOMPOSITION ORDER (STRICT TOP-DOWN, ADAPTIVE): +Analyze in this sequence, stopping at the FIRST breached threshold in each file: + +1. **File Level (>1500 LOC)** → Propose file-level splits ONLY, then re-analyze after implementation +2. **Class Level (>300 LOC)** → Propose class extraction ONLY, then re-analyze after implementation +3. **Function Level (>80 LOC)** → Propose function extraction + +RATIONALE: Outer-scope size dominates cognitive load and merge complexity. NEVER descend to an inner level until +the containing level is within its threshold. This prevents premature micro-optimization and ensures maximum +cognitive load reduction with minimum rework. + +DECOMPOSITION STRATEGIES: + +**File-Level Decomposition** (PRIORITY 1): Split oversized files into multiple focused files: + - Extract related classes/functions into separate modules using platform-specific patterns + - Create logical groupings (models, services, utilities, components, etc.) + - Use proper import/export mechanisms for the target language + - Focus on responsibility-based splits, not arbitrary size cuts + - CAUTION: When only a single file is provided, verify dependencies and imports before suggesting file splits + - DEPENDENCY ANALYSIS: Check for cross-references, shared constants, and inter-class dependencies + - If splitting breaks internal dependencies, suggest necessary visibility changes or shared modules + +**Class-Level Decomposition** (PRIORITY 2): Break down mega-classes: + - FIRST: Split large classes into multiple classes where programming language allows (C# partial classes, + Swift and ObjC extensions, JavaScript modules, etc.) + - THEN: Extract specialized responsibilities into focused classes via composition or inheritance if this is feasible + - Use composition over inheritance where appropriate + - Apply single responsibility principle cautiously - avoid breaking existing APIs or adding new dependencies + - When only a single file is provided, prefer internal splitting methods (private classes, inner classes, + helper methods) + - Consider interface segregation for large public APIs only if it doesn't break existing consumers + - CRITICAL: When moving code between files/extensions, analyze access dependencies (private variables, + internal methods) + - WARNING: Some moves may break access visibility (Swift private→extension, C# internal→assembly) - flag for review + - If access breaks are unavoidable, explicitly note required visibility changes (private→internal, protected, etc.) 
+ +**Function-Level Decomposition** (PRIORITY 3): Eliminate long, complex functions: + - Extract logical chunks into private/helper methods within the same class/module + - Separate data processing from business logic conservatively + - Create clear, named abstractions for complex operations without breaking existing call sites + - Maintain function cohesion and minimize parameter passing + - Prefer internal extraction over creating new dependencies or external functions + - ANALYZE DEPENDENCIES: Check for private variable access, closure captures, and scope-dependent behavior + - If extraction breaks variable access, suggest parameter passing or scope adjustments + - Flag functions that require manual review due to complex inter-dependencies + +CRITICAL RULE: If ANY file exceeds cognitive complexity thresholds (large files/classes/functions), you MUST: +1. Mark ALL decomposition opportunities as CRITICAL severity +2. Focus EXCLUSIVELY on decomposition - provide ONLY decomposition suggestions +3. DO NOT suggest ANY other refactoring type (code smells, modernization, organization) +4. List decomposition issues FIRST by severity: CRITICAL → HIGH → MEDIUM → LOW +5. Block all other refactoring until cognitive load is reduced + +CRITICAL SEVERITY = BLOCKING ISSUE: Other refactoring types can only be applied AFTER all CRITICAL decomposition +is complete. Decomposition reduces navigation complexity, improves understanding, enables focused changes, and makes +future refactoring possible. + +**codesmells**: Detect and fix quality issues - long methods, complex conditionals, duplicate code, magic numbers, +poor naming, feature envy. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist. + +**modernize**: Update to modern language features - replace deprecated patterns, use newer syntax, improve error +handling and type safety. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist. + +**organization**: Improve organization and structure - group related functionality, improve file structure, +standardize naming, clarify module boundaries. NOTE: Can only be applied AFTER decomposition if large files exist. + +CRITICAL LINE NUMBER INSTRUCTIONS +Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be +included in any code you generate. Always reference specific line numbers for Claude to locate exact positions. +Include context_start_text and context_end_text as backup references. Never include "LINE│" markers in generated code +snippets. + +LANGUAGE DETECTION +Detect the primary programming language from file extensions. Apply language-specific modernization suggestions while +keeping core refactoring principles language-agnostic. + +SCOPE CONTROL +Stay strictly within the provided codebase. Do NOT invent features, suggest major architectural changes beyond current +structure, recommend external libraries not in use, or create speculative ideas outside project scope. 
+ +If scope is too large and refactoring would require large parts of the code to be involved, respond ONLY with: +{"status": "focused_review_required", + "reason": "", + "suggestion": ""} + +OUTPUT FORMAT +Return ONLY a JSON object with this exact structure: + +{ + "status": "refactor_analysis_complete", + "refactor_opportunities": [ + { + "id": "refactor-001", + "type": "decompose|codesmells|modernize|organization", + "severity": "critical|high|medium|low", + "file": "/absolute/path/to/file.ext", + "start_line": 45, + "end_line": 67, + "context_start_text": "exact text from start line for verification", + "context_end_text": "exact text from end line for verification", + "issue": "Clear description of what needs refactoring", + "suggestion": "Specific refactoring action to take", + "rationale": "Why this improves the code (performance, readability, maintainability)", + "code_to_replace": "Original code that should be changed", + "replacement_code_snippet": "Refactored version of the code", + "new_code_snippets": [ + { + "description": "What this new code does", + "location": "same_class|new_file|separate_module", + "code": "New code to be added" + } + ] + } + ], + "priority_sequence": ["refactor-001", "refactor-002"], + "next_actions_for_claude": [ + { + "action_type": "EXTRACT_METHOD|SPLIT_CLASS|MODERNIZE_SYNTAX|REORGANIZE_CODE|DECOMPOSE_FILE", + "target_file": "/absolute/path/to/file.ext", + "source_lines": "45-67", + "description": "Specific step-by-step action for Claude" + } + ] +} + +QUALITY STANDARDS +Each refactoring opportunity must be specific and actionable. Code snippets must be syntactically correct. Preserve +existing functionality - refactoring changes structure, not behavior. Focus on high-impact changes that meaningfully +improve code quality. + +SEVERITY GUIDELINES +- **critical**: EXCLUSIVELY for decomposition when large files/classes/functions detected - BLOCKS ALL OTHER + REFACTORING +- **high**: Critical code smells, major duplication, significant architectural issues (only after decomposition + complete) +- **medium**: Moderate complexity issues, minor duplication, organization improvements (only after decomposition + complete) +- **low**: Style improvements, minor modernization, optional optimizations (only after decomposition complete) + +DECOMPOSITION PRIORITY RULES - CRITICAL SEVERITY: +1. If ANY file >2000 lines: Mark ALL decomposition opportunities as CRITICAL severity +2. If ANY class >1500 lines: Mark ALL class decomposition as CRITICAL severity +3. If ANY function >250 lines: Mark ALL function decomposition as CRITICAL severity +4. CRITICAL issues MUST BE RESOLVED FIRST - no other refactoring suggestions allowed +5. Focus EXCLUSIVELY on breaking down large components when CRITICAL issues exist +6. List ALL decomposition issues FIRST in severity order: CRITICAL → HIGH → MEDIUM → LOW +7. 
When CRITICAL decomposition issues exist, provide ONLY decomposition suggestions + +FILE TYPE CONSIDERATIONS: +- CSS files can grow large with styling rules - consider logical grouping by components/pages +- JavaScript files may have multiple classes/modules - extract into separate files +- Configuration files may be legitimately large - focus on logical sections +- Generated code files should generally be excluded from decomposition + +IF EXTENSIVE REFACTORING IS REQUIRED +If you determine that comprehensive refactoring requires dozens of changes across multiple files or would involve +extensive back-and-forth iterations that would risk exceeding context limits, you MUST follow this structured approach: + +1. **Generate Essential Refactorings First**: Create the standard refactor_analysis_complete response with the most + critical and high-impact refactoring opportunities (typically 5-10 key changes covering the most important + improvements). Focus on CRITICAL and HIGH severity issues. Include full details with refactor_opportunities, + priority_sequence, and next_actions_for_claude. + +2. **Request Continuation**: AFTER providing the refactor_analysis_complete response, append the following JSON + format as a separate response (and nothing more after this): +{"status": "more_refactor_required", +"message": "Explanation of why more refactoring is needed and overview of remaining work. For example: 'Extensive decomposition required across 15 additional files. Continuing analysis will identify module extraction opportunities in services/, controllers/, and utils/ directories.'"} + +This approach ensures comprehensive refactoring coverage while maintaining quality and avoiding context overflow. +Claude will use the continuation_id to continue the refactoring analysis in subsequent requests. + +Provide precise, implementable refactoring guidance that Claude can execute with confidence. +""" diff --git a/systemprompts/testgen_prompt.py b/systemprompts/testgen_prompt.py index 2e0a03d..3166ddd 100644 --- a/systemprompts/testgen_prompt.py +++ b/systemprompts/testgen_prompt.py @@ -15,6 +15,11 @@ same file you've been provided unless for some reason its content is missing or {"status": "clarification_required", "question": "", "files_needed": ["[file name here]", "[or some folder/]"]} +CRITICAL LINE NUMBER INSTRUCTIONS +Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included +in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in +your issue descriptions. + MULTI-AGENT WORKFLOW You sequentially inhabit five expert personas—each passes a concise artefact to the next: diff --git a/systemprompts/thinkdeep_prompt.py b/systemprompts/thinkdeep_prompt.py index 266aa55..f4bdc68 100644 --- a/systemprompts/thinkdeep_prompt.py +++ b/systemprompts/thinkdeep_prompt.py @@ -4,7 +4,8 @@ ThinkDeep tool system prompt THINKDEEP_PROMPT = """ ROLE -You are a senior engineering collaborator working with Claude on complex software problems. Claude will send you content—analysis, prompts, questions, ideas, or theories—to deepen, validate, and extend. +You are a senior engineering collaborator working with Claude on complex software problems. Claude will send you +content—analysis, prompts, questions, ideas, or theories—to deepen, validate, and extend. 
IF MORE INFORMATION IS NEEDED If you need additional context (e.g., related files, system architecture, requirements, code snippets) to provide diff --git a/tests/test_docker_path_integration.py b/tests/test_docker_path_integration.py index 3ae8667..f445d72 100644 --- a/tests/test_docker_path_integration.py +++ b/tests/test_docker_path_integration.py @@ -9,6 +9,7 @@ import importlib import os import tempfile from pathlib import Path +from unittest.mock import patch import pytest @@ -36,27 +37,33 @@ def test_docker_path_translation_integration(): try: os.environ["WORKSPACE_ROOT"] = str(host_workspace) - # Reload the module to pick up new environment variables + # Reload the modules to pick up new environment variables + # Need to reload security_config first since it sets WORKSPACE_ROOT + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) - # Mock the CONTAINER_WORKSPACE to point to our test directory - utils.file_utils.CONTAINER_WORKSPACE = container_workspace + # Properly mock the CONTAINER_WORKSPACE + with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace): + # Test the translation + from utils.file_utils import translate_path_for_environment - # Test the translation - from utils.file_utils import translate_path_for_environment + # This should translate the host path to container path + host_path = str(test_file) + result = translate_path_for_environment(host_path) - # This should translate the host path to container path - host_path = str(test_file) - result = translate_path_for_environment(host_path) - - # Verify the translation worked - expected = str(container_workspace / "src" / "test.py") - assert result == expected + # Verify the translation worked + expected = str(container_workspace / "src" / "test.py") + assert result == expected finally: # Restore original environment os.environ.clear() os.environ.update(original_env) + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) @@ -80,19 +87,26 @@ def test_docker_security_validation(): try: os.environ["WORKSPACE_ROOT"] = str(host_workspace) - # Reload the module + # Reload the modules + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) - utils.file_utils.CONTAINER_WORKSPACE = Path("/workspace") - from utils.file_utils import resolve_and_validate_path + # Properly mock the CONTAINER_WORKSPACE + with patch("utils.file_utils.CONTAINER_WORKSPACE", Path("/workspace")): + from utils.file_utils import resolve_and_validate_path - # Trying to access the symlink should fail - with pytest.raises(PermissionError): - resolve_and_validate_path(str(symlink)) + # Trying to access the symlink should fail + with pytest.raises(PermissionError): + resolve_and_validate_path(str(symlink)) finally: os.environ.clear() os.environ.update(original_env) + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) @@ -150,37 +164,46 @@ def test_review_changes_docker_path_translation(): # Simulate Docker environment os.environ["WORKSPACE_ROOT"] = str(host_workspace) - # Reload the module + # Reload the modules + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) - utils.file_utils.CONTAINER_WORKSPACE = container_workspace - # Import after reloading to get updated environment - from tools.precommit import Precommit + # Properly mock the CONTAINER_WORKSPACE and reload precommit 
module + with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace): + # Need to also patch it in the modules that import it + with patch("utils.security_config.CONTAINER_WORKSPACE", container_workspace): + # Import after patching to get updated environment + from tools.precommit import Precommit - # Create tool instance - tool = Precommit() + # Create tool instance + tool = Precommit() - # Test path translation in prepare_prompt - request = tool.get_request_model()( - path=str(host_workspace / "project"), # Host path that needs translation - review_type="quick", - severity_filter="all", - ) + # Test path translation in prepare_prompt + request = tool.get_request_model()( + path=str(host_workspace / "project"), # Host path that needs translation + review_type="quick", + severity_filter="all", + ) - # This should translate the path and find the git repository - import asyncio + # This should translate the path and find the git repository + import asyncio - result = asyncio.run(tool.prepare_prompt(request)) + result = asyncio.run(tool.prepare_prompt(request)) - # Should find the repository (not raise an error about inaccessible path) - # If we get here without exception, the path was successfully translated - assert isinstance(result, str) - # The result should contain git diff information or indicate no changes - assert "No git repositories found" not in result or "changes" in result.lower() + # Should find the repository (not raise an error about inaccessible path) + # If we get here without exception, the path was successfully translated + assert isinstance(result, str) + # The result should contain git diff information or indicate no changes + assert "No git repositories found" not in result or "changes" in result.lower() finally: os.environ.clear() os.environ.update(original_env) + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) @@ -203,36 +226,44 @@ def test_review_changes_docker_path_error(): # Simulate Docker environment os.environ["WORKSPACE_ROOT"] = str(host_workspace) - # Reload the module + # Reload the modules + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) - utils.file_utils.CONTAINER_WORKSPACE = container_workspace - # Import after reloading to get updated environment - from tools.precommit import Precommit + # Properly mock the CONTAINER_WORKSPACE + with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace): + with patch("utils.security_config.CONTAINER_WORKSPACE", container_workspace): + # Import after patching to get updated environment + from tools.precommit import Precommit - # Create tool instance - tool = Precommit() + # Create tool instance + tool = Precommit() - # Test path translation with an inaccessible path - request = tool.get_request_model()( - path=str(outside_path), # Path outside the mounted workspace - review_type="quick", - severity_filter="all", - ) + # Test path translation with an inaccessible path + request = tool.get_request_model()( + path=str(outside_path), # Path outside the mounted workspace + review_type="quick", + severity_filter="all", + ) - # This should raise a ValueError - import asyncio + # This should raise a ValueError + import asyncio - with pytest.raises(ValueError) as exc_info: - asyncio.run(tool.prepare_prompt(request)) + with pytest.raises(ValueError) as exc_info: + asyncio.run(tool.prepare_prompt(request)) - # Check the error message - assert "not accessible from within the Docker 
container" in str(exc_info.value) - assert "mounted workspace" in str(exc_info.value) + # Check the error message + assert "not accessible from within the Docker container" in str(exc_info.value) + assert "mounted workspace" in str(exc_info.value) finally: os.environ.clear() os.environ.update(original_env) + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) @@ -250,31 +281,38 @@ def test_double_translation_prevention(): try: os.environ["WORKSPACE_ROOT"] = str(host_workspace) - # Reload the module + # Reload the modules + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) - utils.file_utils.CONTAINER_WORKSPACE = container_workspace - from utils.file_utils import translate_path_for_environment + # Properly mock the CONTAINER_WORKSPACE + with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace): + from utils.file_utils import translate_path_for_environment - # Test 1: Normal translation - host_path = str(host_workspace / "src" / "main.py") - translated_once = translate_path_for_environment(host_path) - expected = str(container_workspace / "src" / "main.py") - assert translated_once == expected + # Test 1: Normal translation + host_path = str(host_workspace / "src" / "main.py") + translated_once = translate_path_for_environment(host_path) + expected = str(container_workspace / "src" / "main.py") + assert translated_once == expected - # Test 2: Double translation should return the same path - translated_twice = translate_path_for_environment(translated_once) - assert translated_twice == translated_once - assert translated_twice == expected + # Test 2: Double translation should return the same path + translated_twice = translate_path_for_environment(translated_once) + assert translated_twice == translated_once + assert translated_twice == expected - # Test 3: Container workspace root should not be double-translated - root_path = str(container_workspace) - translated_root = translate_path_for_environment(root_path) - assert translated_root == root_path + # Test 3: Container workspace root should not be double-translated + root_path = str(container_workspace) + translated_root = translate_path_for_environment(root_path) + assert translated_root == root_path finally: os.environ.clear() os.environ.update(original_env) + import utils.security_config + + importlib.reload(utils.security_config) importlib.reload(utils.file_utils) diff --git a/tests/test_refactor.py b/tests/test_refactor.py new file mode 100644 index 0000000..8bfeb23 --- /dev/null +++ b/tests/test_refactor.py @@ -0,0 +1,556 @@ +""" +Tests for the refactor tool functionality +""" + +import json +from unittest.mock import MagicMock, patch + +import pytest + +from tools.refactor import RefactorTool +from utils.file_utils import read_file_content + + +class TestRefactorTool: + """Test suite for the refactor tool""" + + @pytest.fixture + def refactor_tool(self): + """Create a refactor tool instance for testing""" + return RefactorTool() + + @pytest.fixture + def mock_model_response(self): + """Create a mock model response with valid JSON""" + + def _create_response(content=None): + if content is None: + content = json.dumps( + { + "refactor_opportunities": [ + { + "id": "refactor-001", + "type": "codesmells", + "severity": "high", + "file": "/test/file.py", + "start_line": 10, + "end_line": 25, + "context_start_text": "def long_method():", + "context_end_text": " return result", + "issue": "Method too long with 
multiple responsibilities", + "suggestion": "Extract helper methods", + "rationale": "Improves readability and maintainability", + "code_to_replace": "# original code", + "replacement_code_snippet": "# refactored code", + "new_code_snippets": [], + } + ], + "priority_sequence": ["refactor-001"], + "next_actions_for_claude": [], + } + ) + + from unittest.mock import Mock + + return Mock( + content=content, + usage={"input_tokens": 100, "output_tokens": 200, "total_tokens": 300}, + model_name="test-model", + metadata={"finish_reason": "STOP"}, + ) + + return _create_response + + def test_get_name(self, refactor_tool): + """Test that the tool returns the correct name""" + assert refactor_tool.get_name() == "refactor" + + def test_get_description(self, refactor_tool): + """Test that the tool returns a comprehensive description""" + description = refactor_tool.get_description() + assert "INTELLIGENT CODE REFACTORING" in description + assert "codesmells" in description + assert "decompose" in description + assert "modernize" in description + assert "organization" in description + + def test_get_input_schema(self, refactor_tool): + """Test that the input schema includes all required fields""" + schema = refactor_tool.get_input_schema() + + assert schema["type"] == "object" + assert "files" in schema["properties"] + assert "prompt" in schema["properties"] + assert "refactor_type" in schema["properties"] + + # Check refactor_type enum values + refactor_enum = schema["properties"]["refactor_type"]["enum"] + expected_types = ["codesmells", "decompose", "modernize", "organization"] + assert all(rt in refactor_enum for rt in expected_types) + + def test_language_detection_python(self, refactor_tool): + """Test language detection for Python files""" + files = ["/test/file1.py", "/test/file2.py", "/test/utils.py"] + language = refactor_tool.detect_primary_language(files) + assert language == "python" + + def test_language_detection_javascript(self, refactor_tool): + """Test language detection for JavaScript files""" + files = ["/test/app.js", "/test/component.jsx", "/test/utils.js"] + language = refactor_tool.detect_primary_language(files) + assert language == "javascript" + + def test_language_detection_mixed(self, refactor_tool): + """Test language detection for mixed language files""" + files = ["/test/app.py", "/test/script.js", "/test/main.java"] + language = refactor_tool.detect_primary_language(files) + assert language == "mixed" + + def test_language_detection_unknown(self, refactor_tool): + """Test language detection for unknown file types""" + files = ["/test/data.txt", "/test/config.json"] + language = refactor_tool.detect_primary_language(files) + assert language == "unknown" + + def test_language_specific_guidance_python(self, refactor_tool): + """Test language-specific guidance for Python modernization""" + guidance = refactor_tool.get_language_specific_guidance("python", "modernize") + assert "f-strings" in guidance + assert "dataclasses" in guidance + assert "type hints" in guidance + + def test_language_specific_guidance_javascript(self, refactor_tool): + """Test language-specific guidance for JavaScript modernization""" + guidance = refactor_tool.get_language_specific_guidance("javascript", "modernize") + assert "async/await" in guidance + assert "destructuring" in guidance + assert "arrow functions" in guidance + + def test_language_specific_guidance_unknown(self, refactor_tool): + """Test language-specific guidance for unknown languages""" + guidance = 
refactor_tool.get_language_specific_guidance("unknown", "modernize") + assert guidance == "" + + @pytest.mark.asyncio + async def test_execute_basic_refactor(self, refactor_tool, mock_model_response): + """Test basic refactor tool execution""" + with patch.object(refactor_tool, "get_model_provider") as mock_get_provider: + mock_provider = MagicMock() + mock_provider.get_provider_type.return_value = MagicMock(value="test") + mock_provider.supports_thinking_mode.return_value = False + mock_provider.generate_content.return_value = mock_model_response() + mock_get_provider.return_value = mock_provider + + # Mock file processing + with patch.object(refactor_tool, "_prepare_file_content_for_prompt") as mock_prepare: + mock_prepare.return_value = "def test(): pass" + + result = await refactor_tool.execute( + { + "files": ["/test/file.py"], + "prompt": "Find code smells in this Python code", + "refactor_type": "codesmells", + } + ) + + assert len(result) == 1 + output = json.loads(result[0].text) + assert output["status"] == "success" + # format_response appends markdown implementation instructions, so the parsed content_type may be reported as "json", "text", or "markdown" depending on how the base tool classifies the response + assert output["content_type"] in ["json", "text", "markdown"] + + @pytest.mark.asyncio + async def test_execute_with_style_guide(self, refactor_tool, mock_model_response): + """Test refactor tool execution with style guide examples""" + with patch.object(refactor_tool, "get_model_provider") as mock_get_provider: + mock_provider = MagicMock() + mock_provider.get_provider_type.return_value = MagicMock(value="test") + mock_provider.supports_thinking_mode.return_value = False + mock_provider.generate_content.return_value = mock_model_response() + mock_get_provider.return_value = mock_provider + + # Mock file processing + with patch.object(refactor_tool, "_prepare_file_content_for_prompt") as mock_prepare: + mock_prepare.return_value = "def example(): pass" + + with patch.object(refactor_tool, "_process_style_guide_examples") as mock_style: + mock_style.return_value = ("# style guide content", "") + + result = await refactor_tool.execute( + { + "files": ["/test/file.py"], + "prompt": "Modernize this code following our style guide", + "refactor_type": "modernize", + "style_guide_examples": ["/test/style_example.py"], + } + ) + + assert len(result) == 1 + output = json.loads(result[0].text) + assert output["status"] == "success" + + def test_format_response_valid_json(self, refactor_tool): + """Test response formatting with valid structured JSON""" + valid_json_response = json.dumps( + { + "status": "refactor_analysis_complete", + "refactor_opportunities": [ + { + "id": "test-001", + "type": "codesmells", + "severity": "medium", + "file": "/test.py", + "start_line": 1, + "end_line": 5, + "context_start_text": "def test():", + "context_end_text": " pass", + "issue": "Test issue", + "suggestion": "Test suggestion", + "rationale": "Test rationale", + "code_to_replace": "old code", + "replacement_code_snippet": "new code", + } + ], + "priority_sequence": ["test-001"], + "next_actions_for_claude": [], + } + ) + + # Create a mock request + request = MagicMock() + request.refactor_type = "codesmells" + + formatted = refactor_tool.format_response(valid_json_response, request) + + # Should contain the original response plus implementation instructions + assert valid_json_response in formatted + assert "IMMEDIATE NEXT ACTION" in formatted + assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted + assert "Step 4: 
COMPLETE REFACTORING" in formatted # Not more_required, so should be COMPLETE + + def test_format_response_invalid_json(self, refactor_tool): + """Test response formatting with invalid JSON - now handled by base tool""" + invalid_response = "This is not JSON content" + + # Create a mock request + request = MagicMock() + request.refactor_type = "codesmells" + + formatted = refactor_tool.format_response(invalid_response, request) + + # Should contain the original response plus implementation instructions + assert invalid_response in formatted + assert "IMMEDIATE NEXT ACTION" in formatted + assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted + + def test_model_category(self, refactor_tool): + """Test that the refactor tool uses EXTENDED_REASONING category""" + from tools.models import ToolModelCategory + + category = refactor_tool.get_model_category() + assert category == ToolModelCategory.EXTENDED_REASONING + + def test_default_temperature(self, refactor_tool): + """Test that the refactor tool uses analytical temperature""" + from config import TEMPERATURE_ANALYTICAL + + temp = refactor_tool.get_default_temperature() + assert temp == TEMPERATURE_ANALYTICAL + + def test_format_response_more_refactor_required(self, refactor_tool): + """Test that format_response handles more_refactor_required status""" + more_refactor_response = json.dumps( + { + "status": "more_refactor_required", + "message": "Large codebase requires extensive refactoring across multiple files", + } + ) + + # Create a mock request + request = MagicMock() + request.refactor_type = "decompose" + + formatted = refactor_tool.format_response(more_refactor_response, request) + + # Should contain the original response plus continuation instructions + assert more_refactor_response in formatted + assert "IMMEDIATE NEXT ACTION" in formatted + assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted + assert "VERIFY CHANGES WORK" in formatted + assert "Step 4: CONTINUE WITH MORE REFACTORING" in formatted # more_required, so should be CONTINUE + assert "continuation_id" in formatted + assert "immediately continue with more refactoring analysis" in formatted + + +class TestFileUtilsLineNumbers: + """Test suite for line numbering functionality in file_utils""" + + def test_read_file_content_with_line_numbers(self, project_path): + """Test reading file content with line numbers enabled""" + + # Create a test file within the workspace + temp_path = project_path / "test_file.py" + with open(temp_path, "w") as f: + f.write("def hello():\n print('Hello')\n return True") + + # Read with line numbers explicitly enabled + content, tokens = read_file_content(str(temp_path), include_line_numbers=True) + + # Check that line numbers are present + assert "1│ def hello():" in content + assert "2│ print('Hello')" in content + assert "3│ return True" in content + assert "--- BEGIN FILE:" in content + assert "--- END FILE:" in content + + def test_read_file_content_without_line_numbers(self, project_path): + """Test reading file content with line numbers disabled""" + + # Create a test file within the workspace + temp_path = project_path / "test_file.txt" + with open(temp_path, "w") as f: + f.write("Line 1\nLine 2\nLine 3") + + # Read with line numbers explicitly disabled + content, tokens = read_file_content(str(temp_path), include_line_numbers=False) + + # Check that line numbers are NOT present + assert "1│" not in content + assert "Line 1" in content + assert "Line 2" in content + assert "--- BEGIN FILE:" in content + + def 
test_read_file_content_auto_detect_programming(self, project_path): + """Test that auto-detection is OFF by default (backwards compatibility)""" + + # Create a test file within the workspace + temp_path = project_path / "test_auto.py" + with open(temp_path, "w") as f: + f.write("import os\nprint('test')") + + # Read without specifying line numbers (should NOT auto-detect for backwards compatibility) + content, tokens = read_file_content(str(temp_path)) + + # Should NOT automatically add line numbers for .py files (default behavior) + assert "1│" not in content + assert "import os" in content + assert "print('test')" in content + + def test_read_file_content_auto_detect_text(self, project_path): + """Test that text files do NOT get line numbers by default (auto-detection stays off)""" + + # Create a test file within the workspace + temp_path = project_path / "test_auto.txt" + with open(temp_path, "w") as f: + f.write("This is a text file\nWith multiple lines") + + # Read without specifying line numbers (auto-detection is off by default) + content, tokens = read_file_content(str(temp_path)) + + # Should NOT automatically add line numbers for .txt files + assert "1│" not in content + assert "This is a text file" in content + + def test_line_ending_normalization(self): + """Test that different line endings are normalized consistently""" + from utils.file_utils import _add_line_numbers, _normalize_line_endings + + # Test different line ending formats + content_crlf = "Line 1\r\nLine 2\r\nLine 3" + content_cr = "Line 1\rLine 2\rLine 3" + content_lf = "Line 1\nLine 2\nLine 3" + + # All should normalize to the same result + normalized_crlf = _normalize_line_endings(content_crlf) + normalized_cr = _normalize_line_endings(content_cr) + normalized_lf = _normalize_line_endings(content_lf) + + assert normalized_crlf == normalized_cr == normalized_lf + assert normalized_lf == "Line 1\nLine 2\nLine 3" + + # Line numbering should work consistently + numbered = _add_line_numbers(content_crlf) + assert " 1│ Line 1" in numbered + assert " 2│ Line 2" in numbered + assert " 3│ Line 3" in numbered + + def test_detect_file_type(self): + """Test file type detection""" + from utils.file_utils import detect_file_type + + # Test programming language files + assert detect_file_type("test.py") == "text" + assert detect_file_type("test.js") == "text" + assert detect_file_type("test.java") == "text" + + # Test image files + assert detect_file_type("image.png") == "image" + assert detect_file_type("photo.jpg") == "image" + + # Test binary files + assert detect_file_type("program.exe") == "binary" + assert detect_file_type("library.dll") == "binary" + + def test_should_add_line_numbers(self): + """Test line number detection logic""" + from utils.file_utils import should_add_line_numbers + + # NO files should get line numbers by default (backwards compatibility) + assert not should_add_line_numbers("test.py") + assert not should_add_line_numbers("app.js") + assert not should_add_line_numbers("Main.java") + assert not should_add_line_numbers("readme.txt") + assert not should_add_line_numbers("data.csv") + + # Explicit override should work + assert should_add_line_numbers("readme.txt", True) + assert not should_add_line_numbers("test.py", False) + + def test_line_numbers_double_triple_digits(self, project_path): + """Test line numbering with double and triple digit line numbers""" + from utils.file_utils import _add_line_numbers + + # Create content with many lines to test double and triple digit formatting + lines = [] + for i in range(1, 125): # Lines 1-124 for 
testing up to triple digits + if i < 10: + lines.append(f"# Single digit line {i}") + elif i < 100: + lines.append(f"# Double digit line {i}") + else: + lines.append(f"# Triple digit line {i}") + + content = "\n".join(lines) + numbered_content = _add_line_numbers(content) + + # Test single digit formatting (should be right-aligned with spaces) + assert " 1│ # Single digit line 1" in numbered_content + assert " 9│ # Single digit line 9" in numbered_content + + # Test double digit formatting (should be right-aligned) + assert " 10│ # Double digit line 10" in numbered_content # Line 10 has "double digit" content + assert " 50│ # Double digit line 50" in numbered_content + assert " 99│ # Double digit line 99" in numbered_content + + # Test triple digit formatting (should be right-aligned) + assert " 100│ # Triple digit line 100" in numbered_content + assert " 124│ # Triple digit line 124" in numbered_content + + # Verify consistent alignment - all line numbers should end with "│ " + lines_with_numbers = numbered_content.split("\n") + for line in lines_with_numbers: + if "│" in line: + # Find the pipe character position + pipe_pos = line.find("│") + # Ensure the character before pipe is a digit + assert line[pipe_pos - 1].isdigit(), f"Line format issue: {line}" + # Ensure the character after pipe is a space + assert line[pipe_pos + 1] == " ", f"Line format issue: {line}" + + def test_line_numbers_with_file_reading(self, project_path): + """Test line numbering through file reading with large file""" + + # Create a test file with 150 functions (600 total lines: 4 lines per function) + temp_path = project_path / "large_test_file.py" + with open(temp_path, "w") as f: + for i in range(1, 151): # Functions 1-150 + f.write(f"def function_{i}():\n") + f.write(f" # This is function number {i}\n") + f.write(f" return {i}\n") + f.write("\n") + + # Read with line numbers enabled + content, tokens = read_file_content(str(temp_path), include_line_numbers=True) + + # Calculate actual line numbers based on file structure (4 lines per function) + # Function 1: lines 1-4, Function 2: lines 5-8, etc. 
+ # Line 1: def function_1(): + # Line 2: # This is function number 1 + # Line 3: return 1 + # Line 4: (empty) + + # Test various line number formats in the actual file content + assert " 1│ def function_1():" in content + + # Function 13 starts at line 49 (12*4 + 1), so line 50 is " # This is function number 13" + assert " 50│ # This is function number 13" in content + + # Line 100 is actually an empty line after function 25 (line 99 was "return 25") + assert " 100│ " in content # Empty line + + # Line 99 is "return 25" from function 25 + assert " 99│ return 25" in content + + # Test more line numbers - line 147 is "return 37" from function 37 + assert " 147│ return 37" in content + + # Test that we have the final lines (600 total lines) + assert " 599│ return 150" in content + assert " 600│ " in content # Final empty line + + # Verify the file structure is preserved + assert "--- BEGIN FILE:" in content + assert "--- END FILE:" in content + assert str(temp_path) in content + + def test_line_numbers_large_files_22k_lines(self, project_path): + """Test line numbering for very large files (22,500+ lines)""" + from utils.file_utils import _add_line_numbers + + # Create content simulating a very large file with 25,000 lines + lines = [] + for i in range(1, 25001): # Lines 1-25000 + lines.append(f"// Large file line {i}") + + content = "\n".join(lines) + numbered_content = _add_line_numbers(content) + + # Test that width dynamically adjusts to 5 digits for large files + # Small line numbers should now have 5-digit width + assert " 1│ // Large file line 1" in numbered_content + assert " 9│ // Large file line 9" in numbered_content + assert " 10│ // Large file line 10" in numbered_content + assert " 99│ // Large file line 99" in numbered_content + assert " 100│ // Large file line 100" in numbered_content + assert " 999│ // Large file line 999" in numbered_content + assert " 1000│ // Large file line 1000" in numbered_content + assert " 9999│ // Large file line 9999" in numbered_content + assert "10000│ // Large file line 10000" in numbered_content + assert "22500│ // Large file line 22500" in numbered_content + assert "25000│ // Large file line 25000" in numbered_content + + # Verify consistent alignment - all line numbers should end with "│ " + lines_with_numbers = numbered_content.split("\n") + for i, line in enumerate(lines_with_numbers[:100]): # Check first 100 lines + if "│" in line: + pipe_pos = line.find("│") + # For large files, should be 5-character width plus pipe + assert line[pipe_pos - 1].isdigit(), f"Line {i+1} format issue: {line}" + assert line[pipe_pos + 1] == " ", f"Line {i+1} format issue: {line}" + + def test_line_numbers_boundary_conditions(self): + """Test line numbering at boundary conditions (9999 vs 10000 lines)""" + from utils.file_utils import _add_line_numbers + + # Test exactly 9999 lines (should use 4-digit width) + lines_9999 = [f"Line {i}" for i in range(1, 10000)] # 9999 lines + content_9999 = "\n".join(lines_9999) + numbered_9999 = _add_line_numbers(content_9999) + + # Should use 4-digit format + assert " 1│ Line 1" in numbered_9999 + assert "9999│ Line 9999" in numbered_9999 + + # Test exactly 10000 lines (should use 5-digit width) + lines_10000 = [f"Line {i}" for i in range(1, 10001)] # 10000 lines + content_10000 = "\n".join(lines_10000) + numbered_10000 = _add_line_numbers(content_10000) + + # Should use 5-digit format + assert " 1│ Line 1" in numbered_10000 + assert "10000│ Line 10000" in numbered_10000 + + +if __name__ == "__main__": + 
pytest.main([__file__, "-v"]) diff --git a/tests/test_server.py b/tests/test_server.py index 1292d51..2af485d 100644 --- a/tests/test_server.py +++ b/tests/test_server.py @@ -27,10 +27,11 @@ class TestServerTools: assert "chat" in tool_names assert "precommit" in tool_names assert "testgen" in tool_names + assert "refactor" in tool_names assert "version" in tool_names - # Should have exactly 8 tools (including testgen) - assert len(tools) == 8 + # Should have exactly 9 tools (including refactor) + assert len(tools) == 9 # Check descriptions are verbose for tool in tools: diff --git a/tests/test_special_status_parsing.py b/tests/test_special_status_parsing.py index b46445d..5fd129b 100644 --- a/tests/test_special_status_parsing.py +++ b/tests/test_special_status_parsing.py @@ -261,3 +261,95 @@ class TestSpecialStatusParsing: # Should fall back to normal response since validation failed assert result.status == "success" assert result.content_type == "text" + + def test_refactor_analysis_complete_parsing(self): + """Test that RefactorAnalysisComplete status is properly parsed""" + import json + + json_response = { + "status": "refactor_analysis_complete", + "refactor_opportunities": [ + { + "id": "refactor-001", + "type": "decompose", + "severity": "critical", + "file": "/test.py", + "start_line": 1, + "end_line": 5, + "context_start_text": "def test():", + "context_end_text": " pass", + "issue": "Large function needs decomposition", + "suggestion": "Extract helper methods", + "rationale": "Improves readability", + "code_to_replace": "old code", + "replacement_code_snippet": "new code", + } + ], + "priority_sequence": ["refactor-001"], + "next_actions_for_claude": [ + { + "action_type": "EXTRACT_METHOD", + "target_file": "/test.py", + "source_lines": "1-5", + "description": "Extract helper method", + } + ], + } + + result = self.tool._parse_response(json.dumps(json_response), self.request) + + assert result.status == "refactor_analysis_complete" + assert result.content_type == "json" + parsed_content = json.loads(result.content) + assert "refactor_opportunities" in parsed_content + assert len(parsed_content["refactor_opportunities"]) == 1 + assert parsed_content["refactor_opportunities"][0]["id"] == "refactor-001" + + def test_refactor_analysis_complete_validation_error(self): + """Test that RefactorAnalysisComplete validation catches missing required fields""" + import json + + json_response = { + "status": "refactor_analysis_complete", + "refactor_opportunities": [ + { + "id": "refactor-001", + # Missing required fields like type, severity, etc. + } + ], + "priority_sequence": ["refactor-001"], + "next_actions_for_claude": [], + } + + result = self.tool._parse_response(json.dumps(json_response), self.request) + + # Should fall back to normal response since validation failed + assert result.status == "success" + assert result.content_type == "text" + + def test_more_refactor_required_parsing(self): + """Test that more_refactor_required status is parsed correctly""" + import json + + json_response = { + "status": "more_refactor_required", + "message": "Large codebase requires extensive decomposition across 15 files. 
Continuing analysis for remaining modules.", + } + + result = self.tool._parse_response(json.dumps(json_response), self.request) + + assert result.status == "more_refactor_required" + assert result.content_type == "json" + parsed_content = json.loads(result.content) + assert parsed_content["status"] == "more_refactor_required" + assert "Large codebase requires extensive decomposition" in parsed_content["message"] + + def test_more_refactor_required_missing_message(self): + """Test that more_refactor_required without required message field fails validation""" + response_json = '{"status": "more_refactor_required"}' + + result = self.tool._parse_response(response_json, self.request) + + # Should fall back to normal processing since validation failed (missing required field) + assert result.status == "success" + assert result.content_type == "text" diff --git a/tools/__init__.py b/tools/__init__.py index 9260083..792cb88 100644 --- a/tools/__init__.py +++ b/tools/__init__.py @@ -7,6 +7,7 @@ from .chat import ChatTool from .codereview import CodeReviewTool from .debug import DebugIssueTool from .precommit import Precommit +from .refactor import RefactorTool from .testgen import TestGenTool from .thinkdeep import ThinkDeepTool @@ -17,5 +18,6 @@ __all__ = [ "AnalyzeTool", "ChatTool", "Precommit", + "RefactorTool", "TestGenTool", ] diff --git a/tools/base.py b/tools/base.py index 65aef7d..9dc8c57 100644 --- a/tools/base.py +++ b/tools/base.py @@ -207,9 +207,7 @@ class BaseTool(ABC): provider = ModelProviderRegistry.get_provider_for_model(model_name) if not provider: logger = logging.getLogger(f"tools.{self.name}") - logger.warning( - f"Model '{model_name}' is not available with current API keys. " f"Requiring model selection." - ) + logger.warning(f"Model '{model_name}' is not available with current API keys. Requiring model selection.") return True return False @@ -397,6 +395,25 @@ class BaseTool(ABC): """ return 0.5 + def wants_line_numbers_by_default(self) -> bool: + """ + Return whether this tool wants line numbers added to code files by default. + + Tools that benefit from precise line references (refactor, codereview, debug) + should return True. Tools that prioritize token efficiency or don't need + precise references can return False. + + Line numbers add ~8-10% token overhead but provide precise targeting for: + - Code review feedback ("SQL injection on line 45") + - Debug error locations ("Memory leak in loop at lines 123-156") + - Test generation targets ("Generate tests for method at lines 78-95") + - Refactoring guidance ("Extract method from lines 67-89") + + Returns: + bool: True if line numbers should be added by default for this tool + """ + return False # Conservative default - tools opt-in as needed + def get_default_thinking_mode(self) -> str: """ Return the default thinking mode for this tool. 
@@ -694,7 +711,10 @@ class BaseTool(ABC): ) try: file_content = read_files( - files_to_embed, max_tokens=effective_max_tokens + reserve_tokens, reserve_tokens=reserve_tokens + files_to_embed, + max_tokens=effective_max_tokens + reserve_tokens, + reserve_tokens=reserve_tokens, + include_line_numbers=self.wants_line_numbers_by_default(), ) self._validate_token_limit(file_content, context_description) content_parts.append(file_content) diff --git a/tools/codereview.py b/tools/codereview.py index 50b3bea..98fa2cc 100644 --- a/tools/codereview.py +++ b/tools/codereview.py @@ -148,6 +148,10 @@ class CodeReviewTool(BaseTool): def get_default_temperature(self) -> float: return TEMPERATURE_ANALYTICAL + def wants_line_numbers_by_default(self) -> bool: + """Code review tool needs line numbers for precise feedback""" + return True + def get_request_model(self): return CodeReviewRequest diff --git a/tools/debug.py b/tools/debug.py index 3d244c0..22837de 100644 --- a/tools/debug.py +++ b/tools/debug.py @@ -111,6 +111,10 @@ class DebugIssueTool(BaseTool): def get_default_temperature(self) -> float: return TEMPERATURE_ANALYTICAL + def wants_line_numbers_by_default(self) -> bool: + """Debug tool needs line numbers for precise error location""" + return True + def get_model_category(self) -> "ToolModelCategory": """Debug requires deep analysis and reasoning""" from tools.models import ToolModelCategory diff --git a/tools/models.py b/tools/models.py index bfc9847..ec22f54 100644 --- a/tools/models.py +++ b/tools/models.py @@ -40,6 +40,8 @@ class ToolOutput(BaseModel): "focused_review_required", "test_sample_needed", "more_tests_required", + "more_refactor_required", + "refactor_analysis_complete", "resend_prompt", "continuation_available", ] = "success" @@ -97,6 +99,56 @@ class MoreTestsRequired(BaseModel): pending_tests: str = Field(..., description="List of pending tests to be generated") +class MoreRefactorRequired(BaseModel): + """Request for continuation when refactoring requires extensive changes""" + + status: Literal["more_refactor_required"] = "more_refactor_required" + message: str = Field(..., description="Explanation of why more refactoring is needed and what remains to be done") + + +class RefactorOpportunity(BaseModel): + """A single refactoring opportunity with precise targeting information""" + + id: str = Field(..., description="Unique identifier for this refactoring opportunity") + type: Literal["decompose", "codesmells", "modernize", "organization"] = Field( + ..., description="Type of refactoring" + ) + severity: Literal["critical", "high", "medium", "low"] = Field(..., description="Severity level") + file: str = Field(..., description="Absolute path to the file") + start_line: int = Field(..., description="Starting line number") + end_line: int = Field(..., description="Ending line number") + context_start_text: str = Field(..., description="Exact text from start line for verification") + context_end_text: str = Field(..., description="Exact text from end line for verification") + issue: str = Field(..., description="Clear description of what needs refactoring") + suggestion: str = Field(..., description="Specific refactoring action to take") + rationale: str = Field(..., description="Why this improves the code") + code_to_replace: str = Field(..., description="Original code that should be changed") + replacement_code_snippet: str = Field(..., description="Refactored version of the code") + new_code_snippets: Optional[list[dict]] = Field( + default_factory=list, description="Additional 
code snippets to be added" + ) + + +class RefactorAction(BaseModel): + """Next action for Claude to implement refactoring""" + + action_type: Literal["EXTRACT_METHOD", "SPLIT_CLASS", "MODERNIZE_SYNTAX", "REORGANIZE_CODE", "DECOMPOSE_FILE"] = ( + Field(..., description="Type of action to perform") + ) + target_file: str = Field(..., description="Absolute path to target file") + source_lines: str = Field(..., description="Line range (e.g., '45-67')") + description: str = Field(..., description="Step-by-step action description for Claude") + + +class RefactorAnalysisComplete(BaseModel): + """Complete refactor analysis with prioritized opportunities""" + + status: Literal["refactor_analysis_complete"] = "refactor_analysis_complete" + refactor_opportunities: list[RefactorOpportunity] = Field(..., description="List of refactoring opportunities") + priority_sequence: list[str] = Field(..., description="Recommended order of refactoring IDs") + next_actions_for_claude: list[RefactorAction] = Field(..., description="Specific actions for Claude to implement") + + # Registry mapping status strings to their corresponding Pydantic models SPECIAL_STATUS_MODELS = { "clarification_required": ClarificationRequest, @@ -104,6 +156,8 @@ SPECIAL_STATUS_MODELS = { "focused_review_required": FocusedReviewRequired, "test_sample_needed": TestSampleNeeded, "more_tests_required": MoreTestsRequired, + "more_refactor_required": MoreRefactorRequired, + "refactor_analysis_complete": RefactorAnalysisComplete, } diff --git a/tools/refactor.py b/tools/refactor.py new file mode 100644 index 0000000..beae2f4 --- /dev/null +++ b/tools/refactor.py @@ -0,0 +1,653 @@ +""" +Refactor tool - Intelligent code refactoring suggestions with precise line-number references + +This tool analyzes code for refactoring opportunities across four main categories: +- codesmells: Detect and suggest fixes for common code smells +- decompose: Break down large functions, classes, and modules into smaller, focused components +- modernize: Update code to use modern language features and patterns +- organization: Suggest better organization and logical grouping of related functionality + +Key Features: +- Cross-language support with language-specific guidance +- Precise line-number references for Claude +- Large context handling with token budgeting +- Structured JSON responses for easy parsing +- Style guide integration for project-specific patterns +""" + +import logging +import os +from typing import Any, Literal, Optional + +from mcp.types import TextContent +from pydantic import Field + +from config import TEMPERATURE_ANALYTICAL +from systemprompts import REFACTOR_PROMPT +from utils.file_utils import translate_file_paths + +from .base import BaseTool, ToolRequest +from .models import ToolOutput + +logger = logging.getLogger(__name__) + + +class RefactorRequest(ToolRequest): + """ + Request model for the refactor tool. + + This model defines all parameters that can be used to customize + the refactoring analysis process. 
+ """ + + files: list[str] = Field( + ..., + description="Code files or directories to analyze for refactoring opportunities (must be absolute paths)", + ) + prompt: str = Field( + ..., + description="Description of refactoring goals, context, and specific areas of focus", + ) + refactor_type: Literal["codesmells", "decompose", "modernize", "organization"] = Field( + ..., description="Type of refactoring analysis to perform" + ) + focus_areas: Optional[list[str]] = Field( + None, + description="Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')", + ) + style_guide_examples: Optional[list[str]] = Field( + None, + description=( + "Optional existing code files to use as style/pattern reference (must be absolute paths). " + "These files represent the target coding style and patterns for the project. " + "Particularly useful for 'modernize' and 'organization' refactor types." + ), + ) + + +class RefactorTool(BaseTool): + """ + Refactor tool implementation. + + This tool analyzes code to provide intelligent refactoring suggestions + with precise line-number references for Claude to implement. + """ + + def get_name(self) -> str: + return "refactor" + + def get_description(self) -> str: + return ( + "INTELLIGENT CODE REFACTORING - Analyzes code for refactoring opportunities with precise line-number guidance. " + "Supports four refactor types: 'codesmells' (detect anti-patterns), 'decompose' (break down large functions/classes/modules into smaller components), " + "'modernize' (update to modern language features), and 'organization' (improve organization and grouping of related functionality). " + "Provides specific, actionable refactoring steps that Claude can implement directly. " + "Choose thinking_mode based on codebase complexity: 'medium' for standard modules (default), " + "'high' for complex systems, 'max' for legacy codebases requiring deep analysis. " + "Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities." + ) + + def get_input_schema(self) -> dict[str, Any]: + schema = { + "type": "object", + "properties": { + "files": { + "type": "array", + "items": {"type": "string"}, + "description": "Code files or directories to analyze for refactoring opportunities (must be absolute paths)", + }, + "model": self.get_model_field_schema(), + "prompt": { + "type": "string", + "description": "Description of refactoring goals, context, and specific areas of focus", + }, + "refactor_type": { + "type": "string", + "enum": ["codesmells", "decompose", "modernize", "organization"], + "description": "Type of refactoring analysis to perform", + }, + "focus_areas": { + "type": "array", + "items": {"type": "string"}, + "description": "Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')", + }, + "style_guide_examples": { + "type": "array", + "items": {"type": "string"}, + "description": ( + "Optional existing code files to use as style/pattern reference (must be absolute paths). " + "These files represent the target coding style and patterns for the project." + ), + }, + "thinking_mode": { + "type": "string", + "enum": ["minimal", "low", "medium", "high", "max"], + "description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)", + }, + "continuation_id": { + "type": "string", + "description": ( + "Thread continuation ID for multi-turn conversations. 
Can be used to continue conversations " + "across different tools. Only provide this if continuing a previous conversation thread." + ), + }, + }, + "required": ["files", "prompt", "refactor_type"] + (["model"] if self.is_effective_auto_mode() else []), + } + + return schema + + def get_system_prompt(self) -> str: + return REFACTOR_PROMPT + + def get_default_temperature(self) -> float: + return TEMPERATURE_ANALYTICAL + + def wants_line_numbers_by_default(self) -> bool: + """Refactor tool needs line numbers for precise targeting""" + return True + + def get_model_category(self): + """Refactor tool requires extended reasoning for comprehensive analysis""" + from tools.models import ToolModelCategory + + return ToolModelCategory.EXTENDED_REASONING + + def get_request_model(self): + return RefactorRequest + + async def execute(self, arguments: dict[str, Any]) -> list[TextContent]: + """Override execute to check prompt size before processing""" + logger.info(f"[REFACTOR] execute called with arguments: {list(arguments.keys())}") + + # First validate request + request_model = self.get_request_model() + request = request_model(**arguments) + + # Check prompt size if provided + if request.prompt: + size_check = self.check_prompt_size(request.prompt) + if size_check: + logger.info(f"[REFACTOR] Prompt size check triggered, returning early") + return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())] + + logger.info(f"[REFACTOR] Prompt size OK, calling super().execute()") + # Continue with normal execution + return await super().execute(arguments) + + def detect_primary_language(self, file_paths: list[str]) -> str: + """ + Detect the primary programming language from file extensions. + + Args: + file_paths: List of file paths to analyze + + Returns: + str: Detected language or "mixed" if multiple languages found + """ + # Language detection based on file extensions + language_extensions = { + "python": {".py"}, + "javascript": {".js", ".jsx", ".mjs"}, + "typescript": {".ts", ".tsx"}, + "java": {".java"}, + "csharp": {".cs"}, + "cpp": {".cpp", ".cc", ".cxx", ".c", ".h", ".hpp"}, + "go": {".go"}, + "rust": {".rs"}, + "swift": {".swift"}, + "kotlin": {".kt"}, + "ruby": {".rb"}, + "php": {".php"}, + "scala": {".scala"}, + } + + # Count files by language + language_counts = {} + for file_path in file_paths: + extension = os.path.splitext(file_path.lower())[1] + for lang, exts in language_extensions.items(): + if extension in exts: + language_counts[lang] = language_counts.get(lang, 0) + 1 + break + + if not language_counts: + return "unknown" + + # Return most common language, or "mixed" if multiple languages + max_count = max(language_counts.values()) + dominant_languages = [lang for lang, count in language_counts.items() if count == max_count] + + if len(dominant_languages) == 1: + return dominant_languages[0] + else: + return "mixed" + + def get_language_specific_guidance(self, language: str, refactor_type: str) -> str: + """ + Generate language-specific guidance for the refactoring prompt. 
+ + Args: + language: Detected programming language + refactor_type: Type of refactoring being performed + + Returns: + str: Language-specific guidance to inject into the prompt + """ + if language == "unknown" or language == "mixed": + return "" + + # Language-specific modernization features + modernization_features = { + "python": "f-strings, dataclasses, type hints, pathlib, async/await, context managers, list/dict comprehensions, walrus operator", + "javascript": "async/await, destructuring, arrow functions, template literals, optional chaining, nullish coalescing, modules (import/export)", + "typescript": "strict type checking, utility types, const assertions, template literal types, mapped types", + "java": "streams API, lambda expressions, optional, records, pattern matching, var declarations, text blocks", + "csharp": "LINQ, nullable reference types, pattern matching, records, async streams, using declarations", + "swift": "value types, protocol-oriented programming, property wrappers, result builders, async/await", + "go": "modules, error wrapping, context package, generics (Go 1.18+)", + "rust": "ownership patterns, iterator adapters, error handling with Result, async/await", + } + + # Language-specific code splitting patterns + splitting_patterns = { + "python": "modules, classes, functions, decorators for cross-cutting concerns", + "javascript": "modules (ES6), classes, functions, higher-order functions", + "java": "packages, classes, interfaces, abstract classes, composition over inheritance", + "csharp": "namespaces, classes, interfaces, extension methods, dependency injection", + "swift": "extensions, protocols, structs, enums with associated values", + "go": "packages, interfaces, struct composition, function types", + } + + guidance_parts = [] + + if refactor_type == "modernize" and language in modernization_features: + guidance_parts.append( + f"LANGUAGE-SPECIFIC MODERNIZATION ({language.upper()}): Focus on {modernization_features[language]}" + ) + + if refactor_type == "decompose" and language in splitting_patterns: + guidance_parts.append( + f"LANGUAGE-SPECIFIC DECOMPOSITION ({language.upper()}): Use {splitting_patterns[language]} to break down large components" + ) + + # General language guidance + general_guidance = { + "python": "Follow PEP 8, use descriptive names, prefer composition over inheritance", + "javascript": "Use consistent naming conventions, avoid global variables, prefer functional patterns", + "java": "Follow Java naming conventions, use interfaces for abstraction, consider immutability", + "csharp": "Follow C# naming conventions, use nullable reference types, prefer async methods", + } + + if language in general_guidance: + guidance_parts.append(f"GENERAL GUIDANCE ({language.upper()}): {general_guidance[language]}") + + return "\n".join(guidance_parts) if guidance_parts else "" + + def _process_style_guide_examples( + self, style_examples: list[str], continuation_id: Optional[str], available_tokens: int = None + ) -> tuple[str, str]: + """ + Process style guide example files using available token budget. 
+ + Args: + style_examples: List of style guide file paths + continuation_id: Continuation ID for filtering already embedded files + available_tokens: Available token budget for examples + + Returns: + tuple: (formatted_content, summary_note) + """ + logger.debug(f"[REFACTOR] Processing {len(style_examples)} style guide examples") + + if not style_examples: + logger.debug("[REFACTOR] No style guide examples provided") + return "", "" + + # Use existing file filtering to avoid duplicates in continuation + examples_to_process = self.filter_new_files(style_examples, continuation_id) + logger.debug(f"[REFACTOR] After filtering: {len(examples_to_process)} new style examples to process") + + if not examples_to_process: + logger.info(f"[REFACTOR] All {len(style_examples)} style examples already in conversation history") + return "", "" + + # Translate file paths for Docker environment before accessing files + translated_examples = translate_file_paths(examples_to_process) + logger.debug(f"[REFACTOR] Translated {len(examples_to_process)} file paths for container access") + + # Calculate token budget for style examples (20% of available tokens, or fallback) + if available_tokens: + style_examples_budget = int(available_tokens * 0.20) # 20% for style examples + logger.debug( + f"[REFACTOR] Allocating {style_examples_budget:,} tokens (20% of {available_tokens:,}) for style examples" + ) + else: + style_examples_budget = 25000 # Fallback if no budget provided + logger.debug(f"[REFACTOR] Using fallback budget of {style_examples_budget:,} tokens for style examples") + + original_count = len(examples_to_process) + logger.debug( + f"[REFACTOR] Processing {original_count} style example files with {style_examples_budget:,} token budget" + ) + + # Sort by file size (smallest first) for pattern-focused selection + file_sizes = [] + for i, file_path in enumerate(examples_to_process): + translated_path = translated_examples[i] + try: + size = os.path.getsize(translated_path) + file_sizes.append((file_path, size)) + logger.debug(f"[REFACTOR] Style example {os.path.basename(file_path)}: {size:,} bytes") + except (OSError, FileNotFoundError) as e: + logger.warning(f"[REFACTOR] Could not get size for {file_path}: {e}") + file_sizes.append((file_path, float("inf"))) + + # Sort by size and take smallest files for pattern reference + file_sizes.sort(key=lambda x: x[1]) + examples_to_process = [f[0] for f in file_sizes] + logger.debug( + f"[REFACTOR] Sorted style examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}" + ) + + # Use standard file content preparation with dynamic token budget and line numbers + try: + logger.debug(f"[REFACTOR] Preparing file content for {len(examples_to_process)} style examples") + content = self._prepare_file_content_for_prompt( + examples_to_process, + continuation_id, + "Style guide examples", + max_tokens=style_examples_budget, + reserve_tokens=1000, + ) + + # Determine how many files were actually included + if content: + from utils.token_utils import estimate_tokens + + used_tokens = estimate_tokens(content) + logger.info( + f"[REFACTOR] Successfully embedded style examples: {used_tokens:,} tokens used ({style_examples_budget:,} available)" + ) + if original_count > 1: + truncation_note = f"Note: Used {used_tokens:,} tokens ({style_examples_budget:,} available) for style guide examples from {original_count} files to determine coding patterns." 
+ else: + truncation_note = "" + else: + logger.warning("[REFACTOR] No content generated for style examples") + truncation_note = "" + + return content, truncation_note + + except Exception as e: + # If style example processing fails, continue without examples rather than failing + logger.error(f"[REFACTOR] Failed to process style examples: {type(e).__name__}: {e}") + return "", f"Warning: Could not process style guide examples: {str(e)}" + + async def prepare_prompt(self, request: RefactorRequest) -> str: + """ + Prepare the refactoring prompt with code analysis and optional style examples. + + This method reads the requested files, processes any style guide examples, + and constructs a detailed prompt for comprehensive refactoring analysis. + + Args: + request: The validated refactor request + + Returns: + str: Complete prompt for the model + + Raises: + ValueError: If the code exceeds token limits + """ + logger.info(f"[REFACTOR] prepare_prompt called with {len(request.files)} files, type={request.refactor_type}") + logger.debug(f"[REFACTOR] Preparing prompt for {len(request.files)} code files") + logger.debug(f"[REFACTOR] Refactor type: {request.refactor_type}") + if request.style_guide_examples: + logger.debug(f"[REFACTOR] Including {len(request.style_guide_examples)} style guide examples") + + # Check for prompt.txt in files + prompt_content, updated_files = self.handle_prompt_file(request.files) + + # If prompt.txt was found, incorporate it into the prompt + if prompt_content: + logger.debug("[REFACTOR] Found prompt.txt file, incorporating content") + request.prompt = prompt_content + "\n\n" + request.prompt + + # Update request files list + if updated_files is not None: + logger.debug(f"[REFACTOR] Updated files list after prompt.txt processing: {len(updated_files)} files") + request.files = updated_files + + # Calculate available token budget for dynamic allocation + continuation_id = getattr(request, "continuation_id", None) + + # Get model context for token budget calculation + model_name = getattr(self, "_current_model_name", None) + available_tokens = None + + if model_name: + try: + provider = self.get_model_provider(model_name) + capabilities = provider.get_capabilities(model_name) + # Use 75% of context for content (code + style examples), 25% for response + available_tokens = int(capabilities.context_window * 0.75) + logger.debug( + f"[REFACTOR] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {model_name}" + ) + except Exception as e: + # Fallback to conservative estimate + logger.warning(f"[REFACTOR] Could not get model capabilities for {model_name}: {e}") + available_tokens = 120000 # Conservative fallback + logger.debug(f"[REFACTOR] Using fallback token budget: {available_tokens:,} tokens") + + # Process style guide examples first to determine token allocation + style_examples_content = "" + style_examples_note = "" + + if request.style_guide_examples: + logger.debug(f"[REFACTOR] Processing {len(request.style_guide_examples)} style guide examples") + style_examples_content, style_examples_note = self._process_style_guide_examples( + request.style_guide_examples, continuation_id, available_tokens + ) + if style_examples_content: + logger.info("[REFACTOR] Style guide examples processed successfully for pattern reference") + else: + logger.info("[REFACTOR] No style guide examples content after processing") + + # Remove files that appear in both 'files' and 'style_guide_examples' to avoid duplicate embedding + 
code_files_to_process = request.files.copy() + if request.style_guide_examples: + # Normalize paths for comparison + style_example_set = {os.path.normpath(os.path.abspath(f)) for f in request.style_guide_examples} + original_count = len(code_files_to_process) + + code_files_to_process = [ + f for f in code_files_to_process if os.path.normpath(os.path.abspath(f)) not in style_example_set + ] + + duplicates_removed = original_count - len(code_files_to_process) + if duplicates_removed > 0: + logger.info( + f"[REFACTOR] Removed {duplicates_removed} duplicate files from code files list " + f"(already included in style guide examples for pattern reference)" + ) + + # Calculate remaining tokens for main code after style examples + if style_examples_content and available_tokens: + from utils.token_utils import estimate_tokens + + style_tokens = estimate_tokens(style_examples_content) + remaining_tokens = available_tokens - style_tokens - 5000 # Reserve for prompt structure + logger.debug( + f"[REFACTOR] Token allocation: {style_tokens:,} for examples, {remaining_tokens:,} remaining for code files" + ) + else: + if available_tokens: + remaining_tokens = available_tokens - 10000 + else: + remaining_tokens = 110000 # Conservative fallback (120000 - 10000) + logger.debug( + f"[REFACTOR] Token allocation: {remaining_tokens:,} tokens available for code files (no style examples)" + ) + + # Use centralized file processing logic for main code files (with line numbers enabled) + logger.debug(f"[REFACTOR] Preparing {len(code_files_to_process)} code files for analysis") + code_content = self._prepare_file_content_for_prompt( + code_files_to_process, continuation_id, "Code to analyze", max_tokens=remaining_tokens, reserve_tokens=2000 + ) + + if code_content: + from utils.token_utils import estimate_tokens + + code_tokens = estimate_tokens(code_content) + logger.info(f"[REFACTOR] Code files embedded successfully: {code_tokens:,} tokens") + else: + logger.warning("[REFACTOR] No code content after file processing") + + # Detect primary language for language-specific guidance + primary_language = self.detect_primary_language(request.files) + logger.debug(f"[REFACTOR] Detected primary language: {primary_language}") + + # Get language-specific guidance + language_guidance = self.get_language_specific_guidance(primary_language, request.refactor_type) + + # Build the complete prompt + prompt_parts = [] + + # Add system prompt with dynamic language guidance + base_system_prompt = self.get_system_prompt() + if language_guidance: + enhanced_system_prompt = f"{base_system_prompt}\n\n{language_guidance}" + else: + enhanced_system_prompt = base_system_prompt + prompt_parts.append(enhanced_system_prompt) + + # Add user context + prompt_parts.append("=== USER CONTEXT ===") + prompt_parts.append(f"Refactor Type: {request.refactor_type}") + if request.focus_areas: + prompt_parts.append(f"Focus Areas: {', '.join(request.focus_areas)}") + prompt_parts.append(f"User Goals: {request.prompt}") + prompt_parts.append("=== END CONTEXT ===") + + # Add style guide examples if provided + if style_examples_content: + prompt_parts.append("\n=== STYLE GUIDE EXAMPLES ===") + if style_examples_note: + prompt_parts.append(f"// {style_examples_note}") + prompt_parts.append(style_examples_content) + prompt_parts.append("=== END STYLE GUIDE EXAMPLES ===") + + # Add main code to analyze + prompt_parts.append("\n=== CODE TO ANALYZE ===") + prompt_parts.append(code_content) + prompt_parts.append("=== END CODE ===") + + # Add generation instructions + 
prompt_parts.append( + f"\nPlease analyze the code for {request.refactor_type} refactoring opportunities following the multi-expert workflow specified in the system prompt." + ) + if style_examples_content: + prompt_parts.append( + "Use the provided style guide examples as a reference for target coding patterns and style." + ) + + full_prompt = "\n".join(prompt_parts) + + # Log final prompt statistics + from utils.token_utils import estimate_tokens + + total_tokens = estimate_tokens(full_prompt) + logger.info(f"[REFACTOR] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters") + + return full_prompt + + def format_response(self, response: str, request: RefactorRequest, model_info: Optional[dict] = None) -> str: + """ + Format the refactoring response. + + The base tool handles structured response validation via SPECIAL_STATUS_MODELS, + so this method focuses on presentation formatting. + + Args: + response: The raw refactoring analysis from the model + request: The original request for context + model_info: Optional dict with model metadata + + Returns: + str: The response (base tool will handle structured parsing) + """ + logger.debug(f"[REFACTOR] Formatting response for {request.refactor_type} refactoring") + + # Check if this is a more_refactor_required response + is_more_required = False + try: + import json + + parsed = json.loads(response) + if isinstance(parsed, dict) and parsed.get("status") == "more_refactor_required": + is_more_required = True + except (json.JSONDecodeError, ValueError): + # Not JSON or not more_refactor_required + pass + + # Always add implementation instructions for any refactoring response + step4_title = ( + "## Step 4: CONTINUE WITH MORE REFACTORING" if is_more_required else "## Step 4: COMPLETE REFACTORING" + ) + step4_intro = ( + "Once all refactorings above are implemented and verified working, IMMEDIATELY continue the analysis:" + if is_more_required + else "Once all refactorings above are implemented and verified working:" + ) + step4_action = ( + "Use the refactor tool again with the SAME parameters but include the continuation_id from this response to get additional refactoring opportunities. The model will provide more refactoring suggestions building on what was already completed." + if is_more_required + else "Review all changes made and ensure the codebase is cleaner, more maintainable, and follows best practices." + ) + critical_msg = ( + "apply all refactorings, validate they work, then immediately continue with more refactoring analysis. Take full ownership of the refactoring implementation and continue until all opportunities are addressed." + if is_more_required + else "create, modify, and reorganize files as needed. Take full ownership of the refactoring implementation and ensure all changes work correctly." + ) + + return f"""{response} + +--- + +# IMMEDIATE NEXT ACTION + +Claude, you are now in EXECUTION MODE. Take immediate action: + +## Step 1: ULTRATHINK & IMPLEMENT REFACTORINGS +ULTRATHINK while implementing these refactorings. Verify EVERY code reference, import, dependency, and access modifier is 100% accurate before making changes. 
+ +- **IMPLEMENT** all refactoring opportunities listed above in priority order +- **CREATE** any new files needed for decomposition or reorganization +- **MOVE** code to appropriate locations following the refactoring plan +- **UPDATE** all imports and references to maintain functionality +- **VALIDATE** that no functionality is broken by the changes + +## Step 2: VERIFY CHANGES WORK +**MANDATORY**: After each refactoring step: +- Ensure all imports are updated correctly +- Verify access modifiers (private/public/internal) still work +- Check that all references to moved code are updated +- Run any existing tests to confirm nothing is broken +- Fix any issues that arise from the refactoring + +## Step 3: DISPLAY RESULTS TO USER +After implementing each refactoring, show the user: +``` +✅ Refactored: [refactor-id] - Brief description + - Files modified: [list of files] + - [Key change summary] +``` + +{step4_title} +{step4_intro} + +{step4_action} + +**CRITICAL**: Do NOT stop after generating the analysis - you MUST {critical_msg}""" diff --git a/tools/testgen.py b/tools/testgen.py index f3648eb..7f14898 100644 --- a/tools/testgen.py +++ b/tools/testgen.py @@ -134,6 +134,10 @@ class TestGenTool(BaseTool): def get_default_temperature(self) -> float: return TEMPERATURE_ANALYTICAL + def wants_line_numbers_by_default(self) -> bool: + """Test generation tool needs line numbers for precise targeting""" + return True + def get_model_category(self): """TestGen requires extended reasoning for comprehensive test analysis""" from tools.models import ToolModelCategory diff --git a/utils/__init__.py b/utils/__init__.py index 3de7faa..de1c380 100644 --- a/utils/__init__.py +++ b/utils/__init__.py @@ -2,7 +2,9 @@ Utility functions for Zen MCP Server """ -from .file_utils import CODE_EXTENSIONS, expand_paths, read_file_content, read_files +from .file_types import CODE_EXTENSIONS, FILE_CATEGORIES, PROGRAMMING_EXTENSIONS, TEXT_EXTENSIONS +from .file_utils import expand_paths, read_file_content, read_files +from .security_config import EXCLUDED_DIRS, SECURITY_ROOT from .token_utils import check_token_limit, estimate_tokens __all__ = [ @@ -10,6 +12,11 @@ __all__ = [ "read_file_content", "expand_paths", "CODE_EXTENSIONS", + "PROGRAMMING_EXTENSIONS", + "TEXT_EXTENSIONS", + "FILE_CATEGORIES", + "SECURITY_ROOT", + "EXCLUDED_DIRS", "estimate_tokens", "check_token_limit", ] diff --git a/utils/file_types.py b/utils/file_types.py new file mode 100644 index 0000000..c0bbd8f --- /dev/null +++ b/utils/file_types.py @@ -0,0 +1,180 @@ +""" +File type definitions and constants for file processing + +This module centralizes all file type and extension definitions used +throughout the MCP server for consistent file handling. 
+""" + +# Programming language file extensions - core code files +PROGRAMMING_LANGUAGES = { + ".py", # Python + ".js", # JavaScript + ".ts", # TypeScript + ".jsx", # React JavaScript + ".tsx", # React TypeScript + ".java", # Java + ".cpp", # C++ + ".c", # C + ".h", # C/C++ Header + ".hpp", # C++ Header + ".cs", # C# + ".go", # Go + ".rs", # Rust + ".rb", # Ruby + ".php", # PHP + ".swift", # Swift + ".kt", # Kotlin + ".scala", # Scala + ".r", # R + ".m", # Objective-C + ".mm", # Objective-C++ +} + +# Script and shell file extensions +SCRIPTS = { + ".sql", # SQL + ".sh", # Shell + ".bash", # Bash + ".zsh", # Zsh + ".fish", # Fish shell + ".ps1", # PowerShell + ".bat", # Batch + ".cmd", # Command +} + +# Configuration and data file extensions +CONFIGS = { + ".yml", # YAML + ".yaml", # YAML + ".json", # JSON + ".xml", # XML + ".toml", # TOML + ".ini", # INI + ".cfg", # Config + ".conf", # Config + ".properties", # Properties + ".env", # Environment +} + +# Documentation and markup file extensions +DOCS = { + ".txt", # Text + ".md", # Markdown + ".rst", # reStructuredText + ".tex", # LaTeX +} + +# Web development file extensions +WEB = { + ".html", # HTML + ".css", # CSS + ".scss", # Sass + ".sass", # Sass + ".less", # Less +} + +# Additional text file extensions for logs and data +TEXT_DATA = { + ".log", # Log files + ".csv", # CSV + ".tsv", # TSV + ".gitignore", # Git ignore + ".dockerfile", # Docker + ".makefile", # Make + ".cmake", # CMake + ".gradle", # Gradle + ".sbt", # SBT + ".pom", # Maven POM + ".lock", # Lock files +} + +# Image file extensions +IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico", ".tiff", ".tif"} + +# Binary executable and library extensions +BINARIES = { + ".exe", # Windows executable + ".dll", # Windows library + ".so", # Linux shared object + ".dylib", # macOS dynamic library + ".bin", # Binary + ".class", # Java class +} + +# Archive and package file extensions +ARCHIVES = { + ".jar", + ".war", + ".ear", # Java archives + ".zip", + ".tar", + ".gz", # General archives + ".7z", + ".rar", # Compression + ".deb", + ".rpm", # Linux packages + ".dmg", + ".pkg", # macOS packages +} + +# Derived sets for different use cases +CODE_EXTENSIONS = PROGRAMMING_LANGUAGES | SCRIPTS | CONFIGS | DOCS | WEB +PROGRAMMING_EXTENSIONS = PROGRAMMING_LANGUAGES # For line numbering +TEXT_EXTENSIONS = CODE_EXTENSIONS | TEXT_DATA +IMAGE_EXTENSIONS = IMAGES +BINARY_EXTENSIONS = BINARIES | ARCHIVES + +# All extensions by category for easy access +FILE_CATEGORIES = { + "programming": PROGRAMMING_LANGUAGES, + "scripts": SCRIPTS, + "configs": CONFIGS, + "docs": DOCS, + "web": WEB, + "text_data": TEXT_DATA, + "images": IMAGES, + "binaries": BINARIES, + "archives": ARCHIVES, +} + + +def get_file_category(file_path: str) -> str: + """ + Determine the category of a file based on its extension. 
+ + Args: + file_path: Path to the file + + Returns: + Category name or "unknown" if not recognized + """ + from pathlib import Path + + extension = Path(file_path).suffix.lower() + + for category, extensions in FILE_CATEGORIES.items(): + if extension in extensions: + return category + + return "unknown" + + +def is_code_file(file_path: str) -> bool: + """Check if a file is a code file (programming language).""" + from pathlib import Path + + return Path(file_path).suffix.lower() in PROGRAMMING_LANGUAGES + + +def is_text_file(file_path: str) -> bool: + """Check if a file is a text file.""" + from pathlib import Path + + return Path(file_path).suffix.lower() in TEXT_EXTENSIONS + + +def is_binary_file(file_path: str) -> bool: + """Check if a file is a binary file.""" + from pathlib import Path + + return Path(file_path).suffix.lower() in BINARY_EXTENSIONS diff --git a/utils/file_utils.py b/utils/file_utils.py index 3b8c1b3..4ed0395 100644 --- a/utils/file_utils.py +++ b/utils/file_utils.py @@ -23,148 +23,12 @@ import os from pathlib import Path from typing import Optional +from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TEXT_EXTENSIONS +from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens logger = logging.getLogger(__name__) -# Get workspace root for Docker path translation -# IMPORTANT: WORKSPACE_ROOT should contain the HOST path (e.g., /Users/john/project) -# that gets mounted to /workspace in the Docker container. This enables proper -# path translation between host absolute paths and container workspace paths. -WORKSPACE_ROOT = os.environ.get("WORKSPACE_ROOT") -CONTAINER_WORKSPACE = Path("/workspace") - -# Dangerous paths that should never be used as WORKSPACE_ROOT -# These would give overly broad access and pose security risks -DANGEROUS_WORKSPACE_PATHS = { - "/", - "/etc", - "/usr", - "/bin", - "/var", - "/root", - "/home", - "/workspace", # Container path - WORKSPACE_ROOT should be host path - "C:\\", - "C:\\Windows", - "C:\\Program Files", - "C:\\Users", -} - -# Validate WORKSPACE_ROOT for security if it's set -if WORKSPACE_ROOT: - # Resolve to canonical path for comparison - resolved_workspace = Path(WORKSPACE_ROOT).resolve() - - # Special check for /workspace - common configuration mistake - if str(resolved_workspace) == "/workspace": - raise RuntimeError( - f"Configuration Error: WORKSPACE_ROOT should be set to the HOST path, not the container path. " - f"Found: WORKSPACE_ROOT={WORKSPACE_ROOT} " - f"Expected: WORKSPACE_ROOT should be set to your host directory path (e.g., $HOME) " - f"that contains all files Claude might reference. " - f"This path gets mounted to /workspace inside the Docker container." - ) - - # Check against other dangerous paths - if str(resolved_workspace) in DANGEROUS_WORKSPACE_PATHS: - raise RuntimeError( - f"Security Error: WORKSPACE_ROOT '{WORKSPACE_ROOT}' is set to a dangerous system directory. " - f"This would give access to critical system files. " - f"Please set WORKSPACE_ROOT to a specific project directory." - ) - - # Additional check: prevent filesystem root - if resolved_workspace.parent == resolved_workspace: - raise RuntimeError( - f"Security Error: WORKSPACE_ROOT '{WORKSPACE_ROOT}' cannot be the filesystem root. " - f"This would give access to the entire filesystem. " - f"Please set WORKSPACE_ROOT to a specific project directory." 
- ) - -# Security boundary -# In Docker: use /workspace (container directory) -# In tests/direct mode: use WORKSPACE_ROOT (host directory) -if CONTAINER_WORKSPACE.exists(): - # Running in Docker container - SECURITY_ROOT = CONTAINER_WORKSPACE -elif WORKSPACE_ROOT: - # Running in tests or direct mode with WORKSPACE_ROOT set - SECURITY_ROOT = Path(WORKSPACE_ROOT).resolve() -else: - # Fallback for backward compatibility (should not happen in normal usage) - SECURITY_ROOT = Path.home() - - -# Directories to exclude from recursive file search -# These typically contain generated code, dependencies, or build artifacts -EXCLUDED_DIRS = { - "__pycache__", - "node_modules", - ".venv", - "venv", - "env", - ".env", - ".git", - ".svn", - ".hg", - "build", - "dist", - "target", - ".idea", - ".vscode", - "__pypackages__", - ".mypy_cache", - ".pytest_cache", - ".tox", - "htmlcov", - ".coverage", - # Additional build and temp directories - "out", - ".next", - ".nuxt", - ".cache", - ".temp", - ".tmp", - "bower_components", - "vendor", - ".sass-cache", - ".gradle", - ".m2", - "coverage", - # OS-specific directories - ".DS_Store", - "Thumbs.db", - # Python specific - "*.egg-info", - ".eggs", - "wheels", - ".Python", - # IDE and editor directories - ".sublime", - ".atom", - ".brackets", - "*.swp", - "*.swo", - "*~", - # Documentation build - "_build", - "site", - # Mobile development - ".expo", - ".flutter", -} - -# MCP signature files - presence of these indicates the MCP's own directory -# Used to prevent the MCP from scanning its own codebase -MCP_SIGNATURE_FILES = { - "zen_server.py", - "server.py", - "tools/precommit.py", - "utils/file_utils.py", - "prompts/tool_prompts.py", -} - def is_mcp_directory(path: Path) -> bool: """ @@ -242,7 +106,7 @@ def is_home_directory_root(path: Path) -> bool: # Check if this is exactly the home directory if resolved_path == resolved_home: logger.warning( - f"Attempted to scan user home directory root: {path}. " f"Please specify a subdirectory instead." + f"Attempted to scan user home directory root: {path}. Please specify a subdirectory instead." ) return True @@ -277,56 +141,105 @@ def is_home_directory_root(path: Path) -> bool: return False -# Common code file extensions that are automatically included when processing directories -# This set can be extended to support additional file types -CODE_EXTENSIONS = { - ".py", - ".js", - ".ts", - ".jsx", - ".tsx", - ".java", - ".cpp", - ".c", - ".h", - ".hpp", - ".cs", - ".go", - ".rs", - ".rb", - ".php", - ".swift", - ".kt", - ".scala", - ".r", - ".m", - ".mm", - ".sql", - ".sh", - ".bash", - ".zsh", - ".fish", - ".ps1", - ".bat", - ".cmd", - ".yml", - ".yaml", - ".json", - ".xml", - ".toml", - ".ini", - ".cfg", - ".conf", - ".txt", - ".md", - ".rst", - ".tex", - ".html", - ".css", - ".scss", - ".sass", - ".less", -} +def detect_file_type(file_path: str) -> str: + """ + Detect file type for appropriate processing strategy. + + NOTE: This function is currently not used for line number auto-detection + due to backward compatibility requirements. It is intended for future + features requiring specific file type handling (e.g., image processing, + binary file analysis, or enhanced file filtering). 
+ + Args: + file_path: Path to the file to analyze + + Returns: + str: "text", "binary", or "image" + """ + path = Path(file_path) + + # Check extension first (fast) + extension = path.suffix.lower() + if extension in TEXT_EXTENSIONS: + return "text" + elif extension in IMAGE_EXTENSIONS: + return "image" + elif extension in BINARY_EXTENSIONS: + return "binary" + + # Fallback: check magic bytes for text vs binary + # This is helpful for files without extensions or unknown extensions + try: + with open(path, "rb") as f: + chunk = f.read(1024) + # Simple heuristic: if we can decode as UTF-8, likely text + chunk.decode("utf-8") + return "text" + except UnicodeDecodeError: + return "binary" + except (FileNotFoundError, PermissionError) as e: + logger.warning(f"Could not access file {file_path} for type detection: {e}") + return "unknown" + + +def should_add_line_numbers(file_path: str, include_line_numbers: Optional[bool] = None) -> bool: + """ + Determine if line numbers should be added to a file. + + Args: + file_path: Path to the file + include_line_numbers: Explicit preference, or None for auto-detection + + Returns: + bool: True if line numbers should be added + """ + if include_line_numbers is not None: + return include_line_numbers + + # Default: DO NOT add line numbers (backwards compatibility) + # Tools that want line numbers must explicitly request them + return False + + +def _normalize_line_endings(content: str) -> str: + """ + Normalize line endings for consistent line numbering. + + Args: + content: File content with potentially mixed line endings + + Returns: + str: Content with normalized LF line endings + """ + # Normalize all line endings to LF for consistent counting + return content.replace("\r\n", "\n").replace("\r", "\n") + + +def _add_line_numbers(content: str) -> str: + """ + Add line numbers to text content for precise referencing. + + Args: + content: Text content to number + + Returns: + str: Content with line numbers in format " 45│ actual code line" + Supports files up to 99,999 lines with dynamic width allocation + """ + # Normalize line endings first + normalized_content = _normalize_line_endings(content) + lines = normalized_content.split("\n") + + # Dynamic width allocation based on total line count + # This supports files of any size by computing required width + total_lines = len(lines) + width = len(str(total_lines)) + width = max(width, 4) # Minimum padding for readability + + # Format with dynamic width and clear separator + numbered_lines = [f"{i + 1:{width}d}│ {line}" for i, line in enumerate(lines)] + + return "\n".join(numbered_lines) def translate_path_for_environment(path_str: str) -> str: @@ -515,15 +428,13 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis # Check 2: Prevent scanning user's home directory root if is_home_directory_root(path_obj): - logger.warning( - f"Skipping home directory root: {path}. " f"Please specify a project subdirectory instead." - ) + logger.warning(f"Skipping home directory root: {path}. Please specify a project subdirectory instead.") continue # Check 3: Skip if this is the MCP's own directory if is_mcp_directory(path_obj): logger.info( - f"Skipping MCP server directory: {path}. " f"The MCP server code is excluded from project scans." + f"Skipping MCP server directory: {path}. The MCP server code is excluded from project scans." 
) continue @@ -575,7 +486,9 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis return expanded_files -def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, int]: +def read_file_content( + file_path: str, max_size: int = 1_000_000, *, include_line_numbers: Optional[bool] = None +) -> tuple[str, int]: """ Read a single file and format it for inclusion in AI prompts. @@ -586,6 +499,7 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i Args: file_path: Path to file (must be absolute) max_size: Maximum file size to read (default 1MB to prevent memory issues) + include_line_numbers: Whether to add line numbers. If None, auto-detects based on file type Returns: Tuple of (formatted_content, estimated_tokens) @@ -634,6 +548,10 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i content = f"\n--- FILE TOO LARGE: {file_path} ---\nFile size: {file_size:,} bytes (max: {max_size:,})\n--- END FILE ---\n" return content, estimate_tokens(content) + # Determine if we should add line numbers + add_line_numbers = should_add_line_numbers(file_path, include_line_numbers) + logger.debug(f"[FILES] Line numbers for {file_path}: {'enabled' if add_line_numbers else 'disabled'}") + # Read the file with UTF-8 encoding, replacing invalid characters # This ensures we can handle files with mixed encodings logger.debug(f"[FILES] Reading file content for {file_path}") @@ -642,6 +560,14 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i logger.debug(f"[FILES] Successfully read {len(file_content)} characters from {file_path}") + # Add line numbers if requested or auto-detected + if add_line_numbers: + file_content = _add_line_numbers(file_content) + logger.debug(f"[FILES] Added line numbers to {file_path}") + else: + # Still normalize line endings for consistency + file_content = _normalize_line_endings(file_content) + # Format with clear delimiters that help the AI understand file boundaries # Using consistent markers makes it easier for the model to parse # NOTE: These markers ("--- BEGIN FILE: ... ---") are distinct from git diff markers @@ -665,6 +591,8 @@ def read_files( code: Optional[str] = None, max_tokens: Optional[int] = None, reserve_tokens: int = 50_000, + *, + include_line_numbers: bool = False, ) -> str: """ Read multiple files and optional direct code with smart token management. @@ -679,6 +607,7 @@ def read_files( code: Optional direct code to include (prioritized over files) max_tokens: Maximum tokens to use (defaults to DEFAULT_CONTEXT_WINDOW) reserve_tokens: Tokens to reserve for prompt and response (default 50K) + include_line_numbers: Whether to add line numbers to file content Returns: str: All file contents formatted for AI consumption @@ -728,7 +657,7 @@ def read_files( files_skipped.extend(all_files[i:]) break - file_content, file_tokens = read_file_content(file_path) + file_content, file_tokens = read_file_content(file_path, include_line_numbers=include_line_numbers) logger.debug(f"[FILES] File {file_path}: {file_tokens:,} tokens") # Check if adding this file would exceed limit diff --git a/utils/security_config.py b/utils/security_config.py new file mode 100644 index 0000000..6e911a2 --- /dev/null +++ b/utils/security_config.py @@ -0,0 +1,174 @@ +""" +Security configuration and path validation constants + +This module contains security-related constants and configurations +for file access control and workspace management. 
+""" + +import os +from pathlib import Path + +# Dangerous paths that should never be used as WORKSPACE_ROOT +# These would give overly broad access and pose security risks +DANGEROUS_WORKSPACE_PATHS = { + "/", + "/etc", + "/usr", + "/bin", + "/var", + "/root", + "/home", + "/workspace", # Container path - WORKSPACE_ROOT should be host path + "C:\\", + "C:\\Windows", + "C:\\Program Files", + "C:\\Users", +} + +# Directories to exclude from recursive file search +# These typically contain generated code, dependencies, or build artifacts +EXCLUDED_DIRS = { + # Python + "__pycache__", + ".venv", + "venv", + "env", + ".env", + "*.egg-info", + ".eggs", + "wheels", + ".Python", + ".mypy_cache", + ".pytest_cache", + ".tox", + "htmlcov", + ".coverage", + "coverage", + # Node.js / JavaScript + "node_modules", + ".next", + ".nuxt", + "bower_components", + ".sass-cache", + # Version Control + ".git", + ".svn", + ".hg", + # Build Output + "build", + "dist", + "target", + "out", + # IDEs + ".idea", + ".vscode", + ".sublime", + ".atom", + ".brackets", + # Temporary / Cache + ".cache", + ".temp", + ".tmp", + "*.swp", + "*.swo", + "*~", + # OS-specific + ".DS_Store", + "Thumbs.db", + # Java / JVM + ".gradle", + ".m2", + # Documentation build + "_build", + "site", + # Mobile development + ".expo", + ".flutter", + # Package managers + "vendor", +} + +# MCP signature files - presence of these indicates the MCP's own directory +# Used to prevent the MCP from scanning its own codebase +MCP_SIGNATURE_FILES = { + "zen_server.py", + "server.py", + "tools/precommit.py", + "utils/file_utils.py", + "prompts/tool_prompts.py", +} + +# Workspace configuration +WORKSPACE_ROOT = os.environ.get("WORKSPACE_ROOT") +CONTAINER_WORKSPACE = Path("/workspace") + + +def validate_workspace_security(workspace_root: str) -> None: + """ + Validate that WORKSPACE_ROOT is set to a safe directory. + + Args: + workspace_root: The workspace root path to validate + + Raises: + RuntimeError: If the workspace root is unsafe + """ + if not workspace_root: + return + + # Resolve to canonical path for comparison + resolved_workspace = Path(workspace_root).resolve() + + # Special check for /workspace - common configuration mistake + if str(resolved_workspace) == "/workspace": + raise RuntimeError( + f"Configuration Error: WORKSPACE_ROOT should be set to the HOST path, not the container path. " + f"Found: WORKSPACE_ROOT={workspace_root} " + f"Expected: WORKSPACE_ROOT should be set to your host directory path (e.g., $HOME) " + f"that contains all files Claude might reference. " + f"This path gets mounted to /workspace inside the Docker container." + ) + + # Check against other dangerous paths + if str(resolved_workspace) in DANGEROUS_WORKSPACE_PATHS: + raise RuntimeError( + f"Security Error: WORKSPACE_ROOT '{workspace_root}' is set to a dangerous system directory. " + f"This would give access to critical system files. " + f"Please set WORKSPACE_ROOT to a specific project directory." + ) + + # Additional check: prevent filesystem root + if resolved_workspace.parent == resolved_workspace: + raise RuntimeError( + f"Security Error: WORKSPACE_ROOT '{workspace_root}' cannot be the filesystem root. " + f"This would give access to the entire filesystem. " + f"Please set WORKSPACE_ROOT to a specific project directory." + ) + + +def get_security_root() -> Path: + """ + Determine the security boundary for file access. 
+ + Returns: + Path object representing the security root directory + """ + # In Docker: use /workspace (container directory) + # In tests/direct mode: use WORKSPACE_ROOT (host directory) + if CONTAINER_WORKSPACE.exists(): + # Running in Docker container + return CONTAINER_WORKSPACE + elif WORKSPACE_ROOT: + # Running in tests or direct mode with WORKSPACE_ROOT set + return Path(WORKSPACE_ROOT).resolve() + else: + # Fallback for backward compatibility (should not happen in normal usage) + return Path.home() + + +# Validate security on import if WORKSPACE_ROOT is set +if WORKSPACE_ROOT: + validate_workspace_security(WORKSPACE_ROOT) + +# Export the computed security root +SECURITY_ROOT = get_security_root()
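
---

For reviewers, a minimal usage sketch of the centralized file-type helpers introduced in `utils/file_types.py` above. This is not part of the diff; the example paths are hypothetical and the category names come from `FILE_CATEGORIES`.

```
# Hypothetical usage of utils/file_types.py as added in this diff.
from utils.file_types import (
    CODE_EXTENSIONS,
    PROGRAMMING_EXTENSIONS,
    get_file_category,
    is_code_file,
    is_text_file,
)

# Category lookup is driven purely by the file extension.
assert get_file_category("src/server.py") == "programming"
assert get_file_category("config/settings.yaml") == "configs"
assert get_file_category("assets/logo.png") == "images"
assert get_file_category("mystery.xyz") == "unknown"

# CODE_EXTENSIONS covers code-adjacent text (docs, configs, web assets),
# while PROGRAMMING_EXTENSIONS is the narrower set the diff earmarks
# "for line numbering".
assert ".md" in CODE_EXTENSIONS and ".md" not in PROGRAMMING_EXTENSIONS

assert is_code_file("tools/refactor.py") and not is_code_file("README.md")
assert is_text_file("README.md") and not is_text_file("app.exe")
```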
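The line numbering added to `utils/file_utils.py` stays off by default for backwards compatibility; callers pass `include_line_numbers=True` to `read_files()`/`read_file_content()`, and tools such as `testgen` opt in by overriding `wants_line_numbers_by_default()`. A small sketch of the expected formatting, exercising the private helpers exactly as defined in the diff:

```
# Sketch of the opt-in line numbering added to utils/file_utils.py.
from utils.file_utils import _add_line_numbers, should_add_line_numbers

# Off by default (backwards compatibility); callers must opt in explicitly.
assert should_add_line_numbers("example.py") is False
assert should_add_line_numbers("example.py", include_line_numbers=True) is True

# Mixed line endings are normalized, then each line gets a right-aligned
# number (minimum width 4) and a "│" separator.
source = "def greet(name):\r\n    return f'Hello, {name}'\n"
print(_add_line_numbers(source))
#    1│ def greet(name):
#    2│     return f'Hello, {name}'
#    3│
```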
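Finally, a sketch of the extracted security configuration in `utils/security_config.py`, assuming a Unix-like host outside of Docker; the project path below is hypothetical.

```
# Sketch of utils/security_config.py behaviour on a Unix-like host.
from pathlib import Path

from utils.security_config import EXCLUDED_DIRS, SECURITY_ROOT, validate_workspace_security

# A normal project directory passes validation silently (hypothetical path).
validate_workspace_security("/Users/alex/projects/my-app")

# Dangerous roots are rejected with a descriptive RuntimeError.
try:
    validate_workspace_security("/etc")
except RuntimeError as exc:
    print(f"rejected: {exc}")

# SECURITY_ROOT is /workspace inside Docker, WORKSPACE_ROOT when set,
# otherwise the user's home directory as a fallback.
assert isinstance(SECURITY_ROOT, Path)

# Common dependency/build directories are excluded from recursive scans.
assert {"node_modules", ".git", "__pycache__"} <= EXCLUDED_DIRS
```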