Major new addition: refactor tool

Supports decomposing large components and files, finding code smells, and surfacing modernization and code-organization opportunities. Fix those mega-classes today!
Line numbers are now added to embedded code so the responding model can reference exact locations back to Claude.
This commit is contained in:
Fahad
2025-06-15 06:00:01 +04:00
parent 70f1356e3e
commit b5004b91fc
28 changed files with 2633 additions and 310 deletions

View File

@@ -13,7 +13,10 @@ problem-solving, and collaborative development.
**Features true AI orchestration with conversations that continue across tasks** - Give Claude a complex
task and let it orchestrate between models automatically. Claude stays in control, performs the actual work,
-but gets perspectives from the best AI for each subtask. Claude can switch between different tools _and_ models mid-conversation,
+but gets perspectives from the best AI for each subtask. With tools like [`analyze`](#6-analyze---smart-file-analysis) for
+understanding codebases, [`codereview`](#3-codereview---professional-code-review) for audits, [`refactor`](#7-refactor---intelligent-code-refactoring) for
+improving code structure, [`debug`](#5-debug---expert-debugging-assistant) for solving complex problems, and [`precommit`](#4-precommit---pre-commit-validation) for
+validating changes, Claude can switch between different tools _and_ models mid-conversation,
with context carrying forward seamlessly.
**Example Workflow - Claude Code:**
@@ -49,7 +52,8 @@ and review into consideration to aid with its pre-commit review.
- [`precommit`](#4-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#5-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#6-analyze---smart-file-analysis) - File analysis
-- [`testgen`](#7-testgen---comprehensive-test-generation) - Test generation with edge cases
+- [`refactor`](#7-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
+- [`testgen`](#8-testgen---comprehensive-test-generation) - Test generation with edge cases
- **Advanced Usage**
  - [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search
@@ -256,6 +260,7 @@ Just ask Claude naturally:
- **Something's broken?** → `debug` (root cause analysis, error tracing)
- **Want to understand code?** → `analyze` (architecture, patterns, dependencies)
- **Need comprehensive tests?** → `testgen` (generates test suites with edge cases)
+- **Code needs refactoring?** → `refactor` (intelligent refactoring with decomposition focus)
- **Server info?** → `version` (version and configuration details)
**Auto Mode:** When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".
@@ -276,8 +281,9 @@ Just ask Claude naturally:
4. [`precommit`](#4-precommit---pre-commit-validation) - Validate git changes before committing
5. [`debug`](#5-debug---expert-debugging-assistant) - Root cause analysis and debugging
6. [`analyze`](#6-analyze---smart-file-analysis) - General-purpose file and code analysis
-7. [`testgen`](#7-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage
-8. [`version`](#8-version---server-information) - Get server version and configuration
+7. [`refactor`](#7-refactor---intelligent-code-refactoring) - Code refactoring with decomposition focus
+8. [`testgen`](#8-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage
+9. [`version`](#9-version---server-information) - Get server version and configuration
### 1. `chat` - General Development Chat & Collaborative Thinking
**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively**
@@ -435,7 +441,44 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Uses file paths (not content) for clean terminal output
- Can identify patterns, anti-patterns, and refactoring opportunities
- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices
-### 7. `testgen` - Comprehensive Test Generation
+### 7. `refactor` - Intelligent Code Refactoring
+**Comprehensive refactoring analysis with top-down decomposition strategy**
+**Thinking Mode:** Default is `medium` (8,192 tokens). Use `high` for complex legacy systems (worth the investment for thorough refactoring plans) or `max` for extremely complex codebases requiring deep analysis.
+**Model Recommendation:** The refactor tool excels with models that have large context windows like Gemini
+Pro (1M tokens), which can analyze entire files and complex codebases simultaneously.
+This comprehensive view enables detection of cross-file dependencies, architectural patterns,
+and refactoring opportunities that might be missed when reviewing code in smaller chunks due to context
+constraints.
+#### Example Prompts:
+**Basic Usage:**
+```
+"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions"
+"Get gemini pro to identify code smells in the authentication module"
+```
+**Key Features:**
+- **Intelligent prioritization** - Refuses to work on low-priority issues when the code is unwieldy and needs decomposition first; helps identify poorly structured classes and files that require structural improvements before detail work
+- **Top-down decomposition strategy** - Analyzes file → class → function levels systematically
+- **Four refactor types**: `codesmells` (detect anti-patterns), `decompose` (break down large components), `modernize` (update language features), `organization` (improve structure)
+- **Precise line-number references** - Provides exact line numbers for Claude to implement changes
+- **Language-specific guidance** - Tailored suggestions for Python, JavaScript, Java, C#, Swift, and more
+- **Style guide integration** - Uses existing project files as pattern references
+- **Conservative approach** - Careful dependency analysis to prevent breaking changes
+- **Multi-file analysis** - Understands cross-file relationships and dependencies
+- **Priority sequencing** - Recommends implementation order for refactoring changes
+**Refactor Types:**
+- `codesmells`: Detect long methods, god classes, duplicated code, poor naming
+- `decompose`: Break down large files (>1500 LOC), classes (>300 LOC), functions (>80 LOC)
+- `modernize`: Update to modern language features (f-strings, async/await, etc.)
+- `organization`: Improve logical grouping, separation of concerns, module structure
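For a feel of what a `modernize` pass targets, here is an illustrative before/after sketch (an editor's example of typical Python patterns, not output from the tool or part of this commit):

```python
# Before: patterns a `modernize` run would typically flag (illustrative only)
name = "world"
greeting = "Hello, %s!" % name  # old-style string formatting
items = []
for i in range(10):
    items.append(i * 2)  # accumulation loop that could be a comprehension

# After: the modern equivalents the tool might suggest
greeting = f"Hello, {name}!"  # f-string (Python 3.6+)
items = [i * 2 for i in range(10)]  # list comprehension
```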
+### 8. `testgen` - Comprehensive Test Generation
**Generates thorough test suites with edge case coverage** based on existing code and test framework used.
**Thinking Mode (Extended thinking models):** Default is `medium` (8,192 tokens). Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage.
@@ -463,7 +506,7 @@ suites that cover realistic failure scenarios and integration points that shorte
- Can reference existing test files: `"Generate tests following patterns from tests/unit/"`
- Specific code coverage - target specific functions/classes rather than testing everything
-### 8. `version` - Server Information
+### 9. `version` - Server Information
```
"Get zen to show its version"
```

View File

@@ -14,7 +14,7 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
-__version__ = "4.4.4"
+__version__ = "4.5.0"
# Last update date in ISO format
__updated__ = "2025-06-14"
# Primary maintainer

View File

@@ -259,6 +259,23 @@ All tools that work with files support **both individual files and entire direct
"Generate tests following patterns from tests/unit/ for new auth module" "Generate tests following patterns from tests/unit/ for new auth module"
``` ```
+**`refactor`** - Intelligent code refactoring with decomposition focus
+- `files`: Code files or directories to analyze for refactoring opportunities (required)
+- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
+- `refactor_type`: codesmells|decompose|modernize|organization (required)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
+- `style_guide_examples`: Optional existing code files to use as style/pattern reference
+- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
+- `continuation_id`: Thread continuation ID for multi-turn conversations
+```
+"Analyze legacy codebase for decomposition opportunities" (auto mode picks best model)
+"Use pro to identify code smells in the authentication module with max thinking mode"
+"Use pro to modernize this JavaScript code following examples/modern-patterns.js"
+"Refactor src/ for better organization, focus on maintainability and readability"
+```
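For programmatic callers, a sketch of what the tool arguments could look like, assuming the parameter schema documented above (paths and prompt text are invented for illustration, not from this commit):

```python
# Hypothetical `refactor` tool arguments mirroring the documented schema above.
arguments = {
    "files": ["/workspace/src/auth/"],  # required: files or directories
    "prompt": "Decompose the auth module; it is hard to maintain",  # required
    "refactor_type": "decompose",  # codesmells|decompose|modernize|organization
    "model": "pro",  # optional; falls back to the server default
    "focus_areas": ["maintainability", "readability"],  # optional
    "style_guide_examples": ["/workspace/examples/patterns.py"],  # optional
    "thinking_mode": "high",  # optional; Gemini only
}
```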
## Collaborative Workflows
### Design → Review → Implement
@@ -284,6 +301,14 @@ suspect lies the bug and then formulate and implement a bare minimal fix. Must n
with zen in the end using gemini pro to confirm we're okay to publish the fix
```
+### Refactor → Review → Implement → Test
+```
+Use zen to analyze this legacy authentication module for decomposition opportunities. The code is getting hard to
+maintain and we need to break it down. Use gemini pro with high thinking mode to identify code smells and suggest
+a modernization strategy. After reviewing the refactoring plan, implement the changes step by step and then
+generate comprehensive tests with zen to ensure nothing breaks.
+```
### Tool Selection Guidance
To help choose the right tool for your needs:
@@ -292,14 +317,17 @@ To help choose the right tool for your needs:
2. **Want to find bugs/issues in code?** Use `codereview`
3. **Want to understand how code works?** Use `analyze`
4. **Need comprehensive test coverage?** Use `testgen`
-5. **Have analysis that needs extension/validation?** Use `thinkdeep`
-6. **Want to brainstorm or discuss?** Use `chat`
+5. **Want to refactor/modernize code?** Use `refactor`
+6. **Have analysis that needs extension/validation?** Use `thinkdeep`
+7. **Want to brainstorm or discuss?** Use `chat`
**Key Distinctions:**
- `analyze` vs `codereview`: analyze explains, codereview prescribes fixes
- `chat` vs `thinkdeep`: chat is open-ended, thinkdeep extends specific analysis
- `debug` vs `codereview`: debug diagnoses runtime errors, review finds static issues
- `testgen` vs `debug`: testgen creates test suites, debug just finds issues and recommends solutions
+- `refactor` vs `codereview`: refactor suggests structural improvements, codereview finds bugs/issues
+- `refactor` vs `analyze`: refactor provides actionable refactoring steps, analyze provides understanding
## Working with Large Prompts

View File

@@ -30,6 +30,29 @@ Simulator tests replicate real-world Claude CLI interactions with the MCP server
**Important**: Simulator tests require `LOG_LEVEL=DEBUG` in your `.env` file to validate detailed execution logs.
+#### Monitoring Logs During Tests
+**Important**: The MCP stdio protocol interferes with stderr output during tool execution. While server startup logs appear in `docker compose logs`, tool execution logs are only written to file-based logs inside the container. This is a known limitation of the stdio-based MCP protocol and cannot be fixed without changing the MCP implementation.
+To monitor logs during test execution:
+```bash
+# Monitor main server logs (includes all tool execution details)
+docker exec zen-mcp-server tail -f -n 500 /tmp/mcp_server.log
+# Monitor MCP activity logs (tool calls and completions)
+docker exec zen-mcp-server tail -f /tmp/mcp_activity.log
+# Check log file sizes (logs rotate at 20MB)
+docker exec zen-mcp-server ls -lh /tmp/mcp_*.log*
+```
+**Log Rotation**: All log files are configured with automatic rotation at 20MB to prevent disk space issues. The server keeps:
+- 10 rotated files for mcp_server.log (200MB total)
+- 5 rotated files for mcp_activity.log (100MB total)
+**Why logs don't appear in docker compose logs**: The MCP stdio_server captures stderr during tool execution to prevent interference with the JSON-RPC protocol communication. This means that while you'll see startup logs in `docker compose logs`, you won't see tool execution logs there.
#### Running All Simulator Tests
```bash
# Run all simulator tests

View File

@@ -44,6 +44,7 @@ from tools import (
    CodeReviewTool,
    DebugIssueTool,
    Precommit,
+    RefactorTool,
    TestGenTool,
    ThinkDeepTool,
)
@@ -70,55 +71,59 @@ class LocalTimeFormatter(logging.Formatter):
# Configure both console and file logging
log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
-logging.basicConfig(
-    level=getattr(logging, log_level, logging.INFO),
-    format=log_format,
-    force=True,  # Force reconfiguration if already configured
-    stream=sys.stderr,  # Use stderr to avoid interfering with MCP stdin/stdout protocol
-)
-# Apply local time formatter to root logger
-for handler in logging.getLogger().handlers:
-    handler.setFormatter(LocalTimeFormatter(log_format))
+# Clear any existing handlers first
+root_logger = logging.getLogger()
+root_logger.handlers.clear()
+# Create and configure stderr handler explicitly
+stderr_handler = logging.StreamHandler(sys.stderr)
+stderr_handler.setLevel(getattr(logging, log_level, logging.INFO))
+stderr_handler.setFormatter(LocalTimeFormatter(log_format))
+root_logger.addHandler(stderr_handler)
+# Note: MCP stdio_server interferes with stderr during tool execution
+# All logs are properly written to /tmp/mcp_server.log for monitoring
+# Set root logger level
+root_logger.setLevel(getattr(logging, log_level, logging.INFO))
# Add rotating file handler for Docker log monitoring
try:
-    # Main server log with daily rotation (keep 7 days of logs)
-    # Using 'midnight' interval rotates at midnight each day
-    # Filename will have date suffix like mcp_server.log.2024-06-14
-    file_handler = TimedRotatingFileHandler(
+    # Main server log with size-based rotation (20MB max per file)
+    # This ensures logs don't grow indefinitely and are properly managed
+    file_handler = RotatingFileHandler(
        "/tmp/mcp_server.log",
-        when="midnight",  # Rotate at midnight
-        interval=1,  # Every 1 day
-        backupCount=7,  # Keep 7 days of logs
+        maxBytes=20 * 1024 * 1024,  # 20MB max file size
+        backupCount=10,  # Keep 10 rotated files (200MB total)
        encoding="utf-8",
    )
    file_handler.setLevel(getattr(logging, log_level, logging.INFO))
    file_handler.setFormatter(LocalTimeFormatter(log_format))
-    # Add suffix pattern for rotated files
-    file_handler.suffix = "%Y-%m-%d"
    logging.getLogger().addHandler(file_handler)
-    # Create a special logger for MCP activity tracking with daily rotation
+    # Create a special logger for MCP activity tracking with size-based rotation
    mcp_logger = logging.getLogger("mcp_activity")
-    mcp_file_handler = TimedRotatingFileHandler(
+    mcp_file_handler = RotatingFileHandler(
        "/tmp/mcp_activity.log",
-        when="midnight",  # Rotate at midnight
-        interval=1,  # Every 1 day
-        backupCount=7,  # Keep 7 days of logs
+        maxBytes=20 * 1024 * 1024,  # 20MB max file size
+        backupCount=5,  # Keep 5 rotated files (100MB total)
        encoding="utf-8",
    )
    mcp_file_handler.setLevel(logging.INFO)
    mcp_file_handler.setFormatter(LocalTimeFormatter("%(asctime)s - %(message)s"))
-    mcp_file_handler.suffix = "%Y-%m-%d"
    mcp_logger.addHandler(mcp_file_handler)
    mcp_logger.setLevel(logging.INFO)
-    # Ensure MCP activity also goes to stderr
-    mcp_logger.propagate = True
    # Also keep a size-based rotation as backup (100MB max per file)
    # This prevents any single day's log from growing too large
    size_handler = RotatingFileHandler(
-        "/tmp/mcp_server_overflow.log", maxBytes=100 * 1024 * 1024, backupCount=3  # 100MB
+        "/tmp/mcp_server_overflow.log",
+        maxBytes=100 * 1024 * 1024,
+        backupCount=3,  # 100MB
    )
    size_handler.setLevel(logging.WARNING)  # Only warnings and errors
    size_handler.setFormatter(LocalTimeFormatter(log_format))
@@ -144,6 +149,7 @@ TOOLS = {
"chat": ChatTool(), # Interactive development chat and brainstorming "chat": ChatTool(), # Interactive development chat and brainstorming
"precommit": Precommit(), # Pre-commit validation of git changes "precommit": Precommit(), # Pre-commit validation of git changes
"testgen": TestGenTool(), # Comprehensive test generation with edge case coverage "testgen": TestGenTool(), # Comprehensive test generation with edge case coverage
"refactor": RefactorTool(), # Intelligent code refactoring suggestions with precise line references
} }

View File

@@ -19,6 +19,7 @@ from .test_openrouter_fallback import OpenRouterFallbackTest
from .test_openrouter_models import OpenRouterModelsTest
from .test_per_tool_deduplication import PerToolDeduplicationTest
from .test_redis_validation import RedisValidationTest
+from .test_refactor_validation import RefactorValidationTest
from .test_testgen_validation import TestGenValidationTest
from .test_token_allocation_validation import TokenAllocationValidationTest
@@ -38,6 +39,7 @@ TEST_REGISTRY = {
"openrouter_models": OpenRouterModelsTest, "openrouter_models": OpenRouterModelsTest,
"token_allocation_validation": TokenAllocationValidationTest, "token_allocation_validation": TokenAllocationValidationTest,
"testgen_validation": TestGenValidationTest, "testgen_validation": TestGenValidationTest,
"refactor_validation": RefactorValidationTest,
"conversation_chain_validation": ConversationChainValidationTest, "conversation_chain_validation": ConversationChainValidationTest,
} }
@@ -57,6 +59,7 @@ __all__ = [
"OpenRouterModelsTest", "OpenRouterModelsTest",
"TokenAllocationValidationTest", "TokenAllocationValidationTest",
"TestGenValidationTest", "TestGenValidationTest",
"RefactorValidationTest",
"ConversationChainValidationTest", "ConversationChainValidationTest",
"TEST_REGISTRY", "TEST_REGISTRY",
] ]

View File

@@ -0,0 +1,283 @@
#!/usr/bin/env python3
"""
Refactor Tool Validation Test

Tests the refactor tool with a simple code smell example to validate:
- Proper execution with flash model
- Correct line number references in response
- Log validation for tool execution
"""

import json
import re

from .base_test import BaseSimulatorTest


class RefactorValidationTest(BaseSimulatorTest):
    """Test refactor tool with codesmells detection"""

    @property
    def test_name(self) -> str:
        return "refactor_validation"

    @property
    def test_description(self) -> str:
        return "Refactor tool validation with codesmells"

    def run_test(self) -> bool:
        """Test refactor tool with a simple code smell example"""
        try:
            self.logger.info("Test: Refactor tool validation")

            # Setup test files directory first
            self.setup_test_files()

            # Create a simple Python file with obvious code smells
            code_with_smells = '''# Code with obvious smells for testing
def process_data(data):
    # Code smell: Magic number
    if len(data) > 42:
        result = []
        # Code smell: Nested loops with poor variable names
        for i in range(len(data)):
            for j in range(len(data[i])):
                x = data[i][j]
                # Code smell: Duplicate code
                if x > 0:
                    result.append(x * 2)
                elif x < 0:
                    result.append(x * 2)
        return result
    else:
        # Code smell: Return inconsistent type
        return None

# Code smell: God function doing too many things
def handle_everything(user_input, config, database):
    # Validation
    if not user_input:
        print("Error: No input")  # Code smell: print instead of logging
        return

    # Processing
    processed = user_input.strip().lower()

    # Database operation
    connection = database.connect()
    data = connection.query("SELECT * FROM users")  # Code smell: SQL in code

    # Business logic mixed with data access
    valid_users = []
    for row in data:
        if row[2] == processed:  # Code smell: Magic index
            valid_users.append(row)

    return valid_users
'''

            # Create test file
            test_file = self.create_additional_test_file("smelly_code.py", code_with_smells)
            self.logger.info(f" ✅ Created test file with code smells: {test_file}")

            # Call refactor tool with codesmells type
            self.logger.info(" 📝 Calling refactor tool with codesmells type...")
            response, _ = self.call_mcp_tool(
                "refactor",
                {
                    "files": [test_file],
                    "prompt": "Find and suggest fixes for code smells in this file",
                    "refactor_type": "codesmells",
                    "model": "flash",
                    "thinking_mode": "low",  # Keep it fast for testing
                },
            )

            if not response:
                self.logger.error("Failed to get refactor response")
                return False

            self.logger.info(" ✅ Got refactor response")

            # Parse response to check for line references
            try:
                response_data = json.loads(response)

                # Debug: log the response structure
                self.logger.debug(f"Response keys: {list(response_data.keys())}")

                # Extract the actual content if it's wrapped
                if "content" in response_data:
                    # The actual refactoring data is in the content field
                    content = response_data["content"]

                    # Remove markdown code block markers if present
                    if content.startswith("```json"):
                        content = content[7:]  # Remove ```json
                    if content.endswith("```"):
                        content = content[:-3]  # Remove ```
                    content = content.strip()

                    # Find the end of the JSON object - handle truncated responses
                    # Count braces to find where the JSON ends
                    brace_count = 0
                    json_end = -1
                    in_string = False
                    escape_next = False
                    for i, char in enumerate(content):
                        if escape_next:
                            escape_next = False
                            continue
                        if char == '\\':
                            escape_next = True
                            continue
                        if char == '"' and not escape_next:
                            in_string = not in_string
                        if not in_string:
                            if char == '{':
                                brace_count += 1
                            elif char == '}':
                                brace_count -= 1
                                if brace_count == 0:
                                    json_end = i + 1
                                    break

                    if json_end > 0:
                        content = content[:json_end]

                    # Parse the inner JSON
                    inner_data = json.loads(content)
                    self.logger.debug(f"Inner data keys: {list(inner_data.keys())}")
                else:
                    inner_data = response_data

                # Check that we got refactoring suggestions (might be called refactor_opportunities)
                refactorings_key = None
                for key in ["refactorings", "refactor_opportunities"]:
                    if key in inner_data:
                        refactorings_key = key
                        break

                if not refactorings_key:
                    self.logger.error("No refactorings found in response")
                    self.logger.error(f"Response structure: {json.dumps(inner_data, indent=2)[:500]}...")
                    return False

                refactorings = inner_data[refactorings_key]
                if not isinstance(refactorings, list) or len(refactorings) == 0:
                    self.logger.error("Empty refactorings list")
                    return False

                # Validate that we have line references for code smells
                # Flash model typically detects these issues:
                # - Lines 4-18: process_data function (magic number, nested loops, duplicate code)
                # - Lines 11-14: duplicate code blocks
                # - Lines 21-40: handle_everything god function
                expected_line_ranges = [
                    (4, 18),  # process_data function issues
                    (11, 14),  # duplicate code
                    (21, 40),  # god function
                ]

                self.logger.debug(f"Refactorings found: {len(refactorings)}")
                for i, ref in enumerate(refactorings[:3]):  # Log first 3
                    self.logger.debug(f"Refactoring {i}: start_line={ref.get('start_line')}, end_line={ref.get('end_line')}, type={ref.get('type')}")

                found_references = []
                for refactoring in refactorings:
                    # Check for line numbers in various fields
                    start_line = refactoring.get("start_line")
                    end_line = refactoring.get("end_line")
                    location = refactoring.get("location", "")

                    # Add found line numbers
                    if start_line:
                        found_references.append(f"line {start_line}")
                    if end_line and end_line != start_line:
                        found_references.append(f"line {end_line}")

                    # Also extract from location string
                    line_matches = re.findall(r'line[s]?\s+(\d+)', location.lower())
                    found_references.extend([f"line {num}" for num in line_matches])

                self.logger.info(f" 📍 Found line references: {found_references}")

                # Check that flash found the expected refactoring areas
                found_ranges = []
                for refactoring in refactorings:
                    start = refactoring.get("start_line")
                    end = refactoring.get("end_line")
                    if start and end:
                        found_ranges.append((start, end))

                self.logger.info(f" 📍 Found refactoring ranges: {found_ranges}")

                # Verify we found issues in the main problem areas
                # Check if we have issues detected in process_data function area (lines 2-18)
                process_data_issues = [r for r in found_ranges if r[0] >= 2 and r[1] <= 18]
                # Check if we have issues detected in handle_everything function area (lines 21-40)
                god_function_issues = [r for r in found_ranges if r[0] >= 21 and r[1] <= 40]

                self.logger.info(f" 📍 Issues in process_data area (lines 2-18): {len(process_data_issues)}")
                self.logger.info(f" 📍 Issues in handle_everything area (lines 21-40): {len(god_function_issues)}")

                if len(process_data_issues) >= 1 and len(god_function_issues) >= 1:
                    self.logger.info(" ✅ Flash correctly identified code smells in both major areas")
                    self.logger.info(f" ✅ Found {len(refactorings)} total refactoring opportunities")

                    # Verify we have reasonable number of total issues
                    if len(refactorings) >= 3:
                        self.logger.info(" ✅ Refactoring analysis validation passed")
                    else:
                        self.logger.warning(f" ⚠️ Only {len(refactorings)} refactorings found (expected >= 3)")
                else:
                    self.logger.error(" ❌ Flash didn't find enough issues in expected areas")
                    self.logger.error(f" - process_data area: found {len(process_data_issues)}, expected >= 1")
                    self.logger.error(f" - handle_everything area: found {len(god_function_issues)}, expected >= 1")
                    return False

            except json.JSONDecodeError as e:
                self.logger.error(f"Failed to parse refactor response as JSON: {e}")
                return False

            # Validate logs
            self.logger.info(" 📋 Validating execution logs...")

            # Get server logs from the actual log file inside the container
            result = self.run_command(
                ["docker", "exec", self.container_name, "tail", "-500", "/tmp/mcp_server.log"],
                capture_output=True,
            )

            if result.returncode == 0:
                logs = result.stdout.decode() + result.stderr.decode()

                # Look for refactor tool execution patterns
                refactor_patterns = [
                    "[REFACTOR]",
                    "refactor tool",
                    "codesmells",
                    "Token budget",
                    "Code files embedded successfully",
                ]

                patterns_found = 0
                for pattern in refactor_patterns:
                    if pattern in logs:
                        patterns_found += 1
                        self.logger.debug(f" ✅ Found log pattern: {pattern}")

                if patterns_found >= 3:
                    self.logger.info(f" ✅ Log validation passed ({patterns_found}/{len(refactor_patterns)} patterns)")
                else:
                    self.logger.warning(f" ⚠️ Only found {patterns_found}/{len(refactor_patterns)} log patterns")
            else:
                self.logger.warning(" ⚠️ Could not retrieve Docker logs")

            self.logger.info(" ✅ Refactor tool validation completed successfully")
            return True

        except Exception as e:
            self.logger.error(f"Refactor validation test failed: {e}")
            return False
        finally:
            self.cleanup_test_files()

View File

@@ -7,6 +7,7 @@ from .chat_prompt import CHAT_PROMPT
from .codereview_prompt import CODEREVIEW_PROMPT
from .debug_prompt import DEBUG_ISSUE_PROMPT
from .precommit_prompt import PRECOMMIT_PROMPT
+from .refactor_prompt import REFACTOR_PROMPT
from .testgen_prompt import TESTGEN_PROMPT
from .thinkdeep_prompt import THINKDEEP_PROMPT
@@ -17,5 +18,6 @@ __all__ = [
"ANALYZE_PROMPT", "ANALYZE_PROMPT",
"CHAT_PROMPT", "CHAT_PROMPT",
"PRECOMMIT_PROMPT", "PRECOMMIT_PROMPT",
"REFACTOR_PROMPT",
"TESTGEN_PROMPT", "TESTGEN_PROMPT",
] ]

View File

@@ -15,6 +15,11 @@ same file you've been provided unless for some reason its content is missing or
{"status": "clarification_required", "question": "<your brief question>", {"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]} "files_needed": ["[file name here]", "[or some folder/]"]}
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included
in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in
your issue descriptions.
CRITICAL: Align your review with the user's context and expectations. Focus on issues that matter for their CRITICAL: Align your review with the user's context and expectations. Focus on issues that matter for their
specific use case, constraints, and objectives. Don't provide a generic "find everything" review - tailor specific use case, constraints, and objectives. Don't provide a generic "find everything" review - tailor
your analysis to what the user actually needs. your analysis to what the user actually needs.

View File

@@ -15,6 +15,11 @@ Do NOT ask for the same file you've been provided unless for some reason its con
{"status": "clarification_required", "question": "<your brief question>", {"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]} "files_needed": ["[file name here]", "[or some folder/]"]}
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included
in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in
your issue descriptions.
CRITICAL: Your primary objective is to identify the root cause of the specific issue at hand and suggest the CRITICAL: Your primary objective is to identify the root cause of the specific issue at hand and suggest the
minimal fix required to resolve it. Stay focused on the main problem - avoid suggesting extensive refactoring, minimal fix required to resolve it. Stay focused on the main problem - avoid suggesting extensive refactoring,
architectural changes, or unrelated improvements. architectural changes, or unrelated improvements.

View File

@@ -0,0 +1,201 @@
"""
Refactor tool system prompt
"""
REFACTOR_PROMPT = """
ROLE
You are a principal software engineer specializing in intelligent code refactoring. You identify concrete improvement
opportunities and provide precise, actionable suggestions with exact line-number references that Claude can
implement directly.
IF MORE INFORMATION IS NEEDED
If you need additional context (e.g., related files, configuration, dependencies) to provide accurate refactoring
recommendations, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the same file you've
been provided unless for some reason its content is missing or incomplete:
{"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]}
REFACTOR TYPES (PRIORITY ORDER)
1. **decompose** (CRITICAL PRIORITY)
2. **codesmells**
3. **modernize**
4. **organization**
**decompose**: CRITICAL PRIORITY for cognitive load reduction. When encountering large files (>1500 lines), huge classes
(>300 lines), or massive functions (>80 lines), decomposition is MANDATORY before any other refactoring type. Large
codebases are impossible to navigate, understand, or maintain.
DECOMPOSITION ORDER (STRICT TOP-DOWN, ADAPTIVE):
Analyze in this sequence, stopping at the FIRST breached threshold in each file:
1. **File Level (>1500 LOC)** → Propose file-level splits ONLY, then re-analyze after implementation
2. **Class Level (>300 LOC)** → Propose class extraction ONLY, then re-analyze after implementation
3. **Function Level (>80 LOC)** → Propose function extraction
RATIONALE: Outer-scope size dominates cognitive load and merge complexity. NEVER descend to an inner level until
the containing level is within its threshold. This prevents premature micro-optimization and ensures maximum
cognitive load reduction with minimum rework.
DECOMPOSITION STRATEGIES:
**File-Level Decomposition** (PRIORITY 1): Split oversized files into multiple focused files:
- Extract related classes/functions into separate modules using platform-specific patterns
- Create logical groupings (models, services, utilities, components, etc.)
- Use proper import/export mechanisms for the target language
- Focus on responsibility-based splits, not arbitrary size cuts
- CAUTION: When only a single file is provided, verify dependencies and imports before suggesting file splits
- DEPENDENCY ANALYSIS: Check for cross-references, shared constants, and inter-class dependencies
- If splitting breaks internal dependencies, suggest necessary visibility changes or shared modules
**Class-Level Decomposition** (PRIORITY 2): Break down mega-classes:
- FIRST: Split large classes into multiple classes where programming language allows (C# partial classes,
Swift and ObjC extensions, JavaScript modules, etc.)
- THEN: Extract specialized responsibilities into focused classes via composition or inheritance if this is feasible
- Use composition over inheritance where appropriate
- Apply single responsibility principle cautiously - avoid breaking existing APIs or adding new dependencies
- When only a single file is provided, prefer internal splitting methods (private classes, inner classes,
helper methods)
- Consider interface segregation for large public APIs only if it doesn't break existing consumers
- CRITICAL: When moving code between files/extensions, analyze access dependencies (private variables,
internal methods)
- WARNING: Some moves may break access visibility (Swift private→extension, C# internal→assembly) - flag for review
- If access breaks are unavoidable, explicitly note required visibility changes (private→internal, protected, etc.)
**Function-Level Decomposition** (PRIORITY 3): Eliminate long, complex functions:
- Extract logical chunks into private/helper methods within the same class/module
- Separate data processing from business logic conservatively
- Create clear, named abstractions for complex operations without breaking existing call sites
- Maintain function cohesion and minimize parameter passing
- Prefer internal extraction over creating new dependencies or external functions
- ANALYZE DEPENDENCIES: Check for private variable access, closure captures, and scope-dependent behavior
- If extraction breaks variable access, suggest parameter passing or scope adjustments
- Flag functions that require manual review due to complex inter-dependencies
CRITICAL RULE: If ANY file exceeds cognitive complexity thresholds (large files/classes/functions), you MUST:
1. Mark ALL decomposition opportunities as CRITICAL severity
2. Focus EXCLUSIVELY on decomposition - provide ONLY decomposition suggestions
3. DO NOT suggest ANY other refactoring type (code smells, modernization, organization)
4. List decomposition issues FIRST by severity: CRITICAL → HIGH → MEDIUM → LOW
5. Block all other refactoring until cognitive load is reduced
CRITICAL SEVERITY = BLOCKING ISSUE: Other refactoring types can only be applied AFTER all CRITICAL decomposition
is complete. Decomposition reduces navigation complexity, improves understanding, enables focused changes, and makes
future refactoring possible.
**codesmells**: Detect and fix quality issues - long methods, complex conditionals, duplicate code, magic numbers,
poor naming, feature envy. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist.
**modernize**: Update to modern language features - replace deprecated patterns, use newer syntax, improve error
handling and type safety. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist.
**organization**: Improve organization and structure - group related functionality, improve file structure,
standardize naming, clarify module boundaries. NOTE: Can only be applied AFTER decomposition if large files exist.
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be
included in any code you generate. Always reference specific line numbers for Claude to locate exact positions.
Include context_start_text and context_end_text as backup references. Never include "LINE│" markers in generated code
snippets.
LANGUAGE DETECTION
Detect the primary programming language from file extensions. Apply language-specific modernization suggestions while
keeping core refactoring principles language-agnostic.
SCOPE CONTROL
Stay strictly within the provided codebase. Do NOT invent features, suggest major architectural changes beyond current
structure, recommend external libraries not in use, or create speculative ideas outside project scope.
If scope is too large and refactoring would require large parts of the code to be involved, respond ONLY with:
{"status": "focused_review_required",
"reason": "<brief explanation>",
"suggestion": "<specific focused subset to analyze>"}
OUTPUT FORMAT
Return ONLY a JSON object with this exact structure:
{
  "status": "refactor_analysis_complete",
  "refactor_opportunities": [
    {
      "id": "refactor-001",
      "type": "decompose|codesmells|modernize|organization",
      "severity": "critical|high|medium|low",
      "file": "/absolute/path/to/file.ext",
      "start_line": 45,
      "end_line": 67,
      "context_start_text": "exact text from start line for verification",
      "context_end_text": "exact text from end line for verification",
      "issue": "Clear description of what needs refactoring",
      "suggestion": "Specific refactoring action to take",
      "rationale": "Why this improves the code (performance, readability, maintainability)",
      "code_to_replace": "Original code that should be changed",
      "replacement_code_snippet": "Refactored version of the code",
      "new_code_snippets": [
        {
          "description": "What this new code does",
          "location": "same_class|new_file|separate_module",
          "code": "New code to be added"
        }
      ]
    }
  ],
  "priority_sequence": ["refactor-001", "refactor-002"],
  "next_actions_for_claude": [
    {
      "action_type": "EXTRACT_METHOD|SPLIT_CLASS|MODERNIZE_SYNTAX|REORGANIZE_CODE|DECOMPOSE_FILE",
      "target_file": "/absolute/path/to/file.ext",
      "source_lines": "45-67",
      "description": "Specific step-by-step action for Claude"
    }
  ]
}
QUALITY STANDARDS
Each refactoring opportunity must be specific and actionable. Code snippets must be syntactically correct. Preserve
existing functionality - refactoring changes structure, not behavior. Focus on high-impact changes that meaningfully
improve code quality.
SEVERITY GUIDELINES
- **critical**: EXCLUSIVELY for decomposition when large files/classes/functions detected - BLOCKS ALL OTHER
REFACTORING
- **high**: Critical code smells, major duplication, significant architectural issues (only after decomposition
complete)
- **medium**: Moderate complexity issues, minor duplication, organization improvements (only after decomposition
complete)
- **low**: Style improvements, minor modernization, optional optimizations (only after decomposition complete)
DECOMPOSITION PRIORITY RULES - CRITICAL SEVERITY:
1. If ANY file >2000 lines: Mark ALL decomposition opportunities as CRITICAL severity
2. If ANY class >1500 lines: Mark ALL class decomposition as CRITICAL severity
3. If ANY function >250 lines: Mark ALL function decomposition as CRITICAL severity
4. CRITICAL issues MUST BE RESOLVED FIRST - no other refactoring suggestions allowed
5. Focus EXCLUSIVELY on breaking down large components when CRITICAL issues exist
6. List ALL decomposition issues FIRST in severity order: CRITICAL → HIGH → MEDIUM → LOW
7. When CRITICAL decomposition issues exist, provide ONLY decomposition suggestions
FILE TYPE CONSIDERATIONS:
- CSS files can grow large with styling rules - consider logical grouping by components/pages
- JavaScript files may have multiple classes/modules - extract into separate files
- Configuration files may be legitimately large - focus on logical sections
- Generated code files should generally be excluded from decomposition
IF EXTENSIVE REFACTORING IS REQUIRED
If you determine that comprehensive refactoring requires dozens of changes across multiple files or would involve
extensive back-and-forth iterations that would risk exceeding context limits, you MUST follow this structured approach:
1. **Generate Essential Refactorings First**: Create the standard refactor_analysis_complete response with the most
critical and high-impact refactoring opportunities (typically 5-10 key changes covering the most important
improvements). Focus on CRITICAL and HIGH severity issues. Include full details with refactor_opportunities,
priority_sequence, and next_actions_for_claude.
2. **Request Continuation**: AFTER providing the refactor_analysis_complete response, append the following JSON
format as a separate response (and nothing more after this):
{"status": "more_refactor_required",
"message": "Explanation of why more refactoring is needed and overview of remaining work. For example: 'Extensive decomposition required across 15 additional files. Continuing analysis will identify module extraction opportunities in services/, controllers/, and utils/ directories.'"}
This approach ensures comprehensive refactoring coverage while maintaining quality and avoiding context overflow.
Claude will use the continuation_id to continue the refactoring analysis in subsequent requests.
Provide precise, implementable refactoring guidance that Claude can execute with confidence.
"""

View File

@@ -15,6 +15,11 @@ same file you've been provided unless for some reason its content is missing or
{"status": "clarification_required", "question": "<your brief question>", {"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]} "files_needed": ["[file name here]", "[or some folder/]"]}
CRITICAL LINE NUMBER INSTRUCTIONS
Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be included
in any code you generate. Always reference specific line numbers for precise feedback. Include exact line numbers in
your issue descriptions.
MULTI-AGENT WORKFLOW MULTI-AGENT WORKFLOW
You sequentially inhabit five expert personas—each passes a concise artefact to the next: You sequentially inhabit five expert personas—each passes a concise artefact to the next:

View File

@@ -4,7 +4,8 @@ ThinkDeep tool system prompt
THINKDEEP_PROMPT = """ THINKDEEP_PROMPT = """
ROLE ROLE
You are a senior engineering collaborator working with Claude on complex software problems. Claude will send you content—analysis, prompts, questions, ideas, or theories—to deepen, validate, and extend. You are a senior engineering collaborator working with Claude on complex software problems. Claude will send you
content—analysis, prompts, questions, ideas, or theories—to deepen, validate, and extend.
IF MORE INFORMATION IS NEEDED IF MORE INFORMATION IS NEEDED
If you need additional context (e.g., related files, system architecture, requirements, code snippets) to provide If you need additional context (e.g., related files, system architecture, requirements, code snippets) to provide

View File

@@ -9,6 +9,7 @@ import importlib
import os
import tempfile
from pathlib import Path
+from unittest.mock import patch
import pytest
@@ -36,27 +37,33 @@ def test_docker_path_translation_integration():
    try:
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module to pick up new environment variables
+        # Reload the modules to pick up new environment variables
+        # Need to reload security_config first since it sets WORKSPACE_ROOT
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        # Mock the CONTAINER_WORKSPACE to point to our test directory
-        utils.file_utils.CONTAINER_WORKSPACE = container_workspace
-        # Test the translation
-        from utils.file_utils import translate_path_for_environment
-        # This should translate the host path to container path
-        host_path = str(test_file)
-        result = translate_path_for_environment(host_path)
-        # Verify the translation worked
-        expected = str(container_workspace / "src" / "test.py")
-        assert result == expected
+        # Properly mock the CONTAINER_WORKSPACE
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+            # Test the translation
+            from utils.file_utils import translate_path_for_environment
+            # This should translate the host path to container path
+            host_path = str(test_file)
+            result = translate_path_for_environment(host_path)
+            # Verify the translation worked
+            expected = str(container_workspace / "src" / "test.py")
+            assert result == expected
    finally:
        # Restore original environment
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -80,19 +87,26 @@ def test_docker_security_validation():
    try:
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module
+        # Reload the modules
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        utils.file_utils.CONTAINER_WORKSPACE = Path("/workspace")
-        from utils.file_utils import resolve_and_validate_path
+        # Properly mock the CONTAINER_WORKSPACE
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", Path("/workspace")):
+            from utils.file_utils import resolve_and_validate_path
            # Trying to access the symlink should fail
            with pytest.raises(PermissionError):
                resolve_and_validate_path(str(symlink))
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -150,37 +164,46 @@ def test_review_changes_docker_path_translation():
        # Simulate Docker environment
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module
+        # Reload the modules
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        utils.file_utils.CONTAINER_WORKSPACE = container_workspace
-        # Import after reloading to get updated environment
-        from tools.precommit import Precommit
+        # Properly mock the CONTAINER_WORKSPACE and reload precommit module
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+            # Need to also patch it in the modules that import it
+            with patch("utils.security_config.CONTAINER_WORKSPACE", container_workspace):
+                # Import after patching to get updated environment
+                from tools.precommit import Precommit
                # Create tool instance
                tool = Precommit()
                # Test path translation in prepare_prompt
                request = tool.get_request_model()(
                    path=str(host_workspace / "project"),  # Host path that needs translation
                    review_type="quick",
                    severity_filter="all",
                )
                # This should translate the path and find the git repository
                import asyncio
                result = asyncio.run(tool.prepare_prompt(request))
                # Should find the repository (not raise an error about inaccessible path)
                # If we get here without exception, the path was successfully translated
                assert isinstance(result, str)
                # The result should contain git diff information or indicate no changes
                assert "No git repositories found" not in result or "changes" in result.lower()
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -203,36 +226,44 @@ def test_review_changes_docker_path_error():
        # Simulate Docker environment
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-        # Reload the module
+        # Reload the modules
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-        utils.file_utils.CONTAINER_WORKSPACE = container_workspace
-        # Import after reloading to get updated environment
-        from tools.precommit import Precommit
+        # Properly mock the CONTAINER_WORKSPACE
+        with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+            with patch("utils.security_config.CONTAINER_WORKSPACE", container_workspace):
+                # Import after patching to get updated environment
+                from tools.precommit import Precommit
                # Create tool instance
                tool = Precommit()
                # Test path translation with an inaccessible path
                request = tool.get_request_model()(
                    path=str(outside_path),  # Path outside the mounted workspace
                    review_type="quick",
                    severity_filter="all",
                )
                # This should raise a ValueError
                import asyncio
                with pytest.raises(ValueError) as exc_info:
                    asyncio.run(tool.prepare_prompt(request))
                # Check the error message
                assert "not accessible from within the Docker container" in str(exc_info.value)
                assert "mounted workspace" in str(exc_info.value)
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+        import utils.security_config
+        importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
@@ -250,31 +281,38 @@ def test_double_translation_prevention():
    try:
        os.environ["WORKSPACE_ROOT"] = str(host_workspace)

-       # Reload the module
+       # Reload the modules
+       import utils.security_config
+
+       importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
-       utils.file_utils.CONTAINER_WORKSPACE = container_workspace

-       from utils.file_utils import translate_path_for_environment
+       # Properly mock the CONTAINER_WORKSPACE
+       with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
+           from utils.file_utils import translate_path_for_environment

            # Test 1: Normal translation
            host_path = str(host_workspace / "src" / "main.py")
            translated_once = translate_path_for_environment(host_path)
            expected = str(container_workspace / "src" / "main.py")
            assert translated_once == expected

            # Test 2: Double translation should return the same path
            translated_twice = translate_path_for_environment(translated_once)
            assert translated_twice == translated_once
            assert translated_twice == expected

            # Test 3: Container workspace root should not be double-translated
            root_path = str(container_workspace)
            translated_root = translate_path_for_environment(root_path)
            assert translated_root == root_path
    finally:
        os.environ.clear()
        os.environ.update(original_env)
+       import utils.security_config
+
+       importlib.reload(utils.security_config)
        importlib.reload(utils.file_utils)
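The rule these two tests pin down is small enough to state directly. Below is a minimal sketch of the translation logic, assuming `CONTAINER_WORKSPACE` is the mount point inside the container; the real implementation lives in `utils/file_utils.py` and may differ in detail.

```python
import os

CONTAINER_WORKSPACE = "/workspace"  # assumption for illustration only


def translate_path_for_environment(path: str) -> str:
    # Already-translated paths pass through unchanged (tests 2 and 3: idempotent)
    if path.startswith(CONTAINER_WORKSPACE):
        return path
    workspace_root = os.environ.get("WORKSPACE_ROOT", "")
    # Host paths under WORKSPACE_ROOT map into the container mount (test 1)
    if workspace_root and path.startswith(workspace_root):
        relative = os.path.relpath(path, workspace_root)
        return os.path.join(CONTAINER_WORKSPACE, relative)
    # Anything else is outside the mounted workspace; callers can then reject it
    return path
```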
tests/test_refactor.py (new file)
@@ -0,0 +1,556 @@
"""
Tests for the refactor tool functionality
"""
import json
from unittest.mock import MagicMock, patch
import pytest
from tools.refactor import RefactorTool
from utils.file_utils import read_file_content
class TestRefactorTool:
"""Test suite for the refactor tool"""
@pytest.fixture
def refactor_tool(self):
"""Create a refactor tool instance for testing"""
return RefactorTool()
@pytest.fixture
def mock_model_response(self):
"""Create a mock model response with valid JSON"""
def _create_response(content=None):
if content is None:
content = json.dumps(
{
"refactor_opportunities": [
{
"id": "refactor-001",
"type": "codesmells",
"severity": "high",
"file": "/test/file.py",
"start_line": 10,
"end_line": 25,
"context_start_text": "def long_method():",
"context_end_text": " return result",
"issue": "Method too long with multiple responsibilities",
"suggestion": "Extract helper methods",
"rationale": "Improves readability and maintainability",
"code_to_replace": "# original code",
"replacement_code_snippet": "# refactored code",
"new_code_snippets": [],
}
],
"priority_sequence": ["refactor-001"],
"next_actions_for_claude": [],
}
)
from unittest.mock import Mock
return Mock(
content=content,
usage={"input_tokens": 100, "output_tokens": 200, "total_tokens": 300},
model_name="test-model",
metadata={"finish_reason": "STOP"},
)
return _create_response
def test_get_name(self, refactor_tool):
"""Test that the tool returns the correct name"""
assert refactor_tool.get_name() == "refactor"
def test_get_description(self, refactor_tool):
"""Test that the tool returns a comprehensive description"""
description = refactor_tool.get_description()
assert "INTELLIGENT CODE REFACTORING" in description
assert "codesmells" in description
assert "decompose" in description
assert "modernize" in description
assert "organization" in description
def test_get_input_schema(self, refactor_tool):
"""Test that the input schema includes all required fields"""
schema = refactor_tool.get_input_schema()
assert schema["type"] == "object"
assert "files" in schema["properties"]
assert "prompt" in schema["properties"]
assert "refactor_type" in schema["properties"]
# Check refactor_type enum values
refactor_enum = schema["properties"]["refactor_type"]["enum"]
expected_types = ["codesmells", "decompose", "modernize", "organization"]
assert all(rt in refactor_enum for rt in expected_types)
def test_language_detection_python(self, refactor_tool):
"""Test language detection for Python files"""
files = ["/test/file1.py", "/test/file2.py", "/test/utils.py"]
language = refactor_tool.detect_primary_language(files)
assert language == "python"
def test_language_detection_javascript(self, refactor_tool):
"""Test language detection for JavaScript files"""
files = ["/test/app.js", "/test/component.jsx", "/test/utils.js"]
language = refactor_tool.detect_primary_language(files)
assert language == "javascript"
def test_language_detection_mixed(self, refactor_tool):
"""Test language detection for mixed language files"""
files = ["/test/app.py", "/test/script.js", "/test/main.java"]
language = refactor_tool.detect_primary_language(files)
assert language == "mixed"
def test_language_detection_unknown(self, refactor_tool):
"""Test language detection for unknown file types"""
files = ["/test/data.txt", "/test/config.json"]
language = refactor_tool.detect_primary_language(files)
assert language == "unknown"
def test_language_specific_guidance_python(self, refactor_tool):
"""Test language-specific guidance for Python modernization"""
guidance = refactor_tool.get_language_specific_guidance("python", "modernize")
assert "f-strings" in guidance
assert "dataclasses" in guidance
assert "type hints" in guidance
def test_language_specific_guidance_javascript(self, refactor_tool):
"""Test language-specific guidance for JavaScript modernization"""
guidance = refactor_tool.get_language_specific_guidance("javascript", "modernize")
assert "async/await" in guidance
assert "destructuring" in guidance
assert "arrow functions" in guidance
def test_language_specific_guidance_unknown(self, refactor_tool):
"""Test language-specific guidance for unknown languages"""
guidance = refactor_tool.get_language_specific_guidance("unknown", "modernize")
assert guidance == ""
@pytest.mark.asyncio
async def test_execute_basic_refactor(self, refactor_tool, mock_model_response):
"""Test basic refactor tool execution"""
with patch.object(refactor_tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="test")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = mock_model_response()
mock_get_provider.return_value = mock_provider
# Mock file processing
with patch.object(refactor_tool, "_prepare_file_content_for_prompt") as mock_prepare:
mock_prepare.return_value = "def test(): pass"
result = await refactor_tool.execute(
{
"files": ["/test/file.py"],
"prompt": "Find code smells in this Python code",
"refactor_type": "codesmells",
}
)
assert len(result) == 1
output = json.loads(result[0].text)
assert output["status"] == "success"
# The format_response method adds markdown instructions, so content_type should be "markdown"
# It could also be "json" or "text" depending on the response format
assert output["content_type"] in ["json", "text", "markdown"]
@pytest.mark.asyncio
async def test_execute_with_style_guide(self, refactor_tool, mock_model_response):
"""Test refactor tool execution with style guide examples"""
with patch.object(refactor_tool, "get_model_provider") as mock_get_provider:
mock_provider = MagicMock()
mock_provider.get_provider_type.return_value = MagicMock(value="test")
mock_provider.supports_thinking_mode.return_value = False
mock_provider.generate_content.return_value = mock_model_response()
mock_get_provider.return_value = mock_provider
# Mock file processing
with patch.object(refactor_tool, "_prepare_file_content_for_prompt") as mock_prepare:
mock_prepare.return_value = "def example(): pass"
with patch.object(refactor_tool, "_process_style_guide_examples") as mock_style:
mock_style.return_value = ("# style guide content", "")
result = await refactor_tool.execute(
{
"files": ["/test/file.py"],
"prompt": "Modernize this code following our style guide",
"refactor_type": "modernize",
"style_guide_examples": ["/test/style_example.py"],
}
)
assert len(result) == 1
output = json.loads(result[0].text)
assert output["status"] == "success"
def test_format_response_valid_json(self, refactor_tool):
"""Test response formatting with valid structured JSON"""
valid_json_response = json.dumps(
{
"status": "refactor_analysis_complete",
"refactor_opportunities": [
{
"id": "test-001",
"type": "codesmells",
"severity": "medium",
"file": "/test.py",
"start_line": 1,
"end_line": 5,
"context_start_text": "def test():",
"context_end_text": " pass",
"issue": "Test issue",
"suggestion": "Test suggestion",
"rationale": "Test rationale",
"code_to_replace": "old code",
"replacement_code_snippet": "new code",
}
],
"priority_sequence": ["test-001"],
"next_actions_for_claude": [],
}
)
# Create a mock request
request = MagicMock()
request.refactor_type = "codesmells"
formatted = refactor_tool.format_response(valid_json_response, request)
# Should contain the original response plus implementation instructions
assert valid_json_response in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
assert "Step 4: COMPLETE REFACTORING" in formatted # Not more_required, so should be COMPLETE
def test_format_response_invalid_json(self, refactor_tool):
"""Test response formatting with invalid JSON - now handled by base tool"""
invalid_response = "This is not JSON content"
# Create a mock request
request = MagicMock()
request.refactor_type = "codesmells"
formatted = refactor_tool.format_response(invalid_response, request)
# Should contain the original response plus implementation instructions
assert invalid_response in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
def test_model_category(self, refactor_tool):
"""Test that the refactor tool uses EXTENDED_REASONING category"""
from tools.models import ToolModelCategory
category = refactor_tool.get_model_category()
assert category == ToolModelCategory.EXTENDED_REASONING
def test_default_temperature(self, refactor_tool):
"""Test that the refactor tool uses analytical temperature"""
from config import TEMPERATURE_ANALYTICAL
temp = refactor_tool.get_default_temperature()
assert temp == TEMPERATURE_ANALYTICAL
def test_format_response_more_refactor_required(self, refactor_tool):
"""Test that format_response handles more_refactor_required status"""
more_refactor_response = json.dumps(
{
"status": "more_refactor_required",
"message": "Large codebase requires extensive refactoring across multiple files",
}
)
# Create a mock request
request = MagicMock()
request.refactor_type = "decompose"
formatted = refactor_tool.format_response(more_refactor_response, request)
# Should contain the original response plus continuation instructions
assert more_refactor_response in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
assert "VERIFY CHANGES WORK" in formatted
assert "Step 4: CONTINUE WITH MORE REFACTORING" in formatted # more_required, so should be CONTINUE
assert "continuation_id" in formatted
assert "immediately continue with more refactoring analysis" in formatted
class TestFileUtilsLineNumbers:
"""Test suite for line numbering functionality in file_utils"""
def test_read_file_content_with_line_numbers(self, project_path):
"""Test reading file content with line numbers enabled"""
# Create a test file within the workspace
temp_path = project_path / "test_file.py"
with open(temp_path, "w") as f:
f.write("def hello():\n print('Hello')\n return True")
# Read with line numbers explicitly enabled
content, tokens = read_file_content(str(temp_path), include_line_numbers=True)
# Check that line numbers are present
assert "1│ def hello():" in content
assert "2│ print('Hello')" in content
assert "3│ return True" in content
assert "--- BEGIN FILE:" in content
assert "--- END FILE:" in content
def test_read_file_content_without_line_numbers(self, project_path):
"""Test reading file content with line numbers disabled"""
# Create a test file within the workspace
temp_path = project_path / "test_file.txt"
with open(temp_path, "w") as f:
f.write("Line 1\nLine 2\nLine 3")
# Read with line numbers explicitly disabled
content, tokens = read_file_content(str(temp_path), include_line_numbers=False)
# Check that line numbers are NOT present
assert "1│" not in content
assert "Line 1" in content
assert "Line 2" in content
assert "--- BEGIN FILE:" in content
def test_read_file_content_auto_detect_programming(self, project_path):
"""Test that auto-detection is OFF by default (backwards compatibility)"""
# Create a test file within the workspace
temp_path = project_path / "test_auto.py"
with open(temp_path, "w") as f:
f.write("import os\nprint('test')")
# Read without specifying line numbers (should NOT auto-detect for backwards compatibility)
content, tokens = read_file_content(str(temp_path))
# Should NOT automatically add line numbers for .py files (default behavior)
assert "1│" not in content
assert "import os" in content
assert "print('test')" in content
def test_read_file_content_auto_detect_text(self, project_path):
"""Test auto-detection of line numbers for text files"""
# Create a test file within the workspace
temp_path = project_path / "test_auto.txt"
with open(temp_path, "w") as f:
f.write("This is a text file\nWith multiple lines")
# Read without specifying line numbers (should auto-detect)
content, tokens = read_file_content(str(temp_path))
# Should NOT automatically add line numbers for .txt files
assert "1│" not in content
assert "This is a text file" in content
def test_line_ending_normalization(self):
"""Test that different line endings are normalized consistently"""
from utils.file_utils import _add_line_numbers, _normalize_line_endings
# Test different line ending formats
content_crlf = "Line 1\r\nLine 2\r\nLine 3"
content_cr = "Line 1\rLine 2\rLine 3"
content_lf = "Line 1\nLine 2\nLine 3"
# All should normalize to the same result
normalized_crlf = _normalize_line_endings(content_crlf)
normalized_cr = _normalize_line_endings(content_cr)
normalized_lf = _normalize_line_endings(content_lf)
assert normalized_crlf == normalized_cr == normalized_lf
assert normalized_lf == "Line 1\nLine 2\nLine 3"
# Line numbering should work consistently
numbered = _add_line_numbers(content_crlf)
assert " 1│ Line 1" in numbered
assert " 2│ Line 2" in numbered
assert " 3│ Line 3" in numbered
def test_detect_file_type(self):
"""Test file type detection"""
from utils.file_utils import detect_file_type
# Test programming language files
assert detect_file_type("test.py") == "text"
assert detect_file_type("test.js") == "text"
assert detect_file_type("test.java") == "text"
# Test image files
assert detect_file_type("image.png") == "image"
assert detect_file_type("photo.jpg") == "image"
# Test binary files
assert detect_file_type("program.exe") == "binary"
assert detect_file_type("library.dll") == "binary"
def test_should_add_line_numbers(self):
"""Test line number detection logic"""
from utils.file_utils import should_add_line_numbers
# NO files should get line numbers by default (backwards compatibility)
assert not should_add_line_numbers("test.py")
assert not should_add_line_numbers("app.js")
assert not should_add_line_numbers("Main.java")
assert not should_add_line_numbers("readme.txt")
assert not should_add_line_numbers("data.csv")
# Explicit override should work
assert should_add_line_numbers("readme.txt", True)
assert not should_add_line_numbers("test.py", False)
def test_line_numbers_double_triple_digits(self, project_path):
"""Test line numbering with double and triple digit line numbers"""
from utils.file_utils import _add_line_numbers
# Create content with many lines to test double and triple digit formatting
lines = []
for i in range(1, 125): # Lines 1-124 for testing up to triple digits
if i < 10:
lines.append(f"# Single digit line {i}")
elif i < 100:
lines.append(f"# Double digit line {i}")
else:
lines.append(f"# Triple digit line {i}")
content = "\n".join(lines)
numbered_content = _add_line_numbers(content)
# Test single digit formatting (should be right-aligned with spaces)
assert " 1│ # Single digit line 1" in numbered_content
assert " 9│ # Single digit line 9" in numbered_content
# Test double digit formatting (should be right-aligned)
assert " 10│ # Double digit line 10" in numbered_content # Line 10 has "double digit" content
assert " 50│ # Double digit line 50" in numbered_content
assert " 99│ # Double digit line 99" in numbered_content
# Test triple digit formatting (should be right-aligned)
assert " 100│ # Triple digit line 100" in numbered_content
assert " 124│ # Triple digit line 124" in numbered_content
# Verify consistent alignment - all line numbers should end with "│ "
lines_with_numbers = numbered_content.split("\n")
for line in lines_with_numbers:
if "" in line:
# Find the pipe character position
pipe_pos = line.find("")
# Ensure the character before pipe is a digit
assert line[pipe_pos - 1].isdigit(), f"Line format issue: {line}"
# Ensure the character after pipe is a space
assert line[pipe_pos + 1] == " ", f"Line format issue: {line}"
def test_line_numbers_with_file_reading(self, project_path):
"""Test line numbering through file reading with large file"""
# Create a test file with 150 functions (600 total lines: 4 lines per function)
temp_path = project_path / "large_test_file.py"
with open(temp_path, "w") as f:
for i in range(1, 151): # Functions 1-150
f.write(f"def function_{i}():\n")
f.write(f" # This is function number {i}\n")
f.write(f" return {i}\n")
f.write("\n")
# Read with line numbers enabled
content, tokens = read_file_content(str(temp_path), include_line_numbers=True)
# Calculate actual line numbers based on file structure (4 lines per function)
# Function 1: lines 1-4, Function 2: lines 5-8, etc.
# Line 1: def function_1():
# Line 2: # This is function number 1
# Line 3: return 1
# Line 4: (empty)
# Test various line number formats in the actual file content
assert " 1│ def function_1():" in content
# Function 13 starts at line 49 (12*4 + 1), so line 50 is " # This is function number 13"
assert " 50│ # This is function number 13" in content
# Line 100 is actually an empty line after function 25 (line 99 was "return 25")
assert " 100│ " in content # Empty line
# Line 99 is "return 25" from function 25
assert " 99│ return 25" in content
# Test more line numbers - line 147 is "return 37" from function 37
assert " 147│ return 37" in content
# Test that we have the final lines (600 total lines)
assert " 599│ return 150" in content
assert " 600│ " in content # Final empty line
# Verify the file structure is preserved
assert "--- BEGIN FILE:" in content
assert "--- END FILE:" in content
assert str(temp_path) in content
def test_line_numbers_large_files_22k_lines(self, project_path):
"""Test line numbering for very large files (22,500+ lines)"""
from utils.file_utils import _add_line_numbers
# Create content simulating a very large file with 25,000 lines
lines = []
for i in range(1, 25001): # Lines 1-25000
lines.append(f"// Large file line {i}")
content = "\n".join(lines)
numbered_content = _add_line_numbers(content)
# Test that width dynamically adjusts to 5 digits for large files
# Small line numbers should now have 5-digit width
assert " 1│ // Large file line 1" in numbered_content
assert " 9│ // Large file line 9" in numbered_content
assert " 10│ // Large file line 10" in numbered_content
assert " 99│ // Large file line 99" in numbered_content
assert " 100│ // Large file line 100" in numbered_content
assert " 999│ // Large file line 999" in numbered_content
assert " 1000│ // Large file line 1000" in numbered_content
assert " 9999│ // Large file line 9999" in numbered_content
assert "10000│ // Large file line 10000" in numbered_content
assert "22500│ // Large file line 22500" in numbered_content
assert "25000│ // Large file line 25000" in numbered_content
# Verify consistent alignment - all line numbers should end with "│ "
lines_with_numbers = numbered_content.split("\n")
for i, line in enumerate(lines_with_numbers[:100]): # Check first 100 lines
if "" in line:
pipe_pos = line.find("")
# For large files, should be 5-character width plus pipe
assert line[pipe_pos - 1].isdigit(), f"Line {i+1} format issue: {line}"
assert line[pipe_pos + 1] == " ", f"Line {i+1} format issue: {line}"
def test_line_numbers_boundary_conditions(self):
"""Test line numbering at boundary conditions (9999 vs 10000 lines)"""
from utils.file_utils import _add_line_numbers
# Test exactly 9999 lines (should use 4-digit width)
lines_9999 = [f"Line {i}" for i in range(1, 10000)] # 9999 lines
content_9999 = "\n".join(lines_9999)
numbered_9999 = _add_line_numbers(content_9999)
# Should use 4-digit format
assert " 1│ Line 1" in numbered_9999
assert "9999│ Line 9999" in numbered_9999
# Test exactly 10000 lines (should use 5-digit width)
lines_10000 = [f"Line {i}" for i in range(1, 10001)] # 10000 lines
content_10000 = "\n".join(lines_10000)
numbered_10000 = _add_line_numbers(content_10000)
# Should use 5-digit format
assert " 1│ Line 1" in numbered_10000
assert "10000│ Line 10000" in numbered_10000
if __name__ == "__main__":
pytest.main([__file__, "-v"])
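The gutter format that `TestFileUtilsLineNumbers` asserts throughout (right-aligned number, `│` separator, 4-digit width up to 9,999 lines, widening beyond that) reduces to a few lines. A hedged sketch of what `_add_line_numbers` is assumed to do, not the actual implementation:

```python
def _add_line_numbers(content: str) -> str:
    lines = content.split("\n")
    # 4-digit gutter up to 9,999 lines; grows to 5+ digits for larger files,
    # matching the boundary-condition tests above
    width = max(4, len(str(len(lines))))
    return "\n".join(f"{i:>{width}}│ {line}" for i, line in enumerate(lines, start=1))
```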
@@ -27,10 +27,11 @@ class TestServerTools:
        assert "chat" in tool_names
        assert "precommit" in tool_names
        assert "testgen" in tool_names
+       assert "refactor" in tool_names
        assert "version" in tool_names

-       # Should have exactly 8 tools (including testgen)
-       assert len(tools) == 8
+       # Should have exactly 9 tools (including refactor)
+       assert len(tools) == 9

        # Check descriptions are verbose
        for tool in tools:
@@ -261,3 +261,95 @@ class TestSpecialStatusParsing:
        # Should fall back to normal response since validation failed
        assert result.status == "success"
        assert result.content_type == "text"
def test_refactor_analysis_complete_parsing(self):
"""Test that RefactorAnalysisComplete status is properly parsed"""
import json
json_response = {
"status": "refactor_analysis_complete",
"refactor_opportunities": [
{
"id": "refactor-001",
"type": "decompose",
"severity": "critical",
"file": "/test.py",
"start_line": 1,
"end_line": 5,
"context_start_text": "def test():",
"context_end_text": " pass",
"issue": "Large function needs decomposition",
"suggestion": "Extract helper methods",
"rationale": "Improves readability",
"code_to_replace": "old code",
"replacement_code_snippet": "new code",
}
],
"priority_sequence": ["refactor-001"],
"next_actions_for_claude": [
{
"action_type": "EXTRACT_METHOD",
"target_file": "/test.py",
"source_lines": "1-5",
"description": "Extract helper method",
}
],
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
assert result.status == "refactor_analysis_complete"
assert result.content_type == "json"
parsed_content = json.loads(result.content)
assert "refactor_opportunities" in parsed_content
assert len(parsed_content["refactor_opportunities"]) == 1
assert parsed_content["refactor_opportunities"][0]["id"] == "refactor-001"
def test_refactor_analysis_complete_validation_error(self):
"""Test that RefactorAnalysisComplete validation catches missing required fields"""
import json
json_response = {
"status": "refactor_analysis_complete",
"refactor_opportunities": [
{
"id": "refactor-001",
# Missing required fields like type, severity, etc.
}
],
"priority_sequence": ["refactor-001"],
"next_actions_for_claude": [],
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
# Should fall back to normal response since validation failed
assert result.status == "success"
assert result.content_type == "text"
def test_more_refactor_required_parsing(self):
"""Test that more_refactor_required status is parsed correctly"""
import json
json_response = {
"status": "more_refactor_required",
"message": "Large codebase requires extensive decomposition across 15 files. Continuing analysis for remaining modules.",
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
assert result.status == "more_refactor_required"
assert result.content_type == "json"
parsed_content = json.loads(result.content)
assert parsed_content["status"] == "more_refactor_required"
assert "Large codebase requires extensive decomposition" in parsed_content["message"]
def test_more_refactor_required_missing_message(self):
"""Test that more_refactor_required without required message field fails validation"""
response_json = '{"status": "more_refactor_required"}'
result = self.tool._parse_response(response_json, self.request)
# Should fall back to normal processing since validation failed (missing required field)
assert result.status == "success"
assert result.content_type == "text"
@@ -7,6 +7,7 @@ from .chat import ChatTool
from .codereview import CodeReviewTool
from .debug import DebugIssueTool
from .precommit import Precommit
+from .refactor import RefactorTool
from .testgen import TestGenTool
from .thinkdeep import ThinkDeepTool
@@ -17,5 +18,6 @@ __all__ = [
    "AnalyzeTool",
    "ChatTool",
    "Precommit",
+   "RefactorTool",
    "TestGenTool",
]
@@ -207,9 +207,7 @@ class BaseTool(ABC):
        provider = ModelProviderRegistry.get_provider_for_model(model_name)
        if not provider:
            logger = logging.getLogger(f"tools.{self.name}")
-           logger.warning(
-               f"Model '{model_name}' is not available with current API keys. " f"Requiring model selection."
-           )
+           logger.warning(f"Model '{model_name}' is not available with current API keys. Requiring model selection.")
            return True
        return False
@@ -397,6 +395,25 @@
        """
        return 0.5
def wants_line_numbers_by_default(self) -> bool:
"""
Return whether this tool wants line numbers added to code files by default.
Tools that benefit from precise line references (refactor, codereview, debug)
should return True. Tools that prioritize token efficiency or don't need
precise references can return False.
Line numbers add ~8-10% token overhead but provide precise targeting for:
- Code review feedback ("SQL injection on line 45")
- Debug error locations ("Memory leak in loop at lines 123-156")
- Test generation targets ("Generate tests for method at lines 78-95")
- Refactoring guidance ("Extract method from lines 67-89")
Returns:
bool: True if line numbers should be added by default for this tool
"""
return False # Conservative default - tools opt-in as needed
    def get_default_thinking_mode(self) -> str:
        """
        Return the default thinking mode for this tool.
@@ -694,7 +711,10 @@
            )
        try:
            file_content = read_files(
-               files_to_embed, max_tokens=effective_max_tokens + reserve_tokens, reserve_tokens=reserve_tokens
+               files_to_embed,
+               max_tokens=effective_max_tokens + reserve_tokens,
+               reserve_tokens=reserve_tokens,
+               include_line_numbers=self.wants_line_numbers_by_default(),
            )
            self._validate_token_limit(file_content, context_description)
            content_parts.append(file_content)
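With the hook threaded through `read_files`, a tool's choice is visible to callers only via the flag. A small usage sketch against `read_file_content` (the path below is a placeholder):

```python
from utils.file_utils import read_file_content

path = "/workspace/src/example.py"  # hypothetical absolute path

plain, _ = read_file_content(path, include_line_numbers=False)
numbered, _ = read_file_content(path, include_line_numbers=True)
# `numbered` carries the "  42│ ..." gutter that refactor, codereview, and
# debug rely on for line-precise findings; `plain` avoids the ~8-10% overhead
```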
@@ -148,6 +148,10 @@ class CodeReviewTool(BaseTool):
    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

+   def wants_line_numbers_by_default(self) -> bool:
+       """Code review tool needs line numbers for precise feedback"""
+       return True
+
    def get_request_model(self):
        return CodeReviewRequest
@@ -111,6 +111,10 @@ class DebugIssueTool(BaseTool):
    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

+   def wants_line_numbers_by_default(self) -> bool:
+       """Debug tool needs line numbers for precise error location"""
+       return True
+
    def get_model_category(self) -> "ToolModelCategory":
        """Debug requires deep analysis and reasoning"""
        from tools.models import ToolModelCategory
@@ -40,6 +40,8 @@ class ToolOutput(BaseModel):
        "focused_review_required",
        "test_sample_needed",
        "more_tests_required",
+       "more_refactor_required",
+       "refactor_analysis_complete",
        "resend_prompt",
        "continuation_available",
    ] = "success"
@@ -97,6 +99,56 @@ class MoreTestsRequired(BaseModel):
    pending_tests: str = Field(..., description="List of pending tests to be generated")
class MoreRefactorRequired(BaseModel):
"""Request for continuation when refactoring requires extensive changes"""
status: Literal["more_refactor_required"] = "more_refactor_required"
message: str = Field(..., description="Explanation of why more refactoring is needed and what remains to be done")
class RefactorOpportunity(BaseModel):
"""A single refactoring opportunity with precise targeting information"""
id: str = Field(..., description="Unique identifier for this refactoring opportunity")
type: Literal["decompose", "codesmells", "modernize", "organization"] = Field(
..., description="Type of refactoring"
)
severity: Literal["critical", "high", "medium", "low"] = Field(..., description="Severity level")
file: str = Field(..., description="Absolute path to the file")
start_line: int = Field(..., description="Starting line number")
end_line: int = Field(..., description="Ending line number")
context_start_text: str = Field(..., description="Exact text from start line for verification")
context_end_text: str = Field(..., description="Exact text from end line for verification")
issue: str = Field(..., description="Clear description of what needs refactoring")
suggestion: str = Field(..., description="Specific refactoring action to take")
rationale: str = Field(..., description="Why this improves the code")
code_to_replace: str = Field(..., description="Original code that should be changed")
replacement_code_snippet: str = Field(..., description="Refactored version of the code")
new_code_snippets: Optional[list[dict]] = Field(
default_factory=list, description="Additional code snippets to be added"
)
class RefactorAction(BaseModel):
"""Next action for Claude to implement refactoring"""
action_type: Literal["EXTRACT_METHOD", "SPLIT_CLASS", "MODERNIZE_SYNTAX", "REORGANIZE_CODE", "DECOMPOSE_FILE"] = (
Field(..., description="Type of action to perform")
)
target_file: str = Field(..., description="Absolute path to target file")
source_lines: str = Field(..., description="Line range (e.g., '45-67')")
description: str = Field(..., description="Step-by-step action description for Claude")
class RefactorAnalysisComplete(BaseModel):
"""Complete refactor analysis with prioritized opportunities"""
status: Literal["refactor_analysis_complete"] = "refactor_analysis_complete"
refactor_opportunities: list[RefactorOpportunity] = Field(..., description="List of refactoring opportunities")
priority_sequence: list[str] = Field(..., description="Recommended order of refactoring IDs")
next_actions_for_claude: list[RefactorAction] = Field(..., description="Specific actions for Claude to implement")
# Registry mapping status strings to their corresponding Pydantic models
SPECIAL_STATUS_MODELS = {
    "clarification_required": ClarificationRequest,
@@ -104,6 +156,8 @@ SPECIAL_STATUS_MODELS = {
    "focused_review_required": FocusedReviewRequired,
    "test_sample_needed": TestSampleNeeded,
    "more_tests_required": MoreTestsRequired,
+   "more_refactor_required": MoreRefactorRequired,
+   "refactor_analysis_complete": RefactorAnalysisComplete,
}
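A minimal sketch of how this registry is consumed (the real dispatch lives in `BaseTool._parse_response`), showing why the validation-failure tests above fall back to a plain success/text response:

```python
import json

from pydantic import ValidationError

from tools.models import SPECIAL_STATUS_MODELS


def try_parse_special_status(raw: str):
    """Return a validated status model, or None to fall back to normal handling."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON: ordinary text response
    model = SPECIAL_STATUS_MODELS.get(data.get("status"))
    if model is None:
        return None  # unknown status: normal response path
    try:
        return model(**data)  # e.g. RefactorAnalysisComplete, MoreRefactorRequired
    except ValidationError:
        return None  # missing required fields: fall back, as the tests assert
```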
tools/refactor.py (new file)
@@ -0,0 +1,653 @@
"""
Refactor tool - Intelligent code refactoring suggestions with precise line-number references
This tool analyzes code for refactoring opportunities across four main categories:
- codesmells: Detect and suggest fixes for common code smells
- decompose: Break down large functions, classes, and modules into smaller, focused components
- modernize: Update code to use modern language features and patterns
- organization: Suggest better organization and logical grouping of related functionality
Key Features:
- Cross-language support with language-specific guidance
- Precise line-number references for Claude
- Large context handling with token budgeting
- Structured JSON responses for easy parsing
- Style guide integration for project-specific patterns
"""
import logging
import os
from typing import Any, Literal, Optional
from mcp.types import TextContent
from pydantic import Field
from config import TEMPERATURE_ANALYTICAL
from systemprompts import REFACTOR_PROMPT
from utils.file_utils import translate_file_paths
from .base import BaseTool, ToolRequest
from .models import ToolOutput
logger = logging.getLogger(__name__)
class RefactorRequest(ToolRequest):
"""
Request model for the refactor tool.
This model defines all parameters that can be used to customize
the refactoring analysis process.
"""
files: list[str] = Field(
...,
description="Code files or directories to analyze for refactoring opportunities (must be absolute paths)",
)
prompt: str = Field(
...,
description="Description of refactoring goals, context, and specific areas of focus",
)
refactor_type: Literal["codesmells", "decompose", "modernize", "organization"] = Field(
..., description="Type of refactoring analysis to perform"
)
focus_areas: Optional[list[str]] = Field(
None,
description="Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')",
)
style_guide_examples: Optional[list[str]] = Field(
None,
description=(
"Optional existing code files to use as style/pattern reference (must be absolute paths). "
"These files represent the target coding style and patterns for the project. "
"Particularly useful for 'modernize' and 'organization' refactor types."
),
)
class RefactorTool(BaseTool):
"""
Refactor tool implementation.
This tool analyzes code to provide intelligent refactoring suggestions
with precise line-number references for Claude to implement.
"""
def get_name(self) -> str:
return "refactor"
def get_description(self) -> str:
return (
"INTELLIGENT CODE REFACTORING - Analyzes code for refactoring opportunities with precise line-number guidance. "
"Supports four refactor types: 'codesmells' (detect anti-patterns), 'decompose' (break down large functions/classes/modules into smaller components), "
"'modernize' (update to modern language features), and 'organization' (improve organization and grouping of related functionality). "
"Provides specific, actionable refactoring steps that Claude can implement directly. "
"Choose thinking_mode based on codebase complexity: 'medium' for standard modules (default), "
"'high' for complex systems, 'max' for legacy codebases requiring deep analysis. "
"Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
)
def get_input_schema(self) -> dict[str, Any]:
schema = {
"type": "object",
"properties": {
"files": {
"type": "array",
"items": {"type": "string"},
"description": "Code files or directories to analyze for refactoring opportunities (must be absolute paths)",
},
"model": self.get_model_field_schema(),
"prompt": {
"type": "string",
"description": "Description of refactoring goals, context, and specific areas of focus",
},
"refactor_type": {
"type": "string",
"enum": ["codesmells", "decompose", "modernize", "organization"],
"description": "Type of refactoring analysis to perform",
},
"focus_areas": {
"type": "array",
"items": {"type": "string"},
"description": "Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')",
},
"style_guide_examples": {
"type": "array",
"items": {"type": "string"},
"description": (
"Optional existing code files to use as style/pattern reference (must be absolute paths). "
"These files represent the target coding style and patterns for the project."
),
},
"thinking_mode": {
"type": "string",
"enum": ["minimal", "low", "medium", "high", "max"],
"description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
},
"continuation_id": {
"type": "string",
"description": (
"Thread continuation ID for multi-turn conversations. Can be used to continue conversations "
"across different tools. Only provide this if continuing a previous conversation thread."
),
},
},
"required": ["files", "prompt", "refactor_type"] + (["model"] if self.is_effective_auto_mode() else []),
}
return schema
def get_system_prompt(self) -> str:
return REFACTOR_PROMPT
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
def wants_line_numbers_by_default(self) -> bool:
"""Refactor tool needs line numbers for precise targeting"""
return True
def get_model_category(self):
"""Refactor tool requires extended reasoning for comprehensive analysis"""
from tools.models import ToolModelCategory
return ToolModelCategory.EXTENDED_REASONING
def get_request_model(self):
return RefactorRequest
async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
"""Override execute to check prompt size before processing"""
logger.info(f"[REFACTOR] execute called with arguments: {list(arguments.keys())}")
# First validate request
request_model = self.get_request_model()
request = request_model(**arguments)
# Check prompt size if provided
if request.prompt:
size_check = self.check_prompt_size(request.prompt)
if size_check:
logger.info(f"[REFACTOR] Prompt size check triggered, returning early")
return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]
logger.info(f"[REFACTOR] Prompt size OK, calling super().execute()")
# Continue with normal execution
return await super().execute(arguments)
def detect_primary_language(self, file_paths: list[str]) -> str:
"""
Detect the primary programming language from file extensions.
Args:
file_paths: List of file paths to analyze
Returns:
str: Detected language or "mixed" if multiple languages found
"""
# Language detection based on file extensions
language_extensions = {
"python": {".py"},
"javascript": {".js", ".jsx", ".mjs"},
"typescript": {".ts", ".tsx"},
"java": {".java"},
"csharp": {".cs"},
"cpp": {".cpp", ".cc", ".cxx", ".c", ".h", ".hpp"},
"go": {".go"},
"rust": {".rs"},
"swift": {".swift"},
"kotlin": {".kt"},
"ruby": {".rb"},
"php": {".php"},
"scala": {".scala"},
}
# Count files by language
language_counts = {}
for file_path in file_paths:
extension = os.path.splitext(file_path.lower())[1]
for lang, exts in language_extensions.items():
if extension in exts:
language_counts[lang] = language_counts.get(lang, 0) + 1
break
if not language_counts:
return "unknown"
# Return most common language, or "mixed" if multiple languages
max_count = max(language_counts.values())
dominant_languages = [lang for lang, count in language_counts.items() if count == max_count]
if len(dominant_languages) == 1:
return dominant_languages[0]
else:
return "mixed"
def get_language_specific_guidance(self, language: str, refactor_type: str) -> str:
"""
Generate language-specific guidance for the refactoring prompt.
Args:
language: Detected programming language
refactor_type: Type of refactoring being performed
Returns:
str: Language-specific guidance to inject into the prompt
"""
if language == "unknown" or language == "mixed":
return ""
# Language-specific modernization features
modernization_features = {
"python": "f-strings, dataclasses, type hints, pathlib, async/await, context managers, list/dict comprehensions, walrus operator",
"javascript": "async/await, destructuring, arrow functions, template literals, optional chaining, nullish coalescing, modules (import/export)",
"typescript": "strict type checking, utility types, const assertions, template literal types, mapped types",
"java": "streams API, lambda expressions, optional, records, pattern matching, var declarations, text blocks",
"csharp": "LINQ, nullable reference types, pattern matching, records, async streams, using declarations",
"swift": "value types, protocol-oriented programming, property wrappers, result builders, async/await",
"go": "modules, error wrapping, context package, generics (Go 1.18+)",
"rust": "ownership patterns, iterator adapters, error handling with Result, async/await",
}
# Language-specific code splitting patterns
splitting_patterns = {
"python": "modules, classes, functions, decorators for cross-cutting concerns",
"javascript": "modules (ES6), classes, functions, higher-order functions",
"java": "packages, classes, interfaces, abstract classes, composition over inheritance",
"csharp": "namespaces, classes, interfaces, extension methods, dependency injection",
"swift": "extensions, protocols, structs, enums with associated values",
"go": "packages, interfaces, struct composition, function types",
}
guidance_parts = []
if refactor_type == "modernize" and language in modernization_features:
guidance_parts.append(
f"LANGUAGE-SPECIFIC MODERNIZATION ({language.upper()}): Focus on {modernization_features[language]}"
)
if refactor_type == "decompose" and language in splitting_patterns:
guidance_parts.append(
f"LANGUAGE-SPECIFIC DECOMPOSITION ({language.upper()}): Use {splitting_patterns[language]} to break down large components"
)
# General language guidance
general_guidance = {
"python": "Follow PEP 8, use descriptive names, prefer composition over inheritance",
"javascript": "Use consistent naming conventions, avoid global variables, prefer functional patterns",
"java": "Follow Java naming conventions, use interfaces for abstraction, consider immutability",
"csharp": "Follow C# naming conventions, use nullable reference types, prefer async methods",
}
if language in general_guidance:
guidance_parts.append(f"GENERAL GUIDANCE ({language.upper()}): {general_guidance[language]}")
return "\n".join(guidance_parts) if guidance_parts else ""
    def _process_style_guide_examples(
        self, style_examples: list[str], continuation_id: Optional[str], available_tokens: Optional[int] = None
    ) -> tuple[str, str]:
"""
Process style guide example files using available token budget.
Args:
style_examples: List of style guide file paths
continuation_id: Continuation ID for filtering already embedded files
available_tokens: Available token budget for examples
Returns:
tuple: (formatted_content, summary_note)
"""
logger.debug(f"[REFACTOR] Processing {len(style_examples)} style guide examples")
if not style_examples:
logger.debug("[REFACTOR] No style guide examples provided")
return "", ""
# Use existing file filtering to avoid duplicates in continuation
examples_to_process = self.filter_new_files(style_examples, continuation_id)
logger.debug(f"[REFACTOR] After filtering: {len(examples_to_process)} new style examples to process")
if not examples_to_process:
logger.info(f"[REFACTOR] All {len(style_examples)} style examples already in conversation history")
return "", ""
# Translate file paths for Docker environment before accessing files
translated_examples = translate_file_paths(examples_to_process)
logger.debug(f"[REFACTOR] Translated {len(examples_to_process)} file paths for container access")
# Calculate token budget for style examples (20% of available tokens, or fallback)
if available_tokens:
style_examples_budget = int(available_tokens * 0.20) # 20% for style examples
logger.debug(
f"[REFACTOR] Allocating {style_examples_budget:,} tokens (20% of {available_tokens:,}) for style examples"
)
else:
style_examples_budget = 25000 # Fallback if no budget provided
logger.debug(f"[REFACTOR] Using fallback budget of {style_examples_budget:,} tokens for style examples")
original_count = len(examples_to_process)
logger.debug(
f"[REFACTOR] Processing {original_count} style example files with {style_examples_budget:,} token budget"
)
# Sort by file size (smallest first) for pattern-focused selection
file_sizes = []
for i, file_path in enumerate(examples_to_process):
translated_path = translated_examples[i]
try:
size = os.path.getsize(translated_path)
file_sizes.append((file_path, size))
logger.debug(f"[REFACTOR] Style example {os.path.basename(file_path)}: {size:,} bytes")
except (OSError, FileNotFoundError) as e:
logger.warning(f"[REFACTOR] Could not get size for {file_path}: {e}")
file_sizes.append((file_path, float("inf")))
# Sort by size and take smallest files for pattern reference
file_sizes.sort(key=lambda x: x[1])
examples_to_process = [f[0] for f in file_sizes]
logger.debug(
f"[REFACTOR] Sorted style examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}"
)
# Use standard file content preparation with dynamic token budget and line numbers
try:
logger.debug(f"[REFACTOR] Preparing file content for {len(examples_to_process)} style examples")
content = self._prepare_file_content_for_prompt(
examples_to_process,
continuation_id,
"Style guide examples",
max_tokens=style_examples_budget,
reserve_tokens=1000,
)
# Determine how many files were actually included
if content:
from utils.token_utils import estimate_tokens
used_tokens = estimate_tokens(content)
logger.info(
f"[REFACTOR] Successfully embedded style examples: {used_tokens:,} tokens used ({style_examples_budget:,} available)"
)
if original_count > 1:
truncation_note = f"Note: Used {used_tokens:,} tokens ({style_examples_budget:,} available) for style guide examples from {original_count} files to determine coding patterns."
else:
truncation_note = ""
else:
logger.warning("[REFACTOR] No content generated for style examples")
truncation_note = ""
return content, truncation_note
except Exception as e:
# If style example processing fails, continue without examples rather than failing
logger.error(f"[REFACTOR] Failed to process style examples: {type(e).__name__}: {e}")
return "", f"Warning: Could not process style guide examples: {str(e)}"
async def prepare_prompt(self, request: RefactorRequest) -> str:
"""
Prepare the refactoring prompt with code analysis and optional style examples.
This method reads the requested files, processes any style guide examples,
and constructs a detailed prompt for comprehensive refactoring analysis.
Args:
request: The validated refactor request
Returns:
str: Complete prompt for the model
Raises:
ValueError: If the code exceeds token limits
"""
logger.info(f"[REFACTOR] prepare_prompt called with {len(request.files)} files, type={request.refactor_type}")
logger.debug(f"[REFACTOR] Preparing prompt for {len(request.files)} code files")
logger.debug(f"[REFACTOR] Refactor type: {request.refactor_type}")
if request.style_guide_examples:
logger.debug(f"[REFACTOR] Including {len(request.style_guide_examples)} style guide examples")
# Check for prompt.txt in files
prompt_content, updated_files = self.handle_prompt_file(request.files)
# If prompt.txt was found, incorporate it into the prompt
if prompt_content:
logger.debug("[REFACTOR] Found prompt.txt file, incorporating content")
request.prompt = prompt_content + "\n\n" + request.prompt
# Update request files list
if updated_files is not None:
logger.debug(f"[REFACTOR] Updated files list after prompt.txt processing: {len(updated_files)} files")
request.files = updated_files
# Calculate available token budget for dynamic allocation
continuation_id = getattr(request, "continuation_id", None)
# Get model context for token budget calculation
model_name = getattr(self, "_current_model_name", None)
available_tokens = None
if model_name:
try:
provider = self.get_model_provider(model_name)
capabilities = provider.get_capabilities(model_name)
# Use 75% of context for content (code + style examples), 25% for response
available_tokens = int(capabilities.context_window * 0.75)
logger.debug(
f"[REFACTOR] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {model_name}"
)
except Exception as e:
# Fallback to conservative estimate
logger.warning(f"[REFACTOR] Could not get model capabilities for {model_name}: {e}")
available_tokens = 120000 # Conservative fallback
logger.debug(f"[REFACTOR] Using fallback token budget: {available_tokens:,} tokens")
# Process style guide examples first to determine token allocation
style_examples_content = ""
style_examples_note = ""
if request.style_guide_examples:
logger.debug(f"[REFACTOR] Processing {len(request.style_guide_examples)} style guide examples")
style_examples_content, style_examples_note = self._process_style_guide_examples(
request.style_guide_examples, continuation_id, available_tokens
)
if style_examples_content:
logger.info("[REFACTOR] Style guide examples processed successfully for pattern reference")
else:
logger.info("[REFACTOR] No style guide examples content after processing")
# Remove files that appear in both 'files' and 'style_guide_examples' to avoid duplicate embedding
code_files_to_process = request.files.copy()
if request.style_guide_examples:
# Normalize paths for comparison
style_example_set = {os.path.normpath(os.path.abspath(f)) for f in request.style_guide_examples}
original_count = len(code_files_to_process)
code_files_to_process = [
f for f in code_files_to_process if os.path.normpath(os.path.abspath(f)) not in style_example_set
]
duplicates_removed = original_count - len(code_files_to_process)
if duplicates_removed > 0:
logger.info(
f"[REFACTOR] Removed {duplicates_removed} duplicate files from code files list "
f"(already included in style guide examples for pattern reference)"
)
# Calculate remaining tokens for main code after style examples
if style_examples_content and available_tokens:
from utils.token_utils import estimate_tokens
style_tokens = estimate_tokens(style_examples_content)
remaining_tokens = available_tokens - style_tokens - 5000 # Reserve for prompt structure
logger.debug(
f"[REFACTOR] Token allocation: {style_tokens:,} for examples, {remaining_tokens:,} remaining for code files"
)
else:
if available_tokens:
remaining_tokens = available_tokens - 10000
else:
remaining_tokens = 110000 # Conservative fallback (120000 - 10000)
logger.debug(
f"[REFACTOR] Token allocation: {remaining_tokens:,} tokens available for code files (no style examples)"
)
# Use centralized file processing logic for main code files (with line numbers enabled)
logger.debug(f"[REFACTOR] Preparing {len(code_files_to_process)} code files for analysis")
code_content = self._prepare_file_content_for_prompt(
code_files_to_process, continuation_id, "Code to analyze", max_tokens=remaining_tokens, reserve_tokens=2000
)
if code_content:
from utils.token_utils import estimate_tokens
code_tokens = estimate_tokens(code_content)
logger.info(f"[REFACTOR] Code files embedded successfully: {code_tokens:,} tokens")
else:
logger.warning("[REFACTOR] No code content after file processing")
# Detect primary language for language-specific guidance
primary_language = self.detect_primary_language(request.files)
logger.debug(f"[REFACTOR] Detected primary language: {primary_language}")
# Get language-specific guidance
language_guidance = self.get_language_specific_guidance(primary_language, request.refactor_type)
# Build the complete prompt
prompt_parts = []
# Add system prompt with dynamic language guidance
base_system_prompt = self.get_system_prompt()
if language_guidance:
enhanced_system_prompt = f"{base_system_prompt}\n\n{language_guidance}"
else:
enhanced_system_prompt = base_system_prompt
prompt_parts.append(enhanced_system_prompt)
# Add user context
prompt_parts.append("=== USER CONTEXT ===")
prompt_parts.append(f"Refactor Type: {request.refactor_type}")
if request.focus_areas:
prompt_parts.append(f"Focus Areas: {', '.join(request.focus_areas)}")
prompt_parts.append(f"User Goals: {request.prompt}")
prompt_parts.append("=== END CONTEXT ===")
# Add style guide examples if provided
if style_examples_content:
prompt_parts.append("\n=== STYLE GUIDE EXAMPLES ===")
if style_examples_note:
prompt_parts.append(f"// {style_examples_note}")
prompt_parts.append(style_examples_content)
prompt_parts.append("=== END STYLE GUIDE EXAMPLES ===")
# Add main code to analyze
prompt_parts.append("\n=== CODE TO ANALYZE ===")
prompt_parts.append(code_content)
prompt_parts.append("=== END CODE ===")
# Add generation instructions
prompt_parts.append(
f"\nPlease analyze the code for {request.refactor_type} refactoring opportunities following the multi-expert workflow specified in the system prompt."
)
if style_examples_content:
prompt_parts.append(
"Use the provided style guide examples as a reference for target coding patterns and style."
)
full_prompt = "\n".join(prompt_parts)
# Log final prompt statistics
from utils.token_utils import estimate_tokens
total_tokens = estimate_tokens(full_prompt)
logger.info(f"[REFACTOR] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters")
return full_prompt
def format_response(self, response: str, request: RefactorRequest, model_info: Optional[dict] = None) -> str:
"""
Format the refactoring response.
The base tool handles structured response validation via SPECIAL_STATUS_MODELS,
so this method focuses on presentation formatting.
Args:
response: The raw refactoring analysis from the model
request: The original request for context
model_info: Optional dict with model metadata
Returns:
str: The response (base tool will handle structured parsing)
"""
logger.debug(f"[REFACTOR] Formatting response for {request.refactor_type} refactoring")
# Check if this is a more_refactor_required response
is_more_required = False
try:
import json
parsed = json.loads(response)
if isinstance(parsed, dict) and parsed.get("status") == "more_refactor_required":
is_more_required = True
except (json.JSONDecodeError, ValueError):
# Not JSON or not more_refactor_required
pass
# Always add implementation instructions for any refactoring response
step4_title = (
"## Step 4: CONTINUE WITH MORE REFACTORING" if is_more_required else "## Step 4: COMPLETE REFACTORING"
)
step4_intro = (
"Once all refactorings above are implemented and verified working, IMMEDIATELY continue the analysis:"
if is_more_required
else "Once all refactorings above are implemented and verified working:"
)
step4_action = (
"Use the refactor tool again with the SAME parameters but include the continuation_id from this response to get additional refactoring opportunities. The model will provide more refactoring suggestions building on what was already completed."
if is_more_required
else "Review all changes made and ensure the codebase is cleaner, more maintainable, and follows best practices."
)
critical_msg = (
"apply all refactorings, validate they work, then immediately continue with more refactoring analysis. Take full ownership of the refactoring implementation and continue until all opportunities are addressed."
if is_more_required
else "create, modify, and reorganize files as needed. Take full ownership of the refactoring implementation and ensure all changes work correctly."
)
return f"""{response}
---
# IMMEDIATE NEXT ACTION
Claude, you are now in EXECUTION MODE. Take immediate action:
## Step 1: ULTRATHINK & IMPLEMENT REFACTORINGS
ULTRATHINK while implementing these refactorings. Verify EVERY code reference, import, dependency, and access modifier is 100% accurate before making changes.
- **IMPLEMENT** all refactoring opportunities listed above in priority order
- **CREATE** any new files needed for decomposition or reorganization
- **MOVE** code to appropriate locations following the refactoring plan
- **UPDATE** all imports and references to maintain functionality
- **VALIDATE** that no functionality is broken by the changes
## Step 2: VERIFY CHANGES WORK
**MANDATORY**: After each refactoring step:
- Ensure all imports are updated correctly
- Verify access modifiers (private/public/internal) still work
- Check that all references to moved code are updated
- Run any existing tests to confirm nothing is broken
- Fix any issues that arise from the refactoring
## Step 3: DISPLAY RESULTS TO USER
After implementing each refactoring, show the user:
```
✅ Refactored: [refactor-id] - Brief description
- Files modified: [list of files]
- [Key change summary]
```
{step4_title}
{step4_intro}
{step4_action}
**CRITICAL**: Do NOT stop after generating the analysis - you MUST {critical_msg}"""
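For a sense of the request shape end to end, here is a hypothetical direct invocation of the tool; in practice Claude drives it through the MCP server, and the file path below is a placeholder:

```python
import asyncio
import json

from tools.refactor import RefactorTool

tool = RefactorTool()
result = asyncio.run(
    tool.execute(
        {
            "files": ["/abs/path/to/big_module.py"],  # placeholder path
            "prompt": "Break down the god class and flag code smells",
            "refactor_type": "decompose",
            "focus_areas": ["maintainability"],
        }
    )
)
output = json.loads(result[0].text)  # ToolOutput JSON
print(output["status"])  # e.g. "success" or "refactor_analysis_complete"
```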
@@ -134,6 +134,10 @@ class TestGenTool(BaseTool):
    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

    def wants_line_numbers_by_default(self) -> bool:
        """Test generation tool needs line numbers for precise targeting"""
        return True

    def get_model_category(self):
        """TestGen requires extended reasoning for comprehensive test analysis"""
        from tools.models import ToolModelCategory
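This override is how a tool opts into the new line-numbered file embedding. A plausible sketch of the wiring (the base tool's actual plumbing is not shown in this hunk, so treat the class and method names below as illustrative):

```python
# Illustrative sketch only - assumes the base tool forwards the preference
# to the read_files() helper extended in utils/file_utils.py below.
from utils import read_files

class ToolSketch:
    def wants_line_numbers_by_default(self) -> bool:
        return False  # TestGen overrides this to return True

    def prepare_files(self, paths: list[str]) -> str:
        return read_files(paths, include_line_numbers=self.wants_line_numbers_by_default())
```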

View File

@@ -2,7 +2,9 @@
Utility functions for Zen MCP Server
"""

from .file_types import CODE_EXTENSIONS, FILE_CATEGORIES, PROGRAMMING_EXTENSIONS, TEXT_EXTENSIONS
from .file_utils import expand_paths, read_file_content, read_files
from .security_config import EXCLUDED_DIRS, SECURITY_ROOT
from .token_utils import check_token_limit, estimate_tokens

__all__ = [
@@ -10,6 +12,11 @@ __all__ = [
    "read_file_content",
    "expand_paths",
    "CODE_EXTENSIONS",
    "PROGRAMMING_EXTENSIONS",
    "TEXT_EXTENSIONS",
    "FILE_CATEGORIES",
    "SECURITY_ROOT",
    "EXCLUDED_DIRS",
    "estimate_tokens",
    "check_token_limit",
]

utils/file_types.py Normal file
View File

@@ -0,0 +1,180 @@
"""
File type definitions and constants for file processing
This module centralizes all file type and extension definitions used
throughout the MCP server for consistent file handling.
"""
# Programming language file extensions - core code files
PROGRAMMING_LANGUAGES = {
".py", # Python
".js", # JavaScript
".ts", # TypeScript
".jsx", # React JavaScript
".tsx", # React TypeScript
".java", # Java
".cpp", # C++
".c", # C
".h", # C/C++ Header
".hpp", # C++ Header
".cs", # C#
".go", # Go
".rs", # Rust
".rb", # Ruby
".php", # PHP
".swift", # Swift
".kt", # Kotlin
".scala", # Scala
".r", # R
".m", # Objective-C
".mm", # Objective-C++
}
# Script and shell file extensions
SCRIPTS = {
".sql", # SQL
".sh", # Shell
".bash", # Bash
".zsh", # Zsh
".fish", # Fish shell
".ps1", # PowerShell
".bat", # Batch
".cmd", # Command
}
# Configuration and data file extensions
CONFIGS = {
".yml", # YAML
".yaml", # YAML
".json", # JSON
".xml", # XML
".toml", # TOML
".ini", # INI
".cfg", # Config
".conf", # Config
".properties", # Properties
".env", # Environment
}
# Documentation and markup file extensions
DOCS = {
".txt", # Text
".md", # Markdown
".rst", # reStructuredText
".tex", # LaTeX
}
# Web development file extensions
WEB = {
".html", # HTML
".css", # CSS
".scss", # Sass
".sass", # Sass
".less", # Less
}
# Additional text file extensions for logs and data
TEXT_DATA = {
".log", # Log files
".csv", # CSV
".tsv", # TSV
".gitignore", # Git ignore
".dockerfile", # Docker
".makefile", # Make
".cmake", # CMake
".gradle", # Gradle
".sbt", # SBT
".pom", # Maven POM
".lock", # Lock files
}
# Image file extensions
IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico", ".tiff", ".tif"}
# Binary executable and library extensions
BINARIES = {
".exe", # Windows executable
".dll", # Windows library
".so", # Linux shared object
".dylib", # macOS dynamic library
".bin", # Binary
".class", # Java class
}
# Archive and package file extensions
ARCHIVES = {
".jar",
".war",
".ear", # Java archives
".zip",
".tar",
".gz", # General archives
".7z",
".rar", # Compression
".deb",
".rpm", # Linux packages
".dmg",
".pkg", # macOS packages
}
# Derived sets for different use cases
CODE_EXTENSIONS = PROGRAMMING_LANGUAGES | SCRIPTS | CONFIGS | DOCS | WEB
PROGRAMMING_EXTENSIONS = PROGRAMMING_LANGUAGES # For line numbering
TEXT_EXTENSIONS = CODE_EXTENSIONS | TEXT_DATA
IMAGE_EXTENSIONS = IMAGES
BINARY_EXTENSIONS = BINARIES | ARCHIVES
# All extensions by category for easy access
FILE_CATEGORIES = {
"programming": PROGRAMMING_LANGUAGES,
"scripts": SCRIPTS,
"configs": CONFIGS,
"docs": DOCS,
"web": WEB,
"text_data": TEXT_DATA,
"images": IMAGES,
"binaries": BINARIES,
"archives": ARCHIVES,
}
def get_file_category(file_path: str) -> str:
"""
Determine the category of a file based on its extension.
Args:
file_path: Path to the file
Returns:
Category name or "unknown" if not recognized
"""
from pathlib import Path
extension = Path(file_path).suffix.lower()
for category, extensions in FILE_CATEGORIES.items():
if extension in extensions:
return category
return "unknown"
def is_code_file(file_path: str) -> bool:
"""Check if a file is a code file (programming language)."""
from pathlib import Path
return Path(file_path).suffix.lower() in PROGRAMMING_LANGUAGES
def is_text_file(file_path: str) -> bool:
"""Check if a file is a text file."""
from pathlib import Path
return Path(file_path).suffix.lower() in TEXT_EXTENSIONS
def is_binary_file(file_path: str) -> bool:
"""Check if a file is a binary file."""
from pathlib import Path
return Path(file_path).suffix.lower() in BINARY_EXTENSIONS
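A short usage sketch for the helpers above (the paths are made up; the outputs follow directly from the sets defined in this module):

```python
from utils.file_types import get_file_category, is_binary_file, is_code_file, is_text_file

print(get_file_category("src/server.py"))   # "programming"
print(get_file_category("deploy/app.jar"))  # "archives"
print(get_file_category("notes.unknown"))   # "unknown"
print(is_code_file("src/server.py"))        # True  - checks PROGRAMMING_LANGUAGES only
print(is_text_file("logs/run.log"))         # True  - .log is in TEXT_DATA
print(is_binary_file("lib/native.so"))      # True  - .so is in BINARIES
```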

View File

@@ -23,148 +23,12 @@ import os
from pathlib import Path
from typing import Optional

from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TEXT_EXTENSIONS
from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens

logger = logging.getLogger(__name__)
# (Removed here: the Docker path-translation comments, WORKSPACE_ROOT validation,
# DANGEROUS_WORKSPACE_PATHS, the SECURITY_ROOT selection logic, EXCLUDED_DIRS, and
# MCP_SIGNATURE_FILES - all relocated to utils/security_config.py, shown in full below.)
def is_mcp_directory(path: Path) -> bool:
    """
@@ -242,7 +106,7 @@ def is_home_directory_root(path: Path) -> bool:
    # Check if this is exactly the home directory
    if resolved_path == resolved_home:
        logger.warning(
            f"Attempted to scan user home directory root: {path}. Please specify a subdirectory instead."
        )
        return True
@@ -277,56 +141,105 @@ def is_home_directory_root(path: Path) -> bool:
    return False

# (Removed here: the old module-level CODE_EXTENSIONS set, now defined in utils/file_types.py)


def detect_file_type(file_path: str) -> str:
    """
    Detect file type for appropriate processing strategy.

    NOTE: This function is currently not used for line number auto-detection
    due to backward compatibility requirements. It is intended for future
    features requiring specific file type handling (e.g., image processing,
    binary file analysis, or enhanced file filtering).

    Args:
        file_path: Path to the file to analyze

    Returns:
        str: "text", "binary", or "image"
    """
    path = Path(file_path)

    # Check extension first (fast)
    extension = path.suffix.lower()
    if extension in TEXT_EXTENSIONS:
        return "text"
    elif extension in IMAGE_EXTENSIONS:
        return "image"
    elif extension in BINARY_EXTENSIONS:
        return "binary"

    # Fallback: check magic bytes for text vs binary
    # This is helpful for files without extensions or unknown extensions
    try:
        with open(path, "rb") as f:
            chunk = f.read(1024)
        # Simple heuristic: if we can decode as UTF-8, likely text
        chunk.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "binary"
    except (FileNotFoundError, PermissionError) as e:
        logger.warning(f"Could not access file {file_path} for type detection: {e}")
        return "unknown"


def should_add_line_numbers(file_path: str, include_line_numbers: Optional[bool] = None) -> bool:
    """
    Determine if line numbers should be added to a file.

    Args:
        file_path: Path to the file
        include_line_numbers: Explicit preference, or None for auto-detection

    Returns:
        bool: True if line numbers should be added
    """
    if include_line_numbers is not None:
        return include_line_numbers

    # Default: DO NOT add line numbers (backwards compatibility)
    # Tools that want line numbers must explicitly request them
    return False


def _normalize_line_endings(content: str) -> str:
    """
    Normalize line endings for consistent line numbering.

    Args:
        content: File content with potentially mixed line endings

    Returns:
        str: Content with normalized LF line endings
    """
    # Normalize all line endings to LF for consistent counting
    return content.replace("\r\n", "\n").replace("\r", "\n")


def _add_line_numbers(content: str) -> str:
    """
    Add line numbers to text content for precise referencing.

    Args:
        content: Text content to number

    Returns:
        str: Content with line numbers in format " 45│ actual code line"

    Supports files up to 99,999 lines with dynamic width allocation
    """
    # Normalize line endings first
    normalized_content = _normalize_line_endings(content)
    lines = normalized_content.split("\n")

    # Dynamic width allocation based on total line count
    # This supports files of any size by computing required width
    total_lines = len(lines)
    width = len(str(total_lines))
    width = max(width, 4)  # Minimum padding for readability

    # Format with dynamic width and clear separator
    numbered_lines = [f"{i + 1:{width}d}│ {line}" for i, line in enumerate(lines)]
    return "\n".join(numbered_lines)
def translate_path_for_environment(path_str: str) -> str:
@@ -515,15 +428,13 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis
        # Check 2: Prevent scanning user's home directory root
        if is_home_directory_root(path_obj):
            logger.warning(f"Skipping home directory root: {path}. Please specify a project subdirectory instead.")
            continue

        # Check 3: Skip if this is the MCP's own directory
        if is_mcp_directory(path_obj):
            logger.info(
                f"Skipping MCP server directory: {path}. The MCP server code is excluded from project scans."
            )
            continue
@@ -575,7 +486,9 @@ def expand_paths(paths: list[str], extensions: Optional[set[str]] = None) -> lis
    return expanded_files


def read_file_content(
    file_path: str, max_size: int = 1_000_000, *, include_line_numbers: Optional[bool] = None
) -> tuple[str, int]:
    """
    Read a single file and format it for inclusion in AI prompts.
@@ -586,6 +499,7 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i
    Args:
        file_path: Path to file (must be absolute)
        max_size: Maximum file size to read (default 1MB to prevent memory issues)
        include_line_numbers: Whether to add line numbers. If None, auto-detects based on file type

    Returns:
        Tuple of (formatted_content, estimated_tokens)
@@ -634,6 +548,10 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i
        content = f"\n--- FILE TOO LARGE: {file_path} ---\nFile size: {file_size:,} bytes (max: {max_size:,})\n--- END FILE ---\n"
        return content, estimate_tokens(content)

    # Determine if we should add line numbers
    add_line_numbers = should_add_line_numbers(file_path, include_line_numbers)
    logger.debug(f"[FILES] Line numbers for {file_path}: {'enabled' if add_line_numbers else 'disabled'}")

    # Read the file with UTF-8 encoding, replacing invalid characters
    # This ensures we can handle files with mixed encodings
    logger.debug(f"[FILES] Reading file content for {file_path}")
@@ -642,6 +560,14 @@ def read_file_content(file_path: str, max_size: int = 1_000_000) -> tuple[str, i
        logger.debug(f"[FILES] Successfully read {len(file_content)} characters from {file_path}")

        # Add line numbers if requested or auto-detected
        if add_line_numbers:
            file_content = _add_line_numbers(file_content)
            logger.debug(f"[FILES] Added line numbers to {file_path}")
        else:
            # Still normalize line endings for consistency
            file_content = _normalize_line_endings(file_content)

        # Format with clear delimiters that help the AI understand file boundaries
        # Using consistent markers makes it easier for the model to parse
        # NOTE: These markers ("--- BEGIN FILE: ... ---") are distinct from git diff markers
@@ -665,6 +591,8 @@ def read_files(
    code: Optional[str] = None,
    max_tokens: Optional[int] = None,
    reserve_tokens: int = 50_000,
    *,
    include_line_numbers: bool = False,
) -> str:
    """
    Read multiple files and optional direct code with smart token management.
@@ -679,6 +607,7 @@ def read_files(
        code: Optional direct code to include (prioritized over files)
        max_tokens: Maximum tokens to use (defaults to DEFAULT_CONTEXT_WINDOW)
        reserve_tokens: Tokens to reserve for prompt and response (default 50K)
        include_line_numbers: Whether to add line numbers to file content

    Returns:
        str: All file contents formatted for AI consumption
@@ -728,7 +657,7 @@ def read_files(
                files_skipped.extend(all_files[i:])
                break

            file_content, file_tokens = read_file_content(file_path, include_line_numbers=include_line_numbers)
            logger.debug(f"[FILES] File {file_path}: {file_tokens:,} tokens")

            # Check if adding this file would exceed limit
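To make the new opt-in behavior concrete, a small sketch of the numbering path (the sample string and path are hypothetical; the output format comes from `_add_line_numbers` above):

```python
from utils.file_utils import _add_line_numbers, should_add_line_numbers

# Backwards compatible: without an explicit preference, no numbering happens.
assert should_add_line_numbers("src/app.py") is False
assert should_add_line_numbers("src/app.py", include_line_numbers=True) is True

# Mixed line endings are normalized before numbering.
print(_add_line_numbers("def f():\r\n    return 1\n"))
#    1│ def f():
#    2│     return 1
#    3│
```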

utils/security_config.py Normal file
View File

@@ -0,0 +1,174 @@
"""
Security configuration and path validation constants
This module contains security-related constants and configurations
for file access control and workspace management.
"""
import os
from pathlib import Path
# Dangerous paths that should never be used as WORKSPACE_ROOT
# These would give overly broad access and pose security risks
DANGEROUS_WORKSPACE_PATHS = {
"/",
"/etc",
"/usr",
"/bin",
"/var",
"/root",
"/home",
"/workspace", # Container path - WORKSPACE_ROOT should be host path
"C:\\",
"C:\\Windows",
"C:\\Program Files",
"C:\\Users",
}
# Directories to exclude from recursive file search
# These typically contain generated code, dependencies, or build artifacts
EXCLUDED_DIRS = {
# Python
"__pycache__",
".venv",
"venv",
"env",
".env",
"*.egg-info",
".eggs",
"wheels",
".Python",
".mypy_cache",
".pytest_cache",
".tox",
"htmlcov",
".coverage",
"coverage",
# Node.js / JavaScript
"node_modules",
".next",
".nuxt",
"bower_components",
".sass-cache",
# Version Control
".git",
".svn",
".hg",
# Build Output
"build",
"dist",
"target",
"out",
# IDEs
".idea",
".vscode",
".sublime",
".atom",
".brackets",
# Temporary / Cache
".cache",
".temp",
".tmp",
"*.swp",
"*.swo",
"*~",
# OS-specific
".DS_Store",
"Thumbs.db",
# Java / JVM
".gradle",
".m2",
# Documentation build
"_build",
"site",
# Mobile development
".expo",
".flutter",
# Package managers
"vendor",
}
# MCP signature files - presence of these indicates the MCP's own directory
# Used to prevent the MCP from scanning its own codebase
MCP_SIGNATURE_FILES = {
"zen_server.py",
"server.py",
"tools/precommit.py",
"utils/file_utils.py",
"prompts/tool_prompts.py",
}
# Workspace configuration
WORKSPACE_ROOT = os.environ.get("WORKSPACE_ROOT")
CONTAINER_WORKSPACE = Path("/workspace")
def validate_workspace_security(workspace_root: str) -> None:
"""
Validate that WORKSPACE_ROOT is set to a safe directory.
Args:
workspace_root: The workspace root path to validate
Raises:
RuntimeError: If the workspace root is unsafe
"""
if not workspace_root:
return
# Resolve to canonical path for comparison
resolved_workspace = Path(workspace_root).resolve()
# Special check for /workspace - common configuration mistake
if str(resolved_workspace) == "/workspace":
raise RuntimeError(
f"Configuration Error: WORKSPACE_ROOT should be set to the HOST path, not the container path. "
f"Found: WORKSPACE_ROOT={workspace_root} "
f"Expected: WORKSPACE_ROOT should be set to your host directory path (e.g., $HOME) "
f"that contains all files Claude might reference. "
f"This path gets mounted to /workspace inside the Docker container."
)
# Check against other dangerous paths
if str(resolved_workspace) in DANGEROUS_WORKSPACE_PATHS:
raise RuntimeError(
f"Security Error: WORKSPACE_ROOT '{workspace_root}' is set to a dangerous system directory. "
f"This would give access to critical system files. "
f"Please set WORKSPACE_ROOT to a specific project directory."
)
# Additional check: prevent filesystem root
if resolved_workspace.parent == resolved_workspace:
raise RuntimeError(
f"Security Error: WORKSPACE_ROOT '{workspace_root}' cannot be the filesystem root. "
f"This would give access to the entire filesystem. "
f"Please set WORKSPACE_ROOT to a specific project directory."
)
def get_security_root() -> Path:
"""
Determine the security boundary for file access.
Returns:
Path object representing the security root directory
"""
# In Docker: use /workspace (container directory)
# In tests/direct mode: use WORKSPACE_ROOT (host directory)
if CONTAINER_WORKSPACE.exists():
# Running in Docker container
return CONTAINER_WORKSPACE
elif WORKSPACE_ROOT:
# Running in tests or direct mode with WORKSPACE_ROOT set
return Path(WORKSPACE_ROOT).resolve()
else:
# Fallback for backward compatibility (should not happen in normal usage)
return Path.home()
# Validate security on import if WORKSPACE_ROOT is set
if WORKSPACE_ROOT:
validate_workspace_security(WORKSPACE_ROOT)
# Export the computed security root
SECURITY_ROOT = get_security_root()
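A behavior sketch for this module (the env value and paths are illustrative; note that validation runs at import time, so WORKSPACE_ROOT must be set before the first import):

```python
import os

os.environ["WORKSPACE_ROOT"] = "/Users/jane/projects/acme"  # host path, never /workspace

from utils.security_config import SECURITY_ROOT, validate_workspace_security

validate_workspace_security("/Users/jane/projects/acme")  # passes silently

try:
    validate_workspace_security("/etc")  # dangerous system directory
except RuntimeError as err:
    print(err)

# Outside Docker (no /workspace directory), SECURITY_ROOT resolves to WORKSPACE_ROOT.
print(SECURITY_ROOT)
```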