Easier access to logs at startup with -f on the run script
Improved prompt for immediate action
Additional logging of tool names
Updated documentation
Context aware decomposition system prompt
New script to run code quality checks
This commit is contained in:
251	CLAUDE.md	Normal file
@@ -0,0 +1,251 @@
# Claude Development Guide for Zen MCP Server

This file contains essential commands and workflows for developing and maintaining the Zen MCP Server when working with Claude. Use these instructions to efficiently run quality checks, manage the server, check logs, and run tests.

## Quick Reference Commands

### Code Quality Checks

Before making any changes or submitting PRs, always run the comprehensive quality checks:

```bash
# Activate virtual environment first
source venv/bin/activate

# Run all quality checks (linting, formatting, tests)
./code_quality_checks.sh
```

This script automatically runs:
- Ruff linting with auto-fix
- Black code formatting
- Import sorting with isort
- Complete unit test suite (361 tests)
- Verification that all checks pass 100%

### Server Management

#### Start/Restart the Server
```bash
# Start or restart the Docker containers
./run-server.sh
```

This script will:
- Build/rebuild Docker images if needed
- Start the MCP server container (`zen-mcp-server`)
- Start the Redis container (`zen-mcp-redis`)
- Set up proper networking and volumes

#### Check Server Status
```bash
# Check if containers are running
docker ps

# Look for these containers:
# - zen-mcp-server
# - zen-mcp-redis
```

### Log Management

#### View Server Logs
```bash
# View last 500 lines of server logs
docker exec zen-mcp-server tail -n 500 /tmp/mcp_server.log

# Follow logs in real-time
docker exec zen-mcp-server tail -f /tmp/mcp_server.log

# View specific number of lines (replace 100 with desired count)
docker exec zen-mcp-server tail -n 100 /tmp/mcp_server.log

# Search logs for specific patterns
docker exec zen-mcp-server grep "ERROR" /tmp/mcp_server.log
docker exec zen-mcp-server grep "tool_name" /tmp/mcp_server.log
```

#### Monitor Tool Executions Only
```bash
# View tool activity log (focused on tool calls and completions)
docker exec zen-mcp-server tail -n 100 /tmp/mcp_activity.log

# Follow tool activity in real-time
docker exec zen-mcp-server tail -f /tmp/mcp_activity.log

# Use the dedicated log monitor (shows tool calls, completions, errors)
python log_monitor.py
```

The `log_monitor.py` script provides a real-time view of:
- Tool calls and completions
- Conversation resumptions and context
- Errors and warnings from all log files
- File rotation handling
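As a rough illustration of what this kind of monitor does (a hedged sketch only, not the actual `log_monitor.py`; the keyword list and polling interval are assumptions):

```python
import time

KEYWORDS = ("TOOL_CALL", "TOOL_COMPLETED", "ERROR")


def is_activity(line, keywords=KEYWORDS):
    """Return True for log lines worth surfacing (tool calls, completions, errors)."""
    return any(keyword in line for keyword in keywords)


def follow(path, sleep=0.5):
    """Yield new lines appended to a log file, reopening it if it is rotated away."""
    while True:
        try:
            with open(path) as f:
                f.seek(0, 2)  # jump to end of file: only surface new activity
                while True:
                    line = f.readline()
                    if line:
                        if is_activity(line):
                            yield line.rstrip("\n")
                    else:
                        time.sleep(sleep)  # no new data yet; poll again
        except FileNotFoundError:
            time.sleep(sleep)  # file missing (e.g. rotated); retry until it reappears
```

A consumer would simply iterate: `for line in follow("/tmp/mcp_activity.log"): print(line)`.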

#### All Available Log Files
```bash
# Main server log (all activity)
docker exec zen-mcp-server tail -f /tmp/mcp_server.log

# Tool activity only (TOOL_CALL, TOOL_COMPLETED, etc.)
docker exec zen-mcp-server tail -f /tmp/mcp_activity.log

# Debug information
docker exec zen-mcp-server tail -f /tmp/gemini_debug.log

# Overflow logs (when main log gets too large)
docker exec zen-mcp-server tail -f /tmp/mcp_server_overflow.log
```

#### Debug Container Issues
```bash
# Check container logs (Docker level)
docker logs zen-mcp-server

# Execute interactive shell in container
docker exec -it zen-mcp-server /bin/bash

# Check Redis container logs
docker logs zen-mcp-redis
```

### Testing

#### Run All Simulator Tests
```bash
# Run the complete test suite
python communication_simulator_test.py

# Run tests with verbose output
python communication_simulator_test.py --verbose

# Force rebuild environment before testing
python communication_simulator_test.py --rebuild
```

#### Run Individual Simulator Tests
```bash
# List all available tests
python communication_simulator_test.py --list-tests

# Run a specific test individually (with full Docker setup)
python communication_simulator_test.py --individual basic_conversation
python communication_simulator_test.py --individual content_validation
python communication_simulator_test.py --individual cross_tool_continuation

# Run multiple specific tests
python communication_simulator_test.py --tests basic_conversation content_validation

# Run individual test with verbose output
python communication_simulator_test.py --individual logs_validation --verbose
```

Available simulator tests include:
- `basic_conversation` - Basic conversation flow with chat tool
- `content_validation` - Content validation and duplicate detection
- `per_tool_deduplication` - File deduplication for individual tools
- `cross_tool_continuation` - Cross-tool conversation continuation scenarios
- `cross_tool_comprehensive` - Comprehensive cross-tool integration testing
- `logs_validation` - Docker logs validation
- `redis_validation` - Redis conversation memory validation
- `model_thinking_config` - Model thinking configuration testing
- `o3_model_selection` - O3 model selection and routing testing
- `ollama_custom_url` - Ollama custom URL configuration testing
- `openrouter_fallback` - OpenRouter fallback mechanism testing
- `openrouter_models` - OpenRouter models availability testing
- `token_allocation_validation` - Token allocation and limits validation
- `conversation_chain_validation` - Conversation chain continuity validation

#### Run Unit Tests Only
```bash
# Run all unit tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_refactor.py -v

# Run specific test function
python -m pytest tests/test_refactor.py::TestRefactorTool::test_format_response -v

# Run tests with coverage
python -m pytest tests/ --cov=. --cov-report=html
```

### Development Workflow

#### Before Making Changes
1. Ensure virtual environment is activated: `source venv/bin/activate`
2. Run quality checks: `./code_quality_checks.sh`
3. Check server is running: `./run-server.sh`

#### After Making Changes
1. Run quality checks again: `./code_quality_checks.sh`
2. Run relevant simulator tests: `python communication_simulator_test.py --individual <test_name>`
3. Check logs for any issues: `docker exec zen-mcp-server tail -n 100 /tmp/mcp_server.log`

#### Before Committing/PR
1. Final quality check: `./code_quality_checks.sh`
2. Run full simulator test suite: `python communication_simulator_test.py`
3. Verify all tests pass 100%

### Common Troubleshooting

#### Container Issues
```bash
# Restart containers if they're not responding
docker stop zen-mcp-server zen-mcp-redis
./run-server.sh

# Check container resource usage
docker stats zen-mcp-server

# Remove containers and rebuild from scratch
docker rm -f zen-mcp-server zen-mcp-redis
./run-server.sh
```

#### Test Failures
```bash
# Run individual failing test with verbose output
python communication_simulator_test.py --individual <test_name> --verbose

# Check server logs during test execution
docker exec zen-mcp-server tail -f /tmp/mcp_server.log

# Run tests while keeping containers running for debugging
python communication_simulator_test.py --keep-logs
```

#### Linting Issues
```bash
# Auto-fix most linting issues
ruff check . --fix
black .
isort .

# Check what would be changed without applying
ruff check .
black --check .
isort --check-only .
```

### File Structure Context

- `./code_quality_checks.sh` - Comprehensive quality check script
- `./run-server.sh` - Docker container setup and management
- `communication_simulator_test.py` - End-to-end testing framework
- `simulator_tests/` - Individual test modules
- `tests/` - Unit test suite
- `tools/` - MCP tool implementations
- `providers/` - AI provider implementations
- `systemprompts/` - System prompt definitions

### Environment Requirements

- Python 3.8+ with virtual environment activated
- Docker and Docker Compose installed
- All dependencies from `requirements.txt` installed
- Proper API keys configured in environment or config files

This guide provides everything needed to efficiently work with the Zen MCP Server codebase using Claude. Always run quality checks before and after making changes to ensure code integrity.
10	README.md
@@ -455,10 +455,16 @@ constraints.

#### Example Prompts:

**Basic Usage:**
```
"Use gemini pro to decompose my_crazy_big_class.m into smaller extensions"
"Get gemini pro to identify code smells in the authentication module"
"Using zen's refactor decompose the all_in_one_sync_code.swift into maintainable extensions"
```

Example of a **powerful prompt** to get the best out of both Claude + Flash's 1M Context:
```
"First, think about how the authentication module works, find related classes and find
any code smells, then using zen's refactor ask flash to confirm your findings but ask it to
find additional code smells and any other quick-wins and then fix these issues"
```

**Key Features:**
60	code_quality_checks.sh	Executable file
@@ -0,0 +1,60 @@
#!/bin/bash

# Zen MCP Server - Code Quality Checks
# This script runs all required linting and testing checks before committing changes.
# ALL checks must pass 100% for CI/CD to succeed.

set -e  # Exit on any error

echo "🔍 Running Code Quality Checks for Zen MCP Server"
echo "================================================="

# Check if virtual environment is activated
if [[ "$VIRTUAL_ENV" == "" ]]; then
    echo "❌ Virtual environment not activated!"
    echo "Please run: source venv/bin/activate"
    exit 1
fi

echo "✅ Virtual environment detected: $VIRTUAL_ENV"
echo ""

# Step 1: Linting and Formatting
echo "📋 Step 1: Running Linting and Formatting Checks"
echo "--------------------------------------------------"

echo "🔧 Running ruff linting with auto-fix..."
ruff check --fix

echo "🎨 Running black code formatting..."
black .

echo "📦 Running import sorting with isort..."
isort .

echo "✅ Verifying all linting passes..."
ruff check

echo "✅ Step 1 Complete: All linting and formatting checks passed!"
echo ""

# Step 2: Unit Tests
echo "🧪 Step 2: Running Complete Unit Test Suite"
echo "---------------------------------------------"

echo "🏃 Running all 361 unit tests..."
python -m pytest tests/ -v

echo "✅ Step 2 Complete: All unit tests passed!"
echo ""

# Step 3: Final Summary
echo "🎉 All Code Quality Checks Passed!"
echo "=================================="
echo "✅ Linting (ruff): PASSED"
echo "✅ Formatting (black): PASSED"
echo "✅ Import sorting (isort): PASSED"
echo "✅ Unit tests (361 tests): PASSED"
echo ""
echo "🚀 Your code is ready for commit and GitHub Actions!"
echo "💡 Remember to add simulator tests if you modified tools"
@@ -24,7 +24,7 @@ DEFAULT_MODEL=auto # Claude picks the best model automatically

 # API Keys (at least one required)
 GEMINI_API_KEY=your-gemini-key # Enables Gemini Pro & Flash
-OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini
+OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high
 ```

**How Auto Mode Works:**
@@ -25,12 +25,25 @@ We maintain high code quality standards. **All contributions must pass our autom

 #### Required Code Quality Checks

-Before submitting any PR, run these commands:
+Before submitting any PR, run our automated quality check script:

 ```bash
 # Activate virtual environment first
 source venv/bin/activate

+# Run the comprehensive quality checks script
+./code_quality_checks.sh
+```
+
+This script automatically runs:
+- Ruff linting with auto-fix
+- Black code formatting
+- Import sorting with isort
+- Complete unit test suite (361 tests)
+- Verification that all checks pass 100%
+
+**Manual commands** (if you prefer to run individually):
+```bash
 # Run all linting checks (MUST pass 100%)
 ruff check .
 black --check .

@@ -105,7 +118,7 @@ Your PR title MUST follow one of these formats:

 Use our [PR template](../.github/pull_request_template.md) and ensure:

 - [ ] PR title follows the format guidelines above
-- [ ] Activated venv and ran all linting
+- [ ] Activated venv and ran `./code_quality_checks.sh` (all checks passed 100%)
 - [ ] Self-review completed
 - [ ] Tests added for ALL changes
 - [ ] Documentation updated as needed
@@ -71,11 +71,10 @@ class CustomProvider(OpenAICompatibleProvider):

         # Initialize model registry (shared with OpenRouter for consistent aliases)
         if CustomProvider._registry is None:
             CustomProvider._registry = OpenRouterModelRegistry()

-        # Log loaded models and aliases
-        models = self._registry.list_models()
-        aliases = self._registry.list_aliases()
-        logging.info(f"Custom provider loaded {len(models)} models with {len(aliases)} aliases")
+            # Log loaded models and aliases only on first load
+            models = self._registry.list_models()
+            aliases = self._registry.list_aliases()
+            logging.info(f"Custom provider loaded {len(models)} models with {len(aliases)} aliases")

     def _resolve_model_name(self, model_name: str) -> str:
         """Resolve model aliases to actual model names.
@@ -45,11 +45,10 @@ class OpenRouterProvider(OpenAICompatibleProvider):

         # Initialize model registry
         if OpenRouterProvider._registry is None:
             OpenRouterProvider._registry = OpenRouterModelRegistry()

-        # Log loaded models and aliases
-        models = self._registry.list_models()
-        aliases = self._registry.list_aliases()
-        logging.info(f"OpenRouter loaded {len(models)} models with {len(aliases)} aliases")
+            # Log loaded models and aliases only on first load
+            models = self._registry.list_models()
+            aliases = self._registry.list_aliases()
+            logging.info(f"OpenRouter loaded {len(models)} models with {len(aliases)} aliases")

     def _parse_allowed_models(self) -> None:
         """Override to disable environment-based allow-list.
@@ -78,7 +78,33 @@ class OpenRouterModelRegistry:

         try:
             configs = self._read_config()
             self._build_maps(configs)
-            logging.info(f"Loaded {len(self.model_map)} OpenRouter models with {len(self.alias_map)} aliases")
+            caller_info = ""
+            try:
+                import inspect
+
+                caller_frame = inspect.currentframe().f_back
+                if caller_frame:
+                    caller_name = caller_frame.f_code.co_name
+                    caller_file = (
+                        caller_frame.f_code.co_filename.split("/")[-1] if caller_frame.f_code.co_filename else "unknown"
+                    )
+                    # Look for tool context
+                    while caller_frame:
+                        frame_locals = caller_frame.f_locals
+                        if "self" in frame_locals and hasattr(frame_locals["self"], "get_name"):
+                            tool_name = frame_locals["self"].get_name()
+                            caller_info = f" (called from {tool_name} tool)"
+                            break
+                        caller_frame = caller_frame.f_back
+                    if not caller_info:
+                        caller_info = f" (called from {caller_name} in {caller_file})"
+            except Exception:
+                # If frame inspection fails, just continue without caller info
+                pass
+
+            logging.debug(
+                f"Loaded {len(self.model_map)} OpenRouter models with {len(self.alias_map)} aliases{caller_info}"
+            )
         except ValueError as e:
             # Re-raise ValueError only for duplicate aliases (critical config errors)
             logging.error(f"Failed to load OpenRouter model configuration: {e}")
@@ -8,6 +8,9 @@ You are a principal software engineer specializing in intelligent code refactori
 opportunities and provide precise, actionable suggestions with exact line-number references that Claude can
 implement directly.

+CRITICAL: You MUST respond ONLY in valid JSON format. NO explanations, introductions, or text outside JSON structure.
+Claude cannot parse your response if you include any non-JSON content.
+
 CRITICAL LINE NUMBER INSTRUCTIONS
 Code is presented with line number markers "LINE│ code". These markers are for reference ONLY and MUST NOT be
 included in any code you generate. Always reference specific line numbers for Claude to locate exact positions.

@@ -16,10 +19,10 @@ snippets.

 IF MORE INFORMATION IS NEEDED
 If you need additional context (e.g., related files, configuration, dependencies) to provide accurate refactoring
-recommendations, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the same file you've
-been provided unless for some reason its content is missing or incomplete:
-{"status": "clarification_required", "question": "<your brief question>",
- "files_needed": ["[file name here]", "[or some folder/]"]}
+recommendations, you MUST respond ONLY with this JSON format (and ABSOLUTELY nothing else - no text before or after):
+{"status": "clarification_required", "question": "<your brief question>", "files_needed": ["[file name here]", "[or some folder/]"]}
+
+Do NOT ask for the same file you've been provided unless its content is missing or incomplete.

 REFACTOR TYPES (PRIORITY ORDER)
@@ -28,66 +31,170 @@ REFACTOR TYPES (PRIORITY ORDER)
|
||||
3. **modernize**
|
||||
4. **organization**
|
||||
|
||||
**decompose**: CRITICAL PRIORITY for cognitive load reduction. When encountering large files (>1500 lines), huge classes
|
||||
(>300 lines), or massive functions (>80 lines), decomposition is MANDATORY before any other refactoring type. Large
|
||||
codebases are impossible to navigate, understand, or maintain.
|
||||
**decompose**: CONTEXT-AWARE PRIORITY for cognitive load reduction. Apply intelligent decomposition based on adaptive
|
||||
thresholds and contextual analysis:
|
||||
|
||||
DECOMPOSITION ORDER (STRICT TOP-DOWN, ADAPTIVE):
|
||||
Analyze in this sequence, stopping at the FIRST breached threshold in each file:
|
||||
**AUTOMATIC decomposition (CRITICAL severity - MANDATORY before other refactoring)**:
|
||||
- Files >15000 LOC, Classes >3000 LOC, Functions >500 LOC
|
||||
- These thresholds indicate truly problematic code size that blocks maintainability
|
||||
|
||||
1. **File Level (>1500 LOC)** → Propose file-level splits ONLY, then re-analyze after implementation
|
||||
2. **Class Level (>300 LOC)** → Propose class extraction ONLY, then re-analyze after implementation
|
||||
3. **Function Level (>80 LOC)** → Propose function extraction
|
||||
**EVALUATE decomposition (HIGH/MEDIUM/LOW severity - context-dependent)**:
|
||||
- Files >5000 LOC, Classes >1000 LOC, Functions >150 LOC
|
||||
- Analyze context: legacy stability, domain complexity, performance constraints, language patterns
|
||||
- Only recommend if decomposition genuinely improves maintainability without introducing complexity
|
||||
- Respect legitimate cases where size is justified (algorithms, state machines, domain entities, generated code)
|
||||
|
||||
RATIONALE: Outer-scope size dominates cognitive load and merge complexity. NEVER descend to an inner level until
|
||||
the containing level is within its threshold. This prevents premature micro-optimization and ensures maximum
|
||||
cognitive load reduction with minimum rework.
|
||||
**INTELLIGENT ASSESSMENT**: Consider project context, team constraints, and engineering tradeoffs before
|
||||
suggesting decomposition. Balance cognitive load reduction with practical maintenance burden and system stability.
|
||||
|
||||
DECOMPOSITION ORDER (CONTEXT-AWARE, ADAPTIVE THRESHOLDS):
|
||||
Analyze in this sequence using INTELLIGENT thresholds based on context, stopping at the FIRST breached threshold:
|
||||
|
||||
**ADAPTIVE THRESHOLD SYSTEM:**
|
||||
Use HIGHER thresholds for automatic decomposition suggestions, with LOWER thresholds for "consider if necessary" analysis:
|
||||
|
||||
1. **File Level**:
|
||||
- AUTOMATIC (>15000 LOC): Immediate decomposition required - blocking issue
|
||||
- EVALUATE (>5000 LOC): Consider decomposition ONLY if:
|
||||
* Legacy monolith with poor organization patterns
|
||||
* Multiple unrelated responsibilities mixed together
|
||||
* High change frequency causing merge conflicts
|
||||
* Team struggles with navigation/understanding
|
||||
* Generated/config files are exempt unless truly problematic
|
||||
|
||||
2. **Class Level**:
|
||||
- AUTOMATIC (>3000 LOC): Immediate decomposition required - blocking issue
|
||||
- EVALUATE (>1000 LOC): Consider decomposition ONLY if:
|
||||
* Class violates single responsibility principle significantly
|
||||
* Contains multiple distinct behavioral domains
|
||||
* High coupling between unrelated methods/data
|
||||
* Some large classes are intentionally monolithic (performance, state management, frameworks)
|
||||
* Domain entities with complex business logic may legitimately be large
|
||||
|
||||
3. **Function Level**:
|
||||
- AUTOMATIC (>500 LOC): Immediate decomposition required - blocking issue
|
||||
- EVALUATE (>150 LOC): Consider decomposition ONLY if:
|
||||
* Function handles multiple distinct responsibilities
|
||||
* Contains deeply nested control structures (>4 levels)
|
||||
* Mixed abstraction levels (low-level + high-level operations)
|
||||
* Some functions MUST be large (state machines, parsers, complex algorithms, performance-critical loops)
|
||||
* Extraction would require excessive parameter passing (>6-8 parameters)
|
||||
|
||||
**CONTEXT-SENSITIVE EXEMPTIONS:**
|
||||
- **Performance-Critical Code**: Avoid decomposition if it adds method call overhead in hot paths
|
||||
- **Legacy/Generated Code**: Higher tolerance for size if heavily tested and stable
|
||||
- **Domain Complexity**: Financial calculations, scientific algorithms may need larger methods for correctness
|
||||
- **Language Patterns**: Some languages favor larger constructs (C macros, template metaprogramming)
|
||||
- **Framework Constraints**: ORM entities, serialization classes, configuration objects
|
||||
- **Algorithmic Cohesion**: Don't split tightly coupled algorithmic steps that belong together
|
||||
- **State Management**: Complex state machines or transaction handlers may need size for correctness
|
||||
- **Platform Integration**: Large platform API wrappers or native interop code
|
||||
- **Testing Infrastructure**: Test fixtures and integration tests often grow large legitimately
|
||||
|
||||
RATIONALE: Balance cognitive load reduction with practical engineering constraints. Avoid breaking working code
|
||||
unless there's clear benefit. Respect language idioms, performance requirements, and domain complexity.
|
||||
|
||||
DECOMPOSITION STRATEGIES:
|
||||
|
||||
**File-Level Decomposition** (PRIORITY 1): Split oversized files into multiple focused files:
|
||||
- **CONTEXT ANALYSIS FIRST**: Assess if file size is problematic or justified:
|
||||
* Legacy monoliths with mixed responsibilities → HIGH priority for decomposition
|
||||
* Large but well-organized domain files → LOWER priority, focus on logical boundaries
|
||||
* Generated/config files → Usually exempt unless causing real issues
|
||||
* Platform-specific considerations (header files, modules, packages)
|
||||
- Extract related classes/functions into separate modules using platform-specific patterns
|
||||
- Create logical groupings (models, services, utilities, components, etc.)
|
||||
- Use proper import/export mechanisms for the target language
|
||||
- Focus on responsibility-based splits, not arbitrary size cuts
|
||||
- **DEPENDENCY IMPACT ANALYSIS**: Assess extraction complexity:
|
||||
* Simple extractions with clean boundaries → HIGH priority
|
||||
* Complex interdependencies requiring major API changes → LOWER priority
|
||||
* Circular dependencies or tight coupling → May need architectural changes first
|
||||
- CAUTION: When only a single file is provided, verify dependencies and imports before suggesting file splits
|
||||
- DEPENDENCY ANALYSIS: Check for cross-references, shared constants, and inter-class dependencies
|
||||
- If splitting breaks internal dependencies, suggest necessary visibility changes or shared modules
|
||||
- **LEGACY SYSTEM CONSIDERATIONS**: Higher tolerance for large files if:
|
||||
* Well-tested and stable with minimal change frequency
|
||||
* Complex domain logic that benefits from co-location
|
||||
* Breaking changes would require extensive testing across large system
|
||||
|
||||
**Class-Level Decomposition** (PRIORITY 2): Break down mega-classes:
|
||||
- FIRST: Split large classes into multiple classes where programming language allows (C# partial classes,
|
||||
Swift and ObjC extensions, JavaScript modules, etc.)
|
||||
- THEN: Extract specialized responsibilities into focused classes via composition or inheritance if this is feasible
|
||||
- Use composition over inheritance where appropriate
|
||||
- Apply single responsibility principle cautiously - avoid breaking existing APIs or adding new dependencies
|
||||
- When only a single file is provided, prefer internal splitting methods (private classes, inner classes,
|
||||
helper methods)
|
||||
- **CONTEXT ANALYSIS FIRST**: Assess if class size is problematic or justified:
|
||||
* Domain entities with complex business rules → May legitimately be large
|
||||
* Framework/ORM base classes → Often intentionally comprehensive
|
||||
* State management classes → Size may be necessary for correctness
|
||||
* Mixed responsibilities in one class → HIGH priority for decomposition
|
||||
* Performance-critical classes → Avoid decomposition if it adds overhead
|
||||
- **LANGUAGE-SPECIFIC STRATEGIES**:
|
||||
* C# partial classes for file splitting without architectural changes
|
||||
* Swift extensions for logical grouping while maintaining access
|
||||
* JavaScript modules for responsibility separation
|
||||
* Java inner classes for helper functionality
|
||||
* Python mixins for cross-cutting concerns
|
||||
- FIRST: Split large classes using language-native mechanisms that preserve existing APIs
|
||||
- THEN: Extract specialized responsibilities into focused classes via composition or inheritance if feasible
|
||||
- **DEPENDENCY PRESERVATION**: Prioritize solutions that maintain existing public APIs:
|
||||
* Use composition over inheritance where appropriate
|
||||
* Apply single responsibility principle cautiously - avoid breaking existing consumers
|
||||
* When only a single file is provided, prefer internal splitting methods (private classes, inner classes, helper methods)
|
||||
- Consider interface segregation for large public APIs only if it doesn't break existing consumers
|
||||
- CRITICAL: When moving code between files/extensions, analyze access dependencies (private variables,
|
||||
internal methods)
|
||||
- WARNING: Some moves may break access visibility (Swift private→extension, C# internal→assembly) - flag for review
|
||||
- If access breaks are unavoidable, explicitly note required visibility changes (private→internal, protected, etc.)
|
||||
- **ACCESS CONTROL ANALYSIS**: Critical when moving code between files/extensions:
|
||||
* Analyze access dependencies (private variables, internal methods, package-private)
|
||||
* WARNING: Some moves may break access visibility (Swift private→extension, C# internal→assembly)
|
||||
* If access breaks are unavoidable, explicitly note required visibility changes (private→internal, protected, public)
|
||||
* Flag moves that would expose previously private members for security review
|
||||
|
||||
**Function-Level Decomposition** (PRIORITY 3): Eliminate long, complex functions:
|
||||
- Extract logical chunks into private/helper methods within the same class/module
|
||||
- Separate data processing from business logic conservatively
|
||||
- Create clear, named abstractions for complex operations without breaking existing call sites
- Maintain function cohesion and minimize parameter passing
- **CONTEXT ANALYSIS FIRST**: Assess if function size is problematic or justified:
  * State machines, parsers, complex algorithms → Often legitimately large for correctness
  * Performance-critical loops → Avoid decomposition if it adds call overhead
  * Functions with high local variable coupling → Extraction may require excessive parameters
  * Mixed abstraction levels in one function → HIGH priority for decomposition
  * Deeply nested control structures (>4 levels) → HIGH priority for decomposition
- **ALGORITHMIC COHESION ASSESSMENT**: Avoid breaking tightly coupled algorithmic steps:
  * Mathematical computations that belong together
  * Transaction processing that must be atomic
  * Error handling sequences that need coordinated rollback
  * Security-sensitive operations that need to be auditable as a unit
- **EXTRACTION STRATEGIES** (prefer least disruptive):
  * Extract logical chunks into private/helper methods within the same class/module
  * Create clear, named abstractions for complex operations without breaking existing call sites
  * Separate data processing from business logic conservatively
  * Maintain function cohesion and minimize parameter passing (>6-8 parameters indicates poor extraction)
- **LANGUAGE-SPECIFIC CONSIDERATIONS**:
  * Closure-heavy languages: Be careful with captured variable dependencies
  * Static languages: Consider template/generic extraction for type safety
  * Dynamic languages: Ensure extracted functions maintain same error handling
  * Functional languages: Prefer function composition over imperative extraction
- Prefer internal extraction over creating new dependencies or external functions
- ANALYZE DEPENDENCIES: Check for private variable access, closure captures, and scope-dependent behavior
- If extraction breaks variable access, suggest parameter passing or scope adjustments
- Flag functions that require manual review due to complex inter-dependencies
- **DEPENDENCY ANALYSIS**: Critical for successful extraction:
  * Check for private variable access, closure captures, and scope-dependent behavior
  * Analyze local variable lifecycle and mutation patterns
  * If extraction breaks variable access, suggest parameter passing or scope adjustments
  * Flag functions that require manual review due to complex inter-dependencies
- **PERFORMANCE IMPACT**: Consider if extraction affects performance-critical code paths

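The size and parameter heuristics above can be checked mechanically. Below is a minimal illustrative sketch (not part of the server code) using Python's `ast` module; the function name, thresholds, and message formats are assumptions chosen to mirror the guidance (150-line evaluate threshold, 6-parameter extraction warning):

```python
import ast
import textwrap

def extraction_flags(source: str, max_lines: int = 150, max_params: int = 6) -> list[str]:
    """Flag functions whose body length or parameter count suggests the
    decomposition/extraction concerns described above."""
    flags = []
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+
            span = (node.end_lineno or node.lineno) - node.lineno + 1
            n_params = len(node.args.args) + len(node.args.kwonlyargs)
            if span > max_lines:
                flags.append(f"{node.name}: {span} lines, consider decomposition")
            if n_params > max_params:
                flags.append(f"{node.name}: {n_params} params, extraction may be poor")
    return flags
```

A static pass like this cannot see closure captures or runtime coupling, which is why the prompt also demands the dependency analysis steps.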
CRITICAL RULE: If ANY file exceeds cognitive complexity thresholds (large files/classes/functions), you MUST:
1. Mark ALL decomposition opportunities as CRITICAL severity
CRITICAL RULE: If ANY component exceeds AUTOMATIC thresholds (15000+ LOC files, 3000+ LOC classes, 500+ LOC functions), you MUST:
1. Mark ALL automatic decomposition opportunities as CRITICAL severity
2. Focus EXCLUSIVELY on decomposition - provide ONLY decomposition suggestions
3. DO NOT suggest ANY other refactoring type (code smells, modernization, organization)
4. List decomposition issues FIRST by severity: CRITICAL → HIGH → MEDIUM → LOW
5. Block all other refactoring until cognitive load is reduced

INTELLIGENT SEVERITY ASSIGNMENT:
- **CRITICAL**: Automatic thresholds breached (15000+ LOC files, 3000+ LOC classes, 500+ LOC functions)
- **HIGH**: Evaluate thresholds breached (5000+ LOC files, 1000+ LOC classes, 150+ LOC functions) AND context indicates real issues
- **MEDIUM**: Evaluate thresholds breached but context suggests legitimate size OR minor organizational improvements
- **LOW**: Optional decomposition that would improve readability but isn't problematic

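The two-tier scheme above is a simple mapping from size to severity. A minimal sketch, assuming the AUTOMATIC (15000/3000/500) and EVALUATE (5000/1000/150) LOC tiers from the prompt; the helper itself is illustrative and not part of the server:

```python
# Threshold tables taken from the severity rules above.
AUTOMATIC = {"file": 15000, "class": 3000, "function": 500}
EVALUATE = {"file": 5000, "class": 1000, "function": 150}

def assign_severity(kind: str, loc: int, size_justified: bool = False) -> str:
    """Map a component's size (in LOC) to a decomposition severity."""
    if loc >= AUTOMATIC[kind]:
        return "critical"  # blocking: decomposition suggestions only
    if loc >= EVALUATE[kind]:
        # Context decides: a justified size downgrades HIGH to MEDIUM
        return "medium" if size_justified else "high"
    return "low"
```

The `size_justified` flag stands in for the context analysis (domain complexity, performance constraints, legacy stability) that the prompt requires before assigning HIGH.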
CONTEXT ANALYSIS REQUIRED: For EVALUATE threshold breaches, analyze:
- Is the size justified by domain complexity, performance needs, or language patterns?
- Would decomposition actually improve maintainability or introduce unnecessary complexity?
- Are there signs of multiple responsibilities that genuinely need separation?
- Would changes break working, well-tested legacy code without clear benefit?

CRITICAL SEVERITY = BLOCKING ISSUE: Other refactoring types can only be applied AFTER all CRITICAL decomposition
is complete. Decomposition reduces navigation complexity, improves understanding, enables focused changes, and makes
future refactoring possible.
is complete. However, HIGH/MEDIUM/LOW decomposition can coexist with other refactoring types based on impact analysis.

**codesmells**: Detect and fix quality issues - long methods, complex conditionals, duplicate code, magic numbers,
poor naming, feature envy. NOTE: Can only be applied AFTER decomposition if large files/classes/functions exist.
@@ -106,13 +213,14 @@ SCOPE CONTROL
Stay strictly within the provided codebase. Do NOT invent features, suggest major architectural changes beyond current
structure, recommend external libraries not in use, or create speculative ideas outside project scope.

If scope is too large and refactoring would require large parts of the code to be involved, respond ONLY with:
{"status": "focused_review_required",
 "reason": "<brief explanation>",
 "suggestion": "<specific focused subset to analyze>"}
If scope is too large and refactoring would require large parts of the code to be involved, respond ONLY with this JSON (no other text):
{"status": "focused_review_required", "reason": "<brief explanation>", "suggestion": "<specific focused subset to analyze>"}

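On the consuming side, a caller can branch on this scope-control status. A hedged sketch of such a handler; the `status`, `reason`, and `suggestion` field names come from the prompt above, while the function itself is illustrative and not the server's actual parsing code:

```python
import json

def handle_scope_response(raw: str) -> str:
    """Return the suggested narrower scope if the model asked for a
    focused review, otherwise signal normal processing."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "normal"  # plain-text response, no special status
    if isinstance(payload, dict) and payload.get("status") == "focused_review_required":
        # Narrow the next request to the suggested subset of files
        return payload.get("suggestion", "")
    return "normal"
```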
OUTPUT FORMAT
Return ONLY a JSON object with this exact structure:
CRITICAL OUTPUT FORMAT REQUIREMENTS
You MUST respond with ONLY the JSON format below. NO introduction, reasoning, explanation, or additional text.
DO NOT include any text before or after the JSON. Claude cannot parse your response if you deviate from this format.

Return ONLY this exact JSON structure:

{
  "status": "refactor_analysis_complete",
@@ -148,7 +256,9 @@ Return ONLY a JSON object with this exact structure:
      "source_lines": "45-67",
      "description": "Specific step-by-step action for Claude"
    }
  ]
  ],
  "more_refactor_required": false,
  "continuation_message": "Optional: Explanation if more_refactor_required is true. Describe remaining work scope."
}

QUALITY STANDARDS
@@ -165,14 +275,20 @@ SEVERITY GUIDELINES
complete)
- **low**: Style improvements, minor modernization, optional optimizations (only after decomposition complete)

DECOMPOSITION PRIORITY RULES - CRITICAL SEVERITY:
1. If ANY file >2000 lines: Mark ALL decomposition opportunities as CRITICAL severity
2. If ANY class >1500 lines: Mark ALL class decomposition as CRITICAL severity
3. If ANY function >250 lines: Mark ALL function decomposition as CRITICAL severity
DECOMPOSITION PRIORITY RULES - ADAPTIVE SEVERITY:
1. If ANY file >15000 lines: Mark ALL file decomposition opportunities as CRITICAL severity
2. If ANY class >3000 lines: Mark ALL class decomposition as CRITICAL severity
3. If ANY function >500 lines: Mark ALL function decomposition as CRITICAL severity
4. CRITICAL issues MUST BE RESOLVED FIRST - no other refactoring suggestions allowed
5. Focus EXCLUSIVELY on breaking down large components when CRITICAL issues exist
6. List ALL decomposition issues FIRST in severity order: CRITICAL → HIGH → MEDIUM → LOW
7. When CRITICAL decomposition issues exist, provide ONLY decomposition suggestions
5. Focus EXCLUSIVELY on breaking down AUTOMATIC threshold violations when CRITICAL issues exist
6. For EVALUATE threshold violations (5000+ LOC files, 1000+ LOC classes, 150+ LOC functions):
   - Analyze context, domain complexity, performance constraints, legacy stability
   - Assign HIGH severity only if decomposition would genuinely improve maintainability
   - Assign MEDIUM/LOW severity if size is justified but minor improvements possible
   - Skip if decomposition would introduce unnecessary complexity or break working systems
7. List ALL decomposition issues FIRST in severity order: CRITICAL → HIGH → MEDIUM → LOW
8. When CRITICAL decomposition issues exist, provide ONLY decomposition suggestions
9. HIGH/MEDIUM/LOW decomposition can coexist with other refactoring types

FILE TYPE CONSIDERATIONS:
- CSS files can grow large with styling rules - consider logical grouping by components/pages
@@ -182,20 +298,20 @@ FILE TYPE CONSIDERATIONS:

IF EXTENSIVE REFACTORING IS REQUIRED
If you determine that comprehensive refactoring requires dozens of changes across multiple files or would involve
extensive back-and-forth iterations that would risk exceeding context limits, you MUST follow this structured approach:
extensive back-and-forth iterations that would risk exceeding context limits, provide the most critical and high-impact
refactoring opportunities (typically 5-10 key changes) in the standard response format, and set more_refactor_required
to true with an explanation.

1. **Generate Essential Refactorings First**: Create the standard refactor_analysis_complete response with the most
critical and high-impact refactoring opportunities (typically 5-10 key changes covering the most important
improvements). Focus on CRITICAL and HIGH severity issues. Include full details with refactor_opportunities,
priority_sequence, and next_actions_for_claude.
Focus on CRITICAL and HIGH severity issues first. Include full details with refactor_opportunities, priority_sequence,
and next_actions_for_claude for the immediate changes, then indicate that additional refactoring is needed.

2. **Request Continuation**: AFTER providing the refactor_analysis_complete response, append the following JSON
format as a separate response (and nothing more after this):
{"status": "more_refactor_required",
 "message": "Explanation of why more refactoring is needed and overview of remaining work. For example: 'Extensive decomposition required across 15 additional files. Continuing analysis will identify module extraction opportunities in services/, controllers/, and utils/ directories.'"}
Claude will use the continuation_id to continue the refactoring analysis in subsequent requests when more_refactor_required is true.

This approach ensures comprehensive refactoring coverage while maintaining quality and avoiding context overflow.
Claude will use the continuation_id to continue the refactoring analysis in subsequent requests.
FINAL REMINDER: CRITICAL OUTPUT FORMAT ENFORCEMENT
Your response MUST start with "{" and end with "}". NO other text is allowed.
If you include ANY text outside the JSON structure, Claude will be unable to parse your response and the tool will fail.
DO NOT provide explanations, introductions, conclusions, or reasoning outside the JSON.
ALL information must be contained within the JSON structure itself.

Provide precise, implementable refactoring guidance that Claude can execute with confidence.
"""

@@ -225,9 +225,9 @@ class TestRefactorTool:

        # Should contain the original response plus implementation instructions
        assert valid_json_response in formatted
        assert "IMMEDIATE NEXT ACTION" in formatted
        assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
        assert "Step 4: COMPLETE REFACTORING" in formatted  # Not more_required, so should be COMPLETE
        assert "MANDATORY NEXT STEPS" in formatted
        assert "Start executing the refactoring plan immediately" in formatted
        assert "MANDATORY: MUST start executing the refactor plan" in formatted

    def test_format_response_invalid_json(self, refactor_tool):
        """Test response formatting with invalid JSON - now handled by base tool"""
@@ -241,8 +241,8 @@ class TestRefactorTool:

        # Should contain the original response plus implementation instructions
        assert invalid_response in formatted
        assert "IMMEDIATE NEXT ACTION" in formatted
        assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
        assert "MANDATORY NEXT STEPS" in formatted
        assert "Start executing the refactoring plan immediately" in formatted

    def test_model_category(self, refactor_tool):
        """Test that the refactor tool uses EXTENDED_REASONING category"""
@@ -259,11 +259,39 @@ class TestRefactorTool:
        assert temp == TEMPERATURE_ANALYTICAL

    def test_format_response_more_refactor_required(self, refactor_tool):
        """Test that format_response handles more_refactor_required status"""
        """Test that format_response handles more_refactor_required field"""
        more_refactor_response = json.dumps(
            {
                "status": "more_refactor_required",
                "message": "Large codebase requires extensive refactoring across multiple files",
                "status": "refactor_analysis_complete",
                "refactor_opportunities": [
                    {
                        "id": "refactor-001",
                        "type": "decompose",
                        "severity": "critical",
                        "file": "/test/file.py",
                        "start_line": 1,
                        "end_line": 10,
                        "context_start_text": "def test_function():",
                        "context_end_text": " return True",
                        "issue": "Function too large",
                        "suggestion": "Break into smaller functions",
                        "rationale": "Improves maintainability",
                        "code_to_replace": "original code",
                        "replacement_code_snippet": "refactored code",
                        "new_code_snippets": [],
                    }
                ],
                "priority_sequence": ["refactor-001"],
                "next_actions_for_claude": [
                    {
                        "action_type": "EXTRACT_METHOD",
                        "target_file": "/test/file.py",
                        "source_lines": "1-10",
                        "description": "Extract method from large function",
                    }
                ],
                "more_refactor_required": True,
                "continuation_message": "Large codebase requires extensive refactoring across multiple files",
            }
        )

@@ -275,12 +303,11 @@ class TestRefactorTool:

        # Should contain the original response plus continuation instructions
        assert more_refactor_response in formatted
        assert "IMMEDIATE NEXT ACTION" in formatted
        assert "ULTRATHINK & IMPLEMENT REFACTORINGS" in formatted
        assert "VERIFY CHANGES WORK" in formatted
        assert "Step 4: CONTINUE WITH MORE REFACTORING" in formatted  # more_required, so should be CONTINUE
        assert "MANDATORY NEXT STEPS" in formatted
        assert "Start executing the refactoring plan immediately" in formatted
        assert "MANDATORY: MUST start executing the refactor plan" in formatted
        assert "AFTER IMPLEMENTING ALL ABOVE" in formatted  # Special instruction for more_refactor_required
        assert "continuation_id" in formatted
        assert "immediately continue with more refactoring analysis" in formatted


class TestFileUtilsLineNumbers:

@@ -326,30 +326,3 @@ class TestSpecialStatusParsing:
        # Should fall back to normal response since validation failed
        assert result.status == "success"
        assert result.content_type == "text"

    def test_more_refactor_required_parsing(self):
        """Test that more_refactor_required status is parsed correctly"""
        import json

        json_response = {
            "status": "more_refactor_required",
            "message": "Large codebase requires extensive decomposition across 15 files. Continuing analysis for remaining modules.",
        }

        result = self.tool._parse_response(json.dumps(json_response), self.request)

        assert result.status == "more_refactor_required"
        assert result.content_type == "json"
        parsed_content = json.loads(result.content)
        assert parsed_content["status"] == "more_refactor_required"
        assert "Large codebase requires extensive decomposition" in parsed_content["message"]

    def test_more_refactor_required_missing_message(self):
        """Test that more_refactor_required without required message field fails validation"""
        response_json = '{"status": "more_refactor_required"}'

        result = self.tool._parse_response(response_json, self.request)

        # Should fall back to normal processing since validation failed (missing required field)
        assert result.status == "success"
        assert result.content_type == "text"

@@ -284,7 +284,7 @@ class TestComprehensive(unittest.TestCase):

        # Check formatting includes new action-oriented next steps
        assert raw_response in formatted
        assert "IMMEDIATE NEXT ACTION" in formatted
        assert "EXECUTION MODE" in formatted
        assert "ULTRATHINK" in formatted
        assert "CREATE" in formatted
        assert "VALIDATE BY EXECUTION" in formatted

@@ -105,6 +105,7 @@ class BaseTool(ABC):
        self.name = self.get_name()
        self.description = self.get_description()
        self.default_temperature = self.get_default_temperature()
        # Tool initialization complete

    @abstractmethod
    def get_name(self) -> str:
@@ -169,14 +170,14 @@ class BaseTool(ABC):
        Returns:
            bool: True if model parameter should be required in the schema
        """
        from config import DEFAULT_MODEL, IS_AUTO_MODE
        from config import DEFAULT_MODEL
        from providers.registry import ModelProviderRegistry

        # Case 1: Explicit auto mode
        if IS_AUTO_MODE:
        if DEFAULT_MODEL.lower() == "auto":
            return True

        # Case 2: Model not available
        # Case 2: Model not available (fallback to auto mode)
        if DEFAULT_MODEL.lower() != "auto":
            provider = ModelProviderRegistry.get_provider_for_model(DEFAULT_MODEL)
            if not provider:

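The hunk above replaces the `IS_AUTO_MODE` flag with a check derived from `DEFAULT_MODEL` itself. The decision it encodes can be sketched in isolation; this is an illustrative reduction, with the config import and provider registry replaced by plain parameters:

```python
def is_effective_auto_mode(default_model: str, provider_for) -> bool:
    """Decide whether the model parameter must be required in the schema.

    provider_for is a stand-in for ModelProviderRegistry.get_provider_for_model;
    it returns None when no provider can serve the configured model.
    """
    # Case 1: explicit auto mode via DEFAULT_MODEL == "auto"
    if default_model.lower() == "auto":
        return True
    # Case 2: configured model has no provider, so fall back to auto mode
    return provider_for(default_model) is None
```

Collapsing the flag into the model name removes one configuration knob while keeping the same fallback behavior when the configured model is unavailable.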
@@ -40,7 +40,6 @@ class ToolOutput(BaseModel):
        "focused_review_required",
        "test_sample_needed",
        "more_tests_required",
        "more_refactor_required",
        "refactor_analysis_complete",
        "resend_prompt",
        "continuation_available",
@@ -99,13 +98,6 @@ class MoreTestsRequired(BaseModel):
    pending_tests: str = Field(..., description="List of pending tests to be generated")


class MoreRefactorRequired(BaseModel):
    """Request for continuation when refactoring requires extensive changes"""

    status: Literal["more_refactor_required"] = "more_refactor_required"
    message: str = Field(..., description="Explanation of why more refactoring is needed and what remains to be done")


class RefactorOpportunity(BaseModel):
    """A single refactoring opportunity with precise targeting information"""

@@ -156,7 +148,6 @@ SPECIAL_STATUS_MODELS = {
    "focused_review_required": FocusedReviewRequired,
    "test_sample_needed": TestSampleNeeded,
    "more_tests_required": MoreTestsRequired,
    "more_refactor_required": MoreRefactorRequired,
    "refactor_analysis_complete": RefactorAnalysisComplete,
}


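The `SPECIAL_STATUS_MODELS` mapping above dispatches a status string to a validation model, and a failed validation falls back to normal text processing. A simplified stand-in of that pattern; the real code uses Pydantic models, so the dataclass, field names, and fallback here are illustrative assumptions only:

```python
from dataclasses import dataclass

@dataclass
class FocusedReviewRequired:
    reason: str
    suggestion: str

# Status string -> model class, mirroring the dispatch table above
SPECIAL_STATUS_MODELS = {"focused_review_required": FocusedReviewRequired}

def parse_special_status(payload: dict):
    """Return a validated status object, or None to fall back to
    normal response handling."""
    model = SPECIAL_STATUS_MODELS.get(payload.get("status"))
    if model is None:
        return None
    fields = {k: v for k, v in payload.items() if k != "status"}
    try:
        return model(**fields)
    except TypeError:
        # Missing required fields: fall back to normal processing,
        # matching the behavior the removed tests asserted
        return None
```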
@@ -563,10 +563,10 @@ class RefactorTool(BaseTool):

    def format_response(self, response: str, request: RefactorRequest, model_info: Optional[dict] = None) -> str:
        """
        Format the refactoring response.
        Format the refactoring response with immediate implementation directives.

        The base tool handles structured response validation via SPECIAL_STATUS_MODELS,
        so this method focuses on presentation formatting.
        so this method focuses on ensuring Claude immediately implements the refactorings.

        Args:
            response: The raw refactoring analysis from the model
@@ -574,78 +574,52 @@ class RefactorTool(BaseTool):
            model_info: Optional dict with model metadata

        Returns:
            str: The response (base tool will handle structured parsing)
            str: The response with clear implementation directives
        """
        logger.debug(f"[REFACTOR] Formatting response for {request.refactor_type} refactoring")

        # Check if this is a more_refactor_required response
        # Check if this response indicates more refactoring is required
        is_more_required = False
        try:
            import json

            parsed = json.loads(response)
            if isinstance(parsed, dict) and parsed.get("status") == "more_refactor_required":
            if isinstance(parsed, dict) and parsed.get("more_refactor_required") is True:
                is_more_required = True
        except (json.JSONDecodeError, ValueError):
            # Not JSON or not more_refactor_required
            # Not JSON or parsing error
            pass

        # Always add implementation instructions for any refactoring response
        step4_title = (
            "## Step 4: CONTINUE WITH MORE REFACTORING" if is_more_required else "## Step 4: COMPLETE REFACTORING"
        )
        step4_intro = (
            "Once all refactorings above are implemented and verified working, IMMEDIATELY continue the analysis:"
            if is_more_required
            else "Once all refactorings above are implemented and verified working:"
        )
        step4_action = (
            "Use the refactor tool again with the SAME parameters but include the continuation_id from this response to get additional refactoring opportunities. The model will provide more refactoring suggestions building on what was already completed."
            if is_more_required
            else "Review all changes made and ensure the codebase is cleaner, more maintainable, and follows best practices."
        )
        critical_msg = (
            "apply all refactorings, validate they work, then immediately continue with more refactoring analysis. Take full ownership of the refactoring implementation and continue until all opportunities are addressed."
            if is_more_required
            else "create, modify, and reorganize files as needed. Take full ownership of the refactoring implementation and ensure all changes work correctly."
        )
        continuation_instruction = ""
        if is_more_required:
            continuation_instruction = """

AFTER IMPLEMENTING ALL ABOVE: Use the refactor tool again with the SAME parameters but include the continuation_id from this response to get additional refactoring opportunities."""
        # endif

        # Return response + steps
        return f"""{response}

---

# IMMEDIATE NEXT ACTION
MANDATORY NEXT STEPS:

Claude, you are now in EXECUTION MODE. Take immediate action:
Start executing the refactoring plan immediately:
1. INFORM USER by displaying a brief summary of required refactorings
2. CREATE A CHECKLIST of each refactoring to keep a record of what is to change, how and why
3. IMPLEMENT each refactoring opportunity immediately - think carefully about each change as you implement
4. CREATE new files as needed where decomposition is suggested
5. MODIFY existing files to apply improvements as needed
6. UPDATE all imports, references, and dependencies as needed
7. VERIFY each change works before moving to the next

## Step 1: ULTRATHINK & IMPLEMENT REFACTORINGS
ULTRATHINK while implementing these refactorings. Verify EVERY code reference, import, dependency, and access modifier is 100% accurate before making changes.
After each refactoring is implemented:
Show: `IMPLEMENTED: [brief description] - Files: [list]` to the user

- **IMPLEMENT** all refactoring opportunities listed above in priority order
- **CREATE** any new files needed for decomposition or reorganization
- **MOVE** code to appropriate locations following the refactoring plan
- **UPDATE** all imports and references to maintain functionality
- **VALIDATE** that no functionality is broken by the changes
IMPORTANT:
- DO NOT SKIP any refactorings - implement them all one after another
- VALIDATE each change doesn't break functionality
- UPDATE any imports and references properly and think and search for any other reference that may need updating
- TEST if possible to ensure changes work where tests are available

## Step 2: VERIFY CHANGES WORK
**MANDATORY**: After each refactoring step:
- Ensure all imports are updated correctly
- Verify access modifiers (private/public/internal) still work
- Check that all references to moved code are updated
- Run any existing tests to confirm nothing is broken
- Fix any issues that arise from the refactoring

## Step 3: DISPLAY RESULTS TO USER
After implementing each refactoring, show the user:
```
✅ Refactored: [refactor-id] - Brief description
- Files modified: [list of files]
- [Key change summary]
```

{step4_title}
{step4_intro}

{step4_action}

**CRITICAL**: Do NOT stop after generating the analysis - you MUST {critical_msg}"""
MANDATORY: MUST start executing the refactor plan and follow each step listed above{continuation_instruction}"""

@@ -441,20 +441,18 @@ class TestGenTool:

---

# IMMEDIATE NEXT ACTION

Claude, you are now in EXECUTION MODE. Take immediate action:

## Step 1: ULTRATHINK & CREATE TESTS
ULTRATHINK while creating these tests. Verify EVERY code reference, import, function name, and logic path is
## Step 1: THINK & CREATE TESTS
ULTRATHINK while creating these in order to verify that every code reference, import, function name, and logic path is
100% accurate before saving.

- **CREATE** all test files in the correct project structure
- **SAVE** each test with proper naming conventions
- **VALIDATE** all imports, references, and dependencies are correct as required by the current framework
- CREATE all test files in the correct project structure
- SAVE each test using proper naming conventions
- VALIDATE all imports, references, and dependencies are correct as required by the current framework / project / file

## Step 2: DISPLAY RESULTS TO USER
After creating each test file, show the user:
After creating each test file, MUST show the user:
```
✅ Created: path/to/test_file.py
- test_function_name(): Brief description of what it tests
@@ -463,11 +461,11 @@ After creating each test file, show the user:
```

## Step 3: VALIDATE BY EXECUTION
**MANDATORY**: Run the tests immediately to confirm they work:
- Install any missing dependencies first
CRITICAL: Run the tests immediately to confirm they work:
- Install any missing dependencies first or request user to perform step if this cannot be automated
- Execute the test suite
- Fix any failures or errors
- Confirm 100% pass rate
- Confirm 100% pass rate. If there's a failure, re-iterate, go over each test, validate and understand why it's failing

## Step 4: INTEGRATION VERIFICATION
- Verify tests integrate with existing test infrastructure
@@ -477,6 +475,6 @@ After creating each test file, show the user:

## Step 5: MOVE TO NEXT ACTION
Once tests are confirmed working, immediately proceed to the next logical step for the project.

**CRITICAL**: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work. Take full
ownership of the testing implementation and move to your next work. If you were supplied a more_work_required request
in the response above, you MUST honor it."""
MANDATORY: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work and all of the
steps listed above are carried out correctly. Take full ownership of the testing implementation and move to your
next work. If you were supplied a more_work_required request in the response above, you MUST honor it."""
